PROGRESS IN
Nucleic Acid Research and Molecular Biology
Volume 47
Edited by
WALDO E. COHN
Biology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
KIVIE MOLDAVE
Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, California
ACADEMIC PRESS
A Division of Harcourt Brace & Company
San Diego, New York, Boston, London, Sydney, Tokyo, Toronto
This book is printed on acid-free paper.
Copyright © 1994 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc.
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by
Academic Press Limited
24-28 Oval Road, London NW1 7DX

International Standard Serial Number: 0079-6603
International Standard Book Number: 0-12-540047-0

PRINTED IN THE UNITED STATES OF AMERICA
Contents
ABBREVIATIONS AND SYMBOLS
SOME ARTICLES PLANNED FOR FUTURE VOLUMES
Prestalk Cell-differentiation and Movement during the Morphogenesis of Dictyostelium discoideum
Jeffrey Williams and Alastair Morrison
I. Extracellular cAMP-signaling in Dictyostelium
II. Cellular Differentiation and the Role of Differentiation-inducing Factor
III. Prestalk Cell Heterogeneity
IV. Slug Formation
V. Culmination
VI. Intracellular Signaling and the Multiple Roles of cAMP-dependent Protein Kinase
VII. Other Extracellular … and Stalk Cell-differentiation
VIII. Conclusions
References
Collagen Genes: Mutations Affecting Collagen Structure and Expression
William G. Cole
I. The Collagens
II. …
III. Type-II Collagen Gene
IV. Type-III Collagen Gene
V. Type-IV Collagen Genes
VI. Type-VII Collagen Gene
VII. Type-IX Collagen Genes
VIII. Type-X Collagen Gene
References
Signal-Transducing G Proteins: Basic and Clinical Implications
C. W. Emala, W. F. Schwindinger, G. S. Wand and M. A. Levine
I. Guanine Nucleotide Binding Proteins
II. Structure of α Subunits of G Proteins
III. Function of α Subunits of G Proteins
IV. Structure of β Subunits of G Proteins
V. Function of βγ Dimers of G Proteins
VI. The β3 Subunit
VII. Clinical Implications of Altered G-Protein Function
VIII. Summary
IX. Glossary
References
The tis Genes, Primary Response Genes Induced by Growth Factors and Tumor Promoters in 3T3 Cells
Harvey R. Herschman, Dean A. Kujubu, Bradley S. Fletcher, Qiufu Ma, Brian C. Varnum, Rebecca S. Gilbert and Srinivasa T. Reddy
I. Phorbol-induced Primary Response Genes Can Be Cloned from Swiss 3T3 Cells
II. The tis21 Gene Encodes a Pr… Anti-oncogene
III. The tis11 Gene Is a Member of a Multigene Family
IV. The tis10 Gene Encodes a Functional Prostaglandin Synthase/Cyclooxygenase
V. Both Prostaglandin Synthesis and TIS10/PGS-2 Synthesis Are Induced in Swiss 3T3 Cells
VI. The Structures of the tis10/pgs-2 and pgs-1 Genes Are Similar
VII. tis10/pgs-2 Satisfies the Criteria for the "Second Pool" of Prostaglandin Synthase
VIII. tis10/pgs-2 Is Induced in Macrophages
IX. Can the Proteins Encoded by the tis10/p… Functional Heterodimers? What Would Be the Consequences of This Interaction for Pharmacologic Intervention?
X. Similarities in the Expression and Regulation of Inducible Forms of Prostaglandin Synthase and Nitric-Oxide Synthase
XI. Conclusions and Future Directions
References
Nuclear Pre-mRNA Processing in Higher Plants
Kenneth R. Luehrsen, Sharif Taha and Virginia Walbot
I. Biochemistry of Splicing and Intron Recognition
II. Plant Splicing
III. Transposable-element-induced Mutants of …
IV. Examples of Alternative Splicing
V. Biological Phenomena Associated with Splicing
VI. Perspective
References
New Concepts in Protein-DNA Recognition: Sequence-directed DNA Bending and Flexibility
Rodney E. Harrington and Ilga Winicov
I. DNA Sequence Dependence in Protein-Nucleic Acid Binding Specificity
II. A Short Taxonomy of DNA-bending Proteins and Their Recognition …
III. Models of Seq…
IV. Past Challenges and Future Prospects
V. Glossary of Abbreviations and Polynucleotide Notation
References
Nonsense-mediated mRNA Decay in Yeast
Stuart W. Peltz, Feng He, Ellen Welch and Allan Jacobson
I. Identification of cis-Acting Sequences Involved in Nonsense-mediated mRNA Decay
II. trans-Acting Factors Involved in the Nonsense-mediated mRNA Decay Pathway
III. Possible Functions of the Nonsense-mediated mRNA Decay Pathway
IV. Conclusions
References
Molecular Biology and Regulatory Aspects of Glycogen Biosynthesis in Bacteria
Jack Preiss and Tony Romeo
I. Genetic Regulation of the Glycogen Synthesis Pathway in Escherichia coli
II. Site-directed Mutagenesis of the ADPglucose Pyrophosphorylase Genes to Study Structure-Function Relationships of Enzyme Action and Regulatory Control
III. Glossary
References
Diverse Mechanisms for Regulating Ribosomal Protein Synthesis in Escherichia coli
Janice M. Zengel and Lasse Lindahl
I. Organization of Ribosomal Protein Genes in Escherichia coli
II. Overview of the Control of Ribosomal Protein Synthesis in Escherichia coli
III. Review of Individual Operons
IV. Epilogue
References
Enzymologic Mechanism of Replicative DNA Polymerases in Higher Eukaryotes
Paul A. Fisher
I. Catalytic Core of DNA Polymerase α
II. Holoenzyme of DNA Polymerase α
III. Interaction of DNA Polymerase α with Template-Primers Containing Chemically Damaged Nucleotides
IV. DNA Polymerase δ
V. Conclusions and Prospects for Future Research
References

ADDENDUM TO New Concepts in Protein-DNA Recognition: Sequence-directed DNA Bending and Flexibility
INDEX
Abbreviations and Symbols

All contributors to this Series are asked to use the terminology (abbreviations and symbols) recommended by the IUPAC-IUB Commission on Biochemical Nomenclature (CBN) and approved by IUPAC and IUB, and the Editors endeavor to assure conformity. These Recommendations have been published in many journals (1, 2) and compendia (3); they are therefore considered to be generally known. Those used in nucleic acid work, originally set out in section 5 of the first Recommendations (1) and subsequently revised and expanded (2, 3), are given in condensed form in the frontmatter of Volumes 9-33 of this series. A recent expansion of the one-letter system (5) follows.

SINGLE-LETTER CODE RECOMMENDATIONS(a) (5)

Symbol   Meaning               Origin of symbol
G        G                     Guanosine
A        A                     Adenosine
T(U)     T(U)                  (ribo)Thymidine (Uridine)
C        C                     Cytidine
R        G or A                puRine
Y        T(U) or C             pYrimidine
M        A or C                aMino
K        G or T(U)             Keto
S        G or C                Strong interaction (3 H-bonds)
W(b)     A or T(U)             Weak interaction (2 H-bonds)
H        A or C or T(U)        not G; H follows G in the alphabet
B        G or T(U) or C        not A; B follows A
V        G or C or A           not T (not U); V follows U
D(c)     G or A or T(U)        not C; D follows C
N        G or A or T(U) or C   aNy nucleoside (i.e., unspecified)
Q        Q                     Queuosine (nucleoside of queuine)

(a) Modified from Proc. Natl. Acad. Sci. U.S.A. 83, 4 (1986).
(b) W has been used for wyosine, the nucleoside of "base Y" (wye).
(c) D has been used for dihydrouridine (hU or H2Urd).
Enzymes

In naming enzymes, the 1984 recommendations of the IUB Commission on Biochemical Nomenclature (4) are followed as far as possible. At first mention, each enzyme is described either by its systematic name or by the equation for the reaction catalyzed or by the recommended trivial name, followed by its EC number in parentheses. Thereafter, a trivial name may be used. Enzyme names are not to be abbreviated except when the substrate has an approved abbreviation (e.g., ATPase, but not LDH, is acceptable).
REFERENCES

1. JBC 241, 527 (1966); Bchem 5, 1445 (1966); BJ 101, 1 (1966); ABB 115, 1 (1966), 129, 1 (1969); and elsewhere. General.
2. EJB 15, 203 (1970); JBC 245, 5171 (1970); JMB 55, 299 (1971); and elsewhere.
3. "Handbook of Biochemistry" (G. Fasman, ed.), 3rd ed. Chemical Rubber Co., Cleveland, Ohio, 1970, 1975. Nucleic Acids, Vols. I and II, pp. 3-59. Nucleic acids.
4. "Enzyme Nomenclature" [Recommendations (1984) of the Nomenclature Committee of the IUB]. Academic Press, New York, 1984.
5. EJB 150, 1 (1985). Nucleic Acids (One-letter system).

Abbreviations of Journal Titles

Journals                                   Abbreviations used
Annu. Rev. Biochem.                        ARB
Annu. Rev. Genet.                          ARGen
Arch. Biochem. Biophys.                    ABB
Biochem. Biophys. Res. Commun.             BBRC
Biochemistry                               Bchem
Biochem. J.                                BJ
Biochim. Biophys. Acta                     BBA
Cold Spring Harbor                         CSH
Cold Spring Harbor Lab                     CSHLab
Cold Spring Harbor Symp. Quant. Biol.      CSHSQB
Eur. J. Biochem.                           EJB
Fed. Proc.                                 FP
Hoppe-Seyler's Z. Physiol. Chem.           ZpChem
J. Amer. Chem. Soc.                        JACS
J. Bacteriol.                              J. Bact.
J. Biol. Chem.                             JBC
J. Chem. Soc.                              JCS
J. Mol. Biol.                              JMB
J. Nat. Cancer Inst.                       JNCI
Mol. Cell. Biol.                           MCBiol
Mol. Cell. Biochem.                        MCBchem
Mol. Gen. Genet.                           MGG
Nature, New Biology                        Nature NB
Nucleic Acid Research                      NARes
Proc. Natl. Acad. Sci. U.S.A.              PNAS
Proc. Soc. Exp. Biol. Med.                 PSEBM
Progr. Nucl. Acid. Res. Mol. Biol.         This Series
Some Articles Planned for Future Volumes
The Poly ADP-ribosylation System of Higher Eukaryotes
FELIX H. ALTHAUS

Role of Multisite Phosphorylation in the Regulation of RNA Polymerase II Activity
M. E. DAHMUS

Adenylyl Cyclases. A Heterogenous Class of ATP-utilizing Enzymes
OCTAVIAN BARZU AND ANTOINE DANCHIN

Genetic Dissection of the Synthesis and Function of Modified Nucleotides in Bacterial tRNA
GLENN BJORK

mRNA Binding Proteins in Eukaryotic Cells
TOM DONAHUE AND K. GULYAS

Processing of Eukaryotic Ribosomal RNA
DUANE C. EICHLER AND NESSLY CRAIG

Mechanism of Transcription Fidelity
GUNTHER EICHHORN AND JIM BUTZOW

Regulation of Replication of an Iteron-containing DNA Molecule
M. FILUTOWICZ, S. DELLIS, I. LEVCHENKO, M. URH, F. WU AND D. YORK

Molecular Properties and Regulation of G Protein-coupled Receptors
CLAIRE M. FRASER, NORMAN H. LEE, SUSAN M. PELLEGRINO AND ANTHONY R. KERLAVAGE

The Human Immunodeficiency Virus Type-1 Long Terminal Repeat and Its Role in Gene Expression
JOSEPH A. GARCIA AND RICHARD B. GAYNOR

The Mechanics and Specificity of Signal Transduction to the Nucleus: Lessons from c-fos
MICHAEL GILMAN

RNA Polymerase as a Molecular Machine: The Coupling between Catalytic Function and Propagation along DNA
ALEX GOLDFARB

Polynucleotide Recognition and Degradation by Bleomycin
STEFANIE A. KANE AND SIDNEY M. HECHT

Aminoacyl-tRNA Synthetases from Higher Eukaryotes
L. KISSELEV AND A. D. WOLFSON

START Control in Cycling S. cerevisiae Cells
H. KÜNTZEL, H.-W. ROTTJAKOB, A. SCHWED AND W. ZWERSCHKE

Adeno-associated Virus Type 2: A Latent Life Cycle
C. J. LEONARD AND K. I. BERNS

Uracil-excision DNA Repair
D. W. MOSBAUGH AND S. E. BENNETT

The Regulation of Ribosomal Transcription
TOM MOSS

Analysis of EGF-Receptor Interaction by Protein Engineering
SALIL K. NIYOGI AND STEVE CAMPION

The Role of the 5' Untranslated Region of Eukaryotic mRNAs in Translation and Its Investigation Using Anti-sense Technologies
K. PANTOPOULOS, H. E. JOHANSSON AND M. W. HENTZE

New Members of the Collagen Gene Family
TAINA PIHLAJANIEMI AND MARK REHN

DNA Methylation from Embryo to Adult
A. RAZIN AND T. KAFRI

The Prosomes (Multicatalytic Proteinases-Proteasomes) and Their Relation to the Untranslated Messenger Ribonucleoproteins, the Cytoskeleton, and Cell Differentiation
KLAUS SCHERRER AND FAYCAL BEY

Biological Implications of the Mechanism of Action of Human DNA(Cytosine-5)Methyltransferase
STEVEN S. SMITH

Human Mutational Spectrometry: Means and Ends
WILLIAM G. THILLY AND KONSTANTIN KHRAPKO

The Balbiani Ring Multigene Family: Coding Repetitive Sequences and Evolution of a Tissue-specific Cell Function
L. WIESLANDER
Prestalk Cell-differentiation and Movement during the Morphogenesis of Dictyostelium discoideum
JEFFREY WILLIAMS AND ALASTAIR MORRISON
Imperial Cancer Research Fund, Clare Hall Laboratories, South Mimms, Herts EN6 3LD, United Kingdom

I. Extracellular cAMP-signaling in Dictyostelium
II. Cellular Differentiation and the Role of Differentiation-inducing Factor
III. Prestalk Cell Heterogeneity
IV. Slug Formation
V. Culmination
VI. Intracellular Signaling and the Multiple Roles of cAMP-dependent Protein Kinase
VII. Other Extracellular … and Stalk Cell-differentiation
VIII. Conclusions
References
The cellular slime mold Dictyostelium discoideum displays almost all the characteristic features of development found in higher eukaryotes, so insights gained by studying Dictyostelium are likely to be of general relevance. First, there is true cellular differentiation. Growing amoebae entering development have two totally distinct fates: 80% form spores and 20% form stalk cells (Fig. 1). The spores are elliptical cells that are highly resistant to environmental insult. The stalk is composed of dead, vacuolated cells surrounded by a wall composed of protein and cellulose (1-3). Second, the fruiting bodies, or culminants, have a distinct form. The stalk tapers from bottom to top and is widely expanded at the base, to create the supporting basal disc from which D. discoideum derives its name. Third, there is regulation at a level that is unsurpassed in other developing systems, with an approximately 4-to-1 spore-to-stalk cell ratio being maintained at widely varying aggregate sizes (4, 5). Finally, Dictyostelium shows morphogenetic cell movement, initially to bring cells together during aggregation and then later to build the fruiting body.

In this review we concentrate on multicellular development, the period after the cells have aggregated together. Furthermore, we lay special emphasis on prestalk and stalk cells, because it is their movement and differentiation that are ultimately responsible for building the fruiting body. The genes encoding the extracellular matrix¹ proteins, EcmA and EcmB, are markers of prestalk and stalk cell-differentiation (6), and analysis of their expression reveals an unexpectedly complex pattern of prestalk cell-differentiation and movement, both during slug formation and at culmination (7, 8). Three extracellular signaling molecules, cAMP, DIF, and ammonia, combine to regulate expression of the two genes. We review what is known about the cognate intracellular signal-transduction systems.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47. Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.

[Fig. 1 artwork: early aggregate, tight aggregate, first finger (prestalk cells indicated), slug, mid culminant, late culminant]

FIG. 1. The Dictyostelium asexual life cycle. This is a highly schematic representation of the development of Dictyostelium. Cellular aggregation yields mounds containing up to 10⁵ cells, wherein about 20% of cells differentiate into prestalk cells and the remaining cells differentiate as prespore cells. The environmental conditions that exist below the surface of the forest floor (high humidity, darkness, and high ionic strength) promote formation of a migratory slug. If the aggregate forms under the conditions that exist at the surface, it enters culmination immediately, the migratory slug phase being entirely omitted. Typically, the slug is 1-2 mm in length. When the slug encounters conditions favorable for fruiting body formation, it enters culmination. If the slug phase is eschewed, the entire developmental process is completed within about 24 hours.
¹ Abbreviations: Ecm, extracellular matrix; DIF, differentiation-inducing factor, 1-[(3,5-dichloro-2,6-dihydroxy-4-methoxy)phenyl]hexan-1-one; ALC, anterior-like cell; PSV, prespore vesicle; PKA, cAMP-dependent protein kinase; pst, prestalk; psp, prespore.
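As a minimal arithmetic sketch of the proportions quoted above (about 20% prestalk and 80% prespore cells, i.e., a 4-to-1 spore-to-stalk ratio, in mounds of up to 10⁵ cells), the hypothetical helper below allocates cells for a given aggregate size. It assumes a fixed prestalk fraction of 0.2; that the ratio stays constant as the aggregate size changes is exactly what "regulation" means here, but the function says nothing about how the cells achieve it.

```python
def expected_allocation(aggregate_size: int, prestalk_fraction: float = 0.2) -> tuple[int, int]:
    """Split an aggregate into (prestalk, prespore) counts at an assumed fixed prestalk fraction."""
    if not 0.0 < prestalk_fraction < 1.0:
        raise ValueError("prestalk_fraction must lie strictly between 0 and 1")
    prestalk = round(aggregate_size * prestalk_fraction)
    return prestalk, aggregate_size - prestalk


if __name__ == "__main__":
    # 10^5 is the upper mound size given in the Fig. 1 legend; the smaller sizes are arbitrary.
    for size in (1_000, 10_000, 100_000):
        prestalk, prespore = expected_allocation(size)
        print(size, prestalk, prespore, prespore / prestalk)  # the last column stays at 4.0
```

Whatever the aggregate size, the spore-to-stalk ratio printed in the last column is 4.0, the proportion the opening paragraph describes as being maintained by regulation.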
I. Extracellular cAMP-signaling in Dictyostelium
A. Multiple Developmental Roles of Extracellular cAMP Signals

The movement of cells toward each other at aggregation occurs as the result of the chemotaxis of individual amoebae up a concentration gradient of cAMP (reviewed in 9). Signaling is initiated by cells at the center of the aggregation territory. On receipt of a cAMP signal, cells farther out in the territory synthesize and secrete cAMP, so relaying the signal. In addition to its function as a chemoattractant, cAMP regulates gene expression at several stages of development. Genes expressed early in development are transcriptionally repressible by cAMP, genes expressed somewhat later during aggregation are inducible by cAMP pulses, and gene expression in prespore cells is inducible by a high constant level of cAMP (reviewed in 10-13).
B. The cAMP Signal-transduction System

The complex pattern of responses to cAMP is probably achieved by utilizing multiple receptors that differ in their precise responses to extracellular cAMP. The cAMP receptors are proteins with seven transmembrane domains, and they exhibit a high degree of homology to mammalian G-protein-linked receptors, such as the β-adrenergic receptor (14). There are at least four different cAMP receptors (15-17) and at least eight Gα proteins (18-20). The genes encoding the receptors and the Gα proteins are developmentally regulated, both temporally and in their cell-type specificity, and this provides a mechanism whereby cells can change their responsiveness to cAMP as they differentiate. In response to receipt of a cAMP signal, adenylate cyclase, guanylate cyclase, and phospholipase C are activated, but the precise coupling of receptors to G proteins is not well understood (reviewed in 11, 21-25). The intracellular signal-transduction pathways responsible for regulating prespore cell-specific gene expression are as yet undefined.
II. Cellular Differentiation and the Role of Differentiation-inducing Factor
A. Markers of Cellular Differentiation

At the end of aggregation the cells group into a cylindrical structure known as the first finger (Fig. 1). If the first finger is formed under the environmental conditions that exist below the surface of the soil or leaf litter, it topples onto its side and undertakes a period of migration as a slug. The slug is both phototactic and thermotactic, and these sensitivities direct it to the surface, where terminal differentiation, to form a fruiting body, takes place (26-28). The slug is patterned. Cells within the front one-fifth differentiate to form stalk cells, whereas most of the cells in the rear four-fifths differentiate to form spore cells (1). There are, however, scattered cells in the back of the slug, known as anterior-like cells (ALCs), that share most of the properties of prestalk cells (29-31). Cells are not irreversibly committed to these fates, so that if the slug is cut between the prestalk and prespore cells, there is regulation, i.e., some of the prestalk and prespore cells change their fate to give a normally proportioned fruit (1, 32, 33). Regulation implies communication, and two extracellular signaling molecules that control initial prespore and prestalk cell-differentiation are, perhaps, involved in regulation. As mentioned, extracellular cAMP induces prespore differentiation, whereas DIF, a chlorinated hexaphenone (Fig. 2), induces prestalk and stalk cell-differentiation (34-38). The DIF signaling pathway has been investigated using two genes that depend on DIF for their expression, the ecmA and ecmB genes (6). These genes also provided the first definitive markers of prestalk- and stalk-specific differentiation. Because there is some confusion as to the reliability of various prestalk-specific markers, and of the nature of the extracellular signals that induce prestalk cell-differentiation, it is worthwhile considering these questions in some detail.

Markers of prespore differentiation are relatively easy to identify. Prespore cells contain vesicles (prespore vesicles, or PSVs) that act as the repository for spore coat-proteins prior to spore maturation, at which time they fuse with the cell membrane (39, 40).
The PSVs can be detected immunologically, and the genes encoding the major spore coat-proteins, the cotA, -B, and -C genes (formerly called, respectively, the SP96, -70, and -60 genes), have been cloned (41-45). In another approach to identifying prespore-specific markers, monoclonal antibodies have been raised against slug-stage cells, yielding MUD1, an antibody that detects a protein called PsA, which is expressed only in prespore cells (46, 47). Subsequent analysis showed that the PsA protein is encoded by a gene, termed D19 (48), originally identified by differential screening of a cDNA library prepared from slug-stage cells (49). The library was screened with labeled cDNA prepared using RNAs isolated from prespore cells and prestalk cells separated by centrifugation through a density gradient (50, 51). A similar approach has been used to identify several other prespore-specific mRNAs encoding proteins of unknown function (41).

FIG. 2. The chemical structure of DIF-1, 1-[(3,5-dichloro-2,6-dihydroxy-4-methoxy)phenyl]hexan-1-one.

Prespore markers are, for the most part, highly specific to prespore cells, and they share the property of being induced when cells shaken in suspension are exposed to exogenous cAMP (41, 49, 52). Conversely, if cAMP-signaling is disrupted, by disaggregating slug cells and shaking them rapidly in buffer in the absence of cAMP, transcription of the prespore genes is inhibited and their mRNAs are rapidly degraded (52-56). In some cases the mRNAs can be induced to reaccumulate by addition of cAMP (57, 58). Thus, prespore gene expression is both inducible by, and dependent on, extracellular cAMP. Together, these results support the idea that extracellular cAMP signals induce and maintain prespore cell-differentiation.

This success in identifying markers of the prespore cell-differentiation pathway has led to the application of similar approaches to the prestalk cell pathway. Indeed, the same differential screening experiments that yielded prespore-specific markers also yielded cDNA clones that hybridize to mRNA sequences enriched in prestalk cells (41, 49). These prestalk-enriched mRNAs are inducible by cAMP (52).
Hence the conclusion drawn was that cAMP induces both prestalk and prespore cell-differentiation. On this premise cAMP becomes a truly totipotent signal, responsible for chemotaxis, for regulating gene expression at stages up to slug formation, and for the differentiation of both cell types. There is, however, a conceptual problem in using the same signal to induce the differentiation of two cell types within such a unitary structure. The cAMP-signaling system operates on a relay principle, so that all cells within a territory or aggregate should, in theory, perceive the same signal. Although developmental changes in the repertoire of receptors and G proteins can alter the signaling system in the ways described above, prior to the process of cellular differentiation all cells must, by definition, possess the same signaling machinery. Thus it is difficult to see how cAMP-signaling could be used at one time in development to induce the differentiation of two different cell types. This theoretical objection was validated using a different approach to the problem of identifying prestalk-specific markers. This approach relied on the identification of the extracellular signal that actually induces prestalk and stalk cell-differentiation.
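The relay argument can be made concrete with a deliberately crude one-dimensional sketch. It assumes, purely for illustration, that a pulse loses a fixed fraction of its amplitude (the hypothetical parameter `attenuation`) each time it crosses one cell, and that every relaying cell re-emits a pulse of the same amplitude (`emitted`) as the one that started the wave; real cAMP waves are far more complex than this. The point it shows is the one made above: with relay, a cell perceives the same amplitude whatever its distance from the signaling center, whereas a pulse that was merely passed on without relay would fade with distance.

```python
def perceived_amplitude(distance: int, relay: bool,
                        emitted: float = 1.0, attenuation: float = 0.5) -> float:
    """Toy estimate of the pulse perceived by a cell `distance` cells from the signaling center."""
    if distance < 1:
        raise ValueError("distance is counted in cells from the center and must be >= 1")
    if relay:
        # Each cell hears a fresh pulse from the neighbour that has just relayed it,
        # so the perceived amplitude does not depend on distance.
        return emitted * attenuation
    # Without relay, the original pulse is attenuated once per cell it crosses.
    return emitted * attenuation ** distance


if __name__ == "__main__":
    for d in range(1, 5):
        print(d, perceived_amplitude(d, relay=True), perceived_amplitude(d, relay=False))
```

In this toy, the relay column stays at 0.5 for every distance while the no-relay column halves at each step; a signal that reaches every cell at the same strength carries no positional information with which to pick out two cell types.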
B. DIF-inducible Genes as Markers of Prestalk Cell-differentiation

A major problem in analyzing the induction of prestalk and stalk cell-differentiation is that of access: the prestalk cells are a minor population in a multicellular, three-dimensional structure. A major advance was the demonstration that isolated cells in a tissue culture dish could be induced to differentiate into stalk cells (59). Subsequent analysis showed a density dependence of stalk cell-differentiation in such an in vitro induction system (34, 35). Cells produce a factor that, when removed from a population of cells incubated at a high cell density, can induce cells incubated at a low density to differentiate into stalk cells (36, 37). This factor, differentiation-inducing factor (DIF), was purified and shown to be a chlorinated hexaphenone (38, 60) (Fig. 2). There is a family of related chlorinated compounds, but DIF-1² is by far the most biologically active (60-62). DIF is produced during Dictyostelium development and acts at nanomolar concentrations to divert cells from prespore to stalk cell-differentiation (63-66). A mutant strain called HM44, which accumulates greatly reduced amounts of DIF, becomes arrested in its differentiation just prior to tip formation (67). When analyzed by two-dimensional gel electrophoresis, the HM44 mutant was found to synthesize prespore-specific gene products, but it did not express stalk-specific proteins (68). HM44 is an excellent strain in which to search for DIF-inducible genes, because stalk cell-differentiation is dependent on the addition of exogenous DIF. A cDNA library prepared from HM44 cells exposed to DIF for 12 hours was screened with probe cDNA prepared from the same mRNA or from mRNA isolated from cells not exposed to DIF. This yielded three different cDNA clones, pDd63, pDd56, and pDd26, that hybridize to mRNAs dependent on DIF for their expression (6).
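The logic of this plus/minus differential screen can be sketched as a simple filter: a clone is scored as DIF-dependent when its signal with the probe made from DIF-treated HM44 mRNA clearly exceeds its signal with the probe made from untreated cells. The fold threshold (`min_fold`), the background floor, and the clone names and signal values below are all hypothetical illustrations, not data from the screen described above.

```python
from dataclasses import dataclass


@dataclass
class CloneSignal:
    """Hybridization signals for one cDNA clone (arbitrary units; values here are hypothetical)."""
    name: str
    plus_dif: float   # probe made from mRNA of DIF-treated cells
    minus_dif: float  # probe made from mRNA of cells not exposed to DIF


def dif_dependent_clones(clones, min_fold: float = 5.0, background: float = 1.0) -> list[str]:
    """Return names of clones whose +DIF signal exceeds the -DIF signal by at least min_fold."""
    hits = []
    for clone in clones:
        # Floor the -DIF signal at background, so a clone with no detectable -DIF signal
        # is compared against background instead of being divided by zero.
        reference = max(clone.minus_dif, background)
        if clone.plus_dif / reference >= min_fold:
            hits.append(clone.name)
    return hits


if __name__ == "__main__":
    screen = [
        CloneSignal("clone_1", plus_dif=40.0, minus_dif=2.0),  # strongly DIF-dependent
        CloneSignal("clone_2", plus_dif=10.0, minus_dif=9.0),  # expressed regardless of DIF
        CloneSignal("clone_3", plus_dif=12.0, minus_dif=0.0),  # no signal without DIF
    ]
    print(dif_dependent_clones(screen))  # ['clone_1', 'clone_3']
```

A ratio test of this kind only finds mRNAs whose level depends on DIF; as the following paragraphs explain, deciding which of those genes are direct targets of the DIF pathway requires additional evidence such as induction kinetics and sensitivity to protein-synthesis inhibitors.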
The three mRNA sequences differ in their time course of induction by DIF (6). The pDd63 gene is the most rapidly induced; there is an increase in the level of its transcription within 15 minutes of the addition of DIF. It therefore seems likely to be directly induced by the DIF signal-transduction pathway. The pDd56 mRNA accumulates much more slowly but, again, DIF acts at the level of gene transcription to induce its accumulation. The pDd26 mRNA accumulates only at the very end of the induction process, and addition of an inhibitor of protein synthesis prevents its induction by DIF (G. Weeks, personal communication). The relative induction kinetics of the three mRNAs are only partially reflected in their time courses of accumulation during normal development (6, 66). The pDd56 and pDd63 mRNAs are first detectable, approximately contemporaneously, about 2-3 hours prior to tip formation. In one study the pDd56 mRNA was actually detectable slightly before the pDd63 mRNA, but this may simply reflect a difference in relative sensitivity of detection of the two gene products (66, 69). The pDd26 mRNA first appears only much later during development, at culmination (6). The pDd63 and pDd56 mRNAs are expressed within the slug and can therefore be used to analyze gene expression in isolated prestalk and prespore cells. There is always cross-contamination of cell types in such experiments but, as closely as could be judged, both mRNA sequences were found to be expressed only in prestalk cells (6).

² In all subsequent discussions "DIF" is to be taken to mean the DIF-1 molecule.
C. Two Alternative Types of Prestalk-enriched Markers The fact that the pDd63 gene, which seems to be directly inducible by the DIF signaling pathway, is expressed only in prestalk cells constitutes very strong evidence that DIF is the inducer of prestalk cell-differentiation (70). What then of the CAMP-inducibleprestalk-specific genes? Direct, side by side, comparison showed that the CAMP-inducible mRNAs are all present, to a greater or lesser extent, in prespore cells, i.e., that they are enriched in prestalk cells rather than being specific to prestalk cells (41,49).It was suggested, therefore, that there might be two ways of achieving prestalk enrichment, a primary induction route and a secondary route (6, 10, 52). Genes such as pod56 and pDd63, which are induced by DIF and which are expressed only as a cell becomes a prestalk cell, employ the primary mechanism. The other class of genes was suggested to become prestalk enriched by a secondary mechanism, whereby the genes were first expressed in most or aI1 cells, with their mRNA sequences then being selectively degraded in prespcre cells. There is a precedent for such behavior from analysis of a protein detected by a monoclonal antibody (71)and there is now additional evidence for such a mechanism for one of the best characterized prestalk-enriched gene products, that encoded by the Dictyostelium rusD gene. The rusD gene is inducible by CAMPbut is not inducible by DIF ( 6 , 4 1 , 72). The rusD mRNA accumulates to a very low level during growth, disappears early during development, and then reaccumulates later during aggregation (41, 72). The mRNA is only threefold enriched in prestalk cells compared to prespore cells, purified by gradient centrifugation (6),but when the
rasD promoter is coupled to a lacZ reporter gene and introduced into Dictyostelium by transformation, it proves to be a very good marker for prestalk cells in the migrating slug (73). The fusion gene is strongly expressed in the prestalk region and in ALCs, but there is very little, if any, β-galactosidase protein in prespore cells. This apparent conflict with the conclusions drawn from studying DIF-inducible genes has recently been resolved by an analysis of the expression of the rasD-lacZ fusion gene during slug formation (K. Jermyn and J. G. Williams, unpublished). The fusion gene is expressed in both prestalk and prespore cells during aggregation and at the tight-mound stage. However, the β-galactosidase protein it produces must be unstable because, by the slug stage, only the prestalk cells are stained. Thus the rasD-lacZ fusion gene utilizes a secondary enrichment mechanism and so cannot be used to determine the signals that induce prestalk cell-differentiation. Interestingly, the gradient analysis of slug-stage cells showed only a marginal enrichment of the mRNA in prestalk cells, whereas the lacZ fusion-gene analysis showed no β-galactosidase protein in prespore cells (6, 73). Perhaps some form of translational control is at work, whereby the rasD mRNA is present in prespore cells but in an untranslatable form. There is a precedent for this notion from an analysis of the closely related rasG gene (74). In axenically growing cells, the rasG mRNA is present throughout development, but the protein disappears from the cells late in aggregation (S. Robbins and G. Weeks, personal communication). Whatever the explanation for the presence of the rasD mRNA in prespore cells, the anomaly is resolved. There are no known cAMP-inducible prestalk-specific mRNA sequences, DIF is the inducer of prestalk cell-differentiation, and the challenge now is to determine how it fulfills this function.
The DIF-inducible genes have started to give insights into this question, but before discussing the data, we first briefly discuss their possible biological functions.
D. Functional Analysis of the DIF-induced Prestalk-specific mRNAs
The pDd26 mRNA encodes a small protein of unknown function and cellular localization (75). The pDd63 and pDd56 mRNA sequences, by contrast, encode large proteins, each composed of a linear array of a 24-amino-acid, cysteine-rich repeat (70, 76) (Fig. 3a). The pDd63 gene contains approximately 70 copies of the repeated sequence, whereas the pDd56 gene contains about 35 copies of a very closely related sequence. The migrating slug is surrounded by an extracellular matrix that forms a kind of sausage
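[Fig. 3a: consensus sequences of the EcmA and EcmB repeats]
Fig. 3b:
Gene (former name) | Protein (2-D gel name) | Recognized by
ecmA (pDd63) | EcmA (ST430) | JAB2
ecmB (pDd56) | EcmB (ST310) | JAB1, JAB2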
FIG. 3. (a) The consensus sequences of the repeats encoded by the ecmA and ecmB genes. The residues marked with an asterisk are “invariant” residues, defined as being present in more than 90% of copies of the repeat (76). Cysteine residues are highlighted by boxes. (b) Tabulation of the nomenclature and antibody recognition patterns of the ecmA and ecmB genes. The JAB2 monoclonal antibody recognizes both the ecmA- and ecmB-encoded proteins, whereas the JAB1 antibody recognizes only the ecmB-encoded protein (83).
tube (1, 77). The discarded skin, the slime trail, is left on the substratum as the slug moves forward. Stalk cells are surrounded by a chemically similar matrix called the stalk tube (3, 78, 79). Immunoelectron microscopy shows that the pDd63- and pDd56-encoded proteins are present in both matrices (80, 81). The antibodies used for these analyses also showed that the proteins encoded by the pDd63 and pDd56 mRNA sequences had already been identified (81). They had previously been detected by two-dimensional gel electrophoresis, where they were called ST430 and ST310, respectively (82) (Fig. 3b). They had also been identified using two monoclonal antibodies, JAB1 and JAB2 (83) (Fig. 3b). Because they encode extracellular matrix proteins, the pDd63 gene has been renamed ecmA and the pDd56 gene has been renamed ecmB. Although both proteins are present in the slime sheath and the stalk tube, the EcmA protein probably has a primary role during slug formation and the EcmB protein
probably plays a major role during culmination. Part of the evidence for these conclusions derives from analysis of a mutant in which the ecmA gene is inactivated by gene disruption (A. Morrison, unpublished). The mutant forms highly elongated first fingers that are prone to desiccation, but these go on to form apparently normal fruiting bodies. Thus the EcmA protein plays a role in shaping the slug but, after this stage, it appears to have no essential function. Disruption of the ecmB gene has no apparent effect upon morphogenesis, perhaps because the EcmA protein can take over its function (A. Morrison, unpublished). The other evidence for the roles assigned to these two proteins comes from a consideration of when, and where, they are made. These studies also produced a surprising observation: the slug is composed not of one type of prestalk cell but of several prestalk cell subtypes.
III. Prestalk Cell Heterogeneity
When the promoters of the ecmA and ecmB genes were coupled to an immunologically detectable reporter gene and introduced into Dictyostelium cells by transformation, a hitherto unsuspected degree of complexity in prestalk cell-differentiation was revealed (7). The ecmA gene was found to be expressed at a high level in cells in the front part of the prestalk zone, but there was no detectable expression in cells in the rear part of the prestalk zone (Fig. 4). The expressing cells were termed pstA cells and the apparently nonexpressing cells were termed pstO cells. Subsequently, the ecmA promoter was fused to the lacZ gene, which provides a much more sensitive assay for gene expression. With this construct, the pstO cells were found to express the ecmA gene, but at a lower level than the pstA cells (8).
FIG. 4. Prestalk cell heterogeneity within the slug. Schematic representation of the slug stage of development, illustrating the positions and proposed movement patterns of the prestalk cell types. The pstO cells are found in the posterior portion of the prestalk zone; the arrows indicate possible interchange between the anterior-like cells (ALCs) and pstO cells. The pstA cells are shown moving into the central core, where they differentiate into pstAB cells. The pstAB cells are occasionally lost into the slime trail as the slug moves forward. The pstA cells replace them, and the pstA cells are in turn replaced by pstO cells.
A recent analysis of the ecmA promoter gives us an insight into the mechanism responsible for the difference in the level of ecmA gene expression between pstA and pstO cells. The characteristic staining pattern described above is obtained when a region comprising 1694 nucleotides upstream of the cap site of the ecmA gene is fused to the lacZ reporter gene. Analysis of subfragments of the promoter shows that there are at least two discrete regions that direct expression in pstO cells, and that a region proximal to the cap site is necessary for expression in pstA cells (84) (Fig. 5a). A fusion gene that is expressed only in pstO cells has been generated, but it has not yet been possible to create a fusion gene that is expressed only in pstA cells. The pstO region occupies approximately the rear two-thirds of the prestalk zone during slug formation. If the first finger forms a migratory slug, the anterior prestalk zone eventually becomes stained, perhaps as the result of forward movement of pstO cells into the pstA zone.
[Fig. 5 diagram: (a) ecmA promoter, from -1694 to the cap site, with distal regions marked pstO/ALC; (b) ecmB promoter, from -1614 to the cap site, with regions marked “On in anterior prestalk cells” and “Off in anterior prestalk cells”.]
FIG. 5. Comparison of the promoter elements of the ecmA and ecmB genes. Both genes have their regulatory elements contained within approximately 1.6 kb of upstream sequence (84, 119). Both genes also contain a proximal element that directs expression in anterior prestalk cells (region -877 to -757 in the ecmB gene and region -120 to +41 in the ecmA gene). In addition, both contain distal elements that drive expression in ALC- and pstO-derived cells. The ecmB gene, however, contains an additional region (-486 to -208) that represses the expression of this gene in all anterior prestalk cells until they enter the stalk tube.
Interestingly, a pstO-specific construct that contains only 164 nucleotides of the promoter region is also expressed strongly in ALCs. There is known to be a considerable rate of forward movement of these cells into the prestalk zone (85, 86). It may be that they and the pstO cells are the same cell type located within different parts of the slug (Fig. 4). Further evidence for this notion comes from an analysis of their respective fates at culmination, discussed below. This idea does not, however, provide a complete explanation for the existence of ALCs, because some ALCs express only the ecmB gene while others express both the ecmA and ecmB genes (8, 87). Also, some ALCs, identified by selective staining with vital dyes (the original criterion for this cell type), express neither the ecmA nor the ecmB gene (8). The existence of discrete pstA and pstO regions within the slug tip poses a number of interesting questions. How are they formed and maintained, and what is the functional significance of this heterogeneity? In transplantation experiments, a coherent mass of pstO cells or pstA cells was introduced into the front or back of the prestalk region. The two cell types seemed to know their correct address; i.e., they rapidly returned to their places of origin (88). Possibly, therefore, the discrete zones are formed and maintained by cell sorting. The close relationship (and possible identity) between pstO cells and ALCs provides a clue as to their function. The ALCs have been proposed to play a key role in regulation within the slug by acting as a kind of halfway house (89, 90). They would have the option of transdifferentiating either into an authentic prestalk cell that moves forward into the anterior zone, or into a prespore cell that remains in the rear of the slug (89, 90).
It may be important for this role as an intermediary that the ALCs spend part of their existence within the prespore zone and part within the prestalk region. The remaining prestalk cell subtype within the anterior region is found as a cone-shaped mass of cells that express both the ecmA and ecmB genes (8, 87) (Fig. 4). These cells were originally identified by their expression of the ecmB gene and thus were termed pstB cells. They were subsequently renamed pstAB cells when it was realized that they express both the ecmA and ecmB genes (87). The pstAB cells lie in the position where stalk-tube formation is initiated at culmination, and they can be regarded as a kind of stalk primordium. During culmination, pstA cells differentiate into pstAB cells and then into stalk cells in a stepwise manner (Fig. 6). It seems that during slug migration some of the pstA cells make the transition into pstAB cells prematurely and then form the central core (90). These observations reveal an underlying similarity between D. discoideum and other dictyostelid species, such as Dictyostelium mucoroides. Such species continuously produce a stalk during slug migration by transdifferentiation of prespore cells at the slug tip (91).
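[Fig. 6 diagram: cells in late aggregation give rise, during slug formation, to pstA cells (ecmA high) and to ALC/pstO cells. At culmination, pstA cells become pstAB cells (ecmA and ecmB) and then stalk cells, while ALC/pstO cells form stalk or the upper cup, lower cup, and basal disc.]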
FIG. 6. Proposed model for a stepwise progression along the prestalk cell-differentiation pathway. We propose two separate pathways along which a prestalk cell may differentiate: it can become a pstA cell or it can become an ALC/pstO cell. The pstA cells invariably differentiate into pstAB cells and thence into stalk cells. An ALC/pstO cell, however, can either enter the stalk tube and become a stalk cell, or remain outside the stalk tube and become part of one of the three ancillary structures: the upper cup, the lower cup, or the basal disc.
PstAB cells are stained selectively by the vital dye methylene blue. They have been observed to move back through the slug periodically, eventually falling behind into the slime trail (90). Because some slugs had multiple pstAB regions scattered along their length, pstAB cells must be replaced by recruitment of pstA cells. One attractive idea is that the pstO cells might in turn replace these pstA cells, which would account for the gradual loss of the discrete pstO zone during slug migration (Fig. 4). The slug should therefore be viewed as a dynamic structure in which pstAB cells discarded into the slime trail are replaced by transdifferentiation. The ultimate source of renewal must be the prespore cells. Their transdifferentiation into prestalk cells may account for the apparent erosion of the clear boundary between the prestalk and prespore zones that occurs during prolonged slug migration (92, 93).
IV. Slug Formation
A. The Mechanism of Tip Formation
Because it is such an attractive system in which to study pattern formation, the process of slug formation has been one of the central foci of Dictyostelium research. There are two obvious mechanisms whereby the slug might be formed. The prestalk and prespore cells could differentiate in situ, in response to positionally localized morphogenetic signals. Alternatively, the prestalk and prespore cells could differentiate at random positions within
the aggregate and then move to their respective final positions by cell sorting. The weight of evidence greatly favors a cell-sorting mechanism for pattern formation (reviewed in 12), and analysis of the ecmA and ecmB genes provides strong support for this notion. The fundamental problem is to determine how the tip is formed. This is the event that breaks the symmetry of the aggregate, and the tip of the aggregate eventually comes to form the prestalk zone in the migratory slug. Where do the first prestalk cells arise? Some gene products, such as those encoded by the CP2 and rasD genes, become enriched by the secondary mechanism described in Section II,A. Analysis of these markers showed that the cells that first express them appear at scattered positions within the aggregate (73, 94). However, these markers cannot safely be used to determine whether there is cell-type sorting during slug formation, because they are initially expressed in both prestalk and prespore cells. Analysis of cells expressing the ecmA-lacZ fusion gene showed that the first detectable expression occurs in randomly scattered cells at the tight-aggregate stage, and that the expressing cells then accumulate in the tip (95). If prestalk cells differentiate at random positions within the aggregate, presumably through exposure to DIF, how is the ratio of prestalk to prespore cells regulated? Analysis of the metabolism of DIF suggests that there may be a homeostatic mechanism that limits the number of cells that differentiate as prestalk cells. DIF induces DIF-1 dechlorinase, the enzyme responsible for DIF degradation, and prestalk cells are enriched in this enzyme (96). This could provide a feedback loop to control the concentration of DIF within the aggregate (96). As they enter development, cells are heterogeneous with respect to variables such as position in the cell cycle and nutritional status, and these factors may affect their relative sensitivity to DIF (89, 97, 98).
Indeed, there is a clear correlation between the eventual fate of a cell and the phase of the cell cycle it is in when it receives the starvation signal that triggers development (99-102). However, results from cell-cycle studies should not be overinterpreted. Cell-cycle position may provide an inbuilt heterogeneity that biases cells toward one pathway or the other, but there is no absolute correlation between cell-cycle position and cell fate. Also, this is a regulative developmental system (100, 103), so cell-to-cell communication by diffusible morphogens such as DIF must play the overriding role in controlling the ratio of prestalk to prespore cells. The accumulation of pstA and pstO cells within the tip is, of course, only one part of the process of tip formation. The aggregate must also undergo a shape change that converts the hemispherical mound into a cylindrical first finger. This process is in many ways analogous to gastrulation in higher organisms
and it may be that similar mechanisms are employed. This process of cellular intercalation presumably requires specific changes in cell shape and this may explain why strains mutated in cytoskeletal components are defective in tip elongation (104-107).
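The homeostatic loop proposed above, in which DIF induces the DIF-1 dechlorinase that degrades it, can be illustrated with a toy model. The Python sketch below is purely hypothetical. It assumes constant DIF production, mass-action degradation by the enzyme, first-order enzyme turnover, and arbitrary dimensionless rate constants (`p`, `k`, `a`, `b`). None of these values come from measurements, and the function names are illustrative only. The sketch demonstrates a single qualitative point: because DIF induces its own degradation, the steady-state DIF level rises only with the square root of the production rate, not in proportion to it.

```python
def simulate_dif_feedback(p, k=1.0, a=1.0, b=1.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of a hypothetical DIF/dechlorinase loop.

    dD/dt = p - k*E*D   (constant DIF production; degradation by enzyme E)
    dE/dt = a*D - b*E   (DIF induces the enzyme; the enzyme turns over)
    All parameters are illustrative, dimensionless assumptions.
    """
    dif, enzyme = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        d_dif = p - k * enzyme * dif
        d_enzyme = a * dif - b * enzyme
        dif += dt * d_dif
        enzyme += dt * d_enzyme
    return dif, enzyme


def steady_state_dif(p, k=1.0, a=1.0, b=1.0):
    # Setting both derivatives to zero gives E = a*D/b and p = k*a*D**2/b,
    # so D = sqrt(p*b / (k*a)).
    return (p * b / (k * a)) ** 0.5


for p in (1.0, 4.0):
    dif, _ = simulate_dif_feedback(p)
    print(f"production {p}: simulated DIF {dif:.3f}, analytic {steady_state_dif(p):.3f}")
```

With these assumed constants, a fourfold increase in production only doubles the steady-state DIF level. Whether real aggregates show this square-root buffering is not established by the data reviewed here.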
B. Apical Sorting of Prestalk Cells in Response to cAMP-signaling
A strain that overexpresses extracellular cAMP phosphodiesterase (108) has a reduced extracellular cAMP concentration. Analysis of this strain showed that the sorting of ecmA-expressing cells to the tip was greatly retarded (109). Prestalk cells detected by vital dyes can be caused to migrate to the base, rather than to the tip, if tight mounds are transferred to a substratum containing cAMP (110). This also holds true for ecmA-expressing cells in the phosphodiesterase-overexpressing strain (109). These results support the notion of the tip as a source of cAMP-signaling. They also accord well with the observation that prestalk cells are more chemotactically responsive to cAMP than are prespore cells (30, 111, 112).
C. Basipetal Migration of pstB Cells
When ecmA and ecmB gene expression during slug formation was analyzed using immunologically detectable markers, most detectable ecmB-expressing cells were found within the base of the tight mound (95). Now, using the much more sensitive enzymatic staining procedures, we know that ecmB-expressing cells arise at random positions within the aggregate (D. Traynor and J. G. Williams, unpublished results). We term these pstB cells because they do not appear to express the ecmA gene when they first arise. They subsequently seem to migrate to the base, where later, during tip extension, they begin to express the ecmA gene (95). This basal population of coexpressing cells is left behind if the slug moves away from its site of formation, but if a culminant is formed in situ, they form part of the basal disc (8, 95). Thus there is a striking asymmetry in slug formation: ecmA-expressing cells migrate upward and ecmB-expressing cells migrate downward. A subset of ecmA-expressing cells within the slug tip then activates expression of the ecmB gene, and ecmB-expressing cells within the base activate expression of the ecmA gene. How can we explain this behavior? The simplest notion is that there is a second cAMP-signaling center in the base, a suggestion originally made (113) to explain the downward movement of stalk cells at culmination. However, we still have no idea why the pstA and pstB cells should move in different directions. The only clues to understanding this process come from analysis of the behavior of the prestalk cells during culmination.
V. Culmination
A. Stalk-tube Formation
At culmination, the tip of the slug stops moving, and its rear end catches up with, and hence comes to sit underneath, the tip (114). The stalk tube is then constructed by the prestalk cells within the slug tip. As noted above, the EcmA and EcmB proteins are both present within the stalk tube. As they move up the outside of the tube, the pstA cells lay down cellulose fibers and proteins; presumably one of these proteins is EcmA (115-117). As the proteins reach the apex and enter the tube, the pstA cells reverse their direction of movement. Because of this change of direction, the process has often been likened to a “reverse-fountain” movement. As they pass the entrance to the stalk tube, the pstA cells begin to express the ecmB gene and so become pstAB cells (8) (Figs. 6 and 7). The pstAB cells presumably secrete EcmB protein into the stalk tube and thereafter differentiate into dead, vacuolated stalk cells. It may be that expression of the
[Fig. 7 panels: early culminant; late culminant]
FIG. 7. Schematic representation of the distribution and movement of various prestalk cell types during the process of culmination. During culmination some of the ALCs move to the base of the aggregate and others move up to join the pstO cell population (8, 29). Some of these pstO/ALC cells start to express the ecmB gene at a high level while still within the papilla, and they form the upper cup. The other (i.e., non-upper-cup) pstO/ALCs at the prestalk-prespore boundary are shown as expressing only the ecmA gene. However, a very low level of ecmB gene expression can be detected in them if staining is allowed to continue for extended periods. The upper cup cells, in contrast, show a very high level of expression of the ecmB gene. Once all of the pstA cells have entered the stalk tube, some of the pstO/ALC cells enter the tube and express the ecmB gene. The cells in the upper cup (i.e., those pstO/ALC cells that express the ecmB gene at a high level) never enter the stalk tube and eventually form a disc of cells at the apex of the culminant. Approximately half of the pstO/ALC cells enter the stalk tube and half form the upper cup.
ecmB gene occurs synchronously with the commitment step because, in another species of slime mold (D. mucoroides), cells very near the top of the stalk are known to be inviable (118). At first glance the mechanisms of pattern formation during culmination and slug formation seem to differ radically. Slug formation appears to occur by cell sorting, whereas stalk formation occurs by positional differentiation, with expression of the ecmB gene being induced, presumably by DIF, precisely at the entrance to the stalk tube. This is, however, an oversimplification, because the fruiting body is not simply a ball on a stick. Anterior-like cells within the culminant contribute to the formation of ancillary structures that give the fruiting body its characteristic shape, and they undergo extensive cell sorting.
B. The Movement of Anterior-like Cells at Culmination
Over the same period that pstA cells in the tip are differentiating into pstAB cells, the ALCs move to surround the emerging spore head and, as they do so, they activate expression of the ecmB gene (8, 29) (Figs. 7 and 8). Those ALCs that move downward are split into two groups as the spore mass begins to rise. This occurs after the stalk tube has embedded itself in the ALCs at the base, rather like a finger being pushed into plasticine (8, 90). Those cells that remain in the base contribute to the basal disc, and those that move up with the spore head form a kind of cup (the lower cup)
[Fig. 8 diagram: elevation of intracellular cAMP and activation of PKA relieve repression of the ecmB gene]
FIG. 8. A model for the regulation of the ecmB gene. This is a simplified representation of the ecmB gene showing the sequences proximal to the promoter (downstream of residue -877; see Fig. 5) that direct expression in the stalk tube at culmination. This region of the gene is proposed to be potentially active in all cells that are exposed to DIF, because it contains a positively acting region that lies at the end of the DIF signal transduction pathway. However, two repressor elements (120a) keep the gene inactive in pstA cells. At culmination, when cAMP levels rise, PKA is activated in cells at the entrance to the stalk tube. The repressor is then phosphorylated, and hence inactivated, and the ecmB gene is expressed.
underneath it (8). Once again, therefore, just as during slug formation, a population of cells that express the ecmB gene moves downward to the base. The ALCs that move upward merge with the band of pstO cells. Approximately half the cells within this region enter the stalk and activate expression of the ecmB gene (8) (Figs. 6 and 7). Consider a first finger that enters culmination immediately and has discrete pstA and pstO regions. In such a finger, the first cells to activate the ecmB gene are the pstA cells, followed by the pstO/ALCs (8). By this stage of development, it is impossible to distinguish the pstO cells from the ALCs, and the fact that they come together in this way is supporting evidence that they are the same cell type. The half of the pstO/ALC population that fails to enter the stalk forms a structure termed the upper cup. The pstO/ALCs that enter the stalk can be distinguished from the upper cup cells: the former do not activate ecmB gene expression until they enter the stalk tube, whereas the latter already express the ecmB gene (Figs. 6 and 7). In a double-staining experiment, therefore, the upper cup cells are detected as a band of ecmA- and ecmB-expressing cells within a broader band of cells that express only the ecmA gene (8). The upper and lower cups may play a supportive role, cradling and protecting the spore head. Alternatively, they may play an active role in lifting the spore mass up the stalk. They do not become vacuolated and are therefore presumably viable cells. This fact has been held to support the latter idea, i.e., that they help lift the spore head up the stalk (29).
C. Insights into Culmination from the Structure of the Promoter of the ecmB Gene
The ecmB gene is induced in two different populations of cells, pstA cells and ALCs. This is in part explained by the observation that the promoter of the ecmB gene contains separate regions that direct expression in these different cell types (119). A region proximal to the cap site directs expression in pstA cells as they enter the stalk tube, and a distal region directs expression in the subset of ALCs that migrate to the upper cup (Fig. 5b). The location of the region that directs expression in the ALCs that move downward at culmination is as yet unknown. Understanding how pstAB differentiation occurs at the entrance to the stalk tube is one of the keys to understanding dictyostelid morphogenesis. As noted above, D. discoideum is somewhat exceptional among the dictyostelids in forming a freely migrating slug with distinct prestalk and prespore zones. Most other species form a stalk continuously during slug migration (4). In a species such as D. mucoroides, prespore cells transdifferentiate, directly and rapidly, into stalk cells at the entrance to the stalk tube (91). There is no regulation. If such an aggregate migrates long enough,
it forms an enormously elongated stalk with virtually no spores. In all dictyostelids the entrance to the stalk tube is the decision point, and the ecmB gene provides a marker for this commitment step. The remainder of this review is concerned with this process: the conversion of a pstA cell into a pstAB cell by activation of ecmB gene expression at the entrance to the stalk tube. The proximal region of the ecmB promoter, which directs expression to the stalk tube, is composed of two parts (119) (Fig. 5b). One is a positively acting region capable of directing transcription in all prestalk cells. The other is a more proximal, negative control region that keeps the gene inactive until cells enter the stalk tube. If this negative control region is deleted, the gene is expressed in apical pstA cells; in other words, the ecmB gene is effectively converted into an ecmA gene with regard to its pattern of expression. The positively acting region may be the site of action of the DIF signal transduction system. DIF is a lipophilic molecule, and there is a Dictyostelium protein with some of the properties expected of a steroid receptor (120). The negative control region contains two apparently redundant repressor elements. Either of them can act to prevent expression of the ecmB gene in pstA cells until they enter the stalk tube (120a). They therefore act antagonistically to the putative DIF response region. These repressor elements are of central importance because they control the switch to stalk cell-differentiation.
VI. Intracellular Signaling and the Multiple Roles of cAMP-dependent Protein Kinase
A. Regulation of Stalk Cell-differentiation by cAMP-dependent Protein Kinase
Recent work suggests that the negative control region, which keeps the ecmB gene off until cells enter the stalk tube at culmination, may contain the binding site for a repressor under the ultimate control of the cAMP-dependent protein kinase (PKA). The effects of intracellular cAMP in eukaryotes are mediated by PKA. A rise in intracellular cAMP levels causes dissociation of the regulatory (R) and catalytic (C) subunits of PKA. Once dissociated, the C subunit migrates to the nucleus and activates gene expression (reviewed in 121). We believe that stalk cell-differentiation in Dictyostelium at culmination is induced by elevation of the intracellular cAMP concentration and activation of PKA. In the simplest model (Fig. 8), the catalytic subunit of PKA would phosphorylate a repressor protein, causing it to dissociate from
the negative elements. The evidence for this model comes from analysis of the effects of expressing a dominant inhibitor of PKA selectively in pstA cells. Rm is a mutant form of the D. discoideum R subunit of PKA, with point mutations in the two cAMP-binding sites that render it unable to bind cAMP (122). It can, however, bind to and inactivate the C subunit. In stable transformants containing the Rm gene fused to the promoter of the ecmA gene (ecmA-Rm), development appears relatively normal up to the slug stage (123). However, the slugs that are formed migrate almost indefinitely, under conditions in which control slugs enter culmination at a high frequency. They therefore phenocopy a class of mutants termed “sluggers” (124, 125). When ecmA-Rm slugs do eventually attempt to culminate, they stop migrating and rear up on end, but they remain indefinitely arrested in this upright position (123). They become blocked because the pstA cells are unable to initiate stalk-tube formation. As might be expected, they are also unable to activate stalk-specific gene expression. A lacZ reporter construct containing a subfragment of the ecmB gene that directs expression only within the stalk tube is inactive when introduced into the ecmA-Rm mutant (123). This effect is specific to the stalk-tube-specific region of the promoter. A construct containing the region that directs expression in the upper cup is expressed at normal levels in scattered cells within such structures.
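The repressor model for the stalk-specific region of the ecmB promoter (Fig. 8) can be summarized as a small Boolean sketch, together with the Rm experiments. It is a hypothetical abstraction, not a description of the actual molecular interactions. The names (`CellState`, `pka_active`, `ecmb_stalk_region_on`) are illustrative, DIF exposure is treated as simply on or off, and phosphorylation is assumed to inactivate the repressor completely. The sketch encodes four statements from the text:

- Either of the two repressor elements suffices to keep the gene off.
- PKA activation at high cAMP relieves repression.
- Rm prevents PKA activation.
- Deleting the negative region makes the gene behave like ecmA.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    """Hypothetical on/off description of a prestalk cell."""
    dif_exposed: bool            # DIF signal reaching the positively acting region
    camp_high: bool              # elevated intracellular cAMP (stalk-tube entrance)
    expresses_rm: bool = False   # dominant-negative R subunit (ecmA-Rm transformants)
    repressor_elements: int = 2  # intact repressor elements in the proximal region


def pka_active(cell: CellState) -> bool:
    # The C subunit is released only when cAMP binds R; Rm cannot bind cAMP,
    # so it keeps the C subunit inactive even when cAMP is high.
    return cell.camp_high and not cell.expresses_rm


def ecmb_stalk_region_on(cell: CellState) -> bool:
    # Either repressor element alone suffices to block expression, unless
    # PKA has phosphorylated (and thereby inactivated) the repressor protein.
    repressed = cell.repressor_elements > 0 and not pka_active(cell)
    return cell.dif_exposed and not repressed


examples = {
    "pstA cell in the slug": CellState(dif_exposed=True, camp_high=False),
    "pstA cell entering the stalk tube": CellState(dif_exposed=True, camp_high=True),
    "ecmA-Rm cell at culmination": CellState(dif_exposed=True, camp_high=True, expresses_rm=True),
    "negative region deleted, slug": CellState(dif_exposed=True, camp_high=False, repressor_elements=0),
}
for label, state in examples.items():
    print(f"{label}: {'on' if ecmb_stalk_region_on(state) else 'off'}")
```

In this sketch the construct is off in slug pstA cells and in ecmA-Rm cells, and on at the stalk-tube entrance or when the negative region is deleted. Removing only one of the two repressor elements leaves the gene off, mirroring their apparent redundancy.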
B. A Role for cAMP-dependent Protein Kinase in Regulating Morphogenetic Cell Movement during Culmination
When ecmA-Rm cells are mixed with an equal number of wild-type cells and allowed to develop, a fruiting body is formed that contains a bulbous protrusion at the prestalk-prespore boundary (123). This protrusion remains completely excluded from the stalk and comes to form a bolus of cells attached to the spore head of the mature culminant. It contains ecmA-Rm cells that stay frozen in situ while the wild-type cells move through them to form and enter the stalk tube. Thus inactivation of PKA has an effect on cell movement that becomes manifest before cells reach the entrance to the stalk tube. Perhaps, then, culmination is a two-step process. Cells throughout the prestalk zone would first perceive the inductive signal and change their pattern of cell movement, and stalk-specific gene expression would then be activated only at the entrance to the stalk tube. Testing such a hypothesis will require a better understanding of the signals that induce culmination. Such factors have been investigated using various in vitro assays.
Dictyostelium
MORPHOGENESIS
VII. Other Extracellular Signals Controlling Prestalk and Stalk Cell-differentiation
A. Ammonia
Several pieces of evidence suggest that the rise in cAMP that triggers culmination is brought about by a drop in ammonia levels. Ammonia is produced in large amounts as a result of the extensive catabolism of cellular components that occurs during development (126-128), and exposure to an enzymatic “cocktail” that utilizes ammonia induces migrating slugs to culminate (129). Presumably, under natural conditions, the rate of loss of ammonia increases when the slug, orienting upward toward the light, reaches the surface of the soil or leaf litter (130, 131). Ammonia inhibits the cAMP relay response, i.e., the stimulation of cAMP production by extracellular cAMP (132-134). Thus a drop in ammonia levels at culmination would be expected to trigger a rise in the intracellular cAMP concentration. A number of other studies provide supporting evidence for a role of ammonia in regulating stalk cell-differentiation. When assayed in vitro, ammonia and other weak bases inhibit stalk cell-differentiation (36, 135). Conversely, exposure of slugs to weak acids such as CO2 causes prestalk cells to differentiate into stalk cells in situ (136). The fact that treatment with ammonia elevates the pH of intracellular vesicles (137) supports the notion that ammonia functions by regulating vesicular pH (pHv) (135, 138). There is as yet no clear link between pHv and intracellular cAMP, but it has been suggested that changes in pHv may alter the cytosolic calcium concentration (138). Such changes in calcium concentration could, perhaps, modulate the activity of adenylate cyclase.
JEFFREY WILLIAMS AND ALASTAIR MORRISON
B. Extracellular cAMP
Extracellular cAMP-signaling is required for cells to progress through development, and, as determined using in vitro assays, expression of both the ecmA and the ecmB genes depends on a period of incubation with cAMP (66, 69, 139, 140). There is a DIF-binding protein that has some of the characteristics of a steroid receptor and that could, perhaps, be responsible for gene activation by DIF (120). Accumulation of this protein depends on prior incubation with cAMP, and this may explain the cAMP dependency of ecmA and ecmB gene expression. To a greater or lesser degree, ecmA gene expression is stimulated by extracellular cAMP in all studies reported to date (66, 141). Surprisingly, however, at times after tip formation, stalk cell formation is inhibited by the addition of cAMP (66, 69, 139). This is an unexpected result because of the considerable body of evidence, described above, showing that a rise in intracellular cAMP activates stalk cell-differentiation. During early development at least, an increase in extracellular cAMP triggers a rise in intracellular cAMP, so extracellular cAMP would be expected to stimulate intracellular cAMP accumulation and hence induce stalk cell formation (9).
The effect of extracellular cAMP on ecmB gene expression has varied from study to study. In some studies, expression was markedly inhibited (66, 139), whereas in others ecmB gene expression was stimulated (A. Morrison, unpublished) or was highly variable (142). Differences in the timing of the cAMP treatment in the various experimental protocols may explain the variability in results (140). The ecmB gene has a complex pattern of expression. It is activated during slug formation in scattered cells within the aggregate and is later activated within the stalk tube. The early- and late-expressing cell populations may differ in their responses to extracellular cAMP, so analysis of expression at the mRNA level, which averages over both populations, may give misleading results. The pDd26 gene is expressed only during culmination, and here the situation is quite clear: extracellular cAMP markedly inhibits expression (J. S. So and G. Weeks, personal communication; A. Morrison, unpublished).
This apparent paradox, that stalk cell-differentiation is inhibited by extracellular cAMP but requires elevated intracellular cAMP, may have a straightforward explanation. Recent results suggest that extracellular cAMP inhibits stalk cell-differentiation by affecting an intracellular-signaling component other than adenylate cyclase. The evidence for this derives from the observation that expression of the pDd26 gene remains cAMP-repressible in cells in which the catalytic subunit of the kinase is constitutively active (142a). There may, therefore, be two parallel pathways controlling stalk cell-differentiation: an ammonia-regulated pathway that operates by controlling intracellular cAMP, and a pathway regulated by extracellular cAMP that functions through an as-yet unknown intracellular-signaling mechanism.
VIII. Conclusions
The ecmA and ecmB genes have provided a number of new insights into the mechanisms regulating the morphogenesis of Dictyostelium. Rather than a simple binary decision between stalk cell-differentiation and spore cell-differentiation, it is now clear that cells face a series of differentiation decisions. There are multiple prestalk cell subtypes, and analysis of the promoter of a gene expressed selectively in prespore cells suggests that there may also be multiple prespore cell subtypes (143, 144). We believe that the most important choice a cell faces on the stalk cell pathway is whether it should differentiate as a pstA cell or as a pstO/ALC (Fig. 6). These two prestalk cell subtypes differ in several important aspects of their behavior, and we need to identify the biochemical basis for these differences. We also need to define the signaling conditions that direct cells into one or the other of these two pathways.
The other pressing need is to understand directed cell movement within the aggregate, both during slug formation and at culmination. The latter process is particularly complex, with at least three modes of cell movement occurring simultaneously: what directs the “reverse-fountain” movement of pstA cells, and why do some of the ALCs move upward and some move downward? Here the ecmA and ecmB genes show that there is a link between the state of cellular differentiation and the pattern of directed cell movement. The subpopulation of pstO/ALCs that fail to enter the stalk tube at culmination can express the ecmB gene via a specific subregion of its promoter (119). This gives us hope that, both by identifying other genes with this pattern of expression and by using the subfragment of the ecmB promoter to direct expression of dominant inhibitors within upper-cup cells, we may come to understand the cellular basis for their specific movement pattern. In summary, then, the construction of a Dictyostelium fruiting body is a more complex process than was previously believed, but it is far less complex than the construction of a fly or a mouse. There is, therefore, a very real hope of obtaining a detailed cellular and molecular understanding of an entire morphogenetic process.
ACKNOWLEDGMENTS
We thank Keith Jermyn and Gerry Weeks for their insightful comments on an earlier version of this review.
REFERENCES
1. K. B. Raper, J. Elisha Mitchell Sci. Soc. 59, 241 (1940).
2. J. T. Bonner, Am. J. Bot. 31, 175 (1944).
3. K. B. Raper and D. I. Fennell, Bull. Torrey Bot. Club 79, 25 (1952).
4. J. T. Bonner, Q. Rev. Biol. 32, 232 (1957).
5. F. O. Stenhouse and K. L. Williams, Dev. Biol. 59, 140 (1977).
6. K. A. Jermyn, M. Berks, R. R. Kay and J. G. Williams, Development 100, 745 (1987).
7. K. A. Jermyn, K. Duffy and J. G. Williams, Nature 340, 144 (1989).
8. K. A. Jermyn and J. G. Williams, Development 111, 779 (1991).
9. P. N. Devreotes, in “The Development of Dictyostelium discoideum” (W. F. Loomis, ed.), p. 117. Academic Press, New York, 1982.
10. J. G. Williams, C. J. Pears, K. A. Jermyn, D. M. Driscoll, H. Mahbubani and R. R. Kay, in “Symposium of the Society for General Microbiology” (I. Booth and C. Higgins, eds.), p. 277. Cambridge University Press, Cambridge, England, 1986.
11. G. Gerisch, ARB 56, 853 (1987).
12. P. Schaap, Differentiation 33, 1 (1986).
13. A. R. Kimmel and R. A. Firtel, Curr. Opin. Genet. Dev. 1, 383 (1991).
14. P. Klein, T. Sun, C. Saxe, A. R. Kimmel, R. Johnson and P. Devreotes, Science 241, 1467 (1988).
15. C. L. Saxe, R. L. Johnson, P. N. Devreotes and A. R. Kimmel, Dev. Genet. 12, 6 (1991).
16. C. L. Saxe, G. T. Ginsburg, J. M. Louis, R. L. Johnson, P. N. Devreotes and A. R. Kimmel, Genes Dev. 7, 262 (1993).
17. R. L. Johnson, C. L. Saxe, J. M. Louis, R. Gollop, A. R. Kimmel and P. N. Devreotes, Genes Dev. 7, 273 (1993).
18. M. Pupillo, A. Kumagai, G. Pitt, R. A. Firtel and P. N. Devreotes, PNAS 86, 4892 (1989).
19. J. Hadwiger, T. Wilkie, M. Strathmann and R. A. Firtel, PNAS 88, 8213 (1991).
20. J. A. Hadwiger and R. A. Firtel, Genes Dev. 6, 38 (1992).
21. P. C. Newell, G. N. Europe-Finner and N. V. Small, Microbiol. Sci. 4, 5 (1987).
22. P. Janssens and P. Van Haastert, Microbiol. Rev. 51, 396 (1987).
23. B. Snaar-Jagalska, F. Kesbeke and P. Van Haastert, Dev. Genet. 9, 215 (1988).
24. R. A. Firtel, P. J. M. van Haastert, A. R. Kimmel and P. N. Devreotes, Cell 58, 253 (1989).
25. R. A. Firtel, Trends Genet. 7, 381 (1991).
26. J. T. Bonner, W. W. Clarke, C. L. Neely and M. K. Slifkin, J. Cell. Comp. Physiol. 36, 149 (1950).
27. D. W. Francis, J. Cell. Comp. Physiol. 64, 131 (1964).
28. K. L. Poff and M. Skokut, PNAS 74, 2007 (1977).
29. J. Sternfeld and C. N. David, Dev. Biol. 93, 111 (1982).
30. J. Sternfeld and C. N. David, Differentiation 20, 10 (1981).
31. K. Devine and W. F. Loomis, Dev. Biol. 107, 364 (1985).
32. E. Sakai, Dev. Growth Differ. 15, 11 (1973).
33. J. Sampson, J. Embryol. Exp. Morphol. 36, 663 (1976).
34. C. D. Town, J. D. Gross and R. R. Kay, Nature 262, 717 (1976).
35. R. R. Kay, D. Garrod and R. Tilly, Nature 271, 58 (1978).
36. J. D. Gross, C. D. Town, J. J. Brookman, K. A. Jermyn, M. J. Peacey and R. R. Kay, Philos. Trans. R. Soc. London, B 295, 497 (1981).
37. R. R. Kay and K. A. Jermyn, Nature 303, 242 (1983).
38. H. R. Morris, G. R. Taylor, M. S. Masento, K. A. Jermyn and R. R. Kay, Nature 328, 811 (1987).
39. H. R. Hohl and S. T. Hamamoto, J. Ultrastruct. Res. 26, 442 (1969).
40. V. Muller and H. R. Hohl, Differentiation 1, 267 (1973).
41. M. C. Mehdy, D. Ratner and R. A. Firtel, Cell 32, 763 (1983).
42. B. C. A. Dowds and W. F. Loomis, BBRC 135, 336 (1986).
43. C. B. Hong and W. F. Loomis, BBA 950, 61 (1988).
44. K. L. Fosnaugh and W. F. Loomis, MCBiol 9, 5215 (1989).
45. K. L. Fosnaugh and W. F. Loomis, NARes 17, 9489 (1989).
46. J. H. Gregg, M. Krefft, A. Haas-Kraus and K. L. Williams, Exp. Cell Res. 142, 229 (1982).
47. M. Krefft, L. Voet, J. H. Gregg and K. L. Williams, J. Embryol. Exp. Morphol. 88, 15 (1985).
48. A. E. Early, J. G. Williams, H. E. Meyer, S. B. Por, E. Smith, K. L. Williams and A. A. Gooley, MCBiol 8, 3458 (1988).
49. E. Barklis and H. F. Lodish, Cell 32, 1139 (1983).
50. A. Tsang and J. M. Bradbury, Exp. Cell Res. 132, 433 (1981).
51. D. Ratner and W. Borth, Exp. Cell Res. 143, 1 (1983).
52. R. L. Chisholm, E. Barklis and H. F. Lodish, Nature 310, 67 (1984).
53. G. Mangiarotti, S. Chung, C. Zuker and H. F. Lodish, NARes 9, 947 (1981).
54. M. Oyama and D. D. Blumberg, J. Cell Biol. 99, A241 (1984).
55. S. Chung, S. M. Landfear, D. Blumberg, N. S. Cohen and H. F. Lodish, Cell 24, 785 (1981).
56. G. Mangiarotti, S. Bulfone, R. Giorda, P. Morandini, A. Ceccarelli and B. Hames, Development 106, 473 (1989).
57. M. Oyama and D. D. Blumberg, Dev. Biol. 117, 557 (1986).
58. A. J. Richards, A. J. Corney and B. D. Hames, Mol. Microbiol. 4, 1279 (1990).
59. J. Bonner, PNAS 65, 110 (1970).
60. R. R. Kay, B. Dhokia and K. A. Jermyn, EJB 136, 51 (1983).
61. H. Morris, M. Masento, G. Taylor, K. Jermyn and R. Kay, BJ 249, 903 (1988).
62. M. Masento, H. Morris, G. Taylor, S. Johnson, A. Skapski and R. Kay, BJ 256, 23 (1988).
63. J. J. Brookman, C. D. Town, K. Jermyn and R. R. Kay, Dev. Biol. 91, 191 (1982).
64. M. Masento, H. Morris, G. Taylor and R. Kay, Biomed. Environ. Mass Spectrom. 16, 353 (1988).
65. A. Sobolewski, N. Neave and G. Weeks, Differentiation 25, 93 (1983).
66. M. Berks and R. R. Kay, Development 110, 977 (1990).
67. W. Kopachik, A. Oohata, B. Dhokia, J. J. Brookman and R. R. Kay, Cell 33, 397 (1983).
68. W. Kopachik, B. Dhokia and R. R. Kay, Differentiation 28, 209 (1985).
69. M. Berks, D. Traynor, I. Carrin, R. H. Insall and R. R. Kay, Development, Suppl. 1, 131 (1991).
70. J. Williams, A. Ceccarelli, S. McRobbie, H. Mahbubani, R. Kay, A. Early, M. Berks and K. Jermyn, Cell 49, 185 (1987).
71. M. Tasaka, T. Noce and I. Takeuchi, PNAS 80, 5340 (1983).
72. C. D. Reymond, R. H. Gomer, M. C. Mehdy and R. A. Firtel, Cell 39, 141 (1984).
73. R. K. Esch and R. A. Firtel, Genes Dev. 5, 9 (1991).
74. S. Robbins, J. Williams, K. Jermyn, G. Spiegelman and G. Weeks, PNAS 86, 938 (1989).
75. S. J. McRobbie and A. Ceccarelli, NARes 16, 4738 (1988).
76. A. Ceccarelli, S. J. McRobbie, K. A. Jermyn, K. Duffy, A. Early and J. G. Williams, NARes 15, 7463 (1987).
77. B. M. Shaffer, J. Embryol. Exp. Morphol. 13, 97 (1965).
78. H. Freeze and W. Loomis, JBC 252, 820 (1977).
79. H. Freeze and W. F. Loomis, BBA 539, 529 (1978).
80. S. J. McRobbie, K. A. Jermyn, K. Duffy, K. Blight and J. Williams, Development 104, 275 (1988).
81. S. McRobbie, R. Tilly, K. Blight, A. Ceccarelli and J. Williams, Dev. Biol. 125, 59 (1988).
82. J. Morrissey, K. Devine and W. Loomis, Dev. Biol. 103, 414 (1984).
83. J. S. Wallace, J. H. Morrissey and P. C. Newell, Cell Differ. 14, 205 (1984).
84. A. Early, M. Gaskell, D. Traynor and J. Williams, Development 118, 353 (1993).
85. D. Francis and D. O’Day, J. Exp. Zool. 176, 265 (1971).
86. T. Kakutani and I. Takeuchi, Dev. Biol. 115, 439 (1986).
87. M. Gaskell, D. Watts, T. Treffry, K. A. Jermyn and J. G. Williams, Differentiation 51, 171 (1992).
88. B. Buhl and H. K. MacWilliams, Differentiation 45, 147 (1991).
89. A. Blaschke, C. Weijer and H. MacWilliams, Differentiation 32, 1 (1986).
90. J. Sternfeld, Wilhelm Roux’s Arch. Dev. Biol. 201, 354 (1992).
91. J. H. Gregg and R. W. Davis, Differentiation 21, 200 (1982).
92. I. Takeuchi, M. Hayashi and M. Tasaka, in “Development and Differentiation in the Cellular Slime Moulds” (P. Cappuccinelli and J. M. Ashworth, eds.), p. 1. Elsevier, New York, 1977.
93. A. J. Harwood, A. E. Early, K. Jermyn and J. G. Williams, Differentiation 46, 7 (1991).
94. R. H. Gomer, S. Gomer and R. A. Firtel, J. Cell Biol. 103, 1999 (1986).
95. J. G. Williams, K. T. Duffy, D. P. Lane, S. J. McRobbie, A. J. Harwood, D. Traynor and K. A. Jermyn, Cell 59, 1157 (1989).
96. R. H. Insall, O. Nayler and R. R. Kay, EMBO J. 11, 2849 (1992).
97. C. K. Leach, J. M. Ashworth and D. R. Garrod, J. Embryol. Exp. Morphol. 29, 647 (1973).
98. R. H. Gomer and R. A. Firtel, Science 237, 758 (1987).
99. S. McDonald and A. Durston, J. Cell Sci. 66, 195 (1984).
100. C. J. Weijer, G. Duschl and C. N. David, J. Cell Sci. 70, 133 (1984).
101. R. Gomer and R. Firtel, Science 237, 758 (1987).
102. T. Ohmori and Y. Maeda, Cell Differ. 22, 11 (1987).
103. Y. Maeda, T. Ohmori, T. Abe, F. Abe and A. Amagai, Differentiation 41, 169 (1989).
104. A. De Lozanne and J. A. Spudich, Science 236, 1086 (1987).
105. D. Knecht and W. Loomis, Dev. Biol. 128, 178 (1988).
106. D. Knecht and W. Loomis, Science 236, 1081 (1987).
107. W. Witke, M. Schleicher and A. A. Noegel, Cell 68, 53 (1992).
108. M. Faure, G. J. Podgorski, J. Franke and R. H. Kessin, PNAS 85, 8076 (1988).
109. D. Traynor, R. H. Kessin and J. G. Williams, PNAS 89, 8303 (1992).
110. A. J. Durston and F. Vork, J. Cell Sci. 35, 261 (1979).
111. S. Matsukuma and A. Durston, J. Embryol. Exp. Morphol. 50, 243 (1979).
112. J. D. Mee, C. Tortolo and M. B. Coukell, Biochem. Cell Biol. 64, 722 (1986).
113. M. Sussman and J. Schindler, Differentiation 10, 1 (1978).
114. K. D. Rand and M. Sussman, Differentiation 24, 88 (1983).
115. K. Gezelius and B. Ranby, Exp. Cell Res. 12, 265 (1957).
116. R. P. George, H. R. Hohl and K. B. Raper, J. Gen. Microbiol. 70, 477 (1972).
117. H. Hohl and J. Jehli, Arch. Microbiol. 92, 179 (1973).
118. W. F. Whittingham and K. B. Raper, PNAS 46, 642 (1960).
119. A. Ceccarelli, H. Mahbubani and J. G. Williams, Cell 65, 983 (1991).
120. R. Insall and R. R. Kay, EMBO J. 9, 3323 (1990).
120a. A. J. Harwood, A. Early and J. G. Williams, Development (1993).
121. K. A. W. Lee, Curr. Opin. Cell Biol. 3, 953 (1991).
122. A. J. Harwood, N. A. Hopper, M.-N. Simon, S. Bouzid, M. Veron and J. G. Williams, Dev. Biol. 149, 90 (1992).
123. A. J. Harwood, N. A. Hopper, M.-N. Simon, D. M. Driscoll, M. Veron and J. G. Williams, Cell 69, 615 (1992).
124. M. Sussman, J. Schindler and H. Kim, Exp. Cell Res. 116, 217 (1978).
125. P. C. Newell and F. M. Ross, J. Gen. Microbiol. 128, 1639 (1982).
126. J. Gregg, A. Hackney and J. Krivanek, Biol. Bull. 107, 226 (1954).
127. J. Walsh and B. Wright, J. Gen. Microbiol. 108, 57 (1978).
128. J. Wilson and C. Rutherford, J. Cell. Physiol. 94, 37 (1978).
129. J. Schindler and M. Sussman, JMB 116, 161 (1977).
130. J. Bonner, T. Davidowski, W. Hsu, D. Lapeyrolerie and H. Suthers, Differentiation 21, 123 (1982).
131. J. Bonner, H. Suthers and G. Odell, Nature 323, 630 (1986).
132. J. Schindler and M. Sussman, BBRC 79, 611 (1977).
133. J. Schindler and M. Sussman, Dev. Genet. 1, 13 (1979).
134. G. B. Williams, E. M. Elder and M. Sussman, Differentiation 31, 92 (1986).
135. J. D. Gross, J. Bradbury, R. R. Kay and M. Peacey, Nature 303, 244 (1983).
136. K. Inouye, Development 104, 669 (1988).
137. A. Yamamoto and I. Takeuchi, Differentiation 24, 83 (1983).
138. J. Gross, M. Peacey and R. P. Von Strandmann, Differentiation 38, 91 (1988).
139. M. Berks and R. R. Kay, Dev. Biol. 125, 108 (1988).
140. Y. Yamada and K. Okamoto, Dev. Biol. 149, 235 (1992).
141. L. Kwong and G. Weeks, Differentiation 44, 88 (1990).
142. J. S. So and G. Weeks, Differentiation 51, 73 (1992).
142a. N. Hopper, C. Anjard, C. Reymond and J. G. Williams, EMBO J. (1993).
143. L. Haberstroh and R. A. Firtel, Genes Dev. 4, 596 (1990).
144. L. Haberstroh, J. Galindo and R. A. Firtel, Development 113, 947 (1991).
Collagen Genes: Mutations Affecting Collagen Structure and Expression
WILLIAM G. COLE
Division of Orthopaedics
The Hospital for Sick Children
Toronto, Ontario, Canada M5G 1X8
VII. Type-IX Collagen Genes
VIII. Type-X Collagen Gene
IX. Summary
References
I. The Collagens
The collagens are the major structural proteins of the extracellular matrix. Their importance is most obvious in connective tissues such as bone, cartilage, ligament, tendon, dermis, and dentin, where they provide a highly organized fibrous matrix. The collagens also provide structural integrity to the capsules, septa, and laminae of the gastrointestinal, cardiovascular, urogenital, respiratory, and nervous systems. Basement-membrane collagens have important structural and physiological functions in the kidneys, lungs, and other tissues. The collagens of the eyes and ears also have specialized structural and physiological roles.
The collagens are a complex family of secreted molecules that share similar triple-helical motifs and have a structural role in the extracellular matrix. There are at least 26 different collagen polypeptide chains encoded by 26 unique collagen genes (1). Acetylcholinesterase, lung-surfactant protein, complement C1q, and conglutinin also contain collagenous (COL)¹ triple-helical domains, but they are not classified as collagens because they do not contribute to the structure of the extracellular matrix (2).
Roman numerals are used to identify each of the collagens, and arabic numerals represent individual polypeptide chains, referred to as α-chains (Table I). The collagens can be grouped into categories according to their α-chain characteristics, molecular assembly, and supramolecular structures (Table I). The most widespread and abundant collagens are the fibrillar collagens, types I, II, III, V, and XI (reviewed in 2). Type-IV collagen of basement membranes forms complex sheet structures; type-VIII and probably type-X collagens form complex hexagonal lattices; type-VI collagen forms beaded microfibrils; type-VII collagen forms anchoring fibrils; and collagens types IX, XII, and XIV are fibril-associated collagens with interrupted triple helices (FACIT). In addition to these levels of diversity, the collagens are also distributed differently throughout the extracellular matrices (Table I).
Rapid progress continues to be made in characterizing the genes that encode the unique collagen α-chains (Table II). The gene symbols indicate the type of collagen and the α-chain. For example, in the symbol COL1A1, COL1 indicates type-I collagen and A1 indicates the α1(I) chain of this collagen. The genes are widely dispersed in the genome, but some of them are clustered together, such as COL4A1 and COL4A2 on chromosome 13q34 and COL6A1 and COL6A2 on chromosome 21q22.3 (Table II).
This review focuses on the collagen genes in which naturally occurring or induced mutations have been characterized. It includes the genes that encode collagens types I, II, III, IV, VII, IX, and X. Each of the following sections includes a description of the normal collagen and its gene(s), as well as a description of its naturally occurring and induced mutations. Descriptions of other collagen genes and of the genes that encode the posttranslational modifying enzymes are available elsewhere (reviewed in 1, 3).
¹ Abbreviations: COL, collagen(ous); bp, base pairs; kb, kilobases; OI, osteogenesis imperfecta; EDS, Ehlers-Danlos syndrome; NC, noncollagenous; GAG, glycosaminoglycan; RFLP, restriction-fragment-length polymorphism; IF, inhibitory factor; TGF, transforming growth factor; FACIT, fibril-associated collagen with interrupted triple helices; Mov-13, mouse with integration of Moloney leukemia virus into intron 1 of the COL1A1 gene; CBF, CCAAT-binding protein; PCR, polymerase chain reaction; SED, spondyloepiphyseal dysplasia; LC, long-chain collagen.
Progress in Nucleic Acid Research and Molecular Biology. Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
WILLIAM G. COLE
COLLAGEN GENES AND THEIR MUTATIONS
TABLE I
THE TYPES OF COLLAGENS
Collagen type | Molecular form | Tissue distribution | Type of aggregate
I | [α1(I)]2α2(I) | Skin, tendon, bone, cornea, and most of the connective tissues | Quarter-staggered fibrils
I trimer | [α1(I)]3 | Embryonic tissues | Quarter-staggered fibrils
II | [α1(II)]3 | Hyaline cartilage, intervertebral disc, vitreous humor | Quarter-staggered fibrils
III | [α1(III)]3 | With type I in skin, visceral, cardiovascular tissues, placenta | Quarter-staggered fibrils
IV | [α1(IV)]2α2(IV); α3(IV), α4(IV), α5(IV)? | Basal membranes | Sheets
V | [α1(V)]2α2(V); α1(V)α2(V)α3(V); [α1(V)]3 | Ubiquitous | Quarter-staggered fibrils
VI | α1(VI)α2(VI)α3(VI) | Aorta intima, placenta, uterus, skin, cartilage, nucleus pulposus, lung, liver | Beaded filaments
VII | [α1(VII)]3 | Dermo-epidermal junction | Anchoring fibrils
VIII | [α1(VIII)]3 | Endothelial cell culture, Descemet membrane | Sheets
IX | α1(IX)α2(IX)α3(IX) | Cartilage, intervertebral disc, vitreous humor | FACIT
X | [α1(X)]3 | Hypertrophic zone of growth plate | Sheets
XI | α1(XI)α2(XI)α3(XI) | Cartilage | Quarter-staggered fibrils
XII | [α1(XII)]3 | Tendon, ligament, skin | FACIT
XIII | | Skin fibroblast culture | Described from cDNA
XIV | [α1(XIV)]3 | Tendon, skin | FACIT
II. Type-I Collagen Genes
A. Normal Type-I Collagen Genes
Type-I collagen, the classical fibrous collagen, is the most abundant and widespread collagen. Each molecule contains two α1(I) chains and one α2(I) chain and forms a semiflexible rod, 300 nm long and 1.4 nm wide, that spontaneously assembles into microfibrils. The molecules overlap by a quarter of their lengths, thereby producing the characteristic 67-nm periodicity of collagen fibrils (4). The characteristic properties of type-I collagen are largely due to the triple-helical conformation of approximately 95% of the molecule (5). The triple helix is formed from an uninterrupted sequence of 338 Gly-X-Y triplets, in which X is frequently proline and Y is frequently hydroxyproline. Formation of the triple helix depends on glycine, the smallest amino acid, occupying every third position, and the helix is stabilized by hydroxyproline residues in the Y position. The type-I collagen molecule
TABLE II
CHROMOSOMAL LOCALIZATION OF COLLAGEN GENES
Collagen gene
Chromosome
Ref.
1A1 1A2 2A1 3A1 4A1 4A2 4A3 4A4 4A5 5A1 5A2 6A1 6A2 6A3 7A1 8A1 8A2 9A1 10A1 11A1 11A2 13A1 15A1 16A1
17q21.3-q22 7q21.3-q22.1 12q12-q13.1 2q31-q32.2 13q3.1 13q3-4 2q36-q37 2q3.5337 xq22 9q34.2-q34.3 2q32-q33 21q22.3 21qq22.3 2q3i 3pZ1.1 3 q l l .1 4 3 . 2 lq34.3-~32.3 6q 12-ql4 r-2 1 -q22 lp21 6~21.3 10q22 9q21q22 1P3.5-N
254; 255; 256; 256; 257; 257; 194, 258; 258; 198, 203; 259; 260, 261; 262-264; 262-264; 263, 264; 213, 265; 266; 267; 268; 248, 249; 269; 270, 271; 272-274; 275; 276
also contains amino- and carboxy-terminal nonhelical extensions called telopeptides, which form intramolecular and intermolecular covalent cross-linkages that stabilize the polymer. The type-I collagen chains are synthesized as pre-pro-α-chains, from which the signal peptides of 22 residues are rapidly removed to yield the pro-α-chains (Table III). Each type-I procollagen molecule forms by a process of association, registration, and disulfide bonding of the carboxy propeptides of two pro-α1(I) chains and one pro-α2(I) chain (5). These processes ensure the correct alignment of the amino-acid residues required for the formation and propagation of the triple helix from the carboxy terminus of the helical domain. The pro-α-chains also undergo extensive co- and posttranslational modifications. Approximately 100 proline residues in the Y position of Gly-X-Y triplets are hydroxylated to 4-hydroxyproline, whereas a few specific lysine residues are hydroxylated to hydroxylysine, which may then be glycosylated to produce galactosyl- or glucosylgalactosyl-hydroxylysine. These enzymatic modifications occur while the chains are in a nascent state and cease when the helix forms. An N-linked oligosaccharide is also added to the carboxy propeptides. The extensively modified procollagen is secreted, and the amino and carboxy propeptides are then removed by specific proteinases to yield type-I collagen, which self-assembles into fibrils. The final enzymatic modification is the oxidative deamination of specific lysine residues, which is essential for the formation of normal intra- and intermolecular cross-linkages.
TABLE III
DOMAIN STRUCTURES OF PROCOLLAGENS TYPES I, II, AND III(a)
Domain | Pro-α1(I) | Pro-α2(I) | Pro-α1(II) | Pro-α1(III)
Signal peptide | 22 | 22 | 21, 23, or 25 | 24
N-Propeptide | 139 | 57 | 91, 89, or 87 | 129
N-Telopeptide | 17 | 11 | 19 | 14
Triple helix | 1014 | 1014 | 1014 | 1029
C-Telopeptide | 26 | 15 | 27 | 25
C-Propeptide | 246 | 247 | 246 | 246
(a) Number of amino-acid residues. Data are from 103, 118, 170, and 277.
The COL1A1 gene, encoding the α1(I) chain, and the COL1A2 gene, encoding the α2(I) chain, are single-copy genes located on chromosomes 17q21.3-q22 and 7q21.3-q22, respectively (Table II). They contain 52 exons, of which exons 7 to 48 encode the main triple-helical domain of 1014 amino acids and 338 Gly-X-Y repeats (6, 7). The exon sizes in this domain are identical in the two genes, except that exons 33 and 34 are fused in COL1A1 and are referred to as exon 33/34 (3). The exons encoding the triple-helical domain are predominantly 54 bp, multiples of 54 bp, or combinations of 45 and 54 bp (Table IV). In each case, the exons begin with a complete glycine codon and encode complete Gly-X-Y triplets. The first six exons encode the pre-propeptides, whereas the last four exons encode the carboxy telopeptides and carboxy propeptides. Exon 52 of the COL1A1 and COL1A2 genes contains two and four polyadenylation sites, respectively. These exons, and particularly the junctional exons 6 and 48, have sizes that differ markedly from those encoding the triple-helical domain of the protein. Most of the exons encoding the pre-propeptide and carboxy propeptide of the pre-pro-α1(I) chain are larger than the equivalent exons of the COL1A2 gene. Despite these differences, the 38-kb COL1A2 gene is much larger than the 18-kb COL1A1 gene, because of the larger introns in COL1A2.
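The exon arithmetic described above can be checked mechanically. The following is a minimal Python sketch, assuming the α2(I) exon sizes listed in Table IV (exons 7 to 48); the names EXON_SIZES, triplets, and as_45_54 are illustrative only. It confirms that each exon is a whole number of Gly-X-Y triplets (a multiple of 9 bp), expresses each size as a combination of 45-bp and 54-bp units, and totals the encoded residues.

```python
from typing import Dict, Optional, Tuple

# Exon sizes (bp) for exons 7-48 of the alpha2(I) triple-helical domain,
# transcribed from Table IV (these exons encode helical residues 4-999).
EXON_SIZES: Dict[int, int] = {
    7: 45, 8: 54, 9: 54, 10: 54, 11: 54, 12: 54, 13: 45, 14: 54,
    15: 45, 16: 54, 17: 99, 18: 45, 19: 99, 20: 54, 21: 108, 22: 54,
    23: 99, 24: 54, 25: 99, 26: 54, 27: 54, 28: 54, 29: 54, 30: 45,
    31: 99, 32: 108, 33: 54, 34: 54, 35: 54, 36: 54, 37: 108, 38: 54,
    39: 54, 40: 162, 41: 108, 42: 108, 43: 54, 44: 108, 45: 54,
    46: 108, 47: 54, 48: 108,
}

BP_PER_TRIPLET = 9  # one Gly-X-Y triplet = 3 codons = 9 bp


def triplets(size_bp: int) -> int:
    """Number of complete Gly-X-Y triplets encoded by an exon of size_bp."""
    if size_bp % BP_PER_TRIPLET:
        raise ValueError(f"{size_bp} bp is not a whole number of Gly-X-Y triplets")
    return size_bp // BP_PER_TRIPLET


def as_45_54(size_bp: int) -> Optional[Tuple[int, int]]:
    """Return (a, b) with a*45 + b*54 == size_bp, using as few 45-bp units as possible."""
    for a in range(size_bp // 45 + 1):
        remainder = size_bp - 45 * a
        if remainder % 54 == 0:
            return a, remainder // 54
    return None


if __name__ == "__main__":
    for exon, size in EXON_SIZES.items():
        a, b = as_45_54(size)
        print(f"exon {exon:2d}: {size:3d} bp = {a} x 45 + {b} x 54 -> {triplets(size)} triplets")
    total_bp = sum(EXON_SIZES.values())
    total_triplets = total_bp // BP_PER_TRIPLET
    print(total_bp, total_triplets, 3 * total_triplets)  # 2988 332 996
```

Every exon in the table resolves to whole triplets and to a 45/54 combination; together the 42 exons encode 332 triplets, or 996 residues, which matches the span of residues 4 to 999 listed in Table IV.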
The expression of the COL1A1 and COL1A2 genes is coordinately regulated in many physiological and pathological circumstances (8). The mechanisms involved in coordinate regulation are poorly understood, but both genes share some similar regulatory motifs, which have been more thoroughly studied in COL1A1 (8). Studies of the transcriptional activities of the COL1A1 gene have revealed functional promoters in the 5' flanking region and transcriptional activator and suppressor elements within the first intron and within the 3' untranslated region (8-12). The embryonic lethal mutation in the homozygous Mov-13 mouse results from insertion of the Moloney murine leukemia virus into the first intron of COL1A1 (13). The viral insertion causes changes in the methylation pattern and chromatin conformation of the mutant gene, inactivating its transcription (14, 15). The first intron of human COL1A1 also contains cis-regulatory sequences that appear to interact, in transient transfection systems, with 5'-promoter sequences to regulate the transcription of COL1A1 (16-18). In contrast, at least 85% of the intron 1 sequences are not essential for expression of the human gene in stably transfected mouse cells (12).
TABLE IV
EXONS ENCODING THE TRIPLE-HELICAL DOMAIN OF α2(I) CHAINS OF TYPE-I PROCOLLAGEN(a)
Exon number | Exon size (bp) | Amino acids encoded
7 | 45 | 4-18
8 | 54 | 19-36
9 | 54 | 37-54
10 | 54 | 55-72
11 | 54 | 73-90
12 | 54 | 91-108
13 | 45 | 109-123
14 | 54 | 124-141
15 | 45 | 142-156
16 | 54 | 157-174
17 | 99 | 175-207
18 | 45 | 208-222
19 | 99 | 223-255
20 | 54 | 256-273
21 | 108 | 274-309
22 | 54 | 310-327
23 | 99 | 328-360
24 | 54 | 361-378
25 | 99 | 379-411
26 | 54 | 412-429
27 | 54 | 430-447
28 | 54 | 448-465
29 | 54 | 466-483
30 | 45 | 484-498
31 | 99 | 499-532
32 | 108 | 533-567
33 | 54 | 568-585
34 | 54 | 586-603
35 | 54 | 604-621
36 | 54 | 622-639
37 | 108 | 640-675
38 | 54 | 676-693
39 | 54 | 694-711
40 | 162 | 712-765
41 | 108 | 766-801
42 | 108 | 802-837
43 | 54 | 838-855
44 | 108 | 856-891
45 | 54 | 892-909
46 | 108 | 910-945
47 | 54 | 946-963
48 | 108 | 964-999
(a) Data are from 6, 7, and 103.
The human, mouse, and rat 5'-promoter regions of COL1A1 are similar (19). Of particular note are two CCAAT boxes, of which the proximal one interacts with a heterodimeric CCAAT-binding protein (CBF) to activate COL1A1 transcription (8). In an in vitro transcription system, point mutations in the CCAAT motif that abolished or decreased the binding of CBF correspondingly abolished or reduced transcriptional activation by this trans-acting factor (20). The transcriptional inhibitory motifs, IF1 and IF2, are also near the CCAAT motif of COL1A1 (8). The upstream elements also contain a binding site for nuclear factor 1 and a related TGF-β activating element (21). In the rat COL1A1, an additional TGF-β activating element has been located upstream at -1627, and one or more stimulatory elements, which are preferentially active in osteoblastic cells, have been located between -3521 and -2295 (22, 23). The 3' untranslated region of COL1A1 also contains numerous putative gene regulatory motifs, and cell-specific binding of nuclear proteins to two highly conserved domains flanking the two polyadenylation sites has been demonstrated (11). Sequences in the 3' untranslated region of α1(I) mRNA that might affect its stability by altering its secondary structure or protein binding have been identified (11). A highly conserved inverted repeat sequence of approximately 50 nucleotides in the 5' untranslated region of α1(I) mRNA was proposed to form a stem-loop structure that might alter the availability of the critical AUG sequence for translation (24). However, deletion of the sequences required for stem-loop formation did not affect the efficiency of translation of transcripts from a fusion gene in either transiently or stably transfected cells. Also, DNase I protection experiments did not reveal binding of any protein to the sequences found in the inverted repeat (25).
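As an illustration of how candidate CCAAT boxes in a 5' flanking region might be located, here is a minimal Python sketch that scans both strands for the core CCAAT pentamer. The function names and the example sequence are hypothetical: the toy string is made up and is not the COL1A1 promoter, and an exact pentamer match is far less specific than the CBF-binding sites characterized experimentally (20).

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "CCAAT") -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact match of motif.

    A '-' hit means the reverse complement of the motif (ATTGG for CCAAT)
    starts at that position of the given strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits: List[Tuple[int, str]] = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    toy_promoter = "GGCCAATGGTTAATTGGCA"  # made-up sequence, not COL1A1
    print(find_motif(toy_promoter))  # [(2, '+'), (12, '-')]
```

Scanning both strands matters because a CCAAT box can be read in either orientation relative to the transcription start; a realistic analysis would add flanking-sequence preferences rather than relying on the bare pentamer.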
B. Naturally Occurring Mutations of the Type-I Collagen Genes
Many workers have characterized mutations of the type-I collagen genes in the heritable connective-tissue diseases osteogenesis imperfecta (OI) and Ehlers-Danlos syndrome (EDS). Fortunately, both diseases are expressed in the dermis, so the abnormal type-I collagen protein and mRNA, as well as genomic DNA, are available for characterizing the mutations. The various categories of mutations and their consequences are described in the following sections.
1. MUTATIONS OF TYPE-I COLLAGEN GENES IN OSTEOGENESIS IMPERFECTA
This disorder affects the type-I collagen-containing tissues and produces bone fragility, dentinogenesis imperfecta, blue sclerae, premature deafness, and joint laxity (26, 27). It is clinically heterogeneous, with a mild form (type I), a lethal form (type II), a progressively deforming severe form (type III), and a moderately severe form (type IV) (28). Most patients have heterozygous mutations of the type-I collagen genes, either inherited as an autosomal dominant trait or arising sporadically. Some patients with the type-III form have autosomal recessive defects of these genes (26).
WILLIAM G. COLE
a. Biochemical Classification of Osteogenesis Imperfecta. Two broad categories of biochemical defects of type-I collagen have been identified. The first category includes dominant-negative mutations of the COL1A1 or COL1A2 genes. Approximately 100 such mutations have been characterized to date and have been tabulated (29). The second category includes null mutations, usually of the COL1A1 gene, in which the mutant allele fails to produce a pro-α-chain or produces a functionless pro-α-chain (30, 31). The dominant-negative mutations produce abnormalities in the sequence of the helical domain, carboxy telopeptide, or carboxy propeptide of the pro-α-chains. The most common are missense mutations that alter Gly codons in Gly-X-Y triplets within the triple helix. Less common are point mutations that produce splicing errors, with loss of an exon or insertion of intronic sequences, and least common are deletions and insertions involving single or multiple exons (26). In contrast, little is known about the molecular basis of the null mutations. The dominant-negative and the null mutations are outlined in the following sections.
b. Helical Glycine Substitutions. Point mutations of the first two nucleotides of the GGN codon for glycine would be expected to produce substitutions by eight amino acids: alanine, arginine, aspartic acid, cysteine, glutamic acid, serine, valine, and tryptophan. Substitutions by cysteine are the most common, whereas substitutions by alanine are rare and substitutions by glutamic acid and tryptophan have yet to be seen (26, 29). There is a strong preference for mutations of the first rather than the second base of the glycine codon. There do not appear to be any obvious “hot spots” for point mutations, as glycine substitutions have been observed along both α-chains. Most of the mutations are unique to the affected individual or family members. There are three examples of different substitutions at the same glycine residue: glycine 1009 by serine or valine, glycine 832 by aspartic acid or serine, and glycine 415 by cysteine or serine (32-35). However, two unrelated infants with severe osteogenesis imperfecta had new dominant mutations of glycine 154 to arginine in the α1(I) chain (36). Another two unrelated infants with lethal
perinatal OI had new dominant mutations of glycine 1003 to serine in the α1(I) chain (36). Both of these recurrent mutations occurred in CpG dinucleotides and are consistent with deamination of a methylcytosine to produce thymine. There may be a marginal preference for mutations in CpG dinucleotides, which are frequent targets for mutations in other genes (37). Substitution of a glycine residue in a Gly-X-Y triplet does not alter the length of the chain, but it does alter the formation and integrity of the triple helix. Numerous reports have provided data consistent with a model in which the carboxy propeptides align and disulfide-bond in the normal manner. Formation of the triple helix then begins at the carboxy terminus of the α-chains and proceeds at a normal rate until it encounters the glycine substitution (38-40). Further propagation of the helix is delayed, so the amino-terminal portions of the chains remain in the nascent state for an abnormally long time; consequently, more lysine residues are hydroxylated and more hydroxylysine residues are glycosylated. Type-I collagen molecules that contain one or two mutant α-chains are often unstable and susceptible to intracellular degradation (38). In addition, the mutant molecules are secreted poorly, often to such an extent that the rough endoplasmic reticulum is grossly distended. These findings indicate that a significant proportion of the normal α-chains are destroyed in type-I collagen molecules that contain one or two mutant α-chains. For example, a heterozygous mutation of the COL1A1 gene that alters half of the pro-α1(I) chains synthesized by OI fibroblasts inactivates half of the normal pro-α1(I) chains and half of the normal pro-α2(I) chains produced by the same cells. The consequence of these events is a 75% reduction in the amount of normal type-I collagen that is available to form the extracellular matrix.
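The eight possible glycine replacements listed at the start of subsection b follow directly from the standard genetic code. The Python sketch below is illustrative only, and the function name is hypothetical. It enumerates every single-base change at the first or second position of the four GGN codons (coding-strand DNA notation). It groups the results by codon position and excludes the one nonsense change, GGA to TGA.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def glycine_point_substitutions():
    """Return {codon position: amino acids reachable by one base change}
    for positions 1 and 2 of any glycine (GGN) codon, stop codons excluded."""
    reachable = {1: set(), 2: set()}
    for third in BASES:
        codon = "GG" + third
        for pos in (0, 1):
            for alt in BASES:
                if alt == codon[pos]:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                aa = CODON_TABLE[mutant]
                if aa != "*":  # GGA -> TGA is nonsense, not a substitution
                    reachable[pos + 1].add(aa)
    return reachable


result = glycine_point_substitutions()
print(sorted(result[1]))  # ['C', 'R', 'S', 'W']  (first-base changes)
print(sorted(result[2]))  # ['A', 'D', 'E', 'V']  (second-base changes)
```

Arginine, serine, cysteine, and tryptophan can arise only from first-base changes. Alanine, aspartic acid, glutamic acid, and valine can arise only from second-base changes.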
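The 75% reduction described for a heterozygous COL1A1 mutation can be reproduced with a simple binomial model. The Python sketch below is a minimal illustration under stated assumptions, and its function name is hypothetical. It assumes that each type-I molecule takes its two pro-α1(I) chains at random from a pool in which a given fraction are mutant. It also assumes that the single pro-α2(I) chain is normal and that every molecule containing at least one mutant chain is lost. It ignores partial secretion of mutant molecules.

```python
from fractions import Fraction
from math import comb


def procollagen_suicide(mutant_fraction):
    """Random-assembly sketch of a heterozygous COL1A1 defect (hypothetical helper).

    mutant_fraction: fraction of pro-alpha1(I) chains that are mutant.
    Assumes any molecule with >= 1 mutant alpha1 chain is degraded.
    """
    p = Fraction(mutant_fraction)
    # probability that a molecule carries k mutant alpha1 chains, k = 0, 1, 2
    k_mutant = [comb(2, k) * p**k * (1 - p) ** (2 - k) for k in range(3)]
    # normal alpha1 chains per molecule: two in fully normal molecules, one when k = 1
    normal_alpha1 = 2 * k_mutant[0] + k_mutant[1]
    return {
        "normal_molecules": k_mutant[0],
        "reduction": 1 - k_mutant[0],
        # share of normal alpha1 chains trapped in molecules with a mutant partner
        "normal_alpha1_lost": k_mutant[1] / normal_alpha1 if normal_alpha1 else Fraction(0),
    }


outcome = procollagen_suicide(Fraction(1, 2))
print(outcome["reduction"])           # 3/4
print(outcome["normal_alpha1_lost"])  # 1/2
```

With half of the pro-α1(I) chains mutant, only (1/2)^2 = 1/4 of the molecules are fully normal. Calling `procollagen_suicide(Fraction(1, 10))` or `procollagen_suicide(Fraction(3, 10))` gives the reductions that random assembly alone would predict (19% and 51%) for the mutant mRNA levels reported in the transgenic fetuses of Section II,C,1.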
This loss of normal chains in degraded mutant molecules, which is typical of dominant-negative mutations, has been called “procollagen suicide” (41). Glycine substitutions often reduce the thermal stability of the triple helix (42). However, some glycine substitutions, such as the substitutions of glycines 844 and 631 by serine in the α1(I) chain, do not show this effect (43, 44). These findings indicate that the model that treats the collagen helix as a single functional unit in helix formation and stabilization is too simplistic; a more detailed model must consider the helix as being composed of subdomains of differing stabilities. One such model considers these domains as cooperative folding blocks that result in nonlinear helix formation and stabilization (42, 44). The position of a glycine mutation within these sequence subdomains would be expected to have a critical role in determining the effect of impaired helix propagation on helix destabilization (42). The possible effects of the local sequences in determining the effect of a mutation on helical structure by providing domains of relatively low or high
local helix stability have been explored (45). Mutations in regions of low local stability would be less deleterious to helical stability. Further, according to this model, if sequences amino-terminal to the mutation are of high stability, renucleation of helix formation could occur readily with minimal interruption of helix propagation. Some of these points are illustrated by the differing effects of glycine-to-serine substitutions in the α1(I) chain. The nonlethal substitutions of glycines 832 and 844 lie in a region of low stability with a region of higher stability immediately amino-terminal. Thus, disturbance of helix folding caused by these mutations may be rescued by relatively easy renucleation of triple-helix folding beyond the mutations. In contrast, the more amino-terminal substitutions of glycines 565, 598, and 631 by serine lack such amino-terminal renucleation sites. Different glycine substitutions may also produce different effects on helix stability (46). Decreased thermal stability is a feature of collagens with glycine substitutions by arginine, cysteine, and aspartic acid (29). In a study of the structural effects associated with a substitution of cysteine for glycine 988 in the α1(I) chain, Traub and Steinmann (47) examined space-filling molecular models of the predicted structure of the mutant collagen. They found that a glycine-to-cysteine mutation caused only a limited local disturbance of the helix, possibly due to the introduction of disulfide bonds into the helix. Changes of glycines 391 and 667 to arginine in the α1(I) chain are also associated with decreased thermal stability of type-I collagen molecules (48, 49). Computerized molecular-modeling studies indicate that these glycine-to-arginine substitutions lead to only a minor localized disruption of the triple helix (50). These results agree with the space-filling models of the substitution of cysteine for glycine 988 in the α1(I) chain.
It is likely, in these instances, that the glycine substitutions have a kinetic effect on collagen folding that leads to overmodification of the collagen α-chains, rather than causing a major disruption of collagen folding. In OI fibroblast cultures, some of the mutant molecules escape intracellular degradation and are secreted into the extracellular matrix, where additional adverse effects occur (38). Delayed removal of the amino propeptide in the conversion of procollagen to collagen has been observed as a secondary effect of some glycine substitutions, even when the substitution is distant from the enzyme cleavage site (51, 52). Abnormal fibril formation has also been noted in a patient with the substitution of glycine 748 by cysteine in the α1(I) chain (53). Rotary-shadowing electron microscopy shows that some of the procollagen molecules have a flexible kink at the site of the cysteine substitution (54). Further in vitro studies showed that the mutant and normal type-I collagens synthesized by the proband's fibroblasts are similar enough in structure to copolymerize into the same fibrils. Because of this copolymerization, fibril morphology is distorted, fibril formation is delayed, and the net amount of collagen incorporated into fibrils is decreased (53). Glycine substitutions within the triple helix produce mild, moderate, and lethal perinatal forms of OI. The most important factors that determine the severity of the phenotype appear to be the type of substitution, the type of chain involved, the position of the substitution within the α-chain, and the surrounding sequences (26, 55). These relationships have been best studied in patients with glycine-to-cysteine substitutions in the α1(I) chain (56). With the exception of the substitution at position 244, there is a smooth gradation of severity of phenotype from the carboxy-terminal to the amino-terminal substitutions (26). The reason for the lethal phenotype in the infant with the substitution at position 244 is unclear, but it may involve unrecognized influences of surrounding sequences or epigenetic factors. If this case is excluded, the transition region between the lethal and severe phenotypes occurs between cysteine substitutions of glycines 610 and 691. For the severe and moderate phenotypes, it occurs between glycines 415 and 382, and for the moderate and mild phenotypes, it occurs between glycines 175 and 94. In contrast, the transition points are more amino-terminal for substitutions of glycine by aspartic acid, arginine, and valine in the α1(I) chain. For arginine substitutions, the transition zone separating lethal and severe phenotypes occurs between glycines 298 and 154, and the zone separating severe and mild phenotypes occurs between glycines 154 and 85 (57). The transition points for glycine substitutions by valine or aspartic acid are not known, but are probably amino-terminal to glycine 256 and glycine 97, respectively (58). The relationship between the sites of glycine-to-serine substitutions in the α1(I) chain and the severity of the clinical phenotype is more variable than with other substitutions.
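The regional model for cysteine and arginine substitutions in the α1(I) chain can be written as a small lookup table. The Python sketch below is hypothetical and is not a published predictor. The table holds only the boundaries stated in the text. Positions that fall inside a transition zone are reported as uncertain. Serine, valine, and aspartic acid substitutions are omitted because their boundaries are variable or unknown.

```python
# Transition zones transcribed from the text for glycine substitutions in
# the alpha1(I) chain. Each entry: (last position with the milder phenotype,
# first position with the more severe phenotype, milder, more severe).
TRANSITIONS = {
    "Cys": [(94, 175, "mild", "moderate"),
            (382, 415, "moderate", "severe"),
            (610, 691, "severe", "lethal")],
    "Arg": [(85, 154, "mild", "severe"),
            (154, 298, "severe", "lethal")],
}


def predicted_phenotype(substitution, glycine_position):
    """Naive regional prediction (hypothetical helper); severity rises toward
    the carboxy terminus, i.e. with increasing glycine position."""
    zones = TRANSITIONS[substitution]
    for milder_end, severe_start, milder, more_severe in zones:
        if glycine_position <= milder_end:
            return milder
        if glycine_position < severe_start:
            return f"uncertain ({milder}/{more_severe})"
    return zones[-1][3]


print(predicted_phenotype("Cys", 650))  # uncertain (severe/lethal)
print(predicted_phenotype("Arg", 154))  # severe
print(predicted_phenotype("Cys", 244))  # moderate
```

The last call shows the limits of such a table. The glycine 244 cysteine substitution, which the text reports as lethal, is classed as "moderate" because the table encodes only the smooth gradient.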
For serine substitutions, lethal perinatal phenotypes have been observed as far amino-terminal as glycine 565. However, the substitutions producing lethal phenotypes are interspersed with substitutions producing mild and moderate phenotypes. The reasons for the variable phenotypes are unknown, but may involve variations in local helix stability (45). It has been proposed that glycine substitutions in the α2(I) chain are less deleterious than equivalent mutations of the α1(I) chain because type-I collagen molecules contain only one α2(I) chain (59). However, 10 of the 16 glycine substitutions so far reported in the α2(I) chain produce lethal perinatal phenotypes. Indeed, all the glycine-to-aspartic acid substitutions in the α1(I) and α2(I) chains produce lethal perinatal phenotypes.
c. Helical Splicing Mutations. Point mutations or deletions of splice-donor or -acceptor sites result in loss of the adjoining exon-encoded sequences during splicing of pre-mRNA to mRNA. The point mutations most
often involve the first two positions of the splice-donor (G-T) and -acceptor (A-G) sites. However, mutations at the +5 position of the splice-donor site can also lead to splicing out of the upstream exon-encoded sequence (60). The translational reading frame is maintained following loss of an exon-encoded sequence from mRNA, as the exons encode complete codons. The helical triplet sequence is also maintained, as each exon encodes multiple complete Gly-X-Y triplets. However, some point mutations and deletions activate cryptic splice sites, so that intronic sequences are inserted into the mRNA without loss of exon-encoded sequences. For example, a point mutation of T for C at position +6 of the splice-donor site of intron 33 of the COL1A2 gene activated a cryptic splice site at positions +19 and +20. As a result, 18 nucleotides were inserted into the mRNA and six nonhelical amino acids were inserted between residues 585 and 586 of the triple-helical domain of the α2(I) chain (61). In a similar manner, 75 nucleotides from intron 35 were inserted into the mRNA and 25 nonhelical amino acids were inserted into the helical domain of the α1(I) chain of another patient (62).
d. Helical Deletions and Insertions. Multiexon deletions are uncommon. Examples include deletions of exons 22 to 24 of the COL1A1 gene and of exons 34 to 40 of the COL1A2 gene (63-66). Multiexon insertions are also uncommon; the one example is a tandem duplication of exons 15, 16, and part of 17 in the COL1A1 gene (67, 68). Given the repetitive structure of the type-I collagen genes, it is surprising that large rearrangements within COL1A1 and COL1A2 are so uncommon. Smaller deletions have also been reported. Two patients had deletions of a Gly-Ala-Pro triplet in the α1(I) chain (69, 70). In another patient, three nucleotides were deleted from the α2(I) cDNA, resulting in the loss of valine 255 from the α2(I) chain (71). The 3' codon of exon 19 of the COL1A2 gene had been deleted in this patient, but its loss did not appear to alter the splicing of pre-mRNA. In contrast to glycine substitutions, deletions, insertions, and splicing errors alter the length of the affected pro-α-chain. Deletions usually preserve the mandatory repeating Gly-X-Y triplet structure, whereas insertions may not. In either case, the disparity in length between the normal and mutant chains produces major disturbances in collagen metabolism, with poor secretion, increased intracellular degradation, and procollagen suicide (41). These metabolic abnormalities are similar to, but often more severe than, those observed with glycine substitutions. Deletions or insertions in the α1(I) chain resulting from major gene rearrangements or splicing mutations usually produce lethal perinatal phenotypes. The few exceptions involve exons 8, 9, 17, and 30. These findings
suggest that, as with glycine substitutions, loss of more 5' exons is more likely to produce milder phenotypes than loss of more 3' exons. The unexpectedly severe phenotype in an infant with skipping of exon 14 was probably due to the homozygous nature of the molecular defect (60). In contrast, deletions and insertions in the α2(I) chain commonly produce mild or only moderately severe phenotypes.
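The frame arithmetic behind the splicing, deletion, and insertion effects described above can be made explicit. In this minimal Python sketch, the function and key names are hypothetical. A length change keeps the reading frame when it is a multiple of 3 nucleotides. It leaves downstream residues in the Gly-X-Y register only when it is a multiple of 9. The sizes are the examples cited in this section. The sketch ignores what the inserted residues themselves are, so the 18-nucleotide insertion of six nonhelical residues still counts as preserving the downstream register.

```python
def length_change_consequence(nucleotides):
    """Frame bookkeeping for an insertion or deletion (hypothetical helper).

    nucleotides: size of the inserted or deleted mRNA segment (positive int).
    """
    in_frame = nucleotides % 3 == 0
    return {
        "in_frame": in_frame,
        # residues added or removed; only meaningful when the frame is kept
        "residues": nucleotides // 3 if in_frame else None,
        # Gly-X-Y repeats every 3 residues (9 nt): a multiple of 9 keeps the
        # downstream triplets in register, whatever the inserted residues are
        "register_offset_preserved": nucleotides % 9 == 0,
    }


# Sizes taken from the examples in the text.
examples = {
    "intron 33 cryptic-site insertion (COL1A2)": 18,
    "intron 35 insertion (COL1A1)": 75,
    "Gly-Ala-Pro deletion (alpha1(I))": 9,
    "valine 255 deletion (alpha2(I))": 3,
    "5-bp carboxy-propeptide deletion (pro-alpha1(I))": 5,
}
for label, size in examples.items():
    print(f"{label}: {length_change_consequence(size)}")
```

The 75-nucleotide insertion and the single-valine deletion stay in frame but shift the Gly-X-Y register. The Gly-Ala-Pro deletion removes exactly one triplet and keeps it.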
e. Carboxy-Telopeptide and Carboxy-Propeptide Mutations. Mutations in the carboxy-terminal nonhelical domains of the pro-α-chains have been reported less frequently than helical mutations. Point mutations and in-frame deletions, as well as out-of-frame insertions and deletions, have been reported. The sole carboxy-telopeptide mutation occurred in the α1(I) chain and resulted in the substitution of glycine 3 by cysteine (72). Mutations in the carboxy telopeptides and propeptides may alter chain association and disulfide bonding and delay formation of the triple helix, with consequent overmodification along the full length of the helical domain of the α-chains. These mutations lead to varying amounts of normal and mutant type-I collagen molecules. At one extreme are mutations such as the substitution of glycine 3 by cysteine in the carboxy telopeptide of α1(I) chains, which produce a biochemical phenotype similar to that produced by glycine substitutions near the carboxy terminus of the triple helix (72). At the other extreme are mutations such as a 5-bp frameshift deletion involving the carboxy propeptide of the pro-α1(I) chain, which prevent the mutant and normal chains from associating and forming disulfide bonds (73). Only normal type-I collagen molecules are produced, procollagen suicide is avoided, and the mutant allele acts like a null allele. Similarly, a 4-bp frameshift deletion in the carboxy propeptide of the pro-α2(I) chain prevented these chains from associating with normal pro-α1(I) chains (74), so the normal pro-α1(I) chains were not wasted through procollagen suicide. In the homozygous patient, the mutant pro-α2(I) chains were degraded and the normal pro-α1(I) chains formed α1(I) homotrimers (75, 76). Between these extremes are mutations that allow the mutant chain to associate with normal chains, but all of the mutant molecules are retained within the cell and degraded.
Only normal type-I collagen is secreted from cultured dermal fibroblasts, and only normal type-I collagen is found in the dermis and bone of the proband. However, in each such case, the concentration of type-I collagen is reduced to about 20% of normal because of procollagen suicide (32, 77). The clinical consequences of these mutations are variable, ranging from mild to lethal phenotypes. The one mutation of the carboxy telopeptide of the α1(I) chain produced a mild phenotype (72). The mutations of the carboxy propeptides produced lethal phenotypes if a mutant protein chain capable of forming type-I procollagen molecules was produced. Two such infants had relatively well-formed but severely porotic skeletons (57). Mutations of the carboxy propeptide of the α1(I) chain that prevented chain association led to a mild phenotype (73). The excess normal α2(I) chains were destroyed, as molecules containing two or three α2(I) chains are unstable at 37°C. In contrast, an equivalent homozygous mutation of the carboxy propeptide of the pro-α2(I) chain resulted in a moderately severe type-III phenotype, and the heterozygous parents were clinically normal (78). The milder-than-expected clinical phenotypes appeared to be due to degradation of the mutant pro-α2(I) chains, absence of procollagen suicide, and formation of α1(I) trimeric molecules that compensated, to some extent, for the lack of type-I collagen.
f. Null Allelic Mutations. Heterozygous mutations of this type result in the production of about half the normal amount of type-I collagen (79, 80). The collagen produced by cultured dermal fibroblasts and the collagens found in dermis and bone are otherwise normal. The underlying molecular mechanisms are poorly understood, but several studies provide insights into possible mechanisms. Cultured dermal fibroblasts from such individuals produce reduced amounts of pro-α1(I) mRNA and chains (30, 81). Loss of an allele through deletion or rearrangement was not the cause of the diminished COL1A1 mRNA levels (81). However, there was marked diminution in steady-state mRNA levels from one COL1A1 allele. Primer extension with nucleotide-specific chain termination was used to identify mutant COL1A1 alleles in cell strains heterozygous for expressed polymorphisms (81). Rowe et al. (30) observed reduced levels of cytoplasmic α1(I) mRNA and increased levels of nuclear α1(I) mRNA, suggesting that mutant α1(I) mRNA is retained within the nucleus. In more recent studies, Rowe et al. (82) observed that α1(I) mRNA that retained an out-of-frame intron was detected only in the nuclear and not in the cytoplasmic mRNA. They proposed the following. If a splicing mutation results in an in-frame product, the abnormal mRNA is transported to the cytoplasm and an abnormal chain is produced. If the abnormally spliced product is out of frame, the abnormal mRNA remains within the nucleus and only the normal allele is expressed. These findings are in keeping with reports of a nuclear scanning apparatus involved in the nuclear export of mRNA (83). Rarely, a mutant type-I collagen chain is translatable in vitro but is not detectable in the intracellular or extracellular collagens produced by dermal fibroblasts (73). In such cases, mutations in the carboxy propeptide prevent pro-α-chain association, registration, and disulfide bonding.
Alternative mechanisms that have not yet been identified as causes of a null COL1A1 allele in osteogenesis imperfecta include deletion of an allele, regulatory mutations, and nonsense mutations (1). The typical osteogenesis imperfecta type-IA phenotype, which lacks dentinogenesis imperfecta, is produced by heterozygous mutations of the COL1A1 gene that produce a functionally null allele (30, 79). Little is known about the clinical effects of null allelic mutations of the COL1A2 gene. The one exception is the child with osteogenesis imperfecta type III, described in Section II,B,1, who was homozygous for a carboxy-propeptide mutation that prevented the mutant pro-α2(I) chains from associating with pro-α1(I) chains.
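The proposal of Rowe et al. (82) amounts to a rule keyed on the reading frame of an aberrantly spliced COL1A1 mRNA. The Python sketch below is illustrative only, and its function name is hypothetical. It assumes that a length change divisible by 3 is in frame. Its output fractions rest on two idealizations. First, a null allele simply halves pro-α1(I) output. Second, when the abnormal chain is made and assembled at random, every molecule containing it is lost, leaving (1/2)^2 = 1/4 of normal. For comparison, the text reports about half of normal for null alleles and about 20% for carboxy-propeptide mutants whose molecules are all retained and degraded.

```python
def predicted_allele_outcome(length_change_nt):
    """Hypothetical sketch of the frame-dependent fate of an aberrant COL1A1 mRNA.

    length_change_nt: net nucleotides gained (+) or lost (-) by mis-splicing.
    Returns (predicted fate, idealized fraction of normal type-I collagen).
    """
    if length_change_nt % 3 == 0:
        # In frame: mRNA exported and an abnormal chain made; with half the
        # pro-alpha1(I) chains mutant and random assembly, only (1/2)**2 of
        # molecules lack a mutant chain (procollagen suicide).
        return "exported; abnormal chain", 0.5 ** 2
    # Out of frame: mRNA retained in the nucleus; only the normal allele is
    # expressed, so normal type-I collagen falls to half.
    return "retained in nucleus; functionally null allele", 0.5


for change in (+18, +100):  # illustrative lengths only
    print(change, predicted_allele_outcome(change))
```

The rule covers only the nuclear-retention route. The carboxy-propeptide frameshift described above produces a null-like allele by a different route, because its chain is translated but cannot associate.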
g. Somatic and Gonadal Mosaicism. Although the recurrence rate of new dominant mutations in OI has been estimated to be 1 in 5000 to 10,000, the observed recurrence rate of lethal perinatal OI is approximately 7% (84). This higher-than-expected recurrence rate of lethal perinatal OI is due to parental gonadal mosaicism. Recurrences of lethal perinatal forms of OI have been observed in families in which the parents are clinically normal or in which one parent is minimally affected by OI. Such recurrences have been proposed to be due to autosomal recessive inheritance (28). However, this proposal is inconsistent with the heterozygous nature of the mutations found in most of the affected infants. The alternative explanation for the recurrence of heterozygous OI is that one parent has gonadal mosaicism (70, 85, 86). Direct evidence of mosaicism was found in the sperm and somatic cells of some fathers and in the somatic cells of some mothers. In one mosaic father, it was estimated that one in eight of his sperm carried the mutation. So did 40% of his diploid cells, such as white blood cells and hair-root bulbs, although he was clinically normal (85).
2. MUTATIONS OF TYPE-I COLLAGEN GENES IN EHLERS-DANLOS SYNDROME
a. Amino-Telopeptide and Amino-Propeptide Mutations. Mutations that impair removal of the amino propeptides of the pro-α-chains of type-I procollagen produce the Ehlers-Danlos type-VII phenotype, with major joint dislocations and skin laxity (87, 88). As in osteogenesis imperfecta, the abnormal collagen is found in the type-I collagen-containing tissues. There are three forms of EDS type VII. EDS type VIIA is due to mutations that remove the N-proteinase cleavage site of the pro-α1(I) chains. EDS type VIIB is due to similar defects involving the pro-α2(I) chains. Joint dislocations of similar severity are a feature of these two subtypes. EDS type VIIC results from a deficiency of N-proteinase and produces a more severe phenotype with greater fragility of the skin. The two reported cases of the VIIA variant were due to a G-to-A transition at the -1 position of the splice-donor site of intron 6 of COL1A1. This
mutation results in alternative splicing. One product lacks the exon-6-encoded sequences, which include the N-proteinase cleavage site. The other includes the exon-6 sequences but carries a substitution of methionine 159 by isoleucine (88-90). The clinical features of these two cases were similar. Ten cases of the type-VIIB variant have been described. Three were due to a G-to-A transition at the -1 position of intron 6, as in the type-VIIA cases, except that they involved the COL1A2 gene (91-93). Six cases had point mutations of the GT-dinucleotide splice-donor site of intron 6, which resulted in complete loss of the exon-6-encoded sequences from the α2(I) mRNA and chain (52, 94-98). The deleted sequence included the N-proteinase cleavage site and the cross-linking lysine site in the N-telopeptide of the α2(I) chain. In another case, there was partial loss of the exon-6-encoded sequences due to a base substitution at the splice-acceptor site of intron 5 of the COL1A2 gene that activated a cryptic splice site at positions +14 and +15 of exon 6 (87). The N-proteinase cleavage site was lost, but the nearby cross-linking lysine site in the N-telopeptide was retained. Dermatosparaxis (EDS type VIIC) was first recognized as a recessively inherited disorder of cattle (99, 100). Two human cases of type VIIC, due to N-proteinase deficiency with retention of the N-propeptides of pro-α1(I) and pro-α2(I) chains, clinically and histologically resemble the animal equivalent, dermatosparaxis (101, 102). It is not clear why mutations that impair processing of the amino propeptides produce the Ehlers-Danlos type-VII phenotype. By contrast, mutations of the helix and carboxy-terminal extensions of type-I procollagen produce an osteogenesis imperfecta phenotype.
As expected, however, some osteogenesis imperfecta patients with amino-terminal mutations of the triple-helical domain of type-I collagen have features of the Ehlers-Danlos syndrome (103). Cultured fibroblasts from some OI patients with mutations distant from the amino-proteinase cleavage sites also show impaired removal of the amino propeptides (51, 52, 104).
C. Insertional and Site-directed Mutagenesis of the COL1A1 Gene
1. CELL CULTURE AND TRANSGENIC MOUSE MODELS OF LETHAL PERINATAL OSTEOGENESIS IMPERFECTA
Site-directed mutagenesis of the mouse and human COL1A1 genes has been used successfully to investigate the effects of glycine substitutions and deletions on the metabolism and structure of type-I collagen in transfected cells and transgenic mice. Site-directed mutagenesis of the mouse COL1A1 gene provided direct evidence that glycine substitutions within the triple-helical domain of the
pro-α1(I) chain result in the biochemical phenotypes observed in human lethal perinatal osteogenesis imperfecta type II (105). Mutagenesis of the gene was carried out to produce the substitution of glycine 859 by cysteine or arginine in the α1(I) chain (106). These two types of substitution were selected because they had been identified in the carboxy-terminal region of the α1(I) chain in infants with lethal perinatal OI. Neither had involved glycine 859, but this residue was selected because its codon was suitably placed within the genomic nucleotide sequence known at that time. The mutated mouse COL1A1 gene was transfected into Mov-13 cells, which do not express their endogenous COL1A1 genes, and into NIH 3T3 cells, which do. The Mov-13 cells produced stable type-I collagen molecules that contained the mutant α1(I) chains and the endogenous normal α2(I) chains. The NIH 3T3 cells reproduced the heterozygous state seen in infants with lethal perinatal osteogenesis imperfecta, in that the cells produced normal and mutant type-I collagen molecules. Also, as observed in affected infants, the mutant collagen was poorly secreted, enzymatically overmodified, and had reduced thermal stability. To study the effects of these mutations further, the mouse COL1A1 gene encoding the substitution of glycine 859 by cysteine was introduced into fertilized normal mouse ova (106). In this heterozygous mouse model, the introduced mutant gene was expressed against the background expression of the endogenous COL1A1 gene. The construct was designed so that the mutant and normal COL1A1 genes, their mRNAs, and their α-chains could be distinguished from each other. A lethal phenotype was observed in some of the mice carrying the transgene. The severity of the pathological phenotype correlated directly with the extent of mutant gene expression: the most severely affected fetuses expressed the mutant gene to the greatest extent.
A striking reduction of the total type-I collagen content of the dermis, proportional to the level of mutant gene expression, was seen in all transgenic fetuses. For example, fetuses in which only 10 or 30% of the α1(I) mRNA was mutant produced 54 or 80% less type-I collagen than controls, respectively. These results showed that the production of relatively small amounts of mutant type-I procollagen leads to a dramatically reduced type-I collagen content of the skin. The same finding has been made in the dermis of infants with the lethal perinatal phenotype (54). Similar results were observed with a human COL1A1 minigene expressed in NIH 3T3 cells and transgenic mice (107, 108). The minigene, lacking the central 41 exons and introns, was constructed to mimic a multiexon deletion observed in one of the first characterized cases of human lethal perinatal osteogenesis imperfecta (64). The naturally occurring mutation,
which consisted of an in-frame deletion of exons 22 to 24 from one allele of the COL1A1 gene, resulted in the formation of shortened pro-α1(I) chains. The engineered construct consisted of 2.5 kb of the 5'-promoter region, the first 5 exons and introns, the last 6 exons and introns, and about 2 kb of the 3'-flanking region. The 5' and 3' halves of the minigene were joined within introns so as to preserve the consensus sequences for RNA splicing. Shortened pro-α1(I) chains were produced by stably transfected NIH 3T3 cells. The biochemical phenotype was similar to that observed with the naturally occurring deletion of exons 22 to 24 of one COL1A1 allele (108). Similarly, transgenic mice expressing high levels of the minigene developed a genetically transmitted lethal OI phenotype (107). Cultured fibroblasts from these mice showed that the shortened pro-α1(I) chains synthesized from the human minigene associated with, and became disulfide-linked to, mouse pro-α1(I) chains. The major effect of the minigene products appeared to be depletion of normal type-I procollagen in the transgenic animals through procollagen suicide.
2. CELL CULTURE AND TRANSGENIC MOUSE MODELS OF OSTEOGENESIS IMPERFECTA TYPE IA
The mouse model of mild osteogenesis imperfecta type IA is based on the Mov-13 mouse. In this mouse, integration of a murine retrovirus within the first intron of the COL1A1 gene blocked transcription, producing a functionally null allele (109). In the homozygous Mov-13 mouse, development is normal until approximately day 13 of gestation, when the embryos die from bleeding (110). The tissues of these mice lack type-I collagen, and cultured dermal fibroblasts do not produce α1(I) mRNA or protein. In contrast, tooth rudiments from homozygous Mov-13 mouse embryos produce a dentin layer containing normal amounts of type-I collagen. This occurs when the rudiments are grown as transplants either in the anterior chamber of the eye or under the kidney capsule of syngeneic hosts (111). Odontoblasts appear to be able to produce α1(I) mRNA efficiently despite stable integration of the retrovirus within the first intron of the COL1A1 gene. Similarly, about 5% of osteoblasts from long bones were able to produce normal amounts of type-I collagen, indicating that a small subset of osteoblasts also did not express the mutant phenotype (111). The heterozygous Mov-13 mouse is a model of mild human osteogenesis imperfecta type IA (112). As in human OI type I, the dermis contains about half the normal amount of type-I collagen, and cultured dermal fibroblasts show a similar reduction in type-I collagen production. The long bones are abnormally brittle on biomechanical testing, but the teeth appear to be normal.
COLLAGEN GENES AND THEIR MUTATIONS
III. Type-II Collagen Gene
A. The Normal Type-II Collagen Gene
Type-II collagen is a typical fibrillar collagen with a helical domain 300 nm long and a fibril periodicity of 67 nm. It is a homotrimer, with a molecular weight of 300,000, containing three identical α1(II) chains. It is found in the cartilages and the vitreous humor, although it is transiently expressed more widely during embryogenesis. Chondrocytes are the predominant cells expressing the COL2A1 gene. The type-II collagen gene, approximately 30 kb in size, is a typical single-copy fibrillar collagen gene (3). It contains 54 exons and is localized to chromosome band 12q13.11-q13.12 (113). The N-propeptide is encoded by 7 exons (114). The 242-bp exon 1 encodes 157 bp of untranslated 5' sequence, the signal peptide, and the N-terminus of the N-propeptide. The 207-bp exon 2 encodes a cysteine-rich region of the N-propeptide that is alternatively spliced (114). The pro-α1(II) chain lacking exon-2 sequences is the B isoform, whereas the chain containing the exon-2 sequences is the A isoform (115). The type-IIB isoform is produced by cells with a mature chondrocytic appearance. The cells expressing the type-IIA isoform have a fibroblastic phenotype and appear during embryogenesis, where they surround cartilage. However, type-II collagen is also transiently produced by many cells, including epithelial cells, during embryogenesis (116, 117). It is likely that the type-IIA isoform is expressed by these embryonic cells and takes part in determining the shape of the craniofacial and other tissues (115). Chondrogenesis, however, appears to depend on removal of exon-2-encoded sequences to produce type-IIB procollagen. The type-IIA and -IIB isoforms are identical except for the alternative splicing of exon 2. The N-telopeptide contains 19 amino acids, and the main triple helix contains 1014 amino acids, including 338 Gly-X-Y triplets that are essential for formation of the triple helix.
The C-telopeptide and C-propeptide contain 27 and 246 amino-acid residues, respectively (118). All exons encoding the main triple helix contain 54 bp or multiples thereof; all of them contain complete codons and encode complete Gly-X-Y triplets. The carboxy propeptide is encoded by four exons, 51-54. The 334-bp exon 54 contains 230 bp of 3' untranslated sequence. There is a single polyadenylation signal, ATTAAA, located 23 nucleotides upstream of the poly(A) tail of pro-α1(II) mRNA (119). Regulatory sequences that provide tissue-specific expression are located within the 4.5 kb upstream of the transcription start site and within the 4105-bp first intron of the human COL2A1 gene (120-122). A TATA box is located 28 bp upstream of the transcription start site (121, 122).
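The exon-size rule above has a simple arithmetic consequence that recurs in the mutation analyses below: every 9 bp of helical coding sequence encodes one Gly-X-Y triplet, so a 54-bp exon encodes six triplets (18 residues). The Python sketch below is illustrative only; the function names are hypothetical, and it assumes, as the text states for helical exons, that any deleted, skipped, or duplicated segment begins on a Gly-X-Y triplet boundary.

```python
"""Reading-frame arithmetic for triple-helical collagen exons (illustrative sketch)."""

CODON_BP = 3                        # nucleotides per codon
TRIPLET_AA = 3                      # residues per Gly-X-Y triplet
TRIPLET_BP = CODON_BP * TRIPLET_AA  # 9 bp encode one Gly-X-Y triplet


def helix_residues(n_triplets: int) -> int:
    """Number of amino acids in a helix built from n Gly-X-Y triplets."""
    return n_triplets * TRIPLET_AA


def classify_length_change(length_bp: int) -> str:
    """Classify a deletion, duplication, or skipped exon by its length alone.

    Assumes the change starts on a Gly-X-Y triplet boundary; the real
    outcome of a splice-site mutation also depends on how the RNA is spliced.
    """
    if length_bp % CODON_BP:
        return "frameshift"
    if length_bp % TRIPLET_BP:
        return "in-frame, Gly-X-Y repeat broken"
    return f"in-frame, {length_bp // TRIPLET_BP} Gly-X-Y triplets"


print(helix_residues(338))           # 1014 residues in the α1(II) helix
print(classify_length_change(54))    # one 54-bp helical exon
print(classify_length_change(45))    # e.g. a 45-bp in-frame duplication
```

Under these assumptions, skipping a single 54-bp exon removes six whole triplets and leaves the Gly-X-Y repeat intact downstream, which is why such deletions yield shortened but still helix-forming chains.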
WILLIAM G. COLE
A search for regulatory sequences in the 1542-bp segment upstream of the transcription start site revealed numerous potential regulatory elements and several conserved motifs, including four Sp1-recognition sites, two enhancer core elements, and one pyrimidine-rich region (122). There also appear to be two “silencer” elements in the promoter that keep COL2A1 inactive in nonchondrocytes (121). They modify the activity of a powerful promoter located within 130 nucleotides of the transcription start site (123). An enhancer element, required for chondrocytic expression, is located in the first intron of the rat and human COL2A1 genes (121, 123). A decamer sequence, CACAATGCAT, in the middle of the enhancer binds a chondrocyte nuclear protein(s) necessary for enhancer activity. The same sequence is present in the human gene, and it has homology to the consensus binding sequence for helix-loop-helix transcription factors (123).
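Locating a short element such as the CACAATGCAT decamer in a cloned promoter or intron is, at its simplest, an exact string search on both DNA strands. The Python sketch below illustrates only that idea; the function names and the toy sequence are hypothetical, and it ignores the degenerate matching that real transcription-factor site searches usually require.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Report (0-based position, strand) for exact matches on either strand.

    A palindromic motif is reported once, on the "+" strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    width = len(motif)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))  # motif read on the opposite strand
    return hits


# Hypothetical toy sequence carrying the decamer once on each strand.
toy = "GGG" + "CACAATGCAT" + "TTT" + "ATGCATTGTG" + "AA"
print(find_motif(toy, "CACAATGCAT"))  # [(3, '+'), (16, '-')]
```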
B. Naturally Occurring Mutations of the Type-II Collagen Gene
Mutations of the COL2A1 gene would be expected to result in phenotypes characterized by structural anomalies of the hyaline cartilages and fibrocartilages, the eye, and the ear. Spranger (125) grouped the bone dysplasias demonstrating these phenotypic features into the type-II collagen family. It includes achondrogenesis type II (hypochondrogenesis), spondyloepiphyseal dysplasia congenita, Stickler syndrome, and Kniest syndrome (Table V).
1. MUTATIONS OF TYPE-II COLLAGEN IN TYPE-II ACHONDROGENESIS AND HYPOCHONDROGENESIS
Achondrogenesis type II and hypochondrogenesis are the same condition, except that the former is more severe than the latter (126). The principal feature is severe lethal dwarfism. The extremities are very short and broad, the trunk is short, the head is large, the nasal bridge is flat, the jaw is hypoplastic, and a cleft palate is frequent. Ossification of the cranium is normal, whereas the long bones are very short and broad with metaphyseal irregularities. Ossification of the pelvis and vertebrae is severely retarded or absent. The hyaline cartilages of the skeleton have a fetal histological appearance. The chondrocytes are enlarged, and the intercellular matrix is reduced in amount and disorganized (127, 128). The growth plates are severely disorganized, and vascular channels that extend into the epiphyses are fibrotic (126). The epiphyseal cartilages, which have an abnormally gelatinous appearance, lack type-II collagen or contain markedly reduced amounts of it (127-129). The major collagen is type-I collagen, which is not normally found in hyaline cartilages, with about 10% type-XI collagen. Sufficient type-II collagen was present in the cartilage from one infant to
TABLE V
MUTATIONS OF THE COL2A1 GENE AND THEIR PHENOTYPES(a)

Disease   Zygosity  Mutation               Amino-acid change                  Ref.
Ach-Hyp   Hetero    G-to-A, exon 46        Gly-943 to Ser                     131
Ach-Hyp   Hetero    G-to-A, exon 43        Gly-853 to Glu                     133
Ach-Hyp   Hetero    G-to-A, exon 33        Gly-574 to Ser                     132
SEDc      Hetero    Deletion, exon 48      Deletion of Gly-970-Ile-984        143
SEDc      Hetero    45-bp insert, exon 48  Duplication of Gly-970-Ile-984     144
SEDc      Hetero    G-to-A, exon 48        Gly-997 to Ser                     150
SEDc      Hetero    C-to-T, exon 41        Arg-789 to Cys(b)                  153
SEDc      Hetero    C-to-T, exon 31        Arg-519 to Cys(c)                  145
Stickler  Hetero    C-to-T, exon 40        Arg-732 to Ter                     160
Stickler  Hetero    T del, exon 40         Premature Ter                      162
Stickler  Mosaic    Skip exon 12           Deletion of Gly-91-Lys-108(d)      156
Stickler  Hetero    G-to-A, exon 10        Gly-67 to Asp                      163
Stickler  Hetero    Exon 7                 Arg-9 to Ter                       161
Kniest    Hetero    Skip exon 12           Deletion of Gly-91-Lys-108(d)      156

(a) Abbreviations: Ach-Hyp, type-II achondrogenesis-hypochondrogenesis; SEDc, spondyloepiphyseal dysplasia congenita; Ter, premature termination codon.
(b) Two unrelated families shown to have the same mutation.
(c) Three unrelated families shown to have the same mutation.
(d) Members of the same family, in which the mosaic mother has typical Stickler syndrome and her heterozygous child has typical Kniest syndrome.
enable the mutation to be localized by protein analysis (130). The α1(II) chains migrated slowly on electrophoresis, in the manner typical of enzymatic overmodification of lysine residues. Because all of the cyanogen bromide peptides migrated slowly owing to overmodification of lysine residues, it was proposed that the mutation was probably in the carboxy-terminal region of the triple-helical domain of the α1(II) chain (130). The corresponding region of the COL2A1 gene was studied using a cosmid library prepared from the proband's fibroblast DNA (131). The infant was heterozygous for a G-to-A transition in exon 46, which was within the region predicted to contain the mutation. It resulted in the substitution of Gly-943 of the triple-helical domain of the α1(II) chain by Ser (Table V). The substitution interrupted the normal Gly-X-Y repeating-triplet structure and severely impaired the function of type-II collagen molecules containing one or more mutant chains (131). Electron-microscopy showed that the chondrocytes contained grossly distended rough endoplasmic reticulum due to impaired secretion of the mutant type-II collagen. The lethal phenotype was due to the dominant-negative nature of the mutation, which resulted in the production of structurally defective α1(II) chains and the degradation of a significant fraction of the normal chains produced from the normal COL2A1 allele. Although the chain compositions of the type-II collagen molecules were not determined in this infant, it is likely that approximately 12.5% of the molecules were normal homotrimers, 12.5% were mutant homotrimers, and 75% were heterotrimers containing both normal and mutant chains. It is also likely that most of the molecules containing one or more mutant chains were destroyed, and this process would also have led to the destruction of about half of the normal α1(II) chains.
Similar abnormalities were observed in an infant with hypochondrogenesis (Table V) due to the substitution of Gly-574 by Ser (132). The strategy used to characterize this mutation differed from that in the previous case, as it relied on a chondrocyte culture system and cDNA-PCR scanning analysis. The small amounts of available cartilage were used for electron-microscopy and for cultures. Light-microscopy showed typical features of the disorder, whereas electron-microscopy showed that most chondrocytes contained dilated rough endoplasmic reticulum and irregularly thickened collagen fibrils. Because cartilage contains few chondrocytes, their numbers were increased by allowing them to proliferate in the fibroblastic state in monolayer cultures. They re-expressed the chondrocyte phenotype when grown on agarose. Within a few days, the cells expressed the COL2A1 gene and produced overmodified mutant chains as well as normal chains, indicating that the proband was heterozygous for the mutation. The protein findings also suggested impaired processing of type-II procollagen to collagen. Microscopy showed that, in vitro, the chondrocytes contained dilated rough endoplasmic reticulum and an abnormal extracellular matrix.
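The 12.5%/75%/12.5% estimate for the Gly-943 heterozygote follows from assuming that the normal and mutant alleles are expressed equally and that chains associate at random into homotrimers, so the number of mutant chains per molecule is binomially distributed. A short Python sketch of that calculation (the function name is hypothetical):

```python
from math import comb


def trimer_composition(p_mutant: float) -> dict[int, float]:
    """Fraction of α1(II) homotrimers containing k mutant chains (k = 0..3).

    Assumes random association of chains, with p_mutant the fraction of
    all chains that are mutant.
    """
    p_normal = 1.0 - p_mutant
    return {k: comb(3, k) * p_mutant**k * p_normal ** (3 - k) for k in range(4)}


composition = trimer_composition(0.5)  # heterozygote, equal allele expression
print(composition)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```

Molecules with one or two mutant chains (k = 1 and k = 2) are the heterotrimers; together they make up 0.75 of the total, leaving 0.125 each of normal and mutant homotrimers.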
Although these studies provided insight into the pathogenesis of the disease, there was insufficient protein available for localization of the mutation by peptide mapping. Southern and northern blot analyses were also unhelpful, because their results were normal. Rather than scanning the 38-kb COL2A1 gene, Horton et al. (132) scanned the full-length cDNA of approximately 5 kb, prepared from cultured chondrocyte mRNA. The full-length cDNA, encoding the translated domain of COL2A1, was synthesized as three overlapping fragments (exons 1-22, 22-38, and 38-50), which were readily analyzed by PCR-amplification, single-strand conformational polymorphism analysis of 200- to 400-bp subfragments, and direct sequencing of abnormal subfragments. With this approach, the infant was shown to be heterozygous for a G-to-A transition in exon 33 resulting in the substitution of Gly-574 of the triple-helical domain of α1(II) chains by Ser. Type-II collagen abnormalities were also observed in another case of hypochondrogenesis and were shown to be due to the substitution of Gly-853
by Glu resulting from a GGA-to-GAA transition in one allele of COL2A1 (133, 134).
2. MUTATIONS OF TYPE-II COLLAGEN IN SPONDYLOEPIPHYSEAL DYSPLASIA CONGENITA
Spondyloepiphyseal dysplasia (SED) congenita varies markedly in its severity. In its most severe form it produces a short-trunk form of dwarfism with facial hypoplasia, cleft palate, myopia, retinal detachments, sensorineural deafness, abnormal vertebrae, and short long bones with markedly retarded epiphyseal ossification (135). In its mildest form, it produces minimal epiphyseal dysplasia and premature osteoarthritis (136, 137). Most of the early protein studies (138, 139) reported overmodified type-II collagen in patients with SED congenita and its variants and showed that the distribution of the overmodified cyanogen bromide peptides correlated with the clinical severity of the phenotype. For example, the severest phenotypes were observed in patients with overmodification of all peptides, indicative of a carboxy-terminal mutation, and the milder phenotypes were observed when the overmodification was restricted to the more amino-terminal peptides, indicative of a more amino-terminal mutation. As expected, genetic linkage was established between many of these phenotypes and restriction-fragment-length polymorphisms of COL2A1 (140-143). However, Lee et al. (143) were the first to report direct evidence of a type-II collagen mutation in SED congenita. Coarse screening of COL2A1 by Southern blot analysis revealed an abnormal restriction pattern, shown by genomic DNA sequencing to be due to a heterozygous deletion of 390 bp extending from the middle of intron 47 to the splice-donor site of intron 48. The deletion eliminated exon 48 and the 36 amino acids from residues 964 to 999 of the triple-helical domain of the α1(II) chain. Protein studies were not undertaken, but it was predicted that the mutant allele acted in the dominant-negative manner typical of similar mutations of the type-I collagen genes (26). Other COL2A1 mutations have been characterized by a variety of techniques (Table V).
In one of them, protein studies were used to localize the mutation for directed analysis of COL2A1, which showed a heterozygous 45-bp duplication within exon 48 (144). The in-frame insertion added 15 amino acids to the helical domain of the α1(II) chains, resulting in the observed amino-terminal overmodification of lysine residues. The DNA sequence homology observed in the region of the duplication suggested that the mutation had arisen by unequal crossover between related sequences. Ala-Kokko et al. (145) used an intragenic COL2A1 linkage marker to isolate a mutant cosmid clone, containing most of the mutant COL2A1 allele, from a proband with mild SED. Sequencing of over 20 kb, including all of
the nucleotides for exons 2 to 52, revealed a C-to-T transition in exon 31 resulting in the replacement of Arg-519 by Cys. This substitution involved the third position of the triplet Gly-Pro-Arg. Protein studies showed that the mutant α1(II) chains migrated slowly due to enzymatic overmodification and formed disulfide-bonded dimers (146, 147). The protein features were similar to those observed with replacement of Gly by Cys in Gly-X-Y triplets of type-I collagen chains (148). The same Arg-519-to-Cys substitution was also identified in two additional, unrelated families with a similar type of mild SED and premature osteoarthritis (149). A novel method was used to characterize a heterozygous G-to-A transition in exon 48 of COL2A1 that substituted Ser for Gly-997 (150). Advantage was taken of the phenomenon of low basal or “illegitimate” transcription of genes for tissue-specific proteins by cultured fibroblasts and lymphoblastoid cells (151). Both of these cell types transcribe the COL2A1 gene and process the transcripts in a normal manner (150). Lymphoblastoid cells also transcribe normal and mutant COL1A1 and COL1A2 alleles in OI, as well as normal and mutant COL3A1 alleles in EDS type IV (150). The low steady-state levels of specific mRNAs in these cells cannot be detected by northern blot analysis, and no detectable protein is produced. However, the sequences can be detected following cDNA-PCR (151). The only modifications to our standard technique of PCR of abundant collagen cDNAs were as follows: to increase the number of amplification cycles from 30 to 40; to amplify only 500- to 600-bp fragments in order to reduce PCR errors; to use oligonucleotide primers that spanned several exons to ensure that cDNA rather than DNA templates were being used; and to include control reactions to detect DNA contamination (150, 151).
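Two of the amplification rules listed above can be expressed as simple design checks. The Python sketch below is a hypothetical illustration, not the protocol of (150, 151): it tiles a cDNA into overlapping windows of fixed length inside the 500- to 600-bp range, and it flags whether a window's two ends fall in different exons, which is one way of reading "primers that spanned several exons" (a genomic DNA template would then carry at least one intron between the primers). The 550-bp window, 50-bp overlap, and exon coordinates are assumptions; the 2**10 figure is only the idealized extra gain from ten more cycles at perfect doubling.

```python
from bisect import bisect_right

# Idealized extra yield from raising 30 cycles to 40 at perfect doubling.
EXTRA_FOLD_40_VS_30 = 2 ** (40 - 30)  # 1024


def tile_amplicons(length: int, size: int = 550, overlap: int = 50) -> list[tuple[int, int]]:
    """Cover cDNA positions [0, length) with half-open windows of `size` bp."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if length <= size:
        return [(0, length)]
    step = size - overlap
    starts = list(range(0, length - size, step))
    starts.append(length - size)  # last window is shifted back to end at `length`
    return [(s, s + size) for s in starts]


def exon_of(pos: int, exon_starts: list[int]) -> int:
    """0-based index of the exon containing cDNA position `pos`.

    `exon_starts` holds the sorted cDNA coordinate of each exon's first base,
    beginning with 0.
    """
    return bisect_right(exon_starts, pos) - 1


def crosses_exon_junction(start: int, end: int, exon_starts: list[int]) -> bool:
    """True if the window [start, end) begins and ends in different exons."""
    return exon_of(start, exon_starts) != exon_of(end - 1, exon_starts)


windows = tile_amplicons(5000)                      # roughly the size of α1(II) cDNA
hypothetical_exon_starts = [0, 300, 354, 408, 900]  # invented coordinates
print(len(windows), windows[0], windows[-1])
print(crosses_exon_junction(0, 550, hypothetical_exon_starts))
```

Shifting the last window back, rather than letting it run short, keeps every product the same length, so all products fall inside the stated 500- to 600-bp range.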
To take advantage of this phenomenon, a 621-bp fragment from the 3’ coding region of α1(II) cDNA, prepared from cultured SED fibroblasts and lymphoblastoid cells, was amplified by PCR and examined for sequence anomalies using the chemical cleavage method (152). An anomaly was observed that was shown by sequencing to be a heterozygous G-to-A transition resulting in the substitution of Ser for Gly-997 in the triple-helical domain (150). The mutation altered a restriction site, which was used to confirm that the proband was heterozygous for the point mutation in exon 48 of COL2A1 and that the parents were normal. A heterozygous substitution of Arg-789 in the triple-helical domain of α1(II) chains by Cys, which was characterized by a combination of the techniques described above, illustrates the need to use varying strategies according to the availability of cartilage samples and genetic linkage data (153). Collagen protein from the cartilage was used to localize the mutation, and chondrocyte cultures were used to characterize the metabolic consequences of the mutation. As the number of chondrocytes available for the latter
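Checking whether a point mutation removes (or creates) a restriction site, as was done to confirm the Gly-997 heterozygosity, amounts to searching the wild-type and mutant alleles for the enzyme's recognition sequence. The text does not name the enzyme or give the local sequence, so the Python sketch below uses a hypothetical stretch of coding sequence and, purely for illustration, the HaeIII site GGCC. In this toy example a G-to-A change at the first base of a GGC glycine codon turns it into AGC (Ser) and abolishes the site; a heterozygous carrier's DNA would then give both cut and uncut products, whereas DNA from parents without the mutation would be cut completely.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every exact occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def substitute(seq: str, pos: int, base: str) -> str:
    """Return `seq` with the single base at `pos` replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def site_status(wild_type: str, pos: int, new_base: str, site: str) -> tuple[bool, bool]:
    """(site present in wild-type allele, site present in mutant allele)."""
    mutant = substitute(wild_type, pos, new_base)
    return bool(find_sites(wild_type, site)), bool(find_sites(mutant, site))


# Hypothetical codons CCA GGC CCT GGA (Pro-Gly-Pro-Gly); position 3 is the
# first base of the GGC glycine codon.
wild_type = "CCAGGCCCTGGA"
print(site_status(wild_type, 3, "A", "GGCC"))  # (True, False)
```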
studies was limited, cultured fibroblasts were used as an alternative source of α1(II) mRNA, cDNA, and genomic DNA for characterization of the mutation. The region of the cDNA corresponding to the anomaly in the cyanogen bromide peptide map was amplified by PCR and shown by single-strand conformational polymorphism and chemical cleavage analyses to contain a sequence anomaly. Direct sequencing showed a C-to-T transition, confirmed by restriction mapping to be due to a heterozygous mutation in exon 41. The type-II collagen chains were overmodified and formed disulfide-bonded dimers in a manner similar to the type-II collagen bearing a substitution of Cys for Arg-519 (145). The same Arg-789-to-Cys substitution was also identified in an unrelated child with a similar form of SED congenita.
3. MUTATIONS OF TYPE-II COLLAGEN IN THE KNIEST SYNDROME
This is a short-trunk form of dwarfism with spinal deformities, flat face, cleft palate, hearing loss, myopia, and stiff, deformed joints (154). The cartilage is grossly abnormal, with large chondrocytes lying in a loosely woven matrix containing numerous empty spaces (154). Electron-microscopic studies showed that the collagen fibril organization was abnormal in epiphyseal and growth-plate cartilages (155). The carboxy-terminal propeptide of type-II procollagen appeared to accumulate within chondrocytes, but the remainder of the α1(II) chains appeared to be secreted normally. It was proposed that the reduced amount of the carboxy propeptide is responsible for the observed abnormality of collagen fibrils. However, the molecular basis of these changes was not reported. Nonetheless, one mutation was identified in a patient with typical Kniest syndrome (156). It consisted of a deletion of 28 bp from the 3' region of exon 12, including the highly conserved GT dinucleotide at the 5' end of intron 12. The changes in the protein and mRNA were not reported, but shortened species would be expected. This mutation does not account for the anomalies of the carboxy propeptide reported in four other cases (155).
4. MUTATIONS OF TYPE-II COLLAGEN IN THE STICKLER SYNDROME (ARTHRO-OPHTHALMOPATHY)
Individuals with the Stickler syndrome and those with SED congenita display anomalies of the same tissues. The face is hypoplastic, and eye anomalies, deafness, and cleft palate, as well as joint anomalies with premature osteoarthritis, are common. In contrast, however, individuals with the Stickler syndrome are excessively tall and thin and have long, thin hands and feet (157). Genetic linkage of COL2A1 with the Stickler phenotype was reported in several families (158, 159). A stop codon was identified as the cause of the
phenotype in one family (160). Cartilage was not available for study, but genetic linkage analysis provided an RFLP marker for the mutant COL2A1 allele. Cosmid clones were prepared, and clones containing the marker were subcloned and sequenced. A sequence of over 7 kb was analyzed, covering exons 2 to 44, which comprised 43 of the 54 exons of COL2A1 and 38 of the 43 exons encoding the triple-helical domain of α1(II) chains. Sequencing included approximately 40 bp of the flanking regions of each exon. A single-base mutation that altered a CG dinucleotide in exon 40 converted the codon CGA for Arg-732 to TGA, a stop codon. The mutation was confirmed in the affected family members using allele-specific oligonucleotides to the wild-type and mutant sequences. In another family, a similar mutation, but involving the codon for Arg-9 in exon 7, also produced a stop codon (161). A premature stop codon was also observed in another family, but it resulted from a different molecular mechanism (162). The heterozygous molecular defect consisted of the deletion of the thymidine nucleotide at position 18 of exon 40 of one COL2A1 allele, resulting in a translational frameshift and the formation of a nonsense codon, TGA, in the sequences encoded by exon 42. In these families, the Stickler phenotype appeared to be due to a quantitative deficiency of type-II collagen, as the truncated mutant pro-α1(II) chains would lack the carboxy propeptide that is essential for chain association, registration, and disulfide bonding. In contrast, mutations that would be expected to produce abnormal α1(II) chains have been identified in two other families (156, 163). One of them had a heterozygous substitution of Asp for Gly-67 due to a G-to-A transition in exon 10 of COL2A1 (163). Again, cartilage was not available for study, so screening of genomic DNA was undertaken.
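The two routes to a premature stop codon described above can be illustrated with a small translation function built from the standard genetic code. The first route is a C-to-T transition at a CpG dinucleotide (CGA to TGA). The second is a single-base deletion that shifts the reading frame until an in-frame TGA appears. The Python sketch below uses a hypothetical 15-base sequence, not COL2A1 sequence, and the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# CpG transition: the Arg codon CGA becomes the stop codon TGA.
print(translate("CGA"), translate("TGA"))  # R *

# Hypothetical sequence; deleting one T shifts the frame and exposes TGA.
wild_type = "GGTCCTCTGAAAGGC"
frameshifted = wild_type[:2] + wild_type[3:]  # delete the T at index 2
print(translate(wild_type))     # GPLKG
print(translate(frameshifted))  # GL*K
```

In both cases translation would terminate before the carboxy propeptide, which is why such alleles are expected to act as functional nulls rather than as dominant-negative mutations.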
Forty-three of the 54 exons of COL2A1 were amplified, and the PCR products were analyzed for sequence anomalies by denaturing gradient gel electrophoresis. In the other family, the mother of a child with typical Kniest syndrome, described in Section III,B,3, had typical Stickler syndrome due to a mosaic deletion of 28 bp from exon 12 that probably resulted in loss of exon-12 sequences from mature α1(II) mRNA (156).
5. GENOTYPE-PHENOTYPE RELATIONSHIPS
Although only a small number of mutations have been characterized to date, some patterns are already emerging that indicate that the osteogenesis imperfecta model of fibrillar collagen mutations also applies to the family of type-II collagen diseases. For example, the phenotypes in this family appear to result either from dominant-negative mutations, which produce qualitative and quantitative anomalies of type-II collagen, or from autosomal dominant mutations that produce a functionless allele and a quantitative deficiency of type-II collagen (Table V). Viewed in this way, type-II achondrogenesis and hypochondrogenesis are equivalent to lethal perinatal osteogenesis imperfecta (27). SED congenita is the equivalent of OI type III/IV. As with the latter types of OI, the severity of the phenotype varies, and the mutations are intermingled with those that produce a lethal phenotype. It is likely that the severity of the phenotype is related to the type of mutation, its site, the surrounding sequences, and epigenetic factors, although the mechanisms involved are poorly understood (26, 27). Unexpectedly, three unrelated families shared Arg-519-to-Cys substitutions, and two unrelated families shared Arg-789-to-Cys substitutions. The phenotype was more severe in the families with the more carboxy-terminal Arg-to-Cys substitution, in accordance with the gradient of severity observed with Gly-to-Cys mutations in the α1(I) chain of type-I collagen in OI (26). The reported deletion of part of exon 12 in the Kniest syndrome was unexpected, as the severity of the phenotype and the results of previous immunohistochemical studies were more in keeping with mutations in the 3’ region of the COL2A1 gene. More mutations must be characterized to clarify these apparent inconsistencies. The Stickler syndrome results from a quantitative deficiency of type-II collagen or from amino-terminal glycine substitutions in the helical domain of the α1(II) chain. As a result, the molecular pathology of the Stickler syndrome is equivalent to that of OI type I (27).
C. Site-directed Mutagenesis of the Type-II Collagen Gene
The most obvious difficulties in studying COL2A1 mutations in human diseases are the absence, in many cases, of cartilage and chondrocyte cultures; the inability to predict accurately the type and site of the mutation from the clinical and radiological phenotypes; and the consequent need to scan and sequence large regions of cDNA or DNA to localize the mutation. Site-directed mutagenesis of COL2A1, with expression of mutations in cells and transgenic mice, provides the opportunity to study genotype-phenotype relationships systematically. A cosmid containing the human type-II collagen gene, including 4.5 kb of 5’ and 2.2 kb of 3’ flanking DNA, was introduced into embryonic stem cells, which were used to produce chimeric mice (120). The normal human COL2A1 transgene was correctly transcribed in a stage-, cell-, and tissue-specific manner in the chimeric mice. These results showed that the sequences required for normal expression were included within the construct. These conclusions were confirmed in transgenic mice bearing a construct consisting of the rat COL2A1 promoter and enhancer and the gene for the
diphtheria toxin A chain (164). This construct led to cell-lineage ablation of chondrocytes and resulted in the abortion of small fetuses with shortened and underdeveloped limbs and a cleft palate, the features of a chondrodysplastic mouse. Histological studies showed a sparse and disorganized extracellular matrix surrounding chondrocytes. Chondrodysplastic transgenic mice have also been produced by the introduction of COL2A1 gene constructs containing in-frame deletions or point mutations into pronuclei of one-cell mouse embryos. A minigene version of human COL2A1, containing only 12 of the normal 54 exons and thus equivalent to a large in-frame deletion, was used to produce five lines of transgenic mice (165). Most of the mice expressing the minigene developed a chondrodysplasia characterized by dwarfism, short thick limbs, short snout, cranial bulge, cleft palate, and delayed mineralization of bone. These features are shared by the lethal and severe members of the human type-II collagen family: SED congenita and achondrogenesis type II-hypochondrogenesis. The cartilage contained thin, sparse, and abnormally organized collagen fibrils. Cultured chondrocytes produced a mixture of normal and truncated α1(II) chains. Some of the latter chains were disulfide-bonded to normal chains, which were largely retained within the cells. The heterotrimers were degraded within the cell by procollagen suicide, which destroys both normal and mutant chains (166). As a result, the phenotype is due to quantitative and qualitative anomalies of type-II collagen, indicating that the mutation acted in a dominant-negative manner. A similar phenotype was produced in transgenic mice expressing a Gly-to-Cys change at residue 85 of the triple-helical domain of α1(II) chains (167). The 39-kb DNA construct contained the mouse COL2A1 gene, including 3 kb of 5’ flanking sequences and 7 kb of 3’ flanking sequences.
The severity of the chondrodysplastic phenotype appeared to depend on the level of expression of the transgene. For example, one set of offspring heterozygous for the transgene did not show overt signs of abnormal bone growth; they expressed the mutant transgene at about one-third the level of expression of the endogenous gene. In contrast, a set of mice homozygous for the transgene died at birth and showed severe growth retardation. The cartilage structure was similar to that observed with the human COL2A1 minigene, except that the matrix also contained some abnormally thick fibrils. These fibrils may have formed because of defective collagen cross-linking, owing to the proximity of the point mutation to the cross-linking Lys at residue 87 of the triple helix. This proposal was supported by the observation of similar large fibrils in transgenic mice bearing a Lys-to-Arg mutation of the latter cross-linking site or of the Lys cross-linking site of the N-telopeptide.
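Under the same random-association assumption used for heterozygous patients, the dependence of severity on transgene dosage can be made concrete: if mutant chains form a fraction p of all α1(II) chains, only (1 - p)^3 of the molecules are free of mutant chains. The Python sketch below assumes, as one reading of the text, that "one-third the level of expression of the endogenous gene" means one mutant chain for every three endogenous chains; both that interpretation and the function name are hypothetical.

```python
def normal_homotrimer_fraction(mutant_to_normal_ratio: float) -> float:
    """Fraction of homotrimers with no mutant chain, assuming random association.

    `mutant_to_normal_ratio` is transgene output relative to endogenous output.
    """
    p_mutant = mutant_to_normal_ratio / (1.0 + mutant_to_normal_ratio)
    return (1.0 - p_mutant) ** 3


for ratio in (1 / 3, 1.0):
    print(f"ratio {ratio:.2f}: {normal_homotrimer_fraction(ratio):.3f} of molecules normal")
```

At a 1:3 ratio about 42% of molecules would contain only normal chains, compared with 12.5% when mutant and normal chains are made in equal amounts, which is consistent in direction with the milder phenotype of the low-expressing line.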
The abnormally large collagen fibrils were not present in transgenic chondrodysplastic mice bearing a deletion of exon 7 and intron 7 (168). This deletion removed amino-acid residues 4 to 18 from the triple-helical domain of α1(II) chains and was considered too amino-terminal to influence cross-linking of Lys-87 (167). As with the other chondrodystrophic transgenic mice, the cartilage matrix and endochondral ossification were severely disorganized.
IV. Type-III Collagen Gene
A. The Normal Type-III Collagen Gene
Type-III collagen is a widely distributed fibrillar collagen, most abundant in dermis, blood vessels, and the fibrous structures of viscera. It forms fine collagen fibrils in vitro, but in vivo it appears in heterotypic fibrils with collagens types I and XI. It is secreted by fibroblasts and smooth muscle cells as a procollagen. The amino and carboxy propeptides are cleaved to yield type-III collagen, although cleavage of the amino propeptide is often slow in vivo (169). The nucleotide sequence of the 5' nontranslated region of α1(III) cDNA shows three ATG translation start codons (170). The first two, at -15 to -17 and at -67 to -69, are followed by in-frame stop codons. The amino-acid sizes of the domains of α1(III) procollagen chains are listed in Table III. The signal peptide, N-propeptide, and N-telopeptide are similar to those of pro-α1(I) chains (170, 171). The triple-helical domain contains 343 Gly-X-Y triplets and 1029 amino acids. It is five triplets longer than the α1(I) or α2(I) chains of type-I collagen and the α1(II) chain of type-II collagen (Table III). Comparison of hydrophobicity plots of the α1(III) chain with α1(I) and α2(I) chains showed that the additional triplets increase the differences between the ends of the type-III and type-I collagens (170). The codon usage for amino acids in the triple-helical domain showed a preference for U in the third position of codons for glycine, proline, and alanine, as also observed for type-I collagen (172). The nucleotide sequence of the 3' nontranslated region of the cDNA showed that the mRNA contains three polyadenylation signals, AATAAA (170). Two major mRNA transcripts of 4.9 and 5.5 kb have been observed in cultured fibroblasts (173). The COL3A1 gene is located on chromosome 2q31-q32.3 (Table II). The COL5A2 gene has also been colocalized to this region of chromosome 2 (Table II). It is likely that the COL3A1 gene contains 52 exons and that it is
larger than the COL1A1 and COL1A2 genes (174). Partial characterization of the human COL3A1 gene showed complete identity between the sizes of the exons encoding the Gly-X-Y triple-helical domain and those of COL1A2 and COL2A1 (65). These genes differ from COL1A1 in that the 54-bp exons 19 and 20 are replaced by a fusion exon of 108 bp in COL1A1 (174). Some inconsistencies are observed in the sizes of the four exons that encode the carboxy propeptides of the type-I, -II, and -III procollagen chains, but these are mainly due to differences in the length of the 3’ untranslated region (65, 174). The amino pre-propeptide is encoded by five exons rather than six as in the COL1A1 and COL1A2 genes (175). The promoter region of human COL3A1 contains a TATA element but lacks a CCAAT sequence. A putative NF-1 binding sequence, GGGCTGGAAAG, around nucleotide -60, and a potential AP-1 binding site, TGAGTC, at nucleotide -470, were also observed (175). Detailed analyses of the mouse COL3A1 promoter have confirmed that the mouse gene also lacks a functional CCAAT sequence, which is, however, found in both human and mouse COL1A1 and COL1A2 promoters (176). The mouse promoter contains cis-regulatory elements that are not found in the COL1A1 and COL1A2 promoters. These differences between the promoters may provide a mechanism for the separate regulation of the type-I and type-III collagen genes and may account for the varying proportions of type-I and type-III collagens in different tissues and at different ages.
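The observation that the first two of the three ATG codons in the 5' nontranslated region of α1(III) cDNA are followed by in-frame stop codons can be checked mechanically: for each ATG, read forward in its frame until the first TAA, TAG, or TGA. The Python sketch below runs that scan on a hypothetical toy leader sequence; it illustrates the check only and is not the α1(III) sequence itself.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_stop_map(seq: str) -> list[tuple[int, Optional[int]]]:
    """For every ATG (0-based), give the start of the first in-frame stop codon,
    or None if the sequence ends before one is reached."""
    results = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        stop = None
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop = j
                break
        results.append((i, stop))
    return results


# Hypothetical leader: two short upstream ORFs, then an ATG with an open frame.
leader = "CCATGTAAGGATGCCCTGACCATGGGG"
print(atg_stop_map(leader))  # [(2, 5), (10, 16), (21, None)]
```

An ATG whose frame stays open (here the third one) is the candidate translation start; the two upstream ATGs each terminate almost immediately.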
B. Naturally Occurring Mutations of the Type-III Collagen Gene
Mutations of type-III collagen produce a range of phenotypes, from the most severe acrogeric forms of Ehlers-Danlos syndrome (EDS) type IV to relatively minor phenotypes that are indistinguishable from EDS type III and the benign joint-hypermobility syndrome (177). The acrogeric EDS type-IV phenotype is unique. The patients, even at a young age, appear prematurely aged. The face has a pinched appearance with prominent eyes, thin nose, thin lips, and lobeless ears. The hands are prematurely aged, an appearance referred to as acrogeria (177). The skin is thin and fragile and may show a circumscribed rash due to the extrusion of dermal elastin fibers into the epidermis. Loss of the terminal phalanges, a mild arthropathy, keloid scars, and recurrent pneumothoraces, as well as intestinal rupture, torsion, and intramural hematomas, are frequent. However, the most important manifestations involve the cardiovascular system, with aneurysms and ruptures of the aorta and medium-sized arteries. Some patients have major vascular anomalies but not the acrogeric features typical of the severe forms of EDS. Others have only thin skin and joint hypermobility.
COLLAGEN GENES AND THEIR MUTATIONS
A moderate number of dominant-negative mutations of the COL3A1 gene have been identified in patients with different forms of EDS type IV, as outlined in the following sections. The heterozygous dominant-negative mutations consist of deletions within the gene, point mutations that produce exon skipping, and point mutations that produce substitutions of glycine residues in Gly-X-Y triplets of the triple helix (Table VI). Unlike the pattern of mutations in the genes of type-I and type-II collagens, glycine substitutions appear to be less common than deletions or splicing mutations. The reasons for this difference are unclear. It may be due to selection bias, because truncated α1(III) chains are easier to detect in the protein screening of cultured skin fibroblasts from patients suspected of having EDS type IV. Alternatively, glycine substitutions may result in a greater number of intrauterine deaths because mutant homotrimeric molecules bearing a glycine substitution cannot partially rescue the severe phenotype.
TABLE VI
DOMINANT-NEGATIVE TYPE-III COLLAGEN MUTATIONS IN EHLERS-DANLOS SYNDROME TYPE IV(a)
Mutation | Protein change | Ref.
3.3-kb deletion | Deletion of 220 amino acids | 178
2-kb deletion of exons 8-15 | | 279
7.5-kb deletion of exons 9-24 | Deletion of Gly-46-Pro-387 | 179
9-kb deletion of exons 34-48 | Deletion of Gly-595-Glu-1008 | 182
27-bp in-frame deletion, exon 37 | Deletion of Ala-662-Pro-671 | 180
IVS 16, G+1 to A, skip exon 16 | Deletion of Gly-166-Lys-183 | 278
IVS 17, G+1 to G, skip exon 17 | Deletion of Gly-184-Pro-216 | ll
IVS 20, G+1 to A, skip exon 20 | Deletion of Gly-265-Arg-282 | 280
IVS 25, G+5 to T, skip exon 25 | Deletion of Gly-388-Asp-420 | 281
IVS 27, G+5 to T, skip exon 27 | Deletion of Gly-439-Thr-456 | 282
IVS 37, G+5 to T, skip exon 37 | Deletion of Gly-649-Ala-684 | 283
IVS 41, G+1 to A, skip exon 41 | Deletion of Gly-775-Lys-810 | 183
IVS 42, G+1 to A, skip exon 42 | Deletion of Gly-811-Asp-846 | 279
G-to-A, exon 14 | Gly-136 to Arg | 284
Exon 35 | Gly-619 to Arg | 285
G-to-A, exon 40 | Gly-757 to Asp | (b)
Exon 41 | Gly-790 to Ser | 286
G-to-A, exon 43 | Gly-847 to Glu | 287
Exon 44 | Gly-883 to Asp | 286
G-to-T, exon 45 | Gly-910 to Val | 288
Exon 48 | Gly-1000 to Val | 287
Exon 48 | Gly-1003 to Asp | 287
G-to-A, exon 48 | Gly-1006 to Glu | 289
Exon 49 | Gly-1018 to Asp | 290
(a) All mutations are heterozygous.
(b) Personal observation.
1. HELICAL DELETIONS
An in-frame 3.3-kb, multiexon deletion was the first COL3A1 mutation to be characterized (178). The proband's cultured fibroblasts produced equal amounts of normal-length α1(III) mRNA and mRNA shortened by approximately 600 bases. The cells also produced approximately equal amounts of normal pro-α1(III) chains and mutant pro-α1(III) chains in which the triple helix was shortened by 220 amino acids. The mutant procollagen molecules showed decreased thermal stability and were poorly secreted. Three other multiexon deletions have been reported with maintenance of the translational reading frame and the repetitive Gly-X-Y triplet structure (Table VI). The 7.5-kb deletion resulted from an exon-to-intron recombination with deletion of 16 exons and 1026 nucleotides (179). The deletion extended from position 13 of exon 9 to a domain within intron 24 that contained a series of polymorphic dinucleotide repeats. An in-frame deletion of 27 bp from within exon 37 resulted in the loss of nine amino acids and maintenance of the Gly-X-Y repeating structure of the triple helix (180). The deletion, which was flanked by two short repeats of CTCC, appeared to have arisen by slipped mispairing of bases. These cases showed similar protein anomalies, with poor secretion and decreased thermal stability of the mutant procollagen. The amount of normal type-III procollagen secreted into the medium was markedly reduced, in keeping with the very low amount (approximately 12.5%, that is, (1/2)^3 if equal amounts of normal and mutant chains associate at random) of normal homotrimeric collagen that would be expected in these heterozygous cases (181). Approximately 75% of the molecules would contain a mixture of normal and mutant chains, and 12.5% would be expected to contain only mutant chains (181). The heterotrimeric molecules would be unstable and nonfunctional because of the differing lengths of the normal and truncated mutant chains. However, the mutant homotrimeric molecules are often stable and may be functional (182).
2. HELICAL SPLICING MUTATIONS
Point mutations involving splice-donor sites of exons encoding the triple-helical domain of α1(III) chains have been identified in patients with EDS type IV (Table VI). The preceding exon is lost in the conversion of pre-mRNA to mRNA. Because the exons contain complete codons and encode complete Gly-X-Y triplets, the translational reading frame and the Gly-X-Y triplet structure are maintained. Half of the cases involve the substitution of G+1 by A in the mandatory GT dinucleotide of the splice-donor site (Table VI). The other half involve the substitution of G+5 by T. In contrast to the G+1 substitutions, in which exon skipping is complete, the G+5 substitutions show alternative splicing, in accordance with similar mutations in other collagen genes. For example, exon skipping was nearly abolished when fibroblasts from a proband with a G+5-to-T transversion of intron 25 of the COL3A1 gene were cultured at a reduced temperature (Table VI). The protein anomalies documented in these cases were similar to those already described (Section IV,B,1) in patients with deletions within the COL3A1 gene.
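Whether a deletion or a skipped exon preserves the reading frame and the Gly-X-Y repeat is a matter of divisibility: the frame is kept if whole codons (multiples of 3 nt) are removed, and whole Gly-X-Y triplets are removed if the length is also a multiple of 9 nt and the deletion starts at a triplet boundary. The sketch below applies the length test to the residue ranges listed in Table VI for exon skipping and to the deletion sizes in the text; the helper names are illustrative, and it assumes (as the Table VI ranges, which all begin at a Gly, suggest) that each segment starts at a triplet boundary.

```python
def deletion_effect(deleted_nt: int) -> str:
    """Classify an internal deletion of coding sequence by its length alone.

    Assumption: the deletion starts at the first base of a Gly codon,
    so divisibility by 9 means whole Gly-X-Y triplets are removed.
    """
    if deleted_nt % 3 != 0:
        return "frameshift"
    if deleted_nt % 9 != 0:
        return "in frame, Gly-X-Y repeat disrupted"
    return "in frame, whole Gly-X-Y triplets removed"


def range_to_nt(first_residue: int, last_residue: int) -> int:
    """Coding nucleotides for an inclusive residue range, e.g. Gly-166 to Lys-183."""
    return 3 * (last_residue - first_residue + 1)


# Residue ranges lost by exon skipping, as listed in Table VI.
skipped_exons = {16: (166, 183), 17: (184, 216), 20: (265, 282), 25: (388, 420),
                 27: (439, 456), 37: (649, 684), 41: (775, 810), 42: (811, 846)}

for exon, (first, last) in skipped_exons.items():
    nt = range_to_nt(first, last)
    print(f"skip exon {exon}: {nt} nt -> {deletion_effect(nt)}")

# Deletion sizes quoted in the text.
print("27-bp deletion:", deletion_effect(27))
print("1026-nt deletion:", deletion_effect(1026))
# Table VI range for the 7.5-kb deletion (Gly-46 to Pro-387) in nucleotides.
print("Gly-46 to Pro-387:", range_to_nt(46, 387), "nt")
```

The skipped segments come out as 54, 99, or 108 nt, all multiples of 9, and the Gly-46 to Pro-387 range corresponds to the 1026 nucleotides given for the 7.5-kb deletion. A length check alone cannot detect a deletion that is a multiple of 9 but begins mid-triplet.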
3. HELICAL GLYCINE SUBSTITUTIONS
Glycine substitutions in Gly-X-Y triplets have been characterized along the triple helix of α1(III) chains in patients with EDS type IV (Table VI). The mutations and their metabolic consequences are similar to those observed in collagens types I and II. The metabolic consequences are also similar to those observed in fibroblasts from patients with deletions or exon-skipping mutations of the COL3A1 gene. However, whereas mutant homotrimers containing truncated chains are likely to be stable and functional, it is unlikely that mutant homotrimeric molecules containing α1(III) chains bearing a glycine substitution would be functional (183).
4. NULL ALLELIC MUTATIONS
Mutations of this type have been proposed to account for the reduced production of apparently normal type-III collagen by cultured fibroblasts from some patients with EDS type IV and its variants (177).
5. GENOTYPE-PHENOTYPE RELATIONSHIPS
The clinical and pathological manifestations of EDS type IV are consistent with the known distribution of type-III collagen. As expected, the clinical phenotypes are more severe with dominant-negative mutations than with mutations that produce a functionally null COL3A1 allele. Most patients with the acrogeric form of EDS have mutations that alter the structure of the carboxy-terminal domain of the triple helix of α1(III) chains (177). In contrast, patients with milder phenotypes and predominantly vascular manifestations have mutations that alter the structure of the more amino-terminal domains of the α1(III) chains or have a quantitative deficiency due to a null COL3A1 allele (184). However, more mutations need to be defined in order to better understand the genotype-phenotype relationships and to identify patients who require more frequent clinical monitoring.
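The greater severity of dominant-negative alleles compared with null alleles follows from the trimer percentages quoted for the helical deletions. If normal and mutant α1(III) chains are made in equal amounts and associate at random, the number of mutant chains per trimer is binomially distributed, so only (1/2)^3 of the molecules are entirely normal, whereas a null allele halves output but leaves every molecule normal. The sketch below makes that comparison with exact fractions; it assumes random chain association and counts only all-normal trimers as normal collagen, simplifications the text itself qualifies by noting that mutant homotrimers of truncated chains may be functional. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def trimer_distribution(mutant_fraction: Fraction) -> list[Fraction]:
    """Fraction of trimers with k = 0, 1, 2, 3 mutant chains under random association."""
    p = mutant_fraction
    return [comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(4)]


def normal_collagen_output(allele: str) -> Fraction:
    """Normal type-III molecules relative to a wild-type cell (wild type = 1).

    'dominant_negative': both alleles expressed equally, one making a mutant chain;
    total output is unchanged, but only all-normal trimers are counted.
    'null': one allele silent; output halves and every molecule is normal.
    """
    if allele == "dominant_negative":
        return trimer_distribution(Fraction(1, 2))[0]
    if allele == "null":
        return Fraction(1, 2)
    raise ValueError(f"unknown allele type: {allele}")


dist = trimer_distribution(Fraction(1, 2))
print([str(f) for f in dist])   # ['1/8', '3/8', '3/8', '1/8']
print(dist[1] + dist[2])        # 3/4 of molecules mix normal and mutant chains
print(normal_collagen_output("dominant_negative"), normal_collagen_output("null"))  # 1/8 1/2
```

Under these assumptions a cell carrying a dominant-negative allele makes a quarter as much normal type-III collagen as a cell carrying a null allele (1/8 versus 1/2), consistent with the more severe phenotypes described above.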
V. Type-IV Collagen Genes
A. Normal Type-IV Collagen Genes
Type-IV collagen is a major constituent of basement membranes, forming supramolecular network-like structures by self-aggregation (185). It was initially characterized as a heterotrimeric molecule containing two α1(IV) chains and one α2(IV) chain, with molecular weights of 185,000 and 170,000, respectively (186). The human α1(IV) chain is 1669 residues long, with a 27-residue signal peptide, a 145-residue amino-terminal domain, a 1268-residue central triple-helical domain, and a 229-residue noncollagenous domain, NC1 (187, 188). The human α2(IV) chain contains 1712 residues that include a 36-residue signal peptide, a 147-residue amino-terminal domain, a 1302-residue triple-helical domain, and a 227-residue noncollagenous domain (189, 190). A characteristic feature of the triple-helical domain is the presence of 21 interruptions, 2-11 residues long, in the α1(IV) chain, and 23 interruptions, 1-24 residues long, in the α2(IV) chain. These interruptions give the triple helix of type-IV collagen greater flexibility than that of the fibrillar collagens (185). A striking feature of the NC1 domains of the α1(IV) and α2(IV) chains is the presence of two equal-sized internal repeats, each containing six cysteine residues. The NC1 domains have thus been suggested to have evolved by an intra- or intergenic duplication (191). The type-IV collagen molecules form aggregates in which four molecules associate alternately in parallel and antiparallel fashion through their amino-termini (7S domains), with stabilization of the complex by intramolecular and intermolecular disulfide and lysine-derived cross-links (185). Interactions of the NC1 domains produce dimers that are responsible for tail-to-tail binding of type-IV collagen molecules. The genes for the α1(IV) and α2(IV) chains are located together on chromosome 13q34 (Table II). They are arranged in opposite directions, head-to-head, and separated by 127 bp (192, 193). The intervening promoter segment contains symmetrical functional elements such as CCAAT boxes and binding sites for Sp1, but no TATA box. It is probable that the genes have a common, bidirectional promoter (193). This was the first description of two structural human genes encoding two polypeptide chains of the same protein that have overlapping 5′ flanking regions. The α3(IV) and α4(IV) chains have also been identified in glomerular and lens basement membranes, but there is no evidence that they form heterotrimers with either α1(IV) or α2(IV) chains (194-197). The Goodpasture epitope, which is involved in Goodpasture's syndrome, an autoimmune disorder of the kidney and lung, is located within the NC1 domain of the α3(IV) chain (196). The genes for the α3(IV) and α4(IV) chains are located together on chromosome 2q35-37 (Table II). The α5(IV) cDNA predicts a translation product of 1685 amino-acid residues, comprising a 26-residue signal peptide; a 1430-residue collagenous domain, which starts with a 14-residue noncollagenous sequence and continues as a Gly-X-Y repeating sequence interrupted at 22 locations; and a 229-residue carboxyl-terminal noncollagenous domain (198, 199). The noncollagenous domain has 12 conserved cysteine residues and shows 83% and 63% sequence homology to the noncollagenous domains of the α1(IV) and α2(IV) chains, respectively. The calculated molecular weight of the mature chain is 158,303, and the molecular weight (determined by immunoblot analysis) of the α5(IV) chains produced by cultured dermal fibroblasts is 185,000 (198). In the human kidney, the chain is specifically immunolocalized to the glomerulus. The COL4A5 gene is located in Xq22 (198). It is probably at least 110 kb long, and the 3′ half of its exon/intron structure is similar to that observed in the COL4A1 gene (200, 201).
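The domain sizes given above for the type-IV chains can be checked for internal consistency by summing them against the stated chain lengths, which is a useful guard against transcription errors in numbers like these. A minimal sketch of that bookkeeping follows, using only the residue counts in the text; for α5(IV) the 14-residue noncollagenous start is counted inside the 1430-residue collagenous domain, as described. The data layout and helper names are illustrative.

```python
# Residue counts as quoted in the text for the human type-IV chains.
CHAINS = {
    "alpha1(IV)": {"total": 1669,
                   "domains": {"signal": 27, "amino-terminal": 145,
                               "triple helix": 1268, "NC1": 229}},
    "alpha2(IV)": {"total": 1712,
                   "domains": {"signal": 36, "amino-terminal": 147,
                               "triple helix": 1302, "NC1": 227}},
    "alpha5(IV)": {"total": 1685,
                   "domains": {"signal": 26, "collagenous": 1430, "NC1": 229}},
}


def domain_sum_matches(chain: dict) -> bool:
    """True if the listed domains add up exactly to the stated chain length."""
    return sum(chain["domains"].values()) == chain["total"]


def mature_length(chain: dict) -> int:
    """Residues remaining after removal of the signal peptide."""
    return chain["total"] - chain["domains"]["signal"]


for name, chain in CHAINS.items():
    print(name, sum(chain["domains"].values()), chain["total"],
          domain_sum_matches(chain), mature_length(chain))
```

All three chains sum exactly (1669, 1712, and 1685 residues), so the quoted domain sizes are mutually consistent; the check says nothing about the accuracy of the underlying sequence data.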
B. Naturally Occurring Mutations of the Type-IV Collagen Genes
Mutations of the COL4A5 gene have been identified in some families with X-linked hereditary nephritis (Alport syndrome). The major phenotypic features of the Alport syndrome are hereditary glomerulonephritis, with disruption of the glomerular basement membrane and progressive loss of kidney function, and progressive sensorineural deafness (202). However, the disorder is heterogeneous, with no deafness in some families. There are also autosomal forms of the Alport syndrome that may prove to be due to mutations of the COL4A1 and COL4A2 genes on chromosome 13 or of the COL4A3 or COL4A4 genes on chromosome 2 (Table II). The COL4A5 gene and X-linked Alport syndrome both map to Xq22-q23 (202, 203). The COL4A5 gene was shown to be the Alport gene by the demonstration of major gene rearrangements and point mutations of this gene in patients with this syndrome (Table VII). Coarse screening of the COL4A5 gene by Southern blot analysis has yielded many positive results. In addition to identifying major rearrangements, it has also revealed loss or gain of restriction-endonuclease cleavage sites due to point mutations (Table VII). The mutations have usually been located and sequenced using genomic DNA rather than cDNA, because glomerular cells that functionally express COL4A5 have not been available for study. However, lymphoblast mRNA contains a small number of COL4A5 transcripts as a result of low basal or "illegitimate" transcription (204). Direct sequencing of α5(IV) cDNA amplified from lymphoblasts was used to identify a point mutation that resulted in the substitution of Gly-325 of the triple helix by Arg (204).
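Southern screening picks up some point mutations because a single base change can abolish or create a restriction-endonuclease recognition site. The sketch below compares recognition-site counts in a normal and a mutant sequence. Both sequences are hypothetical toy fragments, and the enzyme list is limited to three well-known recognition sequences (EcoRI GAATTC, TaqI TCGA, MspI CCGG); which enzymes revealed the mutations in Table VII is not stated here, and the helper names are illustrative.

```python
# Recognition sequences of three common restriction enzymes.
SITES = {"EcoRI": "GAATTC", "TaqI": "TCGA", "MspI": "CCGG"}


def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))


def site_changes(normal: str, mutant: str) -> dict[str, str]:
    """Per enzyme, report whether the mutant sequence loses or gains a site."""
    changes = {}
    for enzyme, site in SITES.items():
        before, after = count_sites(normal, site), count_sites(mutant, site)
        if after < before:
            changes[enzyme] = "site lost"
        elif after > before:
            changes[enzyme] = "site gained"
    return changes


# Hypothetical fragments differing by a single C-to-T change at index 5,
# which converts TCGA (a TaqI site) into TTGA.
normal = "AGGATCGAGGCCT"
mutant = "AGGATTGAGGCCT"
print(site_changes(normal, mutant))   # {'TaqI': 'site lost'}
print(site_changes(mutant, normal))   # {'TaqI': 'site gained'}
```

Only mutations that happen to fall within a recognition site are visible this way, which is one reason why Southern screening finds major rearrangements more readily than point mutations.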
TABLE VII
MUTATIONS OF THE COL4A5 GENE IN ALPORT'S SYNDROME (HEREDITARY NEPHRITIS)
Mutation(a) | Protein change | Protein domain | Ref.
Deletion, exons 5-10 | | | 202
Complex deletion | | | 206
Deletion of 450 kb including 12 kb of COL4A5 | | | 201
Deletion of >38 kb including 5′ end of COL4A5 | | | 291
Duplication of 35 kb | | | 201
Complex insertion/deletion of 25 kb | | | 201
Contiguous gene deletion with deletion in 5′ part of COL4A5 | | | 292
 | Gly-325 to Arg | Collagenous | 204
G-to-A | Gly-325 to Glu | Collagenous | 293
G-to-T, exon 23 | Gly-521 to Cys | Collagenous | 199
G-to-A, exon 14 | Gly-1143 to Asp | Collagenous | 205
G-to-C, exon 4 | Trp-1538 to Ser | Noncollagenous | 206
G-to-C, exon 3 | Cys-1564 to Ser | Noncollagenous | 200
(a) Exons are numbered from the 3′ end of COL4A5.
Major COL4A5 gene rearrangements appear to be more frequent than rearrangements in the fibrillar type-I, -II, and -III collagen genes. This finding may reflect a greater tolerance of the α5(IV) chain to point mutations, or it may reflect the greater ease of identifying major rearrangements by Southern blot analysis. The protein changes that result from these mutations have not been reported. It is likely, however, that the glycine substitutions within the collagenous domain behave like similar mutations in the fibrillar collagen genes (205). It was also proposed that a substitution of a highly conserved cysteine by serine in the carboxy-terminal noncollagenous domain would significantly impair intermolecular cross-linking and formation of the triple helix (200). More mutations must be defined to clarify the genotype-phenotype relationships. It is unclear why there is deafness in some families and not in others; it has been proposed that deafness may be correlated with the severity of the defect in COL4A5 (202). For example, profound deafness is a feature of a family with a deletion of exons 5 to 10, whereas hearing is normal in a family with the substitution of Gly-1143 by Asp in the collagenous domain of the α5(IV) chain (202, 205). Hematuria is a feature of females who are heterozygous for a large deletion (206).
VI. Type-VII Collagen Gene
A. The Normal Type-VII Collagen Gene
Type-VII collagen was initially isolated from chorioamniotic membranes (207). It consists of three identical α1(VII) chains. Because each chain has a molecular weight of approximately 170,000, the collagen is referred to as long-chain (LC) collagen (207). The triple-helical domain of type-VII collagen is about 1.5 times the length of the helix of type-I collagen. The molecules form disulfide-bonded dimeric aggregates (208). Keratinocytes of the skin are the major cells expressing the COL7A1 gene (209, 210). Only one mRNA, of approximately 9.5 kb, is produced (210). Analyses of α1(VII) cDNA show that the α1(VII) chain has a large collagenous domain, which is interrupted in its middle by a noncollagenous segment of 31 amino acids and by several 1- to 3-amino-acid insertions or deletions in the Gly-X-Y repeating units of the triple-helical domains (209, 210). The chain also contains N-terminal and C-terminal globular noncollagenous domains. The large N-terminal domains of type-VII collagen interact with other extracellular macromolecules within the basement membranes at the dermal-epidermal junction at one end, and with basement-membrane-like structures, known as anchoring plaques, at the other end (211, 212). The gene for type-VII collagen, COL7A1, is a single-copy gene located on chromosome 3p21.3 (209, 213). Details of the gene structure are not yet available.
B. Naturally Occurring Mutations of the Type-VII Collagen Gene
Epidermolysis bullosa is a heterogeneous disorder that consists of a group of approximately 20 hereditary and acquired diseases of the skin that share the common feature of the formation of blisters and erosions in response to minor trauma (214). The genetically determined forms can be either autosomal dominant or autosomal recessive (212). One form of epidermolysis bullosa simplex, the herpetiform variety, results from mutations of keratin-14 (215); mutations of other components of the cutaneous basement-membrane zone have been proposed in other types of epidermolysis bullosa (212). There is increasing evidence that mutations of the COL7A1 gene produce some of the recessive and dominant subtypes of dystrophic epidermolysis bullosa (212, 216). Genetic linkage of this gene to the dominant phenotype has been found in several families (217), and structurally abnormal type-VII collagen was observed in the skin of a patient with the recessive form (218, 219).
By analogy with mutations in the fibrillar collagen genes, Uitto et al. (212) speculated on the types of anchoring-fibril anomalies that might follow mutations in various domains of type-VII collagen. For example, they speculated that mutations in the triple-helical domain could prevent or delay the secretion of this collagen from basal keratinocytes, the primary cells responsible for the synthesis of anchoring fibrils (220). In support of this proposal, type-VII collagen is retained within the keratinocytes of some patients with recessive dystrophic epidermolysis bullosa (221). In contrast, mutations in the noncollagenous N-terminal domain, which is thought to interact with other components of the basement-membrane zone, may result in impaired anchoring-fibril function.
VII. Type-IX Collagen Genes
A. Normal Type-IX Collagen Genes
Type-IX collagen, a FACIT collagen, is 190 nm long and is a heterotrimer of disulfide-bonded α1(IX), α2(IX), and α3(IX) chains (222). The α2(IX) chain contains a covalently bound glycosaminoglycan (GAG) chain (223). Although unable to form quarter-staggered fibrils by itself, type-IX collagen is distributed in a periodic fashion along the surface of type-II collagen fibrils. The human molecule is kinked at about 40 nm from the globular N-terminus (223). The kink site corresponds to the NC3 domain of the molecule, whereas the other noncollagenous and collagenous domains, NC1, COL1, NC2, and COL2, are aligned along the surface of the fibril. This arrangement is stabilized by covalent cross-links between the COL2 domain of type-IX collagen and the telopeptides of type-II collagen (224, 225). In this model, the GAG chain, attached to the NC3 domain of the α2(IX) chain, is likely to occupy the gap regions of the type-II collagen fibrils. The COL3 and NC4 domains project from the surface of the fibrils, and the NC4 domain, with an estimated pI of 10.4, probably interacts with the polyanionic proteoglycans of the cartilage matrix. Type-IX collagen of hyaline cartilage exists in proteoglycan (PG-IX) and nonproteoglycan (COL-IX) forms (226, 227). The serine attachment site of the GAG is contained within the extra five residues in the NC3 domain of both the substituted and the nonsubstituted α2(IX) chains (228, 229). The existence of PG-IX and COL-IX is due not to isoforms of the α2(IX) chain but rather to the low efficiency of xylosylation of the acceptor serine residue. In contrast, all of the type-IX collagen of chick vitreous humor appears to bear an extraordinarily large GAG chain, which produces a gel-like matrix (226). The gel-like matrix of the mammalian vitreous humor, however, is due to hyaluronan; its type-IX collagen molecules do not have GAG chains as long as those observed in the chicken vitreous humor. The complete cDNA sequences of the human α1(IX) and α2(IX) and chick α3(IX) chains have also been determined (230-232). The human α1(IX) chain exists in two isoforms for which the complete cDNA sequences have been determined (230). The two forms share a common stretch of 663 amino-acid residues but differ in their N-terminal regions. The long form contains an N-terminal NC4 domain of 245 residues and a putative signal peptide of 23 residues. The NC4 domain of the long form has a calculated molecular mass of 27,430 Da. It contains four cysteine residues and has a calculated pI of 10.4. The human NC4 domain contains a potential carbohydrate attachment site that appears to be used. The short human α1(IX) isoform contains only 25 residues of non-triple-helical sequence at the N-terminus of the triple-helical COL3 domain. If the signal peptidase cleaves between residues 23 and 24, the secreted short form contains only two residues at the N-terminus of the COL3 domain. The human COL9A1 gene, located on chromosome band 6q12-14, gives rise to these two transcripts, which differ in their 5′ regions through the alternative use of two transcription start sites and RNA splicing patterns (230). The long human α1(IX) transcript contains a 5′ region encoded by exons 1-7 spliced to exon 8. The short human α1(IX) transcript contains a 5′ region encoded by an alternative exon 1*, located in the intron between exons 6 and 7, spliced to exon 8. A novel FACIT-like collagen locus (D6S228E) is also located on human chromosome 6q12-14 (233). Type-IX collagen occurs in hyaline cartilages and in a variety of nonchondrogenic tissues, such as the primary corneal stroma of the chick embryo; the vitreous humor, sclera, and neural retina of the eye (226, 234); the inner ear (235); and the notochordal sheath (236). It is known that the primary corneal stroma and the vitreous humor contain the short isoform of the α1(IX) chain (226, 237). Type-II collagen and the long-isoform α1(IX) transcripts are both located in the chondrogenic elements of the developing avian forelimb (238). In contrast, the short α1(IX) transcripts are distributed throughout the nonchondrogenic, nonmyogenic mesenchymal regions of the limb and are not detectable in the limb chondrogenic elements. Similarly, the short-isoform α1(IX) transcripts are present in the developing nonchondrogenic notochord (239). Conversely, the long α1(IX) transcripts are not detectable in the notochord or perinotochordal sheath. However, all transcripts appear in the developing chondrogenic vertebrae as well as in the chondrocranium and Meckel's cartilage. The expression of the short form in these regions is more restricted than that of the long form of α1(IX).
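The two α1(IX) isoforms differ only in their 5′ exons and N-terminal residues, and the counts given above determine how much non-helical sequence remains after signal-peptide removal. The sketch below records the 5′ exon composition of each transcript and computes the residues left in front of COL3; it assumes, as the text does conditionally, that signal peptidase cleaves after residue 23 in both isoforms. The class and field names are illustrative, not an established data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alpha1IXIsoform:
    """N-terminal layout of an alpha1(IX) isoform, using residue counts from the text."""
    name: str
    five_prime_exons: tuple[str, ...]  # exons encoding the 5' region, ending at exon 8
    non_helical_n_terminal: int        # residues before COL3, including the signal peptide

    def residues_left_after_signal(self, signal_length: int = 23) -> int:
        """Non-helical residues left N-terminal to COL3 after signal-peptide cleavage."""
        return self.non_helical_n_terminal - signal_length


# Long form: 23-residue signal peptide followed by the 245-residue NC4 domain.
LONG = Alpha1IXIsoform("long", ("1", "2", "3", "4", "5", "6", "7", "8"), 23 + 245)
# Short form: alternative exon 1* spliced to exon 8; 25 non-helical residues in all.
SHORT = Alpha1IXIsoform("short", ("1*", "8"), 25)

for iso in (LONG, SHORT):
    print(iso.name, "-".join(iso.five_prime_exons), iso.residues_left_after_signal())
```

The long form keeps the full 245-residue NC4 domain, whereas the short form keeps only two residues before COL3, which is why the short isoform effectively lacks the basic NC4 domain.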
B. Site-directed Mutagenesis of the Type-IX Collagen Genes
Disorganization of the articular cartilage and corneal stroma was observed in transgenic mice bearing a COL9A1 minigene (240). The minigene construct consisted of a hybrid human and rat α1(IX) collagen cDNA that lacked part of COL3, the entire NC3, and part of COL2. In order to obtain a high level of expression of this cDNA transcript in cartilaginous tissues, the promoter/enhancer of the rat type-II collagen gene was ligated to the 5′ end of the cDNA construct. In affected mice, the cornea was opaque, with surface irregularities and disorganization of its stroma. Although radiographic analysis up to 6 months of age showed no detectable skeletal anomalies, there was histological evidence of osteoarthritis. The articular cartilage was irregular, with loss of matrix staining and deep clefts in the articular surface.
VIII. Type-X Collagen Gene
A. The Normal Type-X Collagen Gene
Type-X collagen is a short-chain collagen with a triple-helical domain of 132 nm and globular domains at either end (241, 242). It is produced exclusively by hypertrophic chondrocytes of the growth plate and first appears in the human growth plate during the second trimester (243). Type-X collagen is found in the vicinity of type-II collagen fibrils in hypertrophic cartilage (244), in matlike aggregates in the pericellular area (245), and in the pericellular capsule of hypertrophic chondrocytes (246). It has been proposed that the matlike structures may represent a hexagonal lattice similar to that observed for another short-chain collagen, type-VIII collagen (247). Human type-X collagen is a disulfide-linked homotrimer and, after reduction, migrates as α1(X) chains of 66 kDa (242). It is uncertain whether the chain is processed to shorter forms. The human COL10A1 gene is a single-copy gene located on chromosome 6q21-22.3; the full cDNA and genomic DNA sequences are available (248, 249). The larger exon, 2940 bp, encodes 1054 bp of 3′ untranslated sequence, the complete 161-amino-acid-residue C-globular domain, the full 463-amino-acid-residue triple-helical domain, and 4⅔ amino-acid residues of the N-globular domain. The 169-bp exon encodes 33⅓ amino-acid residues of the N-globular domain, 18 amino acids of the signal peptide, and 15 bp of 5′ untranslated sequence. It is a compact gene, with a 3200-bp intron separating the two exons. The triple-helical domain contains eight imperfections in the Gly-X-Y sequence, and there are two mammalian collagenase cleavage sites. The C-globular domain also contains a putative N-linked oligosaccharide attachment site. The calculated relative molecular mass of the α1(X) chain is 59,000 Da (249). The promoter region of the mouse COL10A1 gene contains two TATA boxes separated by 165 bp (250). Both TATA boxes are active in transcription, generating two sets of transcripts with different 5′ termini. The longer transcript is of low abundance and is detectable by PCR in newborn mice. The short transcript of 3.6 kb is the major species. The COL10A1 promoter also contains a CCAAT box as well as other consensus sequence elements required for binding of transcription factors. However, these findings have not provided insight into the specific expression of the COL10A1 gene by hypertrophic chondrocytes.
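The nucleotide counts quoted above for the two COL10A1 exons imply fractional residue counts for the N-globular domain in each exon, because the single intron interrupts a codon. The sketch below reproduces that bookkeeping with exact fractions. It assumes the stop codon is counted within the 1054 bp of 3′ untranslated sequence, which is the assumption under which the 2940-bp exon sums exactly; the dictionary layout and function names are illustrative.

```python
from fractions import Fraction

# Nucleotide and residue counts for human COL10A1 as quoted in the text.
EXON_3P = {"length_bp": 2940, "utr3_bp": 1054, "c_globular_aa": 161, "helix_aa": 463}
EXON_5P = {"length_bp": 169, "utr5_bp": 15, "signal_aa": 18}


def n_globular_residues_3p() -> Fraction:
    """N-globular residues encoded by the large (3') exon.

    Assumes the stop codon lies within the 1054 bp of 3' untranslated sequence.
    """
    coding_bp = EXON_3P["length_bp"] - EXON_3P["utr3_bp"]
    other_bp = 3 * (EXON_3P["c_globular_aa"] + EXON_3P["helix_aa"])
    return Fraction(coding_bp - other_bp, 3)


def n_globular_residues_5p() -> Fraction:
    """N-globular residues encoded by the small (5') exon."""
    coding_bp = EXON_5P["length_bp"] - EXON_5P["utr5_bp"]
    return Fraction(coding_bp - 3 * EXON_5P["signal_aa"], 3)


a, b = n_globular_residues_3p(), n_globular_residues_5p()
n_globular = a + b
total = EXON_5P["signal_aa"] + n_globular + EXON_3P["helix_aa"] + EXON_3P["c_globular_aa"]
print(a, b, n_globular, total)   # 14/3 100/3 38 680
```

Here 14/3 and 100/3 correspond to 4⅔ and 33⅓ residues, which together make a 38-residue N-globular domain; the 680-residue total (including the signal peptide) is derived from these counts and assumptions, not quoted from the source.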
B. Naturally Occurring Mutations of the Type-X Collagen Gene
Using PCR and single-strand conformation polymorphism (SSCP) analysis, seven sequence changes have been identified in the coding region of the human COL10A1 gene (251). Six of these are polymorphic and have been used to demonstrate discordant segregation between the COL10A1 locus and both achondroplasia and pseudoachondroplasia. The seventh sequence change resulted in a Val-to-Met substitution in the carboxy-terminal domain of the molecule and was identified only in two persons with hypochondroplasia from a single family. Segregation analysis in this family was inconclusive. A G-to-C transversion that results in a Gly-to-Arg substitution at residue 26 of the carboxy propeptide was also detected using similar techniques (252). Analysis of a family with multiple epiphyseal dysplasia ruled out this sequence change as a cause of this autosomal-dominant disease.
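Descriptions such as "G-to-A" or "G-to-C" carry two pieces of information that can be checked mechanically: whether the change is a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion, and which amino acids a glycine codon can become after one substitution. The sketch below does both with a deliberately partial codon table covering only the codons it needs. The table entries follow the standard genetic code, but the choice of GGC as the example Gly codon is arbitrary, and the helper names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Partial standard genetic code: GGC and every codon one substitution away from it.
CODONS = {
    "GGC": "Gly", "GGA": "Gly", "GGG": "Gly", "GGT": "Gly",
    "AGC": "Ser", "CGC": "Arg", "TGC": "Cys",
    "GAC": "Asp", "GCC": "Ala", "GTC": "Val",
}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def single_base_variants(codon: str) -> dict[str, str]:
    """Amino acid for every single-base change of codon, keyed like 'G1C'."""
    out = {}
    for pos, ref in enumerate(codon):
        for alt in "ACGT":
            if alt != ref:
                mutant = codon[:pos] + alt + codon[pos + 1:]
                out[f"{ref}{pos + 1}{alt}"] = CODONS[mutant]
    return out


for change, aa in single_base_variants("GGC").items():
    print(change, aa, substitution_class(change[0], change[-1]))
```

For a GGC codon, Arg arises only from a G-to-C transversion at the first position (CGC), whereas a G-to-A transition there gives Ser (AGC); with GGA or GGG codons, a first-position G-to-A transition would instead give Arg (AGA, AGG). Gly-to-Asp, a frequent change in Tables VI and VII, requires a G-to-A transition at the second position.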
C. Site-directed Mutagenesis of the Type-X Collagen Gene
Jacenko et al. (253) investigated the phenotype produced by a site-directed, dominant-negative mutation in transgenic mice. Chick COL10A1 minigene constructs were designed to produce a truncated triple-helical domain, but with a normal C-terminus that would enable the abnormal chain to combine with normal chains. A severe spondylometaphyseal chondrodysplastic phenotype resulted. There were major changes in the growth plates and metaphyses, confirming the importance of type-X collagen in the structure and function of this region of the developing bone. The histological features, collapse of the hypertrophic zone with absence or reduction in the numbers of hypertrophic cells, were similar to those observed in human metaphyseal chondrodysplasias. The autosomal-dominant disorders metaphyseal chondrodysplasia of the Schmid type and spondylometaphyseal chondrodysplasia of the Kozlowski type have many of the same features. Unexpectedly, the transgenic mice also had abnormalities in the bone marrow and lymphoid system, tissues in which the type-X collagen minigene was not expressed. It is possible that integration of the transgene altered the function of another gene that was responsible for the lymphopoietic anomalies.
IX. Summary
It is to be expected that more collagen genes will be identified and that additional heritable connective-tissue diseases will be shown to arise from collagen mutations. Further progress will be fostered by the coordinated study of naturally occurring and induced heritable connective-tissue diseases. In some instances, human mutations will be studied in more detail using transgenic mice, while in others, transgenic studies will be used to determine the type of human phenotype that is likely to result from mutations of a given collagen gene. Further studies of the transcriptional regulation of the collagen genes will offer the prospect of therapeutic control of the expression of specific collagen genes, both in patients with genetically determined collagen disorders and in a wide range of common human diseases in which abnormal formation of the connective tissues is a feature.
REFERENCES
1. E. Vuorio and B. de Crombrugghe, ARB 59, 837 (1990).
2. M. van der Rest and R. Garrone, FASEB J. 5, 2814 (1991).
3. L. J. Sandell and C. D. Boyd, in "Extracellular Matrix Genes" (L. J. Sandell and C. D. Boyd, eds.), p. 1. Academic Press, San Diego, 1990.
4. A. Miller, in "Biochemistry of Collagen" (G. N. Ramachandran and A. H. Reddi, eds.), p. 85. Plenum, New York, 1976.
5. D. J. Prockop, R. A. Berg, K. I. Kivirikko and J. Uitto, in "Biochemistry of Collagen" (G. N. Ramachandran and A. H. Reddi, eds.), p. 163. Plenum, New York, 1976.
6. W. de Wet, M. Bernard, V. Benson-Chanda, M.-L. Chu, L. Dickson, D. Weil and F. Ramirez, JBC 262, 16032 (1987).
7. M. D'Alessio, M. Bernard, P. J. Pretorius, W. de Wet and F. Ramirez, Gene 67, 105 (1988).
8. G. Karsenty and B. de Crombrugghe, JBC 265, 9934 (1990).
9. B. Boast, M.-W. Su, F. Ramirez, M. Sanchez and E. V. Avvedimento, JBC 265, 13351 (1990).
10. D. J. Liska, J. L. Slack and P. Bornstein, Cell Regul. 1, 487 (1990).
11. A. Maatta, P. Bornstein and R. P. K. Penttinen, FEBS Lett. 279, 9 (1991).
12. A. S. Olson, A. E. Geddis and D. J. Prockop, JBC 266, 1117 (1991).
13. K. Harbers, M. Kuehn, H. Delius and R. Jaenisch, PNAS 81, 1504 (1984).
14. D. Jahner and R. Jaenisch, Nature 315, 594 (1985).
15. J. P. Thompson, C. P. Simkevich, M. A. Holness, A. H. Kang and R. Raghow, JBC 266, 2549 (1991).
16. P. Bornstein, J. McKay, J. K. Morishima, S. Devarayalu and R. E. Gelinas, PNAS 84, 8869 (1987).
17. C. M. S. Rossouw, W. P. Vergeer, S. du Plooy, M. Bernard, F. Ramirez and W. de Wet, JBC 262, 15151 (1987).
18. A. L. Sherwood and P. Bornstein, BJ 265, 895 (1990).
19. R. Ravazzolo, G. Karsenty and B. de Crombrugghe, JBC 266, 7382 (1991).
20. S. N. Maity, P. T. Golumbek, G. Karsenty and B. de Crombrugghe, Science 241, 582 (1988).
21. P. Rossi, G. Karsenty, A. B. Roberts, N. S. Roche, M. B. Sporn and B. de Crombrugghe, Cell 52, 405 (1988).
22. J. D. Ritzenthaler, R. H. Goldstein, A. Fine, A. Lichtler, D. W. Rowe and B. D. Smith, BJ 280, 157 (1991).
23. D. P. Pavlin, A. C. Lichtler, A. Bedalov, B. E. Kream, J. R. Harrison, J. R. Thomas, G. A. Gronowicz, S. H. Clark, C. O. Woody and D. W. Rowe, J. Cell Biol. 116, 227 (1992).
24. Y. Yamada, M. Mudryj and B. de Crombrugghe, JBC 258, 14914 (1983).
25. P. Bornstein, J. McKay, S. Devarayalu and S. C. Cook, NARes 16, 9721 (1988).
26. P. H. Byers, G. A. Wallis and M. C. Willing, J. Med. Genet. 28, 433 (1991).
27. W. G. Cole, in "Calcium Regulating Hormones and Bone Metabolism" (D. V. Cohn, C. Gennari and A. H. Tashjian, Jr., eds.), p. 315. Elsevier, Amsterdam, 1992.
28. D. O. Sillence, A. Senn and D. M. Danks, J. Med. Genet. 16, 101 (1979).
29. H. Kuivaniemi, G. Tromp and D. J. Prockop, FASEB J. 5, 2052 (1991).
30. D. W. Rowe, J. R. Shapiro, M. Poirier and S. Schlesinger, J. Clin. Invest. 76, 604 (1985).
31. R. J. Wenstrup, M. C. Willing, B. J. Starman and P. H. Byers, Am. J. Hum. Genet. 46, 975 (1990).
32. J. F. Bateman, M. Hannagan, S. R. Lamande, I. Moeller, D. Chan and W. G. Cole, Int. Conf. Osteogenesis Imperfecta, 4th, abstr. 2 (1990).
33. D. H. Cohn, G. Wallis, X. Zhang and P. H. Byers, Matrix 10, 236 (1990).
34. J. C. Marini, D. K. Grange, G. S. Gottesman, M. B. Lewis and D. A. Koeplin, JBC 264, 11893 (1989).
35. A. C. Nicholls, J. Oliver, D. V. Renouf, M. Keston and F. M. Pope, J. Med. Genet. 28, 757 (1991).
36. C. J. Pruchno, D. H. Cohn, G. A. Wallis, M. C. Willing, B. J. Starman, X. M. Zhang and P. H. Byers, Hum. Genet. 87, 33 (1991).
37. B. Sykes, Nature 348, 18 (1990).
38. J. F. Bateman, T. Mascara, D. Chan and W. G. Cole, BJ 217, 103 (1984).
39. J. Bonadio and P. H. Byers, Nature 316, 363 (1985).
40. J. Bonadio, K. A. Holbrook, R. E. Gelinas, J. Jacob and P. H. Byers, JBC 260, 1734 (1985).
41. D. J. Prockop, Am. J. Hum. Genet. 36, 499 (1984).
42. J. Engel and D. J. Prockop, Annu. Rev. Biophys. Biophys. Chem. 20, 137 (1991).
43. M. Pack, C. D. Constantinou, K. Kalia, K. B. Nielsen and D. J. Prockop, JBC 264, 19694 (1989).
44. A. Westerhausen, J. Kishi and D. J. Prockop, JBC 265, 13995 (1990).
45. H. P. Bachinger and J. M. Davis, Int. J. Biol. Macromol. 13, 152 (1991).
46. V. H. Rao, B. Steinmann, W. de Wet and D. W. Hollister, JBC 264, 1793 (1989).
47. W. Traub and B. Steinmann, FEBS Lett. 198, 213 (1986).
WILLIAM G . COLE
48. J. F. Bateman, D. Chan, I. D. Walker, J. G. Rogers and W. G. Cole, JBC 262, 7021 (1987).
49. J. F. Bateman, D. Chan, S. Lamande, T. Mascara and W. G. Cole, Ann. N.Y. Acad. Sci. 543, 95 (1988).
50. A. T. Baker, J. A. Ramshaw, D. Chan, W. G. Cole and J. F. Bateman, BJ 261, 253 (1989).
51. K. E. Dombrowski, B. E. Vogel and D. J. Prockop, Bchem 28, 7107 (1989).
52. R. R. Minor, M. Sippola-Thiele, J. McKeon, J. Berger and D. J. Prockop, JBC 261, 10006 (1986).
53. K. E. Kadler, A. Torre-Blanco, E. Adachi, B. E. Vogel, Y. Hojima and D. J. Prockop, Bchem 30, 5081 (1991).
54. B. E. Vogel, R. R. Minor, M. Freund and D. J. Prockop, JBC 262, 14737 (1987).
55. W. G. Cole, D. Chan, S. Lamande, T. Mascara, J. Rogers and J. Bateman, Connect. Tissue Res. 21, 91 (1989).
56. B. Steinmann, A. Nicholls and F. M. Pope, JBC 261, 8958 (1986).
57. W. G. Cole, C. W. Chow, J. G. Rogers and J. F. Bateman, J. Med. Genet. 27, 228 (1990).
58. W. G. Cole, E. Patterson, J. Bonadio, P. E. Campbell and D. W. Fortune, J. Med. Genet. 29, 112 (1992).
59. R. J. Wenstrup, P. Tsipouras and P. H. Byers, J. Clin. Invest. 78, 1449 (1986).
60. J. Bonadio, F. Ramirez and M. Barr, JBC 265, 2262 (1990).
61. R. J. Wenstrup, A. Schrago-Howe, L. Lever and C. Phillips, Int. Conf. Osteogenesis Imperfecta, 4th abstr. 58 (1990).
62. C. Genovese, A. Brufsky, J. Shapiro and D. Rowe, JBC 264, 9632 (1989).
63. G. S. Barsh, C. L. Roush, J. Bonadio, P. H. Byers and R. E. Gelinas, PNAS 82, 2870 (1985).
64. M. L. Chu, C. J. Williams, G. Pepe, J. L. Hirsch, D. J. Prockop and F. Ramirez, Nature 304, 78 (1983).
65. M. L. Chu, V. Gargiulo, C. J. Williams and F. Ramirez, JBC 260, 691 (1985).
66. M. C. Willing, D. H. Cohn, B. Starman, K. A. Holbrook, C. R. Greenberg and P. H. Byers, JBC 263, 8398 (1988).
67. P. H. Byers, B. J. Starman, D. H. Cohn and A. L. Horwitz, JBC 263, 7855 (1988).
68. D. H. Cohn, C. J. Pruchno, X. Zhang and P. H. Byers, Am. J. Hum. Genet. 47, 110 (1990).
69. J. R. Hawkins, A. Superti-Furga, B. Steinmann and R. Dalgleish, JBC 266, 22370 (1991).
70. G. A. Wallis, B. J. Starman and P. H. Byers, Am. J. Hum. Genet. 45, A228 (1989).
71. K. Molyneux, P. H. Byers and R. Dalgleish, Int. Conf. Osteogenesis Imperfecta, 4th abstr. 19 (1990).
72. D. H. Cohn, S. Apone, D. R. Eyre, B. J. Starman, P. Andreassen, H. Charbonneau, A. C. Nicholls, F. M. Pope and P. H. Byers, JBC 263, 14605 (1988).
73. M. C. Willing, D. H. Cohn and P. H. Byers, J. Clin. Invest. 85, 282 (1990).
74. T. Pihlajaniemi, L. A. Dickson, F. M. Pope, V. R. Korhonen, A. Nicholls, D. J. Prockop and J. C. Myers, JBC 259, 12941 (1984).
75. S. B. Deak, A. Nicholls, F. M. Pope and D. J. Prockop, JBC 258, 15192 (1983).
76. S. B. Deak, M. van der Rest and D. J. Prockop, Collagen Relat. Res. 5, 305 (1985).
77. J. F. Bateman, S. R. Lamande, H. H. Dahl, D. Chan, T. Mascara and W. G. Cole, JBC 264, 10960 (1989).
78. A. C. Nicholls, G. Osse, H. G. Schloon, S. Deak, J. C. Myers, D. J. Prockop, W. R. F. Weigel, P. Fryer and F. M. Pope, J. Med. Genet. 21, 257 (1984).
79. M. J. Francis, K. J. Williams, B. C. Sykes and R. Smith, Clin. Sci. 60, 617 (1981).
80. M. C. Willing, C. Pruchno and P. Byers, Am. J. Hum. Genet. 49, A434 (1991).
81. M. C. Willing, C. J. Pruchno, M. Atkinson and P. H. Byers, Am. J. Hum. Genet. 51, 508 (1992).
COLLAGEN GENES AND THEIR MUTATIONS
82. D. W. Rowe, M. L. Stover, M. McKinstry, A. Brufsky, B. Kream, S. Chipman and J. Shapiro, Int. Conf. Osteogenesis Imperfecta, 4th abstr. 57 (1990).
83. D. Goldfarb and N. Michaud, Trends Cell Biol. 1, 20 (1991).
84. P. H. Byers, P. Tsipouras, F. J. Bonadio, B. J. Starman and R. C. Schwartz, Am. J. Hum. Genet. 42, 237 (1988).
85. D. H. Cohn, B. J. Starman, B. Blumberg and P. H. Byers, Am. J. Hum. Genet. 46, 591 (1990).
86. C. D. Constantinou, M. Pack, S. B. Young and D. J. Prockop, Am. J. Hum. Genet. 47, 670 (1990).
87. A. A. Chiodo, A. Hockey and W. G. Cole, JBC 267, 6361 (1992).
88. D. Weil, M. D’Alessio, F. Ramirez, W. de Wet, W. G. Cole, D. Chan and J. F. Bateman, EMBO J. 8, 1705 (1989).
89. W. G. Cole, D. Chan, D. W. Chambers, I. D. Walker and J. F. Bateman, JBC 261, 5496 (1986).
90. M. D’Alessio, M. Bernard, P. H. Pretorius, W. de Wet and F. Ramirez, Gene 67, 105 (1988).
91. D. Weil, M. D’Alessio, F. Ramirez, B. Steinmann, M. K. Wirtz, R. W. Glanville and D. W. Hollister, JBC 264, 16804 (1989).
92. B. Steinmann, L. Tuderman, L. Peltonen, G. R. Martin, V. A. McKusick and D. J. Prockop, JBC 255, 8887 (1980).
93. R. B. Watson, G. A. Wallis, D. F. Holmes, D. Viljoen, P. H. Byers and K. E. Kadler, JBC 267, 9093 (1992).
94. A. C. Nicholls, J. Oliver, D. V. Renouf, J. McPheat, A. Palan and F. M. Pope, Hum. Genet. 87, 193 (1991).
95. N. S. Vasan, H. Kuivaniemi, B. E. Vogel, R. R. Minor, J. A. Wootton, G. Tromp, R. Weksberg and D. J. Prockop, Am. J. Hum. Genet. 48, 305 (1991).
96. D. Weil, M. Bernard, N. Combates, M. K. Wirtz, D. W. Hollister, B. Steinmann and F. Ramirez, JBC 263, 8561 (1988).
97. D. Weil, M. D’Alessio, F. Ramirez and D. R. Eyre, JBC 265, 16007 (1990).
98. M. K. Wirtz, R. W. Glanville, B. Steinmann, V. Rao and D. W. Hollister, JBC 262, 16376 (1987).
99. A. Lenaers, M. Ansay, B. V. Nusgens and C. M. Lapiere, EJB 23, 533 (1971).
100. J. R. Lichtenstein, G. R. Martin and L. D. Kohn, Science 182, 298 (1973).
101. B. V. Nusgens, C. Verellen-Dumoulin, T. Hermanns-Le, A. De Paepe, L. Nuytink, G. E. Pierard and C. M. Lapiere, Nature Genet. 1, 214 (1992).
102. L. T. Smith, W. Wertelecki, L. M. Milstone, E. M. Perry, M. R. Seashore, I. M. Braverman, T. G. Jenkins and P. H. Byers, Am. J. Hum. Genet. 51, 235 (1992).
103. H. Kuivaniemi, G. Tromp, M.-L. Chu and D. J. Prockop, BJ 252, 633 (1988).
104. M. Sippola, S. Kaffe and D. J. Prockop, JBC 259, 14094 (1984).
105. W. G. Cole, R. Jaenisch and J. F. Bateman, Q. J. Med. 70, 1 (1989).
106. A. Stacey, J. Bateman, T. Choi, T. Mascara, W. Cole and R. Jaenisch, Nature 332, 131 (1988).
107. J. S. Khillan, A. S. Olsen, S. Kontusaari, B. Sokolov and D. J. Prockop, JBC 266, 23373 (1991).
108. A. S. Olsen, A. E. Geddes and D. J. Prockop, JBC 266, 1117 (1991).
109. S. Hartung, R. Jaenisch and M. Breindl, Nature 320, 365 (1986).
110. A. Schnieke, K. Harbers and R. Jaenisch, Nature 304, 315 (1983).
111. K. Kratochwil, K. von der Mark, E. Kollar, R. Jaenisch, M. Mooslehner, M. Schwarz, K. Haase, I. Gmachl and K. Harbers, Cell 57, 807 (1989).
112. J. Bonadio, T. L. Saunders, E. Tsai, S. A. Goldstein, J. Morris-Wiman, L. Brinkley, D. F. Dolan, R. A. Altschuler, J. E. Hawkins, Jr., and J. F. Bateman, PNAS 87, 7145 (1990).
113. E. Takahashi, T. Hori, P. O’Connell, M. Leppert and R. White, Hum. Genet. 86, 14 (1990).
114. M. C. Ryan and L. J. Sandell, JBC 265, 10334 (1990).
115. L. J. Sandell, N. Morris, J. R. Robbins and M. B. Goldring, J. Cell Biol. 114, 1307 (1991).
116. K. S. Cheah, E. T. Lau, P. K. C. Au and P. P. L. Tam, Development 111, 945 (1991).
117. A. Wood, D. E. Ashhurst, A. Corbett and P. Thorogood, Development 111, 955 (1991).
118. C. T. Baldwin, A. M. Reginato, C. Smith, S. A. Jimenez and D. J. Prockop, BJ 262, 521 (1989).
119. K. Elima, T. Vuorio and E. Vuorio, NARes 15, 9499 (1987).
120. R. H. Lovell-Badge, A. Bygrave, A. Bradley, E. Robertson, R. Tilly and K. S. E. Cheah, PNAS 84, 2803 (1987).
121. W. E. Horton, Jr., L. Wang, D. Bradham, P. Precht and R. Balakir, DNA Cell Biol. 11, 193 (1992).
122. M. Vikkula, M. Metsaranta, A.-C. Syvanen, L. Ala-Kokko, E. Vuorio and L. Peltonen, BJ 285, 287 (1992).
123. J. Wu, J. Sugai, R. Yamin, M. Goldring and L. J. Sandell, Proc. Orthop. Res. Soc. 39, 118 (1993).
124. L. Wang, R. Balakir and W. E. Horton, Jr., JBC 266, 19878 (1991).
125. J. Spranger, Pathol. Immunopathol. Res. 7, 76 (1988).
126. P. Maroteaux, V. Stanescu and R. Stanescu, Clin. Orthop. Relat. Res. 114, 31 (1976).
127. D. R. Eyre, M. P. Upton, F. D. Shapiro, R. H. Wilkinson and D. F. Vawter, Am. J. Hum. Genet. 39, 52 (1986).
128. S. P. Feshchenko, I. A. Rebrin, V. P. Sokolnik, B. M. Sher, B. P. Sokolov, V. N. Kalinin and G. T. Lazjuk, Hum. Genet. 82, 49 (1989).
129. W. A. Horton, M. A. Machado, J. W. Chou and D. Campbell, Pediatr. Res. 22, 324 (1987).
130. M. Godfrey and D. W. Hollister, Am. J. Hum. Genet. 43, 904 (1988).
131. H. Vissing, M. D’Alessio, B. Lee, F. Ramirez, M. Godfrey and D. W. Hollister, JBC 264, 18265 (1989).
132. W. A. Horton, M. A. Machado, J. Ellard, D. Campbell, J. Bartley, F. Ramirez, E. Vitale and B. Lee, PNAS 89, 4583 (1992).
133. R. Bogaert, G. E. Tiller, M. A. Weis, H. E. Gruber, D. L. Rimoin, D. H. Cohn and D. R. Eyre, JBC 267, 22522 (1992).
134. R. S. Lachman, G. E. Tiller, J. M. Graham, Jr., and D. L. Rimoin, Eur. J. Radiol. 14, 1 (1992).
135. J. Spranger and H. R. Wiedemann, Helv. Paediatr. Acta 21, 598 (1966).
136. R. Stanescu, V. Stanescu, C. Bordet and P. Maroteaux, J. Rheumatol. 14, 1061 (1987).
137. P. L. Katzenstein, C. J. Malemud, M. N. Pathria, J. R. Carter, R. P. Sheon and R. W. Moskowitz, Arthritis Rheum. 33, 674 (1990).
138. L. W. Murray, J. Bautista, P. L. James and D. L. Rimoin, Am. J. Hum. Genet. 45, 5 (1989).
139. W. A. Horton, J. W. Chou, M. A. Machado and D. Campbell, Collagen Relat. Res. 5, 349 (1985).
140. I. J. Anderson, R. B. Goldberg, R. W. Marion, W. B. Upholt and P. Tsipouras, Am. J. Hum. Genet. 46, 896 (1990).
141. R. G. Knowlton, P. L. Katzenstein, R. W. Moskowitz, E. J. Weaver, C. J. Malemud, M. N. Pathria, S. A. Jimenez and D. J. Prockop, N. Engl. J. Med. 322, 526 (1990).
142. C. Sher, R. Ramesar, R. Martell, I. Learmonth, P. Tsipouras and P. Beighton, Am. J. Hum. Genet. 48, 518 (1991).
143. B. Lee, H. Vissing, F. Ramirez, D. Rogers and D. Rimoin, Science 244, 978 (1989).
144. G. E. Tiller, D. L. Rimoin, L. W. Murray and D. H. Cohn, PNAS 87, 3889 (1990).
145. L. Ala-Kokko, C. T. Baldwin, R. W. Moskowitz and D. J. Prockop, PNAS 87, 6565 (1990).
146. D. R. Eyre, M. A. Weis and R. W. Moskowitz, J. Clin. Invest. 87, 357 (1991).
147. S. A. Jimenez, L. Ala-Kokko, N. Ahmad, C. Baldwin, R. Dharmavaram, A. Reginato, R. Knowlton and D. J. Prockop, in “Articular Cartilage and Osteoarthritis” (K. Kuettner, R. Schleyerbach, J. G. Peyron and V. C. Hascall, eds.), p. 167. Raven, New York, 1992.
148. B. Steinmann, A. Nicholls and F. M. Pope, JBC 261, 8958 (1986).
149. R. W. Moskowitz, Y. L. W. Pun, S. Lie and T. M. Haqqi, Proc. Orthop. Res. Soc. 39, 669 (1993).
150. D. Chan and W. G. Cole, JBC 266, 12487 (1991).
151. J. Chelly, J. Concordet, J. C. Kaplan and A. Kahn, PNAS 86, 2617 (1989).
152. R. G. H. Cotton, N. R. Rodriguez and R. D. Campbell, PNAS 85, 4397 (1988).
153. D. Chan, T. K. F. Taylor and W. G. Cole, JBC 268, 15238 (1993).
154. D. L. Rimoin, D. C. Siggers, R. S. Lachman and R. Silberberg, Clin. Orthop. Relat. Res. 114, 70 (1976).
155. A. R. Poole, I. Pidoux, A. Reiner, L. Rosenberg, D. Hollister, L. Murray and D. Rimoin, J. Clin. Invest. 81, 579 (1988).
156. A. Winterpacht, M. Hilbert, U. Schwarze, S. Mundlos, J. Spranger and B. U. Zabel, Nature Genet. 3, 323 (1993).
157. G. B. Stickler and D. G. Pugh, Mayo Clin. Proc. 42, 495 (1967).
158. R. G. Knowlton, E. J. Weaver, A. F. Struyk, W. H. Knobloch, R. A. King, K. Norris, A. Shamban, J. Uitto, S. A. Jimenez and D. J. Prockop, Am. J. Hum. Genet. 45, 681 (1989).
159. C. A. Francomano, R. M. Liberfarb, T. Hirose, I. H. Maumenee, E. A. Streeten, D. A. Meyers and R. E. Pyeritz, Genomics 1, 293 (1987).
160. N. N. Ahmad, L. Ala-Kokko, R. G. Knowlton, S. A. Jimenez, E. J. Weaver, J. I. Maguire, W. Tasman and D. J. Prockop, PNAS 88, 6624 (1991).
161. N. N. Ahmad, D. M. McDonald-McGinn, E. H. Zackai, R. G. Knowlton, D. LaRossa, J. Dimascio and D. J. Prockop, Int. Conf. Mol. Biol. Pathol. Matrix, 4th AIII-10 (1992).
162. D. M. Brown, B. E. Nichols, T. A. Weingeist and V. C. Sheffield, Arch. Ophthalmol. 110, 1589 (1992).
163. P. Ritvaniemi, L. Ala-Kokko, J. Korkko, L. Haataja, H. Kaariainen, K. I. Kivirikko and D. J. Prockop, Int. Conf. Mol. Biol. Pathol. Matrix, 4th AIII-47 (1992).
164. L. A. Bruggeman, X. Hou-Xiang, K. S. Brown and Y. Yamada, Teratology 44, 203 (1991).
165. P. Vandenberg, J. S. Khillan, D. J. Prockop, H. Helminen, S. Kontusaari and L. Ala-Kokko, PNAS 88, 7640 (1991).
166. D. J. Prockop, JBC 265, 15349 (1990).
167. S. Garofalo, E. Vuorio, M. Metsaranta, R. Rosati, D. Toman, J. Vaughan, G. Lozano, R. Mayne, J. Ellard, W. Horton and B. de Crombrugghe, PNAS 88, 9648 (1991).
168. M. Metsaranta, S. Garofalo, G. Decker, M. Rintala, B. de Crombrugghe and E. Vuorio, J. Cell Biol. 118, 203 (1992).
169. R. Fleischmajer, R. Timpl, L. Tuderman, L. Raisher, M. Wiestner, J. S. Perlish and P. N. Graves, PNAS 78, 7360 (1981).
170. L. Ala-Kokko, S. Kontusaari, C. T. Baldwin, H. Kuivaniemi and D. J. Prockop, BJ 260, 509 (1989).
171. R. A. Janeczko and F. Ramirez, NARes 17, 6742 (1989).
172. M. P. Bernard, J. C. Myers, M.-L. Chu, F. Ramirez, E. F. Eikenberry and D. J. Prockop, Bchem 22, 1139 (1983).
173. H. R. Loidl, J. M. Brinker, M. May, T. Pihlajaniemi, S. Morrow, J. Rosenbloom and J. C. Myers, NARes 12, 9282 (1984).
174. F. Ramirez, M. Bernard, M.-L. Chu, L. Dickson, F. Sangiorgi, D. Weil, W. de Wet, C. Junien and M. Sobel, Ann. N.Y. Acad. Sci. 460, 117 (1985).
175. V. Benson-Chanda, S. Ming-Wan, D. Weil, M.-L. Chu and F. Ramirez, Gene 78, 255 (1989).
176. E. C. Ruteshouser and B. de Crombrugghe, JBC 264, 13740 (1989).
177. F. M. Pope, B. E. Kendall, G. I. Slapak, R. Kapoor, W. I. McDonald, D. A. Compston, R. Mitchell, D. T. Hope, M. W. Millar-Craig, J. C. Dean, A. W. Johnston, P. G. Lynch, P. Sarathchandra, P. Narcisi, A. C. Nicholls, A. J. Richards and J. L. MacKenzie, Br. J. Neurosurg. 5, 551 (1991).
178. A. Superti-Furga, E. Gugler, R. Gitzelmann and B. Steinmann, JBC 263, 6226 (1988).
179. B. Lee, M. D’Alessio, H. Vissing, F. Ramirez, B. Steinmann and A. Superti-Furga, Am. J. Hum. Genet. 48, 511 (1991).
180. A. J. Richards, J. C. Lloyd, P. Narcisi, P. N. Ward, A. C. Nicholls, A. De Paepe and F. M. Pope, Hum. Genet. 88, 325 (1992).
181. P. H. Byers, K. A. Holbrook, G. S. Barsh, L. T. Smith and P. Bornstein, Lab. Invest. 44, 336 (1981).
182. H. Vissing, M. D’Alessio, B. Lee, F. Ramirez, P. H. Byers, B. Steinmann and A. Superti-Furga, JBC 266, 5244 (1991).
183. W. G. Cole, A. A. Chiodo, S. R. Lamande, R. Janeczko, F. Ramirez, H.-H. Dahl, D. Chan and J. F. Bateman, JBC 265, 17070 (1990).
184. F. M. Pope, A. C. Nicholls, P. Narcisi, A. Temple, Y. Chia, P. Fryer, A. De Paepe, W. P. De Groote, J. R. McEwan, D. A. Compston, H. Oorthuys, J. Davies and D. L. Dinwoodie, Clin. Exp. Dermatol. 13, 285 (1988).
185. R. Timpl, H. Wiedemann, V. van Delden, H. Furthmayr and K. Kuhn, Eur. J. Biochem. 120, 203 (1981).
186. K. Tryggvason, P. Gehron Robey and G. R. Martin, Bchem 19, 1284 (1980).
187. D. Brazel, I. Oberbaumer, H. Dieringer, W. Babel, R. W. Glanville, R. W. Deutzmann and K. Kuhn, EJB 168, 529 (1987).
188. R. Soininen, T. Haka-Risku, D. J. Prockop and K. Tryggvason, FEBS Lett. 225, 188 (1987).
189. D. Brazel, R. Pollner, I. Oberbaumer and K. Kuhn, EJB 172, 35 (1988).
190. S. L. Hostikka and K. Tryggvason, JBC 263, 19488 (1988).
191. J. C. Myers, P. S. Howard, A. M. Jelen, A. S. Dion and E. J. Macarak, JBC 262, 9231 (1987).
192. R. Soininen, M. Huotari, M. Hostikka, D. J. Prockop and K. Tryggvason, JBC 263, 17217 (1988).
193. E. Poschl, R. Pollner and K. Kuhn, EMBO J. 7, 2687 (1988).
194. N. Turner, P. J. Mason, R. Brown, M. Fox, S. Povey, A. Rees and C. D. Pusey, J. Clin. Invest. 89, 592 (1992).
195. K. E. Morrison, M. Mariyama, T. L. Yang-Feng and S. T. Reeders, Am. J. Hum. Genet. 49, 545 (1991).
196. R. J. Butkowski, J. P. M. Langeveld, J. Weislander, J. Hamilton and B. G. Hudson, JBC 262, 7874 (1987).
197. J. Saus, J. Weislander, J. P. M. Langeveld, S. Quinones and B. G. Hudson, JBC 263, 13374 (1988).
198. S. L. Hostikka, R. L. Eddy, M. G. Byers, M. Hoyhtya, T. B. Shows and K. Tryggvason, PNAS 87, 1606 (1990).
199. J. Zhou, J. M. Hertz, A. Leinonen and K. Tryggvason, JBC 267, 12475 (1992).
200. J. Zhou, D. F. Barker, S. L. Hostikka, M. C. Gregory, C. L. Atkin and K. Tryggvason, Genomics 9, 10 (1991).
201. D. Vetrie, E. Boye, F. Flinter, M. Bobrow and A. Harris, Genomics 14, 624 (1992).
202. D. F. Barker, S. L. Hostikka, J. Zhou, L. T. Chow, A. R. Oliphant, S. C. Gerkin, M. H. Skolnick, C. L. Atkin and K. Tryggvason, Science 248, 1224 (1990).
203. J. C. Myers, T. A. Jones, E. R. Pohjolainen, A. S. Kadri, A. D. Goddard, D. Sheer, E. Solomon and T. Pihlajaniemi, Am. J. Hum. Genet. 46, 1024 (1990).
204. B. Knebelmann, G. Deschenes, G. Gros, M. C. Hors, J. P. Grunfeld, K. Tryggvason, M. C. Gubler and C. Antignac, Am. J. Hum. Genet. 51, 135 (1992).
205. J. Zhou, J. M. Hertz and K. Tryggvason, Am. J. Hum. Genet. 50, 1291 (1992).
206. H. J. Smeets, J. J. Melenhorst, H. H. Lemmink, C. H. Schroder, M. R. Nelen, J. Zhou, S. L. Hostikka, K. Tryggvason, H. H. Ropers, M. C. Jansweijer, L. A. Monnens, H. G. Brunner and B. A. van Oost, Kidney Int. 42, 83 (1992).
207. H. Bentz, N. P. Morris, L. W. Murray, L. Y. Sakai, D. Hollister and R. E. Burgeson, PNAS 80, 3168 (1983).
208. R. E. Burgeson, N. P. Morris, L. W. Murray, K. G. Duncan, D. R. Keene and L. Y. Sakai, Ann. N.Y. Acad. Sci. 460, 47 (1985).
209. M. G. Parente, L. C. Chung, J. Ryynanen, D. Woodley, K. C. Wynn, E. Bauer, M.-L. Chu and J. Uitto, PNAS 88, 6931 (1991).
210. T. Tanaka, K. Takahashi, F. Furukawa and S. Imamura, BBRC 183, 958 (1992).
211. D. R. Keene, L. Y. Sakai, G. P. Lunstrum, N. P. Morris and R. E. Burgeson, J. Cell Biol. 104, 611 (1987).
212. J. Uitto, E. A. Bauer and A. N. Moshell, J. Invest. Dermatol. 98, 392 (1992).
213. D. S. Greenspan, M. G. Byers, R. L. Eddy, G. G. Hoffman and T. B. Shows, Cytogenet. Cell Genet. 62, 35 (1993).
214. J.-D. Fine, E. A. Bauer, R. A. Briggaman, R. Pearson and V. P. Sybert, J. Am. Acad. Dermatol. 24, 119 (1991).
215. P. A. Coulombe, M. E. Hutton, A. Letai, A. Hebert, A. Paller and E. Fuchs, Cell 66, 1301 (1991).
216. A. Hovnanian, P. Duquesny, C. Blanchet-Bardon, R. G. Knowlton, S. Amselem, M. Lathrop, L. Dubertret, J. Uitto and M. Goossens, Clin. Res. 40, 188A (1992).
217. L. al-Imara, A. J. Richards, R. A. Eady, I. M. Leigh, M. Farrall and F. M. Pope, J. Med. Genet. 29, 381 (1992).
218. L. Bruckner-Tuderman, I. Rantala and T. Reunala, J. Invest. Dermatol. 98, 141 (1992).
219. E. H. Epstein, Jr., Science 256, 799 (1992).
220. M. Ryynanen, J. Ryynanen, S. Sollberg, R. V. Iozzo, R. G. Knowlton and J. Uitto, J. Clin. Invest. 89, 974 (1992).
221. L. T. Smith and V. P. Sybert, J. Invest. Dermatol. 94, 261 (1990).
222. M. K. Gordon, D. R. Gerecke, B. Dublet, M. van der Rest and B. R. Olsen, JBC 264, 19772 (1989).
223. P. Bruckner, M. Mendler, S. Huber and K. H. Winterhalter, JBC 263, 16911 (1988).
224. D. R. Eyre, S. Apone, J. J. Wu, L. H. Ericsson and K. A. Walsh, FEBS Lett. 220, 337 (1987).
225. M. van der Rest and R. Mayne, JBC 263, 1615 (1988).
226. T. Yada, M. Arai, S. Suzuki and K. Kimata, JBC 267, 9391 (1992).
227. S. Ayad, A. Marriott, V. H. Brierley and M. E. Grant, BJ 278, 441 (1991).
228. D. McCormick, M. van der Rest, J. Goodship, G. Lozano, Y. Ninomiya and B. R. Olsen, PNAS 84, 4044 (1987).
229. S. Huber, K. H. Winterhalter and L. Vaughan, JBC 263, 752 (1988).
230. Y. Muragaki, T. Kimura, Y. Ninomiya and B. R. Olsen, EJB 192, 703 (1990).
231. R. Har-el, Y. D. Sharma, A. Aguilera, N. Ueyama, J. J. Wu, D. R. Eyre, L. Juricic, S. Chandrasekaran, M. Li and H. D. Nah, JBC 267, 10070 (1992).
232. M. Perala, M. Hanninen, J. Hastbacka, K. Elima and E. Vuorio, FEBS Lett. 319, 1 (1993).
233. H. Yoshioka, H. Zhang, F. Ramirez, M.-G. Mattei, M. Moradi-Ameli, M. van der Rest and M. K. Gordon, Genomics 13, 884 (1992).
234. J. M. Fitch, A. Mentzer, R. Mayne and T. F. Linsenmayer, Dev. Biol. 128, 396 (1988).
235. N. B. Slepecky, J. E. Savage and T. J. Yoo, Acta Oto-Laryngol. 112, 611 (1992).
236. J. J. Fitch, A. Mentzer, R. Mayne and T. F. Linsenmayer, Development 105, 85 (1989).
237. K. K. Svoboda, I. Nishimura, Y. Ninomiya and B. R. Olsen, PNAS 85, 7496 (1988).
238. R. E. Swiderski and M. Solursh, Development 115, 169 (1992).
239. R. E. Swiderski and M. Solursh, Dev. Dyn. 194, 118 (1992).
240. K. Nakata, K. Yamamura, J. Miyazaki, B. R. Olsen, Y. Muragaki, Y. Yamada and T. Kimura, Proc. Orthop. Res. Soc. 38, 251 (1992).
241. G. J. Gibson, B. W. Beaumont and M. H. Flint, J. Cell Biol. 99, 208 (1984).
242. T. Kirsch and K. von der Mark, EJB 196, 575 (1991).
243. T. F. Linsenmayer, E. Gibney and T. M. Schmid, Dev. Biol. 113, 467 (1986).
244. A. R. Poole and I. Pidoux, J. Cell Biol. 109, 2547 (1989).
245. T. M. Schmid and T. F. Linsenmayer, Dev. Biol. 138, 53 (1990).
246. G. J. Gibson, C. H. Bearman and M. H. Flint, Collagen Relat. Res. 6, 163 (1986).
247. A. P. Kwan, C. E. Cummings, J. A. Chapman and M. E. Grant, J. Cell Biol. 114, 597 (1991).
248. S. Apte, M. G. Mattei and B. R. Olsen, FEBS Lett. 282, 393 (1991).
249. J. T. Thomas, C. J. Cresswell, B. Rash, H. Nicolai, T. Jones, E. Solomon, M. E. Grant and R. P. Boot-Handford, BJ 280, 617 (1991).
250. S. Apte and B. R. Olsen, Proc. Orthop. Res. Soc. 39, 117 (1993).
251. W. A. Sweetman, B. Rash, B. Sykes, P. Beighton, J. T. Hecht, B. Zabel, J. T. Thomas, R. Boot-Handford, M. E. Grant and G. A. Wallis, Am. J. Hum. Genet. 51, 841 (1992).
252. R. M. Dharmavaram, M. Peng, R. R. Strawbridge and S. A. Jimenez, BBRC 187, 420 (1992).
253. O. Jacenko, P. Luvalle, K. Solum and B. R. Olsen, Matrix, in press (1993).
254. C. Huerre, C. Junien, D. Weil, M.-L. Chu, M. M. Morabito, N. Van Cong, J. C. Myers, C. Foubert, M.-S. Gross, D. J. Prockop, A. Boue, J.-C. Kaplan, A. de la Chapelle and F. Ramirez, PNAS 79, 6627 (1982).
255. E. Retief, M. I. Parker and A. E. Retief, Hum. Genet. 69, 304 (1985).
256. E. Solomon, L. R. Hiorns, N. Spurr, M. Kurkinen, D. Barlow, B. L. M. Hogan and R. Dalgleish, PNAS 82, 3330 (1985).
257. E. Solomon, F. Hall and M. Kurkinen, Ann. Hum. Genet. 51, 125 (1987).
258. M. Mariyama, K. Zheng, T. L. Yang-Feng and S. T. Reeders, Genomics 13, 809 (1992).
259. D. S. Greenspan, M. G. Byers, R. L. Eddy, W. Cheng, S. Jani-Sait and T. B. Shows, Genomics 12, 836 (1992).
260. C. Huerre-Jeanpierre, I. Henry, M. G. Mattei, D. Weil, K. H. Grzeschik, M. L. Chu, M. Bernard, F. Ramirez and C. Junien, Cytogenet. Cell Genet. 40, 657 (1985).
261. B. S. Emanuel, L. A. Cannizzaro, J. M. Seyer and J. C. Myers, PNAS 82, 3385 (1985).
262. C. A. Francomano, G. R. Cutting, M. K. McCormick, M. L. Chu, R. Timpl, H. K. Hong and S. E. Antonarakis, Hum. Genet. 87, 162 (1991).
263. D. Weil, M. G. Mattei, E. Passage, N. Van Cong, D. Pribula-Conway, K. Mann, R. Deutzmann, R. Timpl and M. L. Chu, Cytogenet. Cell Genet. 46, 713 (1987).
264. D. Weil, M. G. Mattei, E. Passage, V. C. N’Guyen, D. Pribula-Conway, K. Mann, R. Deutzmann, R. Timpl and M. L. Chu, Am. J. Hum. Genet. 42, 435 (1988).
265. A. M. Christiano, L. C. Chung-Honet, A. Hovnanian and J. Uitto, Genomics 14, 827 (1992).
266. Y. Muragaki, M. G. Mattei, N. Yamaguchi, B. R. Olsen and Y. Ninomiya, EJB 197, 615 (1991).
267. Y. Muragaki, O. Jacenko, S. Apte, M. G. Mattei, Y. Ninomiya and B. R. Olsen, JBC 266, 7721 (1991).
268. T. Kimura, M. G. Mattei, J. W. Stevens, M. B. Goldring, Y. Ninomiya and B. R. Olsen, EJB 179, 71 (1989).
269. I. Henry, A. Bernheim, M. Bernard, M. van der Rest, T. Kimura, C. Jeanpierre, F. Barichard, R. Berger, B. R. Olsen and F. Ramirez, Genomics 3, 87 (1988).
270. I. M. Hanson, P. Gorman, V. C. Lui, K. S. Cheah, E. Solomon and J. Trowsdale, Cytogenet. Cell Genet. 51, 1010 (1989).
271. T. Kimura, K. S. Cheah, S. D. Chan, V. C. Lui, M. G. Mattei, M. van der Rest, K. Ono, E. Solomon, Y. Ninomiya and B. R. Olsen, JBC 264, 13910 (1989).
272. L. Tikka, P. Tania, H. Pirkko, D. J. Prockop and K. Tryggvason, PNAS 85, 7491 (1988).
273. T. B. Shows, L. Tikka, M. G. Byers, R. L. Eddy, L. L. Haley, W. M. Henry, D. J. Prockop and K. Tryggvason, Genomics 5, 128 (1989).
274. L. Pajunen, M. Tamminen, E. Solomon and T. Pihlajaniemi, Cytogenet. Cell Genet. 52, 190 (1989).
275. K. Huebner, L. A. Cannizzaro, E. W. Jabs, S. Kivirikko, H. Manzone, T. Pihlajaniemi and J. C. Myers, Genomics 14, 220 (1992).
276. T. C. Pan, R. Z. Zhang, M. G. Mattei, R. Timpl and M. L. Chu, PNAS 89, 6565 (1992).
277. G. Tromp, H. Kuivaniemi, A. Stacey, H. Shikata, C. T. Baldwin, R. Jaenisch and D. J. Prockop, BJ 253, 919 (1988).
278. D. J. McGookey, A. C. M. Smith, G. Waldstein and P. H. Byers, Am. J. Hum. Genet. 45, A206 (1989).
279. H. Kuivaniemi, S. Kontusaari, G. Tromp, M. Zhao, C. Sabol and D. J. Prockop, JBC 265, 12067 (1990).
280. S. Kontusaari, G. Tromp, H. Kuivaniemi, R. L. Ladda and D. J. Prockop, Am. J. Hum. Genet. 47, 112 (1990).
281. B. Lee, E. Vitale, A. Superti-Furga, B. Steinmann and F. Ramirez, JBC 266, 5256 (1991).
282. S. Thakker-Varia, D. Anderson, H. Kuivaniemi, G. Tromp, D. J. Prockop and C. A. Stolle, Matrix 10, 249 (1990).
283. T. Wu, G. Tromp, H. Kuivaniemi and D. J. Prockop, in “Miami Short Reports, Vol. 1. Advances in Gene Technology: The Molecular Biology of Human Genetic Disease. Proceedings of the Miami Bio/Technology Winter Symposium” (F. Ahmad, ed.), p. 39. (1991).
284. J. Earley, G. Tromp, H. Kuivaniemi, Z. Gatalica and D. J. Prockop, Int. Conf. Mol. Biol. Pathol. Matrix, 4th AIII-14 (1992).
285. S. Kontusaari, G. Tromp, H. Kuivaniemi, A. M. Romanic and D. J. Prockop, J. Clin. Invest. 86, 1465 (1990).
286. G. Tromp, H. Kuivaniemi, C. Stolle, F. M. Pope and D. J. Prockop, JBC 264, 19313 (1989).
287. A. J. Richards, P. N. Ward, P. Narcisi, A. C. Nicholls, J. Lloyd and F. M. Pope, Hum. Genet. 89, 414 (1992).
288. A. J. Richards, J. C. Lloyd, P. N. Ward, A. De Paepe, P. Narcisi and F. M. Pope, J. Med. Genet. 28, 458 (1991).
289. P. H. Johnson, A. J. Richards, F. M. Pope and D. A. Hopkinson, J. Inherited Metab. Dis. 15, 426 (1992).
290. S. Kontusaari, G. Tromp, H. Kuivaniemi, C. Stolle, F. M. Pope and D. J. Prockop, Am. J. Hum. Genet. 51, 497 (1992).
291. A. Renieri, M. Seri, J. C. Myers, T. Pihlajaniemi, A. Sessa, G. Rizzoni and M. De Marchi, Hum. Genet. 89, 120 (1992).
292. C. Antignac, J. Zhou, M. Sanak, P. Cochat, B. Roussel, C. Deschenes, F. Gros, B. Knebelmann, M.-C. Hors-Cayla, K. Tryggvason and M. C. Gubler, Kidney Int. 42, 1178 (1992).
293. A. Renieri, M. Seri, J. C. Myers, T. Pihlajaniemi, L. Massella, G. Rizzoni and M. De Marchi, Hum. Mol. Genet. 1, 127 (1992).
Signal-Transducing G Proteins: Basic and Clinical Implications¹

C. W. EMALA,* W. F. SCHWINDINGER,† G. S. WAND† AND M. A. LEVINE†,²

*Department of Anesthesiology and †Department of Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205

I. Guanine Nucleotide Binding Proteins .......... 81
II. Structure of α Subunits of G Proteins .......... 82
III. Function of α Subunits of G Proteins .......... 86
IV. Structure of βγ Subunits of G Proteins .......... 88
V. Function of βγ Dimers of G Proteins .......... 90
VI. .......... 92
VII. Altered G-Protein Function .......... 93
  A. Genetic Studies of Gsα .......... 94
  B. Methods of Study .......... 94
  C. Albright Hereditary Osteodystrophy .......... 95
  D. Deficient Expression or Function of Gsα as the Basis for Albright Hereditary Osteodystrophy .......... 97
  E. Sporadic Endocrine Neoplas… .......... 100
  F. McCune-Albright Syndrome .......... 100
  G. Altered Expression of G Proteins .......... 101
VIII. .......... 105
IX. Glossary .......... 106
References .......... 107
I. Guanine Nucleotide Binding Proteins

Guanine nucleotide binding proteins (GTP-binding proteins) are a superfamily of proteins that regulate diverse cellular functions. Members of this superfamily sort transmembrane signals, direct the fidelity of protein synthesis, guide vesicular transport through the cytoplasm, and control growth
¹ A glossary for this chapter appears on p. 106.
² To whom correspondence should be addressed.
Progress in Nucleic Acid Research
and Molecular Biology, Vol. 47
81
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
C. W. EMALA ET AL.
and differentiation of cells. GTP-binding proteins include a group of small (21-28 kDa) proteins that regulate intracellular functions (e.g., p21ras and EF-Tu), as well as a group of larger proteins (39-52 kDa) that regulate transmembrane signal transduction. The GTP-binding proteins use a common mechanism, the binding and hydrolysis of GTP, as a "molecular switch" to regulate their activity. This GTPase cycle is initiated by the release of GDP and the binding of GTP. The binding of GTP causes a conformational change that activates the GTP-binding protein and increases its affinity for effector proteins. The GTP-binding protein remains active until the bound GTP is hydrolyzed to GDP by an intrinsic GTPase activity of the protein.

The signal-transducing GTP-binding proteins, termed G proteins, couple receptors to diverse intracellular effectors, including second-messenger-generating enzymes and ion channels. The G proteins share a heterotrimeric structure composed of an α chain and a βγ dimer. The α chain is the most distinctive subunit: it binds and hydrolyzes GTP and enables discrimination between different receptor and effector molecules. At least 17 different α chains interact with a smaller pool of four β (1-7) and at least six γ (8-13) subunits. The βγ dimer is tightly associated with the GDP-bound α chain and facilitates interaction of the G protein with a receptor molecule. In addition to regulating α-chain function, the βγ subunits appear capable of transmitting intracellular messages in some cells and may anchor the αβγ complex to the cell membrane.
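The switch logic described above can be sketched as a small state machine. This is a toy model under stated assumptions, not a kinetic description. The class, method and state names (`GProtein`, `INACTIVE_GDP`, `EMPTY`, `ACTIVE_GTP`, `run_cycle`) are hypothetical. Each step is treated as a discrete event, and regulatory proteins, rates, and any independent signaling by βγ are ignored. The model also assumes that the α-GTP chain leaves the βγ dimer on activation and rejoins it after hydrolysis, as the chapter describes for heterotrimeric G proteins.

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE_GDP = auto()  # alpha-GDP held in the alpha-beta-gamma heterotrimer
    EMPTY = auto()         # GDP released; nucleotide site empty
    ACTIVE_GTP = auto()    # alpha-GTP, free of beta-gamma, able to engage effectors


class GProtein:
    """Toy model of the GTPase cycle of a heterotrimeric G protein (hypothetical names)."""

    def __init__(self) -> None:
        self.state = State.INACTIVE_GDP
        self.bound_to_beta_gamma = True

    def receptor_activation(self) -> None:
        # An activated receptor promotes release of GDP from the alpha chain.
        if self.state is not State.INACTIVE_GDP:
            raise ValueError(f"cannot release GDP from state {self.state.name}")
        self.state = State.EMPTY

    def bind_gtp(self) -> None:
        # GTP binding switches alpha to its active conformation; alpha-GTP leaves beta-gamma.
        if self.state is not State.EMPTY:
            raise ValueError(f"cannot bind GTP in state {self.state.name}")
        self.state = State.ACTIVE_GTP
        self.bound_to_beta_gamma = False

    def can_signal(self) -> bool:
        # Only the GTP-bound alpha chain has high affinity for effector proteins.
        return self.state is State.ACTIVE_GTP

    def hydrolyze_gtp(self) -> None:
        # The intrinsic GTPase converts GTP to GDP; alpha-GDP rejoins beta-gamma.
        if self.state is not State.ACTIVE_GTP:
            raise ValueError(f"no bound GTP to hydrolyze in state {self.state.name}")
        self.state = State.INACTIVE_GDP
        self.bound_to_beta_gamma = True


def run_cycle(g: GProtein) -> list[str]:
    """Drive one full cycle and record the state name after each step."""
    trace = [g.state.name]
    for step in (g.receptor_activation, g.bind_gtp, g.hydrolyze_gtp):
        step()
        trace.append(g.state.name)
    return trace
```

For example, `run_cycle(GProtein())` returns `['INACTIVE_GDP', 'EMPTY', 'ACTIVE_GTP', 'INACTIVE_GDP']`. Calling `bind_gtp()` on a fresh, GDP-bound model raises `ValueError`, which mirrors the point that GDP must be released before GTP can bind.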
II. Structure of α Subunits of G Proteins

The G proteins share a heterotrimeric structure consisting of an α subunit and two smaller, tightly coupled subunits, β and γ (Fig. 1). The α subunits bind guanine nucleotides, possess intrinsic GTPase activity, and couple receptors with effector molecules; some are substrates for ADP-ribosylation by bacterial toxins. The α subunits show the greatest diversity, are unique to each G protein, and are thought to confer functional specificity, allowing discrimination among multiple receptors and effectors. The 17 α chains identified so far range in size from 350 to 395 amino acids, share an overall amino-acid homology of 40%, and show 60-90% homology within subfamilies (12). On the basis of amino-acid sequence homology, the α chains have been divided into four subfamilies: (1) Gsα and Golfα; (2) Giα(1-3), Goα, Gzα, transducins 1 and 2, and gustducin; (3) Gqα, G11α, G14α, and G15α; and (4) G12α and G13α (Fig. 2) (12). Expression of some α subunits appears to be highly tissue-specific. For example, one form of transducin (Tα1) is specific to the rods, whereas a second form (Tα2) is expressed only in cones of the
SIGNAL-TRANSDUCING G PROTEINS
83
FIG. 1. A schematic representation of the predicted three-dimensional conformation of a G protein. The putative regions that interact with the receptor and effector are shown. The regions G1 through G5 are regions believed important for GDP-GTP exchange, GTP-induced conformational changes, and GTP hydrolysis. [Reprinted with permission from Ref. 80; 0 The Endocrine Society.]
retina (14, 15);Golfis expressed primarily in olfactory neuroepithelium (16); and gustducin is expressed only in taste buds of the tongue (17). Other a subunits, such as Gsa, the G p s , and G,, have a ubiquitous distribution (18). Receptor activation of G proteins facilitates exchange of bound GDP for GTP by the a subunit, and causes dissociation of the a subunit from the Py dimer (Fig. 3). The free a-GTP chain interacts with specific signal effector enzymes or ion channels to regulate their activity. The interaction of the a subunit with its effector is terminated with hydrolysis of GTP to GDP by an intrinsic GTPase. The GDP-liganded a chain reassociates with the Py dimer and the heterotrimeric G protein is ready for another cycle of receptor activation. The interaction of additional proteins that regulate the rate of GDP dissociation and GTP hydrolysis adds another level of control to G-protein activity. Guanine-nucleotide-releasingproteins (GNRPs) catalyze the release of bound GDP from the GTP-binding protein, allowing GTP to bind and activate the GTP-binding protein. Examples of GNRPs include ribosomes that enhance the release of GDP from EF-Tu during protein synthesis, and
84
C. W. EMALA ET AL.
GS
Gi
Gq
\
A
G12
io
50
1
60
io
6
!h
,
loo
'--I
Amino acid identity ('YO)
FIG. 2. Amino-acid homology of the four subfamilies of G-protein
Q
subunits.
hormone receptors that facilitate the release of GDP from a subunits of G proteins. GTPase-activatingproteins (GAPS)increase the rate of GTP hydrolysis by small GTP-binding proteins such as p21rm (19). The (Y subunit of G proteins is hypothesized to contain its own GAP region, explaining the greater kcat-c:TPof the CL chain (3-5 min-1) compared to p21ras (0.02 min-I), and also may account for its larger size. Other proteins may regulate the GTPase of at least some G proteins. For example, the y subunit of cGMP phosphodiesterase, the effector protein of transducin, functions as a GAP for transducin (20). The a subunits of many G proteins undergo posttranslational modifications. The N-myristoylation of an amino-terminal glycine in Cia, Goa, Gz a , and Gta is important for attachment of the a chain to the membrane, probably via enhanced aff;nityof the modified CY chain for the Py dimer. The function of additional modificationsof the N-terminal region of transducin by heterologous fatty acids is unclear at present (21). Modifications of G proteins that alter their function include phosphorylation of a subunits. Cia@)subunits are substrates for phosphorylation by
85
SIGNAL-TRANSDUCING G PROTEINS Hormone Receptor
1
xptor
\
FIG. 3. Schematic representation of the GDP-GTP cycle of G-protein (Y subunits. Following release of GDP, GTP binds and releases Pr. The activated (Y subunit with GTP bound then activates a second messenger until GTP is hydrolyzed to GDP, allowing p-y to again return the heterotrimer to its basal state. Additionally, liberated P-y subunits may interact with second messengers; in this example, Pr subunits can stimulate or inhibit adenylate cyclase depending on the type of adenylate cyclase present. The effects of toxin treatment or several mutations are schematically shown to block GTPase activity in G,u or the activation of G p .
protein kinase C. The basal level of phosphorylation of Gia(2)is increased in hepatocyte membranes from rats with induced diabetes, and G p can be phosphorylated on a serine by protein kinase C in platelets (22-24). Some (Y chains can be modified by bacterial toxins. The A, chain of choleragen covalently attaches an ADP-ribose to arginine 201 of the (Y subunit of G, or Golf, causing a loss of GTPase activity. The activated (Y subunit results in constitutive activation of adenylate cyclase. The resulting high levels of CAMPimpair luminal transport of salt and water, causing secretory diarrhea. Pertussis toxin, an islet-activating protein (IAP), ADP-ribosylates the cx subunit of Gi, Go, or transducin, which uncouples them from receptor activation. X-ray crystal structures of p21ra and EF-Tu have identified critical noncontiguous amino-acid sequences that form a guanine-nucleotide-binding pocket and function in the hydrolysis of GTP (25-30). Although the threedimensional conformation of G-protein (Y chains has not been described, comparison of regions of amino-acid homology between ras and a conserved (Y chain has facilitated predictions of functional domains (Fig. 1). Five regions of the polypeptide chain, designated G1 to G5, have been identified as functional domains important for GDP-GTP exchange, GTPinduced conformational changes, and GTP hydrolysis. The G1 region has the
86
C. W. EMALA ET AL.
amino-acid sequence motif GlyXXXXGlyLys(Ser/Thr). In the G1 of p21ras, amide groups of several amino acids and the €-amino group of lysine 16 form hydrogen bonds with the a and p phosphates of GTP or GDP. The G2 region, which corresponds to amino-acid residues 32-40 in p21ra8, changes conformation with the binding of GTP. The binding of the GTP analog Gpp(NH)pand a magnesium ion in the G2 region of p2lraSallows interaction of the p and y phosphates of GTP with threonine 35. The presence of Mg2+ is essential for GTP hydrolysis, and the conserved threonine residue is present in each putative G2 sequence. The G 3 region corresponding to amino-acid residues 53-62 in p21rus contains the sequence motif conserved in all GTPases, AspXXGly. A conserved aspartate binds the catalytic Mg2+ while a conserved glycine binds to the y-phosphate of GTP. Dramatic conformational change occurs in amino acids 60-63 when GDP is exchanged for GTP. The G4 region consists of the characteristic sequence motif [(Am/ Thr)(Lys/Glu)XAsp] and contains residues responsible for binding to the purine ring of guanine nucleosides and confers guanine specificity. The G5 region has not been unambiguously identified by the amino-acid sequence in all GTPase subfamilies. In p21ras, residues 144-146 of this region interact indirectly with the guanine nucleotide. The corresponding region of G-protein a chains consists of the amino-acid sequence motif Thr(Cys/Gln)Ala(ThrNalfAspThr, which may also indirectly interact with the guanine nucleotide. Several regions of a chains are important for interaction of a chains with eEector molecules (Fig. 1; see also Fig. 11). Residues 236-356 of the a chain of C,a comprise the shortest linear chimeric construct that results in activation of adenylate cyclase. 
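The consensus motifs given for the G1, G3, G4, and G5 regions can be written as regular expressions, with X (any residue) becoming `.`. The sketch below, which uses only Python's standard `re` module, scans a one-letter protein sequence for candidate matches. The patterns follow the motifs exactly as written in this section, the test sequence is an artificial string rather than a real G protein, `find_motifs` is a hypothetical helper, and G2 is omitted because no consensus is given for it. Short patterns such as AspXXGly occur frequently by chance, so hits are only candidates to be confirmed by alignment.

```python
import re

# Consensus motifs as written in the text; "X" (any residue) becomes "." in regex.
MOTIFS = {
    "G1": r"G.{4}GK[ST]",     # GlyXXXXGlyLys(Ser/Thr)
    "G3": r"D..G",            # AspXXGly
    "G4": r"[NT][KE].D",      # (Asn/Thr)(Lys/Glu)XAsp
    "G5": r"T[CQ]A[TV]DT",    # Thr(Cys/Gln)Ala(Thr/Val)AspThr (alpha-chain form)
}


def find_motifs(seq):
    """Return {motif name: [1-based start positions]} for every match, overlaps included."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in a zero-width lookahead lets overlapping matches be found.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


if __name__ == "__main__":
    toy = "GAAAAGKSPPPDLLGPPPNKPDPPPTCATDT"   # artificial sequence with one copy of each motif
    print(find_motifs(toy))
```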
Scanning mutagenesis has been used to identify four separate regions (CI-CIV) in which amino-acid substitutions prevent activation of adenylate cyclase (31). Three noncontiguous amino-acid clusters appear to form a surface within Gsα that may bind and stimulate adenylate cyclase. This surface includes a region corresponding to a domain in p21ras that undergoes a conformational change on binding of GTP.
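The turnover numbers quoted earlier in this section (kcat of 3-5 min⁻¹ for α chains versus 0.02 min⁻¹ for p21ras) can be translated into an intuitive lifetime for the GTP-bound state. Assuming simple first-order hydrolysis with no GAP present, the half-life is t½ = ln 2 / kcat; the short Python sketch below does this arithmetic. The function names are illustrative.

```python
import math

# Turnover numbers quoted in the text (min^-1).
ALPHA_KCAT_RANGE = (3.0, 5.0)   # G-protein alpha chains
RAS_KCAT = 0.02                 # p21ras


def half_life_seconds(kcat_per_min):
    """Half-life of the GTP-bound state, assuming first-order hydrolysis (t1/2 = ln 2 / k)."""
    return math.log(2) / kcat_per_min * 60.0


def fold_faster(k_fast, k_slow):
    """How many times faster one first-order process is than another."""
    return k_fast / k_slow


if __name__ == "__main__":
    for k in ALPHA_KCAT_RANGE:
        print(f"alpha chain, kcat = {k} /min: t1/2 = {half_life_seconds(k):.1f} s, "
              f"{fold_faster(k, RAS_KCAT):.0f}x faster than p21ras")
    print(f"p21ras, kcat = {RAS_KCAT} /min: t1/2 = {half_life_seconds(RAS_KCAT) / 60:.1f} min")
```

Under this assumption an α chain turns itself off within roughly 8-14 s, whereas p21ras alone would stay on for about 35 min, a 150- to 250-fold difference.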
III. Function of α Subunits of G Proteins
The α chains of G proteins have been identified by biochemical purification or molecular cloning. In many instances, molecular cloning has revealed more than one protein that might subserve the role of a specific G protein first identified by its function. For example, the initial description of an inhibitory Gi protein has led to the cloning of three distinct genes encoding three closely related Gi proteins. Further diversity derives from alternative splicing of gene products (e.g., four species of Gsα and two to four forms of Goα are generated by alternative splicing) (32-34).
G proteins couple cell-surface receptors to intracellular enzymes or channels. More than 100 distinct receptors have been shown by pharmacologic or molecular biological means to be coupled to effector molecules through G proteins. The reader is referred to a recent comprehensive review for the growing list of G-protein-coupled receptor-effector systems (13). The G-protein-coupled receptors share a common topographical organization (Fig. 4). The extracellular N-terminal domain of the receptor has sites for N-linked glycosylation. The receptor contains seven hydrophobic segments, which are predicted to span the plasma membrane. Each receptor has a cytoplasmic C-terminal domain with sites for phosphorylation on serines and threonines. The ligand-binding sites, at least for rhodopsin and the catecholamine receptors, have been identified within the plane of the plasma membrane and include important contacts on almost all transmembrane segments. Threonine and serine residues in the third cytoplasmic loop and the carboxyl-terminal tail are substrates for phosphorylation by protein kinase A, protein kinase C, and β-adrenergic receptor kinase (βARK). Phosphorylation of the receptor leads to desensitization of the hormone response. Homologous desensitization of the β-adrenergic receptor requires the binding of the protein β-arrestin following phosphorylation. The cytoplasmic loop between transmembrane domains V and VI contains amino acids essential for G-protein coupling.
FIG. 4. A schematic representation of the β-adrenergic receptor, a representative member of the G-protein-coupled receptors, which are predicted to contain seven transmembrane hydrophobic regions. βARK, β-adrenergic receptor kinase; PKA, protein kinase A.
G proteins regulate the activity of several enzymes and ion channels, including (1) cGMP-specific phosphodiesterase, which is coupled to rhodopsin (metarhodopsin II) through transducin (Gt) in the visual system; (2) hormone-sensitive adenylate cyclase, which is coupled to receptors that increase or decrease its activity through two classes of G proteins, Gs and Golf or Gi and Gz, respectively; (3) phospholipase C, which is coupled through members of the Gq family; and (4) at least six sodium, potassium, and calcium-ion channels that are coupled through α subunits, including Gs, Gi, Go, and Gk (35-38). Receptor activation in different tissues can elicit a variety of cell responses, indicating that some receptors may couple with more than one G protein. Thus, some receptors (e.g., muscarinic) can couple to homologous G proteins (e.g., Gi, Go, or G11) that inhibit adenylate cyclase or activate phospholipase C. By contrast, expression of cloned parathyroid hormone receptors (39) and calcitonin receptors (40, 41) leads to stimulation of adenylate cyclase and activation of phospholipase C, suggesting that a single receptor can couple to two divergent G proteins (e.g., Gs and a G protein linked to phospholipase C). Similar coupling abilities have recently been demonstrated for the thyrotropin receptor (42).
IV. Structure of β Subunits of G Proteins
Although early attempts to understand G-protein function focused primarily on α chains, interest in the role of the βγ dimers has grown. Four different β subunits (1-7) and six γ subunits (8-13) have been identified so far, and additional forms probably exist. Little is known about the function, tissue distribution, specificity of βγ dimerization, or specificity of coupling of βγ dimers to α subunits; however, the β1 and β2 polypeptides are immunologically (43, 44) and structurally (1-3, 45, 46) distinct and are encoded by separate genes (47). Recent immunohistochemical evidence indicates that the β3 subunit is present in cones, whereas the β1 subunit is present in rods of the retina (Fig. 5) (48, 49). This cell-specific distribution invites speculation about specific functions and coupling of the β subunits. Each β subunit consists of 340 amino acids and migrates in SDS-PAGE with an apparent mass of 35-36 kDa. Eight repeated segments of 40 amino acids (termed WD40 repeats) occur in the β chains; this motif, of unclear significance, is also found in several other, apparently unrelated proteins (12).
FIG. 5. (A) Immunoperoxidase staining of a cross-section of monkey retina with a rabbit polyclonal antibody against the G-protein β1 subunit. The large black arrow indicates a stained amacrine cell body, the small black arrow a stained ganglion cell body, and the white arrow an unstained cone photoreceptor. (B) Monkey retinal section stained with a rabbit polyclonal antibody against G-protein β3. The arrow indicates a stained bipolar cell. RPE, retinal pigment epithelium; PRL, photoreceptor layer; ONL, outer nuclear layer; OPL, outer plexiform layer; INL, inner nuclear layer; IPL, inner plexiform layer; GCL, ganglion cell layer; OFL, optic fiber layer. [Reprinted with modifications from Peng et al., Proc. Natl. Acad. Sci. U.S.A. 89, 10882 (1992).]
The six γ subunits each consist of about 70 amino acids and migrate with apparent masses of 6-10 kDa (8-13). The "CAAX" (Cys-Ala-Ala-X) carboxyl-terminal residues are the sites of posttranslational modification (50-53): the cysteine is isoprenylated, the three amino acids distal to the cysteine are removed by proteolysis, and the carboxyl group of the isoprenylated cysteine is methylated. The modified cysteine residue is thought to anchor the G-protein heterotrimer to the membrane. β and γ subunits pair selectively (Fig. 6) (54, 55). Limited tryptic proteolysis and measurement of the Stokes radii of β and γ subunits translated in vitro offer insight into the specificity of βγ dimerization: β1 dimerizes with γ1 or γ2, β2 dimerizes with γ2 but not γ1, and β3 dimerizes with neither γ1 nor γ2 in vitro. The specificity of β-γ interactions offers another level of regulation of G-protein-mediated signal-transduction pathways. Moreover, βγ dimers appear to differ in their ability to couple α subunits to receptors or to modulate effectors. βγ dimers have opposing effects on adenylate cyclase, depending on the type of adenylate cyclase present: they inhibited type-I adenylate cyclase but stimulated type-II adenylate cyclase, and these effects were accentuated by the addition of Gsα subunits (56).
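The three CAAX-processing steps (isoprenylation of the cysteine, removal of the three distal residues, and methylation of the newly exposed carboxyl group) can be illustrated as a transformation on a one-letter sequence. In the sketch below, `process_caax` and the example peptide are hypothetical; the function only records where the isoprenyl and methyl groups would sit, it does not model which isoprenoid is attached, and it treats any three residues after the cysteine as the AAX tail.

```python
import re

# A cysteine followed by exactly three residues at the C-terminus (the "CAAX box").
CAAX_BOX = re.compile(r"C[A-Z]{3}$")


def process_caax(seq):
    """Sketch CAAX processing of a one-letter protein sequence.

    Returns (mature_sequence, modifications), where modifications is a list of
    (1-based position, description) pairs on the mature chain.
    """
    if not CAAX_BOX.search(seq):
        raise ValueError("sequence does not end in a CAAX box")
    cys_position = len(seq) - 3                                    # 1-based position of the CAAX cysteine
    modifications = [(cys_position, "isoprenyl thioether")]        # step 1: isoprenylation of Cys
    mature = seq[:cys_position]                                    # step 2: the three distal residues are removed
    modifications.append((cys_position, "carboxyl methyl ester"))  # step 3: methylation of the new C-terminus
    return mature, modifications


if __name__ == "__main__":
    mature, mods = process_caax("MKTAYCAIL")   # hypothetical peptide ending in a CAAX box
    print(mature, mods)
```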
FIG. 6. Limited tryptic proteolysis provides a means to investigate the conformation of the β subunit in the presence or absence of γ. Tryptic digests of in vitro translated βγ dimers yield fragments of 26-27 kDa, which are not obtained when β monomers are digested. Lane 1, no γ subunit; lane 2, no γ subunit; lane 3, γ2 added; lane 4, γ1 added. Samples were incubated in the absence (lane 1) or presence (lanes 2, 3, 4) of 20 pmol of trypsin. The arrowhead indicates the position of the 26- or 27-kDa tryptic fragments. Subunits γ1 and γ2 dimerize with β1, γ2 dimerizes with β2, but neither γ subunit dimerizes with β3. [Reprinted with modifications from Schmidt et al., J. Biol. Chem. 267, 13807 (1992).]
The β1 and β2 subunits have a widespread tissue distribution, but the abundance of each protein varies (57), with the highest concentrations in rat brain, lung, kidney, liver, and spleen, and in human placenta. Lesser amounts are present in rat erythrocytes and in cardiac and skeletal muscle. By contrast, tissue expression of β3 and β4 appears to be more variable. Northern blot hybridization identified β3 mRNA in retina, rhabdomyosarcoma, pheochromocytoma, neuroblastoma, and dermal fibroblasts (Fig. 7) (5). Distribution of the mRNA encoding the β4 subunit varies significantly, being highest in the brain, eye, lung, heart, and testis (7). A β4 protein has not yet been identified.
V. Function of βγ Dimers of G Proteins
The functions of the βγ dimers appear diverse and include (1) binding to an inactive α-GDP subunit to reassociate the heterotrimer into its inactive state, (2) anchoring the heterotrimer to the membrane via an isoprenylated γ subunit (58), and (3) transducing the intracellular message in some cells, which may include (a) transduction of the message of the mating-pheromone receptor in yeast (59-61), (b) regulation of cardiac potassium channels (62), (c) activation of retinal and cardiac phospholipase A2 (63, 64), (d) differing direct effects on the modulation of various isoforms of adenylate cyclase (Fig. 3) (56, 65, 66), (e) assisting in the translocation of β-adrenergic receptor kinase to the membrane to facilitate β-adrenergic receptor phosphorylation and down-regulation (67), and (f) stimulation of phospholipase C in cultured granulocytes (HL-60) (68).
FIG. 7. (A) Blot hybridization analysis of bovine (lanes 1-3) and human (lanes 4 and 5) β-subunit mRNAs. Each lane contained 10 µg of total retinal RNA. The hybridization probe was the radiolabeled bovine subunit cDNA derived from clone BTdl (lanes 1-3) and cDNAs derived from human retina clones encoding the β1, β2, or β3 subunit or the 45-base oligonucleotide derived from the 3′ untranslated region of β3. (B) Blot hybridization analysis of β1 and β3 subunit mRNA in human cell lines. Poly(A)+ RNA was isolated from human cell lines and hybridized first with the radiolabeled cDNA from the human β3 subunit clone p22 (lanes 1-5). After exposure to X-ray film, the bound probe was stripped from the filter, which was then rehybridized with a radiolabeled cDNA from the human β1 subunit clone (lanes 6-10). Lanes 1 and 2, rhabdomyosarcoma; lane 3, pheochromocytoma; lane 4, neuroblastoma; lane 5, dermal fibroblast. RNA size markers are indicated at left. [Reprinted with modifications from Levine et al., Proc. Natl. Acad. Sci. U.S.A. 87, 2329 (1990).]
VI. The β3 Subunit
Our studies in a variety of tissues suggest that β3 mRNA is widely distributed, but attempts to identify the β3 protein in membrane preparations by immunoblot analysis have been unsuccessful. An affinity-purified polyclonal β3 antibody has been used to localize β3 to the cones of the retina by immunohistochemical staining (48), but immunoblotting with this antibody has not identified the protein in tissues that contain the mRNA. It is uncertain whether the ubiquitous expression of β1 and β2 proteins and the limited identification of β3 reflect a true absence of β3 protein in tissues. Alternatively, the quantity of β3 present in membranes may be too low to be detected by immunoblot, or the β3 protein may be more abundant in a soluble than in a membrane fraction of the cell. Recent immunohistochemical studies in our laboratory suggest that in some tissues β3 is limited to certain cell types (Fig. 5). If β3 is restricted to a few cells in a tissue, detection of the β3 protein by immunoblot analysis may not be possible. The β3 protein is specifically localized to the cone photoreceptor cells of the retina, and β1 is uniquely localized in the rod photoreceptor cells, as determined by immunocytochemistry with specific antisera (48). The β1 antibody specifically stained rods, and the β2 antibody weakly but specifically stained the outer plexiform layer and discrete laminae of the inner plexiform layer. The β3 antibody specifically stained the entire cone cell, including the outer segment, axon, and synaptic terminal. Double-labeling immunofluorescence confirmed reactivity of the β3 antibody with all three types of cones: red, blue, and green. These findings in the monkey retina were confirmed in the retinas of human, cattle, rabbit, rat, and ground squirrel.
The specific localization of β1 and β3 subunits to rods and cones, respectively, together with the presence of distinct transducin subunits and distinct cGMP phosphodiesterases in rods and cones, raises the intriguing possibility that β subunits may be segregated and specific to an α subunit. Testing this will require determining whether β subunits bind transducin subunits selectively. Furthermore, attempts to form dimers in vitro between β3 and γ1 or γ2 have been unsuccessful, suggesting that another γ subunit, presumably present in the cones of the retina, couples to β3 in vivo (Fig. 6). A growing complexity thus emerges in the tissue distribution and specificity of coupling of the different subunits of the heterotrimer, which affords the cell a complex array of second-messenger signaling pathways. The cell may therefore modulate its response to external stimuli not only through the types of receptors and second-messenger pathways it contains, but also through the availability of, and perhaps competition for, G-protein subunits among different signaling pathways.
VII. Clinical Implications of Altered G-Protein Function
There is growing evidence that altered function or expression of G proteins may play a role in several clinical disorders. The alterations include covalent modifications of α chains by toxins of the bacteria Vibrio cholerae and Bordetella pertussis, and mutations in genes encoding α chains that result in increased (e.g., McCune-Albright syndrome and neoplasias) or decreased (e.g., Albright hereditary osteodystrophy) function. Moreover, expression of G-protein α chains is influenced by hypo- or hyperthyroidism (69), glucocorticoid treatment (70), insulin deficiency (24), ethanol (71-75), and heart failure (76-79). A more comprehensive description of altered signal transduction in these conditions appears in two recent reviews (80, 81). Our laboratory has focused on the analysis of genetic defects in Gsα as a basis for altered regulation of hormone-sensitive adenylate cyclase and for clinical disease. The adenylate cyclase system consists of at least three distinct components: specific receptors, adenylate cyclase molecules, and G proteins. Activity of adenylate cyclase is regulated by G proteins that couple signals produced by the interaction of hormones, neurotransmitters, and other ligands with specific receptors at the cell surface. At least two G proteins, one stimulatory (e.g., Gs, Golf) and the other inhibitory (e.g., Gi, Gz), regulate the catalytic activity of adenylate cyclase. Stimulation of adenylate cyclase enhances the formation of cAMP. Subsequent actions of cAMP-dependent protein kinases lead ultimately to the physiological effects of the agonist. Stimulatory receptors (Rs) include those for β-adrenergic agonists, PTH, TSH, ACTH, glucagon, and other peptides, whereas inhibitory control is exerted by agents such as α2-adrenergic and muscarinic agonists that bind to inhibitory receptors (Ri) (82).
A. Genetic Studies of Gsα
The Gsα protein is encoded by a single gene, GNAS1, located on human chromosome 20q13.1-q13.2 (83). The GNAS1 gene spans approximately 20 kb and contains 13 exons and 12 introns (84). The nascent Gsα transcript is alternatively spliced to yield four products, by including or excluding exon 3 and by using alternative splice-acceptor sites in intron 3 to include or exclude a single serine residue (32). These four transcripts encode α chains that migrate as 45- or 52-kDa proteins and contain 380 or 381, and 394 or 395, amino acids, respectively. An alternative exon 1 (termed exon 1′) has been identified 2.5 kb 5′ of the previously described exon 1 (85). Mutations in GNAS1 are associated with several human disorders. Inherited mutations in GNAS1 that result in decreased expression or function of the Gsα protein have been identified in individuals with Albright hereditary osteodystrophy (AHO) (86). Sporadic mutations in GNAS1 that inactivate the intrinsic GTPase activity of Gsα, and thereby lead to constitutive activation of adenylate cyclase, have been identified in a subset of sporadic human pituitary (87) and thyroid adenomas (88). Similar activating mutations of GNAS1 have been identified in tissues from individuals with McCune-Albright syndrome (MAS). The mosaic distribution of the abnormal GNAS1 allele may account for the unusual distribution of bone lesions, cutaneous hyperpigmentation, and endocrine neoplasia in these patients (89, 90).
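The four Gsα products follow from two independent binary splicing choices. The sketch below enumerates them. The 14-residue contribution of exon 3 is not stated in the text; it is inferred here from the quoted lengths (394 minus 380), and the 45/52-kDa assignment follows the "respectively" pairing in the text. The constant and function names are illustrative.

```python
from itertools import product

SHORT_FORM_LEN = 380          # residues without exon 3 or the extra serine (from the text)
EXON3_RESIDUES = 394 - 380    # inferred from the quoted lengths, not stated directly
EXTRA_SERINE = 1              # alternative splice-acceptor site in intron 3 adds one serine


def gs_alpha_isoforms():
    """Enumerate the four Gs-alpha splice products as (exon3, serine, length, apparent size)."""
    isoforms = []
    for has_exon3, has_serine in product((False, True), repeat=2):
        length = SHORT_FORM_LEN + EXON3_RESIDUES * int(has_exon3) + EXTRA_SERINE * int(has_serine)
        apparent_size = "52 kDa" if has_exon3 else "45 kDa"   # per the text's "respectively"
        isoforms.append((has_exon3, has_serine, length, apparent_size))
    return isoforms


if __name__ == "__main__":
    for exon3, serine, length, size in gs_alpha_isoforms():
        print(f"exon 3 {'+' if exon3 else '-'}, Ser {'+' if serine else '-'}: {length} aa, ~{size}")
```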
B. Methods of Study
Mutational analysis of the GNAS1 gene has been greatly facilitated by the advent of the polymerase chain reaction (PCR) and the development of methods for rapidly screening amplified DNA fragments for mutations. We have used denaturing gradient gel electrophoresis (DGGE) to analyze amplified segments of the GNAS1 gene from patients with AHO and MAS (Fig. 8). Pairs of specific oligonucleotide primers were synthesized for each of the 13 exons of Gsα. These oligonucleotides corresponded to intron sequences that flank each exon. To optimize analysis of each amplified melting domain, the MELTMAP 87 program (91) was used to design primer pairs, and one oligonucleotide of each pair was synthesized with a 5′ GC-rich clamp. PCR fragments were amplified from DNA prepared from peripheral leukocytes of patients and normal controls. These fragments contained one or two exons, the splice-donor and -acceptor sites, and flanking intron sequence. The fragments were then analyzed by DGGE (Fig. 8). This technique uses a polyacrylamide gel containing a gradient of denaturants (urea and formamide) whose concentration increases toward the bottom of the gel. The double-stranded DNA fragments migrate through the gel until they become partially denatured, at which point their migration is greatly retarded. A single base-pair mismatch can alter the melting point of a DNA fragment enough to change its migration through the gel. The sensitivity of this technique is improved when the mutation is present in only one allele of the gene. Under these circumstances, the PCR product is resolved into four distinct bands on DGGE, representing two homoduplex and two heteroduplex fragments that are formed during the PCR. Mutations identified by PCR and DGGE can then be sequenced.
FIG. 8. Analysis of genomic DNA for point mutations. [Schematic: alleles A and B of genomic DNA are amplified by PCR, and the products are compared on NDGE and DGGE gels.] Genomic DNA was amplified by polymerase chain reaction using primer pairs in which one oligonucleotide had a 40-base G+C extension at the 5′ end (GC clamp). PCR products were then subjected to electrophoresis at 60°C in polyacrylamide gels containing a linearly increasing gradient of formamide and urea. Homozygous alleles result in a single band (denoted A or B), whereas heterozygous (A and B) alleles yield two homoduplexes and two more slowly migrating heteroduplex bands. DGGE, denaturing gradient gel electrophoresis.
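Why a heterozygous sample gives four DGGE species while a homozygous one gives a single band can be checked by enumerating how denatured PCR strands reanneal. The sketch below is a counting exercise only: it assumes that any top strand can anneal to any bottom strand of the same fragment, it does not model melting behavior, and the allele labels and function names are placeholders.

```python
from itertools import product


def reannealed_duplexes(genotype):
    """List the duplex species formed when PCR strands from `genotype` reanneal.

    `genotype` is a pair of allele labels, e.g. ("A", "B"). Each distinct allele
    contributes a top (+) and a bottom (-) strand; a duplex pairs one top strand
    with one bottom strand (simplifying assumption: every pairing is possible).
    """
    alleles = sorted(set(genotype))
    species = []
    for top, bottom in product(alleles, repeat=2):
        kind = "homoduplex" if top == bottom else "heteroduplex"
        species.append((f"{top}+/{bottom}-", kind))
    return species


def count_kinds(species):
    """Count homoduplexes and heteroduplexes in a list of duplex species."""
    counts = {"homoduplex": 0, "heteroduplex": 0}
    for _, kind in species:
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    for genotype in (("A", "A"), ("A", "B")):
        species = reannealed_duplexes(genotype)
        print(genotype, [name for name, _ in species], count_kinds(species))
```

The two heteroduplexes (A+/B- and B+/A-) carry different mismatched base pairs, which is why they can resolve as two separate bands.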
C. Albright Hereditary Osteodystrophy
In 1942, Albright et al. (92) described a group of patients with clinical and biochemical features of hypoparathyroidism (i.e., hypocalcemia and hyperphosphatemia) who lacked the normal phosphaturic response to infusion of parathyroid extract. They proposed that the biochemical hypoparathyroidism in these patients was caused not by a deficiency of parathyroid hormone (PTH) but by resistance of the target tissues, bone and kidney, to the biological actions of PTH. This condition was therefore termed pseudohypoparathyroidism (PHP). These patients also had a distinctive constellation of developmental and skeletal defects, including round facies, short stature, brachydactyly, obesity, heterotopic ossification, and, in some patients, mental retardation (Fig. 9). Subsequently, other subjects were identified who had the same physical defects but lacked evidence of hormone resistance; this disorder was termed pseudopseudohypoparathyroidism (pseudoPHP) (93). This constellation of skeletal and developmental abnormalities, with or without hormone resistance, is now termed Albright hereditary osteodystrophy.
FIG. 9. A subject with pseudohypoparathyroidism type Ia demonstrating the physical stigmata of Albright hereditary osteodystrophy.
There is considerable variability in the clinical presentation of AHO, even among affected members of the same family. Round facies, short stature, and brachydactyly may be obvious in some subjects and subtle in others. Heterotopic ossification occurs in approximately half the patients, and intelligence ranges from profound mental retardation to normal. Hormone resistance is also variable: it does not occur in all target tissues, nor does it occur in all patients with AHO.
D. Deficient Expression or Function of Gsα as the Basis for Albright Hereditary Osteodystrophy
PHP was the first recognized human disease to be ascribed to diminished responsiveness of otherwise normal target organs to a hormone. Characterization of the molecular basis for PTH resistance in PHP began when it was shown (94) that the actions of PTH on its target organs, bone and kidney, are mediated by cyclic 3′,5′-adenosine monophosphate (cAMP), and that PTH infusion in humans leads to increased urinary excretion of nephrogenous cAMP (95). Patients with PHP type 1¹ have a markedly blunted urinary cAMP response to exogenous PTH. This observation first suggested that PTH resistance in PHP type 1 results from a defect in the hormone-receptor-adenylate cyclase complex that produces cAMP. Patients with AHO have an approximately 50% reduction in Gsα activity, as determined by functional assays of membranes from a variety of tissues, including red blood cells, lymphocytes, and fibroblasts (96). In most cases, these functional deficiencies are accompanied by decreased levels of Gsα protein (Fig. 10) (97) and mRNA (98). In other cases, levels of Gsα protein or mRNA may be normal despite a functional deficiency of Gsα. Multiple distinct mutations that account for Gsα deficiency have been identified in the GNAS1 gene, including missense mutations in the coding sequence, mutations in sequences necessary for correct splicing, and small deletions (Fig. 11) (86, 99, 100). Thus, the autosomal dominant inheritance of AHO and Gsα deficiency is explained by transmission of heterozygous mutations in the GNAS1 gene. The inheritance of AHO has generated considerable debate. The reported female:male ratio of 2:1 among affected individuals had suggested that AHO is an X-linked disorder. However, examples of male-to-male transmission have been documented, and the mode of inheritance is currently believed to be autosomal dominant with sex modification (98). This is consistent with the recent association of AHO with inherited defects in the gene for Gsα (GNAS1), located on chromosome 20 (101).
¹ In contrast to individuals with PHP type 2, a much rarer condition in which the urinary cAMP response to PTH is normal but the phosphaturic response is deficient (137).
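The transmission of a heterozygous GNAS1 mutation can be illustrated with a one-locus Punnett enumeration. The sketch below assumes plain Mendelian segregation of an autosomal allele; the allele symbols ("+" normal, "m" mutant) and function names are hypothetical, and the model deliberately does not capture the parent-of-origin pattern of PHP and pseudoPHP discussed later in this section, which simple Mendelian arithmetic does not explain.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for one autosomal locus under Mendelian segregation.

    Each parent is a two-character string of allele symbols, e.g. "+m".
    """
    counts = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))   # order-independent genotype label
        counts[genotype] = counts.get(genotype, 0) + 1
    total = len(parent1) * len(parent2)
    return {g: Fraction(n, total) for g, n in counts.items()}


def affected_probability(distribution, mutant="m"):
    """Probability of carrying at least one mutant allele (dominant model)."""
    return sum((p for g, p in distribution.items() if mutant in g), Fraction(0))


if __name__ == "__main__":
    # Heterozygous affected parent ("+m") x unaffected parent ("++").
    dist = offspring_genotypes("+m", "++")
    print(dist, affected_probability(dist))
```

Under this model each child of an affected heterozygous parent has a 1/2 chance of inheriting the mutant allele, regardless of the child's sex.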
FIG. 10. Biological and immunological activity of erythrocyte Gsα protein in normal subjects and in patients with AHO. [Scatter plot; x-axis: Gsα bioactivity (% normal), 0-125.]
Patients with PHP type Ia have deficient expression of Gsα and show resistance not only to PTH but also to other hormones and neurotransmitters that activate adenylate cyclase. Many patients have biochemical hypothyroidism, with low serum levels of T4, an elevated serum level of TSH, and a normal-sized thyroid gland. These subjects show an exaggerated release of pituitary thyrotropin in response to TRH administration but fail to show appropriate thyroid release of T3 (102). These observations are consistent with resistance of thyroid follicular cells to TSH (103). Many subjects with PHP type Ia have hypogonadism, oligomenorrhea, or amenorrhea with elevated levels of plasma gonadotropins; exaggerated responses to GnRH (gonadotropin-releasing hormone) are common.
Several clinical conundrums remain despite advances in our knowledge of the molecular basis for Gsα deficiency. Patients with PHP type Ia do not show abnormal responses to all hormones that stimulate adenylate cyclase. For example, they have normal physiologic responses to ACTH and AVP. Some of this variability may be explained by differences in the level at which the response is tested. PHP type Ia patients are considered to have impaired responses to glucagon and isoproterenol when plasma cAMP levels are analyzed, but normal responses when the physiologic effects (e.g., blood glucose) are measured. Thus, an apparently normal hormone response in some tissues may indicate that less cAMP is required to produce a physiological effect. Alternatively, it remains unknown whether Gsα deficiency impairs adenylate cyclase activity in all tissues to an equivalent degree. Perhaps even more perplexing is the existence of pseudoPHP. Patients with pseudoPHP, compared with their relatives with PHP type Ia, have similar degrees of Gsα deficiency and have mutations in GNAS1. Surprisingly, in these subjects Gsα deficiency is not associated with hormone resistance. Even more perplexing is the observation that individuals with PHP and individuals with pseudoPHP occur in a single family, but PHP type Ia and pseudoPHP do not occur in the same generation. Moreover, there are no examples of affected males having progeny with PHP, or of affected females having offspring with pseudoPHP. We have no explanation for these observations.
FIG. 11. Gsα gene and protein in AHO. [Schematic: exons of the Gsα gene with the positions of the point mutations (T→C, A→G, T→G, G→A, C→T; missense changes marked with asterisks), the 4-bp deletions, and the 1-bp deletion; the Gsα protein is drawn below from the NH2 to the COOH terminus.] The upper diagram depicts the human Gsα gene, which spans over 20,000 bp and contains 13 exons and 12 introns. Ten distinct mutations have been identified in affected members of ten unrelated families. The mutation in exon 1 eliminates the initiator methionine codon and prevents synthesis of a normal Gsα protein (86). The 4-bp deletions in exon 7 (134) and exon 8 (100), and the 1-bp deletion in exon 10, all shift the normal reading frame and prevent normal mRNA and/or protein synthesis. Mutations in intron 3 and at the donor splice junction between exon 10 and intron 10 cause splicing abnormalities that prevent normal mRNA synthesis (99). The four mutations indicated with an asterisk are missense mutations (100, 135, 136); the resulting amino-acid substitutions are indicated in the schematic diagram of the Gsα protein at the bottom of the figure. Some of these mutations may prevent normal protein synthesis by altering protein secondary structure, but the Arg→His substitution in exon 13 appears to encode an altered protein that cannot couple normally to receptors (135).
E. Sporadic Endocrine Neoplasia

Somatic mutations in Gsα have been identified in a subset of human growth hormone-secreting pituitary tumors (88) and in autonomously functioning human thyroid tumors (104, 105). These point mutations result in specific amino-acid replacements, including Arg201 → Cys, Arg201 → His, and Gln227 → Arg, that reduce the intrinsic GTPase activity of Gsα, thereby activating Gsα and enabling it to stimulate adenylate cyclase constitutively. These observations have led to the concept that activating mutations convert the Gsα gene into a putative oncogene termed gsp (106). Analogous mutations in the residue cognate to Arg-201 have been found in the Giα(2) chain in tumors of the human ovary and adrenal cortex (gip2) (104).
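Why a lower GTPase rate leaves Gsα switched on can be illustrated with a deliberately minimal two-state model. This is a sketch under stated assumptions, not a model taken from the cited studies: Gsα is assumed to cycle only between an inactive GDP-bound and an active GTP-bound form, with one first-order activation (nucleotide exchange) rate and one first-order hydrolysis rate. The function name `active_fraction` and all rate values are hypothetical and in arbitrary units. At steady state, k_exchange·(inactive fraction) = k_hydrolysis·(active fraction), so the active fraction is k_exchange / (k_exchange + k_hydrolysis).

```python
def active_fraction(k_exchange: float, k_hydrolysis: float) -> float:
    """Steady-state fraction of G-alpha in the GTP-bound (active) state.

    Hypothetical two-state cycle (an illustrative assumption):
        G-alpha.GDP --k_exchange--> G-alpha.GTP --k_hydrolysis--> G-alpha.GDP
    Balancing inflow and outflow of the GTP-bound state gives
        k_exchange * (1 - f) = k_hydrolysis * f
    so f = k_exchange / (k_exchange + k_hydrolysis).
    """
    if k_exchange < 0 or k_hydrolysis < 0 or k_exchange + k_hydrolysis == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_exchange / (k_exchange + k_hydrolysis)


# Hypothetical rates in arbitrary units, chosen only for illustration.
K_EXCHANGE = 1.0
wild_type = active_fraction(K_EXCHANGE, k_hydrolysis=20.0)
# A mutation assumed to lower the GTPase rate 20-fold (hypothetical magnitude).
mutant = active_fraction(K_EXCHANGE, k_hydrolysis=1.0)

print(f"wild-type active fraction: {wild_type:.3f}")        # 0.048
print(f"GTPase-deficient active fraction: {mutant:.3f}")    # 0.500
```

With the same activation rate, lowering the hydrolysis rate raises the steady-state active fraction from about 5% to 50% in this toy setting, which is the qualitative sense in which loss of GTPase activity makes Gsα constitutively active. Real Gsα kinetics also depend on receptors, βγ subunits, and effectors, none of which this sketch includes.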
F. McCune-Albright Syndrome

McCune-Albright syndrome (MAS) is characterized by the clinical triad of cutaneous hyperpigmentation, polyostotic fibrous dysplasia, and endocrine dysfunction. The pigmented cutaneous lesions, termed café-au-lait spots, have irregular borders that have been likened to the “coast of Maine.” The lesions typically display a segmental distribution that follows the developmental lines of Blaschko (106). Based on this peculiar distribution of the cutaneous lesions, it was suggested that MAS results from a postzygotic somatic mutation in an unknown gene, leading to a mosaic distribution of affected cells (107, 108). An analogous pattern of variable involvement of hormonally responsive cells occurs in subjects with MAS. The metabolic abnormalities are characterized by excessive function of the responsive cells and are associated with endocrine syndromes including precocious puberty, hyperthyroidism, hypercortisolism, growth hormone excess, and hyperprolactinemia. The bone lesions of MAS (polyostotic fibrous dysplasia) bear considerable resemblance to those of hyperparathyroidism. These diverse metabolic abnormalities have in common the involvement of cells that respond to extracellular signals through activation of the hormone-sensitive adenylate cyclase system.
SIGNAL-TRANSDUCING G PROTEINS
However, these metabolic disturbances are not accompanied by elevated plasma concentrations of the relevant trophic or stimulatory hormones. Thus, MAS patients with precocious puberty show ovarian enlargement with development and involution of ovarian cysts, but have prepubertal levels of luteinizing hormone (LH) and follicle-stimulating hormone (FSH) and a prepubertal LH response to luteinizing-hormone-releasing hormone (LHRH) (109). Patients with hyperthyroidism show thyroid enlargement with thyroid cysts and have suppressed levels of TSH (107). The few patients with cortisol excess have been found to have adrenal hyperplasia and undetectable levels of ACTH (110). Finally, excessive secretion of GH by subjects with MAS is biochemically indistinguishable from that in patients with autonomous GH-secreting pituitary tumors (111). These observations led to the speculation that MAS is caused by a lesion that results in constitutive activation of adenylate cyclase. This speculation has recently been confirmed by the description of mutations in the GNAS1 gene, which encodes Gsα, in patients with MAS (89, 90). These mutations occur in the codon for Arg-201 in exon 8 and result in substitution of His or Cys at that position. They are identical to the mutations present in many human endocrine tumors and inhibit the GTPase activity of Gsα, thereby causing constitutive activation of adenylate cyclase. Expression of the GNAS1 Arg-201 mutation is greatest in cells from affected tissues and is absent, or present in low copy number, in apparently normal tissues (89, 90). Differential levels of mutant Gsα alleles have been demonstrated in ovarian tumors (90), café-au-lait lesions (89), fibrous dysplasia (112), and nonendocrine tissue (112; W. F. Schwindinger, unpublished).
These findings suggest that the molecular basis of MAS is a heterozygous somatic mutation in GNAS1 that occurs early in development and results in mosaicism. The mosaic distribution of cells bearing the mutant GNAS1 allele readily explains the observed clinical variability and the noninheritance of MAS.
G. Altered Expression of G Proteins

Expression and/or function of G proteins is altered by many drugs, hormones, and metabolic states. We have shown that levels of Giα(2) and Giα(3) proteins and mRNAs are greater in cardiac ventricles from hypothyroid animals than in those from euthyroid animals (69). Immunoreactive β1 and β2 polypeptides were also increased in the hypothyroid state, in parallel with comparable increases in the relative levels of β1 and β2 mRNAs. The amounts of Giα(2), Giα(3), β1, and β2 did not differ between the euthyroid and hyperthyroid states. In addition, steady-state levels of Gsα protein and mRNA were not altered by thyroid hormone status. Interestingly, these perturbations in G-protein expression were not associated with alterations in adenylate cyclase activity.
Glucocorticoids can also regulate expression of G proteins. Dexamethasone increases expression of Gsα in the rat pituitary GH4 cell line (70). Administration of corticosterone for 7 days increased Gsα levels in the frontal cortex of rats (113). Levels of Gsα fall in adipocytes following adrenalectomy and return to normal with glucocorticoid replacement (114). Moreover, glucocorticoids increase adenylate cyclase activity in astrocytoma cells and increase adenylate cyclase activity and Gsα expression in vascular smooth muscle (115, 116). Induction of insulin deficiency in rats by treatment with streptozotocin leads to tissue-specific changes in the expression of selected G proteins (24). Insulin deficiency is associated with decreased levels of mRNA and protein for Giα(2), Giα(3), and Gsα in hepatocytes, whereas levels of Giα(3) are increased in adipocytes. In these experimental models of diabetes mellitus, reduced expression of Gi results in decreased inhibition of adenylate cyclase and increased cAMP production.

In addition to hormones, many drugs influence G-protein expression. For example, chronic lithium treatment decreases levels of mRNA and protein for Giα(1) and Giα(2) without affecting levels of other G-protein subunits, including Gsα, Goα, and β subunits (117). Both cocaine and morphine alter expression of G proteins in specific regions of the brain. We have focused on the effects of ethanol on expression of G proteins in the pituitary, central nervous system, and peripheral blood lymphocytes. The effects of ethanol on a cell are not the result of nonspecific perturbations in membrane lipid structure but require interaction of ethanol with specific lipid-protein domains or with hydrophobic regions of particular proteins. A primary action of ethanol is to promote the interaction of Gsα with adenylate cyclase. The result may be increased or decreased adenylate cyclase activity, depending on whether the ethanol exposure is acute or chronic.
Acute ethanol exposure augments adenylate cyclase activity in many systems (118-122). By contrast, chronic ethanol exposure generally reduces adenylate cyclase activity in both striatal tissue and cerebral cortex of mice and rats (118131). Because the primary site of action of ethanol appears to be Gs, such an adaptation could resemble the changes that occur during heterologous desensitization (75). This appears to be the case, as further supported by observations in cultured cells (74). For example, in N1E-115 neuroblastoma cells, ethanol acutely augments cAMP production in response to prostaglandin E1 (PGE1) (74). However, when the cells were incubated in ethanol for 7 days, the responses to PGE1 and adenosine analogs were significantly decreased. A similar change was observed in NG108-15 cells (75): ethanol acutely enhanced cAMP production in response to adenosine analogs, but after chronic ethanol exposure the response to adenosine analogs was decreased.

What could account for the ability of ethanol to produce an effect that resembles heterologous desensitization (i.e., a reduced response to agonists acting at PGE and adenosine receptors after chronic ethanol exposure)? In NG108-15 cells, the ethanol-induced reduction in adenylate cyclase activity coincides with diminished synthesis of Gsα; levels of both Gsα protein and mRNA are reduced (75). In addition, the activity of Gsα extracted from ethanol-exposed cells is lower than that from control cells, as estimated from reconstitution assays in S49 cyc⁻ cells (75). However, the effect of ethanol on adenylate cyclase activity may vary in other cell types. For example, exposure of N1E-115 cells to ethanol for 48 hours leads to a large increase in Giα that is associated with a decreased cAMP response to PGE and N6-2-phenylisopropyladenosine (PIA), but not to cholera toxin (74, 132). If N1E-115 cells are incubated with ethanol for longer periods, a 60% decrease in Gsα is also observed. In a third cell line, N18TG2, ethanol exposure had no significant effect on agonist-stimulated cAMP production or on the quantity of either G protein. These studies emphasize that different cell types can respond differently to chronic ethanol treatment.

We have investigated the effects of chronic ethanol exposure on G-protein expression in the anterior pituitary of the Long-Sleep (LS) and Short-Sleep (SS) lines of mice. Using proopiomelanocortin (POMC) biosynthesis, a biological indicator that depends on activation of adenylate cyclase, we found that acute ethanol stimulates POMC production in both lines of mice (133). However, whereas POMC production is sustained during chronic ethanol exposure in SS mice, hormonal tolerance develops in LS mice.
In this setting, hormonal tolerance is defined as the loss of ethanol's ability to elevate POMC mRNA levels in the LS line. Development of tolerance coincided with a reduced ability of GTPγS to stimulate adenylate cyclase activity and with reduced levels of Gsα protein. In the anterior pituitary, POMC gene transcription and biosynthesis of POMC-derived peptides are regulated by the hormone-sensitive adenylate cyclase signal-transduction system. We therefore asked whether the decrease in POMC production (i.e., tolerance) in LS mice is associated with a reduction in anterior pituitary adenylate cyclase activity. Indeed, coincident with the development of tolerance in the LS line, anterior pituitary adenylate cyclase activity was reduced. In principle, the ethanol-induced decrease in pituitary adenylate cyclase activity observed in LS mice could have been caused by (1) decreased expression or activity of the stimulatory protein Gsα or of adenylate cyclase, or (2) increased expression or activity of the inhibitory protein Giα.
Therefore, we examined expression of G proteins in the anterior pituitary before and after the development of tolerance. Decreased adenylate cyclase activity was accompanied by a decrease in the level of the stimulatory G protein Gsα, whereas levels of the inhibitory G protein Giα were unchanged (71). In the SS line, which did not develop hormonal tolerance, ethanol did not induce a significant change in adenylate cyclase activity or in the expression of either G protein. We hypothesize that ethanol (directly or indirectly) regulates expression of Gsα in LS but not SS mice, and thereby regulates adenylate cyclase activity. According to this model, chronic ethanol exposure produces hormonal tolerance in the POMC system of LS mice. Differential regulation of G proteins could be a critical molecular event in the pathogenesis of tolerance, and these observations suggest that the threshold for tolerance may be partly genetically determined.

We next asked whether the ethanol-induced decrease in Gsα expression was restricted to the anterior pituitary or was generalized throughout the central nervous system, by examining the effects of ethanol on cerebellum and pons in both LS and SS mice (72). Ethanol treatment of SS mice produced a fourfold increase in levels of Giα(1) and Giα(2) in both cerebellum and pons compared with control mice. Moreover, ethanol produced a twofold increase in levels of the 41- and 40-kDa proteins corresponding to Giα(1) and Giα(2), identified by pertussis toxin-dependent ADP-ribosylation. By contrast, ethanol did not alter expression of Gsα, whether assessed by immunoblot analysis, by cholera toxin-dependent ADP-ribosylation, or by the ability of detergent-extracted Gsα to reconstitute a functional adenylate cyclase in membranes from S49 cyc⁻ murine lymphoma cells, a cell line that genetically lacks Gsα.
Similarly, ethanol treatment did not alter levels of Goα or Gβ in either cerebellar or pons membranes. After ethanol treatment, plasma membranes prepared from cerebellum and pons had significantly reduced adenylate cyclase activity when assayed in the presence of GTP, GTPγS, NaF, forskolin, and ligands for three distinct receptors coupled to stimulation of adenylate cyclase. Pretreatment of plasma membranes with pertussis toxin, which reduces the biological activity of Giα (and Goα), reversed the ethanol-induced inhibition of adenylate cyclase activity. We speculate that enhanced expression of Giα(1) and Giα(2) impairs signal transduction in the CNS and may reduce the potency of neurotransmitters that activate adenylate cyclase. Unlike its effect in the anterior pituitary, ethanol treatment produced similar effects on CNS G-protein expression in both the LS and SS strains of mice.

Our data are consistent with two mechanisms by which ethanol exposure can reduce signal transduction through the adenylate cyclase system; either mechanism can induce hormonal tolerance. In the pituitary, chronic ethanol exposure reduces Gsα expression and decreases adenylate cyclase activity. In the pons and cerebellum, ethanol enhances expression of Giα(1) and Giα(2) and enhances inhibition of adenylate cyclase; in these tissues, ethanol did not alter Gsα expression. In the pituitary, neither Gsα nor Gi protein levels were affected by ethanol in the line that did not develop hormonal tolerance (i.e., SS). Blood ethanol levels and rates of ethanol clearance were equivalent between lines. It is apparent that genetic factors can influence the threshold for the acquisition of hormonal tolerance to ethanol.

What evidence is there that expression and function of the G-protein-coupled adenylate cyclase system are altered in human subjects with alcoholism? Lymphocytes and platelets isolated from alcoholics show reduced accumulation of intracellular cAMP when incubated in the presence of several hormones (75, 128-130). To further elucidate the basis for this effect, we investigated adenylate cyclase activity in peripheral lymphocytes from abstinent alcoholic men (n = 22), actively drinking alcoholic men (n = 41), and nonalcoholic (control) men (n = 16) (73). Lymphocyte membranes from abstinent alcoholics contained threefold greater levels of Giα(2) protein than did membranes from control subjects, whereas levels of Gsα protein were similar in both groups. Lymphocytes from abstinent alcoholics had threefold greater levels of Giα(2) mRNA (p < 0.001) and threefold greater levels of Gsα mRNA (p < 0.03) than lymphocytes from control subjects. Lymphocytes from actively drinking alcoholics had levels of Gsα protein, Giα(2) protein, and Giα(2) mRNA similar to those of control subjects; however, they contained 1.8-fold greater levels of Gsα mRNA than lymphocytes from control subjects.
The enhanced Giα(2) in lymphocytes from abstinent alcoholics was associated with decreased basal, PGE1-, GTPγS-, and forskolin-stimulated adenylate cyclase activity compared with both controls and actively drinking alcoholics. These results indicate that lymphocyte adenylate cyclase activity is reduced during abstinence from alcohol and that increased expression of the inhibitory G protein Giα(2) may account for this change. It is possible that such regulation of G proteins contributes to the development of tolerance.
VIII. Summary

The pivotal role that G proteins play in transmembrane signal transduction is highlighted by the rapidly expanding list of receptors and effector molecules that are coupled through G proteins. G proteins are poised to allow discrimination and diversification of cellular signals into the cytosolic milieu. The use of an evolutionarily conserved “GTPase clock” by G proteins offers insight into the fundamental role these proteins play in biology. Knowledge of how altered expression or function of G proteins contributes to human disease is now emerging. It is not surprising that deficiency of these important proteins, or expression of altered forms, can lead to global or restricted metabolic disturbances, depending upon the distribution and role of the G protein. Human disorders including heart failure, alcoholism, endocrine abnormalities, and neoplasia are now recognized as due in part to altered expression or function of G proteins.
IX. Glossary

ras: 21-kDa protein products of ras oncogenes (a small GTPase)
EF-Tu: bacterial elongation factor (a small GTPase)
Tα: transducin; a G-protein α subunit of the retina
GNRP: guanine-nucleotide-releasing protein
GAP: GTPase-activating protein
IAP: islet-activating protein
SDS-PAGE: sodium dodecyl sulfate-polyacrylamide gel electrophoresis
PTH: parathyroid hormone
TSH: thyroid-stimulating hormone
ACTH: adrenocorticotropic hormone
GNAS1: the gene encoding the G-protein α subunit Gsα
DGGE: denaturing gradient gel electrophoresis
G-C: guanine-cytosine
PHP: pseudohypoparathyroidism
gsp: Gs protein; a Gsα protein with putative oncogene potential
gip2: Gi protein-2; a Giα(2) protein with putative oncogene potential
LH: luteinizing hormone
FSH: follicle-stimulating hormone
LHRH: luteinizing-hormone-releasing hormone
PGE: prostaglandin E
PIA: N6-2-phenylisopropyladenosine
S49 cyc⁻: a murine lymphoma cell line lacking Gsα
LS: Long-Sleep strain of mice
SS: Short-Sleep strain of mice
POMC: proopiomelanocortin
REFERENCES

1. K. Sugimoto, T. Nukada, T. Tanabe, H. Takahashi, M. Noda, N. Minamino, K. Kangawa, H. Matsuo, T. Hirose, S. Inayama and S. Numa, FEBS Lett. 191, 235 (1985).
2. H. K. W. Fong, J. B. Hurley, R. S. Hopkins, R. Miake-Lye, M. S. Johnson, R. F. Doolittle and M. I. Simon, PNAS 83, 2162 (1986).
3. B. Gao, A. G. Gilman and J. D. Robishaw, PNAS 84, 6122 (1987).
4. T. Evans, A. Fawzi, E. D. Fraser, M. L. Brown and J. K. Northup, JBC 262, 176 (1987).
5. M. A. Levine, P. M. Smallwood, P. T. Moen, Jr., L. J. Helman and T. G. Ahn, PNAS 87, 2329 (1990).
6. B. K. Fung, B. S. Lieberman and R. H. Lee, JBC 267, 24782 (1992).
7. E. von Weizsacker, M. P. Strathmann and M. I. Simon, BBRC 183, 350 (1992).
8. N. Gautam, J. Northup, H. Tamir and M. I. Simon, PNAS 87, 7973 (1990).
9. J. D. Robishaw, V. K. Kalman, C. R. Moomaw and C. A. Slaughter, JBC 264, 15758 (1989).
10. J. J. Cali, E. A. Balcueva, I. Rybalkin and J. D. Robishaw, JBC 267, 24023 (1992).
11. K. J. Fisher and N. N. Aronson, MCBiol 12, 1585 (1992).
12. M. I. Simon, M. P. Strathmann and N. Gautam, Science 252, 802 (1991).
13. L. Birnbaumer, Annu. Rev. Pharmacol. Toxicol. 30, 675 (1990).
14. C. L. Lerea, D. E. Somers, J. B. Hurley, I. B. Klock and A. H. Bunt Milam, Science 234, 77 (1986).
15. C. L. Lerea, A. H. Bunt Milam and J. B. Hurley, Neuron 3, 367 (1989).
16. D. T. Jones and R. R. Reed, Science 244, 790 (1989).
17. S. K. McLaughlin, P. J. McKinnon and R. F. Margolskee, Nature 357, 563 (1992).
18. L. Birnbaumer, J. Abramowitz and A. M. Brown, BBA 1031, 163 (1990).
19. H. R. Bourne, D. A. Sanders and F. McCormick, Nature 348, 125 (1990).
20. V. Y. Arshavsky and M. D. Bownds, Nature 357, 416 (1992).
21. K. Kokame, Y. Fukada, T. Yoshizawa, T. Takao and Y. Shimonishi, Nature 359, 749 (1992).
22. N. J. Pyne, G. J. Murphy, G. Milligan and M. D. Houslay, FEBS Lett. 243, 77 (1989).
23. M. Bushfield, G. J. Murphy, B. E. Lavan, P. J. Parker, V. J. Hruby, G. Milligan and M. D. Houslay, BJ 268, 449 (1990).
24. M. Bushfield, S. L. Griffiths, G. J. Murphy, N. J. Pyne, J. T. Knowler, G. Milligan, P. J. Parker, S. Mollner and M. D. Houslay, BJ 271, 365 (1990).
25. S. B. Masters, R. M. Stroud and H. R. Bourne, Protein Eng. 1, 47 (1986).
26. E. F. Pai, U. Krengel, G. A. Petsko, R. S. Goody, W. Kabsch and A. Wittinghofer, EMBO J. 9, 2351 (1990).
27. H. R. Bourne, D. A. Sanders and F. McCormick, Nature 349, 117 (1991).
28. S. R. Holbrook and S.-H. Kim, PNAS 86, 1751 (1989).
29. A. M. de Vos, L. Tong, M. V. Milburn, P. M. Matias, J. Jancarik, S. Noguchi, S. Nishimura, K. Miura, E. Ohtsuka and S. H. Kim, Science 239, 888 (1988).
30. C. H. Berlot and H. R. Bourne, Cell 68, 911 (1992).
31. J. B. Hurley, BBRC 92, 505 (1980).
32. P. Bray, A. Carter, V. Guo, C. Puckett, J. Kamholz, A. Spiegel and M. Nirenberg, PNAS 84, 5115 (1987).
33. M. Strathmann, T. M. Wilkie and M. I. Simon, PNAS 87, 6477 (1990).
34. J. J. Murtagh, R. Eddy, T. B. Shows, J. Moss and M. Vaughan, MCBiol 11, 1146 (1991).
35. A. M. Brown and L. Birnbaumer, Am. J. Physiol. 254, H401 (1988).
36. D. A. Brown, Annu. Rev. Physiol. 52, 215 (1990).
37. A. Yatani, R. Mattera, J. Codina, R. Graf, K. Okabe, E. Padrell, R. Iyengar, A. M. Brown and L. Birnbaumer, Nature 336, 680 (1988).
38. A. Yatani, H. Hamm, J. Codina, M. R. Mazzoni, L. Birnbaumer and A. M. Brown, Science 241, 828 (1988).
39. A. B. Abou-Samra, H. Juppner, T. Force, M. W. Freeman, X. F. Kong, E. Schipani, P. Urena, J. Richards, J. V. Bonventre, J. T. Potts, Jr., H. M. Kronenberg and G. V. Segre, PNAS 89, 2732 (1992).
40. O. Chabre, B. R. Conklin, H. Y. Lin, H. F. Lodish, E. Wilson and H. E. Ives, Mol. Endocrinol. 6, 551 (1992).
41. T. Force, J. V. Bonventre, M. R. Flannery, A. H. Gorn, M. Yamin and S. R. Goldring, Am. J. Physiol. 262, F1110 (1992).
42. J. Van Sande, E. Raspe, J. Perret, C. Lejeune, C. Maenhaut, G. Vassart and J. E. Dumont, Mol. Cell. Endocrinol. 74, R1 (1990).
43. D. J. Roof, M. L. Applebury and P. C. Sternweis, JBC 260, 16242 (1985).
44. S. M. Mumby, R. A. Kahn, D. R. Manning and A. G. Gilman, PNAS 83, 265 (1986).
45. H. K. W. Fong, T. T. Amatruda, B. W. Birren and M. I. Simon, PNAS 84, 3792 (1987).
46. J. Codina, D. Stengel, S. L. Woo and L. Birnbaumer, FEBS Lett. 207, 187 (1986).
47. T. T. Amatruda, N. Gautam, H. K. Fong, J. K. Northup and M. I. Simon, JBC 263, 5008 (1988).
48. Y. W. Peng, J. D. Robishaw, M. A. Levine and K. W. Yau, PNAS 89, 10882 (1992).
49. R. H. Lee, B. S. Lieberman, H. K. Yamane, D. Bok and B. K. Fung, JBC 267, 24776 (1992).
50. B. K. Fung, H. K. Yamane, I. M. Ota and S. Clarke, FEBS Lett. 260, 313 (1990).
51. P. S. Backlund, W. F. Simonds and A. M. Spiegel, JBC 265, 15572 (1990).
52. Y. Fukada, T. Takao, H. Ohguro, T. Yoshizawa, T. Akino and Y. Shimonishi, Nature 346, 658 (1990).
53. R. K. Lai, D. Perez-Sala, F. J. Canada and R. R. Rando, PNAS 87, 7673 (1990).
54. C. J. Schmidt and E. J. Neer, JBC 266, 4538 (1991).
55. C. J. Schmidt, T. C. Thomas, M. A. Levine and E. J. Neer, JBC 267, 13807 (1992).
56. W. J. Tang and A. G. Gilman, Science 254, 1500 (1991).
57. K. D. Hinsch, I. Tychowiecka, H. Gausepohl, R. Frank, W. Rosenthal and G. Schultz, BBA 1013, 60 (1989).
58. A. M. Spiegel, P. S. Backlund, Jr., J. E. Butrynski, T. L. Jones and W. F. Simonds, TIBS 16, 338 (1991).
59. M. Whiteway, L. Hougan, D. Dignard, D. Y. Thomas, L. Bell, C. Saari, F. J. Grant, P. O'Hara and V. L. MacKay, Cell 56, 467 (1989).
60. Y. Kaziro, H. Itoh, T. Kozasa, M. Nakafuku and T. Satoh, ARB 60, 349 (1991).
61. K. J. Blumer and J. Thorner, Annu. Rev. Physiol. 53, 37 (1991).
62. D. E. Logothetis, Y. Kurachi, J. Galper, E. J. Neer and D. E. Clapham, Nature 325, 321 (1987).
63. J. Axelrod, R. M. Burch and C. L. Jelsema, Trends Neurosci. 11, 117 (1988).
64. D. Kim, D. L. Lewis, L. Graziadei, E. J. Neer, D. Bar-Sagi and D. E. Clapham, Nature 337, 557 (1989).
65. A. D. Federman, B. R. Conklin, K. A. Schrader, R. R. Reed and H. R. Bourne, Nature 356, 159 (1992).
66. T. Katada, K. Kusakabe, M. Oinuma and M. Ui, JBC 262, 11897 (1987).
67. J. A. Pitcher, J. Inglese, J. B. Higgins, J. L. Arriza, P. J. Casey, C. Kim, J. L. Benovic, M. M. Kwatra, M. G. Caron and R. J. Lefkowitz, Science 257, 1264 (1992).
68. M. Camps, C. Hou, D. Sidiropoulos, J. B. Stock, K. H. Jakobs and P. Gierschik, EJB 206, 821 (1992).
69. M. A. Levine, A. M. Feldman, J. D. Robishaw, P. W. Ladenson, T. G. Ahn, J. F. Moroney and P. M. Smallwood, JBC 265, 3553 (1990).
70. F. Chang and H. R. Bourne, Endocrinology 121, 1711 (1987).
71. G. S. Wand and M. A. Levine, Alcohol Clin. Exp. Res. 15, 705 (1991).
72. G. S. Wand, A. M. Diehl, M. A. Levine, D. Wolfgang and S. Samy, JBC 268, 2595 (1993).
73. C. Waltman, M. A. Levine, B. McCaul, D. Svikis and G. S. Wand, Alcohol Clin. Exp. Res. 17, 315 (1993).
74. M. E. Charness, L. A. Querimit and M. Henteleff, BBRC 155, 138 (1988).
75. D. Mochly-Rosen, F. Chang, L. Cheever, M. Kim, I. Diamond and A. S. Gordon, Nature 333, 848 (1988).
76. H. K. Hammond, L. A. Ransnas, F. C. White, C. M. Bloor and P. A. Insel, Circulation 78 (Suppl. II) (1988).
77. J. P. Longabaugh, D. E. Vatner, S. F. Vatner and C. J. Homcy, J. Clin. Invest. 81, 420 (1988).
78. A. M. Feldman, R. G. Tena, P. D. Kessler, H. F. Weisman, S. P. Schulman, R. S. Blumenthal, D. G. Jackson and C. Van Dop, Circulation 81, 1341 (1990).
79. Y. K. Katoh, I. Komuro, F. Takaku, H. Yamaguchi and Y. Yazaki, Circ. Res. 67, 235 (1990).
80. A. M. Spiegel, A. Shenker and L. S. Weinstein, Endocr. Rev. 13, 536 (1992).
81. G. Milligan and M. Wakelam (eds.), “G Proteins: Signal Transduction and Disease.” Academic Press, San Diego, 1992.
82. A. G. Gilman, Cell 36, 577 (1984).
83. M. A. Levine, W. S. Modi and S. J. O'Brien, Genomics 11, 478 (1991).
84. T. Kozasa, H. Itoh, T. Tsukamoto and Y. Kaziro, PNAS 85, 2081 (1988).
85. Y. Ishikawa, C. Bianchi, B. Nadal-Ginard and C. J. Homcy, JBC 265, 8458 (1990).
86. J. L. Patten, D. R. Johns, D. Valle, C. Eil, P. A. Gruppuso, G. Steele, P. M. Smallwood and M. A. Levine, N. Engl. J. Med. 322, 1412 (1990).
87. L. Vallar, A. Spada and G. Giannattasio, Nature 330, 566 (1987).
88. C. A. Landis, S. B. Masters, A. Spada, A. M. Pace, H. R. Bourne and L. Vallar, Nature 340, 692 (1989).
89. W. F. Schwindinger, C. A. Francomano and M. A. Levine, PNAS 89, 5152 (1992).
90. L. S. Weinstein, A. Shenker, P. V. Gejman, M. J. Merino, E. Friedman and A. M. Spiegel, N. Engl. J. Med. 325, 1688 (1991).
91. L. S. Lerman and K. Silverstein, in “Methods in Enzymology” (R. Wu, ed.), Vol. 155, p. 482. Academic Press, Orlando, Florida, 1987.
92. F. Albright, C. H. Burnett and P. H. Smith, Endocrinology 30, 922 (1942).
93. F. Albright, A. P. Forbes and P. H. Henneman, Trans. Assoc. Am. Physicians 65, 337 (1952).
94. L. R. Chase and G. D. Aurbach, Science 159, 545 (1968).
95. L. R. Chase, G. L. Melson and G. D. Aurbach, J. Clin. Invest. 48, 1832 (1969).
96. M. A. Levine, T. G. Ahn, S. F. Klupt, K. D. Kaufman, P. M. Smallwood, H. R. Bourne, K. A. Sullivan and C. Van Dop, PNAS 85, 617 (1988).
97. J. L. Patten and M. A. Levine, J. Clin. Endocrinol. Metab. 71, 1208 (1990).
98. C. Van Dop, H. R. Bourne and R. M. Neer, J. Clin. Endocrinol. Metab. 59, 8% (1984).
99. L. S. Weinstein, P. V. Gejman, E. Friedman, T. Kadowaki, R. M. Collins, E. S. Gershon and A. M. Spiegel, PNAS 87, 8287 (1990).
100. A. Miric, J. D. Vechio and M. A. Levine, J. Clin. Endocrinol. Metab. 76, 1560 (1993).
101. C. Blatt, P. Eversole-Cire, V. H. Cohn, S. Zollman, R. E. K. Fournier, L. T. Mohandas, M. Nesbitt, T. Lugo, D. T. Jones, R. R. Reed, L. P. Weiner, R. S. Sparkes and M. I. Simon, PNAS 85, 7642 (1988).
102. M. A. Levine, R. W. Downs, Jr., A. M. Moses, N. A. Breslau, S. J. Marx, R. D. Lasker, R. E. Rizzoli, G. D. Aurbach and A. M. Spiegel, Am. J. Med. 74, 545 (1983).
103. E. Mallet, P. Carayon, S. Amr, P. Brunelle, T. Ducastelle, J. P. Basuyau and C. Hellouin de Menibus, J. Clin. Endocrinol. Metab. 54, 1028 (1982).
104. J. Lyons, C. A. Landis, H. Griffith, L. Vallar, K. Grunewald, H. Feichtinger, Q. Y. Yuh, O. H. Clark, E. Kawasaki, H. R. Bourne and F. McCormick, Science 249, 655 (1990).
105. H. C. Suarez, J. A. du Villard, B. Caillou, M. Schlumberger, C. Parmentier and R. Monier, Oncogene 6, 677 (1991).
106. R. Happle, Clin. Genet. 29, 321 (1986).
107. P. P. Feuillan, T. Shawker, S. R. Rose, J. Jones, R. K. Jeevanram and B. C. Nisula, J. Clin. Endocrinol. Metab. 71, 1596 (1990).
108. R. Happle, J. Am. Acad. Dermatol. 16, 899 (1987).
109. C. M. Foster, J. L. Ross, T. Shawker, O. H. Pescovitz, D. L. Loriaux, G. B. Cutler and F. Comite, J. Clin. Endocrinol. Metab. 58, 1161 (1984).
110. N. Mauras and R. M. Blizzard, Acta Endocrinol. Suppl. 256, 207 (1986).
111. L. Cuttler, J. A. Jackson, M. S. Uz-Zafar, L. Levitsky, R. C. Mellinger and L. A. Frohman, J. Clin. Endocrinol. Metab. 68, 1148 (1989).
112. A. Shenker, D. E. Sweet, A. M. Spiegel and L. S. Weinstein, J. Bone Miner. Res. 7, S115 (abstr.) (1992).
113. N. Saito, X. Guitart, M. Hayward, J. Tallman, R. Duman and E. Nestler, PNAS 86, 3906 (1989).
114. M. Ros, J. E. Northup and C. C. Malbon, BJ 257, 737 (1989).
115. A. Balmforth, K. Yasunari, P. Vaughen and S. Ball, J. Neurochem. 52, 1613 (1989).
116. K. Yasunari, M. Kohno, A. Balmforth, K. Murakawa, K. Yokokawa, N. Kurihara and T. Takeda, Hypertension 13, 575 (1989).
117. S. F. Colin, H. Chang, S. Mollner, T. Pfeuffer, R. R. Reed, R. S. Duman and E. J. Nestler, PNAS 88, 10634 (1991).
118. S. Stenstrom and E. Richelson, J. Pharmacol. Exp. Ther. 221, 334 (1992).
119. T. Saito, J. M. Lee and B. Tabakoff, J. Neurochem. 44, 1037 (1985).
120. C. S. Rabe, P. R. Giri, P. L. Hoffman and B. Tabakoff, Biochem. Pharmacol. 40, 565 (1990).
121. R. A. Rabin and P. B. Molinoff, J. Pharmacol. Exp. Ther. 227, 551 (1983).
122. C. T. Chung, L. Tamarkin, P. L. Hoffman and B. Tabakoff, J. Pharmacol. Exp. Ther. 249, 16 (1989).
123. D. C. Bode and P. B. Molinoff, J. Pharmacol. Exp. Ther. 246, 1040 (1988).
124. L. E. Nagy, I. Diamond, K. Collier, L. Lopez, B. Ullman and A. S. Gordon, Mol. Pharmacol. 36, 744 (1989).
125. T. Saito, J. M. Lee, P. L. Hoffman and B. Tabakoff, J. Neurochem. 48, 1817 (1987).
126. R. A. Rabin, J. Pharmacol. Exp. Ther. 252, 1021 (1990).
127. S. Hynie, F. Linefelt and B. B. Fredholm, Acta Pharmacol. Toxicol. 47, 58 (1980).
128. L. E. Nagy, I. Diamond and A. Gordon, PNAS 85, 6973 (1988).
129. I. Diamond, B. Wrubel, W. Estrin and A. Gordon, PNAS 84, 1413 (1987).
130. B. Tabakoff, P. L. Hoffman, J. M. Lee, T. Saito, B. Willard and F. DeLeon-Jones, N. Engl. J. Med. 318, 134 (1988).
131. A. S. Gordon, E. Collier and I. Diamond, PNAS 83, 2105 (1986).
132. M. E. Charness, L. A. Querimit and I. Diamond, JBC 261, 3164 (1986).
133. G. S. Wand, Endocrinology 124, 518 (1989).
134. L. S. Weinstein, P. V. Gejman, P. de Mazancourt, N. American and A. M. Spiegel, Genomics 13, 1319 (1992).
135. W. F. Schwindinger, A. Miric and M. A. Levine, Program Abstr. 74th Annual Meet. Endocr. Soc., abstr. 35 (1992).
136. A. Miric and M. A. Levine, in “G Proteins: Signal Transduction and Disease” (G. Milligan and M. Wakelam, eds.), p. 29. Academic Press, San Diego, 1992.
137. M. K. Drezner, F. A. Neelon and H. E. Lebovitz, N. Engl. J. Med. 280, 1056 (1973).
The tis Genes, Primary Response Genes Induced by Growth Factors and Tumor Promoters in 3T3 Cells

HARVEY R. HERSCHMAN,¹ DEAN A. KUJUBU,² BRADLEY S. FLETCHER, QIUFU MA, BRIAN C. VARNUM,³ REBECCA S. GILBERT AND SRINIVASA T. REDDY

Department of Biological Chemistry and Laboratory of Structural Biology and Molecular Medicine
UCLA School of Medicine
Los Angeles, California 90024
I. Phorbol-induced Primary Response Genes Can Be Cloned from Swiss 3T3 Cells
II. The tis21 Gene Encodes a Protein with Similarity to a Presumptive Anti-oncogene
III. The tis11 Gene Is a Member of a Multigene Family
IV. The tis10 Gene Encodes a Functional Prostaglandin Synthase/Cyclooxygenase
V. Both Prostaglandin Synthesis a... Induced in Swiss 3T3 Cells
VI. The Structures of the tis10/pgs-2 and pgs-1 Genes Are Similar
VII. tis10/pgs-2 Satisfies the Criteria for the “Second Pool” of Prostaglandin Synthase
VIII. tis10/pgs-2 Is Induced in Macrophages
IX. Can the Proteins Encoded by the tis10/pgs-2 and pgs-1 Genes Form Functional Heterodimers? What Would Be the Consequences of This Interaction for Pharmacologic Intervention?
X. Similarities in the Expression and Regulation of Prostaglandin Synthase and Nitric-Oxide Synthase
XI. Conclusions and Future Directions
References
¹ To whom correspondence should be addressed.
² Present address: Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA 19104.
³ Present address: Department of Pharmacology, AMGEN Center, Thousand Oaks, CA 91320.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
A wide variety of cellular responses to extracellular ligands share a common generic pathway: (1) A hydrophilic ligand (a growth factor, hormone, neurotransmitter, morphogen, or antigen) binds to a transmembrane receptor present on the responding cell. (2) As a consequence of this ligand-receptor interaction, the bound receptor generates an intracellular signal. (3) Signal-transduction mechanisms within the cytoplasm activate intracellular protein kinases. (4) Latent transcription factors are activated by phosphorylation and/or dephosphorylation, initiating the transcription of previously quiescent genes. (5) The new transcripts are translated into proteins whose functions alter the biological properties of the cell. If the hydrophilic ligand is a growth factor, the elicited response is a commitment to leave the resting (G0) state and reenter the cell cycle. If the ligand is a hormone, the endocrine function of the cell may change. Should the ligand be a neurotransmitter, a memory trace may be laid down. For morphogens, cells alter their developmental fates as a result of induced gene expression. Antigen stimulation provokes clonal cell expansion and immunoglobulin synthesis. Stimulation of cell division in response to growth factors is, therefore, a special case of a generalized biological mechanism by which extracellular ligands elicit cellular responses as a consequence of ligand-induced gene expression. Numerous questions come to mind: How do ligands, binding to cell-surface receptors, generate intracellular signals? What are the signal-transduction pathways within cells that activate latent transcription factors? What transcription factors are activated? What populations of genes are transcribed by the transcription factors activated in response to specific ligands? Are overlapping populations of quiescent genes transcriptionally activated in distinct biological responses elicited by different ligands, such as growth factors and neurotransmitters? What are the roles of the newly induced gene products in modulating subsequent changes in cellular phenotypes? Do distinct mitogens activate transcription of similar constellations of genes?

To address these problems in the context of the mitogenic response, we isolated cDNAs for a group of genes induced in Swiss 3T3 cells by the mitogen and tumor promoter tetradecanoyl phorbol acetate (TPA, or, simply, phorbol).⁴ We reasoned that we could study the nature of signal transduction leading to gene expression by working backward from the regulatory regions of these genes, identifying cis-acting sequences necessary for their induction, transcription factors required for their induced expression, and receptor-mediated signaling mechanisms necessary to activate these transcription factors. Conversely, we could study questions of (1) convergence of alternative stimuli induced by different ligands, and (2) specificity of gene expression in response to different ligands by considering the combinatorial expression of these genes in response to distinct inducers. Finally, we could investigate the roles of the products of these ligand-induced genes in alternative biological responses by sequencing their cDNAs and characterizing their encoded proteins. We emphasize our own studies in this review, and do not attempt to be comprehensive in our citations of the literature. Several review articles (1, 2) contain more extensive commentary on primary response genes identified in other contexts.

⁴ Abbreviations: TPA, tetradecanoyl phorbol acetate (phorbol); tis, TPA-induced sequence; PGS, prostaglandin synthase; EGF, epidermal growth factor; FGF, fibroblast growth factor; PDGF, platelet-derived growth factor; GM-CSF, granulocyte-macrophage colony-stimulating factor; NGF, nerve growth factor; PLA2, phospholipase A2; DEX, dexamethasone; COX, cyclooxygenase; LPS, lipopolysaccharide; IFN, interferon; NO, nitric oxide; NOS, nitric-oxide synthase.
I. Phorbol-induced Primary Response Genes Can Be Cloned from Swiss 3T3 Cells

Primary response genes are genes whose transcription can be induced, in response to extracellular stimulation, by activation of preexisting transcription factors. No new proteins need to be synthesized for induced transcription of primary response genes; the components of the transcriptional machinery, present in an inactive state in unstimulated cells, are activated by ligand- and receptor-stimulated signal-transduction pathways (3). We chose phorbol as the mitogen for these studies because it is a potent mitogen for 3T3 cells (4), it is also a tumor promoter, and its immediate mechanism of cellular activation (stimulation of protein kinase C) is known. We chose Swiss 3T3 cells because of their long history as a major paradigm for studies of growth factor-induced mitogenesis (1). Moreover, we had previously isolated phorbol-nonproliferative mutants of 3T3 cells (4) and thought that phorbol-induced expression of some of these primary response genes might be altered in these mutants.

To isolate phorbol-induced primary response genes, Swiss 3T3 cells were exposed to phorbol in the presence of cycloheximide for 3 hours (because cycloheximide blocks protein synthesis, only genes activated by preexisting factors can be induced under these conditions). A lambda phage cDNA library was prepared from poly(A)+-selected RNA isolated from these cells. Duplicate filter lifts of plaques from this library were screened with [32P]cDNA prepared from poly(A)+ RNA isolated from cells treated with phorbol plus cycloheximide, and with [32P]cDNA from cells treated with cycloheximide alone. Phage clones that reacted with the former but not the latter probe were further characterized in secondary screens. Northern analyses with these phage inserts were used to verify phorbol induction and to identify redundant clones in the population. Using these criteria, we identified overlapping partial cDNAs for seven distinct primary response genes whose message levels are induced by phorbol treatment of Swiss 3T3 cells (5). We refer to these genes as “TPA-induced sequences,” or tis genes. Expression patterns of the seven primary response tis genes (tis1, tis7, tis8, tis10, tis11, tis21, and tis28) share several common features. (1) The basal levels of message for all of these genes are low, often undetectable by northern analysis of total RNA. (2) Induction is rapid; mRNA accumulation peaks between 0.5 and 3 hours after phorbol addition. (3) The induction is transient; message levels return to baseline values, even in the continued presence of phorbol (5).
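The plus/minus plaque screen described above amounts to a selection rule followed by removal of redundant clones. The Python sketch below restates that logic under explicit assumptions: hybridization signals are represented as hypothetical numbers on an arbitrary scale, `detect` is a hypothetical detection threshold, and `insert_group` is a hypothetical label standing in for the Northern analysis that revealed redundant clones. The actual screen was read from duplicate autoradiographed filter lifts, not from numeric cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plaque:
    clone_id: str          # hypothetical plaque identifier
    induced_signal: float  # signal with [32P]cDNA from phorbol + cycloheximide cells
    control_signal: float  # signal with [32P]cDNA from cycloheximide-only cells
    insert_group: str      # hypothetical label: clones sharing an insert (same gene)


def differential_screen(plaques, detect=1.0):
    """Keep plaques detected by the induced probe but not by the control probe."""
    return [p for p in plaques
            if p.induced_signal >= detect and p.control_signal < detect]


def collapse_redundant(clones):
    """Keep the first clone seen for each distinct insert group."""
    representatives = {}
    for clone in clones:
        representatives.setdefault(clone.insert_group, clone)
    return list(representatives.values())


plaques = [
    Plaque("p1", 5.0, 0.2, "geneA"),
    Plaque("p2", 4.1, 0.1, "geneA"),  # redundant with p1
    Plaque("p3", 3.0, 2.8, "geneB"),  # seen with both probes: not phorbol-induced
    Plaque("p4", 2.2, 0.3, "geneC"),
    Plaque("p5", 0.4, 0.1, "geneD"),  # below detection with either probe
]
hits = collapse_redundant(differential_screen(plaques))
print([p.clone_id for p in hits])  # ['p1', 'p4']
```

Separating the two steps mirrors the text: the primary screen only asks "induced probe yes, control probe no", while redundancy is resolved afterward by a different assay.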
A. The tis Genes Are Induced by Several Mitogens in 3T3 Cells

We could address some of the questions posed above without determining the sequences encoded by the tis genes. Although the tis-gene cDNAs were cloned in response to phorbol treatment, their expression could also be induced by polypeptide growth factors, such as epidermal growth factor (EGF), fibroblast growth factor (FGF), and platelet-derived growth factor (PDGF), as well as by mitogenic stimulation with elevated serum concentrations (5). Indeed, cDNAs for several of these same genes were cloned by others investigating serum-induced gene expression in 3T3 cells (6-8). We have previously reviewed the relationship between serum-induced and phorbol-induced primary response genes (1). The data suggest that these structurally diverse mitogens stimulate intracellular signal-transduction pathways that converge on common activation mechanisms for the tis genes. Alternatively, the regulatory regions of the tis genes may all be rich in modular cis-acting elements that regulate transcription in response to alternative transcription factors activated by distinct signal-transduction pathways.
B. tis Genes Are Induced by a Variety of Ligands, in Many Distinct Cell Types

We could also address, without any sequence information about the tis gene products, the specificity and selectivity of primary response gene expression following ligand-induced alterations in distinct cell populations. Many of the tis genes, isolated as phorbol-induced cDNAs following mitogenic stimulation of Swiss 3T3 cells, are also induced by cell-type-specific stimuli, such as nerve growth factor (NGF) treatment, which stimulates differentiation of PC12 pheochromocytoma cells (9), and granulocyte-macrophage colony-stimulating factor (GM-CSF) treatment of myeloid cells (10). Indeed, cDNAs for several of the tis genes have also been cloned as NGF-induced cDNAs from PC12 cells (11-13) and as cytokine-induced cDNAs from lymphoid cells (14). Many of the tis genes could also be induced in cultured astrocytes by growth factors, neurotransmitters, and agents that elicit morphological differentiation (15-17; reviewed in 18, 19). These data show that a widely divergent group of stimuli, acting on a broad spectrum of cells to elicit a great many distinct cellular responses, can induce the expression of a number of common primary response genes. This conclusion is underscored by the extraordinary number of stimulatory inputs that can induce the expression of several of the primary response genes in cells of the nervous system (reviewed in 19).
C. Distinct Subsets of tis Genes Are Inducible in Alternative Cell-Type Responses

If the same genes can be induced in a variety of cell types, in response to a wide range of ligands, how does specificity in ligand-induced biological responses arise? How can different cells mount distinct biological responses if the same set of primary response genes is expressed? How can two different agents elicit distinct biological responses in the same cell if they induce the same primary response genes? Although many tis genes can be induced in a multiplicity of systems, cell-type-specific expression is also observed. For example, the PC12 pheochromocytoma cell cannot express the tis10 gene in response to any signal; the tis10 gene is “restricted” with regard to this cell line (9). Conversely, the tis1 gene cannot be induced by any agent in the murine 32D myeloid cell line or in normal neutrophils (10). Similarly, the tis8 gene cannot be expressed in IC21 macrophage cells (20). Differential patterns of cell-type restriction for primary response gene expression no doubt contribute to cellular specificity of biological responses (21). We have also demonstrated that, in the same cell, distinct ligands induce different ratios of the tis gene products. For example, depolarization and NGF induce differing relative expression of the tis8 and tis1 genes in PC12 cells (22, 23). Posttranslational modifications of the FOS protein also differ in PC12 cells following fos induction by depolarization versus NGF activation (24-26).

The combination of (1) qualitative, cell-type-specific restriction of primary response genes resulting from developmental programs, (2) quantitatively distinct patterns of induction of the primary response genes by alternative ligands, and (3) cell- and ligand-specific posttranslational modifications is likely to provide much of the diversity needed to account for the ability of cells to mount a wide range of biological responses mediated by the primary response genes. Additional diversity in cellular responses to ligand stimulation is provided by the developmental history of the responding cell, which modulates the allowable repertoire of secondary response genes expressible in response to newly induced primary response gene-encoded transcription factors.
D. tis Genes Are Expected to Encode Proteins with Different Types of Functions

What sorts of functions might be carried out by the products of the first genes to be transcribed in a mitogenic response? We might expect transcription factors, necessary to activate “secondary response genes” in a gene expression cascade leading to initiation of DNA synthesis, to be one class of primary response gene activated by mitogen-induced signal transduction. In fact, it was known before most of the searches for mitogen-induced primary response genes began that serum and other mitogens can induce transcriptional activation of the c-fos gene (27-30). The AP-1 transcription factors are heterodimers of the products of members of the fos and jun gene families (31, 32). Both c-fos and c-jun are primary response genes; their transcripts are rapidly and transiently elevated in response to a wide variety of ligands in a number of cell types (reviewed in 1). When cells within the tissues of an intact organism are stimulated to divide, they also activate signaling mechanisms that allow communication with neighboring cells, informing surrounding cells that the responding cell is undergoing a cellular response, with potential consequences for its neighbors. Mitogen-stimulated cells may, therefore, produce cytokine-like molecules that are secreted and interact in a paracrine fashion with neighboring cells. Some of this speculation is hindsight; two of the first primary response genes identified by differential screening, KC and JE (27), encode molecules that appear to be cytokine-signaling molecules (33-35). We also expect to identify primary response gene products whose cellular functions we cannot recognize from the sequence of the transcripts and/or proteins.
Our ability to clone classes of genes based on physiological differences, such as quiescence versus mitogenic activation, exceeds our current understanding of the physiological processes, and the gene products mediating them, that are required for ligand-induced cellular phenotypic changes. In this regard, the situation is not unlike that encountered by workers cloning retroviral oncogenes and their cellular homologues: even today the exact roles of the products of the c-src, c-myc, c-mJ and c-ras genes remain obscure.
E. Some tis Genes Encode Ligand-Inducible Transcription Factors

Hybridization studies demonstrated that the tis28 cDNA is a partial cDNA clone derived from the c-fos message (5). The FOS protein is one partner in the heterodimeric AP-1 transcription factor (31, 32). Sequencing studies of the egr-1/zif-268/tis8 cDNA, isolated as a serum-inducible (7, 36-38) or phorbol-inducible (39) primary response gene, suggested that the protein product of this gene might be a transcription factor. The specific cis-acting DNA consensus binding element for this transcription factor, GCG(G/T)GGGGCG, has been identified (40-42). Subsequent studies (reviewed in 1) have demonstrated that this gene does encode a transcription factor whose cellular levels are regulated by ligand induction of message transcription. The rat cognate of this gene, NGFI-A, induced by nerve growth factor and other ligands in PC12 cells (9), has also been cloned as an NGF-inducible primary response gene (11). Sequencing studies of the N10/nur77/tis1 cDNA, also isolated as a serum-inducible (43, 44) or phorbol-inducible (45) primary response gene, suggested that this gene is a member of the steroid hormone receptor/transcription factor supergene family (45). The rat cognate of this gene, NGFI-B, was also identified in PC12 cells following induction by NGF (9, 46) and other ligands (9). Subsequent studies demonstrate that this protein is, indeed, a DNA-binding protein likely to serve as a transcription factor (47, 48). Whether a ligand analogous to the steroid hormones is required for transcriptional activation of N10/nur77/tis1/NGFI-B has not yet been resolved. A number of laboratories have characterized (1) the cis-acting DNA sequences required for c-fos/tis28, c-jun, egr-1/tis8, and N10/nur77/tis1 induction, (2) the transcription factors necessary to induce their expression in response to ligands, and (3) the protein kinase reactions leading to transcription factor activation and gene expression. The primary response genes encoding these transcription factors and the activities of their encoded proteins are being investigated in many laboratories.
These primary response genes and their transcription factor products are the subject of many recent research papers and review articles. We do not discuss these genes or their products any further in this review.
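Given the consensus element GCG(G/T)GGGGCG identified for the egr-1/tis8 product, candidate binding sites in a promoter sequence can be located by a pattern search on both DNA strands. This is a minimal sketch, not a method from the studies cited: the function names are hypothetical, the test sequence is synthetic, and requiring an exact match to the consensus is a simplification, since functional sites may deviate from it.

```python
import re

# GCG(G/T)GGGGCG as a regular expression; the lookahead allows overlapping hits.
EGR1_SITE = re.compile(r"(?=(GCG[GT]GGGGCG))")
SITE_LEN = 10


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_egr1_sites(seq):
    """Return (strand, 0-based start on the forward strand) for each consensus match."""
    seq = seq.upper()
    hits = [("+", m.start()) for m in EGR1_SITE.finditer(seq)]
    n = len(seq)
    for m in EGR1_SITE.finditer(reverse_complement(seq)):
        # A site starting at i on the reverse complement covers forward positions
        # n - i - SITE_LEN .. n - i - 1.
        hits.append(("-", n - (m.start() + SITE_LEN)))
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic sequence: one site on the forward strand, one on the reverse strand.
print(find_egr1_sites("ttGCGTGGGGCGaaaCGCCCCCCGCtt"))  # [('+', 2), ('-', 15)]
```

Searching the reverse complement matters because the element is not palindromic; a site written on the opposite strand would otherwise be missed.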
II. The tis21 Gene Encodes a Protein with Similarity to a Presumptive Anti-oncogene

The phorbol ester-induced expression of tis21 mRNA is very rapid, peaking within the first hour. Moreover, the tis21 gene is responsive to polypeptide growth factors, such as EGF and PDGF, and to forskolin, as well as to phorbol (Fig. 1). Thus this gene can be induced by agents that activate tyrosine-kinase-signaling pathways, protein-kinase-A-mediated pathways, and protein-kinase-C-mediated pathways. Promoter-construct experiments suggest that the first 460 nucleotides 5' of the transcription start site of the tis21 gene contain the cis-acting elements necessary for induction by all of these agents (49).

FIG. 1. The tis21 gene is rapidly and transiently induced by a variety of ligands. Total RNA was prepared from density-arrested Swiss 3T3 cells treated with epidermal growth factor (EGF; 10 ng/ml), forskolin (FOR; 40 ng/ml), phorbol (TPA; 50 ng/ml), or 20% fetal bovine serum (SER) for 1, 2, 4, and 8 hours. RNA was subjected to electrophoresis and transferred to nitrocellulose; 10 µg of total RNA was loaded in each lane. The blot was hybridized with cDNA probes for tis21 and chob; chob is a cDNA for a constitutive message, used to normalize for differences in loading of RNA. These data are reproduced from 49.

The tis21 gene can be expressed in a wide range of cells in response to phorbol ester stimulation (5, 49). The rat homologue of tis21 has been cloned as an NGF-inducible gene, pc3, from PC12 cells (50). A putative signal sequence in the pc3-encoded protein has led to the suggestion that this molecule may be secreted from cells. Antibody prepared to recombinant tis21 protein can identify a 17-kDa protein that is rapidly synthesized in response to mitogen stimulation in Swiss 3T3 cells (49). Pulse-chase experiments suggest that this protein has an extremely short half-life, 15 minutes or less (50a). However, pulse-chase experiments using our antibody to tis21 suggest that the tis21-encoded protein is not secreted after induction, either by phorbol-treated Swiss 3T3 cells or by NGF-treated PC12 cells (50a). The protein encoded by the human btg1 gene shows 59% identity and 75% conserved sequence similarity with the rat PC3 protein (51). The exon/intron structures of the tis21 gene (42) and the btg1 gene (51) are remarkably similar, strengthening the suggestion that these two genes are members of a gene family. btg1 is not the human homologue of pc3/tis21, however. A human gene more closely related than btg1 to tis21/pc3 has been identified, but not characterized (51). btg1 was cloned and characterized because of its proximity to the breakpoint of a t(8;12) translocation that occurs in a human B-cell chronic lymphocytic leukemia. The btg1 gene is not involved in the translocation, but lies only 10 kb downstream from the breakpoint (52). Overexpression of btg1 in NIH 3T3 cells decreases growth rates and reduces cloning ability, leading to the suggestion that it is an “antiproliferative gene” whose loss of function may contribute to uncontrolled growth (51). The similarities in both the exon/intron structures of the btg1 and tis21/pc3 genes and the sequences of their encoded products suggest that continued characterization of the tis21 gene and its product should provide additional information on cell growth and differentiation.
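A half-life such as the "15 minutes or less" estimated for the tis21 protein is usually obtained by assuming first-order decay: the fraction of labeled protein remaining after a chase of t minutes is e^(-kt), so t1/2 = ln 2 / k. The sketch below fits k by least squares to ln(fraction) with the line forced through the origin (fraction = 1 at t = 0). The chase times and fractions are synthetic values constructed to have a 15-minute half-life; they are not data from the cited pulse-chase experiments, and the function name is hypothetical.

```python
import math


def half_life(times_min, fraction_remaining):
    """Fit ln(fraction) = -k * t through the origin by least squares; return ln(2) / k."""
    num = sum(t * math.log(f) for t, f in zip(times_min, fraction_remaining))
    den = sum(t * t for t in times_min)
    if den == 0:
        raise ValueError("need at least one nonzero chase time")
    k = -num / den  # first-order decay constant, per minute
    return math.log(2) / k


# Synthetic chase data: the labeled fraction halves every 15 minutes.
times = [0, 5, 10, 15, 30]
fractions = [2 ** (-t / 15) for t in times]
print(round(half_life(times, fractions), 6))  # 15.0
```

With real immunoprecipitation data the points scatter, and an unconstrained fit (allowing the intercept to float) may be preferable if the zero-time value is uncertain.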
III. The tis11 Gene Is a Member of a Multigene Family

The predicted open reading frame of the tis11 gene encodes a protein unrelated to any known protein. However, the protein contains two copies of a repeated amino-acid sequence, YKTELCX₈CX₅CX₃H, separated by 12 amino acids (53-55). The tis11 gene was also cloned as a primary response gene from both NIH 3T3 and BALB/c 3T3 cells, as Nup475 (56) and TTP (57), respectively. Antisera to recombinant Nup475 protein detected a transient elevation in nuclear staining following mitogen stimulation (56). Nuclear localization, the (Cys + His) motif, and the presence of (Ser + Pro)-rich domains led to the suggestion that the protein encoded by the tis11/nup475/ttp gene may be a rapidly inducible transcription factor (55-57).

Many of the primary response genes that encode mitogen-inducible transcription factors are members of gene families. The c-fos gene is a member of a family that includes at least four genes (c-fos, fra-1, fra-2, and fosB; reviewed in 1). Similarly, the c-jun gene is a member of a gene family that includes at least two other genes, junB and junD (1). The egr-1/tis8 gene belongs to a family of transcription factors with related DNA-binding domains (1). We used both hybridization with degenerate oligonucleotides and polymerase-chain-reaction procedures to identify, from both cDNA libraries and genomic DNA, two additional murine genes (tis11b and tis11d) that encode proteins related to the tis11-encoded protein (55). The YKTELCX₈CX₅CX₃H repeats and the spacing between them are completely conserved in the proteins encoded by tis11, tis11b, and tis11d. The conserved repeats lie within a 67-amino-acid region of all three proteins that shares very strong additional sequence conservation (Fig. 2). The regions of sequence similarity shared by the murine TIS11, TIS11b, and TIS11d proteins are shown schematically in Fig. 3.
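The tis11 repeat can be written as a regular expression in which each X matches any residue and the subscript gives the number of residues. The sketch below assumes the reading YKTELC-X8-C-X5-C-X3-H of the motif and uses a synthetic test sequence (not a TIS11 family sequence) carrying two repeats separated by 12 residues, mirroring the spacing stated in the text; the function names are hypothetical.

```python
import re

# YKTELC, 8 residues, C, 5 residues, C, 3 residues, H ("X" = any residue).
TIS11_REPEAT = re.compile(r"YKTELC[A-Z]{8}C[A-Z]{5}C[A-Z]{3}H")


def find_repeats(protein):
    """Return (start, end) of each non-overlapping match; 0-based, end exclusive."""
    return [(m.start(), m.end()) for m in TIS11_REPEAT.finditer(protein.upper())]


def spacers(repeats):
    """Number of residues between the end of one repeat and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(repeats, repeats[1:])]


# Synthetic sequence: two 25-residue repeats joined by a 12-residue spacer.
repeat = "YKTELC" + "A" * 8 + "C" + "G" * 5 + "C" + "S" * 3 + "H"
protein = "M" + repeat + "P" * 12 + repeat + "Q"
hits = find_repeats(protein)
print(hits, spacers(hits))  # [(1, 26), (38, 63)] [12]
```

Checking the spacer length as well as the motif itself reflects the observation that both the repeats and the spacing between them are conserved across tis11, tis11b, and tis11d.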
The regions of major sequence conservation among the TIS11 proteins may be DNA-binding domains present in a family of related transcription factors. However, data supporting this hypothesis have not yet been published.

FIG. 2. The TIS11, TIS11b, and TIS11d proteins share regions of sequence identity and similarity. The predicted amino-acid sequences of the TIS11, TIS11b, and TIS11d proteins are aligned to identify maximal sequence similarity. Identical amino acids and conservative substitutions are boxed. The 67-amino-acid “core” region, conserved in all three proteins, is within the area enclosed in heavy lines. These data are reproduced from 55.

The rat homologue of tis11b, cmg1, was cloned as an EGF-inducible gene in rat colonic epithelial cells (58). The rat cmg1- and murine tis11b-encoded proteins differ by only one serine-to-threonine substitution, a remarkable level of sequence conservation. In comparison, the murine tis21- and rat pc3-encoded proteins differ at 13 amino-acid residues. The murine nur77/tis1- and rat NGFI-B-encoded proteins differ much more extensively in amino-acid sequence, and also differ in size. Similarly, the egr-1/tis8- and NGFI-A-encoded proteins differ in size and amino-acid sequence.
FIG. 3. The TIS11, TIS11b, and TIS11d proteins share regions of sequence identity and similarity. The regions of strong sequence similarity in these three proteins are indicated in hatched boxes.
The tis11 gene is not detectably transcribed in unstimulated 3T3 cells, but is rapidly and transiently expressed following phorbol or growth factor treatment. In contrast, the basal levels of tis11b and tis11d messages are substantial. Moreover, treatment with phorbol or other mitogens does not dramatically elevate the levels of these two messages in Swiss 3T3 cells (Fig. 4). Induction of tis11 mRNA accumulation by phorbol or growth factors is dramatically enhanced by cycloheximide. In contrast, cycloheximide has relatively little effect on tis11b and tis11d message levels in Swiss 3T3 cells (55). These data suggest that, in 3T3 cells, the tis11b and tis11d genes are subject to much less transcriptional modulation than the tis11 gene. The differences in their message levels between induced and uninduced cells would not have been sufficient to permit identification of the tis11b and tis11d cDNAs in our original differential hybridization screen for phorbol-induced sequences; the cloning of these two cDNAs depended on sequence homologies. The fact that cmg1/tis11b is inducible in rat epithelial cells but relatively constitutive in Swiss 3T3 cells raises the exciting possibility that the regulation of members of this gene family may be cell-type specific.

FIG. 4. Phorbol induces tis11 mRNA accumulation to a much greater extent than it induces tis11b or tis11d. Density-arrested 3T3 cells were treated with phorbol (50 ng/ml) for the times (in minutes) shown. Total RNA was isolated, and 10 µg was subjected to electrophoresis, transfer, blotting, and hybridization. Each probe was hybridized to a separate filter. These data are reproduced from 55.
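The argument that tis11b and tis11d would have escaped the original screen comes down to fold induction: a gene with substantial basal expression and a modest increase gives a small induced-to-basal ratio, which a plus/minus plaque screen cannot resolve. The sketch below makes that arithmetic explicit. All signal values, the `floor` standing in for a detection limit, and the screening `threshold` are hypothetical numbers chosen for illustration, not measurements from the Northern blots described above.

```python
def fold_induction(basal, induced, floor=0.1):
    """Induced/basal ratio; `floor` (a hypothetical detection limit) avoids division by zero."""
    return induced / max(basal, floor)


def detectable_by_screen(signals, threshold=5.0):
    """Genes whose fold induction reaches a hypothetical minimum resolvable ratio."""
    return [gene for gene, (basal, induced) in signals.items()
            if fold_induction(basal, induced) >= threshold]


# Hypothetical normalized message levels: (uninduced, phorbol-treated).
signals = {
    "tis11": (0.0, 8.0),   # undetectable basal level, strong induction
    "tis11b": (3.0, 4.5),  # substantial basal level, modest change
    "tis11d": (2.5, 3.0),
}
print(detectable_by_screen(signals))  # ['tis11']
```

The absolute induced level is similar for all three hypothetical genes; what separates them is the ratio, which is what a differential hybridization screen effectively measures.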
IV. The tis10 Gene Encodes a Functional Prostaglandin Synthase/Cyclooxygenase

The tis10 gene is the most highly restricted of the tis primary response genes. In our original analysis of primary response gene expression in various cell culture lines, we found tis10 message induced only in fibroblast cell lines (59). Subsequently, we observed that this gene could be induced in murine myeloid cell lines and in human neutrophils by granulocyte-macrophage colony-stimulating factor, as well as by phorbol esters (10). The tissue- and cell-type restriction of the tis10 gene, in contrast to the relatively widespread expression of many of the other tis genes, suggested a potential cell-type-specific function for the TIS10 protein. The tis10 message predicts an open reading frame of 604 amino acids (59). Analysis of the predicted amino-acid sequence demonstrates extensive sequence identity with the previously cloned human, murine, and ovine cDNAs for prostaglandin synthase/cyclooxygenase (EC 1.14.99.1). The murine tis10-encoded and prostaglandin synthase proteins are over 60% identical, and have over 80% sequence similarity when conservative amino-acid replacements are considered (59). We now refer to the prostaglandin synthase gene as pgs-1 and to the tis10 gene as tis10/pgs-2. The chicken form of the tis10/pgs-2 cDNA was cloned from a cDNA library prepared from a chicken embryo fibroblast cell line expressing a temperature-sensitive v-src gene (60). Because of the evolutionary distance between the chicken and the species (human, mouse, sheep) from which the pgs-1 cDNAs had previously been cloned, these workers could not be certain whether their cDNA represented the chicken homologue of pgs-1 or the product of a distinct gene.
Induction of tis10/pgs-2 mRNA accumulation following activation of a temperature-sensitive v-src gene was also demonstrated in 3T3 cells (61). However, at the time this experiment was reported, the sequence similarities between tis10/pgs-2 and pgs-1 were not known. cDNAs for the murine form of the tis10/pgs-2 gene were subsequently cloned by several other groups (62, 63). The cDNA for the human homologue of the tis10/pgs-2 gene has also been cloned, using sequence data from the murine protein (64).
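Percent identity and percent similarity, as quoted for the murine TIS10 and prostaglandin synthase proteins, are computed from an alignment by counting identical residues and conservative replacements. The sketch below assumes the two sequences are already aligned (gaps written as '-'), counts only columns where neither sequence has a gap, and uses a hypothetical set of conservative-substitution groups. Published figures were derived with alignment programs whose scoring rules may differ, so this sketch will not necessarily reproduce them.

```python
# Hypothetical grouping of "conservative" substitutions, for illustration only.
CONSERVATIVE_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _same_group(a, b):
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and similarity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in pairs)
    similar = sum(a == b or _same_group(a, b) for a, b in pairs)
    return 100 * identical / len(pairs), 100 * similar / len(pairs)


# Synthetic 10-column alignment with one gap: 6 identical and 3 conservative columns.
ident, sim = identity_and_similarity("MKTLLV-ESA", "MRTLIVQDSA")
print(round(ident, 1), round(sim, 1))  # 66.7 100.0
```

Whether gapped columns enter the denominator is a convention; excluding them, as here, gives higher percentages than dividing by the full alignment length.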
THE tis GENES
Although there are extensive sequence similarities between the PGS-1 and TIS10/PGS-2 proteins, it is clear that these enzymes are encoded by distinct genes. The PGS-1 proteins from humans, mice, and sheep all share a 17-amino-acid sequence located near the amino-terminal end of the molecule that is not present in the murine TIS10/PGS-2 molecule. In contrast, there is an 18-amino-acid sequence, present near the carboxy-terminal end of the TIS10/PGS-2 protein, that is not present in the PGS-1 proteins. These relationships are illustrated diagrammatically in Fig. 5. It appears that the tis10/pgs-2 gene encodes a second form of prostaglandin synthase/cyclooxygenase. Expression of the tis10/pgs-2 coding sequences in COS-1 cells demonstrated that this gene encodes a functional prostaglandin synthase; both the cyclooxygenase and hydroperoxidase activities of the enzyme complex were expressed in the transfected cells (65). A second form of prostaglandin synthase activity was also demonstrated biochemically in rat ovarian granulosa cells (66). Partial amino-acid sequencing demonstrated that this molecule, inducible by pituitary glycoprotein hormones in these cells, is the rat homologue of TIS10/PGS-2 (67). Many of the important features of the PGS-1 molecule are conserved in the TIS10/PGS-2 protein, as would be expected for related proteins that carry out such a complicated enzymatic function. The two regions thought to be the axial and distal binding sites for heme are conserved completely between these two proteins. The histidine residue thought to be involved in the distal heme-binding position is conserved in a KALGH sequence found in the three PGS-1 proteins. PGS-1 is enzymatically inactivated by aspirin as a result of acetylation at serine 530; this site is conserved in TIS10/PGS-2. Many laboratories are currently attempting to identify nonsteroidal anti-inflammatory compounds that can differentially inhibit the enzymatic activity of these two prostaglandin synthases.
FIG. 5. The PGS-1 and TIS10/PGS-2 proteins share regions of sequence identity and similarity. The unshaded areas indicate regions of sequence similarity or identity. The shaded area in the N-terminal region of PGS-1 indicates a 17-amino-acid hydrophobic sequence (leader sequence) present in PGS-1 but absent in TIS10/PGS-2. The shaded area in the C-terminal region of TIS10/PGS-2 indicates the 18-amino-acid insertion (TIS10 peptide) present in TIS10/PGS-2 but missing from PGS-1.
HARVEY R. HERSCHMAN ET AL.
V. Both Prostaglandin Synthesis and TIS10/PGS-2 Synthesis Are Induced in Swiss 3T3 Cells

Other workers have reported that mitogen treatment of 3T3 cells induces prostaglandin synthesis and secretion, and have suggested that this response is dependent on the synthesis of new protein(s) (68-70). We have confirmed these observations; phorbol-induced prostaglandin synthesis is inhibited by cycloheximide in Swiss 3T3 cells (Fig. 6). We prepared an antipeptide antiserum to a region of the 18-amino-acid sequence unique to TIS10/PGS-2 (Fig. 5). This antiserum can detect TIS10/PGS-2 in microsomal membranes of transfected COS-1 cells, but does not react with PGS-1 protein (71). Immunofluorescent staining is detectable with the antipeptide antiserum in Swiss 3T3 cells treated for 6 hours with phorbol (Fig. 7). In contrast, no staining is present in untreated cells. Incubation with the immunizing peptide blocks the immunofluorescent staining, demonstrating the specificity of the antiserum. Cycloheximide treatment completely prevents the appearance of phorbol-induced TIS10/PGS-2 immunoreactivity, suggesting that the phorbol-induced synthesis and secretion of prostaglandin E2 is dependent on the production of the inducible TIS10/PGS-2 form of prostaglandin synthase.

Western blot analysis confirmed that no detectable TIS10/PGS-2 antigen is present in quiescent, untreated Swiss 3T3 cells (Fig. 8). TIS10/PGS-2 antigen accumulates in 3T3 cells following phorbol treatment, peaking 6-8 hours after mitogen stimulation. Antigen levels then decline, eventually returning to baseline values, even in the continued presence of phorbol. Accumulation and disappearance of the TIS10/PGS-2 protein (Fig. 8) are mediated by a combination of phorbol-induced synthesis of the protein and cellular degradation. A procedure employing metabolic labeling and subsequent immunoprecipitation was used to measure these two parameters.

FIG. 6. Cycloheximide inhibits phorbol-induced prostaglandin E2 synthesis and secretion in Swiss 3T3 cells. Left panel: Density-arrested 3T3 cells were treated with phorbol (50 ng/ml) for the hours shown. Media were collected and prostaglandin concentrations were determined by radioimmunoassay. Right panel: Cells were either exposed to cycloheximide (10 μg/ml) for 30 minutes or were left untreated, then exposed to phorbol (50 ng/ml) for 4 hours. Media were collected and prostaglandin concentrations were determined. Data points are the averages and standard deviations of three dishes. CONT, Control; CHX, cycloheximide; TPA, phorbol. These data are reproduced from (88).

FIG. 7. Antibody to the unique carboxy-terminal region of TIS10/PGS-2 can detect induced TIS10/PGS-2 protein synthesis by immunofluorescence in phorbol-treated Swiss 3T3 cells. Indirect immunofluorescence of untreated Swiss 3T3 cells (upper left), cells treated with phorbol (50 ng/ml) for 6 hours (upper right), cells treated with phorbol (50 ng/ml) but incubated with 1 μM TIS10/PGS-2-specific peptide 30 minutes prior to and during incubation with primary antibody (lower left), and cells treated with phorbol (50 ng/ml) and cycloheximide (10 μg/ml) for 6 hours (lower right). These data are reproduced from (71).
FIG. 8. Phorbol induces the transient accumulation of TIS10/PGS-2 protein in Swiss 3T3 cells. Swiss 3T3 cells were treated with phorbol (50 ng/ml). At the hours indicated, extracts were prepared and subjected to electrophoresis and western immunoblot analysis, using antibody to the TIS10/PGS-2 carboxy-terminal peptide. Antibody-antigen complexes were identified with a chemiluminescence assay; 75 μg of protein was loaded in each lane. These data are reproduced from (71).
Quiescent, untreated Swiss 3T3 cells synthesize only minimally detectable levels of TIS10/PGS-2 protein (Fig. 9). In the first hour following stimulation with phorbol there is relatively little increase in incorporation of [35S]methionine or [35S]cysteine into TIS10/PGS-2 protein. However, subsequent periods of pulse-labeling suggest a peak of synthesis in the third to the fifth hours after mitogen stimulation. To measure TIS10/PGS-2 protein degradation, cells were induced by exposure to phorbol, and proteins were labeled with radioactive cysteine and methionine. The inducer and the metabolic labels were then removed from the medium, and at successive intervals the cells were analyzed by immunoprecipitation, electrophoresis, and autoradiography for the remaining metabolically labeled TIS10/PGS-2 protein (Fig. 9). The half-life of the TIS10/PGS-2 protein in Swiss 3T3 cells, determined from this experiment, is 4 hours. A doublet is observed in the immunoprecipitation experiments shown in Fig. 9. Previous studies using antiserum to the sheep seminal vesicle prostaglandin synthase, PGS-1, have also precipitated doublet bands (72). It is likely that those polyclonal antisera to PGS-1 precipitated both PGS-1 and TIS10/PGS-2, making the previous data difficult to interpret. Immunizing peptide blocks the precipitation of both forms of the TIS10/PGS-2 doublet reacting with the antipeptide antiserum (71), suggesting that both immunoprecipitated TIS10/PGS-2 proteins share the common carboxy-terminal epitope. The doublet may result from differential glycosylation of a common protein, as suggested for the PGS-1 molecule (73). However, alternative splicing of the human pgs-1 message to produce a message encoding a PGS-1 protein with a 37-amino-acid deletion has also been reported (74); an analogous event is a possible alternative explanation for the doublet observed here.
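A half-life such as the 4-hour value above is commonly obtained by assuming first-order decay, so that the labeled protein remaining falls as N(t) = N0·e^(−kt) and t½ = ln 2 / k. The sketch below assumes that band intensities from the chase have already been quantified (the numbers are hypothetical, not the data of Fig. 9) and estimates k by least-squares regression of ln(signal) on time.

```python
import math


def half_life(times_h, signal):
    """Estimate a first-order half-life (hours) from a chase time course.

    Fits ln(signal) = ln(N0) - k*t by ordinary least squares and returns ln(2)/k.
    """
    n = len(times_h)
    logs = [math.log(s) for s in signal]  # signal values must be > 0
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    k = -sxy / sxx  # decay-rate constant, per hour
    return math.log(2) / k


# Hypothetical band intensities at each chase time (arbitrary units)
times = [0, 2, 4, 6, 8]
remaining = [100 * 0.5 ** (t / 4) for t in times]
print(round(half_life(times, remaining), 2))  # prints 4.0
```

Fitting all time points, rather than reading off the time at which half the label remains, uses every measurement and is less sensitive to a single noisy lane.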
FIG. 9. Phorbol induces the synthesis of TIS10/PGS-2 protein in Swiss 3T3 cells. Top panel: Swiss 3T3 cells were incubated in serum-free, methionine/cysteine-free medium for 30 minutes prior to stimulation with 50 ng/ml phorbol. The cells were labeled with [35S]methionine and [35S]cysteine for sequential 1-hour intervals. The numbers above the lanes indicate the time periods (in hours) following phorbol addition when the cells were metabolically pulse labeled. Cells were lysed and the supernatants were subjected to immunoprecipitation, using antibody to the TIS10/PGS-2 carboxy-terminal peptide. Immunoprecipitates were subjected to electrophoresis and autoradiography. Bottom panel: degradation of the TIS10/PGS-2 protein. Swiss 3T3 cells were treated with phorbol (50 ng/ml) and subjected to metabolic labeling for 6 hours with [35S]methionine and [35S]cysteine. Medium with inducer and radioactive amino acids was then removed and replaced with methionine- and cysteine-supplemented medium. Cells were harvested at the hours indicated following the transfer to nonradioactive, methionine/cysteine-rich medium. Lysates were prepared and subjected to immunoprecipitation with the antipeptide antiserum. These data are reproduced from (71).
The regulation of PGS activity is not governed solely by synthesis and degradation. PGS-1 rapidly autoinactivates during catalysis, producing antigenically detectable but enzymatically inactive protein (75, 76). Although autoinactivation studies have not yet been reported for the TIS10/PGS-2 protein, the similarities between the two PGS enzymes make it reasonable to assume that TIS10/PGS-2 also autoinactivates. The ability of cells to convert free arachidonic acid to PGH2 is therefore likely to reflect a complex interplay among the level of constitutive PGS-1 enzyme, synthesis of the tis10/pgs-2 message and protein following ligand stimulation, covalent inactivation of the enzymes, and degradation of both the PGS-1 and TIS10/PGS-2 proteins.
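The interplay described above can be explored with a toy kinetic model. This is a hypothetical sketch, not a model proposed in the original work: synthesis is assumed to run at a constant rate during a fixed window after stimulation, all enzyme is degraded with first-order kinetics using the 4-hour half-life measured above, and active enzyme is additionally lost at an assumed first-order autoinactivation rate (`k_inact`), so that antigen (total protein) and activity can diverge.

```python
import math


def simulate_enzyme(hours=24.0, dt=0.01, half_life_h=4.0, k_inact=0.0,
                    synth_window=(1.0, 6.0), synth_rate=1.0):
    """Euler integration of a toy model for an inducible enzyme.

    Returns a list of (time_h, total, active) tuples, where
      total  = all enzyme protein (antigenically detectable), and
      active = catalytically competent enzyme, which is also lost through a
               hypothetical first-order autoinactivation rate k_inact (per hour).
    The synthesis window and rate are illustrative assumptions.
    """
    kd = math.log(2) / half_life_h  # first-order degradation constant, per hour
    start, end = synth_window
    total = active = 0.0
    trace = []
    for i in range(int(round(hours / dt)) + 1):
        t = i * dt
        trace.append((t, total, active))
        s = synth_rate if start <= t < end else 0.0  # transient synthesis pulse
        total += (s - kd * total) * dt
        active += (s - (kd + k_inact) * active) * dt
    return trace


trace = simulate_enzyme(k_inact=0.5)
peak_t, peak_total, _ = max(trace, key=lambda row: row[1])
print(f"total protein peaks near t = {peak_t:.1f} h")
```

With `k_inact` set to zero, activity tracks antigen exactly; any positive value makes activity fall below antigen, which is the kind of divergence autoinactivation would produce.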
A. Expression of tis10/pgs-2 in Swiss 3T3 Cells Can Be Induced by a Variety of Ligands

Although the tis genes were isolated in response to phorbol stimulation, they are, in general, inducible in 3T3 cells by a variety of mitogens (5, 20, 59). Moreover, prostaglandin synthesis can be induced in 3T3 cells by a variety of stimulatory agents, including phorbol, serum, and a number of polypeptide mitogens. Once tis10/pgs-2 was identified as an inducible gene for prostaglandin synthase/cyclooxygenase, we further investigated the induction of this gene by alternative agents in Swiss 3T3 cells. We also compared induction of tis10/pgs-2 to that of pgs-1, because induction of the pgs-1-encoded enzyme in BALB/c 3T3 cells in response to platelet-derived growth factor had previously been suggested (77). Swiss 3T3 cells treated with EGF, forskolin, serum, or phorbol all rapidly and transiently express tis10/pgs-2 message (Fig. 10). Thus the tis10/pgs-2 gene responds to signals presumably mediated by the protein tyrosine kinase (EGF receptor), protein kinase A (forskolin), and protein kinase C (phorbol) signal-transduction pathways. In contrast, none of these agents significantly altered the level of pgs-1 message; expression of the pgs-1 gene appears to be strictly constitutive in Swiss 3T3 cells. However, induction of pgs-1 message accumulation has been reported in human lung fibroblasts (74) and human endothelial cells (78, 79).
B. Glucocorticoids Inhibit Induction of tis10/pgs-2 in Swiss 3T3 Cells

Eicosanoid production in response to ligand stimulation has been demonstrated in a number of cell types, including fibroblasts (80), endothelial cells (81), neutrophils (82), macrophages (83-85), monocytes (86), and muscle cells (87). In each of these cases, glucocorticoids can attenuate the production of eicosanoids. Conventional explanations for induction of prostaglandin
FIG. 10. The tis10/pgs-2 gene is rapidly and transiently induced by a variety of ligands. Total RNA was prepared from density-arrested Swiss 3T3 cells treated with EGF (10 ng/ml), forskolin (FOR; 40 ng/ml), phorbol (TPA; 50 ng/ml), or 20% fetal bovine serum (SER) for 1, 2, 4, and 8 hours. RNA (10 μg/ml) was subjected to electrophoresis and transferred to nitrocellulose. Two separate electrophoretic analyses and blots were performed for the experiments shown here. One blot was hybridized with a cDNA probe for tis10/pgs-2. The second blot, prepared from the same RNAs, was hybridized with cDNA probes for pgs-1 (COX) and chob. These data are reproduced from (59).
synthesis and its inhibition by glucocorticoids have previously centered on the activation of phospholipase A2 (PLA2) and possible regulation of PGS-1 enzyme levels. Our demonstration that a second, ligand-regulated form of prostaglandin synthase exists, and that PGS-1 levels are not modulated by extracellular ligands in Swiss 3T3 cells, led us to reexamine this question. We had previously demonstrated that cycloheximide could block both phorbol-induced prostaglandin accumulation and phorbol-induced synthesis of TIS10/PGS-2 protein in Swiss 3T3 cells (Figs. 6 and 7). We then examined the effect of dexamethasone on these same two phorbol-mediated events (88). Dexamethasone completely blocks both PGE2 induction (Fig. 11) and the accumulation of detectable TIS10/PGS-2 (Fig. 12) in phorbol-treated Swiss 3T3 cells. As shown in Fig. 12, dexamethasone also blocks serum-induced TIS10/PGS-2 accumulation, suggesting that glucocorticoids may block induced expression of the tis10/pgs-2 gene in response to all ligands. We thought it likely that the dexamethasone inhibition of phorbol-induced TIS10/PGS-2 accumulation was occurring at the level of mRNA accumulation, because this is the most common mechanism of glucocorticoid regulation. We challenged 3T3 cells with phorbol following incubation with dexamethasone, and characterized the production of both tis10/pgs-2 and
FIG. 11. Dexamethasone inhibits phorbol-induced prostaglandin E2 synthesis and secretion in Swiss 3T3 cells. Cells were either exposed to 2 μM dexamethasone (DEX) for 4 hours or were left untreated, then exposed to phorbol (TPA; 50 ng/ml) for 4 hours. Media were collected and prostaglandin E2 concentrations were determined by radioimmunoassay. Treatments shown: CONT (control), TPA, and TPA + DEX. Data points are the averages and standard deviations of three dishes. These data are reproduced from (88).
pgs-1 message (Fig. 13). Dexamethasone alone had no effect on tis10/pgs-2 mRNA levels. However, incubation with this hormone blocked the phorbol-induced accumulation of tis10/pgs-2 message. In contrast, dexamethasone had no effect on the constitutively expressed levels of pgs-1 message. Dexamethasone also inhibited the forskolin-induced accumulation of tis10/pgs-2 message (88), reinforcing our suggestion that glucocorticoids block all ligand-induced tis10/pgs-2 mRNA accumulation in Swiss 3T3 cells. Cortisol and aldosterone, at concentrations higher than those observed to be inhibitory for dexamethasone, also blocked phorbol-induced tis10/pgs-2 message accumulation. In contrast, progesterone and testosterone, steroids without glucocorticoid activity, had no effect on induction of tis10/pgs-2 message (88). Dexamethasone at concentrations as low as 2 nM can inhibit phorbol-induced tis10/pgs-2 message accumulation (Fig. 14). In contrast, concentrations of dexamethasone 1000 times higher do not inhibit the phorbol-induced accumulation of tis8/egr-1 message. Thus, glucocorticoid inhibition of tis10/pgs-2 gene expression in response to mitogens is not a common feature of all primary response genes. It will be of great interest to determine which subset of primary response genes is subject to this additional level of hormonal regulation.
FIG. 12. Dexamethasone inhibits phorbol- and serum-induced synthesis of TIS10/PGS-2 protein. Control Swiss 3T3 cells (upper left), cells treated with phorbol (TPA; 50 ng/ml) for 6 hours (upper middle), cells treated with 20% serum for 6 hours (upper right), cells treated with dexamethasone (2 μM) for 10 hours (lower left), cells treated with dexamethasone for 4 hours followed by phorbol for 6 hours (lower middle), and cells treated with dexamethasone for 4 hours followed by serum for 6 hours (lower right) were fixed and subjected to indirect immunofluorescent analysis. These data are reproduced from (71).
FIG. 13. Dexamethasone inhibits phorbol-induced accumulation of tis10/pgs-2 mRNA. Swiss 3T3 cells were treated with phorbol (TPA; 50 ng/ml) or dexamethasone (DEX; 2 μM) for the times (in hours) indicated. DEX4, TPA2 cells were preincubated with DEX for 4 hours prior to exposure to phorbol for 2 hours. DEX TPA2 cells were incubated for 2 hours with phorbol and DEX, with no preincubation. Total RNA was isolated, subjected to electrophoresis, and transferred to nitrocellulose; 10 μg of RNA was loaded in each lane. The blot was hybridized with cDNA probes for tis10/pgs-2, pgs-1 (pgs/cox), and chob. These data are reproduced from (88).
C. Expression of tis10/pgs-2 in Swiss 3T3 Cells Is Regulated at the Transcriptional Level

The je primary response gene (27) encodes the murine homologue of macrophage chemotactic protein-1 (MCP-1). This protein is thought to play a role in wound healing, atherosclerosis, and inflammatory responses (89). Glucocorticoids block mitogen induction of je/mcp-1. However, the mode of glucocorticoid inhibition of je/mcp-1 induction appears to be cell-type specific; the hormone blocks mitogen-induced transcription of the je/mcp-1 gene in BALB/c 3T3 cells (90), but modulates the stability of je/mcp-1 message in vascular smooth muscle cells (91). These observations, as well as many studies on the steroid regulation of other inducible genes, suggest that the mechanisms by which glucocorticoids inhibit gene expression can vary with cell type, ligand, and gene. We carried out nuclear transcription experiments to determine whether steroid inhibition of tis10/pgs-2 induction occurs at the transcriptional or posttranscriptional level. Little or no transcription could be detected for the tis8/egr-1, tis21, or tis10/pgs-2 genes in nuclei from unstimulated cells (Fig. 15). Nuclei isolated from cells treated for 15 or 30 minutes with phorbol incorporated substantial isotope into transcripts for each of these three genes. These observations suggest that increased transcription is a major
FIG. 14. Dexamethasone (DEX) does not inhibit phorbol-induced accumulation of tis8/egr-1 mRNA. Total RNA was prepared from Swiss 3T3 cells preincubated for 4 hours with either vehicle (ethanol) or DEX at the concentrations shown, then treated with phorbol (50 ng/ml) for 2 hours. RNA was also prepared from DEX-treated, phorbol-treated, and untreated cells. The RNA preparations were subjected to northern analysis. Each lane contains 10 μg of RNA. Blots were hybridized with tis10/pgs-2 and tis8/egr-1 radioactive cDNA probes. These data are reproduced from (88).
mechanism by which phorbol increases the message levels of the tis8/egr-1, tis21, and tis10/pgs-2 primary response genes. Incubation of cells with dexamethasone prior to phorbol treatment dramatically suppressed nuclear incorporation of radioactive RNA precursors into transcripts of the tis10/pgs-2 gene, without appearing to have a substantial effect on the synthesis of transcripts from the tis8/egr-1 or tis21 genes. These data suggest that, in cells previously incubated with glucocorticoids, transcriptional inhibition contributes heavily to the observed attenuation of phorbol-induced tis10/pgs-2 mRNA accumulation. It is possible that processing of messenger precursors and/or the stability of tis10/pgs-2 message are also modulated by glucocorticoids, as reported for other genes.

FIG. 15. Dexamethasone (DEX) inhibits phorbol-induced transcription of the tis10/pgs-2 gene. Swiss 3T3 cells were incubated with dexamethasone (2 μM) for 4 hours, then treated with phorbol (T; 50 ng/ml) for 15 or 30 minutes. Control (CON) cells were treated either with DEX alone for 4 hours, or with phorbol for 15 and 30 minutes. Nuclei were prepared and allowed to incorporate radioactive nucleotides into RNA. RNA was isolated and hybridized to filters containing pGEM plasmid DNAs with inserts for tis8/egr-1, tis21, tis10/pgs-2, and chob, or with pGEM DNA (as a nonspecific control). After washing, the filters were subjected to autoradiography.

FIG. 16. The first 370 nucleotides of the tis10/pgs-2 promoter contain cis-acting response elements for phorbol and serum. NIH 3T3 cells were transfected with an empty expression vector, with tis10₉₆₃ (a plasmid in which the first 963 nucleotides of the tis10/pgs-2 promoter are fused to a luciferase reporter), or with tis10₃₇₁ (a plasmid in which the first 371 nucleotides of the tis10/pgs-2 promoter are fused to luciferase). Control cells (C), phorbol-treated cells (T; 50 ng/ml), and serum-treated cells (S) were assayed for luciferase activity. Error bars indicate standard deviations. These data are reproduced from (65).

While characterizing the structure of the tis10/pgs-2 gene, we also cloned several kilobases of DNA 5' to the transcription start site. We have fused these 5' regions of the tis10/pgs-2 gene to the luciferase gene, to analyze promoter structure and function by transient transfection. A luciferase reporter construct containing the first 371 bp 5' of the start site of the tis10/pgs-2 transcript can respond to phorbol, serum (Fig. 16), forskolin, and platelet-derived growth factor (B. S. Fletcher and H. R. Herschman, unpublished). Cotransfection of this reporter construct along with a construct expressing pp60src also results in elevated luciferase activity (B. S. Fletcher, W. Xie and H. R. Herschman, unpublished). We conclude that the first 371 nucleotides of the tis10/pgs-2 regulatory region contain cis-acting response elements for phorbol, serum, forskolin, PDGF, and pp60src. However, induction of luciferase activity by these agents cannot be blocked with dexamethasone, using either the construct containing 371 nucleotides (tis10₃₇₁) of the tis10/pgs-2 regulatory region or a construct containing approximately 1 kb (tis10₉₆₃) of promoter region (B. S. Fletcher, D. A. Kujubu and H. R. Herschman, unpublished). Dexamethasone blocks induction of the tis10/pgs-2 gene by a number of inducers, including phorbol, serum, and forskolin (71, 88). We therefore expect that there is a specific DNA site at which dexamethasone-mediated events block tis10/pgs-2 gene transcription induced by a variety of agents. Because the first kilobase of promoter does not confer dexamethasone sensitivity, this site must lie either more than 1 kb upstream of the transcription start site or within the transcribed region of the gene. Regulation of gene transcription by DNA encoding a sequence within the 3' untranslated region of the message has recently been reported for the primary response gene je (92). One of our current major objectives is to identify the site and nature of glucocorticoid inhibition of the tis10/pgs-2 gene.
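Promoter responsiveness in transient transfection assays of this kind is often expressed as fold induction: the mean reporter activity of treated cells divided by that of control cells, reported alongside the standard deviation of replicate dishes. The sketch below uses hypothetical triplicate luciferase readings, not values from Fig. 16, and omits any normalization for transfection efficiency, which the text does not describe.

```python
from statistics import mean, stdev


def summarize(readings):
    """Mean and sample standard deviation of replicate reporter readings."""
    return mean(readings), stdev(readings)


def fold_induction(control, treated):
    """Ratio of mean treated activity to mean control activity."""
    return mean(treated) / mean(control)


# Hypothetical triplicate luciferase readings (arbitrary light units)
control = [100.0, 110.0, 90.0]
phorbol = [1000.0, 1100.0, 900.0]

m, sd = summarize(phorbol)
print(f"phorbol: {m:.0f} +/- {sd:.0f}; "
      f"fold induction = {fold_induction(control, phorbol):.1f}")
```

Comparing fold induction, rather than raw light units, between constructs of different length (for example, 371 versus 963 nucleotides) helps separate promoter responsiveness from differences in basal activity.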
VI. The Structures of the tis10/pgs-2 and pgs-1 Genes Are Similar

Although pgs-1 and tis10/pgs-2 share extensive sequence similarity, the enzyme encoded by pgs-1 has a hydrophobic region at its amino-terminal end that is absent from the tis10/pgs-2-encoded enzyme (Fig. 5). TIS10/PGS-2 has an 18-amino-acid sequence near its carboxy terminus that is not present in the PGS-1 protein of mouse, human, or sheep (59). We used a combination of polymerase-chain-reaction analysis and direct sequencing of genomic clones to identify the exon/intron borders of the tis10/pgs-2 gene and to analyze the sizes of the introns. The murine tis10/pgs-2 gene contains 10 exons and 9 introns (Fig. 17). The pgs-1 gene has one additional exon (and, therefore, one additional intron), encoding the hydrophobic leader sequence found near the amino terminus. The initial and final exons of the two genes differ in size, encoding distinct 5' untranslated sequences and amino-terminal portions of the protein, as well as distinct 3' untranslated sequences. The eight internal exons of the tis10/pgs-2 gene (exons 2-9) are all identical in size to the corresponding human pgs-1 exons, with the exception of tis10/pgs-2 exon 3. This exon encodes an additional proline present in the TIS10/PGS-2 protein but absent in murine, ovine, and human PGS-1. The pgs-1 gene maps to
murine chromosome 2. The tis10/pgs-2 gene is located on murine chromosome 1 (93). Although the exon/intron structures of the tis10/pgs-2 and pgs-1 genes are quite similar, their overall sizes are substantially different (Fig. 17). The pgs-1 gene is over 22 kb long (94). The structure of the murine pgs-1 gene has recently been shown to be nearly identical to that of the human pgs-1 gene, both in exon/intron structure and in size (95). In contrast, the murine tis10/pgs-2 gene is only 8 kb long; many of its introns are substantially smaller than those of pgs-1. It is of some interest that all the primary response genes characterized to date are quite small. Many of these genes have only one or two introns. The c-fos, c-jun, egr-1/tis8, egr-2/krox20, nur77/tis1, tis21, and je genes are all
[Figure 17 diagram: exon maps of the human pgs-1 (cox; >22 kb) and murine tis10/pgs-2 genes; scale bar, 2 kb.]
FIG. 17. The tis10/pgs-2 and pgs-1 genes have similar structures. The exons of the murine tis10/pgs-2 gene and the human pgs-1 gene are compared in the upper panel. The numbers in the boxes indicate the number of nucleotides in that exon. The arrows below the genes indicate the protein-coding regions of the transcripts. Arrows above the genes indicate the transcription start sites. Using the same scale, the lower panel illustrates the exon-intron structure of the human pgs-1 gene and the murine tis10/pgs-2 gene. The numbers identify exons. The broken line in the tis10/pgs-2 gene indicates an uncharacterized region 5' of the coding portion of the first exon for this gene. The solid areas of the exons identify the protein-coding sequences. These data are reproduced from (65).
less than 6 kb long (1). It appears that the requirement for rapid transcriptional responses may favor small genes.
VII. tis10/pgs-2 Satisfies the Criteria for the "Second Pool" of Prostaglandin Synthase

Prior to the discovery of tis10/pgs-2, ligand-regulated synthesis and secretion of prostaglandins in fibroblasts, monocytes, macrophages, smooth muscle cells, and epithelial cells were thought to be due to activation of phospholipase A2, which releases arachidonic acid, and, in some cells, to elevation of prostaglandin synthase/cyclooxygenase (PGS-1) activity. Glucocorticoid inhibition was thought to be due exclusively to the inhibition of PLA2, preventing the release of free arachidonic acid; substrate for eicosanoid synthesis was thought to be limiting, and PGS-1 was generally thought to be present in excess. However, several groups demonstrated that cycloheximide or glucocorticoids can inhibit the induction of both cyclooxygenase activity and immunoprecipitable enzyme. Needleman and colleagues (96), as well as others, proposed that some cells may have two pools of PGS/COX enzyme: one that is expressed constitutively, and a second that is regulated by ligand-induced signal-transduction pathways and whose induction is inhibited by glucocorticoids. The tis10/pgs-2 gene satisfies all the criteria for the proposed mitogen-inducible, glucocorticoid-inhibitable second pgs/cox gene. tis10/pgs-2 is inducible by a variety of ligands (59) and is rapidly and transiently transcribed (Fig. 15). Its product has both hydroperoxidase and cyclooxygenase activities (65) and appears to be the rate-limiting enzyme for prostaglandin synthesis and secretion in some cells (88), and its induction is inhibited by glucocorticoid hormones (88). It appears that induction of the tis10/pgs-2 gene will be the rate-limiting step in the induced production of prostanoids in at least some biological systems. Moreover, limitation of the TIS10/PGS-2 protein, due to inhibition of its mRNA accumulation, is likely to play a major role in glucocorticoid inhibition of eicosanoid production.
These recent observations have prompted a major reevaluation of current ideas about the regulation of prostanoid synthesis.
VIII. tis10/pgs-2 Is Induced in Macrophages

We observed that tis10/pgs-2 mRNA can be induced by phorbol or GM-CSF in murine monocytes and in human neutrophils (10), despite its relatively restricted expression in a wide variety of cell lines (59). These data
suggest (1) that the production of prostaglandins as a consequence of macrophage activation might be due to induction of the tis10/pgs-2 gene, and (2) that glucocorticoid inhibition of prostaglandin production in activated macrophages might result from inhibition of tis10/pgs-2 gene expression. However, at the time we made this observation we had not determined that the tis10/pgs-2 cDNA shared sequence similarity with the pgs-1 cDNA. Farber (97) cloned a partial cDNA for tis10/pgs-2 during differential screening to identify cytokine-induced genes in the murine macrophage cell line RAW 264.7. Russell and colleagues (98) identified a p71/73 protein synthesized in the activation response of bone-marrow-derived tumor macrophages and RAW 264.7 cells. Microsequencing studies and immunoblots with our antipeptide antiserum directed uniquely to the TIS10/PGS-2 protein (71) demonstrated that the p71/73 protein induced by lipopolysaccharide (LPS) activation of RAW 264.7 cells is the TIS10/PGS-2 protein (99). Induction of tis10/pgs-2 mRNA and/or protein accumulation has now been demonstrated in activated RAW 264.7 macrophage-like cells (99), bone-marrow-derived macrophages (98, 99), and murine and rat alveolar macrophages (100, 101). We have examined several murine macrophage lines to establish model systems for studying the contributions of PLA2, TIS10/PGS-2, and PGS-1 expression to the production of prostaglandins by activated macrophages. The murine RAW 264.7 and J774A.1 cells express substantial tis10/pgs-2 message following LPS treatment (Fig. 18). In contrast, murine P388D1 cells do not express any tis10/pgs-2 mRNA following LPS activation. The smaller transcript seen in the RAW 264.7 cells is a tis10/pgs-2 transcript that terminates at an alternative polyadenylation site (62). Because P388D1 cells can be
FIG. 18. Lipopolysaccharide induces tis10/pgs-2 expression in some murine macrophage cell lines. RAW 264.7, J774A.1, and P388D1 murine macrophage cells were treated with LPS (5 ng/ml) for the times (in hours) shown. Total RNA was isolated, subjected to electrophoresis, and transferred to nitrocellulose; 10 μg of total RNA was loaded in each lane. The blot was hybridized with cDNA probes for tis10/pgs-2 and chob. The two upper bands are alternatively polyadenylated forms of the tis10/pgs-2 message. The lowest band is the chob message.
activated to produce prostaglandins, comparisons among these three cell lines should prove a fertile source of information regarding alternative mechanisms for regulating prostanoid synthesis. Interferon gamma (IFN-γ) is a potent activator of murine macrophages. IFN-γ has been reported to induce tis10/pgs-2 mRNA accumulation in RAW 264.7 cells (97). In our hands, IFN-γ did not stimulate substantial tis10/pgs-2 mRNA accumulation in these cells, even up to 24 hours after stimulation (unpublished). However, we find that IFN-γ can augment the activity of LPS in inducing tis10/pgs-2 message; approximately one-tenth as much LPS is required to induce maximal tis10/pgs-2 message accumulation in the presence of IFN-γ (Fig. 19). Synergistic interactions between IFN-γ and other cytokines and macrophage effectors are often observed (102-104). These macrophage cell lines should provide a resource in which to study the roles of endotoxin, cytokines, and glucocorticoids in the regulation of macrophage prostaglandin synthesis. It is likely that regulatory mechanisms controlling macrophage eicosanoid synthesis are cell-type specific, and
FIG. 19. Interferon gamma (IFN-γ) shifts the dose-response curve of LPS induction of tis10/pgs-2 message to the left for RAW 264.7 murine macrophage cells. RAW 264.7 cells were treated for 4 hours with LPS at the concentrations (ng/ml) shown in the figure in the absence (-IFN) or presence (+IFN) of murine IFN-γ (50 ng/ml). Total RNA was isolated, subjected to electrophoresis, and transferred to nitrocellulose; 10 µg of total RNA was loaded in each lane. The blot was hybridized with cDNA probes for tis10/pgs-2 and chob.
HARVEY R. HERSCHMAN ET AL.
involve alterations in the expression of the phospholipase A2 gene(s), the pgs-1 gene, and the tis10/pgs-2 gene.
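The leftward shift in Fig. 19, in which roughly one-tenth as much LPS suffices in the presence of IFN-γ, is what a 10-fold drop in the half-maximal LPS dose (EC50) would produce. The minimal Python sketch below assumes a Hill-type dose-response curve, a model the text does not specify, with hypothetical EC50 and Hill-coefficient values that are not measurements. It shows that dividing EC50 by 10 lowers, by the same factor, the LPS dose needed for any given fraction of the maximal response. A Hill curve only approaches its maximum asymptotically, so "90% of maximal" stands in for "maximal" here.

```python
def hill_response(dose: float, ec50: float, n: float = 1.0) -> float:
    """Fraction of the maximal response at `dose` (Hill equation)."""
    if dose <= 0:
        return 0.0
    return dose ** n / (ec50 ** n + dose ** n)


def dose_for_response(fraction: float, ec50: float, n: float = 1.0) -> float:
    """Invert the Hill equation: the dose giving `fraction` of the maximum."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return ec50 * (fraction / (1.0 - fraction)) ** (1.0 / n)


# Hypothetical parameters for illustration only; none are measured values.
EC50_LPS_ALONE = 1.0                  # ng/ml LPS (assumed)
EC50_WITH_IFN = EC50_LPS_ALONE / 10   # the ~10-fold leftward shift
HILL_N = 1.5                          # assumed Hill coefficient

for label, ec50 in (("LPS alone", EC50_LPS_ALONE), ("LPS + IFN-gamma", EC50_WITH_IFN)):
    d90 = dose_for_response(0.9, ec50, HILL_N)
    print(f"{label}: {d90:.2f} ng/ml LPS gives 90% of the maximal response")
```

Because EC50 enters the inverse only as a multiplicative factor, the 10-fold shift holds at every response level and for any Hill coefficient. This is why a single scaling factor can summarize a parallel leftward shift.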
IX. Can the Proteins Encoded by the tis10/pgs-2 and pgs-1 Genes Form Functional Heterodimers? What Would Be the Consequences of This Interaction for Pharmacologic Intervention?
There is currently great interest in the pharmaceutical industry in identifying drugs that can distinguish between the two forms of prostaglandin synthase, PGS-1 and TIS10/PGS-2. The appeal and utility of a drug that could selectively inhibit the prostaglandin synthase induced in the inflammatory response is clear. However, the PGS-1 molecule is thought to be functional as a dimer (105, 106). We (and others) have now demonstrated that many cells (Swiss 3T3 cells, epithelial cells, ovarian granulosa cells, macrophage cell lines, alveolar macrophages, and peritoneal macrophages) express both a constitutive pgs-1 gene and an inducible tis10/pgs-2 gene. The two proteins produced from these genes are both found in the microsomal fractions of crude subcellular analyses of transfected COS-1 cells (65). If the products of pgs-1 and tis10/pgs-2 must each dimerize to produce an active enzyme, the question arises whether PGS-1:TIS10/PGS-2 heterodimers form in cells to produce an additional form of the enzyme. If the two gene products can heterodimerize, the enzymatic and pharmacologic properties of the overall prostaglandin synthase activity in an activated cell are likely to change during the induction response. This is because the levels of TIS10/PGS-2 monomers, and therefore the composition of active homodimers and heterodimers, change as induction proceeds. Extensive studies on the intracellular localization and interactions of the two pgs gene products will be required to answer this important basic and practical question.
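The heterodimer question can be made concrete with a deliberately simple model that the text does not propose. Assume that PGS-1 and TIS10/PGS-2 monomers draw partners at random from a shared pool, with no isoform preference and no unpaired monomers. If p is the TIS10/PGS-2 share of the monomer pool, the expected fractions of PGS-1 homodimers, heterodimers, and TIS10/PGS-2 homodimers are (1 - p)^2, 2p(1 - p), and p^2. The Python sketch below shows how the dimer mix would shift as induction raises p while PGS-1 is held constant. The function name and the monomer amounts are hypothetical.

```python
def dimer_fractions(tis10_monomers: float, pgs1_monomers: float) -> dict:
    """Expected dimer composition if monomers pair at random (hypothetical model).

    Every monomer is assumed to dimerize, and partner choice is assumed to be
    independent of isoform, so the fractions follow (q + p)^2 = q^2 + 2pq + p^2.
    """
    total = tis10_monomers + pgs1_monomers
    if total <= 0:
        raise ValueError("need a positive total amount of monomer")
    p = tis10_monomers / total  # TIS10/PGS-2 share of the monomer pool
    q = 1.0 - p                 # PGS-1 share of the monomer pool
    return {
        "PGS-1 homodimer": q * q,
        "PGS-1:TIS10/PGS-2 heterodimer": 2 * p * q,
        "TIS10/PGS-2 homodimer": p * p,
    }


# Hypothetical amounts (arbitrary units): PGS-1 held at 10 while TIS10/PGS-2 is induced.
for induced in (0, 2, 10, 40):
    mix = dimer_fractions(induced, 10)
    print(induced, {name: round(share, 2) for name, share in mix.items()})
```

Under this model, heterodimers peak at half of all dimers when the two monomers are equally abundant. Preferential pairing, or separate subcellular pools of the two proteins, would make real cells depart from these numbers.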
X. Similarities in the Expression and Regulation of Inducible Forms of Prostaglandin Synthase and Nitric-Oxide Synthase
Nitric oxide (NO) is a major secretory product of many mammalian cells. NO appears to be a paracrine mediator of cellular interactions, regulating critical functions in homeostasis and host defense (107, 108). NO production has been observed in endothelial cells (109, 110), macrophages (111), hepatocytes (112), bone marrow cells (113), and mesangial cells (114). NO functions
as a vasodilator (110), a neurotransmitter (115), and a macrophage immune defense molecule (116). NO is synthesized from L-arginine by nitric-oxide synthase (NOS). cDNAs for constitutive calcium/calmodulin-activated NOS proteins have been cloned from brain (115) and endothelial cells (117-119). A second, inducible form of NOS, termed mac-NOS, was recently cloned from activated RAW 264.7 macrophage cells (120-122). The enzymatic activity of mac-NOS is not dependent on calcium/calmodulin. As with the PGS isoforms, some cells may express both constitutive and inducible forms of NOS. We investigated whether the inducible form of nos is also expressed in 3T3 cells, because (1) both prostaglandins and NO act as paracrine effectors, (2) both tis10/pgs-2 and mac-nos are induced by LPS in macrophages, and (3) induction of mac-nos and tis10/pgs-2 is blocked by glucocorticoids in macrophages (88, 123). We find that mac-nos is a primary response gene, induced by phorbol in both Swiss and BALB/c 3T3 cells (124). The mac-nos and tis10/pgs-2 messages appear with similar time courses following induction. Overlapping, but not identical, groups of ligands can induce both tis10/pgs-2 and mac-nos expression in 3T3 cells. We think it likely that similar cellular and molecular mechanisms regulate these two genes, each of which plays a key role in the production of paracrine mediators of cellular function.
XI. Conclusions and Future Directions
In the past 7 years, our knowledge of ligand-induced signal-transduction pathways and of primary response gene expression has undergone profound expansion. New families of transcription factors (egr-1/tis8-related genes, members of the c-fos family, and members of the c-jun family) induced by growth factors, tumor promoters, and hormones have been identified and characterized. We hope to determine whether the tis11 family of genes also encodes transcription factors. We plan to do so by studying these genes in mammalian cells, and by studying their homologues in organisms more amenable to genetic manipulation, such as flies and yeast. The roles of some of the primary response genes, such as tis21/pc3 (49, 50) and tis7/pc4 (12, 45), remain obscure, despite the description of their encoded proteins and gene structure. Once again, studies on lower organisms may help to provide clues to the function of such genes. Otherwise, the roles of these primary response genes in effecting phenotypic alterations in cell behavior in response to extracellular signals are likely to remain enigmatic. This will remain so until our knowledge of cellular physiology increases as a result of basic studies and/or until these genes are implicated in
a cellular pathway as a result of some pathophysiology related to their function. The identification of new cytokines (JE, KC) and enzymes (TIS10/PGS-2, mac-NOS) that regulate the synthesis of paracrine effectors (tis10/pgs-2, mac-nos) has led many laboratories, including our own, into new fields. It is of some interest to note that none of the four groups that cloned the cDNA for the inducible form of prostaglandin synthase had major research programs geared to the study of inflammation or immune responses. Instead, these groups were concerned with questions either of regulation of cell division by growth factors and mitogens or of the effect of viral transformation on the regulation of cellular proliferation (59, 60, 62, 63). One of our major goals in the near future will be to identify the relative roles of the two forms of PGS in cellular and organismal processes such as mitogenesis, reproduction, and inflammation. We plan to use both antisense approaches and the development of murine strains with homozygously disrupted tis10/pgs-2 and pgs-1 genes to study these issues. We will also concentrate on determining the nature of the cis-acting genomic DNA elements and the trans-acting regulatory factors that govern induction of this gene by agents such as forskolin (protein kinase A), phorbol esters (protein kinase C), PDGF (ligand-mediated receptor tyrosine kinase activity), and v-src (intracellular tyrosine kinase). Emphasis will also be placed on elucidating the molecular mechanism by which glucocorticoids inhibit tis10/pgs-2 induction. Finally, the similarities and differences in the regulatory mechanisms controlling the expression of the inducible nitric-oxide synthase and prostaglandin synthase genes will be the subject of continued interest and investigation.
This work was supported by National Institutes of Health Grant M24797 (HRH) and by DOE contract DE FC03 87ER60615 between the Regents of the University of California and the Department of Energy. D. A. Kujubu was supported by NIH physician-scientist award USPHS 1K11 01815. R. S. Gilbert is a postdoctoral fellow supported by USPHS Training Grant GM08042.
REFERENCES I . H. R . Herschntan, ARB 60, 281 (1991). 2. I,. F. Lau and D. Nathans, “Hormonal Control and Regulation of Gene Transcription,” 11. 165. 1991. 3 . li. R. Yamamoto and B. M. .4lberts, ARB 45, 721 (1976).
4. E. Butler-Gralla and H. R. Herschman, J. Cell. Physiol. 107, 59 (1981).
5. R. W. Lim, B. C. Varnum and H. R. Herschman, Oncogene 1, 263 (1987).
6. L. F. Lau and D. Nathans, EMBO J. 4, 3145 (1985).
7. V. P. Sukhatme, K. Sreedharan, F. G. Toback, R. Taub, R. G. Hoover and C. H. Tsai-Morris, Oncogene Res. 1, 343 (1987).
8. J. M. Almendral, D. Sommer, H. Macdonald-Bravo, J. Burckhardt, J. Perera and R. Bravo, MCBiol 8, 2140 (1988).
9. D. A. Kujubu, R. W. Lim, B. C. Varnum and H. R. Herschman, Oncogene 1, 257 (1987).
10. B. C. Varnum, R. W. Lim, D. A. Kujubu, S. Luner, S. E. Kaufman, J. S. Greenberger, J. C. Gasson and H. R. Herschman, MCBiol 9, 3580 (1989).
11. J. Milbrandt, Science 238, 797 (1987).
12. F. Tirone and E. M. Shooter, PNAS 86, 2088 (1989).
13. K. Cho, W. C. Skarnes, B. M. Minsk, S. Palmieri, L. Jackson-Grusby and J. A. Wagner, MCBiol 9, 135 (1988).
14. S. G. Irving, C. H. June, P. F. Zipfel, U. Siebenlist and K. Kelly, MCBiol 9, 1034 (1989).
15. A. T. Arenander, R. W. Lim, B. C. Varnum, R. Cole, J. de Vellis and H. R. Herschman, J. Neurosci. Res. 23, 247 (1989).
16. A. T. Arenander, R. W. Lim, B. C. Varnum, R. Cole, J. de Vellis and H. R. Herschman, J. Neurosci. Res. 23, 257 (1989).
17. A. T. Arenander, J. de Vellis and H. R. Herschman, J. Neurosci. Res. 24, 107 (1989).
18. H. R. Herschman, "Regulation of Gene Expression in the Nervous System: Neurology and Neurobiology," Vol. 59, p. 93. 1990.
19. A. T. Arenander and H. R. Herschman, in "Neurotrophic Factors" (J. Fallon and S. Loughlin, eds.), p. 89. Academic Press, New York, 1992.
20. H. R. Herschman, Cell activation, in "Genetic Approaches/Advances in Regulation of Cell Growth" (J. J. Mond, J. C. Cambier, and A. Weiss, eds.), Vol. 2, p. 83. Raven, New York, 1991.
21. H. R. Herschman, TIBS 14, 455 (1989).
22. J. G. Akin, D. A. Kujubu, S. Raffioni, D. D. Eveleth, H. R. Herschman and R. A. Bradshaw, JBC 266, 5401 (1991).
23. D. P. Bartel, M. Sheng, L. F. Lau and M. E. Greenberg, Genes Dev. 3, 304 (1989).
24. J. I. Morgan and T. Curran, Nature 322, 552 (1986).
25. T. Curran and J. I. Morgan, PNAS 83, 8521 (1986).
26. T. G. Hazel, R. Misra, I. J. Davis, M. E. Greenberg and L. F. Lau, MCBiol 11, 3239 (1991).
27. B. H. Cochran, J. Zullo, I. M. Verma and C. D. Stiles, Science 226, 1080 (1984).
28. W. Kruijer, J. A. Cooper, T. Hunter and I. M. Verma, Nature 312, 711 (1984).
29. R. Müller, R. Bravo, J. Burckhardt and T. Curran, Nature 312, 716 (1984).
30. M. E. Greenberg, L. A. Greene and E. B. Ziff, JBC 260, 14101 (1985).
31. P. K. Vogt and T. J. Bos, Adv. Cancer Res. 55, 1 (1990).
32. S. J. Busch and P. Sassone-Corsi, Trends Genet. 6, 36 (1990).
33. P. Oquendo, J. Alberta, D. Wen, J. L. Graycar, R. Derynck and C. D. Stiles, JBC 264, 4133 (1989).
34. B. J. Rollins, E. D. Morrison and C. D. Stiles, PNAS 85, 3738 (1988).
35. R. S. Kawahara and T. F. Deuel, JBC 264, 679 (1989).
36. V. P. Sukhatme, X. M. Cao, L. C. Chang, C. H. Tsai-Morris, D. Stamenkovich, P. C. P. Ferreira, D. R. Cohen, S. A. Edwards, T. B. Shows, T. Curran, M. M. Le Beau and E. D. Adamson, Cell 53, 37 (1988).
37. P. Lemaire, O. Revelant, R. Bravo and P. Charnay, PNAS 85, 4691 (1988).
38. B. A. Christy, L. F. Lau and D. Nathans, PNAS 85, 7857 (1988).
39. B. C. Varnum, R. W. Lim, V. P. Sukhatme and H. R. Herschman, Oncogene 4, 119 (1989).
40. X. Cao, R. A. Koski, A. Gashler, M. McKiernan, C. F. Morris, R. Gaffney, R. V. Hay and V. P. Sukhatme, MCBiol 10, 1931 (1990).
41. P. Lemaire, C. Vesque, J. Schmitt, H. Stunnenberg, R. Frank and P. Charnay, MCBiol 10, 3456 (1990).
42. B. Christy and D. Nathans, PNAS 86, 8737 (1989).
43. T. G. Hazel, D. Nathans and L. F. Lau, PNAS 85, 8444 (1988).
44. R. P. Ryseck, H. Macdonald-Bravo, M. G. Mattéi, S. Ruppert and R. Bravo, EMBO J. 8, 3327 (1989).
45. B. C. Varnum, R. W. Lim and H. R. Herschman, Oncogene 4, 1263 (1989).
46. J. Milbrandt, Neuron 1, 183 (1988).
47. T. E. Wilson, T. J. Fahrner, M. Johnston and J. Milbrandt, Science 252, 1296 (1991).
48. R. E. Paulsen, C. A. Weaver, T. J. Fahrner and J. Milbrandt, JBC 267, 16491 (1992).
49. B. S. Fletcher, R. W. Lim, B. C. Varnum, D. A. Kujubu, R. K. Koski and H. R. Herschman, JBC 266, 14511 (1991).
50. A. Bradbury, R. Possenti, E. M. Shooter and F. Tirone, PNAS 88, 3353 (1991).
50a. B. C. Varnum, R. K. Koski and H. R. Herschman, J. Cell. Physiol., in press (1994).
51. J. P. Rouault, R. Rimokh, C. Tessa, G. Paranhos, M. Ffrench, L. Duret, M. Garoccio, D. Germain, J. Samarut and J. P. Magaud, EMBO J. 11, 1663 (1992).
52. K. Rimokh, J. P. Rouault, K. Wahbi, M. Gadoox, M. Lafage, E. Archimbaud, C. Charrin, O. Gentilhomme, D. Germain, J. Samarut and J. P. Magaud, Genes Chrom. Cancer 3, 24 (1991).
53. B. C. Varnum, R. W. Lim, V. P. Sukhatme and H. R. Herschman, Oncogene 4, 119 (1989).
54. Q. Ma, B. Varnum and H. R. Herschman, Oncogene 6, 1277 (1991).
55. B. C. Varnum, Q. Ma, T. Chi, B. Fletcher and H. R. Herschman, MCBiol 11, 1754 (1991).
56. R. N. DuBois, M. W. McLane, K. Ryder, L. F. Lau and D. Nathans, JBC 265, 19186 (1990).
57. W. S. Lai, D. J. Stumpo and P. J. Blackshear, JBC 265, 16556 (1990).
58. M. Gomperts, J. C. Pascall and K. D. Brown, Oncogene 5, 1081 (1990).
59. D. A. Kujubu, B. Fletcher, B. C. Varnum, R. W. Lim and H. R. Herschman, JBC 266, 12866 (1991).
60. W. Xie, J. G. Chipman, D. L. Robertson, R. L. Erikson and D. L. Simmons, PNAS 88, 1692 (1991).
61. S. A. Qureshi, C. K. Joseph, M. Rim, A. Marony and D. A. Foster, Oncogene 6, 995 (1991).
62. M. K. O'Banion, V. D. Winn and D. A. Young, PNAS 89, 4888 (1992).
63. R. P. Ryseck, C. Raynoschek, H. Macdonald-Bravo and K. Dorfman, Cell Growth Differ. 3, 443 (1992).
64. T. Hla and K. Neilson, PNAS 89, 7384 (1992).
65. B. S. Fletcher, D. A. Kujubu, D. M. Perrin and H. R. Herschman, JBC 267, 4438 (1992).
66. W. Y. L. Wong and J. S. Richards, Mol. Endocrinol. 5, 1269 (1991).
67. J. Sirois and J. S. Richards, JBC 267, 6382 (1992).
68. A. H. Lin, M. J. Bienkowski and R. R. Gorman, JBC 264, 17379 (1989).
69. M. J. Pash and M. Barly, FASEB J. 2, 2613 (1988).
70. S. C. L. Hong, R. Polsky-Cynkin and L. Levine, JBC 251, 776 (1976).
71. D. A. Kujubu, S. T. Reddy, B. S. Fletcher and H. R. Herschman, JBC 268, 5425 (1993).
72. J. W. Han, H. Sadowski, D. A. Young and J. G. Macara, PNAS 87, 3373 (1990).
73. A. Raz, A. Wyche, N. Siegal and P. Needleman, JBC 263, 3022 (1988).
74. A. Dim, A. M. Reginato and S. A. Jimenez, JBC 267, 10816 (1992).
75. W. L. Smith and W. E. M. Lands, Bchem 11, 3276 (1972).
76. R. W. Egan, J. Paxton and F. L. Kuehl, Jr., JBC 251, 7329 (1976).
77. A. H. Lin, M. J. Bienkowski and R. R. Gorman, JBC 264, 17379 (1989).
78. K. K. Wu, R. Sanduja, A. L. Tsai, B. Ferhanoglu and D. S. Loose-Mitchell, PNAS 88, 2384 (1991).
79. J. A. Maier, T. Hla and T. Maciag, JBC 265, 10805 (1990).
80. A. Raz, A. Wyche and P. Needleman, PNAS 86, 1657 (1986).
81. E. M. Doerfler, L. R. Danner, H. J. Shelhamer and E. J. Parrillo, J. Clin. Invest. 83, 970 (1989).
82. G. D. Bottoms, M. A. Johnson, C. H. Lamar, J. F. Fessler and J. J. Turek, Circ. Shock 15, 155 (1985).
83. A. A. Aderem, S. D. Cohen, D. S. Wright and A. Z. Cohn, J. Exp. Med. 164, 165 (1986).
84. J. I. Kurland and R. Bockman, J. Exp. Med. 147, 952 (1978).
85. W. R. Fuller, R. C. Kelsey, J. P. Cole, T. C. Dollery and J. MacDermot, Clin. Sci. 67, 653 (1984).
86. T. Hla and K. Neilson, PNAS 89, 11586 (1992).
87. J. J. Pash and M. J. Bailey, FASEB J. 2, 2613 (1988).
88. D. A. Kujubu and H. R. Herschman, JBC 267, 7991 (1992).
89. B. J. Rollins, P. Stier, T. Ernst and G. G. Wong, MCBiol 9, 4687 (1989).
90. R. S. Kawahara, Z. W. Deng and T. F. Deuel, JBC 266, 13261 (1991).
91. M. Poon, J. Megyesi, R. Green, H. Zhang, B. Rollins, R. Sahstein and M. Taubman, JBC 266, 22375 (1991).
92. R. R. Freter, J. C. Irminger, J. A. Porter, S. D. Jones and C. D. Stiles, MCBiol 12, 5288 (1992).
93. X. W. Ping, C. Warden, B. S. Fletcher, D. A. Kujubu and H. R. Herschman, Genomics 15, 458 (1993).
94. C. Yokoyama and T. Tanabe, BBRC 165, 888 (1989).
95. S. A. Kraemer, E. A. Meade and D. L. DeWitt, ABB 293, 391 (1992).
96. J. Fu, J. L. Masferrer, K. Seibert, A. Raz and P. Needleman, JBC 265, 16737 (1990).
97. J. M. Farber, MCBiol 12, 1535 (1992).
98. R. J. MacKay, J. L. Pace, M. A. Jarpe and S. W. Russell, J. Immunol. 142, 1639 (1989).
99. T. A. Phillips, D. A. Kujubu, R. J. Mackay, H. R. Herschman, S. W. Russell and J. L. Pace, J. Leukocyte Biol., in press (1993).
100. S. H. Lee, E. Soyoola, P. Chanmugam, S. Hart, W. Sun, H. Xhong, S. Liou, D. Simmons and D. Hwang, JBC 267, 25934 (1992).
101. M. G. O'Sullivan, E. M. Huggins, Jr., E. A. Meade, D. L. DeWitt and C. E. McCall, BBRC 187, 1123 (1992).
102. T. A. Hamilton and D. O. Adams, Immunol. Today 8, 151 (1987).
103. D. Boraschi, S. Censini, M. Bartalini and A. Tagliabue, J. Immunol. 135, 502 (1985).
104. R. B. Lorsbach, W. J. Murphy, C. J. Lowenstein, S. H. Snyder and S. W. Russell, JBC 268, 1908 (1993).
105. G. J. Roth, C. J. Siok and J. Ozols, JBC 255, 1301 (1980).
106. F. J. Van der Ouderaa, M. Buytenhek, D. H. Nugteren and D. A. Van Dorp, BBA 487, 315 (1977).
107. C. J. Lowenstein and S. H. Snyder, Cell 70, 705 (1992).
108. C. Nathan, FASEB J. 6, 3051 (1992).
109. L. J. Ignarro, G. M. Buga, K. S. Sood, K. E. Byrnes and G. Chaudhuri, PNAS 84, 9265 (1987).
110. R. M. J. Palmer, A. G. Ferrige and S. Moncada, Nature 327, 524 (1987).
111. J. Hibbs, Jr., R. Taintor, Z. Vavin and E. Rachlen, BBRC 157, 87 (1988).
112. R. D. Curran, T. R. Billiar, D. J. Stuehr, K. Hofmann and R. L. Simmons, J. Exp. Med. 170, 1769 (1989).
113. C. J. Punjabi, D. L. Laskin, D. E. Heck and J. D. Laskin, J. Immunol. 149, 2179 (1992).
114. J. Pfeilschifter and H. Schwarzenbach, FEBS Lett. 273, 185 (1990).
115. D. S. Bredt, P. M. Hwang, C. E. Glatt, C. Lowenstein, R. H. Heed and S. H. Snyder, Nature 351, 714 (1991).
116. D. J. Stuehr, H. J. Cho, N. S. Kwon and C. F. Nathan, PNAS 88, 7773 (1991).
117. S. P. Janssens, A. Shimouchi, T. Quertermous, D. B. Bloch and K. D. Bloch, JBC 267, 14519 (1992).
118. S. Lamas, P. A. Marsden, G. K. Li, P. Tempst and T. Michel, PNAS 89, 6348 (1992).
119. W. C. Sessa, J. K. Harrison, C. M. Barber, D. Zeng, M. E. Durieux, D. D. D'Angelo, K. H. Lynch and M. J. Peach, JBC 267, 15274 (1992).
120. C. J. Lowenstein, C. S. Glatt, D. S. Bredt and S. H. Snyder, PNAS 89, 6711 (1992).
121. C. H. Lyons, G. J. Orloff and J. M. Cunningham, JBC 267, 6370 (1992).
122. Q. W. Xie, H. J. Cho, J. Calaycay, R. A. Mumford, K. M. Swiderek, T. D. Lee, A. Ding, T. Trow and C. Nathan, Science 256, 225 (1992).
123. M. Di Rosa, M. Radomski, R. Carnuccio and S. Moncada, BBRC 172, 1246 (1990).
124. R. S. Gilbert and H. R. Herschman, J. Cell. Physiol. 157, 128 (1993).
Nuclear Pre-mRNA Processing in Higher Plants
KENNETH R. LUEHRSEN, SHARIF TAHA AND VIRGINIA WALBOT
Department of Biological Sciences, Stanford University, Stanford, California 94305
I. Biochemistry of Splicing and Intron Recognition
[...]ts of Maize
[...]icing
VI. Perspective
References
The nuclear genes of eukaryotes include both protein-coding and noncoding segments. Protein-coding segments are exons, and introns are noncoding regions that separate exons. Introns are included in the primary transcript (pre-mRNA), but are excised in the nucleus before the mature mRNA is translated into protein. Since the discovery of introns in 1977, several theories have been proposed to explain their prevalence. One proposal is that introns were present in primordial genes and acted as "hotspots" of recombination for the rapid exchange of exons and the generation of new enzymatic activities (1, 2). An alternative view is that introns were acquired much later (3, 4), possibly through the action of transposable-element (TE)¹ insertions (5). Evidence has accumulated to support both views, which are not mutually exclusive.
¹ Abbreviations: AS, alternative splice site; BMS, Black Mexican Sweet; bp, base pair; CaMV, cauliflower mosaic virus; CAT, chloramphenicol acetyltransferase; cDNA, complementary DNA; CS, change-of-state; GUS, β-glucuronidase; kb, kilobase pair; m3G, 2,2,7-trimethylguanosine; nt, nucleotide; PTB, polypyrimidine-tract binding protein; RNAP, RNA polymerase; PCR, polymerase chain reaction; snRNA, small nuclear RNA; snRNP, small nuclear ribonucleoprotein; TE, transposable element; TIR, terminal inverted repeat; UTR, untranslated region. Gene symbols: A, anthocyaninless; Ac, activator; Adh, alcohol dehydrogenase; Bz, bronze; Ds, dissociator; dSpm, defective Spm; Gpc, glyceraldehyde-3-phosphate dehydrogenase, cytosolic; hsp, heat-shock protein; Lc, leaf color; Mu, mutator element; Nos, nopaline synthase; rbcS, ribulose-1,5-bisphosphate carboxylase, small subunit; P, pericarp; Sh, shrunken; Spm, suppressor-mutator; tnp, transposase protein; Wx, waxy.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
KENNETH R. LUEHRSEN ET AL.
There are five classes of introns. (1) Some tRNAs contain an intron in the anticodon loop; these introns are excised by specific endonuclease and ligase activities. (2) Trans-splicing of pre-mRNAs occurs in some lower eukaryotes (e.g., trypanosomes and Caenorhabditis elegans) and results in the addition of a leader exon to the 5' end of some mRNAs. (3) Group-I introns are found in the genes of lower eukaryotes, in the mitochondrial genes of plants and animals, and in the chloroplast genes of higher plants; these introns fold into a complex secondary structure and autocatalytically self-splice in the absence of protein. (4) Group-II introns have a distribution similar to that of group-I self-splicing introns, although some have an associated "maturase" protein. Group-I and group-II introns are distinguished by different secondary structures and by the requirement of a guanine nucleotide cofactor for group-I intron splicing. (5) Nuclear pre-mRNA introns are present in some genes of all eukaryotes; these introns are spliced in large ribonucleoprotein particles called spliceosomes (6). Self-splicing group-II introns and nuclear pre-mRNA introns are processed by a similar mechanism, suggesting that the latter evolved from the former (7, 8). The focus of this review is on nuclear pre-mRNA splicing in plants (with emphasis on maize); several recent reviews have discussed the other intron types (9-12). We first discuss the salient points of nuclear intron splicing in yeast and animals. We then review the literature on small nuclear RNAs and plant intron splicing, and present a working model for plant intron recognition. Finally, we evaluate the alternative RNA processing events for several transposable-element-induced mutations of maize as a test of the intron-recognition model.
I. Biochemistry of Splicing and Intron Recognition
A. Cis Requirements
Although the enzymology of pre-mRNA splicing is still being studied, several reaction intermediates have been well characterized, allowing a detailed hypothesis of the mechanism (see Figs. 1 and 2). Intron removal is the
~
~~
-7
FIG. 1. Outline of the nuclear pre-mRNA splicing process in yeast, mammals, and plants. (A) A generalized structure for yeast and mammalian introns. After transcription, the U1 snRNP is the first to bind the intron (dotted line). The U2 snRNP binds near the branchpoint (BP), followed by U4/U6-U5 tri-snRNP addition to assemble the functional spliceosome. The first cleavage releases the 5' exon and the lariat-exon 2 intermediate. The second cleavage reaction results in the ligation of exons 1 and 2 and the release of the lariat intron. (B) A typical plant gene is composed of (A + T)-rich introns adjacent to (G + C)-rich exons. The question mark indicates that the reaction mechanism for plant splicing has not been determined. polyT, Polythymidine tract; polyY, polypyrimidine tract; BP, branchpoint.
[Figure 1 artwork: panel A, Yeast (branchpoint, BP); panel B, A+T-rich plant intron with polyT tract and BP]
FIG. 2. Base-pairing interactions between snRNAs and the pre-mRNA during splicing. The base-pairing interactions of the pre-mRNA and snRNAs are shown; the hybrids probably do not form simultaneously, but appear and disappear in an as yet undetermined temporal order. The base-pairing interactions between the snRNAs are not shown. Lower-case letters are intron sequences (dotted line). BP, Branchpoint; ●, m3G cap structure. Redrawn from 29.
result of two trans-esterification reactions. In the first, the 2' OH of a branchpoint nucleotide (usually an A) near the 3' end of the intron makes a nucleophilic attack on the phosphodiester bond at the 5' end of the intron, cleaving it. The resulting reaction products are the upstream exon (with a 3' OH) and the intron-downstream exon intermediate; the intron has a "lariat" structure, with a 2'-5' linkage (branch structure) at the branchpoint nucleotide. In the second trans-esterification, the 3' OH of the last nucleotide of exon 1 attacks the phosphodiester bond linking the intron to the first nucleotide of exon 2. This cleavage-ligation reaction produces the ligated exons and the excised intron (lariat form). The mature mRNA is transported to the cytoplasm, whereas the excised intron is degraded in the nucleus. Introns must be recognized and excised with great fidelity, because a nucleotide addition or deletion in an open reading frame usually results in the translation of a nonfunctional polypeptide. What are the cis-acting sequence features that distinguish introns from exons? Although introns can be
SPLICING IN HIGHER PLANTS
several kilobases in length, they contain three or four short (<30 nucleotides), highly conserved sequence motifs, as diagrammed in Fig. 1. Almost all introns have the dinucleotides G-U and A-G at the 5'-donor and 3'-acceptor ends, respectively, and these dinucleotides are part of longer, less conserved 5' and 3' splice-junction sequences (Table I). Another conserved sequence is the branchpoint, which is the invariant UACUAAC in yeast and the variable UNCURAC in mammals; no conserved branchpoint sequence has been identified in plants (13, 14). Finally, mammalian introns have a 20- to 30-nt polypyrimidine tract located just downstream (3') of the branchpoint region. Yeast and plant (15) introns do not have a well-defined polypyrimidine tract, but they usually have a 15- to 20-nt U-rich region immediately upstream of the 3' acceptor. This U tract is involved in 3'-acceptor choice in yeast (16) and possibly in plants (16a, 16b). In addition to the conserved motifs found at each splice junction, plant introns are more (A + U)-rich than the surrounding exons, and this feature is functionally important for plant splicing. Although mammalian introns do not have a base-content bias, the introns from some lower eukaryotes (such as Tetrahymena and C. elegans) are (A + U)-rich (17); it is not known whether this feature is essential for splicing in these organisms.
The important features of nuclear pre-mRNA splicing in yeast and mammals, ascertained using both in vivo and in vitro splicing assays (11, 18), are outlined in Fig. 1. The conserved motifs in pre-mRNAs are necessary for specific base-pairing interactions with small nuclear RNAs (snRNAs) in the spliceosome (12). The snRNAs are thus involved in intron recognition and are probably also involved in catalysis. The snRNAs that base-pair with the conserved intron elements are complexed with proteins to form four different ribonucleoprotein particles (snRNPs).
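The cis features listed above can be checked mechanically. The Python sketch below takes a pre-mRNA and the coordinates of one intron. It verifies the near-universal G-U...A-G termini, compares the A+U content of intron and exons (the plant composition bias noted above), and returns the ligated exons, which is the net outcome of the two trans-esterification steps. It deliberately does not model the branchpoint or the lariat, since no plant branchpoint consensus is known. The sequence is an invented toy, not a real plant gene, and the function names are illustrative.

```python
from typing import Tuple


def au_content(seq: str) -> float:
    """Fraction of A and U residues in an RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AU" for base in seq) / len(seq)


def remove_intron(pre_mrna: str, start: int, end: int) -> Tuple[str, str]:
    """Excise the intron pre_mrna[start:end] (0-based, end-exclusive).

    Only the net outcome of the two trans-esterifications is modeled: the
    flanking exons are joined and the intron is returned (as a linear
    string, not a lariat).
    """
    intron = pre_mrna[start:end]
    # Near-universal terminal dinucleotides: G-U at the 5' end, A-G at the 3' end.
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron does not begin with GU and end with AG")
    mature = pre_mrna[:start] + pre_mrna[end:]
    return mature, intron


# Invented toy sequence (not a real plant gene): GC-rich exons flank an
# AU-rich intron, mimicking the plant composition bias described above.
exon1 = "CAGGCGCCAG"
intron = "GUAAGUAUUAUUAUAUUUAUUGCAG"
exon2 = "GCGCCGGCGG"
pre = exon1 + intron + exon2

mature, released = remove_intron(pre, len(exon1), len(exon1) + len(intron))
print("mature mRNA:", mature)
print(f"A+U content: intron {au_content(released):.2f}, exons {au_content(mature):.2f}")
```

In the toy sequence, the intron is 80% A+U and the joined exons are 10% A+U. A real analysis would also need to locate candidate splice sites rather than take them as given.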
The splicing process is initiated when U1 snRNP associates with the 5' splice junction, and six conserved nucleotides (Table I) at the 5' end of U1 snRNA base-pair with the intron 5' splice junction (19). Mutation of the pre-mRNA 5' splice junction can severely diminish splicing or activate cryptic splice sites upstream or downstream of the native site, illustrating the importance of this base-pairing interaction in 5' splice-site choice (20, 21). Recent evidence from Schizosaccharomyces pombe shows that two nucleotides of U1 base-pair with the conserved A-G at the 3' splice junction (22); this interaction probably serves to bring the 5' and 3' splice junctions into close proximity (see Fig. 2). In the next stage of assembly, U2 snRNP binds to the branchpoint region with concomitant base-pairing of U2 RNA with the branchpoint sequence (23-25). U2 snRNP binding in mammalian introns is facilitated by the prior association of the polypyrimidine-tract binding protein (PTB) with the
TABLE I
PLANT CONSENSUS SPLICE-JUNCTION SEQUENCES(a)

5' splice junction
                          Consensus splice-junction sequence                 Expected match to consensus (%)
                         -2     -1 |  +1    +2    +3    +4    +5    +6       -2 to +6    -1 to +6
Consensus                 A      G |   G     U     A     A     G     U
All plants (%)           62     81 | 100   100    70    52    54    53          5.2         8.4
Monocot (%)              63     80 | 100   100    68    45    60    50          4.6         7.3
Dicot (%)                61     82 | 100   100    70    56    49    54          5.2         8.5
Yeast (%)             44(T)     50 | 100   100    97    89   100    94         17.9        40.6
Vertebrate (%)           62     77 | 100   100    60    74    84    50          8.9        14.4
U1 sequence (3'-5')       U      C |   C     A     U     U     C     A

3' splice junction
                         -8    -7    -6    -5    -4    -3    -2    -1 |  +1    +2
Consensus                 U     U     U     U     G     C     A     G |   G     ?
Monocot (%)              43    40    44    50    48    81   100   100 |   ?     ?
Dicot (%)                47    50     ?    67    42    60   100   100 |   ?     ?

(a) The plant data are derived from the compilation of 346 monocot and 528 dicot introns reported in 14. The yeast and vertebrate data are derived from 17. The vertical bar represents the intron-exon junction.
neighboring polypyrimidine tract (26). Finally, the U4/U6-U5 tri-snRNP associates with the 3' splice junction, resulting in the functional spliceosome. A conserved single-stranded loop of U5 snRNA base-pairs with the exon dinucleotides immediately upstream and downstream of the intron in yeast (27, 28), and possibly mediates a proofreading function (12). Using the known base-pairing interactions of U1, U2, and U5 with the pre-mRNA, a composite secondary structure can be drawn that resembles the Holliday structure used to model DNA recombination [Fig. 2 (29)]. U1 base-pairs to the conserved 5' G-U and 3' A-G dinucleotides found at the ends of all introns, U2 base-pairs to the branchpoint region, and U5 binds the upstream and downstream exons. In addition to snRNA base-pairing to the pre-mRNA, there is also evidence that both U2 and U4 base-pair with U6 at some point during splicing (30-33). Less is known about the subsequent enzymology of the splicing reactions (12). However, because nuclear pre-mRNA introns share many features with group-II self-splicing introns (7, 8), RNA catalysis has been proposed. Recently, it was suggested that U6 snRNA acts as a ribozyme and may be the catalytic site for the trans-esterification reactions (32, 34). This hypothesis is supported by the observation that a region of U6 is similar in sequence to self-splicing viral RNAs. In addition, the U6 genes of several fungi have intron insertions in this same region (34), suggesting that the introns may have arisen by a reverse splicing mechanism. The same region of U6 proposed to be involved in RNA catalysis also base-pairs with U4 snRNA. This suggests that U4 acts as an antisense repressor (35) until U6 is needed to mediate cleavage and ligation. Because plants, yeast, and animals all contain the same snRNAs, it is reasonable to assume that plant introns are spliced by a mechanism similar to that of other kingdoms. A more detailed discussion of plant snRNAs is presented in Section I,B.
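The 5' splice-site data in Table I and the U1 base-pairing described above can be explored with a short Python sketch. First, the "Expected match to consensus" columns appear to equal the product of the per-position frequencies, that is, the probability that a site matches at every position if positions vary independently. That independence assumption is ours, not a statement in the source. Recomputing the columns this way reproduces the tabulated percentages to one decimal place. Second, the U1 row of the table lets one count how many positions of a 5' splice site (-2 to +6) can base-pair with U1 snRNA. Weighting all positions equally, and the optional G·U wobble rule, are simplifications; the function names are illustrative.

```python
from math import prod

# Splice-site positions covered by Table I (the junction lies between -1 and +1).
POSITIONS = [-2, -1, 1, 2, 3, 4, 5, 6]

# Per-position frequencies (%) transcribed from the 5' half of Table I.
FREQ = {
    "All plants": [62, 81, 100, 100, 70, 52, 54, 53],
    "Monocot":    [63, 80, 100, 100, 68, 45, 60, 50],
    "Dicot":      [61, 82, 100, 100, 70, 56, 49, 54],
    "Yeast":      [44, 50, 100, 100, 97, 89, 100, 94],
    "Vertebrate": [62, 77, 100, 100, 60, 74, 84, 50],
}

# U1 residues facing positions -2..+6, written 3'->5' so that they line up
# with a pre-mRNA written 5'->3' (the U1 row of Table I).
U1_FACING = "UCCAUUCA"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def expected_match(group: str, start: int = -2) -> float:
    """Percent of sites expected to match the consensus from `start` to +6,
    assuming (a simplification) that positions vary independently."""
    fractions = [f / 100 for pos, f in zip(POSITIONS, FREQ[group]) if pos >= start]
    return 100 * prod(fractions)


def u1_pairing(site: str, allow_wobble: bool = False) -> int:
    """Count base pairs between an 8-nt 5' splice site (-2..+6) and U1 snRNA."""
    if len(site) != len(U1_FACING):
        raise ValueError("site must span positions -2 to +6 (8 nt)")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return sum((s, u) in allowed for s, u in zip(site.upper(), U1_FACING))


for group in FREQ:
    print(f"{group:<11} -2..+6: {expected_match(group, -2):5.1f}%  "
          f"-1..+6: {expected_match(group, -1):5.1f}%")

print(u1_pairing("AGGUAAGU"))  # consensus site AG|GUAAGU
print(u1_pairing("AGGUGAGU"), u1_pairing("AGGUGAGU", allow_wobble=True))
```

The consensus site AG|GUAAGU pairs with U1 at all eight positions. A G at +3 pairs only through a G·U wobble, which is one simple way to see why sites that deviate from the consensus can still be recognized.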
B. snRNAs
In mammals, there are 14 separate snRNAs, designated U1 through U14 (36, 37). They were named sequentially in the order of their discovery, and are termed U-RNAs because of their high uridine content. The snRNAs are involved in pre-mRNA splicing (U1, U2, U4, U5, and U6), rRNA maturation (U3, U8, U13, and U14), mRNA polyadenylation (U11), and histone 3'-end formation (U7). Each snRNA is encoded by a multigene family, although there is little sequence or length heterogeneity within each family (37). The snRNAs involved in splicing range in size from ~100 to 200 nt in mammalian cells, and are present at about 10^6 copies per cell (summarized in Table II). U1, U2, U4, and U5 are transcribed by RNA polymerase (RNAP) II, whereas U6 is transcribed by RNAP III (reviewed in 9, 38). U1, U2, U4, and U5 have a 2,2,7-trimethylguanosine (m3G) cap; the cap is added posttranscriptionally in the cytoplasm and is partially necessary for nuclear localization. U6 RNA has the standard m7G cap and does not leave the nucleus following transcription. The snRNAs have extensive secondary structure, a feature that is highly conserved among phyla (37). Splicing snRNAs are each complexed with several proteins and form snRNPs, which together form the spliceosome particle.
Less is known about the structure and function of plant snRNAs. Plants contain U1, U2, U4, U5, and U6 snRNAs that are homologous to the known mammalian snRNAs and are presumably involved in pre-mRNA splicing (39, 40). Each plant snRNA can be folded into the same secondary structure as its yeast and animal counterpart (41), strongly implying a similar function in plants and animals. As in mammals, plant U1, U2, U4, and U5 are transcribed by RNAP II, whereas U6 is transcribed by RNAP III (9). Plant U1, U2, U4, and U5 contain the m3G cap (41), indicative of posttranscriptional modification in the cytoplasm and subsequent import into the nucleus. Like mammalian U6, plant U6 does not have the m3G cap structure (42). Plant snRNAs are present at only ~10^4 copies per cell, versus the 10^6 copies per cell typical of mammalian cells. Heterogeneity of both length and sequence distinguishes plant snRNAs from those of both yeast and mammals (Table II) (40). Unlike the snRNA gene families in mammals, which contain tandem repeats, plant snRNAs are encoded by multigene families at unlinked loci (43-45). Within tandem repeats, individual genes are homogenized by gene conversion. In contrast, unlinked plant snRNA genes are free to diverge within the constraint of selection pressure to retain function. The sequence variations in plant snRNAs (41) are not in regions reported to be involved in intermolecular RNA-RNA base-pairing interactions; this evidence suggests that the snRNA-pre-mRNA interactions described in yeast and mammals (12) are probably retained in plants.

TABLE II

Mammalian snRNAs
snRNA | Transcribed by | m3G cap?^a | Length (nt) | snRNP | Associated proteins | Function | Comment
U1 | RNA pol II | Yes | 165 | U1 | Core^c; A1, 70 kDa^d | Base pairs to 5' and 3' splice junctions | First step in recognition; yeast U1 is 569 nt
U2 | RNA pol II | Yes | 189 | U2 | Core^c; A', B''^f | Base pairs with branchpoint consensus | Yeast U2 is 1175 nt
U4 | RNA pol II | Yes | 139 | U4/U6 and "tri-snRNP" | Core^c | Antisense regulator of U6 | -
U5 | RNA pol II | Yes | 117 | U5 and "tri-snRNP" | Core^c plus several unique | Proofreading? | Affects 5' and 3' splice site choice
U6 | RNA pol III | No^h | 107 | U4/U6 and "tri-snRNP" | - | Catalytic center | No Sm binding site; U6 added to mature U4 snRNPs in nucleus; homology with self-splicing viral RNAs; fungal introns

Plant snRNAs
snRNA | Length (nt)^b
U1 | 158-185
U2 | 208-260
U4 | 160-176
U5 | 125-137
U6 | 101-104

Plant genes cloned and genome copy number (number of genes): Pea (>6); Potato (25-40), Arabidopsis (10-15), maize (25-40); Potato (10), Arabidopsis (7), maize (1), pea (3), wheat (3); Bean^g (2), pea (3); Pea (6), soybean (2), bean^e (1), tomato (9), wheat (6); Potato (>20); Arabidopsis (1), tomato (1), maize (1), bean^e (1), wheat (1), potato (1).

^a 2,2,7-Trimethylguanosine.
^b Monocots and dicots.
^c The core proteins are named B, B', D1, D2, D3, E, F, and G; they range in size from 9 to 29 kDa. The F protein probably recognizes the Sm binding site.
^d The U1 snRNP A1 and 70-kDa proteins have nuclear localization sequences.
^e Phaseolus vulgaris.
^f The U2 snRNP A' and B'' proteins have nuclear localization sequences.
^g Vicia faba.
^h U6 RNA is capped with m7-guanosine.

SPLICING IN HIGHER PLANTS
157
In the regions of intramolecular base-pairing, snRNA sequence variants always have a compensatory change in the complement, making it likely that plant snRNAs retain the conserved secondary structure (41). However, there are base changes in plant snRNAs that occur in regions associated with protein binding (41). This observation opens up the possibility that these variants might bind tissue- or intron-specific snRNP proteins, but there is no experimental evidence to support this conjecture. Curiously, the expression of pea U1, U2, U4, and U5 snRNAs is developmentally regulated, while a single size class of U6 is constitutively expressed (46). Members of the U1, U2, U4, and U5 families are differentially expressed in seeds, seedlings, and mature leaves. U1 and U4 variants appear and disappear during seed development, as the seed progresses through maturation and reaches a transcriptionally inactive state. Another shift in U1 and U4 populations occurs as the seedling develops into a mature plant. An
158
KENNETH R. LUEHRSEN ET AL.
identical pattern of U1 and U4 expression is seen in mature seeds and mature leaves, both of which are transcriptionally quiescent, suggesting that a subset of U1 and U4 variants is correlated with the transcriptional activity of the cell, but not with tissue type. It is possible that some snRNA variants facilitate efficient splicing in cells that are rapidly transcribing many intron-containing genes. snRNAs are complexed with proteins to form functional particles called snRNPs (47). In mammals, there are eight proteins (named B, B', D1, D2, D3, E, F, and G in 47) that constitute the core proteins bound to all snRNPs; there are many additional proteins unique to each snRNP. One distinguishing feature of U1, U2, U4, and U5 is the Sm binding site, a 9- or 10-base U-rich region where the core snRNP proteins bind. U6 does not have the Sm binding site, but this snRNA binds U4 through a long region of complementarity. By biochemical fractionation, four distinct snRNPs have been isolated: U1, U2, U4/6, and U5. In vivo, U4/6 and U5 exist as a "tri-snRNP." The functional spliceosome is made by a stepwise addition of these snRNPs and additional proteins to the pre-mRNA (reviewed in 11, 18, 48). In mammals, snRNAs are localized in discrete regions of the nucleus (49, 50), suggesting a nonrandom placement of the RNA processing machinery. The characterization of plant snRNP proteins is just beginning. The U5 snRNP has been purified from broad bean (Vicia faba), and, based on immunological cross-reactivity, several snRNP proteins are shared among plants and mammals (51). This suggests that the protein components of the splicing machinery have been conserved among plants and animals. Recently, a cDNA for the U2B'' gene has been cloned from potato (52, 53).
Comparing potato and mammalian U2B'', there is 70% similarity in amino-acid sequence (including conservative changes), again suggesting that the mammalian and plant splicing apparatus share more similarities than differences (52). One important difference between plant and animal introns is the requirement that plant introns be (A + U)-rich to be efficiently spliced (see Fig. 1 and Section II,D). It remains to be seen whether there are proteins involved in splicing in plants that are not found in yeast and mammals. As will be discussed, plant introns contain (A + U)-rich sequences that facilitate efficient splicing, and these motifs might interact with a novel binding protein.
II. Plant Splicing
A. Methods Used to Assay Plant Splicing
Analysis of splicing in yeast and mammals has utilized both in vivo and in vitro assays. Each approach has strengths and limitations. In vivo splicing assays involve a natural milieu in which the pre-mRNA is appropriately presented to the splicing machinery (i.e., coated with nuclear proteins), whereas in vitro assays, in which splicing reactions occur at ~1% of the rate found in vivo, permit the temporal detection of splicing intermediates. All of the plant splicing studies have been done in vivo, through the use of either transgenic plants or transfected tissue culture cells. In vivo splicing studies using test introns inserted into an autonomously replicating vector, resulting in amplified levels of transcript, have been described (54). To date, there is no in vitro assay available for plants, precluding the identification of reaction intermediates (Fig. 1B). Many plant splicing studies involve measuring a reporter gene whose activity depends on intron splicing (55-57). Often, these assays require that splicing yield the correct reading frame, but reporter gene expression does not assess the diversity of RNA products. Alternatively, transcript structure and quantity are assessed (56, 58). Although northern blots are useful in sizing and quantifying message levels, less abundant alternatively spliced mRNAs might be overlooked. In addition, because many plant introns are small (

B. Structure of Plant Introns
1. CONSERVED MOTIFS
A consensus 5' splice junction sequence for both monocot and dicot introns has been proposed (14) and is shown in Table I. Plants, yeast, and vertebrates have the identical 5' splice junction consensus sequence, AG|GUAAGU (the vertical bar indicates the position of cleavage). The G-U at positions +1 and +2 is nearly invariant, with only a few exceptions (see Table III). Positions -2 through +6 of the plant 5' splice junction consensus are complementary to eight nucleotides near the 5' end of plant U1 snRNA, suggesting that the pre-mRNA and U1 anneal during splicing. McCullough et al. (21) made a series of mutations in the 5' splice junction of pea rbcS intron 1 and tested the effects on splicing in vivo. As expected, mutating the canonical G-U to A-U or G-A inactivated the native splice site and simultaneously activated up- and downstream cryptic 5' splice sites. Point mutations made at positions +3 (A to C) and +5 (G to C) in the 5' splice junction of a synthetic intron also activated cryptic 5' splice sites (61). These findings indicate that U1 base-pairing to the 5' splice junction is probably an important determinant in selecting the correct 5' splice site. The percent match at each position of the 5' splice junction consensus is, however, lower in plants than in either yeast or vertebrates. Considering the overall conservation of 5' splice junction sequences, the product of the percent match (Table I) at positions -2 through +6 (or -1 through +6) predicts that only 5.2% (8.4%) of plant introns will be an exact match to the consensus; this contrasts with the 17.9% (40.6%) and 8.9% (14.4%) values calculated for yeast and vertebrates, respectively. In particular, the G at position +5 is 100% and 84% conserved in yeast and vertebrates, respectively, and is essential for efficient splicing in yeast (62). A consensus G is found at position +5 of plant introns, but occurs in only 54% of all introns. Mutating the G at +5 to an A or a C had little effect on splicing efficiency in dicots, indicating that this nucleotide is not as important to 5' splice-site function as it is in yeast and vertebrates (21). The A at +4 is also less well conserved; it is present in only 52% of plant intron 5' splice junctions, in contrast to 89% and 74% of yeast and vertebrate junctions, respectively. The relaxed sequence motif of plant 5' splice junctions suggests that additional intron features may specify intron recognition. The lack of precise complementarity between U1 and a typical 5' intron sequence acquires added significance when the life history of plants is considered. Laboratory yeast and mammalian cells are maintained at a relatively fixed temperature, whereas plants experience diurnal fluctuations of 10 to 20°C or more. Considering the range of temperatures at which splicing occurs in plants, it is highly unlikely that base-pairing across an imperfectly conserved motif is sufficient for correct 5' splice site choice. The disparity in base content between introns and exons (discussed in Section II,B,2) might be an additional signal. It is also possible that 5' and 3' splice junctions interact during splicing and must be "matched" for efficient processing (63).

TABLE III
PLANT INTRONS WITH UNUSUAL SPLICE JUNCTION SEQUENCES

Plant | Gene and intron | Comment | Ref.
- | - | GC replaces GT at 5' splice junction | 126
Rice | ADP-glucose pyrophosphorylase intron | CA replaces GT at 3' splice junction; CC replaces AG at 3' splice junction | 127
Cucumber | Hydroxypyruvate reductase intron 6 | GC replaces GT at 5' splice junction | 3
Maize | dSpm type-II deletion variant | GC replaces GT at 5' splice junction | 128
Maize | Md3R gene-A intron 3 | GC replaces GT at 5' splice junction | a
Maize | fcx-frI9 | CG replaces AG at 3' splice junction of an alternatively spliced mRNA | 129

^a Hershberger and V. Walbot (unpublished).
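The exact-match estimate in the discussion above is a product of per-position match frequencies, which implicitly assumes that positions vary independently. A minimal sketch follows; the function name is hypothetical, and because only the plant +4 (A, 52%) and +5 (G, 54%) frequencies are quoted in this section, the usage treats every other position as perfectly conserved. That gives an upper bound, not the Table I value.

```python
from math import prod


def expected_exact_match_fraction(match_freqs: list[float]) -> float:
    """Fraction of sites expected to match the consensus at every position,
    assuming positions vary independently of one another."""
    for f in match_freqs:
        if not 0.0 <= f <= 1.0:
            raise ValueError("frequencies must lie between 0 and 1")
    return prod(match_freqs)


# Positions -2, -1, +1, +2, +3, +4, +5, +6. Only +4 (0.52) and +5 (0.54)
# come from the text; 1.0 elsewhere is a placeholder giving an upper bound.
freqs = [1.0, 1.0, 1.0, 1.0, 1.0, 0.52, 0.54, 1.0]
upper_bound = expected_exact_match_fraction(freqs)
print(f"{upper_bound:.1%}")  # 28.1%
```

Even with every other position perfectly conserved, the weak +4 and +5 positions alone cap exact plant matches near 28%; the remaining positions bring the estimate down to the 5.2% quoted in the text.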
All but two plant 3' splice junctions reported so far (see Table III) have the canonical 3' acceptor A-G at the end of the intron; the A-G is part of a weakly conserved, longer consensus shown in Table I. Unlike animal introns, plant introns do not have a well-defined polypyrimidine tract 20 to 50 nt upstream of the acceptor A-G (13, 15). Instead, there is a very strong bias for U nucleotides proximal to the 3' A-G acceptor; 10 or more of the 13 nt proximal to the 3' A-G acceptor are U (15). A U-rich tract in yeast maximizes splicing at the proximal A-G (16), and recent experiments suggest that 3' acceptor usage may be enhanced by a proximal U-rich tract in plants in conjunction with an adjacent (G + C)-rich exon sequence (16a, 16b). Introns from yeast and mammals have a conserved branchpoint sequence, 20 to 50 nt upstream of the 3' acceptor. In yeast introns, the branchpoint occurs at the highly conserved motif UACUAAC (the branch nucleotide is the A at position 6), while in mammalian introns there is a more relaxed consensus sequence, UNCURAC. U2 base-pairs with the branchpoint sequence during splicing (23-25). Surveys comparing plant introns detected no similar or
other conserved "branchpoint" sequence (13, 14). The structural similarity of plant, yeast, and animal nuclear introns and the presence of U2 snRNA in plants make it likely that the biochemistry of splicing is similar among phyla, but there is no direct evidence of lariat formation in plants. Thus, if U2 base-pairs with a plant branchpoint sequence, only a few base pairs might be involved, with mismatches possibly accommodated through stabilization by snRNP proteins. The branchpoint nucleotide in yeast and mammals is usually an A. Mutating adenosine residues in an artificial intron of 85 nt demonstrated (61, 64) that no adenosine residue is required within ~50 nt of the 3' acceptor A-G for efficient splicing in either tobacco or maize. If a lariat intermediate is formed during plant splicing, the data show that nucleotides other than A can substitute as the branchpoint nucleotide.
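The survey criterion for plant 3' acceptors (10 or more U among the 13 nt proximal to the A-G) can be stated as a simple scan. This sketch only flags A-G dinucleotides that meet that criterion; it is not a splice-site predictor, and the function name and default thresholds are illustrative.

```python
def u_rich_acceptors(pre_mrna: str, window: int = 13, min_u: int = 10) -> list[int]:
    """Return 0-based indices of A-G dinucleotides whose `window` immediately
    upstream nucleotides contain at least `min_u` uridines."""
    seq = pre_mrna.upper().replace("T", "U")
    hits = []
    # Start at `window` so that a full upstream window always exists.
    for i in range(window, len(seq) - 1):
        if seq[i:i + 2] == "AG" and seq[i - window:i].count("U") >= min_u:
            hits.append(i)
    return hits


example = "GGGAG" + "U" * 13 + "AGCCC"
print(u_rich_acceptors(example))  # [18]
```

The A-G at index 3 of the example is not reported because it lacks a full 13-nt upstream window.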
2. INTRON LENGTH AND BASE CONTENT
Plant and animal genes differ greatly in the average length of their introns. Animal introns range in size from 31 nt to over 10^5 nt (65), but plant introns are grouped in a much narrower size range, from 66 nt to less than 10^4 nt; a survey illustrating the length difference between plant and vertebrate introns is shown in Fig. 3. Plant internal introns (within coding regions) average only 249 nt, versus 1127 nt for vertebrates and 86 nt for fungi (65). For example, 80% of all maize introns are shorter than 200 nt. Plant introns are thus more like introns from other lower eukaryotes (e.g., Tetrahymena, Caenorhabditis), which are also short (17, 65). Can plants splice long introns efficiently? The maize Lc gene contains a 3-kb intron and the transposable element Spm contains a 4.4-kb intron, and each is efficiently spliced. The longest plant intron reported is the 7-kb intron from the maize Pericarp gene; it is also efficiently (and alternatively) spliced (66, 67). Results from experiments using synthetic introns indicate that a minimum intron length of 70-73 nt is necessary for efficient splicing in both monocots and dicots (68). This is consistent with the size distribution of natural introns; for example, only 3% of maize introns are shorter than 73 nt. The minimum size of plant introns is similar to the minimum length of ~80 nt required for efficient splicing in mammals. A minimum intron length is probably necessary to allow multi-snRNP binding to a small region of the pre-mRNA. Perhaps the most striking difference between plant and animal introns is the strong bias for high (A + U) content in plant introns versus exons (17, 61, 69, 70). A survey (17) of the base content of introns and exons indicates that plant introns are, on average, 15% more (A + U)-rich than exons; this contrasts with humans, where the difference is only 2%. A more recent analysis (14) showed that dicot introns and exons average 71% A + U and 55% A + U, respectively. A similar differential is found in monocot introns and exons, which average 61% A + U and 42% A + U, respectively. Our survey of the base content of maize introns and exons (Fig. 4) shows that introns average 60% A + U, with a bias for U (34%) over A (26%); in contrast, maize exons average only 42% A + U. The disparity in base composition between introns and exons has been proposed to be a feature defining intron borders (21, 61, 64, 69). The genes of other organisms, such as Tetrahymena and Caenorhabditis, also show a difference in the (A + U)-content of their introns and exons (17), indicating that this base-content disparity might be widespread in nature. Several experiments described in Section II,D show that the (A + U)-richness of plant introns is essential for intron recognition, selection of 5'-donor and 3'-acceptor sites, and efficient splicing.

FIG. 3. Graph showing the size distribution of plant (hatched bars) and vertebrate (solid bars) introns. The lengths of 297 vertebrate (125) and 244 plant introns were compiled and grouped into bins according to the log10 value of their length. Note that very few introns fall below 10^1.8 (~64 nt), the minimum length required for efficient splicing of plant and animal introns.
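The two quantities surveyed in this section, intron length on a log10 scale (Fig. 3) and A + U content (Fig. 4), are simple to compute. In this sketch the function names are hypothetical, and the bin width of 0.1 log10 unit is an assumption; the Fig. 3 caption does not state the width.

```python
import math
from collections import Counter


def au_content(seq: str) -> float:
    """Fraction of A + U in a nucleic acid sequence (T is counted as U)."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("A") + seq.count("U")) / len(seq)


def log10_bin(length_nt: int, bins_per_decade: int = 10) -> float:
    """Lower edge of the log10(length) bin containing an intron length."""
    if length_nt <= 0:
        raise ValueError("length must be positive")
    return math.floor(math.log10(length_nt) * bins_per_decade) / bins_per_decade


def length_histogram(lengths: list[int]) -> Counter:
    """Count introns per log10 bin, as in a Fig. 3-style size survey."""
    return Counter(log10_bin(n) for n in lengths)


# Toy intron and exon fragments (not real sequences).
print(round(au_content("GUAUUUAUAG") - au_content("GCCGCAGCUG"), 2))  # 0.6
print(log10_bin(534))  # 2.7
```

Multiplying before flooring (rather than dividing by a fractional width) keeps exact powers of ten, such as 100 nt, in the expected bin.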
C. Assays of Intron Splicing in Heterologous Systems
The similarity of plant and animal introns was first assessed by testing how well the introns are spliced in a foreign host. A summary of the reported experiments is presented in Table IV. The general conclusion is that, whereas plant introns are usually spliced in mammals, mammalian introns are either not efficiently or not faithfully processed in plant cells. For example, six of nine plant introns tested were spliced either in HeLa cells or in a HeLa in vitro splicing extract. Bean phaseolin intron 1 was poorly spliced in a HeLa splicing extract (71), but the intron is 72 nt, below the minimum length of ~80 nt generally required for mammalian introns (72). Soybean leghemoglobin introns 1 and 3 were inaccurately spliced in HeLa cells (69) because cryptic 3' acceptors were used; this is possibly a consequence of plant introns not having a polypyrimidine tract. That some plant introns are faithfully and efficiently spliced in HeLa cells and in vitro splicing extracts suggests that the structural similarities of plant and mammalian introns reflect similarities in the biochemistry of splicing. In contrast, animal introns are not efficiently spliced in plant cells. Of the 15 animal introns tested in tobacco, 14 failed to splice efficiently or accurately. Only the SV40 small-t antigen intron was spliced efficiently (73); this intron is 80% A + U and hence is similar in base content to plant introns. Taken together, the data suggest that plant introns contain sequence information apart from the conserved splice junctions, and that this information is not present in most animal introns.

FIG. 4. Base-content survey of maize introns and exons. [Bar chart: A, C, G, and U content of exons and of introns.] The 26 maize genes for which the complete gene structure was determined are included in the survey of 184 exons (47,078 total nucleotides) and 152 introns (42,134 total nucleotides).
D. The Role of (A + U)-Rich Sequences
Filipowicz and colleagues recognized that, unlike mammalian introns, plant introns are unusually (A + U)-rich, although the exons surrounding them are (G + C)-rich (69). They proposed that this disparity might be essential for the efficient and accurate splicing of plant introns. This hypothesis was tested (64) by constructing synthetic introns with (A + U)-rich internal sequences surrounded by consensus splice junctions. Several different synthetic introns were constructed using syn7 as a starting intron; syn7 is 85 nt in length and is composed of consensus 5' and 3' splice junctions, a mammalian branchpoint consensus sequence (CUUAC), and an (A + U)-rich internal sequence. The entire intron is 75% A + U. The synthetic introns were placed in a (G + C)-rich (60%) exon environment upstream of an intron-containing leghemoglobin gene, and the gene chimeras were assayed in transgenic tobacco calli and transiently in Nicotiana plumbaginifolia. Spliced and unspliced transcripts were analyzed by RNase protection to determine splicing efficiencies. When tested in N. plumbaginifolia, syn7 was efficiently spliced (82%), above the 55% value for the linked leghemoglobin control intron. This experiment indicates that a synthetic polyribonucleotide can function as an intron and suggests that no previously undetected conserved sequence is required for efficient splicing. Curiously, unspliced transcripts were abundant (>10% of total transcript) in transient assays but were barely detectable in transgenic calli. No splicing intermediates (such as a lariat structure) were detected.
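Splicing efficiencies such as the 82% and 55% values above express the spliced fraction of total transcript. A minimal sketch, assuming two protected-fragment signals from an RNase protection assay; the optional division of each signal by its fragment length (to correct for uniformly labeled probes of different size) is an assumption that would need checking against the actual probe design.

```python
def splicing_efficiency(spliced_signal: float, unspliced_signal: float,
                        spliced_len: int = 1, unspliced_len: int = 1) -> float:
    """Percent spliced = spliced / (spliced + unspliced), after optionally
    normalizing each band's signal by its protected-fragment length."""
    spliced = spliced_signal / spliced_len
    unspliced = unspliced_signal / unspliced_len
    total = spliced + unspliced
    if total <= 0:
        raise ValueError("no signal")
    return 100.0 * spliced / total


print(splicing_efficiency(82, 18))              # 82.0
print(splicing_efficiency(100, 200, 100, 200))  # 50.0
```

With the default lengths of 1, no normalization is applied and the function reduces to the plain spliced fraction.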
TABLE IV SPLICING OF INTRONS IN HETEROLOGOUS HOSTS
Organism
Plants Pea
Gene
rbcS
Wheat Maize Wheat Pea Oat Bean
rbcS Adh1 Amylase Legumin J Phytochrome Phaseolin
Maize Soybean
bronze-1 Leghemoglobin
Soybean
Leghemoglobin
Sorghum Bean
PEPcarboxylase Phaseolin
Soybean Maize
Leghemoglobin Actin
Maize
waxy
Intron
Spliced in
Introns 1 and 2 Intron 1 Intron 6 Intron 2 Intron 1 Intron 1 Intron 1
HeLa in vitro
Intron 1 Introns 1 and 3 Intron 2
HeLa in vitro
All Introns 1 through 5 Intron 1 Introns 1 and 2 Introns 9 and 10
- HeLa in vitro HeLa in vitro HeLa in vitro
-
-
Not (or poorly) spliced in
Comment
Ref. 130
50% spliced 30% spliced
Tobacco Tobacco
-
-
__
-
HeLa in vitro
HeLa cells
Intron is 72 nt; splices poorly in monkey cells -
Each splices at alternative 3’ acceptor
131 131 132 132 71 71
71 69
69
HeLa cells and in vitro Tobacco Maize
-
Maize Nicotiana plumbaginifolia Nicotiana plumbaginifolia
133
Also splices in tobacco
61
Also splices in tobacco Intron 1 only 42% spliced in Nicotiana plumbaginifolia Introns 9 and 10 are 40% and 42% (A + U), respectively
61 61 61
Bean
Phaseolin
Intron 3
Maize
Adh1
Maize
Adh1
Introns 1 and 2 Introns 3 and 6
-
Tobacco
-
Tobacco
-
Tobacco
Yeast Cryptococcus
Animals Human Human Human
Intron 1 through 7
Growth hormone
Intron 1 through 4 Intron 1 Introns 1 and 2
-
Introns 1 and 2 Introns 1 and 2 Intron 1 Small-t antigen
-
violaceus Tobacco
-
Tobacco
-
Tobacco
α-Globin β-Globin
Drosophila
P element
Mouse
Metallothionein
Human
Metallothionein t antigen
SV40
Xylanase
albidus
Introns are not (A + U)-rich; 50-69 nt in length; only 46-54% (A + U)
Tobacco
-
Tobacco and sunflower Tobacco Tobacco and
Orychophragmus
Tobacco
Intron inserted in the bacterial neomycin phosphotransferase gene Introns 1 and 2 are 56% (A + U) and 54% (A + U), respectively Cryptic 3' acceptor for intron 3; introns 3 and 6 are 66% (A + U) and 62% (A + U), respectively
134a 134a
135
136
Intron 2 is spliced at alternative 3' acceptor Both introns ~70% (A + U), but are 56 and 53 nt in length
-
Intron is 66 nt and 80% (A + U)
71 69
137 138
-
134
63 73
The syn7 intron was 75% A + U, but was biased toward U (50%) over A (25%). To test whether an A-rich or U-rich sequence was required for efficient splicing, a synthetic intron was constructed that was again 75% A + U, but was biased toward A (48%) over U (27%). The splicing efficiencies of the U-rich and A-rich introns were 82% and 93%, respectively, indicating that, in N. plumbaginifolia, the relative proportions of A and U did not affect splicing. As a test of whether an (A + U)-rich intron sequence is required for efficient splicing in tobacco, (G + C)-rich synthetic oligonucleotides were used to replace (A + U)-rich sequences in syn7. After replacing up to 19 A and U nucleotides (near the 3' splice junction) with G and C nucleotides, and reducing the overall (A + U)-content to 52%, the synthetic introns were still spliced efficiently (≥81%). When the overall (A + U)-content of the synthetic intron was decreased to 36% or <32% by further replacement (in the center of the intron) with (G + C)-rich sequences, the splicing efficiency was drastically reduced to 47% or
58% A + U, and were spliced efficiently. These results indicate that the deleterious effects of (G + C)-rich sequences can be overcome by restoring overall (A + U)-richness to the intron. Similarly, we found that inserting (G + C)-rich sequences into maize Alcohol dehydrogenase-1 (Adh1) intron 1 interferes with splicing. We inserted (G + C)-rich Adh1 cDNA (exon) sequences of 0.6 to 0.9 kb into different restriction sites of Adh1 intron 1, increasing the overall (G + C)-content of the intron from 43% to >48% (74). The insertions were not in or near the conserved splice junctions. Each of the insertions was poorly spliced, and the most deleterious phenotypes were observed when the cDNA was placed near the 5' splice junction; alternatively spliced products did not accumulate to a detectable degree on northern blots. As a control, we found that the insertion of 0.65 kb of a native intron sequence (from actin intron 3) did not interfere with intron function, showing that Adh1 intron 1 can be modified with additional sequences and still function properly. We and others also found that insertions of the (G + C)-rich Mu1 transposable element into Adh1 intron 1 and actin intron 3 interfere with splicing and create alternative RNA processing events (75, 76). Taken together, these experiments suggest that recognition of introns requires more than just the conserved splice junction sequences, and that the splicing machinery probably interacts with the entire intron in a process that distinguishes (A + U)-rich introns from (G + C)-rich exons.
Dicot introns generally splice efficiently in monocots, but there are several examples in which monocot introns are poorly spliced in dicots (Table IV). For example, soybean leghemoglobin intron 1 splices efficiently in the monocot maize and in the dicot N. plumbaginifolia, but both introns 9 and 10 from the maize Waxy gene are poorly spliced in N. plumbaginifolia. With few exceptions, the absolute (A + U)-content of dicot introns is >60%, and experimental results show that synthetic introns with <50% A + U are poorly spliced in N. plumbaginifolia (61). In contrast, monocot introns average 60% A + U, but several introns are less than 40% A + U. The poor splicing of the maize Wx introns in N. plumbaginifolia may reflect their unusually low (A + U)-content of no more than 42%, which is below the minimum required for splicing in dicots. The syn7 series of synthetic introns described above was also tested in the monocot maize (61), and the results show that (A + U)-rich introns are efficiently spliced. In addition, synthetic introns that contain up to 75% G + C and that are poorly spliced in tobacco are efficiently spliced in maize; this confirms that maize cells can splice (G + C)-rich introns. Native maize introns that are (G + C)-rich have 5' and 3' splice junction sequences that conform more closely to their respective consensus sequences, indicating that stronger base-pairing interactions with U1 snRNA might afford improved recognition of (A + U)-poor introns (61). Collectively, these findings suggest that intron (A + U)-richness is important for monocot splicing, but that there are mechanisms that allow splicing of (G + C)-rich introns. About 80% of maize introns are <200 nt, suggesting that, for short introns, there are minimal internal sequence requirements for recognition and splicing. Are the required (A + U)-rich sequence motifs redundant in long introns?
We have addressed this question by making internal deletions of two native maize introns while leaving the conserved intron borders intact. Removing as much as 398 bp of the 534-bp Adhl intron 1 (Fig. 5) or 352 bp of the 882-bp actin intron 3 (not shown) did not diminish splicing efficiency or intron enhancement ( 1 6 ~ )Removing . >430 bp ofAdh1 intron 1 did impair intron function, resulting in reduced intron enhancement and lower splicing efficiency, indicating that some internal intron motifs must be retained. The results suggest that there are minimal internal sequence requirements for efficient splicing, but because large deletions in long introns have no phenotype, the functional motifs are probably redundant. Is an (A U)-rich sequence sufficient to specify an intron and its borders? Using a novel approach, we inserted (A + U)-rich fragments of an intron sequence into the (G + C)-rich (-70%) exon sequence of the Bz2 gene (Fig. 6) and determined how the resulting transcripts were processed
+
+
+
+
+
170 Piasmid
KENNETH R. LUEHRSEN ET AL.
lntron Length
% Luclferase
Expression
intron Structure
pAL67
534
l l 1 1
PAL6103
244
I aa
86
pAL6776 pAL67 73
136
104
72
57
158
398
430
93
64
a 83
pAL67 70
290
100
21
21
I 5 6
4n
51
Frc. 5. Transient expression assay of plasmid constructs containing deletion variants of Adhl intron 1. The plasmid pAL.61 contains the luciferase reporter gene downstream of the Adhl intron 1; its construction has been described in detail (56).Internal deletions were made in Adhl intron 1, and the effects on luciferase expression were tested in a transient expression assay using maize tissue culture cells. The filled boxes represent the remaining intron sequence, and the line shows the deleted intron sequence; the number below each area indicates the number of nucleotides in each region.
(K. H. Luehrsen and V. Walbot, submitted). We found that inserting an internal Adh1 intron 1 sequence (lacking both splice junctions) into Bz2 resulted in the Adh1 insert being wholly or partially spliced from the transcript. In one construct, the 346-bp middle of Adh1 intron 1 was placed 41 bp downstream of the single Bz2 intron. We found that the Bz2 5' splice donor was used and that the Bz2 intron, the intervening 41-nt exon, and the entire Adh1 insert were excised as a single intron. The native Bz2 3' acceptor was skipped in favor of a cryptic 3' acceptor at the border between the (A + U)-rich Adh1 intron sequence and the distal Bz2 exon sequence. These observations support the hypothesis that the transition from an (A + U)-rich to a (G + C)-rich sequence determines the intron 3' border, and that the 41-bp Bz2 exon sequence is too short to be recognized within the context of a large (A + U)-rich sequence. When this same Adh1 intron fragment was inserted into Bz2 in the antisense orientation, two introns were created from the (A + U)-rich inserted sequence. The more (G + C)-rich portions of the insert remained as exon sequences. The two introns were excised using cryptic sites at both the 5' and 3' splice junctions. When the pea legumin intron was inserted in the antisense orientation into the (intronless) maize zein gene and expressed in tobacco (79), the (A + U)-rich intron sequence was spliced from the transcript through the use of cryptic splice sites. Also, usage of a 5' donor can be enhanced by its being adjacent to a (G + C)-rich sequence (21). Taken together, the results suggest that (A + U)-rich regions of the pre-mRNA are recognized as intron sequences and are spliced from the transcript, using 5'-donor and 3'-acceptor sequences determined by the (A + U)/(G + C) transition and by a loose match to the consensus motif...
[Figure 6 diagram: plasmids, inserts (346-bp middle of Adh1 intron 1), and splicing outcomes]
FIG. 6. Alternative splicing induced by insertions of (A + U)-rich intron sequence into the Bz2 gene. A 346-bp fragment from the center of Adh1 intron 1 was inserted downstream of the single Bz2 intron in either the sense (pc/I346sS) or the antisense (pc/I346aS) orientation. After transient expression in maize protoplasts, the splicing outcomes were determined by sequencing cDNAs derived from each construct; other alternative processing events were also observed (not shown).
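The (A + U)/(G + C) transition argument above can be made concrete with a sliding-window profile. The Python sketch below is illustrative only and is not the authors' method; the window size and the toy sequence are assumptions. It scores every boundary by the G + C fraction of the downstream window minus that of the upstream window. The steepest drop in G + C then marks a candidate 5' border, and the steepest rise marks a candidate 3' border.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C in a nucleotide string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def transition_scores(seq: str, window: int = 10) -> dict[int, float]:
    """Score each boundary i as GC(seq[i:i+window]) - GC(seq[i-window:i]).

    Positive scores mark (A + U)-rich -> (G + C)-rich transitions;
    negative scores mark (G + C)-rich -> (A + U)-rich transitions.
    """
    return {
        i: gc_fraction(seq[i:i + window]) - gc_fraction(seq[i - window:i])
        for i in range(window, len(seq) - window + 1)
    }


def predict_au_island(seq: str, window: int = 10) -> tuple[int, int]:
    """Return (start, end) of the most sharply bounded (A + U)-rich island.

    start: boundary with the steepest drop in G + C (candidate 5' border).
    end:   boundary with the steepest rise in G + C (candidate 3' border).
    """
    scores = transition_scores(seq, window)
    start = min(scores, key=scores.get)
    end = max(scores, key=scores.get)
    return start, end


# Hypothetical toy pre-mRNA: (G + C)-rich exon, (A + U)-rich intron, (G + C)-rich exon.
toy = "GC" * 15 + "AU" * 15 + "GC" * 15
print(predict_au_island(toy))  # (30, 60)
```

Here `toy[30:60]` is the (A + U)-rich block. Real introns and exons are far less uniform, so in practice such a profile could only nominate candidate borders that would still need a splice-site match.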
E. Splice Site Selection and Intron Recognition Model
Based on the data discussed in the previous section, we propose the following working hypothesis as a framework for understanding splicing in higher plants. We propose that the first step of splicing is intron recognition. At this step, internal intron motifs are bound by a nuclear factor (a protein? an RNP?) that recognizes (A + U)-rich or U-rich motifs in the body of the intron. Short introns (
[Figure 7: (A) sequence of maize Adh1 intron 1, with HindIII, BglII, and StuI sites marked; (B) sequence of the maize Bz2 intron]
(B) GUGAGCACCGAGCAGAGCACGGCAGCAAGUGUCUUUUCUUCAAGGUAACGUGCAACAAACUGCCGUGCCUUUCUGCAG
FIG. 7. (A) The sequence of the 534-nt maize Adh1 intron 1. The strings of three or more consecutive Us, or >70% U content for strings 4 nt and longer, are double-underlined. (B) The 78-nt single intron from maize Bz2, with the regions rich in U (as defined in A) double-underlined.
tion would be accurate provided that long stretches of U residues are absent or infrequent in exons. Confirming this expectation, the U-U and A-U dinucleotides are overrepresented in plant introns and underrepresented in exons (14). As the pre-mRNA is transcribed and released from the DNA template, we propose that the internal intron recognition motifs are bound by a factor that defines the middle of the intron. We propose that this binding serves two roles. First, potential 5'-donor and 3'-acceptor sites within the intron are blocked. This is important in plants, because the 5' and 3' splice site consensus sequences are short and many introns contain "reasonable" border motifs that are, in fact, never used in splicing. Second, the splicing machinery is targeted to the edges of the intron by interacting with the internal intron recognition factor, so that the correct borders are utilized (see Fig. 8). The search window to find either a 5' or a 3' splice site would encompass the 10 to 50 nt immediately upstream and immediately downstream of the area coated with the intron recognition factor. We propose that the closest sequence with a reasonable match to the 5' splice site consensus (a G-U plus two or three additional complementary bases within the region that binds to U1 snRNA) is utilized. Similarly, the closest A-G dinucleotide may be sufficient to serve as the 3' acceptor; as discussed previously, its use might be enhanced by being distal to a U-rich tract. The internal recognition protein would stabilize snRNP interaction with the intron-exon junctions, facilitating splice site selection and full spliceosome assembly despite the divergence of nucleotides at the splice sites and the variability in temperature of plant cells. In summary, we propose that the requirement for internal intron recognition is the unique aspect of intron definition in plants.
This extra step provides a mechanism for identifying short introns in the context of large exons, for preventing use of cryptic splice sites that occur within introns, and for favoring the use of the correct splice sites. The model is entirely consistent with the experimental results available now, but it has not yet been rigorously tested. For example, what are the sequence requirements for an internal intron recognition motif? In a large intron, what spacing of internal recognition motifs is required? How close to the 5' and 3' intron edges are the last recognition motifs?
[Figure 8 diagram: AU-rich intron coated by the intron recognition factor, with the 5' and 3' splice junctions exposed]
FIG. 8. Intron recognition model for plants. A typical plant gene is composed of (A + U)-rich introns surrounded by (G + C)-rich exons. We propose that a U- or (A + U)-rich binding factor recognizes and coats the intron sequence, leaving the 5' and 3' splice junctions exposed for binding by snRNPs. (G + C)-rich exons are proposed to be deficient in the (A + U)-rich motifs that are substrates for the hypothetical binding protein and hence are left uncoated.
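The working hypothesis can be expressed as a toy procedure. The Python sketch below is a hypothetical illustration, not an implementation from this chapter. It works in three steps:

1. It approximates the Fig. 7 definition of U-rich strings as runs of three or more U plus any 4-nt window containing at least three U.
2. It treats the span from the first to the last U-rich position as the region coated by the intron recognition factor.
3. It takes the closest G-U upstream and the closest A-G downstream of that region within `max_dist` nt.

The 50-nt default is an assumption taken from the upper end of the proposed search window. Scoring of the U1 snRNA-binding region and of the U-rich tract before the acceptor is deliberately omitted.

```python
import re
from typing import List, Optional, Tuple


def u_rich_mask(seq: str) -> List[bool]:
    """Flag positions lying in U-rich strings (simplified Fig. 7 criterion)."""
    seq = seq.upper()
    mask = [False] * len(seq)
    # Runs of three or more consecutive U.
    for m in re.finditer(r"U{3,}", seq):
        for k in range(m.start(), m.end()):
            mask[k] = True
    # Any 4-nt window with at least 3 U (75% U, above the 70% cutoff).
    for s in range(len(seq) - 3):
        if seq[s:s + 4].count("U") >= 3:
            for k in range(s, s + 4):
                mask[k] = True
    return mask


def coated_region(seq: str) -> Optional[Tuple[int, int]]:
    """Span from the first to the last U-rich position (end exclusive), or None."""
    hits = [i for i, flagged in enumerate(u_rich_mask(seq)) if flagged]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


def predict_intron(seq: str, max_dist: int = 50) -> Optional[Tuple[int, int]]:
    """Return (donor, end) so that seq[donor:end] is the predicted intron, or None.

    donor: closest G-U lying entirely upstream of the coated region.
    end:   position just after the closest A-G lying entirely downstream of it.
    Sites inside the coated region are never considered (the proposed "blocking").
    """
    seq = seq.upper()
    region = coated_region(seq)
    if region is None:
        return None
    start, end = region
    donors = [p for p in range(max(0, start - max_dist), start - 1)
              if seq[p:p + 2] == "GU"]
    acceptors = [p for p in range(end, min(len(seq) - 1, end + max_dist - 1))
                 if seq[p:p + 2] == "AG"]
    if not donors or not acceptors:
        return None
    return donors[-1], acceptors[0] + 2


# Hypothetical toy pre-mRNA: (G + C)-rich exon, donor G-U, U-rich core containing
# a decoy G-U at position 20, acceptor A-G, (G + C)-rich exon.
toy = "CCGCCGCC" + "GUACC" + "UUUAUUUGUUUAUUUA" + "CCGCAG" + "GCCGCC"
print(predict_intron(toy))  # (8, 35)
```

The decoy G-U inside the coated core is ignored because the donor search only looks upstream of the coated region. This mirrors, in a very simplified way, the proposal that the recognition factor masks internal cryptic sites.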
III. Transposable-element-induced Mutants of Maize
Transposons are a rich source of genetic variation in maize and other higher plants. Many transposon sequences inserted in transcription units affect RNA processing by providing alternative splice junctions and polyadenylation signals within the transcript (reviewed in 5, 78). A number of transposon-mediated changes in gene expression have been analyzed in great detail, and these case studies provide a good test of the intron recognition model outlined in Section II. Because identical transposable elements have been inserted in different genes, the impact of the context of the insertion can be explored on a limited basis. In addition, once a transposon has inserted, subsequent changes in its sequence may alter the splicing phenotype. We discuss several examples in which internal deletions of maize transposons have converted an insertion sequence from a poor to a very good intron. These examples can illuminate the key features of intron recognition in maize.
One limitation of studying RNA processing with spontaneous transposon insertion alleles is that every allele analyzed was recovered because it had a novel phenotype. That is, the sample of alleles is biased toward insertions that caused a major change in gene expression. Many transposon insertions may be entirely cryptic; for example, if a transposon insertion fails to decrease gene expression compared to the wild-type allele, it will not be recognized as a mutant. And, as described in Section III,D, some null or low-expression alleles caused by transposon insertion can be restored to partial function by changes in the transposon, and these new partial-function alleles will be recognized. On the other hand, many changes in the transposon may not restore gene expression; these cases are not recognized by geneticists and hence have not been analyzed at the molecular level.
It is important to keep in mind that one particular transposon can have different effects if inserted at different locations in a transcription unit; this demonstrates that the combination of gene context and transposon sequence determines the type of RNA processing observed. A second aspect of transposon behavior is that an element can be inserted in either orientation
relative to the reading frame, with each orientation contributing a different sequence to the transcript. A third feature to consider is that transposable element behavior can influence the impact of an insertion element. In the presence of transposase protein, which is encoded by the autonomous element of a particular element family, transposition can occur. In addition, binding of transposase (and other proteins encoded by the regulatory element) can influence transcription (79) and may influence RNA processing as well. For simplicity, therefore, the impact of a transposon insertion on RNA processing events should be measured in a genetic background lacking an active regulatory element and hence lacking these trans-acting factors. A fourth feature is that individual transposon insertions can suffer spontaneous deletions, altering the size and sequence content of the insertion sequence; as the insert changes, its impact on RNA processing can change dramatically. Fifth, different families of transposable elements differ in their structure, sequence, and content of cryptic 5' and 3' splice sites. The terminal inverted-repeat sequences of both the Ac/Ds and Spm/dSpm element families contain multiple cryptic splice sites (80, 81). For example, the Ds elements contain three cryptic 5' splice sites representing all three reading frames, so that some fraction of splicing events that utilize the sequences in the element termini could restore the reading frame and thus allow production of a partially functional protein. Several authors have proposed that elements processed in this manner are in fact evolving into introns (5, 82). In this section, we describe the impact of the same 2.2-kb defective Spm (dSpm) element on RNA processing events in several maize genes, and of the allelic deletion series of the dSpm in two of these genes. The impact of Ds insertions in several genes is also considered.
After describing what is known about each insertion case, we use the results to evaluate the intron recognition model.
A. Properties of Ds
Ds elements are the defective (transposase-minus) members of the Ac/Ds family. There are three basic classes of Ds elements that differ in base composition and in the structure of the "right" and "left" terminal inverted repeat (TIR) sequences. Standard Ds elements are 3 to 4.5 kb in length, with simple internal deletions compared to the 4.7-kb Ac element; their base composition is ~44% G + C. Ds2 elements are ~1.3 kb and share substantial homology with Ac at the ends, but contain a unique sequence of about 500 bp in the center; these elements are 47% G + C. The ~400-bp Ds1 elements contain only the 11-bp imperfect TIRs of Ac and an additional ~20 bp of homology internal to the right TIR; Ds1 elements are very (A + U)-rich, and <29% G + C. The impact of a number of Ds insertions was recently reviewed (5) and is only briefly mentioned here.
B. Multiple RNA Processing Fates after Ds Insertions into Exons
Ds insertions into maize exons yield three types of changes in RNA processing: (1) intron creation using sites within the insertion, or a combination of an element 5'-donor site and a cryptic 3'-acceptor site in a flanking exon; (2) alternative splicing in which the Ds insertion becomes part of a novel intron; and (3) polyadenylation within the Ds insert. These alternative RNA fates occur in the absence of Ac.
Ds1 elements are very short for maize transposons (~400 bp), and with a 71% (A + T) content, Ds1 elements are much more similar to introns than to exons. The first report of intron creation by a Ds1 element involves a 405-bp Ds1 inserted in the leader sequence of the adh1-Fm335 allele. The 5' splice sites within the terminus of Ds1 were spliced to a 3' acceptor created as part of the host sequence duplication generated by transposon insertion (83, 84). Similarly, a Ds1 insertion in exon 9 of the wx-m1 allele creates an intron; in this case, three 5' donors within Ds1 and three 3' acceptors (two in Ds1 and one in the flanking gene) are used (82). With Ds1 insertions, intron creation is the only reported change in RNA processing.
Intron creation also occurs after insertion of longer Ds elements into exons; so far, all of these cases involve Ds in the antiparallel orientation (opposite the direction of transcription of Ac). As with Ds1, multiple 5' sites near the terminus of the Ds element can be utilized in wx-m9, which contains a 4.37-kb deletion derivative of Ac, and in wx-B4, which contains a 1.49-kb Ds2 element (80). The 3' acceptors within the element and in the flanking DNA are utilized (80). Analysis of the structure of wx-m9 highlights the question of exon size in maize. Although most maize introns are small (<200 nt), exons can be relatively large (>300 nt), and there are few microexons (<50 nt).
When Ds is spliced from the wx-m9 transcript, a microexon of 12 nt would be created on the 3' side, provided the downstream intron is still in the transcript. If the kinetics of intron removal could be analyzed in more detail, this would be an interesting case of exon definition to examine. In some cases, Ds insertions result in a combination of altered splicing behavior and polyadenylation within the inserted element. For antiparallel insertions (those in which the Ds 5' splice sites are in the "correct" orientation to be recognized), polyadenylation within the element has been detected, e.g., with the Ds2 element in adh1-2F11 (85). Based on northern blotting, the Ds2 in exon 4 of adh1-2F11 yields about 10% mRNA polyadenylated in the insert sequence (85). The other RNA is alternatively spliced, providing in-frame mRNA that apparently encodes a partially functional product. The 5' splice junction of intron 3 is spliced to a cryptic site within exon 4, a site not normally used, removing all of intron 3, the part of exon 4 proximal to the Ds2 insert, all of Ds2, and a few distal bases of exon 4. In this allele, no evidence was found for use of splice sites within the 1.3-kb Ds2 element. In alleles with "parallel" insertions of the larger Ds elements, the 5' and 3' splice sites are in the wrong orientation to yield an intron. Consequently, in these cases polyadenylation within the insert is the major impact of the Ds element on RNA processing (78). For example, with the wx-d and wx-B3 alleles, which contain Ds and Ac elements, respectively, most of the RNA is polyadenylated within the transposon sequences (78).
C. Properties of 2.2-kb dSpm Elements
Spm is an 8.3-kb autonomous transposable element encoding two overlapping gene products, TNPA and TNPD. The two products result from the alternative splicing of the 4.4-kb first intron of Spm (86) (see Section IV). The dSpm elements are simple deletion derivatives of the autonomous element, typically involving removal of continuous segments of the internal part of Spm. Many such deletions alter the amount or type of gene product produced in the absence of Spm, suggesting that posttranscriptional events rather than excision are affected. Deletions with a recognizable, new phenotype are termed "change of state," or CS, alleles. The impact on RNA processing of the I8 type of dSpm will be described (Table V). The I8 element is 2.2 kb long and was first cloned from the wx-m8 allele (87). This dSpm element is 48% G + C. The same element has been identified in the a2-m1, a1-m1, and bz1-m13 alleles; a2-m1 was the first of these mutable alleles to be genetically identified by McClintock. The other alleles probably contain the identical element transposed from a2-m1 (88); at the DNA sequence level, the elements are identical, although not all have been fully sequenced.

TABLE V
I8 dSpm INSERTIONS IN MAIZE GENES

Allele | Insert location | Size (kb) | Splicing (a) | Polyadenylation | Ref.
a1-m1 | Exon 2 | 2.2 | - | - |
a1-m1 CS5719 | Exon 2 | - | Probably | - |
a2-m1 | Exon 1 | 2.2 | Inefficient | - |
a2-m1 class II state | Exon 1 | 1.3 | Efficient | - |
wx-m8 | Exon 11 | 2.2 | Not observed | Efficient | 87, 89
bz1-m13 | Exon 2 | 2.2 | Inefficient; spliced at AS1 and AS2 | - | 81, 93, 94
bz1-m13 CS3 | Exon 2 | 1.8 | Efficient | - |
bz1-m13 CS9 | Exon 2 | 0.9 | Efficient | - |
bz1-m13 CS64 | Exon 2 | 2.2 | Inefficient; AS1 removed by deletion | - |
(a) Efficient splicing is defined as >50%.
D. Altered RNA Processing after Insertion of the 2.2-kb dSpm Element and Its Deletion Derivatives
As was observed for large Ds insertions, the parallel-orientation dSpm inserts examined result in polyadenylation within the element sequence. In the wx-m8 allele, dSpm is inserted in the middle of an exon, and transcripts terminate within the transposon sequence (87, 89). The site utilized is not normally a transcript termination sequence in Spm.
Three alleles with antiparallel dSpm insertions have been examined. Again, similar to Ds insertions, 5' and 3' splice sites within the dSpm transposon are utilized in a variety of gene contexts. For example, the same 3' acceptor A-G is utilized in both a2-m1 and bz1-m13. The a1-m1(6078) allele contains an antiparallel dSpm insertion four bases internal to the exon 2/intron 2 boundary (90, 91). The dSpm is not spliced from the mRNA in the original allele; instead, most of the transcript is read through. The a1-m1 allele allows molecular examination of a phenomenon McClintock termed "change of state." CS derivatives of mutable alleles are permanent alterations in the allele; CS alleles contain deletions or internal rearrangements of dSpm that alter interaction with transposase or the impact of the insert on the expression of the host gene. We consider several cases in which a CS derivative has higher expression than the original allele in the absence of Spm; because transposase is not present, the increase in gene expression must reflect a change in the production of transcript that encodes the host gene product. For example, the 5719 CS derivative of a1-m1 has an internal deletion within the 2.2-kb dSpm that removed most of the central, (G + C)-rich region, leaving a dSpm of 789 bases. This insert is now 57% A + T, and the allele is expressed at a much higher level; it is likely that the dSpm is spliced from the transcript.
More is known about two a2-m1 alleles, which contain an antiparallel dSpm insertion.
[Figure 9 diagram: a2-m1 original state, showing the 900-bp deleted segment (55% GC) that contains a 353-bp block (78% GC)]
FIG. 9. Structure of the a2-m1 allele. The a2-m1 original-state allele has a 2.2-kb dSpm element inserted into the intronless A2 gene. The a2-m1 class-II state is a deletion derivative of the a2-m1 original state in which a 900-bp sequence was deleted. The deletion results in the creation of an intron in the class-II state allele; a G-C substitutes for the canonical G-U that is used in almost all 5' splice junctions (see Tables I and III).
The dSpm is inserted about 20% of the way into the intronless A2 gene (Fig. 9). The A2 coding region is 70% G + C, as are the sequences immediately surrounding the dSpm insertion. With the 2.2-kb dSpm insert present, kernels are almost colorless, indicating that almost no mRNA is produced that encodes the A2 product. That is, either the level of splicing of the dSpm insert from the transcript is very low, or the splicing that occurs does not restore a functional open reading frame. In the state-II CS allele of a2-m1, however, a well-spliced intron is created. In this allele, a deletion in dSpm removes 900 bp (55% G + C), including a 353-bp block that is very (G + C)-rich (78%). The remaining dSpm sequence is 1341 bp and only 43% G + C. In this CS allele, over 75% of the mRNA is spliced, removing almost all of the dSpm; seven amino acids are added to the A2 product, and the resulting protein yields almost wild-type coloration. The 5' and 3' splice sites utilized in this case of intron creation are both within the dSpm, and hence existed in the same context in the original, low-function a2-m1 allele. However, the dSpm was not spliced to an appreciable extent until the (G + C)-rich central sequence was deleted. The relevance of these transposon cases to our model of intron recognition is that intron creation depends on the overall (A + U) content of a region and on the proximity of suitable 5' and 3' splice sites. In the case of a2-m1 alleles, the splice sites were always present, but were only used efficiently to
allow removal of the dSpm as an intron when the (A + U) content of the insert increased from 52% to 57% and the insert was free of (G + C)-rich (>60%) blocks. It is also instructive that one large intron was created, not a series of smaller ones.
The allele bz1-m13 has the same 2.2-kb dSpm insertion 38 bp downstream of the single intron of Bz1 (81) (Fig. 10). Expression of the bz1-m13 allele is suppressed by the products of an autonomous Spm element; hence, excision events appear as spots of wild-type pigmentation on a bronze-colored background (92). In the absence of Spm, the bz1-m13 allele conditions full kernel color. A long series of CS alleles (summarized in part in Table V) exists for the bz1-m13 allele; these were selected as derivatives in which dSpm was excised later in development and showed a low rate of germinal reversion (92). Like bz1-m13, many of the CS alleles condition full kernel pigmentation in the absence of Spm. Molecular analysis showed that each CS allele had an altered dSpm element present in the same position and orientation as the original bz1-m13 insertion (92-94). Most of the CS alleles had a large deletion of the dSpm sequence, but still contained at least 1 kb of the insertion. How is functional BZ1 enzyme produced when hundreds of base pairs of dSpm sequence are inserted in the middle of the Bz1 gene?
[Figure 10 diagram: bz1-m13 transcript from 5' to 3' (An), with the dSpm insertion and its deletions]
FIG. 10. Structure of the bz1-m13 allele. The bz1-m13 allele has a 2.2-kb dSpm element inserted in the second exon, 38 bp downstream of the single Bz1 intron. CS3 and CS9 are deletion derivatives (dotted line) of bz1-m13. AS1 and AS2 are alternative 3'-acceptor sites.
Northern analysis of bz1-m13 and the CS alleles showed that multiple RNA species were present for each variant in the absence of Spm (93). Some transcript from each allele was of the size expected for unspliced RNA and contained both bz1 and dSpm sequences. Two additional transcripts of 2.4 and 1.8 kb were detected for several CS alleles, but only the 1.8-kb transcript was found for CS3 and CS9. Using a dSpm probe, it was further determined that the 2.4-kb transcript contained some dSpm sequence, whereas the 1.8-kb transcript did not. The 2.4- and 1.8-kb transcripts each hybridized to a bz1 probe 3' of the insertion site, implying that they did not arise by polyadenylation within the dSpm element. Taken together, the data suggest that the dSpm element is spliced from the 1.8-kb transcript, but that the 2.4-kb transcript still contains some, but not all, of the dSpm sequence. Sequencing of the cDNAs from several of the CS alleles (81, 94) showed that the bz1 intron is sometimes spliced using a 3' acceptor (AS1) in the distal terminal inverted repeat of the dSpm element. Using AS1, the bz1 intron, the intervening 38 nt, and all of the dSpm element (except the final 2 nt of the distal TIR) are excised, resulting in a 1.8-kb transcript. Removal of 38 nt of exon 2 and addition of 2 nt from the right TIR leave the bz1 reading frame intact (a net deletion of 36 nt, i.e., 12 codons). The amount of BZ1 enzyme activity observed for each CS allele is proportional to the level of 1.8-kb transcript (spliced at AS1), consistent with the notion that the 1.8-kb mRNA encodes functional BZ1 product. The CS64 allele has a 4-bp deletion that removes AS1, and this allele is null, further indicating that the 1.8-kb transcript encodes functional BZ1. Taken together, the results imply that translation of the 1.8-kb transcript produces a truncated BZ1* protein that still conditions kernel pigmentation.
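The reading-frame argument for the 1.8-kb transcript is simple bookkeeping: an in-frame product requires the net change in mRNA length to be a multiple of 3. The minimal Python check below uses only the lengths given above (38 nt of exon 2 removed, 2 nt of the distal TIR retained). The function names are illustrative.

```python
def net_change(nt_removed: int, nt_added: int) -> int:
    """Net change in mRNA length relative to the wild-type transcript."""
    return nt_added - nt_removed


def keeps_frame(nt_removed: int, nt_added: int) -> bool:
    """True if the downstream reading frame is preserved (net change divisible by 3)."""
    return net_change(nt_removed, nt_added) % 3 == 0


def codon_change(nt_removed: int, nt_added: int) -> int:
    """Codons gained (positive) or lost (negative); only defined for in-frame changes."""
    delta = net_change(nt_removed, nt_added)
    if delta % 3:
        raise ValueError("net change is not a multiple of 3: frameshift")
    return delta // 3


# Splicing at AS1 in bz1-m13 and its CS derivatives.
print(net_change(38, 2), keeps_frame(38, 2), codon_change(38, 2))  # -36 True -12
```

Frame preservation alone does not rule out a stop codon at the new junction. The evidence that BZ1* is functional comes from the enzyme-activity correlation and the CS64 allele, not from this arithmetic.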
Further, S1 nuclease and cDNA analyses of mRNA from several CS alleles indicated that the bz1 intron 5' donor is sometimes spliced to a second 3' acceptor (AS2) in the middle of dSpm, resulting in the 2.4-kb transcript. The dSpm sequence in the 2.4-kb transcript disrupts the bz1 reading frame and is not expected to encode functional BZ1 enzyme. No transcripts containing the 38 nt between the bz1 intron and the dSpm element were detected, indicating that the bz1 intron 3' acceptor is always skipped in favor of either AS1 or AS2 in dSpm. Although many CS alleles condition full kernel color in the absence of Spm, in vivo levels of BZ1 activity vary widely among alleles (93). This suggests that the production of the 1.8-kb transcript (splicing at AS1) differs for each CS allele and that the deletion of dSpm sequence changes the frequency with which AS1 and AS2 are used. Using the splicing model outlined in Section II, we discuss the alternate use of AS1 and AS2, and why the bz1 3' acceptor is always skipped. The dSpm element in bz1-m13 is 52% A + T overall and is inserted in the very (G + C)-rich (70%) bz1 exon 2. The dSpm element is recognized as
intronlike, owing to the 18% difference in (G + C) content between it and bz1 exon 2. In the original bz1-m13 allele, the bz1 3' acceptor is never used and 51% of the transcript is spliced at AS1; AS1 is used 5.8-fold more often than AS2 (94). Internal deletions in the dSpm sequence alter the relative use of AS1 and AS2. Although the dSpm element is (A + T)-rich, there is a 331-bp region near its 3' end that is 79% G + C (see Fig. 10); this sequence lies between AS2 and AS1. A 1.3-kb deletion in CS9, including AS2 and all of the (G + C)-rich region, improved the splicing efficiency at AS1 from 51% to 95%. The overall (A + T) content of the dSpm in CS9 is increased to 58%. By making the region more (A + U)-rich and intronlike, the CS9 deletion allows the transposon to be better recognized, and its splicing efficiency is virtually wild-type. Similarly, the CS allele of a2-m1 illustrated that removal of this region improved splicing of dSpm. Our previous work has shown that the insertion of a (G + C)-rich sequence into maize introns interferes with splicing (74); the improved splicing after deletion of the (G + C)-rich segment in CS9 is a test of the reverse possibility. The CS3 allele is missing a 437-bp (65% G + C) segment that includes most of the (G + C)-rich region, but it still retains both AS1 and AS2. For the CS3 allele, 91% of the transcript RNA is spliced at AS1, and AS1 is used 47-fold more often than AS2. Thus, removing the (G + C)-rich sequence block between AS2 and AS1 greatly favors the use of AS1 over AS2 compared to the bz1-m13 allele. We propose that the transition from an (A + U)- to a (G + C)-rich sequence just downstream of AS2 helps it to be recognized as a 3' acceptor, and that the CS3 deletion removes this signal. With the removal of the (G + C)-rich segment in CS3, AS2 is masked by being surrounded by an (A + U)-rich segment. If the (G + C)-rich segment of dSpm helps promote the use of AS2 as a 3' acceptor, why is splicing at AS1 favored over AS2 in bz1-m13?
AS1 might be an intrinsically better 3' acceptor than AS2. Although the critical cis-acting sequences involved in plant 3'-acceptor choice have not been fully determined, plant 3' acceptors are usually preceded by a U-rich block (14, 15). Ten of the 16 nucleotides preceding the A-G of AS1 are U, whereas only 6 of the 16 nucleotides preceding AS2 are U, suggesting that AS1 might be a superior 3' acceptor. Alternatively, we suggest that because AS1 is at the far downstream edge of an (A + U)-rich sequence, its use is favored. Even though AS2 is proximal to the 331-nt (G + C)-rich block, the 240-nt block proximal to AS1 is 56% A + U and is thus intronlike (Fig. 10). When presented with an RNA having (A + U)-rich insertions, the maize splicing machinery appears to favor the removal of the entire region; transcripts spliced at AS2 still retain the 240-bp (A + T)-rich region within a (G + C)-rich milieu and have possibly escaped full processing. We predict that AS2 would be efficiently used if a suitable 5' donor were available near the edge of
the (G + C)-rich block that immediately follows AS2; this would, in effect, create two introns. In the natural alleles, however, no sequence similar to the 5'-donor consensus is present within 50 bp downstream of the beginning of the (G + C)-rich block. If an additional 5' donor were available, we predict that two introns (using AS2 and AS1) would flank an exon that includes the 331-nt (G + C)-rich block of dSpm.
Why isn't the native bz1 3' acceptor used in the mutants? We propose that the insertion of an (A + U)-rich sequence just downstream (38 nt) of the bz1 3' acceptor internalizes it into a much larger (A + U)-rich region, where it might be "coated" with a U-rich binding protein. It would not be recognized, because it no longer lies at a discrete border between (A + U)- and (G + C)-rich sequences. The 38 bp that separate the intron and the dSpm insertion are probably not long enough to be recognized as an exon (recall that a 41-nt (G + C)-rich exon in Bz2 was skipped in a similar situation; Fig. 6). Small exons may be at a particular disadvantage if splice-site choice depends on internal intron recognition. An unusually small exon of 24 nt appears in Gpc1, but the next smallest maize exon is 47 nt, suggesting that exons must be of a minimum size to be distinguished from surrounding introns. The requirement of a minimum exon size probably prevents missplicing to 5'-donor or 3'-acceptor sequences that appear fortuitously in both exons and introns.
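The base-composition and splice-site percentages in this section can be cross-checked with two small calculations. The Python sketch below assumes that the full dSpm is 2241 bp at 48% G + C. That length is not stated directly; it is inferred from the a2-m1 state-II figures (1341 bp remaining after a 900-bp deletion). The function names are illustrative. The sketch does two things:

- It recomputes the G + C content left after a deletion.
- It converts "fraction spliced at AS1" plus "fold preference of AS1 over AS2" into the implied fraction spliced at AS2.

```python
def remaining_gc(total_len: int, total_gc: float,
                 del_len: int, del_gc: float) -> tuple[int, float]:
    """Length and G + C fraction of an insert after an internal deletion."""
    gc_left = total_len * total_gc - del_len * del_gc  # G + C bases remaining
    len_left = total_len - del_len
    return len_left, gc_left / len_left


def as2_fraction(as1_fraction: float, fold_as1_over_as2: float) -> float:
    """Fraction of transcripts spliced at AS2, given AS1 usage and the AS1/AS2 ratio."""
    return as1_fraction / fold_as1_over_as2


DSPM_LEN, DSPM_GC = 2241, 0.48  # assumed: 1341 + 900 bp, 48% G + C

# a2-m1 state II: 900-bp deletion at 55% G + C (text: 1341 bp, 43% G + C).
length, gc = remaining_gc(DSPM_LEN, DSPM_GC, 900, 0.55)
print(length, round(gc, 3))  # 1341 0.433

# bz1-m13 CS3: 437-bp deletion at 65% G + C.
length, gc = remaining_gc(DSPM_LEN, DSPM_GC, 437, 0.65)
print(length, round(gc, 3))  # 1804 0.439

# AS2 usage implied by the AS1 data for bz1-m13 and CS3.
print(round(as2_fraction(0.51, 5.8), 3))  # 0.088
print(round(as2_fraction(0.91, 47), 3))   # 0.019
```

Under this assumption, the a2-m1 result reproduces the 43% G + C quoted in the text. The CS3 remainder (about 1.8 kb) matches the size listed in Table V. The AS1 figures imply that roughly 9% of bz1-m13 transcripts but only about 2% of CS3 transcripts are spliced at AS2.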
IV. Examples of Alternative Splicing
In Section III, the TE-induced alternative splicing of several maize mutants is described. Many animal genes undergo tissue- and stage-specific alternative splicing (18, 95), but only a few examples of alternative splicing of native plant genes have been described (Table VI). The first reported case of alternative splicing in plants was for the spinach rubisco activase transcript (96). Although rubisco activase is encoded by a single-copy gene, two immunologically similar polypeptides of 41 and 45 kDa are present in spinach leaves. Sequencing the corresponding cDNAs indicated the use of alternative 5' donors and a common 3' acceptor for the sixth intron. Splicing at the upstream 5' donor results in an mRNA encoding the 45-kDa polypeptide. Use of the downstream 5' donor adds 22 nt to the mRNA, including a stop codon that leads to the truncated 41-kDa polypeptide. In western blots probed with rubisco activase antibody, the 45- and 41-kDa proteins were present in equimolar amounts in spinach leaves. Although both forms of rubisco activase are present in leaves, no functional difference has been ascribed to them; each activates rubisco in the absence of the other. The mechanism of alternative splicing is
also undetermined. Alternative splicing of the same rubisco activase intron has been described for Arabidopsis and barley (97), but only the larger polypeptide (upstream 5' donor) is observed in maize (98) and Chlamydomonas (99).

TABLE VI
EXAMPLES OF ALTERNATIVE SPLICING IN PLANTS

Organism | Gene/intron | Comment | Ref.
Maize | Spm intron 1 | Either the 4.4-kb intron 1 is excised, or an alternative 3' acceptor in intron 1 allows expression of the tnpD gene | 86
Spinach | Rubisco activase intron 6 | Alternative 5' donor results in a truncated protein; similar splicing seen in Hordeum, but not in Chlamydomonas or maize | 86, 96-99
Arabidopsis, soybean | RNA polymerase II | Alternatively spliced intron in 3' UTR | 139
Maize | P intron 2 | Alternative 3' acceptor | 67

The maize P gene is a regulator of pigment synthesis in pericarp tissue, and its amino-acid sequence shows homology to myb transcription factors (66, 67). The gene encodes two transcripts of 1.8 and 0.945 kb, resulting from alternative splicing of the second intron, which is either 4.9 or 7 kb in length. The 7-kb intron is the longest so far described for plants. For the second intron, the same 5' donor is used with two different 3' acceptors. Each transcript is also alternatively polyadenylated; splicing of the 7-kb intron removes the polyadenylation signal used for the 1.8-kb transcript. Both transcripts are abundant in aim, and genetic analysis of P mutants suggests that the protein product translated from the 1.8-kb transcript is necessary for P function. The protein products from the 1.8- and 0.945-kb transcripts have the same N-terminal region, including the region of homology to myb, but differ in the length of the C-terminal end. It has been suggested (67) that the myb-containing product of the 0.945-kb transcript might act as a competitive inhibitor of P function. No mechanism was proposed to account for the use of alternative 3'-acceptor sequences.
The transposable element Spm is perhaps the most interesting example of plant alternative splicing (86). The 8.3-kb element has several long open reading frames, two of which (ORF1 and ORF2) occur in the 4.4-kb first
SPLICING IN HIGHER PLANTS
intron. Genetic analysis had shown that transposition requires the integrity of the intron 1 reading frames, implying that the transposase might result from translation of unspliced or alternatively spliced mRNAs. Transposition requires the tnpA and tnpD gene products, which are produced by alternative splicing; direct evidence for the contribution of these products was provided by transforming tobacco with various cDNAs (86, 100). The most abundant Spm-encoded mRNA is 2.4 kb and encodes the tnpA gene product. Using RT-PCR to amplify rare cDNAs, Masson et al. (86) recovered alternatively spliced cDNAs that contained the first intron and its reading frames. In these rare cDNAs, the first-intron 5' donor was spliced to a 3' acceptor upstream of the intron 1 ORFs, allowing the ORFs to be translated. The tnpD gene product is composed of ORFs 1 and 2 fused by the removal of an intron. The mechanism responsible for 3'-acceptor choice has not been determined, but ORF2 shows 29% amino-acid identity with the rev-encoded protein of HIV-1. Because the rev gene product is involved in the regulation of viral splicing, the protein product of Spm ORF2 may likewise be involved in modulating splicing of the tnpD transcript.
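To make the alternative 3'-acceptor logic concrete (one 5' donor, two competing 3' acceptors, as described above for maize P intron 2 and Spm intron 1), the minimal Python sketch below treats splicing as removal of a half-open interval from a pre-mRNA string. The toy sequence, the coordinates, and the helper names `Intron` and `splice` are hypothetical; the sketch checks only the GU...AG intron ends and ignores branchpoints, polyadenylation, and every other factor that governs real acceptor choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intron:
    donor: int     # 0-based index of the intron's first base (the G of GU)
    acceptor: int  # 0-based index just past the intron's last base (after AG)


def splice(pre_mrna: str, introns: list[Intron]) -> str:
    """Remove each intron [donor, acceptor) and join the remaining exons."""
    pieces: list[str] = []
    start = 0
    for intron in sorted(introns, key=lambda i: i.donor):
        if intron.donor < start or intron.acceptor - intron.donor < 4:
            raise ValueError(f"overlapping or too-short intron: {intron}")
        if pre_mrna[intron.donor:intron.donor + 2] != "GU":
            raise ValueError(f"no GU at 5' donor: {intron}")
        if pre_mrna[intron.acceptor - 2:intron.acceptor] != "AG":
            raise ValueError(f"no AG at 3' acceptor: {intron}")
        pieces.append(pre_mrna[start:intron.donor])
        start = intron.acceptor
    pieces.append(pre_mrna[start:])
    return "".join(pieces)


# Hypothetical toy pre-mRNA: one donor, a proximal and a distal acceptor.
exon1 = "AUGGCCAAG"
intron_head = "GUAAGUAUUUUGCAG"  # ends at the proximal acceptor AG
between = "CCCAAUUUAUUUUGCAG"    # ends at the distal acceptor AG
exon2 = "GGGUGA"
pre = exon1 + intron_head + between + exon2

donor = len(exon1)
proximal = donor + len(intron_head)
distal = proximal + len(between)

mrna_proximal = splice(pre, [Intron(donor, proximal)])  # keeps `between`
mrna_distal = splice(pre, [Intron(donor, distal)])      # removes `between`
print(len(mrna_proximal), len(mrna_distal))  # prints 32 15
```

Both products share the same 5' exon and differ only by the segment lying between the two acceptors, loosely paralleling the shared N-terminal region and divergent C-terminal ends of the two P proteins.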
V. Biological Phenomena Associated with Splicing
A. Intron Enhancement
Although some plant genes lack introns (e.g., maize zein genes and most histone genes), most have from one to more than 20 introns scattered throughout the protein-coding exons and the 5' and 3' UTRs. For some plant genes and plant reporter-gene chimeras, including a spliceable intron in the transcription unit increases gene expression at both the mRNA and protein levels; a similar effect has been reported in mammals (101). This phenomenon has been termed “intron enhancement of gene expression.” Several studies show that introns enhance gene expression in both monocots (55, 56, 102, 103) and dicots (54, 104, 105). Quantitatively normal expression of the maize Adh1 gene in stably transformed BMS calli requires one or more introns in the transcription unit (55). Enhancement relative to the cDNA is up to 50-fold, depending on the intron(s) used and their positions in the transcription unit. In transient assays, Adh1 intron 1 enhanced the expression of chimeric constructs carrying the CAT, firefly luciferase, neomycin phosphotransferase, or GUS reporter genes driven by the CaMV 35S and Nos promoters (55, 56). Several other maize introns also enhance expression: Adh1 introns 2 and 6 (106), Shrunken-1 intron 1 (57, 107), the single Bronze-1 intron (55), and actin intron 3 (56) typically enhance expression
KENNETH R. LUEHRSEN ET AL.
from 2- to 100-fold when placed upstream of a reporter gene. Intron enhancement has also been described in dicots: the potato ST-LS intron 2 increases reporter gene expression ~6-fold in bean (105), and pea rbcS introns increase expression ~5-fold in transgenic tobacco (54, 104). Introns can function across species barriers: the maize Adh1 intron 1 enhances expression in other monocots such as rice and wheat (108), and the maize Sh1 intron 1 enhances expression in guineagrass and napiergrass (107). The bean catalase intron 1, however, enhances expression of a linked GUS gene in the monocot rice but not in the dicot tobacco, indicating that the ability of individual introns to enhance expression must be determined empirically (109). Surprisingly, intron enhancement is observed even when an intron is not efficiently spliced. In transient assays with constructs containing Adh1 intron 1 or actin intron 3, substantial amounts of unspliced RNA accumulated (56), yet expression was still enhanced 2- to 10-fold. The ability of an intron to enhance gene expression appears to depend on its placement within the transcription unit. Inserting Adh1 intron 1 or Sh1 intron 1 upstream of the promoter has little effect on expression (55, 57), suggesting that these introns do not function as transcriptional enhancers. Adh1 intron 1 stimulates CAT expression 110-fold when placed in the 5' untranslated region (UTR) but only 5-fold when placed in the 3' UTR (55). Similarly, Adh1 introns 2 and 6 placed in the 5' UTR enhance expression of a CAT-containing chimeric gene 12- to 20-fold, but have no effect when placed in the 3' UTR (106). The molecular mechanisms responsible for enhancement are not well understood. Although some animal introns contain transcriptional enhancers and thus increase initiation at the proximal promoter (110-112), most introns do not stimulate transcription but rather mediate enhancement posttranscriptionally (55, 57, 113).
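Fold enhancement, as reported in the studies above, is the ratio of reporter activity from the intron-containing construct to that of the matching intronless (cDNA) construct. The minimal sketch below assumes that replicate reporter readings are averaged before the ratio is taken; the activity values are invented for illustration and are not data from the cited studies, and `fold_enhancement` is a hypothetical helper name.

```python
from statistics import mean


def fold_enhancement(with_intron: list[float], control: list[float]) -> float:
    """Mean reporter activity of the intron construct over the intronless control."""
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control activity must be positive")
    return mean(with_intron) / control_mean


# Hypothetical reporter activities (arbitrary units), three replicates each.
control = [98.0, 102.0, 100.0]   # intronless cDNA construct
utr5 = [4900.0, 5100.0, 5000.0]  # intron placed in the 5' UTR
utr3 = [390.0, 410.0, 400.0]     # same intron placed in the 3' UTR

for label, values in [("5' UTR", utr5), ("3' UTR", utr3)]:
    print(f"{label}: {fold_enhancement(values, control):.1f}-fold")
```

With these made-up numbers the script prints 50.0-fold for the 5' UTR placement and 4.0-fold for the 3' UTR placement, mirroring in form (not in measured values) the position dependence described above.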
Studies using mammalian cells suggest that entry of the pre-mRNA into the splicing pathway improves the nuclear stability of the transcript, possibly by leading to more efficient capping (114) or polyadenylation (113). Alternatively, the spliceosome might protect the pre-mRNA from nuclease attack, or the transcript could be shunted into a more efficient transport pathway through splicing machinery localized in the nucleus (115). The greater enhancement mediated by plant introns near the 5' end of the transcription unit suggests that early recognition of the intron within the nascent transcript facilitates mRNA accumulation.
B. Effects of Stress on Splicing
Splicing in yeast and animals is inhibited by heat stress (reviewed in 116); under these conditions, unspliced pre-mRNA accumulates. To date, there has been little systematic study of splicing during environmental stress in plants. Curiously, unspliced pre-mRNA is routinely detected in normal plant
tissue (60). Because plants in the temperate zone experience wide diurnal fluctuations in temperature as well as variations in nutrient availability and exposure to toxic molecules, the unspliced transcripts observed may reflect stress conditions. The first studies on the effect of heat stress on plant splicing found splicing to be unaffected by elevated temperature. Splicing of the Gmhsp-26A pre-mRNA is unaffected in soybean seedlings kept at 40°C for 2 hours in a sucrose/potassium phosphate buffer (117). A similar heat-stress protocol applied to petunia leaves showed that splicing of petunia hsp70 pre-mRNA is unaffected at 42°C (118). In maize seedlings incubated for several hours at 42°C, no accumulation of unspliced Bz2 pre-mRNA was observed (60). In contrast, maize seedlings kept for 10 minutes at 45°C accumulated polyubiquitin pre-mRNAs, indicating that the heat stress inhibited splicing (119). This experiment was carried out by submerging the seedlings in 45°C water rather than in a sucrose solution. Because methods of stress induction often vary, exact comparisons of data from different studies are difficult. The finding that some plant heat-shock genes contain introns suggests that the plant splicing machinery might remain functional during heat stress. However, severe heat stress can inhibit splicing of at least two heat-induced, intron-containing transcripts. Maize hsp70 transcripts were isolated from tassel spikelets incubated at 24°C, 40°C, and 45°C; unspliced pre-mRNA accumulated only at 45°C, a potentially lethal temperature (120). The splicing efficiency of Arabidopsis hsp81 pre-mRNA has been determined at 22°C, 30°C, 35°C, and 40°C (121). Although spliced mRNA was present at all temperatures, unprocessed pre-mRNAs were detected only at 40°C. It is not surprising that milder heat stress was sufficient to interfere with intron processing in Arabidopsis, as this plant grows optimally at cooler temperatures than does maize.
Taken together, the above experiments show that heat stress can interfere with splicing, even for intron-containing genes that are induced under the stress condition; however, some splicing continues even at lethal temperatures. It will be interesting to learn whether introns from different genes are differentially spliced under the same heat-stress conditions. Also, the temperature at which a plant or plant tissue experiences heat stress and splicing failure is probably specific to each species, and splicing failure may be observed only after near-lethal exposure. Studies of the impact of heavy metals on splicing indicate that cadmium and, to a lesser extent, copper block splicing. In petunia leaf tissue immersed in 50 μM cadmium chloride, accumulation of both spliced and unspliced hsp70 transcripts has been observed (118). Dosages of 0.5 and 5 mM CdCl2 resulted in increased levels of both spliced and unspliced transcript. When the tissues were treated with a solution of 5 mM CuCl2, spliced and unspliced transcripts also accumulated, but at levels only a fraction of
that seen after cadmium treatment. Both spliced and unspliced products of soybean Gmhsp-26A are found after a %hour 0.2 mM CuSO4 treatment (122). Splicing of a soybean auxin-regulated pre-mRNA is not inhibited following a %hour 1 mM CuCl2 treatment (123). As with heat stress, stress-induction methods varied among the studies, and these differences might explain the conflicting results. Other heavy-metal and chemical stresses have been investigated (silver nitrate, zinc acetate, zinc chloride, sodium arsenate, and sodium chloride), but no splicing interference has been reported for the transcripts examined. High temperatures block intron processing in Drosophila, and this block is absolute (124): splicing failure occurred in all transcripts tested and was relatively complete. Similarly, both heat shock and heavy-metal stress appear to interfere with intron processing in many plants, even though unspliced transcripts have not always been observed. Unlike in Drosophila, however, accumulation of spliced transcripts is never completely blocked in plants, and for stress-inducible transcripts, accumulation of both spliced and unspliced forms increases with the severity of the stress. This pattern is consistent with the expectation that biochemical processes are more flexible in plants than in animals, because temperate plants must acclimate continuously to diverse environmental stresses.
VI. Perspective
As has been observed for many other biochemical processes, plant intron splicing conforms to the general principles established by studies of fungi and animals. However, despite the similarity of mechanisms among eukaryotic kingdoms, the regulation of biochemical processes is often distinct, an outcome of differences in selection pressure. Given the variable environmental conditions experienced by flowering plants, it might be predicted that plant intron splice sites would have been precisely conserved, and even enlarged, so that complementarity to the relevant snRNAs would optimize the probability of correct splice-site selection over a broad temperature range. This is the case in yeasts: the short fungal introns contain these highly conserved motifs and a U-rich domain preceding the 3' border, and together these “required” features constitute up to 50% of the intron sequence. In contrast, higher-plant splice-site consensus regions and the postulated branchpoint region show less conservation than do those of mammalian introns, all of which are spliced at a constant temperature. Given the lower sequence conservation of plant splice sites and the variety of conditions under which plant introns are spliced, how are plant introns accurately defined and processed?
We propose that dispersed internal intron motifs (probably U-rich in monocots and (A+U)-rich in dicots) demarcate a region containing a potential intron. Thus, we propose that the initial step in pre-mRNA splicing in plants is intron recognition. We further propose that the nuclear factor that binds the internal intron motifs masks potential splice sites within the body of the intron and may direct or stabilize the interaction of the relevant snRNPs with splice sites at the proximal and distal edges of the intron. We interpret the pattern of (G+C)-rich exons flanking (A+U)-rich (or U-rich) introns observed for plant transcripts as reflecting selection for internal intron binding motifs within introns and selection against such motifs in exons. As most plant introns are short (80% of maize introns are less than 200 nt), only one or a few internal intron recognition motifs may be required to specify most introns; longer introns are postulated to contain multiple motifs. Two lines of evidence support this conjecture. First, the internal regions of long introns can be deleted without affecting splicing efficiency. Second, these “superfluous” intron middles can create introns when inserted into (G+C)-rich exon sequences, indicating that they contain the information required to specify an intron. We consider it likely that internal intron recognition in plants reflects a specialization of an RNA-binding factor involved in nuclear RNA processing in fungi and animals. A factor that plays a general RNA-binding role in these groups could readily have been selected for a more critical and specific role in flowering plants. Alternatively, mammals and fungi may have lost a splicing factor common to plants and lower eukaryotes with (A+U)-rich introns. Intron recognition in plants appears to be an imperfect process, given that steady-state levels of unspliced transcript in plants are much (10- to 100-fold) higher than in other kingdoms.
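The compositional contrast invoked above, (G+C)-rich exons flanking (A+U)-rich introns, can be scanned with a sliding window. The sketch below is only a compositional heuristic under stated assumptions (a 20-nt window and a 0.7 A+U threshold, both arbitrary); it does not model the proposed intron-binding factor, and `au_rich_regions` is a hypothetical helper rather than an established tool.

```python
def au_fraction(seq: str) -> float:
    """Fraction of A or U (T counted as U) in a non-empty sequence."""
    seq = seq.upper().replace("T", "U")
    return sum(base in "AU" for base in seq) / len(seq)


def au_rich_regions(seq: str, window: int = 20,
                    threshold: float = 0.7) -> list[tuple[int, int]]:
    """Merge every window whose A+U fraction >= threshold into [start, end) spans."""
    regions: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if au_fraction(seq[start:start + window]) >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or touching the previous span: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Hypothetical construct: (G+C)-rich exons around a short (A+U)-rich intron.
exon1 = "GCCG" * 8                  # 32 nt, no A or U
intron = "GU" + "AUUU" * 10 + "AG"  # 44 nt, occupies positions 32-75
exon2 = "CGGC" * 8
pre = exon1 + intron + exon2
print(au_rich_regions(pre))  # prints [(27, 81)]
```

The flagged span brackets the true intron [32, 76) but overshoots it by a few nucleotides on each side, a reminder that composition alone can delimit a candidate region while exact borders still depend on the splice-site signals.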
The apparent relaxation of splice-site sequence conservation in plants may reflect the greater role played by the hypothesized internal intron recognition motifs in intron definition. Such relaxation may, however, alter the kinetics and/or fidelity of splicing, resulting in unspliced or alternatively spliced products that accumulate in plant cells despite their lack of biological function. An equally plausible explanation is that variable conditions preclude efficient splicing of all introns under all conditions, resulting in continuous production of some unspliced mRNAs under any growth regime. Explicit tests of the intron recognition hypothesis and further exploration of the structure of the plant splicing components should provide refined insight into the requirements for intron processing in flowering plants. Studies combining engineered introns with the allelic series derived from transposable-element insertions provide the starting material for in vivo work. Further information, including confirmation of lariat formation, awaits development of an in vitro splicing assay using plant extracts. Of
particular interest will be discoveries that highlight the unique features of plant splicing and that link this fundamental molecular process to the constraints of angiosperm life history.
ACKNOWLEDGMENTS
We thank Chris Ko for comments on this manuscript and all the members of the Walbot laboratory for helpful discussions. Our research on maize intron splicing is supported by grants from the NIH and USDA.
REFERENCES
1. W. Gilbert, Science 228, 823 (1985).
2. W. Gilbert, Nature 271, 501 (1978).
3. T. Cavalier-Smith, Nature 315, 283 (1985).
4. J. R. Manhart and J. D. Palmer, Nature 345, 268 (1990).
5. M. Purugganan and S. Wessler, Genetica 86, 295 (1992).
6. E. Brody and J. Abelson, Science 228, 963 (1985).
7. A. Jacquier, TIBS 15, 351 (1990).
8. P. A. Sharp, Science 254, 663 (1991).
9. G. J. Goodall, T. Kiss and W. Filipowicz, in "Nuclear RNA Splicing and Small Nuclear RNAs and Their Genes in Higher Plants" (B. J. Miflin, ed.), p. 255. Oxford University Press, Oxford, 1991.
10. T. Blumenthal and J. Thomas, Trends Genet. 4, 305 (1988).
11. S. W. Ruby and J. Abelson, Trends Genet. 7, 79 (1991).
12. C. Guthrie, Science 253, 157 (1991).
13. J. W. S. Brown, NARes 14, 9549 (1986).
14. O. White, C. Soderlund, P. Shanmugan and C. Fields, Plant Mol. Biol. 19, 1057 (1992).
15. B. A. Hanley and M. A. Schuler, NARes 16, 7159 (1988).
16. B. Patterson and C. Guthrie, Cell 64, 181 (1991).
16a. K. R. Luehrsen and V. Walbot, Plant Mol. Biol., in press (1994).
16b. H. Lou, A. J. McCullough and M. A. Schuler, MCBiol 13, 4485 (1993).
17. C. Csank, F. M. Taylor and D. W. Martindale, NARes 18, 5133 (1990).
18. M. R. Green, Annu. Rev. Cell Biol. 7, 559 (1991).
19. M. Rosbash and B. Seraphin, TIBS 16, 187 (1991).
20. K. K. Nelson and M. R. Green, Genes Dev. 2, 319 (1988).
21. A. J. McCullough, H. Lou and M. A. Schuler, MCBiol 13, 1323 (1993).
22. C. I. Reich, R. W. VanHoy, G. L. Porter and J. A. Wise, Cell 69, 1159 (1992).
23. Y. Zhuang, A. M. Goldstein and A. M. Weiner, PNAS 86, 2752 (1989).
24. J. Wu and J. Manley, Genes Dev. 3, 1553 (1989).
25. R. A. Parker, P. G. Siciliano and C. Guthrie, Cell 49, 220 (1987).
26. M. A. Garcia-Blanco, S. F. Jamison and P. A. Sharp, Genes Dev. 3, 1874 (1989).
27. A. Newman and C. Norman, Cell 65, 115 (1991).
28. A. J. Newman and C. Norman, Cell 68, 743 (1992).
29. J. A. Steitz, Science 257, 888 (1992).
30. J. Wu and J. L. Manley, Nature 352, 818 (1991).
31. B. Datta and A. M. Weiner, Nature 352, 821 (1991).
32. P. Fabrizio and J. Abelson, Science 250, 404 (1990).
33. H. D. Madhani and C. Guthrie, Cell 71, 803 (1992).
34. T. Tani and Y. Oshima, Genes Dev. 5, 1022 (1991).
35. H. D. Madhani, R. Bordonné and C. Guthrie, Genes Dev. 4, 2264 (1990).
36. J. Andersen and G. W. Zieve, BioEssays 13, 57 (1991).
37. C. Guthrie and B. Patterson, ARGen 22, 387 (1988).
38. H. D. Parry, D. Scherly and I. W. Mattaj, TIBS 14, 15 (1989).
39. S. V. van Santen and R. A. Spritz, PNAS 84, 9094 (1987).
40. D. B. Egeland, A. P. Sturtevant and M. A. Schuler, Plant Cell 1, 633 (1989).
41. B. A. Hanley and M. A. Schuler, NARes 19, 1861 (1991).
42. T. Kiss, M. Antal and F. Solymosy, NARes 15, 543 (1987).
43. P. Vankan and W. Filipowicz, EMBO J. 7, 791 (1988).
44. R. Waugh, G. Clark and J. W. S. Brown, Gene 107, 197 (1991).
45. Y. Q. Hu, J. W. Brown, R. Waugh and P. C. Turner, BBA 1129, 90 (1991).
46. B. A. Hanley and M. A. Schuler, NARes 19, 6319 (1991).
47. R. Luhrmann, B. Kastner and M. Bach, BBA 1087, 265 (1990).
48. R. B. Meagher, M. D. McLean and J. Arnold, Genetics 120, 809 (1988).
49. S. Huang and D. L. Spector, PNAS 89, 305 (1992).
50. D. L. Spector, X.-D. Fu and T. Maniatis, EMBO J. 10, 3467 (1991).
51. Z. Palfi, M. Bach, F. Solymosy and R. Luhrmann, NARes 4, 1445 (1989).
52. C. G. Simpson, P. Vaux, G. Clark, R. Waugh, J. D. Beggs and J. W. S. Brown, NARes 19, 5213 (1991).
53. C. G. Simpson, R. Sinibaldi and J. W. S. Brown, Plant J. 2, 835 (1992).
54. A. J. McCullough, H. Lou and M. A. Schuler, NARes 19, 3001 (1991).
55. J. Callis, M. Fromm and V. Walbot, Genes Dev. 1, 1183 (1987).
56. K. R. Luehrsen and V. Walbot, MGG 225, 81 (1991).
57. C. Maas, J. Laufs, S. Grant, C. Korfhage and W. Werr, Plant Mol. Biol. 16, 199 (1991).
58. G. J. Goodall, K. Wiebauer and W. Filipowicz, in "Methods in Enzymology" (J. E. Dahlberg and J. N. Abelson, eds.), Vol. 181, p. 148. Academic Press, San Diego, 1990.
59. J. Nash, Ph.D. thesis, Stanford University, Stanford, California, 1992.
60. J. Nash and V. Walbot, Plant Physiol. 100, 464 (1992).
61. G. J. Goodall and W. Filipowicz, EMBO J. 10, 2635 (1991).
62. R. Parker and C. Guthrie, Cell 41, 107 (1985).
63. E. Waigmann and A. Barta, NARes 20, 75 (1992).
64. G. J. Goodall and W. Filipowicz, Cell 58, 473 (1989).
65. J. D. Hawkins, NARes 16, 9893 (1988).
66. P. Athma, E. Grotewald and T. Peterson, Genetics 131, 199 (1992).
67. E. Grotewald, P. Athma and T. Peterson, PNAS 88, 4587 (1991).
68. G. J. Goodall and W. Filipowicz, Plant Mol. Biol. 14, 727 (1990).
69. K. Wiebauer, J. J. Herrero and W. Filipowicz, MCBiol 8, 2042 (1988).
70. R. M. Sinibaldi and I. J. Mettler, This Series 42, 229 (1992).
71. S. V. van Santen and R. A. Spritz, Gene 56, 253 (1987).
72. B. Wieringa, E. Hofer and C. Weissmann, Cell 37, 915 (1984).
73. A. G. Hunt, B. D. Mogen, N. M. Chu and N.-H. Chua, Plant Mol. Biol. 16, 375 (1991).
74. K. R. Luehrsen and V. Walbot, NARes 20, 5181 (1992).
75. K. R. Luehrsen and V. Walbot, Plant Cell 2, 1225 (1990).
76. D. F. Ortiz and J. N. Strommer, MCBiol 10, 2090 (1990).
77. C. G. Simpson and J. W. S. Brown, Plant Mol. Biol. 21, 205 (1993).
78. C. F. Weil and S. R. Wessler, Annu. Rev. Plant Physiol. Plant Mol. Biol. 41, 527 (1990).
79. H. Cuypers, S. Dash, P. A. Peterson, H. Saedler and A. Gierl, EMBO J. 7, 2953 (1988).
80. S. R. Wessler, G. Baran and M. Varagona, Science 237, 916 (1987).
81. H.-Y. Kim, J. W. Schiefelbein, V. Raboy, D. B. Furtek and O. E. Nelson, Jr., PNAS 84, 5863 (1987).
82. S. R. Wessler, MCBiol 11, 6192 (1991).
83. W. D. Sutton, W. L. Gerlach, D. Schwartz and W. J. Peacock, Science 223, 1265 (1984).
84. E. S. Dennis, M. M. Sachs, W. Gerlach, L. Beach and W. J. Peacock, NARes 16, 3815 (1988).
85. R. Simon and P. Starlinger, MGG 209, 198 (1987).
86. P. Masson, G. Rutherford, J. A. Banks and N. Fedoroff, Cell 58, 755 (1989).
87. A. Gierl, Z. Schwarz-Sommer and H. Saedler, EMBO J. 4, 579 (1985).
88. Z. Schwarz-Sommer, A. Gierl, H. Cuypers, P. A. Peterson and H. Saedler, EMBO J. 4, 591 (1985).
89. Z. Schwarz-Sommer, A. Gierl, R. B. Klosgen, U. Wienand, P. A. Peterson and H. Saedler, EMBO J. 3, 1021 (1984).
90. E. Tacke, Z. Schwarz-Sommer, P. A. Peterson and H. Saedler, Maydica 31, 83 (1986).
91. Z. Schwarz-Sommer, N. Shepherd, E. Tacke, A. Gierl, W. Rohde, L. Leclercq, M. Mattes, R. Berndtgen, P. A. Peterson and H. Saedler, EMBO J. 6, 287 (1987).
92. J. W. Schiefelbein, V. Raboy, N. V. Fedoroff and O. E. Nelson, PNAS 82, 4783 (1985).
93. V. Raboy, H.-Y. Kim, J. W. Schiefelbein and O. E. Nelson, Genetics 122, 695 (1989).
94. R. J. Okagaki, T. D. Sullivan, J. W. Schiefelbein and O. E. Nelson, Plant Cell 4, 1453 (1992).
95. S. E. Leff, M. G. Rosenfeld and R. M. Evans, ARB 55, 1091 (1986).
96. J. M. Werneke, J. M. Chatfield and W. L. Ogren, Plant Cell 1, 815 (1989).
97. S. J. Rundle and R. E. Zielinski, JBC 266, 4677 (1991).
98. M. E. Salvucci, J. M. Werneke, W. L. Ogren and A. R. Portis, Plant Physiol. 84, 381 (1987).
99. K. H. Knesler and W. Ogren, Plant Physiol. 94, 1837 (1990).
100. A. Pereira and H. Saedler, EMBO J. 8, 1315 (1989).
101. A. R. Buchman and P. Berg, MCBiol 8, 4395 (1988).
102. D. McElroy, W. Zhang, J. Cao and R. Wu, Plant Cell 2, 163 (1990).
103. J. H. Oard, D. Paige and J. Dvorak, Plant Cell Rep. 8, 156 (1989).
104. C. Dean, M. Favreau, D. Bond-Nutter, J. Bedbrook and P. Dunsmuir, Plant Cell 1, 201 (1989).
105. P. Leon, F. Planckaert and V. Walbot, Plant Physiol. 95, 968 (1991).
106. D. Mascarenhas, I. J. Mettler, D. A. Pierce and H. W. Lowe, Plant Mol. Biol. 15, 913 (1990).
107. V. Vasil, M. Clancy, R. J. Ferl, I. K. Vasil and L. C. Hannah, Plant Physiol. 91, 1575 (1989).
108. D. I. Last, R. I. S. Brettell, D. A. Chamberlain, A. M. Chaudhury, P. J. Larkin, E. L. Marsh, W. J. Peacock and E. S. Dennis, Theor. Appl. Genet. 81, 581 (1991).
109. A. Tanaka, S. Mita, S. Ohta, J. Kyozuka, K. Shimamoto and K. Nakamura, NARes 18, 6767 (1991).
110. S. D. Gillies, S. L. Morrison, V. T. Oi and S. Tonegawa, Cell 33, 717 (1983).
111. S. Hayashi, E. Goto, T. S. Okada and H. Kondoh, Genes Dev. 1, 818 (1987).
112. W. Horton, T. Miyashita, K. Kohno, J. R. Hassel and Y. Yamada, PNAS 84, 8864 (1987).
113. M. T. F. Huang and C. M. Gorman, NARes 18, 937 (1990).
114. K. Inoue, M. Ohno, H. Sakamoto and Y. Shimura, Genes Dev. 3, 1472 (1989).
115. D. L. Spector, PNAS 87, 147 (1990).
116. H. J. Yost, R. B. Peterson and S. Lindquist, Trends Genet. 6, 223 (1990).
117. E. Czarnecka, L. Edelman, F. Schöffl and J. L. Key, Plant Mol. Biol. 3, 45 (1984).
118. J. Winter, R. Wright, N. Duck, C. Gasser, R. Fraley and D. Shah, MGG 211, 315 (1988).
119. A. H. Christensen, R. A. Sharrock and P. H. Quail, Plant Mol. Biol. 18, 675 (1992).
120. N. Hopf, N. Plesofsky-Vig and R. Brambl, Plant Mol. Biol. 19, 623 (1992).
121. T. Takahashi, S. Naito and Y. Komeda, Plant Physiol. 99, 383 (1992).
122. E. Czarnecka, R. T. Nagao, J. L. Key and W. B. Gurley, MCBiol 8, 1113 (1988).
123. G. Hagen, N. Uhrhammer and T. J. Guilfoyle, JBC 263, 6442 (1988).
124. H. J. Yost and S. Lindquist, Science 242, 1544 (1988).
125. M. W. Smith, J. Mol. Evol. 27, 45 (1988).
126. P. Katinakis and D. P. S. Verma, PNAS 82, 4157 (1985).
127. A. M. Weiner, Cell 72, 161 (1993).
128. A. Menssen, S. Hohmann, W. Martin, P. S. Schnable, P. A. Peterson, H. Saedler and A. Gierl, EMBO J. 9, 3051 (1990).
129. S. R. Wessler, Maydica 36, 317 (1991).
130. K. Hartmuth and A. Barta, NARes 14, 7513 (1986).
131. B. Kieth and N.-H. Chua, EMBO J. 10, 2419 (1986).
132. J. W. S. Brown, G. Feix and D. Frendewey, EMBO J. 5, 2749 (1986).
133. D. Tagu, C. Cretin, C. Bergounioux, L. Lepiniec and P. Gadal, Plant Cell Rep. 9, 688 (1991).
134. J. Paszkowski, A. Peterhans, R. Bilang and W. Filipowicz, Plant Mol. Biol. 19, 825 (1992).
134a. H. Lou, A. J. McCullough and M. A. Schuler, Plant J. 3, 393 (1993).
135. J.-F. Laliberté, O. Nicolas, S. Durand and R. Morosoli, Plant Mol. Biol. 18, 447 (1992).
136. A. Barta, K. Sommergruber, D. Thompson, K. Hartmuth and M. A. Matzke, Plant Mol. Biol. 6, 347 (1986).
137. J. M. Martinez-Zapater, R. Finkelstein and C. R. Somerville, Plant Mol. Biol. 11, 601 (1988).
138. V. Pautot, R. Brzezinski and M. Tepfer, Gene 77, 133 (1989).
139. M. A. Dietrich, J. P. Prenger and T. J. Guilfoyle, Plant Mol. Biol. 15, 207 (1990).
New Concepts in Protein-DNA Recognition: Sequence-directed DNA Bending and Flexibility¹
RODNEY E. HARRINGTON* AND ILGA WINICOV*†
*Departments of Biochemistry and †Microbiology
University of Nevada, Reno
Reno, Nevada 89557
I. DNA Sequence Dependence in Protein-Nucleic Acid Binding Specificity
A. Sequence-dependent Bending in DNA
B. Sequence-dependent Flexibility or Kinking in DNA
C. Evidence for Sequence-dependent Flexibility in DNA
II. A Short Taxonomy of DNA-bending Proteins and Their Recognition Sequences
A. Prokaryotic Helix-Turn-Helix Proteins
B. Eukaryotic Helix-Turn-Helix
C. Zinc-finger Proteins: The “C2H2” Classes
D. The “C,” Class of Zinc-binding Proteins
E. Leucine-zipper Proteins
F. Minor-groove-binding Proteins: The TFIID Transcription Factor Complex
G. The NF-κB Protein and Its Binding to DNA
H. Other DNA-binding Proteins with Putative F1 Their Recognition Sites . Sequences
III. Models of Sequence-directed Structure-Function Relationships in Selected Regulatory Systems
References
One of the most interesting and fruitful recent developments in molecular biology is the continuing coalescence of genetic control with the older field of DNA molecular biophysics and structure. It has been known for some time that many biological processes at the molecular level, including
¹ A glossary of abbreviations and polynucleotide notation appears on p. 261.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
the regulation of genes, occur concurrently with the binding of regulatory proteins to specific sites on the genomic DNA. These regulatory nucleoprotein complexes are usually characterized by unusually high binding affinity as well as site specificity. Many cases are known in which the same protein regulates different transactional events by binding to different DNA recognition sites, or in which differential regulation results from the competitive binding of two or more proteins to the same site. Recent crystallographic, spectroscopic, and biochemical studies provide some structural rationale for the extraordinary binding specificity of many of these regulatory complexes, and a number of consensus structural “motifs” for the DNA-binding domains of proteins have been described (reviewed in 1-3). However, how the DNA contributes to binding specificity is less apparent. It has long been recognized that the well-known DNA structural “families” may exhibit some sequence dependence, but no instance of a specific DNA family binding with high specificity to a regulatory protein has so far been demonstrated. Rather, the DNA in regulatory nucleoprotein complexes seems invariably to be in the B-form, the softest and most labile of the known structural families. Thus, sequence-regulated, highly localized DNA structures must be implicated in protein-DNA recognition. Such sequence dependence in DNA structure has been recognized for some time and has been related in certain ways to protein binding (reviewed in 4-6). In this review, we attempt to knit together available structural information on the DNA-binding domains of a representative set of regulatory proteins with what is presently known about sequence-directed DNA structures.
Because new transcription and regulatory factors are currently being discovered and identified at such an explosive pace, we make no pretense that this discussion can be a comprehensive and current review of these systems. Rather, we focus on unusual DNA structures with known sequence dependencies, such as bends, and on the relatively new concept of sequence-directed structural softness or flexibility, and we correlate these with protein structural motifs wherever possible. We show that analyses of consensus binding sequences in DNA can provide important clues both for identifying possible roles of localized DNA structures (or microstructures) in protein-DNA interactions and for interpreting these roles in structure-function terms.
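Consensus binding sequences of the kind analyzed in this review are commonly derived by aligning known sites and taking the most frequent base in each column. The minimal sketch below implements that majority rule; the aligned sites are hypothetical, the helper name `consensus` is illustrative, and the `min_fraction` cutoff (below which a column is reported as N) is an assumption rather than a standard value.

```python
from collections import Counter


def consensus(sites: list[str], min_fraction: float = 0.5) -> str:
    """Column-wise majority consensus of equal-length aligned sites.

    A column whose most frequent base occurs in fewer than min_fraction of
    the sites is reported as 'N'.
    """
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("need one or more aligned sites of equal length")
    result = []
    for column in zip(*(s.upper() for s in sites)):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= min_fraction else "N")
    return "".join(result)


# Hypothetical aligned binding sites (not taken from any cited study).
sites = ["GATTAC", "GATAAC", "GCTTAC", "GATTGC"]
print(consensus(sites))                    # prints GATTAC
print(consensus(sites, min_fraction=0.9))  # prints GNTNNC
```

Raising the cutoff exposes the positions at which the sites vary, which is where any sequence-directed structural contribution, rather than strict base identity, would have to be sought.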
I. DNA Sequence Dependence in Protein-Nucleic Acid Binding Specificity
The perception of DNA structure-function relationships by molecular biologists has undergone considerable modification in recent years. The
STRUCTURAL FLEXIBILITY IN DNA-PROTEIN INTERACTIONS
earlier depiction of DNA as a homogeneous, stiff, nearly rodlike macromolecule has had to give way to the notion that many of its important biological functions are based not on its overall global conformation but on highly localized structural features directed by relatively short sequences of bases. A number of factors have played roles in this remarkable paradigm shift, but two of the most important have been the discovery of axial bending in DNA and the observation that regulatory proteins generally bind to specific sequences of bases with extraordinary affinity. An important consequence is that most biological processes are modulated at the molecular level by the interactions of regulatory proteins with themselves or other proteins and with their characteristic operator DNA. A corollary is that the trajectory of the DNA is precisely defined, particularly in large, multisubunit nucleoprotein complexes. Because both of these factors are highly site specific, they can confer a corresponding level of site specificity on the processes they control, which include transcriptional regulation, the action of hormone receptors, and certain types of site-specific recombination, including the precise insertion of viral DNA into host genomes. Thus, our advancing knowledge of biological control processes at the molecular level demands an equivalent understanding of localized, sequence-directed DNA structures. The mechanisms by which certain proteins recognize specific regions of DNA are complex and not fully understood. Nevertheless, the broad classification of the energetic factors underlying binding specificity and affinity into direct and indirect readout components, as originally suggested by Drew and Travers (7), provides a useful basis for conceptualizing the role of localized, sequence-directed DNA structures in specific protein-DNA binding.
Direct readout was first proposed in the early 1980s (8, 9), based on modeling studies of the Cro protein crystal structure fitted to uniform, Watson-Crick DNA. A small number of amino-acid residues, usually about three, aligned along one side of a “recognition” α-helix [or occasionally a β-ribbon (10)] in the DNA-binding domain of the protein, form specific hydrogen bonds with a “recognition matrix” of complementary nitrogen or phosphate sites in the DNA. In most cases, access is through the major groove, because in B-family DNA the major groove is wide enough to accommodate the protein-recognition element and the potential hydrogen-bonding sites are relatively exposed. However, new classes of minor-groove-binding proteins have recently been characterized. Additional contacts of the protein-recognition element with the DNA backbone, usually with phosphates, ensure correct placement of the element within the DNA-recognition matrix. Although direct readout provides a mechanism for high binding specificity, this specificity can be further amplified by cooperative protein-protein interactions and by involving additional protein-recognition elements in the binding process. A special case of the latter is the interaction of dimeric proteins with palindromic DNA-recognition matrices.
RODNEY E. HARRINGTON AND ILGA WINICOV
The structural fit between the protein and DNA binding partners that facilitates these highly specific interactions is termed indirect readout. Until fairly recently, it was thought that this could be understood as a complementary assembly of localized static protein and nucleic acid structures, with the “goodness of fit” between the two determined by the specific structures and hence by their sequences. It has been known for over a decade that B-form DNA exhibits sequence-dependent structural variability (11, 12) that leads also to corresponding variability in conformation or molecular shape (13, 14). When spaced in a sequence at helical periodicity, localized structural dislocations can lead to longer range structural features such as fixed bends. It is likely that sequence-dependent DNA structure, including such coherent additivity effects leading to planar curvature, is an important ingredient of indirect readout. In addition to enhancing protein-DNA interactions by improving the geometrical fit, DNA bending appears to facilitate or modulate looping between regulatory elements acting in cis at a distance (15) and to play a role in the architecture of multisubunit regulatory complexes (16). The first of these may include the interactions of enhancers with promoter regions, a number of protein-mediated intrapromoter associations, and effects of chromatin structure on transcriptional regulation (17). It is becoming increasingly clear, however, that a picture of indirect readout based only on static structures is a serious oversimplification. Evidence is accumulating that there is also a dynamic aspect in which protein-DNA binding is affected by conformational deformations of both the protein and the DNA.
These result from structural adjustments involving localized changes in helical twist angles and in the direction of the helical axis at the dinucleotide level. The structural accommodations between a specific binding protein and its DNA-binding site are determined by a complicated interplay of intermolecular and intramolecular electrostatic, hydrophobic, and van der Waals forces. These interactions are nonspecific in themselves, but are strongly dependent on the conformational features of both the DNA and the correctly folded DNA-binding domain of the protein. Each partner seeks regions of the other that maximize these interactions. The nucleoprotein complex will therefore reflect the greatest possible accommodation between the static structures of the binding partners, and its formation may also entail some level of structural distortion in one or both partners. Virtually all specific nucleoprotein complexes must utilize conformational lability in both the protein and DNA components to improve direct recognition (18). The structural and conformational factors involved in indirect readout are highly sequence dependent because maximum accommodation between the binding partners will occur only at a juxtaposition of critical sequences in both partners. However, there may also be some synergism between direct and indirect readout effects: the nucleoprotein conformation that corresponds to maximum structural accommodation by the binding partners will usually also be the conformation that optimizes the formation of specific hydrogen-bonding contacts. It is in the context of all these effects that the extraordinary sequence specificity of interactions between binding proteins and their relatively short cognate DNA-recognition regions can be understood. It now seems likely that each specific nucleoprotein complex utilizes a characteristic and possibly unique combination of direct and indirect readout mechanisms. Furthermore, an extraordinary level of complexity in nucleoprotein binding, together with a complex interplay of multiple chemical and physical mechanisms, is probably necessary to achieve the required level of site specificity and to reduce errors to a genetically acceptable level in trans-actional events having high site-specificity requirements.
A. Sequence-dependent Bending in DNA
Curvature in B-form DNA regions has been studied for almost a decade and is now believed to be an intrinsic property of certain DNA sequences (reviewed in 16, 19-23). The earliest curvature elements to be identified were the phased A_n tracts found in DNA fragments obtained from kinetoplast minicircles of the African trypanosome. Their anomalously slow migration in polyacrylamide gels was assumed to derive from planar, or nearly planar, axial curvature (22). Subsequent studies on these systems using a variety of experimental methods, including gel mobility retardation (23) and measurement of cyclization probabilities (24-27), variants of methods pioneered by Shore and colleagues (28, 29), clearly established that DNA with helically phased regions containing tracts of A_n with n ≥ 5 was indeed curved. Two models have been proposed to account for this curvature. In the wedge model, each A_n tract is curved because the axial deflections of successive A-A dinucleotides combine coherently to produce a planar curve (30-33). In the junction model, axial deflections arise from the structural discontinuities occurring at the junctions between the A_n tracts, presumed to be in a modified B-form structure (34), and adjacent B-form DNA (35). In both models, the A_n tracts are in nearly perfect helical register, which ensures that the curvatures of individual bending elements add coherently to produce a larger overall bend. Experiments to date do not confirm one model over the other, and it now seems possible that both are simply variants of the same model (21). Regions of putative A_n-curved DNA in important biological systems have recently been identified (reviewed in 5, 6, 36). More recently, evidence has appeared suggesting that sequence-directed curving in DNA may be a more general phenomenon, involving a number of sequence elements in addition to A_n regions (13, 14). A set of first-order predictive rules that provide a semiquantitative description of general sequence-directed fixed bending has recently been proposed (13). Although a number of moderately curved sequences that contain no A_n tracts have been identified (14), curvature from phased A_n elements is still much larger than that from other reported curvature motifs (13, 14). It now seems probable that sequence-dependent DNA structures, including coherent effects leading to curvature, serve to orient or steer the DNA in large, multisubunit nucleoprotein complexes (16). Fixed DNA bending may also be important in looping between regulatory elements acting in cis at a distance (15).
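The coherent addition of helically phased bends can be illustrated with a small-angle vector sum, a common simplification rather than an implementation of either the wedge or the junction model. The sketch below assumes a helical repeat of 10.5 bp per turn and hypothetical, identical 18° bending elements; each local bend is treated as a two-dimensional vector whose direction is set by the element's azimuthal position on the helix, and the overall bend is taken as the length of the vector sum.

```python
import math

HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat (bp per turn)


def net_bend_deg(positions_bp, bend_deg, repeat_bp=HELICAL_REPEAT_BP):
    """Approximate overall bend (degrees) produced by identical local bends.

    Small-angle approximation: each local bend is a 2-D vector of length
    bend_deg pointing along the element's azimuth on the helix,
    2*pi*position/repeat_bp. The overall bend is the length of the vector sum.
    """
    x = sum(bend_deg * math.cos(2 * math.pi * p / repeat_bp) for p in positions_bp)
    y = sum(bend_deg * math.sin(2 * math.pi * p / repeat_bp) for p in positions_bp)
    return math.hypot(x, y)


# Hypothetical example: six bending elements of 18 degrees each.
in_register = [i * 10.5 for i in range(6)]       # spaced one helical turn apart
out_of_register = [i * 15.75 for i in range(6)]  # spaced 1.5 turns apart

print(round(net_bend_deg(in_register, 18.0), 1))      # bends add: about 108
print(round(net_bend_deg(out_of_register, 18.0), 1))  # bends cancel: about 0
```

Because large bends do not add exactly as vectors, and the true helical repeat of A_n-containing DNA differs somewhat from 10.5 bp, this sketch is only a qualitative guide to why helical register matters.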
B. Sequence-dependent Flexibility or Kinking in DNA
Although much more difficult to demonstrate experimentally, sequence-directed DNA flexibility very likely plays a functional role similar to that of static axial bending. The distinction between flexible and fixed (static) bending in DNA is based on the relative deformability of the helical axis trajectory in a direction perpendicular to the axis. All DNA is flexible to some extent, as manifested by its finite persistence-length in solution (37, 38; reviewed in 36). Just as the helical axis changes direction in certain sequence elements (15), the local bending or torsional modulus may also vary significantly among sequence motifs. Steric considerations indicate that flexibility in the double helix will generally occur preferentially by roll toward either the major or the minor groove (39), because this involves less configurational readjustment of the backbone than a flexure due entirely to tilt. Such anisotropic flexibility, if it occurred in a completely directional fashion (as might be the case in the interaction of a DNA sequence element with a protein), would have configurational consequences similar to those of fixed or static bending. Thus, larger flexibility effects might derive from the coherent contributions of multiple flexibility elements located in a helically phased array, just as with fixed bending motifs. Both static bending and flexibility at specific sequence elements can promote the DNA trajectories needed to achieve a tight fit between DNA and proteins in nucleoprotein complexes. Conformational lability from flexible sequence elements may fine-tune this process and may, in addition, offer structural explanations for instances in which more than one protein can bind specifically to the same
DNA sequence. From information presently available, localized DNA flexibility may be a more ubiquitous feature of DNA-protein interactions than previously believed (40). The first explicit distinction between flexibility and fixed bending was made in studies of the electrooptical properties of DNA containing putative static bending elements (41, 42). Bending was characterized in terms of the decay of electrical birefringence, a quantity highly sensitive to the effective length or end-to-end distance of a rodlike macromolecule; bending in a DNA fragment, whether static or due to anisotropic flexibility, results in an effective shortening of the molecule. The distinction between static bending and flexibility was based on the measured persistence-length, and particularly on its electrostatic component as deduced from its experimental dependence on ionic strength. Although persistence-length is a measure of chain stiffness, it is a statistical quantity averaged over the entire fragment, so these experiments could not distinguish isotropic from anisotropic chain flexibility effects. In general, measurements of properties that are averaged over the entire chain are not very useful for this type of distinction; they usually cannot distinguish unambiguously between anisotropic flexibility and static bending. This limitation applies to most traditional polymer characterization methods, and to most enzymatic and biochemical methods as well. Instead, methods are required that can extend the dynamical lifetime of a bent, flexible chain or sequence element. Of these, binding the sequence to a DNA-bending protein is perhaps the least ambiguous, and studies on the free sequence provide suitable controls to differentiate static from flexible bending effects. At present, the most plausible microstructure associated with increased DNA flexibility is the stereochemical kink.
This concept was first proposed (43) in an attempt to explain the energetics of DNA wrapping on the nucleosome, and was amplified somewhat in later studies (44). When a structural dislocation leading to an abrupt and discontinuous change in direction of the helical axis is localized to a single dinucleotide, or to a small set of contiguous nucleotides, the DNA is usually said to kink. Kinks have been proposed to arise from relatively massive structural alterations in the DNA due to pyrimidine dimers or psoralen crosslinks (45), from drug-binding interactions (46), from single-strand bubbles (47), and from protein binding to DNA (3, 31, 40, 48-51; reviewed in 3). Simple stereochemical kinks are evidently possible that primarily involve roll into the major or minor groove (43, 44), avoiding a change in local tilt (31, 40, 50, 51). Such kinks represent a dislocation in the helical axis at which the stacking interactions between two neighboring base pairs are essentially lost (50, 51). It has been calculated that a roll of 15 to 20° between the base planes corresponds roughly to a loss of 50% of
stacking energy (31, 50); this value therefore sets a lower limit to the kink angle. At larger angles, the kink is expected to behave essentially as a free hinge (43, 44). At ordinary temperatures, an appreciable steady-state concentration of kinkable sequence elements may exist in a kinked state due to thermal fluctuations (“DNA breathing”). Schellman (39) has analyzed the effect of such kinking on DNA chain flexibility and has suggested that as many as 2% of all bases could exist in a kinked conformation (44) at any given time in order to account for observed persistence-length values. Using a different analysis, Manning (52) has estimated 1%. Both these values are somewhat less than the upper limit of 5% deduced from proton-exchange rates (53-55). Although the latter may be up to 10^7 too large (56, 57), even the most conservative estimates of DNA breathing rates (reviewed in 57, 58) suggest that, at ordinary temperatures, an appreciable fraction of DNA bases may be energetically in a kinkable state. Because the sequence elements CA and TA are the lowest in stacking energy among dinucleotides (59, 60), the equilibrium concentration of these that are energetically in kinkable states at ordinary temperatures may be greater than the above estimates suggest. Indirect evidence in support of this view is available from NMR investigations (61), from studies of nucleosome positioning (62), and from an analysis of the sequence and molecular-size dependence of gel mobility retardation effects (63). Direct evidence for kinking at (CA)·(TG) dinucleotide elements has been provided by a recent high-resolution cocrystal structure of the complex between the CAP protein of the Escherichia coli lac operon and its operator DNA (3, 48). In this work, two sharp kinks of about 40° each are observed at the two (CA)·(TG) elements arranged symmetrically about the pseudodyad of the specific binding consensus sequence.
Helical phasing is such that this leads to a somewhat out-of-plane overall bend of about 90°. This is in essential agreement with a bending angle of about 100° deduced from studies using gel electrophoresis methods (64, 65), but unlike the gel studies, the cocrystal structure demonstrates clearly that the locus of bending is focused primarily at the (CA)·(TG) kink sites. Lower resolution crystallographic studies on the complex of the λ phage Cro protein with the OR3 operator site also show pronounced DNA bending in the complex (66).
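To see what the kinked fractions quoted above (about 1%, 2%, and an upper limit of 5%) might imply for a single fragment, the sketch below computes the expected number of kinked sites and the probability that at least one site is kinked at a given instant. It assumes, purely for illustration, that every base pair of a hypothetical 147-bp fragment is an independent site with the same kinked fraction; since CA and TA steps are expected to be more kinkable than average, real sites are not equivalent, and the numbers are indicative only.

```python
def kink_stats(n_sites, fraction_kinked):
    """Expected number of kinked sites and probability that at least one is kinked.

    Assumes n_sites independent, equivalent sites, each kinked with
    probability fraction_kinked at any instant (a simplifying assumption).
    """
    expected = n_sites * fraction_kinked
    p_at_least_one = 1.0 - (1.0 - fraction_kinked) ** n_sites
    return expected, p_at_least_one


# Fractions quoted in the text; 147 bp is a hypothetical fragment length.
estimates = {"Manning": 0.01, "Schellman": 0.02, "proton-exchange upper limit": 0.05}
for label, fraction in estimates.items():
    expected, p_any = kink_stats(147, fraction)
    print(f"{label}: {expected:.2f} kinked sites expected, P(at least one) = {p_any:.3f}")
```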
C. Evidence for Sequence-dependent Flexibility in DNA
1. INDIRECT EVIDENCE FROM ANOMALOUS GEL MIGRATION
The first experimental evidence for sequence-dependent flexibility in free DNA was based on the unusual electrophoretic behavior of certain DNA
fragments having fixed axial curvature but no A_n tracts (40). Each of these fragments was constructed by end-to-end ligation of a curved 21-bp “precursor” sequence. Because the number of helical repeats in each of these precursors was almost exactly two, the curvature in each was propagated coherently into planar-bending oligomeric fragments. However, the retardation in mobility through the gel (characterized as RL, the ratio of apparent to true fragment length) was qualitatively different for different precursor sequences. In most cases, the retardation increased monotonically with oligomer size. This would be the expected behavior if the retardation is associated with axial curvature, at least up to oligomer sizes at which the fragment lengths and overall curvatures become very large. In a few cases, however, RL as a function of fragment size goes through a maximum at relatively low oligomeric size (about 100 bp). This “second retardation anomaly” does not appear to be correlated in any obvious manner with the precursor sequence. However, use of a sophisticated plotting algorithm (67) revealed that, at least among the precursor sequences investigated in this work, the second retardation anomaly can be observed only with those precursors whose oligomers exhibit essentially perfect planar curvature and, in addition, contain one or more (CA)·(TG) or (TA)·(TA) dinucleotide elements located precisely in the plane of curvature. It was therefore proposed that these elements are sites of unusual anisotropic flexibility, characterized by unusually small energy barriers to kinking. At ordinary temperatures, thermal breathing in the DNA would ensure that a significant subpopulation of molecules would be energetically in a kinked or, at least, in a “kinkable” state.
If these elements are located in the plane of static curvature, the tensile force of the electric field in a gel electrophoresis experiment could facilitate their kinking in such a direction as to partially straighten out the static curvature in the fragment. According to current gel electrophoretic theory (reviewed in 68, 69), these fragments should therefore experience less “friction” in reptating through the gel, and the magnitude of RL would be correspondingly reduced. Viewed in another way, the gel could be thought to “entrap” transiently kinked species by allowing them to pass into pores too small to accommodate the normally curved fragments, thereby extending their effective lifetimes. This concept of entrapment is fully analogous to that envisioned when a specific binding protein alters the structure of its DNA recognition sequence at particular flexibility loci in order to ensure improved structural accommodation in binding. Although indirect, the anisotropic flexibility hypothesis is a plausible and attractive explanation for the second gel anomaly effects observed by McNamara et al. (40). Other explanations have been advanced that can account for nonmonotonic behavior of RL with fragment size (70), but these
cannot account for the extraordinary consistency in the location of putative flexibility elements with respect to the plane of static curvature in the fragments studied (40). Combined with additional recent evidence, discussed in Section I,C,3,b, a strong circumstantial case can now be made for (CA)·(TG) dinucleotides as sequence-directed flexibility elements of importance in protein-DNA interactions. In this connection, however, it is important to remember that McNamara et al. also identified (TA)·(TA) dinucleotides as putative kink sites in their sequences.
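The quantity RL, and a crude test for the “second retardation anomaly”, can be sketched as follows. The true lengths below are multiples of a 21-bp precursor, but the apparent lengths are invented values chosen only to show one monotonic series and one series with an interior maximum; they are not data from (40). The interior-maximum test ignores experimental error, which a real analysis would have to take into account.

```python
def relative_lengths(apparent_bp, true_bp):
    """RL = apparent length / true length for each oligomer."""
    return [a / t for a, t in zip(apparent_bp, true_bp)]


def has_interior_maximum(values):
    """True if the largest value is neither the first nor the last point."""
    peak = max(range(len(values)), key=values.__getitem__)
    return 0 < peak < len(values) - 1


true_bp = [63, 84, 105, 126, 147]    # 3- to 7-mers of a 21-bp precursor
monotonic = [66, 93, 124, 160, 203]  # hypothetical apparent lengths
peaked = [67, 97, 128, 147, 162]     # hypothetical apparent lengths

for name, apparent in [("monotonic", monotonic), ("peaked", peaked)]:
    rl = relative_lengths(apparent, true_bp)
    print(name, [round(r, 2) for r in rl], has_interior_maximum(rl))
```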
2. CRYSTALLOGRAPHIC EVIDENCE
a. Evidence for Static Kinks in DNA. The earliest direct evidence for stereochemical kinking in DNA was observed in the high-resolution cocrystal structure of the EcoRI endonuclease with its specific binding site GAATTC (49). Crystals were grown in the absence of Mg2+ to preclude cleavage of the DNA. In this structure, the DNA conformation was distorted by the bound protein in two distinctive kinks: a torsional dislocation in the center of the binding region, called a type-I neokink, and two largely axial bending kinks, called type-II neokinks, at the edges of the central binding region. The type-I neokink effectively unwinds the DNA by approximately 25° and leads to a widening of both grooves, but particularly of the major groove, and thus enhances the accessibility of the bases in this region. It occurs at the central (AT)·(AT) base-paired dinucleotides and leads to a relatively small bending dislocation of about 12°. The type-II neokink, on the other hand, is primarily a bending dislocation of about 20° to 40° with a much smaller torsional component. It occurs mainly by roll toward the minor groove at flanking (CG)·(CG) base-paired dinucleotides. These neokinks are similar in concept to the stereochemical kinks described earlier (43, 44, 50), but differ significantly in structural details. A particularly interesting feature of the type-I neokink is that its formation creates an alignment of hydrogen-bonding sites on bases that appears necessary for direct readout in the EcoRI nucleoprotein complex, although no such alignment exists in the uncomplexed recognition DNA.
This is a point of great importance in understanding the extreme subtlety of sequence-directed flexibility effects in protein-DNA recognition: many critical structural features, such as torsional or axial kinking, may be virtual features that exist only as transient aspects of a more complicated overall molecular dynamics in the separate binding partners, and become real only as the binding partners unite in a stable nucleoprotein complex. Stereochemical kinks have been observed also in the cocrystal structure of the CAP protein-operator DNA complex (48). In this system, kinking through about 40° occurs at two (CA)·(TG) dinucleotide elements located
symmetrically about the dyad in the palindromic recognition sequence, so that the overall DNA bend through the complex is about 90°. The cocrystal structure clearly shows that these kinks are essential for both direct and indirect readout in the CAP complex. The alignment of the recognition helix in the helix-turn-helix motif with the complementary bases in the major groove evidently cannot occur in the absence of a structural dislocation of this magnitude. In addition, the bent DNA conformation at the (CA)·(TG) sites allows more distal regions of the DNA to interact with the protein, and it is suggested that sequence-directed effects similar to those proposed to explain the slight but significant sequence dependence of nucleosome placement on DNA (71) can resolve the remaining curvature observed in the CAP complex (48). However, the large DNA bend through this complex has an additional functional role: that of DNA “steering” in the transcriptional complex. Some evidence exists that CAP may activate transcription by facilitating interactions between polymerase and upstream DNA, presumably through a looping mechanism. (This question is reviewed in 3, 72.) The above cocrystal structures provide clear evidence for kinked DNA in the EcoRI and CAP nucleoprotein complexes. From this, it can be inferred that the kinks occur at “weak points” in the DNA, i.e., at sequence elements having a relatively low energy barrier to kinking, and that the protein-DNA interaction free energy is large enough to overcome the conformational free-energy costs, in both the protein and DNA moieties, of forming the optimum nucleoprotein complex. Nevertheless, these cocrystal structures are snapshots of dynamic systems stabilized by crystal packing forces in a particular conformation; the evidence they provide that the observed structural dislocations represent sites of dynamic flexibility is necessarily indirect.
This points up the need for continuing diversity in experimental approaches to study these fundamental and important structure-function relationships.
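How strongly a single localized bend brings the two ends of a fragment together (the effective shortening to which electrical birefringence is sensitive, and part of what lets distal DNA reach the protein in the CAP complex) can be estimated with a deliberately crude model: two straight, rigid arms joined at one kink. The sketch ignores thermal flexibility and any residual curvature; the 90° case corresponds roughly to the overall CAP bend, and the other angles are for comparison only.

```python
import math


def end_to_end_fraction(bend_deg, split=0.5):
    """End-to-end distance, as a fraction of contour length, of a rod with one sharp bend.

    The rod is two rigid arms of relative lengths `split` and `1 - split`.
    bend_deg is the deviation from straight, which equals the angle between
    the two arm direction vectors, so |a*u + b*v|^2 = a^2 + b^2 + 2ab*cos(bend).
    """
    a, b = split, 1.0 - split
    theta = math.radians(bend_deg)
    return math.sqrt(a * a + b * b + 2.0 * a * b * math.cos(theta))


for bend in (0, 45, 90):
    print(bend, round(end_to_end_fraction(bend), 3))
```

For a bend at the midpoint, this reduces to cos(bend/2), so a central 90° bend shortens the end-to-end distance by roughly 29%.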
b. Evidence for True Sequence-directed Flexibility. A more direct depiction of site-directed flexibility in DNA has appeared in the work of Lipanov et al. (73), who crystallized the B-DNA decamer CCAACITTGG in both monoclinic and trigonal space groups. This work provides a dramatic demonstration of the importance and extraordinary specificity of crystal packing forces, because the structures observed in the two space groups are different in several critical respects. Although differences in twist, roll, helical rise, slide, and propeller twist were observed in all dinucleotide steps, large differences occurred only in the pyrimidine-purine elements (CA)·(TG). These differences were primarily in roll, twist, and slide, suggesting that in this sequence, the (CA)·(TG) elements are sites of unusual flexibility. Although the association of these differences with deformability is indirect, this
work must nevertheless be viewed as substantive evidence for sequence-directed dynamic flexibility in DNA. It provides additional confirmation that (CA)·(TG) is a consequential locus of such flexibility in a sequence.
3. FLEXIBILITY IN PROTEIN-BOUND DNA: THE COMPLEX OF CRO PROTEIN WITH THE OR3 BINDING SITE
a. Evidence for DNA Bending in the Cro-OR3 Nucleoprotein Complex. Perhaps the most direct demonstration of (CA)·(TG) as a flexibility site in DNA is provided by gel electrophoresis studies of the cyclization properties of DNA oligonucleotides containing one of the several specific recognition sites of the λ phage Cro protein, complexed with Cro. Lyubchenko et al. (74) described a gel electrophoresis technique that allows a direct determination of bending angle in certain nucleoprotein complexes. The technique is based on the mixed-ligation cyclization method of Ulanovsky et al. (75), and it has been applied to the Cro-OR3 nucleoprotein complex. Although Cro is a relatively small regulator protein, the method seems likely to be applicable also to larger nucleoprotein systems. The DNA bending angle in the Cro-OR3 complex was about 45°, in excellent agreement with the value obtained from a recent low-resolution X-ray cocrystal structure of this complex (66). The method used (74) was relatively straightforward and direct. Complementary single strands 21 nt long and containing the 17-bp OR3 recognition sequence were synthesized. These were designed so that subsequent hybridization produced double-stranded 21-bp precursors with 4-nt single-strand overhangs, which were single end-labeled using T4 polynucleotide kinase and [γ-32P]ATP. With this sequence protocol, OR3 sites were spaced by exactly two helical turns in the higher ligation products. After hybridization, the 21-bp oligomers were reacted with Cro protein in the ligation buffer and then ligated slowly at 0°C for about 12 hours. Following ligation, the protein was removed and the DNA was analyzed by autoradiography in a two-dimensional gel electrophoresis system similar to that described by Ulanovsky et al. (75). Control determinations on the DNA in the absence of Cro protein were also performed; essentially no circles smaller than 300 bp were observed, and the gel mobilities were normal. Thus, the OR3 sequence in the absence of bound protein exhibits no unusual curvature. The two-dimensional gel is shown in Fig. 1, and a scan of the spot distribution corresponding to circles is shown in Fig. 2. The distribution of clearly resolved circle sizes ranges from 147 to 273 bp, with a fairly sharp maximum at 168 bp, or 8 × 21-bp precursor elements. Because the circles are topologically relaxed (75), the bending angle per 21-mer (or per bound OR3 sequence) is immediately obtained as ≈360°/8 = 45° (Fig. 3). Potential problems in the interpretation of the mixed-ligation results are discussed in 69. It is possible that the distribution of circle sizes can be
FIG. 1. Analysis of ligation products of the 21-bp precursor element containing the 17-bp Cro recognition region (underlined), 5′-TATCACCGCAAGGGATAAATA-3′ (with complementary strand 3′-TGGCGTTCCCTATTTATATAG-5′ to produce four base-unpaired ends), using two-dimensional gel electrophoresis. The doublets in each circle group correspond to nicked or open (upper) and covalently closed (lower) circles. From Lyubchenko et al. (74).
FIG. 2. Computerized densitometric scans of closed circles from Fig. 1, plotted against mobility in the first dimension. From Lyubchenko et al. (74).
FIG. 3. Schematic illustration of Cro protein-induced bending of the OR3 operator site of λ phage and the circularization of ligated Cro-bound 21-bp oligomers (see legend to Fig. 1). An octameric circle is shown that corresponds to the most probable circle size obtained experimentally (Figs. 1 and 2). The dark rectangles represent bound Cro monomers. From Lyubchenko et al. (74).
distorted by poor end alignment, which might make the cyclization rate dependent on oligonucleotide length, or by the effects of intermediates in the complex cyclization/ligation process. This appears unlikely for the following reasons. Additional mixed-ligation experiments at higher temperatures, at increased T4 ligase concentrations, in the presence of polyethylene glycol (PEG), and at 21-bp precursor concentrations varied over a small range about the published values showed no effect on the distribution of circle sizes formed, within experimental error (74; also L. S. Shlyakhtenko, unpublished). This indicates that ligation conditions were very nearly optimum under the experimental conditions described (74) and that linear precursors for cyclization into the various sized circles observed were essentially at steady-state concentrations. Under these conditions, the distribution of circle sizes should reflect the distribution of true cyclization efficiencies for the various sizes of species observed. In addition, the agreement between the gel analysis and the cocrystal structure (66) is striking, lending additional credence to the results obtained in this work.
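The arithmetic behind the 45° estimate can be written out explicitly. The sketch below assumes a helical repeat of 10.5 bp and that each relaxed, planar circle closes through equal bends, one per bound 21-bp precursor; it converts the observed circle sizes into numbers of precursors and the most probable circle size into a bend angle per unit.

```python
HELICAL_REPEAT_BP = 10.5  # assumed helical repeat of B-DNA (bp per turn)
PRECURSOR_BP = 21         # length of the ligated precursor element


def turns_per_precursor(length_bp=PRECURSOR_BP, repeat_bp=HELICAL_REPEAT_BP):
    """Number of helical turns spanned by one precursor."""
    return length_bp / repeat_bp


def bend_per_precursor(circle_bp, unit_bp=PRECURSOR_BP):
    """Bend per unit if a planar circle closes through equal bends, one per unit."""
    n_units, remainder = divmod(circle_bp, unit_bp)
    if remainder:
        raise ValueError(f"{circle_bp} bp is not a whole number of {unit_bp}-bp units")
    return 360.0 / n_units


print(turns_per_precursor())                         # 2.0
print([c // PRECURSOR_BP for c in (147, 168, 273)])  # [7, 8, 13]
print(bend_per_precursor(168))                       # 45.0
```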
It is possible that the 4-nt “sticky ends” on the precursor oligonucleotides used in 74 are an important factor in maintaining steady-state kinetic conditions in the present case. Considerable experimental evidence exists for relatively clean ligation of oligonucleotides having overhangs of 4 nt or more (reviewed in 69). This suggests that minor alignment problems in writhe or twist do not seriously perturb the cyclization reaction, and hence should have only a minor effect on cyclization as a function of fragment size. The fact that other studies using mixed-ligation cyclization are in reasonable agreement with independent experimental determinations (75, 76) also suggests that complex kinetics do not seriously limit the results from these determinations under the experimental conditions employed.
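One way to gauge the size of a twist misalignment is to compare the estimated twist of the wild-type 21-bp precursor, 716.6° according to the Kabsch et al. values listed in Fig. 4, with the 720° required for exactly two turns. The sketch assumes those twist estimates are exact and simply accumulates the per-precursor deviation around a circle of n units; it does not model how the deficit would be partitioned between twist and writhe on closure.

```python
def twist_mismatch(twist_per_unit_deg, n_units, whole_turns_per_unit=2):
    """Per-unit and accumulated deviation from an integral number of helical turns."""
    per_unit = twist_per_unit_deg - 360.0 * whole_turns_per_unit
    return per_unit, per_unit * n_units


def helical_repeat_bp(twist_per_unit_deg, unit_bp=21):
    """Base pairs per turn implied by the total twist of one unit."""
    return unit_bp * 360.0 / twist_per_unit_deg


per_unit, total = twist_mismatch(716.6, n_units=8)  # wild type, 168-bp circle
print(f"{per_unit:.1f} deg per precursor, {total:.1f} deg around an 8-mer circle")
print(f"implied helical repeat: {helical_repeat_bp(716.6):.2f} bp per turn")
```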
b. Evidence for Site-specific Flexibility in the Cro-OR3 Nucleoprotein Complex. Although the operator DNA is clearly bent in the Cro-OR3 nucleoprotein complex, the studies just described could only verify the bending and estimate its magnitude (66, 74). Neither the low-resolution X-ray structure of the nucleoprotein cocrystal (66) nor the cyclization study (74) could localize specific bending loci within the 17-bp recognition region, or associate the bending with a particular sequence motif. Candidates for bending loci in the OR3 sequence include the alternating pyrimidine-purine dinucleotide (CA)·(TG), because kinking at this sequence element has been observed by crystallography in the CAP-operator complex (48) and alternating [(CA)·(TG)]_n runs are highly overrepresented in CAP sequences (77). Furthermore, this element has been identified as a possible kinking locus or site of unusual flexibility in DNA (40). Some additional information on this question has been provided by more recent experiments (78). In an extension of the work described above, studies were made of the cyclization properties of a set of single-site mutation and mismatched sequences derived from the 17-bp OR3 wild-type recognition sequence. Mutations to the OR3 sequence were made in accordance with thermodynamic binding constant criteria (79) to ensure that tight binding conditions were maintained in all cases. The set of mutations includes mutations to both the upstream and downstream specific binding regions as well as to the central, nonbinding region of OR3. The specific sequences, along with estimates of helical twist (80), are given in Fig. 4. From the thermodynamic data (79), the binding free energies relative to wild-type OR3 for the mutant sequences shown are about -0.5 kcal/mol for M3 and within ±0.1 kcal/mol for M2 and M1.
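Relative binding free energies of this size correspond to modest changes in binding constant. The sketch below assumes the convention ΔΔG = ΔG(mutant) − ΔG(wild type), with ΔG = −RT ln K, and a temperature of 298 K; neither the sign convention nor the temperature is stated in this section, and under the opposite sign convention the ratios would simply be inverted.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant, kcal/(mol*K)


def affinity_ratio(ddg_kcal_per_mol, temp_k=298.15):
    """K_mutant / K_wild_type for ddG = dG_mutant - dG_wild_type (dG = -RT ln K)."""
    return math.exp(-ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_k))


for label, ddg in [("M3", -0.5), ("M1/M2, +0.1 bound", 0.1), ("M1/M2, -0.1 bound", -0.1)]:
    print(f"{label}: K ratio = {affinity_ratio(ddg):.2f}")
```

Under these assumptions, a change within ±0.1 kcal/mol alters K by less than about 20%, and -0.5 kcal/mol corresponds to a roughly twofold change.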
FIG. 4 (sequences; helical twist summed over each sequence, in degrees)
                                                  TWIST     Δ
Wild type  5'- TATCACCGCAAGGGATAAATA -3'          716.6°    −3.4°
           3'-     TGGCGTTCCCTATTTATATAG -5'
Mutations
M1         5'- TATCACCaCAAGGGATAAATA -3'          715.7°    −4.3°
M2         5'- TATCACCGgAAGGGATAAATA -3'          712.6°    −7.4°
M3         5'- TATCACCGCAAGtGATAAATA -3'          718.1°    −1.9°
M4         5'- TATCtCCGgAAGGGATAAATA -3'          708.3°    −11.4°
Mismatches
COM1       5'- TGTCACCACAAGGGATAAATA -3'
COM2       5'- TATCACCGGAA-TA -3'
COM3       5'- TATCKCGCAAGTGATAAATA -3'
FIG. 4. The complete set of oligonucleotide sequences used in the OR3 mutation studies of Lyubchenko et al. (78). The specific binding regions (66, 79) are indicated by bold/underlined typeface. Mutations are shown in lower-case type. Helical twists are from Kabsch et al. (80). All complementary strands are designed to allow 4-base single-strand overhanging ends; the complementary strand is shown for the wild-type sequence only.
Standard gel-mobility retardation assays on the free DNA for heptamers (147 bp) and decamers (210 bp) showed identical retardation, within experimental error, for both fragment sizes of all species (RL = 1.2 ± 0.1). This indicates that all the free DNA sequences are very similar to one another in
static bending properties. They are also very similar in helical twist (80), and the estimated differences are all much less than the expected thermal fluctuation in this quantity (27, 33). Static bending properties of the wild-type and point-mutated sequences, predicted by computer modeling based on the wedge model (13), show no significant out-of-plane bending effects and are fully consistent in magnitude with the experimental gel-mobility results. Finally, the gel-mobility retardation data showed no temperature dependence; theoretical considerations predict that this can be true only for oligomers with negligible out-of-plane bending (81). These observations are mutually consistent and suggest that the single-base mutations in Fig. 4 do not significantly change the torsional matching of ends. Nevertheless, the cyclization properties of the sequences in Fig. 4 differ significantly. The mismatched sequences COM1-COM3, included as positive controls, all
show substantial cyclization at much smaller fragment lengths than the OR3 wild-type sequence, both in the presence and in the absence of bound Cro. Thermodynamic (82), spectroscopic (83), and electron-microscope (84) studies, as well as gel-retardation (84, 85) and enzymatic (85) investigations, consistently demonstrate that such mismatches destabilize the double helix and may propagate structural disturbances several base pairs from the mismatch site (83), increasing both bending and torsional flexibility. A typical gel for a mutant sequence (M1 of Fig. 4) with bound Cro protein is shown in Fig. 5a and can be compared with the same sequence in the absence of Cro in Fig. 5b. The efficiency of ring formation clearly passes through a maximum at about 180 to 200 bp in the presence of Cro, whereas only small amounts of larger circles are formed in its absence. Furthermore, the circle distribution curves for the wild-type OR3 (Fig. 2) and the M1 mutant (Fig. 5b) are noticeably dissimilar. Differences of this type are also observed for the other mutants in Fig. 4. The only fully self-consistent explanation for these cyclization differences is that they result from variations in anisotropic flexibility among the sequences. Furthermore, these variations must be due to the presence or absence of [(CA)·(TG)]n sequence elements, because these are the only sequence features that differ between the OR3 wild type and the point mutations M1-M3 shown in Fig. 4. The cyclization results indicate that putative flexibility increases in the series (CA)·(TG) < (CAC)·(GTG) < (CACA)·(TGTG). The longest element studied, (CACA)·(TGTG), appears to be significantly more flexible than the shorter elements, suggesting that it might adopt an alternative structure.
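The statement that the point mutations M1-M3 differ only in their [(CA)·(TG)]n content can be checked by scanning the sequences. The Python sketch below (function names are illustrative, not from (78)) reports maximal runs of strictly alternating C/A bases that contain a CpA step, and of alternating T/G bases that contain a TpG step, since a TpG step on the top strand is a CpA step on the complementary strand. Sequences are those of Fig. 4, with lower-case letters marking mutations, and positions are 0-based; the M1 sequence is taken as carrying the central (CACA)·(TGTG) tract described above.

```python
def alternating_runs(seq, alphabet, step):
    """Maximal runs of strictly alternating bases from `alphabet` that contain `step`.

    Returns a list of (0-based start, run) tuples.
    """
    s = seq.upper()
    runs, start = [], None
    for i, base in enumerate(s + "#"):  # '#' is a sentinel that closes the last run
        if base in alphabet and start is not None and base != s[i - 1]:
            continue  # the current alternating run keeps growing
        if start is not None:  # close the run that just ended
            run = s[start:i]
            if step in run:
                runs.append((start, run))
        start = i if base in alphabet else None
    return runs

def ca_tg_elements(seq):
    """(CA)n-type elements, read as CA on the top strand or as TG (CA on the bottom strand)."""
    return sorted(alternating_runs(seq, "CA", "CA") + alternating_runs(seq, "TG", "TG"))

FIG4 = {
    "wild type": "TATCACCGCAAGGGATAAATA",
    "M1": "TATCACCaCAAGGGATAAATA",
    "M2": "TATCACCGgAAGGGATAAATA",
    "M3": "TATCACCGCAAGtGATAAATA",
}

for name, seq in FIG4.items():
    print(name, ca_tg_elements(seq))
```

In this scan, M1 gains a CACA run at positions 6-9, M2 loses the central CA step, and M3 acquires an additional GTG, i.e., (CAC)·(GTG), element, while the CAC at positions 3-5 is common to all four. The scan only locates the elements; by itself it says nothing about their flexibility.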
FIG. 5. Representative two-dimensional polyacrylamide gel cyclization assays (74) (shown for mutant M1, Fig. 4) in the OR3 mutation studies of Lyubchenko et al. (78). Circle formation in (a) the presence and (b) the absence of bound Cro protein. Reprinted with permission from (78). Copyright 1993 American Chemical Society.
A recent crystallographic study of alternating [(CA)·(TG)]n tracts has shown that these elements can adopt an unusual structure at low temperatures in the crystal, which includes intramolecular A·G and T·C hydrogen bonds (86). If the (CACA)·(TGTG) tract in the central, nonbinding region of mutant M1 can assume this or a similar structure, it would explain the unusual propensity of this fragment to cyclize. This suggests that an alternative structure in the (CACA)·(TGTG) sequence element is associated with increased flexibility, which, in turn, might facilitate DNA bending by the Cro protein. Although the cyclization data cannot indicate a directional preference for flexible elements, calculations (33, 50) show a preference for bending into the major groove. These results suggest that the alternating pyrimidine-purine runs [(CA)·(TG)]n that appear in the Cro binding sites are loci of unusual anisotropic flexibility. These sequence elements may play a role in indirect readout, i.e., facilitate sequence-specific binding between the DNA-binding domain of Cro and the DNA-recognition region, by inducing bends in the
latter that are strategically positioned to provide an improved fit between the binding partners. This is consistent with the observation that two (CAC)·(GTG) elements at the OR1 and OR2 sites improve interaction with both Cro and cI repressor (87). In support of these ideas, the triplet (CAC)·(GTG) appears with exceptional frequency in regulatory-protein binding sites and has been proposed as a potential site of alternative DNA structure (61, 88). It appears in a variety of regulatory sequences, as discussed in more detail in Section II, and regions of helically phased (CAC)·(GTG) elements can weakly position nucleosomes (62). Cyclization studies may now offer a reasonable physical explanation for these various observations.
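The earlier statement that the twist differences among the Fig. 4 sequences are small compared with thermal fluctuations can be checked with simple arithmetic. The Python sketch below assumes, purely for illustration, independent base-pair steps with an rms thermal twist fluctuation of 4° per step (a hypothetical round number, not a value taken from (27, 33)), so that the rms fluctuation of the total twist over n steps is σ√n. The total twists are those listed in Fig. 4 for the 21-bp sequences (20 steps).

```python
import math

# Total helical twist (degrees) over each 21-bp sequence, from Fig. 4 (80).
TOTAL_TWIST = {"wild type": 716.6, "M1": 715.7, "M2": 712.6, "M3": 718.1, "M4": 708.3}

SIGMA_PER_STEP = 4.0  # assumed rms thermal twist fluctuation per step (degrees)
N_STEPS = 20          # base-pair steps in a 21-bp sequence

def thermal_rms_twist(sigma_step, n_steps):
    """rms fluctuation of the summed twist for n independent, identical steps."""
    return sigma_step * math.sqrt(n_steps)

rms = thermal_rms_twist(SIGMA_PER_STEP, N_STEPS)
wt = TOTAL_TWIST["wild type"]
for name, tw in TOTAL_TWIST.items():
    diff = tw - wt
    print(f"{name:10s} diff = {diff:+5.1f} deg ({abs(diff) / rms:.2f} x thermal rms)")
```

With this assumed σ, even the largest difference (M4, 8.3° below wild type) is under half of the expected rms thermal fluctuation; a smaller assumed σ would narrow that margin.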
II. A Short Taxonomy of DNA-bending Proteins and Their Recognition Sequences
As we have noted, the primary motif in specific protein-DNA interactions is the binding of an α-helical or β-ribbon region of the protein to a special sequence-dependent DNA structure in which critical hydrogen-bonding and other attractive interactions can occur. Conformational changes in both protein and DNA on complexation occur primarily to maximize these highly specific interactions, to enable additional sources of nonspecific binding energy, such as electrostatic and hydrophobic free energies, and to improve intermolecular interactions among proteins. Conformational changes in the protein and sequence-dependent bending in the DNA, either static or induced by the protein, probably allow correct positioning of the binding partners so that the binding affinity is either maximized or held within the specific limits required by the biological process under regulation. These changes may involve adjustments in major- or minor-groove width to accommodate the recognition element of the protein. They may be sequence-directed, as, for example, in the tendency of poly(A) runs to have narrower minor grooves (34, 89, 90) and of (G + C)-rich regions to have compressed major grooves (91, 92; reviewed in 5, 6, 36). Another role for DNA bending may be to allow DNA to follow an optimum pathway in multiprotein complexes. Indeed, the only function of some specific binding proteins in oligomeric complexes may be to bend DNA (93). Finally, DNA bending may affect the kinetics of protein binding, i.e., the on and off rates, and in this way influence the competition of a protein for multiple binding sites. We are just beginning to appreciate the wide diversity of nucleoprotein complexes that exist in nature, although our ability to characterize them has so far been extremely limited.
A. Prokaryotic Helix-Turn-Helix Proteins
The helix-turn-helix DNA-binding domain motif was first discovered in the Cro protein of λ phage (94) and has subsequently been identified in a variety of other prokaryotic regulatory proteins (reviewed in 95), including the Cro (96) and repressor (97) proteins from phage 434, the Cro and repressor proteins from phage λ, and the CAP and trp (98) repressor proteins of E. coli. Most of these have been cocrystallized with DNA fragments containing specific recognition sequences, and relatively detailed structural information from X-ray crystallography is now available for these complexes. These include the Cro (66) and cI repressor (99) from λ phage, the Cro (100) and 434 repressor (also denoted R1-69) (101) from phage 434, and the CAP protein (48) and trp repressor (102) of E. coli. The crystallographic results show a general pattern of protein-induced bending in the operator DNA, and although many intriguing findings have been reported, it is not yet possible to identify a consistent set of structure-function relationships.
1. THE REPRESSOR AND CRO PROTEINS FROM PHAGE 434
The 434 repressor protein has been cocrystallized with a 20-bp fragment containing the full OR1 site (TATACAAGAAAGTTTGTACT). The 434 Cro protein has been cocrystallized with two different DNA fragments: a 14-bp oligomer with consensus homology to the ORn and OLn (n = 1, 2, 3) binding sites (ACAATATATATTGT) (103), and the same 20-mer shown above that was used for the repressor (100). In the Cro system, the conformations of the DNA in the two complexes are significantly different. In the complex with the 14-mer, the DNA is straight, uniformly overwound, and in the canonical B-form, with no unusual variations in the width of either the major or the minor groove. In the 20-mer, the central 14 bp are structurally similar to the DNA in the smaller complex, but the ends of the DNA are sharply bent, as is also observed in the 434 repressor complex with this same sequence (101). A close comparison of the DNA conformations obtained for Cro and repressor shows that the principal loci of bending are the symmetrically located CA and TG elements, which are separated by 12 bp, a little over a single helical repeat. The DNA conformations in the two systems are very similar, roughly in the shape of a laterally elongated, shallow U with somewhat nonparallel arms (100). It is possible, therefore, that these base stacks play an indirect readout role in both the 434 Cro and repressor systems. It should be noted that the helical twists calculated for these elements from the crystallographic data do not show the anomalously large values reported for a number of other helix-turn-helix nucleoprotein
complexes (see also the discussion of eukaryotic helix-turn-helix proteins in Section II,B). However, the investigators note that the helical rise and twist parameters are highly sensitive to local variations in the structure; hence the values reported are approximate.
2. THE REPRESSOR AND CRO PROTEINS FROM PHAGE λ
The repressor protein from phage λ has been cocrystallized with a 20-bp fragment containing the 17-bp OL1 recognition site, TATATCACCGCCAGTGGTAT. The structure has been determined at two levels of resolution, 2.5 Å in an early study (104) and 1.8 Å in a more recent determination (105). This recognition sequence has almost twofold symmetry about the central G. The left half is called the “consensus” half because it matches the consensus sequence determined for the 12 operator half-sites (106); the right half, the “nonconsensus” half, differs at positions 13 and 17. Binding of the λ repressor protein also differs markedly between the two halves (105). The entire recognition sequence also differs in several important respects from the sequence used in the 434 repressor study. In the λ repressor site, the central region of OL1 is (G + C)-rich, whereas the corresponding region of the ORn sites of the 434 repressor is (A + T)-rich. This suggests possible microstructural differences in this region between the two sites. We have noted, for example, that statistical studies of DNA conformation in nucleosomes show that (A + T)-rich sequences tend to occur in positions where the minor groove is compressed, whereas (G + C)-rich sequences are more likely to occur where the major groove is compressed. This may indicate lability in major-groove width, because the major groove in the λ repressor complex is slightly opened near the center, where it is contacted by the N-terminal arm of the protein (105). The 434 repressor site also contains (CA)·(TG) and (TG)·(CA) elements separated by 11 bp, and the λ repressor site contains (CA)·(TG) and (TG)·(CA) elements separated by 7 and 10 bp.
The first and last of these appear as the triplets (CAC)·(GTG) and (GTG)·(CAC). As noted in Section I,C,3,b, recent studies of DNA cyclization properties indicate that flexibility increases in the series (CA)·(TG) < (CAC)·(GTG) < (CACA)·(TGTG) (78). The DNA conformation determined in the high-resolution structure of the λ repressor complex (105) is similar at the ends of the sequence to that observed in the 434 repressor and includes bends at the terminal (CAC)·(GTG) and (GTG)·(CAC) elements. As in the 434 repressor, specific contacts between the protein and DNA occur in these elements, and they exhibit no unusual helical parameters. Specific contacts are shown in Figs. 6a and 6b.
FIG. 6. Summary of the differential contacts between the λ repressor and (a) the consensus half and (b) the nonconsensus half of the OL1 binding site. The DNA is shown in a cylindrical projection representation. The backbone phosphate groups are shown as circles, with filled circles indicating those contacted by the protein. From Beamer and Pabo (99).
On the other hand, the helical twist in the central (CA)·(TG) stack at position 12 is anomalously large: 49.2° compared with an average of 34.1° over the full sequence. There are no specific contacts made with the protein at this site (Fig. 6b), although
contacts are made with adjacent base pairs. A significant structural abnormality is indicated in the DNA at this position, because the N-7 of the G at position 10 in the complementary strand becomes hyperreactive to chemical methylation on repressor binding (105). A larger than average twist (40.9°) is also observed at the GC step at position 10. The (G + C) base stack at position 10 is also the site of specific interactions with the N-terminal arm of the protein; the five interactions between Lys-3 and Lys-4 and various bases in this region are shown in Fig. 6a. The N-terminal arm is an important feature in both direct readout and binding affinity, because mutations in the first six N-terminal residues reduce repressor function and the specific binding constant (107). However, the arm is structurally defined only in the repressor subunit bound to the consensus (5') half of the recognition site, even at −15 °C, where thermal motions are considerably reduced (105); in the subunit bound to the nonconsensus half, the arm appears amorphous and structureless (105, 107) and makes no contacts with bases in this region (Fig. 6b). This underscores the remarkable subtlety of the interrelationship between direct and indirect readout. The N-terminal arm is a structural feature of the λ repressor that evidently develops only as the protein and recognition site come together to form a unified whole, and the sequence-directed extended structure of the six N-terminal residues is an important component of both direct and indirect readout. Furthermore, the sequence-dependent structures of both the N-terminal region of the protein and the DNA-recognition site lead to differential binding in the two halves of the dimeric λ repressor complex, which may have important functional consequences in the transcriptional biology of λ phage (105, 106).
It appears from the above observations that indirect readout plays a significant role in the binding specificities of both the 434 and λ repressors, although subtle differences exist between the two complexes. There can be little doubt that sequence-distinguished sites of anisotropic flexibility, located at critical positions in the DNA-recognition sequence, provide a basis for enhanced binding specificity through indirect readout mechanisms. These sites probably also allow local structural accommodations between the protein and DNA binding partners that facilitate direct readout in these systems. The differential binding between the two half-sites in the DNA-recognition region may also fine-tune repressor binding with respect to Cro in the lytic-lysogenic switches of the respective viruses.
The structure of the Cro protein from λ phage bound to the 17-bp OR3 site, TATCACCGCGGGTGATA, has been determined in a cocrystal structure at 3.9 Å resolution (66). The DNA in these crystals does not stack end-to-end as in most other nucleoprotein crystals, and to date it has been difficult to improve this resolution significantly. At present, these observations are inadequate for a direct structural comparison with the λ repressor; the locus of bending cannot be determined precisely from this study, although the bending angle was estimated as about 44°. As noted previously, the bending angle has also been determined as about 40° to 45° using a two-dimensional polyacrylamide-gel assay for ring closure, or cyclization (74). This is in general agreement with the crystallographic observation. The additional cyclization studies in which Cro was complexed with several mutated OR3 binding sites also suggest that the single (CA)·(TG) element in this site may be primarily responsible for the observed bending (78).
The Cro protein of λ phage binds as a 14.7-kDa dimer, specifically and noncooperatively, to several binding sites in the λ phage genome. It competes with the λ repressor for the 17-bp OR1, OR2, and OR3 sites to effect the switch between the lytic and lysogenic modes in this virus. The differences in binding between Cro and λ repressor are not well understood from a structural standpoint. The λ repressor contains distinct DNA-binding (N-terminal) and protein-binding (C-terminal) domains and binds cooperatively to these same recognition sites and to three additional OLn sites. Its binding affinity for operator sites is somewhat greater than that of Cro (106). (The molecular genetic mechanisms are reviewed in 108-110.) The reasons for the apparent differences in binding affinity and geometry between Cro and λ repressor remain unclear. What is clear is that the two proteins form an extraordinarily finely tuned competitive binding system, and in view of the many similarities in their binding domains, the differences must be due to subtle effects such as microstructural relationships between the proteins and the DNA-recognition sites, and to sequence-dependent anisotropic flexibility in the DNA.
It is also possible that kinetic differences in binding, arising from DNA bending by Cro and from cooperative interactions of the C-terminal domains of the λ repressor, play a role in the binding competition of the lytic-lysogenic switch mechanism.
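The relation between the bending angle and the fragment length at which ring formation peaks can be illustrated with a deliberately crude geometric model. If each Cro-bound repeat contributes a bend θ and successive bends are in helical phase (a 21-bp repeat, as in the Fig. 4 constructs, is close to two helical turns), a planar ring closes when the summed bends reach 360°, i.e., after about 360°/θ repeats. The Python sketch below is an assumption-laden illustration, not the analysis used in (74): it ignores thermal bending and twisting, torsional matching of the ends, and any intrinsic curvature.

```python
REPEAT_BP = 21  # assumed length (bp) of one ligated, Cro-bound repeat

def closure_estimate(bend_deg, repeat_bp=REPEAT_BP):
    """Repeats and ring size (bp) at which in-phase bends sum to 360 degrees.

    Crude model: ignores thermal bending and twisting, torsional matching
    of the ends, and any intrinsic curvature of the free DNA.
    """
    repeats = 360.0 / bend_deg
    return repeats, repeats * repeat_bp

for theta in (40.0, 44.0, 45.0):
    n, bp = closure_estimate(theta)
    print(f"{theta:.0f} deg per repeat -> {n:.1f} repeats, about {bp:.0f} bp")
```

For θ = 40° to 45° this gives about 168 to 189 bp, in the neighborhood of the 180 to 200 bp maximum reported above for Cro-bound M1; the agreement should be read as a plausibility check only.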
3. THE CATABOLITE GENE ACTIVATOR PROTEIN
The catabolite gene activator protein (CAP; also called the cyclic AMP receptor protein, CRP) from E. coli functions primarily as an activator of transcription, although it can also act as a repressor (reviewed in 72, 111). When carbon sources are restricted, several operons, including lac and gal, are induced so that alternative sugars can be catabolized by the encoded enzymes. When CAP binds its allosteric effector, cAMP, it activates transcription at about 20 promoters in E. coli (111), at sites located from 41 to 103 bp upstream of the start site. The complex binds as a dimer to a 16-bp consensus sequence, TGTGANNNNNNTCACA. The strongest binding is to the lacP1 promoter, which has the sequence TGTGAGTTAGCTCACT, but the
characteristic alternating purine-pyrimidine motif, (CAC)·(GTG), appears to be a common feature of most CAP binding sites (77). Strong binding interactions occur over 28 to 30 bp, a region almost twice the size of the consensus sequence (112). This appears to require substantial bending of the DNA to maintain contact between the DNA and protein. Although earlier studies implicated sequences outside the consensus region in the bending (92), the recent crystal structure of CAP complexed to a 30-bp sequence that includes the consensus sequence demonstrates clearly that the bending occurs almost entirely in two kinks at the symmetrically located (CA)·(TG) dinucleotide elements (3, 48). These (CA)·(TG) elements are spaced at a helical-repeat distance in the CAP operator site (48), and each kinks through about 45°, so that the helical trajectory of the operator site bends by about 90° overall. Similar bending angles for CAP have been determined in solution using cyclic permutation gel-mobility retardation studies (64, 65). Residual bending seems to be associated with the phased alternation of (A + T)- and (G + C)-rich regions, which have been identified in nucleosomes (71). In addition to the helix-turn-helix contacts, recognition is modulated by 13 additional amino-acid side-chains interacting with 11 phosphates that span a region 28 bp in size. Thus, the binding interaction has a large nonspecific component, but the specific part of binding appears to be due largely to the unusual DNA conformation. The bending also seems to have functional significance, because sequences with fixed bends can activate the promoter both in vivo (113) and in vitro (114). It is possible that the functional role of CAP in promoter activation, and possibly its only role, is to bend the promoter DNA to allow formation of the transcription complex (93), in which case it may be only one of a much larger class of DNA-steering proteins.
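The comparison between a natural CAP site and the 16-bp consensus can be made explicit with IUPAC-style matching, in which N accepts any base. The Python sketch below (helper names are illustrative) checks that the consensus TGTGANNNNNNTCACA is its own reverse complement, as expected for a site bound by a symmetric dimer, and lists the positions (0-based) at which the lacP1 site quoted above departs from it in each orientation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}  # only the codes used here

CAP_CONSENSUS = "TGTGANNNNNNTCACA"
LAC_P1 = "TGTGAGTTAGCTCACT"

def reverse_complement(seq):
    """Reverse complement of a DNA string written with A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(site, consensus):
    """0-based positions where `site` falls outside the base class allowed by `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return [i for i, (b, c) in enumerate(zip(site, consensus)) if b not in IUPAC[c]]

print(reverse_complement(CAP_CONSENSUS) == CAP_CONSENSUS)
print(mismatches(LAC_P1, CAP_CONSENSUS))
print(mismatches(reverse_complement(LAC_P1), CAP_CONSENSUS))
```

In either orientation lacP1 differs from the consensus at a single outer position. Sequence matching of this kind does not capture the (CA)·(TG) kinking that the crystal structure identifies as central to specific binding.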
4. THE trp REPRESSOR PROTEIN: RECOGNITION EXCLUSIVELY BY INDIRECT READOUT
The tryptophan (trp) operon is a contiguous string of five genes in E. coli that code for several biosynthetic enzymes, in addition to its own control elements. It is regulated by the trp repressor, a tetramer of about 12,500-Da subunits, which requires L-tryptophan as a corepressor, providing an autogenous control system highly sensitive to the local Trp concentration. The cocrystal structure of the trp repressor complexed with a 19-bp oligonucleotide of sequence TGTACTAGTTAACTAGTAC, which simulates the actual operator sequence, has been determined to 2.4 Å resolution (102). This structure discloses a number of interesting differences between the trp repressor and other prokaryotic helix-turn-helix regulatory proteins, including the complete absence of direct readout mediated by specific contacts in
its binding to its DNA-recognition site. Instead of direct contacts between amino-acid residues and bases in the recognition site, a number of hydrogen-bonded interactions are observed between amino-acid residues and phosphate groups of the DNA backbone. Thus, sequence specificity is evidently determined entirely by structural accommodations between the protein-binding domain and the DNA operator region. Other studies on the protein suggest that the recognition helix in the helix-turn-helix binding domain is unusually flexible, in that binding of L-tryptophan to the aporepressor does not lead to a unique binding-domain structure (115). However, it does produce an orientation of the helix-turn-helix domain that facilitates interaction of the recognition helix with the major groove of the recognition site (116). Similarly, no unique structure was found for the N-terminal residues, although it was recognized that this might be either an artifact of the crystal environment (102) or a consequence of N-terminal arm involvement in the protein-protein interactions of dimer formation (117). The recognition-site DNA in the complex shows two shallow bends at the (TA)·(TA) base stacks 6 and 14. These bends occur in different planes, and because they are separated by only nine bases, appreciably less than a full helical repeat, this suggests that the (TA)·(TA) elements undergo anisotropic flexure. These sequence elements are centered in the five-base tracts (ACTAG)·(CTAGT) at positions 4 and 12, which exhibit the largest deviations in slide and roll angles from average B-DNA values, although their helical twist values are not extraordinary. The (ACTAG)·(CTAGT) tracts are also the region most sensitive to mutations in this operator site (118). The (TA)·(TA) elements at positions 3 and 17 have high helical twist values of 42.8°, but are well within the canonical B-form DNA range in their other helical parameters.
The central (TA)·(TA) element at position 10 shows a slightly abnormal roll angle of 8.8°. The only direct contacts between the trp repressor protein and its DNA-recognition site are three water-mediated interactions between specific residues and the A, G, and T at positions 15, 16, and 17, respectively. Although such water-mediated interactions might contribute a small component of direct readout to the binding specificity, it is doubtful that they could account for more than a small part of either the observed affinity or the binding specificity of the trp repressor for its operator. These are evidently determined exclusively, or almost exclusively, by indirect readout mechanisms. It is therefore of interest to examine more fully those sequence elements in the DNA that deviate most significantly from canonical B-form DNA. Gel electrophoresis studies suggest that (CA)·(TG) and (TA)·(TA) dinucleotide elements are unusually susceptible to kinking by rolling into the major or minor groove, i.e., presumably they can deform at lower energies than can other dinucleotide stacks (40). The evidence above suggests that these elements may be the principal components of the large and nearly exclusive indirect readout mechanism that determines the binding specificity of the trp repressor system. By allowing the DNA to assume a very precisely defined microstructure, these (and possibly other) sequence elements may allow the DNA topography to conform very closely to that of the protein in the presence of bound corepressor at a relatively low free-energy cost. The free-energy difference between specific and nonspecific binding for the trp repressor has been estimated as ≈6 kcal/mol (119). The formation of 24 hydrogen bonds between the protein and the DNA phosphates (102), together with improved electrostatic interactions and entropic contributions from enlarging the water-excluded contact surface between the two binding partners, can all contribute toward meeting this cost. However, the requirement of high sequence specificity also demands that the DNA be deformable in a very precise way at a relatively low energy cost. This could certainly be achieved in a sequence that contains flexible elements at just the correct positions.
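The positions cited for the trp operator fragment can be read directly from its sequence, and the relation between the two bends can be estimated from their spacing. The Python sketch below lists the TpA steps and ACTAG tracts in TGTACTAGTTAACTAGTAC using the 1-based numbering of the text, then converts the 8-step offset between the bends at steps 6 and 14 (nine bases counted inclusively) into a helical phase angle. The average helical repeat of 10.5 bp per turn is an assumption, not a value from (102), and the local twist in the complex will differ from it.

```python
BP_PER_TURN = 10.5  # assumed average helical repeat of B-DNA (bp per turn)

TRP_OPERATOR = "TGTACTAGTTAACTAGTAC"  # 19-bp fragment used in the cocrystal (102)

def step_positions(seq, motif):
    """1-based start positions of every (possibly overlapping) occurrence of `motif`."""
    return [i + 1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

def phase_angle(offset_bp, bp_per_turn=BP_PER_TURN):
    """Azimuthal angle (degrees, in [0, 360)) between two steps `offset_bp` apart."""
    return (360.0 * offset_bp / bp_per_turn) % 360.0

print("TpA steps:", step_positions(TRP_OPERATOR, "TA"))
print("ACTAG tracts:", step_positions(TRP_OPERATOR, "ACTAG"))
angle = phase_angle(14 - 6)
print(f"bends at steps 6 and 14: {angle:.0f} deg apart, {360 - angle:.0f} deg from in-phase")
```

Under this assumption the two bends are roughly a quarter turn out of register, so they cannot lie in a common plane, which is qualitatively consistent with the crystallographic observation.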
B. Eukaryotic Helix-Turn-Helix Proteins: The Homeodomain
The homeodomain is a 60-amino-acid region that comprises the DNA-binding domain of a large group of eukaryotic transcription factors (120, 121). It was first found in a group of Drosophila regulatory proteins involved in homeosis, a transformation in which the development of one part of a system mimics that of another, but evidence is now available that the homeodomain occurs in a broad family of eukaryotic regulatory proteins that can be grouped into a number of subfamilies (reviewed in 120). The amino-acid sequences are highly conserved, and the modes of DNA binding are evidently comparable, with many similarities to the basic prokaryotic helix-turn-helix motif (reviewed in 122). Current evidence suggests that conservation is significantly higher among protein homeodomains than among the corresponding DNA homeobox sequences (which typically are found in the coding regions of the genes). In addition, there is usually little amino-acid sequence similarity outside the homeodomain regions, although there are a number of important exceptions. Thus, evolutionary pressures are generally presumed to act more strongly at the protein level in these systems (120). At present, relatively high-resolution structural data are available for three different homeodomain peptides complexed to DNA-recognition sequences. The structure of a mutant Antennapedia (Antp) homeodomain from Drosophila bound to a 14-bp oligomer has been studied by NMR (123). Cocrystal structures have been completed for the engrailed protein from Drosophila (124) and for the α2 repressor protein from yeast (122), both complexed to DNA oligomers with consensus recognition sequences that bind the proteins tightly. These proteins are genetically distinct but quite similar to one another in the gross structure of their binding regions, all of which fall into a broadly defined helix-turn-helix motif, although the sequence similarity among these homeodomains is <30%. The recognition DNA is exclusively B-form in all these systems, and there is no evidence for bending or for significant sequence-directed structural dislocations in the specific systems investigated. Binding specificity derives in all cases from a very limited number of specific contacts between recognition regions of the protein and specific bases in the major or minor grooves of the DNA. However, in all these structural studies, both the protein and the recognition DNA moieties are severely truncated, and it is possible that complexes between the entire proteins and longer regions of DNA use flexibility in both the proteins and the DNA to increase binding specificity through indirect readout.
1. NMR STRUCTURE OF AN Antp HOMEODOMAIN PEPTIDE
A 68-residue Antp homeodomain peptide, consisting of residues 297 to 363 of the Antp protein plus an N-terminal methionine associated with the preparative method, was investigated by NMR (125). It also contained a C-to-S mutation at residue 39 that prevents dimerization. This peptide was bound to a 14-bp DNA fragment corresponding to the BS2 binding site, identified as a specific recognition site, CTCTAATGGCTTTC (complementary strand 5'-GAAAGCCATTAGAG-3') (125). The mutant and wild-type proteins bind to this DNA sequence with similar dissociation constants of approximately 10^-9 M (126). An essentially complete set of sequence-specific resonance assignments was obtained; comparison of these with independent assignments for the peptide and DNA moieties alone suggested that the conformational changes in both binding partners associated with complex formation were relatively small. The structure was determined from distance constraints, guided by molecular modeling of the free protein associating with the free DNA. Nine intermolecular NOEs were obtained, indicating specific contacts between the peptide and the DNA; the most important of these are shown in Fig. 7. These contacts were all in the major groove, with the single exception of R(5) to G(12) in the complementary strand, which was a minor-groove contact, and the majority of those from helix I or the loop involve sugar protons of the DNA.
FIG. 7. Summary of available data on interactions in the complex between the Antp homeodomain and the 14-bp binding fragment used in the NMR structural studies of Otting et al. (123). The sequence of the fragment is given in the center; α and β refer to the two complementary strands. The amino-acid sequence of the protein is arranged in a clockwise fashion around the DNA and is numbered from the C-terminal end; residues at either terminus are omitted for simplicity. The secondary structure of the protein in free solution and in the complex is noted beside the protein sequence. Bold letters indicate those amino-acid residues for which large chemical-shift changes were noted on complexation. Squares indicate those residues with slow amide proton exchange in the complex, and a question mark indicates those for which no measurement of exchange was possible. Arrows show the specific contacts, as evidenced by NOEs between the DNA and the protein. Reproduced from Otting et al. (123) by permission of Oxford University Press.
It is clear from Fig. 7 that four of these are contacts between the putative recognition
helix 111and the AA element in TAAT, and with helices 111 and IV and the C in (CA).(TG) of the complementary strand. In the free Antp homeodomain, helix IV is an extension of helix I11 and the two are linked by a kink of about 30"(127); this extended helical structure evidently persists in the nucleoprotein complex. It is possible that flexibility in both the DNA at the (CA)*(TG)element and in the protein at the helix 111-IV junction may permit additional contacts to occur in the complex. Overall resolution is insufficient to address this possibility directly, but it is supported by the presence of chemical shift differences for residues between positions 55 and 60 in helix IV. On the other hand, amide proton exchange rates suggest that helix I11 is stabilized in the complex but helix IV is similar in the complex and in the free peptide. Whether this latter observation is an artifact of the relatively short DNA sequence used in this work is not known. However, the observed high specificity of binding would be favored by additional specific contacts as well as by DNA and protein structural changes that strengthen indirect readout.
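The sequence bookkeeping in this paragraph can be checked mechanically. The following minimal Python sketch (the helper name `reverse_complement` is illustrative, not taken from the original study) confirms that the complementary strand of the 14-bp BS2 site, written 5' to 3', is GAAAGCCATTAGAG, that residues 297 through 363 plus the added methionine give a 68-residue peptide, and where the TAAT core sits on the top strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


bs2_top = "CTCTAATGGCTTTC"  # BS2 site as given in the text, 5' to 3'
bs2_bottom = reverse_complement(bs2_top)

# Residues 297 through 363 inclusive, plus the extra N-terminal methionine.
peptide_length = (363 - 297 + 1) + 1

# 1-based start of the TAAT core element on the top strand.
taat_start = bs2_top.find("TAAT") + 1

print(bs2_bottom)      # GAAAGCCATTAGAG
print(peptide_length)  # 68
print(taat_start)      # 4
```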
RODNEY E. HARRINGTON AND ILGA WINICOV
2. COCRYSTAL STRUCTURES OF THE HOMEODOMAIN OF THE ENGRAILED PROTEIN OF Drosophila AND OF THE MATα2 HOMEODOMAIN OF YEAST BOUND TO RECOGNITION DNA SEQUENCES
A 61-residue peptide containing the 60-residue homeodomain from the engrailed protein of Drosophila with an added N-terminal methionine was cocrystallized with a 21-bp fragment of sequence TTTTGCCATGTAATTACCTAA, and the crystal structure was determined to 2.8 Å resolution (124). This fragment binds the peptide tightly, with a binding constant of about 10⁻⁹ M. Three principal helical regions having a motif similar to that of prokaryotic helix-turn-helix proteins were found: helices I and II were oriented approximately antiparallel and joined by a loop; helix III, the putative recognition helix, was oriented almost perpendicular to these and was connected to helix II by a turn. In the nucleoprotein complex, the hydrophilic face of helix III lies in the major groove of the DNA so that several amino-acid side-chains can make specific contacts, whereas helices I and II span the major groove at almost right angles to it. At the resolution reported, no evidence could be seen, either in the complex or in the free peptide, for the kink in helix III observed in the Antp homeodomain. The DNA fragments are aligned end-to-end in the crystal to form a quasi-continuous helix, and peptides are bound both at the junctions and over the TAAT sequence element (bases 11-14). Binding to the TAAT element is tighter than at the ends by roughly two orders of magnitude, and in the complex with TAAT, specific contacts are made by three residues on helix III to this element in the major groove and by two residues located on an extended N-terminal arm in the minor groove. Additional contacts occur with the DNA backbone. Neither the peptide nor the DNA conformation is seriously distorted in the complex as compared to the free state. From this, it would appear that binding specificity in this complex arises almost exclusively from direct readout, with little if any indirect readout contribution.
However, both the peptide and the DNA-recognition fragment are truncated in the complex studied, and it cannot be determined from these results whether additional contacts and alternative structural features might augment the role of indirect readout in the full nucleoprotein complex. Certain structural features and a comparison of the engrailed and MATα2 complexes suggest this as a possibility. The homeodomain of the α2 repressor in yeast has also been cocrystallized with a DNA fragment containing a recognition sequence, and the structure was determined to 2.7 Å resolution (122). In spite of the sequence divergence, the MATα2 homeodomain is structurally similar to the engrailed homeodomain, and its DNA binding uses a comparable extended helix-turn-helix motif. The DNA-recognition site investigated is a 21-bp sequence of imperfect dyad symmetry truncated from the STE6 gene operator (128), which consists of a conserved 13-bp region between two 9-bp α2 recognition elements. In the native α2 complex, a dimer of MCM1 proteins binds to the central region, and the α2 dimer binds to the 9-bp regions through the homeodomains located in the C-terminal region of the protein. The N-terminal domains interact cooperatively in a manner analogous to the λ cI repressor. The truncated recognition site used for the cocrystal structure has lost the central 12 bp of the MCM1 site and consists only of the α2 sites and a single overhanging base at each 5' end. Its sequence is shown in Fig. 8, along with alignments of the binding sites in four additional operators at which the α2-MCM1 complex functions genetically as a repressor. The CATGTAAT consensus sequence element is also present in the engrailed recognition site. As in the case of the engrailed complex, the α2 complex appears to form with only minimal structural distortion of the free peptide and DNA binding partners. In particular, no bending of the DNA is evident. The pattern of specific contacts is also similar, but involves different amino-acid
[Fig. 8A: the 21-bp oligomer used for cocrystallization, numbered -1 and 1 to 9 on the consensus side, 10 at the C·G pseudodyad, and 9' to 1' on the nonconsensus side; the consensus half (bases 1-9) reads CATGTAATT.]

(B) Operator    Site 1       Site 2
    STE6        CATGTAATT    CGTGTAAAT
    BAR1        CGTGTAATT    CATGTAATT
    STE2        CATGTACTT    CATGTAAAT
    MFA1        TGTGTAATT    CATGTAAAT
    MFA2        CATGTATTT    CATGTAAAT
    consensus   CATGTAATT

FIG. 8. A comparison of the oligomer sequence used for the cocrystal structure study of the MATα2 homeodomain-operator complex by Wolberger et al. (122) with the five known operator sites. (A) Sequence of the 21-bp oligomer used for cocrystallization. The consensus side is to the left (bases 1-9) and the nonconsensus side is to the right (bases 9'-1'). The axis of pseudodyad symmetry is the C·G (base 10). (B) Alignment of the ten known α2 binding sites as obtained from the five known α2-MCM1 operators. From Wolberger et al. (122) with permission of Cell Press © 1991.
residues in the two complexes. In the α2 complex, three specific contacts occur in the major groove between helix III and the TGT element in the DNA, and an N-terminal arm of the homeodomain allows at least one specific minor-groove contact. Additional contacts with the backbone occur in the 5' end of the consensus sequence with the (CA)·(TG) and the T of the (TG)·(CA) elements. The specific contacts are summarized and compared between the α2 and engrailed complexes in Fig. 9 (122). The most important specific contacts of helix III with the DNA occur two to three bases further 5' in the α2 peptide than in the engrailed peptide, but about the same number of contacts is involved in the two complexes, and the factors governing direct readout in the two systems must therefore be similar.

[Fig. 9: cylindrical projections of the α2 and engrailed recognition sites, with 5' base positions 1 to 9 marked.]

FIG. 9. A comparison of the engrailed and α2 homeodomain complexes with recognition sequences. The DNA is shown in a cylindrical projection. Backbone phosphates are shown as circles, with hatching indicating those phosphates that make specific contact with the homeodomain. Identical residues in the two proteins are enclosed in solid lines, and nonidentical residues, in dashed lines. The sequence of the α2 site is the consensus half. From Wolberger et al. (122) with permission of Cell Press © 1991.

Although the contacts between the α2 and engrailed homeodomains and the 5' sequence elements (CA)·(TG) and (TG)·(CA) are to the sugar-phosphate backbone, the degree of sequence conservation at these sites is remarkable. Furthermore, the helical twist values at these sites are extraordinarily large in both cases. In the engrailed complex, the twist values for the (CA)·(TG) and (TG)·(CA) elements just upstream from the (TAAT)·(ATTA) core are 44.7° and 40.6°, respectively, compared to a mean of 34.2° over the entire recognition sequence. In the α2 complex, the same elements twist 37.8° and 38.7°, respectively, with a mean of 34.5°. These values are especially interesting because Lipanov et al. (73) observed twist angles of about 50° and 36° at the same (CA)·(TG) element in a DNA decamer crystallized in two different space groups, and attributed this to unusual flexibility. As noted in Section II,A,3, (CA)·(TG) is kinked in the cocrystal structure of the CAP-DNA complex (48), and it appears to be a locus of unusual flexibility in the complex of the Cro protein of λ phage with its OR3 recognition site. It is therefore a reasonable conjecture that, in the native α2 and engrailed homeodomain nucleoproteins, these sequence elements may grant enough axial or torsional flexure to permit additional specific contacts to be made and to amplify binding specificity through indirect readout mechanisms. Such a conjecture would also rationalize the high level of conservation in these sequences.
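The sequence observations in this section (the position of the TAAT element in the engrailed fragment, the shared CATGTAAT element, and the conservation of the α2 half-sites) can be reproduced with a short Python sketch. The site sequences are transcribed from the text and from the Fig. 8B alignment, with each half-site written in the orientation used there; the function names are hypothetical helpers for illustration only.

```python
ENGRAILED_SITE = "TTTTGCCATGTAATTACCTAA"  # 21-bp engrailed cocrystal fragment (124)
CONSENSUS = "CATGTAATT"                   # alpha2 consensus half-site, Fig. 8B

# Fig. 8B: two 9-bp half-sites per operator, aligned to the consensus.
ALPHA2_SITES = {
    "STE6": ("CATGTAATT", "CGTGTAAAT"),
    "BAR1": ("CGTGTAATT", "CATGTAATT"),
    "STE2": ("CATGTACTT", "CATGTAAAT"),
    "MFA1": ("TGTGTAATT", "CATGTAAAT"),
    "MFA2": ("CATGTATTT", "CATGTAAAT"),
}


def motif_positions(seq: str, motif: str) -> list:
    """1-based start positions of every occurrence of motif in seq."""
    return [i + 1 for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif]


def mismatches(site: str, reference: str = CONSENSUS) -> int:
    """Number of positions at which an aligned site differs from the reference."""
    return sum(a != b for a, b in zip(site, reference))


def invariant_positions(sites: list) -> list:
    """1-based positions at which every aligned site carries the same base."""
    return [i + 1 for i in range(len(sites[0]))
            if len({s[i] for s in sites}) == 1]


all_sites = [site for pair in ALPHA2_SITES.values() for site in pair]

print(len(ENGRAILED_SITE))                          # 21
print(motif_positions(ENGRAILED_SITE, "TAAT"))      # [11]
print(motif_positions(ENGRAILED_SITE, "CATGTAAT"))  # [7]
print([mismatches(s) for s in all_sites])           # [0, 2, 1, 0, 1, 1, 2, 1, 1, 1]
print(invariant_positions(all_sites))               # [3, 4, 5, 6, 9]
```

Under this reading of Fig. 8B, positions 3 to 6 and 9 are invariant across all ten half-sites, a block that includes the TGT element contacted by helix III.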
C. Zinc-finger Proteins: The "C2H2" Classes

1. THE ARCHETYPAL ZINC-FINGER TRANSCRIPTION FACTOR, TFIIIA

The Cys2His2 (C2H2) class of zinc-finger proteins, typified by transcription factor TFIIIA from Xenopus laevis, has as its common basic structural unit two cysteines and two histidines that coordinate a single zinc atom and are separated by a loop of 12 or 13 amino acids. TFIIIA is required for transcription of the 5S ribosomal RNA genes by RNA polymerase III (129). It is a canonical zinc-finger protein of 39 kDa, and it binds in a highly sequence-specific fashion to an approximately 50-bp internal control region (TCAGAAGCCAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTG), spanning bases 45 through 97 in a 122-bp highly conserved region of the 5S RNA gene, using a 30-kDa region of its N-terminal domain (130, 131) that contains nine well-defined zinc-fingers (132). On average, therefore, each finger domain can interact with roughly 5 to 6 bp. TFIIIA can also bind to the 5S gene product to form a 7S nucleoprotein particle that, in oocytes, evidently functions to store RNA for later use (133). Although the characterization of TFIIIA initiated the zinc-finger concept (132), many other specific binding regulatory proteins are now known to use this general binding motif (reviewed in 133, 134). Two principal models have been proposed to describe the interaction of
TFIIIA with its recognition site. These models can probably be generalized to other zinc-finger proteins, at least to those of the same C2H2 class. Both models require that the overall alignment of the protein be roughly parallel to the DNA. In the model developed by Klug and co-workers (135, 136; reviewed in 137), the protein lies along a single face of the DNA, and successive zinc-fingers interact in alternate orientation with the major groove such that structurally equivalent ~5-bp contacts occur every other finger and are spaced ~10 bp apart. In an alternative model proposed by Berg (134, 138), the fingers wrap around the major groove, making ~3-bp contacts. To fit the hydroxyl-radical footprint of the complex (139), fingers 1, 5, 7, and 9 lie in the major groove while finger 6 is constrained by the short linkers to lie across the DNA, resulting in a sharp kink of ~60° or greater in the DNA at a point about one-third of the way from the end of the internal control region. The structure of the Zif268 protein is generally consistent with this latter model (140). Both models can account for DNA bending in the association complex, but neither requires it in general except in the case of TFIIIA. It is not clear at present whether DNA bending with zinc-finger proteins is likely to be a general phenomenon (134). Studies on TFIIIA mutants (141) and on cloned peptides containing different zinc-fingers (142-144) demonstrate a division of labor among the nine zinc-fingers for DNA- and RNA-binding activity. Hydroxyl-radical cleavage patterns (141) and footprinting, methylation-interference, and differential-binding studies on precisely defined fragments of the DNA-binding domain (143, 144) show that fingers 1, 2, and 3 bind with high specificity and affinity to the 3' C-block promoter element (the last 18 bp in the internal control sequence given above) and provide about 95% of the binding energy of the full protein to the full internal control region site.
Base-specific and phosphate contacts in the major groove provide a direct readout mechanism for this binding. Zinc-fingers 4, 5, 6, and 7 represent the minimal domain required for specific, high-affinity binding to RNA (144); fingers 1, 2, and 3 are not required to account for most of the RNA-binding energy. Fingers 8 and 9 contribute little to either DNA- or RNA-binding specificity or affinity, and evidently function only to locate the protein on the internal control sequence by binding fingers 7 through 9 weakly to the 5' adenine block promoter element (base-pairs 6 through 18 in the above sequence) (141-143). Such correct positioning of the protein on the target DNA is probably required to provide essential contacts for other components of the transcription complex (145). Hydroxyl-radical studies (141) on TFIIIA bound to the internal control sequence have suggested that the DNA-binding fingers 1, 2, 3, 7, and 8 are in a compacted conformation similar to that observed in the Zif268 protein (see below), whereas the RNA-binding fingers 3,
5, 6, and 7 are extended and lie roughly parallel to the helical axis of the DNA. There is presently some controversy as to whether the DNA bend in the TFIIIA recognition-site complex predicted by the Berg model (134, 138) is observed experimentally. In Fig. 10, a helical trajectory plot is shown of the internal control region to which TFIIIA binds. This plot is based on wedge angles given by Bolshoy et al. (13) and twist angles from Kabsch et al. (80); the predicted average helical repeat for this region is 10.64 bp. Clearly, there is no evidence of significant fixed bending in the absence of bound protein. However, there is some experimental evidence that the DNA in the TFIIIA-DNA complex may be curved. Phosphorus-imaging electron microscopy of TFIIIA bound to the internal control region suggests that the DNA may bend (146). Experiments based on circularization and cyclic-permutation gel-mobility-shift assays appear to demonstrate DNA bending of about 60 to 65° in the internal control region (147). This conclusion has been challenged by Zwieb and Brown (148), who also used the cyclic permutation method and report that the DNA in the TFIIIA complex is bent by no more than about 30°. It has been suggested
[Fig. 10: two trajectory views of the internal control region, labeled 0° and 60°, for the sequence 5'-TCGGAAGCCAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTG-3'.]

FIG. 10. Computer-developed DNA trajectory projection for the internal control region of transcription factor IIIA (TFIIIA) from Xenopus laevis (86, 87). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. Two orientations of the strand about an axis perpendicular to the 5' base-pair, which differ by 60° in an anticlockwise sense, are shown.
that the discrepancy between these two studies may arise from differences in the ionic strengths employed, because, at high and low ionic strengths, TFIIIA may exist in different conformational states that may bend the DNA recognition region differently on complexation (149). This is an important question because studies to date cannot determine whether the intact TFIIIA protein undergoes conformational changes on binding to different substrates. The 50-bp recognition sequence of TFIIIA given above contains the putative flexibility elements (CA)·(TG) and (TA)·(TA) in near-helical or half-helical periodicity. It is therefore of some interest to examine the effects these elements may have on DNA conformation. Fig. 11 shows axial trajectory plots of the full internal control region using the same helical parameters as in Fig. 10, except that (CA)·(TG) and (TA)·(TA) are allowed to kink through an angle of 45° into the major groove. The first of these assumptions, the 45° kink angle, is justified by structural studies on the (CA)·(TG) element in CAP (48) and Cro (66, 74, 78) recognition sequences, which show that it kinks by about this amount in the respective nucleoprotein complexes. The second assumption, that kinking occurs into the major groove, is based on theoretical considerations (150). In Fig. 11a, a trajectory plot is shown for the internal control region assuming kinking at (CA)·(TG) and (TA)·(TA) elements. Two views are shown: in the first, the DNA is lying primarily in the plane of principal bending; in the second, this plane is rotated 60° about an axis normal to the first (5') base-pair. Figure 11b shows the same views, but excludes kinking at the (TA)·(TA) elements. The jog in the axial trajectory nearest the 5' end is caused by two (CA)·(TG) elements spaced at half-helical periodicity; subsequent elements then bend the DNA almost coherently in the plane until the final (3') ~10 bp, which are seen to curve away from the principal bending plane.
The main effect of including the (TA)·(TA) elements is to increase the bending planarity over the 3' half of the sequence, which includes the C-block promoter element. Although the bending assumptions used in this illustration cannot be rigorously justified, Fig. 11 nevertheless points to a possible, and indeed likely, role for sequence-directed flexibility in the TFIIIA internal control region that is consistent with known protein-DNA contacts (141-145). Limited axial deformation at (CA)·(TG) and (TA)·(TA) elements can easily lead to the degree of bending suggested by independent experiments (146, 147), and can furthermore lead to a DNA conformation in the internal control region that allows the DNA to wrap around the protein and thereby provides an efficient structure for multiple zinc-finger binding at either end (144, 145).
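The trajectory calculations behind Figs. 10 and 11 rest on composing one rotation per dinucleotide step. The sketch below is a simplified, roll-only version of such a calculation, not the authors' program: it assumes a uniform twist of 36° (10 bp per turn), a rise of 3.4 Å, zero wedge everywhere except at two hypothetical 45° roll kinks, and a sign convention chosen for illustration; it omits the Bolshoy et al. wedge angles and Kabsch et al. twist values used for the figures. It illustrates why kink spacing matters: kinks one helical turn apart add, whereas kinks half a turn apart cancel in direction and leave only a lateral jog in the axis.

```python
import math


def rot_z(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def step_matrix(twist, roll):
    """One step: half the twist, roll about the local y axis, then the other half."""
    half = rot_z(twist / 2.0)
    return matmul(matmul(half, rot_y(roll)), half)


def trajectory(twists, rolls, rise=3.4):
    """Return helix-axis points (angstroms) and the final base-pair frame."""
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pos = [0.0, 0.0, 0.0]
    points = [tuple(pos)]
    for twist, roll in zip(twists, rolls):
        frame = matmul(frame, step_matrix(twist, roll))
        # Advance one rise along the current local helix axis (third column).
        pos = [pos[i] + rise * frame[i][2] for i in range(3)]
        points.append(tuple(pos))
    return points, frame


def overall_bend(frame):
    """Angle (degrees) between the starting axis (0, 0, 1) and the final axis."""
    return math.degrees(math.acos(max(-1.0, min(1.0, frame[2][2]))))


def two_kinks(spacing, n_steps=20, twist=36.0, kink=45.0):
    """Hypothetical duplex: uniform twist, zero roll except two equal kinks."""
    rolls = [0.0] * n_steps
    rolls[5] = kink
    rolls[5 + spacing] = kink
    return trajectory([twist] * n_steps, rolls)


points_in, frame_in = two_kinks(spacing=10)     # kinks one full turn apart
points_half, frame_half = two_kinks(spacing=5)  # kinks half a turn apart
print(round(overall_bend(frame_in), 1))         # 90.0: the two bends add
print(round(overall_bend(frame_half), 1))       # 0.0: the bends cancel
print(round(math.hypot(*points_half[-1][:2]), 1))  # 12.0: a lateral jog remains
```

With real dinucleotide parameters, the per-step twist and roll values would simply replace the uniform lists passed to `trajectory`.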
[Fig. 11: panels (a) and (b), each with two trajectory views rotated 60° relative to one another, for the sequence 5'-TCGGAAGCCAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTG-3'.]

FIG. 11. Computer-developed DNA trajectory projection for the internal control region of transcription factor IIIA (TFIIIA) from Xenopus laevis (86, 87). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (a) All (CA)·(TG) and (TA)·(TA) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by the available experimental evidence (3, 40, 48, 74, 78). Views shown are rotated 60° with respect to each other about an axis normal to the 5' base-pair. (b) (CA)·(TG) dinucleotide elements are kinked as in (a) (3, 48, 74, 78), and (TA)·(TA) dinucleotides assume their normal wedge-angle values (13). Views shown are also rotated 60° with respect to each other about an axis normal to the 5' base-pair.
The TFIIIA nucleoprotein complex has also been studied using a number of biochemical and biophysical techniques, including nuclease digestion (151, 152), methylation protection (135), hydroxyl-radical footprinting (136, 139) including the missing nucleoside method (143), phosphorus-imaging electron microscopy (146), circular dichroism spectropolarimetry (153), and electric dichroism (154). None of these methods has yet provided an unequivocal characterization of its conformation or conformational lability. Nevertheless, from what is currently known about this system, it seems likely that TFIIIA uses sequence-directed flexibility and microstructure in its recognition sequence, so that indirect readout is an important component of its binding specificity. However, evidence is not yet adequate to determine whether this notion can be extended to other zinc-finger proteins.
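Several numbers used in this section follow directly from the internal control region sequence. The minimal Python sketch below assumes the sequence exactly as printed in the text (Figs. 10 and 11 print the 5' end as TCGG rather than TCAG, which would remove the CA step at position 2). It checks the 53-bp length against bases 45 through 97, the average of about 5.9 bp per finger for nine fingers, and the positions of the (CA)·(TG) and (TA)·(TA) steps whose spacing the trajectory analysis depends on. The function name is a hypothetical helper.

```python
# Internal control region as printed in the text (bases 45-97 of the 5S RNA gene).
ICR = "TCAGAAGCCAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTG"
N_FINGERS = 9


def step_positions(seq: str, steps) -> list:
    """1-based positions i where bases i and i+1 form one of the given steps.

    Scanning one strand for both CA and TG finds every CA/TG step, because a
    CA step on one strand reads as TG on the other; TA is self-complementary.
    """
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] in steps]


ca_tg = step_positions(ICR, {"CA", "TG"})
ta = step_positions(ICR, {"TA"})
spacings = [b - a for a, b in zip(ca_tg, ca_tg[1:])]

print(len(ICR), 97 - 45 + 1)           # 53 53
print(round(len(ICR) / N_FINGERS, 1))  # 5.9
print(ca_tg)                           # [2, 9, 13, 25, 36, 40, 52]
print(ta)                              # [29, 32]
print(spacings)                        # [7, 4, 12, 11, 4, 12]
print(ICR[-18:])                       # TGGATGGGAGACCGCCTG (3'-terminal 18 bp, C-block region)
```

The spacings of 4 bp and 11 to 12 bp between successive (CA)·(TG) steps are roughly a half and a full turn of a helix with a repeat of about 10.6 bp.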
2. THE Zif268 PROTEIN OF THE MOUSE
A peptide of the Zif268 immediate-early protein from the mouse containing only the DNA-binding domain has been cocrystallized with a DNA-recognition site, and the structure has been determined to 2.1 Å resolution (140). The Zif268 protein uses a three-zinc-finger DNA-binding motif, and the peptide used for the structural study contained all of these. The DNA fragment used for cocrystallization was an 11-mer consensus (155) binding sequence, AGCGTGGGCGT. The three zinc-fingers are in a tandem array, and the α-helices of each finger contact adjacent base triplets in the major groove with a similar orientation: finger 1 contacts the 3' (GCG)·(CGC) triplet, finger 2 contacts the central (TGG)·(CCA) triplet, and finger 3 contacts the 5' (GCG)·(CGC) triplet. The entire protein therefore assumes a slightly skewed C-shaped conformation that lies along the major groove. Although a (TG)·(CA) step is present in the central binding triplet, the structure showed little evidence for DNA bending at this site. Helical parameters for the entire recognition sequence were well within the normal range for B-form DNA, although considerable variability was observed in helical twists, particularly in the binding triplets for fingers 1 and 3. Binding specificity seems to be mainly direct readout from 11 specific base contacts by the peptide and a number of interactions with the DNA backbone. These are mainly with the guanine-rich noncoding strand. There is little or no evidence for indirect readout based on unusual DNA microstructures, although the central (TGG)·(CCA) triplet is highly conserved. It cannot be determined from this study whether deformation of the DNA-binding site occurs in the complex between the full protein and a longer DNA sequence. As noted above, such an effect has been found in cocrystal studies of the λ Cro protein complex with the 0,1 operator site. In addition, crystal packing forces may influence complex conformation in these relatively "soft" systems. In the Zif268 cocrystal, the 11-bp duplexes stack end-to-end, forming a quasi-continuous helix. This forces the 11-bp helical repeat observed in the operator DNA and could lead to additional constraints against DNA bending.
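The triplet assignment described above can be written out explicitly. In this Python sketch (variable and function names are illustrative), the 11-bp site is read 5' to 3' on the G-rich strand, the single 5' A is set aside, and the three triplets are assigned to fingers in the antiparallel order stated in the text, with finger 1 at the 3' end.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


ZIF_SITE = "AGCGTGGGCGT"  # 11-bp consensus site, G-rich strand, 5' to 3'

# Skip the single 5' A and cut the next nine bases into three triplets.
triplets = [ZIF_SITE[i:i + 3] for i in range(1, 10, 3)]

# Fingers run antiparallel to this strand: finger 3 at the 5' end, finger 1 at the 3' end.
assignment = dict(zip(["finger 3", "finger 2", "finger 1"], triplets))

print(triplets)                                        # ['GCG', 'TGG', 'GCG']
print(assignment["finger 1"], assignment["finger 3"])  # GCG GCG
print(reverse_complement(assignment["finger 2"]))      # CCA
```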
D. The "Cx" Class of Zinc-binding Proteins

A more diverse class of zinc-binding proteins, the (Cys)x class, contains a variable number of cysteines. For example, certain yeast transcription factors, typified by GAL4, contain six invariant cysteines (C6). The steroid receptors that have been characterized are mostly of the C, or C, class. The C, proteins are predicted to form a 13-amino-acid loop similar to that of the C2H2 group. Important members of this class include the large superfamily of hormone receptor proteins, including the steroid hormone, thyroid hormone, retinoic acid, and vitamin D3 receptors (reviewed in 156, 157).
1. THE GAL4 TRANSCRIPTIONAL FACTOR

The GAL4 transcriptional factor from yeast regulates a number of genes responsible for galactose catabolism (reviewed in 158). Transcriptional activation by GAL4 is controlled by the regulatory factor GAL80 in inverse relation to local galactose availability, and a GAL3 gene product also acts as a GAL80 regulator (159). A cocrystal structure at 2.7 Å resolution has been reported for an N-terminal 65-residue fragment of GAL4 bound to a consensus DNA operator sequence, and it provides many details of the protein-DNA binding in this complex (160). In the cocrystal, the protein was bound to a 19-bp fragment, CCGGAGGACAGTCCTCCGG, containing a 17-bp palindromic recognition sequence. This sequence differs from the consensus of the 11 known binding sequences (161) only in the substitution of an A for a T at the dyad, position 10. Although GAL4 binds as a dimer to its palindromic recognition sequence, it does not dimerize in the absence of its operator DNA (162). A crystal structure (162) and an NMR study (163) of the protein show that the DNA-binding domain contains six cysteine residues coordinating two Zn2+ ions in a "binuclear cluster" (163). These represent a class of zinc-binding proteins distinct from the zinc-fingers of TFIIIA and the zinc-binding domains of the steroid receptor proteins. In GAL4, the zinc-binding domains interact in the major groove with the (CCG)·(CGG) trinucleotide elements at the ends of the recognition sequence. These domains are linked to C-terminal coiled-coil domains, which lie roughly over the dyad in the symmetric complex, by an extended segment that is unstructured in the absence of bound DNA but follows along the minor groove in the complex. The overall complex is tripartite, and the carboxy-terminal regions function as weak dimerization elements.
The DNA is weakly bent toward the protein at the dyad, which is a (CA)·(TG) step in the cocrystal and a (TG)·(CA) step in the consensus operator sequence. The operator DNA is canonical B-form in its helical parameters, but the minor groove, which is unusually wide over most of the 17-bp recognition sequence, is conspicuously constricted just at the dyad. This is evidently due to interaction of side-chains in the N-terminal portion of the coiled-coil domain with phosphates in the minor groove 2 and 3 bp away from the dyad, but it may also facilitate the small degree of bending observed in the operator DNA at or near the dyad. The major groove in the central region of the DNA-recognition site is relatively free of protein in the cocrystal structure studied, and may therefore be a binding site for an additional protein under in vivo conditions. This seems likely, because binding specificity from the zinc-binding domains does not appear to be greatly extended by indirect readout, with the exception of possible flexure at the putative kink site at the dyad. Furthermore, the central sequence is relatively conserved, suggesting that it may also be involved in additional protein-DNA contacts.
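A short Python sketch makes the near-palindromic layout of the GAL4 cocrystal site explicit. It compares the 19-bp fragment with its own reverse complement; because the site has odd length, the central base pair (position 10) can never match its partner, so it marks the dyad. Restoring T at position 10, as in the consensus, moves the CA/TG step to the other side of the dyad: CA at positions 9-10 in the cocrystal, TG at positions 10-11 in the consensus. The helper names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def asymmetric_positions(seq: str) -> list:
    """1-based positions at which seq differs from its own reverse complement."""
    rc = reverse_complement(seq)
    return [i + 1 for i, (a, b) in enumerate(zip(seq, rc)) if a != b]


GAL4_SITE = "CCGGAGGACAGTCCTCCGG"                 # 19-bp cocrystal fragment
consensus = GAL4_SITE[:9] + "T" + GAL4_SITE[10:]  # T restored at position 10

print(len(GAL4_SITE))                    # 19
print(asymmetric_positions(GAL4_SITE))   # [10]
print(asymmetric_positions(consensus))   # [10]
print(GAL4_SITE[8:10], consensus[9:11])  # CA TG  (steps 9-10 and 10-11)
```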
2. THE NUCLEAR RECEPTOR PROTEINS
The nuclear receptor proteins constitute a broad family of regulatory proteins that includes receptors for steroid hormones, thyroid hormones, vitamin D, retinoids, and a number of other regulatory proteins, some of which are of unknown function (164). They are activated by ligand binding and have many functional features in common, such as distinct domains for DNA binding, ligand binding, and regulation of transcription. The DNA-binding domains contain eight cysteines coordinated in various ways to zinc. The best studied from a structural standpoint are the steroid receptors (reviewed in 165), as typified by the glucocorticoid receptor protein. In the DNA-binding domains, the eight cysteines coordinate two zinc ions to form two tetrahedrally coordinated zinc-binding regions containing looped regions of ~9 to ~13 residues. A highly amphipathic α-helix of ~11 to ~13 residues begins at the residue following the third zinc-binding cysteine. NMR results indicate that the zinc-binding motifs in the glucocorticoid receptor differ significantly from those in TFIIIA. In the hormone receptors, the zinc-binding motifs tend to fold together as part of a larger single domain (165-167), in contrast to TFIIIA, in which they exist as independent zinc-fingers. In both cases, direct readout of the DNA sequence occurs through the zinc-stabilized recognition α-helices, but in the glucocorticoid receptor, the two helices lie almost at right angles in the unified domain, and the first helix functions as the recognition helix by insertion into the major groove of the recognition-site DNA.
Such a unified structure has been verified in a recent crystallographic study at 2.9 Å resolution of the DNA-binding domain of the glucocorticoid receptor protein cocrystallized with a specifically bound DNA fragment (167). This suggests that the glucocorticoid receptor, and probably other steroid receptors as well, constitute a clearly delineated third class of zinc-binding regulatory systems. In the cocrystal (167), the glucocorticoid receptor was bound to a symmetrized operator, CCAGAACATCGATGTTCTG, instead of to the consensus (168) sequence TCAGAACATCATGTTCTGA. In these sequences, the half-sites are the inverted hexamers AGAACA and TGTTCT, so that the spacers between them are 4 and 3 bp, respectively. The DNA-binding domains dimerize when bound to the symmetrical sequence, with a spacing between them corresponding to that of the consensus sequence. Only one of the domains interacted fully with its half-site, whereas the other was displaced 1 bp toward the first and interacted nonspecifically with a nonbinding hexamer. A subsequent study at 4 Å resolution used cocrystals of the DNA-binding domains with the consensus sequence, and revealed that in this case the domains associated correctly with the two hexameric half-recognition sites. The pattern observed in the consensus sequence, in which a (CA)·(TG) (position 7) is separated from a (TG)·(CA) (position 12) by a 3-bp spacer, i.e., an exact half helical turn, is also preserved in the estrogen response element (168); in the thyroid response element, the 3-bp spacer is lost (169). The flanking (CA)·(TG) and (TG)·(CA) elements at positions 2, 10, and 17 in the glucocorticoid receptor consensus sequence are not highly conserved. Binding specificity in each half-site is provided by three specific base contacts and seven backbone contacts in the major groove between the DNA and the zinc-binding module (recognition helix) of the peptide.
In the glucocorticoid and estrogen receptor proteins, spacing and orientation requirements for the half-recognition sites are determined by a region of about four residues in a loop on the surface of the protein near the recognition helix that can form protein-protein contacts with the adjacent bound protein. This leads to the correct spacing and orientation of the recognition helices in the bound protein dimer. In the thyroid receptor, these residues do not support such interactions, and allow dimeric protein binding to the two adjacent half-sites. In all cases, however, binding of two proteins to the full recognition site occurs independently and cooperatively, and under normal conditions the proteins do not dimerize in free solution. Although this level of direct readout is not extraordinary, even including the contributions of protein-protein interactions, the nuclear receptor proteins as a group are characterized by relatively weak affinity for their target sequences. Although conserved putative flexibility elements are present in the symmetrical sequence used in the cocrystal structure, the DNA is not
RODNEY E. HARRINGTON AND ILGA WINICOV
bent in the half-recognition sites and its helical parameters are within the B-form range. However, considerable variance in these parameters occurs across the half-recognition sites, and the major groove in the specifically bound site is widened by about 2 Å compared to the nonspecifically bound site, possibly as a consequence of the inserted recognition helix. Whether this apparent absence or near absence of indirect readout is an artifact stemming from the short DNA sequence (19 bp) and the truncated protein cannot be determined at present. There is a growing body of evidence that the functions of many hormone response elements may be modulated not only by the binding of appropriate receptor proteins but by various nonreceptor accessory proteins as well. In addition, many response elements may in fact be composite and functionally modulated by more than one receptor, as well as by nonreceptor proteins. Recent examples include composite response elements for the mineralocorticoid and glucocorticoid receptors (170; reviewed in 171) and for the vitamin D receptor and retinoid X receptor-α (172; reviewed in 173). These highly complex differential interactions provide a richly varied menu of possibilities for ligand-mediated transcriptional control. They would also seem virtually to mandate flexibility in the response element and possibly in the receptor proteins as well. Hence, the absence of DNA bending observed in the cocrystal structure of the glucocorticoid receptor DNA-binding domain complexed with the response element (167) may not be a general characteristic: this structure may represent only one of a large number of functional structures, and the putative flexibility elements in the DNA-recognition sites may play important functional roles in this regulatory system.
E. Leucine-zipper Proteins
1. FOS AND JUN
Fos and Jun are transcription factors of the bZIP family that interact, either as homodimers or as heterodimers, with a DNA-binding region that is conserved from yeast to humans (reviewed in 174). By forming heterodimers with a large number of other family members, these proteins can create a very large number of complexes that can compete for the AP-1 and CRE recognition sites (175). Because these sites are often components of complex regulatory elements, the associated proteins may interact further with other transcription factor families. Thus, these recognition regions are subject to a set of rather special requirements, including a high level of versatility. Dimerization of the proteins is mediated by a leucine zipper in the DNA-binding domain of both proteins (176). The consensus sequence for binding is TGACTCA, which is also bound by the human transcription factor AP-1 and the
STRUCTURAL FLEXIBILITY IN DNA-PROTEIN INTERACTIONS
yeast factor GCN4 (174; see Section II,E,3). This sequence has dyad symmetry about the central C·G base-pair and is also unusual in that it contains (CA)·(TG) putative kinking elements (48, 78) phased at exactly half-helical periodicity. Cellular Jun from both chicken and humans also binds to a TGACACA site, located between the CAAT box and a TATA-like sequence element; binding to this site positively autoregulates expression of the gene (177). This sequence is asymmetric, and if kinking occurs at (CA)·(TG), the change from a (CT)·(AG) to a (CA)·(TG) element may substantially alter the trajectory of the DNA in the binding site. Figure 12 shows computer-generated helical-axis trajectories for the two AP-1 recognition sites discussed above. The same wedge and helical twist angle data used in Figs. 10 and 11 are used here, and a similar comparison is made between free DNA (left-hand side) and the assumption of kinking through 45°, involving roll angle only, into the major groove at each (CA)·(TG) element (right-hand side). This comparison suggests a possible structural rationale for the binding-specificity differences between these sites, because the trajectories of the DNA, owing to the different spatial relationships between putative induced bending sites, are quite different in the two sequences. Recent cyclic-permutation gel-mobility-shift studies employing phased bending analyses indicate that the association of Fos and Jun with the AP-1 consensus site results in significant conformational changes both in the proteins and in the DNA (176, 178). These studies further suggest that binding of Fos-Jun heterodimers and Jun-Jun homodimers directs bending in opposite directions: Fos-Jun binding bends the DNA toward the major groove, whereas Jun-Jun binding bends it toward the minor groove.
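The trajectory projections in Figs. 12-15 rest on a simple idea: walk along the sequence one base-pair step at a time, apply a twist and a roll (wedge) at each step, and follow where the helix axis goes. The Python sketch below illustrates that idea under loudly stated assumptions: a uniform twist of 360°/10.5 per step instead of the Kabsch et al. (80) values, zero intrinsic wedge unless the caller supplies hypothetical per-step roll values (the Bolshoy et al. (13) table is not reproduced), tilt ignored, and a 45° roll at every CA or TG step read on the top strand, i.e., at every (CA)·(TG) element. The function names and the sign convention for roll are illustrative choices, not the authors' program.

```python
import math

RISE = 3.4  # Å per base-pair step; canonical B-DNA value (assumption)


def rot_z(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def trajectory(seq, twist=360.0 / 10.5, kink_steps=("CA", "TG"),
               kink_roll=45.0, wedge_roll=None):
    """Trace the helix axis of `seq`, one point per base pair.

    Each step applies half the twist about the local z axis, a roll about the
    local y axis, then the other half of the twist. `wedge_roll` is an optional
    {step: roll in degrees} map; any values supplied are hypothetical stand-ins
    for a published wedge-angle table.
    Returns (axis points, final axis direction as a unit vector).
    """
    wedge_roll = wedge_roll or {}
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pos = [0.0, 0.0, 0.0]
    points = [tuple(pos)]
    for k in range(len(seq) - 1):
        step = seq[k:k + 2].upper()
        roll = kink_roll if step in kink_steps else wedge_roll.get(step, 0.0)
        step_rot = mat_mul(rot_z(twist / 2), mat_mul(rot_y(roll), rot_z(twist / 2)))
        frame = mat_mul(frame, step_rot)        # compose in the local (moving) frame
        axis = mat_vec(frame, [0.0, 0.0, 1.0])  # current helix-axis direction
        pos = [p + RISE * a for p, a in zip(pos, axis)]
        points.append(tuple(pos))
    return points, mat_vec(frame, [0.0, 0.0, 1.0])


def bend_angle(direction):
    """Angle (degrees) between the final axis and the starting axis (0, 0, 1)."""
    return math.degrees(math.acos(max(-1.0, min(1.0, direction[2]))))


# The two Fig. 12 sites, each with one added flanking g on either side (assumed),
# traced without and with 45° kinks at every (CA)·(TG) element.
for site in ("gTGACTCAg", "gTGACACAg"):
    _, free_dir = trajectory(site, kink_steps=())
    _, kinked_dir = trajectory(site)
    print(site, round(bend_angle(free_dir), 1), round(bend_angle(kinked_dir), 1))
```

Because the twist between two kinks decides whether their rolls add or cancel, two kinks 10 steps apart with a 36° twist add to a 90° bend, whereas kinks 5 steps apart cancel; this phasing dependence is the property on which the comparison in Fig. 12 turns.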
Other Fos-Jun combinations appear to induce negligible DNA bending, and it has therefore been postulated that DNA flexibility in the AP-1 site may be affected by the binding of these proteins (176). Cyclic-permutation studies such as these are often difficult to interpret because anomalous polyacrylamide gel retardation due to DNA bending is highly nonlinear with respect to bending planarity, i.e., to the phases of multiple bending loci (32). Nevertheless, these studies suggest a “flexible hinge” model for these systems (176), a concept fully consistent with putative flexibility in the (CA)·(TG) elements in the recognition sequences.
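The dyad symmetry of the AP-1 consensus site, and the asymmetry of the Jun autoregulatory TGACACA site, can be checked mechanically. The sketch below (helper names are illustrative) compares the left half of a site with the reverse complement of its right half; for an odd-length site the central base pair lies on the dyad axis and is excluded. TGACGTCA, the CRE consensus, is included as an exactly palindromic even-length comparison.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(site):
    """True if the site reads the same on both strands about its center.

    For an odd-length site the central base pair sits on the dyad axis and is
    skipped; only the flanking halves are compared.
    """
    site = site.upper()
    half = len(site) // 2
    left = site[:half]
    right = site[len(site) - half:]
    return left == revcomp(right)


for site in ("TGACTCA", "TGACACA", "TGACGTCA", "CACGTG"):
    print(site, has_dyad_symmetry(site))
# TGACTCA True / TGACACA False / TGACGTCA True / CACGTG True
```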
2. THE MYC PROTEINS
Another group of oncogene proteins that bind to palindromic recognition sequences containing approximately phased (CA)·(TG) dinucleotide elements are the Myc proteins. Although little is known of their function, they are thought to have a role in cell differentiation and proliferation (reviewed in 179). They are thought to bind using a basic helix-loop-helix motif and
FIG. 12. Computer-developed DNA trajectory projection for (a) the AP-1 consensus site and (b) a binding site for cellular Jun protein. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. In each panel, the left-hand strand was modeled using only the wedge and twist angles above. In the right-hand strand, (CA)·(TG) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (48, 74, 78). Views shown are in the same orientation about an axis normal to the 5' base-pair. Additional flanking guanines (lower-case type) were added to each binding site to facilitate the graphical representation.
have characteristic leucine-zipper regions that, combined with their palindromic recognition sites, suggest that they bind as dimers. The c-Myc recognition sites contain the consensus sequence CANNTG (180). The consensus binding site for v-Myc contains the CACGTG motif (181), which is also recognized by a heterodimeric complex of c-Myc with Max, a basic helix-loop-helix protein (182, 183). c-Myc also trans-activates the p53 tumor suppressor gene promoter using an essential downstream CACGTG motif (184). In addition, this same sequence is recognized by a variety of regulatory proteins, including the USF (185) and TFE3 (186) transcription factors of the adenovirus major late promoter and the μE3 motif in the immunoglobulin heavy-chain enhancer (181), and the G-box proteins of plants (187; see the discussion of plant proteins below). In both motifs, the spacing between (CA)·(TG) and (TG)·(CA) elements is nearly but not quite a half-helical turn, so that if anisotropic flexure occurs at these elements, the resulting DNA trajectory will be in the shape of a bent U. However, the degree of nonplanarity will differ, depending on the central bases in the c-Myc site. Furthermore, the CACGTG sequence contains, at its central dyad, the (CG)·(CG) element, which is subject to methylation. It is known that the c-Myc, c-Myb, and v-Myb oncogene proteins bind to recognition sequences in a methylation-dependent fashion.
3. THE GCN4 TRANSCRIPTIONAL FACTOR FROM YEAST
Recent structural studies on the GCN4 transcriptional activator from yeast (188) show that important differences exist among members of the basic-region leucine-zipper (bZIP) protein class in their requirements for conformational changes in the recognition DNA. The GCN4 protein is similar to other bZIP proteins in that its DNA-binding domain is not in a suitable conformation for binding until that conformation is stabilized by association with DNA, and it binds as a dimer to the same AP-1 site that the Fos and Jun oncogene proteins bind. However, in a number of important respects, the DNA-stabilized binding domain differs from its counterparts in the oncogene systems. The GCN4 protein from yeast is one of a large family of eukaryotic bZIP regulators that have high sequence homology and bind to similar recognition sequences (189). This group includes a number of the plant proteins noted below (Section II,I). A cocrystal structure of GCN4 with a 20-bp fragment containing the AP-1 sequence has been reported at 2.9 Å resolution (188). The AP-1 binding sequence, TTCCTATGACTCATCCAGTT, is identical to that bound by the Fos and Jun proteins. This structure clearly shows that the basic DNA-binding domain, as stabilized by the interaction with the DNA, is a long, continuous α-helix; in the bound dimer, the two domains form a paired coiled coil that diverges toward the N-terminal end to grasp the DNA
through the major groove like a long pair of tongs. The sole function of the leucine zipper is to orient and stabilize the dimeric protein complex. Most significantly, the DNA-recognition region is straight and in the canonical B-form, with no unusual microstructure or helical parameters. It is not clear whether DNA bending might occur in a complex with longer DNA, but the structure provides no clue as to how such bending might occur if it did. Thus, the binding specificity in this system seems to derive exclusively from direct readout of specific hydrogen-bond and phosphate-backbone interactions with the basic α-helix in the two asymmetric half-sites. Comparing the GCN4 nucleoprotein complex to the Fos/Jun systems raises interesting questions about the structural similarities and differences that allow proteins binding to the same DNA-recognition site to place such different requirements on the DNA conformation. The basic regions of both proteins are highly homologous, and hybrid proteins constructed by interchanging basic or leucine-zipper regions are found to bind more or less interchangeably (190, 191). Furthermore, GCN4 also binds, with reduced affinity, to the symmetrized CRE site, in which an additional symmetrizing base-pair is added at the center of AP-1, rotating the two half-recognition sites by about 36° with respect to each other. All of this suggests that the basic α-helical DNA-binding domain must be flexible to some degree. It is clear that the interplay of direct and indirect readout is exceedingly complex in the bZIP protein systems.
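For a rough sense of the geometry, a minimal estimate assuming a uniform helical repeat of $h$ base pairs per turn: inserting one base-pair at the center of the site turns one half-site relative to the other by one twist step (and shifts it along the axis by one rise, about 3.4 Å). The figure of about 36° quoted above corresponds to $h = 10$; the often-quoted solution value $h \approx 10.5$ gives about 34°.

$$\Delta\phi = \frac{360^\circ}{h}, \qquad h = 10 \;\Rightarrow\; \Delta\phi = 36^\circ, \qquad h = 10.5 \;\Rightarrow\; \Delta\phi \approx 34.3^\circ$$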
F. Minor-groove-binding Proteins: The TFIID Transcription Factor Complex
TFIID is a transcription factor required for efficient pol II activity in many, and perhaps all, protein-encoding genes in eukaryotic cells. It binds to the TATA box, a conserved sequence located about 30 bp upstream from most pol II start sites, and induces the formation of a multiprotein preinitiation complex that appears to be stable through several rounds of transcription (192). For the initiation of pol II genes, TFIID is a multisubunit association complex containing the TATA-binding protein (TBP) and a number of TBP-associated proteins called TATA-associated factors (TAFs) (193, 194; reviewed in 195). It has elements of structural similarity with the integration host factor (IHF) of E. coli and several other regulatory proteins that may also be reflected in the binding properties (reviewed in 196). TFIID is not required for transcription by pol I and pol III, although TBP is evidently involved (reviewed in 197). The roles of TBP in transcription by polymerases other than pol II are incompletely understood, although a TBP-TAF complex seems to be involved in pol I transcription, and the associated TAFs are thought to be different from those in TFIID (198). A
crystal structure for TBP from the plant Arabidopsis thaliana has been determined (199). This study shows that the protein has a cross-sectional shape rather like a quarter-moon, with a small lower concave surface that straddles the DNA and an upper convex surface that accommodates the binding of ancillary proteins, the TATA-associated factors. The DNA-binding element, which contacts the minor groove of the recognition DNA, is a curved, antiparallel β-sheet. No cocrystal structure has been reported for TBP at the time of this writing. Two recent studies indicate that, unlike most regulatory proteins, which form specific contacts in the major groove of their recognition sites, TFIID evidently binds in the minor groove of the TATA element (200, 201). Cyclic-permutation gel-retardation studies show that TBP, the TATA-binding subunit of TFIID, bends the DNA of its recognition region (201). Furthermore, the kinetics of TFIID binding are slow and require thermal energy (201), suggesting that the protein may also undergo a significant conformational change on binding. Because these proteins evidently nucleate, and become elements of, larger multiprotein complexes, their conformations and the trajectory of the associated DNA must be subject to severe constraints. This suggests that binding kinetics, as influenced by conformational changes in both the DNA and the proteins and by sequential binding of multisubunit components, may also be a factor in conferring binding specificity in this large multisubunit system. A downstream initiation element is required for efficient TFIID binding in the human gfa gene promoter (202). The region involved runs from the TATA box at about -25 to the downstream element between about +10 and +50 bp. This suggests that DNA bending or flexibility may be involved in the association of a rather long region of DNA with a transcription complex.
The wild-type sequence for this region is TTCATAAAGCCCCTCGCATCCCAGGAGCGAGCAGAGCCAGAGCAGGATGGAGAGGAGACGCATCACCTCCGCTGCTCGCCG, where the TATA box is identified as the CATAAAG element in the 5' region and the downstream element is the final 30 bp. This sequence is (G + C)-rich, especially in the downstream region, and it contains a number of approximately helically phased (CA)·(TG) putative kink elements. Figure 13 shows two computer-developed projections of this sequence into two dimensions using the estimated wedge (13) and helical twist (80) angles as employed in Figs. 10-12 and assuming that (CA)·(TG) elements can kink by roll through 45° into the major groove. It is clear that this sequence region can form a nearly planar bend of almost 180° that can bring the TATA region into proximity with the downstream initiation element. The two views in Fig. 13 are rotated about the 5' base-pair by about 40° with respect to each other to illustrate the putative planarity of the curvature in the DNA. As in
[Fig. 13: trajectory projections of 5'-TTCATAAAGCCCCTCGCATCCCAGGAGCGAGCAGAGCCAGAGCAGGATGGAGAGGAGACGCATCACCTCCGCTGCTCGCCG-3', shown in two views]
FIG. 13. Computer-developed DNA trajectory projection for the transcription factor IID (TFIID) binding site in the human gfa gene (123). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (CA)·(TG) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (48, 74, 78). Views shown are rotated 40° with respect to each other about an axis normal to the 5' base-pair.
Figs. 10-12, the parameters and assumptions used in obtaining Fig. 13 are indirect. Nevertheless, all but the helical twist values have some experimental basis, and Fig. 13 therefore provides a possible rationalization for the role of the downstream site in the TFIID initiation complex, and suggests a way in which multiple DNA-binding factors might be part of the same large, multisubunit transcription complex proposed previously (202).
POSSIBLE FLEXIBILITY IN THE TATA PROMOTER ELEMENT
The TATA promoter element (203) has a consensus sequence TATAAA(A); this sequence is quite highly conserved in a large number of eukaryotic
promoters. However, occasional variations occur in which (CA)·(TG) elements are substituted for (TA)·(TA). An example is the human gfa gene, in which the first (TA)·(TA) is replaced by a (CA)·(TG), but a TFIID complex that regulates the initiation of transcription in this system is nevertheless formed. This suggests that anisotropic flexibility might be an important feature of the TATA region, because there is evidence that both (TA)·(TA) and (CA)·(TG) base-stacks may kink when subjected to relatively minimal bending stress (40). A more dramatic example of the transposition of (TA)·(TA) to (CA)·(TG) has been reported in the fission yeast Schizosaccharomyces pombe (204). The sequence CAGTCACA is in the normal TATA location in this organism. Deletions or mutations in this sequence resulted in an almost complete loss of transcription initiation, and other evidence suggests that its function is identical to that of the TATA sequence, although the protein that binds to it appears to be different from TBP. Thus, although different organisms may utilize different sequence motifs in their transcriptional initiation sites, these motifs appear to utilize sequence elements that are functionally related through a common tendency toward anisotropic flexibility.
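To make the comparison above concrete, the sketch below lists the candidate kinking steps, (TA)·(TA) and (CA)·(TG), in the three TATA-position elements mentioned in the text. It only locates the dinucleotides the text singles out; whether they actually kink is the hypothesis under discussion, not something the scan tests. The helper name `flexible_steps` is illustrative.

```python
FLEXIBLE = ("TA", "CA", "TG")  # (TA)·(TA) and (CA)·(TG) steps, read on the top strand


def flexible_steps(seq, steps=FLEXIBLE):
    """0-based indices k where seq[k:k+2] is one of the putative kinking steps."""
    seq = seq.upper()
    return [k for k in range(len(seq) - 1) if seq[k:k + 2] in steps]


elements = {
    "TATA consensus": "TATAAA",
    "human gfa TATA": "CATAAAG",
    "S. pombe TATA-position element": "CAGTCACA",
}
for name, seq in elements.items():
    hits = flexible_steps(seq)
    print(f"{name:32s} {seq:10s} {[seq[k:k + 2] for k in hits]} at {hits}")
```

For the consensus TATAAA the scan finds two TA steps; in the gfa element CATAAAG the first of them becomes a CA step, as the text notes; and the S. pombe element CAGTCACA contains three CA steps and no TA steps.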
G. The NF-κB Protein and Its Binding to DNA
NF-κB is a pleiotropic transcription factor that can effect gene control in a highly tissue-specific fashion (reviewed in 205, 206). Its gene has recently been cloned, and analysis of the nucleotide sequence suggests that no zinc-finger domains are present (207, 208). NF-κB binds specifically to a number of DNA sequences about 10 bp long, of consensus GGGR(A/T)TYYCC depending on cell type, in various promoter and enhancer regions (reviewed in 205). It was originally identified at κB sites of the κ light-chain enhancer of B cells. It was thought to be functional only in this system (209), but later studies show that it plays many roles in many cellular systems, including T-cell activation, cytokine regulation, and the control of a number of viral systems. Cytomegalovirus (CMV) and SV-40 have NF-κB binding sites in their enhancers (205, 210). The HIV-1 enhancer has two NF-κB binding sites, one of which regulates transcriptional inducibility of the 5'-LTR in activated T cells (211). There is biochemical evidence suggesting that NF-κB is a homogeneous system. For example, binding specificity to a particular target and the pattern of base contacts in complexes with DNA seem to be independent of protein source (206). Extensive purification of NF-κB from human sources leads to a 50-kDa polypeptide, the DNA-binding form of which is a dimer (p50). The principal DNA-binding form of NF-κB is a heterotetramer that includes, in addition to p50, two nonbinding 65-kDa subunits (p65). The binding of the heterotetramer is influenced by the presence of zinc, although neither of the subunits exhibits a zinc-finger structural motif. An inhibitory
subunit, IκB, binds one each of the 50-kDa and 65-kDa subunits to produce a heterotrimer complex in unstimulated cells (212). The role of IκB evidently is to block the assembly of the heterotetramer. The 65-kDa subunit serves as a receptor for IκB (213) and therefore seems to function in NF-κB inactivation, which can occur even when the latter is tightly bound to DNA. The 65-kDa subunit also seems to modulate dramatically the binding of the p50 subunit to DNA (213). The p50 subunit alone binds with high affinity to palindromic sequences constructed from κB motifs (GGGACGTCCC and GGAAATTTCC, obtained from the five-base half-sites of the κB motif GGGACTTTCC), but the (50-kDa + 65-kDa) heterodimer binds 10- to 20-fold less strongly (213). The addition of the p65 subunit also limits the requirement of p50 for highly symmetric or 11-bp binding motifs, and hence seems to broaden the range of binding motifs accessible to NF-κB (214). It is also interesting that homodimers of p50, and heterodimers of the p50 and p65 subunits, seem to bind these sequence motifs with about the same affinity (215). The synergistic action of p65 in complex with p50 evidently contributes to the ability of NF-κB to bind specifically to, and discriminate among, a multiplicity of target sites, thus enabling it to control a variety of genes under many physical and biological conditions. However, the structural and biophysical mechanism(s) it uses to discriminate among its many potential binding loci are not well understood. Some evidence exists that this discriminatory ability is based on DNA bending (216). Results obtained using the cyclic-permutation gel-mobility-shift method imply that binding of NF-κB to DNA induces bending within the decameric κB binding domain (GGGACTTTCC), with the locus of bending near or at the 3' end. There appear to be no minor-groove contacts between the protein and the κB site. The bending angle induced by NF-κB has been estimated as about 110°. In contrast,
the bending angle due to the p50 dimer alone has been estimated as about 75°, with the bending locus symmetrically located near the center of the κB site; this suggests that the p65 subunit increases the magnitude, and alters the locus and symmetry, of DNA bending within the κB binding motif. DNase I cleavage patterns of this domain suggest no unusual structure in the absence of binding. In addition, computer modeling of the decameric κB binding domain (GGGACTTTCC), as discussed above, reveals little fixed curvature (13). Thus, the observed bending is evidently induced as a conformational change on binding of the protein. Comparative binding studies of the p50 homodimer and a heterodimer of 50- and 65-kDa subunits suggest a sequence of events in which the most conserved half-site is initially contacted by a p50 subunit, followed by DNA bending and the binding of a second p50 or p65 subunit (214). This suggests
that a kinetic mechanism may also play a role in allowing NF-κB to discriminate among its possible binding sites. This notion is consistent with the very low concentrations of NF-κB typically found in cells (217). It has been proposed that such a mechanism might be more important than the thermodynamic stability of the complexes, i.e., the binding constants (214). Clearly, the binding of NF-κB to its various cognate DNA-recognition sites and its ability to discriminate among them is a subtle phenomenon in terms of the structural and conformational factors involved in both the protein and the DNA partners. It will be important to determine the precise conformations of the recognition-site complexes with the individual subunits as well as with NF-κB in order to understand how this promiscuously binding transcription factor can function so selectively and yet so ubiquitously.
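The palindromic test sequences quoted above for the p50 binding studies can be derived mechanically from the κB motif: each five-base half-site, read toward the center on its own strand, is joined to its own reverse complement. The sketch below reproduces that construction; the helper names are illustrative and the code is not taken from the cited studies.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def half_sites(motif, k=5):
    """The two k-base half-sites of a 2k-bp motif, each read 5'->3' toward the center."""
    motif = motif.upper()
    return motif[:k], revcomp(motif[-k:])


def palindrome(half):
    """A perfect inverted repeat built from one half-site."""
    return half + revcomp(half)


left, right = half_sites("GGGACTTTCC")     # κB motif from the text
print(left, right)                          # GGGAC GGAAA
print(palindrome(left), palindrome(right))  # GGGACGTCCC GGAAATTTCC
```

The natural motif pairs one half-site of each kind, which is why neither symmetric construct is identical to GGGACTTTCC itself.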
H. Other DNA-binding Proteins with Putative Flexibility Elements in Their Recognition Sites
1. THE OccR TRANSCRIPTIONAL ACTIVATOR FROM Agrobacterium tumefaciens
Agrobacterium tumefaciens is widely used to transport and integrate foreign DNA into certain plants to produce transgenics. As part of this unusual pathogenesis, octopine released from crown-gall tumors serves as a nutrient and possibly as a signal source for the invading bacteria. Octopine catabolism requires the products of the OccQ operon, which is transcriptionally induced by octopine, and this induction is mediated by the OccR transcriptional activator. OccR binds with high affinity to a single promoter site between the OccR and OccQ genes, which are divergently transcribed. Octopine binds to a single site in OccR. Octopine binding to OccR shortens the DNase I footprint of the protein on its recognition site (218). This shortening appears to result from the relaxation of a bend in the DNA; octopine binding evidently alters neither the binding affinity nor the sequence specificity of the protein. The OccR footprint maps to a 56-bp sequence, TGCATTCGGTCAAATTCATAATGACCGGGCAAGAATAAGCAGATGTTATGGTGCGT. The locus of bending was found by cyclic-permutation gel retardation to be about 26 bp from the 5' end, or a little to the right of center in this sequence. With bound octopine, the magnitude of the bend was estimated as about 62°; in its absence, the bend relaxed to about 46°. Figure 14a shows an axial projection of the DNA obtained in a fashion similar to those in Figs. 10-13. A small bend in the DNA is predicted near the center of this sequence, due largely to the presence of AA and GA base-stacks in correct array. Figure 14b shows the axial trajectory if both (CA)·(TG) and (TA)·(TA) elements are allowed to flex through 45°, and Fig.
[Fig. 14: trajectory projections of 5'-TGCATTCGGTCAAATTCATAATGACCGGGCAAGAATAAGCAGATGTTATGGTGCGT-3', panels (a)-(c)]
FIG. 14. Computer-developed DNA trajectory projection for the binding site for the OccR transcriptional activator from Agrobacterium tumefaciens (138). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (a) The sequence was modeled using only the wedge and twist angles. Views shown are rotated 90° with respect to each other in a clockwise sense about an axis normal to the 5' base-pair. (b) All (CA)·(TG) and (TA)·(TA) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (3, 40, 48, 74, 78). Views shown are rotated 90° with respect to each other in a clockwise sense about an axis normal to the 5' base-pair. (c) (CA)·(TG) dinucleotide elements are kinked through 45° by roll angle only into the major groove, as suggested by available experimental evidence (48, 74, 78). Views shown are rotated 60° with respect to each other in a clockwise sense about an axis normal to the 5' base-pair.
14c shows the trajectory if such flexure is restricted to (CA)·(TG) elements. As in the case of the TFIIIA internal control region, discussed in Section II,C,1, the inclusion of (TA)·(TA) flexing elements improves the planarity of the putative bend, which is broadly distributed about the center of the sequence. Considerable out-of-plane bending is predicted in Fig. 14c throughout the 3' half of the sequence, but this is substantially reduced in Fig. 14b. The OccR complex is unusual in that the modulation of transcriptional regulation by a bound mediator appears to occur by a DNA-bending mechanism rather than by changes in binding affinity or operator occupancy. In this case, indirect readout may direct binding specificity and may carry an additional functional component as well.
2. INITIATION COMPLEX FOR PHAGE φ29 DNA REPLICATION
Initiation of replication of the linear double-stranded DNA of phage φ29 from Bacillus subtilis is activated by a viral protein, p6, which forms a complex with double-stranded DNA at the replication origins. The p6 protein binds as a dimer every 24 bp, causing the DNA to form a right-handed superhelix around the multimeric protein core. The DNA-bending properties of this complex have been investigated using an oligomer consisting of direct repeats of a 24-bp target sequence for p6 binding at the replication origin (219). From the linking-number changes due to p6 binding to oligomers containing different numbers of the 24-bp precursor sequence, along with DNA compaction and the known helical repeat in the complex, the superhelical trajectory of the DNA in the complex could be determined. The observed trajectory requires that the DNA curve through a very large angle, about 66° for every 12 bp. This degree of curvature is considerably larger than any experimental observation of fixed bending in DNA (27, 36). Hence, it seems likely that it can only be achieved by kinking the DNA at least once per helical turn.
The precursor sequence used in the above experiments was CTTAATATCGACATAATCGTCGA. This has a number of putative flexibility sites. Figure 15 shows an axial trajectory projection as in earlier illustrations, allowing both (CA)·(TG) and (TA)·(TA) elements to kink through 45° into the major groove. The resulting curvature is almost planar and curves through about 90° for this set of assumptions. As in other examples, allowing kinking only at (CA)·(TG) elements leads to a greater lack of planarity in the bend. It therefore seems likely that the observed DNA curvature in the p6 complex can be understood in terms of DNA flexibility elements, and that other, more draconian structural dislocations need not be invoked to explain the large curvature observed in this system.
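A cruder, first-order way to see why adding (TA)·(TA) kinks can increase the overall bend is to treat each 45° kink as a vector pointing at its helical phase and add the vectors. The sketch below does this for the p6 site; it is an illustrative small-angle approximation, not the three-dimensional procedure behind Figs. 14 and 15, and it ignores intrinsic wedges, planarity, and sequence-dependent twist. The helper names and the uniform 10.5-bp repeat are assumptions.

```python
import cmath
import math


def flexible_steps(seq, steps):
    """0-based indices k where seq[k:k+2] is one of the given dinucleotide steps."""
    seq = seq.upper()
    return [k for k in range(len(seq) - 1) if seq[k:k + 2] in steps]


def net_bend(positions, kink=45.0, helical_repeat=10.5):
    """First-order estimate of the overall bend (degrees) from phased kinks.

    Each kink is a vector of length `kink` pointing at helical phase
    360*k/helical_repeat; the vectors are summed. Valid only for small kinks.
    """
    total = sum(cmath.exp(2j * math.pi * k / helical_repeat) for k in positions)
    return kink * abs(total)


p6_site = "CTTAATATCGACATAATCGTCGA"  # sequence as given in Fig. 15
for label, steps in (("CA/TG + TA/TA", ("CA", "TG", "TA")), ("CA/TG only", ("CA", "TG"))):
    hits = flexible_steps(p6_site, steps)
    print(f"{label:14s} kinks at {hits}  net bend ~{net_bend(hits):.0f} deg")
```

Kinks spaced a full helical turn apart add; kinks half a turn apart cancel. The model therefore captures phasing but cannot judge the planarity that the figures address.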
[Fig. 15: trajectory projections of 5'-CTTAATATCGACATAATCGTCGA-3', three views]
FIG. 15. Computer-developed DNA trajectory projection for the binding site for the p6 replication activator protein of Bacillus subtilis (139). The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. Rotation about an axis normal to the 5' base-pair is 0°, 90°, and 60°, respectively, for the panels from left to right.
I. Plant Regulatory Proteins with Provocative DNA Recognition Sequences
Plant DNA-binding proteins that regulate plant gene expression are being identified with increasing frequency using either transgenic plants or transient expression assays (reviewed in 220-222). Structural information on their interactions with DNA is presently limited to the interaction of TBP from A. thaliana with the minor groove of DNA (122). However, many of the other DNA-recognition sequences for plant proteins contain provocative elements, which may imply anisotropic flexibility at (CA)·(TG) dinucleotides in individual protein-binding sites, or more fixed bending through phased (CA)·(TG)-containing regulatory motifs and (A + T)-rich sequences. It is likely that biological mechanisms in DNA-protein interactions for plant gene regulation are conserved relative to those of other biological systems. This was first demonstrated by the use of yeast GAL4 derivatives to
stimulate gene expression in tobacco leaf protoplasts (223). Many of the regulatory-factor protein structures first described in prokaryotes and mammalian systems appear to function also in plants. In addition, plants are providing examples of new variants both in the promoter organization of regulatory motifs and in the possible protein structures that interact with these elements. The following discussion therefore focuses on plant proteins and DNA-recognition sequences that may portend a role for indirect readout in their regulatory interactions.
1. THE G-BOXMOTIF AND BASICLEUCINE-ZIPPER PROTEINS The palindromic G-box motif (CCACGTGG) appears in many plant gene promoters that are photoregulated, as well as in those that respond to plant hormones and other stimuli. Nuclear proteins interact with this sequence specifically (224-226). Several members of this GBF family of nuclear DNAbinding proteins have been cloned (227-231), and all belong to the class of basic leucine-zipper (bZZP) proteins (see Section 11,C). The DNA-binding domain in these bZZP proteins consists of a region enriched in basic amino acids. The adjacent leucine zipper (232, 233), which forms an amphipathic a-helix, permits dimerization (234, 235) of bZZF proteins and thus potentially can provide additional diversity in regulation through heterodimerization (227). Binding of bZZP proteins to their recognition sites is contingent on dimerization of the proteins (236), and the basic DNA-binding element assumes an a-helical conformation that is stabilized by DNA interactions (237). These requirements suggest that a number of kinetic steps must occur before a functional complex can interact with the major groove of DNA, such as in the “forceps model” discussed in II,E,3 (188). These kinetic steps are likely to be influenced both by the protein and by DNA structures. The center of the plant G-box motif contains the “core” CACGTG sequence, which has the same putative flexibility elements as the prokaryotic promoters discussed earlier. The binding of the bZIP proteins to this sequence, however, appears to be regulated by additional sequence elements flanking the core sequence. 
These flanking elements affect binding affinities for a variety of binding factors to individual DNA sequences in vitro (238241) and confer different expression patterns in transgenic plants in vivo (242).In the forceps model (188), contacts of the bZIP basic region occur at the two halves of the AP-1 (ATGACTCAT) pseudodyad with the protein oriented about the central C*Gbase-pair of the binding site. The inverted symmetry of the G-box core motif does not permit completely symmetric interactions of the two bZZP a-helices across the major groove, separated by a half-helical turn. These interactions may therefore be sensitive to small alterations in local DNA conformation in the core CACGTG sequence or to the contributions of flanking sequences. This may lead to binding &nity
250
RODNEY E. HARRINGTON AND ILGA WINICOV
differences resulting from induced conformational changes in the proteins or in the complex with DNA. Thus, indirect readout may play a significant role in hZIP protein interactions with the G-box-like elements in plant promoters, especially in sequences with the potential for multiple indirect readout signals. Further changes in the core sequence would be expected to affect the binding properties significantly. An example of this can be found in the Opuque-2 bZIP transcriptional activator, which recognizes a G-box-like element with overall pseudodyad symmetry that mimics one-half of the canonical 6-box binding site, but differs in the other half: CCACGTAG. Opaque-2 can interact with this sequence as a homodimer (243) or as Theterodimer (244), suggesting structural adaptability in this site. The role of individual bases in binding affinity has also been shown by competition studies for protein binding to the Arabidopsis G-box-like element TGACGTGG (239) that contains two (TG)*(CA)putative flexibility elements separated by a halfhelical turn, which may contribute to differential protein interactions with this site. The apparent restrictions dictated by the pseudodyad symmetry of the forceps model of bZlP protein-DNA interactions (188) suggest a similar mode of interaction between the OCSBF bZIP proteins from maize with their recognition sequence. The OCSBF-1 -encoded protein recognizes a 20bp DNA sequence with dyad symmetry (TGACGTAAGCGCTI'ACG-TCA)as well as the animal AP-1 (TGACTCA) and CREB (TGACGTCA) recognition sites (245). Other bZIP proteins encoded by TGAl a and TGAl b specifically bind to TGACG, an even shorter DNA sequence (246). However, it is not clear whether the concatamers used in individual experiments (246)contributed structurally to the recognition sequences in these investigations. It will require additional crystallographic studies on different bZIP protein-DNA complexes to resolve these questions.
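The symmetry arguments in this subsection can be checked at the sequence level by comparing each site with its own reverse complement. The Python sketch below is illustrative only: it tests sequence symmetry, not structural symmetry, and the site labels are names chosen here for the sequences quoted above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dyad_mismatches(seq):
    """0-based positions where seq differs from its own reverse complement.

    An empty list means a perfect palindrome (exact dyad symmetry).
    Mismatches come in symmetric pairs, except that an odd-length site
    always mismatches at its central base pair, which lies on the dyad.
    """
    rc = reverse_complement(seq)
    return [i for i, (a, b) in enumerate(zip(seq.upper(), rc)) if a != b]


# Sequences quoted in the text; the dictionary keys are labels chosen here.
SITES = {
    "G-box": "CCACGTGG",
    "Opaque-2 G-box-like": "CCACGTAG",
    "Arabidopsis G-box-like": "TGACGTGG",
    "OCSBF-1 site": "TGACGTAAGCGCTTACGTCA",
    "AP-1": "ATGACTCAT",
}

if __name__ == "__main__":
    for name, site in SITES.items():
        print(f"{name:24s} {site:22s} mismatches at {dyad_mismatches(site)}")
```

For the G-box and the 20-bp OCSBF-1 site the mismatch list is empty (perfect dyads). The Opaque-2 site gives one symmetric pair of mismatches (positions 1 and 6), i.e., one altered half-site, while the nine-base AP-1 site differs from its complement only at the central C·G pair, which is what makes it a pseudodyad rather than a palindrome.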
2. PUTATIVE ZINC-BINDING DOMAIN PROTEINS IN PLANTS
A very limited number of potential zinc-finger regulatory proteins in plants have been described. The gene encoding a metal-dependent DNA-binding protein (3AF1) cloned from tobacco recognized a very adenine-rich motif (AAATAGATAAATAAAAACAT) in the pea rbcS-3A promoter (247). This interaction was abolished by mutating the T in the GATA sequence and two of the central AA residues, but detailed information on the potential role of DNA bending in 3AF1 binding is still lacking. The amino-acid sequence predicted from the 3AF1 clone contains two repeated cysteine- and histidine-rich segments, but does not conform to any currently known zinc-finger structure. The binding site recognized by 3AF1 also contains the GATA motif. The family of trans-acting factors that recognize GATA in mammalian cells share a highly conserved Cys4 zinc-finger DNA-binding domain (248, 249). In plants, the GATA motif is found in many promoters and is recognized by the ASF-2 (250) and GA-1 (226, 251) factors, but their relationship to the 3AF1 protein is not presently understood.

Another putative zinc-finger protein sequence, encoded in a cDNA (Alfin-1), has been isolated by differential screening from salt-tolerant alfalfa cells (252). The predicted amino-acid sequence of this molecule contains a proline-rich region and an acidic region as well as one putative Cys4 and one putative (HisCys3) structure, which resemble the novel zinc-finger family of proteins described by Freemont et al. (253) and by Haupt et al. (254). However, the DNA-recognition sequence for this protein is currently unknown.

The Petunia EPF1 factor (255) provides the first example of a (Cys2His2)-type zinc-finger protein in plants. The protein binds to at least four sites in the Petunia 5-enolpyruvylshikimate-3-phosphate synthase gene (EPSPS), which is regulated in a developmental- and tissue-specific manner. Three of these sites show sequence similarity (TTACTNNNAT), and the fourth is a palindromic sequence (TGACAGTGTCA) in which the two (CA)·(TG) sequence elements are almost a half-helical turn apart and nearly on opposite sides of the DNA.

A very unusual clone of an Arabidopsis regulatory gene encodes a protein, COP1, with both zinc-binding motifs and a G-protein-related domain (256). This protein appears to function as a repressor of photomorphogenesis in darkness and may also function in transcriptional regulation as well as interact with the G-protein signaling pathway. The putative zinc-finger domain encodes one (Cys3His) and one Cys4 zinc-binding motif, but its binding sequence is not currently known. It is interesting that, so far, three of the cloned plant-protein sequences exhibit unusual putative zinc-binding motifs. The distinctive folds of these zinc-binding modules and their interactions with DNA remain to be determined.
3. AT-RICH BINDING SITES IN PLANT PROMOTERS

DNA bending at An tracts, where n is greater than about 3 to 5, has been well documented (22-26). Thus, the presence of such tracts in plant promoter regions that are binding sites for nuclear proteins may indicate that sequence-dependent bending plays a role in DNA-protein recognition in these regions. A nuclear protein, AT-1, binds specifically to (A + T)-rich elements in the pea rbcS gene (257). Similar elements are also found in photoregulated genes from other species. The complete binding domain for AT-1 in this gene consists of two overlapping AT-rich elements with the sequence CITATATATITITAA3TA"lTATTCTCTTAA, which extends from -566 to -533 in the upstream promoter region. In this sequence, the (A·T)n tracts are spaced exactly at integral helical periodicity and thus could bend the DNA recognition site in a coherent manner. A (C + A)-rich element in the Arabidopsis cab-140 gene promoter is the binding site for the phosphoprotein CA-1 (258) and also contains an adjacent stretch of As. A constitutively active rice actin promoter shows a particularly interesting region, where 33 of 37 bases are A with interspersed G residues (259). The same promoter contains seven repeats of CCCAA, which effectively space six (CA)·(TG) dinucleotides a half-helical turn apart. Protein binding to (A + T)-rich sites has also been observed in promoters responsive to a variety of stimuli (260-264) and may involve A-tracts or phased repeats of A·T sequences. However, understanding the specific mechanism of these interactions will have to await cloning of the genes for the specific binding proteins and detailed studies of their structures.
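The phasing argument used for the (A·T)n tracts and the CCCAA repeats reduces to simple arithmetic: a feature k base-pairs downstream of another lies 360k/10.5 degrees farther around the helix, assuming a helical repeat of 10.5 bp per turn. The Python sketch below applies this to a hypothetical arrangement in which the seven CCCAA repeats are contiguous; the actual flanking context in the rice actin promoter is not given here.

```python
HELICAL_REPEAT = 10.5  # bp per turn for B-DNA in solution (assumed value)


def step_positions(seq, dinucleotide):
    """0-based indices i at which seq[i:i+2] equals the given dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def helical_phases(positions, repeat=HELICAL_REPEAT):
    """Angle (degrees, 0-360) of each step around the helix axis,
    measured relative to the first step in the list."""
    if not positions:
        return []
    first = positions[0]
    return [((p - first) * 360.0 / repeat) % 360.0 for p in positions]


# Hypothetical arrangement: seven contiguous CCCAA repeats.
repeats = "CCCAA" * 7
ca = step_positions(repeats, "CA")
spacings = [b - a for a, b in zip(ca, ca[1:])]

print("CA steps at:", ca)
print("spacings (bp):", spacings)
print("phases (deg):", [round(x) for x in helical_phases(ca)])
```

Under these assumptions the CA steps fall every 5 bp, so successive steps are about 171° apart, i.e., on nearly opposite faces of the helix. Features spaced by about 10 to 11 bp, like the A·T tracts of the AT-1 site, instead return to within about 20° of the same face, so their bends can add.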
4. HELIX-TURN-HELIX-TURN-HELIX REGULATORY PROTEINS IN PLANTS

The search for light-responsive elements in plant gene promoters has focused largely on the signal transduction system that originates with the active form of the photoreceptor phytochrome (Pfr). This has led to the identification of two GT-rich elements associated with the response to light. The box-II element, TCTGTGGTTAATATG, which in tandem copies conferred light-responsive expression of a reporter gene in transgenic tobacco (265), contains multiple (CA)·(TG) and (TA)·(TA) elements and, as noted above, might be expected to confer significant localized flexibility on the promoter element. Two genes, B2F and GT-1a, that encode DNA-binding proteins recognizing this sequence have been identified (266, 267). Both proteins appear to have three α-helical regions that may interact with DNA, but their individual specificities are not known. A third protein from rice (GT-2) recognizes a GT-rich motif in the phyA promoter and has a similar triple helix-turn-helix structure (268, 269). The unusual GT-2 factor contains two autonomous DNA-binding domains, each with a triple helix-turn-helix structure, that discriminate among three GT-rich motifs in the phyA promoter. Thus the triple helix-turn-helix motif may represent a new class of DNA-binding proteins. As work in this area progresses, it will be interesting to see what similarities or differences in DNA binding exist among these plant proteins and the prokaryotic and homeodomain helix-turn-helix regulatory proteins.
5. SPATIAL ARRANGEMENT OF DNA-BINDING MOTIFS

One of the remarkable characteristics of most plant gene promoters is the array of control elements that are repeated, often with variation, in different parts of a single promoter as well as in different order among individual
members of a gene family. The different organization of DNA control elements in the rbcS (270, 271) and cab (272) gene families demonstrates the extent of such variation within each gene family. It is likely that individual base variations within a motif, as well as the spatial arrangement of the different motifs, contribute to the indirect readout that determines the affinity with which functional DNA-protein interactions are established. As demonstrated by DNase footprint analysis of organ- and development-specific DNA-protein interactions in tomato rbcS genes, individual DNA-protein interactions measured in vitro are necessary but not sufficient for transcriptional activity (271). It is not yet clear whether protein modifications or interactions with additional proteins contribute in subtle ways to the structure of the functional DNA-protein complex or act through kinetic pathways. However, both the specific nature and the spacing of flexibility elements in DNA-binding sites are likely to contribute to some of the discriminatory and kinetic aspects of DNA site interactions with regulatory proteins.
III. Models of Sequence-directed Structure-Function Relationships in Selected Regulatory Systems

The preceding sections detail the precise fitting together of DNA binding-sites and their cognate sequence-specific binding proteins. The possible effects of such structure-directed fitting on DNA-protein interactions within these sites were also discussed, along with the putative roles of both direct and indirect readout of binding information. In a number of systems, the contacts between individual bases in specific DNA binding-sites (recognition matrices) and appropriately positioned amino-acid side-chains in the protein-recognition element (direct readout) have been probed with large arrays of mutants to try to understand their roles in regulatory DNA-protein interactions. Accordingly, we have taken published data from two promoters with extensive collections of mutants and have analyzed their putative sequence-directed DNA structures to assess possible contributions of indirect readout to the results. Both the mouse βmaj-globin promoter (273-275) and the oat phyA3 promoter (276) contain several regulatory sequences. These sequences have been mutated either by linker-scan mutagenesis or by point mutations without affecting their spacing relative to the TATA box or the +1 transcription start site, and each mutation has been tested for function by gene expression in vivo. The results offer some interesting insights and raise questions about the complicated mechanics of regulatory protein-DNA interactions.
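Linker-scan mutagenesis keeps the spacing of promoter elements fixed because each block of wild-type bases is replaced by a linker of the same length. The Python sketch below illustrates this with the PE1 sequence shown in Fig. 17a. The function names, the start position, and the six-base linker (an XhoI site, CTCGAG) are hypothetical choices for illustration, not the linkers used by Bruce et al. (276).

```python
def linker_scan(seq, start, linker):
    """Replace len(linker) bases of seq beginning at 0-based `start`.

    The mutant keeps the original length, so every element downstream
    of the substitution stays at the same distance from the TATA box
    and the +1 transcription start site.
    """
    end = start + len(linker)
    if start < 0 or end > len(seq):
        raise ValueError("linker window falls outside the sequence")
    return seq[:start] + linker + seq[end:]


def changed_positions(wild_type, mutant):
    """0-based indices at which two equal-length sequences differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


pe1 = "GGCTGGAAATAGCAAATGTTAAAAATAAAGGTGA"  # PE1 as shown for Fig. 17a
mutant = linker_scan(pe1, 10, "CTCGAG")      # hypothetical linker and position

print(mutant)
print("same length:", len(mutant) == len(pe1))
print("changed positions:", changed_positions(pe1, mutant))
```

Here the mutant keeps all 34 positions, but only four bases actually change (positions 10, 11, 13, and 15) because two linker bases happen to match the wild type. This is worth keeping in mind when comparing how many bases different linker-scan mutants actually alter.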
A. The Proximal βmaj-globin Promoter in Mouse

The expression of globin genes during erythroid differentiation is controlled by cis-acting DNA elements and trans-acting regulatory factors at both the promoter and the locus-control regions (reviewed in 277, 278). Experiments with deletion mutants of cloned hybrid genes transferred into murine erythroleukemia (MEL) cells showed that a minimal promoter of 106 bp of β-globin 5'-flanking sequence directed correctly initiated transcripts, and that these transcripts were induced by MEL cell differentiation (274). Several important regulatory elements have been identified within the 106-bp region: the TATA box (from -30 to -26), a CCAAT box (from -79 to -72), a CACCC element (from -95 to -87), the imperfect repeat (PDRE) element AGGGCAGGAGCCAGGGCAGAGC (from -53 to -32), and a GATA-like element on both strands. Binding factors for these sites have been identified in MEL cells (279, 284). The region appears not only to be crowded with potential binding factors, but also to require a combination of several factors to mediate transcription from the minimal promoter (275, 279). The matter is further complicated by the recent finding that the erythroid cell-specific factor that binds to the CACCC element, which is related to the Kruppel family of proteins, also binds the sequences (CCA)·(TGG), (CAC)·(GTG), and (CCT)·(AGG) (284). Point mutations in most of the 106 bases of the mouse βmaj proximal promoter have been tested in both long-term and transient assays in HeLa and MEL cells to identify sequences involved in transcription and the induction response (273-275). Figure 16 shows computer-developed DNA trajectory projections of the promoter region from base-pairs -97 to -26, in an effort to identify regions with potential indirect-readout information and to correlate these with the known functional regions. Using the same helical parameters in the modeling as discussed earlier (Fig.
11), it is clear that the DNA structure of the mouse βmaj-globin promoter shows a remarkable pattern of flexible sites arranged in a planar manner, except for the TATA site. Each of the known protein-binding sites lies in a region of putative flexibility. In addition, the relatively planar overall conformation suggests that proteins binding at sites distant from one another in this sequence could interact. Interactions of proteins at nonadjacent sites have been suggested as the mechanism for negative regulation of the βmaj-globin promoter (285) and may well play a quantitative role in transcriptional activation from the proximal promoter. This possibility adds another layer of complexity to comparing and interpreting nuclear-protein binding studies that use oligonucleotides specific for a single protein with those that use DNA fragments carrying several binding sites. Our model is supported by recent findings that the protein PDRf, which binds the PDRE site in the βmaj-globin promoter, bends the DNA by an estimated 90° at this site (285a). In addition, the binding affinity of PDRf was substantially higher for the intact β-globin promoter than for the isolated PDRE site (285a).

[Figure 16 panels: two views (0° and 90°) of the trajectory for 5'-gGAGCCACACCCTGGTAAGGGCCAATCTGCTCACACAGGATAGAGAGGGCAGGAGCCAGGGCAGAGCATATAA-3', positions -97 to -26.]
FIG. 16. Computer-developed DNA trajectory projection for the mouse βmaj-globin proximal promoter sequence. Underlined regions represent known protein-binding sites. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand and is identified. Views shown are rotated 90° with respect to each other about an axis normal to the 5' base-pair. The extra guanine at the 5' end (lower-case typeface) is added to clarify the computer-modeling presentation.

We used the same modeling methods to examine the effect of various point mutations in the CACCC element, the CCAAT box, and the PDRE element on the DNA conformation depicted in Fig. 16. These same mutations affect transcription in MEL cells (274). A summary of results is given in Table I, where (+) denotes a mutant that changes the wild-type binding-site conformation and (-) denotes no significant change. These results demonstrate a remarkable correlation between conformational changes in DNA and changes in the transcriptional activity of individual mutants. Individual mutants listed in Table I caused changes in the DNA trajectory ranging from alterations in the microarchitecture of DNA bending/flexing regions to variations in the planar structure of the entire proximal promoter region. Such changes in DNA planarity might well affect interactions between binding proteins brought together by DNA looping. These correlations therefore suggest that indirect readout very likely plays a role in the functional assembly of the mouse βmaj-globin promoter complex with its binding proteins.
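The trajectory projections in Figs. 15-17 are built by chaining local rotations, one per dinucleotide step. The Python sketch below shows the idea under deliberately simplified assumptions: each step contributes only a twist and a roll (wedge) angle, the rise is 3.38 Å, and the example uses a uniform 36° twist and 10° wedges chosen for illustration. It does not reproduce the wedge compilation of Bolshoy et al. (13) or the twist values of Kabsch et al. (80), which also include tilt components; the lookup tables passed to steps_from_sequence are hypothetical placeholders.

```python
import math

RISE = 3.38  # Å advanced along the local helix axis per step (assumed)


def _matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def _rot_z(deg):
    """Rotation about the local helix (z) axis: helical twist."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rot_y(deg):
    """Rotation about the local base-pair long (y) axis: roll/wedge."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def trajectory(twists, rolls, rise=RISE):
    """Helix-axis coordinates for a chain of base-pair steps.

    For step i the local frame is turned by twists[i] about its z axis,
    then by rolls[i] about its y axis, and the axis point advances by
    `rise` along the new local z axis.
    """
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    points = [(0.0, 0.0, 0.0)]
    for twist, roll in zip(twists, rolls):
        frame = _matmul(frame, _matmul(_rot_z(twist), _rot_y(roll)))
        x, y, z = points[-1]
        # Column 2 of `frame` is the local z axis in global coordinates.
        points.append((x + rise * frame[0][2],
                       y + rise * frame[1][2],
                       z + rise * frame[2][2]))
    return points


def end_to_end_ratio(points, rise=RISE):
    """End-to-end distance over contour length; 1.0 means straight."""
    return math.dist(points[0], points[-1]) / (rise * (len(points) - 1))


def steps_from_sequence(seq, wedge_table, twist_table, default_twist=34.3):
    """Per-step (twists, rolls) looked up by dinucleotide; steps missing
    from the tables get no wedge and the default twist."""
    dinucs = [seq[i:i + 2].upper() for i in range(len(seq) - 1)]
    return ([twist_table.get(d, default_twist) for d in dinucs],
            [wedge_table.get(d, 0.0) for d in dinucs])


if __name__ == "__main__":
    n = 40
    twists = [36.0] * n  # exactly 10 bp per turn (assumed)
    phased = [10.0 if i % 10 == 0 else 0.0 for i in range(n)]
    antiphased = [10.0 if i in (0, 5, 10, 15) else 0.0 for i in range(n)]
    for name, rolls in (("straight", [0.0] * n), ("phased", phased),
                        ("antiphased", antiphased)):
        print(name, round(end_to_end_ratio(trajectory(twists, rolls)), 4))
    # Hypothetical wedge table: 8° of roll at AA/TT steps only.
    tw, rl = steps_from_sequence("AAAAAACGCG" * 4, {"AA": 8.0, "TT": 8.0},
                                 {}, default_twist=36.0)
    print("A-tract model", round(end_to_end_ratio(trajectory(tw, rl)), 4))
```

With a 36° twist, wedges ten steps apart all point the same way and accumulate into a smooth bend, whereas wedges five steps apart point in opposite directions and, in this roll-only model, cancel in pairs; the phased chain therefore has the smaller end-to-end ratio. Comparing such ratios, or the full point lists, for wild-type and mutant sequences is one way to quantify the kind of comparison summarized in Tables I and II.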
B. The Phytochrome (phyA3) Promoter in Oats

The plant photoreceptor phytochrome represses transcription of its own phyA gene via a signal pathway after conversion to the active Pfr form in red light (reviewed in 286, 287). Deletions and substitutions identified several phyA promoter elements involved in transcriptional activation of these genes (268, 288) and indicated that the minimal promoter (-400) provides sufficient information for high-level expression from the phytochrome promoter
TABLE I
MOUSE βmaj-GLOBIN PROMOTER: MUTANT TRANSCRIPTIONAL ACTIVITY AND POTENTIAL STRUCTURAL CHANGES
DNA position“
$1utation“
(%)
Structural change
Wild tlpe
-
None
100
Wild type
PDRE
-33 -25 - 37 -39 - 40 - -12 --16 - -18
G to T G to A C to A G to A G b to A C” to ‘4 G to A A to T G to A
80 97 119
+
Element
-.SO
CACAA CCA.4T
GCC.4CACCC
-G
--1.3 - 78 - 79 - 87
-91 - 93
-95
TranscriptioncJ
100
104 106 85 107 80
C to A
69
h to G C to A G to ‘4
29
C to T
c to A
C to T G to A
88
-e
122
+
25 20 27
+
34
t
+
+
in the dark, as well as phytochrome regulation by red light. Transient expression of linker-scan mutants of the PE1 and PE3-RE1 elements of the phyA3 promoter (276) tested the function of individual segments of these elements in both transcription and light regulation. Figure 17 shows the results of computer conformation modeling of the DNA trajectory, similar to that described above for the β-globin system, on wild-type PE1 and PE3-RE1 from the oat phyA3 promoter (276). As we did for β-globin, we tested the effect of each of the linker-scan mutations on the predicted DNA conformation. These data are summarized in Table II, where (+) indicates a mutant that leads to an apparent conformational change and (-) indicates no apparent change. The results are quite provocative. In Fig. 17a, the PE1 element of the phyA3 promoter is modeled from position -372 to -338; this positive regulatory region shows a planar loop conformation with putative flexible sequence elements spanning the linker-scan mutant sites. Linker-scan mutants 614, 615, and 616 showed decreased transcription in the dark, whereas 617 showed normal levels of transcriptional activity (276) (Table II). Each of the four mutant sequences carries substitutions at six or seven bases, yet only 614, 615, and 616 show significant changes in transcriptional activity. Mutants 614, 615, and 616 also show significant conformational changes in the planarity of the looped conformation or in the directionality of bend sites. Mutant 617, on the other hand, retains both the wild-type level of transcriptional activity and the wild-type conformation. Similar results are obtained with linker-scan mutations that define the positive PE3 element and the negative RE1 element of the oat phyA3 promoter (276). The wild-type DNA trajectory projection of these adjacent sequence elements is shown from -115 to -65 in Fig. 17b, and the conformational modeling results are also summarized in Table II.
The boundaries of PE3 are marked by putative flexibility elements, the most pronounced of which lies at the apparent boundary between PE3 and RE1. The sequence trajectory for mutant 641, which showed wild-type activity in the dark and red-light repression (276), was essentially the same as that of the wild type (Table II). On the other hand, mutants 639, 638, and 637 had more highly bent conformations, and mutant 636 a less bent conformation, than the wild type, and all four mutants showed loss of transcriptional activity in the dark (276). The conformation of mutant 635 DNA appeared very similar to that of the wild type, which correlates well with its substantial level of transcriptional activity in the dark (276). The two mutants in the RE1 element have opposite effects on potential bending. Mutant 634 abolishes the putative flexibility site at the PE3-RE1 boundary, whereas mutant 640 introduces a new flexibility site in the RE1 element. Both mutants have lost transcriptional repression in red light, indicating that both positive and negative regulatory
[Figure 17 panels: (a) PE1, 5'-GGCTGGAAATAGCAAATGTTAAAAATAAAGGTGA-3' (positions -372 to -338); (b) PE3-RE1 (positions -115 to -65).]
FIG. 17. Computer-developed DNA trajectory projection for the oat phyA3 regulatory elements. The following helical parameters were used: dinucleotide wedge angles (combined roll and tilt angles) were taken from the compilation of Bolshoy et al. (13) and helical twist values from Kabsch et al. (80). The 5' base in the sequence is at the bottom of each strand. (a) The PE1 element in two views rotated 30° with respect to each other about an axis normal to the 5' base-pair. (b) The PE3-RE1 element in two views rotated 90° with respect to each other about an axis normal to the 5' base-pair.
TABLE II
THE phyA3 PROMOTER: LINKER-SCAN MUTANT TRANSCRIPTIONAL ACTIVITY AND POTENTIAL STRUCTURAL CHANGES(a)

Element   Mutant clone       Activity (dark)   Activity (red light)   Structural change
PE1       449 (wild type)    100               20                     Wild type
          614                20                35                     +
          615                35                20                     +
          616                35                30                     +
          617                95                25                     -
PE3       449 (wild type)    100               25                     Wild type
          641                100               25                     -
          639                45                30                     +
          638                30                30                     +
          637                30                35                     +
          636                45                30                     +
          635                70                45                     -
RE1       634                95                95                     +
          640                80                85                     +

(a) Data from Bruce et al. (276).
interactions in the phyA3 promoter may utilize DNA conformational information to achieve optimal DNA-protein fits required for regulatory functions.
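One way to make the correlation in Table II explicit is to count how often the predicted structural change agrees with a drop in dark activity. In the Python sketch below the 50% cut-off for "reduced" activity is an assumption made here, not a criterion used by Bruce et al. (276); the RE1 mutants are left out because their phenotype is loss of red-light repression rather than loss of activation.

```python
# Rows from Table II (data of Bruce et al., ref. 276) for the positive
# elements: (element, clone, % activity in the dark, structural change).
ROWS = [
    ("PE1", 614, 20, "+"), ("PE1", 615, 35, "+"), ("PE1", 616, 35, "+"),
    ("PE1", 617, 95, "-"),
    ("PE3", 641, 100, "-"), ("PE3", 639, 45, "+"), ("PE3", 638, 30, "+"),
    ("PE3", 637, 30, "+"), ("PE3", 636, 45, "+"), ("PE3", 635, 70, "-"),
]

THRESHOLD = 50  # % of wild type counted as "reduced" (assumed cut-off)


def concordance(rows, threshold=THRESHOLD):
    """Count mutants whose predicted structural change (+/-) agrees with
    reduced (< threshold) versus retained dark activity."""
    agree = sum((change == "+") == (dark < threshold)
                for _, _, dark, change in rows)
    return agree, len(rows)


if __name__ == "__main__":
    agree, total = concordance(ROWS)
    print(f"{agree} of {total} positive-element mutants agree")
```

With the 50% cut-off all ten positive-element mutants agree; raising the cut-off to 80% drops the count to nine, because mutant 635 (70% activity, no predicted change) then counts as reduced, so the count is sensitive to the choice of cut-off.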
IV. Past Challenges and Future Prospects

Our understanding of DNA conformation and structure has undergone a slow but continuous redirection in the 40 or so years since the double helix was first proposed. The first chapters of the DNA saga stressed its polymeric character, as epitomized in the perfectly uniform double helix of Watson and Crick, in which each base pair, although chemically different from the other three, was nevertheless thought to be structurally equivalent to its neighbors. This simplicity was aesthetically appealing; furthermore, it had real value in establishing the basic structure of DNA. Watson and Crick were perhaps fortunate to have developed their structure by scrutinizing relatively imprecise macroscopic mechanical models. It was recognized rather early that axially directed stabilization free energy, from base stacking, contributed significantly (along with laterally directed hydrogen-bond energy) to the overall stability of DNA. In addition, the polyelectrolyte nature of DNA placed some restrictions on its polymeric
character. Under most solution conditions, the highly charged phosphates in the sugar-phosphate backbone produced electrostatic stresses with strong components directed along the helical axis. These short-range axial interactions produced a high level of "chain stiffness" in DNA, which led to its characterization as a rigid rodlike macromolecule or, in later and more sophisticated treatments, as a "weakly bending" rod or a "wormlike" chain with a characteristic persistence length. The charged phosphates also gave rise to strong interactions between regions of the chain widely separated in contour distance, producing a substantial excluded-volume effect. Thus, because of its intrinsic complexity, DNA became a marvelous exercise bar for generations of polymer biophysicists, and enormous ingenuity was manifested in describing the hydrodynamic, thermodynamic, and conformational-dynamical properties of this remarkable molecule. All this was accomplished within the paradigms of high-polymer physics, however, and the relation of most of these properties to biological function remained obscure. From the beginning, the basic processes of biology such as replication, transcription, and recombination could be broadly rationalized at the molecular level through the Watson-Crick helical structure of DNA. However, only within the last 15 years or so have the more subtle details of these processes become understandable in terms of DNA conformation and structure. This progress has been expedited by the striking advances in the molecular biology of transcription, replication, and recombination during the same period, and the synergism that has developed between molecular biology and structural biophysics has been highly beneficial to both areas.
Moreover, two structurally related concepts seem to have been particularly critical: the demonstration of fixed or static bending in DNA and the gradual acceptance of site-directed flexibility through the mechanism of the stereochemical kink. The molecular biology of gene regulation, including the explosive flood of new information on regulatory nucleoprotein complexes, has converged with these ideas to support sequence dependence in DNA structure. Over the past decade, the wealth of new data from structural biology, including results from crystallographic, spectroscopic, microscopic, and other physical methods, has left no doubt that the older models of DNA as a weakly bending rod are inadequate in the extreme. The "new DNA" clearly is vastly more complicated than even its complex and difficult polymeric predecessor. The new structural paradigms for DNA lead to an essentially surrealistic picture of structure-function relationships. Every dinucleotide base-stack seems to possess components both of fixed bending and of axial flexibility. These are almost certainly interrelated, although we do not yet fully understand the physical basis for this interrelatedness. We know even less at
present about site-localized helical twisting and torsional flexibility, which, like axial bending, has so far been characterized mainly in terms of averages over random base sequences. We know less still about cooperativity effects with neighboring sites, i.e., effects of sequence context on site-localized properties. It is clear, therefore, that we can no longer ignore the many new layers of complexity that sequence-directed structure superimposes upon the older polymeric concept of DNA. The DNA molecule is beginning to look less like a highway over land than a route over an ocean, the straight path broken by ripples, all of which have an origin, a history, and a fundamental meaning. To extend the surrealism further, we see in this review examples of structural relationships in regulatory nucleoprotein complexes that do not even exist in the absence of either binding partner. The last few years have indeed witnessed a renaissance in biological regulation at the molecular level, with both DNA and protein as active partners in the structural engagement we call gene regulation. However, to understand and digest this new information and its implications, we must continue to reassess our paradigms concerning the DNA molecule and redirect our thinking about this new Middle Earth of DNA structure-function, in which much occurs that we cannot always see. We will also have to be at least as resourceful as past generations of structural biophysicists and devise new experiments that can provide information on localized structural effects within the context of supramolecular systems. In other words, we must devise new methods to measure and interpret the ripples and waves on the surface of the genetic ocean. See addendum on p. 399.
V. Glossary of Abbreviations and Polynucleotide Notation
Abbreviations AP- 1 bZIP cab-140 CREB CI repressor EPFl
animal protein with DNA-binding specificity for ATGACTCAT basic leucine-zipper binding motif in a protein gene encoding the light-harvesting chlorophyll alb proteins animal protein with DNA-binding specificity for TGACGTCA the A repressor protein putative zinc-finger DNA-binding protein from Petunia
RODNEY E. HARRINGTON AND ILGA WINICOV
STRUCTURAL FLEXIBILITY IN DNA-PROTEIN INTERACTIONS
EPSPS: gene encoding the 5-enolpyruvylshikimate-3-phosphate synthase in Petunia
GAL4: a transcriptional activator protein in yeast required for galactose and melibiose catabolism
GBF proteins: plant nuclear proteins that bind to the G-box
G-box: CCACGTGG sequence element in plant promoters
Kruppel protein: zinc-finger protein encoded by the Drosophila segmentation gene Kruppel
MCM1: accessory proteins in the complex of the MATα2 repressor proteins with the STE6 gene operator in the yeast Saccharomyces cerevisiae (122, 128)
NF-κB: a pleiotropic transcription factor that can effect gene control in a highly tissue-specific manner
NOE: nuclear Overhauser enhancement (or effect), an enhancement of a nuclear magnetic resonance due to proximity of two nonbonded nuclei, which can be useful in spectral assignments and in determining interatomic distances
OccR, OccQ: genes in Agrobacterium tumefaciens that regulate certain catabolic processes
OccR: transcription factor for the OccR gene
OCSBF-1: gene encoding a bZIP regulatory protein in maize with DNA-binding specificity for the Ocs site: ACGTAAGCGCTTACGT
Opaque2: gene encoding a bZIP regulatory protein in maize with DNA-binding specificity for TCCACGTAGA
one of several specific recognition sites on the λ phage genome for λ Cro and λ repressor proteins
PE1 element: positive regulatory element of the phyA promoter in rice: positions -372 to -338 relative to the +1 transcription start site
PE3-RE1 elements: adjacent positive and negative regulatory elements of the phyA promoter in rice. PE3, positive element: positions -11 to -91. RE1, negative element: positions -80 to -70. All positions relative to the +1 transcription start site
phyA: gene encoding the phytochrome apoprotein
rbcS: gene encoding the small subunit of ribulose-bisphosphate carboxylase
RL: the ratio of apparent to true fragment length as determined by electrophoretic mobility of curved DNA fragments in polyacrylamide gels
TBP: the TATA-binding protein necessary for transcription of most eukaryotic genes
TFIIIA: a zinc-finger regulatory protein from Xenopus laevis that binds specifically to the internal promoter of 5S RNA genes and also to 5S ribosomal RNA
TGA1a: gene encoding a bZIP protein in tobacco with DNA-binding specificity for TGACGTCA
TGA1b: gene encoding a bZIP protein in tobacco with DNA-binding specificity for TGAGGT
Polynucleotide Notation
A-T or AT: dinucleotide (phosphodiester bonded)
A·T: base-pair (H-bonded)
A+T: (A-plus-T)-rich region (same strand)
(A-T)·(A-T): dinucleotides (on opposing strands, H-bonded)
CAC: base triplet
(CAC)·(GTG): base triplets (H-bonded)
N: nucleoside
(NNNN···)·(NNNN···): base-paired sequences (opposing)
REFERENCES
1. C. O. Pabo and R. T. Sauer, ARB 61, 1053 (1992).
2. S. C. Harrison, Nature 353, 715 (1991).
3. T. A. Steitz, Q. Rev. Biophys. 23, 205 (1990).
4. R. E. Harrington, Mol. Microbiol. 6, 2549 (1992).
5. A. A. Travers, ARB 58, 427 (1989).
6. A. A. Travers, Cell 60, 177 (1990).
7. H. R. Drew and A. A. Travers, NARes 13, 4445 (1985).
8. W. F. Anderson, D. H. Ohlendorf, Y. Takeda and B. W. Matthews, Nature 290, 754 (1981).
9. W. F. Anderson, Y. Takeda, D. H. Ohlendorf and B. W. Matthews, JMB 159, 745 (1982).
10. S.-H. Kim, Science 255, 1217 (1992).
11. R. E. Dickerson and H. R. Drew, JMB 149, 761 (1981).
12. R. E. Dickerson, in “Proceedings of the Sixth Conversation in Biomolecular Stereodynamics” (R. H. Sarma, ed.), p. 72. Adenine, Schenectady, New York, 1989.
13. A. Bolshoy, P. T. McNamara, R. E. Harrington and E. N. Trifonov, PNAS 88, 2312 (1991).
14. P. T. McNamara and R. E. Harrington, JBC 266, 12548 (1991).
15. M. Ptashne, Nature 323, 697 (1986).
16. H. Echols, Science 233, 1050 (1986).
17. G. Felsenfeld, Nature 355, 219 (1992).
18. H. Nussinov, Crit. Rev. Biochem. Mol. Biol. 25, 1885 (1990).
19. E. N. Trifonov, CRC Crit. Rev. Biochem. 19, 89 (1985).
20. S. Diekmann, in “Nucleic Acids and Molecular Biology” (F. Eckstein and D. M. J. Lilley, eds.), p. 138. Springer-Verlag, Berlin, 1987.
21. M. Sundaralingam and Y. C. Sekharudu, in “Structure and Expression” (W. K. Olson, M. H. Sarma, R. H. Sarma and M. Sundaralingam, eds.), Vol. 3, p. 9. Adenine, Schenectady, New York, 1988.
22. J. C. Marini, S. D. Levene, D. M. Crothers and P. T. Englund, PNAS 79, 7664 (1982); op. cit. 80, 7678 (correction).
23. H.-M. Wu and D. M. Crothers, Nature 308, 509 (1984).
24. S. D. Levene and D. M. Crothers, JMB 189, 61 (1986).
25. S. D. Levene and D. M. Crothers, JMB 189, 73 (1986).
26. H.-S. Koo, J. Drak, J. A. Rice and D. M. Crothers, Bchem 29, 4227 (1990).
27. D. M. Crothers, J. Drak, J. D. Kahn and S. D. Levene, in “Methods in Enzymology” (D. M. J. Lilley and J. E. Dahlberg, eds.), Vol. 212, p. 3. Academic Press, San Diego, 1992.
28. D. Shore and R. L. Baldwin, JMB 170, 957 (1983).
29. D. Shore, J. Langowski and R. L. Baldwin, PNAS 78, 4833 (1981).
30. E. N. Trifonov and J. L. Sussman, PNAS 77, 3816 (1980).
31. V. B. Zhurkin, J. Biomol. Struct. Dyn. 2, 785 (1985).
32. E. N. Trifonov and L. E. Ulanovsky, in “Unusual DNA Structures” (R. D. Wells and S. C. Harvey, eds.), p. 173. Springer-Verlag, New York, 1988.
33. V. B. Zhurkin, N. B. Ulyanov, A. A. Gorin and R. L. Jernigan, PNAS 88, 7046 (1991).
34. H. C. M. Nelson, J. T. Finch, B. F. Luisi and A. Klug, Nature 330, 221 (1987).
35. H.-S. Koo, H.-M. Wu and D. M. Crothers, Nature 320, 501 (1986).
36. P. J. Hagerman, ARB 59, 755 (1990).
37. P. J. Hagerman, Biopolymers 20, 1503 (1981).
38. K. L. Cairney and R. E. Harrington, Biopolymers 21, 923 (1982).
39. J. A. Schellman, Biopolymers 13, 216 (1974).
40. P. T. McNamara, A. Bolshoy, E. N. Trifonov and R. E. Harrington, J. Biomol. Struct. Dyn. 8, 529 (1990).
41. P. J. Hagerman, PNAS 81, 4632 (1984).
42. S. D. Levene, H.-M. Wu and D. M. Crothers, Bchem 25, 3988 (1986).
43. F. H. C. Crick and A. Klug, Nature 255, 530 (1975).
44. H. M. Sobell, C. Tsai, S. G. Gilbert, S. C. Jain and T. D. Sakore, PNAS 73, 3068 (1976).
45. D. A. Pearlman, S. R. Holbrook, D. H. Pirkle and S.-H. Kim, Science 227, 1304 (1985).
46. M.-J. Pillaire, G. Villani, J.-S. Hoffmann, A.-M. Mazard and M. Defais, NARes 20, 6473 (1992).
47. A. Bhattacharyya, A. I. H. Murchie and D. M. J. Lilley, Nature 343, 484 (1990).
48. S. Schultz and T. A. Steitz, Science 253, 1001 (1991).
49. J. A. McClarin, C. A. Frederick, B.-C. Wang, P. Greene, H. W. Boyer, J. Grable and J. M. Rosenberg, Science 234, 1526 (1986).
50. V. B. Zhurkin, Y. P. Lysov and V. I. Ivanov, NARes 6, 1081 (1979).
51. J. L. Sussman and A. Klug, PNAS 75, 103 (1978).
52. G. S. Manning, Biopolymers 22, 689 (1983).
53. H. Teitelbaum and S. W. Englander, JMB 92, 55; ibid. 92, 79 (1975).
54. C. Mandal, N. R. Kallenbach and S. W. Englander, JMB 135, 391 (1979).
55. N. R. Kallenbach, C. Mandal and S. W. Englander, in “Nucleic Acid Geometry and Dynamics” (R. H. Sarma, ed.). Pergamon, New York, 1980.
56. M. Gueron, M. Kochoyan and J.-L. Leroy, Nature 328, 89 (1987).
57. J. Wilcoxon and J. M. Schurr, Biopolymers 22, 2273 (1983).
58. M. Frank-Kamenetskii, Nature 328, 17 (1987).
59. O. Gotoh and T. Tagashira, Biopolymers 20, 1033 (1981).
60. K. J. Breslauer, R. Frank, H. Blocker and L. A. Marky, PNAS 83, 3746 (1986).
61. S. Cheung, K. Arndt and P. Lu, PNAS 81, 3665 (1984).
62. P. T. McNamara, I. Winicov and R. E. Harrington, BBRC 138, 110 (1986).
63. N. C. Stellwagen, in “Structure and Expression” (W. K. Olson, M. H. Sarma, R. H. Sarma and M. Sundaralingam, eds.), Vol. 3, p. 69. Adenine, Schenectady, New York, 1988.
64. S. Zinkel and D. M. Crothers, Nature 328, 178 (1987).
65. S. Zinkel and D. M. Crothers, Biopolymers 29, 29 (1990).
66. R. G. Brennan, S. L. Roderick, Y. Takeda and B. W. Matthews, PNAS 87, 8165 (1990).
67. A. Bolshoy and E. N. Trifonov, “Abstract 7 and Report to the EMBO Workshop on Auxiliary Binding Proteins in Prokaryotes and Eukaryotes,” Ein Gedi, Israel, May 21-25, 1990.
68. B. H. Zimm and S. D. Levene, Q. Rev. Biophys. 25, 171 (1992).
69. R. E. Harrington, Electrophoresis 14, 732 (1993).
70. R. K. Z. Tan and S. C. Harvey, J. Biomol. Struct. Dyn. 5, 497 (1987).
71. H. R. Drew and A. A. Travers, JMB 186, 733 (1985).
72. D. M. Crothers and T. A. Steitz, in “Transcriptional Regulation,” p. 501. CSHLab, Cold Spring Harbor, New York, 1992.
73. A. Lipanov, M. L. Kopka, M. Kaczor-Grzeskowiak, J. Quintana and R. E. Dickerson, Bchem 32, 1373 (1993).
74. Y. L. Lyubchenko, L. S. Shlyakhtenko, B. Chernov and R. E. Harrington, PNAS 88, 5331 (1991).
75. L. Ulanovsky, M. Bodner, E. N. Trifonov and M. Choder, PNAS 83, 862 (1986).
76. K. Zahn and F. R. Blattner, Science 236, 416 (1987).
77. A. M. Barber and V. B. Zhurkin, J. Biomol. Struct. Dyn. 8, 213 (1990).
78. Y. L. Lyubchenko, L. S. Shlyakhtenko, E. Appella and R. E. Harrington, Bchem 32, 4121 (1993).
79. Y. Takeda, A. Sarai and V. M. Rivera, PNAS 86, 439 (1989).
80. W. Kabsch, S. Sander and E. N. Trifonov, NARes 10, 1097 (1982).
81. L. S. Shlyakhtenko, Y. L. Lyubchenko, B. K. Chernov and V. B. Zhurkin, Mol. Biol. 24, 79 (1990).
82. F. Aboul-ela, D. Koh, I. Tinoco and F. Martin, NARes 13, 4811 (1985).
83. D. J. Patel, S. A. Kozlowski, S. Ikuta and K. Itakura, Fed. Proc. 43, 2663 (1984).
84. C.-H. Hsien and J. D. Griffith, PNAS 86, 4833 (1989).
85. A. Bhattacharyya and D. M. J. Lilley, NARes 17, 6821 (1989).
86. Y. Timsit, E. Vilbois and D. Moras, Nature 354, 167 (1992).
87. A. Sarai and Y. Takeda, PNAS 86, 6513 (1989).
88. M. E. Donlan and P. Lu, NARes 20, 525 (1992).
89. V. P. Chuprina, NARes 15, 293 (1987).
90. E. von Kitzing and S. Diekmann, Eur. Biophys. J. 15, 13 (1987).
91. A. V. Fratini, M. L. Kopka, H. R. Drew and R. E. Dickerson, JBC 257, 14686 (1982).
92. M. R. Gartenberg and D. M. Crothers, Nature 333, 824 (1988).
93. D. M. J. Lilley, Nature 354, 356 (1991).
94. W. F. Anderson, D. H. Ohlendorf, Y. Takeda and B. W. Matthews, Nature 290, 754 (1981).
95. S. C. Harrison and A. K. Aggarwal, ARB 59, 933 (1990).
96. A. Mondragon, C. Wolberger and S. C. Harrison, JMB 205, 179 (1989).
97. A. Mondragon, S. Subbiah, S. C. Almo, M. Drottar and S. C. Harrison, JMB 205, 189 (1989).
98. R. W. Schevitz, Z. Otwinowski, A. Joachimiak, C. L. Lawson and P. B. Sigler, Nature 327, 782 (1985).
99. L. J. Beamer and C. O. Pabo, JMB 227, 177 (1992).
100. A. Mondragon and S. C. Harrison, JMB 219, 321 (1991).
101. A. K. Aggarwal, D. W. Rodgers, M. Drottar, M. Ptashne and S. C. Harrison, Science 242, 899 (1988).
102. Z. Otwinowski, R. W. Schevitz, R.-G. Zhang, C. L. Lawson, A. Joachimiak, R. Q. Marmorstein, B. F. Luisi and P. B. Sigler, Nature 335, 321 (1988).
103. C. Wolberger, Y. Dong, M. Ptashne and S. C. Harrison, Nature 335, 789 (1988).
104. S. R. Jordan and C. O. Pabo, Science 242, 893 (1988).
105. L. J. Beamer and C. O. Pabo, JMB 227, 177 (1992).
106. M. Ptashne, “A Genetic Switch.” Cell Press, Cambridge, Massachusetts, 1986.
107. N. D. Clarke, L. J. Beamer, H. R. Goldberg, C. Berkower and C. O. Pabo, Science 254, 267 (1991).
108. A. D. Johnson, A. R. Poteete, G. Lauer, R. T. Sauer, G. K. Akers and M. Ptashne, Nature 294, 217 (1981).
109. P. Ptashne, Cancer 67, 2422 (1991).
110. M. Ptashne, A. Jeffrey, A. D. Johnson, R. Maurer, B. J. Meyer, C. O. Pabo, T. M. Roberts and R. T. Sauer, Cell 19, (1980).
111. B. de Crombrugghe, S. Busby and H. Buc, in “Biological Regulation and Development” (R. F. Goldberger and K. R. Yamamoto, eds.), p. 129. Plenum, New York, 1984.
112. H.-N. Liu-Johnson, M. R. Gartenberg and D. M. Crothers, Cell 47, 995 (1986).
113. L. Bracco, D. Kolarz, A. Kolb, S. Diekmann and H. Buc, EMBO J. 8, 4289 (1989).
114. M. R. Gartenberg and D. M. Crothers, JMB 219, 217 (1991).
115. C. L. Lawson and P. B. Sigler, Nature 233, 869 (1988).
116. R.-G. Zhang, A. Joachimiak, C. L. Lawson, R. W. Schevitz, Z. Otwinowski and P. B. Sigler, Nature 327, 591 (1987).
117. A. A. Kuramoto, W. G. Miller and R. P. Gunsalus, Genes Dev. 1, 556 (1987).
118. S. Bass, P. Sugiono, D. N. Arvidson, R. P. Gunsalus and P. Youderain, Genes Dev. 1, 565 (1987).
119. J. Carey, PNAS 85, 975 (1988).
120. M. P. Scott, J. W. Tamkun and C. W. Hartzell, BBA 989, 25 (1989).
121. M. Affolter, A. Schier and W. J. Gehring, Curr. Opin. Cell Biol. 2, 485 (1990).
122. C. Wolberger, A. K. Vershon, B. Liu, A. D. Johnson and C. O. Pabo, Cell 67, 517 (1991).
123. G. Otting, Y. Q. Qian, M. Billeter, M. Muller, M. Affolter, W. J. Gehring and K. Wuthrich, EMBO J. 9, 3085 (1990).
124. C. H. Kissinger, B. Liu, E. Martin-Blanco, T. B. Kornberg and C. O. Pabo, Cell 63, 579 (1990).
125. M. Muller, M. Affolter, W. Leupin, G. Otting, K. Wuthrich and W. J. Gehring, EMBO J. 7, 4299 (1988).
126. M. Affolter, A. Percival-Smith, M. Muller, W. Leupin and W. J. Gehring, PNAS 87, 4093 (1990).
127. Y. Q. Qian, M. Billeter, G. Otting, M. Muller, W. J. Gehring and K. Wuthrich, Cell 59, 573 (1989).
128. C. A. Keleher, C. Goutte and A. D. Johnson, Cell 53, 927 (1988).
129. D. R. Engelke, S. Y. Ng, B. S. Shastry and R. G. Roeder, Cell 19, 717 (1980).
130. S. Sakonju, D. F. Bogenhagen and D. D. Brown, Cell 19, 13 (1980).
131. D. F. Bogenhagen, S. Sakonju and D. D. Brown, Cell 19, 27 (1980).
132. J. Miller, A. D. McLachlan and A. Klug, EMBO J. 4, 1609 (1985).
133. K. Struhl, TIBS 14, 136 (1989).
134. J. M. Berg, Annu. Rev. Biophys. Biophys. Chem. 19, 405 (1990).
135. L. Fairall, D. Rhodes and A. Klug, JMB 192, 577 (1986).
136. M. E. Churchill, T. D. Tullius and A. Klug, PNAS 87, 5528 (1990).
137. A. Klug and D. Rhodes, TIBS 12, 464 (1987).
138. J. Berg, PNAS 85, 99 (1988).
139. K. E. Vrana, M. E. A. Churchill, T. D. Tullius and D. D. Brown, MCBiol 8, 1684 (1988).
140. N. P. Pavletich and C. O. Pabo, Science 252, 809 (1991).
141. J. J. Hayes and K. R. Clemens, Bchem 31, 11600 (1992).
142. X. Liao, K. R. Clemens, L. Tennant, P. E. Wright and J. M. Gottesfeld, JMB 223, 857 (1992).
143. K. R. Clemens, X. Liao, V. Wolf, P. E. Wright and J. M. Gottesfeld, PNAS 89, 10822 (1992).
144. K. R. Clemens, V. Wolf, S. J. McBryant, P. Zhang, X. Liao, P. E. Wright and J. M. Gottesfeld, Science 260, 530 (1993).
145. J. J. Hayes and T. D. Tullius, JMB 227, 407 (1992).
146. D. P. Bazett-Jones and M. L. Brown, MCBiol 9, 336 (1989).
147. G. P. Schroth, G. R. Cook, E. M. Bradbury and J. M. Gottesfeld, Nature 340, 487 (1989).
148. C. Zweib and R. S. Brown, NARes 18, 583 (1990).
149. G. P. Schroth, J. M. Gottesfeld and E. M. Bradbury, NARes 19, 511 (1991).
150. N. B. Ulyanov and V. B. Zhurkin, J. Biomol. Struct. Dyn. 2, 361 (1984).
151. D. Rhodes, EMBO J. 4, 3473 (1985).
152. Y. Y. Xing and A. Worcel, MCBiol 9, 499 (1989).
153. J. M. Gottesfeld, J. Blanco and L. L. Tennant, Nature 329, 460 (1987).
154. D. Rau, personal communication (1993).
155. B. Christy and D. Nathans, PNAS 86, 8737 (1989).
156. M. Beato, Cell 56, 335 (1989).
157. R. E. Klevit, J. R. Heriott and S. J. Horvath, Proteins: Struct., Funct., Genet. 7, 215 (1990).
158. M. Johnson, Microbiol. Rev. 51, 458 (1987).
159. J. S. Flick and M. Johnson, MCBiol 10, 4757 (1990).
160. R. Marmorstein, M. Carey, M. Ptashne and S. C. Harrison, Nature 356, 408 (1992).
161. M. J. Fedor, N. F. Lue and R. D. Kornberg, JMB 204, 109 (1988).
162. J. D. Baleja, R. Marmorstein, S. C. Harrison and G. Wagner, Nature 356, 450 (1992).
163. P. J. Kraulis, A. R. C. Raine, P. L. Gadhavi and E. D. Laue, Nature 356, 448 (1992).
164. R. M. Evans, Science 242, 889 (1988).
165. J. W. R. Schwabe and D. Rhodes, TIBS 16, 291 (1991).
166. T. Hard, E. Kellenbach, R. Boelens, B. A. Maler, K. Dahlman, L. P. Freedman, J. Carlstedt-Duke, K. R. Yamamoto, J. A. Gustafsson and R. Kaptein, Science 249, 157 (1990).
167. B. F. Luisi, W. X. Xu, Z. Otwinowski, L. P. Freedman, K. R. Yamamoto and P. B. Sigler, Nature 352, 497 (1991).
168. G. Klock, V. Strahle and G. Schutz, Nature 329, 734 (1987).
169. B. M. Forman and H. H. Samuels, Mol. Endocrinol. 4, 1293 (1990).
170. D. Pearce and K. R. Yamamoto, Science 259, 1161 (1993).
171. J. W. Funder, Science 259, 1132 (1993).
172. C. Carlberg, I. Bendik, A. Wyss, E. Meier, L. J. Sturzenbecker, J. F. Grippo and W. Hunziker, Nature 361, 657 (1993).
173. S. Green, Nature 361, 590 (1993).
174. P. K. Vogt and T. J. Bos, TIBS 14, 172 (1989).
175. T. Hai and T. Curran, PNAS 88, 3720 (1991).
176. T. K. Kerppola and T. Curran, Science 254, 1210 (1991).
177. P. Angel, E. Hattori, T. Smeal and M. Karin, Cell 55, 875 (1988).
178. T. K. Kerppola and T. Curran, Cell 66, 317 (1991).
179. E. M. Blackwood, L. Kretzner, T. K. Blackwell, H. Weintraub and R. N. Eisenman, “Origins of Human Cancer: A Comprehensive Review,” p. 3665. CSHLab, Cold Spring Harbor, New York, 1991.
180. T. K. Blackwell, L. Kretzner, E. M. Blackwood, R. N. Eisenman and H. Weintraub, Science 250, 1149 (1990).
181. E. Kerkhoff, K. Bister and K.-H. Klempnauer, PNAS 88, 4323 (1991).
182. E. M. Blackwood and R. N. Eisenman, Science 251, 1211 (1991).
183. E. M. Blackwood, B. Luscher and R. N. Eisenman, Genes Dev. 6, 71 (1992).
184. D. Reisman, N. B. Elkind, B. Roy, J. Bearnon and V. Rotter, Cell Growth Differ. 4, 57 (1993).
185. M. Sawadogo, M. W. Van Dyke, P. D. Gregor and R. G. Roeder, JBC 263, 11985 (1988).
186. H. Beckmann, L.-K. Su and T. Kadesch, Genes Dev. 4, 167 (1990).
187. P. M. Gilmartin, L. Sarokin, J. Memelink and N.-H. Chua, Plant Cell 2, 369 (1990).
188. T. E. Ellenberger, C. J. Brandl, K. Struhl and S. C. Harrison, Cell 71, 1223 (1992).
189. A. R. Oliphant, C. J. Brandl and K. Struhl, MCBiol 9, 2944 (1989).
190. J. W. Sellers and K. Struhl, Nature 341, 74 (1989).
191. G. Risse, K. Jooss, M. Neuberg, H. J. Bruller and R. Muller, EMBO J. 8, 3825 (1989).
192. S. Buratowski, S. Hahn, L. Guarente and P. A. Sharp, Cell 56, 549 (1989).
193. B. D. Dynlacht, T. Hoey and R. Tjian, Cell 66, 563 (1991).
194. B. F. Pugh and R. Tjian, Genes Dev. 5, 1935 (1991).
195. J. Ham, G. Steger and M. Yaniv, FEBS Lett. 307, 81 (1992).
196. H. A. Nash and A. E. Granston, Cell 67, 1037 (1991).
197. P. W. J. Rigby, Cell 72, (1993).
198. L. Comai, N. Tanese and R. Tjian, Cell 68, 965 (1992).
199. D. B. Nikolov, S.-H. Hu, J. Lin, A. Gasch, A. Hoffmann, M. Horikoshi, N.-H. Chua, R. G. Roeder and S. K. Burley, Nature 360, 40 (1992).
200. D. B. Starr and D. K. Hawley, Cell 67, 1231 (1991).
201. D. K. Lee, M. Horikoshi and R. G. Roeder, Cell 67, 1241 (1991).
202. Y. Nakatani, M. Horikoshi, M. Brenner, T. Yamamoto, F. Besnard, R. G. Roeder and E. Freese, Nature 348, 86 (1990).
203. R. Breathnach and P. Chambon, ARB 50, 349 (1981).
204. I. Witt, X. Straub, N. F. Kaufer and T. Gross, EMBO J. 12, 1201 (1993).
205. J. J. Lenardo and D. Baltimore, Cell 58, 227 (1989).
206. R. Sen and D. Baltimore, Cell 46, 705 (1986).
207. H. Kieran, V. Blank, F. Logeat, O. J. Vanderchove, F. Lottspeich, O. LeBail, M. B. Urban, P. Kourilsky, P. A. Baeuerle and A. Israel, Cell 62, 1007 (1990).
208. S. Ghosh, A. M. Clifford, L. R. Riviere, P. Tempst, G. P. Nolan and D. Baltimore, Cell 62, 1019 (1990).
209. M. Atchison and R. P. Perry, Cell 48, 121 (1987).
210. F. Siddiqui, R. Gaynore, A. Srinivasan, J. Mapoles and R. W. Farr, Virology 169, 479 (1989).
211. G. Nabel and D. Baltimore, Nature 326, 711 (1987).
212. U. Zabel and P. A. Baeuerle, Cell 61, 255 (1990).
213. M. B. Urban and P. A. Baeuerle, Genes Dev. 4, 1975 (1990).
214. M. B. Urban and P. A. Baeuerle, New Biologist 3, 279 (1991).
215. E. Appella, personal communication (1993).
216. R. Schreck, H. Zorbas, E.-L. Winnacker and P. A. Baeuerle, NARes 18, 6498 (1990).
217. P. A. Baeuerle and D. Baltimore, Genes Dev. 3, 1689 (1989).
218. L. Wang, J. D. Helmann and S. C. Winans, Cell 69, 659 (1992).
219. M. Serrano, C. Gutierrez, M. Salas and J. M. Hermoso, JMB 230, 248 (1993).
220. P. M. Gilmartin, L. Sarokin, J. Memelink and N.-H. Chua, Plant Cell 2, 369 (1990).
221. C. Kuhlemeier, Plant Mol. Biol. 19, 1 (1992).
222. F. Katagiri and N.-H. Chua, Trends Genet. 8, 22 (1992).
223. J. Ma, E. Prizibilla, J. Hu, L. Bogorad and M. Ptashne, Nature 334, 631 (1988).
224. G. Giuliano, E. Pichersky, V. S. Malik, M. P. Timko, P. A. Scolnik and A. R. Cashmore, PNAS 85, 7089 (1988).
225. A. J. DeLisle and R. J. Ferl, Plant Cell 2, 547 (1990).
226. U. Schindler and A. R. Cashmore, EMBO J. 9, 3415 (1990).
227. U. Schindler, A. E. Menkens, H. Beckmann, J. R. Ecker and A. R. Cashmore, EMBO J. 11, 1261 (1992).
228. T. Tabata, H. Takase, S. Takayama, K. Mikami, A. Nakatsuka, T. Kawata, T. Nakayama and M. Iwabuchi, Science 245, 965 (1989).
229. K. Oeda, J. Salinas and N.-H. Chua, EMBO J. 10, 1793 (1991).
230. B. Weisshaar, G. A. Armstrong, A. Block, O. Costa e Silva and K. Hahlbrock, EMBO J. 10, 1777 (1991).
231. M. J. Guiltinan, W. R. Marcotte and R. S. Quatrano, Science 250, 267 (1990).
232. W. H. Landschulz, P. F. Johnson and S. L. McKnight, Science 240, 1759 (1988).
233. C. R. Vinson, P. B. Sigler and S. L. McKnight, Science 246, 911 (1989).
234. J. C. Hu, E. K. O’Shea, P. S. Kim and R. T. Sauer, Science 250, 1400 (1990).
235. R. Rasmussen, D. Benvegnu, E. O’Shea, P. S. Kim and T. Alber, PNAS 88, 561 (1991).
236. R. V. Talanian, C. J. McKnight and P. S. Kim, Science 249, 769 (1990).
237. K. T. O’Neil, J. D. Shuman, C. Ampe and W. F. DeGrado, Bchem 30, 9030 (1991).
238. U. Schindler, W. Terzaghi, H. Beckmann, T. Kadesch and A. R. Cashmore, EMBO J. 11, 1275 (1992).
239. U. Schindler, H. Beckmann and A. R. Cashmore, Plant Cell 4, 1309 (1992).
240. M. E. Williams, R. Foster and N.-H. Chua, Plant Cell 4, 485 (1992).
241. G. A. Armstrong, B. Weisshaar and K. Hahlbrock, Plant Cell 4, 525 (1992).
242. J. Salinas, K. Oeda and N.-H. Chua, Plant Cell 4, 1485 (1992).
243. R. J. Schmidt, M. Ketudat, M. J. Aukerman and G. Hoschek, Plant Cell 4, 689 (1992).
244. L. D. Pysh, M. J. Aukerman and R. J. Schmidt, Plant Cell 5, 227 (1993).
245. K. Singh, E. S. Dennis, J. G. Ellis, D. J. Llewellyn, J. G. Tokuhisa, J. A. Wahleithner and W. J. Peacock, Plant Cell 2, 891 (1990).
246. F. Katagiri, E. Lam and N.-H. Chua, Nature 340, 727 (1989).
247. E. Lam, Y. Kano-Murakami, P. Gilmartin, B. Niner and N.-H. Chua, Plant Cell 2, 857 (1990).
248. T. Evans and G. Felsenfeld, Cell 5, 877 (1989).
249. M. Yamamoto, L. J. Ko, M. W. Leonard, H. Beug, S. H. Orkin and J. D. Engel, Genes Dev. 4, 1650 (1990).
250. E. Lam and N.-H. Chua, Plant Cell 1, 1147 (1989).
251. R. G. K. Donald and A. R. Cashmore, EMBO J. 9, 1717 (1990).
252. I. Winicov, Plant Physiol. 102, 681 (1993).
253. P. S. Freemont, I. M. Hanson and J. Trowsdale, Cell 64, 483 (1991).
254. Y. Haupt, W. S. Alexander, G. Barri, S. P. Klinken and J. M. Adams, Cell 65, 753 (1991).
255. H. Takatsuji, M. Mori, P. N. Benfey, L. Ren and N.-H. Chua, EMBO J. 11, 241 (1992).
256. X.-W. Deng, M. Matsui, N. Wei, D. Wagner, A. M. Chu, K. A. Feldmann and P. H. Quail, Cell 71, 791 (1992).
257. N. Datta and A. R. Cashmore, Plant Cell 1, 1069 (1989).
258. L. Sun, H. A. Doxsee, E. Harel and E. M. Tobin, Plant Cell 5, 109 (1993).
259. Y. Wang, W. Zhang, J. Cao, D. McElroy and R. Wu, MCBiol 12, 3399 (1992).
260. B. G. Forde, J. Freeman, J. E. Oliver and M. Pineda, Plant Cell 2, 925 (1990).
261. B. A. Metz, P. Welters, H. J. Hoffmann, E. O. Jensen, J. Schell and F. J. de Bruijn, MGG 214, 181 (1988).
262. K. D. Jofuku, J. K. Okamuro and R. B. Goldberg, Nature 238, 734 (1987).
263. K. Jacobsen, N. B. Laursen, E. O. Jensen, A. Marcker, C. Poulsen and K. A. Marcker, Plant Cell 2, 85 (1990).
264. J. C. Cushman and H. J. Bohnert, Plant Mol. Biol. 20, 411 (1992).
265. E. Lam and N.-H. Chua, Science 248, 471 (1990).
266. O. Persic and E. Lam, Plant Cell 4, 831 (1992).
267. P. M. Gilmartin, J. Memelink, K. Hiratsuka, S. A. Kay and N.-H. Chua, Plant Cell 4, 839 (1992).
268. K. Dehesh, W. B. Bruce and P. H. Quail, Science 250, 1397 (1990).
269. K. Dehesh, H. lung, J. M. Tepperman and P. H. Quail, EMBO J. 11, 4131 (1992).
270. L. A. Wanner and W. Gruissem, Plant Cell 3, 1289 (1991).
271. T. Manzara, P. Carrasco and W. Gruissem, Plant Cell 3, 1305 (1991).
272. B. Piechulla, J.-W. Kellman, E. Pichersky, E. Schwartz and H.-H. Forster, MGG 230, 413 (1991).
273. R. M. Myers, K. Tilly and T. Maniatis, Science 232, 613 (1986).
274. A. Cowie and R. M. Myers, MCBiol 8, 3122 (1988).
275. L. Stuve and R. M. Myers, MCBiol 10, 972 (1990).
276. W. B. Bruce, X.-W. Deng and P. H. Quail, EMBO J. 10, 3015 (1991).
277. T. Evans, G. Felsenfeld and M. Reitman, Annu. Rev. Cell Biol. 6, 95 (1990).
278. S. Orkin, Cell 63, 665 (1990).
279. E. deBoer, M. Antoniou, V. Mignotte, L. Wall and F. Grosveld, EMBO J. 7, 4203 (1988).
280. K. M. Barnhart, C. G. Kim and M. Sheffery, MCBiol 9, 2606 (1989).
281. R. M. Myers, A. Cowie, L. Stuve, G. Hartzog and K. Gaensler, in “Hemoglobin Switching, Part A: Transcriptional Regulation” (G. Stamatoyannopoulos and A. Nienhuis, eds.), p. 117. Liss, New York, 1989.
282. M. Plumb, J. Frampton, H. Wainwright, M. Walker, K. MacLeod, G. Goodwin and P. Harrison, NARes 17, 73 (1989).
283. G. A. Hartzog and R. M. Myers, MCBiol 13, 44 (1993).
284. I. J. Miller and J. J. Bieker, MCBiol 13, 2776 (1993).
285. K. McLeod and M. Plumb, MCBiol 11, 4324 (1991).
285. L. L. Stuve and R. M. Myers, MCBiol 13, 4311 (1993).
286. E. Tobin and J. Silverthorne, Annu. Rev. Plant Physiol. 36, 569 (1985).
287. P. H. Quail, ARGen 25, 389 (1991).
288. W. B. Bruce and P. H. Quail, Plant Cell 2, 1081 (1990).
Nonsense-mediated mRNA Decay in Yeast
STUART W. PELTZ,* FENG HE,† ELLEN WELCH† AND ALLAN JACOBSON†,¹
*Department of Molecular Genetics and Microbiology, University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, Piscataway, New Jersey 08854
†Department of Molecular Genetics and Microbiology, University of Massachusetts Medical School, Worcester, Massachusetts 01655
He not busy being born is busy dying.
Bob Dylan

I. Identification of cis-Acting Sequences Involved in Nonsense-mediated mRNA Decay
II. trans-Acting Factors Involved in the Nonsense-mediated mRNA Decay Pathway
III. Possible Functions of the Nonsense-mediated mRNA Decay Pathway
References
¹ To whom correspondence should be addressed.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.

More than 30 years ago, Crick et al. deduced the “general nature of the genetic code” from the results of crosses between mutants in the T4 rIIB cistron (1). They recognized that the code must be nonoverlapping, degenerate, and read from a fixed point; moreover, they predicted that some triplets must be nonsense, i.e., noncoding, and that the possible function of nonsense triplets may be to signal the end of synthesis of the polypeptide chain. Subsequent work from numerous laboratories has substantiated these predictions and demonstrated that, in most genetic systems, 61 of the 64 possible triplets encode amino acids, and that the triplets UAA, UAG, and UGA are noncoding and promote translation termination (2). Although the polypeptide chain-terminating effects of the UAA, UAG, and UGA triplets have been amply documented and characterized (3), they are not the only consequences of the occurrence of nonsense codons. For example, in both prokaryotes and eukaryotes, nonsense mutations in a gene can enhance the decay rate of the mRNA transcribed from that gene (4-16). Such mRNA destabilization must be linked to premature translation termination, because nonsense-containing mRNAs are stabilized in cells containing suppressor tRNAs (9, 12, 17). We describe the mRNA-destabilizing phenomenon as nonsense-mediated mRNA decay (18-20) and, in this paper, review the cis-acting sequences and trans-acting factors that comprise this pathway in the yeast Saccharomyces cerevisiae.
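The codon bookkeeping that runs through this chapter can be made concrete with a short Python sketch. It assumes the standard genetic code and a coding sequence supplied in frame from the AUG. The function names are hypothetical, and expressing a nonsense codon's position as the percentage of codons read before it is only an approximation of the measure used in Figs. 2 and 3, not the authors' procedure. The linker sequence is the one shown in Fig. 2.

```python
from itertools import product
from typing import List, Optional

# Nonsense (stop) triplets of the standard genetic code, written as RNA.
NONSENSE = {"UAA", "UAG", "UGA"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def count_sense_triplets() -> int:
    """Count how many of the 64 possible triplets are not nonsense codons."""
    return sum("".join(t) not in NONSENSE for t in product("ACGU", repeat=3))


def frames_with_nonsense(seq: str) -> List[int]:
    """Return the reading frames (offset 0, 1 or 2) that contain an in-frame nonsense codon."""
    rna = to_rna(seq)
    return [
        frame
        for frame in range(3)
        if any(rna[i:i + 3] in NONSENSE for i in range(frame, len(rna) - 2, 3))
    ]


def percent_translated(cds: str) -> Optional[float]:
    """Percentage of codons in an in-frame coding region (AUG ... stop) that
    precede the first premature nonsense codon; None if the only nonsense
    codon is the final (normal) terminator."""
    rna = to_rna(cds)
    n_codons = len(rna) // 3
    for idx in range(n_codons - 1):  # last codon is treated as the normal stop
        if rna[3 * idx:3 * idx + 3] in NONSENSE:
            return 100.0 * idx / n_codons
    return None


linker = "CTAGCTAGCTAG"  # top strand of the amber linker in Fig. 2
print(count_sense_triplets())                 # 61
print(frames_with_nonsense(linker))           # [0, 1, 2]
print(percent_translated("AUGAAAUAGCCCUAA"))  # 40.0
```

CTAGCTAGCTAG is its own reverse complement. The linker therefore presents an amber codon in every reading frame whichever orientation it is cloned in.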
I. Identification of cis-Acting Sequences Involved in Nonsense-mediated mRNA Decay
A. mRNA Destabilization Is Dependent on Nonsense-codon Localization within the Coding Region
An example of nonsense-mediated mRNA decay is shown in Fig. 1. In this experiment, either of two plasmid-borne PGK1 alleles was transformed into cells that harbor a temperature-sensitive mutation in RNA polymerase II. One of the two PGK1 alleles encodes a wild-type protein, whereas the other contains an amber mutation at the 5.6% position of the PGK1 coding region (see Fig. 1). To measure mRNA decay rates, RNA was isolated from these cells at different times after RNA polymerase II had been inactivated by a shift to 36°C. As can be seen in Fig. 1, the PGK1 transcript containing the nonsense mutation has both a lower steady-state level (mRNA level at t = 0) and a more rapid decay rate than its wild-type counterpart. Both strains have comparable decay rates for the CYH2 (ribosomal protein) mRNA (Fig. 1), demonstrating that nonsense mutations in one transcript have no effect on the decay rates of other mRNAs within the same cell. Early work on nonsense-mediated mRNA decay showed that the extent of mRNA destabilization is position-dependent, because 5’-proximal nonsense mutations destabilized the URA3 and URA1 transcripts to a greater degree than those that were 3’-proximal (12, 16). To address this position effect more precisely, a linker harboring amber codons was inserted (in separate constructs) into six restriction sites of the PGK1 gene (Fig. 2).
FIG. 1. The effect of an amber mutation on the decay of the PGK1 mRNA. Decay rates for the mRNAs encoded by wild-type and early nonsense-containing PGK1 alleles were determined by blot analysis of RNAs isolated at different times after transcription was inhibited by a shift from 24 to 36°C in a strain harboring a temperature-sensitive RNA polymerase II (21). Both PGK1 alleles contain an oligonucleotide “tag” sequence in their 3’ untranslated region to facilitate mRNA detection (see description of PGK1 alleles in Fig. 2). The RNA blot was hybridized with radioactive probes complementary to either the tag sequence or to the CYH2 (ribosomal protein) mRNA. From 22a.
Normally, this gene encodes an abundant, stable mRNA (20-22). To facilitate analysis of the mutant alleles, a DNA tag was inserted into the PGK1 3’ untranslated region. mRNA half-life measurements for these PGK1 alleles indicate that 5’-proximal nonsense mutations accelerate the PGK1 mRNA decay rate more than 3’-proximal mutations, although the relationship is nonlinear (Fig. 3). Nonsense mutations that terminate translation after 55% or less of the PGK1 coding sequence accelerate the PGK1 mRNA decay rate approximately 12-fold. A nonsense mutation that allows translation of 67% of the protein coding region still increases the PGK1 mRNA decay rate 4-fold, whereas nonsense mutations inserted in the last quarter of the PGK1 transcript have no effect (Fig. 3).
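The fold changes quoted above follow from first-order kinetics. If an mRNA decays as N(t) = N0·e^(-kt) after the temperature shift, its half-life is t½ = ln 2/k, so a 12-fold increase in k shortens t½ 12-fold. The Python sketch below assumes simple exponential decay and fits ln(level) against time by ordinary least squares. The time course is synthetic, generated from an assumed 5-minute half-life, and is not data from Fig. 1 or Fig. 3. The function names are hypothetical.

```python
import math
from typing import Sequence


def fit_half_life(times_min: Sequence[float], levels: Sequence[float]) -> float:
    """Estimate a first-order half-life (minutes) from an mRNA decay time course.

    Assumes N(t) = N0 * exp(-k * t) and fits ln N against t by ordinary least
    squares; returns ln(2) / k.
    """
    n = len(times_min)
    ys = [math.log(v) for v in levels]
    t_mean = sum(times_min) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, ys))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    k = -sxy / sxx  # decay-rate constant, per minute
    return math.log(2) / k


def decay_rate_fold_change(t_half_fast: float, t_half_slow: float) -> float:
    """k_fast / k_slow; for first-order decay this equals t_half_slow / t_half_fast."""
    return t_half_slow / t_half_fast


# Synthetic time course (not data from this chapter), assumed half-life of 5 min.
times = [0, 5, 10, 20, 30]
levels = [100 * 0.5 ** (t / 5) for t in times]
print(round(fit_half_life(times, levels), 6))  # 5.0
print(decay_rate_fold_change(5.0, 60.0))       # 12.0
```

In this example, a transcript whose decay rate is accelerated 12-fold has one-twelfth the half-life of its reference. The 60-minute reference value is hypothetical.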
[Figure 2: map of the PGK1 gene (5’ UT, coding sequence, 3’ UT; sites H2, Asp, X, BglII). Steps shown: add tag to the 3’ UT region; clone the amber-stop linker 5’-CTAGCTAGCTAG-3’ / 3’-GATCGATCGATC-5’ into various sites, located at 5.6, 39.0, 55.0, 67.7, 76.2 and 92.6% of the coding region.]
FIG. 2. Construction of PGK1 nonsense alleles. The PGK1 gene is depicted at the top of the figure, with the PGK1 coding sequence represented by the thick black bar. An oligonucleotide “tag” (represented by the dotted box) was inserted into the PGK1 3’ untranslated region. A DNA linker containing amber mutations in all three reading frames was inserted into the PGK1 coding region at the restriction sites shown. The location of each nonsense mutation in the PGK1 transcript is presented as the percentage of the PGK1 protein-coding region that is translated before the amber mutation is encountered. Abbreviations: H2, HincII; Asp, Asp-718; X, XbaI. From 20.

[Figure 3: plot against % of PGK1 protein coding region (x-axis, 0 to 100).]
FIG. 3. The position of an amber codon in the PGK1 protein-coding region determines its effect on mRNA decay rates. The half-lives of the mRNAs encoded by the PGK1 nonsense alleles described in Fig. 2 were determined as described in the legend to Fig. 1. The positions of the nonsense mutations in the PGK1 coding region are presented as described in the legend to Fig. 2.

B. Possible Explanations for the Position Effect
Four models that attempt to explain the observed position effects are considered in Fig. 4. Model 1 suggests that an early nonsense codon promotes premature translation termination that, in turn, creates a ribosome-free zone on the mRNA that is a target for cellular nucleases (Fig. 4, mRNA 1). In this case, the position effect could be attributed to a nuclease requirement for a minimum size of ribosome-free target. Model 2 suggests that the ribosome (or one of its subunits) plays an active role in the decay process. It is proposed that a small fraction of terminating ribosomes (or a newly bound ribosome) scans the mRNA downstream of the nonsense codon seeking a simple sequence (drawn arbitrarily as AGUC; Fig. 4, mRNA 2). An encounter with the simple sequence would then trigger rapid decay, possibly as a consequence of translation reinitiation or any other form of ribosome pause. The position effect could be explained as stochastic, i.e., the likelihood of encountering the simple sequence would diminish as the nonsense codons approach the 3’-terminus of the mRNA. Model 3 suggests that rapid mRNA decay is dependent on the concurrence of two events: a ribosome encounter with a nonsense codon and the presence on that ribosome of a specific factor. The position effect is explained by proposing that the factor falls off (or is specifically ejected from) the ribosome as a function of translational distance (Fig. 4, mRNA 3). Specific ejection of the factor would depend on translocation through a specific sequence and would constitute a mode of regulation for the nonsense-mediated mRNA decay pathway. Model 4 combines the tenets of models 2 and 3, suggesting that a ribosome downstream of the nonsense codon requires both a specific downstream sequence and a bound factor in order to promote rapid mRNA decay (Fig. 4, mRNA 4). These models make specific predictions about cis-acting sequences and trans-acting factors that have been tested in recent experiments (20).
FIG. 4. Models for the mechanism of nonsense-mediated mRNA decay. Possible mechanisms by which a nonsense mutation may promote rapid mRNA turnover are depicted. Details are provided in Section I,B. Top: A schematic of a wild-type transcript with its complement of ribosomes. (1) A nonsense-containing mRNA with a ribosome-free zone that is a substrate for cytoplasmic nucleases (zigzag arrows). (2) A nonsense-containing mRNA in which a downstream ribosome (or subunit) seeks a specific simple sequence required to trigger decay. (3) Two nonsense-containing mRNAs are shown, one with an early nonsense mutation and one with a late nonsense mutation. A factor (shown as a ball) required for nonsense-mediated decay falls off (or is ejected from) ribosomes before they reach the late nonsense codon. (4) A combination of models 2 and 3; rapid mRNA decay is depicted as requiring a downstream sequence and a bound factor (in addition to a nonsense codon). From 20.
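Model 3 allows two variants that predict different position effects: the factor may fall off stochastically, or it may be ejected only when ribosomes translate through a specific sequence. The minimal sketch below contrasts the two predictions. It assumes a hypothetical first-order loss rate for the stochastic variant and a hypothetical ejection site for the other. Neither parameter is a measured value. The ejection site is simply placed inside the 67-76% interval where the PGK1 transcript is reported to become insensitive to the pathway.

```python
import math


def fraction_bound_stochastic(position_pct: float, loss_per_pct: float) -> float:
    """Fraction of ribosomes still carrying the factor if it dissociates with a
    constant (hypothetical) probability per percent of coding region translated."""
    return math.exp(-loss_per_pct * position_pct)


def fraction_bound_ejection(position_pct: float, ejection_site_pct: float) -> float:
    """Fraction still carrying the factor if it is ejected only when ribosomes
    translocate through a specific (hypothetical) site."""
    return 1.0 if position_pct < ejection_site_pct else 0.0


if __name__ == "__main__":
    # Illustrative parameters only; neither value was measured.
    LOSS_PER_PCT = 0.03   # hypothetical first-order loss per % of coding region
    EJECTION_SITE = 70.0  # arbitrary point inside the 67-76% transition zone
    for pos in (5, 39, 55, 67, 76, 92):
        print(f"{pos:>3}%  stochastic={fraction_bound_stochastic(pos, LOSS_PER_PCT):.2f}"
              f"  ejection={fraction_bound_ejection(pos, EJECTION_SITE):.0f}")
```

The stochastic variant gives a graded decline with position, while the ejection variant gives a sharp transition. The sharp transition is closer to the discontinuous position effect seen for the PGK1 nonsense alleles.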
C. Nonsense-mediated Decay of the PGK1 mRNA Requires Both a Nonsense Codon and Sequences Downstream of the Nonsense Codon
The discontinuous relationship between the position of a nonsense codon and its effect on mRNA turnover (Fig. 3) suggested that sequences downstream of the nonsense codon may play a role in nonsense-mediated decay. Such sequences might act as sites accessible to nuclease attack, or sequences in addition to a nonsense codon might be required to trigger mRNA decay (see Fig. 4). To determine whether sequences 3' of a nonsense mutation are required for nonsense-mediated mRNA decay, various amounts of PGK1 protein-coding sequence downstream of a nonsense mutation located at 5% of the PGK1 coding region were deleted. Deletions that removed sequences between 5 and 67%, 5 and 55%, and 5 and 39% of the PGK1 coding sequence did not inactivate the nonsense-mediated mRNA decay pathway (20) (Fig. 5); the mRNAs encoded by these constructs all had half-lives of 5 minutes. However, the decay rates of mRNAs from two PGK1 nonsense alleles with deletions of either 5 to 92% or 5 to 76% of the PGK1 protein-coding region downstream of the amber mutation were reduced approximately 10-fold (Fig. 5). Further experiments inserted small regions of the deleted DNA back into a PGK1 nonsense allele in which most of the protein-coding region was deleted. These experiments demonstrated that a 106-nt HincII-XbaI fragment, when positioned downstream of the nonsense codon, can promote rapid decay of its mRNA (20) (Fig. 5). Although the 106-nt element is specific, it is not unique. Deletion of the 106-nt fragment from an otherwise intact PGK1 gene containing an early nonsense mutation did not stabilize the resultant transcript. This indicates that the PGK1 gene contains redundant 3' cis-acting elements that can promote nonsense-mediated mRNA decay (20).
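Because this section moves between half-lives and decay rates, it may help to make the conversion explicit. For first-order decay, the rate constant is k = ln 2 / t1/2. A 10-fold reduction in decay rate is therefore the same as a 10-fold longer half-life. The minimal sketch below uses the 5-minute half-life stated above. The 50-minute value is a hypothetical figure standing in for a roughly 10-fold stabilized mRNA and is not read from Fig. 5.

```python
import math


def decay_constant(half_life_min: float) -> float:
    """First-order decay constant (per minute) from a half-life in minutes."""
    return math.log(2) / half_life_min


def fold_stabilization(half_life_ref: float, half_life_test: float) -> float:
    """How many-fold more slowly the test mRNA decays than the reference mRNA."""
    return decay_constant(half_life_ref) / decay_constant(half_life_test)


if __name__ == "__main__":
    k_fast = decay_constant(5.0)   # half-life reported for the unstable constructs
    k_slow = decay_constant(50.0)  # hypothetical half-life of a ~10-fold stabilized mRNA
    print(f"k = {k_fast:.3f}/min versus {k_slow:.4f}/min")
    print(f"fold stabilization = {fold_stabilization(5.0, 50.0):.1f}")
```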
[Figure 5 schematic: PGK1 alleles with the H2(3), Asp, H2(2), H2(1), X and BglII sites marked, and the mRNA half-life (min) of each construct.]
FIG. 5. Decay rates for PGK1 mRNAs containing deletions downstream of a nonsense codon. PGK1 nonsense alleles containing a 5'-proximal nonsense mutation [located in the H2(3) site] and deletions of various amounts of the coding sequence downstream of the nonsense mutation are depicted. mRNA decay rates were determined as described in the legend to Fig. 1. The PGK1 coding sequence is represented by the thick bar and the sequence that was deleted is indicated by the absence of the thick bar. The deletions comprise the sequences between (1) 5.6 and 92%, (2) 5.6 and 76.2%, (3) 5.6 and 67.7%, (4) 5.6 and 55%, and (5) 5.6 and 39% of the PGK1 coding region. In an additional construct, the PGK1 sequence between the XbaI and HincII(1) sites (the 106-nt downstream element) was inserted in both orientations downstream of a 5'-proximal nonsense mutation in a PGK1 allele with a large deletion of its coding region (the PGK1 allele shown in mRNA 1). From 20.
D. What Is the Role of the Downstream Element?
Three observations suggest that the downstream element may act in the nonsense-mediated mRNA decay pathway by promoting translation reinitiation:

(1) three ATG codons lie within the 106-nt sequence that serves as a functional downstream element (20) (Fig. 6);

(2) insertion of a stem-loop structure, which inhibits both translation initiation and reinitiation, immediately downstream of a nonsense codon stabilizes an otherwise unstable PGK1 transcript (22a); and

(3) 3-aminotriazole, an inhibitor of amino-acid biosynthesis that reduces the capacity of cells to reinitiate translation at downstream start codons [as in the case of the GCN4 transcript (23)], also stabilizes mRNAs with nonsense mutations without affecting the decay of wild-type transcripts (22a).

These observations prompted a test of whether deleting or mutating the potential translation reinitiation sites (the ATG codons) in the downstream element would affect its ability to promote nonsense-mediated mRNA decay. Deletion analysis demonstrated that the third ATG triplet in the downstream element is not required for nonsense-mediated mRNA decay (20) (Fig. 6). The two other ATG codons reside in the first 34 nt of the element and are essential for rapid mRNA decay (see Fig. 6, constructs 2, 3, and 12). These ATG codons are 12 nt apart and, surprisingly, are each contained within identical 9-nt sequences. To assess the importance of the ATG codons in promoting nonsense-mediated mRNA decay, the G nucleotide in each was mutated to other bases. Certain combinations of mutations that altered the G nucleotide of the first two ATGs (ATG-1 and ATG-2) stabilized nonsense-containing mRNAs, whereas other mutation combinations had no effect on the nonsense-mediated mRNA decay pathway. For example, in two constructs in which the downstream element contained a mutation of ATG-1 to an ATA codon, the mRNAs were stabilized six- to ninefold (22a). Taken together with the deletion analyses, these results indicate that ATG-1 and ATG-2 are important components of the downstream element.
[Figure 6 panel: sequence of the 106-nt downstream element (GACTTCATCATTGCTGATGCTTTCTCTGCTGATGCCAACACCAAGACTGTCACTGACAAGGAAGGTATTCCAGCTGGCTGGCAAGGGTTGGACAATGGTCCAGAAT) aligned with its deletion derivatives and the mRNA half-life (min) of each construct.]
FIG. 6. Deletion analysis of the downstream element. The sequences of the downstream element and of 11 deletion mutants are shown. Downstream elements containing the various deletions were inserted into the mini-PGK1 allele containing a nonsense codon at the H2(3) site (depicted schematically at the top). Half-lives for the respective PGK1 mRNAs were determined as described in the legend to Fig. 1. From 20.
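The ATG observations can be checked directly against the element's sequence. The sequence below is transcribed from the Fig. 6 panel, with one OCR-damaged position restored from an aligned copy in the same figure, so treat it as a reconstruction. The minimal sketch lists every ATG in any frame. It also reports the number of nucleotides separating ATG-1 from ATG-2 and extracts the identical stretch surrounding the two codons. The function names are illustrative.

```python
# 106-nt downstream element as read from the Fig. 6 panel (reconstructed from OCR).
ELEMENT = (
    "GACTTCATCATTGCTGATGCTTTCTCTGCTGATGCCAACACCAAGACTGTCACTGAC"
    "AAGGAAGGTATTCCAGCTGGCTGGCAAGGGTTGGACAATGGTCCAGAAT"
)


def find_codon(seq: str, codon: str = "ATG") -> list[int]:
    """1-based start positions of every occurrence of `codon`, in any frame."""
    n = len(codon)
    return [i + 1 for i in range(len(seq) - n + 1) if seq[i:i + n] == codon]


def shared_context(seq: str, a: int, b: int, codon_len: int = 3) -> str:
    """Identical sequence surrounding two copies of a codon (1-based starts,
    a < b), found by extending outward while both flanks still match."""
    i, j = a - 1, b - 1
    left = 0
    while i - left - 1 >= 0 and seq[i - left - 1] == seq[j - left - 1]:
        left += 1
    right = codon_len
    while j + right < len(seq) and seq[i + right] == seq[j + right]:
        right += 1
    return seq[i - left:i + right]


if __name__ == "__main__":
    atgs = find_codon(ELEMENT)
    atg1, atg2 = atgs[0], atgs[1]
    print("length:", len(ELEMENT))
    print("ATG starts (1-based):", atgs)
    print("nt between ATG-1 and ATG-2:", atg2 - (atg1 + 3))
    print("shared context:", shared_context(ELEMENT, atg1, atg2))
```

Run on this reconstruction, the script finds ATGs starting at nt 17, 32 and 95. ATG-2 ends at nt 34, 12 nt separate the first two codons, and both sit in the same 9-nt stretch, TGCTGATGC. This agrees with the description in the text.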
E. The Position-dependent Effects of Nonsense Mutations on mRNA Turnover Are a Consequence of Modulations in the Activity of the Nonsense-mediated mRNA Decay Pathway
When between 67 and 76.6% of the PGK1 protein-coding sequence has been translated, the PGK1 transcript becomes insensitive to the nonsense-mediated decay pathway (Fig. 3). A functional downstream element inserted 3' of a nonsense mutation at 92% of the PGK1 protein-coding region did not change mRNA decay rates. Thus, 3'-proximal nonsense mutations resist the nonsense-mediated mRNA decay pathway for reasons other than the lack of a specific downstream element (20) (Fig. 7). At least two hypotheses could explain this observation: (1) translation of a specific element or region in the PGK1 transcript inactivates the nonsense-mediated mRNA decay pathway; or (2) a factor(s) involved in the nonsense-mediated mRNA decay pathway becomes inactivated stochastically as ribosomes translocate along the PGK1 transcript. To distinguish between the two hypotheses, additional protein-coding sequences were inserted upstream of a nonsense mutation that promoted nonsense-mediated mRNA decay, and the stabilities of the resulting PGK1 transcripts were compared. Inserting the region comprising 55 to 76% of the PGK1 protein-coding region upstream of and in frame with the nonsense codon partially stabilized this mRNA, from a half-life of 3-5 minutes to one of 15 minutes (20) (Fig. 8). Another sequence of comparable size did not. Inserting the 55 to 76% region into the wild-type PGK1 gene did not affect mRNA decay, demonstrating that this sequence by itself is not an instability element (Fig. 8). These results ruled out the stochastic inactivation model. They indicated instead that a specific sequence in the PGK1 transcript inactivates the nonsense-mediated mRNA decay pathway when it is translated. The partial activity of this sequence suggests that the element may not have been isolated in its entirety. Such a stabilizing element might confer resistance to the pathway by promoting the loss of a ribosome-associated factor required for nonsense-mediated mRNA decay. The proteins encoded by the UPF1 or UPF3 genes (or one of their "interacting factors") are candidates for such a ribosome-associated, stabilizer-regulated factor (10, 18, 19, 24-26; see Section II). Regulation of the nonsense-mediated pathway may be essential to prevent its activity during the course of normal translation termination. Evidence that this pathway is, indeed, inactive during normal translation termination is provided by the existence of relatively stable mRNAs with very short open reading frames, e.g., CUP1 mRNA [0.5 kb; t1/2 = 18 minutes (see I S)].
FIG. 7. Downstream elements inserted distal to 3'-proximal nonsense mutations in the PGK1 gene do not accelerate mRNA decay. Sequences downstream of 3'-proximal nonsense mutations in the PGK1 gene were replaced with PGK1 protein-coding sequences harboring downstream elements. Schematic representations of the PGK1 alleles and a summary of the respective mRNA half-lives (measured as described in the legend to Fig. 1) are shown. The thick bar represents the PGK1 gene, the stippled box represents the nonsense mutation, and the open rectangle represents the sequences inserted downstream of the nonsense mutation. The PGK1 alleles represented in (2) and (3) replaced sequences downstream of the nonsense mutation inserted at 92.6% of the coding region, whereas (4) and (5) replaced sequences downstream of a nonsense mutation inserted at 55% of the coding region. The sequences added were as follows: (1) none; control PGK1 allele with a nonsense mutation inserted at the BglII site (92% of the PGK1 protein-coding region); (2) from 5.6% of the PGK1 protein-coding region to the end of the gene; (3) from 39% of the PGK1 protein-coding region to the end of the gene; (4) from 5.6% of the PGK1 protein-coding region to the end of the gene; (5) from 39% of the PGK1 protein-coding region to the end of the gene. From 20.
[Figure 8 schematic: map of the PGK1 coding region with positions 5.6, 39.0, 55.0, 67.7, 76.2 and 92.8% marked; coding-region fragments inserted into the wild-type PGK1 gene or into a PGK1 nonsense allele; mRNA decay rate (min) for each construct.]
FIG. 8. Sequences in the PGK1 gene inactivate the nonsense-mediated mRNA decay pathway. The outline of the experimental approach and a summary of the results are shown. PGK1 protein-coding sequences were inserted in-frame into either the wild-type PGK1 gene or upstream of a nonsense mutation located at the 55% site of the PGK1 protein-coding region. The following two segments of the PGK1 coding region were inserted into the Asp-718 (or KpnI; see Fig. 2) site of the PGK1 alleles: (1) the amino-terminal 21% of the PGK1 coding sequence and (2) the segment between 55 and 76.2% of the PGK1 coding region. PGK1 alleles containing these insertions were transformed into a yeast strain harboring a temperature-sensitive RNA polymerase II mutation (rpb1-1), and the decay rates of the respective transcripts were determined. From 20.
II. trans-Acting Factors Involved in the Nonsense-mediated mRNA Decay Pathway
A. Non-tRNA Nonsense Suppressors: Products of the UPF Genes
Nonsense suppressors in yeast are either tRNA mutants capable of decoding a translation termination codon, or mutants with lesions in non-tRNA genes that enhance the expression of nonsense-containing alleles by other mechanisms. The latter include the allosuppressors, frameshift suppressors, and omnipotent suppressors (23, 27). At least two of these suppressors, upf1 and upf3, act by suppressing nonsense-mediated mRNA decay. Mutants in the UPF1 gene (and in UPF2, UPF3, and UPF4) were isolated on the basis of their ability to act as allosuppressors of the his4-38 frameshift mutation (28). The latter mutation is a single G insertion in the HIS4 gene that generates a +1 frameshift and a UAA nonsense codon in the triplet adjacent to the insertion (29). At 30°C, but not 37°C, his4-38 is suppressed by SUF1-1, which encodes a glycine tRNA capable of reading a four-base codon (30). Mutations in UPF1 allow his4-38, SUF1-1 cells to grow at 37°C (28). The basis of this suppression appears to be the loss of function of a trans-acting factor (Upf1p) essential for nonsense-mediated mRNA decay. Mutations in UPF1 lead to the selective stabilization of mRNAs containing early nonsense mutations without affecting the decay rates of most other mRNAs (10) (Table I). For example, in a UPF1 deletion mutant (upf1Δ), the his4-38 mRNA is stabilized approximately fivefold (10). The effect of the upf1Δ mutation on the turnover of the various PGK1 nonsense alleles is position-independent. Regardless of position, all of the PGK1 nonsense alleles have mRNA half-lives on the order of an hour in a upf1Δ strain (20) (Table I). Loss of UPF1 function therefore restores wild-type decay rates to mRNAs that would otherwise have been subject to the accelerated decay promoted by nonsense codons.
This result also demonstrates that neither the linker used to insert nonsense mutations into the PGK1 gene nor the PGK1 sequence itself contained instability elements capable of accelerating PGK1 mRNA decay independently of the nonsense-mediated mRNA decay pathway. Suppression of nonsense-mediated mRNA decay in upf1Δ strains does not appear to result from enhanced read-through of the termination signal (10). Nor does it appear to be specific for a single nonsense codon. upf1- mutants suppress tyr7-1 (UAG), leu2-1 (UAA), leu2-2 (UGA), met8-1 (UAG), and his4-166 (UGA) (26), which indicates that they can act as omnipotent suppressors. Many biosynthetic pathways do not require maximal levels of expression for cell survival [e.g., only 6% of the HIS4 gene product is required for growth on plates lacking histidine (31)]. upf1- mutants must therefore suppress by stabilizing nonsense-containing transcripts, allowing synthesis of enough read-through protein for cells to grow under conditions nonpermissive for UPF1+ cells. The UPF1 gene has been cloned and sequenced and shown to be:

(1) nonessential for viability on rich media (10, 24);

(2) capable of encoding a 109-kDa protein with zinc-finger, nucleotide (GTP)-binding-site, and RNA helicase (superfamily I) motifs (10, 24, 25);

(3) identical to NAM7, a nuclear gene isolated as a high-copy suppressor of mitochondrial RNA splicing mutations (10, 24); and

(4) partially homologous to the yeast SEN1 gene (10, 24, 26).

SEN1 encodes a noncatalytic subunit of the tRNA splicing endonuclease complex (32), suggesting that Upf1p may also be part of a nuclease complex targeted specifically to nonsense-containing mRNAs.
TABLE I. mRNA DECAY RATES IN ISOGENIC STRAINS CONTAINING EITHER WILD-TYPE OR DELETED UPF1 GENES^a
Columns: Transcript; t1/2 (minutes) in UPF1+ and in UPF1- strains; relative abundance (UPF1-/UPF1+).
[Table body not recoverable from the scanned source: the transcript names are missing and the numerical entries cannot be assigned to rows.]
^a mRNA decay rates were determined by northern blotting assays of RNA isolated after a temperature shift in strains harboring the rpb1-1 mutation and either the wild-type UPF1 gene or a deletion of the UPF1 gene. Construction of upf1Δ/rpb1-1 strains is described in 10. Relative abundance is the ratio of the steady-state mRNA levels in upf1Δ versus UPF1+ strains. The first five mRNAs are transcribed from nonsense alleles of the respective genes. his4-713 is a 3'-proximal nonsense allele of the HIS4 gene. Data are from 10, 18, 20, and 48a.
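The half-lives in Table I come from northern blots of RNA sampled at intervals after transcription is shut off by a temperature shift of rpb1-1 strains. The minimal sketch below shows how a half-life can be estimated from such a time course. It assumes first-order decay and uses synthetic band intensities, not published data. The fit is an ordinary least-squares line through log(signal) versus time. The second function states a further assumption: if the synthesis rate is the same in both strains, steady-state level is proportional to half-life, so the "relative abundance" column should track the ratio of half-lives. Both function names are illustrative.

```python
import math


def fit_half_life(times_min: list[float], signal: list[float]) -> float:
    """Least-squares fit of ln(signal) = ln(S0) - k*t; returns ln(2)/k (minutes)."""
    logs = [math.log(s) for s in signal]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    k = -sxy / sxx  # the slope of the log-linear fit is -k
    return math.log(2) / k


def expected_relative_abundance(t_half_mutant: float, t_half_wild_type: float) -> float:
    """Steady-state ratio if synthesis (s) is unchanged: level = s * t1/2 / ln 2."""
    return t_half_mutant / t_half_wild_type


if __name__ == "__main__":
    # Synthetic time course: exact first-order decay with a 4-minute half-life.
    times = [0, 2, 4, 6, 8, 10]
    signal = [100 * 0.5 ** (t / 4) for t in times]
    print(f"fitted half-life: {fit_half_life(times, signal):.2f} min")
    print("expected abundance ratio, 4 -> 20 min:", expected_relative_abundance(20, 4))
```

Real blot data are noisy and often show a lag after the temperature shift. In practice the fit would be restricted to the time points that follow first-order kinetics.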
B. Subcellular Localization of Upf1p
The identification of a stabilizer element in the PGK1 transcript led to the hypothesis that a factor required for nonsense-mediated mRNA decay (e.g., Upf1p) must be associated with ribosomes in order for it to be active (see Section I,E). To determine the subcellular localization of Upf1p, a FLAG epitope tag [Met-Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys (33)] was inserted at the amino-terminal end of the UPF1 gene, creating the FLAG-UPF1 allele (19). The FLAG-UPF1 gene was subcloned into both centromere and high-copy yeast plasmids and transformed into a upf1Δ strain. Both biochemical and genetic assays demonstrated that the activity of Upf1p was unaffected by the FLAG sequence. The his4-38 mRNA was degraded rapidly in both UPF1+ and FLAG-UPF1+ strains, but was stabilized fivefold in a upf1- strain (19, 33a). Furthermore, the FLAG-UPF1 allele abrogated the omnipotent suppression of the leu2-2 allele normally observed in a upf1- strain (19, 33a). To determine whether Upf1p is ribosome-associated, postmitochondrial extracts were prepared from cycloheximide-treated cells harboring the FLAG-UPF1 gene on a high-copy plasmid. The extracts were fractionated by centrifugation through sucrose gradients. The presence of FLAG-Upf1p in the gradient fractions was determined by western blotting, probing the blots with a monoclonal antibody directed against the FLAG epitope (antibodies against the FLAG nonapeptide are commercially available). These experiments indicate that most of the FLAG-Upf1p in the extract cosediments with polysomes, although a small amount of the protein is also contained in the monosome and mRNP fractions (19) (Fig. 9). Upf1p associated with the polysome fraction consists predominantly of a 118-kDa polypeptide and a smaller amount of a 108-kDa polypeptide. This result suggests that Upf1p may be modified when it associates with ribosomes. Subcellular localization of Upf1p has also been monitored by indirect immunofluorescence and by electron microscopy of thin sections (33b). In both types of experiments, Upf1p was identified by exploiting the FLAG epitope tag and a FLAG-specific monoclonal antibody. Both approaches indicate that Upf1p is distributed throughout the cell in approximately 200-400 distinct clusters; for example, staining of Upf1p by indirect immunofluorescence is punctate (Fig. 10). By electron microscopy, these clusters appear to be 50-100 Å in diameter. They are present predominantly in the cytoplasm but are definitely detectable in the nucleus (33b). These results suggest that Upf1p may be part of a degradation complex and that at least some components of that complex may cycle in and out of the nucleus. A nuclear aspect to this pathway is intriguing for two reasons. Some putative Upf1p-interacting proteins are known nuclear components (see Section II,C). In addition, recent studies in mammalian cells suggest that premature nonsense codons may alter pre-mRNA turnover or processing (5, 7, 34, 35).
[Figure 9 panel: sucrose-gradient absorbance profile and western blots of polysome, monosome, and upper-gradient fractions (fractions 5-16).]
FIG. 9. Localization of the FLAG-UPF1 protein in yeast extracts. The FLAG-UPF1 allele was inserted into a high-copy plasmid and transformed into a yeast strain harboring a deletion of the UPF1 gene. A cytoplasmic extract was prepared from cycloheximide-treated cells and fractionated on a sucrose gradient. The absorbance profile is shown on the left. Aliquots from the polysome fractions, the monosome fractions, and fractions from the upper part of the sucrose gradient were concentrated and analyzed by western blotting, probing with a FLAG-specific monoclonal antibody (generously provided by Immunex Corp.). Reproduced from 19, with permission from Springer-Verlag.
FIG. 10. Localization of Upf1p in yeast cells by indirect immunofluorescence. Indirect immunofluorescence of yeast cells (upf1Δ) harboring the FLAG-UPF1 allele on a centromere plasmid was performed using a monoclonal antibody directed against the FLAG epitope (generously provided by Immunex Corp.) and goat anti-mouse IgG and IgM (H and L) conjugated to fluorescein (DTAF). From 33b.
C. Identification of Genes that Encode Potential Upf1p-interacting Proteins
Fields and colleagues have developed a genetic method (the "two-hybrid system") for the identification of genes that encode interacting proteins (36, 37). The procedure is based on the observation that the DNA-binding and activation functions of the GAL4 transcription activator can reside on two distinct polypeptides and still activate transcription from a GAL upstream activating sequence (UAS), provided that the two polypeptides can bind to each other (37, 38). A plasmid vector constructed by Fields and co-workers was designed to fuse a coding region of interest to sequences encoding the DNA-binding domain of GAL4 (36, 37). In a second set of vectors, genomic DNA libraries were fused, in three separate frames, to sequences encoding the transcription activation domain of GAL4. The polypeptides encoded by both sets of gene fusions were targeted to the nucleus by the SV40 T-antigen or GAL4 nuclear localization signals. In cells transformed with the binding-domain fusion and the activation-domain fusions, a GAL1/lacZ reporter gene indicates a functional interaction between the polypeptides encoded by the two plasmids. We have used this method to identify putative Upf1p-interacting proteins. The screen used a fusion of the entire UPF1 protein-coding region to the GAL4 DNA-binding domain and yeast genomic libraries with fragments fused to the GAL4 activation domain. We screened 400,000 transformants for colonies that showed β-galactosidase activity (F. He, S. W. Peltz, A. H. Brown, and A. Jacobson, unpublished). The activation-domain plasmids were isolated and tested for specificity by retransformation with either (1) the GAL4 binding-domain vector only, (2) the original GAL4/UPF1 fusion plasmid, (3) an unrelated GAL4 DNA-binding domain-CEP1 fusion, or (4) an unrelated GAL4 DNA-binding domain-lamin fusion. There were 42 plasmids that yielded blue colonies only with the GAL4 DNA-binding domain-UPF1 fusion. These were characterized further by restriction mapping, Southern blotting, and DNA sequence analysis, yielding nine different genes (F. He and A. Jacobson, unpublished).
Six of these genes encode putative Upf1p-interacting proteins, because their activity in the assay depends on fusion to the GAL4 activation domain (see Table II). The remaining three genes do not require the presence of the GAL4 activation domain and are likely to possess their own activation sequence and nuclear localization signal; their candidacy as genes encoding Upf1p-interacting proteins remains to be established. Of the six putative genes encoding Upf1p-interacting proteins, two are identical to previously characterized yeast genes: (1) DBP2, a gene encoding a putative RNA helicase with homology to the mammalian p68 RNA helicase (39), and (2) SNP1, a gene encoding a homologue of the U1 snRNP 70-kDa protein (40). The other four have no significant homologues in the available databases and have been designated nonsense-mediated decay genes (NMD1-NMD4). Table II summarizes, for the six genes encoding putative Upf1p-interacting proteins, the sizes of the respective polypeptides, the consequences of gene disruption, and the chromosome assignments.
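The two filters applied to the activation-domain plasmids can be written as simple rules. First, keep clones that give blue colonies with the GAL4 DNA-binding domain-UPF1 fusion and with none of the control plasmids. Second, separate clones whose signal depends on the GAL4 activation domain from those that activate the reporter on their own. The minimal sketch below uses hypothetical clone names and made-up readouts. The `needs_activation_domain` flag is an assumed summary of the activation-domain test, not a field from the original screen.

```python
from dataclasses import dataclass

# Control plasmids used in the specificity retest (empty vector or unrelated fusions).
CONTROL_BAITS = ("vector_only", "CEP1", "lamin")


@dataclass
class Clone:
    name: str                      # hypothetical clone identifier
    blue_with: set[str]            # baits that gave blue colonies on X-gal
    needs_activation_domain: bool  # False if signal persists without the GAL4 AD fusion


def upf1_specific(clone: Clone) -> bool:
    """Blue with the GAL4 DBD-UPF1 bait and with none of the control baits."""
    return "UPF1" in clone.blue_with and not any(b in clone.blue_with for b in CONTROL_BAITS)


def classify(clones: list[Clone]) -> tuple[list[str], list[str]]:
    """Split UPF1-specific clones into AD-dependent candidates and self-activators."""
    candidates, self_activators = [], []
    for clone in clones:
        if not upf1_specific(clone):
            continue
        target = candidates if clone.needs_activation_domain else self_activators
        target.append(clone.name)
    return candidates, self_activators


if __name__ == "__main__":
    clones = [
        Clone("clone_A", {"UPF1"}, True),                 # passes both filters
        Clone("clone_B", {"UPF1", "lamin"}, True),        # fails specificity
        Clone("clone_C", {"UPF1"}, False),                # activates without the AD fusion
        Clone("clone_D", {"vector_only", "UPF1"}, True),  # fails specificity
    ]
    print(classify(clones))
```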
TABLE II. PUTATIVE Upf1p-INTERACTING PROTEINS^a
Gene    Relative β-galactosidase activity    Polypeptide (number of amino acids)    Gene disruption    Chromosome
NMD1    +++    659     Lethal    14
NMD2    +++    1089    Viable    8
NMD3    ++     487     Lethal    8
NMD4    +++    218     Viable    12
SNP1    +      300     Lethal    9
DBP2    +++    546     Lethal    13
^a Yeast genes encoding putative Upf1p-interacting proteins were identified by the screening procedures of Fields and co-workers (36, 37). Genes showing specificity for interactions with Upf1p were cloned, sequenced, and analyzed for the in vivo consequences of gene disruption. Of the six genes identified, two were identical to previously isolated genes (SNP1 and DBP2) (39, 40) and four were not previously characterized. The latter were named NMD1-NMD4 (nonsense-mediated decay). Relative β-galactosidase activity refers to the intensity of color development on X-gal plates in the standard "two-hybrid" screen. Polypeptide size is deduced from the respective nucleotide sequences. Chromosome location is derived from physical mapping or from the sequence analysis of neighboring, mapped genes. Gene disruption data were obtained directly for NMD1-NMD4. Gene disruption data for SNP1 and DBP2 are from previously published studies (39, 40). From F. He, S. W. Peltz, A. H. Brown, and A. Jacobson (unpublished).
Additional experiments are still needed to demonstrate the significance of these genes to the nonsense-mediated mRNA decay pathway. In part, the role of these genes will be addressed by assessing the mRNA decay phenotypes of deletion or conditionally lethal mutants and the subcellular localization and biochemical interactions of the respective proteins. Nevertheless, the identification of a putative RNA helicase and an snRNP protein is extremely interesting. Because several translation initiation factors are RNA helicases (41, 42), this result may reflect a role for reinitiation in triggering mRNA decay. Alternatively, RNA helicases may be required to unwind RNA in order for it to be a substrate for the turnover machinery. The recovery of SNP1 in this screen is consistent with the localization of a fraction of Upf1p to the nucleus (see Section II,B) and may reflect the existence of a nuclear component to this pathway.
D. Mutants in trans-acting Factors Required for Nonsense-mediated mRNA Decay May Facilitate the Identification of cis-acting Instability Elements of Inherently Unstable mRNAs
Deletion of the UPF1 gene restores wild-type decay rates to mRNAs containing premature translational termination codons, but has no effect on inherently unstable mRNAs, i.e., those mRNAs whose instability is intrinsic to their wild-type primary sequence (10, 18, 20) (Table I). This observation indicates that there are at least two independent rapid-decay pathways. UPF1 mutations inactivate only the nonsense-mediated rapid-decay pathway. In addition, the cis-acting coding-region instability element of the MATα1 mRNA was previously shown to function only when translated (22). Together, these findings suggest an alternative method for identifying such cis-acting instability elements in inherently unstable mRNAs. Nonsense mutations would be inserted at various positions in the protein-coding region of a gene of interest, and the decay rates of the mRNAs from the respective nonsense alleles would then be measured in a upf1Δ strain. A upf1Δ strain (or any other mutant in which a trans-acting factor essential for nonsense-mediated decay is inactivated) would lack nonsense-mediated mRNA decay. In such a strain, nonsense mutations that precede the instability element would prevent translation of the element, and the mRNA would not be degraded rapidly. Conversely, nonsense mutations distal to the instability element would allow translation through the instability element and would result in rapid turnover of the transcript. Previous results indicate that this approach should be viable. Ribosome translocation up to or through the MATα1 instability element is required for rapid decay of PGK1/MATα1 and ACT1/MATα1 chimeric mRNAs (22). Insertion of a nonsense codon 5' of the MATα1 instability element prevented turnover of the hybrid transcripts, demonstrating that if this sequence is not translated it cannot promote degradation (22). Unlike the hybrid gene approach (22, 43-45), the nonsense mutation approach would create only minor alterations in the endogenous transcript, allowing identification of instability elements in an almost wild-type context.
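The logic of the proposed nonsense-mutation scan reduces to a small inference. In a upf1Δ background, alleles whose nonsense codon lies upstream of the instability element should yield stable mRNA, and alleles whose codon lies downstream should yield unstable mRNA. The region that must be translated therefore lies between the last "stable" position and the first "unstable" one. The minimal sketch below assumes hypothetical nonsense positions and half-lives and a hypothetical cutoff separating "stable" from "unstable". The function name is illustrative.

```python
def bracket_instability_element(alleles: list[tuple[float, float]],
                                unstable_cutoff_min: float):
    """
    alleles: (nonsense_position_pct, half_life_min) pairs measured in a upf1Δ strain.
    Returns (lower, upper) bounds on where the translated instability element
    lies, or None if the data do not show 'stable upstream, unstable downstream'.
    """
    ordered = sorted(alleles)
    rapid = [half_life < unstable_cutoff_min for _, half_life in ordered]
    if True not in rapid:
        return None  # no allele lets ribosomes translate through an element
    first_rapid = rapid.index(True)
    if not all(rapid[first_rapid:]):
        return None  # inconsistent with a single element that must be translated
    lower = ordered[first_rapid - 1][0] if first_rapid > 0 else 0.0
    upper = ordered[first_rapid][0]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical scan: nonsense codons at 10-70% of the coding region.
    scan = [(10, 40.0), (30, 38.0), (50, 5.0), (70, 4.0)]
    print(bracket_instability_element(scan, unstable_cutoff_min=15.0))
```

Placing additional nonsense mutations inside the returned interval would narrow it further, in the same way the PGK1 scan localized the position effect.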
III. Possible Functions of the Nonsense-mediated mRNA Decay Pathway
A. Surveillance for pre-mRNAs that Have Entered the Cytoplasm
The existence of trans-acting factors that promote rapid decay of nonsense-containing mRNAs raises the question of whether such mRNAs are the sole substrates of these factors, i.e., whether the cell has an apparatus to degrade nonsense-containing mRNAs specifically. It seemed unlikely that the normal function of the nonsense-mediated decay pathway was anticipatory, i.e., solely involved in the degradation of mRNAs derived from nonsense alleles, so we sought to determine whether the pathway had additional substrates. Introns generally lack contiguous open reading frames, and yeast introns are almost always at the 5' ends of their genes (46). We therefore considered it possible that the pathway controls the abundance of yeast pre-mRNAs that enter the cytoplasm. If so, unspliced introns within a pre-mRNA would lead to premature translational termination and accelerated RNA decay in wild-type strains, but not in a upf1- strain. To test this hypothesis, half-lives and cytoplasmic localization were determined in isogenic UPF1+ or upf1- yeast strains for three pre-mRNAs (47). Two are inefficiently spliced (the CYH2 and RP51B pre-mRNAs), and one is regulated in its splicing (the MER2 pre-mRNA). Starting from the normal translation initiation sites, ribosomes translating these pre-mRNAs would encounter the first in-frame nonsense triplet at codon 19 in CYH2 pre-mRNA, codon 8 in RP51B pre-mRNA, and codon 132 in MER2 pre-mRNA (47). Taking the effect of nonsense codons on the PGK1 mRNA as a paradigm (Fig. 3), the initial nonsense codons in all three pre-mRNAs would be expected to lie early enough to trigger nonsense-mediated mRNA decay. The decay rates and steady-state levels of the CYH2 and RP51B mRNAs are unaffected by the status of the UPF1 gene (47) (Fig. 11). However, turnover and accumulation of the CYH2, RP51B, and MER2 pre-mRNAs differed in UPF1+ and upf1- strains. Compared with their decay rates in a UPF1+ strain, these pre-mRNAs were stabilized two- to fivefold in upf1- cells (Fig. 11). These results suggest that cytoplasmic pre-mRNAs in yeast are a major class of substrate for the nonsense-mediated mRNA decay pathway. Crucial to this conclusion is evidence that the phenomena in question are actually cytoplasmic and not nuclear.
Degradation of a fraction of each pre-mRNA by a cytoplasmic decay pathway is supported by the observations that (1) intron-containing pre-mRNAs associate with ribosomes in upf1- cells, (2) the sizes of the respective polysomes are consistent with the relative positions of the first nonsense codons within the respective transcripts, and (3) the effects of a UPF1 mutation on pre-mRNA accumulation and turnover are limited to a subset of inefficiently spliced pre-mRNAs (10, 47). The latter observation suggests that "escape" of pre-mRNAs from the spliceosome assembly/nuclear-retention system into the cytoplasm may vary inversely with splicing efficiency. If pre-mRNAs are a major source of nonsense-containing cytoplasmic transcripts in yeast, then the prevalence of introns at the 5' ends of yeast genes (46) may be due, in part, to the existence of a cellular mechanism that ensures rapid degradation of those pre-mRNAs. The UPF1 gene product would thus function as part of the machinery that degrades these transcripts to reduce the generation of potentially deleterious truncated polypeptides.
FIG. 11. Decay of the CYH2, RP51B, and MER2 transcripts. Decay rates of these transcripts were determined by blot analysis of RNAs isolated at different times after transcription was inhibited in isogenic UPF1+ or upf1- strains. (A) Hybridization of the RNA blot with a radioactive CYH2 DNA probe containing both intron and exon sequences. (B) Rehybridization of the blot shown in A with a radioactive CYH2 DNA probe containing only intron sequences. (C) Hybridization of the RNA blot with a radioactive RP51B DNA probe containing both intron and exon sequences. (D) Hybridization of the RNA blot with a radioactive MER2 riboprobe containing both intron and exon sequences. From 47.
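The codon numbers quoted above (codon 19, 8, and 132) amount to reading each pre-mRNA in frame from its normal AUG, with the intron left in place, until the first UAA, UAG, or UGA is met. A minimal Python sketch of that scan follows. It assumes the initiating AUG is counted as codon 1; the function name and the toy sequence are hypothetical and are not taken from the CYH2, RP51B, or MER2 genes.

```python
from typing import Optional

# The three nonsense (termination) codons of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(seq: str, start: int) -> Optional[int]:
    """Return the codon number of the first in-frame stop codon.

    `start` is the 0-based index of the initiating AUG, which is counted as
    codon 1. DNA input is accepted (T is read as U). Returns None if the
    reading frame runs off the end of the sequence without a stop codon.
    """
    rna = seq.upper().replace("T", "U")
    codon_number = 1
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return codon_number
        codon_number += 1
    return None


# Hypothetical toy sequence; its first in-frame stop (UAG) is codon 5.
toy_pre_mrna = "GGAUGUCUGUAUGUUAGCAUAG"
aug = toy_pre_mrna.find("AUG")
print(first_in_frame_stop(toy_pre_mrna, aug))
```

Running the same scan on the spliced and unspliced forms of a real transcript would show how retaining an intron can bring a premature stop codon into frame.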
B. Other Possible Functions of the Nonsense-mediated mRNA Decay Pathway
An alternative function of the nonsense-mediated mRNA decay pathway may be to regulate the decay rates of transcripts with upstream open reading frames. The transcript of the PPR1 gene has a five-codon open reading frame upstream of the PPR1 coding sequence, arranged so that the termination codon of the upstream open reading frame overlaps with the ATG of the coding sequence (48). The PPR1 mRNA decays at one-third the rate in a upf1- strain compared with a UPF1+ strain (48a) (Table I), suggesting that upstream open reading frames, in addition to reducing the frequency of downstream translation initiation (23, 41), may also destabilize specific transcripts. This destabilization mechanism would not necessarily pertain to all mRNAs with upstream open reading frames, because UPF1 status has no effect on the decay rate of the GCN4 mRNA (48a) (Table I). Such differences may reflect the multiplicity of upstream open reading frames in the GCN4 mRNA (23) or may be attributable to differences in the strengths of the respective translation terminators; i.e., the contribution of nonsense codon context to the induction of destabilization remains to be elucidated.
As noted in Section III,A, a general function of the nonsense-mediated mRNA decay pathway may be to ensure that aberrant proteins do not accumulate in the cell. Thus, in addition to pre-mRNAs that have escaped from the nucleus because their splicing is inefficient or regulated, substrates for the pathway could include RNAs with transcription or splicing errors, products of incomplete RNA editing (49, 50), and polysomes in which the reading frame has inappropriately shifted. With regard to the latter, it is conceivable that the sequences promoting rapid decay of inherently unstable mRNAs do so by promoting a shift in the translational reading frame, which, in turn, could make the mRNA in question a substrate for the nonsense-mediated decay pathway. Such a convergence of decay pathways would be suggested if mutants, drugs, or cis-acting elements were found that have comparable effects on nonsense-mediated decay and on the decay of inherently unstable mRNAs. At present, there is no evidence that these two decay pathways overlap.
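Fold changes in decay rate translate directly into half-lives if decay after transcriptional shutoff is first order: the fraction remaining is exp(-kt) and t1/2 = ln 2 / k. Under that assumption, decaying at one-third the rate (as PPR1 mRNA does in upf1- cells) means a threefold longer half-life, and the two- to fivefold stabilization of the pre-mRNAs described above means two- to fivefold longer half-lives. The Python sketch below estimates k from a time course by a log-linear least-squares fit; first-order kinetics is an assumption of the sketch, and the time points and signals are synthetic, not data from Fig. 11 or Table I.

```python
import math


def fit_half_life(times_min, signal):
    """Estimate a half-life (minutes) assuming first-order decay.

    Model: signal(t) = signal(0) * exp(-k * t). ln(signal) is regressed on t
    by ordinary least squares; the slope is -k, and t1/2 = ln(2) / k.
    """
    logs = [math.log(s) for s in signal]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    var = sum((t - mean_t) ** 2 for t in times_min)
    k = -cov / var  # first-order decay constant, per minute
    return math.log(2) / k


# Synthetic time courses (arbitrary units) after transcription shutoff,
# generated with half-lives of 2 min (UPF1+) and 6 min (upf1-).
times = [0, 2, 4, 6, 8]
upf1_plus = [100 * math.exp(-math.log(2) / 2.0 * t) for t in times]
upf1_minus = [100 * math.exp(-math.log(2) / 6.0 * t) for t in times]

t_plus = fit_half_life(times, upf1_plus)
t_minus = fit_half_life(times, upf1_minus)
print(f"UPF1+ t1/2 = {t_plus:.1f} min; upf1- t1/2 = {t_minus:.1f} min")
print(f"fold stabilization = {t_minus / t_plus:.1f}")
```

With these noise-free inputs the fit recovers the half-lives used to generate them (2.0 and 6.0 min), i.e., a threefold stabilization.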
IV. Conclusions
A. Translation and Turnover Are Intimately Linked
The observation that premature translation termination can promote rapid mRNA decay is just one of a large set of observations that point to an important role for translation in the process of mRNA decay. Additional evidence for this linkage comes from experiments showing that (1) inhibitors of translation elongation (e.g., cycloheximide) or mutations that inhibit protein synthesis stabilize mRNAs (18, 21, 51, 52); (2) mRNAs undergoing rapid decay are polysome-associated (10, 47, 53); (3) degradative factors can be polysome-associated (19, 54, 55); (4) instability elements are located in coding regions (18, 22, 43-45, 52, 56-59); (5) the activity of coding-region instability elements depends on ribosome translocation up to, or near, the element (22, 60); and (6) metabolism of the poly(A) tail, a structure involved in translation initiation (61-63), is a rate-limiting step in the decay of several mRNAs (52, 59, 63, 64). Although these observations are consistent with the postulated role of ribosomal translocation (see Fig. 4), the requirement for translation is not likely to be limited to a single step in the turnover process. Ongoing protein synthesis may be essential for several mutually dependent events in nonsense-mediated mRNA decay, including (1) recognition of the nonsense codon, (2) recognition of the downstream element, (3) delivery or activation of essential factors, and (4) maintenance of the levels of metabolically unstable components of the decay machinery.
B. Does the Downstream Element Have to Be a Site of Translational Reinitiation?
We have postulated that, following nonsense codon-induced translation termination, a ribosome or ribosomal subunit scans 3' until it recognizes the downstream element (see Fig. 4). The available data suggest that this element may serve as a site of translational reinitiation (see Section I,D), but alternative models can also be inferred. For example, an essential event in the destabilization process may be an interaction between mRNA and rRNA analogous to that occurring in prokaryotic initiation (65) and proposed for mammalian internal initiation (66). An analysis of possible complementary sequences showed that the 9-nt sequence surrounding the ATG-1 and ATG-2 codons is complementary to sequences in yeast 18-S rRNA (18) (Fig. 12). The nucleotides bracketing the third ATG in the 106-nt downstream element cannot accommodate this base-pairing scheme and show no significant complementarity to any region of the 18-S rRNA. Interestingly, a 14-nt sequence from the instability element of the inherently unstable MATα1 mRNA (22) is also complementary to the same region of 18-S rRNA (reviewed in 18). Clearly, these interactions are only hypothetical and must be weighed against models suggesting that this region of rRNA may be involved in intramolecular base-pairing (67). However, this premise merits attention, because substantive evidence has emerged in recent years that rRNA has a functional role in translation (65, 68).
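Whether an mRNA segment can pair with rRNA is a question of antiparallel complementarity: the mRNA, read 5' to 3', is aligned against the rRNA read 3' to 5'. The Python sketch below counts Watson-Crick pairs and, optionally, G·U wobble pairs for two equal-length segments. The function name and all sequences are hypothetical placeholders, not the PGK1 or 18-S rRNA sequences of Fig. 12, and an ungapped, equal-length alignment is a simplifying assumption; a real analysis would also have to weigh gapped alignments and the competing intramolecular rRNA structure mentioned above.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(mrna: str, rrna: str, allow_wobble: bool = True) -> int:
    """Count base pairs between two equal-length RNA segments, both given 5'->3'.

    The segments are aligned antiparallel: mRNA position i faces rRNA
    position len(rrna) - 1 - i, so the rRNA is effectively read 3'->5'.
    """
    if len(mrna) != len(rrna):
        raise ValueError("segments must have the same length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return sum(
        1 for a, b in zip(mrna.upper(), reversed(rrna.upper())) if (a, b) in allowed
    )


# Hypothetical 9-nt mRNA segment and two hypothetical rRNA segments.
mrna_segment = "GCAUGGCAA"
perfect_partner = "UUGCCAUGC"  # exact reverse complement of mrna_segment
wobble_partner = "UUGCCAUGU"   # 3'-terminal C replaced by U (one G.U pair)

print(count_pairs(mrna_segment, perfect_partner))                     # 9
print(count_pairs(mrna_segment, wobble_partner))                      # 9
print(count_pairs(mrna_segment, wobble_partner, allow_wobble=False))  # 8
```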
C. Possible Functions of the Necessary Sequences and Factors
One possible consequence of either event (i.e., translational reinitiation or mRNA-rRNA base-pairing) is ribosome pausing. We consider it possible that a ribosome that has paused at a specific site (see site A, Fig. 12) may expose a downstream nuclease recognition site (see site B, Fig. 12) that could then be cleaved by a soluble, ribosome-bound, or ribosome-activated nuclease. A two-site model, in which the first site potentiates the cleavage mechanism and the second site is the actual position of nucleolytic attack, is consistent with the deletion data of Fig. 6. Moreover, the dependence on a ribosome-bound or ribosome-activated nuclease is consistent with the available data for both the coding-region stabilizer element and the UPF1 gene product.
[Figure 12 schematic: nonsense-containing mRNA, 5' to 3' (image not reproduced)]
FIG. 12. Two-site model for the role of the downstream element in promoting nonsense-mediated mRNA decay. Site A of the element is postulated to be a position of ribosome pausing, possibly as a consequence of mRNA·rRNA interactions. Shown is a region of theoretical base-pairing between a segment of yeast 18-S rRNA and sequences flanking a downstream AUG codon required for destabilization of a PGK1 mRNA containing a premature translational termination codon. Site B indicates a position of endonucleolytic cleavage. Staggered lines between the 40-S subunit and the putative nuclease are intended to suggest a possible interaction. The numbers flanking the 18-S rRNA sequence are the nucleotide positions within the primary sequence. From 18. See Sections IV,B and IV,C for details.
We interpret the stabilizer experiments to indicate that a translation or destabilization factor falls off the ribosome as a consequence of traversing the sequence element, and we propose that such a factor may be the protein encoded by either the UPF1 or the UPF3 gene, or one of their interacting factors. As such, this protein could be (1) a translation initiation factor that is also required for a reinitiation event that may trigger mRNA degradation, (2) a nuclease activated by a downstream reinitiation event, or (3) a factor that promotes an interaction with a specific nuclease or nuclease complex. The latter possibility is strengthened by the observation that Upf1p is present in the cell in distinct clusters (see Fig. 10). The finding that a fraction of Upf1p is localized to the nucleus (see Section II,B) indicates that this protein may associate with mRNA before the mRNA is transported to the cytoplasm, i.e., it may have a function independent of its role in promoting nonsense-mediated mRNA decay.
ACKNOWLEDGMENTS
This work was supported by grants from the National Institutes of Health to A. J. (GM27757) and to S.W.P. (GM48631). We thank Agneta Brown, Janet Donahue, Chris Trotta, Chris Powers, Peter Leeds, Michael Culbertson, Michael Rosbash, and Robert Singer for their contributions to the experiments described in this review.
REFERENCES
1. F. H. C. Crick, L. Barnett, S. Brenner and R. J. Watts-Tobin, Nature 192, 1227 (1961).
2. S. Osawa, T. H. Jukes, K. Watanabe and A. Muto, Microbiol. Rev. 56, 229 (1992).
3. W. J. Craigen, C. C. Lee and C. T. Caskey, Mol. Microbiol. 4, 861 (1990).
4. C. F. Barker and K. Beemon, MCBiol 11, 2760 (1991).
5. S. J. Baserga and E. J. Benz, Jr., PNAS 89, 2935 (1992).
6. B. Baumann, M. J. Potash and G. Kohler, EMBO J. 4, 351 (1985).
7. J. Cheng and L. E. Maquat, MCBiol 13, 1892 (1993).
8. M.-L. Gaspar, T. Meo, P. Bourgarel, J.-L. Guenet and M. Tosi, PNAS 88, 8606 (1991).
9. D. Gozalbo and S. Hohmann, Curr. Genet. 17, 77 (1990).
10. P. Leeds, S. W. Peltz, A. Jacobson and M. R. Culbertson, Genes Dev. 5, 2303 (1991).
11. S.-K. Lim, C. D. Sigmund, K. W. Gross and L. E. Maquat, MCBiol 12, 1149 (1992).
12. R. Losson and F. Lacroute, PNAS 76, 5134 (1979).
13. L. E. Maquat, A. J. Kinniburgh, E. A. Rachmilewitz and J. Ross, Cell 27, 543 (1981).
14. D. E. Morse and C. Yanofsky, Nature 224, 329 (1969).
15. G. Nilsson, J. G. Belasco, S. N. Cohen and A. von Gabain, PNAS 84, 4890 (1987).
16. F. Pelsy and F. Lacroute, Curr. Genet. 8, 277 (1984).
17. P. Belgrader, J. Cheng and L. E. Maquat, PNAS 90, 482 (1993).
18. S. W. Peltz and A. Jacobson, in "Control of mRNA Stability" (J. G. Belasco and G. Brawerman, eds.), p. 291. Academic Press, San Diego, 1993.
19. S. W. Peltz, C. Trotta, H. Feng, A. Brown, J. Donahue, E. Welch and A. Jacobson, in "Protein Synthesis and Targetting in Yeast" (M. Tuite, J. McCarthy, A. Brown and F. Sherman, eds.), p. 1. Springer-Verlag, New York, 1993.
20. S. W. Peltz, A. H. Brown and A. Jacobson, Genes Dev. 7, 1737 (1993).
21. D. Herrick, R. Parker and A. Jacobson, MCBiol 10, 2269 (1990).
22. R. Parker and A. Jacobson, PNAS 87, 2780 (1990).
22a. S. W. Peltz, A. H. Brown and A. Jacobson, unpublished.
23. A. G. Hinnebusch and S. W. Liebman, in "The Molecular and Cellular Biology of the Yeast Saccharomyces" (J. R. Broach, J. R. Pringle and E. W. Jones, eds.), Vol. 1, p. 627. CSHLab, Cold Spring Harbor, New York, 1991.
24. N. Altamura, O. Groudinsky, G. Dujardin and P. P. Slonimski, JMB 224, 575 (1992).
25. E. V. Koonin, TIBS 17, 495 (1992).
26. P. Leeds, J. M. Wood, B.-S. Lee and M. R. Culbertson, MCBiol 12, 2165 (1992).
27. A. P. Surguchov, TIBS 13, 120 (1988).
28. M. R. Culbertson, K. M. Underbrink and G. R. Fink, Genetics 95, 833 (1980).
29. T. F. Donahue, P. J. Farabaugh and G. R. Fink, Science 212, 455 (1981).
30. M. D. Mendenhall, P. Leeds, H. Fen, L. Mathison, M. Zwick, C. Sleiziz and M. R. Culbertson, JMB 194, 41 (1987).
31. R. F. Gaber and M. R. Culbertson, MCBiol 4, 2052 (1984).
32. M. Winey and M. R. Culbertson, Genetics 118, 607 (1988).
33. T. P. Hopp, K. S. Prickett, V. L. Price, R. T. Libby, C. J. March, P. Ceretti, D. L. Urdal and P. J. Conlon, Biotechnology 6, 1204 (1988).
33a. C. Trotta, S. W. Peltz, A. H. Brown and A. Jacobson, unpublished.
33b. A. H. Brown, C. Powers, R. Singer, S. W. Peltz and A. Jacobson, unpublished.
34. G. Urlaub, P. J. Mitchell, C. J. Ciudad and L. A. Chasin, MCBiol 9, 2868 (1989).
35. L. K. Naeger, R. V. Schoborg, Q. Zhao, G. E. Tullis and D. J. Pintel, Genes Dev. 6, 1107 (1992).
36. C.-T. Chien, P. L. Bartel, R. Sternglanz and S. Fields, PNAS 88, 9578 (1991).
37. S. Fields and O.-K. Song, Nature 340, 245 (1989).
38. G. Ma and M. Ptashne, Cell 55, 443 (1988).
39. R. D. Iggo, D. J. Jamieson, S. A. MacNeill, J. Southgate, J. McPheat and D. P. Lane, MCBiol 11, 1326 (1991).
40. V. Smith and B. G. Barrell, EMBO J. 10, 2627 (1991).
41. M. Kozak, JBC 266, 19867 (1991).
42. H. Yoon and T. F. Donahue, Mol. Microbiol. 6, 1413 (1992).
43. A. Jacobson, A. H. Brown, J. L. Donahue, D. Herrick, R. Parker and S. W. Peltz, in "Posttranscriptional Regulation of Gene Expression" (J. E. G. McCarthy and M. F. Tuite, eds.), p. 45. Springer-Verlag, Berlin, 1990.
44. B. Heaton, C. Decker, D. Muhlrad, J. Donahue, A. Jacobson and R. Parker, NARes 20, 5365 (1992).
45. D. Herrick and A. Jacobson, Gene 114, 35 (1992).
46. G. R. Fink, Cell 49, 5 (1987).
47. F. He, S. W. Peltz, J. L. Donahue, M. Rosbash and A. Jacobson, PNAS 90, 7034 (1993).
48. R. Losson, R. P. P. Fuchs and F. Lacroute, EMBO J. 2, 2179 (1983).
48a. S. W. Peltz, A. H. Brown, J. Wood, A. Atkins, P. Leeds, M. Culbertson and A. Jacobson, unpublished.
49. L. Simpson and J. Shaw, Cell 57, 355 (1989).
50. K. Stuart, Annu. Rev. Microbiol. 45, 327 (1991).
51. S. W. Peltz, J. L. Donahue and A. Jacobson, MCBiol 12, 5778 (1992).
52. S. W. Peltz, G. Brewer, P. Bernstein and J. Ross, Crit. Rev. Eukaryotic Gene Expression 1, 99 (1991).
53. D. W. Cleveland, TIBS 13, 339 (1988).
54. S. W. Peltz and J. Ross, MCBiol 7, 4345 (1987).
55. J. Ross and G. Kobs, JMB 188, 579 (1986).
56. R. Wisdom and W. Lee, Genes Dev. 5, 232 (1991).
57. P. Bernstein, D. Herrick, R. D. Prokipcak and J. Ross, Genes Dev. 6, 642 (1992).
58. K. S. Kabnick and D. E. Housman, MCBiol 8, 3244 (1988).
59. A.-B. Shyu, J. G. Belasco and M. E. Greenberg, Genes Dev. 5, 221 (1991).
60. R. A. Graves, N. B. Pandey, N. Chodchoy and W. F. Marzluff, Cell 48, 615 (1987).
61. D. Munroe and A. Jacobson, MCBiol 10, 3441 (1990).
62. D. Munroe and A. Jacobson, Gene 91, 151 (1990).
63. A. Sachs, Cell 74, 413 (1993).
64. A.-B. Shyu, M. E. Greenberg and J. G. Belasco, Genes Dev. 3, 60 (1989).
65. H. F. Noller, ARB 60, 191 (1991).
66. N. Sonenberg, Trends Genet. 7, 105 (1991).
67. E. Dams, L. Hendriks, Y. Van de Peer, J.-M. Neefs, G. Smits, I. Vandenbempt and R. De Wachter, NARes, Suppl. 16, r87 (1988).
68. A. Dahlberg, Cell 57, 525 (1989).
Molecular Biology and Regulatory Aspects of Glycogen Biosynthesis in Bacteria¹
JACK PREISS*,² AND TONY ROMEO†
*Department of Biochemistry, Michigan State University, East Lansing, Michigan 48824
†Department of Microbiology and Immunology, University of North Texas Health Science Center, Fort Worth, Texas 76107
I. Genetic Regulation of the Glycogen Synthesis Pathway in Escherichia coli . . . 301
A. Enzymes of the Glycogen Biosynthetic Pathway Are Induced in the Stationary Phase . . . 301
B. Structural Genes for Glycogen Biosynthesis Are Clustered in Two Adjacent Operons . . . 301
C. Evidence for Positive Control of the glgCAP(Y) Operon . . . 304
D. Negative Control of Two glg Operons via csrA . . . 306
E. Eσ70 Transcribes glgCAP . . . 312
F. Potential Control of Glycogen Biosynthesis via the Regulated Expression of Structural Genes Outside the glg Gene Cluster . . . 314
G. An Integrated Model for the Genetic Regulation of the Glycogen Synthesis Pathway in Escherichia coli . . . 314
II. Site-directed Mutagenesis of the ADPglucose … Genes to Study Structure-Function Relationships of Enzyme Action and Regulatory Control . . . 315
A. Allosteric Regulation of Bacterial ADPglucose Pyrophosphorylase . . . 315
B. Catalytic and Allosteric Effector Sites of ADPglucose Pyrophosphorylase . . . 319
III. Glossary . . . 326
References . . . 327
¹ A glossary for this chapter appears on pp. 326-327.
² To whom correspondence should be addressed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
Glycogen accumulates in many bacteria as an energy-reserve compound, particularly when growth is limited and an excess of a carbon source is available (reviewed in 1, 2). The precise function of glycogen in bacteria, however, is not clear; published results suggest that it may prolong viability by providing a source of energy, and that it may serve as an energy source for sporogenesis in certain Bacilli (3). In many oral bacteria, e.g., Streptococcus mitis (4) or Streptococcus mutans (5), glycogen degradation may contribute to the development of dental caries through the acid formed by catabolism of the polysaccharide.
The reactions leading to glycogen synthesis in bacteria have been studied extensively since 1964 (6). First, ADPglucose is synthesized in a reaction catalyzed by ADPglucose pyrophosphorylase (EC 2.7.7.27) (reaction 1). The glucosyl moiety of the ADPglucose formed is then transferred, in a reaction catalyzed by an ADPglucose-specific glycogen synthase (EC 2.4.1.21) (reaction 2), to either a maltodextrin or a glycogen primer to form a new α-1,4-glucosidic bond. Subsequently, branching enzyme (EC 2.4.1.18) catalyzes the formation of the branched α-1,6-glucosidic linkages of glycogen from the growing polyglucose chain (reaction 3). These enzyme-catalyzed reactions have been observed in extracts from at least 46 bacterial species (1, 7).
$$\mathrm{ATP} + \alpha\text{-glucose-1-P} \rightleftharpoons \mathrm{ADPglucose} + \mathrm{PP_i} \tag{1}$$
$$\mathrm{ADPglucose} + \alpha\text{-glucan} \rightarrow \alpha\text{-1,4-glucosyl-glucan} + \mathrm{ADP} \tag{2}$$
$$\text{linear } \alpha\text{-1,4-polyglucosyl chain} \rightarrow \text{branched } \alpha\text{-1,4-}\alpha\text{-1,6-glucan (i.e., glycogen)} \tag{3}$$
Genetic evidence that bacterial glycogen synthesis occurs mainly, if not solely, via the ADPglucose pathway has been obtained with Escherichia coli or Salmonella typhimurium mutants devoid of, or deficient in, glycogen (reviewed in 1, 2, 6, 8-10). Compared with the wild-type strain, these mutants are defective in ADPglucose pyrophosphorylase (ADPGlc PPase) activity, glycogen synthase activity, or both. In addition, mutants containing more glycogen than wild-type strains have also been isolated. These mutants are derepressed with respect to ADPGlc PPase, glycogen synthase, and/or branching enzyme activities. Thus, at least in E. coli and S. typhimurium, the data strongly indicate the importance of the ADPglucose pathway for the synthesis of bacterial glycogen in vivo. Reviews of bacterial glycogen synthesis (1, 2, 6, 8-10) have emphasized the enzymological aspects of its regulation. However, much information has recently been obtained on the genetic regulation of the expression of the biosynthetic enzymes. The structural genes of the glycogen biosynthetic enzymes of E. coli and S. typhimurium have been cloned. This has not only increased our knowledge of the genetic regulation of glycogen synthesis but has also made it possible, through oligonucleotide-directed mutagenesis of the E. coli glgC gene (the structural gene encoding ADPGlc PPase), to study various aspects of the structure-function relationships underlying its catalytic and allosteric activities. It is, therefore, our intent to present the current information, views, and concepts regarding the regulation of both the activity and the expression of the glycogen biosynthetic enzymes.
I. Genetic Regulation of the Glycogen Synthesis Pathway in Escherichia coli
A. Enzymes of the Glycogen Biosynthetic Pathway Are Induced in the Stationary Phase
Considerable evidence that glycogen biosynthesis is subject to genetic regulation was obtained before the current era of molecular biology studies. This evidence is briefly summarized here, as it has been amply discussed and cited in previous reviews (1, 2, 6-10). The levels of glycogen biosynthetic enzymes in E. coli increase as cultures enter the stationary phase. The rate of glycogen synthesis is also inversely related to growth rate when growth is limited for certain nutrients, e.g., nitrogen. When cells are grown in an enriched medium containing yeast extract and 1% glucose, the specific activities of ADPGlc PPase and glycogen synthase increase 11- to 12-fold, and that of glycogen branching enzyme increases fivefold, as cultures enter stationary phase (Table I). In contrast, the enzyme activities are already quite elevated in the exponential phase when the organism is grown in a defined medium (Table I). In a defined medium, branching enzyme is fully induced in the exponential phase, and the specific activities of ADPGlc PPase and glycogen synthase increase only about twofold when the cells reach the stationary phase. These experiments suggest that the gene encoding the branching enzyme is regulated differently from the genes for ADPGlc PPase and glycogen synthase. As would be expected for a pathway under transcriptional control, the addition of inhibitors of RNA or protein synthesis to cultures before they enter stationary phase prevents the enhancement of glycogen synthesis in the stationary phase.
B. Structural Genes for Glycogen Biosynthesis Are Clustered in Two Adjacent Operons
The structural genes for glycogen synthesis were shown to be located at approximately 75 min on the E. coli K-12 chromosome, and the gene order at this location was subsequently established by transduction to be glgA-glgC-glgB-asd (14). These genes encode the enzymes glycogen synthase,
TABLE I
GLYCOGEN, ADPGLUCOSE PYROPHOSPHORYLASE, GLYCOGEN SYNTHASE, AND BRANCHING ENZYME LEVELS IN E. coli B, S. typhimurium LT-2, AND MUTANTS SG3, AC70R1, AND JP102ᵃ
Glycogen (mg/g cells) Growth conditions 1% Glucose,
enriched media
Organism (wet wt.)
Escherichia coli B SG3 AC70R1
0.6% Glucose, minimal media
Escherichia coli B
0.6% Glucose,
Salmonella typhimurium LT-2 JP102
defined media
SG3 AC70R1
ADPGlc PPase (nmol/mg/min)
Exp
Stat
Exp
Stat
1.3 2.9 12.8
19 26 83
4 39 45
43 122 410
4.2 7.2 44
20 37 68
21 79 370
43 129 300
2.3
24
9
-
55
7.5
35
Glycogen synthase (nmol/mg/min) Exp
Branching enzyme (nmol/mg/min)
Stat
Exp
Stat
84 123 310
820 700 2 100
430 420 2200
43 85 490
93 138 530
420 760 1790
56 76 179
44
84
220
28
7.5 41 55
225
47
ᵃThe enzyme levels are for exponential (Exp) and stationary (Stat) phases. The growth of the organisms, media compositions, and assays of enzymatic activity and glycogen levels have been described (11-13); also K. Steiner and Preiss, unpublished.
ADPGlc PPase, glycogen branching enzyme, and an enzyme that is not involved in glycogen biosynthesis, aspartate semialdehyde dehydrogenase (EC 1.2.1.11), respectively. The molecular cloning of the E. coli glg structural genes (15) was a seminal accomplishment that greatly facilitated the subsequent study of the genetic regulation of bacterial glycogen biosynthesis. Because the glg genes are dispensable for growth, they are not amenable to direct selection; they were therefore originally cloned into pBR322 via selection for the closely linked essential gene asd. Among several asd+ plasmid clones that were isolated, pOP12 was found to contain a 10.5-kb PstI fragment that encoded the structural genes glgC, glgA, and glgB. A generally applicable method for cloning α-1,4-glucan biosynthetic genes, based on screening of clones with iodine vapor, was subsequently developed (16). This approach does not require coselection via an essential gene and, in principle, should allow direct cloning into E. coli of structural glg genes from any bacterium, as well as cloning of regulatory genes that affect glycogen synthesis. The arrangement of genes encoded by pOP12 was assessed by deletion-mapping experiments (15), and the nucleotide sequence of the entire glg gene cluster was determined (17-21). The genetic and physical map of the E. coli K-12 glg gene cluster is shown in Fig. 1. The continuous nucleotide sequence of over 15 kb of this region of the genome has been established, including the sequences of the flanking genes asd (22) and glpD (glycerol phosphate dehydrogenase, EC 1.1.1.8) (23). This region of the E. coli K-12 chromosome is centered at 4140 kb on the physical map of Kohara et al. (24) and is encompassed within 3584 to 3594 kb on version 6 of the physical map of Rudd et al. (original version, see 25; version 6, K. E. Rudd, personal communication). Nucleotide sequence analysis indicated that, in addition to the glgC, glgA, and glgB genes, pOP12 contains an open reading frame (ORF), glgX, located between glgB and glgC, and a second ORF, glgY, located downstream from glgA (20). The deduced amino-acid sequence of glgX is significantly related to those of glucan hydrolases and transferases, including α-amylases, pullulanase, cyclodextrin glucanotransferase, the glycogen branching enzyme, and others. The homologous regions include residues that have been reported to be involved in substrate binding and cleavage by taka-amylase. The glgY gene was identified by homology with rabbit muscle glycogen phosphorylase (20). Expression and characterization of its gene product have shown that it encodes glycogen phosphorylase, and it has alternatively been designated glgP (21). Neither glgX nor glgP(Y) is required for glucan biosynthesis, suggesting that both may be involved in the catabolism of glycogen (20). Detailed inspection of the organization of the gene cluster suggests that the glg genes may be transcribed as two tandemly arranged operons, glgBX and glgCAP(Y) (Fig. 1). The coding regions of the glgB and glgX ORFs overlap by 1 bp, glgC and glgA are separated by 2 bp, and glgA and glgP(Y) are separated by 18 bp. The close proximity of these genes suggests translational coupling within the two proposed operons. In contrast, a noncoding region of approximately 500 bp separates glgB and glgC. As described in Section I,C, studies of the regulation of the glg structural genes, using lacZ translational fusions and other approaches, are also consistent with a two-operon arrangement for the glg gene cluster, in which the glgCAP and glgB operons are both preceded by growth phase-regulated promoters. Transcripts initiating upstream of glgC have been analyzed by S1-nuclease mapping (Fig. 2), as discussed in Section I,C.
[Figure 1 map: glg gene cluster, scale 0 to 12 kilobases (image not reproduced)]
FIG. 1. Structure of the glycogen gene cluster in E. coli K-12. The restriction map was constructed completely from known contiguous sequences. All of the genes are transcribed from left to right (counterclockwise on the genome) except for glpD.
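The two-operon argument above rests partly on simple arithmetic with ORF coordinates: an overlap or a gap of a few base pairs between adjacent genes is compatible with (though not proof of) translational coupling, whereas a noncoding stretch of about 500 bp leaves room for a separate promoter. The Python sketch below computes the signed spacing between consecutive ORFs. The coordinates are hypothetical, chosen only to reproduce the spacings given in the text (1-bp overlap of glgB and glgX, 2 bp between glgC and glgA, 18 bp between glgA and glgP(Y), about 500 bp upstream of glgC), and the 50-bp cutoff used to flag a possible operon boundary is an arbitrary illustrative threshold, not a value from the source.

```python
from typing import List, NamedTuple, Tuple


class Orf(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def intergenic_gaps(orfs: List[Orf]) -> List[Tuple[str, str, int]]:
    """Signed spacing between consecutive ORFs listed in transcription order.

    gap = downstream.start - upstream.end - 1, so 0 means the ORFs abut,
    a positive value is the number of intervening base pairs, and a
    negative value is an overlap (-1 means they share one base pair).
    """
    return [(up.name, down.name, down.start - up.end - 1)
            for up, down in zip(orfs, orfs[1:])]


# Hypothetical coordinates, chosen only to reproduce the spacings in the text.
cluster = [
    Orf("glgB", 1, 2000),
    Orf("glgX", 2000, 4000),
    Orf("glgC", 4501, 5800),
    Orf("glgA", 5803, 7200),
    Orf("glgP", 7219, 9600),
]

BOUNDARY_CUTOFF_BP = 50  # arbitrary illustrative threshold, not from the source

for up, down, gap in intergenic_gaps(cluster):
    label = "possible operon boundary" if gap > BOUNDARY_CUTOFF_BP else "tightly spaced"
    print(f"{up} -> {down}: {gap:+d} bp ({label})")
```

Only the glgX-to-glgC interval exceeds the cutoff, which matches the two-unit arrangement proposed in the text; on its own, however, spacing is suggestive rather than conclusive, which is why the regulatory data of Section I,C matter.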
C. Evidence for Positive Control of the glgCAP(Y) Operon
1. REGULATION BY cAMP AND cAMP-RECEPTOR PROTEIN
The first evidence that cAMP affects bacterial glycogen synthesis was that addition of exogenous CAMP to E . coli W4597(K) results in a modest enhancement in the rate of in ciuo glycogen biosynthesis (26,27). It was later observed that the genes cya, encoding adenylate cyclase (EC 4.6.1.l),and crp, encoding CAMP-receptor protein (CRP), are required for optimal synthesis of glycogen, and that exogenous CAMP can restore glycogen synthesis in a cya strain but not in a crp mutant (28). Because CAMP itself is not an effector of this enzyme and the addition of CAMP to cultures does not affect the intracellular concentrations of the allosteric effectors of ADPGlc PPase, it was proposed that CAMP may affect the synthesis of an enzyme that metabolizes an unknown effector of ADPglucose pyrophosphorylase. It has since been demonstrated that cAMP and CRP are strong positive
Transcripts
D
C
B
A
g w
g w Mutations that enhance glgC nanscnotion gfgR (cis)
CRP-binding site
glgQ (trans) csrA (trans) plgC'-'lac2 fusion in p ~ z 3 - 3
I
+ +
+ +
+
+ +
+ + 100 base pairs
FIG.2 . Regulatory sites for gZ& transcription. The CRP-binding region was located by mobility-shift analysis (-39).DNase I protection, and a computer-assisted search for sequence aimilarity to a consensus sequence for CRP sites (T. Romeo, unpublished). Transcripts from wild-type strains of E . coli B and E . coli K-12 and isogenic glycogen synthesis mutants were analyzed by Sl-iiucleaqe protection analysis (29; M. Y. Liu and T. Romeo, unpublished).
REGULATION OF BACTERIAL GLYCOGEN SYNTHESIS
305
regulators of the expression of the glgC and glgA genes, but do not affect glgB expression (29).The addition of CAMP and CRP to S-30 extract in vitrocoupled transcription-translation reactions and containing pOP12 as the genetic template resulted in up to 25-fold and 10-fold enhancement of the expression of glgC and glgA, respectively, but did not affect glgB expression (29). The stimulatory effect of cAMP saturated at approximately 100 K M , a concentration within the physiological range of this molecule. cAMP and CRP also enhanced the expression of glgC and glgA encoded by either plasmids or restriction fragments in reactions of completely defined composition, the dipeptide synthesis assay (30).In these reactions, a DNA template is used to program transcription, and the formation of the first dipeptide of experimentally specified gene products is quantified (29, 30). A restriction fragment that contained glgC and 0.5 kb of DNA from the upstream noncoding region of glgC was sufficient to permit CAMP-CRP regulated expression in the dipeptide synthesis assay (29),suggesting that the glgC gene contains its own CAMP-regulated promoter(s). Direct experimental evidence for a CRP-binding site in the upstream region of glgC was obtained using gel retardation analysis (29). Computerassisted search for a consensus CRP-binding sequence within the glgC upstream region has revealed a potential site preceding both the E . coli and the S . typhimurium glgC genes (Fig. 2). DNase I “footprint” experiments indicate that this proposed CRP-binding site of the E . coli glgC gene is protected by CAMP-CRP in vitro (T. Romeo, unpublished). The stimulatory effect of cAMP on the in vivo expression of glgC and glgA has also been demonstrated. 
Experiments in which proteins encoded by the glg structural genes of plasmid pOP12 were expressed in maxicells showed that exogenously added cAMP stimulated the expression of glgC and glgA, but did not affect glgB expression (31). Further evidence that glgC is regulated by cAMP in vivo was obtained by constructing an in-frame, plasmid-encoded glgC'-'lacZ translational fusion, designated pCZ3-3, which contained 0.5 kb of the upstream noncoding region of glgC (31). This gene fusion was expressed approximately fivefold better in a cya+ strain than in an isogenic Δcya strain, and its expression was stimulated by exogenous cAMP in both strains.
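The computer-assisted comparison of the glgC upstream region with a CRP consensus can be illustrated with a simple identity count between the E. coli glgC site (transcript B region) and the CRP consensus as printed in Table II. This is a minimal sketch under our own assumptions, not the authors' search program: the two strings are taken as already aligned, consensus positions written as N (any base) or - (gap) are not scored, and `consensus_identities` is a hypothetical helper name.

```python
# Hypothetical helper: score identities between a pre-aligned candidate CRP site
# and a consensus. 'N' (any base) and '-' (gap) positions are not scored; this
# rule is an illustrative assumption, not the authors' search program.
from typing import Tuple


def consensus_identities(site: str, consensus: str) -> Tuple[int, int, str]:
    """Return (identities, scored_positions, marker_line)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be aligned to equal length")
    identities = 0
    scored = 0
    marks = []
    for base, cons in zip(site.upper(), consensus.upper()):
        if cons in ("N", "-"):
            marks.append(" ")  # unspecified consensus position, not scored
            continue
        scored += 1
        if base == cons:
            identities += 1
            marks.append("*")  # identity, marked as in Table II
        else:
            marks.append(" ")
    return identities, scored, "".join(marks)


ecoli_site = "AAATTCTGCTCTCGGATCGCTTTT"     # E. coli glgC, transcript B region
crp_consensus = "NAANTGTGANNTNNN-TCACATTT"  # consensus as printed in Table II

hits, scored, marks = consensus_identities(ecoli_site, crp_consensus)
print(ecoli_site)
print(marks)
print(crp_consensus)
print(f"{hits}/{scored} specified consensus positions identical")
```

For these two strings the helper reports 12 identities at 16 specified consensus positions, and the marker line it prints reproduces the 12 asterisks shown for this pair in Table II. Weight-matrix scoring is a common, more sensitive alternative to exact identity counting.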
2. REGULATION BY ppGpp
The first evidence that glycogen biosynthesis in E. coli is positively regulated by ppGpp was that relA strains are glycogen deficient (32-34). The relA gene encodes the primary biosynthetic enzyme for ppGpp and pppGpp and mediates the stringent response (reviewed in 35). A second enzyme, which catalyzes the degradation of ppGpp and may also synthesize ppGpp in E. coli, is encoded by the gene spoT (36).
JACK PREISS AND TONY ROMEO
Although a number of hypotheses have been proposed to account for the effect of the relA mutation on glycogen synthesis, it is now well established that ppGpp stimulates the expression of the glgC and glgA genes (29, 31). Expression of glgC in transcription-translation reactions was enhanced three- to fourfold in the presence of ppGpp; glgA expression was enhanced approximately twofold. The expression of glgB was not significantly affected by ppGpp. In agreement with observations on the effects of cAMP and ppGpp on glycogen synthesis in intact cells (28), cAMP and ppGpp exerted independent effects on glgCA expression in in vitro transcription-translation experiments (29). It was therefore particularly intriguing that their combined effects on glgC expression in these experiments were synergistic (29). The addition of cAMP-CRP or ppGpp enhanced glgC expression 6.3-fold or 1.8-fold, respectively, over the basal (unactivated) level, whereas the combination of these regulators led to an 18.8-fold stimulation. Experiments in which the effect of varying ppGpp concentrations on glgC expression was monitored in the dipeptide synthesis assay failed to show significant effects in either the presence or absence of cAMP-CRP (29). It was proposed that regulation via ppGpp requires one or more additional factors that are absent from the defined dipeptide synthesis assay but present in the S-30 cell extract used for the transcription-translation experiments. Aboud and Pastan reached a similar conclusion regarding the positive effect of ppGpp on lacZ transcription, which was observed in crude (S-30) assays but not in purified transcription assays (37).
This has also been suggested for the expression of his (38). Although there is evidence that ppGpp interacts directly with RNA polymerase, in the absence of other factors, to alter the rate of transcription (35), this may not be true for these positively regulated genes. Experimental evidence for positive regulation of glgC expression in vivo by ppGpp was obtained using the glgC'-'lacZ translational fusion in pCZ3-3 (31). This gene fusion was introduced into an isogenic series of strains whose basal ppGpp levels differed because of increasingly severe mutations in spoT (39). Expression of the glgC'-'lacZ gene fusion was exponentially correlated with ppGpp levels in this series of strains (31), apparently the inverse of the effect of ppGpp on rrnA P1 (39), which is negatively regulated by ppGpp.
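As a quick arithmetic check of the synergy between cAMP-CRP and ppGpp described above, the combined effect on glgC expression can be compared with what two independently acting regulators would be expected to give. The sketch assumes two simple null models of our own choosing (multiplicative and additive independence); the chapter does not say which model, if any, was used, and the function name is hypothetical. The inputs are the fold values given in the text: 6.3-fold for cAMP-CRP, 1.8-fold for ppGpp, and 18.8-fold for both together.

```python
# Compare an observed combined fold-activation with the values expected if the
# two regulators acted independently. Both null models are illustrative
# assumptions, not a model stated in the chapter.
from typing import Dict


def expected_combined_fold(fold_a: float, fold_b: float) -> Dict[str, float]:
    """Expected fold-activation by regulators A and B together."""
    return {
        # Independent multiplicative effects: each scales the output.
        "multiplicative": fold_a * fold_b,
        # Independent additive effects: increments over basal simply add.
        "additive": 1.0 + (fold_a - 1.0) + (fold_b - 1.0),
    }


camp_crp_fold = 6.3   # cAMP-CRP alone
ppgpp_fold = 1.8      # ppGpp alone
observed_fold = 18.8  # cAMP-CRP plus ppGpp

for model, expected in expected_combined_fold(camp_crp_fold, ppgpp_fold).items():
    ratio = observed_fold / expected
    print(f"{model}: expected {expected:.1f}-fold; observed/expected = {ratio:.2f}")
```

The observed 18.8-fold stimulation exceeds both the multiplicative expectation (about 11.3-fold) and the additive one (7.1-fold), which is what "synergistic" implies under either model.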
D. Negative Control of Two glg Operons via csrA
Studies of the glycogen-excess E. coli B mutants SG3 and AC70R1, which exhibit enhanced levels of the enzymes of the glycogen synthesis pathway (i.e., "derepressed mutants"), first suggested that glycogen synthesis is under negative genetic regulation (2, 8, 15). The mutations in these strains, glgR and glgQ, respectively, affect glg transcription (29), although they have not yet been isolated and sequenced. Recent experiments have resulted in the identification and molecular characterization of a pleiotropic gene from E. coli K-12, csrA, that encodes a negative factor for glg transcription (39a). The relationship of csrA to the E. coli B mutations is not yet known.
1. NITROSOGUANIDINE-INDUCED MUTATIONS THAT CAUSE OVEREXPRESSION OF THE GLYCOGEN BIOSYNTHETIC ENZYMES IN E. coli B
The glgR mutation is closely linked to the glycogen structural genes by P1 transduction analysis and results in 8- to 10-fold higher levels of ADPGlc PPase and 3- to 4-fold higher levels of glycogen synthase, but does not alter the level of branching enzyme (Table I). Analysis of RNA transcripts for glgC in strain SG3 showed that the glgR mutation leads to an increase in transcript B only (Fig. 2, Table II). Therefore, the glgR mutation may alter a cis-acting site involved in the regulation of transcript B. This effect might be mediated via a negative regulatory site, but the current experimental evidence is also consistent with an overexpressed phenotype or an improved CRP-binding site. The glgQ mutation is not linked to the glycogen gene cluster in P1 transduction, and results in 11-fold, 5.5-fold, and %fold increases in ADPGlc PPase, glycogen synthase, and glycogen branching enzyme, respectively (Table I). Therefore, glgQ appears to affect one or more trans-acting factors for the expression of the genes in the two glycogen operons. Levels of the four glgC transcripts are elevated in the glgQ mutant AC70R1, although transcript A was affected most dramatically, with approximately 25-fold higher levels in AC70R1 than in wild-type E. coli B or strain SG3 (29) (Fig. 2). Because the levels of branching enzyme are also elevated in AC70R1, it was considered unlikely that glgQ is a mutation in the cAMP-CRP or ppGpp regulatory systems, which do not affect glgB expression. The expression of the chromosomal lacZ gene was also similar in AC70R1 and in the wild-type E. coli B strain, providing further evidence that glgQ affects a different regulatory system for the glg genes (31).
2. GLYCOGEN-ENHANCED TRANSPOSON MUTANTS OF E. coli K-12
To obtain further information about negative regulation of the glg genes, a collection of transposon mutants that affect glycogen biosynthesis in E. coli K-12 was isolated (39a). This approach facilitated the identification,
TABLE II
PROPOSED SEQUENCES FOR glgC TRANSCRIPTION
A. cAMP-CRP-binding sites for glgC regulation
E. coli (transcript B)^a:   AAATTCTGCTCTCGGATCGCTTTT
                             ** * **   *    ** * ***
E. coli CRP consensus:      NAANTGTGANNTNNN-TCACATTT
                            ** * ** * * **
S. typhimurium^b:           AATTCTGTTGTCGGACCGTTTT
B. Comparison of 5'-flanking regions of E. coli glgC transcripts with the consensus sequence for E. coli σ70 promoters
Consensus:   -35 TTGACA ... -10 TATAAT
glgC transcript^c
A:   TTCGCTGGAGAGGATAACCCAGTGATTACGGCTGTCTGGCA
B:   TTGATCGCAATTAACGCCACGCT?GAGGTAACAGAGA?TCTTT
C:   AGGCGACGGCAATGTCCGTTGCTAAATCGATATGCT
^a Detected by DNase I protection analysis. Nucleotide identities between the proposed CRP-binding sites are indicated by asterisks.
^b Identified solely by sequence homology to the E. coli glgC gene (40).
^c The glgC transcripts A, B, and C were mapped using S1-nuclease protection analysis (29) and are located at -60, -130, and -215 relative to the glgC initiation codon, respectively. The 5' termini of glgC transcripts are indicated by asterisks. The best -10 and -35 regions are underlined.
molecular cloning, and mapping of trans-acting regulatory genes for glycogen biosynthesis. Mutations were introduced into a strain that contained pCZ3-3, and the resulting mutants were stained with iodine vapor to detect intracellular glycogen. The plasmid-encoded β-galactosidase (EC 3.2.1.23) activity was subsequently determined in glycogen-excess mutants (Table III). Several glycogen-excess mutants that also overexpressed the plasmid-borne glgC'-'lacZ fusion were isolated. One of these, mutant TR1-5, was shown to accumulate approximately %fold more glycogen than an isogenic wild-type strain (Table III). The gene affected by the TR1-5 mutation, csrA (for "carbon storage regulator"), has now been cloned, sequenced, and mapped on the E. coli genome, and some of its regulatory effects have been studied (39a, 39b). The TR1-5 mutation was also shown to affect glycogen levels by causing elevated expression of genes representative of both glycogen operons, glgC
TABLE III
ADPGLUCOSE PYROPHOSPHORYLASE, glgC'-'lacZ- AND glgB'-'lacZ-ENCODED β-GALACTOSIDASE ACTIVITY IN E. coli K-12 BW3414 (csrA+) AND TR1-5BW3414 (csrA::kanR)^a
Strain | Glycogen (mg/mg protein) | ADPGlc PPase^b (U/mg protein) | glgC'-'lacZ^c (ΔA420/mg protein) | glgB'-'lacZ (ΔA420/mg protein)
BW3414 | 0.06 (±0.01) | 0.23 (±0.01) | 10 | 3
TR1-5BW3414 | 1.61 (±0.07) | 2.16 (±0.50) | 70 | 6
^a Aerobic cultures were grown in Kornberg medium (31). Glycogen, ADPGlc PPase, and β-galactosidase levels were determined in stationary-phase cultures.
^b A unit of activity is defined as 1 μmol of product formed/10 minutes, in the pyrophosphorolysis direction, under maximal allosteric activation.
^c β-Galactosidase activity was determined and calculated as described in 31.
and glgB (Table III). Levels of chromosomally encoded ADPGlc PPase were approximately 10-fold higher in the TR1-5 mutant than in an isogenic csrA+ strain in the stationary phase. The β-galactosidase activities expressed from the glgC'-'lacZ and glgB'-'lacZ translational fusions were approximately 7-fold and 2- to 3-fold higher, respectively, in the TR1-5 mutant than in the isogenic csrA+ strain. The TR1-5 mutation was found to affect glycogen levels and the expression of the glgB and glgC genes in both the exponential and stationary phases.
3. PLEIOTROPIC EFFECTS OF THE csrA::kanR TR1-5 MUTATION
In addition to its role in the negative regulation of glycogen biosynthesis, csrA exhibits pleiotropic effects, suggesting that it may encode a global regulatory factor. Another metabolic pathway that appears to be regulated via csrA is gluconeogenesis. The expression of a phosphoenolpyruvate (PEP) carboxykinase (EC 4.1.1.38) operon fusion (pckA'-'lacZ) (kindly provided by Hughes Goldie) was enhanced approximately twofold throughout both the growth and stationary phases in the TR1-5 mutant, suggesting that gluconeogenesis may also be under negative control by csrA. When several isogenic strains were grown on M9 medium, csrA+ and csrA::kanR strains (transductants carrying the TR1-5 mutation) were capable of growth on a wide variety of carbon sources. However, a strain that contained the functional csrA gene on a multicopy (pUC19-based) plasmid, pCSR10 (Fig. 3), could grow on glucose and fructose but not on any of the gluconeogenic substrates tested, including succinate, glycerol, pyruvate, and L-lactate. This growth defect was shown not to be an effect of the pUC19 vector. When strains were plated on
Mops medium, this richer defined medium was observed to support growth of a pCSR10-containing strain on some gluconeogenic substrates, including acetate, as the major carbon source. However, the pCSR10-containing strain formed only pinpoint colonies on succinate, whereas each of the other strains grew well. Perhaps csrA affects succinate utilization independently of its effect on gluconeogenesis, possibly at the level of succinate transport into the cell. At least under certain growth conditions (anaerobic growth on Mops medium), the TR1-5 mutant (csrA::kanR) exhibited altered surface properties, such as adherence to the glass culture tubes used for growth (39b). This adherent phenotype was specifically complemented by pCSR10. The identity of the cell-surface molecules that are altered in the mutant has not yet been determined. Finally, cells containing the TR1-5 mutation were significantly larger than those of isogenic wild-type strains, at least under certain growth conditions. It was not determined whether this was a consequence of the increased storage of glycogen in the mutant strain or an independent effect of the TR1-5 mutation.
4. GENETIC MAPPING AND MOLECULAR CHARACTERIZATION OF csrA
The csrA gene was mapped by genetic and physical approaches (39b) and was located at 58 min, or at position 2830 kb on the physical map of the E. coli K-12 genome (24). The csrA gene lies between alaS, which encodes alanyl-tRNA synthetase (EC 6.1.1.7), and the serV operon of tRNA genes, and is transcribed counterclockwise on the chromosome (Figs. 3 and 4).
[Figure 3 map: serV operon, csrA, and alaS, with restriction sites and a 1-kb scale bar]
FIG. 3. Restriction map and genetic structure of the region of the genome containing csrA, and the structure of pCSR10. The shaded region of pCSR10 is drawn to scale and compared directly with the genomic region from which it originated. Modified from 39b.
[Figure 4 sequence panel: nucleotides 1 to 709 of the E. coli K-12 region between alaS and serV, with the DdeI sites, the csrA R.B.S., and the serV -35 and -10 regions marked. Deduced CsrA sequence (61 amino acids, followed by an ochre stop codon): MLILTRRVGE TLMIGDEVTV TVLGVKGNQV RIGVNAPKEV SVHREEIYQR IQAEKSQQSS Y]
FIG. 4. Nucleotide sequence of the csrA gene and the deduced amino-acid sequence of its product. The nucleotide sequence between the alaS and serV genes on the E. coli K-12 genome is shown. The proposed ribosome-binding region (R.B.S.) of csrA and the -10 and -35 regions of serV are underlined. The dagger (†) indicates the site of the insertion mutation TR1-5. The DdeI sites used in subcloning csrA into pUC19 to generate pCSR10 are located at 168 and 698 bp. Reproduced with permission from 39a.
Both the TR1-5 mutant and wild-type alleles of csrA were cloned into plasmid pUC19. The clone pCSR10 (Fig. 3), which contained the wild-type allele, complemented the TR1-5 mutation and severely inhibited glycogen biosynthesis in all E. coli strains tested, thus confirming the trans-acting effect of csrA (39a). Nucleotide sequence analysis revealed that the largest open reading frame (ORF) located between alaS and serV and present in pCSR10 is preceded by a sequence typical of E. coli ribosome-binding sites (Fig. 4). Sequencing of the csrA mutant allele in strain TR1-5 showed that this ORF is disrupted by the kanR marker in this mutant (Fig. 4). The csrA ORF encodes a 61-amino-acid polypeptide, which was strongly expressed from plasmid pCSR10 in S-30 transcription-translation experiments (39a). Deletion mapping of the plasmid-encoded csrA gene demonstrated that the ORF is required to mediate the inhibitory effects on the glycogen synthesis phenotype in vivo. Computer-assisted database searches failed to reveal convincing homology of csrA or its predicted polypeptide product, CsrA, to any known gene, protein, or prokaryotic sequence motif. This suggests that csrA may encode a new kind of genetic regulatory factor. Analysis of glgC transcripts by S1-nuclease protection mapping showed that the steady-state levels of all four glgC transcripts are elevated in the TR1-5 mutant and severely depressed in a pCSR10-containing strain, indicating that csrA affects the transcriptional regulation of glgC (Fig. 2) (M. Y. Liu and T. Romeo, unpublished). The cis-acting site(s) required for these effects are being investigated via saturation mutagenesis. Identification of these sites would allow us to propose and test molecular mechanisms that could account for the simultaneous effects of CsrA on the four glgC transcripts.
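The size of the csrA product can be checked against its deduced sequence. The sketch below sums average residue masses over the 61-residue CsrA sequence read from Fig. 4 and adds one water for the free termini; the mass table uses standard average values (an assumption of this sketch, rounded to four decimals), and the calculation ignores any post-translational modification.

```python
# Estimate the molecular mass of CsrA from its deduced amino-acid sequence by
# summing average residue masses and adding one water for the free termini.

# Average residue masses in daltons (amino acid minus water), standard values
# rounded to four decimals.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

# Deduced CsrA sequence (61 residues), as read from Fig. 4.
CSRA = (
    "MLILTRRVGETLMIGDEVTVTVLGVKGNQV"
    "RIGVNAPKEVSVHREEIYQRIQAEKSQQSSY"
)


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


mw = average_mass(CSRA)
print(f"{len(CSRA)} residues, {mw:.0f} Da (about {mw / 1000:.2f} kDa)")
```

The result, about 6.86 kDa, agrees with the 6.8-kDa size listed for CsrA in Table IV.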
E. Eσ70 Transcribes glgCAP
S1-nuclease protection mapping of transcripts that originate upstream from the glgCAP(Y) operon revealed four transcripts. Three of these were mapped at high resolution (29). The DNA sequences immediately preceding the 5' ends of these transcripts are only weakly related to consensus sequences for E. coli promoters (Table II). Although positively regulated promoters typically show weak similarity to the consensus sequence (39), it is also possible that one or more of the glg promoters is recognized by an alternative sigma factor. Therefore, the dependence of glgC expression on three sigma factors was tested in coupled transcription-translation, using monoclonal antibodies (mAbs) that selectively inhibit transcription by specifically recognizing individual sigma factors (kindly provided by S. A. Lesley and R. R. Burgess; 41-43).
A mAb directed against E. coli σ70 inhibited up to 85% of glgC expression, versus 95% inhibition of the control gene, bla (Fig. 5). Antibodies directed against σ54 or σ32 did not inhibit expression of glgC, relative to glnAP2 (σ54-dependent) and dnaK (σ32-dependent) controls (data not shown). Therefore, although nitrogen limitation enhances in vivo glycogen synthesis, the expression of glgC is not regulated by the nitrogen starvation transcriptional controls (σ54-dependent). The same conclusion was suggested by the failure of NtrC and NtrA (σ54) to enhance expression of glg genes in the S-30 experiments (29). The heat-shock regulatory system (σ32-dependent) also appears to have no involvement in the control of glgC expression. Therefore, the major active form of RNA polymerase (Eσ70) was apparently utilized for glg expression in the S-30 transcription-translation system. The S-30 extracts used in these experiments were prepared from exponential-phase cells to obtain optimal translational activity, and would not have contained endogenous activity from a sigma factor or accessory factor synthesized in stationary phase. Therefore, expression via one or more glgC transcripts could have gone undetected in these experiments. Although it has been reported (44) that interruption of the gene for a putative stationary-phase sigma factor, katF, leads to decreased levels of glycogen, our experiments and those of others show that katF does not regulate glgC gene expression in vivo (J. Smith and T. Romeo, unpublished; 45). Therefore, Eσ70 is the primary form of RNA polymerase responsible
[Figure 5 plot; x-axis: anti-σ70 antibody]
FIG. 5. Inhibition of in vitro expression of glgC by a monoclonal antibody that specifically recognizes σ70. Transcription-translation of glgC in S-30 extracts was conducted, and results were analyzed as previously described (29). Aliquots of the σ70-specific monoclonal antibody 3D3 (kindly provided by Scott A. Lesley and Richard R. Burgess) were added prior to the addition of pOP12.
for the expression of glgC, although there is a formal possibility that one or more of the proposed glgC promoters uses an as yet uncharacterized alternative sigma factor.
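The weak resemblance of the glgC upstream sequences to the σ70 consensus (Table II) can be made concrete with a small scan that places the -35 (TTGACA) and -10 (TATAAT) hexamers along a sequence and keeps the placement with the most identities. This is an illustrative sketch, not the procedure used to pick the regions marked in Table II: plain identity counting and a permissive 15 to 19 nt spacer window are assumptions, and the function names are hypothetical.

```python
# Hypothetical sigma-70 promoter scan: slide the -35 (TTGACA) and -10 (TATAAT)
# consensus hexamers along a sequence and report the best-scoring placement.
# Identity counting and the 15-19 nt spacer window are illustrative assumptions.
from typing import Optional, Tuple

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def identities(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_sigma70_match(
    seq: str, min_spacer: int = 15, max_spacer: int = 19
) -> Optional[Tuple[int, int, int, int]]:
    """Return (score, start_35, start_10, spacer), 0-based, or None if too short.

    The score is the total number of identities to both hexamers (maximum 12);
    the first placement reaching the highest score is kept.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - len(MINUS35) + 1):
        score35 = identities(seq[i:i + 6], MINUS35)
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + 6 + spacer  # start of the -10 hexamer
            if j + 6 > len(seq):
                break
            score = score35 + identities(seq[j:j + 6], MINUS10)
            if best is None or score > best[0]:
                best = (score, i, j, spacer)
    return best


# Usage on the transcript A upstream sequence listed in Table II.
transcript_a = "TTCGCTGGAGAGGATAACCCAGTGATTACGGCTGTCTGGCA"
print(best_sigma70_match(transcript_a))
```

A perfect consensus promoter scores 12; lower scores quantify how far a candidate departs from the consensus. The scan ignores the distance to the mapped 5' end, which a fuller analysis would also constrain.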
F. Potential Control of Glycogen Biosynthesis via the Regulated Expression of Structural Genes Outside the glg Gene Cluster
Studies from several laboratories provide evidence for a class of stationary-phase-induced genes that depend on a putative alternative sigma factor for expression (44, 46, 47). The gene for this sigma factor is most often referred to as katF or rpoS. Iodine staining of katF+ and katF- strains suggests that a functional katF allele is required for optimum accumulation of glycogen in strain MC4100 (45). As stated earlier (Section I,E), the effect of katF on glycogen synthesis is not mediated by regulation of the expression of the structural genes for the enzymes of the glycogen pathway. Fischer and Hengge-Aronis isolated, cloned, and sequenced a gene in E. coli K-12 that stimulates glycogen synthesis (glgS) and requires katF for optimum expression (45). The glgS gene appeared to be transcribed from a cAMP-dependent promoter and a katF-requiring promoter. Based on iodine staining of colonies, a null mutant in glgS appeared to accumulate more glycogen than a katF mutant, indicating that katF may have additional effects on glycogen synthesis besides glgS induction. The glgS mutation did not affect the expression of glgC or glgA gene fusions. Although several mechanisms were suggested by which glgS might affect glycogen biosynthesis, the function of glgS and the role of katF in glycogen synthesis remain to be defined. The possibility that these factors regulate a process occurring prior to the glycogen biosynthesis pathway per se, but affecting the ultimate level of glycogen that can be accumulated, has not been investigated.
G. An Integrated Model for the Genetic Regulation of the Glycogen Synthesis Pathway in Escherichia coli
Glycogen metabolism involves a complex of regulatory factors that coordinate the rate of glycogen synthesis with the physiology of the cell. The genetic regulation of the glycogen biosynthesis pathway by cAMP and ppGpp allows E. coli to adjust its metabolic capacity for converting available carbon substrate into glycogen in response to the availability of carbon (energy) or amino acids, respectively. When cells are growing rapidly, the levels of the enzymes are low, and although the precursors ATP and glucose-1-phosphate are available for endogenous glycogen synthesis, the rate of glycogen synthesis is low. Following nutrient deprivation, the syntheses of ADPglucose pyrophosphorylase and glycogen synthase are induced, so the capacity for glycogen synthesis is significantly greater. The level of glycogen that is ultimately accumulated depends on substrate availability and is subject to allosteric regulation of ADPglucose pyrophosphorylase activity. The idea that genetic regulation determines the capacity for glycogen synthesis, as distinguished from the absolute level of glycogen, is an important concept. For example, the enzymes of the pathway are induced in stationary phase when cells are grown on LB medium lacking glucose, yet glycogen synthesis is relatively poor. Conversely, in glycogen-synthesis media containing excess glucose, in which nitrogen is limiting, induction of the genes for the biosynthetic enzymes is somewhat weaker (cAMP is at a low level), yet glycogen synthesis is relatively greater. These observations were recently misinterpreted as indicating that genetic regulation of the enzymes of the pathway is of minor significance for controlling glycogen synthesis (45). However, studies of glycogen synthesis in mutants affected in either negative (glgR, glgQ, csrA) or positive (cya, crp, relA, spoT) control systems for glgCA gene expression clearly demonstrate that genetic regulation of the levels of glycogen biosynthetic enzymes is paramount in determining the ultimate level of glycogen synthesized and accumulated under any given physiological condition. The structural and regulatory genes involved in glycogen metabolism in E. coli are listed in Table IV. The effects of both positive and negative regulatory factors controlling the expression of the glg genes of the glycogen biosynthetic pathway are shown in Fig. 6.
The overall complexity and design of this regulatory system are depicted, as are some of the important questions that remain to be answered regarding the regulatory role of CsrA. In particular, the physiological parameters to which the CsrA regulatory system responds are a key feature of this scheme that remains to be determined.
II. Site-directed Mutagenesis of the ADPglucose Pyrophosphorylase Genes to Study Structure-Function Relationships of Enzyme Action and Regulatory Control
A. Allosteric Regulation of Bacterial ADPglucose Pyrophosphorylase
In addition to the regulation of enzyme expression discussed in Section I, regulation of bacterial glycogen synthesis is also exerted through allosteric control of ADPGlc PPase activity.
TABLE IV
GENES INVOLVED IN GLYCOGEN METABOLISM IN E. coli
Gene | Gene product | Map site (minutes) | Comment
glgC | ADP-glucose pyrophosphorylase | 75 | Synthesis of glucosyl donor
glgA | Glycogen synthase | 75 | Glucosyl transferase
glgB | Glycogen branching enzyme | 75 | α-1,6 branch formation
glgS | 7.9-kDa polypeptide | 67 | Function unknown
glgP | Glycogen phosphorylase | 75 | Phosphorolysis
? | α-Amylase | 43 | Hydrolysis (physiology unknown)
? | Glucan hydrolase/transferase | 75 | (From sequence homology)
cya | Adenylate cyclase | 85 | Mediate catabolite repression, i.e., global carbon/energy response
crp | Cyclic AMP-receptor protein | 74 | Mediate catabolite repression, i.e., global carbon/energy response
relA | (p)ppGpp synthase I | 60 | Mediates stringent response and response to carbon/energy
spoT | (p)ppGpp 3'-pyrophosphohydrolase | 60 | Mediates stringent response and response to carbon/energy
csrA | 6.8-kDa polypeptide (CsrA) | 58 | Pleiotropic, regulation of glgC and glgB transcription
katF | - | 59 | Pleiotropic, required for expression of glgS, not glgCA
glgQ | Trans-acting factor (unidentified) | (?) | Transcriptional regulation of glgCA and glgB
glgR | Cis-acting site (sequence unknown) | 75 | Transcriptional regulation of glgCA
^a Reference 48; all other genes are described in the text.
[Figure 6 diagram. Labels: carbon source; PTS/other transporters; inner membrane; amino acid/energy; physiological stimulus?; cell size; glycogen biosynthesis]
FIG. 6. Genetic regulation of glycogen synthesis in E. coli K-12. The model depicts our current knowledge of both positive and negative factors that control the flow of carbon and energy into glycogen via genetic regulation. It emphasizes the role of the csrA gene product (CsrA) in this regulation and points out other potential regulatory functions of CsrA. It does not consider the allosteric regulation of glycogen synthesis or poorly understood control mechanisms, such as katF-dependent regulation (45). It is not meant to be biochemically comprehensive or to describe the specific biochemical route of every carbon source; e.g., growth on glucose would not require gluconeogenesis for glycogen synthesis. Effects of cAMP on gluconeogenesis have been documented by Goldie (49), and the regulatory interactions of the phosphotransferase system (PTS) with adenylate cyclase have been reviewed (50).
More than 40 ADPGlc PPases [mainly bacterial (1, 2) but also plant (51-53)] have been studied with respect to their regulatory properties. In most cases, glycolytic intermediates activate ADPGlc synthesis, whereas AMP, ADP, and/or Pi are inhibitors. Glycolytic intermediates in the cell may be considered signals of carbon excess; thus, under conditions of limited growth with excess carbon in the medium, the accumulation of glycolytic intermediates that is certain to occur would signal activation of ADPGlc synthesis. For virtually all of the ADPGlc PPases studied, the glycolytic-intermediate activator increases the apparent affinity of the enzyme for its substrates, ATP and glucose-1-P. Increasing concentrations of activator may also reverse inhibition caused by AMP, ADP, or Pi. The activator specificity of the bacterial and plant ADPGlc PPases has been studied, and the enzymes can be classified into seven groups on the basis of their specificity of activation by the various glycolytic intermediates (1, 2, 6-10, 54). The variation in activator specificity has been postulated to correlate with the dominant mode of carbon assimilation in the bacterium or plant tissue. This has been discussed in detail in a number of reviews (1, 2, 6-10, 54). Some examples are given here. Escherichia coli and S. typhimurium obtain their energy mainly through glycolysis. The activator for
these enterobacterial ADPglucose pyrophosphorylases is fructose-1,6-bisP, the product of the allosteric glycolytic enzyme phosphofructokinase (EC 2.7.1.11). The photosynthetic plant, algal, and cyanobacterial ADPGlc PPases are activated by 3-P-glycerate, the initial CO2 fixation product of photosynthesis. Thus, the first assimilatory product stimulates the synthesis of ADPglucose, a precursor of starch in higher plants and of glycogen in cyanobacteria. Moreover, considerable evidence strongly suggests that the allosteric activation and inhibition observed in vitro also occur in vivo in bacterial and algal cells. Certain mutants of E. coli and of S. typhimurium LT-2 are affected in their ability to accumulate glycogen; these mutants contain ADPGlc PPases with altered regulatory properties. Generally, mutants with ADPGlc PPases having a higher affinity for the activator, fructose-1,6-bisphosphate, and/or a lower affinity for the allosteric inhibitor, AMP, accumulate glycogen at a faster rate than the parent wild-type strain. Mutants whose enzymes have a lower affinity for the activator accumulate glycogen at a slower rate than the parent strain. Table V summarizes the allosteric properties of the mutant ADPGlc PPases that have been studied and the ability of the mutants to accumulate glycogen in the stationary phase. For E. coli, there is a direct relationship between the affinity of the enzyme for the activator and the mutant's ability to accumulate glycogen, compared to the wild-type strain. If the apparent
TABLE V
COMPARISON OF GLYCOGEN ACCUMULATION RATES AND ALLOSTERIC KINETIC CONSTANTS OF E. coli AND S. typhimurium LT-2 ALLOSTERIC MUTANT ADPGlc PPases WITH THE ADPGlc PPases OBTAINED FROM WILD-TYPE BACTERIA^a
Strain | Glycogen accumulation^b (maximal, mg/g cells) | Fructose-1,6-bisP^c (A0.5, mM) | AMP^d (I0.5, mM)
Escherichia coli (wild type) | 20 | 0.068 | 0.075
Mutant SG5 | 35 | 0.022 | 0.17
Mutant 618 | 70 | 0.015 | 0.86
Mutant CL1136 | 74 | 0.005 | 0.68
Mutant SG14 | 8.1 | 0.82 | 0.50
Salmonella typhimurium (wild type) | 12 | 0.095 | 0.11
Mutant JP23 | 15 | -^e | 0.25
Mutant JP51 | 20 | 0.0? | 0.49
^a Data from 1 and 2.
^b The bacterial strains were grown in minimal media with 0.75% glucose, and the data are expressed as maximal milligrams of anhydroglucose units per gram (wet weight) of cells in the stationary phase.
affinity for the activator, fructose-1,6-bisP, is higher, glycogen accumulation by the mutant is higher. If the apparent affinity for the activator is lower, as seen for the mutant SG14 enzyme, glycogen accumulation is lower in the mutant than in the parent strain. The S. typhimurium mutant strains studied have ADPGlc PPases that are affected mainly in their affinity for the inhibitor. Both the JP23 and JP51 enzymes have a lower affinity for the inhibitor, and these mutants accumulate more glycogen than does the parent strain. The above studies, plus one showing a direct relationship between the fructose-1,6-bisP concentration in the E. coli cell and the rate of glycogen accumulation (55), clearly indicate that fructose-1,6-bisP is an allosteric activator of ADPglucose pyrophosphorylase and a physiological activator of glycogen synthesis in E. coli and S. typhimurium.
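The relationship stated above can be summarized with a rank correlation over the Table V values. The sketch uses the E. coli fructose-1,6-bisP A0.5 values and the S. typhimurium AMP I0.5 values, each against maximal glycogen accumulation; Spearman's rho is our choice of statistic, not one used in the chapter, and the helper names are hypothetical. Because a lower A0.5 means a higher apparent affinity for the activator, the direct affinity-accumulation relationship appears as a negative correlation with A0.5, while a higher I0.5 (lower affinity for AMP) correlates positively with accumulation.

```python
# Spearman rank correlation between allosteric constants and glycogen
# accumulation, using values transcribed from Table V.
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors (tie-safe Spearman rho)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


# E. coli: wild type, SG5, 618, CL1136, SG14
ecoli_fbp_a05 = [0.068, 0.022, 0.015, 0.005, 0.82]  # mM
ecoli_glycogen = [20, 35, 70, 74, 8.1]              # mg/g cells
# S. typhimurium LT-2: wild type, JP23, JP51
styph_amp_i05 = [0.11, 0.25, 0.49]                  # mM
styph_glycogen = [12, 15, 20]                       # mg/g cells

print(f"E. coli, A0.5 vs glycogen: rho = {spearman(ecoli_fbp_a05, ecoli_glycogen):.2f}")
print(f"S. typhimurium, I0.5 vs glycogen: rho = {spearman(styph_amp_i05, styph_glycogen):.2f}")
```

Both coefficients have magnitude 1 (rho = -1.00 for E. coli, 1.00 for S. typhimurium), so the rank order agrees perfectly with the text; with only five and three strains, however, this is a descriptive summary rather than a statistical test.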
B. Catalytic and Allosteric Effector Sites of ADPglucose Pyrophosphorylase
1. CHEMICAL MODIFICATION STUDIES
Chemical modification studies of the E. coli ADPGlc PPase provide evidence for the location of the activator binding site (56, 57), the inhibitor binding site (58, 59), and the substrate binding site (57, 60). These experiments used pyridoxal-P as an analog of either the activator, fructose-1,6-bisP (56, 59), or, as subsequently shown, the substrate, glucose-1-P (57, 61). As an ATP analog, the photoaffinity reagent 8-azido-ATP (8N3ATP) proved to be a substrate for the E. coli enzyme (60), whereas 8-azido-AMP (8N3AMP) was an effective inhibitor analog (58, 59). Because the E. coli ADPGlc PPase gene, glgC, had been cloned and sequenced (15, 17), identification of the amino-acid sequence around each modified residue enabled determination of the location of that residue in the primary structure of the enzyme. The amino-acid residue involved in binding the activator was Lys-39, and the residue involved in binding the adenine portion of the substrates (ADPGlc and ATP) was Tyr-114. Tyr-114 was also the major binding site for the adenine ring of the inhibitor, AMP. Lys-195 is protected from reductive phosphopyridoxylation by the substrate ADPGlc; it was therefore proposed to be part of a substrate binding site as well.
2. SITE-DIRECTED MUTAGENESIS OF LYSINE RESIDUE 195
Subsequent site-directed mutagenesis experiments indicated that Lys-195 is involved in binding the phosphate moiety of the substrate glucose-1-P, or the β-phosphate of ADPGlc (61). As seen in Table VI, substituting various amino acids for Lys-195 resulted in a specific and large increase in the Km for glucose-1-P. The Km of the K195R mutant enzyme (arginine replacing lysine at residue 195) for
JACK PREISS AND TONY ROMEO
TABLE VI
EFFECT OF AMINO-ACID SUBSTITUTION AT RESIDUE 195 OF E. coli ADPGlc PPase ON SUBSTRATE AND EFFECTOR KINETIC CONSTANTS(a)

Enzyme      Glc-1-P    ATP         Mg2+        Fru-1,6-bisP   AMP          ADPGlc
            Km (mM)    S0.5 (mM)   S0.5 (mM)   A0.5 (µM)      I0.5 (µM)    S0.5 (µM)
Wild type   0.025      0.33        2.6         25.4           85           97
K195R       3.05       0.33        3.5         60             188          -
K195H       3.24       0.24        3.2         27             324          -
K195Q       16.7       0.19        3.4         21             551          688
K195I       17.2       0.25        3.6         15             572          -
K195E       300        0.15        2.8         28             665          -

(a) S0.5 is the concentration of substrate giving 50% of maximal activity, A0.5 is the concentration of activator giving 50% of maximal activation, and I0.5 is the concentration of inhibitor giving 50% inhibition.
glucose-1-P increased about 100-fold (from 25 µM to 3 mM), whereas for the K195E mutant (glutamate replacing lysine at residue 195) it increased 10,000-fold, to about 300 mM. This last Km value is only an estimate, because the highest concentration of glucose-1-P used in the reaction mixture was 18 mM. The kinetic constants for ATP, Mg2+, and the activator, fructose-1,6-bisP, were all relatively unaffected by any of the amino acids substituted at residue 195. Some increase in the Ki of AMP and in the S0.5 of ADPGlc was noted (about 8-fold), but this change was much smaller than the 10,000-fold increase in the Km for glucose-1-P. Even substitution of Lys-195 by Arg, which preserves the positive charge, resulted in a large increase, indicating that charge alone is insufficient to confer binding strength for glucose-1-P. Substitution by His had an effect similar to that of Arg, whereas substitution by uncharged amino acids such as Gln or Ile resulted in about 500-fold increases in Km. The trend in Km values from basic to neutral to acidic amino-acid substitutions supports the hypothesis that an ionic bond between Lys-195 and the phosphate moiety of glucose-1-P forms the basis of optimal binding affinity, but other factors, such as the shape and size of the amino acid, are also important. The results also indicate that the apparent affinities for both ADPGlc and AMP are affected. However, the binding of the other adenine nucleotide substrate, ATP, was not affected. It is quite possible that ATP binds with its triphosphate portion anchored in an orientation away from the ε-amino group of Lys-195, so that the ATP-Mg2+ complex does not interact with Lys-195 at all. In contrast, the α-phosphate of AMP or ADPGlc may be rotated up into the region of Lys-195, allowing interaction between the negatively charged phosphate oxygen and the positively charged ε-amino group of Lys-195. The adenine portion of AMP and ADPGlc could occupy the same site to which the adenine of ATP binds (around Tyr-114). These
REGULATION OF BACTERIAL GLYCOGEN SYNTHESIS
postulations agree well with the earlier kinetic and chemical studies suggesting that the adenine moieties of ATP, ADPGlc, and AMP share the same binding site, but that the sugar-phosphate portions of these ligands may occupy unique sites (58, 59, 62, 63). The postulations are also in agreement with steady-state kinetic data showing that AMP inhibition is noncompetitive with respect to glucose-1-P, which indicates that AMP and glucose-1-P can bind to the enzyme simultaneously. In addition, binding studies with the wild-type ADPGlc PPase demonstrated half-site reactivity with respect to ATP binding (two sites per native homotetramer) but full-site binding for AMP and for ADPGlc (62). This suggests that the binding conformations or sites for AMP and ADPGlc differ from those for ATP. The large increases in the Km for glucose-1-P when Lys-195 is replaced by other amino-acid residues help explain the absolute conservation of this residue in all the ADPGlc PPases, bacterial or plant, sequenced to date from either genomic or cDNA clones. Conservation of this sequence is shown in Table VII for the 20 ADPGlc PPase sequences cloned or determined thus far (64). Thus there appears to be a high degree of specificity for the size, shape, and charge of one amino acid, Lys, for adequate substrate binding. However, there is also a certain degree of tolerance for amino-acid substitutions in terms of the ability of the protein to maintain its folding properties
TABLE VII
CONSERVATION OF THE GLUCOSE-1-PHOSPHATE BINDING SITE OF ADPGLUCOSE PYROPHOSPHORYLASE(a)

Source
Escherichia coli
Salmonella typhimurium
Anabaena
Synechocystis
Spinach leaf, 51 kDa
Potato tuber, 50 kDa
Rice seed (small subunit)
Maize endosperm, 54 kDa
Arabidopsis thaliana (small subunit)
Wheat endosperm (small subunit)

(a) References to the sequences of the plant ADPGlc PPases are in 64. The sequence of the Anabaena enzyme is in 65 and that of the Synechocystis enzyme in 66. The nucleotide sequence of the wheat endosperm small subunit is in the EMBL and GenBank Nucleotide Sequence Databases under accession number X66080. The Lys residue that binds glucose-1-P is underlined.
[as measured by the thermal stability of the mutants (61)], including maintenance of the closely positioned ATP-binding site.
3. SITE-DIRECTED MUTAGENESIS OF TYROSINE RESIDUE 114
Preliminary experiments were done to determine how Tyr-114 is involved in the binding of AMP and of the substrates, ATP and ADPGlc. In chemical modification studies with the 8-azido adenine nucleotide analogs 8N3ADPGlc and 8N3AMP, the amino acid mainly modified was Tyr-114 (58-60). These studies suggested that the adenine nucleotide substrates, ATP and ADPGlc, share a common site with the inhibitor, AMP. Moreover, conformational analysis (67) predicted that Tyr-114 is situated in a Rossmann-fold supersecondary structure, which is generally regarded as an adenine nucleotide binding domain (68-71).
The Phe-114 mutant was prepared by site-directed mutagenesis (72) and, as expected, the enzyme still retained 30 to 65% of the catalytic activity of the wild-type enzyme. However, its apparent affinities for the substrates ADPGlc and ATP were reduced to about one-fifth and one-eleventh, respectively. The Km for glucose-1-P was not changed, but, surprisingly, in the synthesis direction the enzyme was activated only twofold by the activator, fructose-1,6-bisP. This is in contrast to the wild-type enzyme, which is activated 25- to 40-fold. In addition, whereas maximal activation of the wild-type enzyme required about 0.25 to 0.5 mM activator, the Phe-114 mutant enzyme required an activator concentration of over 3 mM. Although the inhibition constant for AMP was not changed in the Phe-114 mutant enzyme, the inhibition curve was altered from a sigmoidal shape to a shape indicating negative cooperativity between the AMP sites. Thus, the mutant enzyme is more sensitive to AMP inhibition at low concentrations, but more resistant to inhibition at AMP concentrations greater than 66 µM. For example, at 300 µM AMP the wild-type enzyme is inhibited over 90%, while the Phe-114 mutant enzyme is inhibited only 74% (72). Therefore, conversion of Tyr-114 to Phe not only lowers the apparent binding affinity for the substrates ADPGlc and ATP, but also lowers the apparent affinity for the activator, fructose-1,6-bisP, and alters the pattern of AMP inhibition. It is proposed that the activator, substrate, and inhibitor sites are juxtaposed in the three-dimensional structure of the enzyme. The site-directed mutagenesis studies are consistent not only with the studies done with the 8-azido adenine nucleotides (58-60), but also with binding studies done with the substrates, the inhibitor analog chromium-ATP, and the effectors fructose-1,6-bisP and AMP (62).
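The contrast between a sigmoidal AMP-inhibition curve (wild type) and a negatively cooperative one (Phe-114 mutant) with an unchanged inhibition constant can be visualized with a Hill-type inhibition equation. This is a minimal sketch, not the authors' analysis: the Hill form itself, a shared I0.5 of 66 µM (chosen because two Hill curves with equal I0.5 cross exactly there), and the coefficients n = 2 and n = 0.5 are all assumptions made for illustration.

```python
def fractional_inhibition(amp_um: float, i05_um: float, hill_n: float) -> float:
    """Fraction of enzyme activity inhibited at a given AMP concentration (uM).

    Hill-type inhibition model, assumed here for illustration:
        inhibition = (I / I0.5)**n / (1 + (I / I0.5)**n)
    n > 1 gives a sigmoidal curve (positive cooperativity);
    n < 1 gives a curve typical of negative cooperativity.
    """
    if amp_um <= 0:
        return 0.0
    ratio = (amp_um / i05_um) ** hill_n
    return ratio / (1.0 + ratio)


# Hypothetical parameters: both enzymes share the same I0.5 (the text reports an
# unchanged inhibition constant); the Hill coefficients are illustrative only.
I05_UM = 66.0
WILD_TYPE_N = 2.0   # sigmoidal inhibition curve
Y114F_N = 0.5       # negatively cooperative inhibition curve

for amp in (10, 30, 66, 150, 300):
    wt = fractional_inhibition(amp, I05_UM, WILD_TYPE_N)
    mut = fractional_inhibition(amp, I05_UM, Y114F_N)
    print(f"AMP {amp:>3} uM: wild type {wt:6.1%}, Y114F {mut:6.1%}")
```

Below I0.5 the n = 0.5 curve gives more inhibition than the n = 2 curve, and above I0.5 it gives less, which is the qualitative crossover described for the Phe-114 enzyme; the printed percentages come from the assumed parameters and are not the measured values.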
As indicated above (Section II,B,1), the chemical modification studies with the azido-nucleotides showed that the adenine nucleotide substrates and the inhibitor, AMP, share a common binding site at Tyr-114. Incorporation of 8N3AMP into the ADPGlc PPase also occurred at amino-acid residues close to the activator site, suggesting that the activator, inhibitor, and adenine nucleotide substrate binding sites are near each other in the enzyme's tertiary structure. The binding studies show that binding of the substrate ATP alone, or of the activator alone, has no effect on the binding of the inhibitor. In fact, subsaturating concentrations of ATP slightly stimulate AMP binding (62). However, ATP and fructose-1,6-bisP present together effectively inhibited the binding of the inhibitor, indicating an interplay of the three separate sites. A clearer picture of this interplay would require the three-dimensional structure of the ADPGlc PPase from X-ray diffraction studies of its crystals; such studies are in progress (73). In the meantime, oligonucleotide-directed mutagenesis will certainly play an effective role in determining the structure-function relationships of the residues and domains identified by prior chemical modification studies. It would be of interest to substitute other amino-acid residues at position 114, which could provide greater insight into the function of the hydroxyl group of the tyrosine residue. The sequence around Tyr-114 is RMGENWYRGTAD. What would be the effect of other hydrophobic amino-acid substitutions at position 114? What would be the effect of substituting a serine residue at position 114, or of replacing the Trp residue at position 113 or the neighboring arginine residues?
4. SITE-DIRECTED MUTAGENESIS OF LYSINE RESIDUE 39
Binding studies with pyridoxal-P identified Lys-39 as being involved in binding of the activator (56, 57).
Thus, the initial studies at this site replaced the Lys residue with glutamate (74). The resulting mutant ADPGlc PPase was activated only about %fold by fructose-1,6-bisP, compared with the wild-type enzyme, which was activated about 30-fold. In addition, the concentration of fructose-1,6-bisP needed for 50% of maximal activation of the mutant enzyme was 23 times that observed for the wild-type enzyme (0.9 mM versus 0.039 mM). The mutant E-39 enzyme also had lower apparent affinities than the wild-type enzyme for the substrate ATP and for Mg2+. The presence of activator, which increases the apparent affinity of the normal enzyme for its substrates, had no effect on the apparent affinities of the mutant enzyme for ATP and Mg2+. The concentration of inhibitor giving 50% inhibition was also similar for the E-39 enzyme in the presence or absence of activator. Thus, the E-39 mutant enzyme is not effectively activated by fructose-1,6-bisP and
this mutation also prevented the other effects that normally follow binding of the activator, such as increasing the apparent affinity of the enzyme for its substrates and desensitizing the enzyme to inhibition by AMP. The sequence around lysine residue 39 is KDLTNKRAKPAV (56, 57), a highly basic region. This may be important, because binding of fructose-1,6-bisP at both of its anionic phosphate groups could be essential for activation. It would be interesting to see whether arginine could substitute effectively for Lys at residue 39, or whether replacing the arginine at position 40 with other amino acids would have an effect. Similarly, the other lysine residues, at positions 34 and 42, may also be important for binding the activator, so site-directed mutagenesis at these sites could also be of interest. As discussed later (Section II,B,5), a glycogen-deficient mutant of E. coli, SG14, has an ADPGlc PPase in which threonine replaces the wild-type alanine at position 44, and this mutant enzyme is poorly activated by fructose-1,6-bisP (13, 75).
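The residue numbers cited for the two peptides in this section can be cross-checked by numbering each peptide from one residue whose position the text states. In this minimal sketch the helper names `number_residues` and `positions_of` are hypothetical; the only inputs are the two published sequences and the anchors given in the text (Lys-39 is the lysine directly followed by Arg-40, and Tyr-114 is the tyrosine directly preceded by Trp-113).

```python
def number_residues(peptide: str, anchor_index: int, anchor_number: int) -> dict[int, str]:
    """Map protein residue numbers to one-letter codes for a peptide.

    anchor_index is the 0-based index of a residue within `peptide`;
    anchor_number is that residue's position in the full protein.
    """
    start = anchor_number - anchor_index
    return {start + i: aa for i, aa in enumerate(peptide)}


def positions_of(numbered: dict[int, str], residue: str) -> list[int]:
    """Residue numbers at which a given one-letter code occurs, in order."""
    return [pos for pos, aa in numbered.items() if aa == residue]


# Activator-site peptide: Lys-39 is the K immediately followed by R (Arg-40).
activator_site = "KDLTNKRAKPAV"
act = number_residues(activator_site, activator_site.index("KR"), 39)
print(positions_of(act, "K"))   # [34, 39, 42]
print(act[40], act[44])         # R A

# Adenine-site peptide: Tyr-114 is the Y immediately preceded by W (Trp-113).
adenine_site = "RMGENWYRGTAD"
ade = number_residues(adenine_site, adenine_site.index("WY") + 1, 114)
print(ade[113], ade[114], positions_of(ade, "R"))   # W Y [108, 115]
```

The numbering places the other lysines at 34 and 42, Arg at 40, and Ala at 44 (the residue changed in SG14), and puts the arginines nearest Tyr-114 at 108 and 115, consistent with the positions named in the text.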
5. CLONING OF ADPGLUCOSE PYROPHOSPHORYLASES WITH ALTERED ALLOSTERIC PROPERTIES FROM E. coli MUTANTS AFFECTED IN GLYCOGEN SYNTHESIS
As indicated in Section II,A, a class of mutants of E. coli and of S. typhimurium with altered rates of glycogen accumulation was found to have ADPGlc PPases with altered allosteric properties. The regulatory properties of these mutants are shown in Table V. To gain insight into the amino-acid residues or domains involved in maintaining allosteric function, the genes encoding the allosteric mutant ADPGlc PPases were cloned (75-80). Table VIII shows the amino-acid substitutions in the allosteric mutants that have been analyzed.

TABLE VIII
AMINO-ACID SUBSTITUTIONS IN THE E. coli ADPGLUCOSE PYROPHOSPHORYLASE ALLOSTERIC MUTANTS

Mutant    Mutation          Ref.
SG14      Ala-44 → Thr      75
CL1136    Arg-67 → Cys      78
SG5       Pro-295 → Ser     79
618       Gly-336 → Asp     76, 77

Of interest is that the mutations causing large changes in the allosteric properties of the enzyme occur throughout the sequence of the ADPGlc PPase. These amino-acid changes affect not only the affinity of the enzyme for the allosteric effectors (Table V) but also the
apparent affinities for the substrate ATP and for Mg2+ (1, 2, 12, 13, 75, 80). Thus, many domains are perturbed by the mutations. The A44T (SG14) enzyme is very similar to the K39E mutant enzyme created by site-directed mutagenesis (see Section II,B,4, and 75) in that both have greatly reduced affinity for the activator, fructose-1,6-bisP. Other activators, such as P-enolpyruvate and NADPH, which have a significant activating effect on the wild-type enzyme, have almost no effect on the A44T and K39E enzymes. Lys-39 is involved in the binding of the activator, fructose-1,6-bisP (56, 57), and the sequence in the region starting with Lys-39, KRAKPAV, is highly positively charged, which probably contributes to the anionic binding domain for the phosphate groups of the activator. The disruption of the apparent binding of fructose-1,6-bisP when the negatively charged glutamate replaces Lys-39 supports this. The conversion of Ala-44 to threonine also interferes with the binding of activators. Threonine can hydrogen-bond to nearby amino acids in the three-dimensional structure of the enzyme and thus possibly disrupt the placement of the adjacent positively charged groups. The importance of the Ala residue at position 44 is reinforced by its absolute conservation in every known sequence (bacterial and plant) of ADPGlc PPase (64). The A44T (SG14) mutant enzyme also has decreased sensitivity to AMP inhibition compared to the wild-type enzyme. The R67C (CL1136) mutant ADPGlc PPase also has decreased sensitivity to AMP inhibition. Both of these amino-acid substitutions lie in the region where labeled 8-azido-AMP was incorporated. Thus it appears that residues 44 and 67 may be part of, or affect, the AMP binding domain. Of interest is that the A44T mutation disrupts both the activator and the inhibitor sites. In contrast, the R67C mutation, although disrupting the inhibitor site, increases the affinity for the activator.
The R67C mutant enzyme is also less dependent on activator for activity; in other words, it has significant activity in the absence of activator. Other mutations, P295S (SG5) and G336D (618), increase the apparent affinity of the ADPGlc PPase for the activator and also, to a lesser extent, make the enzyme less dependent on activator for activity. Despite changes at a number of sites in the enzyme, the previously observed synergistic effects (62) in the binding of the activator, fructose-1,6-bisP, and ATP are still maintained. In mutants R67C, K195Q (61), P295S, and G336D, an increased affinity for the activator is accompanied by an increased affinity for ATP. If, however, the activator binding site (K39E, A44T) or the substrate binding site (Y114F) is disrupted, the corresponding affinity for the substrate or activator is decreased. These results support the concept of partially overlapping sites for activator, substrate, and inhibitor (1, 62). Thus site-directed mutagenesis of the ADPGlc PPase gene and analysis
of various allosteric mutant genes have provided much insight into the structure-function relationships of the substrate and catalytic sites. What is needed for greater clarification is knowledge of the three-dimensional structure of the enzyme. Unfortunately, at present no crystals of ADPGlc PPase suitable for this type of analysis are available. The E. coli enzyme has been crystallized (73), but the crystals are of poor diffraction quality and are sensitive to X-ray damage. We hope these difficulties will be overcome in the near future.
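Table VIII writes each substitution as wild-type residue, position, and replacement in three-letter form (for example, Ala-44 → Thr), whereas the text uses the compact one-letter form (A44T). The sketch below converts one notation into the other using the standard one-letter amino-acid codes; the function `to_compact` and its regular expression are illustrative assumptions, and the lookup covers only the residues named in this section.

```python
import re

# Standard one-letter codes for the residues named in this section.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asp": "D", "Cys": "C", "Glu": "E", "Gln": "Q",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y",
}

# Accepts "Ala-44 → Thr" or the ASCII form "Ala-44 -> Thr".
PATTERN = re.compile(r"^([A-Z][a-z]{2})-(\d+)\s*(?:->|→)\s*([A-Z][a-z]{2})$")


def to_compact(substitution: str) -> str:
    """Convert 'Xxx-N → Yyy' into the one-letter form 'XNY'."""
    match = PATTERN.match(substitution.strip())
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution!r}")
    wild_type, position, mutant = match.groups()
    return f"{THREE_TO_ONE[wild_type]}{position}{THREE_TO_ONE[mutant]}"


table_viii = {
    "SG14": "Ala-44 → Thr",
    "CL1136": "Arg-67 → Cys",
    "SG5": "Pro-295 → Ser",
    "618": "Gly-336 → Asp",
}
for strain, change in table_viii.items():
    print(strain, to_compact(change))
```

The same helper reproduces the names used for the site-directed mutants, such as K39E (Lys-39 → Glu) and Y114F (Tyr-114 → Phe).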
III. Glossary
ADPGlc PPase: adenosine diphosphate glucose pyrophosphorylase
asd: aspartate semialdehyde dehydrogenase (EC 1.2.1.11) gene
...: restriction nuclease
ORF: open reading frame
...: ADPGlc PPase gene, glycogen synthase gene, phosphorylase gene operon, or regulon region
relA: ATP:GTP 3'-pyrophosphotransferase gene; stringent factor
spoT: guanosine 3',5'-bis(diphosphate) 3'-pyrophosphatase gene
his: histidine biosynthetic gene
glgC'-'lacZ: translational fusion of the glgC promoter region to the β-galactosidase gene
rrnAP1: ribosomal RNA operon promoter
csrA: carbohydrate storage regulator gene
CRP: cAMP receptor protein
AC70R1: E. coli mutant overexpressed with respect to the glycogen biosynthetic enzymes (glgQ mutant)
SG3: E. coli mutant overexpressed with respect to the glycogen biosynthetic enzymes (glgR mutant)
618: glgC allosteric mutant
TR1-5: csrA mutant
kanR: gene conferring kanamycin resistance
alaS: Ala-tRNA synthetase (EC 6.1.1.7) gene
ser17: serine tRNA-3 gene
Eσ70: RNA polymerase core plus sigma factor 70
mAb: monoclonal antibody
glnAP2: glutamine synthetase gene promoter
dnaK: DNA biosynthesis gene
NtrA: rpoN, RNA polymerase σ54 subunit, involved in regulation of nitrogen metabolism
...: glnG, positive regulatory gene for glnA
...: putative 7.9-kDa polypeptide biosynthetic gene involved in glycogen biosynthesis
katF: catalase regulatory gene; rpoS, "stationary phase sigma factor"
SG14: E. coli glgC allosteric mutant
CL1136: E. coli glgC allosteric mutant
SG5: E. coli glgC allosteric mutant

ACKNOWLEDGMENT
Research in the laboratory of T. Romeo was supported by a Faculty Research Grant Award (UNT Health Science Center) and by a grant from The National Science Foundation.
REFERENCES
1. J. Preiss and T. Romeo, Adv. Microb. Physiol. 30, 183 (1989).
2. J. Preiss, Annu. Rev. Microbiol. 38, 419 (1984).
3. J. A. Slock and D. P. Stahly, J. Bact. 120, 399 (1974).
4. R. J. Gibbons and B. Kapsimalis, Arch. Oral Biol. 8, 319 (1963).
5. J. H. J. Huis in 't Veld and O. B. Dirks, Caries Res. 12, 243 (1978).
6. J. Preiss, Curr. Top. Cell. Regul. 1, 125 (1969).
7. J. Preiss, in "Bacteria in Nature" (J. S. Poindexter and E. R. Leadbetter, eds.), Vol. 3, p. 189. Plenum, New York, 1989.
8. J. Preiss and D. A. Walsh, in "Biology of Carbohydrates" (V. Ginsburg, ed.), Vol. 1, p. 173. Wiley, New York, 1981.
9. J. Preiss, S. G. Yung and P. A. Baecker, MCBchem 57, 61 (1983).
10. E. G. Krebs and J. Preiss, in "International Review of Science: Biochemistry of Carbohydrates" (W. J. Whelan, ed.), p. 337. University Park Press, Baltimore, 1975.
11. C. Boyer and J. Preiss, Bchem 16, 3693 (1977).
12. S. Govons, N. Gentner, E. Greenberg and J. Preiss, JBC 248, 1731 (1973).
13. J. Preiss, E. Greenberg and A. Sabraw, JBC 250, 7631 (1975).
14. M. Damotte, J. Cattaneo, N. Sigal and J. Puig, BBRC 32, 916 (1968).
15. T. W. Okita, R. L. Rodriguez and J. Preiss, JBC 256, 6944 (1981).
16. T. Romeo, J. Moore and J. Smith, Gene 108, 23 (1991).
17. P. A. Baecker, C. E. Furlong and J. Preiss, JBC 258, 5084 (1983).
18. P. A. Baecker, E. Greenberg and J. Preiss, JBC 261, 8738 (1986).
19. A. Kumar, C. E. Larsen and J. Preiss, JBC 263, 14634 (1986).
20. T. Romeo, A. Kumar and J. Preiss, Gene 70, 363 (1988).
21. F. Yu, Y. Jen, E. Takeuchi, M. Inouye, H. Nakayama, M. Tagaya and T. Fukui, JBC 263, 13706 (1988).
22. C. Haziza, P. Stragier and J.-C. Patte, EMBO J. 1, 379 (1982).
23. D. Austin and T. J. Larson, J. Bact. 173, 101 (1991).
24. Y. Kohara, K. Akiyama and K. Isono, Cell 50, 495 (1987).
25. K. E. Rudd, W. Miller, J. Ostell and D. A. Benson, NARes 18, 313 (1990).
26. D. N. Dietzler, M. P. Leckie, W. L. Sternheim, T. L. Taxman, J. M. Ungar and S. E. Porter, BBRC 77, 1468 (1977).
27. D. N. Dietzler, M. P. Leckie, J. L. Magnani, M. J. Sughrue, P. E. Bergstein and W. L. Sternheim, JBC 254, 8308 (1979).
28. M. P. Leckie, R. R. Ng, S. E. Porter, D. R. Compton and D. N. Dietzler, JBC 258, 3813 (1983).
29. T. Romeo and J. Preiss, J. Bact. 171, 2773 (1989).
30. J. Urbanowski, P. Leung, H. Weissbach and J. Preiss, JBC 258, 2782 (1983).
31. T. Romeo, J. Black and J. Preiss, Curr. Microbiol. 21, 131 (1990).
32. W. A. Bridger and W. Paranchych, Can. J. Biochem. 56, 403 (1978).
33. M. P. Leckie, V. L. Tieber, S. E. Porter, W. G. Roth and D. N. Dietzler, J. Bact. 161, 133 (1985).
34. M. Taguchi, K. Izui and H. Katsuki, J. Biochem. 88, 379 (1980).
35. M. Cashel and K. E. Rudd, in "Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology" (F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), Vol. 1, p. 1410. American Society for Microbiology, Washington, D.C., 1987.
36. H. Xiao, M. Kalman, K. Ikehara, S. Zemel, G. Glaser and M. Cashel, JBC 266, 5980 (1991).
37. M. Almud and I. Pastan, JBC 250, 2189 (1975).
38. D. L. Riggs, R. D. Mueller, H.-S. Kwan and S. W. Artz, PNAS 83, 9333 (1986).
39. E. Sarubbi, K. E. Rudd and M. Cashel, MGG 213, 214 (1988).
39a. T. Romeo, M. Gong, M. Y. Liu and A.-M. Brun-Zinkernagel, J. Bact. 175, 4744 (1993).
39b. T. Romeo and M. Gong, J. Bact. 175, 5740 (1993).
40. T. Romeo and J. Moore, NARes 19, 3452 (1991).
41. O. Raibaud and M. Schwartz, Annu. Rev. Genet. 18, 173 (1984).
42. S. A. Lesley, N. E. Thompson and R. R. Burgess, JBC 262, 5404 (1987).
43. S. B. Jovanovich, S. A. Lesley and R. R. Burgess, JBC 264, 3794 (1989).
44. R. Lange and R. Hengge-Aronis, Mol. Microbiol. 5, 49 (1991).
45. R. Hengge-Aronis and D. Fischer, Mol. Microbiol. 6, 1877 (1992).
46. D. A. Siegele and R. Kolter, J. Bact. 174, 345 (1992).
47. M. P. McCann, J. P. Kidwell and A. Matin, J. Bact. 173, 4188 (1991).
48. M. Raha, I. Kawagishi, V. Muller, M. Kihara and R. M. Macnab, J. Bact. 174, 6644 (1992).
49. H. Goldie, J. Bact. 159, 832 (1984).
50. P. W. Postma, in "Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology" (F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), Vol. 1, p. 127. American Society for Microbiology, Washington, D.C., 1987.
51. J. Preiss, Annu. Rev. Plant Physiol. 33, 431 (1982).
52. J. Preiss, in "The Biochemistry of Plants" (J. Preiss, ed.), Vol. 14, p. 184. Academic Press, San Diego, 1988.
53. J. Preiss, in "Oxford Survey of Plant Molecular and Cellular Biology" (B. J. Miflin, ed.), Vol. 7, p. 59. Oxford University Press, Oxford, 1991.
54. A. A. Iglesias and J. Preiss, Biochem. Educ. 20, 196 (1992).
55. D. N. Dietzler, M. P. Leckie, C. J. Lais and J. L. Magnani, ABB 162, 602 (1974).
56. T. F. Parsons and J. Preiss, JBC 253, 6197 (1978).
57. T. F. Parsons and J. Preiss, JBC 253, 7638 (1978).
58. C. E. Larson and J. Preiss, Bchem 25, 4371 (1986).
59. C. E. Larson, Y. M. Lee and J. Preiss, JBC 261, 15402 (1986).
60. Y. M. Lee and J. Preiss, JBC 261, 1058 (1986).
61. M. A. Hill, K. Kaufmann, J. Otero and J. Preiss, JBC 266, 12455 (1991).
62. T. H. Haugen and J. Preiss, JBC 254, 127 (1979).
63. N. Gentner and J. Preiss, JBC 243, 5882 (1968).
64. B. S. White and J. Preiss, J. Mol. Evol. 34, 449 (1992).
65. Y. Y. Charng, G. Kakefuda, A. A. Iglesias, W. J. Buikema and J. Preiss, Plant Mol. Biol. 20, 37 (1992).
66. G. Kakefuda, Y. Y. Charng, A. A. Iglesias, L. McIntosh and J. Preiss, Plant Physiol. 99, 344 (1992).
67. P. Y. Chou and G. D. Fasman, Bchem 13, 222 (1974).
68. G. E. Schulz and R. H. Schirmer, "Principles of Protein Structure," Springer-Verlag, New York, 1979.
69. S. T. Rao and M. G. Rossmann, JMB 76, 241 (1973).
70. M. G. Rossmann, A. Liljas, C.-I. Brändén and L. J. Banaszak, in "The Enzymes" (P. D. Boyer, ed.), 3rd Ed., Vol. 2A, p. 61. Academic Press, Orlando, Florida, 1970.
71. G. E. Schulz and R. H. Schirmer, Nature 250, 142 (1974).
72. A. Kumar, T. Tanaka, Y. M. Lee and J. Preiss, JBC 263, 14634 (1988).
73. A. M. Mulichak, E. Skrzypczak-Jankun, T. J. Rydell, A. Tulinsky and J. Preiss, JBC 263, 17237 (1988).
74. A. Gardiol and J. Preiss, ABB 260, 175 (1990).
75. C. R. Meyer, P. Ghosh, S. Nadler and J. Preiss, ABB 302, 64 (1993).
76. P. Leung, Y. M. Lee, E. Greenberg, K. Esch, S. Boylan and J. Preiss, J. Bact. 167, 82 (1986).
77. Y. M. Lee, A. Kumar and J. Preiss, NARes 15, 10603 (1987).
78. P. Ghosh, C. Meyer, E. Remy, D. Peterson and J. Preiss, ABB 296, 122 (1992).
79. C. R. Meyer, P. Ghosh, E. Remy and J. Preiss, J. Bact. 174, 4509 (1992).
80. J. Preiss, C. Lammel and E. Greenberg, ABB 174, 105 (1976).
Diverse Mechanisms for Regulating Ribosomal Protein Synthesis in Escherichia coli
JANICE M. ZENGEL AND LASSE LINDAHL
Department of Biology
University of Rochester
Rochester, New York 14627
I. Organization of Ribosomal Protein Genes in Escherichia coli
II. Overview of the Control of Ribosomal Protein Synthesis in Escherichia coli
III. ...
A. S10 Operon
B. Alpha (S4) Operon
C. S15 Operon
D. ...
E. rif Region
F. S20 Operon
G. S1 Operon
H. L20 Operon
I. str Operon
IV. Epilogue
A. Physiological Implications
B. Future Directions
References
Because proteins are the most abundant cellular macromolecules, cell growth requires that a large fraction of the cell mass be devoted to the protein synthesis machinery, namely, the ribosomes and their ancillary factors. For example, in rapidly growing bacteria, ribosomes account for as much as 50% of the cellular dry mass. To avoid unnecessary investment in ribosome formation, mechanisms have evolved to reduce the synthesis of ribosomes during slower growth, when maximum rates of protein synthesis are not required (reviewed in 1). Because ribosome formation commands such a large fraction of cellular resources, even minor adjustments in the rate of ribosome synthesis can have a major impact on the economy of the cell. Ribosomes are complex multicomponent organelles, containing three or
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc.
All rights of reproduction in any form reserved.
JANICE M. ZENGEL AND LASSE LINDAHL
more different RNA molecules and at least 50 different proteins. The regulation of ribosome synthesis therefore presents two problems: (1) coordinating the synthesis of the individual ribosomal proteins (r-proteins) and rRNA molecules, and (2) balancing the overall rate of ribosome formation against the rate of synthesis of total biomass. In Escherichia coli, there is little turnover of ribosomal components under most growth conditions, indicating that the formation of ribosomal components is regulated predominantly at the level of synthesis. Although the regulation of ribosome synthesis has been investigated in many organisms [for example, Bacillus subtilis (2), Saccharomyces cerevisiae (3, 4), Dictyostelium (5), plants (6), and Xenopus (7, 8)], this review focuses on E. coli. Earlier work on the regulation of E. coli ribosome synthesis (reviewed in 1, 9-11) was directed mainly toward elucidating the organization of the ribosomal genes and the basic principles of their regulation. In recent years, the focus has shifted toward the analysis of detailed molecular mechanisms. Here we summarize current knowledge of the organization and regulation of ribosomal genes, emphasizing recent work that elucidates the molecular mechanisms regulating ribosomal protein synthesis and illustrates the diversity of these mechanisms. The regulation of rRNA synthesis has been reviewed recently (12).
I. Organization of Ribosomal Protein Genes in Escherichia coli
In E. coli, the small ribosomal subunit contains one rRNA molecule (16 S) and 21 proteins. The large subunit harbors two rRNA molecules (23 S and 5 S) and 34 proteins (Table I). All of the genes for these ribosomal components have been mapped on the E. coli chromosome, and all but three of the r-protein genes have been sequenced. The organization of the transcription units (here called operons) encoding r-proteins is summarized in Fig. 1. Note that several r-protein operons contain genes for nonribosomal proteins, usually other components of the transcription and translation apparatus, such as the subunits of RNA polymerase. A number of the genes for r-proteins and RNA polymerase subunits can mutate to confer resistance to various antibiotics. Historically, these antibiotics were used to name the genes. For example, the genes for r-proteins S12 and S5 and for the β subunit of RNA polymerase are referred to as the str (streptomycin-resistance), spc (spectinomycin-resistance), and rif (rifampicin-resistance) genes, respectively (Table I). Similarly, the operon containing the S12 gene is referred to as the str operon, and the S5-encoding operon is called the spc operon.
RIBOSOMAL PROTEIN SYNTHESIS IN E. coli
TABLE I
RIBOSOMAL PROTEIN AND RIBOSOMAL RNA GENES OF Escherichia coli(a)

Protein      Gene   Map position(b)   Operon
30-S proteins
S1           rpsA   21                S1
S2           rpsB   4                 S2
S3           rpsC   73                S10
S4           rpsD   73                alpha
S5           rpsE   73                spc
S6           rpsF   95                S6
S7           rpsG   73                str
S8           rpsH   73                spc
S9           rpsI   69                L13
S10          rpsJ   73                S10
S11          rpsK   73                alpha
S12          rpsL   73                str
S13          rpsM   73                alpha
S14          rpsN   73                spc
S15          rpsO   69                S15
S16          rpsP   57                trmD
S17          rpsQ   73                S10
S18          rpsR   96                S6
S19          rpsS   73                S10
S20          rpsT   0                 S20
S21          rpsU   67                S21
50-S proteins
L1           rplA   90                L11
L2           rplB   73                S10
L3           rplC   73                S10
L4           rplD   73                S10
L5           rplE   73                spc
L6           rplF   73                spc
L7/L12(c)    rplL   90                L10
L9           rplI   96                S6
L10          rplJ   90                L10
L11          rplK   90                L11
L13          rplM   69                L13
L14          rplN   73                spc
L15          rplO   73                spc
L16          rplP   73                S10
L17          rplQ   73                alpha
L18          rplR   73                spc
L19          rplS   57                trmD
L20          rplT   37                L20
L21          rplU   69                Not sequenced
L22          rplV   73                S10
L23          rplW   73                S10
L24          rplX   73                spc
L25          rplY   48                L25
L27          rpmA   69                Not sequenced
L28          rpmB   82                L28
L29          rpmC   73                S10
L30          rpmD   73                spc
L31          rpmE   89                Not sequenced
L32          rpmF   23                L32
L33          rpmG   82                L28
L34          rpmH   83                L34
L35          rpmI   37                L20
L36          rpmJ   73                alpha
rRNA transcription units
             rrnA   87                -
             rrnB   90                -
             rrnC   85                -
             rrnD   72                -
             rrnE   90                -
             rrnF   57                -
             rrnG   5                 -

(a) Data in this table are based on studies of E. coli K12 (see the legend to Fig. 1 for references). Although the gene organization is probably highly conserved among different strains, there are some differences. For example, Southern blot analysis shows that some E. coli strains lack the approximately 15-kb DNA segment that separates the str and S10 operons in E. coli K12 and E. coli B (N. Smith, J. M. Zengel, and L. Lindahl, unpublished).
(b) Map position is in minutes on the E. coli chromosome (172).
(c) L7 and L12 are encoded by the same gene; L7 is the acetylated form of the r-protein.
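Most r-protein gene names in Table I follow a regular alphabetical scheme: rpsA to rpsU for S1 to S21, rplA to rplY for L1 to L25 (the single gene for L7/L12 is rplL, the twelfth letter), and rpmA to rpmJ for L27 to L36. The hypothetical helper below simply applies that letter rule to a protein designation; it is an illustration checked against the table, not a naming authority, and it does not verify that a given protein actually exists (it would, for example, happily return a name for L8).

```python
import string


def rprotein_gene(protein: str) -> str:
    """Return the gene name implied by the letter rule for an r-protein.

    Letter rule (assumption checked against Table I):
      S1-S21  -> rpsA-rpsU
      L1-L25  -> rplA-rplY   (L7/L12 share rplL)
      L27-L36 -> rpmA-rpmJ
    """
    if protein in ("L7/L12", "L7", "L12"):
        return "rplL"
    subunit, number = protein[0], int(protein[1:])
    letters = string.ascii_uppercase
    if subunit == "S" and 1 <= number <= 21:
        return "rps" + letters[number - 1]
    if subunit == "L" and 1 <= number <= 25:
        return "rpl" + letters[number - 1]
    if subunit == "L" and 27 <= number <= 36:
        return "rpm" + letters[number - 27]
    raise ValueError(f"no letter-rule gene name for {protein!r}")


for p in ("S5", "S12", "L7/L12", "L11", "L25", "L27", "L36"):
    print(p, rprotein_gene(p))
```

For example, S12 maps to rpsL and S5 to rpsE, the genes historically called str and spc in the text.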
[Figure 1: maps of the r-protein operons S20 (0'), S2 (4'), S1 (21'), L32 (23'), thrS/L20/pheS (37'), L25 (48'), S16 (57'), S21 (67'), S15 (69'), L13 (69'), S10 (73'), spc (73'), alpha (73'), L28 (82'), L34 (83'), L11 (90'), L10 (90'), and S6 (95'); scale bar, 1 kb]
FIG. 1. Organization of ribosomal protein genes of Escherichia coli. Transcription units containing r-protein genes are drawn to scale according to their nucleotide sequences. Gene names (below the gene) and their products (above the gene) are shown. The chromosomal map positions are given after the operon names. Confirmed promoters and transcription terminators are indicated as P and t, respectively. Tentative promoters and terminators are shown as (P) and (t). Attenuator sites (att) and RNase III processing sites (III) are indicated. Gene products demonstrated to be autogenous regulators are underlined. The genes under their autogenous control are shown as solid black boxes; the genes reported to be under retroregulation are indicated as hatched boxes. Genes encoding proteins for which there are more tentative data are shown as stippled boxes. References for most of the information summarized in the figure are given in our previous reviews (9, 173). Other data are from 174 (L32 operon); 21, 22, 141, 147, 175-178 (thrS/L20/pheS operons); 179 (L25 operon); 180-182 (S21 operon); 18, 91, 93, 183 (S15 operon); 14, 184 (S10, spc, and alpha operons); 20 (str operon); and 185 (S6 operon).
JANICE M. ZENGEL AND LASSE LINDAHL
About half of the r-protein genes are located in the classical str-spc region at about 73 minutes on the E. coli chromosome. These genes are organized into four operons containing from 4 to 12 genes (Fig. 1). The remaining genes are scattered around the chromosome in transcription units encoding one to four r-proteins (Fig. 1). The boundaries of r-protein operons are often ambiguous because of multiple promoters and transcriptional read-through between adjacent operons. For example, because there is no efficient transcription terminator between the spc and alpha operons (Fig. 1), the alpha operon is transcribed from both the spc and the alpha promoters (13, 14). Similarly, there is no transcription terminator at the end of the L11 operon in the rif region (Fig. 1), so the downstream L10 operon is transcribed from both its own promoter and the L11 promoter (15-17). In addition, some transcription terminators are “leaky.” For example, about 20% of the RNA polymerase molecules transcribing the S10 operon continue into the spc operon (14), and half of the polymerases transcribing the S15 gene proceed through the terminator after this gene (18). Several of the r-protein operons, including the S21 (19), str (20), and L20 operons (21, 22), contain internal secondary promoters (Fig. 1). Because many r-protein genes are expressed from several different classes of transcripts, we have defined the r-protein operons based on their regulatory patterns rather than on transcriptional parameters.
II. Overview of the Control of Ribosomal Protein Synthesis in Escherichia coli
A. A Variety of Molecular Mechanisms for Autogenous Control

In the late 1970s, several research groups independently proposed that r-protein synthesis in E. coli is subject to “autogenous control” (23-25). That is, one gene in an operon encodes an r-protein that serves both as a structural component of the ribosome and as a regulatory protein controlling the expression of itself and other genes in the operon (Fig. 2). (The term “feedback” regulation has also been used, although to biochemists this term refers specifically to inhibition of enzyme activity, not synthesis.) The idea of autogenous control arose from three types of experiments. First, gene-dosage experiments showed that an increase in the copy number of ribosomal protein genes usually led to little or no increase in the rate of synthesis of the encoded proteins (23, 25). Second, conditional oversynthesis of individual r-proteins demonstrated that excessive accumulation of a single key r-protein from an r-protein operon repressed expression of all genes in that operon (24, 26). These induction experiments also provided the first direct evidence that autogenous regulation is operon specific. Third, in vitro synthesis of r-proteins from a given operon could be inhibited by addition of a specific purified r-protein from the operon under study (27-29).

The dual function of the regulatory r-proteins suggests a mechanism for linking gene expression to the ribosome-assembly process (26, 30). During balanced growth, newly synthesized r-proteins bind to rRNA and are assembled into mature ribosomes. If the production of a given regulatory r-protein exceeds the production of other ribosomal components, the accumulated free protein represses the synthesis of most or all proteins encoded by its own operon to restore the balanced synthesis of r-proteins and rRNA (Fig. 2). The autogenous regulation is operon specific; that is, a regulatory r-protein represses only its own operon and has no direct effect on synthesis of proteins from other operons. This specificity, in conjunction with the consumption of equimolar amounts of r-proteins during the ribosomal assembly process, ensures the coordinate expression of the various r-protein operons.

[Figure 2: r-protein operons whose regulatory proteins bind to 16-S rRNA and to 23-S rRNA; operons shown: S20, alpha, L20, spc, S10, S15, str, L10, and L11.]

FIG. 2. Linkage of autogenous control of r-protein synthesis to ribosome assembly. Operons whose regulatory r-proteins have defined binding sites on 16-S or 23-S rRNA are shown. Genes encoding regulatory proteins are represented by solid black boxes; other genes whose transcription or translation is under their autogenous control are indicated by hatched boxes; unregulated or retroregulated genes are represented by empty boxes. The approximate locations of the regulatory targets on the mRNA are indicated by the feedback arrows.

Regulatory r-proteins have now been identified for about half of the r-protein operons, and include both 30-S and 50-S components (summarized in Figs. 1 and 2). Although autogenous control is a recurrent theme for regulating expression of r-protein operons, it is achieved by several different mechanisms: (1) control of translation initiation, (2) control of mRNA elongation, and (3) control of mRNA degradation. Even within a single operon, different genes may be subject to different control mechanisms or to more than one type of regulation (for examples, see discussions of the S10, spc, alpha, and L10 operons in Section III).
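The balancing logic described in this section (a regulatory r-protein accumulates in free form only when its synthesis outpaces rRNA, and the free protein then represses its own operon) can be illustrated with a toy simulation. This is a minimal sketch, not a model taken from the studies reviewed here: the function name `simulate_autogenous_control`, the hyperbolic repression term, the one-to-one binding of the protein to rRNA, and all parameter values are hypothetical assumptions chosen only to show the principle.

```python
def simulate_autogenous_control(steps, rrna_per_step, k_max=5.0, k_rep=1.0, regulated=True):
    """Toy discrete-time model of autogenous control (hypothetical parameters).

    Each step: new rRNA is made; the regulatory r-protein is synthesized at a
    rate repressed by the pool of free (unassembled) protein; free protein then
    binds free rRNA one-to-one and is assembled into ribosomes.
    Returns (total protein synthesized, total protein assembled).
    """
    free_protein = 0.0
    free_rrna = 0.0
    synthesized = 0.0
    assembled = 0.0
    for _ in range(steps):
        free_rrna += rrna_per_step
        if regulated:
            # Assumed hyperbolic repression by the free protein pool.
            rate = k_max * k_rep / (k_rep + free_protein)
        else:
            rate = k_max
        free_protein += rate
        synthesized += rate
        # Assumed 1:1 incorporation into ribosomes, limited by available rRNA.
        bound = min(free_protein, free_rrna)
        free_protein -= bound
        free_rrna -= bound
        assembled += bound
    return synthesized, assembled


if __name__ == "__main__":
    print(simulate_autogenous_control(100, 1.0))                   # (104.0, 100.0)
    print(simulate_autogenous_control(100, 1.0, regulated=False))  # (500.0, 100.0)
```

In the regulated run the protein made in excess of rRNA is only the small free pool that does the repressing, whereas the unregulated run makes five times more protein than can be assembled. Halving `rrna_per_step` makes the regulated synthesis fall accordingly, mirroring the balanced synthesis of r-proteins and rRNA described above.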
B. Interaction of Regulatory r-Proteins with mRNA

Almost all the regulatory r-proteins are rRNA “primary” binding proteins, i.e., proteins that can bind directly and specifically to naked rRNA in vitro and therefore associate with the ribosomal particle during the early stages of ribosomal assembly (26, 30). Two inferences have been drawn from this point. One is that the rate of r-protein synthesis is directly tied to the rate of rRNA synthesis, such that a fluctuation in the amount of rRNA immediately induces a parallel change in the amount of r-proteins (26) (Fig. 2). The other is that, because the regulatory r-proteins have the ability to recognize specific RNA targets, autogenous control is likely to involve direct binding of a regulatory protein to a region of its own messenger RNA that is structurally related to its rRNA binding site (30). The canonical model of direct binding of regulatory proteins to their respective mRNAs has been verified experimentally for several operons (although it is interesting to note that many of the originally proposed secondary structures of rRNA and mRNA that inspired the model turned out to be incorrect). Direct binding of a complex of L10 and L12 to the L10 operon mRNA (31), of r-protein S4 to the alpha operon mRNA (32), and of S8 to the spc operon mRNA (33) has been demonstrated by filter-binding assays. The r-proteins S15 (34) and S7 (M. Nomura, personal communication) have been shown by “footprinting” experiments to bind to their respective mRNAs. However, not all attempts to show binding of a regulatory r-protein to its own mRNA have succeeded. No binding of S20 (35) or L4 (36) to their respective transcripts has been observed. In some cases, a similarity between the secondary structures of the binding sites on mRNA and rRNA for a given protein is obvious (e.g., see later, Figs. 8 and 9). In other cases such similarities are less clear.
Perhaps the relationship between binding sites will become more apparent when the three-dimensional structures of the RNAs become known. It is also important to note that RNA sequence similarities can be deceiving. In several cases, experimental analysis has failed to confirm the involvement of mRNA segments that display obvious sequence similarity to rRNA (see discussions of the S10, S15, spc, and S20 operons in Sections III,A, III,C, III,D, and III,F, respectively).

There are a number of r-protein operons that do not contain genes for an rRNA primary binding protein. For example, the S1 operon is under autogenous control (37, 38), yet the regulatory protein, S1, is apparently not a primary rRNA binding protein (39). Also, r-protein S2 is not thought to bind directly to rRNA, yet its operon appears to be under autogenous control (40). Studies of such operons could be very interesting, because they apparently do not conform to the paradigm illustrated in Fig. 2. On the other hand, neither of the two r-proteins encoded by the trmD operon is an rRNA primary binding protein, nor does this operon appear to be autogenously regulated (41). Thus, in this case the absence of rRNA-binding r-proteins may reflect the absence of an autogenous regulatory mechanism.

The binding of regulatory r-proteins to their own mRNAs implies competition between rRNA and mRNA for the r-proteins (Fig. 2). It is obviously important that the proteins bind preferentially to rRNA so that r-protein synthesis is repressed only when there is a shortage of nascent rRNA molecules. Yet, in the few cases where comparative measurements of the affinity for rRNA and mRNA have been made, modest or no differences in the binding constants for the two RNA targets have been observed (32, 33). The preferential incorporation of regulatory r-proteins into ribosomes is apparently achieved, not by a “tighter” binding to rRNA, but rather by the cooperativity of the ribosome assembly process and the fact that proteins bound to rRNA become trapped in mature ribosomal particles (42).
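The argument that trapping, rather than tighter binding, favors rRNA can be made concrete with a deliberately simplified sketch. It assumes, hypothetically, that the regulatory protein has equal affinity for free rRNA and mRNA sites, so in each round it partitions in proportion to site concentrations; that protein landing on rRNA is trapped irreversibly in assembling particles; that protein on mRNA is released and re-partitions in the next round; and that rRNA sites are in excess. The function name and numbers are illustrative and do not come from the measurements cited above.

```python
def fraction_on_mrna(rrna_sites, mrna_sites, rounds, trapping=True):
    """Fraction of a regulatory r-protein pool still associated with mRNA.

    Hypothetical model: equal affinities for both RNAs, rRNA sites in excess,
    and (if trapping) irreversible capture of rRNA-bound protein into
    ribosomal particles after each round of partitioning.
    """
    p_mrna = mrna_sites / (rrna_sites + mrna_sites)  # equal-affinity partition
    if not trapping:
        # Pure equilibrium: repeated rounds leave the partition unchanged.
        return p_mrna
    # With trapping, only protein that lands on mRNA in every round escapes capture.
    return p_mrna ** rounds


if __name__ == "__main__":
    print(fraction_on_mrna(1.0, 1.0, 10, trapping=False))
    print(fraction_on_mrna(1.0, 1.0, 10))
```

With equal numbers of sites, equilibrium alone leaves half of the protein on mRNA, whereas ten rounds of partition-and-trap leave less than 0.1% (0.5^10). When rRNA is scarce the excess-rRNA assumption fails and protein accumulates on mRNA, which is exactly the condition under which repression is expected.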
C. Mechanisms for Repressing Translation Initiation

Early studies showed that most autogenously controlled r-protein operons are inhibited at the level of translation (9, 11). For the L10 operon (43, 44) and the S10 operon (N. Brot, personal communication), in vitro dipeptide synthesis assays showed that the translation inhibition is at a step prior to the formation of the first peptide bond. One can envision several ways by which translation initiation could be blocked. First, the protein could bind to a site on the mRNA that overlaps the ribosome-binding region, thereby inhibiting translation by steric blocking. This model has been proposed for a number of nonribosomal operons (45, 46). A second possibility is that binding of the regulatory protein induces a conformational change in the mRNA that makes the ribosomal binding region inaccessible, e.g., by involving the Shine-Dalgarno region and/or the initiation codon in intramolecular base-pairing. Even though the regulatory protein and the ribosome would not have overlapping binding sites, the protein would indirectly prevent binding of the ribosome to the translation initiation region. Third, the regulatory r-protein might block a step in the translation initiation pathway subsequent to binding of the 30-S ribosomal subunit to the mRNA but before formation of the first peptide bond. That such a pathway exists is suggested by the recent observation that the 30-S ribosomal subunit interacts with the Shine-Dalgarno region prior to formation of a canonical initiation complex (47).

The mechanism of translation repression is now being investigated for several r-protein operons. So far, none of the mechanisms seems to involve simple direct competition. Rather, the available evidence is compatible with the second and third possibilities. In most r-protein operons in which the regulatory protein inhibits translation of a string of two or more genes, only the most proximal gene in the series is directly inhibited. The downstream genes are regulated by translational coupling, in which a given gene is translated efficiently only when ribosomes traverse the upstream mRNA. Such coupling has been demonstrated in the L11, L10, S10, spc, L20, and alpha operons.
D. Regulation of mRNA Elongation

In the S10 operon, the same r-protein, L4, autogenously regulates both transcription and translation (26, 48, 49). The transcription regulation is the result of premature termination of transcription at a terminator (attenuator) in the leader sequence about 30 bases upstream from the first gene of the operon (48, 50). The attenuation process is independent of the translation control by L4 (49, 50), and appears to be unique to the S10 operon (51).
E. Regulation of mRNA Stability

In two r-protein operons in which more than one gene is translationally inhibited, the gene directly repressed by the regulatory protein is not the most proximal gene of the operon (Figs. 1 and 2). For the str operon, in vitro experiments showed that regulatory protein S7 inhibits translation of the second and third genes, coding for S7 and EF-G, but not the proximal S12 gene (52). Similarly, for the spc operon, regulation by S8 in vitro starts at the third gene of the operon, with little or no effect on the upstream L14 and L24 genes (27). Nevertheless, the upstream genes in both operons are at least partially regulated in vivo (52, 53). The solution to this puzzle appears to be a specific destabilization of the upstream part of the mRNA (54; M. Nomura, personal communication). Less specific destabilization of mRNA has been observed when translation of the L11 (55) and alpha (56) operons is inhibited by their respective autogenous regulators. For the L11 operon, the decreased stability of the mRNA is an indirect effect of reduced translation, presumably resulting from the decreased ribosome density on the message (55). How much such mRNA destabilization contributes to autogenous regulation is not yet clear.
III. Review of Individual Operons

A. S10 Operon

1. TWO L4-MEDIATED REGULATORY PROCESSES
L4 regulates both transcription and translation of the 11-gene S10 operon (24, 26, 28, 49, 50). Transcription control was first detected in pulse-labeling experiments showing that synthesis of structural gene message from the entire S10 operon is reduced four- to fivefold in response to L4 oversynthesis (26). Nuclease mapping of pulse-labeled RNA subsequently revealed that the L4 inhibition of transcription is due to the protein's stimulation of premature transcription termination (attenuation) about 140 bases from the transcription start, or about 30 bases upstream from the initiation codon for the most proximal structural gene (48, 57, 58). Transcription control has also been demonstrated by in vitro studies showing that L4 induces premature termination in the S10 leader in a transcription system that contains only purified RNA polymerase and, as discussed below, transcription factor NusA (58). Repression of translation of S10 operon mRNA mediated by L4 was first seen in in vitro experiments (28). Subsequently, translation control was also shown in vivo by comparing transcription and translation rates: oversynthesis of L4 reduced protein synthesis to about 5%, whereas the rate of mRNA synthesis was reduced to 20-25% (49). These numbers imply that excess L4 reduces mRNA synthesis by 75 to 80%, and translation of the residual mRNA transcripts by an additional 80% or so. This interpretation was confirmed by genetic analysis of the S10 leader: certain mutations eliminated translation control by L4 without affecting inhibition of transcription, whereas other mutations eliminated transcription control but left translation control intact (49). Together, these genetic and physiological studies suggest that the two L4-mediated regulatory processes work independently, but additively, to control synthesis of the 11 r-proteins encoded by the S10 operon.
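As a check on the arithmetic above, write P for the relative rate of S10 operon protein synthesis with excess L4, M for the relative rate of mRNA synthesis, and T for the relative translation per mRNA, and assume that the two independent controls combine as P = M × T:

$$
T = \frac{P}{M} = \frac{0.05}{0.25} = 0.20
\qquad\text{or}\qquad
T = \frac{0.05}{0.20} = 0.25
$$

Translation of the residual transcripts therefore falls by 1 − T, i.e., 75 to 80%, on top of the 75 to 80% drop in mRNA synthesis, and 0.25 × 0.20 = 0.05 recovers the overall 5%.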
2. REQUIREMENTS FOR LEADER SEQUENCES IN L4-MEDIATED REGULATION
Structure-probing studies showed that the S10 leader RNA can fold into a series of hairpin structures that are illustrated in Fig. 3 (59). Transcription termination occurs around base 140, on the descending side of the large “attenuator” hairpin structure HE (58). The leader can be divided into several functional domains according to the effects of mutations on L4-mediated regulation (Fig. 3). The proximal three hairpins (HA, HB, and HC) can be deleted individually or en bloc without any detectable effect on the L4-mediated inhibition of either transcription or translation (J. M. Zengel and L. Lindahl, unpublished). Hairpin HD is absolutely required for transcription control by L4 (ibid.), but is dispensable for translation control (49). The HE hairpin region is essential for both transcription and translation control (49).

[Figure 3: secondary structure of the S10 leader (hairpins HA to HG), divided into domains at bases 1-60, 61-83, 84-145, and 146-192, with the requirement of each domain for L4-mediated inhibition of transcription and translation and its sequence identity with the corresponding leaders of S. typhimurium, C. freundii, Y. enterocolitica, and M. morganii.]

FIG. 3. Functional domains of the S10 leader. The secondary structure of the S10 leader was determined by Shen et al. (59). The stem-loop structures are identified as HA, HB, HC, HD, HE, and HG. The positions of the Shine-Dalgarno sequence (SD) and the initiation codon for the S10 gene are also shown. Domains needed for L4-mediated transcription and translation control are indicated (49, 50, 58; J. M. Zengel and L. Lindahl, unpublished). Data comparing the S10 leader sequence from E. coli with other enterobacteria are also summarized (60).
Finally, the leader downstream from the attenuation site is essential only for translation control (50, 58; J. M. Zengel and L. Lindahl, unpublished). This division into functional domains is consistent with our phylogenetic studies of the S10 leader sequences of the closely related enterobacteria Salmonella typhimurium, Citrobacter freundii, Morganella morganii, and Yersinia enterocolitica (Fig. 3). For example, the region upstream from hairpin HD, which is dispensable for both levels of L4 control, has as little as 48% sequence identity between E. coli and the other enterobacteria, whereas the region from HE to the initiation codon of the S10 gene has greater than 80% identity (60). Interestingly, sequences in the HA, HB, and HC regions of M. morganii and Y. enterocolitica, which show the least sequence identity to E. coli, can still be organized (at least theoretically) into three hairpins, but the lengths and the number of bulges and internal loops on the stems vary considerably from the E. coli leader. The HD hairpin structure, involved in L4-mediated transcription control in E. coli, is conserved in the other four species, with several compensatory changes in the stem and variations in the sequence, but not the length, of the loop. Similarly, base changes in hairpin HE are all in loops or bulges, or are nondisruptive (e.g., G-C → G·U). These phylogenetic studies are consistent with our conclusions from genetic studies that the leader sequence and/or secondary structure from HD to the beginning of the S10 gene are critical for the L4-mediated regulation. In fact, these “foreign” S10 leaders, when placed upstream from the E. coli S10 gene, display L4-mediated regulation (60).
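The phylogenetic argument rests on classifying each base change in a helix as disruptive or not. A minimal sketch of that bookkeeping follows, assuming only the standard RNA pairing rules (Watson-Crick pairs plus the G·U wobble); the function name and the example pairs are hypothetical and are not taken from the actual S10 leader sequences.

```python
# Pairs tolerated in an RNA helix: Watson-Crick pairs plus the G.U wobble.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def classify_stem_change(ref_pair, variant_pair):
    """Classify a base pair in one species relative to the reference pair."""
    if variant_pair == ref_pair:
        return "identical"
    if variant_pair not in ALLOWED_PAIRS:
        return "disruptive"  # the variant can no longer pair
    n_changed = sum(r != v for r, v in zip(ref_pair, variant_pair))
    # Both partners changed but pairing kept: a compensatory (covarying) change.
    return "compensatory" if n_changed == 2 else "nondisruptive"


if __name__ == "__main__":
    print(classify_stem_change(("G", "C"), ("G", "U")))  # nondisruptive
    print(classify_stem_change(("G", "C"), ("A", "U")))  # compensatory
    print(classify_stem_change(("G", "C"), ("A", "C")))  # disruptive
```

Applied pair by pair to an aligned stem, a conserved helix such as HD would be expected to yield compensatory calls, and HE only nondisruptive ones such as G-C → G·U.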
3. BINDING OF L4 TO RNA
As discussed in Section II,B, several regulatory r-proteins have been shown to bind directly to their cognate mRNA. However, attempts to show binding of L4 to the S10 leader have failed (P. Shen, J. M. Zengel, and L. Lindahl, unpublished). One possible explanation is that the L4 binding site includes protein determinants contributed by components of the transcription and/or translation system. Another possibility is that the leader RNA undergoes structural changes during its synthesis. L4 might, for example, bind to a region that is only transiently single-stranded in nascent RNA, but that becomes double-stranded once transcription of the leader region is completed (Fig. 4; see also Section III,A,5). The RNA we have used for binding studies is full-length leader RNA that may have assumed the “wrong” conformation for L4 binding. In any case, the L4 target is likely to include the S10 leader RNA, because it is otherwise difficult to explain how L4 specifically regulates the S10 operon. If a single target for L4 is involved in both transcription and translation regulation, then it must be within hairpin HE, because this is the only part of the leader required for both types of regulation (see Section III,A,2 and Fig. 3). An attractive, but untested, possibility is that L4 binds around base 90 of the leader (Fig. 4). This region is likely to be single-stranded during the transcription pause observed at the attenuator (see Section III,A,4). Furthermore, binding of L4 to this region might also affect the equilibrium between the two isomers of the mRNA hypothesized to be involved in translation control (Section III,A,5). Finally, this region would be sequestered in a double-stranded structure in full-length mRNA, which would explain why our attempts to bind L4 to the leader transcript have failed.

[Figure 4: proposed conformations of the S10 leader from base 85 (hairpins HE, HE', HF, and HG), marking the possible L4 binding site and the ribosome entry site in the translatable and untranslatable forms; inset, the L4 binding region of 23-S rRNA.]

FIG. 4. Model for L4 binding to the S10 leader and L4-mediated inhibition of translation. Only the relevant region of the S10 leader is shown, beginning at base 85 from the start of transcription. According to this model, when RNA polymerase pauses during transcription of the S10 leader, the L4 binding site is accessible. Once bound to the mRNA, L4 prevents extension of the HE hairpin, thereby favoring formation of the HF hairpin that sequesters the ribosome entry site. If L4 does not bind while the binding site is single-stranded, the target becomes inaccessible. The inset shows the L4 binding region of 23-S rRNA (36). The site to which L4 has been cross-linked to 23-S rRNA in 50-S subunits (62) is indicated.

The generic model for autogenous control of r-protein synthesis is based on the idea of structurally related targets on rRNA and mRNA for binding of the regulatory r-protein. We have recently mapped the L4 binding site on 23-S rRNA to a region about 300 bases from the 5' end, i.e., within domain I, by showing that specific fragments of 23-S rRNA added to in vitro transcription reactions can bind L4 and eliminate the regulatory protein's stimulation of attenuation (36, 61). This region of the 23-S rRNA has no striking similarity with the primary or secondary structure of the S10 leader (Fig. 4), although analysis of tertiary interactions in each of the two RNA molecules may reveal similarities not yet obvious. In intact 50-S ribosomes, L4 has been cross-linked to bases in both domain I (62) and domain II (63) of 23-S rRNA. Since the domain-II site was found much earlier than the domain-I site, it was long believed that the L4 binding site was within domain II. This hypothesis was appealing because of scattered primary sequence identities between the S10 leader in the attenuator region and the region of domain II in 23-S rRNA that includes the L4 cross-linking site (64). However, subsequent genetic studies showed that the bases of the S10 leader involved could be mutated or deleted and still leave intact one or both forms of L4-mediated autogenous control (49, 50, 58). Thus, it now seems likely that the similarity between the S10 leader and domain II of 23-S rRNA is irrelevant for the autogenous control.
4. MECHANISM OF L4-MEDIATED TRANSCRIPTION REGULATION
In vitro transcription studies have shown that L4 stimulates termination of transcription only in the presence of the protein NusA (58, 61), a transcription factor that generally increases the ability of RNA polymerase to correctly “interpret” termination and antitermination signals (65). A possible pathway for transcription control, based on these in vitro experiments, is summarized in Fig. 5. The first step is a NusA-dependent pause by RNA polymerase at the site where transcription termination will take place (61, 66). L4 is not required for pausing, but strongly prolongs the pause, presumably accounting for its stimulation of termination. Interestingly, L4 can stabilize the paused complex even if the r-protein is added to the reaction after the NusA-modified RNA polymerase has already reached the pause site (66).
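One common way to see why prolonging a pause can increase termination is a kinetic-competition sketch: a paused complex leaves the pause either by terminating or by escaping into elongation, and slowing escape raises the share that terminates. This is an illustrative assumption, not a mechanism established by the studies cited; the rate constants and function names below are hypothetical.

```python
def termination_probability(k_term, k_escape):
    """Probability that a paused complex terminates rather than resumes elongation.

    Assumes two competing first-order exits from the pause (hypothetical rates).
    """
    return k_term / (k_term + k_escape)


def mean_pause_lifetime(k_term, k_escape):
    """Mean dwell time in the paused state for two competing first-order exits."""
    return 1.0 / (k_term + k_escape)


if __name__ == "__main__":
    # Hypothetical: L4 is assumed to slow escape from the pause tenfold.
    for label, k_esc in (("NusA only", 9.0), ("NusA + L4", 0.9)):
        print(label, termination_probability(1.0, k_esc), mean_pause_lifetime(1.0, k_esc))
```

With these made-up rates the pause lasts about five times longer and the termination probability rises from 0.1 to about 0.53, showing how an effect on pause duration alone could translate into stimulated termination.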
[Figure 5: pathway from the elongating complex through the NusA-dependent paused transcription complex at the pause site in the S10 leader, leading either to continued elongation into the S10 gene or, with ribosomal protein L4, to a pre-termination complex and premature termination.]
FIG. 5. Model for L4-mediated termination of transcription. In the presence of NusA, most of the RNA polymerases pause briefly at the attenuation site before continuing elongation through the S10 and downstream genes (66). The addition of r-protein L4 stabilizes the pause, thereby promoting premature termination of transcription (66). In the absence of NusA, RNA polymerase ignores the pause signal (not shown).
Leader deletion studies showed that the leader region containing only the HE hairpin is sufficient for the NusA-dependent pause, but the upstream HD hairpin is essential for L4 stabilization of the paused complex (J. M. Zengel and L. Lindahl, unpublished). Analysis of base substitution mutations altering the upper stem-loop region of the HE hairpin suggests that the G-C-rich stem is part of the pause signal. The role of hairpin HD in the effect of L4 is not clear. It might represent the L4 binding site on the leader RNA. However, because the HD hairpin is dispensable for L4 inhibition of translation (Fig. 3), this would imply that translation control involves a separate L4 binding site. In any case, further studies are needed to pinpoint the L4 target (see also Section III,A,3) and to identify how a ribosomal protein influences RNA polymerase.
5. MECHANISM OF L4-MEDIATED REGULATION OF TRANSLATION
The mechanism for L4-mediated translation regulation has not yet been analyzed as rigorously as the transcription attenuation mechanism. A working model has been proposed (59), based on structure-probing data suggesting that the leader exists in at least two conformations differing with respect to the length of the bottom part of HE and the formation of a hairpin (HF) between HE and HG. Surprisingly, the Shine-Dalgarno region is sequestered in both forms (59). This presumably prevents ribosomes from binding directly to the ribosome binding site of the S10 gene. We have hypothesized that a ribosome gains access to the translation initiation site of the S10 gene by first binding to the single-stranded region between HE and HG in the structure shown in Fig. 4 (59). Once associated with this “ribosome entry site,” the ribosome may compete with the distal sequence of HG for the Shine-Dalgarno region, or possibly even destabilize the HG hairpin by diffusing along the RNA molecule. Thus, we propose that the S10 mRNA is translatable when the ribosome entry site is in single-stranded form, but untranslatable when the site is sequestered in the HF hairpin. Our working model for translation repression of the S10 gene is that r-protein L4 induces a shift in the equilibrium between the translatable and untranslatable forms (Fig. 4). Consistent with this proposal, an 8-nt deletion removing much of the putative ribosome entry site results in a 90% reduction of the translation efficiency and eliminates translation control by L4 (J. M. Zengel and L. Lindahl, unpublished).

Although this model is appealing in its relative simplicity, the process for regulating translation of the S10 gene is probably more complex. For example, this model cannot account for our observation that translation control is affected by several mutations in the upper stem-loop of hairpin HE, well upstream from the ribosome entry site and the Shine-Dalgarno sequence (49).
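The core of the working model, an L4-induced shift between translatable and untranslatable forms, can be written as a two-state equilibrium, bearing in mind the caveat that the real process is probably more complex. In this sketch `k_eq` is the hypothetical ratio [untranslatable]/[translatable], translation is assumed proportional to the translatable fraction, and L4 binding is assumed to multiply `k_eq` by a factor `l4_shift`; none of these values has been measured.

```python
def fraction_translatable(k_eq):
    """Fraction of leader molecules in the translatable (entry site open) form.

    k_eq = [untranslatable, HF formed] / [translatable, entry site single-stranded].
    """
    return 1.0 / (1.0 + k_eq)


def relative_translation_with_l4(k_eq, l4_shift):
    """Translation with L4 relative to without, if L4 multiplies k_eq by l4_shift."""
    return fraction_translatable(k_eq * l4_shift) / fraction_translatable(k_eq)


if __name__ == "__main__":
    # Hypothetical numbers: equal populations without L4, a ninefold shift with L4.
    print(relative_translation_with_l4(1.0, 9.0))
```

In this picture, a deletion that removes the entry site (as with the 8-nt deletion described above) lowers translation on its own and leaves no equilibrium for L4 to shift, which is consistent with the loss of L4 control in that mutant.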
It is interesting to note that the in vivo translation control of these hairpin mutations more or less correlates with the effect on NusA-dependent pausing observed in the in vitro transcription system (our unpublished experiments), suggesting that L4 may inhibit translation of the S10 gene only if it binds to the mRNA during a pause in transcription. For example, the L4 target may involve a region of the S10 leader on the ascending side of the HE hairpin that is still single-stranded while RNA polymerase is paused. Once bound to this region, the L4 protein might prevent the lower stem of HE from forming on completion of the leader synthesis and thereby favor formation of hairpin HF, i.e., the untranslatable form (Fig. 4).

Whatever the precise molecular mechanism by which L4 inhibits translation of the S10 gene, the effect is probably propagated down the multicistronic transcription unit by translational coupling. We have analyzed translational coupling between the first two genes of the S10 operon by constructing a gene fusion between the second gene of the operon (encoding L3) and lacZ, and monitoring how mutations that decrease or eliminate translation of the upstream S10 gene affect expression of the downstream fusion gene (67). Such mutations severely reduced the expression of the downstream L3'/lacZ' gene, but the reduction was partially due to premature transcription termination (polarity), as is often observed in E. coli when mRNA translation is blocked (68). This problem was circumvented by replacing the S10 promoter with a promoter for bacteriophage T7 RNA polymerase. Transcription by this enzyme proceeds even in the absence of translation (67, 69, 70), making it possible to measure the translation of the L3'/lacZ' fusion gene in the presence of mutations blocking S10 translation. The results showed that complete elimination of the translation of the S10 gene reduces the translation efficiency of the L3'/lacZ' gene by about 80% (67). The incomplete coupling of the translation of the S10 and L3 genes suggests that it may be difficult to propagate translation regulation through a long operon exclusively by translational coupling: ribosomes entering at an intercistronic junction because of incomplete coupling would promote translation of genes still further downstream. Perhaps this is the reason that the S10 operon is also regulated autogenously at the transcriptional level.
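The difficulty of propagating repression through a long operon by coupling alone can be illustrated with a simple linear model. It assumes, hypothetically, that every intercistronic junction behaves like the S10-L3 junction: a fraction `coupling` (0.8, matching the roughly 80% loss of L3'/lacZ' translation when S10 translation is abolished) of each gene's translation depends on translation of the gene just upstream, and the remainder comes from independent initiation. The function name and the starting level are illustrative only.

```python
def coupled_expression(first_gene_level, n_genes, coupling=0.8):
    """Relative translation (1.0 = derepressed) of successive genes in an operon.

    Each downstream gene receives (1 - coupling) from independent initiation
    plus coupling times the translation level of the gene just upstream.
    """
    levels = [first_gene_level]
    for _ in range(n_genes - 1):
        levels.append((1.0 - coupling) + coupling * levels[-1])
    return levels


if __name__ == "__main__":
    # Hypothetical: L4 represses the first gene to 20% of normal in an 11-gene operon.
    for gene_index, level in enumerate(coupled_expression(0.2, 11), start=1):
        print(gene_index, round(level, 3))
```

Under these assumptions repression decays geometrically along the operon (gene n is at 1 − 0.8^n when the first gene is at 0.2), so the eleventh gene would be expressed at about 91% of normal. This is consistent with the suggestion that transcriptional control supplements translational coupling in the S10 operon.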
B. Alpha (S4) Operon

1. TRANSLATION REGULATION BY S4
The alpha operon of E. coli contains genes for four ribosomal proteins and for the alpha subunit of RNA polymerase. Gene dosage experiments (53), in vitro translation experiments (27), and experiments with induced oversynthesis of specific r-proteins (29, 71, 72) demonstrated that the r-protein genes in the alpha operon are translationally repressed by r-protein S4, the product of the third gene in the operon. Remarkably, translation of the alpha subunit gene is only weakly inhibited by S4, even though this gene is flanked by r-protein genes that are strongly repressed by the r-protein (72). Regulation of all the r-protein genes, including the L17 gene located distal to the alpha subunit gene, is abolished by deletions in the leader of the operon (72). Thus, it appears that S4 has a single regulatory target on the alpha operon mRNA, and that the L17 gene is translationally coupled to the three upstream r-protein genes, even though the unregulated alpha gene lies between them. No mechanism for this unusual coupling has yet been determined.
2. BINDING OF S4 TO RNA

Binding of S4 in vitro to the alpha mRNA is well characterized. The binding site encompasses the leader and the proximal 40 bases of the first gene, which encodes S13 (32, 73, 74). Both S4 binding to alpha mRNA and S4-mediated repression of translation require sequences extending outside the region containing the traditional features of a ribosome binding site (72, 74), suggesting that S4-mediated repression of translation is not accomplished by simple competition between S4 and initiating ribosomes.
The region of the mRNA required for S4 binding and translation regulation forms two intertwined pseudoknots made up of four helices and six single-stranded regions (Fig. 6) (74, 75). The S4 binding region on 16-S rRNA encompasses an extensive sequence from the 5' end of the RNA molecule (Fig. 6) (76-79). The S4 binding domains on 16-S rRNA and on alpha mRNA have no obvious common sequence or structure, making the alpha operon one of several r-protein operons that may not follow the paradigm of similar regulatory protein binding sites on mRNA and rRNA. However, it has been difficult to determine which part of the S4 binding region on the 16-S rRNA is in actual contact with S4 and which part is needed simply to maintain the correct constellation of the contact regions (78). Also, a similarity in the three-dimensional pattern of the protein's contact points on the two RNA molecules could be missed if these points were widely separated in the two-dimensional models.

FIG. 6. Targets for r-protein S4 on mRNA and 16-S rRNA. The structure of the S4 target on alpha mRNA was determined by Draper and co-workers (74, 75, 80). The Shine-Dalgarno sequence and GUG initiation codon for the S13 gene are indicated by boxes. A schematic diagram of the secondary structure of the region of 16-S rRNA implicated in S4 binding is also shown (76-78). Helices shown by genetic analysis to be especially critical for S4 binding are shown as solid structures (78). The hatched region represents a domain apparently protected by bound S4 in footprinting experiments (77) that is not essential for binding as measured by other techniques (76, 78).
3. ALLOSTERIC CHANGES IN ALPHA mRNA

The mechanism for S4-mediated translation repression appears to involve an allosteric change in the mRNA induced by S4 binding (80), rather than a simple competition between S4 and initiating ribosomes for the same space on the message. This conclusion is based on the study of point mutations designed to probe the influence of each helix in the two intertwined pseudoknots on S4::mRNA binding and on translation repression. Several mutations abolished translation repression yet allowed normal S4 binding, indicating that binding of the regulatory protein per se does not repress translation (80). Rather, it appears that S4 binding induces a conformational change in the wild-type mRNA that in turn prevents translation initiation. The existence of different conformers of the alpha mRNA is supported by the observation of a hyperchromic shift at about 35°C, a temperature well below the general melting temperature for the mRNA secondary structure (81). The importance of this conformational transition was recently established by studies of ribosome::alpha-mRNA complexes at different temperatures. Two types of such complexes have been identified by “toeprint” experiments [mapping of mRNA-bound ribosomes by using the ribosomes as roadblocks to the progression of reverse transcriptase in primer extension (81)]. One type of complex is the conventional ternary complex, consisting of a ribosome and initiator tRNA interacting with the Shine-Dalgarno region and the initiation codon of the mRNA. The other is a binary complex consisting of only a ribosome and an mRNA molecule. Kinetic experiments suggest that conventional ternary complexes form most readily at temperatures above the conformational transition temperature, whereas binary complexes form preferentially below the transition temperature (82). Furthermore, conversion between the two types of complexes is slow.
The key to understanding translation repression seems to be that the binary complex binds S4, which prevents its conversion into a conventional translation initiation complex. Thus, it appears that the alpha mRNA is trapped in a complex with ribosomes and S4 and is not available for normal translation initiation (82). These experiments illustrate that S4 regulates translation by mRNA entrapment and not by competition with ribosomes for a common binding site on the mRNA. It will be interesting to learn whether translation initiation factors (which so far have been omitted from the in vitro experiments) play a role in this novel regulatory mechanism.
C. S15 Operon

1. GENETIC ORGANIZATION AND REGULATION

The S15 gene is adjacent to, and cotranscribed with, the pnp gene encoding polynucleotide phosphorylase. About 50% of the RNA polymerases initiating at the P1 promoter upstream from the S15 gene (Fig. 1) terminate at the t1 terminator between the S15 and pnp genes; the remainder continue through pnp (83). pnp is also transcribed from the weaker P2 promoter located between the two genes of this complex transcription unit (18). Like most other r-protein genes, the S15 gene is autogenously regulated at the level of translation (84, 85). The region of the mRNA required for S15 autoregulation overlaps the translation initiation site, a somewhat surprising finding because homology between a sequence at the 3' end of the S15 gene and a sequence in the S15 binding site on the 16-S rRNA had been noted previously (86). Apparently this homology was fortuitous, similar to the homology between the S10 leader and domain II of the 23-S rRNA (see Section III,A,2). Chemical and enzymatic structure probing of the regulatory site (34) showed that the S15 mRNA exists in two forms that appear to be in equilibrium with each other (Fig. 7). One conformation contains two hairpins; in the other, the downstream (relatively unstable) hairpin is resolved and part of the RNA instead pairs with the loop of the upstream hairpin to form a pseudoknot. Footprinting experiments have demonstrated that S15 binding to the RNA stabilizes the pseudoknot conformation (34). Neither conformation of the mRNA shows obvious homology with the S15 binding site on 16-S rRNA (87-89) (see Fig. 7). However, there may be some similarity between the tertiary structures, because both contain helices associated with unpaired adenines, either as a bulging base in 16-S rRNA or as a pair of bases spanning the groove in the pseudoknot of the mRNA (34).

2. MECHANISM OF REGULATION OF S15 SYNTHESIS
Recent analysis of S15-mediated translation repression has revealed a novel mechanism (90). The 30-S subunit apparently binds to the mRNA in its pseudoknot conformation to form a preinitiation complex that generates a toeprint at the promoter-distal border of the pseudoknot, about six bases upstream from the normal toeprint. This preinitiation complex can be converted into a traditional translation initiation complex, presumably with simultaneous unwinding of the pseudoknot, allowing the Shine-Dalgarno sequence of the mRNA to base-pair with the 16-S rRNA. These studies also provide important evidence that S15 can bind not only to naked mRNA, but also to the preinitiation complex (i.e., the binary mRNA::30-S complex). Binding of S15 to the binary complex prevents the transition to the canonical ternary initiation complex (90). It is important to note that the regulatory protein S15 and the 30-S subunit can bind simultaneously to the mRNA and thus do not compete for the same or overlapping binding sites on the mRNA. Thus, the S15 gene is regulated by an “entrapment” mechanism similar to that described for the alpha operon: the regulatory protein stabilizes an unproductive ribosome::mRNA complex, making the mRNA unavailable for translation initiation.

FIG. 7. Targets for S15 on mRNA and 16-S rRNA. The structures of the S15 mRNA were proposed by Philippe et al. (34). The Shine-Dalgarno sequence and AUG initiation codon for the S15 gene are indicated by open boxes. S15 is thought to bind to the pseudoknot form (34). The region of 16-S rRNA containing the S15 binding site (87-89) is shown on the right. The shaded regions indicate possible structural homologies between the pseudoknot form of S15 mRNA and the 16-S rRNA (34).
3. PROCESSING OF S15 OPERON mRNA
Genetic evidence suggests that expression of the S15 and pnp genes may also be regulated by RNA processing. RNA transcribed from the two genes is cleaved by two nucleases, RNase III and RNase E. RNase III cleaves at a site downstream from the P2 promoter, resulting in marked destabilization of the pnp transcripts. Consistent with this, polynucleotide phosphorylase is overproduced in RNase III-defective mutants (83, 91, 92). Experiments with an RNase E mutant suggest that this nuclease cleaves at two sites flanking the t1 terminator; this cleavage destabilizes the S15 transcript (93). However, the effects of RNase III and RNase E processing on translation regulation of S15 are not clear.
D. spc Operon

1. REGULATION OF TRANSLATION
The spc operon contains 11 r-protein genes and secY, a gene coding for a protein involved in protein secretion (Fig. 1). Early experiments in vitro and in vivo indicated that r-protein S8 regulates translation of eight contiguous genes within the operon, beginning at the third gene, which encodes L5; the two proximal genes, encoding L14 and L24, were reportedly not under S8 control (27, 94). Recently, a more detailed analysis of the regulation of the spc operon indicated that overproduction of the regulatory protein S8 does temporarily repress the expression of L14 and L24, but within a few minutes this regulation fades, apparently overridden by other mechanisms (54). (No studies have directly addressed the regulation of the two most distal genes of the operon, secY and rpmJ.) The S8-mediated regulation originates from a site located between the second gene, coding for L24, and the third gene, coding for L5 (95). Genes distal to this regulatory site are repressed by S8 at the level of translation (27, 95), whereas the regulation of the two upstream genes appears to be due to “retroregulation” (54; see Section III,D,3).

2. BINDING OF S8 TO RNA

The S8 binding site on spc mRNA (Fig. 8) was identified by isolating RNA fragments protected from nuclease digestion by r-protein S8 and rebinding the protein to the purified RNA fragments (33). Interestingly, rebinding requires two noncontiguous but complementary fragments of the RNA, suggesting that S8 binds to a helical segment of the mRNA (33) (Fig. 8). This position agrees well with the locations, in the region around the start of the L5 gene, of mutations that abolish S8-dependent repression of translation (95). Secondary structure probing and phylogenetic studies (95) indicate that the structure of the S8 regulatory target on the mRNA is similar to the structure of its binding site on 16-S rRNA (Fig. 8), a homology that was also proposed after visual inspection of the sequence of the mRNA region protected from nuclease by S8 (33). The S8 regulatory protein binds to the spc mRNA with an association constant about one-fifth of that for S8 binding to 16-S rRNA (33; see also Section II,B). The S8 binding site on 16-S rRNA is among the best-studied r-protein binding sites on rRNA (see, e.g., 33, 88, 96, 97). It is a much more compact site than, for example, the S4 binding site (see Section III,B): binding of S8 requires only a phylogenetically conserved helix with a bulge region (33, 97).
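One way to read the five-fold difference in association constants is through simple equilibrium occupancy. The sketch below assumes a single independent 1:1 site on each RNA and a fixed free S8 concentration; it ignores competition between the two RNAs for a limited S8 pool and actual in vivo concentrations, none of which are given here. The constants are in arbitrary units, and only their ratio comes from the text; the function and variable names are illustrative.

```python
def fraction_bound(k_assoc: float, free_protein: float) -> float:
    """Equilibrium occupancy of a single site for simple 1:1 binding:
    theta = K[P] / (1 + K[P]), where [P] is the free protein concentration."""
    x = k_assoc * free_protein
    return x / (1.0 + x)


K_RRNA = 1.0          # arbitrary units; only the ratio of constants matters here
K_MRNA = K_RRNA / 5   # spc mRNA: about one-fifth of the 16-S rRNA constant (33)

for free_s8 in (0.1, 1.0, 5.0, 20.0):  # free S8, in units of 1/K_RRNA
    print(f"free S8 = {free_s8:>4}: rRNA site {fraction_bound(K_RRNA, free_s8):.2f}, "
          f"mRNA site {fraction_bound(K_MRNA, free_s8):.2f}")
```

In this simplified picture, the mRNA site reaches half occupancy only at about five times the free S8 concentration that half-saturates the rRNA site, in line with the idea that rRNA sites are filled first and the mRNA target is engaged mainly when S8 is in excess.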
FIG. 8. Targets for S8 on spc mRNA and 16-S rRNA. Model 1 shows the secondary structure of the S8 target on spc mRNA proposed by Cerretti et al. (95) and the S8 target region on 16-S rRNA proposed by Mougel et al. (97). Model 2 shows a slightly different structure for the mRNA and 16-S rRNA proposed by Gregory et al. (33). The AUG initiation codon and Shine-Dalgarno sequence of the L5 gene are indicated by open boxes. The regions of possible homology between mRNA and 16-S rRNA are indicated by shading.
Studies of S8 binding to mutant variants of 16-S rRNA indicate that the features required for binding to the 16-S rRNA are similar to the structural features required in the spc mRNA for S8-dependent repression of translation (33, 95). The interaction of S8 with the spc operon mRNA at a 16-S rRNA-like target thus fits the generic model for autogenous regulation (30, 98). Ironically, this model originated from spc mRNA secondary structures derived by computer analysis of limited segments of the RNA (98), later shown by experimental testing to be incorrect (33, 95); nevertheless, the principle of the original model was correct.
3. MECHANISM OF TRANSLATION REPRESSION

The binding of S8 to spc mRNA is believed to inhibit directly the translation of the L5 gene. The molecular basis for this repression is not yet clear, but the proximity of the S8 binding site to the L5 initiation codon (Fig. 8) suggests that binding of S8 to the helix that includes the initiation codon may stabilize this structure, thereby either directly preventing binding of ribosomes to the mRNA or preventing progression from a preinitiation complex to an initiation-competent complex. In this connection, it should be noted that the current secondary structure model for this region of the spc message predicts that the Shine-Dalgarno sequence for the L5 gene is not base-paired (Fig. 8) and hence may be available even if S8 is bound. The seven r-protein genes downstream from L5 (i.e., S14 through L15) are not regulated directly by S8 binding to the spc mRNA, but rather by an indirect effect of the r-protein's inhibition of L5 translation (94, 99). This effect is not due to transcription regulation, because measurements of mRNA synthesis during the autogenous response to S8 oversynthesis show that excess S8 has no effect on the transcription rate (99; J. M. Zengel and L. Lindahl, unpublished). Rather, translation of the downstream genes appears to be coupled to (i.e., dependent on) translation of the upstream L5 gene. When the initiation codon of the L5 gene is mutated to a termination codon, translation of the downstream genes is decreased (99). Northern analysis of accumulated mRNA showed that oversynthesis of S8 also results in site-specific processing and increased turnover of the spc mRNA, presumably due to an indirect effect of the reduced translation (99). The increased rate of mRNA degradation may help to propagate the autogenous regulation through the long spc operon (99), because translational coupling may not be absolute. Thus, the increased mRNA turnover rate could serve the same function as transcription regulation (attenuation) in the S10 operon (Section III,A). The L14 and L24 genes are upstream from the target for S8, and their translation is not directly inhibited by S8 in vitro (27).
Nevertheless, the synthesis of L14 and L24 is temporarily repressed in vivo when S8 is oversynthesized (54). Because excess S8 leads to fragmentation of the spc mRNA, and because one of these fragments carries the L14 and L24 gene sequences, it was proposed that L14 and L24 synthesis is controlled by retroregulation (54). That is, S8-mediated inhibition of translation triggers nucleolytic cleavage of the spc mRNA near the S8 binding site, followed by degradation of the resulting L14-L24 mRNA fragment. Consistent with this proposal, the repression of L14 and L24 synthesis is diminished in a mutant defective for polynucleotide phosphorylase and RNase II (54), two 3’-to-5’ exonucleases that have been implicated in mRNA degradation.
E. rif Region

1. GENETIC AND REGULATORY ORGANIZATION
The rifampicin (rif) region contains the genes for the four r-proteins L11, L1, L10, and L12, as well as for the RNA polymerase subunits β and β′ (Fig. 1). Two major promoters drive transcription of these genes: PL11 at the beginning of the gene cluster and PL10 between the second and third genes (15, 16, 100, 101). There is little or no termination of transcription after the second gene, so that most or all polymerases continue to the end of the cluster (15-17). Because the genes for L10 and L12 are transcribed from both promoters, they are transcribed at a higher rate than the upstream L11 and L1 genes (16). Transcription is partially terminated by an attenuator between the L12 and β genes, such that the β and β′ genes are transcribed only about one-fifth as frequently as the r-protein genes (15, 16, 102, 103). Finally, all transcripts are terminated at the end of the cluster (102). Further complexity is added to the expression of this region by a site for RNase III cleavage immediately upstream from the attenuator between the L12 and β genes (Fig. 1), although processing at this site has no clear effect on gene expression (103, 104). Even though the four r-protein genes in the rif region are partially cotranscribed, they are regulated by two separate autoregulatory translation repression circuits. The two proximal genes are regulated by r-protein L1 (27, 29), whereas the two distal r-protein genes are regulated by L10, probably in a complex with L12 (105-108). Expression of the β and β′ genes is not affected by the two regulatory r-proteins. Rather, these two genes are autogenously regulated by both transcriptional and translational processes in response to RNA polymerase activity. (For recent discussions of the regulation of β and β′, see 109-112.)
2. REGULATION OF THE L11 OPERON
The two-gene L11 operon (Fig. 1) is regulated at the translational level by the product of the downstream gene, L1 (27, 29). Genetic studies showed that the target for L1 control is upstream from the L11 gene (113-115). Enzymatic structure probing (116) and phylogenetic studies (117-119) identified a structure in the L11 leader RNA homologous to the site of L1 binding in 23-S rRNA. This region contains two short helices separated by an internal loop (Fig. 9). Although binding of L1 to this region of the leader has not been directly demonstrated, the correspondence between the 23-S binding site and the presumed binding site on the mRNA was confirmed by mutational studies. Base changes in the presumed target region of the leader that reduced or eliminated sensitivity to L1 repression (115) were compared with analogous changes introduced into the L1 binding site on a fragment of 23-S rRNA. Specific base changes had very similar effects on the affinity of L1 for 23-S rRNA and on L1-mediated repression of translation (120). Thus, similar features on both RNAs are recognized by the regulatory protein.

Two observations demonstrate that translation regulation of the downstream L1 gene is accomplished by translational coupling to the upstream L11 gene. First, deletion of the translation initiation region for the L11 gene blocks translation of both L11 and L1 (113). Second, mutations in the leader that eliminate L1-mediated control of L11 also relieve translational repression of L1 (114). Efficient translation of the L1 gene requires ribosomes to traverse almost to the end of the upstream L11 gene; in the absence of L11 translation, translation of the L1 gene is inhibited by the RNA sequence in the distal part of the L11 gene (121). The mechanism of this inhibition is not clear. One possibility is that the inhibitory sequence contributes to a secondary structure that sequesters the L1 ribosome binding site. However, not all point mutations resulting in partial relief of L1 repression are consistent with this model (121). The L1 gene can be activated by deleting most of the L11 gene, showing that, in the absence of L11 translation, the ribosome binding site is intrinsically capable of accepting ribosomes from the pool of free ribosomes. Nevertheless, in vivo translation of the L1 gene in the deletion mutants is less efficient than translation of the gene from the wild-type L11/L1 message. This suggests that the L1 translation initiation site works better if ribosomes already associated with the mRNA are “delivered” to the L1 binding site from the upstream L11 gene (121). Translation of the L1 gene in the absence of L11 translation can also be improved by substituting adenines for guanines in the region immediately upstream of the Shine-Dalgarno sequence of the L1 gene (121). A similar improvement in translation efficiency has been observed for the f1 phage gene VII, which is also translationally coupled to its upstream gene (122). In the latter system, it was suggested that ribosomes interact with the mRNA upstream from the Shine-Dalgarno region during formation of the translation initiation complex, and that guanines in this upstream region reduce the ability of the mRNA to associate with ribosomes (122).

FIG. 9. Targets for L1 on L11 mRNA and 23-S rRNA. The secondary structure of the L11 mRNA is based on enzymatic structure probing (116) and phylogenetic studies (121). The structure of the L1 target on 23-S rRNA is based on the structure of Branlant et al. (119). The Shine-Dalgarno sequence and AUG initiation codon for the L11 gene are indicated by open boxes. The regions of similarity between mRNA and 23-S rRNA (115, 120) are indicated by shading.
3. REGULATION OF THE L10 OPERON
L10 (or the “L8” complex, containing one copy of L10 and four copies of L12) represses translation of the L10 and L12 genes both in vivo and in vitro (105-108). The L10-(L12)4 complex binds specifically to the L10 leader (31). The secondary structure of the L10 leader has been determined by chemical structure probing both in vitro and in vivo (123), and the target for the L10 repressor complex was mapped by genetic and mRNA protection experiments (123-126). The L10-(L12)4 target on 23-S rRNA has been mapped by chemical as well as phylogenetic studies (127) to a region adjacent to the L11 binding site (Fig. 10). No experiments have directly addressed the question of similarities between the mRNA and 23-S rRNA targets, but one possible region of similarity is shown in Fig. 10. The target for the translational repressor on the L10 leader is unusual among translational repressor sites in that it is located 120-160 bases upstream from the initiation codon of the L10 gene. To explain this long-distance interaction, it was proposed (126) that binding of L10-(L12)4 to the leader induces a change in secondary structure that sequesters the Shine-Dalgarno sequence of the L10 gene and hence inhibits translation. However, subsequent studies of the mRNA structure failed to confirm the predicted structural switch (123). Thus, the mechanism for translation repression of the L10 gene is still unknown. It is noteworthy that chemical modification of certain regions of the L10 leader is particularly sensitive to the presence of Mg2+, suggesting that these bases are involved in tertiary structure formation (123). Translation of the L12 gene is coupled to translation of the L10 gene (106, 128), even though the two genes are separated by a 66-base intercistronic region. Together with the translationally coupled S10 and L3 genes, which are separated by 32 bases (67), this demonstrates that genes do not have to be located within a few bases of each other in order to be translationally coupled. Recent studies show that the coupling of L12 and L10 depends on sequences not only in the intercistronic region but also in the proximal part of the L10 gene, suggesting that long-distance base-pairing is involved (128). Perhaps the entire mRNA is condensed when translation of the L10 gene is blocked, thereby permanently inactivating the mRNA.

FIG. 10. Targets for L10-(L12)4 on mRNA and 23-S rRNA. The secondary structure of L10 mRNA (123) in the region of the regulatory target for the L10-(L12)4 protein complex (31, 124, 126) is shown on the left. The Shine-Dalgarno sequence and AUG initiation codon for the L10 gene are indicated by open boxes. The secondary structure of the L10-(L12)4 binding site in the 23-S rRNA (127) is shown on the right. The shaded regions indicate possible regions of homology shared by the two targets.
F. S20 Operon

The S20 protein is encoded by a monocistronic operon transcribed from two tandem promoters (129). This r-protein regulates its own translation, as shown by in vitro translation experiments (130), in vivo gene dosage experiments (131), and S20 oversynthesis studies (132). It was originally thought that the S20 operon conforms to the paradigm of homologous target sites for the regulatory protein on mRNA and rRNA because of primary sequence similarities between the translation initiation region of the S20 leader and the 5' region of 16-S rRNA harboring the S20 binding site (131, 133). However, this similarity appears to be fortuitous, because mutations in this region of the leader reduce translation efficiency but have no specific effect on S20-mediated translation repression (132). Furthermore, subsequent analysis of the S20 binding site in 16-S rRNA revealed no S20 footprints in the regions homologous to the leader (134), and mutations in these regions also had no effect on S20 binding (135). Finally, it has not been possible to demonstrate binding of S20 to purified leader RNA (35). The S20 gene initiates with a UUG codon. A change to the traditional AUG initiation codon eliminates translational repression of the S20 gene. This observation, together with the inability to demonstrate S20 binding to its mRNA, prompted the suggestion that the target for S20 action in repressing its own synthesis may be the 30-S::S20-mRNA complex rather than the “naked” mRNA (132). An alternative model emerges from recent experiments suggesting that 30-S ribosomal subunits lacking S20 initiate protein synthesis relatively inefficiently on normal initiation codons (136). It is tempting to speculate that when S20 synthesis is limiting, S20-less 30-S subunits accumulate. These deficient subunits may initiate translation at the UUG initiation codon of the S20 gene more efficiently than complete subunits, thereby favoring the synthesis of S20. Because S20-less 30-S particles can be converted into normal 30-S subunits (136), repression of the S20 gene could be accomplished by addition of S20 to S20-less particles.
G. S1 Operon

S1 is the only r-protein encoded by this operon, but the gene may be partially cotranscribed with the downstream hip gene coding for host-integration protein (38) (Fig. 1). Upstream from the S1 gene, there is an open reading frame (ORF-25) coding for a 25-kDa protein of unknown function (137). Several promoters located upstream of and within ORF-25 contribute to transcription of the S1 gene (137). Gene dosage experiments (37) showed that the S1 gene is regulated at the posttranscriptional level. Subsequent experiments with S1 gene fusions and oversynthesis of S1 in trans, as well as with in vitro translation, demonstrated that S1 is the effector for this regulation (38). The mechanism for regulating S1 synthesis is of considerable interest, because S1 is not a primary rRNA binding protein like the other r-proteins that have been identified as autogenous repressors. S1 also differs from other r-proteins by rapidly cycling on and off the ribosome. The protein has a general affinity for RNA, especially oligo(U)-containing sequences (138). S1 also interacts with the 3' region of 16-S rRNA (139). However, proteolytic fragments of the protein, which have only weak affinity for RNA, still bind efficiently to 30-S ribosomes, suggesting that S1 incorporation into the 30-S subunit is due mainly to protein-protein interactions (140). Furthermore, fragments of r-protein S1 lacking the domain(s) for strong RNA binding still repress S1 synthesis (38). These results indicate that S1 may repress its own translation without binding directly to mRNA.
H. L20 Operon

The genes for r-proteins L35 and L20 form a complex transcription unit with the genes for translation initiation factor 3 (IF3) and threonyl-tRNA synthetase (Fig. 1). Four promoters have been identified, but three are internal, mapping within the threonyl-tRNA synthetase (thrS) and IF3 (infC) genes (21, 22). Thus, the thrS gene, which occupies the 5'-most position in the transcription unit, is transcribed from just one promoter, whereas the two r-protein genes at the 3' end are transcribed from all four promoters (Fig. 1). A terminator between infC and the r-protein genes stops about half the RNA polymerases (21, 22). The genes for the subunits of phenylalanyl-tRNA synthetase are immediately downstream from the thrS-infC-L35-L20 cluster, but these genes are transcribed from a separate promoter and are regulated independently by an attenuator (141-143). Even though the thrS, infC, L35, and L20 genes are partially cotranscribed, they are subject to three distinct circuits of autogenous regulation, all at the translational level. The thrS and infC products each control their own synthesis (144-146), whereas L20 represses its own gene and the gene for L35 (22, 147). The autorepression of thrS translation depends on the secondary structure of the thrS leader RNA, which mimics the structure of threonyl-tRNA (148). Binding of threonyl-tRNA synthetase to the leader RNA prevents the ribosome from forming a translation initiation complex, presumably because ribosomes and threonyl-tRNA synthetase cannot bind simultaneously (149). The phenotypes of mutations in the leader RNA and in threonyl-tRNA synthetase suggest that the same region of the protein interacts with both the tRNA and the mRNA leader (148, 150). Thus, regulation of the thrS gene follows the paradigm proposed for the translational repression of r-protein genes. The translational repression of the thrS gene has been the subject of a recent review (151).
The translational autorepression of the infC gene requires its unusual initiation codon, AUU: a mutation changing it to the standard initiation codon AUG results in loss of regulation (152). The translation initiation site of the infC gene also lacks a recognizable Shine-Dalgarno region. It was proposed that the regulation of translation of this gene may depend on an interaction between the region upstream of the AUU initiation codon and an internal portion of the 16-S rRNA (153), but this hypothesis has not yet been tested experimentally. IF3 functions in translation initiation by helping to ensure that stable ternary complexes containing 30-S subunits, mRNA, and tRNA are formed only at true initiation codons (154, 155). Given that IF3 regulation depends on the unusual initiation codon, it is tempting to speculate that the role of IF3 in preventing the formation of incorrect initiation complexes may also be involved in the control of its own synthesis. The details of the molecular mechanism for L20-mediated translation repression of the L35 and L20 genes are still lacking. However, a region internal to the infC gene is known to be essential (22), suggesting that translation repression depends on binding of L20 to a translated region of the mRNA that lies well upstream from the regulated cistrons. Recent experiments confirmed that L20 directly inhibits translation of the promoter-proximal L35 gene and, indirectly, by translational coupling, blocks translation of the downstream L20 gene (147). Genetic and RNA structure studies suggest that translation of the L35 gene prevents formation of a “coupling structure” that sequesters the ribosome initiation site for the L20 gene (147).
I. str Operon

The str operon contains the genes for r-proteins S12 and S7 and for elongation factors G and Tu [Fig. 1; an additional gene for EF-Tu is located elsewhere on the chromosome (9)]. In vitro experiments show that S7 inhibits expression of its own gene and the EF-G gene but has little or no effect on synthesis of S12 and EF-Tu (52). In vivo, however, S7 also represses synthesis of S12 (and possibly EF-Tu), although less strongly than it inhibits S7 and EF-G synthesis (M. Nomura, personal communication). As in the spc operon, the target for S7 is not upstream from the most proximal gene but maps between the genes for S12 and S7. Translation of the S7 gene is directly inhibited. The repression of S12 synthesis, by contrast, appears to be mediated by retroregulation, i.e., enhanced degradation of mRNA upstream from the S7 target site (M. Nomura, personal communication).

Experiments used a construct consisting of the S12 gene followed by an S7'-'lacZ fusion. In this construct, mutations in the S12 gene that abolish its translation reduce synthesis of the fusion protein by about 90% (M. Nomura, personal communication). These results indicate that translation of the S7 gene is coupled to translation of the upstream S12 gene. The coupling is incomplete, however, similar to what was found for the S10 and L3 genes (see Section III,A,5). When S7 is oversynthesized in trans, expression of the S7'-'lacZ fusion gene is inhibited if there is an intact S12 gene upstream. If translation of the S12 gene has been genetically blocked, however, the 10% residual fusion-protein synthesis is not further reduced by excess S7. These results are interpreted to mean that S7 specifically inhibits the fraction of S7 translation that is coupled to translation of the S12 gene. It does not inhibit translation by ribosomes that initiate de novo at the intercistronic junction (M. Nomura, personal communication).
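The coupling data above can be summarized in a toy two-route model. S7 translation initiates by one of two routes:

- Coupling to S12 translation, about 90% of the total, inferred from the reduction seen when S12 translation is blocked.
- De novo entry at the intercistronic junction, the residual ~10%.

Excess S7 inhibits only the coupled route. The Python sketch below is a hypothetical illustration, not an analysis from the source. The function name `fusion_synthesis` and the repression strength `s7_repression = 0.8` are invented, because the text does not quantify how strongly S7 represses.

```python
def fusion_synthesis(s12_translated: bool, excess_s7: bool,
                     coupled_fraction: float = 0.9,
                     s7_repression: float = 0.8) -> float:
    """Relative S7'-'lacZ fusion synthesis in a toy two-route model.

    1.0 corresponds to a translated S12 gene and no excess S7.
    coupled_fraction: share of S7 initiation that depends on upstream S12
        translation (about 0.9, from the ~90% drop seen when S12 is blocked).
    s7_repression: hypothetical fraction of the coupled route blocked by
        excess S7; the text gives no value for this.
    """
    # Route 1: initiation coupled to translation of the upstream S12 gene.
    coupled = coupled_fraction if s12_translated else 0.0
    # Route 2: de novo initiation at the intercistronic junction.
    independent = 1.0 - coupled_fraction
    if excess_s7:
        # Excess S7 acts only on the coupled route.
        coupled *= 1.0 - s7_repression
    return coupled + independent


for s12 in (True, False):
    for s7 in (False, True):
        level = fusion_synthesis(s12, s7)
        print(f"S12 translated={s12}, excess S7={s7}: fusion = {level:.2f}")
```

Under these assumptions, the blocked-S12 construct stays at about 0.10 whether or not S7 is in excess, matching the reported insensitivity of the residual synthesis. How far the intact construct drops with excess S7 depends entirely on the invented `s7_repression` value.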
IV. Epilogue

A. Physiological Implications

Can the autogenous control mechanisms discussed above account for the regulation of ribosomal protein synthesis under various growth conditions? Perhaps more importantly, can we prove experimentally that autogenous control is necessary and sufficient to account for the phenomenological description of ribosome synthesis? The model in Fig. 2 invites the speculation that regulation of r-protein synthesis is simply piggybacked on the regulation of rRNA synthesis. On this view, r-protein synthesis can proceed only when enough rRNA sites are available to sequester the inhibitory regulatory r-proteins from the pool of free proteins.

Several experiments demonstrate that the coupling of r-protein and rRNA synthesis mediated by autogenous control does help balance r-protein synthesis with the total biosynthesis of the bacterial cell:

- Induced oversynthesis of 16-S or 23-S rRNA leads to enhanced expression of the r-protein genes in operons autogenously regulated by 30-S or 50-S r-proteins, respectively (156).
- Synthesis of r-protein L11 is increased in mutant cells that do not synthesize L1, the autogenous control protein for the bicistronic L11/L1 operon (157, 158).
- Both steady-state growth-rate regulation and stringent control of the L11 operon are disrupted in cells with a mutated mRNA target for L1 binding (159). Stringent control is the preferential inhibition of the synthesis of ribosomal components during amino-acid deprivation.

These experiments all suggest that autogenous control is necessary for normal physiological regulation of r-protein synthesis, but they do not exclude contributions from other mechanisms. Experiments with the S10 operon suggest that autogenous control, although necessary, is not sufficient for normal in vivo regulation.
Readthrough at the S10 attenuator (see Section III,A) increases transiently when cells are rapidly shifted from one growth medium to another that supports a higher growth rate (57). This is consistent with a role for attenuation in modulating S10 operon expression immediately after a “nutritional shift-up.” However, readthrough at the S10 attenuator does not remain at the increased level after the shift-up, nor does it change significantly at different steady-state growth rates (57). Thus attenuation provides a “quick fix” during a nutritional transition but apparently plays no major role in regulating S10 operon expression during steady-state growth.

Other experiments point the same way. The 5' end of the S10 operon, including the promoter, leader, and proximal one-and-one-half structural genes, contains the information for autogenous control induced by oversynthesis of the regulatory r-protein L4. Yet this region is insufficient for normal steady-state growth-rate regulation (160). These results indicate that mechanisms beyond those described in this review contribute to the physiological control of r-protein synthesis. Classic experiments suggested that the lifetime of r-protein mRNA changes as a function of growth rate (161). Perhaps the “missing” mechanisms involve control of mRNA stability.
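The titration idea behind the model of Fig. 2 can be sketched as a toy calculation. In that model, a regulatory r-protein represses translation only when it is present in excess of available rRNA binding sites. The Python model below is a hypothetical illustration, not a quantitative scheme from this review. It assumes tight, stoichiometric binding of the regulatory protein to rRNA and simple hyperbolic repression of the operon's mRNA by the free protein. Units are arbitrary, and the function names and the parameter `k_rep` are invented.

Even so, it qualitatively reproduces three of the observations cited above:

- More rRNA binding sites raise operon translation (156).
- Loss of the regulatory protein derepresses the operon (157, 158).
- A mutated mRNA target abolishes the response (159).

```python
def free_repressor(repressor_total: float, rrna_sites: float) -> float:
    """Regulatory r-protein left over after filling available rRNA sites.

    Assumes tight, stoichiometric binding to rRNA (a simplification).
    """
    return max(0.0, repressor_total - rrna_sites)


def operon_translation(repressor_total: float, rrna_sites: float,
                       k_rep: float = 1.0, target_intact: bool = True) -> float:
    """Relative translation of an autogenously controlled operon (1.0 = derepressed).

    k_rep: hypothetical level of free repressor that halves translation.
    target_intact=False mimics an mRNA target the repressor cannot bind.
    """
    if not target_intact:
        return 1.0
    free = free_repressor(repressor_total, rrna_sites)
    return 1.0 / (1.0 + free / k_rep)


# Arbitrary units: 10 units of regulatory protein, varying rRNA binding sites.
for sites in (6.0, 8.0, 10.0, 12.0):
    level = operon_translation(10.0, sites)
    print(f"rRNA sites={sites:4.1f}  translation={level:.3f}")
```

The abrupt switch when protein and rRNA sites are equal is an artifact of assuming tight binding. By construction, this sketch contains only autogenous control. As the S10 experiments show, that alone may not account for steady-state growth-rate regulation.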
B. Future Directions

We expect that the study of r-protein synthesis will take several directions in the future. Certainly there is still much to be learned about the details of the molecular mechanisms. Such work will clearly be relevant not only to the specific genes studied but also to broader questions about transcription termination, translation initiation, and mRNA processing.

Analysis of the regulation of r-protein synthesis in different bacterial species will undoubtedly be a valuable additional source of information. Much is already known about the organization of r-protein genes in organisms other than E. coli, but relatively little is understood about their regulation. Interestingly, the most prominent r-protein gene clusters, the str-spc and rif regions, are preserved in systems as diverse as gram-positive bacteria, archaebacteria, and chloroplasts. This conservation holds even though insertions, deletions, and transpositions have occurred (reviewed in 2, 162-164). Gene sequences and gene order are often highly conserved, but the transcriptional organization frequently differs. For example, the spc-alpha gene cluster is largely preserved in B. subtilis (165-168), but the alpha promoter appears to be absent (168). In addition, the rif region in several species contains the L1, L10, and L12 genes in one operon and the L11 gene in a monocistronic transcription unit. In some species the L11 gene is even genetically unlinked from the other three genes (162, 163).

Variations in the transcriptional organization of r-protein genes among species imply that their regulatory schemes must differ from those of E. coli, and experimental evidence for this is already appearing:

- The S4 gene controls its own translation in B. subtilis (169, 170). In this organism, however, the gene is not part of the S10-spc-alpha cluster (171); it lies in an unlinked monocistronic transcription unit.
- The structures forming the S8 binding site in E. coli spc mRNA are not preserved in B. subtilis spc mRNA, at least not at the same genetic location (165).
- Several organisms have a transcription unit containing both the L1 and L10 genes, both of which encode regulatory proteins in E. coli. This too suggests that the E. coli “rules” cannot be universal.

The emerging field of phylogenetic analysis of r-protein synthesis promises to expand the known repertoire of regulatory mechanisms. It may also yield clues about how these mechanisms evolved. For example, how and why were certain r-proteins recruited (and perhaps in some cases decommissioned) as autogenous regulators? A particularly interesting question is whether the autogenous regulatory systems all evolved from one ancestral system or emerged independently of one another.

In summary, the genetic organization and expression of r-protein genes in an organism as “simple” as E. coli represent a complex microcosm of regulatory mechanisms. Excellent progress has been made in understanding some of the molecular details of their regulation. Nonetheless, many questions remain before we fully understand the regulation of this global multigene system.
ACKNOWLEDGMENTS

We thank all of our colleagues who provided reprints and preprints of their papers. The narrow scope of our review prevented discussion of all of the interesting results in the bacterial ribosome-synthesis field. We also thank David Goldfarb and Terry Platt for helpful comments on the manuscript. Work in our laboratory was supported by a research grant from the National Institute of Allergy and Infectious Diseases.
REFERENCES

1. L. Lindahl, R. H. Archer and J. M. Zengel, in “Interaction of Translational and Transcriptional Controls in the Regulation of Gene Expression” (M. Grunberg-Manago and B. Safer, eds.), p. 105. Elsevier, New York, 1982.
2. T. M. Henkin, in “Bacillus subtilis and Other Gram-Positive Bacteria: Biochemistry, Physiology and Molecular Genetics” (A. L. Sonenshein, J. A. Hoch and R. Losick, eds.), p. 669. American Society for Microbiology, Washington, D.C., 1993.
3. H. A. Raué and R. J. Planta, This Series 41, 89 (1991).
4. J. L. Woolford, Jr., and J. R. Warner, in “The Molecular and Cellular Biology of the Yeast Saccharomyces” (J. R. Broach, J. R. Pringle and E. W. Jones, eds.), Vol. 1, p. 587. CSHLab, Cold Spring Harbor, New York, 1991.
5. S. Ramagopal, Biochem. Cell Biol. 70, 738 (1992).
6. R. Mache, Plant Sci. 72, 1 (1990).
7. F. Amaldi, I. Bozzoni, E. Beccari and P. Pierandrei-Amaldi, TIBS 14, 175 (1989).
8. C. Bagni, P. Mariottini, L. Terrenato and F. Amaldi, MGG 234, 60 (1992).
9. L. Lindahl and J. Zengel, ARGen 20, 297 (1986).
10. M. Nomura, R. Gourse and G. Baughman, ARB 53, 75 (1984).
11. S. Jinks-Robertson and M. Nomura, in “Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology” (F. C. Neidhardt, J. L. Ingraham, K. B. Low, M. Schaechter and E. Umbarger, eds.), p. 1358. American Society for Microbiology, Washington, D.C., 1987.
12. R. L. Gourse and M. Nomura, in “Ribosomal RNA: Structure, Evolution, Processing, and Function in Protein Synthesis” (R. A. Zimmermann and A. E. Dahlberg, eds.). Telford, Caldwell, New Jersey, in press, 1993.
13. D. P. Cerretti, D. Dean, G. R. Davis, D. M. Bedwell and M. Nomura, NARes 11, 2599 (1983).
14. L. Lindahl, F. Sor, R. H. Archer, M. Nomura and J. M. Zengel, BBA 1050, 337 (1990).
15. W. L. Downing and P. P. Dennis, JMB 194, 609 (1987).
16. K. L. Steward and T. Linn, JMB 218, 23 (1991).
17. R. Brückner and H. Matzura, MGG 183, 277 (1981).
18. P. Régnier, M. Grunberg-Manago and C. Portier, JBC 262, 63 (1987).
19. W. E. Taylor, D. B. Straus, A. D. Grossman, Z. F. Burton, C. A. Gross and R. R. Burgess, Cell 38, 371 (1984).
20. J. M. Zengel and L. Lindahl, BBA 1050, 317 (1990).
21. S. J. Wertheimer, R.-A. Klotsky and I. Schwartz, Gene 63, 309 (1988).
22. P. Lesage, H.-N. Truong, M. Graffe, J. Dondon and M. Springer, JMB 213, 465 (1990).
23. A. M. Fallon, C. S. Jinks, G. D. Strycharz and M. Nomura, PNAS 76, 3411 (1979).
24. L. Lindahl and J. M. Zengel, PNAS 76, 6542 (1979).
25. N. P. Fiil, J. D. Friesen, W. L. Downing and P. P. Dennis, Cell 19, 837 (1980).
26. J. M. Zengel, D. Mueckl and L. Lindahl, Cell 21, 523 (1980).
27. J. L. Yates, A. E. Arfsten and M. Nomura, PNAS 77, 1837 (1980).
28. J. L. Yates and M. Nomura, Cell 21, 517 (1980).
29. D. Dean and M. Nomura, PNAS 77, 3590 (1980).
30. M. Nomura, J. L. Yates, D. Dean and L. E. Post, PNAS 77, 7084 (1980).
31. M. Johnsen, T. Christensen, P. P. Dennis and N. P. Fiil, EMBO J. 1, 999 (1982).
32. I. C. Deckman and D. E. Draper, Bchem 24, 7860 (1985).
33. R. R. Gregory, P. B. F. Cahill, D. L. Thurlow and R. A. Zimmermann, JMB 204, 295 (1988).
34. C. Philippe, C. Portier, M. Mougel, M. Grunberg-Manago, J. P. Ebel, B. Ehresmann and C. Ehresmann, JMB 211, 415 (1990).
35. B. C. Donly and G. A. Mackie, NARes 16, 997 (1988).
36. J. M. Zengel and L. Lindahl, NARes 21, 2429 (1993).
37. L. Christiansen and S. Pedersen, MGG 181, 548 (1981).
38. J. Skouv, J. Schnier, M. D. Rasmussen, A. R. Subramanian and S. Pedersen, JBC 265, 17044 (1990).
39. M. Laughrea and P. B. Moore, JMB 122, 109 (1978).
40. G. An, D. S. Bendiak, L. A. Mamelak and J. D. Friesen, NARes 9, 4163 (1981).
41. P. M. Wikström, A. S. Byström and G. R. Björk, JMB 203, 141 (1988).
42. P. H. von Hippel and F. R. Fairfield, in “Mobility and Recognition in Cell Biology,” p. 213. de Gruyter, New York, 1983.
43. N. Robakis, L. Meza-Basso, N. Brot and H. Weissbach, PNAS 78, 4261 (1981).
44. N. Brot, S. Peacock and H. Weissbach, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 749. Springer-Verlag, New York, 1986.
45. J. E. G. McCarthy and C. Gualerzi, Trends Genet. 6, 78 (1990).
46. L. Lindahl and A. Hinnebusch, Curr. Opin. Genet. Dev. 2, 720 (1992).
47. D. Hartz, D. S. McPheeters, L. Green and L. Gold, JMB 218, 99 (1991).
48. L. Lindahl, R. Archer and J. M. Zengel, Cell 33, 241 (1983).
49. L. P. Freedman, J. M. Zengel, R. H. Archer and L. Lindahl, PNAS 84, 6516 (1987).
50. J. M. Zengel and L. Lindahl, JMB 213, 67 (1990).
51. J. M. Zengel and L. Lindahl, J. Bact. 167, 1095 (1986).
52. D. Dean, J. L. Yates and M. Nomura, Cell 24, 413 (1981).
53. A. M. Fallon, C. S. Jinks, M. Yamamoto and M. Nomura, J. Bact. 138, 383 (1979).
54. L. Mattheakis, L. Vu, F. Sor and M. Nomura, PNAS 86, 448 (1989).
55. J. R. Cole and M. Nomura, JMB 188, 383 (1986).
56. P. Singer and M. Nomura, MGG 199, 543 (1985).
57. J. M. Zengel, R. H. Archer, L. P. Freedman and L. Lindahl, EMBO J. 3, 1561 (1984).
58. J. M. Zengel and L. Lindahl, PNAS 87, 2675 (1990).
59. P. Shen, J. M. Zengel and L. Lindahl, NARes 16, 8905 (1988).
60. P. Shen, Ph.D. thesis, University of Rochester, Rochester, New York, 1991.
61. J. M. Zengel and L. Lindahl, Biochimie 73, 719 (1991).
62. H. Gulle, E. Hoppe, M. Osswald, B. Greuer, R. Brimacombe and G. Stoffler, NARes 16, 815 (1988).
63. R. Maly, J. Rinke, C. Sweib and R. Brimacombe, Bchem 19, 4179 (1980).
64. P. O. Olins and M. Nomura, Cell 26, 205 (1981).
65. T. Platt, ARB 55, 339 (1986).
66. J. M. Zengel and L. Lindahl, Genes Dev. 6, 2655 (1992).
67. L. Lindahl, R. H. Archer, J. R. McCormick, L. P. Freedman and J. M. Zengel, J. Bact. 171, 2639 (1989).
68. S. Adhya and M. Gottesman, ARB 47, 967 (1978).
69. F. W. Studier, Science 176, 367 (1972).
70. I. Iost, J. Guillerez and M. Dreyfus, J. Bact. 174, 619 (1992).
71. S. Jinks-Robertson and M. Nomura, J. Bact. 151, 193 (1982).
72. M. S. Thomas, D. M. Bedwell and M. Nomura, JMB 196, 333 (1987).
73. I. C. Deckman and D. E. Draper, JMB 196, 313 (1987).
74. I. C. Deckman and D. E. Draper, JMB 196, 323 (1987).
75. C. Tang and D. E. Draper, Cell 57, 531 (1989).
76. J. V. Vartikar and D. E. Draper, JMB 209, 221 (1989).
77. S. Stern, R. C. Wilson and H. F. Noller, JMB 192, 101 (1986).
78. A. Sapag, J. V. Vartikar and D. E. Draper, BBA 1050, 34 (1990).
79. D. E. Draper, Acc. Chem. Res. 25, 201 (1992).
80. C. K. Tang and D. E. Draper, Bchem 29, 4434 (1990).
81. G. Spedding, T. C. Gluick and D. E. Draper, JMB 229, 609 (1993).
82. G. Spedding and D. E. Draper, PNAS 90, 4399 (1993).
83. P. Régnier and C. Portier, JMB 187, 23 (1986).
84. C. Portier, L. Dondon and M. Grunberg-Manago, JMB 211, 407 (1990).
85. C. Portier, C. Philippe, L. Dondon, M. Grunberg-Manago, J. P. Ebel, B. Ehresmann and C. Ehresmann, BBA 1050, 328 (1990).
86. R. Takata, T. Mukai, M. Aoyagi and K. Hori, MGG 197, 225 (1984).
87. R. Muller and R. A. Garrett, JBC 254, 3873 (1979).
88. J. M. Kean and D. E. Draper, Bchem 24, 5052 (1985).
89. M. Mougel, C. Philippe, J.-P. Ebel, B. Ehresmann and C. Ehresmann, NARes 16, 2825 (1988).
90. C. Philippe, F. Eyermann, C. Portier, L. Bénard, B. Ehresmann and C. Ehresmann, PNAS 90, 4394 (1993).
91. P. Régnier and M. Grunberg-Manago, Biochimie 72, 825 (1990).
92. R. Takata, M. Izuhara and K. Hori, NARes 17, 7441 (1989).
93. P. Régnier and E. Hajnsdorf, JMB 217, 283 (1991).
94. D. Dean, J. L. Yates and M. Nomura, Nature 289, 89 (1981).
95. D. P. Cerretti, L. C. Mattheakis, K. R. Kearney, L. Vu and M. Nomura, JMB 204, 309 (1988).
96. P. Svensson, L.-M. Changchien, G. R. Craven and H. F. Noller, JMB 200, 301 (1988).
97. M. Mougel, F. Eyermann, E. Westhof, P. Romby, A. Expert-Bezançon, J.-P. Ebel, B. Ehresmann and C. Ehresmann, JMB 198, 91 (1987).
98. P. O. Olins and M. Nomura, NARes 9, 1757 (1981).
99. L. C. Mattheakis and M. Nomura, J. Bact. 170, 4484 (1988).
100. G. Ralling and T. Linn, J. Bact. 158, 279 (1984).
101. M. Yamamoto and M. Nomura, PNAS 75, 3891 (1978).
102. G. Barry, C. L. Squires and C. Squires, PNAS 76, 4922 (1979).
103. G. Barry, C. Squires and C. L. Squires, PNAS 77, 3331 (1980).
104. P. P. Dennis, JBC 259, 3202 (1984).
105. E. W. Holowachuk, J. D. Friesen and N. P. Fiil, PNAS 77, 2124 (1980).
106. J. L. Yates, D. Dean, W. A. Strycharz and M. Nomura, Nature 294, 190 (1981).
107. K. Fukuda, MGG 178, 483 (1980).
108. N. Brot, P. Caldwell and H. Weissbach, PNAS 77, 2592 (1980).
109. W. Downing and P. P. Dennis, JBC 266, 1304 (1991).
110. L. Passador and T. Linn, J. Bact. 171, 6234 (1989).
111. K. L. Steward and T. Linn, NARes 20, 4773 (1992).
112. L. Passador and T. Linn, J. Bact. 174, 7174 (1992).
113. G. Baughman and M. Nomura, Cell 34, 979 (1983).
114. G. Baughman and M. Nomura, PNAS 81, 5389 (1984).
115. M. S. Thomas and M. Nomura, NARes 15, 3085 (1987).
116. K. R. Kearney and M. Nomura, MGG 210, 60 (1987).
117. F. Sor and M. Nomura, MGG 210, 52 (1987).
118. R. L. Gourse, D. L. Thurlow, S. A. Gerbi and R. A. Zimmermann, PNAS 78, 2722 (1981).
119. C. Branlant, A. Krol, A. Machatt and J.-P. Ebel, NARes 9, 293 (1981).
120. B. Said, J. R. Cole and M. Nomura, NARes 16, 10529 (1988).
121. F. Sor, M. Bolotin-Fukuhara and M. Nomura, J. Bact. 169, 3495 (1987).
122. M. Ivey-Hoyle and D. A. Steege, JMB 224, 1039 (1992).
123. S. C. Climie and J. D. Friesen, JBC 263, 15166 (1988).
124. S. C. Climie and J. D. Friesen, JMB 198, 371 (1987).
125. J. D. Friesen, M. Tropak and G. An, Cell 32, 361 (1983).
126. T. Christensen, M. Johnsen, N. P. Fiil and J. D. Friesen, EMBO J. 3, 1609 (1984).
127. J. Egebjerg, S. R. Douthwaite, A. Liljas and R. A. Garrett, JMB 213, 275 (1990).
128. C. Petersen, JMB 206, 323 (1989).
129. G. A. Mackie and G. D. Parsons, JBC 258, 7840 (1983).
130. R. Wirth, J. Littlechild and A. Böck, MGG 188, 164 (1982).
131. G. D. Parsons and G. A. Mackie, J. Bact. 154, 152 (1983).
132. G. D. Parsons, B. C. Donly and G. A. Mackie, J. Bact. 170, 2485 (1988).
133. G. A. Mackie, JBC 256, 8177 (1981).
134. S. Stern, L.-M. Changchien, G. R. Craven and H. F. Noller, JMB 200, 291 (1988).
135. R. S. Cormack and G. A. Mackie, JBC 266, 18525 (1991).
136. F. Gotz, E. R. Dabbs and C. O. Gualerzi, BBA 1050, 93 (1990).
137. S. Pedersen, J. Skouv, M. Kajitani and A. Ishihama, MGG 196, 135 (1984).
138. I. V. Boni, D. M. Isaeva, M. L. Musychenko and N. V. Tzareva, NARes 19, 155 (1991).
139. A. E. Dahlberg and J. E. Dahlberg, PNAS 72, 2940 (1975).
140. S. Giorginis and A. R. Subramanian, JMB 141, 393 (1980).
141. M. Springer, J.-F. Mayaux, G. Fayat, J. A. Plumbridge, M. Graffe, S. Blanquet and M. Grunberg-Manago, JMB 181, 467 (1985).
142. M. Springer, M. Trudel, M. Graffe, J. Plumbridge, G. Fayat, J. F. Mayaux, C. Sacerdot, S. Blanquet and M. Grunberg-Manago, JMB 171, 263 (1983).
143. G. Fayat, J.-F. Mayaux, C. Sacerdot, M. Fromant, M. Springer, M. Grunberg-Manago and S. Blanquet, JMB 171, 239 (1983).
144. J. S. Butler, M. Springer, J. Dondon, M. Graffe and M. Grunberg-Manago, JMB 192, 767 (1986).
145. J. S. Butler, M. Springer, J. Dondon and M. Grunberg-Manago, J. Bact. 165, 198 (1986).
146. M. Springer, J. A. Plumbridge, J. S. Butler, M. Graffe, J. Dondon, J. F. Mayaux, G. Fayat, P. Lestienne, S. Blanquet and M. Grunberg-Manago, JMB 185, 93 (1985).
147. P. Lesage, C. Chiaruttini, M. Graffe, J. Dondon, M. Milet and M. Springer, JMB 228, 366 (1992).
148. M. Springer, M. Graffe, J. Dondon and M. Grunberg-Manago, EMBO J. 8, 2417 (1989).
149. H. Moine, P. Romby, M. Springer, M. Grunberg-Manago, J.-P. Ebel, B. Ehresmann and C. Ehresmann, JMB 216, 299 (1990).
150. P. Romby, H. Moine, P. Lesage, M. Graffe, J. Dondon, J. P. Ebel, M. Grunberg-Manago, B. Ehresmann, C. Ehresmann and M. Springer, Biochimie 72, 485 (1990).
151. H. Moine, B. Ehresmann, P. Romby, J. P. Ebel, M. Grunberg-Manago, M. Springer and C. Ehresmann, BBA 1050, 343 (1990).
152. J. S. Butler, M. Springer and M. Grunberg-Manago, PNAS 84, 4022 (1987).
153. L. Gold, G. Stormo and R. Saunders, PNAS 81, 7061 (1984).
154. D. Hartz, D. S. McPheeters and L. Gold, Genes Dev. 3, 1899 (1989).
155. C. O. Gualerzi and C. L. Pon, Bchem 29, 5881 (1990).
156. M. Yamagishi and M. Nomura, J. Bact. 170, 5042 (1988).
157. S. Jinks-Robertson and M. Nomura, J. Bact. 145, 1445 (1981).
158. G. Stoffler, R. Hasenbank and E. R. Dabbs, MGG 181, 164 (1981).
159. J. R. Cole and M. Nomura, PNAS 83, 4129 (1986).
160. L. Lindahl and J. M. Zengel, J. Bact. 172, 305 (1990).
161. P. P. Dennis and M. Nomura, JMB 97, 61 (1975).
162. B. Wittmann-Liebold, A. K. E. Köpke, E. Arndt, W. Krömer, T. Hatakeyama and H.-G. Wittmann, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 598. American Society for Microbiology, Washington, D.C., 1990.
163. A. T. Matheson, J. Auer, C. Ramirez and A. Böck, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 617. American Society for Microbiology, Washington, D.C., 1990.
164. A. R. Subramanian, P. M. Smooker and K. Giese, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 655. American Society for Microbiology, Washington, D.C., 1990.
165. T. M. Henkin, S. H. Moon, L. C. Mattheakis and M. Nomura, NARes 17, 7469 (1989).
166. D. A. Christopher, J. C. Cushman, C. A. Price and R. B. Hallick, Curr. Genet. 14, 275 (1988).
167. J.-W. Suh, S. A. Boylan and C. W. Price, J. Bact. 168, 65 (1986).
168. S. A. Boylan, J.-W. Suh, S. M. Thomas and C. W. Price, J. Bact. 171, 2553 (1989).
169. F. J. Grundy and T. M. Henkin, J. Bact. 173, 4595 (1991).
170. F. J. Grundy and T. M. Henkin, J. Bact. 174, 6763 (1992).
171. F. J. Grundy and T. M. Henkin, J. Bact. 172, 6372 (1990).
172. B. J. Bachmann, Microbiol. Rev. 54, 130 (1990).
173. L. Lindahl and J. M. Zengel, Adv. Genet. 21, 53 (1982).
174. Y. Tanaka, A. Tsujimura, N. Fujita, S. Isono and K. Isono, J. Bact. 171, 5707 (1989).
175. C. Sacerdot, G. Fayat, P. Dessen, M. Springer, J. A. Plumbridge, M. Grunberg-Manago and S. Blanquet, EMBO J. 1, 311 (1982).
176. J.-F. Mayaux, G. Fayat, M. Fromant, M. Springer, M. Grunberg-Manago and S. Blanquet, PNAS 80, 6152 (1983).
177. Y. Mechulam, G. Fayat and S. Blanquet, J. Bact. 163, 787 (1985).
178. H. I. Miller, CSHSQB 49, 691 (1984).
179. Y. Uemura, S. Isono and K. Isono, MGG 226, 341 (1991).
180. J. R. Lupski, B. L. Smiley and G. N. Godson, MGG 189, 48 (1983).
181. J. R. Lupski, A. A. Ruiz and G. N. Godson, MGG 195, 391 (1984).
182. M. Nesin, J. R. Lupski, P. Svec and G. N. Godson, Gene 51, 149 (1987).
183. S. Evans and P. P. Dennis, Gene 40, 15 (1985).
184. A. Wada and T. Sako, J. Biochem. 101, 817 (1987).
185. J. Schnier, M. Kitakawa and K. Isono, MGG 204, 126 (1986).
Enzymologic Mechanism of Replicative DNA Polymerases in Higher Eukaryotes

PAUL A. FISHER

Department of Pharmacological Sciences
Health Sciences Center
State University of New York at Stony Brook
Stony Brook, New York 11794
I. Catalytic Core of DNA Polymerase α
II. Holoenzyme of DNA Polymerase α
III. Interaction of DNA Polymerase α with Template-Primers Containing Chemically Damaged Nucleotides
IV. DNA Polymerase δ
V. Conclusions and Prospects for Future Research
References
Over the past several years, it has become clear that at least two different DNA polymerases are responsible for replicative DNA synthesis in higher eukaryotes. DNA polymerase α (pol α), along with its associated DNA primase, is thought to be largely responsible for lagging-strand synthesis. DNA polymerase δ (pol δ), along with its auxiliary factor, proliferating cell nuclear antigen (PCNA), is believed to be largely responsible for leading-strand synthesis. Recent genetic studies in the lower eukaryote Saccharomyces cerevisiae suggest that a third enzyme, DNA polymerase ε (pol ε)¹, is essential for complete nuclear DNA synthesis (1). The precise roles of pol ε are not yet clear.

This essay reviews the results of enzymologic experiments performed originally with the pol-α catalytic core, subsequently with the pol-α holoenzyme, and most recently with the pol-δ catalytic core. Comparable studies of pol ε have not yet been performed. Experiments with pol α and pol δ have provided fundamental and presumably general information about the basic mechanisms of replicative DNA polymerases. They have also given novel insights into three further areas:

- the fidelity of DNA replication;
- the interaction of DNA polymerases with template-primers containing chemically modified or damaged bases;
- the mechanism of polymerase translocation along templates.

Further experiments remain to be performed, particularly detailed physical studies of cDNA clone-encoded DNA polymerases. These are needed to corroborate or modify existing hypotheses and to extend current models where appropriate.

¹ Re: DNA polymerase ε, see P. M. J. Burgers et al., EJB 191, 617 (1990), and Ref. 22. [Eds.]

Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. Catalytic Core of DNA Polymerase α

Initially, the catalytic core of human pol α was purified to near-homogeneity. It was found to be a 140-kDa, 7.1-S protomer consisting of two dissimilar subunits, and this protomer had the potential to dimerize (2; see also 3). Monoclonal antibodies raised against this catalytic core (4) were subsequently used for immunoaffinity purification of the human pol-α holoenzyme. At that point it was recognized that the core catalytic subunit of human pol α is a single polypeptide of approximately 180 kDa (5). Presumably, the pol-α core enzyme prepared initially had been proteolyzed during purification. Nevertheless, extensive enzymologic studies (6-11) of the partially proteolyzed pol-α catalytic core led to a detailed model for substrate recognition and binding by the enzyme. This model was later shown to apply in all of its features to an undegraded pol-α holoenzyme preparation from human cells (5), so details of the model are presented here. [Many aspects of this model were reviewed in a previous volume in this series (12).]
A. Template Recognition and Binding

Human pol-α catalytic core follows an ordered sequential terreactant² mechanism of substrate recognition and binding (8). This mechanism is represented diagrammatically in Fig. 1. The first step involves template, the only substrate for which free pol α has any detectable affinity. This feature of the pol-α mechanism was elucidated primarily through enzyme inhibition studies. Direct semiquantitative sedimentation binding analyses substantiated the conclusions (6). Representative data from both kinds of assay are shown in Figs. 2 and 4.

Of all the nonsubstrate DNA molecules tested, only single-stranded DNA inhibits pol α when the activity of the enzyme on an activated DNA substrate is determined (Fig. 2). This inhibition has three properties:

- It is competitive with respect to the activated DNA substrate (6).
- It is noncompetitive with respect to dNTP.
- It is observed with both heteropolymeric and homopolymeric single-stranded DNA (6, 11).

Because the inhibition is competitive with the activated DNA substrate, the site on the enzyme that binds the inhibitor must be the same site that binds DNA during catalysis.

² Terreactant (originally ter-reactant) indicates an enzyme having three substrates (26). [Eds.]

[Fig. 1 diagram: DNA polymerase α (E) binds template (A), primer (B), and correct dNTP (C) in that order, E → EA → EAB → (EABC ⇌ EPQR) → E + P + Q + R; products shown are PPi, primer, and template.]
FIG. 1. The ordered sequential terreactant mechanism of substrate recognition and binding by KB cell pol-α catalytic core enzyme; a diagrammatic representation. Information on the order of product release is not available. All products (P, Q, and R) are therefore shown as being released in a single step.

[Fig. 2 plot: DNA polymerase activity (y-axis, 0–120) versus [I DNA] (µM, x-axis, 0–200).]
FIG. 2. Inhibition of KB cell pol-α catalytic core enzyme by nonsubstrate DNA molecules of defined structure. Concentration of competitive inhibitor (in nucleotide) and DNA polymerase activity are as shown. Competitor (I) DNAs were supercoiled circular duplex PM2 DNA (+), relaxed circular duplex PM2 DNA (○), blunt-ended duplex PM2 DNA fragments generated with HaeIII restriction endonuclease (▲), and single-stranded circular M13 DNA (■).

The essential features of a sedimentation binding assay, devised to complement enzyme inhibition studies, are shown diagrammatically in Fig. 3. Pol α alone sediments at 7.1 S, near the top of the gradient. DNA alone migrates much farther down the gradient; its exact S-value depends on the particular DNA molecule analyzed. When pol α and DNA are mixed before centrifugation, the result depends on the nature of the DNA. If pol α cannot bind to the DNA, the sedimentation profiles are identical to those observed when pol α and DNA are sedimented separately. If pol α does bind to the DNA, a fraction of the pol-α activity is shifted down the gradient, away from the peak of free enzyme, and at least part of this shifted activity cosediments with the DNA.

Data from a sedimentation binding experiment are shown in Fig. 4. These data show that pol α binds to single-stranded circular φX174 DNA but not to either supercoiled or relaxed circular duplex DNA. This corroborates the results of the enzyme inhibition studies (Fig. 2).

In those inhibition studies, the interaction of the pol-α catalytic core protomer with single-stranded circular DNA showed a greater than first-order dependence on DNA concentration (6, 11). Linear Hill plots had slopes (Hill coefficients) between 1.5 and 1.8, suggesting at least two interacting template-binding sites per molecule of pol α. To date, there is no evidence that both sites can be active simultaneously in DNA synthesis. Indeed, the two sites have been identified only in activity inhibition studies, which argues against simultaneous synthesis of DNA at both. Nevertheless, it remains an intriguing possibility that the two sites somehow function coordinately in DNA replication by pol α.
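The Hill analysis mentioned above can be illustrated numerically. The following Python sketch is not the procedure used in refs. 6 and 11, and it uses no data from them. It generates hypothetical, noise-free fractional-inhibition values θ from the Hill equation. The assumed Hill coefficient of 1.6 and half-saturating concentration of 60 µM are both invented for illustration. The coefficient is then recovered as the least-squares slope of log[θ/(1 − θ)] against log[I]. A slope greater than 1 indicates cooperative interaction. Because the Hill coefficient cannot exceed the number of interacting sites, slopes of 1.5 to 1.8 imply at least two sites.

```python
import math


def hill_fraction(conc: float, k_half: float, n: float) -> float:
    """Fraction of enzyme inhibited (theta) at inhibitor concentration conc,
    according to the Hill equation theta = c^n / (K^n + c^n)."""
    return conc**n / (k_half**n + conc**n)


def hill_coefficient(concs: list[float], fractions: list[float]) -> float:
    """Least-squares slope of log10(theta / (1 - theta)) versus log10(conc)."""
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(f / (1.0 - f)) for f in fractions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical single-stranded DNA concentrations (µM nucleotide).
concs = [10.0, 25.0, 50.0, 100.0, 200.0]
fractions = [hill_fraction(c, k_half=60.0, n=1.6) for c in concs]
print(f"Hill coefficient: {hill_coefficient(concs, fractions):.2f}")
```

Because the synthetic points lie exactly on a Hill curve, the fitted slope returns the assumed value of 1.6. Real inhibition data would scatter around the line, and extracting a Hill coefficient in the presence of competing substrate involves further assumptions not modeled here.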
FIG. 3. A semiquantitative sedimentation binding assay to study the interaction of pol α with nonsubstrate DNA molecules. Positions of pol α alone and DNA alone are as indicated. Before ultracentrifugation, pol α and individual DNAs were mixed together under polymerase reaction conditions and sedimented as described (see 6, 8, 9).
MECHANISM OF REPLICATIVE DNA POLYMERASES

[Fig. 4, panels A–D: polymerase activity versus fraction number, from bottom to top of the gradient]
FIG. 4. Binding of KB cell pol-α catalytic core enzyme to nonsubstrate DNA molecules of defined structure. Purified pol α was subjected to ultracentrifugation alone (A), after mixing with supercoiled circular duplex PM2 DNA (B), after mixing with relaxed circular duplex PM2 DNA (C), and after mixing with single-stranded circular φX174 DNA (D). The solid line above each panel indicates the position of pol α sedimented alone; the broken line above each panel indicates the position of DNA sedimented alone. (For further details, see 6, 8, and 9.)
B. Primer Recognition and Binding

Primer binding is the second step in the ordered sequential terreactant mechanism of substrate recognition and binding by the catalytic core of pol α (see Fig. 1). For efficient binding and catalysis, the catalytic core requires a primer at least eight nucleotides long (9), of which the terminal three or four nucleotides must be properly base-paired to an appropriate template (7, 9). Moreover, the chemical structure of the 3'-terminal sugar group determines whether the enzyme will bind at all (Table I). Pol-α catalytic core will bind with more or less equal affinity to base-paired 3' termini containing 2'-H, 3'-OH (a typical deoxyribonucleotide primer); 2'-OH, 3'-OH (a ribonucleotide primer); 2'-PO4, 3'-OH; or 2'-H, 3'-H (a dideoxynucleotide primer). Catalysis, i.e., dNMP incorporation, occurs with all of these primers except the last, the dideoxynucleotide primer, even though pol α binds it. In contrast, pol-α catalytic core will neither bind to nor catalyze incorporation of dNMP on a 2'-H, 3'-PO3-containing primer (10).

PAUL A. FISHER

TABLE I
EFFECT OF CHEMICAL STRUCTURE AT THE 3'-PRIMER TERMINUS ON BINDING AND INCORPORATION BY POL α

Chemical structure at the 3'-primer terminus    Binds    Incorporates
2'-H, 3'-OH                                      Yes      Yes
2'-OH, 3'-OH                                     Yes      Yes
2'-PO4, 3'-OH                                    Yes      Yes
2'-H, 3'-H                                       Yes      No
2'-H, 3'-PO3                                     No       No
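The five rows of Table I are consistent with a simple two-part rule: a 3'-phosphate abolishes binding, and incorporation additionally requires a 3'-OH to act as the nucleophile. The sketch below encodes the table as data and checks that hypothetical rule against it; the rule summarizes these five primer structures only and is not a claim about untested ones.

```python
# Table I as data: (2' group, 3' group) -> (binds, incorporates).
TABLE_I = {
    ("2'-H", "3'-OH"): (True, True),     # deoxyribonucleotide primer
    ("2'-OH", "3'-OH"): (True, True),    # ribonucleotide primer
    ("2'-PO4", "3'-OH"): (True, True),
    ("2'-H", "3'-H"): (True, False),     # dideoxynucleotide primer
    ("2'-H", "3'-PO3"): (False, False),
}


def predicted(two_prime: str, three_prime: str) -> tuple:
    """Hypothetical rule: only a 3'-phosphate blocks binding; catalysis needs 3'-OH.

    The 2' group is accepted but ignored, because no row of Table I
    shows it affecting either binding or incorporation.
    """
    binds = three_prime != "3'-PO3"
    incorporates = binds and three_prime == "3'-OH"
    return binds, incorporates


agree = all(predicted(*key) == observed for key, observed in TABLE_I.items())
print(agree)
```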
C. The Role of Mg2+ in Template and Primer Binding

It has long been assumed that divalent cations such as Mg2+ are required to chelate dNTPs in DNA-polymerase incorporation reactions. On this basis alone, it might be predicted that a plot of enzyme activity versus Mg2+ concentration would exhibit simple saturation kinetics: polymerase activity would increase as the concentration of Mg2+ approached that of the total dNTP present, after which enzyme activity would be unaffected by further increases in Mg2+ concentration. In actuality, plots of enzyme activity versus Mg2+ concentration, an example of which is shown for pol α (Fig. 5), look very different. Enzyme activity continues to increase at Mg2+ concentrations far in excess of the total dNTP concentration, implying another role or roles for Mg2+ in the pol-α reaction mechanism. Moreover, increasing Mg2+ concentration beyond an empirical optimum results in dramatic inhibition of pol α. The complex role of Mg2+ in the pol-α incorporation reaction was dissected enzymologically (9). To analyze the effect of Mg2+ on template binding, the Ki of pol α for heteropolymeric single-stranded DNA was measured at several different Mg2+ concentrations. As Mg2+ concentration was increased, the Ki of pol α for single-stranded DNA decreased, suggesting that
FIG. 5. Activity of pol α versus Mg2+ concentration with an activated DNA substrate. DNA polymerase activity and MgCl2 concentration (mM) are as indicated.
pol α binds template more tightly at higher concentrations of Mg2+. This conclusion was confirmed by direct sedimentation binding analyses. Similar studies performed with homopolymeric single-stranded DNA indicated that increasing concentrations of Mg2+ lead to a dramatically increased affinity for poly(dT) and poly(dC) (polypyrimidines) but have little or no effect on the interaction of pol-α catalytic core with poly(dA) (a polypurine). Because the interaction of pol α with poly(dA) templates was apparently independent of Mg2+ concentration, a "hook" homopolymer, (dA)m-(dT)z, was used to study the effects of Mg2+ concentration on the interaction of pol-α catalytic core with primer (9). Results of extensive kinetic analyses indicate that free Mg2+ competes with primer for pol-α binding. This accounts for the dramatic inhibition of pol-α activity at Mg2+ concentrations above the empirical optimum. Competition by Mg2+ is highly cooperative; linear Hill plots were obtained with Hill coefficients of 3.8–3.9, indicating a minimum of four interacting Mg2+-binding sites on each molecule of pol-α catalytic core. The conclusions drawn from kinetic analyses were further substantiated by results of direct sedimentation binding experiments. These results, in conjunction with the observation that an octanucleotide is the minimum length recognized as primer, led to the suggestion that pol-α catalytic core binds primer through a Mg2+ chelate, with each of four Mg2+ ions acting to coordinate two phosphodiester phosphate groups (9).
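Free Mg2+ acting as a cooperative competitive inhibitor of primer binding can be written with a standard rate law in which the apparent Km for primer is multiplied by 1 + ([Mg2+]/Ki)^n. The sketch below assumes hypothetical values (Km = Ki = 1 in arbitrary units, n = 4 to match the minimum number of sites inferred above). It models only the inhibitory limb of Fig. 5; the stimulatory effects of Mg2+ on dNTP chelation and template binding are deliberately left out.

```python
def inhibition_term(mg_free, ki=1.0, n=4.0):
    """(Mg/Ki)**n: the factor by which free Mg2+ raises the apparent primer Km."""
    return (mg_free / ki) ** n


def rate(primer, mg_free, vmax=1.0, km=1.0, ki=1.0, n=4.0):
    """Michaelis-Menten rate with Mg2+ as a cooperative competitive inhibitor of primer."""
    km_app = km * (1.0 + inhibition_term(mg_free, ki, n))
    return vmax * primer / (km_app + primer)


# With n = 4, doubling free Mg2+ multiplies the inhibition term 2**4 = 16-fold.
print(inhibition_term(4.0) / inhibition_term(2.0))

# Relative activity at a fixed (hypothetical) primer concentration.
for mg in (0.5, 1.0, 2.0, 4.0):
    print(mg, round(rate(primer=1.0, mg_free=mg), 3))
```

The steepness of the fall-off above the optimum comes from the exponent n; with n = 1 the same doubling would only double the inhibition term.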
D. dNTP Recognition and Binding

The binding of dNTP is the third step in the ordered sequential terreactant mechanism of pol-α catalytic core (see Fig. 1). Although primer binding is the immediate prerequisite to dNTP recognition, which of the four dNTPs is recognized and bound is dictated by the template nucleotide immediately adjacent to the primer terminus. Indeed, definitive elucidation of the ordered terreactant mechanism of pol-α catalytic core resulted from the fact that after binding to a 2',3'-dideoxynucleotide-terminated primer, pol α is able to bind dNTP in a template-specific manner but is unable to catalyze nucleotide incorporation at such a primer because a 3'-OH group is absent (8). In this situation, both as a consequence of the highly ordered mechanism of substrate recognition and binding and because the forward step to catalysis is blocked by the lack of a 3'-OH group, pol-α catalytic core can be trapped in a stable complex with template, base-paired dideoxynucleotide-terminated primer, and template-specific dNTP. This phenomenon is represented diagrammatically in Fig. 6 and was demonstrated experimentally, both in steady-state kinetic analyses and in semiquantitative sedimentation binding experiments (8). The assumption that Mg2+-chelated dNTPs are the actual substrates for
FIG. 6. The ordered sequential terreactant mechanism of substrate recognition and binding by KB cell pol-α catalytic core enzyme; a diagrammatic representation including the pathway followed with a dideoxynucleotide-terminated primer. The mechanism of pol α is represented as described in the legend to Fig. 1. I represents a correctly base-paired 2',3'-dideoxynucleotide-terminated primer.
DNA polymerases has specific implications in the context of the pol-α catalytic core reaction mechanism. Whereas DNA has a relatively low affinity for Mg2+ and is mostly bound to Na+ in vivo, dNTPs have a much higher affinity for divalent cations and compete effectively for Mg2+. It therefore seems likely that in the cell nucleus, Mg2+ ultimately destined to chelate primer is brought to the pol-α active site by dNTP. After incorporation and translocation, Mg2+ originally bound at the dNTP site remains associated with pol α in the primer-binding site.
E. The Ordered Sequential Terreactant Mechanism of Substrate Recognition and Binding

The ordered sequential terreactant mechanism of substrate recognition and binding by human pol-α catalytic core, shown diagrammatically in Fig. 1 and referred to throughout this section, is presented in more detail in Fig. 7. This representation includes the participation of two pol-α single-stranded DNA (template)-binding sites in orienting pol α at the replication fork, as well as the role of Mg2+ in primer binding. The only functional substrate-binding sites on free pol α recognize single-stranded DNA templates (Fig. 7, A and B). Each molecule of pol-α core protomer apparently possesses two such sites, which interact allosterically. Once bound to template, pol α acquires a functional primer-binding site (Fig. 7, B and C). Presumably, the pol-α core protomer is able to search the single-stranded template until a suitable primer is found. To be recognized as such, the three or four 3'-terminal nucleotides of a primer must be base-paired to template, and the primer must be at least 8 nt long. Binding is through a Mg2+ chelate of the primer phosphodiester backbone. There is considerable flexibility in the chemical structure permissible at the 3' terminus; primers containing 2'-H, 3'-OH; 2'-OH, 3'-OH; 2'-PO4, 3'-OH; and 2'-H, 3'-H are all recognized and bound with approximately equal affinity. In contrast, 3'-PO3-containing primers cannot be bound. Primer binding induces formation on pol α of a functional dNTP-binding site (Fig. 7D). Although this site depends for its activity on the prerequisite step of primer binding, it derives its specificity from the template nucleotide immediately adjacent to the primer terminus. Thus, only the appropriate nucleotide for incorporation, as directed by the DNA template, is bound by the polymerase (Fig. 7E). dNTP is bound by pol α as a Mg2+ chelate.
Once the ternary complex (i.e., pol α·template-primer·correct dNTP) is formed, catalysis ensues, provided that the primer contains a 3'-OH group, which is necessary for nucleophilic attack on the α-phosphate of the incoming dNTP. We have suggested that the Mg2+ ion brought to the ternary pol α·substrate complex by the dNTP remains bound to the polymerase active site
FIG. 7. A representation of the ordered sequential terreactant mechanism of substrate recognition and binding by KB cell pol-α catalytic core enzyme. (A) Free pol α contains only template-binding sites (+); there are two such sites. (B) Pol α bound to primed single-stranded DNA at one of its two template-binding sites. These two sites can interact allosterically, as indicated by the double-headed arrow. A primer-binding site (4) is generated adjacent to the bound template. (C) Pol α positioned within a replication fork by virtue of binding to single-stranded DNA at both template-binding sites. The enzyme is poised to replicate the lagging strand. (D) Binding of primer induces formation of a template-directed dNTP-binding site (0). (E) The dNTP-binding site binds dNTP in a template-directed manner; catalysis ensues.
and goes on to chelate the newly formed primer after the dNMP is incorporated and PPi is released.
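The ordered mechanism, including the dideoxy dead-end complex of Fig. 6, behaves like a small state machine: each binding site becomes available only after the previous ligand is bound, and catalysis requires a primer 3'-OH. The toy model below is a sketch under those assumptions; the class and method names are hypothetical, and the two allosteric template sites, Mg2+ transfer, and translocation are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ORDER = ("template", "primer", "dNTP")  # obligatory binding order


@dataclass
class OrderedPolymerase:
    """Toy model of the ordered sequential terreactant mechanism (hypothetical names)."""
    bound: List[str] = field(default_factory=list)
    primer_3prime: str = ""

    def next_site(self) -> Optional[str]:
        """The only binding site currently available, or None if the complex is full."""
        return ORDER[len(self.bound)] if len(self.bound) < len(ORDER) else None

    def bind(self, ligand: str, primer_3prime: str = "3'-OH") -> bool:
        """Bind a ligand only if its site is the next one in the obligatory order."""
        if ligand != self.next_site():
            return False
        if ligand == "primer":
            if primer_3prime == "3'-PO3":
                return False  # 3'-phosphate primers are not bound (Table I)
            self.primer_3prime = primer_3prime
        self.bound.append(ligand)
        return True

    def catalyze(self) -> bool:
        """Incorporate one dNMP if the ternary complex is assembled and can react."""
        if self.next_site() is not None:
            return False  # ternary complex not yet formed
        if self.primer_3prime != "3'-OH":
            return False  # dead-end complex: no 3'-OH nucleophile (e.g. dideoxy primer)
        self.bound.pop()  # dNTP consumed and PPi released; dNTP site free again
        return True


pol = OrderedPolymerase()
print(pol.bind("dNTP"))  # dNTP cannot bind to free enzyme
print(pol.bind("template"), pol.bind("primer"), pol.bind("dNTP"))
print(pol.catalyze())

trapped = OrderedPolymerase()
for ligand in ORDER:
    trapped.bind(ligand, primer_3prime="3'-H")  # dideoxy-terminated primer
print(trapped.bound, trapped.catalyze())        # stable, catalytically dead complex
```

The dideoxy case is the point of the model: all three substrates bind in order, but the complex cannot turn over, which is the basis of the induced dNTP inhibition experiments described in this chapter.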
II. Holoenzyme of DNA Polymerase α

A. Comparison of Holoenzyme with Catalytic Core Protomer from Human KB Cells

Purification of apparently undegraded pol-α holoenzyme from human KB cells allowed enzymologic comparison between holoenzyme and catalytic core protomer (5). Several aspects of the fundamental mechanism of substrate recognition and binding were tested (5). In all respects, human pol-α
FIG. 7. (Continued)
holoenzyme behaved identically to human pol-α catalytic core protomer. Like the catalytic core protomer, human pol-α holoenzyme is inhibited by single-stranded but not by double-stranded DNA. An example of this inhibition is shown in Fig. 8. Inhibition by single-stranded DNA showed greater than first-order dependence on inhibitor concentration; linear Hill plots with slopes of 1.5–1.6 were obtained (Fig. 9). Human pol-α holoenzyme bound primer as the second substrate and, like the catalytic core protomer, recognized properly base-paired 2',3'-dideoxy-terminated primers. As a result, it was possible to demonstrate induced dNTP inhibition (Fig. 10), thus confirming the ordered sequential terreactant mechanism of substrate recognition and binding by human pol-α holoenzyme. This mechanism was identical to that previously established for the isolated catalytic core (see Figs. 1, 6, and 7). The role of Mg2+ in template-primer recognition by human pol-α holoenzyme was elucidated by steady-state kinetic analyses. With the appropriate substrate, (dA)m-(dT)z, Mg2+ was shown to be a highly cooperative competitive inhibitor of primer binding. Linear Hill plots with slopes of 3.9
FIG. 8. Inhibition of human pol-α holoenzyme by single-stranded DNA. Concentration of competitive inhibitor (in nucleotide) and DNA polymerase activity are as shown. Competitor DNAs were supercoiled circular duplex pBR322 DNA (+), DNase-I-nicked relaxed circular duplex pBR322 DNA (a), blunt-ended duplex pBR322 DNA fragments generated with HaeIII restriction endonuclease (A), and single-stranded circular φX174 DNA (W).
were obtained (Fig. 11). As with the human pol-α catalytic core protomer, this has been interpreted to indicate that human pol-α holoenzyme binds primer through a Mg2+ chelate of the primer phosphodiester backbone; we have suggested that four Mg2+ ions and eight phosphodiester phosphates are involved.
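Fig. 10 presents the induced dNTP inhibition data as Lineweaver-Burk (double-reciprocal) plots. As a reminder of how such plots are read, the sketch below fits a straight line to 1/v versus 1/[S] and recovers Km and Vmax from the slope and intercept. The data are synthetic Michaelis-Menten rates (hypothetical Km = 10, Vmax = 100), not the data of Fig. 10; a purely competitive inhibitor is simulated by tripling Km, which changes the slope but leaves the intercept, and hence Vmax, unchanged.

```python
def lineweaver_burk(s, v):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax by ordinary least squares.

    Returns (Km, Vmax) recovered from the fitted slope and intercept.
    """
    xs = [1.0 / x for x in s]
    ys = [1.0 / y for y in v]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    vmax = 1.0 / intercept
    return slope * vmax, vmax


def michaelis_menten(s, vmax, km):
    return vmax * s / (km + s)


substrate = [5.0, 10.0, 20.0, 40.0, 80.0]  # hypothetical concentrations
control = [michaelis_menten(x, 100.0, 10.0) for x in substrate]
inhibited = [michaelis_menten(x, 100.0, 30.0) for x in substrate]  # Km tripled

for label, rates in (("control", control), ("competitive inhibitor", inhibited)):
    km, vmax = lineweaver_burk(substrate, rates)
    print(label, round(km, 2), round(vmax, 2))
```

Unweighted least squares on reciprocals is the simplest choice; with noisy data it overweights the low-rate points, so weighted or direct nonlinear fits are often preferred.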
B. Comparison of Human Holoenzyme with Holoenzyme Purified from Drosophila melanogaster Embryos

To demonstrate that the enzymologic mechanism elucidated for human pol-α catalytic core and confirmed for human pol-α holoenzyme was applicable to pol α obtained from a different organism, pol-α holoenzyme was purified from Drosophila melanogaster embryos. Drosophila was chosen as a higher eukaryote of considerable evolutionary distance from humans; moreover, a conventional biochemical protocol for purification of Drosophila pol-α holoenzyme was available (13). Like human pol-α catalytic core and holoenzyme, Drosophila pol-α holoenzyme follows an ordered sequential terreactant mechanism for substrate
[Fig. 9 plots: (A) Dixon plot against [φX174 DNA] (µM); (B) Hill plot]
FIG. 9. Inhibition of human pol-α holoenzyme by single-stranded DNA exhibits positive cooperativity; Dixon and Hill plot analysis. (A) Dixon plot (V^-1 versus I) analysis of inhibition by single-stranded circular φX174 DNA. Three different concentrations (in nucleotide) of activated DNA substrate were used: 60 µM (O), 120 µM (A), and 240 µM (m). (B) Hill plot analysis of the data shown in A. Symbols are the same as in A. V0, velocity of incorporation in the absence of inhibitor DNA; Vi, velocity of incorporation in the presence of inhibitor DNA.
recognition and binding: template is bound first, followed by an appropriately base-paired primer and then template-directed dNTP (14). The demonstration of induced dNTP inhibition in the presence of a base-paired 2',3'-dideoxy-terminated primer was crucial to the elucidation of this mechanism. A detailed study has recently been completed of the effects of primer mismatch on binding and dNMP incorporation by Drosophila pol-α holoenzyme on synthetic oligonucleotide template-primers (15). Its results were consistent with, and refine, previous studies of human pol α. It
[Fig. 10 plots: (A) V^-1 versus [dGTP]^-1; (B) V^-1 versus [dNTP]^-1 (each dNTP); concentrations in mM]
FIG. 10. Induced substrate inhibition of human pol-α holoenzyme in the presence of a correctly base-paired 2',3'-dideoxy-terminated primer. Data are shown as Lineweaver-Burk double-reciprocal plots (V^-1 versus S^-1). All nucleic acid concentrations are expressed in terms of nucleotide. (A) Nucleic acid substrate, (dC)m-(dG)z, was at a final concentration of 20 µM. dGTP concentrations and DNA polymerase activity, as measured by cpm [32P]dGMP incorporated, were as indicated; nucleic acid substrate alone (0); nucleic acid substrate plus 2 µM (dC)m-(dG)z-(ddA, ddT, or ddC) (A); nucleic acid substrate plus 2 µM (dC)m-(dG)z-(ddG) (m). (B) Nucleic acid substrate was activated DNA at a final concentration of 50 µM. dNTP concentrations and DNA polymerase activity, as measured by cpm [32P]dTMP incorporated, were as indicated; activated DNA alone (0); activated DNA plus 20 µM dideoxy-primed single-stranded DNA template molecules (mean length of template plus primer was approximately 75 nucleotides) (A).
FIG. 11. Inhibition of human pol-α holoenzyme-primer binding by Mg2+ exhibits positive cooperativity; Hill plot analysis. Nucleic acid substrate, (dA)m-(dT)z, was at a final concentration of 80 µM (nucleotide). The slope was computed from the extended linear portion of the plot as indicated by the solid line. See legend to Fig. 9B.
was demonstrated that Drosophila pol-α holoenzyme requires primer-terminal complementarity of at least 4 bp for efficient binding and incorporation. When a mismatched base pair was present at the -4 position relative to the 3'-primer terminus, only slight binding occurred. This was consistent with the ability of Drosophila pol-α holoenzyme to incorporate a single nucleotide on a template-primer containing a mismatch at this position, but at only 7% of the rate on a perfectly matched template-primer. Despite the many similarities, several aspects of the enzymologic mechanism elucidated for human pol α remain to be tested with the Drosophila pol-α holoenzyme. Two questions stand out. Is inhibition of Drosophila pol-α holoenzyme by single-stranded DNA cooperative? What is the role of Mg2+ in template binding and primer binding? In these contexts, it may also be useful to perform detailed enzymologic analyses of a lower eukaryotic pol-α homologue, for example, Saccharomyces cerevisiae pol I.
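The 4-bp requirement can be expressed as a count of consecutive Watson-Crick pairs running back from the 3'-primer terminus. In this sketch the terminal nucleotide is counted as position -1, so a mismatch at -4 leaves three terminal pairs; that numbering convention, the threshold parameter, and the function names are assumptions for illustration. The example duplex is the 12-bp primer region of the 30-12-mer template-primer used in Section III (Fig. 12).

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def terminal_pairs(primer_5to3: str, template_3to5: str) -> int:
    """Count consecutive Watson-Crick pairs counting back from the primer 3' end.

    Both strings cover the duplex region column by column, so that
    primer_5to3[i] sits opposite template_3to5[i].
    """
    if len(primer_5to3) != len(template_3to5):
        raise ValueError("duplex strands must be the same length")
    count = 0
    for p, t in zip(reversed(primer_5to3), reversed(template_3to5)):
        if (p, t) not in PAIRS:
            break
        count += 1
    return count


def binds_efficiently(primer_5to3: str, template_3to5: str, required: int = 4) -> bool:
    """Hypothetical rule: efficient binding needs >= `required` terminal base pairs."""
    return terminal_pairs(primer_5to3, template_3to5) >= required


template = "CTTAAGTTCGAA"  # template duplex region, written 3'->5'
matched = "GAATTCAAGCTT"   # fully complementary primer, written 5'->3'
mismatch_minus4 = matched[:-4] + "T" + matched[-3:]  # G at -4 replaced by T

print(terminal_pairs(matched, template), binds_efficiently(matched, template))
print(terminal_pairs(mismatch_minus4, template), binds_efficiently(mismatch_minus4, template))
```

A yes/no threshold is a simplification: the text reports slight binding and about 7% incorporation with a -4 mismatch, not complete loss.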
III. Interaction of DNA Polymerase α with Template-Primers Containing Chemically Damaged Nucleotides
A. Interaction with Template-Primers Containing Abasic Sites

Takeshita et al. (16) chemically synthesized oligodeoxyribonucleotides containing modified tetrahydrofuran moieties inserted into the phosphodiester backbone in place of normal deoxyribonucleotides. They suggested that such sites were structurally analogous to abasic (base-removed) sites arising naturally in DNA. Abasic sites in DNA have been proposed as common intermediates in many pathways of chemical mutagenesis (17, 18). To study the interaction of Drosophila pol-α holoenzyme with abasic-site-containing template-primers, a series of oligonucleotides, 30-mer templates and 12-mer primers, containing abasic residues at various defined positions along their lengths, was synthesized (14). A control 30-mer and a control 12-mer, both lacking abasic residues, were synthesized as well. After a complementary 12-mer primer was annealed to one end of the 30-mer template, abasic sites were located at various defined positions in both template and primer regions. A single 30-12-mer contained at most one such site. The structure of the 30-12-mer template-primer is shown in Fig. 12; the possible locations of abasic residues are indicated by asterisks. Two aspects of Drosophila pol-α holoenzyme interaction with abasic-site-containing template-primers were studied. First, the ability of the enzyme to incorporate a single template-directed nucleotide on each available 30-12-mer was determined. To make such determinations, a PAGE-dependent assay was developed. After incubation of enzyme with various template-primers and [α-32P]dGTP, reaction products were analyzed by PAGE in the presence of 7 M urea. After electrophoresis and autoradiography, regions of the gel containing radiolabeled 13-mer reaction product were excised and
5' CTGCAGAGATCTGTCGACAAGCTTGAATTC 3'
                  3' TTCGAACTTAAG 5'
FIG. 12. Structure of the 30-12-mer template-primer used to study the interaction of Drosophila pol-α holoenzyme with oligonucleotides containing model abasic sites. The possible positions of abasic sites are indicated by asterisks. Each template-primer contained, at most, one such site.
incorporation was quantified by liquid scintillation counting. To facilitate quantification, odd-numbered gel lanes were loaded first, followed by even-numbered gel lanes 1 hour after the start of electrophoresis. This staggered loading allowed bands to be excised after electrophoresis and autoradiography without concern for overlap of radioactivity between adjacent lanes. An example of autoradiographic data from such an experiment is shown in Fig. 13. The results of incorporation studies (14) indicated that, relative to incorporation opposite a normal template nucleotide, Drosophila pol-α holoenzyme is essentially unable to catalyze nucleotide incorporation opposite an abasic template residue. This observation was consistent with reports of others (19; see also 16) suggesting that incorporation opposite an abasic site is about 1/4000th as efficient as incorporation opposite a normal template nucleotide. Abasic residues on either strand of the 30-12-mer in the primer region of the template-primer construct, as far as 4 bp removed from the 3'-primer terminus, prevented detectable incorporation opposite a normal template nucleotide at that primer terminus. In contrast, abasic residues in the primer region further than 4 nt removed from the primer terminus had relatively
FIG. 13. PAGE-dependent assay of single-nucleotide incorporation on 30-12-mer template-primer. Denaturing PAGE analysis was performed according to standard protocols (see 14 for original references). Odd-numbered lanes were loaded first and even-numbered lanes were loaded 1 hour later. The migration positions of the 13-mer reaction product are as indicated to the right of the figure (e13 and o13). DNA concentrations are expressed in terms of molecules of 30-12-mer initially present; lane a, no DNA substrate; lane b, 0.25 µM DNA substrate; lane c, 0.50 µM DNA substrate; lane d, 1.0 µM DNA substrate; and lane e, 2.0 µM DNA substrate.
little effect on nucleotide incorporation by Drosophila pol-α holoenzyme opposite a normal template nucleotide. With one exception, abasic residues in the template other than at the site opposite the primer terminus also had relatively little effect on nucleotide incorporation by Drosophila pol-α holoenzyme opposite a normal template nucleotide. The exception was that when an abasic residue was placed in the template region close to, but not at, the 3'-primer terminus, incorporation opposite a normal template nucleotide at that primer terminus was apparently stimulated three- to fourfold. Induced dNTP inhibition experiments were performed in conjunction with nucleotide incorporation studies (14). The advantage of induced dNTP inhibition experiments is that they provide a direct measure of the pol-α template-primer binding interaction, independent of catalysis by the enzyme. Results of these experiments indicate that substrates that did not support catalysis by pol α in direct nucleotide incorporation experiments could not bind polymerase, thus indicating that lack of incorporation was due to lack of binding. Additionally, the substrate on which incorporation was apparently stimulated three- to fourfold, i.e., that containing an abasic template residue near to but not opposite the 3'-primer terminus, was bound similarly to the control template-primer. This suggests that enhanced incorporation on the former substrate resulted from enhancement of the rate of catalysis (kcat) and not enhanced binding of the substrate by pol α.
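The use of [α-32P]dGTP in the single-nucleotide assay follows from the sequence in Fig. 12: the first unpaired template base beyond the 12-mer primer is C, at template position 18. The sketch below locates the annealed primer on the 30-mer and reports that position and the dNTP required; the function names are illustrative, and it assumes a fully matched primer that anneals at a single site. Applied to the 21-mer primer shown in Fig. 14, annealed to the same 30-mer, it gives dTTP.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def next_incorporation(template_5to3: str, primer_5to3: str):
    """Return (1-based template position copied next, dNTP required).

    Assumes a fully matched primer that anneals once, leaving a 5' template overhang.
    """
    start = template_5to3.find(revcomp(primer_5to3))  # 0-based start of duplex
    if start < 1:
        raise ValueError("primer not found, or no template overhang to copy")
    base = template_5to3[start - 1]  # first unpaired template base (1-based position: start)
    return start, "d" + base.translate(COMPLEMENT) + "TP"


TEMPLATE_30 = "CTGCAGAGATCTGTCGACAAGCTTGAATTC"  # Fig. 12 template, 5'->3'
PRIMER_12 = "TTCGAACTTAAG"[::-1]               # printed 3'->5' in Fig. 12
PRIMER_21 = "AGACAGCTGTTCGAACTTAAG"[::-1]      # printed 3'->5' in Fig. 14

print(next_incorporation(TEMPLATE_30, PRIMER_12))  # (18, 'dGTP')
print(next_incorporation(TEMPLATE_30, PRIMER_21))  # (9, 'dTTP')
```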
B. Interaction with Template-Primers Containing the Exocyclic Adduct 1,N2-Propanodeoxyguanosine
Studies similar to those performed with abasic-site-containing substrates were performed with template-primers containing the exocyclic adduct 1,N2-propanodeoxyguanosine (PdG) (15). Virtually identical results were obtained. Synthetic oligonucleotides containing PdG residues at various defined sites were prepared according to Kouchakdjian et al. (20). Both direct incorporation and induced dNTP inhibition assays determined that Drosophila pol-α holoenzyme will not bind, and hence cannot catalyze nucleotide incorporation, when a PdG residue is present in the primer region at the -4 position or closer relative to the 3'-primer terminus. PdG residues further than 4 nt from the 3'-primer terminus had relatively little effect on either binding or incorporation. When a PdG residue was located in the template region, near to but not at the 3'-primer terminus, single-dNMP incorporation opposite a normal template nucleotide at that primer terminus was stimulated three- to fourfold. Results obtained with template-primers containing PdG residues in the primer region are similar to those seen with both abasic-site-containing
template-primers and mismatch-containing template-primers. Together, they suggest that when pol α encounters either a noninformational abasic site or an exocyclic PdG adduct in the template, it treats the lesion as a residue for which no complementary nucleotide can be found. As such, incorporation stops until the lesion is repaired. That a relatively bulky exocyclic adduct like PdG has the same effect on primer recognition by pol α as the noninformational abasic site suggests that, rather than recognizing base damage per se, pol α has a stringent requirement for a correctly base-paired primer. Any perturbation of that structure, whether by base mismatch or by a modified nucleotide or lesion for which no normal complement exists, results in abrogation of primer binding and cessation of catalysis. Results obtained with template-primers containing PdG residues in the primer region, in conjunction with those obtained with abasic-site-containing primers and mismatch-containing primers, also have specific implications for the fidelity of nucleotide incorporation by pol α (14, 15). When either an abasic site or a PdG residue is encountered in the template, pol α pauses, as these lesions are apparently mechanistically equivalent to a template nucleotide for which no normal complement exists. Indeed, the efficiency with which pol α incorporates a nucleotide opposite an abasic site is similar to the efficiency of base misincorporation (19). In the unlikely event that pol α does "misincorporate," the product of misincorporation is then recognized by the polymerase as a mismatched primer, and further catalysis does not occur. The fact that each of the first four primer nucleotides must be base-paired for subsequent incorporation to proceed at normal rates means that immortalization of a misincorporated nucleotide actually requires that five unlikely events occur. Even if the probability of each of these events is as high as 10⁻², the probability of all five occurring would be only 10⁻¹⁰.
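One reading of this estimate: the five events are the misincorporation itself followed by four successive extensions, each from a primer that still carries the mismatch within its terminal 4 bp. Treating these events as independent, each with probability at most p (an assumption of this sketch), the bound is:

```latex
P_{\text{fixed}} \;=\; p_{\text{mis}} \prod_{k=1}^{4} p_{\text{ext},k} \;\le\; p^{5} \;=\; \left(10^{-2}\right)^{5} \;=\; 10^{-10}
```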
C. Insights into the Mechanism of Polymerase Incorporation from Studies of Templates Containing Chemically Damaged Bases

The presence of either an abasic site or a PdG residue in the template, near to but not opposite the 3'-primer terminus, leads to a three- to fourfold stimulation of single-nucleotide incorporation opposite a normal template nucleotide. We have suggested a possible mechanism for this effect (14, 15). Human pol-α catalytic core interacts with between 5 and 10 template nucleotides (11). A plausible explanation for the stimulation of incorporation by damaged template nucleotides located near to but not opposite primer termini is that, to incorporate, the polymerase must translocate. To translocate, the polymerase must release each template nucleotide with which it interacts initially and re-form a new set of enzyme-template bonds. Weakening (or
eliminating) one of these bonds by placing a chemically modified residue in the template might therefore facilitate translocation and thereby enhance the rate of nucleotide incorporation at a nearby primer site. This hypothesis is currently being tested.
IV. DNA Polymerase δ

It is now certain that pol α and pol δ are two distinct proteins. Initially, this conclusion was based on two-dimensional peptide map analysis (21), immunochemical comparison (21, 22), and a variety of enzymologic studies. More recently, distinct putative pol-δ sequences have been deduced from cDNA clones isolated from both human and bovine sources (23, 24). Most notable among the enzymologic differences between pol α and pol δ are the specific association of a DNA primase activity with the pol-α holoenzyme, the specific association of a putative proofreading 3'→5' exonuclease with the pol-δ catalytic core, and the response of pol δ to the auxiliary factor proliferating cell nuclear antigen (PCNA). Purification of pol-δ catalytic core to apparent homogeneity from calf thymus (25; see also 26) made possible detailed enzymologic characterization (26). In performing these experiments, we were guided by previous analyses of pol α, reasoning that this would facilitate direct comparison between the two major replicative DNA polymerases in higher eukaryotes. We also felt that such characterization was a necessary prerequisite to understanding the mechanism whereby pol δ is stimulated by its specific auxiliary factor, PCNA. For this purpose, PCNA was also purified from calf thymus (25) and compared with a PCNA homologue purified from Drosophila embryos (27) as well as with clone-encoded PCNA molecules expressed in Escherichia coli.
A. The Ordered Sequential Terreactant Mechanism of Substrate Recognition and Binding

Like both human pol α and Drosophila pol α, calf thymus pol-δ catalytic core follows an ordered sequential terreactant mechanism of substrate recognition and binding (see Fig. 1); like pol α, pol δ binds template first, followed by primer and then template-directed dNTP (26). Template binding was demonstrable both by enzyme inhibition studies and by direct semiquantitative sedimentation binding analyses. Of all the DNA molecules tested (Table II), including single-stranded circular DNA, supercoiled double-stranded circular DNA, relaxed double-stranded circular DNA, and
TABLE II
EFFECT OF DNA STRUCTURE ON COMPETITIVE INHIBITION OF POL δ

DNA structure(a)                                 Inhibition of pol δ
ss circular                                       Yes
ds circular (supercoiled)                         No
ds circular (relaxed)                             No
Blunt-ended ds fragments (3'-OH, 5'-PO4)          No

(a) ss, single-stranded; ds, double-stranded.
blunt-ended linear duplex fragments, only single-stranded circular DNA inhibited pol δ when the activity of this enzyme was assayed on an activated DNA substrate. Consistent with this, in direct semiquantitative sedimentation binding experiments free pol δ was able to bind only single-stranded DNA; it was unable to bind either supercoiled or relaxed circular duplex DNA. Primer binding was demonstrated specifically in the context of correct dNTP binding: induced dNTP inhibition occurred in the presence of a base-paired 2',3'-dideoxy-terminated primer (26). This allowed us to establish the ordered sequential terreactant mechanism of substrate recognition and binding, with primer binding as the second step, followed by template-directed dNTP binding. Several additional aspects of primer binding that have been studied with pol α remain to be investigated with pol δ. These include the ability of pol δ to bind and/or utilize primer termini with chemical structures other than 2'-H, 3'-OH (a conventional deoxynucleotide primer) and 2'-H, 3'-H (a dideoxynucleotide primer); the role of Mg2+ in primer binding; and the ability of pol δ to bind and/or utilize primers containing mismatched or unmatched bases at various defined positions. In conjunction with this last aspect of the pol-δ mechanism, it may be particularly interesting to investigate the capacity of the pol-δ-associated 3'→5' exonuclease to excise mismatched primer-terminal nucleotides.
B. Mechanism of Stimulation by Proliferating Cell Nuclear Antigen

Having established that the pol-δ catalytic core follows an ordered sequential terreactant mechanism similar to that elucidated for pol α (26), we next sought to determine the enzymologic mechanism whereby pol δ is stimulated by its auxiliary factor, PCNA. Steady-state kinetic analyses were performed with two different substrates, (dA)m-(dT)z and 30-21-mer, a
392
PAUL A. FISHER
10
2o
30
5’ CTGCAGAGATCTGTCGACAAGCTTGAATTC 3‘ 3’ AGACAGCTGTTCGAACTTAAG 5’ 20
10
FIG. 1-1. Structure of (d.4)m-(dT), and 30-21-mer template-primers used to elucidate the Ineclianism of pol-6 stimulation by PCNA.
heteropolymeric synthetic oligonucleotide template-primer. The structures of both substrates are shown in Fig. 14. When either clone-encoded human PCNA expressed in and purified from E . coli or authentic PCNA purified from calf thymus was added to incubations with calf thymus pol 6 and various DNA substrates, several effects were noted. On (dA)m-(dT)E (Fig. 14), a substrate on which processive DNA synthesis was possible, PCNA, as expected (see, e.g., 25), stimulated greatly the processivity of calf thymus pol 8. Results of steady-state kinetic analyses suggest that this is due both to an increased af6nity of pol 6 for template-primer reflected by a dramatic decrease in apparent &,,and to a small but reproducible increase in the rate of individual nucleotide incorporation (kcat) as reflected by an increased Vmu. To simplify the steady-state kinetic analysis, similar experiments were performed using a synthetic 30-21-mer template-primer (Fig. 14) in place of (dA)T-,-(dT)-,. By including in the incubation only dITP, the dNTP complementary to the template nucleotide on the 30-mer opposite the 3’-OH group of the 21-mer primer, it was possible to study the effects of PCNA on single-nucleotide incorporation by pol 6, without the kinetic cornplexity of processive DNA synthesis. In experiments with 30-21-mer, it was found (26) that PCNA increases V,,,,, substantially, thus confirming the impression that PCNA enhances the rate of single-nucleotide incorporation (kcS,Jby pol 6. In fact, the apparent K,,,of pol 6 for 30-21-mer in these experiments actually increased in response to addition of PCNA. We suggested that this increase in apparent K,, resulted from destabilization of the enzyme-template-primer complex brought about by increased catalysis (kcat)and not decreased affinity of pol 6 for the 30-21-mer in the presence of PCNA (26).
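The argument at the end of Section IV,B, that a higher apparent Km can reflect faster catalysis rather than weaker binding, follows from the Briggs-Haldane form of the Michaelis constant, Km = (k_off + kcat)/k_on, compared with the dissociation constant Kd = k_off/k_on. A minimal Python sketch, assuming a single-substrate scheme E + S ⇌ ES → E + P (a deliberate simplification of the ordered terreactant mechanism) and purely hypothetical rate constants, shows how raising kcat alone raises both Vmax and Km while Kd stays fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleEnzyme:
    """Briggs-Haldane scheme E + S <-> ES -> E + P with illustrative constants."""

    k_on: float   # association rate constant (assumed units: per uM per s)
    k_off: float  # dissociation rate constant of ES (per s)
    k_cat: float  # catalytic rate constant (per s)

    @property
    def kd(self) -> float:
        """Equilibrium dissociation constant of ES: reflects affinity only."""
        return self.k_off / self.k_on

    @property
    def km(self) -> float:
        """Michaelis constant: also counts the catalytic exit route from ES."""
        return (self.k_off + self.k_cat) / self.k_on

    def vmax(self, enzyme_total: float) -> float:
        return self.k_cat * enzyme_total

    def rate(self, substrate: float, enzyme_total: float) -> float:
        """Steady-state velocity v = Vmax [S] / (Km + [S])."""
        return self.vmax(enzyme_total) * substrate / (self.km + substrate)


# Hypothetical numbers: the "+PCNA" case changes k_cat only; binding is unchanged.
minus_pcna = SimpleEnzyme(k_on=1.0, k_off=0.5, k_cat=0.5)
plus_pcna = SimpleEnzyme(k_on=1.0, k_off=0.5, k_cat=2.0)

for label, enz in (("-PCNA", minus_pcna), ("+PCNA", plus_pcna)):
    print(f"{label}: Kd={enz.kd:.2f} Km={enz.km:.2f} Vmax={enz.vmax(1.0):.2f}")
```

With these made-up constants, Kd stays at 0.50 in both cases while Km rises from 1.00 to 2.50 and Vmax from 0.50 to 2.00: faster turnover drains the ES complex, which raises the apparent Km even though affinity is unchanged.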
MECHANISM OF REPLICATIVE DNA POLYMERASES
C. A Nondenaturing PAGE-band Mobility Shift Assay to Study the Interaction of Pol δ, PCNA, and Synthetic Oligonucleotide Template-Primers
Further insights into the mechanism of pol-δ stimulation by PCNA were obtained from results of a novel PAGE-band mobility shift assay (28). An example of data from such an assay is shown in Fig. 15. This technique allowed us to study directly the effect of PCNA addition on the ability of pol δ to bind the 30-21-mer without having to resort to sedimentation binding analysis. Results obtained from these assays indicated that either clone-encoded human PCNA or authentic calf thymus PCNA (i.e., homologous PCNA) can promote stable complex formation between calf thymus pol δ and 5′-32P-labeled 30-21-mer. In contrast, neither clone-encoded nor authentic Drosophila PCNA (i.e., heterologous PCNA) did so.
FIG. 15. Formation of a PAGE-detectable pol δ·PCNA·template-primer complex. Formulation of incubations and nondenaturing PAGE were as previously described (28). Each incubation contained 0.1 pM (molecules) of 30-21-mer (see Fig. 14) labeled at both 5′ ends with 32P. Lane a, 30-21-mer alone; lane b, 30-21-mer plus calf thymus PCNA; lane c, 30-21-mer plus calf thymus pol δ; and lane d, 30-21-mer plus calf thymus pol δ plus calf thymus PCNA. The arrow to the left of the figure indicates the migration position of free 30-21-mer; the arrow to the right indicates the migration position of the pol δ·PCNA·(30-21-mer) complex.
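Although Fig. 15 is interpreted qualitatively, band-shift gels of this kind are commonly quantified by densitometry or phosphorimaging. A minimal Python sketch, assuming that band signal is linear in the amount of 32P-labeled DNA and that a single background value applies to both bands, computes the fraction of labeled template-primer shifted into the complex. The function name and all numbers below are hypothetical and are not data from Fig. 15.

```python
def fraction_shifted(free_signal: float, complex_signal: float, background: float = 0.0) -> float:
    """Fraction of labeled 30-21-mer found in the shifted (protein-bound) band of one lane.

    Assumes band intensity is proportional to the amount of labeled DNA and that
    the same background is subtracted from both bands (negative values clipped to 0).
    """
    free = max(free_signal - background, 0.0)
    bound = max(complex_signal - background, 0.0)
    total = free + bound
    if total == 0.0:
        raise ValueError("no signal above background in this lane")
    return bound / total


# Hypothetical densitometry values (arbitrary units): (free band, complex band).
lanes = {
    "c (pol delta only)": (980.0, 20.0),
    "d (pol delta + PCNA)": (600.0, 400.0),
}
for lane, (free, bound) in lanes.items():
    print(f"lane {lane}: {fraction_shifted(free, bound, background=20.0):.2f} shifted")
```

With these invented intensities, lane c reports no shifted material and lane d about 0.40; real analyses would also need to check that signal is within the linear range of the detector.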
The differing abilities of homologous versus heterologous PCNA to promote PAGE-stable complex formation between calf thymus pol δ and 30-21-mer correlate well with the differing capacities of homologous versus heterologous PCNA to stimulate the activity of calf thymus pol δ on (dA)m·(dT)n (28), and are consistent with the limited ability of Drosophila PCNA to substitute for human PCNA in the reconstituted SV40 DNA replication system (27). A model for the ordered sequential interaction of pol δ, PCNA, and template-primers (Fig. 16) has been proposed (28). We have also demonstrated a stable calf thymus pol δ·PCNA·oligonucleotide complex after electrophoresis on nondenaturing agarose gels (M. McConnell and P. A. Fisher, unpublished). A complex of almost identical mobility was detected in extracts of Drosophila eggs and early embryos but not in extracts prepared from older embryos (S. Bogachev and P. A. Fisher, unpublished).
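The working model of Fig. 16 can be written as a small directed graph of binding steps. The Python sketch below is an illustrative encoding, not taken from ref. 28: state names such as "pol.T-P" are hypothetical shorthand for pol δ complexes. It assumes, as the Fig. 16 caption proposes but does not establish, that PCNA joins only after template-primer is bound, and it further assumes that PCNA joins before dNTP, which the caption does not specify. A breadth-first search then lists the binding orders that reach a dNTP-containing complex poised for incorporation.

```python
from collections import deque

# Hypothetical encoding of the Fig. 16 working model: state -> {ligand: next state}.
# "pol" stands for pol delta; T = template, T-P = template-primer.
# Assumptions: PCNA binds only after T-P is bound, and only before dNTP.
TRANSITIONS: dict[str, dict[str, str]] = {
    "pol": {"template": "pol.T"},
    "pol.T": {"primer": "pol.T-P"},
    "pol.T-P": {"PCNA": "pol.PCNA.T-P", "dNTP": "pol.T-P.dNTP"},
    "pol.PCNA.T-P": {"dNTP": "pol.PCNA.T-P.dNTP"},
}


def binding_paths(start: str = "pol") -> list[list[str]]:
    """Breadth-first enumeration of ligand-addition orders that end with dNTP bound."""
    paths: list[list[str]] = []
    queue = deque([(start, [])])
    while queue:
        state, ligands = queue.popleft()
        if state.endswith("dNTP"):
            paths.append(ligands)  # complex poised for incorporation
            continue
        for ligand, next_state in TRANSITIONS.get(state, {}).items():
            queue.append((next_state, ligands + [ligand]))
    return paths


for path in binding_paths():
    print(" -> ".join(path))
```

Running it prints two orders, template -> primer -> dNTP and template -> primer -> PCNA -> dNTP, matching the caption's statement that incorporation can start from either the pol δ·template-primer or the pol δ·PCNA·template-primer complex. Adding a "PCNA" edge from "pol" is one way to explore the alternative that the caption says cannot be ruled out.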
V. Conclusions and Prospects for Future Research
Pol α and pol δ follow similar ordered sequential terreactant mechanisms of substrate recognition and binding. For both polymerases, the first substrate bound is template. This observation has two clear implications for in vivo replication. First, to initiate DNA synthesis, both enzymes would require single-stranded DNA template (i.e., the first substrate). Second, neither pol α nor pol δ should be able to fill a gap efficiently; once template size was reduced below the necessary minimum, both enzymes would presumably lack the first substrate and would dissociate from the DNA. The first prediction, regarding initiation, has been largely substantiated by results obtained with the reconstituted SV40 DNA replication system (see, e.g., 29-31). Current models propose that DNA synthesis on both leading and lagging strands is initiated by unwinding of the DNA duplex in the region of the replication origin. Unwinding generates single-stranded DNA template (i.e., the first substrate for pol α) and is followed by primer synthesis catalyzed by the pol-α holoenzyme-associated DNA primase. At the same time, pol δ, which also recognizes single-stranded template DNA as its first substrate, can bind to DNA while awaiting primer. The second prediction, regarding in vivo gap-filling by pol α and pol δ, remains untested. In this context, it is tempting to speculate that pol ε is the "gap-filling" polymerase. One would intuitively expect the requirement for gap-filling to be much more apparent during lagging-strand replication, although there might be a certain minimal requirement for gap-filling on the leading strand as well. Detailed enzymologic analysis of pol ε could shed immediate light on these hypotheses. Primer represents the second substrate for both pol α and pol δ. From a purely mechanistic perspective, therefore, both leading and lagging strands could be elongated by either pol α or pol δ. Here, however, the intrinsic mechanism of the polymerase reaction becomes subject to specific modification by polymerase-associated factors. Hence, the inherently discontinuous nature of lagging-strand synthesis, with its frequent need for repriming, would make pol-α holoenzyme, with its tightly associated DNA primase, the logical choice for lagging-strand replication. Conversely, priming on the leading strand need only occur once.

[Fig. 16: reaction scheme relating pol δ, template (T), primer (P), PCNA, and dNTP; graphic not reproduced]
FIG. 16. The ordered sequential mechanism of substrate recognition and binding by calf thymus pol δ in combination with PCNA: a working hypothesis. Incorporation can occur starting with either the pol δ·template-primer complex or the pol δ·PCNA·template-primer complex. Although there is no evidence for interaction of pol δ with PCNA independently of template-primer, neither can such an interaction be ruled out. The model proposes that pol δ cannot interact with PCNA before binding to template-primer. However, this remains hypothetical.
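The gap-filling prediction can be made concrete with a toy model. This Python sketch assumes, purely for illustration, a fixed minimum length of single-stranded template (the hypothetical parameter `min_template_nt`) that the enzyme must bind to stay engaged, with each incorporation consuming one template nucleotide; the value 8 used below is arbitrary, not a measured requirement for pol α or pol δ.

```python
def fill_gap(gap_nt: int, min_template_nt: int) -> int:
    """Nucleotides incorporated into a gap before the enzyme loses its first substrate.

    Toy model (assumption): the polymerase stays engaged only while at least
    `min_template_nt` unpaired template nucleotides remain; each incorporation
    converts one of them to duplex.
    """
    if gap_nt < 0 or min_template_nt < 1:
        raise ValueError("gap_nt must be >= 0 and min_template_nt >= 1")
    remaining = gap_nt
    filled = 0
    while remaining >= min_template_nt:  # template (first substrate) still available
        remaining -= 1                   # one dNMP added opposite the template
        filled += 1
    return filled  # the enzyme dissociates; `remaining` nucleotides stay unfilled


for gap in (3, 8, 20):
    print(gap, fill_gap(gap, min_template_nt=8))
```

Under this assumption, a 3-nt gap is not filled at all, an 8-nt gap receives one nucleotide, and a 20-nt gap receives 13, so min_template_nt − 1 nucleotides always remain for some other activity (for example, the speculated gap-filling polymerase) to complete.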
As soon as primer and template are available, pol δ, in conjunction with associated PCNA and perhaps other replication factors (e.g., RF-C), would be a logical candidate for leading-strand replication. Again, results in the reconstituted SV40 DNA replication system support such notions (29, 30). The requirement of pol α, an enzyme apparently lacking an associated proofreading 3′→5′ exonuclease, for primer-terminal complementarity has explicit implications for the fidelity of DNA synthesis by this enzyme. These have been discussed extensively (14, 15; see also Section III,B). So too have the implications of the observation that human pol α apparently contains two interacting single-stranded DNA (template)-binding sites (5, 6, 11; see also Sections I,A and II,A). In the present context, it is important to emphasize that these two template-binding sites have been observed only during enzyme inhibition studies. Simultaneous polymerase activity at both sites therefore seems highly unlikely. Rather, the second template-binding site on
pol α may serve to align the enzyme at the replication fork and ensure efficient lagging-strand DNA synthesis. In contrast with pol α, pol δ has an associated 3′→5′ exonuclease activity, presumably involved in proofreading. Currently, there is little understanding of how the polymerase and exonuclease activities of pol δ act coordinately, if indeed they do. Detailed enzymologic analyses of interactions among pol δ, PCNA, and synthetic oligonucleotides of varying primary and secondary structures will likely shed considerable light on these and related questions. Such analyses are under way. There is now little doubt that a Drosophila pol δ exists. This conclusion is based on the unequivocal identification of Drosophila PCNA (27, 32); the reported detection in embryos of a DNA polymerase activity physically distinct from pol α and apparently responsive to PCNA (33); and the identification in Drosophila oocyte and early embryo extracts, by nondenaturing gel-electrophoresis-band mobility shift assay (28), of a protein-nucleic acid species with mobility nearly identical to that of the calf thymus pol δ·PCNA·template-primer complex (see Section IV,C). The complete purification of pol δ from Drosophila, a higher eukaryote amenable to systematic genetic manipulation, would permit in vivo analysis of pol-δ function. Nondenaturing gel-electrophoresis-band mobility shift assays may greatly facilitate this purification.
ACKNOWLEDGMENTS
It is a pleasure to acknowledge many colleagues and collaborators for contributing to the research described herein. These include S. Bogachev, K. Downey, D. Korn, M. McConnell, L. Ng, C. Tan, T. Wang, and S. Weiss. I also express my gratitude to A. Daraio for computer design of the figures and for help in preparation of the manuscript. Studies from my laboratory presented in this essay were supported by Research Grants Gh435943 and ESM068 from the National Institutes of Health. I was partially supported during the writing of this essay by Research Scholar Grant SG 189 from the American Cancer Society and by a Guest Research Fellowship from the Royal Society. This essay is dedicated to my wife, Peggy, on the occasion of our tenth wedding anniversary.
REFERENCES
1. A. Morrison, H. Araki, A. B. Clark, R. K. Hamatake and A. Sugino, Cell 62, 1143 (1990).
2. P. A. Fisher and D. Korn, JBC 252, 6528 (1977).
3. D. Filpula, P. A. Fisher and D. Korn, JBC 257, 2029 (1982).
4. S. Tanaka, S.-Z. Hu, T. S.-F. Wang and D. Korn, JBC 257, 8386 (1982).
5. S. W. Wong, L. R. Paborsky, P. A. Fisher, T. S. Wang and D. Korn, JBC 261, 7958 (1986).
6. P. A. Fisher and D. Korn, JBC 254, 11033 (1979).
7. P. A. Fisher and D. Korn, JBC 254, 11040 (1979).
8. P. A. Fisher and D. Korn, Bchem 20, 4560 (1981).
9. P. A. Fisher and D. Korn, Bchem 20, 4570 (1981).
10. P. A. Fisher, T. S.-F. Wang and D. Korn, JBC 254, 6128 (1979).
11. P. A. Fisher, J. T. Chen and D. Korn, JBC 256, 133 (1981).
12. D. Korn, P. A. Fisher and T. S.-F. Wang, This Series 26, 63 (1981).
13. L. S. Kaguni, J. Rossignol, R. C. Conaway and I. R. Lehman, PNAS 80, 2221 (1983).
14. L. Ng, S. J. Weiss and P. A. Fisher, JBC 264, 13018 (1989).
15. S. J. Weiss and P. A. Fisher, JBC 267, 18520 (1992).
16. M. Takeshita, C. Chang, F. Johnson, S. Will and A. P. Grollman, JBC 262, 10171 (1987).
17. L. A. Loeb, Cell 40, 483 (1985).
18. L. A. Loeb and B. D. Preston, ARGen 20, 51 (1987).
19. S. K. Randall, R. Eritja, B. E. Kaplan, J. Petrushka and M. F. Goodman, JBC 262, 6864 (1987).
20. M. Kouchakdjian, E. Marinelli, X. Gao, F. Johnson, A. Grollman and D. Patel, Bchem 28, 5647 (1989).
21. S. W. Wong, J. Syvaoja, C.-K. Tan, K. Downey, A. G. So, S. Linn and T. S.-F. Wang, JBC 264, 5924 (1989).
22. M. Y. W. T. Lee, Y. Jiang, S. J. Zhang and N. L. Toomey, JBC 266, 2423 (1991).
23. D. W. Chung, J. Zhang, C.-K. Tan, E. W. Davie, A. G. So and K. M. Downey, PNAS 88, 11197 (1991).
24. J. Zhang, D. W. Chung, C.-K. Tan, K. M. Downey, E. W. Davie and A. G. So, Bchem 30, 11742 (1991).
25. C.-K. Tan, C. Castillo, A. G. So and K. M. Downey, JBC 261, 12310 (1986).
26. L. Ng, C.-K. Tan, K. M. Downey and P. A. Fisher, JBC 266, 11699 (1991).
27. L. Ng, G. Prelich, C. W. Anderson, B. Stillman and P. A. Fisher, JBC 265, 11948 (1990).
28. L. Ng, M. McConnell, C.-K. Tan, K. M. Downey and P. A. Fisher, JBC 268, 13571 (1993).
29. M. D. Challberg and T. H. Kelly, ARB 58, 671 (1989).
30. B. Stillman, Annu. Rev. Cell Biol. 5, 197 (1989).
31. T. S.-F. Wang, ARB 60, 513 (1991).
32. M. Yamaguchi, N. Yasuyoshi, T. Moriuchi, F. Hirose, C.-C. Hui, Y. Suzuki and A. Matsukage, MCBiol 10, 872 (1990).
33. V. M. Peck, E. W. Gerner and A. E. Cress, NARes 20, 5779 (1992).
ADDENDUM
New Concepts in Protein-DNA Recognition: Sequence-directed DNA Bending and Flexibility
Since completion of this writing, two important additional contributions have appeared. A crystal structure of the estrogen receptor DNA-binding domain peptide complexed to the consensus estrogen receptor element at 2.4 Å resolution has been reported (289, 290). This structure both amplifies and complements the discussion of hormone receptor sites given in Section II,D,2. Impressive new evidence for structural mobility at (TA)·(TA) elements in restriction enzyme binding sequences (291, 292), in the TATA promoter site (Section II,F) (293, 294), and in the recognition site for the Tramtrack transcriptional regulator (295) has also appeared (reviewed by Klug, 296); this evidence supports the bending hypotheses given here. We thank J. W. R. Schwabe and L. Fairall for access to their recent work in advance of publication.
289. J. W. R. Schwabe, L. Chapman, J. T. Finch and D. Rhodes, Cell 75, 567 (1993).
290. J. W. R. Schwabe, L. Chapman, J. T. Finch, D. Rhodes and D. Neuhaus, Structure 1, 187 (1993).
291. M. A. Kennedy, S. T. Nuutero, J. T. Davis, G. P. Drobny and B. R. Reid, Bchem 32, 8022 (1993).
292. F. K. Winkler, D. W. Banner, C. Oefner, D. Tsernoglou, R. S. Brown, S. P. Heathman, R. K. Bryan, P. D. Martin, K. Petratos and K. S. Wilson, EMBO J 12, 1781 (1993).
293. Y. Kim, J. H. Geiger, S. Hahn and P. B. Sigler, Nature 365, 512 (1993).
294. J. L. Kim, D. B. Nikolov and S. K. Burley, Nature 365, 520 (1993).
295. L. Fairall, J. W. R. Schwabe, L. Chapman, J. T. Finch and D. Rhodes, Nature (in press).
296. A. Klug, Nature 365, 486 (1993).
Progress in Nucleic Acid Research and Molecular Biology, Vol. 47
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
Index
A
Achondrogenesis type-II, collagen type-II gene mutations, 48-51 ADPglucose pyrophosphorylase, bacterial allosteric properties alteration via cloning, 324-326 regulation via, 315-319 effector sites, 319-326 chemical modification studies, 319 site-directed mutagenesis lysine residue 39, 195, 319-324 tyrosine residue 114, 322-323 Agrobacterium tumefaciens, OccR transcriptional activator, 245 Albright hereditary osteodystrophy characterization, 95-97 Gsα expression, 97-100 Alpha operon, 348-351 Alport's syndrome, COL4A5 mutations, 64 Ammonia, role in Dictyostelium discoideum stalk cell differentiation, 21 Anterior-like cells, Dictyostelium discoideum characterization, 10-13 movement at culmination, 17-18 Antp homeodomain peptide, NMR structure, 222-223 Arthro-ophthalmopathy, collagen type-II gene mutations, 53-54
C
Cellular differentiation, Dictyostelium discoideum DIF-induced prestalk-specific mRNAs, 8-10 markers identification, 3-6 prestalk-enriched, 7-8 prestalk cells ammonia effects, 21 DIF-inducible genes, 6-7 extracellular cAMP effects, 21-22 heterogeneity, 10-13 stalk cells ammonia effects, 21 extracellular cAMP effects, 21-22 Collagen genes chromosomal locations, 32 procollagen domain structures, 33 tabulation, 31 Collagen type-I genes mutations Ehlers-Danlos syndrome, 43-44 osteogenesis imperfecta biochemical classification, 36 carboxy-propeptide, 41-42 carboxy-telopeptide, 41-42 COL1A1 insertional and site-directed mutagenesis, 44-46 gonadal mosaicism, 43 helical deletions/insertions, 40-41 helical glycine substitutions, 36-38 helical splicing, 39-40 null allelic, 42-43 somatic mosaicism, 43 normal, 30-35 Collagen type-II gene mutations achondrogenesis type-II, 48-51 genotype-phenotype relationships, 54-55 hypochondrogenesis, 48-51 Kniest syndrome, 53 spondyloepiphyseal dysplasia congenita, 51-53 Stickler syndrome, 53-54 normal, 47-48 site-directed mutagenesis, 55-57 Collagen type-III gene mutations dominant-negative, in Ehlers-Danlos syndrome, 58-59 genotype-phenotype relationships, 61
helical deletions, 60 helical glycine substitutions, 61 helical splicing, 60-61 null allelic, 61 normal, 57-58 Collagen type-IV genes mutations, 63-64 normal, 62-63 Collagen type-VII gene mutations, 65-66 normal, 65 Collagen type-IX genes normal, 66 site-directed mutagenesis, 68 Collagen type-X gene mutations, 69 normal, 68-69 site-directed mutagenesis, 69-70 Cro protein-OR complex, DNA bending evidence, 206-209 site-specific flexibility, 209-213 Crystallography DNA bending, 204-206 engrailed homeodomain in Drosophila melanogaster, 224-227 MATα2 homeodomain in yeast, 224-227 Culmination, Dictyostelium discoideum anterior-like cell movement, 17-18 ecmB promoter role, 18-19 intracellular cAMP role, 19-20 stalk tube formation, 16-17 Cyclic AMP-dependent protein kinase regulation of Dictyostelium stalk cell differentiation, 19-20 role in Dictyostelium morphogenesis, 19-20 Cyclic AMP-signaling, Dictyostelium discoideum
extracellular developmental roles, 3 prestalk cell apical sorting response, 15 in stalk cell differentiation, 21-22 transduction system, 3 intracellular, 19-20
D
Dermatosparaxis, 44 Dictyostelium discoideum cellular differentiation, markers identification, 3-6 prestalk-enriched, 7-8 culmination anterior-like cell movement, 17-18 ecmB promoter structure, 18-19 stalk-tube formation, 17-18 DIF-induced prestalk-specific mRNAs, 8-10 extracellular cAMP-signaling characterization, 3 prestalk cell apical sorting response, 15 prestalk cell heterogeneity, 10-13 slug formation apical sorting of prestalk cells, 15 pstB cell migration, 15 tip formation, 13-15 stalk cell differentiation ammonia effects, 21 cAMP-dependent protein kinase role, 19-20 extracellular cAMP effects, 21-22 Differentiation-inducing factor-inducible genes alternative types, 7-8 markers of prestalk differentiation, 6-7 prestalk-specific mRNA analyses, 8-10 DIF genes, see Differentiation-inducing factor genes DNA bending anisotropic flexibility, 200, 211 characterization, 199-200 Cro-OR nucleoprotein complex, 206-209 crystallographic evidence, 204-206 gel migration evidence, 202-204 models, 199-200 DNA-bending proteins catabolite gene activator protein, 208-209 C4 class, 227-233 C2H2 class, 227-233 Cro protein from phage 434, 214-215 from phage λ, 215-218 engrailed homeodomain in Drosophila melanogaster, 224-227 eukaryotic, 221-227 Fos transcription factor, 236-237 GAL4 transcriptional factor, 233-234 GCN4 transcription factor, 239-240 initiation complex for phage φ29 DNA replication, 247 Jun transcription factor, 236-237
leucine zippers, 236-240 MATα2 homeodomain in yeast, 224-227 minor-groove-binding, 240-243 Myc proteins, 237-239 NF-κB protein, 243-245 nuclear receptor proteins, 234-236 OccR transcriptional activator, 245 p6 complex, 247 plant AT-rich binding sites, 251-252 DNA-binding motifs, 252-253 G-box motif, 249-250 leucine zippers, 249-250 triple helix-turn-helix motif, 252 zinc-binding, 250-251 prokaryotic, 214-221 repressor protein from phage 434, 214-215 from phage λ, 215-218 sequence-directed structure-function relationships phytochrome phyA3 promoter in oats, 256-259 proximal βmaj-globin promoter in mouse, 254-256 TATA promoter element, 242-243 TFIID transcription factor complex, 240-243 TFIIIA, 227-233 trp repressor protein, 219-221 Zif268, 232-233 zinc-binding, 227-233 zinc fingers, 227-233 DNA polymerase α catalytic core dNTP recognition and binding, 378-379 Mg2+ role, 376-378 primer recognition and binding, 375-376 substrate recognition and binding, mechanism, 372-373, 379-380 template recognition and binding, 372-375 human, holoenzyme and catalytic core protomer, comparison, 380-382 and holoenzyme purified from Drosophila embryos, comparison, 382-385 incorporation mechanism, 389-390
interactions with template-primers containing chemically damaged nucleotides, 386-389 DNA polymerase δ PAGE-detectable complex with PCNA and synthetic oligonucleotide template-primers, 393-394 stimulation by proliferating cell nuclear antigen, 391-392 substrate recognition and binding, 390-391 Drosophila melanogaster, DNA polymerase-α holoenzyme comparison with human DNA polymerase-α holoenzyme, 382-385 interaction with abasic site-containing template-primers, 386-388
E
ecmA gene extracellular cAMP stimulation, 21-22 mRNA functional analysis, 8-10 induction kinetics, 6-7 promoters, 10-13 ecmB gene extracellular cAMP stimulation, 21-22 mRNA functional analysis, 8-10 induction kinetics, 6-7 promoter, 10-13, 18-19 regulation, 17-18 Ehlers-Danlos syndrome collagen type-III gene mutations, 58-59 mutations of collagen type-I genes, 43-44 Engrailed homeodomain, in Drosophila melanogaster, cocrystal structure, 224-227 Epidermolysis bullosa, collagen type-VII gene mutation, 65-66 Escherichia coli glycogen biosynthesis ADPglucose pyrophosphorylase allosteric regulation, 315-319 catalytic and effector sites, 319-322 cloning from mutants, 324-326 csrA mapping and characterization, 311-312
enzyme induction in stationary phase, 301 glgCAP(Y) operon control, 304-306 glgC dependence on sigma factors, 312-314 integrated model of genetic regulation, 314-315 negative genetic control, 306-307 transposon mutants, 307-309 TR1-5 mutation effects in csrA::kanR, 309-311 positive control by genes outside glg gene cluster, 314 structural genes, localization, 301-304 ribosomal protein synthesis alpha operon, 348-351 autogenous control, 336-338 gene organization, 332-336 L10 operon, 358-359 L11 operon, 356-358 L20 operon, 361-362 mRNA elongation, 340 mRNA stability, 340 regulatory r-protein-mRNA interaction, 338-339 rif region, 355-357 S10 operon, L4-mediated regulation, 343-345 transcription, 345-346 translation, 346-348 S15 operon, 351-353 spc operon, 353-355 str operon, 362-363 translation inhibition, 339-340 Ethanol, effects on G protein expression, lO"l05
F
Fos transcription factor, structure and recognition sequence, 236-237
G
GAL4 transcriptional factor, structure and recognition sequence, 233-234 G-box motif, plant regulatory proteins, 249-250
GCN4 transcriptional factor, yeast, structure and recognition sequence, 239-240 Gel electrophoresis detection of DNA bending in Cro protein-OR complex, 206-209 mobility shift assay for pol δ·PCNA·template-primer complex, 393-394 Gene mapping, csrA, 311-312 glg operon negative control via csrA, 306-311 positive control by alternative sigma factor, 314 Glucocorticoids, effects on G protein expression, 102 Glycine substitutions collagen type-I genes, 36-39 collagen type-III gene, 61 Glycogen, biosynthesis in Escherichia coli csrA mapping and characterization, 311-312 enzyme induction in stationary phase, 301 glgCAP(Y) operon control, 304-306 glgC dependence on sigma factors, 312-314 integrated model of genetic regulation, 314-315 negative genetic control, 306-307 transposon mutants, 307-309 TR1-5 mutation effects in csrA::kanR, 309-311 positive control by genes outside glg gene cluster, 314 structural genes, localization, 301-304 Gonadal mosaicism, in osteogenesis imperfecta, 43 G proteins α subunit functions, 86-88 structure, 82-86 altered expression, 101-105 altered function, clinical implications, 93-94 β subunit, structure, 88-90 β3 subunit, tissue distribution, 92-93 βγ dimers, function, 90-92 mutations Albright hereditary osteodystrophy, 95-97
analytical methods, 94-95 Gsα protein expression in Albright hereditary osteodystrophy, 97-100 genetic studies, 94 McCune-Albright syndrome, 100-101 sporadic endocrine neoplasia, 100 Guanine nucleotide binding proteins, see G proteins
H
Heat stress, effects on intron splicing, 186-188 Hereditary nephritis, COL4A5 gene mutations, 64 Hypochondrogenesis, collagen type-II gene mutations, 48-51 Hypoparathyroidism, G protein mutations, 95-97
I
Insulin deficiency, effects on G protein expression, 102 Intron recognition cis requirements, 150-155 intron splicing in heterologous systems, 164-167 model, 171-174 snRNAs, 155-158 Intron splicing alternative, 183-185 (A+U)-rich sequences, 165-171 enhancement of gene expression by, 185-186 in heterologous systems, 164-167 site selection, 171-174 stress effects, 186-188
J
Junction model, DNA bending, 199-200 Jun transcription factor, structure and recognition sequence, 236-237
K
Kinking, DNA neokinks, 204-205 stereochemical, 201-202 Kniest syndrome, collagen type-II gene mutations, 53
L
Leucine-zipper proteins plant, 249-250 structures and recognition sequences, 236-240 Lithium, effects on G protein expression, 102 L10 operon, 358-359 L11 operon, 356-358 L20 operon, 361-362
M
Magnesium ion, in DNA polymerase α template/primer binding, 376-378 Maize genes, dSpm effects on RNA processing alterations after insertions, 178-183 Ds elements, 175-176 RNA fates after Ds insertions, 176-177 Spm properties, 177-178 MATα2 homeodomain, in yeast, cocrystal structure, 224-227 McCune-Albright syndrome, G protein mutations, 100-101 Messenger RNA, nonsense-mediated decay cis-acting factors, 272-276 codon location role, 272-275 downstream element role, 277-280, 294-295 functions, 290-293 pathway modulation, 280-283 position effects, 273-276 requirements in PGK1, 277 trans-acting factors, 283-290 translation-turnover association, 293-294 trans mutants for cis element identification, 288-290 UPF gene product effects, 283-284
Upf1p-interacting proteins, gene identification, 286-288 Upf1p localization, 284-285 Mosaicism, in osteogenesis imperfecta, 43 Mutagenesis insertional COL1A1 gene, 44-46 collagen type-I genes, 40-41 site-directed, see Site-directed mutagenesis Myc proteins, structures and recognition sequences, 237-239
N
Nephritis, hereditary, COL4A5 gene mutations, 64 NF-κB protein, binding to DNA, 243-245 Nitric-oxide synthase, similarity with prostaglandin synthase, 142-143 Nuclear magnetic resonance, Antp homeodomain peptide, 222-223 Nuclear receptor proteins, structures and recognition sequences, 234-236 Nucleus, pre-mRNA processing, see RNA processing, nuclear
O
OccR transcriptional activator, Agrobacterium tumefaciens, 245 Osteodystrophy, Albright hereditary, see Albright hereditary osteodystrophy Osteogenesis imperfecta lethal prenatal, 44-46 type 1A, 46
P
pDd26 gene, mRNA induction kinetics, 6-7 pDd56 gene, see ecmB gene pDd63 gene, see ecmA gene pgs-1 gene heterodimer formation with tis10/pgs-2 proteins, 142 structure, 137-139
Phorbol-induced primary response genes induction by ligands, 116-117 prostaglandin synthesis induction in Swiss 3T3 cells, 126-130 proteins encoded by, 118 subsets, in cell-type responses, 117-118 in 3T3 cells, induction by mitogens, 116 tis10 gene, 124-125 tis10/pgs-2 heterodimer formation with pgs-1 proteins, 142 induction in macrophages, 139-142 in Swiss 3T3 cells, 130 inhibition, 130-134 regulation at transcriptional level, 134-137 as second pgs/cox gene, 139 structure, 137-139 tis11 gene, 121-124 tis21 gene, 119-121 transcription factors encoded by, 118-119 p6 protein, structure and recognition sequence, 247 Prestalk cells, Dictyostelium discoideum apical sorting response to cAMP-signaling, 15 cell differentiation markers, 6-8 heterogeneity, 10-13 movement during culmination, 17-18 mRNAs, functional analysis, 8-10 Proliferating cell nuclear antigen interaction with pol δ and synthetic oligonucleotide template-primers, 393-394 stimulation of DNA polymerase δ, 391-392 Promoters AT-rich binding sites in plants, 251-252 ecmA gene, 10-13 ecmB gene, 10-13, 18-19 phytochrome phyA3, 256-259 proximal βmaj-globin in mouse, 254-256 1,N2-Propanodeoxyguanosine, template-primers, interaction with DNA polymerase α, 388-389 Prostaglandins, phorbol-induced synthesis in Swiss 3T3 cells, 126-130 Prostaglandin synthase, similarity with nitric-oxide synthase, 142-143
Protein-DNA recognition, sequence dependence binding specificity, 196-199 DNA bending, 199-200 DNA kinking, 200-202 Protein kinase, cAMP-dependent regulation of Dictyostelium discoideum stalk cell differentiation, 19-20 role in Dictyostelium discoideum morphogenesis, 19-20 Proteins, synthesis in Escherichia coli ribosomes alpha operon, 348-351 autogenous control, 336-338 gene organization, 332-336 L10 operon, 358-359 L11 operon, 356-358 L20 operon, 361-362 mRNA elongation, 340 mRNA stability, 340 regulatory r-protein-mRNA interaction, 338-339 rif region, 355-357 S10 operon, L4-mediated regulation, 341-348 L4 binding to RNA, 343-345 transcription, 345-346 translation, 346-348 S15 operon, 351-353 spc operon, 353-355 str operon, 362-363 translation inhibition, 339-340 Pseudohypoparathyroidism, G protein mutations, 96-97 pstAB cells basipetal migration, 15 characterization, 10-13 pstA cells, characterization, 10-13 pstO cells, characterization, 10-13
R
Ribosomes, Escherichia coli, protein synthesis alpha operon, 348-351 autogenous control, 336-338 gene organization, 332-336 L10 operon, 358-359 L11 operon, 356-358
L20 operon, 361-362 mRNA elongation, 340 mRNA stability, 340 regulatory r-protein-mRNA interaction, 338-339 rif region, 355-357 S10 operon, L4-mediated regulation, 341-348 L4 binding to RNA, 343-345 transcription, 345-346 translation, 346-348 S15 operon, 351-353 spc operon, 353-355 str operon, 362-363 translation inhibition, 339-340 rif region, 355-357 RNA, messenger, see Messenger RNA RNA processing, nuclear assays for plant splicing, 158-159 cis requirements, 150-155 intron splicing in heterologous systems, 164-167 maize genes, dSpm effects alterations after insertions, 178-183 Ds elements, 175-176 fates after Ds insertions, 176-177 Spm properties, 177-178 plant intron structure base content, 162-164 consensus splice junction sequences, 154 conserved motifs, 160-162 intron length, 162-164 snRNAs, 155-158
S
Site-directed mutagenesis COL1A1 gene, 44-46 collagen type-II gene, 55-57 collagen type-IX genes, 68 collagen type-X gene, 69-70 Slug formation, Dictyostelium discoideum, 13-15 Somatic mosaicism, in osteogenesis imperfecta, 43 S10 operon, L4-mediated regulation, 343-345
transcription, 345-346 translation, 346-348 S15 operon, 351-353 spc operon, 353-355 Spondyloepiphyseal dysplasia congenita, collagen type-II gene mutations, 51-53 Sporadic endocrine neoplasia, G protein mutations, 100 Stalk cells, Dictyostelium discoideum, differentiation ammonia effects, 21 cAMP-dependent protein kinase role, 19-20 extracellular cAMP effects, 21-22 Stickler syndrome, collagen type-II gene mutations, 53-54 Stress, effects on intron splicing, 186-188 str operon, 362-363
T
TATA promoter element, flexibility, 242-243 Tetradecanoyl phorbol acetate-induced sequence genes, see Phorbol-induced primary response genes TFIID transcription factor complex, 240-243 TFIIIA transcription factor, structure, 227-233 tis genes, see Phorbol-induced primary response genes TIS10/PGS-2 protein, induction in Swiss 3T3 cells, 126-130 TPA-induced sequence genes, see Phorbol-induced primary response genes Transposable element-induced mutants Ds elements, 175-176 dSpm elements, 177-178 RNA processing fate after Ds insertion in exons, 176-177 after dSpm insertion, 178-183
U
UPF gene effects on nonsense-mediated mRNA decay, 283-285 mutants, 288-290 Upf1p genes encoding interacting proteins, 286-288 subcellular localization, 284-285
W
Wedge model, DNA bending, 199
Y
Yeast GCN4 transcriptional factor, 239-240 nonsense-mediated mRNA decay cis-acting factors, 272-276 codon location role, 272-275 downstream element role, 277-280, 294-295 functions, 290-293 pathway modulation, 280-283 position effects, 273-276 requirements in PGK1, 277 trans-acting factors, 283-290 translation-turnover association, 293-294 trans mutants for cis element identification, 288-290 UPF gene product effects, 283-284 Upf1p-interacting proteins, gene identification, 286-288 Upf1p localization, 284-285
Z
Zif268 protein, structure and recognition sequence, 232-233 Zinc-binding proteins, C4 class, 233-236 Zinc-finger proteins plant, 250-251 structure and recognition sequence, 227-233