Methods in Molecular Biology
Series Editor John M. Walker School of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK
For other titles published in this series, go to www.springer.com/series/7651
Plant MicroRNAs Methods and Protocols
Edited by
Blake C. Meyers and Pamela J. Green Department of Plant & Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Editors Blake C. Meyers Department of Plant & Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Pamela J. Green Department of Plant & Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
ISSN 1064-3745 e-ISSN 1940-6029 ISBN 978-1-60327-004-5 e-ISBN 978-1-60327-005-2 DOI 10.1007/978-1-60327-005-2 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2009933463 © Humana Press, a part of Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Humana Press is part of Springer Science+Business Media (www.springer.com)
Preface
We have assembled a set of protocols that we believe represent the state of the art for laboratory and computational analyses of plant microRNAs (miRNAs). These small, noncoding RNA molecules of about 21 nucleotides made a grand entrance onto the scientific stage with their discovery in the 1990s and rose stunningly to stardom in the early years of the current decade. Along the way, it has been demonstrated that miRNAs are simply one of several classes of small RNAs produced in plant cells, albeit a particularly important class given the broad phylogenetic conservation and strong regulatory effects of many miRNAs. Plant miRNAs are uniquely interesting for their ancient evolutionary origins and their strong post-transcriptional regulatory effects. Most chapters of this volume focus on the identification, validation, and characterization of miRNAs. However, the other classes of small RNAs, which are biochemically similar in size and composition (although of genetically distinct origins), cannot go unmentioned. For example, when plant miRNAs are cloned and characterized, the resulting data set is rich in another significant class of small RNAs that is substantially more prevalent in plants than in other organisms: the heterochromatic small interfering RNAs (heterochromatic siRNAs). As you will see, the methods in this volume emphasize miRNA analyses but include ways to distinguish one class of small RNAs from another. One chapter describes how to characterize candidate members of a unique class of siRNAs (trans-acting siRNAs) that depend on the action of miRNAs to initiate their formation. The biogenesis of miRNAs depends on an increasingly well-defined set of proteins, including enzymes for the precise excision of the mature miRNA from a longer precursor, as well as for the modification of this mature miRNA, its export from the nucleus, and its interaction with its target.
This volume starts with a chapter that clearly lays out the cellular participants in the production of miRNAs and the utility of studying these genes and gene products. Discussions of the genes that encode these proteins are found throughout the volume, as partial or full loss-of-function mutants in these genes are important components of the toolbox for studying miRNAs. One set of chapters describes standard approaches for purifying the working material for the study of miRNAs: the small RNA component of the transcriptome. While this is not always easy, depending on the composition and source of the plant tissue used, the protocols described here should cover a broad swath of even the most recalcitrant plant materials, yielding very high-quality RNA that can be used for library construction. An alternative approach, described in one of the chapters, is the isolation of small RNAs associated with Argonaute proteins. With purified small RNA in hand, one is ready to characterize small RNAs by any of numerous experimental approaches, the most common of which is deep sequencing by “next-generation” technologies, a process that yields datasets of millions of small RNAs per reaction. This volume includes a protocol for generating sequencing libraries from purified small RNA. After sequencing, the next significant challenge faced by the experimentalist is handling the data: trimming, mapping, organizing, and analyzing millions of short sequence reads.
Several chapters describe methods for analyzing miRNA-directed regulation of target RNAs. In its most basic form, the identification of the regulatory targets of plant miRNAs is based on the observation of near-perfect complementarity between miRNAs and their mRNA targets. This makes computational target prediction simpler in plants than in animals. In plants, pairing of a miRNA with an mRNA typically results in cleavage of the mRNA target. Approaches to target prediction in plants are addressed in this volume, as are standard approaches to validating specific target cleavage events and the exciting development of genome-wide methods to characterize cleaved mRNAs in a single library. This volume includes a series of chapters that discuss approaches to analyzing the functional role of plant miRNAs, including computational methods for predicting plant miRNA targets and experimental methods that can supply data to validate these predictions. Computational methods have also been applied to the study of gene regulatory sequences in promoters, an application that works well to identify promoter elements and potential transcription factor binding motifs in plant miRNA precursors. Regulatory elements contribute to the regulation of miRNA expression in response to biotic and abiotic stresses; under such conditions, miRNAs in turn regulate other transcripts, altering both their levels and their expression patterns. In situ hybridizations have long been used for the localization of messenger RNAs and proteins, but an exciting recent application of this methodology is the localization of miRNAs in plant tissues using a new generation of highly sensitive probes. And when these data have been brought together to infer the function of a miRNA, plants are amenable to testing this predicted function using transient assays.
All of these topics are the subjects of chapters in this volume, and we believe that these chapters will provide valuable contributions and useful material for our readers’ experimental work. Interestingly, studies of the biology of plant miRNAs have been somewhat turned on their head by the realization that miRNAs can themselves be used as tools to study the biological function of other genes. A chapter in this volume describes the development of artificial miRNAs, a powerful tool for investigating gene function with current applications in both forward and reverse genetics experiments. While most of the initial studies and basic approaches were developed in Arabidopsis, many exciting advances are sure to come from the application of these and other methods to other plant species. Indeed, our goal in editing this book is to provide the community with a set of protocols that will help advance miRNA research in all plant species. To this end, all or nearly all of the protocols could be used for any plant species of interest. We have included several chapters that will be of particular interest to plant biologists working in non-model species, including a set of approaches for RNA purification from quite diverse species and tissues, as well as an overview of computational methods to handle data from a broad set of species. MicroRNA activity and stability depend on a series of modifications and processing steps. One of the best-characterized modifications is 3′ methylation, carried out by the HUA ENHANCER 1 methyltransferase, which may stabilize the miRNA by preventing uridylation and by diminishing exonuclease-mediated degradation. One chapter in this volume describes methods by which 3′ methylation can be assessed. Surely there are many additional advances yet to come in this field, contributing new methods in parallel with further strides in our understanding of the biology of small RNAs.
As an example, a collaborative effort of many of the labs that contributed to this volume led to an overhaul of the criteria used to define plant miRNAs (1). That manuscript
came out too late to be fully reflected in this volume, and other rapid changes will keep this one of the fastest-moving fields in plant biology. Much as deep sequencing represented a tremendous advance, dare we say a revolution, in the study of small RNAs, there are unquestionably greater things yet to come in methods for the analysis of plant miRNAs. These may include further advances in understanding the cell specificity, abundance levels, trafficking, modifications, target interactions, or biological roles of miRNAs. In summary, we are excited by the prospect of the experiments that this volume may facilitate or even inspire. As students (broadly speaking), practitioners, or theorists in the field of plant molecular biology, we hope that you will find many or all of these chapters useful in your work. The chapters may serve to introduce you to a new field of work or to extend your capabilities in a topic with which you are already quite familiar. While mastery of the techniques in this volume is not a prerequisite for success in the field of small RNAs, an incredible group of contributors has assembled the protocols collected here. Finally, this book would not have come to fruition without the careful editorial and administrative assistance of Sharon Bancroft, along with additional administrative help from Charlotte McDermitt and Kathy Fleischut. Most of all, we are incredibly grateful to the contributing groups who have taken the time to describe in exquisite detail the methods, tips, and tricks that they use. We hope that you find this unique collection of protocols helpful to your research.
Blake C. Meyers
Pamela J. Green
Newark, DE
Reference
1. Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, Griffiths-Jones S, Jacobsen SE, Mallory AC, Martienssen RA, Poethig RS, Qi Y, Vaucheret H, Voinnet O, Watanabe Y, Weigel D, Zhu JK (2008) Criteria for annotation of plant microRNAs. Plant Cell 20:3186–3190
Contents
Preface
Contributors
1 Piecing the Puzzle Together: Genetic Requirements for miRNA Biogenesis in Arabidopsis thaliana
  Zhixin Xie
2 Prediction of Plant miRNA Genes
  Matthew W. Jones-Rhoades
3 Methods for Isolation of Total RNA to Recover miRNAs and Other Small RNAs from Diverse Species
  Monica Accerbi, Skye A. Schmidt, Emanuele De Paoli, Sunhee Park, Dong-Hoon Jeong, and Pamela J. Green
4 miRNA Target Prediction in Plants
  Noah Fahlgren and James C. Carrington
5 A Method to Discover Phased siRNA Loci
  Michael J. Axtell
6 Directed Gene Silencing with Artificial MicroRNAs
  Rebecca Schwab, Stephan Ossowski, Norman Warthmann, and Detlef Weigel
7 Bioinformatics Analysis of Small RNAs in Plants Using Next Generation Sequencing Technologies
  Kan Nobuta, Kevin McCormick, Mayumi Nakano, and Blake C. Meyers
8 High-Throughput Approaches for miRNA Expression Analysis
  Cheng Lu and Frédéric Souret
9 In Situ Detection of miRNAs Using LNA Probes
  Zoltán Havelda
10 Analysis of miRNA Modifications
  Bin Yu and Xuemei Chen
11 MicroRNA Promoter Analysis
  Molly Megraw and Artemis G. Hatzigeorgiou
12 Computational Methods for Comparative Analysis of Plant Small RNAs
  Gayathri Mahalingam and Blake C. Meyers
13 Biotic Stress-Associated microRNAs: Identification, Detection, Regulation, and Functional Analysis
  Florence Jay, Jean-Pierre Renou, Olivier Voinnet, and Lionel Navarro
14 Abiotic Stress-Associated miRNAs: Detection and Functional Analysis
  Dong-Hoon Jeong, Marcelo A. German, Linda A. Rymarquis, Shawn R. Thatcher, and Pamela J. Green
15 Processing of miRNA Precursors
  Yukio Kurihara and Yuichiro Watanabe
16 Purification of Arabidopsis Argonaute Complexes and Associated Small RNAs
  Yijun Qi and Shijun Mi
17 Transient Assays for the Analysis of miRNA Processing and Function
  Felipe F. de Felippes and Detlef Weigel
Index
Contributors
Monica Accerbi • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Michael J. Axtell • Department of Biology, Pennsylvania State University, University Park, PA, USA
James C. Carrington • Department of Botany and Plant Pathology, Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, USA
Xuemei Chen • Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA
Felipe F. de Felippes • Max Planck Institute for Developmental Biology, Tübingen, Germany
Emanuele De Paoli • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Noah Fahlgren • Department of Botany and Plant Pathology, Center for Genome Research and Biocomputing, and Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, USA
Marcelo A. German • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Pamela J. Green • Department of Plant and Soil Sciences, School of Marine Science and Policy, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Artemis G. Hatzigeorgiou • Department of Genetics, Center for Bioinformatics, School of Medicine, Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA; Institute of Molecular Oncology, Biomedical Sciences Research Center “Alexander Fleming”, Vari-Athens, Greece
Zoltán Havelda • Agricultural Biotechnology Center, Gödöllő, Hungary
Florence Jay • Institut de Biologie Moléculaire des Plantes, CNRS UPR2353 – Université Louis Pasteur, Strasbourg Cedex, France
Dong-Hoon Jeong • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Matthew W. Jones-Rhoades • Department of Biology, Knox College, Galesburg, IL, USA
Yukio Kurihara • Department of Life Sciences, University of Tokyo, Tokyo, Japan
Cheng Lu • DuPont Agricultural Biotechnology, RT 141 & Henry Clay, Wilmington, DE, USA
Gayathri Mahalingam • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Kevin McCormick • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Molly Megraw • Department of Genetics, Center for Bioinformatics, School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA
Blake C. Meyers • Department of Plant and Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Shijun Mi • National Institute of Biological Sciences, Beijing, China
Mayumi Nakano • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Lionel Navarro • Institut de Biologie Moléculaire des Plantes, CNRS UPR2353 – Université Louis Pasteur, Strasbourg Cedex, France
Kan Nobuta • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Stephan Ossowski • Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Sunhee Park • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Yijun Qi • National Institute of Biological Sciences, Beijing, China
Jean-Pierre Renou • UMR Génomique Végétale INRA-CNRS-UEVE, 2 rue G. Crémieux, Evry, France
Linda A. Rymarquis • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Skye A. Schmidt • School of Marine Science and Policy and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Rebecca Schwab • Institut de Biologie Moléculaire des Plantes (CNRS), Strasbourg Cedex, France
Frédéric Souret • Affymetrix Inc., Cleveland, OH, USA
Shawn R. Thatcher • Chemistry–Biology Interface Program, Department of Chemistry and Biochemistry, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Olivier Voinnet • Institut de Biologie Moléculaire des Plantes, CNRS UPR2353 – Université Louis Pasteur, Strasbourg Cedex, France
Norman Warthmann • Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Yuichiro Watanabe • Department of Life Sciences, University of Tokyo, Tokyo, Japan
Detlef Weigel • Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Zhixin Xie • Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
Bin Yu • Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE, USA
Chapter 1
Piecing the Puzzle Together: Genetic Requirements for miRNA Biogenesis in Arabidopsis thaliana
Zhixin Xie
Abstract
MicroRNAs (miRNAs) are an important class of endogenous small silencing RNAs in both plants and animals. They regulate the expression of a wide range of target genes involved in many important biological processes. Biogenesis of plant miRNAs requires a distinct set of proteins, including members of several highly conserved RNA silencing protein families. The framework for miRNA biogenesis in plants was revealed through genetic and biochemical analyses of mutants that are defective in miRNA accumulation. These general miRNA-deficient mutants constitute a set of invaluable genetic resources for the plant miRNA research community. They can be used to experimentally validate candidate miRNAs that are either predicted by a computational program or recovered by small RNA deep sequencing, an increasingly affordable and widely used approach for small RNA discovery. Starting with a brief introduction to the multiple small RNA pathways in plants, this chapter provides basic experimental procedures for examining miRNA accumulation in wild-type plants and various mutant lines in Arabidopsis.
Key words: miRNA biogenesis, Arabidopsis thaliana, miRNA-deficient mutants, DICER-LIKE1 (DCL1), HUA ENHANCER1 (HEN1), ARGONAUTE1 (AGO1), HYPONASTIC LEAVES1 (HYL1), SERRATE (SE), HASTY (HST), miRNA detection
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592, DOI 10.1007/978-1-60327-005-2_1, © Humana Press, a part of Springer Science+Business Media, LLC 2010
1. Introduction
Nearly seven years have passed since microRNAs (miRNAs) and other endogenous small silencing RNAs were discovered in plants (1–4), shortly after the seminal discovery of silencing-associated small RNAs in plants (5) and the cloning of miRNAs from animals (6–8). Much has been learned about the genomic origin, complexity, biogenesis pathway, biological function, and possible mechanisms of evolution of these small regulatory RNA molecules (reviewed in 9–12). This chapter describes the reverse genetics approach and the related experimental procedures that have been used to elucidate the miRNA biogenesis pathway in the model plant Arabidopsis thaliana. In essence, a gene can be established as a general miRNA pathway gene if its loss-of-function mutation results in loss of miRNA production and, therefore, loss of miRNA accumulation. This approach should also be applicable to other plant species, provided the required genomic and genetic resources are available. The same mutants can be used to experimentally validate candidate miRNAs that are either predicted by a computational program or recovered by small RNA deep sequencing, an increasingly affordable and widely used approach for miRNA discovery (13–17). Because the target audience of this methods series includes readers who are new to miRNA research or less familiar with recent developments in the field, this chapter begins with a brief overview of the multiple small RNA pathways in plants. Detailed protocols follow for polymerase chain reaction (PCR)-based mutant genotyping and for miRNA detection by northern blot assays.
1.1. The RNA Silencing Machinery in Plants
The generation of small RNAs of 21–24 nucleotides (nt) is a hallmark of RNA silencing, which involves a set of evolutionarily conserved proteins found exclusively in eukaryotes (10). Proteins of the DICER (DCR) or DICER-LIKE (DCL), ARGONAUTE (AGO), and RNA-DEPENDENT RNA POLYMERASE (RDR) families form the core RNA silencing machinery in plants (9, 10). Both the DCR/DCL and the AGO family proteins are highly conserved across the eukaryotic kingdoms, whereas RDR family proteins appear to be missing in mammals and insects. Perhaps one of the most remarkable features of the RNA silencing system in plants is the existence of multiple RNA silencing factors that form functionally distinct modules for small RNA generation and function (10, 18). The A. thaliana genome, for example, contains four expressed DCL genes, six RDR genes, and ten AGO genes (9). It has become clear that the high complexity of the regulatory small RNA component in plants is largely attributable to the proliferation and subsequent functional diversification of the RNA silencing machinery over evolutionary time.
1.2. miRNAs and Other Small RNAs in Plants
Although chemically similar, the small silencing RNAs in plants can be classified into two broad categories: miRNAs and small interfering RNAs (siRNAs) (10). The major difference between these two categories lies in their genomic origin and precursor structures. miRNAs arise from defined genetic loci – the MIRNA genes, which are found predominantly within genomic segments previously annotated as intergenic regions (IGRs) (11).
Typically, RNA polymerase II (Pol II) transcription from a MIRNA locus gives rise to a miRNA primary transcript (pri-miRNA) that can form a characteristic fold-back hairpin structure (19–21). This intramolecular, imperfect dsRNA structure is recognized and processed by a DCL1-containing complex to give rise to a miRNA/miRNA* duplex, presumably through a multistep process involving a stem-loop intermediate (pre-miRNA) (19, 22–24). The fate of each strand in a miRNA/miRNA* duplex appears to be predetermined by the thermodynamic features of the duplex (25, 26). That is, upon unwinding, the miRNA strand is selectively incorporated into an AGO1-containing RNA-induced silencing complex (RISC), whereas the miRNA* strand is often short-lived (27–29). Mature plant miRNAs are typically 21 nt long. They negatively regulate gene expression by guiding target recognition and cleavage through the AGO1-containing miRISC (11). By contrast, siRNAs arise from perfect dsRNA precursors that are formed through various mechanisms, typically involving an RDR activity. The dsRNA precursors are processed into small RNA duplexes by one of the other three DCLs (DCL2, DCL3, or DCL4). Genetic and small RNA deep sequencing analyses have revealed at least three siRNA-generating systems in Arabidopsis, each requiring a distinct set of DCL and RDR proteins (10, 12). (1) Heterochromatin-associated siRNAs: These are typically 24-nt small RNAs associated with repetitive genomic sequences such as transposable elements and pericentromeric repeats (30, 31). Biogenesis and function of these siRNAs require RDR2, DCL3, AGO4, and the plant-specific nuclear RNA polymerase IV (Pol IV) (30–36). The heterochromatin-associated siRNA pathway operates in the nucleus to guide heterochromatin formation (37, 38). (2) Trans-acting siRNAs (ta-siRNAs): Biogenesis of ta-siRNAs is initiated by miRNA-directed cleavage of noncoding RNAs known as TAS primary transcripts (39, 40).
RDR6 converts the cleaved TAS RNA into dsRNA, a process that also requires SUPPRESSOR OF GENE SILENCING 3 (SGS3), a coiled-coil protein of unknown function (41, 42). DCL4 processes the dsRNA into phased 21-nt siRNA arrays (43–45). (3) Natural antisense transcript-associated siRNAs (nat-siRNAs): These are siRNAs originating from dsRNAs formed by convergent transcription of two partially overlapping genes (46, 47). The biogenesis of the primary nat-siRNAs appears to be either DCL1- or DCL2-dependent. Both ta-siRNAs and nat-siRNAs can direct target cleavage. All classes of functional small RNAs characterized so far in plants are 3′-methylated through the activity of HUA ENHANCER 1 (HEN1), a small RNA methylase (48, 49). The methylation may serve to stabilize small RNAs in vivo by protecting them from nucleolytic activity or other types of terminal modifications (48–50).
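The phasing of ta-siRNAs can be illustrated with a short calculation. The Python sketch below is a simplified illustration, not a published phasing algorithm: it assumes that the 5′ end of the miRNA-cleaved TAS fragment defines phase register 0 and that DCL4 then releases consecutive 21-nt siRNAs from that point along the same strand. All coordinates are hypothetical, and the offset expected for siRNAs derived from the complementary strand is deliberately ignored.

```python
"""Simplified check of 21-nt phasing relative to a miRNA cleavage site.

Assumption (illustrative only): the 5' end of the 3' cleavage fragment
defines phase register 0, and siRNA 5' ends are scored on the same strand.
"""

PHASE_LENGTH = 21  # DCL4 products in the ta-siRNA pathway are 21 nt


def phase_register(start: int, cleavage_start: int, phase: int = PHASE_LENGTH) -> int:
    """Offset (0..phase-1) of an siRNA 5' end from the cleavage-defined phase."""
    return (start - cleavage_start) % phase


def expected_starts(cleavage_start: int, n_sirnas: int, phase: int = PHASE_LENGTH) -> list[int]:
    """5' ends of the first n in-phase siRNAs downstream of the cleavage site."""
    return [cleavage_start + k * phase for k in range(n_sirnas)]


def fraction_in_phase(starts: list[int], cleavage_start: int, phase: int = PHASE_LENGTH) -> float:
    """Fraction of observed siRNA 5' ends that fall in register 0."""
    if not starts:
        return 0.0
    in_phase = sum(1 for s in starts if phase_register(s, cleavage_start, phase) == 0)
    return in_phase / len(starts)


if __name__ == "__main__":
    cleavage = 1000  # hypothetical 5' end of the cleaved TAS fragment
    print(expected_starts(cleavage, 4))                           # [1000, 1021, 1042, 1063]
    print(fraction_in_phase([1000, 1021, 1042, 1050], cleavage))  # 0.75
```

In this toy example, three of the four hypothetical siRNA 5′ ends lie in register 0, while the read starting at position 1050 is offset by 8 nt and is counted as out of phase.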
1.3. Genetic Requirements for miRNA Biogenesis
Our current knowledge of miRNA biogenesis comes mainly from work with various miRNA-deficient mutants, a collective effort by several leading groups. As mentioned earlier, almost all the known miRNA pathway components in plants were identified through this “reverse genetics” approach. DCL1 and HEN1 were among the first genes whose roles in miRNA biogenesis were genetically established (3, 4, 51). Both dcl1 and hen1 homozygous mutant plants exhibit severe developmental defects, reflecting the critical roles played by miRNAs in plant development. In fact, multiple dcl1 and hen1 alleles were recovered from genetic screens for mutants defective in flower development (51–55). Loss-of-function mutations in AGO1 also cause severe developmental defects (56–59). Similarly, pleiotropic developmental abnormalities have been observed in mutants harboring loss-of-function mutations in HYPONASTIC LEAVES1 (HYL1) and SERRATE (SE), two proteins required for miRNA processing (23, 24, 60–63). HYL1 belongs to a family of nuclear dsRNA-binding proteins. While HYL1 specifically interacts with DCL1 (64), DOUBLE-STRANDED RNA-BINDING PROTEIN 4 (DRB4), another member of the HYL1/DRB family, appears to interact with DCL4 and function in the ta-siRNA pathway (65, 66), suggesting that members of this family may play distinct roles in plant small RNA biogenesis. SE, a zinc-finger protein, is known to regulate shoot meristem function and leaf polarity in Arabidopsis (62). Recent in vivo evidence shows that DCL1, HYL1, and SE colocalize in nuclear dicing bodies and function in pri-miRNA processing (22, 67). HASTY (HST), an Arabidopsis ortholog of mammalian Exportin 5, is another protein shown to play a role in plant miRNA biogenesis (39, 68–70). In wild-type Arabidopsis, most mature miRNAs accumulate in the cytoplasm. Loss-of-function mutations in HST reduce the cytoplasmic accumulation of most miRNAs, suggesting a role for HST in miRNA transport from the nucleus to the cytoplasm (69, 70). Information on some published miRNA-deficient mutant alleles is presented in Table 1.
Among the miRNA-deficient mutants described above, the dcl1 and hen1 mutants have been by far the most widely used lines for experimentally validating new miRNAs, although HEN1 also functions in siRNA biogenesis (31, 71). One potential problem when working with these mutant lines is the poor seed set of homozygous mutants. Although in most cases sufficient seed can still be obtained by propagating homozygous individuals, some lines must be maintained through heterozygous individuals. Homozygous mutants harboring either the dcl1-7 or the dcl1-9 allele (51), for example, often fail to produce any viable seeds. For this reason, a protocol for quick genomic DNA extraction and PCR-based genotyping is included in this chapter, using the dcl1-7 allele (a point mutation) as an example. For genotyping of T-DNA insertion lines using allele-specific PCR, readers are referred to a recently published protocol (72).
Table 1 Selected miRNA-deficient mutants in Arabidopsis thaliana
Locus ID | Gene name | Representative alleles | Type of mutation | Background | References
At1g01040 | DICER-LIKE1 (DCL1) | dcl1-7 | EMS | La-er | (51, 53)
 | | dcl1-9 | T-DNA | La-er | (4, 51, 54)
At4g20910 | HUA ENHANCER1 (HEN1) | hen1-1 | EMS | La-er | (3, 52)
 | | hen1-4 | EMS | Col-0 | (71)
 | | hen1-5 (SALK_049197) | T-DNA | Col-0 | (24)
 | | hen1-6 (SALK_090960) | T-DNA | Col-0 | (64)
At1g48410 | ARGONAUTE1 (AGO1) | ago1-1 | T-DNA | Col-0 | (56)
 | | ago1-2 | T-DNA | Ws | (56)
 | | ago1-3 | EMS | Col-0 | (24, 28, 56)
 | | ago1-11 and ago1-12 | Transposon insertion | La-er | (28, 59)
 | | ago1-22 to ago1-24 | EMS | Col-0 | (57)
 | | ago1-25 to ago1-27 | EMS | Col-0 | (58)
 | | ago1-36 (SALK_087076) | T-DNA | Col-0 | (27)
At1g09700 | HYPONASTIC LEAVES1 (HYL1) | hyl1-1 | Transposon insertion | La-er | (23, 60)
 | | hyl1-2 (SALK_064863) | T-DNA | Col-0 | (24)
At3g05040 | HASTY (HST) | hst-1 | Diepoxybutane | Col-0 | (68–70)
 | | hst-15 (SALK_079290) | T-DNA | Col-0 | (39)
At2g27100 | SERRATE (SE) | se-1 | X-rays | Col-1 | (62, 63)
 | | se-2 (SAIL_44_G12) | T-DNA | Col-3 | (62)
 | | se-3 (SALK_083196) | T-DNA | Col-0 | (62)
 | | se-4 (SALK_059424) | T-DNA | Col-0 | (63)
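Because dcl1-7 is maintained through heterozygotes, a seed lot from a self-pollinated heterozygous plant segregates, and the thinning in step 7 of Subheading 3.1.1 keeps only wild-type-looking plants. The Python sketch below estimates what that thinning leaves behind. It assumes simple 1:2:1 Mendelian segregation, a recessive allele (so heterozygotes look wild type), and reliable recognition of homozygous mutants before thinning; the function names are illustrative and are not part of the protocol.

```python
from fractions import Fraction

# Expected genotype frequencies among progeny of a selfed heterozygote (1:2:1).
SEGREGATION = {
    "wild type": Fraction(1, 4),
    "heterozygous": Fraction(1, 2),
    "homozygous mutant": Fraction(1, 4),
}


def het_fraction_among_normal_looking() -> Fraction:
    """P(heterozygous | wild-type-looking), assuming heterozygotes show no phenotype."""
    normal = SEGREGATION["wild type"] + SEGREGATION["heterozygous"]
    return SEGREGATION["heterozygous"] / normal


def prob_at_least_one_het(plants_kept: int) -> Fraction:
    """Chance that at least one of the plants kept in a pot is heterozygous."""
    p_not_het = 1 - het_fraction_among_normal_looking()
    return 1 - p_not_het ** plants_kept


def expected_heterozygotes(pots: int, plants_kept_per_pot: int) -> Fraction:
    """Expected number of heterozygotes remaining after thinning, across all pots."""
    return pots * plants_kept_per_pot * het_fraction_among_normal_looking()


if __name__ == "__main__":
    print(het_fraction_among_normal_looking())  # 2/3
    print(prob_at_least_one_het(3))             # 26/27
    print(expected_heterozygotes(20, 3))        # 40
```

Under these assumptions, keeping 3–4 wild-type-looking plants per pot leaves at least one heterozygote in a pot with a probability of 26/27 to 80/81, and 20 pots thinned to 3 plants each are expected to yield about 40 heterozygotes to confirm by genotyping.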
2. Materials
2.1. Mutant Genotyping
1. dcl1-7 seeds: available for ordering (stock number CS3089 for the La-er background and CS6953 for the Col-0 background) from the Arabidopsis Biological Resource Center (ABRC) or the Nottingham Arabidopsis Stock Center (NASC). The dcl1-7 allele harbors a single base change (a C-to-T substitution) at position 1,429 from the ATG in the genomic DNA, resulting in a missense mutation (P415S) in the DCL1 protein (51, 53).
2. 0.1% agarose in distilled water. Sterilize and store at 4°C.
3. Genomic DNA extraction buffer: 50 mM Tris-Cl, pH 8.0, 10 mM EDTA, pH 8.0, 100 mM NaCl, 1.0% SDS, and 10 mM β-mercaptoethanol (add fresh right before use).
4. Neutralization buffer: 3 M potassium acetate, pH 4.8.
5. Isopropanol.
6. 70% ethanol. Store at −20°C.
7. 1X TE buffer: 10 mM Tris-Cl, pH 7.6, 1 mM EDTA.
8. 10X PCR buffer: 500 mM KCl, 100 mM Tris-Cl, pH 9.0, 1.0% Triton X-100, and 15 mM MgCl2.
9. dNTPs: 2.5 mM each. Store at −20°C.
10. Taq DNA polymerase. Store at −20°C.
11. Custom DNA oligonucleotides: 10 µM. Store at −20°C. For dcl1-7 genotyping:
(1) Mlu I-DCL1_1219F (24mer): 5′-GCCATCTTTGGAATGACTGACGCG-3′
(2) DCL1_1302R (24mer): 5′-GAGGTTACGTATCTTTATCGCACA-3′
12. Metaphor agarose.
13. DNA size markers: 25 or 50 bp DNA ladder (e.g., Promega catalog numbers G4511 and G4521, respectively). Store at −20°C.
14. 4X DNA sample loading buffer: 50% glycerol, 0.03% bromophenol blue, 50 mM Tris-Cl, pH 7.7, and 5 mM EDTA.
15. Ethidium bromide: 2 mg/mL. Wrap the bottle with aluminum foil.
16. MluI: 10 U/µL. Store at −20°C.
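The name of the forward primer suggests a derived cleaved amplified polymorphic sequence (dCAPS) design, in which an MluI recognition site (A^CGCGT) is completed only by the mutant base. The Python sketch below checks that idea. It assumes, by inference from the primer coordinates and the P415S codon change rather than from an explicit statement in this chapter, that the base immediately 3′ of Mlu I-DCL1_1219F is the dcl1-7 position (C in wild type, T in dcl1-7). The genotype-calling helper is a hypothetical interpretation of band patterns after MluI digestion, not part of the published protocol.

```python
MLUI_SITE = "ACGCGT"  # MluI recognition sequence; the enzyme cuts A^CGCGT
FORWARD_PRIMER = "GCCATCTTTGGAATGACTGACGCG"  # Mlu I-DCL1_1219F (Subheading 2.1, item 11)


def amplicon_has_mlui_site(primer: str, next_base: str) -> bool:
    """True if the primer's 3' end plus the next template base forms an MluI site.

    Assumption: next_base is the dcl1-7 position (C = wild type, T = dcl1-7).
    """
    return MLUI_SITE in (primer + next_base).upper()


def call_genotype(cut_band: bool, uncut_band: bool) -> str:
    """Hypothetical interpretation of MluI-digested PCR products."""
    if cut_band and uncut_band:
        return "heterozygous (DCL1/dcl1-7)"
    if cut_band:
        return "homozygous dcl1-7"
    if uncut_band:
        return "wild type"
    return "no product: repeat PCR"


if __name__ == "__main__":
    print(amplicon_has_mlui_site(FORWARD_PRIMER, "C"))  # False (wild-type allele)
    print(amplicon_has_mlui_site(FORWARD_PRIMER, "T"))  # True (dcl1-7 allele)
    print(call_genotype(cut_band=True, uncut_band=True))
```

Incomplete digestion would also leave an uncut band, so under this reading a partially digested homozygous mutant sample could be mistaken for a heterozygote.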
2.2. miRNA Detection by Northern Blot Assay
1. 10% SDS.
2. 95% ethanol.
3. DEPC-treated water.
4. 10X TBE buffer: 900 mM Tris-borate, 20 mM EDTA, pH 8.0.
5. 30% polyacrylamide stock: acrylamide:bis-acrylamide = 37.5:1. Wrap the bottle with aluminum foil and store at 4°C.
6. Urea: DNase-, RNase-, and protease-free.
7. 2% agar.
8. TEMED. Store at 4°C.
9. 10% ammonium persulfate: preferably made fresh; may be stored at 4°C for up to 1 month.
10. 4X RNA sample loading buffer: 50% glycerol, 0.03% bromophenol blue, 50 mM Tris-Cl, pH 7.7, and 5 mM EDTA in DEPC-treated water.
11. RNA size markers (21- and 24-nt): 100 µM each. Store at −80°C.
12. Ethidium bromide stock: 2 mg/mL. Wrap the bottle with aluminum foil.
13. PerfectHyb Plus hybridization buffer: Sigma, catalog number H7033-1L.
14. T4 polynucleotide kinase (PNK): 10 U/µL.
15. 20X SSC: 3 M sodium chloride, 0.3 M sodium citrate, pH 7.0.
16. Custom DNA oligonucleotides for use as radiolabeled probes: 10 µM.
17. [γ-32P]ATP: ~6,000 Ci/mmol, 10 mCi/ml.
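Items 4–6 are combined in Subheading 3.2.1, step 2, and those volumes follow from simple dilution arithmetic plus the molar mass of urea (60.06 g/mol). For a 30 ml gel, 17% from a 30% stock is 17.0 ml, 0.5X from a 10X stock is 1.5 ml, and 7 M urea is about 12.6 g. The step 2 recipe therefore corresponds to a final volume of roughly 30 ml.

The helper below is a hypothetical sketch for scaling that recipe. Two of its choices are assumptions rather than values from the chapter:
- Linear scaling of TEMED and ammonium persulfate for other volumes.
- Leaving water out of the calculation. Dissolved urea occupies a substantial volume, so water should be adjusted empirically to reach the final volume.

```python
# Hypothetical helper for scaling the denaturing urea-PAGE recipe.
UREA_G_PER_MOL = 60.06  # molar mass of urea


def urea_page_recipe(final_ml: float, gel_percent: float = 17.0,
                     urea_molar: float = 7.0, tbe_x: float = 0.5,
                     acrylamide_stock_percent: float = 30.0,
                     tbe_stock_x: float = 10.0) -> dict[str, float]:
    """Stock volumes (ml), urea mass (g), and catalyst volumes (µL).

    Water is not computed: dissolved urea adds substantial volume, so add
    water until final_ml is reached. TEMED and APS are scaled linearly from
    15 µL and 125 µL per 30 ml (an assumption for other volumes).
    """
    scale = final_ml / 30.0
    return {
        "acrylamide_stock_ml": gel_percent / acrylamide_stock_percent * final_ml,
        "tbe_stock_ml": tbe_x / tbe_stock_x * final_ml,
        "urea_g": urea_molar * (final_ml / 1000.0) * UREA_G_PER_MOL,
        "temed_ul": 15.0 * scale,
        "aps_10pct_ul": 125.0 * scale,
    }


if __name__ == "__main__":
    for name, value in urea_page_recipe(30.0).items():
        print(f"{name}: {value:.2f}")
```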
3. Methods
3.1. Genotyping for the dcl1-7 Allele
3.1.1. Grow the Mutant Plants
1. Prepare seeds of both the mutant line and the wild-type control (La-er or Col-0; see Subheading 2.1, item 1) by immersing the seeds in ~10 ml of 0.1% agarose in 14 ml snap-top tubes. Store the tubes at 4°C for 2–3 days.
2. Fill a set of 3-inch pots with commercial soil mix (e.g., SunGrow Loose Mix #1; BWI catalog number WPLC1). Wet the soil mix thoroughly with deionized water. Place the pots in a flat with holes in the bottom. If the plants are to be grown in a growth chamber, set this flat inside a second flat of the same dimensions that has no holes.
3. Using a P1000 pipette fitted with a wide-bore tip, sow 5–10 mutant seeds in each pot. Distribute the seeds evenly over the soil surface to make subsequent thinning easier. The number of pots depends on the experimental needs; to identify heterozygous individuals for seed propagation, 20 pots will yield an adequate number of plants of the desired genotype.
4. Sow the wild-type control in the same way, except that 3–5 seeds may be placed at the center of each pot.
5. Cover each flat with a matching plastic dome to maintain humidity. Move the flats to a growth chamber or a greenhouse room. A 16 h light (24°C)/8 h dark (22°C) regime is suitable.
6. Remove the dome shortly after germination.
7. When the seedlings are large enough to distinguish the wild-type phenotype from the homozygous mutant phenotype, thin the seedlings in the mutant flat down to 3–4 wild type-looking plants per pot. Two to three weeks after planting are usually required before the homozygous mutant phenotype can be reliably recognized.
3.1.2. Genomic DNA Extraction
1. From each individual plant to be genotyped, collect 2–3 rosette leaves or flower clusters in a microcentrifuge tube. Also collect samples from wild-type and homozygous mutant plants to serve as controls.
2. Add 375 µL of extraction buffer to each tube.
3. Using a plastic Pellet Pestle (VWR; catalog number KT749521-1500) attached to a power drill, homogenize the tissue with several short bursts of grinding.
4. Add another 375 µL of extraction buffer to each tube. Mix by brief vortexing.
5. Incubate the tubes in a 65°C water bath for 10 min.
6. Add 150 µL of neutralization buffer and mix. Incubate the tubes on ice for 20 min.
7. Pellet the cell debris by centrifugation at 14,000 rpm for 5 min in a benchtop microcentrifuge.
8. Transfer ~700 µL of the aqueous phase to a fresh tube. Add an equal volume of isopropanol and mix.
9. Centrifuge at 14,000 rpm for 2 min to pellet the DNA.
10. Carefully pour off the supernatant and wash the pellet with 500 µL of ice-cold 70% ethanol by centrifuging at 14,000 rpm for 2 min.
11. Air-dry the pellet by inverting the open tubes on a test tube rack for about 15 min.
12. Resuspend the pellet in 30–50 µL of 1X TE. Store at 4°C.
3.1.3. PCR and Restriction Analysis
The derived cleaved amplified polymorphic sequence (dCAPS) method, which is widely used for PCR-based detection of single nucleotide polymorphisms in plants (73), is adopted here to detect the dcl1-7 mutation. Briefly, mismatches are introduced into one of the two DCL1-specific primers (Mlu I-DCL1_1219F) such that a Mlu I site is created when the adjacent template nucleotide is a "T" (mutant allele) instead of a "C" (wild-type allele). After Mlu I digestion, the uncleaved and the cleaved PCR fragments can be resolved by electrophoresis on a 4% Metaphor agarose gel.
1. Assemble 25 µL PCR reactions in 200 µL tubes by mixing the following components:
–– 10X PCR buffer 2.5 µL
–– 2.5 mM dNTPs 2.0 µL
–– Primer 1 (Mlu I-DCL1_1219F; 10 µM) 1.0 µL
–– Primer 2 (DCL1_1302R; 10 µM) 1.0 µL
–– Distilled water 16.5 µL
–– Taq DNA polymerase (1 U/µL) 1.0 µL
–– Genomic DNA 1.0 µL
2. Run PCR for 25 cycles using the following program:
–– Initial denaturation at 96°C for 1 min
–– Denaturation at 95°C for 1 min
–– Annealing at 58°C for 45 s
–– Extension at 72°C for 20 s
–– Repeat the cycle 24 more times
–– Final extension at 72°C for 5 min
–– Hold at 4°C
3. Prepare a 4% Metaphor agarose gel. Run 3–5 µL of each PCR reaction to confirm successful amplification, alongside a 25 or 50 bp DNA ladder as size markers. A successful amplification should yield a DNA fragment of approximately 160 bp (Fig. 1a).
4. Digest the PCR products with Mlu I in 20 µL reactions by mixing the following components in microcentrifuge tubes, and incubate in a 37°C water bath for 1 h:
–– 10X reaction buffer 2.0 µL
–– Distilled water 7.7 µL
–– PCR products 10.0 µL
–– Mlu I (10 U/µL) 0.3 µL
5. Prepare a second 4% Metaphor agarose gel. Analyze 5–8 µL of each digested PCR reaction on the gel. Mlu I will cut a ~20 bp fragment off the PCR products amplified from the
mutant, but not the wild-type, allele. Samples derived from heterozygous plants are therefore expected to display a doublet consisting of a ~140 bp band and a ~160 bp band. Samples derived from a wild-type plant or a homozygous mutant plant should display a single band of ~160 bp or ~140 bp, respectively (Fig. 1b).
Fig. 1. Genotyping of the Arabidopsis dcl1-7 mutant using the dCAPS method. (a) The 162 bp DNA fragments amplified from genomic DNA were analyzed on a 4% Metaphor agarose gel. The DNA was prepared from individual plants of a segregating dcl1-7 population (lanes 3–9 and 11–17), a wild-type control (lane 18), and a homozygous mutant plant (lane 2). (b) The same set of PCR products was analyzed on a 4% Metaphor agarose gel after digestion with Mlu I. Note that lanes 3, 4, 6, 7, and 8 represent PCR products amplified from heterozygous dcl1-7 mutant plants, while lane 5 represents a segregating wild type. A 25 bp DNA ladder (lanes 1, 10, and 19) was used as a size marker on both gels.
3.2. Detection of miRNAs by Northern Blot Analysis
3.2.1. Resolve Total RNA by Denaturing Polyacrylamide Gel Electrophoresis (PAGE)
1. Clean a set of glass plates, spacers, and comb with 10% SDS. Rinse thoroughly with deionized water. Wipe clean with 95% ethanol for quick drying.
2. Prepare a 17% polyacrylamide gel containing 7 M urea in 0.5X TBE by mixing the following components in a clean 250 ml Erlenmeyer flask:
–– DEPC-treated water 2.0 ml
–– 10X TBE 1.5 ml
–– 30% polyacrylamide stock 17.0 ml
–– Urea 12.6 g
3. Mix the contents by slowly swirling the flask a few times, and incubate in a 37°C water bath to dissolve the urea.
4. Assemble the pre-cleaned plates and spacers using 6–8 binder clips. Seal the plates with a thin layer of 2% agar.
5. When the urea is completely dissolved, add 15 µL of TEMED and 125 µL of freshly made 10% ammonium persulfate. Mix immediately by swirling the flask a few times. Cast the gel and insert the comb immediately. Allow the gel to polymerize for at least 30 min.
6. When the gel is fully polymerized, pull the comb out slowly and carefully to avoid damaging the wells. Rinse the wells with 0.5X TBE using a 10 ml syringe fitted with a needle.
7. Assemble the electrophoresis apparatus and pre-run the gel at 180 V in 0.5X TBE for at least 20 min. Rinse the wells again right before loading the samples.
8. Prepare the RNA samples by mixing 5–20 µg of total RNA (see Note 1) with 4X RNA sample loading buffer in microcentrifuge tubes. Keep the final sample volume below 25 µL for best resolution.
9. Prepare the RNA size markers (see Note 2) in the same way as the samples.
10. Heat the samples at 65°C for 5 min and immediately chill on ice for 2 min. Spin briefly to collect the sample at the bottom of the tube.
11. Load the samples and size markers with gel-loading tips so that each sample forms a flat band at the very bottom of the well.
12. Run the gel at 180 V in 0.5X TBE until the bromophenol blue dye reaches the bottom of the gel.
3.2.2. Transfer RNA from the Polyacrylamide Gel to Nylon Membrane
1. Disassemble the electrophoresis apparatus and carefully separate the gel slab from the glass plates. Cut off a small triangle from the upper-right corner of the gel to mark its orientation. Stain the gel in 0.5X TBE containing ethidium bromide for 3–5 min.
2. Prepare the Bio-Rad semi-dry transfer unit for later use. Presoak one piece of nylon membrane (see Note 3) and two pieces of Bio-Rad extra-thick blot paper in 0.5X TBE.
3. Destain the gel briefly in 0.5X TBE to remove excess ethidium bromide from the gel surface.
4. Lay a fresh piece of plastic wrap over the UV transilluminator of a gel imager and place the gel on it. Photograph the entire gel under UV light. The 5S rRNA and tRNAs will appear as bright bands near the top of the gel; the RNA size markers should also be visible.
5. Assemble the transfer sandwich on the Bio-Rad semi-dry transfer unit in the following order (from bottom to top): extra-thick blot paper, nylon membrane, gel, the second piece of extra-thick blot paper, and the cathode plate. Cut off a small triangle from the upper-right corner of the membrane to mark its orientation. Remove any air bubbles trapped between the gel and the membrane by carefully lifting one side of the gel and then slowly laying it back onto the membrane.
6. Transfer for 1 h at constant current (400 mA).
7. Disassemble the transfer sandwich. Air-dry the membrane on a piece of clean Whatman paper.
8. Check the gel under UV light to confirm successful transfer.
9. Crosslink the membrane twice in a UV crosslinker using the "auto" setting.
10. Under a portable UV lamp, mark the positions of the RNA size markers on the membrane with a pencil.
3.2.3. Probing the Membrane with a Radiolabeled Oligonucleotide Probe
1. Place the membrane in a hybridization bottle with the sample side facing the center of the bottle. Prehybridize the membrane in 5 ml PerfectHyb Plus buffer (Sigma) at 38°C for 30 min or longer with slow rotation.
2. End-label an oligonucleotide probe using T4 PNK (see Note 4). Assemble the following labeling reaction in a microcentrifuge tube:
–– Distilled water 15.5 µL
–– 10X PNK buffer 2.5 µL
–– DNA oligonucleotide (10 µM; see Note 5) 1.0 µL
–– [γ-32P]ATP (~6,000 Ci/mmol; 10 mCi/ml) 5.0 µL
–– T4 PNK (10 U/µL) 1.0 µL
3. Incubate the tube at 37°C for 30 min. Inactivate the kinase by incubating at 65°C for 10 min.
4. Purify the probe by passing the reaction through a Bio-Rad P6 spin column to remove unincorporated [γ-32P]ATP (see Note 6).
5. Heat the purified probe at 70°C for 2 min, then chill on ice for 2 min. Add the probe to the prehybridization buffer. Allow hybridization to proceed overnight (8–12 h) at 38°C with slow rotation.
6. Pour the hybridization buffer into a radioactive liquid waste container. Wash the membrane at 50°C with fast rotation using preheated buffers, 20 min per wash: (1) once with 2X SSC, 0.2% SDS; (2) once with 1X SSC, 0.1% SDS; (3) once with 0.5X SSC, 0.1% SDS; (4) once with 0.1X SSC, 0.1% SDS.
7. Wrap the membrane in plastic wrap.
8. Expose the membrane to X-ray film in an autoradiography cassette at −80°C with two intensifying screens (see Note 7). Tape the wrapped membrane onto the bottom intensifying screen to make it easier to align the membrane and film afterwards. Mark the orientation of the film by cutting off a small triangle from its upper-right corner. The optimal exposure time may range from a few hours to several days, depending on the abundance of the target miRNA and the type of probe.
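A quick stoichiometry check of the labeling reaction in step 2 of Subheading 3.2.3 shows which component is limiting:
- 1.0 µL of 10 µM oligonucleotide is 10 pmol of 5′ ends.
- 5.0 µL of [γ-32P]ATP at 10 mCi/ml is 50 µCi, or about 8.3 pmol at ~6,000 Ci/mmol.

At these nominal values, the ATP is slightly limiting, so at most ~83% of the oligonucleotide could be phosphorylated. The function below is a hypothetical sketch. It ignores radioactive decay since the calibration date and assumes every oligonucleotide carries a free 5′-OH.

```python
# Hypothetical stoichiometry check for the 25 µL T4 PNK labeling reaction.


def labeling_balance(oligo_um: float = 10.0, oligo_ul: float = 1.0,
                     atp_mci_per_ml: float = 10.0, atp_ul: float = 5.0,
                     ci_per_mmol: float = 6000.0) -> dict[str, float]:
    """Nominal pmol of oligo 5' ends and of [gamma-32P]ATP in the reaction.

    Ignores decay since the calibration date and assumes every oligo
    has a free 5'-OH that PNK can phosphorylate.
    """
    oligo_pmol = oligo_um * oligo_ul           # µM x µL = pmol
    atp_uci = atp_mci_per_ml * atp_ul          # mCi/ml = µCi/µL, so x µL = µCi
    atp_pmol = atp_uci / ci_per_mmol * 1000.0  # µCi / (Ci/mmol) = nmol; x 1000 = pmol
    return {
        "oligo_pmol": oligo_pmol,
        "atp_uci": atp_uci,
        "atp_pmol": atp_pmol,
        "max_fraction_labeled": min(1.0, atp_pmol / oligo_pmol),
    }


if __name__ == "__main__":
    print(labeling_balance())
```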
4. Notes
1. Clean, high-quality total RNA extracts can be used directly for miRNA detection by northern blot assays, without further fractionation of low-molecular-weight (LMW) RNAs. In the past, the column from the Qiagen RNA/DNA midi kit (catalog number 14142) was used to separate high-molecular-weight (HMW) and LMW RNAs by a stepwise elution procedure, followed by isopropanol precipitation. The recovery of small RNAs by this procedure turned out to be inefficient. However, the column purification step does significantly improve RNA quality, especially for samples prepared from polysaccharide-rich tissues. It is therefore still recommended to purify total RNA extracts using either the Qiagen RNA/DNA midi kit or an alternative method, but fractionation of LMW RNA is generally unnecessary.
2. RNA size markers serve as a convenient reference for data interpretation. An equimolar mix of 21- and 24-nt synthetic, 5′-phosphorylated RNA oligonucleotides with a non-plant sequence (e.g., GFP) works well for this purpose. Loading 250–500 pmol of each is usually sufficient to visualize the markers on the membrane after gel transfer.
3. For maximum sensitivity, a positively charged nylon membrane is recommended (e.g., Nytran SuPerCharge membranes by Whatman, formerly Schleicher & Schuell; VWR catalog number 28151318).
4. T4 PNK has an intrinsic bias against efficient labeling of certain oligonucleotides, particularly those with a 5′-C end, which may therefore be labeled with low efficiency. OptiKinase from USB (catalog number 78334X) is a modified version of T4 PNK that exhibits little or no base discrimination and may solve this problem.
5. For detection of miRNAs with very low abundance, oligonucleotides carrying the locked nucleic acid (LNA) modification can significantly improve detection sensitivity compared with regular DNA oligonucleotides (74).
6. The incorporation of 32P into the oligonucleotide can be measured experimentally:
- Take 0.5 µL of the labeling reaction and dilute it 1:50 with 1X TE.
- Spot a small aliquot (e.g., 2 µL) of the diluted reaction onto each of two Whatman DE81 filter circles (catalog number 3658-325), and air-dry the filters.
- Set one filter aside. Wash the other twice for 5 min each with 50 ml of 0.5 M Na2HPO4 (pH 6.8) in a glass beaker with gentle shaking. A final wash in 95% ethanol may follow to speed air-drying of the filter.
- Transfer each filter to a scintillation vial containing 5 ml of scintillation cocktail (e.g., Fisher Scientific catalog number SX18-4).
- Measure the radioactivity retained on each filter using a liquid scintillation counter (LSC).

The incorporation efficiency can be calculated as follows:
$$\%\ \text{incorporation} = \frac{\text{incorporated cpm}}{\text{total cpm}} \times 100 = \frac{\text{cpm from washed filter}}{\text{cpm from unwashed filter}} \times 100$$
An incorporation efficiency of 50% or higher is often achieved (see Note 4).
7. If possible, check the radioactive signal on a phosphor imager before exposing the blot to film. This serves as a preview to ensure that a satisfactory signal/noise ratio has been achieved. In case of high background or unknown "hot spots", an extended final wash can be done before exposing the membrane to film.
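The calculation in Note 6 is a single ratio, sketched below in Python. Because both DE81 filters carry the same volume of the same 1:50 dilution, the dilution factor cancels out. The optional background term, for example a vial containing cocktail only, is an assumption not described in Note 6.

```python
def percent_incorporation(washed_cpm: float, unwashed_cpm: float,
                          background_cpm: float = 0.0) -> float:
    """Percent of 32P bound to the oligonucleotide (Note 6).

    Both filters must carry the same volume of the same dilution, so the
    1:50 dilution cancels. background_cpm is an optional, assumed blank.
    """
    incorporated = washed_cpm - background_cpm
    total = unwashed_cpm - background_cpm
    if total <= 0:
        raise ValueError("unwashed-filter cpm must exceed background")
    return incorporated / total * 100.0


if __name__ == "__main__":
    print(percent_incorporation(42000, 70000))
```

The counts in the usage line are made-up numbers for illustration.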
Acknowledgments
I thank Chris Rock for critically reading the manuscript.
References
1. Llave C, Kasschau KD, Rector MA, Carrington JC (2002) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14:1605–1619
2. Mette MF, van der Winden J, Matzke M, Matzke AJ (2002) Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130:6–9
3. Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
4. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
5. Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286:950–952
6. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T (2001) Identification of novel genes coding for small expressed RNAs. Science 294:853–858
7. Lau NC, Lim EP, Weinstein EG, Bartel DP (2001) An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294:858–862
8. Lee RC, Ambros V (2001) An extensive class of small RNAs in Caenorhabditis elegans. Science 294:862–864
9. Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363
10. Chapman EJ, Carrington JC (2007) Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet 8:884–896
11. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
12. Vaucheret H (2006) Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev 20:759–771
13. Axtell MJ, Snyder JA, Bartel DP (2007) Common functions for diverse small RNAs of land plants. Plant Cell 19(6):1750–1769
14. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC (2007) High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219
15. Henderson IR, Zhang X, Lu C, Johnson L, Meyers BC, Green PJ, Jacobsen SE (2006) Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nat Genet 38:721–725
16. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
17. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20:3407–3425
18. Xie Z, Qi X (2008) Diverse small RNA-directed silencing pathways in plants. Biochim Biophys Acta – Gene Regulatory Mechanisms 1779(11):720–724
19. Kurihara Y, Watanabe Y (2004) Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci USA 101:12753–12758
20. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
21. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of Arabidopsis MIRNA genes. Plant Physiol 138:2145–2154
22. Fang Y, Spector DL (2007) Identification of nuclear dicing bodies containing proteins for microRNA biogenesis in living Arabidopsis plants. Curr Biol 17:818–823
23. Han MH, Goud S, Song L, Fedoroff N (2004) The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci USA 101:1093–1098
24. Vazquez F, Gasciolli V, Crete P, Vaucheret H (2004) The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14:346–351
25. Khvorova A, Reynolds A, Jayasena SD (2003) Functional siRNAs and miRNAs exhibit strand bias. Cell 115:209–216
26. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115:199–208
27. Baumberger N, Baulcombe DC (2005) Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc Natl Acad Sci USA 102:11928–11933
28. Qi Y, Denli AM, Hannon GJ (2005) Biochemical specialization within Arabidopsis RNA silencing pathways. Mol Cell 19:421–428
29. Vaucheret H, Mallory AC, Bartel DP (2006) AGO1 homeostasis entails coexpression of MIR168 and AGO1 and preferential stabilization of miR168 by AGO1. Mol Cell 22:129–136
30. Hamilton A, Voinnet O, Chappell L, Baulcombe D (2002) Two classes of short interfering RNA in RNA silencing. EMBO J 21:4671–4679
31. Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2:E104
32. Herr AJ, Jensen MB, Dalmay T, Baulcombe DC (2005) RNA polymerase IV directs silencing of endogenous DNA. Science 308:118–120
33. Onodera Y, Haag JR, Ream T, Nunes PC, Pontes O, Pikaard CS (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120:613–622
34. Pontier D, Yahubyan G, Vega D, Bulski A, Saez-Vasquez J, Hakimi MA, Lerbs-Mache S, Colot V, Lagrange T (2005) Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA polymerases IV in Arabidopsis. Genes Dev 19:2030–2040
35. Zhang X, Henderson IR, Lu C, Green PJ, Jacobsen SE (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci USA 104:4536–4541
36. Zilberman D, Cao X, Jacobsen SE (2003) ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299:716–719
37. Li CF, Pontes O, El-Shami M, Henderson IR, Bernatavichute YV, Chan SW, Lagrange T, Pikaard CS, Jacobsen SE (2006) An ARGONAUTE4-containing nuclear processing center colocalized with Cajal bodies in Arabidopsis thaliana. Cell 126:93–106
38. Pontes O, Li CF, Nunes PC, Haag J, Ream T, Vitins A, Jacobsen SE, Pikaard CS (2006) The Arabidopsis chromatin-modifying nuclear siRNA pathway involves a nucleolar RNA processing center. Cell 126:79–92
39. Allen E, Xie Z, Gustafson AM, Carrington JC (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121:207–221
40. Axtell MJ, Jan C, Rajagopalan R, Bartel DP (2006) A two-hit trigger for siRNA biogenesis in plants. Cell 127:565–577
41. Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS (2004) SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. Genes Dev 18:2368–2379
42. Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli V, Mallory AC, Hilbert JL, Bartel DP, Crete P (2004) Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16:69–79
43. Gasciolli V, Mallory AC, Bartel DP, Vaucheret H (2005) Partially redundant functions of Arabidopsis DICER-like enzymes and a role for DCL4 in producing trans-acting siRNAs. Curr Biol 15:1494–1500
44. Xie Z, Allen E, Wilken A, Carrington JC (2005) DICER-LIKE 4 functions in trans-acting small interfering RNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc Natl Acad Sci USA 102:12984–12989
45. Yoshikawa M, Peragine A, Park MY, Poethig RS (2005) A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev 19:2164–2175
46. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123:1279–1291
47. Katiyar-Agarwal S, Morgan R, Dahlbeck D, Borsani O, Villegas A Jr, Zhu JK, Staskawicz BJ, Jin H (2006) A pathogen-inducible endogenous siRNA in plant immunity. Proc Natl Acad Sci USA 103:18002–18007
48. Yang Z, Ebright YW, Yu B, Chen X (2006) HEN1 recognizes 21–24 nt small RNA duplexes and deposits a methyl group onto the 2′ OH of the 3′ terminal nucleotide. Nucleic Acids Res 34:667–675
49. Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X (2005) Methylation as a crucial step in plant microRNA biogenesis. Science 307:932–935
50. Li J, Yang Z, Yu B, Liu J, Chen X (2005) Methylation protects miRNAs and siRNAs from a 3′-end uridylation activity in Arabidopsis. Curr Biol 15:1501–1507
51. Schauer SE, Jacobsen SE, Meinke DW, Ray A (2002) DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7:487–491
52. Chen X, Liu J, Cheng Y, Jia D (2002) HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129:1085–1094
53. Golden TA, Schauer SE, Lang JD, Pien S, Mushegian AR, Grossniklaus U, Meinke DW, Ray A (2002) SHORT INTEGUMENTS1/SUSPENSOR1/CARPEL FACTORY, a Dicer homolog, is a maternal effect gene required for embryo development in Arabidopsis. Plant Physiol 130:808–822
54. Jacobsen SE, Running MP, Meyerowitz EM (1999) Disruption of an RNA helicase/RNAse III gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126:5231–5243
55. Ray A, Lang JD, Golden T, Ray S (1996) SHORT INTEGUMENTS1 (SIN1), a gene required for ovule development in Arabidopsis, also controls flowering time. Development 122:2631–2638
56. Bohmert K, Camus I, Bellini C, Bouchez D, Caboche M, Benning C (1998) AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J 17:170–180
57. Fagard M, Boutet S, Morel JB, Bellini C, Vaucheret H (2000) AGO1, QDE-2, and RDE-1 are related proteins required for posttranscriptional gene silencing in plants, quelling in fungi, and RNA interference in animals. Proc Natl Acad Sci USA 97:11650–11654
58. Morel JB, Godon C, Mourrain P, Beclin C, Boutet S, Feuerbach F, Proux F, Vaucheret H (2002) Fertile hypomorphic ARGONAUTE (ago1) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14:629–639
59. Kidner CA, Martienssen RA (2004) Spatially restricted microRNA directs leaf polarity through ARGONAUTE1. Nature 428:81–84
60. Lu C, Fedoroff N (2000) A mutation in the Arabidopsis HYL1 gene encoding a dsRNA binding protein affects responses to abscisic acid, auxin, and cytokinin. Plant Cell 12:2351–2366
61. Yang L, Liu Z, Lu F, Dong A, Huang H (2006) SERRATE is a novel nuclear regulator in primary microRNA processing in Arabidopsis. Plant J 47:841–850
62. Grigg SP, Canales C, Hay A, Tsiantis M (2005) SERRATE coordinates shoot meristem function and leaf axial patterning in Arabidopsis. Nature 437:1022–1026
63. Lobbes D, Rallapalli G, Schmidt DD, Martin C, Clarke J (2006) SERRATE: a new player on the plant microRNA scene. EMBO Rep 7:1052–1058
64. Kurihara Y, Takashi Y, Watanabe Y (2006) The interaction between DCL1 and HYL1 is important for efficient and precise processing of pri-miRNA in plant microRNA biogenesis. RNA 12:206–212
65. Adenot X, Elmayan T, Lauressergues D, Boutet S, Bouche N, Gasciolli V, Vaucheret H (2006) DRB4-dependent TAS3 trans-acting siRNAs control leaf morphology through AGO7. Curr Biol 16:927–932
66. Hiraguri A, Itoh R, Kondo N, Nomura Y, Aizawa D, Murai Y, Koiwa H, Seki M, Shinozaki K, Fukuhara T (2005) Specific interactions between Dicer-like proteins and HYL1/DRB-family dsRNA-binding proteins in Arabidopsis thaliana. Plant Mol Biol 57:173–188
67. Song L, Han MH, Lesicka J, Fedoroff N (2007) Arabidopsis primary microRNA processing proteins HYL1 and DCL1 define a nuclear body distinct from the Cajal body. Proc Natl Acad Sci USA 104:5437–5442
68. Telfer A, Poethig RS (1998) HASTY: a gene that regulates the timing of shoot maturation in Arabidopsis thaliana. Development 125:1889–1898
69. Bollman KM, Aukerman MJ, Park MY, Hunter C, Berardini TZ, Poethig RS (2003) HASTY, the Arabidopsis ortholog of exportin 5/MSN5, regulates phase change and morphogenesis. Development 130:1493–1504
70. Park MY, Wu G, Gonzalez-Sulser A, Vaucheret H, Poethig RS (2005) Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci USA 102:3691–3696
71. Boutet S, Vazquez F, Liu J, Beclin C, Fagard M, Gratias A, Morel JB, Crete P, Chen X, Vaucheret H (2003) Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13:843–848
72. Stepanova AN, Alonso JM (2006) PCR-based screening for insertional mutants. In: Salinas J, Sanchez-Serrano J (eds) Arabidopsis Protocols. Humana Press, Totowa, NJ, pp 163–172
73. Neff MM, Neff JD, Chory J, Pepper AE (1998) dCAPS, a simple technique for the genetic analysis of single nucleotide polymorphisms: experimental applications in Arabidopsis thaliana genetics. Plant J 14:387–392
74. Valoczi A, Hornyik C, Varga N, Burgyan J, Kauppinen S, Havelda Z (2004) Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32:e175
Chapter 2
Prediction of Plant miRNA Genes
Matthew W. Jones-Rhoades
Abstract
This chapter presents procedures for the computational identification of plant miRNA genes. In the first procedure, homologs of known miRNAs are identified in a database of genomic or cDNA sequence. In the second procedure, previously unidentified miRNA families are predicted through the analysis of secondary structure, evolutionary conservation, and targeting potential.
Key words: Gene prediction, microRNA discovery, Comparative genomics
1. Introduction
MicroRNAs are short, non-coding, endogenously expressed RNAs that are processed from longer hairpin precursors (see (1) for review). Historically, most plant miRNA genes have been discovered by one of two methods: molecular cloning of small RNAs (2–8) and computational prediction of miRNA genes based on the conservation of sequence and secondary structure (9–14). While molecular cloning is the most direct way of discovering miRNAs, bioinformatic approaches have provided a useful complement to cloning experiments. For example, as new miRNAs are discovered through molecular cloning, computational approaches can identify homologous miRNAs in databases of genomic or cDNA sequences. This establishes the copy number and the range of evolutionary conservation of these miRNAs. Similarly, accurate identification of known miRNA families in the growing body of genomic and cDNA sequence is critical for accurate and thorough annotation of gene content in these databases.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592
DOI 10.1007/978-1-60327-005-2_2, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Jones-Rhoades
Computational approaches can also complement cloning experiments by identifying miRNA families that are difficult to clone because of low or highly specific expression. For example, the miR395 and miR399 families were both initially identified computationally (9, 11). Because of low expression levels, both are difficult to detect experimentally in Arabidopsis under standard growth conditions, but they are easily detectable in plants starved for sulfate or phosphate, respectively (11, 15, 16). Although high-throughput sequencing technologies have vastly improved the sensitivity of molecular cloning experiments (3, 5, 17, 18), it is still difficult to ensure that all possible tissues, developmental stages, and growth conditions are represented in cloned small RNA libraries. Computational methods will therefore remain useful for the discovery of plant miRNAs.
1.1. Identification of Plant miRNA Homologs
Homologs of a known miRNA can be predicted by identifying genomic regions that have (a) sequence similarity to the known miRNA and (b) the potential to form miRNA hairpin precursors. Because the loop regions of miRNA hairpins can be highly divergent between related miRNAs, the most sensitive means of finding potential homologs is to identify near matches to the mature miRNA. The genomic regions containing these candidates are then screened for the ability to fold into hairpin structures that resemble miRNA precursors. While this procedure is relatively straightforward, care must be taken to avoid the spurious mis-annotation of non-miRNA sequences as miRNAs (see below).
1.2. De Novo Identification of Conserved Plant miRNAs
A more difficult problem is to predict miRNA genes that are unrelated to any miRNA with previous experimental evidence. Typically, the first step in such an approach is to identify genomic regions with the potential to form hairpin secondary structures with properties similar to known miRNA precursors. However, a typical genome contains thousands of regions with predicted secondary structures that resemble miRNA precursors (11), of which only a small fraction are likely to represent actual miRNA genes. Therefore, it is necessary to use additional criteria to identify the genomic hairpins with a high probability of encoding actual miRNAs. While several groups have addressed this problem in plants (9–11, 13), all have used the evolutionary conservation of stem-loops and the ability to base pair to potential target RNAs as a means of reducing the number of false positive hairpins under consideration.
Prediction of Plant miRNA Genes
This chapter describes a procedure that identified 18 of the 20 miRNA families that are conserved between Arabidopsis and rice (11). Because future analyses will probably involve different genomes that are separated by different evolutionary distances, I have attempted to highlight the ways in which this approach could be adapted to other comparisons.
1.3. miRcheck and Supporting Scripts
A central step in the prediction of plant miRNA genes is the analysis of secondary structures to identify candidate hairpin precursors. Plant miRNA hairpins are diverse in terms of the length and extent of base pairing in the loop region (defined here as the sequence connecting the miRNA and miRNA*). However, all well-supported miRNA stem-loops are predicted to have extensive base pairing in the region containing the miRNA and miRNA*. If only a moderate number of putative miRNA loci are to be analyzed, it may be possible to evaluate each secondary structure manually. However, for a genome-wide search involving hundreds or thousands of candidate miRNAs, it is desirable to use an automated method to evaluate the hairpin structures. The miRcheck algorithm has been used to evaluate the miRNA-encoding potential of whole genomes (11), to identify miRNA homologs in poplar (12), and to evaluate the miRNA-encoding potential of cloned small RNAs (5, 19). miRcheck compares the local secondary structure of the miRNA/miRNA* against a series of parameters; a putative miRNA will “pass” miRcheck if cutoff values are met for all parameters. The default cutoffs of miRcheck have exhibited a good balance of selectivity and sensitivity in a number of different applications. Although miRcheck was designed based on the characteristics of 11 miRNA families conserved between Arabidopsis and rice, 94% of plant miRNA loci (miRbase 10.0) that are supported by conservation or deep sequencing pass the default parameters. Importantly, the cutoff value for each miRcheck parameter can be adjusted as the requirements and goals of an analysis dictate. The features analyzed by miRcheck are:
1. A consistent orientation of base pairing within the putative miRNA (i.e., the nucleotides pairing to the miRNA must either all be 5′ of their partners or all be 3′ of their partners).
2. A stem-loop length of at least 54 nucleotides (including the putative miRNA and miRNA*). This requirement discriminates against cases in which the putative miRNA and miRNA* are separated by a very short loop, a pattern not observed for well-documented plant miRNA precursors.
3. A maximum of six unpaired nucleotides in the putative miRNA, of which no more than three are consecutive. In addition, no more than two nucleotides within the putative miRNA may be asymmetrically unpaired (i.e., only two unpaired nucleotides that lack corresponding unpaired nucleotides in the miRNA* are allowed).
4. A maximum of six unpaired nucleotides in the putative miRNA*, of which no more than three are consecutive.
5. The extension of base pairing at least two nucleotides outside of the putative miRNA. This is defined as the existence of an “extended miRNA” (and corresponding “extended miRNA*”) that contains the putative miRNA and that meets all of the above pairing requirements.
6. The extended miRNA* must be no more than three nucleotides longer than the extended miRNA. This requirement discriminates against long bulges in the miRNA*.
7. At least one nucleotide within the extended miRNA or miRNA* must be unpaired. This requirement is designed to discriminate against long, perfect inverted repeats that occur in plant genomes.
In addition to miRcheck itself (implemented as a perl module), I have also made available supporting perl scripts that are used in the methods in this chapter. In many cases, these scripts are wrappers that parse an input file, feed the data to another algorithm (such as miRcheck, patscan, or RNAfold), and parse the output into a convenient format.
1.4. Maximizing Sensitivity and Selectivity in miRNA Prediction
In any bioinformatic gene prediction strategy, it is desirable to minimize the number of false negatives (i.e., to correctly identify as many real genes as possible) while also minimizing the number of false positives. In most cases, the investigator must decide how to balance these sometimes opposing goals. In the procedures below, I present methods and cutoffs that I have found to give a good balance of sensitivity and specificity in a variety of different situations. The ideal thresholds and methods for any particular analysis will be dictated by the nature of the search and the goals of the analysis (e.g., an exhaustive list of genes vs. a highly accurate list of genes). Due diligence and skepticism on the part of the investigator, including careful consideration of the possibility of false positives, are paramount to achieving meaningful and useful predictions of miRNA genes. Without careful controls, there is a very real danger of muddling the literature with spuriously annotated sequences that are not authentic miRNA genes.
2. Materials
2.1. Identification of Plant miRNA Homologs
1. A computer with a Unix/Linux/OSX operating system.
2. A file of miRNA sequences in Fasta format.
3. A file of genomic/cDNA sequences in Fasta format.
4. Patscan (available at http://www-unix.mcs.anl.gov/compbio/PatScan/) should be installed.
5. RNAfold (available at http://www.tbi.univie.ac.at/~ivo/RNA/RNAfold.html) or other RNA secondary structure prediction software should be installed.
6. miRcheck and supporting scripts (available at http://web.wi.mit.edu/bartel/pub/software.html).
2.2. De Novo Identification of Conserved Plant miRNAs
1. Two or more sets of genomic or cDNA sequences, contained in separate Fasta files.
2. einverted (part of the EMBOSS package, available at http://emboss.sourceforge.net/) should be installed.
3. RNAfold, miRcheck, and supporting perl scripts (as described above).
3. Methods
3.1. Identification of Plant miRNA Homologs
This protocol assumes that the investigator is starting with one or more miRNA sequences to be used as queries in the search, as well as a set of genomic or cDNA sequences in which to search for miRNA homologs.
3.1.1. Identify Potential Homology Based on Primary Sequence Similarity
% run_patscan.pl miRNAs.fa genome.fa 200 miRNA_matches
This script calls patscan to identify matches to the known miRNAs (contained in miRNAs.fa) in the target genomic sequence (contained in genome.fa) at the specified stringency (in this case, 0–2 substitutions and no insertions/deletions) and writes the matches to miRNA_matches (see Note 1).
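The kind of search that run_patscan.pl delegates to patscan can be illustrated in a few lines of Python. This is a minimal brute-force sketch, not the behavior of patscan or of the supporting script: it assumes a single genomic sequence held in memory, converts U to T in the miRNA, and reports every ungapped window on either strand with at most two substitutions (the stringency used above). The function names are hypothetical.

```python
"""Brute-force ungapped search for near matches to a miRNA (illustrative sketch)."""

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ungapped_matches(mirna, genome, max_subs=2):
    """Return (start, strand, n_subs, window) for every ungapped genomic window
    within max_subs substitutions of the miRNA on either strand.

    start is the 0-based position of the window on the + strand; the miRNA
    is converted from RNA to DNA (U -> T) before comparison.
    """
    query = mirna.upper().replace("U", "T")
    genome = genome.upper()
    k = len(query)
    queries = (("+", query), ("-", revcomp(query)))
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, q in queries:
            n_subs = sum(1 for a, b in zip(q, window) if a != b)
            if n_subs <= max_subs:
                hits.append((start, strand, n_subs, window))
    return hits


if __name__ == "__main__":
    mir156 = "UGACAGAAGAGAGUGAGCAC"
    toy_genome = "GGGG" + "TGACAGAAGAGAGTGAGCAT" + "CCCC"
    for hit in find_ungapped_matches(mir156, toy_genome):
        print(hit)
```

Because it compares every window, this sketch scales with genome length × miRNA length; patscan or an index-based aligner is far more practical for whole genomes.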
3.1.2. Screen Potential Homologs for miRNA-like Secondary Structure
1. % retrieve_genomic_regions.pl 350 miRNA_matches genome.fa miRNA_matches_g
This command retrieves a genomic fragment for each putative miRNA, with 350 flanking nucleotides added to each side of the putative miRNA (see Note 2).
2. % RNAfold < miRNA_matches_g > miRNA_matches_g_f
This command uses RNAfold to predict the secondary structure of each putative miRNA locus.
3. % evaluate_miRNA_candidates.pl miRNA_matches_g_f miRNA_matches_g_f_miRNAs
This script calls miRcheck to evaluate the secondary structure of each putative miRNA. The output (miRNA_matches_g_f_miRNAs) contains the position of the miRNA within the hairpin (5′ or 3′ arm) for each candidate that passed miRcheck. Additional output files (miRNA_matches_g_f_mature and miRNA_matches_g_f_hairpin) contain the genomic coordinates and sequences of the mature miRNAs and miRNA hairpins for candidates that passed miRcheck.
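To make the structural evaluation more concrete, the sketch below applies two of the pairing criteria listed for miRcheck (criterion 1, consistent orientation, and the unpaired-nucleotide limits of criterion 3) to a dot-bracket structure of the kind RNAfold prints, and reports which arm the candidate lies on. It is a simplified, hypothetical illustration rather than miRcheck itself: it ignores stem-loop length, asymmetric bulges, the miRNA*, and the extension requirements, and it assumes 0-based, end-exclusive miRNA coordinates within the folded fragment.

```python
"""Simplified checks inspired by two of the miRcheck criteria (sketch only)."""


def pair_table(dot_bracket):
    """Return a list mapping each position to its pairing partner (or None)."""
    partner = [None] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return partner


def check_candidate(dot_bracket, start, end, max_unpaired=6, max_run=3):
    """Check a putative miRNA at dot_bracket[start:end] (0-based, end-exclusive).

    Returns (passes, arm); arm is "5'" or "3'" when the candidate passes,
    otherwise None. Only orientation and unpaired-nucleotide limits are tested.
    """
    partner = pair_table(dot_bracket)
    region = range(start, end)
    paired = [i for i in region if partner[i] is not None]
    if not paired:
        return False, None
    # Criterion 1: every paired nucleotide pairs in the same direction.
    if all(partner[i] > i for i in paired):
        arm = "5'"  # partners lie downstream, so the miRNA is on the 5' arm
    elif all(partner[i] < i for i in paired):
        arm = "3'"  # partners lie upstream, so the miRNA is on the 3' arm
    else:
        return False, None
    # Criterion 3 (partial): total unpaired nucleotides and longest unpaired run.
    unpaired = longest = run = 0
    for i in region:
        if partner[i] is None:
            unpaired += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    passes = unpaired <= max_unpaired and longest <= max_run
    return passes, (arm if passes else None)
```

The arm reported here corresponds to the 5′/3′ position that the chapter uses later as a quality-control check against known family members.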
3.1.3. Quality Control for miRNA Homolog Predictions
Due to the rather vague pairing requirements for plant miRNA hairpins, there is a considerable risk of recovering false positives when predicting miRNA genes. Several quality control steps should be applied to the predicted miRNA genes to help gauge the stringency of the predictions. One potential source of false positives is the mis-annotation of miRNA binding sites in target genes as miRNA hairpins. Because most plant miRNAs are highly complementary to their target mRNAs, high-quality matches (0–2 nucleotide substitutions) to a miRNA in a plant genome primarily fall into one of two categories: miRNA genes and miRNA complementary sites in target genes. In general, the former will have predicted secondary structures that pass miRcheck while the latter will not. However, due to the rather loose pairing requirements of plant miRNA precursors, a fraction of miRNA target sites can be expected to fold spuriously into hairpin structures. Therefore, it is advisable to screen putative miRNA homologs for the potential to encode proteins. A putative miRNA locus that has similarity to the known targets of that miRNA family (e.g., a putative miR164 hairpin that is antisense to a CUC-like gene) is probably a false positive. A useful check on the validity of the predicted miRNAs is to consider the location of each putative miRNA within its hairpin precursor (i.e., on the 5′ arm or the 3′ arm of the hairpin). Each miRNA family has a characteristic location in which the mature miRNA is always found. For example, all miR156 hairpins contain the mature miRNA on the 5′ arm. An analysis that predicts a substantial number of miRNAs that are located on the opposite arms of their precursors relative to their known homologs (e.g., miR156 homologs on 3′ arms) should be cause for concern. Another useful control is to repeat the analysis in a context in which there should be no true positives.
For example, arbitrary sequences with the same dinucleotide composition as the actual miRNAs can be used as the starting point of the search. The prediction of a substantial number of “homologs” to these arbitrary sequences should be taken as an indication that one or more steps in the procedure are insufficiently stringent.
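One way to generate control queries with the same dinucleotide composition as the real miRNAs is a doublet-preserving shuffle. The sketch below follows the Altschul–Erickson random Eulerian-walk approach, a technique named here for illustration rather than one prescribed by this chapter; the function names are hypothetical. Each shuffled sequence keeps the length, the first and last nucleotides, and the exact dinucleotide counts of its input.

```python
"""Dinucleotide-preserving shuffle (Altschul-Erickson style), illustrative sketch."""
import random
from collections import Counter, defaultdict


def _last_edges_form_tree(last_edge, root):
    """True if following last_edge from every letter reaches root without a cycle."""
    for start in last_edge:
        seen = set()
        v = start
        while v != root:
            if v in seen:
                return False  # the chosen last-exit edges contain a cycle
            seen.add(v)
            v = last_edge[v]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Return a random sequence with the same dinucleotide counts as seq.

    The first and last letters are preserved, as required for the shuffled
    sequence to be a walk through the same multiset of dinucleotides.
    """
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    first, root = seq[0], seq[-1]
    successors = defaultdict(list)  # letter -> list of letters that follow it
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    # Pick a random "last exit" edge for every letter except the final one,
    # retrying until those edges form a tree leading to the final letter.
    while True:
        last_edge = {v: rng.choice(outs) for v, outs in successors.items() if v != root}
        if _last_edges_form_tree(last_edge, root):
            break
    # Shuffle the remaining edges; each letter's last-exit edge is used last.
    order = {}
    for v, outs in successors.items():
        outs = list(outs)
        if v != root:
            outs.remove(last_edge[v])
            rng.shuffle(outs)
            outs.append(last_edge[v])
        else:
            rng.shuffle(outs)
        order[v] = outs
    # Walk the edges from the first letter; this traces an Eulerian trail.
    used = Counter()
    result = [first]
    current = first
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)
```

Shuffled queries produced this way could then be carried through the same search, folding, and miRcheck steps as the real miRNAs, so that any “homologs” they yield estimate the background rate.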
3.1.4. A Worked Example: Identification of Homologs to Arabidopsis miRNAs in Moss
This protocol was designed to identify homologs of Arabidopsis miRNAs in the poplar genome (12). Testing this procedure against the moss (Physcomitrella patens) genome demonstrates its utility, while also illustrating some pitfalls. The recent deep sequencing of moss small RNAs provides an independent measure of the actual miRNA content in the moss genome (19). Using the set of mature Arabidopsis miRNAs (miRbase 10.0, representing 94 families) as queries, 581 genomic loci with sequence similarity to an Arabidopsis miRNA were identified, of which 46 passed miRcheck. Thirty-two of the 34 conserved miRNA genes with experimental evidence (representing seven families) were identified. For the 87 Arabidopsis miRNA families without experimental evidence in moss, only nine putative homologs that passed miRcheck were identified, of which only four were on the correct hairpin arm (see Note 3).
3.2. De Novo Identification of Conserved Plant miRNAs
This protocol assumes that the investigator is starting with two or more databases of genomic or cDNA sequences in which to search for miRNA genes. The procedure outlined was developed to discover miRNAs conserved between Arabidopsis and rice. Because a comparison across a different evolutionary distance (e.g., comparing Arabidopsis to poplar, or two species of moss to each other) might require a very different analysis, it is not possible to describe a single procedure that will work well in all cases. Accordingly, the protocol describes approaches that worked well in the monocot-dicot analysis so that investigators interested in other analyses can consider adapting them as needed.
3.2.1. Identify Genomic Regions with miRNA-like Secondary Structure
1. % run_einverted.pl genome.fa einverted_output
This script calls einverted to identify imperfect inverted repeats in the genomic sequence. Because einverted cannot run efficiently on large sequences, the genomic sequence is broken up into overlapping 2 kb fragments (see Note 4).
2. % fold_inverted_repeats.pl einverted_output genome.fa einverted_output_f
For each inverted repeat, this script retrieves the genomic sequence containing the repeat (plus 10 flanking nucleotides on either side) and uses RNAfold to predict the secondary structure of each genomic fragment. Two structures are predicted for each inverted repeat, corresponding to theoretical transcripts from either DNA strand.
3. % extract_einverted_20mers.pl einverted_output_f einverted_output_20mers
For each secondary structure, miRcheck is used to identify 20-mers with local base-pairing patterns that are compatible with those typically found in plant miRNA precursors. Because each 20-mer is considered separately, a single stem-loop may contain numerous 20-mer miRNA candidates.
The end result of this step is a (probably quite lengthy) list of candidate miRNA stem-loops (i.e., candidate miRNA precursors), with each candidate stem-loop containing one or more candidate mature miRNAs. (In many cases, the reverse-complement stem-loop will also contain candidate miRNAs.) Unless overly stringent cutoffs have been used, it is likely that only a very small fraction of these candidates are actual miRNAs. Therefore, additional filters are important to improve the selectivity of the miRNA predictions (see Notes 5 and 6).
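Breaking a chromosome into overlapping fragments before running einverted can be sketched as follows. The 2 kb fragment size comes from the text above; the overlap is not specified here, so the 500-nt default is an assumption, as is the function name. With this tiling, any inverted repeat no longer than the overlap lies entirely within at least one fragment.

```python
def overlapping_chunks(seq, size=2000, overlap=500):
    """Yield (offset, fragment) pairs that tile seq with overlapping windows.

    Consecutive fragments share `overlap` nucleotides, so any feature no
    longer than `overlap` lies entirely within at least one fragment.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    start = 0
    while True:
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break  # this fragment already reaches the end of seq
        start += step
```

Coordinates reported within a fragment are converted back to genomic coordinates by adding that fragment's offset; repeats found twice in neighboring fragments would then collapse to the same genomic interval.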
3.2.2. Test Robustness of miRNA Hairpin Folding
For most well-documented plant miRNA genes, the prediction of a hairpin structure by RNAfold is insensitive to the presence or absence of additional flanking nucleotides. This implies that the predicted free energy of the miRNA hairpin is more favorable than that of alternative structures in which the nucleotides of the miRNA hairpin pair with flanking sequences. In the analysis of Arabidopsis and rice genomic hairpins, it was observed that many 20-mers that passed miRcheck when their inverted repeats (as identified by einverted) were “folded” by RNAfold in isolation no longer passed miRcheck when folded in the context of 240 genomic nucleotides flanking either side of the 20-mer. Therefore, simply re-evaluating the secondary structure of each miRNA candidate, as predicted from a genomic sequence of arbitrary length centered on the 20-mer, can serve as a useful screen against candidates with unstable hairpins.
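The robustness test amounts to re-extracting each candidate with a fixed amount of genomic flank, refolding it, and re-running the structural check. The sketch below assumes 240 flanking nucleotides on each side (the value used in the Arabidopsis/rice analysis), clips the window at sequence ends, and leaves folding and checking to caller-supplied functions. The helper names are hypothetical; for folding, the ViennaRNA Python bindings provide RNA.fold, which returns a structure and its free energy, so fold could be lambda s: RNA.fold(s)[0], while passes could wrap miRcheck or a comparable structural test.

```python
def context_window(genome, start, length=20, flank=240):
    """Return (window, offset) for a candidate at genome[start:start + length].

    The window adds up to `flank` nucleotides on each side, clipped at the
    ends of the sequence; offset is the candidate's start within the window.
    """
    lo = max(0, start - flank)
    hi = min(len(genome), start + length + flank)
    return genome[lo:hi], start - lo


def robust_candidates(genome, starts, fold, passes, length=20, flank=240):
    """Keep candidate start positions that still pass after refolding in context.

    fold(sequence) -> structure and passes(structure, begin, end) -> bool are
    supplied by the caller (hypothetical hooks, e.g., wrappers around an RNA
    folding program and a structural test of the candidate region).
    """
    kept = []
    for start in starts:
        window, offset = context_window(genome, start, length, flank)
        if passes(fold(window), offset, offset + length):
            kept.append(start)
    return kept
```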
3.2.3. Analyze Conservation of Putative miRNAs to Other Genomes
Because of the extreme level of conservation of some miRNAs (in several cases, 100% nucleotide identity of mature miRNAs between angiosperms and moss), the identification of candidates with conservation of miRNA sequence and secondary structure in more than one genome can be a powerful method to enrich for authentic miRNAs. One drawback, of course, is that any miRNAs not conserved between the genomes under comparison will not be identified. In practical terms, there are two possible schemes for the genome-wide identification of potentially conserved miRNAs. One approach, as was used in the Arabidopsis/rice analysis, is to independently identify candidate miRNAs in each genome being analyzed (i.e., carry out the steps in Subheadings 3.2.1 and 3.2.2 separately for each genome) and then to compare the candidate 20-mers to identify potential homologs. Alternatively, a list of candidate 20-mers could be identified in one genome, and potential homologs in other genomes could then be identified (using, for example, the first protocol in this chapter). Regardless of the scheme used, an important consideration is the degree of similarity required for two candidates to be considered as potential homologs. Most miRNAs conserved between Arabidopsis and rice have zero to two nucleotide substitutions when compared across genomes. Allowing three substitutions between candidate 20-mer homologs is likely to capture a large number of false positives.
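Comparing candidate 20-mers from two genomes at a threshold of zero to two substitutions need not involve every pairwise comparison. A sketch under stated assumptions: candidates from each genome are equal-length strings written in the orientation of their putative transcripts, and the pigeonhole principle is used to limit comparisons (split each k-mer into three blocks; two sequences with at most two substitutions must share at least one block exactly). The function names are hypothetical.

```python
"""Cross-genome matching of candidate k-mers with few substitutions (sketch)."""
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def _blocks(seq, n_blocks):
    """Split seq into n_blocks contiguous pieces, keyed by block index and length."""
    k = len(seq)
    bounds = [k * i // n_blocks for i in range(n_blocks + 1)]
    return [(i, k, seq[bounds[i]:bounds[i + 1]]) for i in range(n_blocks)]


def conserved_pairs(candidates_a, candidates_b, max_subs=2):
    """Return sorted (a, b, n_subs) for candidate pairs within max_subs substitutions.

    With max_subs substitutions spread over max_subs + 1 disjoint blocks, at
    least one block must match exactly, so only pairs sharing a block are
    compared in full.
    """
    n_blocks = max_subs + 1
    index = defaultdict(set)  # block key -> candidates from genome B
    for b in set(candidates_b):
        for key in _blocks(b, n_blocks):
            index[key].add(b)
    pairs = set()
    for a in set(candidates_a):
        for key in _blocks(a, n_blocks):
            for b in index.get(key, ()):
                n_subs = hamming(a, b)
                if n_subs <= max_subs:
                    pairs.add((a, b, n_subs))
    return sorted(pairs)
```

Raising max_subs to 3 in this sketch would admit pairs such as a 20-mer differing at its last three positions, which illustrates how quickly a looser threshold enlarges the candidate list.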
3.2.4. Analyze Conservation of Putative miRNA Hairpins in Other Genomes
In an alignment of two homologous miRNA hairpins from different genomes, the mature miRNA sequences are highly conserved, as are the miRNA* sequences in most cases. The loop region between the miRNA and miRNA* is often highly divergent in both sequence and length. However, plant genomes also contain conserved hairpins (of uncharacterized function) that are uniformly conserved throughout the hairpin. Therefore, depending on the level of background conservation between the genomes being analyzed, it may be possible to enrich for authentic miRNAs by discriminating against putative miRNA homologs that are not more highly conserved than the loop regions of their putative precursors.
3.2.5. Identify Potential Regulatory Targets of Putative miRNAs
The function of a mature miRNA is to guide the RNA-induced silencing complex (RISC) to a complementary target RNA. Most plant miRNAs have extensive complementarity to their target RNAs, a fact that greatly facilitates the prediction of plant miRNA targets (20). Importantly, this high degree of complementarity has also facilitated the discovery of the miRNA genes themselves through the identification of candidate targets for candidate miRNAs (9, 11). A number of algorithms, all of which substantively agree with each other, have been described for the prediction of plant miRNA targets (11, 21, 22). Any of these methods can be used to analyze the targeting potential of candidate 20-mers by searching for targets in annotated genes or EST sequences. The combined analysis of miRNA conservation and targeting potential (i.e., identifying conserved miRNA candidates that can base pair to homologous target RNAs in each species) is particularly powerful for enriching conserved miRNAs. As with other steps, choosing thresholds for target prediction that limit the number of false positive predictions is critical to obtaining meaningful results.
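As an illustration only, ungapped complementarity between a candidate and a potential target site can be scored by counting mismatches, with G:U wobble pairs counted as half a mismatch. These weights and the default cutoff of 3 are assumptions loosely modeled on published plant target-prediction schemes, not the parameters of any specific method cited above (those methods also handle gaps and weight positions differently); the function names are hypothetical.

```python
"""Ungapped miRNA:target complementarity scoring (illustrative sketch)."""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def _rna(seq):
    """Uppercase a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def site_score(mirna, site):
    """Score a miRNA against a target site of equal length.

    Both sequences are given 5'->3'. The miRNA pairs antiparallel, so its
    first nucleotide is compared with the last nucleotide of the site.
    Mismatches cost 1, G:U wobbles cost 0.5, Watson-Crick pairs cost 0.
    """
    mirna, site = _rna(mirna), _rna(site)
    if len(mirna) != len(site):
        raise ValueError("an ungapped duplex requires equal lengths")
    score = 0.0
    for m, t in zip(mirna, reversed(site)):
        if (m, t) in WATSON_CRICK:
            continue
        score += 0.5 if (m, t) in WOBBLE else 1.0
    return score


def scan_targets(mirna, transcript, max_score=3.0):
    """Return (start, score) for every site in transcript scoring <= max_score."""
    k = len(mirna)
    transcript = _rna(transcript)
    hits = []
    for start in range(len(transcript) - k + 1):
        score = site_score(mirna, transcript[start:start + k])
        if score <= max_score:
            hits.append((start, score))
    return hits
```

Running the same scan on shuffled versions of each candidate (or of the transcripts) gives a rough sense of how many sites a given cutoff admits by chance.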
3.2.6. Verification of Predicted miRNAs
Due to the high potential for false positives, miRNA families predicted through genomic analyses should be viewed with skepticism until validated by experimental data. Several types of experiments are useful in validating the expression of predicted miRNAs. The detection of the predicted miRNAs on a northern blot can be strong evidence for expression. However, it is worth noting that a faint, fuzzy band on a northern blot is not a reliable indication of expression (see Note 7). A PCR-based assay can detect the presence of rare RNAs in a library of adapter-ligated small RNAs (11, 23). Importantly, this approach can also map the 5′ end of the predicted small RNA. Perhaps the most powerful method of validating a predicted miRNA is to demonstrate an interaction with a predicted target. 5′ RACE has been used by numerous labs to detect the in vivo miRNA-mediated cleavage of target RNAs (5, 11, 19, 21, 24–26), which results in a cleaved target with a 5′ end that aligns to the tenth nucleotide of the miRNA. Ideally, the 5′ end of the miRNA, as identified by PCR analysis, should agree with the observed position of cleavage in the target RNA.
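The expected 5′ RACE product can be located with simple coordinate arithmetic. Assuming a 0-based target coordinate site_start for the 5′-most nucleotide of the complementary site and an ungapped duplex with a miRNA of length L, miRNA nucleotide p (counted from the miRNA 5′ end) pairs with target position site_start + L - p, so the predicted 5′ end of the 3′ cleavage product is site_start + L - 10. The helper names are hypothetical.

```python
def expected_cleavage_5prime(site_start, mirna_length, mirna_position=10):
    """0-based target coordinate of the 5' end of the 3' cleavage product.

    site_start is the 0-based coordinate of the 5'-most nucleotide of the
    complementary site; for an ungapped duplex, miRNA nucleotide p (1-based
    from the miRNA 5' end) pairs with target position site_start + L - p.
    """
    return site_start + mirna_length - mirna_position


def fraction_at_site(race_5prime_ends, expected):
    """Fraction of sequenced 5' RACE clones whose 5' end falls at `expected`."""
    ends = list(race_5prime_ends)
    if not ends:
        return 0.0
    return sum(1 for e in ends if e == expected) / len(ends)
```

For a 21-nt miRNA whose site begins at target position 100, the predicted cleavage product begins at position 111; a high fraction of RACE clones ending there supports the predicted interaction.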
4. Notes
1. Other search/alignment algorithms (e.g., blast) can be used instead of patscan. Regardless of the search algorithm used, an important consideration is the extent of similarity that a genomic region must possess to the query miRNA in order to be considered as a potential homolog. While different thresholds are likely to be appropriate in different situations, the requirement for an ungapped alignment between the mature miRNA and its putative homologs with no more than two nucleotide substitutions seems to give a good balance of selectivity and sensitivity.
2. Plant miRNA hairpins can be quite long. Many hairpins are 200 to 300 nucleotides long (including the miRNA, loop region, and miRNA*), with a few ranging up to 650 nucleotides. Therefore, it is important to retrieve sufficient flanking sequence on either side of the putative miRNA. Adding flanking sequence will increase the computing time needed to predict the secondary structures of the genomic regions, but does not seem to affect the predicted structures of most miRNA stem-loops.
3. The 12 putative moss miRNA homologs that lacked experimental evidence are illustrative of some of the pitfalls of miRNA prediction. In two cases, the reverse complements of experimentally supported miRNA* sequences were identified as miRNA candidates. (The actual miRNAs were also identified as candidates in these cases.) The identity of the strand encoding the actual miRNA would be ambiguous in these cases without experimental evidence. Three putative members of conserved miRNA families were not detected by the deep sequencing; it is unclear whether these are false positives or miRNA genes that were not represented in the library of sequenced small RNAs. Of the nine putative homologs to non-conserved families, seven were to miR414, a miRNA identified through a bioinformatic search (13) that has not been subsequently validated experimentally in any species (3, 5). Because the miR414 sequence is a degenerate triplet repeat (UCA), it has 393 matches with zero to two nucleotide substitutions in the moss genome. With so many matches to a simple sequence, it is not surprising that a few can be predicted to fold into a hairpin structure. The miR414 case illustrates that certain sequences are prone to noise in bioinformatic analyses (which probably contributed to the initial mis-annotation of miR414 as a miRNA), and that, given a large enough number of spurious potential miRNA homologs, some can be predicted to have potential miRNA precursors.
4. Using einverted parameters trained on conserved angiosperm miRNAs, the program run_einverted.pl identified 990,192 inverted repeats in the moss genome. Included in this set were inverted repeats corresponding to 95% of 205 experimentally supported moss miRNA genes, most of which are not conserved in angiosperms. An alternative to using einverted to identify genomic inverted repeats could be to use RNAfold to predict secondary structures for genomic fragments that tile the entire genome, which could then be analyzed by miRcheck. This would have the advantage of not losing miRNA loci that are missed by einverted, but may add considerable computational load and noise to the analysis.
5. The parameters passed to miRcheck will have a large impact on the outcome. In the analysis of miRNAs conserved between rice and Arabidopsis, I found that setting parameters to be stringent, so that ~15–20% of actual miRNAs did not pass miRcheck, was helpful in reducing the number of 20-mers to a workable number (11). The use of the same stringent parameters in moss captured 80% of moss miRNAs with experimental evidence. As mentioned for other steps in this chapter, the details, goals, and available computing power may suggest that other cutoffs are more appropriate for other analyses.
6. Plant genomes contain numerous instances of simple sequence repeats, as well as runs of single nucleotides. Therefore, it may be helpful to remove simple sequences from the analysis. By default, extract_einverted_20mers.pl removes 20-mers that consist primarily of any one or two nucleotides.
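A low-complexity filter of the kind described can be sketched as follows. This note does not define “primarily”, so the 80% threshold on the two most frequent nucleotides is an assumption, as are the function names; this is not the rule implemented in extract_einverted_20mers.pl. Note that a balanced triplet repeat such as UCAUCAUCA… would pass this particular rule.

```python
from collections import Counter


def is_low_complexity(kmer, max_top2_fraction=0.8):
    """True if the one or two most frequent nucleotides exceed the given
    fraction of the k-mer (runs of a single nucleotide are caught as well)."""
    if not kmer:
        raise ValueError("empty k-mer")
    counts = Counter(kmer.upper())
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / len(kmer) > max_top2_fraction


def drop_low_complexity(kmers, max_top2_fraction=0.8):
    """Filter out low-complexity candidate k-mers."""
    return [k for k in kmers if not is_low_complexity(k, max_top2_fraction)]
```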
7. As an example, nine computationally identified miRNA families (miR413–420; miR426) were annotated in miRbase on the basis of weak northern signals (13). Subsequent deep sequencing experiments have not detected evidence for expression of these miRNAs (3, 5).
References
1. Jones-Rhoades M, Bartel D, Bartel B. MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol. 2006;57:19–53.
2. Llave C, Kasschau KD, Rector MA, Carrington JC. Endogenous and silencing-associated small RNAs in plants. Plant Cell. 2002;14:1605–1619.
3. Lu C, Kulkarni K, Souret F, MuthuValliappan R, Tej S, Poethig R, et al. MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res. 2006;16:1276–1288.
4. Park W, Li J, Song R, Messing J, Chen X. CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol. 2002;12:1484–1495.
5. Rajagopalan R, Vaucheret H, Trejo J, Bartel D. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 2006;20:3407–3425.
6. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. MicroRNAs in plants. Genes Dev. 2002;16:1616–1626.
7. Sunkar R, Girke T, Jain PK, Zhu JK. Cloning and characterization of microRNAs from rice. Plant Cell. 2005;17:1397–1411.
8. Sunkar R, Zhu JK. Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell. 2004;16:2001–2019.
9. Adai A, Johnson C, Mlotshwa S, Archer-Evans S, Manocha V, Vance V, et al. Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res. 2005;15:78–91.
10. Bonnet E, Wuyts J, Rouzé P, Van de Peer Y. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci USA. 2004;101:11511–11516.
11. Jones-Rhoades M, Bartel D. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell. 2004;14:787–799.
12. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313:1596–1604.
13. Wang X, Reyes J, Chua N, Gaasterland T. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol. 2004;5:R65.
14. Xie F, Huang S, Guo K, Xiang A, Zhu Y, Nie L, et al. Computational identification of novel microRNAs and targets in Brassica napus. FEBS Lett. 2007;581:1464–1474.
15. Chiou T, Aung K, Lin S, Wu C, Chiang S, Su C. Regulation of phosphate homeostasis by MicroRNA in Arabidopsis. Plant Cell. 2006;18:412–421.
16. Fujii H, Chiou T, Lin S, Aung K, Zhu J. A miRNA involved in phosphate-starvation response in Arabidopsis. Curr Biol. 2005;15:2038–2043.
17. Fahlgren N, Howell M, Kasschau K, Chapman E, Sullivan C, Cumbie J, et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE. 2007;2:e219.
18. Lu C, Tej S, Luo S, Haudenschild C, Meyers B, Green P. Elucidation of the small RNA component of the transcriptome. Science. 2005;309:1567–1569.
19. Axtell M, Snyder J, Bartel D. Common functions for diverse small RNAs of land plants. Plant Cell. 2007;19:1750–1769.
20. Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP. Prediction of plant microRNA targets. Cell. 2002;110:513–520.
21. Allen E, Xie Z, Gustafson A, Carrington J. microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell. 2005;121:207–221.
22. Schwab R, Palatnik J, Riester M, Schommer C, Schmid M, Weigel D. Specific effects of microRNAs on the plant transcriptome. Dev Cell. 2005;8:517–527.
23. Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, et al. The microRNAs of Caenorhabditis elegans. Genes Dev. 2003;17:991–1008.
24. Kasschau KD, Xie Z, Allen E, Llave C, Chapman EJ, Krizan KA, et al. P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell. 2003;4:205–217.
25. Llave C, Xie Z, Kasschau KD, Carrington JC. Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science. 2002;297:2053–2056.
26. Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, Carrington JC, et al. Control of leaf morphogenesis by microRNAs. Nature. 2003;425:257–263.
Chapter 3
Methods for Isolation of Total RNA to Recover miRNAs and Other Small RNAs from Diverse Species
Monica Accerbi, Skye A. Schmidt, Emanuele De Paoli, Sunhee Park, Dong-Hoon Jeong, and Pamela J. Green
Abstract
For the experimental analysis of miRNAs and other small RNAs in the 20–25 nucleotide (nt) size range, the first and most important step is the isolation of high-quality total RNA. Because RNA degradation products can mask or dilute the presence of true miRNAs, it is important to choose a method that efficiently extracts RNA from tissues while preventing the degradation of both high- and low-molecular-weight RNA. In addition, the presence of polyphenols, polysaccharides, and secondary metabolites may render nucleic acids insoluble and hinder the recovery of the miRNAs. Finally, and most importantly, the method chosen must be capable of retaining the small RNA component. In this chapter, we present a set of total RNA isolation methods that can be used to maximize the recovery of high-quality RNA for miRNA analysis from a large number of plant species and tissue types.
Key words: RNA, Total RNA extraction, Small RNA extraction, TriReagent®, TRIzol®, Plant RNA Isolation Reagent®
1. Introduction
Plants present a wide range of tissue types, both within and between species, which can often make universal methods for molecular biological techniques difficult to develop. Plant tissue varies greatly, from soft green tissue to dry seeds, flowers, roots, xylem and bark, waxy leaves, spiny needles and hard or juicy fruits.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_3, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Accerbi et al.
Total RNA isolation in plant species is often complicated by the presence of recalcitrant tissue and organic compounds (1–4). While many methods exist to extract high-quality RNA from plant tissues, not all are preferred for miRNA analysis. Ion exchange and silica based separation techniques, including the Qiagen RNeasy spin column, are biased against RNAs less than 200 nt in size, and do not effectively retain the miRNA fraction. Furthermore, while LiCl precipitation is often used for total RNA extraction, this salt is largely incapable of retaining the small RNA population and therefore should be substituted with sodium salt and/or alcohol precipitation (5). The isolation of high-quality RNA depends greatly on the treatment and handling of the tissue prior to RNA extraction. Cellular ribonucleases act quickly and efficiently to degrade RNA upon cell lysis; therefore, tissue frozen in liquid nitrogen and stored at no more than −70°C will provide the best possible material for miRNA analysis. Commercial reagents such as RNAlater® (Ambion) may increase storage stability without compromising the quality or quantity of the recovered RNA, although the methods we describe do not rely on this reagent. The process of RNA isolation preferred for miRNA analysis is carried out in three main steps: lysis and denaturation, organic extraction, and RNA precipitation. A chaotropic denaturing agent such as guanidinium, or sometimes a strong reducing agent like 2-mercaptoethanol, will disrupt RNase activity upon cellular disruption and maintain the integrity of the RNA (6–8). In addition to the denaturing activity achieved by the disruption of disulfide bonds, reducing agents in the extraction buffer will prevent the oxidation of polyphenolics, rendering them incapable of binding to nucleic acids (9, 10). 
Organic extraction with phenol or acid phenol and chloroform or 1-bromo-3-chloropropane (BCP), followed by centrifugation, separates the total RNA from DNA and protein components. Finally, alcohol precipitation of the RNA from the upper aqueous phase retains both large and small RNAs. RNA extraction from most green tissues can usually be carried out very successfully using TriReagent® (Molecular Research Center, Inc.) or TRIzol® (Invitrogen), which are the same product marketed by different companies. Based on the reagent used in the method developed by Chomczynski and Sacchi (6), TriReagent® is a monophasic solution of phenol and guanidine isothiocyanate. Upon the addition of chloroform or BCP followed by centrifugation, the solution separates into an organic and an aqueous phase, with the RNA remaining in the clear aqueous phase. When followed by alcohol precipitation, this method provides high recovery of the small RNA fraction as well as high-quality, non-degraded RNA.
Methods for Isolation of Total RNA to Recover miRNAs
For plant tissues that contain high levels of polyphenolics, polysaccharides, and fibrous materials (e.g., conifers and seeds), Plant RNA Isolation Reagent® (Invitrogen) (PRIR) is a better choice for RNA extraction. In parallel extractions of Arabidopsis leaves, this proprietary reagent yielded RNA with a lower A230 and a higher A260/A280 ratio than TRIzol® or TriReagent®. These absorbances and the ratios characteristic of high-quality RNA are described in Subheading 3.7.2. Highly recalcitrant plant tissues often contain abundant secondary metabolites that may interfere with guanidinium extraction (11). Another RNA isolation method presented in this chapter is based on that of Bugos et al. (11), with modifications by M. Perez-Amador (personal communication), and is adapted for the recovery of small RNA molecules. It uses a highly saline, high-pH extraction buffer in place of guanidinium and is useful for tissues such as orange, in which secondary metabolites may interfere with standard RNA extraction. In this chapter, we present three RNA extraction protocols that allow the isolation of high-quality total RNA from a variety of representative plant species and tissue types with minimal use of proprietary materials. The scheme in Fig. 1 can help identify an appropriate method and possible modifications for a given sample. Table 1 lists the methods and modifications that we have used successfully to isolate total RNA that retains small RNAs from nearly 100 different plant samples. These details are meant to illustrate what has worked for a wide range of species and tissues, not to suggest that these approaches are the only effective ones.
While many commercial extraction buffers and published RNA isolation methods are tailored to a specific tissue type and intended use of the RNA, the methods outlined below are preferred both for their broad applicability to plant species and for their relatively low cost in time and materials. RNA obtained with any of these methods can be used for RT-PCR, gel blotting, and library construction, and to analyze mRNAs, housekeeping RNAs such as rRNAs and tRNAs, and small RNAs such as miRNAs and siRNAs. The total RNA can also be used subsequently to isolate low molecular weight RNA (~200 nt and smaller; an excellent preparation for examining small RNAs by gel blotting) and small RNA (~20–25 nt). Detailed protocols for isolating these RNA size fractions and for preparing small RNA cDNA libraries are given in Chap. 8 of this volume, and protocols for detecting small RNAs by gel blotting in Chaps. 13 and 14.
[Fig. 1 flowchart. Starting points: green tissue → I-A. Basic TriReagent; high levels of polyphenolics, polysaccharides, and fibrous tissue → II-A. Basic PRIR; recalcitrant tissue with high levels of secondary metabolites → III. Guanidinium-Free. Outcomes: high-quality RNA, or low-quality RNA (low yield; insoluble precipitate, low 260/230; degraded RNA, low 260/280). Modified methods: I-B. Modified TriReagent (extra chloroform; high salt) and II-B. Modified PRIR (extra chloroform (+ acid phenol); potassium acetate (+ high salt)).]
Fig. 1. The isolation of high-quality RNA from most plant tissues can be accomplished using a modified version of one of three extraction methods: (I) TriReagent, (II) Plant RNA Isolation Reagent, or (III) Guanidinium-Free. For most green tissues or tissues of unknown properties, the simplest and most cost-effective method is the TriReagent isolation. However, if this method gives RNA of poor quality, as judged by the characteristics of the resulting RNA, the illustrated scheme can be used to determine which method will likely suit the specific tissue. Low-quality RNA can have a variety of characteristics. A low A260/A280 ratio indicates protein contamination, whereas a low A260/A230 ratio or an insoluble precipitate indicates polysaccharide, salt, or phenolic contamination. Low yield is indicated by an overall low OD. Although low yield is not synonymous with low quality, it can lead to low quality if multiple preparations must be combined and contaminants become too concentrated. Degraded RNA can be recognized by visual inspection on an agarose gel. As discussed in the text, these methods and modifications should enable the extraction of high-quality RNA from almost any tissue type.
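The decision logic of Fig. 1, combined with the ratio limits in Subheading 3.7.2, can be summarized as a small lookup. The following Python sketch is only an illustration under stated assumptions: the tissue-class keys, the `RnaQC` record, and `suggest_modifications` are hypothetical names, the thresholds are the A260/A280 ≥ 1.7 and A260/A230 ≥ 1.4 limits from Subheading 3.7.2, and each symptom is pointed to the Notes that address it. Low yield is left out of this sketch.

```python
from dataclasses import dataclass

# Hypothetical tissue-class labels mapped to the base methods named in Fig. 1.
BASE_METHOD = {
    "green_or_unknown": "I-A. Basic TriReagent",
    "polyphenolic_polysaccharide_fibrous": "II-A. Basic PRIR",
    "recalcitrant_secondary_metabolites": "III. Guanidinium-Free",
}


@dataclass
class RnaQC:
    """Observations on one RNA preparation (hypothetical record)."""
    a260_280: float
    a260_230: float
    insoluble_precipitate: bool = False
    degraded_on_gel: bool = False


def suggest_modifications(qc: RnaQC) -> list[str]:
    """Return the remedies from the Notes that match each observed QC symptom."""
    fixes = []
    if qc.degraded_on_gel:
        # RNase-rich tissue: proteinase K plus acid phenol before chloroform (Note 1)
        fixes.append("proteinase K + acid phenol (Note 1)")
    if qc.a260_280 < 1.7:
        # Protein carry-over: extra chloroform or acid phenol extraction
        fixes.append("extra chloroform / acid phenol extraction (Notes 2, 7, 8)")
    if qc.a260_230 < 1.4 or qc.insoluble_precipitate:
        # Polysaccharide, salt, or phenolic carry-over
        fixes.append("high salt or potassium acetate precipitation (Notes 3, 4)")
    return fixes


if __name__ == "__main__":
    print(BASE_METHOD["green_or_unknown"])
    print(suggest_modifications(RnaQC(a260_280=1.9, a260_230=1.1)))
```

A clean preparation returns an empty list, so the basic method can be kept; otherwise the listed Notes indicate which modified procedure (I-B or II-B) to try next.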
Roots Roots Roots Roots Roots inoculated w/ G. intraradices Roots inoculated w/ M. incognita Roots mock inoculated Roots, stripped Root hairs Nodules 14 dpi Nodules 14 dpi Nodules 21 dpi Stolon Rhizoid
Medicago Soybean Soybean Medicago Soybean Common bean Potato Miscanthus, Zostera
TriReagent TriReagent TriReagent TriReagent PRIR PRIR PRIR PRIR Potassium acetate, high salt
Medicago
Extra chloroform
Common Bean, Rice Marsilea, Medicago, Mimulus Zostera Nuphar Medicago
(continued)
Lettuce, orange, pepper, potato, rice, Silene, soybean, tobacco, wheat Amborella, Aristolochia Marsilea, Pumpkin Cycas, Ginkgo, Grapevine, Maize, Mimulus, Miscanthus, Nuphar, Petunia, Poplar, Sorghum, Switchgrass Banana Avocado Zostera Cotton Spruce
Speciesc
PRIR
TriReagent PRIR PRIR Guanidinium-Free TriReagent
PRIR PRIR PRIR PRIR PRIR PRIR
Leaves Leaves Leaves Leaves Leaves Needles
Radical tissues
High salt
TriReagent TriReagent
Leaves Leaves Extra chloroform Potassium acetate Extra chloroform, high salt Acid phenol, proteinase K, Extra chloroform Extra chloroform
Extra chloroform
Modificationsb
TriReagent
Base methoda,b
Leaves
Foliage tissues
Tissue
Table 1 RNA isolation methods that retain small RNA from plant species
Whole flower Whole flower Whole flower Whole flower Whole flower Whole flower Spikelet Ovules Strobilus (cone) Strobilus (cone) Fruit Fruit Fruit Fruit Fruit Grain Seed pod Seeds Seeds Seeds 20 DAA
Reproductive tissues
Tissue
Table 1 (continued)
TriReagent PRIR PRIR PRIR PRIR Guanidinium-free PRIR PRIR PRIR PRIR PRIR PRIR PRIR PRIR Guanidinium-free PRIR PRIR TriReagent PRIR PRIR
Base methoda,b
Potassium acetate, high salt Small scale
Extra chloroform
Extra chloroform Potassium acetate Acid phenol, proteinase K, extra chloroform
Extra chloroform
Extra chloroform Potassium acetate Extra chloroform, high salt
Modificationsb
Lettuce, orange, rice, Silene, soybean, tobacco, pepper Aristolochia, avocado, common bean, petunia, potato Banana, cotton, pumpkin Grapevine Zostera Nuphar Barley, maize, Miscanthus, sorghum, switchgrass, wheat Cycas Cycas, Ginkgo Spruce Avocado Banana Grapevine Pepper Orange Sorghum Tobacco Soybean Common bean Medicago
Speciesc
TriReagent PRIR PRIR TriReagent LS TriReagent PRIR PRIR TriReagent Extra chloroform, high salt Extra chloroform
Potassium acetate
Barley Poplar Poplar Pumpkin Chlamydomonas, Volvox Coleochaete, Klebsormidium, Spirogyra, Ulva Chara Porphyra
a PRIR, Plant RNA Isolation Reagent® (Invitrogen); TriReagent® and TriReagent LS® (Molecular Research Center, Inc.)
b See details in Methods section
c See http://smallrna.udel.edu
Seedlings Xylem Xylem, tension Phloem sap Algal thallus Algal thallus Algal thallus Algal thallus
Other tissues
2. Materials
2.1. General Materials and Reagents Needed to Perform the RNA Extractions Described
1. Instruments:
   (a) Centrifuge with a rotor capable of 15,000 × g.
   (b) Microcentrifuge.
   (c) Spectrophotometer.
2. RNase-free mortars, pestles, and spatulas: wrap in foil and bake at 220°C for at least 12 h.
3. RNase-free tips, microcentrifuge tubes, and 13 ml sterile centrifuge tubes are commercially available; 30 ml non-disposable centrifuge tubes are treated as follows:
   (a) Soak tubes in 3% hydrogen peroxide for at least 1 h; rinse well with DEPC-water and let air dry.
   (b) Alternatively, soak tubes in 0.1% DEPC-water overnight and let air dry.
   (c) Autoclaving is recommended for tubes that will be used beyond the extraction buffer and organic solvent steps.
4. RNase-free chemicals: solutions must be made with DEPC-treated distilled water and sterilized.
5. DEPC-water: add 0.05% DEPC to distilled water, stir overnight, and autoclave. RNase-free water is also available commercially.
6. Liquid nitrogen.
7. Gloves: wear clean gloves at all times when handling RNA solutions.
2.2. Plant RNA Isolation Reagent®
2.2.1. Materials
1. Plant RNA Isolation Reagent (PRIR) (Invitrogen, Cat. No. 12322-012).
2. 5 M NaCl.
3. Chloroform.
4. Isopropyl alcohol.
5. 75% ethanol, stored at −20°C and used cold.
6. RNase-free water.
2.2.2. Optional Materials
1. High salt solution: 0.8 M sodium citrate, 1.2 M NaCl.
2. Acid phenol:chloroform:isoamyl alcohol 125:24:1, pH 4.5 (Ambion, Cat. No. AM9720).
3. 2 M Potassium acetate, pH 5.5.
4. Proteinase K, 20 mg/ml.
5. SDS, 20%.
6. 3 M Sodium acetate, pH 5.2.
7. Ethanol.
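As a worked example of preparing the high salt solution, the sketch below converts its molar concentrations into grams per final volume (mass = molarity × volume in liters × molar mass). It assumes sodium citrate is weighed as trisodium citrate dihydrate (294.10 g/mol), since the protocol does not specify the hydrate, and NaCl at 58.44 g/mol; `grams_for` is a hypothetical name.

```python
# Molar masses in g/mol. ASSUMPTION: sodium citrate is the trisodium dihydrate salt.
MOLAR_MASS = {"trisodium citrate dihydrate": 294.10, "NaCl": 58.44}

# High salt solution from Subheading 2.2.2: 0.8 M sodium citrate, 1.2 M NaCl.
HIGH_SALT_MOLARITY = {"trisodium citrate dihydrate": 0.8, "NaCl": 1.2}


def grams_for(final_volume_ml: float,
              molarity: dict[str, float] = HIGH_SALT_MOLARITY) -> dict[str, float]:
    """Grams of each salt to dissolve for the requested final volume (M x L x g/mol)."""
    liters = final_volume_ml / 1000.0
    return {salt: round(m * liters * MOLAR_MASS[salt], 2) for salt, m in molarity.items()}


if __name__ == "__main__":
    print(grams_for(100.0))
```

For 100 ml this gives about 23.53 g of trisodium citrate dihydrate and 7.01 g of NaCl, to be dissolved in DEPC-treated water and brought to volume.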
2.3. Plant RNA Isolation Reagent®: Small Scale Procedure for Medicago truncatula Developing Seeds (20 Days After Anthesis, DAA)
1. PRIR (Invitrogen, Cat. No. 12322-012).
2. 5 M NaCl.
3. Chloroform.
4. 2 M Potassium acetate, pH 5.5.
5. High salt solution: 0.8 M sodium citrate, 1.2 M NaCl.
6. Isopropyl alcohol.
7. 75% ethanol.
8. RNase-free water.
9. 3 M Sodium acetate, pH 5.2.
10. Ethanol.
2.4. TriReagent®/TRIzol® RNA Isolation
2.4.1. Materials
1. TriReagent® (Molecular Research Center, Cat. No. TR-118) or TRIzol® (Invitrogen, Cat. No. 15596-018).
2. Chloroform or 1-bromo-3-chloropropane (BCP).
3. Isopropyl alcohol.
4. 75% ethanol.
5. RNase-free water.
2.4.2. Optional Materials
1. Acid phenol:chloroform:isoamyl alcohol 125:24:1, pH 4.5 (Ambion, Cat. No. AM9720).
2. 3 M Sodium acetate, pH 5.2.
3. Ethanol.
2.5. TriReagent LS® RNA Isolation
1. TriReagent LS® (Molecular Research Center, Cat. No. TS-120).
2. 20 mg/ml glycogen or Polyacryl Carrier® (Molecular Research Center, Cat. No. PC-152).
3. Chloroform or BCP.
4. Isopropyl alcohol.
5. 75% ethanol.
6. RNase-free water.
2.6. Guanidinium-Free RNA Isolation
1. Extraction buffer:
   (a) 100 mM Tris-HCl, pH 9.0.
   (b) 200 mM NaCl.
   (c) 15 mM EDTA, pH 8.0.
   (d) 0.5% Sarkosyl.
   (e) 8 µl/ml 2-mercaptoethanol, added just prior to use.
2. Tris-HCl saturated phenol, pH 6.7.
3. Phenol:chloroform:isoamyl alcohol, 25:24:1.
4. Chloroform:isoamyl alcohol, 24:1.
5. Isopropyl alcohol.
6. 3 M Sodium acetate, pH 5.2.
7. 70% ethanol.
8. RNase-free water.
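The extraction buffer in item 1 can be assembled from stock solutions using C1V1 = C2V2. In the sketch below, the stock concentrations (1 M Tris-HCl pH 9.0, 5 M NaCl, 0.5 M EDTA pH 8.0, 10% Sarkosyl) are assumptions chosen as common laboratory stocks, not values from this protocol, and the function names are hypothetical. 2-Mercaptoethanol is computed separately because it is added just before use.

```python
# Final concentrations from Subheading 2.6, item 1 (mM for salts, % for Sarkosyl).
FINAL = {"Tris-HCl pH 9.0": 100.0, "NaCl": 200.0, "EDTA pH 8.0": 15.0, "Sarkosyl": 0.5}

# ASSUMED stock concentrations in the same units (1 M, 5 M, 0.5 M, 10%).
STOCK = {"Tris-HCl pH 9.0": 1000.0, "NaCl": 5000.0, "EDTA pH 8.0": 500.0, "Sarkosyl": 10.0}

BME_UL_PER_ML = 8.0  # 2-mercaptoethanol, added just prior to use


def extraction_buffer(total_ml: float) -> dict[str, float]:
    """Stock volumes (ml) for total_ml of buffer, using V1 = C2 * V2 / C1."""
    recipe = {name: FINAL[name] * total_ml / STOCK[name] for name in FINAL}
    recipe["RNase-free water"] = total_ml - sum(recipe.values())
    return {name: round(ml, 3) for name, ml in recipe.items()}


def bme_ul(buffer_ml: float) -> float:
    """Microliters of 2-mercaptoethanol to add to buffer_ml of buffer before use."""
    return BME_UL_PER_ML * buffer_ml


if __name__ == "__main__":
    print(extraction_buffer(4.0))  # one extraction in Subheading 3.6 uses 4 ml
    print(bme_ul(4.0))
```

For a single 4 ml extraction this works out to 0.4 ml Tris-HCl, 0.16 ml NaCl, 0.12 ml EDTA, 0.2 ml Sarkosyl, and 3.12 ml water, plus 32 µl of 2-mercaptoethanol.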
3. Methods
3.1. Grinding of Plant Material with Mortar and Pestle in Liquid Nitrogen
The following general guidelines apply to all RNA work described in this chapter. The tissue must remain frozen at all times during grinding to avoid the release of ribonucleases; add more liquid nitrogen to the mortar as soon as it evaporates.
1. Freeze the mortar and pestle at −80°C for at least 1 h.
2. Working on a clean bench, pour some liquid nitrogen into the mortar to chill it.
3. Add the sample to the mortar and grind gently at first to break up the larger pieces. Increase grinding speed and force as the powder becomes finer. Keep adding liquid nitrogen and grind until no tissue particles are visible; sufficiently ground green tissue typically appears almost white.
4. Chill an RNase-free spatula in liquid nitrogen and transfer the frozen powder to a tube containing the extraction buffer (see step 1 of the chosen method).
5. Immediately close the tube and vortex for no more than a few seconds to disperse any clumps.
6. Quickly unscrew the tube cap to release any pressure that may have built up from evaporating residual liquid nitrogen. This step is particularly important if the transferred powder looks wet, indicating that the liquid nitrogen has not completely evaporated.
7. Close the tube and continue vortexing for the time indicated in the chosen method (generally in step 1); make sure no clumps of powder are visible.
8. Proceed with the chosen extraction method, generally starting with step 2.
3.2. Plant RNA Isolation Reagent®
3.2.1. General Information
This procedure is adapted from the manufacturer's (Invitrogen) "Large-Scale RNA Isolation" protocol, with the following changes:
1. If you do not know the weight of the sample, estimate the amount of reagent from the sample volume: use about 1 ml of loosely packed ground frozen tissue per 10 ml of reagent.
2. All centrifugations are carried out at 10,000–12,000 × g in 13 ml or 30 ml centrifuge tubes.
3. The extract is clarified by centrifugation instead of filtration.
3.2.2. Procedure
1. Add the frozen ground tissue to a tube containing PRIR and mix thoroughly by vortexing, about 1 min.
2. Incubate for 5 min at room temperature, laying the tube on its side with gentle shaking.
3. Centrifuge 10 min, 4°C at 12,000 × g to pellet insoluble material; transfer the clarified supernatant to a clean tube (see Note 1).
4. Per 10 ml of clarified supernatant, add 2 ml of 5 M NaCl and mix by inverting the tube, then add 6 ml of chloroform and mix by vortexing.
5. Centrifuge 10 min, 4°C at 12,000 × g to separate the phases; transfer the top aqueous phase to a clean tube, being careful not to disturb the interface. Recovery should be 8–10 ml (see Note 2).
6. Precipitate the RNA by adding 0.9 volumes of isopropyl alcohol and mix gently by inverting the tube 5–6 times (see Notes 3 and 4).
7. Incubate for 10–15 min at room temperature.
8. Centrifuge 30 min, 4°C at 12,000 × g to pellet the RNA. Gently decant the supernatant, taking care not to disturb the pellet.
9. Wash the pellet by adding 5–10 ml of cold 75% ethanol and mix by vortexing.
10. Centrifuge 10 min, 4°C at 12,000 × g.
11. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
12. Dissolve the RNA pellet in RNase-free water and transfer it to a clean microcentrifuge tube (see Notes 5 and 6).
13. Measure the OD of the RNA. Good-quality RNA should have an A260/A280 ratio > 1.7 and an A260/A230 ratio > 1.4 (see Note 6 and Subheading 3.7.2).
14. Store the RNA at −80°C.
3.3. Plant RNA Isolation Reagent®: Small Scale Procedure for Medicago truncatula Developing Seeds (20 Days After Anthesis, DAA)
With this plant material, RNA obtained by pooling numerous small-scale extractions was of better quality than that from any single large-scale preparation.
1. Grind about 30 seeds in liquid nitrogen with a mortar and pestle.
2. Add the frozen ground tissue to a tube containing 6 ml of PRIR (0.5 ml PRIR/0.1 g tissue) and mix thoroughly by vortexing, about 1 min.
3. Incubate for 5 min at room temperature, laying the tube on its side with gentle shaking.
4. Divide the buffer-tissue mixture into 0.5 ml aliquots in 1.5 ml microcentrifuge tubes.
5. Centrifuge 5 min, 4°C at 12,000 × g.
6. Transfer the supernatants to clean tubes.
7. To each supernatant, add 100 µl of 5 M NaCl and mix by vortexing, then add 300 µl of chloroform and mix by vortexing.
8. Centrifuge 15 min, 4°C at 12,000 × g.
9. Transfer the upper phase to clean tubes.
10. To the upper phase, add an equal volume of 2 M KOAc.
11. Incubate on ice for 1 h.
12. Centrifuge 15 min, 4°C at 12,000 × g.
13. Transfer the supernatant to clean tubes.
14. To each supernatant, add ½ volume of high salt solution and mix by inverting the tube, then add ½ volume of isopropyl alcohol and mix by inverting the tube.
15. Centrifuge 10 min, 4°C at 12,000 × g.
16. Wash the pellet by adding 0.5 ml of cold 75% ethanol and mix.
17. Centrifuge 3 min, 4°C at 12,000 × g.
18. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
19. Dissolve the RNA in 20 µl of RNase-free water.
20. Pool all the RNA fractions and measure the OD of the resulting solution. If the ratios are not optimal (A260/A280 < 1.7 or A260/A230 < 1.4), proceed to step 21; expect 50–60% recovery.
21. To the RNA solution, add 1/10 volume of 3 M NaOAc (pH 5.4) and mix by inverting the tube, then add 2 volumes of ethanol and mix by inverting the tube.
22. Incubate at −80°C for at least 2 h.
23. Centrifuge 20 min, 4°C at 12,000 × g.
24. Wash the pellet by adding 1 ml of cold 75% ethanol and mix.
25. Centrifuge 10 min, 4°C at 12,000 × g.
26. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
27. Dissolve the RNA in RNase-free water. For a higher concentration, dissolve the pellet in less than half the volume used in step 19.
3.4. TriReagent®/TRIzol® RNA Isolation
This RNA extraction buffer, based on phenol and guanidine thiocyanate (6, 8), is sold commercially as TriReagent® by Molecular Research Center and as TRIzol® by Invitrogen. Use no more than 1 g of tissue per 10 ml of TriReagent®/TRIzol®.
1. Add 1 g of frozen ground tissue to a tube containing 10 ml of TriReagent®/TRIzol® and mix thoroughly by vortexing, about 1 min.
2. Incubate for 5–10 min at room temperature, laying the tube on its side with gentle shaking.
3. Add 1 ml (1/10 volume) of BCP or 2 ml of chloroform and mix by vortexing.
4. Centrifuge 15 min, 4°C at 10,000 × g to separate the phases.
5. Transfer the top aqueous phase to a clean tube, being careful not to disturb the interface. Recovery should be 6–7 ml (see Note 7).
6. Precipitate the RNA by adding 0.9 volumes of isopropyl alcohol and mix gently by inverting the tube 5–6 times.
7. Incubate for 10–15 min at room temperature.
8. Centrifuge 15 min, 4°C at 10,000 × g to pellet the RNA. Gently decant the supernatant, taking care not to disturb the pellet.
9. Wash the pellet by adding 10 ml of cold 75% ethanol and mix.
10. Centrifuge 10 min, 4°C at 10,000 × g.
11. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet for 1–2 min.
12. Dissolve the RNA pellet in RNase-free water and transfer it to a clean microcentrifuge tube (see Note 8).
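The reagent volumes in steps 1, 3, and 6 are fixed ratios, so they can be scaled to the amount of tissue. The sketch below assumes the ratios scale linearly to smaller or larger preparations (the protocol itself is written for 1 g in 10 ml) and that tube capacity is checked separately; the function names are hypothetical.

```python
def trireagent_volumes(tissue_g: float) -> dict[str, float]:
    """Volumes (ml) for Subheading 3.4, keeping at most 1 g of tissue per 10 ml of reagent."""
    if tissue_g <= 0:
        raise ValueError("tissue mass must be positive")
    reagent_ml = 10.0 * tissue_g
    return {
        "reagent_ml": reagent_ml,
        "bcp_ml": reagent_ml / 10.0,        # step 3: 1/10 volume of BCP ...
        "chloroform_ml": reagent_ml / 5.0,  # ... or 2 ml per 10 ml of chloroform (use one, not both)
    }


def isopropanol_ml(aqueous_phase_ml: float) -> float:
    """Step 6: 0.9 volumes of isopropyl alcohol relative to the recovered aqueous phase."""
    return 0.9 * aqueous_phase_ml


if __name__ == "__main__":
    print(trireagent_volumes(0.5))
    print(isopropanol_ml(6.5))  # step 5 recovers about 6-7 ml from a 1 g prep
```

Because the isopropyl alcohol is dosed against the aqueous phase actually recovered, measuring that volume after step 5 gives a more accurate step 6 than assuming a fixed recovery.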
3.5. TriReagent LS® RNA Isolation
TriReagent LS® is a more concentrated version of TriReagent®/TRIzol® suitable for RNA extraction from liquid samples. Use a TriReagent LS®:sample volume ratio of 3:1. With pumpkin phloem sap, we obtained the best results when multiple samples were combined before extraction, as described below.
1. Harvest 250 µl of phloem sap directly into 1.5 ml microcentrifuge tubes containing 750 µl of TriReagent LS®; keep the tubes frozen until ready to proceed with the RNA extraction.
2. Thaw the samples by vortexing.
3. Centrifuge 5 min, 4°C at 13,000 × g to pellet insoluble material.
4. Combine the clarified supernatants from 6 samples in a 13 ml centrifuge tube, add 1 ml of TriReagent LS®, and mix well.
5. Add 8 µl of glycogen or Polyacryl Carrier® and mix.
6. Add 0.1 ml of BCP per 0.75 ml of TriReagent LS® (about 750 µl) and mix by vortexing.
7. Incubate for 5–15 min at room temperature.
8. Centrifuge 15 min, 4°C at 12,000 × g.
9. Transfer the upper phase to a clean tube, being careful not to disturb the interphase. Recovery should be 60–70% of the original TriReagent LS® volume (3.5–3.8 ml).
10. Precipitate the RNA: to the upper phase, add 0.7 ml of isopropyl alcohol per 0.75 ml of TriReagent LS® (about 4 ml) and mix gently.
11. Incubate for 10–15 min at room temperature.
12. Centrifuge 15 min, 4°C at 10,000 × g to pellet the RNA. Gently decant the supernatant, taking care not to disturb the pellet.
13. Wash the pellet by adding 7 ml of cold 75% ethanol and mix.
14. Centrifuge 10 min, 4°C at 10,000 × g.
15. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet for 1–2 min.
16. Dissolve the RNA pellet in 30–40 µl of RNase-free water and transfer it to a clean microcentrifuge tube.
3.6. Guanidinium-Free RNA Isolation
This isolation method is based on (11), with modifications by M. Perez-Amador (personal communication, Instituto de Biologia Molecular y Celular de Plantas, Valencia, Spain), and is adapted for the recovery of small RNA molecules.
1. Add 0.5–1 g of frozen ground tissue to a 13 ml tube containing 4 ml of extraction buffer with 2-mercaptoethanol and mix thoroughly by vortexing.
2. Add 4 ml of phenol and mix by vortexing.
3. Add 0.8 ml of chloroform:isoamyl alcohol and mix by vortexing.
4. Add 280 µl of 3 M NaOAc, pH 5.2, and mix by vortexing.
5. Incubate for 10–15 min on ice.
6. Centrifuge 10 min, 4°C at 10,000 × g.
7. Transfer the upper phase to a clean tube, being careful not to disturb the interphase.
8. Add 4 ml of phenol:chloroform:isoamyl alcohol and mix by vortexing.
9. Centrifuge 10 min, 4°C at 10,000 × g.
10. Transfer the upper phase to a clean tube; if there is still a large interphase, repeat the phenol:chloroform:isoamyl alcohol extraction.
11. Add an equal volume of chloroform:isoamyl alcohol and mix by vortexing.
12. Centrifuge 10 min, 4°C at 10,000 × g.
13. Transfer the upper phase to a clean tube.
14. Add an equal volume of isopropyl alcohol and mix gently.
15. Incubate on ice 30 min to 2 h or at −80°C overnight.
16. Centrifuge 10 min, 4°C at 10,000 × g.
17. Gently decant the supernatant, taking care not to disturb the pellet.
18. Wash the pellet by adding 5 ml of cold 70% ethanol and mix.
19. Centrifuge 10 min, 4°C at 10,000 × g.
20. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet for 1–2 min.
21. Dissolve the RNA pellet in RNase-free water and transfer it to a clean microcentrifuge tube.
3.7. Evaluation of RNA Quality
Plant RNA preparations may contain traces of soluble and/or insoluble contaminants that have escaped the purification process; this happens when the capacity of the extraction buffer is exceeded and the contaminants are carried over through the purification steps. The evaluation of the total RNA quality can be divided into three steps: inspection of the RNA pellet and its resuspension, OD reading, and inspection of the RNA using gel electrophoresis.
3.7.1. Inspection of RNA Pellet and Resuspension
Carry-over of undesirable contaminants can sometimes be recognized from the abnormal size or color of the RNA pellet before resuspension. Although soluble anthocyanins and other pigments may be present at the beginning of the purification process, a colored pellet after isopropyl alcohol precipitation and ethanol washing often indicates impure RNA. In particular, brown pellets can result from the oxidation and decomposition of phenolic compounds (12). An RNA pellet rich in polysaccharides dissolves quite easily in RNase-free water, but the solution will be viscous (see Note 4(b), steps (1)–(10)). In addition, the presence of polyphenolic compounds makes the RNA pellet difficult to dissolve. In these cases, a freeze-thaw-centrifugation step (see Note 6) should be performed before further assessing the RNA quality.
3.7.2. OD Reading
RNA quality can be evaluated by measuring the UV absorbance of the preparation at different wavelengths. Nucleic acids have a peak of absorbance at 260 nm (A260), proteins at 280 nm (A280) and polysaccharides mostly at 230 nm (A230) (13, 14). The ratios A260/A280 and A260/A230 are generally used as indicators of protein and polysaccharide contamination. Good-quality RNA should have an A260/A280 ratio in the 1.7–2.0 range and an A260/A230 ratio in the 1.4–2.0 range.
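The ratio checks in this subheading are simple arithmetic. The sketch below computes both ratios from blank-corrected readings and tests them against the ranges given above. The concentration helper uses the conventional factor of 40 µg/ml per A260 unit for RNA in a 1 cm path, which is standard practice rather than a value stated in this chapter; the function names are hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # conventional factor for single-stranded RNA, 1 cm path length


def purity_report(a260: float, a280: float, a230: float) -> dict[str, object]:
    """A260/A280 and A260/A230 ratios checked against the ranges in Subheading 3.7.2."""
    r280 = a260 / a280
    r230 = a260 / a230
    return {
        "A260/A280": round(r280, 2),
        "A260/A230": round(r230, 2),
        "protein_ok": 1.7 <= r280 <= 2.0,
        "polysaccharide_ok": 1.4 <= r230 <= 2.0,
    }


def rna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimated RNA concentration of the undiluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    print(purity_report(a260=0.50, a280=0.26, a230=0.31))
    print(rna_ug_per_ml(0.25, dilution_factor=100))
```

As Note 5 cautions, readings from preparations with very low recovery can be misleading, because background contaminants then contribute a larger share of the absorbance.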
3.7.3. Electrophoresis
Running an aliquot of the RNA sample on an agarose gel is the most common method for evaluating the integrity of total RNA preparations. On non-denaturing 1% agarose gels, partially degraded RNA samples appear as a broad smear, without the clear bands representing the 28S and 18S rRNA molecules expected in eukaryotic samples. Fully degraded RNA appears as a very low molecular weight smear. On denaturing agarose gels, RNA quality can be judged by comparing the intensities of the 28S and 18S rRNA bands: in good-quality RNA, the 28S rRNA band should be more intense than the 18S band. An alternative to traditional gel-based approaches is microfluidics technology, in which electrophoresis is performed on a miniaturized scale on a glass chip (e.g., Agilent 2100 Bioanalyzer, www.agilent.com). These instruments offer high sensitivity, minimal sample consumption, and high resolution. The output is presented as electropherograms and gel-like images, which allow visual inspection of RNA quality much like an agarose or denaturing gel.
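If band intensities are quantified from a gel image (for example with densitometry software, which this chapter does not specify), the comparison of the two rRNA bands reduces to a ratio. The sketch below encodes only the criterion stated above, that the 28S band should be more intense than the 18S band; any stricter cutoff would be an added assumption. The function names are hypothetical.

```python
def rrna_ratio(large_band_intensity: float, small_band_intensity: float) -> float:
    """Ratio of background-subtracted 28S to 18S band intensities."""
    if small_band_intensity <= 0:
        raise ValueError("18S band intensity must be positive")
    return large_band_intensity / small_band_intensity


def looks_intact(large_band_intensity: float, small_band_intensity: float) -> bool:
    """Criterion from Subheading 3.7.3: the 28S band should be more intense than the 18S band."""
    return rrna_ratio(large_band_intensity, small_band_intensity) > 1.0


if __name__ == "__main__":
    print(looks_intact(18500.0, 10200.0))
```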
4. Notes
1. If the tissue contains high levels of ribonucleases, treatment with proteinase K followed by extraction with acid phenol will help to minimize RNA degradation. Add the following steps before the chloroform extraction (step 4):
   (a) To the clarified supernatant, add SDS to a final concentration of 0.5%.
   (b) Add proteinase K to a final concentration of 400–500 µg/ml.
   (c) Incubate 15–30 min at 55°C, or as recommended by the proteinase K manufacturer.
   (d) Add an equal volume of acid phenol:chloroform 5:1 solution, pH 4.5, and mix by vortexing.
   (e) Centrifuge 10 min, 4°C at 12,000 × g.
   (f) Transfer the top aqueous phase to a clean tube.
   (g) Proceed to step 4.
2. Perform an extra chloroform extraction if (1) the interphase is thick and/or loose; (2) the upper phase becomes contaminated with interphase during transfer; or (3) you want to maximize the extraction. Extra chloroform extraction after step 5:
   (a) To the transferred top aqueous phase, add an equal volume of chloroform and mix by vortexing.
   (b) Centrifuge 10 min, 4°C at 12,000 × g.
   (c) Transfer the new top aqueous phase to a clean tube. Recovery should be 6–8 ml.
   (d) Proceed to step 6.
3. For tissues rich in polysaccharides, precipitating the RNA in the presence of high salt will sometimes improve its quality. Replace step 6 with the following: to the aqueous phase, add ½ volume of high salt solution and mix by inverting the tube. Then add ½ volume (aqueous phase + salt solution) of isopropyl alcohol and mix by inverting the tube.
4. For tissues rich in carbohydrates/polysaccharides, a potassium acetate precipitation will sometimes improve RNA quality. This precipitation can be done after the chloroform extraction (step 5) when low-quality RNA is expected, or after the RNA has been dissolved in DEPC-H2O if its A260/A230 ratio is 1 or less. In the latter case, some RNA will be lost during the precipitation.
   (a) For potassium acetate precipitation after chloroform extraction, replace step 6 with the following:
      1. Divide the supernatant between two tubes (4–5 ml in each) and add an equal volume of 2 M KOAc.
      2. Mix by vortexing.
      3. Incubate for 1 h on ice.
      4. Centrifuge 30 min, 4°C at 10,000 × g.
      5. Transfer the supernatant to new tubes. Some pellet may be visible.
      6. To the supernatant, add ½ volume of high salt solution and mix by inverting the tube. Then add ½ volume of isopropyl alcohol and mix by inverting the tube.
   (b) For potassium acetate precipitation after the RNA has been dissolved, continue with the following after step 14:
      1. To the RNA solution, add 1/10 volume of 2 M KOAc, pH 5.5, and mix.
      2. Incubate for 10 min on ice.
      3. Centrifuge 15 min, 4°C at 13,000 × g to pellet insoluble material; transfer the supernatant to a clean tube.
      4. Precipitate the RNA by adding 0.9 volumes of isopropyl alcohol and mix gently by inverting the tube 5–6 times.
      5. Incubate for 1 h at −20°C.
      6. Centrifuge 25 min, 4°C at 13,000 × g to pellet the RNA. Gently decant the supernatant, taking care not to disturb the pellet.
      7. Wash the pellet by adding 1 ml of cold 75% ethanol and mix by vortexing.
      8. Centrifuge 10 min, 4°C at 13,000 × g.
      9. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
      10. Dissolve the RNA pellet in RNase-free water and transfer it to a clean microcentrifuge tube.
5. We generally resuspend to a final concentration of ~0.5–1.0 mg/ml. For Arabidopsis leaves, this corresponds to a resuspension volume of about 0.5 ml for a total RNA prep from about 1 g of tissue, although recovery from different tissues and plants can vary considerably. Note that if the recovery is very low, the A260/A280 can be misleading and overestimate the effective yield, presumably because background contaminants contribute a greater proportion of the absorbance.
6. If insoluble material is visible in the RNA solution, remove it as follows:
   (a) Freeze the RNA at −80°C for at least 30 min.
   (b) Thaw by vortexing and centrifuge 10 min, 4°C at 12,000 × g to pellet insoluble material.
   (c) Transfer the supernatant to a clean microcentrifuge tube and determine the OD.
7. Extra BCP/chloroform extraction:
   (a) To the transferred top aqueous phase, add ½ volume of BCP or an equal volume of chloroform and mix by vortexing.
   (b) Centrifuge 10 min, 4°C at 10,000 × g.
   (c) Transfer the upper phase to a clean tube. Recovery should be 5–6 ml.
8. If the RNA quality is less than optimal, an acid phenol extraction should help to improve it.
   (a) To the RNA solution in a microcentrifuge tube, add ½ volume of acid phenol:chloroform:isoamyl alcohol and mix by vortexing.
   (b) Centrifuge 8 min, 4°C at 13,000 × g.
   (c) Transfer the upper phase to a clean tube, add ½ volume of acid phenol:chloroform:isoamyl alcohol, and mix by vortexing.
   (d) Centrifuge 8 min, 4°C at 13,000 × g.
   (e) Transfer the upper phase to a clean tube, add an equal volume of chloroform, and mix by vortexing.
   (f) Centrifuge 8 min, 4°C at 13,000 × g.
   (g) Transfer the upper phase to a clean tube. If the upper phase is viscous, repeat the chloroform extraction. Otherwise, proceed as follows: add 1/10 volume of 3 M NaOAc, pH 5.2, and mix, then add 2 volumes of ethanol and mix.
   (h) Incubate 15–20 min at −80°C.
   (i) Centrifuge 20 min, 4°C at 13,000 × g.
   (j) Wash the pellet with 500 µl of cold 75% ethanol.
   (k) Centrifuge 8 min, 4°C at 13,000 × g.
   (l) Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
   (m) Dissolve the RNA in RNase-free water.
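Several Notes involve adding a stock solution to reach a final concentration (Note 1: 20% SDS to 0.5%, and 20 mg/ml proteinase K to 400–500 µg/ml) or adding reagents by fractional volume (Note 3). The sketch below accounts for the volume contributed by the stock itself, v = C_final × V / (C_stock − C_final), and applies the Note 3 ratios literally; the function names and the 10 ml example volume are hypothetical.

```python
def stock_volume_to_add(stock_conc: float, final_conc: float, current_volume: float) -> float:
    """Volume of stock that brings current_volume to final_conc.

    Solves final_conc = stock_conc * v / (current_volume + v) for v. Concentrations
    share one unit (e.g. % or mg/ml); v is returned in the unit of current_volume.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock")
    return final_conc * current_volume / (stock_conc - final_conc)


def high_salt_additions(aqueous_ml: float) -> tuple[float, float]:
    """Note 3: 1/2 volume of high salt solution, then 1/2 volume (aqueous + salt) of isopropanol."""
    salt_ml = aqueous_ml / 2
    isopropanol_ml = (aqueous_ml + salt_ml) / 2
    return salt_ml, isopropanol_ml


if __name__ == "__main__":
    supernatant_ml = 10.0
    sds_ml = stock_volume_to_add(20.0, 0.5, supernatant_ml)           # % SDS
    pk_ml = stock_volume_to_add(20.0, 0.45, supernatant_ml + sds_ml)   # mg/ml, i.e. 450 µg/ml
    print(round(sds_ml, 3), round(pk_ml, 3))
    print(high_salt_additions(8.0))
```

Adding SDS first and then computing the proteinase K volume on the enlarged sample keeps both final concentrations on target; for an 8 ml aqueous phase, Note 3 works out to 4 ml of high salt solution and 6 ml of isopropyl alcohol.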
Acknowledgments The methods described in this chapter were developed with support from NSF, USDA and DOE.
References 1. Geuna F, Hartings H, Scienza A (1998) A new method for rapid extraction of high quality RNA from recalcitrant tissues in grapevine. Plant Mol Biol Rep 16:61–67 2. Salzman RA, Fujita T, Zhu-Salzman K, Hasegawa PM, Bressan RA (1999) An improved RNA isolation method for plant tissues containing high levels of phenolic compounds or carbohydrates. Plant Mol Biol Rep 17:11–17
3. Tai HH, Pelletier C, Beardmore T (2004) Total RNA isolation from Picea mariana dry seed. Plant Mol Biol Rep 22:93a–93e 4. Yeh K, Juang R, Su J (1991) A rapid and efficient method for RNA isolation from plants with high carbohydrate content. Focus 13:102–103 5. Wallace DM (1987) Precipitation of nucleic acids. Meth Enzymol 152:41–48 6. Chomczynski P, Sacchi N (1987) Single-step method of RNA isolation by acid guanidinium
Accerbi et al.
thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156–159 7. Chomczynski P, Sacchi N (2006) The single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction: twenty-something years on. Nat Protoc 1:581–585 8. Chomczynski P (1993) A reagent for the single-step simultaneous isolation of RNA, DNA, and proteins from cell and tissue samples. BioTechniques 15:532–537 9. Graham GC (1993) A method of extracting total RNA from Pinus radiata and other conifers. Plant Mol Biol Rep 11:32–37 10. Siju S, Madhubala R, Bhat AI (2006) Sodium sulphite enhances RNA isolation and sensitivity of Cucumber mosaic virus detection by
RT-PCR in black pepper. J Virol Methods 141:107–110 11. Bugos RC, Chiang VL, Zhang XH, Campbell ER, Podila GK, Campbell WH (1995) RNA isolation from plant tissues recalcitrant to extraction in guanidine. BioTechniques 19: 734–737 12. Loomis WD (1974) Overcoming problems of phenolics and quinones in the isolation of plant enzymes and organelles. Meth Enzymol 31:528–544 13. Logemann J, Schell J, Willmitzer L (1987) Improved method for the isolation of RNA from plant tissues. Anal Biochem 163:16–20 14. Manning K (1990) Isolation of nucleic acids from plants by differential solvent precipitation. Anal Biochem 195:45–50
Chapter 4
miRNA Target Prediction in Plants
Noah Fahlgren and James C. Carrington
Abstract
In plants, miRNAs bind to target RNAs with a high degree of complementarity. In this chapter, a simple method for computationally predicting plant miRNA targets, using a position-dependent scoring system, is described.
Key words: MicroRNA, miRNA, Target prediction, Plant, Database search
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_4, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
miRNAs are key regulators of plant growth, development, and responses to biotic and abiotic stress (1). Plant miRNAs act as guides for effector complexes that usually regulate messenger RNA (mRNA) in a sequence-specific manner; typically, they guide cleavage of target RNAs (1). Cleavage of mRNA requires a high degree of miRNA:target base pairing, especially at the 5′ and central positions of the duplex relative to the miRNA (2, 3). Figure 1a shows that most mismatches in authentic miRNA:target duplexes in Arabidopsis thaliana occur at the first position and after position 13 relative to the 5′ end of the miRNA. In contrast, positions 2–13 are relatively mismatch-free and contain somewhat fewer G:U base pairs. These base-pairing patterns led to the development of computational methods for predicting plant miRNA targets based on a position-dependent scoring matrix (4, 5). miRNA sequences are first aligned to potential targets using permissive alignment settings to ensure that all possible miRNA:target interactions are captured. Next, the miRNA:target alignments are scored. Mismatches and single-nucleotide bulges or gaps are
Fahlgren and Carrington
[Figure 1. (a) Bar chart: percent of total base pairs per position (0–60%) that are bulges, G:U pairs, or mismatches, plotted against base-pair position relative to the miRNA 5′ end (positions 1–24). (b) Alignment-scoring example:
At4g36920:   5′ CUGCAGCAUCAUCAGGAUUCU 3′
                 |||||||||||||:||||||
ath-miR172a: 3′ UACGUCGUAGUAGUUCUAAGA 5′
Mismatch: score = 1. G:U base pair: score = 1 (0.5 × 2). Total score = 2. Gray box: 2× score penalty region.]
Fig. 1. (a) The distribution of mismatches, single-nucleotide bulges, and G:U base pairs along 155 authentic miRNA:target duplexes from A. thaliana. The set of 155 duplexes is made up of unique miRNA sequences from each MIRNA gene family with an experimentally validated target, and unique target site sequences from the same validated target families. Position 1 corresponds to the 5′ end of the miRNA. Each position shows the percent of total base pairs at that position that are bulged, G:U, and mismatched. (b) Arabidopsis miR172a and a target (At4g36920), illustrating the alignment scoring system used to predict targets. The gray box highlights positions 2 through 13, relative to the miRNA 5′ end, indicating the region where penalty scores are doubled.
assessed a penalty of +1, while G:U base pairs are assessed a penalty of +0.5. Penalty scores for mismatches, bulges, gaps, and G:U pairs at positions 2 through 13 are doubled because of the stricter base-pairing requirements within this region (see Fig. 1b). A maximum score cutoff that balances the sensitivity and specificity of the algorithm is then selected. Finally, phylogenetic and wet-bench methods can be used to validate computational predictions.
2. Materials (see Note 1)
1. 2× AMD Opteron 285 Dual Core computer with 4 GB of RAM running CentOS 3.
2. PERL, v5.8.0 built for x86_64-thread-multi.
miRNA Target Prediction in Plants
3. FASTA34.
4. miRNA sequences.
5. Transcript, EST, and genomic sequences.
3. Methods
We generated a PERL script that aligns miRNAs to sequences in a target database using FASTA34, parses the FASTA34 alignments, and scores the alignments. Subheadings 3.2 and 3.3 describe these steps in detail.
3.1. Setup of Target Sequence FASTA Database
1. Download transcript, EST, genome, or other sequences of interest (see Note 2).
2. Reformat sequences to FASTA format, if necessary. Example:
>AT1G51370.2 | Symbols: | F-box family protein | chr1:19049283-19050493 FORWARD
ATGGTGGGTGGCAAGAAGAAAACCAAGATATGTGACAAAGTGTCACATGAGGAAGATAGGATAAGCCAGTTACCGGAACCTTTGATATCTGAAATACTTTTTCATCTTTCTACCAAGGACTCTGTCAGAACAAGCGCTTTGTCTACCAAATGGAG…
3. Store the FASTA-formatted sequences in a single plain-text file (database.file). This file will be used as an input for FASTA34.
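Step 3 assumes that all sequences end up in a single FASTA file. The sketch below shows one way to read and write such a multi-FASTA file such as database.file. It is written in Python rather than the authors' PERL. It assumes that the identifier is the first whitespace-delimited token of each header line. `read_fasta` and `write_fasta` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' (e.g. 'AT1G51370.2' in a TAIR-style header).
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks).upper()
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if current_id is not None:
        records[current_id] = "".join(chunks).upper()
    return records


def write_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format {identifier: sequence} as FASTA text wrapped at `width`."""
    out: List[str] = []
    for seq_id, seq in records.items():
        out.append(">" + seq_id)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

`read_fasta` accepts any iterable of lines, so an open file handle for database.file can be passed to it directly.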
3.2. Generation of miRNA:Target Alignments
The program FASTA34 is used to generate alignments between a miRNA sequence and all of the potential target sequences in the database set up in Subheading 3.1 (see Note 3). FASTA34 searches for sequence similarity by identifying direct matches, not base pairs, between two sequences. Therefore, set FASTA34 to align the reverse complement of the miRNA with prospective target sequences, and to treat the nucleotide combinations that correspond to G:U base pairs as matches. Note that this will also admit some incorrect G:U pairs, which are disallowed at a later step (see Subheading 3.3). The alignment step is not used as a filtering step, so permissive gap, match, and mismatch settings are applied in FASTA34.
1. Use the following settings for FASTA34:
(a) n (forces the query sequence to be treated as nucleotide)
(b) H (suppresses the score distribution histogram in the output)
(c) Q (prevents FASTA34 from prompting for input)
(d) f (gap opening penalty; set this to -16)
(e) r (match reward/mismatch penalty; set this to +15/-10)
(f) w (alignment line length; set this to 100)
(g) W (additional sequence context length; set this to 25)
(h) b (the number of alignments to include in the output; set this to a number larger than the number of sequences in the target database)
(i) i (limits the search to reverse complement alignments only)
(j) U (changes the scoring matrix to allow G:A, T:C, or U:C matches; in other words, it allows for G:U base pairs)
(k) O (output filename)
(l) ktup (the minimum length of an exact word match required before the program builds an alignment; set this to 1, see Note 4)
2. Place the miRNA sequence in a FASTA-formatted plain-text file (miRNA.file). This file will be used as an input for FASTA34.
3. Run FASTA34 to generate alignments. Example command line:
fasta34 -n -H -Q -f -16 -r +15/-10 -w 100 -W 25 -b 100000 -i -U -O output.file miRNA.file database.file 1
Note: the number at the end of the command is the ktup value.
3.3. Scoring Alignments
1. For each alignment (see Subheading 3.2), parse out the name of the matched sequence and the miRNA:target duplex sequences. Complement, but do not reverse, the parsed miRNA (query) sequence. Because the query was the reverse complement of the miRNA, this yields the miRNA written 3′→5′ beneath the target.
2. Using the parsed sequences, score each alignment with the following position-dependent scoring rules (see Fig. 1b):
(a) Mismatches, single-nucleotide gaps, or single-nucleotide bulges are assessed a penalty of +1.
(b) G:U base pairs are assessed a penalty of +0.5.
(c) Penalty scores are doubled at positions 2 through 13 relative to the 5′ end of the miRNA.
3. Additionally, reject alignments that have:
(a) More than one single-nucleotide bulge or gap.
(b) More than seven total mismatches, G:U base pairs, bulges, and gaps.
(c) More than four mismatches or more than four G:U base pairs.
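The scoring rules in steps 1–3 can be sketched in Python as follows. This is an illustration, not the authors' PERL script, and `complement` and `score_duplex` are hypothetical names. The sketch makes two assumptions:
- The parsed duplex arrives as two equal-length strings with '-' marking gaps: the target site written 5′→3′ and, beneath it, the complemented query, which reads as the miRNA 3′→5′.
- A gap in the miRNA row takes the position of the nearest miRNA nucleotide on its 5′ side.

```python
from typing import Optional

# Complement for DNA or RNA input (output is RNA); '-' gaps are preserved.
_COMPLEMENT = str.maketrans("ACGTU-", "UGCAA-")
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def complement(seq: str) -> str:
    """Complement, without reversing, a DNA/RNA string (returns RNA)."""
    return seq.upper().translate(_COMPLEMENT)


def score_duplex(target: str, mirna_3to5: str) -> Optional[float]:
    """Penalty score for an aligned miRNA:target duplex, or None if rejected.

    target     -- target site written 5'->3', '-' for gaps
    mirna_3to5 -- miRNA written 3'->5' beneath the target, '-' for gaps
    """
    if len(target) != len(mirna_3to5):
        raise ValueError("aligned strings must have the same length")
    target = target.upper().replace("T", "U")
    mirna_3to5 = mirna_3to5.upper().replace("T", "U")

    score = 0.0
    mismatches = gu_pairs = gaps = 0
    position = 0  # miRNA position, counted from the miRNA 5' end
    # Walk the columns right to left, i.e. starting at the miRNA 5' end.
    for t, m in zip(reversed(target), reversed(mirna_3to5)):
        if m != "-":
            position += 1  # a gap column keeps the previous position
        if t == "-" or m == "-":          # single-nucleotide bulge or gap
            penalty = 1.0
            gaps += 1
        elif (t, m) in WATSON_CRICK:      # perfect base pair
            penalty = 0.0
        elif (t, m) in WOBBLE:            # G:U base pair
            penalty = 0.5
            gu_pairs += 1
        else:                             # mismatch
            penalty = 1.0
            mismatches += 1
        if 2 <= position <= 13:           # stricter core region: doubled
            penalty *= 2
        score += penalty

    # Rejection rules (step 3).
    if gaps > 1 or mismatches > 4 or gu_pairs > 4:
        return None
    if mismatches + gu_pairs + gaps > 7:
        return None
    return score
```

For the Fig. 1b duplex, `score_duplex("CUGCAGCAUCAUCAGGAUUCU", "UACGUCGUAGUAGUUCUAAGA")` returns 2.0. This is a penalty of 1 for the mismatch at miRNA position 21, plus 0.5 × 2 for the G:U pair at position 7, which lies in the doubled region. The function classifies each column by its actual base pair. As a result, the incorrect G:U-type matches admitted at the FASTA34 step are scored here as mismatches.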
3.4. Assessing Results
The miRNA target prediction method yields prediction scores starting at zero for a perfect miRNA:target duplex and caps at 11, the maximum score possible within the constraints listed above
(four mismatches, one bulge or gap, and two G:U base pairs, all between positions 2 and 13 relative to the 5′ end of the miRNA; see Subheading 3.3). The score cutoff must be chosen so that it preserves specificity while maximizing sensitivity. Previous work in A. thaliana has shown that a cutoff of 3.5 maintains reasonable sensitivity and specificity (6–8). Decreasing the cutoff increases specificity, but at the expense of sensitivity.
To estimate sensitivity, a set of previously validated targets must be available:
1. Predict targets for miRNAs with previously validated targets.
2. Calculate sensitivity as the percentage of targets in the known set that are successfully predicted. The remaining percentage represents the known targets that fail to be predicted.
Two approaches can be used to estimate specificity:
1. Predict targets for miRNAs with previously validated targets, and calculate the percentage of all predicted targets that are not known to be targets. The lower this percentage, the higher the specificity.
2. Alternatively, estimate the background signal by iteratively shuffling the miRNA sequences with a dinucleotide-preserving shuffle program (see Note 5) and predicting targets for the shuffled sequences (8). The background is the average number of targets predicted for the shuffled miRNA sequences.
In addition to determining a prediction score cutoff, other data can be used to enhance predictions (see Note 6). For instance, conserved miRNA targets can be predicted efficiently by using a phylogenetic approach to search for conservation of miRNA target sites (5). Predicted sites can be validated as miRNA targets using wet-bench methods, such as the 5′ RACE assay, that experimentally detect miRNA-guided cleavage products (9).
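The cutoff and the estimates above amount to simple set arithmetic, sketched below in Python for illustration only. The sketch assumes each candidate target has already been reduced to its best (lowest) score. It also assumes that a target passes when its score is at or below the cutoff, and that prediction counts for the shuffled miRNAs are gathered elsewhere. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Set


def predicted_targets(best_scores: Dict[str, float], cutoff: float = 3.5) -> Set[str]:
    """Targets whose best score is at or below the cutoff (assumed inclusive)."""
    return {target for target, score in best_scores.items() if score <= cutoff}


def sensitivity(predicted: Set[str], known: Set[str]) -> float:
    """Fraction of known (validated) targets that are predicted."""
    return len(predicted & known) / len(known)


def fraction_not_known(predicted: Set[str], known: Set[str]) -> float:
    """Fraction of predictions that are not known targets (lower = more specific)."""
    return len(predicted - known) / len(predicted) if predicted else 0.0


def background(shuffled_counts: Iterable[int]) -> float:
    """Mean number of targets predicted for the shuffled miRNA sequences."""
    return mean(shuffled_counts)
```

Raising `cutoff` admits more predictions. This typically raises `sensitivity` and also raises the fraction of predictions that are not known targets, which is the trade-off described above.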
4. Notes
1. There are many computer and program configurations that could feasibly substitute for those listed. PERL and FASTA34 are available for many computer operating systems, although the necessary configurations may differ for each system.
2. Generally, it is easy to predict targets from a transcript database, but potential targets (e.g., noncoding RNAs) are often not well represented in transcript databases; EST databases may therefore contain additional targets. It is also possible to predict targets at the level of the genome to improve sensitivity if transcripts are not well annotated and ESTs are not available.
However, target sites that span exon-exon boundaries will be missed.
3. Others have used the program PATSCAN in place of FASTA34 (5, 8).
4. FASTA34 uses a lookup table to search for sequences in the database that have the potential to match the query sequence. The ktup value sets the minimum word size that must match before FASTA34 will attempt to build an alignment between two sequences. For example, if ktup = 3, FASTA34 will only align sequences that share at least three consecutive perfectly matching nucleotides. Setting ktup = 1 makes each FASTA34 search sensitive, but slow.
5. An algorithm for permutations that preserve dinucleotide content has been described by Altschul and Erickson (10).
6. Depending on the individual miRNA, other data can be used to enhance target predictions. For deeply conserved miRNAs, expression profiling of transcripts in a general miRNA-defective mutant background, such as dcl1-7, can be used to search for targets that are upregulated in the absence of miRNAs (3, 4, 6).
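Note 5 refers to dinucleotide-preserving permutation. As an illustration only, the Python sketch below follows the graph-based idea described by Altschul and Erickson (10):
- Each adjacent pair of bases in the sequence becomes a directed edge.
- A random "last exit" edge is chosen for every base so that these edges all lead to the final base.
- The remaining edges are randomly ordered before an Eulerian path is walked.

This is not the shuffle program used by the authors. The function names are hypothetical, and no claim is made about how uniformly it samples the possible permutations.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def _all_reach_last(last_edge: Dict[str, str], last: str) -> bool:
    """True if following last-exit edges from every vertex ends at `last`."""
    for start in last_edge:
        seen = set()
        vertex = start
        while vertex != last:
            if vertex in seen:  # cycle: the edges do not form a tree to `last`
                return False
            seen.add(vertex)
            vertex = last_edge[vertex]
    return True


def dinucleotide_shuffle(seq: str, rng: Optional[random.Random] = None) -> str:
    """Shuffle `seq` while preserving its exact dinucleotide counts."""
    rng = rng or random.Random()
    if len(seq) <= 2:
        return seq
    # Directed multigraph: one edge a->b for each adjacent pair in the sequence.
    edges: Dict[str, List[str]] = {base: [] for base in dict.fromkeys(seq)}
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]

    # Choose a random last-exit edge for every vertex except the final base,
    # retrying until these edges form a tree leading to `last`.
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items()
                     if v != last and out}
        if _all_reach_last(last_edge, last):
            break

    # Randomly order each vertex's other edges; its last-exit edge goes last.
    ordered: Dict[str, List[str]] = {}
    for v, out in edges.items():
        rest = list(out)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])
        ordered[v] = rest

    # Walk the Eulerian path, starting from the original first base.
    result = [first]
    used = {v: 0 for v in ordered}
    current = first
    for _ in range(len(seq) - 1):
        nxt = ordered[current][used[current]]
        used[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)


def dinucleotide_counts(seq: str) -> Counter:
    """Count each adjacent base pair (dinucleotide) in `seq`."""
    return Counter(zip(seq, seq[1:]))
```

For the background estimate in Subheading 3.4, each shuffled miRNA would be run through the same alignment and scoring steps as the real sequence.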
Acknowledgments We are grateful that research on miRNA targets in our lab is supported by National Science Foundation Grant MCB-0618433. We thank Christopher M. Sullivan, Scott A. Givan, Kristin D. Kasschau, Edwards Allen and Jason S. Cumbie for productive discussions and advice.
References 1. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53 2. Mallory AC, Reinhart BJ, Jones-Rhoades MW, Tang G, Zamore PD, Barton MK, Bartel DP (2004) MicroRNA control of PHABULOSA in leaf development: Importance of pairing to the microRNA 5’ region. EMBO J 23: 3356–3364 3. Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D (2005) Specific effects of microRNAs on the plant transcriptome. Dev Cell 8:517–527 4. Allen E, Xie Z, Gustafson AM, Carrington JC (2005) MicroRNA-directed phasing during
trans-acting siRNA biogenesis in plants. Cell 121:207–221 5. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14:787–799 6. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC (2007) High-throughput sequencing of Arabidopsis microRNAs: Evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219 7. Howell MD, Fahlgren N, Chapman EJ, Cumbie JS, Sullivan CM, Givan SA, Kasschau KD,
Carrington JC (2007) Genome-wide analysis of the RNA-DEPENDENT RNA POLYMERASE6/DICER-LIKE4 pathway in Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. Plant Cell 19:926–942 8. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20:3407–3425
9. Llave C, Xie Z, Kasschau KD, Carrington JC (2002) Cleavage of scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297:2053–2056 10. Altschul SF, Erickson BW (1985) Significance of nucleotide sequence alignments: A method for random sequence permutation that preserves dinucleotide and codon usage. Mol Biol Evol 2:526–538
Chapter 5
A Method to Discover Phased siRNA Loci
Michael J. Axtell
Abstract
Short interfering RNAs (siRNAs) arise from the processing of long double-stranded RNA (dsRNA) by Dicer enzymes. Dicers generate siRNA duplexes by successive hydrolysis of both strands of the dsRNA phosphodiester backbone at positions determined by measuring 21–24 nucleotides from an exposed dsRNA terminus. Therefore, a population of dsRNAs with precisely identical termini will produce siRNAs spaced at regular 21–24-nucleotide intervals. This chapter presents an easily customized and generally applicable strategy for identifying loci that produce the “phased” siRNAs diagnostic of such processing. Given a large set of expressed small RNAs and the corresponding genome or transcriptome from which the small RNAs are derived, the method ranks user-defined loci by their likelihood of producing phased siRNAs. Top-ranked loci are candidates for further computational and biological analyses.
Key words: siRNA, microRNA, Trans-acting siRNA, Transitivity, Small RNA, Plants, Bioinformatics
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_5, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
Most of the currently known small RNAs expressed by plants can be broadly classified into two groups. The first group arises from processing of the stems of single-stranded, stem-loop precursors. It includes the precisely excised microRNAs (miRNAs; (1–3)) and other, less precisely processed stem-loop-derived small RNAs (4, 5). The second group arises from processing of long, perfectly double-stranded RNA (dsRNA). This latter class comprises the endogenous siRNAs, which make up the majority of small RNA expression in wild-type Arabidopsis thaliana (6, 7). siRNA duplexes arise by the successive processing of dsRNA by Dicer enzymes, which in plants have been dubbed “Dicer-Like” (DCL). DCL enzymes “measure” and cleave dsRNA a characteristic distance (21–24 nts, depending upon the DCL in question) from a dsRNA terminus. Thus, for a
Axtell
population of dsRNAs with uniform termini, the siRNAs produced by DCL catalysis will be “phased” in consistent registers. Rather than emanating from random positions of the dsRNA precursor, they arise at discrete, periodic positions that reflect successive DCL-catalyzed cleavage from the dsRNA terminus. In plants, such phased siRNA loci have attracted much recent attention for several reasons:
- their combination of siRNA-type biogenesis with the trans-targeting capability previously associated only with miRNAs (8–10);
- their importance in regulating developmental processes (11–17);
- the implications of their biogenesis for understanding the initiation of dsRNA formation (18);
- their role in regulating the expression of rapidly expanding gene families (19).
The currently known phased siRNA-generating loci in plants are all associated with small RNA-directed cleavage of a single-stranded precursor. After cleavage, one or the other of the cleaved products is thought to become a substrate for an endogenous RNA-dependent RNA polymerase (chiefly RDR6 in A. thaliana; (8, 9)), thus creating dsRNAs with termini precisely defined by the site of small RNA cleavage. Cleavage-stimulated production of dsRNA is especially potent for RNAs with more than one small RNA complementary site (18, 19), but it also occurs on a subset of “single-cut” RNAs (7, 18, 20–22). In this chapter, I describe a simple methodology that can be used to discover loci producing siRNAs in “phased” positions indicative of processing from a population of dsRNAs with uniform termini. Rather than presenting a specific piece of software, I give a general, step-by-step description of the methodology, in the form of both a succinct narrative and pseudocode for the critical steps. For a given locus, the method simply tallies the abundances of small RNAs whose 5′ ends fall within each of the possible phased registers.
For instance, when searching for loci where small RNAs are phased in increments of 21 nts, there are 21 possible registers. Two metrics are calculated and used to rank loci: the proportion of small RNA abundance falling within the dominant register, and the total number of distinct small RNAs corresponding to the locus. Top-ranked loci are those with a large number of distinct corresponding small RNAs, a substantial proportion of which fall into a single phasing register. With appropriate cutoffs for the metrics, this simple ranking of loci has worked well for discovering phased siRNAs from multiple plant species (18). Interested readers may also wish to consult alternative methods (see Note 1).
2. Materials
1. A database of expressed small RNA sequences.
2. An abundance measurement for each small RNA sequence (see Note 2).
A Method to Discover Phased siRNA Loci
3. A reference database comprising the nuclear genome or transcriptome to which the small RNAs correspond.
4. A computer with NCBI BLAST software installed.
5. A computer with a useful programming language installed, such as PERL or Python (see Note 3).
3. Methods
3.1. Map the Sequenced Small RNAs to the Reference Database
The positions in the reference database that are identical in sequence to small RNAs are determined by simple sequence comparisons. These matching loci within the reference database represent the genomic origins of the expressed small RNAs. Depending on the goals of the experiment, the reference database may consist of an entire assembled genome (for instance, the Arabidopsis pseudochromosome sequences), a database of mRNA transcripts, or some subset of similar sequences that might spawn phased siRNAs.
Pseudocode:
• For each sequenced small RNA:
  – For each sequence within the reference database:
    (a) Compare the two sequences
    (b) For each exact match:
      * Record the accession number of the matching small RNA
      * Record the accession number of the matched reference sequence
      * Record the matching strand of the reference (+ or −)
      * Record the start coordinate (the position within the matching reference database sequence identical to the 5′ end of the small RNA)
      * Record the stop coordinate (the position within the matching reference database sequence identical to the 3′ end of the small RNA)
      * Record the abundance value of the small RNA
The output may be stored in a tab-delimited flat file, with one line for each small RNA match, or in a custom database. Sequence comparisons between small RNAs and reference databases can be performed using BLAST (see Note 4).
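The pseudocode above can be sketched in Python as a brute-force exact-string search. This is an illustration under stated assumptions, not the author's implementation; Note 4 describes using BLAST instead. The assumptions are:
- Sequences are held in dictionaries keyed by accession, and comparisons are done in DNA form.
- Coordinates are 1-based.
- For a minus-strand match, the start coordinate is the reference position paired with the small RNA 5′ end, so start > stop.

Names such as `Hit` and `map_small_rnas` are hypothetical.

```python
from typing import Dict, List, NamedTuple

# Complement table in DNA form (U is treated like T).
_DNA_COMP = str.maketrans("ACGTU", "TGCAA")


class Hit(NamedTuple):
    srna_id: str
    ref_id: str
    strand: str        # "+" or "-" relative to the reference sequence
    start: int         # 1-based reference coordinate of the small RNA 5' end
    stop: int          # 1-based reference coordinate of the small RNA 3' end
    abundance: float


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_DNA_COMP)[::-1]


def find_all(haystack: str, needle: str) -> List[int]:
    """0-based offsets of every occurrence of needle, overlaps included."""
    offsets, i = [], haystack.find(needle)
    while i != -1:
        offsets.append(i)
        i = haystack.find(needle, i + 1)
    return offsets


def map_small_rnas(small_rnas: Dict[str, str], abundances: Dict[str, float],
                   reference: Dict[str, str]) -> List[Hit]:
    """Record every exact match of each small RNA on both reference strands."""
    hits: List[Hit] = []
    for srna_id, srna in small_rnas.items():
        fwd = srna.upper().replace("U", "T")   # compare in DNA form
        rev = reverse_complement(fwd)
        n = len(fwd)
        for ref_id, ref_seq in reference.items():
            ref_seq = ref_seq.upper()
            for i in find_all(ref_seq, fwd):   # plus-strand matches
                hits.append(Hit(srna_id, ref_id, "+", i + 1, i + n,
                                abundances[srna_id]))
            for i in find_all(ref_seq, rev):   # minus-strand matches
                # On the minus strand the 5' end pairs with the rightmost base.
                hits.append(Hit(srna_id, ref_id, "-", i + n, i + 1,
                                abundances[srna_id]))
    return hits
```

A linear scan like this is practical only for small references. For whole genomes, an indexed search such as the BLAST approach in Note 4 is more realistic.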
3.2. Divide the Mapped Small RNA Coordinates into Intervals/Loci
Before searching for loci that produce abundant, phased siRNAs, the loci themselves must be defined. For instance, in a genome-wide search it might be useful to define each successive 1,000 base pair interval as a query locus, while a transcript-centered search might
define each transcript as a potential locus. Mapped small RNAs are then associated with each locus or interval of interest, based solely on the position of the 5′ end of the small RNA.
Pseudocode:
• For each locus/interval:
  – Search map data for small RNAs whose 5′ end (start position) is within the locus
  – For each matching small RNA:
    (a) Record the accession number of the matching small RNA
    (b) Record the matching strand of the locus/interval (+ or −)
    (c) Record the 5′ coordinate relative to the locus/interval
    (d) Record the abundance value of the small RNA
The output may be stored in a tab-delimited flat file, with one line for each locus.
3.3. Calculate the Phasing Statistics for Each Locus/Interval
Steps 1–5 of Subheading 3.3 describe the analysis of each locus/interval. This protocol assumes a search for loci phased in 21-nt increments, but it can be generalized to searches for other register sizes. The parameters that would need to be modified for such alternative searches are noted.
Pseudocode:
• For each locus/interval:
  – Analyze matching small RNAs as shown below...
1. Calculate and record the total abundance of all small RNAs that match the locus (see Note 2) and the total number of distinct small RNAs that match the query locus. This includes all matching small RNAs, regardless of small RNA length.
Pseudocode:
• Calculate the total abundance of all small RNAs matching the locus
• Calculate the number of distinct small RNAs matching the locus
2. Assign each nucleotide position within the locus to one of 21 possible bins, representing positions that are separated by exactly 21 nts. These bins represent each of the possible phasing registers when searching for loci producing siRNAs at regular 21-nt intervals; searches for siRNAs in other register sizes should modify the number of bins accordingly. The logic behind these bins is illustrated in Fig. 1. Nucleotide position 1 is assigned to bin 1, position 2 to bin 2, and so on until 21 is reached. Nucleotide position 22 is exactly 21 nts away from position 1 and is therefore placed in bin 1. Thus, bin 1 is most closely in 21-nt phase with bins 21 and 2, while bin 14 is most closely in phase
[Figure 1 schematic: a hypothetical 67-bp dsRNA in which each nucleotide position (1–67) is labeled with its bin (1–21, repeating every 21 nt):
5′-AUCGCAUCGAUCAUCGAUUCGAUCGAUCGCCGUAUCGAGAGAGUUCUCUCUAGAUAAUCUCGAGCGG-3′
3′-UAGCGUAGCUAGUAGCUAAGCUAGCUAGCGGCAUAGCUCUCUCAAGAGAGAUCUAUUAGAGCUCGCC-5′
Sense small RNA: 5′ position = 43, bin 1.
Antisense small RNA: 5′ position = 61; adjusted 5′ position = 61 − 18 = 43, bin 1.]
Fig. 1. Assignment of positions within a locus to “phasing” bins. The absolute position of each nucleotide within a hypothetical double-stranded RNA is shown (“nucleotide positions”). Each position is assigned to one of 21 arbitrary bins (“bins”), which repeat in 21 nt cycles (lines). The assignment of two siRNAs (bold, underlined) to their respective bins is illustrated – see text for details.
with bins 13 and 15. The nucleotide positions within bin number x (where x is an integer between 1 and 21) are x, x + 21, x + 2 × 21, x + 3 × 21, and so on, until the end of the locus is reached. Equivalently, position p falls in bin ((p − 1) mod 21) + 1; for example, position 43 gives (42 mod 21) + 1 = 1.
Pseudocode:
• For each nucleotide position within the locus:
  – Assign position 1 to bin 1, position 2 to bin 2, and so on until position 21
  – Assign position 22 to bin 1, position 23 to bin 2, and so on until reaching the next multiple of 21
The result is a table, simple database, or hash in which each nucleotide position within the locus is assigned to the appropriate bin, indicative of 21-nt spacing.
3. Assign each small RNA corresponding to the sense strand of the locus to the appropriate bin, based on the position of its 5′ nucleotide. For instance, a small RNA matching the sense strand of a query locus whose 5′ end corresponded to position 43 would be assigned to bin 1 (Fig. 1). Keep a tally of the small RNA abundances falling within each bin.
Pseudocode:
• For each small RNA matching the sense strand of the locus:
  – Get the position within the query locus corresponding to the 5′ end of the small RNA
  – Look up the appropriate bin for that position on the query locus
  – Add the abundance of that small RNA to the total abundance tallied within the bin
4. Assign each small RNA corresponding to the antisense strand of the locus to the appropriate bin based on the position of its 5′ nucleotide, after adjusting for the two-nucleotide offset produced by DCL-catalyzed siRNA formation.
Processing of double-stranded RNA by DCL enzymes produces siRNA duplexes with two-nucleotide 3′ overhangs (Fig. 1). Consequently, the 5′ end of an antisense small RNA is offset by +18 nucleotides from the 5′ end of its corresponding sense-strand RNA; for a 21-nt duplex, this offset is 21 − 2 − 1 = 18. Adjusting for this offset is simple: subtract 18 nucleotides from the 5′ position of antisense small RNAs, and then look up the bin corresponding to the adjusted position. For example, the antisense strand of the siRNA duplex shown in Fig. 1 is clearly in the same phase as its sense partner, because the two form a duplex with the expected two-nucleotide 3′ overhangs. The 5′ residue of the antisense small RNA is at position 61 of the locus. Subtracting 18 from this position gives 43, which corresponds to bin 1 (Fig. 1). One special case arises when subtracting 18 from the absolute position of the 5′ residue gives a value of zero or less. In this case, simply add 21 and look up the corresponding bin. Add the abundance of each matching small RNA to the tally for the appropriate bin.
Pseudocode:
• For each small RNA matching the antisense strand of the locus:
  – Get the position within the query locus corresponding to the 5′ end of the small RNA
  – Subtract 18 from the position
  – If the result is zero or less, add 21
  – Look up the appropriate bin for the adjusted position on the query locus
  – Add the abundance of that small RNA to the total abundance tallied within the bin
5. Calculate the “phase ratio” for the locus. Small RNA abundances are assigned to the 21 possible bins for both sense (see Subheading 3.3, step 3) and antisense (see Subheading 3.3, step 4) small RNAs. This yields a table of small RNA abundance by bin (Table 1). These data may also be conveniently displayed as a radial graph (Fig. 2).
The phase ratio of a given locus is defined simply as the abundance in the highest bin divided by the abundance of all small RNAs mapping to the locus (see Note 5). In other words, the phase ratio is the fraction of small RNA abundance whose 5′ ends fall in the most abundant of the 21 bins; for the TAS1b locus in Table 1, it is 855/954 ≈ 0.896. Record the phase ratio for the locus.
Pseudocode:
• Identify the bin with the highest abundance of small RNA 5′ ends
• Calculate the phase ratio by dividing the abundance of the highest bin by the total abundance of small RNAs corresponding to the locus
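Steps 1–5 can be combined into one function per locus, sketched below in Python. The sketch assumes each small RNA hit has already been converted to locus coordinates (Subheading 3.2), with a strand given relative to the locus. The names `LocusHit`, `phase_bin`, and `phasing_stats` are hypothetical. The antisense offset is written as `period - 3`, which reproduces the 18-nt offset for 21-nt phasing. Applying this formula to other register sizes is an extrapolation that assumes 2-nt 3′ overhangs; the protocol does not state it. The optional `top_bins` argument implements the variant in Note 5.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class LocusHit(NamedTuple):
    srna_id: str
    strand: str       # "+" (sense) or "-" (antisense) relative to the locus
    pos5: int         # 1-based locus coordinate of the small RNA 5' end
    abundance: float


def phase_bin(position: int, period: int = 21) -> int:
    """Bin 1..period; positions 1, 22, 43, ... all fall in bin 1."""
    return (position - 1) % period + 1


def phasing_stats(hits: Iterable[LocusHit], period: int = 21, top_bins: int = 1
                  ) -> Tuple[float, int, float, Dict[int, float]]:
    """Return (total abundance, distinct small RNAs, phase ratio, bin tallies)."""
    # Antisense 5' ends lie period - 3 nt downstream of their sense partners
    # (assuming 2-nt 3' overhangs): 18 nt when period = 21.
    offset = period - 3
    tallies: Dict[int, float] = defaultdict(float)
    total = 0.0
    distinct = set()
    for hit in hits:
        total += hit.abundance                  # step 1: total abundance
        distinct.add(hit.srna_id)               # step 1: distinct small RNAs
        pos = hit.pos5
        if hit.strand == "-":                   # step 4: antisense adjustment
            pos -= offset
            if pos <= 0:
                pos += period
        tallies[phase_bin(pos, period)] += hit.abundance   # steps 2-4
    # Step 5: phase ratio (top_bins=2 gives the Note 5 variant).
    best = sorted(tallies.values(), reverse=True)[:top_bins]
    ratio = sum(best) / total if total else 0.0
    return total, len(distinct), ratio, dict(tallies)
```

With the Fig. 1 example, a sense small RNA at position 43 and an antisense small RNA at position 61 both land in bin 1.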
Table 1 Phasing of small RNAs corresponding to the Arabidopsis thaliana TAS1b locus (At1g50055.1)
Bin    Abundance
1      1
2      12
3      4
4      1
5      6
6      4
7      6
8      2
9      0
10     3
11     3
12     2
13     24
14     6
15     2
16     4
17     855*
18     5
19     2
20     7
21     5
Sum    954
Phase ratio    0.896
*The bin with the highest abundance of small RNAs. In this example, small RNA abundances were the combined number of sequence reads observed via sequencing of various wild-type small RNA libraries by (6) (NCBI GEO accessions GSM154361, GSM154370, and GSM154375) and (4) (GSM118372–GSM118375).
[Fig. 2 radial graph: bins 1–21 arranged around the circle; the radial axis shows abundance from 0 to 1,000 in steps of 250.]
Fig. 2. Display of phased small RNAs using a radial graph. Data from Table 1 were plotted as a radial graph with distance from the center indicating abundance in each of the 21 bins.
3.4. Set Minimum Acceptable Values
Before performing the final ranking of potential phased siRNA loci, it is useful to set up two filters, described in steps 1 and 2 of Subheading 3.4.
1. Set a minimum acceptable phase ratio. By chance, a locus producing a random assortment of small RNAs would be expected to have 1/21 (~4.8%) of its small RNAs in any given bin. Thus, it is useful to set a minimum phase ratio substantially higher than 0.048; values of 0.25 to 0.4 are generally useful.
Pseudocode:
• Set the minimum acceptable phase ratio
• Remove from further consideration loci whose phase ratio is lower than the minimum
2. Set a minimum number of distinct small RNAs matching each locus. Given the very large datasets of sequenced small RNAs emerging from many species, loci with just two or three corresponding small RNAs that happen to fall within the same 21-nt register are quite likely to occur by chance. In practice, a minimum of five to ten distinct small RNAs is a useful cutoff. With fewer than this, it is difficult to be convinced that the locus truly is an siRNA-producing locus without corroborating evidence (see Note 6).
Pseudocode:
• Set the minimum acceptable number of distinct small RNAs
• Remove from further consideration loci that have fewer than the minimum number of distinct corresponding small RNAs
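Both filters are simple threshold tests, sketched below in Python with hypothetical names. The defaults mirror the settings stated for Table 2: a phase ratio of at least 0.25, and more than five distinct small RNAs (that is, at least six). They are starting points to tune as described in Note 6, not recommended values.

```python
from typing import Dict, NamedTuple


class LocusStats(NamedTuple):
    phase_ratio: float
    distinct_srnas: int


def apply_filters(stats: Dict[str, LocusStats], min_phase_ratio: float = 0.25,
                  min_distinct: int = 6) -> Dict[str, LocusStats]:
    """Keep loci that meet both minimums (steps 1 and 2 of Subheading 3.4)."""
    return {locus: s for locus, s in stats.items()
            if s.phase_ratio >= min_phase_ratio
            and s.distinct_srnas >= min_distinct}
```

Raising `min_phase_ratio` toward 0.4 or `min_distinct` toward ten trades sensitivity for stricter evidence of phasing.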
3.5. Rank the Loci
In the final step, each locus passing the filters described in Subheading 3.4 is ranked according to both the phase ratio and the total number of distinct corresponding small RNAs, as described in steps 1–3 of Subheading 3.5.
1. Rank each locus according to its phase ratio. The locus with the highest ratio is assigned rank 1, the locus with the second-highest ratio is assigned rank 2, and so on.
Pseudocode:
• Assign a rank to each locus based on the calculated phase ratio
2. Rank each locus according to the number of distinct corresponding small RNAs. The locus with the highest number of distinct matching small RNAs is assigned rank 1, the second highest is assigned rank 2, and so on. Loci with equal values share a rank; in Table 2, for example, the two loci with eight distinct small RNAs are both ranked 11. This rank is independent of that calculated in Subheading 3.5, step 1.
Pseudocode:
• Assign a rank to each locus based on the number of distinct matching small RNAs
3. Generate a combined score by adding the two ranks together. This metric gives equal weight to the proportion of small RNAs in a single phasing register and to the total number of small RNAs deriving from the locus. Loci with the best overall evidence for phased siRNA production will have the lowest scores. An example is shown in Table 2.
Pseudocode:
• Derive the final metric for each locus by adding the two ranks together
• Sort the loci by this metric; smaller values indicate more likely sites of phased siRNA production
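Steps 1–3 might be sketched in Python as below, with hypothetical names. Tied values receive the same rank and the next rank is skipped, which matches how the two loci with eight distinct small RNAs share rank 11 in Table 2. The author does not state how ties are broken, so this tie handling is an assumption.

```python
from typing import Dict, List, Tuple


def competition_ranks(values: Dict[str, float]) -> Dict[str, int]:
    """Rank keys by value, highest = 1; ties share the better rank (1, 2, 2, 4)."""
    ordered = sorted(values.values(), reverse=True)
    return {key: ordered.index(value) + 1 for key, value in values.items()}


def rank_loci(phase_ratios: Dict[str, float],
              distinct_counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Combined rank score per locus, sorted from best (lowest) to worst."""
    ratio_rank = competition_ranks(phase_ratios)
    count_rank = competition_ranks(distinct_counts)
    combined = {locus: ratio_rank[locus] + count_rank[locus] for locus in phase_ratios}
    return sorted(combined.items(), key=lambda item: item[1])
```

Applied to the three-decimal phase ratios printed in Table 2, two pairs of loci tie: AtTAS4 and AtTAS3c (both 0.500), and the bHLH gene and AT1G64580.1 (both 0.267). As a result, this sketch gives AtTAS3c and AT1G64580.1 combined scores of 21 and 22 rather than the 22 and 23 in Table 2, presumably because the unrounded ratios differ. All other combined scores match the table.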
Table 2 Analysis of known Arabidopsis thaliana phased siRNA loci

Locus       AGI accession          Phase ratio  Phase ratio rank  # Distinct small RNAs  Distinct small RNA rank  Final score
AtTAS1c     AT2G39675.1            0.267        13                10                     10                       23
AtTAS1a     AT2G27400.1            0.500        9                 6                      13                       22
AtTAS1b     AT1G50055.1            0.300        10                8                      11                       21
AtTAS2      AT2G39681.1            0.267        12                21                     6                        18
ARF4        AT5G60450.1            0.500        8                 14                     8                        16
AtTAS3a     AT3G17185.1            0.640        4                 8                      11                       15
AFB2        AT3G26810.1            0.514        6                 13                     9                        15
AFB3        AT1G12820.1            0.282        11                133                    4                        15
AtTAS4      AT3G25790–AT3G25800    0.587        5                 20                     7                        12
bHLH gene   AT3G23690.1            0.501        7                 245                    1                        8
PPR gene    AT1G63230.1            0.896        1                 86                     5                        6
AtTAS3c     AT5G57735.1            0.672        3                 162                    3                        6
PPR gene    AT1G64580.1            0.730        2                 202                    2                        4

The set of loci reported by (4, 18, 19) as producing phased siRNAs was analyzed, using small RNA abundances derived as in Table 1. Loci with phase ratios of less than 0.25 and/or with five or fewer distinct matching small RNAs were filtered out.
Axtell
4. Notes
1. Interested readers are also referred to the studies of Chen et al. (23) and of Molnár et al. (24). Both use similar ideas to assign statistical confidence values to the phasing of small RNA loci. Readers may also consult the work of Howell et al. (19), who use a phase-transform algorithm to estimate the degree of phasing at small RNA-generating loci.
2. Typically, the abundance of an individual small RNA is approximated by the number of times it was sequenced in a given small RNA sequencing experiment, that is, by the number of “reads” of that sequence. Abundances can be reported as normalized values (for instance, “transcripts per quarter-million” (7)) for cross-library comparisons. Abundances can also be normalized to account for the ambiguity that arises when one small RNA sequence has multiple possible origins within a genome or other reference database. This can be done by dividing the number of “reads” by the number of matches to the genome or reference database (4, 25).
3. The choice of programming language and computational approach will depend on the goals and expertise of the user. The author has implemented this methodology as a series of PERL scripts.
4. One way to perform these comparisons is with standalone NCBI blastall, using the following nondefault parameters: -p blastn -d [reference database] -i [small RNA sequences in DNA form] -e 0.1 -F F -W 6 -m 8 -v 100000 -a 2. The resulting tab-delimited output is then parsed to retain only exact matches, that is, those with 100% identity and no gaps over the entire length of the query small RNA.
5. Some loci produce siRNAs in more than one phase, reflecting distinct, well-defined dsRNA termini (18, 19). Searches for such loci might benefit from using the combined abundances in the top two bins to calculate the phase ratio. In such cases, the minimum phase ratio (Subheading 3.4, step 1) would need to be adjusted upward. If small RNAs were distributed randomly among the 21 phasing registers, 2/21 (~9.5%) of them would be expected to fall into any given pair of bins.
6. When deciding on useful cutoffs for the phase ratio and for the minimum number of distinct small RNAs, it may help to include in the analysis loci that are already well known to produce phased siRNAs. If this is possible, the cutoffs can be checked to ensure that the known phased siRNA loci are still captured.
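Notes 2 and 4 can be combined in a short Python sketch. The sketch keeps only full-length, ungapped, 100%-identity hits from BLAST tabular (-m 8) output. It then divides each small RNA's read count by its number of exact matches. The function names, the query-length dictionary, and the toy records are hypothetical. The sketch assumes the standard 12-column tabular layout: query, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score.

```python
import csv
from collections import Counter


def exact_hits(blast_lines, query_lengths):
    """Yield (query, subject, s_start, s_end) for full-length, ungapped,
    100%-identity hits in 12-column BLAST tabular (-m 8) output."""
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        identity, aln_len = float(row[2]), int(row[3])
        mismatches, gaps = int(row[4]), int(row[5])
        q_start, q_end, s_start, s_end = (int(x) for x in row[6:10])
        qlen = query_lengths[query]
        if (identity == 100.0 and mismatches == 0 and gaps == 0
                and aln_len == qlen and q_start == 1 and q_end == qlen):
            yield query, subject, s_start, s_end


def normalized_abundance(reads, hits):
    """Divide each small RNA's read count by its number of exact matches (Note 2)."""
    n_matches = Counter(query for query, *_ in hits)
    return {query: reads[query] / n for query, n in n_matches.items()}


blast_output = [
    "sRNA1\tchr1\t100.00\t21\t0\t0\t1\t21\t100\t120\t1e-05\t42.1",
    "sRNA1\tchr2\t100.00\t21\t0\t0\t1\t21\t520\t500\t1e-05\t42.1",
    "sRNA2\tchr1\t95.24\t21\t1\t0\t1\t21\t300\t320\t0.001\t34.2",  # not exact
    "sRNA3\tchr3\t100.00\t20\t0\t0\t2\t21\t700\t719\t1e-04\t40.1",  # not full length
]
lengths = {"sRNA1": 21, "sRNA2": 21, "sRNA3": 21}
reads = {"sRNA1": 10, "sRNA2": 4, "sRNA3": 7}
hits = list(exact_hits(blast_output, lengths))
print(normalized_abundance(reads, hits))  # {'sRNA1': 5.0}
```

Small RNAs without any exact match simply drop out of the normalized table. Whether to keep them with zero abundance depends on the downstream analysis.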
References
1. Llave C, Kasschau KD, Rector MA, Carrington JC (2002) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14:1605–1619
2. Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
3. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
4. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20:3407–3425
5. Zhang X, Henderson IR, Lu C, Green PJ, Jacobsen SE (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci U S A 104:4536–4541
6. Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC (2007) Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol 5:e57
7. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
8. Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS (2004) SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. Genes Dev 18:2368–2379
9. Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli V, Mallory AC, Hilbert JL, Bartel DP, Crete P (2004) Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16:69–79
10. Williams L, Carles CC, Osmont KS, Fletcher JC (2005) A database analysis method identifies an endogenous trans-acting short-interfering RNA that targets the Arabidopsis ARF2, ARF3, and ARF4 genes. Proc Natl Acad Sci U S A 102:9703–9708
11. Adenot X, Elmayan T, Lauressergues D, Boutet S, Bouche N, Gasciolli V, Vaucheret H (2006) DRB4-dependent TAS3 trans-acting siRNAs control leaf morphology through AGO7. Curr Biol 16:927–932
12. Fahlgren N, Montgomery TA, Howell MD, Allen E, Dvorak SK, Alexander AL, Carrington JC (2006) Regulation of AUXIN RESPONSE FACTOR3 by TAS3 ta-siRNA affects developmental timing and patterning in Arabidopsis. Curr Biol 16:939–944
13. Garcia D, Collier SA, Byrne ME, Martienssen RA (2006) Specification of leaf polarity in Arabidopsis via the trans-acting siRNA pathway. Curr Biol 16:933–938
14. Hunter C, Willmann MR, Wu G, Yoshikawa M, de la Luz Gutierrez-Nava M, Poethig SR (2006) Trans-acting siRNA-mediated repression of ETTIN and ARF4 regulates heteroblasty in Arabidopsis. Development 133:2973–2981
15. Liu B, Chen Z, Song X, Liu C, Cui X, Zhao X, Fang J, Xu W, Zhang H, Wang X, Chu C, Deng X, Xue Y, Cao X (2007) Oryza sativa Dicer-like4 reveals a key role for small interfering RNA silencing in plant development. Plant Cell 19:2705–2718
16. Nagasaki H, Itoh J, Hayashi K, Hibara K, Satoh-Nagasawa N, Nosaka M, Mukouhata M, Ashikari M, Kitano H, Matsuoka M, Nagato Y, Sato Y (2007) The small interfering RNA production pathway is required for shoot meristem initiation in rice. Proc Natl Acad Sci U S A 104:14867–14871
17. Nogueira FT, Madi S, Chitwood DH, Juarez MT, Timmermans MC (2007) Two small regulatory RNAs establish opposing fates of a developmental axis. Genes Dev 21:750–755
18. Axtell MJ, Jan C, Rajagopalan R, Bartel DP (2006) A two-hit trigger for siRNA biogenesis in plants. Cell 127:565–577
19. Howell MD, Fahlgren N, Chapman EJ, Cumbie JS, Sullivan CM, Givan SA, Kasschau KD, Carrington JC (2007) Genome-wide analysis of the RNA-DEPENDENT RNA POLYMERASE6/DICER-LIKE4 pathway in Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. Plant Cell 19:926–942
20. Allen E, Xie Z, Gustafson AM, Carrington JC (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121:207–221
21. Ronemus M, Vaughn MW, Martienssen RA (2006) MicroRNA-targeted and small interfering RNA-mediated mRNA degradation is regulated by argonaute, dicer, and RNA-dependent RNA polymerase in Arabidopsis. Plant Cell 18:1559–1574
22. Yoshikawa M, Peragine A, Park MY, Poethig RS (2005) A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev 19:2164–2175
23. Chen HM, Li YH, Wu SH (2007) Bioinformatic prediction and experimental validation of a microRNA-directed tandem trans-acting siRNA cascade in Arabidopsis. Proc Natl Acad Sci U S A 104:3318–3323
24. Molnar A, Schwach F, Studholme DJ, Thuenemann EC, Baulcombe DC (2007) miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature 447:1126–1129
25. Brennecke J, Aravin AA, Stark A, Dus M, Kellis M, Sachidanandam R, Hannon GJ (2007) Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128:1089–1103
Chapter 6
Directed Gene Silencing with Artificial MicroRNAs
Rebecca Schwab, Stephan Ossowski, Norman Warthmann, and Detlef Weigel
Abstract
The characterization of gene function typically includes a detailed analysis of loss-of-function alleles. In model plants such as Arabidopsis thaliana and rice, sequence-indexed insertion collections provide a large resource of potential null alleles. These can often be easily accessed through convenient Web sites (e.g., http://signal.salk.edu). However, such collections are not available for nonmodel species. They also require stacking of mutations to knock out redundant homologs, and they do not easily allow partial or regulated loss of gene function, which is particularly useful when null alleles are lethal. Transgenic approaches that employ directed gene silencing can substitute for null alleles. They also enable refined studies of gene function, e.g., by tissue-specific and inducible gene silencing. This chapter describes the generation and application of artificial microRNAs (amiRNAs) as a gene-silencing tool in a wide variety of plant species.
Key words: Gene silencing, miRNA, Hairpin, Loss-of-function, Phenotypic complementation
1. Introduction
Mediators of transgene-induced gene silencing are single-stranded silencing RNAs (19–23 nucleotides) that bind to target transcripts through complementary base-pairing (1). MicroRNAs (miRNAs), one class of silencing RNAs, originate from characteristic hairpin-containing transcripts. Vectors that contain such a hairpin precursor are known as second-generation RNAi vectors (2). The miRNA sequence within the precursor can be modified so that miRNAs of a different, defined sequence, called artificial miRNAs (amiRNAs), are produced in planta. These vectors can serve as reverse genetics tools to direct gene silencing, including in nonmodel systems. Among their unique applications are transient and tissue-specific gene silencing and the simultaneous silencing of several related genes, including closely linked genes (e.g., tandem repeats). AmiRNAs can also be designed to act in an allele- or splice form-specific manner. In addition, amiRNAs allow phenotypic complementation with target transgenes that have been made insensitive to amiRNA-mediated regulation by introducing silent mutations into the target site. The web application Web MicroRNA Designer (WMD) facilitates the design of suitable amiRNA sequences for a variety of plant species. It also designs the primer sequences needed to modify the miRNA vectors. We describe the use of this tool as well as the molecular steps necessary to engineer the vectors and generate transgenic plants.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592, DOI 10.1007/978-1-60327-005-2_6, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Schwab et al.
2. Materials
2.1. Computational Prediction of amiRNA Sequences
1. Identifier (or EST name) and sequence of the target gene(s) that should be silenced.
2.2. Site-Directed Mutagenesis by PCR
1. Six oligonucleotides. Two are universal to the vector (A and B; Table 1), and four are specific to the particular modification. The sequences of the four specific oligonucleotides are produced by the amiRNA design program (see Subheading 3.1).
2. Template plasmid: pRS300 (containing Arabidopsis ath-MIR319a) or pNW55 (containing rice osa-MIR528).
Table 1 Oligonucleotide sequences

Template        Name  Position           Sequence 5′→3′
ath-MIR319a     I     amiRNA forward     GA (N)21 T CTC TCT TTT GTA TTC C
                II    amiRNA reverse     GA (N)21 T CAA AGA GAA TCA ATG A
                III   amiRNA* forward    GA (N)21 T CAC AGG TCG TGA TAT G
                IV    amiRNA* reverse    GA (N)21 T CTA CAT ATA TAT TCC T
osa-MIR528      I     amiRNA forward     AG (N)21 C AGG AGA TTC AGT TTG A
                II    amiRNA reverse     TG (N)21 C TGC TGC TGC TAC AGC C
                III   amiRNA* forward    CT (N)21 T TCC TGC TGC TAG GCT G
                IV    amiRNA* reverse    AA (N)21 A GAG AGG CAA AAG TGA A
Both templates  A     outside forward    CTG CAA GGC GAT TAA GTT GGG TAA C
                B     outside reverse    GCG GAT AAC AAT TTC ACA CAG GAA ACA G
Directed Gene Silencing with Artificial MicroRNAs
3. Equipment and chemicals to perform PCR, agarose gel electrophoresis, and gel extraction of PCR bands.
2.3. Cloning of aMIRNA Precursors
1. Commercial PCR product ligation kit (e.g., TOPO kits from Invitrogen) or standard cloning plasmid (e.g., pGEM-7Z from Promega, linearized with SmaI), 10 mM ATP, and restriction enzyme SmaI.
2. T4 DNA ligase.
3. Competent cells of a standard E. coli strain (e.g., DH5α, TOP10).
4. LB plates containing appropriate antibiotics.
5. Plasmid extraction (miniprep) solutions or kit (e.g., QIAprep from Qiagen).
6. Restriction enzymes to test for positive clones and to subclone aMIRNAs.
7. Binary plasmid containing a promoter of interest and a terminator.
2.4. Plant Transformation and Analysis of Transgenic Plants
1. Standard competent Agrobacterium tumefaciens strain (e.g., GV3101 for A. thaliana, LBA4404 or EHA105 for O. sativa).
2. Standard equipment for the generation of transgenic plants.
3. Selection marker and appropriate antibiotics (e.g., kanamycin, hygromycin).
4. TRIzol® reagent (Invitrogen) or commercial RNA extraction kit (e.g., RNeasy from Qiagen).
5. Reverse transcriptase kit (e.g., SuperScript III from Invitrogen) and oligonucleotides for RT-PCR.
6. Optional:
– Kit for mRNA extraction (e.g., Oligotex from Qiagen) and 5′ RACE (e.g., GeneRacer from Invitrogen).
– Oligonucleotides for site-directed mutagenesis to engineer silent mutations into transgenes containing the target gene(s), for phenotypic complementation.
3. Methods
This section describes in detail the design and construction of amiRNAs to silence one or several related genes of interest in a variety of plant species. It is subdivided into four main parts: (1) the computer-assisted design of optimal amiRNA sequences, (2) the generation of aMIRNA precursors by site-directed mutagenesis, (3) cloning of aMIRNA precursors, and (4) the generation and analysis of transgenic plants.
3.1. Design of amiRNA Sequences
Sequences of amiRNAs need to be optimized for both effectiveness and specificity. Optimization for specificity (i.e., predicting and avoiding off-targets) depends on the availability of transcriptome sequence information. We have developed the Web MicroRNA Designer (WMD), which is available at http://wmd3.weigelworld.org. It predicts suitable amiRNA sequences for gene silencing in a large number of plant species for which a whole-genome annotation or substantial EST/cDNA sequence information is available. Note 1 describes how to design artificial miRNA sequences for species that are not yet included in WMD.
3.1.1. Experimental Design
WMD can design amiRNAs that silence a single gene as well as amiRNAs that silence multiple genes simultaneously. In the latter case, the genes need to share regions of high nucleotide sequence similarity. WMD selects amiRNA candidates that are partially complementary to the target gene(s). It also ensures that no other annotated gene of the respective genome/transcriptome release (species) fulfills the criteria of a productive miRNA–target interaction (3–5). These criteria have been determined empirically and are expected to improve as knowledge of miRNA biology and function grows. It may not be possible to design an amiRNA against one gene, or against several highly sequence-related genes, if all potential target sites to which the amiRNA would bind are also present in other (known) genes. A search for highly similar genes is therefore prudent before starting the design process. This can be done with the BLAST search implemented on the WMD webpage. In cases where targeting of highly related genes does not interfere with the experiment, these genes can be specified as “accepted off-targets” to increase the chances of a successful amiRNA design.
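To make "partial complementarity" concrete, the Python sketch below lists the positions at which an amiRNA and a 21-nt target site fail to form Watson–Crick pairs. It then applies the position-based recommendations given in Subheading 3.1.3 (no mismatches at positions 2–12, and one or two mismatches at positions 18–21). This is a simplified illustration and not WMD's scoring. The function names are hypothetical, sequences use the DNA alphabet, G:U wobble pairs are counted as mismatches, and hybridization energy is ignored.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(seq):
    """Reverse complement in the DNA alphabet (U is read as T)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def mismatch_positions(amirna, site):
    """amiRNA positions (1-based from its 5' end) lacking a Watson-Crick partner.

    amirna: amiRNA 5'->3'; site: target mRNA segment of equal length, 5'->3'.
    The strands are antiparallel, so amiRNA position i faces site[len(site) - i].
    G:U wobbles count as mismatches here (a simplifying assumption).
    """
    if len(amirna) != len(site):
        raise ValueError("amiRNA and target site must have the same length")
    a = amirna.upper().replace("U", "T")
    s = site.upper().replace("U", "T")
    n = len(a)
    return [i for i in range(1, n + 1) if (a[i - 1], s[n - i]) not in WATSON_CRICK]


def is_preferred_candidate(amirna, site):
    """No mismatch at positions 2-12 and one or two mismatches at positions 18-21."""
    mm = mismatch_positions(amirna, site)
    seed_ok = not any(2 <= p <= 12 for p in mm)
    tail_mm = sum(1 for p in mm if 18 <= p <= 21)
    return seed_ok and 1 <= tail_mm <= 2


amirna = "TTCGATACGCATTGAGTGCGA"                # hypothetical 21-mer
perfect_site = reverse_complement(amirna)      # fully complementary site
site_3p_mm = perfect_site[:1] + "A" + perfect_site[2:]
print(mismatch_positions(amirna, perfect_site))      # []
print(mismatch_positions(amirna, site_3p_mm))        # [20]
print(is_preferred_candidate(amirna, site_3p_mm))    # True
```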
3.1.2. Using the amiRNA Design Tool (WMD-Designer Window)
WMD requires sequence information on the intended target gene(s) as input and selects 21-mer amiRNA candidates from the reverse complement(s). To accommodate the different formats of sequence annotations and a variety of amiRNA applications, there are several ways to enter “target genes” into the WMD-Designer:
1. Silencing single or multiple genes in fully annotated genomes: For fully annotated genes, it is sufficient to provide the gene identifier(s), including splice variants, e.g., At1g23450.1 or Os01g24680.1 for Arabidopsis or rice, respectively. Whether all splice forms of a gene are targeted by a specific amiRNA can be quickly tested with the “WMD-Target Search tool” (see Note 2).
2. Silencing single or multiple genes in species for which only EST information is available: Locus identifiers (e.g., EST names) need to be provided, and these must be identical to the identifiers used by WMD. The correct names are easily determined by a search with the WMD-BLAST tool. Multiple EST sequences often exist for a single locus. First, identify all of them with the WMD-BLAST tool. Then use one of them (preferably the longest) as the target gene by entering its identifier in the “Target genes” field. Enter the identifiers of all other ESTs that likely originate from the same locus as “accepted off-targets” (separated by commas). When attempting to silence multiple genes simultaneously, enter one sequence for each gene in the “Target genes” field and add the others to the list of “accepted off-targets.”
3. When none of the redundant ESTs comprises the full-length target transcript, several can be concatenated to serve as the target for amiRNA design. The concatenated sequence should be given a custom identifier that distinguishes it from all annotated ESTs, and all redundant ESTs need to be specified as “accepted off-targets.” The concatenated EST needs to be entered into the “Target genes” field in FASTA format.
4. Splice form-specific silencing: If only one splice form is to be targeted, use as the target a DNA sequence (>21 nucleotides, FASTA format) that is unique to that splice form, with the identifier of the splice form (e.g., At2g23450.2) as its header.
5. Non-annotated genes and sequence variants: The target gene sequence(s) should be provided in FASTA format in two cases. The first is when the gene to be silenced is not contained in the WMD database for the respective species (e.g., GUS, GFP, viral genes, or genes that are not yet annotated). The second is when it represents a sequence variant of an annotated gene (e.g., an allele from a different ecotype or cultivar). Each sequence should be headed by a custom name that differs from that of any annotated gene. Silencing of sequence variants also requires the reference allele to be specified as an accepted off-target, unless allele-specific silencing is desired.
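The phrase "selects 21-mer amiRNA candidates from the reverse complement(s)" can be illustrated with the short Python sketch below. It enumerates every 21-nt window of a target's reverse complement, which gives the starting pool of sequences that are fully complementary to some part of the target. This is only the raw pool, not WMD's algorithm. WMD goes on to modify, filter, and score candidates against the whole transcriptome. The function names and the example sequence are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement in the DNA alphabet (U is read as T)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def candidate_21mers(target, k=21):
    """All k-mers of the target's reverse complement, i.e. every sequence that is
    fully complementary to some k-nt window of the target (before any filtering)."""
    rc = reverse_complement(target)
    if len(rc) < k:
        raise ValueError(f"target must be at least {k} nt long")
    return [rc[i:i + k] for i in range(len(rc) - k + 1)]


target = "ATGGCTTCAAGCTTGCATGCAAGCTT"   # hypothetical 26-nt target fragment
candidates = candidate_21mers(target)
print(len(candidates))  # 6 (26 - 21 + 1 windows)
# Each candidate pairs perfectly with one window of the target:
assert all(reverse_complement(c) in target for c in candidates)
```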
In all cases, the respective plant species/genome release must be selected from the drop-down menu on the WMD-Designer page to ensure specificity within this set of sequences. The “minimal number of included targets” must be specified when more than two genes are to be silenced simultaneously. WMD will first try to find an amiRNA that silences all genes simultaneously. It will also attempt to generate amiRNAs that target every possible subgroup whose size is greater than or equal to the number given in “minimal number of included targets.” The computation takes between a few minutes and several hours. The results are emailed to the provided email address, with the text entered in “Description” as the subject line.
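The subgroup search described above grows combinatorially with the number of targets, which helps explain why some designs take hours. The Python sketch below only illustrates that combinatorics. It lists every subgroup with at least the minimal number of members, largest first. How WMD actually organizes its search is not described here, and the gene names are placeholders.

```python
from itertools import combinations
from math import comb


def target_subgroups(targets, min_included):
    """All subgroups of `targets` with at least `min_included` members, largest first."""
    if not 1 <= min_included <= len(targets):
        raise ValueError("min_included must be between 1 and the number of targets")
    groups = []
    for size in range(len(targets), min_included - 1, -1):
        groups.extend(combinations(targets, size))
    return groups


genes = ["geneA", "geneB", "geneC", "geneD"]   # placeholder identifiers
groups = target_subgroups(genes, min_included=2)
print(len(groups))   # 11 = C(4,4) + C(4,3) + C(4,2)
print(groups[0])     # ('geneA', 'geneB', 'geneC', 'geneD')

# With 10 targets and a minimum of 2, the number of subgroups is already large:
print(sum(comb(10, r) for r in range(2, 11)))   # 1013
```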
Fig. 1. Example of a WMD result page. Candidate amiRNA sequences are listed and ordered by efficiency and specificity criteria.
3.1.3. Processing amiRNA Design Results
The amiRNA result email contains a hyperlink to a results webpage on which the amiRNA candidate sequences are listed (Fig. 1). This list can also be downloaded for future reference (Microsoft Excel format). See Note 3 if the results page is empty and displays the message “Unfortunately WMD3 could not design any microRNAs”. In principle, all amiRNA sequences returned by WMD fulfill the criteria described above and are expected to silence the predicted target genes successfully. However, they differ in how closely they meet the parameters that we consider optimal in terms of base composition, hybridization properties with the target gene(s), and specificity (for details, see (6)). The candidate amiRNAs are therefore ranked by a cumulative score, with the highest-ranking candidates at the top of the list. Green indicates a very favorable score. Orange and red often mark amiRNAs with potentially reduced efficiency or, more often, reduced specificity. It is thus recommended to work through this list from top to bottom.
The amiRNA sequences on the results page are hyperlinked to visualizations of alignments of the amiRNA to all potential target sequences in the WMD database, ordered by hybridization energy (for illustration, see Fig. 2). The intended target gene ideally appears at the top of the list. The WMD-Target Search tool can also show genes that align to the amiRNA with five or fewer mismatches but do not fulfill other empirical rules. These genes are indicated by notes in red bars.
Fig. 2. Alignment of an amiRNA to its target gene. Explanation of the amiRNA-target alignment presented by WMD.
For the subsequent selection of one (or more) amiRNAs for further experiments, we recommend the following:
1. Preferably, none of the intended target genes should have mismatches to the amiRNA at positions 2 to 12.
2. AmiRNA candidates with one or two mismatches at the 3′ end of the amiRNA (positions 18 to 21) should be preferred. It has been suggested that perfectly matching amiRNAs might trigger so-called transitive siRNA formation, in which the miRNA primes amplification of sequences adjacent to the binding site. These sequences could in turn serve as silencing triggers and affect other, unintended genes (7).
3. The hybridization energy of the amiRNA-target duplex should be lower than −30 kcal/mol, and preferably in the range between −35 and −40 kcal/mol.
4. The amiRNA binding site should be located within the coding region of the target gene, since UTRs are more likely to be misannotated.
At least two amiRNAs per target gene or group of genes should be selected for experimental work. If several are selected, they should bind the target mRNA at different locations, since secondary structure is suspected to influence miRNA efficacy.
3.2. Construction of aMIRNA Precursors by Site-Directed Mutagenesis
Exogenous small RNA duplexes are often used directly to transfect animal cell cultures and induce gene silencing. In plants, however, amiRNA accumulation requires the construction and subsequent expression of a precursor RNA. AmiRNAs are engineered by site-directed mutagenesis into vectors that contain endogenous MIRNA precursors. The resulting precursor RNAs are then processed by the endogenous miRNA machinery to release the amiRNAs. Several Arabidopsis precursor templates have been used successfully in Arabidopsis (4, 8–11) and in other plants (8, 26). Rice osa-MIR528 was specifically engineered for amiRNA production in rice (12). cre-MIR1157 and cre-MIR1162 have been used successfully in the unicellular green alga Chlamydomonas reinhardtii (24, 25). See Note 4 when working with a different plant species.
MIRNA precursors fold back on themselves to form a hairpin structure, and this structure must be preserved for successful processing. Engineering amiRNAs into MIRNA precursor templates therefore requires exchanging the miRNA for the amiRNA sequence. It also requires exchanging the pairing region in the hairpin, called the (a)miRNA*, so that the original pattern of paired and unpaired positions, including G:U pairs, is retained. The WMD software (WMD-Oligo window) therefore generates four oligonucleotides per amiRNA sequence: I and II to engineer the amiRNA itself, and III and IV for the amiRNA* (with wobbles). Currently, the software supports ath-MIR319a, osa-MIR528, and cre-MIR1157 (selected from a drop-down menu), and others will be included as they become available.
3.2.1. The MIRNA Templates
Endogenous MIRNA precursors that have been cloned into plasmids serve as templates for PCR reactions that exchange the miRNA and miRNA*. These precursors include the hairpin and short pieces of flanking sequence on either side, which are known to be part of the longer endogenous MIRNA transcript. The plasmids currently available contain ath-MIR319a (plasmid pRS300) and osa-MIR528 (plasmid pNW55), and they can be obtained on request from Detlef Weigel. A schematic representation of these MIRNA-containing plasmids is shown in Fig. 3; their complete sequences are available at http://wmd3.weigelworld.org.
3.2.2. Oligonucleotides
Six PCR oligonucleotide primers are needed to produce an aMIRNA transgene. Four of the primer sequences are generated by WMD (WMD-Oligo window) and are given in 5′→3′ orientation. They are 40 nucleotides long and specific to the intended amiRNA. The two 5′-most and the 17 3′-most nucleotides match the template MIRNA precursor. The 21 nucleotides in between do not match, and they introduce the amiRNA and amiRNA* into the amplicon (see Fig. 4). Two additional general oligonucleotides (A and B; see sequences in Table 1) are also required. They match the carrier plasmid (a pBluescript derivative) outside the MIRNA precursor and have been placed so that the sizes of the resulting PCR products allow convenient purification and handling. Using the six primers, the aMIRNA precursor is amplified in three pieces, (a) to (c), as shown in Fig. 4. These pieces are subsequently fused into one amplicon (d) in a single PCR reaction.
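The 2 + 21 + 17 = 40-nt structure of the specific primers can be made explicit with the Python sketch below. It joins the fixed ath-MIR319a (pRS300) flanks from Table 1 to a 21-nt variable segment, and it checks a finished 40-mer, such as one copied from WMD output, against those flanks before ordering. The function names are hypothetical. The 21-nt segment for each primer should be taken from WMD. This sketch does not reconstruct which sequence (amiRNA, its reverse complement, or amiRNA*) belongs in which primer.

```python
# 5' and 3' template-matching flanks of the ath-MIR319a (pRS300) primers,
# transcribed from Table 1 with spaces removed.
FLANKS_319A = {
    "I":   ("GA", "TCTCTCTTTTGTATTCC"),
    "II":  ("GA", "TCAAAGAGAATCAATGA"),
    "III": ("GA", "TCACAGGTCGTGATATG"),
    "IV":  ("GA", "TCTACATATATATTCCT"),
}


def fill_oligo(name, middle21, flanks=FLANKS_319A):
    """Join the 2-nt 5' flank, the 21-nt variable segment, and the 17-nt 3' flank."""
    middle21 = middle21.upper()
    if len(middle21) != 21 or set(middle21) - set("ACGT"):
        raise ValueError("variable segment must be 21 nt of A/C/G/T")
    five, three = flanks[name]
    return five + middle21 + three


def check_oligo(name, oligo, flanks=FLANKS_319A):
    """Sanity-check a designed 40-mer against the Table 1 flanks."""
    five, three = flanks[name]
    oligo = oligo.upper().replace(" ", "")
    return len(oligo) == 40 and oligo.startswith(five) and oligo.endswith(three)


primer_i = fill_oligo("I", "TTCGATACGCATTGAGTGCGA")   # hypothetical segment
print(len(primer_i), check_oligo("I", primer_i))      # 40 True
print(check_oligo("II", primer_i))                    # False (wrong 3' flank)
```

The same check works for osa-MIR528 primers if a second flank table is filled in from the osa-MIR528 rows of Table 1.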
Fig. 3. Template plasmids for construction of the amiRNA precursor, the aMIRNA foldback. (a) Plasmid pRS300 containing the ath-MIR319a precursor in pBluescript SK (cloned via the SmaI site). (b) Plasmid pNW55 containing the osa-MIR528 precursor in pBluescript KS (also cloned via the SmaI site). Complete plasmid sequences are available at http://wmd3.weigelworld.org. Abbreviations: A, B oligonucleotide binding sites; T3, T7 RNA polymerase/oligonucleotide binding sites; Amp ampicillin resistance gene; MCS multiple cloning site. Sizes of the aMIRNA foldback and surrounding regions are indicated in Fig. 4.
Fig. 4. Schematic representation of PCR reactions that generate aMIRNA precursors. (a) Illustration of the template plasmid (see Fig. 3) with oligonucleotide binding sites indicated. (b) PCR amplicons (a), (b), and (c). (c) (a), (b), and (c) are fused to (d) by PCR. (d) Only the central part encodes the aMIRNA precursor, which is schematically shown at the bottom. Abbreviations: Ath Arabidopsis thaliana; Osa Oryza sativa (rice); A, B, I, II, III, IV oligonucleotide identifiers (see text); MCS multiple cloning site; (a), (b), (c), (d) PCR fragments as indicated in the text.
3.2.3. Generating aMIRNA Precursors by Overlapping PCR
1. Upon receipt, resuspend the template plasmid and transform it into competent E. coli cells (standard laboratory strain). Spread the cells on ampicillin-containing LB plates, inoculate an overnight culture from a single colony, and isolate the plasmid again using standard plasmid isolation procedures. Prepare a 1:100 dilution of the plasmid to use as PCR template.
Table 2 PCR reactions

Template      Reaction  Forward oligo  Reverse oligo  Template DNA      Length of PCR product (bp)
ath-MIR319a   (a)       A              IV             pRS300            272
              (b)       III            II             pRS300            171
              (c)       I              B              pRS300            298
              (d)       A              B              (a) + (b) + (c)   701
osa-MIR528    (a)       A              II             pNW55             256
              (b)       I              IV             pNW55             87
              (c)       III            B              pNW55             259
              (d)       A              B              (a) + (b) + (c)   555
2. Set up PCR reactions (a) to (c). All PCR reactions should preferably be carried out with a proofreading polymerase (such as Pfu) to minimize PCR errors. Table 2 shows the oligonucleotide combinations for each PCR reaction together with the expected size of the product (see also Fig. 4).
Reactions (a) to (c):
2.0 µl 10× PCR buffer (with ~25 mM Mg++)
2.0 µl dNTPs (2 mM)
1.0 µl each oligonucleotide (10 µM; see Table 2)
1.0 µl template DNA (1:100 dilution of template plasmid)
0.2 µl polymerase
12.8 µl water
20 µl total
Protocol: 95°C 2 min; 35 cycles of 95°C 30 s, 52°C 30 s, 72°C 40 s; then 72°C 7 min.
3. Isolate the PCR fragments from a 2% agarose gel and purify them with standard gel extraction procedures. PCR fragments from reactions (a), (b), and (c) can already be pooled at this step. Elute in 20 µl of water.
4. Reaction (d): fusion of fragments (a), (b), and (c).
2.0 µl 10× PCR buffer (with Mg++)
2.0 µl dNTPs (2 mM)
1.0 µl each of oligonucleotides A and B (10 µM)
0.5 µl each purified gel fragment (a), (b), and (c), or 1.5 µl of combined gel eluate
0.2 µl polymerase
12.3 µl water
20 µl total
Protocol: 95°C 2 min; 35 cycles of 95°C 30 s, 52°C 30 s, 72°C 90 s; then 72°C 7 min.
5. Isolate the PCR fragment from a 1% agarose gel.
3.3. Cloning
To sequence-verify the fusion PCR product (d), it can be blunt-end ligated into a standard cloning vector. Keep in mind that this PCR fragment contains the T3 and T7 primer sites and the multiple cloning site of the template plasmid (see Fig. 3). If the cloning vector also contains T3 and/or T7 sites, sequencing reactions with T3 or T7 primers may fail, because the primer can anneal at more than one position.
3.3.1. Blunt-End Cloning Using Kits or Linearized Plasmids
(See Note 5 for the use of Gateway-compatible plasmids.) PCR reactions with proofreading polymerases generate blunt-ended products. Some companies offer kits for cloning blunt-ended DNA fragments directly (e.g., TOPO kits from Invitrogen); in this case, follow the manufacturer’s instructions. Another simple and inexpensive protocol for cloning blunt-ended PCR products uses plasmids linearized with a restriction enzyme that produces blunt ends (e.g., SmaI). PCR products are not 5′-phosphorylated, so the plasmid must retain its terminal 5′ phosphate groups after digestion. It is therefore used directly for ligation, without prior dephosphorylation or purification. Religation of the empty plasmid is prevented by adding SmaI to the ligation mix: religated vector recreates the SmaI site and is cut again, whereas vector-insert junctions generally do not.
Ligation reaction:
1.0 µl 10× reaction buffer for SmaI
0.5 µl ATP (10 mM)
1.0 µl plasmid cut with SmaI (not dephosphorylated and not purified)
1.0 µl T4 DNA ligase (10 U/µl)
0.3 µl SmaI
6.2 µl purified PCR fragment
10 µl total
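As described in the following paragraph, the miRNAs in the template plasmids carry unique diagnostic restriction sites: SacI in pRS300 (ath-MIR319a) and SphI in pNW55 (osa-MIR528). These sites are usually eliminated by mutagenesis. The Python sketch below checks a sequenced aMIRNA precursor region for the site. It uses the standard recognition sequences GAGCTC (SacI) and GCATGC (SphI). Both are palindromic, so a search on one strand also covers the other. The function names and example sequences are hypothetical. Apply the check to the precursor region rather than to flanking vector sequence. If the new amiRNA itself contains the motif, the site will persist, which is why the site is eliminated only "in most cases."

```python
# Recognition sequences of the diagnostic sites named in the text.
DIAGNOSTIC_SITE = {
    "pRS300": ("SacI", "GAGCTC"),   # ath-MIR319a template
    "pNW55": ("SphI", "GCATGC"),    # osa-MIR528 template
}


def site_positions(seq, motif):
    """0-based start positions of `motif` in `seq` (overlapping matches included)."""
    seq, positions = seq.upper(), []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def diagnostic_site_lost(template, precursor_seq):
    """True if the template's diagnostic site no longer occurs in the given region.

    Both motifs are palindromic, so searching one strand also covers the other.
    """
    enzyme, motif = DIAGNOSTIC_SITE[template]
    hits = site_positions(precursor_seq, motif)
    if hits:
        print(f"{enzyme} site still present at {hits}")
    return not hits


print(diagnostic_site_lost("pRS300", "ttcgatacgcattgagtgcga"))
print(diagnostic_site_lost("pNW55", "AAGCATGCAA"))
```

The first call prints `True`. The second call prints `SphI site still present at [2]` and then `False`.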
The ligation mix is incubated at 16°C overnight, followed by ~2 h at 30°C (optimal temperature for SmaI digestion), before transformation into a standard competent E. coli strain. If possible, blue-white selection for the presence of an insert is recommended. Culture single colonies and test-digest the recovered plasmid DNA (e.g., with EcoRI and BamHI, which yield a 408-bp band with the ath-MIR319a template and a 268-bp band with the osa-MIR528 template). Then verify the sequence with standard oligonucleotides (depending on the plasmid) or with oligonucleotide A or B. Sequencing at this step is strongly recommended to confirm that the plasmid carries the intended aMIRNA precursor without PCR errors. It may also be useful to know that the miRNAs in the template plasmids harbor uniquely occurring restriction sites – SacI in pRS300 (ath-MIR319a) and SphI in pNW55 (osa-MIR528) – which should in most cases be eliminated after successful PCR mutagenesis.
3.3.2. Sub-cloning into Binary Plasmids
aMIRNA precursors generated by site-directed mutagenesis do not contain a promoter or terminator; both need to be added in subsequent subcloning steps. For functionality tests in planta and initial characterization, strong ubiquitous promoters such as the cauliflower mosaic virus 35S promoter have proven very helpful. More detailed analyses can be carried out with tissue-specific promoters, since amiRNAs act largely cell-autonomously (4). AmiRNA-mediated gene silencing is quantitative, with stronger promoters inducing stronger effects. We therefore do not recommend weak promoters when strongly expressed genes are to be silenced efficiently, although weak promoters might be useful when partial silencing is intended. Inducible and transient aMIRNA expression has been achieved with ethanol- and estrogen-inducible systems (4, 13). Promoters are often already contained in binary vectors, or they can be inserted with standard cloning techniques. We did not observe notable differences in phenotypic effects with different binary plasmids in A. thaliana. We therefore recommend using a plasmid system that is well established for the respective plant species.
Directed Gene Silencing with Artificial MicroRNAs
All restriction sites of the pBluescript multiple cloning site flanking the aMIRNA precursor in the fusion PCR product (d) can be used to excise the amiRNA precursor (i.e., the aMIRNA) from the sequencing plasmid. We frequently use EcoRI and BamHI for the ath-MIR319a backbone, but other enzymes can be used as well. It is, however, necessary to preserve the orientation of the aMIRNA precursor relative to the promoter, since antisense transcripts are not expected to form the same secondary structure. Gateway-assisted cloning is also possible, since the presence of attB sites adjacent to the amiRNA precursor does not seem to affect its processing (see Note 5).
3.4. Plant Transformation and Analysis of Transgenic Plants
3.4.1. Transformation of Agrobacterium and Plants
Most protocols for the generation of transgenic plants rely on an Agrobacterium strain delivering the binary plasmid described above. Competent strains (e.g., GV3101 for A. thaliana, LBA4404 or EHA105 for O. sativa) are transformed with standard protocols. Similarly, plants should be transformed with the recombinant Agrobacterium strains using established protocols, and primary transformants must be selected with the appropriate selection markers. Phenotypic variation among primary transformants is expected and might, in some cases, resemble an allelic series of mutants in the respective gene. Because gene silencing with transgenes is often incomplete, plants resembling null mutants of the target gene might not be recovered. See Note 6 if you do not observe phenotypic changes in primary transformants.
3.4.2. Reduced Abundance of Target Transcripts
To confirm that phenotypic changes are indeed due to reduced abundance of the intended target gene product(s), their levels should be analyzed in pools of primary transformants with similar phenotypes, or in individual plants, and compared to an untransformed or empty-plasmid-transformed control. If specific antibodies are available, estimating target protein levels should be the method of choice. Since plant amiRNAs, like many endogenous miRNAs, typically also reduce the accumulation of target mRNA, RT-PCR is often indicative of successful gene silencing. RNA is preferentially isolated from tissues with strong phenotypic effects, either with commercial kits or with TRIzol® reagent (Invitrogen). Commercial reverse transcription kits can be used for cDNA synthesis. RT-PCR products should preferably span the amiRNA-guided cleavage site. See Note 7 if you observe phenotypic abnormalities but no change in target mRNA levels.
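Suppose the RT-PCR is run as quantitative real-time PCR. This is an assumption, because the protocol does not prescribe a quantification method. In that case, target mRNA levels in amiRNA lines relative to the control are often estimated with the comparative Ct (2^-ddCt) method. The sketch below assumes roughly 100% amplification efficiency for both the target and a hypothetical reference gene. It averages technical-replicate Ct values, and the function name is our own.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, target_ct_ctrl, reference_ct_ctrl):
    """Fold change of the target in the amiRNA line versus the control (2^-ddCt).

    Each argument is a list of Ct values (technical replicates). Assumes ~100%
    PCR efficiency, i.e. the product doubles every cycle for both amplicons.
    """
    delta_ct_line = mean(target_ct) - mean(reference_ct)
    delta_ct_ctrl = mean(target_ct_ctrl) - mean(reference_ct_ctrl)
    return 2 ** -(delta_ct_line - delta_ct_ctrl)


if __name__ == "__main__":
    # Hypothetical Ct values for the target and a reference gene in an
    # amiRNA line and in an empty-plasmid control.
    fold = relative_expression([26.1, 25.9], [20.0, 20.0], [24.0, 24.0], [20.0, 20.0])
    print(f"target mRNA at {fold:.0%} of control")
```

In this toy case, the target's Ct is two cycles later in the amiRNA line after normalization to the reference gene. The target mRNA would therefore accumulate to about a quarter of control levels.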
To estimate the specificity of gene silencing, it is recommended to also test for the accumulation of closely related transcripts that contain regions of partial sequence complementarity to the amiRNA (five or fewer mismatches, as determined with the WMD-Target Search tool; see Note 2). Reduced levels can result from direct amiRNA targeting, but also from feedback regulation when the two genes participate in the same genetic pathway. To discriminate between these two possibilities, it is necessary to test specifically for the accumulation of cleaved targets by 5′ RACE-PCR, since (a)miRNAs guide cleavage of target transcripts always between the target nucleotides opposite positions 10 and 11 of the (a)miRNA.
Schwab et al.
3.4.3. Cleavage Site Mapping by RACE-PCR
RACE-PCR typically uses mRNA as starting material, which can be isolated from total RNA with commercial kits. Standard 5′ RACE protocols start by decapping full-length mRNAs. For cleavage product detection this step is omitted, and the mRNA is ligated directly to the RNA linker oligonucleotide. Ligation is thus largely restricted to uncapped RNAs carrying a 5′ monophosphate, such as the 3′ cleavage products. Reverse transcription is typically carried out with an oligo-dT primer. Cleavage products are PCR-amplified with forward oligonucleotides that bind the ligated linker sequence and gene-specific reverse oligonucleotides complementary to a region ~200–300 nucleotides downstream of the putative amiRNA binding site in the gene of interest. Because cleavage products can be of very low abundance, a second, nested PCR is sometimes necessary. Amplified products should be ligated into standard cloning vectors and sequenced to determine where the linker was ligated and hence where the target transcript was cleaved. Cleavage is predicted to occur within the amiRNA binding site, between the target nucleotides that pair with positions 10 and 11 of the amiRNA.
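The expected RACE product can be worked out before cloning. The Python sketch below is a minimal illustration under stated assumptions, and all function names are hypothetical. It locates the transcript window most complementary to the amiRNA, with G:U wobbles simply counted as mismatches. It then converts the "between positions 10 and 11" rule into a transcript coordinate and reports what fraction of sequenced RACE clones start there. Clone ends are assumed to be given as the 0-based transcript index of the first nucleotide after the linker.

```python
from collections import Counter

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    return to_rna(rna).translate(_RNA_COMPLEMENT)[::-1]


def best_site(amirna: str, transcript: str) -> tuple[int, int]:
    """Return (0-based start, mismatches) of the most complementary window.

    G:U wobble pairs are counted as mismatches in this simplified version.
    """
    probe = reverse_complement(amirna)
    seq = to_rna(transcript)
    if len(seq) < len(probe):
        raise ValueError("transcript shorter than amiRNA")
    best = (0, len(probe) + 1)
    for start in range(len(seq) - len(probe) + 1):
        window = seq[start:start + len(probe)]
        mismatches = sum(a != b for a, b in zip(window, probe))
        if mismatches < best[1]:
            best = (start, mismatches)
    return best


def predicted_cleavage(site_start: int, amirna_len: int = 21) -> int:
    """0-based transcript index of the first nucleotide of the 3' cleavage product.

    amiRNA position i (1-based, from its 5' end) pairs with transcript index
    site_start + amirna_len - i. Cutting between the nucleotides opposite
    positions 10 and 11 therefore leaves a 3' product that starts opposite
    position 10.
    """
    return site_start + amirna_len - 10


def race_summary(clone_ends: list[int], predicted: int) -> tuple[Counter, float]:
    """Tally 5' ends of sequenced RACE clones (0-based transcript indices)."""
    counts = Counter(clone_ends)
    fraction = counts[predicted] / len(clone_ends) if clone_ends else 0.0
    return counts, fraction
```

For a 21-nt amiRNA whose binding site starts at transcript index 30, the 3′ cleavage product is expected to begin at index 41. If three of four clones end there, `race_summary` returns a fraction of 0.75.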
3.4.4. Genetic Complementation
Since amiRNA target sites are short and well defined, it is possible to engineer silent mutations in this region of the target gene such that the transcript is no longer susceptible to amiRNA-mediated regulation. Introducing such a modified gene under its endogenous promoter or a stronger promoter should suppress the amiRNA-induced phenotypes. This approach has successfully been used to bypass regulation by endogenous miRNAs (14). It can provide strong evidence that the observed phenotypes are caused by downregulation of the intended target and not of other genes. Silent mutations are typically introduced at as many positions as possible within the amiRNA binding site by PCR-based site-directed mutagenesis, in a similar way as aMIRNAs are produced (see Subheading 3.2.3).
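Choosing the silent mutations can be automated. The Python sketch below is a simplified illustration, not the authors' procedure, and its function names are hypothetical. For every codon overlapping the binding site, it picks the synonymous codon (standard nuclear genetic code) that differs from the original at the most positions. It assumes the coding sequence starts in frame at index 0. It ignores codon usage bias, mRNA structure, and newly created restriction or splice sites, all of which should be checked in a real design.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def recode_silently(cds: str, site_start: int, site_end: int) -> str:
    """Synonymously recode every codon overlapping cds[site_start:site_end].

    For each such codon, the synonymous codon differing at the most positions
    is chosen (first in table order on ties); Met and Trp codons stay unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx in range(site_start // 3, (site_end - 1) // 3 + 1):
        original = codons[idx]
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[original] and c != original]
        if synonyms:
            codons[idx] = max(synonyms, key=lambda c: hamming(c, original))
    return "".join(codons)


if __name__ == "__main__":
    cds = "ATGCTGAAAGGCTTT"  # toy CDS; pretend the amiRNA site is cds[3:12]
    recoded = recode_silently(cds, 3, 12)
    print(recoded, translate(cds) == translate(recoded))
```

Maximizing the number of changed nucleotides per codon follows the advice to mutate as many positions as possible. The resulting sequence would then be introduced by site-directed mutagenesis.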
4. Notes
1. When the plant species of interest is not yet included in WMD, but significant sequence information is publicly available, please contact [email protected] to have the species added to the tool. Obviously, the specificity calculations can only take the available set of sequences into account. There is therefore always the possibility that amiRNAs affect additional genes that are not annotated, or only partially annotated, in the current sequence release of the respective species.
2. The WMD-Target Search application rapidly identifies target genes for miRNAs and other small RNAs in a given transcript collection or genome annotation. It uses a sequence-matching algorithm based on enhanced suffix arrays (http://vmatch.de; (15)), which identifies all genes in the collection with a defined number of mismatches to the search sequence. In addition, WMD-Target Search applies the empirically determined parameters of miRNA target selection (5) to filter for putative target genes. The output includes an alignment of the small RNA (reverse complement) to putative targets, as illustrated in Fig. 2. With default settings, WMD-Target Search lists only one splice form per gene. All splice forms are displayed when the splice form filter ("Show only one isoform" in Advanced Search Options) is disabled. This option should be used to examine whether all splice forms are targeted.
3. Failure of WMD to produce suitable aMIRNAs can have several reasons: (a) The input sequence might have been too short to contain suitable target sites. (b) The WMD-Designer cannot compute a specific amiRNA against a target gene of interest if, at all potential target sites, its nucleotide sequence is very similar to one or several other genes. The similar gene(s) can easily be identified with WMD-BLAST. These genes will then have to be silenced together with the target, by adding them as additional targets or as "accepted off-targets." It might still be possible to conduct conclusive experiments by choosing off-target(s) that do not interfere with the experimental design, or by evaluating in planta the effects of several amiRNA constructs with different off-targets. (c) Some transcript collections contain redundant ESTs, and multiple ESTs might span the locus of interest. In this case, all genes/ESTs that are highly related to the gene of interest should be identified with WMD-BLAST and included in the WMD-Designer input as "accepted off-targets." (d) WMD can only compute a multi-gene amiRNA that targets several genes if they share regions of high nucleotide sequence similarity. Simultaneous silencing of multiple related genes might fail if the genes are not similar enough to each other, or if one or more of them are not sufficiently different from other genes (see Note 2). In this case, try reducing the minimal number of included target genes, or silence the genes individually.
4. A. thaliana MIRNA precursors have successfully been used for amiRNA production in other plants (tomato, tobacco, and Physcomitrella; (8, 26)), but precursor functionality across species has not yet been systematically investigated. Therefore, adapting the cloning protocol to MIRNA precursors endogenous to the plant species of interest might be the optimal approach. MIRNA precursors have been identified and characterized in several different plants (see miRBase, http://microrna.sanger.ac.uk/; (16)), often by homology to known miRNAs. As backbones for amiRNA production, we recommend a precursor that has been shown to be expressed and processed (e.g., by northern blot). When this information is not available, we recommend a highly conserved precursor such as MIR164 or MIR319. Oligonucleotides I through IV will need to be adapted to reconstruct the proper hairpin structure, such that bulges remain at their respective positions.
5. Cloning with the Gateway® technology does not seem to interfere with amiRNA production. Possible cloning strategies using the Gateway® system include the following: (a) The MIRNA precursor fragment can be excised from the sequencing plasmid (see Subheading 3.3.1) with restriction enzymes and ligated into a Gateway® entry plasmid. (b) The fusion product (d) in Subheading 3.2.3 can be ligated into a Gateway® entry plasmid as it is. (c) Alternatively, the fusion PCR of fragments (a), (b), and (c) into (d) in Subheading 3.2.3 can be carried out with oligonucleotides that already contain attB sites at their 5′ ends. These primers do not necessarily need to bind to primer binding sites A and B. Primers that bind to sequences in the multiple cloning site have successfully been used to obtain a short insert and to eliminate undesired restriction sites. The resulting PCR product with attB sites at both ends can then be ligated into a vector of choice (e.g., pGEM-T Easy), which then serves as the entry plasmid for a subsequent recombination reaction.
6. Missing phenotypic changes in transgenic plants that (over)express amiRNAs can have several reasons: (a) The phenotypes might not be detectable under the growth conditions tested. (b) The loss-of-function phenotype of the gene of interest might be masked by redundancy. A search for related genes with similar expression patterns (see, e.g., the AtGenExpress platform for A. thaliana; http://jsp.weigelworld.org/expviz/expviz.jsp; (17)) might help to identify potentially redundant genes to be used as additional targets. (c) The target gene might not be sufficiently downregulated to cause detectable phenotypic changes. It is critical to achieve high amiRNA expression in the tissue(s) where the target gene is expressed, but even promoters such as the CaMV 35S promoter are not entirely ubiquitous. A fraction (20–25%) of the A. thaliana amiRNAs generated to date does not silence the intended target gene(s), for reasons yet to be determined. It is possible that their target sites are not accessible due to extensive local secondary structure, similar to what has been observed for siRNAs in animal systems (19). Ongoing studies address this question, and we plan to account for this effect by integrating novel tools such as RNAup (20) into WMD. When calculating RNA-RNA binding, RNAup also considers the intramolecular folding of each RNA molecule. It can therefore be used to predict the accessibility of target sites in the target mRNA. At present, we recommend constructing at least two amiRNAs per target gene or group of target genes, with target sites located in different regions of the target transcript(s). (d) If even very potent amiRNAs cause only small effects on transcript levels, the target genes might be under negative feedback regulation. Such genes may be silenced effectively at the transcriptional level (e.g., by promoter methylation; (18)), but not by post-transcriptional gene silencing.
7. Typical (a)miRNA-mediated gene silencing involves cleavage of target transcripts followed by degradation of the cleavage products. This reduces transcript abundance, which can be measured by RT-PCR. However, for some endogenous miRNAs, e.g., ath-miR172, translational inhibition is at least as important as miRNA-guided cleavage (21, 22, 27). Thus, phenotypic changes can be present even though mRNA levels have not appreciably changed. When target-specific antibodies are available, translational effects can be monitored at the protein level by western blotting. In many published cases, transcripts that were regulated at the translational level were still cleaved by the miRNAs (5, 21–23), and cleavage products were detected by RACE-PCR (see Subheading 3.4.3).
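Note 2 describes how WMD-Target Search finds all genes with a defined number of mismatches before applying miRNA targeting rules. The Python sketch below is only a naive stand-in for that first step, and its function names and input format (a dict mapping transcript IDs to cDNA sequences) are our own. It scans every window of each transcript for the reverse complement of a small RNA and reports windows with at most a given number of mismatches. It is much slower than an enhanced-suffix-array index such as Vmatch. It also counts G:U wobbles as mismatches and applies none of the position-specific filters of ref. 5.

```python
def reverse_complement_dna(seq: str) -> str:
    """Reverse complement, converting U to T first (small RNA -> cDNA alphabet)."""
    return seq.upper().replace("U", "T").translate(str.maketrans("ACGT", "TGCA"))[::-1]


def search_targets(small_rna: str, transcripts: dict[str, str],
                   max_mismatches: int = 5) -> list[tuple[str, int, int]]:
    """Return (transcript_id, 0-based start, mismatches), fewest mismatches first."""
    probe = reverse_complement_dna(small_rna)
    n = len(probe)
    hits = []
    for tid, seq in transcripts.items():
        seq = seq.upper().replace("U", "T")
        for start in range(len(seq) - n + 1):
            mismatches = 0
            for a, b in zip(seq[start:start + n], probe):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # stop early; this window cannot qualify
            if mismatches <= max_mismatches:
                hits.append((tid, start, mismatches))
    return sorted(hits, key=lambda hit: hit[2])
```

With the default `max_mismatches=5`, the output corresponds to the "five or fewer mismatches" criterion used in Subheading 3.4.2 to pick related transcripts for specificity tests. Any biological filtering would still have to be applied afterwards.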
Acknowledgments
We thank Markus Riester for his contributions to earlier versions of WMD, Alexis Maizel, Javier Palatnik, Heike Wollmann, and Wolfgang Busch for discussion and sharing technical expertise, and Peter Bommert for comments on the manuscript. Work on small RNAs in the Weigel laboratory is supported by European Community FP6 IP SIROCCO (contract LSHG-CT-2006-037900) and by the Max Planck Society. R.S. is supported by an EMBO long-term fellowship.
References
1. Chapman EJ, Carrington JC (2007) Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet 8:884–896
2. Tang G, Galili G, Zhuang X (2007) RNAi and microRNA: breakthrough technologies for the improvement of plant nutritional value and metabolic engineering. Metabolomics 3:357–369
3. Allen E, Xie Z, Gustafson AM, Carrington JC (2005) MicroRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121:207–221
4. Schwab R, Ossowski S, Riester M, Warthmann N, Weigel D (2006) Highly specific gene silencing by artificial microRNAs in Arabidopsis. Plant Cell 18:1121–1133
5. Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D (2005) Specific effects of microRNAs on the plant transcriptome. Dev Cell 8:517–527
6. Ossowski S, Schwab R, Weigel D (2008) Gene silencing in plants using artificial microRNAs and other small RNAs. Plant J 53:674–690
7. Vaucheret H (2005) MicroRNA-dependent trans-acting siRNA production. Sci STKE 2005:pe43
8. Alvarez JP, Pekker I, Goldshmidt A, Blum E, Amsellem Z, Eshed Y (2006) Endogenous and synthetic microRNAs stimulate simultaneous, efficient, and localized regulation of multiple targets in diverse species. Plant Cell 18:1134–1151
9. Niu QW, Lin SS, Reyes JL et al (2006) Expression of artificial microRNAs in transgenic Arabidopsis thaliana confers virus resistance. Nat Biotechnol 24:1420–1428
10. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
11. Qu J, Ye J, Fang R (2007) Artificial microRNA-mediated virus resistance in plants. J Virol 81:6690–6699
12. Warthmann N, Chen H, Ossowski S, Weigel D, Hervé P (submitted) Highly specific gene silencing by artificial miRNAs in rice
13. Michniewicz M, Zago MK, Abas L et al (2007) Antagonistic regulation of PIN phosphorylation by PP2A and PINOID directs auxin flux. Cell 130:1044–1056
14. Palatnik JF, Allen E, Wu X et al (2003) Control of leaf morphogenesis by microRNAs. Nature 425:257–263
15. Abouelhoda MI, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discrete Algorithms 2:53–86
16. Griffiths-Jones S (2004) The microRNA Registry. Nucleic Acids Res 32:D109–D111
17. Schmid M, Davison TS, Henz SR et al (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37:501–506
18. Matzke M, Kanno T, Huettel B, Daxinger L, Matzke AJ (2006) RNA-directed DNA methylation and Pol IVb in Arabidopsis. Cold Spring Harb Symp Quant Biol 71:449–459
19. Ameres SL, Martinez J, Schroeder R (2007) Molecular basis for target RNA recognition and cleavage by human RISC. Cell 130:101–112
20. Mückstein U, Tafer H, Hackermüller J, Bernhart SH, Stadler PF, Hofacker IL (2006) Thermodynamics of RNA-RNA binding. Bioinformatics 22:1177–1182
21. Aukerman MJ, Sakai H (2003) Regulation of flowering time and floral organ identity by a microRNA and its APETALA2-like target genes. Plant Cell 15:2730–2741
22. Chen X (2004) A microRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science 303:2022–2025
23. Gandikota M, Birkenbihl RP, Hohmann S, Cardon GH, Saedler H, Huijser P (2007) The miRNA156/157 recognition element in the 3′ UTR of the Arabidopsis SBP box gene SPL3 prevents early flowering by translational inhibition in seedlings. Plant J 49:683–693
24. Zhao T, Wang W, Bai X, Qi Y (2009) Gene silencing by artificial microRNAs in Chlamydomonas. Plant J
25. Molnar A, Bassett A, Thuenemann E, Schwach F, Karkare S, Ossowski S, Weigel D, Baulcombe D (2009) Highly specific gene silencing by artificial microRNAs in the unicellular alga Chlamydomonas reinhardtii. Plant J
26. Khraiwesh B, Ossowski S, Weigel D, Reski R, Frank W (2008) Specific gene silencing by artificial microRNAs in Physcomitrella patens: an alternative to targeted gene knockouts. Plant Physiol
27. Brodersen P, Sakvarelidze-Achard L, Bruun-Rasmussen M, Dunoyer P, Yamamoto YY, Sieburth L, Voinnet O (2008) Widespread translational inhibition by plant miRNAs and siRNAs. Science 320:1185–1190
Chapter 7
Bioinformatics Analysis of Small RNAs in Plants Using Next Generation Sequencing Technologies
Kan Nobuta, Kevin McCormick, Mayumi Nakano, and Blake C. Meyers
Abstract
Next-generation sequencing technologies have a substantial impact on a broad range of biological applications. Like many other groups, we use these new technologies, especially sequencing-by-synthesis (SBS), for deep profiling of small RNA molecules in plants. Small RNAs are 21–24 nucleotides in length and are known to play a major role in the regulation of mRNAs and genomic DNA. We have generated numerous SBS small RNA libraries; each can consist of more than three million signatures of more than 33 nucleotides in length. Here, we describe the challenges of handling the very large quantity of small RNA data generated by these next-generation sequencing technologies, and our strategies for doing so.
Key words: SBS, MPSS, 454, Gene expression, Small RNA, miRNA, siRNA, Signature
1. Introduction
1.1. Small RNA Molecules
The discovery of RNA-based silencing systems has changed our understanding of the mechanisms of transcription, translation, and the regulation of gene expression. There are two major classes of these RNA molecules in plants, microRNAs (miRNAs) and small interfering RNAs (siRNAs), which are approximately 21–24 nucleotides in length. In plants, miRNAs are known to regulate mRNAs primarily at the posttranscriptional level by directing mRNA cleavage. Endogenous siRNAs, by contrast, can trigger DNA methylation and histone modifications, leading to gene silencing. Small RNAs have been identified in nearly all eukaryotes, an indication of both the importance and the ancient origin of these regulatory molecules. The pool of small RNAs in plants is highly complex, consisting primarily of a diverse set of low-abundance siRNAs (1). Extensive work on the relatively small number of Arabidopsis miRNAs has demonstrated that these RNAs regulate the expression of many genes that play key roles in development and stress responses (2).
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_7, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Nobuta et al.
1.2. Next Generation Sequencing Technologies
The technology to sequence a large number of DNA molecules in a short time and in a cost-effective manner has developed rapidly in the last few years. Massively Parallel Signature Sequencing (MPSS) was the first methodology of this type (although the cost-effectiveness of that method is debatable), and we have used it for both mRNA and small RNA transcriptome profiling (1, 3). Although MPSS generated hundreds of thousands to millions of signatures per library, the sequence length was limited to 17–20 bases (3). Since plant small RNA molecules are primarily 21–24 bases long, MPSS signatures could not capture full-length miRNA and siRNA sequences. Unlike MPSS, the pyrosequencing-based technology from 454 Life Sciences (http://www.454.com) produces reads that can exceed 100 bases, more than long enough to obtain the full-length sequence of small RNA molecules. However, the sampling depth of this technology, which at the time of writing was in the hundreds of thousands of reads, is not enough to capture the full complexity of small RNA populations, particularly in comparison to other, newer technologies. Illumina (http://www.illumina.com) developed a new sequencing technology often referred to by the generic name sequencing-by-synthesis, or "SBS." Illumina's approach uses reversible nucleotide terminators to sequence short DNA fragments (signatures). SBS read length is currently ~33–35 bases, shorter than that of 454, but long enough to capture the full-length sequence of small RNA molecules. More importantly, SBS sequences millions of signatures in parallel, a depth sufficient to capture almost all the small RNA molecules in a given sample.
A number of potentially competing technologies are emerging, most prominently the "SOLiD" sequencing platform from Applied Biosystems; however, these instruments are only now being put to routine use in the laboratory. In any case, the methods that we describe here are easily adapted to any high-throughput short-read DNA sequencing technology.
1.3. SBS Small RNA Expression Database
We have generated numerous small RNA libraries from various organisms, with the libraries sequenced using MPSS, 454, and SBS. Here, as an example, we mainly focus on the databases we created for Arabidopsis and rice small RNA libraries sequenced with Illumina’s SBS technology. The SBS signatures were handled as we describe below and stored in a relational database.
1.3.1. Trimming and Mapping
In order to extract the biologically relevant data from the raw SBS output, it is important to understand how the libraries are constructed.
Bioinformatics Analysis of Small RNAs in Plants
The details of the small RNA library construction method are described in Chapter 8 of this volume. Briefly, RNA molecules in the approximate size range of 20–25 nucleotides are recovered from total RNA. 5′ and 3′ RNA adapters are ligated, the product is converted to cDNA, and the cDNA is subjected to SBS sequencing. The sequencing reaction reads not only the small RNA molecule flanked by the two adapters (5′ and 3′), but also the adapter sequences themselves. Therefore, all potential adapter sequences have to be removed from the SBS signatures. We search for both 5′ and 3′ adapter sequences allowing up to two mismatches, to tolerate sequencing errors and reduce the number of untrimmed sequences. Using known microRNA sequences from the Sanger miRNA Registry (4) as a training set, we optimized the minimum and maximum number of adapter bases considered for trimming. As an example, Fig. 1 shows a typical trimming result from an SBS library. As expected, the majority of the signatures had a 3′ adapter sequence, and more than 90% of these had a relatively long insert (≥18 bp), which was kept for genome matching (Fig. 1). In contrast, only a small number of signatures had a 5′ adapter sequence, and almost none of these were usable (Fig. 1). After trimming, we map all the SBS signatures to the genomic sequence. Both the Arabidopsis (Col-0) and rice (Oryza sativa ssp. japonica cv. Nipponbare) genomic sequences are of very high quality. Materials derived from either of these inbred genotypes are also usually highly homozygous. We therefore consider only perfect matches.
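The 3′ adapter trimming step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the trimming script used for Fig. 1. The adapter string is a placeholder. The minimum overlap (8) and maximum compared length (12) are hypothetical stand-ins for the values the authors optimized against known miRNAs. For each read, the sketch finds the leftmost position at which the remainder of the read matches the start of the adapter with at most two mismatches. It then sorts the read into categories resembling those of Fig. 1; 5′ adapter handling is omitted.

```python
from typing import Optional

ADAPTER_3P = "TCGTATGCCGTCTTCTGCTTG"  # placeholder; use the adapter of your kit


def trim_3p_adapter(read: str, adapter: str = ADAPTER_3P, max_mismatches: int = 2,
                    min_overlap: int = 8, max_compare: int = 12) -> Optional[str]:
    """Return the insert preceding the 3' adapter, or None if no adapter is found.

    Scans left to right and accepts the first position p at which up to
    `max_compare` bases of read[p:] match the adapter prefix with at most
    `max_mismatches` mismatches; at least `min_overlap` bases must be compared.
    """
    read = read.upper()
    for p in range(len(read) - min_overlap + 1):
        n = min(len(read) - p, len(adapter), max_compare)
        mismatches = sum(a != b for a, b in zip(read[p:p + n], adapter[:n]))
        if mismatches <= max_mismatches:
            return read[:p]
    return None


def classify(read: str, min_insert: int = 18) -> tuple[str, Optional[str]]:
    """Assign a raw signature to a trimming category similar to those of Fig. 1."""
    if set(read.upper()) - set("ACGT"):
        return "error", None  # non-GATC characters
    insert = trim_3p_adapter(read)
    if insert is None:
        return "no 3' adapter", None
    if len(insert) < min_insert:
        return "insert too short", insert
    return "kept", insert
```

A fixed mismatch allowance is permissive for short overlaps near the read end. A real implementation might scale the allowance with overlap length, which is one reason the minimum and maximum lengths need tuning on known miRNAs.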
[Flowchart. Trimming: SBS library example, 3,170,342 signatures; 3′ adapter, 2,888,138 (3′ adapter trimmed, 2,734,695; remainder removed as insert < 18 bp); 5′ adapter, 248,222 (5′ adapter trimmed, 2,192); no adapter, 27,164; error, 6,818. Mapping: match to genome, 2,030,543; orphan signatures, 706,344.]
Fig. 1. An example of trimming and mapping results for an SBS small RNA library. This SBS library had 3,170,342 distinct signatures. As expected, our trimming script identified a high proportion of signatures with 3′ adapter sequences and fewer than 10% of the signatures with a 5′ adapter sequence. A very small portion of the signatures had no adapter sequence or contained non-GATC characters (errors). Among the signatures with a 3′ adapter sequence, a large proportion had a long insert (≥18 bp), while fewer than 1% of those with a 5′ adapter sequence passed this filter. More than two million trimmed signatures were mapped successfully to the rice genome.
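The counts in Fig. 1 are internally consistent, and checking this also clarifies how the flowchart boxes relate to each other. The short Python check below is our own arithmetic on the published numbers. Reading the two "trimmed" boxes as the inserts kept for mapping is our interpretation, and the dictionary keys are hypothetical. The script confirms that the four trimming categories add up to the library total, and that mapped plus orphan signatures equal the kept inserts. It then computes the percentages quoted in the caption.

```python
FIG1 = {
    "total": 3_170_342,
    "adapter_3p": 2_888_138,
    "adapter_5p": 248_222,
    "no_adapter": 27_164,
    "error": 6_818,
    "kept_3p": 2_734_695,  # "3' adapter trimmed" box
    "kept_5p": 2_192,      # "5' adapter trimmed" box
    "mapped": 2_030_543,
    "orphans": 706_344,
}


def check_fig1(c: dict[str, int]) -> dict[str, float]:
    """Validate the Fig. 1 totals and return the caption's percentages."""
    categories = c["adapter_3p"] + c["adapter_5p"] + c["no_adapter"] + c["error"]
    if categories != c["total"]:
        raise ValueError("trimming categories do not sum to the library total")
    kept = c["kept_3p"] + c["kept_5p"]
    if kept != c["mapped"] + c["orphans"]:
        raise ValueError("mapped + orphan signatures do not equal kept inserts")
    return {
        "pct_with_5p_adapter": 100 * c["adapter_5p"] / c["total"],
        "pct_3p_kept": 100 * c["kept_3p"] / c["adapter_3p"],
        "pct_5p_kept": 100 * c["kept_5p"] / c["adapter_5p"],
        "pct_kept_mapped": 100 * c["mapped"] / kept,
    }


if __name__ == "__main__":
    for name, value in check_fig1(FIG1).items():
        print(f"{name}: {value:.1f}%")
```

The results agree with the caption. About 7.8% of signatures carry a 5′ adapter, about 94.7% of 3′-adapter signatures yield an insert of at least 18 bp, and fewer than 1% of 5′-adapter signatures do. Roughly 74% of the kept inserts match the rice genome perfectly.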
Our mapping script scans each chromosomal sequence from beginning to end, performing a simple string search. For the rice library above, we were able to map more than two million SBS signatures (18–33 bp) to the genome (Fig. 1). The remaining signatures (~700,000 "orphans") could result from mismatches or RNA editing, could span intron/exon junctions, or could derive from gaps in the rice genomic sequence. The trimming and mapping algorithms are described in detail below (see Subheadings 3.1 and 3.3).
1.3.2. Associations with Annotated Genes and Other Genomic Sequences
Plant small RNA libraries represent a complex mixture of several types of molecules:
- highly abundant miRNAs that are few in number;
- weakly expressed siRNAs that number in the millions in terms of sequence diversity;
- a large number of "contaminating" fragments of tRNAs, rRNAs, snRNAs, and snoRNAs;
- other, smaller classes of small RNAs, such as trans-acting siRNAs and natural antisense transcript-derived siRNAs, which we do not describe here.
Fragments of tRNAs, rRNAs, snRNAs, and snoRNAs are major components of small RNA libraries. Small RNAs derived from these RNAs share characteristics with miRNAs, such as high abundance and representation in most libraries, so it is important to flag them for removal from later analyses. To associate signatures with annotated genes, we first map them to the genome and store all coordinates (chromosome, position, and strand) in a table. We then run a gene mapping script that, for each gene, identifies all signatures located between its start and end coordinates. If a signature lies within a "contaminating" sequence, we mark it with a flag, which is used later during normalization.
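A minimal Python sketch of these two steps follows: perfect-match mapping on both strands, then assignment of hits to annotated genes with flagging of contaminants. It assumes uppercase chromosome strings held in memory and 1-based inclusive coordinates. The record layouts and function names are hypothetical, and the actual scripts (Perl/Java, storing results in MySQL tables) will differ. For each gene, hits are retrieved by binary search over hit start positions sorted per chromosome.

```python
import bisect
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def map_exact(signature, chromosomes):
    """Yield (chrom, start, end, strand) for each perfect match, 1-based inclusive.

    Palindromic signatures are reported once, on the plus strand.
    """
    sig = signature.upper()
    rev = sig.translate(_COMPLEMENT)[::-1]
    probes = [(sig, "+")] if rev == sig else [(sig, "+"), (rev, "-")]
    for chrom, seq in chromosomes.items():
        for probe, strand in probes:
            i = seq.find(probe)
            while i != -1:
                yield chrom, i + 1, i + len(probe), strand
                i = seq.find(probe, i + 1)


def assign_to_genes(hits, genes):
    """Assign hits to the genes that fully contain them.

    hits:  (signature, chrom, start, end, strand) tuples
    genes: (gene_id, chrom, start, end, is_contaminant) tuples
    Returns {gene_id: [signatures inside the gene]} and the set of signatures
    that fall inside a contaminating (t/r/sn/snoRNA) gene.
    """
    by_chrom = defaultdict(list)
    for hit in hits:
        by_chrom[hit[1]].append(hit)
    starts = {}
    for chrom, chrom_hits in by_chrom.items():
        chrom_hits.sort(key=lambda h: h[2])
        starts[chrom] = [h[2] for h in chrom_hits]

    gene_signatures, flagged = {}, set()
    for gene_id, chrom, g_start, g_end, contaminant in genes:
        chrom_hits = by_chrom.get(chrom, [])
        chrom_starts = starts.get(chrom, [])
        lo = bisect.bisect_left(chrom_starts, g_start)
        hi = bisect.bisect_right(chrom_starts, g_end)
        inside = [h[0] for h in chrom_hits[lo:hi] if h[3] <= g_end]
        gene_signatures[gene_id] = inside
        if contaminant:
            flagged.update(inside)
    return gene_signatures, flagged
```

Python's `str.find` is adequate for a sketch, but scanning whole chromosomes once per signature scales poorly to millions of signatures. Production pipelines would typically use an index instead.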
1.3.3. Normalization and Summary Table
The total number of signatures sequenced by SBS differs from library to library. To compare the expression level of a particular signature across libraries, the abundance values must therefore be normalized; this calculation is performed for each signature in each library. Not all signatures match the genomic sequence (presumably, primarily due to sequencing errors or gaps in the genome sequence). We therefore exclude non-matching signatures from the total used for normalization. We also exclude the "flagged" signatures described above (tRNAs, rRNAs, snRNAs, and snoRNAs). The abundance of these flagged and non-matching signatures is subtracted from the total raw abundance to obtain the adjusted total raw abundance. The normalized value for each small RNA is its raw abundance divided by the adjusted total raw abundance, multiplied by a normalization factor. This factor is a round number close to the adjusted total raw abundance, typically a multiple of one million or half a million. These normalized values allow us to compare the expression levels of signatures across libraries. The expression levels can be further normalized by dividing the normalized value by the number of genomic locations to which a signature matches; we refer to this as the "hits-normalized abundance" (HNA).
1.4. Genomic Database and Web Interface
Many small RNA analyses rely on comparisons with annotated genes and other features of the genome. We use the most recent annotations of the Arabidopsis (TAIR7.0) or rice (TIGR5.0) genomes in our analyses, but any sequenced plant genome could be used. An XML parser written in Java using SAX extracts all the necessary information (e.g., gene coordinates and genomic sequence) from these files. As described above, this information is needed to associate signatures with annotated genes. After extracting the genomic sequence, we run the programs RepeatMasker (5) and Einverted/Etandem (6) to identify repetitive regions in the genome. These data are necessary to construct the small RNA database, to flag specific types of small RNAs (mentioned above), and to display SBS data through our web interface (http://mpss.udel.edu). We have adapted the interface that we developed previously to display our MPSS data (7) for SBS data, with minor adjustments to accommodate the larger data size of SBS. For example, instead of displaying all signatures in a given window at once, we restrict the display to a human-readable number. The details of our interface and the available tools are described elsewhere (7).
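The normalization described in Subheading 1.3.3 can be summarized as normalized = raw / (adjusted total raw abundance) × factor. The adjusted total excludes non-genome-matching and flagged signatures, and HNA = normalized / (number of genomic hits). The Python sketch below is a minimal illustration that assumes per-library abundances held in dictionaries; the function names are hypothetical. When no factor is supplied, it uses the adjusted total rounded to the nearest half million. That default is our choice, since the text only specifies a round number close to the adjusted total.

```python
def normalize(raw, genome_matched, flagged, factor=None):
    """Normalize raw abundances for one library.

    raw:            {signature: raw abundance}
    genome_matched: signatures with at least one perfect genome match
    flagged:        signatures derived from t/r/sn/snoRNAs
    factor:         round normalization constant; if None, the adjusted total
                    rounded to the nearest 500,000 (an assumed convention)
    """
    adjusted_total = sum(count for sig, count in raw.items()
                         if sig in genome_matched and sig not in flagged)
    if adjusted_total == 0:
        raise ValueError("no usable signatures in this library")
    if factor is None:
        factor = max(500_000, round(adjusted_total / 500_000) * 500_000)
    normalized = {sig: count / adjusted_total * factor for sig, count in raw.items()}
    return normalized, adjusted_total, factor


def hits_normalized(normalized, hit_counts):
    """Hits-normalized abundance (HNA): divide by the number of genomic loci."""
    return {sig: value / hit_counts[sig]
            for sig, value in normalized.items() if hit_counts.get(sig, 0) > 0}
```

Because every library is scaled to a comparable round total, a signature's normalized values can be compared directly between libraries of different sequencing depth. Signatures without genome hits receive no HNA value.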
2. Materials
Our servers run Linux 2.6 and have at least 8 GB of memory and multiple Intel Xeon processors running at more than 2 GHz. We took advantage of these configurations to develop various scripts (Perl and Java) that require a large amount of memory but run in a relatively short time. The scripts extract expression data from the sequencing results and store them in a MySQL server as a "Small RNA Expression Database." Similarly, we extract genomic information from the files provided by various institutes and store it in a MySQL server as a "Genomic Database." The web interface, mainly written in PHP, extracts the data requested by the user from these two databases and displays the query results as graphical and analytical outputs.
2.1. Small RNA Expression Database
1. As described in Chapter 8, small RNAs can be cloned and sequenced using high-throughput parallel methods such as SBS, and we have generated numerous small RNA libraries from diverse plant species. A typical library consists of nonredundant signature sequences and their corresponding abundance values. These raw data, as well as the normalized values, can be downloaded from our website by users who want to perform their own analyses.
2. In addition to the small RNA libraries generated in our lab, we have downloaded publicly available libraries from GEO (Gene Expression Omnibus). At the time of writing, our Arabidopsis and rice small RNA databases contained numerous libraries sequenced by MPSS, 454, or SBS. We developed a package of scripts to parse these data, extract the necessary information, and build the MySQL small RNA databases.
3. In addition to the signature sequences and their expression levels, the details of each library are stored in this database. Users can find information such as the library developer and the developmental stage and growth conditions of the plants used for small RNA extraction. Furthermore, we record the number of signatures that match the given genome as well as tRNA, rRNA, snRNA, and snoRNA genes. This information is useful for assessing the quality of the libraries.
2.2. Genomic Database and Web Interface
1. We use genomic assemblies (pseudochromosomes) provided by public genome projects, as well as the annotation files provided by these groups. The Arabidopsis and rice annotations can be downloaded in TIGR XML format from TAIR (The Arabidopsis Information Resource: http://arabidopsis.org/) and Michigan State University (MSU, http://rice.plantbiology.msu.edu/), respectively. Our Java XML parser extracts the necessary information and stores it in a relational database such as MySQL.
2. Although the XML files from TAIR and TIGR contain almost all the information necessary for our analyses, we download additional genomic information from other providers, and those data come in a variety of file formats. For example, we obtain miRNA information in GFF format from the Sanger registry (http://microrna.sanger.ac.uk/sequences/). We customize our scripts to parse each format and add the information to the genomic database.
3. To complete our genomic database, we perform additional analyses with the data extracted from TAIR, TIGR, and other sources. For example, we run publicly available programs such as RepeatMasker, Einverted, and Etandem to identify repetitive sequences in the genome and store the results in MySQL tables. We associate this information with the expression data to help distinguish miRNA from siRNA signatures. In addition, we run a miRNA target prediction program developed in our lab, based on previously described rules (8), to identify potential targets of all known and novel miRNAs.
Bioinformatics Analysis of Small RNAs in Plants
4. Based on the analyses above, users can access from our website a brief description of the small RNA MPSS data and links to known miRNAs, inverted repeats, tandem repeats, pericentromeric regions, weakly predicted transposons, and trans-acting siRNAs. Users can click these examples to get an idea of our visualization and data-access tools. A schema of a database for storing and handling SBS small RNA data is shown in Fig. 2.
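Because each external source arrives in its own format, each needs a small dedicated parser. As a minimal illustration only (not our production code), the Python sketch below reads miRNA features from a GFF-style file into records ready for loading into a genome table. It assumes a tab-separated, nine-column layout with GFF3-style `key=value` attributes and a `Name` or `ID` attribute; files in other GFF dialects would need a different attribute parser. The record type `MiRNALocus` and the conversion of `+`/`-` into `w`/`c` (Watson/Crick) strand codes are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class MiRNALocus(NamedTuple):
    """One miRNA feature, ready to be inserted into a genomic table."""
    name: str
    chrom: str
    strand: str  # 'w' (Watson) or 'c' (Crick): assumed strand codes
    start: int   # 1-based, inclusive (GFF convention)
    end: int


def parse_attributes(field: str) -> dict:
    """Parse a GFF3-style attribute column: key=value pairs separated by ';'."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parse_mirna_gff(lines: Iterable[str]) -> Iterator[MiRNALocus]:
    """Yield miRNA precursor and mature-miRNA features from GFF lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, headers, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        chrom, _source, feature, start, end, _score, strand, _phase, attr = cols
        if feature not in ("miRNA", "miRNA_primary_transcript"):
            continue
        attrs = parse_attributes(attr)
        name = attrs.get("Name", attrs.get("ID", "unknown"))
        # '+' maps to 'w'; any other strand value maps to 'c'
        yield MiRNALocus(name, chrom, "w" if strand == "+" else "c",
                         int(start), int(end))
```

For example, `list(parse_mirna_gff(open(path)))` would return one `MiRNALocus` per miRNA feature, which could then be inserted into the genomic database alongside the TAIR/MSU annotation.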
3. Methods
In this section, we focus on three major steps in handling SBS small RNA libraries and identifying miRNAs from these data: trimming and mapping the data, updating the database, and identifying miRNAs. We use a rice small RNA data set as an example to describe these steps.
3.1. Trimming of SBS Small RNA Signatures
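Before the step-by-step description, the overall idea of adapter trimming and collapsing can be illustrated with a much simpler Python sketch. It is not the three-seed algorithm described below: it handles only the 3′ adapter, scans each read directly for the first position where the adapter fits, and folds in the final clean-up rules of step 9 (the >75% A filter and the TCG/TC/T trim for reads with no detectable adapter). The adapter sequence is an assumption, chosen to begin with TCG so that it is consistent with the step 9 rule; the 6-base minimum overlap and the rule that overlaps shorter than 12 bases must match exactly are also assumptions. The two-mismatch limit and the “longer than 18 bases” cutoff follow the text.

```python
from collections import Counter

ADAPTER_3P = "TCGTATGCCGTCTTCTGCTTG"  # assumed 3' adapter sequence (illustrative)


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_3p_adapter(read: str, adapter: str = ADAPTER_3P, max_mm: int = 2,
                    min_overlap: int = 6) -> int:
    """Return the 0-based start of the 3' adapter in `read`, or -1 if absent.

    The adapter may run past the end of the read, so only the overlapping
    prefix of the adapter is compared. Assumption: overlaps shorter than 12
    bases must match exactly, to limit trimming on chance matches.
    """
    for i in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - i, len(adapter))
        allowed = max_mm if overlap >= 12 else 0
        if mismatches(read[i:i + overlap], adapter[:overlap]) <= allowed:
            return i
    return -1


def trim_and_collapse(reads, adapter: str = ADAPTER_3P, min_len: int = 18):
    """Trim 3' adapters and collapse identical inserts.

    `reads` holds (signature, abundance) pairs, i.e. the tab-delimited
    "standard" input. Returns (kept, errors) as Counters of sequence -> count.
    """
    kept, errors = Counter(), Counter()
    for read, count in reads:
        pos = find_3p_adapter(read, adapter)
        if pos >= 0:
            insert = read[:pos]
        else:
            # No adapter found: the adapter may start in the last 1-3 bases.
            insert = read
            for tail in ("TCG", "TC", "T"):
                if insert.endswith(tail):
                    insert = insert[:-len(tail)]
                    break
        too_short = len(insert) <= min_len  # keep only "longer than 18"
        a_rich = insert.count("A") > 0.75 * len(insert)  # A-rich SBS noise
        if too_short or a_rich:
            errors[insert] += count
        else:
            kept[insert] += count  # identical inserts are summed
    return kept, errors
```

Collapsing with a `Counter` mirrors the role of the final hash table (allH): identical trimmed signatures from different raw reads simply add their abundances.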
We start by scanning each raw sequence for both the 5′ and 3′ adapter sequences. The script tolerates up to two mismatches to allow for sequencing errors and to reduce the number of untrimmed sequences. After the adapters are removed, identical sequences are collapsed into a single file of nonredundant sequences, each with an associated abundance. The trimming steps are as follows.
1. The script takes several required and optional arguments so that users can run it under their preferred conditions. For example, one optional argument is the number of mismatches. By default, the script looks for the 5′ and/or 3′ adapter sequence with up to two mismatches in a given SBS signature. More than two mismatches are not allowed, but users can lower the limit to zero or one.
2. The SBS sequencing result file is one of the required arguments. Many institutes now have an Illumina SBS machine and provide sequencing services, and depending on the institute, the results are delivered in a range of file formats. Our script assumes that the file consists of nonredundant sequences with associated abundances in a particular format. If the file is not in one of our “standard” formats, it must be reformatted into one, for example, with the signature sequence and its abundance value delimited by a tab.
3. The adapter sequences are divided into three parts (start, middle, end), which are used as seed sequences. The length of each seed is based on the minimum sequence that can be recognized as an adapter sequence. The script goes through the input file and creates three separate files: (1) signatures with potential 5′
[Fig. 2 schema diagram. Three panels, “mRNA data,” “Genomic data,” and “Small RNA data,” each showing MySQL tables with selected fields (e.g., run_master, run_info, expression_raw, library_info, library_master, library_detail, Summary, tag_master, tag_position, tag_class, tag_hits, chromosome_master, chr_repeats, chr_tandem, chr_inverted, gene_master, gene_position, pep_blast), linked by key fields such as run_id, lib_id, tag_id, tag, chr_id, and gene.]
Fig. 2. Schema of a database for storing and handling SBS small RNA data. The database is designed around three major sets of data. The lines connecting tables indicate one-to-one (simple lines) or one-to-many (branched lines) relationships. The field name above or below each line indicates the key that connects the tables. Note that not all fields and tables are listed in this figure.
adapter sequences, (2) signatures with potential 3′ adapter sequences, and (3) signatures with no potential adapter sequence.
Pseudocode:
• For each SBS signature in the sequencing result file:
  – Compare the three seed sequences of the 5′ adapter sequence.
    (a) If one exists, record the following in a new file (file1): the signature sequence, its abundance, and which seed sequence matched the query sequence.
    (b) Else, compare the seed sequences of the 3′ adapter sequence.
      * If one exists, record the same information as above in a separate file (file2).
      * Else, record the signature in another file (file3).
4. We next examine file1 to determine whether the 5′ adapter sequence is contained in the SBS signature. Depending on how much of the 5′ adapter sequence remains in the SBS signature, we do one of the following with the signature sequence and its abundance: (1) keep these data, together with the length of the contaminating adapter sequence, in a data structure; (2) record them in a file; or (3) send them to an error file.
Pseudocode:
• For each SBS signature with the 5′ “start” seed sequence, examine the sequence downstream of this seed:
  – If the downstream sequence matches the upstream part of the 5′ “mid” seed sequence (with or without mismatches), send it to the error file.
  – Else, record the signature and its count in a new file (file4).
• For each SBS signature with the 5′ “end” seed sequence, examine the sequence upstream of this seed:
  – If the upstream sequence matches the downstream part of the 5′ “mid” seed sequence (with or without mismatches), examine the remaining length.
    (a) If the length is longer than 18 (by default), record the information in a hash-of-array (comboHoA): SBS signature → (length of adapter, count).
    (b) Else, send it to the error file.
  – Else, record the signature and its count in a new file (file4).
• For each SBS signature with the 5′ “mid” seed sequence, examine the sequences both upstream and downstream of this seed:
  – If the upstream and downstream sequences match the 5′ “start” and “end” seed sequences (with or without mismatches), respectively, examine the remaining length and either record the information in the hash-of-array or send it to the error file, as above.
  – Else, record the signature and its count in a new file (file4).
5. Examine the signatures in file3 for potential 5′ adapter seed sequences with mismatches. Treat these signature sequences and their counts as in step 4, according to the location of the identified 5′ adapter sequence within the SBS signature.
6. Examine file2 to determine how much of the 3′ adapter sequence remains in each SBS signature. An algorithm similar to that of step 4 is applied here, except that the “start” and “end” seed sequences are treated the opposite way. For example, in this step, signatures with the 3′ adapter “end” and part of the “mid” seed sequences are sent to the error file, whereas in step 4, signatures with the 5′ adapter “start” and part of the “mid” seed sequences are sent to the error file. The rationale is that signatures containing the end of the 3′ adapter and the start of the 5′ adapter most likely have no insert, or only a very short one, between the two adapters.
7. Another difference between step 4 (5′ adapter trimming) and step 6 (3′ adapter trimming) is that signatures satisfying the conditions in step 6 are trimmed and kept in the final hash table (allH), not in comboHoA.
Pseudocode:
• For each SBS signature containing a 3′ adapter sequence with up to two mismatches:
  – If the remaining sequence is longer than 18 bases (by default), trim the adapter sequence from the signature and record the information in allH.
    (a) If the trimmed signature already exists in allH, add the new count to the old one.
    (b) Else, add the trimmed signature as a key and the count as its value.
8. The signatures in file4, generated during step 4, have no identifiable 5′ adapter sequence. Each of them is subjected to 3′ adapter trimming (steps 6 and 7), and all resulting data are stored either in allH or in the error file. The signatures in comboHoA are subjected to the same steps.
If the remaining sequence, after trimming the 5′ and 3′ adapter sequences (if present), is longer than 18 bases, the sequence is kept in allH.
9. In the final step, the script parses allH and, after a final adjustment, prints the trimmed signatures with their associated abundances to a final file.
Pseudocode:
• For each key (trimmed signature) and value (summed abundance):
  – If the signature is more than 75% A, send it to the error file (SBS produces A-rich sequences that are noise rather than real small RNAs); otherwise:
    (a) If the signature is full-length (no identifiable 5′ or 3′ adapter), examine its last three bases; if it ends with TCG, TC, or T, trim these bases off and print the result to the final file. Otherwise, print the untrimmed signature and its abundance value.
    (b) Else, print the key and the value to the final file.
3.2. Preparing a Genome Database for the Small RNA Data
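One practical use of the gene-class annotation assembled in this section is flagging signatures that derive from t/r/sn/snoRNA genes (step 3 below). A minimal Python sketch of that overlap test follows. It assumes 1-based inclusive coordinates, ignores strand (an assumption, since the text does not say whether strand is considered), and uses sorted interval lists with binary search in place of MySQL queries. All function names and tuple layouts are hypothetical.

```python
import bisect
from collections import defaultdict

STRUCTURAL = {"tRNA", "rRNA", "snRNA", "snoRNA"}


def build_index(genes):
    """Index structural-RNA genes per chromosome as sorted (start, end, id).

    genes: iterable of (gene_id, gene_class, chrom, start, end), 1-based
    inclusive. Also returns the longest gene length, which bounds the search.
    """
    index = defaultdict(list)
    for gene_id, gene_class, chrom, start, end in genes:
        if gene_class in STRUCTURAL:
            index[chrom].append((start, end, gene_id))
    for intervals in index.values():
        intervals.sort()
    max_len = max((e - s + 1 for iv in index.values() for s, e, _ in iv),
                  default=0)
    return index, max_len


def overlaps_structural(index, max_len, chrom, start, end):
    """True if [start, end] overlaps any indexed gene (strand ignored)."""
    intervals = index.get(chrom, [])
    # A gene starting more than max_len - 1 bases before `start` cannot reach it.
    lo = bisect.bisect_left(intervals, (start - max_len + 1,))
    hi = bisect.bisect_right(intervals, (end, float("inf")))
    return any(s <= end and e >= start for s, e, _ in intervals[lo:hi])


def flag_signatures(hits, genes):
    """hits: iterable of (tag, chrom, start). Returns the set of tags to flag."""
    index, max_len = build_index(genes)
    return {tag for tag, chrom, start in hits
            if overlaps_structural(index, max_len, chrom, start,
                                   start + len(tag) - 1)}
```

The flagged set corresponds to the per-signature flag that lets these signatures be excluded from small RNA analyses and from the normalization denominator.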
We make our data visible via a customized web interface. This visualization tool makes the data more accessible to biologists without computational or bioinformatics skills and makes the data easier to interpret. Details of the construction of the genomic database can be found elsewhere (7). Here, we focus on the genomic information in the database that is specific to small RNA data analyses.
1. Most of the information is derived from XML files provided by TIGR and TAIR. Among the most important information is the coordinates of the annotated genes, which allow us to draw the gene structures and the associated small RNA signatures on our web interface.
2. Another important piece of information is the type of each annotated gene. Currently, we categorize genes into eight classes: protein-coding gene, pseudogene, TE (transposable element), miRNA, tRNA, rRNA, snRNA, and snoRNA. All of these classes, and especially the last four (t/r/sn/snoRNA genes), are important for small RNA analyses. Although most of the information on these gene classes can be found in the XML files, some is missing. For example, we obtained indica rice snoRNA information (9) and used it after mapping the snoRNA sequences to the Nipponbare genome. Similarly, we regularly obtain the most up-to-date miRNA information from the Sanger registry and manually add it to our existing genome database.
3. The t/r/sn/snoRNA genes encode structural RNAs that are constitutively expressed at high levels in many organisms, and thousands of signatures in each library correspond to these genes. Because these signatures are most likely degradation products of t/r/sn/snoRNAs rather than regulatory small RNAs such as miRNAs and siRNAs, we “flag” the t/r/sn/snoRNA-matching signatures so that they can easily be excluded from small RNA analyses.
4. Another critical piece of information for the genomic database is the location of repetitive sequences. Because repetitive sequences are known sources of siRNAs, this information helps us distinguish miRNA signatures from siRNA signatures. Although TAIR and TIGR provide annotated TE information, we also use freely available programs, such as RepeatMasker (http://repeatmasker.org) (5), Etandem, and Einverted (6), to extract from the genome sequence the coordinates of different kinds of repetitive sequences, including those with low cut-off values (i.e., poorly predicted or poorly annotated repeats). The coordinates and types of the repeat regions are stored in MySQL tables and displayed on our web interface.
3.3. Mapping of Small RNA Sequences to a Genome and Association to Annotated Genes
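The head/tail search described in this section can be sketched in Python as follows. This is a simplified, hypothetical re-implementation rather than our script: a dictionary keyed by the 18-base head plays the role of the hash-of-array, every genome position is tested in turn instead of using an explicit ~33-base window, and reverse-strand hits are reported as the leftmost Watson-strand coordinate, a convention assumed here because the text does not specify one. As in the text, only perfect matches are recorded.

```python
from collections import defaultdict

HEAD = 18  # signatures are 18 to ~33 nt, so the first 18 nt form the hash key
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def build_head_index(signatures):
    """Hash-of-array: 18-nt head -> list of tails (remaining bases, may be '')."""
    index = defaultdict(list)
    for sig in signatures:
        if len(sig) >= HEAD:
            index[sig[:HEAD]].append(sig[HEAD:])
    return index


def scan(seq, index):
    """Yield (signature, 0-based start) for every perfect match in `seq`."""
    for i in range(len(seq) - HEAD + 1):
        tails = index.get(seq[i:i + HEAD])
        if not tails:
            continue
        for tail in tails:
            # The full signature matches only if the tail follows the head.
            if seq.startswith(tail, i + HEAD):
                yield seq[i:i + HEAD] + tail, i


def map_signatures(genome, signatures):
    """Return {signature: [(strand, 1-based position), ...]}."""
    index = build_head_index(signatures)
    hits = defaultdict(list)
    for sig, i in scan(genome, index):
        hits[sig].append(("w", i + 1))  # Watson strand: index + 1
    reverse = genome.translate(COMPLEMENT)[::-1]
    for sig, j in scan(reverse, index):
        # Assumed convention: leftmost base of the hit on the Watson strand.
        hits[sig].append(("c", len(genome) - j - len(sig) + 1))
    return dict(hits)
```

Keying on a fixed-length head means each genome position costs a single hash lookup, and the tails only need to be compared for the few signatures that share that head.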
To facilitate analyses of data from plant species whose genome is available, the small RNA sequences are mapped to the genome. We use and record only perfectly matching sequences to avoid the complications of small RNAs matching multiple closely related sequences. There are two important considerations in this step: (1) a sequence may match more than one location in the genome and may have originated from only a subset of those locations; (2) sequences containing sequencing errors will not match perfectly and will therefore be ignored in subsequent analyses. We do not consider point 2 to be important, since these libraries are already sequenced to substantial depth.
1. The mapping script performs a simple string search from the start to the end of the given nucleotide (genomic) sequence, which is converted to an array. Since the length of the small RNA signatures varies from 18 to roughly 33 bases, the script splits each signature into a “head” (the first 18 bases) and a “tail” (the remaining bases) and stores them in a hash-of-array, with the head sequence as the key and the tail sequences in an array. The script slides a ~33-base (full-length) window along the genomic sequence array; if the head of this window exists in the hash-of-array, the script compares the tails. If the entire (head + tail) sequence matches the genomic sequence, the script records the coordinate (index + 1).
Pseudocode:
• For each ~33-base (full-length) window in the given nucleotide sequence:
  – If the head of the window exists in the hash-of-array, get the “tail” sequences in the array:
    (a) For each “tail” sequence in the array, construct the “head + tail” sequence.
      * If the full-length window starts with this sequence, record the coordinate (index + 1) in the genomic sequence array.
2. Because all the signatures are stored in a hash-of-array, the script requires a relatively large amount of memory, and the array holding the genomic sequence also requires a large amount of memory. Depending on the available memory, however, users can split the genomic sequence into pieces and identify the coordinates of the signatures piece by piece. After identifying the coordinates on the given (forward) strand, the script repeats these steps on the reverse-complement strand.
3. After mapping, the coordinates of the signatures are compared with the information stored in the genome database. This comparison locates the signatures relative to the annotated genes and thus identifies their origin. For example, identifying the signatures that originate from t/r/sn/snoRNA genes is important for our normalization procedure, described below.
3.4. Updating the Database with New Libraries
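The per-library update described in this section can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The sketch assumes a table named `summary` keyed by signature and one column per library named `lib_<id>` (both hypothetical). It also assumes normalization to TPQ, the abundance per 250,000 sequences used by the abundance filter in Section 3.5.2, computed as raw × 250,000 / (total raw − excluded raw), where the excluded raw counts are those of signatures that do not match the genome or are flagged as t/r/sn/snoRNA-derived. It illustrates the logic only; in MySQL, the update-or-insert step could instead be written with `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

TPQ_SCALE = 250_000  # TPQ: abundance per 250,000 sequences


def normalize(raw, excluded):
    """Scale raw counts to TPQ.

    raw:      {tag: raw abundance} for one library
    excluded: tags that do not match the genome or are flagged t/r/sn/snoRNA;
              their counts are removed from the denominator only.
    """
    denominator = sum(c for tag, c in raw.items() if tag not in excluded)
    return {tag: c * TPQ_SCALE / denominator for tag, c in raw.items()}


def add_library(conn, lib_id, raw, excluded):
    """Add one column for a new library and fill it with normalized values."""
    col = f"lib_{int(lib_id)}"  # int() keeps the generated column name safe
    # Existing signatures receive 0 for the new library via the default.
    conn.execute(f"ALTER TABLE summary ADD COLUMN {col} REAL NOT NULL DEFAULT 0")
    for tag, value in normalize(raw, excluded).items():
        cur = conn.execute(f"UPDATE summary SET {col} = ? WHERE tag = ?",
                           (value, tag))
        if cur.rowcount == 0:
            # New signature: all earlier library columns keep their default 0.
            conn.execute(f"INSERT INTO summary (tag, {col}) VALUES (?, ?)",
                         (tag, value))
    conn.commit()
```

The column default carries the “zero in all other libraries” rule: rows that existed before the new library get 0 in the new column, and newly inserted signatures get 0 in every older column.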
As the cost of SBS sequencing goes down, generating SBS libraries becomes more affordable, and the database must be updated with new libraries more frequently. Since an average small RNA SBS library consists of millions of distinct signatures and associated abundance values, the update is time-consuming and requires substantial computational power. Multiple tables are affected by the arrival of each new library; here, we focus on updating the table that holds almost all the information necessary for small RNA analyses.
1. This table consists of the SBS signatures as the primary key, the number of hits to the genome, a flag indicating whether the signature originated from a t/r/sn/snoRNA gene, and the normalized abundance level in each library.
Pseudocode:
• For each new library, alter the table to add a new column that stores the normalized abundance level of each signature.
  – If the signature exists in the table, normalize its raw abundance value and record it.
  – Else, insert the new signature and its normalized abundance value; for this signature, all other libraries are assigned an abundance value of zero.
2. The basic normalization formula is described above. As also noted above, some small RNA signatures do not match the genome, and a large proportion of genome-matching signatures are derived from t/r/sn/snoRNAs, which are not regulatory small RNAs. We therefore subtract the abundance values of these signatures from the sum of raw abundances before normalization.
3. Known miRNAs are identified in the data set based on their genomic coordinates, although we do not currently assign a special flag to these sequences.
4. At this point, our database is ready to be connected to our website.
3.5. Analyses of miRNAs from Small RNA Datasets
3.5.1. Identification of Regulated miRNAs
Two primary goals of producing a small RNA data set are to identify regulated small RNAs and to identify new small RNAs. In this section, we describe the approaches we use for both types of analyses. In both Arabidopsis and rice, hundreds of miRNAs have been identified and registered at the Sanger registry (http://microrna.sanger.ac.uk/sequences/). Many of these miRNAs are known to regulate genes with critical functions in various aspects of plant biology. We have generated libraries from various tissues, developmental stages, stress treatments, and so on, to compare the abundance of miRNAs and to identify specific sequences (siRNA-matching regions) with characteristics suggesting that their small RNAs are differentially regulated.
1. Most miRNAs are 21 nt long and are consistently processed from a precursor to generate the same 21-mer. However, the processing is not always exact, and some small RNAs derived from the precursor are shifted by one or two bases in the 5′ or 3′ direction relative to the annotated miRNA. Therefore, to estimate the expression level of a miRNA, we calculate the “sum of the abundance” of all signatures that correspond to its precursor, instead of using only the abundance of the exact, annotated miRNA.
2. All the necessary data are distributed among three tables on our MySQL server: gene_master, tag_position, and summary. The gene_master table stores the IDs (gene names) of all annotated miRNAs (in addition to annotated protein-coding and other genes). The tag_position table holds all signatures that map to the genome, including those that correspond to each gene, and the summary table holds the abundance value of each signature. We join these three tables and calculate the expression level of each miRNA gene.
3. These values are compared among libraries to identify miRNAs showing evidence of differential expression or regulation under a certain condition.
Unfortunately, at the time of writing, we do not have enough biological replicates to use statistical approaches for this comparison. We therefore focus on miRNAs that show distinct expression differences (more than tenfold) among the libraries and verify these findings with other methods such as RNA gel blots.
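The two ideas in steps 1 and 3, summing every signature assigned to a precursor and then looking for large differences between libraries, can be sketched in Python as follows. The inputs mimic the result of joining the tag_position, summary, and gene_master tables but are plain dictionaries here. The pseudocount of 1 (to avoid dividing by zero when a library has no reads for a gene) and the use of the max/min ratio across libraries are assumptions; the tenfold threshold follows the text. Function names, gene IDs, and values in the test data are hypothetical.

```python
from collections import defaultdict


def mirna_expression(tag_gene, abundance, mirna_genes):
    """Sum the normalized abundance of all signatures assigned to each miRNA gene.

    tag_gene:    iterable of (tag, gene) pairs, like a tag_position table
    abundance:   {tag: {library: normalized value}}, like a summary table
    mirna_genes: set of gene IDs annotated as miRNAs, like gene_master rows
    A signature assigned to several precursors counts fully toward each
    (an assumption; the text does not say how such signatures are split).
    """
    expr = defaultdict(lambda: defaultdict(float))
    for tag, gene in tag_gene:
        if gene in mirna_genes:
            for lib, value in abundance.get(tag, {}).items():
                expr[gene][lib] += value
    return {gene: dict(libs) for gene, libs in expr.items()}


def differential(expr, libraries, min_fold=10.0, pseudo=1.0):
    """Return {gene: fold} where max/min across libraries exceeds min_fold."""
    result = {}
    for gene, libs in expr.items():
        values = [libs.get(lib, 0.0) + pseudo for lib in libraries]
        fold = max(values) / min(values)
        if fold > min_fold:
            result[gene] = fold
    return result
```

Summing over the precursor means that a signature shifted by one or two bases from the annotated miRNA still contributes to that miRNA's expression estimate.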
3.5.2. Identification of Novel miRNA Candidates
The aim of this analysis is to identify candidates for new miRNAs. Although many miRNAs have been identified by various laboratories using different approaches (experimental and predictive),
given the breadth of plant species, we are far from having discovered the full set of miRNAs for most plant species. To extract candidate miRNAs from deep-sequencing small RNA data, we combine genomic approaches with a variety of filters. Known miRNAs are included in the analysis as a positive control, to determine which filters inadvertently catch and remove known miRNAs. We have previously described the implementation of multiple data filters that capture most known Arabidopsis miRNAs (1). Here, we describe the filters that we apply to annotated genomes, such as those of rice and Arabidopsis, and an approach to identify candidate plant miRNAs from species without a genome sequence. A flowchart describing how these filters are implemented is shown in Fig. 3. The filters are based on the number of libraries in which each small RNA is expressed, sparse clusters, abundance, pairing, and secondary structure. These filters are explained in the following steps:
1. The “number of expressed libraries” filter looks for signatures that are expressed in the majority (more than half) of the libraries in a database. Although multiple libraries are needed to apply this filter, we have found that it is one of the best filters for identifying miRNAs, as they are generally
[Fig. 3 flowchart. Starting set: 307,064 unique small RNA tags. Filter 1, consistent expression across libraries: sequences in at least half of multiple libraries (4,194). Filter 2, abundance: sequences with abundance ≥ 100 TPQ (351). Filter 3, uniqueness: sequences with ≥ 20 hits to the rice genome (97) and with < 20 hits (254); the latter comprise sequences that are part of known miRNAs (80), sequences that match other ncRNA or siRNA loci (135), and potentially new miRNAs (39) after filters 4 (folding structure) and 5 (miRNA/miRNA* pair). Detailed analysis: foldback consolidation, miRNA* detection.]
Fig. 3. Flowchart of analyses used for the identification of miRNAs from plant small RNA libraries. This flowchart and the filters were developed based on the properties of known plant miRNAs. The numbers give a rough idea of how we were able to reduce and refine the numbers of small RNAs to the most interesting subset. These can be verified by biologists with various molecular biology approaches (described elsewhere in this volume). The signatures counted in the black boxes represent those that passed the filters.
quite consistently expressed (10). In a data set that includes many libraries (~15 or more), it may be possible to lower the cutoff for this filter to sequences found in ~30% of all libraries without picking up many siRNAs.
2. A “sparse cluster” means that only one to ten signatures were found in a region with no additional signatures within 500 bp of its 5′ or 3′ end. The main purpose of this filter is to exclude heterochromatic siRNAs from the candidate list. These siRNAs are typically generated from repetitive regions of the plant genome, a process that yields hundreds of siRNA molecules from a relatively short stretch of the genome; unlike miRNAs, they therefore form “dense clusters.” miRNAs tend to form clusters of relatively few sequences because their processing from a precursor is so precise (see Note 1).
3. The “paired” filter searches for two small RNAs, or two “sets” of small RNAs, that are located within 20–180 nt of each other on the same strand (the candidate miRNA and miRNA*) and differ in abundance by a ratio of 1:10 or greater; a “set” is defined as the consensus sequence of two or more overlapping signatures whose 5′ ends lie within two nucleotides of each other. This filter is useful for identifying miRNA candidates because miRNA precursors form secondary structures (“hairpins”) that are processed during biogenesis to generate the mature miRNA and a partially complementary molecule (miRNA*). Although miRNA* molecules are degraded rapidly, deep sampling with next-generation sequencing technologies captures them, albeit at lower levels than their corresponding miRNA partners.
4. The “abundance” filter identifies small RNAs with a normalized abundance in at least one library equal to or greater than 25 per 250,000 sequences (25 TPQ). Most known miRNAs are relatively abundant and pass this filter. However, this is a somewhat strict criterion, and some miRNAs are found below this level.
5. As mentioned above (step 3), miRNA precursors form hairpin structures, so in some cases these can be identified using RNA folding programs. We extract short stretches of sequence containing the SBS signatures that satisfy the conditions described above and run the program mfold (http://www.bioinfo.rpi.edu/applications/mfold/) to select regions predicted to form hairpin-like structures. We then manually assess the quality of each predicted secondary structure to see whether the miRNA and candidate miRNA* pair well in the hairpin. If so, we validate these
miRNA candidates using laboratory techniques described elsewhere in this volume.
6. The filters described above are a very powerful approach to identifying miRNA candidates in plant species with high-quality genome annotation. Although it is challenging to identify miRNAs in plant species that have not been sequenced, the “abundance” and “number of expressed libraries” filters are both useful in this situation, and it may be possible to identify miRNA candidates based on sequence conservation across plant species. This requires multiple libraries from different species, and it may exclude nonconserved miRNAs, particularly if the comparisons are across widely divergent species. Tools useful for these types of analyses are described in Chapter 12 of this volume. As new applications of next-generation sequencing technologies emerge, new techniques for miRNA identification may soon become available for species whose genomes have not been completely sequenced.
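Three of the filters above reduce to simple per-signature tests, sketched below in Python. The thresholds follow the text (more than half of the libraries, at least 25 TPQ in at least one library, and partners 20–180 nt apart on the same strand with an abundance ratio of at least 10:1). Two simplifications are assumptions: distances are measured between 5′ ends, and individual signatures stand in for the consensus “sets” of step 3. The sparse-cluster and secondary-structure filters are not covered, and all function names are hypothetical.

```python
def expressed_in_enough_libraries(values, fraction=0.5):
    """Filter 1: expressed (value > 0) in more than `fraction` of libraries."""
    return sum(v > 0 for v in values) > fraction * len(values)


def abundant_enough(values, min_tpq=25.0):
    """Filter 4: at least one library with >= 25 TPQ (25 per 250,000 reads)."""
    return max(values, default=0.0) >= min_tpq


def has_star_partner(candidate, others, min_gap=20, max_gap=180,
                     min_ratio=10.0):
    """Filter 3 (simplified): a less abundant partner nearby on the same strand.

    candidate and each item of others: (chrom, strand, five_prime_pos, abundance).
    Assumptions: distance is measured between 5' ends, and single signatures
    replace the consensus "sets" described in the text.
    """
    chrom, strand, pos, abundance = candidate
    for o_chrom, o_strand, o_pos, o_abundance in others:
        if (o_chrom, o_strand) != (chrom, strand) or o_abundance <= 0:
            continue
        if (min_gap <= abs(o_pos - pos) <= max_gap
                and abundance / o_abundance >= min_ratio):
            return True
    return False


def passes_filters(values, candidate, others):
    """Combine the three tests; all must pass."""
    return (expressed_in_enough_libraries(values)
            and abundant_enough(values)
            and has_star_partner(candidate, others))
```

With about 15 or more libraries, calling `expressed_in_enough_libraries(values, fraction=0.3)` would approximate the relaxed cutoff mentioned in step 1.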
4. Notes
1. Differentially regulated siRNA clusters can be identified by clustering adjacent small RNAs and summing the abundance of each cluster within each library, including sequences on both strands (since siRNAs are not strand-specific). We have previously described our proximity-based algorithm for clustering small RNAs (1). The clusters depended only on the distances between small RNAs and were independent of annotated genomic features such as genes. This facilitated the comparison of clusters across libraries while removing any bias that the annotation might introduce. We determined the optimal cluster size by comparing, for each library, the results of joining signatures within 100, 250, or 500 bp of each other. Joining small RNAs within 500 bp of each other was optimal because it reduced the number of single, unclustered signatures by approximately two-thirds in each library.
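A minimal Python sketch of proximity-based clustering follows; it is a simplified illustration, not the published algorithm (1). It pools both strands, sorts signatures by position on each chromosome, and starts a new cluster whenever the gap to the previous signature exceeds `max_gap`; measuring gaps between signature start positions is an assumption. The function name and tuple layouts are hypothetical.

```python
from collections import defaultdict


def cluster_by_proximity(hits, max_gap=500):
    """Cluster mapped small RNAs from one library by genomic proximity.

    hits: iterable of (chrom, position, abundance); strand is ignored because
    siRNAs are not strand-specific. A signature joins the current cluster if
    it lies within max_gap bp of the previous signature.
    Returns a list of (chrom, start, end, n_signatures, summed_abundance).
    """
    by_chrom = defaultdict(list)
    for chrom, pos, abundance in hits:
        by_chrom[chrom].append((pos, abundance))

    clusters = []
    for chrom in sorted(by_chrom):
        start = end = None
        n = 0
        total = 0.0
        for pos, abundance in sorted(by_chrom[chrom]):
            if start is not None and pos - end > max_gap:
                clusters.append((chrom, start, end, n, total))  # close cluster
                start = None
            if start is None:
                start, n, total = pos, 0, 0.0  # open a new cluster
            end = pos
            n += 1
            total += abundance
        if start is not None:
            clusters.append((chrom, start, end, n, total))
    return clusters
```

Running it with `max_gap` set to 100, 250, and 500 and counting the clusters that contain a single signature mirrors the comparison used to choose the 500-bp cluster size.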
Acknowledgments
The tools, approaches, and databases described here were developed with support from the NSF Plant Genome Research Program.
References
1. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
2. Mallory AC, Vaucheret H (2006) Functions of microRNAs and related small RNAs in plants. Nat Genet 38 Suppl:S31–S36
3. Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S (2004) The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res 14:1641–1653
4. Griffiths-Jones S (2004) The microRNA registry. Nucleic Acids Res 32:D109–D111
5. Smit AFA, Hubley R, Green P (1996–2004) RepeatMasker Open-3.0. http://www.repeatmasker.org
6. Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16:276–277
7. Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC (2006) Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res 34:D731–D735
8. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14:787–799
9. Chen CL, Liang D, Zhou H, Zhuo M, Chen YQ, Qu LH (2003) The high diversity of snoRNAs in plants: identification and comparative study of 120 snoRNA genes from Oryza sativa. Nucleic Acids Res 31:2601–2613
10. Lu C, Jeong DH, Kulkarni K, Pillay M, Nobuta K, German R, Thatcher SR, Maher C, Zhang L, Ware D, Liu B, Cao X, Meyers BC, Green PJ (2008) Genome-wide analysis for discovery of rice microRNAs reveals natural antisense microRNAs (nat-miRNAs). Proc Natl Acad Sci USA 105:4951–4956
Chapter 8
High-Throughput Approaches for miRNA Expression Analysis
Cheng Lu and Frédéric Souret
Abstract
miRNAs have emerged as key regulators of gene expression in both plants and animals. These small (generally 21–22 nt) RNA molecules, which originate from primary “hairpin” transcripts, can induce translational suppression or direct mRNA cleavage. As with regular mRNAs, the expression of miRNAs is highly regulated, and their expression patterns could provide critical clues to understanding miRNA functions. However, many previously identified miRNA families have multiple paralogous loci. Within each family, different members are often closely related and sometimes give rise to identical miRNAs, which poses critical challenges for the analysis of individual miRNA genes. This chapter describes several methods commonly used for miRNA expression analysis, including high-throughput sequencing and microarrays, and briefly discusses qRT-PCR, northern blotting, and other approaches used for data validation.
Key words: High-throughput sequencing, Microarrays, Small RNA library, microRNA, siRNA, qRT-PCR, Northern blotting
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_8, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
miRNAs are a family of noncoding RNAs that negatively regulate gene expression at the posttranscriptional level. Most plant miRNAs pair with their targets with near-perfect complementarity and therefore generally direct mRNA cleavage (1, 2). Numerous studies have demonstrated the critical role of miRNAs in controlling developmental processes and organ identity (3–5). More recently, several groups have reported roles for miRNAs in abiotic and biotic stress responses and in the regulation of trans-acting siRNA (ta-siRNA) biogenesis (6–10). Consistent with these roles, many miRNAs show tissue-specific, developmentally regulated, or stress-induced expression (4–7, 10). As with coding mRNAs, accurate expression profiling of individual miRNAs is essential for understanding their function. However, detailed expression patterns for specific miRNAs are difficult to obtain because of (a) the short length of the mature miRNA, (b) the presence of highly similar miRNA family members, and (c) in some cases, the presence of intermediate forms that contain the miRNA sequence (e.g., precursors).
Northern blotting is still widely used to assess the expression of miRNAs of interest. Combined with denaturing polyacrylamide gel electrophoresis, this method provides information about both the size and the accumulation level of specific miRNAs. Its major disadvantages are low sensitivity, low throughput, poor cost-effectiveness, and limited specificity. To improve sensitivity, locked nucleic acid (LNA)-modified oligonucleotide probes can be used for hybridization; LNA probes generally increase the sensitivity of miRNA northern blotting about tenfold (11, 12).
Microarrays offer another hybridization-based platform for miRNA expression analysis (13–15). DNA microarrays usually carry a probe set covering all known miRNAs in a genome, so a large number of miRNAs can be examined in a single hybridization. Unlike northern blotting, however, microarrays provide no information about the length of the hybridizing small RNA, and they share the limited specificity of northern blots. Microarray data therefore require careful validation and interpretation.
New sequencing technologies can greatly improve the identification of miRNAs and provide genome-wide quantitative expression data for a given biological sample. This was first accomplished in Arabidopsis using Massively Parallel Signature Sequencing (MPSS) (16), and more recently in rice (17, 18).
One limitation of MPSS is that only the first 17 nucleotides of each small RNA are sequenced. The "454" technology, which generates full-length small RNA sequences at moderate depth, has been used as an alternative approach for high-throughput small RNA sequencing (19–21). Substantial progress has also been reported for other next-generation sequencing technologies, including Sequencing-By-Synthesis (SBS) and Supported Oligo Ligation Detection (SOLiD). With the race for the "$1,000 genome" under way, "next-next"-generation sequencing technologies based on single-molecule approaches are emerging and could further revolutionize genomics. The specific features of these new technologies are discussed elsewhere (22). These methods can sequence millions of small RNAs in a single run. Compared with other methods, expression profiling by sequencing offers the best specificity: members of the same miRNA family that differ by a single nucleotide can be distinguished at the sequence level with high accuracy. Another advantage is that no prior knowledge of the miRNA sequences is required, which makes sequencing a powerful discovery tool. The cloning frequency of an individual small RNA generally reflects its expression level, although some bias can arise from differences in ligation efficiency at the ends of different small RNAs.
2. Materials
All solutions and buffers used in this protocol should be free of RNase and DNase contaminants. All biochemicals should be of the highest quality. RNA adaptors should be resuspended in DEPC-treated water and stored at −80°C until use. DNA oligos should be stored at −20°C until use.
2.1. General Equipment
1. RNase-free mortars, pestles, and spatulas: wrap in foil and bake at 220°C for at least 12 h.
2. RNase-free plasticware: tips, microcentrifuge tubes, and 13 ml and 30 ml centrifuge tubes.
2.2. Low Molecular Weight (LMW) RNA Isolation
1. RNase-free (DEPC-treated) water: add 0.05% DEPC to distilled water and stir overnight, then autoclave to sterilize the DEPC-treated water. Use this sterile water whenever water must be added to samples during RNA isolation. RNase-free sterile water is also available commercially.
2. 50% polyethylene glycol (PEG), MW = 8,000.
3. 5 mg/ml glycogen (Ambion AM9510).
4. 5 M NaCl.
5. Ethanol, 80% and 100%.
6. 1.5% agarose gel.
2.3. Purification of 20–30 nt Small RNA
1. 8 M urea/15% polyacrylamide gels (30 ml): 11.8 ml 40% acrylamide stock (Ambion AM9022, Acrylamide/Bis 19:1 solution), 3 ml 10× TBE, 14.4 g urea (powder), 180 µl 10% ammonium persulfate, and 13.8 µl TEMED (add last).
2. 10× TBE.
3. 2× formamide loading buffer: 90% formamide, 1× TBE, xylene cyanol, and bromophenol blue.
4. 10 bp DNA ladder, 1 mg/ml (Invitrogen 10821-015).
5. Ethidium bromide (highly toxic).
6. 0.3 M NaCl.
7. Spin-X filter (Corning 8162).
2.4. 5′ Adaptor Ligation and Purification
1. 5′ RNA adaptor: 5′ OH-GGU CUU AGU CGC AUC CUG UAG AUG GAUC-OH 3′ (Dharmacon) (see Note 1).
2. 10× RNA ligase buffer (Ambion AM2140).
3. T4 RNA ligase (5 U/µl).
4. 2× loading dye: 90% formamide, 1× TBE, xylene cyanol, and bromophenol blue.
5. 8 M urea/10% polyacrylamide gels.
6. 1× TBE.
7. GlycoBlue™ (15 mg/ml; Ambion AM9515).
2.5. 3′ Adaptor Ligation and Purification
3′ RNA adaptor: 5′ pUC GUA UGC CGU CUU CUG CUU GidT 3′, where p = phosphate and idT = inverted deoxythymidine (Dharmacon).
2.6. RT-PCR
1. RT primer: 5′ CAA GCA GAA GAC GGC ATA CGA 3′.
2. 5× first-strand buffer (supplied with Invitrogen 18064-022).
3. dNTP mix, 10 mM (Invitrogen R725-01).
4. DTT, 100 mM.
5. RNaseOUT™ Ribonuclease (RNase) Inhibitor (40 U/µl; Invitrogen 10777-019).
6. SuperScript II Reverse Transcriptase (RT) (200 U/µl; Invitrogen 18064-022) and its components.
7. 10× PCR buffer (supplied with Invitrogen 10342-020).
8. MgCl2, 50 mM.
9. Forward PCR primer: 5′ CAA GCA GAA GAC GGC ATA CGA 3′.
10. Reverse PCR primer: 5′ GGT CTT AGT CGC ATC CTG TAG ATG 3′.
11. Taq DNA polymerase (5 U/µl; Invitrogen 10342-020).
12. 6× loading dye (Promega G1881).
13. 10% TBE-polyacrylamide gel (30 ml): 7.7 ml 40% acrylamide stock (Ambion AM9022, Acrylamide/Bis 19:1 solution), 3 ml 10× TBE, 200 µl 10% ammonium persulfate, and 20 µl TEMED (add last).
14. 10× TBE.
15. Ethidium bromide.
16. 10 bp DNA ladder (1 µg/µl; Invitrogen 10821-015).
17. Buffer-saturated phenol, pH 7.9 (Ambion AM9710).
18. Chloroform.
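Because the RT primer and both PCR primers must match the adaptors exactly, it can be worth checking the sequences in silico before ordering oligos. The Python sketch below uses only the sequences listed in Subheadings 2.4–2.6 (spaces removed; the inverted dT cap of the 3′ adaptor is omitted); the function names are illustrative and not part of any published pipeline.

```python
"""Consistency check for the adaptor and primer sequences in Subheadings 2.4-2.6."""

# Sequences copied from the Materials lists, spaces removed.
ADAPTOR_5P_RNA = "GGUCUUAGUCGCAUCCUGUAGAUGGAUC"
ADAPTOR_3P_RNA = "UCGUAUGCCGUCUUCUGCUUG"  # 3' adaptor without the inverted dT cap
RT_PRIMER = "CAAGCAGAAGACGGCATACGA"
FORWARD_PCR_PRIMER = "CAAGCAGAAGACGGCATACGA"
REVERSE_PCR_PRIMER = "GGTCTTAGTCGCATCCTGTAGATG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(seq: str) -> str:
    """Write an RNA sequence in DNA letters (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence, written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def anneals_to_3p_end(primer: str, template_rna: str) -> bool:
    """True if the primer is complementary to the 3' end of the template."""
    return rna_to_dna(template_rna).endswith(reverse_complement(primer))


def identical_to_5p_end(primer: str, template_rna: str) -> bool:
    """True if the primer reproduces the 5' end of the template sequence."""
    return rna_to_dna(template_rna).startswith(primer.upper())


if __name__ == "__main__":
    print("RT primer anneals to the 3' adaptor:",
          anneals_to_3p_end(RT_PRIMER, ADAPTOR_3P_RNA))
    print("Forward PCR primer identical to RT primer:",
          FORWARD_PCR_PRIMER == RT_PRIMER)
    print("Reverse PCR primer matches the 5' adaptor:",
          identical_to_5p_end(REVERSE_PCR_PRIMER, ADAPTOR_5P_RNA))
```

All three checks should report True: the RT primer is the exact reverse complement of the 21 ribonucleotides of the 3′ adaptor, and the reverse PCR primer reproduces the first 24 of the 28 nt of the 5′ adaptor.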
2.7. PCR Cloning
1. TOPO TA Cloning, pCRII-TOPO version (Invitrogen K4500-01), or pGEM-T (Promega).
2. LB Broth Base (Invitrogen 12780).
3. TOP10 OneShot Cells (Invitrogen K4500-40).
4. X-gal (Invitrogen 15520-034).
5. IPTG (Invitrogen 15529-019).
3. Methods
3.1. Construction of Small RNA cDNA Libraries for High-Throughput Sequencing
To clone small RNAs, RNA adaptors are ligated to the 5′ and 3′ ends of the small RNA molecules using T4 RNA ligase. The ligation products are then reverse transcribed and PCR amplified, and the purified PCR product can be used for sequencing.
3.1.1. RNA Preparation
Most RNA isolation methods are based on either ethanol precipitation or binding to silica-based membranes, which may not retain small RNAs efficiently. Nevertheless, several column-based methods have now been adapted for small RNA purification (e.g., Qiagen miRNeasy Mini Kit and Invitrogen PureLink™ miRNA Isolation Kit). Trizol reagent (or similar reagents) has been widely used for RNA isolation because it retains the small RNA population well. A set of detailed protocols for isolating total RNA suitable for miRNA analysis can be found in Chap. 3 of this volume. For all the methods described below, a small RNA enrichment step (LMW RNA isolation) is recommended after total RNA isolation. High-molecular-weight (HMW) RNAs can be precipitated using 5–10% PEG (MW = 8,000) as outlined below, leaving an LMW fraction that is highly enriched in miRNAs.
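Because both the PEG and the NaCl concentrations in the precipitation step refer to the final mixture, each stock addition is a fixed fraction of the final volume. The short Python sketch below solves for the stock volumes; the default values (50% PEG to 5% final, 5 M NaCl to 0.5 M final) are taken from Subheading 3.1.2, while the function name and the 400 µl example are illustrative assumptions.

```python
"""Stock volumes for the PEG/NaCl precipitation of HMW RNA (Subheading 3.1.2, step 3)."""


def peg_nacl_volumes(sample_ul: float,
                     peg_stock_pct: float = 50.0, peg_final_pct: float = 5.0,
                     nacl_stock_m: float = 5.0, nacl_final_m: float = 0.5):
    """Return (PEG stock volume, NaCl stock volume, final volume) in microliters.

    Both final concentrations refer to the final mixture V, so:
        v_peg  = (peg_final / peg_stock) * V
        v_nacl = (nacl_final / nacl_stock) * V
        V      = sample + v_peg + v_nacl
    """
    f_peg = peg_final_pct / peg_stock_pct
    f_nacl = nacl_final_m / nacl_stock_m
    if f_peg + f_nacl >= 1.0:
        raise ValueError("stocks are too dilute for the requested final concentrations")
    final_ul = sample_ul / (1.0 - f_peg - f_nacl)
    return f_peg * final_ul, f_nacl * final_ul, final_ul


if __name__ == "__main__":
    # Example: ~400 ug total RNA dissolved at ~1 mg/ml gives ~400 ul of sample.
    v_peg, v_nacl, v_final = peg_nacl_volumes(400.0)
    print(f"add {v_peg:.1f} ul 50% PEG and {v_nacl:.1f} ul 5 M NaCl "
          f"(final volume {v_final:.1f} ul)")
```

With the default stocks, each addition works out to one-eighth of the starting sample volume (50 µl of each stock for a 400 µl sample).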
3.1.2. Low Molecular Weight RNA Isolation
1. Isolate total RNA according to a protocol in Chap. 3. Typically, 200–400 µg of total RNA is used as starting material for the subsequent steps.
2. Dissolve total RNA in sterile DEPC-treated water to a concentration of about 1 mg/ml.
3. Precipitate HMW RNAs (mRNA and rRNA) by adding 50% PEG (MW = 8,000) to a final concentration of 5% and 5 M NaCl to a final concentration of 0.5 M. Mix well and place the tube on ice for 30 min.
4. To pellet HMW RNAs, centrifuge at maximum speed in a microcentrifuge (~12 K) for 10 min at 4°C (see Note 2).
5. Transfer the supernatant, which contains the LMW RNAs, to a new microcentrifuge tube. Add 2.5 volumes of 100% EtOH, mix well, and place at −20°C for at least 2 h.
6. To pellet LMW RNAs, centrifuge at maximum speed in a microcentrifuge for at least 30 min at 4°C.
7. Carefully remove the supernatant and wash the pellet with 80% EtOH without dislodging it. Allow the pellet to air-dry.
8. Dissolve the pellet in DEPC-treated water, using 10 µl of DEPC-treated water per 100 µg of starting total RNA.
9. To quickly confirm the abundance and integrity of the LMW RNA, run an aliquot of the preparation on a 1.5% agarose gel (see Note 3).
3.1.3. Purification of 20–30 nt Small RNA
Small RNAs 20–30 nt in length are further purified from the LMW fraction as follows:
1. Run the LMW RNAs obtained from 100–200 µg of total RNA on an 8 M urea/15% polyacrylamide gel until the dyes are well separated (200 V, ~60 min).
2. Stain the gel in 1× TBE/ethidium bromide for 5 min.
3. Visualize the RNA under UV light, cut out a gel slice corresponding to 20–30 nt, place it in a 2 ml tube, and crush the gel thoroughly.
4. Elute the RNA with two volumes of 0.3 M NaCl overnight at room temperature.
5. Precipitate the RNA with 2.5 volumes of 100% ethanol and 1 µl of GlycoBlue™ at −80°C for at least 2 h.
6. Allow the RNA pellet to dry, and then dissolve the RNA in 10 µl of DEPC-treated water.
3.1.4. 5′ Adaptor Ligation and Purification
Unlike most RNA turnover products, miRNAs and siRNAs generated by RNase III-like endonucleases have 5′ phosphate and 3′ hydroxyl termini. The adaptor ligations below exploit this feature, because they require a 5′ phosphate and a free 3′ hydroxyl group on the small RNAs.
1. Carry out the 5′ adaptor ligation in a 10 µl reaction containing 5 µl of small RNAs from the previous step, 2 µl of 5′ RNA adaptor (20 µM), 1 µl of 10× RNA ligase buffer, and 2 µl of T4 RNA ligase (5 U/µl). Incubate at room temperature for 6 h to overnight.
2. Stop the reaction with 10 µl of 2× loading dye.
3. Separate the entire reaction on an 8 M urea/10% polyacrylamide gel at 180 V for about 1 h.
4. Stain the gel in 1× TBE/ethidium bromide for 5 min.
5. Cut out a gel slice corresponding to 50–60 nt.
6. Elute and precipitate the ligated RNAs as described in Subheading 3.1.3.
3.1.5. 3′ Adaptor Ligation and Purification
The procedure is essentially identical to that in Subheading 3.1.4, except that the 3′ adaptor (2 µl of a 20 µM stock) is used in the ligation reaction and the gel slice corresponding to 70–80 nt is recovered for RNA elution and precipitation.
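The gel windows in Subheadings 3.1.4 and 3.1.5 follow from simple arithmetic: the length of the purified insert plus the length of each ligated adaptor. The Python sketch below reproduces that arithmetic from the adaptor sequences in Subheadings 2.4 and 2.5; counting the inverted dT as one extra nucleotide is an assumption of this sketch.

```python
"""Expected ligation product sizes for Subheadings 3.1.4 and 3.1.5."""

ADAPTOR_5P_NT = len("GGUCUUAGUCGCAUCCUGUAGAUGGAUC")   # 28 nt
ADAPTOR_3P_NT = len("UCGUAUGCCGUCUUCUGCUUG") + 1     # 21 nt + inverted dT = 22 nt
INSERT_NT = (20, 30)                                  # purified small RNA range (3.1.3)


def product_range(insert_range, *adaptor_lengths):
    """Shortest and longest product after adding the given adaptors to the inserts."""
    extra = sum(adaptor_lengths)
    return insert_range[0] + extra, insert_range[1] + extra


if __name__ == "__main__":
    print("after 5' ligation:", product_range(INSERT_NT, ADAPTOR_5P_NT))
    print("after 3' ligation:",
          product_range(INSERT_NT, ADAPTOR_5P_NT, ADAPTOR_3P_NT))
```

The 3′-ligation estimate (70–80 nt) coincides with the window given in the text, and the 5′-ligation estimate (48–58 nt) lies close to the 50–60 nt window, so the excised windows are best read as approximate guides.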
3.1.6. RT-PCR
1. Combine 10 µl of purified ligation product, 2 µl of 100 µM RT primer, and 2.5 µl of nuclease-free water. Heat at 65°C for 10 min, then place on ice and spin down briefly.
2. Add 6 µl of 5× first-strand buffer, 2 µl of 10 mM dNTP mix, 3 µl of 100 mM DTT, 1.5 µl of RNaseOUT, and 3 µl of SuperScript II RT (200 U/µl). Incubate at 45°C for 1 h, then at 90°C for 5 min to inactivate the enzyme.
3. Carry out PCR in two 50 µl reactions, each containing 5 µl of 10× PCR buffer, 1.5 µl of 50 mM MgCl2, 1 µl of 10 mM dNTPs, 0.5 µl of 100 µM forward PCR primer, 0.5 µl of 100 µM reverse PCR primer, 1 µl of Taq polymerase (5 U/µl), 2 µl of RT reaction mixture, and nuclease-free water to 50 µl. Use the following thermal cycler program: 94°C for 1 min; 15–20 cycles of 94°C for 45 s, 55°C for 45 s, and 72°C for 45 s; and a final 3 min incubation at 72°C (see Note 4).
4. To purify the PCR product, add 20 µl of 6× loading dye to the combined 100 µl of PCR product and load the entire sample into six wells of a 10% TBE-polyacrylamide gel. Run the gel at 150 V for 60 min.
5. Cut out a gel slice corresponding to 75–80 bp.
6. Elute and precipitate the DNA as described in Subheading 3.1.3 (see Note 5).
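A quick way to sanity-check the RT and PCR set-ups is to compute each component's final concentration with C1V1 = C2V2 and the volume left over for water. The Python sketch below does this for the volumes listed in steps 1–3; it assumes that the balance of each PCR reaction is nuclease-free water and treats each stock concentration exactly as labeled (e.g., whether "10 mM dNTP" refers to each dNTP is not stated in the protocol).

```python
"""Final concentrations for the RT and PCR set-ups in Subheading 3.1.6 (C1*V1 = C2*V2)."""

from typing import Dict, List, Tuple

# (component with its unit, stock concentration, volume added in ul)
RT_30UL: List[Tuple[str, float, float]] = [
    ("first-strand buffer (x)", 5.0, 6.0),
    ("dNTP mix (mM)", 10.0, 2.0),
    ("DTT (mM)", 100.0, 3.0),
    ("RNaseOUT (U/ul)", 40.0, 1.5),
    ("SuperScript II (U/ul)", 200.0, 3.0),
    ("RT primer (uM)", 100.0, 2.0),
]
RT_OTHER_UL = 10.0 + 2.5  # ligation product + water from step 1

PCR_50UL: List[Tuple[str, float, float]] = [
    ("PCR buffer (x)", 10.0, 5.0),
    ("MgCl2 (mM)", 50.0, 1.5),
    ("dNTPs (mM)", 10.0, 1.0),
    ("forward primer (uM)", 100.0, 0.5),
    ("reverse primer (uM)", 100.0, 0.5),
    ("Taq (U/ul)", 5.0, 1.0),
]
PCR_TEMPLATE_UL = 2.0  # RT reaction mixture


def final_concentrations(components, total_ul: float) -> Dict[str, float]:
    """C_final = C_stock * V_added / V_total for every component."""
    return {name: stock * vol / total_ul for name, stock, vol in components}


def balance_volume(components, other_ul: float, total_ul: float) -> float:
    """Water still needed to reach total_ul; raises if the components overflow."""
    used = sum(vol for _, _, vol in components) + other_ul
    if used > total_ul + 1e-9:
        raise ValueError(f"components ({used} ul) exceed the reaction ({total_ul} ul)")
    return total_ul - used


if __name__ == "__main__":
    for label, comps, other, total in (("RT", RT_30UL, RT_OTHER_UL, 30.0),
                                       ("PCR", PCR_50UL, PCR_TEMPLATE_UL, 50.0)):
        print(label, final_concentrations(comps, total),
              "additional water (ul):", balance_volume(comps, other, total))
```

For the listed volumes, the RT reaction adds up to exactly 30 µl with 1× buffer and 10 mM DTT, and each PCR reaction contains 1.5 mM MgCl2, 1 µM of each primer, and 38.5 µl of water.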
3.1.7. Quality Control by Traditional Sequencing
This step is essential for assessing the quality of the small RNA cDNA libraries. Highly expressed miRNAs should be readily identified in the quality control (QC) data, and contamination from adaptor self-ligation products or other non-genome-matching sequences should be below 20%. Use a library for deep sequencing only if it passes the QC analysis.
1. Clone the purified PCR product into a suitable vector (pCRII-TOPO, Invitrogen, or pGEM-T, Promega) following the manufacturer's protocol.
2. After transformation and incubation, transfer white or light blue colonies to a 96-well plate and culture them overnight.
3. Sequence the clones using traditional Sanger sequencing.
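The 20% contamination criterion can be applied mechanically once the Sanger reads have been trimmed to their inserts. The Python sketch below classifies each insert and reports whether the library passes; the 18-nt cutoff used to call adaptor self-ligation products, the class labels, and the genome-lookup callback are all illustrative assumptions rather than part of the published protocol.

```python
"""QC summary for Sanger-sequenced clones (Subheading 3.1.7)."""

from collections import Counter
from typing import Callable, Iterable, Tuple

MIN_INSERT_NT = 18  # assumption: shorter inserts count as adaptor self-ligation products


def classify(insert: str, matches_genome: Callable[[str], bool]) -> str:
    """Assign one adaptor-trimmed insert to a QC class."""
    if len(insert) < MIN_INSERT_NT:
        return "adaptor_self_ligation"
    if not matches_genome(insert):
        return "no_genome_match"
    return "genome_matching"


def library_passes_qc(inserts: Iterable[str],
                      matches_genome: Callable[[str], bool],
                      max_contamination: float = 0.20) -> Tuple[bool, float, Counter]:
    """Return (passes, contaminating fraction, per-class counts)."""
    labels = Counter(classify(s, matches_genome) for s in inserts)
    total = sum(labels.values())
    if total == 0:
        raise ValueError("no clones to evaluate")
    bad = labels["adaptor_self_ligation"] + labels["no_genome_match"]
    fraction = bad / total
    return fraction < max_contamination, fraction, labels


if __name__ == "__main__":
    toy_genome = "TTGACAGAAGAGAGTGAGCACAAATCGGACCAGGCTTCATTCCCCAA"  # toy sequence
    clones = ["TGACAGAAGAGAGTGAGCAC"] * 9 + [""]  # nine genome matches, one adaptor dimer
    print(library_passes_qc(clones, lambda s: s in toy_genome))
```

In practice, the genome lookup would be replaced by an alignment against the relevant genome assembly; a simple substring test is used here only to keep the example self-contained.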
3.1.8. High-Throughput Sequencing
Several high-throughput sequencing technologies have been widely applied in small RNA profiling. In an early study using MPSS, 17 nucleotides of sequence were obtained for each small RNA molecule, and more than half a million sequences were generated from a single sequencing run (16). The "454" technology is another method that has been described and applied in small RNA discovery (20). Because it offers longer reads than MPSS, full-length small RNAs are easily sequenced (19, 21). However, the depth of 454 is only ~400,000 reads per run. More recently, Illumina developed a four-color DNA sequencing method, SBS, as a replacement for MPSS. Recent studies have indicated that this approach can generate more than 2 million 25–30 nt sequence tags with high accuracy (23, 24). The library construction protocol described here should be applicable to all of these technologies. However, because the platforms use different protocols to process the purified PCR products before sequencing, specific RNA adaptors may be required for each platform; consult the sequencing provider before designing the adaptors.
3.1.9. Data Analysis
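The processing described in this subheading can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes that each read begins with the small RNA insert and ends in the 3′ adaptor of Subheading 2.5 (other platforms use different adaptors), that structural RNA sequences and perfect-match genome hit counts come from a separate mapping step, and it implements only the size (21 nt), abundance (≥20 TPQ), and genome-match (≤20 loci) filters; the cluster, strand, folding, and miRNA* filters are omitted.

```python
"""Minimal sketch of small RNA read processing: adaptor trimming, TPQ, filters."""

from collections import Counter
from typing import Dict, List, Mapping, Optional, Set

ADAPTOR_3P_DNA = "TCGTATGCCGTCTTCTGCTTG"  # 3' adaptor of Subheading 2.5, DNA letters
TPQ_SCALE = 250_000                       # transcripts per quarter million


def trim_adaptor(read: str, adaptor: str = ADAPTOR_3P_DNA,
                 min_overlap: int = 6) -> Optional[str]:
    """Return the insert upstream of the 3' adaptor (full or truncated at the read
    end), or None if no adaptor prefix of at least min_overlap nt is found."""
    for i in range(len(read) - min_overlap + 1):
        tail = read[i:]
        n = min(len(tail), len(adaptor))
        if tail[:n] == adaptor[:n]:
            return read[:i]
    return None


def tpq(counts: Mapping[str, int], structural: Set[str]) -> Dict[str, float]:
    """Normalize counts to TPQ after removing rRNA/tRNA/snoRNA/snRNA sequences."""
    kept = {seq: c for seq, c in counts.items() if seq not in structural}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {seq: c * TPQ_SCALE / total for seq, c in kept.items()}


def candidate_mirnas(values: Mapping[str, float], genome_hits: Mapping[str, int],
                     length: int = 21, min_tpq: float = 20.0,
                     max_hits: int = 20) -> List[str]:
    """Keep sequences of the given length, at >= min_tpq, matching 1..max_hits loci."""
    return sorted(seq for seq, v in values.items()
                  if len(seq) == length and v >= min_tpq
                  and 1 <= genome_hits.get(seq, 0) <= max_hits)


if __name__ == "__main__":
    # Toy reads: two copies of one 21-nt insert, one of another, one adaptor dimer.
    reads = [
        "ACGTACGTACGTACGTACGTA" + ADAPTOR_3P_DNA[:8],
        "ACGTACGTACGTACGTACGTA" + ADAPTOR_3P_DNA[:8],
        "GGGGCCCCGGGGCCCCGGGGC" + ADAPTOR_3P_DNA[:8],
        ADAPTOR_3P_DNA,
    ]
    inserts = [trim_adaptor(r) for r in reads]
    counts = Counter(s for s in inserts if s)  # drops None and empty adaptor dimers
    print(tpq(counts, structural=set()))
```

Because TPQ scales every library to the same total of 250,000 retained signatures, normalized abundances can be compared directly across libraries of different sequencing depth.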
First, the adaptor sequences must be trimmed from the raw sequence data. The resulting sequences are then matched against the corresponding genome and assigned to every location at which a perfect match is found. Because the total number of sequences differs among libraries, abundances are normalized so that comparisons across libraries reflect biological differences rather than differences in the total number of tags sequenced. The expression of a miRNA gene is measured as the frequency of signatures derived from that gene in a given library. For small RNA abundance determination, we merged the sequencing runs and calculated a single abundance normalized to transcripts per quarter million (TPQ) after removing rRNA, tRNA, snoRNA, and snRNA signatures. The expression of known miRNAs can then be compared across libraries using normalized abundance (16, 19). In theory, in silico comparisons among libraries would detect all miRNAs (known or unknown) with different expression patterns. However, unlike the animal small RNA population, the plant small RNA population is very complex and dominated by small interfering RNAs (siRNAs). In Arabidopsis, ~70% of small RNAs are endogenous siRNAs (16). This proportion is even higher in plants with larger genomes and more repetitive elements (80% in rice and 90% in maize) (17, 18). The large number of siRNAs makes it a major challenge to identify low-abundance miRNAs (20). Several computational filters are therefore used to facilitate miRNA identification. Although different labs developed these filters independently, all were designed based on consensus properties
of a known miRNA reference set (17, 19, 21). Common filters include size, abundance, number of genome matches, cluster density, single-strand origin, folding structure, and detection of the miRNA*. As of July 2009, the Sanger miRNA database (miRBase) contained nearly 1,700 plant miRNAs, most of which are 21 nt in length. By considering only 21-nt small RNAs, the vast majority of siRNAs can be excluded from the analysis. Most siRNAs are generated from both strands of high-density clusters, are of very low abundance (<20 TPQ), and match a large number of genomic locations (>20). Therefore, small RNAs that match >20 genomic locations, originate from siRNA clusters, or are expressed at low levels can be eliminated. The small RNAs that pass these filters can be further screened by predicting stem-loop folding and by confirming the presence of the miRNA*. In our experience, only a limited number of small RNAs qualify as good miRNA candidates after this set of filters. Notably, some filter parameters can be adjusted to achieve the desired level of stringency.
3.2. Small RNA Oligonucleotide Microarray
In addition to the high-throughput sequencing technologies described above, which characterize the small RNA population and identify novel miRNA genes in plants, several microarray platforms have been developed to profile miRNA accumulation and assess differential expression across biological samples (25, 26). The following sections highlight (1) the array platforms that have been developed to detect plant small RNAs (mainly miRNAs), (2) the labeling strategies used for these small RNA molecules, and (3) microarray processing and data analysis. Finally, methods suitable for validating high-throughput sequencing and microarray data (e.g., PCR-based and ligation-based assays) are discussed. As with other technologies based on nucleic acid hybridization, assay specificity and sensitivity are important criteria to characterize, especially when monitoring the expression of short sequences.
3.2.1. Microarray Manufacturing and Platform Design
Several pilot small RNA microarray designs have been reported in the literature. Some early designs contained only a limited number of features corresponding to Arabidopsis and rice miRNA sequences. More recently, one study spotted more than 100 Arabidopsis mature miRNA sequences (27). Arrays are manufactured by spotting the selected probes in replicate onto 1″ × 3″ activated glass slides. Oligonucleotide probes designed to be complementary to the miRNA sequences are generally extended at their 5′ terminus by a linker sequence that anchors them to the glass surface and allows the Tm of each candidate probe to be adjusted to a similar value.
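Probe design as described above amounts to taking the reverse complement of each mature miRNA and adding a 5′ linker. The Python sketch below illustrates this under stated assumptions: the 10-nt poly(T) linker is hypothetical, the Wallace rule (2°C per A/T and 4°C per G/C) is only a crude Tm estimate for short oligonucleotides, and the two 21-nt inputs are illustrative examples of closely related family members. Counting mismatches between family members gives a first indication of where cross-hybridization is likely.

```python
"""Sketch: DNA capture probes for mature miRNAs (assumptions noted in comments)."""

# RNA base -> complementary DNA base
_COMPLEMENT = str.maketrans("ACGU", "TGCA")

# Hypothetical 5' linker; real designs choose linker length and chemistry per platform.
LINKER = "TTTTTTTTTT"


def capture_probe(mirna: str) -> str:
    """DNA probe (5'->3') fully complementary to a mature miRNA."""
    return mirna.upper().translate(_COMPLEMENT)[::-1]


def spotted_oligo(mirna: str) -> str:
    """Probe with the hypothetical linker added at its 5' terminus."""
    return LINKER + capture_probe(mirna)


def wallace_tm(dna: str) -> int:
    """Crude Wallace-rule Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    dna = dna.upper()
    at = dna.count("A") + dna.count("T")
    gc = dna.count("G") + dna.count("C")
    return 2 * at + 4 * gc


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    member_a = "UCGGACCAGGCUUCAUUCCCC"  # illustrative family member
    member_b = "UCGGACCAGGCUUCAUCCCCC"  # differs from member_a at one position
    for name, seq in (("member_a", member_a), ("member_b", member_b)):
        probe = capture_probe(seq)
        print(name, spotted_oligo(seq), "Tm ~", wallace_tm(probe), "C")
    print("mismatches between members:", mismatches(member_a, member_b))
```

A single mismatch between family members, as in this example, is exactly the situation in which probe specificity needs to be validated experimentally.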
3.2.1.1. Custom Arrays
By labeling and hybridizing a set of synthetic small RNAs at various concentrations, the detection sensitivity of the microarray can be reliably determined and the linear dynamic range of the signal established (26, 28). Assays developed to monitor the specificity of probes for miRNA family members that differ by one or more mismatches (e.g., let-7) could also be applied to plant miRNAs (e.g., the miR-172 gene family) (28, 29).
3.2.1.2. Commercial Arrays
Several companies now offer plant miRNA microarrays with probes complementary to the full-length mature miRNAs, designed from the latest registered and annotated sequences in miRBase at The Wellcome Trust Sanger Institute (http://microrna.sanger.ac.uk/sequences/). General characteristics of commercially available plant miRNA microarray platforms are presented in Table 2.
3.2.1.3. Other Array Approaches
Other platforms have also been used for miRNA gene expression analysis, including Arabidopsis genome-wide tiling arrays and the RNA-primed, array-based Klenow enzyme (RAKE) assay, an alternative technique validated for analyzing mammalian miRNA regulation. High-density tiling DNA microarrays contain several million short oligonucleotide probes (25–60 nt) that cover the entire genome, or contigs of the genome, in an unbiased fashion (30, 31). This platform, developed to monitor transcriptional activity across all regions of the genome, can potentially identify novel noncoding transcripts, including miRNA genes. Several tiling DNA microarray platforms have been successfully used for hybridization profiling of small RNAs in Arabidopsis (32, 33). The RAKE assay uses on-slide enzymatic reactions to monitor miRNA accumulation in a complex biological sample. miRNAs are detected with a streptavidin-conjugated fluorophore after the following steps: hybridization of the miRNAs to complementary DNA probes spotted on the slide; Exonuclease I treatment to degrade the unhybridized single-stranded DNA probes linked to the slide; and template (spotted probe)-dependent extension of the 3′ ends of the hybridized miRNAs with biotin-conjugated dATP, catalyzed by the Klenow fragment of DNA polymerase I (34). Its application to plant miRNA expression profiling has yet to be reported.
3.2.1.4. Control Probes
To monitor hybridization efficiency and normalize signal intensities on small RNA microarrays, additional control and reference features are generally included in the design (for examples, see refs. 13, 28, 29). Negative control probes are commonly included to estimate the fluorescence background and its variance. These features are designed to have minimal cross-hybridization with the experimental samples (e.g., Escherichia coli sequences, sense sequences of the spotted miRNAs, and random oligonucleotides).
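One simple way to use the negative-control features is to derive a detection threshold from their mean and variance and to call a miRNA feature detected only if its signal exceeds that threshold. The Python sketch below assumes a mean + 3 SD cutoff, a commonly used but arbitrary choice; the chapter does not prescribe a specific rule, and the feature names in the example are hypothetical.

```python
"""Background threshold from negative-control features (Subheading 3.2.1.4)."""

from statistics import mean, stdev
from typing import Dict, Iterable, Set


def background_threshold(negative_controls: Iterable[float], k: float = 3.0) -> float:
    """mean + k * SD of negative-control intensities (k = 3 is an assumed cutoff)."""
    values = list(negative_controls)
    if len(values) < 2:
        raise ValueError("need at least two negative-control features")
    return mean(values) + k * stdev(values)


def detected_features(signals: Dict[str, float], threshold: float) -> Set[str]:
    """Names of features whose signal exceeds the background threshold."""
    return {name for name, value in signals.items() if value > threshold}


if __name__ == "__main__":
    negatives = [100.0, 110.0, 90.0, 100.0]        # e.g., E. coli or random-oligo spots
    threshold = background_threshold(negatives)
    print(round(threshold, 1), detected_features({"probe_1": 500.0, "probe_2": 120.0},
                                                 threshold))
```

Because replicate spots of the same probe are usually printed (Subheading 3.2.1), replicate signals would typically be summarized before applying such a threshold.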
Mismatch probes can also be added to monitor hybridization efficiency (35). Positive control features, such as "spike-in" DNA oligonucleotides and other probes complementary to noncoding RNAs (e.g., tRNA, snoRNA, and snRNA), can be spotted to monitor labeling and hybridization efficiency and to assist in data normalization.
3.2.2. Plant Small RNA Labeling
Efficient labeling of plant miRNAs for microarray applications can be challenging, because they represent only a small fraction (~0.01%) of the mass of a total RNA sample and because of their characteristic features: small size, lack of a poly(A) tail, and 2′-O-methylation at their 3′ ends (36). Therefore, the traditional oligo(dT)-primed reverse transcription method used for mRNA labeling, and other approaches suitable for mammalian miRNAs that extend the miRNA sequence with poly(A) polymerase (e.g., Ambion mirVana™ Labeling method and Invitrogen NCode™ miRNA Labeling System), may not be appropriate for labeling plant small RNAs. Nevertheless, several assays, with or without amplification, have been developed and validated for plant miRNA labeling that combine labeling efficiency with streamlined protocols (see below; Tables 1 and 2). Where relevant, we also cite approaches used to label animal mature miRNAs that could be applied to plant samples.
3.2.2.1. Target Preparation
Preparation of high-quality RNA (total and/or small RNA enriched samples) is critical for successful miRNA profiling and microarray experiments. Protocols for extracting total RNA are discussed in great detail in Chap. 3 (see Note 6).
3.2.2.2. Starting Material
The amount of total RNA and/or small RNA-enriched fraction required per assay depends on the labeling strategy and the sample to be analyzed, because the relative mass of miRNAs can vary considerably between sample types. It may therefore be necessary to test a broad range of input RNA amounts in pilot miRNA profiling experiments to avoid saturated or weak hybridization signals. Enrichment and/or isolation of mature miRNAs as described in Subheadings 3.1.2 and 3.1.3 may also help exclude array signals from miRNA precursors.
3.2.2.3. Labeling Methods: Direct vs. Amplification-Based Approaches
Direct labeling methods, which minimize sample manipulation and eliminate the potential bias associated with amplification-based strategies, are most likely to measure miRNA populations accurately. Several commercial kits and "home-made" procedures are available to reliably label plant miRNAs before array hybridization. The Kreatech ULS™ Small RNA Labeling Kit and the Mirus Label IT® miRNA Labeling Kit have been used to label plant miRNAs from 1 µg of small RNA-enriched sample. These labeling systems
generate Cy-labeled small RNAs by attaching fluorescent labels to the N7 position of guanine residues in a nonenzymatic reaction. Biotin-labeled miRNAs can also be produced with the Mirus Label IT® miRNA Labeling Kit. The ULS and Label IT® labeling assays are compatible with several types of miRNA microarray platforms, although the manufacturers' guidelines on platform compatibility should be consulted. The major drawback of these approaches may be the absence of label on mature miRNAs that lack G residues. Alternatively, high-efficiency direct labeling of human miRNAs, in which T4 RNA ligase attaches a single fluorophore-labeled nucleotide (e.g., Cyanine 3-pCp) to the 3′ end of the mature miRNA, has been implemented in sample preparation for Agilent miRNA microarrays (37). It would be worthwhile to assess this strategy for labeling plant miRNAs. Direct labeling of small RNAs has also been achieved by taking advantage of the 3′ hydroxyl group of mature miRNAs that results from Dicer-catalyzed processing of precursor miRNAs (1). Using T4 RNA ligase, short Cy-labeled RNA adaptors with a 5′ phosphate can be ligated to the 3′ hydroxyl group of mature miRNAs in the presence of ATP (27, 38, 39). Optimized ligation conditions, including addition of polyethylene glycol (PEG), use of excess RNA adaptor, and alternative buffer compositions, have been reported (39) and may be a starting point for ligase-based direct labeling of plant miRNAs. Another enzyme-based method for target preparation uses random 8-mer biotinylated primers and reverse transcriptase to initiate first-strand cDNA synthesis without further amplification (27). The biotin-labeled targets hybridized to the microarray are then detected directly using Streptavidin-Alexa647 conjugates.
Although small RNA enrichment was not performed in this study, it would most likely improve labeling efficiency, because most of the RNA labeled from a total RNA sample is ribosomal RNA. Nonenzymatic biotinylation of miRNAs followed by detection with quantum dots (QDs) has been successfully applied to profile the expression of 11 rice (Oryza sativa L.) miRNAs on microarrays (26). A biotin group was added directly to the 3′ end of the miRNAs by periodate oxidation followed by reaction with biotin-X-hydrazide. Once hybridized to the array probes, the biotinylated miRNAs were detected using QD–streptavidin conjugates. As an alternative detection approach, Liang et al. (26) successfully tested streptavidin-conjugated gold nanoparticles coupled with silver enhancement to monitor miRNA accumulation in rice seedlings. Amplification-based labeling of plant miRNAs prior to array hybridization has been exploited by Axtell and Bartel (25). In this study, Arabidopsis and other plant small
RNAs were first size-fractionated and then sequentially ligated to 5′ and 3′ adaptors. The ligation products were reverse-transcribed and PCR-amplified using a 5′ Cy-labeled forward primer and a significantly longer unlabeled reverse primer. This elegant approach generates asymmetric PCR products, so the shorter Cy-labeled single strands can be purified on a denaturing polyacrylamide gel before array hybridization (13, 25). By matching the 5′ Cy3-labeled plant miRNA sample to a Cy3-labeled oligonucleotide reference library prior to hybridization, this approach allowed semi-quantitative, sensitive, and highly reproducible miRNA expression profiling in plants and vertebrates (13, 25). As an alternative approach to human small RNA labeling, adaptor ligation and PCR amplification followed by labeled cRNA synthesis with T7 RNA polymerase has also been reported (35); the labeled pools are then hybridized to sense array probes. This method could most likely be adapted to label plant miRNAs with little optimization.
3.2.3. Microarray Hybridization and Post-Hybridization Wash Processing
Running array experiments in triplicate is highly recommended. Hybridization and post-hybridization wash conditions should be optimized empirically for the best microarray performance, including specificity, sensitivity, and reproducibility (40). Hybridization and wash conditions used with commercial and custom plant miRNA microarrays are presented in Tables 1 and 2 as examples of validated processing.
3.2.4. Image Acquisition and Data Processing
After the post-hybridization washes, arrays are dried and scanned. Scanned images should be carefully inspected before raw data are extracted using Feature Extraction Software (Agilent Technologies), LuxScan (CapitalBio), GenePix (Axon Instruments), or similar software. Various microarray data-processing methods have been developed; they generally involve removal of low-intensity signals and signal normalization, followed by calculation of normalized log ratios. Clustering of log-transformed values and creation of dendrograms and expression heat maps can be performed using R (http://CRAN.R-project.org/) or Cluster and visualized using TreeView. Additional microarray analysis software, such as GeneSpring (Agilent Technologies) or Quantarray (PerkinElmer), provides further data analysis and visualization tools. Original microarray data can be deposited at the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) following the MIAME (Minimum Information About a Microarray Experiment) guidelines (41).
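The generic processing chain named above (low-intensity removal, normalization, log ratios) can be sketched in a few lines of Python for a two-channel comparison. The normalization shown here, scaling each channel by its median, is only one common choice and is an assumption of this sketch, as is the minimum-signal cutoff of 50; dedicated packages implement more robust methods.

```python
"""Sketch: low-intensity filtering, per-channel median normalization, log2 ratios."""

import math
from statistics import median
from typing import Dict


def log_ratios(sample: Dict[str, float], reference: Dict[str, float],
               min_signal: float = 50.0) -> Dict[str, float]:
    """log2(normalized sample / normalized reference) for features passing the filter."""
    # 1. Low-intensity removal: keep features above min_signal in both channels.
    kept = [f for f in sample
            if f in reference and sample[f] >= min_signal and reference[f] >= min_signal]
    if not kept:
        return {}
    # 2. Normalization: divide each channel by its median over the kept features.
    med_s = median(sample[f] for f in kept)
    med_r = median(reference[f] for f in kept)
    # 3. Normalized log2 ratios.
    return {f: math.log2((sample[f] / med_s) / (reference[f] / med_r)) for f in kept}


if __name__ == "__main__":
    sample = {"miR_a": 100.0, "miR_b": 400.0, "miR_c": 200.0, "miR_d": 10.0}
    reference = {"miR_a": 100.0, "miR_b": 100.0, "miR_c": 200.0, "miR_d": 10.0}
    print(log_ratios(sample, reference))  # miR_d is removed as a low-intensity feature
```

The resulting log ratios are the values that would then be clustered and displayed as dendrograms or heat maps with the tools listed above.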
3.3. Data Validation
All currently available high-throughput sequencing methods remain expensive, making it cost-prohibitive to analyze multiple biological and technical replicates. This makes data validation very important and
Table 1
Custom array characteristics and study references
Reference: (25) | (26) | (27)
Platform: Amersham Biosciences CodeLink activated glass slides | Sigma GOPTS-activated slides | Aldehyde-modified glass slides
Label type: Cy dye | Biotin-X-hydrazide | Cy dye
Labeling method: pCU-Cy ligation | Direct labeling | Adaptor ligation followed by amplification
Mass input per array: 1.5 µg (as low as 0.2 µg)
Sensitivity: ~0.4 fmol biotinylated miRNAs; DR: 0.16–20 nM
Hybridization buffer and conditions: 3× SSC, 0.2% SDS, 15% formamide; overnight at 42°C | Formamide-based buffer; overnight at 37°C | 3.5× SSC, 1% BSA, 0.1% SDS, 93 µg/ml salmon testes DNA, 187 µg/ml E. coli tRNA, 37 µg/ml polyadenine; 16 h at 57°C
Post-hybridization wash conditions: 2× SSC, 0.2% SDS for 5 min at 42°C, then 0.2× SSC for 5 min at RT | 1× SSC, 0.5% SDS for 10 min at 37°C | 2× SSC, 0.1% SDS for 5 min at 50°C, then 0.1× SSC, 0.1% SDS for 10 min at RT, and 0.1× SSC for 1 min at RT (3 times)
GOPTS: glycidyloxypropyltrimethoxysilane; DR: dynamic range
120 Lu and Souret
Plant species
Plant
Arabidopsis
Arabidopsis Rice Maize Black cottonwood Soybean
Arabidopsis
Arabidopsis Rice Maize Viridiplantae
Array manufacturer
CapitalBio Corp http://www.capitalbio.com
Combimatrix http://www.combimatrix.com
Exiqon miRCURY™ LNA microRNA arrays http://www.exiqon.com
GenoSensor GenoExplorer™ http://www.genosensorcorp.com
LC Sciences http://www.lcsciences.com
158 254 44 1014
Service (Cy3® and/or Cy5® labeling)
GenoExplorer™ microRNA 5′-end labeling with biotin label; GenoExplorer™ microRNA array labeling kit, GenoExplorer™ microRNA array chip, GenoExplorer™ microRNA array probe set, hybridization and blocking buffers, washing and staining solutions
µParaflo™ Microfluidic Chip Technology; detection limit <100 amol; dynamic range >3.5 logs; N/A
187 plus controls (include mature and precursor miRNAs); 1″ × 3″ slide, amine-modified oligonucleotide probes, multiple sub-arrays on each slide, detection limit <1 fmol, dynamic range >3 logs
33
5 µg total RNA (then miRNA sample enrichment)
5–10 µg total RNA (no enrichment required)
0.5–2 µg of a labeled target sample per array sector required
Starting material Comments
Enzymatic fluorescent labeling (Hy3™ and Hy5™); from 30 ng total RNA (no enrichment required)
1″ × 3″ slide; LNA capture probes; 8 sub-arrays in 4 replicates on each slide; feature size: 90 µm; detection limit <1 amol; dynamic range >4 logs
203 277 98 234
miRCURY™ LNA microRNA Labeling kit; miRCURY™ LNA Array Spike-in miRNA kit
Biotin or Fluorochrome Labeling (Cy3®, Cy5®, or AlexaFluor® 555 and 647 fluorescent dyes)
1″ × 3″ slide; antisense oligonucleotide probes; 4 × 2K array
labeling procedure
971 plus 32 controls
Convenient reagents Service
Platform characteristics
426 plus controls (include mature and precursor miRNAs)
Number of features
Table 2 Commercial arrays available for plant miRNA expression profiling.
High-Throughput Approaches for miRNA Expression Analysis 121
challenging, particularly for treatments that produce a high level of background noise. Moreover, many investigators unfortunately report microarray data without confirming their results. Given the amount of data generated by high-throughput sequencing and by microarray-based miRNA expression profiling, it is becoming critical to confirm these results with other, traditional gene expression techniques. Northern blotting is still the gold standard for assessing miRNA expression, so sequencing data should preferably be confirmed on RNA gel blots. In our first high-throughput sequencing study (comparing Arabidopsis flowers and seedlings), we saw a good correlation between gel blots and MPSS data, particularly for highly regulated miRNAs (ratio >10). However, in our stress-treatment experiments, only ~20% of the highly regulated candidates predicted from sequencing data could be validated by northern blots. This could be partially explained by cross-hybridization of the oligonucleotide blot probes with nearly identical small RNAs, or by sequencing errors in some of the data. More importantly, it suggests that miRNA expression fluctuates considerably when plants are under stress. Adding biological replicates should greatly reduce this "noise." As sequencing costs drop and sequencing capacity rises, biological and technical replicates are likely to become feasible in the near future. Other methods that require less starting material have also been developed to confirm plant miRNA expression profiling data. These rely on PCR-based or splinted-ligation techniques. TaqMan probes from Applied Biosystems Inc. (ABI) are now available to detect and monitor the expression of nearly 70 Arabidopsis miRNAs. A method for direct labeling and isotopic detection of plant small RNAs, based on splinted-ligation technology, has also been developed.
Using this approach, the accumulation of endogenous Arabidopsis and rice small RNAs was monitored from as little as 250 ng of total RNA (17, 42). Although cumbersome and time-consuming, the ribonuclease protection assay (RPA) has also been used to examine the accumulation of endogenous small RNAs.
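The "ratio >10" criterion and the validation rate discussed above can be made concrete with a short sketch. It assumes raw small RNA read counts per miRNA for two libraries. It normalizes them to reads per million (RPM) and adds a pseudocount of 1 RPM to avoid division by zero. Both choices are assumptions and are not necessarily the normalization used in the studies described. It then reports the fraction of candidates confirmed by northern blots. The miRNA names and counts are illustrative only.

```python
def reads_per_million(counts):
    """Normalize raw read counts {miRNA: reads} to reads per million (RPM)."""
    total = sum(counts.values())
    return {mirna: reads * 1e6 / total for mirna, reads in counts.items()}


def highly_regulated(library_a, library_b, min_ratio=10.0, pseudocount=1.0):
    """Return {miRNA: A/B ratio} for miRNAs with RPM ratio >min_ratio or <1/min_ratio."""
    a = reads_per_million(library_a)
    b = reads_per_million(library_b)
    candidates = {}
    for mirna in set(a) | set(b):
        # The pseudocount keeps miRNAs absent from one library from dividing by zero.
        ratio = (a.get(mirna, 0.0) + pseudocount) / (b.get(mirna, 0.0) + pseudocount)
        if ratio > min_ratio or ratio < 1.0 / min_ratio:
            candidates[mirna] = ratio
    return candidates


def validation_rate(candidates, confirmed_by_northern):
    """Fraction of sequencing-based candidates that were confirmed on gel blots."""
    if not candidates:
        return 0.0
    confirmed = set(candidates) & set(confirmed_by_northern)
    return len(confirmed) / len(candidates)


if __name__ == "__main__":
    control = {"miR398": 9000, "miR156": 500, "miR167": 500}
    stress = {"miR398": 500, "miR156": 9000, "miR167": 500}
    hits = highly_regulated(control, stress)
    print(sorted(hits))                       # candidates to test by northern blot
    print(validation_rate(hits, ["miR398"]))  # fraction confirmed on blots
```

Listing the candidates before blotting also makes it easy to report how many predictions were tested, not only how many were confirmed.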
4. Notes 1. The exact sequences of the adaptors can be changed to suit specific needs. Both adaptors were PAGE-purified by Dharmacon. 2. The pellet from this step can be dissolved in DEPC-treated water and used for regular northern blots.
3. When run next to HMW RNA, the most prominent band in the LMW RNA will be tRNA at about 75 nt. 4. To obtain enough cDNA for sequencing while preserving quantitative information, fewer than 20 PCR cycles are usually used for amplification. In our experience, starting from 100 µg of total RNA, 18 PCR cycles can generate ~100 ng of purified product. If more PCR product is required, the PCR reaction volume can be scaled up accordingly. 5. A shorter, ~50-bp product may also be seen on the gel. This band is likely generated from adaptor ligation products without small RNA inserts. Because most PCR purification kits recover small DNA fragments poorly, gel purification is recommended. 6. Although this step is optional in the general miRNA microarray procedure, total RNA and small RNA fractions should be characterized before labeling and hybridization to evaluate the quality and integrity of the extracted RNA. In general, quality can be assessed with the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA), by running a small aliquot on a denaturing gel (e.g., a urea-polyacrylamide gel), or by performing real-time RT-PCR on selected targets.
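Note 4 implies a simple planning rule: keep the cycle number below 20, and obtain more product by increasing the reaction volume rather than the number of cycles. The sketch below assumes, as a simplification, that yield scales linearly with reaction volume. Its default yield is the ~100 ng per reaction at 18 cycles quoted in Note 4. The function name and the linear-scaling assumption are illustrative and not part of the original protocol.

```python
import math


def pcr_scale_factor(target_ng, cycles=18, yield_per_reaction_ng=100.0):
    """Return the number of standard-volume reactions (or volume multiplier) needed.

    The default yield (~100 ng) is the Note 4 figure for 18 cycles; yield is
    assumed to scale linearly with reaction volume at a fixed cycle number.
    """
    if cycles >= 20:
        # More cycles risk distorting the relative abundance of small RNAs.
        raise ValueError("use fewer than 20 PCR cycles to preserve quantitative information")
    if target_ng <= 0:
        raise ValueError("target_ng must be positive")
    return math.ceil(target_ng / yield_per_reaction_ng)


if __name__ == "__main__":
    # For ~250 ng of purified product, scale the reaction volume about 3-fold.
    print(pcr_scale_factor(250))
```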
Acknowledgments Research by the authors was supported, in part, by NSF grant MCB#0445638 and USDA grant # 2007-01991 to P.J. Green and NSF grant MCB#0548569 to P.J. Green and B.C. Meyers. We thank Sharon Bancroft for editorial assistance.
References
1. Chen X (2005) MicroRNA biogenesis and function in plants. FEBS Lett 579(26):5923–5931
2. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
3. Aukerman MJ, Sakai H (2003) Regulation of flowering time and floral organ identity by a microRNA and its APETALA2-like target genes. Plant Cell 15(11):2730–2741
4. Juarez MT, Kui JS, Thomas J, Heller BA, Timmermans MC (2004) microRNA-mediated repression of rolled leaf1 specifies maize leaf polarity. Nature 428(6978):84–88
5. Palatnik JF et al (2003) Control of leaf morphogenesis by microRNAs. Nature 425(6955):257–263
6. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123(7):1279–1291
7. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14(6):787–799
8. Sunkar R, Chinnusamy V, Zhu J, Zhu JK (2007) Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends Plant Sci 12(7):301–309
9. Sunkar R, Kapoor A, Zhu JK (2006) Posttranscriptional induction of two Cu/Zn superoxide dismutase genes in Arabidopsis is mediated by downregulation of miR398 and important for oxidative stress tolerance. Plant Cell 18(8):2051–2065
10. Sunkar R, Zhu JK (2004) Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16(8):2001–2019
11. Valoczi A et al (2004) Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32(22):e175
12. Varallyay E, Burgyan J, Havelda Z (2007) Detection of microRNAs by northern blot analyses using LNA probes. Methods 43(2):140–145
13. Baskerville S, Bartel DP (2005) Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11(3):241–247
14. Lim LP et al (2005) Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433(7027):769–773
15. Thomson JM, Parker J, Perou CM, Hammond SM (2004) A custom microarray platform for analysis of microRNA gene expression. Nat Methods 1(1):47–53
16. Lu C et al (2005) Elucidation of the small RNA component of the transcriptome. Science 309(5740):1567–1569
17. Lu C et al (2008) Genome-wide analysis for discovery of rice microRNAs reveals natural antisense microRNAs (nat-miRNAs). Proc Natl Acad Sci U S A 105(12):4951–4956
18. Nobuta K et al (2007) An expression atlas of rice mRNAs and small RNAs. Nat Biotechnol 25(4):473–477
19. Fahlgren N et al (2007) High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219
20. Lu C et al (2006) MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res 16(10):1276–1288
21. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20(24):3407–3425
22. Shendure J, Mitra RD, Varma C, Church GM (2004) Advanced sequencing technologies: methods and goals. Nat Rev Genet 5(5):335–344
23. Cokus SJ et al (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452(7184):215–219
24. Mi S et al (2008) Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133(1):116–127
25. Axtell MJ, Bartel DP (2005) Antiquity of microRNAs and their targets in land plants. Plant Cell 17(6):1658–1673
26. Liang RQ et al (2005) An oligonucleotide microarray for microRNA expression analysis based on labeling RNA with quantum dot and nanogold probe. Nucleic Acids Res 33(2):e17
27. Liu CG, Spizzo R, Calin GA, Croce CM (2008) Expression profiling of microRNA using oligo DNA arrays. Methods 44(1):22–30
28. Wang H, Ach RA, Curry B (2007) Direct and sensitive miRNA profiling from low-input total RNA. RNA 13(1):151–159
29. Sun Y et al (2004) Development of a microarray to detect human and mouse microRNAs and characterization of expression in human organs. Nucleic Acids Res 32(22):e188
30. Stolc V et al (2005) A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray. Plant Mol Biol 59(1):137–149
31. Yamada K et al (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302(5646):842–846
32. Boccara M et al (2007) New approaches for the analysis of Arabidopsis thaliana small RNAs. Biochimie 89(10):1252–1256
33. Stolc V et al (2005) Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc Natl Acad Sci U S A 102(12):4453–4458
34. Nelson PT et al (2004) Microarray-based, high-throughput gene expression profiling of microRNAs. Nat Methods 1(2):155–161
35. Barad O et al (2004) MicroRNA expression detected by oligonucleotide microarrays: system establishment and expression profiling in human tissues. Genome Res 14(12):2486–2494
36. Yu B et al (2005) Methylation as a crucial step in plant microRNA biogenesis. Science 307(5711):932–935
37. Wang X, Wang X (2006) Systematic identification of microRNA functions by combining target prediction and expression profiling. Nucleic Acids Res 34(5):1646–1652
38. Castoldi M, Benes V, Hentze MW, Muckenthaler MU (2007) miChip: a microarray platform for expression profiling of microRNAs based on locked nucleic acid (LNA) oligonucleotide capture probes. Methods 43(2):146–152
39. Yin JQ, Zhao RC (2007) Identifying expression of new small RNAs by microarrays. Methods 43(2):123–130
40. Miska EA et al (2004) Microarray analysis of microRNA expression in the developing mammalian brain. Genome Biol 5(9):R68
41. Edgar R, Barrett T (2006) NCBI GEO standards and services for microarray data. Nat Biotechnol 24(12):1471–1472
42. Maroney PA, Chamnongpol S, Souret F, Nilsen TW (2007) A rapid, quantitative assay for direct detection of microRNAs and other small RNAs using splinted ligation. RNA 13(6):930–936
Chapter 9 In Situ Detection of miRNAs Using LNA Probes Zoltán Havelda Abstract Spatial and temporal analysis of miRNA accumulation by in situ hybridization is a prerequisite for understanding the precise biological functions of miRNAs. Because miRNAs are very short molecules, their in situ analysis is technically demanding. Here, we describe a protocol for in situ detection of miRNAs in plants based on LNA-modified oligonucleotide probes. LNA modification significantly enhances the sensitivity and specificity of miRNA-detecting probes and makes in situ miRNA detection relatively easy. Key words: miRNA, Plant, LNA, In situ hybridization
1. Introduction Understanding the precise roles of miRNAs in biological processes requires spatial and temporal investigation of mature miRNA accumulation by in situ hybridization. The technical difficulty of this approach stems from the short length of the target RNAs (21–25 nt), which hampers their reliable and sensitive detection. Probes based on locked nucleic acid (LNA)-modified oligonucleotides have been introduced to enhance both the sensitivity and the specificity of miRNA detection by northern blotting and in situ hybridization (1–3). LNA modifications give DNA oligonucleotides dramatically higher target affinity and specificity than a traditional DNA oligonucleotide (4). With this technology, miRNAs can be detected relatively easily in both plants and animals (Fig. 1). Here we describe a detailed protocol for in situ hybridization of plant tissue sections using LNA-modified oligonucleotides as probes.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_9, © Humana Press, a part of Springer Science + Business Media, LLC 2010
[Figure panels: miR160, miR167, miR124]
Fig. 1. Differential accumulation of miR160 and miR167 in A. thaliana flowers. Longitudinal, nearly consecutive sections of A. thaliana flowers were hybridized with 5′- and 3′-double-labeled LNA-modified oligonucleotides detecting miR160, miR167, and miR124 (a mouse miRNA used as a negative control). Hybridization was carried out at 60°C overnight. Arrows show the accumulation of miR167.
2. Materials 2.1. Equipment
1. Vacuum chamber and incubation chamber. 2. Rotary microtome (e.g., HM335E from Microm, Germany). 3. Hotplate. 4. Hybridization oven. 5. Standard light microscope with camera.
2.2. Chemicals
1. Ethanol (100%). 2. Wax (e.g., Paraplast Plus, Sigma-Aldrich). 3. Eosin Y disodium salt (Fluka). 4. Histo-Clear (National Diagnostics, Atlanta, Georgia) or Roti-Clear (Roth, Germany).
5. Wax: Paraplast X-tra (Sigma). 6. Commercially prepared poly-L-lysine slides (e.g., Poly-Prep Slides, Sigma) and coverslips. 7. Protease from Streptomyces griseus (Pronase, Sigma), 40 mg/mL in water, self-digested at 37°C for 2 h to remove contaminating nuclease activities. Store in aliquots at −20°C. 8. Acetic anhydride (Sigma). 9. Triethanolamine (Sigma). 10. RNase A (Sigma). 11. LNA-modified oligonucleotide (see Note 1). 12. DIG Oligonucleotide 3′-End Labeling Kit (Roche). 13. Anti-Digoxigenin-AP, Fab fragments (Roche). 14. Ribonucleic acid, transfer, from E. coli strain W. 15. Blocking Reagent (for nucleic acid hybridization, Roche). 16. BSA Fraction V (Sigma). 17. NBT (nitro blue tetrazolium), 50 mg/mL, and BCIP (5-bromo-4-chloro-3-indolyl phosphate), 50 mg/mL (Promega). 18. Alcian Blue solution in 3% acetic acid. 19. DPX Mountant for histology (Fluka).
2.3. Buffers and Solutions
1. Fixative solution: 4% (w/v) paraformaldehyde (Sigma) and 0.1% Triton X-100 in phosphate-buffered saline (PBS: 0.13 M NaCl, 7 mM Na2HPO4, 3 mM NaH2PO4, pH 6.7) (see Note 2). 2. 10× saline: 8.5% (w/v) NaCl in water. 3. SB buffer: 0.1 M Tris–HCl, 0.1 M NaCl, 0.05 M MgCl2, pH 9.5. Prepare a 10× stock (1 M Tris, 1 M NaCl, pH 9.5) and add MgCl2 to the 1× buffer just before use. 4. 10× TBS buffer: 1 M Tris–HCl, 1.5 M NaCl, pH 7.2. 5. Hybridization solution: 0.3 M NaCl, 10 mM Tris–HCl pH 6.8, 10 mM NaHPO4 pH 6.8, 5 mM EDTA, 50% formamide, 10% dextran sulfate, 1× Denhardt's solution, 1 mg/mL tRNA (Sigma) (see Note 3). 6. Washing solution: 0.2× SSC. 7. 10× NTE buffer: 5 M NaCl, 100 mM Tris–HCl pH 7.5, 10 mM EDTA. 8. Blocking solution 1: dissolve 0.5% Blocking Reagent (Roche) in 1× TBS buffer at 60°C and let it cool. 9. Blocking solution 2: dissolve 1% BSA in 1× TBS and add 0.3% Triton X-100. 10. 10× salts buffer: 3 M NaCl, 0.1 M Tris–HCl pH 6.8, 0.1 M NaHPO4 pH 6.8, 50 mM EDTA.
11. 20× pronase buffer: 1 M Tris–HCl pH 7.5, 0.1 M EDTA. 12. 10× PBS buffer: 1.3 M NaCl, 0.07 M Na2HPO4, 0.03 M NaH2PO4 (pH 6.5–7).
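Most working solutions in this protocol are made from the concentrated stocks listed above. The graded ethanol series in Sections 3.1 and 3.3 mix 100% ethanol with 10× saline to give x% EtOH/1× saline. A minimal sketch of that arithmetic (C1V1 = C2V2) follows. It assumes ideal, additive volumes, which ignores the slight contraction that occurs when ethanol and water are mixed. The function names are hypothetical.

```python
def stock_volume(stock_conc, final_conc, final_volume):
    """C1V1 = C2V2: volume of stock needed (in the same unit as final_volume)."""
    return final_conc * final_volume / stock_conc


def etoh_saline_step(etoh_percent, final_volume_ml, saline_stock_x=10.0):
    """Recipe (mL) for 'etoh_percent% EtOH/1x saline' from 100% EtOH and 10x saline.

    Assumes volumes are additive (ethanol-water contraction is ignored).
    """
    etoh_ml = etoh_percent * final_volume_ml / 100.0
    saline_ml = stock_volume(saline_stock_x, 1.0, final_volume_ml)
    water_ml = final_volume_ml - etoh_ml - saline_ml
    if water_ml < 0:
        raise ValueError(
            f"{etoh_percent}% EtOH cannot also contain 1x saline from a {saline_stock_x}x stock"
        )
    return {"100% EtOH": etoh_ml, "10x saline": saline_ml, "water": water_ml}


if __name__ == "__main__":
    # 1x pronase buffer from the 20x stock, 200 mL total
    print(stock_volume(20, 1, 200), "mL of 20x pronase buffer")
    # EtOH/1x saline steps of the dehydration series, 10 mL each
    for pct in (30, 40, 50, 60, 70, 85):
        print(f"{pct}% EtOH/1x saline:", etoh_saline_step(pct, 10.0))
```

The same helper covers the 10× TBS, 10× NTE, 10× PBS, and 20× pronase stocks, because each is simply diluted to 1×.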
3. Methods In situ hybridization of miRNAs depends on the detection of RNA, so it is very important to avoid contamination with RNases. The working environment should be clean, and nuclease-free tubes, bottles, and other consumables should be used. Water and all aqueous solutions should be autoclaved and preferably aliquoted for single use. Because of its hazardous nature, we do not favor diethyl pyrocarbonate (DEPC) treatment.
3.1. Embedding and Sectioning
1. Embedding tissue in wax is a slow process that takes several days. Once the samples are removed from the plant, place them immediately in ice-cold fixative solution. After transferring the material into the fixative, apply vacuum until bubbles form and hold the vacuum for 5–10 min. Release the vacuum very slowly. Formaldehyde in the fixative is toxic, so work in a fume hood when possible. Keep the tubes on ice throughout the procedure.
2. Repeat the vacuum treatment and change the fixative solution.
3. Repeat the whole procedure until all the samples sink to the bottom of the tube, which indicates complete infiltration of the fixative into the tissue.
4. Once all the samples have sunk, replace the fixative solution and fix the samples at 4°C for 4 h to overnight with gentle shaking (see Note 4).
5. Wash away the fixative with ice-cold 1× saline.
6. Dehydrate the samples on ice by passing them through a graded ethanol series with gentle rotation. Depending on the size of the samples, each step takes 1–3 h. Use the following concentrations: 30%, 40%, 50%, 60%, 70%, and 85% EtOH/1× saline, then 95% EtOH/H2O. The protocol can be interrupted, with samples kept at 4°C, at any step after 50% EtOH/1× saline.
7. The following steps are performed at room temperature with gentle shaking. Replace the 95% EtOH/H2O with 100% EtOH containing 0.1% Eosin Y (Sigma) for 1–3 h. Replace the staining solution with 100% EtOH and incubate for 1–3 h. Repeat once.
8. Pass the samples through a graded Roti-Histol (Roth, Karlsruhe, Germany) series with gentle rotation. It is advisable to carry out the Roti-Histol steps in glass vials or good-quality reaction tubes. Depending on the size of the samples, each step takes 1–3 h. Use the following concentrations: 25%, 50%, and 75% Roti-Histol/EtOH, then 100% Roti-Histol. Repeat the 100% Roti-Histol step twice.
9. Add one Paraplast X-tra chip (Sigma) to about 1 mL of Roti-Histol and let it dissolve at room temperature. Once the first chip has dissolved, add another, and so on, until no more chips will dissolve. At this stage, transfer the samples to 42°C and keep adding Paraplast X-tra chips until the Roti-Histol is saturated.
10. Move the samples to 60°C and replace half of the volume with freshly melted Paraplast X-tra (avoid heating Paraplast X-tra above 60°C). Try to form a cap of melted wax on top of the solution to protect the samples from heat stress. Repeat once and leave at 60°C overnight.
11. Replace the wax with freshly melted wax twice a day for 3 days: gently decant the old Paraplast X-tra and pour in freshly melted Paraplast X-tra. Leave the tubes open to allow the Roti-Histol to evaporate.
12. Before sectioning, the samples must be solidified in a wax block. Place plastic or metal molds on a hotplate at 60°C, shake the samples, and pour them into the mold. Ideally, about 10–12 samples should be transferred into one mold. Arrange the samples in rows, leaving enough space between them to cut out single blocks containing one or a few samples. Move the molds onto ice or to 4°C and allow the wax to harden. Mount the tissue block on a holder compatible with the microtome.
13. Cut tissue sections (8–20 µm) using a retracting rotary microtome. Trim the block so that the upper and lower faces are parallel. Repeated sectioning will form a ribbon of sections. Trimming the block into a trapezoid helps to identify individual sections in the ribbon.
14. Mount the sections onto commercially prepared poly-L-lysine-coated slides. Wax sections must be stretched before they adhere to the glass slide. Place the sections onto a layer of degassed water on a slide held on a warm hotplate (40–42°C). Once a section has stretched, drain away the excess water and leave the slide to dry overnight at 40–42°C.
3.2. LNA Probe Preparation
LNA-modified oligonucleotide probes can be purchased from Exiqon (Denmark; www.exiqon.com), and the choice of labeling depends on the required sensitivity. Arabidopsis thaliana tissue sections give weaker signals in in situ experiments than other plant tissues (for example, Nicotiana benthamiana). Probes labeled at the 3′ end (using the DIG Oligonucleotide 3′-End Labeling Kit, Roche) work well in N. benthamiana but do not provide reliable signals in A. thaliana. To achieve good signals in A. thaliana in situ experiments, we recommend ordering a 5′ chemically DIG-labeled LNA-modified probe and also labeling it at the 3′ end, producing a double-labeled probe. Alternatively, double-labeled probes can be ordered. It is very important to include a similarly labeled negative control probe (for example, one against an animal-specific miRNA) in the experiments to assess background hybridization.
1. Label 50 pmol of the LNA-modified oligonucleotide or 5′ DIG-labeled LNA-modified oligonucleotide in a 10-µL reaction using the DIG Oligonucleotide 3′-End Labeling Kit (Roche) according to the manufacturer's instructions. It is not necessary to purify the probe after labeling.
2. Remove 0.2 µL of the 10-µL reaction for probe checking, and add 10 µL deionized formamide to the remainder.
3. To check the quality of the probe, spot the 0.2-µL aliquot (and the labeled control oligonucleotide provided with the kit) on a piece of membrane and UV cross-link.
4. Place the membrane in 1× TBS buffer for 2 min, then incubate it in TBS buffer containing 5% powdered milk for 10 min. Add anti-DIG-alkaline phosphatase Fab fragments (1:2,000) and incubate with gentle shaking for 30 min.
5. Wash at least three times in 1× TBS buffer for 5 min each, then transfer to 1× SB buffer for 2 min.
6. Develop the color reaction by adding SB buffer containing NBT and BCIP (30 µL NBT and 30 µL BCIP solution in 10 mL SB buffer). Stop the reaction by rinsing the membrane with water, and dry the membrane.
3.3. Slide Preparation
1. Transfer the slides into Roti-Histol at room temperature and incubate for 10 min. (For this first treatment, Roti-Histol from the second treatment of a previous experiment can be reused.) Repeat using fresh Roti-Histol.
2. Transfer the slides into 100% EtOH (which can be reused from a previous experiment) for 2 min. Repeat once using fresh 100% EtOH.
3. Rehydrate the samples by passing them through a graded ethanol series, 2 min per step. Use the following concentrations: 95% EtOH/H2O; 85%, 70%, 60%, 50%, 40%, and 30% EtOH/1× saline; then 1× saline.
4. Equilibrate the slides in 1× pronase buffer for 2 min. Incubate the slides in pronase solution containing 0.125 µg/mL pronase for 10 min at room temperature. Incubate the slides in 0.2% glycine in 1× PBS for 2 min, then wash in 1× PBS for 2 min. Postfix in 4% formaldehyde/PBS (in a fume hood) for 30 min. Wash the slides twice in 1× PBS for 2 min each.
5. To eliminate background caused by electrostatic binding of the hybridization probe, acetylate the amino groups in the section with acetic anhydride (0.5 mL acetic anhydride in 100 mL 0.1 M triethanolamine–HCl, pH 8). Rinse the slides twice in 1× PBS for 2 min each. To prepare the acetylation buffer, add 1.25 mL triethanolamine and 0.5 mL HCl to 98.25 mL water and stir. Because acetic anhydride is very unstable in water, add 0.5 mL acetic anhydride to the triethanolamine buffer just before use and stir vigorously (work in a fume hood). Incubate the slides in the buffered acetic anhydride for 10 min at room temperature. Wash the slides once in 1× PBS, then in 1× saline, for 2 min each.
6. Dip the slides in fresh saline solution for 2 min and dehydrate them through a graded ethanol/saline series, 2 min per step. Use the following concentrations: 30%, 40%, 50%, 60%, 70%, and 85% EtOH/1× saline, then 95% EtOH/H2O.
7. Transfer the slides to 100% EtOH. The slides are now ready for hybridization. You can stop here and keep the slides safely in EtOH for hours.
3.4. Hybridization and Washing
1. Prepare the hybridization solution, about 100–200 µL per slide depending on the number and size of the sections. Prepare more than needed to allow for losses (see Note 3).
2. Add 1–2 µL of labeled LNA probe (in 50% formamide) per slide to the hybridization solution. Mix well, centrifuge to eliminate bubbles, and keep at the hybridization temperature (50–60°C; see Note 5).
3. Place one slide on a hotplate at 50°C and allow it to dry. Apply the hybridization solution containing the probe as a band along the middle of the slide. Carefully cover with a coverslip, avoiding air bubbles. Place the slide in a closed chamber saturated with 50% formamide/2× SSC prewarmed to the hybridization temperature. (Prepare a plastic box with 50% formamide/2× SSC at the bottom, and place the slides on a horizontal support inside the box.)
4. Repeat with each slide individually. Close the box, seal it with cling film, and incubate the slides at the hybridization temperature overnight. Prepare an excess of washing solution (0.2× SSC) and place it at the hybridization temperature.
5. Wash at 50–60°C (depending on the hybridization temperature) in 0.2× SSC. Place the slides in washing solution and carefully remove the coverslips. Rinse slides hybridized with different probes separately and quickly, several times, in washing solution to avoid cross-contamination between probes. Wash the slides twice at the hybridization temperature for 1 h each.
6. Immerse the slides in 1× NTE buffer prewarmed to 37°C. Repeat in fresh buffer. Incubate the slides in NTE containing 20 µg/mL RNase A at 37°C for 30 min to remove background hybridization.
7. Rinse the slides in 1× NTE for 5 min and transfer the rack to washing solution (0.2× SSC) for 1 h at the hybridization temperature. Dip the slides into 1× SSC for 2 min, then into 1× TBS twice for 5 min each. The slides are now ready for the detection step.
3.5. Signal Detection
1. Incubate the slides in their rack in blocking solution 1 for 30 min. Transfer the slides into blocking solution 2 and incubate for 30 min.
2. Add anti-DIG-alkaline phosphatase Fab fragments (1:2,000, Roche) to the required amount of blocking solution 2 (0.5 mL per slide). Place the slides on a support in a moist plastic or glass chamber on a tray. Apply the antibody solution onto the slides and incubate for 90 min at room temperature.
3. Stop the reaction by transferring the slides to a rack and washing them at least five times in excess 1× TBS for 5 min each. Equilibrate the slides in 1× SB buffer for a few minutes.
4. For the color reaction, prepare 1× SB buffer containing NBT and BCIP (30 µL NBT and 30 µL BCIP solution in 10 mL 1× SB buffer). Place the slides in a moist chamber and cover each one with about 1 mL of substrate solution. Remove the slides one at a time from the equilibrating 1× SB buffer and apply the substrate solution immediately, because the liquid is difficult to spread once the slide has dried.
5. Monitor signal development for 2–24 h. Stop the reaction at the desired signal intensity by rinsing the slides in water.
6. Pass the slides through a graded EtOH series, 2–5 min per step (depending on the intensity of the signal and background). Use the following concentrations: 40%, 70%, and 95% EtOH/H2O, then 100% EtOH. Then repeat the series in reverse order.
7. Counterstain the sections by dipping the slides for 5–15 min in 0.25% Alcian blue in 3% acetic acid. Monitor the intensity of staining (tissue without hybridization signal should show faint blue staining). Rinse the slides in water and air-dry.
8. Cover the sections with a coverslip using mounting medium (DPX), about 100–200 µL per slide. Leave the slides to dry for a few hours. The sections are then ready for data recording with a standard light microscope.
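The per-slide amounts in Section 3.5 scale with the number of slides processed. The sketch below computes batch volumes for two solutions:

- the antibody solution: 0.5 mL of blocking solution 2 per slide, with antibody at 1:2,000;
- the NBT/BCIP substrate: about 1 mL per slide, with 30 µL of each 50 mg/mL stock per 10 mL of 1× SB buffer.

The overage fraction is a hypothetical parameter for pipetting losses. The working-concentration estimate ignores the small volume added by the NBT and BCIP stocks.

```python
def detection_reagents(n_slides, overage=0.1):
    """Batch volumes for antibody incubation and color development (Section 3.5).

    overage: extra fraction prepared to cover pipetting losses (an assumption).
    """
    scale = n_slides * (1.0 + overage)
    antibody_solution_ml = 0.5 * scale                    # blocking solution 2, 0.5 mL/slide
    antibody_ul = antibody_solution_ml * 1000.0 / 2000.0  # 1:2,000 dilution, in µL
    substrate_ml = 1.0 * scale                            # ~1 mL substrate per slide
    nbt_ul = bcip_ul = 30.0 * substrate_ml / 10.0         # 30 µL of each stock per 10 mL
    return {
        "blocking solution 2 (mL)": antibody_solution_ml,
        "anti-DIG-AP Fab (µL)": antibody_ul,
        "1x SB buffer (mL)": substrate_ml,
        "NBT stock (µL)": nbt_ul,
        "BCIP stock (µL)": bcip_ul,
    }


if __name__ == "__main__":
    for item, amount in detection_reagents(12).items():
        print(f"{item}: {amount:.1f}")
    # Approximate working concentration of each dye: 30 µL x 50 mg/mL in 10 mL
    print(f"NBT and BCIP: ~{0.03 * 50 / 10:.2f} mg/mL each")
```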
4. Notes 1. LNA-modified oligonucleotide probes for detecting miRNAs can be ordered from Exiqon (http://www.exiqon.com), and a website for probe design is also available (http://lnatools.com). 2. Prepare the fixative in a screw-top bottle (e.g., Duran type) in a fume hood. Take 50 mL of 1× PBS (or water) and adjust it to pH >12 using 5 M KOH. Heat the solution to 60°C on a hot plate and add 4 g paraformaldehyde while heating. Shake vigorously for about 30 s, releasing the pressure every 5–10 s. The paraformaldehyde should dissolve completely, although very slight cloudiness is acceptable. Cool the solution on ice. Adjust the pH back down to 7 using H2SO4 (do not use HCl, as this releases a carcinogen). Then bring the volume up to 100 mL with 1× PBS, and add 0.1 mL of Triton X-100. A larger volume can be prepared and stored at −20°C in aliquots. Once an aliquot has been thawed, do not refreeze it. 3. For 1 mL of hybridization solution, combine 100 µL 10× salts buffer, 500 µL deionized formamide, 200 µL 50% dextran sulfate, 10 µL 100 mg/mL tRNA, 10 µL 100× Denhardt's solution, and water to 1 mL. The probe volume usually does not alter the composition of the hybridization solution significantly. Prepare a little more than the desired volume. 4. The fixation time strongly depends on the size and tissue type of the samples. Larger and more compact tissue samples usually require longer fixation. However, over-fixation can reduce the hybridization signal. 5. The optimal hybridization temperature strongly depends on the particular probe. A good starting temperature is 55°C. If the probe gives background hybridization, increase the hybridization temperature to 60°C and raise the washing temperature in parallel. If no signal is detected, lower the hybridization and washing temperatures to 50°C.
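Note 3 and Section 3.2 together set the composition of the hybridization mix. The sketch below recomputes the Note 3 recipe from stock and final concentrations (C1V1 = C2V2) for any total volume. It also estimates the final probe concentration on the slide. The probe estimate rests on simplifications of ours:

- all 50 pmol of oligonucleotide are recovered after 3′-end labeling;
- 0.2 µL is removed for the dot-blot check and 10 µL of formamide is added to the remainder;
- no probe is lost afterwards.

```python
# (stock concentration, final concentration) in matching units, from Note 3
# and item 5 of Section 2.3.
HYB_COMPONENTS = {
    "10x salts buffer": (10.0, 1.0),           # x
    "deionized formamide": (100.0, 50.0),      # % (v/v)
    "50% dextran sulfate": (50.0, 10.0),       # %
    "tRNA, 100 mg/mL": (100.0, 1.0),           # mg/mL
    "100x Denhardt's solution": (100.0, 1.0),  # x
}


def hyb_recipe(total_ul=1000.0):
    """Volumes (µL) of each stock, plus water, for total_ul of hybridization solution."""
    recipe = {name: final * total_ul / stock
              for name, (stock, final) in HYB_COMPONENTS.items()}
    recipe["water"] = total_ul - sum(recipe.values())
    return recipe


def probe_nM(probe_ul, hyb_ul, labeled_pmol=50.0, reaction_ul=10.0,
             removed_ul=0.2, formamide_ul=10.0):
    """Approximate final probe concentration (nM) in the hybridization mix.

    Assumes full recovery of the labeled oligonucleotide (an idealization).
    """
    remaining_ul = reaction_ul - removed_ul
    remaining_pmol = labeled_pmol * remaining_ul / reaction_ul
    stock_pmol_per_ul = remaining_pmol / (remaining_ul + formamide_ul)
    # 1 pmol/µL equals 1 µM, i.e., 1,000 nM
    return stock_pmol_per_ul * probe_ul / (hyb_ul + probe_ul) * 1000.0


if __name__ == "__main__":
    for name, volume in hyb_recipe(1000.0).items():
        print(f"{name}: {volume:.0f} µL")
    print(f"probe: about {probe_nM(1, 200):.1f}-{probe_nM(2, 100):.1f} nM")
```

Because the probe stock is in ~50% formamide and only 1–2 µL is added per 100–200 µL, the formamide content of the final mix stays close to 50%, consistent with Note 3.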
Havelda
Acknowledgments
This work was supported by a grant from the Hungarian Scientific Research Fund (OTKA; K61461). ZH is a recipient of a Bolyai János Fellowship.
References
1. Valoczi A, Hornyik C, Varga N, Burgyan J, Kauppinen S, Havelda Z (2004) Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32(22):e175
2. Valoczi A, Varallyay E, Kauppinen S, Burgyan J, Havelda Z (2006) Spatio-temporal accumulation of microRNAs is highly coordinated in developing plant tissues. Plant J 47(1):140–151
3. Kloosterman WP, Wienholds E, de Bruijn E, Kauppinen S, Plasterk RH (2006) In situ detection of miRNAs in animal embryos using LNA-modified oligonucleotide probes. Nat Methods 3(1):27–29
4. Kauppinen S, Vester B, Wengel J (2006) Locked nucleic acid: high-affinity targeting of complementary RNA for RNomics. Handb Exp Pharmacol 173:405–422
Chapter 10
Analysis of miRNA Modifications
Bin Yu and Xuemei Chen
Abstract
After transcription, a large number of cellular RNAs undergo modifications that increase their diversity and functional potential. Modifications can occur on the base, the ribose, or both, and are important steps in the maturation of many RNAs. Our lab recently showed that plant microRNAs (miRNAs) carry a 2′-O-methyl group on the ribose of the 3′ terminal nucleotide, and that this methyl group is added after miRNA/miRNA* formation. One function of this modification is to protect miRNAs from 3′ terminal uridylation by an unknown enzymatic activity. It is possible that uridylation of miRNAs triggers their degradation. Here we describe a protocol to purify a specific miRNA in order to determine its molecular mass so that the presence of a modification can be inferred, an in vivo method to detect 3′ terminal modification of miRNAs, and an [α-32P] dATP incorporation assay to study 3′ terminal uridylation of miRNAs.
Key words: miRNA, Methylation, Uridylation, β elimination
1. Introduction
MicroRNAs (miRNAs) are short noncoding RNAs that recognize partially or completely complementary sequences in target mRNAs and guide their cleavage or translational inhibition (1). This ability makes miRNAs important regulators of gene expression in both animals and plants (1). miRNAs are generated from long stem-loop precursor transcripts known as pri-miRNAs (1). In animals, the RNase III enzyme Drosha processes pri-miRNAs into pre-miRNAs. These are then processed by another RNase III enzyme, Dicer, to generate transient 20–24 nucleotide (nt) miRNA/miRNA* duplexes (2–5). In plants, the RNase III enzyme DICER-LIKE1 (DCL1) processes pri-miRNAs to pre-miRNAs and pre-miRNAs to miRNA/miRNA* duplexes (6, 7), with the aid of HYL1 and SERRATE (8–11). miRNA/miRNA* duplexes show the typical features of RNase III products: a 5′ phosphate, a 3′ hydroxyl, and a 2-nt 3′ overhang on each strand (4, 12).
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592
DOI 10.1007/978-1-60327-005-2_10, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Yu and Chen
Recently, our lab showed that Arabidopsis miRNA/miRNA* duplexes have an additional feature, 2′-O-methylation on the 3′ terminal ribose (13, 14). We also showed that an enzyme named HUA ENHANCER1 (HEN1; 15) catalyzes the methylation reaction (13). We revealed the presence of a methyl group on miR173 by mass spectrometry analysis of miR173 purified from Arabidopsis total RNAs. We also demonstrated the presence of methylation at the 3′ termini of miRNAs by treating miRNAs with sodium periodate followed by β elimination (13). Complete or partial loss-of-function mutations in HEN1, such as hen1-1 or hen1-2, result in reduced accumulation and size heterogeneity of miRNAs, as well as pleiotropic developmental defects (6, 15, 16). By cloning particular miRNAs and using an [α-32P] dATP incorporation assay, we showed that the size heterogeneity of miRNAs in hen1 mutants comes from 3′ terminal uridylation. This suggests that unmethylated miRNAs are modified by an unknown polymerase activity in plants (17). In this chapter, we describe a procedure to purify a specific miRNA for mass spectrometry analysis and a protocol to perform β elimination to detect modifications on the 3′ terminal ribose of miRNAs. We also describe an [α-32P] dATP incorporation assay to detect 3′ terminal uridylation.
2. Materials
2.1. Purifying miR173 with a Complementary Oligonucleotide Probe Coupled to Biotin
2.1.1. Extraction of RNAs
1. Tri-reagent (Molecular Research Center, Inc., cat# TR 118).
2. Diethyl pyrocarbonate (DEPC)-treated water. Add 1 ml DEPC to 1 l deionized water, stir overnight, and autoclave the next day.
3. Chloroform (VWR, cat# EM-CX1055-14).
4. Isopropanol (VWR, cat# EM-PX1838-1).
5. 100% ethanol and 70% ethanol. For 70% ethanol, mix 70 ml 100% ethanol with 30 ml DEPC-treated water.
2.1.2. Annealing of Probe
1. Biotinylated probe: 5′ biotin-aagtgatttctctctgcaagcgaa 3′ (see Note 1).
2. 20× SSC.
3. RNasin Plus RNase Inhibitor (Promega, cat# N2615).
2.1.3. Preparation of Streptavidin Magnetic Particles
1. Streptavidin magnetic particles (Roche, cat# 11641778001).
2. 0.5× SSC.
3. Magnetic stand (Promega).
2.1.4. Capturing of Annealed Biotinylated Oligonucleotide/miRNA Hybrids and Elution of miRNA
1. Exonuclease I (GE Healthcare, cat# E70073Z).
2.1.5. Quantification of Purified miRNA by RNA Filter Hybridization
1. 5× TBE and 0.5× TBE.
2. 2× RNA loading buffer. Mix 8 ml formamide, 2 ml 5× TBE, 10 mg bromophenol blue, and 10 mg xylene cyanol.
3. 15% polyacrylamide gel containing 42% urea. Dissolve 16.8 g urea in 4 ml 5× TBE and 15 ml 40% acrylamide (acrylamide:bisacrylamide, 29:1), and add water to 40 ml. Add 320 µl 10% APS and 24 µl TEMED (see Note 2).
4. Zeta-probe GT membrane (Bio-Rad, cat# 162-093).
5. Ultrahyb-Oligo hybridization buffer (Ambion, cat# AM8663).
6. T4 polynucleotide kinase (NEB, cat# M0201S).
7. [γ-32P] ATP (PerkinElmer).
8. DNA probe 5′ gtgatttctctctgcaagcgaa 3′ and synthetic miR173 5′ UUCGCUUGCAGAGAGAAAUCAC 3′ (a, t, g, and c are deoxyribonucleotides; A, U, G, and C are ribonucleotides).
9. 2× SSC with 0.5% SDS.
2.2. Monitoring 3′ Terminal Methylation by β Elimination
2.2.1. Extraction of RNAs (See Subheading 2.1.1)
2.2.2. Periodate Treatment and β Elimination
1. 0.06 M borax/boric acid buffer (pH 8.6; see Note 3).
2. 0.055 M borax/boric acid/NaOH (pH 9.5).
3. 200 mM sodium periodate (see Note 4).
4. Glycerol.
5. Glycogen (Fermentas, cat# R0551).
6. 3 M sodium acetate (pH 5.2, DEPC-treated).
7. G25 column (GE Healthcare).
2.3. Monitoring 3′ Uridylation by an [α-32P] dATP Incorporation Assay
2.3.1. Enrichment of Small RNAs
1. 50% polyethylene glycol 8000 (DEPC-treated).
2. 5 M NaCl (DEPC-treated).
2.3.2. Isolation of 18–30 nt Small RNAs by Electrophoresis
1. Decade™ markers (Ambion, cat# AM7778).
2. RNA elution buffer containing 20 mM Tris–HCl (pH 7.5), 0.5 M sodium acetate, 10 mM EDTA, and 1% SDS.
3. Glass wool.
4. Chloroform/phenol (1:1).
2.3.3. Ligation to 3′ Adaptor and Purification of Small RNAs After 3′ Adaptor Ligation
1. Alkaline Phosphatase, Calf Intestinal (CIP) (NEB, cat# M0290L).
2. RNA ligase (GE Healthcare, cat# E2050Y).
3. 3′ adaptor: 5′ P-UUUctgtaggcaccatcaat-iT 3′ (P is phosphate; a, t, g, and c are deoxyribonucleotides; U is a ribonucleotide; iT is inverted deoxythymidine).
2.3.4. Reverse Transcription and Amplification of miR167
1. microP2 primer: 5′ attgatggtgcctacagttt 3′.
2. miR167P1: 5′ tgaagctgccagcatga 3′ (see Note 5).
3. M-MuLV reverse transcriptase (NEB, cat# M0253S).
4. 10 mM dNTP.
5. GoTaq DNA polymerase (Promega, cat# M3005).
2.3.5. Purification of DNA by Electrophoresis
1. 6× DNA loading buffer: 40% sucrose, 1 mg/ml bromophenol blue, and 1 mg/ml xylene cyanol.
2. 12% native polyacrylamide gel.
3. TrackIt 10 bp DNA ladder (Invitrogen, cat# 10488-019).
4. 0.3 M sodium acetate (DEPC-treated).
2.3.6. [α-32P] dATP Incorporation Assay
1. [α-32P] dATP (PerkinElmer).
4. Methods
Outline of the methods described below:
1. Purification of miR173.
2. Detection of RNA 3′ terminal methylation by β elimination.
3. Detection of 3′ uridylation by an [α-32P] dATP incorporation assay.
4.1. Purification of miR173 with a Complementary Oligonucleotide Probe Coupled to Biotin
We describe an affinity procedure to purify miR173 from total RNAs (see Fig. 1). Briefly, a complementary oligonucleotide probe coupled to biotin is annealed to miR173 in a high-salt solution, and the hybrids are then captured by magnetic streptavidin particles. After washes, miR173 is eluted with water.
Fig. 1. A schematic illustration of the purification of miR173 from Arabidopsis total RNAs. The purification is achieved in three steps. The first step is the annealing of a biotinylated antisense miR173 probe to miR173 in total RNAs. The second step is the magnetic capture of the duplex. The third step is the elution of miR173 after washes. Small box indicates biotin; SMP, streptavidin magnetic particle.
4.1.1. Extraction of Total RNAs
1. Grind Arabidopsis tissue to a fine powder in liquid nitrogen with a mortar and pestle.
2. Transfer the powder to a centrifuge tube, add tri-reagent (10 ml per 1 g of fresh tissue), mix vigorously by vortexing, and incubate at room temperature (RT) for 5 min.
3. Add chloroform (1/5 volume), mix vigorously, and incubate at RT for 15 min.
4. Centrifuge at 12,000 × g for 15 min at 4°C.
5. Transfer the aqueous phase to a fresh centrifuge tube, add isopropanol (1/2 volume), mix, and incubate for 10 min at RT.
6. Centrifuge at 12,000 × g for 10 min at 4°C.
7. Remove the supernatant, wash the pellet with 70% ethanol (1 ml per 1 ml tri-reagent used), and air-dry the pellet for 5 min (see Note 6).
8. Dissolve the RNA in water by pipetting up and down and incubating for 10–15 min at 60°C.
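The extraction steps express reagent amounts relative to either the tissue mass or the tri-reagent volume. The minimal Python sketch below derives every volume from the fresh-tissue mass. It assumes that the "1/5 volume" of chloroform and the "1/2 volume" of isopropanol refer to the tri-reagent volume; that is our reading, and the protocol does not state it explicitly. The function name is hypothetical.

```python
def tri_reagent_volumes(tissue_g: float) -> dict[str, float]:
    """Reagent volumes (ml) for the total RNA extraction steps above.

    Assumption: "1/5 volume" and "1/2 volume" are relative to the tri-reagent volume.
    """
    if tissue_g <= 0:
        raise ValueError("tissue mass must be positive")
    tri_ml = 10.0 * tissue_g  # step 2: 10 ml tri-reagent per 1 g fresh tissue
    return {
        "tri-reagent": tri_ml,
        "chloroform": tri_ml / 5.0,   # step 3: 1/5 volume
        "isopropanol": tri_ml / 2.0,  # step 5: 1/2 volume
        "70% ethanol": tri_ml,        # step 7: 1 ml per 1 ml tri-reagent
    }


for reagent, ml in tri_reagent_volumes(0.5).items():
    print(f"{reagent}: {ml:.1f} ml")
```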
4.1.2. Annealing of Probe
1. Transfer 500 µl of total RNA (1–2 mg/ml) to an RNase-free tube and incubate for 15 min at 65°C (see Note 7).
2. Add 3 µl biotinylated oligonucleotide probe, 5 µl RNase inhibitor, and 13 µl 20× SSC to the RNA, and incubate at 50°C for 5–12 h.
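The salt concentration of the annealing reaction follows from C1V1 = C2V2. The short sketch below does that arithmetic. It assumes that the step 2 volumes are in microlitres and that only the 20× SSC contributes salt. The helper name is illustrative.

```python
def final_concentration(stock_x: float, stock_ul: float, total_ul: float) -> float:
    """C1*V1 = C2*V2: final concentration of a stock diluted into a reaction."""
    return stock_x * stock_ul / total_ul


# Annealing reaction (step 2): RNA, probe, RNase inhibitor, and 20x SSC, in µL.
components_ul = {"total RNA": 500.0, "biotinylated probe": 3.0,
                 "RNase inhibitor": 5.0, "20x SSC": 13.0}
total = sum(components_ul.values())
ssc_x = final_concentration(20.0, components_ul["20x SSC"], total)
print(f"total {total:.0f} µL, SSC {ssc_x:.2f}x")
```

The result is about 0.5× SSC. This matches the SSC concentration used to wash the streptavidin particles, so the hybrids are captured and washed under the same salt conditions in which they were annealed.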
4.1.3. Preparation of Streptavidin Magnetic Particles
1. Transfer 50 µl of streptavidin magnetic particles (SMPs) to an RNase-free tube. Capture the particles by placing the tube in the magnetic stand until the SMPs have collected on one side of the tube (approximately 30 s).
2. Carefully remove the supernatant. Do not centrifuge the particles.
3. Wash the SMPs by adding 250 µl of 0.5× SSC, capturing the SMPs on the magnetic stand, and carefully removing the supernatant. Repeat these steps two more times.
4.1.4. Capturing of Annealed Oligonucleotide–miRNA Hybrids and Elution of the miRNA
1. Transfer the annealing reaction to the tube containing the washed SMPs.
2. Incubate at RT for 20 min, mixing gently by inverting the tube every 1–2 min.
3. Capture the SMPs using the magnetic stand and carefully remove the supernatant without disturbing the SMP pellet (see Note 8).
4. Wash the particles four times with 0.5× SSC (200 µl per wash). After the final wash, remove as much of the supernatant as possible without disturbing the SMPs.
5. Elute the miRNA from the SMPs by adding 50 µl of H2O and incubating at 65°C for 5 min.
6. Add 2 µl of exonuclease I and incubate for 1 h to degrade any DNA oligonucleotide that co-elutes with the miRNA.
4.1.5. Quantification of the Purified miRNA by RNA Filter Hybridization
The amount of purified miR173 can be estimated by northern blotting, comparing its signal intensity with that of a series of standards of known amounts.
1. Prepare four standards of synthetic miR173 by adding 0.5, 1, 2.5, and 5 ng of miR173 to 5 µl H2O each. Add 5 µl of 2× RNA loading buffer to 5 µl of purified miR173 and to each of the four standards, incubate at 65°C for 5 min, and place on ice.
2. Resolve the RNAs on a 15% polyacrylamide gel containing 42% urea.
3. Transfer the RNAs to Zeta-probe GT membrane using a semi-dry transfer apparatus (see Note 9).
4. Fix the RNA to the membrane by ultraviolet cross-linking for 1 min, followed by baking at 80°C for 1 h.
5. Prehybridize the membrane in Ultrahyb-Oligo hybridization buffer for 1.5 h at 42°C.
6. Prepare the 5′ end-labeled probe by incubating the following mixture at 37°C for 1 h: 34.5 µl H2O, 5 µl 10× T4 polynucleotide kinase (PNK) buffer (700 mM Tris–HCl, 100 mM MgCl2, and 50 mM dithiothreitol, pH 7.6), 5 µl PNK, 0.5 µl 100 µM DNA oligonucleotide, and 5 µl [γ-32P] ATP (6,000 Ci/mmol).
7. Pass the labeling reaction through a G-25 column to remove unincorporated ATP.
8. Add the probe to the prehybridization buffer and incubate for 18 h in a hybridization oven.
9. Wash the membrane three times with 2× SSC/0.5% SDS at 42°C.
10. Visualize and quantify the radioactive signals with a PhosphorImager.
4.2. Detection of 3′ Terminal Methylation by β Elimination
The presence of a methyl group on the 3′ terminal ribose of miR173 was detected by filter hybridization of total RNAs that had been treated with sodium periodate followed by β elimination (13). Periodate reacts only when free hydroxyl groups are present at both the 2′ and 3′ positions on the ribose of the last nucleotide. In that case, as shown in Fig. 2a, periodate oxidizes these vicinal hydroxyl groups and converts the last nucleoside into a dialdehyde (18). β elimination then removes the last nucleoside, generating an RNA that is 1 nt shorter and carries a phosphate group at its 3′ terminus (see Fig. 2a). After the chemical treatments, therefore, miR173 with two free hydroxyl groups at its 3′ terminus migrates approximately 2 nt faster than untreated miR173, and this shift can be detected by RNA filter hybridization (see Fig. 2b, hen1-1). If the 3′ terminal ribose of miR173 is methylated, the methyl group blocks the chemical reactions, so the treatment does not change the mobility of methylated miR173 (see Fig. 2b, Ler).
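The readout described above reduces to a simple decision rule. The Python sketch below encodes the approximate 2-nt shift stated in the text, together with a hypothetical tolerance for deciding whether a treated band has shifted relative to the untreated one. The function names and the 0.5-nt tolerance are illustrative assumptions. In practice, the call is made by comparing treated and untreated lanes on the same blot, as in Fig. 2b.

```python
def expected_shift_nt(terminal_2p_o_methyl: bool) -> int:
    """Approximate mobility shift (nt) after periodate oxidation plus beta elimination.

    A 2'-O-methyl group blocks oxidation (no shift). Otherwise the last nucleoside is
    removed and a 3' phosphate remains, which runs roughly 2 nt faster.
    """
    return 0 if terminal_2p_o_methyl else 2


def predicted_apparent_length(length_nt: int, methylated: bool) -> int:
    """Apparent size on a denaturing gel after treatment (approximate)."""
    return length_nt - expected_shift_nt(methylated)


def infer_methylated(untreated_nt: float, treated_nt: float,
                     tolerance_nt: float = 0.5) -> bool:
    """Hypothetical rule: no shift beyond the tolerance means the 3' end is blocked."""
    return abs(untreated_nt - treated_nt) <= tolerance_nt


# miR173 is 22 nt (see the synthetic miR173 in the Materials).
print(infer_methylated(22, 22))  # Ler-like band
print(infer_methylated(22, 20))  # hen1-1-like band
```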
4.2.1. Preparation of RNAs from Ler and hen1-1 (See Subheading 4.1.1)
4.2.2. Periodate Treatment and β Elimination
1. Dissolve ~100 µg of RNA in 88 µl borax/boric acid buffer and add 12.5 µl of sodium periodate.
2. Incubate in the dark at RT for 1 h.
3. Add 10 µl of glycerol and incubate for another 30 min to stop the reaction.
4. Add 1 µl glycogen, 10 µl sodium acetate, and 300 µl ethanol to precipitate the RNA.
5. Dissolve the precipitated RNA in 100 µl of borax/boric acid/NaOH and incubate for 90 min at 45°C.
6. Pass the reaction through a G25 column to remove salts (optional).
7. Precipitate the RNA with ethanol.
Fig. 2. Detection of miRNA methylation by β elimination. (a) Diagram of periodate treatment followed by β elimination. The last two nucleotides of miR173 are shown. The vicinal hydroxyl groups of the 3′ terminal ribose react with periodate such that the last nucleoside is converted into a dialdehyde, which is subsequently removed by β elimination. The resulting miR173 is one nucleotide shorter and carries a 3′ phosphate. (b) The methylation status of miR173 in Ler (wild type) and hen1-1. Total RNAs of Ler or hen1-1 were treated with sodium periodate followed by β elimination, resolved by gel electrophoresis, and hybridized to an antisense miR173 probe. The hybridization signals were visualized using a PhosphorImager.
4.3. Probing miR173 by RNA Filter Hybridization (See Subheading 4.1.5)
4.4. Detection of 3′ Uridylation with an [α-32P] dATP Incorporation Assay
We employ an [α-32P] dATP incorporation assay to study the 3′ uridylation of miRNAs (17). Small RNAs are isolated, ligated to a 3′ adaptor, and reverse transcribed with a primer complementary to the 3′ adaptor (see Fig. 3a). miR167 is then selectively amplified with a miRNA-specific primer that corresponds to the 5′ portion of miR167 and with the 3′ adaptor primer. U-tailed miR167 in hen1-2 generates a pool of PCR products with various numbers of T residues adjacent to the 3′ adaptor, whereas miR167 from the wild type produces products with no Ts adjacent to the 3′ adaptor (see Fig. 3a). Taq DNA polymerase is then used to extend a primer complementary to the 3′ adaptor on the RT-PCR products, with [α-32P] dATP as the only nucleotide (see Fig. 3a). In this primer extension, templates from U-tailed miRNAs generate a ladder of products with varying numbers of A residues in the hen1-2 sample (see Fig. 3b, hen1-2). In contrast, the products from the wild-type sample are rarely extended beyond the adaptor (see Fig. 3b, Ler).
Fig. 3. [α-32P] dATP incorporation assay. (a) A schematic diagram of the [α-32P] dATP incorporation assay (adapted from ref. 17). (b) [α-32P] dATP incorporation assay performed on miR167 from Ler and hen1-2.
4.4.1. Enrichment of Small RNAs
Our lab uses a polyethylene glycol/NaCl (PEG/NaCl) method to separate low-molecular-weight RNAs from high-molecular-weight RNAs.
1. Dissolve ~1 mg of total RNA pellet from Ler or hen1-2 in 400 µl of H2O, add 50 µl of 50% PEG and 50 µl of 5 M NaCl, mix, and leave on ice for at least 1 h.
2. Centrifuge at 13,000 × g for 10 min. Transfer the supernatant to a new tube.
3. Add 1 µl of glycogen, 50 µl of sodium acetate, and 3 volumes of 100% ethanol. Incubate at −20°C for at least 2 h.
4. Centrifuge at maximum speed for 20 min at 4°C. Wash the pellet with 70% ethanol.
5. Air-dry the pellet for 5 min and dissolve in DEPC-treated water.
4.4.2. Isolation of 18–30 nt Small RNAs by Electrophoresis
1. Resolve small RNAs and 32P-labeled RNA size markers on a 15% polyacrylamide gel containing 42% urea.
2. Excise 20–30 nt small RNAs from the gel (sizes are estimated from the Decade markers).
3. Elute the small RNAs by incubating the gel slice in RNA elution buffer at 65°C for 4 h. Pass the solution through glass wool, extract twice with equal volumes of chloroform/phenol, and precipitate the RNAs with three volumes of 100% ethanol.
4. Air-dry the pellet for 5 min and dissolve in 25 µl of DEPC-treated water.
4.4.3. Ligation to 3′ Adaptor and Purification of Small RNAs Ligated to the 3′ Adaptor
1. Dephosphorylate the small RNAs by adding 3 µl of 10× NEB Buffer 3 (500 mM Tris–HCl, 1,000 mM NaCl, 100 mM MgCl2, and 10 mM dithiothreitol, pH 7.9) and 2 µl of CIP. Incubate at 37°C for 1 h.
2. Add 70 µl of water, extract with 100 µl of chloroform/phenol, and precipitate with ethanol.
3. Dissolve the RNAs in 10 µl of water and add 3 µl 10× ligation buffer (500 mM Tris–HCl, 100 mM MgCl2, 10 mM ATP, and 100 mM dithiothreitol, pH 7.8), 3 µl BSA, 13 µl adaptor, and 1 µl T4 RNA ligase. Incubate for 16 h at 8°C.
4. Purify the small RNAs ligated to the 3′ adaptor by electrophoresis (see Subheading 4.4.2).
4.4.4. Reverse Transcription and PCR Amplification (RT-PCR)
1. Mix 13.5 µl of small RNAs ligated to the adaptor with 2 µl of microP2 primer, incubate at 65°C for 5 min, and place on ice.
2. Add 2 µl 10× RT buffer (500 mM Tris–HCl, 750 mM KCl, 30 mM MgCl2, and 100 mM dithiothreitol, pH 8.0), 1 µl dNTP (10 mM), 0.5 µl RNase inhibitor, and 1 µl M-MuLV reverse transcriptase. Incubate at 42°C for 1 h.
3. Perform PCR in a solution containing 38.5 µl H2O, 4 µl RT products, 5 µl 10× PCR buffer (200 mM Tris–HCl, 500 mM KCl, and 15 mM MgCl2, pH 8.4), 1 µl dNTP (10 mM), 1 µl miR167P1, 1 µl microP2, and 0.5 µl Taq DNA polymerase.
4.4.5. Purification of DNA by Electrophoresis
1. Resolve PCR products and DNA size markers on a 12% native polyacrylamide gel and visualize the DNA by ethidium bromide staining.
2. Excise the DNA band from the gel and cut the gel slice into many small pieces.
3. Add 500 µl of 300 mM sodium acetate (pH 5.2) and shake at 37°C for 1 h.
4. Pass the solution through glass wool, extract twice with equal volumes of chloroform/phenol, and precipitate with two volumes of 100% ethanol.
5. Dissolve the DNA pellet in 50 µl of water.
4.4.6. [α-32P] dATP Incorporation Assay
1. Mix 12.2 µl H2O, 1 µl DNA (see Subheading 4.4.5), 1.5 µl 10× PCR buffer (200 mM Tris–HCl, 500 mM KCl, and 15 mM MgCl2, pH 8.4), 0.2 µl [α-32P] dATP, 0.4 µl microP2 (10 µM), and 0.2 µl Taq DNA polymerase.
2. Perform one PCR cycle (94°C for 90 s, 55°C for 30 s, and 72°C for 10 s).
3. Add 15 µl of 2× loading buffer and resolve 5 µl of the reaction on a 15% polyacrylamide gel containing 42% urea.
4. Visualize the radioactive signals with a PhosphorImager.
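The interpretation of the extension products can be modeled in a few lines. When dATP is the only nucleotide, Taq extends the adaptor primer only across template positions derived from U residues at the 3′ end of the miRNA, and it stops at the first other base. The Python sketch below is a simplified model under two assumptions: any U tail sits directly against the adaptor, and extension across it is complete. The function names are hypothetical. The miR173 sequence from the Materials is used only as a stand-in, because the miR167 sequence is not given here.

```python
from collections import Counter


def a_residues_added(insert_seq: str) -> int:
    """Number of A residues incorporated opposite the 3'-terminal run of U/T.

    insert_seq: the miRNA plus any 3' tail, written 5'->3', as it sits directly
    upstream of the 3' adaptor. Extension stops at the first non-U/T template base.
    """
    seq = insert_seq.upper().replace("T", "U")
    return len(seq) - len(seq.rstrip("U"))


def extension_ladder(inserts: list[str]) -> Counter:
    """Histogram of extension lengths (A residues added) across a pool of templates."""
    return Counter(a_residues_added(s) for s in inserts)


MIR = "UUCGCUUGCAGAGAGAAAUCAC"  # stand-in sequence (synthetic miR173 from the Materials)
wild_type_like = [MIR] * 4                                   # untailed: no extension
hen1_like = [MIR, MIR + "U", MIR + "UU", MIR + "UU", MIR + "UUUU"]  # hypothetical tails
print(extension_ladder(wild_type_like))
print(extension_ladder(hen1_like))
```

The wild-type-like pool yields only zero-length extensions. The tailed pool yields a ladder, which mirrors the Ler and hen1-2 lanes in Fig. 3b.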
5. Notes
1. The molecular weight of the biotinylated probe should differ substantially from that of the miRNA to be isolated. The biotinylated probe is inevitably co-eluted with the miRNA during purification, and this difference prevents it from interfering with the mass spectrometry analysis of the miRNA.
2. It is convenient to make a 1 l stock without APS and TEMED. The stock can be stored at 4°C in the dark.
3. To make borax/boric acid buffer (0.06 M, pH 8.6), prepare 0.06 M borax and 0.06 M boric acid, then use the borax solution to adjust the pH of the boric acid solution to 8.6.
4. Keep sodium periodate in the dark, as it is light-sensitive.
5. Because this experiment studies the 3′ terminus of the miRNA, the miRNA-specific primer should correspond to the 5′ portion of the miRNA.
6. Do not dry the RNA pellet completely, as this greatly decreases its solubility.
7. To obtain enough miRNA for mass spectrometry analysis, scale up the starting amount of total RNA from the amount described here.
8. Save the supernatant from step 3 until you are certain that satisfactory binding and elution of the miRNA have occurred.
9. The current for the transfer is 2 mA per cm² of membrane, but this needs to be determined experimentally for other transfer apparatus.
References
1. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
2. Grishok A, Pasquinelli AE, Conte D, Li N, Parrish S, Ha I, Baillie DL, Fire A, Ruvkun G, Mello CC (2001) Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106:23–34
3. Hutvágner G, McLachlan J, Pasquinelli AE, Balint É, Tuschl T, Zamore PD (2001) A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293:834–838
4. Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, Kim VN (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425:415–419
5. Ketting RF, Fischer SE, Bernstein E, Sijen T, Hannon GJ, Plasterk RH (2001) Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15:2654–2659
6. Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
7. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
8. Lobbes D, Rallapalli G, Schmidt DD, Martin C, Clarke J (2006) SERRATE: a new player on the plant microRNA scene. EMBO Rep 7:1052–1058
9. Yang L, Liu Z, Lu F, Dong A, Huang H (2006) SERRATE is a novel nuclear regulator in primary microRNA processing in Arabidopsis. Plant J 47:841–850
10. Fang Y, Spector DL (2007) Identification of nuclear dicing bodies containing proteins for microRNA biogenesis in living Arabidopsis plants. Curr Biol 17:818–823
11. Song L, Han MH, Lesicka J, Fedoroff N (2007) Arabidopsis primary microRNA processing proteins HYL1 and DCL1 define a nuclear body distinct from the Cajal body. Proc Natl Acad Sci U S A 104:5437–5442
12. Basyuk E, Suavet F, Doglio A, Bordonne R, Bertrand E (2003) Human let-7 stem-loop precursors harbor features of RNase III cleavage products. Nucleic Acids Res 31:6593–6597
13. Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X (2005) Methylation as a crucial step in plant microRNA biogenesis. Science 307:932–935
14. Yang Z, Ebright YW, Yu B, Chen X (2006) HEN1 recognizes 21–24 nt small RNA duplexes and deposits a methyl group onto the 2′ OH of the 3′ terminal nucleotide. Nucleic Acids Res 34:667–675
15. Chen X, Liu J, Cheng Y, Jia D (2002) HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129:1085–1094
16. Boutet S, Vazquez F, Liu J, Beclin C, Fagard M, Gratias A, Morel JB, Crete P, Chen X, Vaucheret H (2003) Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13:843–848
17. Li J, Yang Z, Yu B, Liu J, Chen X (2005) Methylation protects miRNAs and siRNAs from a 3′-end uridylation activity in Arabidopsis. Curr Biol 15:1501–1507
18. Alefelder S, Patel BK, Eckstein F (1998) Incorporation of terminal phosphorothioates into oligonucleotides. Nucleic Acids Res 26:4983–4988
Chapter 11
MicroRNA Promoter Analysis
Molly Megraw and Artemis G. Hatzigeorgiou
Abstract
In this chapter, we present a brief overview of current knowledge about the promoters of plant microRNAs (miRNAs) and provide a step-by-step guide for predicting plant miRNA promoter elements using known transcription factor binding motifs. The approach to promoter element prediction is based on a carefully constructed collection of Positional Weight Matrices (PWMs) for known transcription factors (TFs) in Arabidopsis. A key concept of the method is to use scoring thresholds for potential binding sites that are appropriate to each individual transcription factor. The procedure can be applied to search for Transcription Factor Binding Sites (TFBSs) in any pol-II promoter region. It is particularly practical for plant miRNA promoters, whose upstream sequence regions and binding sites are not readily available in existing databases. Most of the material described in this chapter is available for download at http://microrna.gr.
Key words: MicroRNA, Transcription factors, Promoter, Sequence scanning, Position-specific weight matrices
1. Overview
Plant miRNAs are primarily encoded in intergenic regions. They down-regulate gene expression by guiding an Argonaute protein complex to slice highly complementary target mRNAs (1). They are located inside longer transcripts, which are transcribed by RNA polymerase II (pol-II). Much effort has focused on elucidating how miRNAs regulate their target genes, but relatively little is known about the regulation of the miRNA genes themselves. The nature of miRNA promoter elements remains one of the most interesting open problems in the study of miRNA biogenesis, since identifying them would aid the understanding of the regulatory networks in which miRNAs play a crucial role.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_11, © Humana Press, a part of Springer Science + Business Media, LLC 2009
Megraw and Hatzigeorgiou
While TFBS prediction in vertebrates is greatly facilitated by hundreds of transcription factor binding motifs characterized as Positional Weight Matrices (PWMs) in the database TRANSFAC (2), and several plant databases contain predicted sites within the regions upstream of protein-coding loci (3–5), we found that searching for TFBSs upstream of noncoding loci within plant genomes essentially meant starting from scratch with few available PWMs and without standardized tools for sequence extraction or analysis of intergenic regions. In Megraw et al. (6), we describe how we constructed our own collection of 99 PWMs for known transcription factors in Arabidopsis and implemented a log-likelihood-based scoring function to identify putative binding sites for these factors. Furthermore, we discovered that the use of PWM-specific threshold values is superior to the common procedure of using a single score threshold across all PWMs, and that this method was key to obtaining meaningful results in our analysis of Arabidopsis miRNA promoter regions. The purpose of this chapter is to provide a step-by-step guide to TFBS prediction for plant researchers, a guide that takes advantage of this procedure. Specifically, we present a practical statistical method, which allows the researcher to obtain a set of predicted Transcription Factor Binding Sites (TFBSs), which are at least as likely as any trusted collection of sites. This collection may be experimentally determined (7) or include predicted sites (3–5). The central idea of this method is to use cutoff thresholds appropriate to each individual TF as well as to background sequence composition. The method is conceptually simple and easy to use. While the procedure can generally be applied to search for TFBSs in any pol-II promoter region, it is particularly practical for the case of plant miRNA promoters where upstream regions and binding sites are not readily available in existing databases.
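The Python sketch below makes log-likelihood scoring with TF-specific thresholds concrete. It is one simple reading of the approach, not necessarily the exact procedure of Megraw et al. (6), and all names in it are hypothetical:

- Each window is scored as the sum over positions of log2(P_PWM(base) / P_background(base)).
- The threshold for a TF is the lowest score among its trusted sites, so every reported hit scores at least as high as the weakest trusted site.
- The PWM is the Fig. 1 example smoothed with 0.25 pseudocounts.
- The background splits the 64.0% genomic A-T content reported in Subheading 3.2 evenly between A and T. This even split is an assumption.
- Only the forward strand is scanned.

```python
import math

PWM = list[dict[str, float]]  # one {base: probability} column per motif position

# Fig. 1 example motif after adding 0.25 pseudocounts to every count.
PWM_FIG1: PWM = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.25, "C": 0.05, "G": 0.05, "T": 0.65},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
BACKGROUND = {"A": 0.32, "C": 0.18, "G": 0.18, "T": 0.32}  # assumed 64% A-T
TRUSTED = ["CCATTG", "CCATTG", "CCATTG", "CCAATG"]          # Fig. 1 sites


def log_likelihood_score(site: str, pwm: PWM, background: dict[str, float]) -> float:
    """Sum of log2(P_motif(base) / P_background(base)) over motif positions."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(math.log2(col[b] / background[b]) for col, b in zip(pwm, site.upper()))


def tf_threshold(trusted_sites: list[str], pwm: PWM, background: dict[str, float]) -> float:
    """TF-specific cutoff: the lowest score among trusted sites (an assumed rule)."""
    return min(log_likelihood_score(s, pwm, background) for s in trusted_sites)


def scan(seq: str, pwm: PWM, background: dict[str, float],
         threshold: float) -> list[tuple[int, float]]:
    """Forward-strand scan; returns (0-based start, score) for windows >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if set(window) <= set("ACGT"):  # skip windows containing N or other symbols
            score = log_likelihood_score(window, pwm, background)
            if score >= threshold:
                hits.append((i, score))
    return hits


threshold = tf_threshold(TRUSTED, PWM_FIG1, BACKGROUND)
for start, score in scan("AAACCATTGAAA", PWM_FIG1, BACKGROUND, threshold):
    print(start, round(score, 2))
```

Because the threshold is derived separately for each PWM, a long, highly specific motif and a short, degenerate one are each held to a standard set by their own trusted sites. A single global cutoff cannot do this. A complete scanner would also score the reverse complement of each window.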
2. Materials
1. PC/Mac with Java installed (http://www.java.com/en/download/index.jsp), in order to use an existing computer program to perform all required calculations. Alternatively, the reader can implement the necessary calculations.
2. Source of transcription factor binding motifs for TFs of interest. Alternatively, begin with representations already constructed at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
3. Source of miRNA promoter sequences in FASTA format. Alternatively, begin with experimentally supported miRNA promoter sequences available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
MicroRNA Promoter Analysis
3. Methods
The methods described below outline (1) the construction of a Positional Weight Matrix (PWM) representation of a TF binding motif, (2) the construction of a background model, (3) the choice of a threshold for each TF, (4) the choice of promoter regions and methods of sequence extraction, and (5) scanning promoter regions for putative TFBSs.
3.1. Construction of PWMs for Each TF
A PWM is a standard way of representing TF binding motifs (8). A PWM represents each TF binding motif as a simple probability model that describes the chance of finding a particular nucleotide (A, C, G, or T) at a particular position of the motif. The model is derived from a matrix of the number of occurrences of each nucleotide at each position of the motif. Fig. 1 illustrates this concept. To build a PWM for a particular TF of interest, follow these steps:
1. List the sequences of acceptable binding sites for the TF. Ideally, each sequence represents one experimentally supported binding observation at a particular site. If multiple binding observations occur at sites with the same sequence, list each one separately. If some instances of the motif contain more positions than others, choose a common "core" of meaningful positions (see Note 1).
2. Using this list of sequences, count the number of A's, C's, G's, and T's at each sequence position. Record the counts in a table similar to the one shown in Fig. 1.
3. To ensure that no entry of the PWM will be exactly zero, add a small pseudocount (e.g., 0.25 counts) to each entry of the count matrix (see Note 2).
4. For each column of the table (sequence position), add up the counts in the column.
5. Divide each entry (nucleotide) of each column by the column total computed in step 4. A frequency matrix similar to the one shown in Fig. 1 will result. As a check, the entries in each column should now sum to one.
We used this procedure to construct PWMs representing 99 TF binding motifs in Arabidopsis from the AGRIS (3) and AtProbe (7) databases. Sites obtained from AtProbe are experimentally supported observations that have been manually curated from the literature, whereas the AGRIS database includes sites that have not been experimentally validated.
A command-line Java tool to construct PWMs from sequences, as well as the entire PWM collection constructed in our study (6), is freely available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
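Steps 1–5 translate directly into code. The following is an illustrative Python sketch, not the authors' Java tool. The function name and the 0.25 default pseudocount (taken from step 3) are our choices. Passing pseudocount=0 reproduces the frequency matrix shown in Fig. 1 from its four example sites.

```python
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.25) -> list[dict[str, float]]:
    """Position frequency matrix from aligned binding sites (steps 1-5)."""
    if not sites:
        raise ValueError("need at least one binding site")
    width = len(sites[0])
    for s in sites:  # step 1: all sites must share a common core
        if len(s) != width:
            raise ValueError("all sites must have the same core length (see Note 1)")
        if set(s.upper()) - set(ALPHABET):
            raise ValueError("sites may contain only A, C, G and T")
    pwm = []
    for pos in range(width):
        counts = Counter(s[pos].upper() for s in sites)                 # step 2
        column = {b: counts.get(b, 0) + pseudocount for b in ALPHABET}  # step 3
        total = sum(column.values())                                    # step 4
        pwm.append({b: c / total for b, c in column.items()})           # step 5
    return pwm


sites = ["CCATTG", "CCATTG", "CCATTG", "CCAATG"]  # the Fig. 1 example sites
for base in ALPHABET:
    print(base, [round(col[base], 2) for col in build_pwm(sites, pseudocount=0)])
```

With the default pseudocount, every column total is 4 + 4 × 0.25 = 5. For example, the fully conserved C at position 1 becomes 4.25/5 = 0.85 rather than 1, and every absent base becomes 0.25/5 = 0.05 rather than 0.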
Megraw and Hatzigeorgiou
[Fig. 1 panels. Acceptable binding sites for a TF: CCATTG (three sites) and CCAATG (one site). Positional weight matrix for the TF:]
Position  1  2  3  4    5  6
A         0  0  1  .25  0  0
C         1  1  0  0    0  0
G         0  0  0  0    0  1
T         0  0  0  .75  1  0
Fig. 1. Given a collection of sequences, which represent observed binding sites for a TF, a PWM “counts up” the number of A’s, C’s, G’s, and T’s in each position in order to describe the chance of finding each nucleotide in this position. The PWM can be visualized as a “logo” which describes how often the TF is expected to bind certain types of sites.
3.2. Construction of a Background Model
There is a noted compositional bias toward A-T enrichment in plant promoters (9), and we also observed such a bias in the Arabidopsis promoter sets that we examined in our study (6). Specifically, we observed that the TAIR6 genome build contains a mean A-T content of 64.0%, while our miRNA and protein-coding gene promoter sets in the region from the TSS to 800 bp upstream contain mean A-T contents of 68.9 and 67.8%, respectively. The similarity in base composition between the miRNA promoter and protein-coding gene promoter sets supports the idea that an A-T rich background is a biologically meaningful aspect of Arabidopsis pol-II promoter regions. These observations are directly relevant to PWM-based TFBS prediction within plant miRNA promoters. To decide whether a subsequence within a promoter region is a good candidate binding site for a particular TF, we would intuitively want to know whether this sequence is more likely to be observed under the PWM model than simply as a result of the underlying nucleotide distribution within promoter regions. This is exactly the question addressed by the likelihood scoring method described in the following section. To ask this question, we first need to estimate the underlying nucleotide distribution, or background model. A straightforward practical model is to assume
MicroRNA Promoter Analysis
Background frequency vector: A .34, C .16, G .16, T .34
Fig. 2. Example of a simple background model, computed as a background frequency vector. This example reflects the A-T rich nature of Arabidopsis promoter sequences.
$$\text{Score} = \log \frac{P(\text{Sequence } S \text{ is observed under PWM model } M)}{P(\text{Sequence } S \text{ is observed under background model } B)} = \log \frac{\prod_{i=1}^{L} P_M(s_i)}{\prod_{i=1}^{L} P_B(s_i)} = \sum_{i=1}^{L} \left( \log P_M(s_i) - \log P_B(s_i) \right)$$
where $S = s_1 s_2 s_3 \ldots s_L$ with $s_i \in \{A, C, G, T\}$, $P_M(s_i)$ denotes the probability of observing base $s_i$ in position $i$ of PWM model $M$, and $P_B(s_i)$ denotes the probability of observing base $s_i$ in position $i$ of background model $B$.
Fig. 3. Log-likelihood score function for comparing subsequence S to PWM M.
that nucleotides are drawn at random according to the mean observed frequencies for each nucleotide within a large collection of promoter regions. Figure 2 gives an example of a background frequency vector computed for Arabidopsis promoter regions. This frequency vector is available for use with the TFBS scoring and scanning programs at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/. Constructing a new background frequency vector for another species is straightforward:
1. Obtain a set of promoter sequences for pol-II genes, ideally as large a set as possible with as much experimental support as possible.
2. Compute and record the observed frequency of each nucleotide in this set.
3.3. Threshold Choice for Each TF
In our promoter element search, we use a log-likelihood scoring function to scan upstream sequences for potential TF binding sites (Fig. 3), which is a standard PWM-based scoring approach (10). Using the log-likelihood score equation in Fig. 3, the score for each binding sequence can be computed as follows. For each position i = 1…L of the sequence:
1. Determine the nucleotide at position i (A, C, G, or T). Call this nucleotide si.
2. Look up the table of frequencies given by the PWM: for position i, what is the frequency of nucleotide si? This is the value found in the ith column of the PWM at the row associated with nucleotide si. Call this frequency PM(si).
3. Look up the vector of background frequencies: what is the background frequency of nucleotide si? Call this frequency PB(si).
4. Compute the value of the term log(PM(si)) − log(PB(si)), where log refers to the natural logarithm.
Add up the L computed terms to get the Score for this sequence.
Intuitively, the log-likelihood score compares the probability of observing a particular subsequence according to our PWM model to the probability of observing that subsequence according to a background model. A high score indicates a good match to the TF binding motif, but how high must this score be for us to conclude that the subsequence is a binding site for that TF? The need for a meaningful threshold can be addressed with the following simple procedure for each TF of interest:
1. Choose a trusted collection of binding sites for the TF. This collection could include experimentally verified and/or putative sites, and/or sites that were not used to construct the PWM for this TF. The important point is to choose a collection that represents one’s best estimate of the full range of sites where the TF can bind.
2. List the possible binding sequences for this collection of sites. Even if many sites are associated with the same binding sequence, it is only necessary to list each binding sequence once.
3. Compute the log-likelihood score for each binding sequence in the list as described above (Fig. 3).
4. Choose the lowest Score from the list as the Threshold Score.
The consequence of this procedure is that, when scanning a particular set of upstream sequences, we will only “discover” a binding site occurrence of a particular TF if its PWM-based log-likelihood score is at least as high as the score of the weakest trusted binding site for that TF. In other words, every trusted binding sequence scores at or above the threshold and would itself be detected.
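The background model of Subheading 3.2, the Fig. 3 score, and the threshold rule above can be sketched together in Python. This is an illustrative sketch under our own assumptions, not the authors’ Java programs. The function names are hypothetical, and the PWM uses a hypothetical {nucleotide: [frequencies]} layout. The sketch assumes the PWM has no zero entries (pseudocounts, Note 2) and that scored sequences contain only A, C, G, and T.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def background_frequencies(promoters):
    """Subheading 3.2: observed frequency of each nucleotide across a promoter set."""
    counts = Counter()
    for seq in promoters:
        counts.update(nt for nt in seq.upper() if nt in NUCLEOTIDES)  # ignore N, etc.
    total = sum(counts.values())
    return {nt: counts[nt] / total for nt in NUCLEOTIDES}


def log_likelihood_score(subseq, pwm, background):
    """Fig. 3: sum over positions i of ln P_M(s_i) - ln P_B(s_i)."""
    subseq = subseq.upper()
    if len(subseq) != len(pwm["A"]):
        raise ValueError("subsequence length must equal the PWM length")
    return sum(
        math.log(pwm[nt][i]) - math.log(background[nt])
        for i, nt in enumerate(subseq)
    )


def threshold_score(trusted_sites, pwm, background):
    """Subheading 3.3: lowest score among the distinct trusted binding sequences."""
    distinct = {s.upper() for s in trusted_sites}  # each binding sequence listed once
    return min(log_likelihood_score(s, pwm, background) for s in distinct)


if __name__ == "__main__":
    background = {"A": 0.34, "C": 0.16, "G": 0.16, "T": 0.34}  # Fig. 2 vector
    # Hypothetical 2-position PWM with pseudocounts already applied.
    pwm = {"A": [0.7, 0.1], "C": [0.1, 0.1], "G": [0.1, 0.1], "T": [0.1, 0.7]}
    print(log_likelihood_score("AT", pwm, background))
    print(threshold_score(["AT", "AT", "AA"], pwm, background))
```

`threshold_score` deduplicates the trusted sequences before scoring, mirroring step 2 of the threshold procedure. Duplicates would not change the minimum in any case.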
If more stringent site selection is desired, putative sites can later be filtered according to their scores. A command-line Java program for performing this thresholding procedure, as well as the set of thresholds determined by this procedure for the 99 PWMs constructed in our Arabidopsis miRNA promoter study (6), are freely available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
3.4. Selection of Promoter Regions
Binding sites predicted on the basis of sequence are much more likely to be functional in the promoter regions of genes than elsewhere. It is therefore important to estimate the true promoter region using as much evidence about TSS location as possible. Ideally, evidence from a 5′ cap detection method is available for the miRNA transcript of interest. Recent plant promoter prediction methods for pol-II genes report approximately 60–80% accuracy for miRNA promoter prediction (11, 12), and predicted promoters for some miRNAs can be obtained online from these sources (see Note 3). Whenever evidence for the TSS location is available, it is important to obtain accurate information on the genomic location of the TSS and to extract the sequence from the most recent plant genome build.
The identification of 63 miRNA transcription start sites (TSSs) in Arabidopsis (13) via 5′ RACE recently opened an exciting opportunity for computational analysis of miRNA promoter regions in plants. Data from these experimentally supported TSSs suggest that in plants, the location of the mature miRNA generally falls within about 300 nt of the downstream-most TSS. Therefore, in the absence of any additional information, the location of the mature miRNA itself can provide a first, very rough approximation for an appropriate upstream search region. As functional TFBSs are unlikely to reside within the miRNA precursor, it is also helpful to obtain estimates of the miRNA precursor location from miRBase (14) or via RNA secondary structure prediction (see Note 5).
A second concern in selecting promoter regions for TFBS prediction is how much upstream sequence to search. While enhancer elements for plant genes may be present several kilobases upstream of the TSS, we found that the vast majority of experimentally supported promoter elements reported in the literature for Arabidopsis protein-coding genes fall within 800 nt of the annotated TSS. This may be due in part to the fact that many experiments focus on the region immediately proximal to the annotated TSS. Nevertheless, a search region of approximately 1 kb upstream provides a reasonable starting point, within which functional binding sites for many known TFs are likely to be found. As part of our study on Arabidopsis miRNA promoters, we performed the necessary sequence extraction steps for all experimentally supported miRNA promoters from Xie et al. (13).
We selected a length of 800 nt upstream of each TSS and have made the genomic locations and sequences for these promoters available in FASTA format at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/. Multiple alternative TSSs were indicated by the 5′ RACE experiments for about 20% of the miRNAs in this study; in these cases, we selected the downstream-most TSS in order to provide the most comprehensive TFBS search region. We summarize the current options for obtaining upstream sequence here:
1. If the miRNA of interest has an experimentally supported TSS identified by Xie et al. (13), obtain its genomic location and/or download the promoter sequence directly from http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
2. If an experimentally supported TSS or a predicted TSS is available from another source, but the desired length of upstream sequence is not directly available, use one of two options to obtain the sequence:
(a) If the genomic location of the TSS is directly available, identify the genome build to which this location refers and download the desired length of upstream sequence (see Note 4). For Arabidopsis, this task can be performed at the TAIR Web site http://www.arabidopsis.org/.
(b) If the genomic location of the TSS itself is not available but some portion of upstream or surrounding sequence is available, use a BLAST tool to find the best-matching location for this TSS within the upstream vicinity of the miRNA. For Arabidopsis, TAIR BLAST is available at http://www.arabidopsis.org/Blast/index.jsp.
3. If no additional source for estimating TSS location is available for the miRNA of interest, locate the miRNA precursor and use the upstream-most endpoint of the hairpin foldback as the first approximation for the TSS. Genomic locations for miRNA precursors are available in miRBase for several plant species at http://microrna.sanger.ac.uk/sequences/ftp.shtml.
4. If the miRNA of interest is not contained in miRBase and no additional source for estimating TSS location is available, two options remain:
(a) Use the location of the mature miRNA sequence as a very rough approximation of the TSS.
(b) Predict the hairpin foldback for this miRNA using an RNA secondary structure prediction tool (see Note 5). Use the upstream-most endpoint of the predicted hairpin foldback as an approximation for the TSS.
5. As many TFs do not show a marked strand bias for binding, it is also useful to create a reverse-complement set of sequences for TFBS prediction. After obtaining a forward-strand set of promoter sequences as described in steps 1–4, a reverse-complement set can be obtained using a sequence processing tool such as the one available at http://www.bioinformatics.org/sms2/rev_comp.html.
3.5. Scanning Promoter Regions for Putative TFBSs
The log-likelihood scoring function and associated TF-specific threshold score are used to scan miRNA promoter sequences for putative TF binding site occurrences, as illustrated in Fig. 4. Sites whose scores meet or exceed the PWM-specific threshold score for a TF are considered to be putative TFBSs. Use the following steps to perform the scan for each TF and each upstream sequence selected above. In these steps, suppose the PWM for a particular TF has L positions and the upstream region under consideration is of length N.
[Fig. 4 schematic: the PWM is aligned with successive windows of the promoter sequence ACGGTACCTATTGACGAGCTCCAATGTGA; window scores are plotted on a scale from low to high, and a window scoring above the threshold is labeled “Observed Site.”]
Fig. 4. An illustration to visualize scanning for binding sites: only those sites in a promoter sequence that exceed the PWM-specific threshold score are “observed” as putative binding sites.
1. For the subsequence or “window” of length L that starts at the first nucleotide of the upstream region, compute the log-likelihood Score for the TF as described in Subheading 3.3 (Fig. 3).
2. If the Score is greater than or equal to the Threshold Score previously determined for this TF, record the location and Score for this window as a putative TFBS.
3. Repeat this procedure for each of the remaining windows in the upstream region, shifting the window by one nucleotide at a time; a region of length N contains N − L + 1 windows in total.
As illustrated in Fig. 4, the idea of this routine is effectively to “slide” the PWM along the upstream sequence and ask at each window location, “does this site match the PWM well enough to be considered a putative TFBS?” The procedure for computing the score at each position is exactly the same as the procedure for computing the score of each binding sequence when choosing a TF threshold. A command-line Java program for performing this sequence scanning procedure is freely available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
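The scanning loop, together with the reverse-complement step from Subheading 3.4, can be sketched as follows. It is a self-contained illustration, not the authors’ scanner. The function names are hypothetical and window starts are 0-based. Minus-strand positions refer to the reverse-complemented sequence. Skipping windows that contain characters other than A, C, G, or T (such as N) is our own choice.

```python
import math

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def window_score(window, pwm, background):
    """Fig. 3 log-likelihood score of one window (natural logarithm)."""
    return sum(math.log(pwm[nt][i]) - math.log(background[nt])
               for i, nt in enumerate(window))


def scan(sequence, pwm, background, threshold):
    """Slide an L-position PWM along a length-N sequence (N - L + 1 windows)
    and return (start, score) for every window scoring >= threshold."""
    seq = sequence.upper()
    length = len(pwm["A"])
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if any(nt not in "ACGT" for nt in window):
            continue  # skip windows containing ambiguous bases
        score = window_score(window, pwm, background)
        if score >= threshold:
            hits.append((start, score))
    return hits


def scan_both_strands(sequence, pwm, background, threshold):
    """Scan the forward strand and its reverse complement."""
    return {
        "+": scan(sequence, pwm, background, threshold),
        "-": scan(reverse_complement(sequence), pwm, background, threshold),
    }


if __name__ == "__main__":
    background = {nt: 0.25 for nt in "ACGT"}
    pwm = {"A": [0.7, 0.1], "C": [0.1, 0.1], "G": [0.1, 0.1], "T": [0.1, 0.7]}
    hits = scan_both_strands("GGATCC", pwm, background, threshold=2.0)
    print([start for start, _ in hits["+"]])  # prints [2]
```

Because the window test uses `>=`, a window scoring exactly at the threshold is reported, matching step 2.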
4. Discussion
In this chapter, we discuss the computational identification of TFBSs for miRNA promoters in plants. In a recent study (6), we used the method described above to analyze the promoter regions defined by the 63 experimentally verified miRNA TSSs in Arabidopsis (13). That study (13) remains the most comprehensive published source of experimentally supported miRNA TSSs to date, though additional data are anticipated to become available soon through deep sequencing of small RNA cDNA libraries (15). Using this published source, we analyzed regions up to 800 nt upstream of the miRNA TSSs to search for known transcription factor binding elements. The goal of our investigation was to determine whether “miRNA-preferred” transcription factor binding elements exist in plants.
In our analysis (6), we observed a predominance of TATA-containing miRNA promoters and found that miRNA promoters in general have an AT-rich base composition similar to that of their protein-coding counterparts in plants. However, while many of the same transcription factor binding elements were found in Arabidopsis miRNA promoters as in protein-coding gene promoters, the distribution of these elements differed between the two promoter types. Binding sites for several factors (AtMYC2, ARF, SORLREP3, and LFY) were found significantly more frequently in miRNA promoters, and these “miRNA-preferred” factors were consistent with prominent roles for miRNAs in organism development and adaptation. This investigation also suggested that miRNAs may be involved in direct feedback loops with hormonal regulators in plants (Fig. 5). An anecdotal example of such a case was found and described in (6): two miRNAs with putative ARF binding sites upstream, miR160 and miR167, have targets belonging to the ARF gene family. These targets were not only computationally predicted but were also found to have experimental evidence for binding recorded in TarBase (16), a database of experimentally supported interactions between miRNAs and mRNAs.
While our computational study yielded some initial insights into the nature of plant miRNA promoters, a vast amount of work still remains to be done in this new area of investigation. In many cases, the researcher will need to begin by predicting TFBSs within known or putative miRNA promoter regions. This collection
[Fig. 5 schematic: a transcription factor (TF) promotes transcription of a miRNA from DNA; the miRNA is transcribed and processed into its mature sequence; the mature miRNA binds and represses translation of the TF mRNA, reducing the amount of TF available to promote miRNA transcription.]
Fig. 5. An illustration of a miRNA and a transcription factor in a negative feedback loop. A TF X activates a miRNA Y. After miRNA Y is transcribed, it targets the messenger RNA of TF X. This reduces the amount of TF X protein produced, in turn lowering the expression of the transcript hosting miRNA Y.
of sites can then be used as a starting point for the identification of candidate regions for ChIP-on-Chip experiments, for the incorporation of gene expression data in order to increase confidence in specific sites, for forming hypotheses about genetic pathways of interest, and for many other useful follow-on analyses.
5. Notes
1. In the framework of this method, PWMs are constructed from a contiguous set of aligned binding site positions. However, some instances of binding site motifs reported in databases or in the literature may have different lengths than others for the same TF. In this case, choose a contiguous set of positions that are common to these motifs, and discard positions for which there are missing data.
2. Even though a certain nucleotide may never appear in a particular position within the observed binding sequence data for a TF, it is understood that the observed data are unlikely to represent every possible binding site for that TF. Rather than represent the appearance of this nucleotide in such a position as an impossible event, a more realistic approach is to represent it as a very rare event. The addition of pseudocounts is a well-established practical method for addressing this situation when it is otherwise very difficult to estimate the frequency of such rare events. By adding a relatively small number of extra counts (this number can also be fractional) to each entry before summing up the total number of observations, “impossible events” are eliminated from the probability model and replaced in a proportional manner with rare events.
3. Solovyev and Shahmuradov (12) provide a web-based interface that accepts regions of sequence for promoter prediction (http://softberry.com/berry.phtml?topic=tssp&group=programs&subgroup=promoter). Zhou et al. (11) provide a set of predicted promoters for some miRNAs as supplementary material online (http://cic.cs.wustl.edu/microrna/ath_miRNA_promoter.fa).
4. If the miRNA of interest is on the “minus” strand of the chromosome, keep in mind that the start and end points of the desired upstream sequence segment will be annotated with numerically larger values than the genomic location of the TSS itself.
5. Several programs are available for secondary structure prediction. Among the best-known software packages are RNAfold (17) and Mfold (18). Bear in mind that installing, running, and selecting energy thresholds for these programs can be a nontrivial task when undertaken for the first time. Several programs have online versions available, and these are expedient when only a few sequences need to be folded; however, energy thresholds must still be appropriately chosen. Help from an informatics colleague with experience in running these programs is ideal for first-time users.
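The strand-dependent coordinate arithmetic described in Note 4 can be sketched as follows. This is a hypothetical helper built on our own conventions: coordinates are 1-based and inclusive, the TSS itself is excluded from the window, and the start is clipped at position 1 (and the end at the chromosome length, if supplied). For a minus-strand TSS, the extracted genomic segment must also be reverse-complemented so that it reads 5′ to 3′ toward the TSS.

```python
def upstream_region(tss, strand, length=800, chrom_length=None):
    """Return (start, end) of the `length` nt immediately upstream of a TSS.

    Hypothetical helper. Plus strand: upstream lies at smaller coordinates.
    Minus strand: upstream lies at numerically larger coordinates (Note 4).
    """
    if strand == "+":
        start, end = tss - length, tss - 1
    elif strand == "-":
        start, end = tss + 1, tss + length
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(start, 1)  # do not run off the chromosome start
    if chrom_length is not None:
        end = min(end, chrom_length)  # do not run off the chromosome end
    return start, end


if __name__ == "__main__":
    print(upstream_region(10_000, "+"))  # prints (9200, 9999)
    print(upstream_region(10_000, "-"))  # prints (10001, 10800)
```

Whether the TSS base is included in the window is a convention choice. If your genome browser or extraction tool expects 0-based or half-open coordinates, adjust accordingly.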
Acknowledgments
The authors thank Shane Jensen, Vesselin Baev, Ventsislav Rusinov, and Kriton Kalantidis for their contributions to the study from which this work was derived. This work was supported by an NSF Career Award (DBI-0238295).
References
1. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
2. Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28:316–319
3. Davuluri RV, Sun H, Palaniswamy SK, Matthews N, Molina C, Kurtz M, Grotewold E (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics 4:25
4. Steffens NO, Galuschka C, Schindler M, Bulow L, Hehl R (2005) AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana. Nucleic Acids Res 33:W397–W402
5. O’Connor TR, Dyreson C, Wyrick JJ (2005) Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences. Bioinformatics 21:4411–4413
6. Megraw M, Baev V, Rusinov V, Jensen ST, Kalantidis K, Hatzigeorgiou AG (2006) MicroRNA promoter element discovery in Arabidopsis. RNA 12:1612–1619
7. Hoffman M, Zhang MQ (2001) AtProbe: Arabidopsis thaliana promoter binding element database. http://exon.cshl.org/cgi-bin/atprobe/atprobe
8. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16:16–23
9. Pandey SP, Krishnamachari A (2006) Computational analysis of plant RNA Pol-II promoters. Biosystems 83:38–50
10. Durbin R, Eddy S, Krogh A, Mitchison G (1999) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
11. Zhou X, Ruan J, Wang G, Zhang W (2007) Characterization and identification of microRNA core promoters in four model species. PLoS Comput Biol 3:e37
12. Solovyev VV, Shahmuradov IA (2003) PromH: promoters identification using orthologous genomic sequences. Nucleic Acids Res 31:3540–3545
13. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of Arabidopsis MIRNA genes. Plant Physiol 138:2145–2154
14. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34:D140–D144
15. Lu C, Meyers BC, Green PJ (2007) Construction of small RNA cDNA libraries for deep sequencing. Methods 43:110–117
16. Sethupathy P, Corda B, Hatzigeorgiou AG (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA 12:192–197
17. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, Schuster P (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem 125:167–188
18. Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415
Chapter 12
Computational Methods for Comparative Analysis of Plant Small RNAs
Gayathri Mahalingam and Blake C. Meyers
Abstract
Small RNAs play an important role in plant development, stress responses, and epigenetic regulation, primarily through their role in transcriptional and post-transcriptional silencing of specific target genes and loci. Most if not all plants utilize these small RNA signaling networks. We have developed a deep-sequencing-based dataset of plant small RNAs, based on the hypothesis that comparisons among the complex pools of small RNAs from diverse plants will identify novel types of conserved, regulated, or species-specific molecules. A database containing hundreds of millions of plant small RNA sequences is being created for comparative analyses. This small RNA database will allow the experimental characterization of the majority of the biologically important small RNAs for a range of plant species. The database can be accessed from our website (http://smallrna.udel.edu/). A variety of web-based tools have been developed for analyses of these data. Here, we focus on these tools: we describe how users can apply them to analyze and interpret the small RNA data, and how similar approaches could be used for other sets of small RNAs from diverse plant species.
Key words: Small RNA; Comparative analysis
1. Introduction
1.1. Small RNAs
Two major types of small RNAs (21–24 nucleotides in size), known as small interfering RNAs (siRNAs) (1) and microRNAs (miRNAs) (2), are present in a wide variety of eukaryotic organisms (3, 4). miRNA molecules originate from distinct genomic loci predicted to form “hairpin” structures that are often imperfectly double-stranded (5). They are cleaved from the hairpin as a duplex by DICER-LIKE1 (DCL1), and the miRNA strand of this duplex becomes associated with an ARGONAUTE (AGO) protein in a complex called RISC (6, 7). In plants, miRNAs
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_12, © Humana Press, a part of Springer Science + Business Media, LLC 2009
Mahalingam and Meyers
almost always induce cleavage and accelerated degradation of their “target” mRNAs by forming base-pairing interactions (8, 9). They can also direct cleavage of non-coding RNAs to induce production of trans-acting siRNAs (ta-siRNAs) (10, 11). In addition, some miRNAs act by preventing mRNA translation and thereby limiting protein production (12, 13). In contrast to siRNAs, miRNAs usually do not match their target mRNA molecules perfectly (14–16). The biological roles of miRNAs are predominantly associated with development (4, 17, 18), but miRNAs also play a role in stress responses. Differential accumulation of miRNAs in different tissues is common, and many miRNAs target transcription factor mRNAs (19, 20). Moreover, developmental defects are associated with mutations in genes required for miRNA metabolism (10, 21–24). Although the association of small RNAs with viral resistance has been known for some time, more recent data demonstrate a role for miRNAs in plant disease resistance against bacterial and other pathogens (25, 26). Associations of miRNAs and natural antisense siRNAs with abiotic stresses are also well documented (27, 28). Functions of siRNAs include protection against viruses, mobile genetic elements, and other repetitive sequences (29, 30). In addition, several genes are regulated at the level of chromatin modifications and DNA methylation, and small RNAs play a major role in the establishment and maintenance of these marks (31, 32).
1.2. Conservation of Small RNAs in the Plant Kingdom
Cloning of miRNAs from moss and other lower land plants indicates that miRNAs are conserved over more than 400 million years of evolution (33–35). While these data suggest that many miRNAs are evolutionarily conserved, some miRNAs may not be conserved across species, such as the Arabidopsis miRNAs 158, 161, 163, and 173, which are not present in rice (16). At least two of these miRNAs have evolved recently, suggesting that other species-specific miRNAs are likely to be found (36); indeed, our recent deep sequencing in an rdr2 background demonstrates that many nonconserved miRNAs are found at low expression levels and have eluded detection because of these low abundance levels. The discovery of new miRNAs with unique characteristics, as well as more detailed studies of known miRNAs, will likely lead to new computational tests to enable miRNA discovery; examples of this have been published (37, 38).
1.3. Novel Technologies for Deep Sequencing of Small RNAs
Several approaches have emerged for sequencing of small RNAs (39). Previously, the standard method for the isolation and identification of small RNAs involved gel purification and cDNA production, followed by the cloning of single molecules or longer “concatemers” of these molecules, which were subsequently sequenced (40–42). The latest generation of sequencing technologies includes a method called Sequencing by Synthesis (SBS), developed
Computational Methods for Comparative Analysis of Plant Small RNAs
by Illumina, Inc. SBS enables deep sequencing at a low cost (>2,000,000 sequences per channel) while capturing the complete sequence of each small RNA. Several other technologies have been described or promised with potential read totals that meet or exceed those of SBS. The methods that we describe in this chapter all assume this level of sequencing depth; methods for the construction of the libraries are described in Chapter 8 of this volume.
1.4. Deep Sequencing to Identify Small RNAs in Diverse Plant Species
To generate whole-genome data on the diversity and abundance of siRNAs and miRNAs in plants, this chapter assumes the availability of a set of libraries representing several, preferably diverse, plant species. Because of the depth of sequencing, these data will define nearly the complete repertoire of endogenous small RNA molecules generated in these plant species. Our own project includes a broad selection of more than 30 plant species, but the methods and applications that we describe could be applied to a much smaller set of species.
1.5. Selection of Species from Across the Plant Kingdom
The set of small RNA libraries that we are developing is indicated in Fig. 1; this list includes nearly 100 libraries representing more than 33 plant species. To improve our ability to match small RNA sequences to their genomic sources, the selected set includes most plants for which extensive genomic resources are being developed. The species were selected on the basis of the following criteria:
1. Diversity in the plant kingdom. We have chosen species that represent either important families or important phylogenetic nodes. The Floral Genome Project has worked on many of these species for several years and has an extensive justification for its species selections, available on the web (http://fgp.bio.psu.edu/fgp/taxa/rationale.html). Their rationale includes the following: (a) phylogenetic position; (b) diversity of floral-organ structure; (c) direct relevance to crop or economic plants; (d) diploid with a small genome size; (e) availability of inbred lines, when possible; (f) desirable properties, such as large numbers of flowers per plant, transformability, and prior flower developmental studies; and (g) lack of prior genomic resources.
2. Economic importance. Because the focus of the NSF Plant Genome Research Program is on economically important crops, we have biased our selection in favor of these species. We have also selected two families for focused analyses: the Poaceae and the Solanaceae.
3. Research importance. While we have ongoing small RNA projects in many of the important model species (Arabidopsis, rice, Medicago truncatula), there are many other model species of growing importance.
[Fig. 12.1 phylogenetic tree. Species shown: Cucurbita maxima (pumpkin), Fabaceae (Medicago, soybean), Malus (apple), Populus (poplar), Gossypium (cotton), Citrus sinensis (orange), Arabidopsis, Silene alba, Vitis vinifera (grape), Ribes (currant), Antirrhinum (snapdragon), petunia, tobacco, pepper, tomato, potato, Lactuca sativa (lettuce), Mimulus guttatus, Vaccinium (blueberry), Mesembryanthemum (ice plant), Beta (beet), Eschscholzia (California poppy), Persea americana (avocado), Liriodendron (yellow poplar), Saruma henryi, Halophila decipiens (paddle grass), Crocus sativus, Cocos nucifera (coconut palm), Musa acuminata (banana), rice, switchgrass, maize, sorghum, barley, wheat, Acorus (sweet flag), Illicium (star anise), Nuphar advena (water lily), Amborella trichopoda, Cryptomeria (Japanese cedar), Welwitschia, Gnetum, Ephedra (Mormon tea), Pinus radiata, Ginkgo biloba, Zamia fischeri (cycad), Marsilea quadrifolia, Ceratopteris, Angiopteris, Psilotum, Selaginella, Lycopodium, Physcomitrella (moss), Anthoceros (hornwort), Marchantia (liverwort), Chara, Coleochaete, Mesostigma, Acetabularia, Caulerpa, Chlamydomonas, Volvox. Labeled groups: eudicots, Solanaceae, monocots, Poaceae, magnoliids, basal angiosperms, gymnosperms, ferns, non-seed land plants, green algae.]
Fig. 12.1. Phylogenetic tree of the selected plant species. Branch lengths are approximate. Gray text indicates species that will not be sampled (shown for reference only) or for which small RNA data are already available or underway. Asterisks indicate that whole genome sequences are available or underway (only Capsella rubella and Triphysaria are not shown); Arabidopsis has two sequences, A. lyrata and A. thaliana. Gymnosperm and angiosperm tree modified from the FGP; lower plant phylogeny is modified from http://www.greenbac.org/. Gray boxes indicate the families on which we are focused.
Computational Methods for Comparative Analysis of Plant Small RNAs
4. Availability of genomic sequence data. For the most part, these resources have been developed for the species selected for reasons 1–3.
2. Materials
2.1. Project Data
1. The database tables are built with MySQL on the public web server, for which we use a Dell PowerEdge 2950. The web interface and the tools, written mainly in PHP, extract the data requested by the user from the MySQL tables and display the query results as tables and as downloadable data sheets. See Note 1 for details on the implementation of the database.
2. The normalized and raw data are available from our website for those who want to perform their own analyses.
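Because the tools described below filter abundances in TPM (transcripts per million), readers working with the downloadable raw data may want to normalize it themselves. The following Python sketch assumes the usual definition, in which each raw count is divided by the library's total count and multiplied by one million; the chapter does not state the exact denominator used on the website, and the function name and example counts are hypothetical.

```python
def normalize_to_tpm(raw_counts: dict[str, int]) -> dict[str, float]:
    """Convert one library's raw signature counts to TPM.

    Assumption: TPM = count / (sum of all counts in the library) * 1,000,000.
    """
    total = sum(raw_counts.values())
    if total <= 0:
        raise ValueError("library contains no reads")
    return {seq: count * 1_000_000 / total for seq, count in raw_counts.items()}


# Hypothetical raw counts for a tiny three-signature library.
raw = {
    "UCGGACCAGGCUUCAUUCCCC": 300,
    "UGACAGAAGAGAGUGAGCAC": 100,
    "AGCUGCCGACUCAUUCAUCCA": 600,
}
tpm = normalize_to_tpm(raw)
print(tpm)
```

Normalizing each library separately is what makes abundances comparable across libraries of different sequencing depth, which the regulated-small-RNA comparison relies on.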
2.2. Analysis Tools
1. The tools run on the web server. They are written in PHP and retrieve data from the tables that reside on our MySQL server.
2. The tools can be accessed from our public server and provide a variety of ways to access and analyze the small RNA data from the species described above.
3. Methods
In this section, we give a detailed description of the tools that we have developed for the comparative analysis of the major classes of biologically important small RNAs across a range of plant species. The tools are as follows:
1. A tool for comparing small RNAs to genomic DNA sequences, reporting only perfect matches.
2. A tool for comparing small RNAs to either genomic DNA or other small RNAs, allowing mismatches.
3. A library comparison tool to identify small RNAs conserved in multiple libraries or different plant species.
4. A library comparison tool to identify small RNAs that show evidence of differential regulation in different tissues or species.
5. An miRNA (reverse) target prediction tool.
We encourage the reader to use our website for hands-on experience with these tools, which can be accessed from the home page of our NSF-funded project entitled "Comparative Sequencing of Plant Small RNAs" (http://smallrna.udel.edu).
3.1. A Tool for Perfect Match Comparisons of Small RNA vs. Genomic DNA
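The search performed by the perfect-match tool can be illustrated with a short Python sketch. It is not the site's PHP code: the names `parse_fasta` and `map_perfect`, the treatment of U as T, and the convention of reporting a minus-strand hit by its leftmost 1-based coordinate on the input sequence are assumptions made for this example. The sketch parses FASTA input (enforcing the 50,000-character limit) and reports every exact occurrence of each library small RNA on either strand, with the strand, coordinate, and abundance fields that the results table lists.

```python
from dataclasses import dataclass

MAX_INPUT_CHARS = 50_000  # input limit stated for the web tool
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Hit:
    small_rna: str
    strand: str      # "+" if the small RNA matches the input as given, "-" if its reverse complement does
    start: int       # 1-based, leftmost position of the match on the input sequence
    abundance: float


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}, enforcing the character limit."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the name; number unnamed records.
            name = line[1:].split()[0] if line[1:].strip() else f"seq{len(records) + 1}"
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def to_dna(seq: str) -> str:
    return seq.upper().replace("U", "T")


def find_all(text: str, pattern: str):
    """Yield 0-based start positions of every, possibly overlapping, occurrence."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def map_perfect(dna: str, library: dict[str, float]) -> list[Hit]:
    """Report perfect matches of each library small RNA on both strands of dna."""
    dna = to_dna(dna)
    hits = []
    for small_rna, abundance in library.items():
        probe = to_dna(small_rna)
        rev = probe.translate(DNA_COMPLEMENT)[::-1]
        for pos in find_all(dna, probe):
            hits.append(Hit(small_rna, "+", pos + 1, abundance))
        if rev != probe:  # skip palindromes so they are not reported twice
            for pos in find_all(dna, rev):
                hits.append(Hit(small_rna, "-", pos + 1, abundance))
    return sorted(hits, key=lambda h: (h.start, h.strand))


# Example with hypothetical sequences and abundances.
records = parse_fasta(">clone1 example\nAACCGG\nTTACGT\n")
for name, seq in records.items():
    for hit in map_perfect(seq, {"CCGG": 3.0, "AACC": 2.0}):
        print(name, hit)
```

Scanning with `str.find` keeps the sketch simple; for long inputs or large libraries, an index such as a suffix array or an Aho-Corasick automaton would be the more natural choice.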
This tool maps sequences from one of our small RNA libraries onto a DNA sequence of interest to the user, identifying perfect matches between the DNA sequence and the sequences in the library selected by the user. It could be used, for example, to find siRNAs derived from a repeat or other sequence within a genomic fragment, or to find an miRNA encoded on that clone.
1. The input DNA sequence is submitted in FASTA format and is currently limited to 50,000 characters. The user can either paste the DNA sequences into the text box provided or upload a file containing them in FASTA format.
2. A set of example DNA sequences is provided so that the user can test the tool; it is loaded by pressing the "Example" button under the text box used for entering the DNA sequence (see Fig. 2a).
Fig. 2. The input interface of the small RNA mapping tool: (a) The text box and the file upload option to input the DNA sequence (http://smallrna.udel.edu/). (b) Drop-down lists showing the species and the libraries for each species. (c) Text boxes to input the e-mail address and the subject line for the e-mail, to keep track of analyses.
3. After entering the DNA sequences, the user selects the library of small RNA sequences (species and tissue) to be mapped onto them. The species of interest is chosen first, from the drop-down menu shown in Fig. 2b.
4. The library is then selected from the second drop-down menu (Fig. 2b). This menu is populated only after a species has been selected, so that it lists only the libraries for that species. Please refer to Note 2 for the naming conventions for each library. The user also has the option of choosing all the libraries for a given species.
5. Finally, the user must provide an email address (see Fig. 2c); once the analysis has been completed, a link to a web page storing the results is mailed to that address. At this point, all the information required by the tool has been entered, and the request can be submitted. A link to the results of an example analysis is provided at the bottom of the page so that the user can see how the results will be displayed.
6. The results are displayed as one table per input DNA sequence. Each table lists the small RNA sequences from the selected library that perfectly match the input DNA sequence, together with the strand, the coordinate, and the abundance of each matching small RNA in the analyzed library. If the FASTA input contained multiple DNA sequences, the user can jump to the table for a particular sequence by clicking its name at the top of the page.
7. The results page also provides a link to a file containing the input DNA sequences, a link to download the results as an Excel worksheet, and a link to download the results as a text file.
These links help the user keep track of the input DNA sequences and save the analysis for future reference; the results are stored in our database for only 48 hours from the time of the request. There is no limit on the number of requests a user can place in one day.
3.2. A Tool for Small RNA vs. Genomic DNA Comparisons, Allowing Mismatches
This tool, also known as the miRNA mismatch tool, compares known miRNAs or other small RNA sequences against a user-selected library of small RNAs. The tool allows mismatches in the alignment of two small RNAs but does not allow insertions or deletions (indels). It could be used, for example, to find miRNAs that have diverged slightly in the species of interest relative to known, annotated miRNAs.
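The ungapped, mismatch-tolerant comparison described above can be sketched in Python as follows; it is a minimal illustration, not the tool's implementation. `LengthMode` mirrors the three signature-length options offered on the input page (same length, within 2 nt, any length). How the web tool aligns sequences of unequal length is not specified, so sliding the shorter sequence without gaps along the longer one and keeping the placement with the fewest mismatches is our assumption, as are all names used here.

```python
from enum import Enum


class LengthMode(Enum):
    SAME = "same length"        # option (a): equal lengths only
    WITHIN_2 = "within 2 nt"    # option (b): lengths differ by at most 2 nt
    ANY = "any length"          # option (c): no length restriction


def length_allowed(query: str, signature: str, mode: LengthMode) -> bool:
    diff = abs(len(query) - len(signature))
    if mode is LengthMode.SAME:
        return diff == 0
    if mode is LengthMode.WITHIN_2:
        return diff <= 2
    return True


def best_ungapped(a: str, b: str, max_mismatches: int):
    """Slide the shorter sequence along the longer one without gaps (no indels).

    Returns (offset, mismatches) for the placement with the fewest mismatches,
    or None if no placement stays within max_mismatches.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = None
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        mismatches = sum(x != y for x, y in zip(short, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return best


def mismatch_search(query: str, library: dict[str, float], mode: LengthMode, max_mismatches: int):
    """Return (signature, offset, mismatches, abundance) tuples, fewest mismatches first."""
    if not 0 <= max_mismatches <= 5:
        raise ValueError("the web tool allows 0 to 5 mismatches")
    query = query.upper().replace("T", "U")
    hits = []
    for signature, abundance in library.items():
        sig = signature.upper().replace("T", "U")
        if not length_allowed(query, sig, mode):
            continue
        match = best_ungapped(query, sig, max_mismatches)
        if match is not None:
            offset, mismatches = match
            hits.append((signature, offset, mismatches, abundance))
    return sorted(hits, key=lambda h: (h[2], h[0]))


# Hypothetical library: an exact copy, a 1-mismatch variant, a 2-nt longer variant, and noise.
library = {
    "UGACAGAAGAGAGUGAGCAC": 120.0,
    "UGACAGAAGAGAGUGAGCAU": 4.0,
    "UGACAGAAGAGAGUGAGCACUC": 9.0,
    "AAAAAAAAAAAAAAAAAAAA": 1.0,
}
print(mismatch_search("UGACAGAAGAGAGUGAGCAC", library, LengthMode.SAME, 1))
```

With `LengthMode.SAME` the 22-nt variant is never considered, whereas `LengthMode.WITHIN_2` or `LengthMode.ANY` would find it embedded at offset 0, which is the "embedded" case the length options are meant to expose.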
1. The web interface of this tool enables the user to compare miRNA or other small RNA sequences of interest against one of the libraries in our database. The text box at the top of the page accepts the user's sequences in FASTA format, with a limit of 50,000 characters. The tool also offers the option of using known miRNA sequences for certain plant species obtained from the Sanger miRNA registry (http://microrna.sanger.ac.uk). To use this option, the user selects a species by clicking the checkbox next to its name and then presses the "Paste miRNAs" button (Fig. 3a), which automatically pastes the annotated miRNA sequences for the selected species into the text box. The species with available known miRNAs currently include Arabidopsis, rice, maize, and Medicago.
2. The library against which the comparison is made is selected from two drop-down lists, one displaying the species available in our database and the other listing the libraries available for the selected species. A library can be selected only after a species has been selected.
3. A subset of the signatures (small RNA sequences) in the selected library can be chosen for comparison with the input sequences based on their lengths, which is useful for distinguishing potential miRNAs from siRNAs. The tool provides three options (Fig. 3b):
(a) Signatures of exactly the same length as the input miRNA sequence. Each miRNA sequence is matched only with library signatures of the same length.
(b) Signatures whose length is within ±2 nucleotides of the miRNA sequence. This allows matches to signatures that are shifted 5′ or 3′ by up to two nucleotides relative to the miRNA sequence.
(c) Signatures of any length. The miRNA sequences are matched against the entire set of signatures in the selected library, regardless of length; this finds cases in which one small RNA is "embedded" in, or contained within, a longer small RNA.
4. Because the tool allows mismatches, the user must select the maximum number of allowed mismatches from the list provided (Fig. 3b); the options range from zero to five.
5. The results of the miRNA mismatch tool are tabulated for each input sequence. The results table lists the input miRNA sequence; each library signature that matches it within the allowed number of mismatches; the start and end positions of the library-derived small RNA relative to the input sequence; the number of mismatches; and the abundance of the small RNA in the selected libraries. Mismatched positions are highlighted in a different color in the sequence.
6. The results page includes links for downloading the input sequences and the output results as a text file and/or an Excel worksheet. When the output for a given set of input sequences exceeds 1,000 entries, the results are available only as downloads rather than being displayed on the page, which avoids delays in loading a very large HTML page.
Fig. 3. The input interface for analyses of small RNAs against DNA sequences, allowing mismatches: (a) The text box to input the DNA sequence and the example sequences given as options. (b) The drop-down lists showing the species and the library, options for the length of the signatures, and the options showing the maximum number of mismatches allowed.
3.3. Library Comparison Tool to Identify Conserved Small RNAs
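The interlibrary comparison performed by this tool can be sketched in Python as a presence test over a signature-by-library abundance table. The sketch assumes that "found in a library" means a nonzero abundance, and that requiring a small RNA in k of the other selected libraries (k + 1 libraries in total, counting the primary library) is satisfied whenever it appears in at least k of them; the function names and the example data are hypothetical.

```python
DISPLAY_LIMIT = 1_000  # larger result sets are offered as downloads only


def conserved_small_rnas(abundances, primary, others, min_other_libraries):
    """Find signatures present in the primary library and in at least
    min_other_libraries of the other selected libraries.

    abundances: {signature: {library_id: abundance}}; missing entries count as 0.
    Returns rows (signature, length, primary abundance, {library: abundance}),
    sorted by signature length as in the tool's results table.
    """
    others = [lib for lib in others if lib != primary]  # the primary library is always included
    if not 0 <= min_other_libraries <= len(others):
        raise ValueError("min_other_libraries must be between 0 and the number of other libraries")
    rows = []
    for signature, per_library in abundances.items():
        if per_library.get(primary, 0) <= 0:
            continue  # must be present in the primary library
        found_in = [lib for lib in others if per_library.get(lib, 0) > 0]
        if len(found_in) >= min_other_libraries:
            rows.append((signature, len(signature), per_library[primary],
                         {lib: per_library.get(lib, 0) for lib in others}))
    rows.sort(key=lambda row: (row[1], row[0]))
    return rows


def display_as_table(rows) -> bool:
    """Mirror the rule that results over 1,000 rows are download-only."""
    return len(rows) <= DISPLAY_LIMIT


# Hypothetical abundances for a primary library "P" and three other libraries.
data = {
    "AAAAUUUUCCCCGGGGAAAAU": {"P": 5, "A": 2, "B": 0, "C": 1},
    "CCCCGGGGAAAAUUUUCCCC": {"P": 3, "A": 1, "B": 4, "C": 2},
    "GGGGAAAAUUUUCCCCGGGGAAAA": {"P": 0, "A": 9, "B": 9, "C": 9},
    "UUUUCCCCGGGGAAAAUUUUC": {"P": 7},
}
for row in conserved_small_rnas(data, "P", ["P", "A", "B", "C"], 2):
    print(row)
```

Testing "at least k" in a single pass is equivalent to trying every k-library combination that includes the primary library, without enumerating the combinations.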
This tool performs an interlibrary comparison to identify conserved small RNAs, that is, sequences found in multiple libraries. It identifies the small RNAs that occur in a single, primary library as well as in all or a subset of the other libraries selected by the user. Because miRNAs tend to be present in multiple libraries and are often conserved across species, this tool may be useful for distinguishing miRNAs from siRNAs.
1. The primary library is selected from the two drop-down lists at the top of the main page of this tool. As in the other tools, the species is selected from the first drop-down list, and the library from the second (see Fig. 4a). The primary library is defined as the library in which a conserved small RNA must appear, in addition to the other libraries that are analyzed.
2. The tool includes an option to paste sequences as the primary library. When enabled, this option provides the user with a text box for entering the sequences; the user can also choose to paste known miRNAs, as described above (see Fig. 4b).
3. A series of checkboxes lists all the libraries available to the user. Selecting multiple libraries allows them to be compared with the primary library to determine which small RNAs are found in the intersection. Once the user selects a primary library, it is automatically included in the list of libraries chosen for comparison.
4. The user then specifies the number of libraries, including the primary library, in which a desired small RNA should be found. For example, if the user chooses the library "CRE1-Control" as the primary library and then chooses four other libraries for comparison, the maximum number of libraries that can be specified, including the primary library, is five. The user can, however, choose any smaller value; in this case, the comparison is made with different combinations of the chosen libraries, all of which include the primary library.
5. A link to the results of the analysis is emailed to the user. The results page includes the following details about the analysis:
(a) The primary library selected for the analysis.
(b) The number of libraries (out of the maximum number) chosen for comparison with the primary library.
(c) A table listing the signatures from the primary library that also appeared in the other libraries, the length of each of these small RNAs, and their abundances in the primary library and in the other libraries chosen for comparison. The table is sorted by signature length. It is not displayed if the comparison retrieves more than 1,000 signatures; instead, the user can download the results as a Microsoft Excel worksheet, as mentioned above.
Fig. 4. The input interface of the tool for identifying small RNAs conserved across multiple libraries: (a) The drop-down lists for selecting the species and the library as the primary library and a list of checkboxes to select the secondary libraries. (b) The text box to input the small RNA sequences for the primary library.
3.4. Library Comparison Tool to Identify Regulated Small RNAs
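The filtering this tool performs can be sketched in Python as per-library abundance windows (a preset TPM bin or a user-defined range) combined with an optional fold comparison between two libraries, either unidirectional or bidirectional. Reading the preset bins as inclusive bounds on whole-number TPM values, treating a missing abundance as zero, and the function names are assumptions; the percentage comparison, sorting, and paging options are omitted from this minimal sketch.

```python
import math

# Preset abundance bins from the web form, read here as inclusive bounds on
# whole-number TPM values (an assumption; fractional TPM would need other bounds).
PRESET_RANGES = {
    ">1,000 TPM": (1001, math.inf),
    "101-1,000 TPM": (101, 1000),
    "11-100 TPM": (11, 100),
    "3-10 TPM": (3, 10),
    "1 or 2 TPM": (1, 2),
}


def at_least_fold(a: float, b: float, fold: float) -> bool:
    """True if a is nonzero and at least `fold` times b (b may be zero)."""
    return a > 0 and a >= fold * b


def regulated_small_rnas(abundances, ranges, pair=None, fold=1.0, bidirectional=False):
    """Filter signatures by per-library abundance ranges and, optionally, a fold change.

    abundances: {signature: {library_id: TPM}}; a missing entry counts as 0.
    ranges: {library_id: (low, high)} for libraries used in the comparison;
            libraries left out are ignored, the tool's default.
    pair: optional (first, second) libraries for the fold comparison.
    """
    selected = []
    for signature, per_library in abundances.items():
        if not all(low <= per_library.get(lib, 0) <= high for lib, (low, high) in ranges.items()):
            continue
        if pair is not None:
            first = per_library.get(pair[0], 0)
            second = per_library.get(pair[1], 0)
            up = at_least_fold(first, second, fold)                    # first vs. second
            down = bidirectional and at_least_fold(second, first, fold)  # second vs. first
            if not (up or down):
                continue
        selected.append(signature)
    return selected


# Hypothetical TPM values for two libraries.
data = {
    "s1": {"leaf": 500, "flower": 20},
    "s2": {"leaf": 20, "flower": 500},
    "s3": {"leaf": 50, "flower": 40},
}
print(regulated_small_rnas(data, {}, ("leaf", "flower"), fold=10))
print(regulated_small_rnas(data, {}, ("leaf", "flower"), fold=10, bidirectional=True))
```

The unidirectional call keeps only `s1`, while the bidirectional call also keeps `s2`, which is 10-fold higher in the second library; this is the distinction the tool's comparison-direction option controls.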
This tool identifies small RNAs that show different expression levels when two or more libraries are compared: it reports the signatures whose abundances in the chosen libraries are above or below the thresholds chosen by the user. Because this analysis presupposes the presence of the small RNA in multiple libraries, it is best suited for identifying miRNAs that are regulated in different tissues or conditions.
1. The initial page of the tool displays the available libraries in a table (Fig. 5), with three options for each library. The first option selects the library for comparison and for display in the final results. The second option selects libraries that are not used in the analysis but are displayed in the final results for viewing purposes only. The third option ignores the library. Once the libraries have been selected for comparison or viewing, the user moves on to the remaining criteria by clicking the "continue to criteria" button on the initial page.
Fig. 5. The initial library selection page for the library comparison tool to identify differentially regulated small RNAs: The list of libraries available for selection or viewing is shown.
2. The second page of the interface lets the user choose the other inputs required for the comparison. All the libraries chosen for comparison or viewing are tabulated in a table containing the following columns (shown in Fig. 6):
Fig. 6. The selection criteria for the tool to identify differentially regulated small RNAs: The figure shows the selection range of abundances for each selected library, the expression levels for two libraries chosen for comparison and the sorting options for the final result.
(a) The first column lists the library name as a link; clicking the name opens a pop-up window that displays information about that library.
(b) The second column contains a drop-down menu for picking a range of abundance values for that library. The options are ">1,000 TPM," "101–1,000 TPM," "11–100 TPM," "3–10 TPM," and "1 or 2 TPM."
(c) The third column, labeled "user defined range," contains two text boxes in which the user can define his/her own lower and upper bounds for the abundance in that library.
(d) The last column gives the option of ignoring the library for comparison; this is the default option.
For each library, the user can choose only one of the options in the second, third, or fourth column. These options are not displayed for libraries chosen for viewing only.
3. If only two libraries are to be compared, the user can check the checkbox that enables this option and then choose the two libraries from the two drop-down lists (Fig. 6). The user can then specify the ratio of expression for signatures in the first library versus the second, either as a lower and upper range for the abundance in percentage, as lower and upper thresholds, or as an exact ratio. The comparison between the two libraries can be performed either as a percentage comparison or as a fold comparison.
4. By default, the comparison is unidirectional: the abundance in the first library is compared with that in the second, so that a query for 10-fold higher abundance requires the first library to have a 10-fold higher abundance than the second, and not vice versa. Alternatively, the user can choose a bidirectional comparison, in which the comparison is performed in both directions (library one vs. two and two vs. one) with the ratio specified by the user. The resulting ratio criterion is displayed as the user selects the two libraries and the range of abundances as percent or fold values (see Fig. 6).
5. The results can be sorted in ascending or descending order by abundance in up to three of the libraries chosen by the user, and also by signature length. The three sorting libraries are chosen from three individual drop-down lists, and the results are sorted in the order in which they are chosen. As shown in Fig. 6, ascending or descending order can be set separately for each of the three libraries.
6. The results are displayed in a table with the following fields: the sequences that satisfied the user's criteria; the length of each sequence; and its abundance in each of the libraries chosen by the user, displayed in the sort order specified by the user. If no sorting option was chosen, the rows are displayed in the order in which the data were retrieved from the database. Libraries chosen for viewing only are displayed in gray.
7. The results table can be viewed across separate web pages. By default, each page displays 1,000 rows, and the last page displays the remaining rows. The user can change the number of rows per page from the drop-down menu at the top of every page, up to a maximum of 3,000 rows. Each page shows the range of rows displayed out of the entire data set, and the user can jump to a page either by entering its number in the text box provided or by clicking the page number listed across the top of each page. Finally, the tool also provides the option of downloading the results as an Excel worksheet.
3.5. Small RNA Reverse Target Prediction Tool
This tool is based on more typical target prediction programs that predict the targets of small RNAs such as miRNAs. However, because a genome sequence may not be available, or because the user may have a particular genomic sequence that they would like to assess for small RNA target sites, we developed a "reverse target prediction" algorithm. In this case, the user enters a genomic sequence and then picks a set of small RNAs to compare against that sequence. In some cases, it may be possible to use this page with the output of another of our small RNA analysis pages. Because this is a computationally intensive search (owing to the mismatches permitted for miRNA targets), we limit the size of the input sequence. The result is a set of small RNAs that match the input sequence with the characteristics of an miRNA-target pair, as defined by an algorithm that generates a score based on mismatches and bulges.
1. The initial page of the tool allows the user to input their own genomic sequences to search for targets. The input is expected in FASTA format and is limited to 50,000 characters. An example sequence can be loaded by clicking the "Example" button below the text box in which the sequences are entered (Fig. 7a). The user can also upload a file containing the genomic sequences in FASTA format.
2. Once the genomic sequences have been entered, the user chooses from the drop-down list (Fig. 7a) the species whose small RNA data will be used. Alternatively, the user can supply his/her own set of small RNA sequences, such as the output of one of our other tools, for reverse target prediction. To minimize computational time, we also require the user to select either smaller molecules (18–22 nt) or larger molecules (>22 nt). Because miRNAs are usually 20–22 nt, we recommend using the smaller class.
3. Once the source of small RNA data has been chosen, the user can either adjust the penalty scores for mismatches, bulges, and wobbles in the miRNA-target pair or accept the default settings. The tool also provides two filter options: (1) perfect matches can be required at positions 10 and 11, the site of cleavage, as observed for known miRNAs; and (2) no more than one mismatch can be allowed at positions 2–9, as is also the case for known miRNAs. Both filters are turned on by default. All of these options, as displayed on the website, are shown in Fig. 7b.
4. The results of the analysis are emailed to the user and are visible on a web page, the link to which is sent in the email. The results page includes a table that displays the target site alignment with the genomic sequence, the start and end positions of the site, the score, the total numbers of mismatches and indels, and the strand of the genomic sequence on which the match was found. The table also includes the abundances of the matching small RNAs in the libraries of the species selected by the user. The results can be downloaded as an Excel sheet via a link at the bottom of the results page.
Fig. 7. The input interface for the reverse target prediction tool: This tool compares a library of small RNAs to a genomic sequence to identify small RNAs that could target the genomic sequence, using standard miRNA criteria. (a) The text box to enter the genomic sequence, the drop-down list showing the species available, and the options provided for the length of the signatures. (b) The settings for the scoring system, which allow the user to alter the penalties associated with nucleotide "wobbles" (G:U pairs), bulges, and mismatches.
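The scoring idea behind reverse target prediction can be illustrated with a simplified Python sketch. It scans both strands of the input for sites complementary to each small RNA, penalizes G:U wobbles and mismatches, and applies the two default filters (perfect pairs at positions 10 and 11; at most one mismatch at positions 2–9). It deliberately ignores bulges, which a full implementation would handle with a gapped alignment; the penalty values, the score cutoff of 4.0, and all names are hypothetical rather than the tool's defaults.

```python
from dataclasses import dataclass
from typing import Optional

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


@dataclass
class Penalties:
    # Hypothetical values for illustration; not the web tool's defaults.
    mismatch: float = 1.0
    wobble: float = 0.5


def pair_kind(small_rna_base: str, target_base: str) -> str:
    pair = (small_rna_base, target_base)
    if pair in WATSON_CRICK:
        return "match"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


def score_site(small_rna: str, site: str, penalties: Penalties,
               cleavage_filter: bool = True, seed_filter: bool = True) -> Optional[float]:
    """Score an ungapped duplex; small_rna and site are both written 5'->3'.

    The duplex is antiparallel, so small RNA position i (1-based from its 5' end)
    pairs with site[n - i] (0-based). Returns None if a filter rejects the site.
    """
    n = len(small_rna)
    kinds = [pair_kind(small_rna[i], site[n - 1 - i]) for i in range(n)]
    # Filter 1: positions 10 and 11 must be perfect Watson-Crick pairs.
    if cleavage_filter and n >= 11 and (kinds[9] != "match" or kinds[10] != "match"):
        return None
    # Filter 2: at most one mismatch at positions 2-9 (wobbles not counted here).
    if seed_filter and sum(k == "mismatch" for k in kinds[1:9]) > 1:
        return None
    return sum(penalties.mismatch if k == "mismatch" else penalties.wobble if k == "wobble" else 0.0
               for k in kinds)


def reverse_target_scan(genomic: str, small_rnas, penalties: Optional[Penalties] = None,
                        max_score: float = 4.0):
    """Return (small_rna, strand, start, end, score) hits, best scores first."""
    penalties = penalties or Penalties()
    plus = genomic.upper().replace("T", "U")
    minus = plus.translate(RNA_COMPLEMENT)[::-1]
    hits = []
    for small_rna in small_rnas:
        query = small_rna.upper().replace("T", "U")
        n = len(query)
        for strand, seq in (("+", plus), ("-", minus)):
            for pos in range(len(seq) - n + 1):
                score = score_site(query, seq[pos:pos + n], penalties)
                if score is not None and score <= max_score:
                    # Report 1-based coordinates on the input (plus-strand) sequence.
                    start = pos + 1 if strand == "+" else len(seq) - pos - n + 1
                    hits.append((small_rna, strand, start, start + n - 1, score))
    return sorted(hits, key=lambda h: (h[4], h[2]))


# Hypothetical example: a perfect target site flanked by A's.
mirna = "UGACAGAAGAGAGUGAGCAC"
site = mirna.translate(RNA_COMPLEMENT)[::-1].replace("U", "T")
for hit in reverse_target_scan("AAAA" + site + "AAAA", [mirna]):
    print(hit)
```

Here a G:U wobble is treated as imperfect at positions 10 and 11 but is not counted as a mismatch at positions 2–9; that is one reading of the filters described above, and `score_site` is the place to change it.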
4. Notes
1. The following tables are used in our database to retrieve the data requested by the user; this description should be sufficient for a reader who wishes to implement his/her own version of the database.
(a) A species table containing the list of species and their tissues. Each species is identified by a species code, and each tissue by a library code, as explained in Note 2.
(b) The small RNA data are stored in a table that includes the following columns: (i) the union of the distinct sequences generated for all the species; (ii) the body part of each of these signatures; (iii) the length of each signature; and (iv) the raw abundance of each signature in each of the libraries, with each library occupying its own column. The abundance of a signature in a library is set to zero if the signature is not present in that library. Thus, if there are nine libraries, the table has one raw-abundance column for each of the nine libraries, in addition to the signature column. Normalized rather than raw abundances can be stored in a similarly structured table, which may facilitate comparisons across libraries for differentially regulated small RNAs.
(c) In addition to these tables, each tool has two associated tables. One stores the pending requests that users initiate from the web page. A "cron job" (a program that runs on a regular schedule) runs in the background, extracting the pending jobs from this table and moving them to the processing stage. After processing, the results are stored in the second table, from which they can be retrieved using the user's email address and the time at which the request was placed.
2. Each library is represented by an identifier composed of the three-letter code of the species, followed by the library number and then the name of the tissue, separated by a hyphen. For example, the "Leaves" library of Lactuca sativa is represented as "LSA1-Leaves." In general, library #1 is a leaf library, library #2 is an inflorescence library, and library #3 is from a tissue that varies from species to species.
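The wide table layout of Note 1(b), keyed by library identifiers of the form given in Note 2, can be sketched with Python's built-in `sqlite3` module standing in for MySQL. This is an illustration only: the table and function names are hypothetical, the "body part" column is omitted, and restricting tissue names to letters, digits, and underscores is an assumption that also keeps the identifiers safe to use as quoted column names.

```python
import re
import sqlite3

# Library identifiers such as "LSA1-Leaves": 3-letter species code, library number,
# hyphen, tissue name. The allowed tissue characters are an assumption.
LIBRARY_ID = re.compile(r"^([A-Z]{3})(\d+)-([A-Za-z0-9_]+)$")


def parse_library_id(library_id: str) -> tuple[str, int, str]:
    match = LIBRARY_ID.match(library_id)
    if match is None:
        raise ValueError(f"not a library identifier: {library_id!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def build_abundance_table(conn: sqlite3.Connection, library_ids, counts) -> None:
    """Create a wide table with one raw-abundance column per library.

    counts: {signature: {library_id: raw_count}}; a signature absent from a library gets 0.
    """
    for library_id in library_ids:
        parse_library_id(library_id)  # validate before using it as a column name
    columns = ", ".join(f'"{lib}" INTEGER NOT NULL DEFAULT 0' for lib in library_ids)
    conn.execute(
        f"CREATE TABLE signatures (signature TEXT PRIMARY KEY, length INTEGER NOT NULL, {columns})"
    )
    placeholders = ", ".join("?" * (len(library_ids) + 2))
    rows = [(sig, len(sig), *(per_lib.get(lib, 0) for lib in library_ids))
            for sig, per_lib in counts.items()]
    conn.executemany(f"INSERT INTO signatures VALUES ({placeholders})", rows)
    conn.commit()


# Hypothetical libraries and counts.
conn = sqlite3.connect(":memory:")
build_abundance_table(conn, ["LSA1-Leaves", "LSA2-Inflorescence"],
                      {"UCGG": {"LSA1-Leaves": 5}, "AGCU": {"LSA2-Inflorescence": 2}})
for row in conn.execute('SELECT signature, "LSA1-Leaves" FROM signatures WHERE "LSA1-Leaves" > 0'):
    print(row)
```

Zero-filling absent signatures, as the note describes, lets "present in library X" be expressed as a simple `> 0` test on one column.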
Acknowledgments
We are grateful to Prakash Janardhan, Mayumi Nakano, Vimal Kannan, Emanuele De Paoli, Pam Green, and other members of the Meyers and Green labs for assistance with the database, web pages, analytical tools, data and discussions about all of these. Work on this project is supported by the NSF Plant Genome Research Program, Comparative Sequencing Project, award #0638525.
Chapter 13
Biotic Stress-Associated microRNAs: Identification, Detection, Regulation, and Functional Analysis
Florence Jay, Jean-Pierre Renou, Olivier Voinnet, and Lionel Navarro
Abstract
The methods described herein first highlight the strategies that were used to discover a biotic stress-associated miRNA. This involved (1) the selection of transcripts that were both more abundant in transgenic plants expressing viral-derived suppressors of RNA silencing and repressed in wild-type seedlings treated with a biotic stress, (2) a 5′ RACE-derived assay to map miRNA target sites, and (3) a bioinformatic analysis to retrieve specific miRNA loci from the Arabidopsis genome. We then describe methods used to monitor (1) the levels of primary miRNA transcripts (pri-miRNAs) and mature miRNAs and (2) the transcriptional activity of miRNA genes in response to a biotic stress and to bacterial challenge. Furthermore, we present a strategy to identify additional biotic stress-responsive miRNA genes and gain insight into their regulation. This involves (1) a microarray approach that allows detection of pri-miRNAs, coupled with (2) a promoter analysis of co-regulated miRNA genes. Finally, we describe strategies that can be used to functionally characterize individual biotic stress-associated miRNAs, or the miRNA pathway, in disease resistance.
Key words: Biotic stress response, miRNA, Bioinformatics, Bacteria, Promoter analysis
1. Introduction
Although miRNAs were initially characterized for their roles in plant and animal development, recent studies have revealed a role for miRNAs in controlling the innate immune response (1–3). For example, Arabidopsis miR393 is a stress-responsive miRNA that contributes to resistance against virulent Pseudomonas syringae pv. tomato strain DC3000 (Pto DC3000), presumably by repressing auxin signaling (1). In this chapter, we present the strategies that were used to identify miR393 and functionally characterize its role in antibacterial resistance. We also present methods to identify additional biotic stress-associated miRNAs and to gain insight into their regulation. These methods can be used to retrieve and functionally characterize biotic stress-associated miRNAs not only from model plants but also from agriculturally important crops whose genomes are only partially sequenced.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_13, © Humana Press, a part of Springer Science + Business Media, LLC 2009
Jay et al.
2. Materials
1. pGEM-T Easy® vector.
2. GeneRacer RNA adaptor® (Invitrogen).
3. Tri Reagent™ (Sigma).
4. Acrylamide: acrylamide/bis-acrylamide 19:1 (Eurobio).
5. Hybridization buffer: Perfect-Hyb™ Plus hybridization buffer, 1× (Sigma).
6. Hybond™-NX membrane (Amersham).
7. Whatman® 3MM paper (Schleicher and Schuell).
8. Cy3-dUTP/Cy5-dUTP (Perkin-Elmer-NEN Life Science Products).
9. UV cross-linker (Stratalinker™).
10. Hybridization oven: Hybridizer HB-1D (Techne).
11. Oligotex® mRNA Mini Kit (Qiagen).
12. RNeasy® Plant Mini Kit (Qiagen).
13. miR StarFire™ oligonucleotide labeling kit (Integrated DNA Technologies).
14. Klenow DNA polymerase.
15. Polynucleotide kinase (PNK).
16. Sephadex™ G-25 spin columns.
17. SuperScript III® reverse transcriptase (Invitrogen).
18. Taq DNA polymerase (Qiagen).
19. Oligonucleotides.
20. 1× Murashige–Skoog (MS) liquid medium.
21. flg22 and flg22A.tum peptides. flg22A.tum is an inactive peptide derived from the N-terminal part of Agrobacterium tumefaciens flagellin.
22. Pseudomonas syringae pv. tomato strain DC3000 (Pto DC3000) and Pto DC3000 hrcC−.
23. NYGA and NYGB media.
24. Agarose and DNA sequencing gel equipment.
25. Overexpression vectors (pK7WG2D, pROKII).
26. GFP reporter vector (pBIN61-based).
Biotic Stress-Associated microRNAs
27. Arabidopsis miRNA- and siRNA-deficient mutants (dcl and rdr mutants).
28. Arabidopsis transgenic plants overexpressing the P19, P1-HcPro, or P15 virus-derived suppressors of RNA silencing under the strong Cauliflower mosaic virus 35S promoter.
3. Methods
3.1. Identification of Biotic Stress-Associated miRNA Loci
The identification of biotic stress-associated miRNA loci involves (1) the selection of putative biotic stress-associated miRNA target transcripts, (2) the mapping of miRNA-directed cleavage sites, and (3) a bioinformatic analysis to retrieve potential stress-associated miRNA precursors and mature miRNA sequences (see Note 1).
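Step (1) of this strategy, detailed in Subheading 3.1.1, amounts to intersecting two gene lists. The Python sketch below illustrates that logic under stated assumptions: the inputs are hypothetical dictionaries of log2 fold changes (flg22- versus flg22A.tum-treated Col-0, and each suppressor line versus wild type), the cutoff of 1.0 is arbitrary, and the gene names and values are placeholders. It is not the authors' pipeline; real microarray data would also need replicate-based statistics.

```python
"""Sketch: candidate small-RNA targets are genes repressed by flg22 in Col-0
and more abundant in silencing-suppressor lines (all inputs hypothetical)."""
from typing import Dict, Set

FoldChanges = Dict[str, float]  # gene ID -> log2 fold change


def repressed(fc: FoldChanges, cutoff: float) -> Set[str]:
    """Genes whose log2 fold change is at or below -cutoff."""
    return {gene for gene, value in fc.items() if value <= -cutoff}


def induced(fc: FoldChanges, cutoff: float) -> Set[str]:
    """Genes whose log2 fold change is at or above +cutoff."""
    return {gene for gene, value in fc.items() if value >= cutoff}


def candidate_targets(flg22_vs_control: FoldChanges,
                      suppressor_vs_wt: Dict[str, FoldChanges],
                      cutoff: float = 1.0,
                      require_all_lines: bool = True) -> Set[str]:
    """Intersect flg22-repressed genes with genes up in suppressor lines.

    require_all_lines=True keeps genes up in every suppressor line (the
    'common' set of Fig. 1a); False keeps genes up in at least one line.
    """
    if not suppressor_vs_wt:
        raise ValueError("need at least one suppressor line")
    up_sets = [induced(fc, cutoff) for fc in suppressor_vs_wt.values()]
    combine = set.intersection if require_all_lines else set.union
    return repressed(flg22_vs_control, cutoff) & combine(*up_sets)


if __name__ == "__main__":
    # Made-up values for three placeholder genes.
    flg22 = {"geneA": -1.4, "geneB": -2.0, "geneC": 0.1}
    suppressors = {
        "P19": {"geneA": 1.2, "geneB": 0.2, "geneC": 1.5},
        "P15": {"geneA": 1.1, "geneB": 1.3, "geneC": 1.1},
        "P1-HcPro": {"geneA": 1.6, "geneB": 0.0, "geneC": 2.0},
    }
    print(sorted(candidate_targets(flg22, suppressors)))  # ['geneA']
    print(sorted(candidate_targets(flg22, suppressors,
                                   require_all_lines=False)))  # ['geneA', 'geneB']
```

Requiring induction in every suppressor line is the stricter filter; relaxing it to any line widens the candidate list at the cost of more false positives.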
3.1.1. Identification of Putative Stress-Associated miRNA Targets Using Transcriptome Analysis
Transcripts that accumulate to higher levels in Arabidopsis transgenic lines overexpressing the P19, P1-HcPro, or P15 virus-derived suppressors of RNA silencing than in wild-type plants are likely to be regulated, directly or indirectly, by siRNAs and/or miRNAs (4, 5). Thus, comparing the set of genes repressed in wild-type seedlings treated with the pathogen-associated molecular pattern (PAMP) flg22 (a flagellin-derived peptide) with the set of genes that are more abundant in untreated suppressor lines allows the identification of mRNAs that may be post-transcriptionally repressed by small RNAs during the flg22 response (Fig. 1a). For example, the transcript of the auxin receptor TRANSPORT INHIBITOR RESPONSE 1 (TIR1) fulfilled both criteria (Fig. 1a), suggesting that it might be directly regulated by an endogenous small RNA (see Note 2).
1. Grow A. thaliana Col-0 and transgenic lines overexpressing viral-derived suppressors of RNA silencing for 10 days on plates containing 1× MS medium (Duchefa), 1% sucrose, and 0.8% agar under a 12-h photoperiod at 22°C. Transfer the seedlings to MS liquid medium (two seedlings per 500 µL of medium in the wells of 24-well plates).
2. After 2 days, challenge Col-0 plantlets with 1 µM of either flg22 (QRLSTGSRINSAKDDAAGLQIA) or flg22A.tum (ARVSSGLRVGDASDNAAYWSIA) synthetic peptide (Sigma Genosys). For Col-0 seedlings, collect samples at different time points after treatment (a 30-min time point was used in the illustrated example). Freeze the plantlets in liquid nitrogen and store at −80°C.
3. Prepare total RNA using the RNeasy® Plant Mini Kit (Qiagen).
4. Perform cDNA synthesis, cRNA synthesis, biotin labeling, hybridization, and scanning as recommended by the manufacturer (Affymetrix®).
[Fig. 1a schematic: genes DOWN after flg22 ∩ genes UP in common in P1-HcPro/P19/P15 lines → TIR1 mRNA → 5′ RACE-derived assay on TIR1 (Wt, dcl1-9, dcl2-1, dcl3) → mapping the cleavage site by sequencing → BLAST versus IGR regions and mfold analysis → AtmiR393a (chromosome 2) and AtmiR393b (chromosome 3). Panels b and c: alignments, see legend.]
Fig. 1. Identification of Arabidopsis miR393. (a) Schematic representation of the methods used to identify miR393 loci. (b) Alignment of TIR1, AFB1, AFB2, AFB3, and rice TIR1 cDNAs. (c) Alignment of miR393 precursor sequences from different plant species. From top to bottom, miR393 precursor sequences derived from: Rice_1, Oryza sativa chromosome 1 (AP002483); Rice_2, Oryza sativa chromosome 4 (OSJN00092); Populus, Populus alba x Populus tremula cDNA (CF231897);
5. Perform a bioinformatic analysis: select transcripts that are repressed in flg22-treated Col-0 seedlings and more abundant in untreated P19, P15, or P1-HcPro transgenic seedlings.
6. Validate the microarray data by semi-quantitative RT-PCR analysis (Fig. 1a): reverse transcribe total RNA into cDNA using SuperScript III® reverse transcriptase (Invitrogen), then PCR amplify the full-length candidate transcripts.
3.1.2. Mapping the miRNA-Directed Cleavage Site
A modified rapid amplification of cDNA ends (RACE) assay is commonly used to detect 3′ cleavage products derived from plant miRNA targets. Using this method, we found that TIR1 mRNA is cleaved in Col-0, La-er, dcl2-1, and dcl3-1, as evidenced by the detection of a PCR-amplified cleavage product (Fig. 1a). However, no PCR product was detected in the dcl1-9 mutant background, indicating that the TIR1 transcript is likely cleaved specifically by an endogenous miRNA rather than by an siRNA (Fig. 1a).
1. Collect 12-day-old Col-0, La-er, dcl1-9, dcl2-1, dcl3-1, and dcl4-2 seedlings and store them at −80°C.
2. Extract total RNA using Tri Reagent™ (Sigma).
3. Enrich poly(A)+ RNA from 100–150 µg of total RNA using an Oligotex® mRNA Mini Kit (Qiagen).
4. Add the poly(A)+ RNA directly to the tube containing the lyophilized GeneRacer RNA oligo.
5. Incubate at 65°C for 5 min and place on ice for 2 min.
6. Add T4 RNA ligase and incubate for 1 h at 37°C.
7. Centrifuge briefly and place on ice.
8. Precipitate the RNA.
9. Incubate at 65°C for 5 min and chill on ice for 5 min.
10. Proceed with reverse transcription using SuperScript III® reverse transcriptase (Invitrogen).
11. Perform a PCR amplification using the GeneRacer 5′ primer (5′-CGACTGGAGCACGAGGACACTGA-3′) and the GeneRacer 3′ primer (5′-GCTGTCAACGATACGCTACGTAACG-3′) to generate a pool of 5′ RACE products.
12. Perform a nested PCR amplification (35 cycles) using the GeneRacer 5′ Nested primer and a 3′ gene-specific primer (5′-TTATAATCCGTTAGTAGTAATGATTTG-3′ for TIR1), with the pool of 5′ RACE products generated in step 11 as template.
Fig. 1. (continued) Lotus, Lotus japonicus chromosome 5 (AP004970); Arabidopsis_1, Arabidopsis thaliana chromosome 2 (between At2g39880 and At2g39890); Arabidopsis_2, Arabidopsis thaliana chromosome 3 (between At3g55730 and At3g55735); Malus_x_domestica, Malus x domestica cDNA (CN904957); and Medicago, Medicago truncatula (AC147434).
13. Purify the PCR products using the QIAquick® PCR purification kit (Qiagen) and run them on a 1% agarose gel (see Note 3).
14. Clone the 3′ cleavage products into the pGEM-T Easy® vector.
15. Sequence the inserts.
3.1.3. Identification of miRNA Precursors and Mature miRNA Sequence
Alignment of the Arabidopsis TIR1 cDNA sequence with TIR1 paralogs and Oryza sativa TIR1 orthologs revealed a conserved 23-nucleotide sequence motif spanning the cleavage site (Fig. 1b). A BLAST search of this motif against Arabidopsis intergenic regions (IGRs) identified one hit on chromosome 3 (IGR between At3g55730 and At3g55735) and another on chromosome 2 (IGR between At2g39880 and At2g39890). Mfold analysis of the IGR sequence surrounding each predicted miRNA sequence predicted stable stem-loop structures of 174 nt (chromosome 3; ΔG = −66.9 kcal/mol) and 149 nt (chromosome 2; ΔG = −54.6 kcal/mol) (Fig. 1a). Several orthologous precursors were identified, and their alignment revealed a conserved 21-nucleotide sequence motif corresponding to the mature miR393 sequence (Fig. 1c). This miRNA sequence begins with a U, like the majority of plant miRNAs loaded into AGO1 (20, 21), and the predominant cleavage site lies at position 10 from the 5′ end of the miRNA. The Arabidopsis miR393 precursors were named At-miR393a and At-miR393b and are located on chromosomes 2 and 3, respectively (22).
1. Align the candidate cDNA sequence with its paralogs and orthologs to find a conserved motif containing the miRNA target site (see Note 4).
2. BLAST this motif against Arabidopsis IGRs from the EMBL databases EST division and against other plant genomes from the plant division (PNL) (5).
3. Subject each IGR surrounding this motif to Mfold structure prediction (http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi).
4. Align the different precursor sequences.
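The position of the predominant cleavage site relative to the miRNA, mentioned above, can be tallied from the sequenced 5′ RACE clones once each clone's 5′ end has been located on the target mRNA. The sketch below assumes a perfectly complementary target site to keep the example short; natural plant target sites usually carry a few mismatches, so in practice the site would be located by alignment. The miRNA and target sequences are invented for illustration.

```python
"""Sketch: map 5' RACE clone ends onto an miRNA target site.
Assumes perfect miRNA:target complementarity (a simplification)."""
from collections import Counter
from typing import Iterable

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mirna_to_site(mirna: str) -> str:
    """Perfectly complementary target site (DNA, written 5'->3')."""
    dna = mirna.upper().replace("U", "T")
    return "".join(_COMPLEMENT[base] for base in reversed(dna))


def cleavage_positions(target: str, mirna: str,
                       clone_starts: Iterable[int]) -> Counter:
    """Count clones by the miRNA nucleotide opposite their 5' end.

    clone_starts: 0-based coordinates, on `target`, of the first nucleotide
    of each cloned 3' cleavage product. Positions are numbered from the
    miRNA 5' end, so the canonical cut between miRNA nt 10 and 11 gives 10.
    Clones starting outside the site are counted under key 0.
    """
    site = mirna_to_site(mirna)
    site_start = target.upper().find(site)
    if site_start < 0:
        raise ValueError("no perfectly complementary site in target")
    length = len(site)
    tally: Counter = Counter()
    for start in clone_starts:
        p = start - site_start + 1  # 1-based position within the site
        tally[length - p + 1 if 1 <= p <= length else 0] += 1
    return tally


# Invented 21-nt miRNA and a toy target carrying its complementary site.
MIRNA = "UAGCUUACCGAUGGACUAGCA"
TARGET = "A" * 10 + mirna_to_site(MIRNA) + "G" * 10
print(cleavage_positions(TARGET, MIRNA, [21, 21, 21, 20]))
# Counter({10: 3, 11: 1})
```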
3.2. Detection of Biotic Stress-Associated miRNAs
A method to monitor the transcriptional activity of miRNA genes in response to biotic stress is presented first. Methods to detect biotic stress-associated pri-miRNAs and mature miRNAs are described subsequently.
3.2.1. miR393 Transcriptional Activity in Response to flg22 Treatment
RT-qPCR analysis revealed an increase in eGFP mRNA levels in miR393a-p::eGFP transgenic plants treated with flg22 (Fig. 2b). This result indicates the presence of flg22-responsive elements in the miR393a promoter (e.g., the 1.5-kb region upstream of the miR393a stem loop contains ten copies of the stress-related W-box element (6); data not shown).
Fig. 2. miR393 is transcriptionally induced upon flg22 treatment. (a) Northern blot analysis of miR393 (left panels) and miR171 (right panels) upon flg22 (upper panels) and flg22A.tum (bottom panels) treatments. rRNA, ethidium bromide staining of ribosomal RNA. (b) Flg22 responsiveness of the miR393a promoter. T2 transgenic lines expressing eGFP under the control of the At-miR393a or At-miR393b promoter; three independent lines are depicted for each reporter construct. Real-time RT-PCRs were performed to assess the relative mRNA level of eGFP upon flg22 or flg22A.tum treatment. mRNA levels were normalized to that of Actin2 (At3g18780). Error bars represent the standard deviation of four PCR results.
1. Clone the 1.5-kb DNA sequences upstream of the miRNA stem-loop structures (see Note 5).
2. Fuse these DNA sequences to a reporter gene (e.g., the enhanced green fluorescent protein (eGFP) gene in a pBIN61-based reporter vector).
3. Transform these constructs into Arabidopsis to generate stable transgenic lines.
4. Prepare cDNA samples as described in Subheading 3.2.2.
5. Perform qPCR analysis using a SYBR® Green qPCR kit (Eurogentec) with eGFP-specific primers (forward 5′-ACGTAAACGGCCACAAGTTC-3′, reverse 5′-AAGTCGTGCTGCTTCATGTG-3′). Run the PCR in 96-well optical reaction plates heated at 95°C for 10 min, followed by 45 cycles of denaturation at 95°C for 15 s, annealing at 60°C for 20 s, and elongation at 72°C for 40 s. Record a melting curve at the end of the amplification in 1°C steps (from 95 to 50°C). Normalize transcript levels to those of Actin2 (At3g18780; forward 5′-GCACCCTGTTCTTCTTACCG-3′, reverse 5′-AACCCTCGTAGATTGGCACA-3′) and of a TIP4-1-like gene (At4g34270; forward 5′-GTGAAAACTGTTGGAGAGAAGCAA-3′, reverse 5′-TCAACTGGATACCCTTTCGCA-3′).
3.2.2. Detection of pri-miR393a/b Induction Upon flg22 Treatment
We found that pri-miR393a, but not pri-miR393b, was significantly induced in flg22-treated seedlings (data not shown).
1. Challenge 12-day-old Col-0 seedlings with either flg22 or flg22A.tum at a final concentration of 10 µM and store samples at −80°C (see Note 6).
2. Extract total RNA using the RNeasy® Plant Mini Kit (Qiagen).
3. Reverse transcribe the RNA samples into cDNA using SuperScript III® reverse transcriptase (Invitrogen).
4. PCR amplify the primary miRNA transcript using Taq DNA polymerase (Qiagen). PCR conditions: 94°C for 2 min (first cycle); 35–38 cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 1.5 min; and 72°C for 10 min (last cycle). Primer sequences are as follows (see Note 7): pri-miR393a forward 5′-GAGATAGAGAGTTGAACAAATTCTTC-3′, reverse 5′-GTATCCATGATAGTTGAGAAATTTGC-3′; pri-miR393b forward 5′-ACACCATTGCTCCCACCTTGAAAGA-3′, reverse 5′-CGCCGTTGATGTCTCCGGTCATG-3′. To control for equal cDNA amounts in each reaction, perform control PCRs with primers corresponding to Actin2 (At3g18780; forward 5′-GCACCCTGTTCTTCTTACCG-3′, reverse 5′-AACCCTCGTAGATTGGCACA-3′) and to the TIP4-1-like gene (At4g34270; forward 5′-GTGAAAACTGTTGGAGAGAAGCAA-3′, reverse 5′-TCAACTGGATACCCTTTCGCA-3′).
5. Separate the PCR products on a 1.2% agarose gel and visualize the bands after ethidium bromide staining.
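Where RT-qPCR is used instead of semi-quantitative RT-PCR, as for the eGFP reporter in Subheading 3.2.1 (step 5), transcript levels are normalized to Actin2 and the TIP4-1-like gene, but the chapter does not state the calculation. A common choice, assumed here, is the comparative Ct (2^−ΔΔCt) method, which presumes near-100% amplification efficiency for every primer pair. Averaging the Ct values of the two reference genes corresponds to taking the geometric mean of their expression levels. The Ct values in the example are invented.

```python
"""Sketch: comparative Ct (2^-ddCt) relative quantification with two
reference genes (e.g. Actin2 and the TIP4-1-like gene). Assumes equal,
near-100% amplification efficiency for all primer pairs."""
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Target Ct minus the mean reference Ct.

    Averaging Ct values is equivalent to the geometric mean of the
    reference genes' relative abundances.
    """
    return target_ct - mean(reference_cts)


def relative_expression(sample_target_ct: float,
                        sample_reference_cts: Sequence[float],
                        calibrator_target_ct: float,
                        calibrator_reference_cts: Sequence[float],
                        amplification_factor: float = 2.0) -> float:
    """Fold change of the target in a sample relative to a calibrator
    (e.g. flg22-treated versus flg22A.tum-treated plants)."""
    ddct = (delta_ct(sample_target_ct, sample_reference_cts)
            - delta_ct(calibrator_target_ct, calibrator_reference_cts))
    return amplification_factor ** (-ddct)


# Invented Ct values: eGFP, then [Actin2, TIP4-1-like].
print(relative_expression(24.0, [18.0, 22.0], 26.0, [18.0, 22.0]))  # 4.0
```

If primer efficiencies are measured and differ noticeably from 100%, an efficiency-corrected model would be more appropriate than this simple form.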
3.2.3. Detection of miR393 Induction by Northern Analysis Using Polyacrylamide Gels
Northern analysis showed an approximately 2-fold increase in miR393 accumulation after 20 and 60 min of flg22 treatment, whereas levels of the unrelated miR171 remained unaltered (Fig. 2a) (see Note 8).
1. Challenge 12-day-old Col-0 seedlings with either flg22 or flg22A.tum at a final concentration of 10 µM (see Note 6) and store samples at −80°C.
2. Extract total RNA using Tri Reagent™ (see Note 9).
3. Resuspend the total RNA in 50% formamide.
4. Heat 40–60 µg of total RNA at 95°C for 5 min and chill on ice for 5 min. Add ¼ volume of loading buffer (50% glycerol, 50 mM Tris-HCl pH 7.7, 5 mM EDTA, 0.03% bromophenol blue).
5. Load the total RNA on a 17.5% polyacrylamide gel (acrylamide/bis-acrylamide 19:1, Eurobio) that has been pre-run for 30–60 min in 0.5× TBE buffer.
6. Run the gel for 4 h at constant voltage (80 V) in 0.5× TBE buffer.
7. Transfer the RNA to a Hybond™-NX membrane using a Trans-Blot Cell (Bio-Rad) at constant voltage (80 V) for 75 min in 0.5× TBE buffer.
8. Place the membrane on Whatman® 3MM paper soaked with 2× SSC for 5 min.
9. Cross-link the RNA using a UV cross-linker (Stratalinker™) (see Note 10).
10. Transfer the membrane into a rotating glass tube in a hybridization oven (Hybridizer HB-1D, Techne). Prehybridize in 20 mL of Perfect-Hyb™ Plus (Sigma) at 42°C for at least 1 h (23).
11. End label the DNA oligonucleotide complementary to the miRNA sequence of interest with [γ-32P]ATP using T4 PNK (New England Biolabs, Beverly, MA) as described by the manufacturer (see Note 11). Add this probe to the Perfect-Hyb™ Plus buffer (Sigma) and hybridize overnight at 42°C.
12. Wash the membrane twice with 20 mL of 2× SSC, 2% SDS for 20 min at 42°C (see Note 12).
13. Expose the membrane to X-ray film (at least 24 h of exposure for miR393 detection).
3.3. Understanding the Regulation of Biotic Stress-Responsive miRNAs
Promoter analyses provide additional insight into the regulation of stress-associated miRNAs. This involves methods to (1) retrieve known cis-regulatory elements within a biotic stress-responsive promoter sequence and (2) profile biotic stress-associated miRNA transcripts, cluster their expression profiles, and perform a promoter analysis on the miRNA genes that are co-regulated in response to the biotic stress.
3.3.1. Promoter Analysis of Known cis-Regulatory Sequences
Many databases of known cis-regulatory elements are available online and can be used for this purpose (e.g., the PLACE database: http://www.dna.affrc.go.jp/PLACE/signalscan.html). Alternatively, specific cis-regulatory elements can be located and highlighted simply by using Microsoft Word.
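The Word-based search can also be scripted. The sketch below reports every occurrence of a motif on both strands of a promoter sequence, accepting IUPAC ambiguity codes and overlapping matches; minus-strand hits are found, as in the Word procedure, by searching for the reverse complement of the motif. The W-box core TTGACY used in the example is an assumption for illustration; substitute the exact element definition taken from PLACE or another database.

```python
"""Sketch: find a cis-regulatory motif on both strands of a promoter,
as a scripted alternative to the Word find/replace procedure."""
import re
from typing import List, Tuple

# IUPAC nucleotide codes -> regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps ambiguity codes (R<->Y, K<->M, B<->V, D<->H).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _pattern(motif: str) -> "re.Pattern[str]":
    # A zero-width lookahead lets overlapping occurrences all be reported.
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(promoter: str, motif: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) for every motif occurrence.

    '+' hits match the motif on the given strand; '-' hits match its
    reverse complement, i.e. the motif on the opposite strand.
    """
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in _pattern(motif).finditer(seq)]
    rc = revcomp(motif)
    if rc != motif.upper():  # palindromic motifs would be counted twice
        hits += [(m.start(), "-") for m in _pattern(rc).finditer(seq)]
    return sorted(hits)


# Toy promoter fragment; TTGACY is an assumed W-box core.
print(find_motif("ccTTGACCaaGGTCAAttTTGACT", "TTGACY"))
# [(2, '+'), (10, '-'), (18, '+')]
```

Counting the hits returned for each promoter gives the same per-promoter frequency that the 'Replace All' count provides in Word.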
1. Generate a file containing the candidate promoter sequences in FASTA format.
2. Use the Edit > Replace function.
3. In the window that appears, enter the sequence of the cis-regulatory element of interest in the 'Find what' box and the exact same sequence in the 'Replace with' box, applying the highlight format to the latter.
4. Use the 'Replace All' function.
5. Repeat the analysis with the reverse complement of the cis-regulatory motif to highlight cis-regulatory elements on the other DNA strand (see Note 13).
3.3.2. Understanding the Regulation of Stress-Responsive miRNAs Using a Microarray Approach Coupled to a Bioinformatic Analysis
Approaches that profile pri-miRNA transcripts, coupled with clustering and promoter analyses, can be implemented to identify putative over-represented cis-regulatory elements that may play a role in the regulation of biotic stress-associated loci (see Note 14, Fig. 3).
1. Design pri-miRNA probes: retrieve sequences located upstream of the miRNA stem-loop structures (see Note 15).
2. Spot 60–70-mer oligonucleotides corresponding to the reverse complement of these sequences onto a microarray slide (see Note 16).
3. Challenge samples with the biotic stress of interest (see Note 17) and extract total RNA using the RNeasy® Plant Mini Kit (Qiagen).
4. Perform the in vitro transcription (Ambion) and the reverse transcription in the presence of Cy3-dUTP or Cy5-dUTP (Perkin-Elmer-NEN Life Science Products).
Fig. 3. Methods to study the regulation of stress-responsive miRNAs. Schematic representation of a method that can be used to identify over-representation of cis-regulatory elements within promoters of co-regulated stress-responsive miRNA genes.
5. Hybridize the labeled samples to the slide, scan as described in Lurin et al. (7), and analyze the data as described in Gagnot et al.
6. Promoter analysis: to identify cis-regulatory elements that are over-represented within the promoters of co-regulated miRNA genes, use any of the following publicly available programs: AlignACE, DIALIGN, FootPrinter, MEME, or MotifSampler.
7. Repeat the analysis in step 6 using a set of promoter sequences derived from biotic stress-insensitive miRNA genes as negative controls.
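Step 7 compares motif occurrence in co-regulated promoters with a negative-control set. The chapter does not prescribe a statistic; one simple option, assumed here, is a one-sided Fisher's exact test on the number of promoters that contain at least one copy of the motif in each set (the motif-discovery programs listed in step 6 apply their own scoring). When many motifs are tested, the resulting P values should be corrected for multiple testing. The counts in the example are invented.

```python
"""Sketch: one-sided Fisher's exact test for motif over-representation in
co-regulated miRNA promoters versus stress-insensitive control promoters."""
from math import comb


def enrichment_p(hits_coregulated: int, n_coregulated: int,
                 hits_control: int, n_control: int) -> float:
    """P(X >= hits_coregulated) under the hypergeometric null.

    X counts motif-containing promoters among the co-regulated set when
    all promoters are pooled and the set labels are assigned at random.
    """
    if not (0 <= hits_coregulated <= n_coregulated
            and 0 <= hits_control <= n_control):
        raise ValueError("hit counts must lie between 0 and the set size")
    total = n_coregulated + n_control
    with_motif = hits_coregulated + hits_control
    without_motif = total - with_motif
    upper = min(n_coregulated, with_motif)
    # comb() returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(with_motif, k) * comb(without_motif, n_coregulated - k)
               for k in range(hits_coregulated, upper + 1))
    return tail / comb(total, n_coregulated)


# Invented counts: motif in 8 of 10 co-regulated and 2 of 10 control promoters.
print(round(enrichment_p(8, 10, 2, 10), 4))  # 0.0115
```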
Fig. 4. miR393 contributes to resistance against virulent Pto DC3000. (a) pri-miR393a/b transcripts are up-regulated in response to Pto DC3000 hrcC- and this induction is suppressed by virulent Pto DC3000. Semi-quantitative RT-PCR analysis of several pri-miRNAs. (b) miR393a and b are transcriptionally induced by Pto DC3000 hrcC- and this induction is suppressed by virulent Pto DC3000. RT-qPCR analysis of the eGFP and FRK1 transcripts in miR393a-p::eGFP and miR393b-p::eGFP reporter lines challenged with Pto DC3000 and Pto DC3000 hrcC-. (c) Overexpression of AFB1 increases susceptibility to virulent Pto DC3000. Growth of Pto DC3000 on AFB1-myc overexpressing lines, Col-0 and tir1-1. (d) Overexpression of miR393 elevates resistance to virulent Pto DC3000. Growth of Pto DC3000 on three independent miR393 overexpressing lines.
3.4. Identification and Functional Analysis of miRNAs Implicated in Antibacterial Resistance
3.4.1. Identification of PAMP-Responsive miRNAs that May Play an Important Role in Antibacterial Resistance
The methods described hereafter allow (1) the identification of PAMP-responsive miRNAs that may play an important role in antibacterial resistance and (2) the functional characterization of individual miRNAs in disease resistance. The Pseudomonas syringae pv. tomato strain DC3000 hrcC− mutant (Pto DC3000 hrcC−) is a type III secretion-defective mutant that cannot deliver effector proteins into host cells (8). This bacterial mutant elicits, but cannot suppress, PAMP-triggered immunity and consequently triggers, like flg22, a potent basal defense response. Accordingly, the PAMP-responsive pri-miR393a/b and pri-miR396b are all induced in response to Pto DC3000 hrcC−, whereas levels of other, PAMP-insensitive pri-miRNAs remain unaltered (Fig. 4a) (25). Several reports indicate that virulent Pto DC3000 can suppress the transcriptional induction of PAMP-responsive protein-coding genes, some of which are required for basal resistance (9–11). Similarly, induction of the PAMP-responsive pri-miR393a/b and pri-miR396b is significantly suppressed by virulent Pto DC3000 (Fig. 4a). This effect occurs, at least in part, at the transcriptional level, because similar results were obtained with transgenic lines reporting miR393a/b transcriptional activity (Fig. 4b). Together, these results suggest a role for these PAMP-responsive miRNAs in antibacterial resistance. Therefore, to identify the whole set of PAMP-responsive miRNA genes that may play an important role in antibacterial resistance, a genome-wide analysis (as described in Subheading 3.3.2) can be conducted.
1. Grow Pto DC3000 hrcC− and virulent Pto DC3000 overnight at 28°C in 10 mL of NYGB medium supplemented with 100 mg/L rifampicin. NYGB contains, per liter: 3 g Bacto yeast extract, 5 g Bacto peptone, and 20 mL glycerol (add 1% agar for solid NYGA medium). Spin down the cells by centrifugation for 10 min at 2,500 × g at room temperature. Resuspend the bacteria in 10 mM MgCl2 and spin down again for 8 min at 3,500 rpm. Finally, resuspend the cells in 10 mM MgCl2 and dilute to the appropriate working concentration.
2. Inoculate fully expanded leaves of 5-week-old Arabidopsis plants by syringe infiltration with a bacterial suspension of 2 × 10⁷ colony-forming units per mL (cfu/mL).
3. Collect samples at different time points post-inoculation (see Note 18).
4. Perform RNA extraction, cDNA preparation, and hybridization onto the microarray slide as described in Subheading 3.3.2.
5. Select pri-miRNAs that are up-regulated in response to Pto DC3000 hrcC− and no longer induced upon challenge with virulent Pto DC3000 (see Note 19). We also recommend selecting pri-miRNAs that are repressed in response to Pto DC3000 hrcC− and no longer repressed in response to virulent Pto DC3000. Such miRNAs may regulate positive regulators of the plant defense response, and their inactivation might increase antibacterial resistance. Primer sequences are as follows (see Note 7): pri-miR166a forward 5′-GGGACGAACATAGAAAGAGAGAGA-3′, reverse 5′-AATATGGAGTAAACAGGGAGCAAC-3′; pri-miR393a forward 5′-GAGATAGAGAGTTGAACAAATTCTTC-3′, reverse 5′-GTATCCATGATAGTTGAGAAATTTGC-3′; pri-miR396b forward 5′-TTAATTAGTTTTCAGAAGAAGGAG-3′, reverse 5′-CTTCAAATCAATATCTTTGGAAAGAA-3′; pri-miR393b forward 5′-ACACCATTGCTCCCACCTTGAAAGA-3′, reverse 5′-CGCCGTTGATGTCTCCGGTCATG-3′. To control for equal cDNA amounts in each reaction, perform a PCR with primers corresponding to Actin2 (At3g18780; forward 5′-GCACCCTGTTCTTCTTACCG-3′, reverse 5′-AACCCTCGTAGATTGGCACA-3′).
3.4.2. Inhibition of miRNA Function
The function of an miRNA can be inhibited either by overexpressing an miRNA-resistant target or by knocking out the miRNA genes.
1. Overexpressing miRNA-resistant targets: for the functional analysis of miR393, we used Arabidopsis transgenic lines that overexpress a myc epitope-tagged version of AFB1, a TIR1 paralog that is naturally more resistant to miR393-mediated cleavage (1). Overexpression of AFB1 should therefore have a dominant-negative effect on a putative miR393-mediated defense response. When inoculated with virulent Pto DC3000, AFB1 transgenic lines had higher bacterial titers than nontransformed or tir1-1 plants at 4 days post inoculation (dpi) (Fig. 4c).
(a) Generate multiple synonymous mutations in the miRNA target site (see Notes 20 and 21). For the AFB1-overexpressing lines, the AFB1 cDNA (which is partially refractory to miR393-mediated cleavage) was introduced into the pROKII vector, which carries the strong 35S promoter (12).
(b) Generate stable Arabidopsis transgenic lines that overexpress the miRNA-resistant target (see Note 22).
(c) Inoculate virulent Pto DC3000 as described in Subheading 3.4.1, but using an inoculum of 10⁵ cfu/mL.
(d) Monitor bacterial growth at 2 and 4 dpi.
2. Knock-out of miRNA genes using insertion lines: select homozygous T-DNA or transposon insertion lines in the corresponding miRNA locus (e.g., SALK T-DNA insertion lines available at http://signal.salk.edu/cgi-bin/tdnaexpress) (see Note 23).
3.4.3. Overexpression of Stress-Responsive miRNAs
Upon inoculation with virulent Pto DC3000, miR393-overexpressing lines, but not empty-vector transformants, displayed lower bacterial titers at 4 dpi (Fig. 4d), indicating that miR393 contributes to resistance against Pto DC3000. Furthermore, no difference in bacterial growth was observed in transgenic lines overexpressing an artificial miRNA directed against GFP mRNA (13), indicating that the effect is specific to miR393.
1. PCR amplify an miRNA precursor from Arabidopsis Col-0 genomic DNA using primers located 20 bp upstream and downstream of the miRNA stem loop of interest.
2. Introduce this PCR product into a GATEWAY® TOPO entry vector (Invitrogen) according to the manufacturer's recommendations.
3. Recombine the insert into a GATEWAY binary destination vector carrying the strong 35S promoter cassette (pK7WG2D) (see Note 22).
4. Transform Arabidopsis plants with this construct.
5. Characterize the transgenic lines and inoculate them with virulent Pto DC3000 as described in Subheading 3.4.1, using an inoculum of 10⁵ cfu/mL.
6. Monitor bacterial growth at 4 dpi.
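Step 1 can be prepared in silico from the genomic coordinates of the predicted stem loop. The sketch below reads "primers located 20 bp upstream and downstream" as primers that occupy the 20-bp flanks immediately outside the stem loop; that reading, the coordinates, and the sequence are assumptions for illustration. A real design would also check melting temperature, GC content, and specificity (e.g., with Primer3), which this sketch does not do.

```python
"""Sketch: derive a stem-loop amplicon with 20-bp flanks and naive primers.
No Tm, GC, or specificity checks are performed."""
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def precursor_amplicon(genomic: str, loop_start: int, loop_end: int,
                       flank: int = 20,
                       primer_len: int = 20) -> Tuple[str, str, str]:
    """Return (amplicon, forward primer, reverse primer), all 5'->3'.

    loop_start/loop_end are 0-based, end-exclusive coordinates of the
    predicted stem loop on `genomic`.
    """
    left, right = loop_start - flank, loop_end + flank
    if left < 0 or right > len(genomic):
        raise ValueError("flanks extend beyond the supplied sequence")
    amplicon = genomic[left:right]
    if primer_len > len(amplicon):
        raise ValueError("primer longer than amplicon")
    forward = amplicon[:primer_len]
    reverse = revcomp(amplicon[-primer_len:])
    return amplicon, forward, reverse


# Invented locus: 5 nt | 20-nt flank | 40-nt "stem loop" | 20-nt flank | 5 nt
LOCUS = ("C" * 5 + "ATGCATGCATGCATGCATGC" + "G" * 40
         + "TTTTTGGGGGAAAAACCCCC" + "C" * 5)
amp, fwd, rev = precursor_amplicon(LOCUS, 25, 65)
print(len(amp), fwd, rev)  # 80 ATGCATGCATGCATGCATGC GGGGGTTTTTCCCCCAAAAA
```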
3.5. Functional Analysis of RNA Silencing Pathways in Plant Basal Defense
To test the role of small RNA pathways in disease resistance, a set of Arabidopsis mutants defective in the accumulation of endogenous siRNAs and/or miRNAs is inoculated with the non-virulent bacterium Pto DC3000 hrcC− (see Note 24). This bacterial mutant multiplies poorly in inoculated leaves of wild-type Col-0 and La-er (Fig. 5a). However, both the growth of Pto DC3000 hrcC− and the disease symptoms it causes are significantly enhanced in the miRNA-deficient dcl1-9 and hen1-1 mutants (Fig. 5a and data not shown; see Note 25). Similar effects are observed when dcl1-9 and hen1-1 plants are challenged with the non-host bacterium Pseudomonas syringae pv. phaseolicola (Psp) or with the nonpathogenic Pseudomonas fluorescens Pf-5 and E. coli W3110 strains (Fig. 5b–d). Collectively, these results indicate that the miRNA pathway plays a major role in basal defense.
Fig. 5. The miRNA pathway is required for antibacterial basal resistance. (a) Pto DC3000 hrcC− growth is specifically enhanced in miRNA-deficient mutants. (b) As (a), but with Pseudomonas syringae pv. phaseolicola (Psp). (c) As (a), but with Pseudomonas fluorescens Pf-5. (d) As (a), but with E. coli W3110.
1. Grow plants at 21–22°C with an 8-h photoperiod. Grow dcl1-9 seedlings first on plates containing 1× MS medium (Duchefa), 1% sucrose, and 0.8% agar with kanamycin selection, select homozygous seedlings based on their developmental phenotype, and transfer them to soil at 10 days post germination (dpg). Grow hen1-1 and the corresponding La-er control similarly on in vitro plates containing solid medium (without selection) and transfer them to soil at 10 dpg. Grow the remaining plants used in this assay directly in soil. After inoculation with E. coli W3110, keep 6-week-old La-er, dcl1-9, and hen1-1 plants for 4 days at 28°C with an 8-h photoperiod before bacterial counting.
2. Proceed as described in Subheading 3.4.2, using a concentration of 10⁶ cfu/mL for Pto DC3000 hrcC− and Psp, or 10⁸ cfu/mL for P. fluorescens Pf-5 and E. coli W3110.
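The inocula used in this chapter range from 10⁵ to 10⁸ cfu/mL. A washed suspension in 10 mM MgCl2 is usually brought to such concentrations by C1V1 = C2V2 dilution, with ten-fold pre-dilutions when the required transfer volume would be too small to pipette accurately. The sketch below assumes the stock concentration is already known (for example, from an OD600 reading converted with a strain-specific factor calibrated by plating, which the chapter does not give); the stock value and the 20-µL minimum transfer volume are placeholders.

```python
"""Sketch: plan dilution of a washed bacterial suspension (in 10 mM MgCl2)
to a target inoculum, using ten-fold pre-dilutions when needed."""
from typing import Tuple


def transfer_volume(stock_cfu_per_ml: float, target_cfu_per_ml: float,
                    final_volume_ml: float) -> float:
    """Volume of stock (mL) to bring to final_volume_ml: C1*V1 = C2*V2."""
    if target_cfu_per_ml > stock_cfu_per_ml:
        raise ValueError("target concentration exceeds stock concentration")
    return target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml


def dilution_plan(stock_cfu_per_ml: float, target_cfu_per_ml: float,
                  final_volume_ml: float,
                  min_transfer_ml: float = 0.02) -> Tuple[int, float]:
    """Return (number of 1:10 pre-dilutions, final transfer volume in mL).

    Pre-dilute ten-fold until the last transfer is at least min_transfer_ml
    (a placeholder pipetting limit) or a further step would overshoot.
    """
    conc, steps = stock_cfu_per_ml, 0
    while (transfer_volume(conc, target_cfu_per_ml, final_volume_ml)
           < min_transfer_ml and conc / 10 >= target_cfu_per_ml):
        conc /= 10
        steps += 1
    return steps, transfer_volume(conc, target_cfu_per_ml, final_volume_ml)


# Placeholder stock of 2e9 cfu/mL; 20 mL of a 1e5 cfu/mL inoculum.
print(dilution_plan(2e9, 1e5, 20.0))  # (2, 0.1)
```

For instance, two 1:10 pre-dilutions of the placeholder stock followed by a 0.1-mL transfer into 20 mL would give the 10⁵ cfu/mL inoculum used in Subheading 3.4.2.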
4. Notes
1. Several other methods not described here can identify large numbers of stress-associated miRNAs (e.g., small RNA library construction coupled with deep sequencing, as described in Chaps. 8 and 14).
2. Because translational inhibition is a key mechanism by which miRNAs silence their targets (14), we anticipate that a similar comparative analysis at the proteome level would be even more informative. For this purpose, we recommend proteomic approaches that are sensitive enough to detect small changes in protein levels (e.g., 2D DIGE, two-dimensional difference gel electrophoresis).
3. This is an important step to avoid cloning nonspecific PCR products.
4. This approach is particularly useful for identifying evolutionarily conserved miRNAs, such as miR393. For rapidly evolving ("young") miRNAs, however, we recommend a BLAST analysis against closely related genomes (e.g., for a young Arabidopsis thaliana miRNA, against the Arabidopsis halleri and/or Capsella rubella genomes).
5. Instead of cloning the regions upstream of the miR393a and miR393b stem-loop structures, we recommend first mapping the start sites of the pri-miR393a and pri-miR393b transcripts by 5′ RACE, and then fusing the corresponding 1.5-kb upstream regions to a reporter gene. Xie et al. have already mapped the 5′ ends of several other Arabidopsis pri-miRNA transcripts, and this information can be used for this purpose (15). Note that the 1.5-kb length of the fragments used in the present study was chosen arbitrarily; we cannot rule out that important cis-regulatory elements lie further upstream.
6. A flg22 concentration of 1 µM is sufficient to detect induction of miR393 and pri-miR393a in seedlings.
7. To monitor pri-miRNA transcripts by semi-quantitative RT-PCR, we use primers designed on either side of the miRNA stem-loop structure.
8. Alternatively, mature plant miRNAs can be detected and quantified with specific quantitative RT-PCR assays. For instance, TaqMan® MicroRNA Assays (Applied Biosystems) use a two-step protocol: reverse transcription with a miRNA-specific looped primer, followed by real-time PCR with a TaqMan probe. This method is sensitive and highly specific (only mature miRNAs are quantified, with single-base discrimination).
9. Do not use columns from RNA extraction kits that fail to retain small RNAs.
10. For low-abundance miRNAs, we recommend chemical cross-linking of the RNA to the membrane, which enhances the detection sensitivity of small RNAs. Such a method was described previously (23).
11. To enhance the detection sensitivity of miRNAs, we recommend the Starfire™ polymerase extension labeling reaction. Briefly, a 3′ hexamer extension is added to the target-specific oligonucleotide. A template oligonucleotide carrying the complementary hexamer together with an oligo-dT10 sequence is then annealed to the target-specific oligonucleotide. The annealed duplexes are labeled with α-32P-dATP in the presence of the Klenow fragment of DNA polymerase, which adds ten radiolabeled deoxynucleotides per molecule. Locked nucleic acid (LNA) oligonucleotide probes can also be used to increase detection sensitivity.
12. When LNA probes are used, we perform hybridization and washes at 68°C instead of 42°C.
13. The same analysis can be used to calculate how often known stress-responsive cis-regulatory elements occur in a set of promoters from co-regulated pri-miRNAs. Besides highlighting the cis-regulatory elements within the promoter region of interest, the "Replace all" function reports the number of replacements made, i.e., the number of occurrences of the element across all promoter sequences of interest. For this type of analysis, a file containing promoter sequences of miRNAs that are not responsive to biotic stress should be used as a negative control.
14. To gain insight into the regulation of stress-responsive miRNAs, we recommend genome-wide approaches that detect pri-miRNA transcripts rather than mature miRNAs. The main reason is that a mature miRNA is often produced from multiple pri-miRNAs that are not always co-regulated (e.g., flg22 activates transcription of miR393a but not miR393b in seedlings (Fig. 2b)). Identifying the subset of stress-responsive pri-miRNAs within the same miRNA subfamily is therefore essential for subsequent promoter analyses.
15. We have evidence that 60–70-nt oligonucleotide probes work well for this purpose (L. Navarro, O. Voinnet, J.-P. Renou, unpublished data).
16. The microarray slides should preferably also carry probes for protein-coding gene transcripts, some of which are miRNA targets.
17. We recommend performing an extensive time course. This significantly improves the resolution of the subsequent clustering analysis, which displays the expression patterns of co-regulated pri-miRNAs.
18. We recommend collecting samples between 6 and 12 h post-inoculation.
19. We also recommend selecting pri-miRNAs that are repressed in response to Pto DC3000 hrcC- but no longer repressed in response to Pto DC3000. Such miRNAs may regulate positive regulators of the plant defense response, and their inactivation might increase antibacterial resistance.
20. Mutations opposite positions 10–11 of the miRNA are essential to abolish miRNA-guided slicing. Nevertheless, we recommend introducing as many synonymous mutations as possible along the miRNA target site.
21. Alternatively, a strategy involving a noncoding RNA that sequesters the miRNA of interest could be used. This principle occurs in nature to negatively regulate the phosphate starvation-induced miRNA miR399 (16). In this case, the noncoding RNA INDUCED BY PHOSPHATE STARVATION 1 (IPS1), which is refractory to miR399-directed cleavage because of mismatches opposite positions 10–11 of the miRNA, sequesters miR399. Franco-Zorrilla et al. (16) showed that engineering IPS1 to mimic target sites of miR156 or miR319 also efficiently inhibited the activity of these miRNAs. We therefore recommend the same strategy to knock down the activity of biotic stress-associated miRNAs such as miR393.
22. Because overexpression of miRNAs or miRNA-resistant targets can significantly alter the normal development and physiology of plants, we also recommend generating transgenic lines that express these constructs conditionally (e.g., under the control of dexamethasone- or estradiol-inducible promoters).
23. This approach is feasible for miR393, which is produced from only two loci in the Arabidopsis genome. It would not be possible, however, for miRNAs produced from a large number of loci (e.g., Arabidopsis miR169, which is produced from 14 loci).
24. To assess the role of RNA silencing pathways in plant species other than Arabidopsis thaliana, we recommend using virus-encoded suppressors of RNA silencing (VSRs), overexpressed constitutively or conditionally in the plant of interest. For this purpose, we recommend the following VSRs: P19 from Tomato bushy stunt virus (TBSV), P1-HcPro from Turnip mosaic virus (TuMV), P15 from Peanut clump virus (PCV), or P25 from Potato virus X (PVX). P19, P1-HcPro, and P15 suppress both siRNA and miRNA functions, whereas P25 specifically suppresses siRNA function (3, 4).
25. We cannot rule out that long siRNAs (lsiRNAs) or natural cis-antisense transcript-derived siRNAs (nat-siRNAs), which are also DCL1-dependent (17, 18), additionally contribute to the observed disease phenotypes. However, mutants impaired in lsiRNA and/or nat-siRNA accumulation but not in miRNA accumulation did not rescue the growth of Pto DC3000 hrcC- (Fig. 5a, data not shown). An example is rdr6, which is fully impaired in nat-siRNA accumulation and partially impaired in AtlsiRNA-1 accumulation (17, 18). This suggests that lsiRNAs and nat-siRNAs do not contribute significantly to basal resistance in this assay. More generally, we recommend using rdr6, ago7, nrpd1a, and nrpd1b mutants to distinguish lsiRNA and/or nat-siRNA effects from miRNA effects in various functional assays.
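Note 13 uses a text editor's "Replace all" count to tally a cis-element across promoter sequences. The same tally can be scripted. The Python sketch below is illustrative only and makes three assumptions: promoters are stored as FASTA text, the motif is written in IUPAC code, and both strands are searched. The W-box core TTGACY, bound by WRKY transcription factors, is used purely as an example motif, and all sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def parse_fasta(text):
    """Parse FASTA text into {name: uppercase sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def motif_regex(motif):
    """Regex matching the motif on either strand."""
    motif = motif.upper()
    forward = "".join(IUPAC[base] for base in motif)
    reverse = "".join(IUPAC[base] for base in motif.translate(COMPLEMENT)[::-1])
    # Zero-width lookahead: every start position is tested, so overlapping
    # sites are counted, and a position matching on both strands counts once.
    return re.compile(f"(?=(?:{forward}|{reverse}))")


def sites_per_kb(records, motif):
    """Occurrences of the motif per kilobase across all sequences."""
    pattern = motif_regex(motif)
    hits = sum(len(pattern.findall(seq)) for seq in records.values())
    total_bp = sum(len(seq) for seq in records.values())
    return 1000.0 * hits / total_bp if total_bp else 0.0


# Hypothetical promoter fragments (responsive set and negative control)
responsive = parse_fasta(">pri-A\nAATTGACCAATTGACT\n>pri-B\nGGGTTGACTGG\n")
control = parse_fasta(">pri-C\nAAAAAAAAAACCCCCCCCCC\n")
print(f"responsive: {sites_per_kb(responsive, 'TTGACY'):.1f} sites/kb")
print(f"control:    {sites_per_kb(control, 'TTGACY'):.1f} sites/kb")
```

Normalizing per kilobase lets promoter sets of different total length be compared with the negative-control set.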
Acknowledgments
The authors thank P. Dunoyer, S. Dharmasiri, M. Estelle, and J.D.G. Jones for their discussions and contributions to this work. L.N. was supported by a long-term fellowship from the Federation of European Biochemical Societies (FEBS); O.V. and F.J. by a grant from the trilateral Génoplante-German Plant Genome Research Program-Spanish Ministry of Research; and J.-P. Renou by Génoplante.
References
1. Navarro L, Dunoyer P, Jay F, Arnold B, Dharmasiri N et al (2006) A plant miRNA contributes to antibacterial resistance by repressing auxin signaling. Science 312:436–439
2. Taganov KD, Boldin MP, Chang KJ, Baltimore D (2006) NF-kappaB-dependent induction of microRNA miR-146, an inhibitor targeted to signaling proteins of innate immune responses. Proc Natl Acad Sci U S A 103:12481–12486
3. Jagadeeswaran G, Saini A, Sunkar R (2009) Biotic and abiotic stress down-regulate miR398 expression in Arabidopsis. Planta 229:1009–1014
4. Chapman EJ, Prokhnevsky AI, Gopinath K, Dolja VV, Carrington JC (2004) Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18:1179–1186
5. Dunoyer P, Lecellier CH, Parizotto EA, Himber C, Voinnet O (2004) Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16:1235–1250
6. Kanz C, Aldebert P, Althorpe N, Baker W, Baldwin A et al (2005) The EMBL nucleotide sequence database. Nucleic Acids Res 33:D29–D33
7. Eulgem T, Rushton PJ, Robatzek S, Somssich IE (2000) The WRKY superfamily of plant transcription factors. Trends Plant Sci 5:199–206
8. Lurin C, Andres C, Aubourg S, Bellaoui M, Bitton F et al (2004) Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16:2089–2103
9. Yuan J, He SY (1996) The Pseudomonas syringae Hrp regulation and secretion system controls the production and secretion of multiple extracellular proteins. J Bacteriol 178:6399–6402
10. He P, Shan L, Lin NC, Martin GB, Kemmerling B et al (2006) Specific bacterial suppressors of MAMP signaling upstream of MAPKKK in Arabidopsis innate immunity. Cell 125:563–575
11. Li X, Lin H, Zhang W, Zou Y, Zhang J et al (2005) Flagellin induces innate immunity in nonhost interactions that is suppressed by Pseudomonas syringae effectors. Proc Natl Acad Sci U S A 102:12990–12995
12. Navarro L, Zipfel C, Rowland O, Keller I, Robatzek S et al (2004) The transcriptional innate immune response to flg22. Interplay and overlap with Avr gene-dependent defense responses and bacterial pathogenesis. Plant Physiol 135:1113–1128
13. Dharmasiri N, Dharmasiri S, Weijers D, Lechner E, Yamada M et al (2005) Plant development is regulated by a family of auxin receptor F box proteins. Dev Cell 9:109–119
14. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
15. Brodersen P, Sakvarelidze-Achard L, Bruun-Rasmussen M, Dunoyer P, Yamamoto YY et al (2008) Widespread translational inhibition by plant miRNAs and siRNAs. Science 320:1185–1190
16. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of Arabidopsis MIRNA genes. Plant Physiol 138:2145–2154
17. Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI et al (2007) Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet 39:1033–1037
18. Katiyar-Agarwal S, Gao S, Vivian-Smith A, Jin H (2007) A novel class of bacteria-induced small RNAs in Arabidopsis. Genes Dev 21:3123–3134
19. Katiyar-Agarwal S, Morgan R, Dahlbeck D, Borsani O, Villegas A Jr et al (2006) A pathogen-inducible endogenous siRNA in plant immunity. Proc Natl Acad Sci U S A 103:18002–18007
20. Mi S, Cai T, Hu Y, Chen Y, Hodges E et al (2008) Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133:116–127
21. Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE et al (2008) Specificity of ARGONAUTE7-miR390 interaction and dual functionality in TAS3 trans-acting siRNA formation.
22. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA.
23. Pall GS, Hamilton AJ (2008) Improved northern blot method for enhanced detection of small RNA. Nat Protoc 3:1077–1084
24. Gagnot S, Tamby JP, Martin-Magniette ML, Bitton F, Taconnat L et al (2008) CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform. Nucleic Acids Res 36:D986–D990
25. Navarro L, Jay F, Nomura K, He SY, Voinnet O (2008) Suppression of the miRNA pathway by bacterial effector proteins. Science 321:964–967
Chapter 14
Abiotic Stress-Associated miRNAs: Detection and Functional Analysis
Dong-Hoon Jeong, Marcelo A. German, Linda A. Rymarquis, Shawn R. Thatcher, and Pamela J. Green
Abstract
MicroRNAs (miRNAs) are small regulatory noncoding RNAs of 20–24 nucleotides. They play key roles in plant development by negatively regulating gene expression at the posttranscriptional level. Moreover, recent studies have associated several miRNAs with abiotic stress responses. Small RNA cloning and high-throughput deep sequencing provide expression profiles not only of known miRNAs but also of novel miRNAs. In this chapter, we describe the methods used to identify and characterize abiotic stress-associated miRNAs and their target genes.
Key words: MicroRNA, Abiotic stress, RLM 5′-RACE, PARE target library
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592, DOI 10.1007/978-1-60327-005-2_14, © Humana Press, a part of Springer Science + Business Media, LLC 2009
1. Introduction
In plants, small RNAs (20–24 nt), including microRNAs (miRNAs) and short interfering RNAs (siRNAs), regulate gene expression through translational inhibition, mRNA cleavage, or directing chromatin modification (1–3). MicroRNAs originate from distinct genomic loci whose transcripts are predicted to form imperfectly double-stranded "hairpin" structures. A miRNA/miRNA* duplex is excised from the hairpin by DICER-LIKE1 (DCL1), and the miRNA strand of this duplex is incorporated into a complex with an ARGONAUTE (AGO) protein, called the miRNP (4, 5). In plants, complementarity between miRNAs and their targets almost always directs the miRNPs to cleave the target mRNAs, accelerating their degradation (6). Many functional studies have demonstrated that plant miRNAs play important roles in various developmental processes,
such as organ boundary formation, organ polarity/radial patterning, and development of roots, stems, leaves, and floral organs (1). Recent reports also indicate that miRNAs are associated with abiotic stress responses and nutrient homeostasis. For example, functional studies using transgenic plants characterized miR399 and miR398, which play important roles in phosphate starvation and oxidative stress responses, respectively (7, 8). In addition, dcl1 and hen1 mutants are hypersensitive to abiotic stresses (9). HEN1 methylates miRNAs and siRNAs, which is thought to protect them against degradation [(4) and Chapter 10]. The phenotypes of these mutants imply that some miRNAs processed by DCL1 and HEN1 have important roles in abiotic stress responses. Other studies report that miRNAs are regulated in response to cold, drought, salt, UV-B radiation, phosphate or sulfate starvation, oxidative stress, or mechanical strain (7, 8, 10–15). Some examples are shown in Table 1.
To identify and profile small RNA populations, including miRNAs, several sequencing approaches have emerged. Earlier conventional techniques were based on cloning single molecules or longer concatemers that were then sequenced. More recently, high-throughput methods such as massively parallel signature sequencing (MPSS), 454 sequencing, and sequencing-by-synthesis (SBS) have made it possible to access the full complexity of plant small RNAs (16–19). Deep sequencing also provides quantitative expression information, because the cloning frequency of an individual small RNA generally reflects its relative abundance in the sample.
In this chapter, we describe methods and strategies to identify abiotic stress-regulated miRNAs by deep sequencing. Using examples from model systems, we include conditions and strategies for stress treatment.
Additionally, to analyze miRNA function, we provide methods for validating and analyzing target genes using Northern blots and RNA ligase-mediated (RLM) 5′-RACE. We also describe genome-wide identification of miRNA-target RNA pairs using a recently developed approach called Parallel Analysis of RNA Ends (PARE). A summary of the overall experimental design is provided in Fig. 1.
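Both RLM 5′-RACE and PARE validate a target by locating the 5′ end of the 3′ cleavage product. Plant miRNA-guided cleavage typically occurs between the target nucleotides paired with miRNA positions 10 and 11, counted from the miRNA 5′ end. A sketch like the one below can therefore predict the expected 5′ end within a target transcript and count how many sequenced 5′-RACE clones end there. It is illustrative Python only. It assumes a perfectly complementary site with no bulges, whereas real plant sites often carry mismatches, and the miRNA, transcript, and clone positions are all hypothetical.

```python
from collections import Counter

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq):
    return to_rna(seq).translate(RNA_COMPLEMENT)[::-1]


def find_perfect_site(transcript, mirna):
    """1-based start of a perfectly complementary site, or None.

    Handles only the perfect-match case, for illustration.
    """
    index = to_rna(transcript).find(reverse_complement_rna(mirna))
    return None if index < 0 else index + 1


def expected_5prime_end(site_start, mirna_length):
    """1-based transcript position of the 5' end of the 3' cleavage product.

    Site nucleotide j (1-based, 5'->3') pairs with miRNA position
    mirna_length - j + 1, so miRNA position 10 pairs with site position
    mirna_length - 9. Cleavage between miRNA positions 10 and 11 leaves
    that nucleotide as the first base of the 3' product.
    """
    return site_start + (mirna_length - 9) - 1


def clones_at_site(clone_5prime_ends, expected):
    """Return (clones ending at the expected position, total clones)."""
    counts = Counter(clone_5prime_ends)
    return counts[expected], sum(counts.values())


# Hypothetical miRNA and target; these are not real sequences.
mirna = "UAGCUAGCUAGCAUCGAUCGA"
transcript = "GGGGG" + reverse_complement_rna(mirna) + "CCCCC"
start = find_perfect_site(transcript, mirna)
cut = expected_5prime_end(start, len(mirna))
hits, total = clones_at_site([17, 17, 17, 17, 25], cut)
print(f"site starts at {start}; expected 5' end at {cut}; {hits}/{total} clones")
```

A clear majority of clones ending at the predicted nucleotide is the usual evidence of miRNA-guided cleavage. Clones scattered elsewhere more likely reflect degradation intermediates.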
2. Materials
2.1. Abiotic Stress Treatment
1. Arabidopsis (Col-0) and rice (Oryza sativa cv. Nipponbare) seeds.
2. 300 mM NaCl.
3. Murashige and Skoog (MS) medium: 20.6 mM NH4NO3, 18.8 mM KNO3, 0.08 mM H3BO3, 0.05 mM KI, 0.001 mM
Table 1
Examples of stress conditions known to regulate miRNAs in model systems ("–", none listed)
Stress | Species | Condition | Known responsive miRNAs | Inducible genes | References
Drought (desiccation) | Arabidopsis | Expose to air for 4 h | miR389, miR393, miR396, miR397, miR402 | Rd29A (At5g52310) | (13, 30)
Drought (desiccation) | Rice | Expose to air for 8 h | – | SalT (LOC_Os01g24710), OsLEA3 (LOC_Os05g46480) | (31, 32) and Fig. 2
Salt | Arabidopsis | 300 mM NaCl for 8 h | miR389, miR393, miR396, miR397, miR402 | Rd29A (At5g52310), COR15A (At2g42540) | (13, 30, 33)
Salt | Rice | 300 mM NaCl for 8 h | – | SalT (LOC_Os01g24710), OsLEA3 (LOC_Os05g46480) | (31, 32) and Fig. 2
Cold | Arabidopsis | 4°C under light for 8 h | miR165, miR172, miR169, miR319, miR389, miR393, miR396, miR397, miR402 | Rd29A (At5g52310), CBF3 (At4g25480) | (11, 13, 30, 34)
Cold | Rice | 4°C under light for 8 h | – | OsWRKY71 (LOC_Os02g08440), OsMAPK2 (LOC_Os03g17700) | (35, 36) and Fig. 2
Heat | Arabidopsis | 28°C for 24 h | – | HsfA2 (At2g26150) | (37)
Heat | Rice | 42°C for 8 h | – | SPL7 (LOC_Os05g45410) | (38)
Phosphate starvation | Arabidopsis | MS P-deficient medium for 2 weeks | miR399 | RNS1 (At2g02990) | (14, 39)
Phosphate starvation | Rice | MS P-deficient medium for 5 days | miR399 | OsRNS1 (LOC_Os07g43670), OsACP1 (LOC_Os01g52230) | (40–42) and Fig. 3
Sulfate starvation | Arabidopsis | MS S-deficient medium for 2 weeks | miR395 | APR1 (At4g04610) | (10, 43)
Sulfate starvation | Rice | MS S-deficient medium for 5 days | miR395 | OsASP1 (LOC_Os03g53230) | (10)
Fig. 1. Overall experimental design for analyzing the roles of miRNAs in abiotic stress responses. The flowchart comprises four steps:
(1) Abiotic stress treatment: test different stress conditions (time and concentration); monitor known stress-responsive genes.
(2) Small RNA library construction and deep sequencing.
(3) Identification of regulated miRNAs: mine the sequencing data for inducible or repressible known and new candidate miRNAs; validate expression patterns by Northern blot and analysis of biological replicates.
(4) Identification and characterization of target genes: validation by RNA ligase-mediated 5′-RACE; PARE target library construction and analysis; examination of target gene expression; transgenic approaches to over- and/or underexpress miRNAs and target RNAs.
Na2MoO4·2H2O, 0.0001 mM CoCl2·6H2O, 3 mM CaCl2·2H2O, 1.5 mM MgSO4·7H2O, 0.1 mM MnSO4·H2O, 0.03 mM ZnSO4·7H2O, 0.0001 mM CuSO4·5H2O, 0.1 mM Na2EDTA, 0.1 mM FeSO4·7H2O, 1.25 mM KH2PO4, 4.1 µM nicotinic acid, 2.4 µM pyridoxine–HCl, 0.3 µM thiamine–HCl, 30 g/L sucrose, 0.1 g/L myo-inositol (pH 5.7).
4. MS P-deficient medium: MS medium with KH2PO4 omitted: 20.6 mM NH4NO3, 18.8 mM KNO3, 0.08 mM H3BO3, 0.05 mM KI, 0.001 mM Na2MoO4·2H2O, 0.0001 mM CoCl2·6H2O, 3 mM CaCl2·2H2O, 1.5 mM MgSO4·7H2O, 0.1 mM MnSO4·H2O, 0.03 mM ZnSO4·7H2O, 0.0001 mM CuSO4·5H2O, 0.1 mM Na2EDTA, 0.1 mM FeSO4·7H2O, 4.1 µM nicotinic acid, 2.4 µM pyridoxine–HCl, 0.3 µM thiamine–HCl, 30 g/L sucrose, 0.1 g/L myo-inositol (pH 5.7).
5. MS S-deficient medium: MS medium in which the sulfate salts (except FeSO4·7H2O) are replaced by the corresponding chlorides: 20.6 mM NH4NO3, 18.8 mM KNO3, 0.08 mM H3BO3, 0.05 mM KI, 0.001 mM Na2MoO4·2H2O, 0.0001 mM CoCl2·6H2O, 3 mM CaCl2·2H2O, 1.5 mM MgCl2, 0.1 mM MnCl2, 0.03 mM ZnCl2, 0.0001 mM CuCl2, 0.1 mM Na2EDTA, 0.1 mM FeSO4·7H2O, 1.25 mM KH2PO4, 4.1 µM nicotinic acid, 2.4 µM pyridoxine–HCl, 0.3 µM thiamine–HCl, 30 g/L sucrose, 0.1 g/L myo-inositol (pH 5.7).
6. Gelling reagents: 8 g/L Phytoagar for Arabidopsis (RPI Corp., Mount Prospect, IL) or 2 g/L Phytagel for rice (Sigma, St. Louis, MO).
2.2. Splinted-Ligation-Based miRNA Detection
1. Splinted-ligation-based miRNA detection kit, such as the miRtectIT™ miRNA Labeling and Detection Kit (USB, Cleveland, OH), or:
(a) Ligation oligonucleotide: 5′-CGCTTATGACATTC/dideoxyC/-3′ (Integrated DNA Technologies, Coralville, IA).
(b) OptiKinase (10 U/µL) (USB, Cleveland, OH).
(c) 10× OptiKinase reaction buffer (USB, Cleveland, OH).
(d) 10× capture buffer (100 mM Tris–HCl, pH 7.5, 750 mM KCl) (USB, Cleveland, OH).
(e) PrepEase sequencing dye clean-up kit (USB, Cleveland, OH).
(f) Ligate-IT rapid ligation kit (USB, Cleveland, OH).
(g) Shrimp alkaline phosphatase (1 U/µL) (Roche Diagnostics GmbH, Mannheim, Germany).
(h) 2× formamide loading dye (95% formamide, 20 mM EDTA, 0.025% bromophenol blue, and 0.025% xylene cyanol).
2. Bridge oligonucleotide: see Subheading "Bridge oligonucleotide design" (Integrated DNA Technologies, Coralville, IA).
3. Low molecular weight markers, 10–100 nt (USB, Cleveland, OH).
4. [γ-32P]ATP (6,000 Ci/mmol) (PerkinElmer).
5. 40% acrylamide/bis solution (29:1) (Ambion, Austin, TX).
6. Urea (USB, Cleveland, OH).
7. Glycerol-tolerant gel buffer, 20× (USB, Cleveland, OH).
8. 10% ammonium persulfate.
9. TEMED (Bio-Rad, Hercules, CA).
10. Millex-HA 0.45-µm filter (Millipore, Billerica, MA).
2.3. RNA Ligase-Mediated 5′-RACE
1. Nuclease-free (sterile) water.
2. Nuclease-free (sterile) tubes.
3. 5′-RACE kit, such as the FirstChoice RLM-RACE kit (Ambion, Austin, TX) or the GeneRacer kit (Invitrogen, Carlsbad, CA), or:
(a) RNA oligonucleotide of 40 nt or longer (Integrated DNA Technologies, Coralville, IA).
(b) T4 RNA ligase (Promega, Madison, WI).
(c) Two primers specific to the RNA oligonucleotide (Integrated DNA Technologies, Coralville, IA).
4. Oligo(dT)12–18 or gene-specific primer (Integrated DNA Technologies, Coralville, IA).
5. RNaseOUT (40 U/µL) (Invitrogen, Carlsbad, CA).
6. SuperScript II RT (200 U/µL) (Invitrogen, Carlsbad, CA).
7. dNTPs, 10 mM each.
8. One or two primers antisense to the putative miRNA target.
9. Choice-Taq DNA Polymerase (Denville Scientific, Inc., Metuchen, NJ).
10. Agarose.
11. 10× TBE buffer.
12. Ethidium bromide (10 mg/mL).
13. Gel extraction kit, such as NucleoSpin Extract II (Macherey-Nagel, Bethlehem, PA).
14. pGEM-T Easy Vector System II (Promega, Madison, WI) or TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA).
15. Antibiotics.
16. LB (Luria broth).
17. Plasmid DNA extraction kit, such as NucleoSpin Plasmid (Macherey-Nagel, Bethlehem, PA).
2.4. PARE Target Library
1. Nuclease-free sterile water.
2. Nuclease-free sterile tubes.
3. T4 RNA ligase (Ambion, Austin, TX).
4. 10× T4 RNA ligase buffer (Promega, Madison, WI).
5. RNaseOUT (40 U/µL) (Invitrogen, Carlsbad, CA).
6. SuperScript II RT (200 U/µL) (Invitrogen, Carlsbad, CA).
7. dNTPs.
8. 5′ RNA adapter: 5′-GUUCAGAGUUCUACAGUCCGAC-3′ (Dharmacon Inc., USA).
9. Phenol/chloroform/isoamyl alcohol (25:24:1).
10. Chloroform/isoamyl alcohol (24:1).
11. GlycoBlue (Ambion, Austin, TX).
12. 3 M NaOAc.
13. Ethanol.
14. Oligotex kit (Qiagen, Gaithersburg, MD, USA).
15. 3′ oligo(dT) primer (Integrated DNA Technologies, Coralville, IA).
16. 0.1 M DTT.
17. 5× PCR buffer (Finnzymes, Espoo, Finland).
18. 5′ adapter primer: 5′-GTTCAGAGTTCTACAGTCCGAC-3′ (Integrated DNA Technologies, Coralville, IA).
19. 3′ adapter primer: 5′-CGAGCACAGAATTAATACGACT-3′ (Integrated DNA Technologies, Coralville, IA).
20. Phusion DNA polymerase (2 U/µL) (Finnzymes, Espoo, Finland).
21. NEBuffer 4 (New England Biolabs, Ipswich, MA).
22. 10× SAM (500 µM) (New England Biolabs, Ipswich, MA).
23. MmeI (2 U/µL) (New England Biolabs, Ipswich, MA).
24. Shrimp alkaline phosphatase (1 U/µL) (Roche Diagnostics GmbH, Mannheim, Germany).
25. Acrylamide (Ambion, Austin, TX).
26. 10% ammonium persulfate (APS).
27. TEMED (Bio-Rad, Hercules, CA).
28. 6× DNA loading buffer containing bromophenol blue and xylene cyanol.
29. 10-bp DNA ladder (Invitrogen, Carlsbad, CA).
30. 5 M NaCl.
31. Millex-HA 0.45-µm filter (Millipore, Billerica, MA).
32. Microcon columns (Millipore, Billerica, MA).
33. Rapid Ligation Kit (Roche Diagnostics GmbH, Mannheim, Germany).
34. Double-stranded DNA adapter: top strand 5′-p-TCGTATGCCGTCTTCTGCTTG-3′ and bottom strand 3′-NNAGCATACGGCAGAAGACGAAC-5′, where p is a phosphate group and N is any nucleotide (Integrated DNA Technologies, Coralville, IA).
35. P5 primer: 5′-ATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA-3′ (Integrated DNA Technologies, Coralville, IA).
36. P7 primer: 5′-CAAGCAGAAGACGGCATACGA-3′ (Integrated DNA Technologies, Coralville, IA).
37. Glycogen (Ambion, Austin, TX).
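The PARE materials above include MmeI and a 5′ RNA adapter whose 3′ end, written as DNA, is TCCGAC. That sequence matches the MmeI recognition site TCCRAC. MmeI cuts 20 nt downstream of its site on the top strand (18 nt on the bottom strand), so the enzyme should release a 20-nt signature of the mRNA 5′ end that was ligated to the adapter. The NN overhang on the double-stranded adapter (item 34) appears designed to match the 2-nt 3′ overhang MmeI leaves. The Python sketch below illustrates that logic and is not the published PARE pipeline. It checks for the site and extracts the 20-nt tag from a hypothetical adapter-tag-adapter string, assuming the tag is followed directly by the top strand of the double-stranded adapter.

```python
import re

# DNA form of the 5' RNA adapter (item 8) and the ds adapter top strand (item 34)
ADAPTER_5P = "GTTCAGAGTTCTACAGTCCGAC"
ADAPTER_3P_TOP = "TCGTATGCCGTCTTCTGCTTG"
MMEI_SITE = re.compile("TCC[AG]AC")  # MmeI recognition sequence TCCRAC
TAG_LENGTH = 20                       # MmeI cuts 20 nt 3' of its site (top strand)


def mmei_site_at_3prime_end(adapter):
    """True if the adapter ends with an MmeI recognition site."""
    return MMEI_SITE.fullmatch(adapter[-6:]) is not None


def pare_signature(construct):
    """Return the 20-nt mRNA signature, or None if the layout does not fit.

    ASSUMPTION: the string is 5' adapter + 20-nt tag + 3' adapter top strand,
    as in the hypothetical example below.
    """
    construct = construct.upper()
    if not construct.startswith(ADAPTER_5P):
        return None
    tag = construct[len(ADAPTER_5P):len(ADAPTER_5P) + TAG_LENGTH]
    rest = construct[len(ADAPTER_5P) + TAG_LENGTH:]
    if len(tag) < TAG_LENGTH or not rest.startswith(ADAPTER_3P_TOP):
        return None
    return tag


mrna_5prime = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt mRNA 5' end
construct = ADAPTER_5P + mrna_5prime + ADAPTER_3P_TOP
print(mmei_site_at_3prime_end(ADAPTER_5P), pare_signature(construct))
```

Mapping such 20-nt signatures back to transcripts is what places each cleavage site for the PARE analysis.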
3. Methods
3.1. Abiotic Stress Treatment
To avoid unintended biotic or abiotic stresses, the plants for the experiments described here are grown on a defined growth medium instead of soil. It is important to monitor the expression of a known stress-regulated gene (Fig. 2). This verifies that the stress treatment is effective and shows which specific conditions, such as the duration of the treatment, are appropriate. Rather than describing all possible abiotic stress conditions, we describe a few examples from rice and Arabidopsis (Table 1). For rice, the stress treatments below are applied to plants grown in parallel for 14 days after sowing on MS agar medium in a growth chamber with 12 h light at 28°C/12 h dark at 25°C. For Arabidopsis, stress is imposed on seedlings grown for 2 weeks on MS agar plates in an incubator set to 16 h light/8 h dark at 21°C. The entire harvest procedure is completed in less than 10 min. Control RNA samples should be prepared from nonstressed plants that are handled in exactly the same way but not exposed to the stress conditions.
[Fig. 2 image: expression time course at 0, 2, 4, 8, 12, and 24 h; the drought and salt panels show SalT and RBCS, and the cold panel shows OsWRKY71.]
Fig. 2. Use of known stress-regulated genes to monitor and select stress conditions for library construction. Total RNA was isolated from 2-week-old rice seedlings subjected to drought, salt, or cold stress during a 24-h time course. Expression of stress-inducible (SalT and OsWRKY71) and stress-repressible (RBCS) genes was examined to choose optimal conditions for library construction. In these examples, the 8-h treatment was selected for small RNA library construction.
3.1.1. Cold Stress
Transfer 2-week-old rice or Arabidopsis seedlings to 4°C in a cold room under continuous light for 8 h. Cold stress is more effective under light conditions.
3.1.2. Heat Stress
Transfer 2-week-old seedlings to an incubator set to 42°C for 8 h (rice) or 28°C for 24 h (Arabidopsis). Humidity and light/dark cycle should be identical to the control conditions.
3.1.3. Drought Stress
Desiccation can be used as a proxy for drought. Remove 2-week-old rice seedlings from the medium and expose them to air in an incubator for 8 h. For Arabidopsis, gently pull 2-week-old seedlings from the MS medium and expose their roots to air for 4 h.
3.1.4. Salt Stress
Transfer 2-week-old rice seedlings to 300 mM NaCl solution for 8 h. For Arabidopsis, pour 300 mM NaCl solution onto plates containing 2-week-old seedlings and soak for 8 h.
3.1.5. Sulfate Starvation
Remove the endosperm from 9-day-old rice seedlings to prevent nutrient supply from the endosperm. Transfer the seedlings to MS S-deficient medium (for sulfate starvation) or MS medium (for the control). After 5 days of treatment, separate shoots and roots for independent analysis. Analyzing them separately helps distinguish the roles of miRNAs in primary sulfate uptake by roots from their roles in internal sulfate translocation in shoots under sulfate-deficient conditions. For Arabidopsis, plate seeds on MS medium with varying concentrations of sulfate and grow them under the stress for the full 2 weeks.
3.1.6. Phosphate Starvation
Remove the endosperm from 9-day-old rice seedlings to prevent nutrient supply from the endosperm. Transfer the seedlings to MS P-deficient medium (for phosphate starvation) or MS medium (for the control). After 5 days of treatment, separate shoots and roots for independent analysis. For Arabidopsis, plate seeds on MS medium with varying concentrations of phosphate and grow them for the full 2 weeks.
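The phosphate concentrations for the Arabidopsis series are left open above. Purely as an illustration, the Python sketch below computes how much KH2PO4 to add per liter of MS P-deficient medium for a hypothetical series. The series is expressed as fractions of the full-strength MS level of 1.25 mM listed in Subheading 2.1. The molar mass is that of anhydrous KH2PO4 (about 136.09 g/mol), and the accompanying change in potassium is not compensated in this sketch.

```python
KH2PO4_MW = 136.09     # g/mol, anhydrous KH2PO4
FULL_MS_PI_MM = 1.25   # mM KH2PO4 in full-strength MS medium (Subheading 2.1)


def kh2po4_mg_per_litre(pi_mm):
    """mg of KH2PO4 per liter for a given phosphate concentration (mM)."""
    # mmol/L multiplied by g/mol gives mg/L
    return pi_mm * KH2PO4_MW


# Hypothetical series: fractions of the full-strength MS phosphate level
for fraction in (0.0, 0.05, 0.25, 1.0):
    pi_mm = fraction * FULL_MS_PI_MM
    print(f"{pi_mm:.4f} mM Pi: {kh2po4_mg_per_litre(pi_mm):.1f} mg KH2PO4 per liter")
```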
3.2. Small RNA Library Construction and Sequencing
Small RNA populations in plants are so vast and complex that small RNA library construction and deep sequencing are necessary to elucidate the identity, regulation, and function of miRNAs. With deep sequencing methods such as MPSS, 454, or SBS, the number of times a small RNA is sequenced from a library also provides a reliable indicator of its relative abundance. To identify abiotic stress-responsive miRNAs, small RNA libraries are constructed from stress-treated plant tissues and controls (see Notes 1 and 2). Construction of small RNA libraries is described in Chapter 8. Typically, 10 rice or 100 Arabidopsis seedlings yield more than enough total RNA (isolated as described in Chapter 3) for small RNA isolation and the preparation of one or more libraries.
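Because read counts serve as the abundance measure, the first computational step is usually to collapse identical reads into unique sequences with their counts. The Python sketch below shows that step and a simple length distribution. It assumes reads are already adapter-trimmed and uses an 18–26-nt window for the length summary, which is a choice made here for illustration. The reads are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse adapter-trimmed reads into {unique sequence: read count}."""
    return Counter(read.strip().upper() for read in reads if read.strip())


def length_distribution(counts, min_len=18, max_len=26):
    """Total reads per length, restricted to min_len..max_len nt."""
    dist = Counter()
    for seq, n in counts.items():
        if min_len <= len(seq) <= max_len:
            dist[len(seq)] += n
    return dict(sorted(dist.items()))


# Hypothetical trimmed reads (the empty string mimics a read lost to trimming)
reads = ["ACGTACGTACGTACGTACGTA"] * 3 + ["TTTTGGGGCCCCAAAATTTTGG", ""]
counts = collapse_reads(reads)
for seq, n in counts.most_common():
    print(seq, n)
print(length_distribution(counts))
```

In plant libraries, the length distribution is a quick quality check: miRNAs concentrate at 20–22 nt, whereas many siRNAs are 24 nt.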
3.3. Computational Analysis and Validation of Abiotic Stress-Associated miRNAs
3.3.1. Computational Analysis of Abiotic Stress-Associated miRNAs
To find known or novel miRNAs associated with abiotic stress responses, deep sequencing data can be analyzed with a series of computational tools. First, adapter sequences are removed with trimming scripts to obtain the small RNA sequences (see Chapter 7). The trimmed sequences are then matched against the genome to remove likely contaminants. Sequences that do not match may derive from fungal, bacterial, or viral sources, especially in plants not grown under sterile conditions. They may also result from instrument sequencing errors or derive from unsequenced regions of the genome, such as centromeres or ribosomal repeats. Next, to streamline the analysis, sequences matching noncoding RNAs (rRNAs, tRNAs, small nuclear RNAs, and small nucleolar RNAs) are typically removed, as are those matching the chloroplast or mitochondrial genomes. Most of these sequences are thought to represent nonspecific decay products of the corresponding RNAs, although some may be biologically interesting and could be studied later if there is reason to do so.
The abundance of each sequence is expressed relative to the size of the library, in transcripts per million (TPM), rather than as a raw count:
$$\mathrm{TPM} = \frac{\text{raw abundance}}{N_{\text{genome}} - N_{\text{excluded}}} \times 10^{6}$$
where N_genome is the total number of genome-matched reads in the library and N_excluded is the number of those reads matching tRNA, rRNA, snRNA, snoRNA, chloroplast, or mitochondrial sequences (see Note 3). Candidate small RNAs regulated by abiotic stress can be identified from the ratio of normalized abundances between the control and stress libraries. To minimize noise from technical bias, it is preferable to select for further functional analysis the sequences showing the greatest differences between libraries, preferably several-fold or more. As with microarrays, replicates reduce biological and technical variation and, with the increasing availability of sequencing instruments, are becoming more affordable. Replicates are particularly useful if the goal is to accurately characterize the population of small RNAs whose abundance changes in response to an abiotic stress. They can also help identify the best candidates for functional studies.
The miRNAs found to be regulated by a given stress fall into three types: (1) known miRNAs already known to be regulated by the stress, (2) known miRNAs newly found to be regulated by the stress, and (3) new miRNAs that are regulated by the stress. MicroRNAs of the first type are good positive controls and could also reveal unknown aspects of their regulation, such as tissue or organ specificity. However, the second and third types are likely the most interesting. Those of the second type are easy to identify and probably have known targets that can be investigated for biological function in stress responses, as discussed below.
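The normalization described above can be written as a few lines of Python. The sketch below is illustrative only. It computes the library-size denominator (genome-matched reads minus excluded tRNA/rRNA/snRNA/snoRNA/organellar reads), TPM for one sequence in each library, and a log2 stress/control ratio. The pseudocount in the ratio is an assumption added here so that sequences absent from one library do not cause division by zero; it is not part of the chapter's method. All read numbers are hypothetical.

```python
import math


def library_size(genome_matched_reads, excluded_reads):
    """Genome-matched reads minus reads matching tRNA, rRNA, snRNA,
    snoRNA, chloroplast, or mitochondrial sequences."""
    size = genome_matched_reads - excluded_reads
    if size <= 0:
        raise ValueError("library size must be positive")
    return size


def tpm(raw_count, lib_size):
    """Normalized abundance in transcripts per million."""
    return raw_count / lib_size * 1_000_000


def log2_ratio(stress_tpm, control_tpm, pseudocount=1.0):
    """log2(stress/control); the pseudocount is an ASSUMPTION added here."""
    return math.log2((stress_tpm + pseudocount) / (control_tpm + pseudocount))


# Hypothetical library totals and read counts for one candidate sequence
control_size = library_size(5_000_000, 1_000_000)  # 4,000,000 reads
stress_size = library_size(3_000_000, 1_000_000)   # 2,000,000 reads
control_tpm = tpm(400, control_size)
stress_tpm = tpm(1_000, stress_size)
print(f"{control_tpm:.1f} TPM -> {stress_tpm:.1f} TPM, "
      f"log2 ratio {log2_ratio(stress_tpm, control_tpm):.2f}")
```

In this made-up case, the raw counts differ 2.5-fold but the normalized abundances differ 5-fold, because the stress library is half the size. This is why ratios are taken on TPM values rather than raw counts.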
Accordingly, examination of the abundances of known miRNAs is useful to include as early analysis of small RNA data sets. Particular attention should be paid to putative novel miRNA sequences that are regulated by a given stress. To minimize the number of candidates requiring manual and experimental analysis, we typically filter the sequences for those with high abundance (over 100 TPM), few hits on genome (<20 times), and 20–24 nt in length. Recently, a new primary criterion was established for annotating a new miRNA and listing it in the miRBase of the Sanger registry (20): there should be evidence for the relatively precise and predominant excision of that miRNA from a miRNA precursor-like stem-loop structure. That is, the putative miRNA will in most cases be the major small RNA derived from a stem-loop precursor, and often the miRNA* sequence from the other side of the stem will be evident in the library at a lower abundance. If the stem-loop precursor has no strongly predominant small RNA but
Abiotic Stress-Associated miRNAs
rather multiple distributed sequences on one or both strands, then a regulated small RNA from this locus may well be very interesting, but most likely as a regulated siRNA rather than as a new miRNA. The number of stress-regulated new miRNA candidates is usually small enough for each candidate to be examined individually for stem-loop potential and small RNA distribution.
3.3.2. Validating the regulation of Abiotic Stress-Associated miRNAs
Validating differentially regulated miRNA expression patterns is essential. As with any high-throughput technique, some false positives are to be expected. It is best to examine whether the differential expression revealed by one method is also reproduced by another method and in a biological replicate. For example, real-time quantitative RT-PCR, Northern blotting (Fig. 3 and Chapter 13), or splinted-ligation-based
Fig. 3. Confirmation of stress regulation of OsmiR399, a low abundance miRNA. (a) RNA gel blot analysis of OsmiR399 in rice seedling shoots grown in high phosphate (+Pi) or phosphate-deficient (−Pi) medium. tRNA staining is shown as a loading control. The amount of low molecular weight RNA (see Chapter 8) loaded per lane was equivalent to 150 µg of total RNA. (b) Validation of OsmiR399 expression by splinted ligation-mediated miRNA detection. Arrows indicate the position of the expected 35-nt product, which is the ligation product of the 14-nt detection oligonucleotide and the 21-nt OsmiR399. M is the radiolabeled low molecular weight marker. OsmiR168 was used as an internal control. The amount of total RNA used per lane in this case was only 2 µg.
Jeong et al.
miRNA detection methods (21) have been shown to quantify miRNAs reliably.
3.3.2.1. Detection of miRNAs using splinted ligation
Splinted-ligation-based miRNA detection uses a miRNA-specific bridge oligonucleotide that base-pairs with both the miRNA and a 5′-end-radiolabeled ligation oligonucleotide. The captured miRNA is directly labeled by ligation, separated by denaturing gel electrophoresis, and visualized by autoradiography. Because hybridization takes place in solution, detection sensitivity is improved. Whereas standard Northern blotting requires more than 100 µg of total RNA to detect miRNAs, this method works with nanogram to microgram amounts of total RNA (see Fig. 3 for a comparison using a low-abundance miRNA). Because no amplification step is required, there is less bias than with real-time RT-PCR.
3.3.2.1.1. Preparation of ligation oligonucleotide
1. To prepare [γ-32P]-labeled ligation oligonucleotide, assemble the following components at room temperature:
(a) 2 µL ligation oligonucleotide (10 µM)
(b) 2 µL 10× OptiKinase reaction buffer
(c) 2 µL [γ-32P]-ATP (6,000 Ci/mmol)
(d) 2 µL OptiKinase
(e) Nuclease-free water to a final volume of 20 µL
2. To prepare radiolabeled markers, assemble the same reaction, replacing the ligation oligonucleotide with 1 µg of low molecular weight marker.
3. Mix gently, and then centrifuge briefly.
4. Incubate at 37°C for 1 h.
5. Prepare a PrepEase sequencing dye clean-up column to remove unincorporated [γ-32P]-ATP. Centrifuge the column briefly to collect the dry resin at the bottom.
6. Add 600 µL of RNase-free water to the column to hydrate the resin.
7. Vortex briefly.
8. Incubate at room temperature for at least 30 min.
9. Centrifuge at 750 × g for 2 min to remove the remaining water. Discard the flow-through.
10. After the radiolabeling incubation, dilute the reactions (from step 4) to 100 µL by adding 80 µL of RNase-free water.
11. Place each column in a 1.5-mL tube and apply the diluted reaction to the top of the resin.
12. Centrifuge the column at 750 × g for 3 min to purify the radiolabeled reaction.
13. Store at −20°C if not required immediately.
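The labeled oligonucleotide and markers prepared above are stored at −20°C, and the gel electrophoresis steps (Subheading 3.3.2.1.4) note that markers can be reused for up to 2 months once the amount loaded is adjusted for radioactive decay. The sketch below shows that arithmetic under the assumption that the activity loaded per lane should stay constant. It uses the physical half-life of 32P (about 14.3 days); the 1:1,000 dilution and 15-µL load come from the protocol, while the function names are illustrative.

```python
# Sketch: compensating for 32P decay when reusing a stored radiolabeled marker.
# Assumption: the activity loaded per lane should stay constant over time.

P32_HALF_LIFE_DAYS = 14.3  # physical half-life of phosphorus-32


def remaining_fraction(days_elapsed, half_life_days=P32_HALF_LIFE_DAYS):
    """Fraction of the original 32P activity left after `days_elapsed`."""
    return 0.5 ** (days_elapsed / half_life_days)


def adjusted_dilution(fresh_dilution, days_elapsed):
    """Dilution factor that keeps activity per loaded volume constant.

    For example, a fresh 1:1,000 dilution becomes 1:500 after one half-life.
    """
    return fresh_dilution * remaining_fraction(days_elapsed)


def adjusted_load_volume(fresh_volume_ul, days_elapsed):
    """Alternatively, keep the dilution fixed and load a larger volume."""
    return fresh_volume_ul / remaining_fraction(days_elapsed)


if __name__ == "__main__":
    for days in (0, 14.3, 30, 60):
        print(f"day {days:>5}: {remaining_fraction(days):.3f} of activity left, "
              f"use 1:{adjusted_dilution(1000, days):.0f}")
```

After about 2 months (roughly four half-lives) only about 5% of the activity remains, so in practice the marker is diluted less rather than loaded in an ever larger volume.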
3.3.2.1.2. Bridge oligonucleotide design
The bridge oligonucleotide is a DNA oligonucleotide whose 5′ and 3′ ends are complementary to the ligation oligonucleotide and to a specific miRNA, respectively. For example, to detect OsmiR156a (5′-UGACAGAAGAGAGUGAGCAC-3′), the bridge oligonucleotide should be designed as 5′-GAATGTCATAAGCGGTGCTCACTCTCTTCTGTCA-3′. The first 14 nt (GAATGTCATAAGCG) are the reverse-complement DNA sequence of the ligation oligonucleotide, and the remaining sequence is the reverse complement of OsmiR156a. Standard desalting after synthesis is sufficient to purify the bridge oligonucleotide. In some cases, it is desirable to block the 3′ end, or both the 5′ and 3′ ends, by incorporating a modification such as a C3 spacer, amino modifier, inverted deoxyT, or dideoxyC. These modifications prevent unwanted side ligation reactions. In general, however, such modifications are not necessary.
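The design rule above is mechanical, so it can be scripted for any miRNA. The sketch below is illustrative and not a published tool. The ligation oligonucleotide sequence is not given in the text, so here it is back-calculated from the 14-nt 5′ segment of the OsmiR156a bridge example; with a different ligation oligonucleotide, substitute its actual sequence. The function names are hypothetical.

```python
# Sketch of bridge-oligonucleotide design for splinted-ligation detection.
# 5' part: reverse complement of the ligation oligonucleotide.
# 3' part: reverse complement (as DNA) of the target miRNA.

_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement as DNA; U in RNA input pairs like T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_bridge(ligation_oligo: str, mirna: str) -> str:
    """Bridge oligo (5'->3') that places the miRNA 3' end next to the
    5' end of the ligation oligonucleotide."""
    return revcomp_dna(ligation_oligo) + revcomp_dna(mirna)


def ligation_product_length(ligation_oligo: str, mirna: str) -> int:
    """Expected size of the labeled miRNA-ligation oligo product."""
    return len(ligation_oligo) + len(mirna)


# Not stated in the text: derived from the 14-nt 5' segment of the example bridge.
LIGATION_OLIGO = revcomp_dna("GAATGTCATAAGCG")  # CGCTTATGACATTC

if __name__ == "__main__":
    os_mir156a = "UGACAGAAGAGAGUGAGCAC"
    print(design_bridge(LIGATION_OLIGO, os_mir156a))
    # -> GAATGTCATAAGCGGTGCTCACTCTCTTCTGTCA, matching the example above
    print(ligation_product_length(LIGATION_OLIGO, os_mir156a), "nt")
```

The product-length helper mirrors Fig. 3, where a 14-nt oligonucleotide ligated to the 21-nt OsmiR399 gives the expected 35-nt band.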
3.3.2.1.3. miRNA Capture, Ligation, and Inactivation
1. Prepare the bridge oligonucleotide at 0.1 pmol/µL by diluting it in 10× capture buffer.
2. For the capture reaction, assemble the following in a nuclease-free PCR tube on ice:
(a) ~1–8 µg of total RNA
(b) RNase-free water to a final volume of 8 µL
(c) 1 µL bridge oligonucleotide (0.1 pmol/µL in 10× capture buffer)
(d) 1 µL radiolabeled ligation oligonucleotide
3. Mix gently, and then centrifuge briefly.
4. Incubate the mixture at 94°C for 1 min, 65°C for 2 min, and 37°C for 10 min in a thermal cycler.
5. To prepare the ligase master mix, combine the following per sample:
(a) 3 µL 5× Ligate-IT buffer
(b) 1 µL Ligate-IT enzyme
(c) 1 µL RNase-free water
6. Add 5 µL of the ligase master mix to each sample.
7. Mix gently and centrifuge briefly.
8. Incubate in a thermal cycler at 30°C for 1 h.
9. To inactivate the reaction, add 1 µL of shrimp alkaline phosphatase to each sample.
10. Mix gently and centrifuge briefly.
11. Incubate in a thermal cycler at 37°C for 15 min.
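When many samples are processed, the per-sample ligase mix in step 5 is usually scaled into a single master mix. The sketch below does that scaling and also checks the volume bookkeeping: 10 µL of capture reaction, plus 5 µL of ligase mix, plus 1 µL of shrimp alkaline phosphatase gives the 16 µL that is later mixed with an equal volume of loading dye. The 10% overage is an assumption to cover pipetting loss and is not part of the protocol.

```python
# Sketch: scaling the per-sample ligase master mix and tracking reaction volume.
# The 10% overage is an assumption, not part of the protocol.

LIGASE_MIX_PER_SAMPLE_UL = {
    "5x Ligate-IT buffer": 3.0,
    "Ligate-IT enzyme": 1.0,
    "RNase-free water": 1.0,
}

CAPTURE_UL = 8.0 + 1.0 + 1.0  # RNA in water + bridge oligo + labeled ligation oligo
SAP_UL = 1.0                  # shrimp alkaline phosphatase added at step 9


def master_mix(n_samples, overage=0.10, per_sample=LIGASE_MIX_PER_SAMPLE_UL):
    """Volumes (uL) of each component for n_samples plus overage."""
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_sample.items()}


def final_reaction_volume():
    """Capture mix + 5 uL ligase mix + SAP; matches the 16 uL mixed with dye."""
    return CAPTURE_UL + sum(LIGASE_MIX_PER_SAMPLE_UL.values()) + SAP_UL


if __name__ == "__main__":
    for name, vol in master_mix(12).items():
        print(f"{name}: {vol} uL")
    print("final volume per sample:", final_reaction_volume(), "uL")
```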
3.3.2.1.4. Gel electrophoresis and visualization
1. Prepare a 15% urea-polyacrylamide gel by combining the following in a 50-mL tube (makes two gels):
(a) 14.4 g urea (8 M)
(b) 11.2 mL 40% acrylamide/bis solution (19:1)
(c) 1.5 mL 20× glycerol-tolerant gel buffer
(d) Nuclease-free water to a final volume of 30 mL
2. Place the tube on a rocker until the urea is completely dissolved.
3. Filter the solution through a nitrocellulose filter and allow it to reach room temperature.
4. Add 120 µL of freshly prepared 10% ammonium persulfate to the solution. Mix well.
5. Add 9.2 µL of TEMED. Mix thoroughly by swirling.
6. Pour the solution between glass plates and allow the acrylamide to polymerize at room temperature for at least 30 min.
7. Pre-run the gel in 1× glycerol-tolerant gel buffer for 20–30 min at 200 V.
8. Add an equal volume (16 µL) of formamide loading dye to each 16-µL reaction.
9. Prepare the radiolabeled marker as described earlier. Use 15 µL of a 1:1,000 dilution of [32P]-labeled low molecular weight marker per lane. Radiolabeled markers can be stored at −20°C and used for up to 2 months, adjusting the amount loaded to compensate for radioactive decay.
10. Denature the samples at 95°C for 5 min and keep them on ice until loading.
11. Load 10–20 µL of each sample onto the gel alongside the radiolabeled markers.
12. Run the gel at 200 V for about 1 h, until the bromophenol blue dye (the lower dye front) runs off the gel. The unligated ligation oligonucleotide (14 nt) should have run off by this point.
13. After the run, detect the ligated miRNA with X-ray film or by phosphorimaging.
3.4. Computational Analysis and Validation of Target Genes of Abiotic Stress-Associated miRNAs
3.4.1. Computational Analysis of Target Genes
Abiotic stress-upregulated miRNAs are expected to target negative regulators of stress responses or positive regulators of processes that are inhibited by stress. Conversely, stress-downregulated miRNAs may repress positive regulators of stress responses and/or stress-upregulated genes. Publicly available target prediction websites can be used to identify potential targets of stress-related miRNAs (22, 23). Although some targets of known plant miRNAs have been reported with a mismatch score of 3.5, a more stringent cutoff (3.0) is usually applied in target prediction to reduce false positives. Detailed methods are described in Chapter 4.
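To make the mismatch score concrete, the sketch below scores a miRNA against an equal-length target site. The exact scoring rules differ between prediction tools and are covered in Chapter 4. The weights used here are an assumption that follows a commonly used plant convention: a mismatch costs 1, a G:U wobble costs 0.5, and penalties are doubled at miRNA positions 2–13. Gaps and bulges are ignored in this ungapped sketch.

```python
# Sketch of a position-weighted mismatch score for a plant miRNA:target duplex.
# ASSUMED rules (tools differ, see Chapter 4): mismatch = 1, G:U = 0.5,
# penalties doubled at miRNA positions 2-13. No gaps or bulges are modeled.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement_rna(seq: str) -> str:
    return "".join(RNA_COMPLEMENT[b] for b in reversed(to_rna(seq)))


def duplex_score(mirna: str, site: str, core=(2, 13)) -> float:
    """Score a miRNA (5'->3') against an equal-length target site (5'->3')."""
    mirna, site = to_rna(mirna), to_rna(site)
    if len(mirna) != len(site):
        raise ValueError("this ungapped sketch needs equal-length sequences")
    score = 0.0
    for i, m in enumerate(mirna, start=1):
        t = site[-i]  # miRNA position i pairs with the i-th site base from its 3' end
        if (m, t) in WATSON_CRICK:
            penalty = 0.0
        elif (m, t) in WOBBLE:
            penalty = 0.5
        else:
            penalty = 1.0
        if core[0] <= i <= core[1]:
            penalty *= 2  # doubled penalty in the assumed core region
        score += penalty
    return score


def passes_cutoff(score: float, cutoff: float = 3.0) -> bool:
    return score <= cutoff


if __name__ == "__main__":
    mir = "UGACAGAAGAGAGUGAGCAC"  # OsmiR156a
    perfect = reverse_complement_rna(mir)
    wobble_at_pos1 = perfect[:-1] + "G"  # U:G pair at miRNA position 1
    for s in (perfect, wobble_at_pos1):
        sc = duplex_score(mir, s)
        print(s, sc, passes_cutoff(sc))
```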
3.4.2. Validation of Target Genes by RNA Ligase-Mediated (RLM) 5′-RACE
Computational analysis will often identify multiple putative targets for a given miRNA. In our experience, it is critical to verify not only that these putative targets are expressed during the abiotic stress, but also that they are cleaved within the miRNA-complementary region. The RNA ligase-mediated (RLM) 5′ RACE procedure is often used to accomplish both.
3.4.2.1. Ligation of the 5′ Adapter
The validation of miRNA targets via 5′ RACE has been successfully accomplished using both total (24) and poly(A)-selected RNA (10, 25) as the starting material (see Note 4). When using a commercial 5′ RACE kit, such as the FirstChoice RLM-RACE kit (Ambion, Austin, TX) or the GeneRacer kit (Invitrogen, Carlsbad, CA), which were developed to map the 5′ ends of mRNAs, omit the tobacco acid pyrophosphatase (TAP) and calf intestinal phosphatase (CIP) treatments and proceed directly to RNA adapter ligation, so that the 5′ phosphate generated during miRNA-mediated cleavage, which is required for ligation, is retained.
1. To set up the ligation reaction, assemble the following in a nuclease-free tube on ice:
(a) 1 µg of total RNA or 25 ng of poly(A) RNA
(b) 5 units T4 RNA ligase (Promega, Madison, WI)
(c) 1 µL of 10× RNA ligase buffer (Promega, Madison, WI)
(d) 0.3 µg of RNA adapter (Integrated DNA Technologies, Coralville, IA)
(e) Nuclease-free water to a total volume of 10 µL
2. Incubate at 37°C for 1 h.
3.4.2.2. Reverse Transcription (RT) for RLM 5′ RACE
Creation of cDNA can be accomplished using oligo(dT) or gene-specific primers to prime first-strand synthesis. If multiple miRNA targets are being tested, a single oligo(dT)-primed cDNA reaction can serve as template for PCR of up to 20 putative miRNA targets, whereas for validating low-abundance targets, using a gene-specific primer at this step may increase the chances of success. Because the RT reaction requires several different temperatures, using a thermocycler eliminates the need for multiple heat blocks or water baths.
1. To set up the RT reaction, assemble the following on ice in a nuclease-free PCR tube:
(a) 0.5 µL oligo(dT)12–18 (50 µM), or 2 pmol gene-specific primer
(b) 1 µL of the ligated RNA from the previous reaction
(c) 1 µL 10 mM dNTP mix
(d) Nuclease-free water to 12 µL
2. Incubate at 65°C for 5 min.
3. Briefly chill on ice.
4. Add 4 µL of 5× First-Strand Buffer, 2 µL of 0.1 M DTT, and 1 µL of RNaseOUT (Invitrogen, Carlsbad, CA).
5. Incubate at 42°C for 2 min.
6. Add 1 µL of SuperScript II RT (Invitrogen, Carlsbad, CA).
7. Incubate at 42°C for 1 h.
3.4.2.3. Amplification of RACE Products
Some labs have found that one round of PCR amplification is sufficient when starting from poly(A)-selected RNA (25), whereas others use a second round of amplification to increase the specificity and/or yield of the PCR product when using poly(A) or total RNA (10, 24). Assuming two rounds of PCR are required, two sets of nested PCR primers should be designed (see Note 5). In each set, one primer is specific to the RNA adapter and the other is specific to the putative miRNA target; each set should amplify a PCR product of 100–500 nucleotides (Fig. 4). The primer sets amplifying the larger and smaller products are designated outer and inner, respectively, in the following steps.
1. To set up the PCR reactions, assemble the following:
(a) 5 µL 10× PCR buffer
(b) 2 µL of each outer primer (10 µM stock)
(c) 4 µL 10 mM dNTPs
(d) 1 µL of cDNA from the previous reaction
(e) 0.5 µL Choice-Taq DNA Polymerase (5 U/µL) (Denville Scientific, Inc., Metuchen, NJ)
(f) Nuclease-free water to 50 µL
2. The PCR program is:
(a) 94°C for 2 min
(b) 94°C for 30 s
• Primer-specific annealing temperature for 30 s
• 72°C for 30 s
• Repeat (b) for 35 cycles
(c) 72°C for 10 min
3. For the second round of amplification, set up the reaction as for the first, except that 1 µL of the first PCR reaction replaces the cDNA and the inner primers replace the outer primers.
Fig. 4. RLM 5′-RACE. (a) Schematic of the miRNA cleavage product from At2g44790 after adapter ligation. The black and gray arrows represent sets of nested PCR primers. In each set, one primer is specific for the adapter, whereas the other is gene specific. (b) Poly(A)-selected RNA was used as the starting material for RLM 5′ RACE. cDNA was primed using oligo(dT) and subsequently amplified with the nested PCR primers shown in (a). The asterisk denotes the PCR products of the expected size, which were extracted and sequenced. (c) The miRNA-complementary site of At2g44790 is shown paired with miR408. Vertical black bars represent paired bases, whereas the circle represents a G:U pair. The vertical arrow marks the miRNA cleavage site. The numbers above the arrow give the sequencing data (number of sequenced clones ending at that site relative to the total number sequenced).
3.4.2.4. Cloning and Sequencing
1. Run the PCR products on a 1.5–2% agarose TBE gel alongside a DNA marker.
2. Visualize with ethidium bromide, and carefully excise the band of the expected size.
3. Extract DNA from the gel using a kit such as NucleoSpin Extract II (Macherey-Nagel, Bethlehem, PA) following the manufacturer’s instructions.
4. Clone the PCR product and transform E. coli using the pGEM-T Easy Vector System II (Promega, Madison, WI) or the TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA), according to the manufacturer’s instructions (see Note 6).
5. Extract plasmid DNA using a kit such as NucleoSpin Plasmid (Macherey-Nagel, Bethlehem, PA) following the manufacturer’s instructions.
6. Verify that the plasmids contain inserts using restriction digests or PCR.
7. Send 10–20 insert-containing plasmids for sequencing.
8. Verify that the majority of the 5′ ends begin 11 bases downstream of the start of the miRNA-complementary region on the target, that is, at the canonical cleavage site between the target nucleotides paired with positions 10 and 11 of a 21-nt miRNA (Fig. 4c).
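Step 8 can be checked with a short script once clone 5′ ends have been mapped to transcript coordinates. The sketch below is illustrative: it assumes cleavage between the target nucleotides paired with miRNA positions 10 and 11, uses 0-based coordinates, and tallies clones in the same "hits over total sequenced" form shown in Fig. 4c. The clone positions in the example are hypothetical.

```python
# Sketch: predicted 5'-RACE end for a miRNA cleavage site and a clone tally.
from collections import Counter


def cleavage_offset(mirna_len: int) -> int:
    """0-based offset, within the target site (5'->3'), of the first nucleotide
    of the 3' cleavage product.

    Site position j (1-based) pairs with miRNA position mirna_len + 1 - j, so
    the nucleotide paired with miRNA position 10 sits at site position
    mirna_len - 9 and becomes the first base of the 3' fragment.
    """
    return mirna_len - 10


def tally_race_clones(clone_5p_ends, site_start, mirna_len=21):
    """Return (predicted 5' end, clones ending there, clones sequenced).

    clone_5p_ends: 0-based transcript coordinates of each clone's 5' end.
    site_start: 0-based coordinate of the first base of the complementary site.
    """
    expected = site_start + cleavage_offset(mirna_len)
    counts = Counter(clone_5p_ends)
    return expected, counts[expected], len(clone_5p_ends)


if __name__ == "__main__":
    # Hypothetical clone ends for a 21-nt miRNA whose site starts at position 1000.
    ends = [1011] * 8 + [1005, 1020]
    site, hits, total = tally_race_clones(ends, site_start=1000)
    print(f"predicted 5' end {site}: {hits}/{total} clones")
    # predicted 5' end 1011: 8/10 clones
```

For a 21-nt miRNA the offset is 11, matching the "11 bases downstream" rule; for a 20-nt miRNA the same reasoning gives 10.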
Fig. 5. Schematic depiction of PARE library construction. Library construction requires several steps: (1) Poly(A) RNA is extracted. (2) An RNA adapter is ligated to the 5′ ends of the single-stranded RNA molecules. (3) Reverse transcription is carried out to generate first-strand cDNA using an oligo(dT) primer with a 3′ adapter sequence. (4) A short PCR is used to amplify the cDNA. (5) The PCR products are digested with MmeI to generate fragments of equal size. (6) A double-stranded 3′ DNA adapter with degenerate nucleotides in the overhang region is ligated to the MmeI digestion products. (7) The resulting material is PCR amplified, gel purified, and used for high-throughput sequencing. Adapted from a figure first published in (44).
3.5. PARE, A Suitable Approach for the Global Identification of miRNA Targets Associated with Abiotic Stresses
The emergence of high-throughput sequencing technologies such as 454, SBS, and SOLiD has made it possible to identify miRNA targets on a genome-wide scale without relying on prediction software. A novel approach named Parallel Analysis of RNA Ends (PARE) (26) uses a modified RLM 5′ RACE together with bioinformatics tools to simultaneously identify miRNA targets and their associated miRNAs on a global scale. A synopsis of PARE library construction is shown in Fig. 5.
3.5.1. PARE Library Construction
Start a PARE library with 100–150 µg of total RNA extracted with Trizol (Invitrogen, Carlsbad, CA) or a similar reagent (see Chapter 3) according to the manufacturer’s instructions, and use the Oligotex kit (Qiagen, Gaithersburg, MD) to isolate poly(A) RNA. This generally yields 2–5 µg of poly(A) RNA at a concentration of ~100 ng/µL.
3.5.1.1. Ligation of the 5′ Adapter
1. For the ligation reaction, assemble the following components in a nuclease-free tube on ice:
(a) 1 µg poly(A) RNA
(b) 1 µL (200 µM) 5′ RNA adapter (Dharmacon, Inc., USA)
(c) 2 µL 10× RNA ligase buffer (Ambion, Austin, TX)
(d) 2 µL T4 RNA ligase (5 U/µL) (Ambion, Austin, TX)
(e) Nuclease-free water to a final volume of 20 µL
2. Mix gently, and then centrifuge briefly.
3. Incubate at 37°C for 1 h.
4. Add 280 µL of nuclease-free water to the reaction.
5. Extract once with 300 µL of phenol/chloroform/isoamyl alcohol (25:24:1).
6. Extract once with 300 µL of chloroform/isoamyl alcohol (24:1).
7. Add 1 µL of GlycoBlue (Ambion, Austin, TX), 30 µL of 3 M NaOAc, and 900 µL of cold 100% ethanol. Centrifuge at maximum speed for 30 min.
8. Wash with 500 µL of 70% ethanol, vacuum dry, and resuspend in 250 µL of nuclease-free water.
9. Remove unligated adapters by repurifying the poly(A) RNA with Oligotex (Qiagen, Gaithersburg, MD, USA) according to the manufacturer’s directions.
10. Proceed immediately to the reverse transcription step.
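The same ethanol precipitation (30 µL of 3 M NaOAc and 900 µL of ethanol for a 300-µL aqueous phase) recurs after the ligation, RT, and short PCR steps. The sketch below generalizes it to other volumes. The ratios (one-tenth volume of 3 M NaOAc, three volumes of ethanol relative to the aqueous phase) are inferred from the volumes in this protocol; treat them as an assumption if your starting volume differs substantially.

```python
# Sketch: scaling the ethanol precipitation used throughout PARE library prep.
# Ratios are inferred from the protocol: 30 uL 3 M NaOAc and 900 uL ethanol
# for a 300-uL aqueous phase.


def precipitation_volumes(aqueous_ul, salt_fraction=0.1, ethanol_ratio=3.0,
                          salt_stock_molar=3.0):
    """Return (NaOAc uL, ethanol uL, NaOAc molarity before ethanol is added)."""
    naoac_ul = aqueous_ul * salt_fraction
    ethanol_ul = aqueous_ul * ethanol_ratio
    salt_molar = salt_stock_molar * naoac_ul / (aqueous_ul + naoac_ul)
    return naoac_ul, ethanol_ul, salt_molar


if __name__ == "__main__":
    naoac, ethanol, salt = precipitation_volumes(300)
    print(f"{naoac:.0f} uL 3 M NaOAc, {ethanol:.0f} uL ethanol, {salt:.2f} M NaOAc")
```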
3.5.1.2. Reverse Transcription (RT) for PARE
1. To set up the RT reaction, assemble the following in a nuclease-free tube on ice:
(a) ~500 ng purified ligated RNA from above
(b) 2 µL 3′ adapter (dT) primer (50 µM) (Integrated DNA Technologies, Coralville, IA)
(c) Nuclease-free water to a final volume of 30 µL
2. Heat to 65°C for 5 min, and centrifuge briefly to cool.
3. Add the following:
(a) 10 µL 5× First-Strand Buffer (Invitrogen, Carlsbad, CA)
(b) 2.5 µL dNTP mix (10 mM each)
(c) 2.5 µL 0.1 M DTT
(d) 2 µL RNaseOUT (40 U/µL) (Invitrogen, Carlsbad, CA)
(e) 3 µL SuperScript II (200 U/µL) (Invitrogen, Carlsbad, CA)
4. Incubate the tube at 42–45°C for 3 h to overnight. Terminate the reaction by heating at 70°C for 10 min.
5. Add 250 µL of nuclease-free water to the reaction.
6. Extract once with 300 µL of phenol/chloroform/isoamyl alcohol (25:24:1).
7. Extract once with 300 µL of chloroform/isoamyl alcohol (24:1).
8. Add 1 µL of GlycoBlue (Ambion, Austin, TX), 30 µL of 3 M NaOAc, and 900 µL of cold 100% ethanol. Centrifuge at maximum speed for 30 min.
9. Wash with 500 µL of 70% ethanol, vacuum dry, and resuspend in 6 µL of nuclease-free water.
3.5.1.3. PCR Amplification (Short)
1. For the short PCR amplification, add the following to each of two PCR tubes on ice:
(a) 3 µL cDNA (RT reaction, ~0.5 µg)
(b) 32 µL nuclease-free water
(c) 10 µL 5× PCR buffer (Finnzymes, Espoo, Finland)
(d) 0.5 µL dNTP mix (25 mM each)
(e) 2 µL 5′ adapter primer (10 µM) (Integrated DNA Technologies, Coralville, IA)
(f) 2 µL 3′ adapter primer (10 µM) (Integrated DNA Technologies, Coralville, IA)
(g) 0.5 µL Phusion (Finnzymes, Espoo, Finland)
2. The PCR program is:
(a) 98°C for 1 min
(b) 98°C for 20 s
• 60°C for 30 s
• 72°C for 3 min
• Repeat (b) for 5 cycles¹
(c) 72°C for 7 min
3. Combine the tubes and add 200 µL of nuclease-free water.
4. Extract once with 300 µL of phenol/chloroform/isoamyl alcohol (25:24:1).
5. Extract once with 300 µL of chloroform/isoamyl alcohol (24:1).
6. Add 1 µL of GlycoBlue (Ambion, Austin, TX), 30 µL of 3 M NaOAc, and 900 µL of cold 100% ethanol. Centrifuge at maximum speed for 30 min.
7. Wash with 500 µL of 70% ethanol, vacuum dry, and resuspend in 250 µL of nuclease-free water.
8. Clean the products with Microcon columns (Millipore, Billerica, MA) according to the manufacturer’s instructions.
3.5.1.4. MmeI Digestion
1. Add the following to 2–5 µg of the cleaned PCR product dissolved in 22 µL of nuclease-free water:
(a) 3 µL 10× NEB #4 buffer (New England Biolabs, Ipswich, MA)
¹ The number of cycles required may vary according to the organism or tissue.
(b) 3 µL 10× SAM (500 µM) (New England Biolabs, Ipswich, MA)
(c) 2 µL MmeI (2 U/µL) (New England Biolabs, Ipswich, MA)
2. Incubate at 37°C for 2 h with occasional rotation.
3. Add 2 µL of shrimp alkaline phosphatase (Roche Diagnostics GmbH, Mannheim, Germany).
4. Dephosphorylate for 1 h at 37°C.
3.5.1.5. PAGE I
1. Prepare a 12% polyacrylamide gel by combining the following (makes two gels):
(a) 6.3 mL acrylamide (Ambion, Austin, TX)
(b) 2 mL 10× TBE
(c) 11.7 mL nuclease-free water
(d) 120 µL 10% APS
(e) 9.2 µL TEMED
2. Add 15 µL of 6× DNA loading buffer to the dephosphorylated MmeI digest. Load three lanes of the polyacrylamide gel with ~15 µL each, and one lane with 10-bp DNA ladder (Invitrogen, Carlsbad, CA). The DNA ladder is critical, as most likely no product band will be visible.
3. Run the gel at 180 V until there is good separation of bromophenol blue and xylene cyanol (~60 min).
4. Stain in an ethidium bromide bath for 5 min.
5. Excise the gel bands of the corresponding size (42 nt), place them in a 2-mL tube, and crush.
6. Add two volumes of elution buffer (0.3 M NaCl).
7. Elute overnight, or for at least 6 h, at room temperature with rotation.
8. Centrifuge for 10 min at maximum speed.
9. Filter the supernatant through a Millex-HA 0.45-µm filter unit to remove polyacrylamide gel pieces (optional).
10. Add 2.5 volumes of 100% ethanol and 1 µL of glycogen (Ambion, Austin, TX).
11. Place at −80°C for at least 3 h (overnight is best).
12. Centrifuge at maximum speed at 4°C for at least 45 min. Wash with 75% ethanol, and remove as much ethanol as possible.
13. Air-dry for 10 min.
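The gel recipes in this chapter follow simple C1V1 = C2V2 arithmetic, which is handy when casting a different volume or percentage. The sketch below reproduces the 15% urea gel of Subheading 3.3.2.1.4 (40% acrylamide/bis stock, 20× buffer, 8 M urea, 30 mL). The stock concentration of the acrylamide used in the 12% recipe above is not stated, so the 40% default here is an assumption to check against your own stock. Water is added last to volume and is therefore not computed, because dissolved urea contributes substantially to the final volume.

```python
# Sketch: C1V1 = C2V2 gel recipe calculator. The 40% stock default is an
# assumption; water is added last "to volume" and is not computed here.

UREA_MW_G_PER_MOL = 60.06


def stock_volume_ml(final_pct, final_volume_ml, stock_pct=40.0):
    """Volume of acrylamide/bis stock giving `final_pct` in the gel."""
    return final_pct * final_volume_ml / stock_pct


def buffer_volume_ml(final_x, final_volume_ml, stock_x):
    """Volume of an N-x buffer stock giving `final_x` in the gel."""
    return final_x * final_volume_ml / stock_x


def urea_grams(molarity, final_volume_ml, mw=UREA_MW_G_PER_MOL):
    return molarity * (final_volume_ml / 1000.0) * mw


def gel_recipe(final_pct, final_volume_ml, stock_pct=40.0,
               buffer_stock_x=10.0, urea_molarity=0.0):
    recipe = {
        "acrylamide stock (mL)": round(stock_volume_ml(final_pct, final_volume_ml, stock_pct), 2),
        "buffer stock (mL)": round(buffer_volume_ml(1.0, final_volume_ml, buffer_stock_x), 2),
    }
    if urea_molarity:
        recipe["urea (g)"] = round(urea_grams(urea_molarity, final_volume_ml), 2)
    return recipe


if __name__ == "__main__":
    # 15% urea gel from Subheading 3.3.2.1.4: 30 mL, 20x buffer, 8 M urea.
    print(gel_recipe(15, 30, buffer_stock_x=20, urea_molarity=8))
    # 12% native gel, 20 mL, 10x TBE, assuming a 40% stock.
    print(gel_recipe(12, 20))
```

The first call recovers the listed 11.2 mL (11.25 mL), 1.5 mL, and 14.4 g values, which is a quick sanity check before scaling.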
3.5.1.6. Ligation of 3′ Double-Stranded DNA Adapter
1. For the ligation, add the following to the gel-purified digested product:
(a) 6 µL Dilution buffer X1
(b) 4 µL 3′ double-stranded DNA adapter (1.5 µM)
(c) 1 µL DNA dilution buffer 2 (5×)
(d) 12.5 µL Rapid Ligation buffer 1 (2×)
(e) 2 µL Rapid Ligation Kit ligase
2. Incubate at room temperature for 2 h.
3.5.1.7. PAGE II
1. Prepare a 12% polyacrylamide gel as in Subheading “PAGE I.”
2. Add 6× DNA loading buffer to the ligation, load two lanes of the polyacrylamide gel with ~16 µL each, and load one lane with 10-bp DNA ladder (Invitrogen, Carlsbad, CA). Again, the DNA ladder is critical, as most likely no product band will be visible.
3. Run the gel at 180 V until there is good separation of bromophenol blue and xylene cyanol (~60 min).
4. Stain in an ethidium bromide bath for 5 min.
5. Excise the gel bands of the corresponding size (63 nt), place them in a 2-mL tube, and crush.
6. Add two volumes of elution buffer (0.3 M NaCl).
7. Elute overnight, or for at least 6 h, at room temperature with rotation.
8. Centrifuge for 10 min at maximum speed.
9. Filter the supernatant through a Millex-HA 0.45-µm filter unit (optional).
10. Add 2.5 volumes of 100% ethanol and 1 µL of glycogen (Ambion, Austin, TX).
11. Place at −80°C for at least 3 h (overnight is best).
12. Centrifuge at maximum speed at 4°C for at least 45 min. Wash with 75% ethanol, and remove as much ethanol as possible.
13. Air-dry for 10 min.
3.5.1.8. PCR Amplification (Long)
1. Resuspend the purified ligation product in 37 µL of nuclease-free water.
2. Transfer to a PCR tube and add the following components:
(a) 10 µL 5× cloned Phu buffer
(b) 1 µL 25 µM P5 primer
(c) 1 µL 25 µM P7 primer
(d) 0.5 µL 12.5 mM dNTP mix
(e) 0.5 µL Phu polymerase
3. The PCR program is:
(a) 98°C for 30 s
(b) 98°C for 10 s
• 60°C for 30 s
• 72°C for 15 s
• Repeat (b) for 21 cycles²
(c) 72°C for 3 min
3.5.1.9. PAGE III
1. Prepare a 12% polyacrylamide gel as in Subheading “PAGE I.”
2. Add 6× DNA loading buffer to the PCR reaction, load three lanes of the polyacrylamide gel with ~20 µL each, and load one lane with 10-bp DNA ladder (Invitrogen, Carlsbad, CA).
3. Run the gel at 180 V until there is good separation of bromophenol blue and xylene cyanol (~60 min).
4. Stain in an ethidium bromide bath for 5 min.
5. Excise the gel bands of the corresponding size (86 nt), place them in a 2-mL tube, and crush. If no bands are visible at this step, increase the number of PCR cycles in Subheading “PCR Amplification (Long).” If this does not solve the problem, double the initial amount of poly(A) RNA in Subheading “Ligation of the 5′ Adapter.”
6. Add two volumes of elution buffer (0.3 M NaCl).
7. Elute overnight, or for at least 6 h, at room temperature with rotation.
8. Centrifuge for 10 min at maximum speed.
9. Filter the supernatant through a Millex-HA 0.45-µm filter unit (optional).
10. Add 2.5 volumes of 100% ethanol and 1 µL of glycogen (Ambion, 5 mg/mL).
11. Place at −80°C for at least 3 h (overnight is best).
12. Centrifuge at maximum speed at 4°C for at least 45 min. Wash with 75% ethanol, and remove as much ethanol as possible.
13. Air-dry for 10 min and resuspend in 11.5 µL of nuclease-free water.
14. Use 0.5–1 µL for QC cloning as outlined above in Subheading “Cloning and Sequencing.”
15. If the library passes QC, it is ready for 454/SBS/SOLiD sequencing (see Note 7).
3.5.2. PARE Data Analysis
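The core steps described in this subheading (adapter trimming, matching signatures to transcripts, building the 30-nt t-signature, and building a t-plot) can be sketched as follows. This is a simplified illustration, not the published PARE pipeline: exact string matching stands in for a real aligner, and the adapter, transcript, and abundances in the example are hypothetical. The example transcript carries a site perfectly complementary to OsmiR156a, and the reverse-complemented t-signature recovers the miRNA sequence.

```python
# Sketch of PARE signature processing: trim, match, t-signature, t-plot.
from collections import defaultdict

_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def trim_adapter(read: str, adapter: str) -> str:
    """Remove a 3' adapter if present (exact match only in this sketch)."""
    idx = read.find(adapter)
    return read[:idx] if idx >= 0 else read


def t_signature(transcript: str, match_start: int, signature: str, flank: int = 15):
    """15 nt upstream of the match + first 15 nt of the signature (30 nt)."""
    if match_start < flank or len(signature) < flank:
        return None  # not enough sequence to build a full t-signature
    return transcript[match_start - flank:match_start] + signature[:flank]


def t_plot(matches):
    """Sum signature abundance per transcript position; matches = (pos, abundance)."""
    plot = defaultdict(int)
    for pos, abundance in matches:
        plot[pos] += abundance
    return dict(plot)


def is_dominant_peak(plot, position):
    """True if `position` carries the highest signature abundance on the transcript."""
    return plot.get(position, 0) > 0 and plot[position] == max(plot.values())


if __name__ == "__main__":
    site = "GTGCTCACTCTCTTCTGTCA"  # complementary to OsmiR156a
    transcript = "ATGGCTAGCTAGGATCCAAG" + site + "TTGACCGATCGATCGGAATT"
    read = transcript[30:51] + "TCGTATGCCG"  # hypothetical signature + adapter
    sig = trim_adapter(read, "TCGTATGCCG")
    start = transcript.find(sig)
    tsig = t_signature(transcript, start, sig)
    print(tsig, "TGACAGAAGAGAGTGAGCAC" in reverse_complement(tsig))
    plot = t_plot([(start, 250), (12, 3), (41, 5)])
    print(is_dominant_peak(plot, start))
```

In real data the t-signatures would be compared with a small RNA database allowing mismatches, and the t-plot check would be applied across all positions of each transcript.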
Following library sequencing, several steps are required to analyze the data (Fig. 6). First, adapter sequences should be removed and the trimmed signatures matched to the genome or cDNAs. The 15 nt upstream of each match are collected and concatenated with the first 15 nt of the trimmed signature to generate a 30-nt sequence (t-signature), which should contain the miRNA-complementary site (Fig. 6). The t-signature should be reverse complemented and matched to a database containing
² The number of cycles required may vary according to the organism or tissue.
Fig. 6. Schematic description of the steps for identifying miRNA targets from PARE data. First, adapter sequences are removed and the sequenced signatures are matched to the genome or cDNAs. Next, the 30 nt surrounding the miRNA-complementary site (called the t-signature) are collected. The t-signature is reverse complemented and compared to a database containing Arabidopsis small RNAs to identify potential miRNA/target pairs. t-Plots (see text) are used as filters prior to validation of new miRNAs and characterization of miRNA and target RNA regulation by conventional methods (see Subheadings 3.3.1, 3.3.2, and 3.6). Adapted from a figure first published in (44).
Arabidopsis small RNAs, allowing several mismatches, to identify potential miRNA/target pairs. To pinpoint miRNA targets easily, the distribution of signatures (and their abundances) along each transcript should be plotted. These target plots (t-plots), which show the abundance of each signature as a function of its position in the transcript, can be used to distinguish true miRNA cleavage sites from background noise. For the majority of validated miRNA targets, signatures corresponding to the cleavage site are more abundant than those at other positions, making them readily apparent in the t-plots (26).
3.6. Examination of Target Gene Expression
In general, plant miRNAs down-regulate the expression of their target genes, at least in part, by guiding cleavage and subsequent degradation of the target mRNA. Stress-regulated miRNAs often have targets associated with stress responses and modulate their expression at the post-transcriptional level (27). To characterize the effect of miRNAs on their targets, it is necessary to examine the expression patterns of the target genes, which can be monitored by Northern blotting, real-time quantitative RT-PCR, or microarrays. Antagonistic expression patterns between miRNAs and their targets reflect the negative effect of miRNAs on their target genes. For example, inducing miRNAs
under stress conditions can down-regulate the expression of negative regulators of stress responses, whereas repressing miRNAs can lead to increased expression of positive regulators of stress responses (27). However, we found that the expression level of target mRNAs is not always decreased by the induction of miRNA expression, even when miRNA-directed cleavage of the corresponding target mRNA was detected. This implies that miRNA-directed cleavage is not always rate-limiting for target mRNA accumulation under the stress conditions tested.
3.7. Functional Analysis of Abiotic Stress-Associated miRNAs in Transgenic Plants
The general strategy for examining the functional roles of specific stress-associated miRNAs is to overexpress or knock down the miRNA or its target gene and then examine the plants for changes normally associated with the stress response. To overexpress a miRNA, the miRNA and its miRNA* can be cloned into a known miRNA gene in place of the natural miRNA and miRNA* sequences. The resulting gene can be introduced into the plant under the control of a strong promoter (e.g., the 35S promoter in Arabidopsis or the maize ubiquitin promoter in rice). To overexpress miRNA target genes, genomic regions including their natural promoter regions are first cloned. The miRNA-complementary site is then mutated using synonymous codons, creating miRNA-insensitive targets. Detailed methods are described in Chapter 13. Overexpression of stress-responsive miRNAs or of their miRNA-insensitive targets should disrupt stress responses and provide insight into their roles. Reducing the expression of miRNA targets can potentially be accomplished by overexpressing the miRNA. In addition, knockout mutants, such as T-DNA insertion lines in Arabidopsis and rice, may offer insight into their function. However, many miRNAs belong to multigene families, so a knockout approach to identifying their functions would be hampered by redundancy. One way to inhibit the function of an entire miRNA family is to use target mimics (28) or “miRNA sponges” (29): a vector containing one or more binding sites for the miRNA of interest under a strong promoter is introduced into the plants. The expressed target mimics or miRNA sponges act as competitive inhibitors of the miRNA and derepress its targets.
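The synonymous-codon strategy for making a miRNA-insensitive target can be prototyped in silico before ordering mutagenesis primers. The sketch below recodes every codon that overlaps the complementary site using the standard genetic code, choosing for each codon the synonym that introduces the most changes inside the site. That choice rule is an illustrative heuristic, not the authors' method; practical designs may also consider codon usage and should confirm that the new site no longer scores as a target. The coding sequence in the example is hypothetical.

```python
# Sketch: recoding a miRNA-complementary site with synonymous codons
# (standard genetic code). "Most changes inside the site" is an assumed heuristic.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds: str) -> str:
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def make_mirna_resistant(cds: str, site_start: int, site_len: int) -> str:
    """Recode codons overlapping cds[site_start:site_start + site_len].

    Assumes `cds` is an uppercase DNA coding sequence in frame from position 0.
    """
    site_end = site_start + site_len
    new = list(cds)
    for i in range(3 * (site_start // 3), site_end, 3):
        original = cds[i:i + 3]

        def changes_in_site(alt: str) -> int:
            # Count only nucleotide changes that fall inside the target site.
            return sum(1 for k in range(3)
                       if site_start <= i + k < site_end and alt[k] != original[k])

        new[i:i + 3] = max(SYNONYMS[CODON_TABLE[original]], key=changes_in_site)
    return "".join(new)


if __name__ == "__main__":
    # Hypothetical CDS carrying a site complementary to OsmiR156a at positions 4-23.
    cds = "ATGG" + "GTGCTCACTCTCTTCTGTCA" + "CGTTAA"
    resistant = make_mirna_resistant(cds, 4, 20)
    print(cds, translate(cds))
    print(resistant, translate(resistant))
```

Because only synonymous codons are used, the encoded protein is unchanged while the miRNA-complementary site accumulates mismatches.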
4. Notes
1. Using the rdr2 mutant background, in which many siRNAs are diminished, may provide greater sensitivity for miRNA detection. Any changes should be validated in wild-type plants. Although the rdr2 mutant does not display an aberrant phenotype under normal growth conditions in Arabidopsis, its phenotype under the
stress in question should be compared with that of the wild type in case an aberrant phenotype becomes apparent.
2. Constructing small RNA libraries with 2-nt coded indexes in the 5′ adapters allows several libraries to be pooled and sequenced in the same reaction.
3. The normalization multiplier is usually adjusted, depending on sequencing depth, to a value close to the size of the smallest library in the set being compared. This value typically ranges from TPQ (transcripts per quarter million) to TP5M (transcripts per five million).
4. For the analysis of miRNA targets that are weakly expressed or cleaved at a low rate, using poly(A)-selected RNA and a gene-specific primer to prime the cDNA may increase the success rate.
5. PCR products of multiple sizes may arise after the second amplification step. To avoid this, use primers with a minimum annealing temperature of 60°C and ensure that they are specific to the gene of interest.
6. If a TA cloning kit is to be used, it is important to verify that the polymerase used to amplify the PCR product leaves unpaired adenosines at the 3′ ends of the products. If a proofreading polymerase is used, a blunt-end cloning kit, such as the Zero Blunt TOPO PCR Cloning Kit (Invitrogen, Carlsbad, CA), should be used instead.
7. The sequencing depth needed will depend on the system and the question being addressed. We currently aim for at least 10 million reads per PARE library to identify new miRNA/target RNA pairs in plant species.
Acknowledgments
The methods in this chapter were developed with support from NSF, USDA, and DOE.
References
1. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
2. Chen X (2005) MicroRNA biogenesis and function in plants. FEBS Lett 579:5923–5931
3. Vazquez F (2006) Arabidopsis endogenous small RNAs: highways and byways. Trends Plant Sci 11:460–468
4. Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA
metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
5. Vaucheret H, Vazquez F, Crete P, Bartel DP (2004) The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18:1187–1197
6. Llave C, Xie Z, Kasschau KD, Carrington JC (2002) Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297:2053–2056
7. Chiou TJ, Aung K, Lin SI, Wu CC, Chiang SF, Su CL (2006) Regulation of phosphate homeostasis by MicroRNA in Arabidopsis. Plant Cell 18:412–421
8. Sunkar R, Kapoor A, Zhu JK (2006) Posttranscriptional induction of two Cu/Zn superoxide dismutase genes in Arabidopsis is mediated by downregulation of miR398 and important for oxidative stress tolerance. Plant Cell 18:2051–2065
9. Zhang JF, Yuan LJ, Shao Y, Du W, Yan DW, Lu YT (2008) The disturbance of small RNA pathways enhanced abscisic acid response and multiple stress responses in Arabidopsis. Plant Cell Environ 31:562–574
10. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14:787–799
11. Zhou X, Wang G, Sutoh K, Zhu JK, Zhang W (2008) Identification of cold-inducible microRNAs in plants by transcriptome analysis. Biochim Biophys Acta 1779:780–788
12. Zhou X, Wang G, Zhang W (2007) UV-B responsive microRNA genes in Arabidopsis thaliana. Mol Syst Biol 3:103
13. Sunkar R, Zhu JK (2004) Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16:2001–2019
14. Fujii H, Chiou TJ, Lin SI, Aung K, Zhu JK (2005) A miRNA involved in phosphate-starvation response in Arabidopsis. Curr Biol 15:2038–2043
15. Lu S, Sun YH, Shi R, Clark C, Li L, Chiang VL (2005) Novel and mechanical stress-responsive MicroRNAs in Populus trichocarpa that are absent from Arabidopsis. Plant Cell 17:2186–2203
16. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
17. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC (2007) High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219
18. Lu C, Kulkarni K, Souret FF, MuthuValliappan R, Tej SS, Poethig RS, Henderson IR, Jacobsen SE, Wang W, Green PJ, Meyers BC (2006) MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res 16:1276–1288
19. Zhu QH, Spriggs A, Matthew L, Fan L, Kennedy G, Gubler F, Helliwell C (2008) A diverse set of microRNAs and microRNA-like
Abiotic Stress-Associated miRNAs
small RNAs in developing rice grains. Genome Res 18:1456–1465
20. Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, Griffiths-Jones S, Jacobsen SE, Mallory AC, Martienssen RA, Poethig RS, Qi Y, Vaucheret H, Vazquez F, Voinnet O, Weigel D, Zhu JK (2008) Criteria for annotation of plant microRNAs. Plant Cell 20:3186–3190
21. Maroney PA, Chamnongpol S, Souret F, Nilsen TW (2007) A rapid, quantitative assay for direct detection of microRNAs and other small RNAs using splinted ligation. RNA 13:930–936
22. Moxon S, Moulton V, Kim JT (2008) A scoring matrix approach to detecting miRNA target sites. Algorithms Mol Biol 3:3
23. Zhang Y (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res 33:W701–W704
24. Souret FF, Kastenmayer JP, Green PJ (2004) AtXRN4 degrades mRNA in Arabidopsis and its substrates include selected miRNA targets. Mol Cell 15:173–183
25. Kasschau KD, Xie Z, Allen E, Llave C, Chapman EJ, Krizan KA, Carrington JC (2003) P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell 4:205–217
26. German MA, Pillay M, Jeong DH, Hetawal A, Luo S, Janardhanan P, Kannan V, Rymarquis LA, Nobuta K, German R, De Paoli E, Lu C, Schroth G, Meyers BC, Green PJ (2008) Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat Biotechnol 26:941–946
27. Sunkar R, Chinnusamy V, Zhu J, Zhu JK (2007) Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends Plant Sci 12:301–309
28. Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI, Rubio-Somoza I, Leyva A, Weigel D, Garcia JA, Paz-Ares J (2007) Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet 39:1033–1037
29. Ebert MS, Neilson JR, Sharp PA (2007) MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat Methods 4:721–726
30. Yamaguchi-Shinozaki K, Shinozaki K (1994) A novel cis-acting element in an Arabidopsis gene is involved in responsiveness to drought, low-temperature, or high-salt stress. Plant Cell 6:251–264
31. Claes B, Dekeyser R, Villarroel R, Van den Bulcke M, Bauw G, Van Montagu M, Caplan A (1990) Characterization of a rice gene showing organ-specific expression in response to salt stress and drought. Plant Cell 2:19–27
32. Moons A, De Keyser A, Van Montagu M (1997) A group 3 LEA cDNA of rice, responsive to abscisic acid, but not to jasmonic acid, shows variety-specific differences in salt stress response. Gene 191:197–204
33. Wilhelm KS, Thomashow MF (1993) Arabidopsis thaliana cor15b, an apparent homologue of cor15a, is strongly responsive to cold and ABA, but not drought. Plant Mol Biol 23:1073–1077
34. Gilmour SJ, Zarka DG, Stockinger EJ, Salazar MP, Houghton JM, Thomashow MF (1998) Low temperature regulation of the Arabidopsis CBF family of AP2 transcriptional activators as an early step in cold-induced COR gene expression. Plant J 16:433–442
35. Rabbani MA, Maruyama K, Abe H, Khan MA, Katsura K, Ito Y, Yoshiwara K, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2003) Monitoring expression profiles of rice genes under cold, drought, and high-salinity stresses and abscisic acid application using cDNA microarray and RNA gel-blot analyses. Plant Physiol 133:1755–1767
36. Huang HJ, Fu SF, Tai YH, Chou WC, Huang DD (2002) Expression of Oryza sativa MAP kinase gene is developmentally regulated and stress-responsive. Physiol Plant 114:572–580
37. Nishizawa A, Yabuta Y, Yoshida E, Maruta T, Yoshimura K, Shigeoka S (2006) Arabidopsis heat shock transcription factor A2 as a key regulator in response to several types of environmental stress. Plant J 48:535–547
38. Yamanouchi U, Yano M, Lin H, Ashikari M, Yamada K (2002) A rice spotted leaf gene, Spl7, encodes a heat stress transcription factor protein. Proc Natl Acad Sci USA 99:7530–7535
39. Bariola PA, Howard CJ, Taylor CB, Verburg MT, Jaglan VD, Green PJ (1994) The Arabidopsis ribonuclease gene RNS1 is tightly controlled in response to phosphate limitation. Plant J 6:673–685
40. Yi K, Wu Z, Zhou J, Du L, Guo L, Wu Y, Wu P (2005) OsPTF1, a novel transcription factor involved in tolerance to phosphate starvation in rice. Plant Physiol 138:2087–2096
41. Hur YJ, Lee HG, Jeon EJ, Lee YY, Nam MH, Yi G, Eun MY, Nam J, Lee JH, Kim DH (2007) A phosphate starvation-induced acid phosphatase from Oryza sativa: phosphate regulation and transgenic expression. Biotechnol Lett 29:829–835
42. Bari R, Datt Pant B, Stitt M, Scheible WR (2006) PHO2, microRNA399, and PHR1 define a phosphate-signaling pathway in plants. Plant Physiol 141:988–999
43. Takahashi H, Yamazaki M, Sasakura N, Watanabe A, Leustek T, Engler JA, Engler G, Van Montagu M, Saito K (1997) Regulation of sulfur assimilation in higher plants: a sulfate transporter induced in sulfate-starved roots plays a central role in Arabidopsis thaliana. Proc Natl Acad Sci USA 94:11102–11107
44. German MA, Luo S, Schroth G, Meyers BC, Green PJ (2009) Construction of Parallel Analysis of RNA Ends (PARE) libraries for the study of cleaved miRNA targets and the RNA degradome. Nat Protoc 4:356–362
Chapter 15
Processing of miRNA Precursors
Yukio Kurihara and Yuichiro Watanabe
Abstract
Plant microRNA (miRNA) processing requires at least two cleavage steps of the respective precursors: the first converts the pri-miRNA to a pre-miRNA, and the second converts the pre-miRNA to the mature miRNA. Using northern blot analysis, we previously showed that the RNase III enzyme Dicer-like protein 1 (DCL1) and the double-stranded RNA-binding (DRB) protein HYL1 are involved in the processing of miRNA precursors. The processing of plant miRNAs differs from that of animal miRNAs in some respects. Here, we introduce our methods for analyzing the processing of miRNA precursors by transiently expressing, in Nicotiana benthamiana, mutated pri-miRNAs with modified stem-loop structures. These methods could be useful for understanding how DCL proteins and their cognate DRB proteins accurately recognize and process substrate precursor RNAs.
Key words: microRNA, Plant, RNase III processing, Dicer-like protein, Precursor microRNA
1. Introduction
MicroRNAs (miRNAs) are noncoding RNAs of 20–24 nt that induce post-transcriptional gene silencing through cleavage or translational inhibition of target mRNAs, which carry sequence elements highly complementary to the miRNAs (1). miRNAs regulate important processes, including differentiation, development and stress responses. Primary miRNA transcripts (pri-miRNAs) are transcribed from miRNA genes by RNA polymerase II (2). Each pri-miRNA contains a stem-loop structure that carries the miRNA sequence on one side of the stem, and the stem is flanked at both ends by flexible, unpaired sequences. In animals, the RNase III enzyme Drosha cleaves the pri-miRNA at specific sites in the nucleus, producing an almost complete stem-loop structure (Fig. 1) (3). Drosha functions as a complex with the double-stranded RNA (dsRNA)-binding protein DGCR8 (also known as Pasha) (4–7). This reaction removes the unpaired flanking sequences and releases specific panhandle-structured intermediates referred to as pre-miRNAs. Each pre-miRNA is then exported to the cytoplasm and cleaved by Dicer, a cytoplasmic RNase III enzyme, to produce a miRNA/miRNA* duplex. One strand of the duplex is selected and incorporated into an RNA-induced silencing complex (RISC) (8), which then directs cleavage or translational inhibition of target mRNAs.
[Figure: miRNA biogenesis in animals and plants. Animal: in the nucleus, Drosha with DGCR8 (Pasha) cleaves the pri-miRNA to a pre-miRNA; in the cytoplasm, Dicer cleaves the pre-miRNA to a miRNA/miRNA* duplex, which yields the mature miRNA. Plant: both cleavage steps are carried out in the nucleus by DCL1 with HYL1.]
Fig. 1. Comparison of the steps in miRNA processing between animals and plants. Both processing pathways include two cleavage steps.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_15, © Humana Press, a part of Springer Science + Business Media, LLC 2009
Plant miRNA processing appears to be very similar to that of animal miRNAs (Fig. 1), except that both the first and second cleavage steps are performed by the same enzyme, Dicer-like 1 (DCL1), one of four Dicer homologues in Arabidopsis (9). In other words, DCL1 plays dual roles corresponding to those of Drosha and Dicer in animals (Fig. 1). DCL1 also forms a complex with the dsRNA-binding protein HYL1 (10, 11), and this interaction is important for efficient and precise processing of pri-miRNAs. The DCL1–HYL1 complex localizes to SmD3/SmB nuclear bodies, indicating that both the first and second cleavage steps occur in the nucleus (12).
Here, we introduce our methods for detecting pri-miRNAs and pre-miRNAs, which can be used to analyze their processing. We previously reported that Arabidopsis miR163, miR164b and miR166a precursors are detectable by northern blot analysis (9, 11). We have also shown that exogenously introduced pri-miRNA
constructs are properly transcribed and processed in heterologous Nicotiana benthamiana as well as in Arabidopsis, making it possible to follow the fates of the introduced miRNA transcripts. We provide examples of analyses that address which motifs in the pri-miRNA secondary structure determine the position of the first cleavage in planta. A series of modified pri-miRNAs was transiently expressed in N. benthamiana leaves by Agrobacterium infiltration (13). We were able to detect aberrant and/or inefficient processing of some mutated pri-miRNAs by northern blot analysis. Taken together, these results suggest that the stem-loop structure adjacent to, and including, the first cleavage site determines the cleavage position through a defined distance between the base of the stem and the cleavage site. More detailed analyses are required to clarify this further.
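The spacing idea above can be explored on predicted secondary structures. The sketch below reads a dot-bracket structure (for example, one exported from an RNA-folding program) and reports how many nucleotides separate a proposed cleavage position from the 5′-most paired nucleotide, which is used here as a simplified stand-in for "the base of the stem". That simplification, the function names and the toy hairpin are assumptions; bulges and internal loops near the stem base would need more careful treatment.

```python
def pair_table(structure: str) -> list[int]:
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    partners = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partners


def distance_from_stem_base(structure: str, cleavage_index: int) -> int:
    """Nucleotides from the 5'-most paired base to a 0-based cleavage index."""
    partners = pair_table(structure)
    paired = [i for i, p in enumerate(partners) if p != -1]
    if not paired:
        raise ValueError("structure contains no base pairs")
    base = paired[0]
    if cleavage_index < base:
        raise ValueError("cleavage site lies in the unpaired 5' flank")
    return cleavage_index - base


toy = "....((((((...))))))...."  # toy hairpin, not a real pri-miRNA
print(pair_table(toy)[4], distance_from_stem_base(toy, 7))
```

For the toy hairpin, position 4 pairs with position 18, and a cleavage after index 7 lies 3 nt from the stem base; comparing such distances between wild-type and mutant predicted structures is one way to frame the spacing hypothesis.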
2. Materials
1. pCR4 TOPO vector (Invitrogen, Carlsbad, CA).
2. pSK1 vector (kindly provided by Prof. Yasunori Machida, Nagoya University).
3. Agrobacterium tumefaciens: strain GV3101 carrying the pMP90 helper plasmid (see Note 1).
4. Escherichia coli DH5α. Luria-Bertani (LB) medium: tryptone 10 g/L, yeast extract 5 g/L, NaCl 10 g/L, pH 7.5.
5. YEB medium: 5 g/L beef extract, 1 g/L yeast extract, 5 g/L peptone, 5 g/L sucrose, 2 mM MgSO4, pH 7.0.
6. Agrobacterium infiltration buffer: 10 mM MgCl2, 10 mM MES (pH 5.8), 150 µM acetosyringone. Antibiotics: kanamycin, working concentration 30 µg/mL; gentamycin, 20 µg/mL; rifampicin, 100 µg/mL.
7. Needles and 1-mL syringes (both from TERUMO).
8. TRIzol reagent (Invitrogen), used to isolate intact total RNA from plant tissues (see Note 2).
9. Phenol:chloroform:isoamyl alcohol, 50:50:1.
10. 100% formamide (Wako).
11. 4× tracking dye solution: 50% glycerol, 0.03% bromophenol blue, 50 mM Tris–HCl (pH 7.7), 5 mM EDTA.
12. Polyacrylamide gels: polymerized from a monomer solution of acrylamide and bisacrylamide at a ratio of 19:1.
13. Urea (Nacalai).
14. 0.5× TBE, prepared by twofold dilution of 1× TBE (89 mM Tris (pH 8.3), 89 mM boric acid, 2 mM EDTA).
15. 0.5 µg/mL ethidium bromide (EtBr) solution.
16. RNA size standards: Century Marker Plus Markers (Ambion), comprising seven RNAs of 1,000, 750, 500, 400, 300, 200 and 100 nt.
17. Chemically synthesized miRNA markers: miR164 (21 nt; UGGAGAAGCAGGGCACGUGCA) and miR163 (24 nt; UUGAAGAGGACUUGGAACUUCGAU) RNAs.
18. Nylon membrane: Hybond-N+ (GE Healthcare UK Ltd.).
19. Megaprime DNA-labeling system (GE Healthcare UK Ltd.).
20. T4 polynucleotide kinase (TOYOBO).
21. Hybridization buffer: Perfect Hyb plus hybridization buffer (Sigma) (see Note 3).
22. Wizard® SV Gel and PCR Clean-Up System (Promega).
23. A full-wet transfer unit (Biocraft, Tokyo, Japan).
24. A UV crosslinker: UV Stratalinker 1800 (Stratagene).
25. [α-32P]dCTP.
3. Methods
The methods described below cover (1) construction of expression plasmids for mutant miRNA genes, (2) gene introduction and expression of primary transcripts in N. benthamiana, (3) total RNA extraction, (4) analysis of processing by northern blotting, and (5) Cycle-RT-PCR and determination of cleavage-site sequences.
3.1. Expression Plasmid for Mutant miRNA Genes
3.1.1. pCR4 TOPO Vector, pSK1 Vector
The pCR4 TOPO vector was used for cloning pri-miRNA sequences. The sequences to be transcribed in plants were inserted into the pSK1 vector (14) under the control of the 35S promoter.
3.1.2. cDNA Cloning
The sequences of pri-miR164b and pri-miR166a were PCR-amplified and subcloned into pCR4 TOPO. The primers used to amplify pri-miR164b were 5′-AGCGGCCGCCTAACCTGATACTATTTC-3′ and 5′-ATCTAGAATCCAGACAAATCATAC-3′, and those used for pri-miR166a were 5′-AGCGGCCGCATAGCAATGTAGAAAAG-3′ and 5′-ATCTAGATTGAAACTTGAAAAGC-3′. The XbaI–NotI fragments excised from the subclones were inserted into the pSK1 expression vector (35S promoter), creating pSK1-pri-miR164b and pSK1-pri-miR166a, respectively.
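Before ordering or reusing the cloning primers, it can help to confirm that each carries the intended restriction site. The sketch below scans the primer sequences given above for the NotI (GCGGCCGC) and XbaI (TCTAGA) recognition sequences. Both sites are palindromic, so scanning one strand is sufficient. The helper names are hypothetical and the script is a convenience check, not part of the published procedure.

```python
# Recognition sequences of the two enzymes used for subcloning into pSK1.
SITES = {"NotI": "GCGGCCGC", "XbaI": "TCTAGA"}

# Primer sequences from Section 3.1.2 (5'->3').
PRIMERS = {
    "pri-miR164b fwd": "5'-AGCGGCCGCCTAACCTGATACTATTTC-3'",
    "pri-miR164b rev": "5'-ATCTAGAATCCAGACAAATCATAC-3'",
    "pri-miR166a fwd": "5'-AGCGGCCGCATAGCAATGTAGAAAAG-3'",
    "pri-miR166a rev": "5'-ATCTAGATTGAAACTTGAAAAGC-3'",
}


def clean(primer: str) -> str:
    """Strip 5'/3' labels, hyphens and spaces, keeping only A/C/G/T."""
    return "".join(ch for ch in primer.upper() if ch in "ACGT")


def find_sites(primer: str) -> dict:
    """Return {enzyme: [0-based start positions]} for sites present in the primer."""
    seq = clean(primer)
    hits = {}
    for name, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[name] = positions
    return hits


for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

Each forward primer should report a single NotI site and each reverse primer a single XbaI site, each starting one base from the 5′ end.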
3.1.3. Mutagenesis
A series of artificial pri-miR164b and pri-miR166a constructs was designed to express the respective pri-miRNAs carrying one or more base substitutions, as shown in Fig. 2. The substitutions were introduced mainly at the base of the stem. With these constructs, we aimed to clarify the relationship between processing efficiency and the distance from the base of the stem to the original first cleavage site of each pri-miRNA.
3.2. Gene Introduction and Expression of the Primary Transcript in N. benthamiana
3.2.1. Agrobacterium Transformation
Agrobacterium was precultured in YEB medium with vigorous shaking at 30°C for 16 h. The overnight culture was diluted 20-fold with fresh YEB medium and grown until the OD600 reached 0.2–0.3. Bacteria were collected by centrifugation at 3,000 × g for 5 min at 4°C. The pellet was resuspended in ice-cold 10 mM Tris–HCl (pH 8.0) and collected again under the same centrifugation conditions. The bacteria were then resuspended in 1/20 of the original culture volume and dispensed into 0.2-mL aliquots in microtubes. The aliquots were frozen in liquid nitrogen and stored at −80°C until use.
For each plasmid to be introduced, a tube of frozen Agrobacterium cells was thawed at room temperature. A small amount of plasmid DNA was added to the tube, and the tube was frozen in liquid nitrogen for 5 min and then placed in a 37°C water bath for 20 min. One milliliter of LB medium was added to each tube, followed by incubation at 30°C for 1–2 h. The bacteria were collected by centrifugation at 3,000 × g and plated onto LB plates containing 30 µg/mL kanamycin, 20 µg/mL gentamycin, and 100 µg/mL rifampicin. Transformed colonies appeared after two nights of incubation.
[Figure 2: predicted stem-base structures of the wild-type (NORMAL) and mutant (MUTANT1, MUTANT2, ...) pri-miR164b (A) and pri-miR166a (B) precursors, showing the substituted bases.]
Fig. 2. Artificially constructed modified precursor sequences derived from miR164b (A) and miR166a (B). One or more base substitutions were introduced into the base of the stem structure, with the aim of changing the secondary structure near or at the first cleavage site. The potential secondary structures of the pri-miRNAs were predicted using MFOLD 3.1 (http://molbio.info.nih.gov/molbio-nih/mfold.html). The first cleavage sites of the pri-miRNAs are indicated by arrows on the wild-type structures.
3.2.2. Agrobacterium Preculture
Three-month-old N. benthamiana plants and A. tumefaciens strain GV3101 carrying the pMP90 helper plasmid and either pSK1-pri-miR164b or pSK1-pri-miR166a were used for infiltration experiments. Agrobacteria inoculated directly from single colonies were grown in 5 mL of LB medium (plus 30 µg/mL kanamycin, 20 µg/mL gentamycin, and 100 µg/mL rifampicin) at 30°C for 20 h. The bacteria were pelleted by centrifugation at 1,500 × g. The pellet was rinsed with Agrobacterium infiltration buffer to remove the LB medium, resuspended in infiltration buffer to an OD600 of 0.5 (see Note 4), and incubated at room temperature for at least 3 h.
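Resuspending the pellet to OD600 0.5 is a simple dilution calculation. The sketch assumes that OD600 is proportional to cell density (dense cultures are measured on a dilution and multiplied back) and that no cells are lost during the rinse. Rejecting targets outside 0.1–1.0 reflects the range reported in Note 4 and is a design choice of this hypothetical helper.

```python
def resuspension_volume_ml(od600_culture: float, culture_ml: float,
                           od600_target: float = 0.5) -> float:
    """Infiltration-buffer volume (mL) that brings the pelleted cells to od600_target.

    Assumes OD600 scales linearly with cell density and no cells are lost
    when the pellet is rinsed.
    """
    if od600_culture <= 0 or culture_ml <= 0:
        raise ValueError("OD600 and culture volume must be positive")
    if not 0.1 <= od600_target <= 1.0:
        # Note 4: 0.1-1.0 give similar results; above 1.0 leaves may wilt.
        raise ValueError("target OD600 outside the 0.1-1.0 range of Note 4")
    return od600_culture * culture_ml / od600_target


print(resuspension_volume_ml(2.0, 5.0))
```

For a 5-mL culture at OD600 2.0, the pellet would be taken up in 20 mL of infiltration buffer.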
3.2.3. Agroinfiltration into N. benthamiana Leaves
Before infiltration, small holes were made in fully expanded N. benthamiana leaves with a needle. The Agrobacterium suspension was then infiltrated into the leaves through the holes with a 1-mL syringe. To isolate processed miRNAs and processing intermediates, the plants were harvested 48 h after agroinfiltration (see Note 5).
3.3. Total RNA Extraction
3.3.1. Harvest
Total RNA was extracted with TRIzol reagent from patches of the infiltrated leaves 2 days after agroinfiltration (see Note 6).
3.3.2. RNA Isolation
Each plant tissue sample (1 g) was ground to a powder in liquid nitrogen with a mortar and pestle. The powder was mixed with 10 mL of TRIzol reagent, and 3 mL of chloroform was added. The mixture was then transferred to a centrifuge tube and vortexed vigorously. After phase separation by centrifugation, the aqueous phase was deproteinized once with an equal volume of phenol:chloroform:isoamyl alcohol. RNA in the aqueous phase was precipitated with 0.7 volumes of isopropanol. For electrophoresis, the precipitate was dissolved in 100% formamide (it can be stored at −80°C at this point) and centrifuged at 15,000 × g
for a few minutes. After centrifugation, the supernatant was collected as the total RNA sample, and the RNA concentration was measured (see Note 7).
3.4. Analysis of Processing by Northern Blot Analysis
3.4.1. Electrophoresis
Aliquots containing 10 µg of total RNA were sufficient to detect miRNA precursors and miRNAs. Each sample was heated at 65°C for 5 min and immediately cooled on ice. Just before electrophoresis, 4× tracking dye solution was mixed with the sample. RNA size standards were loaded in separate lanes, leaving blank lanes between the standard and sample lanes. The samples were electrophoresed at 100 V in 0.5× TBE buffer until the dye reached the bottom of the gel, using denaturing 7.5% polyacrylamide gels (19:1; 8 M urea, 0.5× TBE) to detect precursors or 15% polyacrylamide gels (19:1; 7 M urea, 0.5× TBE) to detect miRNAs. The marker lanes and the zones containing 5S rRNA and tRNAs in each gel were then stained with EtBr (see Note 8) and photographed on a UV transilluminator to record the marker mobilities and the amounts of RNA loaded. The marker mobilities were subsequently aligned digitally with the hybridization signals. The amounts of 5S rRNA and tRNAs in the samples were compared to confirm that RNA recovery was equivalent across samples.
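The loading volumes follow from the RNA concentration recommended in Note 7. This sketch assumes that the 3–5 mg/mL range in Note 7 is equivalent to 3–5 µg/µL and that the 4× tracking dye is diluted to 1× in the final sample. It does not account for any extra formamide or water added to equalize lane volumes. The function name is hypothetical.

```python
def loading_volumes(rna_ug: float = 10.0, conc_ug_per_ul: float = 4.0,
                    dye_fold: float = 4.0) -> tuple[float, float]:
    """Return (RNA volume, dye volume) in microlitres for one lane."""
    if conc_ug_per_ul <= 0 or dye_fold <= 1:
        raise ValueError("concentration must be positive and dye_fold > 1")
    rna_ul = rna_ug / conc_ug_per_ul
    # For a 1x final dye concentration: dye / (rna + dye) = 1 / dye_fold,
    # so dye = rna / (dye_fold - 1).
    dye_ul = rna_ul / (dye_fold - 1)
    return rna_ul, dye_ul


for conc in (3.0, 4.0, 5.0):  # Note 7 range, read as ug/uL
    rna_ul, dye_ul = loading_volumes(10.0, conc)
    print(f"{conc} ug/uL: {rna_ul:.2f} uL RNA + {dye_ul:.2f} uL 4x dye")
```

At 4 µg/µL, for example, 10 µg corresponds to 2.5 µL of RNA solution mixed with about 0.83 µL of 4× dye.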
3.4.2. Filter Transfer
Before blotting, the nylon membrane was pre-wetted with 0.5× TBE buffer. The polyacrylamide gel containing the sample lanes was placed in contact with the membrane, and the RNA was transferred to the membrane in 0.5× TBE buffer at 40 V for 1 h using a full-wet transfer unit. After blotting, the membrane was trimmed along the upper and lower edges of the gel so that it could easily be aligned with the positions of the molecular size standards. The membrane was briefly dried and UV-crosslinked at 70,000 µJ/cm2 using a UV crosslinker.
3.4.3. Probe Construction and Labeling
The DNA fragments used as probes specific for the miRNA precursors were PCR-amplified from genomic DNA and subcloned into the pCR4 TOPO vector, creating pCR4-miR164bpP (miR164b precursor probe) and pCR4-miR166apP (miR166a precursor probe). The primers used were 5′-GATGGAGAAGCAGGGCACGT-3′ and 5′-GTGAAGATGGGCACATGAAG-3′ for miR164bpP (149 nt) and 5′-AGATATATATTCAGAAACCCTAG-3′ and 5′-GGTTCATTCACTGGATCTGAAAC-3′ for miR166apP (246 nt). The probe templates were amplified by PCR from pCR4-miR164bpP or pCR4-miR166apP, and the PCR products were purified with a silica-based kit (Wizard® SV Gel and PCR Clean-Up System). For detection
of miRNA precursors (pri-miRNAs, pre-miRNAs and loop-containing miRNA remnants), radiolabeled DNA probes were synthesized by random priming in the presence of [α-32P]dCTP using the Megaprime DNA-labeling system. Radiolabeled DNA oligonucleotide probes specific for miR164 (5′-TGCACGTGCCCTGCTTCTCCA-3′) and miR166 (5′-GGTTCATTCACTGGATCTGAAAC-3′) were prepared by end-labeling with [γ-32P]ATP using T4 polynucleotide kinase.
3.4.4. Hybridization
Prehybridization and hybridization were performed in Perfect Hyb plus hybridization buffer in a thermo block rotator (NISSIN) (see Note 9). Prehybridization was carried out for more than 1 h at 65°C to detect pri-miRNAs and pre-miRNAs, or at 40°C to detect mature miRNAs. Hybridization was then performed overnight at the same respective temperatures in hybridization buffer containing the radiolabeled probes.
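The oligonucleotide probe for mature miR164 given in Section 3.4.3 is the reverse complement, written as DNA, of the synthetic miR164 marker sequence listed in the Materials. The sketch below derives such an antisense probe from a miRNA sequence. The same function can be used to check other oligonucleotide probes against their intended miRNAs. The function name is hypothetical.

```python
# Base-pairing partners; RNA U and DNA T both pair with A.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna_probe(mirna: str) -> str:
    """Return the 5'->3' DNA oligonucleotide complementary to a mature miRNA."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna.upper()))


MIR164 = "UGGAGAAGCAGGGCACGUGCA"  # synthetic marker sequence from the Materials
print(antisense_dna_probe(MIR164))  # TGCACGTGCCCTGCTTCTCCA
```

The output matches the miR164 probe sequence used for end-labeling.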
3.4.5. Filter Washing
Filters were washed as follows:
1. To detect miRNAs with oligonucleotide probes:
– 2× SSC containing 0.5% SDS at 50°C for 10 min, twice.
– 2× SSC containing 0.5% SDS at 50°C for 15 min, three times.
2. To detect miRNA precursors with random-primed probes (which tend to give high backgrounds):
– 2× SSC containing 0.5% SDS at 50°C for 10 min, twice.
– 0.5× SSC containing 0.5% SDS at 50°C for 15 min, three times.
The washed membranes were exposed to an imaging plate (BAS-MS2040; Fujifilm) for 2–6 h, and signals were detected using a BAS-2500 image analyzer (Fujifilm).
3.5. Cycle-RT-PCR and Determination of Cleavage Site Sequences
Aliquots containing 2–5 µg of low-molecular-weight RNA plus (LMW RNA+; see Note 6) were self-ligated with T4 RNA ligase (NEB) in 100 µL of 50 mM Tris–HCl (pH 7.8) containing 10 mM MgCl2 and 1 mM ATP in a 50-µL reaction volume for 2 h at 16°C. After ligation, the RNA was precipitated with ethanol. The pre-miRNAs and miRNA remnants were amplified by RT-PCR with the following gene-specific primers: pre-miR164b forward, 5′-GTGTGTTGAGTGTGATGATATGG-3′, and pre-miR164b reverse, 5′-CAAAATTCCGCATATATACACGC-3′; pre-miR166a forward, 5′-GAATTGAACCTTCAGATTTCAG-3′, and pre-miR166a reverse, 5′-TTGTTAGATCGAAAGAGATCC-3′. The reverse primers were also used for the RT reactions. The PCR products were resolved on 7.5% polyacrylamide gels, detected by EtBr staining, gel-purified and cloned into the pCR4 TOPO vector for sequencing (Fig. 3).
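Because self-ligation joins the 3′ end of each RNA to its own 5′ end, each cloned Cycle-RT-PCR product reads across a junction in which the 3′-terminal sequence is followed directly by the 5′-terminal sequence. The sketch below illustrates how both ends could be located. It is not the authors' analysis pipeline: it splits a junction read into two parts that each match the reference contiguously. It assumes exact matches, reads on the same strand as the reference, and a unique occurrence of each part. When the nucleotides flanking the junction are identical, several split points fit, so all consistent ends are returned.

```python
def map_junction(read: str, reference: str) -> list[tuple[int, int]]:
    """Return candidate (3'-end index, 5'-start index) pairs, 0-based in reference.

    A split point k is accepted when read[:k] (the RNA's 3'-terminal part) and
    read[k:] (its 5'-terminal part) both occur in the reference and the 5' part
    does not start downstream of the 3' part.
    """
    read, reference = read.upper(), reference.upper()
    candidates = []
    for k in range(1, len(read)):
        left_start = reference.find(read[:k])
        right_start = reference.find(read[k:])
        if left_start != -1 and right_start != -1 and right_start <= left_start:
            candidates.append((left_start + k - 1, right_start))
    return candidates


reference = "ACGTTGCAAGGCTTAACCGGTATC"            # toy sequence
junction_read = reference[12:18] + reference[5:11]  # 3' end joined to 5' end
print(map_junction(junction_read, reference))       # [(17, 5), (18, 6)]
```

In the toy example, the two answers differ only in whether the G at the junction is assigned to the 3′ end or the 5′ end. Reporting both keeps this ambiguity visible rather than silently choosing one end.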
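[Figure 3: northern blots of N. benthamiana leaves infiltrated with the empty vector, the normal (wild-type) construct, or mutant constructs of pri-miR164b (A) and pri-miR166a (B). Upper panels: pri-miRNA, pre-miRNA and remnant bands (size markers 1,000, 500, 200 and 100 nt). Middle panels: mature miR164 and miR166 (21- and 24-nt markers). Lower panels: 5S rRNA and tRNA.]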
Fig. 3. Analysis of the processing of transcripts from the miR164b (A) and miR166a (B) gene constructs. The top panels show the detection of the respective pri-miRNAs, pre-miRNAs and remnant miRNAs. The middle panels show mature miR164b (A) and miR166a (B). The 5S rRNA and tRNA bands serve as loading controls.
Northern blot analysis of pri-miR164b derivatives revealed some differences in processing between wild-type and mutant pri-miR164b (A). Note that endogenous miR164 was also detected, because N. benthamiana infiltrated with the empty cloning vector alone produced endogenous miR164 precursors. Mutant 1 carries a single base substitution that creates an additional base pair adjacent to the normal first DCL1 cleavage site, removing the one-base bulge of the wild type (Fig. 2). Comparison of the bands showed that this substitution did not affect cleavage at the normal site, indicating that the small bulge is not significantly involved in positioning the cleavage site. In contrast, mutants 2, 3 and 4 showed different band patterns, indicating that their substitutions altered the position of DCL1/HYL1-mediated cleavage. Mutant 2 has three consecutive base substitutions that create three consecutive base pairs immediately before the original cleavage site. Mutant 3 has one base substitution, whereas mutant 4 has an additional substitution that makes the stem longer and more continuous. Mutants 2, 3 and 4 did not produce precursors of wild-type size but instead produced slightly longer precursors, suggesting that longer stem structures prompt DCL1/HYL1 to recognize sequences in the newly formed base of the stem. Consistent with this, mutants 2, 3 and 4 produced longer miRNAs, possibly 22 nt. Mutants 5 and 6, on the other hand, carry a few base substitutions that destabilize the base of the stem, and no processing of these mutants was detected.
For pri-miR166a and its mutants (B), results similar to those for miR164b processing were observed. Note that endogenous miR166 was also detected in this case, so stronger bands reflect additional, ectopically expressed miRNAs. Mutant 1 has a single base change that removes a one-base bulge, similar to miR164b mutant 1. Mutant 2 has two base substitutions that create complementary matches at the base of the stem, making the stem more rigid. These two mutants produced precursor and mature miRNAs at levels similar to those of the wild-type molecule. Mutant 3, which combines the substitutions of mutants 1 and 2, showed a slight difference in the pre-miRNA position, indicating a shift in the first cleavage site. Mutants 4, 5 and 6 carry a few base substitutions that progressively destabilize the base of the stem, similar to pri-miR164b mutants 5 and 6. The intensities of the precursor bands decreased gradually, and no precursor bands were detected for mutant 6.
4. Notes
1. We initially also used strain EHA101 and found no differences between the two strains.
2. TRIzol® reagent is a monophasic solution of phenol and guanidine isothiocyanate for isolating total RNA free of DNA
and proteins. The procedure is based on the single-step RNA isolation method developed by Chomczynski and Sacchi (15). During sample homogenization or lysis, TRIzol® reagent maintains RNA integrity while disrupting cells and dissolving cell components.
3. A classical hybridization buffer containing formamide, SSC and Denhardt’s solution gave weak signals and high backgrounds in this analysis. The hybridization buffer listed here is ready to use; the probes simply need to be added to start hybridization. Because the buffer does not contain formamide, incubation can be performed at high temperatures, and the hybridization conditions can be controlled to some extent.
4. Suspensions with OD600 values from 0.1 to 1.0 give essentially similar results. Suspensions above OD600 1.0 cause wilting of the treated plants.
5. Harvesting at any time later than 24 h after agroinfiltration gives similar results in terms of the accumulation and detection of precursor molecules. We do not know whether accumulation levels decline beyond 48 h after agroinfiltration.
6. Previously, we fractionated total RNA to obtain low-molecular-weight RNA plus (LMW RNA+; enriched in RNAs shorter than 1,000 nt) using an RNA/DNA kit (Qiagen), which made it possible to detect precursor RNA molecules clearly. LMW RNA+ was isolated by anion-exchange chromatography using an RNA/DNA Midi kit (Qiagen, Valencia, CA) according to the manufacturer’s instructions, with one minor modification. Briefly, total RNA was dissolved in 200 µL of RNase-free water, followed by the addition of 1 mL of QRL1. Next, 9 mL of QRV2 was added, and the solution was applied to a Qiagen RNA/DNA column according to the manufacturer’s directions, except that a specific elution buffer (50 mM MOPS pH 7.0, 1.0 M NaCl, 15% (vol/vol) ethanol) was used in the elution step. However, this Qiagen kit is no longer commercially available.
A simple silica-column strategy, such as that used in the PureLink miRNA isolation kit (Invitrogen), was tried for enriching the miRNA precursor molecules. However, this kit was originally developed to enrich mature miRNAs rather than the longer precursor molecules; it requires additional steps and is not convenient for this purpose. If enrichment of precursor miRNAs (except miR163) is necessary, polyethylene glycol (PEG) precipitation is recommended. PEG precipitation of the total RNA fraction separates a relatively high molecular weight RNA fraction (most ribosomal RNAs and mRNAs, including the relatively large pri-miRNA molecules) from a low molecular weight RNA fraction (possibly less than 500 nt, including most pre-miRNAs and mature miRNAs).
Processing of miRNA Precursors
7. RNA concentrations of 3 to 5 mg/mL are recommended for later handling. 8. The whole gel can be stained with EtBr, but this takes additional handling time and makes the RNA bands less sharp. 9. Hybridization can also be performed in a plastic bag; the choice is simply a matter of convenience.
Acknowledgments We thank members of the Watanabe lab for helpful discussions. References 1. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297 2. Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN (2004) MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23:4051–4060 3. Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, Kim VN (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425:415–419 4. Denli AM, Tops BBJ, Plasterk RHA, Ketting RF, Hannon GJ (2004) Processing of primary microRNAs by the Microprocessor complex. Nature 432:231–235 5. Gregory RI, Yan KP, Amuthan G, Chendrimada T, Doratotaj B, Cooch N, Shiekhattar R (2004) The Microprocessor complex mediates the genesis of microRNAs. Nature 432: 235–240 6. Han J, Lee Y, Yeom KH, Kim YK, Jin H, Kim VN (2004) The Drosha–DGCR8 complex in primary microRNA processing. Genes Dev 18:3016–3027 7. Landthaler M, Yalcin AA, Tuschl T (2004) The human DiGeorge syndrome critical region gene 8 and its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol 14:2162–2167 8. Tomari Y, Matranga C, Haley B, Martinez N, Zamore PD (2004) A protein sensor for siRNA asymmetry. Science 306:1377–1380
9. Kurihara Y, Watanabe Y (2004) Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci U S A 101:12753–12758 10. Han MH, Goud S, Song L, Fedoroff N (2004) The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci U S A 101:1093–1098 11. Kurihara Y, Yuasa T, Watanabe Y (2006) The interaction between DCL1 and HYL1 is important for efficient and precise processing of pri-miRNA in plant microRNA biogenesis. RNA 12:206–212 12. Fujioka Y, Utsumi M, Ohba Y, Watanabe Y (2007) Location of a possible miRNA processing site in SmD3/SmB nuclear bodies in Arabidopsis. Plant Cell Physiol 48:1243–1253 13. Kurihara Y, Inaba N, Kutsuna N, Takeda A, Tagami Y, Watanabe Y (2007) Binding of tobamovirus replication protein with small RNA duplexes. J Gen Virol 88:2347–2352 14. Kojima S, Banno H, Yoshioka Y, Oka A, Machida C, Machida Y (1999) A binary vector plasmid for gene expression in plant cells that is stably maintained in Agrobacterium cells. DNA Res 6:407–410 15. Chomczynski P, Sacchi N (1987) Single-step method of RNA isolation by acid guanidinium thiocyanate–phenol–chloroform extraction. Anal Biochem 162:156–159
Chapter 16 Purification of Arabidopsis Argonaute Complexes and Associated Small RNAs Yijun Qi and Shijun Mi Abstract Argonaute (AGO) proteins recruit small RNAs to form effector complexes of RNA interference (RNAi), collectively termed RNA-induced silencing complexes (RISCs). Here, we describe detailed protocols for the purification of AGO complexes and their associated small RNAs, using Arabidopsis AGO1 as an example. Key words: RNAi, Argonaute, microRNA, Purification
1. Introduction Argonaute (AGO) proteins recruit small RNAs to form effector complexes, collectively termed RNA-induced silencing complexes (RISCs). RISCs are guided by small RNAs to their targets (RNA or chromatin) on the basis of sequence complementarity, where they catalyze target mRNA cleavage, translational repression or chromatin modification (1, 2). Arabidopsis encodes 10 AGO proteins. Of these, AGO1 predominates in the miRNA pathway (3–6), and AGO4 and AGO6 play redundant roles in repeat-associated siRNA (rasiRNA) accumulation and in controlling DNA methylation and transcriptional gene silencing (TGS) at specific genomic loci, including transposons and repeats (7–10). The functions of the other AGO proteins remain largely unknown. As a complement to genetic analysis of RNAi mutants, profiling of small RNAs from purified AGO complexes can provide direct information about the diversification and range of biological roles of AGOs and their associated small RNAs.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_16, © Humana Press, a part of Springer Science + Business Media, LLC 2009
Qi and Mi
In this chapter, we describe detailed protocols for the purification of AGO complexes and their associated small RNAs, using AGO1 as an example.
2. Materials 1. Plant extraction buffer. 50 mM Tris–HCl (pH 7.5), 150 mM NaCl, 0.1% NP-40, 4 mM MgCl2, 5 mM DTT, and EDTA-free protease inhibitor cocktail (1 tablet/10 mL buffer, Roche). Add DTT and protease inhibitor cocktail fresh. 2. Syringe-driven filter unit (0.45 µm, Millipore). 3. Syringes. 4. Protein A agarose (Roche). 5. Trizol reagent (Invitrogen). 6. 5× SDS sample buffer. 0.6 mL 1 M Tris–HCl (pH 6.8), 5 mL 50% glycerol, 2 mL 10% SDS, 0.5 mL β-mercaptoethanol (14.4 M stock; approximately 0.72 M in the 5× buffer), 1 mL 1% bromophenol blue, 0.9 mL ddH2O. 7. Materials for SDS polyacrylamide gel electrophoresis. 8. AGO1 peptide (Invitrogen). 9. Glycogen (Ambion). 10. Fixing solution. To make 200 mL, mix 80 mL methanol, 100 µL 37% formaldehyde and 120 mL H2O. 11. Sensitizer solution. 0.2 g/L Na2S2O3. 12. Silver solution. 0.1% AgNO3. 13. Developer solution. To make 200 mL, add 6 g Na2CO3, 4 mL 0.2 g/L Na2S2O3 and 250 µL formaldehyde to 196 mL H2O. 14. Stop solution. 10% acetic acid. 15. 5× Transfer buffer. 15.1 g Tris base, 72.0 g glycine, 20% methanol, to 1 L. 16. TBST. 12.11 g Tris base, 8.775 g NaCl and 0.5% Tween 20, to 1 L. 17. Blocking buffer. 5% nonfat dry milk in TBST. 18. X-ray film. 19. 2× PK buffer. 0.2 M Tris–HCl, pH 8.0, 0.1 M EDTA, pH 8.0, 0.5 M NaCl, 2% SDS. 20. 3 M NaAc, pH 5.2. 21. Riboprobe T7 kit (Promega). 22. 5× Cleavage buffer. 5 mM ATP, 1 mM GTP, 6 mM MgCl2, 125 mM creatine phosphate, 150 µg/mL creatine kinase and 2 U/µL RNasin RNase Inhibitor (Promega).
Purification of Arabidopsis Argonaute Complexes and Associated Small RNAs
23. 2× Loading buffer. 95% formamide, 0.025% xylene cyanol, 0.025% bromophenol blue, 18 mM EDTA, 0.025% SDS. 24. 32P-γ-ATP (3,000 Ci/mmol). 25. 32P-pCp (3,000 Ci/mmol). 26. 32P-α-UTP (800 Ci/mmol).
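The concentrations in the 5× SDS sample buffer (item 6) follow from C1V1 = C2V2 over the 10 mL total volume. The Python sketch below is only an illustrative check of that arithmetic, not part of the original protocol; it assumes β-mercaptoethanol is added as the neat liquid at about 14.4 M.

```python
"""Illustrative C1*V1 = C2*V2 check of the 5x SDS sample buffer (item 6)."""

# component -> (stock concentration, volume of stock added in mL)
# Result units follow the stock units (M or %).
RECIPE = {
    "Tris-HCl pH 6.8 (M)": (1.0, 0.6),
    "glycerol (%)": (50.0, 5.0),
    "SDS (%)": (10.0, 2.0),
    "beta-mercaptoethanol (M)": (14.4, 0.5),  # assumption: neat liquid, ~14.4 M
    "bromophenol blue (%)": (1.0, 1.0),
}
WATER_ML = 0.9


def final_concentrations(recipe, water_ml):
    """Return the total volume (mL) and each component's concentration in the mix."""
    total_ml = sum(volume for _, volume in recipe.values()) + water_ml
    concentrations = {
        name: stock * volume / total_ml for name, (stock, volume) in recipe.items()
    }
    return total_ml, concentrations


if __name__ == "__main__":
    total, conc = final_concentrations(RECIPE, WATER_ML)
    print(f"total volume: {total:.1f} mL")
    for name, value in conc.items():
        print(f"{name}: {value:.3g}")
```

With these values the 5× buffer contains about 60 mM Tris–HCl, 25% glycerol, 2% SDS, 0.72 M β-mercaptoethanol and 0.1% bromophenol blue; adding 5 µL of it to a 20 µL sample, as in the protocol below, brings the sample to 1×.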
3. Methods The methods described below outline (1) purification of the Arabidopsis AGO1 complex using a specific antibody, (2) detection of AGO1 by silver staining and western blotting, (3) isolation and detection of small RNAs from the purified AGO1 complex, and (4) examination of the cleavage activity of the purified AGO1 complex. 3.1. Immunopurification of AGO1 Complex
The AGO1 complex can be immunopurified using an antibody raised against a peptide corresponding to the N-terminal 11 residues of AGO1. The peptide was conjugated to mcKLH using the Imject Maleimide Activated mcKLH Kit (Pierce) and used to raise rabbit polyclonal antiserum. The antiserum was affinity-purified using SulfoLink coupling gel (Pierce) conjugated to the peptide. Detailed protocols for the generation and purification of peptide antibodies are available from the manufacturer and are therefore not included here. Described below are the steps for immunoprecipitation of the AGO1 complex with the peptide antibody and elution of the complex from the beads with the peptide (see Note 1).
3.1.1. Immunopurification of Arabidopsis AGO1 Complex
1. Harvest 20 g (fresh weight) of Arabidopsis cultured cells, leaf or inflorescence tissues, grind the tissues into a fine powder under liquid nitrogen using a mortar and pestle, and then homogenize in 10 mL of plant extraction buffer. 2. Centrifuge the extract (20,000 × g, 25 min) to remove cell debris. Filter the supernatant through a 0.45 µm filter. 3. Pre-clear the extract by incubation with 100 µL of Protein A agarose beads (Roche) at 4°C for 60 min (see Note 2). 4. Centrifuge at 250 × g for 5 min at 4°C to pellet the beads. 5. Divide the supernatant into two equal portions and transfer them into two new 15 mL Falcon tubes. 6. Add AGO1 antibody to one tube (usually at a 1:50–1:500 dilution, depending on the titer of the antibody; test the optimal dilution before a large-scale purification). Add the same amount of pre-immune serum to the other tube as the control purification. Incubate at 4°C for 2 h. 7. Add 120 µL of Protein A agarose beads to each tube, and incubate at 4°C for 1 h.
8. Centrifuge at 250 × g for 5 min at 4°C. Remove the supernatant carefully without disturbing the beads. 9. Wash the beads three times (20 min each) in plant extraction buffer containing 2 mM DTT at 4°C. 10. After the final wash, add 200 µL of plant extraction buffer to each tube to resuspend the beads. 11. Set aside 20 µL of the bead suspension, and add 5 µL of 5× SDS sample buffer. Store at −20°C until use. 3.1.2. Elution of AGO1 Complex from the Beads with Peptides
1. Prepare the peptide elution solution by dissolving 0.2 mg of AGO1 peptide in 0.2 mL of extraction buffer (1 mg/mL). 2. Remove the buffer from the beads from Step 9 of Subheading 3.1.1. 3. Resuspend the beads in 100 µL of 1 mg/mL peptide elution solution. Incubate at room temperature for 20 min with rocking or end-over-end mixing. 4. Centrifuge at 250 × g for 5 min, and carefully transfer the supernatant to a clean tube. 5. Repeat Steps 3 and 4. 6. Pool the eluates. This is the purified AGO1 complex, which can be used for isolation of associated small RNAs (Subheading 3.3) or for the cleavage activity assay (Subheading 3.4). 7. Take a 20 µL aliquot of the eluate, and add 5 µL of 5× SDS sample buffer. Store at −20°C until use. 8. To examine the peptide elution efficiency, resuspend the beads in 200 µL of extraction buffer. Take a 20 µL aliquot of the bead suspension, and add 5 µL of 5× SDS sample buffer. Store at −20°C until use.
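The antibody amount in Step 6 of Subheading 3.1.1 depends on the chosen dilution and the volume of each half of the extract, and the peptide elution in Subheading 3.1.2 uses two 100 µL rounds at 1 mg/mL. The Python sketch below only does this arithmetic; the 5 mL per tube is a hypothetical figure (half of a ~10 mL cleared extract; the recovered volume will usually be somewhat lower), and the function names are illustrative.

```python
"""Arithmetic for the immunopurification (Subheadings 3.1.1 and 3.1.2)."""


def antibody_volume_ul(extract_ml: float, dilution: float) -> float:
    """Antibody (or pre-immune serum) volume in uL for a 1:dilution in extract_ml."""
    if dilution <= 0:
        raise ValueError("dilution must be positive")
    return extract_ml * 1000.0 / dilution


def peptide_needed_mg(elution_ul: float, rounds: int, conc_mg_per_ml: float = 1.0) -> float:
    """Peptide mass in mg for `rounds` elutions of `elution_ul` at conc_mg_per_ml."""
    return elution_ul * rounds / 1000.0 * conc_mg_per_ml


if __name__ == "__main__":
    per_tube_ml = 5.0  # hypothetical: half of a ~10 mL cleared extract
    for dilution in (50, 100, 500):
        volume = antibody_volume_ul(per_tube_ml, dilution)
        print(f"1:{dilution} -> {volume:.0f} uL per tube")
    # Two rounds of 100 uL at 1 mg/mL match the 0.2 mg dissolved in Step 1.
    print(f"peptide: {peptide_needed_mg(100, 2):.2f} mg")
```

Because the control tube receives the same volume of pre-immune serum, the total serum requirement for one purification is twice the per-tube value.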
3.2. Detection of Purified AGO1 Protein
The next steps involve the detection of purified AGO1 protein by SDS–PAGE (Subheading 3.2.1) followed by silver staining (Subheading 3.2.2) and western blotting (Subheading 3.2.3).
3.2.1. SDS–PAGE
1. Prepare two 8% SDS polyacrylamide gels (11) using the Mini-PROTEAN gel system (Bio-Rad). 2. Incubate the samples from Step 11 of Subheading 3.1.1 and Steps 7 and 8 of Subheading 3.1.2 at 95°C for 5 min, and then load 10 µL of each onto each gel. 3. Run the gels at 120 V until the blue dye reaches the bottom of the gel, which usually takes 1.5–2 h. 4. Use one gel for silver staining and the other for western blotting.
3.2.2. Fast Silver Staining
1. Carefully remove the gel from the glass plates and place it in a clean plastic container. Add 50 mL of fixing solution, and shake on a staining shaker for 10 min. 2. Remove the fixing solution. Wash the gel twice with 50 mL of distilled water, 5 s each. 3. Remove the water and incubate the gel in sensitizer solution for 1 min. 4. Remove the sensitizer solution. Wash the gel twice with 50 mL of distilled water, 5 s each. 5. Decant the water, add 50 mL of silver solution, and shake for 10 min. 6. Remove the silver solution, and wash the gel with water for 5 s. 7. Decant the water and add 50 mL of developer solution. Shake gently until the protein bands appear. 8. Add 5 mL of stop solution to stop development. The AGO1 protein should be seen as three bands around 120 kDa (Fig. 1a).
3.2.3. Western Blot
1. Immerse the gel in 1× transfer buffer. 2. Soak blotting pads and the PVDF membrane in transfer buffer and leave them in the buffer until you are ready to assemble the blotting sandwich. 3. Hold up one pad and lay it flat on the platinum anode. Using a clean pipette, roll out any bubbles that may be trapped under the pad. 4. Place the soaked PVDF membrane on top of the pad and roll out the air bubbles. 5. Carefully place the gel on top of the membrane, well side up. Make sure all the edges are aligned and roll out any air bubbles. 6. Place the other pad on top of the gel. Roll over the pad to remove all air bubbles. Add approximately 10 mL of transfer buffer on top to resaturate the sandwich. 7. Place the anode core on top, then carefully transfer the entire assembly to the Trans-Blot cell box (Bio-Rad), pressing the anode core into the cathode core with the wedge. 8. Turn on the power supply and transfer at constant voltage (15 V) for 30 min. 9. Transfer the membrane to a container and block it (protein side up) with 5% nonfat dry milk in TBST at room temperature for 1 h with gentle shaking. 10. Incubate with primary antibody (1:1,000) in blocking buffer for 1 h at room temperature with gentle shaking.
[Figure 1 panels: (a) silver staining and (b) western blot of the Preimmune and α-AGO1 purifications; lanes: beads before elution, eluate, beads after elution; protein size markers in kDa.]
Fig. 1. Purification of Arabidopsis AGO1 complex. The AGO1 complex was immunoprecipitated by using a peptide-specific antibody, eluted from the Protein A agarose beads by peptides, and separated by 8% SDS–PAGE. The proteins were detected by silver staining (a) or western blotting (b). The positions of protein size markers, electrophoresed in parallel, are shown to the left of each gel. The AGO protein bands are indicated by solid arrowheads.
11. Remove the antibody solution, and wash the membrane three times with TBST, 10 min each. 12. Incubate with an HRP-conjugated goat anti-rabbit secondary antibody (1:5,000 dilution) in fresh blocking buffer for 1 h at room temperature. 13. Rinse three times with TBST, 20 min each. 14. Develop the blot using a chemiluminescent HRP substrate (Pierce) following the manufacturer's instructions. 15. Expose the blot to X-ray film. For best results, take exposures of 10 s, 1 min, 5 min and 20 min to visualize the chemiluminescence signal (Fig. 1b). 3.3. Examination of Small RNAs Associated with AGO1 Complex
The steps described below outline the procedure for isolating small RNAs from the AGO1 complex and detecting them by SYBR gold staining, end-labeling (12) or northern blotting (13).
3.3.1. Extraction of Small RNAs from Purified AGO1 Complex
1. Mix 1 mL of Trizol reagent with the purified AGO1 complex from Step 9 of Subheading 3.1.1 or Step 6 of Subheading 3.1.2.
2. Vortex vigorously for 15 s, and incubate at 50°C for 5 min. 3. Add 200 µL of chloroform and vortex vigorously. 4. Centrifuge at 20,000 × g for 15 min at 4°C. 5. Transfer the aqueous phase (~0.6 mL) to a new tube. Add 3–4 µL of glycogen (5 mg/mL) and 0.6 mL of isopropyl alcohol to precipitate the RNA. Keep at −20°C for 30 min. 6. Centrifuge at 20,000 × g for 15 min at 4°C. The RNA can be seen as a white pellet. 7. Carefully remove the supernatant, and add 1 mL of 75% ethanol to wash the RNA pellet. 8. Centrifuge at 20,000 × g for 5 min at 4°C. 9. Carefully remove the liquid. 10. Centrifuge again briefly and remove the remaining liquid completely. 11. Dry the RNA pellet at room temperature for 10 min. 12. Dissolve the RNA in 20 µL of RNase-free water. 3.3.2. Detection of Small RNAs by SYBR-Gold Staining
1. Prepare a 20 cm × 20 cm 15% polyacrylamide/7 M urea gel (11). 2. Mix the isolated small RNA with an equal volume of 2× loading buffer, and heat at 95°C for 3 min. 3. Load the samples onto the gel. 4. Run the gel at ~300 V until the bromophenol blue dye reaches the bottom of the gel, which takes 2–3 h. 5. Transfer the gel into a clean container and rinse with distilled water. 6. Stain the gel with SYBR-gold (1:10,000 dilution; Roche) for 10 min. Visualize the small RNAs under UV light (Fig. 2).
3.3.3. Detection of Small RNAs by End-Labeling
The 3′ termini of small RNAs carry a hydroxyl group, so the small RNAs can be 3′ end-labeled by ligation to 32P-pCp. 1. Set up the following reaction:
RNA sample: 10 µL
10× T4 RNA ligase buffer: 2 µL
32P-pCp: 5 µL
RNasin RNase inhibitor (40 U/µL): 1 µL
T4 RNA ligase (20 U/µL): 2 µL
Final volume: 20 µL
Incubate at 4°C overnight.
[Figure 2 panels: SYBR-gold staining (upper) and northern blot (lower) of Preimmune and α-AGO1 samples; RNA size markers at 24 and 21 nt.]
Fig. 2. Detection of small RNAs in AGO1 complex. Small RNAs were extracted from AGO1 complex, analyzed on denaturing polyacrylamide gels, and detected by SYBR-gold staining (upper panel) or northern blotting (miR164, lower panel). The positions of RNA size markers are shown to the left.
2. Add 200 µL of Trizol reagent and 3 µL of glycogen (5 mg/mL). Vortex vigorously for 30 s. 3. Add 50 µL of chloroform and vortex vigorously. 4. Centrifuge at 20,000 × g for 5 min. Transfer the aqueous phase (~100 µL) to a new tube. 5. Add 100 µL of isopropyl alcohol, mix well and incubate at −20°C for 30 min. 6. Centrifuge at 20,000 × g for 15 min at 4°C. Carefully remove the supernatant, centrifuge again briefly and remove the remaining liquid completely. 7. Dry the RNA pellet at room temperature for 10 min. 8. Dissolve the RNA in 20 µL of 2× loading buffer. Heat at 95°C for 3 min. 9. Resolve the RNAs on a 15% polyacrylamide/7 M urea gel. Run a 5′ end-labeled synthetic 21-nt RNA in parallel as a size marker. 10. After electrophoresis, wrap the gel in plastic wrap and expose it to a phosphor screen to detect the small RNA signals. 3.3.4. Detection of Small RNAs by Northern Blot
To examine whether the AGO1 complex contains particular small RNAs of known sequence, northern blots can be performed using oligonucleotide probes complementary to the small RNA sequences. In our hands, most known Arabidopsis miRNAs could be easily detected in the AGO1 complex by northern blotting (Fig. 2).
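An oligonucleotide probe for a northern blot is the reverse complement of the mature miRNA, written as DNA. The Python sketch below shows that conversion for miR164, the miRNA probed in Fig. 2. The miR164 sequence used here is the ath-miR164a entry as we recall it from miRBase and should be checked against the current miRBase record before ordering an oligo; the function name is illustrative.

```python
"""Design a DNA oligonucleotide probe antisense to a mature miRNA."""

# Watson-Crick partner, written as DNA, for each RNA (or DNA) base.
DNA_PARTNER = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna_probe(mirna: str) -> str:
    """Return the 5'->3' DNA oligo that base-pairs with the given miRNA."""
    seq = mirna.strip().upper()
    try:
        return "".join(DNA_PARTNER[base] for base in reversed(seq))
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from None


if __name__ == "__main__":
    mir164a = "UGGAGAAGCAGGGCACGUGCA"  # assumed ath-miR164a sequence (check miRBase)
    print(antisense_dna_probe(mir164a))  # TGCACGTGCCCTGCTTCTCCA
```

Reversing the sequence as well as complementing it keeps the probe in the conventional 5′→3′ orientation, which is how oligonucleotides are ordered.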
3.4. In Vitro RISC Activity Assay
Plant miRNAs regulate gene expression by cleaving their target mRNAs. Our lab and the Baulcombe lab have previously demonstrated that purified AGO1 complex can use the bound miRNAs to cleave corresponding target mRNAs in vitro (5, 6).
The following steps outline the procedure for preparation of target RNAs and examination of RISC cleavage activity. 3.4.1. Preparation of 32P-Labeled Target RNAs
Uniformly labeled RNA transcripts can be prepared by in vitro transcription in the presence of 32P-UTP. The template for in vitro transcription is generated by PCR using primers specific for the miRNA target gene of interest, with the T7 promoter sequence (TAATACGACTCACTATAGGG) added to the 5′ end of the forward primer. The primers are usually designed to amplify a ~200 bp fragment that contains the miRNA target site. 1. Use the Riboprobe T7 kit (Promega) to generate target RNA transcripts. Set up the following reaction:
5× transcription buffer (supplied with the kit): 4 µL
DTT, 100 mM: 2 µL
RNasin RNase Inhibitor (40 U/µL): 1 µL
rATP, 10 mM: 1 µL
rCTP, 10 mM: 1 µL
rGTP, 1 mM: 1 µL
rUTP, 1 mM: 1 µL
m7G cap analog, 10 mM: 1 µL
32P-UTP (800 Ci/mmol): 5 µL
T7 RNA polymerase: 15–20 U
Final volume: 20 µL
2. Incubate for 1 h at 37°C. 3. Stop the reaction by adding an equal volume of 2× loading buffer and heating for 5 min at 95°C. 4. Load the in vitro transcripts onto an 8% polyacrylamide/7 M urea gel. 5. After electrophoresis, wrap the gel in plastic wrap and expose it to X-ray film. Use fluorescent tape to mark the position of the gel. 6. Excise the band of the expected size, place it in an Eppendorf tube, and crush the gel slice into small pieces using a 1 mL pipette tip. 7. Soak the gel in 400 µL of 2× PK buffer at room temperature with agitation for 3–4 h or overnight. 8. Add 400 µL of phenol/chloroform. Vortex vigorously for 30 s and centrifuge at 20,000 × g for 10 min at room temperature. 9. Transfer the aqueous phase to a fresh tube, and add 3–4 µL of glycogen (5 mg/mL) and 1 mL of 100% ethanol. Incubate for 30 min at −20°C.
10. Centrifuge at 15,000 × g for 15 min. 11. Wash the RNA pellet with 70% ethanol. 12. Dry the pellet by leaving the tube open at room temperature for 5 min. 13. Dissolve the pellet in 20 µL of RNase-free water. 14. Measure the radioactivity (cpm) of the RNA sample and adjust the labeled RNA to 50,000 cpm/µL. 3.4.2. Cleavage Activity Assay
1. Set up a cleavage assay as follows:
5× cleavage buffer: 4 µL
Purified AGO1 complex (from Step 9 of Subheading 3.1.1 or Step 6 of Subheading 3.1.2): 15 µL
Labeled target RNA: 1 µL
Final volume: 20 µL
2. Incubate at 25–30°C for 2 h. 3. Stop the reaction by adding 250 µL of Trizol and 4 µL of 5 mg/mL glycogen. Vortex vigorously for 30 s. 4. Add 50 µL of chloroform, vortex and centrifuge at 15,000 × g for 5 min. 5. Transfer the aqueous phase (~150 µL) to a fresh tube, and add 150 µL of isopropyl alcohol. Mix well and incubate for 30 min at −20°C. 6. Centrifuge at 20,000 × g for 15 min at 4°C. Carefully remove the supernatant, centrifuge again briefly and remove the remaining liquid completely. 7. Dry the RNA pellet at room temperature for 10 min. 8. Dissolve the RNA in 20 µL of 2× loading buffer. Heat at 95°C for 3 min. 9. Resolve the RNAs on an 8% polyacrylamide/7 M urea gel. 10. After electrophoresis, wrap the gel in plastic wrap and expose it to a phosphor screen to detect the cleavage products (Fig. 3).
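Because the target transcript is uniformly labeled, both the 5′ and 3′ cleavage products appear on the gel, and it helps to predict their sizes in advance. Plant miRNA-guided AGO1 cleavage is generally reported to occur opposite the bond between nucleotides 10 and 11 of the miRNA (counted from its 5′ end). The Python sketch below applies that rule under a simplifying assumption: it only finds a perfectly complementary site by exact string matching. Real sites, such as the miR164 site in CUC1, may carry mismatches, in which case the site position has to be supplied by hand. The toy target and function names are hypothetical.

```python
"""Predict miRNA-guided cleavage product sizes for a target transcript."""
from __future__ import annotations

RNA_PARTNER = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.strip().upper().replace("T", "U")


def complementary_site(mirna: str) -> str:
    """Target-strand sequence (5'->3') perfectly complementary to the miRNA."""
    return "".join(RNA_PARTNER[base] for base in reversed(to_rna(mirna)))


def cleavage_products(target: str, mirna: str) -> tuple[int, int]:
    """Return (5' product, 3' product) lengths in nt.

    Cleavage is placed opposite the bond between miRNA nucleotides 10 and 11.
    Only a perfectly complementary site is searched for.
    """
    target = to_rna(target)
    site = complementary_site(mirna)
    start = target.find(site)
    if start < 0:
        raise ValueError("no perfectly complementary site in target")
    # The target base paired with miRNA position 10 sits at index
    # start + len(site) - 10; the cut falls just 5' of it, so that index
    # equals the length of the 5' product.
    five_prime_len = start + len(site) - 10
    return five_prime_len, len(target) - five_prime_len


if __name__ == "__main__":
    mirna = "UGGAGAAGCAGGGCACGUGCA"  # assumed miR164 sequence
    toy_target = "GGG" + complementary_site(mirna) + "AAAA"  # hypothetical transcript
    print(cleavage_products(toy_target, mirna))  # (14, 14)
```

For a real ~200 nt transcript the same call gives the expected positions of the 5′ and 3′ product bands relative to the full-length substrate.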
4. Notes 1. Alternatively, the AGO1 complex can be purified from transgenic Arabidopsis plants expressing N-terminally FLAG-tagged AGO1 under its native promoter, using commercially available α-FLAG antibodies (5).
[Figure 3 panels: lanes Buffer only, Preimmune and α-AGO1; positions of the substrate, 5′ product and 3′ product are marked.]
Fig. 3. Examination of the cleavage activity of AGO1 complex. Uniformly labeled RNA transcripts (CUC1, the target of miR164) were incubated with purified AGO1 complex. Cleavage products were resolved in 15% denaturing PAGE gels. Positions of cleavage products are indicated.
2. Before use, replace the storage buffer of the Protein A agarose beads with plant extraction buffer by washing the beads three times with an excess of plant extraction buffer.
Acknowledgments This work was supported by the Ministry of Science and Technology of China.
References 1. Hannon GJ (2002) RNA interference. Nature 418(6894):244–251 2. Meister G, Tuschl T (2004) Mechanisms of gene silencing by double-stranded RNA. Nature 431(7006):343–349
3. Vaucheret H et al (2004) The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18(10): 1187–1197
4. Morel JB et al (2002) Fertile hypomorphic ARGONAUTE (ago1) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14(3):629–639 5. Baumberger N, Baulcombe DC (2005) Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc Natl Acad Sci U S A 102:11928–11933 6. Qi Y, Denli AM, Hannon GJ (2005) Biochemical specialization within Arabidopsis RNA silencing pathways. Mol Cell 19:421–428 7. Qi Y et al (2006) Distinct catalytic and noncatalytic roles of ARGONAUTE4 in RNA-directed DNA methylation. Nature 443(7114):1008–1012 8. Zilberman D, Cao X, Jacobsen SE (2003) ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299(5607):716–719
9. Zilberman D et al (2004) Role of Arabidopsis ARGONAUTE4 in RNA-directed DNA methylation triggered by inverted repeats. Curr Biol 14(13):1214–1220 10. Zheng X et al (2007) Role of Arabidopsis AGO6 in siRNA accumulation, DNA methylation and transcriptional gene silencing. EMBO J 26(6):1691–1701 11. Sambrook J, Fritsch EF, Maniatis T (eds) (1989) Molecular cloning, a laboratory manual. Cold Spring Harbor Press, Cold Spring Harbor, New York 12. Motamedi MR et al (2004) Two RNAi complexes, RITS and RDRC, physically interact and localize to noncoding centromeric RNAs. Cell 119(6):789–802 13. Dalmay T et al (2000) Potato virus X amplicons in Arabidopsis mediate genetic and epigenetic gene silencing. Plant Cell 12(3):369–379
Chapter 17 Transient Assays for the Analysis of miRNA Processing and Function Felipe F. de Felippes and Detlef Weigel Abstract Transient assays provide a convenient alternative to stable transformation. For small RNA analysis in plants, the most widely used method, commonly named agroinfiltration, makes use of Agrobacterium tumefaciens to deliver transgenes into leaf cells of Nicotiana benthamiana. Compared to the generation of stably transformed plants, agroinfiltration is more rapid, and samples can be analyzed a few days after inoculation. Agroinfiltration has been used successfully in many different applications, including the analysis of small RNAs. We describe here a protocol for analysis of miRNA processing using agroinfiltration of N. benthamiana leaves. Key words: Agroinfiltration, Transient assay, miRNA processing, Small RNA blot
1. Introduction The in vivo study of small RNAs requires, in many cases, the use of transgenic techniques. However, generation of stable transgenics for many flowering plants is labor-intensive and time-consuming. Even for species such as Arabidopsis thaliana, for which simple protocols for plant transformation are available (1, 2), the generation of stable transgenics can take months. As an alternative to the generation of stable transformants, transient introduction of transgenes has been used successfully in a large number of applications, including the analysis of small RNAs, promoters and suppressors of RNA silencing (3–7) and the study of gene function (8). The greatest advantage of transient assays over the generation of stable transformants is their rapidity; transgene activity/expression can usually be assayed within a few days after B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_17, © Humana Press, a part of Springer Science + Business Media, LLC 2009
de Felippes and Weigel
transfection (9). The most popular method of transient assay in plants is the infiltration of Nicotiana benthamiana leaves with Agrobacterium tumefaciens carrying a T-DNA, owing to its ease of manipulation and robust transgene expression. In addition, this method, agroinfiltration, does not require special equipment (9–11). Finally, the same vectors can often be used to generate stable transformants, so that results obtained after agroinfiltration can be confirmed in stably transformed plants. Here we describe the transient expression of the miR319a precursor in N. benthamiana leaves, RNA isolation, and subsequent detection of the mature miRNA as an example of the use of transient assays in miRNA validation. The agroinfiltration protocol described here is based on technology developed in the laboratory of James Carrington, Oregon State University.
2. Materials 1. T-DNA vectors. 2. Oligonucleotide primers. 3. Taq DNA polymerase, restriction enzymes, T4 DNA ligase, dNTPs, LR Clonase™ (Invitrogen), agar, agarose. 4. E. coli strain DH5α. 5. A virulent A. tumefaciens strain, e.g., ASE, GV3101, LBA4404. 6. N. benthamiana plants. 7. Growth medium: Luria broth (LB) with 50 µg/mL kanamycin (A. tumefaciens ASE selection), 25 µg/mL chloramphenicol (A. tumefaciens ASE selection), 100 µg/mL spectinomycin (binary vector selection) and 5 µg/mL tetracycline (pSoup selection). 8. Infiltration medium: 10 mM MgCl2, 10 mM MES pH 5.7, 150 µM acetosyringone. 9. Hypodermic needle and plastic syringe. 10. Mortar and pestle. 11. TRIZOL® reagent (Invitrogen) (toxic; causes burns). 12. Polyacrylamide gel: acrylamide (neurotoxic while unpolymerized), urea, APS (ammonium persulfate) and TEMED. 13. 10× TBE: 54 g Tris base, 27.5 g boric acid, 20 mL 0.5 M EDTA, water to 500 mL. Adjust pH to 8.4. 14. Formamide (toxic) and loading dye (RNase-free). 15. Nytran SuPerCharge nylon transfer membrane (Schleicher & Schuell Bioscience).
Transient Assays for the Analysis of miRNA Processing and Function
16. Extra thick blot paper (BioRad). 17. Trans-Blot® SD semi-dry transfer cell (BioRad). 18. UV Stratalinker® 2400 (Stratagene). 19. γ-32P-ATP (10 mCi/mL) (radioactive hazard). 20. Optikinase (USB). 21. Micro Bio-Spin® 6 chromatography columns (BioRad). 22. PerfectHyb™ Plus hybridization buffer, 1× (Sigma) (irritant). 23. 20× SSC: 3 M NaCl, 0.3 M sodium citrate. Adjust the pH to 7.0 with HCl. 24. 10% (w/v) sodium dodecyl sulfate (SDS). 25. X-ray film and X-ray cassette. 26. Photographic film developing solutions.
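SSC (saline-sodium citrate) stock is weighed out from NaCl and sodium citrate, and the mass of each salt follows from molarity × molar mass × volume. The Python sketch below applies this to the 20× SSC composition in item 23. It assumes the trisodium citrate dihydrate form (294.10 g/mol) and NaCl at 58.44 g/mol; the molar mass must be changed if a different citrate hydrate is used.

```python
"""Mass of each salt for an SSC (saline-sodium citrate) stock."""

# Molar masses in g/mol; sodium citrate assumed to be the trisodium dihydrate.
MOLAR_MASS = {
    "NaCl": 58.44,
    "trisodium citrate dihydrate": 294.10,
}

# Composition of 20x SSC in mol/L (item 23).
SSC_20X = {"NaCl": 3.0, "trisodium citrate dihydrate": 0.3}


def grams_needed(molarity: float, molar_mass: float, volume_l: float) -> float:
    """Mass (g) = concentration (mol/L) x molar mass (g/mol) x volume (L)."""
    return molarity * molar_mass * volume_l


def recipe(composition: dict, volume_l: float) -> dict:
    """Grams of each salt needed for volume_l liters of the given composition."""
    return {
        salt: grams_needed(molarity, MOLAR_MASS[salt], volume_l)
        for salt, molarity in composition.items()
    }


if __name__ == "__main__":
    for salt, grams in recipe(SSC_20X, 1.0).items():
        print(f"{salt}: {grams:.1f} g per liter")
```

Dissolve the salts in less than the final volume, adjust the pH to 7.0 with HCl as in item 23, and then bring the solution to volume.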
3. Methods In this section, we describe (a) the construction of the binary vector containing the precursor for miR319, (b) the agroinfiltration of N. benthamiana leaves, and (c) the analysis of miRNA processing. 3.1. Construction of the Binary Vector for Agroinfiltration
The following subsections describe the vectors used in the transient assay (Subheading 3.1.1) and the steps for construction of the expression plasmid (Subheading 3.1.2).
3.1.1. Description of the Plasmids
The molecular basis of A. tumefaciens infection involves the transfer of a DNA segment called T-DNA (transfer DNA) from the tumor-inducing (Ti) plasmid of the bacterium to the plant cell (12, 13). For biotechnological purposes, binary systems are preferred, in which the trans-acting functions for T-DNA transfer are all encoded on a disarmed Ti plasmid that lacks the tumor-inducing sequences. For ease of manipulation during cloning, the T-DNA itself is placed on a separate plasmid, hence the term “binary.” These plasmids have origins of replication that allow propagation in both E. coli and A. tumefaciens (13). The native T-DNA sequences, which cause tumor formation, are replaced by the sequence of interest, in this case a cassette for miRNA expression (13). In the experiment described in this chapter, a modified binary plasmid based on the pGreenII vectors is used (pFK210; Frank Küttner and Markus Schmid, pers. communication) (14). pFK210 is Gateway compatible (15) and includes a gene conferring spectinomycin resistance for selection in E. coli and a T-DNA defined by the characteristic left and right borders (12, 13). The T-DNA
region contains the BAR gene, conferring resistance to the herbicide glufosinate (trade name Basta®) for selection of stable transformants (see Note 1), and an expression cassette consisting of a recombination module (attR1-ccdB-attR2, discussed below) (15) flanked by the strong constitutive CaMV 35S promoter and a transcriptional terminator from the Pisum sativum ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit (rbcS) gene. To minimize plasmid size and thereby ease cloning, the pSa replication region in pGreenII plasmids lacks the RepA gene, which is necessary for replication in A. tumefaciens. Therefore, pGreenII plasmids need to be co-transformed with pSoup (14), which carries the RepA gene together with a tetracycline resistance gene for selection in A. tumefaciens and E. coli. Before the sequence of interest can be recombined into the expression cassette in pFK210, it must first be cloned into a Gateway® entry vector. Such an entry vector contains a multiple cloning site (MCS) flanked by attL1 and attL2 sites, which recombine with the attR1 and attR2 sites in pFK210 in a recombinase-mediated (LR) reaction (15). The entry vector normally carries a kanamycin resistance gene for selection in E. coli. 3.1.2. Generation of the Expression Vector
All procedures described here were performed using standard recombinant DNA methods (16). A 400-nucleotide fragment containing the miR319a precursor (17) was amplified by PCR from A. thaliana genomic DNA. Restriction sites for EcoRI and BamHI were included in the forward and reverse primers, respectively. The PCR product was digested with these enzymes and ligated into an entry vector digested with the same enzymes. The resulting plasmid was recombined with pFK210 to generate the final binary plasmid, which was introduced into A. tumefaciens (strain ASE) together with pSoup by electroporation (18). All clones were confirmed by DNA sequencing (19). The presence of the binary vector in A. tumefaciens was confirmed by colony PCR with specific primers.
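The cloning route above (restriction-site-tailed primers, then LR recombination into pFK210) can be checked in silico before ordering primers. The sketch below is illustrative only: the template sequence, the 20-nt annealing length and the 2-nt "CG" clamp are assumptions, not the primers used in this chapter, and `lr_reaction` is a hypothetical toy string model of the Gateway attL × attR reaction, not a real library call.

```python
# Illustrative sketch; the template, annealing length and "CG" clamp are
# assumptions, not the primers used in this chapter.
ECORI = "GAATTC"  # EcoRI recognition site (G^AATTC)
BAMHI = "GGATCC"  # BamHI recognition site (G^GATCC)
CLAMP = "CG"      # assumed extra 5' bases so the enzyme can cut near the end of the product

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(insert: str, anneal_len: int = 20) -> tuple[str, str]:
    """Forward primer adds EcoRI upstream of the insert; reverse primer adds
    BamHI downstream, following the EcoRI/BamHI strategy in the text."""
    forward = CLAMP + ECORI + insert[:anneal_len]
    reverse = CLAMP + BAMHI + revcomp(insert[-anneal_len:])
    return forward, reverse


def internal_sites(insert: str) -> list[str]:
    """Cloning enzymes that would also cut inside the insert (ideally none)."""
    sites = {"EcoRI": ECORI, "BamHI": BAMHI}
    return [name for name, site in sites.items() if site in insert.upper()]


def lr_reaction(entry: str, destination: str) -> tuple[str, str]:
    """Hypothetical toy model of Gateway LR recombination (attL x attR -> attB + attP).
    Cassettes are written as 'site1|payload|site2'."""
    l1, insert, l2 = entry.split("|")
    r1, ccdb, r2 = destination.split("|")
    if (l1, l2) != ("attL1", "attL2") or (r1, r2) != ("attR1", "attR2"):
        raise ValueError("expected an attL1/attL2 entry and an attR1/attR2 destination")
    expression_clone = f"attB1|{insert}|attB2"  # insert now sits in the binary vector
    byproduct = f"attP1|{ccdb}|attP2"           # ccdB moves to the entry-vector backbone
    return expression_clone, byproduct


if __name__ == "__main__":
    template = "ATG" + "ACGT" * 20 + "TAA"  # placeholder, not the miR319a precursor
    print(internal_sites(template))         # → []
    print(tailed_primers(template))
    print(lr_reaction("attL1|pre-miR319a|attL2", "attR1|ccdB|attR2"))
```

Checking for internal EcoRI or BamHI sites first matters because a site inside the precursor fragment would cut the insert during the digestion step.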
3.2. Agroinfiltration of N. benthamiana Leaves
3.2.1. Plant Material
Wild-type N. benthamiana plants were used for the transient assay described here. Seeds were stratified in 0.1% agar solution for 3 days in the dark at 4° C. Plants were grown on soil under long-day conditions (23° C, 16 h light). Plants 2–5 weeks old were used.
3.2.2. Agroinfiltration of N. benthamiana Leaves
1. Inoculate 5 mL of growth medium with a single colony of transformed A. tumefaciens. Incubate the culture for 20 h at 28–30° C with vigorous shaking.
Transient Assays for the Analysis of miRNA Processing and Function
2. Use 2 mL of the culture from step 1 to inoculate 50 mL of growth medium. Incubate for 16–20 h at 28–30° C with vigorous shaking.
3. Collect the bacteria by centrifugation at 2,000 × g (4,000 rpm in a Sorvall SS-34 rotor) for 10 min at 4° C. Discard the supernatant.
4. Resuspend the bacterial pellet in 30 mL of infiltration medium and incubate for 16–20 h at room temperature (21–23° C) with gentle shaking (see Note 2).
5. Dilute the suspension with infiltration medium to a final optical density at 600 nm (OD600) of 0.5.
6. With a sharp hypodermic needle, make superficial wounds on the abaxial side of the leaf (Fig. 1). Make sure that the wounds do not perforate the leaf, which could decrease the efficiency of the infiltration (see Note 3).
7. Using a 5 mL syringe without a needle, infiltrate the A. tumefaciens suspension into the leaf through the wounds while applying gentle counter pressure with a fingertip to the other side of the leaf.
8. Mark the leaf by attaching a small tag to the petiole, and repeat steps 6 and 7 on one or two more leaves.
Fig. 1. Diagram of N. benthamiana leaf and agroinfiltration (labels: adaxial side, abaxial side, needle, infiltrated area).
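The dilution in step 5 is a C1 × V1 = C2 × V2 calculation. The sketch below assumes that OD600 is proportional to cell density in the measured range, so a dense suspension is read on a diluted aliquot; the 1:10 dilution and the reading of 0.20 in the example are hypothetical values.

```python
# Sketch assuming OD600 is proportional to cell density in the measured range;
# readings above ~1 are best taken on a diluted aliquot.
def od_from_diluted_reading(reading: float, dilution_factor: float) -> float:
    """OD600 of the undiluted suspension from a reading of a diluted aliquot."""
    return reading * dilution_factor


def medium_to_add_ml(current_od: float, current_ml: float, target_od: float = 0.5) -> float:
    """Infiltration medium (mL) to add so the suspension reaches target_od,
    from C1 * V1 = C2 * V2."""
    if current_od < target_od:
        raise ValueError("suspension is already below the target OD600")
    final_ml = current_od * current_ml / target_od
    return final_ml - current_ml


# Hypothetical example: 30 mL of resuspended cells (step 4); a 1:10 dilution reads 0.20.
od = od_from_diluted_reading(0.20, 10)   # OD600 of about 2.0
print(medium_to_add_ml(od, 30.0))         # about 90 mL, giving 120 mL at OD600 0.5
```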
9. Three days after infiltration, harvest the infiltrated tissue into liquid nitrogen. Proceed to RNA extraction or store the tissue at −80° C.
3.3. Analysis of miRNA Processing
This section covers the extraction of total RNA from N. benthamiana leaves and the detection of processed miRNAs by small RNA blot. Follow standard procedures for RNA handling to prevent RNase contamination and RNA degradation; for additional details, see (16).
3.3.1. Total RNA Extraction
1. Grind N. benthamiana leaves in liquid nitrogen using a mortar and pestle.
2. Extract total RNA from 100 mg of tissue with TRIZOL reagent according to the manufacturer's instructions (see Note 4).
3. Quantify the RNA by measuring the absorbance of an aliquot at 260 nm in a spectrophotometer, assuming that an A260 of 1 corresponds to 40 µg/mL of RNA. To check purity, also measure the absorbance at 280 nm and calculate the A260/A280 ratio, which should be greater than 1.65.
4. Check RNA integrity by electrophoresis on an agarose gel containing ethidium bromide (16). Load 500 ng of RNA on a 1% agarose gel and, after separation, visualize the gel under UV light. The 25S and 18S rRNA species should each be clearly visible as a sharp band.
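The arithmetic of step 3, together with the loading mix of Subheading 3.3.2.2 (formamide at 40% of the final volume), can be sketched as follows. The aliquot dilution, absorbance readings, the 10 µg load (within the 3–15 µg range given) and the 2 µL of loading dye are hypothetical example values.

```python
# Sketch of RNA quantification and gel-loading volumes; example inputs are hypothetical.
A260_UG_PER_ML = 40.0   # 1 A260 unit ~ 40 ug/mL RNA (step 3)
MIN_A260_A280 = 1.65    # purity threshold from step 3


def rna_ug_per_ul(a260: float, dilution_factor: float) -> float:
    """RNA concentration (ug/uL) of the stock from the A260 of a diluted aliquot."""
    return a260 * dilution_factor * A260_UG_PER_ML / 1000.0


def is_pure(a260: float, a280: float) -> bool:
    """True if the A260/A280 ratio exceeds the purity threshold."""
    return a260 / a280 > MIN_A260_A280


def loading_mix_ul(conc_ug_per_ul: float, rna_ug: float, dye_ul: float) -> tuple[float, float]:
    """Volumes (uL) of RNA stock and formamide for one lane, with formamide
    making up 40% of the final volume (RNA + dye + formamide)."""
    rna_ul = rna_ug / conc_ug_per_ul
    formamide_ul = (rna_ul + dye_ul) * 0.4 / 0.6
    return rna_ul, formamide_ul


conc = rna_ug_per_ul(a260=0.25, dilution_factor=100)   # 1.0 ug/uL
print(is_pure(0.25, 0.13))                              # ratio ~1.92, so True
print(loading_mix_ul(conc, rna_ug=10, dye_ul=2))        # ~10 uL RNA, ~8 uL formamide
```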
3.3.2. Small RNA Blot
3.3.2.1. Polyacrylamide Gel
1. For the polyacrylamide gel, prepare a solution containing 17% polyacrylamide and 7 M urea in 0.5× TBE. Incubate at 37° C until the urea has dissolved.
2. Add 0.05% TEMED and 0.2% APS (from a 25% stock solution). Pour the solution into the gel mold and let it polymerize for at least 1 h.
3. Transfer the gel to the electrophoresis chamber containing 0.5× TBE. Clean the gel slots by aspiration with a syringe to prevent urea from accumulating. Pre-run the gel for 1 h at 180 V.
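The gel recipe above can be scaled to any gel volume. In this sketch the 40% acrylamide stock and 10× TBE stock are assumptions (the text does not give them), and the TEMED (0.05%) and APS (0.2%) percentages are read as final concentrations, which is one possible interpretation of step 2.

```python
# Sketch for a 17% polyacrylamide / 7 M urea / 0.5x TBE gel.
# Stock concentrations and the reading of the TEMED/APS percentages are assumptions.
UREA_G_PER_MOL = 60.06


def urea_page_recipe(total_ml: float,
                     acrylamide_stock_pct: float = 40.0,   # assumed stock
                     tbe_stock_x: float = 10.0,            # assumed stock
                     aps_stock_pct: float = 25.0) -> dict[str, float]:
    """Components for the gel; TEMED (0.05%) and APS (0.2%) are treated as final
    concentrations. Bring to total_ml with water after the urea has dissolved."""
    liters = total_ml / 1000.0
    return {
        "urea_g": 7.0 * UREA_G_PER_MOL * liters,
        "acrylamide_stock_ml": 17.0 / acrylamide_stock_pct * total_ml,
        "tbe_stock_ml": 0.5 / tbe_stock_x * total_ml,
        "temed_ul": 0.05 / 100.0 * total_ml * 1000.0,
        "aps_stock_ul": 0.2 / aps_stock_pct * total_ml * 1000.0,
    }


for name, amount in urea_page_recipe(20.0).items():
    print(f"{name}: {amount:.2f}")
```

For a 20 mL gel this gives about 8.41 g urea, 8.5 mL acrylamide stock and 1 mL TBE stock; if the APS percentage in step 2 is meant as a fraction of the stock volume instead, adjust `aps_stock_ul` accordingly.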
3.3.2.2. RNA Sample Preparation
1. For the small RNA blot, use 3–15 µg of total RNA. Prepare the sample by adding loading dye and formamide; formamide should make up 40% of the final volume (see Note 5).
2. Heat the sample at 95° C for 4 min, cool it on ice, and spin briefly.
3. Clean the gel slots and load the RNA sample slowly.
4. Run the gel at 180 V until the blue dye from the loading buffer reaches the bottom of the gel.
5. Carefully remove the gel from the apparatus and stain it for 5 min in 0.5× TBE containing 0.5 µg/mL ethidium bromide. Photograph the gel under a UV lamp.
6. Cut a piece of Nytran SuPerCharge nylon transfer membrane large enough to cover the entire surface of the gel, and cut two pieces of extra thick blotting paper to the same size.
7. Soak the membrane and the blotting paper in 0.5× TBE. Assemble the blotting paper, membrane, and gel on the semidry gel transfer device, and transfer for 1 h at 10 V (see Note 6).
8. After transfer, mark the positions of the gel slots on the membrane with a pencil. Check the gel under UV light to make sure that all the RNA has been transferred.
9. Fix the RNA to the membrane using a UV crosslinker set to "autocrosslink." Proceed to hybridization (Subheading 3.3.2.4) (see Note 7).
3.3.2.3. Probe Labeling
1. As the probe, use a DNA oligonucleotide whose sequence is the reverse complement of the small RNA of interest; in this case, to detect mature miR319: GGGAGCTCCCTTCAGTCCAA.
2. Combine 5 pmol of the oligonucleotide, 2.5 µL of 10× Optikinase buffer, 5 µL of [γ-32P]ATP (50 µCi), 16 µL of water, and 1 µL of Optikinase. Incubate for at least 30 min at 37° C (see Note 8).
3. Remove unincorporated nucleotides using a Micro BioSpin® 6 chromatography column according to the manufacturer's instructions.
4. Add 200 µL of 1× hybridization buffer, incubate at 95° C for 5 min, and cool on ice.
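Designing the probe in step 1 amounts to taking the reverse complement of the small RNA and writing it as DNA. The sketch below does this in both directions and recovers the 20-nt RNA sequence that the miR319 probe given in step 1 base-pairs with; the function names are illustrative, not from any library.

```python
def dna_probe_for(small_rna: str) -> str:
    """DNA oligonucleotide (5'->3') complementary to a small RNA (5'->3')."""
    dna = small_rna.upper().replace("U", "T")
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def target_rna_for(probe: str) -> str:
    """RNA sequence (5'->3') that a DNA probe (5'->3') base-pairs with."""
    return dna_probe_for(probe).replace("T", "U")


PROBE_MIR319 = "GGGAGCTCCCTTCAGTCCAA"   # probe sequence from step 1
print(target_rna_for(PROBE_MIR319))     # UUGGACUGAAGGGAGCUCCC
```

Running the same check against the annotated mature sequence of any other small RNA is a quick way to catch strand or orientation mistakes before the oligonucleotide is ordered.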
3.3.2.4. Membrane Hybridization
1. In a hybridization oven with rotating tubes, pre-hybridize the membrane for 1 h at 38° C in 1× hybridization buffer. Use enough buffer to cover the membrane when the tube is horizontal. Make sure that the side of the membrane carrying the RNA faces the interior of the tube, in direct contact with the hybridization buffer.
2. During the pre-hybridization, label the probe (Subheading 3.3.2.3).
3. Add the probe and incubate overnight at 38° C.
4. Pour off the hybridization solution and rinse briefly with 2× SSC, 0.2% SDS.
5. Wash with 2× SSC, 0.2% SDS for 20 min at 50° C (see Note 9).
6. Wash with 1× SSC, 0.1% SDS for 20 min at 50° C.
7. Discard the washing solution. Dry the membrane briefly and seal it in plastic wrap.
8. Place the membrane in a film cassette, add the film, and expose overnight at −80° C. After exposure, develop the X-ray film (see Note 10).
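The two wash solutions used in steps 4–6 can be prepared from concentrated stocks by simple dilution. The 20× SSC and 10% SDS stock concentrations and the 500 mL batch size in this sketch are assumptions; substitute the stocks actually available.

```python
# Sketch assuming 20x SSC and 10% SDS stocks; batch volume is hypothetical.
def wash_buffer_ml(final_ml: float, ssc_x: float, sds_pct: float,
                   ssc_stock_x: float = 20.0, sds_stock_pct: float = 10.0) -> dict[str, float]:
    """Stock and water volumes (mL) for an SSC/SDS wash, by C1 * V1 = C2 * V2."""
    ssc_ml = ssc_x / ssc_stock_x * final_ml
    sds_ml = sds_pct / sds_stock_pct * final_ml
    return {"ssc_stock": ssc_ml, "sds_stock": sds_ml, "water": final_ml - ssc_ml - sds_ml}


# Rinse/first wash (2x SSC, 0.2% SDS) and second wash (1x SSC, 0.1% SDS), 500 mL each
print(wash_buffer_ml(500, ssc_x=2, sds_pct=0.2))   # 50 mL SSC, 10 mL SDS, 440 mL water
print(wash_buffer_ml(500, ssc_x=1, sds_pct=0.1))   # 25 mL SSC, 5 mL SDS, 470 mL water
```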
4. Notes
1. In transient assays such as the one described here, no selection of transformed tissue is necessary. Although the binary vector used to illustrate this method was designed for the generation of stable transgenic plants, it also works well in transient assays. Other binary vectors, and cloning methods that do not rely on GATEWAY®, can of course be used. Hellens et al. (20) describe a set of vectors designed for agroinfiltration that can be very useful, depending on the planned analysis.
2. Alternatively, the bacterial cells can be resuspended in infiltration medium to a final OD600 of 0.5, incubated for 3 h at room temperature, and used directly for agroinfiltration (9, 10).
3. Give preference to leaves that have recently undergone rapid expansion, since transient expression levels in them are usually higher. Young leaves that have not yet expanded and older leaves appear to be less useful for this type of assay (9).
4. Other methods for total RNA extraction may be used.
5. To reduce the RNA to a volume suitable for gel loading, it can be concentrated in a vacuum centrifuge (such as an SPD SpeedVac®; ThermoSavant) before the loading dye and formamide are added. Because this approach avoids ethanol precipitation, there is less risk of losing material, and the RNA does not need to be quantified again.
6. Alternatively, the RNA can be transferred from the gel to the membrane by capillary transfer. For details, see (16).
7. The membrane can be stored at 4° C for several days before hybridization.
8. A smaller amount of radioactivity may be sufficient.
9. If the background is high, repeat this washing step. Checking the membrane with a Geiger counter after each wash can indicate whether this extra wash is needed.
10. Overnight exposure is usually ideal for detecting small RNAs whose expression is driven by the CaMV 35S promoter. However, because of variation between samples or experiments, different exposure times may be necessary. In such cases, two films can be placed in the cassette at the same time, one over the other. Develop the first film after a few hours, taking care not to shift the remaining film (attach it to the cassette with tape). If a longer exposure is needed, develop the remaining film after additional time.
Acknowledgments
We would like to thank Dr. Jia-Wei Wang for discussion and Heike Wollmann for discussion and suggestions in the preparation of this chapter. FFF is supported by DAAD. Work on small RNAs in the Weigel laboratory is supported by European Community FP6 IP SIROCCO (contract LSHG-CT-2006-037900) and by the Max Planck Society.
References
1. Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
2. Chung MH, Chen MK, Pan SM (2000) Floral spray transformation can efficiently generate Arabidopsis transgenic plants. Transgenic Res 9:471–476
3. Himber C, Dunoyer P, Moissiard G, Ritzenthaler C, Voinnet O (2003) Transitivity-dependent and -independent cell-to-cell movement of RNA silencing. EMBO J 22:4523–4533
4. Usharani KS, Periasamy M, Malathi VG (2006) Studies on the activity of a bidirectional promoter of Mungbean yellow mosaic India virus by agroinfiltration. Virus Res 119:154–162
5. Yang Y, Li R, Qi M (2000) In vivo analysis of plant promoters and transcription factors by agroinfiltration of tobacco leaves. Plant J 22:543–551
6. Baumberger N, Tsai CH, Lie M, Havecker E, Baulcombe DC (2007) The Polerovirus silencing suppressor P0 targets ARGONAUTE proteins for degradation. Curr Biol 17(18):1609–1614
7. Pazhouhandeh M, Dieterle M, Marrocco K, Lechner E, Berry B, Brault V, Hemmer O, Kretsch T, Richards KE, Genschik P, Ziegler-Graff V (2006) F-box-like domain in the polerovirus protein P0 is required for silencing suppressor function. Proc Natl Acad Sci U S A 103:1994–1999
8. Hoffmann T, Kalinowski G, Schwab W (2006) RNAi-induced silencing of gene expression in strawberry fruit (Fragaria × ananassa) by agroinfiltration: a rapid assay for gene function analysis. Plant J 48:818–826
9. Wroblewski T, Tomczak A, Michelmore R (2005) Optimization of Agrobacterium-mediated transient assays of gene expression in lettuce, tomato and Arabidopsis. Plant Biotechnol J 3:259–273
10. Llave C, Kasschau KD, Carrington JC (2000) Virus-encoded suppressor of posttranscriptional gene silencing targets a maintenance step in the silencing pathway. Proc Natl Acad Sci U S A 97:13401–13406
11. Wydro M, Kozubek E, Lehmann P (2006) Optimization of transient Agrobacterium-mediated gene expression system in leaves of Nicotiana benthamiana. Acta Biochim Pol 53:289–298
12. Tzfira T, Citovsky V (2006) Agrobacterium-mediated genetic transformation of plants: biology and biotechnology. Curr Opin Biotechnol 17:147–154
13. Gelvin SB (2003) Agrobacterium-mediated plant transformation: the biology behind the "gene-jockeying" tool. Microbiol Mol Biol Rev 67:16–37
14. Hellens RP, Edwards EA, Leyland NR, Bean S, Mullineaux PM (2000) pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol Biol 42:819–832
15. Hartley JL, Temple GF, Brasch MA (2000) DNA cloning using in vitro site-specific recombination. Genome Res 10:1788–1795
16. Sambrook J, Russell DW (eds) (2001) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
17. Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, Carrington JC, Weigel D (2003) Control of leaf morphogenesis by microRNAs. Nature 425:257–263
18. Weigel D, Glazebrook J (eds) (2002) Arabidopsis: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA
19. Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74:5463–5467
20. Hellens RP, Allan AC, Friel EN, Bolitho K, Grafton K, Templeton MD, Karunairetnam S, Gleave AP, Laing WA (2005) Transient expression vectors for functional genomics, quantification of promoter activity and RNA silencing in plants. Plant Methods 1:13
INDEX A Abiotic stress.............................................51, 164, 203–228 Agilent Bioanalyzer........................................................ 123 AGO. See Argonaute Agrobacterium......................73, 83, 184, 233, 235, 236, 256 Agroinfiltration............................... 236, 240, 256–260, 262 Alignment................................................13, 26–28, 51–54, 56, 76, 77, 85, 169, 178, 186, 188, 237, 239 Antisense...................... 3, 24, 63, 64, 92, 141, 144, 164, 208 Arabidopsis..................................... 1–14, 20, 33, 51, 59, 72, 90, 108, 131, 150, 164, 183, 204, 232, 243, 255 Argonaute (AGO)....................... 2, 163, 203, 243, 244, 248 Artificial miRNA (amiRNA)..................................... 71–87
B β elimination...........................................138–140, 143, 144 Biogenesis........................................ 1–14, 60, 104, 107, 149 Bioinformatics.............................................28, 89–105, 220 Biotic stress........................ 51, 107, 164, 183–201, 203–228 Biotin.............................................. 116, 118, 138–143, 185 BLAST....................................61, 74, 75, 85, 156, 188, 198
C Chromatin.......................................................164, 203, 243 Cleaved................................................... 3, 9, 27, 60, 84, 87, 163, 187, 203, 217, 228, 232 Cold.......................................................... 38, 41–45, 48, 49, 130, 204, 210, 221, 222, 235 Comparative analysis.......................................163–180, 198 Comparative genomics..................................................... 20 Complementation..................................................72, 73, 84 Conservation [of miRNAs]........................................ 19, 26 Co-regulated [pri-miRNAs].......................................... 199 Cyanine 3-pCp (Cy label).............................................. 118
D Database............................................. 19, 25, 53–56, 60, 61, 63, 69, 75, 76, 90–96, 99–103, 105, 115, 150, 151, 158, 159, 167, 169–171, 176, 179, 180, 188, 191, 225, 226 dCAPS......................................................................... 9, 10 DCL1 (DICER-LIKE 1)..............................3, 4, 6–10, 56, 137, 163, 187, 196, 197, 201, 203, 204, 232, 239 DCL4 (DICER-LIKE 4).......................................3, 4, 187
Deep sequencing................................................2, 3, 21, 24, 28, 29, 103, 113, 158, 164–165, 198, 204, 211 Degradation.................................................. 46, 87, 99, 164, 203, 204, 226, 260 DEPC (diethyl pyrocarbonate)..............................7, 10, 38, 47, 109, 111, 112, 122, 130, 138–140, 146 Dicer.......................................2, 59, 118, 137, 163, 203, 232 DNA extraction.......................................................6, 8, 208 DRB................................................................................... 4 Drought.................................................................. 204, 210
E einverted.................................23, 25, 26, 28, 29, 93, 94, 100 Evolution.................................................................... 1, 164 Exportin............................................................................. 4 Extraction............................................... 6, 8, 32–34, 38–41, 43–49, 73, 81, 94, 111, 119, 138, 139, 141, 150, 151, 155, 195, 199, 208, 220, 234, 236, 244–246, 248, 253, 260, 262
F Fasta..................................................... 22, 23, 53, 54, 75, 150, 155, 168–170, 177, 192 Flg22.......................................... 184, 189
454.......................................... 90, 94, 108, 114, 204, 211, 220, 225
G Gene prediction................................................................ 22 Gene silencing...........................................3, 71–87, 89, 231 Genotyping...........................................................2, 6, 7, 10
H Hairpin.......................................................3, 19–21, 23–28, 71, 78, 86, 104, 156, 163, 203 HASTY (HST).................................................................. 4 HEN1. See HUA ENHANCER1 Homologous [miRNAs]............................................. 19, 26 HST (HASTY )................................................................. 4 HUA ENHANCER1 (HEN1).............................3, 4, 138, 143–145, 196, 197, 204 HYL1. See HYPONASTIC LEAVES1 HYPONASTIC LEAVES1 (HYL1).........4, 137, 232, 239
I In situ [hybridization]............................................ 127, 130
L Library................................................ 27, 28, 33, 67, 90–94, 99, 101, 105, 114, 119, 167–176, 178, 179, 208, 210–212, 220–225, 228 LNA. See Locked Nucleic Acid Locked Nucleic Acid (LNA)............. 14, 108, 127–136, 199 Log-likelihood score........................................153, 154, 157 Low-molecular-weight (LMW) RNAs..................................... 13, 109, 111, 112, 123
M Mapping................................................... 64, 84, 90–92, 95, 99–101, 168, 185, 187, 198 Massively parallel signature sequencing (MPSS)...............90, 93–95, 108, 114, 122, 204, 211 Mechanical stress............................................................ 204 Medicago truncatula..............................39, 41–43, 165, 187 Methylation...... 3, 87, 89, 117, 138–140, 143, 144, 164, 243 Mfold......................................................104, 159, 188, 235 Microarray......................................................108, 115–119, 122, 123, 187, 192, 195, 200, 212, 226 miRbase............................21, 24, 29, 86, 116, 155, 156, 212 miRcheck............................................................. 21–26, 29 miRNA............................................1, 19, 32, 51, 59, 71, 89, 107, 127, 137, 149, 163, 183, 203, 231, 243, 256 miRNA modification............................................. 137–148 Mismatch...................... 9, 51–55, 77, 83, 85, 91, 92, 95, 97, 98, 116, 117, 167, 169–171, 177, 178, 200, 216, 226 Mutagenesis...................................... 72, 73, 77–82, 84, 235 Mutant..................................2, 4–10, 56, 83, 138, 164, 185, 187, 194, 196, 197, 201, 204, 227, 234, 239, 243
N Nicotiana benthamiana........................................... 233, 256 Northern [blotting]...................................... 2, 7, 10–13, 27, 86, 108, 122, 127, 142, 189, 204, 213, 214, 226, 232–234, 237, 239, 248, 250
O Oligonucleotide probe................................12–13, 108, 115, 116, 122, 135, 138–143, 199, 238, 250 Oryza sativa................................. 79, 91, 118, 186, 188, 204 Oxidative stress.............................................................. 204
P Parallel analysis of RNA ends (PARE)................................ 204, 208, 220–226, 228 Paralogous [loci]............................................................. 107 Patscan............................................................22, 23, 27, 56 Perl...........................................22, 23, 52, 53, 55, 61, 69, 93 Phased small RNAs.......................................................... 66 Phase ratio............................................................ 64, 66–69 Phenotype..................................8, 83, 84, 86, 196, 201, 227
Phosphate starvation.......................................200, 204, 211 Physcomitrella patens....................................................... 24 Plant RNA Isolation Reagent® (PRIR)........................................... 33, 38, 39, 41, 42 pNW55...........................................................72, 78, 79, 82 Positional Weight Matrix (PWM)................................ 150–154, 156, 157, 159 Precursor................................................. 2, 3, 19–21, 24–26, 28, 59, 60, 71, 73, 77–79, 82, 83, 85, 86, 102, 104, 108, 117, 118, 137, 155, 156, 185, 186, 188, 196, 212, 231–241, 256–258 Prediction................................... 19–29, 51–56, 72, 94, 150, 152, 154–156, 159, 167, 177, 178, 188, 216, 220 pre-miRNA................................. 3, 137, 232, 234, 238–240 pri-miRNA.....................................................3, 4, 137, 188, 192–195, 198–200, 231–233, 235, 238–240 Promoter.........................................................73, 82, 84, 87, 149–160, 185, 188, 189, 191–193, 195, 196, 199, 200, 227, 234, 251, 252, 255, 258, 263 Promoter element............................................149, 153, 155 pRS300............................................................72, 78, 79, 82 Pseudochromosome.................................................... 61, 94 Pseudomonas syringae pv. tomato strain DC3000 (Pto DC3000)............................................ 183, 184
R 5′ RACE................................................ 27, 55, 73, 84, 155, 204, 207, 217, 218, 220 RDR2..................................................................3, 164, 227 Reverse target prediction........................................ 177–179 Ribonuclease.......................................... 32, 40, 46, 110, 122 RLM (RNA ligase-mediated) 5′-RACE............................. 204, 207, 217, 218, 220 RNA adaptor................................... 109–112, 114, 118, 184 RNAfold............................................ 22, 23, 25, 26, 29, 159 RNA gel blot.......................................................... 102, 213 RNA-induced silencing complex (RISC)........................3, 27, 163, 232, 243, 250, 251 RNA ligase-mediated (RLM) 5′-RACE.....................................204, 207–208, 217 RNA-primed, array-based Klenow enzyme (RAKE)................................................. 116 RNA processing......................................................... 15, 16 RNase III.........................................................137, 231, 232 RNA size markers...................................7, 11–13, 146, 250 RNA splicing, RT-PCR..................................33, 73, 83, 87, 110, 113, 123, 144, 146, 187, 189, 193, 198, 199, 213, 226, 234, 238
S 35S....................................................................82, 234, 258 Salt........................................................... 32, 34, 38, 39, 42, 47, 128, 129, 135, 140, 143, 204, 210, 215
Sanger database.............................................................. 115 SE (SERRATE)......................................................... 4, 137 Secondary structure.................................. 19–26, 28, 29, 77, 83, 87, 103, 104, 155, 156, 159, 233, 235 Sequence-By-Synthesis (SBS)............................90–99, 101, 104, 108, 114, 164, 165, 204, 211, 220, 225 Sequencing................................................... 2, 3, 20, 21, 24, 28, 29, 65, 67, 81–83, 86, 89–105, 108, 111, 113–115, 119, 122, 123, 158, 164, 165, 168, 180, 184, 198, 204, 207, 211, 214, 218–220, 225, 226, 228, 238, 258 Site-directed mutagenesis..........................72–73, 77, 82, 84 snoRNA....................................................... 92, 94, 99, 101, 114, 117, 212 snRNA.................................................. 92, 94, 99, 114, 117 Sodium periodate............................ 138, 139, 143, 144, 147 Splinted-ligation......................................122, 207, 213, 214 Stem-loop................................................... 3, 21, 25, 28, 59, 115, 137, 212, 213, 231, 233 Streptavidin..................................... 116, 118, 138, 140–142 Stress response.......................................... 90, 107, 164, 204, 206, 211, 212, 216, 226, 227, 231 Sulfate starvation.................................................... 204, 211
T Target................................................... 2, 20, 51, 60, 71, 94, 107, 127, 137, 149, 164, 185, 203, 231, 243 Target scans,
Total RNA................................................ 10, 11, 13, 31–49, 84, 91, 111, 112, 117, 118, 122, 123, 138, 140–145, 148, 185, 187, 190–192, 210, 211, 213–215, 217, 218, 220, 233, 234, 236, 237, 239, 240, 260, 262 Trans-acting siRNA........................................3, 92, 95, 164 Transcription factor.........................................150, 158, 164 TRANSFAC.................................................................. 150 Transient assay........................................................ 255–263 Transitivity....................................................................... 59 Translation......................................... 89, 164, 203, 231, 232 Trimming........................................ 90–92, 95–99, 131, 211 TriReagent®...............................................32–34, 39, 43–44 TRIzol®...............................32, 33, 39, 43, 73, 83, 111, 220, 233, 236, 239, 240, 244, 248, 250, 252, 256, 260 tRNA........................................................ 11, 33, 92, 94, 99, 114, 117, 123, 129, 135, 211, 213, 237, 239
U Uridylation......................................................138–140, 144 UV-B.............................................................................. 204
V Virus..........................................................82, 164, 185, 201
W Western............................................................ 87, 245–248 WMD webpage................................................................ 74