Rice Functional Genomics: Challenges, Progress and Prospects
Edited by
NARAYANA M. UPADHYAYA Commonwealth Scientific and Industrial Research Organization (CSIRO) Plant Industry Canberra, ACT 2601, Australia
Library of Congress Control Number: 2006939781 ISBN-10: 0-387-48903-7 ISBN-13: 978-0-387-48903-2
e-ISBN-10: 0-387-48914-2 e-ISBN-13: 978-0-387-48914-8
Printed on acid-free paper.
© 2007 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
springer.com
Foreword
In 1991 Gurdev Khush and I edited the first book containing summaries of research results in Rice Biotechnology. In comparing that book with this one, what a difference 15 years can make! How excited we were back then with the publication of the first molecular genetic map of rice using DNA markers (120 RFLPs) and with genetic transformation of the recalcitrant cereals, which had finally been achieved using DNA uptake by, and plant regeneration from, rice protoplasts, though efficiencies were very low and some of the regenerated plants looked rather peculiar. Even then, rice was beginning to be considered a model monocot for molecular genetic research, in part because leading laboratories confirmed an earlier report from India suggesting rice had a relatively small genome. Just as important, in my opinion, rice was achieving model status because the scientists generating the knowledge base and creating enabling technologies readily shared them with numerous other scientists who then made further advances in rice molecular biology. Not only did the scientists freely provide results and materials to others, but they also offered training in their use and combined and integrated results from different laboratories to advance the science. It is clear from the chapters of this book that such a spirit of collaboration is still promoting advances in rice genomics and keeping rice at the forefront of the field. As rice yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosome (PAC), cosmid, and fosmid libraries became available, they too were shared and laid the foundations for success in map-based cloning, rice genome sequencing, and comparative mapping across species. Similarly, more than one million rice expressed sequence tags (ESTs) have been developed by several laboratories, more than for any other plant species, and shared with all researchers.
Perhaps the most significant collaboration has been the International Rice Genome Sequencing Project undertaken by laboratories in ten countries with contributions from two corporations. Despite many bumps along the way, the leaders of the project were able to keep all parties committed to generating a complete and highly accurate rice genome sequence with all data immediately placed in the public domain. Completed in 2004, this sequence solidified the model status of rice, and, as demonstrated in several chapters of this book, has become an extremely valuable resource for fundamental research in
genomics and for crop genetic improvement. Similar collaborations continue with the International Rice Functional Genomics Consortium, the Oryza Map Alignment Project, and the International Rice Information System. This book presents an excellent review of recent advances in determining the function of 30,000 to 40,000 genes of rice and in using this knowledge to identify agronomically important genes in rice, wild relatives of rice, and other cereals. It is fortuitous that rice has come to serve as the model for monocot research because rice also happens to feed half of humanity, including many of the world’s poor. We know from past experience that genetic modifications in rice development and productivity can lead to transformations in agriculture that help to feed and improve the lives of hundreds of millions of people. An international rice research system is in place that has used, and will continue to use, scientific progress and knowledge, such as that presented here, to make such genetic improvements in rice for the benefit of humankind. The authors and editors who have contributed to this book are to be commended for synthesizing our knowledge of rice functional genomics in a format that will both advance the science and facilitate such applications.
Gary H. Toenniessen
Managing Director
Interim President, Alliance for Green Revolution in Africa
The Rockefeller Foundation
New York, NY 10018-2702, USA
Preface
My continuous association with rice research dates back to 1990, when I started as a postdoctoral research fellow at CSIRO Plant Industry thanks to the generous support of the Rockefeller Foundation under its International Rice Biotechnology Program. By that time, rice had already been recognized as a model species for cereal biotechnology, not only because of its status as a staple food for resource-poor Asia with half the world’s population and the urgent need to increase rice production to meet the growing demand, but also because of well-understood rice genetics and the availability of a large number of molecular markers. Progress with transgene delivery and expression has been more rapid with rice than with any other cereal because of the efficient rice tissue culture and transformation systems developed over the years. In the mid-1990s, rice was further established as a model species for cereal genome research because of its small genome size, the ease with which it could be transformed, and its gene order and gene sequence similarities with other cereals. A consortium of publicly funded laboratories formed the International Rice Genome Sequencing Project (IRGSP) in 1997 to produce a high-quality, map-based sequence of the rice genome using the cultivar Nipponbare of Oryza sativa ssp. japonica. I was fortunate enough to continue to work on rice even after the conclusion of our Rockefeller-funded project in 1997, thanks to the support and encouragement of CSIRO Plant Industry’s then Chief Dr. Jim Peacock and Genomics Program leader Dr. Liz Dennis. We knew that with the imminent availability of the complete rice genome sequence, the challenge to the scientific community would be in identifying functions for each of the expected 25,000 to 50,000 plant genes. Along with a few other groups worldwide, we embarked on developing functional genomics tools and resources in the form of transposon insertional mutants and mutagens.
Genome-wide research tools, resources, and approaches such as data mining for structural similarities, gene expression profiling at the RNA level with expressed sequence tags (ESTs), microarray and DNA chip-based analyses, gene expression profiling at the protein level (proteomics), gene knockouts or loss of function studies with naturally occurring alleles, induced deletion mutants and insertional mutants, and gene expression
knock-down (gene silencing) studies with RNAi have all become integral parts of plant functional genomics, including that of rice. I have been in touch with these facets of Rice Functional Genomics through my involvement as a member of the International Rice Functional Genomics Consortium, a voluntary organization with a mandate to coordinate research in the post-sequencing functional genomics era by exploring ways to consolidate international rice functional genomics resources and to build common strategies to achieve our common goals. We, as a scientific community, still have a long way to go in fully understanding the key genes controlling important agronomic characters before they can be exploited by classical or transformation breeding for crop improvement. The chapters in this book focus on most of the aforementioned aspects of rice functional genomics and are authored by leading researchers in their respective fields. I am indebted to chapter coordinators, coauthors, and reviewers for their extremely valuable contributions. Sincere thanks to my colleagues at CSIRO Plant Industry, Drs. Qian-Hao Zhu, John Watson, and Andrew Eamens, for assisting me with technical editing of various chapters. My thanks to Drs. Danny Llewellyn, Peter Waterhouse, Ming-Bo Wang, Alan Richardson, Chris Helliwell, Xue-Rong Zhou, Mr Neil Smith, Miss Kerrie Ramm, and others for proofreading the chapters. I thank Springer for inviting me to edit this book, which has been a challenging and rewarding experience for me.
Narayana M. Upadhyaya
CSIRO Plant Industry
GPO Box 1600, Canberra, ACT 2601
Australia
October 19, 2006
Contents
Foreword
Preface
Contributors
1 Introduction
  Narayana M. Upadhyaya and Elizabeth S. Dennis
2 Rice Genome Sequence: The Foundation for Understanding the Genetic Systems
  Takashi Matsumoto, Rod A. Wing, Bin Han and Takuji Sasaki
  Reviewed by Satoshi Tabata
  2.1 The Importance of the Accurate Genome Sequence of Rice
  2.2 Construction of the Sequence-Ready Physical Maps
  2.3 Two-Step Strategy for Completion of Rice Genome Sequencing
  2.4 An Alternative Approach—the Whole Genome Shotgun Sequencing of Rice
    2.4.1 Whole Genome Shotgun Sequencing of japonica Rice (Syngenta)
    2.4.2 Whole Genome Shotgun Sequencing of indica Rice (BGI)
    2.4.3 Comparison of Genome Sequences Derived from Whole Genome Shotgun Sequencing and Clone-by-Clone Shotgun Sequencing (IRGSP)
  2.5 Initial Analysis of the Rice Genome
  2.6 Current Status and Future Developments
  Acknowledgments
  References
3 Rice Genome Annotation: Beginnings of Functional Genomics
  Takeshi Itoh
  Reviewed by C. Robin Buell and Baltazar A. Antonio
  3.1 Introduction
  3.2 Computational Methods of Annotation
  3.3 Automated Annotation System
  3.4 Comprehensive Genome Annotation and Curation
  3.5 From Annotations to Functional Genomics
  Acknowledgments
  References
4 Genome-Wide RNA Expression Profiling in Rice
  Shoshi Kikuchi, Guo-Liang Wang, and Lei Li
  Reviewed by Lee Tarpley and Iain Wilson
  4.1 Introduction
  4.2 Rice Transcriptome—from EST Collection to Microarray
    4.2.1 Rice EST Collection and the First cDNA Microarray System Based on the EST Clones
    4.2.2 Full-Length cDNA Project
    4.2.3 Oligoarray Systems
  4.3 Deep Transcriptome Analysis of the Rice Genome
    4.3.1 Principles of Different SAGE Techniques
    4.3.2 Development of the Robust-LongSAGE (RL-SAGE) Method
    4.3.3 Application of RL-SAGE for Defense Transcriptome Analysis in Rice
    4.3.4 MPSS for Expression Profiling
    4.3.5 Deep Transcriptome Analysis Using MPSS
  4.4 Transcriptional Analysis Using Genome Tiling Microarrays
    4.4.1 Principle of Genome Tiling Microarrays
    4.4.2 Application of Genome Tiling Microarray Analysis in Rice
  4.5 Perspective
  Acknowledgments
  References
5 Rice Proteomics: A Step Toward Functional Analysis of the Rice Genome
  Setsuko Komatsu
  Reviewed by Lee Tarpley
  5.1 Significance
  5.2 Database Based on 2D-PAGE
    5.2.1 Strategy to Determine Amino Acid Sequences for Construction of the Rice Proteome Database
    5.2.2 Format and Content of the Rice Proteome Database
    5.2.3 How to Use the Rice Proteome Database
    5.2.4 Cataloging of Proteins in the Rice Proteome Database
    5.2.5 Future Prospects of the Rice Proteome Database
  5.3 Functional Analysis Using Differential Proteomics
    5.3.1 Stresses
    5.3.2 Hormones
  5.4 Future Prospects
    5.4.1 Two-Dimensional Liquid Chromatography and Fluorescence Two-Dimensional Difference Gel Electrophoresis
    5.4.2 Identification of Protein Modification for Functional Analysis
    5.4.3 Protein-Protein Interaction Analyses for Functional Prediction
    5.4.4 Concluding Remarks
  Acknowledgment
  References
6 Metabolomics: Enabling Systems-Level Phenotyping in Rice Functional Genomics
  Lee Tarpley and Ute Roessner
  Reviewed by Tony Ashton
  6.1 Significance
  6.2 Plant Sampling and Chemical Analysis
  6.3 Case Studies in Rice Metabolomics
  6.4 Case Studies Integrating Functional Genomic Levels
  6.5 Time and Space Limitations in Integrated Functional-Genomic Analyses
  6.6 Metabolite Response to Perturbation
  6.7 Databases and Resources
  6.8 Data Analysis
  6.9 Summary
  References
7 Use of Naturally Occurring Alleles for Crop Improvement
  Anjali S. Iyer-Pascuzzi, Megan T. Sweeney, Neelamraju Sarla, and Susan R. McCouch
  Reviewed by Evans Lagudah
  7.1 Introduction
    7.1.1 Why Study Natural Variation?
  7.2 A Plant Breeder’s View on Utilizing Natural Variation
    7.2.1 Importance of Germplasm Conservation for Crop Improvement
  7.3 Understanding Evolutionary History Through Natural Variation
    7.3.1 Origins of Natural Variation: A Short History of Oryza sativa
    7.3.2 Genetic Markers: Assessing Diversity and Population Structure in O. sativa
  7.4 Natural Variation and Functional Genomics: Utilizing Germplasm to Identify Useful Alleles
    7.4.1 Genetic Markers and Their Use in Mapping
    7.4.2 Mapping Populations
    7.4.3 Association Mapping
    7.4.4 Gene Identification and Development of Perfect Markers for Applications in Breeding
  7.5 Natural Variation and Epistasis
  7.6 Natural Variation or Mutant Analysis?
  7.7 Natural Variation versus Transgenic Approaches for Crop Improvement
  7.8 Conclusions
  References
8 Chemical- and Irradiation-Induced Mutants and TILLING
  Ramesh S. Bhat, Narayana M. Upadhyaya, Abed Chaudhury, Chitra Raghavan, Fulin Qiu, Hehe Wang, Jianli Wu, Kenneth McNally, Hei Leung, Brad Till, Steven Henikoff and Luca Comai
  Reviewed by Phil Larkin
  8.1 Introduction
  8.2 Mutagens and Mutagenesis
    8.2.1 Chemical Mutagens
    8.2.2 Irradiation Mutagens
    8.2.3 Raising Mutant Populations
  8.3 Rice Mutant Stocks and Databases
    8.3.1 USA Mutant Stocks
    8.3.2 IRRI Mutant Stocks and Database
    8.3.3 China Mutant Stocks
    8.3.4 Taiwan Mutant Stock
    8.3.5 Japan Mutant Stock and Database
  8.4 Forward Genetics with Mutants
    8.4.1 Phenotyping
    8.4.2 Map-Based Cloning
    8.4.3 Detecting Genomic Changes Using Genome-Wide Chips
  8.5 Reverse Genetics with Mutants
    8.5.1 PCR Screening
    8.5.2 TILLING
  8.6 TILLING in Rice
    8.6.1 Seattle TILLING Project
    8.6.2 Other Technical Improvements in Rice TILLING
    8.6.3 TILLING Case Studies for Specific Traits
  8.7 Future Prospects
  Acknowledgments
  References
9 T-DNA Insertion Mutants as a Resource for Rice Functional Genomics
  Emmanuel Guiderdoni, Gynheung An, Su-May Yu, Yue-ie Hsing and Changyin Wu
  Reviewed by Alain Lecharny and Michel Delseny
  9.1 Introduction
  9.2 Agrobacterium-Mediated Transformation of Rice
  9.3 T-DNA as an Insertional Mutagen
  9.4 Rice T-DNA Insertional Mutant Populations
    9.4.1 Korea
    9.4.2 China
    9.4.3 France
    9.4.4 Taiwan
    9.4.5 Current Collection of T-DNA Insertion Lines and FSTs
  9.5 Current Knowledge on T-DNA Integration in Rice
  9.6 T-DNA Insertion Specificity in Rice
    9.6.1 Preference Among and Along Rice Chromosomes
    9.6.2 Preference for Integration into Intergenic versus Genic Regions and Regulatory versus Coding Regions
    9.6.3 Preference for Insertion in Expressed Genes
    9.6.4 Preference for GC Content and DNA Structure
    9.6.5 Preference for Functional Category of Genes
    9.6.6 Estimation of the Number of Lines Required to Saturate the Rice Genome
  9.7 Gene and Enhancer Trapping with T-DNA in Rice
  9.8 Forward Genetics Screens and Gene Isolation Using T-DNA Insertion Lines
    9.8.1 Gene Trapping
    9.8.2 Activation Tagging
  9.9 Reverse Genetics with T-DNA Mutants in Rice
  9.10 Conclusion and Prospects
  Acknowledgments
  References
10 Transposon Insertional Mutants: A Resource for Rice Functional Genomics
  Qian-Hao Zhu, Moo Young Eun, Chang-deok Han, Chellian Santhosh Kumar, Andy Pereira, Srinivasan Ramachandran, Venkatesan Sundaresan, Andrew L. Eamens, Narayana M. Upadhyaya and Ray Wu
  Reviewed by Tony Pryor and John M. Watson
  10.1 Introduction
  10.2 Transposon Tagging Systems
    10.2.1 Activity of Transposons in Rice
    10.2.2 One-Element System versus Two-Element System
    10.2.3 Design of Constructs
    10.2.4 Gene and Enhancer Traps
    10.2.5 Transiently Expressed Transposase System
    10.2.6 A High-Throughput System to Index Transposants
    10.2.7 Using Endogenous Transposons
    10.2.8 Inducible Transposition
  10.3 Mutagenesis Strategies
    10.3.1 Random or Non-targeted Mutagenesis
    10.3.2 Localized or Targeted Mutagenesis
  10.4 Transposon Insertional Mutant Populations
    10.4.1 CSIRO Plant Industry Population
    10.4.2 EU (Wageningen) Population
    10.4.3 National University of Singapore Population
    10.4.4 Korea Population
    10.4.5 UC Davis Population
  10.5 Gene Discovery by Transposon Tagging
    10.5.1 Forward and Reverse Genetics Strategies
    10.5.2 Other Approaches for Mutation Identification
    10.5.3 Tagging Efficiency
    10.5.4 Confirmation of Tagged Gene
  10.6 Future Prospects
  References
11 Gene Targeting by Homologous Recombination for Rice Functional Genomics
  Shigeru Iida, Yasuyo Johzuka-Hisatomi, and Rie Terada
  Reviewed by Barbara Hohn and Charles White
  11.1 Introduction
  11.2 Gene Targeting by Homologous Recombination
    11.2.1 Gene-Specific Selection and Gene-Specific Screening
    11.2.2 Strong Positive-Negative Selection for Enriching Targeted Homologous Recombinants
  11.3 Potential Approaches for Homologous Recombination-Dependent Gene Targeting
  11.4 Concluding Remarks
  Acknowledgments
  References
12 RNA Silencing and Its Application in Functional Genomics
  Shaun J. Curtin, Ming-Bo Wang, John M. Watson, Paul Roffey, Chris L. Blanchard, and Peter M. Waterhouse
  Reviewed by Werner Aufsatz
  12.1 Introduction
  12.2 Discovery of RNA Silencing
  12.3 RNA Silencing Pathways
    12.3.1 MicroRNA and Trans-Acting siRNA Pathways
    12.3.2 Repeat-Associated Small Interfering RNA and RNA-Directed DNA Methylation
  12.4 Proteins Involved in RNA Silencing Pathways
    12.4.1 The Dicer-Like Proteins
    12.4.2 Hua Enhancer 1
    12.4.3 The Double-Stranded RNA-Binding Protein Family
    12.4.4 The Argonaute Protein Family
    12.4.5 RNA-Dependent RNA Polymerase (RdRP)
    12.4.6 DNA Methyltransferases
  12.5 RNA Silencing and Anti-Viral Defense
  12.6 Gene Silencing Platforms in Plants
    12.6.1 Delivery by Transgenes
    12.6.2 Transient Delivery by Viral Vectors — Virus-Induced Gene Silencing
    12.6.3 Transient Delivery by Agrobacterium Infection and Biolistics
  12.7 Future Prospects of Gene Silencing Technology in Plants
  References
13 Activation Tagging Systems in Rice
  Alexander A.T. Johnson, Su-May Yu, and Mark Tester
  Reviewed by Michael Ayliffe and Venkatesan Sundaresan
  13.1 Introduction
  13.2 Classical Activation Tagging: Enhancer Element-Mediated Gene Activation
    13.2.1 Classical Activation Tagging in Plants
    13.2.2 Structure and Function of the CaMV 35S Activation Tagging System
    13.2.3 Variations to the CaMV 35S Activation Tagging System
    13.2.4 CaMV 35S Activation Tagging Resources in Rice
  13.3 Transactivation Tagging: Transcriptional Activator-Mediated Gene Activation in Specific Cell Types
    13.3.1 Gene Expression at the Cell Type–Specific Level
    13.3.2 Origin of the GAL4 Enhancer Trapping System
    13.3.3 GAL4 Enhancer Trapping in Plants
    13.3.4 Cell Type–Specific Activation of Target Genes Using GAL4 Transactivation
    13.3.5 Cell Type–Specific Activation Tagging Using GAL4 Transactivation
  13.4 Future Perspectives
  Acknowledgments
  References
14 Informatics Resources for Rice Functional Genomics
Baltazar A. Antonio, C. Robin Buell, Yukiko Yamazaki, Immanuel Yap, Christophe Perin, and Richard Bruskiewich
Reviewed by Wm L. Crosby and Richard Cooke
14.1 Introduction
14.2 NIAS Informatics Resources
14.2.1 INtegrated Rice Genome Explorer
14.2.2 RGP Annotation Databases
14.2.3 KOME
14.2.4 Rice PIPELINE
14.3 TIGR Informatics Resources
14.4 Oryzabase
14.4.1 Database Contents
14.4.2 Genetic Stocks
14.4.3 Comparative Genomics Resources
14.5 Gramene
14.5.1 Genome Browser
14.5.2 Maps and Markers
14.5.3 QTL, Genes, and Proteins
14.5.4 Ontology
14.5.5 Database Availability
14.6 CIRAD Informatics Resources
14.6.1 OryGenesDB
14.6.2 Oryza Tag Line
14.6.3 Greenphyl
14.7 IRRI Informatics Resources
14.7.1 The International Rice Information System
14.7.2 Current Developments
14.8 Insertion Mutant Databases
14.8.1 Tos17 Insertion Mutant Database
14.8.2 Rice Mutant Database
14.8.3 Rice Ds Tagging Lines
14.8.4 Taiwan Rice Insertional Mutants Database
14.8.5 Shanghai T-DNA Insertion Population
14.8.6 Rice T-DNA Insertion Sequence Database
14.8.7 Rice FST Database at UC Davis
14.8.8 CSIRO Rice FST Database and RGMIMS
14.8.9 RiceGE: Rice Functional Genomic Browser
14.9 Integration of Rice Functional Genomics Information
14.9.1 High-Speed Networks
14.9.2 Grid Computing
14.9.3 Web Integration
14.10 Rice Functional Genomics Network
Acknowledgments
References
15 The Oryza Map Alignment Project (OMAP): A New Resource for Comparative Genome Studies within Oryza
Rod A. Wing, Hye-Ran Kim, Jose Luis Goicoechea, Yeisoo Yu, Dave Kudrna, Andrea Zuccolo, Jetty Siva S. Ammiraju, Meizhong Luo, Will Nelson, Jianxin Ma, Phillip SanMiguel, Bonnie Hurwitz, Doreen Ware, Darshan Brar, David Mackill, Cari Soderlund, Lincoln Stein and Scott Jackson
Reviewed by John M. Watson and Evans Lagudah
15.1 Introduction
15.2 Development of the OMAP BAC Library Resource
15.3 Development of Wild Species FPC/STC Physical Maps
15.3.1 BAC End Sequencing
15.3.2 BAC Fingerprinting
15.3.3 Analysis of Structural Variation Between O. sativa and the 3 AA Genome OMAP Accessions
15.4 Summary, Conclusions, and Future Research
References
16 Application of Functional Genomics Tools for Crop Improvement
Motoyuki Ashikari, Makoto Matsuoka and Masahiro Yano
Reviewed by Elizabeth S. Dennis
16.1 Rice Genomics
16.2 Molecular Markers for Improved Breeding Efficiency
16.3 QTL Analysis
16.3.1 Genetic and Molecular Dissection of QTLs
16.3.2 QTL Application in Breeding
16.3.4 QTL Pyramiding for Breeding
16.3.5 QTL Detection Using Chromosome Segment Substitution Lines
16.4 Use of Wild Species as a Source of Diversity for Breeding
16.5 Molecular Breeding
16.6 Outlook
References
17 From Rice to Other Cereals: Comparative Genomics
Richard Cooke, Benoit Piégu, Olivier Panaud, Romain Guyot, Jérome Salse, Catherine Feuillet and Michel Delseny
Reviewed by Robert Henry and Elizabeth S. Dennis
17.1 Introduction
17.2 Origin and Evolution of Cereals
17.3 Use of Comparative Genomics to Improve Genome Sequence Annotation
17.4 Comparative Genomics and Conserved Noncoding Sequences: The Discovery of New Genes and New Signals
17.5 Comparative Phylogeny of Multigene Families
17.6 Revised “Circle Diagram” Model and Synteny Disruption
17.7 The Rice Genome as a Model for Map-Based Cloning in Cereals
17.8 Comparative QTL Mapping and Meta-Analysis of QTL
17.9 Comparative Expression Profiling
17.10 Comparative Biology in the Era of Genomics
17.11 Genome Sequencing in Grasses: Beyond the Model
Acknowledgments
References
Index
Contributors
Jetty Siva S. Ammiraju, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Gynheung An, Department of Life Science and National Research Laboratory of Plant Functional Genomics, Pohang University of Science and Technology, Hyoja-dong, Nam-gu, Pohang, Kyungbuk 790-784 Republic of Korea. E-mail: [email protected]
Baltazar A. Antonio, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
Motoyuki Ashikari, Laboratory of Plant Bioresource, Development and Applied Division, Bioscience and Biotechnology Center, Nagoya University, Furocho, Chikusa-ku, Nagoya-shi, Aichi 464-8601 Japan. E-mail: [email protected]
Tony Ashton*, Genetic Engineering for Crop Improvement Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Werner Aufsatz*, Gregor Mendel-Institut, GMI GmbH, Wien/Vienna A-1030 Austria. E-mail: [email protected]
Michael Ayliffe*, Genetic Engineering for Crop Improvement Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Ramesh S. Bhat, University of Agricultural Sciences, Dharwad, Karnataka-580 005 India. E-mail: [email protected]
Chris L. Blanchard, School of Wine and Food Sciences, Charles Sturt University, Wagga Wagga, NSW 2678 Australia. E-mail: [email protected]
Darshan Brar, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Richard Bruskiewich, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
C. Robin Buell, The Institute for Genomic Research (TIGR), Rockville, MD 20850 USA. E-mail: [email protected]
Abed Chaudhury, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Luca Comai, The University of California Davis Genome Center, Davis, CA 95616 USA. E-mail: [email protected]
Richard Cooke, Laboratoire Génome et Développement des Plantes, UMR5096 Centre National de la Recherche Scientifique, University of Perpignan, Perpignan Cédex 66860 France. E-mail: [email protected]
Wm L. Crosby*, Department of Biological Sciences, University of Windsor, Windsor, ON N9B 3P4 Canada. E-mail: [email protected]
Shaun J. Curtin, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Michel Delseny, Laboratoire Genome et Développement des Plantes, UMR 5096 Centre National de la Recherche Scientifique, University of Perpignan, Perpignan, Cédex 66860 France. E-mail: [email protected]
Elizabeth S. Dennis, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Andrew L. Eamens, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Moo Young Eun, Rice Functional Genomics and Molecular Breeding Lab, Cell and Genetics Division, National Institute of Agricultural Biotechnology (NIAB), Rural Development Administration, Suwon, 441-707 Republic of Korea. E-mail: [email protected]
Catherine Feuillet, UMR Amélioration et Santé des Plantes, INRA-UBP, 63100 Clermont Ferrand France. E-mail: catherine.[email protected]
Jose Luis Goicoechea, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Emmanuel Guiderdoni, AMIS Department, UMR PIA 1096, CIRAD, Montpellier, Hérault F-34398 France. E-mail: [email protected]
Romain Guyot, Laboratoire Genome et Developpement des Plantes, UMR 5096 CNRS-IRD-UP, CNRS-IRD-Université de Perpignan, 34394 Montpellier cedex 5 France. E-mail: romain.[email protected]
Bin Han, National Center for Gene Research, Chinese Academy of Sciences, Shanghai, 200233 China. E-mail: [email protected]
Chang-deok Han, Division of Applied Life Science, BK21 Program, Plant Molecular Biology and Biotechnology Research Center, Gyeongsang National University, Jinju, 660-701 Republic of Korea. E-mail: [email protected]
Steven Henikoff, Seattle TILLING Project, Department of Biology and FHCRC, University of Washington, Seattle, WA 98195 USA. E-mail: steveh@fhcrc.org
Robert Henry*, Southern Cross University, Lismore, NSW 2480 Australia. E-mail: [email protected]
Barbara Hohn*, Friedrich Miescher-Institut, Basel Switzerland. E-mail: [email protected]
Yue-ie Hsing, Institute of Botany, Academia Sinica, Nankang, Taipei 11529 Taiwan. E-mail: [email protected]
Bonnie Hurwitz, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA. E-mail: [email protected]
Shigeru Iida, National Institute for Basic Biology, Myodaiji, Okazaki, Aichi 444-8585 Japan. E-mail: [email protected]
Takeshi Itoh, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
Anjali S. Iyer-Pascuzzi, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853 USA. E-mail: [email protected]
Scott Jackson, Department of Agronomy, Purdue University, West Lafayette, IN 47907 USA. E-mail: [email protected]
Alexander A.T. Johnson, Australian Centre for Plant Functional Genomics, PMB 1, Glen Osmond, South Australia 5064 Australia. E-mail: [email protected]
Yasuyo Johzuka-Hisatomi, National Institute for Basic Biology, Myodaiji, Okazaki, Aichi 444-8585 Japan. E-mail: [email protected]
Shoshi Kikuchi, Laboratory of Gene Expression, Department of Genetics, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
HyeRan Kim, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Setsuko Komatsu, Laboratory of Gene Regulation, Department of Molecular Genetics, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
Dave Kudrna, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Chellian Santhosh Kumar, Department of Plant Sciences, Life Sciences Addition 1002, University of California–Davis, Davis, CA 95616 USA. E-mail: [email protected]
Evans Lagudah*, Genetic Engineering for Crop Improvement Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Phil Larkin*, Genetic Engineering for Crop Improvement Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Alain Lecharny*, Bioinformatics Group, Institut National de la Recherche Agronomique (INRA)/CNRS-URGV, Evry cedex CP5708, 91057 France. E-mail: [email protected]
Hei Leung, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Lei Li, Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520 USA. E-mail: [email protected]
Meizhong Luo, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Jianxin Ma, Department of Agronomy, Purdue University, West Lafayette, IN 47907 USA. E-mail: [email protected]
David Mackill, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Makoto Matsuoka, Laboratory of Plant Molecular Breeding, Development and Applied Division, Bioscience and Biotechnology Center, Nagoya University, Furocho, Chikusa-ku, Nagoya-shi, Aichi 464-8601 Japan. E-mail: [email protected]
Takashi Matsumoto, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
Susan R. McCouch, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853 USA. E-mail: [email protected]
Kenneth McNally, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Will Nelson, Arizona Genomics Computational Laboratory, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Olivier Panaud, Laboratoire Genome et Developpement des Plantes, UMR 5096 CNRS-IRD-UP, University of Perpignan, Perpignan FR-66860 France. E-mail: [email protected]
Andy Pereira, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 USA. E-mail: [email protected]
Christophe Perin, AMIS Department, UMR PIA 1096, CIRAD, Montpellier, Hérault F-34398 France. E-mail: [email protected]
Benoit Piégu, Laboratoire Genome et Developpement des Plantes, UMR 5096 CNRS-IRD-UP, University of Perpignan, Perpignan 66860 France. E-mail: [email protected]
Tony Pryor*, Genetic Engineering for Crop Improvement Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Fulin Qiu, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Chitra Raghavan, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Srinivasan Ramachandran, Rice Functional Genomics Group, Temasek Lifesciences Laboratory, 1 Research Link, National University of Singapore, Singapore 117 604. E-mail: [email protected]
Ute Roessner, Australian Centre for Plant Functional Genomics, School of Botany, The University of Melbourne, Parkville, Victoria 3010 Australia. E-mail: [email protected]
Paul Roffey, School of Wine and Food Sciences, Charles Sturt University, Wagga Wagga, NSW 2678 Australia. E-mail: [email protected]
Jérome Salse, Institut National de la Recherche Agronomique (INRA), UMR ASP, Clermont-Ferrand 66860 France. E-mail: [email protected]
Phillip SanMiguel, Department of Agronomy and Genomics Core Facility, Purdue University, West Lafayette, IN 47907 USA. E-mail: [email protected]
Neelamraju Sarala, Directorate of Rice Research, Rajendranagar, Hyderabad, AP 500 030 India. E-mail: [email protected]
Takuji Sasaki, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
Cari Soderlund, Arizona Genomics Computational Laboratory, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Lincoln Stein, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA. E-mail: [email protected]
Venkatesan Sundaresan, Department of Plant Sciences, Life Sciences Addition 1002, University of California–Davis, Davis, CA 95616 USA. E-mail: [email protected]
Megan T. Sweeney, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853 USA. E-mail: [email protected]
Satoshi Tabata*, The Department of Plant Gene Research, Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818 Japan. E-mail: [email protected]
Lee Tarpley, Texas A&M Agricultural Research and Extension Center, Beaumont, TX 77713 USA. E-mail: [email protected]
Rie Terada, National Institute for Basic Biology, Myodaiji, Okazaki, Aichi 444-8585 Japan. E-mail: [email protected]
Mark Tester, Australian Centre for Plant Functional Genomics, PMB1, Glen Osmond, South Australia 5064 Australia. E-mail: [email protected]
Brad Till, Seattle TILLING Project, Department of Biology and FHCRC, University of Washington, Seattle, WA 98195 USA. E-mail: [email protected]
Narayana M. Upadhyaya, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Guo-Liang Wang, Department of Plant Pathology, Ohio State University, Columbus, OH 43210 USA. E-mail: [email protected]
Hehe Wang, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Ming-Bo Wang, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Doreen Ware, USDA-ARS, North Atlantic Area (NAA) Plant, Soil & Nutrition Laboratory Research Unit, Ithaca, NY 14853 USA. E-mail: [email protected]
Peter M. Waterhouse, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
John M. Watson, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Charles White*, CNRS UMR6547, Université Blaise Pascal, Aubière 63177 France. E-mail: [email protected]
Iain Wilson*, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Rod A. Wing, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Changyin Wu, National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070 China. E-mail: [email protected]
Jianli Wu, International Rice Research Institute, Los Baños, Metro Manila Philippines. E-mail: [email protected]
Ray Wu, Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853 USA. E-mail: [email protected]
Yukiko Yamazaki, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540 Japan. E-mail: [email protected]
Masahiro Yano, Applied Genomics Laboratory, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602 Japan. E-mail: [email protected]
Immanuel Yap, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853 USA. E-mail: [email protected]
Su-May Yu, Institute of Molecular Biology, Academia Sinica, Nankang, Taipei 11529 Taiwan. E-mail: [email protected]
Yeisoo Yu, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
Qian-Hao Zhu, Genomics and Plant Development Program, CSIRO Plant Industry, Canberra, ACT 2601 Australia. E-mail: [email protected]
Andrea Zuccolo, Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721 USA. E-mail: [email protected]
* Contributed as reviewers of one or more chapters
1 Introduction
Narayana M. Upadhyaya and Elizabeth S. Dennis CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601, Australia
The availability of the sequences of the rice (Oryza) and Arabidopsis genomes, the model species for monocot and dicot plants, respectively, allows plant science to enter a new era of plant functional genomics. The emphasis is now on identifying functions for each of the 25,000–50,000 genes predicted to be encoded in plant genomes. Plant functional genomics is now a major driving force in scientific research and a great challenge to the scientific community. Genome-wide research tools such as data mining for structural similarities; expression profiling at the RNA level with expressed sequence tags (ESTs), oligonucleotide, or cDNA chips; expression profiling at the protein level (proteomics); gene knockout and loss-of-function studies with naturally occurring alleles and induced deletion and insertional mutants; and gene expression knockdown (gene silencing) studies with RNA interference (RNAi) have become integral to plant functional genomics.
The scientific community has chosen rice as the model cereal for functional genomics not only because it is a major worldwide food crop but also because of its small genome (~430 Mb, which is the smallest among cereal genomes), the ease with which it can be transformed, and its well-studied genetics together with the availability of detailed physical maps and large numbers of molecular markers. Because of the similarities in sequence, structure, order, and function of genes among all the cereals and grasses, genes identified in rice as being important agronomically will also be important in other cereals. Any understanding of rice gene function is directly applicable to the genes of other cereals.
With the availability of near-complete genome sequence data for both japonica and indica rice, the most straightforward way of predicting a likely function of a rice gene sequence is by comparison with sequence databases from other organisms, as functionally similar genes normally have sequence similarities at both the protein and DNA levels. Supercomputers and robust
bioinformatics capabilities are being developed to increase the precision with which sequences can be compared. Several laboratories have embarked on rice sequence annotation using this approach. Such computational gene predictions suggest that approximately 50% of the more than 40,000 predicted rice genes could show sequence similarities to previously described genes with known or predicted functions. Working models for more than 40,000 rice genes have been built by combining available EST data from rice and other plant species. More than 30,000 full-length rice cDNA sequences are now available. However, many of the gene sequences predicted computationally are yet to be confirmed experimentally.
Genome-wide expression profiling of rice genes is being facilitated by high-throughput techniques such as microarrays and massively parallel signature sequencing (MPSS). Spatial and temporal RNA expression patterns provide insight into the cellular and developmental functions of the genes concerned. The total protein complement expressed by the genome, termed the “proteome,” can be visualized via two-dimensional polyacrylamide gel electrophoresis (PAGE) to study the abundance and posttranslational modification of several hundred proteins at once. Similarly, metabolomics, a comprehensive analysis of low molecular weight compounds in biological samples, is a major phenotyping approach that can assist in the identification of novel gene functions.
Mutational approaches are being used to unravel the genetic and molecular bases of traits. Naturally occurring allelic variations and variations induced by chemical or radiation mutagenesis are being used in functional genomics. Isolation of the mutated genes is achieved via positional cloning strategies, that is, cloning the gene based on its position on the genetic map. This strategy requires dense genetic maps with many visible (phenotypic) and molecular markers.
The availability of physical maps consisting of a collection of overlapping DNA fragments cloned in yeast or bacterial artificial chromosomes, and subsequently of complete genomic sequences, greatly accelerates positional cloning. However, the limiting factors are the time and effort required for constituting the mapping population and fine mapping of mutant loci. The recently developed targeting induced local lesions in genomes (TILLING) strategy allows high-throughput screening for point mutations produced by traditional chemical mutagens in a particular gene.
Insertional mutagenesis provides a more rapid and direct way to clone a mutated gene. As the sequence of the inserted element is known, the gene into which it is inserted can be recovered easily by means of various cloning and polymerase chain reaction (PCR)-based strategies. Populations with transferred DNA (T-DNA) or transposable element-induced loss-of-function or knockout mutations are useful for identifying gene function. With a population saturated with insertions, that is, having at least one insertion in
each gene, it is possible to apply both forward genetics and reverse genetics approaches to identify gene function. In the forward genetics approach, a mutant with a phenotype is first identified by screening the population; sequences flanking the insertion are then cloned and compared with database sequences, enabling the assignment of a function to the mutated gene. In the reverse genetics approach, one starts with a computer-predicted gene from the genome sequence and searches for an insertion mutant in that gene. Oligonucleotide primers from the insertional element and from the gene of interest are used for PCR amplification. Appropriately pooled DNA samples are used for high-throughput screening for this rare event in the population. Once a mutation in the appropriate gene has been identified, homozygotes are isolated and the phenotype checked.
The rationale for activation tagging is that increased gene expression can reveal mutant phenotypes for essential and redundant genes, which are either absent from knockout collections or show no phenotype in them. This gain-of-function approach produces dominant mutations affecting the transcriptional control of genes without altering the functional gene product. The classic approach to activation tagging has been random insertion of a CaMV 35S enhancer element in the rice genome, often resulting in overexpression of native genes in all cell types of the plant. However, the development of extensive GAL4 enhancer trapping resources in rice now enables targeting of transgene expression to specific cell types, a two-step process known as transactivation. In the first step, a large number of GAL4 enhancer trapping driver lines are generated and patterns of reporter gene expression are characterized. Responder lines are then produced with transgenes of interest cloned downstream of the UAS element to which GAL4 binds.
Crosses between driver and responder lines result in transactivation of the target genes by GAL4 and show the specific expression profile of each individual driver. In addition, random deployment of the UAS element into the rice genome, followed by crosses to specific driver lines, should make it possible to carry out activation tagging in specific cell types of the plant. Cell type–specific activation tagging has the potential to uncover novel mutations that are missed or “averaged out” by the classic activation tagging technique.
There is an urgent need for international collaboration in building tools and resources, especially for assembling a set of lines with mutations in all the predicted 40,000 genes together with integrated databases containing all the relevant information about each gene. Precise phenotyping of mutants with common descriptors of characters between laboratories is important in assigning gene function. Phenotyping can be performed effectively through collaborations between laboratories with complementary expertise. Toward this goal, an International Rice Functional
Genomics Consortium (IRFGC) has been formed that should provide a much-needed common platform for information and resource sharing for public-good research.
This book covers the current status of genome sequencing and annotation and the various tools and resources being developed worldwide: ESTs, full-length cDNA, gene expression profiles, chemical- and radiation-induced mutants, TILLING resources, insertional knockout mutants (T-DNA, transposon, and retrotransposon) and activation tags, naturally occurring alleles, the Oryza Map Alignment Project (OMAP), gene targeting by homologous recombination, and gene silencing by RNAi. Various bioinformatics tools and resources pertinent to rice functional genomics are also described and discussed in one chapter.
The outputs of the rice functional genomics efforts will be applied through the use of naturally occurring, agronomically useful alleles of rice or other Oryza genomes. Molecular markers will allow the rapid integration of these alleles into breeding programs. Following the association of a phenotype with a gene, the level or pattern of expression of that gene can be altered to give the desired effect. This can be done by looking for mutants in the gene (e.g., by TILLING) or by using transgenic methods, either to reduce gene expression via RNAi technology or to increase it via overexpression. The specifically altered lines can then be incorporated into breeding programs.
Finally, although rice is an important food crop in its own right, it is also a model for other cereals. Discoveries in rice can be applied to other cereals such as maize, wheat, and barley. With sequencing projects commencing in maize and wheat, the functional genomics findings in rice will assist gene selection and breeding in other cereals using the power of comparative genomics, as described in the chapter on comparative genomics. This book covers the whole spectrum of rice functional genomics from the sequence to the field.
We hope that scientists at all stages of the continuum find it useful as we span the divide between molecular biology and plant improvement.
2 Rice Genome Sequence: The Foundation for Understanding the Genetic Systems
Takashi Matsumoto¹, Rod A. Wing², Bin Han³, Takuji Sasaki¹

¹National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan; ²Department of Plant Sciences, The University of Arizona, Tucson, AZ 85721, USA; ³National Center for Gene Research, Chinese Academy of Sciences, 500 Caobao Rd., 200233 Shanghai, China

Reviewed by Satoshi Tabata
2.1 The Importance of the Accurate Genome Sequence of Rice
2.2 Construction of the Sequence-Ready Physical Maps
2.3 Two-Step Strategy for Completion of Rice Genome Sequencing
2.4 An Alternative Approach—the Whole Genome Shotgun Sequencing of Rice
2.4.1 Whole Genome Shotgun Sequencing of japonica Rice (Syngenta)
2.4.2 Whole Genome Shotgun Sequencing of indica Rice (BGI)
2.4.3 Comparison of Genome Sequences Derived from Whole Genome Shotgun Sequencing and Clone-by-Clone Shotgun Sequencing (IRGSP)
2.5 Initial Analysis of the Rice Genome
2.6 Current Status and Future Developments
Acknowledgments
References
2.1 The Importance of the Accurate Genome Sequence of Rice

Progress in DNA sequencing technology has produced a tremendous increase in the number of nucleotide sequences from diverse organisms in a relatively short period of time. The collections of DNA and RNA sequences submitted to public databases such as GenBank, DDBJ, and EMBL recently reached 100 gigabases (NLM 2005) from 165,000 organisms. As sequencing advances, it is important to evaluate the accuracy and quality of the
sequence data. Positional accuracy means that a sequence is mapped to the correct position on the genome. Sequence accuracy means that each nucleotide is determined correctly. The first two sequencing technologies were the Maxam-Gilbert method (Gilbert and Maxam 1973) and the Sanger dideoxy chain-terminator method (Sanger et al. 1977b). The Sanger method is widely used today because it is compatible with autosequencers that use fluorescent-labeled nucleotide analogs instead of radiolabeled chemicals (Smith et al. 1986). The recent development of capillary sequencers that can simultaneously run 96 or 384 samples in 2 to 3 hours allows extensive parallel analysis of nucleotide sequences. Improvements in liquid-handling robots and computer-aided data analysis technologies allow genome-wide sequencing in a reasonable time. The first “genome” sequence was that of a bacteriophage (Sanger et al. 1977a), followed by that of a bacterium (Fleischmann et al. 1995); genome sequencing was thereafter applied to other organisms with larger genomes.

Two major strategies have been devised for genome sequencing. In the hierarchical shotgun strategy, detailed, sequence-ready physical maps are constructed from genomic clones. Each clone, such as a P1-derived artificial chromosome (PAC), bacterial artificial chromosome (BAC), cosmid, or fosmid clone, is subcloned using partially digested DNA, and the subclones are sequenced (shotgun sequencing). The sequences are then assembled by sequence assembly software to form a contiguous sequence (contig) that virtually represents the original clone sequence. Finally, the clone sequences are connected according to their order in the physical maps to form the genome sequence. This strategy usually gives long, accurate sequences, although it is expensive in terms of time, money, and labor.

The alternative strategy, the whole genome shotgun (WGS) method (Venter et al. 1996), assembles the many short shotgun sequences derived from the whole genome to reconstruct the overall structure. The method is simple and straightforward, and is compatible with high-throughput sequencing equipment. The WGS method can supply genome-wide sequences, mostly from “gene-rich” regions, in a relatively short period of time. However, it usually gives many unconnected contigs. Moreover, there is a significant chance of misassembly in repeat-rich sequences.

The choice of genome sequencing strategy depends on the need. For the genome of a “model” organism that will become a key to the understanding of related species, one should aim for very high positional and sequence accuracy so that it can serve as a reliable “reference” genome for subsequent comparisons with many other related organisms. On the other hand, analysis of an organism for a special purpose, such as investigating genes involved in organism-specific metabolic pathways, requires only the genes involved in those pathways. Once the “reference”
genome is available, genomes of related organisms can be analyzed via the WGS method.

Rice is one of the most important crops worldwide, as it is the staple food for half the world’s population. More than 2 billion people in Asia obtain the majority of their calories and protein from rice. As the world population continues to grow and the struggle to keep up the food supply intensifies, improving rice production is a pressing matter in the early 21st century. This makes rice the most economically and politically important crop.

Rice is also the key plant for understanding the genus Oryza, the grass family (Gramineae), and the monocotyledons. Oryza is estimated to have originated 50 million years ago (Gaut 2002) and is represented by 23 species (Vaughan et al. 2003). Gramineae has approximately 10,000 species (Royal Botanic Gardens, Kew, http://www.rbgkew.org.uk/) and is the most ecologically and economically important of all the plant families. Colinearity of gene order (synteny) occurs across the grass family, and many genes are mapped via this syntenic relationship. Rice is therefore regarded as a “reference” crop that should be sequenced with as high an accuracy as possible. Accurate rice sequence information would be useful not only for the isolation of rice genes and for rice breeding, but also for the molecular breeding of other important crops such as maize, barley, sorghum, and wheat. Researchers also recognize that revealing the rice genome drives the basic science of monocots, which cannot be fully understood from knowledge of Arabidopsis and other dicots (Leach et al. 2002).
2.2 Construction of the Sequence-Ready Physical Maps

In the hierarchical shotgun strategy, or “clone-by-clone methodology,” large genomic DNA is digested into intermediate-sized fragments (40 to 150 kb), which are cloned into E. coli cells to make genomic libraries. The libraries need to have enough redundancy in terms of both genome coverage and digestion sites. In constructing the International Rice Genome Sequencing Project (IRGSP) physical maps of Nipponbare, one PAC and three BAC libraries consisting of approximately 210,000 clones were first constructed (Baba et al. 2000; Mao et al. 2000). The libraries appeared to cover the whole rice genome because they have 57.4× redundancy and were made with a sufficient variety of restriction sites. Moreover, Monsanto donated 3,416 BAC clones from its physical maps, together with the draft sequences, to accelerate international attempts to complete the rice genome (Barry 2001). However, it was later revealed that the clones were still missing some parts of the genome, leaving gaps in the physical maps.
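The redundancy of a clone library follows from simple arithmetic, sketched below in Python. The mean insert size of 110 kb is an illustrative assumption (the text gives only a 40 to 150 kb range), the genome size is the 388.82 Mb estimate reported later in this chapter, and the function names are hypothetical. The second function assumes that clones are placed uniformly at random, which restriction-digested libraries do not satisfy; this is one reason a library can show high nominal redundancy and still leave gaps.

```python
def library_redundancy(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Fold redundancy: total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp


def prob_locus_missed(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Chance that one locus lies in none of n_clones uniformly random clones."""
    return (1.0 - mean_insert_bp / genome_bp) ** n_clones


if __name__ == "__main__":
    N_CLONES = 210_000        # clone count given in the text
    GENOME_BP = 388.82e6      # genome size estimate given later in the chapter
    MEAN_INSERT_BP = 110_000  # ASSUMPTION: illustrative value, not from the text

    fold = library_redundancy(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
    missed = prob_locus_missed(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
    print(f"nominal redundancy: {fold:.1f}x")
    print(f"P(locus missed under random cloning): {missed:.1e}")
```

Under the random-placement model the chance of missing any single locus is negligible, so the gaps observed in practice point to cloning bias rather than to insufficient clone numbers.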
Fig. 2.1. A strategy for constructing the sequence-ready physical maps. The dotted lines indicate the fingerprinted BAC contigs, and the circles at the end of BACs show the BAC end sequences. Arrows crossing the BAC contigs indicate the anchor markers. The rectangle below shows the rice genome, to which the BAC contigs are mapped through anchor markers
The IRGSP took a complementary approach to make a comprehensive sequence-ready PAC/BAC physical map (Fig. 2.1). The Rice Genome Research Program (RGP) in Japan constructed a high-density transcript map on which 6,591 expressed sequence tag (EST)/sequence tagged site (STS) markers were mapped (Wu et al. 2002). Extensive polymerase chain reaction (PCR) screening of pooled clones identified the experimentally anchored PAC/BAC clones (Wu et al. 2003). In contrast, the Clemson University Genomics Institute/Arizona Genomics Institute/Arizona Genomics Computational Laboratory (CUGI/AGI/AGCoL) in the USA fingerprinted and end-sequenced all BAC clones and assembled them into contigs via FingerPrinted Contigs (FPC) software (Soderlund et al. 1997). These assembled contigs were anchored to the genome by screening with overlapping oligonucleotide (overgo) probes from the genetic markers
Fig. 2.2. Most recent status of Nipponbare physical maps. In each chromosome, linkage maps are shown on the left (numbers show the genetic distances) and PAC/BAC contigs (black bars) on the right. (Modified from the International Rice Genome Sequencing Project [2005] Nature 436:793–800)
(Chen et al. 2002). Finally, these two contrasting methodologies were combined to form joint maps and finalize the physical mapping: starting from the draft sequences of “seed BACs” anchored by DNA markers, the BAC end-sequence database was searched to detect neighboring BAC clones that are part of a contig. This sequence tagged connector (STC) method (Siegel et al. 1999) could effectively “walk” and “jump” between the marker-associated PAC/BAC contigs to fill the gaps. Eventually, most of the chromosomal clone gaps were filled, leaving 36 (Fig. 2.2, centromere regions not counted). The sizes of the remaining gaps were measured by fiber-fluorescence in situ hybridization (FISH) analysis, and none is longer than 100 kb. It is not yet known why these regions have not been cloned. Several explanations are possible: absence of suitable restriction digestion sites, sequences toxic to bacteria, or complex repeat clusters that hamper clone identification by DNA markers. There are also relatively large (0.2 to 2 Mb) gaps in the centromeric regions (black triangles in Fig. 2.2) of all but three chromosomes (chromosomes 4, 5, and 8) in the physical maps. Even
considering these gaps, the IRGSP was able to construct a physical map covering more than 95% of the rice genome (see Table 2.1).

Table 2.1. Coverage of the IRGSP physical maps based on the sequenced lengthᵃ

Chromosome  Sequenced length (Mb)  Gaps on arm regions  Centromere covered  Estimated total (Mb)  Coverage (%)
1           43.3                   5                    No                  45.05                 96
2           35.0                   3                    No                  36.78                 98
3           36.2                   4                    No                  37.37                 97
4           35.5                   3                    Yes                 36.15                 98
5           29.7                   6                    Yes                 30.00                 99
6           30.7                   1                    No                  31.60                 97
7           29.6                   1                    No                  30.28                 98
8           28.4                   1                    Yes                 28.57                 100
9           22.7                   4                    No                  30.53                 74
10          22.7                   4                    No                  23.96                 95
11          28.4                   4                    No                  30.76                 92
12          27.6                   0                    No                  27.77                 99
All         370.7                  36                                       388.82                95

ᵃ Modified from the International Rice Genome Sequencing Project (2005) Nature 436:793–800.
2.3 Two-Step Strategy for Completion of Rice Genome Sequencing

The IRGSP used the clone-by-clone method to obtain an accurate rice sequence and released the sequence to the public databases in two steps. The overall procedure for the genome sequencing is as follows. First, the target PAC/BAC DNA is purified and sheared to make two shotgun libraries (2 kb and 5 to 7 kb inserts). Both ends of approximately 1,000 subclones from each library are sequenced, and the subclone-end sequences are assembled via the Phred/Phrap software (http://www.phrap.org). Typically, the resulting 4,000 shotgun sequences form one to five sequence contigs per PAC/BAC clone (typically with a 100 to 150 kb genomic insert). As the sequence redundancy is high (>10×), most of the sequence gaps have multiple bridging shotgun clones, which allow all the contigs to be both ordered and oriented. These sequence contigs can be submitted as phase 2 entries to the high-throughput genomic (HTG) sequences division of the public database of the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/projects/HTGS/). The IRGSP decided to publish all the clone sequences from phase 2, that is, the high-quality draft stage of the genome sequence, because of the strong demand
for the release of relatively accurate genome sequences among crop researchers. The IRGSP constructed pipelines for mass-sequence production and submitted the clone-based sequences to the public databases immediately after sequence assembly was completed. This accelerated data release resulted in the availability of a high-quality draft sequence of more than 450 Mb (366 Mb after removal of overlaps) by December 2002 (http://rgp.dna.affrc.go.jp/rgp/Dec18_NEWS.html).

The final step, converting the draft sequences into a continuous high-quality sequence, consists of three main parts: filling the sequence gaps, improving the low-accuracy regions, and resolving mis-assemblies. Filling the sequence gaps is a relatively easy task because all we need to do is fully sequence the bridge clones and reassemble the sequence. The IRGSP follows the Bermuda standard (http://www.genome.gov/10001812), which requires 99.99% accuracy for most of a finished sequence. The accuracy is evaluated by the scoring function of the Phred software. To improve the low-accuracy sequences indicated by low Phred scores, we resequenced them with different DNA polymerases or sequencing chemistries to reconfirm base determination.

Although the rice genome has relatively few repeat sequences compared with other crops, every PAC/BAC clone sequence nonetheless has some transposon sequences, simple repeats, or unnamed repeats. Although these sequences are not genes for proteins, they might still have some unknown functions and be transcribed into RNAs or act as target sites for other proteins. As the assembly software combines sequences by joining similar regions, it has a high tendency to mis-assemble at these repeat regions. Trained researchers need to detect mis-assemblies, resolve them, edit the repeats to identify and order each repeat unit, and reassemble the sequences.
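Two numbers in this section can be checked with a short Python sketch: the shotgun redundancy per clone and the Phred score that corresponds to the Bermuda accuracy target. The 500-base mean read length is an assumption (the text does not state it), and the function names are hypothetical. The Phred quality score Q is related to the per-base error probability P by Q = -10 log10(P).

```python
import math


def shotgun_redundancy(n_reads: int, mean_read_bp: float, insert_bp: float) -> float:
    """Average number of times each base of the clone insert is read."""
    return n_reads * mean_read_bp / insert_bp


def phred_to_error(q: float) -> float:
    """Per-base error probability for Phred quality score q."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality score for a per-base accuracy strictly between 0 and 1."""
    return -10 * math.log10(1 - accuracy)


if __name__ == "__main__":
    MEAN_READ_BP = 500  # ASSUMPTION: read length is not given in the text
    for insert_bp in (100_000, 150_000):
        fold = shotgun_redundancy(4_000, MEAN_READ_BP, insert_bp)
        print(f"{insert_bp} bp insert: {fold:.1f}x redundancy")
    print(f"99.99% accuracy corresponds to about Q{accuracy_to_phred(0.9999):.0f}")
```

With these assumptions, 4,000 reads give a redundancy above 10× for both ends of the stated insert range, and the 99.99% target is equivalent to at most one error in 10,000 bases.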
Finally, the assembled sequences are verified by comparing the sizes of real restriction digestion fragments with those predicted from the sequence (virtual digestion). The finished sequences are submitted to the public databases as final HTG phase 3 or PLANT (PLN) sequences with or without annotations.

At the time the draft sequencing was completed, more than 2,000 PAC/BAC clones were left in phase 2, about half of which were assigned to RGP. The IRGSP continued working on finishing these sequences. Gradually the phase 2 clones became finished (PLN) clones, and all the sequencing was completed in December 2004 (Fig. 2.3). In the publication of the complete sequence, the IRGSP submitted 3,401 PAC/BAC clones, 18 fosmid clones, and some virtual clones from the sequences of PCR-amplified fragments. The total nucleotide length calculated by combining each PAC/BAC sequence and removing the overlaps is 370,733,456 bases. Adding this sequence length and the estimated gap lengths reveals the total physical length of
Fig. 2.3. Progress of finishing rice genome sequencing by IRGSP. P2 and PLN show phase 2 and completed clones, respectively
the rice genome to be 388.82 Mb. Three of the twelve centromeres are covered by physical contigs, and two of them have been published as high-quality sequences (chromosome 8: Wu et al. 2004; Nagaki et al. 2004; and chromosome 4: Zhang et al. 2004). These centromere sequences provide interesting material for comparative analysis within the genome. Although the compositions of the two centromeres (CentO repeat, centromere retrotransposon of rice [CRR], and other transposons) are similar, the distributions of the CentO clusters are totally different. This suggests that chromosomes 4 and 8 have different histories of divergence (Ge et al. 1999).

The sequenced regions, 370 Mb, correspond to 95.3% of the genome (98.9% of the euchromatic regions). These results indicate that the IRGSP achieved a near-complete genome sequence of Nipponbare. The high-quality, map-based sequence of the entire genome is now available in public databases. The Nipponbare genome sequence has since been improved and published as successive builds (http://rgp.dna.affrc.go.jp/IRGSP/Build2, http://rgp.dna.affrc.go.jp/IRGSP/Build3, http://rgp.dna.affrc.go.jp/IRGSP/Build4).
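The verification step described in this section, comparing real restriction fragment sizes with those predicted from the assembled sequence, can be sketched in Python as follows. This is a minimal illustration rather than the IRGSP procedure: the text does not name the enzymes used, so HindIII (recognition site AAGCTT, cut after the first A) is an assumption, as are the function names and the 5% size tolerance. The sequence is treated as linear, and only one strand is considered.

```python
from typing import List, Sequence


def virtual_digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream fragment.
    """
    seq = seq.upper()
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1  # keep scanning just past this match
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fragments_agree(observed: Sequence[int], predicted: Sequence[int],
                    tolerance: float = 0.05) -> bool:
    """True if both digests have the same number of fragments with matching sizes.

    Sizes are compared in sorted order, allowing a relative error of tolerance.
    """
    if len(observed) != len(predicted):
        return False
    pairs = zip(sorted(observed), sorted(predicted))
    return all(abs(o - p) <= tolerance * p for o, p in pairs)


if __name__ == "__main__":
    HINDIII_SITE, HINDIII_CUT = "AAGCTT", 1  # ASSUMPTION: example enzyme only
    toy_clone = "GGGAAGCTTCCCCAAGCTTTT"      # hypothetical toy sequence
    print(virtual_digest(toy_clone, HINDIII_SITE, HINDIII_CUT))
```

A mismatch in fragment count or size between the gel and the virtual digest flags a clone whose assembly should be re-examined, for example at a collapsed repeat.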
2.4 An Alternative Approach—the Whole Genome Shotgun Sequencing of Rice

Two efforts have contributed to whole genome shotgun sequencing of rice. The Beijing Genomics Institute (BGI) published an assembled 466-Mb sequence of the indica variety 93-11 from a 4× coverage WGS assembly (Yu et al. 2002). As described in a recent publication, this assembly was improved with additional shotgun sequences (Yu et al. 2005). A private firm, Syngenta, also published 420 Mb of the Nipponbare sequence obtained by its independent WGS assembly (Goff et al. 2002). Both research groups predicted 30,000 to 50,000 genes in the rice genome and also found many putative orthologs of genes from Arabidopsis or other plant species. Yu et al. (2005) further improved the Syngenta Nipponbare WGS assembly by reassembling and combining the japonica and indica genome sequences.

2.4.1 Whole Genome Shotgun Sequencing of japonica Rice (Syngenta)

The latest assembly of the Syngenta sequences by BGI assembled shotgun sequences (~6× coverage) of Nipponbare into 433.2 Mb of sequence in 35,047 contigs (Yu et al. 2005). A total of 45,824 genes were predicted. Nearly 99% of the nonredundant rice full-length cDNA sequences (Kikuchi et al. 2003) showed corresponding sequences in the assembled genome.

2.4.2 Whole Genome Shotgun Sequencing of indica Rice (BGI)

The latest BGI assembly assembled approximately 6.3× coverage shotgun sequences of indica cv. 93-11 into 466.3 Mb of sequence in 50,233 contigs (Yu et al. 2005). A total of 49,088 genes were predicted, and 97.1% of the nonredundant rice full-length cDNA sequences matched the assembled genome. Sequence comparison indicated 3.00 single nucleotide polymorphisms (SNPs) per kilobase in the genic regions and 15.13 SNPs/kb between Nipponbare and 93-11.
2.4.3 Comparison of Genome Sequences Derived from Whole Genome Shotgun Sequencing and Clone-by-Clone Shotgun Sequencing (IRGSP)

To compare the Syngenta and BGI shotgun sequences with the IRGSP map-based clone-by-clone sequence, we first mapped the BGI and
Syngenta contigs to IRGSP pseudomolecules using BLAST. For the Syngenta contigs we used the stringent condition that at least 95% of each contig must align with the IRGSP pseudomolecules with an identity of at least 95%. Under this condition, a total of 26,007 contigs could be mapped to the pseudomolecules, covering 290 Mb, with coverage varying from 77% to 81%. The discrepancy between this result and that of the cDNA mapping might be due to sequence mis-assemblies in repeat-rich regions. For the BGI contigs, considering the sequence variation between the two subspecies, we used the condition that at least 50% of each contig must align with the IRGSP pseudomolecules with an identity of at least 80%. Under this condition, we mapped a total of 25,101 contigs to the pseudomolecules, covering 258 Mb, with coverage varying from 58% to 78%, indicating subspecies variation derived from large insertions, deletions, and inversions.

As assembly of shotgun sequences is inherently prone to errors in repetitive sequences, we also analyzed the shotgun sequence coverage in genic regions. We used the dataset of 37,544 IRGSP-predicted genes, among which 9,485 genes are supported by rice transcripts. Of these predicted genes, 26,424 (70%) were fully covered by the Syngenta contigs and 22,376 (60%) were fully covered by the BGI contigs. Among the genes supported by full-length cDNAs, the Syngenta contigs covered 7,139 (75%) and the BGI contigs covered 6,482 (68.3%), which may reflect the relatively high coverage in gene-dense regions compared to other parts of the genome. A detailed study of a region of chromosome 1 shows that each assembly contains nonhomologous, misaligned, or duplicate coverage, which may be an artifact of the assembly program. It even shows 0.05% base-pair mismatches in matched contigs within Nipponbare, possibly a result of the relatively low coverage of the shotgun sequences.
A case study of the CentO repeat sequences showed that 32% of this centromere-specific sequence was found in contigs outside the centromere, indicating a high rate of misassembly of the WGS with repeat sequences.
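The two mapping conditions used in this comparison can be expressed as a simple filter, sketched below in Python. The sketch assumes that the BLAST results have already been summarized to one record per contig (aligned length and percent identity); how multiple hits per contig were merged is not described in the text, so that step is omitted. The class, function, and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ContigAlignment:
    contig_id: str
    contig_length: int       # total contig length in bp
    aligned_length: int      # bp of the contig aligned to a pseudomolecule
    percent_identity: float  # identity over the aligned part, 0 to 100


# (minimum aligned fraction, minimum percent identity), as stated in the text
SYNGENTA_CONDITION: Tuple[float, float] = (0.95, 95.0)
BGI_CONDITION: Tuple[float, float] = (0.50, 80.0)


def is_mapped(aln: ContigAlignment, condition: Tuple[float, float]) -> bool:
    """True if the contig meets both the aligned-fraction and identity thresholds."""
    min_fraction, min_identity = condition
    if aln.contig_length <= 0:
        return False
    fraction = aln.aligned_length / aln.contig_length
    return fraction >= min_fraction and aln.percent_identity >= min_identity


def select_mapped(alignments: Iterable[ContigAlignment],
                  condition: Tuple[float, float]) -> List[ContigAlignment]:
    """Keep only the contigs that satisfy the mapping condition."""
    return [a for a in alignments if is_mapped(a, condition)]


if __name__ == "__main__":
    toy = [  # hypothetical records for illustration
        ContigAlignment("c1", 10_000, 9_800, 99.2),
        ContigAlignment("c2", 10_000, 6_000, 85.0),
        ContigAlignment("c3", 10_000, 4_000, 99.0),
    ]
    print([a.contig_id for a in select_mapped(toy, SYNGENTA_CONDITION)])
    print([a.contig_id for a in select_mapped(toy, BGI_CONDITION)])
```

The looser thresholds for the indica contigs accept alignments that the japonica thresholds would reject, which is why the two mapped totals are not directly comparable.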
2.5 Initial Analysis of the Rice Genome

After completion of sequencing, the IRGSP presented the results of the initial analysis (IRGSP 2005). About one-third of the total genome consists of known repeat elements, including transposons. In repeat-free domains, the computer program FGENESH detected 37,544 protein-coding sequences, 60% of which have some similarity to rice ESTs and cDNAs. Seventy percent of the predicted genes have at least one homolog in the Arabidopsis proteome. About 2,800 gene models that match rice transcripts have no counterpart in Arabidopsis detected by BLASTP with a cutoff value of 10⁻²⁰, and most of these proteins have no known function.
About 30% of the predicted genes are present in tandem duplications. A graphical presentation of the distribution of the major gene clusters on each chromosome is shown in Fig. 2.4. There are many tandemly arrayed gene clusters (each pixel indicates a gene; stacked pixels indicate a tandem array) in more than half of the 12 chromosomes. For example, the major cluster in chromosome 1 (extreme left) is a protein kinase cluster, and the gene cluster in chromosome 11 (extreme right) is related to disease resistance. As rice is widely used as a staple crop, much analysis was devoted to identifying useful DNA markers, including more than 10,000 Tos17 insertion sites, 19,000 class I simple sequence repeat (SSR) sites, and 80,000 SNPs. These are good candidates for polymorphic markers among varieties and subspecies that would assist map-based cloning and marker-assisted selection.
Fig. 2.4. Distribution of arrayed genes on rice genome. Only tandemly repeated genes were considered. A BLASTP search was performed within each chromosome against all predicted protein sequences by IRGSP. Proteins that have an expectation (E)-value of 10⁻⁵ among others were grouped and shown as pixels in the figure. Graphics were made through GenomePixelizer (http://niblrrs.ucdavis.edu/GenomePixelizer/GenomePixelizer_Welcome.html). Numbers above each plot show the positions in pseudomolecules (in Mb)
2.6 Current Status and Future Developments

After completion of the official tasks of the IRGSP, the member countries are trying to fill the remaining gaps and to improve their sequences. Telomere regions, which are considered to be responsible for accurate chromosome replication and maintenance, have not been represented in the PAC/BAC libraries. The telomere regions have specific structures of TTTAGGG repeats (Richards and Ausubel 1988) and few restriction digestion sites. A new genomic library with a fosmid vector has been produced by the Arizona Genomics Institute (Ammiraju et al. 2005). This library, which has 110,592 clones with an average insert size of 41 kb, was constructed via random physical shearing of the genome, and it has been most helpful for the IRGSP in finding new clones to fill the clone gaps. Moreover, seven fosmid clones were recently found by hybridization with the unique probe sequences at the ends of the chromosomes (positions of these telomere clones are indicated by encircled white triangles in Fig. 2.2). All these clones have TTTAGGG repeats or their derivatives, indicating that the physical contigs reside very near the ends of the chromosomes. The transition regions between euchromatin and telomere regions contain approximately 600 predicted genes in the seven subtelomeric regions. Searches for clones in other telomere regions are underway (Mizuno et al. 2006). These and other improvements are included in the updated version of the rice pseudomolecules (currently build 4) at the IRGSP Web site.

The rice genome sequence is a milestone in understanding the grass family. Comparative mapping could be a useful tool for isolating genes from other grasses. High-resolution comparative maps have been constructed between rice and wheat (Sorrells et al. 2003) and between rice and maize (Salse et al. 2004). Several trait genes (VRN1: Yan et al. 2003; VRN2: Yan et al. 2004; Ppd-H1: Turner et al. 2005; Ph1: Griffiths et al. 2006) have been isolated from barley and wheat, and both synteny across large genome blocks and microsynteny with the rice genome have been used effectively in gene mapping. For the “Crop Circle” investigators (Devos 2005), rice is regarded as a stepping-stone to finding the order of markers and genes in the larger genomes. Such syntenic mapping has been used in Brassica genomes, for which Arabidopsis is the standard, and in Lotus japonicus and Medicago truncatula, which serve as models for legume crops.

The rice genome sequence is the key to understanding the science of rice (Paterson et al. 2005). Knowledge of what constitutes rice, how rice grows and develops, how it produces grains, and how it resists pests and diseases can lead to more design-based agriculture and biotechnology. The new technologies will be the foundation of a second “Green Revolution” to allow sustainable growth of human life.
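As an illustration of how the TTTAGGG telomere repeats mentioned in this section can be recognized in a clone sequence, the following Python sketch reports the longest run of back-to-back copies of the repeat unit on either strand. It is a simplified example, not the method used by the IRGSP: it matches exact copies only, so the repeat derivatives noted in the text would interrupt a run, and the function names are hypothetical.

```python
import re

TELOMERE_UNIT = "TTTAGGG"  # plant telomere repeat unit cited in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_tandem_run(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Largest number of consecutive exact copies of unit, checking both strands."""
    seq = seq.upper()
    best = 0
    for motif in (unit, reverse_complement(unit)):
        # (?:motif)+ matches one or more copies placed end to end
        for match in re.finditer(f"(?:{motif})+", seq):
            best = max(best, len(match.group(0)) // len(motif))
    return best


if __name__ == "__main__":
    toy_end = "ACGT" + TELOMERE_UNIT * 5 + "GATTACA"  # hypothetical clone end
    print(longest_tandem_run(toy_end))
```

A long run close to one end of a fosmid insert would be consistent with the clone reaching the chromosome end; a practical screen would also need to tolerate sequencing errors and variant repeat units.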
Acknowledgments

The authors thank all the participants of the IRGSP. We also acknowledge all the rice biologists who have contributed to the analysis of the features of the rice genome. We also thank Dr. S. Tabata of the Kazusa DNA Research Institute for a critical review of the manuscript.
References

Ammiraju JS, Yu Y, Luo M, Kudrna D, Kim H, Goicoechea JL, Katayose Y, Matsumoto T, Wu J, Sasaki T, Wing RA (2005) Random sheared fosmid library as a new genomic tool to accelerate complete finishing of rice (Oryza sativa spp. Nipponbare) genome sequence: sequencing of gap-specific fosmid clones uncovers new euchromatic portions of the genome. Theor Appl Genet 111:1596–1607
Baba T, Katagiri S, Tanoue H, Tanaka R, Chiden Y, Saji S, Hamada M, Nakashima M, Okamoto M, Hayashi M, Yoshiki S, Karasawa W, Honda M, Ichikawa Y, Arita K, Ikeno M, Ohta T, Umehara Y, Matsumoto T, de Jong PJ, Sasaki T (2000) Construction and characterization of rice genomic libraries: PAC library of japonica variety, Nipponbare, and BAC library of indica variety, Kasalath. Bull Natl Inst Agrobiol Res Jpn 14:41–51
Barry GF (2001) The use of the Monsanto draft rice genome sequence in research. Plant Physiol 125:1164–1165
Chen M, Presting G, Barbazuk WB, Goicoechea JL, Blackmon B, Fang G, Kim H, Frisch D, Yu Y, Sun S, Higingbottom S, Phimphilai J, Phimphilai D, Thurmond S, Gaudette B, Li P, Liu J, Hatfield J, Main D, Farrar K, Henderson C, Barnett L, Costa R, Williams B, Walser S, Atkins M, Hall C, Budiman MA, Tomkins JP, Luo M, Bancroft I, Salse J, Regad F, Mohapatra T, Singh NK, Tyagi AK, Soderlund C, Dean RA, Wing RA (2002) An integrated physical and genetic map of the rice genome. Plant Cell 14:537–545
Devos KM (2005) Updating the “crop circle”. Curr Opin Plant Biol 8:155–162
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, McKenney K, Sutton G, Fitzhugh W, Fields C, Gocayne JD, Scott J, Shirley R, Liu L, Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E, Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC, Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL, McDonald LA, Small KV, Fraser CM, Smith HO, Venter JC (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512
Gaut B (2002) Evolutionary dynamics of grass genomes. New Phytologist 154:15–28
Ge S, Sang T, Lu BR, Hong DY (1999) Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci USA 96:14400–14405
Gilbert W, Maxam A (1973) The nucleotide sequence of the lac operator. Proc Natl Acad Sci USA 70:3581–3584
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100
Griffiths S, Sharp R, Foote TN, Bertin I, Wanous M, Reader S, Colas I, Moore G (2006) Molecular characterization of Ph1 as a major chromosome pairing locus in polyploid wheat. Nature 439:749–752
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379
Leach J, McCouch S, Slezak T, Sasaki T, Wessler S (2002) Why finishing the rice genome matters. Science 296:45
Mao L, Wood TC, Yu Y, Budiman MA, Tomkins J, Woo S, Sasinowski M, Presting G, Frisch D, Goff S, Dean RA, Wing RA (2000) Rice transposable elements: a survey of 73,000 sequence-tagged-connectors. Genome Res 10:982–990
Mizuno H, Wu J, Kanamori H, Fujisawa M, Namiki N, Saji S, Katagiri S, Katayose Y, Sasaki T, Matsumoto T (2006) Sequencing and characterization of telomere and subtelomere regions on rice chromosomes 1S, 2S, 2L, 6L, 7S, 7L and 8S. Plant J 46:206–217
Nagaki K, Cheng Z, Ouyang S, Talbert PB, Kim M, Jones KM, Henikoff S, Buell CR, Jiang J (2004) Sequencing of a rice centromere uncovers active genes. Nat Genet 36:138–145
National Library of Medicine (2005) Public Collections of DNA and RNA Sequence Reach 100 Gigabases. Press Release, http://www.nlm.nih.gov/news/press_releases/dna_rna_100_gig.html
Paterson AH, Freeling M, Sasaki T (2005) Grains of knowledge: genomics of model cereals. Genome Res 15:1643–1650
Richards EJ, Ausubel FM (1988) Isolation of a higher eukaryotic telomere from Arabidopsis thaliana. Cell 53:127–136
Salse J, Piegu B, Cooke R, Delseny M (2004) New in silico insight into the synteny between rice (Oryza sativa L.) and maize (Zea mays L.) highlights reshuffling and identifies new duplications in the rice genome. Plant J 38:396–409
Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM, Smith M (1977a) Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265:687–695
Sanger F, Nicklen S, Coulson AR (1977b) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463–5467
Siegel AF, Trask B, Roach JC, Mahairas GG, Hood L, van den Engh G (1999) Analysis of sequence-tagged-connector strategies for DNA sequencing. Genome Res 9:297–307
Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE (1986) Fluorescence detection in automated DNA sequence analysis. Nature 321:674–679
Soderlund C, Longden I, Mott R (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci 13:523–535
Sorrells ME, La Rota M, Bermudez-Kandianis CE, Greene RA, Kantety R, Munkvold JD, Miftahudin, Mahmoud A, Ma X, Gustafson PJ, Qi LL, Echalier B, Gill BS, Matthews DE, Lazo GR, Chao S, Anderson OD, Edwards H, Linkiewicz AM, Dubcovsky J, Akhunov ED, Dvorak J, Zhang D, Nguyen HT, Peng J, Lapitan NL, Gonzalez-Hernandez JL, Anderson JA, Hossain K, Kalavacharla V, Kianian SF, Choi DW, Close TJ, Dilbirligi M, Gill KS, Steber C, Walker-Simmons MK, McGuire PE, Qualset CO (2003) Comparative DNA sequence analysis of wheat and rice genomes. Genome Res 13:1818–1827
Turner A, Beales J, Faure S, Dunford RP, Laurie DA (2005) The pseudo-response regulator Ppd-H1 provides adaptation to photoperiod in barley. Science 310:1031–1034
Vaughan DA, Morishima H, Kadowaki K (2003) Diversity in the Oryza genus. Curr Opin Plant Biol 6:139–146
Venter JC, Smith HO, Hood L (1996) A new strategy for genome sequencing. Nature 381:364–366
Wu J, Maehara T, Shimokawa T, Yamamoto S, Harada C, Takazaki Y, Ono N, Mukai Y, Koike K, Yazaki J, Fujii F, Shomura A, Ando T, Kono I, Waki K, Yamamoto K, Yano M, Matsumoto T, Sasaki T (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14:525–535
Wu J, Mizuno H, Hayashi-Tsugane M, Ito Y, Chiden Y, Fujisawa M, Katagiri S, Saji S, Yoshiki S, Karasawa W, Yoshihara R, Hayashi A, Kobayashi H, Ito K, Hamada M, Okamoto M, Ikeno M, Ichikawa Y, Katayose Y, Yano M, Matsumoto T, Sasaki T (2003) Physical maps and recombination frequency of six rice chromosomes. Plant J 36:720–730
Wu J, Yamagata H, Hayashi-Tsugane M, Hijishita S, Fujisawa M, Shibata M, Ito Y, Nakamura M, Sakaguchi M, Yoshihara R, Kobayashi H, Ito K, Karasawa W, Yamamoto M, Saji S, Katagiri S, Kanamori H, Namiki N, Katayose Y, Matsumoto T, Sasaki T (2004) Composition and structure of the centromeric region of rice chromosome 8. Plant Cell 16:967–976 Yan L, Loukoianov A, Tranquilli G, Helguera M, Fahima T, Dubcovsky J (2003) Positional cloning of the wheat vernalization gene VRN1. Proc Natl Acad Sci USA 100:6263–6268 Yan L, Loukoianov A, Blechl A, Tranquilli G, Ramakrishna W, SanMiguel P, Bennetzen JL, Echenique V, Dubcovsky J (2004) The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science 303:1640–1644 Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). 
Science 296:79–92 Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li S, Li X, Zheng H, Cong L, Lin L, Yin J, Geng J, Li G, Shi J, Liu J, Lv H, Li J, Wang J, Deng Y, Ran L, Shi X, Wang X, Wu Q, Li C, Ren X, Wang J, Wang X, Li D, Liu D, Zhang X, Ji Z, Zhao W, Sun Y, Zhang Z, Bao J, Han Y, Dong L, Ji J, Chen P, Wu S, Liu J, Xiao Y, Bu D, Tan J, Yang L, Ye C, Zhang J, Xu J, Zhou Y, Yu Y, Zhang B, Zhuang S, Wei H, Liu B, Lei M, Yu H, Li Y, Xu H, Wei S, He X, Fang L, Zhang Z, Zhang Y, Huang X, Su Z, Tong W, Li J, Tong Z, Li S, Ye J, Wang L, Fang L, Lei T, Chen C, Chen H, Xu Z, Li H, Huang H, Zhang F, Xu H, Li N, Zhao C, Li S, Dong L, Huang Y, Li L, Xi Y, Qi Q, Li W, Zhang B, Hu W, Zhang Y, Tian X, Jiao Y, Liang X, Jin J, Gao L, Zheng W, Hao B, Liu S, Wang W, Yuan L, Cao M, McDermott J, Samudrala R, Wang J, Wong GK, Yang H (2005) The Genomes of Oryza sativa: a history of duplications. PLoS Biol 3:e38 Zhang Y, Huang Y, Zhang L, Li Y, Lu T, Lu Y, Feng Q, Zhao Q, Cheng Z, Xue Y, Wing RA, Han B (2004) Structural features of the rice chromosome 4 centromere. Nucl Acids Res 32:2023–2030
3 Rice Genome Annotation: Beginnings of Functional Genomics
Takeshi Itoh
National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan
Reviewed by C. Robin Buell and Baltazar A. Antonio
3.1 Introduction
3.2 Computational Methods of Annotation
3.3 Automated Annotation System
3.4 Comprehensive Genome Annotation and Curation
3.5 From Annotations to Functional Genomics
Acknowledgments
References
3.1 Introduction
Progress in molecular biological studies has been achieved via analysis of targeted pieces of DNA molecules. This is still a crucial step toward a better understanding of the mechanisms involved in biological processes at the molecular level. Analysis of the DNA sequences of a handful of genes is now simplified by computer programs that facilitate sequence comparison to find genic regions and other related elements encoded in the sequence. For example, BLAST searches (Altschul et al. 1997) with a graphical user interface are provided by the International Nucleotide Sequence Databases (DDBJ/EMBL/GenBank; Benson et al. 2006; Cochrane et al. 2006; Okubo et al. 2006). These similarity searches allow the user to identify whether the sequence encodes a protein and what functions can be inferred by comparison with sequences registered in the databases. However, the rapid advancement of high-throughput technology for production of biological information in the last decade has changed the paradigm of molecular biology based on small-scale laboratory work. It
would be considerably time-consuming to manually search a 100-Mb DNA segment for all possible genes. To cope with the flood of biological information, one may anticipate other solutions by means of mass-computational biology. Large-scale computation to find genes and gene functions has therefore become an essential process for a genome-wide sequencing project. This is called genome annotation. This chapter describes several standard methods to annotate a genome as well as current efforts to annotate the rice genome by automated computation, focusing on the framework of the annotations rather than their bioinformatics backgrounds. For general bioinformatics issues, see Chapter 14 of this book.
3.2 Computational Methods of Annotation
A DNA sequence is computationally interpreted as a string over four characters. The sequence per se adds little to biological knowledge unless it is annotated. Therefore, to translate these characters into biological information, it is necessary to find what functional role a specific DNA segment plays. Prediction of protein-coding genes and their functions is a primary issue of annotation. For this purpose, two essential steps are required: prediction of exons and inference of functions by comparison with other known sequences (Fig. 3.1). Ab initio gene-finding methods implement the former, while similarity searches can usually provide both.
Novel genes can be predicted by ab initio gene-finding methods based on an appropriate algorithm such as a hidden Markov model (Burge and Karlin 1997; Salamov and Solovyev 2000). Computer programs usually report the positions of start and stop codons and exon–intron structures. One of the strengths of these ab initio methods is that they do not require any homologs for comparison, and therefore completely new genes can be detected. However, one should note that accurate prediction of exon–intron boundaries in higher eukaryotes is generally difficult and ab initio methods might give a number of false positives and negatives (Yao et al. 2005). Another point is that pseudogenes that have recently lost their functions may be mistakenly predicted as genes even though they are not transcribed (van Baren and Brent 2006).
cDNAs are thought to provide strong evidence of structural genes. In particular, comparison between full-length cDNAs and a genome sequence should determine the complete structures of gene loci. It is expected that cDNAs can be aligned easily against a genome because they were transcribed from the genome. However, cDNAs may sometimes show less than 100% identity to the genome or fail to map to it at all. This is due to repetitive elements, recent tandem duplication, sequencing errors, and so forth.
To discard these artifacts that hamper cDNA-mapping, additional bioinformatics methods are necessary (Imanishi et al. 2004). For instance, known repetitive sequences can be masked before the mapping by using an appropriate program such as RepeatMasker (http://www.repeatmasker.org/).
Similarity searches against protein databases in many cases facilitate the assignment of plausible functions to candidate genes if homolog(s) detected have already been well investigated experimentally. If the regions of homology are limited, motif searches via InterProScan are useful to find such weak similarity (Zdobnov and Apweiler 2001; Quevillon et al. 2005). Further, InterProScan shows InterPro identification (ID) numbers that are connected with Gene Ontology (GO) ID numbers, which means that the functions inferred can be classified under the GO hierarchy (Ashburner et al. 2000). These functional inferences should expedite further experimental validation.
Fig. 3.1. Schematic view of annotations. Genes are predicted by ab initio gene-finding methods or cDNA-mapping. Their functions are inferred by similarity and motif searches. [Diagram labels: DNA sequences of known protein-coding genes; ab initio prediction; genome; cDNA mapping; cDNA sequences; deduced amino acid sequence; functional motifs; comparison with protein sequences; protein databases.]
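As an illustration of the hidden Markov model idea behind the ab initio gene finders discussed in this section, the following minimal sketch decodes a DNA string with a toy two-state model (non-coding versus "coding") using the Viterbi algorithm. It is not the model used by GENSCAN or any other published program: the two states, all probabilities, and the function names `viterbi` and `segments` are assumptions made only for this example, and real gene finders use many more states (exons in each frame, introns, splice sites) with trained parameters.

```python
import math

# Toy two-state HMM: "N" = non-coding, "C" = coding.
# All probabilities are made-up illustrative values, not trained parameters.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich background
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # GC-rich "coding" state
}


def viterbi(seq):
    """Return the most probable state path for seq (a string over A, C, G, T)."""
    if not seq:
        return ""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][base]))
            pointer[s] = best_prev
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


def segments(path, label="C"):
    """Convert a state path into 0-based, end-exclusive (start, end) runs of label."""
    runs, start = [], None
    for i, s in enumerate(path):
        if s == label and start is None:
            start = i
        elif s != label and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(path)))
    return runs
```

Calling `segments(viterbi(sequence))` gives the coordinates of the regions that the toy model labels as coding. Working in log space avoids numerical underflow on long sequences, which matters once the input is a genomic clone rather than a short test string.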
3.3 Automated Annotation System
With the advancement of genome sequencing by the International Rice Genome Sequencing Project (IRGSP; International Rice Genome Sequencing Project 2005), an automated annotation system that facilitates analysis of hundreds of megabases of DNA sequence and produces reliable and comprehensive results has become necessary. With this in mind, the Rice Genome Automated Annotation System (RiceGAAS) was developed to execute genome-wide annotation of rice (Sakata et al. 2002). RiceGAAS employs several ab initio gene-finding methods such as GENSCAN (Burge and Karlin 1997) and RiceHMM. Genes are also predicted by BLAST searches against expressed sequence tags (ESTs) and the nonredundant protein database of the National Center for Biotechnology Information. These results are combined and a reconstituted primary gene structure is presented. The function of the predicted gene is inferred by comparison with motifs in Pfam (Sonnhammer et al. 1997; Bateman et al. 2004) and PROSITE (Hulo et al. 2006). In addition, RiceGAAS provides information about cellular localization, upstream cis-regulatory elements, and other features.
Fig. 3.2. The home page of the Rice Genome Automated Annotation System (RiceGAAS). The page describes RiceGAAS as a system that integrates programs for the prediction and analysis of protein-coding gene structure: coding region prediction (GENSCAN, RiceHMM, FGENESH, MZEF), splice site prediction (SplicePredictor), homology search (BLAST, HMMER, ProfileScan, MOTIF), tRNA gene prediction (tRNAscan-SE), repetitive DNA analysis (RepeatMasker, Printrepeats), signal scanning (Signal Scan), protein localization site prediction (PSORT), and classification and secondary structure prediction of membrane proteins (SOSUI). A BLAST search against the full-length cDNA sequences of japonica rice, provided by the KOME database, is also integrated. Interpretation of the coding region is fully automated, and gene prediction is accomplished without manual evaluation or modification, so some differences exist between the genes predicted by the system and the manually predicted genes in the GenBank entries; the page reports that about 74% of automatically and manually predicted genes are identical at the nucleotide level (see "comparison table of gene prediction", http://RiceGAAS.dna.affrc.go.jp/rga-bin/col_accur.pl). A unique function is assigned automatically to each predicted gene by GFSelector on the basis of protein homology, and a keyword search of the functions predicted by GFSelector is provided.
RiceGAAS integrates all the results, which can be visualized using a Web-based graphical interface (Fig. 3.2). Users can submit their own sequence as a query and run the automated annotation (http://ricegaas.dna.affrc.go.jp/). RiceGAAS works efficiently not only for rice but also for other related cereals such as wheat and barley.
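Two ideas from this section, combining the output of several gene predictors and comparing automated with manual predictions at the nucleotide level, can be made concrete with a small sketch. The code below is not the procedure used by RiceGAAS, which the text does not specify in detail; the majority-vote rule, the sensitivity and specificity definitions, and all function names are assumptions chosen for illustration. Exons are given as 1-based, inclusive (start, end) pairs.

```python
def covered_positions(exons):
    """Expand 1-based, inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_agreement(predicted, reference):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = shared coding bases / reference coding bases
    specificity = shared coding bases / predicted coding bases
    (a convention often used when evaluating gene finders; assumed here)
    """
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    shared = len(pred & ref)
    sensitivity = shared / len(ref) if ref else 0.0
    specificity = shared / len(pred) if pred else 0.0
    return sensitivity, specificity


def consensus_exons(predictions, min_votes=2):
    """Keep positions called coding by at least min_votes programs, merged into intervals."""
    votes = {}
    for exons in predictions:
        for pos in covered_positions(exons):
            votes[pos] = votes.get(pos, 0) + 1
    kept = sorted(p for p, v in votes.items() if v >= min_votes)
    merged = []
    for pos in kept:
        if merged and pos == merged[-1][1] + 1:
            merged[-1][1] = pos  # extend the current interval
        else:
            merged.append([pos, pos])  # start a new interval
    return [tuple(interval) for interval in merged]
```

Expanding intervals into position sets keeps the sketch short but uses memory proportional to gene length; an interval-sweep implementation would be preferable for whole-genome comparisons.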
Fig. 3.3. Databases of rice genome annotation. TIGR’s Osa1 database, the RAP-DB, and Gramene
3.4 Comprehensive Genome Annotation and Curation
There have been several efforts to annotate the rice genome and construct an annotation database (Fig. 3.3). The Institute for Genomic Research (TIGR) created a genome assembly of Oryza sativa ssp. japonica cv Nipponbare (Yuan et al. 2005), using the BAC/PAC clone sequences produced by the IRGSP. TIGR used the Eukaryotic Genome Control pipeline (Wortman et al. 2003) for rice genome annotation; the procedure is standard and similar to that of RiceGAAS. An advantage of TIGR’s pipeline is that automatically predicted gene structures are improved through use of the Program to Assemble Spliced Alignments
(PASA; Haas et al. 2003). Annotators can compare the structures with cDNA alignments against the genome and update the information using PASA. In addition, alternative splicing isoforms are detected. Another advantage is that TIGR constructed its own repeat database of rice (Ouyang and Buell 2004) and more than 14,000 transposable elements (TEs) were thoroughly identified and distinguished from non-TE genes. TIGR’s annotation database, Osa1, is available at http://rice.tigr.org/.
Since the IRGSP completed the sequencing of the entire rice genome, the Rice Annotation Project (RAP) was organized to annotate the genome extensively (Ohyanagi et al. 2006). As approximately 32,000 full-length cDNA sequences had been released by Kikuchi et al. (2003), RAP focused on rice gene loci that were supported by physical clones so that a reliable dataset of rice genes would be provided. The gene structures were determined on the basis of cDNA-genome alignments generated by est2genome (Rice et al. 2000). Moreover, all the functions inferred by automated methods were extensively examined by manual curation to remove ambiguous electronic annotations (Misra et al. 2002; Camon et al. 2003). The RAP data are accessible through the RAP-DB (http://rapdb.lab.nig.ac.jp/). One of the central issues in the construction of a genome annotation database is that genes need to be given unique and unambiguous identifiers. Therefore, the gene identifiers that were formally defined by the Committee on Gene Symbolization, Nomenclature and Linkage of the Rice Genetics Cooperative (http://www.gramene.org/documentation/nomenclature/) have been assigned to all the RAP loci.
Annotations of multiple genomes are expected to facilitate comparative studies. The Gramene database provides an integrated view of various data obtained from major crop plants including rice (http://www.gramene.org/; Jaiswal et al. 2006).
Markers, quantitative trait loci, and other features are mapped to genomes, and a comparison of these annotated genomes can be displayed by the Comparative Map Viewer (CMap), which was developed as a part of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). Gramene is one of the databases that have emphasized the use of ontologies, thereby providing controlled vocabularies that can be applied to various cereal crops (Yamazaki and Jaiswal 2005). Bioinformatics studies based on ontologies will be of increasing significance in the era of comparative genomics.
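The detection of alternative splicing isoforms from cDNA alignments, mentioned above in connection with PASA, can be sketched in a simplified form. This is not PASA's algorithm: the sketch assumes that each cDNA has already been aligned and assigned to a locus, and it treats two cDNAs as the same isoform when they imply exactly the same introns. The input layout and the names `intron_chain` and `group_isoforms` are hypothetical.

```python
from collections import defaultdict


def intron_chain(exons):
    """Return the introns implied by 1-based, inclusive exons as a tuple of (start, end)."""
    exons = sorted(exons)
    return tuple((left_end + 1, right_start - 1)
                 for (_, left_end), (right_start, _) in zip(exons, exons[1:]))


def group_isoforms(alignments):
    """Group cDNA alignments by locus and by intron chain.

    alignments maps a cDNA name to (locus, exons). The result has the form
    {locus: {intron_chain: [cDNA names]}}; a locus with more than one
    distinct intron chain is a candidate for alternative splicing.
    """
    loci = defaultdict(lambda: defaultdict(list))
    for name, (locus, exons) in alignments.items():
        loci[locus][intron_chain(exons)].append(name)
    return {locus: dict(chains) for locus, chains in loci.items()}
```

Comparing intron chains rather than exon coordinates means that cDNAs differing only in the length of their 5' or 3' ends are grouped together, which is usually what is wanted when the clones are not all full length.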
3.5 From Annotations to Functional Genomics
At present, most genome annotation is electronic and therefore remains to be validated experimentally. For example, proteome analysis can confirm both translation and cellular localization of a predicted gene, so that it is
possible to obtain detailed information on the function of the gene in the living cell (Komatsu et al. 2004; Komatsu and Tanaka 2005; see also Chapter 5 of this book). Recent techniques such as tiling arrays of an entire genome (Li et al. 2006) could be used as a tool for genome-wide validation of the biological significance of the annotations (for details see Chapter 4 of this book). A number of flanking sequences of transposon-tagged mutant lines have been produced in rice (Hirochika et al. 2004; see also Chapters 9 and 10 of this book). By mapping the flanking sequences to the genome and comparing their positions with those of gene candidates annotated, one can evaluate the effects of the gene disruptions if the phenotypes of the mutants have already been examined. This mapping information is an immediate resource for future functional genomics. In this way, annotations will have important utility for further large-scale experiments.
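The comparison of mapped flanking sequences with annotated gene positions described above amounts to an interval lookup. The following minimal sketch assumes that each flanking sequence has already been mapped to a single genomic coordinate and that annotated genes on a chromosome do not overlap; the gene records, line names, and function names are hypothetical and serve only to show the lookup.

```python
import bisect


def build_gene_index(genes):
    """Index genes per chromosome for binary search.

    genes is a list of (gene_id, chrom, start, end) with 1-based, inclusive
    coordinates; genes on one chromosome are assumed not to overlap.
    """
    index = {}
    for gene_id, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        starts, records = index.setdefault(chrom, ([], []))
        starts.append(start)
        records.append((start, end, gene_id))
    return index


def gene_at(index, chrom, position):
    """Return the gene containing position, or None if the site is intergenic."""
    starts, records = index.get(chrom, ([], []))
    i = bisect.bisect_right(starts, position) - 1  # last gene starting at or before position
    if i >= 0:
        start, end, gene_id = records[i]
        if start <= position <= end:
            return gene_id
    return None


def tag_insertions(index, insertions):
    """Map each mutant line to the gene hit by its insertion (None if intergenic)."""
    return {line: gene_at(index, chrom, pos)
            for line, (chrom, pos) in insertions.items()}
```

With the index built once, each insertion is resolved in logarithmic time, so many flanking sequences can be checked against a genome-wide gene set quickly. Overlapping or nested genes would need an interval tree or a similar structure instead of a single binary search.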
Acknowledgments
The author thanks C. Robin Buell and Baltazar A. Antonio for critical reading of the manuscript. The author also wishes to thank Tsuyoshi Tanaka and Kumiko Suzuki for their assistance in preparing the manuscript.
References Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 25:3389–3402 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25–29 Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL, Studholme DJ, Yeats C, Eddy SR (2004) The Pfam protein families database. Nucl Acids Res 32:D138–141 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2006) GenBank. Nucl Acids Res 34:D16–20 Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94 Camon E, Magrane M, Barrell D, Binns D, Fleischmann W, Kersey P, Mulder N, Oinn T, Maslen J, Cox A, Apweiler R (2003) The Gene Ontology Annotation (GOA) Project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res 13:662–672
Cochrane G, Aldebert P, Althorpe N, Andersson M, Baker W, Baldwin A, Bates K, Bhattacharyya S, Browne P, van den Broek A, Castro M, Duggan K, Eberhardt R, Faruque N, Gamble J, Kanz C, Kulikova T, Lee C, Leinonen R, Lin Q, Lombard V, Lopez R, McHale M, McWilliam H, Mukherjee G, Nardone F, Pastor MPG, Sobhany S, Stoehr P, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucl Acids Res 34:D10–15 Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Jr., Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucl Acids Res 31:5654–5666 Hirochika H, Guiderdoni E, An G, Hsing YI, Eun MY, Han CD, Upadhyaya N, Ramachandran S, Zhang Q, Pereira A, Sundaresan V, Leung H (2004) Rice mutant resources for gene discovery. Plant Mol Biol 54:325–334 Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJA (2006) The PROSITE database. 
Nucl Acids Res 34:D227–230 Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, Yura K, Miyazaki S, Ikeo K, Homma K, Kasprzyk A, Nishikawa T, Hirakawa M, Thierry-Mieg J, Thierry-Mieg D, Ashurst J, Jia L, Nakao M, Thomas MA, Mulder N, Karavidopoulou Y, Jin L, Kim S, Yasuda T, Lenhard B, Eveno E, Suzuki Y, Yamasaki C, Takeda J, Gough C, Hilton P, Fujii Y, Sakai H, Tanaka S, Amid C, Bellgard M, Bonaldo Mde F, Bono H, Bromberg SK, Brookes AJ, Bruford E, Carninci P, Chelala C, Couillault C, de Souza SJ, Debily MA, Devignes MD, Dubchak I, Endo T, Estreicher A, Eyras E, Fukami-Kobayashi K, Gopinath GR, Graudens E, Hahn Y, Han M, Han ZG, Hanada K, Hanaoka H, Harada E, Hashimoto K, Hinz U, Hirai M, Hishiki T, Hopkinson I, Imbeaud S, Inoko H, Kanapin A, Kaneko Y, Kasukawa T, Kelso J, Kersey P, Kikuno R, Kimura K, Korn B, Kuryshev V, Makalowska I, Makino T, Mano S, Mariage-Samson R, Mashima J, Matsuda H, Mewes HW, Minoshima S, Nagai K, Nagasaki H, Nagata N, Nigam R, Ogasawara O, Ohara O, Ohtsubo M, Okada N, Okido T, Oota S, Ota M, Ota T, Otsuki T, Piatier-Tonneau D, Poustka A, Ren SX, Saitou N, Sakai K, Sakamoto S, Sakate R, Schupp I, Servant F, Sherry S, Shiba R, Shimizu N, Shimoyama M, Simpson AJ, Soares B, Steward C, Suwa M, Suzuki M, Takahashi A, Tamiya G, Tanaka H, Taylor T, Terwilliger JD, Unneberg P, Veeramachaneni V, Watanabe S, Wilming L, Yasuda N, Yoo HS, Stodolsky M, Makalowski W, Go M, Nakai K, Takagi T, Kanehisa M, Sakaki Y, Quackenbush J, Okazaki Y, Hayashizaki Y, Hide W, Chakraborty R, Nishikawa K, Sugawara H, Tateno Y, Chen Z, Oishi M, Tonellato P, Apweiler R, Okubo K, Wagner L, Wiemann S, Strausberg RL, Isogai T, Auffray C, Nomura N, Gojobori T, Sugano S (2004) Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2:859–875 International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Jaiswal P, Ni J, Yap I, Ware D, Spooner W, Youens-Clark K, Ren L, Liang C, Zhao W, Ratnapu K, Faga B, Canaran P, Fogleman M, Hebbard C, Avraham S, Schmidt S, Casstevens TM, Buckler ES, Stein L, McCouch S (2006) Gramene: a bird’s eye view of cereal genomes. Nucl Acids Res 34:D717–723 Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379 Komatsu S, Tanaka N (2005) Rice proteome analysis: a step toward functional analysis of the rice genome. Proteomics 5:938–949 Komatsu S, Kojima K, Suzuki K, Ozaki K, Higo K (2004) Rice Proteome Database based on two-dimensional polyacrylamide gel electrophoresis: its status in 2003. Nucl Acids Res 32:D388–392 Li L, Wang X, Stolc V, Li X, Zhang D, Su N, Tongprasit W, Li S, Cheng Z, Wang J, Deng XW (2006) Genome-wide transcription analyses in rice using tiling microarrays. 
Nat Genet 38:124–129 Misra S, Crosby M, Mungall C, Matthews B, Campbell K, Hradecky P, Huang Y, Kaminker J, Millburn G, Prochnik S, Smith C, Tupy J, Whitfield E, Bayraktaroglu L, Berman B, Bettencourt B, Celniker S, de Grey A, Drysdale R, Harris N, Richter J, Russo S, Schroeder A, Shu S, Stapleton M, Yamada C, Ashburner M, Gelbart W, Rubin G, Lewis S (2002) Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol 3:81–22 Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, Ikeo K, Itoh T, Gojobori T, Sasaki T (2006) The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucl Acids Res 34:D741–744 Okubo K, Sugawara H, Gojobori T, Tateno Y (2006) DDBJ in preparation for overview of research activities behind data submissions. Nucl Acids Res 34:D6–9 Ouyang S, Buell CR (2004) The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants. Nucl Acids Res 32:D360–363 Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005) InterProScan: protein domains identifier. Nucl Acids Res 33:W116– 120 Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16:276–277
Sakata K, Nagamura Y, Numa H, Antonio BA, Nagasaki H, Idonuma A, Watanabe W, Shimizu Y, Horiuchi I, Matsumoto T, Sasaki T, Higo K (2002) RiceGAAS: an automated annotation system and database for rice genome sequence. Nucl Acids Res 30:98–102 Salamov AA, Solovyev VV (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10:516–522 Sonnhammer ELL, Eddy SR, Durbin R (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins: Struct Funct Genet 28:405–420 van Baren MJ, Brent MR (2006) Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res 16:678–685 Wortman JR, Haas BJ, Hannick LI, Smith RK, Jr., Maiti R, Ronning CM, Chan AP, Yu C, Ayele M, Whitelaw CA, White OR, Town CD (2003) Annotation of the Arabidopsis Genome. Plant Physiol 132:461–468 Yamazaki Y, Jaiswal P (2005) Biological ontologies in rice databases. An introduction to the activities in Gramene and Oryzabase. Plant Cell Physiol 46:63–68 Yao H, Guo L, Fu Y, Borsuk LA, Wen TJ, Skibbe DS, Cui X, Scheffler BE, Cao J, Emrich SJ, Ashlock DA, Schnable PS (2005) Evaluation of five ab initio gene prediction programs for the discovery of maize genes. Plant Mol Biol 57:445–460 Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, Wortman J, Buell CR (2005) The Institute for Genomic Research Osa1 Rice Genome Annotation Database. Plant Physiol 138:18–26 Zdobnov EM, Apweiler R (2001) InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847–848
4 Genome-Wide RNA Expression Profiling in Rice
Shoshi Kikuchi¹, Guo-Liang Wang² and Lei Li³
¹ Laboratory of Gene Expression, Department of Genetics, National Institute of Agrobiological Sciences, 2-1-2 Kannon-dai, Tsukuba, Ibaraki 305-8602, Japan; ² Department of Plant Pathology, Ohio State University, 2021 Coffey Road, Columbus, OH 43210, USA; ³ Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA
Reviewed by Lee Tarpley and Iain Wilson
4.1 Introduction
4.2 Rice Transcriptome—from EST Collection to Microarray
  4.2.1 Rice EST Collection and the First cDNA Microarray System Based on the EST Clones
  4.2.2 Full-Length cDNA Project
  4.2.3 Oligoarray Systems
4.3 Deep Transcriptome Analysis of the Rice Genome
  4.3.1 Principles of Different SAGE Techniques
  4.3.2 Development of the Robust-LongSAGE (RL-SAGE) Method
  4.3.3 Application of RL-SAGE for Defense Transcriptome Analysis in Rice
  4.3.4 MPSS for Expression Profiling
  4.3.5 Deep Transcriptome Analysis Using MPSS
4.4 Transcriptional Analysis Using Genome Tiling Microarrays
  4.4.1 Principle of Genome Tiling Microarrays
  4.4.2 Application of Genome Tiling Microarray Analysis in Rice
4.5 Perspective
Acknowledgments
References
4.1 Introduction
One of the most daunting challenges in the post-genomic era is to identify and characterize all the transcribed regions in a genome. In the past few
years, significant progress has been made in transcriptome analysis of the rice genome via a number of new technologies. In this chapter, we review the recent advances in the large-scale expressed sequence tag (EST) sequencing, establishment of microarray systems using EST sequences, and new oligomicroarray systems based on the full-length cDNA sequences. New methods for a deep and comprehensive transcriptome analysis of the rice genome, such as serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS), and the whole genome tiling array system, are also discussed.
4.2 Rice Transcriptome—from EST Collection to Microarray
In many organisms, the first transcriptome approach is usually to collect a large number of ESTs from many cDNA libraries. These ESTs are useful for new gene discovery, probe design for microarrays, and sequence analysis of coding regions in the genome. However, because ESTs are usually derived from a single sequencing read, they are often short fragments (300 to 500 bp) and do not contain the whole open reading frame (ORF) of an expressed gene. In mammalian systems such as mouse and human, the technology for construction of full-length cDNA libraries has been well established, and the isolation of full-length cDNAs has made a significant contribution to the annotation of gene structure in these organisms. The same technology has been used in the construction of full-length cDNA libraries for japonica rice, with approximately 380,000 full-length cDNA clones isolated (Kikuchi et al. 2003; Satoh et al. unpublished).
4.2.1 Rice EST Collection and the First cDNA Microarray System Based on the EST Clones
The Japanese Rice Genome Research Program (RGP) contributed extensively to the earliest stage of EST collection. The first major contribution was made by the large-scale, pre-genome-sequencing phase of the RGP (1991–97), which contributed about 60,000 EST sequences from the cultivar Nipponbare (Fig. 4.1). Sequence data on each clone can be obtained via the MAFF_Rice cDNA Clone Overview page at http://bank.dna.affrc.go.jp/%7Eqxrice/hiho/ (Sasaki et al. 1994). Clustering analysis revealed that this collection originated from 10,000 independent cDNA groups. Protein coding analysis revealed that 25% of the clones had significant similarities to known proteins (Yamamoto and Sasaki 1997). More than 1.2 million ESTs from rice have been registered in the NCBI GenBank. The main purpose of this large-scale EST collection is the
construction of a restriction fragment length polymorphism (RFLP) linkage map that will allow the construction of a physical map of the chromosomes and an understanding of the mechanisms of expression of genes for various isozymes (Fig. 4.1). Later, 6,713 unique EST sequences from this collection were mapped to 4,387 yeast artificial chromosome (YAC) clones from rice genomic DNA, generating 6,591 mapped sites on the rice genome (Wu et al. 2002; Fig. 4.2). The mapping result showed that chromosomes 1, 2, and 3 have relatively high EST densities, approximately twice those of chromosomes 11 and 12, and contain 41% of the total EST sites on the map. Most of the EST-dense regions lie in the distal regions of each chromosome arm (Fig. 4.2). A further 86,136 ESTs were sequenced from nine rice cDNA libraries from the superhybrid cultivar LYP9 and its parents. The assembly of EST sequences yielded 13,232 contigs and 8,976 singletons (Zhou et al. 2003). Updated information on indica ESTs and the mapping information of rice full-length cDNAs on the indica genome sequence can be viewed through Beijing Genomics Institute’s Rice Information System (BGI_RIS) (Zhao et al. 2004).
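The split of an EST collection into contigs and singletons, as reported above, can be illustrated with a single-linkage clustering sketch. It assumes that pairwise overlaps between ESTs have already been detected by some sequence comparison step that is not shown, and it only groups ESTs; a real assembler would also build a consensus sequence for each contig. The function name and the input layout are hypothetical, and this is not the procedure used in the studies cited.

```python
def cluster_ests(est_ids, overlaps):
    """Single-linkage clustering of ESTs with a union-find structure.

    overlaps is an iterable of (est_a, est_b) pairs judged to overlap by an
    upstream comparison (not shown). Returns (contigs, singletons): clusters
    with two or more members, and ESTs that overlap nothing.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for est in est_ids:
        clusters.setdefault(find(est), []).append(est)
    contigs = [members for members in clusters.values() if len(members) > 1]
    singletons = [members[0] for members in clusters.values() if len(members) == 1]
    return contigs, singletons
```

Counting `len(contigs)` and `len(singletons)` on the result gives the two kinds of summary figure quoted in the text. Single linkage will merge clusters through any one chimeric or repeat-containing EST, which is one reason repeat masking is normally applied first.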
[Figure 4.1: flow diagram. Collection of cDNA (genes for formation of tissues and organs; genes responsible for environmental stresses). Characterization of cDNA (estimation of function of gene product by homology search; analysis of tissue specificity, with a bar chart of the percentage (0% to 30%) of panicle-specific, leaf-specific, root-specific, and housekeeping genes in panicle, leaf, and root). Future purposes (sequencing of genomic regions around these genes; probes for the microarray system; functional analysis of rice genes).]
Fig. 4.1. Schematic diagram of large-scale cDNA analysis in rice summarizing the strategy and future purposes
Shoshi Kikuchi et al.
Fig. 4.2. Chromosomal distribution of the 6,591 rice EST sites. (Reproduced from Wu et al. 2002.)
Using the results of large-scale cDNA analysis, microarray technology can be used to monitor gene expression profiles and to perform functional analysis of the rice genome. For this purpose, the Rice Microarray Project was started in April 1999 jointly by the National Institute of Agrobiological Resources (NIAR) and the Society for Techno-Innovation of Agriculture, Forestry and Fisheries, Japan, in collaboration with 64 research institutes throughout Japan (Kikuchi 2007). The members and their research topics are shown on the Rice Microarray Opening Site (RMOS; http://cdna01.dna.affrc.go.jp/RMOS/index.html), which is administered by the National Institute of Agrobiological Sciences (NIAS). In this project, using semi-unique RGP-ESTs as probes, cDNA-based microarray systems carrying 1,265 probes (Yazaki et al. 2000) and 8,987 probes (Yazaki et al. 2003) have been established, and more than 1,300 hybridization records have been deposited in the database (Yazaki et al. 2002). Gene expression data generated from these two sets of cDNA-based microarrays are available at the Rice Expression Database (RED) Web site (http://red.dna.affrc.go.jp/RED/). The RMOS explains the experimental procedures for microarray analysis and gives information on probes. The 1,265 and 8,987 cDNA arrays are pioneer microarray systems in rice, but the limited number of
probes, corresponding to one quarter of the number of genes estimated to exist in rice, means that too few genes are analyzed. A 22K oligoarray system described later has helped to overcome several problems, including poor reproducibility of array quality caused by the printing process (construction of arrays), cross-hybridization caused by unknown nucleotide sequences, and insufficient capacity to accept requests from users.
4.2.2 Full-Length cDNA Project
The International Rice Genome Sequencing Project (IRGSP) was launched in 1997 following efforts to establish a catalog of rice genes (Sasaki et al. 1994), a high-density linkage map (Harushima et al. 1998), and a YAC-based physical map (Sasaki et al. 1996). At that time, the rice EST collection was estimated to cover about one quarter to one third of the genes in the rice genome. For complete information on transcripts, an enormous collection of full-length cDNA (FL-cDNA) clones was required. The FL-cDNA clones are necessary to identify exon–intron boundaries and gene-coding regions within genomic sequences and for comprehensive gene function analyses at the transcriptional and translational levels. At the beginning of the year 2000, the Rice FL-cDNA project was launched as a joint collaboration of the Foundation for Advancement of International Science (FAIS), the RIKEN Institute, and the NIAS under the supervision of the Bio-oriented Technology Research Advancement Institution. This project was the first joint collaboration focusing on rice biology using the technology developed for FL-cDNA collection in human and mouse. The completed project drew on more than 50 different tissues, with and without several stress treatments, and used two methods for constructing full-length cDNA libraries (the oligo-capping method, Maruyama and Sugano 1994; and the biotinylated cap trapper method, Carninci et al. 2000). It collected more than 380,000 clones and randomly sequenced them from their 5′ and 3′ ends.
By September 2003, 32,127 of 170,000 FL-cDNA clones had been completely sequenced (Kikuchi et al. 2003), and 580,000 FL-ESTs from the 380,000 FL-cDNA clones had been recorded (Satoh et al. unpublished; Fig. 4.3). All related EST sequences covering about two thirds of the rice genes, including the 580,000 single-pass sequences, were published in February 2006 (DDBJ accession CI000001–CI778739; Satoh et al. unpublished). Mapping of the FL-cDNAs to three rice genome assemblies (TIGR release 3, Yuan et al. 2005; IRGSP build 3, International Rice Genome Sequencing Project 2005; and the Beijing Genomics Institute (BGI) 93-11 genome, Yu et al. 2005) revealed about 20,600 transcription units and
about 6,000 alternative splicing events, whereas mapping of the 580,000 FL-ESTs generated about 29,800 transcription units. The TIGR annotation (57,915 genes and 61,250 gene models, Yuan et al. 2005) contains 32,000 genes that do not have cDNA evidence, and 24,000 to 25,000 genes overlap with cDNA sequences. Five thousand genes were newly discovered in the nonannotated region (Satoh et al. unpublished).
[Figure 4.3: flow diagram. Source materials: seeds, shoot and root of seedlings, mature leaves, mature roots, panicle, embryo, calli, and several kinds of stress-treated seedlings and calli. Construction of full-length cDNA libraries with two methods (oligo-capping method; biotinylated CAP trapper method). Collection of 380K clones and 580K ESTs; 32,127 clones out of 170K clones were completely sequenced at this time. Whole sequence data has been registered in a public database and shown from the KOME site. Chart of functional categories: DNA replication; developmental process, aging, death; cell growth, maintenance; communication, defense; energy; cell communication; transcription; translation; transport; metabolism; unclassified; others.]
Fig. 4.3. Schematic diagram of large-scale full-length cDNA collection and sequencing
The consortium performed BLASTN and BLASTX homology searches of the registered sequences in GenBank, computer analyses of cellular location, transmembrane analyses, and Gene Ontology classification of the putative proteins encoded by 28,469 FL-cDNAs (Kikuchi et al. 2003). Globally, 64% of FL-cDNAs are homologous to Arabidopsis proteins. Details of each clone are shown on the KOME Web site (Knowledge-based Oryza Molecular Biological Encyclopedia, http://cdna01.dna.affrc.go.jp/cDNA/).
Information from the 32,127 FL-cDNA clones was also used in the Rice Annotation Project 1 (RAP1), in which FL-cDNAs and other public ESTs were mapped and aligned to the rice genome sequence from IRGSP, and annotations were then added by hand. The first Annotation Jamboree meeting was held in December 2004 in Tsukuba. Details of the annotated genes are shown in RAP-DB (Ohyanagi et al. 2006; http://rapdb.lab.nig.ac.jp/).
4.2.3 Oligoarray Systems
The collection and complete sequencing of 32,127 rice full-length cDNA clones allowed NIAS researchers to expand the 8,987 cDNA-based microarray into a new global rice array based on oligomicroarray technology. This was carried out in collaboration with Agilent Technologies, a private company with strong capabilities to synthesize 60-mer or 70-mer oligonucleotides as probes for microarray systems. Because about 22,000 probes can be printed on one glass plate, only one probe per transcription unit mapped to the rice genome sequence was selected. Agilent Technologies designed 60-mer probe sequences from 29,100 full-length cDNA sequences, considering the Tm and GC content and reducing the possibility of cross-hybridization. After several validation experiments using custom-prepared arrays and RNAs from seed, callus, seedlings, and so forth, a final set of probe sequences was fixed. In November 2003, the 22K rice oligomicroarray version 1 (G4138A) was commercialized by Agilent Technologies, and it is now being used by rice molecular biologists worldwide. Many journals request the registration of the data produced by microarray experiments in public databases, such as NCBI-GEO (http://www.ncbi.nlm.nih.gov/projects/geo/) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress/). NIAS rice oligoarray version 1 was registered under accession number GPL892. The first published gene expression analysis using the oligoarray reported the expression profiles of abscisic acid- and gibberellin-responsive genes in rice (Yazaki et al.
2004). These data sets are registered in NCBI-GEO as gene series 661 (GSE661), samples 9853–9860 (GSM9853–9860) and platform (GPL477: 22K custom oligoarray). Information from known and predicted gene models was used for the construction of the global rice gene expression microarray system. The Affymetrix GeneChip Array is one of the standard microarray systems based on the 25-mer probe system. According to the description in NCBI-GEO’s registration, this array contains probes to query 51,279 transcripts representing two rice cultivars, with approximately 48,564 japonica
transcripts and 1,260 transcripts representing the indica cultivar. This unique design was created within the Affymetrix GeneChip Consortia Program and provides scientists with a single array that can be used for the study of rice. High-quality sequence data were derived from GenBank mRNAs, TIGR gene predictions, and the International Rice Genome Sequencing Project. The arrays were designed using NCBI UniGene Build No. 52 (May 7, 2004), incorporating predicted genes from GenBank and the TIGR Os1 v2 data set (ftp://tigr.org FASTA, 89.3 Mb).
A 70-mer microarray covering 41,754 annotated, nontransposable-element rice gene models, with and without experimental support, was constructed (Ma et al. 2005), and the expression of genes in representative rice organs (seedling shoots, tillering-stage shoots and roots, heading and filling-stage panicles, and suspension culture cells) was analyzed. Expression of 86% of the 41,754 genes was detected. A similar proportion of the rice and Arabidopsis genomes was expressed in the corresponding organs. A large percentage of the rice gene models that lack significant Arabidopsis homologs was found to be expressed. The expression patterns of rice and Arabidopsis best-matched homologous genes in distinct functional groups revealed dramatic differences in their degree of expression conservation between the species. These data show some basic similarities and differences between the Arabidopsis and rice transcriptomes.
Since the commercialization of the 22K rice oligomicroarray system, only a few reports on its use have been published. The reason might be the large amount of gene expression data, which makes data analysis difficult. Many types of genomic information are available for rice, such as map locations of probed genes, protein coding information, and promoter sequence information. To obtain such information, researchers need to use data mining.
It is important to have the facility to overlay these and other layers of genomic information, including the ability to relate these layers to classical plant biochemical information. The development of these relationships assists in interpretation of gene functions. Comparisons of gene expression under various biotic and abiotic stresses are also important. To meet these needs, a RED II database is being established covering the 22K microarray data and data mining tools (Fig. 4.4).
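As a rough illustration of the probe-selection criteria mentioned in this section (Tm and GC content of 60-mer probes), the following Python sketch screens candidate probe sequences. The Tm formula is a common salt-adjusted approximation for long oligonucleotides, and the sodium concentration and acceptance windows are illustrative assumptions. The criteria actually used for the 22K array are not given here, and the cross-hybridization check is omitted.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a probe sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq: str, na_molar: float = 0.05) -> float:
    """Approximate melting temperature (deg C) of a long oligonucleotide.

    Uses Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N, a widely
    quoted approximation; na_molar = 0.05 M is an assumed value.
    """
    n = len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 41.0 * gc_fraction(seq) - 675.0 / n


def passes_filter(seq: str,
                  gc_range=(0.40, 0.60),   # hypothetical GC window
                  tm_range=(65.0, 80.0)) -> bool:  # hypothetical Tm window
    """Keep a probe only if both GC content and Tm fall in the windows."""
    gc = gc_fraction(seq)
    tm = approx_tm(seq)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


if __name__ == "__main__":
    balanced = "ATGC" * 15        # toy 60-mer with 50% GC
    at_rich = "AT" * 30           # toy 60-mer with 0% GC
    for probe in (balanced, at_rich):
        print(len(probe), round(gc_fraction(probe), 2),
              round(approx_tm(probe), 1), passes_filter(probe))
```

A real design pipeline would add a specificity step, for example aligning each candidate against all other transcripts and discarding probes with close secondary matches.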
4.3 Deep Transcriptome Analysis of the Rice Genome
For most genome projects, exhaustive sequencing of EST tags is the first method used for rapid identification of expressed genes and gene
Fig. 4.4. Various rice genomics databases produced by the Rice Genome Project of the National Institute of Agrobiological Sciences
expression profiling (Adams et al. 1991; also see Section 4.1). ESTs are relatively slow and costly to generate, making it difficult to achieve saturation of a library or to produce quantitative estimates of tissue-specific expression from these data. The DNA microarray technology has
provided a rapid and relatively inexpensive way to monitor the expression of thousands of transcripts in parallel. However, microarrays are subject to inherent limitations, such as background intensities that can rival signals for weakly expressed transcripts, the difficulty of distinguishing between closely related sequences (Duggan et al. 1999), the inability to detect transcript variants (Patankar et al. 2001; Jones et al. 2002; Gibbings et al. 2003), and limited genome coverage due to lack of accurate gene annotation. The recently developed tag-based technologies such as SAGE and MPSS can overcome these problems, as described in the following sections.
4.3.1 Principles of Different SAGE Techniques
SAGE is the first tag-based method that allows both qualitative and quantitative evaluation of thousands of genes without any prior information (Velculescu et al. 1995). It is based on three main principles: (1) short sequence tags (14 to 15 bp), isolated from a defined 3′ position within each transcript, carry sufficient information to identify the transcript; (2) ditags (two ligated individual tags) are concatenated, with as many as 70 to 100 tags per concatemer, and the concatemers are cloned and sequenced; (3) data output reflects the actual gene expression pattern in a particular condition, or stage of an organism, and allows visualization of transcript complexity such as transcript variants, antisense transcripts, and so forth (Patankar et al. 2001; Jones et al. 2002; Gibbings et al. 2003). In comparison to the EST approach, the advantage of the SAGE method is that the serial concatenation of ditags makes tag sequencing more efficient, so that many more transcripts can be identified at similar sequencing cost. For example, about 40 tags of 14 bp can be identified from one sequencing read of a 600-bp SAGE clone, and these may represent 40 different transcripts present in the RNA population.
In addition, the output of SAGE sequence analysis is in a digital format, and so the data generated by different researchers and laboratories can be directly compared (Aldaz 2003). One of the limitations of the original SAGE method is that the assignment of 14-bp tags to duplicated genes or repeated sequences is problematic, especially for complex genomes (Chen et al. 2000). LongSAGE, a modified version of the original SAGE method, was first developed for expression analysis and genome annotation in the human genome (Saha et al. 2002). Instead of using BsmFI, the type IIS enzyme MmeI was used to cleave cDNAs, which increases the tag length to 21 bp. MmeI cleaves 20/18 bases from its nonpalindromic recognition sequence (TCCRAC; Tucholski et al. 1995). The advantage of the 21-bp LongSAGE
tags is that these tags can be used for both genome annotation and expression profiling. Of the 5,641 tags with single loci in the Celera human genome database, 3,419 precisely matched exonic sequences or 3′ untranslated regions (Saha et al. 2002). A total of 575 tags matched regions within the introns of known genes, representing either unknown exons of annotated genes or novel genes embedded in the introns of known genes. In addition, 803 tags matched regions at least 5 kb from the terminal exons of known or predicted genes.
Recently, a new method called SuperSAGE was reported (Matsumura et al. 2003) in which the type III restriction endonuclease EcoP15I is used to isolate fragments of 26 bp from the 3′ region of cDNAs. The method was used to investigate the gene expression profiles of rice leaves infected with the rice blast fungus and the gene expression changes in INF1 elicitor-treated Nicotiana benthamiana. Compared to LongSAGE, SuperSAGE increases the tag size, but fewer tags fit into each sequencing read, so gene discovery per read is reduced.
Isolation of full-length cDNAs is still labor-intensive and technically challenging. For example, from 155,144 RIKEN Arabidopsis full-length cDNA clones, only 14,668 nonredundant cDNA groups were obtained, which represents only about 60% of the predicted genes (Seki et al. 2002). Whether all of these full-length cDNAs contain the sequence of the 5′ initiation site remains to be confirmed. To efficiently identify 5′ tags of all expressed genes, the cap analysis gene expression (CAGE) method was developed (Shiraki et al. 2003). By analyzing four libraries, more accurate transcription units of 11% to 27% of the genes were defined. Another similar approach was recently reported for identification of 5′ LongSAGE tags (Hashimoto et al. 2004). Among 15,448 tags identified in the human genome, 85.8% to 96.1% of the 5′ LongSAGE tags were assigned within −500 to +200 nt of mRNA start sites.
To identify transcription units bounded by a transcription initiation site and a polyadenylation site, a set of two complementary methods, 5′ LongSAGE and 3′ LongSAGE, has recently been developed (Wei et al. 2004). The results showed that more than 90% of the tag pairs identified in the human genome were appropriately assigned to the first and the last exons. This large-scale generation of transcript terminal tags is at least 20 to 40 times more efficient than full-length cDNA cloning and sequencing in the identification of complete transcription units. Recently, the same lab developed the gene identification signature (GIS) method, in which 5′ and 3′ signatures of full-length cDNAs are accurately extracted into paired-end ditags (PETs) that are concatenated for efficient sequencing and mapped to genome sequences (Ng et al. 2005). The application of these improved SAGE techniques should facilitate a comprehensive transcriptome analysis of sequenced genomes.
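The tag-isolation principle and the tags-per-read comparison discussed in this section can be sketched in Python as follows. The sketch assumes NlaIII (recognition site CATG) as the anchoring enzyme and counts the 4-bp site as part of the tag; the function names and the example sequence are hypothetical, and ditag formation and sequencing errors are ignored.

```python
from typing import Optional


def extract_tag(cdna: str, tag_len: int = 14, anchor: str = "CATG") -> Optional[str]:
    """Return the tag that starts at the 3'-most anchoring-enzyme site.

    Returns None if the site is absent or lies too close to the 3' end
    for a tag of the requested length.
    """
    cdna = cdna.upper()
    pos = cdna.rfind(anchor)              # 3'-most occurrence of the site
    if pos == -1 or pos + tag_len > len(cdna):
        return None
    return cdna[pos:pos + tag_len]


def tags_per_read(read_len: int, tag_len: int) -> int:
    """Upper bound on the number of whole tags in one sequencing read."""
    return read_len // tag_len


if __name__ == "__main__":
    # Toy cDNA with two CATG sites; only the 3'-most one defines the tag.
    cdna = "GGCATGAAACCCGGGTTTA" + "CATG" + "TTTGGGCCCA" + "AATTTGGGAAAAAA"
    print(extract_tag(cdna, tag_len=14))   # original SAGE tag length
    print(extract_tag(cdna, tag_len=21))   # LongSAGE tag length
    for name, length in (("SAGE", 14), ("LongSAGE", 21), ("SuperSAGE", 26)):
        print(name, tags_per_read(600, length))
```

For a 600-bp read the bound is 42, 28, and 23 tags for 14-, 21-, and 26-bp tags, which is in line with the figure of about 40 tags per read quoted for 14-bp tags and shows why longer tags lower the number of transcripts sampled per read.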
4.3.2 Development of the Robust-LongSAGE (RL-SAGE) Method
In contrast to the extensive application of SAGE in human and animal systems, few plant SAGE collections have been reported to date. These plant SAGE libraries have been made from rice seedlings (Matsumura et al. 1999); panicles, leaves, and roots of a superhybrid rice (Bao et al. 2005); mature leaf and immature seed tissue of rice (Gibbings et al. 2003); lignifying xylem of a single 10-year-old loblolly pine (Pinus taeda L.) (Lorenz and Dean 2002); Arabidopsis roots (Fizames et al. 2004); Arabidopsis roots after 2,4,6-trinitrotoluene treatment (Ekman et al. 2003); and Arabidopsis leaves (Jung et al. 2003) and pollen (Lee and Lee 2003) undergoing cold stress. Recently, a SAGE library of maize root tips of well-watered seedlings was published (Poroyko et al. 2005). A total of 161,320 individual tags representing a minimum of 14,850 genes were identified. Among them, 47% did not match any maize cDNAs or gene models. Notably, most of the SAGE tags in the reported studies are 14 bp in length.
Use of the SAGE method in plants is limited owing to several technical challenges associated with SAGE tag isolation and cloning. Because of the difficulties in obtaining longer concatemers (>500 bp) and high transformation efficiency, some laboratories adopted a colony polymerase chain reaction (PCR)-based screening method to identify large SAGE clones for sequencing. This approach makes SAGE library generation laborious, time-consuming, and expensive. After critically evaluating the entire SAGE cloning procedure, Gowda et al. (2004) found that the unclonable nature of concatemers is the major problem. A substantially improved LongSAGE method called Robust-LongSAGE, which has four major improvements over the previously reported protocols, was subsequently developed (Gowda et al. 2004, 2007).
First, a small amount of mRNA (50 ng) is enough for library construction, so the method can be used for experiments with a small amount of tissue. Second, cDNA adapter ligation and ditag formation were enhanced by an extended (overnight) ligation period, so that a high yield of PCR products can be obtained. Third, only 20 ditag PCRs were needed to obtain a complete library (up to 90% reduction compared with the original protocols). Fourth, concatemers were partially digested with NlaIII before being cloned into the vector (pZEro-1), greatly improving cloning efficiency. The amount of NlaIII and the duration of partial digestion are critical for obtaining large insert clones and increasing transformation efficiency. Using this protocol, one can generate two to three libraries, each containing more than 4.5 million tags, within a month. By sequencing about 3,000 clones, about 100,000 individual tags could be isolated. Six libraries from rice, one from maize, and one from the rice blast fungus (Magnaporthe grisea) have been constructed (Gowda et al. unpublished). The general procedure of RL-SAGE library construction is illustrated in Fig. 4.5.
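The last computational step of such a library, recovering individual tags from a sequenced concatemer, can be sketched in Python as below. This is a simplified illustration rather than the SageSpy implementation: it assumes that ditags are separated by the NlaIII site CATG, that each LongSAGE tag is CATG plus 17 nt, and that ditag bodies of 32 to 36 nt are acceptable; these parameter values and all function names are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def split_concatemer(concatemer: str, anchor: str = "CATG",
                     tag_body_len: int = 17,
                     min_body: int = 32, max_body: int = 36) -> list:
    """Split a concatemer read into individual tags.

    Each segment flanked by two anchor sites is treated as one ditag body.
    The first tag is read from the left end of the body; the second tag is
    read from the right end on the opposite strand (the anchor CATG is its
    own reverse complement).
    """
    segments = concatemer.upper().split(anchor)[1:-1]  # drop unflanked ends
    tags = []
    for body in segments:
        if not (min_body <= len(body) <= max_body):
            continue                                   # skip aberrant ditags
        tags.append(anchor + body[:tag_body_len])
        tags.append(anchor + revcomp(body[-tag_body_len:]))
    return tags


if __name__ == "__main__":
    ditag1 = "A" * 17 + "G" * 17      # toy ditag bodies
    ditag2 = "T" * 17 + "C" * 17
    read = "CATG" + ditag1 + "CATG" + ditag2 + "CATG"
    for tag in split_concatemer(read):
        print(tag)
```

With the figures quoted above, about 100,000 tags from about 3,000 clones corresponds to roughly 33 tags per sequenced clone.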
[Figure 4.5: workflow diagram. mRNA is bound to oligo(dT) beads and converted to cDNA; the cDNA is digested by NlaIII and divided into two pools; pool 1 is ligated with adapter A and pool 2 with adapter B; tags are released by MmeI (adapter + 21-bp tag); ditags are formed by ligating tags from pools 1 and 2 and amplified by PCR with primers 1 and 2; ditags are released by NlaIII and ligated to generate concatemers; concatemers are cloned into the SphI site of pZEro-1 and sequenced; individual tags are isolated, clustered, and matched to genomic and EST sequences, and the expression patterns of unique tags are determined by bioinformatics analysis using SageSpy. Example bar charts show tag copy number for tags A to H in control and treated samples.]
Fig. 4.5. General diagram of RL-SAGE library construction. Detailed experimental steps are described in Gowda et al. (2007). The SageSpy program was written by Eric Stahlberg at Ohio Supercomputer Center (http://www.osc.edu/hpc/software/apps/sagespy.shtml)
4.3.3 Application of RL-SAGE for Defense Transcriptome Analysis in Rice
Approximately 65.5% of the significant tags matched TIGR rice ESTs and 69.1% matched the rice genome sequence. Interestingly, 13.1% (7,597) of the tags matched the M. grisea genome sequence and only 7.1% (4,215) of the tags matched the TIGR M. grisea ESTs, suggesting that the unmatched 3,382 M. grisea tags are novel transcripts that might be expressed only during infection of rice plants. In addition, Gowda et al. (unpublished) found 1,572 antisense tags when the tags were matched to TIGR M. grisea ESTs. All the tag sequences derived from the four rice SAGE libraries are deposited and displayed on the Magnaporthe Grisea Oryza Sativa (MGOS) database (http://www.mgosdb.org/sage/).
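The kind of tag matching summarized above, including the distinction between sense and antisense matches, can be sketched in Python as follows. Exact substring matching against a small list of transcript sequences is an assumption made for brevity; the matching procedure actually used is not described here, and the sequences below are toy data.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_tag(tag: str, transcripts: list) -> str:
    """Label a tag as 'sense', 'antisense', 'both', or 'unmatched'.

    A sense match means the tag occurs in a transcript as written; an
    antisense match means its reverse complement occurs in a transcript.
    """
    tag = tag.upper()
    antisense_form = revcomp(tag)
    sense = any(tag in t.upper() for t in transcripts)
    antisense = any(antisense_form in t.upper() for t in transcripts)
    if sense and antisense:
        return "both"
    if sense:
        return "sense"
    if antisense:
        return "antisense"
    return "unmatched"


def summarize(tags: list, transcripts: list) -> Counter:
    """Count tags per match class."""
    return Counter(classify_tag(t, transcripts) for t in tags)


if __name__ == "__main__":
    ests = ["AAAAACCCCC"]                       # toy EST collection
    tags = ["AAACC", "GGGTT", "GATTACA"]        # toy tags
    print(summarize(tags, ests))
    # Consistency check on the counts quoted in the text.
    print(7597 - 4215)
```

The final line reproduces the 3,382 tags that match the M. grisea genome but not the M. grisea ESTs.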
4.3.4 MPSS for Expression Profiling
Massively parallel signature sequencing (MPSS), developed by Brenner et al. (2000), involves the cloning of a cDNA library on beads and the acquisition of 17 to 20 nucleotide (nt) signatures from these cDNAs using a hybridization-based sequencing method. The abundance of the sequence signatures precisely reflects gene expression levels in the sampled tissue. The technology is sensitive enough to detect rarely expressed transcripts because more than 1 million MPSS tags per library can be obtained. Each signature is derived from the 3′-most DpnII site 5′ to the poly(A) tail of a cDNA molecule. The sequencing process proceeds by identifying sets of four bases by hybridization to labeled linker-probes, then removing that set of four bases by a type IIS restriction enzyme site contained in the linker, and then repeating the process (Brenner et al. 2000). These fluorescent reactions occur underneath an automated microscope and scanner while the beads are immobilized in a flow cell, with no gels or capillaries. The procedure is completely parallel, facilitating large-scale sequencing, and 17 to 20 nt of high-quality sequence is routinely obtained per bead (Brenner et al. 2000).
Meyers and his colleagues pioneered the application of MPSS for transcriptome analysis in Arabidopsis (Meyers et al. 2004a, 2004b). A total of 36,991,173 17-base signatures derived from 14 libraries were obtained. Among them, 268,132 were distinct sequences. A comparison of genomic and expressed signatures matched 67,735 signatures predicted to be derived from distinct transcripts and expressed at significant levels. At least 19,088 sense expressed signatures were derived from 29,084 annotated genes. More than 89% of the total expressed signatures matched the Arabidopsis genome, and many of the unmatched but highly expressed signatures matched to previously uncharacterized transcripts.
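How signature abundance can serve as a digital expression measure is sketched below in Python. The sketch assumes that a signature is the 17 nt beginning at the 3′-most DpnII site (GATC) of a cDNA and that counts are scaled to transcripts per million (TPM); the scaling convention, the function names, and the example sequences are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def mpss_signature(cdna: str, site: str = "GATC", sig_len: int = 17) -> Optional[str]:
    """Return the signature starting at the 3'-most DpnII site, or None."""
    cdna = cdna.upper()
    pos = cdna.rfind(site)                # 3'-most GATC before the poly(A) tail
    if pos == -1 or pos + sig_len > len(cdna):
        return None
    return cdna[pos:pos + sig_len]


def count_signatures(cdnas: list) -> Counter:
    """Count how often each signature is observed in a library."""
    signatures = (mpss_signature(c) for c in cdnas)
    return Counter(s for s in signatures if s is not None)


def to_tpm(counts: Counter) -> dict:
    """Scale raw signature counts to transcripts per million."""
    total = sum(counts.values())
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


if __name__ == "__main__":
    gene_a = "CC" + "GATCAAACCCGGGTTTA" + "AACCCAAAAAAAA"   # toy cDNAs
    gene_b = "TT" + "GATCTTTGGGCCCAAAT" + "TTCCCAAAAAAAA"
    library = [gene_a] * 3 + [gene_b]
    for sig, tpm in to_tpm(count_signatures(library)).items():
        print(sig, tpm)
```

Because abundances are expressed on a common scale, libraries of different sizes can be compared directly; the Arabidopsis data set described above averages about 2.6 million signatures per library (36,991,173 signatures over 14 libraries).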
Using a modified MPSS cloning procedure, the same group sequenced more than 2 million small RNAs from seedlings and inflorescences of Arabidopsis (Lu et al. 2005). Many known and new micro-RNAs (miRNAs) were identified among the set of more than 75,000 sequences. Many genomic regions previously considered featureless were found to be sites of numerous small RNAs and antisense strand RNAs, indicating a regulatory function. A searchable Web site displaying all the mRNA and small RNA MPSS tags of Arabidopsis has been designed (http://mpss.udel.edu/at/).
4.3.5 Deep Transcriptome Analysis Using MPSS
In collaboration with Blake Meyers of the University of Delaware, we at Ohio State University, Columbus (Wang et al. unpublished) initiated a rice
MPSS project to deeply and comprehensively analyze the rice transcriptome. The specific objectives of this project were to use MPSS to quantify the expression of transcripts in untreated and abiotically stressed rice tissues, including transcripts found at low levels; to characterize allele-specific expression and the subset of genes affected by cis versus trans regulatory elements in indica and japonica hybrids; to monitor gene expression in rice tissues from susceptible and resistant plants infected with M. grisea or Xanthomonas oryzae pv. oryzae; and to compare MPSS signatures and rice genomic sequences to identify novel transcripts. MPSS data for approximately 65 rice samples of diverse untreated tissue, tissue treated with abiotic or biotic stress, and indica and japonica hybrids have been generated. The recent release of the rice genome annotation from TIGR (v2.0) was used to identify the genomic location of the tags relative to the annotated genes. The MPSS data from the first 22 rice libraries, which include diverse untreated tissues as well as abiotically stressed tissues, have been analyzed. These data include 121,581 distinct signatures that match the rice genome. A comparison of these signatures to the annotated genes demonstrates that at least 22,504 genes are transcribed. In addition, thousands of signatures were identified that suggest the existence of alternatively transcribed and novel (intergenic) transcripts. The rice MPSS data are available through the University of Delaware MPSS Web interface (http://mpss.udel.edu/rice). The public data will facilitate gene discovery and functional analyses and permit electronic Northern analyses of specific genes of interest. Similar to the Arabidopsis mRNA MPSS project, the Meyers lab is also generating small RNA MPSS tags from diverse untreated and treated rice tissues. It is expected that more small RNA MPSS tags will be identified from the rice genome than from the Arabidopsis genome.
4.4 Transcriptional Analysis Using Genome Tiling Microarrays
Another experimental approach to evaluate computationally annotated rice gene models and to identify new transcription units is genome tiling microarray analysis. This approach utilizes multiple probes in microarray hybridization to detect RNA transcripts in a comprehensive and unbiased fashion. Results from a genome tiling analysis can be used to verify or correct annotated gene structure and to generate candidate transcripts for further confirmation. Coupled with comparative genomics and other experimental and computational approaches, genome tiling analysis can be used to elucidate the transcriptional aspects of rice genome organization, including evolution, global regulation of its expression, and epigenetics.
4.4.1 Principle of Genome Tiling Microarrays
Recent advances in microarray technologies have made it possible to use microarrays as a platform for experimental approaches to interrogate the ever-increasing number of genome sequences. Of particular relevance to genome-wide transcriptional analysis are the high-density oligonucleotide microarrays, which contain short oligonucleotide probes synthesized directly on the array surface by light-directed chemistry using photolithographic masks (Yamada et al. 2003), by an ink-jet device (Hughes et al. 2001), or by light-directed chemistry using digital micro-mirrors (Bertone et al. 2004; Stolc et al. 2005). Moreover, oligonucleotide arrays can be made with several hundred thousand to several million discrete features per array (Mockler and Ecker 2005). This makes it feasible to synthesize probes to represent virtually any available genomic sequence and to interrogate complex genome sequences with a manageable number of arrays.
Genomic tiling arrays involve the generation of a “tile path” made up of oligonucleotide probes that represent a target genome region or the entire genome sequence (Fig. 4.6). These probes may overlap, lie end to end, or be spaced at regular intervals. The average nucleotide distance between the centers of neighboring probes is called the “step,” which defines the resolution of the tiling arrays. These probes are immobilized on glass slides and are used for hybridization with fluorescence-labeled RNA samples. The hybridization intensity of each probe is retrieved, and the integration and analysis of the hybridization data then lead to the identification of transcribed regions of the genome (Fig. 4.6). Genome tiling arrays have been used in model systems with a full genome sequence available.
The first genome-wide transcription study using tiling microarrays was performed in Escherichia coli using 25-mer oligonucleotides with 6- and 30-nt steps for intergenic and coding regions, respectively (Selinger et al. 2000). Besides detecting most of the approximately 4,000 open reading frames (ORFs), antisense transcription was detected from more than 3,000 of the ORFs (Selinger et al. 2000). The first reported human whole-genome tiling experiment involved 36-mer oligonucleotides with 46-nt steps. When probed against the human liver tissue, these tiling arrays revealed approximately 11,000 novel transcribed regions not yet detected by other methods (Bertone et al. 2004). In plants, the Arabidopsis genome was probed via 8-nt-step, 25-mer tiling microarrays, which detected transcription from about 2,000 intergenic regions and antisense transcription from about 30% of the annotated genes (Yamada et al. 2003).
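The relationship between probe length, step, and tile path can be made concrete with a short Python sketch. It lays fixed-length probes along one strand of a region at a constant step, using the probe lengths and steps quoted above as inputs; the function names and the region lengths are hypothetical.

```python
def tile_path(region_len: int, probe_len: int, step: int) -> list:
    """Return (start, end) coordinates of probes tiled along a region.

    Coordinates are 0-based and end-exclusive; only probes that fit
    entirely within the region are reported.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    return [(start, start + probe_len)
            for start in range(0, region_len - probe_len + 1, step)]


def spacing(probe_len: int, step: int) -> int:
    """Gap between neighboring probes (negative values mean overlap)."""
    return step - probe_len


if __name__ == "__main__":
    designs = {
        "36-mer, 46-nt step": (36, 46),   # human and rice designs in the text
        "25-mer, 8-nt step": (25, 8),     # Arabidopsis design in the text
    }
    for name, (probe_len, step) in designs.items():
        probes = tile_path(1000, probe_len, step)   # hypothetical 1-kb region
        print(name, len(probes), spacing(probe_len, step))
```

A 36-mer design with a 46-nt step leaves a 10-nt gap between probes, whereas a 25-mer design with an 8-nt step gives probes that overlap their neighbors by 17 nt, which is why the step rather than the probe length sets the resolution.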
Fig. 4.6. Principle of tiling microarray analysis. Genomic tiling arrays involve the generation of a virtual tile path, made up of short oligonucleotide probes, that represents a target genome region. These probes are immobilized on the surface of glass slides at a high feature density. Hybridization with fluorescence-labeled RNA samples generates signals that reflect the transcriptional activities of the genome target in question. Interrogation and analysis of the hybridization data then lead to the identification of transcribed regions of the genome, which can be compared with the available genome annotation data
4.4.2 Application of Genome Tiling Microarray Analysis in Rice
Rice genome sequences have been subjected to extensive annotation using ab initio gene prediction, comparative genomics, and a variety of other computational methods (International Rice Genome Sequencing Project 2005; Yu et al. 2005). As such, our understanding of the rice genome is largely limited by the capabilities of current gene prediction and annotation programs. Because oligonucleotide tiling microarrays provide unbiased end-to-end coverage of the target genome regions and measure transcriptional activity from multiple independent probes, they are capable of detecting the transcriptome in a comprehensive and unbiased way. Thus, tiling array analysis in rice can facilitate annotation of the genome by verifying predicted gene models and by identifying novel transcription units. In addition, tiling array analysis can be used to understand the relationship of transcription with genome organization.
Recently, rice genome tiling microarrays were developed based on the Maskless Array Synthesizer (MAS) technology. The rice MAS arrays contain 36-mer oligonucleotides tiling both the japonica and indica genome sequences, with an average spacing of 10 nt between adjacent probes (thus a 46-nt step; Stolc et al. 2005). The rice tiling arrays were hybridized with a pooled mRNA target derived from seedling root, seedling shoot, panicle, and suspension-cultured cells. Hybridization signals were correlated with the transcriptionally active regions (TARs) of the genome by alignment of the probes to the chromosomal coordinates (Fig. 4.7). The tiling array data were used to detect transcription of the majority of the annotated gene models. For example, of the 43,914 nontransposable element (non-TE) protein-coding gene models from the improved indica whole-genome shotgun sequence (Yu et al. 2005), transcription of 35,970 (81.9%) gene models was detected (Li et al. 2006).
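One simple way to turn per-probe signals into candidate TARs is to mark probes whose intensity exceeds a cutoff and to merge runs of adjacent positive probes. The Python sketch below is illustrative only: the cutoff, the minimum number of probes per region, and the maximum gap are hypothetical parameters rather than the scoring criteria of Li et al. (2005, 2006), and `call_tars` is a name introduced here.

```python
from typing import List, Tuple

Probe = Tuple[int, int, float]  # (start, end, intensity) on one strand of one chromosome


def call_tars(probes: List[Probe], cutoff: float, min_probes: int = 2,
              max_gap: int = 50) -> List[Tuple[int, int, int]]:
    """Merge runs of consecutive above-cutoff probes into (start, end, n_probes) regions."""
    tars = []
    run_start = run_end = None
    run_count = 0
    for start, end, intensity in sorted(probes):
        positive = intensity >= cutoff
        # Extend the current run only if this probe is positive and close enough.
        if positive and run_count and start - run_end <= max_gap:
            run_end = max(run_end, end)
            run_count += 1
            continue
        # Otherwise close the current run, keeping it only if it has enough probes.
        if run_count >= min_probes:
            tars.append((run_start, run_end, run_count))
        if positive:
            run_start, run_end, run_count = start, end, 1
        else:
            run_start = run_end = None
            run_count = 0
    if run_count >= min_probes:
        tars.append((run_start, run_end, run_count))
    return tars


if __name__ == "__main__":
    # Toy intensities for ten 36-mer probes placed at a 46-nt step (hypothetical values).
    signals = [10, 900, 850, 1200, 20, 30, 700, 15, 800, 950]
    probes = [(i * 46, i * 46 + 36, s) for i, s in enumerate(signals)]
    print(call_tars(probes, cutoff=500))
```

Requiring at least two adjacent positive probes discards isolated signals, a conservative choice in the spirit of, but not identical to, the stringent criteria mentioned in the text.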
Fig. 4.7. Tiling microarray analysis of rice chromosome 10. (A) Schematic representation of rice chromosome 10. The oval denotes the centromere. (B) A region from the long arm of chromosome 10 displaying both the indica and the japonica annotation. (C) Detailed tiling profile of one representative gene model. The model is represented here as block arrows, which point in the direction of transcription. The fluorescence intensity value of each probe is depicted as a vertical bar. The blocks underneath the bars indicate the presence of a probe in the microarray. Adapted from Li et al. (2005)
The transcription of gene models as detected by tiling arrays was consistent with several other experimental results. Current collections of
rice full-length cDNA and ESTs support about half of the predicted gene models (Kikuchi et al. 2003). Transcription of gene models with full-length cDNA/EST support was detected at a much higher percentage than that of the unsupported models in tiling array analysis of both japonica chromosome 10 (Fig. 4.7; Li et al. 2005) and the whole indica genome (Li et al. 2006). Based on predicted protein homology between rice and Arabidopsis thaliana, rice gene models were divided into high-homology (HH) and low-/no-homology (LH) models. A greater proportion of the HH models than of the LH models was detected by tiling arrays (Li et al. 2005, 2006). Further, when sequence conservation between indica and japonica was used to identify models common to both subspecies and models unique to one of them, the common models were expected to be more reliable because of their abundance of full-length cDNA/EST-supported models (Li et al. 2006). In the tiling array analysis, higher detection rates were indeed observed for the common models than for the unique models (Li et al. 2005, 2006).
Extensive transcriptional activity was observed in regions antisense to the annotated gene models. In tiling array analysis of japonica chromosome 10, antisense regions of 591 (19.6%) of the 3,019 gene models were found to be transcribed (Li et al. 2005), whereas analysis of the whole indica genome showed that 10,452 (23.8%) gene models have significant antisense transcription (Li et al. 2006). The proportion of rice genes exhibiting antisense transcription is slightly lower than that reported from tiling microarray analysis in Arabidopsis (~30% of all annotated genes; Yamada et al. 2003), adding to an increasing body of evidence indicating that antisense transcription is an inherent property of plant genomes.
However, it should be cautioned that the potential effects of several experimental artifacts such as unintended second-strand synthesis, formation of specific RNA/DNA hybrids, or spurious priming events during target preparation have to be precisely assessed before a final conclusion on the nature and extent of antisense transcription in rice can be drawn. Consistent with results from tiling microarray analysis in other model organisms, a significant amount of transcriptional activity was detected in the annotated intergenic regions of the rice genome (Li et al. 2005, 2006). Systematic scoring of indica tiling array data identified 5,464 unique novel TARs in the intergenic regions using a set of stringent criteria (Li et al. 2006). These novel TARs were validated by several independent experimental means including reverse transcriptase (RT)-PCR experiments, alignment against the rice ESTs, analysis of their coding content, and their association with simple sequence repeats (Li et al. 2006). Collectively, these results indicate that the novel TARs compositionally resemble the
exonic regions and thus provide a reliable but conservative estimate of additional transcribed genomic loci beyond the predicted exons.
Examination of the distribution of tiling array signals provides an unbiased means to score genome-level transcriptional activities. Decreased transcriptional activity was found in the pericentromeric regions (Li et al. 2005, 2006). Besides the pericentromeric regions, tiling array analysis of the indica genome revealed a number of chromosomal domains, including regions in chromosomes 4, 5, 7, 8, 9, 10, 11, and 12, that exhibit relatively repressed transcription. These domains appear to be associated with the cytologically defined heterochromatin (Li et al. 2006). The indica chromosome 4, which contains roughly equal-sized heterochromatin and euchromatin that border at about 16 Mb, was chosen to confirm the correlation between cytological features and transcriptional activity. The distribution of array signals indicates that the first half of the chromosome (~16 Mb) was generally less transcriptionally active than the second half (Fig. 4.8). When two PCR-generated probes (P1 and P2) flanking the transcriptionally defined border were used in fluorescence in situ hybridization, they localized precisely at the heterochromatin/euchromatin junction (Li et al. 2006). These results indicate that tiling microarray analysis provides a high-fidelity map of the repressed transcriptional activities associated with heterochromatin of the rice genome.
Profiling the transcriptional activities of japonica chromosome 10 using tiling microarrays confirmed that gene expression in the heterochromatin region is generally low relative to the euchromatin under normal growth conditions (Li et al. 2005). Consistent with this observation, the gene model distribution showed that the heterochromatin domain is relatively poor in full-length cDNA/EST-supported models but richer in unsupported models.
An enrichment of transposable element-related models in the heterochromatin domain is also evident. Interestingly, when plants were subjected to mineral/nutrient stresses, a general activation of transcription was observed in the heterochromatin (Li et al. 2005). These results are consistent with findings that heterochromatin stability and heterochromatin-mediated gene silencing can be regulated by development (Preuss 1999; Meyer 2000) or by modulating levels of specific transcription factors (Ahmad and Henikoff 2001).
Fig. 4.8. Tiling microarray analysis of indica chromosome 4. (A) The number of signal probes was calculated in 100-kb windows along both strands of chromosome 4 and is depicted as color-coded vertical bars. At the bottom, the length of the genome region represented by interrogating (R) and masked (M) probes in the same 100-kb windows along the length of the chromosome is shown. The black triangle marks the starting position of the annotated centromere. The “+” and “-” signs on the right denote the forward and reverse DNA strand, respectively. (B, C) Euchromatin and heterochromatin of indica rice chromosome 4 were mapped by 4′,6-diamidino-2-phenylindole (DAPI) staining, with heterochromatin more intensely stained. Two selected probes, P1 and P2, located at 15.3 Mb and 16.2 Mb, respectively (marked by arrows in A), were used for fluorescence in situ hybridization. On the right are the stained images for visualization of the euchromatin and heterochromatin domains. Adapted from Li et al. (2006) (See also color plate section).
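The windowed summary in panel A of Fig. 4.8 can be reproduced in outline as follows. This is a minimal Python sketch that assumes each signal probe is supplied as a (position, strand) pair; the function name `count_signal_probes` and the toy coordinates are hypothetical, and only the 100-kb window size comes from the figure legend.

```python
from collections import Counter

WINDOW = 100_000  # 100-kb windows, as in Fig. 4.8A


def count_signal_probes(signal_probes, chrom_length, window=WINDOW):
    """Count signal probes per window on the forward (+) and reverse (-) strands.

    signal_probes: iterable of (position, strand) pairs with 0-based positions.
    Returns {'+': [counts...], '-': [counts...]}, with one count per window.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = {"+": Counter(), "-": Counter()}
    for position, strand in signal_probes:
        if not 0 <= position < chrom_length:
            raise ValueError(f"position {position} lies outside the chromosome")
        counts[strand][position // window] += 1
    return {s: [counts[s][w] for w in range(n_windows)] for s in ("+", "-")}


if __name__ == "__main__":
    # Hypothetical signal probes on a 300,001-nt toy chromosome.
    toy = [(5, "+"), (99_999, "+"), (100_000, "+"), (250_000, "-")]
    print(count_signal_probes(toy, 300_001))
```

Plotting such per-window counts along the chromosome would show stretches of consistently low counts, which is the kind of pattern used in the text to delimit transcriptionally repressed domains.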
The distribution of TE and non-TE gene models in the heterochromatic and euchromatic regions of japonica chromosome 10 suggests that the heterochromatin and euchromatin may have similar capacities to accommodate protein-coding gene models (TE and non-TE), even though the heterochromatin is enriched with repetitive sequences (Li et al. 2005). Further, compared with the euchromatin, the heterochromatin is relatively enriched with low-homology models and poor in supported models. Thus, the differential packaging of genome elements in heterochromatin and euchromatin may enable rice to regulate and coordinate gene expression at the chromosomal level. Moreover, mapping the physical positions of the japonica and indica gene models that are supported by full-length cDNA information along chromosome 10 showed
that the distance between the members of a japonica-indica gene pair was homogeneous in the euchromatin but more skewed in the heterochromatin (Li et al. 2005). Together with previous findings of a mosaic organization of grass genomes, in which conserved sequences are interrupted by nonconserved sequences (Dubcovsky et al. 2001; Song et al. 2002; Bennetzen and Ma 2003), these results indicate that rice heterochromatin domains are more evolutionarily active and compositionally dynamic than the euchromatin.
4.5 Perspective
Currently, more than 400,000 rice ESTs and full-length cDNAs have been sequenced and deposited in the NCBI databases. The public availability of these sequences has not only advanced the functional analysis of rice genes but has also played an important role in rice genome annotation. The mapping and alignment of combined EST sequences and full-length cDNAs to the genome sequence have provided direct experimental evidence for many of the gene models predicted by computer programs. However, a considerable number of the gene models have not been confirmed by any experimental data. Misprediction or misannotation of exon–intron structure by current gene-annotation programs remains a major challenge for rice genome biologists. Further collection and complete sequencing of full-length cDNA clones, together with detailed comparison of gene models and cDNA sequences, will improve current rice genome annotation. Recently, in the process of updating a 22K oligomicroarray system to a 44K array, probes were constructed based on the sequences of the predicted genes. These predicted-gene probes were subjected to hybridization analyses with RNA from four diverse tissue samples: seed, upper part of seedlings, roots of seedlings, and callus. Based on the signal of each probe, the number of expressed genes in rice is estimated to be around 42,000 (Satoh et al. unpublished). During the process of updating the array, it was also found that many of the transposable element-related (TE-related) genes belong to the class of annotated nonexpressed (ANE) genes. An in-depth comparison of the structures of the TE-related genes and the truly expressed genes is also very important, and its results should be incorporated into gene-prediction programs. The miRNAs and short interfering RNAs (siRNAs) are relatively new and important research areas in transcriptomics. In the rice full-length cDNA collection, many of these small RNAs are included.
However, to obtain comprehensive coverage of these small RNAs, a new collection might be required. Several types of microarray systems for rice gene expression analysis have been established in recent years. However, because of the high expense associated with the
microarray system, the technique has not become routine in ordinary molecular biology laboratories in the way that Northern-blot hybridization and RT-PCR have. Therefore, microarray service centers should be established to perform the hybridizations for individual laboratories. The appropriate use of statistical and microarray analysis procedures and packages for large data sets is another obstacle for many molecular biologists. The development of simple, user-friendly, yet rigorously structured microarray analysis programs will promote more extensive use of the microarray system. As rice gene expression data sets accumulate, public databases should be established so that these data sets can be easily compared across different microarray platforms. In the databases, the experimental conditions should be described according to international standards, such as MIAME. These databases should also promote the coordinated analysis of transcriptomic, proteomic, and metabolomic data, which will further the prediction and validation of gene functions. Although SAGE has been available for more than 10 years and many results from human and animal systems have been published, the application of the technique in plant systems is limited, mainly because of the long and complicated cloning procedure and the high cost of sequencing. It usually takes about 2 weeks to construct a SAGE library and 3 to 4 weeks to sequence about 4,000 clones. To purify these clones and sequence the plasmids from one direction costs at least $10,000. Although MPSS library construction is performed by technicians at Solexa, Inc., the sequencing cost for a library is currently about $25,000. The development of simplified cloning techniques and the use of newly invented pyrosequencing techniques such as 454 sequencing should be explored.
In addition, since the majority of the SAGE and MPSS tags were isolated from the 3′ region of the transcripts, large-scale identification of transcript tags from the 5′ region will be conducted to provide new information about the transcription initiation sites of the transcripts in untreated and treated tissues/organs.
Acknowledgments
The full-length cDNA project in the Kikuchi laboratory was funded by the Rice Genome Full Length cDNA Library Construction Project grant from BRAIN (Bio-oriented Technology Research Advancement Institution). The rice microarray project and the mapping and alignment of cDNA sequences to the rice genome sequence were supported by the Rice Genome Projects in Japan. The SAGE and MPSS projects in the Wang laboratory are funded by the NSF Plant Genome Research Program (DBI
0115642 and 0321437). The authors wish to thank Drs. Lee Tarpley (Texas A&M AREC), Iain Wilson (CSIRO) and Narayana Upadhyaya (CSIRO), for critical review of the manuscript and for helpful suggestions.
References
Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, Kerlavage AR, McCombie WR, Venter JC (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252:1651–1656
Ahmad K, Henikoff S (2001) Modulation of a transcription factor counteracts heterochromatic gene silencing in Drosophila. Cell 104:839–847
Aldaz CM (2003) Serial Analysis of Gene Expression (SAGE) in cancer research. In: Ladanyi M, Gerald W (eds) Expression profiling of human tumors: diagnostic and research applications. Humana Press, New Jersey, pp 47–60
Bao J, Lee S, Chen C, Zhang X, Zhang Y, Liu S, Clark T, Wang J, Cao M, Yang H, Wang S, Yu J (2005) Serial Analysis of Gene Expression study of a hybrid rice strain (LYP9) and its parental cultivars. Plant Physiol 138:1216–1231
Bennetzen JL, Ma J (2003) The genetic colinearity of rice and other cereals on the basis of genomic sequence analysis. Curr Opin Plant Biol 6:128–133
Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M (2004) Global identification of human transcribed sequences with genome tiling arrays. Science 306:2242–2246
Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18:630–634
Carninci P, Shibata Y, Hayatsu N, Sugahara Y, Shibata K, Itoh M, Konno H, Okazaki Y, Muramatsu M, Hayashizaki Y (2000) Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res 10:1617–1630
Chen JJ, Rowley JD, Wang SM (2000) Generation of longer cDNA fragments from serial analysis of gene expression tags for gene identification. Proc Natl Acad Sci USA 97:349–353
Dubcovsky J, Ramakrishna W, SanMiguel PJ, Busso CS, Yan L, Shiloff BA, Bennetzen JL (2001) Comparative sequence analysis of colinear barley and rice bacterial artificial chromosomes. Plant Physiol 125:1342–1353
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM (1999) Expression profiling using cDNA microarrays. Nat Genet 21:10–14
Ekman DR, Lorenz WW, Przybyla AE, Wolfe NL, Dean JFD (2003) SAGE analysis of transcriptome responses in Arabidopsis roots exposed to 2,4,6-trinitrotoluene. Plant Physiol 133:1397–1406
Fizames C, Munos S, Cazettes C, Nacry P, Boucherez J, Gaymard F, Piquemal D, Delorme V, Commes T, Doumas P, Cooke R, Marti J, Sentenac H, Gojon A (2004) The Arabidopsis root transcriptome by serial analysis of gene expression: gene identification using the genome sequence. Plant Physiol 134:67–80
Gibbings JG, Cook BP, Dufault MR, Madden SL, Khuri S, Turnbull CJ, Dunwell JM (2003) Global transcript analysis of rice leaf and seed using SAGE technology. Plant Biotechnol J 1:271–285
Gowda M, Jantasuriyarat C, Dean RA, Wang GL (2004) Robust-LongSAGE (RLSAGE): a substantially improved LongSAGE method for gene discovery and transcriptome analysis. Plant Physiol 134:890–897
Gowda M, Venu RC, Jia Y, Stahlberg E, Pampanwar V, Soderlund C, Wang GL (2007) Use of robust-long serial analysis of gene expression to identify novel fungal and plant genes involved in host-pathogen interactions. In: Ronald PC (ed) Methods Mol Biol 354:131–144
Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494
Hashimoto S, Suzuki Y, Kasai Y, Morohoshi K, Yamada T, Sese J, Morishita S, Sugano S, Matsushima K (2004) 5'-end SAGE for the analysis of transcriptional start sites. Nat Biotechnol 22:1146–1149
Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, Linsley PS (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 19:342–347
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Jones SJ, Riddle DL, Pouzyrev AT, Velculescu VE, Hillier L, Eddy SR, Stricklin SL, Baillie DL, Waterston R, Marra MA (2002) Changes in gene expression associated with developmental arrest and longevity in Caenorhabditis elegans. Genome Res 11:1346–1352
Jung SH, Lee JY, Lee DH (2003) Use of SAGE technology to reveal changes in gene expression in A. thaliana leaves undergoing cold stress. Plant Mol Biol 52:553–567
Kikuchi S (2007) Comprehensive analysis of rice gene expression by using the microarray system: what we have learned from the microarray project. In: Datta S (ed) Rice improvement in the genomics era. Haworth Press, Binghamton NY (In Press)
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A,
Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379
Lee JY, Lee DH (2003) Use of serial analysis of gene expression technology to reveal changes in gene expression in Arabidopsis pollen undergoing cold stress. Plant Physiol 132:517–529
Li L, Wang X, Xia M, Stolc V, Su N, Peng Z, Li S, Wang J, Wang X, Deng XW (2005) Tiling microarray analysis of rice chromosome 10 to identify the transcriptome and relate its expression to chromosomal architecture. Genome Biol 6:R52
Li L, Wang X, Stolc V, Li X, Zhang D, Su N, Tongprasit W, Li S, Cheng Z, Wang J, Deng XW (2006) Genome-wide transcription analyses in rice using tiling microarrays. Nature Genet 38:124–129
Lorenz WW, Dean JF (2002) SAGE Profiling and demonstration of differential gene expression along the axial developmental gradient of lignifying xylem in loblolly pine (Pinus taeda). Tree Physiol 22:301–310
Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
Ma L, Chen C, Liu X, Jiao Y, Su N, Li L, Wang X, Cao M, Sun N, Zhang X, Bao J, Li J, Pedersen S, Bolund L, Zhao H, Yuan L, Wong GK, Wang J, Deng XW (2005) A microarray analysis of the rice transcriptome and its comparison to Arabidopsis. Genome Res 15:1274–1283
Maruyama K, Sugano S (1994) Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138:171–174
Matsumura H, Nirasawa S, Terauchi R (1999) Technical advance: transcript profiling in rice (Oryza sativa L.) seedlings using serial analysis of gene expression (SAGE). Plant J 20:719–726
Matsumura H, Reich S, Ito A, Saitoh H, Kamoun S, Winter P, Kahl G, Reuter M, Kruger DH, Terauchi R (2003) Gene expression analysis of plant host-pathogen interactions by SuperSAGE. Proc Natl Acad Sci USA 100:15718–15723
Meyer P (2000) Transcriptional transgene silencing and chromatin components. Plant Mol Biol 43:221–234
Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB, Ghazal H, Decola S (2004a) The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res 14:1641–1653
Meyers BC, Vu TH, Tej SS, Ghazal H, Matvienko M, Agrawal V, Ning J, Haudenschild CD (2004b) Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing. Nat Biotechnol 22:1006–1011
Mockler TC, Ecker JR (2005) Applications of DNA tiling arrays for whole-genome analysis. Genomics 85:1–15
Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH, Liu ET, Ruan Y (2005) Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods 2:105–111
Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, Ikeo K, Itoh T, Gojobori T, Sasaki T (2006) The rice annotation project database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucl Acids Res 34:741–744
Patankar S, Munasinghe A, Shoaibi A, Cummings LM, Wirth DF (2001) Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol Biol Cell 12:3114–3125
Poroyko V, Hejlek LG, Spollen WG, Springer GK, Nguyen HT, Sharp RE, Bohnert HJ (2005) The maize root transcriptome by Serial Analysis of Gene Expression. Plant Physiol 138:1700–1710
Preuss D (1999) Chromatin silencing and Arabidopsis development: A role for polycomb protein. Plant Cell 11:765–767
Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20:508–512
Sasaki T, Song J, Koga-Ban Y, Matsui E, Fang F, Higo H, Nagasaki H, Hori M, Miya M, Maruyama-Kayano E, Takiguchi T, Takasuga A, Niki T, Ishimaru K, Ikeda H, Yamamoto Y, Mukai Y, Ohta I, Miyadera N, Havukkala I, Minobe Y (1994) Toward cataloguing all rice genes: large-scale sequencing of randomly chosen rice cDNAs from a callus cDNA library. Plant J 6:615–624
Sasaki T, Yano M, Kurata N, Yamamoto K (1996) The Japanese Rice Genome Research Program. Genome Res 6:661–666
Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, Muramatsu M, Hayashizaki Y, Kawai J, Carninci P, Itoh M, Ishii Y, Arakawa T, Shibata K, Shinagawa A, Shinozaki K (2002) Functional annotation of a full-length Arabidopsis cDNA collection. Science 296:141–145
Selinger DW, Cheung KJ, Mei R, Johansson EM, Richmond CS, Blattner FR, Lockhart DJ, Church GM (2000) RNA expression analysis using a 30 base pair resolution Escherichia coli genome array. Nat Biotechnol 18:1262–1268
Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, Kodzius R, Watahiki A, Nakamura M, Arakawa T, Fukuda S, Sasaki D, Podhajska A, Harbers M, Kawai J, Carninci P, Hayashizaki Y (2003) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci USA 100:15776–15781
Song R, Llaca V, Messing J (2002) Mosaic organization of orthologous sequences in grass genomes. Genome Res 12:1549–1555
Stolc V, Li L, Wang X, Li X, Su N, Tongprasit W, Han B, Xue Y, Li J, Snyder M, Gerstein M, Wang J, Deng XW (2005) A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray. Plant Mol Biol 59:137–149
Tucholski J, Skowron PM, Podhajska AJ (1995) MmeI, a class-IIS restriction endonuclease: purification and characterization. Gene 157:87–92
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270:484–487
Wei CL, Ng P, Chiu KP, Wong CH, Ang CC, Lipovich L, Liu ET, Ruan Y (2004) 5' Long serial analysis of gene expression (LongSAGE) and 3' LongSAGE for transcriptome characterization and genome annotation. Proc Natl Acad Sci USA 101:11701–11706
Wu J, Maehara T, Shimokawa T, Yamamoto S, Harada C, Takazaki Y, Ono N, Mukai Y, Koike K, Yazaki J, Fujii F, Shomura A, Ando T, Kono I, Waki K, Yamamoto K, Yano M, Matsumoto T, Sasaki T (2002) A comprehensive rice transcript map containing 6,591 expressed sequence tag sites. Plant Cell 14:525–535
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, Pham P, Cheuk R, Karlin-Newmann G, Liu SX, Lam B, Sakano H, Wu T, Yu G, Miranda M, Quach HL, Tripp M, Chang CH, Lee JM, Toriumi M, Chan MM, Tang CC, Onodera CS, Deng JM, Akiyama K, Ansari Y, Arakawa T, Banh J, Banno F, Bowser L, Brooks S, Carninci P, Chao Q, Choy N, Enju A, Goldsmith AD, Gurjal M, Hansen NF, Hayashizaki Y, Johnson-Hopson C, Hsuan VW, Iida K, Karnes M, Khan S, Koesema E, Ishida J, Jiang PX, Jones T, Kawai J, Kamiya A, Meyers C, Nakajima M, Narusaka M, Seki M, Sakurai T, Satou M, Tamse R, Vaysberg M, Wallender EK, Wong C, Yamamura Y, Yuan S, Shinozaki K, Davis RW, Theologis A, Ecker JR (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302:842–846
Yamamoto K, Sasaki T (1997) Large-scale EST sequencing in rice. Plant Mol Biol 35:135–144
Yazaki J, Kishimoto N, Nakamura K, Fujii F, Shimbo K, Otsuka Y, Wu J, Yamamoto K, Sakata K, Sasaki T, Kikuchi S (2000) Embarking on rice functional genomics via cDNA microarray: use of 3′ UTR probes for specific gene expression analysis. DNA Res 7:367–370
Yazaki J, Kishimoto N, Ishikawa M, Kikuchi S (2002) Rice expression database: the gateway to rice functional genomics. Trends Plant Sci 7:563–564
Yazaki J, Kishimoto N, Nagata Y, Ishikawa M, Fujii F, Hashimoto A, Shimbo K, Shimatani Z, Kojima K, Suzuki K, Yamamoto M, Honda S, Endo A, Yoshida Y, Sato Y, Takeuchi K, Toyoshima K, Miyamoto C, Wu J, Sasaki T, Sakata K, Yamamoto K, Iba K, Oda T, Otomo Y, Murakami K, Matsubara K, Kawai J, Carninci P, Hayashizaki Y, Kikuchi S (2003) Genomics approach to abscisic acid- and gibberellin-responsive genes in rice. DNA Res 10:249–261
Yazaki J, Shimatani Z, Hashimoto A, Nagata Y, Fujii F, Kojima K, Suzuki K, Taya T, Tonouchi M, Nelson C, Nakagawa A, Otomo Y, Murakami K, Matsubara K, Kawai J, Carninci P, Hayashizaki Y, Kikuchi S (2004) Transcriptional profiling of genes responsive to abscisic acid and gibberellin in rice: phenotyping and comparative analysis between rice and Arabidopsis. Physiol Genomics 17:87–100
Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li X, Zheng H, Cong L, Lin L, Yin J, Geng J, Li G, Shi
J, Liu J, Lv H, Li J, Deng Y, Ran L, Shi X, Wang X, Wu Q, Li C, Ren X, Li D, Liu D, Zhang X, Ji Z, Zhao W, Sun Y, Zhang Z, Bao J, Han Y, Dong L, Ji J, Chen P, Wu S, Xiao Y, Bu D, Tan J, Yang L, Ye C, Xu J, Zhou Y, Yu Y, Zhang B, Zhuang S, Wei H, Liu B, Lei M, Yu H, Li Y, Xu H, Wei S, He X, Fang L, Huang X, Su Z, Tong W, Tong Z, Ye J, Wang L, Lei T, Chen C, Chen H, Huang H, Zhang F, Li N, Zhao C, Huang Y, Li L, Xi Y, Qi Q, Li W, Hu W, Tian X, Jiao Y, Liang X, Jin J, Gao L, Zheng W, Hao B, Liu S, Wang W, Yuan L, Cao M, McDermott J, Samudrala R, Wong GK, Yang H (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol 3:E38
Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, Wortman J, Buell CR (2005) The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol 138:18–26
Zhao W, Wang J, He X, Huang X, Jiao Y, Dai M, Wei S, Fu J, Chen Y, Ren X, Zhang Y, Ni P, Zhang J, Li S, Wang J, Wong GK, Zhao H, Yu J, Yang H, Wang J (2004) BGI_RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucl Acids Res 32:377–382
Zhou Y, Tang J, Walker MG, Zhang X, Wang J, Hu S, Xu H, Deng Y, Dong J, Ye L, Lin L, Li J, Wang X, Pan Y, Lin W, Tian W, Liu J, Wei L, Liu S, Yang H, Yu J (2003) Gene identification and expression analysis of 86,136 expressed sequence tags (EST) from the rice genome. Genomics Proteomics Bioinformatics 1:26–42
5 Rice Proteomics: A Step Toward Functional Analysis of the Rice Genome
Setsuko Komatsu
Department of Molecular Genetics, National Institute of Agrobiological Sciences, Tsukuba, 305-8602, Japan
Reviewed by Lee Tarpley
5.1 Significance
5.2 Database Based on 2D-PAGE
5.2.1 Strategy to Determine Amino Acid Sequences for Construction of the Rice Proteome Database
5.2.2 Format and Content of the Rice Proteome Database
5.2.3 How to Use the Rice Proteome Database
5.2.4 Cataloguing of Proteins in the Rice Proteome Database
5.2.5 Future Prospects of the Rice Proteome Database
5.3 Functional Analysis Using Differential Proteomics
5.3.1 Stresses
5.3.2 Hormones
5.4 Future Prospects
5.4.1 Two-Dimensional Liquid Chromatography and Fluorescence Two-Dimensional Difference Gel Electrophoresis
5.4.2 Identification of Protein Modification for Functional Analysis
5.4.3 Protein-Protein Interaction Analyses for Functional Prediction
5.4.4 Concluding Remarks
Acknowledgment
References
5.1 Significance

Rice is one of the world’s most important agricultural resources because it feeds almost half of the world’s population. Rice is also a model plant for biological research because its genome is smaller than those of other cereals (Devos and Gale 2000) and it has an important syntenic relationship with the
other cereal species (Gale and Devos 1998). The draft genome sequences for Oryza sativa L. ssp. indica (Yu et al. 2002) and for O. sativa L. ssp. japonica (Goff et al. 2002), and the complete map-based sequences of chromosome 1 (Sasaki et al. 2002) and chromosome 4 (Feng et al. 2002) for O. sativa L. cv. Nipponbare provide a rich resource for understanding the biological processes of rice. Recently, the International Rice Genome Sequencing Project (2005) presented a map-based, finished-quality sequence that covers 95% of the 389-Mb genome of rice, including virtually all of the euchromatin and two complete centromeres. The annotation of rice genomes has progressed at a rapid pace during the past few years, so that currently most of the predicted genes are supported by full-length cDNAs (Kikuchi et al. 2003). Once the rice genome is completely sequenced, the challenge ahead for the monocot plant research community will be to identify the function, regulation, protein–protein interactions, and type of posttranslational modification of each encoded protein. Also, whereas the genome is relatively static, the proteome is highly dynamic in its response to external and internal cellular events. The responses of the proteome can include changes not only to the relative abundance and posttranslational modifications of each protein but also to the interactions among proteins. Proteomics is a leading technology for the high-throughput analysis of proteins on a genome-wide scale. With the completion of many genome sequencing projects and the development of improved analytical methods for protein characterization, proteomics, or the study of the entire protein content of a cell or tissue, has become a major field of functional genomics. The initial objective of proteomics was the large-scale identification of all protein species in a cell or tissue. 
During the last few years, considerable research effort has been directed to the analysis of the rice proteome, and remarkable progress has been made in the systematic, functional characterization of proteins in the various tissues and organelles of rice (Komatsu et al. 2003; Komatsu and Tanaka 2004; Komatsu 2005). This approach is currently being extended to analyze various functional aspects of proteins. As part of this research, a system for direct differential display using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE; O'Farrell 1975) has been developed for the identification of rice proteins that vary in expression under different physiological conditions and among different tissues. This approach readily visualizes proteins, directly and rapidly identifies those with altered expression, and then analyzes their structure by comparison with the Rice Proteome Database (http://gene64.dna.affrc.go.jp/RPD/; Komatsu et al. 2004), or by mass spectrometry (MS) and Edman sequencing. This chapter, drawing from reports on rice, describes the comprehensive analysis and cataloging of rice proteins, and the functional analysis of rice using differential proteomics. Recent conceptual and technological advances are also briefly discussed.
5.2 Database Based on 2D-PAGE

Several databases based on 2D-PAGE of plant proteins are already available, such as WORLD-2DPAGE (http://expasy.org/ch2d/2dindex.html). For rice, catalogs of predicted membrane proteins, such as the Rice Membrane Protein Library (http://www.cbs.edu/rice/), are in the public domain, thus providing further support for rice proteomics efforts. In addition, the recently constructed Rice Proteome Database Web site (http://gene64.dna.affrc.go.jp/RPD/) provides extensive information on the progress of rice proteome research (Komatsu et al. 2004). Proteome analysis of select tissues and organelles has revealed diverse functional categories of proteins. Although many ubiquitous proteins have been identified that share similar functions in different tissues and organelles, most of the proteins are tissue- and subcellular compartment-specific. These results highlight the diversity of proteomes within the rice plant and hence the urgent need to analyze additional tissues and subcellular compartments to gain a more comprehensive understanding of the proteins encoded by the rice genome. The Rice Proteome Database is a compilation of known rice proteins, along with their subcellular localization and temporal expression patterns. However, the major advantage of the database is the wealth of newly identified proteins on which further experiments can be conducted at the biochemical and molecular levels (Komatsu et al. 2003). To date, proteomics studies have focused mainly on changes in genome expression that are triggered by environmental factors (Komatsu and Tanaka 2004). The aim of the research described here was a more systematic and comprehensive survey of the rice proteome: specifically, to separate proteins extracted from rice, to perform N-terminal and internal amino acid sequence analysis using a protein sequencer and MS, and to construct the Rice Proteome Database.
In addition to facilitating the identification of known proteins, the sequences in the database can be used to prepare oligodeoxyribonucleotides for cloning the corresponding cDNA. Finally, an attempt was also made to study the physiological significance of some of the proteins thus identified from rice.

5.2.1 Strategy to Determine Amino Acid Sequences for Construction of the Rice Proteome Database

For the rice proteome project, proteins were identified via various techniques, including gel comparison, microsequencing using a protein sequencer, and peptide mass fingerprinting using MS (Komatsu et al. 2003). The core of the Rice Proteome Database consists of a description of
each identified protein, including calculated properties such as molecular weight, isoelectric point, and expression level; experimentally determined properties such as amino acid sequences, peptide masses, and homologous proteins; and a 2D-PAGE image showing the location of the protein. Significant progress has been made toward identifying and cataloging the proteins of rice tissues and organelles. The capacity to evaluate the functions of rice proteins has been expanded by proteomic analysis of embryo (Fukuda et al. 2003), endosperm (Komatsu et al. 1993), root (Zhong et al. 1997), green shoot (Islam et al. 2004), etiolated shoot (Komatsu et al. 1999a), and suspension-cultured cells (Komatsu et al. 1999b); anther (Imin et al. 2001; Kerim et al. 2003); leaf sheath (Shen et al. 2002); various organelles such as Golgi (Mikami et al. 2001) and mitochondria (Heazlewood et al. 2003); and other subcellular compartments (Tanaka et al. 2004a). Tsugita et al. (1994) identified 4,892 proteins from nine tissues and one organelle of rice (leaf, stem, root, germ, dark-germinated seedling, seed, bran, chaff, callus, and chloroplast). Two other studies using rice (Oryza sativa L. cv. Nipponbare) provided a more detailed proteomic analysis of leaf, root, and seed (Koller et al. 2002), and of callus, root, and leaf sheath (Tanaka et al. 2004b). However, each of these studies was conducted using different methods, making comparison of the results difficult. To avoid this problem, a consistent methodology based on 2D-PAGE (O'Farrell 1975) has been used throughout the study described here. Proteome analysis using 2D-PAGE has the power to monitor global changes that occur in the protein expression profiles of tissues and subcellular compartments. 
In this study, proteins extracted from 23 tissues and subcellular compartments were separated in the first dimension using isoelectric focusing (IEF) tube gels for the low pH range (4.0 to 7.0) or linear immobilized pH gradient (IPG) tube gels for the high pH range (6.0 to 10.0) (Hirano et al. 2000). Separation in the second dimension was achieved via SDS-PAGE. After detection via Coomassie brilliant blue R-250 (CBB) staining, proteins were analyzed via Image-Master 2D Elite software (Amersham Biosciences, Piscataway, NJ). The 2D maps of the low and high pH ranges overlapped at around pH 6.0. To obtain N-terminal amino acid sequences by Edman sequencing, the proteins, after separation via 2D-PAGE, were electroblotted onto a polyvinylidene difluoride membrane and detected via CBB staining. The internal amino acid sequences were determined by analyzing the sequences of peptides obtained by peptide mapping using V8 protease (Cleveland et al. 1977). The spots or bands were excised from the membrane and applied to a gas-phase protein sequencer (Procise®, Applied Biosystems, Foster City, CA).
For MS, individual protein spots were excised from the gel and digested with the site-specific protease trypsin, resulting in a set of tryptic peptides. The peptides were extracted, and their masses were measured via matrix-assisted laser desorption ionization-time of flight MS (Voyager™, Applied Biosystems). The list of measured peptide masses was compared with the masses of the predicted tryptic peptides for each entry in the sequence database. The following three criteria were used to select a true positive match with proteins that were not clearly identified: (1) the mass deviation between the experimental and theoretical peptide masses had to be less than 50 ppm; (2) at least four different predicted peptide masses were needed to match the observed masses for an identification to be considered valid; and (3) the matching peptides had to cover at least 10% of the complete protein sequence. Further, the score obtained from the Mascot software (Matrix Science, London) analysis that indicates the probability of a true positive identification had to be at least 50.

5.2.2 Format and Content of the Rice Proteome Database

As a complement to more focused studies, and to facilitate future advances in rice functional genomics, the Rice Proteome Database was constructed (Komatsu et al. 2004). This database compiles information about proteins identified on 2D-PAGE maps of protein extracts from a wide variety of rice tissues and subcellular compartments. Each entry in the Rice Proteome Database corresponds to one protein from the 2D-PAGE image file. The three main features specific to the Rice Proteome Database are briefly summarized as follows:

(i)
The reference 2D-PAGE map shows the position of the identified entry. Spot numbers are displayed on this 2D-PAGE image. The spot list contains a table listing the proteins on each 2D-PAGE map in the Rice Proteome Database. Experimental protocols used for protein purification and 2D-PAGE, with either IEF or IPG in the first dimension, are shown on this page. The 2D-PAGE image was synthesized as a composite of gels run using the two different first-dimension methods and the positions of individual proteins on the gels were evaluated using Image-Master 2D Elite software.

(ii) The spot information pages provide a range of information about each protein spot, including mapping procedure and spot coordinates; the calculated properties of the protein such as molecular mass, isoelectric point, and expression level; the experimentally determined properties, such as amino acid sequences and peptide masses obtained using protein sequencers and mass spectrometry, respectively, and the homologous proteins
predicted by these two methods; and other information. The accession number of each homologous protein links to the NCBI site (http://www.ncbi.nlm.nih.gov/). Other information shows the accession number and the percentage identity of the homologous full-length cDNA in rice, and biological information such as the known function or functions obtained experimentally.

(iii) The Mascot Search Results page displays the peptide masses derived from mass spectrometry. This page brings together the Mascot Search Results such as the accession numbers of homologous proteins, scores, sequence coverage, and predicted peptides. This page is also linked to the Mascot Web site (http://www.matrix-science.com/).

The Rice Proteome Database has links to the NIAS rice genome tools, which are the Rice Expression Database (RED), the Rice Full-length cDNA Database (KOME), the Rice Genome Integrated Map Database (INE), the Rice Mutant Panel Database (Tos17), the Rice Genome Annotation Database (RiceGAAS), and DNA Bank. The Rice Proteome Database also links to many useful proteomics tools and other proteomics databases.

5.2.3 How to Use the Rice Proteome Database

The Rice Proteome Database can be accessed through the Rice Proteome Database homepage at http://gene64.dna.affrc.go.jp/RPD/. The Rice Proteome Database homepage and the contents of the Rice Proteome Database are maintained by the author. The Rice Proteome Database homepage provides introductory material on the Rice Proteome Database. A Rice Proteome Database entry may be obtained from the server in four different ways:

(i)
By selecting a spot on one of the 2D-PAGE reference maps. The Rice Proteome Database contains information on proteins identified from several tissues and organelles on 2D-PAGE reference maps. These 2D-PAGE maps can be reached by clicking the individual tissues/organelles denoted by red boxes. Only spots with sequence data are highlighted and labeled “Annotation Data Available”.

(ii) By “protein keyword” or “protein database accession identifiers” using the protein name or accession number. The Rice Proteome Database can be searched using proteins as keywords.

(iii) By isoelectric point and molecular weight for any protein. The Rice Proteome Database can be searched with a range of isoelectric points and molecular weights.

(iv) By similarity search with the user’s amino acid sequences. The query sequence can be searched using the homology search tools
BLASTP and BLASTX for the presence of amino acid sequences identical to or similar to previously reported amino acid sequences in the Rice Proteome Database.

5.2.4 Cataloging of Proteins in the Rice Proteome Database

The current release contains 23 reference maps from rice samples that are either tissue-specific, such as suspension-cultured cells, endosperm, embryo, crown (the basal part of the young seedling leaf sheath), seedling root, seedling leaf sheath, seedling leaf blade, stem, mature plant root, mature plant leaf sheath, mature plant leaf blade, anthers, panicle before heading, and panicle after heading and 1 week after flowering; or specific to a subcellular location, such as cell wall, plasma membrane, vacuole membrane, Golgi membrane, mitochondrion, chloroplast, nucleus, and cytosol. These reference maps of proteins from various tissues and subcellular fractions have a total of 13,129 identified protein spots, corresponding to 5,236 separate protein entries in the database. The information on amino acid sequences is updated frequently. Tissue-specific proteins include polypeptides involved in general metabolism, energy production, transcription, and signal transduction in the leaf sheath; in metabolism and defense in the root; and in metabolism, energy production, cell growth, defense, and signal transduction in suspension-cultured cells (Tanaka et al. 2004a). The proportion of N-blocked proteins in the leaf sheath, root, and suspension-cultured cell samples was 46%, 56%, and 38%, respectively. This result is consistent with a previous report in which 134 rice proteins were subjected to sequencing and 79 proteins (59%) were found to have blocked N-termini (Tsugita et al. 1994). The proteins specific to a subcellular location are involved in a variety of processes, such as respiration and the citric acid cycle in mitochondria; photosynthesis and ATP synthesis in chloroplasts; and antifungal defense and signaling in the membranes.
The N-terminal amino acid sequences of many subcellular compartment-specific proteins could not be determined and these proteins were inferred to have a blocking group at their N-terminus. Edman degradation revealed that 60% to 98% of the N-terminal sequences were blocked, and that the ratios of blocked to unblocked proteins varied among the proteomes of the various subcellular compartments (Tanaka et al. 2004b).

5.2.5 Future Prospects of the Rice Proteome Database

In the future, information on posttranslational modifications such as phosphorylation, glycosylation, and other modifications, obtained experimentally
by immunoblot analysis, will be added to the Rice Proteome Database. As new samples are evaluated, the number of identified proteins will be increased, and new information from functional analysis of physiologically significant proteins will be added to the Rice Proteome Database with regular updates. Analysis by 2D-PAGE provides a convenient way to study the various proteins that are present in rice and identify those that are regulated in response to different growth and/or stress conditions. Knowing where and when individual proteins are being synthesized in rice, with respect to tissue, subcellular compartment, and developmental stage, can also provide new clues about their function. The partial amino acid sequences determined for these proteins will contribute greatly to the field of plant molecular biology, by allowing the identification of new rice proteins of interest through homology searches. The information thus obtained from the Rice Proteome Database will be helpful in predicting the function of rice proteins and will aid in their molecular cloning in future experiments.
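The acceptance criteria for peptide mass fingerprinting listed in Section 5.2.1 (mass deviation below 50 ppm, at least four matching predicted peptide masses, at least 10% sequence coverage, and a Mascot score of at least 50) can be written down as a simple filter. The following is a minimal sketch under stated assumptions, not the software used for the Rice Proteome Database: the `PredictedPeptide` container, the function names, and the example masses are hypothetical and serve only to illustrate how the four thresholds combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedPeptide:
    """A predicted tryptic peptide (hypothetical container for this sketch)."""
    mass: float  # theoretical peptide mass in Da
    start: int   # 0-based start position in the protein sequence
    end: int     # end position, exclusive


def ppm_error(observed: float, theoretical: float) -> float:
    """Absolute mass deviation in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


def passes_pmf_criteria(observed_masses, predicted, protein_length, mascot_score):
    """Apply the three matching criteria plus the Mascot score cut-off."""
    # (1) a predicted peptide counts as matched if any observed mass
    #     lies within 50 ppm of its theoretical mass
    matched = [p for p in predicted
               if any(ppm_error(m, p.mass) < 50.0 for m in observed_masses)]
    # (2) at least four different predicted peptide masses must match
    enough_peptides = len({p.mass for p in matched}) >= 4
    # (3) matched peptides must cover at least 10% of the sequence
    covered = set()
    for p in matched:
        covered.update(range(p.start, p.end))
    enough_coverage = len(covered) / protein_length >= 0.10
    # Additional requirement: Mascot score of at least 50
    return enough_peptides and enough_coverage and mascot_score >= 50


# Illustrative values only; not data from the Rice Proteome Database.
predicted = [PredictedPeptide(1000.0, 0, 10), PredictedPeptide(1200.0, 20, 30),
             PredictedPeptide(1500.0, 40, 55), PredictedPeptide(1800.0, 60, 75)]
observed = [1000.02, 1200.01, 1500.03, 1800.05]
print(passes_pmf_criteria(observed, predicted, protein_length=300, mascot_score=62))
```

In this example all four predicted peptides match within 50 ppm and together cover 50 of 300 residues, so the candidate is accepted. Removing one observed mass or lowering the score below 50 would cause it to be rejected.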
5.3 Functional Analysis Using Differential Proteomics

One of the most commonly used methods for quantitative proteomics is 2D-PAGE coupled to either MS or protein sequencing. In the 2D-PAGE–based approach, intact proteins are separated by 2D-PAGE, and the abundance of a protein is determined based on the stain intensity of the protein spot on the gel. The differential proteome is confirmed by image analysis software. The identity of the protein is generally determined by MS analysis of peptides after proteolysis of the protein spot or by protein sequencing after blotting the gel to a membrane. The 2D-PAGE–based approach has been routinely used for large-scale quantitative proteomics analyses.

5.3.1 Stresses

To grow and develop optimally, plants need to perceive and process information from both their biotic and abiotic surroundings. Because plants are not motile, they have to be especially responsive to environmental changes, including stress conditions. Although the responses of cereals to several stresses are well understood at the physiological and transcriptional levels, they are not well understood at the biochemical level. Proteomics approaches to identifying proteins that are differentially regulated in response to environmental conditions are becoming commonplace in post-genomic research in cereals. Initial steps
toward determining the physiological significance of some proteins identified in cereals exposed to abiotic and biotic stresses are described in the following sections.

Cold
Crop plants in tropical and subtropical regions can be seriously injured by temperatures below 12 °C but above the freezing point (Lyons 1973). A primary, if not exclusive, effect of chilling is considered to be the phase transition of membrane lipids at the critical temperatures (Lyons 1973; Raison et al. 1971). The way plants acclimatize to cold stress is not well understood at the biochemical level, but rice seedlings exposed to low temperatures show various changes in their transcriptome. For example, microarray analysis has shown that 36 rice genes appear to be induced under cold stress, and that the expression level for several genes reaches a maximum after 24 h of cold treatment (Rabbani et al. 2003). Although this gene expression profiling has deepened our understanding of the response of rice to cold stress, it is still unknown how the transcriptional changes are reflected at the translational level. Changes in the transcriptome are not always closely correlated with changes in the protein profile (Gygi et al. 1999b). With this limitation in mind, a proteomic study of rice was carried out to gain a better understanding of the molecular mechanisms of acclimatization to cold stress. Rice seedlings were exposed to a progressively lower temperature stress treatment involving successive shifts from the normal growth temperature to 15, 10, and 5 °C (Cui et al. 2005). From these seedlings, approximately 1,700 protein spots were separated and visualized on CBB-stained 2D-PAGE gels. Sixty protein spots were found to be up-regulated in response to the progressively lower temperature treatment and to display various induction patterns. These cold-responsive proteins included four protein biosynthesis factors, four molecular chaperones, two proteases, eight enzymes involved in biosynthesis of cell wall components, seven antioxidative/detoxifying enzymes, two proteins of unknown function, and proteins linked to energy metabolism and signal transduction.
One of these proteins was identified as ferritin, and ferritin was also found to be cold-responsive in the earlier microarray study (Rabbani et al. 2003). The appearance of ferritin in both studies strengthens the conclusion that this protein may play a role in protecting cells from cold stress. A large proportion of the proteins (43.9%) were predicted to be located in the plastids, implying that the plastid proteome is particularly responsive to cold stress.
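Differential display comparisons such as the cold-stress experiment above rest on comparing the stain intensity of each spot between a control gel and a treated gel. The sketch below illustrates one simple way such a comparison could be set up. It is not the algorithm of any particular image analysis package: the function names, the normalization to total spot volume, and the 1.5-fold cut-off are all assumptions made for illustration, and the spot volumes are made up.

```python
def normalize(volumes):
    """Express each spot volume as a fraction of the total volume on its gel."""
    total = sum(volumes.values())
    return {spot: v / total for spot, v in volumes.items()}


def classify_spots(control, treated, fold=1.5):
    """Label each spot present on both gels as 'up', 'down', or 'unchanged'.

    `fold` is an assumed cut-off, not a value taken from the chapter.
    Spots must have a non-zero volume on the control gel.
    """
    control_n, treated_n = normalize(control), normalize(treated)
    labels = {}
    for spot in control_n.keys() & treated_n.keys():
        ratio = treated_n[spot] / control_n[spot]
        if ratio >= fold:
            labels[spot] = "up"
        elif ratio <= 1.0 / fold:
            labels[spot] = "down"
        else:
            labels[spot] = "unchanged"
    return labels


# Made-up spot volumes for illustration only.
control = {"s1": 100.0, "s2": 100.0, "s3": 100.0, "s4": 100.0}
treated = {"s1": 300.0, "s2": 100.0, "s3": 30.0, "s4": 100.0}
labels = classify_spots(control, treated)
print(sorted(labels.items()))
```

Spots that appear only on one gel are ignored by this sketch and would need separate handling, as would replicate gels and statistical testing.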
Drought
Drought tolerance is required in plants that experience prolonged deficits in soil water. Tolerant plants can maintain the water content of their tissues, survive a reduction in tissue water content, and recover more completely after rewatering. Drought is one of the most severe limitations to the productivity of rain-fed lowland and upland rice. Drought is also one of the major factors limiting the yield of sugar beet. Hajheidari et al. (2005) reported that the efficiency of breeding for increased drought tolerance could be greatly improved by the identification of candidate genes for marker-assisted selection. One way to identify potentially important drought-tolerance genes is to analyze drought-induced changes in the proteome. To this end, two genotypes of sugar beet differing in genetic background were cultivated in the field. Certain proteins in these plants showed genotype-specific patterns of up- or down-regulation in response to drought; these proteins included ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), plus 11 other proteins involved in redox regulation, oxidative stress, signal transduction, and chaperone activities. Some of these proteins could contribute a physiological advantage under drought, making them potential targets for marker-assisted selection for drought tolerance. In the case of rice, Salekdeh et al. (2002) reported a proteomic analysis of drought-conditioned leaves of 3-week-old plants. They compared protein expression in the drought-tolerant cultivars IR62266 (lowland indica) and CT9993 (upland indica). Of more than 1,000 protein spots detected in leaf extracts, 42 proteins showed a significant change in abundance under stress, with 27 of them exhibiting a different response pattern between the two cultivars. For example, the expression of chloroplast superoxide dismutase (SOD)[Cu-Zn] changed significantly in opposite directions in the two cultivars in response to drought.
Ten days after rewatering, the abundance of all the drought-responsive proteins had returned more-or-less completely to that of the well-watered control. In CT9993 and IR62266, the proteins that increased most in response to drought were S-like RNase homolog, actin depolymerizing factor, and RuBisCO activase, whereas the protein that decreased most was isoflavone reductase-like protein. Recently, another study used a proteomic approach to investigate changes in protein expression during the initial response of rice to drought stress (Ali and Komatsu 2006). Two-week-old rice seedlings were exposed to drought conditions for 2 to 6 days, and proteins were extracted from leaf sheaths, separated by 2D-PAGE and stained with CBB. After drought stress for 2 to 6 days, 10 proteins increased and two proteins decreased in abundance. The functional categories of these proteins were identified as
defense, energy, metabolism, cell structure, and signal transduction. Interestingly, SOD was drought-responsive in both japonica and indica rice (see earlier), suggesting this is a key enzyme in drought stress. The effects of drought stress on the proteome were also compared to those of several other stress conditions. The levels of actin depolymerizing factor, light-harvesting complex chain II, SOD, and salt-induced protein (SALT) were changed by drought and osmotic stresses, but not by cold or salt stresses or abscisic acid treatment. We also analyzed the effect of drought stress on leaf sheath proteins of a drought-tolerant rice cultivar. Light-harvesting complex chain II and actin depolymerizing factor were present at high levels in the drought-tolerant rice cultivar even before stress application. With drought stress, actin depolymerizing factor was also expressed in leaf blades, leaf sheaths, and roots. These results suggest that actin depolymerizing factor is one of the target proteins induced by drought stress.

Salinity
Like drought, high salinity also causes a water deficit in plants. Salt stress is a major abiotic stress in agriculture worldwide, with an estimated 20% of Earth’s land mass and nearly half of all irrigated land affected by salinity. Increased salinization of arable land is expected to have devastating global effects, with predictions of 30% land loss within the next 25 years, and up to 50% by the year 2050 (Yan et al. 2005). Response to salinity is a very complex quantitative trait. The plant cell apoplast is a dynamic compartment involved in a variety of functions during normal growth and under stress conditions, and has a primary role in cell nutrition, because cells import ions and metabolites from the apoplast (Dani et al. 2005). Salt ions can have a specific detrimental effect on plasma membranes or, after uptake into the protoplast, may cause reduced germination efficiency, inhibition of plant growth, delayed flower emergence, or early leaf senescence (Dani et al. 2005). Dani et al. (2005) used Nicotiana tabacum plants as a model to investigate changes in apoplast soluble-protein composition induced in response to salt stress. Using a vacuum infiltration procedure, apoplastic fluid was extracted from leaves of control plants and plants exposed to salt stress. Quantitative evaluation and statistical analyses of the spots resolved in treated and untreated samples revealed 20 polypeptides whose abundance changed in response to salt stress. Among these, two chitinases and a germin-like protein increased significantly, and two lipid transfer proteins were expressed entirely de novo. Some apoplastic polypeptides, involved in cell wall modifications during plant development, remained largely unchanged.
Rice is generally considered to be sensitive to salinity. A proteomics approach was used to identify rice proteins that increase in abundance under this type of stress in leaf sheath, root, and leaf blade (Abbasi and Komatsu 2004). In rice leaf sheath exposed to 50 mM NaCl for 24 h, eight proteins consistently showed significant changes in abundance. Of these eight proteins, three were unidentified, but the other five were identified as oxygen evolving enhancer protein 2 (OEE2), two fructose bisphosphate aldolases, two SODs, and one protein of unknown function. This study also revealed that increased expression of SOD by salt stress in leaf sheath is a common response to cold (5 °C), drought, and abscisic acid treatments. This finding suggests that the accumulation of SOD in response to salt, drought, and cold stress has a generally protective role against stress conditions. Under salt stress, enhanced expression of OEE2 and aldolase in leaf sheath was also detected in leaf blade. These results indicate that different specific sets of proteins are enhanced in distinct regions of the rice plant and show a coordinated response to salt stress. Yan et al. (2005) also reported a proteomic analysis of salt stress in rice. Three-week-old seedlings were treated with 150 mM NaCl for 24, 48, and 72 h. Based on 2D-PAGE patterns, more than 1,100 proteins were reproducibly detected, including 34 that were up-regulated and 20 that were down-regulated. Three spots were identified as the same protein, enolase. Whereas four of the changed proteins were previously identified as SALT, six were novel: UDP-glucose pyrophosphorylase, cytochrome c oxidase subunit 6b-1, glutamine synthetase root isozyme, putative nascent polypeptide associated complex α-chain, putative splicing factor-like protein, and putative actin-binding protein.

Ozone
Ozone (O3) is a destructive gaseous pollutant that seriously affects human and animal respiration, as well as causing extensive damage to both natural and cultivated plant populations (Chameides et al. 1997). The resistance of rice to O3 is a quantitative trait controlled by nuclear genes (Kim et al. 2004). The identification of quantitative trait loci and analysis of molecular markers of O3 resistance is important for increasing the resistance of rice to O3 stress. Quantitative trait loci associated with the O3 resistance of rice were mapped on chromosomes using recombinant inbred lines from a cross between Milyang 23 and Gihobyeo. The quantitative trait loci were tightly linked to three markers and were detected in each of three replicates. The association between these markers and O3 resistance in rice cultivars and doubled haploid populations was analyzed. The markers permit the screening of rice germplasm for O3 resistance and the introduction of resistance into elite lines in breeding programs. This study,
by identifying ozone-related quantitative trait loci, provides an increased understanding of ozone responsiveness in rice and may lead to applications in breeding for enhanced ozone tolerance. Plant responses to ozone have also been analyzed via a proteomics approach. In rice leaves, ozone caused marked visible necrotic damage and increases in ascorbate peroxidase proteins; these changes were accompanied by rapid changes in the 2D-PAGE protein profile (Agrawal et al. 2002). Of 56 proteins investigated, 52 protein spots were visually identified to be differentially expressed relative to the control. Ozone caused marked reductions in the major photosynthetic proteins in leaf, including RuBisCO, and the induction of various defense- and stress-related proteins. This research provides evidence for the specific and rapid accumulation of certain proteins, such as PR proteins (OsPR5 and OsPR10), ascorbate peroxidase, SOD[Mn], and the ATP-dependent caseinolytic protease, which could serve as sensitive markers to monitor ozone-related damage in rice.

Fungus
The ability of plants to defend themselves against most potential pathogens depends on sensitive perception mechanisms that recognize microbial invaders and subsequently activate defense responses. Rice blast disease, caused by Magnaporthe grisea, is the most serious disease of cultivated rice in most rice-growing regions of the world (Valent et al. 1991). The M. grisea–rice interaction is a model system for understanding plant disease, not only because of its great economic importance, but also because of the genetic and molecular–genetic tractability of the fungus (Valent et al. 1991). A proteomics approach has been applied to the study of pathogen-infected rice (Kim et al. 2003). Proteins were extracted from suspension-cultured cells after inoculation with the rice blast fungus M. grisea, or treatment with an elicitor or other signal molecules such as jasmonic acid (JA), salicylic acid, or H2O2. Analysis by 2D-PAGE identified 14 protein spots that showed increased or decreased expression after these treatments. Of these protein spots, 12 proteins from six different genes were identified. Specifically, OsPR10, isoflavone reductase-like protein, β-glucosidase, and a putative receptor-like protein kinase were induced by rice blast fungus, whereas six isoforms of probenazole-inducible protein and two isoforms of SALT responded to blast fungus, elicitor, and JA. Western blot analysis to quantify the expression levels of probenazole-inducible protein, OsPR10, and SALT revealed that these proteins, which take part in incompatible interactions, were induced earlier and to a greater extent than were proteins involved in compatible reactions.
Setsuko Komatsu
Konishi et al. (2001) identified proteins that showed expression changes in rice leaf blade infected with M. grisea. Using proteome analysis, the same study also showed that quantitative expression changes in these proteins were greatly influenced by the levels of nitrogen fertilizer. Rice plants that have been exposed to excessive nitrogen fertilizer are more susceptible to blast disease than are those exposed to low levels of nitrogen. In contrast to low-nitrogen rice plants, high-nitrogen rice plants show many more lesions, and these lesions are larger. Twelve proteins that appeared to change according to the level of nitrogen were identified. For example, the amounts of RuBisCO large and small subunits were increased after a nitrogen top dressing, but the RuBisCO small subunit was decreased after nitrogen top dressing combined with blast fungus infection. After blast fungus infection, PR-1 was induced by a nitrogen top dressing. It was proposed that these proteins might be involved in incompatible interactions in rice plants after blast fungus infection.
Virus
Rice yellow mottle virus (RYMV), a member of the genus Sobemovirus, is endemic to Africa (Pinel et al. 2000; Abubakar et al. 2003), and is considered to be very detrimental to rice production. With only four open reading frames, this virus could be considered as a model for studying the genetics and genomics of virus resistance (Brugidou et al. 2002). The response to RYMV infection has been analyzed in cells of two cultivars of rice: indica rice IR64, which is susceptible to infection; and japonica rice cv. Azucena, which is partially resistant to RYMV (Ventelon-Debout et al. 2004). Of the proteins resolved on 2D-PAGE gels, 64 (40 proteins in IR64, and 24 proteins in Azucena) responded to RYMV infection. Nineteen differentially regulated proteins were identified for the IR64 cultivar, and 13 were identified for the Azucena cultivar. These included proteins in three functional categories: metabolism, stress-related proteins, and translation. This study shows that several proteins regulated by abiotic stress response pathways are also activated by RYMV; these include SALT, heat-shock proteins, and SOD. On the other hand, other proteins seem to be more specific to RYMV infection, such as dehydrin and proteins involved in glycolysis.
5.3.2 Hormones
Several plant hormones are thought to regulate flowering by moving from the leaves to the shoot apex. This view is based mainly on the effect on flowering of mutations in genes affecting hormone synthesis or hormone
5 Rice Proteomics
signal transduction, and of exogenous applications of hormones or hormone inhibitors (Suarez-Lopez 2005). Plant hormones also play an important role in many aspects of signal transduction in cells, as well as in several growth and development pathways, such as seed dormancy/germination, stem elongation, leaf expansion, and fruit development.
Gibberellins
Gibberellins (GAs) are essential regulators that stimulate stem or internodal elongation (Hooley 1994). Proteins that are regulated by the GA response in rice leaf sheath elongation have been analyzed via the differential display proteome method (Shen et al. 2003). When leaf sheaths of 2-week-old rice seedlings were treated with GA3 for 48 h, 32 of the 352 leaf sheath protein spots detected on 2D-PAGE gels showed altered expression. Of these proteins, two 56-kDa protein spots of different isoelectric points (pI 4.0 and 4.3), both identified as calreticulin, showed different expression levels in the GA3-treated leaf sheath. The expression level of the pI 4.0 spot was down-regulated in response to GA3, whereas the pI 4.3 spot was up-regulated. In an earlier study, a calreticulin with a pI value of 4.5, which has been identified subsequently as the pI 4.3 spot, was found to be phosphorylated in vitro in short-term suspension-culture cells, whereas no protein with a pI value of 4.0 was phosphorylated (Komatsu et al. 1996). Together these data suggest that the twin 56-kDa spots represent phosphorylated and unphosphorylated forms of calreticulin, and that the phosphorylated form becomes more abundant in response to GA3. The above study of GA-regulated proteins focused on specific proteins involved in leaf sheath elongation in rice (Shen et al. 2003). Another study analyzed GA-regulated proteins in leaf sheath, root, and suspension-cultured cells of rice (Tanaka et al. 2004b). Lists of proteins present in these tissues were constructed and used to investigate the effects of GA3 treatment. Proteins from rice leaf sheath, root, and suspension-cultured cells were analyzed by 2D-PAGE, and the expression of 8, 21, and 14 proteins, respectively, was found to be changed by the addition of exogenous GA3.
In the leaf sheath, the proteins that responded to GA3 were involved in transcriptional regulation (the Osem gene and replication protein A1), primary metabolism (fructokinase, lactoylglutathione lyase, and OEE2), and signal transduction (putative receptor-like kinase). In the root tissue, proteins affected by GA3 treatment appeared to be involved in defense reactions (glutathione S-transferase, SOD[Cu-Zn], Bowman-Birk protease inhibitor, glutathione S-transferase-dependent dehydroascorbate reductase and PR-1), suggesting that GA has an essential role in defense reactions in rice roots. In suspension-cultured cells, the GA-regulated
proteins fell into several functional categories, including metabolism (formate dehydrogenase and thioredoxin), energy (glyceraldehyde-3-phosphate dehydrogenase), cell growth (growth factor 14-c protein), protein folding (chaperonin 60), transcription (nucleotide binding protein 2 and homeobox), defense (phenylalanine ammonia-lyase and glutathione S-transferase), signal transduction (small G protein), transport (voltage-dependent anion channel), and hypothetical proteins. The GA-regulated proteins in these tissues might play a significant role in tissue growth stimulated by GA.
Brassinosteroids
Brassinosteroids (BRs) are naturally occurring plant steroids with structural similarities to animal steroid hormones. Exogenous application of BRs to plant tissues evokes various growth responses, such as cell elongation, proliferation, differentiation, and organ bending, and enhanced stress tolerance (Sasse 1997). A proteomics study based on the application of BRs to the lamina joint or root of 2-week-old rice seedlings has been reported (Konishi and Komatsu 2003). Lamina inclination was markedly stimulated by brassinolide (BL), an active BR molecule, whereas root elongation was inhibited. On 2D gels, 786 proteins were detected in extracts from the lamina joint and 508 in root extracts. BL treatment induced changes in the expression of nine proteins in the lamina joint and 12 proteins in the root. Most of these proteins were related to photosynthesis in the lamina joint and to stress tolerance in the root. After BL-induced inclination of the lamina joint, degradation of the RuBisCO large subunit was observed, suggesting that inclination to receive more light than usual might be associated with degradation of the RuBisCO large subunit.
Jasmonic Acid
JA is one of the simplest nontraditional plant hormones and has diverse functions, including potential roles in plant defense as part of more complex signaling pathways. JA treatment of the leaf and stem of 2-week-old rice seedlings was found to result in necrosis, accompanied by marked reductions in the abundance of RuBisCO subunits (Rakwal and Komatsu 2000). JA-treated stem tissues showed especially strong induction of several novel proteins, including a basic 28-kDa Bowman-Birk proteinase inhibitor and an acidic 17-kDa PR-1 protein. Immunoblot analysis using antibodies generated against these proteins revealed a tissue-specific expression pattern and time-dependent induction after JA treatment. Further, this induction was blocked by a protein synthesis inhibitor,
indicating de novo protein synthesis in response to JA. These results indicate that JA affects defense-related gene expression in rice seedlings, as judged by the de novo synthesis of novel proteins with potential roles in plant defense.
Auxin
Auxin plays a critical role in apical dominance, and in lateral root initiation and emergence (Casimaro et al. 2001). In rice, root formation is regulated by auxin coupled with zinc, and a proteomics analysis of 2-week-old seedlings and suspension-cultured cells treated with auxin and zinc found seven proteins to be upregulated by this treatment (Oguchi et al. 2004a, 2004b; Yang et al. 2005). Of these proteins, NADPH-dependent oxidoreductase, methylmalonate-semialdehyde dehydrogenase (MMSDH), and elongation factor 1β΄ (EF-1β΄) were strongly up-regulated, as compared with the untreated control. NADPH oxidoreductase and MMSDH were detected in suspension-cultured cells, root, and leaf sheath, but not in leaf blade. The abundance of MMSDH protein also was increased in GA-treated suspension-cultured cells, as well as in the constitutive GA response mutant slr1, indicating that MMSDH is regulated by the GA signal transduction pathway. During root formation stimulated by auxin and zinc, the expression of NADPH oxidoreductase, MMSDH, and EF-1β΄ was increased, suggesting that these proteins play an important role in the formation of roots in rice.
5.4 Future Prospects
The strengths and weaknesses of current technologies for rice proteomics are discussed in this section. Recent conceptual and technological advances are also briefly discussed. Challenges posed by different methodological approaches and techniques for rice proteomics are considered, along with the usefulness of bioinformatics for database and cluster analysis applications in the field of proteomics.
5.4.1 Two-Dimensional Liquid Chromatography and Fluorescence Two-Dimensional Difference Gel Electrophoresis
The isotope coded affinity tag (ICAT) technique has been developed for quantitative comparisons without 2D-PAGE (Gygi et al. 1999a). However, problems relating to reproducibility and the number of replicates required for establishing statistical significance have yet to be completely resolved
(Rabilloud 2002). Alternative “gel-less” approaches, such as multidimensional protein identification technology (MudPIT), have been used effectively to catalog a large number of polypeptides in total protein mixtures from several organisms, including rice (Koller et al. 2002; Whitelegge 2002). However, although MudPIT is an excellent way to generate an exhaustive catalog of proteins present in a particular protein sample, it does not yield reproducible quantitative information (Rose et al. 2004). Komatsu et al. (2006) compared two proteomics techniques, 2D liquid chromatography (2D-LC) and fluorescence 2D difference gel electrophoresis (2D-DIGE), for their ability to identify proteins regulated by the plant hormone GA in rice. For 2D-LC, proteins were extracted from 5 μM GA3-treated and untreated tissues, purified with a ProteomeLab PF 2D kit (Beckman Coulter, Fullerton, CA), and separated on a pI gradient and reversed-phase columns. The image of the 2D map generated by 2D-LC was visualized and analyzed via ProteoVue software (Beckman Coulter). For 2D-DIGE, proteins extracted from 5 μM GA3-treated and untreated tissues were labeled with Cy5 and Cy3 (Amersham Biosciences), respectively. The image of the 2D map generated by 2D-PAGE was visualized and analyzed using DeCyder software (Amersham Biosciences). The 2D-LC resolved 1,248 proteins and 2D-DIGE resolved 1,500 proteins. Of these proteins, 2D-LC identified nine that were up-regulated and nine that were down-regulated by GA3 treatment, while 2D-DIGE identified four up-regulated and four down-regulated proteins. In our previous studies, 32 proteins from leaf sheaths grown in vitro (Shen et al. 2003), and eight proteins from leaf sheaths of intact rice plants (Tanaka et al. 2004a) were regulated by GA during elongation.
However, most of the 32 proteins detected using leaf sheath segments were stress-related proteins and only five of the proteins from intact treated plants were within the normal range of IEF; the other three migrated to the basic side of the IPG gels. These results suggest that 2D-LC and 2D-DIGE are powerful methods for detecting previously overlooked changes in the proteome. Both 2D-LC and 2D-DIGE detected many more proteins than standard 2D-PAGE/CBB staining. Glyceraldehyde-3-phosphate dehydrogenase and OEE2 were two of the few GA3-regulated proteins detected using 2D-PAGE. Moreover, the two new 2D methods were able to detect unknown and novel GA3-regulated proteins that had not been reported previously. The 2D-DIGE was the best method for detecting low-abundance GA-responsive proteins. For example, this method detected several GA3-repressed proteins in the basal region, one of which was ATP sulfurylase, the first enzyme of the sulfate assimilation pathway. ATP sulfurylase is localized primarily in plastids, but there is also a minor cytosolic form
(Leustek et al. 2000). Some aspects of plant sulfur metabolism, including the transport and cycling/degradation of sulfur compounds, are still unclear (Leustek et al. 2000). Nevertheless, the decrease in ATP sulfurylase after GA3 application indicates that GA regulates the first step of sulfur metabolism in rice plants. 2D-LC was the best method for detecting low-molecular-weight proteins. For example, this method detected an increase in acyl-CoA-binding protein after GA3 treatment. Cytosolic 10-kDa acyl-CoA-binding proteins are prevalent in eukaryotes and are highly conserved across species, suggesting that their physiological roles have been preserved through evolution. In plants, fatty acids synthesized in the chloroplasts are exported as acyl-CoA esters to the endoplasmic reticulum (Leung et al. 2004). The increase in acyl-CoA-binding protein after GA3 treatment indicates that GA regulates lipid metabolism. The ability of 2D-LC and 2D-DIGE to detect these proteins suggests that these methods are among the most useful tools for detecting regulated proteins of low molecular weight or low abundance. The 2D-LC technique has been developed to improve quantitative comparisons of protein mixtures in the absence of 2D-PAGE and is the preferred method for detecting low-molecular-weight proteins. However, poor reproducibility and the large number of replicates required to establish statistical significance are problems that still must be resolved. On the other hand, the 2D-DIGE technique can make exact quantitative comparisons and is very sensitive. Therefore, a combination of 2D-LC and 2D-DIGE, along with other proteomic methodologies, is the best way to obtain a comprehensive picture of changes in the proteome.
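As a rough illustration of how a two-dye comparison of this kind can be scored, the sketch below computes the log2 ratio of treated (Cy5) to untreated (Cy3) spot volume and labels each spot as up-regulated, down-regulated, or unchanged. The spot names, the volumes, and the 1.5-fold cutoff are hypothetical and are not taken from Komatsu et al. (2006); dedicated image-analysis software applies its own normalization and statistics. The final lines only restate the counts reported above as a share of the proteins each method resolved.

```python
import math


def log2_ratio(treated, control):
    """log2 of the treated (Cy5) over the control (Cy3) spot volume."""
    return math.log2(treated / control)


def classify(treated, control, fold=1.5):
    """Label a spot as up, down, or unchanged relative to a fold cutoff.

    The 1.5-fold default is an assumption made for this sketch.
    """
    ratio = log2_ratio(treated, control)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "up"
    if ratio <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical normalized spot volumes: (Cy5 GA3-treated, Cy3 untreated).
spots = {
    "spot_1": (2400.0, 1200.0),
    "spot_2": (500.0, 1000.0),
    "spot_3": (1050.0, 1000.0),
}
calls = {name: classify(cy5, cy3) for name, (cy5, cy3) in spots.items()}
print(calls)

# Counts reported in the text: (GA3-responsive proteins, proteins resolved).
reported = {"2D-LC": (9 + 9, 1248), "2D-DIGE": (4 + 4, 1500)}
shares = {m: 100 * hit / total for m, (hit, total) in reported.items()}
for method, share in shares.items():
    print(f"{method}: {share:.2f}% of resolved proteins called GA3-responsive")
```

Working on a log scale makes a doubling and a halving symmetric around zero, which is why the same cutoff can be applied in both directions.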
5.4.2 Identification of Protein Modification for Functional Analysis
Once the rice genome is completely sequenced, the challenge for the plant research community will be to identify the function, regulation, and type of posttranslational modification of each encoded protein. The responses of the proteome can include changes not only in the relative abundance, but also in the posttranslational modification of each protein. Such efforts are at present complicated by the various posttranslational modifications that proteins can experience, including glycosylation, lipid attachment, phosphorylation, methylation, disulfide bond formation, and proteolytic cleavage. Whereas these and other posttranslational protein modifications have been well characterized in Eucarya and Bacteria, specific posttranslational modifications in rice have received far less attention. With the completion of genome sequencing in rice, it is now theoretically possible to identify signaling components through phosphoproteomics. To
address this challenging problem, a number of techniques can be used to detect and identify phosphorylated proteins. Labeling of the proteins with 32P is a highly selective and sensitive technique for detecting phosphoproteins (Immler et al. 1998; Larsen et al. 2001). In vitro labeling with [γ-32P]ATP followed by 2D-PAGE separation and exposure to X-ray film allows direct visualization and rough quantification of phosphoprotein spots. However, the most powerful method for analyzing phosphoproteins is mass spectrometry because of its sensitivity in detecting phosphorylation directly from excised protein spots (Resing and Ahn 1997). Signaling pathways need to be regarded as complex networks. These signal networks are characterized by multiple points of convergence and divergence that enable integration of signaling pathways at various levels and provide the molecular basis for an appropriate response. Khan et al. (2005) carried out a detailed phosphoproteome analysis in various tissues of rice using an in vitro protein phosphorylation technique followed by mass spectral analysis. Their study investigated changes in protein phosphorylation caused by treatments with various plant hormones and stresses (Khan et al. 2005). To test whether the exogenous application of hormones could change the phosphorylation status of rice leaf sheath proteins, rice leaf sheath segments were treated with plant hormones in vitro. A similar overall pattern of phosphorylation was observed no matter which hormone was used. However, there was some specificity in the responses to the various hormones. The phosphorylation status of six proteins changed in response to GA3: Ca2+-binding protein, glyceraldehyde-3-phosphate dehydrogenase, cytoplasmic malate dehydrogenase, putative zinc-finger protein, glyoxalase-I, and an unknown protein.
Three proteins showed changes in phosphorylation status in response to BL: glyceraldehyde-3-phosphate dehydrogenase, cytoplasmic malate dehydrogenase, and aldo/keto reductase family protein. Five proteins responded to 2,4-D treatment: glyceraldehyde-3-phosphate dehydrogenase, cytoplasmic malate dehydrogenase, putative zinc-finger protein, glyoxalase-I, and calmodulin-related protein. The phosphorylation of putative zinc-finger protein and glyoxalase-I was increased by both GA3 and 2,4-D treatment, but not by BL. Phosphorylation of aldo/keto reductase family protein was increased only by BL, while the phosphorylation of calmodulin-related protein was increased only by 2,4-D. Further, only GA3 treatment caused changes in the phosphorylation of Ca2+-binding protein and an unknown protein. These results demonstrate hormone-specific phosphorylation. In contrast, the phosphorylation of glyceraldehyde-3-phosphate dehydrogenase and cytoplasmic malate dehydrogenase was enhanced by all of the hormones. Glyceraldehyde-3-phosphate dehydrogenase and cytoplasmic malate
dehydrogenase are involved in the synthesis of various metabolites and the subsequent production of energy. The enhanced phosphorylation of these proteins in response to several different hormones indicates that this may be the mechanism through which the hormones activate metabolic pathways in rice leaf sheath and thus stimulate plant growth.
5.4.3 Protein–Protein Interaction Analyses for Functional Prediction
Eubel et al. (2005) have stated that even if improvements in standard 2D gel techniques could further alleviate the problems posed by protein hydrophobicity, a complete understanding of the processes taking place within the cell requires much more than just identification of the individual polypeptides forming the proteome. Most cellular processes require the action of several enzymes, many of which contain multiple subunits. Further, to raise the efficiency, specificity, and speed of metabolic pathways, these enzymes often are associated, transiently or stably, into large protein complexes. Knowledge of the composition and structure of these protein complexes will result in a much deeper understanding of metabolic pathways and cellular processes than can be delivered by protein identities alone (Eubel et al. 2005). There are many ways to investigate protein interactions, each with its advantages and drawbacks. Many of the approaches commonly used focus on the actual or possible interaction partners of a particular protein of interest. Examples include yeast two-hybrid systems, co-immunoprecipitation, pull-down assays, and in vivo fluorescence techniques. However, none of these approaches is designed to provide a global overview of protein–protein interactions in a given complex proteome in a single experiment. Further, these studies often lack the rigorous quantitative analyses necessary to assess reproducibility or to group proteins into expression classes. In contrast, differential proteomics lends itself well to quantitative analysis.
One recent example is a study by Lonhosky et al. (2004), using de-etiolated maize chloroplasts as a model system. In this report, hierarchical and nonhierarchical statistical methods were used to analyze the expression patterns of 526 high-quality, unique protein spots on 2D gels. A general protocol was developed that can be used to generate high-quality, reproducible data sets for comparative plant proteomics. Although a growing number of comparative proteomics studies have been reported for plant systems, the grouping of proteins into expression classes has generally been qualitative rather than quantitative. One quantitative approach that can be used to determine relationships among proteins is cluster analysis, which groups proteins according to their
expression pattern over multiple samples. Using this approach, identified proteins with an unknown function can be related to other proteins that have similar expression patterns and whose functions have been determined. In a recent rice study, Tanaka et al. (2005) reported the quantitative analysis of changes in basal region proteins at five time points after sowing. Cluster analysis of differentially accumulated proteins during development was also carried out to clarify relationships among the proteins. To estimate cluster interactions in this study, mathematical gene interaction network optimization software (Minos) was developed (Tanaka et al. 2005). Protein time-course data were clustered using the unweighted pair group method with arithmetic mean (UPGMA). The analysis was performed in two steps. The first step, clustering, involved two stages. Each time course was first normalized to its initial value, and the natural logarithm of the normalized value was taken. In the first stage, proteins were clustered on these log-normalized values. The second stage used the fluctuation during the time course, evaluated at each time point as the difference between the current log-normalized value and the preceding one. The second step estimated the interactions between clusters. Interacting clusters were estimated from a representative time course for each cluster, calculated at each time point as an average. Minos utilized the S-system differential equation formula (Tanaka et al. 2005) and estimated the cluster interaction by a set of differential equation coefficients that simulate the time course.
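The preprocessing and clustering steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Minos implementation: the spot names and abundances are hypothetical, Euclidean distance is assumed because the distance measure is not given here, and the two stages are run as two separate UPGMA clusterings because the way they are combined is not specified. The S-system fitting step is not covered.

```python
import math


def log_normalize(series):
    """Divide each value by the first time point, then take the natural log."""
    first = series[0]
    return [math.log(value / first) for value in series]


def fluctuation(log_series):
    """Difference between each log-normalized value and the preceding one."""
    return [b - a for a, b in zip(log_series, log_series[1:])]


def euclidean(u, v):
    """Euclidean distance (an assumption; the source does not name one)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(features, n_clusters):
    """Average-linkage agglomerative clustering (UPGMA).

    features maps a protein name to its feature vector. Returns a sorted
    list of clusters, each a sorted list of protein names.
    """
    clusters = [[name] for name in features]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # UPGMA distance: mean over all cross-cluster member pairs.
                pairs = [euclidean(features[a], features[b])
                         for a in clusters[i] for b in clusters[j]]
                dist = sum(pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


def representative(cluster, log_courses):
    """Average the log-normalized time courses of a cluster's members."""
    length = len(log_courses[cluster[0]])
    return [sum(log_courses[p][t] for p in cluster) / len(cluster)
            for t in range(length)]


# Hypothetical spot abundances at five time points (illustrative only).
abundance = {
    "spot_A": [1.0, 1.5, 2.2, 3.1, 4.0],
    "spot_B": [2.0, 3.1, 4.5, 6.0, 8.2],
    "spot_C": [5.0, 4.0, 3.1, 2.4, 2.0],
    "spot_D": [3.0, 2.5, 1.8, 1.5, 1.2],
}
log_courses = {name: log_normalize(v) for name, v in abundance.items()}
# Stage 1 clusters on log-normalized values; stage 2 on the fluctuations.
by_level = upgma(log_courses, 2)
by_change = upgma({n: fluctuation(v) for n, v in log_courses.items()}, 2)
print(by_level)
print(by_change)
print(representative(by_level[0], log_courses))
```

Normalizing to the first time point lets spots of very different absolute abundance be compared on the shape of their time course, which is why the rising spots group together here despite differing twofold in starting level.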
One protein found to be increased during development of the basal region was annotated as a hypothetical protein of unknown function in rice. This protein was up-regulated along with Cluster 13 (containing antifungal protein 2) and Cluster 15 (containing fructokinase), and was down-regulated along with Cluster 30 (containing RuBisCO binding protein α subunit and 60S ribosomal protein L19). Conversely, expression of this protein was regulated oppositely to Cluster 29 (containing ADP-ribose pyrophosphatase). These results suggest that the unknown protein is related to pathogen defense, sugar metabolism, and protein maintenance. The abnormal long morphology protein in Cluster 32 and a nucleoprotein in Cluster 12 were regulated oppositely to calreticulin precursor in Cluster 16, suggesting that these proteins are regulated by Ca2+ signaling. The ORC in Cluster 10 and translationally controlled tumor protein homolog in Cluster 11 were regulated oppositely to tyrosyl-tRNA synthetase in Cluster
17, and might be directly related to protein synthesis. In contrast, another functionally unknown protein, NAC6 in Cluster 4, did not interact with other clusters, and so it is not possible to extrapolate its function. The differential display of proteins with 2D-PAGE is a powerful approach to the study of complex patterns of protein expression over the course of development. Cluster interaction analyses of these expression patterns, based on the S-system, will be very useful in the identification of protein functions. This approach will be applied to resolving the interactions between proteins and might lead to identifying the roles for proteins involved in rice plant development.
5.4.4 Concluding Remarks
In conclusion, analysis by 2D-PAGE provides a convenient way to study the various proteins that are present in rice and to identify those that are regulated in response to different environmental or stress conditions. Knowing where and when individual proteins are synthesized in rice, with respect to the tissue, subcellular compartment, and developmental stage, can also provide clues to their function. The partial amino acid sequences determined for these proteins will contribute greatly to the field of plant molecular biology by facilitating, through homology searches, the identification of new rice proteins of interest. The information thus provided by the rice proteome database will help predict the function of proteins and aid in their molecular cloning, facilitate the development of biomarkers, and contribute to the construction of transgenic plants. Such studies will provide us with increasing knowledge about the regulation of agronomically important traits and accelerate the breeding of crops with high productivity, good quality, and broad stress and disease resistance. The rice proteomics research of today promises to contribute much to the development of the high-yield, sustainable agriculture of tomorrow.
Acknowledgment
The author is grateful to Dr. Lee Tarpley for his reading of the manuscript.
References
Abbasi F, Komatsu S (2004) A proteomic approach to analyze salt-responsive proteins in rice leaf sheath. Proteomics 4:2072–2081
Abubakar Z, Ali F, Pinel A, Traore O, N'Guessan P, Notteghem JL, Kimmins F, Konate G, Fargette D (2003) Phylogeography of Rice yellow mottle virus in Africa. J Gen Virol 84:733–743
Agrawal GK, Rakwal R, Yonekura M, Kubo A, Saji H (2002) Proteome analysis of differentially displayed proteins as a tool for investigating ozone stress in rice (Oryza sativa L.) seedlings. Proteomics 2:947–959
Ali GM, Komatsu S (2006) Proteomic analysis of rice leaf sheath during drought stress. J Proteome Res 5:396–403
Brugidou C, Opalka N, Yeager M, Beachy RN, Fauquet C (2002) Stability of rice yellow mottle virus and cellular compartmentalization during the infection process in Oryza sativa (L.). Virology 297:98–108
Casimaro I, Marchant A, Bhalerao RP, Beeckman T, Dhooge S, Swarup R, Graham N, Inze D, Sandberg G, Casero PJ, Bennett M (2001) Auxin transport promotes Arabidopsis lateral root initiation. Plant Cell 13:843–852
Chameides WL, Saylor RD, Cowling EB (1997) Ozone pollution in the rural United States and the new NAAQS. Science 276:916
Cleveland DW, Fisher SG, Kirschner MW, Laemmli UK (1977) Peptide mapping by limited proteolysis in sodium dodecyl sulphate and analysis by gel electrophoresis. J Biol Chem 252:1102–1106
Cui S, Huang F, Wang J, Ma X, Cheng Y, Liu J (2005) A proteomic analysis of cold stress responses in rice seedlings. Proteomics 5:3162–3172
Dani V, Simon WJ, Duranti M, Croy RR (2005) Changes in the tobacco leaf apoplast proteome in response to salt stress. Proteomics 5:737–745
Devos KM, Gale MD (2000) Genome relationships: the grass model in current research. Plant Cell 12:637–646
Eubel H, Braun H-P, Millar H (2005) Blue-native PAGE in plants: a tool in analysis of protein-protein interactions.
Plant Methods 1:1–13
Feng Q, Zhang Y, Hao P, Wang S, Fu G, Huang Y, Li Y, Zhu J, Liu Y, Hu X, Jia P, Zhang Y, Zhao Q, Ying K, Yu S, Tang Y, Weng Q, Zhang L, Lu Y, Mu J, Lu Y, Zhang LS, Yu Z, Fan D, Liu X, Lu T, Li C, Wu Y, Sun T, Lei H, Li T, Hu H, Guan J, Wu M, Zhang R, Zhou B, Chen Z, Chen L, Jin Z, Wang R, Yin H, Cai Z, Ren S, Lv G, Gu W, Zhu G, Tu Y, Jia J, Zhang Y, Chen J, Kang H, Chen X, Shao C, Sun Y, Hu Q, Zhang X, Zhang W, Wang L, Ding C, Sheng H, Gu J, Chen S, Ni L, Zhu F, Chen W, Lan L, Lai Y, Cheng Z, Gu M, Jiang J, Li J, Hong G, Xue Y, Han B (2002) Sequence and analysis of rice chromosome 4. Nature 420:316–320
Fukuda M, Islam N, Woo SH, Yamagishi A, Takaoka M, Hirano H (2003) Assessing matrix assisted laser desorption/ionization-time of flight-mass spectrometry as a means of rapid embryo protein identification in rice. Electrophoresis 24:1319–1329
Gale MD, Devos KM (1998) Comparative genetics in the grasses. Proc Natl Acad Sci USA 95:1971–1974
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S,
Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100
Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R (1999a) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotech 17:994–999
Gygi SP, Rochon Y, Franza BR, Aebersold R (1999b) Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19:1720–1730
Hajheidari M, Abdollahian-Noghabi M, Askari H, Heidari M, Sadeghian SY, Ober ES, Hosseini-Salekdeh G (2005) Proteome analysis of sugar beet leaves under drought stress. Proteomics 5:950–960
Heazlewood JL, Howell KA, Whelan J, Millar AH (2003) Towards an analysis of the rice mitochondrial proteome. Plant Physiol 132:230–242
Hirano H, Kawasaki H, Sassa H (2000) Two-dimensional gel electrophoresis using immobilized pH gradient tube gels. Electrophoresis 21:440–445
Hooley R (1994) Gibberellins: perception, transduction and responses. Plant Mol Biol 26:1529–1555
Imin N, Kerim T, Weinman JJ, Rolfe BG (2001) Characterization of rice anther proteins expressed at the young microspore stage. Proteomics 1:1149–1161
Immler D, Gremm D, Kirsch D, Spengler B, Presek P, Meyer HE (1998) Identification of phosphorylated proteins from thrombin-activated human platelets isolated by two-dimensional gel electrophoresis by electrospray ionization-tandem mass spectrometry (ESI-MS/MS) and liquid chromatography-electrospray ionization-mass spectrometry (LC-ESI-MS). Electrophoresis 19:1015–1023
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome.
Nature 436:793–800
Islam N, Lonsdale M, Upadhyaya NM, Higgins TJ, Hirano H, Akhurst R (2004) Protein extraction from mature rice leaves for two-dimensional gel electrophoresis and its application in proteome analysis. Proteomics 4:1903–1908
Kerim T, Imin N, Weinman JJ, Rolfe BG (2003) Proteome analysis of male gametophyte development in rice anthers. Proteomics 3:738–751
Khan M, Takasaki H, Komatsu S (2005) Comprehensive phosphoproteome analysis in rice and identification of phosphoproteins responsive to different hormones/stresses. J Proteome Res 4:1592–1599
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H,
Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379 Kim KM, Kwon YS, Lee JJ, Eun MY, Sohn JK (2004) QTL mapping and molecular marker analysis for the resistance of rice to ozone. Mol Cells 17 151–155 Kim ST, Cho KS, Yu S, Kim SG, Hong JC, Han C-D, Bae DW, Nam MH, Kang KY (2003) Proteomic analysis of differentially expressed proteins induced by rice blast fungus and elicitor in suspension-cultured rice cells. Proteomics 3:2368–2378 Koller A, Washburn MP, Lange BM, Andon NL, Deciu C, Haynes PA, Hays L, Schieltz D, Ulaszek R, Wei J, Wolters D, Yates JR 3rd (2002) Proteomic survey of metabolic pathways in rice. Proc Natl Acad Sci USA 99:11969– 11974 Komatsu S (2005) Rice proteome database: a step toward functional analysis of the rice genome. Plant Mol Biol 59:179–190 Komatsu S, Tanaka N (2004) Rice proteome analysis: a step toward functional analysis of the rice genome. Proteomics 4:938–949 Komatsu S, Kajiwara H, Hirano H (1993) A rice protein library: a data-file of rice proteins separated by two-dimensional electrophoresis. Theor Appl Genet 86:935–942 Komatsu S, Masuda T, Abe K (1996) Phosphorylation of a protein (pp56) is related to the regeneration of rice cultured suspension cells. Plant Cell Physiol 37:748–753 Komatsu S, Muhammad A, Rakwal R (1999a) Separation and characterization of proteins from green and etiolated shoots of rice (Oryza sativa L.): towards a rice proteome. Electrophoresis 20:630–636 Komatsu S, Rakwal R, Li Z (1999b) Separation and characterization of proteins in rice (Oryza sativa) suspension cultured cells. Plant Cell, Tissue Organ Culture 55:183–192 Komatsu S, Konishi H, Shen S, Yang G (2003) Rice proteomics: a step toward functional analysis of the rice genome. 
Mol Cell Proteomics 2:2–10 Komatsu S, Kojima K, Suzuki K, Ozaki K, Higo K (2004) Rice Proteome Database based on two-dimensional polyacrylamide gel electrophoresis: its status in 2003. Nucl Acids Res 32:388–392 Komatsu S, Zang X, Tanaka N (2006) Comparison of two proteomics techniques used to identify proteins regulated by gibberellin in rice. J. Proteome Res 5:270–276 Konishi H, Komatsu S (2003) A proteomics approach to investigating promotive effects of brassinolide on lamina inclination and root growth in rice seedlings. Biol Pharm Bull 26:401–408 Konishi H, Ishiguro K, Komatsu S (2001) A proteomics approach towards understanding blast fungus infection of rice grown under different levels of nitrogen fertilization. Proteomics 1:1162–1171 Larsen MR, Sorensen GL, Fey SJ, Larsen PM, Roepstorff P (2001) Phosphoproteomics: evaluation of the use of enzymatic de-phosphorylation and
differential mass spectrometric peptide mass mapping for site specific phosphorylation assignment in proteins separated by gel electrophoresis. Proteomics 1:223–238 Leung K-C, Li H-Y, Mishra G, Chye M-L (2004) ACBP4 and ACBP5, novel Arabidopsis acyl-CoA-binding proteins with kelch motifs that bind oleoyl-CoA. Plant Mol Biol 55:297–309 Leustek T, Martin MN, Bick J-A, Davies JP (2000) Pathways and regulation of sulfur metabolism revealed through molecular and genetic studies. Annu Rev Plant Physiol Plant Mol Biol 51:141–165 Lonosky PM, Zhang X, Honavar VG, Dobbs DL, Fu A, Rodermel SR (2004) A proteomic analysis of maize chloroplast biogenesis. Plant Physiol 134:560–574 Lyons JM (1973) Chilling injury in plants. Annu Rev Plant Physiol 24:445–466 Mikami S, Hori H, Mitsui T (2001) Separation of distinct components of rice Golgi complex by sucrose density gradient centrifugation. Plant Sci 161:665–675 O'Farrell PH (1975) High resolution two-dimensional electrophoresis of proteins. J Biol Chem 250:4007–4021 Oguchi K, Tanaka N, Komatsu S, Akao S (2004a) Methylmalonate-semialdehyde dehydrogenase is induced in auxin- and zinc-stimulated root formation in rice. Plant Cell Rep 22:848–858 Oguchi K, Tanaka N, Komatsu S, Akao S (2004b) Characterization of NADPH-dependent oxidoreductase from rice induced by auxin and zinc. Physiol Plant 121:124–131 Pinel A, N'Guessan P, Bousalem M, Fargette D (2000) Molecular variability of geographically distinct isolates of Rice yellow mottle virus in Africa. Arch Virol 145:1621–1638 Rabbani MA, Maruyama K, Abe H, Khan MA, Katsura K, Ito Y, Yoshiwara K, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2003) Monitoring expression profiles of rice genes under cold, drought, and high-salinity stresses and abscisic acid application using cDNA microarray and RNA gel-blot analyses. Plant Physiol 133:1755–1767 Rabilloud T (2002) Two-dimensional gel electrophoresis in proteomics: old, old fashioned, but it still climbs up the mountains. 
Proteomics 2:3–10 Raison JK, Lyons JM, Keith AD (1971) Temperature-induced phase changes in mitochondrial membranes detected by spin labeling. J Biol Chem 246:4036–4040 Rakwal R, Komatsu S (2000) Role of jasmonate in the rice (Oryza sativa L.) self-defense mechanism using proteome analysis. Electrophoresis 21:2492–2500 Resing KA, Ahn NG (1997) Protein phosphorylation analysis by electrospray ionization-mass spectrometry. Methods Enzymol 283:29–44 Rose JK, Bashir S, Giovannoni JJ, Jahn MM, Saravanan RS (2004) Tackling the plant proteome: practical approaches, hurdles and experimental tools. Plant J 39:715–733 Salekdeh GH, Siopongco J, Ghareyazie B, Bennett J (2002) Proteomic analysis of rice leaves during drought stress and recovery. Proteomics 2:1131–1145
Sasaki T, Matsumoto T, Yamamoto K, Sakata K, Baba T, Katayose Y, Wu J, Niimura Y, Cheng Z, Nagamura Y, Antonio BA, Kanamori H, Hosokawa S, Masukawa M, Arikawa K, Chiden Y, Hayashi M, Okamoto M, Ando T, Aoki H, Arita K, Hamada M, Harada C, Hijishita S, Honda M, Ichikawa Y, Idonuma A, Iijima M, Ikeda M, Ikeno M, Ito S, Ito T, Ito Y, Ito Y, Iwabuchi A, Kamiya K, Karasawa W, Katagiri S, Kikuta A, Kobayashi N, Kono I, Machita K, Maehara T, Mizuno H, Mizubayashi T, Mukai Y, Nagasaki H, Nakashima M, Nakama Y, Nakamichi Y, Nakamura M, Namiki N, Negishi M, Ohta I, Ono N, Saji S, Sakai K, Shibata M, Shimokawa T, Shomura A, Song J, Takazaki Y, Terasawa K, Tsuji K, Waki K, Yamagata H, Yamane H, Yoshiki S, Yoshihara R, Yukawa K, Zhong H, Iwama H, Endo T, Ito H, Hahn JH, Kim HI, Eun MY, Yano M, Jiang J, Gojobori T (2002) The genome sequence and structure of rice chromosome 1. Nature 420:312–316 Sasse JM (1997) Recent progress in brassinosteroids research. Physiol Plant 100:696–701 Shen S, Matsubae M, Takao T, Tanaka N, Komatsu S (2002) A proteomic analysis of leaf sheath from rice. J Biochem 132:613–620 Shen S, Sharma A, Komatsu S (2003) Characterization of proteins responsive to gibberellin in the leaf-sheath of rice (Oryza sativa L.) seedling using proteome analysis. Biol Pharm Bull 26:129–136 Suarez-Lopez P (2005) Long-range signaling in plant reproductive development. Int J Dev Biol 49:761–771 Tanaka N, Fujita M, Handa H, Murayama S, Uemura M, Kawamura Y, Mitsui T, Mikami S, Tozawa Y, Yoshinaga T, Komatsu S (2004a) Proteomics of the rice cell: Systematic identification of the protein population in subcellular compartments. Mol Gen Genomics 271:566–576 Tanaka N, Konishi H, Khan M, Komatsu S (2004b) Proteome analysis of rice tissues separated and visualized by two-dimensional electrophoresis: approach to investigating the gibberellin regulated proteins. 
Mol Gen Genomics 270:485–496 Tanaka N, Mitsui S, Nobori H, Yanagi K, Komatsu S (2005) Expression and function of proteins during development of the basal region in rice seedlings. Mol Cell Proteomics 4:796–808 Tsugita A, Kawakami T, Uchiyama Y, Kamo M, Miyatake N, Nozu Y (1994) Separation and characterization of rice proteins. Electrophoresis 15:708–720 Valent B, Farrall L, Chumley FG (1991) Magnaporthe grisea genes for pathogenicity and virulence identified through a series of backcrosses. Genetics 127:87–101 Ventelon-Debout M, Delalande F, Brizard J-P, Diemer H, Van Dorsselaer A, Brugidou C (2004) Proteome analysis of cultivar-specific deregulations of Oryza sativa indica and O. sativa japonica cellular suspensions undergoing Rice yellow mottle virus infection. Proteomics 4:216–225 Whitelegge JP (2002) Plant proteomics: BLASTing out of a MudPIT. Proc Natl Acad Sci USA 99:11564–11566 Yan S, Tang Z, Su W, Sun W (2005) Proteomic analysis of salt stress-responsive proteins in rice root. Proteomics 5:235–244
Yang G, Inoue A, Takasaki H, Kaku H, Akao S, Komatsu S (2005) A proteomic approach to analyze auxin and zinc-responsive protein in rice. J Proteome Res 4:456–463 Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92 Zhong B, Karibe H, Komatsu S, Ichimura H, Nagamura Y, Sasaki T, Hirano H (1997) Screening of rice genes from a cDNA based on the sequence data-file of proteins separated by two-dimensional electrophoresis. Breeding Sci 47:255–251
6 Metabolomics: Enabling Systems-Level Phenotyping in Rice Functional Genomics
Lee Tarpley¹ and Ute Roessner²
¹Texas A&M Agricultural Research and Extension Center, Beaumont, TX 77713, USA; ²Australian Centre for Plant Functional Genomics, School of Botany, University of Melbourne, 3010 Victoria, Australia
Reviewed by Tony Ashton
6.1 Significance
6.2 Plant Sampling and Chemical Analysis
6.3 Case Studies in Rice Metabolomics
6.4 Case Studies Integrating Functional Genomic Levels
6.5 Time and Space Limitations in Integrated Functional-Genomic Analyses
6.6 Metabolite Response to Perturbation
6.7 Databases and Resources
6.8 Data Analysis
6.9 Summary
References
6.1 Significance
Metabolomics is the comprehensive analysis of low-molecular-weight compounds in biological samples, and aims to determine genetic, environmental, and developmental influences on global and local aspects of metabolite composition, metabolism, and systems biology (Raamsdonk et al. 2001; Roessner et al. 2001; Stitt and Fernie 2003; Dunn et al. 2005; Fukusaki and Kobayashi 2005). In conjunction with other approaches, such as transcriptomics, proteomics, and glycomics, that aim to determine other cell products, metabolomics is a critical tool of phenotyping strategies on a systems level because the comprehensive measure of metabolite changes after imposed or natural perturbations helps to reveal the functions of genes, proteins, and metabolites. For
example, metabolomics provides comprehensive phenotyping at the cellular or tissue biochemical level (Roessner et al. 2001), allowing us to link transcript data to phenotypic responses (Stitt and Fernie 2003). The links between transcript data and comprehensive cellular biochemical data can be applied to the study of unknown gene functions, as has been shown for the identification of silent mutations (Raamsdonk et al. 2001). Comparison of the metabolomes of different samples can also support the characterization of links between genes and phenotypes. Thus, as part of a broad phenotypic analysis, metabolomics allows us to validate gene function experimentally (Stitt and Fernie 2003), and in a similar fashion protein function (see Chapter 5, this volume). Indeed, metabolomics has become an important player in functional genomics as well as in systems biology (Kitano 2002).
Because of the diverse chemical nature of metabolomes, no single analytical technique allows the analysis of the complete metabolome. Therefore, a range of different methodologies for extraction, separation, detection, and quantification needs to be established for greatest coverage. In the past decade, a number of methods and analytical technologies have been successfully developed for the analysis of a large number of different metabolites from many different species. Most of the common technologies are based on chromatographic separation of complex compound mixtures via either liquid or gas chromatography. When these are coupled to mass spectrometric detection methods, great specificity and selectivity can be achieved. In addition, nuclear magnetic resonance (NMR) spectroscopy has played and will continue to play a major role in metabolomics approaches. Current limitations to the routine performance of metabolomic analyses as part of rice functional genomics programs include access to expertise in the techniques of plant metabolomics, access to the necessary instrumentation, the establishment of adequate sample preparation procedures, the need for coordinated comprehensive cataloging and/or control of environmental variables, the availability of databases providing storage of and access to metabolomic data alongside other functional-genomics data, and the interpretation of metabolomics data. None of these limitations is likely to remain much longer. By learning from other successful plant metabolomics approaches, rice metabolomics is poised to become a standard technology in rice functional genomics.
6.2 Plant Sampling and Chemical Analysis
The plant metabolome is strongly influenced by environment and development, as well as by the genotype. Many examples have demonstrated
that a number of environmental and physiological factors influence the plant metabolome and must therefore be either thoroughly controlled or documented (Dunn et al. 2005; Fukusaki and Kobayashi 2005). These include, in particular, light conditions, diurnal time point of harvest, developmental stage of the plant, geographical and seasonal variation, and water and nutrient supply. Implicit in this list are factors such as the history of external temperatures and light spectral quality. Once the plants or populations have been grown under replicated, well-defined conditions, the critical step in metabolite analysis is the sampling of the plant tissue and the immediate quenching of metabolism. Metabolism is extremely dynamic, and enzymatic reactions give metabolites widely varying half-lives. Methods to stop enzymatic alterations include rapid freezing or treatments with acid (Dunn et al. 2005). Different methods for preserving metabolite composition before metabolite analysis in rice tissues have been described. Morino et al. (2005) directly homogenized callus tissue in ice-cold methanol. Takahashi et al. (2005) lyophilized leaves and panicles. Sato et al. (2004) investigated the extraction process for metabolomic analysis of rice leaves in detail. Although their interest was compatibility with metabolite determinations using capillary electrophoresis mass spectrometry (CE-MS) and capillary electrophoresis diode array detection (CE-DAD), their initial procedure (freezing the tissue in liquid nitrogen, rapid mashing with a bead mill, extraction with ice-cold methanol, and later addition of ice-cold water to improve the solubility of certain metabolites) would be more widely applicable. Tarpley et al. (2005) developed procedures for taking different sections from multiple rice plants. 
To remove soil from the seedlings, they first washed the seedlings and then placed the roots in tap water to maintain rice plant integrity for the short period before the destructive sampling. On sectioning, the tissue slices were plunged within a few seconds into liquid nitrogen and held there until all sections were collected. Sections were stored in nitrogen-purged vials at –80°C until lyophilization. The process for extraction and preparation of metabolites may vary depending on the class of compound of interest and the intended analytical detection system. Sato et al. (2004) provided a thorough discussion of metabolite extraction and preparation for CE-MS and CE-DAD of rice samples, and Tarpley et al. (2005) provided a detailed methodology for gas chromatography (GC)-MS of rice samples. These processes are further explored in the review by Dunn et al. (2005). These presentations provide a good starting point for future attempts to employ metabolomics in rice research. The selection of an instrument for analysis depends on several factors, including (1) the desire to detect primary/central metabolites or secondary metabolites or both, (2) desired throughput, (3) available amount of sample, and (4) the aim for metabolite identification versus fingerprinting of
samples. The diverse analytical instrumentation so far used for rice metabolomics is described in more detail in the paragraphs that follow. The examples strongly support the general statement that no single analytical technology is sufficient for comprehensive metabolite analysis, and that the choice of technology depends on the needs of the study (Dunn et al. 2005).
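The environmental and physiological factors listed in this section can be captured as a structured record that travels with each sample. The following is a minimal sketch in Python, not a published schema: the class name `SampleRecord`, its field names, and the helper `missing_fields` are all hypothetical and simply mirror the factors named above.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class SampleRecord:
    """Growth and harvest conditions documented for one sample (hypothetical schema)."""
    sample_id: str
    genotype: str               # e.g. cultivar name
    tissue: str
    developmental_stage: str
    harvest_clock_time: str     # diurnal time point of harvest, e.g. "10:00"
    light_conditions: str
    water_supply: str
    nutrient_supply: str
    location_and_season: str
    quenching_method: str       # e.g. "liquid nitrogen"


def missing_fields(record: SampleRecord) -> List[str]:
    """Return the names of fields left blank, in declaration order."""
    return [name for name, value in asdict(record).items()
            if not str(value).strip()]
```

Checking `missing_fields` before analysis is one simple way to ensure that every factor is either controlled or documented for each sample.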
6.3 Case Studies in Rice Metabolomics
Few studies in rice metabolomics or metabolite profiling have been published so far. Most of these have examined responses to environmental or physiological, rather than genetic, perturbation. Although the intent of the studies was not necessarily to support functional genomic approaches, they do, nevertheless, serve as preliminary studies that can assist in the performance and interpretation of metabolomic studies as part of rice functional genomics. The metabolite profiling study of Morino et al. (2005) investigated whether tryptophan-overproducing rice callus possesses an altered spectrum of aromatic compounds and/or altered production of auxin relative to wild-type callus. Aromatic compounds were profiled via high-performance liquid chromatography (HPLC) coupled to photodiode array detection (PAD) in the 200 to 400 nm range followed by mass spectrometric identification of compounds of interest. The authors sought to identify only those compounds that showed a large alteration between the wild-type (cv. Nipponbare) and the tryptophan-overproducing lines, and thus subjected only certain compounds to mass-spectrometric identification. The authors concluded that, although there was limited overall effect on the profile of aromatic compounds in rice calli, there was a change in the metabolic network of the transgenic versus the wild-type calli with regard to secondary metabolite production and the regulation of auxin homeostasis. The results indicated the advantage of metabolite profiling (or presumably metabolomics) of higher plants in order to determine unintended influences on metabolism after genetic manipulations, that is, as part of a comprehensive phenotyping. One of the earliest metabolomics studies of rice was a methods description by Frenzel et al. 
(2002), who sought a procedure for quantitative sequential extraction of ground rice grains (either rough or brown rice), with appropriate derivatization for gas chromatography-mass spectrometry (GC-MS) analysis, so that a range of different compound classes present in the grain could be measured in addition to the high-molecular-weight compounds found in grains, such as lipids and starch. The study serves as a reminder that constant vigilance in sample preparation is critical for success in metabolomic studies of new biological systems.
Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) is one of the most powerful analytical technologies for metabolite fingerprinting, and can mass-resolve most metabolites with a mass accuracy better than 1 ppm (Dunn et al. 2005). FT-ICR-MS thus provides a high-throughput method for metabolite fingerprinting. Takahashi et al. (2005) described the application of this method for metabolite fingerprinting in conjunction with proteomics for comparing leaves, panicles, and calli of wild-type rice (cv. Nipponbare) versus a transgenic line overexpressing the YK1 gene (a homolog of the maize HC-toxin reductase gene), which is known to confer increased tolerance to rice blast and multiple environmental stresses. The authors concluded that the (global) composition of organ-specific metabolites did not differ significantly between the two lines, so that if metabolic alterations contributed to the differences in stress tolerance, they involved fewer than 10% of the metabolites. The proteomic part of the study demonstrated that the transgenic line expressed several proteins known to be expressed under certain (stressful) conditions even though the stresses were not present in this study. The authors concluded that ectopic overexpression of a single gene (YK1) can affect expression of unrelated proteins and metabolites. Studies such as this, in which global analyses of multiple functional-genomic levels are performed, will play an increasingly important role in functional genomics because of their ability to provide focus to follow-up studies, especially as the capabilities for the integrated analysis of multiple levels are enhanced. Metabolomics and proteomics can expose silent phenotypes and increase our understanding of systems biology. 
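The mass accuracy quoted for FT-ICR-MS is conventionally expressed as a relative error in parts per million (ppm). The sketch below, in Python, shows that calculation; the function names and the 1 ppm default tolerance are illustrative assumptions rather than part of any instrument software, and the m/z values in the usage note are invented.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 1.0) -> bool:
    """True if the measured m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm
```

For example, a measured value of 200.0001 against a theoretical value of 200.0000 is an error of about 0.5 ppm and falls within a 1 ppm tolerance, whereas 200.0010 (about 5 ppm) does not.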
The CE-MS system has proved to have major advantages compared to other technologies for plant metabolomics in ease of sample preparation, good reproducibility, sensitivity, and high-throughput capability, and appears to be especially useful for study of low molecular weight, charged, and unstable compounds. Sato et al. (2004) have examined a number of factors influencing sample preparation and the performance of the CE-MS system for metabolite analysis in leaves of rice cv. Haenuki. For detection, the analyzed compounds were classified into four groups: (1) amino acids, amines, and purine bases; (2) organic acids and sugar phosphates; (3) nucleotides and coenzymes; and (4) sugars. The sugars required a different detection method (diode array detector, DAD). The authors used three different CE-MS methods under varying conditions and one CE-DAD system, each system optimized for a group of metabolites. Such optimal detection conditions resulted in increased comprehensiveness. The four methodologies were then used to examine diurnal differences in metabolite concentrations in rice leaves. The patterns of variation suggested that most variance was not due to procedure but rather to biological variability and light-dependent metabolite differences. A number of the analyzed
metabolites changed dramatically in concentration between day and night, and most changed to some extent. This study in particular demonstrated how important the diurnal time point of harvest is, especially in leaves, when conducting a comparative metabolomics study because the concentrations of a large number of metabolites are light dependent. Tarpley et al. (2005) presented a detailed rice metabolomics study of the seedling tillering stage using GC-MS. In that study, metabolites were determined in different tissue sections obtained from various positions along the developing rice seedling (cv. IR36) and at a number of dates postemergence. These tissue positions and sampling dates were selected to ensure that the period of early tiller growth was covered. The resulting data set was analyzed in order to identify a small subset of metabolites that captured most of the metabolite variation present in the period of rice development bridging first tillering. Because this is an important and biologically representative developmental stage, the set of “biomarker” metabolites found could be used for comparative study of the pattern of metabolite change due to development, environment, or genotype. The biomarker-metabolite approach could allow initial rapid screening of multiple samples for comparative purposes in some situations, thus providing focus for additional comprehensive metabolomic analyses. The biomarkers were validated by comparison with the diurnal data of Sato et al. (2004) discussed in the preceding text.
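The idea of a small subset of metabolites that captures most of the variation in a data set can be illustrated with a simple heuristic. The Python sketch below ranks metabolites by their share of the total variance across samples and keeps the smallest set that reaches a chosen coverage. It is an assumption-laden illustration, not the procedure used by Tarpley et al. (2005): the function name `select_by_variance`, the 0.8 default coverage, and the premise that values are already on a comparable scale (for example, log-transformed) are all hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def select_by_variance(profiles: Dict[str, Sequence[float]],
                       coverage: float = 0.8) -> List[str]:
    """Pick the fewest metabolites whose variances sum to >= coverage of the total.

    profiles maps a metabolite name to its values across samples.
    """
    variances = {name: pvariance(values) for name, values in profiles.items()}
    total = sum(variances.values())
    if total == 0:
        return []  # no variation at all, so nothing to select
    chosen: List[str] = []
    covered = 0.0
    # Visit metabolites from most to least variable.
    for name, var in sorted(variances.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        covered += var / total
        if covered >= coverage:
            break
    return chosen
```

A multivariate method such as principal component analysis would also account for correlations among metabolites, which this per-metabolite ranking ignores.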
6.4 Case Studies Integrating Functional Genomic Levels
Data sets resulting from a metabolomic analysis can be analyzed together with data sets from a transcriptomic and/or proteomic analysis via various means to detect associations in response to perturbation. These co-response types of analyses are often applied in plant functional genomic studies. In recent years, a number of studies with plants involving integrated analysis have been reported (Urbanczyk-Wochniak et al. 2003; Hirai et al. 2005; Nikiforova et al. 2005; Tohge et al. 2005; Schauer et al. 2006). The study of Tohge et al. (2005) illustrates the utility of an integrated analysis to pinpoint putative gene function. A combined analysis of transcriptomic data using Arabidopsis DNA microarrays and metabolite profiling data was performed on Arabidopsis plants overexpressing the PAP1 gene encoding an MYB transcription factor; this mutant is a T-DNA activation-tagged line that overproduces anthocyanins. The metabolite analysis was carried out using two approaches: (1) a targeted profiling of flavonoids using liquid chromatography with photodiode array detection and mass spectrometry (LC-PAD-MS), that of amino acids by HPLC, and
that of anions and sugars by CE-MS and (2) a nontargeted analysis using FT-ICR-MS. Results from the nontargeted analysis showed that the sample origin (plant organ) and the growth conditions have greater influences on the metabolite composition than the transgenic event. This suggested that the studied gene regulates anthocyanin accumulation relatively specifically. Based on the integrative analysis, some of the altered transcripts were identified as being involved in anthocyanin biosynthesis or its regulation, and some of the induced genes were similar to genes coding for transferases. Several of these putative transferase genes were then studied in more detail to investigate and assign their role in anthocyanin synthesis. This study clearly shows the potential value of an integrated transcriptomic and metabolomic analysis. Results obtained from other integrated analyses are just as compelling in various ways, and support the value of integrated study. The study of Urbanczyk-Wochniak et al. (2003) showed that a metabolite profile can provide better discrimination among potato tuber systems (developmental stages and transgenics) than can a transcript profile. In the study, correlations among transcripts and metabolites (transcript–transcript, transcript– metabolite, metabolite–metabolite) were also examined. The presence of a number of predictable associations was reassuring, and the presence of a number of surprising associations stimulates the rapid identification of candidate genes. In the study by Nikiforova et al. (2005) (and preceding articles), in which transcriptomic and metabolomic data were merged for analysis of responses to sulfur deprivation, several interesting results emerged, including the observation that the transcriptome appeared to be dynamic as the sulfur deprivation continued, whereas the metabolome changed fairly quickly to a new “steady state” that was then largely maintained. 
In addition, the authors were able to state more specifically a putative function for and possible regulatory mechanism of activity of IAA28, which is an auxin-related transcriptional factor. Another integrated study of transcriptomics and metabolomics in response to sulfur deficiency, also using Arabidopsis, was conducted by Hirai et al. (2005) (and preceding articles). Analysis of the combined datasets was fruitful for identifying candidate genes. In vitro enzymatic assays of the recombinant gene products were performed based on the observed time-dependent associations of putative sulfotransferase genes with known glucosinolate biosynthesis genes as noted in the integrated analyses. The enzyme assays confirmed the gene functions as desulfoglucosinolate sulfotransferases. Other candidate genes were identified. In a somewhat different, but also very useful, approach in integrated analysis, Schauer et al. (2006) analyzed a number of metabolites along with whole-plant phenotypic traits. One of the strengths of the study was
the use of interspecific introgression lines in which marker-defined genomic regions of one species replaced homologous intervals of the other species. The introgressions overlap with respect to the chromosome regions covered, so that a metabolite trait related to a phenotypic trait can be associated with a specific chromosomal segment and quantitative trait loci (QTL). Numerous QTLs for metabolite traits and phenotypic traits were identified, and association between metabolite traits and phenotypic traits was noted.
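The co-response analyses described in this section rest on measuring how strongly a transcript and a metabolite vary together across the same set of samples. A minimal Python sketch follows, using the Pearson correlation coefficient as the association measure. The function names, the 0.9 threshold, and the gene and metabolite labels in the usage note are hypothetical, and real studies would also need to address multiple testing and sample size.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def co_response_pairs(transcripts: Dict[str, Sequence[float]],
                      metabolites: Dict[str, Sequence[float]],
                      threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """List (transcript, metabolite, r) for every pair with |r| >= threshold."""
    pairs = []
    for t_name, t_values in transcripts.items():
        for m_name, m_values in metabolites.items():
            r = pearson(t_values, m_values)
            if abs(r) >= threshold:
                pairs.append((t_name, m_name, r))
    return pairs
```

With a transcript series `[1, 2, 3, 4]` labeled `geneA`, a metabolite series `[2, 4, 6, 8]` labeled `met1`, and a second metabolite `[1, -1, 1, -1]` labeled `met2`, only the `geneA`/`met1` pair passes the threshold. The same function can be reused for transcript-transcript or metabolite-metabolite comparisons by passing the same kind of dictionary twice.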
6.5 Time and Space Limitations in Integrated Functional-Genomic Analyses
A major limitation in integration of levels in functional genomics is the difficulty in obtaining the homogeneous tissue samples needed for establishing careful global and local associations among genomic elements (metabolites, proteins, mRNA). The problem exists in part because of the invasive nature of the procedures used for obtaining time-space metabolomic information (Arita 2004). Stitt and Fernie (2003) have addressed the difficulty in obtaining time–space information for plants, and introduced biochemical and cell-biological methods to examine metabolite distribution at fine spatial resolution. Recently, the first successful attempts at highly spatially resolved metabolite analyses have been reported. The application of capillary electrophoresis coupled to laser-induced fluorescence detection allowed the detection of amino acids and sugars in only five pooled mesophyll cells from Cucurbita maxima (Arlt et al. 2001). Another approach for near-“single cell” analysis has been reported in which cryosectioning was first used to preserve cellular structures. Specific cell types were then cut and collected via laser micro-dissection until a sufficient number of cells was obtained. This approach allowed the detection of about 68 major metabolites in these cells by GC-MS (Schad et al. 2005). In the future, much effort will be directed toward the development of comprehensive metabolomics approaches at the organ or even the single-cell level. Further, the determination of steady-state metabolite levels is not sufficient for a detailed understanding of plant metabolism; analysis of the dynamics between metabolites (metabolic flux) will also be of great help, as noted by Stitt and Fernie (2003). 
Current technologies for analysis of metabolic flux are based on a combination of stable isotope labeling under steady-state conditions and nuclear magnetic resonance (NMR)- or MS-based detection systems to follow the distribution of label. The application of a multiparallel detection method such as GC- or LC-MS allows the determination of isotope label in many metabolites in a single
6 Metabolomics
experiment and therefore provides an opportunity to calculate metabolic fluxes of many different pathways simultaneously (Schwender et al. 2003; Roessner-Tunali et al. 2004). In the future, metabolomics in combination with the analysis of metabolic flux using stable isotopes will provide important insights into plant functional genomic studies.
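As a minimal sketch of the first quantity usually derived from such labeling data, the average 13C enrichment per carbon can be computed from a mass isotopomer distribution. This is not the full flux calculation of the cited studies. The function name, the metabolite fragments, and the peak areas below are hypothetical, and the abundances are assumed to be already corrected for natural isotope abundance.

```python
def fractional_enrichment(isotopomer_abundances):
    """Average 13C enrichment per carbon from a mass isotopomer distribution.

    isotopomer_abundances[i] is the signal of the M+i isotopomer (assumed to
    be corrected for natural abundance), so a fragment with n carbon atoms
    contributes n + 1 entries.
    """
    n_carbons = len(isotopomer_abundances) - 1
    if n_carbons < 1:
        raise ValueError("need at least the M+0 and M+1 abundances")
    total = sum(isotopomer_abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Each M+i isotopomer carries i labelled carbons.
    labelled = sum(i * a for i, a in enumerate(isotopomer_abundances))
    return labelled / (n_carbons * total)


# Hypothetical peak areas (M+0 .. M+3) for three fragments that each carry
# three metabolite carbons; one GC-MS run can yield many such profiles.
profiles = {
    "alanine": [70.0, 15.0, 10.0, 5.0],
    "lactate": [40.0, 20.0, 20.0, 20.0],
    "glycerate": [100.0, 0.0, 0.0, 0.0],
}
enrichment = {name: fractional_enrichment(p) for name, p in profiles.items()}
```

Because every metabolite detected in a run yields its own profile, the same calculation can be repeated across many pathways at once. Turning these enrichments into fluxes additionally requires a metabolic model, which is outside this sketch.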
6.6 Metabolite Response to Perturbation
Metabolite response to imposed or natural perturbation occurs simultaneously at two scales: globally (subtle shifts in composition across a range of metabolites) and locally (change within a small subset of metabolites possessing some relationship to each other). The same is true for other genomic elements (mRNA, proteins). Data analysis methods should account for both types of response. Figures 6.1 and 6.2, both from the study of Tarpley et al. (2005), illustrate the simultaneous occurrence of local and global change in metabolite composition in response to perturbation.
6.7 Databases and Resources
Physiological changes in individual metabolite levels and broad shifts among metabolites can occur in response to plant internal and external conditions. This requires a thorough documentation of all potential influencing factors on metabolism. In addition, the sampling, extraction, and analytical technologies commonly used in plant metabolomics each have advantages and drawbacks (Dunn et al. 2005). Thus, the methodologies used for a particular study introduce particular biases into the data, which both expert and non-expert readers must be able to evaluate. All of this indicates the need for adequate standardized collection and organization of supporting data through central databases (Jenkins et al. 2005). As an example, the Arabidopsis Information Resource (http://www.arabidopsis.org) provides for metabolomic, as well as for transcriptomic and proteomic, data entry for functional genomic studies of Arabidopsis. This could be a basis for an Oryza Information Resource, which would be of great utility for the rice functional genomics initiative. An overview of existing computational resources available for metabolomics is provided by Arita (2004). These resources place an emphasis on mass spectrometry and on the application of various machine learning packages for analyzing patterns. In addition, they may
Lee Tarpley and Ute Roessner
support the integration and translation of metabolomics data into systems biology knowledge.
[Figure 6.1. Dot plot. Vertical axis: Z-score for metabolite (scale -2 to 3). Horizontal axis: biomarker metabolite (Oxalic Acid, Leucine, Valine, Succinic Acid, Uracil, Thymine, Malic Acid, Salicylic Acid, Pyroglutamic Acid, GABA, Phenylalanine, p-hydroxybenzoic Acid, trans-Aconitic Acid, Shikimic Acid, Citric Acid, Mannose, Trehalose, Galactose, Carbonate, Lysine, Glutamic Acid). Legend: 5 mm mid-section height, 11 d post-emergence; 9 mm mid-section height, 13 d post-emergence; 13 mm mid-section height, 15 d post-emergence; 17 mm mid-section height, 17 d post-emergence.]
Fig. 6.1. Magnitude and pattern of variation in selected metabolite concentrations in samples ranging in development. The samples progress in height at midsection of the sampled tissue and in days post-emergence (Tarpley et al. 2005). The example metabolites are listed along the horizontal axis, and each dot plot shows the Z-scores for the concentration. The Z-distribution has mean = 0 and standard deviation = 1; thus the figure shows the pattern and magnitude of the variation among the presented tissues, but also the amount of this variation relative to that of the metabolite concentration for the whole study. The ranges of patterns and magnitudes of variation of the metabolites demonstrate that local uncorrelated changes in metabolites occur in response to perturbation, in this case development.
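The Z-score transformation used in Fig. 6.1 can be sketched as follows. The concentrations are hypothetical, and the use of the population standard deviation is an assumption; the source does not state which estimator was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / sd for each value, using the population SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("Z-scores are undefined when all values are equal")
    return [(x - mu) / sd for x in values]


# Hypothetical concentrations of one metabolite across all samples of a study
concentrations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(concentrations)
```

Because each metabolite is scaled by its own study-wide mean and standard deviation, metabolites with very different absolute concentrations can be compared on one axis, which is what allows the patterns of several metabolites to be shown side by side.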
[Figure 6.2. Five panels (1 to 5), one per principal component. Horizontal axis: days post-emergence (tick labels 7, 11, 13, 15, 17, 19). Vertical axis: height (mm) of mid-section (tick labels 1, 5, 9, 13, 17). Gray-scale legend: principal component score, in bins –20 to –15, –15 to –10, –10 to –5, –5 to 0, 0 to 5, 5 to 10, 10 to 15, 15 to 20.]
Fig. 6.2. Principal component (PC) scores during a rice plant developmental period bridging first tillering (Tarpley et al. 2005). The scores (categorized by value using a gray scale as indicated in the legend) of PC 1 to 5 (panels 1 to 5, respectively) are plotted against the progression in sampling of days post-emergence (horizontal axis) and the height of the sampled tissue section (as height [mm] of mid-section; vertical axes of panels). The PCs, from principal component analysis, are independent of each other, but each includes a contribution from each measured metabolite of the comprehensive metabolite data set; thus the different patterns of PC scores with perturbation (development) illustrate that global change, that is, broad subtle shifts within the collective set of metabolites, occurred. Simultaneous local and global changes also occur with mRNA and proteins.
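To make concrete how a PC score combines a contribution from every measured metabolite, the following sketch computes the first principal component of a small samples-by-metabolites table. It is an illustration under stated assumptions, not the procedure of Tarpley et al. (2005): the data are hypothetical, the columns are mean-centred but not scaled, and power iteration stands in for whatever eigen-decomposition routine a statistics package would use.

```python
import math


def first_principal_component(rows, iterations=500):
    """Loadings and scores of PC 1 for a samples-by-metabolites table.

    Columns are mean-centred, the covariance matrix is formed, and its
    leading eigenvector is approximated by power iteration.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    loadings = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * loadings[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        loadings = [x / norm for x in w]
    # Each score is a weighted sum over all metabolites of one sample.
    scores = [sum(c[j] * loadings[j] for j in range(p)) for c in centred]
    return loadings, scores


# Hypothetical data: 4 samples (rows) by 3 metabolites (columns)
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
loadings, scores = first_principal_component(data)
```

Each entry of `loadings` is the weight of one metabolite, so a sample's score shifts whenever many metabolites change slightly in a coordinated way. Further components would be obtained by removing the variation explained by PC 1 and repeating the procedure.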
An important computational resource for integrating functional genomic data is the Arabidopsis Information Resource (http://www.arabidopsis.org), which provides bioinformatics support for linking data sets from different origins, such as transcript, protein, or metabolite data. Very helpful for the interpretation of metabolomics data is the AraCyc pathway resource, which provides virtual, linked, and well-documented metabolic maps. For a holistic integration of numerous multiparallel genomic, proteomic, metabolomic, and metabolic-flux data sets with metabolic-pathway information, the “Pathway Tools Omics Viewer” is available, which paints experimental data onto the biochemical pathway map. Another example of such mapping tools is MapMan (Thimm et al. 2004), which allows users to visualize comparative metabolite and transcriptional profiling data sets on existing metabolic templates. Additional publicly available resources also provide for a holistic integration of multiparallel genomic, proteomic, and metabolomic data sets. PaVESy is a data-management system for editing and visualization of biological pathways. The database model accommodates flexible annotation of the genomic elements (biological objects) by user-defined attributes, and thus allows research on the regions of metabolism of which we possess limited knowledge (Lüdemann et al. 2004). Another publicly available resource, with utility as a research tool, is MetNetDB (Wurtele et al. 2003), which is designed to allow the visualization, statistical analysis, and modeling of metabolic and regulatory network maps of Arabidopsis, combined with gene-expression profiling data. The capability provided by these database resources to allow the development and testing of hypotheses will be very important for data mining in the functional-genomics field.
6.8 Data Analysis
Many of the approaches for statistical analysis used for metabolomic data are the same as for transcriptomic data (e.g., Liang and Kelemen 2006). Also necessary, however, are unique approaches for the analysis of metabolomic data that account for the intimate relationship of metabolites within metabolic networks, as suggested by Arita (2004) and further emphasized by Weckwerth and Morgenthal (2005). These authors discuss procedures to utilize the latent information contained within correlations among metabolites in metabolomic studies as a way of extracting information about the metabolic network. By analysis of changes in metabolite steady-state levels in response to certain perturbations (e.g., developmental, environmental, or genetic), and concurrent determination of whether and how existing correlations are structured among metabolites, the researcher may be able to identify regulatory points in metabolic networks. It has to be
noted that metabolite relationships can be predicted on the one hand based on metabolic pathway connectivity, but on the other hand by comparison of induced changes in the metabolite correlation matrices. These correlation network alterations can then be analyzed further via multivariate statistical methods aiming to obtain novel information about patterns in the metabolic reaction network. Weckwerth and Morgenthal (2005) have also provided an interesting example of the use of a specific multivariate analysis method for metabolite correlation network comparisons that became popular for signal processing in the mid-1990s as a blind source separation (feature identification) method. Independent component analysis (ICA) seeks to maximize independence among components where the latent variables are assumed to not have a normal distribution, unlike principal component analysis (PCA) (Hyvärinen 1999). The ICA of an integrated metabolite–protein data matrix was capable of separating out variation due to genotype (wild-type versus transgenic) and diurnal variation, presumably due to the ability of ICA to effectively exploit information in the covariance matrix (Weckwerth and Morgenthal 2005). A PCA was shown to be less satisfactory. The highly correlated nature of a metabolomics data set (Steuer et al. 2003) has much in common with data of a spectral nature, such as the signal processing-type example given above or in many analytical chemistry data sets, such as from chromatography. Chemometrics has developed as a field partially in response to the need to effectively analyze highly structured, often massive, data sets of highly intercorrelated variables. Many of the chemometric methods are being applied to analysis of metabolomic data sets, both those analyzed manually and in silico, which can possess complex structures and are often not fully identified for optimal analysis. The articles by van der Greef and Smilde (2005) and Smilde et al. 
(2005) discuss the potential role of multiway analyses of metabolomic datasets and provide examples for one-set, two-set, and multiset problems. The authors describe the use of analysis of variance–simultaneous component analysis for the analysis of metabolomic data sets from multisubject multivariate time series with an underlying design. A simultaneous component analysis can account for a group structure in the data, whereas PCA, which is often used in analysis of metabolomics data, does not. Although the example studies provided by these authors concern mainly medical applications, the described tools will be of great use for analyses of plant metabolite data or integrated functional–genomic studies as these produce highly complex multivariate data sets (Smilde et al. 2005). The utilization of these and other approaches to relate metabolomic data to the metabolic network and the system as a whole (systems biology) will greatly strengthen the application of metabolomics in functional genomics approaches. Yet, plant metabolic networks are not uniformly well identified.
For example, little knowledge exists about the substrate specificity of many of the enzymes or about the regulation of flux among alternate metabolic pathways. Metabolomics will prove more beneficial if the data are analyzed in multiple ways, including the use of multivariate analytical procedures and chemometric approaches that help to identify broad patterns and latent features in the data. While identifying the structure in the datasets, new methods capable of building out from individual nodes in the metabolite structure can reveal the unique properties of subsets or neighborhoods in metabolite networks. These methods can include the intense analysis of correlations/relationships and provide the means of identifying location or referencing information in the metabolite network. The complement of statistical data analysis methods will help achieve a major aim of metabolomics, that of a transparent translation from “real” data to metabolic networks. This will, in turn, promote efficiency in functional genomic analyses. Additional tools exist that were developed for integrating whole-genome expression results onto cellular networks (Cavalieri and De Filippo 2005). The incorporation of a network perspective in the analysis can help provide a further level of understanding of the system. For example, regulation of transcription occurs at the chromosome level in rice (see Chapter 4 of this book). An understanding of the roles of other functional genomic levels in this type of regulation will likely require tools incorporating a network perspective. These methods include those, such as MapMan, that project genomic information onto pathways, and also include tests for statistical significance of enrichment of genomic elements belonging to the same class, pathway, or network. As our knowledge base grows, these methods will play an increasingly important role.
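One common form of the enrichment test mentioned above is a one-sided hypergeometric test, sketched here under stated assumptions. The function name and all counts are hypothetical, and the source does not specify which test the cited tools implement.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, hits):
    """One-sided hypergeometric probability P(X >= hits).

    population: all measured genomic elements (e.g., metabolites)
    annotated:  how many of them belong to the pathway of interest
    selected:   how many elements responded to the perturbation
    hits:       how many of the responding elements belong to the pathway
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 100 metabolites measured, 10 of them in the pathway,
# 20 responsive to the perturbation, 6 of those 20 in the pathway.
p = enrichment_p_value(population=100, annotated=10, selected=20, hits=6)
```

A small value of `p` indicates that the pathway contains more responsive elements than expected by chance. When many classes, pathways, or networks are tested together, a multiple-testing correction would also be needed, which this sketch omits.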
6.9 Summary Metabolomics is considered to be one of the major phenotyping approaches in plant research. Because of its ability to provide a comprehensive biochemical phenotyping that can assist in the identification of novel gene functions, and because of the proven close relationship between metabolites, metabolic network, and cellular network and thus systems biology, metabolomics is poised to contribute extensively to functional genomics. As programs are developed that take advantage of the combined strengths of various analytical technologies, and as bioinformatic and central database resources for metabolomics become more available, we can expect that metabolomics will contribute substantially to the rice functional genomics initiative. Although only a few rice metabolomics studies have been reported so far, a number of rice metabolomic projects are currently
being conducted in many laboratories worldwide, and we expect the number of reported studies to increase exponentially in the near future.
References
Arita M (2004) Computational resources for metabolomics. Brief Funct Genomics Proteomics 3:84–93
Arlt K, Brandt S, Kehr J (2001) Amino acid analysis in five pooled single plant cell samples using capillary electrophoresis coupled to laser-induced fluorescence detection. J Chromatogr A 926:319–325
Cavalieri D, De Filippo C (2005) Bioinformatic methods for integrating whole-genome expression results into cellular networks. Drug Discov Today 10:727–734
Dunn WB, Bailey NJC, Johnson HE (2005) Measuring the metabolome: current analytical technologies. Analyst 130:606–625
Frenzel T, Miller A, Engel K-H (2002) Metabolite profiling - a fractionation method for analysis of major and minor compounds in rice grains. Cereal Chem 79:215–221
Fukusaki E, Kobayashi A (2005) Plant metabolomics: potential for practical operation. J Biosci Bioeng 100:347–354
Hirai MY, Klein M, Fujikawa Y, Yano M, Goodenowe DB, Yamazaki Y, Kanaya S, Nakamura Y, Kitayama M, Suzuki H, Sakurai N, Shibata D, Tokuhisa J, Reichelt M, Gershenzon J, Papenbrock J, Saito K (2005) Elucidation of gene-to-gene and metabolite-to-gene networks in Arabidopsis by integration of metabolomics and transcriptomics. J Biol Chem 280:25590–25595
Hyvärinen A (1999) Survey on Independent Component Analysis. Neural Comput Surv 2:94–128
Jenkins H, Johnson H, Kular B, Wang T, Hardy N (2005) Toward supportive data collection tools for plant metabolomics. Plant Physiol 138:67–77
Kitano H (2002) Systems biology: a brief overview. Science 295:1662–1664
Liang Y, Kelemen A (2006) Associating phenotypes with molecular events: recent statistical advances and challenges underpinning microarray experiments. Funct Integr Genomics 6:1–13
Lüdemann A, Weicht D, Selbig J, Kopka J (2004) PaVESy: pathway visualization and editing system. Bioinformatics 20:2841–2844
Morino K, Matsuda F, Miyazawa H, Sukegawa A, Miyagawa H, Wakasa K (2005) Metabolic profiling of tryptophan-overproducing rice calli that express a feedback-insensitive α subunit of anthranilate synthase. Plant Cell Physiol 46:514–521
Nikiforova VJ, Daub CO, Hesse H, Willmitzer L, Hoefgen R (2005) Integrative gene-metabolite network with implemented causality deciphers informational fluxes of sulphur stress response. J Exp Bot 56:1887–1895
Raamsdonk LM, Teusink B, Broadhurst D, Zhang N, Hayes A, Walsh MC, Berden JA, Brindle KM, Kell DB, Rowland JJ, Westerhoff HV, van Dam K, Oliver SG (2001) A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol 19:45–50
Roessner U, Luedemann A, Brust D, Fiehn O, Linke T, Willmitzer L, Fernie AR (2001) Metabolic profiling allows comprehensive phenotyping of genetically or environmentally modified plant systems. Plant Cell 13:11–29
Roessner-Tunali U, Lui J, Leisse A, Balbo I, Perez-Melis A, Willmitzer L, Fernie AR (2004) Flux analysis of organic and amino acid metabolism in potato tubers by gas chromatography-mass spectrometry following incubation in 13C labelled isotopes. Plant J 39:668–679
Sato S, Soga T, Nishioka T, Tomita M (2004) Simultaneous determination of the main metabolites in rice leaves using capillary electrophoresis mass spectrometry and capillary electrophoresis diode array detection. Plant J 40:151–163
Schad M, Mungur R, Fiehn O, Kehr J (2005) Metabolic profiling of laser microdissected vascular bundles of Arabidopsis thaliana. Plant Methods 1:2
Schauer N, Semel Y, Roessner U, Gur A, Balbo I, Carrari F, Pleban T, Perez-Melis A, Bruedigam C, Kopka J, Willmitzer L, Zamir D, Fernie AR (2006) Comprehensive metabolic profiling and phenotyping of interspecific introgression lines for tomato improvement. Nat Biotechnol 24:447–454
Schwender J, Ohlrogge JB, Shachar-Hill Y (2003) A flux model of glycolysis and the oxidative pentosephosphate pathway in developing Brassica napus embryos. J Biol Chem 278:29442–29453
Smilde AK, Jansen JJ, Hoefsloot HCJ, Lamers R-JAN, van der Greef J, Timmerman ME (2005) ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data. Bioinformatics 21:3043–3048
Steuer R, Kurths J, Fiehn O, Weckwerth W (2003) Observing and interpreting correlations in metabolomic networks. Bioinformatics 19:1019–1026
Stitt M, Fernie AR (2003) From measurements of metabolites to metabolomics: an ‘on the fly’ perspective illustrated by recent studies of carbon-nitrogen interactions. Curr Opin Biotechnol 14:136–144
Takahashi H, Hotta Y, Hayashi M, Kawai-Yamada M, Komatsu S, Uchimiya H (2005) High throughput metabolome and proteome analysis of transgenic rice plants (Oryza sativa L.). Plant Biotechnol 22:47–60
Tarpley L, Duran AL, Kebrom TH, Sumner LW (2005) Biomarker metabolites capturing the metabolite variance present in a rice plant developmental period. BMC Plant Biol 5:8
Thimm O, Bläsing O, Gibon Y, Nagel A, Meyer S, Krüger P, Selbig J, Müller LA, Rhee SY, Stitt M (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939
Tohge T, Nishiyama Y, Hirai MY, Yano M, Nakajima J, Awazuhara M, Inoue E, Takahashi H, Goodenowe DB, Kitayama M, Noji M, Yamazaki M, Saito K (2005) Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis plants over-expressing an MYB transcription factor. Plant J 42:218–235
Urbanczyk-Wochniak E, Luedemann A, Kopka J, Selbig J, Roessner-Tunali U, Willmitzer L, Fernie AR (2003) Parallel analysis of transcript and metabolic profiles: a new approach in systems biology. EMBO Rep 4:989–993
van der Greef J, Smilde AK (2005) Symbiosis of chemometrics and metabolomics: past, present, and future. J Chemometr 19:376–386
Weckwerth W, Morgenthal K (2005) Metabolomics: from pattern recognition to biological interpretation. Drug Discov Today 10:1551–1558
Wurtele ES, Li J, Diao L, Zhang H, Foster CM, Fatland B, Dickerson J, Brown A, Cox Z, Cook D, Lee E-K, Hofmann H (2003) MetNet: software to build and model the biogenetic lattice of Arabidopsis. Comp Funct Genomics 4:239–245
7 Use of Naturally Occurring Alleles for Crop Improvement
Anjali S. Iyer-Pascuzzi¹, Megan T. Sweeney¹, Neelamraju Sarla² and Susan R. McCouch¹
¹Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853; ²Directorate of Rice Research, Rajendranagar, Hyderabad 500 030, India
Reviewed by Evans Lagudah
7.1 Introduction
  7.1.1 Why Study Natural Variation?
7.2 A Plant Breeder’s View on Utilizing Natural Variation
  7.2.1 Importance of Germplasm Conservation for Crop Improvement
7.3 Understanding Evolutionary History Through Natural Variation
  7.3.1 Origins of Natural Variation: A Short History of Oryza sativa
  7.3.2 Genetic Markers: Assessing Diversity and Population Structure in O. sativa
7.4 Natural Variation and Functional Genomics: Utilizing Germplasm to Identify Useful Alleles
  7.4.1 Genetic Markers and Their Use in Mapping
  7.4.2 Mapping Populations
  7.4.3 Association Mapping
  7.4.4 Gene Identification and Development of Perfect Markers for Applications in Breeding
7.5 Natural Variation and Epistasis
7.6 Natural Variation or Mutant Analysis?
7.7 Natural Variation versus Transgenic Approaches for Crop Improvement
7.8 Conclusions
References
Anjali S. Iyer-Pascuzzi et al.
7.1 Introduction 7.1.1 Why Study Natural Variation? Natural variation is the raw material of plant breeding and for centuries has been recognized as a vital resource underpinning the world’s food supply. More recently, plant biologists have discovered its value as a template for studying gene function and evolution. Unlike most genetic analysis performed with induced mutations in which only one or a small number of genetic changes distinguishes a mutant from a wild type, the use of natural variation confronts the geneticist with individuals that differ genetically at numerous loci. Thus, even when focusing on a single trait or phenotype, the biologist studying natural variation identifies multiple allelic differences that contribute in different ways to the phenotypic variation of interest. This is referred to as quantitative variation and suggests that there are numerous ways in which genetic variation distributed throughout the genome can alter a phenotype. When molecular maps and markers are used to identify the genes or regions of the chromosomes that are associated with a quantitatively inherited phenotype, these loci are referred to as quantitative trait loci (QTL). The genes underlying QTL work in concert and they interact, directly or indirectly, with each other (G × G or epistasis) and with the environment (G × E) to condition a trait. Scientists interested in identifying and characterizing the genes underlying QTL can take advantage of both forward and reverse genetics approaches. Using populations developed from diverse germplasm resources, forward genetics approaches such as positional cloning, coupled with reverse genetics strategies targeting candidate genes, allow the biological researcher to identify genes underlying QTL (Yano 2001). 
As genes of interest are identified, researchers aim to identify what specific genetic changes distinguish one allele from another and which are functionally relevant, causing one allele to contribute differently to the phenotype of interest (Yano 2001). Using knowledge of where a QTL resides along a chromosome, one can screen populations of induced mutations (functional genomics populations) to look for a mutation in a gene that resides in the region of interest. If such a mutation is detected, the plant or line can then be tested to determine whether that allelic variant has any effect on the phenotype under consideration. In this way, alleles resulting from induced mutation can be compared with those identified as natural variants at the same loci to learn more about the way a particular gene or allelic series conditions a phenotypic response. The identification of naturally occurring alleles offers plant biologists an opportunity to use that information to examine natural history and evolution in ways that the discovery of an induced mutation does not. Based on
7 Naturally Occurring Alleles for Crop Improvement
population level studies of natural variation, researchers can compare the extent of genetic diversity in different species or populations, identify genes or regions of chromosomes that have been under selection, investigate population structure, and begin to examine the evolutionary history of a species. The use of natural variation allows researchers to identify alleles and allele combinations that contribute to adaptation or that are ecologically relevant. It also allows them to investigate the extent to which genes and genetic regulatory systems are conserved across species. Natural variation can therefore be viewed as a valuable resource, not only for plant breeders, but also for evolutionary, ecological, and functional studies of the genes themselves (Koornneef et al. 2004).
7.2 A Plant Breeder’s View on Utilizing Natural Variation 7.2.1 Importance of Germplasm Conservation for Crop Improvement Although there are many sources of natural variation, wild germplasm is increasingly recognized as a valuable repository of useful allelic variation for crop improvement. Historically, wild species have provided many valuable traits such as disease and pest resistance or cytoplasmic male sterility (Brar and Khush 1997). The successful wide hybridization program at the International Rice Research Institute (IRRI) utilizes wild species by screening them for numerous characters of interest and then selectively introgressing a target trait into an elite Oryza sativa cultivar (Brar and Khush 1997). The traits are almost all simply inherited characters that can be easily recognized in wild accessions. More recently, wild ancestors have been shown to contain valuable alleles that can contribute positively to the enhancement of complex traits such as yield, despite the fact that the wild species themselves are extremely low yielding. The use of molecular maps and markers has greatly facilitated the identification of positive alleles in accessions whose breeding value cannot be discerned by examining their phenotype (Tanksley and McCouch 1997). In this section, we discuss how plant breeders utilize wild germplasm as a source of valuable alleles for plant improvement. Most of the cytoplasmic male sterile (CMS) lines currently used in hybrid rice breeding are derived from O. rufipogon. For many years, the wild abortive (WA) cytoplasm from O. rufipogon has provided male sterility in conjunction with the use of nuclear fertility-restorer genes, Rf3 and Rf4, derived from the O. sativa cultivar IR24 (Li and Yuan 2000). Disease resistance has also come from wild relatives. The wild species O. nivara provided grassy stunt virus resistance (Khush and Ling 1974). Grassy stunt virus is a devastating viral disease transmitted by the green leafhopper
insect that threatened the rice crop in the 1970s and 1980s. At IRRI, large numbers of rice cultivars and wild relatives were screened for resistance to the green leafhopper vector and a single accession of O. nivara was found to be resistant. This source of resistance has proven durable and still serves as the major form of grassy stunt viral resistance today. A distantly related African species, O. longistaminata, was the source of the Xa21 gene that confers resistance to bacterial blight caused by Xanthomonas oryzae pv. oryzae (Ikeda et al. 1990). Crossing O. longistaminata with O. sativa is difficult, but the gene was successfully introgressed into the O. sativa cultivar IR24, and subsequently added to the bacterial blight isoline series in the IR24 background (Khush and Ling 1974). The resulting cultivar, IRBB21, provides breeders with useful bridging material from which to move Xa21 into additional O. sativa parents. Subsequently, the Xa21 gene was cloned (Song et al. 1995) and can now be deployed in transgenic as well as conventional varieties (Toenniessen et al. 2003). In each of these examples, thousands of accessions were screened to identify a single individual that carried the trait of interest, and in each case, a wild species gene provided a useful solution to an important agricultural problem. This scenario has been repeated many times for numerous traits, and offers a strong argument for conserving a wide diversity of germplasm resources, including landraces and wild/weedy relatives. More recently, studies have demonstrated that crosses between low-yielding wild relatives and high-yielding elite cultivars can be used to improve the yield performance of the elite, high-yielding parent (Tanksley and McCouch 1997).
While at first counterintuitive, this phenomenon, which is known as transgressive variation, can be explained by the interaction of many genes whereby alleles for both increased and decreased yield are dispersed in the parents but can be recombined in the offspring (Reiseberg et al. 2003). The degree of genetic divergence between the parents is critical to the likelihood of finding positive transgressive segregants in the offspring, and thus an understanding of the population structure and genetic relationships between O. sativa and related wild species can provide a useful framework for selecting parents. Several groups have demonstrated the value of this approach using O. rufipogon as a donor of positive alleles for yield in combination with both indica and japonica cultivars as recurrent parents (Xiao et al. 1998; Moncada et al. 2001; Thomson et al. 2003; Marri et al. 2005; Tian et al. 2006). To facilitate high-resolution genetic analysis within the Oryza genus, the Oryza Map Alignment Project (OMAP) is constructing robust physical, bacterial artificial chromosome (BAC)-based maps of 11 wild Oryza species as well as the African cultivated species O. glaberrima (Wing et al. 2005). The wild rice BAC libraries represent a valuable resource that can facilitate gene isolation and future transgenic variety development, studies
of molecular evolution, and mining of agronomically useful alleles from species that are not sexually compatible with O. sativa. This work is reviewed in Chapter 15 of this book.
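The explanation of transgressive variation given earlier in this section, in which alleles for increased and decreased yield are dispersed between the parents and recombined in the offspring, can be illustrated with a small worked example. This is a deliberately simplified sketch: the four loci, their effect sizes, and the two parental genotypes are hypothetical, and the model is purely additive, with no dominance, epistasis, or environmental effect.

```python
from itertools import product

# Hypothetical additive effects (yield units) of the "+" allele at four loci
EFFECTS = [3.0, 2.0, 2.0, 4.0]


def genotype_value(alleles):
    """Sum the effects of the loci fixed for the '+' allele (inbred line)."""
    return sum(e for e, a in zip(EFFECTS, alleles) if a == "+")


elite_parent = ("+", "+", "+", "-")  # high yielding, lacks "+" at locus 4
wild_parent = ("-", "-", "-", "+")   # low yielding, carries "+" at locus 4

# All homozygous recombinant lines that can arise from the cross: at each
# locus a line is fixed for the allele of one parent or the other.
lines = list(product(*zip(elite_parent, wild_parent)))
values = {line: genotype_value(line) for line in lines}

best_parent = max(genotype_value(elite_parent), genotype_value(wild_parent))
transgressive = [line for line, v in values.items() if v > best_parent]
```

In this toy cross the low-yielding parent contributes a single favorable allele that the elite parent lacks, so some recombinant lines exceed the better parent even though the donor itself performs poorly. With real data, molecular markers are what allow such favorable alleles to be located in a donor whose phenotype does not reveal them.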
7.3 Understanding Evolutionary History Through Natural Variation 7.3.1 Origins of Natural Variation: A Short History of Orzya sativa The genus Oryza contains 23 species, of which 2 are cultivated (O. sativa and O. glaberrima) and 21 are wild. The genus is divided into four species complexes—the O. sativa-, O. officinalis-, O. ridleyi i-, and O. granulataspecies complex. The term “complex” is used to indicate aggregates of species that lack good taxonomic characteristics to distinguish them from each other (Vaughan et al. 2003). The first confirmation that these species complexes were meaningful at the molecular genetic level was provided by Wang et al (1992), who used restriction fragment length polymorphisms (RFLPs) to investigate 93 accessions from 21 species in the Oryza genus. They were able to identify the four species complexes corresponding to those of Vaughan (Vaughan et al. 2003). More recently, Vaughan et al. (2005) suggested that the O. ridleyi- and O. granulata- complexes, found in Southeast Asia and New Guinea Southeast and Continental Asia, respectively, are more ancestral than the O. sativa and O. officinalis complexes. This is consistent with data indicating that the highest genomic diversity within Oryza is found in New Guinea, and supports the hypothesis that the earliest forms of the genus Oryza may have evolved in the Australasian region (Vaughan 1991). All extant members of the Oryza genus have n = 12 chromosomes and interspecific crossing is possible within each complex, though it is difficult to recover fertile offspring from crosses between members of the different complexes (Vaughan et al. 2003). Within the O. sativa complex are seven major species—two cultivated rices: O. sativa and O. glaberrima, and 5 wild: O. rufipogon sensu lactu (known as O. nivara in its annual state), O. barthii, O. longistaminata, O. meridionalis, and O. glumaepatula (Vaughan et al. 2003). All species within the O. 
sativa complex have a diploid chromosome number of 2n = 24, and each is found in a different geographical region. O. sativa and its wild ancestor O. rufipogon are found throughout South and Southeast Asia, and gene flow between them is fairly common. O. glaberrima, the cultivated African rice, and its wild ancestor O. barthii are found in West Africa; the wild perennial O. longistaminata is found in central and
eastern Africa. O. meridionalis is a wild Australian rice, while O. glumaepatula is a Latin American wild species. Several authors have suggested various dates for the divergence of species within the O. sativa complex, but the lineage of each is not completely clear. Using a molecular clock approach, Zhu and Ge (2005) suggested that the AA genome species divergence occurred approximately 2 million years ago (Mya). Although reproductive barriers among species in each complex exist, a number of groups have demonstrated the feasibility of interspecific crossing between them. For example, Naredo et al. (1997) showed that the Asian O. rufipogon and O. sativa can be hybridized with the Australian O. meridionalis, though seed set and frequency of hybrids were quite low. Using a cytological study of both intra- and interspecific hybrids, Lu et al. (1998) reported that individuals from both types of crosses had high chromosomal pairing during metaphase I, suggesting that the genomes within each AA species are structurally very similar. In this chapter, we focus on the breeding value of AA genome species within the O. sativa complex. Our particular interest is in the use of natural variation brought into O. sativa from the wild gene pool, although we recognize that cultivated rice is more commonly improved with alleles derived from other cultivars.
7.3.2 Genetic Markers: Assessing Diversity and Population Structure in O. sativa
The extent of natural variation within a species can be documented at the level of the phenome, the genome, the transcriptome, or the proteome. Currently, much of our knowledge regarding genetic diversity in rice has come from analysis of the genome using molecular markers. Previously, the diversity found in rice was assessed based on morphology, crossing behavior, and cytology. As early as 100 A.D., the Chinese had recognized two main subgroupings, Hsien (indica) and Keng (japonica) (Katayama 1993). In the early twentieth century, Kato et al.
(1928) used morphology and hybrid sterility to study the separation and genetic isolation of the indica and japonica groups. A third group or subpopulation was identified based on morphology by Matsuo (1952) and is referred to as the javanica or tropical japonica subpopulation. Additional work (Morishima and Oka 1970; Oka and Morishima 1982), including cytological studies (Engle et al. 1969), demonstrated clear genetic differentiation among these three widely recognized groups of rice. Isozymes, RFLPs, randomly amplified polymorphic DNAs (RAPDs), amplified fragment length polymorphisms (AFLPs), simple
sequence repeats (SSRs), and more recently, single nucleotide polymorphisms (SNPs), have been used to detect rice genetic diversity and population structure (see Edwards and McCouch 2005 for a review). Second (1982) was among the first to use isozymes to differentiate the indica and japonica groups within O. sativa, as well as the geographically defined groupings distinguishing African and Asian rices. In a landmark study, Glaszmann (1987) used 15 polymorphic loci coding for 8 isozymes to classify nearly 1,700 O. sativa varieties from across Asia. He identified six varietal groupings or subpopulations: indica, japonica, aus, aromatic, rayada, and ashina. Subsequent work with RFLPs detected only the indica–japonica differentiation (Wang and Tanksley 1989), leaving the subpopulation structure of rice unresolved. A recent study by Garris et al. (2005) using 234 accessions of rice and 169 nuclear SSRs identified five major subpopulations: aromatic, aus, indica, temperate japonica, and tropical japonica. These groupings corresponded well with Glaszmann’s original classification, and support the idea that O. sativa consists of more than two or three genetically identifiable groups. Other studies have explored the antiquity of the indica–japonica differentiation to determine whether it occurred pre- or post-domestication. Initially, evidence from isozymes and RFLPs demonstrated that indica and japonica accessions were more closely related to different accessions of O. rufipogon than to each other (Second 1982; Wang et al. 1992). Analysis of genomic sequence from cv Nipponbare (japonica) and cv 93-11 (indica) provided additional support for the idea that the divergence predated domestication. It was estimated that the indica and japonica subgroups diverged between 200,000 and 400,000 years ago (0.2 to 0.4 Mya) (Ma and Bennetzen 2004; Zhu and Ge 2005), long before the domestication of O.
sativa, which is estimated to have occurred about 10,000 years ago (Vaughan et al. 2003). Taken together, all the data suggest that the O. rufipogon ancestor must have contained at least two differentiated subgroups from which the indica and japonica groups were independently domesticated (Chang 1976; Second 1982; Wang et al. 1992; Ohtsubo et al. 2004; Garris et al. 2005). Recently, based on the antiquity and genetic distinctiveness of the aromatic and aus groups (Jain et al. 2004; Garris et al. 2005), it has been proposed that these major subpopulations may have been independently domesticated from different subpopulations of O. rufipogon (McCouch et al. 2006). Additional investigations are necessary to understand the evolution, speciation, and subpopulation divergence of members of the Oryza genus. Understanding how genetic diversity is partitioned within and between subpopulations of rice is important because these gene pools represent major reservoirs of natural variation that can be exploited by both plant breeders and geneticists.
7.4 Natural Variation and Functional Genomics: Utilizing Germplasm to Identify Useful Alleles
7.4.1 Genetic Markers and Their Use in Mapping
Genetic markers are tools that can detect differences (polymorphisms) in the DNA of individuals. A single copy marker identifies a unique locus in a genome, while multiple copy markers identify loci in repetitive regions of DNA or gene families. Historically, genetic markers were detected as simply inherited differences between individuals and were used to construct linkage maps. Molecular markers first became available for rice in the mid-1970s when isozymes were used to assess genetic diversity (Second 1982). The first RFLP map of rice was published in 1988 (McCouch and Kochert 1988) and was followed by intensive mapping of rice chromosomes using both RFLP and SSR markers in subsequent years (Saito et al. 1991; Causse et al. 1994; Chen et al. 1997; Harushima et al. 1998; Temnykh et al. 2000, 2001; McCouch et al. 2002).
7.4.2 Mapping Populations
Just as in standard mutant analysis, which requires the development of mutant collections and subsequent screening, identifying genes underlying natural variation involves the development of populations (for mapping) or germplasm collections (for association analysis) followed by phenotypic and genotypic screening. To construct a mapping population, individuals that differ for the trait of interest are crossed and the segregating progeny are used for analysis. There are many types of mapping populations, and the advantages and disadvantages of each must be taken into consideration when deciding what type of population to construct for a particular study. In this section, we discuss several of the most common types of populations used for genetic mapping and we demonstrate their use in identifying and cloning genes of interest (Table 7.1 and Fig. 7.1).
When the map position of a genetic marker is known, it provides an efficient way of determining the position of a gene along a chromosome. Molecular mapping is often the first step in identifying a gene underlying a phenotype of interest. With the availability of molecular maps and markers, breeders and geneticists have gained a powerful tool that allows them to more easily identify the genes underlying both qualitative and quantitative variation. As the identity of genes underlying quantitative variation is discovered, the information can be used to identify different alleles in a range of natural and mutant populations, to develop “perfect markers” that are useful for creating new varieties, to characterize the molecular function of the genes, or to develop novel applications using transgenic approaches.
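Locating a gene relative to a mapped marker rests on the recombination fraction between the two loci, which is conventionally converted into a map distance in centimorgans (cM). The following is a minimal sketch of the two standard mapping functions, Haldane (no interference) and Kosambi (allowing for interference). The function names and the example counts are ours and purely illustrative; the simple count-based estimate of the recombination fraction assumes a population in which recombinant gametes can be scored directly, such as a doubled haploid or backcross population.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Estimate r as the proportion of recombinant individuals (illustrative)."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("counts must satisfy 0 <= recombinants <= total, total > 0")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Map distance in cM under Haldane's mapping function."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Map distance in cM under Kosambi's mapping function."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical example: 20 recombinants among 200 doubled haploid lines.
r = recombination_fraction(20, 200)
print(f"r = {r:.3f}")
print(f"Haldane: {haldane_cm(r):.2f} cM")
print(f"Kosambi: {kosambi_cm(r):.2f} cM")
```

For small recombination fractions the two functions give similar distances; they diverge as r approaches 0.5, where loci behave as unlinked.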
Table 7.1. Examples of mapping populations discussed in the text (the recurrent parent is written first)
Type | Mapping population | Example of use to clone or map natural variants | Reference
Doubled haploid | IR64 × Azucena | yield and vegetative growth | Hittalmani et al. 2003
Doubled haploid | IR64 × Azucena | root growth and length | Zheng et al. 2006
Doubled haploid | CT9993-510-1-M (upland japonica) × IR62266-42-6-2 (indica) | drought | Lanceras et al. 2004
F2 | Nipponbare × Kasalath | flowering time | see Chapter 16 by Ashikari et al., this book
Advanced backcross | O. sativa (cv. Jefferson) × O. rufipogon | red pericarp, flowering time | Sweeney et al. 2006; Thomson et al. 2006
Advanced backcross | O. sativa (cv. IR64) × O. rufipogon | sd1 | Septiningsih 2002
Advanced backcross | O. sativa (cv. IR58025A) × O. rufipogon | yield | Marri et al. 2005
Nearly isogenic | IR24/Toyonishiki/Milyang 23 × various bacterial blight resistance donors | bacterial blight resistance genes | Gu et al. 2005; Iyer and McCouch 2004; Jiang et al. 2006; Song et al. 1995; Sun et al. 2004; Yoshimura et al. 1998
Recombinant inbred | Asominori × IR24 | grain length | Wan et al. 2006
Recombinant inbred | IR1552 × Azucena | root length and number | Zheng et al. 2003
Recombinant inbred | Zhenshan 97B × Milyang 46 | yield | Zhuang et al. 2002
Recombinant inbred | Khao Dawk Mali 105 × CT9993-5-10-M | neck and leaf blast | Sirithunya et al. 2002
Recombinant inbred | KDML105 × CT9993 | amylose content, gel consistency, gelatinization temperature | Lanceras et al. 2000
Recombinant inbred | IR74 × FR13A (indica) | submergence | Nandi et al. 1997
Chromosome segment substitution and introgression | Nipponbare × Kasalath | Spl7 | Yamanouchi et al. 2002
Chromosome segment substitution and introgression | Asominori × IR24 | eating quality | Wan et al. 2004
Chromosome segment substitution and introgression | Koshihikari × Kasalath | spikelet numbers/panicle and culm length | Ebitani et al. 2005
Chromosome segment substitution and introgression | Zhenshan 97B (indica) × Nipponbare | development of lines for research | Mu et al. 2004
Chromosome segment substitution and introgression | O. sativa × O. meridionalis | development of lines for research | Kurakazu et al. 2001
Chromosome segment substitution and introgression | Taichung 65 (japonica) × O. glumaepatula | development of lines for research | Sobrizal et al. 1999
Chromosome segment substitution and introgression | Asominori × IR24 | development of lines for research | Aida et al. 1997
Chromosome segment substitution and introgression | O. sativa (japonica) × O. glaberrima | development of lines for research | Doi et al. 2002
Chromosome segment substitution and introgression | Asominori × IR24 | development of lines for research | Kubo et al. 2002
Fig. 7.1. Structure of mapping populations. Black and gray bars indicate chromosomal segments from either the maternal or paternal parent. Each group of 12 chromosomes (n = 6) represents the genome of an individual plant.
Doubled Haploid Populations
Doubled haploid (DH) populations are developed by collecting anthers (which contain haploid microspores or pollen cells), culturing them on artificial media, doubling their chromosome number, and regenerating whole plants from the diploid cells. Colchicine or another mitotic inhibitor is used to double the chromosome number, giving rise to diploid microspores
that are completely homozygous. The main advantages of DH populations are that they are 100% homozygous and genetically stable, and that the resulting diploid plants can be immortalized through self-pollination. In addition, their development takes only one generation, as opposed to eight or more generations to reach effective homozygosity in conventionally bred populations. The complete homozygosity of DH populations allows researchers to evaluate plots of genetically identical individuals in multiple years and locations and to identify recessive traits that may be more difficult to observe in other populations. However, an important disadvantage of DH populations is that whole plants must be regenerated via tissue culture from individual microspores. This may create a bottleneck if specific varieties are difficult to regenerate using available tissue culture methods or if specific genotypes within the range of recombinants regenerate more readily than others, leading to skewing of allele frequencies in the DH lines (Guiderdoni et al. 1988; Xu et al. 1997). While most DH populations in rice are developed from F1 anthers, the DH method is flexible and can be used to double the chromosome number of microspores at any point in a generation advance program. If applied to F1 anthers, recombination is limited to a single generation of meiosis. This means that a DH population will contain large linkage blocks that have not been broken up by subsequent generations of recombination, providing low resolution for mapping. However, if it is used to “fix” the genetics of F8 microspores, the resulting homozygous F9 lines will capture higher levels of recombination. DH populations from indica × japonica crosses have been used to identify QTL for yield-related traits (Lu et al. 1997; Hittalmani et al. 2003; Li et al. 2003; Lanceras et al. 2004), root growth, length, and thickness (Zheng et al. 2000, 2006; Kurakazu et al. 2001), and plant height (Li et al. 2003), among others.
F2 Populations
F2 populations in rice can be developed rapidly and with minimum investment, and as a result, population sizes can be large. These populations display tremendous diversity and all three genotypic classes are represented, making it possible to assess dominance and additivity for any locus of interest. Unfortunately, F2 populations have several disadvantages. The lines are not fixed, so replications of F2 individuals are not possible, though F3 families derived from selfing the F2 plants can be grown in replicated trials for phenotyping. If there are sterility problems in crosses with wild or exotic (unadapted) germplasm, F2 populations show significant skewing of allele frequencies (Xu et al. 1997) but despite this problem, they have been used in a number of cases involving interspecific crosses to identify genes or QTLs of interest (Xiong et al. 1999; Cai and Morishima 2002).
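The skewing of allele frequencies mentioned above can be checked marker by marker with a chi-square goodness-of-fit test against the 1:2:1 genotypic ratio expected for a codominant marker in an F2 population. The sketch below is illustrative only: the function name and the genotype counts are invented, not taken from any of the cited studies. With three genotypic classes the test has two degrees of freedom, for which the upper-tail probability of the chi-square distribution is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_1_2_1(n_aa: int, n_ab: int, n_bb: int):
    """Test F2 genotype counts at one codominant marker against 1:2:1.

    Returns (chi-square statistic, p-value) with 2 degrees of freedom.
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one plant must be genotyped")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical counts for 200 F2 plants at a single marker.
chi2, p = chi_square_1_2_1(70, 100, 30)
print(f"chi-square = {chi2:.1f}, p = {p:.2e}")
```

A small p-value at a marker suggests segregation distortion in that region, for example owing to sterility loci in a wide cross; it does not by itself identify the cause.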
Once a gene or QTL has been mapped in an F2 population, much work is needed to confirm and quantify the effect of a particular locus. This is due in great part to the fact that the genetic background of each line is different and therefore many generations of backcrossing are required to homogenize the genetic background and effectively “Mendelize” the trait. This may involve creating near isogenic lines (NILs) or simply backcrossing enough times to remove alleles at “background” loci that contribute to the phenotypic variation of interest. Once lines have been created that clearly differ for the trait of interest, and when crossed, segregate in a 3:1 ratio for the phenotype (suggesting that a single locus is responsible for the phenotype), these lines can be used to construct an F2 population that can be readily used as the basis for both forward and reverse genetics analysis aimed at gene isolation. Reports of gene isolation using large F2 populations typically refer to crosses between NILs or advanced backcrossed lines (Iyer and McCouch 2004; Sweeney et al. 2006), and many such examples are described in the following sections that cover these types of populations.
Recombinant Inbred Lines
Recombinant inbred lines (RILs) are created by single-seed descent from F2 individuals. F2 plants are selfed for six or more generations, saving one seed from each plant per generation to create highly homozygous lines that differ from each other (Burr et al. 1988). Development of RIL populations is laborious and time-consuming, but they are well suited as the basis for community mapping projects because of their inbred composition. Once an RIL population is created, it can be maintained indefinitely by self-pollination and the lines can be shared with colleagues simply by exchanging aliquots of seed. In community projects, the population of RILs is genotyped using genetic markers and the lines are then sent to interested researchers for phenotypic evaluation. The RILs can be evaluated at any time or place and results can be analyzed in conjunction with the available marker data to map genes and QTLs of interest. The genetic resolution of RILs can be improved by randomly intermating F2 individuals before the development of the RIL population. In maize, this technique has been used to generate a 2.7-fold increase in recombination after five generations of intermating (Lee et al. 2002). Though there are currently no examples of cloned genes from such populations, numerous genes and QTLs have been mapped at reasonable resolution. In rice, QTLs mapped using RIL populations include those for root length and number (Champoux et al. 1995; Zheng et al. 2003), submergence tolerance (Nandi et al. 1997), cold tolerance (Andaya and Mackill 2003), cooking and eating quality (Lanceras et al.
2000), blast resistance (Wang et al. 1994; Sirithunya et al. 2002), yield (Xing et al. 2002; Zhuang et al. 2002; Shen et al. 2003), and grain length (Wan et al. 2006).
Advanced Backcross Populations
Advanced backcross (AB) populations are particularly useful in wide crosses aimed at identifying QTLs. When using wild relatives, inbreeding after crossing often results in sterility, making it difficult to generate a large, random array of segregants for mapping. The advanced backcross QTL (AB-QTL) method (Tanksley and Nelson 1996) suggests a way to overcome this problem. AB-QTL allows the plant biologist to create an array of BC2 or BC3 lines, each containing a small number of random introgressions from the donor wild species in an elite varietal background (serving as the recurrent parent), and thus to expedite QTL mapping, variety development, fine mapping, and gene discovery (Tanksley and Nelson 1996). In this method, the process of QTL identification involves the simultaneous transfer of QTLs into elite breeding lines. The strategy is to discard BC1 plants with extreme phenotypes or noxious wild or weedy characteristics (i.e., steriles, excessively tall or late flowering types, those with shattering or excessive seed dormancy) and to concentrate on families with agronomically tolerable plant type. By eliminating deleterious alleles early in the process, the likelihood of detecting favorable introgressions from the wild or exotic parent is increased and introgressions that interact positively with the genetic background of the elite parent are not masked by undesirable alleles at other loci. In rice, the AB-QTL method has been used to map QTLs for yield and grain quality from the progenitor wild species, O. rufipogon. One advantage of the AB-QTL method is that QTL mapping uses populations of elite lines that are almost isogenic for the introgressed QTL/genes of interest. This means that it requires only a few generations of backcrossing to develop lines that can be evaluated as improved varieties. However, this method also has several disadvantages. Owing to linkage drag, favorable alleles may be discarded during the early stages of population development.
This will happen if recombination is not adequate to break linkage between negative and positive traits. Another disadvantage is that each BC2 or BC3 line is expected to contain only about 12.5% or 6.25% of the donor genome, respectively, in any individual BC2F2 family. If epistatic interactions between donor introgressions are required to give a useful phenotype, they are much more likely to be detected in an F2 or RIL population than in advanced backcross families. Thus, the feature that most distinguishes advanced backcross from other types of populations used for
mapping can be an advantage or a disadvantage, depending on the trait and the objectives of the researcher. AB-QTL analysis provided the basis for isolating QTL for flowering time (Thomson et al. 2006), sd1 (Septiningsih 2002), and yield (Marri et al. 2005). Further, Sweeney et al. (2006) recently used this method to isolate the gene for red pericarp. Red grain color is ubiquitous among the wild ancestors of O. sativa and is found in many early landraces and weedy rices. Though modern cultivars are all white, red weedy rice is a persistent problem in farmers’ fields in North America and wherever direct seeding is used. Rice crops that are contaminated by red rice are penalized in the marketplace, owing to the different grain quality and cooking characteristics of the weedy rice. Because weedy rice retains dormancy and shattering characteristics, it persists in farmers’ fields, and because it is an excellent mimic of its white cousin and intercrosses freely, it has been very difficult to eradicate. Knowledge of how the red pericarp, dormancy and shattering traits are controlled genetically is expected to offer new insights into novel methods of weed control for rice farmers around the world. QTL analysis and subsequent fine-mapping in an AB population derived from a cross between O. rufipogon and cv. Jefferson, a US tropical japonica cultivar, identified a large QTL in the centromere region of chromosome 7 that corresponded to the previously mapped position of a classically defined gene, Rc, conferring red pericarp. Sweeney et al. (2006) demonstrated that Rc encodes a basic helix–loop–helix (bHLH) protein and sequence comparison of Rc alleles from the mapping parents as well as a panel of wild and landrace varieties with red pericarp and other varieties with white pericarp identified the functional nucleotide polymorphism (FNP) as a 14-bp deletion that deleted the bHLH domain of the protein in white rice. 
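A short deletion can disrupt a protein far beyond the deleted bases themselves: any deletion whose length is not a multiple of three shifts the downstream reading frame, and a shifted frame commonly runs into a stop codon. The following sketch illustrates this general point with a 14-bp deletion. It is an assumption-laden toy example: the sequence is invented and is not the Rc sequence, the helper names are ours, and the source does not describe the mechanism at this level of detail.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate from the first base up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete(cds: str, start: int, length: int) -> str:
    """Return the sequence with `length` bases removed from position `start`."""
    return cds[:start] + cds[start + length:]


# Invented toy coding sequence (hypothetical; NOT the Rc gene).
WILD_TYPE = "ATGGCTGAACTGAAACGTCTGGATGAATTCAAGGAACTTGGCTAA"
MUTANT = delete(WILD_TYPE, start=6, length=14)  # 14 is not a multiple of 3

print(translate(WILD_TYPE))  # MAELKRLDEFKELG
print(translate(MUTANT))     # MAG
```

In this toy case the frameshift brings a TGA codon into frame immediately after the deletion, so everything downstream of the deletion is lost from the predicted protein.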
A natural variant with a light red pericarp was also examined and provided a third allele, Rc-s, that contained a premature stop codon before the bHLH domain. Importantly, this gene could not have been identified via any of the currently available functional genomic populations because the “wild type” lines used to develop these populations are all varieties with white pericarp that contain nonfunctional alleles of the gene. In fact, it is unlikely that existing functional genomics populations will be useful for genetic analysis of any domestication-related traits or for detecting useful alleles that did not pass through the domestication bottleneck and remain locked up in wild and weedy ancestors.
Nearly Isogenic Lines
Nearly isogenic lines (NILs) differ from each other genetically owing to a single donor introgression in the genetic background of a recurrent parent (RP). Unlike true isogenic lines (iso-lines), which differ by a mutation at a
single genetic locus, NILs, or substitution lines, differ by a single introgression that may contain a hundred or more genes. NILs may be constructed using forward genetics, in which case, the NILs differ for a key phenotypic trait of interest. They may also be constructed using reverse genetics, in which case the introgression is targeted to a particular region of the genome or a particular genetic locus, with the objective of investigating the phenotypic impact of substituting alleles in that particular region. To construct true iso-lines, a wild type (wt) is subjected to a mutagenic agent. NILs, conversely, are constructed by crossing a donor genotype with a line selected to serve as the recurrent parent, followed by several generations of backcrossing coupled with phenotypic and/or molecular marker-assisted selection. The size of the donor introgression can vary dramatically among pairs of NILs and, when selection relies solely on phenotype, multiple introgressions may unknowingly remain in the genetic background of the RP. In this sense, the definition of an NIL is not as strict as that of a true iso-line, but fixed NILs must show the expected 3:1 (for a dominant trait) or 1:2:1 (for a co-dominant trait) segregation ratio for the phenotype of interest when crossed with each other. NILs have been developed for both qualitative and quantitative traits and they are widely used to isolate genes underlying QTLs. The starting point is generally natural variation found in germplasm resources. Phenotypic differences are identified in segregants derived from crosses between diverse parents, and individual lines are backcrossed to the recurrent parent (RP) for several generations to isolate a donor introgression that segregates with the phenotype of interest. When isolating components of a quantitatively inherited phenotype, several QTL-NILs may be developed to identify different genetic components of the trait. 
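Each backcross to the recurrent parent is expected to halve the donor contribution to the genome, which is the basis of the figures of about 12.5% and 6.25% quoted for BC2 and BC3 lines in the discussion of advanced backcross populations. The sketch below is a minimal illustration, assuming no selection and treating the value as a population average; individual lines vary around it, and phenotypic or marker-assisted selection will change it. The function name is ours.

```python
def expected_donor_fraction(n_backcrosses: int) -> float:
    """Average donor genome fraction after n backcrosses (n = 0 is the F1)."""
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses must be non-negative")
    # The F1 carries half donor genome; each backcross halves it again.
    return 0.5 ** (n_backcrosses + 1)


for n in range(0, 4):
    generation = "F1" if n == 0 else f"BC{n}"
    print(f"{generation}: {expected_donor_fraction(n):.2%}")
# F1: 50.00%
# BC1: 25.00%
# BC2: 12.50%
# BC3: 6.25%
```

Selfing a backcross line (for example, BC2 to BC2F2) fixes introgressions as homozygous segments but does not change the expected donor fraction.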
Once fixed NILs are developed, they may be crossed and the resulting F2 population provides the basis for high-resolution mapping and gene isolation. In rice, NILs have been used extensively in the cloning of bacterial blight resistance genes. Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is a serious disease of rice in South and Southeast Asia. Two sets of NILs were constructed in the 1980s, one in the recurrent parent (RP) background of the susceptible indica cultivar, IR24, and a second in the background of the susceptible japonica cultivar, Toyonishiki (Ogawa et al. 1988). Each set of NILs consisted of lines containing a different bacterial blight resistance gene that had been introduced from diverse donors. F2 populations derived from crosses of these resistant NILs to the corresponding susceptible RPs have provided the basis for the isolation of every bacterial blight disease resistance gene cloned to date (Xa21, Xa1, Xa26, xa5, and Xa27) (Song et al. 1995; Yoshimura et al. 1998; Iyer and McCouch 2004; Sun et al. 2004; Gu et al. 2005; Jiang et al. 2006). Several recent publications regarding the cloning of these genes have
highlighted the role of natural selection in the evolution of this system of host–pathogen interaction (Iyer and McCouch 2004; Sun et al. 2004). Using positional cloning with an F2 population generated from a cross between the NILs IRBB5 (resistant) × IR24 (susceptible), recombinational analysis identified the recessive xa5 gene as the small subunit of transcription factor IIA (TFIIAγ) (Iyer and McCouch 2004; Jiang et al. 2006). Semiquantitative reverse transcriptase-polymerase chain reaction (RT-PCR) analysis revealed no difference in expression between resistant and susceptible alleles, but sequence comparison showed that the proteins associated with the susceptible and resistant response differed by a single amino acid. Susceptible lines had valine and resistant lines had glutamic acid in a solvent-exposed region of the protein. This represents a significant change, from a hydrophobic to hydrophilic amino acid, but protein modeling suggested that the change should not affect protein structure. An association study with 36 individuals from the aus subpopulation (from which xa5 was derived) showed that all 27 resistant varieties carried glutamic acid at the critical position, while the 9 susceptible varieties all had valine. Examination of all TFIIAγ ESTs or cDNAs available for this essential transcription factor in plant species found in GenBank in 2003 revealed that all carried hydrophobic amino acids in the critical position of the molecule. Given the rarity of this event, it is unlikely that functional genomics populations would have identified the xa5 allele, highlighting the value of exploring natural variation as we examine the genetics of host–pathogen interaction. Xa27 was also isolated using positional cloning in an F2 population derived from a cross of the NILs IRBB27 (resistant) × IR24 (susceptible) (Gu et al. 2005). Xa27 was found to encode a protein consisting of 113 amino acids with little sequence similarity to other rice proteins. 
The coding regions of the susceptible and resistant alleles were nearly identical, but the authors found several key insertions in the promoter of the gene and, unlike most resistance genes, they found that the resistant allele, Xa27, was induced on inoculation with Xanthomonas, while the susceptible allele was not. This induction was strongest 3 days after inoculation and was localized to the area of infection. Ectopic expression of Xa27 resulted in resistance in the absence of a key pathogen protein, demonstrating that Xa27 expression and consequent protein production are the keys to resistance. Therefore, variations in the promoter are responsible for resistance when challenged by the pathogen. Understanding the mechanisms of this novel gene will shed light on the subtle natural variation in resistance mechanisms.
Introgression Lines
Introgression lines are pre-NILs and contain multiple introgressions in an RP background. In a recent study by Li et al. (2005), more than 20,000
introgression lines were developed from crosses between 195 different donors and three RPs. The backcrossed lines were screened under a number of extreme stress conditions and mass selection was used to select lines that outperformed the elite RP. The surviving lines were genotyped using molecular markers and donor introgressions were mapped. The genotypes of the best performing lines were compared with the RP to look for a significant association between specific donor introgressions and performance in the population of lines that survived the stress. The power of this approach is that it allowed researchers to identify cases in which more than one introgression from the donor contributed favorably to performance under stress. Future work will be required to backcross these introgression lines, create NILs, isolate the genes underlying the QTLs, and examine the genetics of the enhanced performance. In the meantime, the lines are useful to plant breeders as donors of valuable alleles and can be used in marker-assisted breeding programs to transfer the specific introgressions of interest into additional RPs.
QTL-NIL
A specialized form of the NIL, known as the QTL-NIL, has been used to fine map QTLs for many traits, including heading date, yield, submergence tolerance, and salt tolerance (reviewed in Yano 2001). The excellent work on heading date is reviewed in Chapter 16 of this book, but in this section we discuss the use of QTL-NILs to isolate genes associated with yield and salt tolerance. QTL-NILs can be derived from any of the populations described in the preceding text. Yield is a composite trait and is of interest to every plant breeder. With strong G × E interactions, and a multiplicity of genes contributing to its expression, the use of standard mutant analysis to identify genes underlying yield would be expected to be a long and painstaking process. Recently, Ashikari et al. (2005) cloned a gene controlling grain number that was identified using QTL-NILs derived from a cross between the indica cultivar, Habataki, and the japonica cultivar, Koshihikari. Habataki produces more grains on the main panicle than Koshihikari. Gn1, the most significant QTL associated with grain number, was located on chromosome 1 and explained 44% of the phenotypic variation for the trait. A fixed QTL-NIL for Gn1 was constructed in the Koshihikari background, with a recessive introgression from the Habataki donor conferring enhanced grain number. The Gn1 QTL was found to consist of two loci, Gn1a and Gn1b. The candidate region of Gn1a was narrowed to a 6.3-kb region with one open reading frame predicted for cytokinin oxidase, OsCKX2. The DNA sequence of the OsCKX2 gene contained several nucleotide differences between the two parental lines, making it difficult to determine the identity of
Anjali S. Iyer-Pascuzzi et al.
the FNP. Thus, the gene was sequenced in three additional Chinese rice varieties that had exceptionally high grain number. Two of these proved to have the same haplotype as Habataki, but a novel 11-bp deletion was detected in the coding region of the third variety, 5150. This variety had the highest grain number of the varieties tested, with more than 400 grains in the primary panicle. The deletion created a premature stop codon, suggesting that 5150 was null for OsCKX2. Transgenic plants were produced; those carrying two copies of the sense strand of OsCKX2 showed reduced grain number, whereas those with antisense strands of OsCKX2 showed reduced levels of cytokinin oxidase expression along with higher grain number. Thus, Gn1a was identified as OsCKX2.
Tolerance to salt stress is important in many environments. Ren et al. (2005) cloned a QTL underlying salt stress tolerance for which the favorable allele came from the traditional indica cultivar, Nona Bokra. Originally mapped as one of eight QTLs for salt-related traits based on shoot K+ content, SKC1 (Shoot Potassium Concentration1) was recently shown to encode a member of a group of transporters known as HKT. SKC1, a dominant allele from Nona Bokra, accounted for 40% of the variation between Nona Bokra and the salt-sensitive japonica variety, Koshihikari. It was cloned using QTL-NILs derived from a BC2F2 population. The authors narrowed the region containing the gene to 7.4 kb using recombinational analysis. This area contained an ORF encoding OsHKT8, and complementation analysis confirmed the identity of the gene. OsHKT8 alleles in Nona Bokra and Koshihikari differed by six nucleotide substitutions and four amino acid changes. Comparing NILs, the authors demonstrated that the alleles from both parents were expressed in similar amounts in roots of both varieties under normal conditions, and up-regulated in the roots of both parents under salt stress.
In shoots, SKC1 mRNA was expressed at a lower level than in roots, and was not up-regulated by salt stress. The gene was detected primarily in vascular tissue (in parenchyma cells bordering xylem vessels). The authors found no difference in potassium or salt concentration in roots between the NILs. However, under salt stress conditions, the NIL with the Nona Bokra introgression (SKC1) had higher potassium concentrations in the shoots and xylem sap than did the Koshihikari RP. Thus, SKC1 may be involved in the regulation of sodium/potassium homeostasis in the shoots. Measurement of transport activity of SKC1 showed that the protein selectively transports sodium. Further, the Nona Bokra SKC1 protein was more active than the Koshihikari protein, indicating that SKC1 is a QTL underlying natural variation of two functional transporters. The authors' hypothesis is that Nona Bokra is salt tolerant because its HKT transporter, SKC1, can unload more sodium from the xylem, leading to lower sodium relative to potassium in the shoots and a higher tolerance to salt stress. This study elegantly demonstrates
7 Naturally Occurring Alleles for Crop Improvement
how cloning QTL leads to an understanding of the complex genetic mechanisms underlying natural variation for traits of evolutionary significance.
Chromosome Segment Substitution Lines
Chromosome segment substitution lines (CSSLs) can be thought of as a library of NILs in which each line contains a different segment of donor DNA; together, the CSSLs provide a complete library of donor introgressions. CSSLs provide a lasting resource in which all of the donor genome is present in the background of a recipient genome. Because only one introgression is present in each line, its effects are not masked by other components of the donor genome. These lines offer an excellent starting point for mapping and cloning a gene or QTL of interest, and are especially important for identifying genes with small effects. Many studies have reported the creation of CSSLs in rice, using a variety of donors and recipients (Aida et al. 1997; Doi et al. 2002; Table 7.1). Some involve indica × japonica crosses and offer a view of the variation that is generated when the genomes of individuals from these two groups are combined. Using indica × japonica CSSLs, researchers have identified QTLs for seed dormancy, cadmium (Cd) concentration in grain, grain length and width, cooking quality, tiller angle, nitrogen content, heading date, and resistance to iron toxicity, among others (Kubo et al. 2002; Jiang et al. 2003; Wan et al. 2003, 2004; Mu et al. 2004; Ebitani et al. 2005; Ishikawa et al. 2005; Yang et al. 2005; Yu et al. 2005; Wan et al. 2006). Other CSSLs involve crosses between the two cultivated species of rice, O. sativa × O. glaberrima (Ghesquiere et al. 1997), or between wild or weedy relatives and cultivated rice (Sobrizal et al. 1999; Kurakazu et al. 2001; Ahn et al. 2002; Tian et al. 2006). These lines provide a permanent resource for the rice genetics community and represent “bridging” material for rice breeders who wish to introgress specific regions of a donor genome into an elite cultivar to take advantage of the large reservoir of natural variation that exists within the Oryza genus.
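One way a CSSL library might be used for coarse mapping is to compare the phenotype of each line with that of the RP and to flag the introgressed segment of any line that differs. The sketch below is illustrative only: the line names, segment labels, and trait values are invented, and an exact permutation test stands in for whatever statistic a given study actually used. A real analysis would normally use replicated trials and a correction for testing many lines.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(line_values, rp_values):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(line_values) + list(rp_values)
    n_line = len(line_values)
    n_rest = len(pooled) - n_line
    total = sum(pooled)
    observed = abs(mean(line_values) - mean(rp_values))
    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_line plants as "CSSL".
    for idx in combinations(range(len(pooled)), n_line):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n_line - (total - group_sum) / n_rest)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count


def flag_cssls(cssl_phenotypes, rp_values, alpha=0.05):
    """Return {line: (segment, p)} for lines that differ from the RP."""
    flagged = {}
    for line, (segment, values) in cssl_phenotypes.items():
        p = exact_permutation_p(values, rp_values)
        if p < alpha:
            flagged[line] = (segment, p)
    return flagged


# Hypothetical heading-date data (days); not taken from any study.
rp = [100, 102, 98, 101, 99]
cssls = {
    "CSSL-01": ("chr1: 0-5 Mb", [101, 99, 100, 103, 97]),
    "CSSL-02": ("chr6: 8-12 Mb", [88, 90, 87, 91, 89]),
}
hits = flag_cssls(cssls, rp)
for line, (segment, p) in hits.items():
    print(f"{line}: candidate region {segment} (p = {p:.4f})")
```

With these invented numbers only the second line is flagged, which would place a gene affecting the trait within its introgressed segment.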
Individually, CSSLs are no different from NILs, but together, they represent a handy tool for low-resolution mapping, in that the phenotype of each individual line can be compared to all other lines, or only to the RP. If it is significantly different from the RP, it can be concluded that a gene(s) associated with the phenotype of interest is located within the region of introgression. CSSLs can also be used as the parents of choice in crosses with a third (donor) genotype, where the genetic background of the CSSL is a good combiner with the new donor, but an introgression across a region of interest provides a divergent template that facilitates the identification of polymorphic DNA markers. This approach was employed in an interesting study to clone the spotted leaf gene, Spl7, which combined mutant analysis
with mapping using CSSLs (Yamanouchi et al. 2002). Induced mutations in the japonica cultivar, Norin 8, resulted in a mutant, spl7, that showed spontaneous lesions under high temperature and UV light. Mutants also showed increased susceptibility to pathogens and had decreased levels of defense genes. Spl7 was mapped to the long arm of chromosome 5. To clone the gene, the Norin 8 mutant line was crossed to a CSSL (KL210), which contained a segment of chromosome 5 from the indica cultivar, Kasalath, in the background of the japonica cultivar, Nipponbare. The region of interest was highly polymorphic owing to the indica–japonica combination, but the rest of the genome was japonica, making it easy to generate a large, fertile F2 population for subsequent fine-mapping and gene isolation. Spl7 was determined to correspond to heat shock factor 7 (HSF7).
7.4.3 Association Mapping
A mapping approach that has been widely used in human genetics, and only more recently in plants, is linkage disequilibrium (LD) mapping, in which nonrandom associations among alleles within a population (i.e., alleles that are correlated, or co-inherited) are interpreted as being physically linked (Nordborg and Tavare 2002; Weiss and Clark 2002; Fig. 7.2). The molecular markers most commonly used for association mapping and evolutionary studies include SNPs, which are the most abundant type of DNA polymorphism in eukaryotic genomes, and SSRs, which are known to be highly polymorphic. SNPs and SSRs have been used to examine the effect of population history, breeding system and selection in particular regions of the genome or at particular genetic loci, and to investigate the mechanisms that drive evolutionary change and contribute to genomic diversity.
These marker systems allow scientists to examine the levels and patterning of nucleotide polymorphisms within and between loci and to test whether specific genes are evolving under selection or in a neutral manner (Hudson and Kaplan 1988; Kreitman and Akashi 1995; Hudson et al. 1997; Nielsen 2001). When a mutation initially arises in a population, it is automatically associated or “in disequilibrium” with all the alleles present in the genome of the individual that gave rise to the mutation. If this mutation persists in the population over evolutionary time, through genetic drift, associations with other alleles are gradually eroded by segregation and recombination, so that eventually, the mutation is in LD only with alleles that are physically closely linked to it (Barton 2000). There are several measures of LD (Weir 1996), and it has been demonstrated that LD is affected by various evolutionary and demographic forces, including selection, population admixtures, inbreeding, and bottlenecks (Weir 1996; Nordborg and Tavare 2002; Weiss and Clark 2002).
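As an illustration of what a "measure of LD" looks like in practice, the sketch below computes three commonly used statistics (D, D′, and r²) for a pair of biallelic loci. It assumes that haplotypes are directly observed, as they effectively are in homozygous inbred lines, and that alleles are coded 0 and 1; the function name and the coding are choices made for this example rather than anything prescribed by the references cited above.

```python
def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B),
    with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D: departure of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D': |D| scaled by the maximum value it could take for these
    # allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Two loci whose alleles are always inherited together (complete LD) ...
complete = [(1, 1)] * 5 + [(0, 0)] * 5
# ... and two loci whose alleles are combined at random (no LD).
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_stats(complete))
print(ld_stats(independent))
```

In a new mutation's first generations the pattern resembles the first case across much of the genome; recombination then moves distant marker pairs toward the second case.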
Fig. 7.2. Association mapping. Each blue horizontal bar represents a chromosomal region that has been genotyped in the association study. The colored vertical bars represent polymorphic insertion/deletion (indel) loci across the region, each color representing a different nucleotide. The genotypes have been grouped by phenotype, in this case resistance and susceptibility to disease. The boxed region contains indels that are in LD with the trait. All resistant plants carry the same haplotype and susceptible plants carry a different haplotype (See also color plate section).
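The kind of marker-trait association pictured in Fig. 7.2 can be examined with a simple contingency-table test. The sketch below uses a two-sided Fisher's exact test on a 2 × 2 table of marker allele by phenotype class; the counts are invented, and the test is a deliberately minimal stand-in that ignores population structure, which can create spurious associations in real germplasm panels.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def marker_trait_table(samples, allele="A", phenotype="resistant"):
    """Build (a, b, c, d) from a list of (marker_allele, phenotype) pairs."""
    a = sum(1 for m, p in samples if p == phenotype and m == allele)
    b = sum(1 for m, p in samples if p == phenotype and m != allele)
    c = sum(1 for m, p in samples if p != phenotype and m == allele)
    d = sum(1 for m, p in samples if p != phenotype and m != allele)
    return a, b, c, d


# Hypothetical panel: 10 resistant and 10 susceptible accessions.
panel = ([("A", "resistant")] * 10
         + [("A", "susceptible")] * 1
         + [("B", "susceptible")] * 9)
table = marker_trait_table(panel)
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

A small p-value indicates that the marker allele and the phenotype are not independent in the sample; it does not by itself show whether the marker is causal or merely in LD with the causal polymorphism.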
Taking advantage of LD and the high densities of SNP and SSR markers that have been annotated in a genome, LD mapping seeks to identify and map genes responsible for both qualitative and quantitative trait variation (Terwilliger and Weiss 1998; Kruglyak 1999; Jorde 2000). This mapping strategy, also referred to as association mapping, offers an alternative to the requirement to make controlled biparental crosses and to develop large populations for segregation analysis. In humans, in whom controlled crossing is not an option, association mapping exploits the fact that any mutation that causes a phenotypic change and that persists in a population should be in LD with an array of alleles that are closely linked to it (Terwilliger and Weiss 1998; Kruglyak 1999; Jorde 2000). To undertake an association mapping experiment, a population is genotyped for markers that span a genomic region of interest. The markers are then tested against a specific phenotype to determine whether a statistical correlation exists between marker genotypes and a particular trait. A significant association between a specific marker(s) and a trait phenotype may arise either because the nucleotide polymorphism causes the phenotypic difference, or because the marker is in LD with the causal (functional) polymorphism. LD mapping has several advantages over QTL mapping approaches. First, it can survey the variation in a large population, and not simply the two progenitors of a mapping population. Second, by relying on historical recombination, it offers the hope of being able to localize QTLs to a higher
degree of resolution than is possible with the same number of individuals using traditional QTL linkage analysis. Third, the technique can be used without the need to develop new mapping populations. These features have led to concerted efforts to develop and exploit LD mapping for identifying genes in several crop species (Long et al. 1998; Puca et al. 2001; Thornsberry et al. 2001; Tabor et al. 2002). In general, LD mapping can be used either in genome scans or in candidate gene association studies (Kruglyak 1999; Tabor et al. 2002). In genome scans, either the entire genome or a specific genomic region can be analyzed with molecular markers of sufficient density that they help localize the QTL. In a candidate gene association study, a candidate gene for a given trait may have been previously identified, and the association is examined in the context of polymorphic markers localized within this specific functional candidate gene (Tabor et al. 2002). Association mapping using a whole genome scan was first used in rice in a study by Virk et al. (1996) and more recently by Zhang et al. (2005). The ability to undertake reliable whole genome LD mapping depends on both the density of markers and the extent of LD in the rice genome. The first study to evaluate the extent of LD decay within a candidate gene region in rice was that of Garris et al. (2003), who examined the extent of LD around the xa5 locus. Using 114 landraces from the aus sub-population of rice, LD was found to persist across a 70-kb region containing five genes bracketing the xa5 locus. This meant that the gene could not be unequivocally identified via this approach, and further recombinational mapping was necessary to isolate the gene (Iyer and McCouch 2004). A second example of LD analysis in rice targeted the region around the Waxy gene (Olsen et al. 2006). In this study, LD was found to persist across a 250-kb region (approximately 1 cM), showing asymmetric bracketing of Waxy. 
Together, these estimates suggest that LD decays more slowly in rice than in maize or other outcrossing species, such as Drosophila or humans, and that it provides much lower resolution than recombinational fine-mapping as a strategy for gene discovery. However, the large regions of LD observed in rice raise the possibility that association mapping strategies using whole genome scans may provide a realistic approach as a first-pass mapping strategy, offering exciting opportunities to couple genome mapping with the exploration of population substructure and germplasm diversity.
7.4.4 Gene Identification and Development of Perfect Markers for Applications in Breeding
Researchers are rapidly isolating genes associated with quantitative traits of interest. As these genes are cloned, researchers have begun to design
“perfect markers” that target FNPs so that breeders can readily distinguish favorable from unfavorable alleles by testing a small sample of DNA. Perfect, or functional, markers are based directly on a sequence polymorphism that is responsible for a functional change in a target gene. Perfect markers are 100% predictive of the presence/absence of a particular allele (Andersen and Lubberstedt 2003). For major genes in which a single locus is responsible for most of the natural variation associated with the trait, having a perfect marker allows breeders to predict the phenotype of a plant reliably without seeing the trait expressed in the field. This has found commercial application in testing for seed purity, confirming the identity of a variety or seed stock in a germplasm collection and in screening unknown germplasm or recombinants to determine how to classify it or whether to keep or discard it in a plant improvement program. Where phenotyping a population is expensive and labor intensive, these markers are of great assistance. Perfect markers are now available in rice for genes conferring resistance to blast and bacterial blight (Jia et al. 2004; Iyer-Pascuzzi and McCouch 2006), red pericarp (Sweeney et al. 2006), grain amylose content (Yamanaka et al. 2004), and aroma (Bradbury et al. 2005). Molecular markers are used to tag genes or regions of chromosomes containing alleles of interest, and the presence or absence of a favorable allele can then be detected using a small amount of DNA extracted from the leaf of a young plant long before the trait itself is actually expressed or detectable in the whole plant. By allowing breeders to screen for or against critical alleles early in plant development, it can save time, labor, and the expense of screening unwanted individuals in the field. This is particularly critical when the trait of interest is economically important but can only be evaluated late in the life of the plant. 
Examples include seed quality characteristics, male sterility, and flowering times. Marker-assisted selection is also very helpful when introgressing multiple disease resistance genes from either natural sources or transgenic sources. Often, it is costly and difficult to screen for multiple R genes, particularly if some are recessive, if the phenotype of one masks that of a second or third R gene, or if the pest is either quarantined or not present in the field every year. Breeding for aroma in rice is an excellent example of how perfect markers are superior to other methods of detection. Basmati and jasmine aromatic rices command premium prices in the stores and as such are a target of breeding programs. When new traits are introduced into aromatic rice from nonaromatic sources, breeders need to be sure of retaining the aromatic quality. Aroma is a recessive trait, so in traditional breeding a generation of progeny testing is needed to determine if individuals carry the allele for aroma (Berner and Hoff 1986). The compound associated with the aroma in rices is known to be 2-acetyl-1-pyrroline (Buttery et al. 1983; Lorieux et al. 1996; Widjaja et al. 1996; Yoshihashi 2002), but selection
for this chemical has not been easy. Breeding programs have tried several different methods of detecting aroma, including panels of taste experts, chemical reactions, gas chromatography, and DNA markers that are linked to the gene (Sood and Sidiq 1978; Reinke et al. 1991; Widjaja et al. 1996; Lorieux et al. 1996; Cordeiro et al. 2002). But these methods are time consuming, expensive, not completely reliable, or all of the above. The gene responsible for fragrance in rice has been cloned and shown to encode betaine aldehyde dehydrogenase 2 (BAD2) (Bradbury et al. 2005). Nonaromatic rice contains a functional copy of this gene, whereas the allele in aromatic rice contains an 8-bp deletion and three SNPs. These polymorphisms introduce a frame shift that leads to a premature stop codon and truncates the protein. Using this information, Bradbury et al. (2005) designed a perfect marker for the aroma gene. This marker is run in a single tube; contains an internal positive control; can clearly differentiate homozygous aromatic, homozygous nonaromatic, and heterozygous individuals; and can be detected on agarose. This test is quick, inexpensive, and completely accurate, and it eliminates the need for progeny testing because heterozygous individuals can be identified.
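The logic of scoring such a single-tube assay can be sketched as a small decision function. The example below assumes only what is stated above: an internal positive control band, one allele-specific band for the aromatic allele, and one for the nonaromatic allele, with aroma inherited as a recessive trait. The function names and the True/False band encoding are hypothetical and are not part of the published protocol.

```python
def call_aroma_genotype(control_band, aromatic_band, nonaromatic_band):
    """Call a genotype from the presence or absence of three gel bands."""
    if not control_band:
        # Without the internal positive control the reaction is uninformative.
        return "failed reaction"
    if aromatic_band and nonaromatic_band:
        return "heterozygous"
    if aromatic_band:
        return "homozygous aromatic"
    if nonaromatic_band:
        return "homozygous nonaromatic"
    return "no allele-specific band"


def predicted_phenotype(genotype):
    """Aroma is recessive, so only homozygous aromatic plants are fragrant."""
    if genotype == "homozygous aromatic":
        return "aromatic"
    if genotype in ("heterozygous", "homozygous nonaromatic"):
        return "nonaromatic"
    return "unknown"


# Hypothetical gel readings: (control, aromatic band, nonaromatic band).
readings = {
    "plant-1": (True, True, False),
    "plant-2": (True, True, True),
    "plant-3": (True, False, True),
    "plant-4": (False, False, False),
}
for plant, bands in readings.items():
    genotype = call_aroma_genotype(*bands)
    print(plant, genotype, predicted_phenotype(genotype))
```

The second plant illustrates the practical advantage described above: it would be scored as nonaromatic by any test of the grain itself, yet the marker shows that it carries the aroma allele.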
7.5 Natural Variation and Epistasis
The concept of epistasis is central to our understanding of natural variation (Thomson et al. 2006). Mutant studies often use only one genetic background, such as the Tos17 lines in cv. Nipponbare or the deletion lines in cv. IR64. In contrast, natural variation occurs in many genetic backgrounds; this enables the study of epistatic interactions. Sometimes a certain allele is needed in the genetic background for the effect of an allele at another locus to be observed. Thus the same haplotype may give rise to diverse phenotypes in different cultivars, depending on the combination of other alleles within the cultivar. Though flowering time is reviewed extensively elsewhere in this volume, we discuss it briefly here to provide an example of how epistatic interactions give rise to complex phenotypes that can be unraveled by studying natural variation. Extensive natural variation for flowering time exists within rice germplasm. Rice is a photosensitive plant in which flowering is promoted under short day conditions. As rice cultivation migrated north and south of the equator, the day length during the growing season increased. Varieties in areas with longer day lengths developed the ability to flower in reasonable time periods under longer days, while those in regions with shorter days were able to flower under shorter photoperiods. Flowering time is an important trait for plant breeders, both because they need to be able to make
crosses and because farmers need varieties that will flower reliably in their production environments. In an effort to understand the genes underlying the variation in flowering time, Masahiro Yano’s group used populations derived from a cross between the aus cv. Kasalath and the japonica cv. Nipponbare, as these differ in photoperiod sensitivity and flowering time. QTL studies from these crosses revealed many different genomic regions influencing flowering time. Some of the QTLs could be detected in F2 populations, but others reached significance only in advanced backcross lines. One of these QTLs, Heading date6 (Hd6), was shown to have an epistatic interaction with Hd2, such that the effect of the Kasalath allele of Hd6 was observed only in the presence of the Nipponbare allele of Hd2 (Yamamoto et al. 2000). Extensive effort has resulted in the cloning of many of these QTLs, determination of the functional nucleotide polymorphisms, establishment of their effects on each other and on other target genes, and the development of a framework of events affecting flowering time under both short and long day conditions. Thus, in a single genetic background, variants that could be advantageous may not be visible unless the genetic background contains the correct interacting partners. As these pathways are better understood and the components of the genetic systems controlling them are identified, we are beginning to develop a toolkit that allows us to predict and better utilize the diversity of alleles that are available in our germplasm resources.
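An epistatic interaction of the kind described for Hd6 and Hd2 can be expressed as a simple contrast: the effect of an allele at one locus is estimated separately in each genotypic class at the second locus, and the two estimates are compared. The sketch below does this for invented days-to-heading values; the numbers, their direction, and the function names are illustrative assumptions and do not reproduce the data of Yamamoto et al. (2000).

```python
from statistics import mean


def allele_effect(records, hd2_background):
    """Mean days to heading, Kasalath minus Nipponbare allele at Hd6,
    among lines carrying the given allele at Hd2."""
    kas = [days for hd6, hd2, days in records
           if hd6 == "Kas" and hd2 == hd2_background]
    nip = [days for hd6, hd2, days in records
           if hd6 == "Nip" and hd2 == hd2_background]
    return mean(kas) - mean(nip)


def interaction_contrast(records):
    """Difference between the Hd6 effects in the two Hd2 backgrounds.
    A value near zero means the loci act additively."""
    return allele_effect(records, "Nip") - allele_effect(records, "Kas")


# Hypothetical lines: (Hd6 allele, Hd2 allele, days to heading).
lines = [
    ("Kas", "Nip", 110), ("Kas", "Nip", 112),
    ("Nip", "Nip", 100), ("Nip", "Nip", 102),
    ("Kas", "Kas", 95), ("Kas", "Kas", 97),
    ("Nip", "Kas", 95), ("Nip", "Kas", 97),
]

print("Hd6 effect with Nipponbare Hd2:", allele_effect(lines, "Nip"))
print("Hd6 effect with Kasalath Hd2:", allele_effect(lines, "Kas"))
print("interaction contrast:", interaction_contrast(lines))
```

In this toy data set the Hd6 effect is 10 days in one Hd2 background and zero in the other, so a study confined to the second background would not detect the locus at all.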
7.6 Natural Variation or Mutant Analysis?
One of the most useful qualities of natural variation is that it allows researchers to identify genes that contribute in subtle ways to the overall phenotype. In addition, natural variation allows the identification of alleles that could not be found based on analysis of knockout or insertion mutants. This may be because such genes produce lethal phenotypes if completely eliminated, because the natural alleles result in the production of a slightly altered protein product rather than a deletion or significant up-regulation of the protein, or because the phenotype of interest has already been knocked out in the wild type, such that no further mutational pressure is likely to allow it to regain function (as is the case with red pericarp, discussed above). Throughout this chapter we have focused on the uses of natural variation and the advantages of investigating natural variation compared to mutant analysis. However, for the geneticist, natural variation and mutant analysis frequently intersect and both are useful in understanding gene function. For example, more than 60 genes are known in rice in which mutations result in semi-dwarf stature (Futsuhara and Kikuchi 1997). The most commonly used semi-dwarf locus in plant breeding, semi-dwarf1
(sd1), is the gene responsible for the green revolution. Interestingly, this gene was identified by breeders both as a natural variant and as an induced mutant, and both sources of alleles have been successfully used in breeding. The recessive sd1 gene was first detected in the indica cultivar Dee-gee-woo-gen (DGWG) as a natural variant and subsequently used in Taiwan to develop the cultivar Taichung Native 1 (TN1) and at IRRI to develop the widely planted indica cultivar, IR8. This allele was also transferred via crossing to many of the japonica cultivars grown in Korea and California (Hedden 2003). An additional dwarfing variety was induced by mutation and widely used in breeding programs in China, Japan, and the United States. Calrose 76 is one of the more commonly known varieties produced in this way (Hedden 2003). In 2002, this locus was cloned by five groups and shown to encode OsGA20ox2, a gibberellin A biosynthetic gene (Ashikari et al. 2002; Monna et al. 2002; Sasaki et al. 2002; Septiningsih 2002; Spielmeyer et al. 2002). The natural variants and induced mutants are distinguishable molecularly, though they are practically indistinguishable phenotypically and both carry mutations in the same gene. The natural variant found in DGWG carries a 383-bp deletion in OsGA20ox2 that causes loss of function as a result of the introduction of a stop codon that results in a truncated enzyme. In contrast, the induced mutant has point mutations that result in single amino acid substitutions, decreasing the activity of the enzyme. The gibberellic acid (GA) biosynthesis and signaling pathways have many genes that, when mutated, produce a semi-dwarf phenotype (Sakamoto et al. 2004). Indeed, the green revolution semi-dwarf phenotypes in wheat and rice are remarkably similar, and have similar agronomic value; however, the genes underlying the phenotypes are different.
In wheat, the phenotype is the result of a semi-dominant mutation in the Rht gene, which encodes a regulatory protein associated with the GA pathway. In rice, the phenotype is due to a knock-out of the sd1 gene, a biosynthetic gene in the same pathway (Peng et al. 1999). Interestingly, mutations in the Rht and other genes of the GA pathway have been evaluated in rice (Ashikari et al. 1999; Ikeda et al. 2001); they have detrimental consequences that result in decreased performance and would therefore not be useful in a breeding program. Natural variation and mutant analysis can also intersect and complement each other in plant genetic studies. Often, using both approaches in parallel achieves better and more efficient results. For example, using a combination of QTL and mutant analysis, Sergeeva et al. (2006) recently identified a vacuolar invertase underlying a QTL responsible for hypocotyl elongation in Arabidopsis. After narrowing the region of interest with fine mapping, they were able to identify mutants in candidate genes and test these mutants for hypocotyl length. By identifying a knockout mutant in the
invertase candidate, they were able to identify it as responsible for the QTL without further fine-mapping, saving time and energy. With more and more mutant collections becoming available for rice, this will soon be a viable technique for most rice genes.
7.7 Natural Variation versus Transgenic Approaches for Crop Improvement
One of the primary goals of researchers working with natural variation is to use it for crop improvement. This is also often the case with those working with another kind of genetic variation, transgenics, or genetically modified organisms (GMOs). Though GMOs are often posited by the media as either a cure-all for global hunger or the end of the civilized world, most people would probably agree the answer to their use lies somewhere in the middle. An approach combining crop improvement using sexual recombination to harness natural variation and transgenic approaches is almost certainly the one most likely to succeed. At the moment, the use of natural variation has several advantages over transgenics: economic considerations, including the ability of farmers to sell their crop on the global market; sociological factors related to public acceptance; and biological factors, including issues related to biosafety. With the development of molecular maps and markers, marker-assisted breeding aimed at recombining sources of natural variation can be less expensive and just as efficient as transgenic approaches. Natural variation is the product of natural and/or artificial selection working on the whole genome over long periods of time. This suggests a degree of ecological and geographical adaptation that cannot be easily replicated by transgenic approaches. Newly created variation in the form of GMOs has to be extensively tested to determine what advantages or disadvantages these novel genotypes might provide. Further, it is important to keep in mind that the advantages of a particular allele or allele combination may not be visible if the line(s) is not tested under the right conditions. Thus, a novel disease resistance gene will not be detected unless it is challenged by the appropriate pathogen. That said, transgenic approaches do have a place in crop improvement.
One benefit of the transgenic approach is that it allows breeders to combine multiple alleles of a single locus. Using natural variation, only one allele in a homozygous state or two alleles in a heterozygous state is feasible. Transgenics offer the possibility of introducing multiple alleles from a single locus into one genotype. Although considered unnatural, biotechnology entails the use of diverse transgenes, most of which are found in nature, but evolved in an organism
that is not the focus of the research. Novel genetic variation is thus created by introducing a transgene into an existing variety and in subsequent generations, the transgene can be moved from one genetic background to another via crossing and marker-assisted selection using “perfect markers.” There are currently three types of transgenes for rice that are either in the process of release, undergoing field trials, or in the early stages of development: Bt, conferring insect resistance; Xa2, conferring disease resistance; and a pair of genes conferring golden rice. As of the beginning of 2006, only one country, Iran, has released transgenic rice (Bt-rice) commercially, though several other countries will probably do so in the coming years. Here, we focus our discussion on golden rice, as recent improvements to this technology hold much promise. In countries where rice is a staple part of the diet, vitamin A deficiency can be severe, since provitamin A, or beta-carotene, is not found in the rice endosperm. Vitamin A deficiency causes blindness and increases the susceptibility to and severity of other diseases. To alleviate this problem, transgenic golden rice was developed. Golden rice successfully synthesizes provitamin A as part of the beta-carotene pathway and the beta-carotene accumulates in the rice endosperm, giving it a golden color. Golden rice was first engineered with two foreign genes: a daffodil phytoene synthase (psy) and a bacterial transit peptide known as ctr1. The first generation of golden rice produced using these genes had 1.6 μg beta-carotene per gram of rice, not enough to prevent vitamin A deficiency, but a good beginning. Recently, Paine et al. (2005) hypothesized that the psy gene was the limiting factor in the production of higher levels of beta-carotene.
The authors tested psy genes from several different organisms (rice, maize, Arabidopsis, sunflower, pepper, narcissus, and tomato) in combination with the ctr1 gene and measured carotenoid content in the grains of the resulting transgenic plants. Using the phytoene synthase gene from maize, they were able to increase the concentration of total carotenoids approximately 23-fold to 37 µg/g, of which 31 µg/g was beta-carotene. The authors estimate that 70 g of “golden rice 2” will contribute approximately 50% of a child’s recommended daily allowance (RDA). As an average child’s portion of rice is approximately 60 g, and in most countries where rice is a staple crop, it is eaten several times a day, this could be enough to prevent vitamin A deficiency in these countries. Although this form of variation is not “natural,” this rice is likely to have a positive impact in the lives of many. The ability to target gene recombination is a breeder’s dream and developments in this area of biotechnology are likely to be very beneficial in the coming years, both for basic and applied research. This technique is akin to gene therapy and aims to facilitate the direct exchange of one allele for another at a particular locus. The outcome would be to reduce the positional effects that are associated with current transgenic procedures
and to enable a researcher to substitute one rice allele for another without recombining the entire rice genome. Though “targeted allele replacement” is still in its infancy, Terada et al. (2002) demonstrated its utility by creating substitution mutants in the Waxy gene. They obtained T0 plants that were heterozygous for the substitution and without any ectopic recombination. Their technique was independent of gene-specific selection, and can therefore be applied to other genes. Eventually plant breeders may be able to identify natural variants and quickly add or subtract them from selected lines in a breeding population using targeted allele replacement technology.
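The golden rice figures quoted earlier in this section can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (1.6 µg/g in the first generation, 37 µg/g total carotenoids and 31 µg/g beta-carotene in “golden rice 2”, 70 g supplying about 50% of a child's RDA, and a 60-g portion). It assumes that the contribution to the RDA scales linearly with the amount of rice eaten, which is a simplification made for this example.

```python
FIRST_GENERATION_UG_PER_G = 1.6   # beta-carotene, first-generation golden rice
TOTAL_CAROTENOIDS_UG_PER_G = 37.0  # golden rice 2, total carotenoids
BETA_CAROTENE_UG_PER_G = 31.0      # golden rice 2, beta-carotene
RICE_FOR_HALF_RDA_G = 70.0         # grams said to supply about 50% of the RDA
CHILD_PORTION_G = 60.0             # average child's portion


def fold_increase(new, old):
    """Ratio of the new concentration to the old one."""
    return new / old


def rda_fraction_per_portion(portion_g, grams_for_half_rda):
    """Fraction of the RDA supplied by one portion, assuming linear scaling."""
    return 0.5 * portion_g / grams_for_half_rda


fold = fold_increase(TOTAL_CAROTENOIDS_UG_PER_G, FIRST_GENERATION_UG_PER_G)
beta_in_70_g = BETA_CAROTENE_UG_PER_G * RICE_FOR_HALF_RDA_G
per_portion = rda_fraction_per_portion(CHILD_PORTION_G, RICE_FOR_HALF_RDA_G)
portions_for_full_rda = 1 / per_portion

print(f"fold increase: {fold:.1f}")
print(f"beta-carotene in 70 g: {beta_in_70_g:.0f} micrograms")
print(f"RDA fraction per 60-g portion: {per_portion:.1%}")
print(f"portions needed for the full RDA: {portions_for_full_rda:.2f}")
```

Under these assumptions a single 60-g portion supplies a little over 40% of the RDA, so two to three portions a day would cover it, which is consistent with the statement that rice eaten several times a day could be sufficient.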
7.8 Conclusions
Natural variation is one of the most important resources for understanding plant processes. We now have many of the tools in place that are necessary to clone individual genes. However, because so many different genes contribute to most agriculturally valuable phenotypic variation, including flowering time, seed dormancy, or yield, one of the next challenges is to understand how these multiple genes interact to shape the anatomy and development of the rice plant. We are beginning to address the molecular basis of epistatic interactions that condition plant growth and development, and information from these studies will shed new light on our understanding of how natural variation is organized in populations or species, how it functions in particular plant processes, and how it evolves. In addition, microRNA variants, post-translational modifications, DNA methylation patterns, and interactions among them only add to the possibilities of identifying natural variants. While it is clear that we will be busy for many decades to come, early work in gene identification has clearly demonstrated that progress is possible. Further, work with natural variation has demonstrated that it can shed light on fundamental questions related to environmental adaptation and evolutionary history.
References
Ahn SN, Suh JP, Oh CS, Lee SJ, Suh HS (2002) Development of introgression lines of weedy rice in the background of Tongil-type rice. Rice Genet Newsl 19:14
Aida Y, Tsunematsu H, Doi K, Yoshimura A (1997) Development of a series of introgression lines of japonica in the background of indica rice. Rice Genet Newsl 14:41–43
Andaya VC, Mackill DJ (2003) QTLs conferring cold tolerance at the booting stage of rice using recombinant inbred lines from a japonica × indica cross. Theor Appl Genet 106:1084–1090
Andersen JR, Lubberstedt T (2003) Functional markers in plants. Trends Plant Sci 8:554–560
Ashikari M, Wu J, Yano M, Sasaki A, Yoshimura A (1999) Rice gibberellin-insensitive dwarf mutant gene Dwarf1 encodes the alpha-subunit of GTP-binding protein. Proc Natl Acad Sci USA 96:10284–10289
Ashikari M, Sasaki A, Ueguchi-Tanaka M, Itoh H, Nishimura A, Datta S, Ishiyama K, Saito T, Kobayashi M, Khush GS, Kitano H, Matsuoka M (2002) Mutation in a gibberellin biosynthesis gene, GA20 oxidase, contributed to the rice ‘Green Revolution’. Breed Sci 52:143–150
Ashikari M, Sakakibara H, Lin H, Yamamoto T, Takashi T, Nishimura A, Angeles ER, Qian Q, Kitano H, Matsuoka M (2005) Cytokinin oxidase regulates rice grain production. Science 309:741–745
Barton NH (2000) Genetic hitchhiking. Phil Trans Royal Soc London B 355:1553–1562
Berner DK, Hoff BJ (1986) Inheritance of scent in American long grain rice. Crop Sci 26:876–878
Bradbury LM, Henry RJ, Jin QS, Reinke RF, Waters DL (2005) A perfect marker for fragrance genotyping in rice. Mol Breed 16:279–283
Brar DS, Khush GS (1997) Alien introgression in rice. Plant Mol Biol 35:35–47
Burr B, Burr FA, Thompson KH, Albertson MC, Stuber CW (1988) Gene mapping with recombinant inbreds in maize. Genetics 118:519–526
Buttery RG, Ling LC, Juliano BO, Turnbaugh JG (1983) Cooked rice aroma and 2-acetyl-1-pyrroline. J Agric Food Chem 31:823–826
Cai HW, Morishima H (2002) QTL clusters reflect character associations in wild and cultivated rice. Theor Appl Genet 104:1217–1228
Causse MA, Fulton TM, Cho YG, Ahn SN, Chunwongse J, Wu K, Xiao J, Yu Z, Ronald PC, Harrington SE, et al (1994) Saturated molecular map of the rice genome based on an interspecific backcross population. Genetics 138:1251–1274
Champoux MC, Sarkarung S, Mackill DJ, O'Toole JC, Huang N, McCouch SR (1995) Locating genes associated with root morphology and drought avoidance via linkage to molecular markers. Theor Appl Genet 90:969–981
Chang TT (1976) Origin, evolution, cultivation, dissemination, and diversification of Asian and African rices. Euphytica 25:425–441
Chen X, Temnykh S, Xu Y, Cho YG, McCouch SR (1997) Development of a microsatellite framework map providing genome-wide coverage in rice (Oryza sativa L.). Theor Appl Genet 95:553–567
Cordeiro GM, Christopher MJ, Henry RJ, Reinke RF (2002) Identification of microsatellite markers for fragrance in rice by analysis of the rice genome sequence. Mol Breed 9:245–250
Doi K, Sobrizal K, Ikeda K, Sanchez PL, Kurakazu T (2002) Developing and evaluating rice chromosome segment substitution lines. In: IRRI Conference (September 16–19, 2002). International Rice Research Institute, Beijing, China. pp. 275–287
Ebitani T, Takeuchi Y, Nonoue Y, Yamamoto T, Takeuchi K, Yano M (2005) Construction and evaluation of chromosome segment substitution lines carrying overlapping chromosome segments of indica rice cultivar ‘Kasalath’ in a genetic background of japonica elite cultivar ‘Koshihikari’. Breed Sci 55:65–73
Edwards J, McCouch SR (2005) Molecular markers for use in plant breeding and germplasm evaluation. In: Proceedings of Red Bio (June 21–24, 2004). Dominican Republic
Engle LM, Chang TT, Ramirez DA (1969) The cytogenetics of sterility in F1 hybrids of indica × indica and indica × japonica varieties of rice (Oryza sativa L.). Philipp Agric 53:289–307
Futsuhara Y, Kikuchi F (1997) Dwarf characters. In: Matsuo T, Futsuhara Y, Kikuchi F, Yamaguchi H (eds) Science of the Rice Plant, vol 3, Genetics. Food and Agriculture Policy Research Center, Tokyo, pp 300–308
Garris AJ, McCouch SR, Kresovich S (2003) Population structure and its effect on haplotype diversity and linkage disequilibrium surrounding the xa5 locus of rice (Oryza sativa L.). Genetics 165:759–769
Garris AJ, Tai TH, Coburn JR, Kresovich S, McCouch S (2005) Genetic structure and diversity in Oryza sativa L. Genetics 169:1631–1638
Ghesquiere A, Sequier J, Second G, Lorieux M (1997) First steps toward a rational use of African rice, Oryza glaberrima in rice breeding: A contig line concept. Euphytica 96:31–39
Glaszmann JC (1987) Isozymes and classification of Asian rice varieties. Theor Appl Genet 74:21–30
Gu K, Yang B, Tian D, Wu L, Wang D, Sreekala C, Yang F, Chu Z, Wang GL, White FF, Yin Z (2005) R gene expression induced by a type-III effector triggers disease resistance in rice. Nature 435:1122–1125
Guiderdoni EJ, Glaszmann JC, Courtois B (1988) Segregation of 12 isozyme genes among doubled haploid lines derived from a japonica × indica cross of rice (Oryza sativa L.). Euphytica 42:45–53
Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2,275 markers using a single F2 population. Genetics 148:479–494
Hedden P (2003) The genes of the green revolution. Trends Genet 19:5–9
Hittalmani S, Huang N, Courtois B, Venuprasad R, Shashidhar HE, Zhuang JY, Zheng KL, Liu GF, Wang GC, Sidhu JS, Srivantaneeyakul S, Singh VP, Bagali PG, Prasanna HC, McLaren G, Khush GS (2003) Identification of QTL for growth- and grain yield-related traits in rice across nine locations of Asia. Theor Appl Genet 107:679–690
Hudson RR, Kaplan NL (1988) The coalescent process in models with selection and recombination. Genetics 120:831–840
Hudson RR, Saez EG, Ayala FJ (1997) DNA variation at the Sod locus of Drosophila melanogaster: An unfolding story of natural selection. Proc Natl Acad Sci USA 94:7725–7729
Ikeda R, Khush GS, Tabien RE (1990) A new resistance gene to bacterial blight derived from O. longistaminata. Jpn J Breed 40:280–281
Ikeda A, Ueguchi-Tanaka M, Sonoda Y, Kitano H, Koshioka M, Futsuhara Y, Matsuoka M, Yamaguchi J (2001) slender rice, a constitutive gibberellin response mutant, is caused by a null mutation of the SLR1 gene, an ortholog of the height-regulating gene GAI/RGA/RHT/D8. Plant Cell 13:999–1010
Ishikawa S, Ae N, Yano M (2005) Chromosomal regions with quantitative trait loci controlling cadmium concentration in brown rice (Oryza sativa). New Phytol 168:345–350
Iyer AS, McCouch SR (2004) The rice bacterial blight resistance gene xa5 encodes a novel form of disease resistance. Mol Plant Microbe Interact 17:1348–1354
Iyer-Pascuzzi A, McCouch SR (2006) Functional markers for xa5 mediated resistance in rice (Oryza sativa L.). Mol Breed (In Press, DOI 10.1007/s11032-006-9055-9)
Jain S, Jain R, McCouch S (2004) Genetic analysis of Indian aromatic and quality rice (Oryza sativa L.) germplasm using panels of fluorescently-labeled microsatellite markers. Theor Appl Genet 109:965–977
Jia Y, Redus MA, Wang Z, Rutger J (2004) Development of SNLP marker from the Pi-ta blast resistance gene by tri-primer PCR. Euphytica 138:97–105
Jiang L, Cao YJ, Wang CM, Zhai HQ, Wan JM, Yoshimura A (2003) Detection and analysis of QTL for seed dormancy in rice (Oryza sativa L.) using RIL and CSSL population. Acta Genetica Sinica 30:453–458
Jiang GH, Xia ZH, Zhou YL, Wan J, Li DY, Chen RS, Zhai WX, Zhu LH (2006) Testifying the rice bacterial blight resistance gene xa5 by genetic complementation and further analyzing xa5 (Xa5) in comparison with its homolog TFIIAgamma1. Mol Genet Genomics 275:354–366
Jorde LB (2000) Linkage disequilibrium and the search for complex disease genes. Genome Res 10:1435–1444
Katayama TC (1993) Historical review of taxonomical studies. In: Matsuo T, Hoshikawa K (eds) Science of the Rice Plant, vol 1, Morphology. Food and Agriculture Policy Research Center, Tokyo, pp 35–41
Kato S, Kosaka H, Hara S (1928) On the affinity of rice varieties as shown by fertility of hybrid plants. Bull Sci Fac Agric Kyushu Univ, Fukuoka, Japan 3:132–147
Khush GS, Ling KC (1974) Inheritance of resistance to Grassy stunt virus and its vector in rice. J Hered 65:135–136
Koornneef M, Alonso-Blanco C, Vreugdenhil D (2004) Naturally occurring genetic variation in Arabidopsis thaliana. Annu Rev Plant Biol 55:141–172
Kreitman M, Akashi H (1995) Molecular evidence for natural selection. Annu Rev Ecol Syst 26:403–422
Kruglyak L (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 22:139–144
Kubo T, Aida Y, Nakamura K, Tsunematsu H, Doi K, Yoshimura A (2002) Reciprocal chromosome segment substitution series derived from Japonica and Indica cross of rice (Oryza sativa L.). Breed Sci 52:319–325
Kurakazu T, Sobrizal N, Ikeda K, Sanchez PL, Doi K, Angeles RR, Khush GS, Yoshimura A (2001) Oryza meridionalis chromosomal segment introgression lines in cultivated rice, O. sativa L. Rice Genet Newsl 18:81–82
Lanceras JC, Huang ZL, Naivikul O, Vanavichit A, Ruanjaichon V, Tragoonrung S (2000) Mapping of genes for cooking and eating qualities in Thai jasmine rice (KDML105). DNA Res 7:93–101
Lanceras JC, Pantuwan G, Jongdee B, Toojinda T (2004) Quantitative trait loci associated with drought tolerance at reproductive stage in rice. Plant Physiol 135:384–399
Lee M, Sharopova N, Beavis WD, Grant D, Katt M, Blair D, Hallauer A (2002) Expanding the genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Mol Biol 48:453–461
Li J, Yuan L (2000) Hybrid Rice: Genetics, breeding and seed production. Plant Breed Rev 17:15–120
Li ZK, Yu SB, Lafitte HR, Huang N, Courtois B, Hittalmani S, Vijayakumar CH, Liu GF, Wang GC, Shashidhar HE, Zhuang JY, Zheng KL, Singh VP, Sidhu JS, Srivantaneeyakul S, Khush GS (2003) QTL × environment interactions in rice. I. heading date and plant height. Theor Appl Genet 108:141–153
Li ZK, Fu BY, Gao YM, Xu JL, Ali J, Lafitte JR, Jiang YZ, Rey JD, Vijayakumar CHM, Maghirang R, Zheng TQ, Zhu LH (2005) Genome-wide introgression lines and their use in genetic and molecular dissection of complex phenotypes in rice (Oryza sativa L.). Plant Mol Biol 59:33–52
Long AD, Lyman RF, Langley CH, Mackay TFC (1998) Two sites in the Delta gene region contribute to naturally occurring variation in bristle number in Drosophila melanogaster. Genetics 149:999–1017
Lorieux M, Petrov M, Huang N, Guiderdoni E, Ghesquiere A (1996) Aroma in rice: genetic analysis of a quantitative trait. Theor Appl Genet 93:1145–1151
Lu BR, Naredo EB, Juliano AB, Jackson MT (1998) Taxonomic status of Oryza glumaepatula Steud. III. Assessment of genomic affinity among AA genome species from the New World, Asia, and Australia. Gen Res Crop Evol 45:215–223
Lu C, Shen L, Tan Z, Xu Y, He P, Chen Y, Zhu L (1997) Comparative mapping of QTLs for agronomic traits of rice across environments by using a doubled-haploid population. Theor Appl Genet 94:145–150
Ma J, Bennetzen JL (2004) Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA 101:12404–12410
Marri PR, Sarla N, Reddy LV, Siddiq EA (2005) Identification and mapping of yield and yield related QTLs from an Indian accession of Oryza rufipogon. BMC Genet 6:33
Matsuo T (1952) Genecological studies on cultivated rice. Bull Natl Inst Gr Sci Jpn D3:1–111
McCouch SR, Kochert G (1988) Molecular mapping of rice chromosomes. Theor Appl Genet 76:815–829
McCouch SR, Teytelman L, Xu Y, Lobos KB, Clare K, Walton M, Fu B, Maghirang R, Li Z, Xing Y, Zhang Q, Kono I, Yano M, Fjellstrom R, DeClerck G, Schneider D, Cartinhour S, Ware D, Stein L (2002) Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA Res 9:199–207
McCouch SR, Sweeney M, Li J, Jiang H, Thomson M, Septiningsih E, Edwards J, Moncada P, Xiao J, Garris A, Tai T, Martinez C, Tohme J, Sugiono M, McClung A, Yuan LP, Ahn SN (2006) Through the genetic bottleneck: O. rufipogon as a source of trait-enhancing alleles for O. sativa. Euphytica (In Press, DOI 10.1007/s10681-006-9210-8)
Moncada M, Martínez C, Tohme J, Guimaraes E, Chatel M, Borrero J, Gauch H, McCouch S (2001) Quantitative trait loci for yield and yield components in an Oryza sativa × Oryza rufipogon BC2F2 population evaluated in an upland environment. Theor Appl Genet 102:41–52
Monna L, Kitazawa N, Yoshino R, Suzuki J, Masuda H, Maehara Y, Tanji M, Sato M, Nasu S, Minobe Y (2002) Positional cloning of rice semidwarfing gene, sd-1: rice “green revolution gene” encodes a mutant enzyme involved in gibberellin synthesis. DNA Res 9:11–17
Morishima H, Oka HI (1970) A survey of genetic variations in the populations of wild Oryza species and their cultivated relatives. Jpn J Genet 45:371–385
Mu J, Zhou H, Zhao S, Xu C, Yu S, Zhang Q (2004) Development of contiguous introgression lines covering entire genome for the sequenced japonica rice. In: Fischer T, Turner N, Angus J, McIntyre L, Robertson M, Borrell A (eds) New directions for a diverse planet: Proceedings for the 4th International Crop Science Congress, Brisbane, Australia, 26 September – 1 October 2004 (http://www.cropscience.org.au/icsc2004/poster/3/2/1/781_yusb.htm)
Nandi S, Subudhi PK, Senadhira D, Manigbas NL, Sen-Mandi S, Huang N (1997) Mapping QTLs for submergence tolerance in rice by AFLP analysis and selective genotyping. Mol Gen Genet 255:1–8
Naredo MEB, Juliano AB, Lu BR, Jackson MT (1997) Hybridization of AA genome rice species from Asia and Australia I. Crosses and development of hybrids. Gen Res Crop Evol 44:17–23
Nielsen R (2001) Statistical tests of selective neutrality in the age of genomics. Heredity 86:641–647
Nordborg M, Tavare S (2002) Linkage disequilibrium: what history has to tell us. Trends Genet 18:83–90
Ogawa T, Yamamoto T, Khush GS, Mew TW, Kaku H (1988) Near-isogenic lines as international differentials for resistance to bacterial blight of rice. Rice Genet Newsl 5:106–109
Ohtsubo H, Cheng CY, Ohsawa I, Tsuchimoto S, Ohtsubo E (2004) Rice retroposon p-SINE1 and origin of cultivated rice. Breed Sci 54:1–11
Oka HI, Morishima H (1982) Phylogenetic differentiation of cultivated rice, XXIII. Potentiality of wild progenitors to evolve the Indica and Japonica types of rice cultivars. Euphytica 31:41–50
Olsen KM, Caicedo AL, Polato N, McClung AM, McCouch SR, Purugganan MD (2006) Selection under domestication: evidence for a sweep in the rice Waxy genomic region. Genetics 173:975–983
Paine JA, Shipton CA, Chaggar S, Howells RM, Kennedy MJ, Vernon G, Wright SY, Hinchliffe E, Adams JL, Silverstone AL, Drake R (2005) Improving the nutritional value of Golden Rice through increased pro-vitamin A content. Nat Biotechnol 23:482–487
Peng J, Richards DE, Hartley NM, Murphy GP, Devos KM, Flintham JE, Beales J, Fish LJ, Worland AJ, Pelica F, Sudhakar D, Christou P, Snape JW, Gale MD, Harberd NP (1999) ‘Green Revolution’ genes encode mutant gibberellin response modulators. Nature 400:256–261
Puca AA, Daly MJ, Brewster SJ, Matise TC, Barrett J, Shea-Drinkwater M, Kang S, Joyce E, Nicoli J, Benson E, Kunkel LM, Perls T (2001) A genome-wide scan for linkage to human exceptional longevity identifies a locus on chromosome 4. Proc Natl Acad Sci USA 98:10505–10508
Reinke RF, Welsh LA, Reece JE, Lewin LG, Blakeney AB (1991) Procedures for quality selection of aromatic rice varieties. Int Rice Res Newsl 16:10–11
Rieseberg LH, Widmer A, Arntz AM, Burke JM (2003) The genetic architecture necessary for transgressive segregation is common in both natural and domesticated populations. Philos Trans R Soc Lond 358:1141–1147
Ren ZH, Gao JP, Li LG, Cai XL, Huang W, Chao DY, Zhu MZ, Wang ZY, Luan S, Lin HX (2005) A rice quantitative trait locus for salt tolerance encodes a sodium transporter. Nat Genet 37:1141–1146
Saito A, Yano M, Kishimoto N, Nakagahra M, Yoshimura A, Saito K, Kuhara S, Ukai Y, Kawase M, Nagamine T, Yoshimura S, Ideta O, Ohsawa R, Hayano Y, Iwata N, Sugiura M (1991) Linkage map of restriction fragment length polymorphism loci in rice. Jpn J Breed 41:665–670
Sakamoto T, Miura K, Itoh H, Tatsumi T, Ueguchi-Tanaka M, Ishiyama K, Kobayashi M, Agrawal GK, Takeda S, Abe K, Miyao A, Hirochika H, Kitano H, Ashikari M, Matsuoka M (2004) An overview of gibberellin metabolism enzyme genes and their related mutants in rice. Plant Physiol 134:1642–1653
Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, Swapan D, Ishiyama K, Saito T, Kobayashi M, Khush GS, Kitano H, Matsuoka M (2002) A mutant gibberellin-synthesis gene in rice. Nature 416:701–702
Second G (1982) Origin of the genic diversity of cultivated rice (Oryza spp.): study of the polymorphism scored at 40 isozyme loci. Jpn J Genet 57:25–57
Septiningsih EM (2002) Identification, near-isogenic line development and fine mapping of quantitative trait loci from the rice cultivar IR64 and its wild relative Oryza rufipogon. PhD thesis. Cornell University, Ithaca, NY
Sergeeva LI, Keurentjes JJ, Bentsink L, Vonk J, van der Plas LH, Koornneef M, Vreugdenhil D (2006) Vacuolar invertase regulates elongation of Arabidopsis thaliana roots as revealed by QTL and mutant analysis. Proc Natl Acad Sci USA 103:2994–2999
Shen B, Zhuang JY, Zhang KQ, Xia QQ, Sheng CX, Zheng KL (2003) QTLs mapping of leaf traits and root vitality in a recombinant inbred line population of rice. Yi Chuan Xue Bao 30:1133–1139
Sirithunya P, Tragoonrung S, Vanavichit A, Pa-In N, Vongsaprom C, Toojinda T (2002) Quantitative trait loci associated with leaf and neck blast resistance in recombinant inbred line population of rice (Oryza sativa). DNA Res 9:79–88
Sobrizal K, Ikeda P, Sanchez L, Doi K, Angeles ER, Khush GS, Yoshimura A (1999) Development of Oryza glumaepatula introgression lines in rice, O. sativa L. Rice Genet Newsl 16:107–108
Song W-Y, Wang G-L, Chen L-L, Kim H-S, Pi L-Y, Holsten T, Gardner J, Wang B, Zhai W-X, Zhu L-H, Fauquet C, Ronald P (1995) A receptor kinase-like protein encoded by the rice disease resistance gene, Xa21. Science 270:1804–1806
Sood BC, Siddiq EA (1978) A rapid technique for scent determination in rice. Indian J Genetic Plant Breed 38:271
Spielmeyer W, Ellis MH, Chandler PM (2002) Semidwarf (sd-1), “green revolution” rice, contains a defective gibberellin 20-oxidase gene. Proc Natl Acad Sci USA 99:9043–9048
Sun X, Cao Y, Yang Z, Xu C, Li X, Wang S, Zhang Q (2004) Xa26, a gene conferring resistance to Xanthomonas oryzae pv. oryzae in rice, encodes an LRR receptor kinase-like protein. Plant J 37:517–527
Sweeney M, Thomson MJ, Pfeil B, McCouch S (2006) Caught red-handed: Rc encodes a basic helix-loop-helix protein conditioning red pericarp in rice. Plant Cell 18:283–294
Tabor H, Risch N, Myers R (2002) Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet 3:391–396
Tanksley SD, Nelson JC (1996) Advanced backcross QTL analysis: a method for the simultaneous discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding lines. Theor Appl Genet 92:191–203
Tanksley SD, McCouch SR (1997) Seed banks and molecular maps: unlocking genetic potential from the wild. Science 277:1063–1066
Temnykh S, Park WD, Ayres N, Cartinhour S, Hauck N, Lipovich L, Cho YG, Ishii T, McCouch SR (2000) Mapping and genome organization of microsatellite sequences in rice (Oryza sativa L.). Theor Appl Genet 100:697–712
Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): Frequency, length variation, transposon associations, and genetic marker potential. Genome Res 11:1441–1452
Terada R, Urawa H, Inagaki Y, Tsugane K, Iida S (2002) Efficient gene targeting by homologous recombination in rice. Nat Biotechnol 20:983–984
Terwilliger JD, Weiss KM (1998) Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotech 9:578–594
Thomson MJ, Tai TH, McClung AM, Hinga ME, Lobos KB, Xu Y, Martinez C, McCouch SR (2003) Mapping quantitative trait loci for yield, yield components, and morphological traits in an advanced backcross population between Oryza rufipogon and the Oryza sativa cultivar Jefferson. Theor Appl Genet 107:479–493
Thomson MJ, Edwards JD, Septiningsih EM, Harrington S, McCouch SR (2006) Substitution mapping of dth1.1, a flowering time QTL associated with transgressive variation in rice, reveals a cluster of QTLs. Genetics 172:2501–2514
Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES IV (2001) Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet 28:286–289
Tian F, Li de J, Fu Q, Zhu ZF, Fu YC, Wang XK, Sun CQ (2006) Construction of introgression lines carrying wild rice (Oryza rufipogon Griff.) segments in cultivated rice (Oryza sativa L.) background and characterization of introgressed segments associated with yield-related traits. Theor Appl Genet 112:570–580
Toenniessen GH, O'Toole JC, DeVries J (2003) Advances in plant biotechnology and its adoption in developing countries. Curr Opin Plant Biol 6:191–198
Vaughan DA (1991) Biogeography of the genus Oryza across the Malay Archipelago. Rice Genet Newsl 8:73–75
Vaughan DA, Morishima H, Kadowaki K (2003) Diversity in the Oryza genus. Curr Opin Plant Biol 6:139–146
Vaughan DA, Kadowaki KI, Kaga A, Tomooka N (2005) On the Phylogeny and Biogeography of the Genus Oryza. Breed Sci 55:113–122
Virk P, Ford-Lloyd BV, Jackson MT, Newbury HJ (1996) Predicting quantitative variation within rice germplasm using molecular markers. Heredity 76:296–304
Wan JL, Zhai HQ, Wan JM, Yasui H, Yoshimura A (2003) Mapping QTL for traits associated with resistance to ferrous iron toxicity in rice (Oryza sativa L.), using japonica chromosome segment substitution lines. Yi Chuan Xue Bao 30:893–898
Wan XY, Wan JM, Su CC, Wang CM, Shen WB, Li JM, Wang HL, Jiang L, Liu SJ, Chen LM, Yasui H, Yoshimura A (2004) QTL detection for eating quality of cooked rice in a population of chromosome segment substitution lines. Theor Appl Genet 110:71–79
Wan XY, Wan JM, Jiang L, Wang JK, Zhai HQ, Weng JF, Wang HL, Lei CL, Wang JL, Zhang X, Cheng ZJ, Guo XP (2006) QTL analysis for rice grain length and fine mapping of an identified QTL with stable and major effects. Theor Appl Genet 112:1258–1270
Wang GL, Mackill DJ, Bonman JM, McCouch SR, Champoux MC, Nelson RJ (1994) RFLP mapping of genes conferring complete and partial resistance to blast in a durably resistant rice cultivar. Genetics 136:1421–1434
Wang ZY, Tanksley SD (1989) Restriction Fragment Length Polymorphism in Oryza sativa L. Genome 32:1113–1118
Wang ZY, Second G, Tanksley SD (1992) Polymorphism and phylogenetic relationship among species in the genus Oryza as determined by analysis of nuclear RFLPs. Theor Appl Genet 83:565–581
Weir BS (1990) Genetic Data Analysis: Methods for Discrete Population Genetic Data, 377 pp. Sinauer Associates, Sunderland, MA
Weiss K, Clark AG (2002) Linkage disequilibrium and the mapping of complex human traits. Trends Genet 18:19–24
Widjaja R, Craske JD, Wootton M (1996) Comparative studies on volatile components of non-fragrant and fragrant rices. J Sci Food Agric 70:151–161
Wing RA, Ammiraju JS, Luo M, Kim H, Yu Y, Kudrna D, Goicoechea JL, Wang W, Nelson W, Rao K, Brar D, Mackill DJ, Han B, Soderlund C, Stein L, SanMiguel P, Jackson S (2005) The Oryza map alignment project: the golden path to unlocking the genetic potential of wild rice species. Plant Mol Biol 59:53–62
Xiao J, Li J, Grandillo S, Ahn SN, Yuan L, Tanksley SD, McCouch SR (1998) Identification of trait-improving quantitative trait loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150:899–909
Xing YZ, Tan YF, Hua JP, Sun XL, Xu CG, Zhang Q (2002) Characterization of the main effects, epistatic effects and their environmental interactions of QTLs on the genetic basis of yield traits in rice. Theor Appl Genet 105:248–257
Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q (1999) Identification of genetic factors controlling domestication-related traits of rice using an F2 population of a cross between Oryza sativa and O. rufipogon. Theor Appl Genet 98:243–251
Xu Y, Zhu L, Xiao J, Huang N, McCouch SR (1997) Chromosomal regions associated with segregation distortion of molecular markers in F2, backcross, doubled haploid, and recombinant inbred populations in rice (Oryza sativa L.). Mol Gen Genet 253:535–545
Yamamoto T, Lin H, Sasaki T, Yano M (2000) Identification of heading date quantitative trait locus Hd6 and characterization of its epistatic interactions with Hd2 in rice using advanced backcross progeny. Genetics 154:885–891
Yamanaka S, Nakamura I, Watanabe KN, Sato Y (2004) Identification of SNPs in the Waxy gene among glutinous rice cultivars and their evolutionary significance during the domestication process of rice. Theor Appl Genet 108:1200–1204
Yamanouchi U, Yano M, Lin H, Ashikari M, Yamada K (2002) A rice spotted leaf gene, Spl7, encodes a heat stress transcription factor protein. Proc Natl Acad Sci USA 99:7530–7535
Yang QH, Wang CM, Hu ML, Zhang YX, Zhai HQ, Wan JM (2005) Genetic analysis for nitrogen content and its change in rice flag leaf. Zhongguo Shuidao Kexue 19:7–12
Yano M (2001) Genetic and molecular dissection of naturally occurring variation. Curr Opin Plant Biol 4:130–135
Yoshihashi T (2002) Quantitative analysis on 2-acetyl-1-pyrroline of an aromatic rice by stable isotope dilution method and model studies on its formation during cooking. J Food Sci 67:619–622
Yoshimura S, Yamanouchi U, Katayose Y, Toki S, Wang ZX, Kono I, Kurata N, Yano M, Iwata N, Sasaki T (1998) Expression of Xa1, a bacterial blight-resistance gene in rice, is induced by bacterial inoculation. Proc Natl Acad Sci USA 95:1663–1668
Yu CY, Liu YQ, Jiang L, Wang CM, Zhai HQ, Wan JM (2005) QTLs mapping and genetic analysis of tiller angle in rice (Oryza sativa L.). Acta Genetica Sinica 32:948–954
Zhang N, Xu Y, Akash M, McCouch S, Oard J (2005) Identification of candidate markers associated with agronomic traits in rice using discriminant analysis. Theor Appl Genet 110:721–729
Zheng BS, Yang L, Zhang WP, Mao CZ, Wu YR, Yi KK, Liu FY, Wu P (2003) Mapping QTLs and candidate genes for rice root traits under different water-supply conditions and comparative analysis across three populations. Theor Appl Genet 107:1505–1515
Zheng BS, Yang L, Mao CZ, Zhang WP, Wu P (2006) QTLs and candidate genes for rice root growth under flooding and upland conditions. Yi Chuan Xue Bao 33:141–151
Zheng HG, Babu RC, Pathan MS, Ali L, Huang N, Courtois B, Nguyen HT (2000) Quantitative trait loci for root-penetration ability and root thickness in rice: comparison of genetic backgrounds. Genome 43:53–61
Zhu Q, Ge S (2005) Phylogenetic relationships among A-genome species of the genus Oryza revealed by intron sequences of four nuclear genes. New Phytol 167:249–265
Zhuang JY, Fan YY, Rao ZM, Wu JL, Xia YW, Zheng KL (2002) Analysis on additive effects and additive-by-additive epistatic effects of QTLs for yield traits in a recombinant inbred line population of rice. Theor Appl Genet 105:1137–1145
8 Chemical- and Irradiation-Induced Mutants and TILLING
Ramesh S. Bhat¹, Narayana M. Upadhyaya², Abed Chaudhury², Chitra Raghavan³, Fulin Qiu³, Hehe Wang³, Jianli Wu³, Kenneth McNally³, Hei Leung³, Brad Till⁴, Steven Henikoff⁴, and Luca Comai⁴,⁵
¹Department of Biotechnology, University of Agricultural Sciences, Dharwad-580 005, Karnataka, India; ²CSIRO Plant Industry, PO Box 1600, Canberra, ACT 2601, Australia; ³International Rice Research Institute, Los Baños, Philippines; ⁴Seattle TILLING Project, Department of Biology and Fred Hutchinson Cancer Research Center, University of Washington, 1100 Fairview Ave. N. PO Box 19024, Seattle, WA 98109, USA; ⁵The UC Davis Genome Center, 451 E. Health Sciences Drive, Davis, CA 95616, USA
Reviewed by Phil Larkin
8.1 Introduction
8.2 Mutagens and Mutagenesis
8.2.1 Chemical Mutagens
8.2.2 Irradiation Mutagens
8.2.3 Raising Mutant Populations
8.3 Rice Mutant Stocks and Databases
8.3.1 USA Mutant Stocks
8.3.2 IRRI Mutant Stocks and Database
8.3.3 China Mutant Stocks
8.3.4 Taiwan Mutant Stock
8.3.5 Japan Mutant Stock and Database
8.4 Forward Genetics with Mutants
8.4.1 Phenotyping
8.4.2 Map-Based Cloning
8.4.3 Detecting Genomic Changes Using Genome-Wide Chips
8.5 Reverse Genetics with Mutants
8.5.1 PCR Screening
8.5.2 TILLING
8.6 TILLING in Rice
8.6.1 Seattle TILLING Project
8.6.2 Other Technical Improvements in Rice TILLING
8.6.3 TILLING Case Studies for Specific Traits
8.7 Future Prospects
Acknowledgments
References
8.1 Introduction
Chemical and ionizing radiation mutagenesis have been routinely used to generate genetic variability for breeding research and genetic studies. To date, through such mutagenesis, 2,428 crop varieties have been released and among them 501 are rice varieties (see http://www-mvd.iaea.org/MVD). Because traditional mutagenesis does not use transgenic technology, it has particular appeal to the industry, where prohibitive regulatory costs and the debilitating debate on genetically modified organisms have restricted many crop-improvement efforts. Mutagen-induced morphological mutations have also provided genetic markers for the development of genetic linkage maps in many plants, including rice. While Arabidopsis has become the paramount model plant system, it is not a crop plant. Thus, the spectrum of its biological traits cannot address fundamental questions of crop plant domestication and agronomic performance. An alternative experimental system based on a crop plant is therefore much needed. At the time of this publication, rice is the only crop for which a complete genome sequence has been made available (Goff et al. 2002; Yu et al. 2002; International Rice Genome Sequencing Project 2005). To realize the potential of rice in the post-sequencing era, however, a complete analysis of function must involve disruption or modification of all of its genes. Several approaches are available for the functional inactivation of genes, including the use of gene-tagging elements and gene silencing (see Chapters 9, 10, and 13 of this book) and the use of chemical mutagens and irradiation. Collectively, sequence-tagged T-DNA insertions amount to more than 360,000 for Arabidopsis (http://signal.salk.edu/cgi-bin/tdnaexpress) and 113,000 for rice (see Chapters 9 and 14).
In Arabidopsis, even these very high numbers, however, do not provide saturation mutagenesis as thousands of Arabidopsis genes, especially those smaller than 1 kb, have no insertions (http://signal.salk.edu/database/T-DNA/). Further, although the probability of tagging a gene increases with the number of available tags, it does so asymptotically; hence the efforts to extend the database to cover all the genes may soon approach the point of diminishing returns. In addition, a tagging approach is more likely to create nonfunctional proteins rather than multiple alleles with amino acid alterations. On the other hand, chemical or irradiation mutagenesis can yield base substitution mutant alleles that often play an important role in determining the functional domains of the protein.
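The asymptotic relationship between the number of tags and the probability of tagging a given gene can be illustrated with a simple model. The sketch below is not taken from the cited databases. It assumes that insertions land uniformly at random and independently across the genome; the function name and the 125-Mb genome size used for Arabidopsis are illustrative assumptions.

```python
def prob_gene_tagged(n_insertions: int, gene_len_bp: float, genome_len_bp: float) -> float:
    """Probability that at least one of n random insertions falls inside the gene.

    Assumption (illustrative): insertions are uniform and independent, so a
    single insertion hits the gene with probability gene_len / genome_len.
    """
    p_single = gene_len_bp / genome_len_bp
    return 1.0 - (1.0 - p_single) ** n_insertions


if __name__ == "__main__":
    GENOME_BP = 125e6  # assumed genome size, for illustration only
    for gene_bp in (500, 1000, 3000):
        for n in (100_000, 360_000, 720_000):
            p = prob_gene_tagged(n, gene_bp, GENOME_BP)
            print(f"gene {gene_bp:>5} bp, {n:>7} insertions: P(tagged) = {p:.3f}")
```

Under these assumptions the gain from each additional insertion shrinks as the collection grows, and short genes remain the most likely to be missed.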
8 Chemical- and Irradiation-Induced Mutants and TILLING
Given the aforementioned concerns, traditional mutagenesis, coupled with efficient targeting of genes, is an attractive genetic strategy for both Arabidopsis and rice. Production of mutants by chemical or irradiation mutagenesis is relatively inexpensive. Any genotype can be mutagenized and the distribution of mutations is probably random in the genome. Because of the high density of mutations, genome-wide saturation mutagenesis can be achieved using a relatively small mutant population (Koornneef et al. 1982; Henikoff and Comai 2003). This also provides a large allelic series as a complement to the knockout mutants produced by insertional mutagenesis or transformation methods (over- and under-expression). Unlike insertional mutagenesis technologies, which require highly efficient transformation systems, chemical and irradiation forms of mutagenesis do not rely on transformation. Despite these advantages, the use of chemical and irradiation-induced mutants as gene identification tools has been limited. This is mainly because the molecular isolation of mutated gene(s) requires considerable effort as the mutations are not physically tagged. However, advances in high-throughput genotyping have significantly increased the efficiency in detecting point mutations or deletions (Borevitz et al. 2003; Henikoff and Comai 2003; Winzeler et al. 2003). One such example of a high-throughput reverse-genetic technique is Targeting Induced Local Lesions in Genomes (TILLING). TILLING is employed to discover point mutations in the mutant libraries created via traditional chemical mutagenesis. Technologies that are developed primarily for discovering single-nucleotide polymorphisms (SNPs) in surveys of human and other populations are being adopted for TILLING (Bentley et al. 2000; McCallum et al. 2000a, 2000b; Comai and Henikoff 2006).
Consequently, there has been growing interest in using chemical and irradiation mutagenesis in model organisms for functional genomics research (Liu et al. 1999; Nadeau and Frankel 2000). In this chapter, we discuss various chemical and irradiation mutagens, mutagenesis strategies, and various forward and reverse genetics approaches available with special reference to TILLING and its application as a powerful reverse genetics strategy for plants. We highlight the current status of rice mutant stocks, databases, and forward and reverse genetics strategies currently employed for rice functional genomics.
8.2 Mutagens and Mutagenesis
Chemical mutagens and ionizing radiation have long been used as plant mutagens in forward-genetic studies (Guenet 2004). They are preferred over insertion mutagenesis because of their ability to (1) generate allelic series,
(2) induce mutations at high frequencies, and (3) be applied to various plant species. Chemicals induce mainly point mutations, and are thus ideal for producing missense and nonsense mutations, which would provide a series of change-of-function mutations. On the other hand, ionizing radiation normally induces chromosomal rearrangements and deletions. The utility of comprehensive deletion stocks is best illustrated in yeast. A collection of yeast deletion mutants covering 96% of annotated open reading frames has proven to be a valuable resource for yeast functional genomics (Giaever et al. 2002). As shown in yeast, achieving a saturated gene-deletion mutant library with a small population is very important. Therefore, selection of a mutagen should be based on its efficiency and specificity to induce mutations, such that the resulting mutant library is of manageable size. At the same time, the mutagenesis procedure should be as simple as possible. It is also important to know the major type of mutation induced by a particular mutagen, as the screening strategy to be used will depend on the predominant type of mutation it creates (Koornneef et al. 1982).
8.2.1 Chemical Mutagens
Ethylmethane Sulfonate
Alkylating agents were the first class of chemical mutagens to be discovered when Auerbach and Robson (1946) found the mutagenic effects of mustard gas and related compounds during World War II. Alkylating agents such as mustard gas, methylmethane sulfonate (MMS), ethylmethane sulfonate (EMS), and nitrosoguanidine have several effects on DNA. Because of its potency and the ease with which it can be used, EMS is the most commonly used chemical mutagen in plants. EMS alkylates guanine bases and leads to mispairing: alkylated G pairs with T instead of C, resulting primarily in G/C-to-A/T transitions (Sega 1984; Vogel and Natarajan 1995). EMS mutagenesis in rice involves soaking the seeds in an aqueous solution at a chosen concentration (from 0.2% to 2.0%) for 10 to 20 h (based on the sensitivity or kill curve of the genotype used). Since EMS produces a large number of nonlethal point mutations genome-wide, a relatively small mutant population (approximately 10,000) is sufficient to saturate the genome with mutations. In Arabidopsis, point mutation density can be as high as four mutations per Mb (Comai and Henikoff 2003, 2006; Till et al. 2003b). An important advantage of using a common mutagen, such as EMS, is that a substantial body of literature has accumulated that confirms its utility in forward genetic screens in a variety of organisms. These include
the favorite model animal and model plant for mutagenesis studies, Drosophila melanogaster and Arabidopsis thaliana, respectively. EMS is remarkably consistent, in that apparently similar levels of mutagenesis have been achieved in these organisms, despite the approximately 1 billion years of divergence between them. For example, recessive lethal mutations are estimated to occur at similar rates in both cases, with EMS doses causing acceptable levels of sterility and lethality (Koornneef et al. 1982; Ashburner 1990). In addition, direct estimates confirm that base substitution rates are comparable for Arabidopsis seeds soaked in EMS (McCallum et al. 2000a, 2000b) and Drosophila males fed EMS (Bentley et al. 2000), and approximately similar rates were found in a reverse-genetic screen of zebrafish progeny exposed to N-ethyl-N-nitrosourea (ENU; Wienholds et al. 2002). Thus, chemical mutagenesis causes a high frequency of nucleotide substitutions in a variety of organisms. Genome size does not appear to be an important factor in EMS mutagenesis because estimates of per gene mutational density found for Arabidopsis appear to be similar for maize (Goll and Bestor 2002), which has a 20-fold larger genome size. Therefore, EMS is likely to be the mutagen of choice for TILLING in plants (see the subsequent section in this chapter). However, the toxicity of EMS may vary depending on the species, and other mutagens or post-treatments with antitoxicants may be worth considering (Henikoff and Comai 2003).
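The consequence of the G/C-to-A/T specificity of EMS for coding sequences can be sketched in a few lines. The example below is an illustration only. It uses the standard genetic code, treats G-to-A and C-to-T changes on the coding strand as the EMS-type transitions (a C-to-T change corresponds to a G-to-A change on the complementary strand), and ignores codon usage and sequence context. The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# EMS-type transitions as seen on the coding strand.
EMS_TRANSITIONS = {"G": "A", "C": "T"}


def ems_codon_effects(codon: str):
    """Return (position, mutant_codon, effect) for each EMS-type change in a codon."""
    codon = codon.upper()
    ref_aa = CODON_TABLE[codon]
    effects = []
    for i, base in enumerate(codon):
        if base not in EMS_TRANSITIONS:
            continue  # A and T are not targets in this simplified model
        mutant = codon[:i] + EMS_TRANSITIONS[base] + codon[i + 1:]
        mut_aa = CODON_TABLE[mutant]
        if mut_aa == ref_aa:
            effect = "silent"
        elif mut_aa == "*":
            effect = "nonsense"
        else:
            effect = "missense"
        effects.append((i + 1, mutant, effect))
    return effects


if __name__ == "__main__":
    for codon in ("TGG", "CAG", "GGA", "AAA"):
        print(codon, ems_codon_effects(codon))
```

In this simplified model a tryptophan codon (TGG) can only mutate to a stop codon, whereas most G- or C-containing codons yield a mixture of silent and missense changes, which is why EMS populations provide allelic series rather than only knockouts.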
Over the last few years, several new projects have been initiated with the aim of producing EMS-induced rice mutant populations in the United States (Crops Pathology/Genetics Research Unit of UC Davis, USDA-ARS), China (The Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences), Taiwan (Taiwan Agricultural Research Institute), Japan (Institute of Genetic Resources, Kyushu University), and the Philippines (International Rice Research Institute).
Diepoxybutane
Diepoxybutane (DEB) is a potent chemical mutagen used in a variety of biological systems (Ehrenberg and Hussain 1981) and is capable of producing alkali-labile sites in DNA and forming inter- and intrastrand cross-links (bifunctional alkylating agent). The exact mode of action and the precise end results of its premutagenic lesions are not well understood, but it was shown to be mutagenic in Drosophila melanogaster (Graf et al. 1984), an efficient chromosome breaker (Watson 1966; Zimmering 1983) and an efficient inducer of multilocus deletions in Drosophila (Shukla and Auerbach 1980; Olsen and Green 1982). DEB has been shown to induce mutations at the rosy locus in Drosophila, 43% of them being deletions
ranging from 50 bp to 8 kb (Reardon et al. 1987). DEB mutagenesis in rice involves soaking seeds in an aqueous solution of 0.004% or 0.006% DEB with gentle shaking at 30°C for 13 h, and has been successfully used for forward genetics (Wu et al. 2005).
N-Methyl-N-nitrosourea
N-methyl-N-nitrosourea (MNU) is a monofunctional alkylating agent causing single-strand DNA breaks during interphase stages. The pronounced clustering of chromosomal aberrations in heterochromatic regions (as shown by in situ hybridization studies) after treatment with MNU is thought to be due mainly to an error-prone interference of recombinative repair and replication in damaged basic repeats of large tandem repeat arrays (Vogel and Natarajan 1995). A large collection of MNU-induced mutants has been produced in Japan (http://www.shigen.nig.ac.jp/rice/oryzabase/nbrpStrains/kyushuGrc.jsp). Twelve classes of visible phenotypes, including 49 easily identifiable phenotypes, have been used to classify these mutant lines. Phenotypic classification of the MNU-induced mutants is identical to that used for Tos17-induced mutant lines. These mutants represent a promising resource for characterizing mutant genes using reverse-genetic tools such as TILLING (Kurata and Yamazaki 2006).
Sodium Azide
Mutagenicity of sodium azide, an inhibitor of catalase and peroxidase enzymes, has been demonstrated in barley, maize, soybean, pea, Brachypodium, and rice. There have been some reports of a synergistic increase in the frequency of chromosomal aberrations when gamma-ray irradiation was followed by sodium azide treatment, although there was no apparent effect on chlorophyll mutation frequency. Synergism with respect to chlorophyll mutation frequency has been observed when sodium azide was used after MNU treatment. Mutation frequency as well as biological damage showed a linear response to an increase in the concentration of sodium azide from 5 × 10⁻⁴ M to 2 × 10⁻³ M (in 0.1 M phosphate buffer, pH 3). Presoaking of seeds in water for 4 to 12 h induced the highest chlorophyll mutation frequency, and the frequency declined with longer presoaking treatments (Sarma et al. 1979). Researchers at the Taiwan Agricultural Research Institute have undertaken sodium azide mutagenesis of rice cultivar Tainung 67 (TNG67). No data are available on the type of genetic lesions produced by sodium azide in rice. However, in barley it has been shown to induce substitutions comprising transitions and transversions (Olsen et al. 1993).
8.2.2 Irradiation Mutagens
Ionizing radiation has been widely used to induce mutations for plant breeding and classical genetic analysis, but in-depth analyses at the molecular level have been done in only a few organisms. In plant genomes, ionizing radiation normally induces rearrangements and deletions (Shirley et al. 1992; Bruggemann et al. 1996; Cecchini et al. 1998; Shikazono et al. 1998, 2001). Mutants in crop species produced by ionizing radiation have proved to be valuable in the fields of genetics and mutational breeding. The International Atomic Energy Agency (IAEA) has been a strong advocate of applying irradiation mutagenesis for crop improvement, and continues to provide gamma-ray irradiation as a public service and to organize regional research and training networks to apply mutation methods for crop breeding (R. Afza, IAEA, personal communication; http://www-naweb.iaea.org/nafa/pbg/index.html).
Fast Neutron
Fast neutron irradiation has been shown to be a very effective mutagen in plants. An Arabidopsis line treated with fast neutrons at a dose of 60 Gy would have approximately 10 genes deleted on average (Koornneef et al. 1982), and thus approximately 2,500 lines would be sufficient to represent deletions in each of the expected 25,000 genes (The Arabidopsis Genome Initiative 2000). In another study, Bruggemann et al. (1996) found that most (13 out of 18) fast neutron-induced hy4 mutations in Arabidopsis were deletions larger than 5 kb. Molecular characterization of Arabidopsis ga1-3 (Sun et al. 1992) and tomato prf-3 (Salmeron et al. 1996) further demonstrated that fast neutron bombardment induces relatively large deletion mutations. Recently, a reverse genetics system based on fast-neutron–induced deletions was developed to identify and isolate targeted plant genes (Li et al. 2001, 2002; Li and Zhang 2002; Wu et al. 2005). According to Li et al. (2001), the reverse genetics system using fast-neutron–generated deletions is highly efficient, and fast neutrons can produce mutant lines with complete coverage much more easily than T-DNA insertional mutagenesis. In rice, fast neutron mutagenesis has been used to produce about 8,000 M4 lines in the indica variety IR64 (Wu et al. 2005). Although the size of the induced deletions (in the kilobase range) in a few characterized mutants appears to be suitable for polymerase chain reaction (PCR) screening, the usefulness of this collection in a reverse-genetic screen as described by Li et al. (2001) has not been thoroughly tested.
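The population-size estimate for Arabidopsis given above is simple arithmetic: 25,000 genes divided by about 10 deleted genes per line gives 2,500 lines for an average of one deletion per gene. The sketch below adds one assumption that is not part of the cited studies, namely that deletions hit genes uniformly at random and independently (a Poisson model), to show that an average of one hit per gene still leaves some genes untouched by chance. All function names are hypothetical.

```python
import math


def mean_hits_per_gene(n_lines: int, genes_deleted_per_line: float, total_genes: int) -> float:
    """Average number of independent deletions expected per gene."""
    return n_lines * genes_deleted_per_line / total_genes


def prob_gene_deleted(n_lines: int, genes_deleted_per_line: float, total_genes: int) -> float:
    """Poisson probability that a given gene is deleted in at least one line."""
    return 1.0 - math.exp(-mean_hits_per_gene(n_lines, genes_deleted_per_line, total_genes))


def lines_needed(target_prob: float, genes_deleted_per_line: float, total_genes: int) -> int:
    """Lines needed so that a given gene is deleted with probability target_prob."""
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    return math.ceil(-math.log(1.0 - target_prob) * total_genes / genes_deleted_per_line)


if __name__ == "__main__":
    print(mean_hits_per_gene(2500, 10, 25000))           # average hits per gene
    print(round(prob_gene_deleted(2500, 10, 25000), 3))  # chance a given gene is hit
    print(lines_needed(0.95, 10, 25000))                 # lines for 95% per-gene coverage
```

Under this model, 2,500 lines give roughly a 63% chance that any particular gene is deleted at least once, so a larger population would be needed if near-complete coverage is the goal.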
Gamma Irradiation and X-Rays
Irradiation with gamma (γ) rays and X-rays is also known to produce deletions and other chromosomal rearrangements, but only a few of them have been characterized at the molecular level (Oppenheimer et al. 1991; Wilkinson and Crawford 1991; Shirley et al. 1992; Kieber et al. 1993; Nambara et al. 1994). In Arabidopsis, all eight gamma radiation-induced mutations of a negatively selectable suicide marker (tms2), which was integrated into the genome, had deletions larger than 5 kb (Cecchini et al. 1998). Gamma- and X-rays have been used to produce mutants in Arabidopsis (Rédei and Koncz 1992) and rice (Cheema and Atta 2003; R. Bhat, unpublished data). Wu et al. (2005) used two doses of gamma irradiation (250 and 500 Gy) to produce a large collection of mutants in IR64. The size of the genetic lesions (in the kilobase range) appeared amenable to detection by chip-based techniques.
Accelerated Ions
High linear energy transfer (LET) radiation, such as ion particles, causes more localized, dense ionization within cells than low-LET radiation (Smith 1972; Kraft et al. 1992; Blakely and Kronenberg 1998; Shikazono et al. 2003). On the basis of microdosimetric and radiobiological considerations, it is assumed that high-LET radiation could produce double-strand breaks with damaged end groups and a consequent low frequency of repair (Hagen 1994; Goodhead 1995; Blakely and Kronenberg 1998; Nikjoo et al. 1998). High-LET radiation would therefore generate mutations more frequently (more closely positioned) than low-LET radiation. Most likely, high-LET radiation also induces large structural alterations more frequently than low-LET radiation. Using accelerated carbon ions (C ions), researchers have isolated several novel Arabidopsis mutants (ast, frl1, uvi1, suv1, tt18, and tt19) (Tanaka et al. 1997; Hase et al. 2000; Sakamoto et al. 2003; Shikazono et al. 2003; Kitamura et al. 2004). Analyses of these mutants at the nucleotide sequence level revealed inversions, translocations, and short deletions at comparable frequencies (Shikazono et al. 1998, 2001, 2003, 2005; Sakamoto et al. 2003; Kitamura et al. 2004). From the analysis of rearrangements, deletions were found to be generated at a frequency of 6.1 × 10⁻⁵, which was comparable to that induced by fast neutrons (Li et al. 2001). These results imply that mutagenesis by accelerated ions could be used for both forward and reverse genetics in plants. Abe et al. (2005, 2006) have studied the mutation frequency in rice cv. Nipponbare with accelerated C and neon (Ne) ions. Seeds soaked for 3 days in water at 30°C without light were exposed to ions accelerated to 135 MeV/μm by the RIKEN Ring Cyclotron (RRC, The RIKEN Accelerator
Research Facility, Japan) within a dose range of 10 to 40 Gy. The LET values of the C and Ne ions corresponded to 22.6 and 63.0 keV/μm, respectively, at the surface of the seeds. Treated seeds were grown and progeny raised to measure the indicator mutation frequency (chlorophyll-deficient mutant or CDM). Half seed fertility doses were 40 to 80 Gy and 20 Gy for C and Ne ions, respectively. This result shows that the biological effects depend strongly on the LET of the ions. The optimum irradiation dose to induce CDM was 20 to 40 Gy with C ions. Using absorbers, they also adjusted the LET values of the C ions from 22.6 to 60.3 keV/μm at the surface of the seeds, at a dose of 20 Gy. They observed a reduction in seed fertility with high-LET irradiation but no difference in mutation rates at LET values of 22.6, 37.4, and 48.0 keV/μm. A tall mutant and a lesion-mimic mutant that segregated in the M2 generation yielded homozygous lines in the M3 generation (Abe et al. personal communication).
8.2.3 Raising Mutant Populations
After treatment with an appropriate mutagen, the treated seeds are washed free of the mutagen and sown to produce M1 generation plants. Because each cell of the embryo is mutagenized independently of the other cells, M1 individuals are chimeric in the sense that they have mutated tissue sectors that descend from a single embryonic cell. In addition, each mutated sector is heterozygous for any mutation. Mutations present in the cells that form the reproductive tissues are inherited by the selfed progeny, the M2 generation. M2 plants are used to prepare pooled DNA samples for reverse genetics screening (see Section 8.5), while their seeds are inventoried. Forward genetics screening (phenotypic analysis) is normally performed on M3 plants. For assaying quantitative traits, it is particularly important to advance the lines to M4 or beyond because of the need to evaluate phenotypes in replicated trials.
Bulked seeds from advanced generations are also more useful for the purpose of distributing the materials for examining different phenotypes. For the purpose of identifying mutated genes, it is better to aim for a moderate to high mutation density in the genome so that fewer mutants are needed to achieve genome coverage. However, too high a dose presents practical problems. At high doses, lethality and sterility of M1 plants make it difficult to produce an appropriately large population in a single attempt (Wu et al. 2005). From the oligo-hybridization experiments using several DEB- and gamma ray–induced rice mutants, it has been estimated that more than 100 mutations could be present in each genome of the chemical or irradiation-induced mutants (H. Leung, unpublished data). Theoretically, it would take many generations to eliminate all the background mutations. However,
with one or two backcrossings, one can quickly establish the inheritance pattern and at the same time remove a significant portion of the background mutations. Producing a useful mutant population therefore is often a trade-off between the need to produce high-density mutations and the practicality of keeping a vigorous population without too many deleterious effects and background mutations (Wu et al. 2005).
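The effect of backcrossing on background mutations can be sketched numerically. The model below is an idealization and not a result from the cited work. It assumes that each background mutation is heterozygous in the plant being crossed, is unlinked to the mutation of interest, and is therefore passed on with probability 1/2 at each backcross to the non-mutagenized parent. The starting value of 100 is the lower bound quoted above, and the function names are hypothetical.

```python
def expected_background(n_mutations: float, n_backcrosses: int) -> float:
    """Expected number of unlinked background mutations left after backcrossing.

    Idealized assumption: each mutation is transmitted with probability 1/2
    per backcross, so the expectation halves every generation.
    """
    return n_mutations * 0.5 ** n_backcrosses


def backcrosses_to_reach(n_mutations: float, max_remaining: float) -> int:
    """Smallest number of backcrosses bringing the expectation to max_remaining or less."""
    if max_remaining <= 0:
        raise ValueError("max_remaining must be positive")
    k = 0
    while expected_background(n_mutations, k) > max_remaining:
        k += 1
    return k


if __name__ == "__main__":
    for k in range(0, 4):
        print(k, expected_background(100, k))
    print(backcrosses_to_reach(100, 1))
```

On these assumptions two backcrosses remove about three quarters of the background, while bringing the expectation below one mutation takes several more generations, which is consistent with the trade-off described above.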
8.3 Rice Mutant Stocks and Databases
The FAO/IAEA Mutant Variety Database (MVD) provides information on induced mutations suitable for breeding programs and genetic analyses (http://www-mvd.iaea.org/Refs/MutBree-Rev-1.pdf). MVD collects information on crop mutant varieties, mutagens used, and characters improved, and a good number of rice entries are included in this database. Various chemical and irradiation mutagens used in rice by different laboratories are summarized in Table 8.1.
Table 8.1. Summary of various chemical and irradiation mutagenesis attempts made in rice
Mutagen | Cultivar | Nature of mutation | Method for detection | Group
Ethylmethane sulfonate (EMS) | IR64 (indica); M202 (japonica); Nipponbare | Point mutations | TILLING | IRRI; UC Davis; IPPE
Diepoxybutane (DEB) | IR64 | Point mutations, deletions | PCR, TILLING | IRRI
N-methyl-N-nitrosourea (MNU) | Kinmaze; Taichung 65 | Single-strand DNA breaks | TILLING | IGRKU, Japan
Fast neutron | IR64; M202 | Large deletions, translocations | PCR | IRRI; USA
Gamma irradiation and X-rays | IR64 | Large deletions, point mutations | PCR | IRRI
Accelerated carbon ions | Nipponbare | Double-strand breaks, large structural alterations | CDM(a) frequency, PCR | RIKEN, Japan
Sodium azide | China-45; Tainung 67 | Data not available | CDM frequency | Sarma et al. (1979); TARI, Taiwan
(a) Chlorophyll-deficient mutant.
8.3.1 USA Mutant Stocks
A fast neutron–induced population was originally developed by Pamela Ronald at the University of California-Davis using japonica variety M202 and was subsequently acquired by a private company. The collection consists of 24,660 M2 lines that have been used for high-throughput PCR screening (Li et al. 2001). However, these are proprietary stocks and are not available publicly. The Crops Pathology/Genetics Research Unit of UC Davis (USDA-ARS) is developing a sizable stock (approximately 10,000) of chemical (EMS)-induced rice mutants in cultivar Nipponbare as a public resource (Dr. Tom Tai, personal communication; http://www.ars.usda.gov/research/projects/projects.htm?ACCN_NO=408015).
8.3.2 IRRI Mutant Stocks and Database
The International Rice Research Institute (IRRI) in the Philippines is maintaining a mutant collection derived from the indica cultivar IR64. These were produced using four mutagenic agents (fast neutron, gamma ray, DEB, and EMS) in order to have different sizes of genetic lesions in the population. IR64 is the most widely grown rice variety in the tropics and it has many valuable agronomic traits related to yield, plant architecture, grain quality, and tolerance to biotic and abiotic stresses. For many traits, IR64 has intermediate phenotypes, thus enabling screening for gain- and loss-of-function mutations. Producing mutations in this elite genetic background can maximize the detection of phenotypic changes in important agronomic traits (Leung et al. 2001). Currently, the population consists of approximately 45,000 lines at the M4 stage. Mutant View, a database of these mutants, has been developed at the International Rice Information System (IRIS, http://www.iris.irri.org). This database, in addition to providing descriptions of the mutants, also serves as a portal for users to request materials.
To consistently describe the mutant phenotypes through different seasons of field phenotyping, and to recognize synonymous mutations observed in different mutant studies, a set of controlled vocabulary (CV) descriptors documenting the observed mutant phenotypes was established. The vocabulary in use from the agronomic observations was compiled and curated, and 86 distinct agronomic mutant phenotypes were listed. In 2003, in collaboration with the Tos17 mutant group at NIAS (A. Miyao and H. Hirochika), the list of controlled vocabulary descriptors from the Tos17 mutant phenotypes was merged with the IR64 mutant phenotype controlled vocabulary. Fifty-six distinct phenotype observations
from Tos17 mutants and the 86 IR64 CV terms were rationalized and a composite CV set of 91 terms was developed. In its current release, the phenotype CV is posted as a table in the IRISmutant Web site; it has mapped most of the CVs to public term ontology databases. Its purpose is not to create a new ontology but to utilize existing ontological classifications. In the case of novel CVs discovered through additional mutant screenings, these would be deposited in the appropriate ontology databases (e.g., Trait Ontology database) for incorporation into publicly accessible resources. This may serve as an inter-database resource enabling clear and unambiguous queries of mutant traits across the various existing rice mutant stock resources. The use of CVs in describing agronomic phenotypes, and the ability to accurately map this description into other mutant resource databases, is an important first step in gene discovery experiments. Using CVs to cross-reference databases, it is possible to query mutants with either enhancement or knockout of the same or a correlated trait. Because one or more of these mutants could be sequence indexed, it will enable quick identification of allelic series across different mutant collections. This approach could be particularly useful for finding allelic series in the nontransgenic indica mutants using the sequence-indexed databases of the transgenic mutants.
8.3.3 China Mutant Stocks
Researchers at the Institute of Plant Physiology and Ecology (IPPE) of Shanghai Institutes for Biological Sciences (SIBS), in collaboration with Plant Research International (Netherlands), have produced approximately 60,000 rice mutant lines via EMS mutagenesis (http://202.127.18.254/research/field3.htm). These populations were obtained with seed treatment of 20 mM, 40 mM, and 60 mM EMS. This group is poised to set up a mutant database (approximately 6,000 entries) for forward genetics and DNA pools for large-scale reverse genetics.
8.3.4 Taiwan Mutant Stock
The sodium azide-induced mutant stock developed and maintained at Taiwan Agricultural Research Institute (TARI), Taiwan (http://www.agnet.org/library/article/rh2003009b.html) contains more than 2,000 mutants (M12) with diverse variations including pathogen (Pyricularia oryzae, Xanthomonas campestris pv. oryzae) and insect (brown planthopper, white-back hopper) resistance, herbicide (bentazone and glyphosate),
stress (UV, chilling) tolerance, chemical composition (starch, storage proteins, and aroma), taste quality, pericarp color of grain, and hundreds of agronomic variations in the growth stage, grain development, morphology, plant type, and yield capabilities.
8.3.5 Japan Mutant Stock and Database
The Institute of Genetic Resources at Kyushu University has produced more than 6,000 rice mutants by N-methyl-N-nitrosourea treatment in cultivars Kinmaze and Taichung 65. Information relating to this collection can be viewed at http://www.shigen.nig.ac.jp/rice/oryzabase/nbrpStrains/kyushuGrc.jsp. The list of mutant strains presented there contains links to characteristic data, the request form, and the MTA form for seed distribution. Preliminary results on TILLING indicate a mutation frequency of 0.8% per 1-kb region, so one can expect about eight different mutations for every 1 kb of genome sequence in 1,000 mutant lines (Suzuki et al. 2005). With this high mutation frequency, the mutant population should serve as a promising resource for TILLING and other reverse-genetic studies in rice. At present 1,000 mutant lines of each cultivar are available for public distribution and it is expected that by the end of 2007, all mutant lines will be ready for seed distribution.
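The expectation quoted above from the preliminary TILLING results follows from a one-line calculation, sketched here. The sketch reads "0.8% per 1-kb region" as a probability of 0.008 that a given line carries a mutation in a given 1-kb region, which is an interpretation made for this example. The function names and the 1.5-kb amplicon in the usage lines are hypothetical.

```python
def expected_mutations(freq_per_kb_per_line: float, region_kb: float, n_lines: int) -> float:
    """Expected number of mutations in a region of region_kb across n_lines lines."""
    return freq_per_kb_per_line * region_kb * n_lines


def kb_per_mutation(freq_per_kb_per_line: float) -> float:
    """Average spacing between mutations within a single line, in kb."""
    return 1.0 / freq_per_kb_per_line


if __name__ == "__main__":
    # 0.008 mutations per kb per line x 1 kb x 1,000 lines
    print(expected_mutations(0.008, 1.0, 1000))
    # Hypothetical 1.5-kb TILLING amplicon screened in the same 1,000 lines
    print(expected_mutations(0.008, 1.5, 1000))
    print(kb_per_mutation(0.008))
```

The same frequency corresponds to roughly one mutation per 125 kb within any single line.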
8.4 Forward Genetics with Mutants
In the forward genetics strategy, one starts with a phenotype and its inheritance, followed by genetic mapping to locate the target gene on a chromosomal region. With the help of genetic markers it is possible to “walk down” the chromosome and eventually the DNA sequence responsible for the trait can be identified. The availability of the entire rice genome sequence will hasten gene identification considerably.
8.4.1 Phenotyping
Mutant populations harbor a large amount of genetic variability that can be revealed when the mutants are subjected to appropriate phenotypic screening. Morphological mutants can be identified based on phenotypic categories. But it is more difficult to estimate the variation for conditional traits because of the differences in experimental conditions. As mentioned previously, the availability of seeds from advanced generations is most important in screening for quantitative phenotypes. In fact, except for
simple qualitative traits, it is not possible to identify mutant phenotypes in early generations. With the screening of replicated lines, mutants with altered response to diseases (blast, bacterial blight, tungro virus) and brown plant hopper have been recovered from the IRRI collection. More recently, mutants with quantitative changes in salinity tolerance have also been isolated (B. Nakhoda and A. Ismail, IRRI, unpublished data). In many cases, both gain- and loss-of-resistance mutants were found. Overall, the rate of mutant detection in the population is approximately 0.1% for a broad category of traits such as altered disease resistance. However, the rate is an order of magnitude lower (approximately 0.01%) for a highly specific trait, such as a change in response to tungro viruses (P. Cabauatan and I. Choi, IRRI, unpublished data).
8.4.2 Map-Based Cloning
Map-based cloning is a forward genetics approach to identify the function of a gene by delimiting the chromosomal region conferring the phenotype of interest with markers linked to the mutated gene. Until recently, the map-based cloning approach has been rather tedious and time consuming. Although initial progress is easy to achieve, the fine mapping of a candidate gene becomes increasingly difficult. Thus, defining gene function by reverse genetics approaches has offered an attractive alternative. However, reverse genetics can be limited by a lack of phenotypes in reverse screenings, mainly because of gene function redundancy. Also, the choice of phenotypes to be evaluated can be biased by preconceptions about the possible function of a chosen gene. The advances in sequencing projects, improvements in map-based cloning approaches, the wealth of available marker systems, and the progress made in methods to detect DNA polymorphisms have brought map-based cloning back into the limelight (for a review see Peters et al. 2003).
Map-based cloning strategies are based on the fact that as distances between the target gene and the analyzed markers decrease, so does the frequency of recombination. Therefore, the availability of increasingly dense genetic maps, culminating in physical maps, is a key factor determining the speed with which map-based cloning can be achieved. Although sizable numbers of polymorphic isozyme and DNA-based markers such as RFLP, RAPD, SSR, AFLP, and SSLP are available for many crops, their detection is not as straightforward and inexpensive as that of SNPs and insertions/deletions (indels). Recent developments and improvements in high-throughput sequencing technology and the availability of large sets of ESTs are enabling easy detection of SNP and indel polymorphisms.
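The relationship between recombination frequency and map distance that underlies fine mapping can be illustrated with a short calculation. The following is a minimal sketch, not a method taken from this chapter: the function names and the example counts are hypothetical, and the Haldane and Kosambi mapping functions are the standard textbook forms.

```python
import math


def recombination_fraction(recombinants: int, gametes: int) -> float:
    """Observed fraction of recombinant gametes between a marker and the target gene."""
    if gametes <= 0 or not 0 <= recombinants <= gametes:
        raise ValueError("need 0 <= recombinants <= gametes and gametes > 0")
    return recombinants / gametes


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (assumes no crossover interference); valid for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for interference); valid for 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def expected_recombinants(distance_cm: float, gametes: int) -> float:
    """Expected recombinants in a small interval, using 1 cM ~ 1% recombination."""
    return distance_cm / 100.0 * gametes


# Hypothetical example: 12 recombinant gametes among 400 scored.
r = recombination_fraction(12, 400)
print(r, round(haldane_cm(r), 2), round(kosambi_cm(r), 2))

# A marker 0.1 cM from the gene yields only about 2 recombinants per 2,000 gametes,
# which is why fine mapping needs dense markers and large populations.
print(expected_recombinants(0.1, 2000))
```

For small intervals the two mapping functions give nearly the same distance; they diverge only as the recombination fraction approaches 0.5.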
A genome-wide rice DNA polymorphism database (http://shenghuan.shnu.edu.cn/ricemarker) has been constructed using the genomes of Nipponbare, a japonica cultivar, and 93-11, an indica cultivar (Shen et al. 2004). This database contains 1,703,176 SNPs and 479,406 indels, approximately one SNP every 268 bp and one indel every 953 bp in the rice genome. Both SNPs and indels in the database have been experimentally validated. Of 109 randomly selected SNPs, 107 (98.2%) are accurate. PCR analysis indicated that 90% (97 of 108) of indels in the database could be used as molecular markers, and 68% to 89% of the 97 indel markers are polymorphic between other indica cultivars (Guang-lu-ai 4 and Long-te-pu B) and japonica cultivars (Zhonghua 11 and 9522). By validating indel polymorphisms in the database, sets of indel markers for all chromosomes have been developed. These markers are inexpensive and easy to use, and can be used for any combination of japonica and indica cultivars. This rice DNA polymorphism database will be a valuable resource and an important tool for map-based cloning of rice genes. Indel polymorphisms are now also being exploited for genetic mapping using a low-cost microarray platform (David Galbraith, University of Arizona, personal communication). Besides their use as DNA markers, SNPs can also be used for allele discrimination in the analysis of allele-sharing status among distant or related rice strains. An allele-sharing map has been proposed as an effective strategy to convert huge amounts of complicated SNP data into a compact but informative map for various study purposes (Monna et al. 2006).

8.4.3 Detecting Genomic Changes Using Genome-Wide Chips

Single-feature polymorphisms (SFPs) have been detected successfully in Arabidopsis ecotypes using oligonucleotide (oligo) chips (Borevitz et al. 2003). Chang et al. (2003) reported preliminary results on using the Syngenta GeneChip, which contains 24-mer oligos representing 24,000 rice genes, to detect deletions. Genes/probes that generated hybridization signals below those of the wild-type cultivar (based on a significant t-test) were considered candidate genes. The gene chip approach was first tested with mutant alleles of two known genes: a γ-ray-induced dwarf mutant having a deletion in d1 (AB028602), which encodes a heterotrimeric G protein (Ashikari et al. 1999), and diepoxybutane- and fast neutron-induced deletion mutants at the Xa21 locus conditioning bacterial blight resistance in rice cultivar IRBB21 (Wang et al. 2004). DNA samples from the mutants and wild-type lines were hybridized separately to the Syngenta Rice GeneChip genome arrays. The GeneChip arrays successfully detected deletions spanning the single-copy d1 gene. Detection of the Xa21 deletions was ambiguous because of the presence of multiple members of the gene family. Although the chip detection technique may not always pinpoint the target gene, it enables rapid localization of the approximate position of candidate regions.

There are limitations to the chip-based detection technique. It depends on the genome coverage of the oligoarray chip and on the size and position of deletions relative to the oligos represented on the chip. It would be difficult to detect large deletions or multiple mutations across the genome. Backcrosses are often required to remove background mutations. To overcome these problems, one may use multiple alleles, if available, to narrow the search for candidate mutations. Pooling of DNA from segregants with common phenotypes is another way to mask irrelevant mutations (Gong et al. 2004). Finally, the availability of newer versions of oligoarrays, such as the 44K Agilent oligoarray, and of rice genome chips such as the 51K Affymetrix GeneChip® (see Chapter 4 of this book) will greatly improve the utility of deletion mutants for gene discovery.
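The probe-level comparison described above (flagging probes whose hybridization signal in the mutant falls significantly below the wild type) can be sketched as follows. This is an illustrative sketch only: the probe names, the log2 signal values, and the t-statistic cutoff are all hypothetical, and Welch's t statistic is used as one reasonable choice because the source does not specify which t-test was applied.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b); each sample needs >= 2 values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def candidate_deletions(wild_type, mutant, t_cutoff=4.0):
    """Return probes whose mutant signal is well below wild type.

    wild_type and mutant map probe IDs to replicate log2 signals.
    t_cutoff is an arbitrary illustrative threshold, not a published value.
    """
    return [
        probe
        for probe, wt_signals in wild_type.items()
        if welch_t(wt_signals, mutant[probe]) > t_cutoff
    ]


# Hypothetical replicate signals for three probes.
wild_type = {
    "probe_A": [10.1, 9.9, 10.0],
    "probe_B": [8.0, 8.2, 8.1],
    "probe_C": [11.0, 11.2, 10.9],
}
mutant = {
    "probe_A": [10.0, 10.2, 9.9],
    "probe_B": [4.1, 3.9, 4.0],   # strongly reduced signal: candidate deletion
    "probe_C": [11.1, 10.8, 11.0],
}
print(candidate_deletions(wild_type, mutant))
```

In a real analysis the cutoff would be replaced by a p-value corrected for the thousands of probes tested, and members of multigene families (as with Xa21) would still give ambiguous signals.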
8.5 Reverse Genetics with Mutants

With the availability of near-complete genome sequences for rice, identifying specific functions of each of the predicted 40,000 rice genes is a huge and challenging task for biologists. For genes that show detectable phenotypes when mutated, forward genetics approaches are the most feasible. However, for genes showing no detectable phenotypes, forward genetics is not feasible. Reverse genetics strategies are becoming increasingly useful, especially with the expanding collection of insertion-tagged lines and the advancement in RNAi-based gene silencing technology. Yet, production and curation of large libraries of insertional mutants with recoverable tags (such as T-DNA or transposons) for each gene will be very difficult to achieve because of “cold spots” in the genome (regions apparently inaccessible for insertion). Chemical and radiation mutagens, on the other hand, allow saturation mutagenesis to be achieved using relatively few individuals, each carrying multiple lesions in the genome. The use of such mutant populations for reverse genetics is becoming a reality with the development of high-throughput PCR-based detection (Li et al. 2001; Li and Zhang 2002) and TILLING technologies (McCallum et al. 2000a, 2000b).
8.5.1 PCR Screening

Small to medium-sized deletions in genomes (such as the ones produced by fast neutron mutagenesis) can be detected through PCR analysis. A method that identifies smaller-than-expected amplicons resulting from a deletion was first described by Jansen et al. (1997). In this method, primers flanking a genomic region containing a target gene are designed in such a way that the product generated by the wild-type allele is difficult to amplify by PCR because of its large size. When a deletion reduces the length of the region flanked by the primers, the fragment carrying the deletion can often be amplified with higher efficiency. As a result, the smaller product can be detected even if the DNA from the individual carrying the deletion allele is mixed with DNA from many wild-type individuals. Li and colleagues exploited this strategy in rice to develop a reverse genetics resource (Li et al. 2001; Li and Zhang 2002). Applying PCR screening to 5,000 IR64 mutant lines has yielded one deletion mutation in a defense gene, but the experiment required stringent optimization of PCR conditions, making the method less robust (P. Manosalva and J. Leach, Colorado State University, unpublished data). Nonetheless, the deletion detection strategy has advantages not found in other methods. Most notably, it can yield mutations in which tandemly arranged genes (paralogous or not) are simultaneously deleted. While this approach is potentially quite powerful, few mutants have been identified with it. The inefficiency of this approach can be attributed to several factors. First, deletions can have severe consequences on the affected genes, and mutagenic treatments that produce a high density of these lesions are lethal because essential genes are knocked out. Second, the severity of deletions makes it impossible to study the function of essential genes.
Third, the production of an appropriately mutagenized population is difficult in most organisms because of the conflict between mutation density and survival. Fourth, the strategy to detect mutant alleles is constrained: deletions that are too small to change the amplification efficiency of the target, or that fall in the primer-binding sites, cannot be identified. Notwithstanding the drawbacks, it would be useful to have such a resource in rice. Unfortunately, the mutagenized population described in Li et al. (2001) is proprietary and unavailable as a public resource. No public population has been described, so this approach is not at present readily available to rice researchers.

8.5.2 TILLING

The TILLING approach makes use of DNA strand mismatches formed between mutant and wild-type DNA. DNA from individual M2 plants is isolated, pooled, and arrayed in 96-well plates. Primers are designed (e.g., using CODDLE, http://www.proweb.org/input/) to bracket a 1-kb region that most likely contains a deleterious mutation in a target gene. The primers are then used to amplify the gene of interest, followed by denaturing and reannealing of the DNA to allow formation of homo- and heteroduplexes in the DNA pool. Originally, denaturing high-performance liquid chromatography (DHPLC) was used to detect the presence of a DNA mismatch, but mismatches are now detected by enzymatic cleavage of PCR-amplified heteroduplexed DNA and band visualization using fluorescent end-labeling and denaturing polyacrylamide gel electrophoresis (Henikoff and Comai 2003; Henikoff et al. 2004). The TILLING approach is working well in Arabidopsis, where a relatively high mutation frequency was induced and more than 5,000 mutations have been identified in more than 400 targeted genes (Cooper et al. 2005). To date, mutants/alleles identified by TILLING have resulted in the identification of six Arabidopsis genes, namely DAWDLE (Morris et al. 2006), REVERSION-TO-ETHYLENE SENSITIVITY1 (Resnick et al. 2006), AtISA3 (Delatte et al. 2006), Arabidopsis carotenoid beta-ring hydroxylase (Kim and DellaPenna 2006), AtWEX (Li et al. 2005), and ARABIDOPSIS CRINKLY4 (Gifford et al. 2005). The generality of the mutagenesis and the mutation discovery methods allows application of this approach to most organisms. TILLING can be used to identify allelic series of mutations, including knockouts. Indeed, TILLING can be applied to selected target genes even if genomic sequencing is limited. The high density of chemically induced point mutations makes TILLING suitable for targeting small genes, and it allows an investigator to focus on single protein domains when targeting larger genes. In contrast to insertional mutagenesis, TILLING is widely applicable, as chemical mutagenesis has been successfully applied to most taxa. Indeed, TILLING results have been reported for a variety of plants and animals (McCallum et al. 2000a, 2000b; Perry et al. 2003; Wienholds et al. 2003; Till et al. 2004; Gilchrist and Haughn 2005; Gilchrist et al. 2006; Slade et al. 2005; Winkler et al. 2005; Wu et al. 2005). Because it is broadly applicable and nontransgenic, TILLING has the potential to become a standard reverse-genetic strategy for plant functional genomics.
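Two quantities follow directly from the heteroduplex-cleavage principle described above: the sizes of the two fragments produced when a mismatch is cut, and the share of reannealed molecules that are heteroduplexes in a pooled sample. The sketch below is illustrative only; the function names are hypothetical, and the heteroduplex estimate assumes equal amplification of both alleles and random reannealing, which real PCR products only approximate.

```python
def cleavage_fragments(amplicon_bp: int, snp_position: int) -> tuple:
    """Sizes (bp) of the two fragments released when a heteroduplex is cut at the mismatch.

    snp_position is counted from one end of the amplicon; the two fragments
    always sum to the amplicon length.
    """
    if not 0 < snp_position < amplicon_bp:
        raise ValueError("SNP must lie inside the amplicon")
    return snp_position, amplicon_bp - snp_position


def heteroduplex_fraction(pool_size: int, mutant_alleles: int = 1, ploidy: int = 2) -> float:
    """Expected fraction of heteroduplex molecules after random reannealing.

    With mutant allele frequency f in the pool, a fraction 2*f*(1-f) of the
    reannealed duplexes pair one mutant strand with one wild-type strand.
    """
    f = mutant_alleles / (pool_size * ploidy)
    return 2 * f * (1 - f)


# A SNP 300 bp into a 1-kb amplicon gives 300-bp and 700-bp cleavage products.
print(cleavage_fragments(1000, 300))

# One heterozygous mutant among 8 diploid plants: about 12% of duplexes are cleavable.
print(heteroduplex_fraction(8))
```

The second number shows why detection sensitivity limits pool depth: each doubling of the pool roughly halves the cleavable fraction.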
8.6 TILLING in Rice

8.6.1 Seattle TILLING Project

The Seattle TILLING Project (http://tilling.fhcrc.org:9366/), in collaboration with the International Rice Research Institute (IRRI) and the Agricultural Research Service of the US Department of Agriculture at Davis, has been applying the TILLING method to rice. A critical requirement for TILLING is the availability of a mutagenized population with a sufficient density of induced mutations. The estimate of mutation density per megabase of DNA is the single most important determinant of the feasibility of TILLING as an effective reverse-genetic strategy. Although it is possible to TILL (i.e., to find suitable mutations in) a population that has one mutation per megabase of diploid DNA, efficient TILLING requires at least two mutations per megabase. Thus, an important step in TILLING is determining the best dosage of mutagen. Too severe a treatment can cause sterility and nonviability, whereas too mild a treatment results in a low density of mutations and will require more screening to obtain an adequate allelic series. Seed mutagenesis, even repeated treatments under identical conditions, can be variable, and different species may require different dosages (for examples see Till et al. 2003a, 2003b). As a result, optimizing mutagenesis may involve multiple attempts using a range of mutagen concentrations to produce the best trade-off between fertility and mutation rate. Determining the mutation rate is best done via TILLING, and it requires carrying about 800 plants to the M2 stage before deciding which conditions worked best. This pilot process entails TILLING of three to six genes in 768 plants from a test population, and it is the only reliable way to estimate a mutation rate. Pilot-scale screening can also identify other factors that might limit the efficiency of high-throughput TILLING, such as insufficient DNA purity. Rice has proved technically challenging to mutagenize to a sufficient mutation density, although recent efforts to achieve the critical threshold of mutations have been successful (B. Till et al., unpublished data).
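The pilot-scale estimate of mutation density described above reduces to dividing the number of mutations found by the total number of base pairs screened. The sketch below assumes, for simplicity, that every base of every amplicon is equally detectable in every plant (in practice the amplicon ends are harder to score), and the mutation count and amplicon sizes in the example are hypothetical.

```python
def mutation_density_per_mb(mutations_found: int, plants_screened: int, amplicon_bp_per_gene) -> float:
    """Mutations per megabase = mutations / (plants x total amplicon length) x 1e6."""
    screened_bp = plants_screened * sum(amplicon_bp_per_gene)
    if screened_bp <= 0:
        raise ValueError("nothing was screened")
    return mutations_found / screened_bp * 1_000_000


def expected_mutations(density_per_mb: float, plants: int, amplicon_bp: int) -> float:
    """Expected number of mutations in one amplicon across a population."""
    return density_per_mb * plants * amplicon_bp / 1_000_000


# Hypothetical pilot: 768 M2 plants, four 1-kb amplicons, 6 confirmed mutations.
density = mutation_density_per_mb(6, 768, [1000, 1000, 1000, 1000])
print(round(density, 2))  # just under 2 mutations per Mb

# At 2 mutations per Mb, a 1-kb target screened in 768 plants yields only ~1.5 mutations,
# so an allelic series for one gene needs several thousand plants.
print(expected_mutations(2.0, 768, 1000))
```

This makes concrete why the density threshold matters: halving the density doubles the number of plants that must be screened for the same allelic series.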
To date, the Seattle TILLING Project has screened several rice pilot populations that have mutation frequencies lower than what we judge sufficient for a successful TILLING service. Some seed-mutagenized populations of indica had a mutation frequency of approximately one per megabase (Wu et al. 2005). A population mutagenized by the floral dip method had a better mutation frequency of approximately 1.7 per megabase (Nori Kurata, Brad Till, Jennifer Cooper et al., unpublished data). Testing of mutagenic treatments is ongoing with indica rice at IRRI. A recurrent EMS mutagenesis scheme has increased the mutation density up to 1.3 per megabase, based on screening about 1,600 M2 plants with 11 genes (F. Qui and H. Leung, unpublished data). Most recently, populations of japonica rice mutagenized by Dr. Tom Tai at the USDA-ARS at Davis have displayed the best density of mutations measured so far, allowing the isolation of multiple mutants in several tested genes (B. Till, T. Tai et al., unpublished data). It was concluded that the latter populations would be suitable for use in a large-scale TILLING project. Consequently, a scale-up of TILLING libraries derived from these populations is in progress at the UC Davis Genome Center. A public service is anticipated as early as Spring 2007; it will be run from the Genome Center of UC Davis, with seed distribution from the Dale Bumpers Rice Stock Center in Arkansas, and will be modeled on services previously established for Arabidopsis, maize, and Drosophila.

8.6.2 Other Technical Improvements in Rice TILLING

Suzuki et al. (2005) have simplified the TILLING procedure for use in rice by replacing the fluorescent primers with nonlabeled primers in PCR amplification, and by using capillary gel electrophoresis with the HDA-GT12 Genetic Analyzer (eGene Inc., Irvine, CA), which can separate DNA fragments below 2 kb within 8 minutes. This modified system could detect SNPs between indica and japonica varieties in all DNA regions examined, and in tests with pooled DNA samples it could detect one heterozygous mutant in a pool of six plants. In a test screening of 700 M2 MNU-induced mutant lines for mutations in a known 600-bp intragenic region, they detected 10 candidate mutant lines, six of which were confirmed by sequencing. Recently, Raghavan et al. (2007) have optimized an agarose gel method to simplify detection without specialized equipment (see flow chart in Fig. 8.1). The group showed that SNP detection on agarose gels corresponded perfectly with detection on the LI-COR genotyper. It was possible to detect mutations in an eightfold DNA pool. Screening efficiency was also increased by scanning amplicons as large as 3 kb. The real advantage of agarose TILLING is the elimination of the need for labeled primers, which represents a significant cost reduction and makes the technique much more affordable for laboratories with modest budgets.

8.6.3 TILLING Case Studies for Specific Traits

TILLING has been advocated as an important tool for agricultural improvement through the identification of new variants and as a means to validate gene function.
For wide adoption, however, it is important to know the efficiency of identifying allelic variants that yield detectable phenotypes. The strategy adopted at IRRI has been to focus on a few genes to illustrate the potential of identifying useful variants with agronomically important phenotypes. The first case deals with screening induced mutations in members of a gene family conditioning disease resistance. The second case concerns the detection of natural variation in a candidate gene with putative function in the drought response pathway. While these experiments are far from complete, they illustrate different challenges and may offer some useful hints to guide future applications.
[Fig. 8.1, flow chart]

Starting material: EMS-mutagenized population of IR 64 (phenotypic and non-phenotypic mutants) and IR 64 wild type.

Collect ~2 g leaf tissue from individual plants and store at -80 °C for tracking mutants.

Prepare DNA pools (0.5 ng/μl) using procedure A or procedure B:

Procedure A
• Pool an equal quantity of leaf tissue (~1 g) from each of the 8 plants.
• Extract DNA from each pool.
• Quantify and prepare a working stock of 0.5 ng/μl of each DNA pool.

Procedure B
• Extract DNA from leaf tissue (~1 g) of each of the 8 plants separately.
• Quantify and normalize the DNA extracted from each of the 8 plants to a concentration of 0.5 ng/μl.
• Create pools by mixing an equal volume of normalized DNA from each of the 8 plants.

Screen the DNA pools (e.g., Pool 1, Pool 2, Pool 3, Pool 4):
1. PCR amplify candidate gene “G1” in a final volume of 14 μl separately for each pool.
2. Use 2 μl of the PCR reaction to check for a unique PCR product (amplicon) on a 1% agarose gel.
3. Subject the remaining PCR reaction to the following conditions in a thermal cycler to enable heteroduplex formation: 95 °C for 2 min; 95 to 85 °C at -2 °C/s; 85 to 25 °C at -0.1 °C/s; 4 °C hold.
4. Prepare a CJE mix in the following ratio: 8.3 μl Millipore distilled water; 1.5 μl CJE buffer (may vary depending on titre); 0.2 μl CJE.
5. Treat each PCR reaction with 10 μl of the CJE mix and incubate at 45 °C for 30 min.
6. Stop the digestion by adding 5 μl of 0.5 M EDTA to each reaction.
7. Load 10 μl or more of the digest onto a 1.2% agarose gel and electrophorese at 10 V/cm.
8. Stain the gel with ethidium bromide in the conventional way and visualize on a UV transilluminator.

Outcome 1: pools with no cleaved product (e.g., Pools 1, 2 and 3)
• No mutation occurred in the G1 locus across the 8 lines in each of these pools.
• Screen additional genes.

Outcome 2: cleaved product visible (e.g., Pool 4)
• There is a mutation (SNP) in the G1 locus.
• Extract DNA from the frozen leaf samples of each of the 8 plants in Pool 4.
• Store part of the DNA of the individual plants to track mutations at other loci.
• Combine equal volumes of DNA from each member of Pool 4 with that of IR 64 wild type, making 8 DNA pools (0.5 ng/μl).
• Repeat steps 1 to 8.
• Identify the individual plant that carries the mutation and verify by sequencing.

Fig. 8.1. Rice TILLING on agarose gel. The procedure is similar to standard TILLING in terms of DNA pooling, PCR, and CEL I digestion. The main difference is the use of the agarose method for detecting cleaved products. This obviates the need for labeled primers in PCR and special genotyping platforms. CJE = celery juice extract as a source of CEL I endonuclease
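When many pools are digested at once, the per-reaction CJE mix given in Fig. 8.1 (8.3 μl water, 1.5 μl CJE buffer, 0.2 μl CJE, totalling 10 μl) is conveniently scaled into a master mix. The helper below is a hypothetical convenience sketch: the function name and the 10% pipetting overage are assumptions, not part of the published protocol, and the buffer volume may need adjusting to the titre of the extract as the figure notes.

```python
# Per-reaction volumes (in microliters) as listed in Fig. 8.1.
CJE_MIX_PER_REACTION_UL = {"water": 8.3, "CJE buffer": 1.5, "CJE": 0.2}


def cje_master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale the CJE mix for n reactions.

    overage is the extra fraction prepared to cover pipetting loss
    (10% is an assumed default, not a value from the protocol).
    """
    if n_reactions < 1 or overage < 0:
        raise ValueError("need at least one reaction and a non-negative overage")
    scale = n_reactions * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in CJE_MIX_PER_REACTION_UL.items()}


# Master mix for one 96-well plate of pools, with the assumed 10% overage.
for component, volume in cje_master_mix(96).items():
    print(f"{component}: {volume} ul")
```

Each reaction still receives 10 μl of the resulting mix, as in step 5 of the flow chart.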
Gene Family Members Associated with Disease Resistance QTL
Several chromosomal regions harboring oxalate oxidase (OXO) and oxalate oxidase-like protein (OXL) genes have been shown to be associated with quantitative disease resistance in mapping populations (Ramalingam et al. 2003; Liu et al. 2004). The OsOXO cluster has four members on chromosome 3. The OsOXL genes are clustered on chromosome 8 (12 members) and chromosome 12 (4 members). The 12 members on chromosome 8 have different expression patterns that do not show an obvious relationship with the resistance phenotype expressed in different genotypes (R. Davidson and J. Leach, Colorado State University, unpublished data). Because of the quantitative effect, it is difficult to determine whether individual members of the family or combinations of these members confer disease resistance. One approach is to identify mutant alleles in each of the gene members and determine their phenotypes: an ideal problem for TILLING to address. At IRRI, the simplified agarose method has been adopted for all TILLING operations (Fig. 8.1; Raghavan et al. 2007). From screening approximately 800 M2 DNA samples, 11 SNPs in the oxalate oxidase genes were identified in a high-dose (2%) EMS population. Of these, five SNPs cause nonsynonymous changes leading to changes in amino acids (Table 8.2). All five SNPs are G/C to A/T transitions, consistent with those expected from EMS mutagenesis. Interestingly, while the estimated mutation density for this mutant population is low (about one per megabase), a good number of mutations can be identified in a specific gene family in a relatively small population of 800 lines. Of the five mutants evaluated for disease response, only one appeared to show reduced resistance in preliminary analysis. This illustrates the need to assemble a large collection of allelic mutations or to combine multiple mutations in a single genotype in order to reveal phenotypic changes.

Table 8.2. Identification of mutations in members of gene families of oxalate oxidase (OsOXO) and oxalate oxidase-like protein (OsOXL) in an EMS-induced IR64 mutant collection

Gene name   TIGR Locus ID    Chromosome   Mutant line identified   Mutation   Amino acid change
OsOXO-4     LOC_Os03g48780   3            M3E93                    C to T     Pro to Leu
OsOXL-7     LOC_Os08g09010   8            M3E715                   G to A     Arg to Lys
OsOXL-6     LOC_Os08g09000   8            M3E97                    C to T     Ser to Phe
OsOXL-9     LOC_Os08g09040   8            M3E183                   C to T     Arg to Val
OsOXL-9     LOC_Os08g09040   8            M3E543                   C to T     Pro to Leu
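The statement that all five mutations in Table 8.2 are G/C to A/T transitions can be checked mechanically. The sketch below encodes the table rows as plain tuples (the variable and function names are hypothetical) and tests each base change against the two changes expected from EMS, which alkylates guanine so that G to A on one strand reads as C to T on the other.

```python
# Base changes expected from EMS: G->A, or C->T when read on the opposite strand.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}

# Rows of Table 8.2: (gene, mutant line, reference base, mutant base).
TABLE_8_2 = [
    ("OsOXO-4", "M3E93", "C", "T"),
    ("OsOXL-7", "M3E715", "G", "A"),
    ("OsOXL-6", "M3E97", "C", "T"),
    ("OsOXL-9", "M3E183", "C", "T"),
    ("OsOXL-9", "M3E543", "C", "T"),
]


def is_ems_type(ref: str, alt: str) -> bool:
    """True if the base change is a G/C to A/T transition."""
    return (ref.upper(), alt.upper()) in EMS_TRANSITIONS


for gene, line, ref, alt in TABLE_8_2:
    print(f"{gene} {line}: {ref} to {alt} -> EMS-type: {is_ems_type(ref, alt)}")
```

A mutation outside this class in an EMS population would usually prompt a check for natural variation or seed contamination before it is treated as induced.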
Drought-Response Candidate Gene
A modified procedure of TILLING, called EcoTILLING (Comai et al. 2004), was applied to identify natural allelic variants in a gene coding for a putative ethylene-responsive element binding protein 3 (ERF3) harboring an AP2 domain at 136.6 cM on chromosome 1 (Wang 2005). This locus falls within a drought QTL region centered on 146 cM that correlates with yield components under stress. The genetic variation at this locus was examined in a collection of 905 rice lines of the minicore germplasm collection at IRRI (Table 8.3). The germplasm collection is in essence considered a large natural mutant bank.

Table 8.3. Summary of nucleotide diversity in ERF3 region based on analysis of 905 rice lines

Nucleotide polymorphism   5′ upstream (847 bp)   5′ UTR (90 bp)   CDS (708 bp)   3′ UTR (97 bp)
SNPs                      21 (2.48)              1 (1.11)         5 (0.71)       1 (1.03)
Insertions/deletions      4 (0.47)               0 (0.00)         2 (0.28)       1 (1.03)
Informative sites (a)     15 (1.77)              0 (0.00)         3 (0.42)       2 (2.06)
Total                     25 (2.95)              1 (1.11)         7 (0.99)       2 (2.06)

(a) Informative sites: nucleotide substitutions resulting in changes in either cis-acting factors or amino acids. Figures in parentheses are % sites polymorphic (i.e., number of polymorphisms/total number of bp in each region).
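The percentages in parentheses in Table 8.3 are simply counts divided by region length. The short sketch below recomputes them from the table's counts; the dictionary layout and function names are hypothetical, and only values printed in the table are used as input.

```python
# Region lengths and polymorphism counts copied from Table 8.3.
ERF3_REGIONS = {
    "5' upstream": {"length_bp": 847, "snps": 21, "indels": 4, "informative": 15},
    "5' UTR": {"length_bp": 90, "snps": 1, "indels": 0, "informative": 0},
    "CDS": {"length_bp": 708, "snps": 5, "indels": 2, "informative": 3},
    "3' UTR": {"length_bp": 97, "snps": 1, "indels": 1, "informative": 2},
}


def percent_polymorphic(count: int, length_bp: int) -> float:
    """Percentage of sites polymorphic, rounded to two decimals as in the table."""
    return round(100 * count / length_bp, 2)


def bp_per_snp(length_bp: int, snps: int) -> float:
    """Average spacing between SNPs in a region."""
    return length_bp / snps


for name, region in ERF3_REGIONS.items():
    total = region["snps"] + region["indels"]  # the table's "Total" row
    print(name, percent_polymorphic(region["snps"], region["length_bp"]),
          percent_polymorphic(total, region["length_bp"]))

# Coding region: 708 bp / 5 SNPs, i.e. about one SNP per 141 bp.
print(int(bp_per_snp(708, 5)))
```

The 5′ upstream region has the highest total percentage, which matches the ranking stated in the text.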
The percentage of polymorphic sites shows that the 5′ upstream region is the most polymorphic of the four screened regions. In this germplasm collection, the average SNP frequency of the ERF3 noncoding region was one SNP per 65 bp and that of the coding region (CDS) was one SNP per 141 bp. This frequency is lower than that of maize, where the SNP frequency in US elite inbred germplasm was one SNP per 48 bp in noncoding regions and one SNP per 131 bp in coding regions (Bhattramakki et al. 2002). The experiment to associate molecular variation with phenotype was done with the drought physiology group at IRRI, who assayed drought-response phenotypes of more than 400 lines under field and greenhouse conditions. The phenotyping data were then tested for association with SNP haplotype data at the ERF3 locus. Preliminary analysis suggested a positive association between an SNP haplotype and biomass under stress within a collection of indica germplasm (N = 217). Recognizing that multiple loci are likely to be involved, a larger panel of more than 20 drought-response candidate genes is being tested in this collection of germplasm to understand the molecular mechanisms of the drought tolerance response (K.L. McNally and E. Naredo, IRRI, unpublished data).
TILLING with Phenotype-Enriched Mutant Subsets
Another approach adopted by IRRI researchers is a combination of forward and reverse genetics wherein a particular phenotype-enriched mutant subset is first selected phenotypically for subsequent TILLING using a small set of genes presumed to be involved in imparting the phenotype in question. As a test case, genes in the starch biosynthesis pathways have been targeted. A large collection of mutants is being screened first for abnormal grains (in an inexpensive phenotypic screen) and then TILLING is being applied using a small set of known genes involved in starch biosynthesis. Adopting this strategy, several mutants have been obtained (Douglas Willoughby and Melissa Fitzgerald, IRRI, unpublished data). Currently, this approach is also adopted in screening for mutations in genes related to small RNA metabolism by using a set of more than 200 mutant lines with visible morphological variation based on the assumption that most of the genes involved in small RNA metabolism also affect development in either vegetative or reproductive stages (Taeko Sasaki and Jehan Sasongko, IRRI, unpublished data).
8.7 Future Prospects

Now that we have the complete rice genome sequence information, it is instructive to reread the short essay by Hieter and Boguski (1997) on “Functional genomics: it’s all how you read it,” published just a year before the rice genome sequencing project was conceived. In their commentary, they pointed out the essence of “functional genomics,” which is the “development and application of global (genome-wide) experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics.” The new science entails high-throughput and computational analyses, and it promises to narrow the gap between sequence information and function and, eventually, phenotypes. But they also cautioned that “functional genomics, however, will not replace the time-honored use of genetics, biochemistry, cell biology, and structural studies in gaining a detailed understanding of the biological mechanisms.” These comments are as relevant now as they were a decade ago, when only the yeast genome was completely sequenced.

With the whole genome sequence available, it is theoretically possible to test the relationship between molecular variation and phenotypes at every gene by forward and reverse genetics. A prerequisite to understanding the functions of each gene, and its interactions with other genes, is the identification of biological variants that carry the loci and alleles of interest. Chemical- and irradiation-induced mutants are particularly valuable for understanding the gene–phenotype relationship because SNPs and indels represent the majority of genomic variations in natural germplasm. Further, techniques that enable high-throughput sampling of allelic variation in multiple genes in large collections of mutants or natural germplasm are critical in the post-sequencing era. In this chapter, we have surveyed the present state of the art in chemical and irradiation mutagenesis, resources available to accelerate gene discovery, and single nucleotide detection technologies for forward as well as reverse genetics. In this context, TILLING represents a promising tool because it is relatively simple to implement and can serve to identify induced and natural variants in germplasm and mutant collections for almost any crop species.

The main benefit of TILLING lies in its potential to identify a large series of mutations ranging from knockouts to subtle missense mutations. There is already a large collection of insertion and activation lines with flanking sequence tags (FSTs). The current OryGenesDB database (http://orygenesdb.cirad.fr) has about 80,000 available tagged sequences on the rice genome. On average, there is a 50% success rate of finding tagged mutations of a gene of interest (A. Pereira, personal communication). While the success rate will continue to rise as mutant populations are enlarged, mutations discovered by TILLING can fill the gaps by producing a rich allelic series of point and indel mutations. With modest investment, almost any laboratory can conduct TILLING using local mutant populations and germplasm collections. At present, the limitation of TILLING mutant populations is largely biological: the ability to produce a sufficiently large allelic series such that the desired knockout or knockdown mutations can be uncovered. Also, there is still a paucity of empirical data to indicate the number of mutations needed for phenotypic evaluation of agronomic traits.
Thus, more convincing examples are needed to demonstrate the benefits of TILLING in generating useful diversity in induced mutations.
Acknowledgments

We thank the Swiss Agency for Development and Cooperation (HL), Rockefeller Foundation (HL, LC), USDA (HL, LC), and Generation Challenge Program (KM, HL) for financial support. We thank colleagues at various institutions for providing unpublished information concerning their mutant collections, and IAEA for supporting irradiation mutagenesis for the mutant collections at IRRI.
References

Abe T, Hayashi Y, Saito H, Takehisa H, Miyazawa Y, Yamamoto YY, Ryuto H, Fukunishi N, Sato T, Yoshida S, Kameya T (2005) Chlorophyll-deficient mutants of rice induced by C-ion irradiation. RIKEN Accel Prog Rep 38:132
Abe T, Yasuda M, Takehisa H, Hayashi Y, Saito H, Ichida H, Shirao T, Onuma R, Ryuto H, Fukunishi N, Miyazawa Y, Tokairin H, Nakashida H, Kudo T, Sato T (2006) Isolation of morphological mutants of rice induced by heavy-ion irradiation. RIKEN Accel Prog Rep 39:137
Ashburner M (1990) Drosophila, A Laboratory Handbook. Cold Spring Harbor, NY, Cold Spring Harbor Press
Ashikari M, Wu J, Yano M, Sasaki T, Yoshimura A (1999) Rice gibberellin-insensitive dwarf mutant gene Dwarf 1 encodes the alpha-subunit of GTP-binding protein. Proc Natl Acad Sci USA 96:10284–10289
Auerbach C, Robson JM (1946) Chemical production of mutations. Nature 157:302
Bentley A, MacLennan B, Calvo J, Dearolf CR (2000) Targeted recovery of mutations in Drosophila. Genetics 156:1169–1173
Bhattramakki D, Dolan M, Hanafey M, Wineland R, Vaske D, Register J, Tingey S, Rafalski A (2002) Insertion-deletion polymorphisms in 3' regions of maize genes occur frequently and can be used as highly informative markers. Plant Mol Biol 48:539–547
Blakely EA, Kronenberg A (1998) Heavy-ion radiobiology: new approaches to delineate mechanisms underlying enhanced biological effectiveness. Radiat Res 150:S126–145
Borevitz JO, Liang D, Plouffe D, Chang HS, Zhu T, Weigel D, Berry CC, Winzeler E, Chory J (2003) Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res 13:513–523
Bruggemann E, Handwerger K, Essex C, Storz G (1996) Analysis of fast neutron-generated mutants at the Arabidopsis thaliana HY4 locus. Plant J 10:755–760
Cecchini E, Mulligan BJ, Covey SN, Milner JJ (1998) Characterization of gamma irradiation-induced deletion mutations at a selectable locus in Arabidopsis. Mutat Res 401:199–206
Chang HS, Wu C, Zeng L, Dunn M, Wang GL, Leung H, Goff S, Wang X, Zhu T, Leach JE (2003) Detection of deleted genes in rice mutants using the Rice GeneChip genome array. In: Abstracts of Plant and Animal Genome XI. 11–15 January 2003, San Diego, California, p 100
Cheema AA, Atta BM (2003) Radioactivity studies in Basmati rice. Pak J Bot 35:197–207
Comai L, Henikoff S (2006) TILLING: practical single-nucleotide mutation discovery. Plant J 45:684–694
Comai L, Young K, Till B, Reynolds S, Greene E, Codomo C, Enns L, Johnson J, Burtner C, Odden A, Henikoff S (2004) Efficient discovery of DNA polymorphisms in natural populations by Ecotilling. Plant J 37:778–786
Cooper J, Till B, Codomo C, Burtner C, Young K, Bowers E, Holm A, Laport R, Greene E, Zerr T, Kwong S, Comai L, Henikoff S (2005) TILLING and Ecotilling in Rice. In: Plant Biology 2005 Symposium VI: New Directions of Rice Research in Post-Genome Sequencing Era, Seattle, Washington, USA
Delatte T, Umhang M, Trevisan M, Eicke S, Thorneycroft D, Smith SM, Zeeman SC (2006) Evidence for distinct mechanisms of starch granule breakdown in plants. J Biol Chem 281:12050–12059
Ehrenberg L, Hussain S (1981) Genetic toxicity of some important epoxides. Mutat Res 86:1–113
Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, Arkin AP, Astromoff A, El-Bakkoury M, Bangham R, Benito R, Brachat S, Campanaro S, Curtiss M, Davis K, Deutschbauer A, Entian KD, Flaherty P, Foury F, Garfinkel DJ, Gerstein M, Gotte D, Guldener U, Hegemann JH, Hempel S, Herman Z, Jaramillo DF, Kelly DE, Kelly SL, Kotter P, LaBonte D, Lamb DC, Lan N, Liang H, Liao H, Liu L, Luo C, Lussier M, Mao R, Menard P, Ooi SL, Revuelta JL, Roberts CJ, Rose M, Ross-Macdonald P, Scherens B, Schimmack G, Shafer B, Shoemaker DD, Sookhai-Mahadeo S, Storms RK, Strathern JN, Valle G, Voet M, Volckaert G, Wang CY, Ward TR, Wilhelmy J, Winzeler EA, Yang Y, Yen G, Youngman E, Yu K, Bussey H, Boeke JD, Snyder M, Philippsen P, Davis RW, Johnston M (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418:387–391
Gifford ML, Robertson FC, Soares DC, Ingram GC (2005) ARABIDOPSIS CRINKLY4 function, internalization, and turnover are dependent on the extracellular Crinkly repeat domain. Plant Cell 17:1154–1166
Gilchrist E, Haughn G (2005) TILLING without a plough: a new method with applications for reverse genetics.
Curr Opin Plant Biol 8:211–215 Gilchrist EJ, Haughn GW, Ying CC, Otto SP, Zhuang J, Cheung D, Hamberger B, Aboutorabi F, Kalynyak T, Johnson L, Bohlmann J, Ellis BE, Douglas CJ, Cronk QC (2006) Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa Molecular Ecology 15:1367–1378 Goff S, Ricke D, Lan T, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange B, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun W, Chen L, Cooper B, Park S, Wood T, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller R, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100 Goll MG, Bestor TH (2002) Histone modification and replacement in chromatin activation. Genes Dev 16:1739–1742 Gong J-M, Waner D, Horie T, Li S, Horie R, Abid K, Schroeder J (2004) Microarray-based rapid cloning of an ion accumulation deletion mutant in Arabidopsis thaliana. Proc Natl Acad Sci USA 101:15404–15409 Goodhead DT (1995) Molecular and cell models of biological effects of heavy ion radiation. Radiat Environ Biophys 34:67–72
176
Ramesh S. Bhat et al.
Graf U, Wurgler FE, Katz AJ, Frei H, Juon H, Hall CB, Kale PG (1984) Somatic mutation and recombination test in Drosophila melanogaster. Environ Mutagen 6:153–188 Guenet J (2004) Chemical mutagenesis of the mouse genome: an overview. Genetica 122:9–24 Hagen U (1994) Mechanisms of induction and repair of DNA double-strand breaks by ionizing radiation: some contradictions. Radiat Environ Biophys 33:45–61 Hase Y, Tanaka A, Baba T, Watanabe H (2000) FRL1 is required for petal and sepal development in Arabidopsis. Plant J 24:21–32 Henikoff S, Comai L (2003) Single-nucleotide mutations for plant functional genomics. Annu Rev Plant Biol 54:375–401 Henikoff S, Till B, Comai L (2004) TILLING. Traditional mutagenesis meets functional genomics. Plant Physiol 135:630–636 Hieter P, Boguski M (1997) Functional genomics: it’s all how you read it. Science 278:601–602 International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800 Jansen G, Hazendonk E, Thijssen K, Plasterk R (1997) Reverse genetics by chemical mutagenesis in Caenorhabditis elegans. Nat Genet 17:119–121 Kieber JJ, Rothenberg M, Roman G, Feldmann KA, Ecker JR (1993) CTR1, a negative regulator of the ethylene response pathway in Arabidopsis, encodes a member of the Raf family of protein kinases. Cell 72:427–441 Kim J, DellaPenna D (2006) Defining the primary route for lutein synthesis in plants: The role of Arabidopsis carotenoid beta-ring hydroxylase CYP97A3. Proc Natl Acad Sci USA 103:3474 –3479 Kitamura S, Shikazono N, Tanaka A (2004) TRANSPARENT TESTA 19 is involved in the accumulation of both anthocyanins and proanthocyanidins in Arabidopsis. Plant J 37:104–114 Koornneef M, Dellaert LW, van der Veen JH (1982) EMS- and radiation-induced mutation frequencies at individual loci in Arabidopsis thaliana (L.) Heynh. Mutat Res 93:109–123 Kraft G, Kramer M, Scholz M (1992) LET, track structure and models. A review. 
Radiat Environ Biophys 31:161–180 Kurata N, Yamazaki Y (2006) Oryzabase. An integrated biological and genome information database for rice. Plant Physiol 140:12–17 Leung H, Wu C, Baraoidan M, Bordeos A, Ramos M, Madamba S, Cabauatan P, Vera Cruz C, Portugal A, Reves G, Bruskiewich R, McLaren G, Gregorio G, Bennett J, Brar D, Khush G, Schnable P, Wang G, Leach J (2001) Deletion mutants for functional genomics: Progress in phenotyping, sequence assignment, and database development. In: Khush G, Brar D, Hardy B (eds) Rice Genetics IV. Science Publishers, New Delhi, pp 239–251 Li B, Conway N, Navarro S, Comai L (2005) A conserved and species-specific functional interaction between the Werner syndrome-like exonuclease at WEX and the Ku heterodimer in Arabidopsis. Nucl Acids Res 33:6861– 6867
8 Chemical- and Irradiation-Induced Mutants and TILLING
177
Li Q, Liu Z, Monroe H, Culiat CT (2002) Integrated platform for detection of DNA sequence variants using capillary array electrophoresis. Electrophoresis 23:1499–1511 Li X, Zhang Y (2002) Reverse genetics by fast neutron mutagenesis in higher plants. Funct Integr Genomics 2:254–258 Li X, Song Y, Century K, Straight S, Ronald P, Dong X, Lassner M, Zhang Y (2001) A fast neutron deletion mutagenesis-based reverse genetics system for plants. Plant J 27:235–242 Liu B, Zhang S, Zhu X, Yang Q, Wu S, Mei M, Mauleon R, Leach J, Mew T, Leung H (2004) Candidate defense genes as predictors of quantitative blast resistance in rice. Mol Plant-Microbe Interact 17:1146–1152 Liu LX, Spoerke JM, Mulligan EL, Chen J, Reardon B, Westlund B, Sun L, Abel K, Armstrong B, Hardiman G, King J, McCague L, Basson M, Clover R, Johnson CD (1999) High-throughput isolation of Caenorhabditis elegans deletion mutants. Genome Res 9:859–867 McCallum CM, Comai L, Greene EA, Henikoff S (2000a) Targeting induced local lesions in genomes (TILLING) for plant functional genomics. Plant Physiol 123:439–442 McCallum CM, Comai L, Greene EA, Henikoff S (2000b) Targeted screening for induced mutations. Nat Biotechnol 18:455–457 Monna L, Ohta R, Masuda H, Koike A, Minobe Y (2006) Genome-wide searching of single-nucleotide polymorphisms among eight distantly and closely related rice cultivars (Oryza sativa L.) and a wild accession (Oryza rufipogon Griff.). DNA Res 13:43–51 Morris ER, Chevalier D, Walker JC (2006) DAWDLE, a Forkhead-associated domain gene, regulates multiple aspects of plant development. Plant Physiol 141:932–941 Nadeau JH, Frankel WN (2000) The roads from phenotypic variation to gene discovery: mutagenesis versus QTLs. Nat Genet 25:381–384 Nambara E, Keith K, McCourt P, Naito S (1994) Isolation of an internal deletion mutant of the Arabidopsis thaliana ABI3 gene. 
Plant Cell Physiol 35:509–513 Nikjoo H, Uehara S, Wilson WE, Hoshi M, Goodhead DT (1998) Track structure in radiation biology: theory and applications. Int J Radiat Biol 73:355–364 Olsen OA, Green MM (1982) The mutagenic effects of diepoxybutane in wildtype and mutagen-sensitive mutants of Drosophila melanogaster. Mutat Res 92:107–115 Olsen O, Wang X, von Wettstein D (1993) Sodium azide mutagenesis: Preferential generation of A·T G·C transitions in the barley Ant18 Gene. Proc Natl Acad Sci USA, 90:8043–8047. Oppenheimer LW, Farine D, Ritchie JW, Lewinsky RM, Telford J, Fairbanks LA (1991) What is a low-lying placenta? Am J Obstet Gynecol 165:1036–1038 Perry J, Wang T, Welham T, Gardner S, Pike J, Yoshida S, Parniske M (2003) A TILLING reverse genetics tool and a web-accessible collection of mutants of the legume Lotus japonicus. Plant Physiol 131:866–871 Peters JL, Cnudde F, Gerats T (2003) Forward genetics and map-based cloning approaches. Trends Plant Sci 8:484–491
178
Ramesh S. Bhat et al.
Raghavan C, Naredo E, Wang H, Atienza G, Liu B, Qiu F, McNally K, Leung H (2007) Rapid method for detecting SNPs on agarose gels and its application in candidate gene mapping. Molecular Breeding 19:87–101 Ramalingam J, Vera Cruz C, Kukreja K, Chittoor J, Wu J, Lee S, Baraoidan M, George M, Cohen M, Hulbert S, Leach J, Leung H (2003) Candidate resistance genes from rice, barley, and maize and their association with qualitative and quantitative resistance in rice. Mol Plant-Microbe Interact 16:14–24 Reardon JT, Liljestrand-Golden CA, Dusenbery RL, Smith PD (1987) Molecular analysis of diepoxybutane-induced mutations at the rosy locus of Drosophila melanogaster. Genetics 115:323–331 Rédei GP, Koncz C (1992) Classical mutagenesis. In Arabidopsis. In: Koncz C, Schell J, Chua N-H (eds) Molecular Genetics. World Scientific Publisher, Singapore, pp 16–82 Resnick JS, Wen C-K, Shockey JA, Chang C (2006) REVERSION-TOETHYLENE SENSITIVITY1, a conserved gene that regulates ethylene receptor function in Arabidopsis. Proc Natl Acad Sci USA 103:7917–7922 Sakamoto A, Lan VT, Hase Y, Shikazono N, Matsunaga T, Tanaka A (2003) Disruption of the AtREV3 gene causes hypersensitivity to ultraviolet B light and gamma-rays in Arabidopsis: implication of the presence of a translesion synthesis mechanism in plants. Plant Cell 15:2042–2057 Salmeron JM, Oldroyd GE, Rommens CM, Scofield SR, Kim HS, Lavelle DT, Dahlbeck D, Staskawicz BJ (1996) Tomato Prf is a member of the leucinerich repeat class of plant disease resistance genes and lies embedded within the Pto kinase gene cluster. Cell 86:123–133 Sarma NP, Patnaik A, Jachuck PJ (1979) Azide mutagenesis in rice: Effect of concentration and soaking time on induced chlorophyll mutation frequency Environ Exp Bot 19:117–121 Sega G (1984) A review of the genetic effects of ethyl methanesulfonate. 
Mutat Res 134:113–142 Shen YJ, Jiang H, Jin JP, Zhang ZB, Xi B, He YY, Wang G, Wang C, Qian L, Li X, Yu QB, Liu HJ, Chen DH, Gao JH, Huang H, Shi TL, Yang ZN (2004) Development of genome-wide DNA polymorphism database for map-based cloning of rice genes. Plant Physiol 135:1198–1205 Shikazono N, Yokota Y, Tanaka A, Watanabe H, Tano S (1998) Molecular analysis of carbon ion-induced mutations in Arabidopsis thaliana. Genes Genet Syst 73:173–179 Shikazono N, Tanaka A, Watanabe H, Tano S (2001) Rearrangements of the DNA in carbon ion-induced mutants of Arabidopsis thaliana. Genetics 157:379–387 Shikazono N, Yokota Y, Kitamura S, Suzuki C, Watanabe H, Tano S, Tanaka A (2003) Mutation rate and novel tt mutants of Arabidopsis thaliana induced by carbon ions. Genetics 163:1449–1455 Shikazono N, Suzuki C, Kitamura S, Watanabe H, Tano S, Tanaka A (2005) Analysis of mutations induced by carbon ions in Arabidopsis thaliana. J Exp Bot 56:587–596
8 Chemical- and Irradiation-Induced Mutants and TILLING
179
Shirley BW, Hanley S, Goodman HM (1992) Effects of ionizing radiation on a plant genome: analysis of two Arabidopsis transparent testa mutations. Plant Cell 4:333–347 Shukla PT, Auerbach C (1980) Genetic tests for the detection of chemically induced small deletions in Drosophila chromosomes. Mutat Res 72:231–243 Slade A, Fuerstenberg S, Loeffler D, Steine M, Facciotti D (2005) A reverse genetic, nontransgenic approach to wheat crop improvement by TILLING. Nat Biotechnol 23:75–81 Smith HH (1972) Comparative genetic effects of different physical mutagens in higher plants. In: Induced Mutations and Plant Improvement, International Atomic Energy Agency, Vienna, pp 75–93 Sun T, Goodman HM, Ausubel FM (1992) Cloning the Arabidopsis GA1 Locus by genomic subtraction. Plant Cell 4:119–128 Suzuki T, Eiguchi M, Satoh H, Kumamaru T, Kurata N (2005) A modified TILLING system for rice mutant screening. Rice Genet Newsl 22:89–91 Tanaka A, Tano S, Chantes T, Yokota Y, Shikazono N, Watanabe H (1997) A new Arabidopsis mutant induced by ion beams affects flavonoid synthesis with spotted pigmentation in testa. Genes Genet Syst 72:141–148 The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815 Till B, Colbert T, Tompa R, Enns L, Codomo C, Johnson J, Reynolds S, Henikoff J, Greene E, Steine M, Comai L, Henikoff S (2003a) High-throughput TILLING for functional genomics. Methods Mol Biol 236:205–220 Till B, Reynolds S, Greene E, Codomo C, Enns L, Johnson J, Burtner C, Odden A, Young K, Taylor N, Henikoff J, Comai L, Henikoff S (2003b) Large-scale discovery of induced point mutations with high-throughput TILLING. Genome Res 13:524–530 Till B, Reynolds S, Weil C, Springer N, Burtner C, Young K, Bowers E, Codomo C, Enns L, Odden A, Greene E, Comai L, Henikoff S (2004) Discovery of induced point mutations in maize genes by TILLING. 
BMC Plant Biol 4:12 Vogel E, Natarajan A (1995) DNA damage and repair in somatic and germ cells in vivo. Mutat Res 330:183–208 Wang GL, Wu C, Zeng L, He C, Baraoidan M, de Assis Goes da Silva F, Williams CE, Ronald PC, Leung H (2004) Isolation and characterization of rice mutants compromised in Xa21-mediated resistance to X. oryzae pv. oryzae. Theor Appl Genet 108:379–384 Wang H (2005) Application of EcoTILLING to relate molecular variation in a rice ethylene response factor (ERF3) gene to drought stress response. M.S. Thesis, University of the Philippines, Los Banos, Philippines, p 86 Watson WAF (1966) Further evidence of an essential difference between the genetical effects of mono- and bifunctional alkylating agents. Mutat Res 3:452–455 Wienholds E, Schulte-Merker S, Walderich B, Plasterk RH (2002) Target-selected inactivation of the zebrafish rag1 gene. Science 297:99–102 Wienholds E, van Eeden F, Kosters M, Mudde J, Plasterk R, Cuppen E (2003) Efficient target-selected mutagenesis in zebrafish. Genome Res 13:2700–2707
180
Ramesh S. Bhat et al.
Wilkinson JQ, Crawford NM (1991) Identification of the Arabidopsis CHL3 gene as the nitrate reductase structural gene NIA2. Plant Cell 3:461–471 Winkler S, Schwabedissen A, Backasch D, Bokel C, Seidel C, Bonisch S, Furthauer M, Kuhrs A, Cobreros L, Brand M, Gonzalez-Gaitan M (2005) Target-selected mutant screen by TILLING in Drosophila. Genome Res 15:718–723 Winzeler EA, Castillo-Davis CI, Oshiro G, Liang D, Richards DR, Zhou Y, Hartl DL (2003) Genetic diversity in yeast assessed with whole-genome oligonucleotide arrays. Genetics 163:79–89 Wu JL, Wu C, Lei C, Baraoidan M, Bordeos A, Madamba MR, Ramos-Pamplona M, Mauleon R, Portugal A, Ulat VJ, Bruskiewich R, Wang G, Leach J, Khush G, Leung H (2005) Chemical- and irradiation-induced mutants of indica rice IR64 for forward and reverse genetics. Plant Mol Biol 59:85–97 Yu J, Hu S, Wang J, Wong G, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92 Zimmering S (1983) The mei-9a test for chromosome loss in Drosophila: a review of assays of 21 chemicals for chromosome breakage. Environ Mutagen 5:907–921
9 T-DNA Insertion Mutants as a Resource for Rice Functional Genomics
Emmanuel Guiderdoni¹, Gynheung An², Su-May Yu³, Yue-ie Hsing³ and Changyin Wu⁴
¹CIRAD, AMIS department, UMR PIA 1096, F-34398 Montpellier, France; ²Department of Life Science and National Research Laboratory of Plant Functional Genomics, Pohang University of Science and Technology, Pohang 790-784, Republic of Korea; ³Institute of Molecular Biology and Institute of Plant and Microbial Biology, Academia Sinica, Yienchuyuan Rd., Nankang, Taipei, 11529, Taiwan, Republic of China; ⁴National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
Reviewed by Alain Lecharny and Michel Delseny
9.1 Introduction
9.2 Agrobacterium-Mediated Transformation of Rice
9.3 T-DNA as an Insertional Mutagen
9.4 Rice T-DNA Insertional Mutant Populations
9.4.1 Korea
9.4.2 China
9.4.3 France
9.4.4 Taiwan
9.4.5 Current Collection of T-DNA Insertion Lines and FSTs
9.5 Current Knowledge on T-DNA Integration in Rice
9.6 T-DNA Insertion Specificity in Rice
9.6.1 Preference Among and Along Rice Chromosomes
9.6.2 Preference for Integration into Intergenic versus Genic Regions and Regulatory versus Coding Regions
9.6.3 Preference for Insertion in Expressed Genes
9.6.4 Preference for GC Content and DNA Structure
9.6.5 Preference for Functional Category of Gene
9.6.6 Estimation of the Number of Lines Required to Saturate the Rice Genome
9.7 Gene and Enhancer Trapping with T-DNA in Rice
9.8 Forward Genetics Screens and Gene Isolation Using T-DNA Insertion Lines
9.8.1 Gene Trapping
9.8.2 Activation Tagging
9.9 Reverse Genetics with T-DNA Mutants in Rice
9.10 Conclusion and Prospects
Acknowledgments
References
9.1 Introduction
The latest annotated release of the completed rice (japonica cv. Nipponbare) genome sequence revealed 42,653 genes, excluding transposable element (TE)-related genes (http://rice.tigr.org), the majority of which have no assigned function or no known homologue in Arabidopsis. In this context, high-throughput methods for investigating gene function through inactivation of gene expression in rice are urgently needed. Among the various inactivation methods, insertional mutagenesis using either class I or class II transposable elements or transferred DNA (T-DNA) is one of the most straightforward approaches for assigning a function to a particular sequence and for isolating the gene that causes a particular phenotype. Experience gained in Arabidopsis during the last decade has highlighted T-DNA as the preferred insertional mutagen for generating large libraries of lines. Since the late 1990s, considerable efforts have been undertaken in rice through several independent national initiatives, mainly in Korea, China, France, and Taiwan, to generate T-DNA insertion libraries, characterize T-DNA flanking sequences at insertion points, and gather phenotype and sequence information in Web-accessible databases (Hirochika et al. 2004; An et al. 2005a). This effort has led to the generation of more than 460,000 T-DNA lines and the release of more than 113,000 flanking sequence tags (FSTs) in public databases. It is presumed that the full characterization of all these lines, together with concurrent initiatives using the maize Ac/Ds and En/Spm transposable elements and the tissue culture-stimulated endogenous retrotransposon Tos17, will enable rice geneticists to find at least one insertion in any rice gene and several alleles in most genes. Here we review the progress achieved in generating and characterizing T-DNA insertion libraries in rice and in identifying or validating genes using T-DNA insertions in the rice genome.
9.2 Agrobacterium-Mediated Transformation of Rice
In nature, Agrobacterium tumefaciens, a soil-borne bacterial phytopathogen, inserts a defined DNA segment from its tumor-inducing (Ti) plasmid into the plant genome upon infection at host wound sites, resulting in tumor development (for a review see Gelvin 2003). This transferred DNA fragment (T-DNA) of the Ti plasmid is delimited by two 25-bp border repeats, which are recognized by the VirD1 and VirD2 proteins encoded by virulence (Vir) genes also located on the Ti plasmid. Vir gene expression is induced by phenolic compounds released from a plant wound. The VirD1 and VirD2 proteins produce a single-stranded nick between the third and fourth nucleotides of each border repeat in the bottom strand of the T-DNA (Yanofsky et al. 1986). This nicked strand is then transferred to the plant cell as a single-stranded (ss) DNA molecule covalently linked to VirD2 and coated with VirE proteins. These two proteins act as chaperones that target the T-DNA to the cell nucleus, where it becomes integrated into the host genome (Gheysen et al. 1991; Mayerhofer et al. 1991; Tinland 1996). It has been postulated that the T-DNA is integrated by illegitimate recombination. The 3' end of the ss T-DNA lands at a single-stranded region of genomic DNA via sequence homology-dependent annealing (Tinland 1996). In the currently accepted model, a short double-stranded region is made from the ss T-DNA so that the T-DNA can be ligated to a genomic double-strand break (Salomon and Puchta 1998; Puchta 1999). The Ti plasmids have been modified to serve as vectors for introducing foreign DNA into plant genomes. Such vectors have been successfully used for DNA transfer into a large range of dicotyledonous species.
Monocotyledonous species, particularly cereals, were long thought to be recalcitrant to Agrobacterium transformation because cereal cells do not readily enter a wound-healing process through triggered cell division. This limitation was circumvented by the use of rapidly dividing cereal cells. The first recovery of transgenic japonica rice plants after coculture of immature embryos and embryo-derived callus with Agrobacterium was reported independently in Taiwan (Chan et al. 1993) and Japan (Hiei et al. 1994). The system developed by Hiei et al. (1994), which relies on coculture of fast-growing, highly responsive seed embryo-derived calli and the addition of acetosyringone, appears to be the most reproducible. The use of the so-called supervirulent EHA101 (or EHA105 derivative) Agrobacterium strain, which carries a disarmed version of the pTiBo542 Ti plasmid, together with a superbinary vector containing duplicated Ti plasmid VirB, VirC, and VirG gene sequences in the binary plasmid backbone (pTOK), was recommended for recalcitrant cultivars, whereas nonsupervirulent strains (such as LBA4404) and ordinary binary plasmids (such as pBIN19 vectors and derivatives) were sufficient for transforming amenable cultivars (Hiei et al. 1997). Since then, many laboratories have reported the transformation of temperate and tropical japonica genotypes as well as indica cultivars (Aldemita and Hodges 1996; Dong et al. 1996). With increasing experience, however, it was realized that the combination of a supervirulent strain and a superbinary plasmid was not mandatory, as had initially been thought. These protocols allowed the routine generation of 10 to 50 transgenic plants per 100 cocultured callus pieces. Although these transformation efficiencies proved sufficient for introducing a range of genes of interest into japonica and indica rice cultivars (recently reviewed in Bajaj and Mohanty 2005), the implementation of genome-wide T-DNA insertional mutagenesis required a high-throughput transformation procedure permitting the generation of thousands of transformants of a model cultivar in a single transformation experiment. In Arabidopsis, the gradual improvements in Agrobacterium-mediated transformation methodologies, from the root explant (Valvekens et al. 1988) and seed transformation methods (Feldmann and Marks 1987) to floral/whole plant dip techniques (Bechtold et al. 1993), were indeed instrumental in obtaining an output of T-DNA plants suitable for genome-wide insertional mutagenesis. High-throughput transformation procedures for rice functional genomics, however, were established only recently (Lee et al. 1999; Sallaud et al. 2003; Terada et al. 2004). A highly efficient method of transformation in japonica rice now allows the routine generation of 100 to 500 independent transgenic plants per 100 cocultured callus pieces. The various steps of such high-throughput protocols are outlined in Fig. 9.1 (Sallaud et al. 2003).
The critical parameters influencing transformation efficiency are the precise timing of subculturing, the careful selection of rice callus tissues before coculture, and the conditions of coculture and selection, which together allow the recovery of a large number (10 to 20) of transformed cell lines from 30% to 90% of the cocultured calli. With microprojectile bombardment, several (one to four) resistant cell lines arising from a single immature embryo scutellum or callus piece proved to be mostly clonal in nature, resulting from the fragmentation of a unique transformation event (Chen et al. 1998). With Agrobacterium-mediated transformation, however, the 2 to 30 resistant cell lines arising from a single cocultured callus piece proved to be independent transformation events at a 95% frequency, thereby considerably enhancing the efficiency of the transformation procedure (Sallaud et al. 2003). This may be because, when calli are immersed in the Agrobacterium suspension, a much higher number of peripheral callus cells become accessible to gentle gene delivery than when they are exposed to particle bombardment.
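The efficiencies quoted in this section translate directly into the amount of callus material needed to build a library of a given size. The following Python sketch is illustrative only: the function names and the 10,000-line target are hypothetical, and it assumes that the number of transgenic plants scales linearly with the number of cocultured callus pieces, which is a simplification not stated in the text.

```python
import math

# Efficiencies quoted in the text, in transgenic plants per 100 cocultured
# callus pieces, given as (low, high) bounds.
STANDARD = (10, 50)
HIGH_THROUGHPUT = (100, 500)


def callus_pieces_needed(target_lines, plants_per_100_calli):
    """Callus pieces to coculture to obtain `target_lines` transgenic plants.

    Assumes (hypothetically) that output scales linearly with input.
    """
    if plants_per_100_calli <= 0:
        raise ValueError("efficiency must be positive")
    return math.ceil(target_lines * 100 / plants_per_100_calli)


def pieces_range(target_lines, efficiency_range):
    """Return (best case, worst case) callus requirements for a protocol."""
    low_eff, high_eff = efficiency_range
    # The highest efficiency gives the smallest requirement, and vice versa.
    return (
        callus_pieces_needed(target_lines, high_eff),
        callus_pieces_needed(target_lines, low_eff),
    )


if __name__ == "__main__":
    target = 10_000  # hypothetical library size
    print("standard protocol:", pieces_range(target, STANDARD))
    print("high-throughput protocol:", pieces_range(target, HIGH_THROUGHPUT))
```

Under these assumptions the high-throughput protocol reduces the callus requirement tenfold at both ends of the quoted range, which is what makes a single experiment yielding thousands of transformants practical.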
Fig. 9.1. Agrobacterium-mediated transformation procedure. Globular somatic embryos (B, arrows) released from primary, seed embryo scutellum-derived callus (A) are transferred to fresh medium to reach an optimal size (C) and are immersed into liquid coculture medium (R2CL) containing EHA105 or LBA4404 cells at an OD600 of 1, for 15 min, then blot dried and transferred to Petri dishes containing solid, coculture medium (R2CS), for a 3-day incubation period in the dark at 25°C. The procedure for selecting hygromycin-resistant cell lines (D, here visualized through GFP activity at the surface of a cocultured callus, 14 days after transfer to R2S selective medium) includes subcultures on selective growth and maturation media NBS and PR-AG (E and F, 28 and 35 days after the transfer of cocultured callus to the first R2S selective medium), then to the RN regeneration medium under light. Young plantlets are allowed to develop for a further 3 weeks in rooting medium in test tubes and are then transferred to the greenhouse to set seeds. (Reproduced from Sallaud et al. 2003) (See also color plate section).
9.3 T-DNA as an Insertional Mutagen
T-DNA insertional mutagenesis takes advantage of the mostly random integration of the T-DNA of Agrobacterium tumefaciens to create dispersed molecular tags throughout a genome. When a tag is inserted within a gene, it can create a stably inherited mutation with a visible phenotype. The T-DNA may in addition be equipped with activator sequences or a promoterless reporter gene (see later) to increase the chances of detecting a phenotype through enhanced gene expression or reporter gene-mediated detection of an endogenous tagged gene. Reporter gene expression generally follows the original pattern of expression of the tagged gene. In Arabidopsis, large collections of several hundred thousand insertion lines are now available to provide the systematic disruption of any gene (see the TAIR portal http://www.arabidopsis.org/index.jsp). While several initiatives have demonstrated the potential of the maize Ac/Ds and En/Spm transposon systems (Parinov et al. 1999; Speulman et al. 1999; Tissier et al. 1999; Marsch-Martinez et al. 2002; Raina et al. 2002), T-DNA was the preferred insertional mutagen for this model dicot species because of the availability of an efficient transformation procedure (Feldmann 1991; Koncz et al. 1992; Bechtold et al. 1993; Krysan et al. 1999; Sessions et al. 2002; Szabados et al. 2002; Alonso et al. 2003). The tagged lines have been extensively used for forward genetics screens under standard or specific growth conditions, as well as for reverse genetics searches for insertions in particular candidate sequences. Individual plants or lines that carry a mutation in a known sequence of interest can be identified either by polymerase chain reaction (PCR) screening of DNA pools, using one primer specific to the candidate sequence and another primer specific to the mutagen (McKinney et al. 1995; Krysan et al. 1999; Young et al. 2001; Rios et al. 2002), or by computational searches in FST databases. Despite the large initial effort required, FST databases are the preferred tools for straightforward identification of mutations in genes of interest. Searching for mutants in a particular gene of interest is performed in silico through a Web database interface (Samson et al. 2002, http://urgv.evry.inra.fr/FLAGdb/; Sessions et al. 2002, http://www.nadii.com/pages/collabortions/garlicfiles/GarlicDescription.html; Alonso et al. 2003, http://signal.salk.edu/cgi-bin/tdnaexpress; Rosso et al. 2003, http://www.mpiz-koeln.mpg.de/GABI-Kat/). Further, linking FST databases to phenotype databases would greatly accelerate the assignment of function to candidate genes.
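The in silico search described above amounts to asking which mapped insertion points fall within, or close to, the genomic interval of a gene of interest. A minimal Python sketch of that lookup is given below. It is not the interface or schema of any of the databases cited; the record layout, the line identifiers, the coordinates, and the `flank` parameter are all hypothetical and serve only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FST:
    """One flanking sequence tag mapped to the genome (hypothetical layout)."""
    line_id: str     # identifier of the insertion line (made up here)
    chromosome: int  # chromosome carrying the insertion
    position: int    # 1-based coordinate of the insertion point


def lines_hitting_gene(fsts, chromosome, start, end, flank=0):
    """Return sorted IDs of lines with an insertion in [start-flank, end+flank].

    `flank` optionally widens the gene interval, for example to include
    putative regulatory regions next to the coding sequence.
    """
    lo, hi = start - flank, end + flank
    return sorted(
        {f.line_id for f in fsts
         if f.chromosome == chromosome and lo <= f.position <= hi}
    )


if __name__ == "__main__":
    # Entirely invented records, for illustration only.
    records = [
        FST("LINE_A", 1, 1_250_400),
        FST("LINE_B", 1, 1_251_900),
        FST("LINE_C", 3, 1_250_400),
        FST("LINE_D", 1, 1_249_700),
    ]
    # Hypothetical gene on chromosome 1 spanning 1,250,000 to 1,252,000.
    print(lines_hitting_gene(records, 1, 1_250_000, 1_252_000))
    print(lines_hitting_gene(records, 1, 1_250_000, 1_252_000, flank=500))
```

In this toy data set, widening the interval by 500 bp picks up one additional line whose insertion lies just upstream of the gene; whether such an insertion is informative depends on the gene and is an experimental question.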
9 T-DNA Insertion Mutants as a Resource
Table 9.1. Details of T-DNA tagged japonica rice populations currently being produced worldwide

Institution: Postech (a)
  Cultivars used: Dongjin, Hwayoung
  Copy (locus) number: 2.0 (1.4)
  Comments: Original Tos17 copies = 2.0; average number of new Tos17 copies in the mutant population = 4.0
  Reference: Lee et al. 1999

Institutions: BRI-CAAS (b), SIPPE (c), HZAU (d)
  Cultivars used: Nipponbare; Zhonghua 11; Zhonghua 15
  Copy (locus) number: 2.1; 2.0; 2.0
  Comments: Number of loci and number of Tos17 copies not yet determined
  Reference: Yang et al. 2004; Wu et al. 2003

Institution: Cirad (e)
  Cultivars used: Nipponbare (Zhong zuo 321, Azucena, Kasalath, TN1, TY1, Bala)
  Copy (locus) number: 2.2-2.8 (1.5)
  Comments: Single co-cultivated callus piece yielded multiple independent T-DNA tagged lines; average number of complete T-DNA copies is 2.2 based on hybridization signals revealed by both gusA (uidA) and hph probes, and 2.8 when incomplete T-DNA copies are taken into account; original Tos17 copies (Nipponbare) = 2.0; average number of new Tos17 copies in the mutant population = 3.0
  Reference: Sallaud et al. 2003, 2004

Institution: Academia Sinica (f)
  Cultivars used: Tainung 67
  Copy (locus) number: ND (1.73)
  Comments: 57.9% and 23.0% of transgenic lines contain 1 and 2 integration loci, respectively; original Tos17 copies = 3.0; average copy number of new Tos17 in the mutant population = 0.1 (i.e., 91.3% of transgenic lines contain 3 original Tos17)
  Reference: Hsing et al. 2006

(a) Plant Functional Genomics Lab., Dept. of Life Science, POSTECH, San 31 Hyoja-dong, Nam-gu, Pohang, Kyoungbuk, Korea (http://141.223.132.44/pfg/index.php).
(b) Biotechnology Research Institute, The Chinese Academy of Agricultural Sciences, Beijing 100081, People’s Republic of China.
(c) Shanghai Institute of Plant Physiology and Ecology (SIPPE), Shanghai T-DNA Insertion Population (SHIP), http://ship.plantsignal.cn/index.do.
(d) National Center of Plant Gene Research, Huazhong Agricultural University (HZAU), Wuhan, 430070, P. R. China, http://rmd.ncpgr.cn.
(e) Genoplante-Oryza Tag Line (OTL), France (http://urgi.versailles.inra.fr/OryzaTagLine).
(f) Academia Sinica, Institute of Molecular Biology and Institute of Plant and Microbial Biology, Taiwan, Republic of China (http://trim.sinica.edu.tw/).
The prerequisites that need to be carefully considered before embarking on a T-DNA insertional mutagenesis project are (1) the transformation procedure has to be highly efficient, routine, and reliable; (2) excellent
Emmanuel Guiderdoni et al.
growth conditions in containment greenhouses have to be established, since obtaining a sufficient amount of seed from primary transformants is often a limitation; (3) a barcode system to track plant and seed materials, DNA samples, PCR products, and information collected under greenhouse and field conditions should be implemented to reduce handling errors; (4) a high-throughput method for isolating genomic regions flanking T-DNA inserts has to be set up; (5) quality checks and evaluations (sequence to seed stock) need to be performed on a regular basis, notably through feedback from library users; and (6) a relational database, integrating sequence and phenotype information in both textual and searchable formats, has to be set up and made accessible to collaborators via the Internet to enrich and update library information. Expertise gained in Arabidopsis insertional mutagenesis, improvements in rice transformation procedures (see earlier and Table 9.1) and in automated PCR methods, as well as major, medium-term commitments of national authorities to fund such initiatives in rice, have allowed the launching of large insertional mutagenesis projects in Korea, China, France, and Taiwan (Table 9.2). Limited efforts have also been made in several other countries. As these insertion libraries have been developed in four different japonica subgroup cultivars, appropriate controls should be used, and the difference in genetic background has to be kept in mind when pyramiding several mutations in a single line by crossing. One can also anticipate some sequence variations in these cultivars compared to the Nipponbare genome sequence, the extent of which remains to be evaluated from FST information.
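Prerequisites (3) and (6) can be made concrete with a small relational schema in which the barcode of a line is the key that ties together flanking sequence and phenotype records. The sketch below uses Python's built-in sqlite3 module with an in-memory database. All table names, column names, barcodes, and records are hypothetical and are not taken from any of the libraries described in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE line (
    barcode   TEXT PRIMARY KEY,   -- hypothetical barcode printed on seed bags and tubes
    cultivar  TEXT NOT NULL,
    construct TEXT NOT NULL
);
CREATE TABLE fst (
    fst_id     INTEGER PRIMARY KEY,
    barcode    TEXT NOT NULL REFERENCES line(barcode),
    chromosome INTEGER NOT NULL,
    position   INTEGER NOT NULL
);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    barcode      TEXT NOT NULL REFERENCES line(barcode),
    trait        TEXT NOT NULL,
    observation  TEXT NOT NULL
);
""")

# Hypothetical demonstration records
conn.execute("INSERT INTO line VALUES ('DEMO0001', 'Nipponbare', 'demo_construct')")
conn.execute("INSERT INTO line VALUES ('DEMO0002', 'Nipponbare', 'demo_construct')")
conn.execute("INSERT INTO fst (barcode, chromosome, position) VALUES ('DEMO0001', 1, 120500)")
conn.execute("INSERT INTO fst (barcode, chromosome, position) VALUES ('DEMO0002', 3, 98000)")
conn.execute("INSERT INTO phenotype (barcode, trait, observation) "
             "VALUES ('DEMO0001', 'plant height', 'dwarf')")


def lines_in_region(conn, chromosome, start, end):
    """Return lines with an insertion in a region, joined to any phenotype records."""
    query = """
        SELECT l.barcode, f.position, p.trait, p.observation
        FROM fst AS f
        JOIN line AS l ON l.barcode = f.barcode
        LEFT JOIN phenotype AS p ON p.barcode = l.barcode
        WHERE f.chromosome = ? AND f.position BETWEEN ? AND ?
        ORDER BY l.barcode
    """
    return conn.execute(query, (chromosome, start, end)).fetchall()


rows = lines_in_region(conn, 1, 120000, 121000)
print(rows)
```

The LEFT JOIN keeps lines that have a mapped insertion but no phenotype record yet, which is the usual situation while a library is still being evaluated.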
9.4 Rice T-DNA Insertional Mutant Populations

9.4.1 Korea

In a project partly supported by the Novartis company, the POSTECH group initially produced 22,090 primary transgenic lines (18,358 fertile lines) of japonica cv. Dongjin with a T-DNA construct bearing a gusA gene trap (pGA2144, Fig. 9.2a; Jeon et al. 2000). This was the first report of a large-scale generation of a T-DNA insertion library in rice containing an estimated 25,700 tags. Supported by the 21st Century Frontier Program and the Biogreen 21 program of the Rural Development Administration, the same group later generated 14,674 T-DNA insertion lines with tagging vector pGA2707 (Jeong et al. 2002) and 20,810 insertion lines with a bidirectional gene trap vector containing gusA and gfp reporter genes at the right and left border, respectively (pGA2717, Fig. 9.2a; Ryu et al. 2004). The POSTECH team also implemented an activation tagging strategy using multimerized enhancer elements of the CaMV 35S promoter to
a
Tainung 67
Taiwan Rice Insertional Mutagenesis (TRIM) Library Taiwan http://trim.sinica.edu.tw/
see Figures 9.2 and 9.3 for details.
Nipponbare
GénoplanteOryza Tag Line (OTL) France http://urgi.versailles. inra. fr/OryzaTagLine
gusA promoter trap + activation tag (pTag8)
40,000
KpnI, XhoI, PstI, SpeI
KpnI, XhoI, PstI, SpeI
gusA promoter trap (pTag4)
10,000
pPZP200
EcoRI
Gal4 enhancer trap (p4956ET15)
9,000
hph, gusA
hph, gusA
hph, Gal4
Johnson et al. 2005 Hsing et al. 2006
LB: CGCTCATGTGTTGAGCATAT RB: TCGCCTTGCAGCACATCC RB: AACTCATGGCGATCTCTTACC
A. Betzner and W. Tucker, Unpublished LB: CGCTCATGTGTTGAGCATAT RB: TCACGGGTTGGGGTTTCTACAGGAC hph, bar, gusA
XbaI
Sallaud et al. 2004
Wu et al. 2003; Zhang et al. 2006
LB: CGCTCATGTGTTGAGCATAT RB: TCACGGGTTGGGGTTTCTACAGGAC
LB:TCGCTCATGTGTTGAGCATA RB:TGCAGGTTCTCTCCAAATGA
hph, gusA
hph, Gal4
SstI
HindIII
gusA enhancer trap + Ds (p4984)
pCAMBIA1300
pCAMBIA1300
3,000
gusA enhancer trap (p4978)
Gal4:gfp enhancer trap (pEGFP)
90,185
20,000
Gal4:gusA:gfp enhancer trap (pSMRJ18R)
Gal4:gusA enhancer trap (pFX-E 2.42-15R)
31,318
3,794
Zhonghua 11 Zhonghua 15
LB: TAGCTAGAGTCGAGAATTCAGT RB: AACGCTGATCAATTCCACAG
hph, gusA ApaI, ClaI
gusA promoter trap + plasmid rescue activation tag (pGA2772)
11,473
National Program of Rice Functional Genomics Rice Mutant Database (RMD) China http://rmd.ncpgr.cn/
Jeong et al. 2002
LB: TTGGGGATCCTCTAGAGTCGAG RB: AACGCTGATCAATTCCACAG
hph, gusA
ClaI, PstI, XhoI
gusA promoter trap + activation tag (pGA2715)
36,469
Jeong et al. 2006
Ryu et al. 2004
LB: ACCTCGTCGAGAATTCAGTAC RB: AACGCTGATCAATTCCACAG
hph, gusA
ApaI, XhoI
gusA/gfp bidirectional promoter trap (pGA2717)
12,169
Jeon et al. 2000
LB: ACAAGCCGTAAGTGCAAGTG RB AACGCTGATCAATTCCACAG
hph, gusA
ClaI, PstI
pGA1611
gusA promoter trap (pGA2707)
20,810
Dongjin Hwayoung
POSTECH Rice Insertion Sequence Database (RISD) Korea http://141.223.132.44/pfg/ index.php
Reference
PCR primers specific to the left and right border of the T-DNA that can be used for detection of the insert in combination with a gene specific primer (5' to 3')
No. of lines
Cultivar
Library (web site)
Suggested probes for hybridization
Single cut restriction enzymes for Southern
Backbone
Constructsa
Table 9.2. Summary of the constructs used for generating the largest T-DNA insertion line libraries developed worldwide
produce 13,450 activation tagged (AT) pGA2715 lines (Fig. 9.2a; Jeong et al. 2002). This AT line library was further expanded to 47,932 lines with the generation of 23,009 pGA2715 lines and 11,473 pGA2772 lines in the japonica cv. Dongjin and Hwayoung (Jeong et al. 2006). So far, the POSTECH group is the only one to have successfully implemented a PCR-based strategy for screening pooled T-DNA mutant DNA samples for insertions in candidate sequences. A systematic survey with MADS-box gene sequences (Lee et al. 2003) exemplifies the success of this strategy. The group has sequenced a large number of T-DNA flanking regions isolated by inverse-PCR (iPCR) and has so far collected 79,810 FSTs, which are available to the public through the Rice Insertion Sequence Database (RISD) at http://www.postech.ac.kr/life/pfg/risd. The group plans to increase the collection to at least 100,000 FSTs by the end of 2006.

9.4.2 China

Over the past 5 years China’s Ministry of Science and Technology has funded the China Rice Functional Genomics Program (CRFGP) to develop tools and resources for functional genomics and characterization of important genes for rice molecular breeding (Wang et al. 2005). This program will be funded until 2010 to dissect further the function of rice genes of agronomic importance. Initial efforts to produce a T-DNA insertion population and to characterize insertion sites were made at the Beijing Institute of Microbiology (Sha et al. 2004) and Zhejiang University (Chen et al. 2003). The latter group positioned more than 1,000 FSTs in the rice genome. In total, the Chinese research groups have generated more than 290,000 T-DNA insertion lines, which include 100,000 of cv. Nipponbare from Beijing (Peng et al. 2005), 65,000 of cv. Zhonghua 11 from Shanghai (http://ship.plantsignal.cn/home.do), and 129,000 of cv. Zhonghua 11 and Zhonghua 15 from Wuhan (Wu et al. 2003).
Gal4:VP16/UAS-gusA:gfp (vector pSMR-J18R), Gal4:VP16/UAS-gusA (vector pFX-E24.2-15R), and Gal4:VP16/UAS-gfp (vector pEGFP) T-DNA enhancer trapping systems (Fig. 9.2b) have been used to generate the respective libraries. Of these lines, 24,500 have been phenotyped for important agronomic traits such as plant height, tiller number, panicle morphology, fertility, and abiotic stress tolerance. The Wuhan group of a joint national program, under the National Special Key Program on Rice
Fig. 9.2. Structural elements of the T-DNAs used to generate (A) POSTECH Rice T-DNA insertion sequence database (RISD) and (B) HZAU rice mutant database (RMD) libraries of insertion lines. Left (LB) and right (RB) borders of the T-DNA; native (gusA) or modified (BoGUS) E. coli β-glucuronidase reporter gene; castor bean catalase intron (i); standard (gfp) or enhanced (sgfp and egfp) versions of the Aequorea victoria green fluorescent protein reporter gene; minimal promoter (–90 or –48 bp, MP); tetramerized enhancer elements (4×SE: –417 to –86 bp 35SE), promoter sequence (35S P), and terminator region (35S T) of the CaMV 35S; multimerized yeast 17 bp upstream activation sequences (UAS); composite transcription activator containing the yeast GAL4 binding domain fused to a modified VP16 activation domain (Gal4:VP16); promoter (OsTub1 P); coding sequence (OsTub1); second (I2) and third (I3) intron and terminator sequence (Tt) of the rice α-tubulin A1 gene; hygromycin phosphotransferase (hph) selectable gene; Agrobacterium nopaline synthase terminator (nos T); pTiA6 seventh gene terminator (T7). Arrows show the transcription direction
Functional Genomics of China has developed a rice mutant database (RMD) that comprises 13,804 FSTs (Zhang et al. 2006a, 2006b). This database is now available online (http://rmd.ncpgr.cn) and contains detailed information on approximately 129,000 T-DNA insertion lines generated with the enhancer trap system (Wu et al. 2003). Another database containing 6,000 FSTs has been established by the Shanghai group (http://www.plantsignal.cn/ship/index.htm).

9.4.3 France

French public institutions CIRAD, INRA, IRD, and CNRS, supported by the French Ministries of Research and Agriculture as well as by private companies (which include Bayer Crop Science, Biogemma, and Bioplante) under the framework of the national plant genomics initiative Génoplante, generated a library of 40,000 T-DNA insertion lines in cv. Nipponbare between 1999 and 2003 (Sallaud et al. 2004; http://www.genoplante.com/). Three different T-DNA constructs (p4978, p4984, and p4956ET15, Fig. 9.3a) have been used. A library of Gal4:VP16/UAS-gfp (vector p4956ET15) enhancer trap lines was established in collaboration with the Stress Physiology Laboratory at the University of Cambridge, UK (Johnson et al. 2005). Seed multiplication and phenotypic evaluation of the library have been carried out since 2002 in Colombia under field conditions, in collaboration with the International Centre for Tropical Agriculture (CIAT). More than 15,000 lines have already been evaluated. FST rescue and sequencing of these flanking regions (>25,000) are expected to be completed by the end of 2006. Sequencing of 20,000 Tos17 insertions in the same T-DNA library is also underway in collaboration with the French national sequencing center, Genoscope (http://www.genoscope.cns.fr/). The seeds of the Génoplante Oryza Tag Line (OTL) entries and related sequence and phenotype information are available at http://urgi.versailles.inra.fr/OryzaTagLine.
For the initial 6 months from the release date, seeds and related flanking sequence information are available to Génoplante partners only. After this period the information is made publicly available to other researchers. The FST information is integrated in a modified FlagDB++ module (originally developed for Arabidopsis thaliana insertion lines) which, following a request to locate the genomic position of a query sequence using the BLAST program, displays a graphical view of the annotated genome sequence and associated FSTs (http://urgv.evry.inra.fr/FLAGdb). Public sequence information is also displayed under http://orygenesdb.cirad.fr/ (Droc et al. 2006; see Chapter 14 of this book). Meanwhile, the public
Fig. 9.3. Structural elements of the T-DNAs used to generate the GENOPLANTE OTL (A) and Academia Sinica TRIM (B) libraries of insertion lines. Left (LB) and right (RB) borders of the T-DNA; left (LJ) and right (RJ) junctions of the Ac transposable element of maize; modified first intron of the Amy7/RAmy1A gene containing three putative splicing donor and acceptor sites (Do/Ac); E. coli β-glucuronidase reporter gene (gusA); composite transcription activator containing the yeast GAL4 binding domain fused to a modified VP16 activation domain (Gal4/VP16); multimerized yeast 17 bp upstream activation sequences (UAS); enhanced version of the Aequorea victoria green fluorescent protein reporter gene (egfp); minimal promoter (–90 or –48 bp: MP); octamerized enhancer elements (–417 to –86 bp 35SE), promoter sequence (35S P), and terminator region (35S T) of the CaMV 35S; Subterranean clover mosaic virus promoter (ScMV P) fused to the rice Actin 1 first intron (Ai); hygromycin phosphotransferase (hph) selectable gene with or without a castor bean catalase intron (i); Agrobacterium nopaline synthase terminator (nos T). Arrows point to the transcription direction
institutions participating in the Génoplante consortium (CIRAD and IRD), which have a specific mandate to collaborate with less developed countries, are using the library for their own international collaborative projects with national and international agricultural research centers, under the umbrella of the Generation Challenge Program.

9.4.4 Taiwan

The Taiwan Program Project on Genomics and Proteomics, Academia Sinica, and the National Science and Technology Program for Agricultural Biotechnology have been funding a network of laboratories in Taiwan since 2002. The purpose of the funding is to generate a genome-wide gene knockout mutant library by T-DNA insertion (Institute of Molecular Biology, Academia Sinica); to analyze flanking sequences (Institute of Plant and Microbial Biology, Academia Sinica); to collect, preserve, and distribute seeds (National Plant Genetic Resources Center); and to characterize phenotypes (Taiwan Agricultural Research Institute). Using the promoter trap and activation tag constructs pTag4 and pTag8 (Fig. 9.3b), the Taiwan effort has generated 10,000 and 40,000 lines, respectively, in cv. Tainung 67. A total of 11,992 FSTs isolated from the T-DNA right border have been produced (Hsing et al. 2006). A database of these insertion mutant populations has been established and is available online (http://trim.sinica.edu.tw/).

9.4.5 Current Collection of T-DNA Insertion Lines and FSTs

Overall, more than 460,000 T-DNA insertion lines have now been produced in rice, and from these lines 113,000 FSTs have been released to public databases. The number of T-DNA FSTs deposited in public databases is expected to grow to 130,000 by the end of 2006. The main question that arises at this point is to what extent this effort should be pursued. In other words, how many characterized insertion sites are needed to reach genome saturation, such that there is a knockout in each rice gene?
A prerequisite for estimating these values in a given plant system is to determine the average number of insertion loci of the mutagen and to characterize the mutagen’s insertional preference for particular regions of the genome and for genes.
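As a rough first approximation, the saturation question can be explored with the simplest random-insertion model: if insertions land uniformly at random in a genome of G bp, the probability that a gene of x bp carries at least one of n independent insertions is P = 1 − (1 − x/G)^n. The Python sketch below evaluates this expression and inverts it. The gene length of 3 kb is a hypothetical value chosen for illustration, and the genome size of 371.2 Mb is the total of the chromosome sizes listed in Table 9.5. The model ignores the insertion biases discussed in Section 9.6 as well as multiple inserts per line, so its output should be read as an order of magnitude only.

```python
import math


def hit_probability(n_insertions, gene_bp, genome_bp):
    """P(at least one insertion in the gene), assuming uniform random insertion."""
    return 1.0 - (1.0 - gene_bp / genome_bp) ** n_insertions


def insertions_needed(p, gene_bp, genome_bp):
    """Smallest n for which hit_probability(n, ...) reaches p under the same model."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - gene_bp / genome_bp))


GENOME_BP = 371_200_000  # total of the chromosome sizes in Table 9.5
GENE_BP = 3_000          # hypothetical gene length, for illustration only

n95 = insertions_needed(0.95, GENE_BP, GENOME_BP)
print(f"insertions needed for a 95% chance of hitting a {GENE_BP} bp gene: {n95}")
print(f"check: P = {hit_probability(n95, GENE_BP, GENOME_BP):.4f}")
```

Because shorter genes present a smaller target, the number of insertions needed rises sharply as gene length decreases, which is one reason why the preference of the mutagen for genic regions matters so much for the estimate.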
9.5 Current Knowledge on T-DNA Integration in Rice

As in dicotyledonous plants, T-DNA was found to integrate into the rice genome at either one locus or at several independent loci (Hiei et al. 1994; Jeon et al. 2000; Yin and Wang 2000; Sallaud et al. 2003; Wu et al. 2003; Afolabi et al. 2004; Eamens et al. 2004; Sha et al. 2004; Yang et al. 2004). Multiple intact or truncated T-DNA copies are also frequently found at a single locus as either direct or inverted repeats. Precise determination of the actual copy and locus numbers requires DNA blot hybridization of DNA digested with single-cut enzymes, using several probes specific to the T-DNA (e.g., probes extending to both the left and right borders), and monitoring the segregation of hybridizing signals in progenies of primary transformants. Resistance assays based on expression of the selectable gene harbored by the T-DNA are not always reliable, because a truncated T-DNA devoid of a functional selectable marker gene can be integrated along with an intact T-DNA copy carrying a functional selectable marker gene. Shortened T-DNA insertions may be generated by breakage at some stage during the transfer or integration process, most probably after the synthesis of a normal T-DNA intermediate. Transgene silencing in progeny plants might also lead to an underestimation of the number of loci (Vain et al. 2003). Evaluating the precise T-DNA organization in a set of 43 Dongjin plants transformed with the vector pGA2144 (see Fig. 9.2a), Kim et al. (2003) determined that, in addition to 35% single T-DNA plants, 33% of plants harboured direct T-DNA repeats, while 26% and 9% had inverted repeats at the 5΄ and 3΄ end junctions, respectively. Average copy and locus numbers determined in the main T-DNA insertion libraries of rice are shown in Table 9.1. On average, 1.5 to 2.5 copies of the T-DNA are integrated, residing at an average of 1.4 to 2 loci. In these populations approximately 30% to 45% of the lines have a single copy of the T-DNA insert.
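Segregation data of the kind mentioned above are commonly compared with Mendelian expectations using a chi-square goodness-of-fit test: a single dominant locus is expected to segregate 3:1 in the selfed progeny of a primary transformant, and two unlinked loci 15:1. The Python sketch below shows the calculation. The progeny counts are hypothetical, and, as noted in the text, resistance assays can underestimate the number of loci because of truncated copies or silencing, so such a test is an indication only and not a substitute for DNA blot analysis.

```python
def chi_square(resistant, sensitive, ratio=(3, 1)):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = resistant + sensitive
    parts = ratio[0] + ratio[1]
    expected = (total * ratio[0] / parts, total * ratio[1] / parts)
    observed = (resistant, sensitive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution at the 5% level, 1 degree of freedom
CRITICAL_5PCT_1DF = 3.841

# Hypothetical progeny counts: 78 resistant and 22 sensitive seedlings
one_locus = chi_square(78, 22, ratio=(3, 1))    # single-locus hypothesis
two_loci = chi_square(78, 22, ratio=(15, 1))    # two unlinked loci hypothesis

for label, value in (("3:1", one_locus), ("15:1", two_loci)):
    verdict = "not rejected" if value < CRITICAL_5PCT_1DF else "rejected"
    print(f"{label}: chi-square = {value:.2f} ({verdict} at the 5% level)")
```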
As in dicotyledonous species, the T-DNA boundary is not always clearly defined in transgenic rice plants. Vector backbone (VB) sequences that reside outside the LB or the RB have been detected at frequencies ranging from 33% to 45%, irrespective of the bacterial strain used for transformation (Yin and Wang 2000; Kim et al. 2003; Sallaud et al. 2003). There is an established correlation between the number of T-DNA copies integrated and the presence of VB sequences (Mieulet et al. unpublished). Such high frequencies have also been observed in dicotyledonous plants (15% to 70%), suggesting that non-T-DNA sequences are transferred to the plant genome either independently of the T-DNA or linked to the T-DNA across either the LB or the RB (Kononov et al. 1997). One of the possible reasons for long T-DNA transfer may be inefficient nicking or insufficient
VIR protein for the binary system. As a consequence, long T-DNA transfers, sometimes greater than the unit length of the binary plasmid, have been detected in transgenic rice plants (Yin and Wang 2000; Kim et al. 2003; Sha et al. 2004). Using PCR amplification, Kim et al. (2003) determined that 55 of the 77 lines analyzed exhibited LB read-throughs, which were further sequenced. The T-DNA was mainly linked to the VB through the intact border sequence. In 67% of the LB read-through integration loci, the DNA transferred included the entire VB followed by an RB read-through, resulting in integration of a T-DNA/VB/T-DNA transgene. It has also been confirmed that the LB can serve as the start point of VB transfer, which continues through an RB read-through and terminates when the LB is once again encountered, resulting in the integration of a VB/T-DNA fragment. Recent large-scale sequencing of flanking regions of T-DNA inserts has confirmed the rather high frequency of tandem insertions and vector backbone in rice transformants (Table 9.3). Short regions of microhomology, insertions of filler DNA, and small deletions have been reported as features of T-DNA:plant DNA junctions in dicots (Gheysen et al. 1991; Mayerhofer et al. 1991; Tinland 1996; Kumar and Fladung 2002). In different sets of rice T-DNA insertion lines, 27.6% (Sha et al. 2004), 41.3% (Ryu et al. 2004), and 43.0% (Eamens et al. 2004; Zhu et al. 2006) of the integrated T-DNAs were nicked precisely at the same site found in dicotyledonous species (after the first or second base of the right border), and the remaining insertions resulted from a nick between 30 bp and 4 bp before, and after, the RB repeat, respectively (Kim et al. 2003; Eamens et al. 2004; Ryu et al. 2004; Sha et al. 2004). In contrast, the cleavage site at the LB was rarely conserved, and the deletions are generally longer than at the RB and may reach several hundred bp (Kim et al. 2003; Eamens et al. 2004; Ryu et al. 2004; Zhu et al. 2006). Three major types of junction were observed: (1) an overlap of one to several nucleotides between T-DNA and genomic DNA, that is, homology between the T-DNA end and the rice genomic DNA; (2) frequent occurrence (35% to 50%) of filler DNA; and (3) a less frequent direct link without any overlap or filler DNA (Kim et al. 2003; Eamens et al. 2004; Ryu et al. 2004; Sha et al. 2004; Peng et al. 2005). Ryu et al. (2004) determined that the filler DNA, which was observed in 35% of the junctions examined, was shorter than 30 bp in 80% of the filler sequences and consisted of plant genomic DNA, VB DNA, or DNA of unknown origin.
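Table 9.3. Summary of the efficiencies of high-throughput procedures of isolation and sequencing of genomic regions flanking T-DNA inserts in rice insertion libraries

Library (reference): POSTECH-RISD, www.postech.ac.kr/life/pfg/risd (Jeong et al. 2006)
  Method: Inverse-PCR
  No. of good sequences (>30 bp): LB+RB: 53,335
  Percentage of tandem repeats and VB: ND
  No. of FSTs anchored to the rice sequence: 33,721 (63.2%)

Library (reference): HZAU RMD, http://rmd.ncpgr.cn (Zhang et al. 2006a, 2006b)
  Method: TAIL-PCR
  No. of good sequences (>30 bp): LB: 30,578
  Percentage of tandem repeats and VB: 48.5
  No. of FSTs anchored to the rice sequence: 15,754 (51.5%)

Library (reference): Génoplante OTL, http://urgi.versailles.inra.fr/OryzaTagLine (Sallaud et al. 2004; Mieulet et al. unpublished)
  Method: Walk PCR
  No. of good sequences (>30 bp): LB: 29,028
  Percentage of tandem repeats and VB: 30.4
  No. of FSTs anchored to the rice sequence: 20,203 (69.6%)

Library (reference): Academia Sinica TRIM, http://trim.sinica.edu.tw/ (Hsing et al. 2006)
  Method: TAIL-PCR
  No. of good sequences (>30 bp): RB: 14,077; RB: 20,497
  Percentage of tandem repeats and VB: 48.2; 37.0
  No. of FSTs anchored to the rice sequence: 6,742 (47.9%); 11,992 (58.5%)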
The origin of filler DNA at T-DNA insertion sites is still debated. Windels et al. (2003) examined the filler sequences in 67 Arabidopsis T-DNA::plant genome junctions. In 27 (40%) of them, they found several predominantly short sequence motifs that are identical to sequence blocks in the immediate surroundings of the plant T-DNA integration site (i.e., identical to preinsertion site deletions or to plant DNA adjacent to either side of the T-DNA insertion site), in scattered positions along the T-DNA sequence, or both. As this contrasts with the majority of filler insertions found at double-strand break (DSB)-repaired junctions, which are made up of simple uninterrupted sequence blocks identical to sequences of the plant genome, the authors suggested that filler DNA results from the nature of the initial interaction between the invading T-DNA and the plant target site. Before stabilization by host-dependent nonhomologous end-joining (NHEJ)-associated protein complexes, a free 3΄ protruding end, from either the left or the right end of the T-DNA, lands and screens for microcomplementarity; once a match is found, the end is used as a primer for simultaneous template-based DNA
synthesis. Repeated T-DNA landing and take-off in the neighbouring genome sequences would result in the observed patchwork feature of the filler sequence (Windels et al. 2003). In rice, most traceable filler DNAs were found to be derived from the T-DNA adjacent to the breakpoint or from the rice genome surrounding the T-DNA integration site (Zhu et al. 2006).
9.6 T-DNA Insertion Specificity in Rice

For efficient recovery of primary transformants, the T-DNA integration event has to occur in a genomic region that does not prevent the expression of the T-DNA-borne selectable marker gene. This may introduce a bias against the recovery of insertions in lowly transcribed, heterochromatic regions of the genome. In contrast to other mutagens such as transposable elements, the so-called “insertion preference” of the T-DNA therefore has to be considered with caution, as insertions in regions not favoring the expression of the selectable gene of the T-DNA may not lead to the recovery of the corresponding transgenic plants. With this caveat, salient preferences of recovery have been consistently reported in the literature after the recent characterization of large numbers of T-DNA insertion sites (An et al. 2003; Chen et al. 2003; Sallaud et al. 2004; Hsing et al. 2006; Jeong et al. 2006).

9.6.1 Preference Among and Along Rice Chromosomes

The frequency of T-DNA integration was generally found to be proportional to chromosome size (An et al. 2003; Sallaud et al. 2004; Jeong et al. 2006), though some groups noted a slightly higher insertion density in chromosomes 1, 2, and 3 and a lower density in chromosomes 11 and 12 (Tables 9.4 and 9.5; Chen et al. 2003; Sallaud et al. 2004; Zhang et al. 2006a). This distribution may parallel differences in gene content and in euchromatic status of these chromosomes. On the other hand, a nonuniform distribution of T-DNA inserts is observed along the chromosomes, with a lower insertion density around the centromere region and a higher density in subtelomeric regions (Fig. 9.4). This distribution matches those of heterochromatic and euchromatic regions and possibly illustrates the contrasting recombinogenic activity in these regions. The influence of the
Table 9.4. Summary of T-DNA FSTs released in GenBank used to analyze the distribution of T-DNA inserts over the rice chromosomes in Fig. 9.4

Institution                  No. of FSTs in GenBank   Web site
POSTECH RISD, Korea          27,621                   http://www.postech.ac.kr/life/pfg/risd
HZAU RMD, China              13,249                   http://rmd.ncpgr.cn
GENOPLANTE, France           7,480                    http://urgi.versailles.inra.fr/OryzaTagLine
TRIM, Taiwan                 7,053                    http://trim.sinica.edu.tw/
Zhejiang University, China   1,017                    http://www.genomics.zju.edu.cn/ricetdna.html
CSIRO, Australia (a)         174                      http://www.pi.csiro.au/fgrttpub
Total                        56,594                   http://orygenesdb.cirad.fr/

(a) These are mainly Ds/T-DNA launch pads specially suited for targeted localized Ds-mediated insertional mutagenesis

Table 9.5. Distribution of T-DNA FSTs released in GenBank over the rice chromosomes

Chromosome   Size (Mb)   No. of FSTs   Insertion density per Mb
1            43.25       8,257         191
2            35.90       6,545         182
3            36.35       7,509         207
4            35.50       5,052         144
5            29.70       4,138         139
6            31.20       4,383         140
7            29.70       4,014         135
8            28.30       3,720         131
9            22.70       3,203         141
10           22.70       2,999         132
11           28.40       3,147         111
12           27.50       3,168         115
Total        371.20      56,135        151
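The last column of Table 9.5 is simply the number of FSTs divided by the chromosome size. The short Python sketch below recomputes it from the sizes and FST counts transcribed from the table, which makes it easy to repeat the comparison when more FSTs are released. The variable and function names are illustrative. Recomputed values may differ slightly from the printed column for individual chromosomes (chromosome 4, for example).

```python
# Values transcribed from Table 9.5: chromosome -> (size in Mb, number of FSTs)
TABLE_9_5 = {
    1: (43.25, 8257), 2: (35.90, 6545), 3: (36.35, 7509), 4: (35.50, 5052),
    5: (29.70, 4138), 6: (31.20, 4383), 7: (29.70, 4014), 8: (28.30, 3720),
    9: (22.70, 3203), 10: (22.70, 2999), 11: (28.40, 3147), 12: (27.50, 3168),
}


def insertion_density(table):
    """Return FSTs per Mb for each chromosome."""
    return {chrom: n_fsts / size_mb for chrom, (size_mb, n_fsts) in table.items()}


densities = insertion_density(TABLE_9_5)
total_size = sum(size_mb for size_mb, _ in TABLE_9_5.values())
total_fsts = sum(n_fsts for _, n_fsts in TABLE_9_5.values())
genome_density = total_fsts / total_size

for chrom, density in densities.items():
    print(f"chromosome {chrom:>2}: {density:6.1f} FSTs per Mb")
print(f"whole genome : {genome_density:6.1f} FSTs per Mb")
```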
Fig. 9.4. Density graphs of 56K T-DNA insertions (light gray, at right) and 32K FL cDNA (dark gray, at left) plotted for each 250 kb with a sliding window of 10 kb along the 12 rice pseudomolecules (x-axes). The position of the centromere is shown as a circle
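A density profile of the kind shown in Fig. 9.4 can be obtained by counting insertions in overlapping windows along each pseudomolecule. The Python sketch below follows one possible reading of the caption, namely 250 kb windows moved in 10 kb steps. The function name, the 0-based half-open window convention, and the example positions are assumptions made for illustration and are not the procedure actually used to draw the figure.

```python
from bisect import bisect_left


def sliding_window_density(positions, chrom_length, window=250_000, step=10_000):
    """Count insertions in each window [start, start + window).

    Returns a list of (window_start, count) pairs, with starts every `step` bp.
    """
    ordered = sorted(positions)
    profile = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(ordered, start)           # first insertion >= start
        hi = bisect_left(ordered, start + window)  # first insertion >= window end
        profile.append((start, hi - lo))
    return profile


# Hypothetical insertion positions on a 300 kb stretch
example = sliding_window_density([5_000, 15_000, 260_000], chrom_length=300_000)
for start, count in example:
    print(f"{start:>7}-{start + 250_000:>7}: {count}")
```

Sorting once and using binary search keeps the cost per window low, which matters when tens of thousands of insertions are profiled along all 12 pseudomolecules.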
centromere is particularly evident on the short arms of chromosomes 4, 9, and 10, which also exhibit a lower frequency of expressed genes as deduced from the density of full-length (FL) cDNAs. These regions are known to be the most heterochromatic in pachytene chromosome observations using 4΄,6-diamidino-2-phenylindole (DAPI) staining (Cheng et al. 2001). Along the same line, the three most euchromatic chromosomes identified
in the latter report, that is, chromosomes 1, 2, and 3, also exhibited a higher density of mapped FL cDNA sequences and tended to harbor more T-DNA insertions. Hsing et al. (2006) recently carried out a comparison of FST sequences against those of copia-type and gypsy-type retrotransposons, rDNA genes, the centromere-specific satellite DNA CentO, and the subtelomere-specific satellite DNA Os48. They found that T-DNA has a higher frequency of insertion in heterochromatic regions compared to the retrotransposon Tos17. T-DNA insertions were specifically found in CentO regions, suggesting that during tissue culture the chromatin structure in this region may be decondensed. An alternative explanation is that a particular sequence topology permits integration in these regions, or that another T-DNA with an expressed hph gene has been inserted in a euchromatic region in the same lines.

9.6.2 Preference for Integration into Intergenic versus Genic Regions and Regulatory versus Coding Regions

Rice genome annotation is still an incomplete process. The annotation algorithms that identify the boundaries of predicted genes and exons have to be accurately trained against the rice genome sequence. Though studies were conducted on different FST populations of various sizes, at different stages of completion of the rice genome sequence and its annotation, they consistently revealed preferential recovery of T-DNA inserts from gene-rich regions and very low recovery from repetitive DNA (An et al. 2003; Chen et al. 2003; Sallaud et al. 2004; Hsing et al. 2006; Jeong et al. 2006). The results emerging from the examination of the largest sets of FSTs are summarized in Table 9.6. Analysis of other smaller data sets also drew comparable conclusions (Chen et al. 2003; Ryu et al. 2004). Frequencies are rather consistent but point either to no bias or to a bias in favour of genic regions, the conclusion being largely influenced by the intervals used to determine the boundaries of a gene.
A bias in favor of the interval extending 1,000 bp upstream of the start codon, or the vicinity of the ATG (translation start site), and of the 500-bp interval downstream of the stop codon, or the end of the coding region, has been reported (An et al. 2003; Chen et al. 2003; Hsing et al. 2006; Sallaud et al. 2004). In contrast, Tos17 (Miyao et al. 2003) exhibits a clear preference for insertion into genic regions (88.43%), including exons (38.25%) and introns (35.62%), and this frequency is much higher than that observed with T-DNA (Table 9.6).
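Because the conclusion depends on where the boundaries of a gene are drawn, it helps to make the classification rule explicit. The Python sketch below assigns an insertion to the 5' region, an exon, an intron, the 3' region, or intergenic DNA for a single gene model, with the upstream and downstream intervals as parameters (1,000 bp and 200 bp by default, one of the interval sets listed in Table 9.6). The function, its argument names, and the example gene model are hypothetical and do not reproduce any of the cited analysis pipelines.

```python
def classify_insertion(pos, gene_start, gene_end, exons, strand="+",
                       upstream=1000, downstream=200):
    """Classify an insertion position relative to one gene model.

    gene_start and gene_end bound the coding region (start codon to stop codon)
    in genomic coordinates, inclusive; exons is a list of inclusive (start, end)
    pairs. On the "-" strand the 5' region lies to the right of the gene.
    """
    if gene_start <= pos <= gene_end:
        if any(start <= pos <= end for start, end in exons):
            return "exon"
        return "intron"

    if strand == "+":
        left_flank, left_label = upstream, "5' region"
        right_flank, right_label = downstream, "3' region"
    else:
        left_flank, left_label = downstream, "3' region"
        right_flank, right_label = upstream, "5' region"

    if 0 < gene_start - pos <= left_flank:
        return left_label
    if 0 < pos - gene_end <= right_flank:
        return right_label
    return "intergenic"


# Hypothetical two-exon gene on the forward strand
exons = [(2000, 2300), (2600, 3000)]
for position in (1500, 2100, 2400, 3100, 3500):
    print(position, classify_insertion(position, 2000, 3000, exons))
```

Changing `upstream` and `downstream` (for example to 300 and 300, or to 1000 and 500) shows directly how the same set of insertions can yield different genic and intergenic proportions.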
Table 9.6. Frequency of occurrence of T-DNA inserts in intergenic and genic regions of the rice genome following large-scale sequencing of flanking regions

                                                  Genic (%)
Library          No. of FSTs   Intergenic (%)   Overall                5' region        Intron   Exon    3' region     Reference
POSTECH-RISD     27,621        54.70            45.30 (–300/+300)      8.00 (–300)      16.90    14.70   5.60 (+300)   Jeong et al. 2006
RMD              13,761        35.50            64.50 (–1000/+500)     19.70 (–1000)    19.80    17.30   7.80 (+500)   Zhang et al. 2006
Génoplante OTL   7,292         27.40            72.20 (–1000/+200)     22.93 (–1000)    31.17    13.78   4.31 (+200)   Sallaud et al. 2004; Hsing et al. 2006
TRIM             11,992        20.30            79.21 (–1000/+200)     26.18 (–1000)    29.51    19.08   4.44 (+200)   Hsing et al. 2006
NIAS Tos17       17,982        11.31            88.43 (–1000/+200)     10.97 (–1000)    35.62    38.25   3.58 (+200)   Miyao et al. 2003; Hsing et al. 2006
Rice genome                    46.00            54.00 (–1000/+200)     15.62 (–1000)    15.33    12.00   3.12 (+200)   Hsing et al. 2006
202 Emmanuel Guiderdoni et al.
9 T-DNA Insertion Mutants as a Resource
9.6.3 Preference for Insertion in Expressed Genes

Density graphs of the distribution of FSTs parallel those of ESTs and FL-cDNA (Chen et al. 2003; Sallaud et al. 2004). It has been determined that the density of T-DNA insertions is positively correlated with the expressed, rather than the predicted, gene density along each chromosome (Jeong et al. 2006). After examination of 27,621 inserts in the rice genome, it was observed that the frequency of putative knockouts (KOs) was 17.1% and 27.8% in predicted and transcribed genes, respectively, indicating a preference for the recovery of insertions in expressed genes (Jeong et al. 2006). In addition, density profiles of T-DNA integration sites followed the distribution of 24,438 ESTs recovered from callus tissues, indicating a high correlation between T-DNA integration and expressed callus genes (Hsing et al. 2006). Zhang et al. (2006b) recently mapped 45,441 T-DNA FSTs from four different research groups worldwide: 11,945 mapped to the coding regions of non-TE-related genes (based on the TIGR TU model), with a further 8,067 and 3,482 mapping, respectively, within 1,000 bp upstream of the ATG codon and 500 bp downstream of the stop codon. Assuming that these upstream and downstream regions are genic, the total intragenic insertions were 23,494 (11,945 + 8,067 + 3,482), with 14,287 (33.5%) of the genes being tagged at least once. Although the majority of the genes were tagged only once, 29 genes were tagged more than 10 times, indicating the possible existence of insertion “hot spots.”

9.6.4 Preference for GC Content and DNA Structure

T-DNA integration into plant genomes occurs at dsDNA break points by nonhomologous recombination. Certain DNA contexts, i.e., AT-rich regions with low duplex stability and strong bending, were proposed to favour T-DNA integration (Brunaud et al. 2002). On the other hand, Tos17 integration sites are thought to be determined by palindromic consensus sequences that form a cruciform structure.
Tos17 prefers a narrow range of GC content, with very few integration events occurring at low- or high-GC content regions (Miyao et al. 2003). T-DNA is capable of integrating into chromosomal regions with a broad range of GC content and this may partially account for its more even distribution in the rice genome (Hsing et al. 2006). Consequently, overall GC content at T-DNA insertion sites is close to that of the entire rice genome (An et al. 2005b). Another question is to determine whether a special DNA configuration that is influenced by some unique structure of the DNA may favour T-DNA integration. A prominent peak in bendability was indeed recently detected at T-DNA
insertion sites in Arabidopsis (Schneeberger et al. 2005). Bendable target DNA would create favored integration sites at the outer surface of the helix. DNA asymmetry may play a role in forming a bent DNA configuration that enhances sensitivity to nuclease cleavage, which is of great importance for the integration of foreign DNA.

9.6.5 Preference for Functional Category of Genes

Sallaud et al. (2004) performed similarity searches between FSTs and FL-cDNA sequences that had been classified into biological process categories. The distribution of FL cDNAs interrupted by a T-DNA insertion followed that of the full population of FL cDNAs, indicating that no significant bias was detected in any functional category except for that of translation. This indicates that T-DNA insertion is not biased toward a particular class of genes, as also reported in another study (An et al. 2003).

9.6.6 Estimation of the Number of Lines Required to Saturate the Rice Genome

Assuming that random insertion occurs in the rice genome, it has been estimated that 471,000 T-DNA lines, harboring an average of 1.4 T-DNA inserts each, are required to have a 99% probability of knocking out every rice gene (Jeon and An 2001). However, as shown in the preceding text, an important finding emerging from the analysis of FST data is that T-DNA inserts are preferentially recovered from low-copy, gene-rich regions of the genome (with no marked preference between genic and intergenic regions) but scarcely from repetitive DNA. Further, T-DNA seems to have relatively fewer hot spots and cold spots of integration than Tos17 (Sallaud et al. 2004; Hsing et al. 2006; Jeong et al. 2006). The maximum number of T-DNA inserts found in a 10-kb region is 25, whereas up to 327 Tos17 insertions were found in a 15-kb region (Hsing et al. 2006).
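The order of magnitude of the saturation estimate can be reproduced with the usual random-insertion model, in which the probability of at least one insert in a target of length x in a genome of size G after n independent inserts is P = 1 − (1 − x/G)^n. The Python sketch below solves this for n. The genome size (430 Mb) and average gene length (3 kb) are assumptions chosen for illustration, not values stated in this section; with them the model returns a figure close to the 471,000 lines quoted above. The sketch treats the 99% as a per-gene probability and ignores the insertion biases discussed in the preceding sections.

```python
import math


def inserts_for_coverage(p, target_bp, genome_bp):
    """Random inserts needed to hit a target of target_bp with probability p.

    Solves p = 1 - (1 - target_bp / genome_bp) ** n for n.
    """
    return math.log(1.0 - p) / math.log(1.0 - target_bp / genome_bp)


def lines_for_coverage(p, target_bp, genome_bp, inserts_per_line):
    """Convert the number of inserts into a number of insertion lines."""
    return inserts_for_coverage(p, target_bp, genome_bp) / inserts_per_line


GENOME_BP = 430e6        # assumed rice genome size (illustrative)
GENE_BP = 3000           # assumed average gene length (illustrative)
INSERTS_PER_LINE = 1.4   # average number of inserts per line, as quoted in the text

lines_needed = lines_for_coverage(0.99, GENE_BP, GENOME_BP, INSERTS_PER_LINE)
print(f"Lines needed for 99% per-gene coverage: {lines_needed:,.0f}")

# A shorter target is harder to hit: the same model applied to a 1-kb gene.
print(f"Same model, 1-kb gene: {lines_for_coverage(0.99, 1000, GENOME_BP, INSERTS_PER_LINE):,.0f}")
```

Because the required number of inserts scales roughly with G/x, the estimate is sensitive to the assumed gene length, and any preference of T-DNA for gene-rich regions would lower it.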
9.7 Gene and Enhancer Trapping with T-DNA in Rice

To increase the probability of detecting genes interrupted by the T-DNA or situated in the near vicinity of the T-DNA insert, the T-DNA can be equipped with a gene or enhancer trap. An enhancer trap (ET) typically consists of a reporter gene fused to a minimal promoter (MP) (e.g., –48 bp from the CaMV 35S promoter, containing the TATA box and a
transcriptional start site) that is not transcriptionally active on its own but whose transcription can be triggered by neighboring chromosomal enhancer elements (Springer 2000). ET insertions tend to result in a high frequency of gene detection but often may not correspond to disruption of the detected genes. Moreover, the minimal promoter may come under the influence of regulatory elements within the T-DNA itself following complex T-DNA integration at some insertion points. A gene trap (GT) contains a promoterless reporter gene whose expression occurs only when the T-DNA insert lies within a transcriptional unit and in the correct orientation. The presence of one or more splice acceptor sites aligned in all reading frames and preceding the reporter gene allows expression of the reporter gene if insertion occurs in an intron of the interrupted gene. The frequency of expression is generally lower than that observed with ET, but expression corresponds to insertion within genes and most likely to knockouts. GUS expression with GT is the result of a translational fusion between the reporter gene and upstream exons of the interrupted gene (Springer 2000) and thus may provide information about the interrupted gene’s temporal, spatial, and developmental expression pattern and/or the localization of the gene product (protein). Sundaresan et al. (1995) compared the frequencies of trapping, as revealed by GUS activity, among Arabidopsis insertion line populations harboring either GT or ET Ds elements and found relative values of 26% and 48%, respectively. Though such a thorough comparison has not been made with T-DNA rice lines, frequencies of gene detection based on GUS activity typically fell in the ranges of 1.6% to 8% and 20% to 30% with GT and ET lines, respectively. Illustrations of the ET and GT T-DNA constructs used to generate insertion libraries in rice, and the frequencies of detection of reporter gene activity, are provided in Figs. 9.2 and 9.3 and Table 9.7.
In different studies, trapping frequencies of 7.9% to 10.3% (GT, Eamens et al. 2004), 23% (ET, Johnson et al. 2005), and 84.3% (ET, Wu et al. 2003) have been observed in transformed callus tissues. The β-glucuronidase (gusA) and green fluorescent protein (gfp) genes have been used as reporters for both enhancer and gene trapping. The advantages of the gusA system are the accurate detection of the gene product and the tolerance of its enzyme activity to N-terminal translational fusions (Jefferson 1987). Another advantage lies in the possibility of fixing tissues with current histological techniques to resolve the pattern of expression at the cell level. The major drawbacks of using gusA as the reporter are the destructive nature of its detection and the problem of substrate penetration into inner tissues, particularly with mature rice leaves. On the other hand, the GFP assay does not require exogenous cofactors or substrates and can be performed noninvasively, thereby allowing expression assays under a variety of environmental stresses (Ryu
et al. 2004). Gene activation can indeed be triggered by certain environmental conditions or chemicals, such as growth substances, which may increase the gene detection frequency. Fluorescence microscopy and confocal microscopy have been used for monitoring subcellular localization of the GFP reporter protein. However, poor detection in green tissues, especially in leaf mesophyll cells, due to chlorophyll interference remains a problem with GFP observation in rice. This can, however, be partly solved by etiolation or ethanol extraction of chlorophyll (Zhou et al. 2005).

Table 9.7. Frequency of detection of GUS-specific activity and GFP, through histochemical assay and epifluorescence observation respectively, in enhancer trap (ET) and gene trap (GT) T-DNA insertion lines of rice

| Construct^a (reference) | No. of independent lines observed | Seeds (%) | Roots^b (%) | Leaves^b (%) | Flowers^e (%) |
|---|---|---|---|---|---|
| Gal4:gusA ET [pFX-E24.2-15R] (Wu et al. 2003) | 408 | ND | ND | 68.5^b | ND |
| | 996 | ND | ND | 58.5^c | ND |
| | 827 | ND | ND | 39.3^d | 25.0 |
| | 454 | ND | 52.0 | 61.7^b | ND |
| Gal4:gusA ET [pFX-E24.2-15R] (Peng et al. 2005) | Values in parentheses | 40.0 (9,120) | 32.6 (212) | 48.2^b (212) | ND |
| gusA GT [pGA2144] (Jeon et al. 2000) | Values in parentheses | 1.6 (1,948) | 2.1 (1,353) | 2.0^b (5,353) | 1.1 (7,026) |
| gusA GT [pGA2707] (Jeong et al. 2002) | 2,290 | 5.3 | 2.8 | 3.8^b | 4.7 |
| gusA GT + 4x35SE AT [pGA2715] (Jeong et al. 2002) | 3,842 | 9.3 | 6.3 | 10.5^b | 9.4 |
| Bidirectional GT gusA [pGA2717] (Ryu et al. 2004) | 3,140 | 4.8 | 2.0 | 2.0^b | ND |
| Bidirectional GT gfp [pGA2717] (Ryu et al. 2004) | 3,140 | 0.5 | ND | ND | ND |
| Gal4:gfp ET [p4956 ET 15] (Johnson et al. 2005) | Values in parentheses | 10.4 (2,664) | 25.4 (2,667) | 18.1^b (2,667) | ND |
| | | ND | ND | 9.5^d (1,982) | 3.1 (1,982) |

^a See Figs. 9.2 and 9.3 for details; ^b for roots and leaves of seedling stage; ^c tillering stage; ^d heading stage; ^e flowers: only activity in stamens, carpel, pistil, and lodicules has been taken from original data; ND = not determined
An elegant modification of this system is the construction of bidirectional gene traps that make use of the two reporter systems (gusA and gfp), one adjacent to each border of the T-DNA (Eamens et al. 2004; Ryu et al. 2004). A lower frequency of detection was observed with GFP than with gusA in the latter study (Table 9.7); the differences were attributed either to sensitivity (for instance, detection of GFP in the endosperm was scarce compared to gusA owing to the reduction in light intensity during penetration of the tissue) or to the position of the gfp gene at the less conserved LB. Eamens et al. (2004) have developed a dual-purpose (T-DNA and Ds) bidirectional gene trap construct containing Ds termini inside the T-DNA borders to allow Ds transposition in the presence of the Ac transposase. A modification of the classical enhancer trap includes a modified yeast transcriptional activator Gal4 gene (Gal4:VP16) fused to a minimal promoter (MP) and, within the same construct, a gusA or gfp gene fused to tandemly arrayed upstream activating sequence (UAS) elements, which are recognized by GAL4 as binding sites. GUS or GFP activity reports on Gal4:VP16 expression, since GAL4 controls transcription of the gusA or gfp reporter genes by binding to the UAS elements (Springer 2000). After identification of a particular enhancer trap line with an interesting GUS or GFP expression pattern, a second construct can be introduced carrying genes of interest fused to UAS elements, which will be expressed only in tissues and cell types expressing GUS or GFP. This strategy has been used successfully in Drosophila melanogaster (Brand and Perrimon 1993; Phelps and Brand 1998) and was later applied to Arabidopsis thaliana, where a Gal4:VP16 fusion gene with modified codon usage was used in a T-DNA-based enhancer trap system (http://www.plantsci.cam.ac.uk/Haseloff/Home.html).
Recently, the Gal4:VP16/UAS-gfp ET system has been successfully incorporated into rice (Wu et al. 2003; Yang et al. 2004; Johnson et al. 2005). The enhancer trapping frequency using the GAL4 system in rice T-DNA insertion lines ranged from 29% with GFP (Johnson et al. 2005) to 60% to 70% with GUS (Wu et al. 2003; Yang et al. 2004). As mentioned previously, the GAL4-based system is unique in that it can be used as a tool to transactivate any transgene of interest fused to UAS elements (Bougourd et al. 2000; Kiegle et al. 2000). The transactivation of a gusA reporter gene fused to UAS elements after its introduction into Gal4:VP16/UAS-gfp ET lines has recently been demonstrated (Johnson et al. 2005; see Chapter 13 of this book). The GAL4-mediated transactivation system in rice was recently extended to target genes of interest (Liang et al. 2006). This was based on the use of the Gal4:VP16 ET library (approximately 130,000 lines) generated in Wuhan. Owing to the high frequency of near constitutive patterns of expression of
the gusA gene in the library, a prescreening for nonexpressing lines at the callus stage was performed to increase the chances of identifying tissue-specific “pattern” or “driver” lines. Target lines were generated by transformation of Zhonghua 11 with constructs carrying the egfp reporter gene and the target gene of interest, both controlled by the UAS, but in opposite directions. Hybrid plants were obtained by crossing target lines of 10 putative transcription factor genes from rice with 6 “pattern” lines exhibiting expression in anther, stigma, palea, lemma, and leaves. Various phenotypic changes such as delayed flowering, multiple pistils, dwarfism, narrow and droopy leaves, reduced tillers, growth retardation, and sterility were induced as a result of the expression of target genes. Other potential uses of the cell-specific transactivation method include ablation of tissues through expression of a lethal gene or RNAi-mediated gene silencing in specific cell types.
9.8 Forward Genetics Screens and Gene Isolation Using T-DNA Insertion Lines

Classical forward genetics proceeds from the identification of constitutive or conditional mutant phenotypes under standard or altered culture conditions, to the molecular establishment of tagging by the insertional mutagen, and ultimately to isolation of the disrupted gene. This is generally followed by complementation with the wild-type gene through crossing or transformation to restore the wild-type phenotype. Systematic forward genetic screens conducted in Arabidopsis have shown that only a minority of mutant phenotypes can be related back to a recoverable tag, even when no tissue culture step is involved in the generation of mutant libraries. Large sets of data on tagging frequency are not yet available in rice. However, one can speculate that this frequency may be even lower than that for Arabidopsis, as T-DNA lines are generated through tissue culture procedures that, although limited in terms of duration and subcultures, are known to generate an undesirable background of somaclonal variation. In addition to transposable element activity (e.g., Tos17, mPing), small insertions, deletions, and base substitutions may be induced in cultured cells. The occurrence of new Tos17 inserts depends on the activity of the retrotransposon copy(ies) existing in each cultivar: T-DNA lines of cv. Tainung 67 were found to harbour on average 0.12 newly transposed Tos17 copies (Hsing et al. 2006), whereas Nipponbare and Dongjin T-DNA lines harbored an average of three and four new copies, respectively (D. Mieulet et al. unpublished; G. An, unpublished). However, only 5% to
10% of the mutations identified in a tissue culture-derived Tos17 library were found to be caused by insertion of the retroelement, indicating that the other sources of variation were predominant (Hirochika 2001). Owing to the large size and limited number of seeds produced by rice compared to Arabidopsis, and the need to propagate libraries under field conditions for evaluation of agronomic traits (which is a constraint in the case of transgenic lines), systematic screens of whole insertion libraries will likely remain relatively limited in rice. Several T-DNA libraries are being field propagated in China, Korea, Taiwan, and Colombia and should yield informative results about the comparative frequency of morphological and physiological alterations effectively tagged by the T-DNA. In the case of the Génoplante OTL library (Sallaud et al. 2004), which is being propagated at the Centro Internacional de Agricultura Tropical (CIAT) in Colombia, 1.5%, 1.5%, and 20% of the tested lines exhibited a phenotype for response to fungal pathogens (J.B. Morel et al., unpublished), seed-related traits (P. Perez et al., unpublished), and morphological and physiological traits (M. Lorieux and J. Tohme, unpublished), respectively. Wu et al. (2003) observed conspicuous morphological alterations in about 7.5% of the 2,679 lines studied under field conditions. Clear 3:1 segregation was found in more than one third of the progeny of these mutants (60 out of 157 Zhonghua 11, and 25 out of 44 T1 of Zhonghua 15). Though many studies on the characterization of mutations observed in T-DNA lines are currently being conducted in laboratories generating such insertion line libraries, the published studies so far mainly report on the isolation of genes after gene trapping or activation tagging.

9.8.1 Gene Trapping

The frequency of trapping of nearby genes by GT technology was established by Ryu et al.
(2004), who confirmed gene trapping in 19 out of 25 tested GFP-positive lines by iPCR and isolation of rice gene sequences flanking the gfp gene. The presence of the fusion transcript between OsZFP33, a putative zinc finger protein, and GFP was ascertained by reverse transcriptase-PCR (RT-PCR): splicing was found to occur at the third donor and first acceptor of the OstubA1 intron located between the gfp gene and the OsZFP33 promoter (Fig. 9.2a). Yang et al. (2004) assayed T1 seeds of 9,120 independent ET lines harboring a BoGUS::gfp N-terminal fusion (construct pFX-E24.3-15R; Fig. 9.3a). To evaluate the effectiveness of enhancer trapping, they selected 58 candidate promoters predicted from upstream flanking sequences. Of 10 promoters (randomly amplified T-DNA FSTs) mounted upstream of the gusA reporter gene (in
vector pCAMBIA1391Z), six exhibited expression patterns consistent with those of the original ET lines when reintroduced into rice by transformation (Peng et al. 2005). To identify low-temperature-responsive genes in rice, Lee et al. (2004b) screened GUS-trapped T-DNA lines (pGA2144 or pGA2707 vector; Fig. 9.2a) that were subjected to cold stress at 5°C. Of 15,586 lines, 81 (0.52%) showed a cold-responsive alteration in GUS activity. Of the 62 lines studied, 53 exhibited increased GUS activity, whereas 9 showed a decrease in GUS expression under cold stress. Sixteen of the 62 lines were also influenced by abscisic acid (ABA) treatment, suggesting an ABA-dependent cold response. iPCR and thermal asymmetric interlaced (TAIL)-PCR were used to identify 37 tagged genes, two of which were characterized further at the molecular level: an LRR-RLK, OsRLK1, inducible by cold and salt stress, and OsDMKT1, a putative demethylmenaquinone methyltransferase whose expression is induced under low temperatures. These results demonstrate the effectiveness of gene trap mutagenesis for the discovery of novel genes that are regulated in response to low temperatures in rice. In this study it was also demonstrated that the GUS staining pattern fully mirrored the sites of expression and the responsiveness of the trapped gene. Jung et al. (2003) identified 270 lines with preferential GUS activity in anthers following screening of 14,000 pGA2715 T-DNA GT lines; fifteen lines exhibited male sterility that cosegregated with the GUS pattern in progeny plants. A mutant called undeveloped tapetum 1 (udt1) was also isolated (Jung et al. 2005). In the GT line, GUS activity was high during tapetum development and decreased after tapetum degeneration. No activity was observed in other floral organs or vegetative tissues, and GUS activity was found to be localized in the anther wall and microspores.
The T-DNA insertion is located 1,006 bp downstream from the ATG start codon of the UDT1 gene, which encodes a predicted protein of 227 amino acids that is similar to Brassica napus and Arabidopsis thaliana bHLH transcription factors. The region between amino acids 59 and 118 was predicted to be an HLH domain necessary for dimerization, with a nearby basic domain for target DNA binding. UDT1 is a nuclear protein and its transcripts are most abundant during early anther development. A Tos17 allele, corresponding to an insertion in the third exon, was identified in PCR pools of the T-DNA population. A transcriptome analysis of developing anthers of the udt1 mutant was further conducted to determine its targets: 1,225 genes are either up- or down-regulated, including aspartyl proteases and subtilisin-like proteases. Five WRKY and one MYB transcription factors followed the same expression profile as UDT1. In addition, Lee et al. (2004a) reported on the functional analysis of a cysteine protease gene, OsCP1, through isolation of a GT insertion in
the 5′ UTR. The OsCP1 promoter is highly active in the loculi and tapetum of rice anthers and also in developing pollen, but is expressed to a low degree in vascular bundles and in connective tissues. OsCP1 is homologous to papain family cysteine proteases. The knockout (KO) mutants showed significant defects in pollen development, reduced height, and seed formation.

9.8.2 Activation Tagging

A limitation, learned from the experience with Arabidopsis, is that loss-of-function screens rarely identify genes that act redundantly. In addition, knockouts of genes required during multiple stages of the life cycle of the plant result in early embryonic or gametophytic lethality, and hence such genes are difficult to identify (Weigel et al. 2000). Activation tagging has thus been described as an alternative method to isolate genes through the use of inserts carrying strong activating sequences that can quantitatively modify the transcription of genes adjacent to insertion sites, while still retaining their original expression pattern. Activation tagging has been shown to function in Arabidopsis with the use of multimerized transcriptional enhancer sequences from the well-characterized CaMV 35S promoter (–343 to –90) fragment carried by the T-DNA (Weigel et al. 2000) or by an Spm/dSpm element (Marsch-Martinez et al. 2002). From 30,000 T-DNA activation tagging insertion lines and 2,900 En/Spm activation tagging insertion lines, 30 and 31 dominant mutants have been identified, respectively. In the first study (Weigel et al. 2000), overexpressed genes were normally found adjacent to the inserted CaMV 35S enhancers at distances ranging from 380 bp to 3.6 kb, indicating that in small-sized genomes such as Arabidopsis, 20,000 to 30,000 activation tagging insertion lines are sufficient to ensure activation of the majority of genes.
The possibility of randomly enhancing gene expression through T-DNA mediated activation tagging has been demonstrated in rice and, to date, more than 150,000 insertion lines harboring this system have been generated (Jeong et al. 2002; Hsing et al. 2006; Jeong et al. 2006). Four out of 10 randomly chosen candidate lines were found to exhibit enhanced expression of nearby genes separated by a distance of 1.5 to 4.3 kb from the enhancer elements, while still maintaining their original expression pattern (Jeong et al. 2002). Genes that have been isolated and characterized after activation tagging (Hsing et al. 2006; Jeong et al. 2006) are described in Chapter 13 of this book.
9.9 Reverse Genetics with T-DNA Mutants in Rice

Reverse genetics comprises a set of methods designed to create or identify lines with inactivated expression of a particular candidate gene in order to assign a function to that gene. Identification of KO mutants in an insertion line library has long relied on the use of PCR screens for the desired insertion in one- to three-dimensional pools of DNA samples representing the entire population, using primers specific for both the insertional mutagen and the target gene. Large-scale isolation and sequencing of chromosomal regions flanking inserts to create insertion databases is becoming more popular because it allows the direct identification of mutant lines through simple worldwide computer searches in public databases. More than 300,000 Arabidopsis FSTs are currently available, allowing for the identification of one or more insertions in any gene of this model species (http://www.arabidopsis.org/links/insertion.jsp). In rice, both PCR-based searches in DNA pools and FST databases have enabled the identification of mutants in sequences of interest among mutagenized populations. For instance, PCR screening for 12 MADS box genes in DNA pools prepared from 21,049 tagged lines identified five insertions in four target genes (Lee et al. 2003). The DNA pool size at POSTECH has been increased to 61,481 lines (15,419 pGA2707, 23,965 pGA2715, 16,912 pGA2717, and 5,185 pGA2772). They were divided into 640 pools and 91 superpools. The success rate is approximately 50%. The major problem in identifying T-DNA insertional mutants from DNA pools is the high GC content of the rice genome. Since PCR efficiency is low in high-GC regions, tags in certain GC-rich genes are difficult to identify by PCR-based approaches. The problem can be partly overcome by employing betaine in the reaction buffer and shortening the PCR fragments.
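The economy of screening DNA pools rather than individual lines follows from simple arithmetic on the POSTECH figures quoted above. The Python sketch below assumes, for illustration only, that lines are spread evenly over pools and pools evenly over superpools, and that a screen proceeds hierarchically (superpools, then the pools of a positive superpool, then the lines of a positive pool); the actual pooling design and deconvolution procedure are not described in this section, and the function name is hypothetical.

```python
import math

# Line counts per vector, as quoted in the text.
VECTOR_COUNTS = {
    "pGA2707": 15419,
    "pGA2715": 23965,
    "pGA2717": 16912,
    "pGA2772": 5185,
}
N_POOLS = 640
N_SUPERPOOLS = 91

total_lines = sum(VECTOR_COUNTS.values())

# Assumed even distribution (illustrative): upper bounds on pool sizes.
lines_per_pool = math.ceil(total_lines / N_POOLS)
pools_per_superpool = math.ceil(N_POOLS / N_SUPERPOOLS)


def reactions_for_one_insertion(n_superpools, pools_in_superpool, lines_in_pool):
    """PCR reactions needed to trace one insertion down to a single line
    in a three-stage hierarchical screen (one positive hit per stage)."""
    return n_superpools + pools_in_superpool + lines_in_pool


reactions = reactions_for_one_insertion(N_SUPERPOOLS, pools_per_superpool, lines_per_pool)

print(f"Total lines:              {total_lines:,}")
print(f"Lines per pool (max):     {lines_per_pool}")
print(f"Pools per superpool (max): {pools_per_superpool}")
print(f"Reactions per target:     {reactions} (versus {total_lines:,} line-by-line)")
```

Under these assumptions a single target gene is traced with a few hundred reactions at most, which is what makes pool-based reverse genetics practical before an FST database is available.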
A published example of mutants identified by a reverse genetics search of DNA pools in rice concerns FON1, an orthologue of CLAVATA1. Two KO mutants, fon1-3 and fon1-4, generated by T-DNA and Tos17 insertion, respectively, were found in the POSTECH RISD library (Moon et al. 2006). They exhibited alterations in both reproductive and vegetative tissues, producing semi-dwarf plants with reduced tillering and delayed senescence. Enlargement of the shoot apical meristem was observed in fon1-3. The recent generation of large public FST resources in rice, gathered in specialized reverse genetics databases (e.g., Droc et al. 2006, http://orygenesdb.cirad.fr/), now greatly facilitates identification of inserts in rice candidate genes generated from literature searches and transcriptome analyses. This also allows downsizing the scale of forward screens for alteration in particular traits such as stress response. Such a strategy is being implemented in the frame of a Generation Challenge
Program to target stress-associated genes through evaluation of sequence-indexed mutants in international collections (Pereira et al. 2005). It has been shown in Arabidopsis that analysis under standard culture conditions reveals only a small percentage of KO mutant lines exhibiting an informative phenotype (Bouche and Bouchez 2001). One explanation for this “phenotype gap” is our inability to detect slight physiological alterations. As an alternative to testing under a wide range of environmental conditions, mining the information on the target sequence, ranging from expression profiling data to localization of the gene product, can help define the precise conditions for revealing the phenotypes (Bouche and Bouchez 2001). Another explanation for the phenotype gap relates to the redundancy of gene function: gene duplication is indeed frequent in higher plant species and most genes belong to gene families with members existing in dispersed and/or clustered copies throughout the genome. This situation is anticipated to be even more frequent in rice than in Arabidopsis since tandemly repeated sequences are more prevalent in the rice gene complement. As mentioned earlier, stacking mutant alleles of the various members of a gene family by crossing lines altered in individual genes might prove impossible in the case of tightly linked tandem arrays of family members. An alternative is to create global KO lines expressing a dsRNA of conserved sequence motifs shared between family members, thus silencing all family members simultaneously. Alternatively, the mobilization of a Ds element from a nearby launching pad to saturate the tandem array with inserts, or the creation of a large deletion in the corresponding chromosomal region, could help to address this problem. Producing gain-of-function phenotypes through overexpression or activation tagging could also help resolve gene redundancy.
9.10 Conclusion and Prospects

T-DNA is now accepted as the preferred insertion mutagen to create large libraries of insertion lines in rice (more than 460,000 lines and 113,000 public FSTs). The major advantages of T-DNA inserts are that they are chemically and physically stable over generations, can carry powerful gene detection and/or activation systems and/or a Ds element, and are phenotypically tagged by the expression of selectable marker and/or reporter gene(s). Moreover, they are integrated in low copy numbers (an average of 2 copies at 1.4 loci per line), which facilitates further genetic and molecular analyses and enables the creation of large libraries, ensuring genome saturation. However, the main drawback of T-DNA insertional mutagenesis is that the observed alterations are frequently untagged (or
unrecoverable tag) due to integration of truncated T-DNA and/or somaclonal variation. Another drawback is the often complex organization of T-DNA inserts, which include concatemerized and/or truncated copies and/or binary vector sequences, and which results in an overall 40% to 50% failure rate in sequencing of T-DNA flanking regions. FST redundancy, however, is low with T-DNA compared to that observed with Tos17 and Ds lines, as the latter may arise from a common cell and progeny lineage. If one were to establish more specialized T-DNA libraries, it would be highly desirable to enhance their quality by eliminating unusable transformation events during the generation of the primary transformants. For instance, minimizing VB integration events should be possible through the use of multiple left border T-DNA constructs (Kuraya et al. 2004) or the integration of a ubiquitin promoter-barnase gene cassette in the VB, as shown by Eamens et al. (2004). A T-DNA construct system designed to trigger gene silencing of the selectable marker gene in case of tandem integration would also reduce the frequency of events not indexed by an FST (Chen et al. 2005). Characterization of FSTs of primary transformants at an early stage would save greenhouse and seed storage space by eliminating transformants with unrecoverable tags. Development of an in planta transformation system to avoid somaclonal variation and establishment of a tagging system for the generally recalcitrant indica cultivars are also highly desirable. Analysis of the distribution of T-DNA insertion sites resulting from several large-scale FST recovery efforts in independent libraries consistently demonstrated that T-DNA inserts are scattered along the rice chromosomes with no apparent hot spots or cold spots of integration other than a preference for gene-rich regions.
T-DNA generally integrates into genic and intergenic regions with comparable frequency and seems to prefer regions surrounding the ATG and stop codons, in contrast to Tos17 insertions, which exhibit a clear preference for coding sequences. Given their intrinsic properties and insertional preferences, the combined utilization of all mutagens appears desirable to achieve genome saturation with insertion sites. Different types of insertion libraries have to be considered complementary rather than redundant because they allow allelic series of lesions to be found, which is an alternative gene function validation tool to trans-complementation. In this respect, identification of new Tos17 inserts through amplification in DNA pools or systematic sequencing in T-DNA insertion lines may also prove useful in finding allelic mutations in the same cultivar background, as exemplified for UDT1 (Jung et al. 2005) and FON1 (Moon et al. 2006). Though the number of T-DNA lines generated now appears to allow genome saturation, the effort of FST recovery should be intensified. The
major obstacle to reverse genetics approaches is indeed gene duplication, because functional redundancy may result in the absence of an obvious phenotypic change. A large number of FSTs specific to rice gene families is therefore needed so that double or triple mutations in a group of related genes can be combined to reveal mutant phenotypes.
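The interplay between library size, FST recovery failure and genome saturation discussed above can be illustrated with a short calculation. The sketch below uses a standard probability formula for random insertional mutagenesis, P = 1 − (1 − x/G)^n, of the kind applied to Arabidopsis T-DNA populations (Krysan et al. 1999), and is offered only as a rough guide. It assumes uniformly random, independent insertions, which ignores the preference of T-DNA for gene-rich regions noted above. The genome size (about 390 Mb), the 3-kb gene size, the single insert per line and all function names are illustrative assumptions rather than values taken from this chapter; only the 40% to 50% FST failure rate comes from the text.

```python
import math


def hit_probability(gene_bp, genome_bp, n_insertions):
    """Probability that a gene of gene_bp carries at least one of n_insertions,
    assuming uniformly random, independent insertions (an idealization)."""
    return 1.0 - (1.0 - gene_bp / genome_bp) ** n_insertions


def insertions_needed(gene_bp, genome_bp, p_hit):
    """Number of indexed insertions needed to hit a gene of gene_bp
    with probability p_hit, under the same random-insertion assumption."""
    return math.log(1.0 - p_hit) / math.log(1.0 - gene_bp / genome_bp)


def lines_needed(gene_bp, genome_bp, p_hit, fst_failure_rate, inserts_per_line=1.0):
    """Number of primary lines to generate, given that a fraction
    fst_failure_rate of inserts never yields a usable FST."""
    n = insertions_needed(gene_bp, genome_bp, p_hit)
    indexed_per_line = inserts_per_line * (1.0 - fst_failure_rate)
    return math.ceil(n / indexed_per_line)


# Illustrative (assumed) values: 390-Mb genome, 3-kb target gene, 95% confidence.
GENOME_BP = 390e6
GENE_BP = 3000
for failure in (0.0, 0.4, 0.5):
    print(failure, lines_needed(GENE_BP, GENOME_BP, 0.95, failure))
```

Under these assumptions, a 50% failure rate in FST recovery roughly doubles the number of lines that must be produced for the same coverage, which is why improving FST recovery matters as much as generating further lines.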
Acknowledgments
The authors wish to acknowledge the support of the Generation Challenge Program; the ANR Génoplante program, France; the Crop Functional Genomics Center, the 21st Century Frontier Program (CG1111), the Biogreen 21 program and the Rural Development Administration, Korea; the Academia Sinica, the National Science Council and the Council of Agriculture of the Republic of China; and the Ministry of Science and Technology of China.
References
Afolabi AS, Worland B, Snape JW, Vain P (2004) A large-scale study of rice plants transformed with different T-DNAs provides new insights into locus composition and T-DNA linkage configurations. Theor Appl Genet 109:815–826
Aldemita R, Hodges TK (1996) Agrobacterium tumefaciens-mediated transformation of japonica and indica rice varieties. Planta 199:612–617
Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, Gadrinab C, Heller C, Jeske A, Koesema E, Meyers CC, Parker H, Prednis L, Ansari Y, Choy N, Deen H, Geralt M, Hazari N, Hom E, Karnes M, Mulholland C, Ndubaku R, Schmidt I, Guzman P, Aguilar-Henonin L, Schmid M, Weigel D, Carter DE, Marchand T, Risseeuw E, Brogden D, Zeko A, Crosby WL, Berry CC, Ecker JR (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657
An G, Jeong DH, Jung KH, Lee S (2005a) Reverse genetic approaches for functional genomics of rice. Plant Mol Biol 59:111–123
An G, Lee S, Kim SH, Kim SR (2005b) Molecular genetics using T-DNA in rice. Plant Cell Physiol 46:14–22
An SY, Park S, Jeong DH, Lee DY, Kang HG, Yu JH, Hur J, Kim SR, Kim YH, Lee M, Han SK, Kim SJ, Yang JW, Kim E, Wi SJ, Chung HS, Hong JP, Choe V, Lee HK, Choi JH, Nam JM, Kim SR, Park PB, Park KY, Kim WT, Choe S, Lee CB, An G (2003) Generation and analysis of end sequence database for T-DNA tagging lines in rice. Plant Physiol 133:2040–2047
Bajaj S, Mohanty A (2005) Recent advances in rice biotechnology; towards genetically superior transgenic rice. Plant Biotech J 3:275–307
Bechtold N, Ellis J, Pelletier G (1993) In planta Agrobacterium mediated gene transfer by infiltration of adult Arabidopsis thaliana plants. C R Acad Sci Ser III (Paris) 316:1194–1199
Bouche N, Bouchez D (2001) Arabidopsis gene knockout: phenotypes wanted. Curr Opin Plant Biol 4:111–117
Bougourd S, Marrison J, Haseloff J (2000) An aniline blue staining procedure for confocal microscopy and 3D imaging of normal and perturbed cellular phenotypes in mature Arabidopsis embryos. Plant J 24:543–550
Brand AH, Perrimon N (1993) Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118:401–415
Brunaud V, Balzergue S, Dubreucq B, Aubourg S, Samson F, Chauvin S, Bechtold N, Cruaud C, DeRose R, Pelletier G, Lepiniec L, Caboche M, Lecharny A (2002) T-DNA integration into the Arabidopsis genome depends on sequences of pre-insertion sites. EMBO Rep 3:1152–1157
Chan MT, Chang HH, Ho SL, Tong WF, Yu SM (1993) Agrobacterium-mediated production of transgenic rice plants expressing a chimeric alpha-amylase promoter/beta-glucuronidase gene. Plant Mol Biol 22:491–506
Chen L, Marmey P, Taylor NJ, Brizard JP, Espinoza C, D'Cruz P, Huet H, Zhang S, de Kochko A, Beachy RN, Fauquet CM (1998) Expression and inheritance of multiple transgenes in rice plants. Nat Biotechnol 16:1060–1064
Chen S, Jin W, Wang M, Zhang F, Zhou J, Jia Q, Wu Y, Liu F, Wu P (2003) Distribution and characterization of over 1000 T-DNA tags in rice genome. Plant J 36:105–113
Chen S, Helliwell CA, Wu LM, Dennis ES, Upadhyaya N, Zhang R, Waterhouse PM, Wang MB (2005) A novel T-DNA vector design conducive for selection of transgenic lines with simple transgene integration and stable transgene expression. Funct Plant Biol 32:671–681
Cheng Z, Buell CR, Wing RA, Gu M, Jiang J (2001) Toward a cytological characterization of the rice genome. Genome Res 11:2133–2141
Dong JJ, Teng WM, Buchholz WG, Hall TC (1996) Agrobacterium-mediated transformation of Javanica rice. Mol Breed 2:267–276
Droc G, Ruiz M, Larmande P, Pereira A, Piffanelli P, Morel JB, Dievart A, Courtois B, Guiderdoni E, Perin C (2006) OryGenesDB: a database for rice reverse genetics. Nucl Acids Res 34:736–740
Eamens AL, Blanchard CL, Dennis ES, Upadhyaya NM (2004) A bidirectional gene trap construct suitable for T-DNA and Ds-mediated insertional mutagenesis in rice (Oryza sativa L.). Plant Biotech J 2:367–380
Feldmann KA (1991) T-DNA insertion mutagenesis in Arabidopsis: mutational spectrum. Plant J 1:71–82
Feldmann KA, Marks MD (1987) Agrobacterium-mediated transformation of germinating seeds of Arabidopsis thaliana: a non-tissue culture approach. Mol Gen Genet 208:1–9
Gelvin SB (2003) Agrobacterium-mediated plant transformation: the biology behind the “gene-jockeying” tool. Microbiol Mol Biol Rev 67:16–37
Gheysen G, Villarroel R, Van Montagu M (1991) Illegitimate recombination in plants: a model for T-DNA integration. Genes Dev 5:287–297
Hiei Y, Ohta S, Komari T, Kumashiro T (1994) Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. Plant J 6:271–282
Hiei Y, Komari T, Kubo T (1997) Transformation of rice mediated by Agrobacterium tumefaciens. Plant Mol Biol 35:205–218
Hirochika H (2001) Contribution of the Tos17 retrotransposon to rice functional genomics. Curr Opin Plant Biol 4:118–122
Hirochika H, Guiderdoni E, An G, Hsing YI, Eun MY, Han CD, Upadhyaya N, Ramachandran S, Zhang QF, Pereira A, Sundaresan V, Leung H (2004) Rice mutant resources for gene discovery. Plant Mol Biol 54:325–334
Hsing Y-I, Chern C-G, Fan M-J, Lu P-C, Chen K-T, Lo S-F, Ho S-L, Lee K-W, Wang Y-C, Sun P-K, Ko R, Huang W-L, Chen J-L, Chung C-I, Lin Y-C, Hour A-L, Wang Y-W, Chang Y-C, Tsai M-W, Lin Y-S, Chen Y-C, Chen S, Yen H-M, Li C-P, Wey C-K, Tseng C-S, Lai M-H, Chen L-J, Yu S-M (2007) A rice gene activation/knockout mutant resource for high throughput functional genomics. Plant Mol Biol 63:351–364
Jefferson RA (1987) Assaying chimeric genes in plants from gene fusion system. Plant Mol Biol Rep 5:387–405
Jeon JS, An G (2001) Gene tagging in rice: a high throughput system for functional genomics. Plant Sci 161:211–219
Jeon JS, Lee S, Jung KH, Jun SH, Jeong DH, Lee J, Kim C, Jang S, Lee S, Yang K, Nam J, An K, Han MJ, Sung RJ, Choi HS, Yu JH, Choi JH, Cho SY, Cha SS, Kim SI, An G (2000) T-DNA insertional mutagenesis for functional genomics in rice. Plant J 22:561–570
Jeong DH, An SY, Kang HG, Moon S, Han JJ, Park S, Lee HS, An KS, An G (2002) T-DNA insertional mutagenesis for activation tagging in rice. Plant Physiol 130:1636–1644
Jeong DH, An S, Park S, Kang HG, Park GG, Kim SR, Sim J, Kim YO, Kim MK, Kim SR, Kim J, Shin M, Jung M, An G (2006) Generation of a flanking sequence-tag database for activation-tagging lines in japonica rice. Plant J 45:123–132
Johnson AAT, Hibberd JM, Gay C, Essah PA, Haseloff J, Tester M, Guiderdoni E (2005) Spatial control of transgene expression in rice (Oryza sativa L.) using the GAL4 enhancer trapping system. Plant J 41:779–789
Jung KH, Hur J, Ryu CH, Choi Y, Chung YY, Miyao A, Hirochika H, An G (2003) Characterization of a rice chlorophyll-deficient mutant using the T-DNA gene-trap system. Plant Cell Physiol 44:463–472
Jung KH, Han MJ, Lee YS, Kim YW, Hwang IW, Kim MJ, Kim YK, Nahm BH, An G (2005) Rice Undeveloped Tapetum1 is a major regulator of early tapetum development. Plant Cell 17:2705–2722
Kiegle E, Moore CA, Haseloff J, Tester MA, Knight MR (2000) Cell-type-specific calcium responses to drought, salt and cold in the Arabidopsis root. Plant J 23:267–278
Kim SR, Lee J, Jun SH, Park S, Kang HG, Kwon S, An G (2003) Transgene structures in T-DNA-inserted rice plants. Plant Mol Biol 52:761–773
Koncz C, Nemeth K, Redei GP, Schell J (1992) T-DNA insertional mutagenesis in Arabidopsis. Plant Mol Biol 20:963–976
Kononov ME, Bassuner B, Gelvin SB (1997) Integration of T-DNA binary vector ‘backbone’ sequences into the tobacco genome: evidence for multiple complex patterns of integration. Plant J 11:945–957
Krysan PJ, Young JC, Sussman MR (1999) T-DNA as an insertional mutagen in Arabidopsis. Plant Cell 11:2283–2290
Kumar S, Fladung M (2002) Transgene integration in aspen: structures of integration sites and mechanism of T-DNA integration. Plant J 31:543–551
Kuraya Y, Ohta S, Fukuda M, Hiei Y, Murai N, Hamada K, Ueki J, Imaseki H, Komari T (2004) Suppression of transfer of non-T-DNA vector backbone sequences by multiple left border repeats in vectors for transformation of higher plants mediated by Agrobacterium tumefaciens. Mol Breed 14:309–320
Lee S, Jeon JS, Jung KH, An G (1999) Binary vectors for efficient transformation of rice. J Plant Biol 42:310–316
Lee S, Kim J, Son J-S, Nam J, Jeong D-H, Lee K, Jang S, Yoo J, Lee J, Lee D-Y, Kang H-G, An G (2003) Systematic reverse genetic screening of T-DNA tagged genes in rice for functional genomic analyses: MADS-box genes as a test case. Plant Cell Physiol 44:1403–1411
Lee S, Jung KH, An GH, Chung YY (2004a) Isolation and characterization of a rice cysteine protease gene, OsCP1, using T-DNA gene-trap system. Plant Mol Biol 54:755–765
Lee SC, Kim JY, Kim SH, Kim SJ, Lee K, Han SK, Choi HS, Jeong DH, An GH, Kim SR (2004b) Trapping and characterization of cold-responsive genes from T-DNA tagging lines in rice. Plant Sci 166:69–79
Liang D, Wu C, Li C, Xu C, Zhang J, Kilian A, Li X, Zhang Q, Xiong L (2006) Establishment of a patterned GAL4/VP16 transactivation system for discovering gene function in rice. Plant J 46:1059–1072
Marsch-Martinez N, Greco R, Van Arkel G, Herrera-Estrella L, Pereira A (2002) Activation tagging using the En-I maize transposon system in Arabidopsis. Plant Physiol 129:1544–1556
Mayerhofer R, Koncz-Kalman Z, Nawrath C, Bakkeren G, Crameri A, Angelis K, Redei GP, Schell J, Hohn B, Koncz C (1991) T-DNA integration: a mode of illegitimate recombination in plants. EMBO J 10:697–704
McKinney EC, Ali N, Traut A, Feldmann KA, Belostotsky DA, McDowell JM, Meagher RB (1995) Sequence-based identification of T-DNA insertion mutations in Arabidopsis: actin mutants act2-1 and act4-1. Plant J 8:613–622
Miyao A, Tanaka K, Murata K, Sawaki H, Takeda S, Abe K, Shinozuka Y, Onosato K, Hirochika H (2003) Target site specificity of the Tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. Plant Cell 15:1771–1780
Moon S, Jung KH, Lee DE, Lee DY, Lee J, An K, Kang HG, An G (2006) The rice FON1 gene controls vegetative and reproductive development by regulating shoot apical meristem size. Mol Cells 21:147–152
Parinov S, Sevugan M, De Y, Yang W-C, Kumaran M, Sundaresan V (1999) Analysis of flanking sequences from dissociation insertion lines: a database for reverse genetics in Arabidopsis. Plant Cell 11:2263–2270
Peng H, Huang H, Yang Y, Zhai Y, Wu J, Huang D, Lu T (2005) Functional analysis of GUS expression patterns and T-DNA integration characteristics in rice enhancer trap lines. Plant Sci 168:1571–1579
Pereira A, Hirochika H, Guiderdoni E, Lorieux M, Verdier V, Ishitani M, Lu TG, Zhang Q, Leung H (2005) Discovery of stress tolerance genes using global collections of rice mutants. In: Rice Genetics V Abstracts. Int Rice Res Inst, Manila, Philippines (http://www.irri.org/rg5/Abstracts.pdf)
Phelps CB, Brand AH (1998) Ectopic gene expression in Drosophila using GAL4 system. Methods 14:367–379
Puchta H (1999) Double-strand break-induced recombination between ectopic homologous sequences in somatic plant cells. Genetics 152:1173–1181
Raina S, Mahalingam R, Chen F, Fedoroff N (2002) A collection of sequenced and mapped Ds transposon insertion sites in Arabidopsis thaliana. Plant Mol Biol 50:93–110
Rios G, Lossow A, Hertel B, Breuer F, Schaefer S, Broich M, Kleinow T, Jasik J, Winter J, Ferrando A, Farras R, Panicot M, Henriques R, Mariaux J-B, Oberschall A, Molnar G, Berendzen K, Shukla V, Lafos M, Koncz Z, Redei GP, Schell J, Koncz C (2002) Rapid identification of Arabidopsis insertion mutants by non-radioactive detection of T-DNA tagged genes. Plant J 32:243–253
Rosso MG, Li Y, Strizhov N, Reiss B, Dekker K, Weisshaar B (2003) An Arabidopsis thaliana T-DNA mutagenized population (GABI-Kat) for flanking sequence tag-based reverse genetics. Plant Mol Biol 53:247–259
Ryu CH, You JH, Kang HG, Hur JH, Kim YH, Han MJ, An KS, Chung BC, Lee CH, An G (2004) Generation of T-DNA tagging lines with a bidirectional gene trap vector and the establishment of an insertion-site database. Plant Mol Biol 54:489–502
Sallaud C, Meynard D, van Boxtel J, Gay C, Bes M, Brizard JP, Larmande P, Ortega D, Raynal M, Portefaix M, Ouwerkerk PB, Rueb S, Delseny M, Guiderdoni E (2003) Highly efficient production and characterization of T-DNA plants for rice (Oryza sativa L.) functional genomics. Theor Appl Genet 106:1396–1408
Sallaud C, Gay C, Larmande P, Bes M, Piffanelli P, Piegu B, Droc G, Regad F, Bourgeois E, Meynard D, Perin C, Sabau X, Ghesquiere A, Glaszmann JC, Delseny M, Guiderdoni E (2004) High throughput T-DNA insertion mutagenesis in rice: a first step towards in silico reverse genetics. Plant J 39:450–464
Salomon S, Puchta H (1998) Capture of genomic and T-DNA sequences during double-strand break repair in somatic plant cells. EMBO J 17:6086–6095
Samson F, Brunaud V, Balzergue S, Dubreucq B, Lepiniec L, Pelletier G, Caboche M, Lecharny A (2002) FLAGdb/FST: a database of mapped flanking insertion sites (FSTs) of Arabidopsis thaliana T-DNA transformants. Nucl Acids Res 30:94–97
Schneeberger RG, Zhang K, Tartinova T, Troukhan M, Kwok CF, Drais J, Klinger K, Orejudos F, Macy K, Bhakta A, Burns J, Subramanian G, Donson J, Flavell R, Feldmann KA (2005) Agrobacterium T-DNA integration in Arabidopsis is correlated with DNA sequence compositions that occur frequently in gene promoter regions. Funct Integr Genomics 5:240–253
Sessions A, Burke E, Presting G, Aux G, McElver J, Patton D, Dietrich B, Ho P, Bacwaden J, Ko C, Clarke JD, Cotton D, Bullis D, Snell J, Miguel T, Hutchison D, Kimmerly B, Mitzel T, Katagiri F, Glazebrook J, Law M, Goff SA (2002) A high-throughput Arabidopsis reverse genetics system. Plant Cell 14:2985–2994
Sha Y, Li S, Pei Z, Luo L, Tian Y, He C (2004) Generation and flanking sequence analysis of a rice T-DNA tagged population. Theor Appl Genet 108:306–314
Speulman E, Metz PLJ, van Arkel G, te Lintel Hekkert B, Stiekema WJ, Pereira A (1999) A two-component enhancer-inhibitor transposon mutagenesis system for functional analysis of the Arabidopsis genome. Plant Cell 11:1853–1866
Springer PS (2000) Gene traps: tools for plant development and genomics. Plant Cell 12:1007–1020
Sundaresan V, Springer P, Volpe T, Haward S, Jones JD, Dean C, Ma H, Martienssen R (1995) Patterns of gene action in plant development revealed by enhancer trap and gene trap transposable elements. Genes Dev 9:1797–1810
Szabados L, Kovacs I, Oberschall A, Abraham E, Kerekes I, Zsigmond L, Nagy R, Alvarado M, Krasovskaja I, Gal M, Berente A, Redei GP, Ben Haim A, Koncz C (2002) Distribution of 1000 sequenced T-DNA tags in the Arabidopsis genome. Plant J 32:233–242
Terada R, Asao H, Iida S (2004) A large-scale Agrobacterium-mediated transformation procedure with a strong positive-negative selection for gene targeting in rice (Oryza sativa L.). Plant Cell Rep 22:653–659
Tinland B (1996) The integration of T-DNA into plant genomes. Trends Plant Sci 1:178–184
Tissier AF, Marillonnet S, Klimyuk V, Patel K, Torres MA, Murphy G, Jones JDG (1999) Multiple independent defective Suppressor-mutator transposon insertions in Arabidopsis: a tool for functional genomics. Plant Cell 11:1841–1852
Vain P, Afolabi AS, Worland B, Snape JW (2003) Transgene behaviour in populations of rice plants transformed using a new dual binary vector system: pGreen/pSoup. Theor Appl Genet 107:210–217
Valvekens D, Van Montagu M, Van Lijsebettens M (1988) Agrobacterium tumefaciens-mediated transformation of Arabidopsis thaliana root explants by using kanamycin selection. Proc Natl Acad Sci USA 85:5536–5540
Wang YH, Xue YB, Li JY (2005) Towards molecular breeding and improvement of rice in China. Trends Plant Sci 10:610–614
Weigel D, Ahn JH, Blazquez MA, Borevitz JO, Christensen SK, Fankhauser C, Ferrandiz C, Kardailsky I, Malancharuvil EJ, Neff MM, Nguyen JT, Sato S, Wang Z-Y, Xia Y, Dixon RA, Harrison MJ, Lamb CJ, Yanofsky MF, Chory J (2000) Activation tagging in Arabidopsis. Plant Physiol 122:1003–1014
Windels P, De Buck S, Van Bockstaele E, De Loose M, Depicker A (2003) T-DNA integration in Arabidopsis chromosomes. Presence and origin of filler DNA sequences. Plant Physiol 133:2061–2068
Wu C, Li X, Yuan W, Chen G, Kilian A, Li J, Xu C, Li X, Zhou D-X, Wang S, Zhang Q (2003) Development of enhancer trap lines for functional analysis of the rice genome. Plant J 35:418–427
Yang Y, Peng H, Huang H, Wu J, Jia S, Huang D, Lu T (2004) Large-scale production of enhancer trapping lines for rice functional genomics. Plant Sci 167:281–288
Yanofsky MF, Porter SG, Young C, Albright LM, Gordon MP, Nester EW (1986) The virD operon of Agrobacterium tumefaciens encodes a site-specific endonuclease. Cell 47:471–477
Yin Z, Wang GL (2000) Evidence of multiple complex patterns of T-DNA integration into the rice genome. Theor Appl Genet 100:461–470
Young JC, Krysan PJ, Sussman MR (2001) Efficient screening of Arabidopsis T-DNA insertion lines using degenerate primers. Plant Physiol 125:513–518
Zhang J, Li C, Wu C, Xiong L, Chen G, Zhang Q, Wang S (2006) RMD: a rice mutant database for functional analysis of the rice genome. Nucleic Acids Res 34:D745–748
Zhang J, Guo D, Chang YX, You CJ, Li XW, Dai XX, Weng QJ, Zhang JW, Chen GX, Li XH, Liu HF, Han B, Zhang QF, Wu CY (2007) Non-random distribution of T-DNA insertions at various levels of the genome hierarchy as revealed by analysing 13,804 T-DNA flanking sequences from an enhancer-trap mutant library. Plant J (In Press, DOI 10.1111/j.1365-313X.2006.03001.x)
Zhou X, Carranco R, Vitha S, Hall TC (2005) The dark side of green fluorescent protein. New Phytol 168:313–321
Zhu Q-H, Ramm K, Eamens AL, Dennis ES, Upadhyaya NM (2006) Transgene structures suggest that multiple mechanisms are involved in T-DNA integration in plants. Plant Sci 171:308–322
10 Transposon Insertional Mutants: A Resource for Rice Functional Genomics
Qian-Hao Zhu¹, Moo Young Eun², Chang-deok Han³, Chellian Santhosh Kumar⁴, Andy Pereira⁵, Srinivasan Ramachandran⁶, Venkatesan Sundaresan⁴, Andrew L. Eamens¹, Narayana M. Upadhyaya¹ and Ray Wu⁷
¹CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601, Australia; ²Rice Functional Genomics and Molecular Breeding Lab, Cell and Genetics Division, National Institute of Agricultural Biotechnology, RDA, Suwon 441-707, Korea; ³Division of Applied Life Science, BK21 Program, Plant Molecular Biology and Biotechnology Research Center, Gyeongsang National University, Jinju 660-701, Korea; ⁴Department of Plant Sciences, Life Sciences Addition 1002, University of California–Davis, Davis, CA 95616, USA; ⁵Virginia Bioinformatics Institute, Washington Street, MC 0477, Virginia Tech, Blacksburg, VA 24061, USA; ⁶Rice Functional Genomics Group, Temasek Life Sciences Laboratory, 1 Research Link, National University of Singapore, 117604, Singapore; ⁷Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
Reviewed by Tony Pryor and John M. Watson
10.1 Introduction
10.2 Transposon Tagging Systems
10.2.1 Activity of Transposons in Rice
10.2.2 One-Element System versus Two-Element System
10.2.3 Design of Constructs
10.2.4 Gene and Enhancer Traps
10.2.5 Transiently Expressed Transposase System
10.2.6 A High-Throughput System to Index Transposants
10.2.7 Using Endogenous Transposons
10.2.8 Inducible Transposition
10.3 Mutagenesis Strategies
10.3.1 Random or Non-targeted Mutagenesis
10.3.2 Localized or Targeted Mutagenesis
10.4 Transposon Insertional Mutant Populations
10.4.1 CSIRO Plant Industry Population
10.4.2 EU (Wageningen) Population
10.4.3 National University of Singapore Population
10.4.4 Korea Population
10.4.5 UC Davis Population
10.5 Gene Discovery by Transposon Tagging
10.5.1 Forward and Reverse Genetics Strategies
10.5.2 Other Approaches for Mutation Identification
10.5.3 Tagging Efficiency
10.5.4 Confirmation of Tagged Gene
10.6 Future Prospects
References
10.1 Introduction
With the completion of rice genome sequencing, the new challenge for the rice community is to unravel the biological functions of approximately 40,000 rice genes. To achieve this goal, a wide range of functional genomics tools, such as microarray, serial analysis of gene expression (SAGE), RNA interference (RNAi), insertional mutagenesis, and bioinformatics, have been established and employed. Insertional (T-DNA, transposon or retrotransposon) mutagenesis has proven to be one of the most efficient methodologies, because studies of mutants with detectable phenotypes have given us the greatest insight into the mechanisms underlying a wide range of biological processes in plants. Compared with T-DNA insertional mutagenesis, transposon insertional mutagenesis (or transposon tagging) has distinct advantages. Large-scale transposon mutagenized populations can be produced using a relatively small number of starter lines, as many independent insertions can be generated among the progeny of a single line. The tagged gene can be confirmed by revertants resulting from excision of the transposon. Transposons were first discovered by Barbara McClintock in the 1940s as the causative agent of variegated maize (Fedoroff 1989). Since then, transposons have been found to be ubiquitous genetic elements in both prokaryotes and eukaryotes. In rice, recent genome sequencing and annotation have shown that a large portion of the rice genome consists of transposable elements (Mao et al. 2000), and almost all of these endogenous transposable elements are inactive under normal conditions. However, transposons have played an important role in the evolution of the rice genome.
Transposons are categorized into two groups according to their transposition mechanism and mode of propagation: class I elements, also called retrotransposons, transpose via an RNA intermediate, whereas class II elements transpose via a DNA intermediate by a “cut and paste” mechanism (Saedler and Nevers 1985; Coen et al. 1989; Gorbunova and Levy 2000). Both class I and class II elements exist as autonomous and nonautonomous transposable elements. An autonomous transposon
encodes its own transposase, a protein required for its transposition. A nonautonomous transposon does not encode its own transposase, but can be induced to transpose by the transposase expressed by an autonomous transposon elsewhere in the genome. This chapter focuses on the utilization of two class II element systems, Ac/Ds and Spm/dSpm (also called En/I), in rice functional genomics. The study of transposon insertional mutagenesis and the utilization of transposons as mutagens were initially carried out in maize (Zea mays) and snapdragon (Antirrhinum majus), in which a high frequency of spontaneous mutations resulted from insertions of their endogenous transposons within genes. Transposons were first isolated from these two species in the 1980s (Fedoroff et al. 1983; Pereira et al. 1985; Sommer et al. 1985), and soon after genes were cloned via transposon tagging (Fedoroff et al. 1984; Martin et al. 1985). Engineered transposons were also found to retain their transposability in transgenic plants, including rice (Baker et al. 1986; van Sluys et al. 1987; Yoder et al. 1988; Frey et al. 1990; Izawa et al. 1991; Murai et al. 1991; Finnegan et al. 1993). Several genes were cloned via introduced transposons in Arabidopsis and petunia at the same time (Aarts et al. 1993; Bancroft et al. 1993; Chuck et al. 1993; Long et al. 1993). These efforts demonstrated the feasibility of transposon tagging in heterologous plant species. The utilization of the maize two-element Ac/Ds and Spm/dSpm transposons for gene tagging has been extensively investigated since the autonomous Ac element was proven to be active in transgenic rice plants (Izawa et al. 1991; Murai et al. 1991). During the last decade, sophisticated transposon tagging systems have been established to improve screening efficiencies, and a large number of transposon insertion lines have been generated.
Several genes have been cloned by transposon tagging since the successful cloning of a gene (BFL1/FZP) that mediates the transition from spikelet to floret meristem (Komatsu et al. 2003a; Zhu et al. 2003). In this chapter, we discuss rice transposon tagging systems and methodologies, summarize the progress made, and discuss the strategies for gene discovery using transposon mutagenized populations.
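The relationship between autonomous and nonautonomous elements described in this introduction can be captured in a few lines of code. The following is a minimal sketch, not a biological simulation: the class and function names are hypothetical, and the model simply assumes that a class II element is mobile whenever a transposase of its own family is expressed somewhere in the genome. Epigenetic silencing, dosage effects and damage to the element termini are all ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Element:
    """A class II transposable element (hypothetical toy representation)."""
    name: str
    family: str        # two-element family, e.g. "Ac/Ds" or "Spm/dSpm"
    autonomous: bool   # True if the element encodes its own transposase


def can_transpose(element: Element, genome: Iterable[Element]) -> bool:
    """Return True if any element in the genome supplies a transposase
    of the same family (simplifying assumption: family-specific mobilization)."""
    return any(e.autonomous and e.family == element.family for e in genome)


AC = Element("Ac", "Ac/Ds", autonomous=True)
DS = Element("Ds", "Ac/Ds", autonomous=False)
SPM = Element("Spm", "Spm/dSpm", autonomous=True)
DSPM = Element("dSpm", "Spm/dSpm", autonomous=False)

ds_starter_line = [DS]       # Ds alone, no transposase source
ds_with_ac = [DS, AC]        # Ds together with an Ac transposase source

print(can_transpose(DS, ds_starter_line))
print(can_transpose(DS, ds_with_ac))
print(can_transpose(DSPM, ds_with_ac))
```

In this toy model a Ds element stays put until an Ac transposase source is present in the same genome, and it is immobile again once Ac is no longer present, which is the property that makes two-element systems attractive for generating stable insertions.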
10.2 Transposon Tagging Systems
10.2.1 Activity of Transposons in Rice
Retaining transposability of engineered transposons in transgenic rice is an obvious prerequisite for rice gene tagging systems based on transposon mutagenesis. To test the mobility of transposons in the rice genome, the autonomous Ac element was first introduced into the rice genome by
electroporation via a construct in which the Ac element is inserted between the Cauliflower mosaic virus (CaMV) 35S promoter and the hygromycin phosphotransferase (hph) gene. Transposition of the Ac element was proven by recovering hygromycin-resistant (Hygᴿ) plants (Izawa et al. 1991; Murai et al. 1991). The nonautonomous Ds element was also shown to transpose in transgenic rice plants in the presence of Ac transposase (Shimamoto et al. 1993). This was also the first study to demonstrate that Ds could be transactivated and stably integrated into different chromosomes of the rice genome by the transiently expressed Ac transposase at the tissue culture stage. These results encouraged many groups to investigate further the transposition behavior of Ac/Ds and Spm/dSpm in transgenic rice plants to determine the feasibility of using these two-element systems as mutagens for large-scale gene tagging in rice (Chin et al. 1999; Enoki et al. 1999; Nakagawa et al. 2000; Greco et al. 2001a, 2003, 2004; Kohli et al. 2001; Upadhyaya et al. 2002, 2006; Eamens et al. 2004; Ito et al. 2004; Jin et al. 2004; Kim et al. 2004; Kolesnik et al. 2004; Kumar et al. 2005; Szeverenyi et al. 2005; van Enckevort et al. 2005). Enoki et al. (1999) used Southern blot hybridization to analyze the behavior of Ac in 559 rice plants derived from four independent transgenic progenitors through three successive generations. The frequency of Ac transposition ranged from 8.3% to 40.9% in the four independent transgenic rice populations. This frequency was comparable to those reported in other heterologous systems except for Arabidopsis, in which the transposition frequency of Ac was shown to be very low (Schmidt and Willmitzer 1989). This study also demonstrated a preferential transposition of Ac into protein-coding sequences in rice, through the rescue and analysis of Ac flanking sequences.
Two-thirds of the rescued flanking sequences were shown to be homologous to predicted rice gene sequences (Greco et al. 2001a, 2001b). The frequency (15% to 50%) of Ac transposition detected by Greco et al. (2001a, 2001b) was similar to that reported by Enoki et al. (1999). Preferential transposition of Ds into coding regions has also been recently reported, with one-third of Ds flanking sequences showing homology to either protein coding sequences or to expressed sequence tags (ESTs) in rice (Kolesnik et al. 2004). Greco et al. (2001a) also showed that the transposition frequency of Ac in rice was inversely proportional to the Ac dosage (or copy number). Transformant lines harboring multiple copies of Ac resulted in a single transpositional event, whereas transformants with a single copy of the Ac induced multiple early transpositional events. This inverse correlation between the number of Ac excision events and Ac copy number had been previously observed in maize, and the timing of Ac excision in maize kernel development could be delayed by increasing the Ac copy number (McClintock 1950, 1951), but this effect depended on the level of transposase as well as the dosage and composition of the transactivated
element (Heinlein 1996). However, in dicots, there was a consistent increase in germinal transposition of Ac with increasing Ac copy numbers (Jones et al. 1989; Hehl and Baker 1990; Keller et al. 1993a). Very high levels of transposase expression, however, have been found to inhibit Ac transposition in maize, petunia, and tobacco, perhaps because of the aggregation of the transposase protein (Scofield et al. 1993; Heinlein et al. 1994). Once stably integrated Ds lines are generated, they can be crossed with Ac lines capable of producing active transposase. In the resulting F1 progeny, expression of the Ac transposase protein can induce active transposition of the Ds element (Shimamoto et al. 1993). It was shown by Southern blot analysis that transposition of Ds occurred in a high proportion of F2 plants (Izawa et al. 1997). Although transposition inhibition was observed in most Ds lines in later generations (Izawa et al. 1997), Ds inactivation may not be a general phenomenon. First, in the same study, Izawa et al. (1997) found one line with actively transposing Ds elements over several generations. Second, later studies have also shown that Ds transposition was active in subsequent generations. For example, the frequency of independent Ds transposition in the F2 generation was 3% to 20% in one study (Nakagawa et al. 2000) and the frequency of putative stable insertion lines was approximately 6% in the F2 and double transformant T1 (DtT1) generations, and 7% to 12% in the F3 and DtT2 generations in another study (Upadhyaya et al. 2002). Other studies have also shown activity of Ds even in F4 and F5 generations (Kolesnik et al. 2004; Szeverenyi et al. 2006). These results indicate that a high frequency of Ds transposition can be achieved throughout successive generations in rice using an Ac/Ds-based tagging system. However, it is important to note that the transposition frequency may vary greatly among different lines and crossing combinations (Izawa et al. 
1997; Nakagawa et al. 2000; Greco et al. 2001a; Upadhyaya et al. 2002; Kolesnik et al. 2004). Many factors could have caused these differences. First, the integration position of the Ds element may affect the binding efficiency of the transposase owing to the conformation or configuration of chromatin itself. The conformation and structure of the chromatin may also influence the reinsertion activity. If the targets and/or donor sites are difficult to access, the frequency of reinsertions may be low (Nakagawa et al. 2000). Therefore, the initial insertion site of a Ds element may be very important in determining its transposition frequency. The available large number of Ds insertion lines and corresponding Ds flanking sequences are valuable tools for investigating the effect of the initial Ds insertion site on its subsequent transposition. Second, the transposition of Ds is likely to be affected by the length or composition of the Ds construct itself (Ito et al. 1999). Third, imprecise excision and integration, by which the
termini of the transposon are deleted, may also result in transposon inactivation, as the integrity of the terminal inverted repeats of Ac and Ds is essential for their transposition (Ito et al. 2002). Fourth, inactivation of Ds could be promoted by either multiple Ds copies or by varying levels of transposase (Kolesnik et al. 2004). In cases in which inactivation of Ds was observed, multiple copies of Ds and/or the Ds elements coexisted with the Ac transposase (Izawa et al. 1997). Therefore it is important to select single-copy Ds transgenic plants as starter lines. Finally, the inhibition of Ds transposition may also result from epigenetic suppression. The epigenetic suppression could be relatively stable, resulting in inactivation of the Ds element, even in the presence of transposase (Wang and Kunze 1989; Kim et al. 2002). It has become clear that chromatin structure and methylation are the two main mechanisms involved in epigenetic regulation (Gendrel and Colot 2005). In the case of the maize Ac/Ds transposon, methylation of the terminal or subterminal regions of the element has been correlated with decreased mobility of these elements (Fedoroff and Chandler 1994). McClintock (1984) suggested that genomic stress could also trigger the activity of transposons. Reactivation of silent Ac in maize through tissue culture was found to be associated with alterations in the methylation pattern (Brettell and Dennis 1991). Reactivation of silent Ds in rice following tissue culture has also been reported (Kim et al. 2002). Therefore, the transposability of epigenetically inactivated transposons could be reversed via developmental reprogramming such as tissue culture.

It has been shown that Ac and Ds preferentially transpose to genetically linked sites in maize (Dooner and Belachew 1989) and in heterologous plant species such as tobacco (Jones et al. 1990), Arabidopsis (Keller et al. 1993b; Raina et al. 2002), and barley (Koprek et al. 2000).
In rice, the frequency of linked transposition has been shown to range from 35% to 80% (Nakagawa et al. 2000; Greco et al. 2001a; Upadhyaya et al. 2006). This is a disadvantage of using the Ac/Ds system for genome-wide mutagenesis. In contrast, the Spm/dSpm transposon has been shown to produce a high frequency of unlinked transpositions in Arabidopsis (Aarts et al. 1995). The feasibility of using the Spm/dSpm system for large-scale mutagenesis in rice has been investigated by two groups. Greco et al. (2004) found low frequencies of transposition. However, Kumar et al. (2005), using a new fluorescence marker-based screening system, observed high frequencies of stable dSpm insertion lines with a high proportion of insertions unlinked to the donor sites (launch pads). The conflicting results obtained in these two studies in rice could be due to differences in the length of the dSpm termini employed (Kumar et al. 2005). The 5΄ and 3΄ terminal sequences of the dSpm element used by Greco et al. (2004) were 267 and 640 bp, respectively, whereas those of
the element used by Kumar et al. (2005) were 1,014 and 1,193 bp, respectively. As in Arabidopsis, the dSpm element is likely to transpose to unlinked sites in rice more frequently than the Ds element; dSpm should therefore be more efficient than Ds for genome-wide coverage (Kumar et al. 2005).

The mutation frequency in progeny plants depends on the frequency of germinal transposition in the parental plant. Moreover, a high frequency of somatic transposition may create undesirable mutations due to secondary transpositions. In the Ac/Ds two-element system, transposition of Ds is achieved by expressing Ac-encoded transposase driven by a constitutive promoter such as CaMV 35S. To achieve maximal germinal transposition, an attempt has been made to control the expression of transposase by the use of a meiosis-specific promoter in rice (Morita et al. 2003). Although a much higher frequency of independent transposition was observed with the meiosis-specific promoter than with a constitutive promoter, the feasibility of using these promoters needs to be further investigated, as the overall transposition frequency reported so far is still very low.

10.2.2 One-Element System versus Two-Element System

Both the one-element and two-element systems have been used in transposon-mediated gene tagging in rice (Fig. 10.1). In the one-element system, an autonomous transposon (Ac or Spm) that encodes its own transposase, delivered through T-DNA, is used as a mutagen. The Ac element is usually inserted between a constitutively expressed promoter such as CaMV 35S and an excision marker. The excision marker is expressed upon excision of the Ac element, so that the transposition of Ac can be monitored (Izawa et al. 1991; Murai et al. 1991; Greco et al. 2001a). However, identification of Ac reinsertion in the rice genome relies solely on molecular analyses because the Ac element carries no selectable marker.
In the two-element system, two independent transgenic lines are generated: one with an immobilized autonomous element (wings-clipped Ac or Spm) that provides the transposase and the other with a nonautonomous element (Ds or dSpm) that is capable of transposition only in the presence of a transposase gene. In the nonautonomous element, selectable markers such as antibiotic or herbicide resistance genes are incorporated to select progeny bearing transposed elements. To monitor the transposition event, the nonautonomous element can be inserted between a promoter and an excision marker gene so that excision results in expression of the excision marker. Transgenic plants bearing these two transposable elements are crossed to induce transposition of the nonautonomous element. Stable Ds (or dSpm) insertion
lines can be easily selected based on the expression of selectable markers in the F2 or subsequent generations because the transposed nonautonomous element is likely to be unlinked to the autonomous element that encodes the transposase.
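The selection step described above can be illustrated with a simple Mendelian calculation. The sketch below rests on idealized assumptions that are not taken from the studies cited: the selfed F1 plant is taken to be hemizygous for one transposed Ds copy and for one transposase (iAc) locus, the two loci are taken to be unlinked, and segregation is taken to be normal. The function name is hypothetical.

```python
from fractions import Fraction


def f2_class_fractions():
    """Expected F2 classes from a selfed F1 plant (idealized assumptions).

    Assumes one hemizygous transposed Ds locus and one hemizygous
    transposase (iAc) locus, unlinked, with normal Mendelian segregation.
    """
    p_ds = Fraction(3, 4)      # F2 plants carrying at least one Ds copy
    p_no_ac = Fraction(1, 4)   # F2 plants that inherited no iAc locus
    return {
        "stable_insertion": p_ds * p_no_ac,           # Ds present, iAc absent
        "ds_with_transposase": p_ds * (1 - p_no_ac),  # Ds may still move
        "no_ds": 1 - p_ds,                            # Ds-null segregants
    }


if __name__ == "__main__":
    for label, fraction in f2_class_fractions().items():
        print(f"{label}: {fraction}")
```

Under these assumptions, 3/16 of F2 plants would carry Ds without the transposase source. Linkage between the transposed Ds and the launch pad, which is frequent for Ac/Ds, would lower that fraction.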
Fig. 10.1. One- and two-element transposon tagging systems (Ac/Ds as an example). (A) One-element system. Ac is inserted between a promoter and an excision marker, which is expressed on excision of Ac and is used to indicate the occurrence of transposition. Transposed Ac reinserts somewhere in the rice genome. If the insertion happens to be in an expressed gene, the function of the gene is impaired. The genomic sequence of the mutated gene can be isolated using Ac as a molecular tag. (B) Two-element system. iAc (immobile Ac) and Ds are introduced into two different transgenic plants. To induce Ds transposition, transgenic plants harboring iAc and Ds are crossed to bring iAc and Ds together. In the F1 generation, Ds transposes from the T-DNA (launch pad) and reinserts elsewhere in the rice genome. The transposition and reinsertion of the Ds element are monitored by the expression of the excision marker and reinsertion marker. In F2 and subsequent generations, stable Ds insertion plants are selected by segregating iAc away. The presence or absence of the iAc element can be determined by Ac-specific polymerase chain reaction (PCR) or by an Ac counterselection marker. Ac, Activator; Ds, Dissociation; Ex, excision marker; G, rice gene; P, promoter; RM, reinsertion marker; S, selectable marker for transformation
Numerous studies have shown that both the one- and two-element systems are suitable for gene tagging in rice (Chin et al. 1999; Enoki et al. 1999; Nakagawa et al. 2000; Greco et al. 2001a, 2003, 2004; Kohli et al. 2001; Upadhyaya et al. 2002, 2006; Eamens et al. 2004; Jin et al. 2004; Kim et al. 2004; Kolesnik et al. 2004; Kumar et al. 2005; Szeverenyi et al. 2005; van Enckevort et al. 2005). To simplify the discussion that follows, Ac and Ds will be used as examples of autonomous and nonautonomous elements, respectively, unless otherwise specified.

In the one-element tagging system, the insertion is unstable because the autonomous transposon remains capable of transposition throughout the life of the plant, which results in unstable mutations. Moreover, versatile selectable markers cannot be integrated into the autonomous element. Screening of transposed Ac elements relies on labor-intensive and costly polymerase chain reaction (PCR) and/or Southern blot analysis, which decreases the overall screening efficiency. Therefore, most laboratories are developing transposon-mutagenized populations using the two-element system. In this system, the Ac element provides transposase to induce the transposition of the Ds element. Usually, the Ac element is modified to disable its own transposition by removing its inverted terminal repeats (i.e., wings-clipped). The Ds element is extensively modified to incorporate a multitude of marker genes such as excision and reinsertion markers as well as a trapping reporter gene that can be activated by insertion adjacent to cis-acting elements (e.g., promoters and enhancers). In several recent studies, counterselective markers are also incorporated into the Ac construct to eliminate plants bearing the transposase gene and so stabilize the germinally transmitted Ds insertion. By appropriate combinations of marker genes, the efficiency of screening stable Ds insertion plants can be significantly improved.
In most two-element systems, the autonomous and the nonautonomous elements are constructed in two different vectors. They are brought together by crossing transgenic plants, each bearing one of the two elements. Alternatively, calli can be cotransformed with two vectors, one harboring the autonomous and the other the nonautonomous element, to regenerate double transformants. The two-element system can also be integrated in one T-DNA vector (Greco et al. 2003, 2004; Kumar et al. 2005). This system has been efficiently used in selection of unlinked transposition events (Kumar et al. 2005). However, if the transposed nonautonomous element remains linked to the launch pad from which the transposase is expressed, then the transposed nonautonomous element will be as unstable as that in the one-element system.
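The way marker combinations narrow down stable insertion plants in the two-element system can be sketched as a small decision function. This is only an illustration of the logic described above: the function name, argument names, and category labels are hypothetical, and real screens depend on the specific markers in each construct.

```python
def classify_plant(excision_marker_on: bool,
                   reinsertion_marker_on: bool,
                   transposase_present: bool) -> str:
    """Classify a progeny plant from three marker readouts (illustrative).

    excision_marker_on:    excision marker expressed (Ds has left the donor site)
    reinsertion_marker_on: marker carried inside Ds is detected in the plant
    transposase_present:   Ac transposase source detected (PCR or counterselection)
    """
    if not excision_marker_on:
        return "no excision detected"
    if not reinsertion_marker_on:
        return "excised, Ds not recovered"
    if transposase_present:
        return "unstable insertion (transposase still present)"
    return "stable insertion"


if __name__ == "__main__":
    print(classify_plant(True, True, False))
```

A plant is scored as a stable insertion only when excision and reinsertion are both indicated and the transposase source has segregated away, which mirrors the role of the counterselective marker on the Ac construct.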
10.2.3 Design of Constructs

In the one-element system, the three basic components of the construct are a selectable marker for producing primary transformants; an autonomous element (Ac or Spm); and an excision marker, that is, a gene into which the transposon is inserted and whose expression is restored on excision of the transposon. The most common selectable markers are antibiotic resistance genes such as nptII (KanR) and hph (HygR), or a herbicide resistance gene such as bar (BastaR) (Table 10.1). The most frequently used selectable marker in rice is hph.

In the two-element system, the T-DNA carrying the transposase needs only a selectable marker for transformation, but it can be modified to carry a negative selectable marker (NSM) to allow plants bearing the T-DNA to be counterselected or identified by screening (Fig. 10.2). This feature stabilizes the transposed nonautonomous transposon and the mutant phenotype. The basic components for the second T-DNA, which harbors the nonautonomous transposon, are a selectable marker for transformation and a reinsertion marker (RM) for tracing the transposition or reinsertion of the nonautonomous transposon. As discussed in the preceding text, this marker can be an antibiotic- or herbicide-resistance gene. The transactivation of the nonautonomous transposon can be detected with the reinsertion marker in combination with an excision marker (Fig. 10.2). Another feature of some constructs is a plasmid rescue system that consists of an ampicillin resistance gene (bla) and an E. coli plasmid origin of replication (pBR322 ori). This provides an alternative way to rescue transposon flanking sequences, although thermal asymmetric interlaced PCR (TAIL-PCR; Liu et al. 1995), inverse PCR (iPCR; Earp et al. 1990), and adapter ligation PCR (Siebert et al. 1995; Devic et al. 1997; Zhu et al. 2006a) are all more suitable approaches for large-scale flanking sequence tag (FST) rescue.
Further improvements of the basic construct can be made to increase the frequency of clean single-copy T-DNA insertion lines, which will greatly benefit subsequent segregation analysis. For example, in a newly designed construct, pNU435, a maize ubiquitin promoter-driven, intron-interrupted barnase gene is used as a vector backbone (VB) counterselective gene. Transformed cell lines with VB-containing T-DNA inserts will be eliminated by the activity of the barnase gene. Moreover, in this construct a clean T-DNA insert will allow the ubiquitin promoter positioned near the left border (LB) to act as a dormant gene activator. A second copy of a promoter-less, intron-interrupted barnase-nosT cassette, placed within the T-DNA and next to the right border (RB) of this construct, also has the potential to serve as a T-DNA direct repeat (RB-LB-RB-LB) counterselector. The rationale of this design is that a directly repeated T-DNA transgene will have a strong ubiquitin promoter upstream of the barnase gene
adjacent to the RB and that the resulting cell lines will be eliminated by the activity of the barnase gene (Upadhyaya et al. 2006).

Table 10.1. Selectable and screening marker genes used in production of transposon insertion lines

| Tagging system | Selectable marker for transformation | Transposition (reinsertion) marker | Excision marker | Launching pad indicator | Transposase counterselectable marker | Reference |
|---|---|---|---|---|---|---|
| Ac | hph | na | hph | hph | hph | Enoki et al. 1999 |
| Ac | hph | na | gfp | gfp/hph | hph/gfp | Greco et al. 2001a; Kohli et al. 2001 |
| Ac/DsG | hph | bar | na | hph | na | Chin et al. 1999 |
| Ac/Ds | hph | hph | nptII | nptII | na | Nakagawa et al. 2000 |
| Ac/Ds | nptII | hph | SPT | SPT | na | Nakagawa et al. 2000 |
| Ac/DsG, Ac/DsE | hph | nptII | na | gfp/hph | hph | Upadhyaya et al. 2002 |
| Ac/DsE | hph | bar | na | gfp/hph/P450 | gfp/P450 | Greco et al. 2003 |
| Ac/DsG | hph | hph | na | na | gfp | Eamens et al. 2004 |
| Ac/DsE | nptII | hph | ALS | ALS | na | Ito et al. 2004 |
| Ac/DsG | hph | bar | na | gfp/hph | gfp | Kolesnik et al. 2004 |
| Ac/DsG | P450 | bar | na | P450 | P450 | Kim et al. 2004 |
| Ac/DsG | hph | nptII | bar | hph | gfp | Upadhyaya et al. 2006 |
| Ac/DsE | bar | bar | hph | hph | gfp | Upadhyaya et al. 2006 |
| Spm/dSpm (En/I) | hph | bar | na | gfp/hph/P450 | hph/P450 | Greco et al. 2004 |
| Spm/dSpm (En/I) | hph | DsRed | na | gfp/hph | gfp/hph | Kumar et al. 2005 |

Ac, Activator; ALS, acetolactate synthase gene; bar, Basta resistance gene; Ds, Dissociation; DsE, Ds enhancer trap; DsG, Ds gene trap; dSpm, defective suppressor-mutator; En, Enhancer; gfp, green fluorescent protein gene; hph, hygromycin phosphotransferase gene; I, Inhibitor; na, not applicable; nptII, neomycin phosphotransferase gene; Spm, suppressor-mutator; SPT, streptomycin phosphotransferase gene
Fig. 10.2. Diagrams of typical constructs used in the two-element transposon tagging system in rice. (A) T-DNA harboring transposase, which is encoded by an autonomous transposon (Ac or Spm). To increase the efficiency of screening stable insertion lines, a negative selection marker (NSM) is usually integrated in the T-DNA for counterselection. (B) to (F) T-DNAs harboring nonautonomous transposon (inverted triangles). The basic components of these T-DNAs are a selectable marker (S) for selection of primary transformants and a reinsertion marker (RM) for tracing the reinsertion events, but more markers can be combined to facilitate selection of stable insertion lines. In construct B, transposition of the nonautonomous element is monitored via an excision marker (Ex), and the reinsertion marker also serves as a selectable marker of transformation (for details see Nakagawa et al. 2000). Construct C is the most frequently used vector, in which the selectable marker is different from the reinsertion marker, and another marker (NSM) is used to distinguish the transposition events linked or unlinked to the launch pad (LP) T-DNA. A reporter gene with no promoter or only a minimal promoter is fused at one end of the transposon to serve as a gene or enhancer trap reporter. Because there is no excision marker, PCRs are required to confirm that the transposon has excised from the LP in the selected stable insertion lines. In some cases a plasmid rescue system (PRS) is incorporated within the transposon for isolation of the genomic DNA flanking the transposon. To increase screening and trapping efficiency, the nonautonomous transposon is inserted between a promoter (P) and an excision marker (Ex) that will be expressed on excision of the transposon, and both ends of the transposon are fused with reporter genes (construct D; for details see Upadhyaya et al. 2006). This construct is further enhanced by incorporation of two copies of barnase genes, one within the T-DNA next to the RB and another within the vector backbone outside the LB, for selection of lines with clean single-copy T-DNA insertions preferably in intergenic regions (construct E; for details see Upadhyaya et al. 2006). The autonomous and nonautonomous transposons can also be integrated in the same vector as shown in construct F. As in construct C, confirmation of excision of the nonautonomous transposon relies on PCR (for details see Greco et al. 2003, 2004; Kumar et al. 2005)

Although the bar gene is not an ideal selectable marker at the callus stage, it is an excellent selectable marker at the plant stage and is the preferred transposition or reinsertion marker to select for the presence of the Ds transposon in rice (Chin et al. 1999; Kim et al. 2004; Kolesnik et al. 2004; Upadhyaya et al. 2006). This is due primarily to the fact that a single spray of Basta can eliminate Ds-null segregants and thus greatly increase the overall screening efficiency. A new positive selection marker, DsRed (Discosoma sp. red fluorescence protein), has been used in an Spm/dSpm tagging system and has been shown to be very efficient for the selection of transposants (Kumar et al. 2005). Currently, most transposon tagging systems used in rice do not have an excision marker to detect the excision of the Ds element from the donor site (Chin et al. 1999; Upadhyaya et al. 2002; Greco et al. 2003, 2004; Eamens et al. 2004; Kolesnik et al. 2004; Kumar et al. 2005). PCR-based analysis has to be performed to confirm the excision of the Ds element. Incorporation of an excision marker for the nonautonomous element will significantly increase the screening efficiency (Fig. 10.2; Upadhyaya et al.
2006). For example, the constructs pNU393A1/B2 and pNU435 have a CaMV 35S promoter-driven intron-interrupted hph gene cassette as the excision marker (Upadhyaya et al. 2006). This choice was based on the experience that this cassette works well as a selectable marker in rice transformation using embryogenic calli as the target tissue (Upadhyaya et al. 2000). It has been shown that this excision marker is particularly advantageous for selecting callus lines with Ds excision using the transiently expressed transposase (TET) system (Upadhyaya et al. 2006). An excision marker, in combination with a Ds reinsertion marker, is particularly useful for distinguishing Ds insertions either linked or unlinked to the T-DNA launch pad. Another feature that is incorporated in most of the constructs used in producing transposon mutagenized populations in rice is the gene or enhancer trap system (see next section for details).

10.2.4 Gene and Enhancer Traps

Insertional mutagenesis by transposon tagging is useful when disruption of a gene leads to an obvious phenotype. But in eukaryotes, disruption of a gene frequently does not result in a visible phenotype because of functional redundancy between gene family members (Sundaresan 1996; Ramachandran and Sundaresan 2001). To overcome this difficulty, gene and enhancer trap systems have been developed for use in transposon tagging systems in plants (Sundaresan et al. 1995; Springer 2000). An enhancer trap harbors a minimal promoter fused to the open reading frame of a reporter gene. On insertion at or near a host gene, the minimal promoter may be cis-activated by enhancer elements in the host gene leading to expression of the reporter gene. A promoter trap contains a promoter-less reporter gene that is expressed when the transposon inserts downstream of an active endogenous promoter. A gene trap contains an intron with multiple splice acceptor sites fused to the coding region of the reporter gene.
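The reporter-activation rules for the three trap types described in this section can be summarized as a small decision function. This is an illustrative sketch only: the function and argument names are hypothetical, the gene trap rule encodes the requirement that the element lie within an exon or intron of the host gene in the same transcriptional orientation, and factors such as reading frame, splicing efficiency, and tissue context are not modeled.

```python
def reporter_expected(trap_type: str, *,
                      enhancer_nearby: bool = False,
                      downstream_of_active_promoter: bool = False,
                      inside_host_gene: bool = False,
                      same_orientation: bool = False) -> bool:
    """Return True if reporter expression is expected (simplified rules)."""
    if trap_type == "enhancer":
        # Minimal promoter is cis-activated by a nearby host enhancer.
        return enhancer_nearby
    if trap_type == "promoter":
        # Promoter-less reporter relies on an active endogenous promoter.
        return downstream_of_active_promoter
    if trap_type == "gene":
        # Splice acceptors allow a fusion transcript from within an exon
        # or intron, but only in the host gene's transcriptional orientation.
        return inside_host_gene and same_orientation
    raise ValueError(f"unknown trap type: {trap_type!r}")


if __name__ == "__main__":
    print(reporter_expected("gene", inside_host_gene=True, same_orientation=True))
```

The orientation requirement in the gene trap branch is the reason a unidirectional gene trap reports only a subset of the insertions that land inside genes.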
A fusion protein of the reporter gene with the N-terminal portion of a host gene will be produced when the element is inserted into either an exon or an intron of the host gene in the same transcriptional orientation (Fig. 10.3). Both enhancer and gene trap systems have been used in rice by incorporation of these features into Ac/Ds or Spm/dSpm transposon tagging systems (Chin et al. 1999; Upadhyaya et al. 2002, 2006; Greco et al. 2003, 2004; Ito et al. 2004; Kim et al. 2004; Kolesnik et al. 2004). The most frequently used reporter genes are uidA (gusA) and green fluorescent protein (gfp). A clear advantage of these trap systems is that the expression pattern of the tagged gene can be studied in detail by analyzing the GUS or GFP expression pattern during plant development. Such detailed knowledge of the
expression pattern can be very helpful in subsequent phenotypic analysis of homozygous insertion mutants.

Fig. 10.3. Gene and enhancer trap systems. (A) Gene trap system; the transposable element (TE) has a promoter-less reporter gene (R), which contains splice acceptor (SA), at its 5΄ or 3΄ end. The reporter gene is expressed when the TE inserts into an intron, due to the creation of a fusion transcript (and therefore a fusion protein) by the interaction of splice donor of the gene and the SA of the reporter gene. (B and C) Enhancer trap system. The minimal promoter (TATA) of the reporter gene (R) is activated by a chromosomal enhancer (E), which can be in the same or complementary orientation as the TE, resulting in the expression of the reporter gene

The disadvantage of such unidirectional trapping systems is that there is neither selection against insertions outside genes, nor against insertions in which the reporter gene is in the opposite orientation relative to transcription of a tagged gene (Maes et al. 1999). This drawback has partly been addressed in a bidirectional trap system developed by Eamens et al. (2004). In the first series of bidirectional gene trap constructs (pEU334a/b), immediately inside the RB and LB borders are the Ds5΄ and Ds3΄ sequences, respectively. A promoter-less gfp gene (sgfpS65T), fused to the fourth intron of the Arabidopsis G protein gene (GPA1), containing splice acceptor sites in all three reading frames and a nopaline synthase terminator (nosT), were placed in 5΄−3΄ orientation as the RB or Ds5΄ gene trap. A promoter-less gus gene (uidA), fused to a GPA1 intron and nosT, was included as the LB or Ds3΄ gene trap. A CaMV
35S promoter-driven, intron-interrupted hph chimeric gene was incorporated in the same orientation as the GUS-based gene trap to act as either (1) a selectable marker following the initial Agrobacterium-mediated transformation event or (2) a subsequent Ds reinsertion marker following Ds transposition from the T-DNA launch pad. The more recent construct pNU435 contains not only this proven bidirectional gene trap Ds cassette (Ds3'-GPA1-SA-uidA-nosT and Ds5'-GPA1-SA-eyfp-nosT), but also harbors two barnase genes located inside the RB and outside of the LB to counterselect against directly repeated T-DNA or VB integrations, respectively (Upadhyaya et al. 2006).

10.2.5 Transiently Expressed Transposase System

Transient expression of introduced foreign DNA in target plant cells, which occurs before any stable integration through illegitimate recombination or its breakdown by the plant surveillance system, is a well-known phenomenon. A burst of transient expression of the genes carried by the introduced T-DNA can be visualized by reporter gene expression within 48 to 72 hours of cocultivation (N. M. Upadhyaya et al., unpublished data). A transient assay is usually used to assess the transactivation of the Ac transposase-mediated excision of the Ds element prior to its stable integration into the plant genome. This type of transient assay has been performed in barley, rice and wheat using a Ds-interrupted uidA reporter gene (McElroy et al. 1997; Solis et al. 1999; Takumi et al. 1999). By cotransformation with an Ac construct and a Ds-interrupted uidA construct, the Ac-mediated transactivation of the Ds element can be measured by the expression of the uidA gene. Recently, Upadhyaya et al. (2006) have developed a system where a transiently expressed transposase (TET) is used to produce stable Ds insertion lines (i.e., without an integrated Ac element) in rice. The main advantage of the TET system is that stable Ds insertion lines can be produced as primary transformants.
In contrast, with the Ac/Ds crossing system, the first available screening population is F2. To overcome somaclonal variation induced by tissue culture, which has been a major drawback with T-DNA insertional mutagenesis (An et al. 2005), the tissue culture phase in the TET system is kept to the absolute minimum (Upadhyaya et al. 2006).

10.2.6 A High-Throughput System to Index Transposants

In producing an indexed and saturated insertional-mutant library, the final step is to determine the chromosomal location of each Ds transposant (Fig. 10.4). Up to now, most investigators have mainly focused on producing random, saturation mutant libraries. The different laboratories together
have produced more than 150,000 Ds transposants (Table 10.3). However, the chromosomal locations of only 12% of these transposants have been determined by flanking sequence analysis. For these analyses, most investigators use the TAIL-PCR method. Researchers at Cornell University have recently developed a novel long-PCR-based high-throughput procedure to determine the chromosomal location of a large number of Ds transposants to construct an indexed, region-specific, insertional-mutant library (He et al. 2007). The procedure couples long PCR with an anchored population, which allows rapid and simultaneous determination of the chromosomal location of thousands of insertional mutants. Its principle is to measure the transposition distance between a Ds transposant and a specific genomic sequence of interest in rice. Since the long-PCR procedure can amplify a genomic sequence of greater than 10 kb, all the transposants that transpose within this region can be captured simultaneously. Measuring the size of the long-PCR products by comparison with DNA size markers may have an error of up to 3%; thus the end of a 10-kb fragment may be 10 ± 0.3 kb away from the specific Ds primer position. However, this degree of accuracy is sufficient to meet the requirements of this system.

Ds plants x Ac-TPase plants
-> F1 plants
-> Self-pollination
-> F2 families
-> Basta and hygromycin selection (BastaR and HygR families; BastaS and HygR families)
-> Self-pollination
-> F3 families (BastaR and HygR)
-> DNA isolation and PCR to display transposition events
-> EDS+ and Ds+ siblings per family
-> Plant EDS+ and Ds+ siblings in soil
-> Determine chromosomal location of each transposant by a high-throughput long-PCR procedure

Fig. 10.4. A flow chart for generating a large-scale, indexed, Ds transposant population in rice. EDS, empty donor site
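The size-measurement tolerance quoted above can be worked through numerically. The following is a minimal sketch, assuming a simple symmetric error of up to 3% on the measured product length. The function name and the use of whole base pairs are choices made for illustration and are not part of the published procedure.

```python
def size_window_bp(measured_bp: int, error_percent: int = 3) -> tuple[int, int]:
    """Return the (min, max) length in bp consistent with a measured product.

    Assumes a symmetric relative error; integer arithmetic avoids
    floating-point rounding in the worked example.
    """
    delta = measured_bp * error_percent // 100
    return measured_bp - delta, measured_bp + delta


if __name__ == "__main__":
    low, high = size_window_bp(10_000)
    print(f"A 10-kb product is consistent with {low} to {high} bp")
```

For a 10-kb product the window is 9,700 to 10,300 bp, which corresponds to the 10 ± 0.3 kb figure given in the text.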
This approach has been tested by attempting to determine the chromosomal location of transposants derived from three anchor lines (launch pads). The results from one of these anchor lines are as follows. Out of a total of 249 transposants, 72 (29%) transposed to new positions on the same chromosome, and 20 (8% of the total) were within a 400-kb region flanking the anchor position. In principle, all transposants within this 400-kb region can be captured by using 40 pairs of PCR primers, with one primer pair positioned every 10 kb. The recurrent primer is based on a short sequence complementary to a portion of the 5΄ Ds sequence, whereas the 40 variable primers are based on genomic sequences chosen from the DNA database. Only 249 transposants around this anchor line were collected, and thus many more are needed to truly saturate this 400-kb region. Since the transposition process is not random, approximately 5,000 transposants may be needed to saturate this region, with the expectation that up to 400 transposants may be found within it. Based on previous experience, it takes approximately two person-months to obtain 500 flanking sequences via the TAIL-PCR procedure. Thus, 20 person-months would be needed to analyze 5,000 transposants. On the other hand, using the long-PCR procedure, it would take only four person-months to analyze 5,000 transposants. According to Muskett et al. (2003), genes of approximately 0.5 kb in size account for between 10% and 20% of all Arabidopsis genes. If rice has the same percentage of small genes, the number of transposants required would need to be increased from 5,000 to approximately 20,000 to saturate a 400-kb region and tag all genes within this region. If one uses the TAIL-PCR procedure to determine the flanking sequences, the amount of work would be proportional to the number of transposants. Thus, 80 person-months would be needed to determine the flanking sequences of 20,000 transposants.
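The primer tiling and the TAIL-PCR workload figures quoted so far follow from simple proportional arithmetic, which can be checked as below. This is a sketch under the stated figures only (two person-months per 500 flanking sequences, one primer pair per 10 kb); the function names are hypothetical.

```python
def primer_pairs_needed(region_kb: int = 400, spacing_kb: int = 10) -> int:
    """Number of primer pairs when one pair is placed every spacing_kb."""
    return region_kb // spacing_kb


def tail_pcr_person_months(n_transposants: int,
                           sequences_per_batch: int = 500,
                           months_per_batch: int = 2) -> float:
    """TAIL-PCR effort, assumed strictly proportional to sample number."""
    return n_transposants * months_per_batch / sequences_per_batch


if __name__ == "__main__":
    print(primer_pairs_needed())              # primer pairs for 400 kb
    print(tail_pcr_person_months(5_000))      # person-months for 5,000
    print(tail_pcr_person_months(20_000))     # person-months for 20,000
    # Long-PCR estimate quoted in the text for 5,000 transposants:
    long_pcr_months_5000 = 4
    print(tail_pcr_person_months(5_000) / long_pcr_months_5000)
```

With these inputs the sketch reproduces the 40 primer pairs and the 20 and 80 person-month TAIL-PCR estimates, and gives a fivefold saving for long PCR at 5,000 transposants.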
On the other hand, by using the long-PCR procedure, only eight person-months of work would be required to achieve the same goal. Thus, the long-PCR procedure would be ten times more efficient than the TAIL-PCR procedure (He et al. 2007). The reconstruction experiment has shown that at least 100 transposants can be pooled together for DNA isolation, and 100 pools can be employed simultaneously to analyze 10,000 transposants (He et al. 2007). In principle, the entire rice genome can be saturated using this method, with the participation of many scientists around the world, each group working on several specific Ds anchor lines at a time.

10.2.7 Using Endogenous Transposons

Transposable elements are a major component of the repetitive DNA that comprises more than 40% of the rice genome (Goff et al. 2002; Yu et al.
2002). Four types of active endogenous transposable elements have been identified in rice. The LTR (long terminal repeat) retrotransposons Tos10, Tos17, and Tos19 were the first identified (Hirochika et al. 1996), and Tos17 has been used in large-scale mutagenesis in rice. Karma is a LINE (long interspersed nuclear element)-type retrotransposon showing continuous transposition in consecutive generations (Komatsu et al. 2003b). The presence of active MITE (miniature inverted repeat transposable element) sequences, such as miniature Ping (mPing), has also been revealed through the analysis of rice genomic sequences (Jiang et al. 2003; Kikuchi et al. 2003). All of these native rice transposable elements are dormant under normal conditions and become active during tissue culture (Hirochika et al. 1996; Jiang et al. 2003; Kikuchi et al. 2003; Komatsu et al. 2003b) or after treatment with inducing agents such as γ-irradiation (Nakazaki et al. 2003).

Tos17 is 4,114 bp long. Its copy number in the rice genome is quite low compared with other endogenous retrotransposon classes. Nipponbare contains only two copies of Tos17 per haploid genome. After tissue culture-induced activation, Tos17 could be amplified to approximately 30 copies (Hirochika 2001). Three characteristics make Tos17 an ideal mutagen for saturation mutagenesis in rice. First, the copy number of Tos17 correlates with the duration of tissue culture, making it possible to control the number of Tos17 copies. Second, Tos17 tends to transpose to unlinked positions. Third, Tos17 prefers low-copy-number sequences and genes as integration targets (Yamazaki et al. 2001; Miyao et al. 2003). A population of approximately 50,000 Tos17 insertion lines containing approximately 500,000 mutated sites has been generated and is available for public use (Hirochika et al. 2004).
The feasibility of using this Tos17 insertion population for screening of targeted mutants (i.e., the forward genetics approach) has been demonstrated by the cloning of several important genes (Table 10.2). A reverse genetics strategy is perhaps more powerful, because three-dimensional DNA pools of Tos17 insertion mutants are available and more than 15,000 Tos17 FSTs have been catalogued for searching for knocked-out genes (http://tos.nias.affrc.go.jp/). Accordingly, more mutants have been identified and characterized by the reverse genetics strategy than by forward genetics approaches (Table 10.2).

Table 10.2. Genes identified using Tos17-tagged mutants

Gene name | Mutant phenotype | Strategy | Reference
OsH15 | Dwarf | Reverse | Sato et al. 1999
OsABA1, OsTATC | Precocious germination | Forward | Agrawal et al. 2001
OsPHYA | Etiolated seedlings | Reverse | Takano et al. 2001
OsHOS59 | No phenotypic changes | Reverse | Ito et al. 2002
OsMSP1 | Excessive number of both male and female sporocytes, disordered anther wall layers and loss of the tapetum layer | Forward | Nonomura et al. 2003
OsCesA4, 7 and 9 | Brittle culm due to dramatically reduced cellulose contents | Forward | Tanaka et al. 2003
OsCHLH | Albino | Reverse | Jung et al. 2003
OsCPS1, OsKS1, OsKO2, OsKAO | Dwarf | Forward | Sakamoto et al. 2004
OsGAMYB | Shortened internodes, defects in floral organ and pollen development | Reverse | Kaneko et al. 2004
OsPAIR1 and 2 | Male and female sterility | Forward | Nonomura et al. 2004a, 2004b
OsUDT1 | Male sterility | Reverse | Jung et al. 2005
OsCAO1 | Pale green leaves | Reverse | Lee et al. 2005
OsTPC1 | Reduced defensive response | Reverse | Kurusu et al. 2005
OsGS1;1 | Severe retardation in growth rate and grain filling | Reverse | Tabuchi et al. 2005
OsMADS1 | Complete conversion of lodicules, stamens, and carpels into lemma- and palea-like structures | Reverse | Agrawal et al. 2005
OsMADS3 | Transformation of stamens into lodicules and ectopic development of lodicules in the second whorl near the palea | Reverse | Yamaguchi et al. 2006
OsFON1 | Semi-dwarf, fewer tillers and secondary rachis-branches, enlarged shoot apical meristem and altered floral organs | Reverse | Moon et al. 2006
OsSSI | Increased gelatinization temperature of endosperm starch, but no effect on the size and shape of seeds | Reverse | Fujita et al. 2006
OsCLC-1 and -2 | Inhibition of growth at all life stages | Reverse | Nakamura et al. 2006

The mobility of mPing/Pong in rice has provided the possibility of using this type of transposon for gene tagging in a similar way to that used in Tos17 mutagenesis. However, their high copy number (dozens, or even more) and currently unclear transposition frequency make them impracticable for large-scale mutagenesis in the foreseeable future. Recently, a fourth active endogenous transposable element, nDart (nonautonomous DNA-based active rice transposon), a member of the hAT transposon superfamily, has been identified by the analysis of spontaneous mutable alleles (Fujino et al. 2005; Tsugane et al. 2006). The nDart element has identical 19-bp terminal inverted repeats (TIRs), and generates 8 bp of target site duplication (TSD) on insertion. The transposition of the nDart element can be induced by crossing with a line containing aDart, the corresponding autonomous element. The nDart insertions can then be stabilized by segregating away the aDart element (Fujino et al. 2005; Tsugane et al. 2006). Therefore, nDart/aDart forms an endogenous transposon
mutagenesis system in rice and is a potential new tool for gene tagging in this species. Polymorphism analysis of several japonica and indica varieties has shown that nDart is amplified independently in the genomes of these two rice subspecies (Fujino et al. 2005), and that a high frequency of transposition of nDart is observed in lines, such as H-26, that carry the aDart elements (Tsugane et al. 2006). Sequence analyses have revealed that Nipponbare contains at least 18 nDart elements and 12 dormant iDart (inactive Dart) elements but no aDart element (Fujino et al. 2005; Tsugane et al. 2006). iDart elements are structurally similar to aDart but appear to be epigenetically silenced, because they can induce transposition of the nDart elements after treatment with 5-azaC (Tsugane et al. 2006). Unlike the MITE transposon mPing/Pong, which preferentially inserts into AT-rich regions (Jiang et al. 2003; Kikuchi et al. 2003), nDart elements seem to transpose randomly in the rice genome since no conserved TSDs are found. This is an advantage for using nDart/aDart in transposon mutagenesis in rice. Before nDart/aDart transposons can be employed as a functional genomics tool in rice, further investigations are required to determine the transposability (induced by aDart or chemical treatment) and target preferences of nDart. In addition, because of the high copy number of the nDart element, efficient approaches for progeny analysis also need to be developed.

10.2.8 Inducible Transposition

The efficiency of the current transposon tagging systems used in rice depends on whether the transposons can be efficiently controlled and stabilized after their transposition. In the one-element system, the autonomous transposon (Ac or Spm) retains its potential to excise from the inserted gene, resulting in chimeric progeny plants.
Although this disadvantage has been overcome by the two-element system, the transposition time of the nonautonomous transposon (Ds or dSpm) remains unregulated due to the constitutively expressed transposase. To overcome this disadvantage, a self-stabilizing Ac derivative (Ds303), which undergoes autonomous transposition from the T-DNA but is stabilized once integrated (unless activated again by a subsequently introduced transposase source), has been investigated in tomato (Schmitz and Theres 1994). An ideal strategy would be to control transposase expression by means of an inducible promoter. Transposon constructs in which the expression of the transposase is controlled by heat-shock or chemically inducible promoters have been developed and used in tobacco, tomato, rice, and Arabidopsis (Charng et al. 2000; Nishal et al. 2005; G-L Wang’s group). In the INAc (Inducible Ac) vector, the transposase is driven by the PR-1a promoter that is induced by
salicylic acid, and this component, together with a selectable marker (hph), is inserted in the internal region of the Ds element that is, in turn, inserted between the 1′ promoter and the 5′ untranslated region of the luciferase (LUC) gene. In this construct, LUC is the excision marker and hph serves as both a transformation-selectable marker and a Ds transposition marker. Transposition of the Ds element is induced by the application of salicylic acid and is stabilized in the absence of salicylic acid (Charng et al. 2000). Spontaneous transposition of the Ds element is low in tobacco but much higher in tomato. The induced transposition frequency depends on the concentration of salicylic acid. This construct has also recently been used to produce transgenic rice plants. Inducible transposition has been demonstrated in a salicylic acid dose-dependent mode, but high spontaneous transposition occurred in some transgenic rice lines (Charng et al. 2007). The main drawback of the INAc construct is that the inducible transposase cannot be segregated away in the progeny because it is integrated as a part of the Ds element. The transposase source and the nonautonomous element should be separated, even for the inducible system, to avoid undesired additional transpositions due to autonomously expressed transposase.

A heat-shock promoter fused to the Ac element has been shown to be able to induce the transposition of Ds in Arabidopsis (Balcells et al. 1994). More recently, this heat-shock promoter has been integrated into a gene trap system and successfully used in Arabidopsis to produce large numbers of Ds insertion lines (Nishal et al. 2005). In this system, the Ac transposase, whose expression is induced by heat-shock at the flowering stage, is engineered in the same vector as the Ds element, which has nptII as a reinsertion marker and SPT (streptomycin resistance) as an excision marker.
This system could readily be adopted in rice, but its feasibility in large-scale transposon mutagenesis in rice still needs to be investigated because the optimal time for heat-shock treatment is during the reproductive stage (as shown in Arabidopsis), which may not be practicable in rice, particularly for large-scale treatment. Another inducible transposon tagging system being developed in rice is a dexamethasone (DEX)-inducible activation-transposon-tagging system (G-L Wang’s group). In this system, the Ac transposase and Ds transposition are controlled by the transcription activator GVG, which is regulated by the application of DEX. An approach using the Cre-lox site-specific recombination system to delete the Ac transposase (thereby stabilizing the transposed Ds elements) once Ds transposition has been induced is also being investigated in rice by this group.
10.3 Mutagenesis Strategies

10.3.1 Random or Non-targeted Mutagenesis

In this strategy, starter lines homozygous for the autonomous (usually immobilized) or nonautonomous transposon insertions are produced and selected for crossing to produce F1 progeny. The F1 progeny are heterozygous for both the autonomous and the nonautonomous element. In this generation, the nonautonomous element transposes to new locations from the T-DNA (launch pad) under the influence of the autonomous element. The F2 populations are then produced by selfing F1 plants and screened with the selectable markers (excision and transposition markers) or by PCR analyses to select stable insertion lines in which the nonautonomous element has excised from the launch pad and reinserted into the rice genome and the autonomous element has segregated away. This procedure can be used to generate a large number of plants with transposed nonautonomous elements. Assuming that the nonautonomous element transposes randomly and inactivates rice genes, genome-wide (i.e., 400 Mb) saturation mutagenesis of all rice genes (assumed to number 60,000) would require a mutagenized population of 180,000 to 460,000 (Hirochika et al. 2004). The population required is affected by the number of transposon copies in the rice genome as well as the transposition frequency and integration patterns (linked or unlinked). Similar to previously observed results in other plant species, a high proportion of the transposed Ds elements insert at sites that are closely linked to the launch pad (Upadhyaya et al. 2002, 2006). Thus Ac/Ds may be an inefficient general mutagen, but could be highly efficient for regional mutagenesis. One way to facilitate global mutagenesis is to select unlinked transpositions using a launch pad indicator (e.g., the excision marker or other markers integrated in the Ds/T-DNA launch pad).
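The population sizes quoted above can be related to a standard saturation estimate. The following is a minimal sketch, not the calculation used by Hirochika et al. (2004): it assumes that every insertion is independent and equally likely to land in any one of 60,000 equal-sized gene targets, and the function name and the 95% confidence level are assumptions introduced here for illustration. Under those assumptions, the number of insertions N needed to hit a given gene with probability P is N = ln(1 − P) / ln(1 − 1/G), where G is the number of targets.

```python
import math


def insertions_for_coverage(n_targets: int, p_hit: float) -> int:
    """Insertions needed so that a given target is hit with probability p_hit.

    Idealized model (an assumption, not taken from the chapter): insertions
    are independent and each lands in any one of n_targets equal-sized
    targets with the same probability.
    """
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 < p_hit < 1.0:
        raise ValueError("p_hit must be strictly between 0 and 1")
    per_insertion = 1.0 / n_targets
    # Solve 1 - (1 - per_insertion)**N >= p_hit for the smallest integer N.
    return math.ceil(math.log(1.0 - p_hit) / math.log(1.0 - per_insertion))


# 60,000 genes as assumed in the text; 95% is a hypothetical confidence level.
n95 = insertions_for_coverage(60_000, 0.95)
print(n95)
```

With these idealized inputs the 95% figure comes out close to the lower bound of 180,000 quoted in the text. The assumptions behind the upper bound of 460,000 are not stated here, so the sketch does not attempt to reproduce it.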
Using GFP as a counterselective marker for the launch pads, unlinked transposition events are significantly enriched to reach more than 80% (Kolesnik et al. 2004). An alternative is to select for Ds starter lines with insertions evenly distributed throughout the rice genome and then to use these starter lines for localized saturation mutagenesis. Considering that 50% of transposed Ds elements insert within 1 Mb (approximately 4 cM in rice) of the genomic region flanking the donor site, 430 Ds starter lines that are evenly distributed throughout the rice genome at a 1-Mb interval and approximately 400,000 F2 plants (~930 F2 plants need to be produced from each Ds starter line) could be sufficient to saturate the whole genome. Other assumptions for this estimation are: the rice genome contains 50,000 genes; the size of the rice genome is 430 Mb; the frequency of independent transposition in the F1 generation is 50%; each gene within the 1-Mb genomic region flanking the Ds donor site has the same probability of
being mutagenized. The outcome will be better if more Ds starter lines are used because fewer F2 plants are required for each cross combination, thus increasing the frequency of independent transpositions. The major difficulty with this approach is to establish the starter lines. Considering that a large number of Ds insertion lines have been generated in several laboratories, and a good number of Ds FSTs have been rescued and mapped, the first set of the Ds starter lines could be selected from currently available resources. The regions devoid of Ds insertions can then be mutagenized using these Ds starter lines for “transposon-walking” strategies. Toward this end, 74 single-copy Ds/T-DNA launch pads that are relatively evenly distributed amongst the 12 rice chromosomes have been produced (Upadhyaya et al. 2006). The Spm/dSpm system can be used in a complementary way for saturation mutagenesis as it has been revealed that Spm/dSpm does not show preferential transposition in rice (Kumar et al. 2005).

10.3.2 Localized or Targeted Mutagenesis

Although there is no distinct difference between localized and targeted mutagenesis, localized mutagenesis is more focused on saturation of a particular chromosomal region, in a way similar to that discussed in the preceding text for localized saturation mutagenesis, while the main aim of targeted mutagenesis is to identify specific genes. The utility of the Ac/Ds system for localized insertional mutagenesis in Arabidopsis has been demonstrated by several studies (Long et al. 1997; Dubois et al. 1998; Ito et al. 1999, 2002; Muskett et al. 2003) and has now been extended to rice (Upadhyaya et al. 2006). Targeted transposon mutagenesis was first developed in Drosophila as a means of isolating mutants associated with a cloned gene (Kaiser and Goodwin 1990).
The first successful example of using this approach in plant gene identification was the isolation of the tomato fungal resistance gene Cf-9 using a Ds located 3 cM away (Jones et al. 1994). The Ac/Ds system has also been successfully employed in a targeted tagging strategy where the FATTY ACID ELONGATION1 gene was targeted and cloned using Ac as a molecular tag (James et al. 1995). In rice, a large number of genes of interest have been mapped based on QTL analysis or other methodologies. With the availability of the whole rice genome sequence, these genes of interest could be isolated by map-based cloning approaches, but this is a time-consuming process. A more straightforward strategy for cloning these target genes is to use targeted transposon mutagenesis. In this strategy, an insertion line with the transposon insertion genetically linked to the gene of interest is retrieved from the insertion mutant libraries and crossed with a line containing transposase to generate multiple mutant alleles, based on the fact that most Ds transpositions occur
in genetically linked positions. Large populations of F2 or subsequent generations are then screened for mutant phenotypes tightly linked to the transposon insertion. In Arabidopsis, six independent Ac insertion alleles of DETERMINATE INFERTILE1 were generated from the same donor T-DNA by targeted mutagenesis (Bhatt et al. 1996). Seedling vigor has been selected as a trait for targeted mutagenesis in rice by CSIRO researchers (N.M. Upadhyaya et al., unpublished data). To do this, lines with Ds/T-DNA launch pads in the vicinity of previously identified seedling vigor QTLs are supertransformed with Agrobacterium harboring an Ac construct to induce Ds transposition. The DtT1 generations are being screened for seedling vigor mutants.
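The localized saturation estimate given in Section 10.3.1 (430 Ds starter lines at 1-Mb intervals, about 930 F2 plants per starter line) can be checked with a few lines of arithmetic. This is a sketch under the assumptions listed in the text (50,000 genes, a 430-Mb genome, 50% independent transposition in the F1 generation, 50% of transposed Ds elements landing within the 1-Mb region, and equal probability for every gene in that region). Treating each F2 plant as an independent trial, and all variable names, are assumptions introduced here.

```python
GENES = 50_000              # genes in the rice genome (assumption from the text)
STARTER_LINES = 430         # one starter line per 1-Mb interval of a 430-Mb genome
F2_PER_LINE = 930           # F2 plants produced from each starter line
TRANSPOSITION_FREQ = 0.5    # independent transposition frequency in the F1
LOCAL_FRACTION = 0.5        # share of transposed Ds landing within the 1-Mb region

# Average number of genes in the 1-Mb region served by one starter line.
genes_per_region = GENES / STARTER_LINES

# Chance that one F2 plant carries a new insertion in one particular gene
# of the region (every gene in the region treated as equally likely).
p_per_plant = TRANSPOSITION_FREQ * LOCAL_FRACTION / genes_per_region

# Chance that a particular gene is hit at least once among the F2 plants
# of its starter line, treating plants as independent trials.
p_gene_hit = 1.0 - (1.0 - p_per_plant) ** F2_PER_LINE

total_f2 = STARTER_LINES * F2_PER_LINE

print(f"genes per 1-Mb region: {genes_per_region:.1f}")
print(f"P(gene hit at least once): {p_gene_hit:.3f}")
print(f"total F2 plants: {total_f2:,}")
```

The total is close to the 400,000 F2 plants quoted in the text. The text does not state the confidence level behind the figure of 930 plants per line, so the per-gene probability printed here follows only from the simplified model above.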
10.4 Transposon Insertional Mutant Populations

Several groups are developing large-scale transposon insertional mutagenesis populations in rice using the one-element Ac system (Enoki et al. 1999; Greco et al. 2001a), the two-element Ac/Ds system (Chin et al. 1999; Upadhyaya et al. 2002, 2006; Greco et al. 2003; Ito et al. 2004; Kim et al. 2004; Kolesnik et al. 2004; Szeverenyi et al. 2005; van Enckevort et al. 2005) or the Spm/dSpm system (Greco et al. 2004; Kumar et al. 2005). The available transposon insertion populations and the rescued rice genomic sequences flanking transposon insertions are listed in Table 10.3.

Table 10.3. Available transposon mutant populations

Country (institution) | Variety | Tagging system | Population size | No. of FSTs | Approach for FST rescue | Reference
Australia | Nipponbare | Ac/Ds | 17,000 | 1,000 | TAIL-PCR, plasmid rescue, adapter ligation PCR | Upadhyaya et al. 2002, 2006
China | Zhonghua 11 | Ac/Ds | >5,000 | na | na | Xue et al. 2003
European Union | Nipponbare, Bengal and Pusa Basmati | Ac, Ac/Ds, Spm/dSpm | >10,000 | >5,000 | TAIL-PCR or adapter ligation PCR | van Enckevort et al. 2005
Korea | Dongjin | Ac/Ds | 98,000 | 11,386 | TAIL-PCR | Chin et al. 1999; Kim et al. 2004
Singapore | Nipponbare | Ac/Ds | 23,000 | 3,000 | TAIL-PCR | Kolesnik et al. 2004
United States | Nipponbare | Spm/dSpm, Ac/Ds | 10,500 | 7,400 | TAIL-PCR or adapter ligation PCR | Kumar et al. 2005
10.4.1 CSIRO Plant Industry Population

At CSIRO, the Ds insertion populations are produced by crossing iAc and Ds (enhancer or gene trap) transgenic lines generated by Agrobacterium-mediated transformation. Alternatively, cotransformation of iAc and Ds vectors or supertransformation of calli derived from Ds launch pads with the iAc vector is also used to produce mutagenized populations. Stable Ds insertion lines (iAc− Ds+) with transposed Ds from subsequent generations are screened by either PCR analyses or via selectable markers, depending on the constructs used. The initial constructs used (pSK100 and pSK200) have a nonfunctional nptII (a Ds reinsertion marker) and hence screening for the presence of Ds and the absence of Ac required Ds- and Ac-specific PCR analyses, making the screening process very laborious and time consuming (Upadhyaya et al. 2002). To increase the screening efficiency, both iAc and Ds constructs have been modified to incorporate selectable and/or visual markers. In the pNU393A1/B2 construct, hygromycin and Basta resistance genes are used as Ds excision and reinsertion markers, respectively. In the iAc construct, pNU400, the GFP gene (sgfpS65T) is used as a visual marker. Identification of plants with stable Ds insertions in the resulting screening population relies completely on these selectable and visual markers, and PCR analyses are performed only for definite confirmation. Further improvement has been made to the Ds construct (pNU435) by the incorporation of the counterselective gene, barnase, under the control of the strong ubiquitin promoter upstream of the RB. With an RB-LB-RB-LB direct T-DNA repeat integration, the barnase gene will be expressed to kill the transformed cells containing this type of T-DNA repeat.
CSIRO researchers are now focusing on producing single-copy T-DNA insertion lines or Ds launch pads that are evenly distributed in the rice genome using this Ds construct for localized mutagenesis and trait-targeted mutagenesis via the TET system (Fig. 10.5; Upadhyaya et al. 2006). To date, approximately 1,000 Ds launch pads (LPs) have been produced; approximately 350 of these are single-copy Ds/T-DNA LP lines, and approximately 100 of these single-copy lines have had their FSTs mapped (Upadhyaya et al. 2002, 2006; see http://www.pi.csiro.au/fgrttpub for updates). Approximately 17,000 stable Ds insertion lines have been generated by crossing, co- or supertransformation, and the majority of these are gene trap lines. Ds flanking sequences of these lines are being rescued by TAIL-PCR, adapter ligation PCR or plasmid rescue, and approximately 700 FSTs have been deposited in public databases. Phenotyping has been performed for approximately 1,500 stable Ds insertion lines under normal glasshouse conditions and approximately 30% of lines show visible mutant phenotypes including late germination, defective shoot apex formation, low seedling vigor, seedling lethality, dwarfism, variegated or twisted
[Fig. 10.5: flow chart. Recoverable labels: single-copy Ds callus lines (hpt as Ds excision marker, bar as Ds reinsertion marker) supertransformed with the iAc binary vector (gfp as visual marker), or DsE/DsG X iAc crosses; generations DtT0/F1, DtT1/F2 and DtT2/F3; GFP−, HygS, BastaR plants confirmed by PCR as stable insertion (SI) lines unlinked to the launch pad (LP); GFP−, HygR, BastaR plants confirmed by PCR as SI lines linked to the LP; Basta-sensitive plants eliminated.]

Fig. 10.5. Strategy for generating and screening stable Ds insertion lines by the transiently expressed transposase (TET) system (shaded), crossing or double transformation (unshaded).
leaves, early or late flowering, partial or complete sterility, deformed spikelets, and small seeds. An analysis of 350 stable Ds insertion lines has shown that 15% and 70% of these lines expressed the GUS reporter gene in leaves and spikelets, respectively (Q.-H. Zhu et al., 2006b). Phenotyping is also being performed under field conditions.

10.4.2 EU (Wageningen) Population

Both Ac one-element and Ac/Ds or Spm/dSpm two-element systems have been employed to develop transposon mutagenized populations. In the one-element system, the Ac element is inserted between the CaMV 35S promoter and the gfp gene so that the expression of GFP is restored on excision of the Ac element (Greco et al. 2001a). In the Ac/Ds and Spm/dSpm two-element systems, the immobilized Ac or Spm element driven by the CaMV 35S promoter is constructed in the same binary T-DNA vector as the Ds or dSpm element, in which bar is used to monitor and trace the mobilization of the nonautonomous element (i.e., Ds or dSpm) (Greco et al. 2003, 2004). The T-DNA construct also contains a negative selection marker (a cytochrome P450 gene, SU1, which converts the pro-herbicide R7402 into a cytotoxic form) for the transposase gene (i.e., Ac or Spm). Using this construct, transposition of the Ds or the dSpm element could occur directly after transformation in the transgenic calli or in the regenerating T0 plants. Theoretically, stable transposants can be selected simply by application of Basta and R7402. The use of bar and SU1 genes as positive and negative selection markers seems to be highly efficient for screening in Arabidopsis (Tissier
et al. 1999), but in rice only the bar gene has proven to work efficiently, while the effectiveness of SU1 remains to be determined. A core collection of 58 Ac/Ds T0 lines has been used to develop 1,421 T1 plants, from which more than 200,000 T2 seeds have been produced. Nearly 10,000 T2 plants have been analyzed in detail. In addition, more than 3,000 Ac lines that showed a high frequency of Ac transposition have also been generated (van Enckevort et al. 2005). Transposon FSTs are isolated by TAIL-PCR or adapter ligation PCR. About half of the PCR products generated were of good quality as revealed by sequencing. After BLAST searching, it was found that 59% of the transposons inserted in annotated genes, while the remaining insertions were in intergenic regions. The mapping information of all these FSTs can be found in the database OryGenesDB (http://orygenesdb.cines.fr/) and insertion lines are publicly available.

10.4.3 National University of Singapore Population

Generation of Ds Insertion Lines
A two-element Ac/Ds gene trap system was used to generate a large collection (more than 20,000 lines) of stable, unlinked single-copy Ds transposants in rice (O. sativa ssp. japonica cv. Nipponbare). An immobilized Ac under the control of the CaMV 35S promoter was used to generate transposase lines. The nonautonomous Ds element containing the bar gene (BastaR) as a transposition marker and a modified promoter-less uidA gene encoding β-glucuronidase as a reporter gene was transformed into rice to obtain Ds parental lines. The synthetic green fluorescent protein (sgfp, Chiu et al. 1996) and the enhanced yellow fluorescent protein (eyfp, Clontech, Mountain View, CA) genes, both under the control of the maize ubiquitin promoter, were used as counterselection markers for Ac and Ds/T-DNA launch pads, respectively.

Frequency and Timing of Transposition
Different cross combinations of homozygous Ac and Ds starter (parental) lines were used to establish the collection of Ds insertion lines. Altogether 4,413 F2 families were analyzed for transposants and the results showed an average germinal transposition frequency of 51%. Study of Ds transposition pattern in siblings of several F2 families revealed that 79% had at least two different insertions, suggesting late transposition during rice development, resulting in several independent single copy Ds lines within a family (Kolesnik et al. 2004). Further analysis on the timing of transposition during rice development (by analyzing possible footprints with reciprocal PCRs among siblings) showed that the independent events among siblings were due to primary transposition events. This analysis provided evidence that Ds transposed late after tiller formation (Szeverenyi et al. 2006).
Stability of Parental and Transposed Lines
Several reports on Ac/Ds transposon mutagenesis showed that both starter lines and stable transposants become silenced in later generations, which cast doubts on the applicability of this approach for large-scale mutagenesis. Systematic analysis of various aspects of the silencing phenomenon in rice (Oryza sativa ssp. japonica cv. Nipponbare) was carried out to show the stability of Ds through progressive generations. The high somatic and germinal transposition frequencies observed in earlier generations were maintained as late as the T4 and T5 generations, indicating that the propagation of such parental lines did not induce transposon silencing. The stably transposed Ds was active even after the F5 generation as it could be remobilized (as shown by footprint analysis of several revertants). In addition, bar gene expression was examined from the F3 to F6 generations in more than a thousand stably transposed Ds lines, and substantial transgene silencing was not observed in the lines tested (Szeverenyi et al. 2006).

Chromosomal Distribution of Ds Insertions
The Ds flanking sequences of 2,057 putative transformants were obtained by TAIL-PCR and sequencing. Analysis of these sequences showed that 88% were unique. The remaining insertions were within the T-DNA with ~4% inserted in the resident negative selection marker, the gfp gene. Further analysis of the flanking sequences by BLAST search and annotation using the Rice Genome Program’s Rice GAAS annotation program (http://ricegaas.dna.affrc.go.jp/rgadb/) revealed their distribution throughout the genome but with a bias (approximately twofold) toward chromosomes 4 and 7. Further, anchoring of more than 800 insertions to a YAC-based EST map suggested preferential transposition of Ds into regions rich in expressed sequences (Kolesnik et al. 2004).

10.4.4 Korea Population

An Ac/Ds-Mediated Gene Trap System
Ac and Ds were separately introduced into a japonica rice cultivar, Dongjin, via an Agrobacterium T-DNA vector. Lines containing a single copy of Ac or Ds were selected and maintained as Ac and Ds starter lines. The Ac/Ds-based gene trap system consisted of three genetic components: Ac, gene trap Ds (DsG), and a counterselective marker. Ac cDNA under the control of a CaMV 35S promoter was used as the transposase source (Chin et al. 1999). The bar gene and uidA coding region were oriented so as to be transcribed from either end of Ds toward the middle of the element.
The intron used in the DsG construct was the same as that used in Arabidopsis, i.e., the 4th intron of the Arabidopsis G-protein (GPA1) gene (Sundaresan et al. 1995). In rice, fusion of the uidA gene with a host gene was achieved by utilizing three out of four putative splice donor sites at the 3′ end of Ds and two out of three putative splice acceptor sites at the 5′ end of the uidA coding region (Chin et al. 1999). A modified bacterial cytochrome P450 gene was used as the counterselective marker in this system. Although cytochrome P450 has been successfully used for negative selection (O'Keefe et al. 1994; Tissier et al. 1999), this group found that cytochrome P450/R7402 is not a reliable marker for screening a large population of rice.

Germinal Transposition Rates in F2 Progeny and the Limitation of Genetic Crosses for a Large-Scale Mutagenesis
Single-copy Ac and Ds starter lines were crossed to assess the germinal transposition frequency of Ds. More than 10,000 F2 plants were individually analyzed via Southern blot analysis. The overall frequencies of independent germinal transposition in two F2 populations were 10% to 15% (Kim et al. 2004). With the repeated use of the same starter lines maintained by selfing, the frequency of germinal transposition of Ds in the F2 generation decreased. Therefore, the extent to which the use of genetic crossing contributes to the development of a highly saturated insertional mutant population depends largely on the availability of effective selectable markers for large-scale screening.

High Proportion of Independent Ds Transposants in a Population of Regenerated Plants
To overcome the dependence on marker genes and the ongoing monitoring of Ac/Ds activity, plant regeneration was adapted as a Ds-mediated gene-tagging strategy. Ds transposition was analyzed by Southern blot analysis in more than 2,000 R1 plants derived from callus culture of seeds carrying both Ac and Ds. From 70% to 80% of regenerated plants carried new Ds insertions (Kim et al. 2002). Only 10% to 20% of the population carried Ac alone and/or was devoid of Ds (Basta sensitive). Monitoring of the transmission of Ds in R2 plants indicated that Ds elements of R1 plants were stably maintained in the subsequent generation. Also, most of the regenerated plants from any one callus culture carried different Ds insertions. The data showed that the majority of regenerated plants carried independently transposed elements. Therefore, rapid generation of a large Ds transposant population could be achieved using a regeneration procedure involving tissue culture of seed-derived calli carrying Ac and inactive Ds elements, obviating the need for any elaborate screening for transposed Ds.
Chromosomal Distribution of Ds
For mass production of Ds FSTs, TAIL-PCR was primarily employed. The primer sets for amplification of the 5′ or 3′ end of Ds and optimal AD (arbitrary degenerate) primer sets were described by Kim et al. (2004). FSTs were mapped on rice pseudomolecules version 4 (http://www.tigr.org). The patterns of Ds distribution were very similar among several populations derived either by genetic crossing or tissue culture. Ds transposed to all chromosomes, with a preference for sites near the donor sites and for some physically unlinked chromosome arms. Table 10.4 shows the chromosomal location of transposed Ds elements. The relatively high proportion of Ds elements in chromosomes 3 and 4 resulted from the locations of original donor sites in these chromosomes.

Generation of Ds Population and FST Analysis
Owing to the nonrandom distribution of insertion loci, it is essential to create an evenly distributed population of original Ds elements throughout the rice genome for random mutagenesis. Using several Ac and Ds starter lines that were distributed on different chromosomes, a large-scale regeneration population has been developed. From 2001 to 2005, a total of 98,000 regeneration lines were developed. Because 70% to 80% of the population carried a transposed Ds, 73,000 lines are expected to carry independent Ds insertion events. To build up the database of FSTs, 11,386 Ds insertion sites were mapped on rice chromosomes. This material and FST data will be publicly accessible via http://genebank.rda.go.kr/dstag.

Table 10.4. Chromosomal distribution of 11,386 Ds insertion sites

Chromosome         No. of Ds insertion sites   Proportion (%)
1                  1,295                       11.4
2                  780                         6.9
3                  3,949                       34.7
4                  1,233                       10.8
5                  502                         4.4
6                  475                         4.2
7                  613                         5.4
8                  433                         3.8
9                  385                         3.4
10                 712                         6.2
11                 383                         3.4
12                 452                         4.0
Un-mapped BAC      174                         1.4
Total mapped Ds    11,386                      100
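The arithmetic behind the population estimate and Table 10.4 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and using the midpoint of the 70% to 80% range as a point estimate is an assumption made here to approximate the figure of 73,000 lines quoted in the text. Recomputed proportions differ from the printed ones by 0.1 for two rows (chromosome 10 and the un-mapped BACs), presumably because the printed column was adjusted to sum to 100.

```python
# Ds insertion-site counts from Table 10.4 (chromosomes 1-12, then un-mapped BACs).
counts = {
    "1": 1295, "2": 780, "3": 3949, "4": 1233, "5": 502, "6": 475,
    "7": 613, "8": 433, "9": 385, "10": 712, "11": 383, "12": 452,
    "Un-mapped BAC": 174,
}

total = sum(counts.values())
proportions = {chrom: round(100 * n / total, 1) for chrom, n in counts.items()}

# Expected number of lines with a transposed Ds among 98,000 regenerants,
# given that 70% to 80% of regenerated plants carry a transposed element.
regenerated_lines = 98_000
low, high = 0.70 * regenerated_lines, 0.80 * regenerated_lines
midpoint = (low + high) / 2  # assumption: midpoint taken as the point estimate

for chrom, n in counts.items():
    print(f"{chrom:>14}  {n:>6,}  {proportions[chrom]:>5.1f}")
print(f"total mapped: {total:,}")
print(f"expected transposed lines: {low:,.0f} to {high:,.0f} (midpoint {midpoint:,.0f})")
```

The midpoint is close to, but not identical with, the 73,000 lines stated in the text, so the authors presumably rounded or used a slightly different fraction.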
Qian-Hao Zhu et al.
10.4.5 UC Davis Population
The UC Davis insertion lines are based on using the maize Spm/dSpm and Ac/Ds elements for large-scale genome-wide random insertional mutagenesis in japonica cv. Nipponbare. A complete description of this system utilizing the dSpm element has been published (Kumar et al. 2005). In this system insertion lines are generated using a single T-DNA vector carrying an immobilized Spm or Ac transposase gene as well as the corresponding nonautonomous transposon dSpm or Ds in cis. To track the presence of mobile dSpm or Ds elements in the plants, these elements are equipped with a positive selection marker, the DsRed gene that confers red fluorescence. The DsRed marker gene has been shown to work efficiently with no escapes recorded (i.e., all of the selected red fluorescent plants carried the Ds or dSpm element). The sgfp gene (Chiu et al. 1996), located in the T-DNA, is used as the negative selection marker to select for unlinked transposition events and to select against the Spm or Ac transposase. The use of a combination of fluorescent protein marker genes (sgfp and DsRed) as negative and positive selection markers enables quick and easy identification of insertion lines from germinated seedlings. The strategy for generating insertion lines using the aforementioned system is shown in Fig. 10.6A. Primary transformants (T1) carrying a single-locus/copy T-DNA and expressing the gfp and DsRed genes are selected as starter lines. From these starter lines T2 heterozygous progeny are identified based on the GFP fluorescence levels. They are then propagated and allowed to self-pollinate to obtain T3 seeds. Finally, the screening for insertion lines is carried out in the T3 seedlings (4 to 7 days old) by selecting GFP− DsRed+ seedlings (Fig. 10.6B). The dSpm or Ds flanking sequences from the T3 insertion lines are recovered either by TAIL-PCR or by adapter ligation PCR.
The flanking sequences are submitted to GenBank, and also maintained in a searchable FST Database (http://sundarlab.ucdavis.edu/rice/blast/blast.html). With the Spm/dSpm system, the frequency of presumptive unlinked transpositions of the dSpm element is about 45% to 50% (measured as the percentage of T3 families with at least one GFP− DsRed+ seedling), and that for the Ac/Ds system is approximately 40% to 45%. So far, this group has generated more than 3,500 dSpm insertion lines and sequenced FSTs from 1,800 lines. Using the Ac/Ds system 7,000 insertion lines have been generated and 5,600 FSTs have been sequenced. Analysis of dSpm and Ds flanking sequences revealed that both dSpm and Ds preferentially insert into genes or genic regions. The frequency of insertion within the T-DNA is less than 3% for the dSpm element, while for the Ds element it is about 12%. Further study also indicated that transposition of the dSpm element occurs relatively late in development; hence multiple independent insertion lines can be recovered from a single T2 heterozygous parent.
Fig. 10.6. (A) Strategy for generating dSpmTab or Ds insertions by in cis strategy. Steps shown in the flowchart: 1 single-copy transformant (T1); selfing; ~400 T2 seeds; screening for heterozygous plants (GFP fluorescence levels); ~200 T2 heterozygous plants; selfing; 200 T3 families; screening for stable transposants (GFP− and DsRed+); ~80 transposants (40% frequency); maintenance of dSpm or Ds insertion lines, FST sequencing and database. (B) Screening for stable transposants by GFP and DsRed fluorescence. White arrows indicate a putative stable transposant which is GFP− (left) and DsRed+ (right) (See also color plate section).
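The numbers in the Fig. 10.6A flowchart and the fluorescence-based selection rule can be written out as a small Python sketch. The function names and the category labels are hypothetical, chosen only for illustration; the heterozygote fraction of 0.5 is an assumption read off the flowchart (about 400 T2 seeds giving about 200 heterozygous plants), and the 40% default is the frequency shown in the figure.

```python
def expected_transposants(t2_seeds, het_fraction=0.5, transposition_freq=0.40):
    """Return (heterozygous T2 plants, expected stable transposants) per starter line."""
    heterozygotes = t2_seeds * het_fraction   # T2 plants carried forward by selfing
    t3_families = heterozygotes               # one T3 family per heterozygous T2 plant
    return heterozygotes, t3_families * transposition_freq


def classify_seedling(gfp_positive, dsred_positive):
    """Classify a T3 seedling from its two fluorescence readouts."""
    if gfp_positive:
        # sgfp marks the T-DNA, which also carries the transposase.
        return "not selected: T-DNA still present"
    if dsred_positive:
        # DsRed marks the dSpm or Ds element itself.
        return "putative stable transposant"
    return "not selected: no dSpm/Ds element detected"


hets, transposants = expected_transposants(400)
print(f"{hets:.0f} heterozygous T2 plants, about {transposants:.0f} transposants")
print(classify_seedling(gfp_positive=False, dsred_positive=True))
```

Only the GFP− DsRed+ class is kept, because it combines presence of the element with loss of the T-DNA that carries the transposase.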
The dSpm insertions appear to differ from Ds elements in genomic distribution and exhibit a greater fraction of unlinked transpositions when compared to Ds elements. The results suggest that Ds and dSpm elements may exhibit different preferences for insertion in the rice genome, and hence different genome coverage is likely to be achieved using these elements. The insertion mutant population carrying the dSpm elements can complement other existing mutagens such as Tos17, T-DNA, and Ac/Ds and fill gaps left by these elements in the rice genome. Further, as this approach uses fluorescent protein markers, and the fluorescent sorting of GFP− and DsRed+ seedlings can potentially be automated, this seems to be an ideal system for high-throughput insertion line production. The seeds from rice dSpm and Ds insertion lines generated at UC Davis are publicly available (http://sundarlab.ucdavis.edu/rice/blast/blast.html).
10.5 Gene Discovery by Transposon Tagging
10.5.1 Forward and Reverse Genetics Strategies
In the previous sections, different gene tagging systems for generating a large population of transposon insertion mutants were discussed. Compared with other mutagenesis approaches, transposon mutagenesis has several advantages and disadvantages (Table 10.5). To identify tagged genes in transposon mutagenized populations, forward and reverse genetics strategies are currently being employed (Fig. 10.7). Forward genetics is a traditional strategy that has been used successfully for many years and is aimed at cloning genes that have been identified by a mutant phenotype or function. This approach is straightforward but relies on the identification of visible mutant phenotypes. In contrast, reverse genetics starts with the gene of interest and aims to determine the function of the gene by generating a knockout mutant and analysing its phenotype(s). The prerequisite of an efficient reverse genetics system is that it should be possible to determine the presence or absence of a knockout mutant of a gene of interest in the mutagenized population, which is particularly important as gene knockouts might not lead to an easily identifiable phenotype for the majority of genes. With the forward genetics strategy, new genes can be identified without prior knowledge of the identity of the gene or the gene product. In rice four genes have been identified by transposon mutagenesis using a forward genetics approach (Table 10.6). This strategy can also be used in trait-targeted mutagenesis as discussed in Section 10.3.2. To carry out trait-orientated screening, not only does a large mutagenized population need to be generated, but careful observation and analysis are required as the deleterious effects of a given mutation are often difficult to detect.
Table 10.5. Comparison of mutagenesis methodologies for gene discovery

Mutagen: Chemical and physical agent
  Generation of the mutagenized population: Very easy
  Type of mutation: Loss-of-function (chemically induced: point mutation; physically induced: deletion mutation); Natural
  Stability of mutation: Stable
  Strategy of gene discovery: Forward and reverse genetics
  Method of gene cloning: Map-based cloning, TILLING
  Co-segregation analysis: N/A
  Functional confirmation: Complementation; Additional alleles
  Targeted or localized mutagenesis: Impossible

Mutagen: T-DNA
  Generation of the mutagenized population: Transformation and large-scale tissue culture
  Type of mutation: Loss-of-function or gain-of-function (activation tagging makes it possible to clone genes whose knockout mutant is lethal); Transgenic
  Stability of mutation: Stable
  Strategy of gene discovery: Forward and reverse genetics
  Method of gene cloning: FST rescue
  Co-segregation analysis: Enhanced by selectable markers but deteriorated by somaclonal variation and complicated T-DNA integration
  Functional confirmation: Complementation; Additional alleles
  Targeted or localized mutagenesis: Impossible

Mutagen: Transposon
  Generation of the mutagenized population: Transformation, but only a relatively small number of starter lines is needed
  Type of mutation: Loss-of-function or gain-of-function; Transgenic
  Stability of mutation: Stable, but unstable for the mutations induced by the autonomous elements
  Strategy of gene discovery: Forward and reverse genetics
  Method of gene cloning: FST rescue
  Co-segregation analysis: Enhanced by selectable markers but deteriorated by excision footprints of transposon
  Functional confirmation: Complementation; Additional alleles; Revertants
  Targeted or localized mutagenesis: Possible

Mutagen: Retrotransposon
  Generation of the mutagenized population: Large-scale tissue culture
  Type of mutation: Loss-of-function; Natural
  Stability of mutation: Stable, but unstable under tissue culture
  Strategy of gene discovery: Forward and reverse genetics
  Method of gene cloning: FST rescue
  Co-segregation analysis: No selectable markers available; deteriorated by somaclonal variation and multiple copies of the retrotransposon
  Functional confirmation: Complementation; Additional alleles
  Targeted or localized mutagenesis: Impossible

Fig. 10.7. Application of forward and reverse genetics strategies in gene identification using a transposon mutagenized population. Boxes shown in the flowchart, starting from the transposon insertion population. Forward genetics: mutants discovered by phenotyping; co-segregation analysis of transposon and the mutant phenotype using transposon as a probe; rescue of the transposon flanking sequence and confirmation of the co-segregation relationship. Reverse genetics: FST rescue, database search and insertion in the gene of interest; or, starting from the gene of interest, PCR screening of the transposon insertion population using gene-specific and insert-specific primers to find an insertion line; identification of homozygous insertion plants and investigation of phenotypic changes; co-segregation analysis of the insert and the mutant phenotype using a gene-specific probe. Both strategies: gene cloning; confirmation by complementation and/or additional alleles.

Once a mutant is identified, plant genomic sequences flanking the transposon can be isolated by iPCR, TAIL-PCR, adapter ligation PCR or plasmid rescue. The subsequent gene cloning process is now relatively straightforward in rice owing to the availability of the entire genomic sequence.

In the case of the reverse genetics strategy, two approaches are employed to find knockout mutants by screening the mutagenized population. One can randomly amplify and sequence transposon insertion flanking sequences, or specifically screen for insertions in genes of interest. The PCR and plasmid rescue methods mentioned above are only efficient for isolation of transposon FSTs from single- or low-copy-number insertion lines. PCR-related techniques such as transposon display (van den Broeck et al. 1998) and amplification of insertion-mutagenized sites (AIMS; Frey et al. 1998) have been successfully used to isolate the transposon FSTs in high-copy-number insertion lines. In rice, the most frequently used methods are TAIL-PCR and iPCR. To determine transposon insertion flanking sequence from single-copy lines, PCR-amplified products can be directly sequenced. In multiple-copy-number lines, the amplification products derived from different insertion sites are resolved on sequencing gels, isolated, reamplified and sequenced individually.

Several systematically catalogued databases of transposon FSTs have been established for rice in different laboratories around the world (Upadhyaya et al. 2002; Greco et al. 2003; Kolesnik et al. 2004; Kumar et al. 2005; Szeverenyi et al. 2005; van Enckevort et al. 2005). These databases will significantly facilitate gene identification in rice. With the mapping information of the mutation, the genomic sequence around the mutation is retrieved and annotated to pinpoint the candidate genes that are likely to be affected in the mutant. The lines containing transposon insertions in these genes are then retrieved from the insertion mutant libraries. The knockout phenotypes can then be examined in the homozygotes (Maes et al. 1999).

To screen transposon insertion mutants of a specific gene, PCR-based strategies can be used to identify mutants through amplification of a PCR product using gene- and insert-specific primers. Usually, insertion lines are identified using DNA pools containing many insertion lines. The sensitivity of the PCR technique, especially after hybridization of the PCR products with a gene-specific probe, allows the easy detection of a single gene hit within a pool of hundreds or thousands of individuals. Screenings of DNA pools are generally organized in a three-dimensional array, to allow easy identification of the tagged individuals.
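The logic of three-dimensional pooling can be illustrated with a short Python sketch. The text does not give pool dimensions, so the 10 x 10 x 10 arrangement, the coordinates of the tagged line and all function names below are hypothetical; PCR is simulated simply as "a pool is positive if it contains a tagged line".

```python
from itertools import product


def build_pools(dims):
    """Return {(axis, index): set of (x, y, z) line coordinates} for a 3-D grid."""
    pools = {}
    for coord in product(*(range(n) for n in dims)):
        for axis, index in enumerate(coord):
            pools.setdefault((axis, index), set()).add(coord)
    return pools


def positive_pools(pools, tagged_lines):
    """Simulate PCR: a pool is positive if it contains at least one tagged line."""
    return {name for name, members in pools.items() if members & tagged_lines}


def decode(pools, positives):
    """Candidate lines are those lying in a positive pool on every axis."""
    candidates = None
    for axis in range(3):
        on_axis = set()
        for name in positives:
            if name[0] == axis:
                on_axis |= pools[name]
        candidates = on_axis if candidates is None else candidates & on_axis
    return candidates


dims = (10, 10, 10)                 # hypothetical 10 x 10 x 10 arrangement
pools = build_pools(dims)
hit = {(3, 7, 2)}                   # hypothetical line carrying the insertion
found = decode(pools, positive_pools(pools, hit))
print(len(pools), "pooled PCRs cover", 10 * 10 * 10, "lines; candidates:", found)
```

In this arrangement a single tagged line is located unambiguously from one positive pool per axis. When several lines in the array carry an insertion in the gene, the intersection returns extra candidates (two hits can give up to eight), which then have to be resolved by testing individual lines.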
Table 10.6. Genes discovered by forward genetics approach

Gene name: BFL1 (a)
  Tagged by: Ds
  Mutant phenotype: The formation of florets is replaced by sequential rounds of branching as several rudimentary glumes are formed in each ectopic branch and axillary meristems are formed in the axils of rudimentary glumes. The panicle is seedless.
  Putative function: An AP2 domain transcription factor that mediates the transition from spikelet to floret meristem
  Reference: Zhu et al. 2003

Gene name: FZP (a)
  Tagged by: Ac
  Mutant phenotype: As above
  Putative function: As above
  Reference: Komatsu et al. 2003a

Gene name: AID1
  Tagged by: Ds
  Mutant phenotype: Anther indehiscence and partial to complete spikelet sterility
  Putative function: A single MYB domain gene that functions at late stages of anther development
  Reference: Zhu et al. 2004

Gene name: OsKS1
  Tagged by: Ds
  Mutant phenotype: Severe dwarfism, dark green leaf and failure to initiate reproductive growth
  Putative function: Encodes ent-kaurene synthase, which catalyzes the second step of gibberellin biosynthesis
  Reference: Margis-Pinheiro et al. 2005

Gene name: OsNOP
  Tagged by: Ds
  Mutant phenotype: Pollenless and male sterility
  Putative function: Contains a C2-GRAM domain and functions during the late stage of pollen development and its germination by cross-linking both calcium and phosphoinositide signaling pathways
  Reference: Jiang et al. 2005

(a) bfl1 and fzp are alleles
10.5.2 Other Approaches for Mutation Identification
The limitation of the aforementioned approaches of screening DNA pools is that only one or a small number of genes can be screened for at once. To enhance the utility of transposon insertional libraries, approaches that allow DNA pools representing many lines to be screened for insertions in many genes at once are desirable. To this end, Mahalingam and Fedoroff (2001) have developed a microarray-based method to screen DNA pools from multiple transposon lines for simultaneous detection of insertions in different genes. In this approach, transposon FSTs are amplified preferentially by TAIL-PCR and hybridized to a cDNA microarray; FSTs that overlap genes represented on the microarray will hybridize with their respective cDNAs, thereby identifying genes that carry insertion mutations within or near them (Mahalingam and Fedoroff 2001). It has been shown that microarray hybridization of TAIL-PCR amplified FSTs can detect individual Arabidopsis Ds insertion lines from a DNA pool comprised of as many as 100 lines. But this approach is likely to favor the identification of insertions in or very close to genes because the Ds insertions tend to cluster around the translational start site (Parinov et al. 1999). Moreover, TAIL-PCR products tend to be short. A tagged transcriptome display (TTD) strategy has been developed in rice to detect the transposon insertions located in transcribed sequences (Kohli et al. 2001). In this approach, a CpG methylation-sensitive enzyme such as SalI is used to preferentially cut rice genomic DNA in transcriptionally active chromosomal regions. The transposon (Ac) FSTs are then amplified by adapter ligation PCR, blotted onto a membrane and hybridized with labeled leaf cDNA to reveal insertions in transcribed genes specifically expressed in leaf. This strategy can be used to detect transposon insertions not only in genes expressed in specific tissues, but also in genes that are transcribed in response to particular biotic and abiotic stresses (Kohli et al. 2001). More than one methylation-sensitive enzyme should be used for a given line to maximize the recovery of all potential gene-tagged transposon insertions.
10.5.3 Tagging Efficiency
Transposon tagging has been proven to be a powerful tool for functional genomics in plants. To increase tagging efficiency, insertions within exons are preferred as transposons may be spliced out when they insert within an intron.
Most studies have shown that Ac/Ds transposes into gene coding regions (Enoki et al. 1999; Greco et al. 2001a; Kolesnik et al. 2004), but it seems that exons and introns are equal targets for transposon insertion (Kolesnik et al. 2004; Q.-H. Zhu et al. 2006b). Cases have been reported in which a phenotype is not linked to the Ds element. One possible explanation for this is that the Ds element transposes more than once in the F1 (DtT0), or in subsequent generations in the presence of the Ac transposase, leaving footprints in a coding sequence and thereby altering the reading frame to result in a mutated gene product. This is most likely the case for the csl1 (compact shoot and leafy head 1) mutant, in which all primary branches of the panicle are converted to vegetative plantlets (Q.-H. Zhu et al. 2006b). Other background mutations, both genetic and epigenetic, may be induced by tissue culture. Particular attention therefore needs to be paid when performing double transformation as background mutations may be introduced by both tissue culture and secondary transposition during cocultivation of Ac and Ds vectors. It has also been reported that transposons can create large chromosome deletions on mobilization. In the case of the Osnop mutant, a deletion of 65 kb of genomic DNA containing 14 genes together with 3.8 kb of the 5΄ Ds element itself was found at the Ds insertion site (Jiang et al. 2005). The exact mechanism of such a deletion is not clear; however, unequal homologous recombination events resulting from endogenous repetitive sequences of Ds interacting with the transformed Ds might be the causal factor (Page et al. 2004). As described in the preceding text, gene and enhancer trap systems allow the identification of genes and regulatory elements that are not amenable to classical genetic analysis. Hence, novel genes are likely to be identified in such trapped lines. This approach has been very successful in Arabidopsis (Springer 2000), but in rice no gene or enhancer has so far been found using these trap systems in the transposon mutagenesis populations.
10.5.4 Confirmation of Tagged Gene
After the cosegregation relationship between the mutant phenotype and a transposon insertion has been established, the simplest and most straightforward way to confirm that the mutant phenotype results from the transposon insertion is to check whether other alleles have been independently identified. With transposon-tagged mutations, it is also possible to generate more alleles or revertants by crossing the mutant with a transposase-expressing line. Both will provide additional evidence that the tagged gene is responsible for the mutant phenotype. Complementation with the wild-type copy of the tagged gene is another standard but labor-intensive procedure for confirmation.
Another way to confirm the relationship between the mutant phenotype and a transposon insertion is to use RNAi to mimic the knockout phenotype.
10.6 Future Prospects
Transposon-induced phenotypic changes can provide strong evidence for the biological function of a gene. Substantial populations with transposon insertions have been established in rice, but a great deal of further work is required to achieve saturation mutagenesis. Localized mutagenesis will play an important role toward the achievement of this goal. More importantly, it is now time to shift our focus to serious and systematic phenotyping using forward or reverse genetics approaches. The challenge is to develop sophisticated screening systems for the identification of phenotypes of transposon-induced mutations. Conditional and/or customized phenotyping will also be required, since the essential function of a large number of genes may not be revealed under normal growth conditions. Transposon mutagenesis, together with other functional genomics tools, will ultimately help us understand the function of the more than 40,000 rice genes, and their interactive networks.
References
Aarts MG, Dirkse WG, Stiekema WJ, Pereira A (1993) Transposon tagging of a male sterility gene in Arabidopsis. Nature 363:715–717
Agrawal GK, Yamazaki M, Kobayashi M, Hirochika R, Miyao A, Hirochika H (2001) Screening of the rice viviparous mutants generated by endogenous retrotransposon Tos17 insertion. Tagging of a zeaxanthin epoxidase gene and a novel ostatc gene. Plant Physiol 125:1248–1257
Agrawal GK, Abe K, Yamazaki M, Miyao A, Hirochika H (2005) Conservation of the E-function for floral organ identity in rice revealed by the analysis of tissue culture-induced loss-of-function mutants of the OsMADS1 gene. Plant Mol Biol 59:125–135
An G, Lee S, Kim SH, Kim SR (2005) Molecular genetics using T-DNA in rice. Plant Cell Physiol 46:14–22
Balcells L, Sundberg E, Coupland G (1994) A heat-shock promoter fusion to the Ac transposase gene drives inducible transposition of a Ds element during Arabidopsis embryo development. Plant J 5:755–764
Bancroft I, Jones JD, Dean C (1993) Heterologous transposon tagging of the DRL1 locus in Arabidopsis. Plant Cell 5:631–638
Baker B, Schell J, Lorz H, Fedoroff N (1986) Transposition of the maize controlling element “Activator” in tobacco. Proc Natl Acad Sci USA 83:4844–4848
Bhatt AM, Page T, Lawson EJR, Lister C, Dean C (1996) Use of Ac as an insertional mutagen in Arabidopsis. Plant J 9:935–945
Brettell RI, Dennis ES (1991) Reactivation of a silent Ac following tissue culture is associated with heritable alterations in its methylation pattern. Mol Gen Genet 229:365–372
Charng YC, Pfitzner AJP, Pfitzner UM, Charng-Chang KF, Chen C, Tu J, Kuo TT (2000) Construction of an inducible transposon, INAc, to develop a gene tagging system in higher plants. Mol Breed 6:353–367
Charng Y-C, Wu G, Hsieh C-S, Chuan H-N, Huang J-Y, Yeh L-C, Shieh Y-H, Tu J (2007) The inducible transposon system for rice functional genomics. Botanical Studies 48:1–11
Chin HG, Choe MS, Lee SH, Park SH, Koo JC, Kim NY, Lee JJ, Oh BG, Yi GH, Kim SC, Choi HC, Cho MJ, Han CD (1999) Molecular analysis of rice plants harboring an Ac/Ds transposable element-mediated gene trapping system. Plant J 19:615–623
Chiu W, Niwa Y, Zeng W, Hirano T, Kobayashi H, Sheen J (1996) Engineered GFP as a vital reporter in plants. Curr Biol 6:325–330
Chuck G, Robbins T, Nijjar C, Ralston E, Courtney-Gutterson N, Dooner HK (1993) Tagging and cloning of a petunia flower color gene with the maize transposable element Activator. Plant Cell 5:371–378
Coen ES, Robbins TP, Almeida J, Hudson A, Carpenter R (1989) Consequences and mechanism of transposition in Antirrhinum majus. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington DC, pp 413–436
Devic M, Albert S, Delseny M, Roscoe TJ (1997) Efficient PCR walking on plant genomic DNA. Plant Physiol Biochem 35:331–339
Dooner HK, Belachew A (1989) Transposition pattern of the maize element Ac from bz-M2 (Ac) allele. Genetics 136:261–279
Dubois P, Cutler S, Belzile FJ (1998) Regional insertional mutagenesis on chromosome III of Arabidopsis thaliana using the maize Ac element. Plant J 13:141–151
Eamens AL, Blanchard CL, Dennis ES, Upadhyaya NM (2004) A bidirectional gene trap construct for T-DNA and Ds mediated insertional mutagenesis in rice (Oryza sativa L.). Plant Biotech J 2:367–380
Earp DJ, Lowe B, Baker B (1990) Amplification of genomic sequences flanking transposable elements in host and heterologous plants: a tool for transposon tagging and genome characterization. Nucl Acids Res 18:3271–3279
Enoki H, Izawa T, Kawahara M, Komatsu M, Koh S, Kyozuka J, Shimamoto K (1999) Ac as a tool for the functional genomics of rice. Plant J 19:605–613
Fedoroff NV (1989) About maize transposable elements and development. Cell 56:181–191
Fedoroff NV, Chandler V (1994) Inactivation of maize transposable elements. In: Paszkowski J (ed) Homologous Recombination and Gene Silencing in Plants. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp 349–385
Fedoroff N, Wessler S, Shure M (1983) Isolation of the transposable maize controlling elements Ac and Ds. Cell 35:235–242
Fedoroff NV, Furtek DB, Nelson O (1984) Cloning of the bronze locus in maize by a simple and generalizable procedure using the transposable controlling element Activator (Ac). Proc Natl Acad Sci USA 81:3825–3829
Finnegan EJ, Lawrence GJ, Dennis ES, Ellis JG (1993) Behaviour of modified Ac elements in flax callus and regenerated plants. Plant Mol Biol 22:625–633
Frey M, Reinecke J, Grant S, Saedler H, Gierl A (1990) Excision of the En/Spm transposable element of Zea mays requires two element-encoded proteins. EMBO J 9:4037–4044
Frey M, Stettner C, Gierl A (1998) A general method for gene isolation in tagging approaches: amplification of insertion mutagenised sites (AIMS). Plant J 13:717–721
Fujino K, Sekiguchi H, Kiguchi T (2005) Identification of an active transposon in intact rice plants. Mol Genet Genom 273:150–157
Fujita N, Yoshida M, Asakura N, Ohdan T, Miyao A, Hirochika H, Nakamura Y (2006) Function and characterization of starch synthase I using mutants in rice. Plant Physiol 140:1070–1084
Gendrel AV, Colot V (2005) Arabidopsis epigenetics: when RNA meets chromatin. Curr Opin Plant Biol 8:142–147
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296:92–100
Gorbunova V, Levy AA (2000) Analysis of extrachromosomal Ac/Ds transposable elements. Genetics 155:349–359
Greco R, Ouwerkerk PB, Taal AJ, Favalli C, Beguiristain T, Puigdomenech P, Colombo L, Hoge JH, Pereira A (2001a) Early and multiple Ac transpositions in rice suitable for efficient insertional mutagenesis. Plant Mol Biol 46:215–227
Greco R, Ouwerkerk PB, Sallaud C, Kohli A, Colombo L, Puigdomenech P, Guiderdoni E, Christou P, Hoge JH, Pereira A (2001b) Transposon insertional mutagenesis in rice. Plant Physiol 125:1175–1177
Greco R, Ouwerkerk PB, De Kam RJ, Sallaud C, Favalli C, Colombo L, Guiderdoni E, Meijer AH, Hoge JH, Pereira A (2003) Transpositional behaviour of an Ac/Ds system for reverse genetics in rice. Theor Appl Genet 108:10–24
Greco R, Ouwerkerk PB, Taal AJ, Sallaud C, Guiderdoni E, Meijer AH, Hoge JH, Pereira A (2004) Transcription and somatic transposition of the maize En/Spm transposon system in rice. Mol Genet Genom 270:514–523
He CK, Dey M, Lin Z, Duan F, Li F, Wu R (2007) An efficient method for producing an indexed, insertional-mutant library in rice. Genomics (In Press)
Hehl R, Baker B (1990) Properties of the maize transposable element Activator in transgenic tobacco plants: versatile inter-species genetic tool. Plant Cell 2:709–721
Heinlein M (1996) Excision patterns of Activator (Ac) and Dissociation (Ds) elements in Zea mays L.: implications for the regulation of transposition. Genetics 144:1851–1869
Heinlein M, Brattigt T, Kunze R (1994) In vivo aggregation of maize Activator (Ac) transposase in nuclei of maize endosperm and Petunia protoplasts. Plant J 5:705–714
Hirochika H (2001) Contribution of the Tos17 retrotransposon to rice functional genomics. Curr Opin Plant Biol 4:118–122
Hirochika H, Sugimoto K, Otsuki Y, Tsugawa H, Kanda M (1996) Retrotransposons of rice involved in mutations induced by tissue culture. Proc Natl Acad Sci USA 93:7783–7788
Hirochika H, Guiderdoni E, An G, Hsing YI, Eun MY, Han CD, Upadhyaya N, Ramachandran S, Zhang Q, Pereira A, Sundaresan V, Leung H (2004) Rice mutant resources for gene discovery. Plant Mol Biol 54:325–334
Ito T, Seki M, Hayashida N, Shibata D, Shinozaki K (1999) Regional insertional mutagenesis of genes on Arabidopsis thaliana chromosome V using the Ac/Ds transposon in combination with a cDNA scanning method. Plant J 17:433–444
Ito T, Motohashi R, Kuromori T, Mizukado S, Sakurai T, Kanahara H, Seki M, Shinozaki K (2002) A new resource of locally transposed Dissociation elements for screening gene-knock-out lines in silico on the Arabidopsis genome. Plant Physiol 129:1695–1699
Ito Y, Hirochika H, Kurata N (2002) Organ-specific alternative transcripts of KNOX family class 2 homeobox genes of rice. Gene 288:41–47
Ito Y, Eiguchi M, Kurata N (2004) Establishment of an enhancer trap system with Ds and GUS for functional genomics in rice. Mol Genet Genom 271:639–650
Izawa T, Miyazaki C, Yamamoto M, Terada R, Iida S, Shimamoto K (1991) Introduction and transposition of the maize transposable element Ac in rice (Oryza sativa L.). Mol Gen Genet 227:391–396
Izawa T, Ohnishi T, Nakano T, Ishida N, Enoki H, Hashimoto H, Itoh K, Terada R, Wu C, Miyazaki C, Endo T, Iida S, Shimamoto K (1997) Transposon tagging in rice. Plant Mol Biol 35:219–229
James DW Jr, Lim E, Keller J, Plooy I, Ralston E, Dooner HK (1995) Directed tagging of the Arabidopsis FATTY ACID ELONGATION1 (FAE1) gene with the maize transposon Activator. Plant Cell 7:309–319
Jiang N, Bao Z, Zhang X, Hirochika H, Eddy SR, McCouch SR, Wessler SR (2003) An active DNA transposon family in rice. Nature 421:163–167
Jiang SY, Cai M, Ramachandran S (2005) The Oryza sativa no pollen (Osnop) gene plays a role in male gametophyte development and most likely encodes a C2-GRAM domain-containing protein. Plant Mol Biol 57:835–853
Jin WZ, Wang SM, Xu M, Duan RJ, Wu P (2004) Characterization of enhancer trap and gene trap harboring Ac/Ds transposon in transgenic rice. J Zhejiang Univ Sci 5:390–399
Jones DA, Thomas CM, Hammond-Kosack KE, Balint-Kurti PJ, Jones JD (1994) Isolation of the tomato Cf-9 gene for resistance to Cladosporium fulvum by transposon tagging. Science 266:789–793
Jones JDG, Carland FM, Maliga P, Dooner HK (1989) Visual detection of transposition of the maize element Activator (Ac) in tobacco seedlings. Science 244:204–207
Jones JDG, Carland F, Lim E, Ralston E, Dooner HK (1990) Preferential transposition of the maize element Activator to linked chromosomal locations in tobacco. Plant Cell 2:701–707
Jung KH, Hur J, Ryu CH, Choi Y, Chung YY, Miyao A, Hirochika H, An G (2003) Characterization of a rice chlorophyll-deficient mutant using the T-DNA gene-trap system. Plant Cell Physiol 44:463–472
Jung KH, Han MJ, Lee YS, Kim YW, Hwang I, Kim MJ, Kim YK, Nahm BH, An G (2005) Rice Undeveloped Tapetum1 is a major regulator of early tapetum development. Plant Cell 17:2705–2722
Kaneko M, Inukai Y, Ueguchi-Tanaka M, Itoh H, Izawa T, Kobayashi Y, Hattori T, Miyao A, Hirochika H, Ashikari M, Matsuoka M (2004) Loss-of-function mutations of the rice GAMYB gene impair alpha-amylase expression in aleurone and flower development. Plant Cell 16:33–44
Kaiser K, Goodwin SF (1990) “Site-selected” transposon mutagenesis of Drosophila. Proc Natl Acad Sci USA 87:1686–1690
Keller J, Jones JD, Harper E, Lim E, Carland F, Ralston EJ, Dooner HK (1993a) Effects of gene dosage and sequence modification on the frequency and timing of transposition of the maize element Activator (Ac) in tobacco. Plant Mol Biol 21:157–170
Keller J, Lim E, Dooner H (1993b) Preferential transposition of Ac to linked sites in Arabidopsis. Theor Appl Genet 86:585–588
Kikuchi K, Terauchi K, Wada M, Hirano HY (2003) The plant MITE mPing is mobilized in anther culture. Nature 421:167–170
Kim CM, Je BI, Piao HL, Par SJ, Kim MJ, Park SH, Park JY, Park SH, Lee EK, Chon NS, Won YJ, Lee GH, Nam MH, Yun DW, Lee MC, Cha YS, Le Kon H, Eun MY, Han CD (2002) Reprogramming of the activity of the activator/dissociation transposon family during plant regeneration in rice. Mol Cells 14:231–237
Kim CM, Piao HL, Park SJ, Chon NS, Je BI, Sun B, Park SH, Park JY, Lee EJ, Kim MJ, Chung WS, Lee KH, Lee YS, Lee JJ, Won YJ, Yi G, Nam MH, Cha YS, Yun DW, Eun MY, Han CD (2004) Rapid, large-scale generation of Ds transposant lines and analysis of the Ds insertion sites in rice. Plant J 39:252–263
Kohli A, Xiong J, Greco R, Christou P, Pereira A (2001) Tagged Transcriptome Display (TTD) in indica rice using Ac transposition. Mol Genet Genomics 266:1–11
Kolesnik T, Szeverenyi I, Bachmann D, Kumar CS, Jiang S, Ramamoorthy R, Cai M, Ma ZG, Sundaresan V, Ramachandran S (2004) Establishing an efficient Ac/Ds tagging system in rice: large-scale analysis of Ds flanking sequences. Plant J 37:301–314
Komatsu M, Chujo A, Nagato Y, Shimamoto K, Kyozuka J (2003a) FRIZZY PANICLE is required to prevent the formation of axillary meristems and to establish floral meristem identity in rice spikelets. Development 130:3841–3850
Komatsu M, Shimamoto K, Kyozuka J (2003b) Two-step regulation and continuous retrotransposition of the rice LINE-type retrotransposon Karma. Plant Cell 5:1934–1944
Koprek T, McElroy D, Louwerse J, Williams-Carrier R, Lemaux PG (2000) An efficient method for dispersing Ds elements in the barley genome as a tool for determining gene function. Plant J 24:253–263
Kumar CS, Wing RA, Sundaresan V (2005) Efficient insertional mutagenesis in rice using the maize En/Spm elements. Plant J 44:879–892
Kurusu T, Yagala T, Miyao A, Hirochika H, Kuchitsu K (2005) Identification of a putative voltage-gated Ca2+ channel as a key regulator of elicitor-induced hypersensitive cell death and mitogen-activated protein kinase activation in rice. Plant J 42:798–809
Lee S, Kim JH, Yoo ES, Lee CH, Hirochika H, An G (2005) Differential regulation of chlorophyll a oxygenase genes in rice. Plant Mol Biol 57:805–818
Liu YG, Mitsukawa N, Oosumi T, Whittier RF (1995) Efficient isolation and mapping of Arabidopsis thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR. Plant J 8:457–463
10 Transposon Insertional Mutants
267
Long D, Martin M, Sundberg E, Swinburne J, Puangsomlee P, Coupland G (1993) The maize transposable element system Ac/Ds as a mutagen in Arabidopsis: identification of an albino mutation induced by Ds insertion. Proc Natl Acad Sci USA 90:10370–10374 Long D, Goodrich J, Wilson K, Sundberg E, Martin M, Puangsomlee P, Coupland G (1997) Ds elements on all five Arabidopsis chromosomes and assessment of their utility for transposon tagging. Plant J 11:145–148 Maes T, De Keukeleire P, Gerats T (1999) Plant tagnology. Trends Plant Sci 4:90–96 Mahalingam R, Fedoroff N (2001) Screening insertion libraries for mutations in many genes simultaneously using DNA microarrays. Proc Natl Acad Sci USA 98:7420–7425 Mao L, Wood TC, Yu Y, Budiman MA, Tomkins J, Woo S, Sasinowski M, Presting G, Frisch D, Goff S, Dean RA, Wing RA (2000) Rice transposable elements: a survey of 73,000 sequence-tagged-connectors. Genome Res 10: 982–990 Margis-Pinheiro M, Zhou X-R, Zhu Q-H, Dennis ES, Upadhyaya NM (2005) Isolation and characterization of a Ds-tagged rice (Oryza sativa L.) GAresponsive dwarf mutant defective in an early step of the gibberellin biosynthesis pathway. Plant Cell Rep 23:819–833 Martin C, Carpenter R, Sommer H, Saedler H, Coen ES (1985) Molecular analysis of instability in flower pigmentation of Antirrhinum majus, following isolation of the pallida locus by transposon tagging. EMBO J 4:1625–1630 McClintock B (1950) The origin and behavior of mutable loci in maize. Proc Natl Acad Sci USA 36:344–355 McClintock B (1951) Chromosome organization and gene expression. Cold Spring Harbor Symp Quant Biol 16:13–47 McClintock B (1984) The significance of responses of the genome to challenge. Science 226:792–801 McElroy D, Louwerse JD, McElroy SM, Lemaux PG (1997) Development of a simple transient assay for Ac/Ds activity in cells of intact barley tissue. 
Plant J 11:157–165 Miyao A, Tanaka K, Murata K, Sawaki H, Takeda S, Abe K, Shinozuka Y, Onosato K, Hirochika H (2003) Target site specificity of the Tos17 retrotransposon shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. Plant Cell 15:1771–1780 Moon S, Jung KH, Lee DE, Lee DY, Lee J, An K, Kang HG, An G (2006) The rice FON1 gene controls vegetative and reproductive development by regulating shoot apical meristem size. Mol Cells 21:147–152 Morita R, Hattori Y, Yokoi S, Takase H, Minami M, Hiratsuka K, Toriyama K (2003) Assessment of utility of meiosis-associated promoters of lily for induction of germinal Ds transposition in transgenic rice. Plant Cell Physiol 44:637–642 Murai N, Li ZJ, Kawagoe Y, Hayashimoto A (1991) Transposition of the maize activator element in transgenic rice plants. Nucl Acids Res 19:617–622
268
Qian-Hao Zhu et al.
Muskett PR, Clissold L, Marocco A, Springer PS, Martienssen R, Dean C (2003) A resource of mapped dissociation launch pads for targeted insertional mutagenesis in the Arabidopsis genome. Plant Physiol 132:506–516 Nakagawa Y, Machida C, Machida Y, Toriyama K (2000) Frequency and pattern of transposition of the maize transposable element Ds in transgenic rice plants. Plant Cell Physiol 41:733–742 Nakamura A, Fukuda A, Sakai S, Tanaka Y (2006) Molecular cloning, functional expression and subcellular localization of two putative vacuolar voltage-gated chloride channels in rice (Oryza sativa L.). Plant Cell Physiol 47:32–42 Nakazaki T, Okumoto Y, Horibata A, Yamahira S, Teraishi M, Nishida H, Inoue H, Tanisaka T (2003) Mobilization of a transposon in the rice genome. Nature 421:170–172 Nishal B, Tantikanjana T, Sundaresan V (2005) An inducible targeted tagging system for localized saturation mutagenesis in Arabidopsis. Plant Physiol 137: 3–12 Nonomura K, Miyoshi K, Eiguchi M, Suzuki T, Miyao A, Hirochika H, Kurata N (2003) The MSP1 gene is necessary to restrict the number of cells entering into male and female sporogenesis and to initiate anther wall formation in rice. Plant Cell 15:1728–1739 Nonomura K, Nakano M, Fukuda T, Eiguchi M, Miyao A, Hirochika H, Kurata N (2004a) The novel gene HOMOLOGOUS PAIRING ABERRATION IN RICE MEIOSIS1 of rice encodes a putative coiled-coil protein required for homologous chromosome pairing in meiosis. Plant Cell 16:1008–1020 Nonomura K, Nakano M, Murata K, Miyoshi K, Eiguchi M, Miyao A, Hirochika H, Kurata N (2004b) An insertional mutation in the rice PAIR2 gene, the ortholog of Arabidopsis ASY1, results in a defect in homologous chromosome pairing during meiosis. Mol Genet Genom 271:121–129 O'Keefe DP, Tepperman JM, Dean C, Leto KJ, Erbes DL, Odell JT (1994) Plant expression of a bacterial cytochrome P450 that catalyzes activation of a sulfonylurea pro-herbicide. 
Plant Physiol 105:473–482 Page DR, Kohler C, Da Costa-Nunes JA, Baroux C, Moore JM, Grossniklaus U (2004) Intrachromosomal excision of a hybrid Ds element induces large genomic deletions in Arabidopsis. Proc Natl Acad Sci USA 101:2969–2974 Parinov S, Sevugan M, Ye D, Yang WC, Kumaran M, Sundaresan V (1999) Analysis of flanking sequences from dissociation insertion lines: a database for reverse genetics in Arabidopsis. Plant Cell 11:2263–2270 Pereira A, Schwarz-Sommer Z, Gierl A, Bertram I, Peterson PA, Saedler H (1985) Genetic and molecular analysis of the Enhancer (En) transposable element system of Zea mays. EMBO J 4:17–23 Raina S, Mahalingam R, Chen F, Fedoroff N (2002) A collection of sequenced and mapped Ds transposon insertion sites in Arabidopsis thaliana. Plant Mol Biol 50:93–110 Ramachandran S, Sundaresan V (2001) Transposons as tools for functional genomics. Plant Physiol Biochem 39:243–252 Saedler H, Nevers P (1985) Transposition in plants: a molecular model. EMBO J 4:585–590
10 Transposon Insertional Mutants
269
Sakamoto T, Miura K, Itoh H, Tatsumi T, Ueguchi-Tanaka M, Ishiyama K, Kobayashi M, Agrawal GK, Takeda S, Abe K, Miyao A, Hirochika H, Kitano H, Ashikari M, Matsuoka M (2004) An overview of gibberellin metabolism enzyme genes and their related mutants in rice. Plant Physiol 134:1642–1653 Sato Y, Sentoku N, Miura Y, Hirochika H, Kitano H, Matsuoka M (1999) Lossof-function mutations in the rice homeobox gene OSH15 affect the architecture of internodes resulting in dwarf plants. EMBO J 18:992–1002 Schmidt R, Willmitzer L (1989) The maize autonomous element Activator (Ac) shows a minimal germinal excision frequency of 0.2%–0.5% in transgenic Arabidopsis thaliana plants. Mol Gen Genet 220:17–24 Schmitz G, Theres K (1994) A self-stabilizing Ac derivative and its potential for transposon tagging. Plant J 6:781–786 Scofield SR, English JJ, Jones JD (1993) High level expression of the Activator transposase gene inhibits the excision of Dissociation in tobacco cotyledons. Cell 75:507–517 Shimamoto K, Miyazaki C, Hashimoto H, Izawa T, Itoh K, Terada R, Inagaki Y, Iida S (1993) Trans-activation and stable integration of the maize transposable element Ds cotransfected with the Ac transposase gene in transgenic rice plants. Mol Gen Genet 239:354–360 Siebert PD, Chenchik A, Kellogg DE, Lukyanov KA, Lukyanov SA (1995) An improved PCR method for walking in uncloned genomic DNA. Nucl Acids Res 23:1087–1088 Solis R, Takumi S, Mori N, Nakamura C (1999) Ac-mediated trans-activation of the Ds element in rice (Oryza sativa L.) cells as revealed by GUS assay. Hereditas 131:23–31 Sommer H, Carpenter R, Harrison BJ, Saedler H (1985) The transposable element Tam3 of Anitrrhinum majus generates a novel type of sequence alteration upon excision. Mol Gen Genet 199:225–231 Springer PS (2000) Gene traps: tools for plant development and genomics. Plant Cell 12:1007–1020 Sundaresan V (1996) Horizontal spread of transposon mutagenesis: new uses for old elements. 
Trends Plant Sci 1:184–190 Sundaresan V, Springer P, Volpe T, Haward S, Jones JD, Dean C, Ma H, Martienssen R (1995) Patterns of gene action in plant development revealed by enhancer trap and gene trap transposable elements. Genes Dev 9:1797–1810 Szeverenyi I, Ramamoorthy R, Teo Z, Luan H, Ma Z, Ramachandran S (2006) Large scale systematic study on stability of Ds element and timing of transposition in rice. Plant Cell Physiol 47:84–95 Tabuchi M, Sugiyama K, Ishiyama K, Inoue E, Sato T, Takahashi H, Yamaya T Severe reduction in growth rate and grain filling of rice mutants lacking OsGS1;1, a cytosolic glutamine synthetase1;1. Plant J 42:641–651 Takumi S, Murai K, Mori N, Nakamura C (1999) Variations in the maize Ac transposase transcript level and the Ds excision frequency in transgenic wheat callus lines. Genome 42:1234–1241 Takano M, Kanegae H, Shinomura T, Miyao A, Hirochika H, Furuya M (2001) Isolation and characterization of rice phytochrome A mutants. Plant Cell 13:521–534
270
Qian-Hao Zhu et al.
Tanaka K, Murata K, Yamazaki M, Onosato K, Miyao A, Hirochika H (2003) Three distinct rice cellulose synthase catalytic subunit genes required for cellulose synthesis in the secondary wall. Plant Physiol 133:73–83 Tissier AF, Marillonnet S, Klimyuk V, Patel K, Torres MA, Murphy G, Jones JD (1999) Multiple independent defective suppressor-mutator transposon insertions in Arabidopsis: a tool for functional genomics. Plant Cell 11:1841–1852 Tsugane K, Maekawa M, Takagi K, Takahara H, Qian Q, Eun CH, Iida S (2006) An active DNA transposon nDart causing leaf variegation and mutable dwarfism and its related elements in rice. Plant J 45:46–57 Upadhyaya NM, Surin B, Ramm K, Gaudron J, Schünmann PHD, Taylor W, Waterhouse PM, Wang M-B (2000) Agrobacterium-mediated transformation of Australian rice cultivars Jarrah and Amaroo using modified promoters and selectable markers. Aust J Plant Physiol 27:201–210 Upadhyaya NM, Zhou X-R, Zhu Q-H, Ramm K, Wu L, Eamens A, Sivakumar R, Kato T, Yun D-W, Kumar S, Narayanan KK, Peacock WJ and Dennis ES (2002) An iAc/Ds gene and enhancer trapping system for insertional mutagenesis in rice. Funct Plant Biol 29:547–559 Upadhyaya NM, Zhu Q-H, Zhou X-R, Eamens AL, Hoque MS, Ramm K, Shivakkumar R, Smith KF, Pan S-T, Li S, Peng K, Kim SJ, Dennis ES (2006) Dissociation (Ds) constructs, mapped Ds launch pads and a transiently expressed transposase system suitable for localized insertional mutagenesis in rice. Theor Appl Genet 112:1326–1341 van den Broeck D, Maes T, Sauer M, Zethof J, De Keukeleire P, D'hauw M, Van Montagu M, Gerats T (1998) Transposon Display identifies individual transposable elements in high copy number lines. 
Plant J 13:121–129 van Enckevort LJ, Droc G, Piffanelli P, Greco R, Gagneur C, Weber C, Gonzalez VM, Cabot P, Fornara F, Berri S, Miro B, Lan P, Rafel M, Capell T, Puigdomenech P, Ouwerkerk PB, Meijer AH, Pe' E, Colombo L, Christou P, Guiderdoni E, Pereira A (2005) EU-OSTID: a collection of transposon insertional mutants for functional genomics in rice. Plant Mol Biol 59:99–110 van Sluys MA, Tempe J, Fedoroff N (1987) Studies on the introduction and mobility of the maize Activator element in Arabidopsis thaliana and Daucus carota. EMBO J 6:3881–3889 Wang L, Kunze R (1998) Transposase binding site methylation in the epigenetically inactivated Ac derivative Ds-cy. Plant J 13:577–582 Xue Y, Li J, Xu Z (2003) Recent highlights of the China rice functional genomics program. Trends Genet 19:390–394 Yamazaki M, Tsugawa H, Miyao A, Yano M, Wu J, Yamamoto S, Matsumoto T, Sasaki T, Hirochika H (2001) The rice retrotransposon Tos17 prefers lowcopy-number sequences as integration targets. Mol Genet Genom 265: 336–344 Yamaguchi T, Lee DY, Miyao A, Hirochika H, An G, Hirano HY (2006) Functional diversification of the two C-class MADS box genes OSMADS3 and OSMADS58 in Oryza sativa. Plant Cell 18:15–28 Yoder JI, Palys J, Alpert K, Lassner M (1988) Ac transposition in transgenic tomato plants. Mol Gen Genet 213:291–296
10 Transposon Insertional Mutants
271
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92 Zhu Q-H, Hoque MS, Dennis ES, Upadhyaya NM (2003) Ds tagging of BRANCHED FLORETLESS 1 (BFL1) that mediates the transition from spikelet to floret meristem in rice (Oryza sativa L.). BMC Plant Biol 3:6 Zhu Q-H, Ramm K, Shivakkumar R, Dennis ES, Upadhyaya NM (2004) The ANTHER INDEHISCENCE1 gene encoding a single MYB domain protein is involved in anther development in rice. Plant Physiol 135:1514–1525 Zhu Q-H, Ramm K, Eamens AL, Dennis ES, Upadhyaya NM (2006a) Transgene structures suggest that multiple mechanisms are involved in T-DNA integration in plants. Plant Sci 171:308–322 Zhu Q-H, Dennis ES, Upadhyaya NM (2006b) compact shoot and leafy head 1, a mutation affects leaf initiation and developmental transition in rice (Oryza sativa L.). Plant Cell Rep (In Press, DOI: 10.1007/s00299-006-0259-6)
11 Gene Targeting by Homologous Recombination for Rice Functional Genomics
Shigeru Iida, Yasuyo Johzuka-Hisatomi and Rie Terada
National Institute for Basic Biology, Okazaki 444-8585, Japan
Reviewed by Barbara Hohn and Charles White
11.1 Introduction
11.2 Gene Targeting by Homologous Recombination
11.2.1 Gene-Specific Selection and Gene-Specific Screening
11.2.2 Strong Positive-Negative Selection for Enriching Targeted Homologous Recombinants
11.3 Potential Approaches for Homologous Recombination-Dependent Gene Targeting
11.4 Concluding Remarks
Acknowledgments
References
11.1 Introduction
Gene targeting refers to the alteration of a specific DNA sequence in an endogenous gene at its original locus in the genome and, often, to the conversion of the endogenous gene into a designed sequence (Iida and Terada 2005). Now that the rice genome has been completely sequenced (International Rice Genome Sequencing Project 2005), developing an easy and routine gene targeting procedure for characterizing a gene of interest in rice has become particularly important. To modify an endogenous gene into a predetermined sequence in higher plants, two approaches are generally employed: chimeric RNA/DNA oligonucleotide-directed gene targeting and homologous recombination-dependent gene targeting. The former method, which generates site-specific base changes, has so far been reported for only a single gene, the one encoding acetolactate synthase (ALS) in tobacco and rice or acetohydroxy acid synthase (AHAS) in maize. This enzyme catalyzes the first common step in the biosynthesis of the branched-chain amino acids, and the targeted changes altered specific amino acids of the enzyme so as to confer resistance to herbicides (Zhu et al. 2000; Kochevenko and Willmitzer 2003; Okuzaki and Toriyama 2004). The latter approach, homologous recombination-dependent gene targeting, has been demonstrated to lead to both gene replacements and base changes (Fig. 11.1).
In mouse, gene targeting by homologous recombination is a routine practice (Evans et al. 2001). In higher plants, however, despite numerous attempts over the past two decades (Hanin and Paszkowski 2003; Hohn and Puchta 2003; Reiss 2003; Iida and Terada 2004, 2005; Tzfira and White 2005), only three reports so far describe the reproducible targeting of endogenous genes that resulted in fertile transgenic plants: two genes in Arabidopsis and one in rice (Hanin et al. 2001; Terada et al. 2002; Shaked et al. 2005). Because independent gene targeting events by homologous recombination should all generate an identical genomic structure carrying the designed sequence alteration(s), it is very important to demonstrate experimentally that recombinants with the anticipated gene structure can be isolated reproducibly.
Fig. 11.1. Integration of a transgene associated with homologous recombination-dependent gene targeting.
(a) Targeted gene replacement. The filled arrowheads labeled RB and LB indicate the right and left borders, respectively, of the T-DNA in an introduced vector. The open box labeled “Marker” represents the segment for gene replacement, which usually carries either a selectable or screenable marker. True gene targeting is generally regarded to occur via double crossovers at the flanking homologous regions on the vector, and the brackets under the maps indicate the junction fragments generated by the crossovers. To identify the transformed calli having the anticipated targeted gene replacement, systematic PCR-based screening was used to detect the junction fragments. One homologous recombination and another nonhomologous end-joining at the target locus result in one-sided invasion. The most efficient integration of a transgene in Agrobacterium-mediated transformation is border-associated random integration promoted by nonhomologous end-joining, in which the integrated single T-DNA molecules contain the entire T-DNA segment with a well-conserved right border (dark gray arrowhead) and either conserved or slightly truncated left border (open arrowhead) sequences (Tinland and Hohn 1995; Brunaud et al. 2002; Tzfira et al. 2004). Random integration of transgenes that is independent of T-DNA border-associated integrations is also mediated by nonhomologous end-joining. Ectopic gene targeting arises when homologous recombination-promoted double crossovers between the introduced transgene and a copy of the target sequence are followed by random integration of the resulting recombinant molecule by nonhomologous end-joining; both junction fragments are generated, but the transgene is integrated randomly (Hohn and Puchta 2003; Iida and Terada 2004, 2005). The target gene is modified in both true gene targeting and one-sided invasion (within the box), whereas the target gene remains intact in both types of random integration as well as in ectopic gene targeting.
(b) Targeted base changes. The base change is indicated by open pentagrams. Targeted base changes can be explained by double crossover events. Alternatively, the mismatch repair of a heteroduplex intermediate produced by a single crossover and subsequent branch migration, followed by resolution, can also result in targeted point mutations. It is noteworthy that ectopic gene targeting was detected in all of the reported experiments with targeted base changes (Lee et al. 1990; Hanin et al. 2001; Endo et al. 2006).
(c) Targeted gene replacement with a positive-negative selection. The HPT (hph) and DT-A genes were used as positive and negative selection markers, respectively, and the DT-A genes were placed next to the border sequences at both ends of the T-DNA segment to eliminate the border-associated random integrations efficiently (Terada et al. 2002; Iida and Terada 2004). Except for the true gene targeting events, all of the calli that survived the positive–negative selection were found to carry truncated T-DNA segments, including the active HPT gene, integrated into the genome by nonhomologous end-joining processes that are independent of the border-associated random integration (Y. Johzuka-Hisatomi, unpublished results).
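The integration outcomes distinguished in Fig. 11.1a can be summarized as a small decision table. The Python sketch below is illustrative only: the function name `classify_integration`, its three boolean inputs, the 5'/3' naming of the two junction fragments, and the returned labels are hypothetical conveniences and not part of any published screening protocol. It assumes that, for a given transformant, one knows whether each junction fragment was detected by PCR and whether the target locus itself carries the modification.

```python
def classify_integration(junction_5p: bool, junction_3p: bool,
                         target_modified: bool) -> str:
    """Classify an integration event following the outcomes in Fig. 11.1a.

    junction_5p / junction_3p: whether each junction fragment expected from a
        homologous crossover was detected (hypothetical input names).
    target_modified: whether the endogenous target locus itself is altered.
    """
    n_junctions = int(junction_5p) + int(junction_3p)

    if n_junctions == 2 and target_modified:
        # Double crossover at the target locus.
        return "true gene targeting"
    if n_junctions == 2 and not target_modified:
        # Both junctions exist, but the recombinant molecule sits elsewhere.
        return "ectopic gene targeting"
    if n_junctions == 1 and target_modified:
        # One homologous crossover plus one nonhomologous end-joining.
        return "one-sided invasion"
    if n_junctions == 0 and not target_modified:
        # Border-associated or border-independent random integration.
        return "random integration"
    # Combinations not described in the figure are left undecided.
    return "unclassified"


if __name__ == "__main__":
    print(classify_integration(True, True, True))
    print(classify_integration(True, True, False))
```

The sketch makes one point from the caption explicit: detecting both junction fragments is not sufficient on its own, because true and ectopic gene targeting differ only in whether the target locus is modified.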
The most serious difficulty in gene targeting is thought to stem from the fact that, relative to random integration by nonhomologous end-joining, sequence-specific integration of a transgene by homologous recombination is much rarer in higher plants than in mouse embryonic stem cells: targeted integration in higher plants occurs on the order of 0.01% to 0.1% of the random integrations (Hanin et al. 2001; Terada et al. 2002; Iida and Terada 2005; Shaked et al. 2005), whereas in mouse it is reported to occur on the order of 1% or higher (Jasin et al. 1996). Since all successful experiments to target endogenous genes via homologous recombination have been performed by Agrobacterium-mediated transformation (Kempin et al. 1997; Hanin et al. 2001; Terada et al. 2002; Shaked et al. 2005; Endo et al. 2006), it is worthwhile to consider random integrations in Agrobacterium-mediated transformation briefly, although their molecular mechanisms remain largely unknown (Tzfira et al. 2004). The T-DNA molecules can be integrated into the genome by nonhomologous end-joining as a single copy or as multiple copies ligated to each other in various orientations. The majority of the integrated single-copy T-DNA molecules are known to contain the entire T-DNA segment with a well-conserved right border and a left-border sequence that is either conserved or truncated by a few to around 100 bp (collectively termed here border-associated random integration; Fig. 11.1a; Tinland and Hohn 1995; Brunaud et al. 2002; Tzfira et al. 2004). We further postulated that there must be other random integrations with relatively large deletions at both ends of the T-DNA segment, lacking the border-proximal regions (e.g., Matsumoto et al. 1990); these random integrations appear to occur much less frequently than the border-associated random integrations.
Because a significant portion of the single-stranded T-DNA imported into the plant nucleus can become double-stranded in Agrobacterium-mediated transformation (Tzfira et al. 2004), some of the integration processes that are independent of the border-associated integrations may be shared with those of transgenes introduced by direct DNA delivery methods (Tinland and Hohn 1995; Somers and Makarevitch 2004). In addition to such random integrations, the concomitant occurrence of undesirable ectopic recombination events, such as one-sided invasion and ectopic gene targeting, has sometimes been detected (Fig. 11.1a; Puchta 2002; Hanin and Paszkowski 2003; Hohn and Puchta 2003; Reiss 2003; Iida and Terada 2004, 2005). One-sided invasion results from one homologous crossover and one nonhomologous end-joining event at the target locus, while ectopic gene targeting is thought to be generated by the random integration of a recombinant molecule produced by homologous recombination between the introduced transgene and a copy of the target sequence, without altering the gene to be targeted.
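The practical consequence of the targeting frequencies quoted above can be made concrete with a little arithmetic. The Python sketch below is a rough illustration under stated assumptions, not a description of any published experimental design: it treats each integration event as an independent trial with a fixed probability f of being a targeted event, and asks how many events n are needed so that 1 - (1 - f)^n reaches a chosen confidence. The function name `transformants_needed` and the 95% default are hypothetical choices; only the frequency ranges (0.01% to 0.1% in higher plants, about 1% in mouse) come from the text.

```python
import math


def transformants_needed(targeting_fraction: float,
                         confidence: float = 0.95) -> int:
    """Smallest number n of independent integration events such that the
    probability of at least one targeted event, 1 - (1 - f)**n, is at
    least `confidence`. Assumes a fixed per-event probability f."""
    if not 0.0 < targeting_fraction < 1.0:
        raise ValueError("targeting_fraction must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # log1p(-f) evaluates log(1 - f) accurately when f is very small.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-targeting_fraction))


if __name__ == "__main__":
    cases = [
        ("higher plants, 0.01%", 1e-4),
        ("higher plants, 0.1%", 1e-3),
        ("mouse, 1%", 1e-2),
    ]
    for label, fraction in cases:
        print(f"{label}: {transformants_needed(fraction)} integration events")
```

Under these simplifying assumptions, the required number of events scales roughly as 3/f at 95% confidence, so a hundredfold lower targeting frequency implies a roughly hundredfold larger experiment.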
To circumvent the recovery of the overwhelming number of random integrations in higher plants, one approach is to apply either gene-specific selection or gene-specific screening for the target genes. The Arabidopsis PPO gene for protoporphyrinogen oxidase was chosen for direct gene-specific selection, through which targeted plants acquired herbicide resistance (Hanin et al. 2001), and the Arabidopsis Cruciferin gene for a seed storage protein was employed for gene-specific visual screening, through which the targeted integration of a promoterless gfp gene resulted in fluorescent seeds (Shaked et al. 2005). Another approach is to use a strong positive–negative selection, which enriches the transformants carrying targeted genes indirectly by reducing the transformants with randomly integrated transgenes that contain a lethal negative-selection marker (the DT-A gene encoding the diphtheria toxin A fragment), and then to identify true gene targeting by polymerase chain reaction (PCR) analysis among the surviving transformants. Mutants with the modified rice Waxy gene were obtained in this way (Terada et al. 2002; Iida and Terada 2005). The latter approach is, in principle, applicable to any other gene, whereas gene-specific screening appears to be applicable only to genes that are reasonably well expressed in a highly specific manner (e.g., seed-specific expression of the Cruciferin gene) at the time the homologous recombinants are screened.
It should be emphasized here that the method for Agrobacterium-mediated transformation in Arabidopsis differs from that for rice. While the inflorescence infiltration or floral dip method is routinely employed in Arabidopsis and the resulting transformants can be selected among the progeny seedlings (Bechtold et al. 1993; Clough and Bent 1998), a procedure using embryo scutellum-derived calli generated from mature seeds is generally used for Agrobacterium inoculation in rice (Hiei et al. 1994), and this procedure was subsequently modified for large-scale transformation to adapt it for gene targeting (Terada et al. 2002, 2004; R. Terada, unpublished results). One possible drawback of the rice transformation protocol is the concomitant occurrence of somaclonal variations, which refer to genetic and epigenetic changes induced by tissue culture (Larkin and Scowcroft 1981; Kaeppler et al. 2000). Because tissue culture is necessary in almost all of the currently available reverse genetic procedures in rice (Hirochika et al. 2004; Leung and An 2004), the occurrence of somaclonal variations appears to be inevitable and may hamper the efficient characterization of gene function. Indeed, the tagging efficiency with the endogenous retrotransposon Tos17 is reported to be very low due to the high occurrence of somaclonal variations, as tissue culture is a prerequisite for activating the dormant Tos17 element (Hirochika et al. 1996; Kumar and Hirochika 2001).
Under such circumstances, from the viewpoint of rice functional genomics, it would not be ideal to alter the recombination and/or repair systems in order to suppress random integrations or enhance homologous recombination, even where the plant genes affecting homologous recombination or T-DNA integration are known (Britt and May 2003; Tzfira et al. 2004; Schuermann et al. 2005), because such alterations are likely to confer additional phenotypes. For example, transgenic Arabidopsis plants that overexpress the yeast RAD54 gene for an SWI2/SNF2 chromatin-remodeling protein, which have been shown to enhance homologous recombination-dependent gene targeting, also show resistance to γ-irradiation (Shaked et al. 2005). Thus, altering the recombination and/or repair systems would be more appropriate for elucidating the mechanisms of recombination processes in higher plants than for characterizing gene function per se. To avoid the potential side effects caused by altering the recombination and/or repair systems, it would be preferable to employ wild-type rice varieties for developing a general reverse genetic method that characterizes an endogenous gene by modifying the gene of interest. Even under such conditions, one of six independent T0 transgenic rice plants having the Waxy gene disrupted by gene targeting was reported to be less fertile and to set few seeds, although its T1 progeny appeared to be normal (Terada et al. 2002), suggesting that the mild fertility defect may be due to a somaclonal variation.
With such considerations in mind, we are attempting to develop a gene targeting procedure for rice functional genomics. In this short chapter, we focus on approaches toward routine, efficient, and generally applicable gene targeting by homologous recombination and discuss them from the viewpoint of the functional analysis of any rice gene without the concomitant occurrence of potential side effects caused by the approaches employed.
In this context, we briefly describe the current status of our gene targeting approach, which uses large-scale Agrobacterium-mediated transformation combined with a strong positive-negative selection. Because the present gene targeting methods may have certain inherent limitations, we also discuss other possible approaches that have not so far been successful for rice gene targeting.
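Why an indirect enrichment step reduces the screening burden can be sketched numerically. The Python example below is a simplified model under assumptions that are not taken from the chapter: every targeted recombinant is assumed to survive the selection, each random integrant is assumed to be eliminated independently with a probability `kill_efficiency`, and both the function name and that parameter are hypothetical. No measured kill efficiency is implied.

```python
def targeted_fraction_after_selection(targeting_fraction: float,
                                      kill_efficiency: float) -> float:
    """Fraction of surviving transformants that are targeted recombinants.

    targeting_fraction: fraction of all integration events that are targeted
        before selection (e.g. 1e-3 for 0.1%).
    kill_efficiency: assumed probability that a random integrant is removed
        by the negative selection marker (hypothetical parameter).
    """
    if not 0.0 < targeting_fraction <= 1.0:
        raise ValueError("targeting_fraction must be in (0, 1]")
    if not 0.0 <= kill_efficiency <= 1.0:
        raise ValueError("kill_efficiency must be in [0, 1]")
    # Assumption: targeted events lack the negative marker and all survive.
    targeted = targeting_fraction
    # Random integrants survive only if the negative selection fails.
    surviving_random = (1.0 - targeting_fraction) * (1.0 - kill_efficiency)
    return targeted / (targeted + surviving_random)


if __name__ == "__main__":
    for efficiency in (0.0, 0.9, 0.99):
        fraction = targeted_fraction_after_selection(1e-3, efficiency)
        print(f"kill efficiency {efficiency}: targeted fraction {fraction:.4f}")
```

In this toy model the enrichment factor approaches 1 / (1 - kill_efficiency) when targeted events are rare, which is why the stringency of the negative selection marker matters so much for the subsequent PCR screening.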
11.2 Gene Targeting by Homologous Recombination
Chimeric RNA/DNA oligonucleotide-directed gene targeting has been applied only to a single gene encoding ALS or AHAS. These chimeric RNA/DNA oligonucleotides were introduced either by particle bombardment (in most cases) or by electroporation, followed by gene-specific selection for herbicide resistance to isolate transgenic plants with the anticipated targeted modification (Zhu et al. 2000; Kochevenko and Willmitzer 2003; Okuzaki and Toriyama 2004). Although homologous recombination-dependent gene targeting has been employed for both gene replacements and site-specific base changes (Fig. 11.1), so far, base changes in an endogenous gene have been attempted only via gene-specific selection procedures (Lee et al. 1990; Hanin et al. 2001; Endo et al. 2006).
11.2.1 Gene-Specific Selection and Gene-Specific Screening
The modification of two endogenous genes, those for ALS/AHAS and PPO, by gene-specific selection for herbicide resistance has been attempted in tobacco and Arabidopsis via homologous recombination-dependent gene targeting; only the Arabidopsis PPO gene has been reproducibly modified, and more than two independent fertile transgenic plants were obtained (Hanin et al. 2001). In all of the cases reported (Lee et al. 1990; Hanin et al. 2001; Endo et al. 2006), Agrobacterium-mediated transformation was used to introduce appropriate vector constructs, and undesirable ectopic gene targeting events were always observed. Although the modification of the ALS/AHAS gene by chimeric RNA/DNA oligonucleotide-directed gene targeting has been reported in maize, tobacco, and rice, the successful transmission of the modified alleles has been documented only in maize and tobacco (Zhu et al. 2000; Kochevenko and Willmitzer 2003; Okuzaki and Toriyama 2004). Because all the modified alleles used are previously known alleles that confer herbicide resistance, it is rather doubtful that these procedures would be applicable to any rice gene of interest for the elucidation of its function. There is only one report describing a reproducible modification of an endogenous gene by gene-specific visual screening in homologous recombination-dependent gene targeting with a promoterless gfp gene (Shaked et al. 2005).
Surprisingly, all four transgenic plants derived from fluorescent seeds in the control experiments with wild-type Arabidopsis plants were found to contain the anticipated Cruciferin gene structure, with the gfp sequence integrated precisely by homologous recombination, even though it is conceivable that the promoterless gfp sequence flanked by the 1.2- and 2.5-kb Cruciferin sequences could be truncated and integrated in-frame into various endogenous genes that are expressed, to a certain extent, during seed development. By comparison, it is less surprising that all 15 putative gene-targeted alleles analyzed in the yeast RAD54-overexpressing plants, in which homologous recombination is enhanced, were generated by gene targeting. It remains to be seen whether such a gene-specific screening strategy with the promoterless gfp sequence is applicable to various endogenous genes in wild-type rice for functional genomic analysis.
Shigeru Iida et al.
11.2.2 Strong Positive–Negative Selection for Enriching Targeted Homologous Recombinants
A strong positive–negative selection was devised for enriching targeted genes indirectly by reducing transformants with randomly integrated transgenes that contain a lethal negative selection marker (Iida and Terada 2005). The DT-A gene encoding the diphtheria toxin A fragment was shown to be effective as a negative selection marker, whereas the Escherichia coli codA gene for cytosine deaminase, employed in an earlier attempt, was found to be insufficient as such a marker (Terada et al. 2004). In addition, a large-scale Agrobacterium-mediated transformation procedure was developed to obtain rare calli that had the target gene modified (Terada et al. 2002, 2004) and was further improved to achieve an easy and routine transformation method with increasing experimental scales (R. Terada, unpublished results). Most of the single T-DNA molecules integrated into the genome were shown to contain the entire T-DNA segment with a well-conserved right border and either conserved or slightly truncated left-border sequences (border-associated random integration in Fig. 11.1a; Tinland and Hohn 1995; Brunaud et al. 2002; Tzfira et al. 2004). To eliminate such border-associated random integrations effectively, the strong negative selection marker, DT-A, was placed next to the border sequences at both ends of the T-DNA segment in the vector used for gene targeting (Fig. 11.1c). All of the surviving calli tested were found to contain an active HPT gene but no DT-A gene with intact introduced promoter sequences, indicating that none of them is an escapee from the positive–negative selection (Terada et al. 2004; Y. Johzuka-Hisatomi, unpublished results). 
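The selection logic described above can be summarized as a simple rule: a callus survives only if it carries an active HPT gene and no intact DT-A gene. The following Python fragment is a minimal sketch of that rule, not a model of the actual experiments; the `Integration` class, its field names, and the listed event types are hypothetical illustrations, and it assumes that a targeted homologous recombinant retains HPT while losing both DT-A cassettes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    """Hypothetical summary of one transformed callus (illustration only)."""
    label: str
    hpt_active: bool   # positive marker: functional HPT gene present
    dta_intact: bool   # negative marker: at least one intact DT-A gene present


def survives_selection(event: Integration) -> bool:
    """Survival requires an active HPT gene and no intact DT-A gene."""
    return event.hpt_active and not event.dta_intact


# Assumed event types; the flags are illustrative, not measured data.
events = [
    Integration("border-associated random integration", True, True),
    Integration("integration lacking both DT-A cassettes", True, False),
    Integration("targeted homologous recombinant", True, False),
    Integration("no integration", False, False),
]

survivors = [e.label for e in events if survives_selection(e)]
print(survivors)
```

Under these assumptions the selection removes border-associated random integrations but cannot by itself separate targeted recombinants from other DT-A-free integrations, which is why a further screening step is still needed among the survivors.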
Thus, the surviving calli carry truncated T-DNA segments without both border-proximal regions integrated into the genome by nonhomologous end-joining processes that must be independent of the border-associated random integrations (Fig. 11.1c). After enriching the transformants with the anticipated targeted alleles, PCR screening was employed to identify homologous recombinant calli that had the HPT gene integrated into the genome via homologous recombination (Terada et al. 2002). If necessary, the PCR-amplified junction fragments could be characterized further by restriction cleavage analysis and end-sequencing to exclude false-positive PCR-amplified fragments that were similar in size to the anticipated junction fragments (Y. Johzuka-Hisatomi, unpublished results). Subsequently, about 30 fertile transgenic rice plants were regenerated from each single homologous recombinant callus through multiple shoots. Plants having the target gene modified in the homozygous condition were obtained among the selfed progeny of these fertile transgenic plants and examined to determine whether the gene to be targeted carried the anticipated modified structure or remained intact, to distinguish
11 Gene Targeting by Homologous Recombination
between true gene targeting and ectopic gene targeting (Fig. 11.1a and c; Terada et al. 2002; Iida and Terada 2004, 2005). There are two ways to estimate the frequency of gene targeting employing Agrobacterium-mediated transformation. The commonly used gene targeting frequency is calculated as the ratio of the number of homologous recombinants to the number of transformants resulting from the usual border-associated random integrations (transformants obtained by using another vector that does not contain the negative DT-A gene). The targeting frequency for the Waxy gene in wild-type rice, as determined in this way, was estimated to be 0.065%, whereas those for the PPO and Cruciferin genes in wild-type Arabidopsis were reported to be 0.072% and around 0.56%, respectively (Hanin et al. 2001; Terada et al. 2002; Shaked et al. 2005). An alternative gene targeting frequency is determined as homologous recombination-promoted integrations per nonhomologous end-joining-mediated random integration that is independent of the border-associated random integrations; this frequency is calculated as homologous recombinant calli per surviving callus with positive–negative selection (Fig. 11.1c), and such targeting frequency for the rice Waxy gene was 0.94% (Terada et al. 2002). Using basically the same strategy with positive–negative selection, we were able to obtain transgenic rice plants with either an altered Adh1 or Adh2 gene for alcohol dehydrogenase and homologous recombinant calli with a modified DDM1 gene for an SWI2/SNF2 chromatin-remodeling protein (Jeddeloh et al. 1999; International Rice Genome Sequencing Project 2005). All of the obtained primary transgenic plants with the targeted modifications in Waxy, Adh1, or Adh2 were found to carry only one copy of the transgene with the anticipated structure in the heterozygous condition, and neither one-sided invasion nor ectopic gene targeting could be detected (Terada et al. 2002; Y. Johzuka-Hisatomi, R. Terada, K. 
Yamaguchi, and S. Iida, unpublished results). In the case of the DDM1 recombinant calli, we still need to examine whether some of them carry ectopically targeted genes. Moreover, the targeting frequencies of the rice Waxy and Adh2 genes, as determined by homologous recombination-promoted integrations per surviving callus with positive–negative selection, were found to be 1% or higher, comparable with the gene targeting frequencies in mouse embryonic stem cells (Jasin et al. 1996). Interestingly, the targeting frequency of Adh1 was considerably lower than that of Adh2 (R. Terada and Y. Johzuka-Hisatomi, unpublished results), even though these Adh genes are clustered in the same orientation on chromosome 11 (Tarchini et al. 2000; International Rice Genome Sequencing Project 2005). Possible models for the generation of successful gene targeting events with positive–negative selection have been discussed (Iida and Terada 2004, 2005).
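Because the two targeting frequencies defined above share a numerator but use different denominators, a short calculation may help to show why they differ by more than an order of magnitude. The sketch below is illustrative only: the function name is hypothetical and the callus counts are invented round numbers chosen for clarity, not data from any of the cited studies.

```python
def targeting_frequency(recombinants: int, denominator: int) -> float:
    """Return a gene targeting frequency as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be a positive count")
    return 100.0 * recombinants / denominator


# Hypothetical counts (assumptions for illustration, not experimental data).
recombinant_calli = 5           # homologous recombinants confirmed by PCR
random_transformants = 10_000   # transformants expected with a vector lacking DT-A
surviving_calli = 500           # calli surviving positive-negative selection

# Conventional definition: recombinants per border-associated random integration.
conventional = targeting_frequency(recombinant_calli, random_transformants)

# Alternative definition: recombinants per surviving callus.
alternative = targeting_frequency(recombinant_calli, surviving_calli)

print(f"conventional: {conventional:.3f}%  alternative: {alternative:.2f}%")
```

With these assumed counts the conventional frequency is 0.05% and the alternative frequency is 1%, so any comparison between studies needs to state which denominator was used.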
11.3 Potential Approaches for Homologous Recombination-Dependent Gene Targeting
Although only three reports have described the reproducible gene targeting of endogenous genes that resulted in the generation of fertile transgenic plants (Hanin et al. 2001; Terada et al. 2002; Shaked et al. 2005), various approaches for homologous recombination-dependent gene targeting have been attempted (Puchta 2002; Britt and May 2003; Gong and Rong 2003; Hanin and Paszkowski 2003; Reiss 2003; Iida and Terada 2005; Tzfira and White 2005). Here, we describe two emerging approaches potentially applicable to the targeting of an endogenous gene in higher plants: zinc-finger nucleases for the induction of genomic double-strand breaks and gene targeting by generating an intermediate ends-out molecule. Both of these approaches were originally developed for targeting the endogenous yellow gene in Drosophila (Bibikova et al. 2003; Gong and Golic 2003; Porteus and Carroll 2005). It is known that genomic double-strand breaks can be repaired either by one of several homologous recombination mechanisms or by various nonhomologous repair processes (Haber 2000; Ray and Langer 2002; Carroll 2004; Puchta 2005). One of the most promising approaches to introduce double-strand breaks at a targeted genomic sequence is to employ synthetic zinc-finger nucleases; these enzymes are hybrid proteins composed of a nonspecific DNA-cleavage domain of the FokI restriction enzyme and DNA-binding modules based on Cys2His2 zinc fingers that can recognize the GNN and ANN sequences (Segal et al. 1999; Dreier et al. 2001; Liu et al. 2002). Thus, the zinc-finger nucleases recognize and cleave appropriate sequences composed of (NNY)3N6(RNN)3, or, perhaps preferably, of (NNC)3N6(GNN)3, because the designs of the zinc-finger DNA-binding modules for the GNN triplets appear to be better characterized than those for the ANN triplets (Carroll 2004). 
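The site compositions (NNY)3N6(RNN)3 and (NNC)3N6(GNN)3 given above translate directly into sequence patterns, where N is any base, Y is C or T, and R is A or G. The following Python sketch scans a DNA sequence for candidate sites of either composition. It is only an illustration of the pattern matching: the function name, the regular expressions, and the test sequence are assumptions made here, and a match says nothing about whether zinc-finger modules of adequate specificity can actually be designed for that site.

```python
import re

# (NNY)3 N6 (RNN)3: three NNY triplets, a 6-bp spacer, three RNN triplets.
GENERAL_SITE = r"(?:[ACGT]{2}[CT]){3}[ACGT]{6}(?:[AG][ACGT]{2}){3}"
# (NNC)3 N6 (GNN)3: the subset bound by GNN-recognizing modules on both sides.
PREFERRED_SITE = r"(?:[ACGT]{2}C){3}[ACGT]{6}(?:G[ACGT]{2}){3}"


def find_zfn_sites(sequence: str, preferred_only: bool = False):
    """Return (start, site) tuples for every candidate 24-bp site.

    A zero-width lookahead is used so that overlapping sites are reported.
    """
    pattern = PREFERRED_SITE if preferred_only else GENERAL_SITE
    scanner = re.compile(rf"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in scanner.finditer(sequence.upper())]


# Hypothetical test sequence built to contain exactly one preferred site.
demo = "TT" + "AAC" * 3 + "ACGTAC" + "GAA" * 3 + "TT"
for start, site in find_zfn_sites(demo, preferred_only=True):
    print(start, site)
```

Because the reverse complement of an RNN triplet is an NNY triplet, the composite pattern is closed under reverse complementation, so scanning one strand is sufficient for this particular search.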
The first successful modification of an endogenous natural gene by designed synthetic zinc-finger nucleases was the targeted mutagenesis of the yellow locus in Drosophila: a pair of the introduced synthetic zinc-finger nuclease genes fused with a heat-shock promoter was induced to generate double-strand breaks that could be repaired by nonhomologous end-joining, and small deletions and/or insertions were observed at the cleavage site (Bibikova et al. 2002). Subsequently, the targeted replacement of the yellow gene was successfully performed by generating a linear intermediate ends-out molecule that was the I-SceI endonuclease-cleaved product of the excised extrachromosomal circular molecule generated by the FLP site-specific recombinase (Gong and Golic 2003). For successful gene targeting, both genes for the I-SceI endonuclease and FLP site-specific recombinase were fused with the same heat-shock promoter and simultaneously induced. Further, Bibikova et al. (2003) have succeeded in modifying the yellow locus more efficiently by combining the
synthetic zinc-finger nuclease to introduce double-strand breaks at the yellow gene with the intermediate ends-out molecule generated by the FLP and I-SceI enzymes. The frequency of gene targeting with the induction of the zinc-finger nuclease was found to be about 10-fold greater than that without induction, indicating that the introduction of the double-strand break at the target site activates homologous recombination processes. The current status of gene targeting using zinc-finger nucleases, including the targeting of a human gene, was recently reviewed (Porteus and Carroll 2005). In higher plants, gene targeting using zinc-finger nucleases remains in its infancy, and only two reports relate to gene targeting promoted by zinc-finger nucleases (Lloyd et al. 2005; Wright et al. 2005). Targeted mutagenesis at an artificial synthetic target sequence in the Arabidopsis genome was demonstrated, and the anticipated small deletions and/or insertions at the target site were detected (Lloyd et al. 2005). To enhance homologous recombination by a transiently expressed zinc-finger nuclease, a construct containing the zinc-finger nuclease gene was introduced into tobacco protoplasts by electroporation, and a defect due to a 0.6-kb deletion in a model gus:nptII reporter gene encoding an artificial translational fusion of β-glucuronidase (GUS) and neomycin phosphotransferase (NPTII) in the tobacco genome was shown to be restored by homologous recombination with introduced 4.9-kb double-stranded gus:nptII DNA fragments (Wright et al. 2005). Thus, it remains to be seen whether an endogenous natural gene in higher plants, including rice, can be modified by homologous recombination that is enhanced by introducing double-strand breaks with zinc-finger nucleases. 
Because a significant portion of the single-stranded T-DNA imported into the plant nucleus can become double-stranded before integration of the T-DNA into the plant genome in Agrobacterium-mediated transformation (Tzfira et al. 2004) and because the resulting double-stranded DNA molecules for targeted gene replacements closely resemble the intermediate ends-out molecules (Figs. 11.1a, 11.2b; Iida and Terada 2004), it is conceivable that a double-strand break at the target gene generated by zinc-finger nucleases enhances the efficiency of gene targeting (Porteus and Carroll 2005; Puchta 2005). Since the VirD2 protein may be covalently attached to the 5'-end of the imported T-DNA strand of the former molecules (Tzfira et al. 2004), the ends-out intermediates generated by the I-SceI endonuclease (see Fig. 11.2b) may serve as better substrates for homologous recombination than the double-stranded T-DNA molecules. Nevertheless, it is likely that the combination of positive–negative selection with the utilization of zinc-finger nucleases will facilitate the targeting of certain genes, for example, Adh1, whose targeting frequency has been shown to be significantly below the 1% level, as determined by homologous recombinant calli per surviving callus with positive–negative selection (Fig. 11.1c; R. Terada and Y. Johzuka-Hisatomi, unpublished results).
Fig. 11.2. Zinc-finger nucleases for the induction of genomic double-strand breaks and gene targeting by generating an intermediate ends-out molecule. (a) A double-strand break generated by zinc-finger nucleases. The genes for zinc-finger nucleases comprise a synthetic DNA-recognition domain consisting of three DNA-binding modules, each of which is based on Cys2His2 zinc fingers (Zf), fused to a cleavage domain derived from the FokI restriction enzyme (Porteus and Carroll 2005; Tzfira and White 2005). The target sequence to be cleaved is taken from the endogenous yellow gene in Drosophila (Bibikova et al. 2002). (b) Schematic representation of gene targeting involving an intermediate ends-out molecule. The site-specific recombination sites for the FLP recombinase are indicated by thick horizontal arrows, and the cleavage sites for the I-SceI endonuclease are shown by thin vertical arrows. The transgene, disrupted by either a selectable or a screenable marker and flanked by the sites for the FLP site-specific recombinase and the I-SceI endonuclease, is integrated into the genome by a P-element-based vector (Gong and Golic 2003), which can be easily substituted by a T-DNA-based vector in higher plants. Alternatively, the FRT sites for the FLP site-specific recombinase can be placed within the marker segment. For targeted gene replacements, an extrachromosomal circular molecule was excised by the FLP site-specific recombinase and then linearized by the I-SceI endonuclease to generate a linear intermediate ends-out molecule (Gong and Rong 2003). A double-strand break introduced at the target yellow gene by the zinc-finger nucleases was shown to enhance homologous recombination-dependent gene targeting in Drosophila (Bibikova et al. 2003).
Although it would seem logical to develop endogenous gene targeting in plants by combining the zinc-finger nucleases with the intermediate ends-out molecule (Fig. 11.2b), the induction of genes for zinc-finger and I-SceI nucleases as well as FLP recombinase (or similar endonucleases and site-specific recombinases) must be optimized for homologous recombination. This is because overexpression of these genes is likely to result in cytotoxic and other undesirable effects caused by excess cleavage of secondary target sequences that are similar to the intended target sequences (Salomon and Puchta 1998; Bibikova et al. 2002; Coppoolse et al. 2003; Gilbertson 2003; Porteus and Carroll 2005), whereas underexpression of these genes may render gene targeting inefficient. Although the combined approach may have considerable potential for gene targeting, the development of an appropriate induction system with appropriate conditions (Padidam 2003) suitable for controlling the expression of the genes for nucleases and recombinases must be a prerequisite for efficient targeting of an endogenous gene in rice. Since a transiently expressed zinc-finger nuclease was shown to stimulate gene targeting efficiently (Wright et al. 2005), a possible approach is to transiently induce the expression of genes encoding proteins that can directly or indirectly enhance homologous recombination processes, for example, RAD54, zinc-finger nucleases, or I-SceI endonuclease (Puchta 2005; Shaked et al. 2005), at the time of transformation for gene targeting. However, introduction of these multiple transgenes into the rice genome might accumulate potentially undesirable somaclonal variations because tissue culture is necessary for the Agrobacterium-mediated transformation procedures in rice (Hiei et al. 1994; Terada et al. 2002).
11.4 Concluding Remarks
Although the studies on homologous recombination and gene targeting are tightly linked in higher plants, in which homologous recombination is more inefficient than nonhomologous end-joining, the elucidation of homologous recombination processes and the application of gene targeting to characterize the gene of interest are not necessarily the same objectives. The alteration of recombination and/or repair systems would generally be more appropriate to elucidate the mechanisms of recombination processes than to characterize gene function. Except for gene tagging by an endogenous DNA transposable element (Tsugane et al. 2006), almost all of the currently available reverse genetic procedures in rice require tissue culture processes (Hirochika et al. 2004; Leung and An 2004), in which the concomitant occurrence of somaclonal variations associated with tissue culture is inevitable (Larkin and Scowcroft 1981; Kaeppler et al. 2000). In Arabidopsis, on the other hand, an easy transformation protocol by
infiltrating inflorescences or floral dipping with Agrobacterium, which is free from somaclonal variations because neither plant tissue culture nor regeneration processes are involved in the transformation (Bechtold et al. 1993; Clough and Bent 1998), has been established and is commonly used. Thus, the most urgently required technique to be developed for gene targeting and other reverse genetic analyses in rice will be the establishment of similar or alternative efficient transformation procedures (Potrykus 1991) that are free from the occurrence of somaclonal variations, even if the large-scale Agrobacterium-mediated transformation of rice calli with a strong positive–negative selection becomes an easy and routine procedure for gene targeting (Terada et al. 2002, 2004; R. Terada, unpublished results). As mentioned in the preceding text, we have isolated transgenic rice plants having any one of the Waxy, Adh1, or Adh2 genes homozygously modified by homologous recombination (Terada et al. 2002; Y. Johzuka-Hisatomi and R. Terada, unpublished results). Since the frequency of targeted integration of transgenes among the surviving calli with positive–negative selection in rice appears to be generally around 1%, it is now feasible to obtain transgenic rice plants having various endogenous genes modified by homologous recombination.
Acknowledgments
The work in our laboratory was supported by grants from the Ministry of Agriculture, Forestry, and Fisheries of Japan (IP1007), the Ministry of Education, Culture, Sports, Science, and Technology of Japan, and the Program for Promotion of Basic Research Activities for Innovative Biosciences (PROBRAIN). We thank Barbara Hohn for valuable discussions and comments on the manuscript, Kazuo Tsugane and Atsushi Hoshino for discussions, and Charles White for reading the manuscript.
References
Bechtold N, Ellis J, Pelletier G (1993) In planta Agrobacterium mediated gene transfer by infiltration of adult Arabidopsis thaliana. CR Acad Sci Paris Life Sci 316:1194–1199
Bibikova M, Golic M, Golic KG, Carroll D (2002) Targeted chromosomal cleavage and mutagenesis in Drosophila using zinc-finger nucleases. Genetics 161:1169–1175
Bibikova M, Beumer K, Trautman JK, Carroll D (2003) Enhancing gene targeting with designed zinc finger nucleases. Science 300:764
Britt AB, May GD (2003) Re-engineering plant gene targeting. Trends Plant Sci 8:90–95
Brunaud V, Balzergue S, Dubreucq B, Aubourg S, Samson F, Chauvin S, Bechtold N, Cruaud C, DeRose R, Pelletier G, Lepiniec L, Caboche M, Lecharny A (2002) T-DNA integration into the Arabidopsis genome depends on sequences of pre-insertion sites. EMBO Rep 3:1152–1157
Carroll D (2004) Using nucleases to stimulate homologous recombination. Methods Mol Biol 262:195–207
Clough SJ, Bent AF (1998) Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16:735–743
Coppoolse ER, de Vroomen MJ, Roelofs D, Smit J, van Gennip F, Hersmus BJM, Nijkamp HJ, van Haaren MJ (2003) Cre recombinase expression can result in phenotypic aberrations in plants. Plant Mol Biol 51:263–279
Dreier B, Beerli RR, Segal DJ, Flippin JD, Barbas CF 3rd (2001) Development of zinc finger domains for recognition of the 5'-ANN-3' family of DNA sequences and their use in the construction of artificial transcription factors. J Biol Chem 276:29466–29478
Endo M, Osakabe K, Ichikawa H, Toki S (2006) Molecular characterization of true and ectopic gene targeting events at the acetolactate synthase gene in Arabidopsis. Plant Cell Physiol 47:372–379
Evans MJ, Smithies O, Capecchi MR (2001) Mouse gene targeting. Nat Med 7:1081–1090
Gilbertson L (2003) Cre-lox recombination: Cre-ative tools for plant biotechnology. Trends Biotechnol 21:550–555
Gong WJ, Golic KG (2003) Ends-out, or replacement, gene targeting in Drosophila. Proc Natl Acad Sci USA 100:2556–2561
Gong M, Rong YS (2003) Targeting multi-cellular organisms. Curr Opin Genet Dev 13:215–220
Haber JE (2000) Partners and pathways repairing a double-strand break. Trends Genet 16:259–264
Hanin M, Paszkowski J (2003) Plant genome modification by homologous recombination. Curr Opin Plant Biol 6:157–162
Hanin M, Volrath S, Bogucki A, Briker M, Ward E, Paszkowski J (2001) Gene targeting in Arabidopsis. Plant J 28:671–677
Hiei Y, Ohta S, Komari T, Kumashiro T (1994) Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. Plant J 6:271–282
Hirochika H, Sugimoto K, Otsuki Y, Tsugawa H, Kanda M (1996) Retrotransposons of rice involved in mutations induced by tissue culture. Proc Natl Acad Sci USA 93:7783–7788
Hirochika H, Guiderdoni E, An G, Hsing YI, Eun MY, Han CD, Upadhyaya N, Ramachandran S, Zhang Q, Pereira A, Sundaresan V, Leung H (2004) Rice mutant resources for gene discovery. Plant Mol Biol 54:325–334
Hohn B, Puchta H (2003) Some like it sticky: targeting of the rice gene Waxy. Trends Plant Sci 8:51–53
Iida S, Terada R (2004) A tale of two integrations, transgene and T-DNA: gene targeting by homologous recombination in rice. Curr Opin Biotechnol 15:132–138
Iida S, Terada R (2005) Modification of endogenous natural genes by gene targeting in rice and other higher plants. Plant Mol Biol 59:205–219
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Jasin M, Moynahan ME, Richardson C (1996) Targeted transgenesis. Proc Natl Acad Sci USA 93:8804–8808
Jeddeloh JA, Stokes TL, Richards EJ (1999) Maintenance of genomic methylation requires a SWI2/SNF2-like protein. Nat Genet 22:94–97
Kaeppler SM, Kaeppler HF, Rhee Y (2000) Epigenetic aspects of somaclonal variation in plants. Plant Mol Biol 43:179–188
Kempin SA, Liljegren SJ, Block LM, Rounsley SD, Yanofsky MF, Lam E (1997) Targeted disruption in Arabidopsis. Nature 389:802–803
Kochevenko A, Willmitzer L (2003) Chimeric RNA/DNA oligonucleotide-based site-specific modification of the tobacco acetolactate synthase gene. Plant Physiol 132:174–184
Kumar A, Hirochika H (2001) Applications of retrotransposons as genetic tools in plant biology. Trends Plant Sci 6:127–134
Larkin PJ, Scowcroft WR (1981) Somaclonal variation: a novel source of variability from cell cultures for plant improvement. Theor Appl Genet 60:197–214
Lee KY, Lund P, Lowe K, Dunsmuir P (1990) Homologous recombination in plant cells after Agrobacterium-mediated transformation. Plant Cell 2:415–425
Leung H, An G (2004) Rice functional genomics: large-scale gene discovery and applications to crop improvement. Adv Agron 82:55–111
Liu Q, Xia Z, Zhong X, Case CC (2002) Validated zinc finger protein designs for all 16 GNN DNA triplet targets. J Biol Chem 277:3850–3856
Lloyd A, Plaisier CL, Carroll D, Drews GN (2005) Targeted mutagenesis using zinc-finger nucleases in Arabidopsis. Proc Natl Acad Sci USA 102:2232–2237
Matsumoto S, Ito Y, Hosoi T, Takahashi Y, Machida Y (1990) Integration of Agrobacterium T-DNA into a tobacco chromosome: possible involvement of DNA homology between T-DNA and plant DNA. Mol Gen Genet 224:309–316
Okuzaki A, Toriyama K (2004) Chimeric RNA/DNA oligonucleotide-directed gene targeting in rice. Plant Cell Rep 22:509–512
Padidam M (2003) Chemically regulated gene expression in plants. Curr Opin Plant Biol 6:169–177
Porteus MH, Carroll D (2005) Gene targeting using zinc finger nucleases. Nat Biotechnol 23:967–973
Potrykus I (1991) Gene transfer to plants: assessment of published approaches and results. Annu Rev Plant Physiol Plant Mol Biol 42:205–225
Puchta H (2002) Gene replacement by homologous recombination in plants. Plant Mol Biol 48:173–182
Puchta H (2005) The repair of double-strand breaks in plants: mechanisms and consequences for genome evolution. J Exp Bot 56:1–14
Ray A, Langer M (2002) Homologous recombination: ends as the means. Trends Plant Sci 7:435–440
Reiss B (2003) Homologous recombination and gene targeting in plant cells. Int Rev Cytol 228:85–139
Salomon S, Puchta H (1998) Capture of genomic and T-DNA sequences during double-strand break repair in somatic plant cells. EMBO J 17:6086–6095
Schuermann D, Molinier J, Fritsch O, Hohn B (2005) The dual nature of homologous recombination in plants. Trends Genet 21:172–181
Segal DJ, Dreier B, Beerli RR, Barbas CF 3rd (1999) Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5'-GNN-3' DNA target sequences. Proc Natl Acad Sci USA 96:2758–2763
Shaked H, Melamed-Bessudo C, Levy AA (2005) High-frequency gene targeting in Arabidopsis plants expressing the yeast RAD54 gene. Proc Natl Acad Sci USA 102:12265–12269
Somers D, Makarevitch I (2004) Transgene integration in plants: poking or patching holes in promiscuous genomes? Curr Opin Biotechnol 15:126–131
Tarchini R, Biddle P, Wineland R, Tingey S, Rafalski A (2000) The complete sequence of 340 kb of DNA around the rice Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4. Plant Cell 12:381–391
Terada R, Urawa H, Inagaki Y, Tsugane K, Iida S (2002) Efficient gene targeting by homologous recombination in rice. Nat Biotechnol 20:1030–1034
Terada R, Asao H, Iida S (2004) A large-scale Agrobacterium-mediated transformation procedure with a strong positive-negative selection for gene targeting in rice (Oryza sativa L.). Plant Cell Rep 22:653–659
Tinland B, Hohn B (1995) Recombination between prokaryotic and eukaryotic DNA: integration of Agrobacterium tumefaciens T-DNA into the plant genome. Genet Eng 17:209–229
Tsugane K, Maekawa M, Takagi K, Takahara H, Qian Q, Eun CH, Iida S (2006) An active DNA transposon nDart causing leaf variegation and mutable dwarfism and its related elements in rice. Plant J 45:46–57
Tzfira T, White C (2005) Towards targeted mutagenesis and gene replacement in plants. Trends Biotechnol 23:567–569
Tzfira T, Li J, Lacroix B, Citovsky V (2004) Agrobacterium T-DNA integration: molecules and models. Trends Genet 20:375–383
Wright DA, Townsend JA, Winfrey RJ Jr, Irwin PA, Rajagopal J, Lonosky PM, Hall BD, Jondle MD, Voytas DF (2005) High-frequency homologous recombination in plants mediated by zinc-finger nucleases. Plant J 44:693–705
Zhu T, Mettenburg K, Peterson DJ, Tagliani L, Baszczynski CL (2000) Engineering herbicide-resistant maize using chimeric RNA/DNA oligonucleotides. Nat Biotechnol 18:555–558
12 RNA Silencing and Its Application in Functional Genomics
Shaun J. Curtin1, Ming-Bo Wang1, John M. Watson1, Paul Roffey2, Chris L. Blanchard2 and Peter M. Waterhouse1
1CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601 Australia; 2Charles Sturt University, Wagga Wagga, NSW 2678, Australia
Reviewed by Werner Aufsatz
12.1 Introduction
12.2 Discovery of RNA Silencing
12.3 RNA Silencing Pathways
12.3.1 MicroRNA and Trans-Acting siRNA Pathways
12.3.2 Repeat-Associated Small Interfering RNA and RNA-Directed DNA Methylation
12.4 Proteins Involved in RNA Silencing Pathways
12.4.1 The Dicer-Like Proteins
12.4.2 Hua Enhancer 1
12.4.3 The Double-Stranded RNA-Binding Protein Family
12.4.4 The Argonaute Protein Family
12.4.5 RNA-Dependent RNA Polymerase (RdRP)
12.4.6 DNA Methyltransferases
12.5 RNA Silencing and Anti-Viral Defense
12.6 Gene Silencing Platforms in Plants
12.6.1 Delivery by Transgenes
12.6.2 Transient Delivery by Viral Vectors—Virus-Induced Gene Silencing
12.6.3 Transient Delivery by Agrobacterium Infection and Biolistics
12.7 Future Prospects of Gene Silencing Technology in Plants
References
12.1 Introduction
Recent recognition of the critical roles of small RNAs in eukaryotic development and metabolism has challenged our conventional thinking about
the ways in which genes are regulated in living organisms (Waterhouse et al. 2001a; Carrington and Ambros 2003; Stevenson and Jarvis 2003; Ruvkun et al. 2004). RNA silencing (or gene silencing) is the broad term used to describe mechanisms found in all organisms, with the notable exception of bacteria and the yeast Saccharomyces cerevisiae, variously termed post-transcriptional gene silencing (PTGS) in plants, quelling in fungi, and RNA interference (RNAi) in animals. These complex processes involve RNA–RNA, RNA–DNA, RNA–protein, and protein–protein interactions (Wang and Metzlaff 2005). In this chapter, we present an overview of the various RNA silencing pathways, the genes and proteins involved, and the gene silencing technologies that have been developed for RNAi-directed mutagenesis in plants. Most of the key discoveries relating to RNA silencing have been made in the model dicot Arabidopsis. However, the RNA silencing technologies are equally applicable to monocots such as rice. RNA silencing, which has evolved to an extraordinary level of sophistication in the plant kingdom, is a general term used to describe the ensemble of processes involved in virus defense, transposon and chromatin control, and regulation of expression of genes involved in plant development. RNA silencing involves mechanisms that interfere with gene expression by either suppressing gene transcription or initiating sequence-specific mRNA degradation. It can also interfere with gene expression through inhibition of translation, although this occurs less frequently in plants than in animals (Bartel 2004). It has been suggested that these recently discovered gene-silencing mechanisms have several parallels with the immune system of animals (Waterhouse et al. 2001b). Because RNA silencing exists in most eukaryotes, in varying degrees of complexity, it is considered to have an ancient evolutionary origin. 
Although difficult to prove conclusively, there is increasing evidence to suggest that RNA silencing is likely to have been a major factor in the evolution of multicellular, eukaryotic organisms from prokaryotic progenitors (Sharp 2001; Margis et al. 2006).
12.2 Discovery of RNA Silencing
RNA silencing was initially considered to be a side effect of introducing transgenes into a plant. When Napoli et al. (1990) and Van der Krol et al. (1990) attempted to overexpress a chalcone synthase (chsA) transgene in petunia, an unexpected result occurred. Chalcone synthase is a key enzyme involved in the biosynthesis of the red/purple pigments of petunia flowers. Surprisingly, many of the transgenic plants transformed with the cauliflower
12 RNA Silencing and Its Application in Functional Genomics
mosaic virus (CaMV) 35S promoter-driven chsA (chalcone synthase) expression construct lost both endogenous and transgene-encoded chalcone synthase activity, resulting in variably sized white sectors on the otherwise red/purple flowers. Nuclear run-on transcription experiments demonstrated that the loss of chsA mRNA was not associated with reduced transcription. Lindbo et al. (1993a) generated transgenic tobacco plants expressing a nontranslatable coat protein sequence derived from tobacco etch virus (TEV) and found that these plants were resistant to TEV but not to the unrelated potato virus Y (PVY). They found that TEV resistance was associated with decreasing steady-state levels of the transgene transcript, but not with a reduced rate of transgene transcription. They subsequently concluded that sequence-specific RNA degradation, induced by excessive levels of transgene- and virus-derived coat protein RNA, was responsible for the virus resistance (Lindbo et al. 1993b). They also postulated that a host-encoded RNA-dependent RNA polymerase was involved in this RNA-mediated virus resistance mechanism. While further examples of PTGS in plants continued to accumulate (Baulcombe 1996; Metzlaff et al. 1997; Waterhouse et al. 1998), the RNA silencing phenomenon was independently observed in other eukaryotic organisms such as fungi, in which it was termed quelling. Quelling was discovered when Cogoni et al. (1996) attempted to increase the orange pigment produced by the fungal pathogen Neurospora crassa by transforming the fungus with the responsible al1 pigment gene. The resultant transformants had albino phenotypes. Northern blot hybridization analysis of the mRNA indicated that decreased RNA accumulation, and not the rate of transcription, was the cause of this gene silencing phenomenon. In animals, RNA silencing was first reported when Guo and Kemphues (1995) used antisense RNA to block par-1 mRNA expression in the nematode Caenorhabditis elegans.
The par-1 gene is involved in cell fate determination during embryogenesis. They discovered that sense par-1 RNA also repressed expression of the par-1 gene and coined the term RNA interference (or RNAi) to describe this phenomenon. This finding inspired the experiments of Fire et al. (1998) in which they introduced dsRNA into C. elegans and found that it caused silencing of endogenous genes much more effectively than using either sense or antisense transcripts of the messenger RNA in question. Of considerable interest was their finding that only a small amount of dsRNA was required to achieve gene silencing, suggesting a catalytic or amplification step in the RNA interference process. Waterhouse et al. (1998) were the first to propose a role for double-stranded RNA (dsRNA) in RNA silencing in plants by showing that an inverted-repeat transgene, designed to express hairpin RNA (hpRNA) against the β-glucuronidase (GUS) reporter gene (gus or uidA) in rice, conferred much more efficient GUS silencing than conventional sense and
Shaun J. Curtin et al.
antisense gus transgenes. They also showed that tobacco plants, containing both sense and antisense transgenes encoding a PVY protease protein, were highly resistant to PVY. From these results they presented a model for a plant surveillance system that is induced by dsRNA, and is able to direct post-transcriptional gene silencing (PTGS). This finding also resulted in the development of highly efficient hpRNA transgene-mediated gene silencing technology in plants (discussed in detail below). The involvement of 21- to 25-nt small RNAs in RNA silencing was first demonstrated in plants by Hamilton and Baulcombe (1999). They showed that both silenced transgenes and infecting RNA viruses were associated with the accumulation of 21- to 25-nt small RNAs of both sense and antisense sequence. Subsequent studies by Zamore et al. (2000), using Drosophila in vitro systems, revealed the biochemical features of these small RNAs, termed small interfering RNAs or siRNAs: they generally have 2-nt 3' overhangs, and carry a 5' phosphate group. This provided direct evidence that siRNAs are the product of RNase III-like enzymes, and paved the way for the development of synthetic siRNA-mediated gene silencing technology in animals. The involvement of endogenous small RNA in regulating development was first reported by Lee et al. (1993), who discovered that the lin-4 locus, negatively regulating the level of LIN-14 protein, which is essential for the normal temporal control of diverse postembryonic developmental events in C. elegans, encodes a 22-nt small RNA. This lin-4 small RNA contains sequences complementary to the 3΄-untranslated region (UTR) of the LIN-14 mRNA, from which the authors proposed that lin-4 regulates lin-14 translation via an antisense RNA–RNA interaction.
The true significance of this finding was not recognized until several years later, when many lin-4-like small RNAs, termed microRNAs, were discovered in both animals and plants and were shown to play a pivotal role in the control of normal development. Another significant discovery concerning RNA silencing that went almost unnoticed for several years was the demonstration that a replicating viroid, a small RNA pathogen of plants, was capable of inducing de novo cytosine methylation of homologous DNA in the nucleus. Wassenegger et al. (1994) showed that when tobacco plants, containing a transgene derived from the potato spindle tuber viroid (PSTVd), were infected with the viroid, heavy methylation was observed only on the PSTVd-specific transgene sequences. Subsequent studies showed that this RNA-directed DNA methylation (RdDM) could be induced by viruses, viral satellite RNA and transgene-derived dsRNAs. The possible involvement of RdDM in RNA silencing was suggested by Lindbo et al. (1993a), who showed that the coding regions of post-transcriptionally silenced viral transgenes were hypermethylated. However, a clear demonstration of the involvement of RdDM in gene
silencing came from the work of Mette et al. (2000), who showed that transcriptional gene silencing (TGS), accompanied by de novo methylation of a target promoter in plants could be triggered by dsRNA whose sequence was homologous to that of the promoter. This work also provided evidence for a direct link between post-transcriptional and transcriptional gene silencing, or between RNA silencing and heterochromatin silencing in plants as well as in other eukaryotes.
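The structural features of siRNAs reported by Zamore et al. (2000), namely short duplexes with 2-nt 3' overhangs on each strand, can be made concrete with a small sketch. The Python fragment below is illustrative only: the function names, the fixed 21-nt length, the assumption of a perfectly paired dsRNA, and the example sequence are all assumptions made for clarity, and real Dicer products vary in length and register.

```python
# Illustrative sketch only: not a model of any real Dicer enzyme.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense: str, size: int = 21, overhang: int = 2):
    """Cut a perfectly paired dsRNA into siRNA duplexes.

    `sense` is the sense strand (5'->3'); the antisense strand is assumed to
    be its exact reverse complement. Each duplex is returned as a pair
    (sense_siRNA, antisense_siRNA), both written 5'->3'. The two strands pair
    over `size - overhang` nucleotides, leaving `overhang` unpaired
    nucleotides at each 3' end.
    """
    duplexes = []
    start = overhang  # leave room for the first antisense 3' overhang
    while start + size <= len(sense):
        sense_sirna = sense[start:start + size]
        # The antisense siRNA is complementary to a window of the sense strand
        # shifted by `overhang` nt towards the sense 5' end.
        template_window = sense[start - overhang:start - overhang + size]
        duplexes.append((sense_sirna, reverse_complement(template_window)))
        start += size
    return duplexes


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the function.
    example = "GGAUCCGAUACGUUAGCCAUGGCAUUCGAUCGGAUACCGUAGCUAGCU"
    for sense_sirna, antisense_sirna in dice(example):
        print("sense     5'", sense_sirna, "3'")
        print("antisense 5'", antisense_sirna, "3'")
```

In this sketch the staggered cuts on the two strands are what generate the overhangs: consecutive duplexes are contiguous on both strands, but the antisense cut sites are displaced by two nucleotides relative to the sense cut sites.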
12.3 RNA Silencing Pathways
In plants, specific RNA silencing pathways are involved in controlling the expression of developmentally regulated genes, defending against viral infection, and repressing the mobility of endogenous transposable elements. It is generally believed that RNA silencing is an evolutionarily ancient process, the basic components of which may have arisen before the divergence of plants and animals (Sharp 2001; Margis et al. 2006). The fact that organisms share similar pathway components (see later) suggests that RNA silencing is a universal gene regulatory system and a fundamental biological process (Cogoni and Macino 2000). RNA silencing is induced by the presence of double-stranded RNA (dsRNA) or hairpin RNA (hpRNA) in cells. These dsRNAs are cleaved by a dsRNA-specific RNase III enzyme called Dicer (Bernstein et al. 2001) (see later) into small (21- to 25-nt) products termed micro/small interfering RNA (mi/siRNA). The mi/siRNAs then associate with a so-called RNA-induced silencing complex (RISC) that uses one of the mi/siRNA strands to scan endogenous RNA molecules and cleaves those which have homology with the mi/siRNA (Filipowicz 2005). In animals, miRNAs do not usually cleave the target mRNA, but rather suppress translation by binding to one or more complementary sequences in the 3΄-untranslated region (3΄-UTR) of the mRNA. Five types of naturally occurring small RNAs have been described in Arabidopsis thaliana: microRNA (miRNA; Lee and Ambros 2001); small interfering RNA (siRNA; Hamilton and Baulcombe 1999); repeat-associated small interfering RNA (rasiRNA; Meister and Tuschl 2004); trans-acting small interfering RNA (ta-siRNA; Xie et al. 2005; Dunoyer et al. 2005), and natural antisense transcript siRNA (nat-siRNA; Borsani et al. 2005). miRNAs are encoded by endogenous genes whose primary transcripts (pri-miRNAs) contain imperfect hairpin-loop structures that are processed into miRNA precursors (pre-miRNAs).
These pre-miRNAs are further processed into mature miRNAs. Several hundred different miRNAs have been identified in A. thaliana (Millar and Waterhouse 2005). siRNAs are generated
from dsRNA precursors that originate primarily from a single-stranded RNA template that is converted to dsRNA by RNA-dependent RNA polymerase (RdRP). Viral dsRNAs can form directly from viral RNA replicative intermediates. However, recent studies suggest that the stem-loop structures formed within single-stranded viral RNA, and secondary dsRNA synthesized by host-encoded RdRP using single-stranded viral RNA as a template, are also substrates for viral siRNA production (Molnàr et al. 2005).
12.3.1 MicroRNA and Trans-Acting siRNA Pathways
The miRNA pathway involves the processing of endogenous transcripts that contain partially complementary 20- to 50-bp inverted repeats that self-anneal to form hairpin molecules (Fig. 12.1). These hairpin RNAs are processed in two steps by an RNase III-like enzyme called Dicer-like1 (DCL1), in cooperation with the dsRNA-binding protein HYL1 (see later), into miRNAs. The miRNAs are incorporated into RISC, which then cleaves mRNAs that encode proteins involved in developmental processes such as stem cell maintenance and organ polarity (Park et al. 2002; Reinhart et al. 2002; Schauer et al. 2002; Carrington and Ambros 2003; Kurihara and Watanabe 2004). Ta-siRNAs are a class of endogenous siRNAs that are generated via an overlapping but distinctly different pathway from that of miRNAs and other types of siRNAs (Fig. 12.1). In A. thaliana, five ta-siRNA transcripts (TAS) are targets of the miRNAs miR173 or miR390, which set the 21-nt phasing for ta-siRNA processing. Essentially, these miRNAs bind to their cognate ta-siRNA transcripts and direct their cleavage, and the resulting cleavage product is used as a template by RDR6 to synthesize a complementary RNA strand. This dsRNA is then cleaved into 21-bp ta-siRNAs by DCL4 (Xie et al. 2005). These ta-siRNAs have been shown to target nearly one third of the 23 known auxin response factor (ARF) genes.
The ARF genes encode transcription factors that transduce auxin signals during plant development (Jones-Rhoades and Bartel 2004; Allen et al. 2005). Thus, the ta-siRNA pathway is directly linked with the miRNA pathway and also plays a pivotal role in normal plant development.
12.3.2 Repeat-Associated Small Interfering RNA and RNA-Directed DNA Methylation
Another distinct pathway involving siRNAs is transcriptional gene silencing (TGS). TGS results in epigenetic silencing of transgenes or endogenous genes at the level of transcription. A hallmark of TGS is the associated methylation of cytosines in the promoter region (Mette et al. 2000). Cytosine methylation, generally referred to as DNA methylation, is essential for the normal development of plants and mammals (Tamaru and Selker 2001). In
Fig. 12.1. RNA silencing pathways of Arabidopsis. See text for details (See also color plate section). Pathways shown: miRNA, ta-siRNA, nat-siRNA, and rasiRNA.
plants, DNA methylation has an essential role in maintaining genomic integrity by suppressing the transcriptional activity of transposons and other repetitive DNA sequences. In fact, almost all methylation in plants occurs in these transposons and repetitive sequences (Chan et al. 2005; Wassenegger 2005). Although it is unknown how the plant distinguishes between repeat-associated sequences and other sequences, recent studies in A. thaliana showed that the repeat-associated regions are a particularly rich source of a specific class of 24-nt siRNAs, termed repeat-associated siRNAs (rasiRNAs), which suggests that rasiRNAs are involved in directing methylation, and hence transcriptional silencing, of repetitive DNA sequences in the genome. Also, mutations in genes encoding various proteins involved in RNA silencing pathways result in disruption of the heterochromatic structures (Onodera et al. 2005) and the loss of DNA methylation from the repeated DNA regions (Chan et al. 2004; Onodera et al. 2005). A number of studies have confirmed the involvement of RNA in the establishment of plant DNA methylation. Wassenegger et al. (1994) were the first to show that RNA-directed DNA methylation (RdDM) of homologous transgenes was induced by replicating viroids. Jones et al. (1998) showed that nuclear DNA sequences, homologous with viral sequences, became methylated following infection with the cytoplasmically replicating pea seed-borne mosaic virus (PSbMV) RNA. They speculated that a sequence-specific RNA signal was able to enter the nucleus and direct DNA methylation. Wang et al. (2001) reported that a cytoplasmically replicating viral satellite RNA could also induce strong cytosine methylation in the satellite transgene sequence in the nucleus. This sequence-specific DNA methylation was associated with the accumulation of the satellite-derived siRNAs, providing evidence that siRNAs may be a direct trigger of RdDM. Mette et al.
(2000) showed that TGS, accompanied by de novo methylation of a target promoter in plants, was associated with siRNAs derived from an hpRNA transgene. This was demonstrated by transcribing an inverted-repeat nopaline synthase promoter sequence (NOSpro), from the CaMV 35S promoter, to produce a NOSpro hpRNA that was RNase-resistant. The NOSpro hpRNA transgene was also found to direct the methylation of homologous NOSpro sequences located in trans. From these studies, it was postulated that siRNAs guide DNA methyltransferases to homologous sequences throughout the genome (Mette et al. 2000). Subsequent studies of DNA methylation in RNA silencing mutants resulted in the identification of several RNA silencing factors, including DCL3 and AGO4 in the RdDM pathway, which supports a direct role of siRNAs in this process (Chan et al. 2004). However, an alternative model was also proposed by Melquist and Bender (2003) in which full-length dsRNA is postulated to be a direct inducer of RdDM. Several DNA methyltransferases and chromatin-remodeling factors have been identified that are directly or indirectly involved in the RdDM
pathway. DRM2 and CMT3 were shown to be responsible for de novo, and maintenance, methylation of CNG and asymmetric cytosines, respectively. Methyltransferase 1 (MET1) is involved in the maintenance of CG methylation, while DDM1 is responsible for the maintenance of methylation in all sequence contexts. Recently, it was shown that a putative DNA-dependent RNA polymerase (Pol IV) that is unique to plants is involved in the production of 24-nt rasiRNAs and/or CNG methylation in heterochromatic regions (Herr et al. 2005; Kanno et al. 2005; Onodera et al. 2005). However, a strict distinction between maintenance and de novo DNA methyltransferases does not always hold up. For example, CMT3, DRM1, and DRM2 have complex, locus-specific relationships. CNN (de novo) methylation at the Superman locus appears to be controlled by CMT3 (Cao and Jacobsen 2002). Yet hpRNA-directed de novo methylation of CGs within the nos promoter is dependent on MET1 (Aufsatz et al. 2004). Moreover, biochemical analysis of the tobacco DRM1 enzyme, in vitro, indicates that it readily catalyzes de novo methylation of non-CGs but not of CGs, indicating that other enzymes are required for efficient de novo CG methylation (Wada et al. 2003). While de novo methylation of cytosines in all sequence contexts is likely to require an RNA signal, methylation of symmetric cytosines, namely those in the CG and CNG contexts, can be perpetuated without RNA involvement by maintenance methyltransferases which use hemimethylated DNA as a template for methylation during DNA replication.
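The distinction between symmetric (CG and CNG) and asymmetric cytosines can be illustrated with a short sketch. The Python fragment below is an illustration under stated assumptions, not a published tool: the function names and the example sequence are hypothetical. It classifies each cytosine on one strand and, for symmetric sites, reports the position of the guanine whose partner on the opposite strand is the cytosine of the same site, which is the cytosine a maintenance methyltransferase could methylate using hemimethylated DNA as a guide.

```python
def cytosine_contexts(dna: str):
    """Classify every cytosine on the given strand (read 5'->3').

    Returns (position, context) tuples, where context is "CG", "CNG",
    "asymmetric", or "unknown" when the sequence ends before the context
    can be determined.
    """
    dna = dna.upper()
    contexts = []
    for i, base in enumerate(dna):
        if base != "C":
            continue
        downstream = dna[i + 1:i + 3]  # up to two bases 3' of the cytosine
        if downstream[:1] == "G":
            context = "CG"
        elif len(downstream) < 2:
            context = "unknown"
        elif downstream[1] == "G":
            context = "CNG"
        else:
            context = "asymmetric"
        contexts.append((i, context))
    return contexts


def partner_cytosine(position: int, context: str):
    """Top-strand coordinate of the base pair holding the opposite-strand
    cytosine of a symmetric site, or None for non-symmetric contexts."""
    offsets = {"CG": 1, "CNG": 2}
    return position + offsets[context] if context in offsets else None


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    for pos, ctx in cytosine_contexts("ACGTCAGCTTC"):
        print(pos, ctx, partner_cytosine(pos, ctx))
```

Only the symmetric contexts return a partner position, which mirrors the point made in the text: an asymmetric cytosine has no counterpart on the opposite strand from which its methylation state could be copied after replication.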
12.4 Proteins Involved in RNA Silencing Pathways
Studies in a diverse range of organisms have revealed the roles of several groups of proteins in the different gene silencing pathways. Some of these proteins are functionally conserved throughout all eukaryotes and are essential for the production of small RNAs from dsRNA precursors and the downstream RNAi processes of mRNA degradation or translational inhibition (Du and Zamore 2005; Tomari and Zamore 2005; Vaucheret 2006). A summary of these RNA silencing-associated proteins and their known, or postulated, functions is shown in Table 12.1.
12.4.1 The Dicer-Like Proteins
A key component of the RNA silencing pathway is an RNase III-type enzyme called Dicer (Bernstein et al. 2001). Dicer was first identified in Drosophila and was found to be evolutionarily conserved in mammals, worms, flies, plants, and fungi (Bernstein et al. 2001; Golden et al. 2002; Schauer et al. 2002). An A. thaliana homologue of the Drosophila Dicer has been designated DCL1 (Schauer et al. 2002). Three other A. thaliana DCLs (DCL2, 3, and 4) were also identified as having the same arrangement of functional motifs as that found in Dicer (Finnegan et al. 2003) (Table 12.2).
Table 12.1. Protein factors involved in the RNA silencing pathways in plants

Dicer-like proteins
DCL1: Biogenesis of 21-nt miRNAs and ta-siRNAs. Essential for normal plant development and fertility. May also be involved in defense against viruses.
DCL2: Biogenesis of 22-23-nt siRNAs and 24-nt nat-siRNAs. Important for plant defense against certain viruses and for salt tolerance.
DCL3: Biogenesis of 24-nt rasiRNAs originating from transposons and other endogenous repeats. Important for normal heterochromatin formation and genomic stability by repressing the activity of endogenous transposons and insertion sequences.
DCL4: Biogenesis of 21-nt ta-siRNAs. Essential for regulating the expression of auxin-response factor (ARF) genes, and hence normal plant growth and development such as vegetative phase change. Also involved in the biogenesis of 21-nt siRNAs from hpRNA transgenes.

RNA-dependent RNA polymerases
RDR1: Has no known role in RNA silencing, but has a role in systemically acquired virus resistance.
RDR2: Biogenesis of 24-nt rasiRNAs from transposons and other endogenous repeats. Important for normal heterochromatin formation and genomic stability by repressing the activity of endogenous transposons and insertion sequences.
RDR6: Biogenesis of ta-siRNAs and sense transgene-derived siRNAs. Required for sense transgene-induced RNA silencing and virus-induced gene silencing (VIGS) by DNA viruses, and some RNA viruses. Required for amplicon-mediated RNA silencing. Involved in vegetative phase change and virus defence. Also designated SGS2 (suppressor of gene silencing) or SDE1 (silencing deficient).

RNA helicase
SGS3: Biogenesis of ta-siRNAs and sense transgene-derived siRNAs. Required for sense transgene-induced RNA silencing and VIGS by DNA viruses and some RNA viruses. Involved in vegetative phase change and virus defence.

Argonaute proteins
AGO1: A critical component of RISC which is involved in the siRNA or miRNA-mediated cleavage of mRNA. Involved in PTGS and may also be important for miRNA-mediated translational inhibition. Essential for normal plant development and defence against viruses.
AGO4: May be a component of RITS and therefore be essential for the initiation of de novo cytosine methylation of DNA and maintenance methylation and Lys-9 methylation in histone H3.

RNA methylase
HEN1: Responsible for 3' terminal methylation, and hence stability, of RNA silencing-associated small RNAs and other endogenous small RNAs. Essential for normal plant development.

Double-stranded RNA-binding proteins
DRB1 (HYL1): Biogenesis of miRNAs and ta-siRNAs. Physically interacts with DCL1. Essential for normal plant development.
DRB4: Physically interacts with DCL4 and probably required for ta-siRNA biogenesis. Essential for normal plant development.
DRB2, 3 and 5: Function not currently known but may be involved in particular RNA silencing pathways.

DNA-dependent RNA polymerase
RNA PolIV: Biogenesis of 24-nt siRNAs originating from endogenous transposons and repetitive sequences. De novo methylation of repetitive sequences such as FWA and 5S ribosomal genes. Essential for plant defence against transposons and repetitive sequences.

DNA methyltransferases
DRM2: De novo cytosine methylation in all sequence contexts. Heterochromatin formation and important for plant defence against transposons and repetitive sequences.
MET1: Primarily responsible for maintenance of CG methylation. Important for normal plant development.
CMT3: Primarily responsible for maintenance of CNG methylation. Important for plant defense against transposons and repetitive sequences.
Table 12.2. Locations of domains in DCL proteins(a)

Proteins (plant gene IDs): AtDCL1 (At1g01040), AtDCL2 (At3g03300), AtDCL3 (At3g43920), AtDCL4 (At5g20320), OsDCL1 (Os03g02970), OsDCL2a (Os03g38740), OsDCL2b (Os09g14610), OsDCL3a (Os01g68120), OsDCL3b (Os10g34430), OsDCL4 (Os04g43050)

Domain locations (amino acid residues):
DExD: 26-224, 38-195, 28-224, 29-236, 34-210, 293-466, 124-296, 45-235, 18-218, 249-421
Helicase-C: 416-498, 379-470, 433-519, 411-501, 411-501, 696-782, 502-592, 428-502, 403-490, 687-767
DUF283: 562-652, 536-626, 587-677, 569-655, 855-950, 656-748, 557-645, 840-935
PAZ: 819-953, 836-987, 875-1029, 797-938, 826-967, 1196-1357, 941-1079, 814-973, 805-958, 1180-1341
RNAseIIIa: 975-1143, 1057-1228, 1048-1218, 959-1115, 988-1144, 1373-1555, 1101-1271, 994-1167, 975-1131, 1361-1518
RNAseIIIb: 1179-1331, 1264-1420, 1256-1412, 1147-1299, 1176-1331, 1591-1747, 1307-1459, 1203-1353, 1162-1317, 1559-1707
dsRBa: 1335-1399, 1424-1487, 1416-1480, 1303-1365, 1335-1396, 1751-1812, 1463-1526, 1342-1423, 1321-1380, 1733-1796
dsRBb: 1520-1593, 1507-1603, 1507-1643, 1836-1909, 1622-1696, 1436-1563, 1831-1906

(a) The linear arrangement of domains typically found in Arabidopsis (At) and rice (Os) DCL proteins is depicted above the table. The table contains the locations, in amino acid residues, where the eight different domains can be found in each DCL molecule. The gaps in the table represent the absence or failure to detect the presence of the domain in the appropriate DCL.
Mutants of A. thaliana DCL1 such as SUSPENSOR1, CARPEL FACTORY, and SHORT INTEGUMENTS display obvious developmental defects. These include arrested embryogenesis, abnormal ovules, late flowering, and abnormal flowers in which carpels fail to fuse (Meins et al. 2005). Genetic and biochemical analyses of DCL1 revealed that this protein is essential for the production of miRNAs and normal plant development (Xie et al. 2004). Homozygous dcl1 mutants are sterile. DCL2 has a similar arrangement of functional domains to that of DCL1, except that it has one less dsRNA-binding domain (Table 12.2). Unlike DCL1, the exact role of DCL2 in the RNA silencing pathways remains unclear. A recent study suggested an anti-viral role for DCL2 with experiments showing delayed accumulation of turnip crinkle virus (TCV) siRNAs in an A. thaliana dcl2 mutant. However, the accumulation of cucumber mosaic virus (CMV) siRNAs was not affected by the loss-of-function mutation in dcl2, implying that one or more of the other DCLs are also involved in viral siRNA processing (Xie et al. 2004; Deleris et al. 2006). Recently, Borsani et al. (2005) reported another role for DCL2 in the production of a specific type of siRNA, the nat-siRNAs. These nat-siRNAs are generated from the overlapping 3΄ ends of two A. thaliana transcripts which are convergently transcribed from opposite DNA strands. One of these genes encodes Δ1-pyrroline-5-carboxylate dehydrogenase (P5CDH) while the function of the other gene (designated SRO5) is not known. The SRO5 gene is induced by salt stress, and annealing of the complementary 3΄ ends of the SRO5 and P5CDH transcripts provides a substrate for the DCL2-mediated generation of a 24-nt nat-siRNA. This nat-siRNA sets the phasing for DCL1-mediated generation of 21-nt siRNAs which direct cleavage of the constitutively expressed P5CDH transcripts (Fig. 12.1). Down-regulation of P5CDH leads to proline accumulation and consequent salt tolerance (Borsani et al. 2005).
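The gene arrangement that gives rise to nat-siRNAs, two genes transcribed convergently from opposite strands with overlapping 3΄ ends, can be expressed as a simple interval test. The sketch below is hypothetical throughout: the `Gene` record, its field names, and the coordinates are invented for illustration and are not the genomic positions of SRO5 or P5CDH.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"


def convergent_overlap(a: Gene, b: Gene) -> Optional[Tuple[int, int]]:
    """Return the (start, end) of the region where the 3' ends of two
    convergently transcribed genes overlap, or None if they do not.

    A "+" gene is transcribed left to right (3' end at `end`); a "-" gene is
    transcribed right to left (3' end at `start`). Convergent overlap therefore
    requires the "+" gene to lie to the left and to run into the "-" gene.
    """
    if a.strand == b.strand:
        return None
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    if plus.start < minus.start <= plus.end < minus.end:
        return (minus.start, plus.end)
    return None


if __name__ == "__main__":
    # Invented coordinates for two hypothetical genes.
    gene_a = Gene("geneA", 1000, 3000, "+")
    gene_b = Gene("geneB", 2700, 5000, "-")
    print(convergent_overlap(gene_a, gene_b))
```

Transcripts from the returned interval are complementary to each other, so this is the region in which annealing could produce the dsRNA substrate described above. Applying such a test to every pair of neighbouring genes on opposite strands is one simple way to enumerate candidate cis-antisense pairs in an annotated genome.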
There are more than 2000 pairs of cis-antisense transcripts in A. thaliana (Wang et al. 2005), and it is likely that some of these antisense transcripts are also processed by DCL2 to give rise to nat-siRNAs. The production of nat-siRNAs, like ta-siRNAs, is dependent on RDR6 and SGS3. In dcl1 mutants, the 24-nt nat-siRNA is still produced and P5CDH transcripts are down-regulated, indicating that nat-siRNA biogenesis is independent of the miRNA pathway (Borsani et al. 2005). DCL3 is involved in the generation of 24-nt siRNAs and is a key component of the RNA-directed DNA methylation pathway (and epigenetic regulation) because the absence of 24-nt rasiRNAs in dcl3 mutants is associated with the loss of heterochromatic marks and increased transposon accumulation (Chan et al. 2004; Xie et al. 2004). Although DCL3 does not appear to participate in the biogenesis of siRNAs from RNA viruses (Xie et al. 2004), a recent study showed that it is involved in the production of 24-nt siRNAs from a DNA virus that replicates in the nucleus (Akbergenov
et al. 2006). It is possible that DCL3 acts specifically on nuclear accumulated dsRNA, including the endogenous repeat-associated dsRNA and the exogenous nuclear replicating virus-derived dsRNA. The most recently characterized Dicer-like protein is DCL4, which appears to have a broad-ranging role in the RNA silencing pathways (Allen et al. 2005; Dunoyer et al. 2005; Xie et al. 2005). As discussed previously, DCL4 is responsible for the biogenesis of ta-siRNAs. Initial evidence came from the analysis of an A. thaliana dcl4 mutant displaying heterochronic (vegetative phase change) defects, which showed that the mutant had normal levels of 21-nt miRNAs and 24-nt rasiRNAs, but low levels of 21-nt ta-siRNAs and increased levels of the ta-siRNA transcripts (TAS; Xie et al. 2005). It was subsequently concluded that the dsRNA substrate of DCL4 is the product of RDR6, which uses miRNA-cleaved transcripts as templates (Allen et al. 2005). Apart from its role in ta-siRNA biogenesis, DCL4 is known to be involved in the processing of long hpRNA derived from an inverted-repeat transgene (Dunoyer et al. 2005). Thus, DCL4 is the first DCL that has been shown to be involved in transgene-induced silencing in plants. It is possible that DCL4 is the key Dicer for post-transcriptional transgene silencing in plants as a whole. Also, it may play a principal role in the biogenesis of viral siRNAs, another group of exogenous siRNAs in plants. While each of the four DCLs in Arabidopsis appears to have specific roles in the RNA silencing pathways, a recent study suggested that their functions are partially redundant; when a particular DCL is mutated, its dsRNA substrate can be processed by one or more of the other DCLs giving rise to siRNAs typical of the substituting DCLs (Gasciolli et al. 2005; Xie et al. 2005).
It is possible that the substrate specificities of individual DCLs are determined by associated protein factors, such as the double-stranded RNA-binding proteins to be discussed later. In the absence of a particular DCL, the associated factors become available to other DCLs, allowing the latter to act on the dsRNA substrate of the absent DCL. In rice, six Dicer homologues have been discovered including a potential monocot-specific Dicer (Fig. 12.2 and Table 12.2).
12.4.2 Hua Enhancer 1
Plant siRNAs and miRNAs, unlike small RNAs in other eukaryotes, are methylated at the 2΄-hydroxyl of the 3΄-terminal ribose. The enzyme responsible for this methylation is encoded by the A. thaliana gene Hua Enhancer 1 (HEN1). Studies have shown that HEN1 is expressed in roots, stems, leaves, and inflorescences and that hen1 mutants show pleiotropic effects including late flowering, infertility, curvature of leaves, and reduced organ size (Chen
et al. 2002). The primary function of HEN1 is to stabilize small RNAs in plant cells; methylation of the 3΄ terminus of small RNAs prevents them from being uridylated and thereby targeted for degradation (Li et al. 2005). A recent study indicated that both methylated and unmethylated small RNAs can be efficiently incorporated into RISC and direct silencing in plants (Qi et al. 2005), which is consistent with the observations that, while HEN1 is involved in both miRNA and siRNA-mediated silencing and is also implicated in resistance to viral infection (see later), it is not absolutely required for all types of silencing. Also, HEN1 is not required for rasiRNA function associated with inverted-repeat silencing (Boutet et al. 2003; Meins et al. 2005).
Fig. 12.2. Proposed evolutionary tree of Dicer-like genes in plants (after Margis et al. 2006). The presence or absence of different DCL genes and the times of divergence of the different nodes are depicted on the currently accepted phylogenetic tree of species. Branch lengths are not to scale. The estimated large-scale gene duplication events are depicted by gray ellipses. The numbers at the nodes and at the ellipses are estimated dates in million years ago (Mya). These numbers are rounded to the nearest 5 Mya and, for dates that have been previously estimated in ranges, the median of that range has been taken. The different plant DCL types are pattern coded and the nonplant Dicer genes are represented as white boxes. The duplication of a DCL gene is indicated by (+). (Reproduced from Margis et al. 2006 FEBS Lett 580:2442–2450.)
12.4.3 The Double-Stranded RNA-Binding Protein Family
Double-stranded RNA-binding proteins (dsRBPs) have been identified in eukaryotes and prokaryotes, as well as viruses. They have been shown to regulate cellular signalling events as well as the synthesis, processing, transport, translation, and degradation of RNA (Fedoroff 2002). Han et al. (2004) highlighted their role in the RNA silencing pathways with the discovery of a distinct 36-kDa protein, HYL1 (hyponastic leaves), that cofractionated with dsRNA-processing activity. Sequence analysis revealed HYL1 to be a previously uncharacterized protein with tandem dsRNA-binding domains. A conserved region typical of dsRBPs is the dsRNA-binding motif (dsRBM). This motif consists of about 70 amino acid residues that form an α–β–β–β–α fold whose two α-helices interact specifically with double-stranded RNA but not with DNA or DNA:RNA hybrids (Hiraguri et al. 2005). The dsRBM also functions as a protein–protein interaction domain, and it is worth noting that all Dicer-like proteins, with the exception of DCL3, contain at least one dsRBM. HYL1 has been shown to be an important component of the miRNA and ta-siRNA pathways (Han et al. 2004; Vazquez et al. 2004). HYL1 mRNA has been shown to accumulate in all tissues and organs; however, reporter gene experiments have shown that the promoter is mainly active in the vascular tissues of petioles and the mid-veins of rosette leaves, suggesting that it may have a role in the transportation of dsRNA through the vascular tissue (Yu et al. 2005). hyl1 mutants exhibit pleiotropic effects including curled (hyponastic) leaves, late flowering, reduced organ size, and increased lateral organ formation (Meins et al. 2005). These phenotypes are consistent with the developmental role of the miRNA and ta-siRNA pathways in which HYL1 appears to have a crucial role (see preceding text).
Four other dsRNA-binding proteins with high sequence homology to HYL1 (designated DRB2 to 5) have also been found in A. thaliana, but their exact functions have not been elucidated. Hiraguri et al. (2005), using northwestern blot analysis, have shown that DRB1/HYL1 interacts with DCL1 whereas DRB4 interacts with DCL4. This is consistent with the involvement of HYL1 in miRNA biogenesis, and suggests that each individual DCL needs to interact with a specific DRB protein for normal function in plants (Hiraguri et al. 2005; Vaucheret 2006).
12.4.4 The Argonaute Protein Family
Another important gene family involved in RNA silencing consists of the Argonaute genes encoding proteins which may be involved in miRNA-mediated regulation of gene expression via their association in the RISC
complex (Hammond et al. 2000). In A. thaliana, 10 genes make up the ARGONAUTE1 gene subfamily. Argonaute proteins are roughly 100 kDa in size, consisting of two conserved domains, a 130-amino acid (aa) N-terminal PAZ domain, and a 300-aa C-terminal PIWI domain (Carmell et al. 2002). The PAZ domain is thought to be a protein–protein interaction domain, potentially mediating either heterodimerization or homodimerization. PAZ domains are also present in DCL proteins and have been shown to bind to the ends of small RNAs (Kidner and Martienssen 2005). The PIWI domain is required for cleavage of the miRNA-targeted mRNA and is highly conserved in other eukaryotes (Carmell et al. 2002; Kidner and Martienssen 2005). AGO1 has been referred to as an RNA “slicer” that is required for proper functioning of mi/siRNAs (Baumberger and Baulcombe 2005). In A. thaliana, the four AGO1-like genes that have been extensively studied and described are AGO1 (ARGONAUTE1), AGO4, AGO7 (ZIPPY) and AGO10 (PINHEAD/ZWILLE) (Moussian et al. 1998; Carmell et al. 2002; Hunter et al. 2003; Zilberman et al. 2003, 2004; Kidner and Martienssen 2005). AGO1 and AGO10 mutants have phenotypes associated with loss of stem cell maintenance and axillary meristem failure, with both proteins having some degree of functional redundancy. AGO1 is expressed throughout the plant at all stages of development and the mutants exhibit abnormalities including radialized leaves, infertile flowers, and filamentous structures resembling the tentacles of a squid, a likely inspiration in naming the mutant Argonaute (Carmell et al. 2002). Only AGO1 has been shown to be required for PTGS in plants. However, owing to the apparent redundancy and the higher level of expression of AGO1 compared with that of AGO10 throughout the plant, it is not known whether AGO1 expression masks the role played by AGO10 (Lynn et al. 1999; Kidner and Martienssen 2005).
AGO4 is required for the establishment of DNA methylation and epigenetic regulation at several plant loci such as the FWA gene (Chan et al. 2004). It is also required for histone methylation, non-CpG DNA methylation and the production of long (24-nt) siRNAs that are homologous with the AtSN1 retroelement (Zilberman et al. 2004). Thus, AGO4 is likely to act downstream of RDR2 and DCL3 in the RNA-directed DNA methylation pathway. AGO7 is likely to have a role in ta-siRNA-mediated regulation of gene expression. A recent study showed that AGO7 is involved in TAS3-directed control of leaf morphology; A. thaliana ago7 mutants have reduced levels of TAS3-derived ta-siRNAs, although the levels of TAS1- and TAS2-derived ta-siRNAs remain unaffected (Peragine et al. 2004; Allen et al. 2005; Adenot et al. 2006). This finding suggests that AGO7 may function downstream of DCL4 in the ta-siRNA pathway (Vaucheret 2006).
12.4.5 RNA-Dependent RNA Polymerase (RdRP)
The first identified component of the RNA silencing pathway is RNA-dependent RNA polymerase (RdRP) (Dalmay et al. 2000; Mourrain et al. 2000). As mentioned in the preceding text, RdRPs are involved in almost all of the endogenous siRNA pathways. Plants have at least three functional RNA-dependent RNA polymerases: RDR1, RDR2, and RDR6. RDR2 is functionally associated with DCL3 and is required for the biogenesis of rasiRNAs, presumably by synthesizing dsRNA from single-stranded transcripts derived from the repeat-associated sequences in the genome (Xie et al. 2004). RDR6 is required for ta-siRNA biogenesis. It is also involved in sense transgene-mediated silencing and systemic transmission of silencing signals in plants, probably by copying single-stranded “aberrant” transgene RNA into dsRNA. Further, RDR6 is implicated in virus resistance and DNA virus-induced gene silencing in plants, and is therefore likely to be involved in viral siRNA biogenesis (Allen et al. 2004; Peragine et al. 2004; Vazquez et al. 2004). The specific role of RDR1 in silencing remains unclear. The expression of RDR1 is inducible by salicylic acid, or on virus infection, and plants deficient in RDR1 activity become more susceptible to viral infections (Yu et al. 2003; Yang et al. 2004), suggesting that this RdRP is involved in systemically acquired virus resistance in plants. RdRP is also involved in RNA silencing in nematodes and fungi (Ahlquist 2002), but has not been found in mammals or Drosophila.
12.4.6 DNA Methyltransferases
The Arabidopsis genome contains 10 genes that encode DNA methyltransferases. These enzymes are required for DNA methylation in two complementary processes: de novo methylation, the initial methylation of unmethylated DNA, and maintenance DNA methylation, which propagates existing DNA methylation during DNA replication (Cao and Jacobsen 2002).
These DNA methyltransferases can be divided into three main families based on function and sequence similarities to mammalian DNA methyltransferases (Table 12.1). Some of the known roles of the DNA methyltransferases in RNA silencing pathways have been discussed in Section 12.3.2.
12.5 RNA Silencing and Anti-Viral Defense
As mentioned earlier, a key role of RNA silencing in plants is defense against viruses (Wang and Metzlaff 2005). The initial evidence that RNA silencing is a natural antiviral defense mechanism in plants came from the
early studies on pathogen-derived virus resistance. Transgenic plants expressing viral sequences encoding coat protein, replicase, and other viral proteins frequently showed increased resistance to the virus (Abel et al. 1986), and this resistance was shown to be sequence-specific and often associated with low steady-state viral transgene RNA. Direct evidence that RNA silencing plays a key role in antiviral defense came from a study showing that natural recovery of Nicotiana clevelandii from nepovirus infection resulted from sequence-specific degradation of viral RNAs (Ratcliff et al. 1997). This was further confirmed by subsequent studies showing that replication of all viral and subviral agents in plants is associated with the accumulation of viral siRNAs, and that almost all plant viruses encode multifunctional proteins that can suppress the post-transcriptional RNA silencing pathway in plants. Suppressors of silencing have also been identified from viruses infecting animals, suggesting that RNA silencing is also an antiviral defense mechanism in these organisms (Voinnet 2005). However, while the antiviral defense role of RNA silencing has been confirmed in insects by recent studies (Wang et al. 2006), it is still a matter for debate as to whether this mechanism plays a major role in mammalian antiviral defense, as mammals have evolved other powerful antiviral mechanisms. The first viral silencing suppressors to be identified were the helper component protease (P1/HC-Pro) from potyviruses and the 2b protein from cucumoviruses. HC-Pro was initially shown to be a pathogenicity enhancer in synergistic viral interactions. Subsequent studies showed that HC-Pro suppresses both transgene and virus-induced PTGS (Anandalakshmi et al. 1998). The 2b protein is also a pathogenicity determinant, and was shown to prevent the initiation of silencing at the growing points of the plants.
These initial findings provided the first indication of the diversity of viral silencing suppressors and their modes of action (for reviews see Moissard and Voinnet 2004; Roth et al. 2004; Li and Ding 2005; Voinnet 2005). After the initial discovery of Potyvirus HC-Pro and Cucumovirus 2b suppressor proteins, numerous analogous proteins were shown to be encoded by both RNA and DNA viruses. Examples of these include the Tombusvirus-encoded p19, Closterovirus p21, Potexvirus p25, and Carmovirus p38. The activities of these proteins were elucidated from the results of their respective transgene expression and the concomitant effects on siRNA and miRNA accumulation in plants (Voinnet 2005). Also, the crystal structure of p19 has been determined and its mode of action has been elucidated (Silhavy et al. 2002). The p19 protein sequesters 21-nt siRNAs by binding specifically to the double-stranded form, thereby preventing RISC assembly and hence target mRNA degradation. Not surprisingly, proteins such as p19 and p21 also bind to duplex miRNAs, thereby affecting the accumulation and functions of miRNAs and plant development
(Chapman et al. 2004; Dunoyer et al. 2004). Although HC-Pro does not seem to have direct physical interactions with miRNAs, it enhances the accumulation of plant miRNAs of both sense and antisense polarities, possibly by preventing the unwinding of duplex miRNAs (Chapman et al. 2004; Dunoyer et al. 2004), thereby interfering with plant development. However, the developmental defects caused by the transgenic overexpression of different viral silencing suppressors in Arabidopsis are remarkably similar, which has led to the suggestion that inhibition of the miRNA pathway by transgene-expressed suppressors does not reflect a deliberate viral strategy to reprogram or alter host genome expression (Dunoyer et al. 2004). Indeed, recent studies of viral silencing suppressors suggest that viruses have evolved a survival mechanism that minimizes their interference with the endogenous siRNA and miRNA pathways and hence damage to their host plants. For instance, none of the characterized viral suppressors appears to block the production of endogenous siRNAs and miRNAs, which would allow the continued production of the small RNAs important for normal plant development. A recent report showed that while HC-Pro interferes with HEN1-mediated methylation of viral siRNAs, it does not block the 3΄-terminal methylation of endogenous siRNAs and miRNAs, and thereby has minimal effect on the biochemical properties of endogenous small RNAs (Ebhardt et al. 2005). As mentioned in the preceding text, the Tombusvirus-encoded silencing suppressor p19 binds specifically to double-stranded siRNAs or miRNAs but not to single-stranded small RNAs, which implies that viral infections may not significantly interfere with the biological functions of the mature, single-stranded small RNAs that exist in plant cells before virus infection.
The unabated production of viral siRNAs in the presence of silencing suppressors may have an additional implication for the survival strategy of the virus: it would ensure that abundant viral siRNAs are present in infected plant cells for defense against subsequent infections by the same or related viruses, thereby minimizing damage to the host. The discovery that RNA silencing is a natural antiviral defense mechanism in plants has provided a new avenue for developing antiviral strategies in plants. Indeed, expression of viral transgenes encoding hpRNA (see later) has become a powerful approach in engineering virus resistance in plants. Complete resistance to barley yellow dwarf virus (BYDV; Wang et al. 2000) or PVY (Smith et al. 2000) has been achieved in barley or tobacco by expressing an inverted-repeat transgene encoding hpRNA of a BYDV RNA polymerase sequence or a PVY protease sequence, respectively. This hpRNA transgene strategy is expected to be effective against most RNA viruses, but questions remain as to whether DNA viruses can be effectively targeted. Also, multipartite RNA viruses could prove to be a
relatively difficult target, and high levels of resistance against such viruses may require the simultaneous targeting of all RNA components with the hpRNA transgene. This is because the RNA components, if not targeted, could support the replication of the targeted RNA, preventing its complete destruction. Further, it would be interesting to investigate whether the hpRNA transgene approach would confer efficient resistance to dsRNA viruses such as the rice ragged stunt reovirus.
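The point about multipartite viruses can be illustrated with a small sketch. It assumes, purely for illustration, that a genome segment counts as "targeted" when it shares at least one perfect 21-nt window with the hairpin arm; real siRNA targeting is more tolerant and more complex than this. The function names and the toy sequences are hypothetical and do not correspond to any real virus.

```python
def kmers(seq, k=21):
    """Return the set of all k-nt windows found in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def targeted_segments(hairpin_arm, genome_segments, k=21):
    """Report, per genome segment, whether it shares any k-mer with the hairpin arm.

    Assumption: a shared perfect k-mer is used as a crude proxy for
    'this segment can be targeted by siRNAs derived from the hairpin'.
    """
    arm_kmers = kmers(hairpin_arm, k)
    return {name: bool(arm_kmers & kmers(seq, k))
            for name, seq in genome_segments.items()}


# Toy bipartite genome (made-up sequences, written as cDNA).
shared = "ATGGCTAGCTAGGATCCGATCGATTACG"
segments = {
    "RNA1": "GGGTTT" + shared + "CCCAAA",
    "RNA2": "TTGACCTGAAGTCCATGGTACCTTAGGCAATCGGA",
}

result = targeted_segments(shared, segments)
untargeted = [name for name, hit in result.items() if not hit]
print(result)
print("Segments left untargeted:", untargeted)
```

In this toy case the hairpin covers only one of the two segments, which is the situation the text warns about: a construct intended for a multipartite virus would need arms drawn from every RNA component.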
12.6 Gene Silencing Platforms in Plants
The nucleotide sequencing of several plant and animal genomes has provided researchers with a wealth of genetic information. Complete genomic sequences are now available for A. thaliana and Oryza sativa. The sequencing of the genomes of Zea mays, Lycopersicon esculentum, Populus trichocarpa, and the legumes Medicago truncatula and Lotus japonicus is in progress (Matthew 2004; Margis et al. 2006). The availability of genome sequences has led to further challenges for plant molecular biologists, some of which involve the daunting task of revealing the function of individual genes: approximately 25,000 in A. thaliana and more than 55,000 in O. sativa (Waterhouse and Helliwell 2003). To investigate the function of plant genes, an approach called reverse genetics is employed in which the coding region of a gene is generally disrupted by the insertion of a transposon or insertion sequence, thereby inactivating normal gene function. The resulting phenotype is then compared to that of wild-type plants. As described in detail in other chapters of this book, this strategy has been achieved in plants by transferred DNA (T-DNA) or transposon insertional mutagenesis of the desired genome and mapping of the resultant insertions. In plants, various techniques are used for this strategy, such as transformation of T-DNA from the gram-negative soil bacterium Agrobacterium tumefaciens or transposon tagging using the maize transposable elements Ac/Ds and En/Spm (Hirochika 2001; Waterhouse and Helliwell 2003). In A. thaliana, more than 225,000 Agrobacterium T-DNA insertion mutants were created in an effort to tag all genes in the genome. About 88,000 of these T-DNA insertions were precisely mapped and, from this information, 21,700 mutations were identified in the predicted 29,454 genes (Alonso et al. 2003). This project has led to large publicly available collections of A.
thaliana mutants from which desired mutants can be easily obtained through the Web-based database located at The Arabidopsis Information Resource (TAIR) and seed purchased over the Internet. The collections of A. thaliana mutants available to the research community have made major contributions to plant science research; however there
are limitations to their use. These include the difficulty involved in investigating the functions of duplicated genes and the complication of mutant analysis due to phenotypes resulting from the disruption of nontarget genes due to multiple insertions of transposons or T-DNAs (Waterhouse and Helliwell 2003; Matthew 2004). For example, the A. thaliana SIGnAL collections suggest that there are potentially 1.5 T-DNA insertions per line (Matthew 2004). Another obstacle in using these strategies in rice and other species that have a much larger genome than A. thaliana is the greater difficulty in generating and managing the large quantities of plants required for this type of insertional mutagenesis (Hirochika 2001). A recently discovered endogenous retrotransposon of rice shows promise for forward and reverse genetics strategies in this important crop plant. Tos17 has a relatively low copy number for a retrotransposon, usually one to five copies per genome depending on the cultivar, and is activated only during tissue culture where the copy number increases directly with the time in tissue culture. Once regeneration has occurred, the retrotransposons are repressed, leaving around 5 to 30 copies in regenerated plants (Hirochika 2001). These features, along with its preference for insertion within coding sequences, suggest that Tos17 will be a significant tool in rice functional genomics in the near future. However, there are some limitations including the tendency of Tos17 insertions to aggregate in genomic hotspots, potentially limiting its genomic coverage, which is a common limitation in many transposon-tagging technologies (Miyao et al. 2003). RNA silencing technologies have provided a complementary strategy to the conventional reverse genetic approaches in functional genomics studies of plant genes. It is expected that RNA silencing technology will be particularly useful in defining the function of individual genes within multigene families.
Also, RNA silencing is a highly targeted mutagenic approach, making it relatively straightforward in relating phenotypes to gene functions. The functional characterization of a large number of genes in the nematode C. elegans using systematic RNA interference has demonstrated the vast potential of RNA silencing technology in functional genomics studies. Several technologies have been developed for delivering gene silencing in plants, including the hpRNA transgene approach and virus-induced gene silencing approach. As discussed later, the different systems have strengths and limitations, but they have already become useful in gene function analysis in plants (Fig. 12.3).
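The A. thaliana T-DNA collection figures quoted earlier in this section (Alonso et al. 2003) can be turned into rough coverage fractions with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the fractions are simple ratios rather than a statistical model of insertion saturation.

```python
# Figures as quoted in the text (Alonso et al. 2003).
insertion_lines_created = 225_000   # T-DNA insertion mutants generated
insertions_mapped = 88_000          # insertions precisely mapped
genes_with_mutation = 21_700        # genes in which a mutation was identified
genes_predicted = 29_454            # predicted genes in the genome

# Share of the generated collection that was precisely mapped.
mapped_fraction = insertions_mapped / insertion_lines_created

# Share of predicted genes carrying at least one identified mutation.
gene_coverage = genes_with_mutation / genes_predicted

# Genes for which no insertion was identified in this data set.
untagged_genes = genes_predicted - genes_with_mutation

print(f"Mapped fraction of lines: {mapped_fraction:.1%}")
print(f"Gene coverage:            {gene_coverage:.1%}")
print(f"Genes without insertion:  {untagged_genes}")
```

The remaining untagged genes are one practical reason a sequence-targeted approach such as RNA silencing is a useful complement to insertional collections.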
Fig. 12.3. Types of transgene constructs for RNA silencing in plants. (A) A plasmid containing infectious potato virus X (PVX) cDNA can be transcribed in vitro and inoculated onto the plant. A component of the PVX cassette contains an inserted region of sequence from the targeted gene (Helliwell and Waterhouse 2003). (B) A typical T-DNA plasmid that can express hairpin RNA in plants. This construct can be introduced into the plant by DNA bombardment or stably transformed by Agrobacterium-mediated transformation. The latter method requires a selectable marker. (C) A T-DNA plasmid similar to the one above can express RNA with the target gene sequence located upstream of the hairpin structure. This vector can potentially be used for high-throughput screening with a cDNA library. (D) The general structure of the pANDA construct with the Gateway vector conversion system cloned in an antisense and sense direction and separated by a gus linker. Polymerase chain reaction (PCR) products corresponding to the targeted gene are cloned into the pENTR/D-TOPO vector followed by an LR clonase reaction to produce the final construct for transformation into rice (Miki and Shimamoto 2004). (E) Multiple direct repeats of chloramphenicol acetyltransferase (CAT) and gus gene sequences were shown to trigger efficient PTGS called direct repeat-induced PTGS (driPTGS). (F) Schematic representation of the SHUTR construct containing an inverted repeat of the 3΄-untranslated region from the Agrobacterium nos gene. (G) Artificial miRNAs are constructed using overlapping PCR on an endogenous miRNA precursor. Primers are designed to replace the existing miRNA and miRNA* sequences with artificial sequences (gray). The artificial miRNA is generated by combining all three PCR products (A-IV, II–III and I-B) in a single reaction with primers A and B (Schwab et al. 2006). (H-K) VIGS vectors.
12.6.1 Delivery by Transgenes
Sense and Antisense Transgenes
Antisense transgenes, designed to express RNA complementary to target mRNA, have been used extensively in animal, plant, and prokaryote studies to investigate the suppression of specific genes (Wang and Waterhouse 2000). One of the original reports of using antisense RNA technology to regulate gene expression in plants was published in 1987, about 10 years before the discovery of RNA silencing. In these studies, Rothstein et al. (1987) investigated the levels of NOS (nopaline synthase) mRNA in plants that were stably transformed with a CaMV 35S promoter-driven antisense nos transgene. The authors showed that nopaline synthase activity was reduced 8- to 50-fold in the various plant tissues analyzed. These initial studies suggested that antisense RNA techniques might have practical benefits such as revealing specific gene function in important biochemical pathways and/or plant genetic engineering. An intriguing observation regarding antisense-mediated suppression is the general lack of strong silencing of endogenous targets. The best antisense effect has been achieved almost exclusively with transgenes as target, especially those derived from bacteria such as the gus gene from E. coli. With endogenous targets, usually only a small fraction of the antisense transgenic population can be effectively silenced. Subsequent studies by Stam et al. (1997) and Wang and Waterhouse (2000) suggested that the strong silencing in this small number of lines is not due to a direct interference by the antisense RNA transcript, but results from inverted-repeat transgene insertion
that could give rise to long hpRNA. Wang and Waterhouse (2000) showed that an antisense gus transgene does not induce stronger or more frequent silencing than a sense gus construct in rice callus, and the strongly silenced callus lines, whether transformed with the sense or the antisense transgene, all contain inverted-repeat transgene insertions. These observations are surprising, as transcript from antisense transgenes has the potential to hybridize with the target sense mRNA and form dsRNA to trigger more efficient silencing than a sense transgene. Results of our recent study (Wang and Wu, unpublished data) have suggested that antisense RNA does not physically interact with target sense mRNA to form dsRNA. A possible effect of antisense transgene-derived RNA may come from Dicer processing of stem-loop structures formed within single-stranded antisense RNA, giving rise to a small population of siRNAs that can direct the cleavage of target mRNA (Wang and Metzlaff 2005). This view is consistent with bacterially derived antisense transgenes being more effective than plant-derived antisense transgenes at inducing silencing in plants; the bacterial sequences have not coevolved with plant Dicers and hence may contain more stem-loop structures that can be recognized by Dicer and give rise to antisense siRNAs. The unpredictable efficacy, labor intensiveness, and recent availability of improved technologies, to be discussed later, have limited the widespread use of the antisense transgene technology in basic research and functional genomics in plants.
Amplicon Transgenes
Variability in expression levels is a common phenomenon associated with transgenes (Angell and Baulcombe 1997). The reasons for this variability in transgene expression are not entirely understood. Apart from the post-transcriptional transgene silencing that can be triggered by multiple-copy or inverted-repeat transgene insertions, recent studies suggest that this variation could also be a result of chromosomal effects such as transgene integration into heterochromatic regions of the genome where gene expression is normally repressed (Angell and Baulcombe 1997). It is expected that transgenes encoding silencing-inducing RNA, such as hpRNA, are also subject to such variation in expression, resulting in variable levels of target gene silencing in plants. One approach to reducing this potential variability in transgene expression is to use so-called amplicon transgenes. In this approach, the cDNA of a virus, driven by a constitutive promoter (such as CaMV 35S), can be recombined with a target gene of interest, into a construct termed an amplicon. The first demonstrated use of an amplicon involved recombining a full-length potato virus X (PVX) cDNA with a fragment of the targeted gus reporter gene (Angell and Baulcombe 1997). When an amplicon is
transformed into a plant, the transgene becomes integrated into the genome. Because the transgene is driven by the constitutive CaMV 35S promoter, the transgene is transcribed in every cell of the plant. After transcription in the nucleus, the “transgene” mRNA is transported to the cytoplasm to be translated. Viral proteins, including RNA-dependent RNA polymerase (RdRP), that are essential for PVX replication, are produced. The newly synthesized RdRP generates a negative-strand RNA using the amplicon transgene-derived RNA as a template. The RdRP then uses the negative-strand RNA as a template to synthesize positive-strand RNAs, including a full-length viral genomic RNA and a target gene-containing subgenomic RNA. This RNA resembles that of an infecting virus, thereby generating siRNAs corresponding to the sequences of the amplicon RNA. An inserted DNA fragment, corresponding to the endogenous target, or reporter, gene will induce PTGS and will effectively silence the target gene or reporter transgene. Because the viral-like replication step normally generates considerably more transcript than most highly expressed transgenes, variation in transgene expression levels, due to chromosomal position effects, will be masked and the inserted sequence will be highly expressed in all transformed plants, ensuring consistent silencing. These amplicon constructs are based mainly on PVX, which does not usually cause severe disease symptoms, even on its normal host potato. Amplicons have been used successfully to silence reporter transgenes such as gus and gfp, as well as endogenous genes like PDS (phytoene desaturase) in various hosts including tomato, tobacco, N. benthamiana, and petunia (Waterhouse and Helliwell 2003). Advantages of amplicon technology include a broader range of plant applications than could be achieved using natural viral infections (see the section that follows on virus-induced gene silencing).
The potential scope of using amplicon transgenes to silence genes in a broad range of plants has been demonstrated by the silencing of a GFP reporter gene using a CaMV 35S:PVX:gfp amplicon in A. thaliana, which is not susceptible to PVX (Dalmay et al. 2000). Amplicons can also be expressed from tissue-specific promoters, and there are generally no viral symptoms to complicate mutant analysis. One obvious disadvantage of amplicon, and other transgenic technologies, is that not all plant hosts are readily transformable (Waterhouse and Helliwell 2003).
hpRNA Transgenes
Another approach used for inducing PTGS in plants is the use of inverted-repeat constructs designed to express hairpin RNA (hpRNA). These constructs consist of a promoter, a targeted sense sequence, a spacer region, a complementary targeted antisense sequence, and a transcription terminator. The RNA transcribed from such a construct has self-complementarity and
is able to form a partially duplexed “hairpin” molecule that mimics naturally occurring dsRNA (Smith et al. 2000; Watson et al. 2005). The use of an intron instead of a nonspecific DNA spacer has been shown to increase the silencing efficiency of these hairpins (Smith et al. 2000). The first demonstration of PTGS induced by a hpRNA transgene construct was shown using the gus reporter gene in rice (Waterhouse et al. 1998; Wang et al. 2000). The authors supertransformed rice callus, stably expressing a GUS transgene, with an inverted-repeat gus construct, which was found to be far more effective at silencing the gus gene than either sense or antisense transgenes (Wang et al. 2000). Since this initial experiment, a wide variety of genes have been silenced in A. thaliana and several other plant species using this technology, including viral sequences, transcription factors, and genes encoding enzymes for various biochemical pathways (Waterhouse and Helliwell 2003). Most applications of hpRNA-mediated silencing have been in dicotyledonous plant species via generic vectors such as pHANNIBAL and the Gateway high-throughput vector pHellsgate (Wesley et al. 2001). These vectors have been specifically designed to facilitate the in vivo cloning of polymerase chain reaction (PCR)-generated, Gateway-adapted, inversely oriented target sequences flanking an intron sequence into an A. tumefaciens binary transformation vector. The hpRNA constructs are driven by the constitutively expressed CaMV 35S promoter; however, tissue-specific promoters such as napin and lectin promoters have been used (Smith et al. 2000). Recently, a Gateway vector, designated pANDA, was developed for PTGS of rice genes (Miki and Shimamoto 2004). The construct contains a strong maize ubiquitin promoter to drive the expression of the hpRNA transgene, as the CaMV 35S promoter is less active in monocots than in dicot plants.
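The inverted-repeat layout described above (sense arm, spacer, antisense arm) can be sketched in a few lines. This is a minimal illustration, not the cloning procedure of any particular vector: the promoter and terminator are omitted, the spacer stands in for an intron or linker, and the function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_hairpin_insert(target_fragment, spacer):
    """Assemble sense arm + spacer + antisense arm of an hpRNA insert.

    The antisense arm is the reverse complement of the sense arm, so the
    transcript can fold back on itself with the spacer forming the loop.
    """
    sense = target_fragment.upper()
    return sense + spacer.upper() + reverse_complement(sense)


def arms_are_complementary(insert, arm_length):
    """Check that the 3' arm is the reverse complement of the 5' arm."""
    five_prime_arm = insert[:arm_length]
    three_prime_arm = insert[-arm_length:]
    return reverse_complement(three_prime_arm) == five_prime_arm


# Toy example: a made-up target fragment and a placeholder spacer.
target = "ATGGCTAGCTAGGATCCGATCGATTACG"
spacer = "GTAAGTTTCTGCAG"

insert = build_hairpin_insert(target, spacer)
print(insert)
print("Arms complementary:", arms_are_complementary(insert, len(target)))
```

Because the two arms are exact reverse complements, the transcript is self-complementary over the full arm length, which is the property the text identifies as mimicking naturally occurring dsRNA.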
To generate the hpRNA constructs, a 300- to 500-bp fragment from the gene sequence of choice is amplified by PCR, using Gateway-adapted primers, and cloned into a Gateway entry vector such as pENTR/D-TOPO. Once this cloning step is complete, a simple in vivo recombination reaction is undertaken in which the cloned sequences are recombined, in inversely oriented arrangement, into the pANDA “destination” vector ready for plant transformation (Miki and Shimamoto 2004). Other examples of this technology include a recent study by Liu et al. (2005), who identified four candidate DCL-encoding sequences in rice by searching available genomic sequence databases. They used hpRNA transgenes to silence the expression of OsDCL1 and OsDCL4, putative homologues of the A. thaliana DCL1 and DCL4 genes, respectively. They showed that transgenic lines containing the OsDCL1 hpRNA construct exhibited a range of defects from developmental arrest at the seedling stage to milder developmental defects that generally correlated with higher or lower expression of the hpRNA transgene, respectively. Analysis of the
OsDCL-silenced transgenic lines showed that the OsDCL1 plants were defective in miRNA biogenesis whereas OsDCL4 lines were not and that neither OsDCL1 nor OsDCL4 appeared to be involved in siRNA production. While hpRNA constructs have been widely used to silence genes by inducing PTGS, there are a few examples in which hpRNAs, and VIGS (see Section 12.6.2), have directed transcriptional silencing. PVX RNA vectors, containing a fragment of the 35S promoter, have been shown to induce and direct methylation and transcriptional repression of a 35S-driven gfp transgene (Jones et al. 1999). Similarly, hpRNA constructs directed against the nopaline synthase (nos) promoter have been shown to transcriptionally silence a nos promoter-regulated selectable marker transgene (Mette et al. 2000). The PVX vector was unable to direct detectable levels of methylation of the endogenous gene ribulose-1,5-bisphosphate carboxylase (Rubisco). However, three different hpRNA constructs targeting the endogenous promoter of chalcone synthase were able to direct transcriptional silencing of this endogene (Dedic-Hagan 2004). This latter result demonstrates that it is possible to transcriptionally silence endogenes using hpRNAs. However, this approach has not been widely tested and the silencing efficiency may depend on the sequence and context of the target promoter.
Direct-Repeat Constructs and Constructs Carrying a 3΄-Inverted Repeat of a Nontarget Sequence
Two recently discovered transgenic techniques offer potential application in functional genomics studies of rice and other plant species (Ma and Mitra 2002; Brummell et al. 2003). The first technique involves the expression of RNA encoding multiple-copy direct repeats of the target gene. The second approach is called “silencing by heterologous 3΄-UTR” (SHUTR). In this strategy, a fragment of the target gene is fused at the 3΄ end with an inverted repeat arrangement of a nontarget sequence. The effect of direct-repeat transgenes in gene silencing was first reported by Sijen et al. (1996), who found that the frequency of resistance to cowpea mosaic virus (CPMV) was increased from around 20% with a transgene containing one copy of the virus movement protein (MP) gene to around 60% with a transgene containing two direct-repeat copies of the same MP gene sequence. Studies by Wang and Waterhouse (2000) comparing the silencing efficiencies among sense, antisense, inverted-repeat, and direct-repeat transgenes targeting a gus transgene in rice further supported this finding. The authors found that the two-copy direct-repeat gus transgene was slightly less effective at silencing the gus gene than the inverted-repeat or hpRNA transgene, but it was much more effective than either the single-copy sense or antisense transgenes.
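The difference between the direct-repeat and inverted-repeat (hpRNA) arrangements compared above can be made concrete with a small sketch. It is illustrative only: the fragment and spacer are hypothetical toy sequences rather than any published construct, and the function names are assumptions introduced here.

```python
# Illustrative only: toy sequences, not taken from any published construct.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def direct_repeat(fragment: str, copies: int = 2) -> str:
    """Tandem copies of the fragment, all in the same (sense) orientation."""
    return fragment.upper() * copies


def inverted_repeat(fragment: str, spacer: str) -> str:
    """Sense arm, spacer, then antisense arm.

    The two arms are complementary, so the transcript can fold back
    on itself to form a hairpin.
    """
    return fragment.upper() + spacer.upper() + reverse_complement(fragment)


fragment = "ATGGCCTTAGC"  # hypothetical target-gene fragment
spacer = "GTAAGT"         # hypothetical spacer sequence

two_copy = direct_repeat(fragment, copies=2)
four_copy = direct_repeat(fragment, copies=4)
hairpin = inverted_repeat(fragment, spacer)

print("2-copy direct repeat:", two_copy)
print("4-copy direct repeat:", four_copy)
print("inverted repeat     :", hairpin)
```

In the direct-repeat transgene every copy reads in the same orientation, so no arm is complementary to another; only the inverted-repeat arrangement can form dsRNA by simple fold-back.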
In a more recent study, Ma and Mitra (2002) showed that the silencing efficiency of direct-repeat transgenes can be further improved by increasing the number of repeats to three or four rather than the previously investigated two repeats. Working with tobacco plants, the authors were able to induce strong silencing in 80% to 100% of transformants using three- to four-copy direct-repeat transgenes, a phenomenon they have termed “direct repeat induced PTGS” (driPTGS). To examine whether the transgene promoter was a contributing factor to efficient silencing, the authors compared results obtained using a strong CaMV 35S promoter with those using a relatively weak chlorophyll a/b binding protein gene promoter and found no difference. Silencing occurred regardless of the position of the direct repeat in the transgene, whether it was at the 5΄ or 3΄ end, or whether the repeats were separated by an unrelated gene, in this case, a 1.8 kb gus gene. To date, five independent genes have been silenced using driPTGS, including a transgene in rice callus (Ma and Mitra 2002). One drawback of this technology is that the construction of the driPTGS transgenes may require several cloning steps. However, a Gateway system with multiple direct repeat cloning sites may provide a way to facilitate the generation of these transgenes.
The mechanism by which the direct-repeat transgene induces silencing remains unclear. Two models have been proposed. Wang and Waterhouse (2000) proposed that transcript derived from the direct-repeat transgene has an “aberrant” nature that can be recognized by RdRP to synthesize dsRNA, thereby triggering RNA silencing.
In a more recent model, Martienssen (2003) proposed that the presence of two or more tandem copies of the same sequence in one transcript is essential for RdRP-mediated amplification of silencing; an RdRP template RNA with only one copy of the target sequence would not ensure the synthesis of dsRNA corresponding to the full length of the target sequence, resulting in successive dilution of secondary siRNAs from the 3΄ end to the 5΄ end of the target sequence.
To test the effectiveness of SHUTR, Brummell et al. (2003) generated the following construct against PG (encoding the enzyme polygalacturonase), a gene that is highly expressed in ripening tomatoes. The construct consisted of the constitutively expressed figwort mosaic virus 34S promoter, a plant 5΄ leader sequence, and approximately 1 kb of truncated PG coding sequence followed by an inverted repeat of a sequence from the Agrobacterium nos gene. To ensure stability of the inverted repeat during cloning, the first nos fragment was aligned in the antisense orientation with respect to the promoter followed by a spacer region and finally the sense nos repeat (Brummell et al. 2003). During transcription of the transgene, the nos repeat presumably folds back on itself and is cleaved by Dicer. The details of this mechanism are not entirely understood; however, knowledge from the ta-siRNA pathway supports a view that the 3΄ end of the cleaved
transcript is used as template for RDR6 to synthesize dsRNA in a primer-independent manner (see Fig. 12.1). It would therefore be interesting to examine if RDR6 and DCL4 are involved in SHUTR. Results from more than 50 independent transformations of this transgene in tomato showed PG silencing in approximately 91% of the transgenic population. Experiments using different promoters and hosts including A. thaliana showed that the SHUTR transgene is consistently effective at inducing gene silencing (Brummell et al. 2003).
A transgene strategy that is similar to SHUTR was first demonstrated by Hamilton et al. (1998). In this report, the authors constructed a transgene of which the 5΄ end contains both an inverted repeat and a direct repeat of the 5΄-UTR of an ethylene biosynthetic gene, followed by the coding sequence of the gene. They demonstrated that this transgene construct also conferred strong silencing of a related second gene that shares homology only with the coding sequence in the transgene. A subsequent study of the transgenic population showed that siRNAs are generated that correspond to the transgene sequence downstream of the 5΄ repeated structure. This finding suggests that RdRP is involved in the silencing of the second gene.
Obvious advantages of the SHUTR transgene strategy include the need for only one cloning step and the need for only limited knowledge of the targeted gene sequence, which could be useful for research into plant genomes that have not been extensively sequenced. However, the system has not been demonstrated for a wide spectrum of target genes. Studies in our laboratory have also raised doubt about the robustness of the SHUTR system for silencing genes in rice (Wesley et al. 2001).
Artificial miRNA (amiRNA)
Compared with animal miRNA targets, which can number in the hundreds, plant miRNAs have relatively few targets (Schwab et al. 2006). One obvious difference between plant and animal miRNAs is that plant miRNAs require perfect, or near-perfect, target complementarity for proper mRNA cleavage, while animal miRNAs need only about 7 nts of contiguous complementarity to suppress translation of target mRNAs. Recent research has indicated that this stringent specificity of plant miRNAs can be exploited as a tool for functional genomics via the use of artificial miRNAs (amiRNAs). These amiRNAs can be designed to target a specific gene, or a group of related genes (Schwab et al. 2006). These findings were initially derived from examination of overexpressed natural miRNA constructs and the analysis of various miRNA targets in plants. Using the stem of miR171 and inserted gfp sequences, Parizotto et al. (2004) showed efficient silencing of a coexpressed GFP mRNA. The contrasting differences between animal and plant miRNAs support the claim that the narrow specificity of
plant miRNA function is due to an inherent property of the RNA silencing machinery in plants, and not to selection against the evolution of broader miRNA specificity (Schwab et al. 2006). Schwab et al. (2006) used naturally occurring miRNA precursor sequences as a backbone for expressing amiRNA, in which the original miRNA sequence and its complementary strand (miRNA*) are replaced with amiRNA and amiRNA* sequences, respectively. The key design parameters for amiRNAs are that the initial stem-loop structures of the natural miRNA precursor are well maintained, and that the composition of the amiRNA sequences closely imitates that of the natural miRNAs. Other important parameters include a preference for uridine at the 5΄ terminus and an adenine at the tenth base of the amiRNA, as these nucleotides are highly conserved in natural miRNA populations as well as in highly efficient siRNAs. A mismatch corresponding to the 5΄ end of the amiRNA sequence in the amiRNA/amiRNA* molecule is included to increase the likelihood that the amiRNA strand is preferentially incorporated into the RISC complex. This correct strand selection is essential for efficient silencing, as in the case of natural miRNAs and siRNAs. To avoid the possibility of transitive RNA silencing, triggered by a perfectly matching amiRNA hybridizing to the target and acting as a primer for RDR6, one to three mismatches are incorporated into the 3΄ end of the amiRNA.
The efficacy of amiRNAs in silencing specific endogenous genes was tested by selecting target genes with known knockout phenotypes. These genes included a regulator of floral identity (LFY), a cofactor in chlorophyll biosynthesis (GUN4), and a flowering-time gene (FT). Natural miRNA precursor molecules for miRNA172a and miRNA319a were used as backbones and modified by overlapping PCR to suit the desired amiRNA sequence, with the whole construct being driven by the constitutively expressed CaMV 35S promoter.
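The sequence-level design parameters described above (a 5΄ uridine, an adenine at position 10, and one to three target mismatches confined to the 3΄ end) can be sketched as a simple checklist. This is a minimal illustration, not the procedure used by Schwab et al. (2006): the 21-nt sequences are invented, the size of the 3΄ window (`three_prime_window`) is an assumed parameter, only Watson-Crick pairs are counted as matches (G:U wobble is ignored), and precursor folding and the amiRNA/amiRNA* mismatch are not modelled.

```python
from typing import Dict, List

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def perfect_guide(target_site: str) -> str:
    """amiRNA (5'->3') that would pair perfectly with an mRNA site (5'->3')."""
    return target_site.upper().translate(RNA_COMPLEMENT)[::-1]


def mismatch_positions(amirna: str, target_site: str) -> List[int]:
    """1-based amiRNA positions (from the 5' end) that do not Watson-Crick pair."""
    ideal = perfect_guide(target_site)
    return [i + 1 for i, (a, b) in enumerate(zip(amirna.upper(), ideal)) if a != b]


def check_amirna(amirna: str, target_site: str,
                 three_prime_window: int = 4) -> Dict[str, bool]:
    """Check a candidate against the sequence rules summarized in the text.

    three_prime_window is an assumption made for this sketch: it defines
    how many terminal positions count as "the 3' end".
    """
    amirna = amirna.upper()
    mismatches = mismatch_positions(amirna, target_site)
    window_start = len(amirna) - three_prime_window + 1
    in_window = [p for p in mismatches if p >= window_start]
    return {
        "same_length": len(amirna) == len(target_site),
        "u_at_5prime": amirna[0] == "U",
        "a_at_position_10": len(amirna) >= 10 and amirna[9] == "A",
        "3prime_mismatches_1_to_3": 1 <= len(in_window) <= 3,
        "no_mismatch_outside_window": len(in_window) == len(mismatches),
    }


# Hypothetical 21-nt guide and the target site it pairs with perfectly.
ideal = "UGAACGUUCAGGCUAGCAUCG"
target_site = perfect_guide(ideal)

# Candidate amiRNA: same as the ideal guide, with one mismatch at the 3' end.
candidate = ideal[:-1] + "A"

for rule, passed in check_amirna(candidate, target_site).items():
    print(f"{rule}: {passed}")
```

A perfectly complementary guide fails the `3prime_mismatches_1_to_3` check here, which mirrors the stated reason for the 3΄ mismatches: a perfect match could prime RDR6 and trigger transitive silencing.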
The results of these experiments show that the transcript levels of most amiRNA-targeted genes were significantly reduced (Schwab et al. 2006). The amiRNA-induced phenotypes correlated well with those of plants with conventional mutations in the corresponding target genes. It has been suggested that the amiRNA approach has several advantages over other RNAi methods. First, amiRNA-mediated silencing is highly restricted to the targeted gene, and nonspecific effects are very limited. This gives the technology high target-sequence specificity and limits the risk of secondary PTGS resulting from transitive RNA silencing. Another potential value of the amiRNA system is that it is perhaps the only feasible technology to express a specific small RNA in plants.
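The specificity argument rests on the contrast drawn earlier between plant-style targeting (perfect or near-perfect complementarity along the whole small RNA) and animal-style targeting (about 7 nt of contiguous complementarity). The toy scan below illustrates why the first rule admits far fewer targets. All sequences are invented; `max_mismatches=2` is an assumed threshold, and placing the 7-nt seed at positions 2 to 8 follows the usual animal miRNA convention rather than anything stated in this chapter.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def near_perfect_sites(small_rna: str, transcript: str, max_mismatches: int = 2):
    """Start indices where the transcript pairs with the whole small RNA,
    allowing at most max_mismatches (plant-style rule, simplified)."""
    site = reverse_complement(small_rna)
    transcript = transcript.upper()
    hits = []
    for i in range(len(transcript) - len(site) + 1):
        window = transcript[i:i + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def seed_sites(small_rna: str, transcript: str, seed_start: int = 2, seed_len: int = 7):
    """Start indices of exact matches to the seed only (animal-style rule, simplified)."""
    seed = small_rna.upper()[seed_start - 1:seed_start - 1 + seed_len]
    site = reverse_complement(seed)
    transcript = transcript.upper()
    return [i for i in range(len(transcript) - len(site) + 1)
            if transcript[i:i + len(site)] == site]


small_rna = "UGAACGUUCAGGCUAGCAUCG"  # hypothetical 21-nt small RNA

# Transcript carrying a fully complementary site.
full_target = "AAAA" + reverse_complement(small_rna) + "AAAA"
# Transcript carrying only the 7-nt seed match.
seed_only = "C" * 16 + reverse_complement(small_rna[1:8]) + "C" * 10

for name, transcript in [("full_target", full_target), ("seed_only", seed_only)]:
    print(name,
          "near-perfect sites:", near_perfect_sites(small_rna, transcript),
          "seed sites:", seed_sites(small_rna, transcript))
```

Under the seed-only rule both transcripts count as targets, whereas under the near-perfect rule only the transcript with the full site does.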
12.6.2 Transient Delivery by Viral Vectors—Virus-Induced Gene Silencing
The majority of plant viruses have single-stranded RNA (ssRNA) genomes which are typically encapsidated by coat proteins and are released upon entry into a cell. Inside the plant cell, virally encoded RNA-dependent RNA polymerase generates both sense and antisense (plus and minus strand) viral RNA that can potentially hybridize to form dsRNA and trigger PTGS mechanisms. A recent study indicated that stem-loop structures formed within single-stranded viral RNAs can also be processed by Dicer in plants, thereby triggering silencing (Molnàr et al. 2005).
Viral infections can be established with naked viral RNA without the presence of coat proteins. In vitro-synthesized RNA transcripts, from a plasmid containing a cDNA encoding a complete virus genome, have been widely used to initiate virus infections. Also, the viral cDNA can be cloned into the T-DNA of Agrobacterium, which is delivered to a plant by agroinfiltration and expressed by the CaMV 35S promoter to initiate infections. On entry of viral RNA into the plant, the cell’s transcription machinery generates viral ssRNA and infection is established (Waterhouse and Helliwell 2003). A 300- to 800-nt exogenous sequence can be inserted into specific locations throughout the viral cDNA without the loss of infectivity of the RNA transcript (Thomas et al. 2001). When a recombinant virus infects a plant, the introduced viral and nonviral sequences are both targeted by the PTGS mechanism. This technology, in which a foreign sequence is introduced into a virus vector for the purpose of inducing silencing against the target gene, is referred to as virus-induced gene silencing (VIGS). VIGS was first demonstrated by Kumagai et al. (1995), who showed the silencing of an infectious clone of tobacco mosaic virus (TMV), although the significance of this finding was not fully appreciated at the time. Several viruses have been shown to be effective VIGS vectors.
The PVX VIGS system was spectacularly demonstrated in a N. benthamiana plant expressing GFP. A PVX-GFP VIGS vector spreads throughout the plant as in a typical viral infection and silences a constitutively expressed gfp reporter gene leaving a visual trail of virus movement throughout the plant. Although PVX is a very effective VIGS vector, it is unable to infect plant meristems (Ruiz et al. 1998). Tobacco rattle virus (TRV) is considered to be a more effective VIGS vector owing to its ability to infect almost all plant tissues including meristems and floral organs (Waterhouse and Helliwell 2003). TRV has been used to silence a wide range of endogenous genes involved in early organ development, disease resistance, and ethylene signaling (Ratcliff et al. 2001). The TRV genome consists of two separate RNAs called RNA1 and
RNA2. RNA1 encodes the RNA-dependent RNA polymerase, and other proteins, and RNA2 encodes the viral coat protein. To use TRV as a VIGS vector, a fragment of the gene of interest is inserted into RNA2 downstream from the coat protein gene whilst RNA1 is left unmodified. After coinoculation of a susceptible plant host with in vitro-transcribed RNA, or with TRV AgroVIGS constructs, infection is established and VIGS is induced. Other VIGS vectors include the geminivirus cabbage leaf curl virus (CbLCV) and the barley stripe mosaic virus (BSMV), which have been shown to silence both transgenes and endogenous genes in A. thaliana and wheat, respectively (Turnage et al. 2002; Scofield et al. 2005).
There are also reports of the use of RNA and DNA satellite viruses as VIGS vectors. Viral satellites are parasitic RNAs or DNAs that require helper viruses for replication; the technology that exploits them is called the satellite virus-induced silencing system (SVISS) (Gossele et al. 2002). In these systems, the target sequence is inserted into the satellite and coinoculated with its respective helper virus, either TMV or the geminivirus tomato yellow leaf curl virus (TYLCV). The major advantage of this system is the uncoupling of the virus replication and silencing mechanisms: the target gene sequences are carried by the satellite viruses, which are replicated by the corresponding helper virus, and the high copy number of the satellite contributes to a potent silencing signal. Other advantages of SVISS include the possibility of simultaneously silencing two genes by including two target-specific inserts (Tao and Zhou 2004).
The many attributes of VIGS give it advantages over other methods such as insertional mutagenesis and hpRNA technology. It is a more rapid technique for generating multiple mutants, considering the time required to generate double or triple mutants by conventional plant breeding.
The latter can be a relatively time-consuming exercise in rice and even in A. thaliana, and it is an almost impossible task in polyploid plants such as wheat. VIGS vectors can be modified for high-throughput applications, in the same manner as hpRNA constructs; an example is the pTRV-attP/R vector series, which employs the Gateway recombination technology. VIGS can be applied to mature or juvenile plants to induce the silencing of important embryogenesis-related genes and genes required for germination, in which a stably transformed hpRNA transgene, or a homozygous insertion mutant, would have resulted in embryo lethality (Watson et al. 2005). This shortcoming of stably transformed hpRNA transgenes could be avoided with the development of inducible hpRNA technologies that utilize chemically inducible promoters to induce expression of the hairpin transgene (Guo et al. 2003; Wielopolska et al. 2005). Another advantage of VIGS is its use in plants that are difficult or impossible to transform, such as wine grapes (Vitis vinifera) and cereals.
There are also several limitations to the general application of VIGS that require addressing, such as the host range of viruses and the accessibility of infectious clones and their movement across international quarantine barriers. There is also the potential for viral symptoms to mask or mimic actual silenced phenotypes. Finally, for some viruses there may be a limit on the maximum size of the inserted target sequence. Generally, most of the viruses tested to date can accommodate less than 1 kb of foreign sequence inserted into their genomes (Watson et al. 2005).
To date, no VIGS system has been developed for rice, although it is likely that the wheat VIGS vector BSMV is a suitable candidate, since rice is one of the hosts of the virus. The reason that a VIGS system has not been established in rice is probably the ease with which various hairpin transgenes can be transformed into rice. There is no limit on the size of the target sequences in these hpRNA constructs and they do not require quarantine documentation. However, it remains to be seen whether a rice VIGS system will be a useful research tool, as it would seem that the attributes of VIGS would suit research such as the analysis of bacterial and fungal pathogen defense pathways very effectively.
12.6.3 Transient Delivery by Agrobacterium Infection and Biolistics
Microprojectile bombardment and agroinfiltration are methods for introducing DNA, and sometimes RNA, into cells. Microprojectile bombardment, otherwise known as biolistics, involves the coating of microscopic gold or tungsten particles with RNA or DNA and propelling them into plant material using high-pressure helium. Agroinfiltration is a term used to describe the transfer of T-DNA from Agrobacterium tumefaciens by injection of a culture through a needle-less syringe that is capable of breaching the plant stomata and entering the cells.
The infecting bacterium is able to integrate its T-DNA into the plant genome, where it is subsequently transcribed by the cell's machinery. Microprojectile bombardment has been routinely used for transformation of rice and examination of gene expression in various plants. More recently, this procedure has also been used to induce PTGS by bombarding various monocot species with particles coated with dsRNA, or with DNA constructs that encode hpRNA, resulting in reduced activity of the targeted genes (Waterhouse and Helliwell 2003).
12.7 Future Prospects of Gene Silencing Technology in Plants
The majority of the published research on PTGS in plants has typically been devoted to the investigation of the roles of the different RNA silencing pathways and their respective functional components. This focus is
undoubtedly due to the greater complexity of the RNA silencing pathways in plants than in other organisms. The interactions between these pathways, and their key components, are active areas of current research (Vaucheret 2006). Studies with the nematode C. elegans have led the way in PTGS-based functional genomics. Admittedly, silencing a targeted gene in nematodes simply involves soaking them in dsRNA solutions or feeding them with bacteria that express dsRNA. Such technologies may have limited application in other species. Nevertheless, this C. elegans RNAi technology has advanced to the point where large-scale high-throughput silencing projects are routine. It is hoped that such projects will encourage the development of mechanisms for similar studies in other eukaryote species.
In conclusion, our understanding of small RNA-mediated pathways has increased dramatically over the last 5 years. The technologies are currently applied to large-scale functional genomics projects in dicots. It is only a matter of time before similar large-scale projects commence in monocots, with rice being one of the most likely candidates for such studies.
References
Abel P, Nelson R, De B, Hoffmann N, Rogers S, Fraley R, Beachy R (1986) Delay of disease development in transgenic plants that express the tobacco mosaic virus coat protein gene. Science 232:738–743
Adenot X, Elmayan T, Lauressergues D, Boutet S, Bouche N, Gasciolli V, Vaucheret H (2006) DRB4-Dependent TAS3 trans-acting siRNAs control leaf morphology through AGO7. Curr Biol 16:927–932
Ahlquist P (2002) RNA-dependent RNA polymerases, viruses, and RNA silencing. Science 296:1270–1273
Akbergenov R, Si-Ammour A, Blevins T, Amin I, Kutter C, Vanderschuren H, Zhang P, Gruissem W, Meins F, Jr., Hohn T, Pooggin MM (2006) Molecular characterization of geminivirus-derived small RNAs in different plant species. Nucl Acids Res 34:462–471
Allen E, Xie Z, Gustafson AM, Carrington JC (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121:207–221
Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, Gadrinab C, Heller C, Jeske A, Koesema E, Meyers CC, Parker H, Prednis L, Ansari Y, Choy N, Deen H, Geralt M, Hazari N, Hom E, Karnes M, Mulholland C, Ndubaku R, Schmidt I, Guzman P, Aguilar-Henonin L, Schmid M, Weigel D, Carter DE, Marchand T, Risseeuw E, Brogden D, Zeko A, Crosby WL, Berry CC, Ecker JR (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301:653–657
Anandalakshmi R, Pruss GJ, Ge X, Marathe R, Mallory AC, Smith TH, Vance VB (1998) A viral suppressor of gene silencing in plants. Proc Natl Acad Sci USA 95:13079–13084
Angell SM, Baulcombe DC (1997) Consistent gene silencing in transgenic plants expressing a replicating potato virus X RNA. EMBO J 16:3675–3684
Aufsatz W, Mette MF, Matzke AJ, Matzke M (2004) The role of MET1 in RNA-directed de novo and maintenance methylation of CG dinucleotides. Plant Mol Biol 54:793–804
Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
Baulcombe DC (1996) Mechanisms of pathogen-derived resistance to viruses in transgenic plants. Plant Cell 8:1833–1844
Baumberger N, Baulcombe DC (2005) Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc Natl Acad Sci USA 102:11928–11933
Bernstein E, Caudy AA, Hammond SM, Hannon GJ (2001) Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409:363–366
Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu J-K (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123:1279–1291
Boutet S, Vazquez F, Liu J, Béclin C, Fagard M, Gratias A, Morel J-B, Crété P, Chen X, Vaucheret H (2003) Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13:843–848
Brummell DA, Balint-Kurti PJ, Harpster MH, Palys JM, Oeller PW, Gutterson N (2003) Inverted repeat of a heterologous 3'-untranslated region for high-efficiency, high-throughput gene silencing. Plant J 33:793–800
Cao X, Jacobsen SE (2002) Role of the Arabidopsis DRM methyltransferases in de novo DNA methylation and gene silencing. Curr Biol 12:1138–1144
Carmell MA, Xuan Z, Zhang MQ, Hannon GJ (2002) The Argonaute family: tentacles that reach into RNAi, developmental control, stem cell maintenance, and tumorigenesis. Genes Dev 16:2733–2742
Carrington JC, Ambros V (2003) Role of microRNAs in plant and animal development. Science 301:336–338
Chan SW, Zilberman D, Xie Z, Johansen LK, Carrington JC, Jacobsen SE (2004) RNA silencing genes control de novo DNA methylation. Science 303:1336
Chan SWL, Henderson IR, Jacobsen SE (2005) Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat Rev Genet 6:351–360
Chapman EJ, Prokhnevsky AI, Gopinath K, Dolja VV, Carrington JC (2004) Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18:1179–1186
Chen X, Liu J, Cheng Y, Jia D (2002) HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129:1085–1094
Cogoni C, Macino G (2000) Post-transcriptional gene silencing across kingdoms. Curr Opin Genet Dev 10:638–643
Cogoni C, Irelan J, Schumacher M, Schmidhauser T, Selker E, Macino G (1996) Transgene silencing of the al-1 gene in vegetative cells of Neurospora is mediated by a cytoplasmic effector and does not depend on DNA-DNA interactions or DNA methylation. EMBO J 15:3153–3163
Dalmay T, Hamilton A, Rudd S, Angell S, Baulcombe DC (2000) An RNA-dependent RNA polymerase gene in Arabidopsis is required for posttranscriptional gene silencing mediated by a transgene but not by a virus. Cell 101:543–553
Dedic-Hagan J (2004) Gene silencing in plants: mechanisms and spread. PhD thesis, Australian National University
Deleris A, Gallego-Bartolome J, Bao J, Kasschau KD, Carrington JC, Voinnet O (2006) Hierarchical action and inhibition of plant Dicer-Like proteins in antiviral defense. Science 313:68–71
Du T, Zamore PD (2005) microPrimer: the biogenesis and function of microRNA. Development 132:4645–4652
Dunoyer P, Lecellier CH, Parizotto EA, Himber C, Voinnet O (2004) Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16:1235–1250
Dunoyer P, Himber C, Voinnet O (2005) DICER-LIKE 4 is required for RNA interference and produces the 21-nucleotide small interfering RNA component of the plant cell-to-cell silencing signal. Nat Genet 37:1356–1360
Ebhardt HA, Thi EP, Wang M-B, Unrau PJ (2005) Extensive 3' modification of plant small RNAs is modulated by helper component-proteinase expression. Proc Natl Acad Sci USA 102:13398–13403
Fedoroff NV (2002) RNA-binding proteins in plants: the tip of an iceberg? Curr Opin Plant Biol 5:452–459
Filipowicz W (2005) RNAi: the nuts and bolts of the RISC machine. Cell 122:17–20
Finnegan EJ, Margis R, Waterhouse PM (2003) Post-transcriptional gene silencing is not compromised in the Arabidopsis CARPEL FACTORY (DICER-LIKE1) mutant, a homolog of Dicer-1 from Drosophila. Curr Biol 13:236–240
Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806–811
Gasciolli V, Mallory AC, Bartel DP, Vaucheret H (2005) Partially redundant functions of Arabidopsis DICER-like enzymes and a role for DCL4 in producing trans-acting siRNAs. Curr Biol 15:1494–1500
Golden TA, Schauer SE, Lang JD, Pien S, Mushegian AR, Grossniklaus U, Meinke DW, Ray A (2002) Short Integuments1/Suspensor1/Carpel Factory, a Dicer Homolog, is a maternal effect gene required for embryo development in Arabidopsis. Plant Physiol 130:808–822
Gossele V, Fache I, Meulewaeter F, Cornelissen M, Metzlaff M (2002) SVISS—a novel transient gene silencing system for gene function discovery and validation in tobacco plants. Plant J 32:859–866
Guo HS, Fei JF, Xie Q, Chua NH (2003) A chemical-regulated inducible RNAi system in plants. Plant J 34:383–392
Guo S, Kemphues KJ (1995) par-1, a gene required for establishing polarity in C. elegans embryos, encodes a putative Ser/Thr kinase that is asymmetrically distributed. Cell 81:611–620
Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286:950–952
Hamilton AJ, Brown S, Han Y, Ishizuka M, Lowe A, Solis AGA, Grierson D (1998) A transgene with repeated DNA causes high frequency, posttranscriptional suppression of ACC-oxidase gene expression in tomato. Plant J 15:737–746
Hammond SM, Bernstein E, Beach D, Hannon GJ (2000) An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404:293–296
Han MH, Goud S, Song L, Fedoroff N (2004) The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci USA 101:1093–1098
Helliwell C, Waterhouse P (2003) Constructs and methods for high-throughput gene silencing in plants: RNA interference. Methods 30:289–295
Herr AJ, Jensen MB, Dalmay T, Baulcombe DC (2005) RNA polymerase IV directs silencing of endogenous DNA. Science 308:118–120
Hiraguri A, Itoh R, Kondo N, Nomura Y, Aizawa D, Murai Y, Koiwa H, Seki M, Shinozaki K, Fukuhara T (2005) Specific interactions between Dicer-Like proteins and HYL1/DRB-family dsRNA binding proteins in Arabidopsis thaliana. Plant Mol Biol 57:173–188
Hirochika H (2001) Contribution of the Tos17 retrotransposon to rice functional genomics. Curr Opin Plant Biol 4:118–122
Hunter C, Sun H, Poethig RS (2003) The Arabidopsis heterochronic gene ZIPPY is an ARGONAUTE family member. Curr Biol 13:1734–1739
Jones AL, Thomas CL, Maule AJ (1998) De novo methylation and co-suppression induced by a cytoplasmically replicating plant RNA virus. EMBO J 17:6385–6393
Jones L, Hamilton AJ, Voinnet O, Thomas CL, Maule AJ, Baulcombe DC (1999) RNA-DNA interactions and DNA methylation in post-transcriptional gene silencing. Plant Cell 11:2291–2301
Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14:787–799
Kanno T, Huettel B, Mette MF, Aufsatz W, Jaligot E, Daxinger L, Kreil DP, Matzke M, Matzke AJM (2005) Atypical RNA polymerase subunits required for RNA-directed DNA methylation. Nat Genet 37:761–765
Kidner CA, Martienssen RA (2005) The role of ARGONAUTE1 (AGO1) in meristem formation and identity. Dev Biol 280:504–517
Kumagai MH, Donson J, della-Cioppa G, Harvey D, Hanley K, Grill LK (1995) Cytoplasmic inhibition of carotenoid biosynthesis with virus-derived RNA. Proc Natl Acad Sci USA 92:1679–1683
Kurihara Y, Watanabe Y (2004) Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci USA 101:12753–12758
Lee RC, Ambros V (2001) An extensive class of small RNAs in Caenorhabditis elegans. Science 294:862–864
Lee RC, Feinbaum RL, Ambros V (1993) The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843–854 Li H-W, Ding S-W (2005) Antiviral silencing in animals: RNAi: mechanisms, biology and applications. FEBS Lett 579:5965–5973 Li J, Yang Z, Yu B, Liu J, Chen X (2005) Methylation protects miRNAs and siRNAs from a 3'-end uridylation activity in Arabidopsis. Curr Biol 15: 1501–1507 Liu B, Li P, Li X, Liu C, Cao S, Chu C, Cao X (2005) Loss of function of OsDCL1 affects microRNA accumulation and causes developmental defects in rice. Plant Physiol 139:296–305 Lindbo JA, Silva-Rosales L, Proebsting WM, Dougherty WG (1993a) Induction of a highly specific antiviral state in transgenic plants: implications for regulation of gene expression and virus resistance. Plant Cell 5:1749–1759 Lindbo JA, Silva-Rosales L, Dougherty WG (1993b) Pathogen derived resistance to potyviruses: working, but why? Semin Virol 4:369–379 Lynn K, Fernandez A, Aida M, Sedbrook J, Tasaka M, Masson P, Barton MK (1999) The PINHEAD/ZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126:69–81 Ma C, Mitra A (2002) Intrinsic direct repeats generate consistent posttranscriptional gene silencing in tobacco. Plant J 31:37–49 Margis R, Fusaro AF, Smith NA, Curtin SJ, Watson JM, Finnegan EJ, Waterhouse PM (2006) The evolution and diversification of Dicers in plants. FEBS Lett 580:2442–2450 Martienssen RA (2003) Maintenance of heterochromatin by RNA interference of tandem repeats. Nat Genet 35:213–214 Matthew L (2004) RNAi for plant functional genomics. Comp Funct Genom 5:2440–2444 Meins F, Si-Ammour A, Blevins T (2005) RNA silencing systems and their relevance to plant development. Annu Rev Cell Dev Biol 21:297–318 Meister G, Tuschl T (2004) Mechanisms of gene silencing by double-stranded RNA. 
Nature 431:343–349 Melquist S, Bender J (2003) Transcription from an upstream promoter controls methylation signaling from an inverted repeat of endogenous genes in Arabidopsis. Genes Dev 17:2036–2047 Millar AA, Waterhouse PM (2005) Plant and animal microRNAs: similarities and differences. Funct Integr Genom 5:129–135 Mette MF, Aufsatz W, van der Winden J, Matzke MA, Matzke AJM (2000) Transcriptional silencing and promoter methylation triggered by double-stranded RNA. EMBO J 19:5194–5201 Metzlaff M, O'Dell M, Cluster PD, Flavell RB (1997) RNA-mediated RNA degradation and chalcone synthase A silencing in petunia. Cell 88:845–854 Miki D, Shimamoto K (2004) Simple RNAi vectors for stable and transient suppression of gene function in rice. Plant Cell Physiol 45:490–495 Miyao A, Tanaka K, Murata K, Sawaki H, Takeda S, Abe K, Shinozuka Y, Onosato K, Hirochika H (2003) Target site specificity of the Tos17 retrotransposon
12 RNA Silencing and Its Application in Functional Genomics
shows a preference for insertion within genes and against insertion in retrotransposon-rich regions of the genome. Plant Cell 15:1771–1780 Moissiard G, Voinnet O (2004) Viral suppression of RNA silencing in plants. Mol Plant Pathol 5:71–82 Molnàr A, Csorba T, Lakatos L, Varallyay E, Lacomme C, Burgyán J (2005) Plant virus-derived small interfering RNAs originate predominantly from highly structured single-stranded viral RNAs. J Virol 79:7812–7818 Mourrain P, Beclin C, Elmayan T, Feuerbach F, Godon C, Morel JB, Jouette D, Lacombe AM, Nikic S, Picault N, Remoue K, Sanial M, Vo TA, Vaucheret H (2000) Arabidopsis SGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance. Cell 101:533–542 Moussian B, Schoof H, Haecker A, Jürgens G, Laux T (1998) Role of the ZWILLE gene in the regulation of central shoot meristem cell fate during Arabidopsis embryogenesis. EMBO J 17:1799–1809 Napoli C, Lemieux C, Jorgensen R (1990) Introduction of a chimeric chalcone synthase gene into petunia results in reversible co-suppression of homologous genes in trans. Plant Cell 2:279–289 Onodera Y, Haag JR, Ream T, Nunes PC, Pontes O, Pikaard CS (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120:613–622 Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242 Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495 Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS (2004) SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. 
Genes Dev 18:2368–2379 Qi Y, Denli AM, Hannon GJ (2005) Biochemical specialization within Arabidopsis RNA silencing pathways. Mol Cell 19:421–428 Ratcliff F, Harrison BD, Baulcombe DC (1997) A similarity between viral defence and gene silencing in plants. Science 276:1558–1560 Ratcliff F, Martin-Hernandez AM, Baulcombe DC (2001) Tobacco rattle virus as a vector for analysis of gene function by silencing. Plant J 25:237–245 Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626 Roth BM, Pruss GJ, Vance VB (2004) Plant viral suppressors of RNA silencing. Virus Res 102:97–108 Rothstein SJ, DiMaio J, Strand M, Rice D (1987) Stable and heritable inhibition of the expression of nopaline synthase in tobacco expressing antisense RNA. Proc Natl Acad Sci USA 84:8439–8443 Ruiz MT, Voinnet O, Baulcombe DC (1998) Initiation and maintenance of virus-induced gene silencing. Plant Cell 10:937–946 Ruvkun G, Wightman B, Ha I (2004) The 20 years it took to recognize the importance of tiny RNAs. Cell 116:S93–S96
Schauer SE, Jacobsen SE, Meinke DW, Ray A (2002) DICER-LIKE 1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7:487–491 Schwab R, Ossowski S, Riester M, Warthmann N, Weigel D (2006) Highly specific gene silencing by artificial microRNAs in Arabidopsis. Plant Cell 18:1121–1133 Scofield SR, Huang L, Brandt AS, Gill BS (2005) Development of a virus-induced gene-silencing system for hexaploid wheat and its use in functional analysis of the Lr21-mediated leaf rust resistance pathway. Plant Physiol 138:2165–2173 Sharp PA (2001) RNA interference – 2001. Genes Dev 15:485–490 Sijen T, Wellink J, Hiriart JB, Van Kammen A (1996) RNA-Mediated virus resistance: role of repeated transgenes and delineation of targeted regions. Plant Cell 8:2277–2294 Silhavy D, Molnàr A, Lucioli A, Szittya G, Hornyik C, Tavazza M, Burgyán J (2002) A viral protein suppresses RNA silencing and binds silencing-generated, 21- to 25-nucleotide double-stranded RNAs. EMBO J 21:3070–3080 Smith NA, Singh SP, Wang M-B, Stoutjesdijk PA, Green AG, Waterhouse PM (2000) Gene expression: total silencing by intron-spliced hairpin RNAs. Nature 407:319–320 Stam M, Mol JNM, Kooter JM (1997) The silence of genes in transgenic plants. Ann Bot 79:3–12 Stevenson DS, Jarvis P (2003) Chromatin silencing: RNA in the driving seat. Curr Biol 13:R13–15 Tamaru H, Selker EU (2001) A histone H3 methyltransferase controls DNA methylation in Neurospora crassa. Nature 414:277–283 Tao X, Zhou X (2004) A modified viral satellite DNA that suppresses gene expression in plants. Plant J 38:850–860 Thomas CL, Jones L, Baulcombe DC, Maule AJ (2001) Size constraints for targeting post-transcriptional gene silencing and for RNA-directed methylation in Nicotiana benthamiana using a potato virus X vector. Plant J 25:417–425 Tomari Y, Zamore PD (2005) MicroRNA biogenesis: Drosha can’t cut it without a partner. 
Curr Biol 15:R61–64 Turnage MA, Muangsan N, Peele CG, Robertson D (2002) Geminivirus-based vectors for gene silencing in Arabidopsis. Plant J 30:107–114 van der Krol AR, Mur LA, Beld M, Mol JN, Stuitje AR (1990) Flavonoid genes in petunia: addition of a limited number of gene copies may lead to a suppression of gene expression. Plant Cell 2:291–299 Vaucheret H (2006) Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev 20:759–771 Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli, V, Mallory AC, Hilbert J-L, Bartel DP, Crete P (2004) Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16:69–79 Voinnet O (2005) Induction and suppression of RNA silencing: insights from viral infections. Nat Rev Genet 6:206–220 Wada Y, Ohya H, Yamaguchi Y, Koizumi N, Sano H (2003) Preferential de novo methylation of cytosine residues in non-CpG sequences by a domains
rearranged DNA methyltransferase from tobacco plants. J Biol Chem 278:42386–42393 Wang M-B, Metzlaff M (2005) RNA silencing and antiviral defense in plants. Curr Opin Plant Biol 8:216–222 Wang M-B, Waterhouse PM (2000) High-efficiency silencing of a beta-glucuronidase gene in rice is correlated with repetitive transgene structure but is independent of DNA methylation. Plant Mol Biol 43:67–82 Wang M-B, Abbott DC, Waterhouse PM (2000) A single copy of a virus-derived transgene encoding hairpin RNA gives immunity to barley yellow dwarf virus. Mol Plant Pathol 1:347–356 Wang M-B, Wesley SV, Finnegan EJ, Smith NA, Waterhouse PM (2001) Replicating satellite RNA induces sequence-specific DNA methylation and truncated transcripts in plants. RNA 7:16–28 Wang X-H, Aliyari R, Li W-X, Li H-W, Kim K, Carthew R, Atkinson P, Ding S-W (2006) RNA interference directs innate immunity against viruses in adult Drosophila. Science 312:452–454 Wang XJ, Gaasterland T, Chua NH (2005) Genome-wide prediction and identification of cis-natural antisense transcripts in Arabidopsis thaliana. Genome Biol 6:R30 Wassenegger M (2005) The role of the RNAi machinery in heterochromatin formation. Cell 122:13–16 Wassenegger M, Heimes S, Riedel L, Sanger HL (1994) RNA-directed de novo methylation of genomic sequences in plants. Cell 76:567–576 Waterhouse PM, Helliwell CA (2003) Exploring plant genomes by RNA-induced gene silencing. Nat Rev Genet 4:29–38 Waterhouse PM, Graham MW, Wang M-B (1998) Virus resistance and gene silencing in plants can be induced by simultaneous expression of sense and antisense RNA. Proc Natl Acad Sci USA 95:13959–13964 Waterhouse PM, Wang M-B, Finnegan EJ (2001a) Role of short RNAs in gene silencing. Trends Plant Sci 6:297–301 Waterhouse PM, Wang M-B, Lough T (2001b) Gene silencing as an adaptive defence against viruses. Nature 411:834–842 Watson JM, Fusaro AF, Waterhouse PM (2005) RNA silencing platforms in plants. 
FEBS Lett 579:5982–5987 Wesley SV, Helliwell C, Smith NA, Wang M-B, Rouse D, Liu Q, Gooding P, Singh S, Abbott D, Stoutjesdijk P, Robinson S, Gleave A, Green A, Waterhouse PM (2001) Constructs for efficient, effective and high throughput gene silencing in plants. Plant J 27:581–590 Wielopolska A, Townley H, Moore I, Waterhouse P, Helliwell C (2005) A high-throughput inducible RNAi vector for plants. Plant Biotech J 3:583–590 Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2:642–652 Xie Z, Allen E, Wilken A, Carrington JC (2005) DICER-LIKE4 functions in trans-acting siRNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc Natl Acad Sci USA 102:12984–12989
Yang SJ, Carter SA, Cole AB, Cheng NH, Nelson RS (2004) A natural variant of a host RNA-dependent RNA polymerase is associated with increased susceptibility to viruses by Nicotiana benthamiana. Proc Natl Acad Sci USA 101:6297–6302 Yu D, Fan B, MacFarlane SA, Chen Z (2003) Analysis of the involvement of an inducible Arabidopsis RNA-dependent RNA polymerase in antiviral defense. Mol Plant Microbe Interact 16:206–216 Yu L, Yu X, Shen R, He Y (2005) HYL1 gene maintains venation and polarity of leaves. Planta 221:231–242 Zamore PD, Tuschl T, Sharp PA, Bartel DP (2000) RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101:25–33 Zilberman D, Cao X, Jacobsen SE (2003) ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299:716–719 Zilberman D, Cao X, Johansen LK, Xie Z, Carrington JC, Jacobsen SE (2004) Role of Arabidopsis ARGONAUTE4 in RNA-directed DNA methylation triggered by inverted repeats. Curr Biol 14:1214–1220
13 Activation Tagging Systems in Rice
Alexander A.T. Johnson¹, Su-May Yu² and Mark Tester¹

¹Australian Centre for Plant Functional Genomics, PMB 1, Glen Osmond, South Australia 5064, Australia; ²Institute of Molecular Biology, Academia Sinica, Taipei 115, Taiwan, Republic of China

Reviewed by Michael Ayliffe and Venkatesan Sundaresan
13.1 Introduction
13.2 Classical Activation Tagging: Enhancer Element-Mediated Gene Activation
13.2.1 Classical Activation Tagging in Plants
13.2.2 Structure and Function of the CaMV 35S Activation Tagging System
13.2.3 Variations to the CaMV 35S Activation Tagging System
13.2.4 CaMV 35S Activation Tagging Resources in Rice
13.3 Transactivation Tagging: Transcriptional Activator-Mediated Gene Activation in Specific Cell Types
13.3.1 Gene Expression at the Cell Type–Specific Level
13.3.2 Origin of the GAL4 Enhancer Trapping System
13.3.3 GAL4 Enhancer Trapping in Plants
13.3.4 Cell Type–Specific Activation of Target Genes Using GAL4 Transactivation
13.3.5 Cell Type–Specific Activation Tagging Using GAL4 Transactivation
13.4 Future Perspectives
Acknowledgments
References
13.1 Introduction

Sequencing of the 389-Mb rice genome (Oryza sativa L.) is nearly complete and map-based, finished quality sequence now covers 95% of the genome. Determining function of the 37,544 predicted genes in the rice genome, however, remains a formidable challenge that will require multiple,
complementary approaches to be achieved. As with the dicotyledonous model species, Arabidopsis thaliana, rice genetic resources have been directed largely toward the generation of random loss-of-function mutants, or knockouts, involving the use of mutagens such as γ-rays or, more frequently, transferred DNA (T-DNA) and transposon-based systems such as Ac/Ds, En/Spm, and Tos17 (see Hirochika et al. 2004 for review of rice mutant resources). To date, roughly 300,000 mutants have been generated using these strategies, providing invaluable genomic tools for gene mining in the model monocotyledonous species. In addition, gene targeting techniques have recently emerged that allow for specific rice loci to be disrupted (Terada et al. 2002; Cotsaftis and Guiderdoni 2005), yet optimization is still required before these techniques can be used to generate knockouts on order.

Despite widespread application, the traditional knockout approach is limited in its ability to fully saturate the rice genome with mutations. Genes with lethal or deleterious knockout phenotypes (particularly at the embryonic stage of development) are not amenable to the loss-of-function approach, and the investigation of large gene families is often hampered by the redundant activity of one gene member compensating for the loss of another. This is particularly relevant to the rice genome, in which 29% of predicted genes have been amplified at least once to form tandem repeats, with some tandem repeats stretching up to 134 members (International Rice Genome Sequencing Project 2005). To address this significant obstacle and maximize the usefulness of knockout collections, gene, promoter, and enhancer traps have often been included in T-DNA and transposon-based insertion systems to enable reporter visualization of native gene activity when other phenotypes are not necessarily present (Jeon et al. 2000; Ito et al. 2004; Peng et al. 2005). 
Trapped patterns report on spatial and developmental activity of native rice genes, although the identification of genomic elements responsible for those patterns can be laborious and not always apparent (Peng et al. 2005). RNA silencing is a well documented phenomenon in plants (Baulcombe 2004), with the clear advantage over gene knockouts of simultaneously silencing multiple members of a particular gene family. The extent to which RNA silencing can be used to suppress gene targets in the rice genome remains to be seen, with a recent study of the OsRac gene family reporting a maximum of three gene members efficiently suppressed using inverted repeat constructs (Miki et al. 2005). Continued refinements to RNA silencing technology, such as the development of artificial microRNAs with greater targeting control than traditional hairpin constructs (Schwab et al. 2006), promise to increase the efficiency and accuracy of RNA silencing in plants. While interruption or silencing of a particular coding sequence may not lead to a detectable phenotype, for a variety of reasons, dominant mutant phenotypes are more likely to result from upregulation, or activation
tagging, of the same coding sequence. Random activation of genes in the classical sense, utilizing the CaMV 35S enhancer element, is a growing field in rice functional genomics, with two groups reporting on the development of large activation tagged populations (Jeong et al. 2002, 2006; Hsing et al. 2006). Moreover, the concurrent development of several GAL4 enhancer trapping populations in rice (Wu et al. 2003; Yang et al. 2004; Johnson et al. 2005) means that gene activation can now be targeted to specific cell types. This chapter provides an introduction to classical activation tagging in plants, drawing extensively from Arabidopsis research, where the tagging technique primarily originated, before describing recent progress made in applying activation tagging to rice. GAL4 enhancer trapping technology is then presented with examples of how the technology has been used to transactivate target genes in specific cell types of rice. Finally, a novel method to carry out cell type–specific activation tagging using the GAL4 system, currently a powerful tool in Drosophila melanogaster genomics, is presented as an exciting application for the existing GAL4 rice resources.
13.2 Classical Activation Tagging: Enhancer Element-Mediated Gene Activation

13.2.1 Classical Activation Tagging in Plants

Activation tagging provides an alternative to knockouts and RNA silencing through upregulation, rather than abolition, of native gene expression. The classical activation tagging technique uses T-DNA or transposons such as the En-I maize transposon system (Marsch-Martinez et al. 2002) to position strong activating enhancer elements, usually a tetramer of the CaMV 35S enhancer (Odell et al. 1985), throughout the plant genome. When integrated in proximity to genes, the enhancer elements can interact with promoter sequences in the genome to increase expression levels and/or alter expression patterns of native genes. This gain-of-function approach produces dominant mutations that dramatically affect the transcriptional control of genes while still retaining a functional gene product. As such, activation tagging represents a powerful method for deregulating vital housekeeping genes that must remain transcriptionally active to ensure viability. The activation approach also lends itself well to the dissection of signal transduction pathways with multiple activator and suppressor genes such as those involved in floral induction (Kardailsky et al. 1999; Lee et al. 2000) or response to internal growth factors such as auxin (Zhao et al. 2001), cytokinin (Kakimoto 1996), and brassinosteroids (Li et al. 2001,
2002). For several Arabidopsis activation tagged signal transduction mutants, overexpression phenotypes have been identified that have no corresponding loss-of-function phenotype. Protein kinase and associated signal transducing domains are particularly abundant in the Arabidopsis genome and account for more than 10% of all genes (The Arabidopsis Genome Initiative 2000). Similarly, 65% of tandemly repeated genes in the rice genome with more than 27 members contain protein kinase domains (International Rice Genome Sequencing Project 2005). Activation tagging may be the best method to mutate these and other highly amplified signaling genes, such as transcription factors, and could explain why many signal transduction mutants in Arabidopsis have been identified specifically through activation tagging. Finally, genes that are normally expressed at very low levels, such as those involved in phenylpropanoid biosynthesis (Borevitz et al. 2000), may produce a recognizable phenotype only when greatly overexpressed.

13.2.2 Structure and Function of the CaMV 35S Activation Tagging System

The original activation tagging T-DNA vector used for plant growth and developmental studies in tobacco (Walden et al. 1994) contained four copies of the 35S transcriptional enhancer sequence (nucleotide –420 to –90 relative to transcription start of the 35S RNA promoter) cloned in tandem and placed adjacent to the right border, thus creating a 35S enhancer tetramer that could enhance, but not initiate, native gene expression on integration into the plant genome (Fig. 13.1a). Deletion studies had previously shown that the transcriptional initiation and enhancement properties of the 35S RNA promoter are physically separate, and deletions beyond –46 (such as the fragment used for activation tagging) remove transcript initiation while retaining the ability to enhance transcription (Odell et al. 1985). 
Free of any insertional constraints concerning gene transcription, the 35S tetramer is a versatile enhancer element that can influence gene expression both up- and downstream of native genes, in either orientation, at genomic distances ranging from 380 bp (Weigel et al. 2000) to 8.2 kb (Ichikawa et al. 2003) from the ATG start codon of Arabidopsis genes.
Fig. 13.1. Schematic representations of the different activation tagging systems in plants; T-DNA constructs appear in light gray, bordered by left and right borders (LB and RB), and plant genomic elements in dark gray. (a) Classical activation tagging with a tetramer of the CaMV 35S enhancer (E) cloned next to the left border of a T-DNA construct. An adjacent endogenous transcriptional unit consisting of promoter (small dashed cylinder) and coding sequence (large dashed cylinder) shows upregulated expression (indicated by arrow) due to interaction with the enhancer element. (b) Activation tagging with the complete CaMV 35S promoter (35S) cloned next to the LB of a T-DNA construct. Integration of the T-DNA directly 5′ of an endogenous coding sequence replaces the native promoter with the 35S promoter, resulting in constitutive overexpression of the gene. (c) Enhancer trapping with the minimal promoter-equipped Gal4 gene (GAL4) cloned next to the right border of a T-DNA construct. An endogenous enhancer element (hatched arrow) drives transcription of the Gal4 gene, leading to the GAL4 transcriptional activator protein binding to the UAS element (five 17 bp UAS repeats cloned in tandem, followed by a minimal promoter TATA) and activating expression of a downstream reporter gene. The resulting pattern of GAL4/reporter gene expression can be highly specific, depending on the genomic enhancer, and is the defining characteristic of the driver line. (d) Activation of a responder construct in specific cell types using GAL4 transactivation. A gene of interest (GOI) is placed immediately downstream of the UAS element and the resulting construct is introduced, through sexual crosses or retransformation, into a driver line. The responder construct subsequently comes under transcriptional control of the driver, forcing transcription of the GOI in the same pattern as GAL4/reporter gene expression. (e) Cell type–specific activation tagging using GAL4 transactivation. UAS elements are cloned next to the LB and RB of a T-DNA construct, creating a double-sided gene transactivator that is capable of up-regulating endogenous gene expression from either border.
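
The two-component logic of panels (c) and (d) in Fig. 13.1 can be sketched as a toy model. The Python below is an illustration only, under simplifying assumptions: expression is treated as strictly on or off, and the driver name, gene name, and list of cell types are hypothetical placeholders rather than lines or genes described in this chapter.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class DriverLine:
    """Enhancer trap line: GAL4 is made wherever the trapped enhancer is active."""
    name: str
    gal4_cell_types: FrozenSet[str]


@dataclass(frozen=True)
class Responder:
    """UAS-GOI construct: silent unless GAL4 protein is present in the cell."""
    gene_of_interest: str


def goi_expression(responder: Responder,
                   driver: Optional[DriverLine],
                   cell_types: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each cell type to the transcribed gene name, or None if it is silent."""
    # Without a driver there is no GAL4, so the UAS element stays inactive.
    active = driver.gal4_cell_types if driver is not None else frozenset()
    return {cell: (responder.gene_of_interest if cell in active else None)
            for cell in cell_types}


# Hypothetical example values, for illustration only.
CELL_TYPES = ["root epidermis", "root cortex", "stele", "leaf mesophyll"]
driver = DriverLine("hypothetical_driver", frozenset({"root cortex"}))
responder = Responder("hypothetical_GOI")

crossed = goi_expression(responder, driver, CELL_TYPES)  # driver x responder
alone = goi_expression(responder, None, CELL_TYPES)      # responder only

for cell in CELL_TYPES:
    state = "ON" if crossed[cell] else "off"
    print(f"{responder.gene_of_interest} in {cell}: {state}")
```

In this sketch the responder carries no spatial information of its own; the expression pattern is inherited entirely from the driver line, which is why a single responder construct can in principle be combined with many different drivers.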
The frequency with which activation tagging produces visible, dominant activation tagged mutants has ranged from 0.07% (Weigel et al. 2000) to 2.2% (Ichikawa et al. 2003) in Arabidopsis. Differences in tagging frequencies may result from methylation-induced silencing of the 35S tetramer, triggered by the multimerized 35S enhancer element itself, or a prevalence of multiple T-DNA insertions in certain populations. While most studies have focused on single genes being activated by the 35S tetramer, evidence suggests that multiple genes may be activated, particularly in the case of genes lying closely in tandem (van der Graaff et al. 2002). The exact size of the 35S enhancer fragment varies slightly in different activation tagging studies, but the deletion of transcriptional initiation and the tetramer conformation are nearly universal.

Integration of the 35S tetramer into the Arabidopsis genome usually causes massive upregulation of the adjacent gene(s), resulting in constitutive expression throughout most tissues of the plant and thus loss of the original expression pattern (Kakimoto 1996; Kardailsky et al. 1999; Borevitz et al. 2000; Ito and Meyerowitz 2000; Lee et al. 2000; Huang et al. 2001; Li et al. 2001, 2002; Zhao et al. 2001; Marsch-Martinez et al. 2002). However, a small number of activation tagged mutants have displayed upregulated gene activity while conserving the original expression profile of the native gene (Neff et al. 1999; van der Graaff et al. 2000). In a collection of 30 activation tagged Arabidopsis phenotypic mutants, Weigel et al. (2000) identified one mutant line with upregulated expression of the FLOWERING LOCUS T gene in a similar expression pattern (predominantly shoot) to that of the wild-type gene. Upregulation under the native gene expression profile could avoid deleterious effects caused by constitutive overexpression of a gene. 
An example of this is the LEAFY PETIOLE gene, originally characterized by van der Graaff et al. (2000) in an Arabidopsis activation tagged mutant showing tissue-specific upregulation of the gene. Attempts to constitutively express the gene with the full 35S promoter resulted in sterile plants with severe developmental defects, indicating that tissue-specific overexpression was necessary to produce viable mutants. Clearly an increase in the frequency of tagged mutants with tissue-specific, or cell type–specific, activation of genes would be desirable, although this may require alternative promoters with tissue-specific activity or the use of GAL4 enhancer trapping technology, as described later in this chapter.

13.2.3 Variations to the CaMV 35S Activation Tagging System

In a few instances the entire 35S promoter, rather than the enhancer tetramer, has been employed for activation tagging studies in Arabidopsis (Wilson et al. 1996; Schaffer et al. 1998; Fridborg et al. 1999). In these
cases, the 35S promoter can replace a native promoter when integration of the activation tagging cassette occurs directly 5′ of a coding sequence (Fig. 13.1b), resulting in constitutive overexpression of the flanking gene. This type of activation tagging requires insertions occurring in close proximity to the start codon, and the literature documents insertions ranging from 35 to 131 bp upstream of the upregulated gene (Wilson et al. 1996; Schaffer et al. 1998). Attempts have also been made to utilize enhancer fragments from promoters other than 35S, and some progress was reported with an enhancer element isolated from the cassava vein mosaic virus (Dong and Arnim 2003).

The introduction of inducible promoters into activation tagging systems has enabled gene activation to be temporally controlled. Matsuhara et al. (2000) used the Arabidopsis HSP18.2 heat-shock promoter to induce activation of several native Arabidopsis genes following heat shock at 37°C, while Zuo et al. (2002) and Sun et al. (2003) used an estradiol-inducible activation tagging system to identify genes involved in plant phytohormone signaling. Further, activation tagging has been carried out with transgenic Arabidopsis lines carrying a reporter gene such as luciferase (LUC) cloned downstream of a promoter normally activated by stress, such as the sweet potato sporamin gene (Spomin) sugar-inducible promoter (Masaki et al. 2005) or the pathogenesis-related 1 defense gene (PR-1) promoter (Grant et al. 2003). After deployment of an activation tag, such as the 35S tetramer, into the transgenic background, dominant mutants have been identified (through screens for reporter gene expression under nonstressed conditions) that exhibit enhanced promoter activity. Tagged genes in the dominant mutants have been shown to function as key regulators of the targeted promoters and other similar sequences.

13.2.4 CaMV 35S Activation Tagging Resources in Rice

Jeong et al. 
(2002) were the first group to report activation tagging in any monocot species with their description of 13,450 activation tagged lines in japonica rice. The plants were produced using the traditional 35S enhancer tetramer (nucleotide –417 to –86 relative to transcription start) located next to the left border of a T-DNA construct, and dominant mutants were detected at a frequency of 0.3%. However, reverse transcriptase-polymerase chain reaction (RT-PCR) analysis of 10 randomly selected lines with 35S tetramer insertions within 4.5 kb of a native rice gene showed that four lines had significantly increased expression of that gene. These results suggest that integration of the 35S tetramer into the rice genome activates expression of adjacent genes (within 4.5 kb) in roughly half of the insertion events, despite the fact that dominant mutant phenotypes were detected at a much lower frequency. Detailed analysis of two activation tagged lines
demonstrated that at least one of them, with increased expression of the Cf2/Cf5 gene, showed elevated gene expression under the same expression profile as the native gene. More recently, Jeong et al. (2006) described a much larger expression study of genes located closest to the 35S tetramer in 112 activation tagged lines, taken from the same population of plants (which currently totals 47,932 lines). More than half of the lines, 52%, showed elevated expression of the adjacent gene. Gene activation was found to extend significantly further than 4.5 kb, with one analyzed gene showing strong upregulation at a distance of 10.7 kb from the 35S tetramer. Interestingly, and unlike previous reports in Arabidopsis, 70% of the tagged lines showed elevated expression under the endogenous gene profile, with the remaining lines showing more general expression profiles and frequent ectopic expression in the leaf. The higher frequency of gene activation in rice compared to Arabidopsis, particularly under the native gene profile, likely results from the fact that most Arabidopsis activation tagged mutants were identified in screens for visible dominant phenotypes, while the expression analyses carried out by Jeong et al. (2006) utilized randomly selected transformants determined to have the activation tagging T-DNA construct integrated in intergenic regions. Indeed, the study identifies many activation tagged lines with upregulated gene expression that do not appear to have a phenotype. An examination of expression patterns in the 0.3% of activation tagged lines with dominant mutant phenotypes would determine if constitutive overexpression of activated genes predominates in these lines, or if the endogenous expression pattern is retained here as well. Closer examination is also needed to verify that the dominant phenotypes are truly a consequence of the activation tagging element and not a result of tissue culture-induced somaclonal variation. 
Finally, the high frequency of gene activation detected by RT-PCR highlights the fact that more sensitive screens may be necessary to identify mutants in populations of rice activation tagged lines. A large population of 45,000 activation tagged lines of rice was recently described by Hsing et al. (2006) using a modification of the classical CaMV 35S activation tagging system. The study cloned eight tandem repeats of the 35S enhancer (nucleotide –343 to –46 relative to transcription start) adjacent to the left border of a T-DNA construct, creating a 35S octamer that can activate gene expression upon integration into the rice genome. The octamer configuration may serve as a more potent activator than the traditional tetramer, with genetic distances as far as 12.5 kb from the ATG start codon reportedly leading to gene activation in certain lines (S. M. Yu et al., unpublished). Tagging efficiency has not been reported for this population, nor has the predominant pattern of overexpression been documented. However, detailed characterization of a GA-deficient mutant line showing upregulated expression of the GA2ox gene suggests that the line has constitutive
overexpression of the gene, as the phenotype (inhibited stem and reproductive organ development) does not differ from that obtained by general overexpression of the gene by the full 35S promoter.
The two populations of classical activation tagged lines reported in rice together comprise far more plants (more than 90,000 independent transformants) than have been reported for Arabidopsis. In addition, the CaMV 35S enhancer element appears to function equally well, if not better, in the Jeong et al. (2006) population as compared to published Arabidopsis studies. The 35S enhancer has been shown to activate rice genes at distances as far as 12.5 kb, which exceeds the average spacing of one gene per 9.9 kb in the rice genome (International Rice Genome Sequencing Project 2005). This result suggests that most insertions of the 35S enhancer element into the rice genome (excluding insertions in coding sequences, which lead to knockouts) have the potential to activate at least one native rice gene, if not several. For all of these reasons, the described classical activation tagging resources are likely to yield valuable mutants that traditional loss-of-function strategies have missed. However, additional analyses of the Hsing et al. (2006) population are required to ascertain the efficiency and pattern of activation tagging in rice with a 35S octamer enhancer element, as well as any silencing issues associated with eight repeats of the enhancer sequence.
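The comparison between the 12.5 kb activation distance and the average gene spacing can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only: it assumes genes are evenly spaced at the genome-wide average (real rice chromosomes are not uniform), it treats 12.5 kb as the reach of the enhancer even though that figure is the farthest distance reported rather than a typical one, and the option of counting both sides of the insertion is an assumption, not something the cited studies report. The function and constant names are hypothetical.

```python
# Rough estimate of how many genes lie within reach of an inserted enhancer.
# Assumption: genes are evenly spaced at the genome-wide average.

AVG_KB_PER_GENE = 9.9      # rice average quoted in the text (IRGSP 2005)
MAX_ACTIVATION_KB = 12.5   # farthest activation distance reported in the text


def expected_genes_in_range(reach_kb, kb_per_gene=AVG_KB_PER_GENE, both_sides=True):
    """Expected number of genes inside the activation window.

    reach_kb   -- distance over which the enhancer is assumed to act
    both_sides -- count the window on both sides of the insertion (assumption)
    """
    window_kb = reach_kb * (2 if both_sides else 1)
    return window_kb / kb_per_gene


one_side = expected_genes_in_range(MAX_ACTIVATION_KB, both_sides=False)
two_sides = expected_genes_in_range(MAX_ACTIVATION_KB, both_sides=True)
print(f"{one_side:.2f} {two_sides:.2f}")  # prints "1.26 2.53"
```

Under these simplifying assumptions an insertion would have roughly one to three genes within range, which is consistent with the statement that most intergenic insertions could activate at least one native gene.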
13.3 Transactivation Tagging: Transcriptional Activator-Mediated Gene Activation in Specific Cell Types
13.3.1 Gene Expression at the Cell Type–Specific Level
The recent increase in gene expression studies involving specific tissues and cell types reflects a rapidly growing trend in plant biology to deconstruct organs into their constituent parts, and heralds the arrival of microgenomics in plants (Brandt 2005; Moore et al. 2006). Two technological breakthroughs largely responsible for this development, laser capture microdissection (LCM; Kerk et al. 2003; Nakazono et al. 2003) and fluorescence-activated cell sorting (FACS; Birnbaum et al. 2003), now enable researchers to isolate and perform transcriptome analyses on cell types as numerically small as the Arabidopsis quiescent center (Nawy et al. 2005). These studies have identified markedly diverse gene expression profiles in different cell types and developmental stages of plants (Birnbaum et al. 2003), highlighting the dynamic nature of gene expression across organs, tissues and cell types. Despite the fact that native genes are frequently expressed in a cell type–specific manner, the majority of functional genomic studies in plants involve overexpression of target transgenes throughout the entire plant using
strong constitutive promoters such as CaMV 35S or the rice Actin1 promoter. Constitutive overexpression of transgenes can lead to pleiotropic effects not related to normal functioning of the gene and a resulting misinterpretation of its role. A more desirable option is to target transgene expression to particular tissues and cell types where the gene is known or predicted to be active. However, until recently, a general paucity of plant promoters with spatial and/or developmental-specific activity has limited the ability to do this. The development of GAL4 resources in Arabidopsis and, more recently, rice reduces the need for characterized cell type–specific promoters and now enables the targeting and analysis of transgene expression in nearly all tissues and cell types of these model organisms.
13.3.2 Origin of the GAL4 Enhancer Trapping System
The development of GAL4 enhancer trapping technology in Drosophila melanogaster (Brand and Perrimon 1993) marked a revolution in fly developmental biology and in the study of multicellular organisms in general. By providing researchers for the first time with a powerful tool to routinely manipulate or destroy (ablate) specific cell types while leaving the rest of the organism untouched, the GAL4 system has come to be known as the fly geneticist’s “Swiss army knife” (Duffy 2002). Today more than 5,000 Drosophila GAL4 enhancer trap lines have been developed and characterized, allowing nearly all cell types of this model organism to be targeted. The GAL4 enhancer trapping system makes use of the 881 amino acid GAL4 transcriptional activator protein originally isolated from Saccharomyces cerevisiae. The GAL4 protein acts as a potent transcriptional activator in yeast by binding to a 17-bp DNA sequence known as the Upstream Activation Sequence (UAS) element, found in the promoter region of galactose-inducible genes.
For GAL4 enhancer trapping in other organisms, the Gal4 coding sequence is cloned behind a minimal promoter and launched into the genome using the P-element transposon for flies, or T-DNA for plants. Integration of the construct into the host genome can place the minimal promoter-equipped Gal4 gene under transcriptional control of local promoter and enhancer elements, so that GAL4 is produced in a pattern reflective of native gene activity. GAL4 expression is conveniently visualized through a GAL4-responsive reporter gene such as gfp or gus (uidA), included on the same T-DNA construct for plants, cloned downstream of five or six tandem repeats of the UAS element (Fig. 13.1c). Binding of GAL4 to the UAS sequence leads to amplification and increased expression of the reporter gene (Moore et al. 2006), resulting in far higher frequencies of enhancer trapping than that reported with traditional
enhancer trapping constructs. Depending on the native gene driving expression of GAL4, expression patterns of the transcriptional activator can range from whole organism to individual cell types.
13.3.3 GAL4 Enhancer Trapping in Plants
Enhancer trapping in plants with a modified version of the GAL4 protein (a GAL4:VP16 fusion protein engineered for efficient expression in plants, hereafter referred to as GAL4) was first reported in Arabidopsis, where T-DNA integration of a Gal4:gfp enhancer trapping construct yielded GFP expression frequencies of 30% in T1 lines (Haseloff 1999). More than 12,000 Arabidopsis GAL4 lines have been developed, and many display highly specific GFP expression patterns throughout the plant. A catalog of root-specific lines, with patterns specific to cell types such as the pericycle and lateral root cap, has been collated to facilitate study of the root meristem and is available at http://www.plantsci.cam.ac.uk/Haseloff/.
The deployment of GAL4 technology in rice followed several years later with the development of three large enhancer trapping populations in japonica rice: 31,443 independent transformants in cvs. Zhonghua 11 and Zhonghua 15 (Wu et al. 2003); 100,000 independent transformants in cv. Nipponbare (Yang et al. 2004; Peng et al. 2005); and a second cv. Nipponbare population containing roughly 13,000 independent transformants (Johnson et al. 2005). Comprising nearly 145,000 transformants in total, these rice enhancer trap lines now represent the largest set of GAL4 resources in any single plant species, and perhaps any organism. The three rice populations were transformed with similar GAL4 enhancer trapping constructs, although two populations used six repeats of the UAS element and the GUS reporter gene in their enhancer trapping construct (Wu et al. 2003; Yang et al. 2004; Peng et al. 2005) while the population described by Johnson et al. (2005) employed five repeats of the UAS element and the gfp reporter gene.
The populations show significantly different frequencies of reporter gene expression, ranging from 32% in the GFP lines to 84.3% in the GUS populations, and the noted variation in construct design may account for these differences. As with the Arabidopsis GAL4 library, many rice lines with cell type–specific reporter gene expression profiles have been identified (see Fig. 13.2) and searchable databases for various patterns throughout the rice plant can be found at http://129.127.183.5 (Johnson et al. 2005) and http://rmd.ncpgr.cn/ (Zhang et al. 2006).
Fig. 13.2. Three different GAL4 driver lines from the Johnson et al. (2005) collection showing markedly different patterns of GFP expression specific to various cell types of the rice shoot: (a) collar or lamina joint; (b) leaf vascular cells; (c) collar and stomata. The spatial and/or developmental characteristics of endogenous plant enhancers determine the GAL4/GFP expression pattern of each individual driver. Green color is due to fluorescence of GFP; red is due to autofluorescence of chlorophyll (See also color plate section).
13.3.4 Cell Type–Specific Activation of Target Genes Using GAL4 Transactivation
The potent activator properties of GAL4 enable the cell type–specific expression patterns of interesting GAL4 lines to be harnessed to drive the expression of other transgenes in equally specific fashion. For this to occur, a GAL4 enhancer trap line is first selected on the basis of reporter gene expression in a particular cell type(s) of interest. The selected enhancer trap line is termed the “driver” line because it drives expression of the GAL4 transcriptional activator protein in a pattern of interest. A second transformant is then generated with a transgene of interest placed immediately downstream of the UAS sequence element to which GAL4 binds, creating the so-called “responder” line (Fig. 13.1d). By crossing the driver and responder lines (or retransforming the driver with the responder construct), the GAL4-responsive gene of interest comes under transcriptional control of the GAL4 expression pattern present in the driver line, resulting in transactivation of the gene of interest specifically in the cell type(s) of interest.
Transactivation of genes using GAL4 enhancer trapping technology has long been routine in Drosophila (Brand and Perrimon 1993; Phelps and Brand 1998), and a similar trend in the plant literature is now emerging with five published reports in Arabidopsis (Bougourd et al. 2000; Kiegle et
al. 2000; Boisnard-Lorig et al. 2001; Sabatini et al. 2003; Gallois et al. 2004) and two in rice (Johnson et al. 2005; Liang et al. 2006). The study by Johnson et al. (2005) was the first to demonstrate with the GUS reporter gene (uidA) that a target transgene could be transactivated by GAL4 driver lines in specific cell types of the rice plant representing the root, seed, leaf and floral organs. Figure 13.3a presents the confocal image of a rice enhancer trap line identified in this collection with GFP fluorescence specific to xylem parenchyma cells of the root, a vascular cell type known to be important in controlling the composition of the xylem transpiration stream (Tester and Davenport 2003). An identical GUS staining pattern was obtained on retransformation of the enhancer trap line with a UAS:uidA responder construct (Fig. 13.3b and c), providing clear evidence that GAL4-mediated transactivation of genes in rice is robust and cell type–specific. The ability to target transgene expression to specific cell types of interest, such as the xylem parenchyma, represents a more biologically relevant method to express genes of interest than that provided by constitutive promoters. In addition, genes reported to show toxicity when expressed constitutively, such as certain sodium transporter genes, may be more amenable to the GAL4 system of expression.
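The driver and responder logic described in this section can be summarized as a simple rule: the UAS-linked transgene is expected to be expressed only in cell types where the driver line produces GAL4. The following sketch expresses that rule as a toy model. It is a deliberate simplification and not a description of any published pipeline: it assumes the progeny carry both constructs, and it ignores segregation, silencing, and expression level. The class, function, and line names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Line:
    """A transgenic line, reduced to the two properties that matter here."""
    name: str
    gal4_cells: FrozenSet[str] = frozenset()  # cell types producing GAL4 (driver)
    uas_transgene: Optional[str] = None       # transgene downstream of UAS (responder)


def cross(driver: Line, responder: Line) -> Dict[str, List[str]]:
    """Predict where the responder transgene is transactivated.

    Assumes the progeny inherit both constructs. Returns an empty dict when
    either GAL4 or a UAS-linked transgene is missing.
    """
    gal4_cells = driver.gal4_cells | responder.gal4_cells
    transgene = driver.uas_transgene or responder.uas_transgene
    if transgene is None or not gal4_cells:
        return {}
    return {transgene: sorted(gal4_cells)}


# Hypothetical lines modelled loosely on the xylem parenchyma example.
xylem_driver = Line("xylem_driver", gal4_cells=frozenset({"root xylem parenchyma"}))
uas_uida = Line("UAS:uidA responder", uas_transgene="uidA")
no_gal4 = Line("non-expressing line")

print(cross(xylem_driver, uas_uida))  # {'uidA': ['root xylem parenchyma']}
print(cross(no_gal4, uas_uida))       # {}
```

The same rule applies whether the two constructs are combined by a sexual cross or by retransforming the driver with the responder construct, which is why the model treats both cases identically.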
Fig. 13.3. Transactivation of the GUS reporter gene (uidA) in xylem parenchyma cells of the rice root. (a) Confocal laser scanning microscopy image of a GAL4 driver line showing GFP expression specifically in xylem parenchyma cells of the root, beginning in the zone of cell differentiation (indicated by arrow). Green color is due to fluorescence of GFP; red is due to fluorescence of propidium iodide. Scale bar = 50 μm. (b and c) Stereomicroscopy images of the same GAL4 line transformed with a UAS:uidA responder construct. Histochemical GUS staining shows that expression of the reporter gene remains specific to the xylem parenchyma, consistently appearing in the zone of cell differentiation and becoming more intense in mature regions of the root (See also color plate section).
More recently, six different GAL4 enhancer trap lines from the Wu et al. (2003) collection, showing GUS expression patterns primarily in reproductive cell types such as the stigma or lemma/palea, were used as drivers to investigate the function of 10 rice transcription factors (Liang et al. 2006). The responder lines, each carrying one of the transcription factor genes cloned downstream of the UAS element, were crossed to the different drivers, thus enabling functional analysis of transcription factor expression in several different organs represented by the GAL4 drivers. The study demonstrated that certain transcription factors reveal phenotypes upon activation in specific organs and not in others (such as a leaf phenotype, but not floral, for OsMADS15). In addition, crosses of the responder lines to one reference driver showing GUS expression in most organs of the plant (essentially a constitutive driver) frequently produced more severe phenotypes than with the specific driver lines, causing lethality in the case of OsMYBS3. The OsMYBS3 transcription factor produced a viable phenotype when activated specifically in the stigma (short stigma), a result that again highlights the utility of tissue-specific expression systems, such as GAL4 transactivation, when aiming to perturb specific plant processes. Transactivation involving any of the lines described by Yang et al. (2004) has not, to our knowledge, been published. However, the sheer number of GAL4 rice lines currently available to the scientific community, as well as the fact that transactivation has been clearly demonstrated in two of the GAL4 populations, suggests that cell type-specific transactivation of target genes is a rapidly emerging technique in rice that will see increased application to functional genomic studies in the near future. 
In addition, alternative transcriptional activator systems, using proteins other than GAL4, are currently being developed for use in rice that may extend the application of transactivation work even further by providing transcriptional activators with greater freedom to operate (www.cambia.org). Alternative transcriptional activators would also enable the production of “dual” driver lines, where the combination of two enhancer trapping constructs, based on different transcriptional activator proteins, would enable the simultaneous targeting of transgene expression to different cell types of the plant.
13.3.5 Cell Type–Specific Activation Tagging Using GAL4 Transactivation
The development of large GAL4 populations in rice, and successful implementation of transactivator technology to drive targeted gene expression, indicates that a more specialized form of activation tagging is now possible, one where tagging is carried out in specific cell types of the plant.
This novel gain-of-function strategy uses random genomic integration of the GAL4-binding UAS element to dissect and analyze the function of endogenous genes in individual cell types. Unlike classical activation tagging with the 35S enhancer element, the UAS strategy forces random gene activation specifically in predetermined cell types of interest, thereby reducing pleiotropic effects caused by constitutive gene activation and yielding valuable functional data that are relevant only to the targeted cell type.
Shortly after the GAL4 enhancer trapping system was first described in Drosophila, a UAS-based gene activator construct was developed enabling activation tagging studies to be carried out in specific cell types (Rørth 1996). The gene activator consisted of multiple repeats of the UAS element, plus a minimal promoter for transcript initiation, positioned towards one end of a P-element transposon. In what the author termed a “modular misexpression screen,” the gene activator was launched throughout the fly genome using P-element mutagenesis and a library of flies was generated, each containing a novel genomic insertion of the UAS element. The insertion process ensured that some UAS insertions occurred in close proximity to, or inside of, native genes, thereby creating GAL4-responsive endogenous genes and what we have termed “endogenous responder” lines. Subsequent sexual crosses of the endogenous responders to selected driver lines resulted in activation of the endogenous genes in specific cell types of interest. The sexual crosses revealed high frequencies of activation tagged mutants when an eye-specific GAL4 driver was used, with 4% of endogenous responders yielding a dominant phenotype upon crossing to this line. Toba et al. (1999) developed a significantly improved version of the original fly random gene activator construct by positioning UAS minimal promoter elements at both ends of a P-element transposon.
The double-sided activator activates genes in both orientations, and 64% of the resulting endogenous responder lines revealed a detectable dominant phenotype in combination with at least one of the four different GAL4 driver lines to which they were crossed. The fact that several endogenous responders produced phenotypes in combination with certain GAL4 drivers, and not with others, indicates that many genes produce a detectable phenotype only when activated in specific cell types. Correspondingly, many endogenous responders produced radically different phenotypes, ranging from rough eyes to lethality, when crossed to the four different driver lines, demonstrating that native genes can play fundamentally different roles when activated in different cell types, information that conventional constitutive expression would “average out” as a single phenotype or miss entirely. Cloning of the UAS insertion site in 47 mutant endogenous responder lines showed that more than half contained insertions of the enhancer
element at –50 to +100 nucleotides relative to the native transcriptional start of an endogenous gene, and most of the activated transcripts were spliced and polyadenylated correctly (Toba et al. 1999). Close proximity of the UAS element to the 5′ region of activated genes, most likely due to the fact that the UAS element carries a minimal promoter that initiates transcription, greatly facilitated the cloning of activated genes and appeared to limit activation to one specific gene. The remarkably high activation frequency reported in this study could be attributed to significantly amplified gene expression achieved through binding of GAL4 to the UAS, resulting in prominent phenotypes, as well as the fact that cell type–specific activation tagging may result in larger numbers of viable phenotypes than that derived from classical activation tagging.
The development of large GAL4 resources in both rice and Arabidopsis has yet to lead to cell type–specific activation tagging populations, although the idea of using GAL4 technology in this fashion has long been promoted for plants (see CAMBIA review in Finkel 1999). Experience with Drosophila would suggest that such tagging efforts will soon follow and find broad application in both model species, especially as increasing numbers of well-characterized GAL4 driver lines become available to the scientific community. A double-sided T-DNA gene activator carrying UAS minimal promoter repeats at both ends, similar in concept to the vector described by Toba et al. (1999), was recently developed for use in rice transformations and is currently being integrated into several GAL4 enhancer trap lines (Fig. 13.1e; Johnson and Tester, unpublished results). The double-sided activator has the potential to activate native genes from either border; however, the average density of one gene per 9.9 kb in the rice genome suggests that most T-DNA insertions will result in single gene activations.
13.4 Future Perspectives
The immediate task for classical activation tagging in rice is to utilize the large resources at hand for gene discovery programs. While the majority of Arabidopsis activation tagging articles focus on the characterization of individual mutant lines showing interesting phenotypes, rice activation tagging has so far mostly reported on population development and expression analyses of genes lying adjacent to the activation element. The expression studies have conclusively demonstrated that the CaMV 35S activation strategy works efficiently in rice and, in fact, indicate that more stringent screens are probably necessary to detect phenotypes associated with the high levels of gene activation uncovered in the tagged populations.
Transactivation tagging is also an area of rice functional genomics where large efforts have been devoted toward resource development. Further work is needed to demonstrate that transactivation is possible in all of the existing GAL4 collections; however, successful GAL4 transactivation of the GUS reporter gene (Fig. 13.3; Johnson et al. 2005) as well as 10 transcription factor genes (Liang et al. 2006) verifies that targeted expression studies, using most if not all cell types of rice, are now possible. Further characterization of expression patterns in the three GAL4 collections to uncover specific GAL4 drivers, particularly in the lines described by Yang et al. (2004), will accelerate the use of GAL4 enhancer trapping technology for this purpose. Finally, activation tagging in specific cell types, still unrealized in plant biology, is one of the most promising methods to emerge from GAL4 technology and will likely see widespread application to rice activation tagging efforts in the near future.
Acknowledgments
We thank Gynheung An for advance access to activation tagging data that greatly facilitated the writing of this manuscript. We would also like to recognize Drs. Emmanuel Guiderdoni and Julian Hibberd as co-developers of the GAL4/GFP rice enhancer trap collection. Drs. Venkatesan Sundaresan and Michael Ayliffe are thanked for critical review of the manuscript.
References
Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363
Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, Galbraith DW, Benfey PN (2003) A gene expression map of the Arabidopsis root. Science 302:1956–1960
Boisnard-Lorig C, Colon-Carmona A, Bauch M, Hodge S, Doerner P, Bancharel E, Dumas C, Haseloff J, Berger F (2001) Dynamic analyses of the expression of the HISTONE:YFP fusion protein in Arabidopsis show that syncytial endosperm is divided in mitotic domains. Plant Cell 13:495–509
Borevitz JO, Xia Y, Blount J, Dixon RA, Lamb C (2000) Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell 12:2383–2394
Bougourd S, Marrison J, Haseloff J (2000) An aniline blue staining procedure for confocal microscopy and 3D imaging of normal and perturbed cellular phenotypes in mature Arabidopsis embryos. Plant J 24:543–550
Brand AH, Perrimon N (1993) Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118:401–415
Brandt SP (2005) Microgenomics: gene expression analysis at the tissue-specific and single-cell levels. J Exp Bot 56:495–505
Cotsaftis O, Guiderdoni E (2005) Enhancing gene targeting efficiency in higher plants: rice is on the move. Transgen Res 14:1–14
Dong Y, von Arnim AG (2003) Novel plant activation-tagging vectors designed to minimize 35S enhancer-mediated gene silencing. Plant Mol Biol Rep 21:349–358
Duffy JB (2002) GAL4 system in Drosophila: a fly geneticist’s Swiss army knife. Genesis 34:1–15
Finkel E (1999) Australian center develops tools for developing world. Science 5433:1481–1483
Fridborg I, Kuusk S, Moritz T, Sundberg E (1999) The Arabidopsis dwarf mutant shi exhibits reduced gibberellin responses conferred by over-expression of a new putative zinc finger protein. Plant Cell 11:1019–1031
Gallois JL, Nora FR, Mizukami Y, Sablowski R (2004) WUSCHEL induces shoot stem cell activity and developmental plasticity in the root meristem. Genes Dev 18:375–380
Grant JJ, Chini A, Basu D, Loake GJ (2003) Targeted activation tagging of the Arabidopsis NBS-LRR gene, ADR1, conveys resistance to virulent pathogens. Mol Plant Microbe Interact 16:669–680
Haseloff J (1999) GFP variants for multispectral imaging of living cells. Methods Cell Biol 58:139–151
Hirochika H, Guiderdoni E, An G, Hsing Y, Eun MY, Han C, Upadhyaya N, Ramachandran S, Zhang Q, Pereira A, Sundaresan V, Leung H (2004) Rice mutant resources for gene discovery. Plant Mol Biol 54:325–334
Hsing Y-I, Chern C-G, Fan M-J, Lu P-C, Chen K-T, Lo S-F, Ho S-L, Lee K-W, Wang Y-C, Sun P-K, Ko R, Huang W-L, Chen J-L, Chung C-I, Lin Y-C, Hour A-L, Wang Y-W, Chang Y-C, Tsai M-W, Lin Y-S, Chen Y-C, Chen S, Yen H-M, Li C-P, Wey C-K, Tseng C-S, Lai M-H, Chen L-J, Yu S-M (2007) A rice gene activation/knockout mutant resource for high throughput functional genomics. Plant Mol Biol 63:351–364
Huang S, Cerny RE, Bhat DS, Brown SM (2001) Cloning of an Arabidopsis patatin-like gene, STURDY, by activation T-DNA tagging. Plant Physiol 125:573–584
Ichikawa T, Nakazawa M, Kawashima M, Muto S, Gohda K, Suzuki K, Ishikawa A, Kobayashi H, Yoshizumi T, Tsumoto Y, Tsuhara Y, Iizumi H, Goto Y, Matsui M (2003) Sequence database of 1172 T-DNA insertion sites in Arabidopsis activation-tagging lines that showed phenotypes in T1 generation. Plant J 36:421–429
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Ito T, Meyerowitz EM (2000) Over-expression of a gene encoding a cytochrome P450, CYP78A9, induces large and seedless fruit in Arabidopsis. Plant Cell 12:1541–1550
Ito Y, Eiguchi M, Kurata N (2004) Establishment of an enhancer trap system with Ds and GUS for functional genomics in rice. Mol Genet Genom 271:639–650
Jeon J, Sichul L, Ki-Hong J, Jun S, Jeong D, Lee J, Kim C, Jang S, Lee S, Yang K, Nam J, An K, Han M, Sung R, Choi H, Yu J, Choi J, Cho S, Cha S, Kim S, An G (2000) T-DNA insertional mutagenesis for functional genomics in rice. Plant J 22:561–570
Jeong DH, An S, Kang HG, Moon S, Han JJ, Park S, Lee HS, An K, An G (2002) T-DNA insertional mutagenesis for activation tagging in rice. Plant Physiol 130:1636–1644
Jeong DH, An S, Park S, Kang HG, Park GG, Kim SR, Sim J, Kim YO, Kim MK, Kim SR, Kim J, Shin M, Jung M, An G (2006) Generation of a flanking sequence-tag database for activation-tagging lines in japonica rice. Plant J 45:123–132
Johnson AAT, Hibberd JM, Gay C, Essah PA, Haseloff J, Tester M, Guiderdoni E (2005) Spatial control of transgene expression in rice (Oryza sativa L.) using the GAL4 enhancer trapping system. Plant J 41:779–789
Kakimoto T (1996) CKI1, a histidine kinase homolog implicated in cytokinin signal transduction. Science 274:982–985
Kardailsky I, Shukla VK, Ahn JH, Dagenais N, Christensen SK, Nguyen JT, Chory J, Harrison MJ, Weigel D (1999) Activation tagging of the floral inducer FT. Science 286:1962–1965
Kerk NM, Ceserani T, Tausta SL, Sussex IM, Nelson TM (2003) Laser capture microdissection of cells from plant tissues. Plant Physiol 132:27–35
Kiegle E, Moore C, Haseloff J, Tester M, Knight M (2000) Cell-type specific calcium responses to drought, NaCl, and cold in Arabidopsis root: a role for endodermis and pericycle in stress signal transduction. Plant J 23:267–278
Lee H, Suh S, Park E, Cho E, Ahn JH, Kim S, Lee JS, Kwon YM, Lee I (2000) The AGAMOUS-LIKE 20 MADS domain protein integrates floral inductive pathways in Arabidopsis. Genes Dev 14:2366–2376
Li J, Lease KA, Tax FE, Walker JC (2001) BRS1, a serine carboxypeptidase, regulates BRI1 signaling in Arabidopsis thaliana. Proc Natl Acad Sci USA 98:5916–5921
Li J, Wen J, Lease KA, Doke JT, Tax FE, Walker JC (2002) BAK1, an Arabidopsis LRR receptor-like protein kinase, interacts with BRI1 and modulates brassinosteroid signaling. Cell 110:213–222
Liang D, Wu C, Li C, Xu C, Zhang J, Kilian A, Li X, Zhang Q, Xiong L (2006) Establishment of a patterned GAL4/VP16 transactivation system for discovering gene function in rice. Plant J 46:1059–1072
Marsch-Martinez N, Greco R, Arkel VG, Herrera-Estrella L, Pereira A (2002) Activation tagging using the En-I maize transposon system in Arabidopsis. Plant Physiol 129:1544–1556
Masaki T, Tsukagoshi H, Mitsui N, Nishii T, Hattori T, Morikami A, Nakamura K (2005) Activation tagging of a gene for a protein with novel class of CCT-domain activates expression of a subset of sugar-inducible genes in Arabidopsis thaliana. Plant J 43:142–152
Matsuhara S, Jingu F, Takahashi T, Komeda Y (2000) Heat shock tagging: a simple method for expression and isolation of plant genome DNA flanked by T-DNA insertions. Plant J 22:79–86
Miki D, Itoh R, Shimamoto K (2005) RNA silencing of single and multiple members in a gene family of rice. Plant Physiol 138:1903–1913
Moore I, Samalova M, Kurup S (2006) Transactivated and chemically inducible gene expression in plants. Plant J 45:651–683
Nakazono M, Qiu F, Borsuk LA, Schnable PS (2003) Laser-capture microdissection, a tool for the global analysis of gene expression in specific plant cell types: identification of genes expressed differentially in epidermal cells or vascular tissues of maize. Plant Cell 15:583–596
Nawy T, Lee JY, Colinas J, Wang JY, Thongrod SC, Malamy JE, Birnbaum K, Benfey PN (2005) Transcriptional profile of the Arabidopsis root quiescent center. Plant Cell 17:1908–1925
Neff MM, Nguyen SM, Malancharuvil EJ, Fujioka S, Noguchi T, Seto H, Tsubuki M, Honda T, Takatsuto S, Yoshida S, Chory J (1999) BAS1: a gene regulating brassinosteroid levels and light responsiveness in Arabidopsis. Proc Natl Acad Sci USA 96:15316–15323
Odell JT, Nagy F, Chua N (1985) Identification of DNA sequences required for activity of the cauliflower mosaic virus 35S promoter. Nature 313:810–812
Peng H, Huang H, Yang Y, Zhai Y, Wu J, Huang D, Lu T (2005) Functional analysis of GUS expression patterns and T-DNA integration characteristics in rice enhancer trap lines. Plant Sci 168:1571–1579
Phelps CB, Brand AH (1998) Ectopic gene expression in Drosophila using GAL4 system. Methods 14:367–379
Rørth P (1996) A modular misexpression screen in Drosophila detecting tissue-specific phenotypes. Proc Natl Acad Sci USA 93:12418–12422
Sabatini S, Heidstra R, Wildwater M, Scheres B (2003) SCARECROW is involved in positioning the stem cell niche in the Arabidopsis root meristem. Genes Dev 17:354–358
Schaffer R, Ramsay N, Samach A, Corden S, Putterill J, Carré IA, Coupland G (1998) The late elongated hypocotyl mutation of Arabidopsis disrupts circadian rhythms and the photoperiodic control of flowering. Cell 93:1219–1229
Schwab R, Ossowski S, Riester M, Warthmann N, Weigel D (2006) Highly specific gene silencing by artificial microRNAs in Arabidopsis. Plant Cell 18:1121–1133
Sun J, Niu QW, Tarkowski P, Zheng B, Tarkowska D, Sandberg G, Chua NH, Zuo J (2003) The Arabidopsis AtIPT8/PGA22 gene encodes an isopentenyl transferase that is involved in de novo cytokinin biosynthesis. Plant Physiol 131:167–176
Terada R, Urawa H, Inagaki Y, Tsugane K, Iida S (2002) Efficient gene targeting by homologous recombination in rice. Nat Biotech 20:1030–1034
Tester M, Davenport R (2003) Na+ tolerance and Na+ transport in higher plants. Ann Bot 91:503–527
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
Toba G, Ohsako T, Miyata N, Ohtsuka T, Seong KH, Aigaki T (1999) The gene search system: a method for efficient detection and rapid molecular identification of genes in Drosophila melanogaster. Genetics 151:725–737
13 Activation Tagging Systems in Rice
353
van der Graaff E, Dulk-Ras AD, Hooykaas PJ, Keller B (2000) Activation tagging of the LEAFY PETIOLE gene affects leaf petiole development in Arabidopsis thaliana. Development 127:4971–4980 van der Graaff E, Hooykaas PJ, Keller B (2002) Activation tagging of the two closely linked genes LEP and VAS independently affects vascular cell number. Plant J 32:819–830 Walden, R, Fritze K, Hayashi H, Miklashevichs E, Harling H, Schell J (1994) Activation tagging: a means of isolating genes implicated as playing a role in plant growth and development. Plant Mol Biol 26:1521–1528 Weigel D, Ahn JH, Blázquez MA, Borevitz JO, Christensen SK, Fankhauser C, Ferrandiz C, Kardailsky I, Malancharuvil EJ, Neff MM, Nguyen JT, Sato S, Wang ZY, Xia Y, Dixon RA, Harrison MJ, Lamb CJ, Yanofsky MF, Chory J (2000) Activation Tagging in Arabidopsis. Plant Physiol 122:1003–1013 Wilson K, Long D, Swinburne J, Coupland G (1996) A Dissociation insertion causes a semidominant mutation that increases expression of TINY, an Arabidopsis gene related to APETALA2. Plant Cell 8:659–671 Wu C, Li X, Yuan W, Chen G, Kilian A, Li J, Xu C, Li X, Zhou DX, Wang S, Zhang Q (2003) Development of enhancer trap lines for functional analysis of the rice genome. Plant J 35:418–427 Yang Y, Peng H, Huang H, Wu J, Jia S, Huang D, Lu T (2004) Large-scale production of enhancer trapping lines for rice functional genomics. Plant Sci 167:281–288 Zhang J, Li C, Wu C, Xiong L, Chen G, Zhang Q, Wang S (2006) RMD: a rice mutant database for functional analysis of the rice genome. Nucl Acids Res 34:745–748 Zhao Y, Christensen SK, Fankhauser C, Cashman JR, Cohen JD, Weigel D, Chory J (2001) A role for flavin monooxygenase-like enzymes in auxin biosynthesis. Science 291:306–309 Zuo J, Niu QW, Frugis G, Chua NH (2002) The WUSCHEL gene promotes vegetative-to-embryonic transition in Arabidopsis. Plant J 30:349–359
14 Informatics Resources for Rice Functional Genomics
Baltazar A. Antonio1, C. Robin Buell2, Yukiko Yamazaki3, Immanuel Yap4, Christophe Perin5 and Richard Bruskiewich6

1National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan
2The Institute for Genomic Research, 9712 Medical Center Dr, Rockville MD 20850, USA
3National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
4Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA
5Centre de coopération internationale en recherche agronomique pour le développement, 40/03 Avenue Agropolis, 34398 Montpellier Cedex 5, France
6International Rice Research Institute, DAPO Box 7777, Manila, Philippines

Reviewed by Wm L. Crosby and Richard Cooke
14.1 Introduction
14.2 NIAS Informatics Resources
  14.2.1 INtegrated Rice Genome Explorer
  14.2.2 RGP Annotation Databases
  14.2.3 KOME
  14.2.4 Rice PIPELINE
14.3 TIGR Informatics Resources
14.4 Oryzabase
  14.4.1 Database Contents
  14.4.2 Genetic Stocks
  14.4.3 Comparative Genomics Resources
14.5 Gramene
  14.5.1 Genome Browser
  14.5.2 Maps and Markers
  14.5.3 QTL, Genes, and Proteins
  14.5.4 Ontology
  14.5.5 Database Availability
14.6 CIRAD Informatics Resources
  14.6.1 OryGenesDB
  14.6.2 Oryza Tag Line
  14.6.3 Greenphyl
14.7 IRRI Informatics Resources
  14.7.1 The International Rice Information System
  14.7.2 Current Developments
14.8 Insertion Mutant Databases
  14.8.1 Tos17 Insertion Mutant Database
  14.8.2 Rice Mutant Database
  14.8.3 Rice Ds Tagging Lines
  14.8.4 Taiwan Rice Insertional Mutants Database
  14.8.5 Shanghai T-DNA Insertion Population
  14.8.6 Rice T-DNA Insertion Sequence Database
  14.8.7 Rice FST Database at UC Davis
  14.8.8 CSIRO Rice FST Database and RGMIMS
  14.8.9 RiceGE: Rice Functional Genomic Browser
14.9 Integration of Rice Functional Genomics Information
  14.9.1 High-Speed Networks
  14.9.2 Grid Computing
  14.9.3 Web Integration
14.10 Rice Functional Genomics Network
Acknowledgments
References
14.1 Introduction

Functional genomics research requires an informatics infrastructure that categorizes information on genome sequence, cDNAs, proteins, genetic stocks, and many other important genomic parameters so that it is easily accessible to a wide range of users. In parallel with the sequencing of the rice genome, initiated in 1998, several databases were constructed that facilitated the eventual linkage of all genomic information as the genome sequence emerged. Genome databases developed by the participating members of the International Rice Genome Sequencing Project (IRGSP), particularly in Japan and the United States, provided information on gene and protein sequences, alignments, predicted gene function, map positions, protein families and domains, and so forth. Then, as laboratories around the world embarked on full-scale studies to characterize the function of the almost 40,000 genes predicted in rice, more databases and analysis tools that could expedite functional genomics research became available (Table 14.1).

The large amount of information that has emerged from genome sequencing and from the post-sequencing era poses a new challenge to rice bioinformatics. It has become necessary to organize the various types of information and tools so that users with diverse backgrounds and research interests can easily access, retrieve, and integrate them. The wide application of new technologies for measuring gene expression such as microarray, tiling array, biochemical profiling, MPSS (massively parallel signature sequencing), and SAGE (serial analysis of gene expression) also
Table 14.1. Partial inventory of online public rice/crop/plant informatics databases

A. Rice genomics databases
Database | Description | URL
RGP | Japanese Rice Genome Research Program | http://rgp.dna.affrc.go.jp/E/
IRGSP | International Rice Genome Sequencing Project | http://rgp.dna.affrc.go.jp/E/IRGSP/index.html
RAP-DB | Rice Annotation Project database (IRGSP-assembled pseudomolecules) | http://rapdb.lab.nig.ac.jp/
INE | Integrated Rice Genome Explorer | http://rgp.dna.affrc.go.jp/E/giot/INE.html
RiceGAAS | Rice Genome Automated Annotation System | http://ricegaas.dna.affrc.go.jp/
RAD | Rice Annotation Database | http://golgi.gs.dna.affrc.go.jp/SY-1102/rad/
TIGR Rice | The Institute for Genomic Research (TIGR) rice genome annotation database (TIGR-assembled pseudomolecules) | http://rice.tigr.org/
Oryzabase | Oryza genetics database | http://shigen.lab.nig.ac.jp/rice/oryzabase/
IRIS | International Rice Information System | http://www.iris.irri.org/
IRFGC | International Rice Functional Genomics Consortium | http://www.iris.irri.org:8080/IRFGC/
BGI RISe | Beijing Genomics Institute Rice Genome Database | http://rise.genomics.org.cn/rice/index2.jsp
OryzaSNP | Oryza Single Nucleotide Polymorphism Consortium | http://www.oryzasnp.org/
OMAP | Comparative Genome Maps of Oryza wild relatives | http://mips.gsf.de/proj/plant/jsf/rice/index.jsp
Rice Proteome | Rice Proteome Database | http://gene64.dna.affrc.go.jp/RPD/main_en.html
KOME | Rice full-length cDNA sequence database | http://cdna01.dna.affrc.go.jp/cDNA/
RicePIPELINE | Unification tool for NIAS rice databases | http://cdna01.dna.affrc.go.jp/PIPE/
Gramene | Comparative grass genomics anchored on rice | http://www.gramene.org/
MOsDB | Munich Information Center for Protein Sequence (MIPS) (Oryza sativa) Database | http://mips.gsf.de/proj/plant/jsf/rice/index.jsp

B. Rice mutant lines databases
Database | Description | URL
Tos17 | Tos17 insertion mutant database | http://tos.nias.affrc.go.jp/
OryGenesDB | Rice flanking sequence tags | http://orygenesdb.cirad.fr
Oryza Tag Line | Rice phenotypic and reporter gene expression database | http://urgi.versailles.inra.fr/OryzaTagLine/
Rice Mutant Database | Rice mutant database for T-DNA insertion lines | http://rmd.ncpgr.cn/
Rice Ds Tagging Lines | Rice Ac/Ds mediated gene tagging lines | http://genebank.rda.go.kr/dstag/
TRIM | Taiwan Rice Insertional Mutants | http://trim.sinica.edu.tw/index.php
SHIP | Shanghai T-DNA Insertion Population | http://ship.plantsignal.cn/
RISD | Rice T-DNA Insertion Sequence | http://an6.postech.ac.kr/pfg/index.php
Rice FST Database | Rice insertion lines containing Ds gene trap, Ds element or dSpm | http://sundarlab.ucdavis.edu/rice/blast/blast.html
CSIRO Resources | Ds/T-DNA launch pads and Ds insertion lines and FSTs | http://www.pi.csiro.au/fgrttpub/knowngene.htm
RiceGE | Rice Functional Genomics Browser | http://signal.salk.edu/cgi-bin/RiceGE

C. Gene expression databases
Database | Description | URL
RED | Rice Expression Database | http://red.dna.affrc.go.jp/RED/
RMOS | Rice Microarray Opening Site | http://cdna01.dna.affrc.go.jp/RMOS/
Rice Array Db | NSF Rice Oligonucleotide Array Project | http://www.ricearray.org/
Yale Plant Genomics | Gene expression from tiling path arrays | http://plantgenomics.biology.yale.edu/
Rice MPSS | Rice Massive Parallel Signature Sequencing gene expression database | http://mpss.udel.edu/rice/

D. Other functional genomics related databases
Database | Description | URL
PlexDB | Plant Expression Database | http://www.plexdb.org/
GRIN | Plant germplasm resources information network | http://www.ars-grin.gov/
TAIR | The Arabidopsis Information Resource | http://www.arabidopsis.org/
PLACE db | Plant cis-acting regulatory DNA elements | http://www.dna.affrc.go.jp/PLACE/
PlantCARE | Plant cis-acting regulatory DNA elements | http://bioinformatics.psb.ugent.be/webtools/plantcare/html/
MATDB | MIPS Arabidopsis thaliana database | http://mips.gsf.de/proj/thal/db/
Plant Genomes Central | NCBI Plant Genomes Central: genome projects in progress | http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
EXPASY | Index to other plant specific databases | http://www.expasy.org/links.html
requires divergent computational methods to support such analyses. Significant progress in comparative genomics, particularly in various aspects of genome colinearity among grass species, has made it compelling and indispensable to view rice genomics data in relation to other members of the Poaceae. Further, to provide a unified method for defining terms and organizing data, it has also become necessary to use controlled vocabularies to unambiguously correlate the genetic or biochemical function defined in rice with those of other organisms. This chapter focuses on various bioinformatics tools and resources developed by the major players in rice functional genomics. As the real challenge is in the integration of various functional genomics related data, network integration of rice functional genomics information is described as well.
14.2 NIAS Informatics Resources

The National Institute of Agrobiological Sciences (NIAS) has a large-scale program for rice genome research supported by the Ministry of Agriculture, Forestry, and Fisheries (MAFF) of Japan. This includes various projects on structural and functional characterization of the genome, analysis of agronomically important genes, and development of techniques for their utilization. Several databases have been constructed as part of these projects. They document the progress made from the start of the Rice Genome Research Program (RGP) in 1991 to the completion of genome sequencing in 2004, as well as various efforts to characterize the function of the wide array of genes that comprise the rice genome.

14.2.1 INtegrated Rice Genome Explorer

The informatics effort of the RGP, which led the international rice genome sequencing effort, centers on the database aptly named INE (INtegrated Rice Genome Explorer, http://rgp.dna.affrc.go.jp/giot/INE.html). In Japanese, INE literally means “rice plant” and therefore signifies the core of a large-scale effort to characterize the plant that plays a vital role not only in agriculture but also in Japanese culture and economy. This database was developed primarily to integrate the genetic and physical mapping data with the sequence of the rice genome (Sakata et al. 2000). It also functions as a repository of rice genome sequences from the international sequencing collaboration. At present, the database consists of the genetic map with 3,267 DNA markers (Kurata et al. 1994; Harushima et al. 1998), a yeast artificial chromosome (YAC)-based physical map covering 80% of the genome (Saji et al. 2001), a transcript map of rice with about 6,500 expressed sequence tags (ESTs; Wu et al. 2002), P1-derived artificial chromosome/bacterial artificial chromosome (PAC/BAC) contigs representing the physical map used for genome sequencing, the sequence data of 1,810 PAC and BAC clones (International Rice Genome Sequencing Project 2005), and annotation of each BAC/PAC clone sequence. The genetic map, transcript map, and physical map are therefore integrated with the genome sequence data for each chromosome, allowing users to correlate various types of rice genomics data.

The integrated maps for each chromosome provide a general overview of the genomic information such as the distribution of DNA markers, the position of ordered YAC clones, localization of ESTs in the YAC clones, and the sequence-ready physical map with PAC/BAC contigs (Fig. 14.1). The DNA markers in the genetic map can be traced to the physical map, transcript map, and the genome sequence. Details for DNA markers, YAC clones, ESTs, and PAC/BAC clones are available through Java applet windows. Links are also provided to the curated annotation of all sequenced clones, which shows a detailed analysis of the genome sequence. The annotation map includes detailed analysis of the structural features of the predicted genes as well as the results of homology searches against nonredundant protein and EST databases. A relational database scheme has been implemented to ease access to the database and to support robust searching. The maps can be zoomed in and out, which enables browsing at different levels of detail. A Java-based viewer facilitates rapid display of integrated maps as well as applet windows and contributes to smooth navigation of specific information associated with each data set.

Fig. 14.1. An overview of mapping and sequence information in INE database. As a resource for map-based genomics, the database consists of a genetic map, which serves as a center-point linking the markers to the physical maps and the completed genome sequence of rice.

INE provides a starting point for many functional genomics studies. Map-based cloning of genes associated with complex traits can proceed from identification of markers associated with a trait, followed by chromosome walking; selection of clones such as YAC, PAC, or BAC containing the marker; and analysis of the sequence data. With the completion of the genome sequence, characterization of many agronomically important genes will be the major objective in many plant improvement programs. The strength of this approach depends on a comprehensive database and the availability of resources to implement a viable strategy.
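The clone-selection step of map-based cloning described above can be illustrated with a minimal sketch. It assumes that marker positions and clone extents have already been placed on a common physical coordinate system; the `Clone` type, the clone names, and all coordinates are hypothetical and are not taken from INE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """A hypothetical physical-map clone with its extent in base pairs."""
    name: str
    kind: str   # "YAC", "PAC" or "BAC"
    start: int  # first base covered (inclusive)
    end: int    # last base covered (inclusive)


def clones_containing(position, clones):
    """Return the clones whose extent covers the given marker position."""
    return [c for c in clones if c.start <= position <= c.end]


# Made-up example data; real coordinates would come from the database.
clones = [
    Clone("Y0001", "YAC", 0, 400_000),
    Clone("P0002", "PAC", 150_000, 300_000),
    Clone("B0003", "BAC", 280_000, 420_000),
]

marker_position = 200_000  # hypothetical position of a trait-linked marker
candidates = clones_containing(marker_position, clones)
print([c.name for c in candidates])
```

A linear scan is adequate for a handful of clones; for a whole chromosome an interval index would be a more natural choice.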
14.2.2 RGP Annotation Databases

The RGP developed several annotation databases as part of the genome sequencing effort. Initially, an automated annotation system called RiceGAAS (Rice Genome Automated Annotation System; Sakata et al. 2002) was developed to provide the community with a comprehensive analysis of sequenced rice genomic clones as soon as the data became available in the public domain. The accompanying database (http://ricegaas.dna.affrc.go.jp/) provides up-to-date annotation of rice genome sequence entries, which are collected on a regular basis and subjected to annotation. The database provides the results of homology searches against protein and EST databases; gene prediction using various programs such as GENSCAN, GENSCAN+, and RiceHMM; and analysis of exons, splice sites, repeats, and transfer RNA. The genes and long terminal repeats (LTRs) are predicted with an algorithm that combines multiple gene prediction programs with homology search results. The results of the various analyses used for constructing the most plausible model of a predicted gene are then integrated and visualized in an annotation map via a Web-based graphical viewer with links to the results of all analyses.

A contig-oriented Rice Annotation Database (RAD, http://golgi.gs.dna.affrc.go.jp/SY-1102/rad/) was also developed to complement the genome map information in INE (Ito et al. 2005). The primary objective of RAD is to facilitate efficient management of the sequenced PAC and BAC clones generated by the international sequencing collaboration and to provide a comprehensive database of manually curated annotation. As a relational database, RAD facilitates the storage, query, and visualization of annotation information such as sequence data, predicted genes, and homology analysis of each PAC/BAC clone. This information is managed at the contig or chromosome level to provide an overall view of the structural features of specific regions of the genome. The database also provides results of various analyses such as GC content, splice sites, amino acid usage, gene codon usage, gene length distribution, Gene Ontology (GO), and functional classification based on MIPS criteria.

An annotation database describing the manual curation of all predicted genes in rice (RAP-DB, http://rapdb.lab.nig.ac.jp/), based on the IRGSP-assembled rice pseudomolecules, is described in Chapter 3 on genome annotation. It should be noted that TIGR also independently assembled rice pseudomolecules, which are described in the section that follows.
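Two of the summary statistics listed for RAD, GC content and codon usage, are simple to compute from a sequence. The sketch below is only an illustration of those two quantities and is not RAD's implementation; the function names and the example coding sequence are made up for this purpose.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons, ignoring any incomplete trailing codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Made-up coding sequence, used only to exercise the two functions.
example_cds = "ATGGCCGCCAAGTAA"
print(round(gc_content(example_cds), 3))
print(codon_usage(example_cds).most_common())
```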
14.2.3 KOME

The KOME Database (Knowledge-based Oryza Molecular Biological Encyclopedia, http://cdna01.dna.affrc.go.jp/cDNA/) was constructed as a repository for the rice full-length cDNA clones that have been collected and completely sequenced as part of the Rice Full-length cDNA Project of NIAS (Kikuchi et al. 2003). The starting materials for library construction consisted of about 20 kinds of tissues of japonica rice grown under normal or stressed conditions. A total of 170,000 clones were randomly selected and subjected to 3′ terminal single-pass sequencing. Based on the sequence information, the clones were grouped into 28,000 independent groups and representative clones were completely sequenced. Information on individual full-length cDNA clones can be accessed by BLAST search, accession number, specific domain name of the clones, or keyword. The KOME report page for each clone provides the nucleotide sequence and encoded amino acid sequence, results of the homology search against the public databases, mapping information, pattern of alternative splicing, protein domain information, transmembrane structure, cellular localization, and gene function by GO.

14.2.4 Rice PIPELINE

The Rice PIPELINE (http://cdna01.dna.affrc.go.jp/PIPE/) was developed to integrate all databases associated with rice genome research at the NIAS and to provide a unification tool for plant functional genomics (Yazaki et al. 2004). These databases include KOME for full-length rice cDNAs; INE for genetic map, physical map, and genome sequence; RED for rice gene expression profiles; Tos17 for insertion mutant lines; and PLACE for cis-acting regulatory elements (Fig. 14.2). All data from these databases are dynamically collated to provide the available genomics information for a specific query such as a sequence, a keyword, an accession number, or an ID.

Fig. 14.2. The Rice PIPELINE is a unification tool for all genomics databases of NIAS providing structural and functional information for specific searches.

A Rice PIPELINE query generates three types of information: structural, gene expression, and genome information. The first pathway shows the flow of gene structural information from the query. Users can obtain a KOME report from the KOME database, which includes results of nucleic acid and amino acid analysis, domain search results, and GO classification. In addition, the KOME report provides links to information on cis-element motifs in the 5′ upstream region of the full-length cDNA through the PLACE search, to phenotypes with flanking sequences through a search of the Tos17 Insertion Mutant Database, and to expression profiles from RED. Such information can be useful for in silico analysis of transcription factors and for elucidating gene structure.

The second pathway indicates the flow of gene expression information from the query. Users can obtain expression profiles from RED and EST analysis from the MAFF DNA Bank report and the GenBank report. In addition, as RED provides a link to INE, the genetic and physical map information can also be accessed. This allows the user to characterize the function of the gene based on expression profiling under various physiological conditions.

The third pathway indicates the flow of information from the BLAST results. A BLAST search against the japonica genome sequence can provide genetic map and physical map information from INE. A BLAST search against the indica genome sequence provides a GenBank report. From the information obtained through this pathway, the user can get an overview of the map position, which could be useful for high-resolution genetic mapping, positional cloning, and genetic dissection of quantitative traits. Thus, a single search provides a user with the information needed for structural and functional characterization of the gene.
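The idea of collating records from several databases for a single query can be sketched as a lookup keyed on a shared identifier. This is a minimal illustration under the assumption that every source can be queried by the same ID; the `collate` function, the record fields, and the identifiers `clone_A` and `clone_B` are hypothetical and do not reflect the actual Rice PIPELINE schema.

```python
def collate(query_id, sources):
    """Gather the record for query_id from every source that has one.

    sources maps a database name to a dict of {identifier: record}.
    Sources without a matching record are simply left out of the report.
    """
    report = {}
    for name, records in sources.items():
        if query_id in records:
            report[name] = records[query_id]
    return report


# Hypothetical in-memory stand-ins for three of the NIAS databases.
sources = {
    "KOME": {"clone_A": {"domains": ["hypothetical_domain"]}},
    "RED": {"clone_A": {"expression": "hypothetical profile"}},
    "Tos17": {"clone_B": {"phenotype": "hypothetical phenotype"}},
}

report = collate("clone_A", sources)
for database, record in report.items():
    print(database, record)
```

In a real deployment each source would be a remote database rather than a dictionary, but the shape of the result (one section per contributing database) stays the same.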
14.3 TIGR Informatics Resources

Although the International Rice Genome Sequencing Project has completed a high-quality, finished, map-based sequence of the rice genome (International Rice Genome Sequencing Project 2005), The Institute for Genomic Research (TIGR) was funded to annotate the genome so that not only rice biologists and breeders but also other cereal biologists could make maximal use of the sequence. The primary goal of the TIGR Rice Genome Annotation project is to generate high-quality, uniform structural and functional annotation of the rice genome and make this available to the public. Central to the project is the construction of a set of pseudomolecules that represent the 12 rice chromosomes. The fourth version of the pseudomolecules was released, along with associated annotation, in January 2006. The details of the annotation pipeline were published previously (Yuan et al. 2005), and the reader is referred to that publication for its specifications. In brief, the backbone of the structural annotation pipeline is the ab initio gene finder FGENESH (Salamov and Solovyev 2000) together with the Program to Assemble Spliced Alignments (Haas et al. 2003), which yields gene models that have been updated with transcript evidence. Functional annotation for the gene models is obtained using sequence similarity with known proteins and/or the presence of domains from the Pfam database.

Other types of functional annotation of the gene models provided through this project are viewable through the Rice Genome Browser (http://www.tigr.org/tigr-scripts/osa1_web/gbrowse/rice). The rationale for providing diverse types of annotation is that all forms of functional annotation are valuable to the end user. These additional data types include alignments to other plant sequences, which are essential to the use of the rice genome by cereal biologists. The rice genome is aligned with the TIGR Gene Indices (monocot, dicot, and other plants); rice, wheat, maize, and sorghum genetic markers; and flanking sequence tags from rice insertion lines. Another valuable component of the functional annotation is the provision of links to expression data for the rice gene models, available through a number of large-scale rice gene expression projects.

Central to the TIGR Rice Genome Annotation project is public access to the data. Access is provided primarily through a set of Web pages hosted at TIGR (http://rice.tigr.org), which offer search and display tools for annotation at the structural and functional levels. The primary access point for the data is the Genome Browser (Fig. 14.3; http://www.tigr.org/tigr-scripts/osa1_web/gbrowse/rice). This browser is based on the Generic Genome Browser developed by the Generic Model Organism Database (GMOD) group (Stein et al. 2002). In addition to these access points, the data are available through FTP downloads and a Data Extractor Tool developed at TIGR.
Fig. 14.3. Display of structural and functional annotation available through the TIGR Rice Genome Annotation project. Shown here is a screen capture of the TIGR Rice Genome Browser for two loci on chromosome 12. One locus, LOC_Os12g08740, encodes a putative exonuclease family protein. Shown are alignments with rice transcripts as well as monocot transcripts and the top Arabidopsis protein. Links to expression data available for this locus are shown through the MPSS tags, SAGE tags, and probes available through the NSF Rice Oligo Array project and Affymetrix. The other locus shown is LOC_Os12g08750, which encodes a hypothetical protein as there is no expression or protein evidence for this gene
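The two locus identifiers in Fig. 14.3 (LOC_Os12g08740 and LOC_Os12g08750, both on chromosome 12) suggest a regular naming pattern. The sketch below parses identifiers under the assumption, inferred only from these two examples, that the two digits after `Os` give the chromosome and the five digits after `g` give a serial number; the `Locus` type and `parse_locus` function are illustrative names, not part of any TIGR tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the identifiers shown in Fig. 14.3:
# "LOC_Os" + two-digit chromosome + "g" + five-digit serial number.
LOCUS_PATTERN = re.compile(r"^LOC_Os(\d{2})g(\d{5})$")


class Locus(NamedTuple):
    chromosome: int
    serial: int


def parse_locus(identifier: str) -> Optional[Locus]:
    """Split a locus identifier into chromosome and serial number.

    Returns None when the identifier does not match the assumed pattern.
    """
    match = LOCUS_PATTERN.match(identifier)
    if match is None:
        return None
    return Locus(int(match.group(1)), int(match.group(2)))


for name in ("LOC_Os12g08740", "LOC_Os12g08750", "not_a_locus"):
    print(name, parse_locus(name))
```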
A feature recently added to the TIGR Rice Genome Annotation project is the Community Annotation Tool (http://rice.tigr.org/tdb/e2k1/osa1/ca/rice_ca_info.shtml). The goal of the Community Annotation Tool is to engage the greater research community in annotating rice gene families at the structural and functional levels. This effort is analogous to the gene family annotation effort at The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org/). Continued efforts on the TIGR Rice Genome Annotation Project will involve further enhancements to the quality of the structural and functional annotation of the gene models and making these data even more accessible to the scientific community.
14.4 Oryzabase

Oryzabase is a comprehensive rice database integrating biological data derived from various studies on morphology, physiology, and ecology with molecular genomic information (Kurata and Yamazaki 2006). The database is accessible at http://www.shigen.nig.ac.jp/rice/oryzabase/.

14.4.1 Database Contents

The major components of the database are data derived from efforts to characterize rice biologically in terms of mutants and their genes, phenotypes, developmental features, and other events related to the ontology of rice. This information is then correlated with the genome sequence of rice, thereby providing a direct link to structural and functional rice genomics. The data in Oryzabase are divided into 15 sections: development/anatomy, mutants, trait genes, linkage maps, physical maps, comparative maps, references, basic biological data, DNA sequence, BLAST search, chloroplast and mitochondrion, tools and protocols, strains, stock centers, and wild rice (Fig. 14.4).

Fig. 14.4. An overview of the major features of Oryzabase. Various data derived from biologically characterizing rice in terms of mutants and their genes, phenotypes, developmental features, and so forth are incorporated in the database.

The biological data incorporated in Oryzabase include morphological and gene expression characteristics of rice at different developmental stages and in various mutants. These data are classified into four sections: development/anatomy, trait genes, mutants, and basic biological information. The development/anatomy section (http://www.shigen.nig.ac.jp/rice/oryzabase/development/search.jsp) is the most distinguishing feature of Oryzabase. Detailed descriptions of anatomical characteristics of plant organs such as embryo/endosperm, leaf, root, panicle, spikelet, ovule, anther, and reproductive organs are available here. These organs are characterized at various stages of development based on stage names, tissue sizes, stage-specific events, in situ gene expression information, related mutants, and β-glucuronidase-staining patterns of enhancer trap lines (Ito et al. 2004; Itoh et al. 2005; Kurata et al. 2005). Thus, each organ or developmental stage can be linked to a related biological and molecular event. These data are presented in tabular form with detailed descriptions, photographs, and corresponding links to available resources so that viewers can have an overall view of all information about mutants, gene expression patterns, and references (Fig. 14.4).

The database contains information on 1,698 trait genes and 136 unclassified genes that have been identified from a mutant or natural variant (http://www.shigen.nig.ac.jp/rice/oryzabase/genes/genesTop.jsp). These mutant or natural variant genes are classified into seven organs or characters, namely vegetative organ, reproductive organ, heterochrony, coloration, seed, tolerance/resistance, and quantitative trait loci. Each organ or character is further categorized into 38 subclasses according to their characteristic features. Each entry (trait gene) is provided with a gene symbol, gene name, chromosome (location if identified), mutant class name, and GO and/or trait ontology number.
Baltazar A. Antonio et al.
14.4.2 Genetic Stocks
Oryzabase is also a repository for about 20,000 rice strains including wild rice accessions, cultivars, mutant lines, chromosome substitution lines, recombinant inbred lines, and marker gene lines (http://www.shigen.nig.ac.jp/rice/oryzabase/nbrpStrains/nig.jsp) in conjunction with the National Bioresource Project (NBRP) in Japan. The wild rice strains include 23 species from the AA, BB, CC, BBCC, CCDD, EE, FF, GG, and HHJJ genomes. A core collection of 289 accessions from wild species was chosen and ranked for convenient access. The crossed lines include RILs of four japonica × indica crosses and chromosome substitution lines with japonica backgrounds crossed with other AA genome species. A collection of more than 6,000 mutants induced by N-methyl-N-nitrosourea (MNU), classified into 12 classes of visible phenotypes, is also available.
14.4.3 Comparative Genomics Resources
For rice genomics data, Oryzabase provides access to genome maps, sequences, and comparative mapping resources (http://www.shigen.nig.ac.jp/rice/oryzabase/maps/map.jsp). The four basic linkage maps of rice were integrated using commonly mapped markers: the classical linkage map with 571 phenotypic genes (Nagato and Yoshimura 1998), the integrated linkage map with 83 restriction fragment length polymorphism (RFLP) markers and 40 phenotypic markers (Yoshimura et al. 1997), the recombinant inbred (RI) map with 375 RFLP markers (Tsunematsu et al. 1996), and the high-density Nipponbare/Kasalath linkage map with 2,275 DNA markers (Harushima et al. 1998). A genome viewer showing the physical maps of the 12 rice chromosomes with windows ranging from 250 kb to 1 Mb and from 10 to 100 kb has been added recently (http://www.shigen.nig.ac.jp/rice/oryzabase/genome/chromosomeList.jsp). The genome sequences represent the latest submissions in DDBJ/EMBL/GenBank.
The comparative maps include barley clones and wheat ESTs mapped in the rice genome (http://www.shigen.nig.ac.jp/rice/oryzabase/comparative/comparative.jsp). Oryzabase also aims to establish a comprehensive ontology for all morphological features and trait genes in rice. All trait genes are manually annotated and assigned with GO IDs, in agreement with the central GO database at Gramene (http://www.gramene.org/). A major priority is to develop, curate, and share controlled vocabularies describing various morphological structures of the rice plant at different stages of growth and development in line with the Plant Ontology Consortium
(http://www.plantontology.org/) to establish a widely accepted plant ontology (PO) and plant and trait ontology (PATO) framework for rice. An integrated viewer, O3, will soon be incorporated in the database to correlate the concept of ontology with various biological data. Therefore Oryzabase can be a source of both information and resources that will provide the foundation for rice functional genomics.
14.5 Gramene
Gramene is a comparative mapping database and resource for cereals, with an emphasis on rice. The database may be accessed at http://www.gramene.org. Manual and automatic data curation are combined to correlate information between different grass species about genes and genomes, proteins, maps and markers, quantitative trait loci (QTL), and literature citations. The sequenced rice genome is used as an anchor to facilitate and integrate cross-species comparisons and help users understand characteristics of genes, genome organization, phenotypes, and metabolic pathways. Gramene organizes the different data types into modules: genome browser, maps and markers, QTL, genes, proteins, and ontology. Movement between modules is facilitated by extensive cross-linking, which also helps users understand the interrelationships between the different data types (Fig. 14.5).
Fig. 14.5. Flow diagram showing the relationships of all Gramene modules and datasets. Each module links to other modules as depicted by the arrows. Double-headed arrows indicate that the modules have reciprocal links
14.5.1 Genome Browser
Gramene hosts annotated genome assemblies of japonica rice, Arabidopsis thaliana, and a representation of the physical BAC map of maize. These genomes are displayed using the Ensembl genome browser (Hubbard et al. 2005). The current rice genome assembly (http://www.gramene.org/Oryza_sativa) is based on the Osa1 rice genome annotation database from TIGR (Yuan et al. 2005). The maize genome assembly (http://www.gramene.org/Zea_mays/) is based on an FPC physical map developed by the Arizona Genomics Institute (Gardiner et al. 2004). The data for the Arabidopsis genome browser (http://www.gramene.org/Arabidopsis_thaliana/) were kindly provided by the Nottingham Arabidopsis Stock Centre (http://www.gramene.org/Arabidopsis_thaliana/), which initially imported them from TIGR (Haas et al. 2005). In addition to features mapped by the respective source databases on the rice and maize genomes, we have added further features of interest such as genes, proteins, QTL, transcripts, markers, ESTs, single-nucleotide polymorphisms (SNPs), insertion elements, and repeats. These include features from rice, maize, and other grasses, which facilitates cross-species comparison to the rice and maize genomes. The synteny viewer (http://www.gramene.org/Oryza_sativa/syntenyview?otherspecies=Zea_mays) displays patterns of long-range colinearity between the rice and maize genomes. In addition, gene family relationships among Arabidopsis, rice, and maize determined by the Ensembl compara pipeline allow users to compare known Arabidopsis genes to orthologues on the rice or maize genomes.
14.5.2 Maps and Markers
To display and compare maps, Gramene uses the CMap Comparative Map Viewer (http://www.gramene.org/cmap/) developed in collaboration with the GMOD project (http://www.gmod.org/cmap). CMap represents a map as a linear array of interconnected features.
This array could represent a single linkage group of a genetic map, a single contig of a physical map, or a single pseudomolecule of an assembled sequence. Related maps are grouped into a map set, such as the set of linkage groups produced by a genetic mapping study or the set of chromosomes annotated during a sequencing project. A feature on a map may be a point or an interval. It represents any mapped item such as a marker, clone, gene, or QTL. Correspondences may be created between features either automatically or manually if, for instance, they were mapped using the same marker. In a comparative map
view, correspondences appear as lines that connect features on one map to those on another map. These correspondences allow a user to compare the linear order of features along different maps. For example, a researcher may use the comparative map viewer to investigate colinear regions found to carry genes and QTLs contributing to similar traits in different plant species. CMap currently has 166 map sets with more than 2.4 million features from 22 species, including 103 map sets from rice. To aid researchers in their mapping projects, detailed information about molecular markers is stored in the marker module (http://www.gramene.org/markers/). The markers are classified into different types, such as RFLP, simple sequence repeat (SSR), or EST. A user may search for a marker by name or synonym, marker type, and source species. Available information on a marker depends on its type and includes polymerase chain reaction (PCR) primers and conditions, source germplasm, source sequence, cross references to other databases (e.g., GenBank, MaizeGDB, or GrainGenes), and map and genome positions. The marker module currently has more than 4.5 million markers from almost 200 species of grass.
14.5.3 QTL, Genes, and Proteins
The CMap module displays the position of quantitative trait loci (QTL) as intervals on a map. More information about the QTL may be viewed in the QTL module. The module houses data on more than 10,000 QTL classified into 308 traits from rice, maize, oat, barley, hexaploid and tetraploid wheat, pearl millet, foxtail millet, and wild rice. For convenience in browsing, the traits are grouped into nine major trait categories: abiotic stress, anatomy, biochemical, biotic stress, development, quality, sterility or fertility, vigor, and yield. The traits also form a controlled vocabulary and are mapped to the trait ontology (see later). Gramene currently curates basic QTL information such as trait, symbol, position, and reference.
Future curation efforts will include more detailed data such as allele effects, logarithm of the odds (LOD) scores, and significance thresholds, and will allow submission of raw QTL segregation data. The genes module (http://www.gramene.org/rice_mutant/) is a manually curated resource that provides information about rice and maize genes. It includes descriptions of genes and alleles associated with morphological, developmental, and agronomically important phenotypes, physiological characteristics, biochemical functions, and isozymes. The module displays information about the phenotype, map position, gene sequence, gene products, alleles, and germplasm. Gene records also include associations to trait (TO), plant structure (PO), and growth stage (GRO) ontology terms. The
database currently has data on 1,525 rice genes and 6,676 maize genes; the latter were added in collaboration with MaizeGDB. The protein module (http://www.gramene.org/protein/) contains information about more than 68,000 grass proteins from SwissProt-Trembl. The proteins are annotated with additional information from published reports or in silico experiments, including information about protein domains and families from Pfam (http://www.sanger.ac.uk/Software/Pfam/) and Prosite (http://kr.expasy.org/prosite/), mappings to Interpro (Mulder et al. 2005), transmembrane domains via TMHMM (Krogh et al. 2001), signal peptides via SignalP (Bendtsen et al. 2004), and N-terminal targeting sequences via Predotar (Small et al. 2004). In addition, each protein record displays associations to the GO for biochemical characterization and the plant ontology (PO) for gene expression and phenotype associations.
14.5.4 Ontology
An ontology (or controlled vocabulary) is a catalog of standardized terms, hierarchically structured so that a user may query it at different levels. The use of ontologies provides scientists as well as databases with a standard language for the description of genes, proteins, QTL, and so forth. This simplifies searches across different databases and publications and facilitates analysis of experimental results and interspecies comparisons (Berardini et al. 2004; Clark et al. 2005). Gramene is an active collaborator in the Gene Ontology (Gene Ontology Consortium 2004) and Plant Ontology (The Plant Ontology Consortium 2002) Consortia. Gramene has also developed ontologies in house such as the trait ontology (TO) and environment ontology (EO). The various data modules and ontologies are extensively cross-linked such that, for example, a user browsing a QTL entry can click on the associated trait ontology term to view a list of related genes and proteins as well as other QTL that have been associated with the same TO term.
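To make the idea of querying a hierarchical vocabulary "at different levels" concrete, the following minimal Python sketch stores terms as a parent-to-child hierarchy and returns every item annotated to a term or to any of its descendants. It is an illustration only and not Gramene's implementation: the class, the term labels, and the item names (QTL_1, gene_1, QTL_2) are hypothetical, and real ontologies use accession IDs and richer relationship types.

```python
from collections import defaultdict


class Ontology:
    """Tiny controlled vocabulary: terms linked by parent -> child edges."""

    def __init__(self):
        self.children = defaultdict(set)     # term -> direct child terms
        self.annotations = defaultdict(set)  # term -> annotated items

    def add_term(self, term, parent=None):
        self.children[term]  # register the term even if it has no children
        if parent is not None:
            self.children[parent].add(term)

    def annotate(self, term, item):
        self.annotations[term].add(item)

    def descendants(self, term):
        """All terms below `term` in the hierarchy (depth-first walk)."""
        seen, stack = set(), [term]
        while stack:
            for child in self.children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def query(self, term):
        """Items annotated to `term` or to any more specific term."""
        items = set()
        for t in {term} | self.descendants(term):
            items |= self.annotations[t]
        return items


# Hypothetical terms and annotations, for illustration only.
onto = Ontology()
onto.add_term("abiotic stress")
onto.add_term("drought tolerance", parent="abiotic stress")
onto.add_term("salt tolerance", parent="abiotic stress")
onto.add_term("leaf rolling under drought", parent="drought tolerance")
onto.annotate("drought tolerance", "QTL_1")
onto.annotate("leaf rolling under drought", "gene_1")
onto.annotate("salt tolerance", "QTL_2")

broad = onto.query("abiotic stress")      # general level: all three items
narrow = onto.query("drought tolerance")  # specific level: QTL_1 and gene_1
```

Querying the general term collects annotations made at every more specific level, which is what lets a QTL entry and a gene entry annotated to different child terms appear together in one result.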
14.5.5 Database Availability
The Gramene database may be accessed via the World Wide Web at http://www.gramene.org. Access to the database is free and open to the public. Gramene is an open source project. It uses the MySQL relational database management system, externally developed modules such as CMap and Ensembl, and internally developed and maintained code to dynamically render the data into Web pages. All software code and data sets are freely available for download and local installation (http://www.gramene.org/documentation/gramene_installation.html).
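The map model described in Section 14.5.2 (a map as a linear array of point or interval features, with correspondences between features on different maps) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CMap's actual schema or code: the class names, the rule that features correspond when they share a name, and the marker names and positions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    name: str
    start: float
    stop: float           # equal to start for a point feature
    kind: str = "marker"  # e.g. marker, clone, gene, QTL


@dataclass
class Map:
    name: str
    features: list = field(default_factory=list)


def correspondences(map_a, map_b):
    """Pair features that share a name (assumed here to mean 'same marker')."""
    by_name = {f.name: f for f in map_b.features}
    return [(f, by_name[f.name]) for f in map_a.features if f.name in by_name]


def is_colinear(pairs):
    """True if the shared features occur in the same order on both maps."""
    ordered = sorted(pairs, key=lambda pair: pair[0].start)
    positions_b = [b.start for _, b in ordered]
    return positions_b == sorted(positions_b)


# Hypothetical genetic map (cM) and pseudomolecule (bp) for one chromosome.
genetic = Map("genetic_chr1", [
    Feature("M1", 0.0, 0.0),
    Feature("M2", 12.5, 12.5),
    Feature("qtl_1", 10.0, 20.0, "QTL"),  # interval feature
    Feature("M3", 30.0, 30.0),
])
physical = Map("pseudomolecule_chr1", [
    Feature("M1", 150_000, 150_000),
    Feature("M3", 4_200_000, 4_200_000),
    Feature("M2", 1_900_000, 1_900_000),
])

pairs = correspondences(genetic, physical)
colinear = is_colinear(pairs)
```

The correspondence pairs are what a comparative view would draw as connecting lines; the order check is a crude stand-in for visually inspecting whether those lines cross.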
14.6 CIRAD Informatics Resources
Centre de coopération internationale en recherche agronomique pour le développement (CIRAD) has been engaged in a national and European project to develop resources for rice functional genomics (Sallaud et al. 2004; van Enckevort et al. 2005) by generating a library of 46,000 insertion lines. The insertion lines are currently being characterized for phenotypes, flanking sequences, and reporter gene expression. Moreover, the library is used as a starting point for reverse and forward genetics to decipher genes involved in root development. As a first step, information systems and bioinformatics tools have been constructed to integrate these data with other rice genomics resources developed around the world. Two public databases have been created to store and exploit these resources, namely OryGenesDB (http://orygenesdb.cirad.fr/) (Droc et al. 2006) for flanking sequence tags and Oryza Tag Line (http://urgi.versailles.inra.fr/OryzaTagLine/) for phenotypic and reporter gene expression data. These two databases are not only repositories for CIRAD functional genomics data; they also aim to integrate public data generated by other groups and to provide powerful tools that help rice molecular geneticists obtain the information needed for their functional analyses. Comparative genomics between cereals is becoming a reality with the emergence of rice as a model species and with the fast-growing number of genomics resources available in other species. An A. thaliana/rice comparative genomics project based on phylogenomics concepts (Eisen and Wu 2002) was also initiated to identify rice orthologues of A. thaliana genes. This tool will be extended to other species, including cereals, in order to predict gene function based on data available for rice and A. thaliana.
14.6.1 OryGenesDB
OryGenesDB (http://orygenesdb.cirad.fr/) is a database developed for reverse genetics studies in rice. It stores insertion data generated at CIRAD (Sallaud et al. 2004; van Enckevort et al. 2005) and by other related insertion line projects (for a review see Hirochika et al. 2004). OryGenesDB represents the most populated database (as of February 6, 2006) with more than 60,668 flanking sequence tags (FSTs). OryGenesDB comes with tools specifically designed for reverse genetics applications. The set of interfaces developed for OryGenesDB provides users with several ways to search for insertions in candidate genes. This ensures as exhaustive a search as possible for all insertions and their positions for a given list of candidate genes (Fig. 14.6).
Fig. 14.6. Generic Genome Browser (GBrowse) view of a 30-kb region of chromosome 2. Tos17, T-DNA, and Ac/Ds insertions are represented with colour coding in the flanking sequence tag track. A contextual pop-up opens each time the mouse moves over a feature, and links or feature-specific actions can be activated
The genome navigator functionality was extended based on biologists’ requests to simplify their work and speed up their searches. A facility for mapping raw sequences to the rice pseudomolecules was added (“adding annotation tool”). A multi-FASTA sequence file can be uploaded for an automatic BLASTN search against the whole rice genome to map these sequences on the genome and to look for overlapping features in OryGenesDB. The GBrowse (Generic Genome Browser, GGB) genome navigator (Stein et al. 2002) with contextual pop-ups gives access to additional hyperlinks or action buttons for analysis software (Fig. 14.6). This simple feature greatly increases the power of GGB and helps users find details on a feature just by moving their mouse pointer over it. Another plug-in named All4One was added to allow dumping of one or several genome region(s). This special dump is referred to as an “aggregative track,” which means that by giving a small fragment of the genome (e.g., an array probe), the user gets information not only for the feature that matches the probe, but also all information overlapping with that feature. All the plug-ins are generic and species independent and can therefore be added to any other genome project using GGB.
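Both searching for insertions in a candidate gene and the "aggregative track" idea reduce to interval-overlap queries on genome coordinates. The Python sketch below illustrates that logic under stated assumptions; it is not OryGenesDB or All4One code, and the feature names, coordinates, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int
    end: int
    kind: str  # e.g. gene, Tos17, T-DNA, Ac/Ds, probe

INSERTION_KINDS = {"Tos17", "T-DNA", "Ac/Ds"}


def overlaps(a, b):
    """Closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def insertions_in(gene, features):
    """Names of flanking sequence tags that overlap a candidate gene."""
    return [f.name for f in features
            if f.kind in INSERTION_KINDS and overlaps(gene, f)]


def aggregate(probe, features):
    """For each feature hit by the probe, list everything overlapping that hit."""
    result = {}
    for hit in features:
        if overlaps(probe, hit):
            result[hit.name] = [f.name for f in features
                                if f is not hit and overlaps(hit, f)]
    return result


# Hypothetical coordinates, for illustration only.
gene = Feature("gene_X", "chr2", 10_000, 14_000, "gene")
features = [
    gene,
    Feature("FST_tos17_1", "chr2", 12_050, 12_350, "Tos17"),
    Feature("FST_tdna_1", "chr2", 15_200, 15_500, "T-DNA"),
    Feature("FST_ds_1", "chr3", 12_000, 12_300, "Ac/Ds"),
]
probe = Feature("probe_1", "chr2", 13_900, 13_960, "probe")

hits_in_gene = insertions_in(gene, features)
aggregated = aggregate(probe, features)
```

In this example the probe touches only the gene, yet the aggregated result also reports the Tos17 tag inside that gene, which mirrors the two-step lookup described above. A linear scan is used for clarity; a production system would use an indexed interval structure.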
Several enhancements are being made in OryGenesDB. Newly sequenced insertions (from CIRAD or from other international projects) are being added as they become publicly available. A user management system to provide a personal account for automatic updating of FSTs is being developed. After each update of OryGenesDB, a new search will be automatically performed and an e-mail with all new FSTs identified will be sent to the account owner.
14.6.2 Oryza Tag Line
Forward genetics, which proceeds from a mutant phenotype to the corresponding altered gene, is a powerful strategy to isolate genes involved in a biological process, including agronomic traits. To this end, a library of 46,000 T-DNA insertion lines in the reference japonica cultivar Nipponbare was generated with T-DNA constructs containing either a gusA or a Gal4:gfp enhancer trap system (to allow detection of nearby gene enhancer elements by observation of GUS activity or GFP fluorescence, respectively). The library is being evaluated under field conditions to detect interesting phenotypes and reporter gene expression patterns while the seed multiplication is carried out. The Oryza Tag Line information system (http://urgi.versailles.inra.fr/OryzaTagLine/) was developed to store this information and make it accessible to the research community. The phenotypic information system in this database currently displays agronomic, morphological, and reporter gene expression records for different insertion lines with a user-friendly graphical Web interface. Searches can be performed by keyword, trait, developmental stage, or reporter gene expression pattern (Fig. 14.7). The mutant phenotypic descriptions provided are based on controlled vocabularies specified by the Plant Ontology Consortium (2002) and Gene Ontology Consortium (2006). To date, Oryza Tag Line contains approximately 10,000 entries concerning 266 traits of interest. Oryza Tag Line also provides seed availability information.
New observations of seed morphology or responses to inoculation by the fungal pathogen Magnaporthe grisea from the field evaluation of T-DNA insertion lines will be incorporated in the near future. The next step will be to make Oryza Tag Line interoperable with other popular plant genomic databases such as Gramene (Ware et al. 2002; Jaiswal et al. 2006), Oryzabase (Kurata and Yamazaki 2006), and the NIAS Phenome database. The plan is to enable the sharing of information using the MOBY network, Web service technologies, and the ontology-controlled vocabularies implemented in the Oryza Tag Line database.
Fig. 14.7. Description of a mutant search in Oryza Tag Line. (A) Search interfaces return a list of lines with Line ID, reporter gene expression type (when available), and FST information. Details for a specific Line ID can be obtained with an interactive link shown in B. (C) Pictures of the mutant organ and/or plant are presented. Images can be zoomed to view them at larger size and higher resolution
14.6.3 Greenphyl
Molecular genetic analysis in rice is expected to be accelerated by taking advantage of the fast-growing number of available A. thaliana datasets. The Greenphyl project at CIRAD aims to generate a whole-proteome ortholog prediction between A. thaliana and O. sativa by phylogenomics (Eisen and Wu 2002) and to integrate the genome ortholog predictions into OryGenesDB. This will facilitate the development of a tool to predict orthologous members between any species and extract species-specific sequences/subfamilies from a gene family. This will greatly accelerate functional analysis of rice based on available A. thaliana data and will also help to predict in silico divergence and convergence of genetic pathways between monocotyledons and dicotyledons. An optimal combination of phylogenetic software with some accessory algorithms is being tested on a few gene families to optimize automatic ortholog prediction. A first beta
test version (not yet publicly available) is running in OryGenesDB and takes advantage of the contextual pop-up system to display additional information for a given ortholog, like the family description, the full gene family alignment, the related phylogenetic tree, and the subfamily location. Moreover, a specific orthologous Greenphyl database is also being developed (Fig. 14.8). All the Arabidopsis orthologs are also linked to the TAIR database (Garcia-Hernandez 2002; Rhee et al. 2003). A public version is expected to be released by the end of 2007. This approach will be extended later to other plants of agronomic interest. The database and tools being developed at CIRAD will not only facilitate functional analysis in rice but will also provide a platform for sharing information and seeds to elucidate the function of most rice genes in the next decade.
Fig. 14.8. Integration of A. thaliana-rice ortholog predictions in OryGenesDB. Predicted A. thaliana orthologs of Os03g49990.1 are visible in the ortholog pipeline track. Predicted paralogs of Os03g49990. A contextual pop-up displays additional information such as gene family alignment or phylogenetic tree by clicking, respectively, on the JalView button or ATV hyperlink
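The text does not specify which algorithms the Greenphyl pipeline combines, so the following Python sketch shows one widely used tree-based rule, the species-overlap criterion, purely as an illustration of how orthologs can be read from a gene family tree. An internal node whose two subtrees share no species is treated as a speciation event, and genes on opposite sides are called orthologs; a node whose subtrees share a species is treated as a duplication. The nested-tuple tree encoding, the "species|gene" leaf labels, and the gene names are hypothetical.

```python
from itertools import product


def leaves(node):
    """All leaf labels under a node; a leaf is a 'species|gene' string."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species(leaf):
    return leaf.split("|", 1)[0]


def ortholog_pairs(node):
    """Ortholog pairs in a rooted binary gene tree (species-overlap rule)."""
    if isinstance(node, str):
        return set()
    left, right = node
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = {species(x) for x in left_leaves} & {species(x) for x in right_leaves}
    if not shared:  # no species in common: treat the node as a speciation
        for a, b in product(left_leaves, right_leaves):
            pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical gene family: an ancient duplication at the root, followed by
# a rice-specific duplication in one subfamily.
tree = (
    ("At|A1", "Os|O1"),
    ("At|A2", ("Os|O2a", "Os|O2b")),
)
predicted = ortholog_pairs(tree)
```

Here A1 is paired only with O1, while A2 is paired with both rice copies O2a and O2b; A1 and O2a are not called orthologs because they are separated by the duplication at the root. This is the kind of subfamily-level distinction that simple best-hit sequence comparison can miss.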
14.7 IRRI Informatics Resources
The International Rice Research Institute (IRRI, http://www.irri.org) was established in 1960 by the Rockefeller and Ford Foundations to undertake public rice research and training to enhance rice production and food security
in developing rice-dependent communities in Asia. Since its establishment, IRRI has remained a major centre for rice genetic resources, genomics, and crop improvement research and is home to the world’s largest ex situ collection of rice germplasm, which is maintained in trust for the Food and Agriculture Organization (FAO) of the United Nations.
14.7.1 The International Rice Information System
The focal point for IRRI’s crop information management for genetic resources, genomics, and crop improvement information is the International Rice Information System (IRIS; http://www.iris.irri.org). IRIS currently contains about 2 million germplasm entries with millions of associated data points in hundreds of experimental studies, including many phenotypic observations and a growing number of genotypic measurements (Fig. 14.9).
Fig. 14.9. Sample screen images of some IRIS Web and software tools
IRIS also publishes phenotype information for the institute’s IR64 rice mutant collection (Wu et al. 2005). This latter information is searchable using a query interface permitting the specification of mutant phenotypes using the “observable,” “attribute,” and “value” phenotype and trait ontology paradigm being developed in collaboration with initiatives such as the Plant Ontology Consortium (2002). The core of IRIS is based on the International Crop Information System (ICIS; http://www.icis.cgiar.org) an “open-source” and “open-licensed” generic crop information system under development since the early 1990s by Consultative Group on International Agricultural Research (CGIAR), National Agricultural Research and Extension Systems (NARES), Advanced Research Institutes (ARI), and private sector partners (Fox and Skovmand 1996; Bruskiewich et al. 2003; McLaren et al. 2005). ICIS is designed to fully document germplasm genealogies by assigning unique germplasm identifiers serving as the integration point for germplasm associated meta-data such as passport data and to accurately cross-link germplasm entries with associated experimental observations from evaluations undertaken in the field, greenhouse, or the laboratory. In addition to specifying a common database schema, the ICIS community has collaboratively developed many freely available specialized software analysis tools and interfaces to the system for efficiently documenting, analyzing, and retrieving information about germplasm samples and studies. These include practical tools to manage lists of germplasm for plant crosses, evaluation nurseries, and collections. IRRI scientists have generated a number of high-throughput data sets including genetic maps; transcript, protein, and metabolomic expression experiments; and genotype measurements on a growing set of germplasm. Some of these data sets are available as divisions of IRIS. 
Although many of these data sets remain unpublished within IRIS, there is a long-term commitment to progressively post and integrate all such data online. For example, in 2006 IRRI began leading a significant high-throughput genotyping project of 20 representative rice germplasm accessions using Perlegen technology (see http://www.perlegen.com). The results of this analysis will be posted on an IRIS-linked project site (http://www.orzyasnp.org) in collaboration with NSF-funded partners in the United States.
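The ICIS design principle described above, in which a unique germplasm identifier serves as the integration point for genealogy and for experimental observations, can be illustrated with a minimal Python sketch. This is not the ICIS schema: the field names, the identifiers, the line names, and the observation values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Germplasm:
    gid: int                          # unique germplasm identifier
    name: str
    female_gid: Optional[int] = None  # parents, if the entry came from a cross
    male_gid: Optional[int] = None


def ancestors(gid, registry):
    """All ancestor identifiers reachable through recorded parentage."""
    found, stack = set(), [gid]
    while stack:
        entry = registry[stack.pop()]
        for parent in (entry.female_gid, entry.male_gid):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def observations_for_pedigree(gid, registry, observations):
    """Observations recorded for an entry or for any of its ancestors."""
    wanted = {gid} | ancestors(gid, registry)
    return [obs for obs in observations if obs[0] in wanted]


# Hypothetical entries and measurements, for illustration only.
registry = {g.gid: g for g in [
    Germplasm(1, "LANDRACE_A"),
    Germplasm(2, "LANDRACE_B"),
    Germplasm(3, "CROSS_AB", female_gid=1, male_gid=2),
    Germplasm(4, "LANDRACE_C"),
    Germplasm(5, "CROSS_ABC", female_gid=3, male_gid=4),
    Germplasm(6, "UNRELATED"),
]}
observations = [  # (gid, variable, value)
    (5, "plant_height_cm", 98.0),
    (3, "plant_height_cm", 110.0),
    (6, "plant_height_cm", 125.0),
]

pedigree = ancestors(5, registry)
linked = observations_for_pedigree(5, registry, observations)
```

Because observations refer to entries only by identifier, renaming a line or adding new studies does not disturb the genealogy, and pedigree-wide queries become simple joins on that identifier.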
14.7.2 Current Developments
IRRI is involved in various international research consortia and alliances, in particular, the International Rice Functional Genomics Consortium (IRFGC; http://www.iris.irri.org/IRFGC), the Generation Challenge Programme (GCP; http://www.generationcp.org), and a formal alliance with
CIMMYT that has resulted in the creation of a joint Crop Research Informatics Laboratory spanning both institutes. Such partnerships demand much greater integration across data resources and research outputs, which will require the application of novel state-of-the-art bioinformatics methodology and technologies, developed as a team effort across many institutes. The Generation Challenge Programme in particular has a formal research subprogramme for crop information platform and network development that is accelerating the pace of development of bioinformatics standards and tools for crop research. Such standards include a comprehensive scientific domain model (published online at http://pantheon.generationcp.org) and data networking protocols with associated software, downloadable from a Web site called CropForge (http://www.cropforge.org), which also now hosts the latest releases of ICIS software. As part of its current strategic planning activities, IRRI is hoping to serve as a more comprehensive hub of rice research and training information in the coming decade. That strategy includes plans to develop additional valuable reference resources for the rice functional genomics community and to catalyze the internet integration of all publicly available online resources for such research.
14.8 Insertion Mutant Databases
The most important resources for functional genomics research are the genetic stocks developed by many laboratories all over the world (see Chapters 9 and 10 in this book for details). The accompanying databases provide a starting point for searching all available information concerning a gene of interest. Most of these databases provide information on flanking sequences of the disrupted genes. The search functions of these databases allow users to screen the available mutants by BLAST and provide information on how to obtain the mutant line of interest. Currently, almost 530,000 insertion mutant lines have been generated worldwide, with flanking sequence information for about 110,000 lines and phenotyping data for about 97,000 lines. The major features of representative databases, including access to individual insertion lines, are described here.
14.8.1 Tos17 Insertion Mutant Database
The Rice Tos17 Insertion Mutant Database (http://tos.nias.affrc.go.jp/) was developed in conjunction with the Tos17 Mutant Panel Project of the National Institute of Agrobiological Sciences in Japan (Hirochika 2001).
The database contains data from a total of 50,000 Tos17 insertion mutant lines from japonica rice cultivar Nipponbare. All these mutants have been characterized phenotypically and about 10,000 lines have been analyzed for flanking sequences. Currently, images of the phenotype of mutant lines with flanking sequence data are also available. The database provides a BLAST search function that allows users to search their sequences against the flanking sequences of the Tos17 mutant lines. A typical hit would provide information on the flanking sequence of the disrupted gene and an accompanying image of the phenotype. The Tos17 insertion lines are available through the Rice Genome Resource Center (http://www.rgrc.dna.affrc.go.jp/) of NIAS. The site also provides a clickable genome map showing the position of Tos17 insertions on the IRGSP Build 3.0 and Build 4.0 pseudomolecules. Data from T-DNA and Ds insertions analyzed in other laboratories are also merged in the clickable genome map.
14.8.2 Rice Mutant Database
The Rice Mutant Database (RMD, http://rmd.ncpgr.cn/) is an archive for approximately 129,000 rice T-DNA insertion lines generated by an enhancer trap system using three japonica rice varieties, namely, Zhonghua 11, Zhonghua 15, and Nipponbare (Zhang et al. 2006). It was developed by the National Special Key Program on Rice Functional Genomics of China (Wuhan group) and is maintained by the National Center of Plant Gene Research at Huazhong Agricultural University. The database contains comprehensive information on the phenotypes, reporter-gene expression patterns, flanking sequences of T-DNA insertion sites, and seed availability of the mutant lines. The search function for the flanking sequences of T-DNA insertion sites can be initiated using either a nucleotide sequence or a protein sequence as query.
A keyword search is also provided, using either the entry ID of an enhancer trap line or suggested keywords or phrases associated with the phenotypes. Currently, the database contains approximately 23,000 mutants based on phenotype and an additional 5,511 mutants based on expression pattern. The flanking sequences for about 14,000 entries and photographs for 733 entries are also available.
14.8.3 Rice Ds Tagging Lines
The Korean consortium of Ac/Ds-mediated gene tagging projects maintains a Web site for rice Ds tagged lines (http://genebank.rda.go.kr/dstag). The Web site describes a collection of Ds transposon insertion site sequences in a Ds population developed from a japonica rice cultivar,
Dongjin. A gene trap system was cloned in the Ds element, which carries a bar (Basta resistance) gene as a selection marker. These elements simultaneously disrupt gene function and monitor gene expression. Most insertion lines were generated from tissue culture of seed-derived calli carrying Ac and inactive Ds elements. On average, each line carries two to three copies of the Ds element. Currently, a total of 73,500 lines have been established. Thermal asymmetric interlaced PCR (TAIL-PCR) was primarily employed to amplify the insertion sites. The primer sets for amplification of the 5′ or 3′ ends of Ds and optimal AD (arbitrary degenerate) primer sets were described by Kim et al. (2004). The Web site contains a phenotype search function, a BLAST search of the flanking sequence data with the user’s query sequence, current statistics for the project, and the chromosomal distribution of insertions. Searches can be performed by keyword or by sequence similarity (BLAST). So far, a total of 11,386 Ds insertion sites have been localized on rice chromosomes. The location of each insertion site is indicated by the “bin” (centimorgan) of each chromosome. In the near future, all the FSTs will be mapped on the rice pseudomolecules. Seeds corresponding to individual lines have been deposited at the Yeongnam National Agriculture Station (Milyang, Korea) and are available on request.
14.8.4 Taiwan Rice Insertional Mutants Database
The Taiwan Rice Insertional Mutants (TRIM, http://trim.sinica.edu.tw/) database is part of the effort for rice functional genomics research in Taiwan to characterize gene functions using T-DNA knockout and activation-tagging strategies. Stable insertion lines were generated containing random insertions of T-DNA in japonica rice cv. Tainung 67, a popular cultivar in Taiwan. Currently, the insertion line collection consists of more than 55,000 lines with more than 20,000 flanking sequence tags (FSTs) available.
Database interrogation can be initiated via BLAST, annotation, line number, and phenotype searches. Users may run BLAST with a DNA sequence of interest as the query to identify candidate insertional mutants in this collection. Alternatively, users may use keywords to search for mutants with insertions in genes of interest (e.g., transporter, kinase, or transcription factor). They may also retrieve the flanking sequence by mutant line number once a search has turned up a mutant of interest. The phenotypes of the mutant population were scored by rice breeders, and thus the traits may also be used to search for mutant lines with the target phenotype.
14.8.5 Shanghai T-DNA Insertion Population
The Shanghai T-DNA Insertion Population Database (SHIP, http://ship.plantsignal.cn/) contains a collection of rice mutant lines harboring T-DNA insertions from the Shanghai Institute of Plant Physiology and Ecology. Using the japonica rice variety Zhonghua 11, a high-efficiency Agrobacterium-mediated transformation system was established and more than 50,000 independent transgenic lines were obtained. By TAIL-PCR analysis, around 6,000 flanking sequences were localized in the rice genome, and the annotated mutations of the relevant genes were analyzed. In addition, after successive planting and screening, approximately 8,000 homozygous lines were obtained, allowing for the screening, phenotyping, and analysis of recessive genes. With a high-throughput infrared-based quality assay, whole seeds are being analyzed for starch, amylose, fat, and protein contents and for gel consistency. Currently, approximately 4,000 lines have been analyzed, providing a good resource for seed quality studies. All the information in the SHIP database is regularly updated with new flanking sequences. It is expected that by the end of 2006 approximately 10,000 homozygous lines (15,000 by the end of 2007) will be available for screening, together with the relevant flanking sequences and quality characters, providing a good resource for rice functional genomics studies, especially seed quality studies. Homozygous seeds or T2 generation seeds can be requested through links provided in the database.
14.8.6 Rice T-DNA Insertion Sequence Database
The Rice T-DNA Insertion Sequence Database (RISD, http://an6.postech.ac.kr/pfg/index.php) is a collection of T-DNA insertion and activation tagging lines in japonica rice cv. Dongjin or Hwayoung from the Plant Functional Genomics Laboratory of Pohang University of Science and Technology (POSTECH).
The insertion lines were obtained using pGA2707 (GUS trapping vector), pGA2717 (GUS and GFP trapping vector), and activation tagging vectors pGA2715 and pGA2772 (Jeon et al. 2000; Jeong et al. 2002; An et al. 2005). The flanking sequences of each T-DNA were determined by inverse PCR or TAIL-PCR (An et al. 2003; Jeong et al. 2006). The database can be searched with the gene locus number based on the TIGR rice genome annotation database or location on the chromosome based on version 3.0 of TIGR rice pseudomolecules. The database currently contains 55,107 flanking sequences. Seeds of insertion lines can be requested by sending the request by e-mail to
[email protected] and the MTA form to Dr. Gynheung An (Department of Life Science, Pohang University of Science and Technology, Pohang, 790-784, Republic of Korea).
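A search by chromosome location, as described above for RISD, amounts to an interval query over mapped flanking sequences. The following is a minimal sketch of that idea only; the record layout, line identifiers, and coordinates are invented for illustration and do not reflect the actual RISD schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlankingSequenceTag:
    line_id: str     # hypothetical insertion-line identifier
    chromosome: int  # rice chromosome number (1-12)
    position: int    # insertion coordinate on the pseudomolecule, in bp


def insertions_in_interval(tags, chromosome, start, end):
    """Return the tags on `chromosome` whose position lies in [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [
        tag for tag in tags
        if tag.chromosome == chromosome and start <= tag.position <= end
    ]


# Invented example records, not real database entries.
TAGS = [
    FlankingSequenceTag("LINE-0001", 1, 1_250_000),
    FlankingSequenceTag("LINE-0002", 1, 4_800_500),
    FlankingSequenceTag("LINE-0003", 3, 1_300_000),
]

hits = insertions_in_interval(TAGS, chromosome=1, start=1_000_000, end=2_000_000)
print([tag.line_id for tag in hits])
```

A linear scan is adequate for a sketch; a collection of tens of thousands of flanking sequences would more plausibly be indexed by chromosome and sorted by position.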
14.8.7 Rice FST Database at UC Davis
The Rice FST Database at the Sundaresan Laboratory, University of California Davis, is part of a project to develop efficient transposon tagging strategies for large-scale transposon mutagenesis in rice (http://wwwplb.ucdavis.edu/Labs/sundar/rice/sequence.html). Stable insertion lines containing random insertions of the maize En/Spm or Ac/Ds transposons in japonica rice cv. Nipponbare were generated. A tabular list of the insertion lines with links to the flanking sequence homology can be accessed directly through the Web site. In addition, the flanking sequences of the insertion lines can be downloaded as a PDF file. The insertion lines are classified as rice gene trap insertion lines (designated with the prefix RGT), which contain a Ds gene trap element carrying a promoterless gus reporter gene with the GPA1 intron and triple splice acceptor sequences upstream of the ATG codon; rice Ds insertion lines (designated with the prefix RDs); and rice dSpm insertion lines (designated with the prefix RdSpm), which harbor the dSpm element. The database also provides a BLAST search and links on how to obtain seeds of the insertion lines. As of March 2006, a total of 10,308 lines had been generated and 7,078 lines had been analyzed for flanking sequence at the site of insertion. Seeds are currently available for distribution within the United States only (http://sundarlab.ucdavis.edu/rice/seedorder/instructions.htm).
14.8.8 CSIRO Rice FST Database and RGMIMS
The CSIRO Rice Functional Genomics Project has generated several specialized gene tagging constructs (based on Ac/Ds), Ds/T-DNA launch pad lines (suited for high-throughput chromosomal region-directed Ds insertional mutagenesis), and Ds insertion lines (http://www.pi.csiro.au/fgrttpub/). A BLAST-searchable FST database has also been set up (http://www.pi.csiro.au/fgrttpub/blast_csn.htm), allowing users to search for homology of their sequence against the CSIRO FST collection.
Information on lines with tags in known gene sequences can be accessed at http://www.pi.csiro.au/fgrttpub/knowngene.htm. Phenotypic information can be viewed at http://www.pi.csiro.au/fgrttpub/phenotype.htm via a user-friendly query form, which is part of the phenotyping module (one of six modules) of the Rice Gene Machine Information Management System (RGMIMS) being developed by CSIRO (described below). In addition, CSIRO FST annotation and Ds/T-DNA launch pad files can be downloaded from this site in a form that can be uploaded to other genome browsers such as OryGenesDB.
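The text does not specify the format of the downloadable annotation files. As an illustration of how mapped FSTs can be exchanged between genome browsers, the sketch below writes them as GFF3, a widely used tab-delimited annotation format with nine columns per feature. The choice of GFF3, the `insertion_site` feature type, the source label, and the example record are all assumptions made here, not a description of the CSIRO files.

```python
def fst_to_gff3(records, source="FST_sketch"):
    """Serialize mapped FST records as GFF3 text (nine tab-separated columns)."""
    lines = ["##gff-version 3"]
    for rec in records:
        attributes = f"ID={rec['line_id']};Name={rec['line_id']}"
        columns = [
            rec["seqid"],        # 1: sequence (chromosome) name
            source,              # 2: source label (illustrative)
            "insertion_site",    # 3: feature type (assumed for this sketch)
            str(rec["start"]),   # 4: start, 1-based
            str(rec["end"]),     # 5: end, inclusive
            ".",                 # 6: score (not used)
            rec["strand"],       # 7: strand, "+" or "-"
            ".",                 # 8: phase (only meaningful for CDS features)
            attributes,          # 9: attribute list
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


# Invented record for illustration only.
example = [
    {"line_id": "LINE-0001", "seqid": "Chr1", "start": 1250000, "end": 1250350, "strand": "+"},
]
gff_text = fst_to_gff3(example)
print(gff_text, end="")
```

Any browser-specific requirements, such as particular attribute keys or sequence names, would need to be checked against the target browser's own documentation.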
RGMIMS
A robust laboratory information management system is critical for handling the massive amounts of data generated by the large-scale insertional mutagenesis projects mentioned in the preceding text. CSIRO has developed this Web-based system to facilitate the management (storage, active updating, and querying) and integration of data on the rice mutant collection (in terms of seeds, plants, transformations, and phenotypes using relevant ontologies) in a centralized information system. The system is integrated with technologies such as barcoding and hand-held devices. RGMIMS is being developed in a modular form, with modules for plant management, transformation management, seed management, FST management, phenotype management, and ad hoc querying (L. Henry, N.M. Upadhyaya et al., unpublished). Although primarily developed for the rice insertional mutagenesis project at CSIRO (http://www.pi.csiro.au/fgrttpub/) and collaborating laboratories, the system has the potential to be used as a general laboratory-field information management system.
14.8.9 RiceGE: Rice Functional Genomic Browser
RiceGE (http://signal.salk.edu/cgi-bin/RiceGE) was established by the Salk Institute Genomic Analysis Laboratory (SIGnAL) and is one of the most comprehensive databases and browsers for rice functional genomics (Fig. 14.10). The database integrates the information on insertion lines generated by the groups and laboratories described in the preceding text. With RiceGE, users can browse the rice genome by gene name, cDNA name, insertion name, or chromosome region. Its most notable feature is that it relates each gene to the insertions, cDNAs, and other data mapped to it, a capability not offered by other browsers. The genome browser incorporates the IRGSP genome sequence assembled by TIGR, corresponding to pseudomolecules/annotation version 4. The mapped T-DNA/Ds flanking sequences were derived from the institutional databases described in the preceding text.
The database also integrates rice full-length cDNAs from the KOME database and community cDNAs, as well as IRGSP BACs/PACs, rice ESTs, markers, and long SAGE tags. Homologies to wheat, maize, barley, and Brassica cDNAs as well as Arabidopsis coding proteins are also indicated. The Web site also provides a convenient toolkit for rice functional genomics (http://signal.salk.edu/riceisects. 4.html). The SIGnAL iSECT toolkit for rice genome includes several modules that facilitate designing genomic primers for verifying rice T-DNA insertions; retrieving or fetching sequences of genes or T-DNAs from rice genome; comparing two sets of data in their intersection, difference, union, and so forth; comparing the similarity of two sequences in FASTA format; and designing the best pair of primers for submitted sequences.
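The set comparisons mentioned above (intersection, difference, and union) can be illustrated with a few lines of Python. This is only a sketch of the underlying operations, not the iSECT implementation; the gene identifiers and the two collections are hypothetical.

```python
# Hypothetical sets of genes tagged in two insertion-line collections.
tagged_in_collection_a = {"gene01", "gene02", "gene03"}
tagged_in_collection_b = {"gene02", "gene03", "gene04"}

# Genes with an insertion in both collections (intersection).
shared = tagged_in_collection_a & tagged_in_collection_b

# Genes tagged only in collection A (difference).
only_in_a = tagged_in_collection_a - tagged_in_collection_b

# Genes tagged in at least one collection (union).
tagged_anywhere = tagged_in_collection_a | tagged_in_collection_b

print(sorted(shared))
print(sorted(only_in_a))
print(sorted(tagged_anywhere))
```

The intersection is useful for finding genes with independent mutant alleles in different collections, while the difference shows which genes a single resource covers uniquely.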
Fig. 14.10. The genome browser of RiceGE showing the mapped T-DNA/Ds flanking sequences from various insertional mutagenesis projects all over the world. The genome sequence corresponds to the TIGR V4 (2) pseudomolecules/annotation. (Screenshot from the Salk Institute Genomic Analysis Laboratory Web site at URL http://signal.salk.edu/cgi-bin/RiceGE)
14.9 Integration of Rice Functional Genomics Information
Integration of rice functional genomics information will be greatly facilitated by recent advances in protocols for high-speed communication as well as in the control and management of high-speed networks. With the goal of providing timely dissemination of information, some feasible approaches are described here.
14.9.1 High-Speed Networks
Bioinformatics is significantly empowered by access to high-bandwidth Internet connections. A number of next-generation research networks have
arisen worldwide to benefit the rice functional genomics research community, including Internet2 in North America and the Asia Pacific Advanced Network (APAN) in Asia (http://www.apan.net). Such networks will continue to greatly facilitate the transfer of large research data sets between collaborators, in particular within the International Rice Genome Sequencing Project (IRGSP) and the International Rice Functional Genomics Consortium (IRFGC; http://www.iris.irri.org/IRFGC).
14.9.2 Grid Computing
Computationally intensive operations on very large bioinformatics data sets can overwhelm even the most well-equipped personal computer. Timely production of research results requires access to high-performance computers. Fortunately, cost-effective computer configurations are now available in which multiple high-performance computers are interconnected as a pool of computing resources locally sharing high-capacity file storage. Many university, government, and public research institutes involved in rice functional genomics research have invested in such high-performance computing facilities, which find increasing application in analyzing large structural and functional genomics crop data sets across a global Internet-connected “grid.” Such computational research resources should become more commonly accessible to global crop researchers in the future.
14.9.3 Web Integration
A principal limitation of most online databases is their dependency on Hypertext Markup Language (HTML) formatted Web pages as the primary user interfaces for data forms and display, interfaces that depend on human interaction with standard Web browsers. Often, such interfaces lack the flexibility and integrative power that might otherwise be found in more sophisticated analytical software. Also, users are often obliged to do their own integration of data across multiple, disjoint Web sites that do not share common formats and semantics.
The resulting integration protocols tend to be tedious and time consuming to implement. Finally, HTML pages are somewhat refractory to automatic parsing, given that HTML is primarily a formatting specification, not a semantic encoding framework for content (except inadvertently, in the eyes of the human beholders of Web pages). Technologies such as semantic Web languages and Web services protocols are now being explored as a means of overcoming such limitations by creating frameworks for “computer program-friendly Web surfing” using semantic encoding eXtensible Markup Language (XML) based languages,
such that more powerful client software than Web browsers can be designed, implemented, and deployed on the biologist’s desktop, and clients can interact more autonomously with a distributed network of information resources. One such protocol is BioCASE (http://www.biocase.org), along with its likely successor, Tapir (http://ww3.bgbm.org/protocolwiki/FrontPage). Another notable effort is the BioMOBY project (http://www.biomoby.org), which is striving to apply biological semantics to data and service specifications in a formal manner so that bioinformatics data sources and computational services can be integrated into complex workflows that are managed and visualized by sophisticated clients, such as the Taverna workflow tool (http://taverna.sourceforge.net/). Essential to the effective application of such technology is the development of common community standards for encoding scientific domain semantics. To this end, a significant number of ontology development projects have arisen in the international scientific community over the past decade, including the Gene Ontology Consortium (http://www.geneontology.org) and the Plant Ontology Consortium (http://www.plantontology.org). In addition to ontology development, specific scientific communities are striving to develop sensible domain models to serve as common semantic frameworks for data encoding and transmission. Some examples of such efforts pertinent to the rice functional genomics community include the Functional Genomics Experiment object models (FuGE; http://fuge.sourceforge.net) and the activities of the Generation Challenge Programme (GCP; http://www.generationcp.org/model).
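The contrast between formatting markup and semantic encoding can be made concrete with a small example. In the sketch below, a mutant-line record is encoded in XML with element names that state what each value means, so a program can extract fields by name instead of guessing from page layout. The element names, attribute names, identifiers, and values are invented for illustration and do not correspond to BioCASE, Tapir, BioMOBY, or any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantically tagged record (not a real schema or real data).
RECORD = """<insertionLine id="LINE-0001">
  <cultivar>Dongjin</cultivar>
  <insertion chromosome="1" position="1250000"/>
  <phenotype ontologyTerm="TERM:0001">dwarf</phenotype>
</insertionLine>"""


def parse_record(xml_text):
    """Extract named fields from one insertion-line record."""
    root = ET.fromstring(xml_text)
    insertion = root.find("insertion")
    phenotype = root.find("phenotype")
    return {
        "line_id": root.get("id"),
        "cultivar": root.findtext("cultivar"),
        "chromosome": int(insertion.get("chromosome")),
        "position": int(insertion.get("position")),
        "phenotype": phenotype.text,
        "ontology_term": phenotype.get("ontologyTerm"),
    }


print(parse_record(RECORD))
```

The same values embedded in an HTML table would carry no machine-readable indication of which cell holds the chromosome and which the phenotype, which is the limitation the semantic approaches described above aim to remove.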
14.10 Rice Functional Genomics Network
Integration of functional genomics data is a central concern of the major databases developed by different groups. However, beyond the integration of different types of information, a global interactive network will be indispensable. Under the auspices of the International Rice Functional Genomics Consortium (IRFGC; http://www.iris.irri.org/IRFGC) and the GCP, efforts are underway to build a pilot integrated rice functional genomics network specifically using MOBY technology. The initial targets for integration are rice genomics resources deployed at partner sites across the GCP. The vision is to create a “one stop shop” portal technology enabling users to query for rice structural and functional genomics data using ontologies. A similar initiative led by the National Center for Genome Resources and funded by the NSF is the Virtual Plant Information Network (VPIN; http://vpin.ncgr.org). VPIN will use MOBY to integrate data across NSF-funded databases. Both GCP and VPIN project scientists are developing a
platform of management, query, and display tools for distributed plant/crop information, tools that will become increasingly common on the rice researcher’s desk over the next few years.
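One reason ontologies help with such queries is that a search on a general term can automatically include records annotated with more specific terms. The sketch below shows this expansion over a toy is_a hierarchy. The term identifiers, the hierarchy, and the annotated lines are hypothetical and are not taken from the Gene Ontology, the Plant Ontology, or any GCP or VPIN resource.

```python
from collections import defaultdict

# Hypothetical is_a relations: child term -> list of parent terms.
IS_A = {
    "X:leaf": ["X:organ"],
    "X:root": ["X:organ"],
    "X:flag_leaf": ["X:leaf"],
}

# Hypothetical annotations: mutant line -> ontology terms describing its phenotype.
ANNOTATIONS = {
    "LINE-A": ["X:flag_leaf"],
    "LINE-B": ["X:leaf"],
    "LINE-C": ["X:root"],
}


def term_with_descendants(term, is_a):
    """Return `term` together with every term below it in the is_a hierarchy."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def lines_for_term(term, annotations, is_a):
    """Return lines annotated with `term` or any of its descendant terms."""
    wanted = term_with_descendants(term, is_a)
    return sorted(line for line, terms in annotations.items() if wanted & set(terms))


print(lines_for_term("X:leaf", ANNOTATIONS, IS_A))
```

A query for the leaf term therefore also returns the line annotated only with the more specific flag leaf term, something a plain keyword match on the term name would miss.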
Acknowledgments
We acknowledge the contribution of the following people in charge of various databases: Dr. Akio Miyao (Tos17 Database); Dr. Junshi Yazaki (RicePipeline); Dr. Shoshi Kikuchi (KOME); Dr. Yue-ie Hsing (TRIM); Dr. Hong-Wei Xue (SHIP); Drs. Eun Moo Young and Changdeok Han (Korean Rice Ds Tagging Lines); Dr. Nori Kurata (Oryzabase); Drs. Thomas Metz, Martin Senger, and Graham McLaren (IRIS); Mr. Gaetan Droc (OryGenesDB); Mr. Pierre Larmande (Oryza Tag Line); Mr. Matthieu Conte (GreenPhyl); Miss Leakha Henry and Dr. Narayana Upadhyaya (CSIRO); and Drs. Lincoln Stein, Susan McCouch, Doreen Ware, and Pankaj Jaiswal (Gramene). Information on other databases was derived from their respective Web sites. We also thank Dr. Richard Cooke (UMR CNRS-IRD-Université de Perpignan, France) and Dr. Wm L. Crosby (University of Windsor, Canada) for valuable comments and suggestions.
References
An S, Park S, Jeong DH, Lee DY, Kang HG, Yu JH, Hur J, Kim SR, Kim YH, Lee M, Han S, Kim SJ, Yang J, Kim E, Wi SJ, Chung HS, Hong JP, Choe V, Lee HK, Choi JH, Nam J, Kim SR, Park PB, Park KY, Kim WT, Choe S, Lee CB, An G (2003) Generation and analysis of end-sequence database for T-DNA tagging lines in rice. Plant Physiol 133:2040–2047
An G, Lee S, Kim SH, Kim SR (2005) Molecular genetics using T-DNA in rice. Plant Cell Physiol 14:14–22
Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340:783–795
Berardini TZ, Mundodi S, Reiser L, Huala E, Garcia-Hernandez M, Zhang P, Mueller LA, Yoon J, Doyle A, Lander G, Moseyko N, Yoo D, Xu I, Zoeckler B, Montoya M, Miller N, Weems D, Rhee SY (2004) Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol 135:745–755
Bruskiewich R, Cosico A, Eusebio W, Portugal A, Ramos LR, Reyes T, Sallan MAB, Ulat VJM, Wang X, McNally KL, Sackville Hamilton R, McLaren CR (2003) Linking genotype to phenotype: The International Rice Information System (IRIS). Bioinformatics 19(Suppl.1):i63–i65
Clark JI, Brooksbank C, Lomax J (2005) It’s all GO for plant scientists. Plant Physiol 138:1268–1279
Droc G, Ruiz M, Larmande P, Pereira A, Piffanelli P, Morel JB, Dievart A, Courtois B, Guiderdoni E, Perin C (2006) OryGenesDB: a database for rice reverse genetics. Nucl Acids Res 34:D736–D740
Eisen JA, Wu M (2002) Phylogenetic analysis and gene functional predictions: phylogenomics in action. Theor Popul Biol 61:481–487
Fox PN, Skovmand B (1996) The International Crop Information System (ICIS): Connects Genebank to breeder to farmer’s field. In Cooper M, Hammer GL (eds) Plant Adaptation and Crop Improvement, CAB International, pp 317–326
Garcia-Hernandez M, Berardini TZ, Chen G, Crist D, Doyle A, Huala E, Knee E, Lambrecht M, Miller N, Mueller LA, Mundodi S, Reiser L, Rhee SY, Scholl R, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P (2002) TAIR: a resource for integrated Arabidopsis data. Funct Integr Genom 2:239–253
Gardiner J, Schroeder S, Polacco ML, Sanchez-Villeda H, Fang Z, Morgante M, Landewe T, Fengler K, Useche F, Hanafey M, Tingey S, Chou H, Wing R, Soderlund C, Coe EH (2004) Anchoring 9,371 maize expressed sequence tagged unigenes to the bacterial artificial chromosome contig map by two-dimensional overgo hybridization. Plant Physiol 134:1317–1326
Gene Ontology Consortium (2004) The Gene Ontology (GO) database and informatics resource. Nucl Acids Res 32:D258–D261
Gene Ontology Consortium (2006) The Gene Ontology (GO) project in 2006. Nucl Acids Res 34:D322–D326
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucl Acids Res 31:5654–5666
Haas BJ, Wortman JR, Ronning CM, Hannick LI, Smith RK, Maiti R, Chan AP, Yu C, Farzad M, Wu D, White O, Town CD (2005) Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol 3:7
Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494
Hirochika H (2001) Contribution of the Tos17 retrotransposon to rice functional genomics. Curr Opin Plant Biol 4:118–122
Hirochika H, Guiderdoni E, An G, Hsing YI, Eun MY, Han CD, Upadhyaya N, Ramachandran S, Zhang Q, Pereira A, Sundaresan V, Leung H (2004) Rice mutant resources for gene discovery. Plant Mol Biol 54:325–334
Hubbard T, Andrews D, Caccamo M, Cameron G, Chen Y, Clamp M, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Gilbert J, Hammond M, Herrero J, Hotz H, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Kokocinski F, London D, Longden I, McVicker G, Melsopp C, Meidl P, Potter S, Proctor G, Rae M, Rios D, Schuster M, Searle S, Severin J, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Birney E (2005) Ensembl 2005. Nucl Acids Res 33:D447–D453
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Ito Y, Eguchi M, Kurata N (2004) Establishment of an enhancer trap system with Ds and GUS for functional genomics in rice. Mol Genet Genomics 271:639–650
Ito Y, Arikawa K, Antonio BA, Ohta I, Naito S, Mukai Y, Shimano A, Masukawa M, Shibata M, Yamamoto M, Ito Y, Yokoyama J, Sakai Y, Sakata K, Nagamura Y, Namiki N, Matsumoto T, Higo K, Sasaki T (2005) Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics. Nucl Acids Res 33:D651–D655
Itoh JI, Nonomura KI, Ikeda K, Yamaki S, Inukai Y, Yamagishi H, Kitano H, Nagato Y (2005) Rice plant development: from zygote to spikelet. Plant Cell Physiol 46:48–62
Jaiswal P, Ni J, Yap I, Ware D, Spooner W, Youens-Clark K, Ren L, Liang C, Zhao W, Ratnapu K, Faga B, Canaran P, Fogleman M, Hebbard C, Avraham S, Schmidt S, Casstevens TM, Buckler ES, Stein L, McCouch SR (2006) Gramene: a bird’s eye view of cereal genomes. Nucl Acids Res 34:D717–D723
Jeon JS, Jang S, Lee S, Nam J, Kim C, Lee SH, Chung YY, Kim SR, Lee YH, Cho YG, An G (2000) leafy hull sterile1 is a homeotic mutation in a rice MADS box gene affecting rice flower development. Plant Cell 12:871–884
Jeong DH, An S, Kang HG, Moon S, Han JJ, Park S, Lee HS, An K, An G (2002) T-DNA insertional mutagenesis for activation tagging in rice. Plant Physiol 130:1636–1644
Jeong DH, An S, Park S, Kang HG, Park GG, Kim SR, Sim J, Kim YO, Kim MK, Kim SR, Kim J, Shin M, Jung M, An G (2006) Generation of a flanking sequence-tag database for activation-tagging lines in japonica rice. Plant J 45:123–132
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379
Kim CM, Piao HL, Park SJ, Chon NS, Je BI, Sun BY, Park SH, Park JY, Lee EJ, Kim MJ, Chung WS, Lee KH, Lee YS, Lee JJ, Won YJ, Lee GH, Nam MH, Cha YS, Yun DW, Eun MY, Han CD (2004) Rapid, large-scale generation of Ds transposant lines and analysis of the Ds insertion sites in rice. Plant J 39:252–263
Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305:567–580
Kurata N, Yamazaki Y (2006) Oryzabase: an integrated biological and genome information database for rice. Plant Physiol 140:12–17
Kurata N, Nagamura Y, Yamamoto K, Harushima Y, Sue N, Wu J, Antonio BA, Shomura A, Shimizu T, Lin SY, Inoue T, Fukuda A, Shimano T, Kuboki Y, Toyama T, Miyamoto Y, Kirihara T, Hayasaka K, Miyao A, Monna L, Zhong HS, Tamura Y, Wang ZX, Momma T, Umehara Y, Yano M, Sasaki T, Minobe Y (1994) A 300 kilobase interval genetic map of rice including 883 expressed sequences. Nat Genet 8:365–372
Kurata N, Miyoshi K, Nonomura K, Yamazaki Y, Ito Y (2005) Rice mutants and genes related to organ development, morphogenesis and physiological traits. Plant Cell Physiol 46:48–62
Lee S, Kim J, Han J-J, Han MJ, An G (2004) Functional analyses of the flowering time gene OsMADS50, the putative SUPPRESSOR OF OVEREXPRESSION OF CO 1/AGAMOUS-LIKE 20 (SOC1/AGL20) ortholog in rice. Plant J 38:754–764
McLaren CG, Bruskiewich RM, Portugal AM, Cosico AB (2005) The International Rice Information System (IRIS): a platform for meta-analysis of rice crop data. Plant Physiol 139:637–642
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJA, Silventoinen V, Studholme DJ, Vaughan R, Wu CH (2005) InterPro, progress and status in 2005. Nucl Acids Res 33:D201–D205
Nagato Y, Yoshimura A (1998) Report on the committee on gene symbolization, nomenclature and linkage map. Rice Genet Newsl 15:13–74
Plant Ontology Consortium (2002) The Plant Ontology Consortium and plant ontologies. Comp Funct Genom 3:137–142
Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucl Acids Res 31:224–228
Saji S, Umehara Y, Antonio BA, Yamane H, Tanoue H, Baba, Aoki H, Ishige N, Wu J, Koike K, Matsumoto T, Sasaki T (2001) A physical map with yeast artificial chromosome (YAC) clones covering 63% of the 12 rice chromosomes. Genome 44:32–37
Sakata K, Antonio BA, Mukai Y, Nagasaki H, Sakai Y, Makino K, Sasaki T (2000) INE: a rice genome database with an integrated map view. Nucl Acids Res 28:97–102
Sakata K, Nagamura Y, Numa H, Antonio BA, Nagasaki H, Idonuma A, Watanabe W, Shimizu Y, Horiuchi I, Matsumoto T, Sasaki T, Higo K (2002) RiceGAAS: an automated annotation system and database for rice genome sequence. Nucl Acids Res 30:98–102
Salamov AA, Solovyev VV (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10:516–522
Sallaud C, Gay C, Larmande P, Bes M, Piffanelli P, Piegu B, Droc G, Regad F, Bourgeois E, Meynard D, Perin C, Sabau X, Ghesquiere A, Glaszmann JC, Delseny M, Guiderdoni E (2004) High throughput T-DNA insertion mutagenesis in rice: a first step towards in silico reverse genetics. Plant J 39:450–464
Sasaki A, Itoh H, Gomi K, Ueguchi-Tanaka M, Ishiyama K, Kobayashi M, Jeong D-H, An G, Kitano H, Ashikari M, Matsuoka M (2003) Accumulation of phosphorylated repressor for gibberellin signaling in an F-box mutant. Science 299:1896–1898
Small I, Peeters N, Legeai F, Lurin C (2004) Predotar: a tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics 4:1581–1590
Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S (2002) The generic genome browser: a building block for a model organism system database. Genome Res 12:1599–1610
Tsunematsu H, Yoshimura A, Harushima Y, Nagamura Y, Kurata N, Yano M, Sasaki T, Iwata N (1996) RFLP framework map using recombinant inbred lines in rice. Breed Sci 46:279–284
van Enckevort LJ, Droc G, Piffanelli P, Greco R, Gagneur C, Weber C, Gonzalez VM, Cabot P, Fornara F, Berri S, Miro B, Lan P, Rafel M, Capell T, Puigdomenech P, Ouwerkerk PB, Meijer AH, Pè E, Colombo L, Christou P, Guiderdoni E, Pereira A (2005) EU-OSTID: a collection of transposon insertional mutants for functional genomics in rice. Plant Mol Biol 59:99–110
Ware DH, Jaiswal P, Ni J, Yap IV, Pan X, Clark KY, Teytelman L, Schmidt SC, Zhao W, Chang K, Cartinhour S, Stein LD, McCouch SR (2002) Gramene, a tool for grass genomics. Plant Physiol 130:1606–1613
Wu J, Maehara T, Shimokawa T, Yamamoto S, Harada C, Takazaki Y, Ono N, Mukai Y, Koike K, Yazaki J, Fujii F, Shomura A, Ando T, Kono I, Waki K, Yamamoto K, Yano M, Matsumoto T, Sasaki T (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14:525–535
Wu J, Wu C, Lei C, Baraoidan M, Boredos A, Madamba RS, Ramos-Pamplona M, Mauleon R, Portugal A, Ulat V, Bruskiewich R, Wang GL, Leach JE, Khush G, Leung H (2005) Chemical- and irradiation-induced mutants of indica rice IR64 for forward and reverse genetics. Plant Mol Biol 59:85–97
Yazaki J, Kojima K, Suzuki K, Kishimoto N, Kikuchi S (2004) The Rice PIPELINE: a unification tool for plant functional genomics. Nucl Acids Res 32:D383–D387
Yoshimura A, Ideta O, Iwata N (1997) Linkage map of phenotype and RFLP markers in rice. Plant Mol Biol 35:49–60
Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, Wortman J, Buell CR (2005) The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol 138:18–26
Zhang J, Li C, Wu C, Xiong L, Chen G, Zhang Q, Wang S (2006) RMD: a rice mutant database for functional analysis of the rice genome. Nucl Acids Res 34:D745–D748
15 The Oryza Map Alignment Project (OMAP): A New Resource for Comparative Genome Studies Within Oryza
Rod A. Wing¹, HyeRan Kim¹, Jose Luis Goicoechea¹, Yeisoo Yu¹, Dave Kudrna¹, Andrea Zuccolo¹, Jetty Siva S. Ammiraju¹, Meizhong Luo¹, Will Nelson², Jianxin Ma³, Phillip SanMiguel³, Bonnie Hurwitz⁴, Doreen Ware⁴, Darshan Brar⁵, David Mackill⁵, Cari Soderlund², Lincoln Stein⁴ and Scott Jackson³
¹Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721, USA; ²Arizona Genomics Computational Laboratory, University of Arizona, Tucson, AZ 85721, USA; ³Department of Agronomy and Genomics Core Facility, Purdue University, West Lafayette, IN 47903, USA; ⁴Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; ⁵International Rice Research Institute, Los Baños, Laguna, Philippines
Reviewed by John M. Watson and Evans Lagudah
15.1 Introduction
15.2 Development of the OMAP BAC Library Resource
15.3 Development of Wild Species FPC/STC Physical Maps
15.3.1 BAC End Sequencing
15.3.2 BAC Fingerprinting
15.3.3 Analysis of Structural Variation Between O. sativa and the 3 AA Genome OMAP Accessions
15.4 Summary, Conclusions, and Future Research
References
15.1 Introduction
Rice (Oryza sativa L.) is the most important human food crop in the world. The agronomic importance of rice, its shared evolutionary history with major cereal crops, and small genome size have led to the generation of a high-quality finished genome sequence by the International Rice Genome
Sequencing Project (2005). The highly accurate and public IRGSP sequence now serves as a unifying research platform for a complete functional characterization of the rice genome. Such an analysis will investigate the rice transcriptome, proteome, and metabolome, with the goal of understanding the biological function of all 35,000 to 40,000 rice genes and applying that information to improve rice production and quality. This comprehensive analysis will utilize a variety of techniques and resources from expression and genome tiling arrays to collections of tagged mutant populations developed in elite cultivars grown around the world. Comparative genomics between the cereal genomes and within the genus Oryza will also play a critical role in our understanding of the rice genome (Ahn et al. 1993; Ahn and Tanksley 1993; Bennetzen and Ma 2003; Han and Xue 2003; Huang and Kochert 1994; Jena et al. 1994; Ma and Bennetzen 2004). By comparing genome organization, genes, and intergenic regions between cereal species, one can identify regions of the genome that are highly conserved or rapidly evolving. Such regions are expected to yield key insights into genome evolution, speciation, and domestication. The study of conserved noncoding sequences (CNSs) between cereal genomes will also increase our ability to understand and isolate regulatory elements required for precise developmental and temporal gene expression (Kaplinsky et al. 2002). The genus Oryza is composed of two cultivated (O. sativa and O. glaberrima) and 21 wild species (Khush 1997; Vaughan et al. 2003). Based on recent phylogenetic data, Ge et al. (1999) proposed that Porteresia coarctata should be included in the genus as the 24th Oryza species. Cultivated rice is classified as an AA genome diploid and has 6 wild AA genome relatives. The remaining 15 wild species are classified into 9 other genome types that include both diploid and tetraploid species. 
Figure 15.1 shows a proposed phylogenetic tree of the genus Oryza as described by Ge et al. (1999) based on the analysis of two nuclear genes and one chloroplast gene. The wild rice species offer a largely untapped resource of agriculturally important genes that have the potential to solve many of the problems in rice production that we face today, such as yield, drought, and salt tolerance as well as disease and insect resistance. To better understand the wild species of rice and take advantage of the IRGSP genome sequence, we have embarked on an ambitious comparative genomics program entitled the Oryza Map Alignment Project (OMAP). The long-term objective of OMAP is to create a genome-level closed experimental system for the genus Oryza that can be used as a research platform to study evolution, development, genome organization, polyploidy, domestication, gene regulatory networks, and crop improvement. The specific objectives of OMAP are to (1) construct deep-coverage large-insert bacterial artificial
15 The Oryza Map Alignment Project (OMAP)
[Figure 15.1: tree diagram. Tip labels by genome type: AA: O. barthii, O. glaberrima, O. longistaminata, O. nivara, O. sativa, O. rufipogon; BB: O. punctata; BBCC: O. minuta, O. punctata; CC: O. eichingeri, O. officinalis, O. rhizomatis; CCDD: O. alta, O. grandiglumis, O. latifolia; DD(EE): O. australiensis; KK: O. ?; HHKK: O. schlechteri; FF: O. brachyantha; HH: O. ?; HHJJ: O. longiglumis, O. ridleyi; JJ: O. ?; GG: O. granulata, O. meyeriana]
Fig. 15.1. Phylogenetic tree of the genus Oryza. (Modified from Ge et al. 1999)
chromosome (BAC) libraries from 11 wild and 1 cultivated African Oryza species (O. glaberrima); (2) fingerprint and end-sequence clones from all 12 BAC libraries; (3) construct physical maps for all 12 Oryza species and align them to the IRGSP genome sequence; and (4) perform a detailed reconstruction of rice chromosomes 1, 3, and 10 across all 12 Oryza species (Wing et al. 2005). This chapter presents our current progress for OMAP and some early glimpses into the results we are finding.
15.2 Development of the OMAP BAC Library Resource Wild rice accessions were obtained from (1) the International Rice Research Institute (IRRI), Los Baños, Philippines; (2) the National Institute of Genetics, Mishima, Japan; and (3) Cornell University, Ithaca, New York (Table 15.1). The major criteria for selecting these accessions were that each one was robust, that sufficient seed was available for distribution to the community, and that each contained potentially useful agronomic traits. High-molecular-weight DNA was obtained from young seedlings for the AA genome species O. nivara, O. rufipogon, and O. glaberrima. In contrast, because no inbred single-seed descent material was available for the remaining wild species, we prepared DNA from single plants that were clonally propagated at IRRI. Efforts are now underway to generate inbred seed from these wild species, which should be available from the IRRI seed bank within 2 to 3 years.
Table 15.1. OMAP BAC library summary (a)

| Oryza species | Genome | IRGC Acc No. | Source | No. of clones | Average insert size (kb) | Genome size (Mb) |
|---|---|---|---|---|---|---|
| O. nivara | AA | 100897 | India | 55,296 | 161 | 448 |
| O. rufipogon | AA | 105491 | Malaysia | 64,512 | 134 | 439 |
| O. glaberrima | AA | 96717 | Africa | 55,296 | 130 | 357 |
| O. punctata | BB | 105690 | Africa | 36,864 | 142 | 425 |
| O. officinalis | CC | 100896 | Thailand | 92,160 | 141 | 651 |
| O. minuta | BBCC | 101141 | Philippines | 129,024 | 125 | 1124 |
| O. alta | CCDD | 105143 | S. America | 92,160 | 133 | 1008 |
| O. australiensis | EE | 100882 | Australia | 92,160 | 153 | 965 |
| O. brachyantha | FF | 101232 | Africa | 36,864 | 131 | 362 |
| O. granulata | GG | 102118 | Thailand | 73,728 | 134 | 882 |
| O. ridleyi | HHJJ | 100821 | Thailand | 129,024 | 127 | 1283 |
| O. coarctata | HHKK | 104502 | Bangladesh | 147,456 | 123 | ND (b) |

(a) BAC libraries and high-density filters can be ordered from the AGI BAC/EST Resource Center (www.genome.arizona.edu). Reproduced from Ammiraju et al. (2006) Genome Research 16:140-147.
(b) ND = not determined.
Deep-coverage large-insert BAC libraries were developed for all 12 OMAP species via standard procedures developed in our laboratory over the past 10 years (Table 15.1) (Luo and Wing 2003). All libraries were quality tested for insert size and depth of coverage and were found to represent at least 10 genome equivalents, with average insert sizes ranging from 123 kb (O. coarctata) to 161 kb (O. nivara) (Ammiraju et al. 2006). All OMAP BAC libraries were deposited in the Arizona Genomics Institute’s BAC/EST Resource Center for public distribution (http://www.genome.arizona.edu).
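The "genome equivalents" figure quoted above can be approximated as the total cloned DNA divided by the genome size. The following is a minimal sketch of that arithmetic, not the quality-testing procedure used for the libraries; the function name is our own, the inputs are the O. nivara values as we read them from Table 15.1, and the result is a nominal depth that ignores empty or organellar clones.

```python
def genome_equivalents(n_clones: int, avg_insert_kb: float, genome_size_mb: float) -> float:
    """Nominal library depth: total cloned DNA (Mb) divided by genome size (Mb)."""
    total_insert_mb = n_clones * avg_insert_kb / 1000.0  # kb -> Mb
    return total_insert_mb / genome_size_mb


# O. nivara, as read from Table 15.1: 55,296 clones, 161 kb mean insert, 448 Mb genome.
nivara_depth = genome_equivalents(55_296, 161, 448)
print(f"O. nivara nominal coverage: {nivara_depth:.1f}x")
```

A nominal depth well above 10 is consistent with the statement that each library represents at least 10 genome equivalents once non-informative clones are discounted.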
15.3 Development of Wild Species FPC/STC Physical Maps After construction, each BAC library is end sequenced and fingerprinted. The fingerprints are assembled into contigs based on shared bands between clones using the FingerPrinted Contigs (FPC) software (Soderlund et al. 2000). FPC contigs can then be aligned to the IRGSP genome sequence using the associated BAC end sequences.
15.3.1 BAC End Sequencing As shown in Table 15.2, all the BAC libraries have been end sequenced. Using O. nivara as an example, OMAP attempted to sequence 110,589 BAC ends and successfully sequenced 106,124 of these. The average high-quality sequence read length was 665 bases. All BAC end sequences and trace files can be found in the GSS section and trace archive of GenBank, respectively.
15.3.2 BAC Fingerprinting To fingerprint a BAC library, we used a modification of the SNaPshot fingerprinting method described by Luo et al. (2003). Briefly, BAC DNA is isolated using a semiautomated 96-well alkaline lysis protocol (Kim HR and Wing RA, unpublished) and then digested with five restriction enzymes, four of which generate 5′ overhangs. The corresponding 3′-OH ends are then extended using a single fluorescently labeled ddNTP and DNA polymerase. The reaction products are separated on ABI 3730xl capillary electrophoresis sequencers, and the labeled fragments are called as bands using ABI fragment analysis software.
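To make the idea of assembling contigs from shared bands concrete, here is a deliberately simplified sketch. It is not the FPC algorithm: FPC scores clone overlaps with a probability-based cutoff, whereas this example substitutes a plain shared-band count and single-linkage grouping. The clone names, fragment sizes, tolerance, and threshold are all hypothetical.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tol=0.4):
    """Count bands of fp_a that match a band of fp_b within +/- tol (size units)."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def build_contigs(fingerprints, min_shared=3, tol=0.4):
    """Group clones whose fingerprints share at least min_shared bands (single linkage)."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_bands(fingerprints[a], fingerprints[b], tol) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    contigs = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return contigs, singletons


# Hypothetical fingerprints (fragment sizes in arbitrary units).
example = {
    "A": [100.0, 150.2, 200.1, 250.0],
    "B": [150.0, 200.0, 250.3, 300.0],
    "C": [200.2, 250.1, 300.2, 350.0],
    "D": [500.0, 550.0, 600.0],
}
contigs, singletons = build_contigs(example)
print(contigs, singletons)
```

In this toy data set, clones A and C share only two bands directly but are joined through B, which illustrates why a contig can span more sequence than any single clone; clone D shares no bands and remains a singleton.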
Table 15.2. OMAP BAC-end sequencing summary

| Oryza species | Genome | No. of reads after trim | % Success | No. of GenBank submissions | Average length (bp) after trim (in GenBank) | ~Mb sequenced (in GenBank) | % Genome coverage |
|---|---|---|---|---|---|---|---|
| O. nivara | AA | 110,589 | 96 | 106,124 | 665 | 71 | 16 |
| O. rufipogon | AA | 73,716 | 96 | 70,982 | 704 | 50 | 11 |
| O. glaberrima | AA | 73,344 | 91 | 66,821 | 590 | 39 | 11 |
| O. punctata | BB | 73,344 | 93 | 68,384 | 710 | 49 | 11 |
| O. officinalis | CC | 110,592 | 93 | 103,251 | 717 | 74 | 11 |
| O. minuta | BBCC | 184,224 | 92 | 169,651 | 559 | 95 | 8 |
| O. alta | CCDD | 146,400 | 88 | 128,732 | 586 | 75 | 7 |
| O. australiensis | EE | 137,908 | 93 | 128,599 | 676 | 87 | 9 |
| O. brachyantha | FF | 71,350 | 94 | 67,364 | 672 | 45 | 13 |
| O. granulata | GG | 147,212 | 94 | 138,171 | 674 | 93 | 11 |
| O. ridleyi | HHJJ | 221,137 | 93 | 204,729 | 632 | 129 | 10 |
| O. coarctata | HHKK | 208,510 | 94 | 195,285 | 661 | 129 | ND (a) |
| Total/Average | | 1,558,326 | 93 | 1,448,093 | 654 | 937 | 11 |

(a) ND = not determined.
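The derived columns of Table 15.2 follow from the raw counts by simple arithmetic. The sketch below reproduces the O. nivara row under our reading of the table; the function and parameter names are our own, and the 448 Mb genome size is taken from the O. nivara row of Table 15.3.

```python
def bes_summary(n_reads, n_submitted, mean_len_bp, genome_size_mb):
    """Return (% success, Mb sequenced, % genome coverage) for one BAC-end data set."""
    success_pct = 100.0 * n_submitted / n_reads
    mb_sequenced = n_submitted * mean_len_bp / 1e6
    coverage_pct = 100.0 * mb_sequenced / genome_size_mb
    return success_pct, mb_sequenced, coverage_pct


# O. nivara: 110,589 reads, 106,124 GenBank submissions, 665 bp mean length, 448 Mb genome.
success, mb, coverage = bes_summary(110_589, 106_124, 665, 448)
print(round(success), round(mb), round(coverage))
```

Rounded to whole numbers, these values agree with the 96% success, 71 Mb, and 16% coverage entries listed for O. nivara.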
As shown in Table 15.3, all 12 BAC libraries have been fingerprinted and assembled into phase I physical maps. Again using O. nivara as an example, OMAP attempted to fingerprint 51,056 clones and achieved a 91% success rate. These fingerprints were then assembled into a phase I physical map composed of 456 contigs and 2,356 singletons. The phase I maps are now being refined into “heavily manually edited” (HME) maps using a variety of assembly parameters, followed by end merging and contig alignment to the IRGSP reference sequence. Figure 15.2 shows SyMAP (Soderlund et al. 2006) alignments of HME maps for O. rufipogon (AA), O. punctata (BB), and O. brachyantha (FF). As expected, there is extensive colinearity between the wild rice species and the reference IRGSP sequence (cultivar Nipponbare of O. sativa subspecies japonica). However, several regions of structural variation can be observed, even between the two AA genome species. All of the OMAP FPC maps are available on the Internet through webFPC (http://www.omap.org; Pampanwar et al. 2005), and BAC end sequence alignments as well as the OMAP FPC maps can be found on Gramene (http://www.gramene.org; Ware et al. 2002).
15.3.3 Analysis of Structural Variation Between O. sativa and the 3 AA Genome OMAP Accessions The sequence of O. sativa ssp. japonica cv Nipponbare chromosome 3 was finished by the US Rice Chromosome 3 Sequencing Consortium in 2005. Cytologically, chromosome 3 is the second-largest rice chromosome, measuring 56.41 μm (or approximately 52.4 Mb), and is one of the most euchromatic (Cheng et al. 2001). Genetically, chromosome 3 is 170 cM in length (Harushima et al. 1998) and has 27 morphological mutants. In addition, more than 133 agronomic genes/traits/quantitative trait loci (QTLs) and 963 cDNAs have been associated with chromosome 3 (Wu et al. 2002; http://www.shigen.nig.ac.jp/rice/oryzabase).
The consortium sequenced approximately 36.1 Mb of chromosome 3 and identified 6,237 new genes (The Rice Chromosome 3 Sequencing Consortium 2005). Although Oryza separated from maize and sorghum approximately 50 million years ago (MYA) and from wheat and barley approximately 40 MYA, their common evolutionary history can be traced by the colinear order of genetic markers across their chromosomes (Moore et al. 1995). This is particularly true for the short arm of chromosome 3, which shows large stretches of genetic marker colinearity with maize chromosomes 1 and 9, sorghum linkage group C, and barley and wheat chromosomes 4L. Such conserved synteny across the cereals suggests that rice chromosome 3 will be a good model for the study of chromosome evolution.
92,522
357 425 651 1124 1008 965 362 882 NDc 1283
AA
BB
CC
BBCC
CCDD
EE
FF
GG
HHKK
HHJJ
O. glaberrima
O. punctata
O. officinalis
O. minuta
O. alta
O. australiensis
O. brachyantha
O. granulata
O. coarctata
O. ridleyi
104,393
25,216
63,368
63,860
86,861
46,937
34,224
33,065
33,023
94
91
88
78
86
85
90
85
93
85
91
91
% Success
HME = heavily manually edited maps; bNE = not edited; cND = not determined
a
64,836
439
AA
O. rufipogon
51,056
448
AA
O. nivara
No. of clones FP
Genome size (Mb)
Genome
Oryza species
Table 15.3. OMAP SNaPshot/FPC fingerprinting Summary
2,190
1,250
2,358
428
1,409
2,492
3,962
764
490
905
637
456
Phase I FPC
1,305 2,098 1,482 2,052 9,576 3,111
327a 167a a a
NEb NEb
5,169
NEb
3,032
NEb
1,810
1,805
225a
NEb
2,163
NE
310
210
2,356
No. of singletons
340a
HME FPC
No. of contigs
Fig. 15.2. SyMAP (Soderlund et al. 2006) alignments of BAC FPC/STC contig maps of three different Oryza species (left columns) with the 12 IRGSP reference genome pseudomolecules (right columns). Horizontal lines represent BES alignments to the IRGSP reference sequence
Insertions and deletions in genomes play a critical role in evolution. To obtain a more in-depth understanding of the roles that insertions and deletions play in reshaping the genomes of Oryza, we generated minimum tiling paths of BAC clones across the entire length of chromosome 3 from the HME maps of O. nivara (AA), O. rufipogon (AA), and O. glaberrima (AA). The predicted size of each BAC clone, based on the alignment of its paired BAC end sequences to the IRGSP reference genome sequence, was then compared with the empirical BAC size determined by pulsed-field gel electrophoresis. Figure 15.3 shows graphical representations of this analysis and reveals that chromosome 3 in all three species shows chromosome-wide expansion and contraction relative to the IRGSP reference sequence. To obtain a detailed sample of genome expansion in O. sativa relative to the other three AA genome species, we sequenced and annotated a single BAC from each of these species that mapped near the top of the short arm of chromosome 3. The selected O. nivara BAC had a predicted size of 220 kb based on paired BAC end sequence alignment but was found to be 178 kb after full sequencing, indicating that O. sativa has expanded by 42 kb in this region, that O. nivara has contracted by the same amount, or that a combination of the two has produced an overall difference of 42 kb. The predicted sizes of the O. rufipogon and O. glaberrima BACs were 169 kb and 230 kb; however, their actual sequenced sizes were 126 kb and 148 kb, representing overall differences of 43 kb and 82 kb, respectively. Table 15.4 summarizes the annotation analysis and shows that the relative expansion in O. sativa results not only from insertions of transposable elements but also from the presence of several additional genes. The most dramatic example can be seen in the sequence comparison between O. sativa and O. glaberrima shown in Fig. 15.4. Here, the overlapping region in O. sativa spans 230.4 kb, but only 114.1 kb, containing 16 annotated genes and 2 annotated retroelements, is conserved with O. glaberrima. The remaining 116.3 kb of unique O. sativa sequence contains 11 nonorthologous genes and 10 nonorthologous retroelements. For the O. glaberrima BAC, 34 kb of the 148.1-kb sequence is unique and contains three nonorthologous genes and four nonorthologous retroelements.
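The expansion/contraction comparison described above reduces to subtracting the sequenced BAC size from the span predicted by its paired end sequences on the reference. The following sketch applies that subtraction to the three BACs discussed in the text; the class and field names are our own and are not part of any OMAP software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacComparison:
    species: str
    predicted_kb: float  # span between paired BES hits on the IRGSP reference
    sequenced_kb: float  # size of the fully sequenced BAC insert

    @property
    def difference_kb(self) -> float:
        # Positive: the reference (O. sativa) interval is longer than the BAC insert.
        return self.predicted_kb - self.sequenced_kb


BACS = [
    BacComparison("O. nivara", 220, 178),
    BacComparison("O. rufipogon", 169, 126),
    BacComparison("O. glaberrima", 230, 148),
]

for bac in BACS:
    print(f"{bac.species}: {bac.difference_kb:+.0f} kb relative to O. sativa")
```

A positive difference cannot by itself distinguish expansion in O. sativa from contraction in the compared species, which is why the annotation of unmatched blocks in Table 15.4 is needed to interpret it.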
15.4 Summary, Conclusions, and Future Research The domestication of rice some 10,000 years ago has severely limited the gene pool that breeders can utilize for further improvements to cultivated rice. The wild species of the genus Oryza contain a wealth of genetic diversity that must therefore be uncovered if we are to meet the challenges of feeding the world’s expanding population in the 21st century.
[Figure 15.3: three panels (O. nivara, O. rufipogon, O. glaberrima); x-axis: position on chromosome 3 (Mb); y-axis: difference (kb), insert size vs. BES alignment]
Fig. 15.3. Chromosome 3 expansion/contraction analysis of structural variations in three Oryza species—O. nivara, O. rufipogon, and O. glaberrima (mapped relative to O. sativa ssp. japonica IRGSP reference sequence)
Table 15.4. Summary of annotated genes, complete retrotransposons, and solo LTRs of three fully sequenced orthologous BACs in comparison to the IRGSP reference sequence

| Comparison | Feature | O. sativa number | O. sativa kb | Compared species number | Compared species kb |
|---|---|---|---|---|---|
| O. sativa vs. O. nivara | Unmatched blocks | 5 | 52 | 3 | 9 |
| | Genes in unmatched blocks | 5 | 12 | 0 | 0 |
| | Retros/solo LTRs in unmatched blocks | 7 | 39 | 1 + unannotated | 9 |
| O. sativa vs. O. rufipogon | Unmatched blocks | 6 | 60 | 4 | 16 |
| | Genes in unmatched blocks | 4 | 12 | 0 | 0 |
| | Retros/solo LTRs in unmatched blocks | 7 | 40 | 4 | 15 |
| O. sativa vs. O. glaberrima | Unmatched blocks | 5 | 113 | 5 | 33 |
| | Genes in unmatched blocks | 10 | 26 | 3 | 6 |
| | Retros/solo LTRs in unmatched blocks | 9 | 50 | 3 + unannotated | 16 |
The Oryza Map Alignment Project was designed to conduct a detailed characterization of a single representative of each of the 10 genome types of wild rice species. The alignment of these genomes to the IRGSP reference sequence will provide a comprehensive physical framework whereby numerous genome-wide applied and basic research projects can be launched to unlock the genetic potential of these wild genomes and provide breeders with new candidate genes and QTLs for rice improvement. All of the production objectives for OMAP have been completed. Our consortium is now actively mining this data set and will reevaluate the phylogenetic tree of Oryza, establish genome-wide simple sequence repeat (SSR)/single-nucleotide polymorphism (SNP) maps of all the AA genome species, investigate the transposon dynamics in Oryza and their effect on genome size variation (Piegu et al. 2006), and develop a comprehensive Oryza Rearrangement Index to determine the majority of expansion, contraction, inversion, and translocation events with respect to the IRGSP reference sequence. Because of the massive amount of information and resources that OMAP has produced to date, and will produce in the future, it is clear that our consortium will not be able to realize the full potential of this systems biology project without collaboration and cooperation with the broader research community. We therefore propose to establish an International
[Figure 15.4: annotated feature maps of the O. sativa ssp. japonica region (features numbered 1 to 29) and the orthologous O. glaberrima BAC. Legend: orthologous gene; orthologous retro; non-orthologous gene; non-orthologous retro.]

| Feature | O. sativa | Conserved | O. glaberrima |
|---|---|---|---|
| Sequence length (kb) | 230.4 | 114.1 | 148.1 |
| # annotated genes | 27 | 16 | 19 |
| # annotated retroelements | 12 | 2 | 6 |
| Unique seq length (kb) | 116.3 | | 34 |
| # non-orthologous genes | 11 | | 3 |
| # non-orthologous retroelements | 10 | | 4 |
Fig. 15.4. Comparative annotation of O. sativa ssp. japonica IRGSP reference sequence versus an orthologous O. glaberrima BAC
Oryza Map Alignment Project (I-OMAP), in a similar vein to the IRGSP, to help coordinate research activities utilizing this new resource. Such an I-OMAP could include the development of advanced backcross (ABC) populations and chromosome segment substitution lines (CSSLs) using the AA genome OMAP accessions. Furthermore, as sequencing technologies become more cost effective, I-OMAP could propose 6x whole-genome draft sequences of all 12 OMAP species. Having such powerful genetic and sequence resources available would lead to a much more complete understanding of the world’s most important food crop.
References Ahn S, Tanksley SD (1993) Comparative linkage maps of the rice and maize genomes. Proc Natl Acad Sci USA 90:7980–7984 Ahn S, Anderson JA, Sorrells ME, Tanksley SD (1993) Homoeologous relationships of rice, wheat and maize chromosomes. Mol Gen Genet 241:483–490 Ammiraju JS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim H, Sisneros NB, Blackmon B, Fang E, Tomkins JB, Brar D, Mackill D, McCouch S, Kurata N, Lambert G, Galbraith DW, Arumuganathan K, Rao K,
Walling JG, Gill N, Yu Y, Sanmiguel P, Soderlund C, Jackson S, Wing RA (2006) The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res 16:140–147 Bennetzen JL, Ma J (2003) The genetic colinearity of rice and other cereals on the basis of genomic sequence analysis. Curr Opin Plant Biol 6:128–133 Cheng Z, Buell CR, Wing RA, Gu M, Jiang J (2001) Toward a cytological characterization of the rice genome. Genome Res 11:2133–2141 Ge S, Sang T, Lu BR, Hong DY (1999) Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci USA 96:14400–14405 Han B, Xue Y (2003) Genome-wide intraspecific DNA-sequence variations in rice. Curr Opin Plant Biol 6:134–138 Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494 Huang H, Kochert G (1994) Comparative RFLP mapping of an allotetraploid wild rice species (Oryza latifolia) and cultivated rice (O. sativa). Plant Mol Biol 25:633–648 International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800 Jena KK, Khush GS, Kochert G (1994) Comparative RFLP mapping of a wild rice, Oryza officinalis, and cultivated rice, O. sativa. Genome 37:382–389 Kaplinsky NJ, Braun DM, Penterman J, Goff SA, Freeling M (2002) Utility and distribution of conserved noncoding sequences in the grasses. Proc Natl Acad Sci USA 99:6147–6151 Khush GS (1997) Origin, dispersal, cultivation and variation of rice. Plant Mol Biol 35:25–34 Luo M, Wing RA (2003) An improved method for plant BAC library construction. 
Methods Mol Biol 236:3–20 Luo MC, Thomas C, You FM, Hsiao J, Ouyang S, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J (2003) High-throughput fingerprinting of bacterial artificial chromosomes using the snapshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics 82:378–389 Ma J, Bennetzen JL (2004) Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA 101:12404–12410 Moore G, Devos KM, Wang Z, Gale MD (1995) Cereal genome evolution. Grasses, line up and form a circle. Curr Biol 5:737–739 Pampanwar V, Engler F, Hatfield J, Blundy S, Gupta G, Soderlund C (2005) FPC web tools for rice, maize and distribution. Plant Physiol 138:116–126 Piegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, Collura K, Brar D, Jackson S, Wing R, Panaud O (2006) Doubling genome size without polyploidization: dynamics of retrotransposition driven genomics expansions in the genus Oryza. Genome Res (In Press, DOI 10.1101/gr.5290206) Soderlund C, Humphray S, Dunham A, French L (2000) Contigs built with fingerprints, markers, and FPC V4.7. Genome Res 10:1772–1787
Soderlund C, Nelson W, Shoemaker A, Paterson A (2006) SyMAP: a system for discovering and viewing syntenic regions of FPC Maps. Genome Res 16:1159–1168 The Rice Chromosome 3 Sequencing Consortium (2005) Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species. Genome Res 15:1284–1291 Vaughan DA, Morishima H, Kadowaki K (2003) Diversity in the Oryza genus. Curr Opin Plant Biol 6:139–146 Ware D, Jaiswal P, Ni J, Pan X, Chang K, Clark K, Teytelman L, Schmidt S, Zhao W, Cartinhour S, McCouch S, Stein L (2002) Gramene: a resource for comparative grass genomics. Nucl Acids Res 30:103–105 Wing RA, Ammiraju JS, Luo M, Kim H, Yu Y, Kudrna D, Goicoechea JL, Wang W, Nelson W, Rao K, Brar D, Mackill DJ, Han B, Soderlund C, Stein L, SanMiguel P, Jackson S (2005) The Oryza map alignment project: the golden path to unlocking the genetic potential of wild rice species. Plant Mol Biol 59:53–62 Wu J, Maehara T, Shimokawa T, Yamamoto S, Harada C, Takazaki Y, Ono N, Mukai Y, Koike K, Yazaki J, Fujii F, Shomura A, Ando T, Kono I, Waki K, Yamamoto K, Yano M, Matsumoto T, Sasaki T (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14:525–535
16 Application of Functional Genomics Tools for Crop Improvement
1
2
2
Motoyuki Ashikari , Makoto Matsuoka and Masahiro Yano 1
Bioscience Center, Nagoya University, Nagoya, Aichi, 464-8601, Japan; 2Japan National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305-8602, Japan Reviewed by Elizabeth S. Dennis
16.1 Rice Genomics
16.2 Molecular Markers for Improved Breeding Efficiency
16.3 QTL Analysis
16.3.1 Genetic and Molecular Dissection of QTLs
16.3.2 QTL Application in Breeding
16.3.4 QTL Pyramiding for Breeding
16.3.5 QTL Detection Using Chromosome Segment Substitution Lines
16.4 Use of Wild Species as a Source of Diversity for Breeding
16.5 Molecular Breeding
16.6 Outlook
References
16.1 Rice Genomics Cereals supply many of the calories for humans, partly by direct intake but also as the main feed for livestock. Three cereal species—rice (23%), wheat (17%), and maize (10%)—provide approximately 50% of the calories consumed by the world population (Khush 2003). Studying and breeding these species are crucially important for the continuous provision of calories to a growing world population. Rice has been established as a model cereal because it has a particularly small genome size of about 400 Mb (for comparison: maize, 3,300 Mb; barley, 5,100 Mb; Bennett and Leitch 1995, 2005) while showing genome synteny with other cereals (Devos 2005). Moreover, gene transformation technology is well established in rice (Hiei et al. 1994). The International Rice Genome Sequencing
Motoyuki Ashikari et al.
Project turned rice into the first cereal to be completely sequenced as a representative model monocot (International Rice Genome Sequencing Project 2005). What kind of benefits does this accomplishment offer? Rice (Oryza sativa L.) is a staple food regularly consumed by some 50% of the world population (White 1994). In particular, it is the most important crop in the monsoonal areas of Asia, where it has a long history of cultivation and is deeply ingrained in the daily lives of Asian people. Therefore, benefits that originate from knowledge of the rice genome, in the context of close collaboration between scientists and breeders, may affect human life directly because a major food supply is concerned. In addition, scientific breakthroughs achieved in rice breeding programs will have impacts on the utilization of other cereals as well. Rice breeders and geneticists have generated and collected an unusually large number of mutants. The classical genetic map based on phenotypic markers of rice mutants was developed in the mid-1960s (Nagao and Takahashi 1963; Iwata and Omura 1975; Kinoshita 1995). In 1988, a research group at Cornell University first applied the restriction fragment length polymorphism (RFLP) technique to construct a rice linkage map (McCouch et al. 1988). This was the dawn of rice genomics. In the last two decades, innovations and developments in DNA technology have enabled dramatic advances in this field, and several rice genome projects including the construction of high-resolution linkage maps (Saito 1991; Causse et al. 1994; Kurata et al. 1994; Harushima et al. 1998; McCouch et al. 2002), the production of tagging lines including T-DNA (Jeon et al. 2000; An et al. 2005), the Ac/Ds transposon (Greco et al. 2001) and the retrotransposon Tos17 (Hirochika et al. 2001), full-length cDNA analysis (Kikuchi et al. 2003), the development of genomic libraries (Ammiraju et al. 2006), and microarray analysis (Ma et al.
2005; Rensink and Buell 2005) have been initiated and/or completed. As indispensable tools for plant molecular scientists and breeders, databases have been established (Yamasaki and Jaiswal 2005; Jaiswal et al. 2006; Kurata and Yamasaki 2006). The setting up and further development of an infrastructure of rice genomics will facilitate functional analyses of the rice genome and will help to elucidate general mechanisms of gene function which will be of interest for pure biology as well as practical breeding.
16.2 Molecular Markers for Improved Breeding Efficiency In the 19th century, Gregor J. Mendel discovered the genetic basis of plant breeding through his experiments on garden peas, by which he established Mendel’s laws. For centuries, farmers have improved crops and vegetables by crossing and selecting for desired traits such as environmental adaptability,
16 Functional Genomics Tools for Crop Improvement
quality, and yield. They planted seeds from individuals carrying the preferred traits and continued selection to fix those characteristics. Useful plant varieties have been established in this way. Conventional breeding requires a lot of practice, experience, and time to be successful. The use of molecular markers, on the other hand, makes it possible to speed up the process and produce new lines with high efficiency, because desired traits can be selected using markers that are linked to them, a process called marker-assisted selection (MAS). The completed genome sequence of rice allows us to define molecular markers at any chromosome position, so MAS is very helpful not only for practical breeding but also for quantitative trait locus (QTL) analysis.
16.3 QTL Analysis Many important agronomic traits including heading date, grain productivity, and stress tolerance show continuous phenotypic variation in the F2 populations derived from the parental cross. Such continuous phenotypic differences are based on natural allelic variation between parental lines and are governed by several genes as QTLs. Obviously, the identification of QTLs is of great practical importance for crop breeding. However, even when using a large number of plants, numerous genetic markers, and well-developed statistical methods, it often is difficult to determine the precise location of individual QTLs in the primary analysis (Fig. 16.1). Subsequent analytical steps such as evaluation and characterization of target QTLs, fine mapping, and QTL cloning are required. The production of nearly isogenic lines (NILs) carrying only one target QTL in a unique genomic background facilitates the comprehensive analysis of the QTL. NILs are produced by backcrossing with one parental line to identify the genomic region derived from the other parent through MAS. Because the chromosome background is uniform in NILs except for the target QTL, the QTL can be efficiently analyzed. Comparison of the target trait between the NIL carrying the QTL and the parental line not only confirms the existence of the QTL in the NIL genome but also allows the quantification of the effects of the QTL in the parental background. Because QTLs in NILs can be regarded as single Mendelian factors (Peterson et al. 1988), progenies from heterozygote NILs can be used for fine mapping and isolation of the target QTL gene. Because QTL analysis facilitates the identification of genes that are hard to characterize by conventional mutant analysis, it represents a methodology of general importance in biology. 
However, polygenic characteristics are very difficult to analyze without complete genome information, so that the completion of the rice genome sequencing project provides an indispensable tool in QTL analysis.
We can now design suitable molecular markers to study the natural allelic variations underlying complex traits, and several QTLs have been identified so far (Doebley et al. 1997; Frary et al. 2000; Fridman et al. 2000; Yano et al. 2000; El-Assal et al. 2001; Takahashi et al. 2001; Kojima et al. 2002; Liu et al. 2002; Ashikari et al. 2005; Nishimura et al. 2005; Ren et al. 2005; Konishi et al. 2006; Li et al. 2006).
[Figure 16.1: (A) donor plant × recurrent parent, each drawn as chromosomes 1 to 12, with a QTL involved in a trait of interest; production of an NIL carrying the target QTL from the donor plant on the recurrent parent background by backcrossing and MAS. (B) The QTL in the NIL can be treated as a Mendelian factor and can therefore be characterized and isolated by the usual methods; mapping and positional cloning of the target QTL.]
Fig. 16.1. QTL cloning. First, QTL analysis is conducted to identify and localize QTLs involved in interesting traits using primary-mapping populations such as F2 developed from crossing between donor and recurrent parent plants (A). Then, an NIL is produced that carries the target QTL from the donor plant (shaded black; B) on the recurrent parent genome (unshaded; B) by backcrossing and MAS. The existence and effect of the target QTL can be evaluated precisely by comparing the target traits between NIL-Q1 and the recurrent parent. As the QTL in the NIL behaves as a single Mendelian factor, it can be mapped and isolated by classical cloning methods
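The backcross-and-MAS step used to build an NIL can be sketched as a simple selection rule: keep backcross progeny that still carry the donor allele at the marker linked to the target QTL, and prefer those with the most recurrent-parent genotype elsewhere. The example below is an illustration under our own assumptions, not the authors' breeding protocol; the plant names, marker names, and the genotype codes ("A" for recurrent-parent homozygote, "H" for heterozygote) are hypothetical.

```python
def select_for_backcross(plants, target_marker, n_keep=2):
    """Rank backcross progeny that retain the donor allele at target_marker.

    plants maps a plant id to {marker: genotype}, where "A" is homozygous for
    the recurrent parent and "H" is heterozygous (one donor allele).
    Returns up to n_keep (plant id, background recovery) pairs, best first.
    """
    candidates = []
    for plant_id, genotype in plants.items():
        if genotype[target_marker] != "H":
            continue  # donor allele at the target QTL marker has been lost
        background = [g for m, g in genotype.items() if m != target_marker]
        recovery = sum(g == "A" for g in background) / len(background)
        candidates.append((recovery, plant_id))
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(plant_id, recovery) for recovery, plant_id in candidates[:n_keep]]


# Hypothetical BC progeny scored at five markers; M3 is linked to the target QTL.
progeny = {
    "P1": {"M1": "A", "M2": "H", "M3": "H", "M4": "A", "M5": "A"},
    "P2": {"M1": "A", "M2": "A", "M3": "A", "M4": "A", "M5": "A"},
    "P3": {"M1": "A", "M2": "A", "M3": "H", "M4": "A", "M5": "A"},
    "P4": {"M1": "H", "M2": "H", "M3": "H", "M4": "H", "M5": "A"},
}
selected = select_for_backcross(progeny, target_marker="M3")
print(selected)
```

Plant P2 has the cleanest background but is discarded because it no longer carries the donor segment, which is the trade-off MAS resolves at each backcross generation.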
16.3.1 Genetic and Molecular Dissection of QTLs The best characterized QTL in rice controls heading date. Heading date is a key determinant for the adaptability of rice to different climates and cropping seasons. A wide range of variation in heading date and daylength response or photoperiod sensitivity (PS) has been observed in rice varieties (Fig. 16.2). For example, the cultivar Hayamasari, which is adapted to the climate of northern Japan, shows almost the same number of days-to-heading (DTH) under short-day (SD) and long-day (LD) conditions, lacking any PS. At the other extreme, the indica cultivar from India, Nona Bokra, exhibits an extremely strong PS. In the last decade, progress in the development of DNA markers has made it possible to conduct QTL analyses to clarify which genes control heading date in rice (Yano and Sasaki 1997). We have studied the genetic control of heading date using progeny derived from crosses between japonica and indica cultivars. In particular, comprehensive QTL detection has been performed in crosses between Nipponbare (japonica) and Kasalath (indica), resulting in the identification of 15 QTLs, Heading date (Hd)1, 2, 3a, 3b, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 (Yano et al. 2001; Fig. 16.3). Among them, eight QTLs (Hd1, Hd2, Hd3, Hd4, Hd5, Hd6, Hd8, and Hd9) were mapped as single Mendelian factors, and five of these (Hd1, Hd2, Hd3, Hd5, and Hd6) were found to confer PS (Lin et al. 2000, 2002, 2003; Yamamoto et al. 2000). Recently, QTLs involved in extremely early and late heading have been identified (Nonoue et al. unpublished data). One QTL controlling early heading was mapped to chromosome 7 and one to chromosome 8, coincident with the locations of Hd4 and Hd5. On the other hand, six QTLs involved in extremely late heading
[Figure 16.2 panels: (A) map of cultivar origins; (B) days-to-heading (axis 0 to 250) under SD (10 h), LD (14.5 h), and natural field (Tsukuba, Japan) conditions for Hayamasari, Koshihikari, Nipponbare, Taichung 65, Kasalath, and Nona Bokra.]
Fig. 16.2. Geographical origins of six rice cultivars (A) and their day-length responses in terms of heading (B)
Motoyuki Ashikari et al.
[Figure 16.3: map of rice chromosomes 1 to 12 showing the positions of Hd1, Hd2, Hd3a, Hd3b, Hd4 to Hd14, Ehd1, and Lhd4.]
Fig. 16.3. QTLs detected in genetic analyses using several cross combinations. Boxed loci were cloned by a map-based strategy. Hd1 to Hd14 were identified in progeny derived from a cross between Nipponbare (O. sativa ssp. japonica) and Kasalath (O. sativa ssp. indica). Ehd1 was identified in progeny of TC65 (O. sativa ssp. japonica) × African rice (Oryza glaberrima Steud.). Lhd4 was identified in progeny of Hayamasari (O. sativa ssp. japonica) × Kasalath (O. sativa ssp. indica)
were detected by QTL analysis of an F2 population derived from a cross between the early heading cultivar Koshihikari and the extremely late heading cultivar Nona Bokra. The chromosomal locations of these QTLs were coincident with those of Hd1, Hd2, Hd3a, Hd4, Hd5, and Hd6. A large part of the phenotypic variability previously described could be explained by these QTLs. A major QTL controlling the photoperiod response, Hd1, was isolated by means of a map-based cloning strategy and was demonstrated to correspond to a homolog of CONSTANS (CO) in Arabidopsis (Yano et al. 2000). Further, it was revealed that Hd6 and Hd3a encode the α-subunit of protein kinase CK2 (CK2α; Takahashi et al. 2001) and a protein with high similarity to FT from Arabidopsis (Kojima et al. 2002), respectively. More recently, it was shown that Ehd1 encodes a B-type response regulator (Doi et al. 2004). Current efforts focus on the cloning of Hd5 and Lhd4 (unpublished data). We have monitored the mRNA levels of Hd1 and Hd3a in several NILs (Kojima et al. 2002). Hd3a transcripts were detected early and gradually increased with time under SD conditions. To investigate whether Hd1 regulates Hd3a, we also quantified the expression level of Hd3a in an NIL
for Hd1, NIL (Hd1). The Arabidopsis CO (Hd1 ortholog) is involved in the promotion of flowering and up-regulates the expression of FT under LD conditions. In NIL (Hd1), the Nipponbare functional allele was replaced with a loss-of-function Kasalath allele. NIL (Hd1) headed later than Nipponbare under SD conditions. The expression levels of Hd3a were reduced in NIL (Hd1), indicating that the functional allele of Hd1 up-regulates the expression of Hd3a (Kojima et al. 2002). These results suggest that the functions of Hd3a and FT and the regulation of their expression by Hd1 and CO, respectively, are conserved between rice, an SD plant, and Arabidopsis, an LD plant. However, rice and Arabidopsis differ in the expression profiles of the key flowering-time genes Hd3a and FT in response to day length. Doi et al. (2004) have demonstrated that Ehd1 is involved in the up-regulation of FT-like genes such as Hd3a, independently of the action of Hd1, resulting in the promotion of heading under SD conditions. A BLAST search of the Arabidopsis genome failed to detect an Arabidopsis ortholog of Ehd1. Although similar genetic control mechanisms of flowering time (heading date) exist in rice and Arabidopsis, these results suggest that unique mechanisms may also operate in the control of flowering in rice. Currently, we are analyzing the mRNA levels of several PS genes, including Hd1, Hd3a, Ehd1, Hd5, and Lhd4, in several NILs. These analyses will contribute further to our understanding of the gene hierarchy in the transcriptional network controlling heading in rice. However, as it seems likely that post-transcriptional mechanisms are involved in the regulation of heading as well, studies at the protein level will be required for a more complete understanding of the developmental mechanisms of heading.
Although the biochemical functions of the Arabidopsis genes CO and FT seem to be conserved in rice Hd1 and Hd3a, the inductive photoperiod for flowering differs between the two species. This raises the question of which gene(s) or mechanism(s) generate the opposite photoperiod responses in SD plants and LD plants. Newly identified QTLs such as Ehd1, Hd5, and Lhd4 may provide novel approaches to this problem, and further comparative studies in Arabidopsis and rice will help to identify conserved and divergent features of the important and complex developmental mechanisms of flowering. The molecular characterization of genes controlling PS allowed us to identify functional nucleotide polymorphisms (FNPs) underlying the natural variation in flowering time. In the case of rice genes controlling heading date, many FNPs resulted in a loss of function. Taken together, the available information suggests that the naturally occurring continuous variation in DTH and PS is likely to be generated by the combination of numerous gain-of-function and loss-of-function alleles of a whole series of genes in rice.
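The contrast drawn in this section between cultivars that head at nearly the same time under SD and LD conditions and cultivars with strong PS can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that PS is summarized as the difference in DTH between LD and SD conditions; this index, the function names, and the DTH values are hypothetical and are not taken from the experiments described here.

```python
def photoperiod_sensitivity(dth_sd: float, dth_ld: float) -> float:
    """Assumed PS index: extra days-to-heading under LD relative to SD."""
    return dth_ld - dth_sd


def rank_by_sensitivity(dth_table: dict) -> list:
    """Order cultivar names from least to most photoperiod sensitive."""
    return sorted(
        dth_table,
        key=lambda name: photoperiod_sensitivity(*dth_table[name]),
    )


# Hypothetical (SD, LD) days-to-heading values, for illustration only.
example_dth = {
    "insensitive_example": (60.0, 62.0),
    "sensitive_example": (70.0, 180.0),
}

if __name__ == "__main__":
    for name in rank_by_sensitivity(example_dth):
        sd, ld = example_dth[name]
        print(name, photoperiod_sensitivity(sd, ld))
```

An index near zero corresponds to a cultivar lacking PS, whereas a large positive value corresponds to a strongly photoperiod-sensitive SD plant.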
16.3.2 QTL Application in Breeding
Progress in understanding the genetic control of rice heading date has made it possible to develop varieties adapted to different geographic regions. To enhance the cropping potential, we used MAS to develop isogenic lines (ILs) with early and late heading dates in the elite rice variety Koshihikari. To minimize the length of substituted chromosome segments containing target QTLs, we developed tightly linked DNA markers in three QTL regions, Hd1 (Yano et al. 2000), Hd6 (Takahashi et al. 2001), and Hd5 (Yamanouchi and Yano, unpublished data). By MAS, we developed three ILs, Kanto IL1 (Hd1), Wakei 367 (Hd6), and Wakei 371 (Hd5), each with a very small Kasalath chromosome segment (170 to 625 kb) including a heading-date QTL in the genetic background of Koshihikari (Fig. 16.4A). A fourth IL, Wakei 370, had a larger integrated Kasalath chromosome segment containing the Hd4 locus (Fig. 16.4A). Because Hd4 had been rather imprecisely mapped on chromosome 7, we could not design any DNA marker that was tightly linked to this QTL. The heading date of Kanto IL1 was 12 days earlier than that of Koshihikari in Ibaraki Prefecture, and those of Wakei 367, Wakei 370, and Wakei 371 were 10, 3, and 11 days later, respectively (Fig. 16.4B). These lines were further evaluated for yield, culm length, panicle length, grain, food quality, cold tolerance, and field resistance to leaf blast. Kanto IL1 had shorter culms than Koshihikari, and was slightly inferior in eating quality and cold tolerance. There was no significant difference in any trait between the other three ILs and Koshihikari.
16.3.4 QTL Pyramiding for Breeding
DNA markers tightly linked to QTLs for target traits are very effective tools for minimizing the length of substituted chromosome segments. Once NILs or ILs for target QTLs have been produced, desirable QTLs can be combined by crossing NIL-QTLs or IL-QTLs in a common genetic background (Fig. 16.5).
The efficiency of this approach has been demonstrated by pyramiding two QTLs involved in the control of plant height and grain number (Ashikari et al. 2005). High grain productivity and short plant height, which reduces lodging susceptibility, are important agricultural traits. QTL analysis of progenies from the cross between the japonica rice Koshihikari and the indica cultivar Habataki identified QTLs for grain number (Gn1) and plant height (Ph1). Positional cloning revealed that Gn1a encodes a cytokinin oxidase/dehydrogenase (CKX), whereas Ph1 encodes gibberellin 20-oxidase and is thus identical to sd1, the rice green revolution gene (Ashikari et al. 2002; Sasaki et al. 2002; Spielmeyer et al. 2002). As expected,
[Figure 16.4 panels: (A) marker maps, with intervals in kb, of the Kasalath segments around Hd1, Hd4, Hd5, and Hd6 in Kanto IL1, Wakei 370, Wakei 371, and Wakei 367; (B) days-to-heading (axis 80 to 120) for Kanto IL1, Wakei 367, Wakei 370, Wakei 371, and Koshihikari.]
Fig. 16.4. Graphical representation of the genotypes of Kanto IL1 (Hd1), Wakei 370 (Hd4), Wakei 367 (Hd6), and Wakei 371 (Hd5) (A) and their days-to-heading (B). In each IL, white and black blocks indicate chromosomal regions derived from Koshihikari and Kasalath, respectively
NIL-Gn1 showed increased grain number and IL-sd1 exhibited reduced plant height. To combine the beneficial traits, the two lines were crossed and a pyramided line carrying Gn1 and sd1 was selected using MAS. NIL-Gn1+sd1 harbored Gn1 and sd1 from the donor line, Habataki, in the Koshihikari genetic background, and showed increased grain production (+23%) and shorter plant height (–20%) as compared to Koshihikari. These results exemplify a strategy for tailor-made crop improvement; once a battery of useful ILs has been established, lines with desired combinations of QTLs can be produced to order.
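The marker-assisted selection step of pyramiding can be sketched in a few lines. The example below is a minimal illustration, not the procedure used by Ashikari et al. (2005): the allele codes, plant identifiers, and genotype table are hypothetical, and the expected-frequency calculation assumes an F2 population in which the target loci segregate independently, which need not hold for any particular pair of QTLs.

```python
from fractions import Fraction

DONOR = "D"      # hypothetical code for the donor-parent allele
RECURRENT = "R"  # hypothetical code for the recurrent-parent allele


def is_pyramided(genotype: dict, target_loci: list) -> bool:
    """True if the plant is homozygous for the donor allele at every target locus."""
    return all(genotype[locus] == (DONOR, DONOR) for locus in target_loci)


def select_pyramided(plants: dict, target_loci: list) -> list:
    """Return identifiers of plants that carry all target QTLs in homozygous form."""
    return [pid for pid, genotype in plants.items() if is_pyramided(genotype, target_loci)]


def expected_f2_fraction(n_loci: int) -> Fraction:
    """Expected F2 fraction homozygous donor at n independently segregating loci."""
    return Fraction(1, 4) ** n_loci


# Hypothetical marker genotypes of three F2 plants at two target loci.
f2_plants = {
    "plant_01": {"Gn1": (DONOR, DONOR), "sd1": (DONOR, RECURRENT)},
    "plant_02": {"Gn1": (DONOR, DONOR), "sd1": (DONOR, DONOR)},
    "plant_03": {"Gn1": (RECURRENT, RECURRENT), "sd1": (DONOR, DONOR)},
}

if __name__ == "__main__":
    print(select_pyramided(f2_plants, ["Gn1", "sd1"]))
    print(expected_f2_fraction(2))
```

Under the independence assumption, only about one F2 plant in 16 is expected to be homozygous for the donor allele at both loci, which is one reason why marker genotyping, rather than phenotypic screening alone, makes such selection practical.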
[Figure 16.5 panels: graphical genotypes (chromosomes 1 to 12) of NIL-grain number, NIL-lodging resistant, NIL-grain size, NIL-salt tolerant, and NIL-early flowering, combined into a line with high grain number, lodging resistance, salt tolerance, large grain size, and early flowering.]
Fig. 16.5. QTL pyramiding. ILs, each carrying a desirable agricultural trait such as grain number, lodging resistance, salt tolerance, grain size, or early flowering, are intercrossed to combine these beneficial traits in one plant by MAS
16.3.5 QTL Detection Using Chromosome Segment Substitution Lines
Chromosome segment substitution lines (CSSLs) are lines that possess relatively large chromosome segments from the donor parent in the chromosomal background of the recurrent parent (Ebitani et al. 2005). These lines are produced by repeated backcrossing with a recurrent parent, systematic MAS, and selection of plants carrying the desired chromosome segments. In rice, which has 12 chromosomes, the donor chromosome segments in the CSSLs successively overlap from the top of chromosome 1 to the bottom of chromosome 12 (Fig. 16.6). These CSSLs can be considered a genomic library with huge genome inserts. Phenotypic characterization can reveal which chromosome fragment from the donor carries genes associated with an interesting trait. Identification of such traits takes place in conventional field experiments, in which CSSLs are grown under various conditions such as drought, high salt, or
[Figure 16.6 panel: graphical genotypes of P1, P2, and lines 1 to 48 across chromosomes 1 to 12, distinguishing P1 and P2 chromosome segments, followed by evaluation of phenotypes.]
Fig. 16.6. Graphical genotypes in ILs. Each IL carries a relatively large chromosome fragment from the donor (P1). A set of ILs whose successively overlapping donor fragments cover the whole genome can be considered similar to a genomic library of the donor. The number of ILs depends on the length of each chromosome fragment introgressed from the donor
pathogen exposure; lines exhibiting desired traits can then be selected. Lines showing, for example, increased stress resistance as compared to the recurrent parent probably contain donor chromosome segments carrying genes involved in stress tolerance. The production of CSSLs, the evaluation of their agricultural traits, and the isolation of lines carrying beneficial traits provide new material for breeding. Further, CSSLs allow for a highly effective search for and identification of target QTLs, which does not require the construction of linkage maps or statistical analyses and therefore is “user-friendly” in practical breeding programs. Moreover, each CSSL can be used for mapping and cloning QTL genes, and as a parental line for breeding. However, CSSLs may possess undesirable traits linked to the target gene(s), because the introgressed chromosome segments are rather large. These undesirable traits can be eliminated by chromosome recombination in progenies of heterozygous ILs; MAS techniques greatly facilitate this procedure. CSSLs are available for repeated evaluations of any traits, because they are easily propagated by self-pollination. One disadvantage in using CSSLs is the inability to identify traits controlled by two separated QTLs.
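Because neighbouring CSSLs carry overlapping donor segments, the region shared by all lines that show a trait gives a first estimate of where the responsible QTL lies. The following sketch illustrates this interval logic under simplifying assumptions: each line carries a single donor segment, positions are given in arbitrary map units, and all names and coordinates are hypothetical rather than taken from any real CSSL set.

```python
from typing import Iterable, NamedTuple, Optional


class Segment(NamedTuple):
    chrom: int     # chromosome number (1 to 12 in rice)
    start: float   # segment start, arbitrary map units (hypothetical)
    end: float     # segment end, arbitrary map units (hypothetical)


def shared_region(segments: Iterable[Segment]) -> Optional[Segment]:
    """Intersect the donor segments of all lines showing the trait.

    Returns None when the segments lie on different chromosomes or do not
    overlap, in which case a single shared candidate region cannot be inferred.
    """
    segments = list(segments)
    if not segments:
        return None
    chrom = segments[0].chrom
    if any(seg.chrom != chrom for seg in segments):
        return None
    start = max(seg.start for seg in segments)
    end = min(seg.end for seg in segments)
    return Segment(chrom, start, end) if start < end else None


# Hypothetical donor segments of three neighbouring CSSLs on chromosome 3.
cssl_segments = {
    "CSSL_07": Segment(3, 0, 30),
    "CSSL_08": Segment(3, 20, 50),
    "CSSL_09": Segment(3, 40, 70),
}

if __name__ == "__main__":
    # Suppose only CSSL_07 and CSSL_08 show the trait of interest.
    trait_lines = ["CSSL_07", "CSSL_08"]
    print(shared_region(cssl_segments[name] for name in trait_lines))
```

In this toy case the candidate region is the overlap between units 20 and 30 on chromosome 3. A fuller treatment would also exclude the part of that overlap carried by lines that do not show the trait, which is omitted here for brevity.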
16.4 Use of Wild Species as a Source of Diversity for Breeding
The genus Oryza includes 23 species (Vaughan et al. 2003). Only two of them, Oryza sativa (including the subspecies japonica and indica) and Oryza glaberrima, are cultivated crops. Wild rice relatives are a powerful source of allelic variation for breeding, because wild rice species are adapted to a wider range of specific environmental conditions than the cultivated species.
16.5 Molecular Breeding
Genetic engineering and biotechnology hold great potential for plant molecular biology and plant breeding as they promise to expedite the production of crop varieties with desirable characteristics. Transgenic approaches allow the use of any beneficial genes across species boundaries, as they eliminate the necessity to cross parent plants. Today, the cloning of genes and production of transgenic organisms have become routine tasks in plant science. These technologies are the basis of powerful and efficient strategies for producing “ideal” crop plants. Several transgenic crops, commonly referred to as genetically modified organisms (GMOs), with traits such as improved insect resistance and herbicide tolerance, have already been commercialized after passing stringent field trials. They benefit farmers by reducing production costs and field labor, and also reduce the amounts of pesticides and herbicides applied. Some lines generated by molecular methods have the potential to ameliorate malnutrition, particularly in developing countries; the so-called golden rice, with its dramatically increased β-carotene content, is a case in point (Ye et al. 2000; Al-Babili and Beyer 2005). The combined application of molecular breeding and exploitation of natural allelic variation is a highly promising approach to meeting the food demands of the next century.
16.6 Outlook
In 2004, the rice genome sequencing project was completed. Now, functional rice genomics is being enthusiastically pursued. Many novel genes have been detected and the molecular mechanisms underlying plant growth and development have been tackled. Rice genomics has many applications for general plant science; the comparison of sequence data from rice and Arabidopsis may improve our understanding of plant evolution and the differences between monocots and dicots. Similarly, comparison of genome
sequences within the genus Oryza will elucidate the evolution and domestication of rice. Insights derived from rice genomics will help to secure the world’s food supply. The application of rice genomics provides us with powerful strategies to improve the efficiency of rice breeding. In addition, the rice genome is very similar to that of major cereal crops such as maize, barley, and wheat. Therefore, rice genomics has very significant implications for cereal breeding programs in general. We hope that rice genomics will be widely accepted by scientists and breeders, and that it will contribute to the welfare of all humans.
References
Al-Babili S, Beyer P (2005) Golden Rice – five years on the road – five years to go? Trends Plant Sci 10:565–573
Ammiraju JS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim H, Sisneros NB, Blackmon B, Fang E, Tomkins JB, Brar D, Mackill D, McCouch S, Kurata N, Lambert G, Galbraith DW, Arumuganathan K, Rao K, Walling JG, Gill N, Yu Y, Sanmiguel P, Soderlund C, Jackson S, Wing RA (2006) The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res 16:140–147
An G, Jeong DH, Jung KH, Lee S (2005) Reverse genetic approaches for functional genomics of rice. Plant Mol Biol 59:111–123
Ashikari M, Sasaki A, Ueguchi-Tanaka M, Itoh H, Nishimura A, Datta SK, Ishiyama K, Saito T, Kobayashi M, Khush GS, Kitano H, Matsuoka M (2002) Mutation in a gibberellin biosynthetic gene, GA20 oxidase, contributed to the rice “Green Revolution”. Breed Sci 52:143–150
Ashikari M, Sakakibara H, Lin S, Yamamoto T, Takashi T, Nishimura A, Angeles ER, Qian Q, Kitano H, Matsuoka M (2005) Cytokinin oxidase regulates rice grain production. Science 309:741–745
Bennett MD, Leitch IJ (1995) Nuclear DNA amounts in angiosperms. Annal Bot 76:113–176
Bennett MD, Leitch IJ (2005) Nuclear DNA amounts in angiosperms: progress, problems, and prospects. Annal Bot 95:45–90
Causse MA, Fulton TM, Cho YG, Ahn SN, Chunwongse J, Wu K, Xiao J, Yu Z, Ronald PC, Harrington SE, Second G, McCouch SR, Tanksley SD (1994) Saturated molecular map of the rice genome based on an interspecific backcross population. Genetics 138:1251–1274
Devos KM (2005) Updating the 'crop circle'. Curr Opin Plant Biol 8:155–162
Doebley J, Stec A, Hubbard L (1997) The evolution of apical dominance in maize. Nature 386:485–488
Doi K, Izawa T, Fuse T, Yamanouchi U, Kubo T, Shimatani Z, Yano M, Yoshimura A (2004) Ehd1, a B-type response regulator in rice, confers short-day promotion of flowering and controls FT-like gene expression independently of Hd1. Genes Dev 18:926–936
Ebitani T, Takeuchi Y, Nonoue Y, Yamamoto T, Takeuchi K, Yano M (2005) Construction and evaluation of chromosome segment substitution lines carrying overlapping chromosome segments of indica rice cultivar ‘Kasalath’ in a genetic background of japonica elite cultivar ‘Koshihikari’. Breed Sci 55:65–73
El-Assal SED, Alonso-Blanco C, Peeters AJM, Raz V, Koornneef MA (2001) QTL for flowering time in Arabidopsis reveals a novel allele of CRY2. Nat Genet 29:435–440
Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, Liu J, Meller J, Elber R, Alpert KB, Tanksley SD (2000) fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science 289:85–88
Fridman E, Pleban T, Zamir D (2000) A recombination hotspot delimits a wild-species quantitative trait locus for tomato sugar content to 484 bp within an invertase gene. Proc Natl Acad Sci USA 97:4718–4723
Greco R, Ouwerkerk PB, Taal AJ, Favalli C, Beguiristain T, Puigdomenech P, Colombo L, Hoge JH, Pereira A (2001) Early and multiple Ac transpositions in rice suitable for efficient insertional mutagenesis. Plant Mol Biol 46:215–227
Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494
Hiei Y, Ohta S, Komari T, Kumashiro T (1994) Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. Plant J 6:271–282
Hirochika H (2001) Contribution of the Tos17 retrotransposon to rice functional genomics. Curr Opin Plant Biol 4:118–122
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
Iwata N, Omura T (1975) Studies on the trisomics in rice plants (Oryza sativa L.). III. Relation between trisomics and genetic linkage groups. Jpn J Breed 25:363–368
Jaiswal P, Ni J, Yap I, Ware D, Spooner W, Youens-Clark K, Ren L, Liang C, Zhao W, Ratnapu K, Faga B, Canaran P, Fogleman M, Hebbard C, Avraham S, Schmidt S, Casstevens TM, Buckler ES, Stein L, McCouch S (2006) Gramene: a bird’s eye view of cereal genomes. Nucl Acids Res 34 (Database issue):D717–723
Jeon JS, Lee S, Jung KH, Jun SH, Jeong DH, Lee J, Kim C, Jang S, Lee S, Yang J, Nam K, An K, Han MJ, Sung RJ, Choi HS, Yu JH, Yu JH, Choi JH, Cho SY, Cha SS, Kim SI, An G (2000) T-DNA insertional mutagenesis for functional genomics in rice. Plant J 22:561–570
Khush GS (2003) Productivity improvements in rice. Nutr Rev 61:114–116
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379
Kinoshita T (1995) Report of committee on gene symbolization, nomenclature and linkage groups. Rice Genet Newsl 12:9–153
Kojima S, Takahashi Y, Kobayashi Y, Monna L, Sasaki T, Araki T, Yano M (2002) Hd3a, a rice ortholog of the Arabidopsis FT gene, promotes transition to flowering downstream of Hd1 under short-day condition. Plant Cell Physiol 43:1096–1105
Konishi S, Izawa T, Lin SY, Ebana K, Fukuta Y, Sasaki T, Yano M (2006) An SNP caused loss of seed shattering during rice domestication. Science 312:1392–1396
Kurata N, Yamazaki Y (2006) An integrated biological and genome information database for rice. Plant Physiol 140:12–17
Kurata N, Nagamura Y, Yamamoto K, Harushima Y, Sue N, Wu J, Antonio BA, Shomura A, Shimizu T, Lin SY, Inoue T, Fukuda A, Shimano T, Kobuki Y, Toyama T, Miyamoto Y, Kirihara T, Hayasaka K, Miyao A, Monna L, Zhong HS, Tamura Y, Wang ZX, Momma T, Umehara Y, Yano M, Sasaki T, Minobe Y (1994) A 300 kilobase interval genetic map of rice including 883 expressed sequences. Nat Genet 8:365–372
Li C, Zhou A, Sang T (2006) Rice domestication by reducing shattering. Science 311:1936–1939
Lin HX, Yamamoto T, Sasaki T, Yano M (2000) Characterization and detection of epistatic interactions of three QTLs, Hd1, Hd2, and Hd3, controlling heading date in rice using nearly isogenic lines. Theor Appl Genet 101:1021–1028
Lin HX, Ashikari M, Yamanouchi U, Sasaki T, Yano M (2002) Identification and characterization of a quantitative trait locus, Hd9, controlling heading date in rice. Breed Sci 52:35–41
Lin HX, Liang ZW, Sasaki T, Yano M (2003) Fine mapping and characterization of quantitative trait loci Hd4 and Hd5 controlling heading date in rice. Breed Sci 53:51–59
Liu JP, Van Eck J, Cong B, Tanksley SD (2002) A new class of regulatory genes underlying the cause of pear-shaped tomato fruit. Proc Natl Acad Sci USA 99:13302–13306
Ma L, Chen C, Liu X, Jiao Y, Su N, Li L, Wang X, Cao M, Sun N, Zhang X, Bao J, Li J, Pedersen S, Bolund L, Zhao H, Yuan L, Wong GK, Wang J, Deng XW, Wang J (2005) A microarray analysis of the rice transcriptome and its comparison to Arabidopsis. Genome Res 15:1274–1283
McCouch SR, Kochert G, Yu ZH, Wang ZY, Khush GS, Coffman WR, Tanksley SD (1988) Molecular mapping of rice chromosomes. Theor Appl Genet 76:815–829
McCouch SR, Teytelman L, Xu Y, Lobos KB, Clare K, Walton M, Fu B, Maghirang R, Li Z, Xing Y, Zhang Q, Kono I, Yano M, Fjellstrom R, DeClerck G, Schneider D, Cartinhour S, Ware D, Stein L (2002) Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA Res 9:199–207
Nagao S, Takahashi M (1963) Trial construction of twelve linkage groups in Japanese rice (Genetical studies on rice plant, XXVII). J Facul Agr Hokkaido Univ Sapporo 53:72–130
Nishimura A, Ashikari M, Lin S, Takashi T, Angeles ER, Yamamoto T, Matsuoka M (2005) Isolation of a rice regeneration quantitative trait loci gene and its application to transformation systems. Proc Natl Acad Sci USA 102:11940–11944
Paterson AH, Lander ES, Hewitt JD, Peterson S, Lincoln SE, Tanksley SD (1988) Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms. Nature 335:721–726
Ren ZH, Gao JP, Li LG, Cai XL, Huang W, Chao DY, Zhu MZ, Wang ZY, Luan S, Lin HX (2005) A rice quantitative trait locus for salt tolerance encodes a sodium transporter. Nat Genet 37:1141–1146
Rensink WA, Buell CR (2005) Microarray expression profiling resources for plant genomics. Trends Plant Sci 10:603–609
Saito A, Yano M, Kishimoto N, Nakagahra M, Yoshimura A, Saito K, Kuhara S, Ukai Y, Kawase M, Nagamine T, Yoshimura S, Ideta O, Ohsawa R, Hayano Y, Iwata N, Sugiura M (1991) Linkage map of restriction fragment length polymorphism loci in rice. Jpn J Breed 41:665–670
Sasaki A, Ashikari M, Ueguchi-Tanaka M, Itoh H, Nishimura A, Datta S, Ishiyama K, Saito T, Kobayashi M, Khush GS, Kitano H, Matsuoka M (2002) Rice gibberellin synthesis gene. Nature 416:701–702
Spielmeyer W, Ellis MH, Chandler PM (2002) Semidwarf (sd-1), “green revolution” rice, contains a defective gibberellin 20-oxidase gene. Proc Natl Acad Sci USA 99:9043–9048
Takahashi Y, Shomura A, Sasaki T, Yano M (2001) Hd6, a rice quantitative trait locus involved in photoperiod sensitivity, encodes the alpha subunit of protein kinase CK2. Proc Natl Acad Sci USA 98:7922–7927
Vaughan DA, Morishima H, Kadowaki K (2003) Diversity in the Oryza genus. Curr Opin Plant Biol 6:139–146
Yamazaki Y, Jaiswal P (2005) Biological ontologies in rice databases. An introduction to the activities in Gramene and Oryzabase. Plant Cell Physiol 46:63–68
Yamamoto T, Lin HX, Sasaki T, Yano M (2000) Identification of heading date quantitative trait locus Hd6 and characterization of its epistatic interactions with Hd2 in rice using advanced backcross progeny. Genetics 154:885–891
Yano M, Sasaki T (1997) Genetic and molecular dissection of quantitative traits in rice. Plant Mol Biol 35:145–153
Yano M, Katayose Y, Ashikari M, Yamanouchi U, Monna L, Fuse T, Baba T, Yamamoto K, Umehara Y, Nagamura Y, Sasaki T (2000) Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. Plant Cell 12:2473–2483
Ye X, Al-Babili S, Kloti A, Zhang J, Lucca P, Beyer P, Potrykus I (2000) Engineering the provitamin A (beta-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm. Science 287:303–305
White PT (1994) Rice: the essential harvest. Natl Geogr 185:48–79
Wing RA, Ammiraju JS, Luo M, Kim H, Yu Y, Kudrna D, Goicoechea JL, Wang W, Nelson W, Rao K, Brar D, Mackill DJ, Han B, Soderlund C, Stein L, SanMiguel P, Jackson S (2005) The Oryza map alignment project: the golden path to unlocking the genetic potential of wild rice species. Plant Mol Biol 59:53–62
17 From Rice to Other Cereals: Comparative Genomics
Richard Cooke¹, Benoit Piégu¹, Olivier Panaud¹, Romain Guyot², Jérome Salse³, Catherine Feuillet³ and Michel Delseny¹
¹Laboratoire Génome et Développement des Plantes, UMR 5096 CNRS-IRD-Université de Perpignan, 66860, Perpignan, France; ²Laboratoire Génome et Développement des Plantes, UMR 5096 CNRS-IRD-Université de Perpignan, Centre IRD, 34394 Montpellier cedex 5, France; ³UMR Amélioration et Santé des Plantes, INRA-UBP, Domaine de Crouelle, 63100 Clermont Ferrand, France
Reviewed by Robert Henry and Elizabeth S. Dennis
17.1 Introduction
17.2 Origin and Evolution of Cereals
17.3 Use of Comparative Genomics to Improve Genome Sequence Annotation
17.4 Comparative Genomics and Conserved Noncoding Sequences: the Discovery of New Genes and New Signals
17.5 Comparative Phylogeny of Multigene Families
17.6 Revised “Circle Diagram” Model and Synteny Disruption
17.7 The Rice Genome as a Model for Map-Based Cloning in Cereals
17.8 Comparative QTL Mapping and Meta-Analysis of QTL
17.9 Comparative Expression Profiling
17.10 Comparative Biology in the Era of Genomics
17.11 Genome Sequencing in Grasses: Beyond the Model
Acknowledgments
References
17.1 Introduction
Success in sequencing the Arabidopsis genome in the mid-1990s (The Arabidopsis Genome Initiative 2000) demonstrated that sequencing larger genomes, in a reasonable period of time and at an affordable cost, would be feasible, and rice became the next challenge. Although slightly more
Richard Cooke et al.
than three times larger, the rice genome has been almost completely sequenced in about the same time frame (6 to 7 years). Indeed, two genomes have been independently sequenced, the japonica variety Nipponbare by the IRGSP consortium (International Rice Genome Sequencing Project 2005) and the indica variety 93-11 by the Beijing Genomics Institute (Yu et al. 2005). This pioneering work has now triggered new genomic sequencing projects on other cereals (Paterson et al. 2005). Rice was considered a model for several reasons: it has one of the smallest genomes among cereals (only 400 Mbp, compared to the 2,500 Mbp of maize or the 16,000 Mbp of bread wheat), it can be transformed much more easily than any other cereal, extensive genomic and germplasm resources already existed, and it is an essential crop for human nutrition. A major argument for sequencing the rice genome was the preliminary evidence, reported by a number of groups and very elegantly summarized by Moore and coworkers, that the organization of the chromosomes (or chromosome segments) of the various cultivated cereals was essentially syntenic with rice chromosomes (Moore et al. 1995). The inference was that most of the information derived from the rice genome would be immediately useful for the other cereal genomes and for the improvement of these crops. While the rice genome was being sequenced, questions concerning functional characterization of rice genes started to be addressed. Most of the recent advances, including bioinformatics, are reviewed in several chapters of this book and in a recent issue of Plant Molecular Biology (Wing 2005). A very significant continuation of rice genome sequencing is the evaluation of the genomes of wild species of rice, with bacterial artificial chromosome (BAC)-end sequences of each species representing between 10% and 19% of its genome (http://www.omap.org; Ammiraju et al. 2006; Chapter 15 in this book).
This makes rice a unique resource among the eukaryotes, with two genomes fully sequenced and a dozen other related ones physically mapped and partially sequenced. Therefore, now that the initial goal of sequencing has been reached, it is important to evaluate the usefulness of rice and Arabidopsis sequences and the limitations of their use for analyzing other genomes from the same botanical family and for attempting to improve these crops using molecular resources. Several articles have already described the perspectives (Rensink and Buell 2004; Paterson et al. 2005; Varshney et al. 2005; Xu et al. 2005; Paterson 2006). In this chapter, we first review our knowledge of the evolution of cereals and show how comparison of just two complete genomes and several hundred thousand expressed sequence tags (ESTs) from other species has already improved the annotation of the Arabidopsis and rice genomes. We then examine how much the synteny between different genomes is conserved and whether the concentric circle model of cereal genomes (Moore et al. 1995; Devos and Gale 1997; Devos 2005) fits with
the most recent data, followed by a consideration of how observation on rice is going to influence future analysis of the other cereal genomes. Finally, after discussion on how the rice genome information is used to isolate important genes in other cereals, we describe a few examples in which comparative genomics is already providing useful information.
17.2 Origin and Evolution of Cereals
The word “cereals” is a generic term ascribed to crop species that belong to the Poaceae family and are primarily grown for grain consumption. They are just a subset of the numerous monocotyledonous species, but are by far the most economically important cultivated plants worldwide. They dominate about 20% of the land area. The Poaceae family includes more than 8,000 species widely spread from tropical to temperate climates and exhibits a high diversity of morphological as well as physiological characteristics that make them well adapted to various environments (Kellogg 2001). The taxonomic status of the Poaceae family is well established for most of the genera. The major cereal species are distributed among four of the five main Poaceae subfamilies: maize, sorghum, and pearl millet belong to the Panicoideae (maize being a member of the Andropogoneae tribe); finger millet to the Chloridoideae; rice to the Bambusoideae (Oryzoideae tribe); and wheat, barley, oats, and rye to the Pooideae (wheat, barley, and rye being in the Triticeae and oat in the Aveneae tribes). The oldest recorded Poaceae fossil has been dated at 65 to 75 million years ago (Mya) (Prasad et al. 2005). The phylogenetic relationships within the family have been studied using various molecular data, which have led to the same dating for the origin of the radiation of the four subfamilies (Kellogg 2001). An important feature of the grass genomes is that they are very often polyploids, which complicates their analysis. The impact of polyploidization on grass genome evolution was recently reviewed (Levy and Feldman 2002; Moore and Purugganan 2005). Although they clearly behave as diploids, most grass genomes have undergone ancient whole-genome duplication (WGD) and are likely paleopolyploids. More recent events of polyploidization have also occurred in maize, wheat, oat, and sugarcane.
Within the Panicoideae, the divergence of maize and sorghum dates back to 15 to 20 Mya (Gaut and Doebley 1997), whereas maize separated from its ancestor, teosinte, about 3 Mya. In the Pooideae, wheat and barley diverged approximately 10 to 14 Mya, and wheat and rye approximately 7 Mya. The Triticum/Aegilops divergence is approximately 3 Mya and tetraploid wheat appeared approximately 0.5 Mya. In addition, Gaut et al. (1999) have extensively studied the Adh gene family in the Poaceae family and proposed an estimate of the molecular clock for the Adh1 gene, thus
providing a practical means to estimate the divergence time between any given pair of Poaceae species. The Oryzoideae tribe is also an interesting one with the Oryza genus represented by 23 species distributed all around the world. Only two species, belonging to the AA genome group, are cultivated, O. sativa and O. glaberrima. The other species are either diploids or allotetraploids presenting various genome combinations (Ge et al. 1999). They offer a vast resource to investigate the evolutionary history of this genus and the effects of polyploidization on genome structure and expression, and to elucidate the domestication process. All these phylogenetic data are of primary importance for the correct interpretation of comparative genomic data among cereals and dating of molecular events. All cereals have been derived from wild species through human selection and the domestication process has been particularly well studied in this family. The Fertile Crescent region of the Middle East is well documented as the “center of origin” and domestication for most of the economically important species such as wheat, barley, oats, and rye, as is Central America for maize, Asia and West Africa for rice, and West Africa for sorghum (Harlan 1992). Domestication of most of the cereals occurred concomitantly during the late Neolithic, i.e., 10,000 to 8,000 years BC (Tanno and Wilcox 2006). Interestingly, several domestication traits, such as nonshattering, reduced tillering, increased seed weight, or loss of seed dormancy, are common among the cereals (and referred to as the domestication syndrome). The genes or quantitative trait loci (QTL) controlling some of these traits have been tentatively mapped across some cereal species and found to be at orthologous positions on the chromosomes (Paterson et al. 1995). This demonstrated the potential usefulness of comparative approaches in mapping genes controlling agronomically important traits.
The availability of two rice genome sequences, one for the indica variety 93-11 and one for the japonica variety Nipponbare, has enabled extensive comparison between the two genomes. The gene order is highly conserved and only four inversions have been identified on the long arms of chromosomes 1, 4, and 8, and on the short arm of chromosome 1 (Han and Xue 2003). However, more than 16% of the sequence cannot be aligned owing to the presence of numerous insertions and deletions, mostly accounted for by transposable elements (TE) activity. These elements have been used to show that the two subspecies probably arose from two independent domestication events from two distinct gene pools that diverged between 0.5 and 1 Mya (Ma and Bennetzen 2004; Vitte et al. 2004).
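The divergence dates quoted in this section rest on molecular-clock reasoning of the kind proposed for Adh1. As a minimal sketch, assuming that synonymous substitutions accumulate at a constant rate r per site per year in both lineages, a pairwise synonymous distance Ks translates into a divergence time T = Ks / (2r). The default rate and the Ks value below are illustrative assumptions, not figures taken from this chapter, and the function name is hypothetical.

```python
def divergence_time_mya(ks: float, rate_per_site_per_year: float = 6.5e-9) -> float:
    """Convert a pairwise synonymous distance (Ks) into a divergence time in Mya.

    Assumes a strict molecular clock: both lineages accumulate substitutions
    at the same constant rate, hence the factor of 2 in the denominator.
    The default rate is an illustrative assumption.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("the substitution rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


if __name__ == "__main__":
    # Hypothetical Ks between two orthologous coding sequences.
    example_ks = 0.13
    print(f"Ks = {example_ks} -> about {divergence_time_mya(example_ks):.1f} Mya")
```

The estimate is only as good as the clock assumption; rate variation between lineages or genes shifts the result proportionally.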
17.3 Use of Comparative Genomics to Improve Genome Sequence Annotation
Most researchers access the databases through either sequence alignment programmes or keyword searches. In both cases, the quality of the “annotation tags” attached to the database sequences is essential for an optimal exploitation of data from systematic projects. As new genomes are completely and accurately sequenced, they provide models for subsequent analysis and annotation of the genomes of related species. Arabidopsis provided the first higher plant genome sequence (The Arabidopsis Genome Initiative 2000) and, as such, laid the foundations for all plant annotation to come. A homogeneous nomenclature, based on the controlled vocabulary of the Gene Ontology Consortium (Berardini et al. 2004; Harris et al. 2004) for functional annotation, was developed. However, although the sequence was highly accurate, the lack of large numbers of sequenced plant genes and the absence of similarity to animal, yeast, and bacterial sequences meant that the first annotation for Arabidopsis (see TIGR Arabidopsis database at http://www.tigr.org) was not only relatively inaccurate, but also contained large numbers of genes encoding “hypothetical” and “unknown” proteins. Several approaches were adopted to improve gene identification, most notably deep and large-scale EST studies and full-length cDNA sequencing (Seki et al. 2002). The use of sequence alignment against nucleotide and protein databases, in addition to in silico predictors, greatly improved gene identification. Between 2000 and 2005, the gene number in Arabidopsis increased from nearly 26,000 to close to 30,000 (Berardini et al. 2004) and individual gene models were considerably refined. Similar strategies are being applied to the rice genome (see Chapter 3 in this book), and rice now represents the model on which other grass genome annotation will be based.
However, there are some biological and physiological differences between rice, other cereals and noncereal monocotyledonous species: two typical examples are the extreme variations in flower and inflorescence morphologies and the different photosynthetic types (C3, C4, and CAM). Therefore, it is logical to assume that some specific sets or combinations of genes are unique in different species and that sequencing a single model genome will not be sufficient. The Arabidopsis annotation was largely exploited in the annotation of the rice genome (Rensink and Buell 2004). Although these two species diverged some 200 Mya, comparison of the complete rice genome sequence (International Rice Genome Sequencing Project 2005) with protein databases frequently identified proteins predicted by the Arabidopsis annotation as the best match. This led to both a more accurate identification of exons in rice genes that have homologues in Arabidopsis and, of course, a
reciprocal improvement in the Arabidopsis annotation. In addition to the genomic sequence, extensive collections of ESTs and full-length cDNAs (Kikuchi et al. 2003) have been developed for rice. A considerable amount of sequence information is also available for other grass species, essentially in the form of ESTs, although a full-length cDNA project for maize is under way (http://www.maizecdna.org/). Table 17.1 gives an illustration of the public EST resources for plant species. There are currently more than 30 million accessions and rice comes in third after human and mouse while wheat is second for plant ESTs. This information has already proven useful for the annotation of the rice genome, in the identification of genes for which no molecular information was available in Arabidopsis, and will clearly contribute to annotation of newly sequenced genomes. The first rice annotations predicted many more genes than in Arabidopsis, with up to 55,000 genes (Goff et al. 2002; Yu et al. 2002), among which some 50% were “rice specific.”

Table 17.1. EST numbers for major monocot (marked *) and dicot species in dbEST (April 28, 2006 release) (http://www.ncbi.nlm.nih.gov/dbEST/)

Species                       EST number (dbEST)
Oryza sativa *                1,183,548
Triticum aestivum *           853,146
Zea mays *                    734,267
Arabidopsis thaliana          622,966
Hordeum vulgare *             437,328
Glycine max                   356,918
Pinus taeda                   329,469
Saccharum officinarum *       246,301
Medicago truncatula           225,129
Solanum tuberosum             219,765
Sorghum bicolor *             208,466
Lycopersicon esculentum       199,875
Vitis vinifera                195,434
Festuca arundinacea *         41,834
Zingiber officinale *         38,083
Sorghum propinquum *          21,780
Brachypodium distachyon *     20,449
Allium cepa *                 19,582
Triticum monococcum *         10,139
Secale cereale *              9,301
Agrostis stolonifera *        8,992

Automated analysis has been considerably improved in the latest rice annotations and more recent studies
have shown that these “extra” genes in fact often correspond to transposable elements (Bennetzen et al. 2004), so that the actual gene number is probably less than 40,000 (see Chapter 3 in this book). This is a timely warning before large-scale in silico annotation of other cereal sequences begins. Maize and wheat genomes, for example, contain considerably greater numbers of TE-related sequences and adequate filtering will be needed to achieve a meaningful annotation. However, the high sequence similarity between cereals, notably in coding regions, will most probably allow the use of these rice versions of gene prediction algorithms without modification for annotation of other cereal genomes. Finally, two ongoing projects are devoted to manual curation of rice annotation by recognized experts on gene families (Yuan et al. 2005; Ohyanagi et al. 2006). An illustration of the power of cross-genome sequence comparison was recently provided by Katari et al. (2005). These authors used 595,321 shotgun reads of the Brassica oleracea genome as well as rice genomic sequences. These comparisons revealed conserved unannotated regions, some of them being organized in clusters within a 2-kb distance from each other. A total of 9,040 such clusters were identified in the Arabidopsis genome. A set of 112 clusters, located on chromosome 4, was selected and tested by reverse transcriptase-polymerase chain reaction (RT-PCR), resulting in the detection of 25 new transcripts. Most of them have a match in the rice sequence suggesting that they are biologically active and correspond to new genes which were previously missed. There is no doubt that these comparisons are also enriching the annotation of the rice genome and that this will be a choice method to improve the annotation of the future sequences of other cereal genomes when available. 
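The clustering step described for the conserved unannotated regions can be sketched as follows. This is a minimal illustration, not the procedure of Katari et al. (2005): it assumes that each conserved region is given as a (start, end) coordinate pair on one chromosome and that "within a 2-kb distance" means the gap between the end of the growing cluster and the start of the next region is at most 2,000 bp. The function names and the coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome, start <= end


def cluster_regions(regions: List[Interval], max_gap: int = 2000) -> List[List[Interval]]:
    """Group regions whose gap to the preceding cluster is at most max_gap bp."""
    clusters: List[List[Interval]] = []
    current: List[Interval] = []
    current_end = 0
    for start, end in sorted(regions):
        if current and start - current_end <= max_gap:
            # Close enough to the running cluster: extend it.
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            # Too far away (or first region): close the cluster and start anew.
            if current:
                clusters.append(current)
            current = [(start, end)]
            current_end = end
    if current:
        clusters.append(current)
    return clusters


def multi_region_clusters(clusters: List[List[Interval]]) -> List[List[Interval]]:
    """Keep only clusters made of at least two conserved regions."""
    return [cluster for cluster in clusters if len(cluster) >= 2]


if __name__ == "__main__":
    # Hypothetical coordinates of conserved, unannotated regions.
    hits = [(9000, 9200), (100, 300), (1500, 1700), (10500, 10800), (30000, 30100)]
    for cluster in multi_region_clusters(cluster_regions(hits)):
        print(cluster)
```

Clusters of this kind are only candidates; as in the study described above, experimental confirmation such as RT-PCR is still needed to show that they are transcribed.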
A collection of available maps and tools for comparisons within the grass family can be found in the Gramene (http://www.gramene.org) database (Jaiswal et al. 2006). A list of bioinformatic tools and useful databases for comparative genomics is given in Varshney et al. (2005) and in Chapter 14 of this book. To improve functional annotation, the comparative grass genomics resource Gramene, the comparative plant genomics resource database PlantGDB (Dong et al. 2005), and Oryzabase (Kurata and Yamazaki 2006) are contributing to the Gene Ontology Consortium. While still far from perfect, this represents a first step toward a more homogenous gene annotation which is an absolute requirement for maximum exploitation of comparative genomics, transcriptomics, and proteomics. In the near future, genomic drafts of maize, wheat, and sorghum will significantly increase the available dataset and considerably improve the significance of comparisons.
17.4 Comparative Genomics and Conserved Noncoding Sequences: The Discovery of New Genes and New Signals
An emerging field in comparative genomics is the study of conserved noncoding sequences. Despite divergence between species, some unannotated sequences are conserved between genomes, suggesting they might have a functional role that imposes their selection and conservation during evolution. At least a portion of these sequences are transcribed. Some correspond to previously unidentified protein-coding genes that escaped detection due to shortcomings in the gene prediction software, whereas others have allowed identification of new noncoding RNA genes such as microRNAs (miRNAs), small nuclear RNAs (snRNAs), and small nucleolar RNAs (snoRNAs). Tiling microarrays (Li et al. 2006b; Chapter 4 in this book) and direct cloning experiments confirm the authenticity of these unannotated transcripts. This situation is best illustrated by the discovery and comparative analysis of miRNAs (see Chapter 12 in this book). Plant miRNAs were first discovered in Arabidopsis, but it rapidly became clear that they are widespread in all plants and, to some extent, evolutionarily conserved (Bartel 2004; Dugas and Bartel 2004; Mallory and Vaucheret 2004). In silico analysis indicates that some Arabidopsis miRNAs are not represented in the rice genome but that an important fraction is conserved. The first described rice miRNAs were therefore homologs of the recently characterized Arabidopsis miRNAs (Bonnet et al. 2004; Wang et al. 2004b). More recently, direct cloning and sequencing experiments in rice revealed 13 new families, in addition to the 20 families already known by comparison with Arabidopsis (Sunkar et al. 2005). Four of these new rice miRNA families are present in the different monocots that have been analyzed (wheat, barley, maize, sorghum, and sugarcane), suggesting that a number of miRNAs might be specific to a group of related species.
Because each miRNA targets a specific gene, or members of a gene family, the inference is that the target genes should present the same specificity. This is illustrated for the control of leaf polarity in both Arabidopsis and maize, where the same transcription factors are controlled by the same family of miRNAs (Juarez et al. 2004). Comparative analysis between cereals will certainly validate further candidates that do not fit stringently all the criteria to be identified as a miRNA. The situation will become clearer when additional genomes such as maize and sorghum are sequenced. The analysis of rice miR444 has already revealed a puzzling new model for its biosynthesis: the pre-miR444 is apparently the result of a spliced transcript containing three exons. The gene structure and resulting miRNA is conserved in the tested cereals (Sunkar et al. 2005).
Besides transcribed sequences, comparisons between genomes identified short conserved motifs which might correspond to conserved regulatory signals and promoter elements (Guo and Moose 2003; Katari et al. 2005; Lockton and Gaut 2005; Paterson et al. 2005).
17.5 Comparative Phylogeny of Multigene Families
Plant genomes are characterized by the presence of large multigene families. A major goal in functional annotation is to analyze these families at a genome-wide level. Comparing multiple members of a gene family helps in defining the limits of the various genes, because the exon size is usually conserved to a large extent, and thus improves annotation. This also provides information on the representation of the genes initially described in Arabidopsis and sheds new light on the evolution of these genes. Beyond sequence comparison and phylogenetic analysis, these data provide the foundation for further comparative functional studies. A number of such multigene families have been extensively analyzed in the Arabidopsis genome, and for some of them the comparison has been extended to rice and, in some cases, to other cereals, although only incomplete data are available for the latter. Table 17.2 lists a few families that have been studied in both Arabidopsis and rice. The knowledge we have gained from these comparisons is summarized in the text that follows. This work has revealed a relatively complex picture, with certain members of some gene families having arisen before the monocot/dicot divergence, although most result from more recent duplications, including WGD. Extensive studies on individual families have also highlighted plant-specific and lineage-specific expansion in many families, illustrated in particular by work on the large kinase and transcription factor (TF) families, and shown that orthologous relationships are not always easy to elucidate. Receptor-like kinases (RLK) are one of the most abundant protein families in plants. There are about 600 members in Arabidopsis and more than 1,100 in rice. Obviously, the family has dramatically expanded in rice and presumably in other cereals (Shiu et al. 2004).
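To make the notion of lineage-specific expansion concrete, the short sketch below computes rice-to-Arabidopsis ratios for a handful of the families listed in Table 17.2. The gene counts are those given in the table; where the table reports a lower bound (for example ">800" for rice NBS-LRR genes), the bound itself is used, so the corresponding ratio is a minimum. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

# (Arabidopsis count, rice count) for a few families from Table 17.2.
# The rice NBS-LRR value is a lower bound in the table.
COUNTS: Dict[str, Tuple[int, int]] = {
    "Wall-associated kinases": (26, 125),
    "NBS-LRR": (128, 800),
    "GRAS TF": (32, 57),
    "MADS TF": (106, 77),
    "GH3": (19, 13),
}


def rank_by_expansion(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Return (family, rice/Arabidopsis ratio) pairs, most expanded in rice first."""
    ratios = [(family, rice / arabidopsis) for family, (arabidopsis, rice) in counts.items()]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


ranked = rank_by_expansion(COUNTS)

if __name__ == "__main__":
    for family, ratio in ranked:
        trend = "larger in rice" if ratio > 1 else "larger in Arabidopsis"
        print(f"{family:<26}{ratio:5.2f}  ({trend})")
```

A ratio alone says nothing about mechanism; distinguishing tandem from segmental duplication requires the chromosomal positions of the genes.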
The phylogeny suggests that the common ancestor to Arabidopsis and rice probably already had as many as 400 genes in this family and that large-scale expansion and fusions of novel domains have occurred in both lineages since divergence. For example, the subfamily which contains the rice Xa21 and Arabidopsis FLAGELLIN SENSITIVE2 (FLS2) genes has eight members in Arabidopsis, but more than 100 in rice. The expansion mechanisms seem to involve both large-scale chromosome segmental duplications as well as tandem repeats in Arabidopsis and mainly tandem duplications in rice.

Table 17.2. Estimated gene numbers in some selected Arabidopsis and rice gene families

Protein family                      Arabidopsis  Rice     Reference
RLK (receptor-like kinases)         >600         >1,131   Shiu et al. 2004; Fritz-Laylin et al. 2005; Dardick and Ronald 2006
Wall-associated kinases             26           125      Zhang et al. 2005
NBS-LRR                             128          >800     The IRGSP 2005
GATA TF                             29           28       Reyes et al. 2004
GRAS TF                             32           57       Tian et al. 2004
MAPK                                20           15       Hamel et al. 2006
MAPKK                               10           16       Hamel et al. 2006
Cytochrome P450                     272          455      Nelson et al. 2004
CBF/DREB TF                         6            10       Skinner et al. 2005
AP2/ERF TF                          146          161      Xiong et al. 2005
Dof TF                              36           30       Lijavetzky et al. 2003
NAC TF                              105          149      Olsen et al. 2005
MYB TF                              130          85       Xiong et al. 2005
BHLH TF                             164          180      Xiong et al. 2005
BZIP TF                             76           94       Xiong et al. 2005
WRKY TF                             72           100      Zhang and Wang 2005; Wu et al. 2005
MADS TF                             106          77       Xiong et al. 2005
CONSTANS-like                       17           16       Griffiths et al. 2003
Methyl-CpG-binding domain protein   13           16       Springer and Kaeppler 2005
Cyclin                              50           44       Wang et al. 2004a
ARF                                 23           29       Xiong et al. 2005
GH3                                 19           13       Térol et al. 2006

Dardick and Ronald (2006) recently surveyed yeast, worm, fly, human, Arabidopsis and rice “kinomes” representing 3,723 kinase genes. They demonstrated that the plant and animal pathogen recognition receptors can be distinguished from the members of the RLK family that control nondefense pathways. The latter have rarely been duplicated since the Arabidopsis–rice divergence. In contrast, the RLK genes involved in disease resistance have been extensively amplified. Analysis of synonymous/nonsynonymous substitution rates between duplicates indicates that rice duplications are not linked to recent domestication. Many disease resistance genes belong to the NBS-LRR type. There are 128 such genes in Arabidopsis but more than 800 in rice. In Arabidopsis these genes belong to two subgroups, the TIR (Toll/interleukin-1 receptor) type representing 65% of the genes and the CC (coiled-coil) type. However, the TIR genes are completely absent
from rice and other monocots, indicating that they presumably differentiated in dicot plants after their separation from the monocots. Wall-associated kinases are another subset of receptor-like kinases which has considerably expanded in rice (Zhang et al. 2005). About 30% of the corresponding genes had to be corrected or reannotated following comparison with Arabidopsis and barley genes. The phylogenetic analysis revealed that most of the wall-associated kinases of Arabidopsis and rice clustered in different species-specific clades, suggesting that only a few members of the family originated from an ancestor common to dicots and monocots. Rice and barley genes cluster in the same clades indicating that expansion in rice results from lineage-specific expansion of the family in cereals and possibly in monocots, occurring essentially through localized tandem duplications. MAP kinases (MAPK), MAP kinase kinases (MAPKK), and MAP kinase kinase kinases (MAPKKK) are key elements in signal transduction pathways. MAPK and MAPKK genes have recently been compared (Hamel et al. 2006). All rice and poplar MAPK genes can be placed in the four clades previously identified in Arabidopsis and putative orthologous genes can be identified for AtMPK3, AtMPK6, AtMPK7, and AtMPK14. Ancient divergence of the four classes is obvious, as the two main groups are also represented in Chlamydomonas. As in other gene families, more recent duplications are obvious, before and after the divergence between monocots and dicots. Similarly, four clades can be recognized in MAPKK genes, three of which have members present in all three genomes. The MPKK7-9 clades are not represented in the rice genome. In contrast there is a single MPKK gene in Chlamydomonas. A comparative inventory of TFs was carried out recently in order to determine to what extent the 1,510 TF genes in Arabidopsis are representative of the situation in other plants (Xiong et al. 2005). This study identified 1,611 TF genes in rice. 
They can be classified in the same 37 families, although some subfamilies are probably lineage-specific or have expanded more in one lineage than in the other. Clear orthologous pairs or groups were identified for nearly half of the genes, and it was estimated that at least 383 ancestral TF genes were present in the common ancestor of rice and Arabidopsis. After divergence of the two genomes, the TF genes have undergone whole genome duplications, tandem duplication, loss, and functional differentiation so that about 60% of the duplicated genes have been retained on duplicated chromosome segments. The comparisons are not yet complete and the following examples are just illustrations of what can be learned from comparative studies. GATA TFs are a group of DNA-binding proteins widely distributed among eukaryotes. They have a type IV zinc-finger motif in their binding domain. The phylogenetic analysis of 57 genes revealed the existence of
seven subfamilies, some of which are mutually exclusive to the two species. They have various additional domains (Reyes et al. 2004). The GRAS TFs are involved in a variety of developmental processes including radial organization of the root, gibberellin or the phytochrome transduction pathways. The Arabidopsis and rice genomes contain 32 genes and 57 genes, respectively. Among the rice GRAS TFs, 36 show allelic variations between indica and japonica subspecies. Eight subfamilies can be identified. Establishing orthologous relationships turns out to be difficult. In class I, there are six orthologous genes in both species. In class II, there are two unique rice genes but in Arabidopsis two correspond to the first gene and five to the second. A reciprocal situation was observed in class III in which there are five unique Arabidopsis genes but two homologues of each in rice. In class IV, there are two subfamilies of paralogous genes in both species with, respectively, five members each in rice and three and six in Arabidopsis. Class V is specific to rice with 26 genes and class VI specific to Arabidopsis with four genes. Recently, 50 maize sequences have been collected from ESTs databases and the TIGR index and used to confirm the phylogenetic analysis (Lim et al. 2005). CBFs (C-repeat binding factors) are TFs from the AP2/EREB/ERF family, which have been identified as key regulators of drought and cold stress response and adaptation. There are six genes in Arabidopsis, 10 in rice, but at least 20 in barley (Skinner et al. 2005). The phyletic analysis of the barley genes revealed three major subgroups, and all the other monocot genes fell within one of these subgroups. However, there are clear differences between rice and barley. For instance, the rice OsDREB1B.1 gene is the only rice member in the CBF4 subgroup in which there are seven barley genes; this gene is probably ancestral to the other members of the family. 
It remains to be determined whether this complexity is correlated with a physiological complexity of barley responses to stress. The whole AP2 gene family is larger in rice, with 161 genes, than in Arabidopsis (146 genes). Members of the Dof (DNA-binding with one finger) family (Lijavetzky et al. 2003) participate in the control of seed storage protein, defense mechanisms, germination, auxin, and gibberellin responses and many other processes, and represent a plant-specific family. The Arabidopsis genome contains 36 genes and rice at least 30. The genes cluster in four clades that are common to both species. This analysis allowed identification of putative orthologous genes and revealed ancestral gene duplications and gene loss events. Proteins containing a NAC domain form a plant-specific family of TF genes. They are involved in controlling development and meristem activity as well as responses to biotic and abiotic stress (Xiong et al. 2005). They can be classified in two major groups and 18 subgroups, based on
sequence similarity and phylogenetic analysis. Some of these subgroups are lineage-specific, four being specific to Arabidopsis and seven to rice. Several members of this gene family have also been isolated from maize and sugarcane and an extensive survey of this gene family across plant species has recently been carried out (Olsen et al. 2005). WRKY TFs are associated with response to biotic and abiotic stress with one or several domains containing the WRKY sequence. This family is plant specific. In both Arabidopsis and rice these genes can be grouped into three common clusters, indicating that they started to differentiate before the divergence of monocots from dicots. A single gene is present in Giardia lamblia, in Dictyostelium discoideum and in Chlamydomonas reinhardtii. The CO (CONSTANS) gene was initially described in Arabidopsis and shown to play a major role in the control of flowering time by the photoperiod. The Arabidopsis genome contains 17 CO-like genes which cluster into three subgroups. The rice genome has at least 16 genes belonging to this family (Griffiths et al. 2003). One of the major rice genes controlling flowering, Hd1 (Heading date 1), is a CO homolog. However, Arabidopsis is a long-day flowering plant, whereas rice is a short-day plant. Known cereal genes, including those from barley (Hordeum vulgare), can be clustered in each of the three Arabidopsis subgroups, indicating that their evolution predates monocot/dicot divergence. There seems to be a fourth subgroup specific to cereals. Several of the barley genes have been mapped and clear orthologous relationships have been established for HvCO1 and Hd1, HvCO3/OsB, and HvCO6/OsE. However, their functional role is much less clear since none of the barley genes coincides with the barley QTL for flowering time. The MADS box gene family has expanded less in rice with only 77 genes compared with more than 100 in Arabidopsis.
Detailed phyletic studies indicate that they started to diverge from a common ancestor more than 700 Mya. Nine types can be recognized and both Arabidopsis and rice have at least one member in each group, confirming that the separation into subfamilies predates the separation of monocots and dicots (Nam et al. 2003). This study was refined for the subgroup characterized by the floral genes SEPALLATA (Zahn et al. 2005). It was shown that this subgroup differentiated after separation of angiosperms from gymnosperms because it is not present in these species and that at least one duplication occurred before the monocot/dicot divergence. Several duplications have occurred later on, depending on the lineages. It is interesting to note that these TFs generally work as heterodimers and therefore successive duplications provide new possibilities of interaction between the different members of the family and may be important in the origin of flowers and diversification of their morphology.
Cytochrome P450s constitute another extremely large family in which the function of most genes remains to be established. Arabidopsis contains 246 genes and 26 pseudogenes whereas rice has at least 328 genes and 99 pseudogenes. These genes are grouped in roughly 60 families, which can be organized into clusters. Because this data set has been compared with ESTs from other plants, it has been possible to establish that much of cytochrome P450 diversity existed before the divergence between monocots and dicots and even predated the separation of angiosperms from gymnosperms. Some gene families turn out to be specific to a particular lineage, presumably illustrating metabolic specialisation and several, known to be present in other plants, seem to be absent in both rice and Arabidopsis genomes (Nelson et al. 2004). This report provides an impressive picture of the distribution of the different subfamilies of cytochrome P450 across plant phylogeny, which also illustrates how much data are still missing to get a representative picture. Cyclins are regulators of the cyclin-dependent protein kinases which control the cell cycle. The Arabidopsis genome contains 50 genes for cyclins that can be grouped into 10 classes. Four are plant specific and not represented in animal and a fifth one is shared by plants and protists. The rice genome has at least 44 cyclin genes, and most of them correspond to an Arabidopsis orthologue. However, the orthologous relationship is not always simple: for example, there is a single Arabidopsis cyclin gene CycD5, but four in rice and conversely there are three CycD3 in Arabidopsis and only one in rice (Wang et al. 2004a). The methyl CpG-binding domain proteins bind to methylated DNA and recruit chromatin modifying complexes. The human genome contains five genes coding for such proteins. Their number is much greater in Arabidopsis and rice (Springer and Kaeppler 2005). 
They can be divided into eight classes based on sequence analysis of the Arabidopsis and rice genomes. Two classes are apparently specific to the dicots. Close examination of the various sequences revealed evidence for domain shuffling and extensive duplications that occurred independently in the two lineages. In contrast to DNA methyltransferases, which are highly conserved between plants, animals and fungi, these proteins have extensively diverged from their animal counterparts. This observation suggests that while plants and mammals have retained similar mechanisms for establishment and maintenance of DNA methylation patterns, they might have evolved distinct mechanisms for the interpretation of these patterns. The GH3 gene family has been described in Arabidopsis and its members are implicated in hormone homeostasis through the conjugation of indole-3-acetic acid (IAA) and jasmonate (JA) to amino acids. In Arabidopsis, 19 genes have been described and they cluster into three subfamilies. Only 13 putative homologs have been identified in rice (Jain et al. 2006; Térol et al.
2006). Analysis of ESTs in a number of species indicated that the family is ancient, since three GH3 genes are detected in the moss Physcomitrella patens. The three subgroups differentiated before the separation of dicots and monocots, but they evolved rapidly after this divergence: within each group some subclusters are specific to either dicots or monocots, whereas others have conserved their orthologous relationships. These few comparisons illustrate the power of comparative analysis of gene families between Arabidopsis and rice.

Many aspects of development are similar in dicots and monocots and they are controlled by TFs which are functionally conserved. A typical example is the control of polarity of lateral organs: the adaxial/abaxial polarity of the leaf is controlled in Arabidopsis by the homeodomain-leucine zipper (HD-ZIP) genes PHABULOSA, PHAVOLUTA, and REVOLUTA. Clear orthologues can be identified in rice and maize (Juarez et al. 2004). On the other hand, flower development is different and it is sometimes difficult to find the rice ortholog of an Arabidopsis gene. It can be anticipated that analysing more gene families across more species will reveal a fascinating picture of plant gene structure and evolution. The corresponding diversity will need to be correlated with functional specialization in the different species, opening avenues for functional biology.
17.6 Revised “Circle Diagram” Model and Synteny Disruption

Comparative mapping studies using RFLP markers suggested well-conserved synteny between the grass genomes. This was summarized in the historical “Circle Diagram” article (Moore et al. 1995) and has since been regularly updated and refined (Devos 2005). Genetic mapping of a limited number of markers on several grass genomes with chromosome numbers between 5 and 12 showed that these genomes could be represented as projections of only 25 “rice linkage blocks,” suggesting that the smallest genome, that of rice, could be used to elucidate the organization of larger genomes, notably in map-based cloning approaches. This well-structured picture was called into question by the analysis of complete, high-quality genome sequences from the model plant species Arabidopsis (The Arabidopsis Genome Initiative 2000) and rice (Goff et al. 2002), which revealed extensive chromosomal segmental duplications. The number and extent of duplicated regions suggested several rounds of WGD in the two species, with the most recent detected in rice predating the divergence of the cereals (Guyot and Keller 2004; Yu et al. 2005). Detailed analysis of these regions has shown gene loss within duplicated segments, so that fewer than half the genes are retained as pairs in Arabidopsis homeologous segments (Blanc et al. 2000) and about one in five in
rice (Paterson et al. 2004; Salse et al. 2004). Studies on incomplete data from other species (Blanc and Wolfe 2004) and the dates of these WGD suggest that virtually all angiosperms are ancient polyploids, sharing some duplication events, while additional duplications have occurred more recently in different lineages, as in poplar (Sterck et al. 2005). The dynamics of gene loss in sister blocks is not yet clear, although asymmetric evolution has been described for one region of the rice genome (Wang et al. 2005a) and length heterogeneity is observed in many sister blocks in both Arabidopsis and rice. In addition, duplications that have occurred since the divergence of the grass species will further complicate the picture. For instance, maize, an ancient allotetraploid, has conserved only approximately 50% of gene pairs in sister regions (Lai et al. 2004) after some 5 to 10 million years, and it is highly probable that this diploidization is an ongoing process.

Several factors have contributed to improving and modifying the model: the release of the complete sequence of rice, the use of many more sequence-based markers such as ESTs, and statistical treatments of the data. Statistical analysis has been applied to define colinear regions within the maize genome and between grasses (Hampson et al. 2003) and the ADHoRe software was developed in order to align genomes (Van de Poele et al. 2002). However, these statistical assessments are not yet used as much as they should be.

In addition to this comparative mapping at the chromosome scale, the development of a large number of BAC libraries from different grass species has recently enabled direct sequence comparisons at a number of specific orthologous loci. Intermediate-resolution studies used BAC-end sequences and partial BAC sequences to align the various sequences. Comparative studies based on the finished rice genome sequence have been carried out between rice and several other grass species. Goff et al.
(2002) have reported a rather confusing picture of the rice–maize comparison, but they used relatively nonstringent criteria and did not apply any statistical treatment, so that many paralogs were mapped in place of orthologs. Salse et al. (2004) compared the genetic map of maize, in the form of mapped sequenced markers, to the rice genome by sequence alignment, using stringent criteria and statistical treatment to ensure that mostly orthologous positions are compared and that the alignments are significant. They concluded that the comparative concentric cereal maps inadequately represent the complexity of genome rearrangements.

In wheat, many more markers derived from ESTs have been mapped using a set of deletion lines (Qi et al. 2004) and aligned against the rice BAC sequences. It was concluded that wheat–rice synteny has been eroded (La Rota and Sorrells 2004; Linkiewicz et al. 2004; Munkvold et al. 2004; Peng et al. 2004). Guyot et al. (2004) also found frequent disruptions of microcolinearity between syntenic regions of rice chromosome 5S and
wheat chromosome 1AS. Similar analyses were carried out with sorghum (Bowers et al. 2005) and, to a lesser extent, with other cereal genomes (Devos 2005). Map comparisons were also carried out at the cytogenetic level: several sorghum chromosomes were compared to rice chromosomes (Kim et al. 2005). A high degree of colinearity was observed, although the sorghum genome is about twice as big as that of rice. The distal euchromatic regions of sorghum chromosomes 3 to 7 and 10 are on average 1.8 times larger than their rice counterparts and exhibit a lower recombination rate. The pericentromeric heterochromatic regions of sorghum are approximately 3.6-fold larger than in rice and recombination is much more strongly suppressed than in rice. Development of fluorescent in situ hybridization (FISH) technologies for high-resolution cytogenetic maps (Wang et al. 2006) should accelerate across-species cytogenetic comparisons. Following comparisons at the genetic map level, attempts to use physical map information were reported. The first example was a comparison between sorghum chromosome 3 and rice chromosome 1 (Klein et al. 2003). It demonstrated extensive conservation of gene content and order between these two chromosomes, perturbed only by one large-scale rearrangement and several smaller changes. More recently, BAC-end sequences of cv. Kasalath (indica) were aligned against cv. Nipponbare (japonica) sequences (Katagiri et al. 2004). This strategy was also used to align rice chromosome 3 with the maize and wheat genomes: the short arm of rice chromosome 3 is highly colinear with the short arm of maize chromosome 1 and the inverted long arm of maize chromosome 9, while the long arm of rice chromosome 3 is colinear with maize chromosome 1 long arm and inverted short arm of maize chromosome 5 (The Rice Chromosome 3 Sequencing Consortium 2005). Colinearity was also found between rice chromosome 3S and wheat 4BL/4DL or wheat 5AL and 4AS. 
In addition, rice chromosome 3L is conserved with wheat 5BL/5DL and 4DS or with wheat 4AL and 5AL. The analysis indicated that the B and D genomes of hexaploid wheat are more similar to each other than to the A genome. The two arms of the 4A chromosome are inverted relative to the 4B and 4D arms. The Nipponbare chromosome 3 sequence was also compared in the same study with that of its wild relative O. nivara and was found to be approximately 21% larger: 34 insertion blocks, 2 deletion blocks, and 36 invariable-sized blocks were identified. Finally, extensive comparison between the genomes of the various rice species is under way (Ammiraju et al. 2006).

A similar approach has been reported for rice chromosomes 11 and 12 (The Rice Chromosomes 11 and 12 Sequencing Consortia 2005), which are particularly interesting because they harbour a recent duplication of 3
Mbp dated to 7.7 Mya. Rice gene models from these two regions were essentially colinear with bin-mapped wheat EST contigs. The comparative distribution of rice gene homologs from chromosomes 11 and 12 to the seven wheat homologous groups indicates that (with the exception of the recent 3 Mbp duplication) the two rice chromosomes have different origins. This comparison of the syntenic regions between rice and wheat illustrates the presence of conserved genes alternating with more recently evolved genes.

The physical map (and sequence) of the Nipponbare genome was aligned with two sorghum physical maps. These comprise a syntenic component representing 41% of the sorghum BACs but 80% of the single-copy genes anchored on the map, and a nonsyntenic component containing 46% of the BACs but only 13% of the single-copy genes. The two components clearly correspond to the cytologically defined euchromatin and heterochromatin regions of the chromosomes. There is greater colinearity in recombinogenic regions than in nonrecombinogenic ones, supporting the hypothesis that rearrangements are usually deleterious or that nonrecombinogenic regions are particularly rich in transposable elements, the activity of which is known to lead to rapid genome differentiation. An interesting outcome of this comparison is that the sorghum physical map contigs could bridge 35 physical gaps in the rice sequence (Bowers et al. 2005). Therefore, these alignments at the BAC resolution scale largely confirm the colinearity hypothesis, even though more rearrangements than initially expected are detected.

Early studies on microsynteny by sequencing orthologous regions in several grass genomes have indicated numerous exceptions to microcolinearity and are reviewed in Bennetzen and Ramakrishna (2002), Feuillet and Keller (2002), and Bennetzen and Ma (2003). The main studies are summarized in Table 17.3.
The first comparison of local gene order across several cereal species was made for the Shrunken2/Anthocyaninless1 locus initially described in maize. The order of the four genes in this region (Sh2, X1, X2, and A1) is well conserved between rice, sorghum, and maize, but A1 is tandemly duplicated in sorghum. In maize, all the intergenic distances were larger, with more than 81 kbp between Sh2 and X1. The expansion of these regions was caused by the insertion of numerous long terminal repeat (LTR) retrotransposons following the divergence of maize and sorghum. When the corresponding region was analyzed in wheat, a complete rearrangement was observed (Bennetzen and Ma 2003).
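The kind of gene-order comparison described above can be illustrated with a minimal Python sketch. It is not the method used in the cited studies: it simply collapses tandem duplicates and reports the longest run of shared genes that appear in the same order in two regions (a longest common subsequence). The gene names come from the text; the function names and the rearranged order are hypothetical and serve only as illustration.

```python
def collapse_tandem(genes):
    """Merge adjacent repeats of the same gene name (tandem duplicates)."""
    collapsed = []
    for gene in genes:
        if not collapsed or collapsed[-1] != gene:
            collapsed.append(gene)
    return collapsed


def colinear_genes(order_a, order_b):
    """Return the longest list of genes found in the same order in both regions."""
    a, b = collapse_tandem(order_a), collapse_tandem(order_b)
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one longest colinear series.
    shared, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


rice = ["Sh2", "X1", "X2", "A1"]
sorghum = ["Sh2", "X1", "X2", "A1", "A1"]   # A1 tandemly duplicated, as in the text
rearranged = ["A1", "X2", "Sh2", "X1"]      # hypothetical order, for illustration only

print(colinear_genes(rice, sorghum))
print(colinear_genes(rice, rearranged))
```

Collapsing tandem copies before the comparison is a design choice made here so that a local duplication, such as A1 in sorghum, is not counted as a break in colinearity.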
Table 17.3. A list of the various loci that have been compared between different cereal species by direct sequencing of more than 100 kbp

Locus | Species | References
Sh2 | Rice, sorghum, maize, wheat | Bennetzen and Ma 2003
Adh1 | Rice, maize, sorghum | Ilic et al. 2003
Zeins | Rice, maize, sorghum | Song et al. 2002
Glutenins | Rice, wheat | Wicker et al. 2003; Gu et al. 2004
Lr10 (Leaf rust) | Wheat (different ploidy levels), rice | Feuillet and Keller 2002; Guyot et al. 2004; Isidore et al. 2005
Lr21 | Wheat (different ploidy levels) | Huang et al. 2003
Rph7 | Barley, rice | Brunner et al. 2003
Rpg1 | Barley | Feuillet and Keller 2002
Rp1 | Maize, sorghum | Ramakrishna et al. 2002
Vrn1, SnoRNA | Rice, barley | Yan et al. 2003
r1/b1 | Rice, maize, sorghum | Swigonova et al. 2005
Orp1/Orp2 | Maize, rice, sorghum | Ma et al. 2005a
Phd-H1 | Barley, rice | Dunford et al. 2002
Lrs1/lg2 | Maize, rice | Langham et al. 2004
Ha (Hardness) | Barley, rice, wheat | Caldwell et al. 2004; Chantret et al. 2005
Bronze | Maize, rice | Fu and Dooner 2002; Lai et al. 2005
Comparative sequencing of genomic regions from maize, sorghum and rice corresponding to the maize Adh1 locus also revealed interrupted synteny. The rice genome around the Adh1 locus is colinear with a region of maize chromosome 4. On the other hand, the maize and sorghum Adh1 regions are colinear with another region of the rice genome, suggesting that in Andropogoneae, the Adh1 locus has been transposed to another location (Bennetzen and Ma 2003). As a result, the Adh1 gene is present in sorghum and on one of the maize homologous regions, whereas it is absent in the orthologous rice region. Indeed, the Andropogoneae lineage acquired a two-gene insertion containing the Adh1 gene. Rice shares 11 orthologous genes with the sorghum Adh1 region and shows a single tandem gene duplication. This microcolinearity was interrupted in two regions where a total of four sorghum genes do not have homologs on the rice BAC. The longest noncolinear segment is a 10-kbp stretch that contains the sorghum Adh1 gene and an adjacent gene. As expected, two homologous regions are found in the maize genome, on chromosomes 1 and 5. While highly conserved gene order was found between rice and sorghum, more than 40% of genes from each maize homeologous region were deleted. Many insertions occurred in the common ancestor to maize and sorghum compared to rice and additional insertions/deletions occurred in maize between the two homologous loci (Ilic et al. 2003).
Comparison of gene clusters that encode storage proteins in maize (zein) and sorghum (kafirin) with the orthologous region in rice (Song et al. 2002) showed similar rearrangements. Six rice genes present in this region are found in the maize and sorghum orthologous regions. Kafirins and zeins are closely related in structure and are tandemly duplicated at orthologous positions, although the copy numbers of these genes differ in the two species and even within maize varieties. There is no homologous storage protein gene at this position in rice. Several genes flanking the storage protein genes in maize and sorghum are also missing in this rice region. Again, the sorghum and maize genomic regions have considerably expanded as a result of the presence of other genes, gene amplification and the presence of transposable elements.

In wheat, the low-molecular-weight (LMW) glutenin region (another cluster of storage protein genes) has almost completely diverged from the rice sequence (Wicker et al. 2003). The high-molecular-weight (HMW) glutenin cluster was compared between the A, B, and D genomes from Triticum turgidum (AABB) and Aegilops tauschii (DD). Although gene colinearity is broadly retained, four out of six genes are disrupted in the orthologous region of the A genome, including the two paralogous HMW glutenin genes (Gu et al. 2004).

The Lr10 locus (Leaf rust) of wheat is located on chromosome 1AS and is responsible for resistance to the fungal pathogen Puccinia triticina in hexaploid wheat. DNA sequencing of 211 kbp in the orthologous region of diploid wheat Triticum monococcum cv. 92 (A genome) revealed the presence of six genes, including two NBS-LRR genes (RGA1 and RGA2) that are thought to be involved in fungal resistance. Comparative analysis of this region with rice revealed numerous rearrangements: the two RGA genes are absent in rice and there is an inversion and a tandem duplication.
Intergenic distances have increased in wheat due to the insertion of a number of LTR retrotransposons. Several other regions containing disease resistance genes have been compared, such as the Rph7 locus between barley and rice (Brunner et al. 2003) or the Rp1 gene of maize with the Rph locus of sorghum (Ramakrishna et al. 2002). Major rearrangements were again observed, and very often some of the disease resistance genes present in one species are absent in the other.

Maize and wheat are particularly interesting species because of their polyploid nature, which allows the comparison of homologous loci within the same species. Wheat is a recent polyploid and is particularly relevant to the study of rearrangements after polyploidization. Recent comparison of the hardness locus (Ha) in wheat genomes (Chantret et al. 2005) and with rice (Caldwell et al. 2004), as well as analysis of the r/b and Orp1/Orp2 regions of maize compared to the other crops, have revealed a number of interesting features related to genome rearrangements.
Although the barley and wheat Ha regions are both colinear with a rice region, the Ha gene is not present in rice and many rearrangements have occurred since the divergence of the three species. In addition, the Ha genes that are present in Triticum monococcum and barley have been deleted in hexaploid wheat, indicating a fast rate of evolution of this region.

The r1 and b1 genes code for helix–loop–helix proteins that control anthocyanin biosynthesis and are located on homologous chromosomes 10L (r1) and 2S (b1) in maize. The orthologous regions (>600 kbp) have been compared in maize, sorghum, and rice (Swigonova et al. 2005). The two homologous maize regions have undergone complete or partial gene deletions, selective retention of orthologous genes, and insertion of nonorthologous genes. The rice orthologous region has three r genes and sorghum has two. Additional r genes exist elsewhere in these genomes: a phylogenetic analysis showed that they form clades according to their taxonomic origin, indicating that they all derive from a single ancestral gene that was duplicated independently in rice, maize, and sorghum. The two r genes in sorghum duplicated about 8.3 Mya, whereas the r and b genes of maize arose from a more recent event (less than 3 Mya). Within the maize 2S region, there is a single zeatin O-glucosyl transferase gene, but there are three tandemly organized copies in rice and five in sorghum, indicating a complex history and evolution of this region. Similar conclusions have also been reached in several other studies on gene colinearity in cereals at other loci such as Vrn1 (Yan et al. 2003), Phd-H1 (Dunford et al. 2002), Lrs1/Lg2 (Langham et al. 2004), and Orp1/Orp2 (Ma et al. 2005a).

The most surprising observation is the intraspecific breakdown of colinearity in maize at the Bz (Bronze) locus on chromosome 9S between two lines, McC and B73, and with rice (Fu and Dooner 2002).
This was confirmed by sequencing more than 2.8 Mbp from corresponding regions in two maize inbred lines, B73 and Mo17 (Brunner et al. 2005): more than 50% of the compared regions were not colinear and two thirds of the genes were absent in one of these loci. This has since been shown to be due to the activity of Helitron rolling-circle transposons (Lai et al. 2005; Morgante et al. 2005), a new class of eukaryotic transposons.

Although these results are apparently in conflict with the “Circle Diagram” picture at a microscale level, there is no doubt that the model reflects the conservation of genome structure at the level of several megabases or tens of megabases. It should be remembered that about 30% of the rice genome consists of transposable elements, that probably more than 50% of the maize genome is occupied by retroelements, and that this proportion is close to 80% in the wheat genome (SanMiguel et al. 1998). Finally, considering the 65 to 70 million years of evolution
within the cereal family, it is not surprising that extensive exceptions to or violations of the colinearity rule are observed between the grass genomes. It was more surprising to observe such violations between inbreds of maize, indicating that the different grass genomes may have different evolutionary dynamics. Only large-scale comparisons will show the level of conservation of colinearity at the individual gene level and will also provide clues as to the best means of exploiting existing synteny. In many cases, a global approach, taking into account all duplicated regions in both species under consideration, should allow the identification of many candidate genes in orthologous regions.
17.7 The Rice Genome as a Model for Map-Based Cloning in Cereals

The discovery of extensively conserved synteny between rice and other cereals has opened up the exciting prospect of using the rice genome data to support positional cloning of genes from large and complex genomes in a so-called “cross genome map-based cloning” approach (Feuillet and Keller 2002; Paterson et al. 2005; Xu et al. 2005). The availability of rice genome sequences provides a large number of tools which facilitate positional cloning of rice genes in other rice varieties. The sequences of indica and japonica rice genomes have been compared and a number of convenient and easy-to-use markers such as simple sequence repeats (SSR; McCouch et al. 2002) and single-nucleotide polymorphisms (SNP; Feltus et al. 2004; Shen et al. 2004) have been developed. This has greatly facilitated the positional cloning of many important genes and QTLs in rice in the past few years (e.g., Ashikari et al. 2005; Bradbury et al. 2005; Ueguchi-Tanaka et al. 2005; Albar et al. 2006; Konishi et al. 2006; Li et al. 2006a and examples in Chapters 7 and 16 in this book). Because many of these genes are conserved in other cereals, this has also facilitated the isolation and characterization of orthologs and paralogs from the other species. One of the best examples of colinearity in gene type and function is provided by the “green revolution” dwarfing genes: sd1 in rice (Monna et al. 2002; Nagano et al. 2005), Rht-1 in wheat, and D8 in maize (Peng et al. 1999; Nagano et al. 2005).

A key locus for modern polyploids, the Ph1 locus, which prevents pairing between homeologous chromosomes in bread wheat, has been characterized via a two-part cloning strategy (Griffiths et al. 2006). First, the conservation between the wheat Ph1 locus and the rice and Brachypodium sylvaticum orthologous regions provided
markers to saturate the wheat region. Second, the authors used a set of deletions of the wheat Ph1 region to restrict the locus to a 2.5-Mbp segment of wheat chromosome 5B. This region consists of a segment of subtelomeric chromatin that inserted into a cluster of Cdc2-related genes after polyploidization. The diploid progenitor of the B genome, Aegilops speltoides, does not possess this insertion.

Orthologs of the wheat vernalization gene Vrn1 (Yan et al. 2003) and the barley photoperiod gene Ppd-H1 (Turner et al. 2005) have been found in rice. The maize barren stalk1 mutation mapped to a region syntenic with that of the rice lax panicle gene. Once the latter was cloned (Komatsu et al. 2003), it was straightforward to isolate Ba1 (Gallavotti et al. 2004). Similarly, the maize thick tassel dwarf 1 (td1) mutation mimics the Arabidopsis clv1 mutation (Bommert et al. 2005). They were shown to be determined by orthologous genes shortly after the orthologous rice mutation floral organ number 1 (fon1) was identified by positional cloning (Suzaki et al. 2004). The barley sdw3 dwarfism gene was also cloned using synteny with rice (Gottwald et al. 2004). Thus, genes and QTLs that are involved in developmental processes and have been selected during domestication generally show good conservation between cereal genomes. In these cases, the rice genes are good candidates for the direct isolation of the orthologs from other cereal genomes.

In contrast, other types of genes do not show colinearity between the grass genomes. Indeed, there was, until recently, no example of colinearity retained for disease resistance (R) genes between grass genomes. Accordingly, rice genome information was of little help in the map-based cloning of R genes. Their nonsyntenic location between cereals was described almost 10 years ago through comparative genetic analysis (Feuillet and Keller 2002).
For example, the rice genome contains genes homologous to the wheat Lr10 and Pm3 fungal disease R genes (Guyot et al. 2004; Yahiaoui et al. 2004) but at nonorthologous positions, indicating massive genome rearrangements. In the case of the barley Rpg1 stem rust R gene, there is no orthologous gene present in the rice genome (Brueggeman et al. 2002). Only one exception to this rule has been reported recently (Chen et al. 2005): a QTL conferring resistance to the blast fungus Magnaporthe grisea is conserved in rice and barley at the same homologous location and with the same race specificity. However, even if a gene is not conserved at its orthologous position in rice, very often the flanking genes are still conserved, so synteny approaches are expected to provide a collection of useful markers to saturate the landing region in the other cereal genomes.
Examples of map-based cloning projects in Pooideae and the extent to which the rice genome has been used for gene cloning are given in Table 17.4 and provide an illustration of the power of the cross-genome map-based approaches. Other examples of attempts to isolate orthologous genes using rice were reported in maize (Bortiri et al. 2006a, 2006b) and a few other species. Pennisetum squamulatum and Cenchrus ciliaris (buffelgrass) are two apomictic members of the Poaceae. Although rice is not apomictic, the apospory specific genomic region (ASGR) is colinear with the region of rice chromosome 11 that is proximal to the centromere. Therefore, Pennisetum contigs could be organized by using rice markers (Gualtieri et al. 2006). Synteny between rice, sorghum, and sugarcane was also used to generate markers and to saturate the genetic region for the map-based cloning of a major leaf brown rust resistance (Bru1) gene from sugarcane (Le Cunff et al. 2007).

In cases where the colinearity is too low, alternative strategies, such as transposon tagging, the use of more closely related species or direct map-based cloning in the species of interest, have to be employed. As an example, the Ramosa1 gene, which controls the architecture of the maize tassel, is specific to the Andropogoneae tribe and is lacking in rice. It was isolated using a transposon-tagging strategy (Vollbrecht et al. 2005). Several additional publications describe similar difficulties that have prevented direct identification of the candidate gene (Brunner et al. 2003; Miftahudin et al. 2005).

The observation that many small rearrangements have disrupted the conservation between grass genomes during their evolution has led cereal geneticists to increase the number of markers in their species through the development of international initiatives. These efforts have allowed the development of large collections of genomic tools in species such as wheat, barley, ryegrass, sorghum, and sugarcane.
This has also resulted in a paradigm shift in how comparative analysis and model genomes are exploited for gene cloning in cereals. Rice is no longer considered a surrogate for chromosome walking in the other nonmodel genomes; instead, it is used as a source of markers to saturate target genetic regions during map-based cloning in the other cereal genomes.

Because of extensive gene duplications, further studies might be necessary to establish the relationship between genome colinearity and the function of gene products (Moore and Purugganan 2005). Several studies on the C-class floral homeotic genes in the angiosperms already provide an example of the complexity of the duplication/functional evolution relationship (Kramer et al. 2004). Similarly, sequence comparison shows that the orthologous gene to the Arabidopsis AGAMOUS (AG) in Antirrhinum is FARINELLI (FAR), as is confirmed by genome colinearity; however, the functional homolog of AG is PLENA, a paralog of FAR derived from an
ancient (approximately 125 Mya) duplication common to both species (Causier et al. 2005). As gene loss or subfunctionalisation after speciation events are (as far as we know) independent in diverged species, it is probable that similar situations will be found for many genes resulting from duplication in the grass genomes.

Table 17.4. A summary of map-based cloning in cereals (successful and in progress)

Gene | Species | Function | Orthologous position in rice and colinearity | References
Lr10 | T. aestivum | Disease resistance | No, Disrupted | Guyot et al. 2004
Pm3 | T. aestivum | Disease resistance | No, Disrupted | Yahiaoui et al. 2004
Vrn1 | T. monococcum | Growth habit | 3, Yes | Yan et al. 2003
Vrn2 | T. monococcum | Growth habit | 3, Yes | Yan et al. 2004
Q | T. aestivum | Free threshing | Not used | Simons et al. 2006
Lr21 | Ae. tauschii | Disease resistance | No | Huang et al. 2003
Ph1 | T. aestivum | Chromosome pairing | 9, Yes | Griffiths et al. 2006
Ph2 | T. aestivum | Chromosome pairing | 1, Yes | Sutton et al. 2003
Ppd-H1 | H. vulgare | Photoperiod response | 7, Yes | Turner et al. 2005
Ror1 | H. vulgare | Disease resistance transduction pathway | Not used | Collins et al. 2001
Ror2 | H. vulgare | Disease resistance transduction pathway | 3, Yes | Collins et al. 2003
Rpg1 | H. vulgare | Disease resistance | 6, Yes | Brueggeman et al. 2002
Mla | H. vulgare | Disease resistance | Not used | Shen et al. 2003
Rym4/Rym5 | H. vulgare | Virus resistance | Not used | Pellio et al. 2005
Rar1 | H. vulgare | Disease resistance transduction pathway | No, Disrupted | Lahaye et al. 1998
Rht1 | T. aestivum | Dwarfing gene | 3, Yes | Peng et al. 1999; Ikeda et al. 2001
Tga1 | Zea mays | Teosinte glume architecture | 8, Yes | Wang et al. 2005b
Vgt1 | Zea mays | Transition from vegetative to reproductive phase | 8, Yes | Salvi et al. 2002; Chardon et al. 2004
Ra2 | Zea mays | Ramosa (Tassel branching) | 1, Yes | Bortiri et al. 2006a
Ra3 | Zea mays | Ramosa (Tassel branching) | 2, Yes | Bortiri et al. 2006b; Satoh-Nagasawa et al. 2006
Pt2 | Zea mays | Inflorescence architecture | Yes | Bortiri et al. 2006b
Ba1 | Zea mays | Barren stalk 1 | 1, Yes | Komatsu et al. 2003; Gallavotti et al. 2004
Td1 | Zea mays | Thick tassel dwarf | 6, Yes | Suzaki et al. 2004; Bommert et al. 2005
Ts4 | Zea mays | Tassel seed 4 | 12, Yes | Bortiri et al. 2006b
Bru1 | S. officinarum | Disease resistance | 2, Yes | Le Cunff et al. 2007
ASGR | Pennisetum squamulatum | Apospory specific genomic region | 11, Yes | Gualtieri et al. 2006
17.8 Comparative QTL Mapping and Meta-Analysis of QTL

Very often, agronomic traits do not simply segregate in populations as single Mendelian genes, but rather as quantitative trait loci (QTL) that are located at different positions in the genome, each accounting for a fraction of the variance. A major challenge in genomics is to isolate such QTLs. This has proven to be difficult because QTLs are revealed only in specific populations and therefore require intensive data collection and multiple assays on different segregating populations. Many QTL studies have been carried out by breeders in most cultivated cereal plants. Such experiments are very well documented in rice, in which thousands of QTLs affecting many traits have been identified. Most of them are described in the Gramene public database and in Chapters 7 and 16 of this book. A few examples are drought tolerance (Yue et al. 2006), plant architecture, flowering time (Thomson et al. 2006), grain filling and size, aroma, seed dormancy, shattering (Konishi et al. 2006; Li et al. 2006a), submergence, aluminium tolerance, disease resistance, and salinity tolerance. Similar QTLs are being sought and analyzed in other cultivated cereals.

One possibility to reduce the complexity of the analysis is to take advantage of the QTL analyses which are often available in the literature for the same trait in different, independent experiments. A convenient approach consists of pooling data across linkage studies into a single genetic system, so that the pooled position estimate is more precise than the confidence intervals obtained independently from separate studies, each of which has low statistical power: this strategy is called meta-analysis of QTL (Etzel and Guerra 2002). In such studies, the consensus confidence interval is reduced. The meta-analysis concept was adapted to QTL data by Goffinet and Gerber (2000).
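Why pooling narrows the consensus confidence interval can be seen in a minimal Python sketch of fixed-effect, inverse-variance pooling. This is a simplification rather than the procedure of Goffinet and Gerber (2000) or of any particular software: it assumes that all studies detect the same single QTL, that positions are already expressed on a common map, and that each reported interval is a symmetric 95% interval. The function name and the example values are hypothetical.

```python
import math


def pool_qtl_positions(estimates, z=1.96):
    """Pool independent estimates of one QTL position.

    estimates: list of (position_cM, ci_low_cM, ci_high_cM) on a common map.
    Returns (pooled_position, pooled_ci_low, pooled_ci_high).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    weighted_sum = 0.0
    total_weight = 0.0
    for position, ci_low, ci_high in estimates:
        # Assumption: a symmetric 95% interval, so half its width is z standard deviations.
        sd = (ci_high - ci_low) / (2.0 * z)
        if sd <= 0:
            raise ValueError("each confidence interval must have positive width")
        weight = 1.0 / sd ** 2          # precise studies weigh more
        weighted_sum += weight * position
        total_weight += weight
    pooled = weighted_sum / total_weight
    pooled_sd = math.sqrt(1.0 / total_weight)
    return pooled, pooled - z * pooled_sd, pooled + z * pooled_sd


# Hypothetical QTL positions (cM) from three independent studies of the same trait.
studies = [(52.0, 40.0, 64.0), (47.0, 35.0, 59.0), (55.0, 37.0, 73.0)]
position, low, high = pool_qtl_positions(studies)
print(f"consensus position {position:.1f} cM, interval {low:.1f} to {high:.1f} cM")
```

Because the pooled variance is the reciprocal of the summed weights, the consensus interval is always narrower than the narrowest input interval, provided the single-QTL assumption holds.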
Meta-QTL analysis has been carried out in animal and human genetics, as well as in plant breeding, for example to study the genetic architecture of flowering time in maize (Chardon et al. 2004). Bioinformatic tools have been developed to support meta-QTL analysis. Biomercator (Arcade et al. 2004) is a graphical interface that allows QTLs obtained in independent genetic backgrounds, for different traits, and at different locations to be projected onto a single consensus genetic map. The
program integrates the independent genetic maps into a consensus map (map projection algorithm) and recalculates the corresponding initial QTL into the most likely consensus QTL distribution (Meta-analysis algorithm). A prerequisite in performing meta-QTL analysis is to have access to a dense consensus genetic map that allows the projection of as many QTLs as possible from separate studies. This situation is met for most cereals. It is possible to integrate and compare different genetic maps that share common markers. Thus, markers and QTLs that could not be mapped in one study may be placed on the basis of their relative positions to common markers in another study. Every additional map contains a novel set of markers and may segregate for new phenotypes, thereby providing unique and valuable genetic information. A graphical consensus genetic map can be constructed using any of the following approaches: (1) Graph-Theoretic approach (Yap et al. 2003); (2) CarthaGene (http://www.inra.fr/bia/CarthaGene/), JoinMap (http://www.kyazma.nl), Multilocus Consensus Genetic Maps (MCGM; Mester et al. 2006); and (3) Biomercator (Arcade et al. 2004). The observation that the gene order is roughly conserved along the chromosomes of different species belonging to the same family such as in the case of the cereals, and that equivalent QTLs are often located in syntenic homologous positions, led to the extension of the application of meta-analysis across species (Chardon et al. 2004). This requires that experimental data are collected and stored in specific databases that are regularly curated. Such an information system was set up by the International Maize and Wheat Improvement Center (CIMMYT) and several other international institutes belonging to the Consultative Group for International Agricultural Research (CGIAR) in the form of ICIS (International Crop Information System) more than 10 years ago. 
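The projection of a locus from one map onto a consensus map can be pictured with a minimal sketch: the locus is placed by linear interpolation between its two flanking markers that are common to both maps. This assumes that marker order is conserved between the maps, and it is only one simple way to perform such a projection, not necessarily the algorithm implemented in Biomercator. Marker names and positions are hypothetical.

```python
from bisect import bisect_right


def project_position(position, source_map, consensus_map):
    """Project a position (cM) from source_map onto consensus_map.

    Both maps are dicts of marker name -> position in cM on one linkage
    group. Only markers present in both maps are used as anchors.
    """
    common = sorted(
        (source_map[m], consensus_map[m]) for m in source_map if m in consensus_map
    )
    if len(common) < 2:
        raise ValueError("need at least two common markers")
    source_positions = [s for s, _ in common]
    consensus_positions = [c for _, c in common]
    if consensus_positions != sorted(consensus_positions):
        raise ValueError("marker order differs between the two maps")
    if not source_positions[0] <= position <= source_positions[-1]:
        raise ValueError("position lies outside the interval covered by common markers")
    # Index of the right-hand flanking marker, clamped to a valid interval.
    i = min(max(bisect_right(source_positions, position), 1), len(common) - 1)
    s_left, c_left = common[i - 1]
    s_right, c_right = common[i]
    if s_right == s_left:  # two anchors at the same source position
        return c_left
    fraction = (position - s_left) / (s_right - s_left)
    return c_left + fraction * (c_right - c_left)


# Hypothetical maps sharing three markers.
study_map = {"m1": 0.0, "m2": 20.0, "m3": 50.0, "only_in_study": 33.0}
consensus = {"m1": 5.0, "m2": 30.0, "m3": 90.0}
# Project both ends of a QTL confidence interval reported on the study map.
qtl_interval = [project_position(p, study_map, consensus) for p in (10.0, 35.0)]
print(qtl_interval)
```

Projecting the two ends of a confidence interval, as in the usage lines, gives the interval on the consensus map; the denser the set of common markers, the less the interpolation has to bridge.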
More recently, a platform for meta-analysis in rice has been created as an initiative of the International Rice Research Institute (IRRI), called the International Rice Information System (IRIS; McLaren et al. 2005). A few examples of these approaches are illustrated in the text that follows. A major QTL for aluminium tolerance has been mapped near the end of sorghum chromosome 3. Significant synteny was observed for markers linked to the sorghum QTL on wheat chromosome 4, where a major QTL for aluminium tolerance is located. However, this is not the orthologous region of sorghum chromosome 3, suggesting that the sorghum and wheat QTLs are not orthologous. Comparison with rice indicated that a rice QTL on chromosome 1 is likely orthologous to the sorghum QTL on chromosome 3 and that a second rice QTL, mapping on chromosome 3, is orthologous to the Triticeae QTL on wheat, barley, and rye chromosome 4 (Magalhaes et al. 2004).
Recently, flowering time QTL analysis in maize suggested that there may be two consensus QTLs, vgt1 and vgt2, on maize chromosome 8. Detailed syntenic analysis showed that the vgt1 region displays a highly conserved duplicated region on chromosome 6, which also plays an important role in maize flowering time variation (Chardon et al. 2005). A comparative analysis of QTLs for important agronomic traits between maize and rice demonstrates that 16 out of 45 QTLs affecting different maize traits (plant height, kernel row number, and kernels per row) are conserved, compared with 12 of 38 QTLs affecting different rice traits (plant height, tillers per plant, and grains per panicle). Maize usually has two conserved QTLs corresponding to one rice QTL, owing to ancient tetraploidization of the maize genome. Known QTL-rich regions on chromosomes in maize and rice usually show clustering of QTLs affecting different traits. These results suggest that the QTLs for similar traits in maize and rice may have a common origin. A major QTL controlling both preharvest sprouting and seed dormancy (accounting for 70% of the phenotypic variation) has been identified on the long arm of barley chromosome 5H, showing good synteny with the terminal end of the long arm of rice chromosome 3 as well as with wheat chromosome 4AL. However, it is located outside the region reported for a seed dormancy QTL in wheat. The wheat chromosome 4AL QTL region for seed dormancy was syntenic to both rice chromosomes 3 and 11. In both cases, corresponding QTLs for seed dormancy have been mapped in rice (Li et al. 2004). Synteny has been used to examine the genetic basis for inflorescence variation between foxtail millet and its wild relative green millet. Fourteen robust QTLs were mapped on the millet genetic map, and the synteny of millet chromosomes and rice chromosomes was used in an attempt to identify candidate genes.
Thus, millet chromosomes V and IX, which contain several QTLs affecting inflorescence architecture, are indeed colinear with maize chromosomes 3 and 1, which also contain QTLs for inflorescence architecture. Examination of known genes in these regions in maize and rice allowed Doust et al. (2005) to consider some genes as putative candidates underlying the QTLs in millet. However, candidate genes always need to be validated by further functional studies. Similar studies were carried out on the perenniality trait in sorghum, using the rice genetic map as a reference (Hu et al. 2003). The availability of the rice genome sequence obviously facilitates molecular cloning of these QTLs. Conversely, characterization of an important QTL in another cereal raises the question of the existence and function of orthologous genes in the other species. Such an example has recently been reported. A major domestication gene, the wheat Q gene, has been cloned and shown to encode an AP2-like TF. The Q gene
is responsible for the soft glumes, a nonfragile rachis, a square spike, and free-threshing in wheat. Putative orthologues of Q have been identified in rice, maize, and barley. In maize, the orthologous gene is probably indeterminate spikelet1, which determines the number of floral meristems, but no functions have been assigned yet to the rice and barley orthologs (Simons et al. 2006). The number of examples of QTLs cloned using synteny information is likely to increase in the next few years.
17.9 Comparative Expression Profiling
The development of genomic resources for several crops has led to numerous genome-wide studies of gene expression based on various types of DNA chips. Some are based on ESTs and cDNAs, others on short or long oligonucleotides. Initially, comparisons of different chip-based experiments were difficult because many chips were home-made and the early experiments were not always correctly standardized. This situation is rapidly changing as a result of the availability of commercially produced high-quality DNA chips and stringent standardization requirements for data presentation by most journals (Zimmermann et al. 2006). It is now possible to compare gene expression profiles in similar physiological and biological situations in different crops. This approach still has limitations such as poor annotation, incomplete representation of the genome for most crops, different kinetics of development and phenetic stages, and variable experimental conditions. The most extensive studies of gene expression have so far been carried out in Arabidopsis, and several resources have been established to compare results from one experiment to another, such as TAIR, NASCArrays, and Genevestigator (Zimmermann et al. 2004, 2006). Similar experiments have been carried out in rice and already compared with the Arabidopsis data (Ma et al. 2005b). This study analyzed expression in the different organs of both plants. It revealed three things: that most of the rice gene models with no obvious homologs in Arabidopsis were expressed in rice, thus confirming the validity of the predictions; that the corresponding organs of Arabidopsis and rice expressed a similar proportion of their genome; and that the expression profiles of orthologous genes are conserved to different degrees between the two plants. It was also observed that in rice a significant proportion of adjacent genes are coregulated.
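One simple way to quantify how far the expression profiles of orthologous genes are conserved is to correlate their expression across matched organs. The sketch below is an illustration only and is not the method of Ma et al. (2005b). It assumes that expression values for each gene are listed in the same organ order in both species; gene identifiers and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def ortholog_profile_correlation(ortholog_pairs, expr_a, expr_b):
    """Correlate expression profiles for each (gene_a, gene_b) ortholog pair.

    expr_a and expr_b map gene id -> list of expression values measured
    in the same organ order in the two species.
    """
    return {
        (gene_a, gene_b): pearson(expr_a[gene_a], expr_b[gene_b])
        for gene_a, gene_b in ortholog_pairs
        if gene_a in expr_a and gene_b in expr_b
    }


# Hypothetical profiles across four matched organs (e.g. root, leaf, stem, flower).
rice = {"rice_gene_1": [1.0, 2.0, 3.0, 4.0], "rice_gene_2": [1.0, 2.0, 3.0, 4.0]}
arabidopsis = {"at_gene_1": [2.0, 4.0, 6.0, 8.0], "at_gene_2": [4.0, 3.0, 2.0, 1.0]}
pairs = [("rice_gene_1", "at_gene_1"), ("rice_gene_2", "at_gene_2")]
for pair, r in ortholog_profile_correlation(pairs, rice, arabidopsis).items():
    print(pair, round(r, 3))
```

A correlation close to 1 indicates a conserved organ profile and a value near or below 0 indicates divergence. With only a handful of organs such coefficients are noisy, so they are better read as a ranking than as a test.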
Another comparative study involved the analysis of light-regulated genes in the two species (Jiao et al. 2005). About 20% of the genes in both species are regulated by light. Qualitatively similar expression profiles were observed for seedlings grown under different light regimens, but a quantitatively weaker effect was observed in rice. A few differences were observed
between the two species, notably for transcription factors, which implies that the expression of TFs is evolving faster than that of housekeeping genes. A number of transcriptomic studies of stress responses using various types of DNA chips have been reported in different cereals, but they have not yet been extensively compared. Because the various crops often show similar adaptive responses, these comparisons should be particularly helpful in unravelling key regulatory genes and in investigating the control of their expression. Examples include transcript profiling of contrasting genotypes in response to drought and salinity in rice (Kawasaki et al. 2001; Rabbani et al. 2005; Walia et al. 2005), barley (Ozturk et al. 2002), and sorghum (Buchanan et al. 2005; Pratt et al. 2005; Salzman et al. 2005), and to cold in sugarcane (Nogueira et al. 2003). Other examples of expression patterns in rice relate to development (Zhu et al. 2003; Duan and Sun 2005; Duan et al. 2005; Wang et al. 2005c). Data from all these expression profiling experiments in cereals will continue to accumulate and be stored in databases. Meta-analysis of expression patterns across cereal species is therefore expected to develop rapidly.
17.10 Comparative Biology in the Era of Genomics
The ultimate goal of genomic studies is to understand the biology of the plant and the molecular basis of biodiversity. The development of comparative genomics and associated tools provides new avenues to investigate some aspects of development or adaptation to environments. This domain of comparative developmental biology is still in its infancy because of insufficient data sets. However, in recent years, tremendous progress has been made with the elucidation of basic processes such as functioning of the meristem, control of flowering time and of flower morphogenesis, and analysis of branching. In Arabidopsis, at least four pathways contribute to the control of flowering time: the photoperiodic, the gibberellin, the vernalization, and the autonomous pathways. In rice, the photoperiodic pathway seems to be predominant. At least 14 QTLs for heading time have been identified in a population resulting from a cross between a japonica variety (Nipponbare) and an indica variety (Kasalath). Rice orthologs of the Arabidopsis CO and FT (FLOWERING LOCUS T) genes were isolated by positional cloning of two heading time QTLs, Hd1 and Hd3a, respectively (Yano et al. 2000; Kojima et al. 2002). FT is a target of CO and is an integrator gene. There are 10 FT-like genes in rice and only one in Arabidopsis, suggesting the emergence of novel functions in rice. The PHOTOPERIOD SENSITIVITY5 (SE5) gene in rice, coding for a heme oxygenase involved
in phytochrome chromophore synthesis, is absolutely required for photoperiodic flowering in rice, and is the ortholog of HY1 (LONG HYPOCOTYL 1) in Arabidopsis. Several other genes involved in control of the circadian clock and of flowering time in rice are also orthologs of Arabidopsis circadian clock and flowering genes (Izawa et al. 2003). Although the general observation is that the major genes are conserved between the two species, there are also obvious differences in photoperiod response, as mentioned previously. Therefore the regulation of these major genes has to be altered in some way in rice to adjust to short-day conditions (Hayama and Coupland 2004). Some genes that are important in Arabidopsis are clearly absent in rice, indicating either that their function does not exist in rice or that it has been taken over by another gene, as in the case of FLOWERING LOCUS C (FLC), FRIGIDA (FRI), and EARLY FLOWERING 4 (ELF4). Even when similar genes exist in both species, their regulation might be different. A new response regulator, Ehd1, has been identified that promotes flowering in rice via induction of FT-like genes in an Hd1-independent manner. There is no ortholog of the rice Ehd1 gene in the Arabidopsis genome, and therefore this type of control appears to be unique to rice and possibly cereals (Doi et al. 2004). Perception of light might also be different between the two species. The Arabidopsis genome possesses five functionally distinct phytochrome and two cryptochrome genes, but rice has three phytochrome and three cryptochrome genes. Availability of natural alleles and of induced and insertion mutants in rice (see Chapters 7, 8, 9, and 10 of this book) now allows one to functionally analyze the role of these genes in rice compared to their orthologues in Arabidopsis (Takano et al. 2005). Rice presents many varieties that are adapted to different latitudes or altitudes, and therefore to different light regimens.
Some others have become relatively insensitive to photoperiod. Examination of these varieties for the presence, allelic variation, copy number, and expression of the various key genes should considerably improve understanding of the adaptation of these varieties to their environment. There is limited information on the other pathways controlling flowering in rice. However, recently the ortholog of a key gene in the autonomous pathway in Arabidopsis, FCA, was isolated from rice and extensively analyzed (Lee et al. 2005). The OsFCA gene partially complements the Arabidopsis fca mutation, indicating that not only the gene structure but also its function is conserved. However, the rice OsFCA protein possesses a glycine-rich domain that is absent in the Arabidopsis orthologue, and the differential splicing pattern of the rice mRNA is more complex than in Arabidopsis. Identification of key genes for flowering time in rice should facilitate their identification in the other cereal species. LEAFY is an important flowering integrator gene. Rice has a single gene (RFL), but there are two homologous copies in maize (Zfl1 and Zfl2).
In both species, these genes control pleiotropic characters associated with inflorescence development and patterning. Analysis of Zfl1 and Zfl2 in different maize backgrounds recently provided evidence for partial specialization following duplication. Zfl1 shows stronger association with a flowering time QTL, whereas Zfl2 associates more strongly with branching and inflorescence structure traits. Comparison with the maize ancestor, teosinte, suggested that Zfl2 played a significant role in morphological evolution of the maize ear (Bomblies and Doebley 2006). Analysis of inflorescence architecture (Kellogg 2004; McSteen 2006) is another fascinating topic in comparative biology. The inflorescence is highly variable within grasses, ranging from the spikes of wheat and barley to the more branched tassels of sorghum and maize. In addition, maize is a monoecious plant with male flowers on a terminal tassel and female flowers on lateral ears. The basic unit of the grass inflorescence is the spikelet, which is a compact axillary branch consisting of two bracts subtending one to several reduced flowers. The maize tassel consists of a main spike with several long, indeterminate branches at the base, whereas the ear is a single spike with short branches. Using maize mutants, a number of genes have been identified in which mutation affects the inflorescence architecture. They can be classified into two groups: those showing a decrease in branch number, such as liguleless2, barren stalk1, and barren inflorescence1 and 2, and those showing an increase in branching, such as the ramosa, branched silkless1, and indeterminate spikelet1 mutants. Two of the genes in the ramosa pathway have been cloned recently. The Ramosa2 gene was cloned using synteny with rice and it has an orthologous gene in rice (Bortiri et al. 2006a), but the Ramosa1 gene is absent from the rice genome and was isolated by transposon tagging in maize. The expression pattern of Ramosa2 is conserved among grasses.
Ramosa2 codes for a Lateral Organ Boundary (LOB) domain protein and controls the activity of Ramosa1 in maize and in sorghum (Vollbrecht et al. 2005). Ramosa1 might have played an important role in maize domestication. Teosinte branched1 is another gene involved in the morphogenesis of the maize inflorescence and codes for a TCP-type transcription factor. This gene has been isolated as an inflorescence QTL distinguishing maize from its presumed ancestor, teosinte, and it is assumed that modification of its activity is largely responsible for the change in ear morphology during domestication (Wang et al. 1999). Comparative analysis of this gene among maize lines and related grasses, including rice, revealed a number of changes within the coding sequence as well as a new domain that is conserved among all Tb1-like genes from grasses. However, examination of the ratio of nonsynonymous to synonymous substitutions failed to detect any evidence for positive selection, implying that the Tb1 protein has not experienced significant positive selection during evolution of the grasses
(Lukens and Doebley 2001). Indeed, further studies demonstrated that the 5' noncoding region, rather than the coding sequence itself, was under selection (Clark et al. 2004). This type of comparative analysis will probably be carried out in the near future on most genes suspected to play a major role in the domestication of cereal species. A final example of comparative biology worth reporting is the cross-species analysis of the SEPALLATA genes. These genes code for MADS box TFs which control floral development by interacting with the floral identity genes and by mediating their expression. The Arabidopsis genome contains four genes in this class: SEP1, SEP2, SEP3, and SEP4 (formerly AGL2, AGL4, AGL9, and AGL3). They form a well-supported clade within the MADS box gene phylogeny (Nam et al. 2003; Zahn et al. 2005) that can be further subdivided into three clades: a mixed dicot/monocot clade containing Arabidopsis SEP3; a dicot clade containing Arabidopsis SEP1, SEP2, and SEP4; and a clade comprising only monocot genes. The grass SEP genes are not clearly orthologous to the Arabidopsis ones and therefore it is difficult to compare their function in detail. OsMADS1/LHS1 (also known as LEAFY HULL STERILE1, Jeon et al. 2000) is currently the best characterized SEP gene in rice and two orthologs have been identified in maize. To complete the phyletic analysis, the orthologous genes have been isolated from 16 grass species (Malcomber and Kellogg 2004). More interestingly, the expression pattern of this gene has been analysed in a number of these species by in situ hybridization. Distinct patterns were observed and could be correlated with the evolution of the different tribes. LHS1 gene expression is restricted to the inflorescence tissues. In early development it is expressed throughout the spikelet meristem but never in the glumes.
In the ancestor to the grasses studied, LHS1 is expressed in the palea, lemma, and pistil, only in the upper florets of the spikelet, with no expression in the sterile or staminate florets at the base. Expression in the pistil is lost in the Pennisetum lineage and expression in the stamen is gained in the maize and Chasmanthium lineages. It is thus possible to reconstruct the evolution of the expression pattern of this gene in parallel to the flower morphology. Such studies will provide a molecular and evolutionary basis and explain the diversity of forms and development of the inflorescence in grasses.
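The kind of reconstruction described here, inferring where along a phylogeny an expression domain was gained or lost, can be illustrated with a minimal parsimony sketch. The code applies the bottom-up pass of Fitch's small-parsimony algorithm to a single binary character (expressed or not in a given organ). It is an illustration only, not the analysis of Malcomber and Kellogg (2004); the tree shape, taxon labels, and character states are hypothetical.

```python
def fitch(tree, leaf_states):
    """Bottom-up pass of Fitch parsimony for one character.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    leaf_states: dict of leaf name -> observed state.
    Returns (candidate_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, leaf_states)
    right_set, right_changes = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Both subtrees agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # No state in common: one change must have occurred below this node.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical four-taxon tree and pistil-expression states (True = expressed).
tree = ((("taxon_A", "taxon_B"), "taxon_C"), "taxon_D")
pistil_expression = {
    "taxon_A": True,
    "taxon_B": True,
    "taxon_C": False,
    "taxon_D": True,
}
root_states, changes = fitch(tree, pistil_expression)
print(root_states, changes)
```

In this toy case the most parsimonious reading is an ancestor that expressed the gene in the pistil, with a single loss on the lineage leading to taxon_C. Real analyses use a dated, well-supported phylogeny and usually consider several organs at once.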
17.11 Genome Sequencing in Grasses: Beyond the Model
What lessons can be learned from these preliminary comparative genomic analyses? The first is that the complete and accurate Arabidopsis and rice genome sequences have been incredibly useful and have boosted plant science
research to another complexity level. They have demonstrated that sequencing of large genomes is feasible, still expensive, but worth doing. Comparisons between different genomic data have revealed the impressive conservation of most of the gene catalogue as well as the extraordinary diversity of this gene repertoire. Even more fascinating is the comparison across species of developmental pathways such as flower morphogenesis, control of flowering time or of response to environment, as revealed by disease resistance genes. All these aspects of plant development and adaptation are now supported by genomic data which help further dissection of these phenomena and open new areas for breeding strategies, plant protection, and conservation of biodiversity. They also reveal the usefulness of comparisons across species to improve annotation, and to discover new genes and new regulatory signals. In addition to a collection of genes that are shared by most species, new genes which are specific to one or several lineages have been discovered. The number and nature of many genes is rapidly changing during evolution, allowing specific adaptation and colonization of different niches by the various plant species. The observation that synteny is relatively well conserved among cereals has facilitated discovery and positional cloning of important genes in more complex genomes than that of rice, but evidence for rearrangements and disruption of synteny indicates that in some situations we have to directly work on the complex genome. These observations clearly suggest that we cannot satisfy ourselves with just two plant genomes, when there are more than 250,000 plant species, and that new strategies have to be developed to better understand not only the genomes of different important crops, but also the whole field of plant diversity. 
Because sequencing costs are still high and current technology slow, new genome targets and sequencing strategies will have to be chosen in the very near future (Jackson et al. 2006; Paterson 2006). The BAC-by-BAC sequencing approach for the rice genome has provided a much more accurate sequence than the whole-genome shotgun (WGS) approach. This allows a number of analyses that require highly polished sequence, such as SNP discovery and analysis of genetic variation between varieties, to discover the basis of QTLs for important agronomic traits. This experience should serve to define the most relevant strategy for larger and more repetitive genomes. Immediate candidates for sequencing are the major crops such as maize, wheat, sorghum, and barley which diverged from rice about 65 to 70 Mya. The challenge is to analyze these giga-sized genomes, which are full of retroelements. The possibility of sequencing another small-sized model genome such as Brachypodium distachyon or B. sylvaticum (Draper et al. 2001) has also been discussed. These two species belong to the Poaceae and are closer to wheat than to rice. With 10 pairs of chromosomes and a genome size of approximately 175 Mbp, this is one of the smallest genomes in cereals and an attractive additional model.
For maize, physical maps have already been established and a gold standard has been defined by the maize community for genome sequencing (Messing and Dooner 2006; Rabinowicz and Bennetzen 2006; Wessler 2006). The defined target is to obtain the complete sequence and structures of all genes and their locations on both the genetic and physical maps, using the B73 line as the reference. A project is already underway (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=052712). Sequencing will be carried out using a combination of WGS, BAC, and gene-enriched (GE) fraction sequencing and completed via methylation spanning linker libraries (MSLL; Yuan et al. 2002) and hypomethylated partial restriction libraries (HMPRL; Emberton et al. 2005) sequencing to ensure the capture of most of the gene space. An ordered BAC library is already available (Nelson et al. 2005). Gene-enriched fractions are obtained either by methyl filtration or by high C0t analysis. Annotation tools produced for the rice genome will also be useful to annotate other cereal genomes (Haberer et al. 2005). For other important cereal crops such as wheat and sorghum, workshops have been held to lay the foundations of sequencing strategies (Gill et al. 2004; Kresovich et al. 2005; Paterson 2006). A sorghum sequencing project (http://www.jgi.doe.gov/sequencing/why/CSP2006/sorghum.html) has been launched and sequencing will be performed by the DOE/JGI. The sorghum genome is relatively small (738 Mb) and preliminary experiments have been reported (Bedell et al. 2005). An International Wheat Genome Sequencing Consortium (IWGSC) was created in 2005 with the aim of establishing a physical map of the 21 chromosomes of hexaploid wheat and sequencing the wheat gene space (http://wheatgenome.org). The wheat genome is 40 times larger than that of rice and 6 times larger than that of maize. Its hexaploid nature increases the difficulty of physical mapping and sequence assembly.
A number of pilot projects are currently underway to determine the feasibility of using the whole-genome fingerprinting strategy or chromosome-specific strategies for the construction of the physical map. Preliminary experiments have been conducted to test the efficiency of gene enrichment methods as in maize. A barley physical map project is also underway. The answer to the question, “What’s next after sequencing?” is probably “even more sequencing.” Yet, sequencing is not an end in itself. It is a tool that allows us to understand the organization and evolution of genomes, and the foundation for understanding the integrated biology of plant species and for improving our crops.
Acknowledgments
The authors thank CNRS, IRD, and INRA for continuous support of their research units, as well as Genoplante and ANR for supporting several programs in their laboratories. They also thank their numerous colleagues for fruitful exchanges and discussions. The authors thank Prof. Robert Henry and Dr. Liz Dennis for critical review of the manuscript.
References
Albar L, Bangratz-Reyser M, Hébrard E, Ndjiondjop MN, Jones M, Ghesquière A (2006) A single nucleotide polymorphism in the translation initiation factor eIF(iso)4G confers resistance of rice to Rice Yellow Mottle Virus. Plant J 47:417–426 Ammiraju JSS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim HR, Sisneros NB, Blackmon B, Tomkins JB, Brar D, MacKill D, McCouch S, Kurata N, Lambert G, Galbraith DW, Arumuganathan K, Rao K, Walling JG, Gill N, Yu Y, SanMiguel P, Soderlund C, Jackson S, Wing RA (2006) The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res 16:140–147 Arcade A, Labourdette A, Falque M, Mangin B, Chardon F, Charcosset A, Joets J (2004) Biomercator: integrating genetic maps and QTL towards discovery of candidate genes. Bioinformatics 20:2324–2326 Ashikari M, Sakakibara H, Lin S, Yamamoto T, Takashi T, Nishimura A, Angeles ER, Qian Q, Kitano H, Matsuoka M (2005) Cytokinin-oxidase regulates rice grain production. Science 309:741–745 Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanisms and function. Cell 116:281–297 Bedell JA, Budiman MA, Nunberg A, Citek RW, Robbins D, Jones J, Flick E, Rholfing T, Fries J, Bradford K, McMenamy J, Smith M, Holeman H, Roe BA, Wiley G, Korf IF, Rabinowicz PD, Lakey N, McCombie WR, Jeddeloh JA, Martienssen RA (2005) Sorghum genome sequencing by methylation filtration. PLoS Biol 3(1):e13 (DOI: 10.1371/journal.pbio.0030013) Bennetzen JL, Ma J (2003) The genetic collinearity of rice and other cereals on the basis of genomic sequence analysis. Curr Opin Plant Biol 6:128–133 Bennetzen JL, Ramakrishna W (2002) Numerous small rearrangements of gene content, order and orientation differentiates grass genomes.
Plant Mol Biol 48:821–827 Bennetzen JL, Coleman C, Liu R, Ma J, Ramakrishna W (2004) Consistent overestimation of gene number in complex plant genomes. Curr Opin Plant Biol 7:732–736 Berardini TZ, Mundodi S, Reiser L, Huala E, Garcia-Hernandez M, Zhang P, Mueller LA, Yoon J, Doyle A, Lander G, Moseyko N, Yoo D, Xu I, Zoeckler B,
Montoya M, Miller N, Weems D, Rhee SY (2004) Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol 135:745–755 Blanc G, Wolfe KH (2004) Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:1667–1678 Blanc G, Barakat A, Guyot R, Cooke R, Delseny M (2000) Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12:1093–1101 Bomblies K, Doebley JF (2006) Pleiotropic effects of the duplicate maize FLORICAULA/LEAFY genes Zfl1 and Zfl2 on traits under selection during maize domestication. Genetics 172:519–531 Bommert PB, Lunde C, Nardmann J, Vollbrecht E, Running MP, Jackson D, Hake S, Werr W (2005) thick tassel dwarf 1 encodes a putative maize orthologue of the Arabidopsis CLAVATA 1 leucine-rich receptor-like kinase. Development 132:1235–1245 Bonnet E, Wuyts J, Rouze P, van de Peer Y (2004) Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci USA 101:11511–11516 Bortiri E, Chuck G, Vollbrecht E, Rocheford T, Martienssen R, Hake S (2006a) ramosa2 encodes a LATERAL ORGAN BOUNDARY domain protein that determines the fate of stem cells in branch meristem of maize. Plant Cell 18:574–585 Bortiri E, Jackson D, Hake S (2006b) Advances in maize genomics: the emergence of positional cloning. Curr Opin Plant Biol 9:164–171 Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RYW, Chen AH, Edwards TM, Estill JC, Exum HE, Goff VH, Herrick KL, James Steele CL, Karunakaran S, Lafayette GK, Lemke C, Marler BS, Master SL, McMillan JM, Nelson LK, Newsome GA, Nwakanma CC, Odeh RN, Phelps CA, Rarick EA, Rogers CJ, Ryan SP, Slaughter KA, Soderlund CA, Tang H, Wing RA, Paterson AH (2005) Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses.
Proc Natl Acad Sci USA 102:13206–13211 Bradbury LMT, Fitzgerald TL, Henry RJ, Jin Q, Waters DLE (2005) The gene for fragrance in rice. Plant Biotech J 3:363–370 Brueggeman R, Rostoks N, Kudrna D, Kilian A, Han F, Chen J, Druka A, Steffenson B, Kleinhofs A (2002) The barley stem rust-resistance gene Rpg1 is a novel disease-resistance gene with homology to receptor kinases. Proc Natl Acad Sci USA 99:9328–9333 Brunner S, Keller B, Feuillet C (2003) A large rearrangement involving genes and low-copy DNA interrupts the microcollinearity between rice and barley at the rph7 locus. Genetics 164:673–683 Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A (2005) Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17:343–360 Buchanan CD, Lim S, Salzman R, Kagiampakis I, Morishige DT, Weers BD, Klein RR, Pratt LH, Cordonnier-Pratt MM, Klein PE, Mullet JE (2005) Sorghum bicolor’s transcriptome response to dehydration, high salinity and ABA. Plant Mol Biol 58:699–720
Caldwell KS, Langridge P, Powell W (2004) Comparative sequence analysis of the region harboring the hardness locus in barley and its colinear region in rice. Plant Physiol 136:3177–3190 Causier B, Castillo R, Zhou J, Ingram R, Xue Y, Schwarz-Sommer Z, Davies B (2005) Evolution in action: following function in duplicated floral homeotic genes. Curr Biol 15:1508–1512 Chantret N, Salse J, Sabot F, Rahman S, Bellec A, Laubin F, Dubois I, Dossat C, Sourdille P, Joudrier P, Gautier MF, Cattolico L, Beckert M, Aubourg S, Weissenbach J, Caboche M, Bernard M, Leroy P, Chalhoub B (2005) Molecular basis of evolutionary events that shaped the hardness locus in diploid and polyploid wheat species (Triticum and Aegilops). Plant Cell 17:1033–1045 Chardon F, Virlon B, Moreau L, Falque M, Joets J, Decousset L, Murigneux A, Charcosset A (2004) Genetic architecture of flowering time in maize as inferred from quantitative trait loci meta-analysis and synteny conservation with the rice genome. Genetics 168:2169–2185 Chardon F, Hourcade D, Combes V, Charcosset A (2005) Mapping of a spontaneous mutation for early flowering time in maize highlights contrasting allelic series at two-linked QTL on chromosome 8. Theor Appl Genet 112:1–11 Chen H, Wang S, Xing Y, Xu C, Hayes PM, Zhang Q (2005) Comparative analyses of genomic locations and race specificities of loci for quantitative resistance to Pyricularia grisea in rice and barley. Proc Natl Acad Sci USA 100:2544–2549 Clark RM, Linton E, Messing J, Doebley J (2004) Pattern of diversity in the genomic region near the maize domestication gene tb1. Proc Natl Acad Sci USA 101:700–707 Collins NC, Lahaye T, Peterhansel C, Freialdenhoven A, Corbitt M, Schulze-Lefert P (2001) Sequence haplotypes revealed by sequence-tagged site fine mapping of the Ror1 gene in the centromeric region of barley chromosome 1H.
Plant Physiol 125:1236–1247 Collins NC, Thordal-Christensen H, Lipka V, Bau S, Kombrink E, Qiu JL, Huckelhoven R, Stein M, Freialdenhoven A, Somerville SC, Schulze-Lefert P (2003) SNARE-protein-mediated disease resistance at the plant cell wall. Nature 425:973–977 Dardick C, Ronald P (2006) Plant and animal pathogen recognition receptors signal through non-RD kinases. PLoS Pathogens 2:14–28 Devos KM (2005) Updating the ‘crop circle’. Curr Opin Plant Biol 8:155–162 Devos KM, Gale MD (1997) Comparative genetics in the grasses. Plant Mol Biol 35:3–15 Doi K, Izawa T, Fuse T, Yamanouchi U, Kubo T, Shimatani Z, Yano M, Yoshimura A (2004) Ehd1, a B-type response regulator in rice, confers short-day promotion of flowering and controls FT-like gene expression independently of Hd1. Genes Dev 18:926–936 Dong Q, Lawrence CJ, Schlueter SD, Wilkerson MD, Kurtz S, Lushbough C, Brendel V (2005) Comparative plant genomics resources at PlantGDB. Plant Physiol 139:610–618
Doust AN, Devos KM, Gadberry MD, Gale MD, Kellogg EA (2005) The genetic basis for inflorescence variation between foxtail and green millet (Poaceae). Genetics 169:1659–1672 Draper J, Mur LAJ, Jenkins G, Ghosh-Biswas C, Bablak P, Hasterok R, Routledge APM (2001) Brachypodium distachyon. A new model system for functional genomics in grasses. Plant Physiol 127:1539–1555 Duan M, Sun SS (2005) Profiling the expression of genes controlling rice grain quality. Plant Mol Biol 59:165–178 Duan K, Luo YH, Luo D, Xu ZH, Xue HW (2005) New insights into the complex and coordinated transcriptional regulation networks underlying rice seed development through cDNA chip-based analysis. Plant Mol Biol 57:785–804 Dugas DV, Bartel B (2004) MicroRNA regulation of gene expression in plants. Curr Opin Plant Biol 7:512–520 Dunford RP, Yano M, Kurata N, Sasaki T, Huestis G, Rocheford T, Laurie DA (2002) Comparative mapping of the barley Ppd-H1 photoperiod response gene region, which lies close to a junction between two rice linkage segments. Genetics 161:825–834 Emberton J, Ma J, Yuan Y, SanMiguel P, Bennetzen JL (2005) Gene enrichment in maize with hypomethylated partial restriction (HMPR) libraries. Genome Res 15:1441–1446 Etzel CJ, Guerra R (2002) Meta-analysis of genetic-linkage analysis of quantitative-trait loci. Am J Hum Genet 71:56–65 Feltus FA, Wan J, Schulze SR, Estill JC, Jiang N, Paterson AH (2004) An SNP resource for rice genetics and breeding based on subspecies indica and japonica genome alignments. Genome Res 14:1812–1819 Feuillet C, Keller B (2002) Comparative genomics in the grass family: molecular characterization of grass genome structure and evolution. Ann Bot 89:3–10 Fritz-Laylin LK, Krishnamurthy N, Tör M, Sjölander KV, Jones JD (2005) Phylogenomic analysis of the receptor-like proteins of rice and Arabidopsis. Plant Physiol 138:611–623 Fu H, Dooner HK (2002) Intraspecific violation of genetic colinearity and its implications in maize.
Proc Natl Acad Sci USA 99:9573–9578 Gallavotti A, Zhao Q, Kyozuka J, Meeley RB, Ritter MK, Doebley JF, Pe ME, Schmidt RJ (2004) The role of barren stalk1 in the architecture of maize. Nature 432:630–635 Gaut BS, Doebley JF (1997) DNA sequence evidence for the segmental allotetraploid origin of maize. Proc Natl Acad Sci USA 94:6809–6814 Gaut BS, Peek AS, Morton BR, Clegg MT (1999) Patterns of genetic diversification within the Adh gene family in the grasses (Poaceae). Mol Biol Evol 16:1086–1097 Ge S, Sang T, Lu BR, Hong DY (1999) Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci USA 96:14400–14405 Gill BS, Appels R, Botha-Oberholster AM, Buell CR, Bennetzen JL, Chalhoub B, Chumley F, Dvorak J, Iwanaga M, Keller B, Li W, McCombie WR, Ogihara Y, Quetier F, Sasaki T (2004) A workshop report on wheat genome sequencing: International Genome Research on Wheat Consortium. Genetics 168:1087–1096
Goff, SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica) Science 296: 92–100 Goffinet B, Gerber S (2000) Quantitative trait loci: a meta-analysis. Genetics 155:463–473 Gottwald S, Stein N, Borner A, Sasaki T, Graner A (2004) The gibberellic-acid insensitive dwarfing gene sdw3 of barley is located on chromosome 2HS in a region that shows high colinearity with rice chromosome 7L. Mol Gen Genomics 271:426–436 Griffiths S, Dunford RP, Coupland G, Laurie DA (2003) The evolution of CONSTANS-like gene families in barley, rice and Arabidopsis. Plant Physiol 131:1855–1867 Griffiths S, Sharp R, Foote TN, Bertin I, Wanous M, Reader S, Colas I, Moore G (2006) Molecular characterization of Ph1 as a major chromosome pairing locus in polyploid wheat. Nature 439:749–752 Gu YQ, Coleman-Derr D, Kong X, Anderson OD (2004) Rapid genome evolution revealed by comparative sequence analysis of orthologous regions from four triticeae genomes. Plant Physiol 135:459–470 Gualtieri G, Conner JA, Morishige DT, Moore D, Mullet JE, Ozias-Akins P (2006) A segment of the apospory-specific genomic region is highly microsyntenic not only between the apomicts Pennisetum squamulatum and buffelgrass, but also with a rice chromosome 11 centromeric-proximal genomic region. 
Plant Physiol 140:963–971 Guo H, Moose SP (2003) Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell 15:1143–1158 Guyot R, Keller B (2004) Ancestral genome duplication in rice. Genome 47:610– 614 Guyot R, Yahiaoui N, Feuillet C, Keller B (2004) In silico comparative analysis reveals a mosaic conservation of genes within a novel colinear region in wheat chromosome 1AS and rice chromosome 5S. Funct Integr Genomics 4:47–58 Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, Nusbaum C, Mayer KF, Messing J (2005) Structure and architecture of the maize genome. Plant Physiol 139:1612–1624 Hamel LP, Nicole MC, Sritubtim S, Morency MJ, Ellis M, Ehlting J, Beaudoin N, Barbazuk B, Klessig D, Lee J, Martin G, Mundy J, Ohashi Y, Scheel D, Sheen J, Xing T, Zhang S, Seguin A, Ellis BE (2006) Ancient signals: comparative genomics of plant MAPK and MAPKK gene families. Trends Plant Sci 11:192–198
Hampson S, McLysaght A, Gaut B, Baldi P (2003) LineUp: statistical detection of chromosomal homology with application to plant comparative genomics. Genome Res 13:1–12 Han B, Xue Y (2003) Genome-wide intraspecific DNA sequence variations in rice. Curr Opin Plant Biol 6:134–138 Harlan JR (1992) Crops and Man. 2nd Ed. CSSA and ASA, Madison, WI. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R (2004) The Gene Ontology (GO) database and informatics resource. Nucl Acids Res 32:258–261 Hayama R, Coupland G (2004) The molecular basis of diversity in the photoperiodic flowering responses of Arabidopsis and rice. Plant Physiol 135:677–684 Hu FY, Tao DY, Sacks E, Fu BY, Xu P, Li J, Yang Y, McNally K, Khush GS, Paterson AH, Li ZK (2003) Convergent evolution of perenniality in rice and sorghum. Proc Natl Acad Sci USA 100:4050–4054 Huang LS, Brooks SA, Fellers JP, Gill BS (2003) Map-based cloning of leaf rust resistance gene Lr21 from the large and polyploid genome of bread wheat. Genetics 164:655–664 Ilic K, SanMiguel PJ, Bennetzen JL (2003) A complex history of rearrangement in an orthologous region of the maize, sorghum, and rice genomes. Proc Natl Acad Sci USA 100:12265–12270 Ikeda A, Ueguchi-Tanaka M, Sonoda Y, Kitano H, Koshioka M, Futsuhara Y, Matsuoka M, Yamaguchi J. 
(2001) slender rice, a constitutive gibberellin response mutant is caused by a null mutation of the SLR1 gene, an orthologue of the height-regulating gene GAI/RGA/RHT/D8. Plant Cell 13:999–1010 International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800 Isidore E, Scherrer B, Chaloub B, Feuillet C, Keller B (2005) Ancient haplotypes resulting from extensive molecular rearrangements in the wheat A genome have been maintained in species of three different ploidy levels. Genome Res 15:526–536 Izawa T, Takahashi Y, Yano M (2003) Comparative biology comes into bloom: genomic and genetic comparison of flowering pathways in rice and Arabidopsis. Curr Opin Plant Biol 6:113–120 Jackson S, Rounsley S, Purugganan M (2006) Comparative sequencing: choices to make. Plant Cell 18:1100–1104 Jain M, Kaur N, Tyagi A, Khurana J (2006) The auxin-responsive GH3 gene family in rice Oryza sativa. Funct Integr Genomics 6:36–46 Jaiswal P, Ni J, Yap I, Ware D, Spooner W, Youens-Clark K, Ren L, Liang C, Zhao W, Ratnapu K, Faga B, Canaran P, Fogleman M, Hebbard C, Avraham S,
Schmidt S, Casstevens TM, Buckler ES, Stein L, McCouch S (2006) Gramene: a bird’s eye view of cereal genomes. Nucl Acids Res 34:717–723 Jeon JS, Jang S, Lee S, Nam J, Kim C, Lee SH, Chung YY, Kim SR, Lee YH, Cho YG, An G (2000) leafy hull sterile 1 is a homeotic mutation in a rice MADS box gene affecting rice flower development. Plant Cell 12:871–884 Jiao Y, Ma L, Strickland E, Deng XW (2005) Conservation and divergence of light-regulated genome expression patterns during seedling development in rice and Arabidopsis. Plant Cell 17:3239–3256 Juarez MT, Kui JS, Thomas J, Heller BA, Timmermans MC (2004) MicroRNAmediated repression of rolled leaf 1 specifies maize leaf polarity. Nature 428:84–88 Katagiri S, Wu J, Ito Y, Karasawa W, Shibata M, Kanamori H, Katayose Y, Namiki N, Matsumoto T, Sasaki T (2004) End sequencing and chromosomal in silico mapping of BAC clones derived from an indica rice cultivar, Kasalath. Breeding Sci 54:273–279 Katari MS, Balija V, Wilson R K, Martienssen RA, McCombie WR (2005) Comparing low coverage random shotgun sequence data from Brassica oleracea and Oryza sativa genome sequence for their ability to add to the annotation of Arabidopsis thaliana. Genome Res 15:496–504 Kawasaki S, Borchert C, Deyholos M, Wang H, Brazille S, Kawai K, Galbraith D, Bohnert HJ (2001) Gene expression profiles during the initial phase of salt stress in rice. Plant Cell 13:889–905 Kellogg EA (2001) Evolutionary history of the grasses. Plant Physiol 125: 1198–1205 Kellogg EA (2004) Evolution of developmental traits. 
Curr Opin Plant Biol 7: 92–98 Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301:376–379 Kim JS, Islam-Faridi MN, Klein PE, Stelly DM, Price HJ, Klein RR, Mullet JE (2005) Comprehensive molecular cytogenetic analysis of sorghum genome architecture: distribution of euchromatin, heterochromatin, genes and recombination in comparison to rice. Genetics 171:1963–1976 Klein PE, Klein RR, Vrebalov J, Mullet JE (2003) Sequence-based alignment of sorghum chromosome 3 and rice chromosome 1 reveals extensive conservation of gene order and one major chromosomal rearrangement. Plant J 34: 605–621
Kojima S, Takahashi Y, Kobayashi Y, Monna L, Sasaki T, Araki T, Yano M (2002) Hd3a, a rice ortholog of the Arabidopsis FT gene, promotes transition to flowering downstream of Hd1 under short day condition. Plant Cell Physiol 43:1096–1105 Komatsu K, Maekawa M, Ujiie S, Satake Y, Furutani I, Okamoto H, Shimamoto K, Kyozuka J (2003) LAX and SPA: major regulators of shoot branching in rice. Proc Natl Acad Sci USA 100:11765–11770 Konishi S, Izawa T, Lin SY, Ebana K, Fukuta Y, Sasaki T, Yano M (2006) An SNP caused loss of seed shattering during rice domestication. Science 312:1392–1396 Kramer EM, Jaramillo MA, Di Stilio VS (2004) Patterns of gene duplication and functional evolution during the diversification of the AGAMOUS subfamily of MADS box genes in angiosperms. Genetics 166:1011–1023 Kresovich S, Barbazuk B, Bedell JA, Borell A, Buell CR, Clifton S, Cordonnier-Pratt MM, Cox S, Dahlberg J, Erpelding J, Fulton TM, Fulton B, Fulton L, Gingle AR, Hash CT, Huang Y, Jordan D, Klein PE, Klein RR, Magalhaes J, McCombie R, Moore P, Mullet JE, Ozias-Akins P, Paterson AH, Porter K, Pratt L, Roe B, Rooney W, Schnable PS, Stelly DM, Tuinstra M, Ware D, Warek U (2005) Towards sequencing the sorghum genome: a US National Science Foundation-sponsored workshop report. Plant Physiol 138:1898–1902 Kurata N, Yamazaki Y (2006) Oryzabase. An integrated biological and genome information database for rice. Plant Physiol 140:12–17 Lahaye T, Shirasu K, Schulze-Lefert P (1998) Chromosome landing at the barley Rar1 locus. Mol Gen Genomics 260:92–101 Lai J, Ma J, Swigonova Z, Ramakrishna W, Linton E, Llaca V, Tanyolac B, Park Y-J, Jeong O-Y, Bennetzen JL, Messing J (2004) Gene loss and movement in the maize genome. Genome Res 14:1924–1931 Lai J, Li Y, Messing J, Dooner HK (2005) Gene movement by helitron transposons contributes to the haplotype variability of maize.
Proc Natl Acad Sci USA 102:9068–9073 Langham RJ, Walsh J, Dunn M, Ko C, Goff SA, Freeling M (2004) Genomic duplication, fractionalization and the origin of regulatory novelty. Genetics 166:935–945 La Rota M, Sorrells ME (2004) Comparative DNA sequence analysis of mapped wheat ESTs reveals the complexity of genome relationships between rice and wheat. Funct Integr Genomics 4:34–46 Le Cunff L, Garsmeur O, Telismart H, Begum D, Deu M, Wing R, Glaszmann JC, Raboin LM, D'Hont A (2007) Exploitation of sorghum genetic and physical maps for fine gene mapping of the rust resistance gene Bru 1 in the high polyploid sugarcane. Theor Appl Genet (In Press) Lee JH, Cho YS, Yoon HS, Suh MC, Moon J, Lee I, Weigl D, Yun CH, Kim JK (2005) Conservation and divergence of FCA function between Arabidopsis and rice. Plant Mol Biol 58:823–838 Levy AA, Feldman M (2002) The impact of polyploidy on grass genome evolution. Plant Physiol 130:1587–1593
Li C, Ni P, Francki M, Hunter A, Zhang Y, Schibeci D, Li H, Tarr A, Wang J, Cakir M, Yu J, Bellgard M, Lance R, Appels R (2004) Genes controlling seed dormancy and pre-harvest sprouting in a rice-wheat-barley comparison. Funct Integr Genomics 4:84–93 Li C, Zhou A, Sang T (2006a) Rice domestication by reducing shattering. Science 311:1936–1939 Li L, Wang X, Stolc V, Li X, Zhang D, Su N, Tongprasit W, Li S, Cheng Z, Wang J, Deng XW (2006b) Genome-wide transcription analyses in rice using tiling microarrays. Nat Genet 38:124–129 Lijavetzky D, Carbonero P, Vicente-Carbajosa J (2003) Genome-wide comparative phylogenetic analysis of the rice and Arabidopsis Dof gene families. BMC Evol Biol 3:17–27 Lim J, Jung JW, Lim CE, Lee MH, Kim BJ, Kim M, Bruce WB, Benfey PN (2005) Conservation and diversification of SCARECROW in maize. Plant Mol Biol 59:619–630 Linkiewicz AM, Qi LL, Gill BS, Ratnasiri A, Echalier B, Chao S, Lazo GR, Hummel DD, Anderson OD, Akhunov ED, Dvorak J, Pathan MS, Nguyen HT, Peng JH, Lapitan NL, Miftahudin, Gustafson JP, La Rota CM, Sorrells ME, Hossain KG, Kalavacharla V, Kianian SF, Sandhu D, Bondareva SN, Gill KS, Conley EJ, Anderson JA, Fenton RD, Close TJ, McGuire PE, Qualset CO, Dubcovsky J (2004) A 2500 locus bin map of wheat homeologous group 5 provides insights on gene distribution and colinearity with rice. Genetics 168:665–676 Lockton S, Gaut BS (2005) Plant conserved non-coding sequences and paralogue evolution. Trends Genet 21:60–65 Lukens L, Doebley J (2001) Molecular evolution of the teosinte branched gene among maize and related grasses. Mol Biol Evol 18:627–638 Ma J, Bennetzen JL (2004) Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA 101:12404–12410 Ma J, SanMiguel P, Lai J, Messing J, Bennetzen JL (2005a) DNA rearrangement in orthologous orp regions of the maize, rice and sorghum genomes.
Genetics 170:1209–1220 Ma L, Chen C, Liu X, Jiao Y, Su N, Li L, Wang X, Cao M, Sun N, Bao J, Li J, Pedersen S, Bolund L, Zhao H, Yuan L, Wong GK-S, Wang J, Deng XW, Wang J (2005b) A microarray analysis of the rice transcriptome and its comparison to Arabidopsis. Genome Res 15:1274–1283 Magalhaes JV, Garvin DF, Wang Y, Sorrells ME, Klein PE, Schaffert RE, Li L, Kochian LV (2004) Comparative mapping of a major aluminium tolerance gene in sorghum and other species in the Poaceae. Genetics 167:1905–1914 Malcomber ST, Kellogg EA (2004) Heterogenous expression patterns and separate roles of the SEPALLATA gene LEAFY HULL STERILE 1 in grasses. Plant Cell 16:1692–1706 Mallory AC, Vaucheret H (2004) MicroRNAs: something important between the genes. Curr Opin Plant Biol 7:120–125 McCouch SR, Teytelman L, Xu Y, Lobos KB, Clare K, Walton M, Fu B, Maghirang R, Li Z, Xing Y, Zhang Q, Kono I, Yano M, Fjellstrom R, DeClerck G,
Schneider D, Cartinhour S, Ware D, Stein L (2002) Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA Res 9:199–207 McLaren CG, Bruskiewich RM, Portugal AM, Cosico AB (2005) The international rice information system. A platform for meta-analysis of rice crop data. Plant Physiol 139:637–642 McSteen P (2006) Branching out: the ramosa pathway and the evolution of grass inflorescence morphology. Plant Cell 18:518–522 Messing J, Dooner HK (2006) Organization and variability of the maize genome. Curr Opin Plant Biol 9:157–163 Mester DI, Ronin YI, Korostishevsky MA, Pikus VL, Glazman AE, Korol AB (2006) Multilocus consensus genetic maps (MCGM): formulation, algorithms, and results. Comput Biol Chem 30:12–20 Miftahudin, Chikmawati T, Ross K, Scoles GJ, Gustafson JP (2005) Targeting the aluminum tolerance gene alt3 region in rye, using rice/rye micro-colinearity. Theor Appl Genet 110:906–913 Monna L, Kitazawa N, Yoshino R, Suzuki J, Masuda H, Maehara Y, Tanji M, Sato M, Nasu S, Minobe Y (2002) Positional cloning of rice semidwarfing gene, sd1: rice “green revolution gene” encodes a mutant enzyme involved in gibberellin synthesis. DNA Res 9:11–17 Moore G, Devos KM, Wang Z, Gale MD (1995) Cereal genome evolution: grasses, line up and form a circle. Curr Biol 5:737–739 Moore RC, Purugganan MD (2005) The evolutionary dynamics of plant duplicate genes. Curr Opin Plant Biol 8:122–128 Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski A (2005) Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize.
Nat Genet 37:997–1002 Munkvold JD, Greene RA, Bermudez Kandianis CE, La Rota CM, Edwards H, Sorrells SF, Dake T, Kantety R, Linkiewicz AM, Dubcovsky J, Akhunov ED, Dvorak J, Miftahudin, Gustafson JP, Pathan MS, Nguyen HT, Mathews DE, Chao S, Lazo GR, Hummel DD, Anderson OD, Anderson JA, GonzalezHernandez JL, Peng JH, Lapitan N, Qi LL, Echalier B, Gill BS, Hossain KG, Kalavacharla V, Kianian SF, Sandhu D, Erayman M, Gill KS, McGuire PE, Qualset CO, Sorrells ME (2004) Group 3 chromosome bin maps of wheat and their relationship to rice chromosome 1. Genetics 168:639–650 Nagano H, Onishi K, Ogasawara M, Horiuchi Y, Sano Y (2005) Genealogy of the “Green Revolution” gene in rice. Gene Genet Syst 80:351–356 Nam J, De Pamphilis CW, Ma H, Nei M (2003) Antiquity and evolution of the MADS-box gene family controlling flower development in plants. Mol Biol Evol 20:1435–1447 Nelson DR, Schuler MA, Paquette SM, Werck-Reichhart D, Bak S (2004) Comparative genomics of rice and Arabidopsis. Analysis of 727 cytochrome P450 genes and pseudogenes from a monocot and a dicot. Plant Physiol 135: 756–772 Nelson WM, Bharti AK, Butler E, Wei F, Fuks G, Kim H, Wing RA, Messing J, Soderlund C (2005) Whole-genome validation of high information-content fingerprinting. Plant Physiol 139:27–38
Nogueira FTS, De Rosa VE, Menossi M, Ulian EC, Arruda P (2003) RNA expression profiles and data mining of sugarcane response to low temperature. Plant Physiol 132:1811–1824 Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, Ikeo K, Itoh T, Gojobori T, Sasaki T (2006) The rice annotation project database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucl Acids Res 34:741–744 Olsen AN, Ernst HA, Lo Leggio L, Skriver K (2005) NAC transcription factors: structurally distinct, functionally diverse. Trends Plant Sci 10:79–87 Ozturk ZN, Talame V, Deyholos M, Michalowski CB, Galbraith DW, Gozurkirmizi N, Tuberosa R, Bohnert HJ (2002) Monitoring large scale changes in transcript abundance in drought- and salt-stressed barley. Plant Mol Biol 48:551–573 Paterson AH (2006) Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat Rev Genet 7:174–184 Paterson AH, Lin RY, Li ZK, Schertz KF, Doebley JF, Pinson SRM, Liu S-C, Stansel JW, Irvine JE (1995) Convergent domestication of cereal crops by independent mutations at corresponding genetic loci. Science 269:1714–1718 Paterson AH, Bowers JE, Chapman BA (2004) Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA 101:9903–9908 Paterson AH, Freeling M, Sasaki T (2005) Grains of knowledge: genomics of model cereals. Genome Res 15:1643–1650 Pellio B, Streng S, Bauer E, Stein N, Perovic D, Schiemann A, Friedt W, Ordon F, Graner A (2005) High resolution mapping of the Rym4/Rym5 locus conferring resistance to the barley yellow mosaic virus complex (BaMMV, BaYMV, BaYMV-2) in barley (Hordeum vulgare ssp.
vulgare L) Theor Appl Genet 110:283–293 Peng JH, Zadeh H, Lazo GR, Gustafson JP, Chao S, Anderson OD, Qi LL, Echalier B, Gill BS, Dilbirligi M, Sandhu D, Gill KS, Greene RA, Sorrells ME, Akhunov ED, Dvorak J, Linkiewicz AM, Dubcovsky J, Hossain KG, Kalavacharla V, Kianian SF, Mahmoud AA, Miftahudin, Conley EJ, Anderson JA, Pathan MS, Nguyen HT, McGuire PE, Qualset CO, Lapitan N (2004) Chromosome bin map of expressed sequence tags in homeologous group 1 of hexaploid wheat and homeology with rice and Arabidopsis. Genetics 168:609–623 Peng JR, Richards DE, Hartley NM, Murphy GP, Devos KM, Flintham JE, Beales J, Fish LJ, Worland AJ, Pelica F, Sudhakar D, Christou P, Snape JW, Gale MD, Harberd NP (1999) ‘Green revolution’ genes encode mutant gibberellin response modulators. Nature 400:256–261 Prasad V, Stromberg CA, Alimohammadian H, Sahni A (2005) Dinosaur coprolites and the early evolution of grasses and grazers. Science 310:1177–1180 Pratt LH, Liang C, Shah M, Sun F, Wang H, Reid SP, Gingle AR, Paterson AH, Wing R, Dean R, Klein R, Nguyen HT, Ma HM, Zhao X, Morishige DT, Mullet JE, Cordonnier-Pratt MM (2005) Sorghum expressed sequence tags identify signature genes for drought, pathogenesis, and skotomorphogenesis
from a milestone set of 16,801 unique transcripts. Plant Physiol 139:869–884 Qi LL, Echalier B, Chao S, Lazo GR, Butler GE, Anderson OD, Akhunov ED, Dvorak J, Linkiewicz AM, Ratnasiri A, Dubcovsky J, Bermudez-Kandianis CE, Greene RA, Kantety R, La Rota CM, Munkvold JD, Sorrells SF, Sorrells ME, Dilbirligi M, Sidhu D, Erayman M, Randhawa HS, Sandhu D, Bondareva SN, Gill KS, Mahmoud AA, Ma X-F, Miftahudin, Gustafson JP, Conley EJ, Nduati V, Gonzalez-Hernandez JL, Anderson JA, Peng JH, Lapitan NLV, Hossain KG, Kalavacharla V, Kianian SF, Pathan MS, Zhang DS, Nguyen HT, Choi D-W, Fenton RD, Close TJ, McGuire PE, Qualset CO, Gill BS (2004) A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat. Genetics 168:701–712 Rabbani M, Maruyama K, Abe H, Khan M, Katsura K, Ito Y, Yoshiwara K, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2003) Monitoring expression profiles of rice genes under cold, drought, and high salinity stresses and abscisic acid application using cDNA microarrays and RNA-gel blot analyses. Plant Physiol 133:1755–1767 Rabinowicz PD, Bennetzen JL (2006) The maize genome as a model for efficient sequence analysis of large plant genomes. Curr Opin Plant Biol 9:149–156 Ramakrishna W, Emberton J, SanMiguel P, Ogden M, Llaca V, Messing J, Bennetzen JL (2002) Comparative sequence analysis of the sorghum Rph region and the maize Rp1 resistance gene complex. Plant Physiol 130:1728–1738 Rensink WA, Buell CR (2004) Arabidopsis to rice. Applying knowledge from a weed to enhance our understanding of a crop species. Plant Physiol 135:622–629 Reyes JC, Muro-Pastor MI, Florencio FJ (2004) The GATA family of transcription factors in Arabidopsis and rice. Plant Physiol 134:1718–1732 Salse J, Piegu B, Cooke R, Delseny M (2004) New in silico insight into the synteny between rice (Oryza sativa L.) and maize (Zea mays L.) highlights reshuffling and identifies new duplications in the rice genome.
Plant J 38:396–409 Salvi S, Tuberosa R, Chiapparino E, Maccaferri M, Veillet S, Van Beuningen L, Isaac P, Edwards K, Phillips RL (2002) Toward positional cloning of Vgt1, a QTL controlling the transition from the vegetative to reproductive phase in maize. Plant Mol Biol 48:601–613 Salzman RA, Brady JA, Finlayson SA, Buchanan CD, Sun F, Klein PE, Klein RR, Pratt LH, Cordonnier-Pratt MM, Mullet JE (2005) Transcriptional profiling of sorghum induced by methyl jasmonate, salicylic acid, and aminocyclopropane carboxylic acid reveals co-operative regulation and novel gene responses. Plant Physiol 138:352–368 SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20:43–45 Satoh-Nagasawa N, Nagasawa N, Malcomber S, Sakai H, Jackson D (2006) A trehalose metabolic enzyme controls inflorescence architecture in maize. Nature 441:227–230
Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, Muramatsu M, Hayashizaki Y, Kawai J, Carninci P, Itoh M, Ishii Y, Arakawa T, Shibata K, Shinagawa A, Shinozaki K (2002) Functional annotation of a full-length Arabidopsis cDNA collection. Science 296:141–145 Shen QH, Zhou F, Bieri S, Haizel T, Shirasu K, Schulze-Lefert P (2003) Recognition specificity and RAR1/SGT1 dependence in barley Mla disease resistance genes to the powdery mildew fungus. Plant Cell 15:732–744 Shen Y, Jiang H, Jin JP, Zhang ZB, Xi B, He YY, Wang C, Qian L, Li X, Yu QB, Liu HJ, Chen DH, Gao JH, Huang H, Shi TL, Yang ZN (2004) Development of genome-wide DNA polymorphism database for map-based cloning of rice genes. Plant Physiol 135:1198–1205 Shiu SH, Karlowski WM, Pan R, Tzeng YH, Mayer KFX, Li WH (2004) Comparative analysis of the receptor like kinase family in Arabidopsis and rice. Plant Cell 16:1220–1234 Simons KJ, Fellers JP, Trick HN, Zhang Z, Tai YS, Gill B, Faris JD (2006) Molecular characterization of the major wheat domestication gene Q. Genetics 172:547–555 Skinner JS, Von Zitzewitz J, Szücs P, Marquez-Cedillo L, Filichkin T, Amundsen K, Stockinger EJ, Thomashow MF, Chen THH, Hayes PM (2005) Structural, functional, and phylogenetic characterization of a large CBF gene family in barley. Plant Mol Biol 59:533–551 Song R, Llaca V, Messing J (2002) Mosaic organization of orthologous sequences in grass genomes. Genome Res 12:1549–1555 Springer NM, Kaeppler SM (2005) Evolutionary divergence of monocot and dicot methyl-CpG-binding domain proteins. Plant Physiol 138:92–104 Sterck L, Rombauts S, Jansson S, Sterky F, Rouze P, Van de Peer Y (2005) EST data suggest that poplar is an ancient polyploid. New Phytol 167:165–170 Sunkar R, Girke T, Kumar Jain P, Zhu JK (2005) Cloning and characterization of microRNA from rice.
Plant Cell 17:1397–1411 Sutton T, Whitford R, Baumann U, Dong C, Able JA, Langridge P (2003) The Ph2 pairing homeologous locus of wheat (Triticum aestivum): identification of candidate meiotic genes using a comparative genomic approach. Plant J 36:443–456 Suzaki T, Sato M, Ashikari M, Miyoshi M, Nagato Y, Hirano HY (2004) The gene FLORAL ORGAN NUMBER1 regulates floral meristem size in rice and encodes a leucine-rich repeat receptor kinase orthologous to Arabidopsis CLAVATA1. Development 131:5649–5657 Swigonova Z, Bennetzen JL, Messing J (2005) Structure and evolution of the r/b chromosomal regions in rice maize and sorghum. Genetics 169:891–906 Takano M, Inagaki N, Xie X, Yuzurihara N, Hihara F, Ishizuka T, Yano M, Nishimura M, Miyao A, Hirochika H, Shimomura T (2005) Distinct and cooperative functions of Phytochromes A, B, and C in the control of deetiolation and flowering in rice. Plant Cell 17:3311–3325 Tanno KI, Wilcox G (2006) How fast was wild wheat domesticated? Science 311:1886
Térol J, Domingo C, Talon M (2006) The GH3 family in plants: genome wide analysis in rice and evolutionary history based on EST analysis. Gene 371:279–290 The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815 The Rice Chromosome 3 Sequencing Consortium (2005) Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species. Genome Res 15:1284–1291 The Rice Chromosomes 11 and 12 Sequencing Consortia (2005) The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications. BMC Biology 3:20 (http://www.biomedcentral.com/1741-7007/3/20) Thomson MJ, Edwards JD, Septiningsih EM, Harrington SE, McCouch SR (2006) Substitution mapping of dth1.1, a flowering-time quantitative trait locus (QTL) associated with transgressive variation in rice, reveals multiple subQTL. Genetics 172:2501–2514 Tian C, Wan P, Sun S, Li J, Chen M (2004) Genome-wide analysis of the GRAS gene family in rice and Arabidopsis. Plant Mol Biol 54:519–532 Turner A, Beales J, Faure S, Dunford RP, Laurie DA (2005) The pseudoresponse regulator Ppd-H1 provides adaptation to photoperiod in barley. Science 310:1031–1034 Ueguchi-Tanaka M, Ashikari M, Nakajima M, Itoh H, Katoh E, Kobayashi M, Chow TY, Hsing YI, Kitano H, Yamaguchi I, Matsuoka M (2005) GIBBERELLIN INSENSITIVE DWARF 1 encodes a soluble receptor for gibberellin. Nature 437:693–698 Van de Poele K, Saeys Y, Simillion E, Raes J, Van de Peer Y (2002) The automatic detection of homologous regions (ADHoRE) and its application to microcolinearity between Arabidopsis and rice. Genome Res 12:1792–1801 Varshney RK, Graner A, Sorrells ME (2005) Genomics-assisted breeding for crop improvement.
Trends Plant Sci 10:621–630 Vitte C, Ishii T, Lamy F, Brar D, Panaud O (2004) Genomic paleontology provides evidence for two distinct origins of Asian rice (Oryza sativa L.). Mol Gen Genomics 272:504–511 Vollbrecht E, Springer PS, Goh L, Buckler ES, Martienssen R (2005) Architecture of floral branch systems in maize and related grasses. Nature 436:1119–1126 Walia H, Wilson C, Condamine P, Liu X, Ismail AM, Zeng L, Wanamaker SI, Mandal J, Xu J, Cui X, Close TJ (2005) Comparative transcriptional profiling of two contrasting rice genotypes under salinity stress during the vegetative growth stage. Plant Physiol 139:822–835 Wang CJR, Harper L, Cande WZ (2006) High-resolution single copy gene fluorescence in situ hybridization and its use in the construction of a cytogenetic map of maize chromosome 9. Plant Cell 18:529–544 Wang G, Kong H, Sun Y, Zhang X, Zhang W, Altman N, De Pamphilis CW, Ma H (2004a) Genome-wide analysis of the cyclin family in Arabidopsis and comparative phylogenetic analysis of plant cyclin-like proteins. Plant Physiol 135:1084–1089
Wang H, Yu L, Lai F, Liu L, Wang J (2005a) Molecular evidence for asymmetric evolution of sister duplicated blocks after cereal polyploidy. Plant Mol Biol 59:63–74 Wang H, Nussbaum-Wagler T, Li B, Zhao Q, Vigouroux Y, Faller M, Bomblies K, Lukens L, Doebley JF (2005b) The origin of naked grains of maize. Nature 436:714–719 Wang JF, Zhou H, Chen YQ, Luo QJ, Qu LH (2004b) Identification of 20 new miRNA from Oryza sativa. Nucl Acids Res 32:1688–1695 Wang RL, Stec A, Hey J, Lukens L, Doebley J (1999) The limits of selection during maize domestication. Nature 398:236–239 Wang Z, Liang Y, Li C, Xu Y, Lan L, Zhao D, Chen C, Xu Z, Xue Y, Chong K (2005c) Microarray analysis of gene expression involved in anther development in rice (Oryza sativa L.). Plant Mol Biol 58:721–737 Wessler SR (2006) Genome studies and molecular genetics: Part 2 Maize genomics The maize community welcomes the maize genome sequencing project. Curr Opinion in Plant Biology 9:147–148 Wicker T, Yahiaoui N, Guyot R, Schlagenhauf E, Liu ZD, Dubcovsky J, Keller B (2003) Rapid genome divergence at orthologous low molecular weight glutenin loci of the A and Am genomes of wheat. Plant Cell 15:1187–1197 Wing R (2005) Unlocking the secrets of the rice genome. Plant Mol Biol (Special Issue) 59:1–219 Wu KL, Guo ZJ, Wang HH, Li J (2005) The WRKY family of transcription factors in rice and Arabidopsis and their origins. DNA Res 12:9–26 Xiong Y, Liu T, Tian C, Sun S, Li J, Chen M (2005) Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol Biol 59:191–203 Xu Y, McCouch SR, Zhang Q (2005) How can we use genomics to improve cereals with rice as a reference genome? Plant Mol Biol 59:7–26 Yahiaoui N, Srichumpa P, Dudler R, Keller B (2004) Genome analysis at different ploidy levels allows cloning of the powdery mildew resistance gene Pm3b from hexaploid wheat.
Plant J 37:528–538 Yan LL, Loukoianov A, Tranquilli G, Helguera M, Fahima T, Dubcovsky J (2003) Positional cloning of the wheat vernalization gene VRN1. Proc Natl Acad Sci USA 100:6263–6268 Yan LL, Loukoianov A, Blechl A, Tranquilli G, Ramakrishna W, San Miguel P, Echenique V, Dubcovsky J (2004) The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science 303:1640–1644 Yano M, Katayose Y, Ashikari M, Yamanouchi U, Monna L, Fuse T, Baba T, Yamamoto K, Umehara Y, Nagamura Y, Sasaki T (2000) Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering gene CONSTANS. Plant Cell 12:2473–2484 Yap IV, Schneider D, Kleinberg J, Matthews D, Cartinhour S, McCouch SR (2003) A graph-theoretic approach to comparing and integrating genetic, physical and sequence-based maps. Genetics 165:2235–2247 Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q,
Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92 Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li S, Li X, Zheng H, Cong L, Lin L, Geng J, Li G, Shi J, Liu J, Lv H, Li J, Wang J, Deng Y, Ran L, Shi X, Wang X, Wu Q, Li C, Ren X, Li D, Liu D, Zhang X, Ji Z, Zhao W, Sun Y, Zhang Z, Bao J, Han Y, Dong L, Ji J, Chen P, Wu S, Xiao Y, Bu D, Tan J, Yang L, Ye C, Xu J, Zhou Y, Yu Y, Zhang B, Zhuang S, Wei H, Liu B, Lei M, Yu H, Li Y, Xu H, Wei S, He X, Fang L, Huang X, Su Z, Tong W, Tong Z, Ye J, Wang L, Lei T, Chen C, Chen H, Huang H, Zhang F, Li N, Zhao C, Huang Y, Li L, Xi Y, Qi Q, Li W, Hu W, Tian X, Jiao Y, Liang X, Jin J, Gao L, Zheng W, Hao B, Liu S, Wang W, Yuan L, Cao M, McDermott J, Samudrala R, Wong GK, Yang H (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol 3:266–281 Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Foo Cheung F, Wortman J, Buell CR (2005) The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol 138:18–26 Yuan Y, SanMiguel P, Bennetzen JL (2002) Methylation spanning linker libraries link gene-rich regions and identify epigenetic boundaries in Zea mays. Genome Res 12:1345–1349 Yue B, Xue W, Xiong L, Yu X, Luo L, Cui K, Jin D, Xing Y, Zhang Q (2006) Genetic basis of drought resistance at reproductive stage in rice: separation of drought tolerance from drought avoidance.
Genetics 172:1213–1228 Zahn LM, Kong H, Leebens-Mack JH, Kim S, Soltis PS, Landherr LL, Soltis DE, De Pamphilis CW, Ma H (2005) The evolution of the SEPALLATA subfamily of MADS-box genes: a preangiosperm origin with multiple duplications throughout angiosperm history. Genetics 169:2209–2223 Zhang S, Chen C, Li L, Meng L, Singh J, Jiang N, Deng XW, He ZH, Lemaux PG (2005) Evolutionary expansion, gene structure, and expression of the rice wall-associated kinase gene family. Plant Physiol 139:1107–1124 Zhang Y, Wang L (2005) The WRKY transcription factor superfamily: its origin in eukaryotes and expansion in plants. BMC Evol Biol 5:1 (http://www.biomedcentral.com/1471-2148/5/1) Zhu T, Budworth P, Chen WQ, Provart N, Chang HS, Guimil S, Wu WP, Estes B, Zou GZ, Wang X (2003) Transcriptional control of nutrient partitioning during rice grain filling. Plant Biotechnol J 1:59–70 Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W (2004) GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol 136:2621–2632 Zimmermann P, Schildknecht B, Craignan D, Garcia-Hernandez M, Gruissem W, May S, Mukherjee G, Parkinson H, Rhee S, Wagner U, Hennig L (2006) MIAME/Plant-adding value to plant microarray experiments. Plant Methods BMC 2:1 (http://www.plantmethods.com/content/2/1/1)
Index
β-carotene, 136, 422 β-glucuronidase gene (gus, gusA or uidA)/protein (GUS), 187~193, 205–210, 236, 238, 249, 283, 293, 312–318, 342, 345, 346, 349, 375, 383, 384 2-acetyl-1-pyrroline, 131 60S ribosomal protein L19, 82 ab initio gene-finding (prediction), 22, 24, 47, 364 abiotic stress, 38, 71, 74, 159, 190, 260, 371, 440, 441 abnormal long morphology protein, 82 abscisic acid (ABA), 72, 210 accelerated ions, 156 acetohydroxy acid synthase (AHAS), 273, 278 acetolactate synthase (ALS), 273, 278 acetosyringone, 183 actin-binding protein, putative, 72 actin depolymerizing factor, 70 activation tag(ging), 188–190, 194, 209, 211, 213, 257, 333, 335–341, 346–349 cell type-specific, 3, 335~348 estradiol-inducible, 339 acyl-CoA-binding protein, 79 AD (arbitrary degenerate) primer, 253, 382 adapter ligation PCR, 232, 247, 248, 250, 254, 256, 260 adaxial/abaxial polarity, 443 Adh (alcohol dehydrogenase) gene family, 431 Adh1, 281, 283, 286 Adh2, 281, 286
ADP-ribose pyrophosphatase, 82 advanced backcross (AB), 117, 120, 121, 133 advanced backcross QTL (AB-QTL), 121, 122 Aegilops speltoides, 451 Aegilops tauschi, 448 Aequorea victoria, 191, 193 Affymetrix, 37–38, 365 Agilent Technologies, 37 Agrobacterium tumefaciens, 185, 316, 323 mediated transformation, 183–185, 238, 248, 275~283, 285, 286, 312, 383 floral dip/inflorescence infiltration, 167, 277, 286 nopaline synthase promoter (nosPro), 298, 317 nopaline synthase terminator (nos T), 191, 193, 237 strain EHA101, 183 EHA105, 183, 185 LBA4404, 183, 185 Agrostis stolonifera, 434 aldo/keto reductase, 80 allele-sharing map, 163 allelic series, 110, 151, 160, 166, 167, 173, 214 Allium cepa, 434 allotetraploid, 432 amplified fragment length polymorphism (AFLP), 114, 162 analysis of variance, simultaneous component analysis, 103
angiosperms, 441, 444, 452 floral homeotic genes, C-Class, 452 annotated non expressed (ANE) genes, 52 anthocyanin, 96 Anthocyaninless1, 446 antifungal protein 2, 82 Antirrhinum, 452 genes FARINELLI(FAR), 452 PLENA, 452 antitoxicant(s), 153 AP2/EREB/ERF family, 440 apoplastic fluid, 71 apospory specific genomic region (ASGR), 452 Arabidopsis thaliana, 1, 7, 14, 16, 36, 38, 41, 42, 44, 49, 153, 192, 207, 295, 334, 370, 434 genes/mutants/proteins AGAMOUS (AG), 452 AtISA3, 166 AtMPK14, 439 AtMPK3, 439 AtMPK6, 439 AtMPK7, 439 AtWEX, 166 carotenoid beta-ring hydroxylase, 166 CLAVATA1, 212 CONSTANS (CO), 416, 448 CRINKLY4, 166 CycD3, 442 CycD5, 442 Cyclins, 442 DAWDLE, 166 DCL1, 302 DCL2, 302 DCL3, 307 DCL4, 319 EARLY FLOWERING 4 (ELF4), 459 FCA, 459 FLOWERING LOCUS C (FLC), 459
FLOWERING LOCUS T (FT), 338, 417 FLS2, 437 FRIGIDA, 459 frl1, 156 ga1–3, 155 GH3 gene family, 442 Hua Enhancer1 (HEN1), 303–304 HY1, 459 hy4, 155 LEAFY, 461 LEAFY PETIOLE, 338 MPKK7–9, 439 PHABULOSA, 443 PHAVOLUTA, 443 PPO, 277 REVERSION-TO-ETHYLENE SENSITIVITY1, 166 REVOLUTA, 443 SEP1 (AGL2), 461 SEP2 (AGL4), 461 SEP3 (AGL9), 461 SEP4 (AGL3), 461 SEPALLATA, 441, 461 sulfotransferase gene, 97 suv1, 156 SWI2/SNF2 chromatin-remodeling protein (DDM1), 278, 281, 299 tt18, 156 tt19, 156 uvi1, 156 aromatic compounds, 94 artificial microRNAs, 334 ascorbate peroxidase, 73 aspartyl protease, 210 association mapping, 128–130 ATP sulfurylase, 78, 79 ATP-dependent caseinolytic protease, 73 auxin, 94, 335 auxin-related transcriptional factor, 97 BAC clone(s), 361 BAC end sequence(s), 399, 404, 444
BAC fingerprinting, 399 background mutations, 157, 158, 164 bacterial transit peptide (ctr1), 136 barley, 7, 16, 154, 368, 371, 385, 411, 431, 436, 439, 458 genes/mutants/QTLs, see under Hordeum vulgare Barley yellow dwarf virus (BYDV), 309 betaine aldehyde dehydrogenase 2 (BAD2), 132 bidirectional gene trap(s), 188, 207 biotic stress, 38, 45, 69, 71, 74, 159, 260, 371, 440, 441 blast, rice disease, 73, 95 Bowman-Birk protease inhibitor, 75 Brachypodium distachyon, 154, 434, 462 Brachypodium sylvaticum, 450, 462 Brassica, 16, 385 Brassica napus, 210 Brassica oleracea, 435 brown plant hopper, 162
Ca2+-binding protein, 80 Caenorhabditis elegans, 293, 294, 311, 324 lin-4 locus, 294 calmodulin-related protein, 80 calreticulin, 75, 82 CaMV 35S enhancer element, 335, 337, 341 octamer, 340, 341 tetramer, 336, 339 CaMV 35S promoter, 188, 204, 211, 226, 229, 236, 249–251, 293, 298, 313~321, 337 Capillary electrophoresis (CE), 93, 98, 399 -diode array detection (CEDAD), 93, 95 -mass spectrometry (CE-MS), 93, 95, 97 Cassava vein mosaic virus (CVMV), 339
Cauliflower mosaic virus (CaMV), 293, 298 cDNA, 13, 22, 23, 26, 32, 37 full-length, 2, 13, 14, 22, 24, 26, 32~37, 41, 49–52, 62, 66, 200, 357, 362, 385, 412, 434 cDNA chips, 1 cellular network, 104 Cenchrus ciliaris, 452 centromere retrotransposon of rice (CRR), 12 chalcone synthase, 292, 293 chaperonin 60, 76 Chasmanthium, 461 chemical mutagen(s), 2, 150–153 chemometrics, 103 chitinases, 71 Chlamydomonas reinhardti, 439, 441 chloroplast superoxide dismutase (SOD), 71, 72, 74 chromosome centromere-specific satellite DNA (CentO), 12, 14, 201 centromere(s), 10, 12, 14, 48, 51, 62, 200, 452 euchromatic region(s), 198 euchromatin, 12, 50, 62 gene-rich region(s), 6, 201, 204, 214 heterochromatic region(s), 154, 198, 201, 299, 314, 445 linkage map, 9, 33, 35, 116, 366, 368, 407, 412, 421 pericentromic regions, 50 subtelomeric region(s), 198 transcriptionally active regions (TARs), 48, 49 chromosome segment substitution lines (CSSLs), 127, 407, 420, 421 chromosome segmental duplications, 437 cis-acting regulatory element(s), 362 cluster analysis, 77, 82 cold tolerance, 418 colinearity, 7, 401, 450 comparative genomics, 358, 396, 429~463
concentric (crop) circle model, 16, 430 conditional traits, 161 confocal microscopy, 206 conserved noncoding sequences (CNSs), 396, 436 Consortia, see under Institutes/Networks/Consortia coresponse analysis, 96 correlation matrix, metabolite, 103 cosmid, 6 covariance matrix, 103 cross-species comparison, 369 cryo-sectioning, 98 Cys2His2 zinc finger, 282 cysteine protease(s), 210, 211 cytochrome c oxidase subunit 6b-1, 72 cytochrome P450, 442 cytokinin oxidase/dehydrogenase (CKX), 418 cytoplasmic male sterile (CMS), 111 cytosine deaminase, 280 DAPI staining, 200 Data Banks/Databases/Browsers/ Software ABI fragment analysis software, 399 All4One, 374 AraCyc pathway resource, 102 Array Express, 37 Beijing Genomics Institute’s Rice Information System (BGI_RIS), 33, 357 BioCASE, 388 Biomercator, 454 BioMOBY, 388 BLAST, 21, 14, 24, 192, 250, 251, 362, 363, 366, 380–382, 384, 417 BLASTN, 36, 374 BLASTP, 67 BLASTX, 36, 67 CarthaGene, 455 Community Annotation Tool, 365 Comparative Map Viewer (CMap), 26, 370, 371 CropForge, 380
DNA Data Bank of Japan (DDBJ), 21, 35 DNA polymorphism database, 163 Ensembl compara pipeline, 370 Ensembl genome browser, 370 est2genome, 26 Eukaryotic Genome Control pipeline, 25 EXPASY, 358 FAO/IAEA Mutant Variety Database (MVD), 158 FASTA, 374 FGENESH, 14, 364 FingerPrinted Contigs (FPC) software, 8, 399, 401 Functional Genomics Experiment object models (FUGE), 388 GenBank, 21, 38, 363, 371 Generic Genome Browser (GBrouse), 374 Generic Model Organism Database (GMOD), 26, 364, 370 Genevestigator, 457 GENSCAN, 24, 361 GENSCAN+, 361 GrainGenes, 371 Gramene, 25, 26, 357, 368–372, 375, 401, 435, 454 GRIN, 358 High-Throughput Genome (HTG) Sequences division, NCBI, 10 Image-Master 2D Elite software, 64, 65 INE (INtegrated Rice Genome Explorer), 66, 357, 359–361, 363 International Crop Information System (ICIS), 379, 455 International Rice Information System (IRIS), 159, 357, 378, 455 Internet2, 387 Interpro, 372 InterProScan, 23 JoinMap, 455 Knowledge-based Oryza Molecular Biological
Encyclopedia (KOME), 36, 357, 362, 385 Magnaporthe Grisea Oryza Sativa (MGOS) database, 43 MaizeGDB, 372 MapMan, 102, 104 Mascot software, 65 MATDB, 358 Mathematical gene interaction network optimization software (Minos), 82 MetNetDB, 102 MIAME, 53 MOBYnetwork /technology, 375, 388 MOsDB, 357 Multilocus Consensus Genetic Maps, 455 NCBI-GEO, 37 NSF Rice Oligo Array, 365 OMAP, 357 OryGenesDB, 357, 373–375 Oryza Tag Line, 375, 376 Oryzabase, 357, 366–369, 375, 435 Osa1, 26, 370 Pathway Tools Omics Viewer, 102 PaVESy, 102 Pfam (database), 24, 364, 372 Phred-phrap software, 10 PLACE db, 358 Plant Genomes Central, 358 PlantCARE, 358 PlantGDB, 435 PlexDB, 358 Predotar, 372 Program to Assemble Spliced Alignments (PASA), 25, 364 PROSITE, 24 RepeatMasker, 23, 24 Rice Annotation Database (RAD), 357, 361 Rice Annotation Project Database (RAP-DB), 26, 37, 357, 361 Rice Array Db, 358
Rice Expression Database (RED), 34, 66, 358, 362, 363 Rice Full-length cDNA Database, 66 Rice Gene Machine Information Management System (RGMIMS), 384 Rice Genome Automated Annotation System (RiceGAAS), 24, 66, 357, 361 Rice Genome Research Program (RGP), 8, 11, 32, 359, 361 Rice Insertion Sequence Database (RISD), 190, 356, 383 Rice Massive Parallel Signature Sequencing (MPSS) gene expression database, 358 Rice Membrane Protein Library, 63 Rice Microarray Opening Site (RMOS), 34, 358 Rice Mutant Database (RMD), 192, 357, 381 Rice PIPELINE, 362, 363 Rice Proteome Database, 63, 357 Rice Tos17 Insertion Mutant Database, 66, 380 RiceGE genome browser, 385–386 RiceHMM, 24, 361 Shanghai T-DNA Insertion Population Database (SHIP), 357, 383 SwissProt-Trembl, 372 SyMAP alignments, 401, 403 synteny viewer, 370 Taiwan Rice Insertional Mutants (TRIM), 357, 382 Tapir, 388 Taverna workflow tool, 388 The Arabidopsis Information Resource (TAIR), 99, 310, 366, 377 TMHMM, 372 Trait Ontology database, 160 Virtual Plant Information Network, 388 webFPC, 401 WORLD-2DPAGE, 63
de novo cytosine methylation, 294 deep-coverage large-insert BAC libraries, 396, 399 dehydrin, 74 dehydroascorbate reductase, glutathione S-transferase-dependent, 75 demethylmenaquinone methyltransferase, 210 desulfoglucosinolate sulfotransferase, 97 Dictyostelium discoideum, 441 diepoxybutane (DEB), 153, 158 differential proteomics, 68~74 hormones auxin, 77 brassinosteroids (BRs), 76 gibberellin, 75, 76 jasmonic acid, 76 stresses cold, 69 drought, 70, 71 fungus, 73, 74 ozone, 72, 73 salinity, 71–72 virus, 75 diphtheria toxin A fragment, 277, 280 disease resistance (R) gene, 454 diurnal time of harvest, 93, 96 DNA marker(s), 359, 418 DNA methyltransferases, 298, 299, 307 DNA microarray, 96 DNA polymerase(s), 11, 399 DNA polymorphism(s), 162 DNA pool(s), 160, 166, 168, 186, 212 DNA-binding with one finger (Dof), 440 domestication syndrome, 432 double-strand break (DSB), 158, 183, 197, 282, 283 double-stranded RNA (dsRNA), 213, 293~324 double-stranded RNA-binding proteins (dsRBPs), 305
doubled haploid (DH) populations, 72, 118, 119 Drosophila melanogaster, 130, 153, 168, 207, 246, 281, 284, 335, 342 yellow gene, 282, 283 drought QTL, 171 DT-A gene, 277, 280, 281 early flowering, 420 eating quality, 418 EcoTILLING, 171 ectopic expression, 340 ectopic gene targeting, 273~287 ectopic recombination, 137, 277 Edman degradation, 67 elongation factor 1β’ (EF-1β’), 77 embryo (scutellum)-derived calli, 185, 277 endogenous gene targeting, 285 enhancer element(s), 335, 341 enhancer trap (ET), 190, 204, 236~262, 342, 375 enolase, 72 environment ontology (EO), 372 enzymatic assay, 97 epistasis, 110, 132 epistatic interactions, 137 ethylmethane sulfonate (EMS), 152–153, 158 see also under mutagenesis expressed sequence tag (EST)(s), 1, 8, 15, 24, 32~53, 251, 359, 363, 371, 433~463 expression profiling, 1, 2 eXtensible Markup Language (XML), 387 extrachromosomal circular molecule, 282, 284 ferritin, 69 Festuca arundinacea, 434 Fiber-FISH analysis, 9 flanking sequence tag (FST)(s), 173, 182~215, 232, 253, 254, 256, 373, 375, 382, 384 FST database(s), 186, 212, 254, 384
flavonoid, 96 FL-cDNA synthesis biotinylated cap trapper method, 35 oligo-capping method, 35 flowering time QTL, 456 FLP site-specific recombinase, 282, 284 fluorescence-activated cell sorting (FACS), 341 fluorescence two-dimensional difference gel electrophoresis (2D-DIGE), 77–79 fluorescently labeled ddNTP, 399 formate dehydrogenase, 76 forward genetics, 3, 161, 186, 208, 241 fosmid, 6, 11, 16 fourier-transform ion cyclotron mass spectrometry (FT-ICR-MS), 95, 97 foxtail millet, 371 FPC physical map, 370 fructokinase, 75 functional nucleotide polymorphism (FNP), 122, 133, 417 fungal resistance, 448 GAL4, 207, 335~349 databases, 343 driver, 344~349 endogenous responder, 347 enhancer trapping, 335, 338, 342, 343 responder, 3, 344–347 transactivation, 344 GAL4:GFP enhancer trap, 375 GAL4:VP16 fusion protein, 343 gas chromatography (GC), 93 gas chromatography mass spectrometry (GC-MS), 96–98 gene-enriched (GE), 463 gene ontology (GO), 23, 36, 361, 372, 388 gene regulatory networks, 396 gene replacement, 274, 279, 284 gene silencing, 1, 150, 164, 291~324 gene-specific screening, 277–279
gene tagging efficiency, 277 gene tagging systems, 225~260 one-element system, 229~249 two-element system, 229~250 gene targeting, 273~281, 334 chimeric RNA/DNA oligonucleotide-directed, 278, 279 homologous recombination-dependent, 273, 274, 278, 282–285 gene trap (GT), 205, 236, 238, 244, 248, 250, 251 gene trap Ds, 384 GeneChip 51K Affymetrix, 163 Syngenta, 163 genetic engineering, 313, 422 genetic marker(s), 114–116, 401, 413 genetically modified organisms (GMOs), 135, 422 genome annotation, 22, 25, 26, 40, 41, 45, 47, 52, 66, 201, 357, 361, 364–366, 370, 383, 433 genome browsers, see under Data Banks/Databases/Browsers/Software germin-like protein, 71 GFP reporter gene (gfp), 188, 207, 279, 343 egfp, 193, 208 Giardia lamblia, 441 gibberellic acid (GA) biosynthesis, 134 gibberellin, 75, 134, 440, 458 gibberellin 20 oxidase, 418 gibberellin-responsive genes, 37 glucosinolate biosynthesis gene, 97 glutamine synthetase root isozyme, 72 glutenin gene, 448 glyceraldehyde-3-phosphate dehydrogenase, 76, 78, 80 Glycine max, 434 glyoxalase-I, 80 Golden Rice, 136, 422 grain size, 420
grain, food quality, 418 Graph-Theoretic approach, 455 graphical consensus genetic map, 455 graphical viewer, web-based, 361 grassy stunt virus, 111 green leafhopper, 111–112 growth factor 14-c protein, 76 growth stage (GRO), 371 GUS reporter gene (uidA), 187, 236, 342, 345, 384 gus:nptII reporter gene, 283 gymnosperms, 441 HAD-GT12 Genetic Analyzer, 168 haplotype, 126, 132 heat shock factor 7 (HSF7), 128 heat-shock promoter, 282 heavily manually edited maps (HME), 401 helitron rolling-circle transposon, 449 helix-loop-helix protein (bHLH), 122, 449 herbicide resistance/tolerance, 232, 277, 279, 422 heterochromatin, 50–52, 295, 445, 446 heteroduplex, 166, 275 heterologous plants, 228 heterotrimeric G protein, 163 hierarchical shotgun (sequencing) strategy, 6, 7 high C0t analysis, 463 high performance liquid chromatography (HPLC), 94, 96 high-throughput transformation, 184 homeobox, 76 homeodomain-basic leucine zipper (HD-ZIP) gene, 443 homologous recombination, 273~286 Hordeum vulgare, 434, 441 genes/mutants/QTLs flowering time, 441 hardness locus (Ha), 448 HvCO1, 441 HvCO3, 441 HvCO6, 441
PPD-H1, 16, 451 sw3, 451 hormone homeostasis, 442 HSP18.2 heat-shock promoter, 339 hygromycin phosphotransferase (HPH; hpt; hph), 187, 189, 191, 193, 201, 226, 232, 233, 236, 238, 244, 275 Hypertext Markup Language (HTML), 387 HypoMethylated Partial Restriction Libraries (HMPRL), 463 illegitimate recombination, 183 in planta transformation, 214 independent component analysis (ICA), 103 indol acetate (IAA), 442 IAA28, 97 inflorescence architecture, 460 inflorescence QTL, 460 insect resistance, 422 insertion/deletions (indels), 129, 162, 163, 173 Institutes/Networks/Consortia Agricultural Research Station (ARS), US Department of Agriculture (USDA), 166 Arizona Genomics Computational Laboratory (AGCoL), 8 Arizona Genomics Institute (AGI), 8, 16, 398 Asia Pacific Advanced Network (APAN), 387 Beijing Genomics Institute (BGI), 13, 35 Bio-oriented Technology Research Advancement Institution, 35 Centre de coopération internationale en recherche agronomique pour le développement (CIRAD), 373 Centro Internacional de Agricultura Tropical (CIAT), 209
China Rice Functional Genomics Program (CRFGP), 190 Clemson University Genomics Institute (CUGI), 8 Committee on Gene Symbolization, Nomenclature and Linkage of the Rice Genetics Cooperative, 26 Consultative Group on International Agricultural Research (CGIAR), 3, 455 Cornell University, Ithaca, New York, 397 Crops Pathology/Genetics Research Unit of UC Davis, 153 CSIRO Plant Industry, 248, 384 Dale Bumpers Rice Stock Center, Arkansas, 168 European Molecular Biology Laboratory (EMBL), 21, 368 Food and Agricultural Organization (FAO), 378 Ford Foundation, 377 Foundation for Advancement of International Science (FAIS), 35 Gene Ontology Consortium, 375, 435, 443 Generation Challenge Programme (GCP), 379, 380, 388 Genome Center, UC Davis, 168 Institute of Genetic Resources, Kyushu University, Japan, 153, 161 Institute of Plant Physiology and Ecology (IPPE), 160 International Maize and Wheat Improvement Center (CIMMYT), 380, 455 International Oryza Map Alignment Project (I-OMAP), 407 International Rice Functional Genomics Consortium (IRFGC), 4, 357, 379, 387, 388 International Rice Genome Sequencing Project (IRGSP),
7~16, 24–26, 35, 37, 356, 357, 361, 381, 385, 387, 396~407, 430, 438 International Rice Research Institute (IRRI), 111, 112, 134, 158, 159, 162, 166–168, 170–173, 377–380, 397, 455 International Wheat Genome Sequencing Consortium (IWGSC), 463 Ministry of Agriculture, Forestry, and Fisheries (MAFF), Japan, 359 National Agricultural Research and Extension Systems (NARES), 379 National Bioresource Project (NBRP), 368 National Center of Plant Gene Research, Huazhong Agricultural University, China, 381 National Institute of Agrobiological Resources (NIAR), 34 National Institute of Agrobiological Sciences (NIAS), 34, 35, 37, 66, 159, 202, 357, 359, 362, 363, 375, 381 National Institute of Genetics, Mishima, Japan, 397 National Special Key Program on Rice Functional Genomics of China, 381 Oryza Map Alignment Project (OMAP), 112, 395~407 OryzaSNP Consortium, 357 Plant Ontology Consortium, 368, 375 Plant Research International (Netherlands), 160 Rice Annotation Project (RAP), 26 Rice Annotation Project 1 (RAP1), 37 Rice Chromosome 3 Sequencing Consortium, 401 RIKEN Institute, 35
Rockefeller Foundation, 377 Salk Institute Genome Analysis Laboratory (SIGnal), 311, 385 Seattle TILLING Project, 166, 167 Shanghai Institutes for Biological Sciences (SIBS), 160 Society for Techno-Innovation of Agriculture, Forestry and Fisheries, Japan, 34 The Arabidopsis Genome Initiative, 433 The Institute for Genomic Research (TIGR), 26, 35, 36, 38, 43, 45, 357, 361, 363, 364, 366, 370, 372, 374, 383, 385, 386 The Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, 153 Yale Plant Genomics, 358 Yeongnam National Agriculture Station (Milyang, Korea), 382 integrated analysis, 95–97 intermediate ends-out molecule, 282–285 introgression lines, 98, 124, 125 inverse PCR (iPCR), 190, 209, 232, 256, 383 inversions, 432 invertase, 134, 135 ionizing radiation, 150, 152, 155 I-SceI endonuclease(s), 282, 283, 284 isoelectric focusing (IEF), 64 isogenic lines (ILs), 421 isotope coded affinity tag (ICAT), 77 isozymes, 33, 115, 162 jasmonate (JA), 442 kafirin, 448 kinomes, 438 knockout (KO)(s), 203, 205, 211, 256–258, 261, 334 lactoylglutathione lyase, 75 large-scale transformation, 277
laser capture microdissection (LCM), 98, 341 laser-induced fluorescence detector, 98 launch pad (LP), Ds/T-DNA, 199, 228~250, 358, 384 lethal negative selection marker, 280 LiCor genotyper, 168 light harvesting complex chain II, 71 linear energy transfer (LET), 156 linear immobilized pH gradient (IPG) tube gels, 64 linkage disequilibrium (LD) mapping, 128, 130 lipid transfer protein, 71 liquid chromatography (LC), 94, 96 liquid chromatography mass spectrometry (LC-MS), 98 lodging resistance, 420 log of odds (LOD), 371 long SAGE, 385 long terminal repeat (LTR), 361 LTR retrotransposon, 446, 448 solo LTRs, 406 loss-of-function mutants/mutations, 2, 159, 257, 302, 334, 336 Lotus japonicus, 16 luciferase, 339 Lycopersicon esculentum, 434 gene prf-3, 155 machine learning, 99 MADS box gene(s), 190, 212, 441, 461 Magnaporthe grisea, 42, 43, 73, 375, 451 maize, 7, 120, 130, 136, 153, 154, 168, 171, 226–228, 232, 250, 254, 273, 279, 304, 310, 316, 335, 370–372, 384, 385, 401, 411, 423, 430–432, 434–436, 440~463 BAC map, 370 genes/mutants/proteins, see under Zea mays genetic markers, 364
inbred lines B73, 449, 463 McC, 449 Mo17, 449 malate dehydrogenase, cytoplasmic, 80 map-based cloning, 15, 162, 163, 360, 450–454 MAP Kinase (MAPK), 439 MAP Kinase Kinases (MAPKK), 439 MAP Kinase Kinase Kinases (MAPKKK), 439 mapping populations, 2, 116–118, 129, 130, 170, 414 marker-assisted selection (MAS), 15, 123, 413, 418 Markov model, 22 Maskless Array Synthesizer (MAS), 48 mass spectrometry (MS), 93~98 Massively Parallel Signature Sequencing (MPSS), 2, 32, 40, 44–45, 53, 356, 365 matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-MS), 62 Medicago truncatula, 16, 310, 434 metabolic flux, 98, 99, 102 map, 102 network, 94, 102, 103 pathway visualization, 102 metabolism, 98 quenching, 93 metabolite analysis, spatially resolved, 98 fingerprinting, 95 network, 104 profile, 94, 97 response, global/local, 98 steady-state level, 98, 102 metabolome, 92 metabolomics, 91, 94 methyl CpG- binding domain protein, 442 methyl filtration, 463
Methylation Spanning Linker Libraries (MSLL), 463 methylmalonate-semialdehyde dehydrogenase (MMSDH), 77 microarray(s), 2, 32–53, 69, 96, 163, 224, 260, 356, 358, 412 genome tiling, 46–48 oligomicroarray, 32, 37, 38 microcolinearity, 444, 447 microgenomics, 341 microhomology, 196 microprojectile-bombardment, 184 microRNA(s) (miRNA), 44, 52, 137, 295~320, 436 microspores, 118–119 minimal promoter (MP), 204, 207 modular misexpression screen, 347 molecular clock, 431 molecular marker(s), 371, 412–414 morphological mutants, 401 mouse embryogenic stem cells, 276, 281 multidimensional protein identification technology (MudPIT), 78 multivariate statistical method, 103 mustard gas, 152 mutagenesis EMS mutagenesis, 160, 170 N-ethyl-N-nitrosourea (ENU), 153 fast neutron(s), 155, 158, 159, 163, 165 high-LET radiation, 156 insertional, 2, 151, 185~190, 223~261 γ-irradiation mutagenesis, 154, 155 localized, 246, 248 low-LET radiation, 156 N-methyl-N-nitrosourea (MNU), 154, 158, 161, 368 MNU-induced mutant(s), 168 non-targeted, 245, 246 RNAi-directed, 292 saturation mutagenesis, 150 targeted, 246–248, 256 X-rays, 156
NAC domain, 440 nascent polypeptide associated complex alpha chain, putative, 72 natural allelic variation, 422 natural antisense transcript RNA (natsiRNA), 295, 302 naturally occurring alleles, 109 N-Blocked proteins, 67 NBS-LRR gene, 448 nearly isogenic lines (NIL), 120, 122–127, 413, 416–418 negative selection marker (NSM), 234 neomycin phosphotransferase (NPTII), 283 Networks, see under Institutes/Networks/Consortia Nicotiana benthamiana, 41 Nicotiana tabacum, 71 nitrosoguanidine, 152 non-transposable elements, 38, 48 nonhomologous end-joining (NHEJ), 197, 274–276, 280–282 nonhomologous recombination, 203 nonorthologous genes, 404 nonorthologous retroelements, 404 nonsynonymous substitution rate, 438 nuclear magnetic resonance spectroscopy (NMR), 92, 98 nucleotide binding protein 2, 76 oat, 371, 431 oligomicroarray 22K rice, 37 44K rice, 52, 164 one-sided invasion, 275, 281 ortholog prediction, 376, 377 Oryza Rearrangement Index, 406 Oryza species Genes/mutants/proteins Cf2/Cf5, 340 Ehd1, 416, 417, 459 ethylene-responsive element binding protein 3 (ERF3), 171 floral organ number 1 (fon1), 451 FON1, 214
GA2ox, 340 HKT, 126 lax panicle, 451 Lhd4, 416, 417 MYB transcription factor(s), 210 NAC6, 83 OsB, 441 OsCKX2, 125, 126 OsCP1, 210, 211 OsDMKT1, 210 OsE, 441 Osem, 75 OsFCA, 459 OsGA20ox2, 134 OsMADS1/LHS1 (LEAFY HULL STERILE1), 461 OsMADS15, 346 OsMYBS3, 346 OsPR10, 73 OsRac, 334 OsRLK1, 210 OstubA1, 209 OsZFP33, 209 oxalate oxidase (OsOXO), 170 oxalate oxidase-like protein (OsOXL), 170 oxidoreductase, NADPH-dependent, 77 oxygen evolving enhancer protein 2 (OEE2), 72, 75, 78 PHOTOPERIOD SENSITIVITY5 (SE5), 458 R genes, 131 Red pericarp (Rc), 122 RICE LEAFY (RFL), 459 semi-dwarf1 (sd1), 133, 134, 419, 450 Shoot Potassium Concentration1 (SKC1), 123 splicing factor-like protein, putative, 72 subtilin-like protease, 210 UDT1, 210, 214 undeveloped tapetum 1 (udt1), 210 Waxy, 130, 137, 278, 286
WRKY, 210 Xa1, 123 Xa2, 136 Xa21, 112, 123, 163, 437 Xa26, 123 Xa27, 123, 124 xa5, 123, 124, 130 O. alta, 397 O. australiensis, 397 O. barthii, 113 O. brachyantha, 397, 401 O. coarctata, 399 O. eichingeri, 397 O. glaberrima, 112, 113, 127, 397, 404, 405, 407, 432 O. glumaepatula, 113, 114 O. grandiglumis, 397 O. granulata, 113, 397 O. latifolia, 397 O. longiglumis, 397 O. longistaminata, 112, 113, 397 O. meridionalis, 113 O. meyeriana, 397 O. minuta, 402 O. nivara, 111, 112, 397, 399, 401, 404, 405, 445 O. officinalis, 113, 402 O. punctata, 398, 401 O. rhizomatis, 397 O. ridleyi, 113, 402 O. rufipogon, 111–115, 117, 121, 122, 399–402, 404 O. sativa, 25, 27, 43, 62, 64, 111~137, 250, 310, 333, 396, 376, 395, 397, 401~407, 412, 416, 422, 432, 434 subspecies aromatic, 119 ashina, 119 aus, 119, 127, 135 indica, 1, 13, 33~52, 62, 114, 115, 159, 184, 214, 432, 440, 450 japonica, 1, 13, 25, 32, 37, 45, 48–52, 62, 71, 112, 114, 115, 119, 122, 123, 125–128, 133,
134, 163, 167, 168, 183, 184, 188, 190, 243, 250, 251, 254, 310, 339, 343, 362, 368, 375, 378, 381–384, 401, 415–416 javanica, 118 rayada, 119 O. schlechteri, 397 QTLs blast resistance, 121 cadmium (Cd) concentration in grain, 127 cold tolerance, 120 cooking quality, 120, 127 eating quality, 120 flowering time, 122 Gn1, grain number, 125, 418 grain length, 121, 127 grain width, 127 Hd1, 415–418, 441 Hd2, 133, 415, 416 Hd3, 415 Hd3a, 416, 417 Hd4, 415, 416 Hd5, 415, 416 Hd6, 133, 415, 416 Hd7, 416 Hd8, 416 Hd9, 416 Hd10, 416 Hd11, 416 Hd12, 416 Hd13, 416 Hd14, 416 heading date, 127 nitrogen content, 127 Ph1 (plant height1), 16, 418, 450, 451, 453 resistance to iron toxicity, 127 root length, 120 root number, 120 sd1, 122 seed dormancy, 127 submergence tolerance, 120 tiller angle, 127 yield, 121, 122
overlapping oligonucleotide (overgo) probe(s), 8 PAC/BAC contigs, 359 paleopolyploid, 431 papain protease(s), 211 pathogen recognition receptor, 438 pea, 154 Pea seed-borne mosaic virus (PSbMV), 298 pearl millet, 371, 431 P-element transposon, 342, 347 Pennisetum, 452, 461 P. squamulatum, 452, 453 peptide mass fingerprinting, 63 perfect markers, 130–132 Perlegen Sciences, 379 phenylalanine ammonia-lyase, 76 phosphoproteome, 80 photodiode array detector (PAD), 94, 96 photolithography, 46 photoperiod response, 417 phylogenetic tree, 377 phylogenomics, 373, 376 phytochrome transduction pathway, 440 phytoene synthase (psy), 136 Pinus taeda, 42, 434 plant and trait ontology (PATO), 369 plant ontology (PO), 368, 369, 372, 379 plasmid rescue system, 232 pleiotropic effects, 347 Poaceae, 358, 431, 462 sub-family Bambusoideae tribe, Oryzoideae, 431 Chloridoideae, 431 Panicoideae, 431 tribe, Andropogoneae, 431, 447, 452 Poideae, 431 tribe Aveneae, 431 Triticeae, 431
polymorphic markers, 130 polyploid, 432 polyploidization, 431, 432, 451 polyploidy, 396 polyvinylidene difluoride membrane, 64 pooled DNA, 157 Porteresia coarctata, 3, 396 positional cloning, 2, 124, 363, 462 positive-negative selection, 275~286 post-transcriptional gene silencing (PTGS), 292~294 Potato spindle tuber viroid (PSTVd), 294 Potato virus X (PVX), 314 Potato virus Y (PVY), 293 PR proteins, 73 PR-1, 74–76, 339 primary transformant(s), 188, 198, 214 principal component analysis (PCA), 101, 103 probenazole-inducible protein, 73 promoter trap, 236 propidium iodide, 345 protein accession number, 66 cellular localization, 362 kinase, 15, 336 post-translational modification, 62, 67 sequencer, gas-phase, 64 transmembrane structure, 362 protein-protein interaction, 81–83 proteome, 2, 14, 26, 62~83, 114 proteomics, 1, 62~83, 91, 95, 194, 435 protoporphyrinogen oxidase, 277 provitamin A, 136 pseudogenes, 442 pseudomolecule(s), 14–16, 200, 253, 357, 361, 364, 370, 374, 381–383, 385, 386, 403 Puccinia tritica, 448 pulsed-field-gel-electrophoresis, 404 quantitative trait loci (QTL), 98, 110, 371, 372, 401, 413, 432, 454
major QTL for aluminium tolerance, 455 meta-analysis of QTL, 454, 455 QTL pyramiding, 418 QTL-NIL, 125, 126 QTLs, rice, see under Oryza species, QTLs random amplification of polymorphic DNAs (RAPDs), 114, 162 marker(s), 444 random integration(s), 275, 277, 281 reactivation, 228 receptor-like kinase(s) (RLK), 75, 439 recombinant inbred lines (RILs), 120, 121, 368 recombinase, 284 recombinational mapping, 130 recurrent parent (RP), 112, 117, 121–123, 414, 420, 421 reductase-like protein, 73 repeat-associated small interfering RNA (rasiRNA), 295~307 replication protein A1, 75 reporter-gene expression patterns, 381 restriction fragment length polymorphism(s) (RFLP), 33, 113~116, 162, 368, 371, 412, 443 retroelements, 404 retrotransposon(s), 4, 12, 182, 208, 224, 241, 257, 311, 406, 412, 446, 448 copia-type, 201 gypsy-type, 201 LINE (long interspersed nuclear element), 241 MITE (miniature inverted repeat transposable element), 241 mPing, 208, 241, 243 Tos17, 15, 132, 154, 159, 160, 182, 187, 192, 201–204, 208–210, 212, 214, 241, 255, 277, 311, 334, 356, 357, 362, 374, 380, 381, 389, 412 reverse genetics, 3, 164, 212, 241, 242, 255, 258, 261 ribulose-1,5-bisphosphate, 70
rice actin1 promoter, 342 bacterial blight disease, 162 blast disease, 162 chromosome substitution lines, 368, 407 cultivar (variety) 93-11, 13, 115, 163, 430, 432 9522, 163 Azucena, 74 Bala, 187 Calrose, 76, 134 CT9993, 70 Dee-gee-woo-gen (DGWG), 134 Dongjin, 188, 190, 195, 208, 383 Gihobyeo, 72 Guang-lu-ai 4, 163 H-26, 243 Habataki, 125, 126, 418, 419 Haenuki, 95 Hayamasari, 415, 416 Hwayoung, 190 IR24, 112, 113, 123, 124 IR36, 96 IR62266, 70 IR64, 74, 132, 155, 156, 159, 160, 379 IR8, 134 IRBB21, 113, 163 IRBB27, 124 IRBB5, 124 Kasalath, 128, 133, 187, 415–418 Kinmaze, 161 Koshihikari, 125, 126, 416, 418, 419 isogenic lines Kanto IL1, 418, 419 Wakei, 367, 418, 419 Koshikari, 126 Long-te-pu B, 163 LYP9, 33 M202, 159 Milyang 23, 72 Nipponbare, 7, 9, 12, 25, 32, 62, 64, 94, 95, 115, 117, 128, 132, 133, 156, 158, 159, 163,
182, 187, 190, 192, 208, 241, 247, 250, 251, 254, 343, 368, 375, 381, 382, 401, 415–417, 430, 432, 445, 446, 458 Nona Bokra, 126, 415, 416 Norin 8, 128 Taichung 65, 117, 158, 161, 415 Taichung Native (TN1), 134, 187 Tainung 67, 154, 158, 187, 189, 194, 208, 382 Toyonishiki, 123 TY1, 187 Zhonghua 11, 163, 187–190, 208, 209, 247, 343, 381, 383 Zhonghua 15, 187, 189, 190, 209, 343, 381 Zhongzuo 321, 187 genetic marker(s), 8, 114, 116, 120, 150, 161, 364, 401, 413 genomes AA, 114, 368, 401, 432 BB, 368 BBCC, 368 CC, 368 CCDD, 366 EE, 368 FF, 368 GG, 368 HHJJ, 368 linkage blocks, 443 linkage maps integrated, 368 Nipponbare/Kasalath, 368 recombinant inbred (RI), 72, 117, 120, 368 wild rice accessions, 368 rice mutant collection, IR64, 379 rice proteome, 62~68, 83, 357 RNA interference (RNAi), 1, 4, 164, 208, 224, 261, 292, 293, 295, 299, 324 RNA silencing, 292, 324, 334, 335 RNA-dependent DNA methylation (RdDM), 137, 294, 298
RNA-dependent RNA polymerase (RdRP), 293, 307, 315, 321, 322 RT-PCR, 53, 124, 209, 339, 340, 349, 350, 435, 447 semi-quantitative, 124 RuBisCO, 70, 73, 74, 76 activase, 70 binding protein α subunit, 82 large subunit, 76 ryegrass, 452 Saccharomyces cerevisiae, 342 RAD54 gene, 278, 279, 285 Saccharum officinarum, 436 major leaf brown rust resistance (Bru1) gene, 452 salinity tolerance, 162 salt-induced protein (SALT), 71, 74 salt stress, 71, 72 salt tolerance, 420 screenable marker, 274 Secale cereale, 434 secondary target sequence, 285 seed morphology, 375 seed storage protein, 277 selectable marker(s), 229~238, 252 semantic frameworks, 388 semantic web language(s), 387 sequence mis-assemblies, 11 positional accuracy, 6 sequence accuracy, 6 sequence-indexed mutants, 213 sequence-specific integration, 276 Sequence Tagged Connector (STC) method, 9 Sequence Tagged Site (STS), 8 sequencing Maxam-Gilbert method, 6 Sanger dideoxy-chain terminator method, 6 Serial Analysis of Gene Expression (SAGE), 32, 40, 43, 53, 54, 59, 224, 356, 365, 385 LongSAGE, 40–42, 224
Robust-LongSAGE (RL-SAGE), 42, 43 SuperSAGE, 41, 53 signal processing, 103 signal transduction, 67, 69–71, 75, 77, 335, 336, 439 silent mutation, 92 simple sequence repeats (SSRs), 15, 115, 162, 371 single crossover, 275 single-feature polymorphisms (SFP), 163 single nucleotide polymorphism(s) (SNPs), 13, 15, 115, 128, 132, 151, 162, 163, 168, 170, 171, 173, 370, 462 single-sequence length polymorphism (SSLP), 162 site-specific base change, 279 site-specific recombinase, 282 S-Like RNase homolog, 70 small (short) interfering RNA (siRNA), 49, 52, 177, 294–298, 302, 303, 306–309, 314, 319, 320 small G protein, 76 small nuclear RNA (snRNA), 436 small nucleolar RNA (snoRNA), 436 SNaPshot fingerprinting, 399 sodium azide, 154, 160 Software, see under Data Banks/Databases/Browsers/ Software Solanum tuberosum, 434 somaclonal variation(s), 208, 214, 277, 285, 286, 340 sorghum, 7, 401, 431, 445–448, 462, 463 genetic markers, 364 Sorghum bicolor, 434 Sorghum propinquum, 434 soybean, 154 spotted leaf gene, Spl7, 127 SSR/SNP map, genome-wide, 406 stable isotope labeling, 98 starch biosynthesis pathways, 172 stress tolerance, 413, 421
subspecies variation, 14 substrate specificity, 104 subtelomere-specific satellite DNA, 201 Subterranean clover mosaic virus promoter (ScMV P), 193 sugar, 95, 97, 98 sugar phosphate, 95 sugarcane, 431, 452 sulfur deficiency, 97 superbinary vector, 183, 184 sweet potato sporamin gene, 339 synonymous substitution rate, 438 syntenic mapping, 16 synteny, 7, 443 systems biology, 91~104 tandem duplication, 22 tandemly-arranged genes, 165 tapetum, 210, 211 target site duplication (TSD), 242 targeted allele replacement, 137 targeted gene replacement, 283 Targeting Induced Local Lesions IN Genomes (TILLING), 2, 4, 151~173 T-DNA, 4, 164, 181~215, 224~262, 276, 280, 283, 334~348, 375, 381, 412 activation tagging lines, 383 flanking sequence(s), 381, 383 insertion lines, 188~196, 206–208, 214, 232, 248, 357, 375, 381 single-stranded T-DNA, 276 telomere, 16 teosinte, 431 terminal inverted repeats (TIRs), 242 thermal asymmetric interlaced PCR (TAIL-PCR), 216, 232, 239, 240, 247~260, 383 thioredoxin, 76 Ti plasmid pTiBo542, 183 TIGR gene indices, 364 tiling microarray, 45~50, 356, 436 tms2, 156 Tobacco, 279, 283, 293
Tobacco etch virus (TEV), 293 Tos17, see under Retrotransposon(s) traditional mutagenesis, 151 trait ontology (TO), 367, 371, 379 trans-acting siRNAs (ta-siRNAs), 296, 306, 308, 309, 320 transactivation, 207, 341–348 transcript map, 8, 359–360 transcription factor(s), 50, 96, 124, 208, 210, 259, 296, 316, 336, 346, 349, 382, 436, 437, 458, 460 C-repeat binding factor (CBF), 438, 440 GATA, 438, 439 GRAS, 438, 451 TFIIA, 124 WRKY, 438 transcription regulation, chromosome level, 104 transcriptional gene silencing (TGS), 295, 296, 298 transcriptome(s), 31, 32, 38~47, 69, 97, 114, 210, 212, 214, 260, 341 transcriptomics, 52, 91, 97, 435 transfer RNA (tRNA), 361 transgene silencing, 195 transgressive variation, 112 transiently expressed transposase (TET) system, 236, 238, 249 transposable element (TE)(s), 50, 52, 182, 193, 198, 208, 224, 226, 229, 237, 240, 241, 285, 295, 310, 404, 432, 435, 446, 448, 449 Class I elements, 224 Class II elements, 224, 225 endogenous, 224 nonautonomous, 225, 229–232, 241, 243, 245 transposable element-related (TE-related) genes, 52 transposase, 207, 225~238, 243, 244, 246, 249–251, 254, 260, 261 transposition, 224~262 frequency, 227, 228, 245, 250, 252 germinal, 227, 229, 250–252 inducible, 243, 244
transposon, 4, 15, 164, 224~262, 335 Ac/Ds, 182, 186, 225~261, 310, 334, 357, 374, 381, 384, 412 autonomous, 224, 231, 232, 235, 243, 245 DNA-based active rice, 241 Ds, 225~262, 382 Ds flanking sequences, 385 Ds/T-DNA launch pad, 236, 245, 384 dynamics, 406 En (Spm) transposon, 182, 186, 211, 384 En/I (Spm/dSpm), 225~255, 335 endogenous, 236, 240 tagging, 225~244, 256, 460 transposon display, 256 Triticum aestivum, 434 genes Cdc2-related, 451 VRN1, 16, 449, 451 VRN2, 16 Triticum monococcum, 434, 448, 449 genes/loci/mutants Lr10 locus, 448, 451 Pm3, 451 Q, 456 RGA1, 448 RGA2, 448 Rht, 134 Rht-1, 450 Triticum turgidum, 448 trypsin, 65 tryptic peptides, 65 tumor inducing (Ti) plasmid, 183 tungro virus disease, 162 Turnip crinkle virus (TCV), 302 two-dimensional liquid chromatography (2D-LC), 77–79 two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), 63–83 tyrosyl-tRNA synthase, 82 UDP-glucose pyrophosphorylase, 72 Unweighted Pair Group Method with Arithmetic Mean (UPGMA), 82 Upstream Activation Sequence (UAS), 3, 190, 193, 207, 208, 337, 342, 348
V8 protease, 64 vector backbone (VB), 195, 196, 232, 235 Vir genes/proteins VirB, 183 VirC, 183 VirD, 183 VirD1, 183 VirD2, 183 VirE, 183 VirG, 183 viral satellite RNA, 298 viral silencing suppressors, 308, 309 2b, 308 P1/HC-Pro, 308 p19, 308, 309 p21, 308 p25, 308 p38, 308 viroid, 294 virus resistance, 74 vitamin A deficiency, 136 vocabulary (CV), 159, 160, 371, 372 voltage-dependent anion channel, 76 VP16 activation domain, 193 wall-associated kinase, 439 web service technologies, 375 web services, 387 wheat, 7, 16, 364, 371, 385, 411, 423, 431, 436, 445, 448, 451, 455, 456 cultivar (variety) 74, 418 genes, see under Triticum aestivum genetic markers, 361 hexaploid, 448 whole genome shotgun (WGS) sequencing, 6, 7, 13, 14, 442, 463 wild rice, 368, 371
Xanthomonas oryzae pv. oryzae (Xoo), 45, 112, 123 yeast artificial chromosome (YAC), 33, 35, 251, 359, 360 yeast two-hybrid system, 81 Zea mays, 225, 310, 370, 434, 453 genes/mutants/proteins Ba1, 451 barren inflorescence1, 460 barren inflorescence2, 460 barren stalk1, 451, 460 branched silkless1, 460 Bronze(Bz) locus, 449 d1 gene, 164 D8, 450 indeterminate spikelet1, 457, 460 Lateral Organ Boundary (LOB), 460 Lg2, 449 liguleless2, 460 Lrs1, 449 Orp1, 448, 449 Orp2, 448, 449 r/b genes, 449 Ramosa1, 452, 460 Ramosa2, 460 Shrunken2 (Sh2), 448 Teosinte branched1, 460 thick tassel dwarf 1 (td1), 451 vgt1, 456 vgt2, 456 X1, 446 X2, 446 zein, 448 Zfl1, 459, 460 Zfl2, 459, 460 zinc-finger nuclease(s), 282–285 zinc finger protein, putative, 80 Zingiber officinale, 434