Cancer Gene Profiling Methods and Protocols


Author: Robert Grützmann | Christian Pilarsky


Methods in Molecular Biology™

Series Editor John M. Walker School of Life Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK

For other titles published in this series, go to www.springer.com/series/7651

Cancer Gene Profiling Methods and Protocols

Edited by

Robert Grützmann and Christian Pilarsky Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany

Editors

Robert Grützmann, Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany

Christian Pilarsky, Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany

ISSN 1064-3745
e-ISSN 1940-6029
ISBN 978-1-934115-76-3
e-ISBN 978-1-59745-545-9
DOI 10.1007/978-1-59745-545-9
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009930638

© Humana Press, a part of Springer Science+Business Media, LLC 2010

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Humana is part of Springer Science+Business Media (www.springer.com)

Preface

Science is the agent between imagination and reality (Anonymous)

During the last few years, the methods for analysing cancer-related genes on a molecular level have changed rapidly. With the advent of automated sequencing, new and faster investigations have become possible. This has led to the collection of a large number of DNA samples, such as Expressed Sequence Tag (EST) libraries whose entries run into millions. The advances in DNA sequencing technologies resulted in rapid improvements in oligonucleotide synthesising technologies, which have allowed researchers to produce oligonucleotides for each and every imaginable sequence at a low cost. Finally, the method of polymerase chain reaction (PCR), and other improvements in enzymatic in vitro amplification of nucleic acids, gave researchers the opportunity to use low amounts of nucleic acids for analysis. This enabled the research community to investigate the populations of cells in a given tissue.

Sometimes it takes only a few advances for a technology to be successful. For example, the concept of arraying biological probes in a reproducible manner, and the use of these arrays instead of a single probe, greatly advanced biomedical research, especially as it was discovered that everything is arrayable. It has also changed the landscape of science in another way: a reduction of costs. The costs of generating such an array are high, but the costs of replicating such arrays are not. This key fact has led to a growth in the number of biotech companies that design and produce arrays, and today more and more researchers have access to these arrays and use them. Such unrestricted access to these resources has really been the key to the biomedical research revolution we see today.

In this book, we have brought together the experiences of leading scientists in the discipline of cancer gene profiling. We have included several microarray techniques, as well as methods for arraying tissues and proteomics, because cancer genes can be profiled in different ways. Such different approaches are needed to understand the key stages of cancer development, because using only one technique would be insufficient. Therefore, we attempt to give an overview of the state-of-the-art methods that will enable the reader to perform these experiments successfully. The book has been written for any student or practitioner with an interest in cancer gene profiling, and can be used in any well-equipped research laboratory. It may also serve as a demonstration of the kind of analysis that is possible today and will be complementary to other textbooks in the area of biomedical research.

This book has been divided into five main sections. The first section covers techniques to get clinically relevant cancer material through the best methods of sample collection and storage. The second part begins with an overview of gene expression technology and gives an introduction to the latest cancer gene profiling technologies. Because cancer gene profiling is more than just the profiling of cancer gene expression, we have also included techniques for comparative genomic hybridisation (CGH) arraying and single-nucleotide polymorphism (SNP) analysis, and proteomic techniques.


The third section contains real-life examples for the different technologies, and shows the full potential of cancer gene profiling today. This potential can only be utilised by the use of adequate bioinformatics tools. These tools are covered in the fourth part of the book. Because a cancer gene profiling experiment will most often lead to numerous candidate genes, which, in turn, have to be further validated and analysed, examples of post-profiling experiments can be found in the final section of the book. It should be noted that all of the chapters in the book are linked by the description of particular successful experiments that were performed within the field of gene expression profiling.

We offer our gratitude to all of the contributing authors and the staff of Humana Press; without their help, this book would not have been possible. We also thank our families for their love and patience. Finally, we are indebted to our mentor Hans Detlev Saeger for his unwavering support.

Science is not just a profession; it should also be fun. This fun comes from the inception of an idea that goes on to be proven through experimentation, or, as we found in a Chinese fortune cookie: “The impossible is only the untried.” We hope that you will not only be successful, but will also have fun using our book in your research.

Dresden, Germany

Robert Grützmann Christian Pilarsky

Contents

Preface
Contributors

1. Organizational Issues in Providing High-Quality Human Tissues and Clinical Information for the Support of Biomedical Research
   Walter C. Bell, Katherine C. Sexton, and William E. Grizzle
2. Manual Microdissection
   Glen Kristiansen
3. Laser Microdissection
   Anja Rabien
4. Tissue Microarrays
   Ana-Maria Dancau, Ronald Simon, Martina Mirlacher, and Guido Sauter
5. A Decade of Cancer Gene Profiling: From Molecular Portraits to Molecular Function
   Henri Sara, Olli Kallioniemi, and Matthias Nees
6. Mining Expressed Sequence Tag (EST) Libraries for Cancer-Associated Genes
   Armin O. Schmitt
7. Automated Fluorescent Differential Display for Cancer Gene Profiling
   Jonathan D. Meade, Yong-jig Cho, Blake R. Shester, Jamie C. Walden, Zhen Guo, and Peng Liang
8. Manual Microdissection Combined with Antisense RNA–LongSAGE for the Analysis of Limited Cell Numbers
   Jutta Lüttges, Stephan A. Hahn, and Anna M. Heidenblut
9. Quantitative DNA Methylation Profiling on a High-Density Oligonucleotide Microarray
   Anne Fassbender, Jörn Lewin, Thomas König, Tamas Rujan, Cecile Pelet, Ralf Lesche, Jürgen Distler, and Matthias Schuster
10. Single-Nucleotide Polymorphism (SNP) Analysis to Associate Cancer Risk
   Julie Earl and William Greenhalf
11. Application of Proteomics in Cancer Gene Profiling: Two-Dimensional Difference in Gel Electrophoresis (2D-DIGE)
   Deepak Hariharan, Mark E. Weeks, and Tatjana Crnogorac-Jurcevic
12. Search for and Identification of Novel Tumor-Associated Autoantigens
   Karsten Conrad, Holger Bartsch, Ulrich Canzler, Christian Pilarsky, Robert Grützmann, and Michael Bachmann
13. Integrative Oncogenomic Analysis of Microarray Data in Hematologic Malignancies
   Jose A. Martínez-Climent, Lorena Fontan, Vicente Fresquet, Eloy Robles, María Ortiz, and Angel Rubio
14. Cancer Gene Profiling in Pancreatic Cancer
   Felip Vilardell and Christine A. Iacobuzio-Donahue
15. Cancer Gene Profiling in Prostate Cancer
   Adam Foye and Phillip G. Febbo
16. Cancer Gene Profiling for Response Prediction
   B. Michael Ghadimi and Marian Grade
17. The EGFR Pathway as an Example for Genotype: Phenotype Correlation in Tumor Genes
   Ulrike Mogck, Eray Goekkurt, and Jan Stoehlmacher
18. Quantitation of CD39 Gene Expression in Pancreatic Tissue by Real-Time Polymerase Chain Reaction
   Martin Loos, Beat Künzli, and Helmut Friess
19. Functional Profiling Methods in Cancer
   Joaquín Dopazo
20. Calibration of Microarray Gene-Expression Data
   Hans Binder, Stephan Preibisch, and Hilmar Berger
21. Meta-analysis of Cancer Gene-Profiling Data
   Xinan Yang and Xiao Sun
22. Target Gene Discovery for Novel Therapeutic Agents in Cancer Treatment
   Ole Ammerpohl, Sanjay Tiwari, and Holger Kalthoff

Index

Contributors

Ole Ammerpohl • Clinic for General Surgery and Thoracic Surgery, Division Molecular Oncology, University Hospital of Schleswig-Holstein, Kiel, Germany
Michael Bachmann • Institute for Immunology, Technical University Dresden, Dresden, Germany
Holger Bartsch • Institute for Immunology, Technical University Dresden, Dresden, Germany
Walter C. Bell • Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA
Hilmar Berger • Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
Hans Binder • Interdisciplinary Centre for Bioinformatics, University of Leipzig, Leipzig, Germany
Ulrich Canzler • Institute for Immunology, Technical University Dresden, Dresden, Germany
Yong-jig Cho • Department of Cell Biology, Vanderbilt-Ingram Cancer Center, School of Medicine, Vanderbilt University, Nashville, TN, USA
Karsten Conrad • Institute for Immunology, Technical University Dresden, Dresden, Germany
Tatjana Crnogorac-Jurcevic • Cancer Research UK Molecular Oncology Unit, Barts and The London Queen Mary’s School of Medicine and Dentistry, John Vane Science Centre, London, UK
Ana-Maria Dancau • Institute of Pathology, University of Hamburg, Hamburg, Germany
Jürgen Distler • Science Department, Epigenomics AG, Berlin, Germany
Joaquín Dopazo • Bioinformatics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain
Julie Earl • Division of Surgery and Oncology, University of Liverpool, Liverpool, UK
Anne Fassbender • Science Department, Epigenomics AG, Berlin, Germany
Phillip G. Febbo • Departments of Medicine and Molecular Genetics and Microbiology, Duke Institute for Genome Science and Policy, Duke University, Durham, NC, USA
Lorena Fontan • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Adam Foye • Departments of Medicine and Molecular Genetics and Microbiology, Duke Institute for Genome Science and Policy, Duke University, Durham, NC, USA
Vicente Fresquet • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Helmut Friess • Department of Surgery, Technische Universität München, Munich, Germany
B. Michael Ghadimi • Department of General and Visceral Surgery, University Medical Center Göttingen, Georg-August-University, Göttingen, Germany
Eray Goekkurt • Department of Internal Medicine I, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Marian Grade • Department of General and Visceral Surgery, University Medical Center Göttingen, Georg-August-University, Göttingen, Germany
William Greenhalf • Division of Surgery and Oncology, University of Liverpool, Liverpool, UK
William E. Grizzle • Department of Pathology, University of Alabama at Birmingham, Zeigler Research Building, Birmingham, AL, USA
Robert Grützmann • Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Zhen Guo • GenHunter Corporation, Nashville, TN, USA
Stephan A. Hahn • Molecular GI-Oncology (MGO), Center for Clinical Research (ZKF), Ruhr-University Bochum, Bochum, Germany
Deepak Hariharan • Cancer Research UK Molecular Oncology Unit, Barts and The London Queen Mary’s School of Medicine and Dentistry, John Vane Science Centre, London, UK
Anna M. Heidenblut • Molecular GI-Oncology (MGO), Center for Clinical Research (ZKF), Ruhr-University Bochum, Bochum, Germany
Christine A. Iacobuzio-Donahue • Department of Pathology, GI/Liver Division, Johns Hopkins Medical Institutions, The Sol Goldman Pancreatic Cancer Research Center, Baltimore, MD, USA
Olli Kallioniemi • VTT Medical Biotechnology, Turku, Finland
Holger Kalthoff • Clinic for General Surgery and Thoracic Surgery, Division Molecular Oncology, University Hospital of Schleswig-Holstein, Kiel, Germany
Thomas König • Science Department, Epigenomics AG, Berlin, Germany
Glen Kristiansen • Department of Pathology, University Hospital Zurich, Zurich, Switzerland
Beat Künzli • Department of Surgery, Technische Universität München, Munich, Germany
Ralf Lesche • Science Department, Epigenomics AG, Berlin, Germany
Jörn Lewin • Science Department, Epigenomics AG, Berlin, Germany
Peng Liang • Department of Cell Biology, Vanderbilt-Ingram Cancer Center, School of Medicine, Vanderbilt University, Nashville, TN, USA
Martin Loos • Department of Surgery, Technische Universität München, Munich, Germany
Jutta Lüttges • Institute for Pathology, Saarbrücken Hospital, Saarbrücken, Germany
Jose A. Martínez-Climent • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Jonathan D. Meade • GenHunter Corporation, Nashville, TN, USA
Martina Mirlacher • Institute of Pathology, University of Hamburg, Hamburg, Germany
Ulrike Mogck • Department of Internal Medicine I, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Matthias Nees • VTT Medical Biotechnology, Turku, Finland
María Ortiz • CEIT and TECNUN, University of Navarra, San Sebastián, Spain
Cecile Pelet • Science Department, Epigenomics AG, Berlin, Germany
Christian Pilarsky • Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Stephan Preibisch • Max-Planck-Institute for Molecular Cell Biology and Genetics, Dresden, Germany
Anja Rabien • Research Division, Department of Urology, Charité – Universitätsmedizin Berlin, Campus Charité Mitte, Berlin, Germany
Eloy Robles • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Angel Rubio • CEIT and TECNUN, University of Navarra, San Sebastián, Spain
Tamas Rujan • Science Department, Epigenomics AG, Berlin, Germany
Henri Sara • VTT Medical Biotechnology, Turku, Finland
Guido Sauter • Institute of Pathology, University of Hamburg, Hamburg, Germany
Armin O. Schmitt • Institute for Animal Sciences, Humboldt-Universität zu Berlin, Berlin, Germany
Matthias Schuster • Science Department, Epigenomics AG, Berlin, Germany
Katherine C. Sexton • Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, AL, USA
Blake R. Shester • GenHunter Corporation, Nashville, TN, USA
Ronald Simon • Institute of Pathology, University of Hamburg, Hamburg, Germany
Jan Stoehlmacher • Department of Internal Medicine I, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Xiao Sun • Division of Bioinformatics, State Key Laboratory of Bioelectronics (Chien-Shiung Wu Laboratory), Southeast University, Nanjing, China
Sanjay Tiwari • Division Molecular Oncology, Clinic for General Surgery and Thoracic Surgery, University Hospital of Schleswig-Holstein, Kiel, Germany
Felip Vilardell • Department of Pathology, GI/Liver Division, Johns Hopkins Medical Institutions, The Sol Goldman Pancreatic Cancer Research Center, Baltimore, MD, USA
Jamie C. Walden • GenHunter Corporation, Nashville, TN, USA
Mark E. Weeks • Cancer Research UK Molecular Oncology Unit, Barts and The London Queen Mary’s School of Medicine and Dentistry, John Vane Science Centre, London, UK
Xinan Yang • Division of Bioinformatics, State Key Laboratory of Bioelectronics (Chien-Shiung Wu Laboratory), Southeast University, Nanjing, China

Chapter 1

Organizational Issues in Providing High-Quality Human Tissues and Clinical Information for the Support of Biomedical Research

Walter C. Bell, Katherine C. Sexton, and William E. Grizzle

Summary

Superior-quality human tissues are required to support many types of biomedical research. To be optimally useful in supporting research, not only must these tissues be accurately diagnosed, but the specific aliquots of tissue supplied to investigators must also be accurately described as part of the quality control analysis of the tissue. Tissues should be collected, processed, and stored uniformly. Some tissues are provided to investigators from tissue banks for which tissues have been collected and processed according to standard operating procedures (SOPs) of the tissue bank. Other tissues provided to support research are collected and processed according to SOPs modified to meet investigator needs and requirements, i.e., prospective collection/processing. These different models of tissue collection require different goals, designs, and SOPs. The objectives of tissue repositories also vary based on the types of tissues provided (e.g., fresh tissue aliquots, fixed paraffin-embedded tissue, paraffin tissue sections, etc.) and how the tissues are to be used in research. For example, the potential use of tissues affects the need for extensive annotation of the specimen, including both clinical information (e.g., clinical outcomes) and demographics. Specifically, if the tissues are to be used for extraction of proteins or basic studies of disease processes, less clinical information, if any, may be needed than if the tissues are to be used for the correlation of an aspect of the disease process with clinical outcome or response to a specific therapy. In this review, we describe, based on our experience, the major issues that should be addressed in designing and establishing a tissue repository.
Key words: Human tissue, Tissue banking, Tissue repository, Research infrastructure, IRB, HIPAA

Abbreviations

CHTN: Cooperative Human Tissue Network
DCIS: Ductal carcinoma in situ
DMSO: Dimethyl sulfoxide
GMP: Good manufacturing practices
HIPAA: Health Insurance Portability and Accountability Act
ISBER: International Society for Biological and Environmental Repositories
ISO: International Organization for Standardization
LCIS: Lobular carcinoma in situ
LN2: Liquid nitrogen
NCI: National Cancer Institute
OCT: Optimal cutting temperature (compound for embedding specimens prior to cryosectioning)
PHI: Protected health care information
PSA: Prostatic-specific antigen
QA: Quality assurance/management
QC: Quality control
SOP: Standard operating procedure

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_1, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

Modern biomedical research requires access to superior-quality specimens of human tissue and bodily fluids with or without extensive clinical annotation (1–2). Different types of organizations devoted to supplying tissues for research have varying goals selected to meet different tissue and informational needs (3). In this review, we discuss multiple models of tissue repositories and, based on our experience, several of the more important issues affecting the design and operations of a tissue repository. A detailed discussion of most of these issues is beyond the scope of this chapter. Thus, we have referenced articles that provide additional information on these topics (1–19). We have also provided examples of standard operating procedures (SOPs): one for the processing of blood and another for the processing of tissue.

2. Models of Tissue Collection

Obtaining human tissues and bodily fluids to support biomedical research may follow an organized or a disorganized approach. “Catch as catch can” is the best designation of the approach in which a surgeon, pathologist, or other medical personnel provides tissues to investigators in an unorganized way. Such specimens characteristically have not been collected, processed, or stored using SOPs and usually are not associated with quality control; thus, these specimens may be of poor quality and their diagnosis may be incorrect. More worrisome is that such specimens may be obtained without oversight of Privacy Boards or Institutional Review Boards (IRBs) and may violate the Common Rule and/or the Health Insurance Portability and Accountability Act (HIPAA) regulations.


2.1. Prospective Collection

An organized approach in which investigators specify exactly the tissue specimen they need as well as how the specimens are to be processed and stored is designated as the “prospective collection model.” The clear disadvantage of prospective collection versus a banking model is that large numbers of specimens are not immediately available when requested. In addition, outcome data are not available because the specimens are collected when requested, so the patients’ clinical outcomes may take several years to develop after collection. The advantage of the prospective collection model is that the investigator receives exactly what is requested (e.g., fresh uninvolved kidney minced in RPMI media). The investigator must, however, wait for specimen availability (weeks to months) and, when needed, for clinical outcome data, which may take years.

2.2. Banking

Another approach to obtaining superior human tissues to support biomedical research is to utilize a “banking model.” In a banking model, SOPs are followed for obtaining, processing, and storing human tissues. For example, a bank may store only frozen tissues and/or paraffin blocks. “Specially processed” tissues as well as fresh, unfrozen samples usually will not be available from such a bank. One of the major disadvantages is that the specimens may not meet specific requirements of the investigator for specific parameters such as aliquot size, percentage of tumor, and processing and/or storage methods (3). Advantages of the “banking model” are that large numbers of specimens may be immediately available and clinical and demographic information including clinical outcome also are readily available.

2.3. Specimens Associated with Clinical Trials

A “clinical trial model” is a type of banking model in which the remnants of the tissues/bodily fluids collected from one or more clinical trials are banked to support future studies. The problems with this specific banking model compared with a general banking model may be magnified in that the original consent form of the clinical trial may not clearly state that the specimens can be used for research in addition to the clinical trial. Similarly, the institution’s IRB may prohibit the utilization of specimens for a different type of research. In addition, the remnants of the clinical study may not meet the needs of a wide range of investigators, and remnant tissues may not be available from all original patients of the clinical trial.

2.4. Combination

The “tissue repository model” uses a combination of the approaches of the prospective and banking models, including the advantages of each of the models. The main potential problem in the operation of a tissue repository is that there are complex and numerous administrative requirements as well as the need for a more complex bioinformatics system. This chapter focuses primarily on a tissue repository model.


3. Matching Tissues to Tissue Requirements

3.1. Identification of Specimens

Correct identification of specimens is of critical importance to providing superior-quality tissues to support biomedical research (3, 6, 12, 16, 17). A labeling method should be used that (1) minimizes the label separating from the specimen, (2) prevents mislabeling due to errors by personnel, and (3) avoids problems with reading the specimen identification (e.g., poor handwriting). For most tissue repositories, this is best accomplished by the use of bar codes that link the specimen to a database containing pertinent clinical, demographic, and historical information regarding the individual who was the source of the specimen. It should be understood that a bar code is only a “number” and this number contains no other information; it is only via the link of the number of the bar code to software that information regarding the bar-coded specimen is actually identified. Thus, unless the identical software is used at a second site, the second site can only read the specimen number and does not have access to a link with the information in the database. Any other information on the printed label, e.g., race, age, etc., comes from the software via the bar code and not directly from the bar code number.

3.2. Difficult to Fulfill Requests

Requests for very specific tissues become more difficult to meet as more requirements are placed on the request (3, 16). Obviously, a request for any breast carcinoma is less difficult to supply than a request for well-differentiated breast carcinoma from an African American man younger than 40 years of age, because breast cancers are rare in males and in relatively young individuals. Another investigator requirement that makes a request difficult to meet is a request for very large amounts of tissue (e.g., 5 g) from tumors that are typically small. This includes tumors of the breast and prostate, which are usually small due to screening methods.
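The way each added requirement shrinks the pool of eligible specimens can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Specimen` record, the `narrow` helper, the inventory, and the request criteria are all hypothetical and invented for this example; they do not describe any real repository or its software. The specimen numbers stand in for bar code numbers, which by themselves carry no clinical information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    """Hypothetical inventory record; field names are assumptions."""
    barcode: str          # opaque number; meaningful only via the database
    diagnosis: str
    differentiation: str
    sex: str
    age: int
    mass_g: float


# Invented inventory, for illustration only.
INVENTORY = [
    Specimen("000101", "breast carcinoma", "well", "F", 62, 0.4),
    Specimen("000102", "breast carcinoma", "poor", "F", 38, 0.2),
    Specimen("000103", "breast carcinoma", "well", "F", 55, 0.3),
    Specimen("000104", "breast carcinoma", "moderate", "M", 71, 0.5),
    Specimen("000105", "colon adenocarcinoma", "moderate", "M", 66, 6.0),
    Specimen("000106", "breast carcinoma", "well", "F", 47, 0.1),
]


def narrow(inventory, criteria):
    """Apply (label, predicate) pairs in order.

    Returns the running count of eligible specimens after each
    requirement, plus the specimens that satisfy all of them.
    """
    remaining = list(inventory)
    counts = [("all specimens", len(remaining))]
    for label, predicate in criteria:
        remaining = [s for s in remaining if predicate(s)]
        counts.append((label, len(remaining)))
    return counts, remaining


# A deliberately restrictive request, modeled on the example in the text.
REQUEST = [
    ("breast carcinoma", lambda s: s.diagnosis == "breast carcinoma"),
    ("well differentiated", lambda s: s.differentiation == "well"),
    ("male donor", lambda s: s.sex == "M"),
    ("younger than 40", lambda s: s.age < 40),
]

counts, matches = narrow(INVENTORY, REQUEST)
for label, n in counts:
    print(f"{label}: {n}")
```

With this invented inventory, the pool shrinks at every step and is already empty once the donor must be male, so the final age requirement cannot be met at all. A repository could use a tally of this kind when explaining to an investigator which single requirement makes a request impractical.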
Because some cancers (e.g., breast, prostate, pancreas) are in great demand by investigators, requests from multiple investigators, each requiring a small amount of tissue (e.g., 0.1 g), will more likely be filled than will a request for a large amount of tissue from these tumors (3, 16). Similarly, for tumors in high demand, such as prostate or pancreas tumors, which tend to be small, requests for large numbers of cases within 1 year (e.g., 100) are unlikely to be met. Requests for large numbers of relatively rare tumors, or of tumors or other tissues that are not typically removed surgically, cannot be filled (see Subheading 4.2.4).

Most tissue repositories try to provide tissues equitably among those investigators requesting the same tissues. Because efforts (time) devoted to supplying specific tissue requests must also be divided equitably, tissue requests requiring extensive effort (e.g., removal of a vertebral column from a body or processing hundreds of specimens by a complex protocol) cannot easily be met by a busy tissue repository. As discussed in Subheading 3.4, it is important for tissue repositories to educate investigators regarding which of their requirements make their requests difficult or impossible to meet. This includes unreasonable times for processing and freezing after the specimen is removed surgically (as discussed in Subheading 3.3).

3.3. Time Interval Between Surgery and Tissue Storage

In general, remnant diagnostic tissues should be reviewed by a pathologist or their designate to ensure that the diagnostic integrity of the specimen is uncompromised. Unreasonable requirements of investigators for rapid processing and freezing of samples after surgery will reduce the availability of tissue to investigators for several reasons. Freezing samples in the operating room within minutes of removal from the patient may jeopardize the ability of a pathologist to review the material to ensure that it is not required for diagnosis. In addition, although some aliquots of tissue may be collected, processed, and frozen within 15 min of surgical removal, this usually requires special dedicated personnel and resources that many tissue repositories may not have. It is, however, important to record the time intervals between the removal of operative specimens and their transfer from the operating room to the tissue repository, and these intervals should be minimized as much as possible. After specimens are removed from patients, they should be kept unfrozen at approximately 4°C, rather than at room temperature, while awaiting diagnostic review. Specimens provided for research should then be processed as rapidly as practicable; however, delays in processing may occur when multiple specimens must be processed simultaneously. In such cases, one or two aliquots from each specimen could be snap frozen in liquid nitrogen (LN2) vapor, and the other aliquots could be collected and frozen subsequently.

The scientific importance of rapid collection and processing of tissue after a long period of warm ischemia in vivo (i.e., while blood vessels are compromised during surgery, see Subheading 4.2.3) is controversial. This is because many molecules are affected by enzymes that function optimally at body temperature, which is the temperature of the tissue during warm ischemia. Thus, numerous molecular changes may occur before operative tissues are removed from the body. Huang et al. (20) evaluated the effects of in vitro ischemia on 2,400 genes in human tissue specimens using spotted arrays and found that less than 14% of the genes changed by more than 50%. Most genes at the messenger RNA (mRNA) level showed relatively modest increases (5 min after surgery versus 60 min after surgery). Similarly, Dash et al. (21) reported that less than 1% of genes demonstrated changes within 1 h after removal of prostate tissue from the body. In addition, Spruessel et al. (22) reported that 80% of genes
changed less than twofold within 30 min after removal from the body. Based on these studies of mRNA, very rapid processing (<30 min after removal) solely to maintain nondegraded mRNA may not be justified. Phosphoproteins are of special interest with respect to molecular stability. Baker et al. (23) reported that 9 of 13 biopsies of adenocarcinomas of the esophagus expressed pAkt; however, pAkt was identified in none of the matching resected specimens. In addition, using a xenograft model, they reported a 180 min half-life of Akt and a 20 min half-life of pAkt. In contrast, Ayala et al. (24) detected pAkt in formalin-fixed, paraffin-embedded tissue from radical prostatectomies and used its presence/absence as a prognostic biomarker. Thus, phosphoproteins may require rapid tissue processing. Nevertheless, this area remains controversial.

3.4. Education of Tissue Repository Personnel and Repository Users

Providing consistent and hence uniformly collected, processed, and stored tissues requires using SOPs in the operation of the tissue repository. All tissue procurement personnel need to be trained in all aspects of repository operations, including meticulous adherence to SOPs (12, 17, 18). Training in safety as well as in regulatory and ethical issues (e.g., keeping patient information confidential) also is very important, not only for personnel of the repository, but also for all users of the repository (3, 16). The repository should also serve as an educational resource for investigators and other clients. Clients frequently require assistance in selecting the optimal tissues to support their research. For example, clients need to understand why and how restrictive requirements on the tissues they request may prevent them from receiving tissues (Subheadings 3.2 and 3.3). Similarly, they may need to understand repository limitations in collecting and processing tissues as well as how tissues differ in their appropriateness to support a specific research project (e.g., smooth muscle from the wall of the colon is different from smooth muscle of the uterus). All requests for human tissues should be reviewed by the pathologist or other equivalently knowledgeable professional associated with the tissue repository and, if necessary, guidance should be provided to the investigator concerning the appropriateness of the request for specific tissues. If a request is difficult to meet because of unnecessary requirements, this offers an excellent opportunity to explain to the requesting investigator the limitations in tissue collection and processing.

3.5. Types of Tissues Collected and Services Provided

Tissue repositories may provide an array of tissues to investigators, ranging from paraffin sections of one type of cancer (e.g., breast) to fresh, frozen, and fixed solid tissues and bodily fluids from patients with a variety of disease processes both neoplastic and nonneoplastic. One of the initial decisions when developing a tissue repository is to determine what tissues and what processing
and other services will be provided to investigators. The potential services needed by investigators could be determined by surveying the tissue needs of the investigators likely to be served by the tissue repository. The design of the tissue repository, including space, equipment, personnel, and supplies will depend on the types and processing of the tissues provided to investigators. Services provided also affect the required resources of the repository. Potential services beyond providing tissues include delivery of tissues to local investigators, culturing cells from tissues, and extracting DNA and/or RNA from tissues. Providing multiple services may distract from the primary purpose of the tissue repository and such services may be difficult to discontinue once provided, even if they subsequently impede the primary functions of the tissue repository. If services are performed, the tissue repository should be fully compensated for the efforts and resources devoted to such services.

4. Issues Affecting Tissue Repositories

4.1. Annotation of Tissue Specimens, Clinical Information, and Demographics

The key components of specimen annotation include the age, race, and sex of the patient and a diagnostic description of the specific aliquot of the specimen provided to the investigator (see Subheading 5.3). After this basic information, the extent of annotation required for a specimen will vary with the primary use of the specimen as well as the goals of the tissue repository. For example, if the primary goal of the tissue repository is to develop a tissue bank to support complex epidemiological studies of a disease process (e.g., diabetes mellitus, type II), then detailed clinical, familial, and social histories of the patient would be collected at the time of tissue collection. In contrast, if tissues are being collected and used to study the biochemistry of a wide variety of diseases and little clinical information is needed for these studies, only basic annotation may be required and the collection of detailed clinical information associated with specimens on all patients would be a waste of resources. When detailed annotation on only a portion of a specimen collection will be needed, it is more efficient to collect the information on only the patients from whom those specimens were obtained and only when clinical data are needed.

4.2. Tissue-Processing Variables

The collection and processing variables that affect the operations of a tissue repository should be identified as part of the design of the tissue repository. Many variables, such as exposure to neoadjuvant therapy, may limit the usefulness of specific tissues, and it is important that investigators using such tissues understand these limitations. In addition, some variables will severely limit the usefulness of specimens.

4.2.1. Population

Tissues are usually only available from local medical facilities; thus, some expectations should be defined regarding how many specimens of various types will be needed by the tissue repository. Similarly, needs for types of tumors and other tumor characteristics may vary. For example, if the geographic area of the tissue repository has a small Hispanic population and samples of tissue are needed from Hispanics, arrangements may need to be made for obtaining tissues from medical facilities in a geographic area with a large Hispanic population. Unless tissue repositories are already in operation at such sites, development of ancillary sites for collection of tissues for research may represent a large expenditure of effort and resources to establish a typically distant relationship. It is our experience that many such relationships fail, so these relationships should be approached and developed with great care.

4.2.2. Preoperative Changes

Currently, many tumors and diseases are treated with partial therapy prior to definitive surgical therapy. For example, many patients with breast cancer (also sarcomas and prostate cancer) are treated with neoadjuvant therapy (chemotherapy or radiation) before surgical resection. Such therapy can be very effective; however, selective populations of neoplastic cells may be destroyed completely by such therapy, resulting in a residual tumor that does not represent the original disease. Similarly, metastatic lesions may be destroyed so that accurate staging of the disease is no longer possible. It is very important to identify patients who have received neoadjuvant therapy prior to the use of residual tissue in research and to ensure that all users understand the limitations of these specimens.

4.2.3. Intraoperative Changes

Healthy, uninvolved, and diseased tissues may be damaged if their vascular supplies are cut off or are compromised. Such intraoperative changes occur while the tissue is at the normal body temperature, the temperature at which catabolic enzymes are most active. Thus, when a tissue is removed operatively from the body, many physiological and biochemical changes have already occurred. These changes are referred to as being secondary to “intraoperative ischemia” or “warm ischemia.” Recent advances in robotic surgery have increased the operative time of some procedures, such as the radical prostatectomy. It is important to inform investigators of such changes, in that data from current specimens of prostate cancer may not agree with similar previous data. It also is important to record a tissue timeline so that intraoperative ischemia can be estimated (e.g., the time the operation was begun and the time that tissue was removed from the patient). This information is often difficult to obtain.

4.2.4. Limitations of Available Tissues/Resources

With the development of new methods of screening for disease, tissues involved by the disease are typically smaller and are of lower stage. For example, many early breast cancers are now identified by mammography. The frequent use of this imaging technique has reduced the size and hence the stage of most breast cancers. More and more neoplastic lesions of the breast treated surgically are in situ disease, either ductal carcinoma in situ (DCIS) or lobular carcinoma in situ (LCIS), or invasive lesions of less than 2 cm in diameter. This severely limits the amount of breast cancer available to support research in that all DCIS as well as small tumors of the breast are completely processed for histologic diagnosis and clinical prognostic studies, so that no tissues may be available to support research other than fixed, paraffin-embedded material (25). Similarly, screening with prostate-specific antigen (PSA) has reduced the volume of prostate cancer present when the disease is treated by surgery. In addition, newer imaging approaches together with the use of fine needle aspirates have almost eliminated the availability of tissue samples from metastatic diseases, especially bone metastases of breast, prostate, and lung cancers. As discussed previously, tumor tissues are also limited by preoperative and intraoperative changes (3, 16). Independent of newer diagnostic approaches, access to tissues and tumors of the brain and heart as well as to small cell (oat cell) carcinoma of the lung continues to be greatly restricted because these diseases are not treated primarily by surgery. In addition, access to rare tumors such as neuroendocrine–neuroectodermal tumors, subtypes of sarcomas, pediatric tumors, and rare subtypes of epithelial tumors (e.g., medullary carcinoma of the colon) will always be very limited.

4.3. Storage of Specimens

Many investigators do not have ultra-cold storage systems to house their tissue specimens. As part of the design of the tissue repository, it should be decided whether the facility will store tissue specimens that have been collected for specific investigators. If so, it should also be determined how long storage will be provided and whether investigators will be charged for it. The optimal method for long-term (≥6 months) storage of tissue specimens to support biomedical research remains controversial. Data are available to demonstrate that tissue specimens should not be stored at temperatures warmer than −70°C for longer than 1 or 2 months and not for any period in self-defrosting freezers (13, 15); however, information regarding whether storage in LN2 vapor phase is better than storage at −80°C is unavailable. At least one study has reported that bias was introduced when cases and controls were collected differently and were stored at −80°C for different intervals (26, 27). Our unpublished data indicate that there is no difference
between storage at −80°C and storage in LN2 vapor phase for at least a 10-year period. If specimen viability is a goal (i.e., that cells or tissue can exist in culture or as xenografts), the options for specimen preparation and storage are limited. Cells can clearly be grown in culture if a sample of frozen cells has been stored in media plus 10% dimethyl sulfoxide (DMSO). In addition, because the likelihood of a successful culture from a cell sample increases with the number of cells cultured, a short-term culture to increase the number of cells would be useful to promote viability. Note that such a cell culture would be mixed (e.g., tumor cells plus inflammatory and stromal cells). The literature concerning primary cell cultures should be consulted before attempting the culture of cells from solid tissue, whether stored or not. If solid tissue is frozen directly, it is unlikely that viable cells can be obtained when the solid tissue is thawed because the formation of ice crystals would lyse most cells within the tissue. There are a few anecdotal reports that freezing solid tissue in media plus 10% DMSO may permit the subsequent isolation of viable cells from a thawed specimen.

4.4. Records of Collecting, Processing, and Storage

It is important to keep detailed records concerning variables of the collection, processing, and storage of tissues. Of special importance is the documentation of “times” and temperature conditions, which permit the reconstruction of the history of each specimen. Such times and conditions include the operative time when a specimen is removed from the patient, the time and method of transport (room temperature, on ice, etc.) to reach pathology and the tissue repository, the method of processing (e.g., freezing in OCT, placed in media), and the time of processing (e.g., time of freezing). For tissues prepared as paraffin blocks, the time interval to fixation, the fixative (e.g., 10% neutral buffered formalin), and the length of fixation are important variables to record. In addition, the chemical components and time in each step of the tissue processor as well as the type of tissue processor utilized might be important to record. When such variables are identified, fields for this data should be included in the repository’s informatics system.
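As an illustration of the kind of record described above, the following minimal Python sketch stores the key time points for one specimen and derives two intervals from them. The class name, field names, and example values are hypothetical and do not describe any particular repository informatics system; the interval from the start of the operation to removal is only a rough upper bound on warm ischemia.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SpecimenTimeline:
    """Hypothetical record of time points and conditions for one specimen."""
    operation_started: Optional[datetime]  # often difficult to obtain
    removed_from_patient: datetime
    received_in_pathology: datetime
    frozen_or_fixed: datetime
    transport_condition: str  # e.g., "wet ice" or "room temperature"
    processing_method: str    # e.g., "frozen in OCT"

    def intraoperative_interval_min(self) -> Optional[float]:
        """Minutes from start of operation to removal (upper bound on warm ischemia)."""
        if self.operation_started is None:
            return None
        delta = self.removed_from_patient - self.operation_started
        return delta.total_seconds() / 60

    def removal_to_preservation_min(self) -> float:
        """Minutes from removal from the patient to freezing or fixation."""
        delta = self.frozen_or_fixed - self.removed_from_patient
        return delta.total_seconds() / 60


# Made-up example values, for illustration only.
example = SpecimenTimeline(
    operation_started=datetime(2009, 5, 4, 8, 0),
    removed_from_patient=datetime(2009, 5, 4, 9, 30),
    received_in_pathology=datetime(2009, 5, 4, 9, 45),
    frozen_or_fixed=datetime(2009, 5, 4, 10, 15),
    transport_condition="wet ice",
    processing_method="frozen in OCT",
)
print(example.intraoperative_interval_min())  # 90.0
print(example.removal_to_preservation_min())  # 45.0
```

Storing the raw time points rather than precomputed intervals keeps the history of each specimen reconstructible, which is the purpose of the records described in this subheading.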

4.5. Bias in Use of Tissue Collections

Without detailed and accurate records, bias can easily be introduced into research projects and incorrect conclusions can be accepted. For example, consider a comparison of cases of a specific disease with controls in which the cases were evaluated using serum that had undergone multiple freeze–thaw cycles, had been stored at −80°C for longer than 6 years, and had been collected prior to an operation, while the associated controls had only one freeze–thaw cycle, had been stored at −80°C for less than 2 years, and had been collected in the community using a mobile van. Mass spectrometry and other very sensitive methods might identify the differences in freeze–thaw cycles, collection methods, and/or storage times, and researchers might incorrectly conclude that there were proteomic differences between the serum from patients with the specific disease and individuals who did not have the disease. Such a conclusion would be based on the bias of the two sample sets (13, 15, 26, 27). A bias of this nature might not be identified until attempts were made to validate the initial experimental results. In addition, actions that introduce bias into research studies emphasize the importance of using SOPs to ensure tissues are collected, processed, and stored as uniformly as practicable. In the example case of bias, careful records might have highlighted the differences in specimens, and the conclusions might have been tested, before reporting, on a subset of case and control samples collected, processed, and stored more uniformly. Some of the potential causes of bias are listed in Table 1.

Table 1
Examples of potential sources of bias in tissue sets

1. Population (e.g., racial mixture)
2. Fed or fasting state
3. Diurnal variations (i.e., time of collection)
4. Stress
5. Collection container (red top vs. separator)
6. Time to processing
7. Time to freezing
8. Temperature and length of storage
9. Freeze–thaw cycles
10. Sites of sample collection
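The kind of imbalance described in this example can be screened for before an analysis is run. The following Python sketch, offered only as an illustration, compares the median of each recorded handling variable between study groups and flags variables that differ by more than a chosen tolerance. The field names, the sample records, and the tolerances are all made up for this example; a real check would use the variables recorded by the repository and thresholds chosen by the investigators.

```python
from statistics import median


def group_medians(samples, variable):
    """Return the median of one handling variable for each study group."""
    values_by_group = {}
    for sample in samples:
        values_by_group.setdefault(sample["group"], []).append(sample[variable])
    return {group: median(values) for group, values in values_by_group.items()}


def imbalanced_variables(samples, tolerances):
    """List handling variables whose group medians differ by more than the tolerance."""
    flagged = []
    for variable, tolerance in tolerances.items():
        medians = group_medians(samples, variable)
        if max(medians.values()) - min(medians.values()) > tolerance:
            flagged.append(variable)
    return flagged


# Illustrative (made-up) records resembling the example in the text.
samples = [
    {"group": "case", "freeze_thaw_cycles": 3, "storage_years": 6.5},
    {"group": "case", "freeze_thaw_cycles": 4, "storage_years": 7.0},
    {"group": "control", "freeze_thaw_cycles": 1, "storage_years": 1.5},
    {"group": "control", "freeze_thaw_cycles": 1, "storage_years": 1.0},
]
tolerances = {"freeze_thaw_cycles": 0, "storage_years": 1.0}  # arbitrary thresholds
print(imbalanced_variables(samples, tolerances))
```

A flagged variable does not prove that results are biased; it indicates that cases and controls were handled differently and that conclusions should be tested on more uniformly handled subsets.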

5. Quality Assurance, Quality Control, and Laboratory Certification

A strong program in quality assurance is important in any aspect of biomedical research. This includes resources and infrastructure used to support research including facilities that collect, process, store, and provide tissues to support biomedical research (12, 17, 18). Quality Assurance/Management (QA) is a general approach to management activities that focuses on operational improvements in
all aspects of all activities to ensure that a procedure or product is of the defined quality required. Quality Control (QC) is the system of technical activities that measures the attributes and performance of a process, or item, against defined standards, to verify that the stated requirements are fully met. Thus, QC is only one component of an overall QA program.

5.1. Standard Operating Procedures

SOPs should be developed for all activities of a tissue repository. The SOP permits a laboratory activity to be performed uniformly, day after day. The SOP should be written in detail so, if followed, a new employee can perform the activity just as well as an experienced employee. The SOPs should be organized in a procedure manual that is readily available for use at the bench. Changes in SOPs must only be made by authorized, supervisory personnel and should be initialed and dated by the person making the change. SOPs should be reviewed yearly and revised as necessary. The new SOP should be dated as to its revision, and copies of the old SOP should be retained in the repository files to permit review of prior versions. Employees must not deviate from current SOPs. Example SOPs for the processing of blood and tissue are provided in Subheadings 5.1.1 and 5.1.2. In establishing a strong QA program, personnel assigned to the QA program have the responsibility for ensuring compliance with all SOPs and regulatory requirements, and should report directly to high levels of management concerning all QA issues. These personnel should aid in the development of SOPs for specimen collection, handling, processing, storage, and shipping of specimens. When problems in these areas are identified and/or if any specimens of poor quality are identified, personnel should initiate and participate in efforts to correct these deficiencies. Personnel assigned to QA should be responsible for designing, overseeing, and evaluating audits of overall operations regarding adherence to QA requirements. Good Manufacturing Practices (GMP) are regulatory guidelines that can be adopted by tissue repository organizations to meet the organization’s operational goals. Generally, these standards should include or address requirements of ISO9001, a document produced by the International Organization for Standardization (ISO), a worldwide federation of national standards organizations. 
The primary purpose of this document is to provide organizations with useful internationally recognized models for operating a quality management system. ISO standards are similar to GMP, but are more detailed and are accepted internationally. Tissue facilities should utilize ISO9001 in developing and monitoring their QA/QC programs.
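The SOP rules described above (dated and initialed changes, retention of old versions, and yearly review) can be mirrored in a simple record structure. The Python sketch below is one hypothetical way to do so; the class names, fields, and the 365-day interval used to represent "yearly" are assumptions made for illustration and are not drawn from ISO9001 or GMP.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class SOPVersion:
    text: str
    revised_on: date
    revised_by: str  # initials of the authorized person making the change


@dataclass
class SOPRecord:
    title: str
    current: SOPVersion
    last_reviewed: date
    archive: List[SOPVersion] = field(default_factory=list)

    def revise(self, new_text: str, revised_on: date, revised_by: str) -> None:
        """Archive the current version and install the dated, initialed revision."""
        self.archive.append(self.current)
        self.current = SOPVersion(new_text, revised_on, revised_by)
        self.last_reviewed = revised_on

    def review_overdue(self, today: date, interval_days: int = 365) -> bool:
        """True if more than interval_days have passed since the last review."""
        return (today - self.last_reviewed).days > interval_days


# Made-up example: an SOP written in January 2009 and checked in March 2010.
sop = SOPRecord(
    title="Serum processing",
    current=SOPVersion("Version 1 text", date(2009, 1, 15), "AB"),
    last_reviewed=date(2009, 1, 15),
)
print(sop.review_overdue(date(2010, 3, 1)))  # True: more than 365 days since review
sop.revise("Version 2 text", date(2010, 3, 1), "CD")
print(len(sop.archive))                      # 1: the old version is retained
```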

5.1.1. Example SOP for Blood Products

There are several major considerations when obtaining samples of blood to support biomedical research:

• Patient consent is necessary to collect blood, and HIPAA authorization is required to store any associated protected health information (PHI) in a bank to support research. Thereafter, aliquots of blood or blood products and associated information may be provided as de-identified specimens (from which all 18 HIPAA identifiers have been removed), so no HIPAA authorization is required for distribution of de-identified specimens and associated information.
• Specimens should be processed rapidly, within 2–4 h of draw, and maintained cold (≤4°C) but not frozen after clotting. Specimens can be supplied fresh or frozen; however, if frozen specimens are to be provided, samples should not be frozen until after serum/plasma has been separated.
• Blood should be drawn with a needle of at least 20-gauge bore (i.e., 20 gauge or a lower gauge number). Drawing blood with smaller-bore needles will increase the rate of hemolysis. If hemolysis occurs in 30% of specimens, the causes of the hemolysis should be investigated. Frequently, hemolysis is due to shearing of red blood cells as they are rapidly drawn through a butterfly needle, even a 20-gauge needle, if a Vacutainer tube is connected directly to the butterfly. This can partially be prevented by using a Luer Lok adapter, which reduces the draw pressure slightly, or by using a syringe to draw blood from the butterfly.
• Labeling specimens correctly is critical; bar codes are recommended for labeling. The printed bar code label should identify the blood product, including the type of plasma (EDTA, citrate, heparin, or other type of sample) and the tube (red top vs. separator [tiger top]) in which serum was collected.
• Records of each step in the blood draw regarding size of aliquot, time of separation, and freeze–thaw cycles are critical.
• The potential goals for sample usage should be defined clearly (e.g., analysis by proteomics) in order for the SOP of blood drawing to be adequately developed.
Multiple organizations in which we participate have developed SOPs for drawing and preparation of blood products, including the Early Detection Research Network (EDRN), the Cooperative Human Tissue Network (CHTN), and the Pulmonary Hypertension Breakthrough Initiative (PHBI). We have participated in the development of these organizations' SOPs for the collection and preparation of blood products. The following is a synthesis of these SOPs. We recommend that larger-volume Vacutainer tubes be utilized, with the goal of drawing enough blood at one draw to prepare multiple aliquots of the blood products (serum, plasma, buffy coat, and/or whole blood) to reduce freeze–thaw cycles. The aliquots we prefer, 250 µl, have been selected as a compromise between the labor required for the aliquoting of a sample
large enough to be analyzed by multiplex immunoassay or by mass spectrometry, and being small enough to minimize waste on thawing. Some general recommendations are:
• Consent for the specific size of the blood draw must be obtained from patients (see your local IRB for limitations), and HIPAA authorization must be obtained for storing of the blood and associated patient information in a biobank/tissue repository.
• Before aliquots are distributed to investigators, specimens should be de-identified, which includes making sure that none of the 18 HIPAA identifiers are included in the medical information provided to investigators.
• If samples are to be collected prospectively, a “request form” should be developed, which not only details the investigator’s needs (e.g., 250 µl of EDTA plasma from African American women, stored at −80°C or colder for <6 years with no freeze–thaw cycles), but also contains various agreements between the investigator and the tissue repository. These agreements may include a requirement for the education of all experimental personnel in biohazards, indemnification, commercial issues, and such requirements as not trying to identify the source of specimens. These can be included in a materials transfer agreement (MTA) or similar agreement.
• If drawing from an intravenous (i.v.) port, make certain that the sample is not contaminated by i.v. fluids/medications.
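When planning the size of a blood draw, it can help to estimate how many aliquots a tube will yield. The Python sketch below is a rough planning aid only. The default yield of 40% follows the approximate serum and plasma fraction stated in the procedures of this subheading; the 90% recovery factor and the function and parameter names are assumptions made for illustration.

```python
def aliquot_count(draw_volume_ml: float,
                  product_fraction: float = 0.40,
                  recovered_fraction: float = 0.90,
                  aliquot_ul: float = 250.0) -> int:
    """Estimate how many full aliquots one blood draw can yield.

    product_fraction: approximate fraction of the draw that is serum or plasma.
    recovered_fraction: assumed fraction of that product actually pipetted off.
    """
    product_ul = draw_volume_ml * 1000.0 * product_fraction * recovered_fraction
    return int(product_ul // aliquot_ul)


# A 10-ml draw gives roughly 3,600 µl of recoverable serum, i.e., 14 full aliquots.
print(aliquot_count(10.0))  # 14
```

Actual yields vary with hematocrit and technique, so the estimate should be treated as a lower-confidence guide rather than a guarantee.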

5.1.1.1. Serum

The actual procedure for drawing blood is beyond the scope of this manuscript. A red-top Vacutainer tube should be used for drawing blood that will be processed to aliquots of serum. Note that only approximately 40% of this volume will result in serum unless the patient has a cardiac/pulmonary or other disease that produces an increased hematocrit, in which case the volume of serum may be much less. Some groups recommend drawing blood for serum (red-top tube) first to minimize contamination with Vacutainer tube additives (e.g., anticoagulants) if one needle is used for drawing multiple tubes. The time of blood draw should be recorded.
(a) After drawing, allow the Vacutainer tube to stand upright for 30–45 min (record the time) at room temperature to ensure adequate coagulation. Thereafter, keep the tube cold (4°C) if there will be a long interval (>1 h) prior to processing and storage.
(b) Centrifuge the tube(s) at 1,300 × g for 10 min.
(c) Transfer the tube(s) to a stable tube rack.
(d) Saturate a gauze square with 70% EtOH or use an alcohol prep pad. Place the gauze over the Vacutainer tube rubber stopper and carefully remove the stopper. Do not disrupt the clot.
(e) Using a pipette, draw off 90% of the fluid at the top of the tube. This will be the serum. Record any hemolysis. If this is an adequate amount of serum to meet the needs of the resource, proceed to step k.
(f) If additional serum is required, place the serum initially drawn from the tube into a 15-ml conical tube. Transfer the tube to a stable tube rack.
(g) Carefully draw off remaining serum and transfer into a separate conical tube.
(h) Centrifuge the second tube at 1,300 × g for 10 min to remove any contaminating red blood cells or other material.
(i) Using a disposable pipette, draw off the remaining fluid (serum).
(j) Add the second serum draw to the first serum draw. Invert the tube to mix the serum.
(k) Transfer 250-µl aliquots of serum into cryovials.
(l) Screw the lids securely onto the cryovials.
(m) Freeze the cryovials containing aliquots of serum upright on dry ice or in liquid nitrogen vapor.
(n) Store at −80°C or colder.
(o) Record the times of processing and freezing, and record the storage location.
(p) If the clot is needed for DNA analysis or other analysis, it can be removed from the Vacutainer tube and frozen in a cryovial.
(q) Discard pipette and tubes with remaining red blood cells into appropriate biohazard waste containers according to your institutional requirements.

5.1.1.2. Plasma

A lavender-top (EDTA) tube should be used for drawing blood that will be processed to aliquots of plasma and buffy coat. Note that only approximately 40% of this volume will be plasma unless the patient has a cardiac/pulmonary or other disease that produces an increased hematocrit, in which case the volume of plasma may be much less. The time of blood draw should be documented.
(a) If not processed rapidly (e.g., within 1 h), keep the specimen cold at 4°C, but do not allow the specimen to freeze (i.e., do not use blue ice, which may be colder than 0°C).
(b) Invert the tube several times to mix the blood components.
(c) Centrifuge at 1,300 × g for 10 min.
(d) Transfer the tube to a stable tube rack.
(e) Saturate a gauze square with 70% EtOH (or use an alcohol prep pad). Place the gauze over the Vacutainer tube rubber stopper and carefully remove the stopper. Do not disturb the plasma/buffy coat layers.
(f) If the plasma accidentally becomes contaminated with white cells or red cells, transfer the plasma into a secondary centrifuge tube, and centrifuge a second time at 1,300 × g for 10 min to remove all potentially remaining cells.
(g) Using a pipette, draw off the fluid at the top, which will be the plasma. Be careful not to disturb the middle (tan) layer, which is the buffy coat (see Fig. 1).
(h) Transfer 250-µl aliquots of plasma into cryovials.
(i) Screw the lids securely onto the cryovials.
(j) Freeze cryovials containing aliquots of plasma upright on dry ice or in liquid nitrogen vapor.
(k) Store at −80°C or colder.
(l) Record the times of processing and freezing, and record the storage location.

Fig. 1. Blood components separated after centrifugation.

5.1.1.3. Buffy Coat

1. After the plasma has been removed, the buffy coat will be the next layer, which is the tan cellular material located between the plasma and red blood cells (see Fig. 1).
2. With a clean pipette, carefully draw up the buffy coat (it is likely that some red cells may be pulled up in this draw; this is acceptable).
3. There is now a choice of three pathways:
   (a) The material can be frozen neat:
      (i) Transfer the buffy coat into a cryovial.
      (ii) Freeze the cryovial upright at −80°C or colder and store at −80°C or colder.
      (iii) Record the time and date of freezing and type of storage.
   (b) The material can be processed to immortalize lymphocytes; these are processed and stored at liquid nitrogen vapor phase or colder (note: the procedure for immortalization of lymphocytes is beyond the scope of this manuscript).
   (c) The buffy coat can be frozen with RPMI and DMSO:
      (i) Measure the amount of buffy coat and add an equal volume of RPMI containing 20% DMSO, so that the DMSO is 10% of the total volume.
      (ii) Transfer the cocktail to a cryovial.
      (iii) Freeze the cryovial upright at −80°C or colder and store at −80°C or colder.
      (iv) Record the time and date of freezing, the additives, and document the type of storage.
4. Discard the pipette and tubes with remaining red blood cells into appropriate biohazard waste containers according to your institutional requirements.

5.1.1.4. Whole Blood

(a) Record the time of the blood draw.
(b) If not processed rapidly (e.g., within 1 h), keep the specimen cold at 4°C, but do not allow the specimen to freeze (i.e., do not use blue ice, which may be colder than 0°C).
(c) Whole blood is best handled fresh; freezing whole blood without 10% DMSO will cause extensive lysis.
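The DMSO proportions used when freezing buffy coat with RPMI (pathway c of the buffy coat procedure) follow from a simple dilution: DMSO is present only in the added medium, so its final percentage is the medium's DMSO percentage scaled by the medium's share of the total volume. The short Python sketch below checks this arithmetic; the function name and example volumes are illustrative assumptions.

```python
def final_dmso_percent(buffy_coat_ml: float,
                       medium_ml: float,
                       medium_dmso_percent: float = 20.0) -> float:
    """Final DMSO percentage (v/v) after adding DMSO-containing medium to buffy coat."""
    total_ml = buffy_coat_ml + medium_ml
    return medium_dmso_percent * medium_ml / total_ml


# Equal volumes of buffy coat and RPMI with 20% DMSO give 10% DMSO overall.
print(final_dmso_percent(0.5, 0.5))  # 10.0
```

If less medium than buffy coat is added, the final DMSO concentration falls below 10%, which is why the procedure specifies an equal volume.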

5.1.2. Example SOP for Obtaining Solid Tissue for Research

Important considerations in the collection of solid tissue include the following:
• The clinical usefulness of the specimen must never be compromised.
• Aliquots of solid human tissues to be frozen should be relatively small (0.25 g or less) to minimize future freeze–thaw cycles and to aid in rapid freezing. The exception to this would be if samples are being collected prospectively, and samples of a specific size are requested by an investigator.
• Samples to be supplied fresh (e.g., to establish cell cultures) should be finely chopped (<0.5 mm) and placed in a standard transport medium (e.g., RPMI 1640) until transferred. If it is necessary to hold the specimen for longer than 24 h (e.g., to await diagnostic review before distribution can take place), the sample should be placed in a cell culture incubator until it can be transferred. In some cases, fetal calf serum (usually 10%) and/or antibiotics/antifungals are added. Shipping to distant sites should usually be at 4°C, although some cellular components are best shipped at ambient temperature.
• A unique identification code should be assigned to each aliquot of the solid tissue. This code is preferably applied as a bar code, which is linked by software to the demographic and clinical information of the donor as well as the storage site of the aliquot.
• Important times related to the specimen should be recorded and included in the information linked to the bar code, as discussed previously.

After processing, frozen tissues may be stored in several ways:
• The tissue can be placed in a tissue cassette, which is then wrapped in heavy-duty aluminum foil to reduce desiccation, and labeled on the outside of the foil with the bar code.
• The tissue can be placed in a plastic cryovial, which is labeled with the bar code (note: tissue may be difficult to remove from a cryovial).
• Tissue samples may be placed in other similar closed containers that withstand −80°C or LN2 temperatures.
• Tissue may be placed in a cryomold, covered with OCT compound, then frozen (note: OCT may interfere with some assays).
• Tissue may be placed in RNAlater for subsequent freezing (see http://www.ambion.com/techlib/prot/bp_7020.pdf for processing of specimens in RNAlater).

Current reports indicate that specimens should be stored at −80°C or colder. To date, data indicate that specimens degrade within 6 months at −20°C (13, 15); few differences have been shown between long-term storage at −80°C and at the freezing temperature of liquid nitrogen vapor (−186°C).

Procedure: The specific procedure used in an SOP for a tissue repository will depend on the workflow of the tissue repository, surgery, and pathology. The procedure should be organized for optimal and efficient workflow. The following is provided as an example procedure:
• On the day before surgery, check the surgical schedule and identify cases of interest that could supply needed remnant tissues.
• On the morning of surgery, distribute sterile specimen container(s) surrounded by wet ice and identified with the patient’s name and hospital number to the surgical desk for distribution to the appropriate operating room.
• Immediately after removal from the patient, the operative specimens should be placed in the iced sterile specimen container. When the specimen is removed, tissue repository personnel should be informed so they can transfer the specimen to pathology. The times of initiation of surgery (anesthesia), specimen removal, and specimen transfer to pathology should be recorded, if available.

Organizational Issues in Providing High-Quality Human Tissues


• The tissue repository personnel transport the specimen directly to pathology or the frozen section room and clearly describe to personnel in pathology the types of tissues needed from the specimen (e.g., malignant breast tissue and matching uninvolved breast from a mastectomy). Tissue not needed for diagnosis is rapidly removed from the specimen and supplied to tissue repository personnel for research.
• If space is not immediately available for processing of the tissue, it may be necessary for tissue repository personnel to transfer (on wet ice) the remnant research tissue to a tissue-processing laboratory for additional tissue retrieval, processing, and storage. Note: It may take several hours to dissect and process all research tissue from a large specimen.
• Tissue specimens may be further dissected into multiple aliquots of each specimen type (e.g., from a 1-g specimen of malignant breast and 1-g specimen of matching uninvolved breast, four 0.25-g aliquots of tumor and four 0.25-g aliquots of matching uninvolved tissue could be prepared).
• As aliquots are created, representative quality control (QC) aliquots should be taken. These could represent a 1:1 relationship with each aliquot, or a QC aliquot can be taken between two research aliquots such that the QC aliquot is representative of both research aliquots.
• Research aliquots should be processed appropriately (fresh, frozen, RNAlater, fixed) depending on the need and identified use of the tissue or SOP of the bank.
• Each aliquot of tissue (both research and QC) should be assigned a unique code for identification. It is, however, important that the code assigned to the QC aliquot be such that the QC aliquot can be easily linked to the research aliquot(s) it represents (e.g., QC aliquot “A–B” is representative of research aliquot “A” and research aliquot “B”).
• QC aliquots can be placed in fixative and processed to paraffin blocks or frozen in OCT, yielding paraffin-embedded sections or frozen sections, respectively, for microscopic examination.
• Barcode labels should be generated to allow unique identification of each aliquot of the specimen, as discussed previously in this chapter. Subsequently, additional clinical, demographic, storage location, and QC information can be added to the database.
• When specimens are to be retrieved, the bar code identifies the characterization of each aliquot as well as the storage site of each aliquot. Upon removal of a specimen from the storage location, the bar code is scanned and the disposition of the specimen (e.g., transferred to investigator X) is entered into the database for tracking.
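The coding and tracking steps in the example procedure above can be illustrated with a small data model. The following is only a minimal sketch in Python, not the software of any particular repository: the class names, field names, and storage-site strings are hypothetical, and a real system would use a database and scanned bar codes rather than an in-memory dictionary. It shows a QC aliquot coded "A-B" being linked to research aliquots "A" and "B", and a disposition being recorded when an aliquot is removed from storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Aliquot:
    code: str                      # unique identifier carried by the bar code
    tissue_type: str               # e.g., "malignant breast"
    storage_site: str              # hypothetical location string
    is_qc: bool = False            # True for a quality control aliquot
    represents: tuple = ()         # QC only: codes of the research aliquots it mirrors
    history: list = field(default_factory=list)  # (timestamp, event) pairs


class AliquotRegistry:
    """In-memory stand-in for the repository database (illustrative only)."""

    def __init__(self):
        self._aliquots = {}

    def add(self, aliquot):
        # Codes must be unique, so a duplicate is rejected rather than overwritten.
        if aliquot.code in self._aliquots:
            raise ValueError(f"duplicate aliquot code: {aliquot.code}")
        self._aliquots[aliquot.code] = aliquot

    def get(self, code):
        return self._aliquots[code]

    def qc_for(self, research_code):
        """Return the codes of QC aliquots that represent a research aliquot."""
        return [a.code for a in self._aliquots.values()
                if a.is_qc and research_code in a.represents]

    def record_disposition(self, code, event, when=None):
        """Append a time-stamped event (e.g., a transfer) to the aliquot history."""
        when = when or datetime.now(timezone.utc)
        self._aliquots[code].history.append((when, event))


# Two research aliquots and the QC aliquot taken between them.
registry = AliquotRegistry()
registry.add(Aliquot("A", "malignant breast", "freezer 1, rack 2, box 3"))
registry.add(Aliquot("B", "malignant breast", "freezer 1, rack 2, box 3"))
registry.add(Aliquot("A-B", "malignant breast", "paraffin block file",
                     is_qc=True, represents=("A", "B")))

# Scanning the bar code on removal would trigger an entry such as this one.
registry.record_disposition("A", "transferred to investigator X")
```

Storing the link explicitly in the QC record, rather than relying on the code string alone, means the QC result can still be found if a repository later changes its coding scheme.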


5.2. Audits

Audits are written periodic evaluations of operating procedures and infrastructure. Tissue repository facilities should conduct regular audits such as those listed subsequently (12, 17, 18). Audits may be as simple as a weekly check of freezer temperature logs or liquid nitrogen levels or may be more complex, such as a quarterly review of specimen collections. QA personnel document problems and report them to upper management, who report directly to the chief executive officer of the repository. The QA program of a repository should describe how and when audits are conducted. For example, specific audits and records could include the following:
• SOPs and adherence to these procedures
• Equipment maintenance and repair
• Equipment monitoring (e.g., determining the cutting thickness of microtomes)
• Training records and adherence of staff to required training (e.g., training in biohazards)

Tissue repository organizations should consider distributing an annual survey to determine the satisfaction of users/investigators who obtained tissue during the preceding calendar year. The results of the survey should be evaluated carefully, especially by the QA group. Investigators reporting unsatisfactory results should be contacted and their problems discussed and corrected if practicable. If the organization provides tissues to extramural investigators, each shipment should be monitored closely. Such monitoring can be accomplished by including a short questionnaire with each shipment that documents receipt of the shipment and any specific problems with the shipment (e.g., not enough dry ice in frozen shipments).
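The simplest audit mentioned in this section, a periodic check of freezer temperature logs, lends itself to automation. The Python sketch below is an illustration under stated assumptions rather than a prescribed method: the function name, the default limit of −80°C (taken from the storage recommendation earlier in this chapter), and the 24-h maximum interval between readings are all assumptions that each repository would replace with the values in its own SOP.

```python
from datetime import datetime, timedelta


def audit_freezer_log(readings, max_temp_c=-80.0, max_gap=timedelta(hours=24)):
    """Summarize problems in a list of (timestamp, temperature in deg C) readings.

    Returns a dict with:
      "excursions": readings warmer than max_temp_c (assumed alarm limit)
      "gaps": (start, end) pairs where consecutive readings are further apart
              than max_gap (assumed logging interval)
    """
    readings = sorted(readings)  # order by timestamp
    excursions = [(ts, temp) for ts, temp in readings if temp > max_temp_c]
    gaps = [(earlier[0], later[0])
            for earlier, later in zip(readings, readings[1:])
            if later[0] - earlier[0] > max_gap]
    return {"excursions": excursions, "gaps": gaps}


# Hypothetical log: one warm reading, then three days without an entry.
log = [
    (datetime(2009, 1, 5, 8, 0), -81.5),
    (datetime(2009, 1, 6, 8, 0), -79.0),
    (datetime(2009, 1, 9, 8, 0), -82.0),
]
report = audit_freezer_log(log)
for ts, temp in report["excursions"]:
    print(f"Temperature excursion at {ts}: {temp} deg C")
for start, end in report["gaps"]:
    print(f"No readings recorded between {start} and {end}")
```

A report of this kind could be attached to the written audit record that QA personnel pass to management.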

5.3. Quality Control of Tissues

Monitoring the quality and diagnoses of the actual tissues provided for research (i.e., quality control) is a very important component of the QA program. Tissue facilities have used various forms of QC to aid investigators with their studies in order to ensure that the tissues and associated information provided meet the needs of the investigator (3, 6, 16). Many tissues, especially tumors, are heterogeneous; thus, specimens from tumors vary regarding the extent of neoplastic cells, mucin production, fibrosis (desmoplasia), inflammatory cells, and/or necrosis. Fibrosis in and adjacent to tumors may be intermixed with or mistaken for tumor and some tumors may diffusely infiltrate healthy tissues making areas of tumor difficult to identify grossly. Therefore, in general, just knowing the diagnosis of a patient from whom tissues are obtained is not adequate quality control for tissues provided for research.


Fig. 2. Tissue specimens supplied for research.

The minimum QC for each tissue repository organization is the microscopic examination by a pathologist of an aliquot of tissue that is representative of the specific tissue that is supplied for research. Optimally, unless the tissue is very small, a QC examination (Fig. 2) is made on a mirror image piece of tissue to that supplied for research. Note that in Fig. 2, AB is processed to a paraffin block and histopathologic examination of AB is the QC mirror image for both specimen A and specimen B. Similarly, A′B′ is the molecular quality control of these two specimens. Using this or similar methods of QC, the Cooperative Human Tissue Network (CHTN) has found that approximately 15% of tissues collected for specific research cannot be used for the specific research for which the specimens originally were collected (3, 6, 16). For example, areas of tissue that appear grossly to be unaffected by a disease process may be microscopically involved by disease. Conversely, specimens that appear to be diseased may include such a large component of other processes, such as scarring/fibrosis/accumulation of mucin or damage by radiation, that they are unusable for specific research. Other reasons for rejection of a specimen of tissue because of the QC examination may include extensive ischemia, inflammation, or necrosis. For example, some focal areas of large tumors (e.g., large renal cell carcinomas or liver metastases of colorectal cancer) may be so necrotic that only a few recognizable tumor cells remain in the areas of tumor collected for research. Typically, in the QC examination, the


proportion (percent) of the specimen that is diseased is specified along with the percent necrosis/fibrosis of the diseased areas as well as the percent of other factors such as mucin formation. For example, a mucinous tumor may be composed of greater than 90% acellular mucin and the lack of cellular representation is likely to impede many specific forms of analysis. Another important QC index of tissues used in research is the proportion of cells within the tumor that are neoplastic. This is necessary because a tumor may be infiltrated by a large proportion of inflammatory cells. QC can also be performed on frozen sections of a specimen embedded in frozen section support medium (e.g., OCT). Note that OCT or other similar heavy alcohol-based support mediums needed to obtain frozen sections may superficially contaminate the specimen and subsequently interfere with some assays (e.g., biological assays for folate). Figure 3 demonstrates QC via a frozen section including the minimum information needed by most investigators. The QC examination also can be in part based on “molecular quality control” in which mRNA, DNA, and protein are extracted from small aliquots followed by molecular characterization of the molecules using various analytical methods ranging from mass spectrometry or gene arrays to examination of ribosomal bands of RNA using gels (8) or other systems such as the Agilent® 2100 System. Molecular quality control is performed when investigators request this level of QC and periodically to verify the quality of specimens in general provided by the facility. It may also be performed when investigators indicate that there is a specific problem with any of the specimens with which they have been

Fig. 3. Quality control using frozen sections.


supplied, if additional aliquots of those specimens are still available at the facility. Even more extensive QC examinations of tissue are sometimes needed by investigators or specific collaborative research projects; however, as the required QC examinations become more extensive, these more extensive QC measurements have a “price” for their use; specifically, increased time and effort is put into the QC examination of the specimen by the tissue repository organization and thus there is usually an increased “cost” of the specimen to the investigator. QC can also be tailored based on the request of individual investigators. Rarely, an investigator may request a “platinum level” of QC, which is demonstrated in Fig. 4. In this approach to QC, frozen sections of the whole specimen are made, followed by macrodissection to enrich the specimen in diseased cells, followed by a QC examination of the opposite side of the specimen plus additional macrodissection if necessary. In addition, aliquots of the front and back of the specimen after macrodissection would be analyzed molecularly. Thus, frozen sections from both

Fig. 4. Platinum level of quality control based on frozen sections, macrodissection, and molecular assays.


sides are not only used in microscopic examinations for QC, but also are used in molecular QC. Such an approach to QC increases the costs of processing specimens for both the tissue repository and the investigator. This approach also potentially reduces the tissue available to support research because the QC examination begins to exhaust the specimen. If the research projects of an investigator do not require extensive QC, such an approach is not cost-effective. Investigators may wish to perform molecular QC and/or microdissection (e.g., laser capture) or macrodissection at their home laboratories or organizations, hence reducing the need for extensive efforts in QC at the tissue repository. In addition, most research with today’s microtechniques may not require this extent of QC. The QC examination of tissues for research should match the research planned for the tissues and the investigator requests. For most tissue specimens and investigators, the approach shown in Fig. 2 or 3 is adequate. If tumor enrichment or molecular analysis is required, this can be requested by the investigator or can be performed at the investigator’s home institution. If mRNA is to be analyzed using long amplicons greater than 200 base pairs, a section of tissue can be examined at the request of an investigator to determine the quality of RNA using the Agilent 2100 method; if the specimen is not to be used in RNA analysis, such analysis adds unnecessary costs. Note that using short amplicons in real-time quantitative polymerase chain reaction (PCR) techniques permits the use of somewhat degraded RNA, and even RNA extracted from paraffin blocks can be used to give equivalent results to those using frozen tissues (10, 11). The best and most cost-effective approach for QC is to utilize a simple approach (Fig. 2) that can be expanded to the platinum level, but only at the request of an investigator.

5.4. Quality Assurance for the Collection of Bodily Fluids

The QA of bodily fluids primarily involves selecting the parameters of collection, processing, and storage, then incorporating these into SOPs while avoiding bias in the selection of patients and in the collection, processing, and storage of the specimens (13, 15). Most studies rely on specimens of bodily fluids that are frozen within 4 h of collection. After clotting and prior to processing, the specimens can be maintained at approximately 4°C, but freezing of any blood specimen prior to processing should be avoided to prevent hemolysis. For many methods, hemolyzed specimens should be avoided, in that red blood cells may release large amounts of hemoglobin as well as proteolytic enzymes after lysis. For a discussion of other issues concerning QC of a tissue repository, see the Best Practices of the International Society for Biological and Environmental Repositories (ISBER) (12, 17) and National Cancer Institute (NCI) Best Practices (18).
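Where collection and freezing times are recorded in the database, the handling criteria described above can be checked automatically at accession. The following Python sketch is one possible way to do this and is offered only as an illustration: the function and parameter names are hypothetical, and the only value taken from the text is the 4-h limit between collection and freezing. Whether a flagged specimen is rejected or merely annotated would be decided by the SOP of the repository.

```python
from datetime import datetime, timedelta

# From the text: most studies rely on fluids frozen within 4 h of collection.
MAX_TIME_TO_FREEZE = timedelta(hours=4)


def check_fluid_specimen(collected_at, frozen_at,
                         frozen_before_processing=False, hemolyzed=False):
    """Return a list of QA problems for one bodily fluid specimen.

    An empty list means no problem was detected by these checks.
    """
    problems = []
    if frozen_at < collected_at:
        problems.append("freeze time precedes collection time (data entry error?)")
    elif frozen_at - collected_at > MAX_TIME_TO_FREEZE:
        problems.append("frozen more than 4 h after collection")
    if frozen_before_processing:
        problems.append("blood frozen before processing (risk of hemolysis)")
    if hemolyzed:
        problems.append("specimen is hemolyzed")
    return problems


# Hypothetical specimens: one handled within the limit, one frozen too late.
on_time = check_fluid_specimen(datetime(2009, 3, 2, 9, 0),
                               datetime(2009, 3, 2, 11, 30))
late = check_fluid_specimen(datetime(2009, 3, 2, 9, 0),
                            datetime(2009, 3, 2, 14, 0))
print(on_time)
print(late)
```

Applying the same checks to every specimen, from cases and controls alike, is one practical way to reduce the handling bias that this section warns against.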

6. Regulatory and Ethical Issues in Tissue Repositories

6.1. Informed Consent and HIPAA Authorization

Tissue repository organizations frequently obtain remnant tissues that remain unused after surgical and/or other diagnostic or therapeutic procedures have been completed; these remnant tissues would otherwise be discarded or destroyed. The local IRB determines whether or not it is necessary to obtain consent from the patients who are the source of these tissues; the local IRB may elect to waive the requirement for informed consent and HIPAA authorization from these patients (5, 19). It is not easy for tissue repository personnel to obtain informed consent from such patients, because they have no clinical or other relationship with them, and it has been demonstrated that personnel who do have a clinical relationship with the patients (e.g., surgical nurses or surgeons) are too busy to obtain informed consent for use of tissues in research. In addition, there is no optimal time or place to obtain the consent. Upon admission, patients may be overwhelmed and may tend to sign anything. At clinic or in the preoperative area, there may be no space or time for personnel of the tissue repository to obtain consent; also, many clinics operate simultaneously, requiring multiple personnel to be involved in the consent process. After surgery, patients may be on pain medications that may affect their ability to provide informed consent. Postoperative consent is further complicated by the fact that many patients may be discharged on the same day as their surgery. Thus, there is no perfect time and place to obtain informed consent. We have found that obtaining informed consent typically requires approximately 30 min per patient. Since thousands of patients per year may be sources of tissue, obtaining consent from individual patients is very expensive, usually requiring more than one full-time person.

While obtaining informed consent, written authorization of patients to use their protected health care information in research also can be obtained. Although the description of the research may often be too general to meet HIPAA requirements, obtaining HIPAA authorization from patients may support waivers of HIPAA authorization by the Privacy Officer/Board.

6.2. Cost Recovery

It is illegal to sell human tissues; however, it is legal and ethical to recover the costs associated with collection, processing, storing, and providing tissues to investigators. Funds collected in cost recovery greatly aid in the support and maintenance of a tissue repository. A primary issue is to determine the proportion of cost recovery and establish a processing cost per specimen. For some repositories, grant support may cover some of the costs; therefore, processing fees for tissues may be set to approximate the cost of an experimental animal.


7. Safety

The personnel of a tissue repository are exposed to many potential sources of injury including biohazards, chemical hazards, electrical and physical injuries, and fire dangers. To minimize these dangers, a tissue repository must develop a safety program. This safety program can be a component of an overall safety program of any institution with which the tissue repository is associated. Such a safety program will also provide a safety committee and safety officer to work with the tissue repository. Other articles may aid a tissue repository in establishing a safety program (7, 14).

7.1. Biohazards

One of the initial decisions that a tissue repository should make is whether or not to collect tissues from patients infected with human pathogens (e.g., HIV, hepatitis B) or patients at risk of infection (e.g., i.v. drug abusers). Many tissue repositories have elected not to collect such tissues; nevertheless, any tissue repository might accidentally or unknowingly provide numerous investigators with infected tissue. It is therefore important that tissue repositories require that all personnel processing or using the tissues the repository supplies be educated in biohazards and handle all tissues with universal precautions. Those receiving the tissues also need to sign an indemnification agreement with the tissue repository, i.e., to hold the suppliers not responsible for injuries caused by the tissues or tissue products that are provided. In addition, the personnel of the tissue repository must be similarly educated in biohazards and should be provided with vaccinations for hepatitis B.

7.2. Chemical Hazards

Exposure to hazardous chemicals such as formaldehyde and xylene presents potential dangers to personnel of a tissue repository. Exposure may occur via direct contact or via toxic vapors. A safety program including the correct use of safety equipment to minimize exposures to hazardous chemicals is an important component of any tissue repository. A component of this safety program is yearly education in chemical hazards and the maintenance of an inventory of all chemicals along with their material safety data sheets (14).

8. Informatics

An informatics system should be developed or chosen based on the size and type of the tissue repository. In general, its use should save time for the personnel of the repository, so information should be easily added to and obtained from the database. One very useful feature of even a simple informatics system is the use of a bar code to identify specimens uniquely, as discussed in Subheading 3.1 on sample identification. As suggested, when multiple time points of tissue collection, processing, and storage are to be recorded, fields for these time points should be included in the database. All information needed to develop a “history” of each specimen should be incorporated in separate fields of the database.

8.1. Vocabulary

The approach to vocabulary of an informatics system that must frequently interact with the requests from investigators for solid tissues and bodily fluids is different from that of an informatics system that deals primarily with the diagnostic vocabulary used by pathologists. For example, an investigator may want serum from African American patients with any breast cancer; in contrast, pathologists seldom make a diagnosis of “breast cancer,” but rather pathological diagnoses usually are very specific (“ductal carcinoma, well differentiated”); race is not a component of the diagnosis, and surgical pathologists rarely deal with bodily fluids. Thus, the approach to the vocabulary of an informatics system for a tissue repository must be more flexible than that of a system relying on diagnostic vocabulary. This difference is discussed and demonstrated by Edgerton et al. (28).

Databases that contain identifiable protected health information (PHI) must meet HIPAA security requirements, including installation on a secure server behind a firewall. HIPAA requires that unauthorized access to such a database be prevented, so that access is only via unique individual access codes. Usually these codes permit each type of user specific types of access. Some personnel codes permit read-only access, some permit data entry and editing, and some administrative codes permit modification of the database fields. One of the major HIPAA requirements for an informatics system containing PHI is maintenance of an audit trail that records all individuals who access the database, even for read-only activities.
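The combination of role-specific access codes and an audit trail described above can be sketched in a few lines of Python. This is a conceptual illustration only and does not by itself satisfy HIPAA: the role names, the permission table, the class name, and the access codes are all hypothetical, and a production system would also need authentication, encryption, and tamper-resistant log storage. The point shown is that every attempt to access the database, including read-only and refused attempts, is written to the audit trail.

```python
from datetime import datetime, timezone

# Hypothetical mapping of user roles to permitted actions.
PERMISSIONS = {
    "reader": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "modify_fields"},
}


class AuditedDatabase:
    """Toy key-value store with per-user roles and an audit trail."""

    def __init__(self, users):
        self._users = dict(users)   # access code -> role
        self._records = {}
        self.audit_trail = []       # one entry per access attempt

    def _authorize(self, access_code, action, key):
        role = self._users.get(access_code)
        allowed = role is not None and action in PERMISSIONS[role]
        # Log before acting, so refused attempts are recorded as well.
        self.audit_trail.append({
            "when": datetime.now(timezone.utc),
            "access_code": access_code,
            "action": action,
            "key": key,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{action} not permitted for {access_code}")

    def read(self, access_code, key):
        self._authorize(access_code, "read", key)
        return self._records.get(key)

    def write(self, access_code, key, value):
        self._authorize(access_code, "write", key)
        self._records[key] = value


db = AuditedDatabase({"code-001": "reader", "code-002": "editor"})
db.write("code-002", "A", {"diagnosis": "ductal carcinoma, well differentiated"})
record = db.read("code-001", "A")            # read-only access is still logged
try:
    db.write("code-001", "A", {})            # a reader may not edit
except PermissionError:
    pass
print(len(db.audit_trail))
```

Routing every read and write through a single authorization step is a deliberate design choice here: it makes it difficult to add a new type of access that bypasses the audit trail.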

8.2. caBIG

The caBIG program is being developed by the NCI to provide consistent informatics approaches to cover most NCI activities, especially work areas associated with NCI Comprehensive Cancer Centers. Of great importance is that databases of various comprehensive cancer centers can communicate with each other. In the area of human tissues, a program, caTissue, has been developed by caBIG as an entry informatics program for use by tissue repositories. This program is available at no cost to nonprofit users. Similarly, the program TissueQuest has been developed for large tissue repositories by the CHTN to permit electronic communication concerning investigators and their tissue requests (28). TissueQuest is primarily a program to permit a large tissue repository to work with the requests of numerous investigators.


9. Future Directions of Tissue Repositories

Based on guidelines and the new Best Practices of ISBER (12, 17), as well as the NCI (18), detailed records will be required to develop a “history” of each human tissue stored in a tissue repository. This will require an informatics program that can incorporate such data. In addition, a process of harmonization in required activities such as collection of informed consent and HIPAA authorization is being developed. These changes are likely to be applied to all repositories funded by any governmental agency. Tissue repositories should follow new versions of ISBER as well as NCI Best Practices related to human tissue repositories. Similarly, informatics systems of tissue repositories should follow HIPAA security requirements as well as the new requirements developed by caBIG.

10. Summary

Human tissue specimens of superior quality, as well as matching clinical data, are needed to support many types of biomedical research. As we have discussed, there are several issues that should be considered and addressed when attempting to meet this need through the design and establishment of a human tissue repository.

Acknowledgments

Our work is supported in part by the Cooperative Human Tissue Network (NCI #3U01CA044968), the Breast SPORE at UAB (NCI #1P50CA89019), the Pancreatic SPORE at UAB (NCI #P50CA101955), the Early Detection Research Network (NCI #5U24CA086359), and the Cardiovascular Medical Research Education Fund (TP1GR).

References

1. Grizzle, W., Grody, W.W., Noll, W.W., Sobel, M.E., Stass, S.A., Trainer, T., Travers, H., Weedn, V. and Woodruff, K. (1999) Recommended policies for uses of human tissue in research, education, and quality control. Ad Hoc Committee on Stored Tissue, College of American Pathologists. Arch. Pathol. Lab. Med. 123, 296–300.
2. Grizzle, W.E. (1989) Use of human tissues in research in matching needs, saving lives: building a comprehensive network for transplantation and biomedical research, The Annenberg Washington Program: Washington, DC.
3. Grizzle, W.E. and Sexton, K.C. (1999) Development of a facility to supply human tissues to aid in medical research. In Molecular Pathology of Early Cancer, Chapter 24 (S. Srivastava, D.E. Henson and A. Gazdar, eds.) IOS Press: Amsterdam, pp. 371–383.
4. Bailar, J.C., Gaylor, D., Grizzle, W.E., Grumbly, T., Kalman, D., Mahaffey, K., Matthews, H.B., Perera, F. and Waksberg, J. (1991) Monitoring human tissues for toxic substances, National Academy Press: Washington, DC.
5. Grizzle, W.E., Woodruff, K.H. and Trainer, T.D. (1996) The Pathologist’s role in the use of human tissues in research – legal, ethical and other issues. Arch. Pathol. Lab. Med. 120, 909–912.
6. Grizzle, W.E., Aamodt, R., Clausen, K., LiVolsi, V., Pretlow, T.G. and Qualman, S. (1998) Providing human tissues for research: how to establish a program. Arch. Pathol. Lab. Med. 122, 1065–1076.
7. Grizzle, W.E. and Fredenburgh, J. (2001) Avoiding biohazards in medical, veterinary and research laboratories. Biotech. Histochem. 76, 183–206.
8. Jewell, S.D., Srinivasan, M., McCart, L.M., Williams, N., Grizzle, W.E., LiVolsi, V., MacLennan, G. and Sedmak, D.D. (2002) Analysis of the molecular quality of human tissues: an experience from the Cooperative Human Tissue Network. Am. J. Clin. Pathol. 118, 733–741.
9. Qualman, S.J., France, M., Grizzle, W.E., LiVolsi, V.A., Moskaluk, C.A., Ramirez, N.C. and Washington, M.K. (2004) Establishing a tumour bank: banking, informatics and ethics. Br. J. Cancer 90, 1115–1119.
10. Steg, A., Wang, W., Blanquicett, C., Grunda, J.M., Eltoum, I.A., Wang, K., Buchsbaum, D.J., Vickers, S.M., Russo, S., Diasio, R.B., Frost, A.R., LoBuglio, A.F., Grizzle, W.E. and Johnson, M.R. (2006) Multiple gene expression analyses in paraffin-embedded tissues by Taqman low density array: application to Hedgehog and Wnt pathway analysis in ovarian endometrioid adenocarcinoma. J. Mol. Diagn. 8, 76–83.
11. Steg, A., Vickers, S.M., Eloubeidi, M., Wang, W., Eltoum, I.A., Grizzle, W.E., Saif, M.W., Lobuglio, A.F., Frost, A.R. and Johnson, M.R. (2007) Hedgehog pathway expression in heterogeneous pancreatic adenocarcinoma: implications for the molecular analysis of clinically available biopsies. Diagn. Mol. Pathol. 16, 229–237.
12. Aamodt, R.L., Anouna, A., Baird, P., Beck, J.C., Bledsoe, M.A., DeSouza, Y., Grizzle, W.E., Gosh, J., Holland, N.T., Hakimian, R., Michels, C., Pitt, K.E., Sexton, K.C., Shea, K., Stark, A. and Vaught, J. (2005) Best practices for repositories I: collection, storage and retrieval of human biological materials for research. Cell Preserv. Technol. 3, 5–48.
13. Grizzle, W.E., Semmes, O.J., Bigbee, W.L., Zhu, L., Malik, G., Oelschlager, D., Manne, B. and Manne, U. (2005) The need for the review and understanding of SELDI/MALDI mass spectroscopy data prior to analysis. Cancer Inform. 1, 86–97.
14. Grizzle, W.E., Bell, W.C. and Fredenburgh, J. (2005) Safety in biomedical and other laboratories. In Molecular Diagnostics, Chapter 33 (G. Patrinos and W. Ansorg, eds.) Elsevier Academic Press: Burlington, MA, pp. 421–428.
15. Grizzle, W.E., Semmes, O.J., Bigbee, W.L., Malik, G., Miller, E., Manne, B., Oelschlager, D.K., Zhu, L. and Manne, U. (2005) Use of mass spectrographic methods to identify disease processes. In Molecular Diagnostics, Chapter 17 (G. Patrinos and W. Ansorg, eds.) Elsevier Academic Press: Burlington, MA, pp. 211–222.
16. Grizzle, W.E., Bell, W. and Sexton, K.C. (2005) Best practices and challenges in collecting and processing human tissues to support biomedical research. AACR 96th Annual Meeting, Education Book, pp. 305–310.
17. Pitt, K., Campbell, L., Skubitz, A., Somiari, S., Sexton, K., Pugh, R., Aamodt, R., Baird, P., Betsou, F., Cohen, L., De Souza, Y., Gaffney, E., Geary, P., Grizzle, W.E., Gunter, E., Horsefall, D., Kessler, J., Michels, C., Kaercher, E., Morales, O., Morente, M., Morrin, H., Petersen, G., Robb, J., Seberg, O., Thomas, J., Thorne, H., Walters, C. and Riegman, P. (2005) Best practices for repositories: collection, storage, retrieval and distribution of biological materials for research, Second Edition. Cell Preserv. Technol. 3(1), 5–48.
18. NCI (2007). National Cancer Institute Best Practices for Biospecimen Resources. http://biospecimens.cancer.gov/global/pdfs/NCI_Best_Practices_060507.pdf.
19. Aamodt, R. and Grizzle, W.E. (2007) White paper: Report of the Public Responsibility in Medicine and Research (PRIM&R) Human Tissue/Specimen Working Group, pp. 1–95.
20. Huang, J., Qi, R., Quackenbush, J., Dauway, E., Lazaridis, E. and Yeatman, T. (2001) Effects of ischemia on gene expression. J. Surg. Res. 99, 222–227.
21. Dash, A., Maine, I.P., Varambally, S., Shen, R., Chinnaiyan, M. and Rubin, M.A. (2002) Changes in differential gene expression because of warm ischemia time of radical prostatectomy specimens. Am. J. Pathol. 161(5), 1743–1748.
22. Spruessel, A., Steimann, G., Jung, M., Lee, S.A., Carr, T., Fentz, A.-J., Spangenberg, J., Zornig, C., Juhl, H.H. and David, K.A. (2004) Tissue ischemia time affects gene and protein expression patterns within minutes following surgical tumor excision. Biotechniques 36(6), 1030–1037.
23. Baker, A.F., Dragovich, T., Ihle, N.T., William, R., Fernogilio-Preiser, C. and Powis, G. (2005) Stability of phosphoprotein as a biological marker of tumor signaling. Clin. Cancer Res. 11(12), 4338–4340.
24. Ayala, G., Thompson, T., Yang, G., Frolov, A., Li, R., Scardino, P., Ohori, M., Wheeler, T. and Harper, W. (2004) High levels of phosphorylated form of Akt-1 in prostate cancer and non-neoplastic prostate tissues are strong predictors of biochemical recurrence. Clin. Cancer Res. 10, 6572–6578.
25. Billings, P.E. and Grizzle, W.E. (2007) The gross room. In Theory and Practice of Histology Techniques, 6th Edition, (J. Bancroft and M. Gamble, eds.) Churchill Livingstone: Edinburgh, pp. 75–82.
26. McLerran, D., Grizzle, W.E., Feng, Z., Bigbee, W.L., Banez, L.L., Cazares, L.H., Chan, D.W., Diaz, J., Izbicka, E., Kagan, J., Malehorn, D.E., Malik, G., Oelschlager, D., Partin, A., Randolph, T., Rozenzweig, N., Srivastava, S., Srivastava, S., Thompson, I.M., Thornquist, M., Troyer, D., Yasui, Y., Zhang, Z., Zhu, L. and Semmes, O.J. (2008) Analytical validation of serum proteomic profiling for diagnosis of prostate cancer; sources of sample bias. Clin. Chem. 54(1), 44–52.
27. McLerran, D., Grizzle, W.E., Feng, Z., Thompson, I.M., Bigbee, W.J., Cazares, L.H., Chan, D.W., Dahlgren, J., Diaz, J., Kagan, J., Lin, D., Malik, G., Oelschlager, D., Partin, A., Randolph, T., Sokoll, L., Srivastava, S., Srivastava, S., Thornquist, M., Troyer, D., Wright, G.L., Zhang, Z., Zhu, L. and Semmes, O.J. (2008) SELDI-TOF MS whole serum proteomic profiling with IMAC surface does not reliably detect prostate cancer. Clin. Chem. 54(1), 53–60.
28. Edgerton, M.E., Morrison, C., LiVolsi, V.A., Moskaluk, C.A., Qualman, S.J., Washington, M.K. and Grizzle, W.E. (2008) A standards based ontological approach to information handling for use by organizations providing human tissue for research. Cancer Inform. 6, 127–137.

Chapter 2

Manual Microdissection

Glen Kristiansen

Summary

The new opportunities of modern assays of molecular biology can only be exploited fully if the results can be accurately correlated to the tissue phenotype under investigation. This is a general problem of non-in situ techniques, whereas results from in situ techniques are often difficult to quantitate. The use of bulk tissue, which is not precisely characterized in terms of histology, has long been the basis for molecular analysis. It has, however, become apparent that this simple approach is not sufficient for a detailed analysis of molecular alterations, which might be restricted to a specific tissue phenotype (e.g., tumor or healthy tissue, stromal or epithelial cells). Microdissection is a method to provide minute amounts of histologically characterized tissues for molecular analysis with non-in situ techniques and has become an indispensable research tool. Here, we describe a very simple technique for microdissection of tissues that can be established easily and cost-effectively.

Key words: Microdissection, Manual, Tissue

1. Introduction

In contrast to cell culture material, which is fairly homogeneous, organic tissues have a far more complex architecture. Moreover, they are composed of fundamentally different cell types, e.g., epithelial cells, connective tissue, vessels, and various inflammatory cells. The quality of a tissue lysate for molecular analysis clearly depends on the tissue composition and the percentage of cells of interest in the total lysate. This is all the more important because the final result (e.g., an expression profile or a protein concentration) cannot be clearly correlated to a tissue compartment.

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_2, © Humana Press, a part of Springer Science + Business Media, LLC 2010


The only way to solve this problem of expression–phenotype correlation is to sort the cells after characterization into various compartments and to analyze them separately. Doing so in a completely blinded manner, e.g., using tissue separation, antibody incubation, and fluorescence-activated cell sorting (FACS), has been largely unsuccessful and delegates the work of identifying and separating morphological structures back to the pathologists. Several microdissection platforms have been established so far (1–9), and the published use of microdissection has increased enormously in the last decade, as Fig. 1 illustrates. However, separating single cells from tissues by laser-based microdissection is enormously time consuming and can slow down projects markedly. In addition, in many instances in cancer research, the isolation of highly specific cell types is not necessary or is even counterproductive. Often, the analysis of tissue that can be assigned a single histological diagnosis (healthy tissue, tumor tissue of a certain type or grade) is sufficient. In the past, these analyses have often been done using grossly inspected and dissected tissues, which is not sufficient for a detailed analysis. A compromise between grossly dissected bulk tissue and meticulous laser microdissection is manual microdissection. This simple technique allows tissue compartments as small as 1 mm² to be histologically characterized and procured and is, in our hands, much faster than the far more laborious laser microdissection. We have successfully used manual microdissection for the analysis of breast cancer and prostate cancer, which is a particularly heterogeneous neoplasm, characterized by the coexistence of morphologically diverse tumor growth patterns that are mirrored in the Gleason grading system (6). We found the following protocol very helpful, as simple as it is.

Fig. 1. Number of PubMed-listed publications found by the search term “microdissection” between 1985 and 2007. A steep increase can be seen in the late 1990s; a plateau of approximately 488 papers was reached in 2005. The first year in which the number of published microdissection projects decreased was 2006, a trend that continued in 2007.

2. Materials

2.1. Tissue Collection

1. When dealing with fresh tissues, immediate cryoconservation is crucial and should ideally take place in the operating theatre immediately after removal of the tissue. Because this can be difficult to organize, the tissue can alternatively be transferred in a plastic bag on water ice from the operating theatre to the frozen section laboratory of the pathology department for further processing.
2. After gross dissection and sectioning of the organ/tissue by the pathologist, a 5-mm slice of the region of interest, e.g., a grossly tumor-suspicious area, is placed between two conventional glass slides and immediately immersed in liquid nitrogen for 1–2 min. Because glass slides often break off the frozen tissue, which is inconvenient, reusable metal slides can be used as an alternative; these also have the advantage of freezing the tissue even faster.
3. Wrapped in labeled aluminium foil, these frozen slices can be stored at −80°C for years.

2.2. Sectioning and Staining

1. Mount the deep-frozen tissue slice on the specimen holder of the cryotome; either sterile saline or conventional OCT embedding medium can be used (Fig. 2, center).
2. Cut serial sections; the thickness can be modified according to the research question. For RNA extraction, we usually cut 30 sections at 12 µm, including thinner control sections at 4 µm after the 1st, the 10th, the 20th, and the 30th section for hematoxylin and eosin (H&E) staining. For DNA extraction, thinner sections (3–6 µm), which yield a better morphology, can be sufficient.
3. Mount the sections on sterile glass slides (kept at room temperature so that they easily take up the frozen tissue), then leave to air-dry in the cryotome chamber (Fig. 2, left-hand side).
4. Process the thin sections for H&E staining as usual (hematoxylin staining, eosin staining, gradual dehydration, cover slipping).


Fig. 2. View into the freezing chamber of a cryotome. In the center, the mounted tissue can be seen. Next to it, on the left, is a glass bench with the sections mounted on glass slides.

5. Sort the sections for microdissection into a glass staining bench prior to staining.
6. Stain with hematoxylin for 1–2 min at 4°C, then immerse in cold (!) tap water (stored in the refrigerator) for another 2 min. This will yield a mild nuclear staining for microscopy (see Note 1).
7. Drain off any superfluous water, and store the sections in a specimen box at −80°C until microdissection.
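The serial sectioning scheme in step 2 can be written down as a simple cutting plan. The following is a minimal sketch, assuming that the four 4-µm control sections are cut in addition to the 30 thick sections (the text can also be read as counting them among the 30); the function name and labels are illustrative only.

```python
def cutting_plan(n_sections=30, thick_um=12, control_um=4,
                 controls_after=(1, 10, 20, 30)):
    """Return the ordered cuts as (label, thickness in µm) tuples.

    Assumption: each H&E control section is an extra cut made directly
    after the listed thick section.
    """
    plan = []
    for i in range(1, n_sections + 1):
        plan.append((f"section {i}", thick_um))
        if i in controls_after:
            plan.append((f"H&E control after section {i}", control_um))
    return plan


plan = cutting_plan()
total_depth_um = sum(thickness for _, thickness in plan)
print(f"{len(plan)} cuts, {total_depth_um} µm of tissue consumed")
```

Such a plan makes it easy to check how much of a 5-mm tissue slice one series consumes and which control slide brackets which microdissection sections.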

3. Method

1. On the H&E-stained control sections, mark appropriate regions of interest, for example healthy tissue, tumor tissue, hyperplastic tissue, etc., with a water-resistant pen. This is illustrated in Fig. 3; on the left-hand side, an H&E-stained slide with markings is seen. On the right-hand side, a serial section slide is shown after microdissection of the respective areas.
2. The most important prerequisite for manual microdissection is a person with a calm and skilled hand to retrieve the marked areas from the frozen tissue slides using conventional sterile injection needles. This is also best accomplished in a secluded laboratory to ensure a concentrated workflow. Technically, we found an inexpensive binocular microscope (Fig. 4) of the kind commonly used in biological laboratories, with a scanning magnification of 10×–60×, comfortable to work with.
3. Put the box with the deep-frozen hematoxylin-stained tissue sections on water ice next to the microscope. The typical work setting is shown in Fig. 5.
4. Compare every specimen under the microscope with the marked control slide and identify the respective areas of interest.
5. Retrieve these tissue areas using sterile injection needles, using one needle per area to avoid contamination (Fig. 6).
6. The retrieved tissue has to be transferred immediately into labeled tubes filled with lysis buffer. Once in lysis buffer, RNases should be inhibited (Fig. 7).
7. Store at −80°C or process lysates as appropriate for your assay.

Fig. 3. Comparison of the H&E-stained control slide (left) with an adjacent tissue section (right) stained with hematoxylin alone after microdissection. The marked areas of the left slide have been removed on the right slide.

Fig. 4. An older version of a binocular microscope (from GDR times). Although fairly simple, this model allows a continuous and sufficient (up to 100×) magnification for tissue identification and offers ample space for micromanipulation.

Fig. 5. Typical set-up for microdissection: microscope, slides, and sample tubes on ice.


Fig. 6. The process of manual microdissection: one hand holds the glass slide, and the other hand retrieves the regions of interest with an injection needle from the slide. Time is critical, because, after thawing, the slide will dry within minutes.

Fig. 7. Labeled tubes for pooling the microdissected areas.

4. Notes

1. We were astonished to learn that even the use of unsterile tap water (for blueing the sections) did not markedly diminish RNA quality. We also tried DEPC-pretreated sterilized solutions for all of these staining steps, but because this did not significantly improve RNA integrity, we kept our protocol simple.

Acknowledgments

I am grateful to Britta Beyer and Eva Polzin for sectioning and performing the microdissection in my lab. Thanks also to Christoph Weber for excellent photography, Florian R. Fritzsche for proofreading, and Alfred E. Neumann for fruitful discussions.

References

1. Montironi, R., Mazzucchelli, R., Scarpelli, M. (2003) Molecular techniques and prostate cancer diagnostic. Eur Urol. 4, 390–400.
2. Gillespie, J.W., Ahram, M., Best, C.J., Swalwell, J.I., Krizman, D.B., Petricoin, E.F., Liotta, L.A., Emmert-Buck, M.R. (2001) The role of tissue microdissection in cancer research. Cancer J. 7, 32–39.
3. Xu, L.L., Stackhouse, B.G., Florence, K., Zhang, W., Shanmugam, N., Sesterhenn, I.A., Zou, Z., Srikantan, V., Augustus, M., Roschke, V., Carter, K., McLeod, D.G., Moul, J.W., Soppett, D., Srivastava, S. (2000) PSGR, a novel prostate-specific gene with homology to a G protein-coupled receptor, is overexpressed in prostate cancer. Cancer Res. 60, 6568–6572.
4. Dahl, E., Kristiansen, G., Gottlob, K., Klaman, I., Ebner, E., Hinzmann, B. et al. (2006) Molecular profiling of laser-microdissected matched tumor and normal breast tissue identifies karyopherin alpha2 as a potential novel prognostic marker in breast cancer. Clin Cancer Res. 12, 3950–3960.
5. Grützmann, R., Foerder, M., Alldinger, I., Staub, E., Brümmendorf, T., Röpcke, S. (2003) Gene expression profiles of microdissected pancreatic ductal adenocarcinoma. Virchows Arch. 443, 508–517.
6. Kristiansen, G., Pilarsky, C., Wissmann, C., Kaiser, S., Bruemmendorf, T., Roepcke, S. et al. (2005) Expression profiling of microdissected matched prostate cancer samples reveals CD166/MEMD and CD24 as new prognostic markers for patient survival. J Pathol. 205, 359–376.
7. Schütze, K., Pösl, H., Lahr, G. (1998) Laser micromanipulation systems as universal tools in cellular and molecular biology and in medicine. Cell Mol Biol. 44, 735–746.
8. Kölble, K. (2000) The LEICA microdissection system: design and applications. J Mol Med. 78, B24–B25.
9. Stagliano, N.E., Carpino, A.J., Ross, J.S., Donovan, M. (2001) Vascular gene discovery using laser capture microdissection of human blood vessels and quantitative PCR. Ann N Y Acad Sci. 947, 344–349.

Chapter 3

Laser Microdissection

Anja Rabien

Summary

Gene expression analysis requires a sound basis of cell material to obtain realistic results. Tissue, however, consists of diverse types of cells, which often differentially express target genes, so that cell populations need to be selected. If tissue diversity is moderate or negligible, manual microdissection can be the cost-efficient method of choice. In contrast, the advantage of laser microdissection is a very exact selection down to the level of a single cell, but it often takes considerable time to obtain enough material for the subsequent analyses. The latter issue and the method of tissue preparation needed for laser microdissection are the main problems to solve if RNA, which is highly sensitive to degradation, is to be analyzed. This chapter focuses on optimized laser microdissection procedures for RNA analysis, drawing on the very heterogeneous tissue of prostatic adenocarcinoma.

Key words: Laser microdissection, RNA, Degradation, Cryosection, Prostate tissue, Cresyl violet

1. Introduction

Many experiments require the selection of a distinct type of cell, sometimes even single cells or chromosomes, from tissues or cell cultures for analysis of gene expression, protein expression, or other applications. Since the 1990s, laser microdissection systems have been commercially available to precisely collect the material of interest. Contamination-free and very exact selection is the main advantage of laser-based microdissection, whereas manual microdissection is cheaper and often faster but yields more heterogeneous material. Recent research in genomics and proteomics increasingly demands a precise assignment of target expression levels; thus, laser microdissection has become indispensable. The principle of laser microdissection is simple, cutting microscopically selected tissue by laser (ultraviolet [UV] or infrared), but the technology remains complex, and a laser microdissection system is therefore still expensive to acquire. Three main systems are commercially available:

(a) After cutting, cells can be catapulted into a collection tube with the Zeiss microlaser system (http://www.zeiss.com) (1, 2).
(b) Selected cells are transferred onto a film with the Arcturus laser capture microdissection system (http://www.moleculardevices.com) (3–5) or to a special cap by the patented mmi Isolation Cap technology from Molecular Machines & Industries (http://www.molecular-machines.com) (6).
(c) Cut samples fall into the lid of a collection tube, driven by gravity, as provided by Leica Microsystems (http://www.leica-microsystems.com) (7, 8).

A short comparison of the laser microdissection techniques was provided by Murray in 2007 (9), and different applications are given in the series Methods in Molecular Biology (10). We use the Leica laser microdissection system (Fig. 1) to analyze expression of messenger RNA (mRNA) in prostate cancer tissue, a very heterogeneous tissue where epithelium of adenocarcinoma, prostatic intraepithelial neoplasia, healthy epithelium, and atrophic glands are close to each other. Distinction between prostatic epithelium and stroma is also important because genomic as well as proteomic analyses revealed significant expression differences (11, 12). Because analyses of mRNA require permanent protection from degradation during the whole procedure, we optimized our techniques to obtain high-quality mRNA from laser-microdissected tissue.

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_3, © Humana Press, a part of Springer Science + Business Media, LLC 2010

Fig. 1. Laser Microdissection System Leica DMLA. The Leica CTR MIC electronics box is placed slightly distant from the microscope (left side). To the right, you see the Smartmove control in front and the laser cartridge behind. Pictures are transmitted from the camera (on top) to a computer providing the LMD software.

2. Materials

2.1. Preparation of Cryosections

1. Liquid nitrogen, dry ice.
2. 2-Methylbutane (Sigma-Aldrich, Munich, Germany).
3. Membrane glass slides coated with polyethylene naphthalate (PEN) membrane, 2.0 µm (Leica, Wetzlar, Germany).
4. Fully equipped cryostat (Leica).
5. Embedding medium: Jung, Leica OCT Cryocompound (Leica).
6. Superfrost Plus glass slides (Menzel, Braunschweig, Germany).

2.2. Staining Procedure and Storage

1. Cresyl violet acetate (Sigma-Aldrich).
2. Mayer’s acidic Hemalaun solution (Hollborn, Leipzig, Germany).
3. Desiccator (Kartell, Noviglio, Italy).
4. Eosin solution, 1% alcoholic (Waldeck, Division Chrome, Münster, Germany).
5. Absolute ethanol, ACS grade (J.T. Baker, Deventer, Netherlands).
6. Xylol, ACS grade (J.T. Baker).
7. Mounting medium (Eukitt, Sigma-Aldrich).

2.3. Laser Microdissection

1. Laser Microdissection System Leica DMLA (Leica; Fig. 1).
2. Software LMD 5.0 (Leica).
3. PCR tubes, 0.5 ml (Brand, Wertheim, Germany).
4. Lysis buffer: RNeasy Lysis Buffer (Qiagen, Hilden, Germany)/1% β-mercaptoethanol (Serva, Heidelberg, Germany).

3. Methods

3.1. Preparation of Cryosections

1. The whole procedure is to be performed with gloves and with high-purity materials to avoid contamination with RNases.
2. As soon as possible after surgery, a slice of tissue is shock frozen in methylbutane in a bath of liquid nitrogen (see Note 1) and stored at −80°C.
3. Before cryosectioning, PEN glass slides are UV irradiated under a sterile cell culture hood for at least 30 min to inhibit RNase activity (see Note 2).
4. The specimen is allowed to acclimate to approximately −23°C (prostate tissue, see Note 1) in the cryostat for 10–15 min and is embedded in Jung medium on a metallic plate. The tissue is fixed and cut with a microtome blade. Cut slices of 5 µm (see Note 3) are placed onto the PEN membrane of the slides, which are immediately stored in a box on dry ice until ready for staining or storage at −80°C. Control sections to be stained with hemalum/eosin are mounted on Superfrost Plus glass slides.

3.2. Staining Procedure and Storage

1. 1% (w/v) cresyl violet acetate (see Note 4) is dissolved in absolute ethanol at room temperature overnight in a shaker. The solution is filtered before use (pore size, 0.2 µm).
2. Cryosections are air-dried for 1 min on ice (not necessary after storage at −80°C) and fixed in 75% ethanol precooled to −20°C for 2 min. Excess ethanol is wicked off with absorbent paper.
3. The slides are dipped into the 1% cresyl violet acetate solution at room temperature for 20 s. Excess staining solution is wicked off with absorbent paper.
4. The slides are briefly dipped into 75% ethanol before incubation for 30 s in 100% ethanol. Excess ethanol is wicked off with absorbent paper. The tissue is air-dried for 10 min at room temperature.
5. The slides are stored in a desiccator in the dark for at least 90 min before use (see Note 4).
6. Control slides are stained in hemalum solution for 5–10 min, incubated in warm tap water for 5 min, and rinsed with distilled water. After counterstaining with eosin for 2–5 min, the slides are rinsed in tap water and in distilled water. They are dipped into 70% ethanol, 80% ethanol, 96% ethanol, three times in absolute ethanol (can be denatured), twice in xylol, and for at least 5 min in a third bath of xylol. The tissue is mounted with mounting medium (see Note 5) and covered with a 24 mm × 40 mm cover glass. The slides are completely dry after 2 days.
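The concentrations used in this staining procedure translate into amounts to weigh or measure with simple arithmetic. The following is a minimal sketch with hypothetical helper names; the batch volumes (50 ml and 200 ml) are illustrative assumptions and not part of the protocol, and the dilution ignores the slight volume contraction that occurs when ethanol and water are mixed.

```python
def grams_for_w_v(percent_w_v, final_volume_ml):
    """Mass of solute in grams for a % (w/v) solution (1% = 1 g per 100 ml)."""
    return percent_w_v * final_volume_ml / 100.0


def dilute_stock(stock_percent, target_percent, final_volume_ml):
    """Volumes (ml) of stock and diluent, from C1 * V1 = C2 * V2."""
    if not 0 < target_percent <= stock_percent:
        raise ValueError("target must be positive and not exceed the stock")
    stock_ml = target_percent * final_volume_ml / stock_percent
    return stock_ml, final_volume_ml - stock_ml


# Cresyl violet acetate for an assumed 50 ml batch of 1% (w/v) stain
print(grams_for_w_v(1.0, 50.0), "g cresyl violet acetate")

# 75% ethanol prepared from absolute ethanol, assumed 200 ml batch
ethanol_ml, water_ml = dilute_stock(100.0, 75.0, 200.0)
print(ethanol_ml, "ml ethanol +", water_ml, "ml water")
```

The same dilution helper applies to the 70%, 80%, and 96% ethanol baths used for the control slides.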

3.3. Laser Microdissection

The laser microdissection system should be explained and configured for the customer by the technical service personnel of the supplier. In the case of the Leica microdissection system, the electronics box is placed to the left of the microscope (Fig. 1). The system is connected to a computer providing the LMD software.


Fig. 2. Collection device with fourfold holder. From below, an opened collection tube is put through the big hole of a holder. After pushing the cap into its retainers, the tube is fixed underneath. The holder is placed in the collection device. Note covering of the “No Cap” position in the middle with white stickers (see Note 7).

1. The Leica CTR MIC electronics box is turned on before starting the computer. The laser cartridge has a separate switch (Fig. 1).
2. A collection tube is fixed in the tube holder (Fig. 2). The cap of the collection tube is filled with 70 µl lysis buffer/1% β-mercaptoethanol (see Note 6). When inserted (beginning obliquely from the right or the left side), the collection device (see Note 7) must snap into the mounting brackets.
3. The slide, thoroughly dried in the desiccator, is clamped in the specimen holder with the PEN membrane facing down (Fig. 3).
4. Using the LMD software, the holder with the collection tube can be chosen. Magnification, lamp brightness, movement, and focusing are adjusted with the Smartmove control (Fig. 1).
5. The laser is calibrated (“Laser,” “Calibrate”), utilizing an area of the PEN membrane without tissue.
6. The cells to be excised are selected on the monitor by drawing a line around them and cutting (“Draw + Cut,” Fig. 4, see Note 8). The line can be closed automatically (“Close Line”). Several pieces can be marked and are cut consecutively (“Multiple Shapes,” Fig. 4). Bridges of tissue are eliminated by cutting while drawing (“Move + Cut”). The corresponding hemalum/eosin-stained slide is used to better discriminate between different types of morphology.
7. The correct location of the pieces of tissue should be checked (Fig. 5); to do so, switch from “Specimen” to “Collector.”
8. Areas and object numbers (see Note 8) from one run of cuts are listed and should be exported, e.g., to Microsoft Office Excel, because the list must be cleared to avoid repeated cutting of the same areas.
9. To remove the slide and the collection device (see Note 9), click “Unload” so that the cap of the tube is protected from being hit.
10. The tube is carefully removed from the holder as follows: Detach the tube, then inch out the lid (do not turn!) and put the tube onto the cap for closing. Until use, tubes should be stored at −80°C.

Fig. 3. Close-up view of the microdissection stage. The slide is fixed with a cleat upside down in the specimen holder. Underneath, a cover plate occludes the motorized collection device. A UV stray light shield around the holder of the lenses protects from laser radiation.


Fig. 4. Cut series of prostate tissue. (a) A 200-fold magnification is used to select epithelium of prostatic adenocarcinoma. Cuts are shown to the right. (b) Further glands are selected by drawing (“Multiple Shape”). (c) The selected tissue is cut. Note that the laser beam burns a considerable band around the selection. (d) The marked lines are removed by clearing the list of cut areas.

Fig. 5. Cap control. Cut tissue can be seen floating in the buffer that fills the cap (100-fold magnification).

4. Notes

1. For mRNA analysis, we recommend cryoconservation instead of formalin fixation and paraffin embedding of the tissue, because degradation of RNA is much more likely using the latter method. The tissue can also be shock frozen in liquid nitrogen, but methylbutane in a bath of liquid nitrogen (−160°C) preserves it better. For cryosectioning, the temperature of the cryostat should be optimized according to the instrument and to the type of tissue, sometimes also from sample to sample.
2. As an alternative to PEN membrane slides, polyethylene terephthalate (PET)-coated slides can be used, but they need a metallic frame (Leica). UV-irradiated membrane slides can be stored dust-free and dry in the dark for up to 1 week.


3. Per patient, we used five sections of only 5 µm, because prostate tissue proved to be difficult to cut due to inserts and consistency. Nevertheless, sections of up to 20 µm can be cut with the Leica microdissection system.
4. In comparison with methyl green staining (DAKO, Hamburg, Germany) and hemalum staining (Hollborn), cresyl violet staining resulted in a better RNA quality measured with a 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). Stained cryosections can be stored dust-free and dry (in the desiccator) in the dark for up to 1 week.
5. The hemalum/eosin controls can be mounted with an organic mounting medium; alternatives to Eukitt are, e.g., Vitro-Clud (R. Langenbrinck, Emmendingen, Germany) or Entellan (Merck, Darmstadt, Germany).
6. RNeasy Lysis Buffer (Qiagen) with 1% β-mercaptoethanol can be stored for up to 4 weeks.
7. To avoid light effects, the “No Cap” position of the collection device can be masked translucently (e.g., with white stickers, Fig. 2), and the eyepieces of the microscope can be covered.
8. Adjust the laser (“Laser Control”) for the magnification you want to cut with to obtain a comfortable speed and beam, avoiding bridges but saving tissue. The laser beam burns a considerable band of the tissue (Fig. 4). For a first general survey, we used the 2.5× lens, but excision was optimal with the 20× lens to discriminate within the heterogeneous prostatic tissue. Information regarding the number of cut cells (objects) is available if the diameter of an object is given under “Settings” and “Object Counting.” For our prostate tissue, we calculated 20 µm per cell. Program settings for a distinct type of tissue can be saved and restored in the next session (“Restore Application Configuration”). Images of the tissue can be saved in a database (IM500).
9. We recommend not exceeding 1–1.5 h of cutting for one slide, depending on air humidity, to keep the tissue dry. To collect epithelium of the heterogeneous prostate tissue, we need at least five tubes of each type of tissue. A single cell should average 10–15 pg of total RNA (with 1–3% mRNA).
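The figures in Notes 8 and 9 allow a rough estimate of how much RNA a microdissection run might yield. The following is a minimal sketch, not the LMD software's own calculation: it assumes that the object count is the cut area divided by the area of a circle with the stated object diameter (20 µm), and the cut area used in the example is a hypothetical value.

```python
import math


def objects_from_area(cut_area_um2, object_diameter_um=20.0):
    """Estimate the cell (object) count, treating each cell as a circle.

    Assumption: count = cut area / area of one circular object.
    """
    area_per_object = math.pi * (object_diameter_um / 2.0) ** 2
    return cut_area_um2 / area_per_object


def rna_yield_pg(n_cells, pg_per_cell=(10.0, 15.0), mrna_fraction=(0.01, 0.03)):
    """Return (low, high) ranges for total RNA and mRNA in picograms."""
    total = (n_cells * pg_per_cell[0], n_cells * pg_per_cell[1])
    mrna = (total[0] * mrna_fraction[0], total[1] * mrna_fraction[1])
    return {"total_rna_pg": total, "mrna_pg": mrna}


cells = objects_from_area(cut_area_um2=500_000)  # hypothetical exported area
estimate = rna_yield_pg(cells)
print(round(cells), "cells;", estimate)
```

Summing the exported areas of several runs and feeding the total into such an estimate gives a quick check of whether enough tubes have been collected for the planned analysis.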

Acknowledgments

The author is very grateful to Cornelia Stelzer and Sabine Becker for technical assistance and photography.


References

1. Micke, P., Ostman, A., Lundeberg, J., and Ponten, F. (2005) Laser-assisted cell microdissection using the PALM system. Methods Mol. Biol. 293, 151–166.
2. Schutze, K., Niyaz, Y., Stich, M., and Buchstaller, A. (2007) Noncontact laser microdissection and catapulting for pure sample capture. Methods Cell. Biol. 82, 649–673.
3. Espina, V., Milia, J., Wu, G., Cowherd, S., and Liotta, L.A. (2006) Laser capture microdissection. Methods Mol. Biol. 319, 213–229.
4. Espina, V., Wulfkuhle, J.D., Calvert, V.S., VanMeter, A., Zhou, W., Coukos, G., et al. (2006) Laser-capture microdissection. Nat. Protoc. 1, 586–603.
5. Espina, V., Heiby, M., Pierobon, M., and Liotta, L.A. (2007) Laser capture microdissection technology. Expert Rev. Mol. Diagn. 7, 647–657.
6. Anslinger, K., Mack, B., Bayer, B., Rolf, B., and Eisenmenger, W. (2005) Digoxigenin labelling and laser capture microdissection of male cells. Int. J. Legal Med. 119, 374–377.
7. Kolble, K. (2000) The LEICA microdissection system: design and applications. J. Mol. Med. 78, B24–B25.
8. Vega, C.J. (2008) Laser microdissection sample preparation for RNA analyses. Methods Mol. Biol. 414, 241–252.
9. Murray, G.I. (2007) An overview of laser microdissection technologies. Acta Histochem. 109, 171–176.
10. Walker, J.M. (ed.) (2003, 2005–2008) Methods in Molecular Biology. Humana, Totowa, NJ.
11. Ornstein, D.K., Gillespie, J.W., Paweletz, C.P., Duray, P.H., Herring, J., Vocke, C.D., et al. (2000) Proteomic analysis of laser capture microdissected human prostate cancer and in vitro prostate cell lines. Electrophoresis 21, 2235–2242.
12. Richardson, A.M., Woodson, K., Wang, Y., Rodriguez-Canales, J., Erickson, H.S., Tangrea, M.A., et al. (2007) Global expression analysis of prostate cancer-associated stroma and epithelia. Diagn. Mol. Pathol. 16, 189–197.

Chapter 4

Tissue Microarrays

Ana-Maria Dancau, Ronald Simon, Martina Mirlacher, and Guido Sauter

Summary

Modern array technologies allow for the simultaneous screening of virtually all human genes on the DNA and RNA level. Studies using such techniques have led to the identification of hundreds of genes with a potential role in cancer or other diseases. The validation of all of these candidate genes requires in situ analysis of high numbers of clinical tissue samples. The tissue microarray (TMA) technology greatly facilitates such analysis. In this method, minute tissue samples (0.6 mm in diameter) from up to 1,000 different tissues can be analyzed on one microscope glass slide. All in situ methods suitable for histological studies can be applied to TMAs without major changes of protocols, including immunohistochemistry, fluorescence in situ hybridization, or RNA in situ hybridization. Because all tissues are analyzed simultaneously with the same batch of reagents, TMA studies provide an unprecedented degree of standardization, speed, and cost efficiency.

Key words: TMA, Tissue microarrays, High-throughput in situ analysis, IHC, Immunohistochemistry, FISH, Fluorescence in situ hybridization, Translational research

1. Introduction

The demand for analyses of newly discovered genes in diseased tissues, especially human tumors, has grown massively in recent years. To identify the most significant genes among all of the emerging candidate cancer genes, it is desirable to analyze many genes in a significant number of well-characterized tumors. Hundreds of tumors must often be analyzed per gene to generate statistically meaningful results. This leads to a massive workload in the laboratories involved. Moreover, traditional analysis of multiple genes results in a critical loss of precious tissue material because the number of conventional tissue sections that can be taken from a tumor block usually does not exceed 200–300.

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_4, © Humana Press, a part of Springer Science + Business Media, LLC 2010


The tissue microarray (TMA) technology significantly facilitates and accelerates tissue analyses by in situ technologies (1, 2). When this technology was developed in 1997, the term microarray was generally used for small structures organized in an array-like fashion. However, with the advent of DNA-array technologies, such as complementary DNA (cDNA) arrays or oligonucleotide arrays, the term microarray has increasingly come to denote an array made from homogeneous spots that have been placed on a glass surface by automated arraying machines, i.e., array spotters. It is noteworthy that the TMA technology is substantially different from such “spotted” arrays and represents miniaturized pathology that requires a pathologist’s skills for analysis in the first place. In this method, minute tissue cylinders (diameter: 0.6 mm) are removed from hundreds of different primary tumor blocks and subsequently brought into one empty “recipient” paraffin block. Sections from such array blocks can then be used for simultaneous in situ analysis of hundreds to thousands of primary tumors on the DNA, RNA, and protein level. The cylindrical shape and the small diameter of the specimen taken out of the donor block maximize the number of samples that can be taken out of one donor block and minimize the tissue damage inflicted on it. Studies have shown that tissue samples with a diameter of as little as 0.6 mm allow a reliable analysis and yield representative data for research and possibly also for diagnostic purposes (3). The possibility of using such small tissue cores is important for pathologists because they can now give researchers access to their material and at the same time retain their tissue blocks. Punched tissue blocks remain fully interpretable for all morphological and molecular analyses that may subsequently become necessary, provided that the number of punches is reasonably selected. Dozens of punches can be taken from one tumor without compromising interpretability.

Only a few tissues require the use of larger tissue spots per tumor. These include several normal tissues such as blood vessels, where it is necessary to have the entire wall arrayed, or normal tissues that have important structures dispersed within the tissue, such as glomeruli in kidney or Langerhans islets in pancreas. Inexperienced pathologists sometimes also require core diameters of >0.6 mm for cancers. However, this results in an unnecessary waste of tissue, additional study costs, and increased workload during interpretation. Virtually all tissues are suitable to be placed into a TMA. Therefore, the range of TMA applications is very broad. One of the most distinct advantages of TMAs is that one set of tissues (which has been reviewed by one pathologist) with available clinical data can now be used for an almost unlimited number of studies. The TMA technique is not limited at all to cancer research, although this still is the predominant application.
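The tissue cost of a larger core follows from the geometry of the punch: the area removed from the donor block grows with the square of the core diameter. The following is a minimal sketch of that arithmetic; the alternative diameters in the example are hypothetical values chosen for illustration and are not recommendations from the text.

```python
import math


def core_area_mm2(diameter_mm):
    """Cross-sectional area of a cylindrical tissue core in mm²."""
    return math.pi * (diameter_mm / 2.0) ** 2


def relative_tissue_use(diameter_mm, reference_mm=0.6):
    """Donor tissue removed per punch relative to a 0.6-mm core."""
    return (diameter_mm / reference_mm) ** 2


for d in (0.6, 1.0, 2.0):  # 1.0 and 2.0 mm are assumed comparison values
    print(f"{d} mm core: {core_area_mm2(d):.2f} mm², "
          f"{relative_tissue_use(d):.1f}x the tissue of a 0.6 mm core")
```

Doubling the diameter therefore quadruples the tissue removed per punch, which is why larger cores reduce the number of punches a donor block can supply.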


The normal expression pattern of gene products can optimally be tested on TMAs containing all kinds of healthy tissues. As with patient tissues, TMAs can be used for cell lines and other experimental tissues such as xenograft tumors or tissues from animal models (4, 5). There are also applications for TMAs in diagnostic molecular pathology. Here, TMAs can be used as positive control sections or for the inexpensive high-throughput testing of predictive markers such as HER2 overexpression/amplification (Lipp, etc.).

2. Materials 2.1. Sample Collection

1. Standard routine histology microscope for review of tissue sections. 2. Colored pens to mark representative areas on the slides, e.g., red for tumor, blue for healthy, and black for premalignant lesions. 3. Sufficient working space, especially for large-scale projects that require extensive sorting of thousands of sections and blocks.

2.2. Preparing Recipient Blocks

1. PEEL-A-WAY Embedding Paraffin Pellets, melting point: 56–58°C (Polysciences Inc., PA, USA). 2. Slotted processing/embedding cassettes for routine histology, e.g., EMS cat. #70070 (Electron Microscopy Sciences, Inc., PA, USA). 3. Stainless steel base molds for processing/embedding systems, e.g., EMS cat. #62510–30 (Electron Microscopy Sciences, Inc.). 4. Filter/filter papers. 5. Oven for paraffin melting (70°C).

2.3. TMA Making

1. Premanufactured empty paraffin recipient blocks. 2. Illuminated magnifying lenses and supplies (e.g., Luxo U wave II/70, cat. #27950, Luxo, Inc., Switzerland) (optional).

2.4. TMA Sectioning

1. Standard routine histology microtome and supplies (e.g., Leica SM2400, Leica Microsystems Inc., IL, USA). 2. Slide label printer (e.g., DAKO Seymour glass slide labeling system, product code S3416; DAKO A/S, Denmark) or special slide marker (e.g., Securline Marker II, Precision Dynamics Corporation, CA, USA). 3. Boxes for slide storage. 4. Refrigerator for slide storage.


Dancau et al.

5. Paraffin Sectioning Aid-System (Instrumedics Inc., NJ, USA; cat. #PSA) containing Ultraviolet Curing Lamp, Adhesive-Coated PSA Slides, TPC Solvent, TPC Solvent can, Hand roller, and Tape windows (optional).
2.5. TMAs from Frozen Tissues

1. OCT Tissue-Tek compound embedding medium (Sakura BV, The Netherlands). 2. Dry ice to keep punching needles and recipient block in optimally cooled condition. 3. Freezer for frozen tissue storage (−70°C).

3. Methods 3.1. TMA Manufacturing 3.1.1. Sample Collection

Although a device is needed to manufacture TMAs, it must be understood that most of the work (~95%) is traditional pathology work that cannot be accelerated by improved (i.e., automated) tissue arrayers. This preparatory work is similar to what is needed for traditional studies involving “large” tissue sections. The major difference is the number of tissues involved, which can be an order of magnitude higher in TMA studies than in traditional projects. The different tasks related to sample collection are described below:
1. Exactly define the TMA that is to be made (see Note 1). Include healthy tissues of the organ of interest and of a selection of other organs as well.
2. Generate a list of potentially suitable tissues.
3. Collect all slides from these tumors from the archive.
4. One pathologist must review all sections from all candidate specimens to select the optimal slide. If possible, tumors should be reclassified at that stage according to current classification schemes, and tissue areas suited for subsequent punching should be marked (see Note 2). Different colors are recommended for marking different areas on one section (for example, red for tumor, black for carcinoma in situ, blue for healthy tissue). Collect the tissue blocks that correspond to the selected slides.
5. These blocks and their corresponding marked slides must be matched and sorted in order of their appearance on the TMA.
6. Define the structure (outline) of the TMA and compose a file that contains the identification numbers of the tissues together with their locations and real coordinates (as they need to be selected on the arraying device). A distance of 0.2 mm between the individual samples is recommended. To facilitate navigation on the TMA, we recommend arranging the tissues in multiple sections (e.g., quadrants). The distance between the quadrants may be 0.8 mm (see Note 3). In most laboratories, capital letters define quadrants, whereas lowercase letters and numbers define the coordinates within these quadrants. Examples of a TMA structure (outline) and data file containing the necessary information for making a TMA are given in Fig. 1 and Table 1.
Fig. 1. TMA outline example. The division of the TMA into four subsections facilitates the navigation during microscopy.
3.1.2. Preparing Recipient Blocks

In contrast to normal paraffin blocks, tissue microarray blocks are cut at room temperature (see Note 4). 1. The paraffin is melted at 60°C, filtered, and poured into a stainless steel mold. 2. A slotted plastic embedding cassette (as used in every histology lab) is then placed on the top of the warm paraffin. 3. Recipient paraffin blocks are then cooled down for 2 h at room temperature and for an additional 2 h at 4°C. Blocks are then


Table 1
Example file for TMA construction

Location  Coordinates   Location  Coordinates   Location  Coordinates
A 1a      0/0           A 2a      0/800         A 3a      0/1,600
A 1b      800/0         A 2b      800/800       A 3b      800/1,600
A 1c      1,600/0       A 2c      1,600/800     A 3c      1,600/1,600
A 1d      2,400/0       A 2d      2,400/800     A 3d      2,400/1,600
A 1e      3,200/0       A 2e      3,200/800     A 3e      3,200/1,600
A 1f      4,000/0       A 2f      4,000/800     A 3f      4,000/1,600
A 1g      4,800/0       A 2g      4,800/800     A 3g      4,800/1,600
A 1h      5,600/0       A 2h      5,600/800     A 3h      5,600/1,600
A 1i      6,400/0       A 2i      6,400/800     A 3i      6,400/1,600
A 1k      7,200/0       A 2k      7,200/800     A 3k      7,200/1,600
A 1l      8,000/0       A 2l      8,000/800     A 3l      8,000/1,600
A 1m      8,800/0       A 2m      8,800/800     A 3m      8,800/1,600
A 1n      9,600/0       A 2n      9,600/800     A 3n      9,600/1,600
A 1o      10,400/0      A 2o      10,400/800    A 3o      10,400/1,600
A 1p      11,200/0      A 2p      11,200/800    A 3p      11,200/1,600
A 1q      12,000/0      A 2q      12,000/800    A 3q      12,000/1,600
A 1r      12,800/0      A 2r      12,800/800    A 3r      12,800/1,600
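A construction file like the one in Table 1 can be generated rather than typed by hand. The sketch below reproduces the pattern visible in the table under two assumptions that the source does not state explicitly: that the coordinates are in micrometers, and that the 800 step is the 0.6 mm core diameter plus the recommended 0.2 mm spacing. The function and constant names are hypothetical.

```python
import string

CORE_DIAMETER_UM = 600  # 0.6 mm cores (from the text)
GAP_UM = 200            # 0.2 mm recommended distance between samples
PITCH_UM = CORE_DIAMETER_UM + GAP_UM  # 800, the step seen in Table 1

# Table 1 uses the letters a to r but skips "j", giving 17 positions.
LETTERS = [c for c in string.ascii_lowercase[:18] if c != "j"]


def quadrant_layout(quadrant: str, n_numbers: int):
    """Return (location, first coordinate, second coordinate) tuples.

    As in Table 1, the letter drives the first coordinate and the
    number drives the second coordinate.
    """
    rows = []
    for number in range(1, n_numbers + 1):
        for i, letter in enumerate(LETTERS):
            location = f"{quadrant} {number}{letter}"
            rows.append((location, i * PITCH_UM, (number - 1) * PITCH_UM))
    return rows


# Print the first few entries of quadrant A in the "x/y" style of Table 1.
for location, first, second in quadrant_layout("A", 3)[:5]:
    print(f"{location}\t{first}/{second}")
```

Further quadrants would need an additional offset (the text suggests 0.8 mm between quadrants) and, following Note 3, a deliberately asymmetric arrangement; both are left out of this sketch.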

removed from the mold. It is important not to cool down the paraffin on a cooling plate because of the risk of block damage. 4. Quality check of the recipient blocks is important because the blocks must not contain air bubbles. Large recipient blocks (for example, 30 × 45 × 10 mm) are easier to handle than the small blocks (for example, 25–35 × 5 mm) that are typically used in routine histology labs. 3.1.3. TMA Making

Only after all of this preparatory work has been done can a tissue-arraying device be employed. Several tissue-arraying systems are now commercially available, but many groups also use a homemade tissue arrayer. Using manually operated devices, excellent TMAs can be expected only after a significant training period, usually comprising several hundred, if not a few thousand, punches.


A patient and enduring personality as well as keen eyesight are important prerequisites for operators of manual tissue arrayers. Automated tissue arrayers are available, but these devices are expensive and they neither accelerate nor significantly improve TMA making. The TMA manufacturing process consists of three steps that are repeated for each sample placed on the TMA:
1. Generating a hole in an empty (recipient) paraffin block
2. Removing a cylindrical tissue sample from a donor paraffin block
3. Placing the cylindrical tissue sample in the premade hole in the recipient block
Exact positioning of the tip of the tissue cylinder at the level of the recipient block surface is crucial for the quality and the yield of the TMA block. Placing the tissue too deeply into the recipient block results in empty spots in the first sections taken from the TMA block. Positioning the tissue cylinder not deeply enough causes empty spots in the last sections taken from this TMA (see Note 5). As soon as all tissue elements are filled into the recipient block, the block is heated at 40°C for 10 min.
3.1.4. Array Sectioning

Regular microtome sections may be taken from TMA blocks using standard microtomes. However, the more samples a TMA block contains, the more difficult regular cutting becomes. As a consequence, the number of slides of inadequate quality increases with the size of the TMA, and, in turn, fewer sections from the TMA block can effectively be analyzed. Using a tape sectioning kit (Instrumedics) facilitates cutting and leads to highly regular nondistorted sections (ideal for automated analysis). In addition, the tape system may prevent arrayed samples from floating off the slide if very harsh pretreatment methods are used. However, the sticky glued slides have the disadvantage of increased background signals between the tissue spots in immunohistochemistry (IHC) analyses. The tissue samples themselves do not show increased nonspecific background in IHC. The use of the tape sectioning system is described below:
1. An adhesive tape is placed on the TMA block in the microtome immediately before cutting.
2. A 3- to 5-µm section is cut. The tissue slice is now adhering to the tape.
3. The tissue slice is placed on a special “glued” slide (stretching of the tissue in a waterbath or on a heating plate is not necessary).
4. The slide (tissue on the bottom) is then placed under UV light for 35 s. This leads to polymerization of the glue on the slide and on the tape.


5. Slides are placed into TPC solution (Instrumedics) at room temperature for 5–10 s. The tape can then be removed gently from the glass slide. The tissue remains on the slide.
6. Slides are dried at room temperature.
Using the tape system can cause inhomogeneous immunostaining when certain automated immunostainers are used. In our experience, this applies especially to Ventana devices.
3.1.5. TMAs from Frozen Tissues

Fejzo and Slamon reported manufacturing of TMAs from frozen tissues using a commercially available tissue array device (6).
1. Recipient blocks are made from OCT that is frozen down in a Tissue-Tek standard cryomold. The resulting OCT block is mounted on top of a plastic biopsy cassette. As long as the recipient OCT block is sized exactly like a paraffin recipient block (for which the arrayer had been constructed), no modifications of the arrayer are necessary to mount the block.
2. The recipient block must be surrounded with dry ice to prevent melting.
3. Tissue biopsies (diameter, 0.6 mm; height, 4–5 mm) are then punched from OCT-embedded tumor tissues and placed into the recipient OCT array block using a commercial tissue microarrayer. There are some important differences compared with the procedure described for paraffin blocks. It is important that the tissue in the needle is kept frozen during the procedure and that the needle is not damaged (see Note 6). Frozen TMAs often become more irregular and distorted than TMAs from formalin-fixed material. Therefore, a larger space between samples is recommended (e.g., 1 mm).
4. Sections of the whole block that are 4–10 µm thick are cut from the array block. A cryostat microtome (Microm GmbH, Germany) can be used with or without the Basic CryoJane Tape Transfer System and slides (Instrumedics).

3.2. TMA Analysis 3.2.1. General Considerations

TMAs are suited for all types of in situ analysis methods especially for IHC and fluorescence in situ hybridization (FISH). The applicability of RNA in situ hybridization (RNA-ISH) to formalin fixed tissues is more disputed. However, protocols that can be used on large sections will also work on TMAs. Examples of stained TMA sections are shown in Fig. 2. The most significant difference compared with traditional large-section studies is the high level of standardization that can be achieved in TMA experiments. All slides of one TMA study are typically incubated in one set of reagents, assuring identical concentrations, temperatures, and incubation times. Other minor variables that may have an impact on the outcome of in situ analyses, such as the age of a slide (time between sectioning and use) or section thickness are also fully standardized, as long as all tissues of one study are located


Fig. 2. Examples of stained tissue sections. Hematoxylin and eosin (H&E)-stained sections of (a) a TMA from frozen tissue containing 228 tissue spots. Each tissue spot measures 0.6 mm in diameter. Missing samples result from the sectioning/staining process or indicate samples that are already exhausted. Note that the spot-to-spot distance is larger on the frozen TMA as compared with the paraffin TMA. (b) A TMA from formalin-fixed, paraffin-embedded tissues containing 540 tissue spots. (c) A spot showing immunohistochemical analysis using an antibody directed against the Her2/ neu protein in a breast cancer sample. (d) Magnification of an H&E-stained 0.6-mm tissue spot of a breast cancer. (e) RNA in situ hybridization on a frozen TMA made from healthy and malignant kidney tissues. A radioactively labeled oligonucleotide was used as a probe against vimentin mRNA. The black staining intensity indicates the level of mRNA in each tissue spot. (f) FISH analysis of centromere 11 (green signals) and the ESR 1 gene (red spots) in cell nuclei (blue staining) of a tissue spot (630×). The high number of ESR 1 signals indicates a gene amplification.


on the same TMA section. As a result of this unprecedented standardization within each experiment, surprising interassay variations can occur if experiments are repeated under slightly different conditions.
3.2.2. Immunohistochemistry

In general, the same rules apply for IHC analysis on TMAs as on large sections. The small size of the arrayed tissues on a TMA facilitates the staining interpretation because predefined criteria can be applied to a well-defined tissue area. This reduces interobserver variation of IHC interpretation. For many immunohistochemical tumor analyses, the following information can be recorded:
• Percentage of positive cells
• Staining intensity (0, 1+, 2+, 3+)
• Subcellular localization of the staining (membranous, cytoplasmic, nuclear)
• Tissue localization of the staining (tumor cells, stroma, vessels)
For statistical analyses, tumors can be classified into three or four groups based on the percentage of positive cells and the staining intensity. For example:

Negative             No staining
Weak positivity      1+ in 1–100% or 2+ in ≤20% of cells
Moderate positivity  2+ in 21–79% or 3+ in ≤30% of cells
Strong positivity    2+ in ≥80% or 3+ in >30% of cells
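The example grouping above can be written as a small decision function, which is convenient when many hundreds of spots are scored in a spreadsheet. This is a minimal sketch of that one example scheme only; the function name is hypothetical, and the handling of non-integer percentages that fall between the printed ranges (for instance 20.5% or 79.5% at intensity 2+) is an assumption, since the text gives whole-number boundaries.

```python
def classify_ihc(intensity: int, percent_positive: float) -> str:
    """Assign an IHC result to one of the four example groups.

    intensity: staining intensity, 0 to 3 (0, 1+, 2+, 3+)
    percent_positive: percentage of positive cells, 0 to 100
    """
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")

    # No staining at all
    if intensity == 0 or percent_positive == 0:
        return "negative"
    # 1+ staining is weak regardless of the fraction of positive cells
    if intensity == 1:
        return "weak"
    if intensity == 2:
        if percent_positive <= 20:
            return "weak"
        # Assumption: values between the printed ranges (e.g., 79.5%)
        # are assigned to the lower group.
        if percent_positive < 80:
            return "moderate"
        return "strong"
    # intensity == 3
    return "moderate" if percent_positive <= 30 else "strong"


print(classify_ihc(2, 50))  # a 2+ tumor with 50% positive cells
```

Keeping the thresholds in one function makes it easy to rerun the statistics with a different cutoff scheme, because only this function needs to change.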

Some of the arrayed tissues may show falsely negative or inappropriately weak IHC staining intensity due to variations in tissue processing (e.g., fixation medium and time). The large number of tissues included in a TMA will often compensate for this phenomenon, which is also encountered in large-section IHC analyses. At least a fraction of tissue spots yielding false-negative IHC staining results can be identified in control experiments assessing the antigen integrity of the samples, e.g., IHC detection of tissue type-specific antigens like cytokeratins or vimentin. For tissues with a reasonable proliferative activity, Ki67 (MIB1) is an optimal quality control antibody (see Note 7). It is highly recommended to use freshly cut sections for IHC analysis. The time span between sectioning and immunostaining should be less than 2 weeks. Studies have shown that staining intensity decreases significantly with time for many antibodies (7, 8).
3.2.3. FISH

Because biopsies are all treated individually at the time when they are removed, fixed, and subsequently paraffin embedded, one must expect a certain degree of heterogeneity with respect to protein and nucleic acid preservation.


This assumption is best illustrated by the outcome of FISH analyses. Similar to results seen with large-section studies, TMA FISH analyses yield interpretable results in only approximately 60–90% of the analyzed tumors (depending on the quality and size of the FISH probe) at the first attempt. Again, similar to the case with large-section studies, it is possible to achieve interpretability in a fraction of initially non-informative cases by changing experimental conditions. For example, an increased proteinase concentration for slide pretreatment will result in interpretable signals in some initially non-informative cases at the cost of overdigestion of some previously interpretable samples. In general, we do not attempt to improve the fraction of FISH-informative cases by changing experimental conditions. Because of the high number of tumors on our TMAs (usually >500), we prefer to tolerate a fraction of non-interpretable tumors rather than use too many precious TMA sections for additional experiments.
3.2.4. Summary

The TMA methodology is now an established and frequently used tool for tissue analysis. The equipment is affordable and easy to use in places where pathology expertise is available. Basically all kinds of in situ analyses, such as IHC, in situ hybridization, and in situ polymerase chain reaction (PCR) assays may be adapted to TMAs with only slight (if any) modifications of the respective large-section protocols.

4. Notes
1. Often TMA users realize that one critical control tissue has been forgotten only after completion of the TMA block.
2. It is advisable to have a freshly hematoxylin and eosin (H&E)-stained section if the actual block surface is not well reflected on the available stained section.
3. For unequivocal identification of individual samples on TMA slides, it is important to avoid a fully symmetrical TMA structure.
4. Therefore, a special type of paraffin is needed with a melting temperature between 55°C and 58°C (“Peel-A-Way” paraffin, see Subheading 2).
5. However, a location of the tissue cylinder that is too superficial is less problematic than a position that is too deep, because protruding tissue elements can, to some extent, be leveled out after finishing the punching process. The use of a magnifying lens facilitates precise deposition of samples, especially for beginners. With the use of a glass slide, protruding tissue cylinders are then gently pressed deeper into the warmed TMA block.
6. This can be done by precooling the needle with a piece of dry ice before punching and while dispensing the tissue core into the recipient block. Needles may easily bend or break. To prevent needle breakage, coring must be performed slowly with minimal pressure.
7. MIB1, which leads to strong staining in all mitoses, is often falsely negative in suboptimally processed tissues.

References
1. Kononen, J., Bubendorf, L., Kallioniemi, A., Barlund, M., Schraml, P., Leighton, S. et al. (1998) Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 4:844–7.
2. Bubendorf, L., Kononen, J., Koivisto, P., Schraml, P., Moch, H., Gasser, T. C. et al. (1999) Survey of gene amplifications during prostate cancer progression by high-throughput fluorescence in situ hybridization on tissue microarrays. Cancer Res 59:803–6.
3. Hoos, A. and Cordon-Cardo, C. (2001) Tissue microarray profiling of cancer specimens and cell lines: opportunities and limitations. Lab Invest 81:1331–8.
4. Simon, R., Struckmann, K., Schraml, P., Wagner, U., Forster, T., Moch, H. et al. (2002) Amplification pattern of 12q13-q15 genes (MDM2, CDK4, GLI) in urinary bladder cancer. Oncogene 21:2476–83.
5. Abbott, R. T., Tripp, S., Perkins, S. L., Elenitoba-Johnson, K. S. and Lim, M. S. (2003) Analysis of the PI-3-Kinase-PTEN-AKT pathway in human lymphoma and leukemia using a cell line microarray. Mod Pathol 16:607–12.
6. Fejzo, M. S. and Slamon, D. J. (2001) Frozen tumor tissue microarray technology for analysis of tumor RNA, DNA, and proteins. Am J Pathol 159:1645–50.
7. Bertheau, P., Cazals-Hatem, D., Meignin, V., de Roquancourt, A., Vérola, O., Lesourd, A. et al. (1998) Variability of immunohistochemical reactivity on stored paraffin slides. J Clin Pathol 51:370–4.
8. Jacobs, T. W., Prioleau, J. E., Stillman, I. E. and Schnitt, S. J. (1996) Loss of tumor marker immunostaining intensity on stored paraffin slides of breast cancer. J Natl Cancer Inst 88:1054–9.

Chapter 5
A Decade of Cancer Gene Profiling: From Molecular Portraits to Molecular Function
Henri Sara, Olli Kallioniemi, and Matthias Nees
Summary
Cancer gene profiling has greatly profited from the progress in high-throughput technologies, including microarray-, sequencing-, and bioinformatics-based methods. The flood of data generated during the last decade has given rise to a range of “-omics” fields that significantly changed our understanding of malignant diseases. However, while the terms “-omics” and “-ome” in principle refer to the completeness of a genetic approach, we are in fact far from a complete understanding of cancer progression. We may understand gene expression patterns better and successfully use gene signatures for outcome prediction and prognosis, but truly promising molecular targets still have to find their way into novel therapeutic concepts. In this chapter, we will show how more comprehensive strategies, integrating multiple layers of genetic information, might in the future provide a more profound functional understanding of cancer.
Key words: Microarray, Expression, CGH, Comparative genomic hybridization, Sequencing

1. Introduction: Arrays and Sequences for the Masses

Cancer is a genetic disease, and mutations are the key to understanding the disease mechanisms and developing novel therapeutic concepts. Somatic mutations and DNA copy number alterations, but also epigenetic changes, represent the basis for cancer progression, and result in altered messenger RNA (mRNA), alternative splicing patterns, or differential microRNA and protein expression. High-throughput (HTS) genomic technologies have taken the field by storm since their inception more than 10 years ago. Array technologies, in particular, have revolutionized molecular cancer biology, clinical diagnosis and prognosis, and have created a multitude of different “-omics” approaches that would

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_5, © Humana Press, a part of Springer Science + Business Media, LLC 2010


Sara, Kallioniemi, and Nees

otherwise not be imaginable. Microarray technology has two major applications: gene expression analysis and genetic variation analysis. Both will be addressed here from the cancer-related point of view. In particular, cancer gene expression profiling has generated thousands of publications in the last decade (Fig. 1, Table 1), not even considering the “invisible” impact on private and corporate research not reflected by peer-reviewed publications. This large body of information has contributed greatly to our understanding of gene expression patterns that differ dramatically between normal and malignant tissues. Recent developments, such as the discovery of microRNAs and the option of globally profiling microRNA expression patterns, have shown that array and other HTS technologies will continue to contribute significantly to our improved understanding of cancer biology. There is also no end in sight for continued technological innovation and further miniaturization, as reflected by the most recent ultrahigh-density expression array platforms that have entered the market and will be discussed here. In parallel with the success story of gene expression profiling technologies, comparable genetic “mapping” approaches such as comparative genomic hybridization (CGH) and single-nucleotide polymorphism (SNP) arrays have been developed for the genome-wide analysis of DNA copy number alterations. A panel of high-throughput genomic technologies is now broadly available that enables researchers to explore the cancer genome with unprecedented throughput and accuracy, at greatly reduced cost. Apart from the success of the array technologies, high-throughput DNA resequencing approaches will increasingly play a role in future “-omics” research, complementing or even competing with hybridization-based technologies.
Although resolution cannot be increased beyond the single-nucleotide level, the fundamental question will be at what price the “next-generation” sequencing technologies will come and what their throughput might eventually be. Sequencing the “1,000 dollar cancer genome,” however, might soon become a reality (1, 2). The idea of fully characterizing the cancer genome(s) has therefore gained significant momentum. As a consequence, a number of large-scale projects have recently been launched to map the cancer genome in its entirety by resequencing, combined with other HT gene-profiling technologies. The incentives for the formation of an International Cancer Genome Consortium (ICGC) were in principle outlined at an International Cancer Genomics Meeting in Toronto, Canada, in October 2007. The ICGC will aim at the complete and comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types, including subtypes. During an initial explorative phase that will precede full-thrust efforts, the main focus will be on ten cancer types.

A Decade of Cancer Gene Profiling


Fig. 1. Statistics of peer-reviewed research articles, including reviews, indicating the occurrence of the term “microarray” or “microarrays” in the title, abstract, or MeSH (medical subject headings) term. Upper panel: Microarray-related publications for the years 1995–2007, in total (filled circle), cancer-related microarray studies (open square), and for spotted (filled square) or synthetic oligonucleotide arrays (filled triangle). Lower panel: Articles containing the combined keywords/MeSH headings tissue microarrays (open diamond), array–CGH (open square), ChIP-on-chip or chromatin immunoprecipitation on microarrays (filled square), alternative splicing arrays (open triangle), microRNA arrays (filled triangle), or protein and antibody arrays (open circle).

3

1 1

Mapping, chromosomal

Infection

1

8

25

3

5

Transplantation medicine

2

5

2

13

13

30

20

9

17

38

8

128

2002

5

5

2

2

5

11

6

2

14

21

5

58

2001

Reproduction

Microbiology

Alternative splicing

1

Cardiovascular

3

3

5

8

4

3

7

11

6

16

2000

Development

1

Mental health

1

2

2

1

2

2

5

8

1999

Metabolic/degenerative diseases

1

1

Drug development/toxicology

2

2

Immune/inflammatory system

2

1998

1

1

1996

SNP genotyping

Oncology

Human

6

11

43

4

15

45

18

69

37

18

33

66

12

231

2003

12

17

57

1

22

56

14

54

49

27

62

91

24

296

2004

6

21

46

4

32

97

78

52

64

37

94

127

30

411

2005

14

14

12

3

31

107

52

46

91

43

141

192

54

781

2006

8

8

8

14

38

47

49

60

61

75

84

85

108

653

2007

Table 1 Peer-reviewed publications that have made use of Affymetrix’ GeneChip technology, based on a detailed literature search in PubMed (http://www.ncbi.nlm.nih.gov/Entrez), indicating the major fields of biomedical research in which these were applied, for the two most strongly represented genomes (human and mouse)


3

2000

Transplantation medicine

2

5

1

Reproduction

8

5

2

22

Microbiology

1

Cardiovascular

4

9

33

4

3

6

20

2

20

2002

2

2

Development

5

19

5

3

4

12

5

2001

Alternative splicing

3

8

Mental health

Metabolic/degenerative diseases

2

2

Mapping, chromosomal

Infection

2

Drug development/toxicology

3

1

1

1999

Immune/inflammatory system

1998

2

1

1996

SNP genotyping

Oncology

Mouse

2

11

4

1

17

60

23

54

11

5

10

40

39

2003

3

14

11

2

19

66

5

65

19

3

30

66

1

56

2004

6

15

20

3

10

113

12

54

19

15

25

82

2

59

2005

6

3

15

2

23

174

23

67

42

13

58

125

3

127

2006

2

5

5

4

30

100

22

45

28

13

22

66

7

113

2007


Key to participation in ICGC will be the comprehensive nature of the proposed studies, and compliance with the commonly agreed-on guidelines and data exchange policies. Considering the immense volume of DNA that needs to be sequenced, and the data that need to be stored and processed, these efforts easily equal the scale of thousands of human genome projects (3). The amount of mRNA gene expression data generated (including exon-level data) will be equally overwhelming. In this chapter, we provide a synopsis of the pros and cons of existing technologies, discuss the power of data integration, and outline the translational opportunities arising from “deep sequencing” and transcriptional profiling of the cancer genome.

2. Array-Based Expression Profiling: Spotted Versus Synthetic Arrays

Reviewing the entire body of literature on microarrays is an impossible task (compare Fig. 1, Table 2). For this chapter, we will therefore use statistics based on this large body of scientific literature, with the purpose of giving an overview of at least the most important trends and developments that have occurred during the last 10 years. The concept of microarray technologies has clearly derived from the Southern blotting method dating back to 1975. The use of collections of distinct DNA fragments in “macro” arrays for expression profiling (basically “dot blots”) was in general use throughout the 1980s. These early arrays were made by spotting complementary DNAs (cDNAs) onto filter paper with a pin-spotting device. The then upcoming large-scale expressed sequence tag (EST) projects, generating millions of cDNA clones or “tags,” provided an additional basis by generating a massive amount of sequence information and making clone libraries freely available to researchers. Polymerase chain reaction (PCR) technology (since 1986) also had a very significant impact. However, it required the development of specialized robotics, computerization, and miniaturization during the mid to late 1990s to bring microarray technologies, as we know them, significantly forward. From the beginning, two very different microarray concepts were developed in parallel, and have been “competing” ever since. The use of miniaturized cDNA microarrays for gene expression profiling, based on cDNA clones and PCR-amplified DNA fragments, was first reported in 1995 (4–6). The first application of microarrays in cancer research was published in 1996 (7). A large number of cDNA platforms have been generated since, almost exclusively by the academic research community. In the pioneer days, generating a cDNA array was frequently performed under heroic “home-brewing” circumstances, handling crude spotting robotics and large clone


Table 2
Number of individual microarray samples, as submitted to the GEO database, as of December 15, 2007

Table 2a
Specifies mRNA gene expression analyses performed on the most prevalent commercial platforms, provided by Affymetrix, Agilent, NimbleGen and Illumina (bead arrays)

Type         Array platform                                          Samples in GEO  Provider          GEO accession
Oligo array  Affymetrix GeneChip Human Cancer Array HC-G110          23              Affymetrix, Inc.  GPL74
Oligo array  Affymetrix GeneChip Human Array HuGeneFL                476             Affymetrix, Inc.  GPL80
Oligo array  Affymetrix GeneChip Human 35K Array Hu35k-A to D        40              Affymetrix, Inc.  GPL98
Oligo array  Affymetrix GeneChip Human HG-Focus Target Array         1,129           Affymetrix, Inc.  GPL201
Oligo array  Affymetrix GeneChip Human Genome U95A to E              5,543           Affymetrix, Inc.  GPL92
Oligo array  Affymetrix GeneChip Human Genome U133A Early Access     464             Affymetrix, Inc.  GPL4685
Oligo array  Affymetrix GeneChip Human Genome U133A 2.0 Array        754             Affymetrix, Inc.  GPL571
Oligo array  Affymetrix GeneChip Human Genome U133 Plus 2.0 Array    10,878          Affymetrix, Inc.  GPL570
Oligo array  Affymetrix GeneChip Human X3P Array                     218             Affymetrix, Inc.  GPL1352
Oligo array  Affymetrix Human Exon 1.0 ST Array (Transcript level)   401             Affymetrix, Inc.  GPL5160
Oligo array  Affymetrix Human Exon 1.0 ST Array (Exon level)         29              Affymetrix, Inc.  GPL5188
Oligo array  Affymetrix Human Phase3 v1.0 (transcript mapping)       2,548           Affymetrix, Inc.  GPL5234
Oligo array  Agilent-012097 Human 1A Microarray G4110B               933             Agilent           GPL887
Oligo array  Agilent-011521 Human 1A Microarray G4110A               210             Agilent           GPL885
Oligo array  NimbleGen Human Expression array                        13              NimbleGen Inc.    GPL5465
Bead array   Illumina Sentrix HumanRef-8 Expression BeadChip         427             Illumina Inc.     GPL2700
Bead array   Illumina Sentrix Human-6 Expression BeadChip v1 + 2     735             Illumina Inc.     GPL2507
Bead array   Bead-based microRNA profiling platform version 1–3      453             BROAD Institute   GPL1986


Sara, Kallioniemi, and Nees

Table 2b Samples contained in GEO based on SNP arrays and CGH arrays, as of December 15, 2007

| Type | Array platform | Samples | Provider | Accession |
|---|---|---|---|---|
| SNP | Affymetrix GeneChip Mapping 10K Array (Xba131 SNP) | 280 | Affymetrix, Inc. | GPL1266 |
| SNP | Affymetrix GeneChip Mapping 10K 2.0 Array (Xba142 SNP) | 7,233 | Affymetrix, Inc. | GPL2641 |
| SNP | Affymetrix GeneChip Human Mapping 50K Hind | 64 | Affymetrix, Inc. | GPL2014 |
| SNP | Affymetrix GeneChip Human Mapping 50K Xba | 64 | Affymetrix, Inc. | GPL2015 |
| SNP | Affymetrix GeneChip Mapping 100K Set Array (50K Hind240 SNP) | 934 | Affymetrix, Inc. | GPL2004 |
| SNP | Affymetrix GeneChip Mapping 100K Set Array (50K Xba240 SNP) | 951 | Affymetrix, Inc. | GPL2005 |
| SNP | Affymetrix GeneChip Mapping 500K Early Access (250K Sty SNP) | 306 | Affymetrix, Inc. | GPL3812 |
| SNP | Affymetrix GeneChip Mapping 500K Set Array (250K Nsp SNP) | 352 | Affymetrix, Inc. | GPL3718 |
| SNP | Affymetrix GeneChip Mapping 500K Set Array (250K Sty2 SNP) | 682 | Affymetrix, Inc. | GPL3720 |
| SNP | Sentrix BeadChip Array HumanHap300 Genotyping BeadChip | 110 | Illumina, Inc. | GPL5711 |
| CGH | Vysis GenoSensor CGH Array 300 | 12 | Vysis, Inc. | GPL3709 |
| CGH | Agilent-012750 Human Genome CGH Microarray 44A G4410A + B | 207 | Agilent Technologies, Inc. | GPL2873 |
| CGH | Agilent-014693 Human Genome CGH Microarray 244A (G4411B) | 92 | Agilent Technologies, Inc. | GPL4544 |
| CGH | NimbleGen Human HG18 WG CGH 389K array | 5 | NimbleGen, Inc. | GPL5941 |
| CGH | NimbleGen Human HG17 ENCODE tiling array | 74 | NimbleGen, Inc. | GPL3514 |
| CGH | CGH array LUMC, 1 Mb clone set | 98 | LUMC | GPL1506 |
| CGH | CGH-CHROM14 2K versions 1 + 2 | 41 | Sanger Institute | GPL3892 |
| CGH | CGH-SANGER 3K versions 1–5 | 116 | Sanger Institute | GPL4003 |
| CGH | CGH-SANGER 4K version 1 | 54 | Sanger Institute | GPL4939 |
| CGH | CGH-SANGER 5K version 2 | 2 | Sanger Institute | GPL3887 |
| CGH | DKFZ Homo sapiens array-CGH 6k BAC array | 90 | DKFZ | GPL5685 |
| CGH | MHP Human Chromosome 1 tile path CGH array version 1 | 12 | University of Cambridge | GPL5055 |
| CGH | MHP Human Chromosome 1 tile path CGH array version 2 | 96 | University of Cambridge | GPL5056 |
| CGH | MPIMG Homo sapiens 44K ArrayCGH | 68 | MPIMG Berlin | GPL5114 |

collections stacking up in freezers, high-throughput PCR and DNA purification strategies, as well as complex surface chemistry, postprocessing, and optimization of sample labeling and hybridization, all at the same time. Larger genome institutes, such as the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), not forgetting Stanford University, were able early on to secure more funding and personnel and managed to semi-industrialize array production in comparably large-scale array core facilities. A survey of the public array data sets submitted to the Gene Expression Omnibus (GEO) array database shows that a total of 350 different cDNA array platforms have been deposited since 2001, the majority of which are spotted arrays. Taken together, these arrays target no fewer than 240 different species, with man, mouse, and rat ranking in the first positions (Fig. 2), followed by Arabidopsis, yeast, and Drosophila.

In situ-synthesized oligonucleotide microarrays are produced by generating 25–60mer sequences directly on a planar array surface, using light-directed chemical synthesis of nucleic acids and technologies borrowed from semiconductor manufacturing. Photolithographic synthesis uses a chemically activated silica substrate and light-sensitive masking agents to construct the sequence one nucleotide at a time across the entire array. Affymetrix’ GeneChip® technology was invented in the late 1980s by a team of scientists led by Stephen P. A. Fodor, who cofounded Affymetrix Inc. in 1992. The company originated in 1991 as a small unit within Affymax N.V., where Fodor’s group had, in the late 1980s, already developed methods (and patents) for fabricating the first small synthetic DNA arrays. The company’s first product, an HIV genotyping chip, was introduced in 1994 (8). The company eventually went public in 1996. Oligonucleotide microarrays are now primarily associated with the name Affymetrix, which in 2007 was the undisputed market leader.


Fig. 2. Statistics of microarray samples submitted to the GEO public data repository (http://www.ncbi.nlm.nih.gov/geo/), indicating the number of samples per species (top 12 species among 240 in total represented in GEO), submitted annually since the beginning of the project in 2001.

However, modified synthesis technologies have been introduced more recently. The “maskless array synthesis” protocol from NimbleGen Systems makes use of the more flexible digital micromirror devices (DMD), borrowed from light processors used in optical presenters. DMDs employ an array of miniature aluminum mirrors to pattern between 786,000 and 4.2 million individual pixels. These “virtual masks” replace the physical masks used in Affymetrix technology – the major advantage is an increased flexibility and turnover in array design. An altogether different concept underlies bead arrays, with Illumina Inc. being the market leader in this field. For bead arrays, an optical “imaging” fiber is etched such that a bead can fit into the resulting micron-sized etched wells right on the tip of the fiber. Different oligonucleotide sequences are attached to each bead, and thousands of beads can be self-assembled onto the fiber bundle. A subsequent decoding process is carried out to determine which bead occupies which well. Complementary oligonucleotides present in the


sample bind to the beads, and bound oligonucleotides are measured by using a fluorescent label. Illumina sells these arrays in two formats: Sentrix Array Matrix and Sentrix BeadChips.

Affymetrix, NimbleGen, and Illumina arrays are so-called “single-channel” or “one-color” microarrays and give estimates of the absolute levels of gene expression, in contrast to most cDNA arrays (including spotted oligonucleotide arrays), which require a dual-channel hybridization strategy. Absolute values of gene expression may be compared with other genes within a sample or with the same gene across a large panel of array hybridizations. In contrast to cDNA arrays, data are more readily normalized and compared with arrays from different experiments or even different array generations, a task that is virtually impossible considering the hundreds of different cDNA array platforms. The absolute values of gene expression may be compared between studies conducted months and years apart, or even across researchers from all over the planet using the same (commercial) array platforms. Even between entirely different array concepts, such as bead-based and planar oligonucleotide chips, comparisons are possible and are in fact facilitated by the single-channel principle. Considering the large amount of data generated by the international academic cancer research community, this now turns out to be one of the major advantages.

Figure 1 illustrates the number of publication abstracts or MeSH headings containing the keyword “microarrays”. The number of publications has been increasing exponentially since the year 2000, with more than 6,000 peer-reviewed publications for the year 2007 alone, across the entire field of biomedical and pharmaceutical research. A combined search for the terms “cancer” and “microarrays” results in roughly 2,000 publications for each of the last 2–3 years, indicating that approximately one third of all microarray-related publications relate to cancer research.
The data also suggest a significant saturation effect, with the curve approaching a plateau. This may indicate that the majority of cancers have already been sufficiently profiled using high-density microarrays, although the level of new publications remains high. The publication statistics also indicate that oligonucleotide arrays dominate the market. The number of publications based on cDNA arrays, in contrast, has not increased since 2002 and has actually been declining since 2005. This is also illustrated in Table 2, summarizing the large number of publications based on the Affymetrix GeneChip arrays, for both mouse and human, and across all fields of biomedical research. It is intriguing to see that, in both species, the largest number of publications is related to cancer research/oncology, followed by SNP array genotyping. Despite the plateau effect, many novel array technologies are still rapidly evolving. As illustrated in the lower panel of Fig. 1, there still is an almost exponential


increase in the number of new publications on tissue microarrays, array-CGH, and chromatin immunoprecipitation on chips (so-called ChIP-on-chip technologies), as well as microRNA arrays. Protein and antibody arrays, however, have not yet been a success story or a commercial breakthrough, at least judging by the low number of publications. Alternative splicing arrays, on the other hand, are only now becoming widely available, and the future will show their impact.

The immense amount of available data generated an urgent need for standardization, which was addressed by a number of community efforts by 2001 (9), prior to the flood of data. Currently, the MicroArray and Gene Expression (MAGE) group continues to work on standardizing the representation of gene expression data and relevant annotations, with the aim of facilitating the exchange of data sets (http://www.mged.org). The Microarray Gene Expression Data (MGED) society has taken on the Minimum Information About a Microarray Experiment (MIAME) checklist, which was initially intended to define the level of information submitted together with a microarray experiment (9). It has since been adopted by many journals as a minimal requirement for the submission of papers incorporating microarray results. MIAME, however, is not a unified format, and is therefore of limited use. MGED has therefore recently launched an updated MIAME 2.0 version. In parallel, the MicroArray Quality Control (MAQC) project is conducted by the US Food and Drug Administration (FDA) with the aim of developing standards and quality control metrics that will eventually allow the use of array data in drug discovery, clinical practice, and regulatory decision making. The MAQC protocols so far encompass two stages (MAQC I and II), while MAQC III addresses guidelines for next-generation sequencing.
The wish for standardization and the MIAME checklist were the primary reason for, and have since provided the “food” for, the building of large public data repositories, such as GEO, EBI ArrayExpress, and the Stanford Microarray Database (for web addresses, see Table 3). As already indicated by the literature trends shown in Fig. 1, commercial in situ array platforms also dominate the public data submitted to such repositories. Tables 2a and 2b summarize the number of array samples deposited in the largest of these databases, GEO, by the end of 2007. The 22,500 arrays based on any one of the Affymetrix platforms contrast with only 1,150 arrays based on the other commercial platforms from Agilent and NimbleGen combined, and 1,650 from Illumina bead arrays. Noncommercial spotted array platforms also provide a large number of samples, but these are distributed over dozens of very different platforms. The largest individual studies, with currently >3,000 samples hybridized, such as the Expression Project for Oncology (expO) project

Table 3 List of web addresses for microarray databases, including primary data repositories, reference gene expression databases, and a selection of meta-analysis databases intended to assign functional information to gene expression patterns

| Primary microarray data repositories | Web URL |
|---|---|
| ArrayDB, NHGRI Array Database | genome.nhgri.nih.gov/arraydb/ |
| ArrayExpress, EBI Microarray Database | http://www.ebi.ac.uk/arrayexpress/ |
| ArrayTrack, FDA Microarray Database | http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack |
| BROAD Institute Microarray Database, MIT | http://www.broad.mit.edu/cancer/datasets.html |
| caArray Standards-Based Array Database, NCI | caarraydb.nci.nih.gov/caarray |
| CIBEX Microarray Database | cibex.nig.ac.jp |
| CleanEx Microarray Database | http://www.cleanex.isb-sib.ch/ |
| GAN Gene Aging Nexus | gan.usc.edu |
| GEO Gene Expression Omnibus, NCBI/NIH | http://www.ncbi.nlm.nih.gov/geo/ |
| LAD Longhorn Array Database | http://www.longhornarraydatabase.org/ |
| LMD Lung Microarray Database | lungmicroarray.org |
| maxD Array Database, Manchester University UK | http://www.bioinf.man.ac.uk/microarray/maxd |
| MESA Microarray Database, Burnham Institute | bsrweb.burnham.org/metadot |
| MUSC Array Database, Medical University of South Carolina | proteogenomics.musc.edu/pss |
| NKI Microarray Database, the Netherlands Kanker Institut | microarrays.nki.nl/ |
| NOMAD Array Database, UCSF | ucsf-nomad.sourceforge.net/ |
| NYUmad, NYU Microarray Database | http://www.bioinformatics.nyu.edu/Projects/nyumad |
| PhenoGen Informatics, U. of Charleston, South Carolina | phenogen.uchsc.edu/PhenoGen |
| PumaDB, Princeton University Array Database | puma.princeton.edu |
| SMD Stanford Microarray Database | genome-www5.stanford.edu/ |
| UNC MicroArray Database | genome.unc.edu |
| YMD Yale Microarray Database | http://www.med.yale.edu/microarray/ |

| Reference Gene Expression Databases | Web URL |
|---|---|
| caGEDA Cancer Gene Expression Database | http://bioinformatics.upmc.edu/GE2/GEDA.html |
| Connectivity Map BROAD Institute, MIT | http://www.broad.mit.edu/cmap |
| EMAGE Expression Database (mouse) | genex.hgu.mrc.ac.uk/Emage/database/ |
| GeneX Open Source Gene Expression Database | genex.sourceforge.net/ |
| GEPIS Expression Database, UCSF | http://www.cgl.ucsf.edu/Research/genentech/gepis/gepis.html |
| GXD Mouse Gene Expression Database, Jackson Lab | http://www.informatics.jax.org/mgihome/GXD |
| HuGE Index Human Gene Expression Index | zlab.bu.edu/HugeSearch |
| ITTACA Tumor Gene Expression Database | bioinfo.curie.fr/ittaca |
| PEDB Prostate Expression Database | http://www.pedb.org/ |
| PEPR Public Expression Profiling Resource | pepr.cnmcresearch.org |
| RAD RNA Abundance Database, UPenn | http://www.cbil.upenn.edu/RAD |
| RefExA Reference Database for Gene Expression Analysis | 157.82.78.238/refexa |
| SIEGE Lung Gene Expression Database | pulm.bumc.bu.edu/siegeDB |
| Symatlas, Novartis Institute/GNF | symatlas.gnf.org/SymAtlas/ |
| tmaDB Tissue Microarray Database, Leeds UK | http://www.bioinformatics.leeds.ac.uk/tmadb/ |

| MetaSearch Expression Databases | Web URL |
|---|---|
| GeneLogic Toxicogenomics Inc. | http://www.genelogic.com |
| GeneVestigator, ETH Zurich | http://www.genevestigator.ethz.ch |
| GENOMICA, Eran Segal Lab, Weizmann Inst. | genomica.weizmann.ac.il/ |
| Module Map, Daphne Koller Lab/Stanford | robotics.stanford.edu/~erans/cancer/ |
| Oncomine | http://www.oncomine.org |
| SPELL Serial Pattern of Expression Levels Locator, Princeton | function.princeton.edu/SPELL |
| TMM Gene CoExpression Database, Pavlidis’ Lab | microarray.cpmc.columbia.edu/tmm |

(http://www.intgen.org/expo.cfm; maintained by the International Genomics Consortium [IGC]), are almost exclusively based on commercial platforms. Although these numbers may not directly correlate with the sales figures, they nevertheless illustrate the huge amount of experimental data freely available to the community. Unfortunately, corporate experimental data


are not usually submitted to such repositories and are therefore missing. Furthermore, data are often released years after the actual experiments, so there is a significant lag phase in database submissions. In any case, this treasury of array data has spawned a number of efforts aiming at the generation of integrative databases that allow these data to be mined in a meta-analysis mode. Oncomine (http://www.oncomine.com), Genesapiens (www.genesapiens.org), and GeneVestigator (www.genevestigator.ethz.ch) are the most popular of these databases and illustrate the concept; others are summarized in Table 3. This trend toward meta-analyses reflects the need for a more uniform and comprehensive functional understanding of the data, aiming less at the identification of differentially expressed genes or “markers,” and focusing instead on understanding the pathways and mechanisms that drive cancer progression.

Simultaneously with the rise of immense databases and array repositories, bioinformatics has gained an immensely important role, boosted by the dire need to normalize, handle, and interpret the bulk of data. Space limitations make it impossible to even briefly sketch the impact of bioinformatics on the life sciences. As a surrogate, Tables 4 and 5 (generated by clustering PubMed literature data on microarrays) are primarily intended to illustrate the predominance of bioinformatics-related

Table 4 Meta-analysis of the co-occurrence of keywords and MeSH headings together with the terms “microarray*” and “breast cancer.” The resulting list of additional keywords retrieved in this “literature clustering” was then ranked according to the total number of co-occurrences. Search terms pointing primarily to bioinformatics-related topics are in bold

| Breast cancer/keywords/MeSH headings | # | Keywords/MeSH headings | # |
|---|---|---|---|
| Mammary | 65 | Image analysis | 17 |
| Estrogen receptor | 58 | Normal tissues | 17 |
| Basal-like Subtype | 54 | Sections, tissue | 17 |
| Amplification/copy number changes | 50 | DNA methylation | 15 |
| Prediction of outcome/predictive markers | 60 | COX-2 | 15 |
| Ductal carcinoma | 48 | Endothelial cells | 15 |
| MCF-7 | 46 | Mutations | 14 |
| Ovarian cancer | 44 | Molecular classification/signature | 12 |
| Prognostic markers/signature | 41 | Early-stage breast cancer | 12 |
| BRCA1 | 34 | Differentially expressed | 11 |
| Formalin-fixed, paraffin-embedded | 31 | Databases | 11 |
| Tamoxifen | 29 | EGFR, EGF receptor | 10 |
| Estrogen | 28 | Adjuvant chemotherapy | 10 |
| FISH, fluorescence in situ hybridization | 27 | Proteomic analysis | 10 |
| P53, tumor suppressor protein p53 | 25 | Biomarkers | 10 |
| Metastasis | 24 | False, estimates | 10 |
| ERBB2, Her-2 | 23 | Fine needle aspiration | 10 |
| Classification, class | 21 | Lobular carcinoma | 10 |
| Comparative genomic hybridization | 20 | Prostate and breast cancer | 10 |
| Hypoxia | 19 | Apoptosis | 10 |
| Model | 19 | Subtypes of breast cancer | 10 |
| Tissue microarray | 17 | Locally advanced breast cancer | 10 |

| Year | # publications |
|---|---|
| 1999 | 7 |
| 2000 | 19 |
| 2001 | 56 |
| 2002 | 113 |
| 2003 | 155 |
| 2004 | 267 |
| 2005 | 323 |
| 2006 | 387 |
| 2007 | 420 |

questions in microarray publications. In Table 4, MeSH headings and keywords were ranked according to the frequency of co-occurrence in publications focusing on microarrays in breast cancer research (for which the largest number of microarray-related studies are available). It becomes immediately obvious that keywords such as “subtypes,” “outcome prediction,” “prognostic markers,” “molecular signature,” and “tumor classification” feature prominently. Similar trends are further exemplified by the ranking of keywords/MeSH headings most frequently co-occurring with the search terms “cancer,” “microarrays,” and “bioinformatics” (Table 5). According to this survey, bioinformatics is the foundation for the identification of “biomarkers,” but also of “functional modules,” “gene interactions,” and “targets,” to mention a few. Again, this list illustrates the research community’s need for tools that functionally annotate gene expression data, or help to classify tumors into different subclasses according to expression patterns or clinical data, such as survival. Hundreds of biomarkers and gene sets have been identified that correlate more or less significantly with stage and progression of the disease(s), but only a few of these may also represent promising novel targets for therapeutics.

Based on the cancer classification/subclass concept, the idea of diagnostic and prognostic gene signatures has been introduced, a concept that has triggered a new “industry” of diagnostic companies that offer gene expression services for improved patient stratification and personalized therapeutic decisions (e.g., Agendia’s MammaPrint® test, http://www.agendia.com). Individualized medicine, therapeutic decisions, and more accurate patient stratification based on gene expression profiles have already become a reality. Agendia’s In Vitro Diagnostic Multivariate Index Assay (IVDMIA) was granted market clearance in 2007 by the FDA, which provides the legal basis for offering this service in the United States. The test has been sold in Europe since 2005.

Table 5 Literature clustering and analysis of keywords/MeSH headings that most frequently co-occur in conjunction with the terms “microarray*,” “cancer,” and “bioinformatic*”. This list summarizes many of the aims and procedures where bioinformatics is primarily applied in the microarray field

| MeSH: microarrays/cancer/bioinformatics | n = 496 |
|---|---|
| Classification | 83 |
| Clustering, cluster analysis | 44 |
| Physiology | 39 |
| Biomarkers | 26 |
| Statistics and numerical data | 22 |
| Survival | 17 |
| NCI-60 panel | 16 |
| Proportional hazards models | 15 |
| Trends | 15 |
| Standards | 10 |
| Gene interactions | 9 |
| FDR, false discovery rate | 8 |
| Logistic regression | 8 |
| Proteomic technology | 7 |
| Immunology | 7 |
| SNP array | 6 |
| Recursive feature | 6 |
| Functional modules | 5 |
| Chromosomal regions | 5 |
| Gene selection algorithm | 5 |
| Copy number changes | 4 |
| Family-wise error rate | 4 |
| Gene co-expression | 4 |
| Multiclass classification | 4 |
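The “literature clustering” behind Tables 4 and 5 amounts to counting which keywords appear in the same records as a fixed set of search terms and ranking them by that count. The following is only a minimal sketch of that idea, not the procedure actually used for the tables: the function name, the toy records, and the exact-match rule (the original searches used wildcards such as “microarray*”) are all assumptions made for illustration.

```python
from collections import Counter


def rank_cooccurring_keywords(records, required_terms):
    """Rank keywords by the number of records that carry them
    together with ALL of the required search terms."""
    required = {term.lower() for term in required_terms}
    counts = Counter()
    for keywords in records:
        normalized = {kw.lower() for kw in keywords}
        # Only records matching every search term contribute.
        if required <= normalized:
            counts.update(normalized - required)
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical keyword sets, one per publication record (not real PubMed data).
records = [
    {"microarray", "breast cancer", "estrogen receptor", "classification"},
    {"microarray", "breast cancer", "classification"},
    {"microarray", "lung cancer", "classification"},
    {"microarray", "breast cancer", "BRCA1"},
]

ranking = rank_cooccurring_keywords(records, ["microarray", "breast cancer"])
for keyword, count in ranking:
    print(f"{keyword}\t{count}")
```

In this toy input the lung cancer record is ignored because it lacks one of the required terms, which mirrors how the breast cancer restriction shapes Table 4.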

3. Array-Based Genetic Mapping Part 1: CGH Arrays

The genomic landscape of tumors encompasses a broad spectrum of genetic events. The scale of copy number alterations ranges from microdeletions/amplifications of a few bases to megabases of DNA and entire chromosomes. DNA copy number variations (CNV) are most rapidly addressed by array-based technologies, primarily array-CGH and SNP arrays. Array-CGH has come a long way since the first description of chromosomal CGH in 1992 (14) and the first array-CGH (15) based on spotted DNA from bacterial artificial chromosomes (BACs). It has become a common tool in cancer genome analyses, reflected by the fact that, by 2007, all of the more frequent tumor types had been covered by at least one array-CGH study in the literature. These platforms allow the rapid and reliable detection of increasingly small microdeletions and amplifications. Oligonucleotide-based CGH (offered by companies such as Agilent and NimbleGen, Table 2b) and SNP arrays (Affymetrix, Illumina; see the next section) have more or less replaced BAC-based CGH arrays as the method of choice for larger-scale genomic profiling of cancer. For example, NimbleGen offers a 384K array-CGH platform based on 50mer-long oligos, similar to Agilent’s CGH


platforms (60mer) that have recently been upgraded from 44,000 to 244,000 elements. This increased density offers greatly improved resolution, down to exon-level detection of focal deletions. CGH arrays now represent highly reliable, standardized technology platforms; they have become much more affordable, and have greatly facilitated access to genomic profiling technologies for many clinical laboratories. Array service providers such as NimbleGen increasingly “do the job,” requiring the researcher simply to provide purified tumor DNA and to handle the data with the help of bioinformatics.

Integration or “layering” of expression and CGH data generates additional insights into cancer biology. In 1999, cDNA microarrays were introduced as a plausible platform for CGH, thereby facilitating the integration of both DNA- and RNA-level data (16). Now, bioinformatics also facilitates the layering of CGH array and mRNA expression data between different platforms (17–19). In breast cancer, for example, CGH array data basically reflect the subtype classification schemes defined on the basis of expression profiling alone (10–12). This does not apply to all the subtypes described, however, and partially overlapping alternative tumor classes based on genomic features such as chromosomal instability and gene amplifications have been suggested (20). In another breast cancer study, mRNA gene expression profiling, clinical outcome data, and BAC-based CGH array data were integrated (18). This report confirmed that expression- and CGH-based tumor classifications overlap to a large degree and that patient stratification and prognosis may be significantly improved by using this combined strategy. Furthermore, this study identified 66 genes with recurrent high-level amplifications resulting in gene overexpression (“amplified/over-expressed genes”) that may represent novel cancer targets. Nine of these candidates would generally be considered druggable.
A second report, also from Joe Gray’s group (19), analyzed a panel of 51 breast cancer cell lines by CGH and expression arrays, alongside a set of 145 primary tumors. Protein expression and clinical data were also integrated. Again, the analysis confirmed that most of the genetic changes found in tumors are by and large represented in the cell lines. This indicates that the use of cell lines in cancer research, although much debated, is indeed justified and reflects cancer biology to a large degree. Interestingly for cancer therapeutics, Herceptin response and resistance in the cell line panel correlated with expression of a number of protein markers that may be of clinical value for patient selection; larger panels of cell lines might give more robust insights. Data integration as exemplified above is clearly a powerful strategy with the potential to yield functional insights and to identify cancer drug targets or therapies.
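The “layering” of copy number and expression data described in this section can be illustrated with a small sketch that flags “amplified/over-expressed genes”: genes that are amplified in several tumors and whose expression is clearly higher in the amplified tumors than in the rest. This is an illustration under stated assumptions, not the method of the cited studies; the thresholds, the gene names, and the toy log2 ratios are hypothetical.

```python
from statistics import mean


def amplified_overexpressed_genes(copy_number, expression,
                                  amp_cutoff=1.0,
                                  min_recurrence=2,
                                  min_expr_shift=1.0):
    """Return genes that are recurrently amplified AND over-expressed.

    copy_number / expression map each gene to per-tumor log2 ratios,
    listed in the same tumor order. All cutoffs are illustrative.
    """
    hits = []
    for gene, cn_values in copy_number.items():
        expr_values = expression[gene]
        pairs = list(zip(cn_values, expr_values))
        amplified = [expr for cn, expr in pairs if cn >= amp_cutoff]
        others = [expr for cn, expr in pairs if cn < amp_cutoff]
        # Require a recurrent amplification and a comparison group.
        if len(amplified) < min_recurrence or not others:
            continue
        # Require higher expression where the gene is amplified.
        if mean(amplified) - mean(others) >= min_expr_shift:
            hits.append(gene)
    return sorted(hits)


# Hypothetical data for five tumors (log2 ratios versus a reference).
copy_number = {
    "GENE_A": [1.8, 1.5, 0.1, 0.0, -0.1],   # recurrent amplification
    "GENE_B": [1.6, 1.4, 0.0, 0.1, 0.0],    # amplified, expression unchanged
    "GENE_C": [0.1, 0.0, 1.9, -0.2, 0.0],   # amplified in one tumor only
}
expression = {
    "GENE_A": [3.0, 2.5, 0.2, -0.1, 0.0],
    "GENE_B": [0.1, 0.0, 0.2, -0.1, 0.1],
    "GENE_C": [0.0, 0.1, 2.8, 0.0, 0.1],
}

hits = amplified_overexpressed_genes(copy_number, expression)
print(hits)
```

A real analysis would replace the simple difference of means with a proper statistical test and correct for multiple comparisons across all genes.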


4. Array-Based Genetic Mapping Part 2: SNP Arrays

The currently available SNP array platforms cover between 10,000 (10K) and 500,000 (500K) markers on a single array or bead array. At this high density, and because haplotypes in the human genome give rise to linkage disequilibrium, there is no need to hybridize normal samples to analyze CNVs, unlike with CGH arrays. Haplotypes are segments of chromosomes that have not been “broken up” by recombination, and are separated by the sites of recombination. Haplotypes in particular enable geneticists to search for genes involved in cancer and many other diseases, and facilitate the use of SNP arrays for the detection of CNV, also allowing researchers to identify putative target genes located in precisely mapped minimal amplification/deletion intervals. Approximately 3.1 million SNPs have been mapped by the human HapMap consortium (http://www.hapmap.org), and are readily available for improved SNP array design. Accordingly, both Illumina and Affymetrix have very recently launched larger SNP platforms covering more than one million SNPs (Affymetrix SNP array 6.0 with 1.8M variation markers, and the Illumina Human1M BeadChip).

SNP arrays have rapidly replaced older technologies, such as microsatellite markers, and compete with CGH arrays for genetic profiling. Controlled clinical studies including hundreds or thousands of cancers are feasible, providing robust statistics for the detection of recurrent somatic alterations. The mapping resolution is comparable to that provided by CGH arrays. In one of the most extensive cancer genotyping studies to date, performed on 528 lung adenocarcinomas, the NKX2-1 or TITF1 gene was identified as the most likely target gene of a recurrent amplification at 14q13.3 (21). Analogous to findings related to the MITF gene in malignant melanoma (22), NKX2-1 is a typical lineage-specific transcription factor.
Both NKX2-1 and MITF represent unique proto-oncogenes activated and over-expressed in a significant portion of lung adenocarcinomas and melanomas. In the same lung cancer study, a number of additional candidate tumor suppressor genes were assigned to recurrent deletions, such as the tyrosine phosphatase PTPRD at 9p23 and the phosphodiesterase PDE4D at 5q11.2, pointing to these as important functional pathways frequently inactivated in cancers. SNP arrays naturally continue to play an important role in vast linkage analysis projects aiming to identify novel cancer susceptibility genes. In breast cancer, a recent genome-wide association study (23) comprised more than 4,400 tumors and 4,300 control samples, followed by an even larger panel of samples from >21,000 cancer cases and 22,500 healthy control donors for subsequent validation. In


this huge cohort, Affymetrix 500K SNP arrays were used to identify a panel of putative cancer-predisposing gene variants.

It is interesting to note that, in GEO, SNP array data are highly over-represented compared with CGH. As of December 2007, GEO contained more than 11,000 SNP array data sets, compared with only 745 CGH arrays (Table 2b). Although Affymetrix SNP arrays have been commercially available for several years, BAC-based CGH platforms were previously a rather exclusive technology produced only at a few large genome centers. This is now changing significantly, with many commercial providers of CGH arrays entering the market. A literature search in NCBI PubMed (not shown) revealed that SNP and CGH arrays are each mentioned in >500 publications. However, while CGH arrays are almost exclusively applied in cancer-related projects, SNP arrays were traditionally and primarily linked to the mapping of metabolic and neurological diseases, and only a small fraction addressed neoplasia. Nevertheless, SNP arrays are rapidly gaining a strong foothold in the cancer field as well, helped by the dramatic increase in density.

An aspect that is generally poorly addressed in mapping studies is that of chromosomal translocations. However, balanced and reciprocal chromosomal translocations are extremely frequent in most cancer types, and can result in the generation of fusion genes with novel oncogenic properties. The problem is that balanced translocations do not usually lead to massive loss or gain of DNA at the recombination site, and are therefore not readily detectable by SNP or CGH arrays. Not surprisingly, almost two thirds of the genes in the Sanger Centre Cancer Gene Census (24) represent reciprocal fusion partners. These are primarily found in blood-related and mesenchymal cancers, but hardly any have been described in epithelial cancers, which represent 90% of the entire cancer burden.
The fusion-gene concept has now attracted renewed attention in epithelial cancers, mainly due to the discovery of recurrent fusions of TMPRSS2 with ERG and other ETS family genes in prostate cancer (25, 26). This translocation represents one of the most frequent alterations in cancer as a whole, and it is surprising in retrospect that it was only identified in the year 2005.

5. Deep Sequencing Technologies

While the resolution of CGH and SNP arrays has dramatically increased in less than 10 years, resequencing approaches have effectively pushed the resolution to the single-nucleotide level. High-density tiling array platforms may in principle achieve the same goal, but at a comparably high cost and effort. Array-based

sequencing, or “sequencing by hybridization,” is an old idea that goes back to the early 1990s (27). NimbleGen Inc., for example, offers its Comparative Genome Sequencing (CGS) technology as a viable alternative for the analysis of bacterial genomes. CGS provides an efficient, high-throughput, and cost-effective method for genome-wide analysis, but it is restricted to genomes in the 3- to 5-Mb range, and is thus not suitable for eukaryotic (and cancer) genomes, which are 1,000 times larger. The largest number of oncogenic alterations in cancers can probably be attributed to somatic point mutations that either result in proteins with a gain of function, such as activated oncogenes, or inactivate tumor suppressor/caretaker genes. Point mutations represent the technically most demanding end of the spectrum of cancer-relevant changes, because their detection requires massive automated DNA sequencing. PCR-based, massively parallel sequencing technologies (MPSS) such as Solexa (Illumina Inc.), SOLiD (Applied Biosystems), and 454 (Roche Inc.) (28, 29) seriously compete with array-based technologies in both price and throughput. Currently, pricing for these sequencing platforms is in the range of $1–6 per base. A single run costs between $8,000 and $10,000, and generates up to 1 Gbp of sequence data. Sequencing with 454 generates longer reads (250 bp compared with 25 bp or 35 bp for SOLiD and Solexa), but only 0.1 Gbp per run in total. The amount of data generated is immense, as is the requirement for data storage capacity. Most large-scale sequencing efforts to date have focused on genes or gene families that are over-represented in cancer. This includes, naturally, panels of known oncogenes/tumor suppressor genes, or the kinases and phosphatases as the most frequently mutated functional protein families.
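The run figures quoted above can be converted into a cost per megabase with simple arithmetic. The following Python sketch is only an illustration: the function name is ours, and it assumes that a run actually delivers the full quoted yield of 1 Gbp, which real runs may not reach.

```python
def cost_per_megabase(run_cost_usd: float, yield_bp: float) -> float:
    """Return the sequencing cost in US dollars per megabase (1e6 bases)."""
    if yield_bp <= 0:
        raise ValueError("yield_bp must be positive")
    return run_cost_usd / (yield_bp / 1e6)


# Figures quoted in the text: $8,000 to $10,000 per run, up to 1 Gbp per run.
low = cost_per_megabase(8_000, 1e9)
high = cost_per_megabase(10_000, 1e9)
print(f"${low:.0f} to ${high:.0f} per Mbp at full yield")
```

A lower yield per run raises the cost per megabase proportionally, so the same function can be used to compare platforms once their actual run yields are known.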
Early on, kinase screens performed by the Cancer Genome Project (CGP) of the Sanger Institute (http://www.sanger.ac.uk/CPG) yielded a spectacular and highly encouraging hit: BRAF was found to be mutated in 70% of all melanomas (30). Parallel approaches at the Johns Hopkins Cancer Center (http://www.hopkinskimmelcancercenter.org), covering both the kinome (31) and protein tyrosine phosphatases (32) in breast, colorectal, and gastric cancers, were equally encouraging. The PIK3CA gene was identified as one of the most frequently mutated genes in breast cancers (33) and in a panel of cancer cell lines. The “deep sequencing” of cancer cell lines (34), although frequently criticized, represents a valid alternative to cancer samples. As already outlined in the previous section on CGH, cell lines not only continue to harbor the same mutations and CNV as primary tumors, but also provide researchers with an opportunity to functionally characterize the impact of somatic alterations for drug development. Cell lines also offer the advantage of limited heterogeneity and less “contamination” with nontumor cells such as tumor stroma,
improving sensitivity in the identification of somatic point mutations. Larger-scale resequencing studies on the 518 human kinases, conducted at both centers, were rewarded with a large number of somatic mutations (35–37), confirming once again that kinases and phosphatases represent the most frequently mutated gene families in cancer. The largest of these studies to date identified more than 1,000 somatic kinase mutations in 210 tumors (38). Another recent PCR-based sequencing screen covered 238 known oncogene mutations in 14 frequently mutated oncogenes across >1,000 tumor samples and 17 different tumor types (39). This study revealed not only low mutation frequencies in cancer types previously not associated with many of these oncogenes, but also a previously unknown and widespread “partnering of mutations” for functionally related pairs of oncogenes. It had previously been assumed that a single mutation within a critical pathway (such as the RAS pathway) is sufficient for functional activation or inactivation of that pathway, excluding additional mutations in related genes. The recurrent co-occurrence of such mutations, however, points to a potentially complementary function of some of the most recurrent mutations, and confirms that pathways may in fact be targeted by more than one hit. Encouraged by these successes, researchers launched unbiased large-scale tumor-resequencing approaches, that is, studies not targeting any specific gene family. However, an initial attempt covering 1,811 exons of 470 genes identified only three somatic mutations in colorectal cancers (40). It became immediately clear that much larger-scale sequencing exercises were needed. Recent studies have sequenced essentially all genes in the RefSeq database, that is, 20,857 transcripts from 18,191 genes, in a set of 11 breast and 11 colon cancers (41, 42). Genes were PCR amplified, one exon at a time, and subsequently sequenced.
This effort has resulted in a set of 280 genes that were mutated in at least one tumor, indicating clearly that cancers do contain a large number of point mutations. The large number of mutations, averaging 90 per sample, came as a surprise, and created the rather difficult task of distinguishing “driver mutations,” which are positively selected for during cancer progression, from “passenger mutations.” Using statistical methods, candidate genes were selected that most likely represent driver genes, to be followed up in additional cancer cohorts. A different gene selection strategy was based on pathways previously implicated in cancers; those were used for validation based on a second “test set” of 96 tumors. From these data, a surprising variety of mutations becomes apparent; this allows a truly comprehensive fine-mapping of somatic alterations in cancers. A number of additional tumor resequencing projects have been launched, for example, The Cancer Genome Atlas (TCGA) (cancergenome.nih.gov) at the NIH, and the tumor sequencing
project (TSP) (http://www.genome.gov) at NHGRI, which will later be funded jointly with the TCGA. Both projects will also become principal components of the ICGC initiative during 2008.

6. Future Outlook and Perspectives

During the past decade, combined efforts have identified hundreds of genes mutated in cancers, which are collected in the COSMIC database (http://www.sanger.ac.uk/genetics/CGP/cosmic). The various mutations identified in most of the studies described in the previous section are compiled, in a searchable format, at http://cbio.mskcc.org/cancergenes. Intriguing as the vast list of cancer genes may seem, nobody would assume this list to be complete yet. There is ample evidence that we will continue to identify novel cancer genes, particularly in anticipation of the planned large-scale sequencing projects. Even genes mutated at high frequencies (>5%) in certain malignancies most likely still await discovery, as demonstrated by the recent findings of recurrent BRAF and MITF mutations in melanoma (22, 30) or PIK3CA mutations in breast cancer (33). Up to 70% of all melanomas contain a BRAF mutation, representing one of the highest mutation frequencies ever found, rivaled only by “classic” targets such as the p53 tumor suppressor gene. Such novel cancer genes represent excellent vantage points for future cancer therapeutics, highlighting possible oncogene dependencies and vulnerabilities that may be specifically targeted by “designed” drugs. These strategies are in principle modeled after the poster-child success stories of drugs targeting genes such as BCR/ABL and c-Kit (Gleevec), ERBB2 (Herceptin/trastuzumab), EGFR (Iressa, Tarceva), or VEGF (Avastin). There may not be a lack of novel targets after all. What is primarily missing is the mechanistic information that would allow cancer genomics to direct the drug discovery process. For this purpose, integrative approaches, combining different genome-wide technologies such as mRNA gene expression and DNA copy number alterations, have proven to be extremely powerful and have already resulted in greatly improved mechanistic insights.
Such “overlay” strategies have already helped to identify and validate novel “driver” genes and pathways in cancer biology (18, 19). However, such “holistic” efforts need to become a more routine strategy, and there is no lack of additional aspects that might be integrated into the big picture. The contribution of microRNA arrays (43) is only now emerging, and so is the role of genome-wide analyses of epigenetic alterations (44–46). Transcriptional profiling technologies also continue to advance. Alternatively spliced mRNA variants are now routinely detectable by exon
arrays (e.g., Affymetrix Exon 1.0 arrays, and those of other providers such as ExonHit and JIVAN Inc.). The large amount of transcriptomics data already available to the research community needs to be mined more comprehensively as well. Large-scale initiatives to mine this information are only now beginning, with search engines such as Oncomine, GENOMICA, and geneVestigator allowing the identification of cancer-specific functional modules (47). Metabolic and proteomic fingerprints as well as the mathematical analysis and modeling of “-omics” data may complete our comprehensive understanding of the molecular deregulation of cancer cells in vitro and in vivo. Last but not least, high-throughput small interfering RNA (siRNA) and short hairpin RNA (shRNA) technologies are primarily aimed at gaining knowledge about gene function. Functional profiling technologies have a particularly strong potential when combined with mapping of the physical cancer genome or transcriptome. This was recently exemplified by a study (48) that integrated mRNA expression and genome mapping with functional shRNA screening data. The authors identified IKBKE as a recurrent target of amplifications in breast cancer, pointing once again to the functional activation of the NF-κB pathway in tumorigenesis. This finding is shared by the largest-scale sequencing efforts to date, although it is based on an almost entirely different set of genes (42). From the many cancer genome analyses available to date, a core panel of only 15–20 critical cancer pathways is emerging, which includes a much larger number of cancer genes that are mutated or silenced in neoplasias (49).
These findings are in principle confirmed by large-scale transcriptomics studies that identify the same activated pathways with bioinformatics tools such as clustering and functional gene annotation (amigo.geneontology.org; david.abcc.ncifcrf.gov), gene set enrichment analysis (http://www.broad.mit.edu/gsea), or a systematic approach for the discovery of functional connections among diseases, genetic perturbation, and drug action (Connectivity Map; http://www.broad.mit.edu/cmap). A relatively small set of “usual suspects,” genes that form outstanding peaks on the cancer genome map because of their extraordinary mutation frequency, has been identified in the past. However, most genes are found to be mutated at a very low frequency, which creates the principal problem of sorting the wheat from the chaff. The most powerful strategies for this purpose might be based on integrative “overlay” and functional studies. It has already become apparent that the abundance of somatic mutations most likely affects many genes, but only a small set of functional pathways. The identification of the spectrum of somatic alterations in cancer genomes therefore represents only half the way toward the goal of improved cancer therapy. The most difficult part, translating these data into knowledge and novel therapies, still lies ahead.

References

1. Bennett, S.T. et al. (2005) Toward the 1,000 dollars human genome. Pharmacogenomics 6, 373–382
2. Church, G.M. (2006) Genomes for all. Sci. Am. 294, 46–54
3. Collins, F.S. and Barker, A.D. (2007) Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci. Am. 296, 50–57
4. Schena, M. et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470
5. Shalon, D. et al. (1996) A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res. 6, 639–645
6. Schena, M. et al. (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA 93, 10614–10619
7. DeRisi, J. et al. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14, 457–460
8. Lipshutz, R.J. et al. (1995) Using oligonucleotide probe arrays to access genetic diversity. BioTechniques 19, 442–447
9. Brazma, A. et al. (2001) Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nat. Genet. 29, 365–371
10. Perou, C.M. et al. (1999) Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl. Acad. Sci. USA 96, 9212–9217
11. Sorlie, T. et al. (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98, 10869–10874
12. van 't Veer, L.J. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536
13. van de Vijver, M.J. et al. (2002) A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009
14. Kallioniemi, A. et al. (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258, 818–821
15. Pinkel, D. et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 20, 207–211
16. Pollack, J.R. et al. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23, 41–46
17. Bergamaschi, A. et al. (2006) Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer 45, 1033–1040
18. Chin, K. et al. (2006) Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541
19. Neve, R.M. et al. (2006) A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527
20. Fridlyand, J. et al. (2006) Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer 6, 96
21. Weir, B.A. et al. (2007) Characterizing the cancer genome in lung adenocarcinoma. Nature 450(7171), 893–898
22. Garraway, L.A. et al. (2005) Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122
23. Easton, D.F. et al. (2007) Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447, 1087–1093
24. Futreal, P.A. et al. (2004) A census of human cancer genes. Nat. Rev. Cancer 4, 177–183
25. Tomlins, S.A. et al. (2005) Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648
26. Tomlins, S.A. et al. (2007) Distinct classes of chromosomal rearrangements create oncogenic ETS gene fusions in prostate cancer. Nature 448, 595–599
27. Lipshutz, R.J. (1993) Likelihood DNA sequencing by hybridization. J. Biomol. Struct. Dyn. 11, 637–653
28. Margulies, M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380
29. Emrich, S.J. et al. (2007) Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 17, 69–73
30. Davies, H. et al. (2002) Mutations of the BRAF gene in human cancer. Nature 417, 949–954
31. Bardelli, A. et al. (2003) Mutational analysis of the tyrosine kinome in colorectal cancers. Science 300, 949
32. Wang, Z. et al. (2004) Mutational analysis of the tyrosine phosphatome in colorectal cancers. Science 304, 1164–1166
33. Samuels, Y. et al. (2004) High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554
34. Ikediobi, O.N. et al. (2006) Mutation analysis of 24 known cancer genes in the NCI-60 cell line set. Mol. Cancer Ther. 5, 2606–2612
35. Stephens, P. et al. (2005) A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat. Genet. 37, 590–592
36. Futreal, P.A. et al. (2005) Somatic mutations in human cancer: insights from resequencing the protein kinase gene family. Cold Spring Harb. Symp. Quant. Biol. 70, 43–49
37. Davies, H. et al. (2005) Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 65, 7591–7595
38. Greenman, C. et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158
39. Thomas, R.K. et al. (2007) High-throughput oncogene mutation profiling in human cancer. Nat. Genet. 39, 347–351
40. Wang, T.L. et al. (2002) Prevalence of somatic alterations in the colorectal cancer cell genome. Proc. Natl. Acad. Sci. USA 99, 3076–3080
41. Sjoblom, T. et al. (2006) The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274
42. Wood, L.D. et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science 318(5853), 1108–1113
43. Calin, G.A. and Croce, C.M. (2006) MicroRNA signatures in human cancers. Nat. Rev. Cancer 6, 857–866
44. Stransky, N. et al. (2006) Regional copy number-independent deregulation of transcription in cancer. Nat. Genet. 38, 1386–1396
45. Barski, A. et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837
46. Taylor, K.H. et al. (2007) Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing. Cancer Res. 67, 8511–8518
47. Tomlins, S.A. et al. (2007) Integrative molecular concept modeling of prostate cancer progression. Nat. Genet. 39, 41–51
48. Boehm, J.S. et al. (2007) Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell 129, 1065–1079
49. Vogelstein, B. and Kinzler, K.W. (2004) Cancer genes and the pathways they control. Nat. Med. 10, 789–799

Chapter 6

Mining Expressed Sequence Tag (EST) Libraries for Cancer-Associated Genes

Armin O. Schmitt

Summary

Originally established at the beginning of the 1990s as a direct route to gene finding, expressed sequence tags (ESTs) still lend themselves as a means to analyze gene expression in almost all human tissues. The questions that can be addressed using public EST libraries range from tissue-specific gene profiling to the comparison between tissues in diseased and healthy states. Thanks to a multitude of web-based bioinformatics resources, mining EST libraries is not restricted to experts in the field of data analysis, but can readily be performed by the medical or life scientist. In this chapter, a couple of case studies are presented that guide scientists to the most useful online resources so that they can conduct their own research.

Key words: Gene expression, Differential gene expression, cDNA library, One-pass sequencing, Expressed sequence tag (EST), dbEST, Cancer genes, Online web tools, Statistical analysis, Bioinformatics

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_6, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

The only way to collect an organism’s full gene complement is, of course, to determine its genomic sequence. The technology of full-genome sequencing was developed in the late 1990s and culminated in the sequencing of the human genome in 2001. However, even before that era, scientists knew approximately the range of genes that can be found in a eukaryotic organism. This knowledge was gained primarily thanks to a technology called expressed sequence tags (ESTs) (1). An EST is a short sequence, approximately 400 to 600 bases long, that is obtained by sequencing a messenger RNA (mRNA) just once. This leads
to a relatively high error rate of approximately 2–5%, because errors cannot be corrected as they would be with multiple sequencing passes. The principal feature of an EST is that it represents a gene despite its short length and its relative inaccuracy. Because its sequence similarity to the gene remains high, it can be associated uniquely and safely with exactly one gene. An EST can thus be seen as a gene’s signature, as a pars pro toto. Furthermore, no intronic sequence, the noncoding part between the exons, will be copied into mRNA, and, hence, into an EST, although introns are generally considered an integral part of genes. With the exception of 3′ ESTs, which often overlap so-called untranslated regions (UTRs), ESTs generally represent coding sequence, i.e., sequence that is used as a construction plan for proteins. A whole EST library is generated by randomly picking clones from a cDNA library that was produced from a well-defined tissue and by sequencing these clones from the 5′ end to the 3′ end. It is in the nature of the construction of EST libraries that genes whose mRNA is frequent in the tissue under investigation will be represented with many copies in the EST library. Likewise, genes whose mRNA is rare in the tissue will be represented with few copies in the EST library. We can assume in good faith that the expression strength of a gene in a certain tissue or cell type is proportional to the number of ESTs with which the gene is represented in an EST library. Briefly, in an ideal EST library, the number of copies of an mRNA type is reflected by its number of ESTs in the EST library. A shortcoming of the EST approach is certainly that genes rarely expressed in a tissue will very likely not be present in the EST library at all. It is, therefore, not valid to conclude that a gene is not expressed in a certain tissue from the fact that it is not represented in an EST library.
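The proportionality argument and its shortcoming for rare transcripts can be made concrete with a small calculation. The Python sketch below is only an illustration under simplifying assumptions: clones are picked independently and at random, the library is neither normalized nor subtracted, and the function names and example numbers are ours, not part of any EST resource.

```python
def expected_est_count(mrna_fraction: float, library_size: int) -> float:
    """Expected number of ESTs for a transcript that makes up
    `mrna_fraction` of all mRNA molecules in the tissue."""
    return mrna_fraction * library_size


def prob_absent(mrna_fraction: float, library_size: int) -> float:
    """Probability that the transcript yields no EST at all when
    `library_size` clones are picked independently at random."""
    return (1.0 - mrna_fraction) ** library_size


# Hypothetical example: a transcript that accounts for 1 in 100,000 mRNA
# molecules, sampled in a library of 5,000 ESTs.
fraction, size = 1e-5, 5_000
print(f"expected ESTs: {expected_est_count(fraction, size):.2f}")
print(f"probability of absence: {prob_absent(fraction, size):.2f}")
```

Under these assumptions, the transcript is expected to be missing from the library in the great majority of cases, which is why absence from an EST library cannot be read as absence of expression.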
In order to reduce the complexity of an EST library and, thus, to facilitate gene discovery, subtracted and normalized EST libraries can be generated (2). These methods deplete the small set of very highly expressed genes, the so-called housekeeping genes, that are well studied and not relevant for further analyses. It must, however, be noted that such subtracted or normalized libraries are not suitable for the analysis of gene expression ratios, because the number of ESTs of a gene in them is in general not proportional to the mRNA copy number in the tissue. The most prominent collection of public EST data is the database dbEST (3, 4), maintained by the National Center for Biotechnology Information (NCBI) in the United States (http://www.ncbi.nlm.nih.gov/dbEST). Currently, dbEST comprises more than 62 million ESTs from several hundred organisms, the highest numbers for individual organisms being eight million for human, almost five million for mouse, and more than one million each for eleven other organisms, including economically important domestic species such as cattle, pig, rice, maize, and wheat.

Nowadays, in the whole-genome sequencing era, the EST approach has lost none of its significance. EST libraries are being produced in cases in which sequencing of a whole genome is difficult because of its large size or its repetitive structure (5). In combination with genomic data, EST libraries are an indispensable resource for predicting genes and for determining gene boundaries and gene structures. For these purposes, ESTs are aligned against the genomic sequence; exons are identified wherever an EST or a part of an EST can be aligned sufficiently well with a stretch of genomic DNA (6). ESTs are, furthermore, particularly helpful in the detection of alternative splicing (7). Before the advent of cheap, large-scale microarrays, EST libraries were the only high-throughput method for differential gene expression studies in, for instance, healthy and cancerous tissue (8). Analysis of gene expression using EST libraries ab initio involves a complex experimental and analytical pipeline. At the beginning of such a pipeline, EST libraries have to be generated. This includes preparation of the tissue, cloning of the mRNAs, and sequencing of the clones, as described, e.g., in ref. (9). Once sequenced, the ESTs have to be processed in many ways to remove experimental artifacts and to control their quality. This includes the removal of vector sequences and low-quality sequences and the masking of repetitive sequences (10). In a next step, ESTs derived from one and the same mRNA can be assembled: for each EST of an EST library, the sequence comparison program BLAST (11) is used to search for all other ESTs, from that or any other library, that have sufficient sequence similarity with it (usually more than 95%). The group of all ESTs that are sufficiently similar to each other is called an EST cluster. A consensus sequence can be determined from such an EST cluster by finding the most frequent nucleotide for each position of the cluster.
This consensus sequence is typically longer than the sequence of the individual ESTs obtained from a clone because the ESTs are not derived from exactly the same parts of the clone. An iterative search and assembly strategy has been suggested that in many cases allows one to obtain the full-length sequence of a gene (8). All publicly available ESTs are available as clusters in the UniGene databases (12, 13). Because each of the analysis steps sketched above demands expertise in molecular biology or bioinformatics, detailed descriptions would go beyond the scope of this chapter. For the biomedical investigator who is interested in generating and analyzing EST libraries of their own, the most important processing tools are reviewed in ref. (10). Luckily, a wealth of EST libraries prepared and documented to state-of-the-art standards is publicly accessible so that, together with web-based analysis tools, enough material is available for analyses of various kinds. In the remainder of this chapter, we will therefore focus on analytical methods that can be applied with an ordinary browser on a standard PC.
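The consensus step described above, taking the most frequent nucleotide at each position of an EST cluster, can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used by UniGene or any assembly program: it assumes that the ESTs of the cluster have already been aligned and padded with "-" to equal length, and the function name and the toy sequences are invented for the example.

```python
from collections import Counter


def consensus(aligned_ests: list[str], gap: str = "-") -> str:
    """Majority-vote consensus of equal-length, pre-aligned EST strings.

    Gap characters are ignored when counting; a column consisting only of
    gaps contributes nothing. Ties are resolved in favor of the nucleotide
    seen first in the column.
    """
    if not aligned_ests:
        return ""
    length = len(aligned_ests[0])
    if any(len(seq) != length for seq in aligned_ests):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned_ests):
        bases = [base for base in column if base != gap]
        if bases:
            result.append(Counter(bases).most_common(1)[0][0])
    return "".join(result)


# Three hypothetical, overlapping ESTs from the same clone; the third one
# carries a sequencing error (A instead of C) at the seventh position.
cluster = [
    "ACGTTGCA----",
    "--GTTGCATG--",
    "----TGAATGCC",
]
print(consensus(cluster))
```

In this toy cluster, each EST covers only eight positions, whereas the consensus spans all twelve, and the single discordant base is outvoted by the other two ESTs.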

2. Materials

All that is needed to carry out the analyses sketched under Subheading 3 is a PC with a browser such as Firefox and an internet connection.

3. Methods

One of the most important types of analysis possible with publicly available EST libraries and tools is the search for genes that are differentially expressed between cancerous and healthy tissues. The Cancer Genome Anatomy Project (CGAP) serves exactly this purpose (14, 15). Currently, CGAP comprises 276 human EST libraries, 176 of which were derived from cancerous tissues, the rest being derived from healthy or uncharacterized tissues (see Note 1). Forty-one tissue types are currently represented, with the most represented tissues being brain, colon, lung, and ovary. Two closely related tools to identify differentially expressed genes are offered: the xProfiler (http://cgap.nci.nih.gov/Tissues/xProfiler) and the cDNA Digital Gene Expression Displayer (DGED) (http://cgap.nci.nih.gov/Tissues/GXS). The analysis sessions for the two tools are practically identical; only the output differs (see Note 1). In the following, we are interested in finding genes that could play a role in human colon cancer.

3.1. Identification of Differentially Expressed Genes

We would like to obtain very reliable results and therefore include only libraries obtained from microdissected tissues. Furthermore, we would like to calculate expression ratios, i.e., numbers that tell us how many times more (or less) often a gene occurs in cancerous tissue than in healthy tissue. We therefore confine our analysis to non-normalized libraries, because only this type guarantees proportionality between mRNA abundance and EST number.

1. Point your browser to http://cgap.nci.nih.gov/Tissues/XProfiler (see Note 2).
2. Select the organism Homo sapiens (default).
3. Select the library group “All EST libraries” (default). Unless there are very good reasons why you would like to exclude EST libraries from a specific center or project, it is recommended to include all EST libraries to gain statistical power.
4. Select the minimum number of sequences/library. I recommend not excluding any EST library just because it is small. Therefore, it is advisable to change the default (10 for xProfiler and 1,000 for DGED) to 0.
5. Do not alter the default setting in “List libraries by.”
6. In Pools A and B, choose “Colon” in “Tissue Type.” Be sure that the “Include” radio button is activated.
7. In Pools A and B, choose “Microdissected” in “Tissue Preparation.”
8. In Pools A and B, choose “Non-normalized” in “Library Protocol.”
9. In Pool A, choose “Normal” and in Pool B, choose “Cancer” in “Tissue histology.” If you would like to include tissues in a precancerous state, choose “Pre-cancer” in addition to “Cancer.” This is achieved by holding down the “Control” key on your keyboard while clicking on “Pre-cancer.”
10. Do not enter anything into “Library Name” for Pools A and B. This field should be used only if you want to compare two specific EST libraries whose exact identifiers are known to you. This is not the case in a typical exploratory analysis, such as is described here.
11. Next, click the “Submit Query” button and be patient. The server can easily be busy for a minute or more, depending on the time of day.
12. As an intermediate result, the list of libraries fulfilling your requirements is presented. Go carefully through the short description that is provided. The names of the libraries are linked to a more detailed description in case you need more information. At any time, you can return from the detailed description via the back button of your browser. Make sure that you want to include the presented libraries in your analysis. Otherwise, exclude a library from the pool by clearing (deselecting) the corresponding box.
13. If you are convinced that the assignment of the libraries to Pools A and B is correct, submit the query again.
14. After a while, xProfiler presents its results as a table containing numbers of genes. Genes are classified as “Unique,” which means that they occur exclusively in colon (regardless of the tissue histology), or “Non-Unique,” and as “Known,” which means that they were characterized in earlier studies and that their function is at least partially known, or “Unknown.” The numbers of genes in each class are given for Pools A and B separately, for the union of A and B, for the intersection of A and B, and for the differential subsets (in A, but not in B; in B, but not in A).
15. Clicking on the highlighted numbers will provide the corresponding lists of genes. Clicking on “Gene info” will offer you a plethora of information related to a gene in which you are interested.
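The pool comparisons that the result table in step 14 summarizes are ordinary set operations. The Python sketch below illustrates them on invented data; the function name and the placeholder gene identifiers are ours, and the code does not query CGAP or reproduce its output format.

```python
def compare_pools(pool_a: set[str], pool_b: set[str]) -> dict[str, set[str]]:
    """Return the gene sets behind the columns of a presence/absence
    comparison of two library pools."""
    return {
        "A": pool_a,
        "B": pool_b,
        "A or B": pool_a | pool_b,       # union
        "A and B": pool_a & pool_b,      # intersection
        "A minus B": pool_a - pool_b,    # present in A only
        "B minus A": pool_b - pool_a,    # present in B only
    }


# Placeholder identifiers, not real expression data.
normal_pool = {"GENE1", "GENE2", "GENE3"}
cancer_pool = {"GENE2", "GENE3", "GENE4", "GENE5"}

for name, genes in compare_pools(normal_pool, cancer_pool).items():
    print(f"{name}: {len(genes)} genes {sorted(genes)}")
```

Because only presence and absence enter the comparison, a gene with one EST counts exactly as much as a gene with one hundred ESTs.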

A session with xProfiler tells you whether genes are present in or absent from the libraries. The most informative part of the output is probably the genes hidden behind “A minus B” and “B minus A,” because they seem to be strictly related to cancer in the sense that they are either activated by cancer or suppressed by cancer. The output does not tell you how many times a gene is represented in each of the pools, which could be valuable information. If a gene is present in both pools, you cannot judge in which pool it prevails. To address questions of this type, DGED is the correct tool. As mentioned above, the analysis is analogous to that with xProfiler. To start DGED, point your browser to http://cgap.nci.nih.gov/Tissues/GXS and proceed through step 11 as described above (see Note 2). As an intermediate result, you now receive, in addition to the list of libraries with their assignment to the pools, three more criteria that you can use to filter your results. First, you have to decide on the expression ratio. The default value of 2 means that genes that are at least twofold overexpressed in either the healthy or the cancer pool will be shown. Second, you have to decide which statistical significance is appropriate for your analysis. The default value of 0.05 means that a distribution of ESTs of a gene as uneven as, or more uneven than, the one that you observe between the two pools would happen by chance in 5% of all cases. Last, you can confine your search to a given chromosome. In an initial search, it is advisable to leave the default settings unaltered. A search for genes differentially expressed in colon with the default settings leads to no result (“No tags were found”). Obviously, our criteria were too strict. We can repeat the last step of the analysis by returning via the back button of the browser and then setting F and P equal to 1. This shows us all 132 genes.
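The two filter criteria described above, expression ratio and statistical significance, can be illustrated with a short Python sketch. It rests on assumptions that should be stated clearly: the text does not specify which statistical test DGED applies, so a two-sided Fisher exact test is used here merely as a stand-in, and the function names, pool sizes, and EST counts are invented for the example.

```python
from math import comb, nan


def expression_ratio(count_a, total_a, count_b, total_b):
    """Ratio A:B of the relative EST frequencies of one gene in two pools."""
    freq_a = count_a / total_a
    freq_b = count_b / total_b
    # Without any EST in pool B, the ratio would require division by zero.
    return freq_a / freq_b if freq_b > 0 else nan


def fisher_exact_p(count_a, total_a, count_b, total_b):
    """Two-sided Fisher exact P value for the distribution of a gene's
    ESTs between two pools of total_a and total_b ESTs."""
    gene_total = count_a + count_b       # ESTs of the gene in both pools
    grand_total = total_a + total_b      # all ESTs in both pools
    denominator = comb(grand_total, gene_total)

    def prob(x):
        # Hypergeometric probability that x of the gene's ESTs fall in pool A.
        return comb(total_a, x) * comb(total_b, gene_total - x) / denominator

    observed = prob(count_a)
    lowest = max(0, gene_total - total_b)
    highest = min(gene_total, total_a)
    # Sum over all partitions that are as likely as, or less likely than,
    # the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical gene: 30 ESTs among 20,000 in pool A, 10 among 20,000 in pool B.
ratio = expression_ratio(30, 20_000, 10, 20_000)
p_value = fisher_exact_p(30, 20_000, 10, 20_000)
print(f"ratio A:B = {ratio:.1f}, P = {p_value:.4f}")
```

With the default thresholds of 2 for the ratio and 0.05 for P, this hypothetical gene would pass both filters. A gene without any EST in pool B yields an undefined ratio, so only the P value remains for judging it.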
An inspection of the list that we obtained indicates that all genes are represented by very few ESTs (column “Sequences”). The most significant P value is 0.15 for the gene RPL13, a ribosomal protein. We can loosen our criteria in many different ways. If we decide to include bulk tissue, for example, we have to repeat the analysis and alter the selection in step 7. This time, 47 genes fulfill the criteria of differential expression and statistical significance. The “best” gene is now COX6C, the cytochrome c oxidase subunit VIc, which occurs 12 times in the pool of healthy tissue and which is absent from the pool of cancerous tissue. The P value for this partition is smaller than the precision provided in the list and is therefore indicated as 0.00. It must, however, be noted that P values can never attain the value of 0 exactly. In the column marked “Seq Odds A:B,” we see the symbol NaN, which stands for “not a number” (see Note 3). This is because we have no representative ESTs for this gene in one of the pools. Because this value is calculated as the ratio of the relative frequencies of an EST type in the two pools, this would mean division by 0, which is, of course, not defined. In such cases, the P value has to serve as


the sole criterion to judge the occurrence pattern of a gene. In cases of very clear P values (very close to 0), this is no considerable limitation.

Along similar lines, the two tools presented above, Xprofiler and DGED, can be used to compare different tissues and to extract, e.g., genes whose expression differs between colon and prostate. A very useful type of analysis would be the search for genes that are specific for a tissue. This can be easily realized by setting up one pool with the tissue under investigation and the other pool with all other tissues using the “Exclude” radio button mentioned under step 6 in Subheading 3.2.

A very similar tool is Digital Differential Display (DDD), which can be started by pointing your browser to the Unigene home page http://www.ncbi.nlm.nih.gov/sites/entrez (see Note 2), and then choosing “DDD” from the left menu bar. The main advantage over Xprofiler and DGED is that you can assign names to your pools and that the results are highlighted visually. The disadvantage is that you cannot set thresholds for the expression ratio and the P value and that these two values are not presented in the output (see Note 4). You can, however, easily determine the expression ratio yourself by dividing the two relative frequencies of ESTs in the pools for a given gene. Examples of medically important results obtained with DDD are presented, e.g., in refs. (16, 17).

3.2. Mining EST Libraries for Genes That Are Coexpressed with an Interesting Gene

Another interesting way to exploit EST libraries is to use them to predict gene function for novel genes. The rationale behind it is that a priori unknown genes that behave similarly to well-characterized genes could have a comparable function or could take part in the same pathway. By “similar behavior,” we mean similar expression profiles across a multitude of different tissues. Put simply: the unknown genes are highly expressed in the same tissues as the well-characterized gene and are barely expressed, or not expressed at all, in the tissues where the well-characterized gene is not expressed. This idea was termed “guilt by association (GBA)” and was described in ref. (18). An example of how Parkinson’s disease genes could be found by GBA is given in ref. (19). You can carry out GBA analyses using the GBA server provided by Peking University. In our case study, we are interested in finding genes that are co-regulated with the well-known breast cancer gene BRCA1. Unfortunately, the web interface is not very convenient, because you have to enter the Unigene ID and the “GBA Gene Matcher” offered on the right menu bar is not active. However, we can readily find out the Unigene ID for our gene of interest thanks to another project, called GeneCards (20).
1. Point your browser to http://www.genecards.org (see Note 2).
2. Beneath “SEARCH,” enter “brca1” into the field and click “Symbol only” under “Search by.”
3. Under “Options,” activate “Show microcards only” and “Sort microcards alphabetically.”
4. Press the “Go” button.
5. Look for the text string “unigene cluster” using the search function of your browser. The ID of the form “Hs.XYZ,” where XYZ is a number, is the Unigene ID of your gene. It is Hs.194143 for BRCA1.
6. Now the actual GBA analysis can start. Point your browser to http://gba.cbi.pku.edu.cn:8080/gba (see Note 2).
7. Find the “GBA Engine” button on the left side of the browser and click on it.
8. Enter the Unigene ID in the field next to “UniGeneID” and your e-mail address into the field next to “Your Mail Address.”
9. Leave all other default settings as they are and press the “Search” button.
10. The result will be shown on the screen and be sent to you by e-mail after a while.
11. The result page shows you a list of 30 genes that occur preferentially in the same EST libraries as BRCA1 and that, furthermore, preferentially do not occur in the EST libraries where BRCA1 does not occur. This list is ordered by statistical significance, i.e., the top-ranking gene is the one with the smallest P value (see Note 4).
12. Choosing the “Show” button for “more information on co-expressed genes” presents you with a list containing a short description of the gene function and the usual gene name.
13. Clicking on the highlighted UnigeneIDs in the result page mentioned in step 11 provides you with very detailed information about the genes, such as the LocusLink entry or gene ontology (GO) terms (21) associated with the genes.
14. The “Run GBA” button will start a new GBA search for the given gene from the list (not for BRCA1 again).
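The principle behind such a ranking can be sketched in a few lines. The example below assumes that each gene is described by the set of EST libraries in which it occurs and scores co-occurrence with a one-sided hypergeometric (Fisher-type) test, followed by a Bonferroni correction; this is an illustration of the idea only, not the algorithm of the GBA server, and all names and data are made up.

```python
import math


def cooccurrence_p(target_libs, candidate_libs, n_libraries):
    """P value for observing at least the seen overlap of libraries by chance."""
    overlap = len(target_libs & candidate_libs)
    n_target = len(target_libs)
    n_candidate = len(candidate_libs)
    denom = math.comb(n_libraries, n_candidate)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        math.comb(n_target, x) * math.comb(n_libraries - n_target, n_candidate - x)
        for x in range(overlap, min(n_target, n_candidate) + 1)
    )
    return tail / denom


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return {gene: min(1.0, p * n_tests) for gene, p in p_values.items()}


# Hypothetical data: 10 libraries, numbered 0-9.
n_libraries = 10
target = {0, 1, 2, 3}                 # libraries containing the gene of interest
candidates = {
    "geneX": {0, 1, 2, 3},            # always found together with the target
    "geneY": {0, 5, 6, 7},            # shares only one library with the target
}

raw = {gene: cooccurrence_p(target, libs, n_libraries)
       for gene, libs in candidates.items()}
adjusted = bonferroni(raw)
ranking = sorted(raw, key=raw.get)    # smallest P value first

for gene in ranking:
    print(gene, round(raw[gene], 4), round(adjusted[gene], 4))
```

Because every candidate gene is tested separately, the raw P values overstate significance; the corrected values are the ones to compare against a threshold.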

4. Notes

1. Molecular biological methods evolve at a very fast pace, and so do the techniques to analyze and organize any data derived from their high-throughput application. This also holds true for the content behind and the appearance of online web tools such as those described here. So be prepared that you probably will not be able to reproduce exactly the examples given in this chapter. Because EST libraries are added or removed, the results will change accordingly. This is, however, very natural and does not mean that anything went “wrong” in your analysis.
2. In addition, be prepared that the internet address (URL) that leads you to the online web tools can change owing to reorganization at the major molecular biology research centers such as NCBI. If this has happened, there is still a great chance to locate the tool using a general search engine such as Google. Typing in keywords like “xprofiler, est” will probably lead you quickly to the new URL.
3. Unfortunately, terminology used in the field of biomolecular analyses, as in many other fields of science, is sometimes ambiguous. For example, the term “odds ratio” is used in the DGED tool in a different way than it is used in medical statistics, where it is used to quantify risks (see Subheading 3.1).
4. Statistical issues are an integral part of bioinformatics analyses such as those described here. Of particular importance is the P value, which tells you how likely it is to observe a result (such as differential gene expression) by chance, i.e., if in reality there is no differential expression and your finding is merely due to an atypical sample. Data mining typically produces lists of genes (or any other biological entities) rather than single genes. If the analysis includes statistical testing, i.e., the calculation of P values, then statisticians speak of “multiple testing.” In such a case, it has to be considered that the P values are calculated as if only one gene was analyzed. To obtain P values that are more realistic, they have to be corrected. The simplest such correction method is the so-called Bonferroni correction. It says that all P values obtained in multiple testing should be multiplied by the number of tests performed. Therefore, if you obtained a list including 12 genes in an analysis, multiply each P value by 12. Notice that some applications, such as the GBA engine, provide the possibility of correcting P values (see step 11 in Subheading 3.1).

References

1. M.D. Adams, M.B. Soares, A.R. Kerlavage, C. Fields, and J.C. Venter. Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nat. Genet., 4:373–380, 1993.
2. M.F. Bonaldo, G. Lennon, and M.B. Soares. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res., 6:791–806, 1996.
3. M.S. Boguski, T.M. Lowe, and C.M. Tolstoshev. dbEST – database for “expressed sequence tags”. Nat. Genet., 4:332–333, 1993.
4. M.S. Boguski. The turning point in genome research. Trends Biochem. Sci., 20:295–296, 1995.
5. J.L. Bennetzen. Mechanisms and rates of genome expansion and contraction in flowering plants. Genetica, 115:29–36, 2002.
6. Z. Kan, E.C. Rouchka, W.R. Gish, and D.J. States. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res., 11:889–900, 2001.
7. B. Modrek, A. Resch, C. Grasso, and C. Lee. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res., 29:2850–2859, 2001.
8. A.O. Schmitt, T. Specht, G. Beckmann, E. Dahl, C.P. Pilarsky, B. Hinzmann, and A. Rosenthal. Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res., 27:4251–4260, 1999.
9. M.D. Adams, J.M. Kelley, J.D. Gocayne, M. Dubnick, M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B. Olde, and R.F. Moreno. Complementary DNA sequencing: expressed sequence tags and human genome project. Science, 252:1651–1656, 1991.
10. S.H. Nagaraj, R.B. Gasser, and S. Ranganathan. A hitchhiker’s guide to expressed sequence tag (EST) analysis. Brief. Bioinform., 8:6–21, 2007.
11. S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
12. M.S. Boguski and G.D. Schuler. ESTablishing a human transcript map. Nat. Genet., 10:369–371, 1995.
13. G.D. Schuler, M.S. Boguski, E.A. Stewart, L.D. Stein, G. Gyapay, K. Rice, R.E. White, P. Rodriguez-Tomé, A. Aggarwal, E. Bajorek, S. Bentolila, B.B. Birren, A. Butler, A.B. Castle, N. Chiannilkulchai, A. Chu, C. Clee, S. Cowles, P.J. Day, T. Dibling, N. Drouot, I. Dunham, S. Duprat, C. East, C. Edwards, J.B. Fan, N. Fang, C. Fizames, C. Garrett, L. Green, D. Hadley, M. Harris, P. Harrison, S. Brady, A. Hicks, E. Holloway, L. Hui, S. Hussain, C. Louis-Dit-Sully, J. Ma, A. MacGilvery, C. Mader, A. Maratukulam, T.C. Matise, K.B. McKusick, J. Morissette, A. Mungall, D. Muselet, H.C. Nusbaum, D.C. Page, A. Peck, S. Perkins, M. Piercy, F. Qin, J. Quackenbush, S. Ranby, T. Reif, S. Rozen, C. Sanders, X. She, J. Silva, D.K. Slonim, C. Soderlund, W.L. Sun, P. Tabar, T. Thangarajah, N. Vega-Czarny, D. Vollrath, S. Voyticky, T. Wilmer, X. Wu, M.D. Adams, C. Auffray, N.A. Walter, R. Brandon, A. Dehejia, P.N. Goodfellow, R. Houlgatte, J.R. Hudson, S.E. Ide, K.R. Iorio, W.Y. Lee, N. Seki, T. Nagase, K. Ishikawa, N. Nomura, C. Phillips, M.H. Polymeropoulos, M. Sandusky, K. Schmitt, R. Berry, K. Swanson, R. Torres, J.C. Venter, J.M. Sikela, J.S. Beckmann, J. Weissenbach, R.M. Myers, D.R. Cox, M.R. James, D. Bentley, P. Deloukas, E.S. Lander, and T.J. Hudson. A gene map of the human genome. Science, 274:540–546, 1996.
14. R.L. Strausberg, S.F. Greenhut, L.H. Grouse, C.F. Schaefer, and K.H. Buetow. In silico analysis of cancer through the Cancer Genome Anatomy Project. Trends Cell Biol., 11:66–71, 2001.
15. R.L. Strausberg. The Cancer Genome Anatomy Project: new resources for reading the molecular signatures of cancer. J. Pathol., 195:31–40, 2001.
16. D. Scheurle, M.P. DeYoung, D.M. Binninger, H. Page, M. Jahanzeb, and R. Narayanan. Cancer gene discovery using digital differential display. Cancer Res., 60:4037–4043, 2000.
17. H.L. Yang, E.Y. Cho, K.H. Han, H. Kim, and S.J. Kim. Characterization of a novel mouse brain gene (mbu-1) identified by digital differential display. Gene, 395:144–150, 2007.
18. M.G. Walker, W. Volkmuth, E. Sprinzak, D. Hodgson, and T. Klingler. Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. Genome Res., 9:1198–1203, 1999.
19. M.G. Walker, W. Volkmuth, and T.M. Klingler. Pharmaceutical target discovery using Guilt-by-Association: schizophrenia and Parkinson’s disease genes. Proc. Int. Conf. Intell. Syst. Mol. Biol., 282–286, 1999.
20. M. Rebhan, V. Chalifa-Caspi, J. Prilusky, and D. Lancet. GeneCards: integrating information about genes, proteins and diseases. Trends Genet., 13:163, 1997.
21. M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25:25–29, 2000.

Chapter 7

Automated Fluorescent Differential Display for Cancer Gene Profiling

Jonathan D. Meade, Yong-jig Cho, Blake R. Shester, Jamie C. Walden, Zhen Guo, and Peng Liang

Summary

Since its invention in 1992, differential display (DD) has become the most commonly used technique for identifying differentially expressed genes because of its many advantages over competing technologies such as DNA microarray, serial analysis of gene expression (SAGE), and subtractive hybridization. A large number of DD publications have been in the field of cancer, specifically on p53 target genes. Despite the great impact of the method on biomedical research, there has been a lack of automation of DD technology to increase its throughput and accuracy for systematic gene expression analysis. Much previous DD work has taken a “shotgun” approach of identifying one gene at a time, with a limited number of polymerase chain reactions (PCRs) set up manually, giving DD a low-tech and low-throughput image. We have optimized the DD process with a platform that incorporates fluorescent digital readout, automated liquid handling, and large-format gels capable of running entire 96-well plates. The resulting streamlined fluorescent DD (FDD) technology offers unprecedented accuracy, sensitivity, and throughput in comprehensive and quantitative analysis of gene expression. These major improvements will allow researchers to find differentially expressed genes of interest, both known and novel, quickly and easily.

Key words: Fluorescent differential display, DD, FDD, Differential gene expression, Automation, Cancer gene profiling, Differential display on automated sequencer

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_7, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

The complete sequencing of the 3 billion base pair (bp) human genome was an amazing accomplishment, but the hardest work is still ahead of us. Of the estimated 20,000–25,000 genes embedded in our genome, only a fraction of them, perhaps 10–15%, are “turned on” (expressed as messenger RNAs [mRNAs] for protein synthesis) at any given time in each of our cells. Thus, interpretation of the genomic instructions in the post-genome era will have to rely, at least in large part, on tools that allow us to determine when and where a gene is turned on or off in a cell as it divides, differentiates, and ages. Such tools are also important for detecting when and where a seemingly precise interpretation of genomic instructions goes awry, which underlies many disease states including cancer. Because so many genes are involved in cancer, it is very difficult to study and understand. Making it even more difficult is that many oncogenes or tumor-suppressor genes are signaling molecules themselves, each of which functions to control the expression of a subset of downstream genes (1, 2). So, the analysis of differential gene expression (also known as gene profiling, expression genetics, or functional genomics) has become one of the most widely used strategies for discovering and understanding the molecular circuitry underlying cancer.

Differential display (DD) technology (3) is one of the major tools that has already helped thousands of researchers all over the world interpret gene expression in diverse biological systems ranging from yeast, fungi, plants, insects, worms, fish, reptiles, and amphibians to mammals (4–6). Since the invention of DD in 1992, the number of publications using DD has exploded to more than 3,900, outnumbering the publications using other competitive methodologies such as DNA microarrays (7, 8), serial analysis of gene expression (SAGE) (9), and subtractive hybridization (10) (see Table 1). Hundreds of these DD publications have been in the field of cancer. Many oncogene targets have been identified by DD, including genes that are regulated by RAS (11, 12), v-REL (13), and ERBB (14).

Table 1
Impact of major technologies in differential gene expression analysis

Method                 # of citations   Original publication
Differential display   3,965            Science 1992, 257:967–971
DNA microarrays        3,462            Science 1995, 270:467–470
SAGE                   2,036            Science 1995, 270:484–487
Oligo arrays           905              Science 1996, 274:610–614

Number of citations is the number of times the original publication has been cited by other papers, which reflects the number of times each technique has been used for publications. Search done with ISI Web of Knowledge Citation Search. Search conducted on January 25, 2008 at http://isi15.isiknowledge.com/portal.cgi?DestApp=WOS&Func=Frame


One of the RAS target genes was shown to be a new cytokine, now known as interleukin (IL)-24 (15). The power of DD can be illustrated by using p53, the main tumor suppressor gene, as an example. Increasing numbers of candidate p53 target genes are being identified (16) and, amazingly, approximately half of the better understood p53 target genes were identified by DD (see Table 2) (17–43).

Table 2
List of the major potential p53 target genes identified by different technologies

Gene(s)        Definition/function                          Method(a)      Reference(s)
Mdm-2          p53 negative regulator                       Candidate      (17)
p21            Cdk2 inhibitor                               SH, SAGE       (18)
14-3-3 sigma   Growth inhibition                            SAGE           (17)
GADD45         DNA repair                                   SH             (17)
Bax            Apoptosis                                    Candidate      (19)
Cyclin G       Cell cycle regulator                         DD             (20)
IGFBP-3        IGF binding protein, growth inhibition       SH             (21)
PIG3           NADPH-quinone oxidoreductase                 SAGE           (22)
KILLER/DR5     Apoptosis                                    SH, SAGE       (23)
Ei24/PIG8      Novel gene, apoptosis                        DD, SAGE       (22, 24)
PAG608         Novel zinc finger protein, apoptosis         DD             (25)
DDA3           Novel gene, growth inhibition                DD             (26)
TP53TG1        Novel gene, DNA-damage                       DD             (27)
TP53TG3        Novel gene, cell cycle checkpoint            DD             (28)
p53R2          Ribonucleotide reductase                     DD             (29)
PERP           Novel gene, pro-apoptotic                    SH             (30)
PIR121         Novel gene, RNA binding                      Array          (31)
Noxa           Novel gene, pro-apoptotic BH3 protein        DD             (32)
Pidd           Novel gene, death-domain protein             DD             (33)
p53AIP1        Novel gene, apoptosis, p53 phosphorylation   DD             (34)
p53DINP1       Novel gene, apoptosis, p53 phosphorylation   DD             (35)
PUMA           Novel gene, pro-apoptotic BH3-protein        SAGE, array    (36, 37)
Pirh2          Ubiquitin ligase, p53 negative regulator     DD             (38)
Pac1           Protein phosphatase, pro-apoptotic           Array          (39)
Fas/APO-1      Cell death receptor                          Candidate      (40)
Apaf-1         Apoptosis                                    Array          (41)
PTEN           Tumor suppressor                             Candidate      (42)
Bid            Apoptosis                                    Array          (43)

(a) DD differential display, Array DNA microarray, SAGE serial analysis of gene expression, SH subtractive hybridization, Candidate candidate screening

It is clear that the rapid and successful adoption of differential display is largely attributable to the simplicity of the method. Simplicity ensures a higher probability of success and fewer artifactual differences caused by experimental errors. Essentially, starting from the RNA samples being compared, only two steps, reverse transcription (RT) and polymerase chain reaction (PCR), are needed before the signals generated are analyzed on a gel matrix. No additional steps such as second-strand DNA synthesis, purification of complementary DNA (cDNA), restriction enzyme digestion, adapter primer ligation, probe labeling/normalization, hybridization, or washing are required; each of these steps could introduce and amplify errors or lead to the loss of mRNAs being detected. DD takes advantage of three of the simplest, most powerful, and most commonly used molecular biological methods: RT-PCR, DNA sequencing gel electrophoresis, and cDNA cloning (3–6, 44, 45). The DD methodology, also referred to as DDRT-PCR or DD-PCR in PCR nomenclature (46, 47), begins with total RNA being harvested from the cells/tissues of interest. A researcher will study at least two samples, but many more can be studied if the experimental design calls for it. These samples will have morphological, genetic, or other experimental differences for which the researcher wishes to study the gene expression patterns, hoping to elucidate the root cause of the particular difference or specific genes that are affected by the experiment. Samples can be from any eukaryotic organism, including plants, fish, amphibians, reptiles, insects, yeast, fungi, and mammals. DD can be adapted for prokaryotic systems, but is more often used with eukaryotes. The messenger RNAs (mRNAs) within the total RNA population are used as the templates for DD-PCR after first-strand cDNA synthesis by reverse transcription. 
The current methodology makes use of three “anchored” oligo-dT primers that target the poly-adenylation (poly-A) site of eukaryotic mRNA and


have the form H-T11M, where H is a HindIII restriction site (AAGCTT), T11 is a string of 11 Ts (although the first two Ts come from the HindIII site), and M is G, C, or A (48). They are referred to as “anchor” primers because the non-T base after the string of 11 Ts anchors the primer to the same position in each round of amplification, in contrast to standard oligo-dT primers, which contain only a string of Ts, anneal at multiple positions, and thereby create a smear (see Note 1). The HindIII restriction site was added to the anchor primer design to make the primers longer and more efficient in annealing to the targeted poly-A site, and to facilitate downstream applications such as cDNA cloning. With the current anchor primer design, the cDNA population is divided into three subpopulations, each representing one third of the mRNAs potentially expressed in the cell at any given time. Earlier work used anchor primers of the type T11VN, where V can be A, G, or C and N can be any of the four nucleotides, as well as anchors of the type T12MN, where M is a degenerate mixture of A, G, or C and N is any of the four nucleotides (3). Both of those primer designs result in a larger number of subfractions of the mRNA population (12 for type T11VN and 4 for T12MN), which unnecessarily increases the number of FDD-PCRs needed for the same level of gene coverage compared with the H-T11M primer design. The next step in DD is the PCR amplification of the cDNA subpopulations using each anchor primer (H-T11M) in combination with a set of short “arbitrary” primers of random sequence. These arbitrary 13-mers (H-AP primers) consist of a HindIII restriction site (AAGCTT) followed by a 7-bp backbone of random base combinations. The HindIII restriction site is included in both the anchor and arbitrary primers for more efficient primer annealing and easier downstream manipulation of the cDNA (48). 
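The primer designs described above can be written out explicitly. The following sketch only assembles the sequences from the stated building blocks; the helper names are hypothetical, the 7-base core used for the arbitrary primer is invented for illustration and is not an actual H-AP primer, and the figure of 80 arbitrary primers is merely an example value.

```python
HINDIII_SITE = "AAGCTT"


def anchor_primer(m):
    """Build an H-T11M anchor primer.

    The last two bases of the HindIII site are already Ts, so only nine
    more Ts are appended to obtain a run of 11 Ts before the anchoring base.
    """
    if m not in ("G", "C", "A"):
        raise ValueError("M must be G, C, or A")
    return HINDIII_SITE + "T" * 9 + m


def arbitrary_primer(core):
    """Build an arbitrary 13-mer: HindIII site plus a 7-base core."""
    if len(core) != 7 or set(core) - set("ACGT"):
        raise ValueError("core must consist of exactly 7 bases from A, C, G, T")
    return HINDIII_SITE + core


anchors = [anchor_primer(m) for m in "GCA"]

# Earlier designs, listed for comparison of the number of subfractions.
t11vn = ["T" * 11 + v + n for v in "AGC" for n in "ACGT"]
t12mn = ["T" * 12 + "M" + n for n in "ACGT"]  # "M" stands for the degenerate base

n_possible_cores = 4 ** 7              # all possible 7-base cores
example = arbitrary_primer("GATTACA")  # made-up core, illustration only

print(anchors)
print(len(t11vn), len(t12mn))
print(example, len(example), n_possible_cores)
print("reactions per sample with 80 arbitrary primers:", len(anchors) * 80)
```

The number of PCRs per RNA sample is simply the number of anchor primers multiplied by the number of arbitrary primers, which is why a design with three anchors keeps the workload lower than one with 12.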
The arbitrary primers used in DD represent a random selection from the more than 16,000 (4^7 = 16,384) possible 7-base combinations. Additionally, the length of an arbitrary primer is designed so that, by probability, each will recognize 50–100 mRNAs under a given PCR condition (49). As a result, mRNA 3′ termini defined by any given pair of anchored primer and arbitrary primer are amplified and displayed by denaturing polyacrylamide gel electrophoresis. A mathematical model of estimated gene coverage utilizing various combinations of anchor and arbitrary primers was developed shortly after the advent of differential display technology (49). This mathematical model indicated that approximately 240 primer combinations (3 anchor primers with 80 arbitrary primers) were needed to approach the level of estimated genome-wide screening for eukaryotes (~95%). However, a newer mathematical model (50) predicts that more primer combinations are required to give that level of coverage – using 480 primer combinations (3 anchor primers


with 160 arbitrary primers) would provide ~93% coverage based on the new model. DD was originally optimized with radioactivity using 35S (3). 33P labeling was then developed (48) for better sensitivity and resolution and has been the labeling method most commonly used in publications. However, fluorescent differential display (FDD) (see Fig. 1) was

Fig. 1. Fluorescent mRNA differential display. Three fluorescently labeled one-base anchored oligo-dT primers with 5′ HindIII sites are used in combination with a series of arbitrary 13-mers (also containing 5′ HindIII sites) to reverse transcribe and amplify the mRNAs from a cell.


the next logical progression. In the development of FDD, it was crucial that the new platform have similar sensitivity to traditional DD with isotopic labeling, as well as other advantages that would make the platform a viable and improved alternative to the established DD methodology. FDD, optimized using fluorochrome-labeled anchor primers (generically called FH-T11M) and higher dNTP concentrations in PCR, was shown to be essentially identical in both sensitivity and reproducibility to that of conventional DD (44) (see Fig. 2). Improvements such as elimination of radioactivity, digital data acquisition, and increased assay speed were goals that were successfully reached by the establishment of the FDD platform, representing a marked improvement over conventional DD. After PCR amplification, gel electrophoresis is performed to separate the resulting PCR products by size. Reactions are run side-by-side so that the samples being compared are next to one another for each primer combination. Comparison of the cDNA patterns between or among relevant RNA samples reveals differences in the gene expression profile for each sample (see Fig. 3). Electrophoresis can be performed with denaturing polyacrylamide sequencing gels (3, 51), non-denaturing polyacrylamide gels (46), or with agarose gels (52, 53). Sequencing gels are the most commonly used method and are recommended here because they offer the best band resolution and allow for easy and efficient recovery of genes. In addition, their ability to accommodate a large number of reactions reduces the number of gels that must be run for FDD analysis. Because the resulting cDNAs are fluorescently labeled, the use of a fluorescent imager scanner is required for this technology. Here the FMBIO® laser imager series (MiraiBio, Alameda, CA) is recommended for digital acquisition of the cDNA profiles. 
Although this is the recommended imager, other fluorescent scanners, such as the Typhoon® (GE Healthcare, Piscataway, NJ) and FLA-5000 (FUJIFILM Medical Systems, Stamford, CT) can also be used for FDD with similar sensitivity. Another option for visualization of PCRs is to run samples on an automated sequencer. Our group has successfully used the Applied Biosystems ABI3130xl, a capillary array-based automated sequencer, for FDD band detection with several different fluorophores. These capillary electrophoresis (CE) machines have a laser at a fixed point and, when a fluorescently labeled product passes the laser, a signal is detected. The results of FDD are seen as a series of spectral peaks for each lane, which can be compared to show differences in a very sensitive and reproducible way (see Fig. 4). The use of CE can dramatically cut down on the time and labor required for large-scale FDD screenings. However, the major drawback and bottleneck for using this technology with FDD is that, at this point, there is no way to retrieve bands from


Fig. 2. Comparison of radioactive and fluorescent differential display (FDD). DNA-free RNA from normal (N) and ras oncogene transformed (T) rat embryo fibroblasts were compared in duplicate by either conventional differential display with 33P-labeled (alpha) dATP or FDD with fluorescein-labeled anchor primer under identical PCR conditions. The autoradiogram (a) and fluorescent images in grayscale (b) were compared in sensitivity and reproducibility as indicated. Reproducible differences are marked by arrows. The anchored primer, H-T11G, was used in combination with arbitrary 13-mer, H-AP29.


Fig. 3. Automated FDD result. Four RNA samples (before, and 6, 9, and 12 h after a drug treatment) were compared with one anchor primer in combination with 24 arbitrary primers (only 21 shown) using automation in liquid handling, 132-lane electrophoresis unit, and digital acquisition of gel image. Grey arrows indicate reproducible differences worthy of pursuit.

Fig. 4. Capillary electrophoresis of FDD reactions. RNA samples without (−) and with (+) p53 activation are compared by FDD and samples are run on ABI3100 capillary electrophoresis instrument. A candidate p53 target gene shows up-regulation in the + p53 sample at approximately 305 bp.


the CE results. One would still have to run a gel and detect bands using an alternate method. The most sophisticated attempt to solve this bottleneck was the development of a prototype computer-controlled CE system for positive band identification and retrieval by fraction collection by the Hitachi Japan group (54). However, to our knowledge, no further progress or commercialization has been made. After completion of the gene expression profiles by gel electrophoresis, the next step is to begin characterization of the potential differentially expressed genes of interest. Bands are excised from the gel matrix and reamplified with the same primer combination as the original FDD-PCR and under the same reaction conditions. Generally, a PCR-product cloning step is recommended before differential gene confirmation and sequencing, but this is up to the preferences of the researcher. The PCR-TRAP® Cloning System (GenHunter Corporation, Nashville, TN) is recommended because it is designed specifically for cloning differential display bands and employs highly efficient positive-selection cloning. Because of the potential that more than one distinct cDNA is contained within an excised band, more than one colony should be screened for the correct size before it is characterized. Furthermore, if the screening results indicate that more than one cDNA is present in the colony population, each of the different cDNAs should then be further characterized. Characterization of each potential gene includes sequencing of the cloned cDNAs of interest, with the results giving an indication of whether the cDNA is a known or unknown sequence. As with any differential gene expression technology, one has to be sure that the characterized sequences are actually differentially regulated, i.e., a “real difference,” and not a false positive. 
A variety of confirmation techniques, including Northern blot analysis (55), reverse Northern blot analysis (56), quantitative RT-PCR (qRT-PCR) (55), or real-time PCR (55) can be used. Although each has its own distinct advantages and disadvantages, Northern blot analysis is considered the gold standard for gene expression confirmation and, therefore, is recommended. Despite being labor intensive, time consuming, and requiring a significant amount of RNA, the Northern blot is by far the most accepted tool for confirmation. Northern blots have a distinct advantage over other confirmation methodologies in sensitivity, because both high- and low-level mRNA expression can be validated with this standard assay. The optimized FDD technology is now able to compete with other gene expression tools such as DNA microarray technology because of improved high-throughput capabilities, while maintaining its inherent advantages over microarrays. Interestingly, although several candidate p53 target genes have been isolated by microarrays (see Table 2), few of these genes have actually

Automated Fluorescent Differential Display for Cancer Gene Profiling


been confirmed by Northern blot analysis or functionally characterized, making it unclear whether they are real or, in fact, just “noise” (false positives) from the method itself. Because the DD approach to differential gene expression analysis relies on randomly generated primers, no prior knowledge of the mRNA sequences is required, making the gene screening systematic, non-biased, and with the ability to find unknown genes. In addition, DD allows researchers to study more than two samples simultaneously, with only 10–20 μg total RNA required for a “complete coverage.” Disadvantages of microarray technology as compared with FDD are reproducibility, probe sensitivity, nonlinearity in signal detection (57), probe cross-hybridization due to homologous cDNA sequences (58), and data management (59). Depending on the amount of desired gene coverage, FDD methodology enables quicker results when compared with traditional isotopic DD or other DD-related technologies, yet ensures more reliable results when compared with microarray or other competing, non-DD technologies. Combined with robotics and digital data analysis, FDD has been shown to be even more accurate and high throughput (4–6, 44, 45, 60). Elimination of manual reaction setup, through the use of a robotic liquid dispenser, not only ensures reproducibility by reduction of pipetting errors, but, in combination with the elimination of conventional DD autoradiography, also decreases the amount of time required for a differential gene expression screening. DD technology allows researchers to quickly and easily find the truly differentially expressed genes in their project so they can spend their time and effort on the downstream functional characterizations, where the true relevance of genes can be identified. These characterizations can be painstakingly long and difficult.
Remember that the p53 tumor suppressor gene was discovered in the 1970s and has been worked on since by tens of thousands of laboratories throughout the world. Although much progress has been made in understanding the function of p53, the exact molecular nature of how p53 acts as this crucial tumor suppressor remains a mystery. Nevertheless, with tools such as DD, we think that the answers are coming, in cancer gene profiling as well as in all other fields of life science.

2. Materials

2.1. Total RNA Isolation

1. Phosphate-buffered saline (PBS).
2. RNA isolation reagent: a phenol–guanidinium monophasic solution such as RNApure® (GenHunter Corporation, Nashville, TN) is recommended.
3. Chloroform.
4. Polytron™ Homogenizer for RNA extraction from tissue (Biospec Products Inc., Bartlesville, OK).
5. Diethyl pyrocarbonate (DEPC)-treated water (GenHunter, Cat R105).
6. Isopropanol.
7. 100% ethanol.
8. 70% ethanol in DEPC-treated dH2O.
9. 1.7-mL microfuge (Denville Scientific, Metuchen, NJ).

2.2. Removal of Genomic DNA from Total RNA

1. MessageClean® DNA Removal Kit (GenHunter, Cat. No. M601) including RNase-free DNase I (10 U/μL), 10× reaction buffer [100 mM Tris–HCl, pH 8.4, 500 mM KCl, 15 mM MgCl2, and 0.01% gelatin], 3 M sodium acetate (pH 5.5), DEPC-treated water, and RNA Loading Mix.
2. Agarose, UltraPure (Invitrogen, Carlsbad, CA).
3. Distilled water (double distilled and autoclaved).
4. Phenol/chloroform (3:1) solution, Tris saturated: 30 mL melted crystalline phenol, 10 mL chloroform, 10 mL 1 M Tris–HCl, pH 7.0.
5. 10× MOPS buffer: 0.2 M MOPS, 0.05 M sodium acetate, 0.01 M ethylenediamine tetraacetic acid (EDTA), pH 6.5.
6. 12.3 M (37%) formaldehyde, pH > 4.0.

2.3. Single-Strand cDNA Synthesis by Reverse Transcription

1. RNAspectra™ Fluorescent Differential Display Kit (GenHunter) including distilled water, 5× RT buffer (125 mM Tris–HCl, pH 8.3, 188 mM KCl, 7.5 mM MgCl2, and 25 mM dithiothreitol [DTT]), dNTP Mix (FDD), oligo-dT anchor primers (H-T11M, 2 μM), and MMLV reverse transcriptase (100 U/μL).
2. 0.2-mL thin-walled PCR tube, RNase-free (GenHunter).
3. Thermal cycler. The GeneAmp PCR System 9600 (Applied Biosystems, Foster City, CA).

2.4. FDD-PCR

1. RNAspectra™ Fluorescent Differential Display Kit (GenHunter) including distilled water, 10× PCR buffer (100 mM Tris–HCl, pH 8.4, 500 mM KCl, 15 mM MgCl2, and 0.01% gelatin), FDD dNTP mix, fluorescent anchor primers (R-HT11M or F-H-T11M), and arbitrary primers (2 μM H-AP).
2. Taq DNA polymerase (Qiagen, Valencia, CA).
3. 0.2-mL thin-walled PCR tube, RNase-free (GenHunter) or 96-well PCR plates (Thermo-Fast® 96 Detection Plate, ABgene Inc., Rochester).
4. Liquid-handling robot. GenHunter uses the Biomek 2000 (Beckman Coulter Inc., Fullerton, CA).

2.5. Gel Electrophoresis

1. Gel apparatus with low-fluorescent (borosilicate) glass plates such as Horizontal or Vertical FDD Electrophoresis Systems (GenHunter).
2. 5 M KOH.
3. 50% ethanol (EtOH).
4. Sigmacote® (Sigma, St. Louis, MO) or similar product.
5. 6% denaturing gel solution such as Sequagel 6 Ready-To-Use 6% Sequencing Gel® (National Diagnostics, Atlanta, GA).
6. 10× TBE: 0.89 M Tris–borate, pH 8.3; 20 mM disodium ethylenediamine tetraacetic acid (Na2EDTA).
7. 10% ammonium persulfate (APS).
8. FDD Loading Dye from RNAspectra™ Kit (GenHunter): 99% formamide, 1 mM EDTA, pH 8.0, 0.009% xylene cyanole FF, and 0.009% bromophenol blue.
9. Fluorescent Laser Scanner. The FMBIO® II or III Series (MiraiBio, Alameda, CA) is recommended.
10. UV-transparent plastic wrap. Standard Glad® Cling Wrap (The Glad Products Company, Oakland, CA) or Saran Wrap work well.
11. FDD locator dye (GenHunter).
12. 5% bleach solution, in dH2O.

2.6. Reamplification of Selected Differentially Expressed Bands

1. Distilled water.
2. 3 M sodium acetate (pH 5.5) from GenHunter MessageClean Kit.
3. 10 mg/mL glycogen (GenHunter, Catalog No. S301).
4. 100% ethanol.
5. 85% ethanol.
6. Unlabeled anchor primers (H-T11G, H-T11A, H-T11C; 2 μM, from GenHunter).
7. Arbitrary primers (H-AP1 to H-AP80, 2 μM, from GenHunter RNAspectra Kit).
8. Taq DNA polymerase (Qiagen).
9. dNTP Mix (FDD) from RNAspectra Kit.
10. 10× PCR buffer (GenHunter): 100 mM Tris–HCl, pH 8.4, 500 mM KCl, 15 mM MgCl2, and 0.01% gelatin.
11. Agarose.
12. 10× agarose DNA loading dye (40% sucrose, 0.1% bromophenol blue, 0.1% xylene cyanole FF, and 2.5 mM EDTA, pH 8.0, in distilled water).
13. 0.2-mL thin-walled, RNase-free PCR tube (GenHunter).


2.7. Cloning of Reamplified PCR Products

1. PCR-TRAP® Cloning System (GenHunter) including insert-ready PCR-TRAP® cloning vector, 200 U/μL T4 DNA ligase, distilled water, 10× ligase buffer (500 mM Tris–HCl, pH 7.8, 100 mM MgCl2, 100 mM DTT, 10 mM ATP, 500 μg/mL BSA), 2 μM Lgh/Rgh primers, Colony Lysis Buffer (1× TE with 0.1% Tween 20), 10× PCR buffer, 250 μM dNTP, 20 mg/mL tetracycline, and GH competent cells.
2. LB media. Make 1 L LB with 10 g Bacto-tryptone, 5 g Bacto-yeast extract, 10 g NaCl, and bring volume up to 1 L with dH2O.
3. LB-Agar-TET plates. Make 1 L LB-Agar-TET plates with 10 g Bacto-tryptone, 5 g Bacto-yeast extract, 9 g NaCl, 15 g Bacto-agar, and bring volume up to 1 L with dH2O. Microwave until melted and add 1 mL of 20 mg/mL tetracycline when liquid cools to approximately 50°C. Pour plates.
4. Bacterial polystyrene petri dish.

2.8. Sequencing of Cloned PCR Products

1. AidSeq Primer Set C (GenHunter): includes Lseq and Rseq primers.

2.9. Confirmation of Differential Gene Expression by Northern Blot

1. HotPrime® DNA Labeling Kit (GenHunter) including 1 U/μL Klenow DNA polymerase, 10× labeling buffer, 500 μM dNTP (−dATP) or 500 μM dNTP (−dCTP), stop buffer, and distilled water.
2. QIAEX™ II Gel Extraction Kit.
3. Lock-top microfuge (USA Scientific, Ocala, FL).
4. Alpha-[32P] dATP (3,000 Ci/mmol) (PerkinElmer Life Sciences, Boston, MA).
5. Sephadex G50 column (Roche Applied Science, Indianapolis, IN).
6. 10 mg/mL salmon sperm DNA (GenHunter).
7. Nylon membrane: Nytran SuperCharge Nylon Transfer Membrane (Schleicher & Schuell, Keene, NH).
8. Paper towels.
9. UV-transparent plastic wrap.
10. Single emulsion scientific imaging film. Kodak Biomax MS (Kodak-Eastman, Rochester, NY) is recommended.
11. 20× saline–sodium citrate (SSC): 3 M NaCl, 0.3 M trisodium citrate · 2H2O. Adjust pH to 7.0 with 1 M HCl.
12. Formamide prehybridization/hybridization solution (GenHunter).
13. 1× SSC, 0.1% sodium dodecyl sulfate (SDS) (w/v).
14. 0.25× SSC, 0.1% SDS (w/v).

3. Methods

3.1. Total RNA Isolation

Although FDD takes advantage of the poly-adenylation (poly-A+) site of eukaryotic mRNA, total RNA is preferred over poly-A+ RNA (mRNA) for several reasons, including the overall ease of purification, the ability to verify RNA integrity, and the cleaner background signal (see Note 2). If one is planning to do a 240-primer combination screening with FDD, approximately 12 μg of “cleaned” total RNA is required. The term “cleaned” refers to RNA that has been freed of DNA by the DNase I treatment described in Subheading 3.2. Generally, 50–80% of the beginning amount of total RNA can be retrieved after cleaning. In addition, it is important to make sure there is plenty of RNA left over for whatever confirmation step is chosen. To ensure there is enough RNA for all steps, it is suggested to isolate approximately 50 μg of total RNA. The amount of total RNA that can be isolated from a sample can vary widely depending on the tissue/cell type, procedure used, organism, and proficiency at that particular procedure. However, using a reagent based on the standard phenol/guanidine thiocyanate technique such as RNApure®, one can achieve an average yield of 50 μg of total RNA from 25 mg of tissue or 2.5 × 10^6 cells (see Note 3).
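The requirement of roughly 12 units of cleaned RNA can be traced to the RT setup in Subheading 3.3, where each 200-volume RT reaction receives 20 volumes of RNA at a concentration of 0.1, and each sample is run in duplicate with each of the three anchor primers. The following Python sketch reproduces that arithmetic in micrograms and microliters; the function and parameter names are illustrative assumptions, not part of the protocol.

```python
def rna_needed_ug(anchor_primers=3, replicates=2,
                  rna_volume_ul=20.0, rna_conc_ug_per_ul=0.1):
    """Cleaned total RNA (ug) consumed per sample by the RT reactions.

    Defaults assume the setup described in Subheading 3.3: three anchor
    primers, duplicate RT reactions, 20 uL of 0.1 ug/uL RNA per reaction.
    """
    per_reaction_ug = rna_volume_ul * rna_conc_ug_per_ul
    return anchor_primers * replicates * per_reaction_ug


if __name__ == "__main__":
    print(f"RNA per sample: {rna_needed_ug():.1f} ug")
```

Changing `anchor_primers` or `replicates` gives the requirement for a smaller or larger screening.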

3.1.1. RNA Extraction from Various Sources

3.1.1.1. Extraction of RNA from Tissue Cultures

1. If using regular “attached” cells, pour off medium. Set the plate on ice. If the cells are in suspension, spin down the cells, remove the medium, then move on to step 4.
2. Rinse the cells with 10–20 mL of cold PBS.
3. Pour off the PBS and remove the residual PBS with a 1,000-μL pipette (see Note 4).
4. Add 2 mL of RNApure® RNA isolation reagent per 100- to 150-mm plate to lyse the cells. Spread the solution by shaking the plate. This volume is sufficient for one to ten million cells.
5. Let the plate sit on ice for 10 min.
6. Pipette the lysate into two labeled 1.5-mL microfuge tubes.

3.1.1.2. Extraction of RNA from Tissues

1. Add at least 2 mL of RNApure® RNA isolation reagent to the tissue in a 50-mL conical tube on ice. Ideally, the volume ratio of RNA isolation reagent to tissue should be at least 10:1.
2. Homogenize the tissue with a Polytron™ homogenizer until the tissue is dispersed.
3. Let sit on ice for 10 min.
4. Transfer 1-mL aliquots of the lysate into labeled 1.5-mL centrifuge tubes.

3.1.1.3. Extraction of RNA from Blood

1. Spin down the blood products and remove the plasma.
2. Follow the instructions in “Extraction of RNA from Tissues” above.

3.1.2. RNA Purification

1. Add 150 μL of chloroform per milliliter of lysate. Vortex for 10 s. The protocol can be stopped here by placing the lysates at −80°C.
2. Centrifuge the tubes at 4°C at maximum speed for 10 min (see Note 5).
3. Carefully remove the upper phase (see Note 6) into a clean, labeled 1.5-mL centrifuge tube. If RNA is being isolated from tissues, a second extraction is generally recommended to remove any RNases (see Note 7).
4. Add an equal volume of isopropanol. Mix vigorously or vortex for 30 s. Let sit on ice for 10 min.
5. Centrifuge for 10 min at 4°C at maximum speed.
6. Rinse the RNA pellet with 1 mL of cold 70% ethanol (in DEPC-treated water). Centrifuge 2 min at 4°C at maximum speed.
7. Remove the ethanol. Spin briefly and remove the residual wash solution with a pipette.
8. Resuspend the RNA in DEPC-treated water. The amount used for resuspension will depend on the amount of RNA isolated, but the RNA should be at a concentration greater than 1 μg/μL, so adjust accordingly. Do not use SDS in resuspension if using RNA for any PCR application.
9. Measure the concentration by taking 1 μL of the RNA (using a P10 pipette) and diluting to 1 mL of water (a 1:1,000 dilution). Read the absorbance at 260 nm. 1 OD260 = 40 μg/mL.
10. Move on to next steps and store RNA that will not be “cleaned” in aliquots at −80°C until the next use.

3.2. Removal of Genomic DNA from Total RNA

For the purposes of FDD gene expression analysis, as well as any other RNA-based gene expression technologies, contaminating genomic DNA must be removed before single-strand cDNA synthesis by reverse transcription and subsequent PCRs. If left unchecked, any primers with matching sequence to the contaminating DNA will anneal during the FDD-PCRs, causing amplification of DNA sequences and leading to a higher false-positive rate. Therefore, the following protocol is one of the most important procedures in preventing irregularities or artifacts during the FDD-PCRs by removal of the contaminating genomic DNA. It is important to note that one will typically retrieve 50–80% of the total RNA put into the reaction, so the amount to be cleaned must be adjusted to the amount needed for FDD.
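Because only 50–80% of the input RNA is typically recovered, the amount loaded into the DNase I treatment has to exceed the amount needed downstream. A minimal sketch of that adjustment, assuming recovery can be treated as a simple fraction (the function name and defaults are illustrative only):

```python
def rna_input_range_ug(needed_ug, recovery_low=0.5, recovery_high=0.8):
    """Return (best_case, worst_case) RNA input in ug for DNase treatment.

    best_case assumes the highest expected recovery, worst_case the lowest.
    """
    if not 0 < recovery_low <= recovery_high <= 1:
        raise ValueError("recoveries must satisfy 0 < low <= high <= 1")
    return needed_ug / recovery_high, needed_ug / recovery_low


if __name__ == "__main__":
    best, worst = rna_input_range_ug(12.0)
    print(f"Load between {best:.0f} and {worst:.0f} ug to recover 12 ug")
```

Planning for the worst case is the conservative choice when the sample allows it; extra cleaned RNA can be kept for the confirmation step.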


3.2.1. DNase I Digestion of Total RNA

1. If necessary, dilute the desired amount of RNA to be digested (maximum of 50 μg) with DEPC-treated water to a volume of 50 μL.
2. In a 1.5-mL centrifuge tube, add the following in order (the total reaction volume is 56.7 μL):

   Total RNA (10–50 μg)     50 μL
   10× reaction buffer      5.7 μL
   10 U/μL DNase I          1.0 μL

3. Mix gently and incubate at 37°C for 30 min (see Note 8).

3.2.2. Extraction and Ethanol Precipitation of DNA-Free RNA

1. Prepare phenol/chloroform solution (see Note 9) by melting crystalline phenol at 65°C.
2. Add 30 mL melted phenol to 10 mL chloroform and mix well.
3. Add 10 mL 1 M Tris–HCl, pH 7.0, and mix well. Allow the saturation phase to form before using.
4. Add 40 μL of phenol/chloroform solution to each DNase I reaction (see Note 10). Vortex for 30 s.
5. Let sit on ice for 10 min.
6. Centrifuge at maximum speed for 5 min at 4°C.
7. Collect the upper phase (see Note 6) and place it into a clean, labeled, 1.5-mL microfuge tube.
8. Add 5 μL of 3 M sodium acetate and 200 μL of 100% ethanol. Mix well.
9. Let sit for at least 1 h at −80°C. Overnight to a few days at −80°C is also fine.
10. Centrifuge at 4°C for 10 min at maximum speed to pellet the RNA.
11. Carefully remove the supernatant and rinse the RNA pellet with 0.5 mL of 70% ethanol (in DEPC-treated water). Do not disturb the pellet.
12. Centrifuge for 5 min at 4°C at maximum speed and remove the supernatant. Centrifuge again briefly, removing the residual liquid without disturbing the RNA pellet.
13. Resuspend the RNA in 10–20 μL of DEPC-treated water.

3.2.3. RNA Quantification and Integrity Verification

After cleaning, it is crucial to be able to determine both the quantity and quality of the RNA retrieved. The amount can easily be quantified by OD260. The quality/integrity of the RNA is determined most accurately by running the RNA on an “RNA gel” and looking for the appearance of sharp ribosomal RNA bands.
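The OD260 reading converts to a stock concentration using the standard factor of 1 OD260 ≈ 40 μg/mL for RNA, multiplied by the dilution factor. The sketch below shows the conversion for a 1:1,000 dilution; the function names are illustrative assumptions.

```python
def rna_conc_ug_per_ul(od260, dilution_factor=1000, ug_per_ml_per_od=40.0):
    """Concentration of the undiluted RNA stock in ug/uL."""
    ug_per_ml = od260 * ug_per_ml_per_od * dilution_factor
    return ug_per_ml / 1000.0  # 1 mL = 1,000 uL


def rna_yield_ug(od260, stock_volume_ul, dilution_factor=1000):
    """Total RNA (ug) in a stock of the given volume."""
    return rna_conc_ug_per_ul(od260, dilution_factor) * stock_volume_ul


if __name__ == "__main__":
    # Hypothetical reading: OD260 of 0.05 on a 1:1,000 dilution, 20 uL stock.
    print(f"{rna_conc_ug_per_ul(0.05):.2f} ug/uL")
    print(f"{rna_yield_ug(0.05, 20):.1f} ug total")
```

With a 1:1,000 dilution the two factors of 1,000 cancel, which is why the concentration in μg/μL is simply the OD260 reading multiplied by 40.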


1. Quantitate the RNA amount by OD260 after 1:1,000 dilution of the DNA-free RNA sample with distilled water. RNA concentration (in μg/μL) = OD260 (1:1,000 dilution) × 40.
2. Prepare an “RNA gel” (denaturing formaldehyde agarose gel with MOPS and formaldehyde) by the following protocol:
   (a) Add the following to a microwave-safe container:

       10× MOPS           10 mL
       Agarose            1–1.5 g
       Distilled water    83 mL

   (b) Microwave for approximately 3 min or until the agarose is melted.
   (c) Let the agarose cool to at least 50°C (barely touchable by hand).
   (d) Add 7 mL of a 12.3 M (37%) formaldehyde solution. Gently mix.
   (e) Pour into a prepared gel casting plate and add a gel comb.
   (f) Running buffer (1 L) is made by diluting 100 mL of 10× MOPS with 900 mL of distilled water to a 1× concentration. Cover the agarose gel with running buffer.
3. Check the integrity of the RNA (see Note 11) by resolving 2–3 μg of both pre-DNase and post-DNase RNA samples on a 7% formaldehyde agarose gel with RNA Loading Mix by the following protocol:
   (a) Add 1–10 μL (2–3 μg) of RNA to 20 μL RNA Loading Mix in a labeled 1.5-mL microfuge tube. Mix well.
   (b) Incubate at 65°C for 10 min.
   (c) Centrifuge the sample briefly to collect condensation.
   (d) Put the samples on ice for 5 min.
   (e) Load the entire amount onto an RNA gel.
   (f) Run at 50–60 V for approximately 45 min or until resolution of the ribosomal subunits is achieved.

3.3. Single-Strand cDNA Synthesis by Reverse Transcription

Generally, two RT reactions are done per sample (i.e., each sample is run in duplicate) to ensure reproducibility and to reduce false positives. It is recommended to set up separate RT core mixes for each individual H-T11M in 200-μL RT reactions if 240 primer combinations will be performed. Therefore, if two samples are being studied, set up four 200-μL RT reactions for H-T11G, four 200-μL RT reactions for H-T11A, and four 200-μL RT reactions for H-T11C. If smaller or larger numbers of primer combinations are chosen, adjust accordingly.

1. Dilute 40 μL of each RNA sample to a final concentration of 0.1 μg/μL with DEPC-treated water and mix thoroughly. Place on ice.
2. For an RT core mix with two samples in duplicate for one H-T11M primer (H-T11G will be shown here), add the following:

   Distilled water     376 μL
   5× RT buffer        160 μL
   FDD dNTP mix         64 μL
   H-T11G primer        80 μL
   Total volume        680 μL

   Mix well.
3. Divide the above 680 μL evenly into four tubes labeled with sample name (Example: RTG-1a, RTG-1b, RTG-2a, RTG-2b), aliquoting 170 μL per tube (see Fig. 5 for step-by-step schematic of RT and FDD-PCR setup).
4. Add 20 μL of corresponding total RNA (0.1 μg/μL, freshly diluted, see Note 12) to each tube. For example, add 20 μL of RNA 1 to each of tubes RTG-1a and RTG-1b followed by 20 μL of RNA 2 to each of tubes RTG-2a and RTG-2b. Mix each tube well.
5. Program the thermal cycler to: 65°C for 5 min → 37°C for 60 min → 75°C for 5 min → 4°C soak (see Note 13).
6. Place the tubes on the thermal cycler and begin the program.
7. After the tubes have been at 37°C for 10 min, pause the thermal cycler and add 10 μL of MMLV reverse transcriptase to each tube. Quickly mix well by finger-tipping or pipetting up and down before continuing the incubation program.
8. At the end of the reverse transcription, spin the tube briefly at maximum speed to collect the condensation. Set the tubes on ice or store at −20°C for later use.
9. Repeat steps 1–8 for the H-T11A and H-T11C primers.

3.4. FDD-PCR

This protocol is designed for 240 primer combinations in duplicate per sample using three fluorescent dye-labeled anchor primers (FH-T11M) and 80 upstream arbitrary primers (H-AP). This would yield approximately 74% coverage of all possible genes. For a complete, genome-wide screening, 480 primer combinations or more must be completed per sample. It is ideal to set up PCRs in 96- or 384-well PCR plates using a robot to ensure reproducibility and increase throughput. Depending on the number of samples and the plate being used, one may be able to combine more or less than 24 primer combinations into one experiment. However, for simplicity, a 24-primer combination experiment with one anchor primer and two RNA samples in duplicate using a 96-well plate will be discussed. Therefore, this protocol will need to be repeated ten times using varying anchor-arbitrary primer combinations.

Fig. 5. RT and FDD reaction setup. This schematic shows individual steps involved and quantities required for standard reverse transcription (RT) and fluorescent differential display (FDD) reaction setups. These numbers are based on comparing two samples in duplicate (or four samples not in duplicate) with FH-T11M anchor primer in combination with 24 H-AP arbitrary primers. These steps would be repeated ten times until all 240-primer combinations (3 anchor primers and 80 arbitrary primers) have been completed.

1. A separate FDD-PCR core mix for each individual FH-T11M primer is made. Here, a core mix for all 80 H-AP primers for FH-T11G primer is shown. This will be called “FDD Core Mix G.”

   Distilled water      4,080 μL
   10× PCR buffer         800 μL
   dNTP Mix (FDD)         640 μL
   FH-T11G primer         800 μL
   Total volume         6,320 μL

   Mix well.
2. Aliquot 1,896 μL of FDD Core Mix G into three separate tubes labeled “FDD Core Mix G” (see Fig. 5 for step-by-step schematic of RT and FDD-PCR setup). Aliquot the remaining amount into a fourth tube labeled “FDD Core Mix G-remainder” (approximately 632 μL).
3. To one of the tubes labeled “FDD Core Mix G,” add 24 μL Taq DNA polymerase. Mix well. Freeze the other three tubes aliquoted above at −80°C for later PCRs.
4. Aliquot 480 μL of “FDD Core Mix G/Taq” mixture to four separate tubes labeled corresponding to the RT reactions. For this example, use FDDG-1a, FDDG-1b, FDDG-2a, and FDDG-2b.
5. Add 60 μL of corresponding cDNA from RT to each of the four tubes. For example, 60 μL of RTG-1a tube would go into the tube labeled FDDG-1a. Mix well.
6. Using either a robot or by hand, add 2 μL of H-AP primers 1–24 to corresponding wells of a 96-well plate (see Fig. 5).
7. Using either a robot or by hand, add 18 μL of corresponding FDD Core Mixes to corresponding wells of a 96-well plate (see Fig. 5).
8. The total reaction volume will be 20 μL. Add 25 μL of mineral oil if needed.
9. Program the thermal cycler to: 94°C for 15 s (see Note 14) → 40°C for 2 min → 72°C for 60 s, for 40 cycles → 72°C for 5 min → 4°C soak.
10. Put the 96-well plate on the thermal cycler and begin the program. Once completed, store reactions at −20°C in the dark.
11. Steps 3–10 will then be repeated for H-AP primers 25–48 and 49–72.
12. The same process will then be done for H-AP primers 73–80 as follows (see Note 15):
    (a) Add 8 μL Taq DNA polymerase to the 632 μL of “FDD Core Mix G-remainder.” Mix well.
    (b) Aliquot 160 μL of that mixture to four separate tubes labeled the same as in step 4.
    (c) Add 20 μL of the corresponding cDNA from RT to each of the four tubes, as in step 5. Mix well.
    (d) Using either a robot or by hand, add 2 μL of H-AP primers 73–80 to the corresponding wells of a 96-well plate.
    (e) Using either a robot or by hand, add 18 μL of corresponding FDD Core Mixes to the corresponding wells of a 96-well plate.
    (f) Follow steps 8–10 above.
13. Repeat steps 1–12 for the FH-T11A and FH-T11C primers.

3.5. Gel Electrophoresis

Because performing a large-scale FDD experiment requires many hundreds of PCRs (960 in the experiments above), one of the areas for improvement in making it more high throughput is in the gels themselves. Using a gel apparatus with many lanes can speed up this process tremendously. One system that has been successfully used is the Horizontal FDD Electrophoresis System with 132 lanes and the “Microtrough System” containing grooved glass plates. This allows one to load at least one entire 96-well plate on one gel. In addition, the Microtrough System allows the researcher to use standard 10-μL pipet tips for sample loading instead of the difficult-to-use flat gel-loading tips required by standard sequencing apparatuses. Hand position during loading is more stable and relaxed with this system.
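The figure of 960 PCRs follows from the experimental design used in this chapter: 3 anchor primers × 80 arbitrary primers × 2 samples × 2 replicates. A short sketch of the count, and of the number of 96-well plates (one gel each) it implies; the function names are illustrative assumptions:

```python
import math


def fdd_pcr_count(anchor_primers=3, arbitrary_primers=80,
                  samples=2, replicates=2):
    """Total number of FDD-PCRs for a screening."""
    return anchor_primers * arbitrary_primers * samples * replicates


def plates_needed(total_reactions, wells_per_plate=96):
    """Number of plates needed to hold all reactions."""
    return math.ceil(total_reactions / wells_per_plate)


if __name__ == "__main__":
    total = fdd_pcr_count()
    print(total)                 # 960
    print(plates_needed(total))  # 10
```

Doubling the number of primer combinations for a genome-wide screening, or adding samples, scales the plate and gel count in the same way.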


A multichannel pipettor for gel loading has also been tried. Matrix Technologies (Hudson, NH) manufactures several pipettors with width-expandable channels called “Matrix Equalizers.” The 8-channel Matrix Equalizer 384 with 0.5–12.5 μL volume range works fairly well. These pipettors have tips that move independently and can be spaced anywhere from 4.5 to 14.15 mm apart. For the gel loading, the tips were spaced at 9 mm for liquid uptake from a 96-well plate and then collapsed together to 4.5 mm for gel loading. However, this 4.5 mm distance only allows 87 lanes per gel, not enough to load an entire 96-well plate. A pipettor that could contract to 3 mm for gel loading would be ideal, but so far Matrix has not manufactured this. Therefore, using one of these pipettors has trade-offs: while it decreases the time required for gel loading and the chance of incorrect loading, fewer reactions can be run on the same gel. The other option is to load the PCRs using the Matrix pipettor at the 6 mm distance, loading every other well, but this requires reconfiguration of the reaction setup.

For the experiments done above that have 960 PCRs on ten 96-well plates, it is recommended to run ten separate gels, each with one 96-well plate. One to two gels can generally be run per day, requiring 5–10 days to run all ten gels. For ease of use, the Sequagel 6 Ready-To-Use 6% Sequencing Gel® is recommended for denaturing gel electrophoresis. However, a general protocol is given here for the 6% denaturing polyacrylamide gel that is recommended for resolution of cDNA profiles.

1. Thoroughly clean both sides of the glass plates to be used with warm water and soap, ensuring that there is no previous gel debris or streaks (see Note 16). Be sure to rinse thoroughly afterward because soap residue may cause problems. KOH can be used occasionally for this purpose to strip off hard-to-clean residue.
2. Further clean the glass plates by wiping with a 50% ethanol (EtOH) solution. Make sure the plates are completely dry.
3. Coat the interior surface of one of the plates (usually the notched one) with 500 μL Sigmacote® or similar product using a Kim-Wipe to smoothly spread it over the surface. Let dry for 1 min. This coating step allows the gel to preferentially stick to the non-coated plate after separation of plates for band cutting after running the gel.
4. Use 60 mL of the gel mixture for a 45 × 28 × 0.04-cm gel.
5. Add 0.5 mL of 10% APS solution and mix thoroughly.
6. Pour gel into the sequencing gel cast and let it polymerize for 1–2 h or overnight (see Note 17).
7. After polymerization, load the glass plates into the sequencing apparatus and add 1× TBE buffer to the upper and lower buffer chambers.

8. Flush the urea from the gel wells by using a syringe to inject buffer into each well. Pre-run the sequencing gel in 1× TBE buffer for 30 min.
9. Add 3.5 μL of each FDD-PCR with 2 μL of FDD loading dye. Alternatively, an appropriate ratio of loading dye (8 μL for 20 μL PCRs) can be added directly to the PCR if the reactions will only be used for running gels. Incubate at 80°C for 2 min immediately before loading onto the gel. This step is to denature the DNA samples before gel loading.
10. After heat denaturation, put the samples on ice for 1–2 min.
11. Load the maximum amount of sample (usually 3–4 μL) into wells. It is crucial that all urea be removed from the wells before loading samples (see Note 18). For best results, load four to six lanes and then stop briefly to reflush the wells to remove urea. Load in appropriate groups, usually by primer combination.
12. Electrophorese for 1½ to 3 h at 60 W constant power (voltage not to exceed 2,000 V) until the xylene cyanole dye (the slower moving dye) reaches the bottom of the gel. In a 6% gel, the xylene cyanole will co-migrate with DNA of approximately 106 bp as a reference point. If voltage exceeds 2,000 V, lower the wattage. The gel should be kept in the dark while running to prevent photobleaching of samples (see Note 19) either by using a dark room or covering the gel apparatus with a cardboard box.
13. Turn off the power supply and remove the plates from the gel apparatus. Take off the gel tape and remove the spacers and comb (see Note 20). Clean the outside of the glass plates very well with warm water and 50% ethanol to remove any residue from the gel or tape. Thorough cleaning is required to reduce background signal (see Note 16).
14. Scan the gel on a fluorescence imager with an appropriate filter, following the manufacturer’s instructions based on the particular fluorophore being used.

3.6. Reamplification of Selected Differentially Expressed Bands

Assuming differentially expressed bands of interest are seen, those bands should be excised from the gel. After excision, the cDNAs will be reamplified using the same anchor-arbitrary primer combinations and reaction conditions as the initial FDD-PCRs. The reamplification products can then be cloned and sequenced for further characterization. 1. Separate the glass plates by taking off the notched/smaller glass plate (see Note 21) leaving the gel attached to the unnotched/larger plate.

Automated Fluorescent Differential Display for Cancer Gene Profiling

123

2. Place a layer of UV-transparent plastic wrap on top of the gel. This prevents contamination of the gel and makes gel cutting easier.
3. Spot 0.5 µL of FDD Locator Dye at the upper and lower corners of the gel to allow orientation of the picture with the gel. The FDD Locator Dye, which combines fluorescent and visible dyes, makes it easy to align the gel with the printed template for band excision.
4. Rescan the gel with the gel facing up.
5. Print a real-size image on appropriately sized paper (see Note 22) using a quality ink-jet or laser printer. This printed image will be used as the template to excise differentially expressed cDNAs.
6. Choose and label any bands to be cut (see Note 23). A logical band-naming nomenclature should be used, such as RN-G1A (RN = researcher name; G = FH-T11G anchor primer; 1 = H-AP1 arbitrary primer; A = top differentially expressed band in lane).
7. Place the printout on the tabletop and lay the glass plate on top of the printout. Orient the plate so that the spots on the printout match up with those on the gel.
8. Excise each band with a razor or other sharp utensil and place the band into a 1.5-mL microfuge tube labeled with the corresponding band name. To prevent cross-contamination, clean the razor blade between cuts using a Kim-Wipe soaked in 70% ethanol or 5% bleach, followed by an H2O-soaked Kim-Wipe.
9. For each band being reamplified, add 100 µL of distilled water to the tube containing the corresponding gel slice.
10. Let soak for 10 min at room temperature.
11. Boil the tightly closed tube (sealed with Parafilm, or use a lock-top tube) for 15 min to elute the cDNA from the gel slice.
12. Spin for 2 min at maximum speed to collect the condensation and pellet the gel.
13. Transfer the supernatant to a fresh, labeled 1.5-mL microfuge tube. Discard the tube with the gel slice. Add 10 µL of 3 M sodium acetate, 5 µL of glycogen, and 450 µL of 100% ethanol per tube. Let sit for at least 30 min on dry ice or in a −80°C freezer.
14. Spin for 10 min at 4°C at maximum speed to pellet the DNA. Remove the supernatant and rinse the pellet with 200 µL of ice-cold 85% ethanol. Spin briefly and remove the residual ethanol.
15. Dissolve the pellet in 10 µL of dH2O.
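The band-naming convention suggested in step 6 (for example, RN-G1A) lends itself to simple bookkeeping when many bands are excised. The following is a minimal sketch, assuming names follow exactly the pattern researcher initials, hyphen, anchor letter (G, A, or C), arbitrary primer number, band letter; the function name `parse_band_name` and the dictionary keys are hypothetical and not part of the protocol.

```python
import re

# Anchor letter -> labeled anchor primer, as described in step 6.
ANCHOR_PRIMERS = {"G": "FH-T11G", "A": "FH-T11A", "C": "FH-T11C"}

# Assumed pattern: initials, "-", anchor letter, H-AP number, band letter.
BAND_NAME = re.compile(
    r"^(?P<researcher>[A-Za-z]+)-(?P<anchor>[GAC])(?P<ap>\d+)(?P<band>[A-Z])$"
)


def parse_band_name(name):
    """Split a band name such as 'RN-G1A' into its components."""
    match = BAND_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid band name: {name!r}")
    return {
        "researcher": match["researcher"],
        "anchor_primer": ANCHOR_PRIMERS[match["anchor"]],
        "arbitrary_primer": f"H-AP{match['ap']}",
        "band": match["band"],  # 'A' is the top differentially expressed band
    }


if __name__ == "__main__":
    print(parse_band_name("RN-G1A"))
```

Grouping parsed names by `anchor_primer` gives the number of bands per anchor primer, which is the count needed later when the reamplification core mixes are scaled.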


Meade et al.

16. Make a reamplification core mix for each of the anchor primers that is large enough to reamplify all FDD bands with that particular anchor primer:

(a) A standard reamplification reaction will contain:

Distilled water               23.3 µL
10× PCR buffer                 4.0 µL
dNTP Mix (FDD)                 0.3 µL
2 µM H-AP primer*              4.0 µL
2 µM H-T11M (see Note 24)      4.0 µL
cDNA template*                 4.0 µL
Taq DNA polymerase             0.4 µL
Total volume                  40.0 µL

(b) Determine how many bands of each anchor primer will be reamplified. Multiply each of these numbers by 1.1 (that is, add 10%) to give a cushion for any pipetting inaccuracies. The number of bands for H-T11G × 1.1 = g; the number of bands for H-T11A × 1.1 = a; and the number of bands for H-T11C × 1.1 = c.

(c) Make a reamplification core mix for each H-T11M by multiplying the volumes for a “Standard Reamplification Reaction” by g, a, and c, accordingly.

*However, for the core mixes, the H-AP primers and cDNA templates will not be added, as these will vary with each band. Make the core mix as follows:

Distilled water               23.3 µL × g, a, or c
10× PCR buffer                 4.0 µL × g, a, or c
dNTP Mix (FDD)                 0.3 µL × g, a, or c
2 µM H-T11M                    4.0 µL × g, a, or c
Taq DNA polymerase             0.4 µL × g, a, or c
Total volume                  32.0 µL × g, a, or c

(d) As an example, if 20 bands were chosen for reamplification from FH-T11G, g would be 22 (20 × 1.1 = 22). A core mix should be made for 22 bands by multiplying the volumes from step (c) above by 22:

Distilled water               23.3 µL × 22 = 512.6 µL
10× PCR buffer                 4.0 µL × 22 =  88.0 µL
dNTP Mix (FDD)                 0.3 µL × 22 =   6.6 µL
2 µM H-T11M                    4.0 µL × 22 =  88.0 µL
Taq DNA polymerase             0.4 µL × 22 =   8.8 µL
Total volume                  32.0 µL × 22 = 704.0 µL
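The scaling in steps (b) through (d) can be sketched as a short calculation. This is only an illustration, assuming the per-reaction volumes are in microliters and that the cushion is a flat 10%; the names `CORE_MIX_UL` and `core_mix` are hypothetical.

```python
# Per-reaction core-mix volumes in µL. The H-AP primer and cDNA template
# are added to each tube individually, so they are left out on purpose.
CORE_MIX_UL = {
    "Distilled water": 23.3,
    "10x PCR buffer": 4.0,
    "dNTP Mix (FDD)": 0.3,
    "2 uM H-T11M": 4.0,
    "Taq DNA polymerase": 0.4,
}


def core_mix(n_bands, cushion=0.10):
    """Return (scaling factor, per-component volumes, total volume) in µL."""
    # Step (b): number of bands plus a 10% cushion for pipetting losses.
    factor = round(n_bands * (1 + cushion), 2)
    # Step (c): multiply every per-reaction volume by that factor.
    volumes = {name: round(vol * factor, 1) for name, vol in CORE_MIX_UL.items()}
    total = round(sum(volumes.values()), 1)
    return factor, volumes, total


if __name__ == "__main__":
    factor, volumes, total = core_mix(20)
    print(f"scaling factor: {factor:g}")
    for name, vol in volumes.items():
        print(f"{name:<20}{vol:>8.1f} µL")
    print(f"{'Total':<20}{total:>8.1f} µL")
```

Calling `core_mix(20)` reproduces the worked example for 20 FH-T11G bands; the same call with the band counts for FH-T11A and FH-T11C gives the other two core mixes.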


(e) Make appropriate amounts of core mixes for both FH-T11A and FH-T11C.

17. After the core mixes are made, aliquot 32 µL into 0.2-mL tubes (individually, as strip tubes, or in a 96-well plate) labeled with the proper band name.
18. Add 4 µL of the corresponding cDNA template (the dissolved pellet from step 15 above) as well as 4 µL of the corresponding H-AP primer.
19. Place the reamplification reactions on the thermal cycler and run them using the same conditions as for FDD-PCR.
20. Make a 1.5% agarose gel with ethidium bromide by adding 1.5 g of agarose to 100 mL of 1× TAE. When the agarose/1× TAE mix cools to approximately 50°C (barely touchable by hand), add 3 µL of ethidium bromide, swirl to mix, and pour the solution into a plastic agarose-casting tray.
21. Add 30 µL of the reamplification reaction to 5 µL of agarose DNA loading dye in a 0.5-mL microfuge tube. Load the 35 µL volume onto the 1.5% agarose gel. Save the remaining 10 µL of the PCR samples at −20°C for future cloning.
22. Electrophorese at 70 V for approximately 45–60 min.
23. Confirm correct cDNA reamplification by visualizing the gel using a UV transilluminator. The reamplified band should be approximately the same size as the band cut from the original FDD gel.

After successful reamplification, each band must be confirmed to be a “real” difference by Northern blot or another technique. In addition, the band will need to be sequenced to determine whether it is a known or novel sequence. The order in which these are done can vary and is generally up to the preference of the researcher. Direct sequencing of the reamplified PCR products can sometimes be done here (see Note 25), but a cloning step is recommended first. The following steps are presented in the recommended order, but this can be modified based on the situation.

3.7. Cloning of Reamplified PCR Products

Clone differentially expressed cDNAs into the recommended PCR-TRAP® cloning vector (see Note 26), or another suitable cloning vector, following the manufacturer’s protocol.

3.8. Sequencing of Cloned PCR Products

If using the PCR-TRAP® Cloning System, sequencing can be performed utilizing vector-specific primers such as Lseq/Rseq or Lgh/Rgh. If using a cloning vector other than the one recommended, consult the manufacturer’s guidelines for sequencing instructions.


3.9. Confirmation of Differential Gene Expression by Northern Blot

To confirm differential expression of the selected cDNAs, Northern blot analysis (55) is suggested rather than other confirmation techniques such as reverse Northern hybridization (56), quantitative RT-PCR (55), or real-time PCR (55). The Northern blot technique is technically simple and straightforward in approach, requiring no manipulation of the RNA sequences from which differential gene expression has been detected. Additionally, Northern blot analysis is the most accepted confirmation technique for differential gene expression and is often referred to as the gold standard of gene expression confirmation assays.

If using the recommended PCR-TRAP® cloning vector, the probe template is produced by PCR of the cDNA construct within the cloning vector. The required primers are supplied with the cloning system. Additionally, the HotPrime® DNA Labeling Kit, a random prime labeling kit with major improvements over the traditional random priming kit, is suggested. It is specifically designed to efficiently label DNA probes isolated from differential display for Northern blot analysis. This method makes use of random decamers rather than the traditional hexamers used in random priming, incorporates the anchored oligo-dT primers (H-T11M) into the labeling buffer to ensure full-length antisense cDNA probe labeling, and uses radioactive dATP to take advantage of the AT-rich nature of DD bands. These improvements greatly increase the chance of signal detection on the Northern blot.

After performing DD, most of the bands found will be confirmed to show differential gene expression. Those that are confirmed are considered “real” differences as opposed to “false positives.” If a band chosen from DD does not show differential expression on a Northern blot, it is not necessarily a “false positive.” There have been several examples where bands showed no noticeable differential expression on Northern blots but, on review, something else was involved, such as a polymorphism at the primer binding site, a short sequence deletion/insertion (61), or a splicing difference. If such a change occurs at the exact site where a primer anneals (or within the amplified gene sequence) during DD, a difference would be revealed, whereas a Northern blot may still show no difference because the probe can still bind to the RNA despite these small sequence differences. The message is that if a band looks convincing on the DD gel but does not show differential expression by Northern blot, it could be a false positive, but it could also be something very interesting and worth pursuing.

4. Notes

1. Non-anchored oligo-dT primers have been used for Differential Display (DD), but their disadvantages far outweigh the advantage of needing only one primer for RT and PCR. Without the non-T base at the 3′ end of the primer to “anchor” their position, they can anneal anywhere on the poly-A tail for PCR and will thus create many different-sized DNA fragments for the same cDNA species. This leads to a background smear, which is aesthetically unappealing but, more importantly, can lead to reamplification of the wrong cDNA downstream.
2. Although poly-A+ RNA (mRNA) is what is actually being reverse transcribed in DD, it is rarely used as the RNA input. It can be purified and used for DD, but it provides no significant advantages, and total RNA is the preferred RNA source for DD for a number of reasons. First, total RNA is much easier to purify than poly-A+ RNA (mRNA) because simple RNA isolation reagents exist from many commercial sources, including RNApure® (GenHunter). Most of the protocols for purifying mRNA require purification of total RNA first, so they require additional steps. Second, total RNA allows for easy evaluation of overall RNA integrity by running an “RNA gel” and visualizing the ribosomal RNA bands. If these bands are sharp and without a background smear, it can be assumed that the mRNA is also intact. There are ways to evaluate mRNA integrity, but they require expensive and sophisticated instruments such as the Agilent Bioanalyzer. Finally, the methods used for mRNA purification generally require an oligo-dT binding step so that only the mRNA will be captured. This always leads to some oligo-dT contamination in the RNA sample, which will cause problems for the same reasons listed in Note 1. For all of these reasons, total RNA is the RNA type of choice for DD.
3. The RNApure® reagent from GenHunter is a simple monophasic solution for rapid isolation of intact total RNA that is similar to other phenol/guanidine thiocyanate-based RNA isolation products, but has several major advantages. These include special cell lysis chemicals giving better yield, a yellow color allowing easier visualization during phase separation, and better stability with less corrosiveness. The high-quality RNA isolated can be used for differential display, Northern and reverse Northern blot analysis, and other applications.
4. During RNA isolation from cells, it is crucial to completely remove any residual PBS after rinsing. Otherwise, the ratio of RNApure to cells will be altered. Let the plate sit at an angle for 1 min and remove the residual PBS with a 1,000-µL pipette.
5. During RNA purification, many of the centrifugation steps are done at 4°C. We put our centrifuge in the refrigerator a few hours before these steps are done. However, we have noticed that if the centrifuge is left in the refrigerator continuously, it will not spin as fast. We assume this is caused by either temperature or moisture. Therefore, if you are using a standard lab centrifuge designed for room temperature use, do not keep the centrifuge in the refrigerator long term.
6. When removing the upper phase, it is crucial that you do not touch the interphase, which may contain proteins including RNases/DNases. It is much better to lose some RNA but ensure that the RNA you do retrieve is free of RNase/DNase than to try to get as much of the upper phase as possible and risk RNase/DNase contamination.
7. Because tissues generally contain higher amounts of RNases than cells, we have noticed that a second phenol extraction step will significantly improve the DD results in terms of reproducibility and overall quality. This second extraction can be done directly after taking the upper phase, using more RNApure® reagent. Just add 1 mL of RNApure® reagent per 100 µL of upper phase and follow the protocol starting at Subheading 3.1.2 again.
8. For the DNase I digestion step at 37°C, we recommend sticking to the 30-min time as closely as possible in case there is any RNase contamination. However, it is also crucial to do the full 30-min incubation to completely digest all DNA.
9. We have found that phenol/CHCl3 (3:1) is superior to phenol/CHCl3 (1:1) or phenol/CHCl3/isoamyl alcohol (25:24:1). These other options can be used, but the extraction should then be repeated twice to ensure complete removal of proteins. Phenol/CHCl3/isoamyl alcohol is normally used for DNA or plasmid purification. It is recommended that all reagents for RNA work be kept separate from those for DNA work to avoid RNase contamination.
10. There are protocols that do not rely on phenol/chloroform to inactivate or remove DNase, including heat inactivation, chemical inactivation, and column-based purification. However, phenol/chloroform-based purification is the gold standard for protein removal and the only way to ensure that all DNase is removed. The other protocols may inactivate or remove most of the DNase, but for RT-PCR applications, even minute amounts of DNase will cause major problems with your results. Therefore, we only recommend phenol/chloroform-based purification.
11. To check for RNA integrity, look for the clear appearance of the ribosomal RNA bands, with little to no smearing. RNA from different species can look significantly different, but mammalian RNA should have 28S and 18S ribosomal RNA (rRNA) bands in close proximity at the top of the gel and a 5S rRNA band lower. If the RNA appears degraded, this can have several causes:

RNA was degraded before treatment with DNase I. Check the integrity at all stages (before digestion, after digestion, after phenol/CHCl3 extraction, etc.). Make sure that RNA is stored at −80°C at concentrations of at least 1 mg/mL.

DNase I was contaminated. DNase I from many vendors contains detectable RNase contamination. The DNase I from the MessageClean Kit is guaranteed to be RNase-free.

RNA was degraded by reagents or equipment. Make sure all solutions and buffers are made with DEPC-treated dH2O and all vessels, including tubes, tips, and gel boxes, are free of RNase.

The RNA sample itself is contaminated with RNase. This is a common problem with RNA extracted from large amounts of tissue, which is why at least two extraction steps are recommended for tissues. To confirm RNase contamination, incubate RNA with 1–2 mM MgCl2 in Tris–Cl, pH 8.0, at 37°C for 30 min. This will activate any RNase in the RNA. If RNase contamination is confirmed and enough “uncleaned” RNA remains, do an additional phenol/CHCl3 extraction of the RNA sample following the same procedures as in Subheading 3.2.2. If not, start a new RNA extraction, increase the ratio of RNA extraction solution (RNApure®) to tissue, and do an additional phenol/CHCl3 extraction step.

The RNA sample only appears to be degraded on the agarose gel. Sometimes the actual problem is the pH of the buffer, too much salt in the RNA, or bad loading dye, any of which can cause the ribosomal RNAs (28S and 18S) to migrate strangely. We recommend using RNA Loading Mix (GenHunter). Confirm the pH of the MOPS buffer, which should be between 6.5 and 7.0. Also, make sure formaldehyde is added to the gel and the RNA sample is denatured by incubating in RNA Loading Mix at 65°C for 10 min before loading.

12. RNA samples should be freshly diluted with dH2O or DEPC-treated H2O to 0.1 mg/mL directly before setting up the RT reaction. Do not reuse the diluted RNA after freezing and thawing because the RNA will be degraded and yield poor results.
13. For the reverse transcription reaction, the initial 65°C incubation is intended to denature the RNA secondary structure. The final incubation at 75°C is to inactivate the reverse transcriptase without denaturing the cDNA/mRNA duplexes. Therefore, “hot start” PCR is neither necessary nor helpful for the subsequent PCRs using cDNAs as templates.
14. If not using the recommended thermal cycler, you may need to adjust the denaturation (94°C) time to 30 s.
15. The PCR setup for H-AP primers 73–80 can be done at the same time for all three FH-T11M primers so they can all be put on one 96-well plate.


16. Gel debris and streaks on the glass plates will usually fluoresce and can cause major background problems. Therefore, thorough cleaning is required.
17. If overnight gel polymerization is done, plastic wrap, such as Saran Wrap, should be used to prevent the gel from drying out.
18. During sample loading, it is crucial that the urea in the wells be completely flushed out right before loading your samples. Because urea is heavier than water, it will fall to the bottom of the well fairly quickly. If a sample is loaded without flushing the well, it will sit on top of the urea, which in turn causes strange migration and poor resolution. For best resolution, flush every four to six wells to be loaded using a syringe or pipet while trying not to disturb samples that have already been loaded.
19. Fluorescent dyes are light sensitive. We recommend keeping primers and samples in the dark or covered with aluminum foil. While running the gel, the apparatus should also be kept in the dark as much as possible. This can be done by running gels in a dark room or using a cardboard box to cover the entire apparatus.
20. When scanning the gel, it is best to remove the gel tape, spacers, and comb, which will fluoresce and can cause background problems. However, if you think you might run the gel longer for better separation, you should do a quick scan before removing the gel tape, spacers, and comb to determine whether the gel has been run long enough.
21. To separate the glass plates, we have found that small plastic wedges, which can be purchased from several gel companies, work well. It is important to do this slowly to make sure that the gel is sticking to only one side.
22. When printing out the real-size image, you will need paper large enough to fit the whole gel. We use 11 × 17 paper on an ink-jet printer, which allows plenty of space for the entire gel. If necessary, you could also print the gel on two to three pieces of paper and tape them together.
23. When selecting bands to cut, if there is a chance that a band is worth pursuing, it is recommended to cut it out. Later, a decision can be made whether or not to reamplify that band. However, if one later decides to pursue a band that was not cut, the gel will have to be run again because gels can only be stored for a few days before drying out. When many gels are being run, it usually makes sense to run all the gels first, cutting any interesting bands along the way and storing those bands in the refrigerator. When all gels have been completed, a decision can be made regarding which bands are worth reamplifying, and then they can be reamplified together.


24. For the reamplification reaction, note that the unlabeled (without 5′ fluorophore) H-T11M primers are used. Otherwise, the fluorophore can interfere with future cloning.
25. Direct sequencing can sometimes be done after successful reamplification. If the reamplified product is a single, clean band, direct sequencing with the H-AP primer works approximately 50% of the time. However, if the reamplified product has multiple bands, a cloning step will have to be done first.
26. The PCR-TRAP® Cloning System is by far the most efficient cloning method for PCR products that we have tested. It utilizes a third-generation cloning vector that features positive selection for DNA inserts: only recombinant plasmids confer antibiotic resistance. The principle of this cloning system is based on the phage Lambda repressor gene, cI, which is cloned on the PCR-TRAP® vector and codes for a repressor protein. The repressor protein binds to the Lambda right operators Or1 to Or3 of the cro gene, thereby turning off the promoter that drives the TetR gene on the plasmid. Therefore, cloning of the PCR products directly into the cI gene, without any post-PCR purification, inactivates the repressor gene and thus turns on the TetR gene. This allows the Escherichia coli containing recombinant plasmids to grow on Tet plates.

References

1. Sager, R. (1997) Expression genetics in cancer: shifting the focus from DNA to RNA. Proc. Natl. Acad. Sci. USA 94, 952–955.
2. Vogelstein, B., Lane, D., and Levine, A.J. (2000) Surfing the p53 network. Nature 408, 307–310.
3. Liang, P., and Pardee, A.B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257, 967–971.
4. Liang, P. (2002) A decade of differential display. Biotechniques 33, 338–346.
5. Liang, P., and Pardee, A.B. (2003) Analysing differential gene expression in cancer. Nat. Rev. Cancer 3, 869–876.
6. Liang, P., Meade, J., and Pardee, A.B. (2007) A protocol for differential display of mRNA expression using either fluorescent or radioactive labeling. Nat. Protoc. 2, 457–470.
7. Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470.

8. Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C., Stern, D., et al. (1996) Accessing genetic information with high-density DNA arrays. Science 274, 610–614.
9. Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. (1995) Serial analysis of gene expression. Science 270, 484–487.
10. Zimmermann, C.R., Orr, W.C., Leclerc, R.F., Barnard, E.C., and Timberlake, W.E. (1980) Molecular cloning and selection of genes regulated in Aspergillus development. Cell 21, 709–715.
11. McCarthy, S.A., Samuels, M.L., Pritchard, C.A., Abraham, J.A., and McMahon, M. (1995) Rapid induction of heparin-binding epidermal growth factor/diphtheria toxin receptor expression by Raf and Ras oncogenes. Genes Dev. 9, 1953–1964.
12. Zhang, R., Tan, Z., and Liang, P. (2000) Identification of a novel ligand-receptor pair constitutively activated by Ras oncogenes. J. Biol. Chem. 275, 24436–24443.
13. You, M., Ku, P.T., Hrdlickova, R., and Bose, H.R., Jr. (1997) ch-IAP1, a member of the inhibitor-of-apoptosis protein family, is a mediator of the antiapoptotic activity of the v-Rel oncoprotein. Mol. Cell. Biol. 17, 7328–7341.
14. Park, B.-W., O’Rourke, D.M., Wang, Q., Davis, J.G., Post, A., Qian, X., et al. (1999) Induction of the Tat-binding protein 1 gene accompanies the disabling of oncogenic erbB receptor tyrosine kinases. Proc. Natl. Acad. Sci. USA 96, 6434–6438.
15. Wang, M., Tan, Z., Zhang, R., Kotenko, S.V., and Liang, P. (2002) Interleukin-24 (Mob-5/Mda-7) signals through two heterodimeric receptors, IL-22R1/IL-20R2 and IL-20R1/IL-20R2. J. Biol. Chem. 277, 7341–7347.
16. El-Deiry, W.S. (1998) Regulation of p53 downstream genes. Semin. Cancer Biol. 8, 345–357.
17. Wu, X., Bayle, J.H., Olson, D., and Levine, A.J. (1993) The p53-mdm-2 autoregulatory feedback loop. Genes Dev. 7, 1126–1132.
18. El-Deiry, W.S., Tokino, T., Velculescu, V.E., Levy, D.B., Parsons, R., Trent, J.M., et al. (1993) WAF1, a potential mediator of p53 tumor suppression. Cell 75, 817–825.
19. Miyashita, T., and Reed, J.C. (1995) Tumor suppressor p53 is a direct transcriptional activator of the human bax gene. Cell 80, 293–299.
20. Okamoto, K., and Beach, D. (1994) Cyclin G is a transcriptional target of the p53 tumor suppressor protein. EMBO J. 13, 4816–4822.
21. Buckbinder, L., Talbott, R., Velasco-Miguel, S., Takenaka, I., Faha, B., Seizinger, B.R., et al. (1995) Induction of the growth inhibitor IGF-binding protein 3 by p53. Nature 377, 646–649.
22. Polyak, K., Xia, Y., Zweier, J.L., Kinzler, K.W., and Vogelstein, B. (1997) A model for p53-induced apoptosis. Nature 389, 300–305.
23. Wu, G.S., Burns, T.F., McDonald, E.R., Jiang, W., Meng, R., Krantz, I.D., et al. (1997) KILLER/DR5 is a DNA damage-inducible p53-regulated death receptor gene. Nat. Genet. 17, 141–143.
24. Gu, Z., Flemington, C., Chittenden, T., and Zambetti, G.P. (2000) ei24, a p53 response gene involved in growth suppression and apoptosis. Mol. Cell. Biol. 20, 233–241.
25. Israeli, D., Tessler, E., Haupt, Y., Elkeles, A., Wilder, S., Amson, R., et al. (1997) A novel p53-inducible gene, PAG608, encodes a nuclear zinc finger protein whose overexpression promotes apoptosis. EMBO J. 16, 4384–4392.
26. Lo, P.K., Chen, J.-Y., Lo, W.-C., Chen, B.-F., Hsin, J.-P., Tang, P.-P., et al. (1999) Identification of a novel mouse p53 target gene DDA3. Oncogene 18, 7765–7774.
27. Takei, Y., Ishikawa, S., Tokino, T., Muto, T., and Nakamura, Y. (1998) Isolation of a novel TP53 target gene from a colon cancer cell line carrying a highly regulated wild-type TP53 expression system. Genes Chromosomes Cancer 23, 1–9.
28. Ng, C.C., Koyama, K., Okamura, S., Kondoh, H., Takei, Y., and Nakamura, Y. (1999) Isolation and characterization of a novel TP53-inducible gene, TP53TG3. Genes Chromosomes Cancer 26, 329–335.
29. Tanaka, H., Arakawa, H., Yamaguchi, T., Shiraishi, K., Fukuda, S., Matsui, K., et al. (2000) A ribonucleotide reductase gene involved in a p53-dependent cell-cycle checkpoint for DNA damage. Nature 404, 42–49.
30. Attardi, L., Reczek, E.E., Cosmas, C., Demicco, E.G., McCurrach, M.E., Lowe, S.W., et al. (2000) PERP, an apoptosis-associated target of p53, is a novel member of the PMP22/gas3 family. Genes Dev. 14, 704–718.
31. Saller, E., Tom, E., Brunori, M., Otter, M., Estreicher, A., Mack, D.H., et al. (1999) Increased apoptosis induction by 121F mutant p53. EMBO J. 18, 4424–4437.
32. Oda, E., Ohki, R., Murasawa, H., Nemoto, J., Shibue, T., Yamashita, T., et al. (2000) Noxa, a BH3-only member of the Bcl-2 family and candidate mediator of p53-induced apoptosis. Science 288, 1053–1058.
33. Lin, Y., Ma, W., and Benchimol, S. (2000) Pidd, a new death-domain-containing protein is induced by p53 and promotes apoptosis. Nat. Genet. 26, 124–127.
34. Oda, E., Arakawa, H., Tanaka, T., Matsuda, K., Tanikawa, C., Mori, T., et al. (2000) p53AIP1, a potential mediator of p53-dependent apoptosis, and its regulation by Ser-46-phosphorylated p53. Cell 102, 849–862.
35. Okamura, S., Arakawa, H., Tanaka, T., Nakanishi, H., Ng, C.C., Taya, Y., et al. (2001) p53DINP1, a p53-inducible gene, regulates p53-dependent apoptosis. Mol. Cell 8, 85–94.
36. Yu, J., Zhang, L., Hwang, P.M., Kinzler, K.W., and Vogelstein, B. (2001) PUMA induces the rapid apoptosis of colorectal cancer cells. Mol. Cell 7, 673–682.
37. Nakano, K., and Vousden, K.H. (2001) PUMA, a novel proapoptotic gene, is induced by p53. Mol. Cell 7, 683–694.
38. Leng, R.P., Lin, Y., Ma, W., Wu, H., Lemmers, B., Chung, S., et al. (2003) Pirh2, a p53-induced ubiquitin-protein ligase, promotes p53 degradation. Cell 112, 779–791.
39. Yin, Y., Liu, Y.-X., Jin, Y.J., Hall, E.J., and Barrett, J.C. (2003) PAC1 phosphatase is a transcription target of p53 in signalling apoptosis and growth suppression. Nature 422, 527–531.
40. Owen-Schaub, L.B., Zhang, W., Cusack, J.C., Angelo, L.S., Santee, S.M., Fujiwara, T., et al. (1995) Wild-type human p53 and a temperature-sensitive mutant induce Fas/APO-1 expression. Mol. Cell. Biol. 15, 3032–3040.
41. Kannan, K., Kaminski, N., Rechavi, G., Jakob-Hirsch, J., Amariglio, N., and Givol, D. (2001) DNA microarray analysis of genes involved in p53 mediated apoptosis: activation of Apaf-1. Oncogene 20, 3449–3455.
42. Stambolic, V., MacPherson, D., Sas, D., Lin, Y., Snow, B., Jang, Y., et al. (2001) Regulation of PTEN transcription by p53. Mol. Cell 8, 317–325.
43. Sax, J.K., Fei, P., Murphy, M.E., Bernhard, E., Korsmeyer, S.J., and El-Deiry, W.S. (2002) BID regulation by p53 contributes to chemosensitivity. Nat. Cell Biol. 4, 842–849.
44. Cho, Y.-J., Meade, J.D., Walden, J.C., Chen, X., Guo, Z., and Liang, P. (2001) Multicolor fluorescent differential display. Biotechniques 30, 562–572.
45. Meade, J.D., Cho, Y.-J., Fisher, J.S., Walden, J.C., Guo, Z., and Liang, P. (2005) Automation of fluorescent differential display with digital readout. In Differential Display Methods and Protocols, 2nd edition. Vol. 317 (Liang, P., Meade, J.D., & Pardee, A.B., eds.) Humana Press, Totowa, NJ, pp. 23–57.
46. Bauer, D., Muller, H., Reich, J., Riedel, H., Ahrenkiel, V., Warthoe, P., et al. (1993) Identification of differentially expressed mRNA species by an improved display technique (DDRT-PCR). Nucleic Acids Res. 21, 4272–4280.
47. Liang, P., Bauer, D., Averboukh, L., Warthoe, P., Rohrwild, M., Muller, H., et al. (1995) Analysis of altered gene expression by differential display. Methods Enzymol. 254, 304–321.
48. Liang, P., Zhu, W., Zhang, X., Guo, Z., O’Connell, R.P., Averboukh, L., et al. (1994) Differential display using one-base anchored oligo-dT primers. Nucleic Acids Res. 22, 5763–5764.
49. Liang, P., Averboukh, L., and Pardee, A.B. (1994) Method of differential display. In Methods in Molecular Genetics (Adolph, K.W., ed.) Academic, San Diego, CA, pp. 3–16.

50. Yang, S., and Liang, P. (2004) Global analysis of gene expression by differential display: a mathematical model. Mol. Biotechnol. 27, 197–208.
51. Liang, P., Averboukh, L., and Pardee, A.B. (1993) Distribution and cloning of eukaryotic mRNAs by means of differential display: refinements and optimization. Nucleic Acids Res. 21, 3269–3275.
52. Hsu, D.K., Donohue, P.J., Alberts, G.F., and Winkles, J.A. (1993) Fibroblast growth factor-1 induces phosphofructokinase, fatty acid synthase and Ca(2+)-ATPase mRNA expression in NIH 3T3 cells. Biochem. Biophys. Res. Commun. 197, 1483–1491.
53. Sokolov, B.P., and Prockop, D.J. (1994) A rapid and simple PCR-based method for isolation of cDNAs from differentially expressed genes. Nucleic Acids Res. 22, 4009–4015.
54. Irie, T., Oshida, T., Hasegawa, H., Matsuoka, Y., Li, T., Oya, Y., et al. (2000) Automated DNA fragment collection by capillary array gel electrophoresis in search of differentially expressed genes. Electrophoresis 21, 367–374.
55. Ausubel, F., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., et al. (eds.) (1995) Short Protocols in Molecular Biology (3rd edition). Wiley, New York, NY. Section 4.9.1–4.9.8.
56. Zhang, H., Zhang, R., and Liang, P. (1996) Differential screening of gene expression difference enriched by differential display. Nucleic Acids Res. 24, 2454–2455.
57. Ramdas, L., Coombes, K.R., Baggerly, K., Abruzzo, L., Highsmith, W.E., Krogmann, T., et al. (2001) Sources of nonlinearity in cDNA microarray expression measurements. Genome Biol. 2, RESEARCH0047.
58. Richmond, C.S., Glasner, J.D., Mau, R., Jin, H., and Blattner, F.R. (1999) Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 27, 3821–3835.
59. Gibbs, W.W. (2001) Shrinking to enormity: DNA microarrays are reshaping basic biology – but scientists fear that they may soon drown in data. Sci. Am. 284, 33–34.
60. Liang, P. (2000) Gene discovery using differential display. Gen. Eng. News 20, 37.
61. Liang, S., Rossby, S.P., Liang, P., Shelton, R.C., Manier, D.H., Chakrabarti, A., et al. (2005) Detection of an mRNA polymorphism by differential display. In Differential Display Methods and Protocols, 2nd edition. Vol. 317 (Liang, P., Meade, J.D., & Pardee, A.B., eds.) Humana Press, Totowa, NJ, pp. 279–285.

Chapter 8

Manual Microdissection Combined with Antisense RNA–LongSAGE for the Analysis of Limited Cell Numbers

Jutta Lüttges, Stephan A. Hahn, and Anna M. Heidenblut

Summary

Establishing a gene expression profile of defined subtypes of cells within an organ is still challenging because it frequently requires microdissection and subsequent amplification of the limited amount of messenger RNA (mRNA) isolated from the microdissected tissue before comprehensive gene expression analyses via microarray or serial analysis of gene expression (SAGE) technology can proceed. Here we describe a manual microdissection strategy for the isolation of high-quality RNA. Furthermore, we describe a strategy for combining linear amplification of RNA with longSAGE that allows antisense RNA (aRNA), generated via the well-established linear RNA amplification procedure, to be used with the conventional SAGE or longSAGE technology.

Key words: Microdissection, RNA amplification, T7 RNA polymerase, Amplified antisense RNA, aRNA, aRNA-longSAGE, Expression profiles

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_8, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

To be able to analyze the expression profile of distinct histological cell types within a complex primary tissue, a method to isolate the cells of interest is needed. Microdissection using laser capture or manual techniques has been successfully used to produce such highly enriched cell preparations. Manual microdissection is described herein because in our hands this procedure was easier and faster than laser capture microdissection.

Amplified antisense RNA (aRNA)-long serial analysis of gene expression (longSAGE) is a modification of the conventional SAGE protocol that allows the generation of SAGE libraries from very small sample sizes such as microdissected cells (1). As little as 40 ng of total RNA is sufficient to generate an aRNA-longSAGE library. This is achieved by linear amplification of RNA that is carried out prior to the synthesis of the SAGE library.

Linear amplification of RNA (2) is a method routinely used in gene expression profiling via microarrays. It starts with a complementary DNA (cDNA) synthesis using a modified oligo(dT) primer that adds the T7 RNA polymerase promoter to the 3′ end of the cDNA. In vitro transcription of this cDNA with T7 RNA polymerase yields amplified aRNA. This technique introduces less amplification bias than polymerase chain reaction (PCR)-based cDNA amplification protocols (3). Furthermore, the use of aRNA in differential gene expression analysis leads to the detection of expression differences that are not observed when using nonamplified RNA as starting material (4, 5). The majority of these additional expression differences can be verified by quantitative real-time PCR (4).

The aRNA obtained by linear amplification of RNA cannot be used in combination with the standard SAGE protocol because the latter needs sense RNA for the cDNA synthesis that is the first step of library generation. The aRNA-longSAGE protocol described herein uses a modified cDNA synthesis to adapt the SAGE procedure to the use of antisense RNA. This is done by using a random primer for the cDNA first strand synthesis (see Fig. 1). This so-called “SAGErandom” primer consists of six random nucleotides and the recognition site of the SAGE anchoring enzyme NlaIII. The NlaIII site was included to specifically reverse transcribe only those RNA molecules that are accessible to the SAGE procedure. After first strand synthesis, the RNA is digested with RNaseH and the resulting cDNA first strand can be hybridized to oligo(dT) beads due to its polyA tail. The oligo(dT) sequence on the beads serves as a primer for the synthesis of the second cDNA strand. After second strand synthesis, all further steps are done according to the MicroSAGE protocol (6) adapted for longSAGE (7). Some minor modifications were introduced to improve the yield of ditags and the length of concatemers (see Fig. 2 for gel photographs of an aRNA-longSAGE library).

Fig. 1. cDNA synthesis in the aRNA-longSAGE protocol.

Fig. 2. Representative gel photographs of an aRNA-longSAGE library. (a) 0.5 µl first strand cDNA on a 1% agarose gel; (b) ditag PCR on a 12% polyacrylamide gel (−, negative control; +, positive control); (c) isolated ditags on a 12% polyacrylamide gel (PCR, 5 µl of ditag PCR prior to Hsp92II digestion; four of seven lanes with isolated ditags are shown in the photograph); (d) concatemers (C) on an 8% polyacrylamide gel; (e) insert PCR on a 1.5% agarose gel; the horizontal line shows the product size that corresponds to an empty cloning vector.


Using the longSAGE rather than the conventional SAGE protocol improves the annotation of SAGE tags because longSAGE tags are 21 bp long, whereas conventional SAGE tags are only 14 bp long. However, the modified cDNA synthesis used in the aRNA-longSAGE protocol can be combined with the conventional SAGE protocol as well as with the longSAGE protocol.
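The difference in tag length can be illustrated with a minimal sketch. It assumes, following the general SAGE principle rather than anything stated explicitly above, that a tag starts at the 3′-most CATG anchoring site of the sense-strand cDNA and that the quoted lengths (21 bp and 14 bp) include the four bases of that site. The function name and the example sequence are hypothetical.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # recognition site shared by NlaIII and Hsp92II


def extract_tag(cdna_sense: str, tag_length: int) -> Optional[str]:
    """Return the tag starting at the 3'-most anchoring site, or None.

    Assumption: the tag is counted from the first base of the CATG site.
    """
    seq = cdna_sense.upper()
    site = seq.rfind(ANCHOR_SITE)  # 3'-most CATG on the sense strand
    if site == -1 or site + tag_length > len(seq):
        return None  # no anchoring site, or not enough sequence downstream
    return seq[site:site + tag_length]


# Hypothetical transcript fragment, invented for illustration only.
transcript = "TTGACATGACGTACGTTAGCCATGGATCCGATTACAGGCTTAAGCCTAGAAAAAAAAAA"
long_tag = extract_tag(transcript, 21)   # longSAGE tag length from the text
short_tag = extract_tag(transcript, 14)  # conventional SAGE tag length
print(long_tag, short_tag)
```

The short tag is a prefix of the long tag, so the seven extra bases of a longSAGE tag can only narrow down the set of transcripts that a tag matches.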

2. Materials

2.1. Microdissection

1. Microtome for cryosectioning with a blade holder for disposable blades.
2. Tissue-Tek OCT (Sakura Finetek Europe) for embedding of frozen tissue samples.
3. Disposable microtome blades.
4. RNase-free, sterile glass slides.
5. Coverslips.
6. Histomount (Pertex, Histolab Products AB, Göteborg, Sweden).
7. Hematoxylin and eosin staining solution.
8. Light microscope (e.g., BH2, Olympus).
9. RNase-free 200-µl tubes.
10. Sterile disposable hypodermic needles (size 0.60 × 25 mm, 23 gauge × 1″, e.g., Braun Sterican).
11. Extraction buffer from the PicoPure RNA isolation kit (Arcturus Engineering).

2.2. RNA Isolation

1. PicoPure RNA isolation kit (Arcturus Engineering).
2. Nuclease-free pipette tips.
3. 0.5-ml microcentrifuge tubes.
4. 2-ml lidless tubes.
5. RNase-Free DNase Set (Qiagen Catalog #79254).

2.3. Preparation of Amplified Antisense RNA

1. RNA amplification kit, e.g., MessageAmp from Ambion (Huntingdon, UK).

2.4. cDNA Synthesis

1. DEPC-treated water: Add 2 ml diethylpyrocarbonate (DEPC) to 1 l of water (see Note 1), shake for 30 min, and autoclave.
2. SAGErandom oligonucleotide 5′-NNN NNN CATG-3′.
3. dNTP mix, 10 mM each, store aliquots at −20°C.
4. Dry ice and wet ice.
5. 5× First Strand Buffer, 0.1 M DTT, RNaseOUT, and SuperScript III Reverse Transcriptase (all from Invitrogen, Karlsruhe, Germany).
6. 5× Second Strand Buffer: 94 mM Tris/HCl, pH 6.9, 453 mM KCl, 23 mM MgCl2, 50 mM (NH4)2SO4, 0.75 mM β-NAD.
7. RNaseH (5 U/µl; USB, Cleveland, OH, USA), diluted with Second Strand Buffer to a final concentration of 2 U/µl.
8. 1.5-ml sterile siliconized microcentrifuge tubes (Ambion).
9. Oligo(dT)25 Beads (Dynal Biotech, Hamburg, Germany) and a magnetic stand.
10. Overhead shaker.
11. Binding buffer and washing buffer B from Dynal (Dynabeads® mRNA Purification Kit).
12. 11.8 U/µl Escherichia coli DNA polymerase I (USB), 10 U/µl E. coli DNA ligase (USB), and 3 U/µl T4 DNA Polymerase (NEB, Frankfurt am Main, Germany).
13. Thermomixer and thermal cycler for incubation steps.
14. 0.5 M EDTA.
15. Buffer BW: 1 M NaCl, 5 mM Tris-HCl, pH 7.5, 0.5 mM EDTA; and BW/BSA: Buffer BW containing 0.1 mg/ml BSA (NEB).
16. 10× Buffer K from Promega (Ingelheim, Germany).

2.5. Cleavage of cDNA with the Anchoring Enzyme (Hsp92II)

1. 100× BSA (10 mg/ml, NEB).
2. 10 U/µl Hsp92II (see Note 2) and 10× Buffer K (both from Promega).
3. 5× ligase buffer from Invitrogen.

2.6. Ligating Linkers to Bound cDNA

1. Linker oligonucleotides:

Linker1A: 5′-TTT GGA TTT GCT GGT GCA GTA CAA CTA GGC TTA ATA TCC GAC ATG-3′

Linker1B: 5′-TCG GAT ATT AAG CCT AGT TGT ACT GCA CCA GCA AAT CC(Amino C7)-3′

Linker2A: 5′-TTT CTG CTC GAA TTC AAG CTT CTA ACG ATG TAC GTC CGA CAT G-3′

Linker2B: 5′-TCG GAC GTA CAT CGT TAG AAG CTT GAA TTC GAG CAG(Amino C7)-3′

Linker oligonucleotides are dissolved in loTE to a final concentration of 350 ng/µl and stored at −20°C.

2. loTE: 3.0 mM Tris-HCl, pH 7.5, 0.3 mM EDTA.
3. 10× polynucleotide kinase buffer, 10 mM ATP, and T4 polynucleotide kinase (10 U/µl; all from NEB).
4. 5 U/µl HC T4 ligase and 5× ligase buffer (both from Invitrogen).
5. 10× Buffer 4 from NEB.

2.7. Release of cDNA Tags Using the Tagging Enzyme MmeI

1. 32 mM S-adenosylmethionine (NEB).
2. 2 U/µl MmeI and 10× Buffer 4 from NEB.
3. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth (Karlsruhe, Germany), store at 4°C.
4. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.

2.8. Ligating Tags to form Ditags

1. 5 U/µl HC T4 ligase and 5× ligase buffer (both from Invitrogen).

2.9. PCR Amplification of Ditags

1. 10× BV-Mg Buffer: 670 mM Tris–HCl, pH 8.8, 167 mM (NH4)2SO4, 67 mM MgCl2, 100 mM β-mercaptoethanol.
2. DMSO, store at −20°C.
3. dNTP mix, 10 mM each, store aliquots at −20°C.
4. Oligonucleotides “Primer 1”: 5′-GTG CTC GTG GGA TTT GCT GGT GCA GTA CA-3′ and “Primer 2”: 5′-GAG CTC GTG CTG CTC GAA TTC AAG CTT CT-3′. PCR primers are resuspended in loTE to a final concentration of 350 ng/µl and stored at −20°C.
5. Taq polymerase.
6. 40% acrylamide/bis solution (19:1; this is a neurotoxin when unpolymerized, handle with care) and N,N,N′,N′-tetramethylethylenediamine (TEMED).
7. Ammonium persulfate: prepare a 10% solution in water and store at 4°C.
8. Molecular weight marker for gel electrophoresis: 25-bp ladder from Invitrogen, diluted to a final concentration of 50 ng/µl.
9. Running buffer TAE: 40 mM Tris base, 20 mM acetic acid, 2 mM EDTA.
10. Loading buffer: 20.0% (w/v) Ficoll® 70, 1.6% (v/v) glycerol, 0.01% (w/v) lauryl sarcosine, 0.001% (w/v) xylene cyanol, 0.001% (w/v) bromphenol blue in TAE.
11. Staining solution: dilute 5 µl SYBR Green 1 concentrate (BioWhittaker, Rockland, ME, USA; SYBR Green 1 is toxic, handle with care) with 50 ml TAE. Prepare the staining solution fresh as required, because SYBR Green 1 is not stable in aqueous solution.
12. 15-ml polypropylene tubes and 5-ml glass pipette for PC8 extraction of amplified ditags.
13. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
14. 15-ml centrifugation tubes.
15. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.

2.10. Isolation of Ditags

1. 100× BSA, 10 mg/ml.
2. 10 U/µl Hsp92II and 10× Buffer K (both from Promega).
3. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
4. 10 M ammonium acetate solution, 20 mg/ml glycogen (store at −20°C), and ethanol.
5. TE solution (Invitrogen).
6. 40% acrylamide/bis solution (19:1; this is a neurotoxin when unpolymerized, handle with care) and N,N,N′,N′-tetramethylethylenediamine (TEMED).
7. Ammonium persulfate: prepare a 10% solution in water and store at 4°C.
8. Glycerol.
9. Molecular weight marker for gel electrophoresis: 25-bp ladder from Invitrogen, diluted to a final concentration of 50 ng/µl.
10. Running buffer TAE: 40 mM Tris base, 20 mM acetic acid, 2 mM EDTA.
11. Staining solution: dilute 5 µl SYBR Green 1 concentrate (BioWhittaker; SYBR Green 1 is toxic, handle with care) with 50 ml TAE. Prepare the staining solution fresh as required, because SYBR Green 1 is not stable in aqueous solution.
12. Electroelution device (Elutrap system from Schleicher & Schüll, Dassel, Germany).
13. Chloroform.
14. 3 M sodium acetate solution, pH 5.2, glycogen (store at −20°C), and ethanol.


2.11. Concatenation of Ditags

1. 5 U/µl HC T4 ligase and 5× ligase buffer (both from Invitrogen).
2. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
3. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.
4. 40% acrylamide/bis solution (19:1; this is a neurotoxin when unpolymerized, handle with care) and N,N,N′,N′-tetramethylethylenediamine (TEMED).
5. Ammonium persulfate: prepare a 10% solution in water and store at 4°C.
6. Loading buffer: 20.0% (w/v) Ficoll® 70, 1.6% (v/v) glycerol, 0.01% (w/v) lauryl sarcosine, 0.001% (w/v) xylene cyanol, 0.001% (w/v) bromphenol blue in TAE.
7. Molecular weight marker: Smart Ladder Short Fragments from Eurogentec, Seraing, Belgium.
8. Running buffer TAE: 40 mM Tris base, 20 mM acetic acid, 2 mM EDTA.
9. Staining solution: dilute 5 µl SYBR Green 1 concentrate (BioWhittaker; SYBR Green 1 is toxic, handle with care) with 50 ml TAE. Prepare the staining solution fresh as required, because SYBR Green 1 is not stable in aqueous solution.
10. Electroelution device (Elutrap system from Schleicher & Schüll).
11. Chloroform.
12. 3 M sodium acetate solution, pH 5.2.

2.12. Cloning Concatemers

1. pZERO-1 supercoiled (1 mg/ml; part of the Zero Background Cloning kit from Invitrogen).
2. 10× Buffer 2 (NEB) and 5 U/µl SphI (NEB).
3. loTE (see Subheading 2.6, item 2) and TE (Invitrogen).
4. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
5. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.
6. 5× ligase buffer and 1 U/µl T4 DNA ligase (both from Invitrogen).

2.13. Transformation of Bacteria

1. Electrocompetent E. coli Top Ten bacteria (part of the Zero Background Cloning kit from Invitrogen).
2. Electroporation device.
3. LB broth (Sigma, Taufkirchen, Germany).
4. 100 mg/ml Zeocin (Invitrogen; store at −20°C, this antibiotic is light sensitive).
5. X-Gal (5-bromo-4-chloro-3-indoxyl-β-D-galactopyranoside from Roth, store at −20°C).
6. LB-Zeocin-X-Gal-Agar: 50 µg/ml Zeocin, 80 µg/ml X-Gal, 1.5% (w/v) agar in LB broth.

2.14. Insert-PCR

1. 5× RDA buffer: 335 mM Tris-HCl, pH 8.8, 80 mM (NH4)2SO4, 50 mM β-mercaptoethanol, 0.5 mg/ml BSA.
2. 50 mM magnesium chloride solution.
3. Oligonucleotides Insert_for: 5′-CTG GTT AAC CTT ACT GGC TGA GTT AGC TCA CTC ATT AGG CAC-3′ and Insert_rev: 5′-TGT AAA ACG ACG GCC AGT TAC GAC TCA CTA TAG GGC GAA TTG-3′.
4. 10 mM dNTP-Mix (Promega, 10 mM each).
5. Taq polymerase.

3. Methods

3.1. Standard Operating Procedure for Specimen Isolation and Storage

1. Immediately after resection, place the specimen on crushed ice.
2. Record the ischemia time (from resection until processing in pathology).
3. Immediately perform a gross pathology inspection of the tissue.
4. Dissect the tissue of choice into samples of approximately 0.5 × 0.5 × 0.5 cm.
5. Wrap the tissue in tinfoil.
6. Snap freeze the wrapped tissue pieces in liquid nitrogen.
7. Transfer the frozen tissues into a −80°C freezer for long-term storage.

3.2. Microdissection

1. Make sure that all areas of the microtome that come into contact with the tissue are RNase-free by treating the microtome with 100% ethanol. It is highly recommended to use disposable blades.
2. Mount the frozen tissue block using Tissue-Tek OCT.
3. For each specimen, prepare a 5-µm frozen tissue section on a standard glass slide. This tissue section serves as a reference for identifying the areas of interest for microdissection and is stored for documentation.
4. Stain each slide with standard hematoxylin and eosin (H&E) using cooled dyes and seal it with Histomount (Pertex) and a coverslip.
5. Using a light microscope, identify tissue areas containing cells of interest on the stained reference section. This helps to identify the required cells during subsequent microdissection.
6. Generate one to several 10-µm tissue sections from the remaining tissue block via serial sectioning using sterile, RNase-free, frosted, cooled glass slides and store the slides at −20°C until subsequent processing.
7. Briefly fix the tissue sections in RNase-free ethanol (Merck, Darmstadt, Germany).
8. Stain each section with standard H&E staining chemistry using dyes cooled to 4°C (do not seal sections with coverslips) and store the sections at −20°C until microdissection.
9. Manually dissect the tissue under a microscope (e.g., BH2, Olympus) using sterile disposable hypodermic needles (size 0.60 × 25 mm, 23 gauge × 1″, e.g., Braun Sterican). Collect the cells in a 200-µl RNase-free reaction tube containing 50 µl extraction buffer (PicoPure RNA isolation kit, Arcturus Engineering).
10. Incubate the tube containing the cells for 30 min at 42°C in an incubation oven.
11. Proceed with the RNA isolation protocol or freeze the cell extract at −80°C.

3.3. RNA Isolation

This protocol follows, with some minor modifications, the protocol “C” by Arcturus for “Use with CapSure Macro LCM Caps”:

1. Pipette 250 µl Conditioning Buffer (CB) onto the purification column filter membrane.
2. Incubate the RNA purification column with conditioning buffer for 5 min at room temperature.
3. Centrifuge the purification column in the provided collection tube at 16,000 × g for 1 min.
4. Pipette 50 µl of 70% ethanol (EtOH) into the cell extract from Subheading 3.2. Mix well by pipetting up and down. DO NOT CENTRIFUGE.
5. Pipette the cell extract and EtOH mixture into the preconditioned purification column. The cell extract and EtOH will have a combined volume of approximately 100 µl.
6. To bind RNA to the column, centrifuge for 2 min at 100 × g.
7. Immediately follow with a centrifugation at 16,000 × g for 30 s to remove the flow-through.
8. Pipette 100 µl Wash Buffer 1 (W1) into the purification column and centrifuge for 1 min at 8,000 × g.
9. Add 5 µl of DNase I (Qiagen) to 35 µl of RDD buffer (Qiagen), mix well, and add the mixture to the column.
10. Incubate for 15 min at room temperature.
11. Add 40 µl of Wash Buffer 1 (W1) to the column and centrifuge for 15 s at 8,000 × g.
12. Pipette 100 µl Wash Buffer 2 (W2) into the purification column and centrifuge for 1 min at 8,000 × g.
13. Pipette another 100 µl Wash Buffer 2 (W2) into the purification column and centrifuge for 2 min at 16,000 × g.
14. Remove the flow-through and centrifuge again at 16,000 × g for 1 min.
15. Transfer the purification column to a new 0.5-ml microcentrifuge tube provided in the kit.
16. Pipette 12.5 µl nuclease-free water (Qiagen) directly onto the membrane of the purification column (gently touch the tip of the pipette to the surface of the membrane while dispensing the water to ensure maximum absorption of water into the membrane).
17. Incubate the purification column for 1 min at room temperature.
18. Centrifuge the column for 1 min at 1,000 × g to distribute the water in the column.
19. Centrifuge for 1 min at 16,000 × g to elute the RNA.
20. The isolated RNA is now ready for use in downstream applications or may be stored at −80°C until use.
21. Optional: The quality of the RNA can be analyzed on an RNA PicoChip on a BioAnalyzer platform (Agilent, Böblingen, Germany).

3.4. Preparation of Amplified Antisense RNA

Several companies sell kits for RNA amplification via T7 promoter-driven in vitro transcription of cDNA. All protocols start with cDNA synthesis using a modified oligo(dT) primer containing the T7 RNA polymerase promoter. After purification of the cDNA, the in vitro transcription is carried out, followed by purification of the amplified antisense RNA (aRNA). The incubation time for the in vitro transcription varies between protocols. Longer incubation times give a higher yield of aRNA but might also lead to degradation of part of the aRNA. For the aRNA-longSAGE protocol, the in vitro transcription is carried out for 18 h. Shorter incubation times are possible, especially if more than 40 ng of total RNA is available for the amplification procedure. The yield of aRNA can be estimated using an RNA PicoChip on a BioAnalyzer platform (Agilent, Böblingen, Germany). A minimum of 1.2 µg of aRNA should be used for the generation of an aRNA-longSAGE library (see Note 3).

3.5. cDNA Synthesis

1. Add DEPC-treated water to the aRNA to a final volume of 10 µl, then add 2 µl of SAGErandom oligonucleotide and 1 µl of 10 mM dNTP mix and incubate for 5 min at 65°C in a thermal cycler. After the incubation, place the sample on dry ice, thaw on wet ice, and add 4 µl First Strand Buffer, 1 µl of 0.1 M DTT, 1 µl RNaseOUT, and 1 µl SuperScript™ III Reverse Transcriptase. Incubate in a thermal cycler for 5 min at 37°C, 1 h at 50°C, and 15 min at 70°C.
2. Add 1 µl of RNaseH (2 U/µl, diluted in Second Strand Buffer) and incubate for 20 min at 37°C in a thermal cycler.
3. Add 0.5 µl DEPC-treated water, mix well, and remove 0.5 µl of the sample for loading on a 1% agarose gel.
4. Add 79 µl of DEPC-treated water.
5. Wash 200 µl Oligo(dT)25 Beads with 100 µl of binding buffer and resuspend in 100 µl of binding buffer.
6. Mix the sample with the resuspended beads in a siliconized microcentrifuge tube (see Note 4). Put the sample in an overhead shaker and rotate for 15 min at room temperature.
7. Wash the sample twice with 200 µl Washing Buffer B and four times with 200 µl Second Strand Buffer.
8. Resuspend the beads in 112.25 µl of ice-cold DEPC-treated water and add the following components on ice: 32 µl of 5× Second Strand Buffer, 6 µl of 0.1 M DTT, 3 µl dNTP mix (10 mM each), 4.5 µl E. coli DNA Polymerase I (11.8 U/µl), 1.5 µl E. coli DNA ligase (10 U/µl), and 0.75 µl E. coli RNaseH (2 U/µl, diluted in Second Strand Buffer).
9. Incubate in a thermomixer for 2.5 h at 16°C. To keep the beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
10. Add 4 µl T4 DNA polymerase (3 U/µl) and incubate for 5 min at 16°C.
11. Add 4 µl of 0.5 M EDTA and 750 µl of 1× BW and incubate for 20 min at 75°C.
12. Wash the beads once with 750 µl of 1× BW, four times with 750 µl of 1× BW/1× BSA, and twice with 200 µl of 1× Promega Buffer K that contains 0.1 mg/ml BSA.
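As a plausibility check on the pipetting scheme, the component volumes of the first and second strand reactions can be summed and the final buffer strength computed. The sketch below reads all small volumes in steps 1 and 8 as microlitres and takes the First Strand Buffer as a 5× stock, as listed in Subheading 2.4; the dictionary keys and function names are illustrative only.

```python
# Volumes in microlitres, transcribed from steps 1 and 8 (assumed unit: ul).
FIRST_STRAND_UL = {
    "aRNA in DEPC-treated water": 10.0,
    "SAGErandom oligonucleotide": 2.0,
    "dNTP mix (10 mM each)": 1.0,
    "5x First Strand Buffer": 4.0,
    "0.1 M DTT": 1.0,
    "RNaseOUT": 1.0,
    "SuperScript III": 1.0,
}

SECOND_STRAND_UL = {
    "DEPC-treated water (with beads)": 112.25,
    "5x Second Strand Buffer": 32.0,
    "0.1 M DTT": 6.0,
    "dNTP mix (10 mM each)": 3.0,
    "E. coli DNA polymerase I": 4.5,
    "E. coli DNA ligase": 1.5,
    "E. coli RNaseH": 0.75,
}


def total_volume(components):
    """Sum of all component volumes in microlitres."""
    return sum(components.values())


def final_strength(stock_x, stock_volume, total):
    """Final buffer strength (in 'x') after dilution into the reaction."""
    return stock_x * stock_volume / total


first_total = total_volume(FIRST_STRAND_UL)
second_total = total_volume(SECOND_STRAND_UL)
first_buffer_x = final_strength(5, FIRST_STRAND_UL["5x First Strand Buffer"], first_total)
second_buffer_x = final_strength(5, SECOND_STRAND_UL["5x Second Strand Buffer"], second_total)
print(first_total, first_buffer_x, second_total, second_buffer_x)
```

Under these assumptions both reactions come out at 1× buffer (20 µl and 160 µl total), which supports the microlitre reading of the volumes.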

3.6. Cleavage of cDNA with the Anchoring Enzyme (Hsp92II)

1. Resuspend the beads in 200 µl reaction mix containing 1× Buffer K (Promega), 0.1 mg/ml BSA, and 50 U Hsp92II (see Note 2) and incubate in a thermomixer for 1 h at 37°C. To keep the beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
2. Wash the beads once with 750 µl of 1× BW, four times with 750 µl of 1× BW/1× BSA, and twice with 200 µl of 1× ligase buffer (Invitrogen).
3. Resuspend the beads in 200 µl of 1× ligase buffer (Invitrogen).

3.7. Ligating Linkers to Bound cDNA

Prior to the first use, the linker oligonucleotides are phosphorylated and hybridized to obtain linkers “1” and “2.” Phosphorylated and hybridized linkers can be stored at −20°C in aliquots for single use.

1. Linker oligonucleotides “1B” and “2B” are phosphorylated in two separate tubes by adding 6 µl loTE, 2 µl of 10× Polynucleotide Kinase Buffer (NEB), 2 µl of 10 mM ATP (NEB), and 1 µl T4 Polynucleotide Kinase (10 U/µl, NEB) to 9 µl linker oligonucleotide (350 ng/µl). The tubes are incubated in a thermal cycler for 30 min at 37°C and then for 10 min at 65°C.
2. To hybridize the linkers, mix the phosphorylated “Linker B” molecules with 9 µl of the appropriate “Linker A” oligonucleotide (350 ng/µl), i.e., mix phosphorylated Linker 1B with 9 µl Linker 1A and phosphorylated Linker 2B with 9 µl Linker 2A. Incubate both tubes for 2 min at 95°C, 10 min at 65°C, 10 min at 37°C, and 20 min at 22°C in a thermal cycler. Add 271 µl loTE to each tube, aliquot, and store the linkers at −20°C (final concentration of linkers is 20 ng/µl).
3. To ligate the linkers to the immobilized cDNA, divide the sample (200 µl beads in 1× ligase buffer) equally between two new tubes.
4. Remove the supernatant and resuspend in 9 µl reaction mix containing 5 µl loTE, 2 µl of 5× ligase buffer, and 2 µl of kinased and annealed Linker 1 or 2, respectively.
5. Incubate the sample for 2 min at 50°C, then for 10 min at room temperature.
6. Add 1 µl HC T4 ligase (5 U/µl, Invitrogen) to each tube and vortex carefully.
7. Incubate at 16°C for 1¾ h in a thermomixer. To keep the beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
8. Wash the beads once with 500 µl of 1× BW/1× BSA.
9. Unite ligation reactions 1 and 2 in a new tube.
10. Wash the beads three times with 500 µl of 1× BW/1× BSA, once with 200 µl of 1× BW, and once with 200 µl of 1× Buffer 4 (NEB).
11. Resuspend the beads in 200 µl of 1× Buffer 4 (NEB) and store overnight at 4°C.
12. Wash the beads twice with 200 µl of 1× Buffer 4 that was prewarmed to 37°C.
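The linker hybridization in steps 1 and 2 can be checked in silico. The sketch below uses the linker sequences from Subheading 2.6 (without the Amino C7 modification) and tests whether each B strand is the reverse complement of an internal stretch of its A strand; it also recomputes the final linker concentration, reading the volumes as microlitres and the stocks as 350 ng/µl. Function and variable names are illustrative, not part of the protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequences from Subheading 2.6, written 5'->3' without spaces or modifications.
LINKERS = {
    "1": ("TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATATCCGACATG",
          "TCGGATATTAAGCCTAGTTGTACTGCACCAGCAAATCC"),
    "2": ("TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGTCCGACATG",
          "TCGGACGTACATCGTTAGAAGCTTGAATTCGAGCAG"),
}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def describe_duplex(strand_a, strand_b):
    """Locate where strand B anneals on strand A; None if it does not."""
    paired = reverse_complement(strand_b)
    start = strand_a.find(paired)
    if start == -1:
        return None
    return {
        "a_5prime_overhang": strand_a[:start],
        "duplex_bp": len(paired),
        "a_3prime_overhang": strand_a[start + len(paired):],
    }


duplexes = {name: describe_duplex(a, b) for name, (a, b) in LINKERS.items()}

# Final concentration after step 2, per tube (volumes assumed to be in ul).
kinase_reaction_ul = 6 + 2 + 2 + 1 + 9           # loTE, buffer, ATP, kinase, linker B
final_volume_ul = kinase_reaction_ul + 9 + 271   # plus linker A and loTE
linker_mass_ng = (9 + 9) * 350                   # both strands at 350 ng/ul
final_ng_per_ul = linker_mass_ng / final_volume_ul
print(duplexes, final_ng_per_ul)
```

With these inputs both annealed linkers expose a 3′ CATG overhang on the A strand, which matches the overhang left by the anchoring enzyme, and the computed concentration (21 ng/µl) agrees with the stated value of about 20 ng/µl.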


3.8. Release of cDNA Tags Using the Tagging Enzyme MmeI

1. Prepare a 1 mM S-adenosylmethionine (SAM) solution by diluting the 32 mM SAM solution that comes with the MmeI enzyme.
2. Resuspend the beads in 200 µl prewarmed (37°C) reaction mix containing 1× Buffer 4 (NEB), 0.05 mM SAM, and 8 U of MmeI.
3. Incubate at 37°C for 1 h in a thermomixer. To keep the beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
4. Centrifuge at 16,110 × g for 2 min in a microcentrifuge.
5. Transfer the supernatant to a new microcentrifuge tube (there is no longer a need to use siliconized tubes).
6. Resuspend the beads in 40 µl loTE.
7. Centrifuge at 16,110 × g for 2 min.
8. Remove the supernatant and unite it with the supernatant of the first centrifugation step (total volume: 240 µl).
9. Do a PC8 extraction: Add an equal volume of PC8 to the sample, mix well on a vortex, centrifuge at 16,110 × g for 2 min, and transfer the upper (aqueous) phase to a fresh microcentrifuge tube.
10. Remove 40 µl of the sample for use as a “no ligase” control during ditag ligation and PCR amplification of ditags. Dilute this negative control with 160 µl loTE.
11. Precipitate the sample and the negative control by adding 100 µl of 10 M ammonium acetate, 3 µl glycogen, and 1 ml 100% ethanol, and centrifuging for 30 min at 4°C at 16,110 × g.
12. Wash each pellet three times with 500 µl of 70% ethanol.
13. Resuspend the sample in 1.5 µl loTE and 2.5 µl water; resuspend the negative control in 1.5 µl loTE and 3.3 µl water. Incubate both tubes at room temperature for 5 min.
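Steps 1 and 2 give only final concentrations for the MmeI reaction mix, so the stock volumes have to be worked out with the usual C1·V1 = C2·V2 relation. The following sketch does this for a 200 µl reaction, assuming the stocks listed in Subheading 2.7 (10× Buffer 4, MmeI read as 2 U/µl) and the 1 mM SAM working solution from step 1. The helper name and the choice of water as the make-up liquid are assumptions.

```python
def stock_volume_ul(stock_conc, final_conc, total_ul):
    """Volume of stock needed so that final_conc is reached in total_ul."""
    return final_conc * total_ul / stock_conc


TOTAL_UL = 200.0

buffer4_ul = stock_volume_ul(10.0, 1.0, TOTAL_UL)  # 10x Buffer 4 -> 1x
sam_ul = stock_volume_ul(1.0, 0.05, TOTAL_UL)      # 1 mM SAM -> 0.05 mM
mmei_ul = 8.0 / 2.0                                # 8 U from a 2 U/ul stock (assumed reading)
water_ul = TOTAL_UL - (buffer4_ul + sam_ul + mmei_ul)

# Step 1: the 32 mM SAM stock is diluted 32-fold to obtain the 1 mM solution.
sam_predilution_factor = 32.0 / 1.0

print(buffer4_ul, sam_ul, mmei_ul, water_ul, sam_predilution_factor)
```

Under these assumptions the mix consists of roughly 20 µl buffer, 10 µl of 1 mM SAM, 4 µl enzyme, and 166 µl water.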

3.9. Ligating Tags to form Ditags

1. Add 1.2 µl of 5× ligase buffer to the sample and to the negative control, then add 0.8 µl HC T4 ligase (5 U/µl, Invitrogen) to the sample but not to the negative control.
2. Incubate for 2.5 h at 16°C in a thermal cycler.
3. Add 15 µl loTE to the sample and to the negative control.
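The negative control is resuspended in more water than the sample (3.3 versus 2.5, read here as microlitres), which compensates for the ligase that is left out. A short sketch makes the bookkeeping explicit; the dictionary layout is only an illustration.

```python
# Volumes in microlitres (assumed unit), from Subheading 3.8, step 13,
# and Subheading 3.9, step 1.
SAMPLE_UL = {"loTE": 1.5, "water": 2.5, "5x ligase buffer": 1.2, "HC T4 ligase": 0.8}
CONTROL_UL = {"loTE": 1.5, "water": 3.3, "5x ligase buffer": 1.2}


def total_volume(components):
    """Sum of all component volumes in microlitres."""
    return sum(components.values())


def buffer_strength(components):
    """Final ligase buffer strength (in 'x') for a 5x stock."""
    return 5 * components["5x ligase buffer"] / total_volume(components)


sample_total = total_volume(SAMPLE_UL)
control_total = total_volume(CONTROL_UL)
print(sample_total, control_total, buffer_strength(SAMPLE_UL), buffer_strength(CONTROL_UL))
```

Both tubes end up at the same volume and at 1× ligase buffer, so the only difference between sample and control is the enzyme.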

3.10. PCR Amplification of Ditags

To optimize PCR conditions, a test PCR is run using different dilutions of the ditags (1:50/1:100/1:200/1:400 in loTE) at 26, 28, and 31 PCR cycles. A 1:50 dilution of the minus ligase control run at 31 cycles serves as a negative control. Prepare the PCRs under a laminar-flow hood to avoid contamination of the sample.
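The test PCR described above is a small full-factorial design plus one negative control. As a planning aid, the reactions can be enumerated as in the sketch below; the data layout and names are hypothetical, and only the dilutions and cycle numbers come from the text.

```python
from itertools import product

DILUTIONS = [50, 100, 200, 400]  # ditags diluted 1:50 ... 1:400 in loTE
CYCLES = [26, 28, 31]            # PCR cycle numbers tested in parallel


def test_pcr_design():
    """List every test reaction, ending with the minus-ligase control."""
    reactions = [
        {"template": "ditags", "dilution": d, "cycles": c}
        for d, c in product(DILUTIONS, CYCLES)
    ]
    reactions.append({"template": "minus-ligase control", "dilution": 50, "cycles": 31})
    return reactions


design = test_pcr_design()
for reaction in design:
    print("{template:22s} 1:{dilution:<4d} {cycles} cycles".format(**reaction))
```

This gives 13 reactions in total, five of which (four ditag dilutions and the control) run for 31 cycles.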


1. For each PCR, mix 1 µl of template (diluted ditags), 4 µl of 10× BV-Mg Buffer, 3 µl DMSO, 5 µl dNTP mix (10 mM each), 1 µl of each PCR primer (350 ng/µl), and 25 µl of water.
2. Add a drop of mineral oil to each well and incubate in a thermal cycler for 3 min at 95°C, then hold the temperature at 78°C and add 10 µl of polymerase mix containing 3 µl of Taq polymerase in 1× BV-Mg Buffer to each well.
3. Run the PCR for 26, 28, and 31 cycles in parallel, each cycle consisting of 30 s at 95°C, 30 s at 55°C, and 30 s at 70°C. After the last PCR cycle, incubate for 5 min at 70°C.
4. Load 5 µl of each PCR on a 20 × 20-cm polyacrylamide gel (12%, 19:1 acrylamide/bis). Run the gel in TAE buffer at 180 V until the bromphenol blue band of the marker has traveled a distance of approximately 8 cm (see Note 5). Stain the gel in SYBR Green 1 solution for 15 min and visualize the bands under UV light.
5. Use the PCR conditions that were optimal in the test PCR for the large-scale PCR. The large-scale PCR consists of 96 PCRs that are run in parallel and then pooled in a 15-ml polypropylene tube (see Note 6).
6. Centrifuge for 1 min at 2,630 × g and remove the mineral oil.
7. Extract with an equal volume of PC8, then centrifuge for 10 min at 2,200 × g.
8. Transfer 2.1 ml of the sample (upper phase) into each of two centrifuge tubes and add 700 µl of 10 M ammonium acetate, 18 µl of glycogen, and 6 ml of ethanol to each tube. Mix well and centrifuge for 30 min at 4°C and 12,000 × g.
9. Wash the pellet twice with 5 ml of 70% ethanol, remove the supernatant, and air-dry the pellet.
10. Resuspend each pellet in 45 µl loTE.
11. Incubate for 5–10 min at 37°C to aid solubilization.

3.11. Isolation of Ditags

1. Mix the complete sample (approximately 90 µl) with 68 µl water, 20 µl of 10× Buffer K, 2 µl BSA (10 mg/ml), and 20 µl Hsp92II (10 U/µl) and incubate for 1 h at 37°C in a heating block.
2. Do a PC8 extraction, then add 66.7 µl of 10 M ammonium acetate, 3 µl of glycogen, and 1 ml of ethanol.
3. Precipitate the ditags overnight at −70°C.
4. Centrifuge for 30 min at 4°C and 16,110 × g, wash the pellet twice with 500 µl of 70% ethanol, dry the pellet for 10 min at 16°C, and resuspend the pellet in 90 µl of TE.
5. Add 5 µl of glycerol (see Note 7).
6. Load the complete sample on a 20 × 20-cm polyacrylamide gel (12%, 19:1 acrylamide/bis). Run the gel at 4°C and 180 V until the bromphenol blue band of the marker has traveled a distance of approximately 8 cm (see Note 5). Stain the gel in SYBR Green 1 solution for 15 min and visualize the bands under UV light.
7. Cut out the 34-bp ditag band.
8. For electroelution of the ditags, prepare the electroelution device in such a way that the elution chamber is 2 U-inserts wide and the trap is 1 U-insert wide. Put the gel slices into the elution chamber and electroelute at 4°C and 150 V for 2 h, then reverse the polarity and turn on at 200 V for 20 s. Transfer the eluted ditags (1 ml sample volume) from the trap to two microcentrifuge tubes (see Note 8).
9. Do a PC8 extraction.
10. Extract the aqueous phase with an equal volume of chloroform.
11. Precipitate the ditags by adding 50 µl of 3 M sodium acetate, 2 µl glycogen, and 1,250 µl ethanol to each tube. Incubate at −70°C overnight.
12. Centrifuge at 4°C and 16,110 × g for 30 min, wash the pellets twice with 500 µl of 70% ethanol, air-dry the pellets on ice, and resuspend both pellets in a combined total of 7 µl loTE.

3.12. Concatenation of Ditags

1. Add 2 µl of 5× ligase buffer and 1 µl T4 ligase HC (5 U/µl; Invitrogen).
2. Incubate in a thermal cycler at 16°C for 30 min.
3. Add 190 µl loTE and extract with 200 µl PC8.
4. Add 100 µl of 10 M ammonium acetate solution, 3 µl of glycogen, and 700 µl of ethanol, keep on ice for 10 min, then centrifuge for 15 min at 16,110 × g. Wash the pellet twice with 500 µl of 70% ethanol and resuspend in 10 µl loTE.
5. Add 5 µl of loading buffer, incubate for 10 min at 65°C, chill the sample on ice, and load the sample on one lane of an 8% polyacrylamide gel (acrylamide/bis 19:1, see Note 9).
6. Electrophorese for 3 h at 130 V. Stain the gel in SYBR Green 1 solution for 15 min.
7. Visualize the bands under UV light and excise concatemers >300 bp from the gel (see Note 10). Do not excise the large concatemers at the upper edge of the well (leave a margin of 1 mm gel at the upper edge of the well).
8. For electroelution of the concatemers, prepare the electroelution device in such a way that the elution chamber is 2 U-inserts wide and the trap is 1 U-insert wide. Put the gel slices into the elution chamber and electroelute for 60 min at room temperature, then reverse the polarity and turn on at 200 V for 20 s. Transfer the eluted concatemers (1 ml sample volume) from the trap to two microcentrifuge tubes.
9. Do a PC8 extraction.
10. Extract the aqueous phase with an equal volume of chloroform.
11. Precipitate the concatemers by adding 50 µl of 3 M sodium acetate, 2 µl glycogen, and 1,250 µl ethanol to each tube. Incubate at −20°C for 1 h or overnight.
12. Centrifuge at 4°C and 16,110 × g for 15 min, wash the pellets twice with 500 µl of 70% ethanol, air-dry the pellets, and resuspend both pellets in a combined total of 15 µl of water.

3.13. Cloning of Concatemers

1. Mix 1 µl pZERO-1 supercoiled cloning vector (1 µg/µl, Invitrogen) with 2 µl of 10× Buffer 2 (NEB), 16 µl water, and 1 µl SphI (5 U/µl, NEB).
2. Incubate for 15 min at 37°C in a waterbath (see Note 11).
3. Add 180 µl loTE and do a PC8 extraction with 200 µl PC8.
4. Precipitate the linearized vector by adding 66.7 µl of 10 M ammonium acetate, 3 µl glycogen, and 1 ml ethanol, and centrifuging for 10 min at 16,110 × g.
5. Wash the pellet three times with 500 µl of 70% ethanol.
6. Resuspend the air-dried pellet in 40 µl TE (final concentration of the linearized vector is 25 ng/µl).
7. Mix 1 µl of the linearized pZERO-1 with 6 µl concatemers, 2 µl of 5× ligase buffer, and 1 µl T4 DNA ligase (1 U/µl, both Invitrogen). Incubate for 1 h at 16°C and another hour at room temperature.
8. Add 190 µl loTE and do a PC8 extraction with 200 µl PC8.
9. Precipitate the sample by adding 66.7 µl of 10 M ammonium acetate, 3 µl glycogen, and 1 ml ethanol, and centrifuging for 20 min at 16,110 × g and 4°C.
10. Wash the pellet four times with 500 µl of 70% ethanol.
11. Resuspend the air-dried pellet in 8 µl loTE.

3.14. Transformation of Bacteria

1. Use 0.8 µl cloned concatemers to electroporate an aliquot (40 µl) of electrocompetent E. coli Top Ten (voltage: 1,800 V, see Note 12).
2. Resuspend the electroporated bacteria in 1 ml of LB medium.
3. Incubate for 1 h at 37°C and 220 rpm.
4. Plate 300 µl bacteria suspension on each of three 14.5-cm LB-Zeocin-X-Gal plates.

3.15. Insert-PCR

1. For each PCR, mix 2 µl of 5× RDA buffer, 1.2 µl of 50 mM MgCl2, 0.3 µl of each primer, 0.3 µl of 10 mM dNTP-Mix, and 5.9 µl of water.
2. Pipette 10 µl of the PCR mix into the wells of a 96-well plate and add a drop of mineral oil to each well.
3. Use a sterile toothpick to gently touch a white bacterial colony (see Note 13) and then dip it into the PCR mix.
4. Incubate in a thermal cycler for 2 min at 95°C, then hold the temperature at 78°C and add 5 µl of polymerase mix containing 1 µl of Taq polymerase in 1× RDA Buffer to each well. Run five cycles consisting of 30 s at 95°C, 30 s at 60°C, and 45 s at 72°C, then run an additional 30 cycles consisting of 30 s at 95°C and 60 s at 70°C.
5. Run 5 µl of each PCR on a 1.5% agarose gel to check the insert sizes of the SAGE library. Empty vectors will give a 330-bp PCR product.
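For a full 96-well plate, the per-reaction recipe in step 1 is usually scaled up into a master mix, and the gel result in step 5 is read relative to the 330-bp empty-vector product. The sketch below illustrates both, reading the volumes as microlitres. The 10% pipetting overage, the function names, and the assumption that the polymerase mix adds no further MgCl2 are not part of the protocol.

```python
# Per-reaction volumes in microlitres (assumed unit), from step 1.
PER_REACTION_UL = {
    "5x RDA buffer": 2.0,
    "50 mM MgCl2": 1.2,
    "Insert_for primer": 0.3,
    "Insert_rev primer": 0.3,
    "10 mM dNTP-Mix": 0.3,
    "water": 5.9,
}
POLYMERASE_MIX_UL = 5.0        # added per well in step 4
EMPTY_VECTOR_PRODUCT_BP = 330  # PCR product of a vector without insert


def master_mix(n_reactions, overage=0.10):
    """Scale the recipe to n reactions plus a hypothetical pipetting overage."""
    scale = n_reactions * (1 + overage)
    return {name: round(volume * scale, 1) for name, volume in PER_REACTION_UL.items()}


def insert_length_bp(product_bp):
    """Insert size implied by a PCR product; 0 for an empty vector."""
    return max(product_bp - EMPTY_VECTOR_PRODUCT_BP, 0)


mix_volume = sum(PER_REACTION_UL.values())
final_volume = mix_volume + POLYMERASE_MIX_UL
# Assumes the polymerase mix itself contributes no MgCl2.
final_mgcl2_mM = 50 * PER_REACTION_UL["50 mM MgCl2"] / final_volume

plate_mix = master_mix(96)
print(plate_mix, final_mgcl2_mM, insert_length_bp(1000))
```

Under these assumptions each well holds 15 µl at about 4 mM MgCl2, and a 1,000-bp product would correspond to an insert of 670 bp.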

4. Notes

1. Unless mentioned otherwise, water means water with a resistivity of at least 18 MΩ·cm.
2. Hsp92II is an isoschizomer of NlaIII that can be stored at −20°C. Due to different unit definitions for Hsp92II and NlaIII, the volume of Hsp92II that is needed for digestion steps is much higher than the volume of NlaIII.
3. If there is more than 1.2 µg of aRNA available, use up to 2.5 µg of aRNA for the generation of an aRNA-longSAGE library. More starting material tends to generate larger insert sizes in our hands.
4. Use siliconized tubes when dealing with magnetic beads to prevent the beads from adsorbing to the surface of the tube. Wash the beads by resuspending on a slow-speed vortex (use a setting of 5) instead of pipetting the beads up and down, in order to minimize loss of beads by adsorption to pipette tips.
5. Keeping the traveling distance constant gives more comparable electrophoresis conditions between libraries than keeping the traveling time constant. Eight centimeters of traveling distance on a 20 × 20-cm gel gives a good separation of ditags from linkers.

Manual Microdissection Combined with Antisense RNA–LongSAGE

6. Make sure not to use polystyrene tubes because polystyrene reacts with PC8.
7. Glycerol is added instead of loading buffer to avoid contamination of the ditags. Adding glycerol is essential to increase the density of the sample. Without glycerol, the sample will be lost by diffusing into the running buffer.
8. Electroelution gives a higher yield of recovered sample than gel elution by diffusion, as is done in the standard SAGE protocol.
9. This is a different gel from the one used in the standard SAGE protocol. Using an acrylamide/bis ratio of 19:1 instead of 37.5:1 gives a better separation of undesired small concatemers from the concatemers that are cut out from the gel.
10. For no obvious reason, the concatenation step may not work on the first try for each library. Because this protocol uses only a small fraction of the synthesized ditags as template for large-scale PCR, it is possible to try again with a new large-scale PCR.
11. A fully linearized vector is important for the success of the cloning step. Check on an agarose gel whether the vector is fully linearized. If the digestion with SphI did not yield fully linearized vector, the linearization should be repeated with a longer incubation time or with more than 5 U of SphI.
12. Use bacteria from the Zero Background Cloning Kit (Invitrogen). Prepare competent bacteria according to the instructions given in the kit. In our hands, this bacterial strain is better than the E. coli DH10B recommended in the original SAGE protocol.
13. Blue-white screening helps to choose colonies with large inserts. Even though there are white colonies with short inserts as well as blue colonies with long inserts, the average insert size is longer for white colonies than for blue ones.

Chapter 9

Quantitative DNA Methylation Profiling on a High-Density Oligonucleotide Microarray

Anne Fassbender, Jörn Lewin, Thomas König, Tamas Rujan, Cecile Pelet, Ralf Lesche, Jürgen Distler, and Matthias Schuster

Summary

Recently, the analysis and functional elucidation of CpG island methylation has become a focus area of genomic research. Deviations from the normal parental imprinting pattern have been shown to cause developmental defects associated with serious symptoms. Aberrant DNA methylation of tumor suppressor and other functional genes, especially when found in 5′-untranslated regions and early exons, has been associated with tumorigenesis. In the context of applying DNA methylation analysis for the molecular characterization of cancer and other diseases, standardized protocols enabling parallel genome-wide methylation profiling of numerous samples are required. DNA methylation profiling is described using a CpG island microarray representing more than 50,000 CpG-rich DNA fragments. Fragments were selected to represent the vast majority of known 5′-untranslated regions as well as the first exons of thousands of genes. Measurement probes designed to represent these fragments were displayed on an Affymetrix custom array. A modified procedure for differential methylation hybridization (DMH) is described for methylation enrichment. Application of a novel signal normalization concept enables accurate and reproducible measurements using a single fluorescence channel. The use of defined calibrator material allows quantification of DNA methylation patterns by DMH in a massively parallel fashion.

Key words: DNA methylation, DMH, Microarray, Normalization, Calibration

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_9, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

The elucidation of the complex interplay between human phenotypes on the one hand and DNA methylation and other genomic factors on the other requires accurate as well as highly parallel assays with the ability to unravel genomic information layer by layer (1–3).

Oligonucleotide and polynucleotide arrays in general (4), and high-density oligonucleotide microarrays in particular, provide an excellent technical basis for such genomic assays (5, 6). Array-based DNA methylation assays have already been established using a variety of techniques for the detection of differential methylation, such as methylation-sensitive restriction enzymes (7–10), methyl-binding proteins (11, 12), or methylation-specific antibody arrays (13). Based on these methods, genome-wide methylation profiling was shown to be generally feasible using either microarrays synthesized in situ (14) or microarrays carrying immobilized polymerase chain reaction (PCR) products generated from BAC clones (15) or CpG island libraries (16).

Here, we present a quantitative, genome-wide DNA methylation-profiling assay based on the concept of differential methylation hybridization (DMH) (7). The assay is constructed using a high-density Affymetrix probe set array designed to cover 51,317 CpG-rich fragments, most of them in the promoters or transcribed regions of annotated genes.

Figure 1 depicts the principle of the DMH technique that was introduced in 1999 (7) and successfully applied to genome-wide DNA methylation marker discovery (7, 8). According to this procedure, high molecular weight genomic DNA is digested using MseI. After ligation of a universal adapter, the fragment mixture is digested using the methylation-sensitive enzyme BstUI (7) or a mixture of BstUI and HpaII (8). In the following step, undigested fragments are PCR amplified with a universal primer. The resulting amplificate is enzymatically fragmented, labeled, and subsequently hybridized to microarrays.

The prerequisite for analysis of a region of interest by DMH is the existence of at least one BstUI or HpaII recognition site within the fragment. When using the originally described restriction enzyme combination (8), 73% of amplifiable fragments do not contribute methylation information, because they do not contain methylation-sensitive restriction sites. However, these fragments do affect the hybridization of methylation-variable fragments via cross-hybridization. To address these limitations, both the fragmentation and the methylation-sensitive restrictions were evaluated in silico and optimized to increase the methylation information content and to reduce the genomic complexity of the amplificate. Excessive digestion of non-CpG island sequences is achieved in the fragmentation digest by applying additional four-base cutters recognizing AT-rich sequences. The resulting short fragments are removed in purification steps, whereas a large portion of GC-rich sequences is subject to methylation-sensitive digestion by one or several of the four restriction enzymes used to digest unmethylated, but not methylated, DNA strands.

Array-to-array variability of fluorescence values requires a powerful normalization strategy for methylation-variable signals. The presented DMH application utilizes two techniques for normalization. First, probe sets targeting methylation-invariable fragments, i.e., amplifiable fragments devoid of methylation-sensitive restriction sites, are used to normalize against experimental variability affecting absolute fluorescence levels. Second, methylation calibrators allow generation of truly quantitative data. Methylation marker development can then be streamlined using this quantitative method by combining data generated in different studies.

Fig. 1. Principle of differential methylation hybridization (DMH). Step 1. DNA fragmentation using methylation-unspecific restriction enzymes results in millions of fragments that are either very short (s) and are removed in a purification step, too long (L) for a later PCR step, or of adequate size for DMH (M−, M+, C). Step 2. Adapter linker ligation to all fragments. Step 3. Digestion using methylation-sensitive restriction enzymes and PCR amplification: Fragments containing unmethylated restriction sites (M−) are cleaved and therefore not represented in the amplificate; fragments containing only methylated sites (M+) remain uncleaved and are amplified, as are fragments containing no restriction sites at all (C), which serve as controls for normalizing the methylation signal contained in (M+) after fragmentation, labeling, and array hybridization.
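The in silico evaluation of the restriction steps can be illustrated with a small sketch. It is not the authors' pipeline: it cuts a sequence at the recognition sites of the three fragmentation enzymes (MseI T^TAA, Csp6I G^TAC, BfaI C^TAG) and sorts the fragments into the classes of Fig. 1. The lower size cutoff of 100 bp follows the footnote of Table 1; the upper cutoff of 600 bp and the function names are assumptions made for illustration only.

```python
import re

# Recognition sequence and top-strand cut offset for the fragmentation enzymes.
FRAGMENTATION = {"MseI": ("TTAA", 1), "Csp6I": ("GTAC", 1), "BfaI": ("CTAG", 1)}
# Recognition sequences of the methylation-sensitive enzymes.
METHYL_SENSITIVE = {"BstUI": "CGCG", "HpaII": "CCGG", "HinP1I": "GCGC", "HpyCH4IV": "ACGT"}


def digest(seq, enzymes=FRAGMENTATION):
    """Return the top-strand fragments produced by cutting seq with all enzymes."""
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def classify(fragments, min_len=100, max_len=600):
    """Count fragments per class; max_len is a hypothetical PCR size limit."""
    counts = {"removed": 0, "too_long": 0, "informative": 0, "control": 0}
    for frag in fragments:
        if len(frag) < min_len:
            counts["removed"] += 1          # lost during column purification
        elif len(frag) > max_len:
            counts["too_long"] += 1         # assumed not to amplify
        elif any(site in frag for site in METHYL_SENSITIVE.values()):
            counts["informative"] += 1      # methylation-variable (M−/M+)
        else:
            counts["control"] += 1          # methylation-invariable (C)
    return counts


# Toy sequence: one informative fragment, one short fragment, one control fragment.
toy = "A" * 120 + "CCGG" + "A" * 20 + "TTAA" + "G" * 30 + "GTAC" + "C" * 150
print(classify(digest(toy)))
```

Run over real CpG-island and genome sequence, counts of this kind are what a table such as Table 1 summarizes; the sketch ignores details such as sites that span a fragment junction.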

2. Materials

2.1. Sample DNA Extraction

1. 40 U Proteinase K (Qiagen).
2. QiaAmp DNA Mini Kit (Qiagen).
3. RNAseA (Qiagen).

2.2. DNA Methylation Calibrators

1. 0% methylated DNA: GenomiPhi® DNA Amplification Kit.
2. 100% methylated DNA: SssI methyltransferase (NEB); S-adenosylmethionine (SAM) (NEB).

2.3. Adapter

1. H24: 5′-AGGCAACTGTGCTATCCGAGGGAT-3′, 200 µM in water.
2. H12: 5′-TAATCCCTCGGA-3′, 200 µM in water.

2.4. DNA Fragmentation

1. Unmethylated Lambda DNA (NEB).
2. DNA from human PBL (Promega).
3. QiaQuick PCR Purification Kit (Qiagen).

2.5. Adapter Ligation

1. T4 DNA ligase (NEB).
2. 10 mM ATP: 0.275 g ATP disodium salt (Sigma Aldrich) dissolved in water (Fluka) and adjusted to a final volume of 50 ml. The pH is adjusted to 7.5 with NaOH. Aliquots of 50 µl are stored at −20°C and not reused.
3. MinElute PCR Purification Kit (Qiagen).
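As a plausibility check on the ATP recipe, the concentration implied by the weighed amount can be computed directly. The sketch below assumes the anhydrous disodium salt (molar mass about 551.14 g/mol; hydrated salts are heavier and would give a slightly lower value) and the 5 µl in 50 µl dilution used in the adapter ligation; the function name is illustrative.

```python
ATP_DISODIUM_MW = 551.14  # g/mol, anhydrous ATP disodium salt (assumption)


def molarity_mM(mass_g, volume_ml, mw_g_per_mol):
    """Concentration in mmol/l for a given mass dissolved to a final volume."""
    moles = mass_g / mw_g_per_mol
    return moles / (volume_ml / 1000.0) * 1000.0


stock_mM = molarity_mM(0.275, 50.0, ATP_DISODIUM_MW)  # 0.275 g in 50 ml
final_mM = stock_mM * 5.0 / 50.0                      # 5 µl stock in a 50 µl ligation

print(f"stock: {stock_mM:.2f} mM, final in ligation: {final_mM:.2f} mM")
```

Under these assumptions the recipe yields a stock of roughly 10 mM, which after the tenfold dilution gives the 1 mM final concentration stated for the ligation.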

2.6. Methylation-Sensitive Restriction

1. BstUI, HpaII, HinP1I, and HpyCH4IV (all NEB).

2.7. PCR

1. DeepVent® (exo-) DNA Polymerase (NEB).
2. dNTPs, 100 mM each (Fermentas), mixed and diluted 1:10 with water.
3. Microcon YM-30 (Millipore).

2.8. Fragmentation and Labeling

1. Gene Chip® Mapping 10K Xba Assay Kit (Affymetrix).
2. EB buffer from QiaQuick PCR Purification Kit (Qiagen).

2.9. Hybridization, Washing, Staining, and Scanning

1. Affymetrix custom array (see Note 18).
2. 12× MES: 70.4 g 2-(N-morpholino)ethanesulfonic acid monohydrate and 193.3 g 2-(N-morpholino)ethanesulfonic acid sodium salt dissolved in 1,000 ml water (Fluka).
3. DMSO (Sigma Aldrich), EDTA (Gibco), 50× Denhardt's Solution (Eppendorf), 10% Tween 20 (Pierce), tetramethylammonium chloride (TMA-Cl, Sigma Aldrich).
4. Herring sperm DNA (10 mg/ml; Promega), human Cot-1 (Roche Diagnostics).
5. Control Oligo B2, 3 nM (Affymetrix).
6. ImmunoPure streptavidin (Perbio Science), R-phycoerythrin streptavidin (Invitrogen), anti-streptavidin antibody (AXXORA).
7. Wash Buffer A: 300 ml of 20× SSPE (Roche Diagnostics), 1 ml of 10% Tween 20, and 699 ml water are mixed and filtered through a 0.2-µm sterile filter. Wash Buffer B: To 30 ml of 20× SSPE, add 1 ml of 10% Tween 20 and 969 ml water. Filter the solution through a 0.2-µm sterile filter.

3. Methods

3.1. Sample DNA Extraction

1. Approximately 20 mg of fresh frozen tissue is lysed by incubation with tissue lysis buffer (TLB, Qiagen) in combination with 40 U Proteinase K overnight (16 h) (see Notes 1 and 2).
2. The genomic DNA is isolated using the QiaAmp DNA Mini Kit (Qiagen) according to the manufacturer's protocol.
3. The DNA is eluted with 60 µl of prewarmed (50°C) Elution Buffer (EB, Qiagen) (see Notes 3 and 4).
4. Finally, the concentration is quantified by UV (see Note 5).

3.2. DNA Methylation Calibrators

1. 0% Methylation: Universally unmethylated DNA is prepared by multiple displacement amplification (MDA) using the GenomiPhi® DNA Amplification Kit according to the manufacturer's instructions on 10 ng of isolated human genomic DNA from peripheral blood lymphocytes (Promega).
2. 100% Methylation: Isolated human genomic DNA from peripheral blood lymphocytes (Promega) is methylated using SssI methyltransferase (NEB) in the presence of S-adenosylmethionine (SAM) according to the manufacturer's instructions. Ten micrograms of DNA is incubated with 40 U SssI methylase and 1.24 µl SAM in a final volume of 100 µl for 16 h at 37°C in a thermomixer.

3.3. Preparation of Adapters

1. Equal amounts of the two primers H24 and H12 are mixed.
2. The mixture is incubated at 95°C for 5 min and then slowly cooled down.
3. Aliquots are stored at −20°C and reused.
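The design of the adapter can be checked from the two oligonucleotide sequences alone. The sketch below, with illustrative function names, tests which part of H12 is complementary to the 3′ end of H24 and reports the unpaired 5′ tail of H12. That this tail matches the 5′-TA overhang left by MseI (T^TAA), Csp6I (G^TAC), and BfaI (C^TAG) follows from the recognition sequences of these enzymes and is not stated explicitly in the protocol.

```python
H24 = "AGGCAACTGTGCTATCCGAGGGAT"
H12 = "TAATCCCTCGGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_overhang(long_oligo, short_oligo):
    """Return the unpaired 5' tail of short_oligo when the rest of it anneals
    to the 3' end of long_oligo, or None if no such pairing exists."""
    for tail in range(len(short_oligo)):
        paired = short_oligo[tail:]
        if long_oligo.endswith(revcomp(paired)):
            return short_oligo[:tail]
    return None


print("5' overhang of annealed adapter:", five_prime_overhang(H24, H12))
```

With the sequences given in Subheading 2.3, ten bases of H12 pair with the 3′ end of H24 and a two-base 5′ tail remains single-stranded.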

3.4. DNA Fragmentation

1. 500 ng to 1 µg (see Note 6) of human genomic DNA is treated with 5 U each of MseI (NEB), Csp6I (Fermentas), and BfaI (NEB) in 30 µl of 1× Y+/Tango buffer (Fermentas) at 37°C for 16 h (see Note 7 and Table 1).
2. Enzyme activity controls: unmethylated Lambda DNA (NEB) is treated with 5 U of MseI (NEB) and Csp6I (Fermentas), respectively, and DNA from human PBL (Promega) is treated with BfaI in the same 1× Y+/Tango buffer (Fermentas) at 37°C for 16 h (see Note 8).
3. Negative controls: 1 µg DNA from human PBL (Promega) is used as a "no ligase" negative control and water as a "no DNA" negative control throughout the complete procedure.
4. Enzymes are heat inactivated for 20 min at 65°C.

Table 1 Characteristics of original versus optimized DMH restriction protocols

                              DMH protocol (Huang et al.)         Optimized DMH protocol
                              Percent of all   Percentage of      Percent of all   Percentage of
                              fragments        informative        fragments        informative
                                               fragments (b)                       fragments (b)
Detectable fragments          44               16                 31               29
Removed fragments (a)         56               7                  69               5
Fragmentation digest          MseI                                MseI, Csp6I, BfaI
Methylation-specific digest   BstUI, HpaII                        HpaII, HinP1I, HpyCH4IV, BstUI

(a) Fragments <100 bp are removed during purification of the first digest
(b) Fragments containing at least one methylation-sensitive restriction site

5. The fragment mix resulting from DNA samples is purified using QiaQuick columns (Qiagen).
6. Elution is performed with 40 µl water (Fluka) (see Note 9).
7. 3 µl of purified samples and 5 µl of the unpurified enzyme control reactions are analyzed on a 1.4% agarose gel (see Note 10).

3.5. Adapter Ligation

1. 34 µl of purified template is mixed with 5 µl ligase buffer (NEB), 400 U T4 DNA ligase (water [Fluka] in the case of the "no ligase" control), 5 µl ATP (final concentration, 1 mM), and 5 µl of the double-stranded adapter (final concentration, 10 µM) in a total volume of 50 µl.
2. The ligation mix is incubated at 16°C for 4 h.
3. The ligated product mixture is purified immediately after incubation using MinElute columns (Qiagen) and eluted in 30 µl water.

3.6. Methylation-Sensitive Restriction

1. 23 µl of each ligated fragment mixture is mixed with 3 µl restriction buffer NEB1 and 0.5 µl (10 U) each of BstUI, HpaII, HinP1I, and HpyCH4IV in a total volume of 50 µl (see Note 11).
2. Enzyme activity controls: 0.5 µl of each single enzyme is added to 1 µg unmethylated Lambda DNA (NEB).
3. All samples and controls are incubated at 37°C for 2 h and 60°C for 1 h (see Note 12).

4. Then another 10 U of each enzyme (in the case of enzyme controls, only the respective enzyme) is added, and restriction is allowed to proceed for a further 5 h at 37°C and 3 h at 60°C (see Note 12).
5. The restriction product is again purified using QiaQuick columns and eluted with 60 µl water (see Notes 9 and 13).
6. 8 µl of the purified samples and 8 µl of the unpurified digest control are loaded onto a 1.4% agarose gel (see Note 10).

3.7. PCR

1. PCR is performed in a final volume of 100 µl (see Note 13).
2. The PCR mix consists of 1× PCR buffer (ThermoPol) containing 1 µl of purified digest, 5 µl DMSO, 1.9 mM MgSO4, 175 µM dNTP (Gibco-BRL), and 2.5 µM universal primer H24 in the presence of 5 U DeepVent (exo-) DNA polymerase (NEB).
3. The PCR program starts with an initial extension at 72°C for 5 min, followed by 20 cycles of denaturation at 97°C for 1 min and extension at 72°C for 3 min, and a final extension step for 10 min (see Note 14).
4. Six PCRs are set up to ensure sufficient yield for subsequent steps (see Notes 15–17).
5. Two PCRs are purified using one QiaQuick column (Qiagen) by sequential loading.
6. DNA is eluted with 80 µl prewarmed (50°C) EB buffer (Qiagen).
7. Eluates are concentrated via Microcon YM-30 columns (Millipore), eluted in 55 µl prewarmed (50°C) EB buffer (Qiagen), and quantified by UV. For further processing, 20 µg amplificate is required in a maximum volume of 45 µl.

3.8. Fragmentation and Labeling

Amplificate fragmentation and labeling is performed as described in the GeneChip® Mapping 10K Xba Assay Kit manual (Affymetrix, 2003–2004) with the following variations:

1. Fragmentation of 20 µg purified DNA is performed in a final volume of 55 µl containing 1× fragmentation buffer and 0.06 U fragmentation reagent.
2. The incubation for 10 min at 37°C is followed by heat inactivation for 10 min at 95°C.
3. For the subsequent labeling reaction, 50.6 µl fragmented DNA is mixed with 14 µl TdT buffer, 2 µl labeling reagent, and 3.4 µl TdT and incubated for 2 h at 37°C, followed by heat inactivation for 15 min at 95°C.

3.9. Hybridization, Washing, Staining, and Scanning

Hybridization, washing, staining, and scanning are performed as described in the GeneChip® Mapping 10K Xba Assay Kit manual (Affymetrix, 2003–2004), with the following exceptions:

1. An Affymetrix custom array is used (see Notes 18–20).
2. Hybridization solution consists of 11 µl of 12× MES, 12 µl DMSO, 12 µl Denhardt's solution, 2.8 µl EDTA, 2.8 µl herring sperm DNA, 2.8 µl human Cot DNA, 0.9 µl of 3% Tween-20, 1.8 µl control oligo B2, 130 µl TMA-Cl, and 65 µl labeled DNA.

3.10. Raw Data Processing and Normalization

1. Log2 transformation is performed on all fluorescence values.
2. Quality control: All signals are assessed using Gaussian smoothed data (standard deviation of five data points). Locations with smoothed data deviating by more than 0.5 from the median over the whole chip are defined as local outliers. Data are discarded when artifacts cover >15% of the array surface. The dynamic range of each array is assessed by analyzing the separation of the signal distributions of all background and all normalization probes. If Fisher's linear discriminant is smaller than 1, the array is excluded from the analysis.
3. Methylation signal normalization: The median intensity per array is calculated over all methylation-invariable probes, i.e., all probes targeting fragments containing no methylation-sensitive restriction sites (see Fig. 2a and Note 19). This value is subtracted from all log2-transformed signals of methylation-variable probes.

3.11. Methylation Calibration

1. Unmethylated and universally methylated DNA are processed as 0 and 100% calibrator samples. Raw data processing and normalization is performed as described above.
2. Using the normalized probe signals of the calibrators, a two-point calibration is performed for each normalized methylation-variable probe of each sample of interest to derive absolute methylation levels for each methylation-variable probe (see Fig. 2b and Note 21).
3. Median averaging is performed over all calibrated probe data per probe set to obtain a quantitative methylation value for each methylation-variable fragment (see Note 22).

Fig. 2. DMH data aggregation. Top: Calculation of normalized corrected signal Sn for each methylation-variable probe by interpolation of signal S between the median of the control signals C1 (log2) and the median of the noise control signals C0 (log2). Bottom: Calculation of fragment methylation scores M (1) from normalized signals Sn using normalized calibration signals Sn0 and Sn100 and (2) by median averaging over all calibrated probe signals representing the same fragment.
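The normalization and calibration steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' software: it follows the text of Subheading 3.10 (subtraction of the median of the methylation-invariable probes from log2 signals) rather than the interpolation between C0 and C1 shown in Fig. 2, and the function names and toy numbers are hypothetical.

```python
import math
from statistics import median


def normalize(raw_variable, raw_invariable):
    """log2-transform signals and subtract the per-array median of the
    methylation-invariable (normalization) probes."""
    reference = median(math.log2(v) for v in raw_invariable)
    return [math.log2(v) - reference for v in raw_variable]


def calibrate(sn, sn0, sn100):
    """Two-point calibration per probe: 0.0 at the unmethylated calibrator
    (sn0) and 1.0 at the fully methylated calibrator (sn100).
    Probes with sn100 == sn0 carry no information and would need filtering."""
    return [(s - lo) / (hi - lo) for s, lo, hi in zip(sn, sn0, sn100)]


def fragment_score(calibrated, probe_indices):
    """Median over all calibrated probes belonging to one probe set."""
    return median(calibrated[i] for i in probe_indices)


# Toy array: three methylation-variable probes of one fragment.
sample = normalize([512, 2048, 1024], raw_invariable=[256, 1024, 4096])
m = calibrate(sample, sn0=[-2.0, -1.0, -2.0], sn100=[2.0, 3.0, 2.0])
print(fragment_score(m, probe_indices=[0, 1, 2]))
```

In practice the calibrator signals sn0 and sn100 would themselves be obtained by running `normalize` on the 0% and 100% calibrator arrays, so that sample and calibrators are on the same scale.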

4. Notes

1. The use of cell lines is possible. In this case, pelleted cells are resuspended in 200 µl PBS and lysed using the manufacturer's cell lysis buffer (AL), 40 U Proteinase K, and 20 U RNAse A for 4 h at 37°C.
2. The use of carrier material should be avoided and separation from RNA is important.
3. Water is used, because buffers can inhibit enzyme reactions.
4. Elution of DNA in small volumes is necessary to obtain DNA of sufficient concentration (>40 ng/µl for the next step).
5. Care should be taken to obtain template DNA of high integrity. To check quality and concentration of DNA, a gel electrophoresis is recommended.
6. If possible, 1 µg of DNA should be used to ensure assay robustness. If the DNA amount is limited, the protocol can be used with at least 200 ng.
7. The fragmentation enzyme mixture is optimized to fulfill the following criteria: (1) cleave AT-rich sequences into fragments small enough to be excluded in the purification step and thereby reduce complexity, (2) create a large number of amplifiable CpG-rich fragments with high coverage of methylation-variable regions, and (3) ensure that a large proportion of these fragments contain methylation-sensitive restriction sites. The fragment distributions resulting from restriction by a variety of individual four-base cutters and mixtures thereof are analyzed in silico. The product resulting from restriction using a mixture of MseI, Csp6I, and BfaI is characterized in Table 1 in comparison with MseI only, which was used in previous DMH methods (7, 8). As shown in Fig. 3, addition of Csp6I and BfaI to MseI provides a higher proportion of very short fragments that are removed during the purification steps. Because less than 5% of these fragments contain methylation-sensitive restriction sites, their removal results in a reduction of genomic complexity without significant loss of methylation information.

Fig. 3. Fragmentation characteristics of modified protocol versus original DMH protocols used by Huang et al. (7, 8). The optimized protocol generates short AT-rich fragments, which are removed in the subsequent purification. (a) Molecular size standard; (b) PBL DNA treated with Csp6I, BfaI, and MseI; (c) PBL DNA treated with MseI; (d) undigested PBL DNA.

8. Genomic DNA is used to test BfaI, because lambda DNA does not contain sufficient cutting sites to see any difference from undigested DNA on an agarose gel.
9. Because DNA is digested into small fragments and eluted in water, prolonged storage should be avoided to prevent further degradation. Therefore, the respective process steps (DNA fragmentation, ligation, and methylation-sensitive restriction) are done on consecutive days.
10. If the enzymes perform well, the controls show specific band patterns when lambda DNA is used to test MseI and Csp6I. The pattern can be obtained from REBASE (http://rebase.neb.com/rebase/). Enzymes are considered to perform well if the band with the highest molecular weight is at approximately 2,000 bp. The digest of PBL (Promega) DNA with BfaI results in a smear with a size of approximately 100–2,000 bp. The digest needs to be repeated if this criterion is not fulfilled. The sample DNA should show a smear like PBL DNA. This smear can be very faint, especially if less than 1 µg starting material is used.
11. Enrichment of methylated fragments is achieved by methylation-sensitive restriction. In addition to BstUI and HpaII, as used in the DMH protocol introduced by the Huang group in 2001 (8), HinP1I and HpyCH4IV are applied to increase the stringency of methylation enrichment. Thereby, the number of informative fragments is doubled and their portion within all amplifiable fragments is increased from 16% to 29% (Table 1). Considering that one unmethylated restriction site per fragment causes cleavage and thereby prevents amplification, shorter fragment length in combination with increased fragment number will provide methylation information that is less prone to be affected by sporadic events like composite methylation in the direct neighborhood of comethylated sequences carrying the target information. Altogether, the methylation-sensitive amplifiable fragments generated by the selected enzyme combination cover 99.7% of all CpG islands and 99.1% of all TSS.
12. 60°C is the recommended incubation temperature for BstUI, but the enzyme works at 37°C with decreased efficiency.
13. The product of the methylation-sensitive restriction shows degradation over time, i.e., the PCR yield decreases. Therefore, it is recommended that PCR is done within 2 weeks after methylation-sensitive restriction.
14. Twenty PCR cycles are performed to avoid amplification bias.
15. The yield of one 100-µl PCR is not sufficient to obtain 20 µg of PCR product in a maximum volume of 45 µl, which is necessary for the use of the Fragmentation and Labeling procedure of the Gene Chip® Mapping 10K Xba Assay Kit (Affymetrix). Therefore, six PCRs are performed, purified, and concentrated.
16. If many samples are processed in parallel, additional 100-µl PCRs need to be performed.
17. Extended preparation times for PCR mastermix should be avoided.
18. The Affymetrix custom array contains probe sets designed to match 51,317 fragments of the non-methylation-specific restriction library. The majority of the detected fragments contain one to ten methylation-sensitive restriction sites and cover 5–20 CpG sites. Whereas 60% of fragments are positioned within single annotated genes, 38% are outside the context of known genes. Approximately 2% can be associated with more than one gene. The fragments represented on the array overlap with 14,017 (out of 30,391) unique TSS and 10,522 (out of 22,395) unique CpG islands. The representation of CpG islands is strongly biased toward promoter regions and first exons of annotated genes and against repetitive elements.

19. In addition to the methylation-variable fragments, the Affymetrix custom array contains 1,000 fragments devoid of methylation-sensitive restriction sites that are used for signal normalization. The hybridization background is represented by 4,821 nongenomic oligonucleotides. For detection of signal saturation effects, 1,034 probes specific for repeat fragments have been included (Table 2). 20. The use of any other Affymetrix-type microarray containing oligonucleotide probes is possible if probes are selected according to the following criteria: (1) At least six probes per probe set are recommended. (2) Probe sets covering

Table 2 CpG island microarray – oligonucleotide content Probe type

Single probes

Probe sets

Methylation

491,491

51,317

Normalization

9,834

1,000

Background

4,821

n.a.

Repeats

1,034

n.a.

n.a. not applicable

Fig. 4. Ability of individual methylation-variable probe sets to differentiate 0, 50, and 100% methylation using DNA methylation mixtures prepared in vitro. The DMH fragment data is normalized by subtraction of the mean and division through the standard deviation, and ranked by t statistics.

Quantitative DNA Methylation Profiling on a High-Density Oligonucleotide

167

fragments from 100 to 600 bp should be selected. (3) A minimum of one single methylation-sensitive restriction site is sufficient. (4) Fragments without any methylation-sensitive restriction sites are necessary for normalization. 21. To characterize the analytical performance of each methylation-variable probe set, unmethylated and methylated DNA as well as mixtures of both representing 50% methylation were processed as described. In Fig. 4 all methylation-variable probe sets are ordered according to their ability to differentiate between 0 and 100% methylation based on t statistics. Our data illustrate that 95% of probe sets are functional, i.e., are able to clearly differentiate different methylation states. 22. Example data: The described assay was used to analyze 24 colon cancer samples with 11 samples from normal colon. All samples were from commercial sources and were obtained under appropriate consent. In Fig. 5, the volcano plot of t statistics versus methylation difference is shown. Markers combining large methylation differences between normal and CRC that separate both groups with high significance are indicated in red. Forty-one fragments that show hypermethylation and nine fragments that show hypomethylation in colon cancer were identified (see Table 3). Many of

Fig. 5. Volcano plot for differential DNA methylation analysis between 24 CRC tissue samples and 11 normal controls. (Differences between methylation scores M are shown. The most significant discriminators with large methylation differences are shown as crosses).

168

Fassbender et al.

Table 3 Marker selection from differential DNA methylation analysis between 24 CRC tissue samples and 11 normal controls Methylation difference HUGO genesa

HUGO TSSb

6.4

−0.46

NPY

NPY

3

6.7

−0.41

HOXA3

4

7.1

−0.43

HOXA5

HOXA5

5

7.6

−0.46

AQP1

AQP1

6

6.3

−0.38

EFCBP1

EFCBP1

7

5.3

−0.47

8

5.8

−0.39

9

6.3

−0.33

10

4.8

−0.45

STK24

STK24

11

6.5

−0.38

HOXB13

HOXB13

12

5.4

−0.41

C20orf117

13

4.8

−0.40

ZHX3

14

4.5

−0.55

C20orf161

15

5.1

−0.41

PARD6B

16

4.7

−0.41

ZNF217

17

5.6

−0.41

DOK5

18

5.7

−0.41

19

4.9

−0.40

RAE1

20

6.4

−0.49

GNAS

21

5.9

−0.34

CYP1B1

CYP1B1

17145863, 15172987

22

5.2

−0.48

CCNA1

CCNA1

16524460, 16807314, 16449996

23

4.6

−0.43

ALG5

24

4.9

−0.51

PCDH8

PCDH8

25

5.1

−0.38

PCDH17

PCDH17

26

5.1

−0.37

EDNRB

No.

t Statistics

1

Hyper-methylated

2

PubMed reference IDc

15352125; 12819009; 12032849

ADCY8 GPR7

GPR7

17437806

DBC1

16846474, 15746151

16278676, 17145863, 16912168

C20orf161

DOK5 TFAP2C

14996719

16001328

15026333, 14688019, 12499435 (continued)


Table 3 (continued) No.

t Statistics

Methylation difference HUGO genesa

27

4.6

−0.47

28

4.8

−0.42

29

4.8

−0.44

C20orf31

30

5.0

−0.40

GTPBP5

31

Hypo-methylated

32

−4.78

0.44

ACSL6, ACSL1

33

−5.04

0.38

KCNH1, KCNH5

HUGO TSSb

PubMed reference IDc

HCK

17344919

TPX2

a Fragment overlaps with gene
b Fragment is within 2,500 bp of TSS
c Examples for reports of hypermethylation in the respective gene

Fig. 6. Comparison of quantitative MSP and DMH measurements for EDNRB (left) and CYP1B1 (right) for 24 colorectal cancer tissues and 11 normal colon control tissues. (Methylation scores M are given for MSP and DMH).

these genes, e.g., GPR7, DBC1, HOXB13, TFAP2C, GNAS, CYP1B1, CCNA1, EDNRB, and HCK, found to be hypermethylated in the colon tumor samples have been previously reported to be associated with methylation and cancer. Two genes, EDNRB and CYP1B1, were selected to be analyzed by quantitative methylation-specific PCR (MSP) as an independent method. Correlations of 76% and 85% between DMH and MSP methylation scores are observed (see Fig. 6).


Acknowledgment
We thank the German Ministry of Education and Research (BMBF) for financial support for part of this study by Förderprojekt “NGFN2: Systematisch-Methodische Platform Epigenetik” (01GR0492).

References
1. Laird, P.W. (2003) The power and the promise of DNA methylation markers. Nat. Rev. Cancer 3, 253–266.
2. Lalande, M. (1996) Parental imprinting and human disease. Annu. Rev. Genet. 30, 173–195.
3. Fan, J.-B., Chee, M.S. and Gunderson, K.L. (2006) Highly parallel genomic assays. Nat. Rev. Genet. 7, 632–644.
4. Southern, E.M., Maskos, U. and Elder, J.K. (1992) Analyzing and comparing nucleic acid sequences by hybridisation to arrays of oligonucleotides: evaluation using experimental models. Genomics 13, 1008–1017.
5. Lockhart, D.J. et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680.
6. Syvanen, A.C. (2005) Towards genome-wide SNP genotyping. Nat. Genet. 37, S5–S10.
7. Huang, T.H., Perry, M.R. and Laux, D.E. (1999) Methylation profiling of CpG islands in human breast cancer cells. Hum. Mol. Genet. 8, 459–470.
8. Yan, P.S., Chen, C-M., Shi, H., Rahmatpanah, F., Wei, S.H., Caldwell, C.W. and Huang, T.H. (2001) Dissecting complex epigenetic alterations in breast cancer using CpG island microarrays. Cancer Res. 61, 8375–8380.
9. Hatada, I. et al. (2002) A microarray-based method for detecting methylated loci. J. Hum. Genet. 47, 448–451.
10. Schumacher, A. et al. (2006) Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res. 34, 528–542.
11. Rauch, T., Li, H., Wu, X. and Pfeifer, G.P. (2006) MIRA-assisted microarray analysis, a new technology for the determination of DNA methylation patterns, identifies frequent methylation of homeodomain-containing genes in lung cancer cells. Cancer Res. 66, 7939–7947.
12. Gebhard, C., Schwarzfischer, L., Pham, T.-H., Schilling, E., Klug, M., Andreesen, R. and Rehli, M. (2006) Genome-wide profiling of CpG methylation identifies novel targets of aberrant hypermethylation in myeloid leukemia. Cancer Res. 66, 6118–6128.
13. Weber, M., Davies, J.J., Wittig, D., Oakeley, E.J., Haase, M., Lam, W.L. and Schübeler, D. (2005) Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862.
14. Fodor, S.P. et al. (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251(4995), 767–773.
15. Ishkanian, A.S. et al. (2004) A tiling resolution microarray with complete coverage of the human genome. Nat. Genet. 36, 299–303.
16. Heisler, L.E. et al. (2005) CpG island microarray probe sequences derived from a physical library are representative of CpG islands annotated on the human genome. Nucleic Acids Res. 33, 2952–2961.

Chapter 10
Single-Nucleotide Polymorphism (SNP) Analysis to Associate Cancer Risk
Julie Earl and William Greenhalf

Summary
Identification of hereditary factors that predispose to cancer allows targeted cancer screening and better quantification of environmental risk factors. The ability to identify which single nucleotide polymorphisms (SNPs) are associated with cancer or segregate with disease in families allows high-risk loci to be identified. In this chapter, two platforms for analysing SNPs are discussed, the Affymetrix and Illumina systems. Application of both platforms requires the same principles of good laboratory practice but there are important differences in materials and methods, which will be discussed.

Key words: Familial cancer, Arrays, Association, Linkage

1. Introduction
Linkage and association studies have been used to quantify cancer risk in the past with some success. For example, the locus of the Rb gene was identified in families with retinoblastoma (1), the APC locus was identified in familial adenomatous polyposis (2), various loci (each associated with mismatch repair genes) were identified in hereditary non-polyposis colon cancer (3–5), and the STK11 locus was identified in Peutz-Jeghers syndrome (6, 7). All of these were high-risk autosomal dominant conditions; such inheritance is relatively rare in cancer, and the majority of genetic predisposition results from complex interactions of multiple genes with each other and with environmental exposure (8, 9). Such weaker associations have also been investigated using microsatellites (10) but with very little success, largely because microsatellites change

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_10, © Humana Press, a part of Springer Science + Business Media, LLC 2010


over the generations and so are not applicable to identifying ancient common founders (11). In any case, weaker associations will usually involve polymorphisms that are relatively common in the general population and only become significant risk factors when in combination. Weaker associations are also not amenable to analysis by linkage studies, because non-affected carriers and those not carrying high-risk alleles cannot be distinguished. This makes genome-wide scans for association impossible, but less empirical studies based on one or two targeted loci are feasible. To target the loci for association studies, identification of allelic loss in tumour samples can be used. The rationale is that high-risk alleles will be enriched during tumourigenesis by loss of the low-risk partner, thus causing a loss of heterozygosity in the microsatellite locus. For low-penetrance alleles, the selective pressure for loss of heterozygosity will be weak and so faith in this rationale requires optimism verging on irrationality. Single base changes occurring as the result of transition or transversion mutations are much more stable than microsatellites and so linkage of such changes to a disease-related allele will be maintained over far greater numbers of generations; these changes may even be functionally related to the disease allele. Unfortunately, identification of such base changes is more complex than identifying size differences in microsatellite regions. Possible techniques include DNA sequencing (12), allele-specific polymerase chain reaction (PCR) (13, 14), and single-strand conformational polymorphism (SSCP) analysis (15), but despite improvements in technology (16, 17), at the time of writing, all of these remain suitable only for analysing small numbers of loci in relatively small numbers of individuals, making their use impractical for whole-genome scanning.
The development of microarray technology in 1995 (18) revolutionised the area of cancer genomics, initially because it could be applied to expression analysis, revealing sequences that were up- or down-regulated during cancer development (19–22). Subsequently this technology was adapted, allowing competitive or comparative binding of cancer and non-cancer genomic DNA, revealing regions that were amplified or lost in malignant cells (23, 24) (comparative genomic hybridisation). As the technology improved, the accuracy with which hybridisation to different sequences could be distinguished reached a level at which single-nucleotide polymorphisms (SNPs) could be identified and levels compared. This offered the first practical method for genome-wide linkage (25) and association analysis (26–28) using SNPs. Not only are association studies using SNPs much more powerful, they are also more amenable to high throughput and, for whole-genome studies, require less starting material than microsatellite approaches. The lower requirement for input DNA and availability of high-density arrays means that SNP arrays have also become the technology of choice for empirical studies of


loss of heterozygosity (LOH) in tumour samples (29). There are approximately 5–10 million SNPs in the human genome, occurring every 400–1,000 bp (30). It has been estimated that approximately 500,000 SNPs are required to genotype an individual of European ancestry (31). SNP genotyping involves a DNA amplification step and hybridisation of labelled DNA to an array containing immobilised oligonucleotide representations of SNPs. Since its earliest incarnations, array technology has offered alternative approaches, some of which have prospered and developed into routine laboratory workhorses, while others have flourished only to become unpopular or redundant. Both competitive and comparative hybridisations remain widely used for different applications. Competitive hybridisation requires nucleic acid from the different sources (e.g. cancer and normal tissue) to be differentially labelled (18). This allows a higher level of internal control and is the method of choice in many forms of expression analysis. Comparative hybridisation was the original technique (32) and involves separate hybridisations; this requires a very high level of reproducibility and relies on inclusion of more controls. Despite this, comparative approaches are the most popular in genomic SNP analysis; this owes a lot to the technical excellence of the Affymetrix platform, which has always relied on single labelling. Affymetrix has now been joined by another high-profile competitor, Illumina, and the field is now divided regarding which platform is the most effective and which is most appropriate for different applications. Both systems have similar requirements for sample preparation and good laboratory practice.

1.1. General Requirements
1.1.1. Minimum Information About a Microarray Experiment

Minimum information about a microarray experiment (MIAME) (33) establishes standards that allow raw data from array experiments from different groups to be compared. Most of these standards were designed with expression profiling in mind, but the general principles (if not the specific details) are equally valid for association or linkage analysis using SNP arrays. MIAME specifies that the microarray data contributed comply with the following elements:
1. The raw data for each hybridisation must be accessible (e.g. CEL file for Affymetrix arrays).
2. The final processed (normalised) data for the set of hybridisations in the study should be accessible and transparent (e.g. the matrix used to draw the conclusions from the study should have a standard format).
3. The essential sample annotation including experimental factors and their values should be available (e.g. was blood DNA used or were paraffin-embedded samples required).
4. The experimental design including sample data relationships should be given (e.g. which raw data file relates to which sample, which repeat hybridisations are quality controls and which are biological replicates).
5. Sufficient annotation of the array should be provided (e.g. genomic coordinates, probe oligonucleotide sequences, or reference commercial array catalog number).
6. The essential laboratory and data processing protocols should be transparent (e.g. the normalisation method used to obtain the final processed data should be included in reports).

1.1.2. Laboratory Environment

A room should be designated as a DNA-free pre-PCR clean area for storage of reagents required for DNA digestion, ligation, and PCR. Ideally, a separate room will be allocated for the set-up of pre-PCR stages that is separate to the room where reagents are stored. However, if this is not possible, then an area within a DNA-free room should be allocated for pre-PCR stage set-up. Reagents and equipment (e.g. laboratory coats, pipette tips and racks) used in the DNA-free room should not be taken into “DNA-contaminated areas” that will have airborne DNA contamination, particularly laboratories where PCR amplification is performed. Work surfaces and equipment designated for a DNA-free area can be decontaminated using DNAZap (Ambion, AM9890), which degrades contaminating DNA and RNA to a level below PCR sensitivity, or Microsol (Anachem, MIC-201), a disinfectant that also acts on nucleic acids. It is good practice to decontaminate all work surfaces in this manner whether working in a DNA-free or DNA-contaminated area. Post-PCR stages can be performed in the main laboratory; this includes PCR (not reaction set-up), PCR clean-up, fragmentation, labelling, hybridisation, washing, and staining. Reagents for these steps should be stored in this laboratory and not taken into the DNA-free area.

1.1.3. Sample Types

Successful application of SNP arrays for association or linkage studies requires good quality DNA, which means that, where possible, freshly obtained blood should be used. However, this is not always possible or convenient. Archived samples are often only available as formalin-fixed, paraffin-embedded samples; these can be applied to either platform, although the requirement for whole-genome amplification means that the Illumina GoldenGate system is preferable. DNA quality is very variable depending on the type of tissue, the age of the sample, and how the tissue was fixed. Therefore, PCR-based quality control is essential before using the DNA on the array. Even so, a compromise in quality will almost certainly be necessary; this may be acceptable on higher density arrays, but will be more problematic with arrays that have only one SNP in a region being genotyped. To increase DNA quantity when using the Affymetrix platform, multiple PCRs can


be used preceding fragmentation; this also has the benefit of avoiding jackpot effects (single-allele amplification occurring because the second allele is missed in the first round of amplification). A similar approach could equally be applied to the Illumina system, but because this would include use of the linkage panels multiple times, it might prove prohibitively expensive.

1.1.4. Equipment

Both platforms require PCR amplification; it is a common misconception that given a good PCR machine, successful amplification will depend purely on the reagents and the protocol. In reality, subtle differences in machines and even tubes will make significant differences to effectiveness. Affymetrix recommends the 2720 thermal cycler or GeneAmp PCR system 9700 (Applied Biosystems) and MJ Tetrad PTC-225 (BioRad). The Affymetrix protocol is optimised thoroughly and has rigorous guidelines regarding the source of reagents, equipment, and plasticware. Illumina does not specify in such detail the reagents and plasticware to be used, mainly because all of the necessary reagents are supplied with the genotyping assay kit. In accordance with MIAME principles, the PCR machine used should be given in any report and groups should ensure that the machine used is commonly available to other investigators.

1.1.5. Sample Preparation

DNA quality is critical regardless of the platform used. The most commonly used method for DNA quantification is absorbance at 260 nm, but this is sensitive to single-stranded DNA (ssDNA), RNA, protein, and reagent contamination from DNA preparation methods. For pure double-stranded DNA (dsDNA), an absorbance value of 1 at 260 nm corresponds to a DNA concentration of 50 µg/ml. DNA can be run through a low-strength agarose gel (1%) to ensure that it is intact. This method of quality control is recommended by Affymetrix, who specify that the molecular weight of genomic DNA should be >20 kb. Illumina specifies that DNA fragment sizes should be at least 2 kb, although the GoldenGate assay can be used with fragments as small as 200 bp and is able to tolerate degraded sample better than the Infinium assay (both assays are provided by Illumina). Furthermore, DNA should be free of impurities, with A260/280 ratios between 1.7 and 1.9. Inhibitors in genomic DNA preparations can be removed by ethanol precipitation using the protocol in Subheading 3.1.1. Illumina advises against using UV spectrometry, even the highly respected NanoDrop technology, to quantify DNA, because contaminating ssDNA, oligonucleotides, RNA, and proteins can interfere with the readings and thus give inaccurate results. Therefore, Illumina recommends that DNA be quantified using the Quant-iT™ PicoGreen® assay, a fluorescence-based nucleic acid stain for quantification of dsDNA. The assay has been optimised


so that fluorescence originating from contaminating RNA or ssDNA is minimal. The system is capable of detecting dsDNA at a concentration of 25 pg/ml. The protocol for DNA quantification using the Quant-iT™ PicoGreen® assay is described in Subheading 3.1.2. DNA extraction methods that yield ssDNA are not appropriate, because dsDNA is required for restriction digestion. DNA isolation methods recommended by Affymetrix include the QIAamp® DNA Blood Maxi kit (QIAGEN). An alternative is SDS/proteinase K digestion with phenol–chloroform extraction, followed by ultracentrifugation/concentration with Microcon® or Centricon® filters (Millipore). It is vital to avoid contamination with DNA from other sources because this will result in a high marker detection rate but the call rate will fall because there will be mixed alleles present. The most likely way for cross-contamination to occur would be via contact of the preparation with airborne PCR-amplified DNA within the laboratory. Several safeguards are employed to avoid cross-contamination of DNAs, such as the allocation of a DNA-free room for storage and preparation of pre-PCR reagents, as described above. It is essential to use dedicated equipment, lab coats, etc. for DNA-free and main laboratory areas, with restricted movement between these areas. Because DNA quality is the critical component of any SNP-based genotyping assay, both Illumina and Affymetrix recommend that a small number of DNA samples be tested initially before performing a large-scale genotyping assay.

1.1.6. Whole-Genome Amplification of DNA in Genotyping Assays

Whole-genome amplified (WGA) DNA prepared using multiple displacement amplification (MDA)-based methods (REPLI-g®, Qiagen, and φ29 polymerase) or by amplification using random primers (OmniPlex® assay, Rubicon Genomics) has been used. These methods yielded concordance rates of >98.8% and genotype call rates of >99.8% using the Illumina GoldenGate platform (34, 35). WGA with reasonable quantities of good quality DNA is therefore effective, but, in practice, WGA will usually be considered when DNA is of poor quality and limited quantity, in which case the genotyping data obtained will be severely compromised (36). Therefore, Illumina recommends that the starting DNA be intact at a minimum of 50 ng/µl, quantified using the PicoGreen® assay. Some degree of DNA degradation can be tolerated but it is recommended that a minimum of 100–200 ng of partially degraded DNA should be used. A separate WGA step (in addition to amplification integral to the assay) is not recommended for use with Illumina’s Infinium platform nor with the Affymetrix genotyping assays.
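The quantification and purity criteria given in Subheading 1.1.5 can be collected into a small pre-array check. The following Python sketch is illustrative only: the `DnaSample` fields, the function names, and the choice to apply the A260/280 window to both platforms are assumptions made for this example, not part of either manufacturer's software.

```python
from dataclasses import dataclass

# Pure dsDNA: an A260 of 1.0 corresponds to 50 µg/ml (from the text).
A260_TO_UG_PER_ML = 50.0
# Acceptable A260/280 window quoted in the text.
PURITY_RANGE = (1.7, 1.9)
# Minimum fragment size on a 1% agarose gel, in kb (from the text).
MIN_FRAGMENT_KB = {"affymetrix": 20.0, "illumina": 2.0}


@dataclass
class DnaSample:
    """Hypothetical record of one genomic DNA preparation."""
    name: str
    a260: float
    a280: float
    dilution_factor: float  # e.g. 100 for 2 µl diluted into 198 µl
    fragment_kb: float      # approximate size estimated from the gel


def concentration_ug_per_ml(sample: DnaSample) -> float:
    """Concentration of the undiluted stock estimated from A260."""
    return sample.a260 * A260_TO_UG_PER_ML * sample.dilution_factor


def qc_problems(sample: DnaSample, platform: str) -> list:
    """Return a list of human-readable QC failures (empty if none)."""
    problems = []
    low, high = PURITY_RANGE
    ratio = sample.a260 / sample.a280
    if not low <= ratio <= high:
        problems.append(f"A260/280 ratio {ratio:.2f} outside {low}-{high}")
    minimum = MIN_FRAGMENT_KB[platform]
    if sample.fragment_kb < minimum:
        problems.append(f"fragments ~{sample.fragment_kb} kb, below {minimum} kb")
    return problems


if __name__ == "__main__":
    blood = DnaSample("blood-01", a260=0.5, a280=0.27,
                      dilution_factor=100, fragment_kb=25)
    print(concentration_ug_per_ml(blood), qc_problems(blood, "affymetrix"))
```

Because absorbance overestimates dsDNA when ssDNA, RNA, or protein is present, a check of this kind is a first filter only; the PicoGreen® measurement remains the value to use for Illumina assays.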


2. Materials
2.1. General Materials
2.1.1. Clean-Up of Genomic DNA

1. Absolute ethanol.
2. 7 M sodium acetate.
3. Glycogen.
4. Reduced TE (10 mM Tris–HCl, pH 8.0, 0.1 mM EDTA, pH 8.0).

2.1.2. DNA Quantification Using the Quant-iT™ PicoGreen® Assay

1. Calf thymus DNA (Sigma, D4654) or bacteriophage lambda DNA (Sigma, D3654).
2. Quant-iT™ PicoGreen® dsDNA assay kit (Invitrogen, P7589).
3. Tris–EDTA buffer (TE): 10 mM Tris–HCl, 1 mM EDTA, pH 7.5.
4. Spectrofluorometer or fluorescence microplate reader.

2.2. Affymetrix Materials

1. GeneChip human mapping 500K assay kit (Affymetrix).
2. MJ Tetrad PTC-225 (BioRad).
3. Reduced TE buffer (10 mM Tris–HCl, pH 8.0, 0.1 mM EDTA, pH 8.0).
4. Molecular biology-grade water.
5. Appropriate restriction enzyme and reaction buffer (NspI or StyI).
6. Bovine serum albumin (10 mg/ml).
7. T4 DNA ligase (400 U/µl).
8. TITANIUM Taq DNA polymerase (50×) (Clontech, 639209).
9. GC melt (5 M) (Clontech, 639238) (see Note 1).
10. dNTPs (2.5 mM each).
11. 0.5 M EDTA, pH 8.0.
12. 2× gel-loading buffer.
13. PCR clean-up plate (Clontech, 636974).
14. 4% TBE agarose gel (Cambrex, 54929) (see Note 2).
15. 2-Morpholinoethanesulfonic acid (MES) (12×, 1.22 M).
16. Dimethyl sulfoxide (DMSO).
17. Denhardt’s solution (50×).
18. Herring sperm DNA (10 mg/ml).
19. Human Cot-1 DNA, 1 mg/ml (Invitrogen, 15279-011).
20. Tween-20 (10%).
21. Tetramethylammonium chloride (TMACL) (5 M).
22. PCR strip tubes (BioRad, TBS-0201).
23. PCR tube strip caps (BioRad, TCS-0801).
24. All-purpose HiLo DNA marker (Bionexus, BN2050).

2.3. Infinium Array, Illumina

1. Infinium II whole-genome genotyping kit, HumanHap 240S, 300, 550, or 650Y.
2. Sentrix universal 96-array matrix for 96 samples (Illumina, FA-12-202) or Sentrix universal 16-beadchips 384-plex set of 6 (Illumina, GT-95-212).
3. Stand-alone BeadArray reader (Illumina, SC-16-300/301).
4. TE buffer: 10 mM Tris–HCl, pH 7.5; 1 mM EDTA.
5. 0.1 N (0.1 M) NaOH (i.e. 4 g/l).
6. 96-well storage plate (ABgene, AB-0859).
7. Cap-Mat (ABgene, AB-0566).
8. Microplate shaker (VWR, 444-7016).
9. Isopropanol.
10. 100% formamide.
11. 95% formamide:10 mM EDTA.
12. Heat-sealed foil cover (ABgene, AB-0559).
13. Aluminium block (Illumina, 21119).
14. Te-Flow rack (Tecan, 760–800).

2.4. GoldenGate Array, Illumina

1. Single-use DNA activation kit with enough reagents for six 96-well plates, i.e. 576 samples (Illumina, GT-95-201).
2. GoldenGate assay kit for 96/576 samples (Illumina, GT-95-203 [96] and GT-95-204 [576]).
3. Sentrix universal 96-array matrix for 96 samples (Illumina, FA-12-202) or Sentrix universal 16-beadchips 384-plex set of 6 (Illumina, GT-95-212).
4. 2-Propanol (isopropanol).
5. Thermal cycler.
6. 0.1 N (0.1 M) NaOH (i.e. 4 g/l).

3. Methods
3.1. General Methods
3.1.1. Clean-Up of Genomic DNA

3. Methods 3.1. General Methods 3.1.1. Clean-Up of Genomic DNA

1. Add 2.5 volumes of absolute ethanol (stored at −20°C), 0.5 volumes of 7 M sodium acetate, and 10 µg of glycogen (a co-precipitant that helps to prevent loss of the pellet) per 1 µg genomic DNA.
2. Vortex and incubate at −20°C for 1 h and centrifuge at 12,000 × g for 20 min at room temperature.
3. Wash the pellet with 0.5 ml of 80% ethanol and centrifuge at 12,000 × g for 5 min; repeat this step once.
4. Air-dry the pellet and resuspend it in reduced TE buffer.

3.1.2. DNA Quantification Using the Quant-iT™ PicoGreen® Assay

1. Prepare standards (1 ml or more) using either bacteriophage lambda DNA or calf thymus DNA, by dilution to concentrations of 1 µg/ml, and 100, 10, and 1 ng/ml in TE buffer.
2. Add 1 ml of Quant-iT™ PicoGreen® reagent to 1 ml of the diluted samples and a TE blank; incubate at room temperature for 2–5 min, either in the dark or in a foil-wrapped container to avoid exposure to light.
3. Measure the fluorescence intensity of the sample at 520 nm with excitation at 485 nm using either a spectrofluorometer or fluorescence microplate reader.
4. Subtract the reading at 520 nm of the blank (TE alone) from the DNA dilution standards and plot a curve of nucleic acid concentration against fluorescence intensity at 520 nm.
5. Dilute genomic DNA in TE buffer to a final volume of 1 ml, add 1 ml of Quant-iT™ PicoGreen® reagent, and incubate at room temperature for 2–5 min protected from light (as previously).
6. Measure the fluorescence intensity of the sample and TE buffer alone at 520 nm. Subtract the value obtained for TE buffer and determine the DNA concentration from the standard curve.
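Steps 4–6 amount to fitting a standard curve and reading unknowns from it. The sketch below shows one way this could be done in Python with an ordinary least-squares line; the fluorescence readings are invented for illustration, the function names are hypothetical, and a straight-line fit is an assumption that should be checked against the plotted standards.

```python
def fit_standard_curve(concentrations, fluorescence, blank):
    """Least-squares line through blank-subtracted standards.

    Returns (slope, intercept) such that
    corrected fluorescence = slope * concentration + intercept.
    """
    corrected = [f - blank for f in fluorescence]
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(corrected) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, corrected))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def dna_concentration(reading, blank, slope, intercept, dilution_factor=1.0):
    """Concentration of the undiluted sample, in the units of the standards."""
    corrected = reading - blank
    return (corrected - intercept) / slope * dilution_factor


if __name__ == "__main__":
    # Standards from step 1, in ng/ml (1 µg/ml = 1,000 ng/ml).
    standards_ng_per_ml = [1.0, 10.0, 100.0, 1000.0]
    # Invented raw readings at 520 nm, for illustration only.
    raw_readings = [52.0, 70.0, 250.0, 2050.0]
    te_blank = 50.0

    slope, intercept = fit_standard_curve(standards_ng_per_ml,
                                          raw_readings, te_blank)
    # Genomic DNA diluted 1:100 in TE before the assay (assumed).
    print(dna_concentration(450.0, te_blank, slope, intercept,
                            dilution_factor=100))
```

Readings that fall outside the range of the standards should be repeated at a different dilution rather than extrapolated.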

3.2. Affymetrix SNP Arrays

The GeneChip® Human Mapping Array Sets produced by Affymetrix are high-density arrays that represent thousands of SNPs. The density of Affymetrix arrays has increased from 10,000 SNPs per chip (10K chips) to 100K, 500K, and the recently introduced Genome-Wide Human SNP Array 6.0, which has 906,600 SNPs and a further 40,000 non-polymorphic probes. Originally, the 500K array set comprised two arrays (approximately 262,000 SNPs in the NspI array and 238,000 in the StyI array). The 500K set was subsequently amalgamated onto a single chip marketed as the Genome-Wide Human SNP Array 5.0. The restriction enzymes were chosen to maximise the number of PCR-amplifiable (i.e. 200–1,100 bp) fragments containing informative SNPs. The procedure essentially involves digestion of genomic DNA to fragment sizes of 200–1,100 bp and the ligation of adaptors to the digested products. The adaptors act as PCR primer sites to allow amplification and enrichment of these fragments. The PCR products are purified and fragmented using DNase I to less than


Fig. 1. SNP genotyping using the Affymetrix 500K protocol. (a) PCR of digestion/ligation product. (b) Fragmentation of PCR product run through a 4% agarose gel (37).

200 bp and a biotin label is added before hybridisation on the array (Fig. 1). There are several quality control steps throughout the procedure that allow the assessment of the efficiency of each stage and subsequent optimisation.

Stage 1: Genomic DNA preparation
This stage is performed in the designated DNA-free room or pre-PCR room (see Note 3). The minimum amount of starting DNA required for the Affymetrix chips is 250 ng in a volume of 5 µl in reduced TE.

Stage 2: Restriction digestion of genomic DNA
This stage should be performed in the pre-PCR room or designated DNA-free room (see Note 3). Genomic DNA is digested with the adaptor-specific restriction enzyme, either NspI or StyI. Ideally, a master mix would be prepared in a DNA-free room and genomic DNA added in the pre-PCR room (see Note 3).
1. For one reaction: The following can be mixed in a single tube: 11.6 µl of molecular biology-grade water, 2 µl of 10× appropriate digestion buffer, 0.2 µl BSA (10 mg/ml), 1 µl enzyme (10 U/µl), and, finally, 5 µl genomic DNA (50 ng/µl).
2. Alternatively: It would be more typical to carry out the process on multiple samples, in which case a master mix should be prepared with 5% excess to allow for pipetting errors. For example, for eight samples, mix 97 µl of molecular biology-grade water, 17 µl of 10× digestion buffer, 1.7 µl BSA, and 8.5 µl enzyme. Add 14.75 µl of this master mix to 5 µl of genomic DNA.
3. Briefly centrifuge and incubate in a pre-heated PCR machine with a heated lid at 37°C for 2 h. Incubate at 65°C for 20 min and hold at 4°C. Samples should be stored at −20°C if not proceeding to the next stage immediately.

Stage 3: Ligation
Digested DNA is ligated to the appropriate adaptor in the pre-PCR room or designated DNA-free room (see Note 3). Reactions are prepared on ice as follows:


1. Prepare master mixes ensuring a 5% excess. The following volumes are given for one reaction, with the recommended volume followed by the required volume in parentheses: mix 0.8 µl (0.75 µl) adaptor, 2.7 µl (2.5 µl) T4 DNA ligase buffer, and 2 µl T4 DNA ligase (400 U/µl) (see Note 4).
2. Add 5.25 µl to the digestion reaction, briefly centrifuge, and incubate in a pre-cooled PCR machine at 16°C for 3 h, then heat to 70°C for 20 min and hold at 4°C until proceeding to the next step. As in the previous stage, samples should be stored at −20°C if not proceeding to the next stage immediately. Samples should be centrifuged briefly before proceeding.

Stage 4: PCR
The ligated product is used as a template for PCR amplification using primers that bind within the adaptor region. PCRs are performed in triplicate to achieve sufficient DNA quantity for the subsequent stages.
1. Add 75 µl of molecular biology-grade water to the ligated product from stage 3 to make a total volume of 100 µl.
2. Prepare the master mix on ice in the DNA-free room with a 5% excess. The following volumes are given for one sample (three reactions), with the recommended volume followed by the required volume in parentheses: 125 µl (118.5 µl) molecular biology-grade water, 31.5 µl (30 µl) Titanium Taq PCR buffer, 63 µl (60 µl) GC melt (5 M), 44 µl (42 µl) dNTPs (2.5 mM each), 14 µl (13.5 µl) PCR primer 002 (100 µM), and 6 µl TITANIUM Taq DNA polymerase (50×) (see Note 4).
3. Add 90 µl of PCR master mix to 10 µl of diluted ligation product in a dome-capped PCR tube and briefly centrifuge (see Note 5).
4. PCR is performed in the main laboratory and cycling proceeds as follows: 94°C for 3 min followed by 30 cycles of 94°C for 30 s, 60°C for 45 s, and 68°C for 15 s. The reaction is completed with an additional elongation step at 68°C for 7 min and held at 4°C, or stored at −20°C if not proceeding to the next stage immediately.
5. Briefly centrifuge the samples and add 3 µl of PCR product to 3 µl of 2× gel-loading buffer and run through a 2% agarose gel at 100 V for 1 h to confirm that the products are in the correct size range of 200–1,100 bp. A typical PCR result is shown in Fig. 1 (37).

Stage 5: PCR purification and quantification
PCR products are pooled, purified, and concentrated into a volume of 45 µl.


1. Add 8 µl of 0.1 M EDTA, pH 8.0, to each PCR prior to purification.
2. Pool PCR products into one well of the PCR clean-up plate and apply a vacuum until the well is dry (it will appear glossy); then add 50 µl of molecular biology-grade water and allow the membrane in the well to dry; repeat this step twice. Allow the membrane to dry completely at the end of the last wash and then add 45 µl of RB buffer (supplied with the clean-up plate).
3. Secure the plate to a flat-top vortex set at the lowest speed and leave the plate for 10 min to allow DNA immobilised on the membrane to be resuspended in the RB buffer (see Note 6).
4. Carefully transfer the RB buffer into a clean 0.2-ml microcentrifuge tube, then add 2 µl of purified PCR product to 198 µl of molecular biology-grade water. Read the absorbance at 260 and 280 nm using a spectrophotometer. Calculate the concentration of DNA assuming one absorbance unit at 260 nm equals 50 µg/ml DNA, and multiply this value by 100 to allow for the dilution factor.

If there is an insufficient quantity of DNA after purification to proceed to the fragmentation step, then additional PCRs can be performed on the digestion/ligation product to increase DNA quantity. There should be sufficient ligated DNA template to perform at least nine PCRs. Samples should be stored at −20°C if not proceeding to the next stage immediately.

Stage 6: Fragmentation

This stage should be performed in the main laboratory. It relies on a fragmentation reagent that is supplied as either 3 or 2 U/µl; a note must be made regarding which version is used. The protocol below is based on the 3 U/µl kit.

1. Transfer 90 µg of purified DNA into a sterile 0.2-ml PCR tube and make up to 45 µl using RB buffer; add 5 µl of 10× fragmentation buffer to each DNA sample.
2. Prepare the fragmentation mix so that it is at 0.05 U/µl as follows: for five reactions (the minimum number of reactions using the 3 U/µl reagent), mix 26.5 µl molecular biology-grade water, 3 µl of 10× fragmentation buffer, and 0.5 µl fragmentation reagent (3 U/µl) (see Note 7).
3. Add 5 µl of fragmentation mix to each 90 µg of purified DNA, briefly centrifuge, and place in a PCR machine preheated to 37°C for 35 min, followed by incubation at 95°C for 15 min to inactivate the enzyme.
4. Add 4 µl of fragmentation reaction to 4 µl of 2× gel-loading buffer and run through a 4% agarose gel alongside the all-purpose HiLo DNA marker at 100 V for 1 h to confirm that fragment sizes are less than 200 bp. A typical fragmentation reaction result is shown in Fig. 1 (37).

Stage 7: Labelling

This stage is performed in the main laboratory.

1. For one reaction, the following volumes of reagents are added to 50.5 µl (see Note 8) of the fragmentation reaction in a new domed-cap 0.2-ml PCR tube: 14 µl of 5× TdT buffer, 2 µl GeneChip® DNA labelling reagent (30 mM), and 3.5 µl TdT (30 U/µl). If multiple samples are being used, a labelling master mix can be prepared. For example, for eight samples, the following would be mixed together (recommended volumes are followed by required volumes in parentheses): 117.6 µl (112 µl) of 5× TdT buffer, 16.8 µl (16 µl) GeneChip® DNA labelling reagent (30 mM), and 29.4 µl (28 µl) TdT (30 U/µl); combine 19.5 µl of labelling master mix with 50.5 µl of the fragmentation reaction.
2. Briefly centrifuge and incubate at 37°C for 4 h followed by a denaturation step at 95°C for 15 min.

Samples should be stored at −20°C if not proceeding to the next stage immediately.

Stage 8: Hybridisation

Prior to hybridisation on the array, labelled DNA must be suspended in a hybridisation cocktail as follows. For one reaction, use: 12 µl MES (12×, 1.22 M), 13 µl DMSO (100%), 13 µl Denhardt’s solution (50×), 3 µl EDTA (0.5 M), 3 µl herring sperm DNA (10 mg/ml), 2 µl OCR, 0100 (supplied in the Affymetrix kit), 3 µl Human Cot-1 DNA (1 mg/ml), 1 µl Tween-20 (3%), and 140 µl TMACL (5 M). Add 190 µl to the 70 µl of each labelled DNA sample. Mix well and heat to 99°C in a heated block for 10 min to denature. Cool on ice for 10 s. Briefly centrifuge samples to collect condensate and incubate at 49°C for 1 min. Inject 200 µl of denatured hybridisation cocktail into the array and hybridise for 16–18 h at 60 rpm. Washing, staining, and scanning of the array is an automated process operated using the GeneChip® operating software (GCOS), which produces files for data analysis.
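The reagent arithmetic in Stages 5–7 can be checked with a short script. The following Python sketch is an illustration only and is not part of the Affymetrix protocol: the function names are hypothetical, the A260 reading in the usage lines is an invented example value, and the 5% excess is the figure discussed in Note 4.

```python
# Illustrative reagent arithmetic for Stages 5-7 (all names are hypothetical).

A260_UG_PER_ML = 50.0  # one A260 unit is taken as 50 ug/ml DNA (Stage 5)


def dna_conc_ug_per_ul(a260, dilution_factor=100.0):
    """Concentration of the undiluted eluate (ug/ul) from the A260 of the diluted aliquot."""
    ug_per_ml = a260 * A260_UG_PER_ML * dilution_factor
    return ug_per_ml / 1000.0  # 1 ml = 1000 ul


def fragmentation_mix(stock_u_per_ul, reagent_ul=0.5, target_u_per_ul=0.05):
    """Volumes (ul) that dilute the fragmentation reagent to the target activity in 1x buffer."""
    total = reagent_ul * stock_u_per_ul / target_u_per_ul
    buffer_10x = total / 10.0
    water = total - buffer_10x - reagent_ul
    return {
        "water": round(water, 2),
        "buffer_10x": round(buffer_10x, 2),
        "reagent": reagent_ul,
        "total": round(total, 2),
    }


# Per-reaction labelling volumes (ul) from Stage 7
LABELLING_UL = {"5x TdT buffer": 14.0, "labelling reagent": 2.0, "TdT": 3.5}


def labelling_master_mix(n_samples, excess=0.05):
    """Scale the per-reaction labelling volumes to n samples plus a fractional excess."""
    factor = n_samples * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in LABELLING_UL.items()}


if __name__ == "__main__":
    print(dna_conc_ug_per_ul(0.4))   # 0.4 is a made-up A260 reading
    print(fragmentation_mix(3.0))    # 3 U/ul reagent
    print(labelling_master_mix(8))   # eight samples, 5% excess
```

With the 3 U/µl reagent, the fragmentation mix comes out as 26.5 µl water, 3 µl buffer, and 0.5 µl reagent in 30 µl, and the eight-sample labelling mix matches the recommended volumes given in Stage 7.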
Stage 9: Data analysis

Data will be in the form of CHP, CEL, or DAT files and can be imported into and analysed using the GTYPE software. Genotype calls are made by using either the Dynamic Model (DM) or the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). The dynamic model is used to call SNP genotypes on single samples. The BRLMM algorithm is a clustering method that requires multiple samples (a minimum of 50) and achieves call rates of greater accuracy for both homozygous and heterozygous alleles than the dynamic model.

3.2.1. Quality Control of SNP Calling

The Modified Partitioning Around Medoids (MPAM) calling algorithm is used to check for sample contamination using a subset of SNPs on the StyI and NspI arrays. Mapping detection rate (MDR) and mapping call rate (MCR) are quality control (QC) measures that indicate whether a sample is contaminated. MCR is calculated as the number of SNPs called with the MPAM algorithm divided by the total number of SNPs checked for QC purposes. All chips have control sequences showing either a perfect match or a 1-bp mismatch with a reference sequence. The SNP detection filter compares the background-subtracted intensity of a perfect match (PM) probe with the background-subtracted intensity of a mismatch probe (MP). MDR is defined as the number of SNPs passing the MPAM discrimination filter divided by the total number of SNPs checked with the MPAM algorithm for QC purposes. In a pure sample, the possible allele calls for a given SNP are AA, AB, and BB, with A:B allele ratios of 100:0, 50:50, and 0:100, respectively. A departure from these ratios indicates sample contamination, and thus the call rate will fall, although the detection rate will not be affected. The expected values are >0.93 for MCR and >0.99 for MDR. If a sample is contaminated, a high MDR will be achieved but the MCR will decrease because the expected allele ratio will be compromised. Therefore, an MCR below 0.93 together with an MDR >0.99 indicates sample contamination; this does not apply if the MDR is less than 0.99.
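The decision rule above can be written out explicitly. The following Python sketch is an illustration under stated assumptions, not part of the GTYPE software: the function names and returned labels are hypothetical, and values exactly equal to a threshold are treated as failing it, which the text does not specify.

```python
# Illustrative QC rule based on the MCR/MDR thresholds quoted in the text.

MCR_MIN = 0.93  # expected mapping call rate is > 0.93
MDR_MIN = 0.99  # expected mapping detection rate is > 0.99


def mapping_rates(n_called, n_passing_filter, n_checked):
    """Return (MCR, MDR) as fractions of the SNPs checked for QC purposes."""
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    return n_called / n_checked, n_passing_filter / n_checked


def qc_flag(mcr, mdr):
    """Classify a sample from its MCR and MDR (labels are hypothetical)."""
    if mdr > MDR_MIN:
        # Detection is fine, so a low call rate points to mixed alleles.
        return "pass" if mcr > MCR_MIN else "possible contamination"
    # The contamination rule does not apply when MDR is low.
    if mcr > MCR_MIN:
        return "low MDR"
    return "low MDR and MCR (see Table 1)"


if __name__ == "__main__":
    mcr, mdr = mapping_rates(n_called=90, n_passing_filter=995, n_checked=1000)
    print(qc_flag(mcr, mdr))
```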

3.2.2. Quality Control of GeneChip® Human Mapping 500K Assay

Prior to running actual samples, it is essential to prepare and run the control DNA supplied in the kit on an array and assess the quality of genotype calls. The control DNA supplied is of a guaranteed quantity and quality and, provided DNA labelling and hybridisation procedures are optimal, it represents the maximum efficiency of genotyping in a particular operator’s hands. There are several quality control steps throughout the protocol. Troubleshooting procedures are outlined in Table 1.

3.3. Illumina Arrays

Illumina offers two types of genotyping assay, GoldenGate and Infinium. Both use beadchip technology but require different methods of DNA enrichment. Illumina arrays have the advantage that the protocol, from template preparation to sample hybridisation and generation of genotyping data, takes only 3–4 days as opposed to 4–5 days with Affymetrix (Table 2). Illumina offers custom-designed arrays: the user submits a list of gene accession numbers and Illumina generates a list of SNPs within a defined region of these genes; at the time of writing, this is done using genome build 36. Furthermore, Illumina offers a cancer SNP panel that contains more than 1,400 SNPs from more than 400 genes reported to be involved in cancer.

Table 1
Troubleshooting the Affymetrix 500K SNP genotyping protocol

Problem: PCR products are not in the correct size range
Likely cause:
– Starting template DNA may be fragmented; run 1–2 µl on an agarose gel
– Quantification may be inaccurate; calibrate the UV spectrophotometer
– Use the specified PCR machine, tubes, and reagents for reaction

Problem: Insufficient quantity of PCR product after PCR purification
Likely cause:
– UV spectrophotometer may not be calibrated
– Increase the number of PCR replicates to more than three
– Enzyme inhibitors in template sample; purify by ethanol precipitation

Problem: Fragments are too small after fragmentation reaction
Likely cause:
– Too much DNase used

Problem: MDR <0.99 and MCR <0.93
Likely cause:
– Insufficient labelled DNA on array

Conventional arrays are composed of oligonucleotide probes immobilised onto a quartz-based substrate at a precise location. Illumina offers a novel technology using 300-nm-diameter beads with approximately 700,000 covalently attached oligonucleotide probes that randomly self-assemble onto etched microwell substrates to a density estimated to be 40,000 times higher than conventional spotted arrays. Illumina has two array formats, the Sentrix Array Matrix (SAM) and the Sentrix Beadchip. SAM consists of fibre optic bundles chemically etched with 300-nm wells to accommodate the beads. The fibre optic bundles are grouped together in a 96-well plate format. For a lower throughput assay, the Sentrix Beadchip can be used, which can assay 16 samples at a time (17).

3.3.1. Infinium Genotyping

Infinium genotyping is a non-PCR-based assay that involves allele-specific primer extension of oligonucleotides hybridised to genomic DNA. Infinium arrays allow 109,000–650,000 SNPs to be genotyped per sample. Genome-wide SNP genotyping using the Infinium assay consists of four steps:

Table 2
Time scale for performing Affymetrix 250K GeneChip® Human Mapping Array and Illumina GoldenGate and Infinium II genotyping assays

Affymetrix 250K GeneChip® Human Mapping Array
Day 1: 1. Genomic DNA preparation; 2. Restriction digestion of genomic DNA; 3. Ligation; 4. PCR
Day 2: 5. PCR purification and quantification
Day 3: 6. Fragmentation; 7. Labelling
Day 4: 8. Hybridisation

Illumina GoldenGate genotyping assay
Day 1: 1. Biotinylation of genomic DNA; 2. Annealing of oligonucleotides to genomic DNA; 3. Oligonucleotide extension and ligation; 4. PCR amplification
Day 2: 5. PCR product preparation; 6. Array hybridisation
Day 3: 7. Washing and imaging

Illumina Infinium II genotyping assay
Day 1: 1. Whole-genome amplification
Day 2: 2. Fragmentation of amplified DNA; 3. Precipitate and resuspend DNA; 4. Beadchip hybridisation
Day 3: 5. Array-based primer extension; 6. Multilayer immunohistochemistry sandwich staining

1. A WGA and fragmentation step, included within the kit protocol, which produces amplified DNA of 200–300 bp.
2. Hybridisation of amplified DNA to an oligonucleotide array that consists of 80-mer oligonucleotides with the 5′ end immobilised onto the beads. The first 30 bases act as the decoding sequence; this is used to determine the location and the identity of the probe, like a barcode system allowing an inventory of the array. The remaining 50 bases contain the SNP within the locus of interest.
3. Array-based enzymatic SNP scoring.
4. An antibody-based staining and signal amplification step.
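The layout of the bead-bound oligonucleotide described in step 2 can be pictured as a simple split of an 80-base string. The Python sketch below is purely illustrative: the function name is hypothetical, the example sequence is synthetic, and it assumes that the "first 30 bases" are counted from the bead-attached 5′ end.

```python
# Illustrative split of an 80-mer bead oligonucleotide (hypothetical helper).

DECODE_LEN = 30  # decoding ("barcode") bases, per the text
LOCUS_LEN = 50   # remaining locus-specific bases, per the text


def split_bead_oligo(oligo):
    """Split an 80-mer written 5'->3' into (decoding sequence, locus-specific sequence)."""
    oligo = oligo.upper()
    if len(oligo) != DECODE_LEN + LOCUS_LEN:
        raise ValueError("expected an 80-base oligonucleotide")
    if set(oligo) - set("ACGT"):
        raise ValueError("sequence may only contain A, C, G, and T")
    return oligo[:DECODE_LEN], oligo[DECODE_LEN:]


if __name__ == "__main__":
    # Synthetic sequence used only to show the 30/50 split.
    decode, locus = split_bead_oligo("A" * 30 + "C" * 50)
    print(len(decode), len(locus))
```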

Fig. 2. Overview of the Infinium genotyping assay (47).

Illumina recommends that DNA samples are quantified using the PicoGreen assay (Invitrogen) as previously described. The Infinium II protocol uses a single-base extension (SBE) format from one primer and differentiates between alleles by the incorporation of differently labelled terminator cytosine and guanine nucleotides (Fig. 2). This system is able to genotype SNP classes A/C, A/G, T/C, and T/G. A second assay can be prepared using labelled uracil and guanine terminators to genotype all SNP classes, although this would need to be run on a separate array.

Stage 1: Whole-genome amplification

The Infinium assay requires 750 ng of genomic DNA (15 µl at 50 ng/µl) and Illumina recommends that DNA be resuspended in TE buffer. In a 96-well storage plate, add 15 µl of 0.1 N NaOH to 15 µl of DNA resuspended to a concentration of 50 ng/µl. Leave at room temperature for 10 min. Add 270 µl of primer/neutralisation mix (MP1) and 300 µl of amplification master mix (AMM) to each sample and seal the plate with a CapMat. Invert the plate ten times to mix, centrifuge briefly at 280 × g, and incubate in a 37°C oven for approximately 20 h.

Stage 2: Fragmentation of amplified DNA

After WGA, a white flocculent by-product of magnesium pyrophosphate will be visible, which indicates that amplification was successful. Centrifuge the plate at 50 × g for 1 min and aliquot 600 µl of amplification product into four wells containing 150 µl each. Add 50 µl of fragmentation (FRG) mix to each well and seal the plate. Vortex at 1,600 rpm for 1 min using a signature high-speed microplate shaker (see Note 6), briefly centrifuge, and incubate at 37°C for 1 h.

Stage 3: Precipitate and resuspend DNA

Centrifuge for 1 min at 50 × g and add 100 µl of precipitation mix (PA1), seal, and vortex at 1,600 rpm for 1 min. Centrifuge at 50 × g for 1 min and incubate at 37°C for 5 min. Add 300 µl of isopropanol and seal securely using a new CapMat to avoid leakage of isopropanol; invert the plate ten times to mix. Incubate at 4°C for 30 min and centrifuge at 3,000 × g for 20 min at 4°C. Immediately remove the isopropanol by inverting the plate and blotting onto a paper towel, being careful not to lose the DNA pellet, which will appear blue at this stage. Dry the plate for 1 h at room temperature.

Stage 4: Beadchip hybridisation

1. Add 42 µl of hybridisation buffer (RA1) to each well, seal the plate using a heat-sealed foil cover, and incubate at 48°C for 1 h.
2. Resuspend the DNA by vortexing at 1,800 rpm for 1 min and centrifuge at 280 × g for 1 min. Pool the four replicas into one well to a final volume of 160 µl and denature at 95°C for 20 min using an aluminium block specifically designed for the plate format. Then, remove the plate from the heating block and leave at room temperature for 5–10 min.
3. Load the samples onto beadchips. Samples can be stored at −20°C for several months before hybridisation onto the beadchip. However, prior to hybridisation, samples must be denatured again at 95°C for 10 min.
4. Assemble the beadchip into the Te-Flow-through chamber as outlined in the Infinium assay system manual and prepare the hybridisation chambers by addition of 200 µl of humidity buffer (PB2) into the troughs.
5. Slowly dispense 150 µl of 100% formamide into the Te-Flow-through chamber in a Te-Flow rack, ensuring that no bubbles are formed. If bubbles do form, dismantle the Te-Flow-through chamber, wash the slide in low-salt buffer (PA1), dry by centrifugation at 280 × g, and reassemble the chamber before repeating the formamide loading.
6. Add 150 µl of hybridisation buffer (RA1) into the Te-Flow reservoir and allow to flow through; repeat this step once. Dispense the WGA DNA product (approximately 5–6 mg/ml) into the Te-Flow chamber and blot residual hybridisation buffer from the bottom of the chamber.
7. Place the slide horizontally into the Illumina hybridisation enclosure and incubate the beads at 48°C for 16–18 h.

Stage 5: Array-based primer extension

1. Equilibrate the Te-Flow chamber rack to 44°C before removing the beadchip Te-Flow chambers from the hybridisation enclosure and placing them in the Te-Flow chamber rack.
2. Wash the beadchip by addition of 450 µl of hybridisation buffer (RA1) to the Te-Flow-through reservoir. Repeat this step five times.
3. Add 450 µl of blocking buffer (XB1) and incubate for 10 min before the addition of 450 µl of equilibration buffer (XB2) and incubation for a further 5 min.
4. Add 200 µl of extension master mix (EMM) and incubate for 15 min.
5. Add 450 µl of 95% formamide with 10 mM EDTA and incubate for 1 min; repeat this wash step once.

Stage 6: Multilayer IHC sandwich staining

1. Add 450 µl of wash buffer (XB3) to the Te-Flow-through reservoir and allow to drain. Repeat this step twice.
2. Add 250 µl of staining solution (LMM), incubate for 10 min, and wash with 450 µl of 95% formamide:10 mM EDTA. Add 250 µl of anti-stain solution (ASM). Incubate for 10 min and wash with 450 µl of 95% formamide:10 mM EDTA as before. Repeat this step once.
3. Add 250 µl of staining solution (LMM) and incubate for 10 min and then wash with 450 µl of 95% formamide:10 mM EDTA.
4. Remove Te-Flow chambers and disassemble. Wash beadchips by immersing them 20 times in low-salt wash buffer (PB1) in a 250-ml wash tray.
5. Centrifuge slides at 280 × g for 1 min to dry and store in a desiccated light-proof container until ready to scan, which must be performed within 24 h of drying. Beadchips are scanned at 0.84-µm resolution using the bead reader.

3.3.1.1. Quality Control of the Infinium II Assay

A set of oligonucleotide assay controls is included in the kit in a solution of RA1 hybridisation buffer.

1. Control oligonucleotides complementary to sequences on the array at three concentrations, high (5 pM), medium (1 pM), and low (0.1 pM), allow testing of the hybridisation and extension efficiency of the system.
2. Control oligonucleotides that contain zero to six internal mismatches allow testing of hybridisation stringency.
3. Target removal controls containing 3′ mismatches are used to assess the stripping step (because extension will only be from the target towards the bead).
4. Extension controls have hairpins allowing extension from the control oligonucleotide towards the bead using itself as template. This allows base incorporation to be tested with or without a mismatch.
5. Controls with different levels of biotin allow staining efficiency to be tested.
6. Non-polymorphic controls against regions without SNPs test the efficiency of the system from hybridisation to detection, allowing comparison of different hybridisations.

3.3.2. GoldenGate Genotyping Assay

The GoldenGate assay requires genomic DNA to be activated by biotinylation followed by hybridisation of genotyping probes. Three different oligonucleotide types are used to genotype each individual SNP. Two oligonucleotides are specific to each allele of the SNP and are designated allele-specific oligonucleotides (ASO). A third oligonucleotide is designed to hybridise 1- to 20-bp downstream of the ASO site; this is termed the locus-specific oligonucleotide (LSO). All oligonucleotides contain a region that is specific to a genomic target with a universal sequence attached that acts as a PCR primer site. The LSO also contains an “address” sequence that is specific to a particular bead type (38). Once either of the ASOs and the LSO have annealed to genomic DNA, a non-strand-displacing polymerase without exonuclease activity is used to fill the gap between the two oligonucleotides. Then a DNA ligase joins the extended sequences together. A universal primer specific to the universal sequence of the LSO and two oligonucleotides specific to either of the ASOs, differentially labelled with Cy3 and Cy5, are used to PCR amplify the SNP region whilst immobilised on the beads. The beads containing the labelled amplified DNA are then hybridised onto the array (Fig. 3). GoldenGate offers a variety of SNP panels for human linkage, including a cancer SNP panel with more than 1,400 SNPs from more than 400 cancer-associated genes. Linkage panel IV contains 6,008 SNPs with an average 0.64-cM genetic map and 482-kb physical spacing. The main factor affecting the success rate of allele calling using the GoldenGate assay is genomic DNA quantity. This system can tolerate some degree of DNA degradation and has been used successfully to genotype DNA isolated from formalin-fixed and paraffin-embedded tissue samples (17). 
Given high initial DNA concentrations, this assay offers the advantage that genotyping occurs directly on genomic DNA and does not require a pre-enrichment PCR amplification that may introduce a bias into the assay; this advantage is lost if an external WGA step is included, which is usually necessary for difficult samples such as paraffin-embedded blocks.

Fig. 3. Overview of the GoldenGate genotyping assay (47).

Stage 1: Biotinylation of genomic DNA

The GoldenGate assay requires 250 ng of genomic DNA. Add 5 µl of genomic DNA at a concentration of 50 ng/µl to 5 µl of biotinylation reagent and incubate at 95°C for 30 min. Add 5 µl of precipitation reagent and 15 µl of 2-propanol, vortex, and centrifuge at 3,000 × g for 15 min. Resuspend the DNA pellet in 10 µl of resuspension buffer.

Stage 2: Annealing of oligonucleotides to genomic DNA

Add 30 µl of annealing reagent (OB1) and 10 µl of SNP-specific oligonucleotides (OPA1) to activated DNA from stage 1 to give a final volume of 50 µl. Place samples in a thermal cycler and ramp the temperature from 70°C to 30°C over 2 h (−1°C every 3 min); keep the samples at 30°C until proceeding to the next stage.

Stage 3: Oligonucleotide extension and ligation

Wash the DNA preparation from stage 2 with buffers provided to remove excess oligonucleotides. Add 37 µl of extension and ligation master mix (MEL) and incubate at 45°C for 15 min.

Stage 4: PCR amplification

Wash the beads with universal buffer 1 (UB1), resuspend the beads in 35 µl of elution buffer (IP1), and heat to 95°C for 1 min to release ligated products. The supernatant is used as a template in PCR amplification using the three universal primers, P1, P2, and P3, labelled with Cy3, Cy5, and biotin, respectively. The PCR proceeds as follows: 37°C for 10 min followed by 34 cycles of 95°C for 35 s, 56°C for 35 s, and 72°C for 2 min, then 72°C for 10 min. Samples are stored at 4°C until proceeding to the next stage.

Stage 5: PCR product preparation

Immobilise PCR products onto paramagnetic particles by adding 20 µl of magnetic particle b reagent (MPB) to 60 µl of PCR and incubate at room temperature for 60 min. Wash the bound PCR product with universal buffer 2 (UB2) and denature by addition of 30 µl of 0.1 N NaOH. Incubate at room temperature for 1 min and neutralise samples by the addition of 30 µl of hybridisation reagent (MH1).

Stage 6: Array hybridisation

Hydrate the array in UB2 for 3 min at room temperature and treat it with 0.1 N NaOH for 30 s, then neutralise the array in UB2 for 1 min. Single-stranded labelled DNA is hybridised to the complement bead type using either the SAM chip or the beadchip array. Hybridisation is performed under a temperature gradient from 60 to 45°C for 12 h and held at 45°C until proceeding to the wash stage.

Stage 7: Washing and imaging array

Rinse the array twice with UB2 and once with IS1 at room temperature with low agitation, dry for 20 min, and image at a resolution of 0.8 µm using a bead reader. Optimise PMT settings for dynamic range, channel balance, and signal-to-noise ratio.

3.3.3. Genotype Calling Using Illumina Arrays

The Beadscan software system co-ordinates the BeadArray reader (Illumina) to scan the array for fluorescence of the Cy3- and Cy5-labelled products using excitation wavelengths of 550 and 630 nm, respectively. This process takes approximately 90 min for 96 samples using the SAM array format. The BeadStudio v3.1 Genotyping Module is used to analyse raw data and generate genotype calls, perform clustering, analyse data intensity, and calculate loss of heterozygosity (LOH) using data generated from GoldenGate or Infinium assays. Genotypes are called based on the fluorescence intensity of Cy3 or Cy5 labels for each allele using the Gencall program (Illumina), which applies a Bayesian model to specify the most probable allele combinations. The software produces plots with the normalised theta value, (2/π) tan⁻¹(Cy5/Cy3), on the x-axis and the normalised R value (sum of the intensities of Cy3 and Cy5) on the y-axis. Theta values that fall to the left side of the graph (i.e. near 0) are homozygous for allele A, whereas those falling to the right (i.e. near 1) are homozygous for the B allele, and those falling in the middle are heterozygous AB. A quality score is calculated for each SNP that reflects the degree of separation between the homozygote and heterozygote clusters.

3.4. Data Analysis Using the HelixTree Programme

HelixTree is a software programme for the analysis of genotyping data for whole-genome association studies. It supports case/control, quantitative trait loci (QTL), and categorical type analysis. Affymetrix GeneChip data can be directly imported into HelixTree. Genotyping data from Illumina’s BeadStudio Data Analysis software is first converted into standardised data storage format (DSF) using a custom BeadStudio plug-in before importing into HelixTree.
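When inspecting Illumina data before or after import, it can help to reproduce the theta and R coordinates described in Subheading 3.3.3 by hand. The Python sketch below is an illustration only: the function name is hypothetical, the intensities are assumed to be already normalised and non-negative, and it does not reproduce the Bayesian clustering performed by the Illumina calling software.

```python
# Illustrative theta/R transform for one SNP (hypothetical helper, not Illumina code).
import math


def theta_r(cy3, cy5):
    """Return (theta, R) where theta = (2/pi) * arctan(Cy5/Cy3) and R = Cy3 + Cy5."""
    if cy3 < 0 or cy5 < 0:
        raise ValueError("intensities must be non-negative")
    # atan2 equals arctan(cy5/cy3) for non-negative inputs and tolerates cy3 == 0.
    theta = (2.0 / math.pi) * math.atan2(cy5, cy3)
    return theta, cy3 + cy5


if __name__ == "__main__":
    # Made-up intensities: Cy3-only, balanced, and Cy5-only signals.
    for cy3, cy5 in [(1000.0, 0.0), (500.0, 500.0), (0.0, 1000.0)]:
        print(theta_r(cy3, cy5))
```

A Cy3-only signal gives theta near 0 (homozygous A), a Cy5-only signal gives theta near 1 (homozygous B), and balanced signals fall near the middle (heterozygous AB).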

3.4.1. Linkage Disequilibrium

Linkage disequilibrium assumes a common founder of a disease mutation. As with any linkage analysis, a disease-causing mutation must always co-segregate with the disease. Linkage disequilibrium (LD) between a disease marker and a second non-disease-causing marker exists when they are in such close proximity that they are not separated by a recombination event. This results in a difference in the allele frequency of non-disease-causing but associated markers between affected individuals (cases) and normal individuals (controls). Blocks of SNPs often occur in strong LD and thus only a few of these “tagging” SNPs are required to characterise a particular LD SNP block. It has been estimated that 300,000 tagging SNPs will provide 70% LD coverage of the genome in whites and Asians and are equivalent in power to 1,000,000 randomly chosen SNPs. Both Illumina and Affymetrix SNP arrays have been successfully applied to LD studies. Illumina arrays have been used to perform association studies in prostate (39), breast (40, 41), and colon cancer (42, 43). Affymetrix SNP chips have been available for longer and so have been more widely used (44, 45). However, it has only very recently been proven that weakly penetrant susceptibility loci can be identified with this technique (46); this will allow the value of hereditary genetics to be extended beyond the very small number of patients with autosomal dominant syndromes, and so an enormous increase in the number of such studies is predicted. SNP density and cost will ultimately be the deciding factors between the two platforms. Both companies are aware of this and so we can expect an accelerating course of development, which will surely benefit all researchers.
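The strength of LD between two biallelic markers is commonly summarised by the coefficient D and the derived measures D′ and r². These are standard population-genetics quantities rather than anything specific to this chapter or to HelixTree; the Python sketch below, with a hypothetical function name and made-up frequencies, shows how they follow from the haplotype and allele frequencies.

```python
# Illustrative pairwise LD measures from haplotype and allele frequencies.


def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for alleles A and B at two biallelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are the
    frequencies of allele A at locus 1 and allele B at locus 2.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Made-up frequencies: A and B always travel together (complete LD).
    print(ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3))
    # Made-up frequencies: A and B are independent (no LD).
    print(ld_stats(p_ab=0.09, p_a=0.3, p_b=0.3))
```

An r² close to 1 means that one SNP is a good proxy ("tag") for the other, which is the property that tagging-SNP panels rely on.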

4. Notes

1. GC-melt may largely consist of betaine, but the authors have not confirmed this, nor can we supply evidence that betaine is an effective substitute.
2. It is tempting to manufacture 4% gels in-house; our experience is that this is not cost effective.
3. The DNA-free room (for preparation of buffers) and pre-PCR room (for preparation of samples) should ideally be physically separated, but, in practice, this is not always expedient. In this case, a sample preparation area can be designated within the DNA-free room.
4. Ensuring a 5% excess is considerably easier with greater numbers of replicates; the figures given here are for a single sample. Eight samples would be more typical, but this example serves best to highlight the issue of errors. The error in volume given in the example is acceptable with no noticeable reduction in efficiency. More accurate measurements of volume are impractical using Gilson pipettes.
5. We recommend that domed-cap tubes should be used for PCR to ensure that the tube stays in contact with the heated lid so that the temperature of the PCR remains uniform and to avoid loss of sample due to evaporation.
6. Affymetrix recommends that a specialised device (jitterbug) is used for resuspension and Illumina also recommends the use of a microplate shaker to mix reactions in a 96-well plate format. We find that taping the plate to a more commonly available vortex is perfectly adequate.
7. The fragmentation reagent is viscous and no less than 0.5 µl should be measured; this should preferably be done with a positive displacement pipette. Using the 3 U/µl reagent, this will give a minimum final volume of the master mix of 30 µl, allowing for five aliquots to be made with a margin of error. If the 2 U/µl reagent is used, then a minimum volume of 20 µl is possible; this would allow three aliquots to be made with a margin of error.
8. The fragmentation reaction in the Affymetrix protocol gives 55 µl of product; 4 µl is used for immediate QC and 50.5 µl is used in the labelling reaction. This leaves 0.5 µl, which can be used for additional QC if necessary.

References

1. Mateu, E., Sanchez, F., Najera, C., Beneyto, M., Castell, V., Hernandez, M., et al. (1997) Genetics of retinoblastoma: A study. Cancer Genet Cytogenet 95, 40–50. 2. Kartheuser, A., West, S., Walon, C., Curtis, A., Hamzehloei, T., Lannoy, N., et al. (1995) The genetic background of familial adenomatous polyposis. Linkage analysis, the APC gene identification and mutation screening. Acta Gastroenterol Belg 58, 433–51.

3. Kohonencorish, M. R. J., Doe, W. F., Stjohn, D. J. B., and Macrae, F. A. (1995) Chromosome 2p linkage analysis in hereditary nonpolyposis colon-cancer. J Gastroenterol Hepatol 10, 76–80. 4. Froggatt, N. J., Koch, J., Davies, R., Evans, D. G. R., Clamp, A., Quarrell, O. W. J., et al. (1995) Genetic-linkage analysis in hereditary nonpolyposis colon-cancer syndrome. J Med Genet 32, 352–7.

5. Green, R. C., Narod, S. A., Morasse, J., Young, T. L., Cox, J., Fitzgerald, G. W. N., et al. (1994) Hereditary nonpolyposis colon-cancer – analysis of linkage to 2p15–16 places the COCA1 locus telomeric to D2s123 and reveals genetic-heterogeneity in seven Canadian families. Am J Hum Genet 54, 1067–77. 6. Nakagawa, H., Koyama, K., Tanaka, T., Miyoshi, Y., Ando, H., Baba, S., et al. (1998) Localization of the gene responsible for Peutz-Jeghers syndrome within a 6-cM region of chromosome 19p13.3. Hum Genet 102, 203–6. 7. Mehenni, H., Blouin, J. L., Radhakrishna, U., Bhardwaj, S. S., Bhardwaj, K., Dixit, V. B., et al. (1997) Peutz-Jeghers syndrome: Confirmation of linkage to chromosome 19p13.3 and identification of a potential second locus, on 19q13.4. Am J Hum Genet 61, 1327–34. 8. Schaid, D. J., McDonnell, S. K., Blute, M. L., and Thibodeau, S. N. (1998) Evidence for autosomal dominant inheritance of prostate cancer. Am J Hum Genet 62, 1425–38. 9. Bonadona, V., and Lasset, C. (2003) Inherited predisposition to breast cancer: After the BRCA1 and BRCA2 genes, what next? Bull Cancer 90, 587–94. 10. Iobagiu, C., Lambert, C., Normand, M., and Genin, C. (2006) Microsatellite profile in hormonal receptor genes associated with breast cancer. Breast Cancer Res Treat 95, 153–9. 11. Driscoll, C. A., Menotti-Raymond, M., Nelson, G., Goldstein, D., and O’Brien, S. J. (2002) Genomic microsatellites as evolutionary chronometers: A test in wild cats. Genome Res 12, 414–23. 12. Fredriksson, H., Ikonen, T., Autio, V., Matikainen, M. P., Helin, H. J., Tammela, T. L., et al. (2006) Identification of germline MLH1 alterations in familial prostate cancer. Eur J Cancer 42, 2802–6. 13. Gillanders, E. M., Pearson, J. V., Sorant, A. J., Trent, J. M., O’Connell, J. R., and Bailey-Wilson, J. E. (2006) The value of molecular haplotypes in a family-based linkage study. Am J Hum Genet 79, 458–68. 14. Tanaka, Y., Hirata, H., Chen, Z., Kikuno, N., Kawamoto, K., Majid, S., et al. 
(2007) Polymorphisms of catechol-O-methyltransferase in men with renal cell cancer. Cancer Epidemiol Biomarkers Prev 16, 92–7. 15. Kamio, K., Matsushita, I., Tanaka, G., Ohashi, J., Hijikata, M., Nakata, K., et al. (2004) Direct determination of MUC5B promoter haplotypes based on the method of single-strand conformation polymorphism and their statistical estimation. Genomics 84, 613–22.

16. Verma, M., and Kumar, D. (2007) Application of mitochondrial genome information in cancer epidemiology. Clin Chim Acta 383, 41–50. 17. Fan, J. B., Gunderson, K. L., Bibikova, M., Yeakley, J. M., Chen, J., Wickham Garcia, E., et al. (2006) Illumina universal bead arrays. Methods Enzymol 410, 57–73. 18. Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–70. 19. Clark-Langone, K. M., Wu, J. Y., Sangli, C., Chen, A., Snable, J. L., Nguyen, A., et al. (2007) Biomarker discovery for colon cancer using a 761 gene RT-PCR assay. BMC Genomics 8, 279. 20. Lind, G. E., Kleivi, K., Meling, G. I., Teixeira, M. R., Thiis-Evensen, E., Rognum, T. O., et al. (2006) ADAMTS1, CRABP1, and NR3C1 identified as epigenetically deregulated genes in colorectal tumorigenesis. Cell Oncol 28, 259–72. 21. Weber, A., Hengge, U. R., Stricker, I., Tischoff, I., Markwart, A., Anhalt, K., et al. (2007) Protein microarrays for the detection of biomarkers in head and neck squamous cell carcinomas. Hum Pathol 38, 228–38. 22. Zhao, H. J., Ramos, C. F., Brooks, J. D., and Peehl, D. M. (2007) Distinctive gene expression of prostatic stromal cells cultured from diseased versus normal tissues. J Cell Physiol 210, 111–21. 23. Michels, E., De Preter, K., Van Roy, N., and Speleman, F. (2007) Detection of DNA copy number alterations in cancer by array comparative genomic hybridization. Genet Med 9, 574–84. 24. Nowak, N. J., Miecznikowski, J., Moore, S. R., Gaile, D., Bobadilla, D., Smith, D. D., et al. (2007) Challenges in array comparative genomic hybridization for the analysis of cancer samples. Genet Med 9, 585–95. 25. Cao, X., Eu, K. W., Kumarasinghe, M. P., Li, H. H., Loi, C., and Cheah, P. Y. (2006) Mapping of hereditary mixed polyposis syndrome (HMPS) to chromosome 10q23 by genomewide high-density single nucleotide polymorphism (SNP) scan and identification of BMPR1A loss of function. 
J Med Genet 43, e13. 26. Kader, A. K., Shao, L., Dinney, C. P., Schabath, M. B., Wang, Y., Liu, J., et al. (2006) Matrix metalloproteinase polymorphisms and bladder cancer risk. Cancer Res 66, 11644–8. 27. Zheng, S. L., Sun, J., Cheng, Y., Li, G., Hsu, F. C., Zhu, Y., et al. (2007) Association between
two unlinked loci at 8q24 and prostate cancer risk among European Americans. J Natl Cancer Inst 99, 1525–33. 28. Berndt, S. I., Platz, E. A., Fallin, M. D., Thuita, L. W., Hoffman, S. C., and Helzlsouer, K. J. (2007) Mismatch repair polymorphisms and the risk of colorectal cancer. Int J Cancer 120, 1548–54. 29. Peiffer, D. A., Le, J. M., Steemers, F. J., Chang, W., Jenniges, T., Garcia, F., et al. (2006) High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 16, 1136–48. 30. Botstein, D., and Risch, N. (2003) Discovering genotypes underlying human phenotypes: Past successes for mendelian disease, future approaches for complex disease. Nat Genet 33(Suppl), 228–37. 31. Nicolas, P., Sun, F., and Li, L. M. (2006) A model-based approach to selection of tag SNPs. BMC Bioinformatics 7, 303. 32. Pinkel, D., Segraves, R., Sudar, D., Clark, S., Poole, I., Kowbel, D., et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 20, 207–11. 33. Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., et al. (2001) Minimum information about a microarray experiment (MIAME) – toward standards for microarray data. Nat Genet 29, 365–71. 34. Barker, D. L., Hansen, M. S., Faruqi, A. F., Giannola, D., Irsula, O. R., Lasken, R. S., et al. (2004) Two methods of whole-genome amplification enable accurate genotyping across a 2320-SNP linkage panel. Genome Res 14, 901–7. 35. Pask, R., Rance, H. E., Barratt, B. J., Nutland, S., Smyth, D. J., Sebastian, M., et al. (2004) Investigating the utility of combining phi29 whole genome amplification and highly multiplexed single nucleotide polymorphism BeadArray genotyping. BMC Biotechnol 4, 15. 36. Sawcer, S., Ban, M., Maranian, M., Yeo, T. W., Compston, A., Kirby, A., et al. (2005) A high-density screen for linkage in multiple sclerosis. Am J Hum Genet 77, 454–67. 37. 
Affymetrix Genechip® Mapping 500K Assay Manual. http://www.affymetrix.com/support/technical/manuals.affx.

38. Gunderson, K. L., Kruglyak, S., Graige, M. S., Garcia, F., Kermani, B. G., Zhao, C., et al. (2004) Decoding randomly ordered DNA arrays. Genome Res 14, 870–7. 39. Gudmundsson, J., Sulem, P., Steinthorsdottir, V., Bergthorsson, J. T., Thorleifsson, G., Manolescu, A., et al. (2007) Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet 39, 977–83. 40. Hunter, D. J., Kraft, P., Jacobs, K. B., Cox, D. G., Yeager, M., Hankinson, S. E., et al. (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 39, 870–4. 41. Stacey, S. N., Manolescu, A., Sulem, P., Rafnar, T., Gudmundsson, J., Gudjonsson, S. A., et al. (2007) Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 39, 865–9. 42. Broderick, P., Carvajal-Carmona, L., Pittman, A. M., Webb, E., Howarth, K., Rowan, A., et al. (2007) A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet 39, 1315–7. 43. Tomlinson, I., Webb, E., Carvajal-Carmona, L., Broderick, P., Kemp, Z., Spain, S., et al. (2007) A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 39, 984–8. 44. Zanke, B. W., Greenwood, C. M., Rangrej, J., Kustra, R., Tenesa, A., Farrington, S. M., et al. (2007) Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 39, 989–94. 45. Hu, N., Wang, C., Hu, Y., Yang, H. H., Giffen, C., Tang, Z. Z., et al. (2005) Genomewide association study in esophageal cancer using GeneChip mapping 10K array. Cancer Res 65, 2542–6. 46. Easton, D. F., Pooley, K. A., Dunning, A. M., Pharoah, P. D., Thompson, D., Ballinger, D. G., et al. (2007) Genome-wide association study identifies novel breast cancer susceptibility loci. 
Nature 447, 1087–93. 47. Illumina. Illumina product guide 2006–2007. http://illumina.com/pagesnrn.ilmn?ID=70#39.

Chapter 11

Application of Proteomics in Cancer Gene Profiling: Two-Dimensional Difference in Gel Electrophoresis (2D-DIGE)

Deepak Hariharan, Mark E. Weeks, and Tatjana Crnogorac-Jurcevic

Summary

In the post-genomic era, proteomic strategies are at the forefront of cancer research. By studying the complement of all expressed genes, proteomics aims to provide knowledge of biomarkers indicative of the physiological state of cancer cells at a specific time, enabling screening, early diagnosis, monitoring of the course of cancer development/progression, and gauging of the efficacy and safety of novel therapeutic agents. Onco-proteomics thus has the ability to revolutionise oncology practice by delivering highly selective and individualised clinical care. One of the proteomic techniques, two-dimensional (2D) difference in gel electrophoresis (DIGE), enables simultaneous examination and comparison of multiple samples using cyanine dyes to label proteins, which are then separated based on charge and mass. This technique reduces variability, improves reproducibility, and allows easier quantitation when compared with traditional 2D polyacrylamide gel electrophoresis (PAGE). These advantages, combined with universal availability, make 2D-DIGE a first method of choice in cancer proteome analysis of diverse specimens, including tissues, cell lines, blood, and other body fluids.

Key words: Proteomics, two-dimensional difference in gel electrophoresis, 2D-DIGE, Cancer

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_11, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

After completion of the human genome project, large-scale analysis of messenger RNA (mRNA) expression using microarrays, serial analysis of gene expression, and differential display has undoubtedly increased our understanding of the molecular basis of oncogenesis. Unfortunately, RNA expression does not always correlate with changes in protein levels, and, because proteins are the effectors of the majority of biological events occurring in the cell, proteomics has been increasingly used in cancer research (1). Proteomics encompasses the identification of changes in protein expression, post-translational modifications, and subcellular distribution, and the deciphering of protein–protein interactions. Onco-proteomics aims to isolate, identify, and recognise the pattern of expression of diverse proteins in different physiological states (cancer vs. benign; early cancer vs. advanced cancer states, etc.) (1).

Several proteomic techniques may be used to examine protein expression in cancer. Classic proteomic approaches such as Western blot and immunohistochemistry usually enable examination of only one or a few proteins at a time. With the recent rapid evolution of a number of new technologies, a variety of proteomic techniques are available to interrogate the cancer proteome on a large scale, permitting simultaneous study of numerous proteins from multiple biological samples. These are broadly divided into gel-based and non-gel-based techniques, each with its own advantages and disadvantages.

Gel-based proteomics (two-dimensional [2D] polyacrylamide gel electrophoresis [PAGE]) has become extremely popular since it was first described in 1975 by O’Farrell (2). In 2D-PAGE, proteins are first separated by charge, followed by mass separation in the second dimension. This technique can be time consuming and labour intensive, suffers from poor resolution at high pI values, and lacks reproducibility. It is also selective for medium- to high-abundance, easily soluble proteins. While not addressing all the shortfalls of 2D-PAGE, the development of 2D difference in gel electrophoresis (DIGE) by Unlu et al. in 1997 significantly improved accuracy and led to more precise quantitation over a wider dynamic range (3).
This is the result of pre-labelling proteins with positively charged, amine-reactive, molecular weight-matched fluorescent cyanine dyes (Cy2, Cy3, and Cy5), followed by simultaneous electrophoresis on the same 2D gel (4). The output signals of the resolved labelled proteins are detected by multi-wavelength fluorescence imagers. The superior dynamic range of detection of cyanine dyes increases the sensitivity of the technique, with the added advantages of reduced inter-gel variability, fewer gels required, accurate spot matching, and compatibility with the identification of protein spots using mass spectrometry (MS) (4, 5). The statistical analysis performed ensures that the detected difference in intensity is the result of true biological differences rather than experimental variability (5).

Gel-free strategies such as isotope-coded affinity tagging (ICAT), isobaric tags for relative and absolute quantitation (iTRAQ), and electrospray ionisation tandem mass spectrometry (ESI MS/MS) rely on liquid chromatography (LC) for protein separation interfaced with high-end mass spectrometers for protein identification (6). The advantages of such techniques include automation and reduced sample requirement. The main drawbacks seem to be the lack of universal availability and prohibitive costs (1), although these techniques often produce different subsets of regulated proteins with limited overlap, making them complementary to gel-based techniques (6). Surface-enhanced laser desorption/ionisation (SELDI)–time-of-flight (TOF) MS enables high-throughput analysis of individual clinical samples such as serum, urine, and other biofluids using ProteinChips with various surface characteristics, but it usually does not provide the identity of differentially expressed proteins (7).

Here, we describe the materials and instruments required for 2D-DIGE analysis, including labelling of samples, separation of proteins, image capture, analysis, spot picking, and tryptic digestion of protein spots prior to mass spectrometric analysis for the identification of differentially expressed proteins in cancer.

2. Materials

2.1. Reagents

1. CyDye™ DIGE fluors: N-hydroxy-succinimidyl (NHS) esters of Cy2 (Prod. No. RPK0272), Cy3 (Prod. No. RPK0273), and Cy5 (Prod. No. RPK0275) are purchased from Amersham Biosciences, GE Healthcare, NJ, USA.
2. Anhydrous 99.8% N,N-dimethylformamide (DMF, Prod. No. 22,705-6) is purchased from Aldrich, Sigma-Aldrich, UK.
3. Ampholines (pH 3.5–10, Prod. No. 80-1125-87) and Pharmalyte (pH 3–10, Prod. No. 17-0456-01) are from Amersham Biosciences, GE Healthcare.
4. Coomassie Protein Assay Reagent (Prod. No. 1856210) is from Pierce, Perbio Science Ltd, UK.
5. Other common chemicals required: urea, thiourea, dithioerythritol (DTT), L-lysine, Tris base, phosphate-buffered saline (PBS), IGEPAL-630, bromophenol blue, methanol, glacial acetic acid, 3-[(3-cholamidopropyl)-dimethylammonio]-1-propanesulfonate (CHAPS), sodium dodecyl sulphate (SDS), N,N,N′,N′-tetramethylethylenediamine (TEMED), iodoacetamide (IAM), acetonitrile (ACN), ammonium persulphate (APS), agarose, and Colloidal Coomassie Brilliant Blue (CCB) G-250 tablets.

2.2. Equipment

The equipment used in our experiments was purchased from Amersham Biosciences, GE Healthcare (similar equipment may be purchased from other manufacturers): Immobiline Dry Strip Reswelling Tray, Immobiline Dry Strip IEF gels, Immobiline Dry Strip Cover Fluid, Multiphor II Electrophoresis Unit, Ettan DALT low-fluorescence glass plates, Plus One Repel Silane, Plus One Bind Silane, reference markers, Ettan DALT 12-gel caster, separation unit, and power supply, Typhoon 9400 Imager, Ettan Spot Picker, and DeCyder differential analysis software. The SpeedVac (SPD 1010) was purchased from Thermo Savant.

2.3. Solutions

1. CyDyes (NHS-Cy2, -Cy3, -Cy5) are purchased as lyophilised powder and kept at −20°C. The dyes are reconstituted by adding 10 µl of DMF and centrifuging at 12,000 × g for 30 s to make a stock solution of 1,000 pmol/µl. Solutions are wrapped in aluminium foil and stored in the dark at −20°C. Before opening the tubes, stock solutions are equilibrated on ice.
2. Lysis buffer: 8 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.5% (v/v) IGEPAL-630, 10 mM Tris–HCl, pH 8.3. To make 100 ml, dissolve 48 g of urea and 15.2 g of thiourea in 50 ml of distilled water. Add 4 g CHAPS, 500 µl IGEPAL-630, and 0.67 ml of 1.5 M Tris base, pH 8.8 solution, and adjust the volume to 100 ml. This should give a final pH of 8.3, which is critical for labelling. All solutions are aliquoted and stored at −20°C.
3. 40% (w/v) CHAPS: A 100 ml solution is prepared by dissolving 40 g CHAPS in water and adjusting the volume to 100 ml. This solution is stored at room temperature.
4. 10% (v/v) IGEPAL-630 in milliQ 18 MΩ water. This solution can be stored at room temperature.
5. L-lysine solution: To make 10 mM L-lysine in water, 9.1 mg of L-lysine is dissolved in 5 ml distilled water. The solution is aliquoted and stored at −20°C.
6. 1.3 M dithioerythritol (DTT) solution: To make 10 ml, 2 g of DTT is dissolved in distilled water and the volume adjusted to 10 ml. This solution is aliquoted and stored at −20°C. As a precaution, this solution should not be heated.
7. Ampholines/pharmalyte solution: Equal volumes of ampholines (pH 3.5–10) and pharmalyte (pH 3–10) are mixed and stored at 4°C. These broad pH range IPG buffers can be replaced with narrow-range buffers depending on the first-dimension pH range.
8. 0.2% (w/v) bromophenol blue solution: To make 10 ml, 20 mg of bromophenol blue is dissolved in distilled water to a final volume of 10 ml. The solution is then filtered and stored at room temperature.
9. Polyacrylamide gel electrophoresis (PAGE) solution: 10–15% solutions may be used for second-dimension separation depending on the type of sample used. For one gel, 14% PAGE solution is made in a conical flask by mixing 46.5 ml of 30% acrylamide-bis, 26.9 ml of milliQ 18 MΩ water (or equivalent), 25 ml of 1.5 M Tris–HCl, pH 8.8, 1 ml of 10% SDS, 50 µl of N,N,N′,N′-tetramethylethylenediamine (TEMED), and 0.5 ml of APS and stirring for 1 h. The TEMED and APS are added immediately prior to pouring the gel solution into the casting unit.
10. Bind Silane solution: For twelve 24-cm × 20-cm plates, mix 16 µl of Plus One Bind Silane, 400 µl glacial acetic acid, 16 ml ethanol, and 3.6 ml distilled water.
11. Equilibration buffer (6 M urea, 30% [v/v] glycerol, 50 mM Tris–HCl, pH 6.8, 2% [w/v] SDS): To make 200 ml, dissolve 72 g urea in 100 ml distilled water. Add 60 ml of 100% glycerol, 10 ml of 1 M Tris base, pH 6.8 solution, and 4 g SDS. Adjust the volume to 200 ml. The solution is aliquoted and stored at −20°C for 4–6 weeks.
12. 0.5% (w/v) agarose overlay: To make 200 ml, melt 1 g of agarose in 200 ml of 1× SDS electrophoresis buffer in a microwave on low heat. Add bromophenol blue solution for a pale blue colour.
13. 1× Tris glycine–SDS running buffer (single strength [1×] solution contains 0.025 M Tris base, 0.192 M glycine, and 0.1% sodium dodecyl sulphate): 200 ml of 10× Tris glycine–SDS stock solution is diluted in milliQ 18 MΩ water to make a final volume of 2 l.
14. Gel fixing solution (35% [v/v] methanol, 7.5% [v/v] acetic acid): 750 ml of 100% methanol is mixed with 150 ml of 100% acetic acid and this volume is further diluted in distilled water to make a final volume of 2 l.
15. Colloidal Coomassie Brilliant Blue fixing solution (50% [v/v] ethanol, 2% [v/v] phosphoric acid): To make 2 l of solution, 1 l of 100% ethanol is mixed with 40 ml of 100% phosphoric acid and further diluted with milliQ 18 MΩ water.
16. Colloidal Coomassie Brilliant Blue staining solution (34% [v/v] methanol, 17% [w/v] ammonium sulphate, 3% [v/v] phosphoric acid): 340 g of ammonium sulphate is dissolved in 680 ml of 100% methanol and 60 ml of 100% phosphoric acid, and further diluted with milliQ 18 MΩ water to make a final volume of 2 l. Using a magnetic stirrer and stir bar facilitates the dissolution of ammonium sulphate.
17. Acetonitrile solution (50% [v/v]): To make 50 ml, 25 ml of 100% acetonitrile is diluted to 50 ml with milliQ 18 MΩ water.
18. 10 mM solution of ammonium bicarbonate: 0.079 g of ammonium bicarbonate is dissolved in 100 ml of milliQ 18 MΩ water.
19. 10 mM DTT solution in 10 mM ammonium bicarbonate: 0.0154 g of DTT is dissolved in 10 ml of 10 mM ammonium bicarbonate solution.
20. 10 mM IAM solution in 10 mM ammonium bicarbonate: 0.0184 g of IAM is dissolved in 10 ml of 10 mM ammonium bicarbonate solution.
21. Trypsin solution: A stock solution of modified porcine trypsin (Promega, Southampton, UK) is prepared by adding 40 µl of trypsin resuspension buffer to the 20 µg of trypsin provided in the vial and further diluting it (1:100) with 5 mM ammonium bicarbonate (5 ng/µl final concentration). The diluted trypsin is aliquoted in 1.5-ml Eppendorf tubes and stored at −20°C.
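The weights and volumes in the recipes above follow from two relations: mass = molecular weight × molarity × volume, and C1 × V1 = C2 × V2 for dilutions. The short Python sketch below is offered only as a way of re-checking or rescaling a recipe; the molecular weights are standard values supplied here as assumptions (the protocol does not list them), and the function names are purely illustrative.

```python
# Molecular weights in g/mol. Standard values supplied for this sketch;
# they are assumptions in the sense that the protocol does not list them.
MOLECULAR_WEIGHT = {
    "urea": 60.06,
    "thiourea": 76.12,
    "DTT": 154.25,
    "ammonium bicarbonate": 79.06,
}


def grams_needed(compound, molarity, volume_ml):
    """Mass (g) of compound for volume_ml of solution at the given molarity (mol/l)."""
    return MOLECULAR_WEIGHT[compound] * molarity * volume_ml / 1000.0


def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock to dilute, from C1 * V1 = C2 * V2.

    Both concentrations must share one unit; the result is in the unit
    used for final_volume.
    """
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    # Lysis buffer, 100 ml: 8 M urea, 2 M thiourea, 10 mM Tris from a 1.5 M stock.
    print("urea (g):", round(grams_needed("urea", 8.0, 100), 1))
    print("thiourea (g):", round(grams_needed("thiourea", 2.0, 100), 1))
    print("1.5 M Tris (ml):", round(stock_volume(1.5, 0.010, 100), 2))
    # 10 mM DTT in 10 ml of ammonium bicarbonate.
    print("DTT (g):", round(grams_needed("DTT", 0.010, 10), 4))
    # Trypsin: 20 ug in 40 ul gives 500 ng/ul; a 1:100 dilution gives 5 ng/ul.
    print("trypsin stock per 100 ul (ul):", stock_volume(500.0, 5.0, 100.0))
```

The same two functions can be used to scale any of the recipes to a different final volume before weighing out reagents.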

3. Methods

3.1. Sample Collection and Preparation

Collection and processing of samples are the first critical steps in 2D-DIGE expression profiling (8). A variety of cell types (whole tissue, cell lines) and body fluids (saliva, cerebrospinal fluid, plasma and urine) have already been analysed by 2D-DIGE (7, 9–13). Readers can refer to these references for obtaining information on optimal processing of their particular sample(s) of interest. Sample collection and processing protocols should be standardised to reduce inter-sample variability and ensure sufficient protein quantity (usually 2–3 mg/ml of protein per sample) prior to 2D-DIGE experiments. During collection, EDTA-free protease inhibitors should be used to prevent sample degradation, because they do not interfere with Cy dye labelling.

3.2. Experimental Design

2D-DIGE experiments require careful design so that statistically meaningful data can be obtained. The number of samples that can be compared is largely limited by the number of gels and the labelling design chosen. By far the most popular experimental design involves labelling a pooled internal standard (a sample composed of equal aliquots of each sample used in the experiment) with Cy2 dye, while Cy3 and Cy5 are used to label test and reference samples. The internal standard permits robust analysis and more precise quantitation. Statistical analysis is further improved by increasing the number of gels run and by performing both technical and biological replicates (minimum of three). Differential expression is calculated as the average fold-change (the average spot intensity ratio between differentially labelled spots matching across all three gels), with statistical confidence provided by a t test. Table 1 gives an example of an experimental design in which 100 µg of protein from a disease sample (cancer) and a control sample (healthy/benign disease) are compared in triplicate (50 µg of each sample is pooled for the internal standard). Dye bias is reduced by interchanging the dyes used to label the samples, as shown in Table 1. Because biological variability is typically high, and in cases where only limited quantities of sample are available, pooling can be considered to obtain statistically valid data. More complex comparisons can be performed by running up to 12 gels simultaneously and comparing 24 samples, as shown by Gharbi et al. (14). We recommend that protein expression profiling experiments using 2D-DIGE be performed in dedicated laboratory space (see Note 1).

Table 1
Example of the experimental design of a simple 2D-DIGE experiment

        Cy2            Cy3                                   Cy5
Gel 1   100 µg pool    100 µg disease sample, replicate 1    100 µg control sample, replicate 1
Gel 2   100 µg pool    100 µg control sample, replicate 2    100 µg disease sample, replicate 2
Gel 3   100 µg pool    100 µg disease sample, replicate 3    100 µg control sample, replicate 3

3.3. Cy Dye Labelling

1. Protein concentrations of the samples to be analysed are determined using a protein quantitation method, such as the Pierce Coomassie protein assay reagent, according to the manufacturer’s instructions, using bovine serum albumin in lysis buffer to generate a standard curve (see Notes 2 and 3). At least three to four replicate assays should be performed for each sample for accurate protein determination. For ease, concentrated samples are diluted with lysis buffer and adjusted to the same protein concentration.
2. From the stock solution of Cy dyes, a working solution of 200 pmol/µl of Cy dye is made by adding 4 µl of DMF to 1 µl of stock solution (see Note 4).
3. The desired amount of protein is aliquoted into low protein-binding Eppendorf tubes (100 µg of protein from disease and control samples, as shown in the experiment in Table 1), and a pool of all samples (internal standard) is made from a mixture of 150 µg of each of the disease and control samples, as the experiment is performed in triplicate (see Note 5).
4. Disease and control samples are labelled by the addition of 400 pmol of the appropriate CyDye (Cy3/Cy5) per 100 µg of protein, used interchangeably as shown in Table 1, and the internal standard is labelled with Cy2 (1,200 pmol/300 µg of protein). Samples are then mixed, vortexed, and incubated on ice in the dark for 30 min (see Note 6).
5. The labelling reaction is quenched by adding a 20-fold molar excess of L-lysine (for 400 pmol CyDye, add 0.8 µl of 10 mM L-lysine solution) with incubation on ice in the dark for 10 min.
6. The Cy3- and Cy5-labelled samples are mixed appropriately and a 100-µg aliquot of the Cy2-labelled pool is added (to give 300 µg total protein).
7. Samples are then reduced by the addition of 1.3 M DTT to a final concentration of 65 mM.
8. Add 9 µl of carrier ampholines/pharmalyte mix to a final concentration of 2% (v/v) and add 1 µl of 0.2% bromophenol blue. The total volume is then adjusted to 450 µl with lysis buffer. Samples are spun in a centrifuge at 13,200 × g for 5 min.

3.4. Sample Loading to Immobiline™ Drystrips

1. 450 µl of Cy dye-labelled sample is pipetted into each well of an Immobiline re-swelling tray.
2. The plastic protective cover is removed from each Immobiline Drystrip (24 cm, pH 3–11) and the strips are placed gel-side down into the sample solution (see Note 7).
3. The gel strips and samples are incubated at room temperature for 10 min prior to covering each strip in Immobiline Drystrip cover fluid.
4. The strips are left to rehydrate in the re-swelling tray at room temperature for up to 12 h, with the tray covered in aluminium foil to exclude light.
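The dye, quench, and additive volumes quoted in Subheading 3.3 for the 450 µl sample loaded here can be re-derived by simple proportion, which is useful when the protein load or final volume is changed. The following Python sketch is a hedged illustration under the concentrations stated in the protocol (200 pmol/µl working dye, 10 mM L-lysine, 1.3 M DTT); the function names and default arguments are hypothetical and chosen only for this example.

```python
def dye_volume_ul(dye_pmol, working_pmol_per_ul=200.0):
    """Volume (ul) of working dye solution that delivers dye_pmol of CyDye."""
    return dye_pmol / working_pmol_per_ul


def lysine_quench_ul(dye_pmol, fold_excess=20.0, lysine_mM=10.0):
    """Volume (ul) of L-lysine solution giving the stated molar excess over dye."""
    lysine_pmol_per_ul = lysine_mM * 1000.0  # 1 mM = 1 nmol/ul = 1,000 pmol/ul
    return dye_pmol * fold_excess / lysine_pmol_per_ul


def additive_volume_ul(stock_conc, final_conc, total_volume_ul=450.0):
    """Volume (ul) of stock needed for final_conc in the total sample volume.

    stock_conc and final_conc must share one unit (mM, % v/v, ...).
    """
    return final_conc * total_volume_ul / stock_conc


if __name__ == "__main__":
    print("dye for 100 ug sample (ul):", dye_volume_ul(400))
    print("dye for 300 ug pool (ul):", dye_volume_ul(1200))
    print("lysine quench for 400 pmol dye (ul):", lysine_quench_ul(400))
    print("1.3 M DTT for 65 mM (ul):", additive_volume_ul(1300.0, 65.0))
    print("ampholines/pharmalyte for 2% v/v (ul):", additive_volume_ul(100.0, 2.0))
```

Whatever volume remains after the labelled sample, lysine, DTT, carrier mix, and bromophenol blue have been added is made up to 450 µl with lysis buffer.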

3.5. First-Dimension Separation Using Iso-electric Focussing (IEF)

1. Once rehydration is completed, a pair of forceps is used to remove the strips and drain excess cover fluid, taking extreme care not to touch or damage the gel.
2. Wicks soaked in a 65 mM solution of DTT are used to improve contact between the gel surface and electrodes, leading to better separation.
3. Samples are separated by isoelectric point on a Multiphor II flatbed system (Amersham, UK) for a total of 95 kVh, in accordance with the manufacturer’s instructions.
4. The temperature of the Multiphor II flatbed system is maintained at 17°C throughout, preventing the development of hot spots.
5. A sloped gradient of 0–500 V for 1 h, 500 V for 2 h, 500–3,500 V for 2 h, 3,500 V for 24 h, and finally 500 V for 2 h is used on a Multiphor II power supply (this can be protein load and sample specific and should be increased for higher protein loads) and the separation is carried out (see Note 8).
6. The apparatus is covered to exclude light.
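When the IEF programme is adapted to a different protein load, it can help to estimate the accumulated volt-hours of a candidate programme before running it. The Python sketch below integrates the steps listed above under the assumption that each ramp is linear; this is an assumption of the sketch, not a statement about how the power supply executes the gradient, and the value logged by the instrument may therefore differ from the estimate.

```python
def volt_hours(steps):
    """Accumulated volt-hours for a list of (start_V, end_V, hours) steps.

    Each step is treated as a linear ramp, so its contribution is the
    mean of the start and end voltages multiplied by its duration.
    """
    return sum((start_v + end_v) / 2.0 * hours for start_v, end_v, hours in steps)


# The programme from step 5, written as (start V, end V, duration in h).
PROGRAMME = [
    (0, 500, 1),
    (500, 500, 2),
    (500, 3500, 2),
    (3500, 3500, 24),
    (500, 500, 2),
]

if __name__ == "__main__":
    print("estimated kVh:", volt_hours(PROGRAMME) / 1000.0)
```

Under the linear-ramp assumption the 24 h hold at 3,500 V dominates the total, so lengthening or shortening that step is the most direct way of changing the accumulated volt-hours.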

3.6. Second-Dimension Gel Preparation and Separation

1. Sets of low-fluorescence glass plates containing a large back plate (with spacer attached) and smaller front plates (Ettan DALT 24-cm gel plates) are selected depending on the number of gels that need to be run. The spacer size can be varied for 0.5-, 1.0-, and 1.5-mm gel thickness.
2. The plates are washed thoroughly with detergent and air-dried, then further washed with ethanol, then milliQ 18 MΩ water, wiping the plates dry with a lint-free cloth between washes.
3. Reference markers are applied to the surface of the smaller plates. These are placed halfway down the plates and 15–20 mm in from each edge and are critical for determining coordinates for automated spot picking. Fresh Bind Silane solution (1.5 ml) is applied per small plate and the surface is wiped with a lint-free tissue. The plates are left to dry for a minimum of 1 h.
4. The inner surface of the larger spacer plate is treated with Repel Silane, wiped using lint-free tissues, and left to dry for 10 min.
5. Glass plates with the repel and bind surfaces facing each other are assembled in an Ettan gel casting unit, ensuring a tight seal and no leak.
6. The feeding tube and funnel are attached to the caster and 10–15% polyacrylamide gel solution is poured through the funnel, taking care not to introduce air bubbles.
7. Water-saturated butanol (2 ml) is overlaid on each gel and the gels are allowed to polymerise for at least an hour (see Note 9).
8. After the completion of first-dimension separation, the gel strips are equilibrated for 15 min in equilibration buffer containing 65 mM DTT and then for 15 min in the same buffer containing 240 mM IAM (see Note 10).
9. The equilibrated Immobiline gel strips are then rinsed with 1× SDS electrophoresis buffer prior to being placed onto the top of the handmade PAGE gels prepared in steps 1–7.
10. The strips are placed in a molten 0.5% agarose overlay, with the basic end of the strip towards the left-hand side, and the bonded plate facing forward (see Note 11).
11. The agarose is allowed to cool and set at room temperature.
12. The electrophoresis tank is filled with 1× SDS running buffer and gels complete with Immobiline Drystrips are inserted into the designated slots in the gel tank.
13. The unoccupied slots are filled with blank plastic plates and the top tank is filled with running buffer.
14. Separation is performed according to the manufacturer’s instructions. For the Ettan DALT 12 system, this is achieved by running gels for 16 h at 2.2 W per gel or until the dye front has reached the bottom of the gel (see Note 12).

3.7. Image Capture

1. Gels are removed from the Ettan gel tank and rinsed in milliQ 18 MΩ water prior to image capture (see Note 13).
2. Gel plates are aligned according to the manufacturer’s instructions and the photomultiplier tube (PMT) voltage is set to low (500 V) on each channel (Cy2, Cy3, and Cy5).
3. Optimal excitation/emission wavelengths for fluorescence detection are 488/520 nm for Cy2, 532/580 nm for Cy3, and 633/680 nm for Cy5.
4. Preliminary low-resolution scan (1,000 µm) images are built up by the scanner (we use a Typhoon 9400 multi-wavelength laser scanner) for each channel and grey scale pixel values are generated.
5. Using ImageQuant software (GE Healthcare) for Typhoon 9400, maximum pixel values in various user-defined, spot-rich regions of each image are obtained by adjusting PMT voltages.
6. Repeated scans may be required because maximum pixel values are required to be within 10% for each of the three channels.
7. Once PMT voltages are set, high-resolution scans (100 µm) are performed across all gels. Two gels can be scanned simultaneously (~10 min per channel), and images obtained are exported as .tiff files.
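The acceptance criterion in step 6 can be written as a small check that is applied after each preliminary scan. The Python sketch below assumes that "within 10%" means the lowest channel maximum lies within 10% of the highest; that reading, the function name, and the pixel values in the example are all assumptions made for illustration rather than part of the scanner software.

```python
def channels_balanced(max_pixels, tolerance=0.10):
    """Return True when all channel maxima lie within `tolerance` of the highest.

    max_pixels maps a channel name (e.g. "Cy2") to the maximum pixel value
    measured in a spot-rich region of the preliminary scan.
    """
    highest = max(max_pixels.values())
    lowest = min(max_pixels.values())
    return (highest - lowest) / highest <= tolerance


if __name__ == "__main__":
    # Hypothetical maximum pixel values read from three preliminary scans.
    first_scan = {"Cy2": 60000, "Cy3": 48000, "Cy5": 57000}
    second_scan = {"Cy2": 60000, "Cy3": 58000, "Cy5": 57000}
    print("first scan balanced:", channels_balanced(first_scan))
    print("second scan balanced:", channels_balanced(second_scan))
```

If the check fails, the PMT voltage of the outlying channel is adjusted and the low-resolution scan repeated before committing to the high-resolution scan.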

3.8. Image Analysis

1. Scanned images are analysed using differential analysis software (we used DeCyder version 5.1).
2. Spot boundaries are defined using the DeCyder differential in gel analysis (DIA) module, allowing for intra-gel analysis.
3. Standardised abundance for each spot is obtained by comparing Cy3- and Cy5-labelled spots to the internal standard (ratios Cy3:Cy2 and Cy5:Cy2).
4. The DeCyder biological variation analysis (BVA) module is used to match test spot volumes across all gels (inter-gel analysis).
5. A list of statistically significant deregulated protein spots between disease and control samples is generated using Student’s t test or one- or two-way analysis of variance (ANOVA).
6. After the position of the reference markers is defined in the BVA software, the x–y co-ordinates of each spot of interest can be exported as a “coordinate pick list.”
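The quantities described in steps 3 and 5 can be illustrated for a single matched spot. The Python sketch below is not a reproduction of DeCyder's algorithms; it simply divides each sample channel by the Cy2 internal standard, averages the ratios across the three gels of Table 1, and computes a two-sample Student's t statistic on log10 standardised abundance. The spot volumes are invented for the example, and the choice of an equal-variance t statistic on log values is an assumption of the sketch.

```python
import math
from statistics import mean, variance

# Hypothetical spot volumes for ONE matched spot on the three gels of Table 1.
# "disease" and "control" name the dye channel used for that sample on each gel,
# following the dye swap shown in Table 1.
GELS = [
    {"Cy2": 1000.0, "Cy3": 2100.0, "Cy5": 1000.0, "disease": "Cy3", "control": "Cy5"},
    {"Cy2": 1100.0, "Cy3": 1050.0, "Cy5": 2300.0, "disease": "Cy5", "control": "Cy3"},
    {"Cy2": 900.0, "Cy3": 1900.0, "Cy5": 950.0, "disease": "Cy3", "control": "Cy5"},
]


def standardised_abundance(gel, group):
    """Spot volume of the sample channel divided by the Cy2 internal standard."""
    return gel[gel[group]] / gel["Cy2"]


def fold_change(gels):
    """Mean standardised abundance of disease divided by that of control."""
    disease = [standardised_abundance(g, "disease") for g in gels]
    control = [standardised_abundance(g, "control") for g in gels]
    return mean(disease) / mean(control)


def student_t(gels):
    """Two-sample Student's t (equal variances) on log10 standardised abundance.

    Returns the t statistic and the degrees of freedom; the p-value lookup
    is left to a statistics package.
    """
    d = [math.log10(standardised_abundance(g, "disease")) for g in gels]
    c = [math.log10(standardised_abundance(g, "control")) for g in gels]
    n_d, n_c = len(d), len(c)
    pooled = ((n_d - 1) * variance(d) + (n_c - 1) * variance(c)) / (n_d + n_c - 2)
    t = (mean(d) - mean(c)) / math.sqrt(pooled * (1.0 / n_d + 1.0 / n_c))
    return t, n_d + n_c - 2


if __name__ == "__main__":
    print("average fold-change:", round(fold_change(GELS), 2))
    t_stat, dof = student_t(GELS)
    print("t statistic:", round(t_stat, 1), "degrees of freedom:", dof)
```

Because every sample is expressed relative to the same Cy2 pool, gel-to-gel differences in loading or scanning largely cancel before the groups are compared, which is the purpose of the internal standard.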

3.9. Post-staining and Spot Picking

Post-staining of 2D gels using CCB is compatible with Cy dye labelling and mass spectrometry and allows for improved spot picking (14).

1. On completion of image capture and analysis, gel plates are separated, and individual gels bonded to low-fluorescence glass back plates are placed in separate lidded trays containing 440 ml of fixing solution per gel.
2. Trays with bonded gels immersed in fixing solution are placed on an orbital shaker and gently shaken overnight at room temperature.
3. After fixation, gels are washed three times for 30 min each with milliQ 18 MΩ water.
4. Freshly made staining solution (400 ml/gel) is added to the washed gels in a new plastic tray and gels are left shaking for several hours at room temperature.
5. 0.5 g/l of CCB G-250 stain is added to each gel, the lids of the trays are replaced, and the trays are shaken gently at room temperature for 48 h (see Note 14).
6. After completion of CCB staining, the gels are washed briefly in milliQ 18 MΩ water and scanned using a Typhoon 9400 scanner using the red laser and no filters.
7. Scanned CCB-stained gel images are saved as .tiff files and imported into DeCyder software.
8. Spots are matched accurately between CCB- and Cy dye-stained images, leading to the generation of a pick list containing the same master spot numbers as provided by BVA quantitative analysis.
9. The relative positions of differentially expressed protein spots that require identification are defined according to the reference markers initially applied on the smaller gel plate. The pick list coordinate file is exported to the Ettan automated spot picker.
10. A CCB gel of interest is clamped into position in the picker and submerged in a 1- to 2-mm-deep layer of milliQ 18 MΩ water.
11. The imported pick list is opened and the spot picking head of the instrument is aligned to the reference markers previously defined by DeCyder according to the manufacturer’s instructions.
12. Spots of interest are excised using a 2-mm picking head and are placed in a designated 96-well plate in 200 µl of milliQ 18 MΩ water.
13. The water is removed from each well prior to storage at −20°C or subsequent digestion and mass spectrometric analysis.

3.10. Spot Digestion

1. The gel spots within the 96-well plates are dehydrated and destained by washing three times in 30 µl of 50% acetonitrile (ACN) and then dried in a SpeedVac for 10 min.
2. Destained spots are removed from the 96-well plate and transferred into individual siliconised Eppendorf tubes, because 96-well plates can bind protein and peptides and are not suitable for carrying out tryptic digests.
3. Disulphide bonds in proteins are reduced by the addition of 10 mM DTT in 10 mM ammonium bicarbonate (pH 8) followed by incubation for 45 min at 50°C.
4. The DTT is removed and a 50 mM solution of IAM in 10 mM ammonium bicarbonate (pH 8) is added, followed by incubation for an hour in the dark to alkylate cysteine moieties in proteins.
5. The IAM solution is removed and the gel pieces are washed twice in 50% ACN and dried in a SpeedVac for 10 min.
6. 10 µl of diluted trypsin solution (5 ng/µl; refer to Subheading 2.3, item 21) is added to each Eppendorf tube containing a single gel spot.
7. Tubes are incubated at room temperature for 10 min prior to overlaying each spot with 10–20 µl of 10 mM ammonium bicarbonate.
8. The samples are incubated at 37°C overnight (not exceeding 16 h).
9. The samples are centrifuged at 10,000 × g for 2 min prior to the addition of 5 µl of 50% ACN/5% trifluoroacetic acid. The samples are gently agitated for 5 min.
10. The supernatant is transferred to a fresh tube and sufficient 50% ACN/5% trifluoroacetic acid to cover the gel piece is added again. This process is repeated twice.
11. The pools of extracted peptides from each gel piece (steps 9 and 10) are dried in a SpeedVac.
12. The samples are stored at −20°C until ready for mass spectrometry.

3.11. Protein Identification

Protein identification by mass spectrometry is performed using either (or a combination of) peptide mass fingerprinting (PMF) or sequence-specific peptide fragmentation (15). Matrix-assisted laser desorption ionisation mass spectrometry (MALDI MS) is fast, accurate, sensitive, and easy to perform and is the most commonly used technique for peptide mass fingerprinting. If PMF fails to unambiguously identify proteins, peptide sequencing needs to be performed (16). The choice of mass spectrometer largely depends on the local availability of hardware, resources, and expertise.
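The principle of peptide mass fingerprinting can be illustrated with a short calculation. The following Python sketch is a simplified illustration and not the search software used for this chapter: it digests a protein sequence in silico with trypsin-like rules (cleavage after K or R unless followed by P, no missed cleavages), computes monoisotopic [M+H]+ values with carbamidomethylated cysteines (as produced by the IAM step), and matches them to observed peaks within a ppm tolerance. The function names, the default tolerance, and the minimum peptide length are assumptions chosen for the example.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # one water per peptide (free N- and C-terminus)
PROTON = 1.00728            # singly protonated ion, as typical for MALDI
CARBAMIDOMETHYL = 57.02146  # mass added to each Cys by iodoacetamide


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide, alkylated=True):
    """Monoisotopic [M+H]+ of a peptide; Cys optionally carbamidomethylated."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    if alkylated:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass


def match_peaks(observed_mz, sequence, tol_ppm=50.0, min_len=6):
    """Return (peptide, theoretical m/z, observed m/z) for peaks within tolerance."""
    hits = []
    for peptide in tryptic_peptides(sequence):
        if len(peptide) < min_len:  # very short peptides are rarely informative
            continue
        theoretical = mh_plus(peptide)
        for mz in observed_mz:
            if abs(mz - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((peptide, theoretical, mz))
    return hits
```

In practice, a candidate protein is ranked by how many observed peaks it explains; dedicated search engines additionally handle missed cleavages, variable modifications, and statistical scoring, which this sketch deliberately omits.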

Application of Proteomics in Cancer Gene Profiling


Fig. 1. Post-scanned 2D-DIGE 24-cm × 20-cm × 1-mm 14% bis/acrylamide gel image marked with differentially expressed protein spots identified using MALDI MS when cancer and normal human urine specimens were compared.

Figure 1 shows a representative 2D-DIGE gel image depicting differentially expressed proteins that were identified by MALDI MS.

4. Notes

1. Use of a dedicated clean room is recommended in all steps involving 2D-DIGE expression profiling experiments. Gloves, aprons, facemasks, and hair caps need to be worn while in the clean room to prevent contamination by skin keratins.
2. During sample preparation, salts, lipids, detergents, and nucleic acids need to be removed because they can affect isoelectric focusing.
3. The buffers used during sample preparation need to be compatible with the protein quantitation technique used.
4. The DMF used to make the stock solution of cyanine dyes should be 99.5% pure and anhydrous. The addition of DMF for resolubilisation of stock dyes should be conducted under a nitrogen atmosphere. Cyanine dyes are water sensitive; hence water-free storage needs to be ensured (silica gel may be used).
5. Adding 10% to these figures allows for variations in pipetting and ensures sufficient sample for all replicates.
6. The method outlined here is for minimal dye labelling using cyanine dyes.
7. The length of IPG gel strip, pH range, and IEF gradient setup should be optimised to achieve the best possible resolution for the samples of interest.
8. After IEF, IPG gel strips may be stored at −80°C for several weeks in a rigid container (because the strips are brittle when frozen) prior to separation in the second dimension.
9. The PAGE gels need to polymerise completely and should preferably be left overnight.
10. The reduction and alkylation of proteins on IPG gel strips using DTT and IAM should not exceed the stipulated times because overexposure may contribute to protein loss.
11. While inserting the IPG gel strip between the glass plates over the PAGE gel, one must ensure that there are no air bubbles at the interface. The overlaying agarose has to solidify prior to commencing second-dimension separation.
12. Use IPG gel strips from the same batch and second-dimension gels prepared at the same time, because similar conditions need to be maintained throughout the experiment to enable easier spot matching across different gels.
13. Before scanning, both outer gel plate surfaces are wiped clean and dried to ensure optimum scanned image quality.
14. After applying the CCB stain, the gels do not need to be destained to visualise proteins.

References

1. Pastwa, E., Somiari, S. B., Czyz, M. and Somiari, R. I. (2007) Proteomics in human cancer research. Proteomics Clin Appl. 1(1), 4–17.
2. O’Farrell, P. H. (1975) High resolution two-dimensional electrophoresis of proteins. J Biol Chem. 250, 4007–4021.
3. Unlu, M., Morgan, M. E. and Minden, J. S. (1997) Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis. 18, 2071–2077.
4. Marouga, R., David, S. and Hawkins, E. (2005) The development of the DIGE system: 2D fluorescence difference gel analysis technology. Anal Bioanal Chem. 382, 669–678.
5. Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Rayner, S., Young, J., et al. (2001) Validation and development of fluorescence two-dimensional differential gel electrophoresis proteomics technology. Proteomics. 1, 377–396.

6. DeSouza, L., Diehl, G., Rodrigues, M. J., Guo, J., Romaschin, A. D., Colgan, T. J., et al. (2005) Search for cancer markers from endometrial tissues using differentially labeled tags iTRAQ and cICAT with multidimensional liquid chromatography and tandem mass spectrometry. J Proteome Res. 4, 377–386.
7. Ryu, O. H., Atkinson, J. C., Hoehn, G. T., Illei, G. G. and Hart, T. C. (2006) Identification of parotid salivary biomarkers in Sjogren’s syndrome by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry and two-dimensional difference gel electrophoresis. Rheumatology (Oxford). 45, 1077–1086.
8. Shaw, M. M. and Riederer, B. M. (2003) Sample preparation for two-dimensional gel electrophoresis. Proteomics. 3, 1408–1417.
9. Orenes-Pinero, E., Corton, M., Gonzalez-Peramato, P., Algaba, F., Casal, I., Serrano, A., et al. (2007) Searching urinary tumor markers for bladder cancer using a two-dimensional differential gel electrophoresis (2D-DIGE) approach. J Proteome Res. 6, 4440–4448.
10. Lee, I. N., Chen, C. H., Sheu, J. C., Lee, H. S., Huang, G. T., Yu, C. Y., et al. (2005) Identification of human hepatocellular carcinoma-related biomarkers by two-dimensional difference gel electrophoresis and mass spectrometry. J Proteome Res. 4, 2062–2069.
11. Katayama, M., Nakano, H., Ishiuchi, A., Wu, W., Oshima, R., Sakurai, J., et al. (2006) Protein pattern difference in the colon cancer cell lines examined by two-dimensional differential in-gel electrophoresis and mass spectrometry. Surg Today. 36, 1085–1093.
12. Jin, T., Hu, L. S., Chang, M., Wu, J., Winblad, B. and Zhu, J. (2007) Proteomic identification of potential protein markers in cerebrospinal fluid of GBS patients. Eur J Neurol. 14, 563–568.
13. Kakisaka, T., Kondo, T., Okano, T., Fujii, K., Honda, K., Endo, M., et al. (2007) Plasma proteomics of pancreatic cancer patients by multi-dimensional liquid chromatography and two-dimensional difference gel electrophoresis (2D-DIGE): up-regulation of leucine-rich alpha-2-glycoprotein in pancreatic cancer. J Chromatogr B Analyt Technol Biomed Life Sci. 852, 257–267.
14. Gharbi, S., Gaffney, P., Yang, A., Zvelebil, M. J., Cramer, R., Waterfield, M. D., et al. (2002) Evaluation of two-dimensional differential gel electrophoresis for proteomic expression analysis of a model breast cancer cell system. Mol Cell Proteomics. 1, 91–98.
15. Thiede, B., Hohenwarter, W., Krah, A., Mattow, J., Schmid, M., Schmidt, F., et al. (2005) Peptide mass fingerprinting. Methods. 35, 237–247.
16. Aebersold, R. and Goodlett, D. R. (2001) Mass spectrometry in proteomics. Chem Rev. 101, 269–295.

Chapter 12

Search for and Identification of Novel Tumor-Associated Autoantigens

Karsten Conrad, Holger Bartsch, Ulrich Canzler, Christian Pilarsky, Robert Grützmann, and Michael Bachmann

Summary

During the development of tumors, autoantibodies against aberrant or overexpressed autoantigens can be induced. Several hundred tumor-associated autoantibodies (TAAB) with varying specificity for tumors have been found to date by molecular cloning and proteomics technologies. Many TAAB are detectable in preclinical stages of the disease and may be indicators of tumor development. The screening for autoantibody responses in tumor patients may lead to new diagnostic tumor markers and may be a simple and effective way to concomitantly identify cytotoxic T-lymphocyte (CTL) reactivity. However, most TAAB lack sufficient sensitivity and specificity for use as biomarkers in clinical practice. For further use, TAAB should be selected for their specificity regarding malignancies and for their potential clinical application. When TAAB are selected for high specificity, the sensitivities of most of them are too low for the screening of risk groups. A combined determination of two or more tumor-specific autoantibodies may overcome this problem. Therefore, a further evaluation of the relevance of known autoantibody specificities as well as the search for novel diagnostically relevant TAAB by different methodologies is necessary. An optimal combination of highly specific TAAB in multiparametric assays as well as the standardization of autoantibody analysis is necessary to exhaust the potential of TAAB in the early (presymptomatic) diagnosis and monitoring of malignancies.

Key words: Immunoscreening, Tumor-associated antigen, Lambda phages, cDNA library

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_12, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

There is growing evidence that antigens that become aberrantly or overexpressed during the transition to malignancy can be targets of the cellular and/or humoral immune response under special conditions (e.g., sequence of major histocompatibility complex
[MHC] molecules, proinflammatory stimuli). Those antigens may be involved in or the consequence of the tumorigenesis and, therefore, immune responses (e.g., autoantibodies) against them can be regarded as tumor associated or even tumor specific. Responsible for the tumorigenesis are alterations in at least three groups of genes: oncogenes, tumor-suppressor genes, and stability genes. Autoantibodies more or less associated with a variety of tumors have been described against oncogene products, tumor suppressor proteins, products of stability genes, inhibitors of apoptosis proteins, cancer/testis class of tumor antigens, and onconeural antigens. Furthermore, autoantibodies against proliferation- or differentiation-associated antigens other than oncoproteins (e.g., cyclins; cyclin-dependent kinases; centromere protein F; insulin-like growth factor II messenger RNA [mRNA]-binding proteins; the DEAD-box protein 48; the 32-kDa subunit of replication protein A [RPA32]; the annexins I, II, and XI-A; and the AIS gene product p40) have been found in tumor patients (reviewed in ref. 1). The list of tumor autoantigens is growing rapidly; several hundreds have been found via recognition by autoantibodies in sera from cancer patients via complementary DNA (cDNA) cloning or proteomics-based technologies. Some novel antigens are accidentally detected during routine diagnosis due to unknown epifluorescence staining patterns while screening for nuclear antibodies either on fixed cultured cells or tissues (e.g., Fig. 1).

Fig. 1. Examples of fluorescence pattern on HEp-2 cells (a, b) and monkey cerebellum (c, d) of autoantibodies in sera from tumor patients: (a) CENP-F pattern: centromeres are strongly stained in prometaphase and metaphase cells, variable fine granular staining is seen in interphase nuclei; (b) granular nuclear pattern (specificity unknown); (c) staining of neuron nuclei; and (d) granular staining of Purkinje cell cytoplasm.


1.1. Relevance of Tumor-Associated Autoantibodies

Because autoantibody response in tumor patients is often associated with aberrant or overexpression of antigens in tumor tissue or serum, most tumor-associated autoantibodies (TAAB) seem to be a result of antigen-driven immune response like those suggested for autoimmune diseases. Furthermore, it may be concluded that TAAB can be viewed as reporters from the immune system revealing the identity of antigens that might be playing a role in the tumorigenic processes (2). Therefore, TAAB are important markers for different approaches. (1) They are molecular probes for the identification of novel proliferation-associated antigens and pathways. Sera of cancer patients have been shown to be useful reagents for identifying new cellular proteins possibly involved in tumor development (e.g., S/G2 nuclear antigen, centromere protein F). (2) They are useful in the search of new targets for vaccine-based therapies. Most TAAB detect antigens that are highly expressed predominantly in tumor cells. Those tumor-associated antigens (TAA) are putative candidates for a tumor vaccination strategy because the B-cell response is often accompanied by a cellular immune response (3). (3) TAAB may be useful as biomarkers for the prediction, diagnosis, and monitoring of tumors. However, despite the large number of TAAB detected until today, the practical application is limited for the following reasons. (1) The aberrant or overexpression of TAA in tumors is, with few exceptions, necessary, but not sufficient for immune activation. Only a subset of patients with a tumor type develops a humoral response to a particular antigen, for example, p53. The immunogenicity of a tumor depends on several factors that may be variable among tumors of a similar type. Regarding the (tumor-associated) antigens, the level of expression, posttranslational modification, or variations in protein processing are of great importance. 
Furthermore, the specific immune response to a defined antigen depends on the structure of the highly polymorphous MHC molecules. Therefore, the diagnostic sensitivity of most TAAB is too low for diagnostic screening. (2) Overexpressed genes/proteins are thought to elicit an immune response by overriding thresholds critical for the maintenance of tolerance. Because proliferation-associated antigens may be also overexpressed in other hyperproliferative disorders, such as autoimmune diseases, the autoantibody response is often not specific for malignancy. (3) The results regarding sensitivity and specificity of TAAB may differ dramatically from study to study due to different methods used for autoantibody determination, study design, and ethnicity of tested subjects. Nevertheless, TAAB may have a great potential in the early (presymptomatic) diagnosis of cancer, because the autoimmune response does not depend on the size of the tumor.
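The trade-off between sensitivity and specificity when several TAAB are combined can be illustrated numerically. The following Python sketch assumes that the markers react independently of one another and that a serum is scored positive as soon as at least one marker is positive; both are simplifying assumptions, and the input values are hypothetical rather than taken from any study.

```python
from math import prod


def panel_any_positive(markers):
    """Combine markers under an 'any marker positive' rule.

    markers: list of (sensitivity, specificity) pairs given as fractions.
    Assumes statistical independence between markers.
    """
    # A patient is missed only if every marker is negative.
    sensitivity = 1 - prod(1 - sens for sens, _ in markers)
    # A healthy control is correctly negative only if all markers are negative.
    specificity = prod(spec for _, spec in markers)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical panel: three highly specific but insensitive autoantibodies.
    panel = [(0.20, 0.99), (0.15, 0.98), (0.10, 0.99)]
    sens, spec = panel_any_positive(panel)
    print(f"panel sensitivity: {sens:.3f}, panel specificity: {spec:.3f}")
```

Under these assumptions the panel sensitivity rises with every added marker while the panel specificity falls as the product of the individual specificities, which is why only highly specific TAAB are useful candidates for such combinations.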

1.2. Potential Use of TAAB as Biomarkers for the Prediction and Early Diagnosis of Tumors

There is still a need for parameters that are specific for tumors and are detectable in preclinical stages. The ideal tumor marker should be highly sensitive and highly specific for tumors. The tumor sensitivity should be higher than that of other diagnostic
methods and the earlier diagnosis should lead to an improvement of therapy. TAA present in sera of cancer patients (e.g., CEA, NSE, SCC, CA50) can be useful markers for prognosis and for monitoring cancer therapy but have limited value for diagnosis, especially for the early diagnosis of cancer. Novel approaches and developments may improve the diagnostic possibilities but are too expensive for broader use or for the screening of risk groups. TAAB may develop (very) early with respect to tumor formation. Hints for the predictive relevance of TAAB are given by different observations and approaches. (1) TAAB are significantly more often detectable in risk groups for cancer development than in healthy volunteers. A higher risk of tumor development is observed in populations expressing cancer susceptibility genes (hereditary cancer syndromes), in populations exposed to carcinogenic noxae (e.g., uranium miners), and in populations with preneoplastic or cancer-predisposing diseases (e.g., Barrett’s oesophagus, chronic liver inflammation or cirrhosis, dermatomyositis, Sjögren’s syndrome). TAAB have been described in some of the known risk groups in higher frequencies than in healthy controls: anti-p53 autoantibody in uranium miners (4); anti-p53 autoantibody in patients with liver cirrhosis and other chronic liver diseases; anti-Crt32 autoantibody in hepatitis B virus (HBV)-positive chronic hepatitis; anti-WT autoantibody in myelodysplastic syndromes; anti-hMSH2 and anti-hPMS1 autoantibodies in patients with dermato/polymyositis; and anti-HMdU (5-hydroxymethyl-2′-deoxyuridine) and anti-p53 autoantibodies in otherwise healthy women who had a family history of breast cancer (5). However, the TAAB response in these populations does not indicate that all autoantibody-positive subjects will develop cancer, because overexpression of the relevant autoantigens with the potential of autoantibody induction can also be observed in nontumorous cells. The autoantibody responses rather reflect changes that might be relevant in tumorigenesis and therefore indicate a rising risk of tumor development in those subjects. (2) TAAB are significantly more often detectable in premalignant and early tumor stages than in healthy volunteers: anti-RPA32 and anti-p53 autoantibodies are detectable in ductal carcinoma in situ of the breast with early, nonpalpable (3–5 mm) lesions; anti-YKL-40 autoantibodies are detectable in early-stage ovarian cancer more often than the conventional tumor markers (65% vs. 35% CA125 and 13% CA15-3). (3) TAAB against neuroectodermal antigens occur in paraneoplastic syndromes. Paraneoplastic syndromes are disorders, most often neurologic (PND), that are caused by a strong immune response against an antigen shared by the tumor and healthy host tissues. Because there is a direct relationship between those TAAB and paraneoplastic manifestations, the PND-specific autoantibody response is an early indicator of tumor development. It has been shown that autoantibodies against the neuroectodermal antigens HuD, amphiphysin, recoverin, or enolase precede the diagnosis of cancer in approximately 70% of patients by up to 4 years. (4) Retrospective studies demonstrate how long TAAB are detectable before disease manifestation or the definite diagnosis with conventional methods. Lubin et al. (6) were the first to describe that the humoral anti-p53 response may be an early event during tumorigenesis and can be detected before clinical manifestation of the disease. In two retrospective studies, Trivers et al. (7) showed that p53 autoantibodies were present months to years before the manifestation of angiosarcoma of the liver in workers occupationally exposed to vinyl chloride and of lung cancer in heavy smokers with chronic obstructive pulmonary disease. In a retrospective study on former uranium miners, we could show that anti-p53, anti-NY-ESO-1, and anti-survivin autoantibodies are detectable in sera from patients with lung cancer collected up to 10 years prior to disease manifestation or confirmed diagnosis (4, 8). Anti-RPA32 autoantibody was shown to be present 18 months before diagnosis of cancer in one patient (9), and in another patient, anti-IMP1 and anti-IMP3 (Koc) autoantibodies were detectable approximately 8 years before diagnosis of hepatocellular carcinoma (HCC) (10). (5) The follow-up of TAAB-positive people is of great importance to show the real risk of cancer development in those subjects. Up to now, only one study has been published: women who were healthy at blood donation but were diagnosed 0.5–6 years later with breast or colorectal cancer exhibited significantly increased anti-HMdU (5-hydroxymethyl-2′-deoxyuridine) antibodies compared with age-matched controls (11). Hopefully, further prospective studies of TAAB-positive people will show whether defined autoantibodies can be used in the screening for preneoplastic or microinvasive tumor lesions, allowing an early diagnosis of and early intervention in cancer.

1.3. Identification of TAAB

The continuous search for human cancer-specific antibodies started in the 1970s using a variety of serological test systems and antigen sources with the aim of detecting TAA. Old et al. established the strategy of “autologous typing” using autologous serum and tumor cell lines from cancer patients (12). Although only a few tumor-specific antigens could be detected by extensive absorption analyses and were further defined biochemically, this work provided strong evidence for the specificity of antibody responses in tumor patients (13). A new phase in cancer serology was introduced in the 1990s by two developments. First, molecularly defined proteins (such as oncoproteins) or glycoproteins (such as the mucin MUC-1) were used in immunoassays to look for cancer-specific autoantibodies. Second, the technique of autologous cDNA library screening was used for identifying TAAB. The unique feature of this technique is that the cDNA library is made from the tissue of the same patient whose serum
is used for immunoscreening of the expression cDNA library. This technique, which was first introduced by Bachmann (14) and applied to the serum and a cDNA library prepared from an autoimmune patient, was later adapted to the screening of cDNA libraries prepared from tumor tissues that were screened with sera of tumor patients by Sahin et al. (15). Meanwhile, several hundreds of tumor antigens have hitherto been identified with autoantibodies of cancer patients by various groups using this new methodology, which was then called serological analysis of recombinant cDNA expression libraries of human tumors with autologous serum (SEREX) (15). In addition, with the use of proteomics-based technologies, novel TAAB have been found in the last years (summarized in ref. 1). Other approaches are the use of indirect immunofluorescence for selection of autoantibodies that may be directed to TAA. In addition, different proteomic assays were used for the identification of tumor-specific antigens, such as two-dimensional polyacrylamide gel electrophoresis (PAGE), Western blot, or matrix-assisted laser desorption/ionization time of flight (MALDI-TOF).

2. Materials

2.1. Preparation of DNA for Cloning a Lambda cDNA Library

1. DEPC-treated water (see Note 1).
2. Isolated polyA+ RNA. Any tissue and any isolation kit that works reliably in your hands can be used (see Note 2).
3. Modified oligo dT primer including a cloning site (XhoI for lambda ZAP II). We have used CTCGAG(dT)18 (50 µM stock solution).
4. 5-methyl-dCTP (Fermentas, St. Leon-Rot, Germany).
5. Reverse transcription (RT) kit (e.g., RT for PCR kit, Takara Bio Europe/Clontech, Saint-Germain-en-Laye, France).
6. 5× second-strand buffer: 94 mM Tris-HCl, pH 6.9, 453 mM KCl, 23 mM MgCl2, and 50 mM (NH4)2SO4. Make 200-µL aliquots and store at −80°C.
7. RNase H (1.0 U/µL, Roche, Mannheim, Germany) and DNA polymerase I (10 U/µL, Fermentas, St. Leon-Rot, Germany).

2.2. Creation of Blunt Ends and Adapter Ligation

1. Phenol/chloroform/isoamyl alcohol (25/24/1) and chloroform/isoamyl alcohol (24/1) (both supplied by Carl Roth, Karlsruhe, Germany).
2. T4 DNA polymerase (5 U/µL, provided with 5× buffer) and dNTPs (2 mM each) (both from Fermentas, St. Leon-Rot, Germany).
3. EcoRI/BstXI adapter (Invitrogen, Karlsruhe, Germany).
4. T4 DNA ligase (1 U/µL, provided with buffer and PEG 4000 solution) (Fermentas, St. Leon-Rot, Germany).
5. DNA clean up and concentrator kit (Zymo Research, Orange, CA, USA).

2.3. Digestion and Ligation into Lambda ZAP II Vector

1. Lambda ZAP II vector (included in the ZAP Express® cDNA Synthesis Kit, Stratagene, Heidelberg, Germany).
2. Suitable restriction enzymes, XhoI (10 U/µL) and EcoRI (10 U/µL), and the appropriate restriction enzyme buffer (10×) (Fermentas, St. Leon-Rot, Germany).
3. Phenol/chloroform/isoamyl alcohol (25/24/1) and chloroform/isoamyl alcohol (24/1) (Carl Roth, Karlsruhe, Germany).
4. 3 M NaOAc (pH 5.2) solution; autoclave and store at room temperature.
5. 100% ethanol (molecular biology grade, Carl Roth, Karlsruhe, Germany) and 70% (v/v) ethanol.
6. Tris-HCl (pH 7.4). Prepare a 1 M stock, autoclave, and store at room temperature.
7. T4 DNA ligase (provided with 10× ligase buffer, Fermentas, St. Leon-Rot, Germany).

2.4. Packaging Protocol

1. Gigapack® III Gold Packaging Extract (ZAP Express® cDNA Synthesis Kit, Stratagene, Heidelberg, Germany). 2. SM buffer (50 mM Tris-HCl, pH 7.5, 10 mM NaCl, 8 mM MgSO4, 0.01% [w/v] gelatin). 3. Chloroform (Carl Roth, Karlsruhe, Germany).

2.5. Preparation of Agar Plates and Set Up of Bacteria for Infection

1. LB medium (10 g/L NaCl, 5 g/L yeast extract, 10 g/L tryptone, pH 7.0); autoclave and store at 4°C.
2. MgSO4, 1 M stock solution in water; autoclave and store at room temperature.
3. Maltose, 20% (w/v) stock solution in water; sterilize by filtration and store in aliquots at −20°C.
4. Bacterial clone XL-1 Blue (Stratagene, Heidelberg, Germany) (see Note 3).
5. Bottom agar (LB medium, 15 g/L agar, 10 mM MgSO4).
6. Sterile 24 × 24-cm cell culture dishes (Nunc, Roskilde, Denmark).

2.6. Handling and Titration of Phages

1. Top agar (LB medium, 6 g/L agarose, 10 mM MgSO4).
2. Chloroform (Carl Roth, Karlsruhe, Germany).
3. Sterile 85-mm diameter cell culture dishes (Nunc, Roskilde, Denmark).


2.7. Amplifying the Library

1. Bottom agar (LB medium, 15 g/L agar, 10 mM MgSO4).
2. Top agar (LB medium, 6 g/L agarose, 10 mM MgSO4).
3. Chloroform (Carl Roth, Karlsruhe, Germany).
4. Sterile 24 × 24-cm cell culture dishes (Nunc, Roskilde, Denmark).

2.8. Screening for Tumor-Associated Antibodies

1. Nitrocellulose filter: Hybond-C extra (20 × 20 cm or 82-mm diameter, Amersham/GE HealthCare Life Science, Munich, Germany) (see Note 4).
2. Isopropyl β-D-1-thiogalactopyranoside (IPTG), 1 M stock solution in water; store in aliquots at −20°C.
3. 16-gauge needle and black ink for marking the filter.
4. Tris-buffered saline with Tween (TBS-T): Prepare a 10× stock with 1.5 M NaCl, 0.5 M Tris–HCl, pH 8.0, and 1% Tween-20. Dilute 100 mL with 900 mL water for use.
5. Blocking buffer: 5% (w/v) Blocking Reagent (Roche, Mannheim, Germany) in TBS-T.
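The working concentrations that result from diluting the 10× TBS-T stock can be checked with a one-line calculation per component. The short Python sketch below applies C_final = C_stock × V_stock / V_total to the stock composition given in item 4; the helper name `dilute` and the dictionary keys are illustrative assumptions.

```python
def dilute(stock, stock_volume, total_volume):
    """Return final concentrations after diluting stock_volume to total_volume.

    Both volumes must be in the same unit; concentrations keep their unit.
    """
    factor = stock_volume / total_volume
    return {component: conc * factor for component, conc in stock.items()}


# 10x TBS-T stock as listed in the materials (units given in the keys).
TBS_T_10X = {"NaCl (mM)": 1500.0, "Tris-HCl (mM)": 500.0, "Tween-20 (%)": 1.0}

if __name__ == "__main__":
    # 100 mL stock plus 900 mL water gives 1,000 mL working solution.
    working = dilute(TBS_T_10X, stock_volume=100.0, total_volume=1000.0)
    for component, conc in working.items():
        print(f"{component}: {conc:g}")
```

The same helper can be reused for any other stock in this section, for example to confirm the final concentration of IPTG or MgSO4 after dilution.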

2.9. Immune Detection and Isolation of Reactive Plaques

1. Antibody incubation buffer: 1% (w/v) Blocking Reagent (Roche, Mannheim, Germany) in TBS-T.
2. Patient sera, diluted 1:100.
3. Monoclonal antibody: anti-human IgG coupled to alkaline phosphatase (Sigma Chemicals, St. Louis, MO, USA).
4. Detection buffer: 100 mM Tris–HCl, pH 9.5, 50 mM MgCl2, 100 mM NaCl.
5. p-Nitro blue tetrazolium chloride (NBT) stock: 77 mg/mL in 70% dimethylformamide (DMFA); 5-bromo-4-chloro-3-indolyl phosphate (BCIP) stock: 50 mg/mL in 100% DMFA. Both are distributed by Carl Roth, Karlsruhe, Germany.

2.10. In Vivo Excision of Purified Phage Clones

1. Lambda ZAP II vector system (Stratagene, Heidelberg, Germany). 2. Helper phage R408 (Stratagene, Heidelberg, Germany). 3. 2× YT medium (10 g/L NaCl, 10 g/L yeast extract, 16 g/L tryptone, pH 7.0), autoclave and store at 4°C.

3. Methods

The technique of using polyclonal sera to screen for a specific clone in a complex lambda cDNA library was developed during the 1980s. This technique was then adapted to an autologous setting by Tröster et al. (14). Here, we used the cDNA library of a patient with primary Sjögren’s syndrome and screened it with the patient sera for patient-specific autoantigens. Later, this technique was used in a slightly different context for the identification of TAA (15). Several hundred tumor antigens have since been identified with autoantibodies of cancer patients by various groups using this methodology, called SEREX, which is described in the following sections.

3.1. Preparation of DNA for Cloning a Lambda cDNA Library

1. Prepare polyA+ RNA from your desired tissue. We used, for example, peripheral blood lymphocytes (PBL) and the Trizol protocol provided by the supplier (Invitrogen), followed by the isolation of mRNA with the µMACS mRNA isolation kit from Miltenyi (Bergisch Gladbach, Germany).
2. For synthesis of cDNA, first design a modified oligo dT primer including a suitable restriction enzyme site, e.g., an XhoI site for later cloning. We used the primer sequence CTCGAG(dT)18. In this case, it is mandatory to use 5-methyl-dCTP in the reverse transcription step to protect XhoI sites within the newly synthesized cDNA sequences.
3. For cDNA preparation, we used the RT for PCR Kit provided by Takara Bio Europe/Clontech. All components must be RNase free, because the DNA-RNA hybrid molecule is needed for the second-strand synthesis (see Note 5).
4. Thaw the components of the kit and store the tubes on ice. You do not need to thaw the included dNTPs and the primer solutions.
5. In a sterile 0.5-mL microcentrifuge tube, add 5 µg of your prepared mRNA to a volume of DEPC-treated water that gives a total volume of 31.5 µL.
6. Add 1.0 µL of the specific XhoI/oligo(dT)18 primer (50 µM stock solution).
7. Heat the RNA/primer mix at 70°C for 2 min. Then quench rapidly on ice before proceeding to the next step.
8. Mix your modified dNTP solution: dATP, dTTP, dGTP, and 5-methyl-dCTP, 10 mM each. All nucleotides were obtained from Fermentas.
9. Add a premixed solution of 10.0 µL of 5× reaction buffer, 1.0 µL recombinant RNase inhibitor, 5.0 µL MMLV reverse transcriptase, and 2.5 µL modified dNTP mix (10 mM each). The final volume of this reaction mix is 50 µL (see Note 6).
10. Mix the contents of the tube by pipetting up and down.
11. Incubate the reaction mix at 42°C for 1.5 h.
12. Prepare the second-strand premix shortly before the end of the reverse transcription reaction by mixing 80 µL of 5× second-strand buffer, 6 µL dNTP mix (10 mM each), and 250 µL water. Store on ice until use.
13. Add the first-strand synthesis mix to the cooled second-strand solution and mix.
14. Prepare the enzyme mix in a separate tube: 4 µL RNase H (1.0 U/µL) and 10 µL DNA polymerase I (from Escherichia coli, 10 U/µL). Mix the enzymes and add this mixture to the second-strand premix. Mix immediately by inverting the tube several times.
15. Collect the liquid by a short centrifugation step and perform the second-strand synthesis first at 16°C for 1 h, followed by an additional 1-h incubation at room temperature.
16. You can check the cDNA by agarose gel electrophoresis and inspection of the smear on the gel. The size should range from approximately 300 to 8,000 bp.

3.2. Creation of Blunt Ends and Adapter Ligation

1. Extract the DNA once with phenol/chloroform/isoamyl alcohol and once with chloroform/isoamyl alcohol. Always transfer the liquid phase to a fresh tube. Add 1/10 of the remaining volume of 3 M NaOAc (pH 5.2) and precipitate the DNA by adding 2.5 volumes pure ethanol. Wash once with 70% (v/v) ethanol.
2. Air-dry the DNA and resuspend in 35 µL water.
3. Blunt-end the DNA by using T4 DNA polymerase (Fermentas). To the DNA, add 10 µL of 5× reaction buffer, 2.5 µL dNTP (2 mM each), and 2.5 µL T4 DNA polymerase (5 U/µL).
4. Incubate the mixture at 11°C for 30 min and stop the reaction by heating at 70°C for 10 min.
5. Extract the DNA once with phenol/chloroform/isoamyl alcohol and once with chloroform/isoamyl alcohol. Always transfer the liquid phase to a fresh tube. Add 1/10 of the remaining volume of 3 M NaOAc (pH 5.2) and precipitate the DNA by adding 2.5 volumes pure ethanol. Wash once with 70% (v/v) ethanol.
6. Air-dry the DNA and resuspend in 20 µL water.
7. For ligation of the EcoRI/BstXI adapter, add 3 µL of 10× T4 DNA ligase buffer (Fermentas), 3 µL of 50% (w/v) PEG 4000 solution, and 2 µL EcoRI/BstXI adapter (1 µg/µL, Invitrogen) and mix (see Note 7).
8. Add 2 µL T4 DNA ligase (2 U), mix, and incubate overnight at 4°C.
9. Inactivate the ligase by heating at 65°C for 10 min.
10. Remove excess linker molecules by using a commercially available DNA clean-up kit. We use, for example, the DNA clean up and concentrator kit from Zymo Research. Follow the instructions given with the system.
11. Digest the DNA with XhoI by adding 1 µL of 10× restriction enzyme buffer and 5 U XhoI. Bring the volume up to 10 µL with water.

3.3. Digestion and Ligation into Lambda ZAP II Vector

1. Combine 5 µg of lambda ZAP II vector with 10 µL of 10× restriction enzyme buffer, 10 U XhoI, and 10 U EcoRI. Adjust the volume to 50 µL and digest the DNA for 2 h at 37°C.
2. Extract the DNA once with phenol/chloroform/isoamyl alcohol and once with chloroform/isoamyl alcohol. Always transfer the aqueous phase to a fresh tube. Add 1/10 of the remaining volume of 3 M NaOAc (pH 5.2) and precipitate the DNA by adding 2.5 volumes pure ethanol. Wash once with 70% (v/v) ethanol.
3. Air-dry the DNA and resuspend it in 5 µL of 10 mM Tris-HCl (pH 7.4).
4. Perform the ligation reaction by pipetting 2 µL of the digested lambda ZAP II DNA (approximately 1 µg), 3 µL digested insert DNA (approximately 0.4 µg), 1 µL 10× ligase buffer, and 2 µL water. Mix and add 2 µL T4 ligase (2 U). Incubate overnight at 15°C (see Note 8).
5. Inactivate the ligase by heating at 65°C for 10 min. Continue with the packaging or store the DNA at −20°C. Avoid multiple thawing and freezing cycles (see Note 9).
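The ligation in step 4 is specified by mass (approximately 1 µg vector and 0.4 µg insert). The corresponding molar ratio can be estimated with the sketch below. It assumes the common approximation of 650 g/mol per base pair of double-stranded DNA; the vector length of about 41 kb and the mean cDNA length of 1,500 bp are illustrative assumptions and should be replaced by the supplier's figure and your own gel estimate.

```python
# Rough insert:vector molar ratio for the ligation in step 4.
# Assumption: 650 g/mol per base pair of double-stranded DNA.
AVG_BP_MASS = 650.0

# Assumed lengths (not given in the protocol): check the vector data sheet
# and estimate the mean insert size from the cDNA smear on the gel.
VECTOR_BP = 41_000
INSERT_BP = 1_500


def pmol_dsdna(mass_ug, length_bp):
    """Convert a mass of double-stranded DNA (in µg) to pmol of molecules."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS)


vector_pmol = pmol_dsdna(1.0, VECTOR_BP)   # approximately 1 µg vector
insert_pmol = pmol_dsdna(0.4, INSERT_BP)   # approximately 0.4 µg insert
ratio = insert_pmol / vector_pmol

print(f"vector: {vector_pmol:.3f} pmol")
print(f"insert: {insert_pmol:.3f} pmol")
print(f"insert:vector molar ratio = {ratio:.1f}")
```

Under these assumptions the insert is in roughly tenfold molar excess; a longer mean insert lowers the ratio proportionally.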

3.4. Packaging Protocol

1. Quickly thaw the Gigapack® III Gold packaging extract (Stratagene) by warming it in your hands. Monitor the extract closely and continue with the next steps as soon as the solution starts to thaw.
2. Add 4 µL (~0.1–0.4 µg ligated DNA) of the DNA (from step 3 of Subheading 3) containing your library and mix well by stirring carefully with a pipet tip.
3. Incubate the reaction mix for 2 h at room temperature.
4. Add 500 µL SM buffer and 20 µL chloroform and mix gently.
5. Spin the tube briefly to sediment the debris. The supernatant contains the phage particles. It can be stored at 4°C for up to 4 weeks. Continue with the titration of your library.

3.5. Preparation of Agar Plates and Set Up of Bacteria for Infection

1. For the first screening, fill 24 × 24-cm cell culture dishes (Nunc) with 100 mL bottom agar. Always work under sterile conditions, because no antibiotic is added to the screening plates.
2. For the second and any further screening, use 85-mm cell culture plates and fill with 10 mL bottom agar. Also use the small plates for the determination of the phage titer.
3. The plates can be stored at 4°C for at least 4 weeks.
4. For the preparation of the XL-1 Blue bacteria, inoculate a single colony in 20–50 mL LB medium supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose (see Note 10).
5. Incubate overnight at 37°C in a bacterial shaker.
6. On the next day, pellet the bacteria at 1,000 × g for 10 min and resuspend the bacteria in 10 mM MgSO4 (50% of the initial volume). Measure the OD600; it should be at least 4.
7. Store the bacteria at 4°C and use for a maximum of 4 weeks.

3.6. Handling and Titration of Phages

1. The phages are stored in SM buffer in the dark at 4°C supplemented with chloroform (approximately 1/20 of the total volume). Before the phages can be used for a screening, the exact titer (in pfu/mL) has to be determined. To do so, prepare a tenfold dilution series in SM buffer with ten different concentrations of 50 µL each.
2. Warm the bottom agar plates to 42°C prior to use (for at least 1 h).
3. Liquefy the top agar, fill separate tubes with 3 mL, and cool to 47°C in a water bath.
4. Dilute an aliquot of the bacteria with 10 mM MgSO4 to OD600 = 1. Use 100 µL of these bacteria and mix with 10 µL of the diluted phage suspensions.
5. Incubate at 37°C for 15 min.
6. Mix the bacteria/phage suspension with the top agar and pour immediately onto the prewarmed bottom agar plate. Make sure that the top agar is evenly spread on the bottom agar; use a planar surface (see Note 11).
7. Wait until the top agar has hardened (~10 min at room temperature) and incubate the plates at 37°C overnight.
8. The next day, count the plaques on the plates and calculate the titer of your phage suspension.
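The titer calculation in step 8 is not spelled out in the protocol. A minimal sketch, assuming the usual relation titer = plaques × dilution factor / plated volume, is given here; the function names and the example plaque count are hypothetical. The second function returns the stock volume that contains a desired number of pfu, which is the quantity needed when plating a defined number of phages.

```python
def titer_pfu_per_ml(plaque_count, dilution_factor, plated_volume_ul):
    """Titer of the undiluted phage stock in pfu/mL.

    dilution_factor  -- fold dilution of the plated tube (1e4 for the 10^-4 tube)
    plated_volume_ul -- volume of diluted phage mixed with the bacteria, in µL
    """
    if dilution_factor <= 0 or plated_volume_ul <= 0:
        raise ValueError("dilution factor and volume must be positive")
    return plaque_count * dilution_factor / (plated_volume_ul / 1000.0)


def stock_volume_ul(target_pfu, titer):
    """Volume of undiluted stock (µL) that contains target_pfu plaque-forming units."""
    if titer <= 0:
        raise ValueError("titer must be positive")
    return target_pfu / titer * 1000.0


# Hypothetical count: 85 plaques from 10 µL of the 10^-4 dilution.
titer = titer_pfu_per_ml(85, 1e4, 10)
volume = stock_volume_ul(1e5, titer)

print(f"titer = {titer:.2e} pfu/mL")
print(f"volume containing 1e5 pfu = {volume:.2f} uL")
```

Plates with very few or confluent plaques give unreliable counts, so it is reasonable to base the calculation on a dilution that yields well-separated plaques.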

3.7. Amplifying the Library (See Note 12)

1. Warm the required number of 24 × 24-cm bottom agar plates at 42°C for at least 2 h.
2. Liquefy 30 mL top agar per plate and cool down to 47°C.
3. Dilute the XL-1 Blue bacteria to OD600 = 1. Use 1 mL of these bacteria and mix with an aliquot of the library suspension containing 1 × 10⁵ pfu of bacteriophage. To amplify 1 × 10⁶ plaques, use a total of ten aliquots (each aliquot contains 1 × 10⁵ plaques per plate). Do not use more than 100 µL of the phage library; otherwise, the bacteria might be killed by the included chloroform.
4. Incubate for 15 min at 37°C.
5. Mix this suspension with the warm top agar and pour immediately onto the bottom agar plate. Make sure that the bottom agar plate is placed on a planar surface and that the top agar is evenly spread over the bottom agar. Wait until the top agar has hardened (~15 min at room temperature).
6. Invert the plates and incubate at 37°C for 6–8 h. Do not allow the plaques to get larger than 1.2 mm.
7. Overlay each plate with 20 mL of SM buffer. Store the plates at 4°C overnight (with gentle rocking if possible). This allows the phages to diffuse into the SM buffer.
8. Recover the bacteriophage suspension from each plate and pool it into a sterile polypropylene container. Add chloroform to a 5% (v/v) final concentration. Mix well and incubate for 15 min at room temperature.
9. Remove the bacterial cell debris by centrifugation for 10 min at 500 × g.
10. Recover the supernatant and transfer it to a sterile polypropylene container. If the supernatant still appears cloudy or has a high amount of cell debris, repeat steps 8 and 9. If the supernatant is clear, add chloroform to a 5% (v/v) final concentration and store at 4°C. A properly stored phage library is stable for several years, although the titer decreases slightly over time.
11. Check the titer of the amplified library as described above. Assume approximately 10⁹–10¹¹ pfu/mL. Briefly spin the lambda phage stock to ensure that the chloroform is separated completely before removing the aliquot for titration.

3.8. Screening for Tumor-Associated Antibodies

1. After you have determined the titer of your phage cDNA library, you can start with the first screening. Here, you need to screen a large number of clones to get positive results. Therefore, start with a 24 × 24-cm plate and a maximum of 1.5 × 10⁵ pfu.
2. In the following screening steps, you need small 85-mm-diameter plates and only 100 to a maximum of 500 plaques per plate to get well-isolated single clones. The volumes needed for these screening steps are given in brackets in the following protocol.
3. Warm the bottom agar plates at 42°C for at least 2 h.
4. Liquefy 30 mL (3 mL) top agar and cool down to 47°C.
5. Dilute the XL-1 Blue bacteria to OD600 = 1. Use 1 mL (100 µL) of these bacteria and mix with the calculated amount of phage solution. Do not use more than 100 µL (10 µL) of the phage library; otherwise, the bacteria might be killed by the included chloroform.
6. Incubate for 15 min at 37°C.
7. Mix this suspension with the warm top agar and pour immediately onto the bottom agar plate. Make sure that the bottom agar plate is placed on a planar surface and that the top agar is evenly spread over the bottom agar. Wait until the top agar has hardened (~15 min at room temperature).
8. Incubate the plates at 42°C for 3 h.
9. While incubating the plates, prepare the nitrocellulose filter for the screening (see Note 13).
10. Impregnate the filter in a 10-mM IPTG solution to induce the protein expression. You will need approximately 10 mL for one 22 × 22-cm filter (or ~15 filters with 80-mm diameters). Dry the moistened filters between Whatman 3MM filter paper sheets for at least 45 min. Two filters can be used per plate.
11. After the 3 h incubation at 42°C, put the first filter very carefully onto the plate. You must not remove the filter or alter the orientation of the filter once contact with the agar is made. Mark the position of the filter by punching different numbers of holes at the corners. Use a needle dipped in black ink so that the marks are easily visible in the turbid agar.
12. Incubate the plate for another 3 h at 37°C to induce the protein expression. Check the plaque formation at the edge of the filters.
13. Remove the first filter. If desired, apply a second filter. Mark the second filter in a different way than the first filter (e.g., at each side instead of at the corners). Incubate the second filter for at least 30 min at 37°C and then overnight at 4°C to prevent overgrowing. Remove the second filter and proceed in the same way as with the first one.
14. Store the plate at 4°C until you can pick the positive plaques.
15. Directly after removing the filter, wash it twice in 30 mL (10 mL) TBS-T each. Transfer the filter into 30 mL (10 mL) blocking solution and block the filter overnight at 4°C. Alternatively, block at room temperature for 2 h.

3.9. Immune Detection and Isolation of Reactive Plaques

1. After the blocking, incubate the filter with the appropriate serum. Dilute the serum 1:100 in incubation buffer (TBS-T including 0.1% bovine serum albumin [BSA]). You will need at least 30 mL (5 mL) to cover the filter. Alternatively, you can seal the filter in film tubing to save serum. Incubate for 1.5 h at room temperature on a shaking platform.
2. Wash five times with at least 30 mL (10 mL) TBS-T for 8 min at room temperature.
3. Incubate with the secondary antibody coupled to alkaline phosphatase. We use anti-human IgG AP (Sigma Chemicals), diluted 1:1,000 in incubation buffer. You will need at least 30 mL (5 mL) to cover the filter. Again, you can also seal the filter in film tubing to save volume. Incubate for 1 h at room temperature on a shaking platform.
4. Wash again five times with at least 30 mL (10 mL) TBS-T for 8 min at room temperature.
5. Equilibrate the membrane for 10 min in detection buffer to provide the proper pH for the alkaline phosphatase.
6. During the equilibration step, prepare the staining solution as follows. Dilute 250 µL NBT stock solution and 188 µL BCIP stock solution in 50 mL detection buffer (enough for ~ten 80-mm filters). Store at 4°C in the dark until use.
7. Stain the filter with 50 mL (5 mL) staining solution at room temperature protected from light until the plaque pattern becomes visible. Longer incubation will only lead to higher background and not to more specific signals. Stop the reaction with water and dry the filter between two sheets of Whatman 3MM paper.
8. Identify the positive plaques on the filter and try to locate them on the phage plate. Use the background pattern together with the marks to retrieve the right clone (see Note 14).
9. Mark the clone at the bottom of the plate and pick the clone by using a sterile blue tip that has been cut off to get a circle of approximately 5–6 mm. Transfer the agar piece into 1 mL SM buffer supplemented with 50 µL chloroform. For the second and any further screening, pick one well-isolated single plaque. Use a sterile Pasteur pipet and transfer the plaque into 100 µL SM buffer supplemented with 5 µL chloroform (see Note 15).
10. Store in the dark at 4°C; do not freeze.
11. Continue with at least two additional screening rounds. Start the next screening again with the determination of the phage titer of this solution. In this screening, you will need between 100 and a maximum of 500 plaques per plate to get well-isolated single plaques.
12. Proceed with the screening as outlined before (use the volumes written in brackets for the small filters used in these screenings).

3.10. In Vivo Excision of Purified Phage Clones

1. This method is adapted from a protocol supplied by Stratagene, Heidelberg, Germany. You can use this protocol when using the lambda ZAP II vector system (Stratagene, Heidelberg, Germany).
2. Mix 2 mL competent XL-1 Blue bacteria with 200 µL phage suspension (should be more than 1 × 10⁵ pfu) and with 1 × 10⁶ pfu helper phage R408. Determine the titer of the helper phage directly before use.
3. Incubate for 15 min at 37°C.
4. Add 5 mL of 2× YT medium and incubate for 3 h at 37°C in a rotating incubator at 180–200 rpm.
5. Heat the solution at 70°C for 20 min to lyse the lambda phage particles and the cells. Spin the tube at 4,000 × g for 5 min to pellet the cell debris.
6. Decant the supernatant into a fresh sterile tube. This stock contains the excised pBluescript phagemid packaged as filamentous phage particles (stock may be stored at 4°C for 1–2 months without any serious loss of infectiousness).
7. To plate the excised phagemids, add 200 µL of competent XL-1 Blue bacteria cells (OD600 = 1.0) to each of two 1.5-mL microcentrifuge tubes.
8. Add 100 µL of the phage supernatant to one microcentrifuge tube and 10 µL of the phage supernatant to the other microcentrifuge tube.
9. Incubate the microcentrifuge tubes at 37°C for 15 min.
10. Plate 200 µL of the cell mixture from each microcentrifuge tube on LB-ampicillin agar plates (150 µg/mL) and incubate the plates overnight at 37°C.
11. Pick some isolated colonies and analyze the plasmid DNA for the inserted cDNA.

4. Notes

1. Unless stated otherwise, all solutions should be prepared in water that has a conductivity of 0.056 µS/cm and total organic content of less than 5 ppb. This standard is referred to as “water” in this text.
2. If you plan to use PBMC as a source for RNA, you will also detect IgG clones that react with the secondary antibody, so you have to pick more clones or use subtraction libraries.
3. Several, but not all, bacterial strains are easily infected by lambda phages. In addition, different bacterial strains show different plaque morphology. Therefore, we recommend the use of XL-1 Blue or BB4 bacteria (both distributed by Stratagene).
4. Any nitrocellulose membrane might work with this system.
5. There are plenty of kits and protocols available for the cDNA synthesis and you can use whichever system works best in your hands, as long as the provided reverse transcriptase lacks RNase H.
6. You must use RNase H negative reverse transcriptase to keep the hybrid molecule intact.
7. If you are using primers in the adapter ligation reaction, you have to order them phosphorylated.
8. Avoid PEG solution in this ligation step, because it might inhibit the packaging reaction.
9. If the insert used is free from contaminants and contains a high percentage of ligatable ends, expect approximately 2 × 10⁶ to 1.5 × 10⁷ recombinant plaques when using high-efficiency packaging extracts (e.g., Gigapack® III Gold packaging extracts [Stratagene]).
10. Because normal bacteria are not easily infected with lambda phages, they have to be “primed” beforehand by adding MgSO4 and maltose to the culture media.
11. Be sure that the top agar temperature does not exceed 47°C; otherwise, the bacteria will be killed.
12. Because the created library is usually not stable, it is desirable to amplify the library prepared in lambda vectors to make a large, stable quantity of a high-titer stock of the library. However, more than one round of amplification is not recommended, because slower-growing clones may be significantly underrepresented.
13. The filters applied onto the plates should be slightly smaller than the dish used, to provide easy handling.
14. The plaques should be clearly distinguishable from the negative background staining in the first and second screening. However, because in the third screening only one clone should have been used, all plaque staining will look the same. If you have any doubt in the first screening, pick the clone. You will distinguish the negative clones in the further screening steps.
15. For picking in the first screening, you might also use the “wrong” side of a Pasteur pipet, but it is difficult to release the agar piece out of the pipet.

Acknowledgments

We thank Uta Kießling, Andrea Thieme, and Martina Franke for their excellent technical support.


References

1. Conrad K, Roggenbuck D, Bachmann M. (2005) Autoantibodies as indicators of tumor development. In: Conrad K, Bachmann M, Lehmann W, Sack U (Eds). Methods, Possibilities and Perspectives of Pre-symptomatic Tumor Diagnostics. Pabst Science Publishers, Lengerich, pp 55–77.
2. Tan EM. (2001) Autoantibodies as reporters identifying aberrant cellular mechanism in tumorigenesis. J Clin Invest; 108:1411–1415.
3. Jaeger E, Chen YT, Drijfhout JW, Karbach J, Ringhoffer M, Jaeger D, Arand M, Wada H, Noguchi Y, Stockert E, Old LJ, Knuth A. (1998) Simultaneous humoral and cellular immune response against cancer-testis antigen NY-ESO-1: Definition of human histocompatibility leukocyte antigen (HLA)-A2-binding peptide epitopes. J Exp Med; 187:265–270.
4. Conrad K. (2000) Autoantibodies in cancer patients and in persons with a higher risk of cancer development. In: Shoenfeld Y, Gershwin ME (Eds). Cancer and Autoimmunity. Elsevier, Amsterdam, pp 159–173.
5. Crawford LV, Pim DC, Bulbrook RD. (1982) Detection of antibodies against the cellular protein p53 in sera from patients with breast cancer. Int J Cancer; 30:403–408.
6. Lubin R, Zalcman G, Bouchet L, Tredanel J, Legros Y, Cazals D, Hirsch A, Soussi T. (1995) Serum p53 antibodies as early markers of lung cancer. Nature Med; 1:701–702.
7. Trivers GE, De Benedetti VMG, Cawley HL, Caron G, Harrington AM, Bennett WP, Jett JR, Colby TV, Tazelaar H, Pairolero P, Miller RD, Harris CC. (1996) Anti-p53 antibodies in sera from patients with chronic obstructive pulmonary disease can predate a diagnosis of cancer. Clin Cancer Res; 2:1767–1775.
8. Rohayem J, Diestelkoetter P, Weigle B, Oehmischen A, Schmitz M, Mehlhorn J, Conrad K, Rieber EP. (2000) Antibody response to the tumor-associated inhibitor of apoptosis protein Survivin in cancer patients. Cancer Res; 60:1815–1817.
9. Tomkiel JE, Alansari H, Tang N, Virgin JB, Yang X, VandeVord P, Karvonen RL, Granda JL, Kraut MJ, Ensley JF, Fernández-Madrid F. (2002) Autoimmunity to the Mr 32,000 subunit of replication protein A in breast cancer. Clin Cancer Res; 8:752–758.
10. Himoto T, Kuriyama S, Zhang JY, Chan EKL, Nishioka M, Tan EM. (2005) Significance of autoantibodies against insulin-like growth factor II mRNA-binding proteins in patients with hepatocellular carcinoma. Int J Oncol; 26:311–317.
11. Frenkel K, Karkoszka J, Glassman T, Dubin N, Toniolo P, Taioli E, Mooney LA, Kato I. (1998) Serum autoantibodies recognizing 5-hydroxymethyl-2'-deoxyuridine, an oxidized DNA base, as biomarkers of cancer risk in women. Cancer Epidemiol Biomarkers Prev; 7:49–57.
12. Old LJ. (1981) Cancer immunology: The search for specificity - G.H.A. Lowes Memorial Lecture. Cancer Res; 41:361–375.
13. Old LJ, Chen YT. (1998) New paths in human cancer serology. J Exp Med; 187:1163–1167.
14. Tröster H, Metzger TE, Semsei I, Schwemmle M, Winterpacht A, Zabel B, Bachmann M. (1994) One gene, two transcripts: Isolation of an alternative transcript encoding for the autoantigen La/SS-B from a cDNA library of a patient with primary Sjögren's syndrome. J Exp Med; 180:2059–2067.
15. Sahin U, Türeci O, Schmitt H, Cochlovius B, Johannes T, Schmits R, Stenner F, Luo G, Schobert I, Pfreundschuh M. (1995) Human neoplasms elicit multiple specific immune responses in the autologous host. Proc Natl Acad Sci USA; 92:11810–11813.

Chapter 13

Integrative Oncogenomic Analysis of Microarray Data in Hematologic Malignancies

Jose A Martínez-Climent, Lorena Fontan, Vicente Fresquet, Eloy Robles, María Ortiz, and Angel Rubio

Summary

During the last decade, gene expression microarrays and array-based comparative genomic hybridization (array–CGH) have unraveled the complexity of human tumor genomes more precisely and comprehensively than ever before. More recently, the simultaneous assessment of global changes in messenger RNA (mRNA) expression and in DNA copy number through “integrative oncogenomic” analyses has given researchers access to results that remain hidden when one-dimensional data sets are analyzed separately, thus accelerating cancer gene discovery. In this chapter, we discuss the major contributions of DNA microarrays to the study of hematological malignancies, focusing on the integrative oncogenomic approaches that correlate genomic and transcriptomic data. We also present the basic aspects of these methodologies and their present and future application in clinical oncology.

Key words: Oncogenomics, Array–CGH, Lymphoma, Gene expression

1. Introduction

The application of gene expression microarrays has allowed the definition of common patterns of gene expression that can distinguish pathologically different tumors, but has also revealed degrees of heterogeneity between and within tumors (1–5). Alizadeh and colleagues were the first to use microarrays to identify subtypes of a single disease (diffuse large B-cell lymphoma [DLBCL]) that could only be defined by their gene expression patterns (6). Since this pioneering report, gene expression profiling has made significant contributions in basic and applied cancer

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_13, © Humana Press, a part of Springer Science + Business Media, LLC 2010


research by providing useful prognostic biomarkers, defining novel oncogenic pathways, characterizing molecular portraits of transformation and metastasis, and revealing unique gene signatures of therapeutic response (5, 7–9). Today, the extrapolation of some of these microarray findings to more efficient and cost-effective laboratory techniques, such as quantitative polymerase chain reaction (PCR), flow cytometry, or immunohistochemistry, is a remarkable example of how basic science is moving closer to clinical medicine (8).

A different DNA microarray technology, termed comparative genomic hybridization (CGH) to microarrays (from now on, array–CGH), can detect and map changes in the DNA copy number that are present in tumors but not in the corresponding nontumoral germline sequences (10–12). In seminal papers, Pinkel, Lichter, and colleagues used high-resolution array–CGH with bacterial artificial chromosome (BAC) clones to precisely define amplicon structures and deletion borders in tumors, mapping the corresponding gene loci targeted by the amplification and deletion processes (12–14). Since then, notable improvements in the resolution and sensitivity of genome-wide array–CGH platforms have made possible the accurate screening for genome-wide aberrations in large tumor sets (11, 15, 16). More recent oligonucleotide-based single-nucleotide polymorphism (SNP) microarrays can detect not only DNA copy number changes but also copy-neutral genetic aberrations such as loss of heterozygosity (LOH) caused by uniparental disomy (UPD) (17). Overall, systematic scanning of cancer genomes using array–CGH has served to describe patterns of genetic alterations linked to the genesis and dissemination of human tumors.
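As an illustration of how array–CGH signals translate into copy number calls, the sketch below computes the log2 ratio of tumor to reference intensity for each probe and labels it as gain, loss, or neutral. The cutoffs of ±0.3 and the probe names and intensities are assumptions chosen for the example; real analyses normalize the data and segment neighboring probes before calling aberrations.

```python
import math

# Assumed cutoffs on the log2(tumor/reference) scale; not taken from the text.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def log2_ratio(tumor_signal, reference_signal):
    """log2 of the tumor-to-reference hybridization intensity for one probe."""
    if tumor_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(tumor_signal / reference_signal)


def call_copy_number(ratio):
    """Classify a log2 ratio as 'gain', 'loss', or 'neutral'."""
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "neutral"


# Hypothetical probes: (tumor intensity, reference intensity).
probes = {
    "probe_A": (2100.0, 1000.0),
    "probe_B": (980.0, 1000.0),
    "probe_C": (480.0, 1000.0),
}

calls = {name: call_copy_number(log2_ratio(t, r)) for name, (t, r) in probes.items()}
for name, call in calls.items():
    print(name, call)
```

Fixed thresholds are the simplest possible rule; they are used here only to make the relation between intensity ratio and copy number state explicit.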
Although gene expression microarrays and array–CGH are both mature technologies, the simultaneous assessment of global changes in messenger RNA (mRNA) expression and in DNA copy number through “integrative oncogenomics” represents a relatively novel approach that attempts to accelerate cancer genome annotation and gene target discovery at a genome scale (5, 18). Using these comparative systems, which need support from robust bioinformatics tools, researchers have access to results that remain hidden when one-dimensional data sets are analyzed separately. Notable examples are the identification of specific chemotherapy response signatures by microarray analyses of multiple human biopsies and human-like tumors from genetically manipulated mice, and the construction of regulatory genetic networks where participating cancer genes can be functionally characterized in proper molecular and cellular contexts (18). In this chapter, we discuss the major contributions of DNA microarrays to the study of hematological malignancies, focusing on the integrative oncogenomic approaches that correlate genomic and transcriptomic data. We also present the basic aspects of these methodologies and their application in clinical oncology.
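One simple way to correlate genomic and transcriptomic data is to compute, for each gene, the correlation between its copy number and its expression across the same set of tumors. The sketch below uses the Pearson coefficient for this purpose. The gene names, the values, and the cutoff of 0.8 are hypothetical, and this is only one of several possible integration strategies, not necessarily the one used in the studies cited here.

```python
from statistics import mean

# Assumed cutoff for calling a gene "dosage-driven"; chosen for illustration.
CORRELATION_CUTOFF = 0.8


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data for five tumors: copy number as log2 ratios,
# expression as log2 intensities, in the same sample order.
copy_number = {
    "GENE_A": [0.0, 0.1, 0.9, 1.1, 1.0],
    "GENE_B": [0.0, -0.8, 0.1, -0.9, 0.0],
}
expression = {
    "GENE_A": [5.0, 5.2, 7.1, 7.4, 7.0],
    "GENE_B": [6.0, 6.1, 5.9, 6.0, 6.1],
}

correlations = {g: pearson(copy_number[g], expression[g]) for g in copy_number}
candidates = sorted(g for g, r in correlations.items() if r >= CORRELATION_CUTOFF)

for gene, r in correlations.items():
    print(f"{gene}: r = {r:.2f}")
print("dosage-driven candidates:", candidates)
```

Genes whose expression follows their copy number are candidate targets of the underlying amplification or deletion, whereas genes located in an aberrant region without a corresponding expression change are less likely to be drivers.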

2. Application of Gene Expression Microarrays in Hematological Tumors

The study of hematological malignancies has particularly benefited from gene expression analysis, and crucial discoveries about diagnosis, prognosis, and pathogenetic mechanisms of these diseases have been made. In acute myeloid leukemia (AML), good prognostic subgroups are defined by the presence of specific chromosomal rearrangements such as the translocations t(8;21) and t(15;17) or the inversion of chromosome 16, whereas translocations affecting the MLL gene in chromosome 11q23 or deletions of chromosome 5q or 7q characterize poor prognostic subgroups (19, 20). Gene expression profiling has been able to identify these leukemia subgroups with high accuracy. In two landmark papers, gene expression profiling detected not only previously defined genetic and prognostic subgroups in AML but also novel clusters with adverse prognosis (21, 22). A different approach evaluated the expression profiling of CD34+ hematopoietic stem/progenitor cells, revealing distinct subtypes of therapy-related AML (23). In B-cell acute lymphoblastic leukemia (B-ALL), gene expression profiles distinguished each of the prognostically important leukemia subtypes, including those with specific chromosomal translocations: t(1;19)-E2A-PBX1, t(9;22)-BCR-ABL, t(11q23)-MLL, and t(12;21)-TEL-AML1, as well as those with hyperdiploidy with >50 chromosomes. Further, within some of these genetic subgroups, those patients who eventually relapsed presented typical gene expression patterns that allowed their recognition (24). Gene expression studies have also deciphered novel oncogenic pathways in childhood T-cell acute lymphoblastic leukemia (T-ALL) (25), and in adult T-cell lymphoma in leukemic phase (26). Notably, these studies identified a T-ALL subgroup whose molecular signature (HOX11) was associated with a favorable prognosis, whereas subgroups expressing TAL1, LYL1, or HOX11L2 responded much worse to treatment (25).
A common aspect of these studies is the identification of molecular subgroups defined by oncogenes that are aberrantly expressed in the absence of chromosomal abnormalities. Therefore, gene expression microarrays can identify all leukemias within identical molecular subgroups, including cases with typical chromosomal rearrangements but also others that would be missed by standard cytogenetic and molecular techniques. One additional goal of gene expression microarrays has been to search for therapeutic targets in patients with leukemia. FLT3 mutations, a common genetic abnormality in AML, are an independent prognostic indicator of poor outcome and response to standard chemotherapy (27). In a complementary DNA (cDNA) microarray study of childhood leukemias, FLT3 was found to be overexpressed in patients carrying MLL gene translocations. Subsequent studies showed that FLT3 inhibitors are active against leukemias with MLL rearrangements in vitro and in vivo (28, 29). In conclusion, it is unquestionable that gene expression arrays have had an enormous impact on our current understanding of acute leukemias. To move on, clinical trials should evaluate novel therapies in patients who are stratified according to the molecular profiles determined at the time of diagnosis.

Molecular profiling has also been crucial in deciphering the pathogenesis of B-cell malignancies. DLBCL can be divided into molecular subgroups based on their cellular origins; these subgroups differed significantly in therapy response and cure rate (6, 30). Importantly, these molecular subsets of disease, namely germinal center DLBCL (GC-DLBCL) and activated B-cell DLBCL (ABC-DLBCL), were only distinguishable by gene expression profiling and not by other current diagnostic methods (6). Following these studies, immunohistochemistry-based assays were developed to classify ABC-DLBCL and GC-DLBCL cases on a routine basis (31). Additionally, the lymphochip survival-prediction data were further validated by measuring the expression of only six genes (LMO2, BCL6, FN1, CCND2, SCYA3, and BCL2) by quantitative PCR, which was sufficient to predict overall survival in patients with DLBCL treated either with cyclophosphamide, hydroxydaunomycin, Oncovin (vincristine), and prednisone (CHOP) or, more recently, with CHOP plus the anti-CD20 monoclonal antibody rituximab (R-CHOP) (32, 33). Most importantly, translation of this molecular knowledge to the clinic may lead to therapeutic tailoring in patients with DLBCL. For instance, ABC-DLBCL cases show constitutive activation of NF-κB genes and thus respond uniquely to NF-κB inhibitors (34). Further refinement of these investigations showed a cooperative signaling through the STAT3 and NF-κB pathways in a subset of ABC-DLBCL cases. A small-molecule inhibitor of JAK signaling, which blocked STAT3 signature expression, was toxic only for ABC-DLBCL lines and synergized with an NF-κB inhibitor (35). Additional studies with Affymetrix microarrays profiled molecular signatures of DLBCL cases with different responses to standard chemotherapy, thus revealing unique pathways associated with poor responses (36). Moreover, the use of multiple clustering and gene set enrichment analysis allowed the identification of three discrete subsets of DLBCL termed “oxidative phosphorylation,” “B-cell receptor/proliferation,” and “host response” (HR), pointing out that the tumor microenvironment and the host inflammatory response are defining features in DLBCL (37). These molecular routes altered in subsets of patients with DLBCL may represent important targets for therapeutic intervention using specific drugs (37–40), some of which are being used in phase II clinical trials (41, 42). Subsequent studies have revealed a transcriptional signature with differential expression of BCL6 target genes that can accurately identify DLBCL cases carrying genetic alterations of the BCL6 oncogene. Notably, the DLBCL subgroup with the BCL6 expression signature is uniquely sensitive to BCL6 inhibitors (43–45).

Other remarkable achievements of gene expression analysis of B-cell malignancies include the recognition of molecular links between apparently different entities such as Hodgkin disease and primary mediastinal B-cell lymphoma, suggesting a putative common cellular origin (46–48); the discovery of better prognostic markers, such as ZAP70 expression measurement as an indicator of survival length of patients with B-cell chronic lymphocytic leukemia (B-CLL) (49–52); a more accurate molecular definition of Burkitt lymphoma that expands the spectrum of the WHO criteria for this disease (53, 54); the correlation of survival duration in patients with follicular lymphoma (FCL) with gene expression profiles reflecting an interaction between tumor cells and infiltrating immune cells (55, 56); the understanding of the roles of cell cycle control and DNA repair pathways that correlate with cell proliferation and clinical outcome in mantle cell lymphoma (57, 58); the definition of a typical gene expression profile of hairy cell leukemia that reveals a phenotype related to memory B cells with altered expression of chemokine and adhesion receptors (59); and the investigation of the multistep transformation of monoclonal gammopathy of undetermined significance to multiple myeloma by global gene expression analysis (60–63).

Recently, expression microarrays have been used to investigate questions on leukemia biology and therapy in more complex functional model systems. Krivtsov and colleagues demonstrated in a mouse model of leukemia initiated by the MLL-AF9 oncogenic fusion that leukemia stem cells maintain the global identity of the progenitor cells from which they arose while activating a limited stem-cell or self-renewal-associated gene expression program characteristic of hematopoietic stem cells (64).
In a different report, Ngo and colleagues used a doxycycline-inducible retroviral vector for the expression of small hairpin RNAs (shRNAs) for 2,500 human genes in ABC-DLBCL and GC-DLBCL cell lines. Each vector was engineered to contain a unique 60-bp “bar code,” allowing the abundance of an individual shRNA vector within a population of transduced cells to be measured using microarrays of the bar code sequences. Results determined that a subset of shRNA vectors was depleted from the transduced cells when shRNA expression was induced, uncovering the CARD11 gene as a key component responsible for the constitutive NF-kB activation in ABC-DLBCL but not in GC-DLBCL (65). Further validating the screening, this group of investigators found mutations in exons encoding the coiled-coil domain of CARD11 gene that activated the NF-kB pathway in 9.6% of patients with ABC-DLBCL (66). These data point to CARD11 as an attractive therapeutic target in this lymphoma subgroup, but we still do not


Martínez-Climent et al.

know the genetic lesions in the remaining 90% of ABC-DLBCL cases with NF-kB signaling. Similar massive genomic screenings using RNA interference (RNAi) have been recently applied to other tumor cell types, providing alternative functional information of cancer cells (67–69). Palomero and colleagues applied gene expression microarrays to T-ALL cell lines that were classified as sensitive or resistant to gamma-secretase inhibitors, which block a proteolytic cleavage required for NOTCH1 activation. Among the gene targets that were found differentially expressed was the tumor suppressor PTEN. Further investigations demonstrated that NOTCH1 regulates the expression of PTEN and the activity of the phosphoinositol-3 kinase (PI3K)–AKT signaling pathway in normal and leukemic T cells (70). This novel observation suggests the need to simultaneously inhibit both pathways to improve therapeutic efficacy in T-ALL. Microarray technologies have been also used to measure the global expression of a class of small noncoding RNA species, known as microRNAs (miRNA) in tumors (71, 72). One of the cancers where miRNA profiling has provided critical information is B-CLL. Patients with B-CLL and prolonged survival were characterized by downregulation of miR-15a and miR-16-1 located at 13q14.3 (73, 74). On the other hand, B-CLL cases with unmutated IgVH or with elevated expression of ZAP70 showed high levels of TCL1 due to low-level expression of miR-29 and miR-181, which directly target this oncogene (75). These data suggest that B-CLL is a disease in which the main pathogenetic alterations may occur in miRNAs.
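The bar-code readout used in the shRNA screen of Ngo and colleagues described above lends itself to a simple calculation: an shRNA is scored as depleted when its bar-code signal drops after induction relative to the uninduced population. The following Python sketch illustrates that idea only. The function name, the pseudocount, the log2 threshold of -1, and all signal values are assumptions made for illustration and are not taken from the original study.

```python
import math


def barcode_depletion(uninduced, induced, pseudocount=1.0, threshold=-1.0):
    """Score shRNA depletion from bar-code microarray signals.

    uninduced / induced: dicts mapping an shRNA identifier to its bar-code
    signal before and after shRNA induction. Each signal is divided by the
    total signal of its sample so that overall intensity differences
    between the two arrays cancel out.
    Returns (log2 fold change per shRNA, sorted list of depleted shRNAs).
    """
    total_u = sum(uninduced.values())
    total_i = sum(induced.values())
    log2fc = {}
    for sh, signal_u in uninduced.items():
        u = (signal_u + pseudocount) / total_u
        i = (induced.get(sh, 0.0) + pseudocount) / total_i
        log2fc[sh] = math.log2(i / u)
    depleted = sorted(sh for sh, v in log2fc.items() if v <= threshold)
    return log2fc, depleted


# Hypothetical signals; the identifiers are illustrative only.
uninduced = {"shCARD11": 1000.0, "shCTRL_1": 1000.0, "shCTRL_2": 1000.0}
induced = {"shCARD11": 200.0, "shCTRL_1": 1400.0, "shCTRL_2": 1400.0}

log2fc, depleted = barcode_depletion(uninduced, induced)
print(depleted)
```

In a real screen, replicate hybridizations and a statistical test would replace the fixed threshold; the sketch only shows how relative bar-code abundance translates into a depletion call.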

3. DNA Copy Number Variation in Leukemia, Lymphoma, and Myeloma

Tumor genomes usually show a large diversity of abnormalities, ranging from point mutations to overt chromosomal aberrations, that have accumulated through the process of malignant transformation. Although array–CGH cannot detect small sequence mutations or reciprocal chromosomal translocations, this technology has facilitated the description of global portraits of DNA copy number aberrations in tumors with high precision and resolution (76). In hematologic malignancies, array–CGH has led to the identification of known amplicons as well as hidden gene amplifications unappreciated by other genetic screens, such as those containing c-MYC in 8q24 in DLBCL and FCL, REL and BCL11A genes in DLBCL, JAK2 and PDL2 in primary mediastinal B-cell lymphoma, BMI1 in mantle cell lymphoma, and CCND3 and BYSL in DLBCL (47, 77–80). Some of these aberrations showed unexpected patterns of complexity only discovered through array–CGH studies. For instance, a detailed mapping of the 18q21.3 amplicon disclosed two major

target sites involving BCL2 and MALT1 genes in FCL and mucosa-associated lymphoid tissue (MALT) lymphoma, respectively (81). A number of the amplicon targets included genes commonly involved in IG-related chromosomal translocations (82). Moreover, some of these genomic amplifications seem to originate after regional chromosomal translocations. For example, chromosome 8q24 sequences surrounding c-MYC loci are frequently amplified in many B-cell malignancies carrying a previous t(8;14)(q24;q32) (79, 83). This is also the case for the lymphomas that develop in mice deficient for P53 and nonhomologous end-joining (NHEJ) genes, which present complex genomic rearrangements with coamplification of c-MYC (chromosome 15) and IgH (chromosome 12) sequences (84). Whether these secondary genomic amplifications have any biological impact or merely reflect local genomic instability remains unknown. Overall, genomic amplification functions as a mechanism of oncogene activation alternative to IG-related translocations in B-cell lymphoma (Fig. 1a). In one interesting study, amplification of 7p22 in adult T-cell leukemia/lymphoma pinpointed CARD11 as the possible target, a gene that has been recently implicated as an NF-kB activator in ABC-DLBCL (66, 85). However, its implication in T-cell leukemia/lymphoma, which also shows constitutive NF-kB activation, awaits further studies. Noncoding microRNAs may also be the target of genomic amplification. In the chromosome 13q31.3 amplicon, commonly observed in FCL, GC-DLBCL, mantle cell lymphoma, and splenic marginal zone lymphoma, the miR-17–92 cluster is overexpressed up to 500-fold (Fig. 1b).
These microRNAs positively target the c-MYC oncogene in B-cell lymphomas (86, 87), and have been implicated in several biological processes depending on the cellular context, such as the control of monocytopoiesis through AML1 targeting and M-CSF receptor upregulation (88), or the promotion of proliferation and the inhibition of differentiation of lung epithelial progenitor cells (89). In three recent studies, the miR-17–92 cluster has been shown to have critical roles in normal B-cell lymphopoiesis as well as in the pathogenesis of B-cell lymphomas and some autoimmune disorders in mice (90–92).

Localization and rapid delineation of areas of genomic loss have been among the major achievements of the application of array–CGH in tumors. Initial reports mapped deletions of known tumor suppressor genes such as P53, P16, ARF, and FHIT that were inactivated with variable frequencies in many cancer types (93–99). Similar studies have also discovered the loci of novel genes inactivated in B-cell malignancies, such as the pro-apoptotic BIM in chromosome 2q13 that is disrupted by minimal biallelic deletions in mantle cell lymphoma; the PRDM1 gene in chromosome 6q21, frequently targeted by deletion of one allele and by truncating mutation of the remaining allele in ABC-DLBCL but not in GC-DLBCL; the NF-kB inactivator gene TNFAIP3, which shows biallelic deletion

Fig. 1 Examples of genomic aberrations and associated gene expression changes in selected areas of the human genome in hematologic malignancies.

in ocular MALT lymphoma, FCL, and DLBCL, and the INK4c/ P18 gene targeted by biallelic deletion or heterozygous loss and mutation of the remaining allele in mantle cell lymphoma (Fig. 1c) (100–108). The use of DNA microarrays containing 32,000 overlapping BACs has enhanced the detection sensitivity of array–CGH devices, for instance by mapping intra-immunoglobulin gene deletions at 2p11 and 22q11 chromosomes as small as 130 kb (109). Positional identification of suppressor genes in genomic deletions has also been investigated by array-CGH in mouse models of cancer. A pioneering report by Hodgson and colleagues used array-CGH to scan the genomes of mouse islet carcinomas, revealing regional alterations that are syntenic to human genome sequences containing candidate oncogenes and suppressor genes (110). In a different study, Mao and colleagues studied radiation-induced lymphomas from P53-deficient mice. Lymphomas from P53+/− mice, but not those from P53−/− mice, showed frequent LOH and a 10% mutation rate of FBXW7/hCDC4, a gene encoding a ubiquitin ligase implicated in the control of chromosome stability. Further investigations showed that FBXW7+/− mice have greater susceptibility to radiation-induced tumorigenesis, but most tumors retain and express the wild-type allele, indicating that FBXW7 is a haploinsufficient tumor suppressor gene (111). However, interpretation of these microarray approaches in mouse models of cancer may be difficult, because BAC array–CGH studies of normal genomes from different strains of common laboratory mice revealed important segmental and sequence variations (112). Genome-wide approaches using Affymetrix SNP–CGH technology have recently demonstrated their power of high resolution to identify new molecular lesions in different cancer types. 
These studies revealed deletion, amplification, intragenic mutation, and structural rearrangement in genes encoding principal regulators of B lymphocyte development and differentiation in 40% of B-ALL cases, preferentially targeting the PAX5 gene (113). In accordance with these unexpected data, Cobaleda and colleagues reported that mice lacking PAX5 in mature B cells developed aggressive B-cell malignancies, which were identified by their gene expression profile as progenitor cell tumors (114). In addition, CCAAT enhancer-binding protein (CEBP) transcription factors CEBPA and CEBPB, which down-regulate PAX5 in B cells (115), have been shown to be overexpressed in B-ALL through IG-related translocation (116). Collectively, these findings suggest that blockade of genes controlling B-cell development and differentiation, especially of PAX5 either by genetic inactivation or by functional suppression through CEBP family members, contributes to B-cell ALL pathogenesis.

Efforts are being made to translate the correlation of genomic and clinical data detected by DNA microarrays into more feasible and applicable clinical tests. B-CLL can be considered as a prototype disease where length of patient survival can be predicted

by the presence of chromosomal aberrations associated with poor prognosis (deletions of chromosomes 11q22–q23 and 17p13) or with favorable prognosis (deletion of 13q14 or cases with normal karyotype) (117). Routine detection of these genetic alterations is currently performed by the fluorescence in situ hybridization (FISH) technique of bone marrow or peripheral blood samples obtained at diagnosis in many laboratories throughout the world (118). An automated BAC array–CGH detected with high precision all of these changes in a series of 106 patients with B-CLL, especially in those with >50% of tumoral cells in blood or marrow samples (119). Mantle cell lymphoma (MCL) is also characterized by a set of genomic aberrations that target genes involved in the pathogenesis of the disease. Examples include the genomic amplification of chromosomes 8q24 affecting c-MYC, 10p13 involving BMI1 oncogene and 11q13 targeting CCND1/cyclin D1, and the losses of 8p21.3 including TRAIL-R1/R2 genes, 9p21 (INK4A/ARF), 11q23 (ATM), and 17p13.1 (P53) (83, 120, 121). The pattern of these alterations has been correlated with tumor phenotypes and, thus, blastoid variants of MCL usually display inactivation of P16/INK4A and P53 genes, whereas indolent forms of MCL, usually having mutated IgVH genes, frequently show deletion of chromosome 8p (83, 120, 121). Although most patients with MCL show poor clinical outcome with current immunochemotherapy regimens, the long-term survivors can be identified by a characteristic genomic profile defined by the absence of deletions of P53, P16/ARF, and chromosome 9q21-q22, and by the presence of the deletion of chromosome 1p21-p22 (83, 122). In both B-CLL and MCL, the development of disease-specific CGH microchips may be of value in the clinic, because they should allow testing of the genomic profiles as prognostic and predictive factors of response to novel therapies. 
In a recent report, high-density SNP–CGH arrays were used to analyze genome-wide changes of copy number and allele status in B-CLL samples from patients who were sensitive or resistant to MDM2 inhibitors. These studies conclusively demonstrate that P53 status is the major determinant of response to MDM2 inhibitors in B-CLL (123). In a study of 107 FCL diagnostic biopsies with an array–CGH platform containing more than 26,819 BAC clones covering >95% of the human genome, 68 regional alterations were identified in >10% of cases. Importantly, 11 of these areas were independent predictors of overall survival using a multivariate analysis that included the International Prognostic Index (IPI) score. Further, two of the 11 regions (deletions of 1p36 and 6q21-q24) were also predictors of transformation risk (Cheung et al., in press). These genetic data may be useful to identify FCL high-risk patients as candidates for risk-adapted therapies. The acquisition of UPD is a common event in cancer. Genome-wide SNP analysis has revealed large-scale cryptic regions of UPD in many hematologic tumors. In AML, these alterations are

nonrandom and contain homozygous mutations in genes known to be mutational targets in leukemia (WT1, FLT3, CEBPA, and RUNX1) (124, 125). A high proportion of patients with myeloproliferative disorders, including polycythemia vera, essential thrombocythemia, and chronic idiopathic myelofibrosis, carry a dominant gain-of-function mutation of JAK2 (126–128). Using SNP–CGH, the UPD of chromosome 9p typical of these entities has provided the molecular mechanism of homozygous mutation of JAK2 (129). These data imply that mutation of one allele precedes mitotic recombination, which acts as a “second hit” responsible for removal of the remaining wild-type allele, which is substituted with a copy of the mutated allele. An additional example is the identification of UPD surrounding the NF1 gene locus in cases of juvenile myelomonocytic leukemia associated with neurofibromatosis (130). In lymphoma, the mutation status of genes within areas of UPD is less established, although biallelic mutations of P53 and P16/ARF have been reported in cases with UPD of 17p and 9p, respectively. SNP–CGH arrays showed that in mantle cell lymphoma and in FCL most areas of UPD were coincident with known regions of chromosome deletion (104, 105, 131). However, UPD was also observed in chromosome 6p in 20–30% of initial biopsies from patients with FCL, an area not usually targeted by DNA copy number changes. To date, the gene or genes involved in this area have not been detected (104, 105). A different application of high-density SNP–CGH arrays has been the genome-wide linkage search of 206 families with B-CLL. These studies identified potential susceptibility loci on chromosomes 2q21.2, 6p22.1, and 18q21.1. Notably, none of the regions coincided with areas of common chromosomal abnormalities frequently observed for B-CLL (132).
These findings strengthen the argument for an inherited predisposition to B-CLL that might explain familial aggregation, and they support similar microarray studies in other familial cancers whose causative genes are unknown.
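The distinction drawn in this section between deletions, amplifications, and UPD (loss of heterozygosity without a change in copy number) can be expressed as a small decision rule on two per-segment summaries from a SNP array. The Python sketch below is illustrative only: the function name, the log2-ratio cutoffs of ±0.2, the 5% heterozygosity cutoff, and the example segments are assumptions, not values taken from the studies cited here, and a real analysis would compare the tumor with matched normal DNA.

```python
def classify_segment(log2_ratio, het_fraction,
                     gain=0.2, loss=-0.2, min_het=0.05):
    """Classify one genomic segment from SNP-array summary values.

    log2_ratio:   mean tumor/reference log2 copy-number ratio of the segment.
    het_fraction: fraction of SNPs in the segment called heterozygous.
    All cutoffs are illustrative assumptions.
    """
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    # Copy number looks normal: a near-absence of heterozygous calls
    # then points to copy-neutral LOH, that is, candidate UPD.
    if het_fraction < min_het:
        return "copy-neutral LOH (candidate UPD)"
    return "normal"


# Hypothetical segments: (label, mean log2 ratio, heterozygous fraction).
segments = [
    ("9p", 0.02, 0.01),
    ("13q14", -0.60, 0.00),
    ("8q24", 0.80, 0.30),
    ("2q", 0.00, 0.31),
]

calls = {name: classify_segment(ratio, het) for name, ratio, het in segments}
for name, call in calls.items():
    print(name, call)
```

The ordering of the tests matters: a deleted segment also lacks heterozygous calls, so copy number is checked first and LOH is only interpreted as UPD when the copy number is unchanged.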

4. Integrative Oncogenomics as a Tool to Discover Novel Cancer Genes

Initial comparative genomic studies evaluated the degree to which DNA copy number alterations contribute to variations in the transcriptional program of tumors (133). Using cDNA microarrays, Pollack and colleagues found that 62% of highly amplified genes in breast tumors showed moderately or highly elevated expression. However, the influence of low-level DNA copy number changes was much more limited and only 12% of all the variation in gene expression among the breast tumors was directly attributable to underlying genomic dosage (134). Again using breast cancer as a

model disease, Hyman and colleagues reported that both high- and low-level copy number changes had a substantial impact on gene expression, with 44% of the highly amplified genes showing overexpression and 10.5% of the highly overexpressed genes being amplified (135). A third study focused on the process of transformation of FCL to DLBCL, which is observed in more than one third of patients with FCL and is generally characterized by an aggressive clinical course and refractoriness to treatment. Parallel array–CGH and gene expression analyses revealed that FCL transformation was accompanied by a variable spectrum of recurrent genomic imbalances and gene expression changes. Among the approximately 600 genes that presented deregulated expression in the transformation phase, up to one third showed correlation with DNA copy number variation (136). Overall, these reports concluded that a fraction of transcriptomic modifications are a consequence of genomic changes in tumors. Since these studies, more sophisticated bioinformatics methods have been developed for determining whether altered patterns of gene expression correlate with chromosomal abnormalities. One such tool is Chromosomal Aberration Region Miner (ChARM), a robust and accurate expectation-maximization-based method for identification of segmental aneuploidies from gene expression and array–CGH microarray data, sensitive enough to detect statistically significant and biologically relevant subtle changes in mixed populations of cells (137). Likewise, DIGMAP is a powerful computational tool enabling the coupled analysis of microarray data with genome location (138). More complex tools include the VAMP (Visualization and Analysis of array–CGH, transcriptome, and other Molecular Profiles) software, developed as a graphical user interface for visualization of CGH arrays, transcriptome arrays, SNP–CGH arrays, LOH results, and chromatin immunoprecipitation arrays.
The interface offers the possibility of looking for recurrent regions of alterations, confrontation to transcriptome data or clinical information, and clustering (139). ARACNE is a different algorithm designed to scale up to the complexity of cellular regulatory networks present in microarray profiles, based on a theoretic approach that eliminates indirect interactions inferred by coexpression methods. For instance, authors demonstrated and validated a complex interactive network among the transcriptional targets of the c-MYC oncogene in B-cell lymphomas (140). One of the major advances of integrative oncogenomic approaches has been the identification of novel cancer genes. In one landmark report, Garraway and colleagues identified microphthalmia-associated transcription factor (MITF) as the target gene of a melanoma amplification by integrating SNP–CGH array maps with gene expression signatures derived from the NCI60 cell lines. Further investigation demonstrated that MITF represents a

“lineage survival” oncogene required for both melanoma development and metastatic spread (141). In the study by Yu and colleagues, integration of multiple diverse genomic data sets from prostate cancer models (in vitro cell line, in vivo tumor profiling, and genome-wide location data) to search for key target genes of the Polycomb family protein EZH2 identified the ADRB2 gene as a critical mediator of beta-adrenergic signaling (142). A number of additional papers have applied similar genetic screens to mouse models of cancer to discover new oncogenes. In a screen for gene copy number changes in mouse mammary tumors, a 350-kb amplicon from a region syntenic to a locus amplified in human cancers at chromosome 11q22 was detected. This amplicon contained only one gene, YAP, which encodes the mammalian ortholog of Drosophila Yorkie (Yki) and proved to be a regulator of cellular proliferation and apoptosis in epithelial cells (143). In a mouse model of hepatocarcinoma, genome-wide analyses of tumors revealed a similar amplification at mouse chromosome 9qA1 syntenic to human chromosome 11q22. Gene expression analyses delineated cIAP1 and YAP as candidate oncogenes that cooperated to promote tumorigenesis (144). A different study characterized metastatic variants in an induced mouse model of melanoma, identifying an acquired focal chromosomal amplification that corresponded to a much larger amplification in chromosome 6p25 in human metastatic melanomas. Further investigation demonstrated that NEDD9, the only gene within the minimal common region that exhibited amplification-associated overexpression, was a bona fide melanoma metastasis oncogene (145). Through the analysis of human and mouse models of B-cell lymphoma, Chang and colleagues demonstrated that c-MYC regulates a much broader set of miRNAs than previously anticipated. Notably, MYC overexpression promoted a widespread repression of miRNA expression, primarily through direct binding to miRNA promoters (146).
An important advantage of the simultaneous study of human and mouse tumors is that putative candidate genes can be functionally validated in vivo.

The identification of tumor suppressor genes in cancer by classical genetic methods has been difficult and slow. In one report, integration of genomic and gene expression microarray data was applied to localize suppressor genes. Within 20 homozygous deletion areas detected in 48 human B-cell lymphoma cell lines, a number of novel candidate genes were pinpointed (100). Notably, some of these genes were shown to be inactivated in lymphoma biopsies by various genetic and epigenetic mechanisms that substantially varied among the different lymphoma subgroups. Thus, the P53-inducible PIG7/LITAF was silenced by homozygous deletion in primary mediastinal B-cell lymphoma and by promoter hypermethylation in germinal center lymphoma, whereas the proapoptotic BIM gene showed homozygous deletion in mantle cell lymphoma and promoter hypermethylation in Burkitt lymphoma (100).

A different study evaluated the candidate target genes in chromosome 8p21.3 deletions delineated through high-resolution array–CGH of B-cell lymphomas. In previous reports, the presence of deletions of 8p in mantle cell lymphoma was associated with blood dissemination (83, 147). By comparing gene expression profiles of tumors with and without 8p deletion, only two genes within the 8p21.3 deletion, those encoding for the TRAIL receptors R1 and R2, showed significant downregulation in deleted tumors (148). However, a recent report discovered that deletion of BIN3, another gene included within the 8p21.3 commonly deleted region, generated B-cell lymphoma in aging mice (149). Loss of BIN3, which is a BAR adapter protein, did not affect normal cell proliferation but rather increased the motility of transformed cells. It is tempting to speculate that the loss of BIN3 may enhance B-cell lymphocyte migration, leading to a disseminated disease in patients with mantle cell lymphoma. A similar integrative microarray analysis revealed downregulation of the gene encoding P53-binding protein 1 (53BP1) in DLBCL with heterozygous deletion of chromosome 15q15, this deletion being more common in the BCR-DLBCL group (150). Although a reduced gene and protein dosage (haploinsufficiency) caused by the single-copy loss is suggested as the tumoral pathogenetic mechanism in these reports, further investigations are needed to validate this attractive hypothesis. A different strategy combined nonsense-mediated RNA decay microarrays and array–CGH for the genome-wide identification of genes with biallelic inactivation involving nonsense mutations and loss of the wild-type allele. This approach enabled the authors to identify previously unknown inactivating mutations in the receptor tyrosine kinase gene EPHB2, which were shown to be functionally important in the progression and metastasis of prostate cancer (151). 
Zardo and colleagues used an alternative approach that integrated array–CGH and restriction landmark genomic scanning for global analysis of aberrant methylation of CpG islands in a series of human glioblastomas (152). Results showed that most aberrant methylation events are focal and independent of genomic deletions, but a small subset of genes was affected by convergent methylation and deletion, including genes that exhibit tumor-suppressor activity such as SOCS1 and COE3. In a different study, Stransky and colleagues used a combination of transcriptome correlation map analysis and array–CGH to evaluate, on a large scale, epigenetic suppression of gene expression of whole genomic regions. The authors demonstrated such regional copy number-independent deregulation of transcription by long-range epigenetic silencing in a series of bladder carcinomas (153). In another study, the authors determined the expression profile of microRNAs in T24 cells, revealing that 17 out of 313 miRNAs were upregulated after DNA demethylation and histone deacetylase inhibition treatment. One of these, miR-127, was shown to repress the BCL6 oncogene, suggesting a role in the pathogenesis of this disease (154).

Multiple myeloma is one of the tumors where integrative oncogenomic approaches have been most successfully applied. Shaughnessy and colleagues performed microarray analysis on myeloma cells from 532 patients. Seventy genes, 30% of them mapping to chromosome 1, were linked to reduced length of survival. Importantly, most upregulated genes mapped to chromosome 1q (frequently amplified in myeloma), and downregulated genes mapped to chromosome 1p (frequently deleted in myeloma). These data suggest that altered transcriptional regulation of chromosome 1 genes contributes to multiple myeloma pathogenesis and can be used to identify high-risk disease (60). In a different study, high-resolution array–CGH data and expression profiles were determined in a collection of myeloma cell lines and patient biopsies. Unsupervised classification defined distinct genomic subtypes. Genomic and expression data integration generated a refined list of myeloma gene candidates, thereby providing a molecular framework for dissection of disease pathogenesis (155). More recently, two different groups investigated possible genetic lesions responsible for the constitutive NF-kB activation observed in multiple myeloma by integrating array–CGH and gene expression profiling data. Keats and colleagues found mutations in ten genes causing the inactivation of TRAF2, TRAF3, CYLD, and cIAP1/cIAP2 and activation of NFKB1, NFKB2, CD40, LTBR, TACI, and NIK that result primarily in constitutive activation of the noncanonical NF-kB pathway, with the single most common abnormality being inactivation of TRAF3 (156). Annunziata and colleagues compared the genetic profiles of multiple myeloma cell lines that were resistant or sensitive to an inhibitor of IkappaB kinase beta (IKKbeta) targeting the NF-kB pathway. Sensitive cell lines with NF-kB activation showed frequent genetic or epigenetic alteration of NIK, TRAF3, CYLD, cIAP1/cIAP2, CD40, NFKB1, or NFKB2 genes (157).
These two complementary reports uncovered frequent genetic lesions of genes in the NF-kB pathway, suggesting that NF-kB inhibitors hold promise for the treatment of this disease.
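The integrative analyses reviewed in this section share one basic step: for each gene, DNA copy number is compared with expression across the same set of tumors, and genes whose expression tracks gene dosage are retained as candidates. The following Python sketch shows that step under simple assumptions. The gene labels, the values, and the correlation cutoff of 0.8 are hypothetical, and a plain Pearson correlation stands in for the more elaborate methods (ChARM, DIGMAP, VAMP, ARACNE) described above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def dosage_driven_genes(copy_number, expression, min_r=0.8):
    """Return {gene: r} for genes whose expression follows copy number.

    copy_number / expression: dicts mapping a gene to a list of log2 ratios,
    one value per tumor, with tumors in the same order in both dicts.
    """
    hits = {}
    for gene, cn_values in copy_number.items():
        r = pearson(cn_values, expression[gene])
        if r >= min_r:
            hits[gene] = r
    return hits


# Hypothetical log2 ratios for five tumors.
copy_number = {
    "GENE_A": [0.0, 0.1, 1.2, 1.5, -0.1],
    "GENE_B": [0.0, 0.9, 1.1, 0.1, 1.0],
}
expression = {
    "GENE_A": [0.1, 0.0, 1.8, 2.1, -0.2],
    "GENE_B": [0.5, -0.3, 0.4, -0.2, 0.1],
}

hits = dosage_driven_genes(copy_number, expression)
print(sorted(hits))
```

Here GENE_A is overexpressed only in the tumors that carry the amplification and is retained, whereas GENE_B is amplified without a matching change in expression and is discarded, which mirrors the observation that only a fraction of amplified genes are overexpressed.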

5. Future Investigations: Integrative Computational Analysis of Novel High-Throughput Genetic Technologies in Cancer Biology

A myriad of new high-throughput technologies are being used in cancer research, including exon arrays to analyze alternative splicing, tiling arrays for high-resolution investigation of DNA and histone methylation patterns, on-chip chromatin immunoprecipitation to discover DNA–protein interactions, and protein microarrays to measure global protein expression portraits. Consequently, next comparative oncogenomic and proteomic assays will attempt to visualize these complex molecular interactions

in the context of highly connected and regulated cellular networks. As we witness these remarkable advances, our ultimate challenge is to use this comprehensive biological knowledge to accelerate the transition from current empirical therapies to tailored medicine.

6. Materials

6.1. Total RNA Preparation for Microarray Analysis

This protocol is suitable for total RNA sample preparation for microarray analysis from cell lines or fresh frozen tissues. RNA obtained this way is very clean and salt free (see Notes 1 and 2).

1. TRIzol® Reagent, Invitrogen Life Technologies.
2. RNeasy® Mini Kit, QIAGEN.
3. Absolute ethanol (store at room temperature).
4. 80% ethanol (store at room temperature).
5. IKA® T-10 Basic Homogenizer (for fresh frozen tissue).
6. Nanodrop ND-1000 Spectrophotometer.
7. 2100 Bioanalyzer and RNA 6000 Nano LabChip® kit, Agilent.

6.2. DNA Preparation for Microarray Analysis

This protocol is based on the procedure established by QIAGEN using their DNeasy® Blood & Tissue kit.

1. DNeasy® Blood & Tissue kit, QIAGEN.
2. Absolute ethanol.
3. Reduced EDTA TE buffer (10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0).
4. Nanodrop ND-1000 Spectrophotometer.

6.3. Oligonucleotide Gene Expression Microarrays

1. One-cycle target labeling and control reagents, Affymetrix.
2. Absolute ethanol.
3. 80% ethanol.
4. GeneChip Hybridization, Wash, and Stain kit, Affymetrix.
5. GeneChip Eukaryotic Hybridization Control Kit, Affymetrix, P/N 900454 (30 reactions) or P/N 900457 (150 reactions); contains Control cRNA and Control Oligo B2.
6. Nanodrop ND-1000 spectrophotometer.
7. 2100 Bioanalyzer and RNA 6000 Nano LabChip® kit, Agilent.
8. Hybridization Oven 640, Affymetrix.
9. Heatblock.

10. Fluidics Station 450, Affymetrix.
11. GeneChip® Scanner 3000, Affymetrix.

6.4. CGH to BAC Microarrays

1. 2.5× random primers (BioPrime DNA labeling system, Invitrogen). Store at −20°C.
2. Genomic DNA.
3. Klenow fragment (40 U/µL, BioPrime DNA labeling system, Invitrogen). Store at −20°C.
4. Cy3- and Cy5-labeled dCTP (1 mM, Amersham Pharmacia Biotech, Inc.).
5. 0.5 M EDTA, pH 8.0.
6. 1 M Tris–HCl, pH 7.6.
7. 10× dNTP mixture in sterile water: 3.7 mM dATP, dTTP, and dGTP (Invitrogen), 1.8 mM dCTP (Invitrogen), 10 mM Tris–HCl, pH 7.6, and 1 mM EDTA.
8. Sephadex G-50 spin column (Amersham Pharmacia Biotech, Inc.).
9. Human Cot-1 DNA (1 mg/mL, Invitrogen).
10. 20% sodium dodecyl sulfate (SDS) in sterile H2O (heat at 68°C to dissolve).
11. 100% ethanol. Store at −20°C.
12. 3.0 M sodium acetate, pH 5.2.
13. Dextran sulfate sodium salt (500,000 MW).
14. Formamide (re-distilled, ultra pure, Invitrogen). Store at −20°C.
15. 20× SSC (3.0 M NaCl, 0.3 M sodium citrate, pH 7.0).
16. Master mix: dissolve 1 g dextran sulfate in 5 mL of formamide, 1 mL of 20× SSC, and 1 mL dH2O. Adjust to pH 7.0 with approximately two drops of HCl.
17. PN buffer: 0.1 M sodium phosphate, 0.1% Nonidet P40, pH 8.0.
18. UV Stratalinker 2400 (Stratagene) capable of producing 130,000 × 100 mJ UV.
19. Rocking table (~1 rpm) inside a 37°C incubator.
20. Rubber cement (Ross, American Glue Corporation).
21. Silicon gasket (Press-to-seal, 2-mm thick, #62-6508-24, PGC Scientific).
22. 100% glycerol.
23. 10× phosphate-buffered saline (PBS).
24. Stereomicroscope.
25. Binder clips, medium size.

26. 1M Pixel CCD Imager (custom made; Dan Pincel, UCSF) or the two-color scanner arrayWoRxe Biochip Reader (Applied Precision, Issaquah, WA, USA), a white-light CCD-based system.

6.5. High-Resolution SNP–CGH Microarrays

1. Reduced EDTA TE buffer (10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0), TEKnova.
2. 250 ng genomic DNA per array; working stock, 50 ng/µL.
3. StyI (10,000 U/mL), New England Biolabs (NEB).
4. NspI (10,000 U/mL), New England Biolabs (NEB).
5. AccuGENE® Water, molecular biology grade, Cambrex.
6. T4 DNA Ligase, New England Biolabs (NEB).
7. Adaptor Nsp (50 µM), Affymetrix.
8. Adaptor Sty (50 µM), Affymetrix.
9. G-C Melt (5 M), Clontech.
10. dNTP (2.5 mM), Takara or Fisher Scientific.
11. PCR Primer 002 (100 µM), Affymetrix.
12. Clontech TITANIUM® Taq Polymerase (50×), Clontech.
13. All purpose Hi-Lo DNA Marker, Bionexus, Inc.
14. DNA amplification clean-up kit, to be used with Affymetrix DNA products (one plate). The kit contains RB buffer.
15. Fragmentation reagent (DNase I), Affymetrix.
16. 10× Fragmentation Buffer, Affymetrix.
17. 4% TBE Gel, BMA Reliant precast (4% NuSieve 3:1 Plus Agarose), Cambrex.
18. GeneChip® DNA Labeling Reagent (30 mM), Affymetrix.
19. Terminal deoxynucleotidyl transferase (30 U/µL), Affymetrix.
20. 5× Terminal deoxynucleotidyl transferase buffer, Affymetrix.
21. 5 M tetramethyl ammonium chloride (TMACL), Sigma.
22. MES Hydrate Sigma Ultra, Sigma.
23. MES Sodium salt, Sigma.
24. Denhardt’s Solution, Sigma.
25. Herring sperm DNA (HSDNA), Promega.
26. Human Cot-1 DNA®, Invitrogen.
27. Oligo control reagent, 0100 (OCR, 0100), Affymetrix.
28. GeneChip 250K array (one per sample).
29. GeneAmp® PCR System 9700 Thermocycler, Applied Biosystems.
30. GeneChip Hybridization oven 640.

Integrative Oncogenomic Analysis of Microarray Data in Hematologic Malignancies


31. Manifold-QIAvac multiwell unit, QIAGEN, P/N 9014579. 32. Biomek® Seal and Sample aluminum foil lids, Beckman. 33. Jitterbug® 115 VAC, Boekel Scientific. 34. QIAGEN® Vacuum regulator, QIAGEN.

7. Methods

7.1. Total RNA Preparation for Microarray Analysis

1a. For fresh frozen tissue samples: The amount of tissue required depends on the kind of tissue and ranges from 10 to 100 mg to obtain 10–300 µg of total RNA. Be careful not to let the tissue thaw before homogenization. Homogenize the tissue directly in TRIzol® reagent using an electric homogenizer with a small-gauge generator (5 mm). The recommended volume of TRIzol® is 1 mL for each 50–100 mg of tissue. Homogenize each sample tube at least three times for at least 1 min each time. Keep the samples on ice between rounds of homogenization because overheating can cause RNA degradation.
1b. For cell lines: Pellet the cells by centrifugation and completely remove the culture medium. Do not wash the cells at this point; proceed directly to lyse them by pipetting in the appropriate amount of TRIzol® reagent (recommended by the manufacturer: 1 mL per 5–10 × 10^6 cells).
2. Let the samples stand for 5 min at room temperature.
3. Pass the sample twice through a 25-gauge needle to reduce its viscosity.
4. Add 200 µL of chloroform per milliliter of TRIzol® used and shake the sample vigorously by hand for 15 s. Incubate for 1 min and shake again for 15 s.
5. Centrifuge the sample at 12,000 × g for 15 min at 2–8°C.
6. After centrifugation, the mixture separates into two phases: the colorless upper phase is the aqueous phase containing the RNA, and the pink lower phase (phenol–chloroform) contains DNA and proteins. Take 200 µL from the top layer and add it to 700 µL of QIAGEN RLT buffer in a new RNase-free tube. (Do not add 2-mercaptoethanol to the RLT buffer because it may increase background in the array.)
7. Add 500 µL of absolute ethanol to the sample (200 µL + 700 µL RLT). Mix well by vortexing.
8. Apply the mixture to a QIAGEN Mini or MicroElute spin column and spin for 15 s at 8,000 × g. Discard the flow-through and repeat the procedure until all of the sample has been loaded onto the column.
9. Replace the collection tube with a new one and wash the column by adding 500 µL of RPE buffer. Centrifuge for 15 s at 8,000 × g and discard the flow-through.
10. Add 700 µL of 80% ethanol and spin at 8,000 × g for 15 s. Repeat this step once to remove all guanidine salts.
11. Transfer the column to a new collection tube and spin for 5 min at top speed with the tube caps off to ensure removal of ethanol.
12. To elute the RNA, transfer the column to a new 1.5-mL RNase-free microfuge tube. Elute with 20 or 14 µL of RNase-free water for the Mini or MicroElute spin column, respectively.

7.2. Quality Control of RNA

To qualify RNA for microarray applications, it is important to measure its concentration, 260/280 ratio, 260/230 ratio, and integrity. We use a NanoDrop spectrophotometer to confirm that the concentration is at least 250 ng/µL, the 260/280 ratio is between 1.9 and 2.1, and the 260/230 ratio is greater than 1.5 (a lower value indicates the presence of salts that could inhibit labeling reactions). RNA integrity can be assessed by examining ribosomal RNA (rRNA) on a gel. Affymetrix recommends the Agilent 2100 Bioanalyzer capillary electrophoresis system, whose software calculates the RNA integrity number (RIN); in our experience, the RIN should be greater than 8.0 for the sample to perform well on the array.
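The acceptance criteria above can be written as a small checklist. The following is a minimal sketch in Python, assuming the concentration is recorded in ng/µL as reported by the spectrophotometer; the class and function names are illustrative and do not belong to any instrument software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RnaSample:
    name: str
    conc_ng_per_ul: float  # spectrophotometer concentration, ng/uL
    a260_280: float        # 260/280 absorbance ratio
    a260_230: float        # 260/230 absorbance ratio
    rin: float             # RNA integrity number from the Bioanalyzer


def rna_qc_failures(sample: RnaSample) -> List[str]:
    """Return the list of criteria the sample fails (empty list = pass)."""
    failures = []
    if sample.conc_ng_per_ul < 250:
        failures.append("concentration below 250 ng/uL")
    if not 1.9 <= sample.a260_280 <= 2.1:
        failures.append("260/280 outside 1.9-2.1")
    if sample.a260_230 <= 1.5:
        failures.append("260/230 not greater than 1.5")
    if sample.rin <= 8.0:
        failures.append("RIN not greater than 8.0")
    return failures


if __name__ == "__main__":
    sample = RnaSample("tumor_01", 310.0, 2.0, 1.9, 8.7)
    print(sample.name, rna_qc_failures(sample) or "passes QC")
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to decide whether a sample can be rescued (for example by re-purification when only the 260/230 ratio is low).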

7.3. DNA Preparation for Microarray Analysis

For tissue samples:
1a. The amount of tissue needed is variable, but 25 mg of tissue (up to 10 mg of spleen) may be suitable for this application. Cut the tissue into small pieces and place it in a 1.5-mL microcentrifuge tube. Add 180 µL Buffer ATL.
2a. Add 20 µL proteinase K (600 mAU/mL). Mix thoroughly by vortexing, and incubate at 55°C until the tissue is completely lysed (lysis can proceed overnight). During incubation, occasional vortexing is recommended to disperse the sample.
3a. Add 200 µL Buffer AL to the sample, and mix thoroughly by vortexing. Then add 200 µL ethanol (96–100%), and mix again by vortexing. It is essential that the sample, Buffer AL, and ethanol are mixed immediately and thoroughly to yield a homogeneous solution.
For cell lines:
1b. Start from approximately 5 × 10^6 cells; pellet them and wash twice with 1× PBS. Resuspend the pellet in 200 µL of 1× PBS.
2b. Add 20 µL proteinase K (600 mAU/mL) and 200 µL of Buffer AL, mix thoroughly by vortexing, and place at 70°C for 10 min.


3b. Then add 200 µL ethanol (96–100%), and again mix thoroughly by vortexing.
4. Pipet the sample (including any precipitate) into the DNeasy® Mini spin column placed in a 2-mL collection tube. Centrifuge at 6,000 × g for 1 min. Discard the flow-through and collection tube.
5. Place the DNeasy® Mini spin column in a new 2-mL collection tube, add 500 µL Buffer AW1, and centrifuge for 1 min at 6,000 × g. Discard the flow-through and collection tube.
6. Place the DNeasy® Mini spin column in a new 2-mL collection tube, add 500 µL Buffer AW2, and centrifuge for 3 min at 20,000 × g to dry the DNeasy® membrane. Discard the flow-through and collection tube.
7. Place the DNeasy® Mini spin column in a clean 1.5- or 2-mL microcentrifuge tube, and pipet 200 µL Buffer AE directly onto the DNeasy® membrane. Incubate at room temperature for 1 min, and then centrifuge for 1 min at 6,000 × g to elute. If Affymetrix SNP arrays are to be run, elute instead with a low-EDTA buffer (10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0), because EDTA adversely affects the downstream reactions.

7.4. Quality Control of DNA

The principal parameters for controlling DNA quality are concentration (for the Affymetrix 500K SNP array, it should be at least 50 ng/µL), a 260/280 ratio of approximately 1.9 for pure DNA, and a 260/230 ratio greater than 1.5 in salt-free samples. To determine DNA integrity, we perform gel electrophoresis on a 1–2% agarose 1× TBE gel. High-quality genomic DNA gives a band of 10–20 kb on the gel.

8. Oligonucleotide Gene Expression Microarrays

8.1. Introduction

We use the One-Cycle Eukaryotic Target Labeling Assay from Affymetrix. It is possible to start with total RNA (1–15 µg) or mRNA (0.2–2 µg). We usually begin with 2 µg of total RNA. It is essential to start with the same amount of RNA for all samples to be compared. The RNA is first reverse transcribed using a T7-Oligo(dT) Promoter Primer. The second-strand synthesis reaction is mediated by RNase H. The double-stranded cDNA obtained is then purified and used as a template in the subsequent in vitro transcription (IVT) reaction. The IVT reaction is performed in the presence of T7 RNA polymerase and a biotinylated nucleotide analog/ribonucleotide mix for complementary RNA (cRNA) amplification and biotin labeling. These biotinylated cRNA targets are then cleaned up, fragmented, and hybridized to GeneChip expression arrays (see Note 3).

8.2. Preparation of Poly-A RNA Controls for One-Cycle cDNA Synthesis (Spike-in Controls)

8.2.1. First-Strand cDNA Synthesis

The amount of Poly-A RNA Controls added is kept proportional to the amount of sample RNA, so it depends on the starting amount of sample. For 2 µg of RNA, 2 µL of a 1:50,000 dilution of Poly-A RNA Controls is used.
1. Mix the RNA sample, diluted poly-A RNA controls, and T7-Oligo(dT) Primer. Incubate the reaction for 10 min at 70°C. Then cool the sample at 4°C for at least 2 min.
2. In a separate tube, assemble the First-Strand Master Mix (per sample): 4.0 µL of 5× 1st Strand Reaction Mix; 2.0 µL of 0.1 M DTT; 1 µL of 10 mM dNTP.
3. Transfer 7 µL of First-Strand Master Mix to each RNA/T7-Oligo(dT) Primer mix for a final volume of 19 µL. Mix by flicking the tube a few times. Immediately place the tubes at 42°C and incubate for 2 min.
4. Add 1 µL of SuperScript II to each RNA sample for a final volume of 20 µL.
5. Incubate for 1 h at 42°C; then cool the sample for at least 2 min at 4°C.

8.2.2. Second-Strand cDNA Synthesis

1. Prepare the Second-Strand Master Mix (per sample): 91 µL RNase-free water; 30 µL of 5× 2nd Strand Reaction Mix; 3 µL of 10 mM dNTP; 1 µL E. coli DNA ligase; 4 µL E. coli DNA Polymerase I; 1 µL RNase H.
2. Add 130 µL of Second-Strand Master Mix to each first-strand synthesis sample from Section 8.2.1 for a total volume of 150 µL. Incubate for 2 h at 16°C.
3. Add 2 µL of T4 DNA Polymerase to each sample and incubate for an additional 5 min at 16°C.
4. Add 10 µL of 0.5 M EDTA and proceed to Section 8.2.3. Do not leave the reactions at 4°C for long periods of time.
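The volumes quoted in Sections 8.2.1 and 8.2.2 can be cross-checked by tracking the running reaction volume. The sketch below is illustrative only; the 12 µL assigned to the RNA/primer mix is inferred from the stated 19 µL total minus the 7 µL of master mix and is not given explicitly in the protocol.

```python
from itertools import accumulate

# (addition, volume in uL) in the order the protocol adds them.
# The 12 uL RNA/primer volume is an inference (19 uL total - 7 uL master mix).
ADDITIONS_UL = [
    ("RNA + poly-A controls + T7-Oligo(dT) primer", 12.0),
    ("First-Strand Master Mix", 7.0),
    ("SuperScript II", 1.0),
    ("Second-Strand Master Mix", 130.0),
    ("T4 DNA polymerase", 2.0),
    ("0.5 M EDTA", 10.0),
]


def running_volumes(additions):
    """Return (addition, cumulative volume in uL) after each addition."""
    totals = accumulate(volume for _, volume in additions)
    return [(name, total) for (name, _), total in zip(additions, totals)]


if __name__ == "__main__":
    for name, total in running_volumes(ADDITIONS_UL):
        print(f"after {name}: {total:.0f} uL")
```

The checkpoints reproduce the totals stated in the steps (19, 20, and 150 µL) and give 162 µL once the T4 DNA polymerase and EDTA have been added.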

8.2.3. Cleanup of Double-Stranded cDNA

1. Add 600 µL of cDNA Binding Buffer to the double-stranded cDNA synthesis preparation and mix by vortexing for 3 s. The color of the mixture should be yellow. If not, add 10 µL of 3 M sodium acetate, pH 5.0, and mix.
2. Apply 500 µL of the sample to the cDNA Cleanup Spin Column sitting in a 2-mL collection tube, and centrifuge for 1 min at ≥8,000 × g. Discard the flow-through. Reload the spin column with the remaining mixture and centrifuge as above. Discard the flow-through and collection tube.


3. Transfer the spin column into a new 2-mL collection tube. Wash the spin column with 750 µL of the cDNA Wash Buffer. Centrifuge for 1 min at ≥8,000 × g. Discard the flow-through.
4. Open the cap of the spin column and centrifuge for 5 min at maximum speed to completely eliminate ethanol. Discard the flow-through and collection tube.
5. Transfer the spin column into a 1.5-mL collection tube, and pipet 14 µL of cDNA Elution Buffer directly onto the spin column membrane. Incubate for 1 min at room temperature and centrifuge for 1 min at maximum speed (≤25,000 × g) to elute.

8.3. Synthesis of Biotin-Labeled cRNA

1. Transfer the required amount of template cDNA (if 2 µg of total RNA was used as starting material, use 12 µL of purified cDNA) to RNase-free microfuge tubes and add the following reaction components in the order indicated: 8 µL RNase-free water; 4 µL of 10× IVT Labeling Buffer; 12 µL IVT Labeling NTP Mix; and 4 µL IVT Labeling Enzyme Mix. Do not assemble the reaction on ice, because spermidine in the 10× IVT Labeling Buffer can precipitate the template cDNA.
2. Incubate at 37°C for 16 h in a thermal cycler.

8.3.1. Cleanup and Quantification of Biotin-Labeled cRNA

1. Add 60 µL of RNase-free water to the IVT reaction and mix by vortexing for 3 s.
2. Add 350 µL IVT cRNA Binding Buffer to the sample and mix by vortexing for 3 s.
3. Add 250 µL ethanol (96–100%) to the lysate, and mix well by pipetting. Do not centrifuge at this step.
4. Apply the sample (700 µL) to the IVT cRNA Cleanup Spin Column sitting in a 2-mL collection tube. Centrifuge for 15 s at ≥8,000 × g. Discard the flow-through and collection tube.
5. Transfer the spin column into a new 2-mL collection tube. Pipet 500 µL IVT cRNA Wash Buffer onto the spin column. Centrifuge for 15 s at ≥8,000 × g to wash. Discard the flow-through.
6. Pipet 500 µL of 80% (v/v) ethanol onto the spin column and centrifuge for 15 s at ≥8,000 × g. Discard the flow-through.
7. Centrifuge for 5 min with caps off at maximum speed to allow complete drying of the membrane. Discard the flow-through and collection tube.
8. Transfer the spin column into a new 1.5-mL collection tube, and pipet 21 µL of RNase-free water directly onto the spin column membrane. Centrifuge for 1 min at maximum speed (≤25,000 × g) to elute.


For subsequent quantification of the purified cRNA, we dilute the eluate 1:4 or 1:5 in RNase-free water. We use a NanoDrop to determine the concentration of the cRNA obtained and a Bioanalyzer to examine the sizes of the labeled products (which should have an average size of 1,580 nucleotides). If total RNA is used as starting material, it is necessary to calculate an adjusted cRNA yield to reflect carryover of unlabeled total RNA. Using an estimate of 100% carryover, the adjusted cRNA yield is:

adjusted cRNA yield = RNA_m − (total RNA_i)(y)

where RNA_m is the amount of cRNA measured after IVT (µg), total RNA_i is the starting amount of total RNA (µg), and y is the fraction of the cDNA reaction used in the IVT.

8.3.2. Fragmenting the cRNA for Target Preparation

1. Fragmentation of cRNA is a critical step of the protocol. For a 49-format array, we fragment 20 µg of cRNA (in a volume ranging from 1 to 21 µL). The final volume of the fragmentation reaction is 40 µL, of which 8 µL is 5× Fragmentation Buffer.
2. Incubate the reaction at 94°C for 35 min. Put on ice after the incubation. Save an aliquot for analysis on the Bioanalyzer. This standard fragmentation procedure should produce a distribution of RNA fragment sizes from approximately 35 to 200 bases. The undiluted, fragmented cRNA sample is ready for hybridization. If you are not going to proceed immediately, store the sample at −20°C (or −70°C for longer-term storage).
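The adjusted-yield formula and the fragmentation set-up lend themselves to a short calculation. The following Python sketch uses illustrative function names and assumes that RNase-free water makes up the remainder of the 40-µL fragmentation reaction, which the protocol implies but does not state.

```python
def adjusted_crna_yield_ug(measured_ug, total_rna_input_ug, fraction_cdna_used):
    """adjusted cRNA yield = RNA_m - (total RNA_i)(y), all amounts in ug."""
    return measured_ug - total_rna_input_ug * fraction_cdna_used


def fragmentation_setup_ul(crna_conc_ug_per_ul, crna_ug=20.0,
                           final_ul=40.0, buffer_5x_ul=8.0):
    """Volumes (uL) for one fragmentation reaction.

    Assumption: water fills the reaction up to the final volume.
    """
    crna_ul = crna_ug / crna_conc_ug_per_ul
    if not 1.0 <= crna_ul <= 21.0:
        raise ValueError("cRNA volume must fall between 1 and 21 uL")
    water_ul = final_ul - buffer_5x_ul - crna_ul
    return {
        "cRNA": crna_ul,
        "5x Fragmentation Buffer": buffer_5x_ul,
        "RNase-free water": water_ul,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 60 ug measured, 2 ug total RNA input, all cDNA used.
    print(adjusted_crna_yield_ug(60.0, 2.0, 1.0))
    print(fragmentation_setup_ul(crna_conc_ug_per_ul=2.0))
```

The range check mirrors the 1 to 21 µL window given in step 1; a cRNA preparation more dilute than about 0.95 µg/µL cannot supply 20 µg within that volume.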

8.3.3. Hybridization

1. Mix the following for each target, scaling up volumes for hybridization to multiple probe arrays:
– 15 µg fragmented cRNA (final concentration 0.05 µg/µL)
– 5 µL control oligonucleotide B2, 3 nM (final concentration 50 pM)
– 15 µL of 20× Eukaryotic Hybridization Controls (bioB, bioC, bioD, cre) (final concentration 1.5 pM)
– 150 µL of 2× hybridization buffer (final concentration 1×)
– 30 µL DMSO (final concentration 10%)
– Nuclease-free water, up to 300 µL
2. Equilibrate the probe array to room temperature immediately before use.
3. Heat the hybridization cocktail to 99°C for 5 min in a heat block.


4. Meanwhile, wet the array by filling it through one of the septa with an appropriate volume of 1× prehybridization buffer using a micropipettor and appropriate tips. Incubate the probe array at 45°C for 10 min with rotation.
5. Transfer the hybridization cocktail that was heated at 99°C in step 3 to a 45°C heat block for 5 min.
6. Spin the hybridization cocktail(s) at maximum speed in a microcentrifuge for 5 min to remove any insoluble material from the hybridization mixture.
7. Remove the buffer solution from the probe array cartridge and fill with 200 µL (for a 49-format array) of the clarified hybridization cocktail, avoiding any insoluble matter at the bottom of the tube.
8. Place the probe array into the hybridization oven, set to 45°C. To avoid stress to the motor, load the probe arrays in a balanced configuration around the axis. Rotate at 60 rpm. Hybridize for 16 h.

8.3.4. Staining, Washing, and Scanning

Staining and washing are performed using the Fluidics Station 450 (Affymetrix). At this point, the most important issue is to select the correct script for your chip. For example, the HG-U133 Plus 2.0 array uses protocol FS450_0001. The script contains the directions to stain and wash the microarray: the number of washing or staining cycles, the temperature, and the buffer. For the HG-U133 Plus 2.0 array, place Stain Cocktail 1 in sample holder 1, Stain Cocktail 2 in sample holder 2, and Array Holding Buffer in sample holder 3. In the final step, the probe array is filled with Array Holding Buffer; arrays can be stored for 3 h at 4°C in the dark before scanning. The scanner used is the GeneChip® Scanner 3000. A complete image of the scanned array is stored as a .DAT file (scanned image, full information), and the GCOS software generates the .CEL file, which represents the first summarization step because the image is reduced to a median intensity per probe cell.

9. CGH to BAC Microarrays

9.1. Introduction

The arrays for CGH consist of a linker-adapter PCR representation of BAC clones printed on a substrate. Each clone contains at least one sequence tagged site (STS) and is mapped to the human genome sequence. Clones containing unique sequences near telomeres and clones containing genes known to be significant in cancer and medical genetics are included. Hybridization to these arrays allows detection of single-copy gains and losses compared with diploid cells even in the presence of normal cell contamination (see Notes 4–6).

9.2. Random-Primed Labeling of Genomic DNA for Array–CGH Analysis

A typical random-primed labeling procedure is described. Labeling is carried out in a 25-µL reaction volume containing 600 ng genomic DNA, 1× random primers, 40 U Klenow DNA polymerase, Cy3- or Cy5-labeled dCTP, and 1× dNTP mixture.
1. Mix 600 ng genomic DNA with 10 µL of 2.5× random primer solution and bring the volume up to 21 µL with sterile H2O.
2. Denature the DNA by heating the mixture at 99°C in a PCR machine for 10 min. Briefly centrifuge and place on ice.
3. Add 2.5 µL of the 10× dNTP mixture, 1 µL of 1 mM Cy3- or Cy5-labeled dCTP, and 0.6 µL Klenow DNA polymerase. Incubate at 37°C for 12–20 h.
4. Remove unincorporated nucleotides from the DNA. Place a Sephadex G-50 column in a 1.5-mL tube and pre-spin the column at 760 × g for 1 min. Discard the flow-through. Tap the end of the tube on a paper towel to remove the remaining liquid from the neck of the tube. Place the column in a clean 1.5-mL tube, apply the sample onto the column, and spin at 760 × g for 2 min.
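As a quick check of the stated working concentrations, the stock concentrations and pipetted volumes from steps 1 and 3 can be converted to final concentrations in the nominal 25-µL reaction. This is a minimal sketch; the function name is illustrative, and the pipetted volumes actually sum to 25.1 µL, so the results are nominal values.

```python
def final_concentration(stock_conc, stock_ul, total_ul):
    """C_final = C_stock * V_stock / V_total (units follow stock_conc)."""
    return stock_conc * stock_ul / total_ul


NOMINAL_REACTION_UL = 25.0

# Volumes pipetted in steps 1 and 3 (uL).
PIPETTED_UL = {
    "DNA + primers + water": 21.0,
    "10x dNTP mixture": 2.5,
    "1 mM Cy-dCTP": 1.0,
    "Klenow": 0.6,
}

if __name__ == "__main__":
    # 10 uL of 2.5x random primers -> fold concentration
    print(final_concentration(2.5, 10.0, NOMINAL_REACTION_UL), "x random primers")
    # 2.5 uL of 10x dNTP mixture -> fold concentration
    print(final_concentration(10.0, 2.5, NOMINAL_REACTION_UL), "x dNTP mixture")
    # 1 uL of 1 mM (1000 uM) Cy-dCTP -> uM
    print(final_concentration(1000.0, 1.0, NOMINAL_REACTION_UL), "uM Cy-dCTP")
    print(round(sum(PIPETTED_UL.values()), 1), "uL pipetted in total")
```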

9.3. Hybridization of Fluorescently Labeled Genomic DNA for Array–CGH Analysis

1. Preparation of the array for the hybridization:

(a) Expose a printed array to 260,000 µJ (2,600 × 100 µJ) of UV using a Stratalinker. Place the slide in the Stratalinker with the array facing up. Over-crosslinking the slide might decrease the fluorescent hybridization signal.

(b) Fill a 10-mL syringe with rubber cement and fit a 200-µL pipet tip onto the syringe outlet. You may have to cut 1–2 mm off the wide end of the pipet tip for it to fit well. Apply a rubber cement ring around each array on the slide, using a stereomicroscope to observe the area of the array. Air-dry and apply a second thick layer of rubber cement on top of the first layer. Air-dry the rubber cement.

2. Preparation of samples for hybridization:

(a) Combine 25 µL labeled test genomic DNA, 25 µL labeled reference genomic DNA, and 40–50 µg human Cot-1 DNA. Precipitate the DNA sample mixture by adding 2.5 volumes of ice-cold 100% ethanol and 0.1 volume of 3 M sodium acetate, pH 5.2. Vortex the solution briefly and collect the precipitate by centrifugation at 14,000 × g for 45 min at 4°C.


(b) Carefully aspirate and discard the supernatant. Wipe the excess liquid from the tube and air-dry the pellet for approximately 5–10 min. Dissolve the pellet in 7 µL dH2O, 14 µL 20% SDS, and 49 µL master mix. Incubate for 1 h at room temperature to completely resuspend.

3. Denature the DNA sample at 73°C for 13 min and then incubate at 37°C for 1–2 h to allow the Cot-1 DNA to anneal to repetitive sequences.
4. Place the array on a heat block set at 37°C for 5 min to warm the array.
5. Apply the sample (step 3) onto the array. Keep the sample at 37°C until just before application to the array to reduce nonspecific binding of the probe to the array surface. Place a silicon gasket around the edge of the slide and lay a clean glass slide on top, aligning the edges with the gasket. Clamp the assembly together using binder clips. Incubate the array for 48–68 h at 37°C on a rocking table (~1 rpm).
6. Disassemble the array assembly and rinse the hybridization solution from the slide under a stream of PN buffer. It is preferable to leave the rubber cement on the array at this time, because it will not affect the rinsing steps that follow.
7. Wash the slides once in 50% formamide, 2× SSC, pH 7.0, for 15 min at 45°C, followed by a 15-min wash in PN buffer at room temperature. The washes can conveniently be done in slide staining jars (Coplin jars) placed in water baths.
8. At the bench, carefully remove the rubber cement with forceps while keeping the array moist with PN buffer.
9. Mount the slide in a DAPI solution to stain the array spots (90% glycerol, 10% PBS, 1 µM DAPI).

9.4. Microarray Image Capture with CCD Imager and Microarray Image Quantification

To capture the microarray image with the CCD Imager and quantify it, we use the software “UCSF SPOT,” available at www.jainlab.org/downloads.html. This software returns, for each spot, the ratio between the sample under analysis and the control sample, expressed as a log2 ratio. The numerical data are processed and saved in an Excel table. Using the software “SPROC,” the data from the spot files are normalized, generating the final log2 ratio data file with the standard deviation (each clone is summarized by the median of its three replicate spots). At the same time, the program arranges the BACs by chromosome and genomic position (http://genome.cse.ucsc.edu).
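The summarization described above (a log2 ratio per spot, then the median of the three replicate spots per clone) can be sketched as follows. This is not the SPOT or SPROC implementation; the function names are illustrative, and the median-centering used here for normalization is an assumption, since the normalization method is not specified in the text.

```python
import math
from statistics import median, stdev


def log2_ratio(test_intensity, reference_intensity):
    """log2 of the test/reference intensity ratio for one spot."""
    return math.log2(test_intensity / reference_intensity)


def summarize_clone(test_spots, reference_spots):
    """Median and standard deviation of the replicate-spot log2 ratios."""
    ratios = [log2_ratio(t, r) for t, r in zip(test_spots, reference_spots)]
    return median(ratios), stdev(ratios)


def center_on_median(clone_log2):
    """Assumed normalization: subtract the median log2 ratio of all clones."""
    shift = median(clone_log2.values())
    return {clone: value - shift for clone, value in clone_log2.items()}


if __name__ == "__main__":
    # Hypothetical intensities for one BAC clone printed in triplicate.
    med, sd = summarize_clone([210.0, 195.0, 205.0], [100.0, 100.0, 100.0])
    print(f"log2 ratio {med:.2f} (SD {sd:.2f})")
```

With this convention a single-copy gain in a pure diploid tumor population gives a log2 ratio near log2(3/2) ≈ 0.58, and a single-copy loss gives a value near log2(1/2) = −1; contamination with normal cells pulls both toward zero.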


10. High-Resolution SNP–CGH Microarrays

10.1. Introduction

The purpose of the Affymetrix GeneChip Mapping 500K Assay is to genotype more than 500,000 SNPs in samples of genomic DNA. The Mapping 500K Set comprises two arrays and two assay kits. The protocol starts with 250 ng of genomic DNA per array and generates SNP genotype calls for approximately 250,000 SNPs for each array of the two-array set. The assay reduces the complexity of human genomic DNA up to tenfold by first digesting the genomic DNA with the NspI or StyI restriction enzyme and then ligating adaptor sequences onto the DNA fragments. The complexity is further reduced by a PCR procedure optimized for fragments of a specified size range. After these steps, the PCR products are fragmented, end-labeled, and hybridized to a GeneChip array (see Note 7). To minimize contamination of the samples, the use of two separate rooms is recommended: one is the pre-PCR clean room (or an area for the DNA template that is free of PCR products), and the other is the PCR staging room or main laboratory, where the remaining steps are performed.

10.1.1. Step 1. Genomic DNA Preparation

1. Thoroughly mix the genomic DNA by vortexing at high speed for 3 s.
2. Determine the concentration of each genomic DNA sample.
3. Based on OD measurements, dilute each sample to 50 ng/µL using reduced EDTA TE buffer.

10.1.2. Step 2. Restriction Enzyme Digestion

Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
– Reference genomic DNA 103 is supplied in both the GeneChip® Mapping 250K Nsp and Sty Assay kits. This DNA can be used as a positive control.
1. Depending on the restriction enzyme used, prepare the following Digestion Master Mix ON ICE (for multiple samples, make a 5% excess). For the NspI digestion: 9.75 µL H2O; 2 µL of 10× NE Buffer 2; 2 µL of 10× BSA (1 mg/mL); and 1 µL NspI (10 U/µL). For the StyI digestion: 9.75 µL H2O; 2 µL of 10× NE Buffer 3; 2 µL of 10× BSA (1 mg/mL); and 1 µL StyI (10 U/µL). Note: The BSA is supplied as 100× (10 mg/mL) and needs to be diluted 1:10 with molecular biology-grade water before use.
2. Add 5 µL of diluted genomic DNA (50 ng/µL) to each tube. The total amount of genomic DNA is 250 ng for each restriction enzyme.


3. Aliquot 14.75 µL of the Digestion Master Mix to each tube containing DNA. Mix gently and spin at 400 × g.
4. Place the tubes in the thermal cycler and run the 500K Digest program: 37°C, 120 min; 65°C, 20 min; hold at 4°C. Store the sample at −20°C if not proceeding to the next step.

10.1.3. Step 3. Ligation

Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
– Ligase buffer contains ATP and should be thawed and held at 4°C. Avoid multiple freeze-thaw cycles, according to the vendor’s instructions.
1. Depending on the restriction enzyme used, prepare the following Ligation Master Mix ON ICE in the pre-PCR area (for multiple samples, prepare a 5% excess). For NspI: 0.75 µL Adaptor Nsp (50 µM); 2.5 µL of 10× T4 DNA Ligase Buffer; and 2 µL T4 DNA Ligase (400 U/µL). For StyI: 0.75 µL Adaptor Sty (50 µM); 2.5 µL of 10× T4 DNA Ligase Buffer; and 2 µL T4 DNA Ligase (400 U/µL). Total volume: 5.25 µL.
2. Aliquot 5.25 µL of the Ligation Master Mix into each digested DNA sample (19.75 µL) to bring the total volume to 25 µL. Mix gently and spin at 400 × g for 1 min at 4°C.
3. Place the tubes into a thermal cycler and run the 500K Ligate program: 16°C, 180 min; 70°C, 20 min; hold at 4°C.

Store samples at −20°C if not proceeding to the next step within 60 min.

4. Dilute each DNA ligation reaction by adding 75 µL of molecular biology-grade water to the 25-µL reaction (1:4 dilution).

10.1.4. Step 4: PCR

Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
1. Prepare the following PCR master mix ON ICE (three PCR reactions per sample) in the pre-PCR clean room for NspI or StyI ligation reactions and vortex at medium speed for 2 s (for multiple samples, make a 5% excess). For one PCR: 39.5 µL H2O; 10 µL of 10× Clontech TITANIUM® Taq PCR Buffer; 20 µL of 5 M G-C Melt; 14 µL of 2.5 mM dNTPs; 4.5 µL of 100 µM PCR Primer 002; and 2 µL of 50× Clontech TITANIUM® Taq Polymerase. Note: 90 µg of PCR product is needed for fragmentation.


2. Transfer 10 µL of each diluted ligated DNA to the corresponding three PCR tubes.
3. Add 90 µL PCR master mix to obtain a total volume of 100 µL.
4. Mix gently and spin samples at 400 × g for 1 min.
5. Place in the thermal cycler in the main laboratory and run the 500K PCR program (optimized for the GeneAmp® PCR System 9700 Thermocycler): 94°C, 3 min; 30× (94°C, 30 s; 60°C, 45 s; 68°C, 15 s); 68°C, 7 min; hold at 4°C.
6. Run 3 µL of each PCR product mixed with 3 µL of 2× Gel Loading Dye on a 2% TBE gel at 120 V for 1 h. PCR products can be stored at −20°C if not proceeding to the next step within 60 min.

10.1.5. Step 5: PCR Purification and Elution with Clontech Clean-Up Plate

1. Connect a vacuum manifold to a suitable vacuum source able to maintain approximately 600 mbar.
2. Place a Clean-Up Plate on top of the manifold. Cover wells that are not needed with a PCR plate cover. We recommend covering the plate with the aluminum cover and removing the portion of the cover over the wells to be used.
3. Add 8 µL of 0.1 M EDTA (diluted in water from the 0.5 M EDTA stock) to each PCR reaction. Seal the plate with the plate cover, vortex at medium speed for 2 s, and spin at 400 × g for 1 min.
4. Consolidate the three PCR reactions for each sample into one well of the Clean-Up Plate.
5. Apply a vacuum and maintain at 600 mbar until the wells are completely dry.
6. Wash the PCR products by adding 50 µL molecular biology-grade water and dry the wells completely (~20 min). Repeat this step two additional times for a total of three water washes.
7. Switch off the vacuum source and release the vacuum.
8. Carefully remove the Clean-Up Plate from the vacuum manifold and immediately:

(a) Blot the plate on a stack of clean absorbent papers to remove any liquid that might remain on the bottom of the plate.

(b) Dry the bottom of each well with an absorbent wipe.

9. Add 45 µL RB buffer to each well. Cover the plate with PCR plate cover film and seal tightly. Moderately shake the Clean-Up Plate on a plate shaker for 10 min at room temperature.
10. Recover the purified PCR product to clean tubes by pipetting the eluate out of each well and transferring it to the corresponding tube.


10.1.6. Step 6: Quantification of Purified PCR Products


1. Add 2 µL of the purified PCR product to 198 µL molecular biology-grade water and mix well.
2. Read the absorbance at 260 nm. Ensure that the reading is in the quantitative range of the instrument (generally 0.2–0.8 OD).
3. Apply the convention that one absorbance unit at 260 nm equals 50 µg/mL for double-stranded PCR products.
4. For fragmentation:

(a) Transfer 90 µg of each of the purified DNA samples to the corresponding wells of a new plate.

(b) Bring the total volume of each well up to 45 µL by adding the appropriate volume of RB buffer.

(c) Cover the plate with PCR plate cover film and seal tightly.

(d) Vortex at medium speed for 2 s, and spin down at 400 × g for 1 min.

10.1.7. Step 7: Fragmentation

Before proceeding:
– Preheat the thermal cycler to 37°C before setting up the fragmentation reaction.
– Prepare the fragmentation dilution immediately prior to use.
– Perform all the dilution and mixing steps on ice.
1. Preheat the thermal cycler to 37°C.
2. Add 5 µL of 10× Fragmentation Buffer to each sample (45 µL) in the corresponding tube ON ICE, giving a total volume of 50 µL.
3. Examine the label of the GeneChip Fragmentation Reagent tube for the units per microliter definition, and calculate the dilution:
Y = microliters of stock Fragmentation Reagent.
X = units of stock Fragmentation Reagent per microliter (see the label on the tube).
0.05 U/µL = final concentration of diluted Fragmentation Reagent.
120 µL = final volume of diluted Fragmentation Reagent (enough for 20 reactions).
Y = (0.05 U/µL × 120 µL) / (X U/µL).
4. Dilute the stock of Fragmentation Reagent to 0.05 U/µL as follows:

(a) Place the water, Fragmentation Buffer, and Fragmentation Reagent on ice.

(b) Combine the reagents ON ICE in the order described in the example listed below.

(c) Vortex at medium speed for 2 s.


An example of dilution is: 105 µL H2O; 12 µL of 10× Fragmentation Buffer; and 3 µL Fragmentation Reagent, giving a total volume of 120 µL.
5. Divide the diluted Fragmentation Reagent into the tubes required.
6. Add 5 µL of diluted Fragmentation Reagent (0.05 U/µL) to the PCR sample tubes containing Fragmentation mix on ice. Pipet up and down several times to mix. The total volume for each sample is 55 µL.
7. Mix the tubes gently and spin briefly at 400 × g at 4°C.
8. Place the samples in a preheated thermocycler as quickly as possible, and run the 500K Fragment program: 37°C, 35 min; 95°C, 15 min; hold at 4°C.
9. Spin the samples to collect the liquid at the bottom of the tube.
10. Dilute 4 µL of fragmented PCR product with 4 µL gel loading dye and run on a 4% TBE gel. Proceed immediately to the labeling step if the result matches the example below.

10.1.8. Step 8: Labeling

Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
1. Prepare the Labeling Master Mix ON ICE and vortex at medium speed for 2 s (for multiple samples, make a 5% excess): 14 µL of 5× TdT Buffer; 2 µL of 30 mM GeneChip® DNA Labeling Reagent; and 3.5 µL TdT (30 U/µL).
2. Aliquot 19.5 µL of Labeling Master Mix into the tubes containing 50.5 µL of fragmented DNA, giving a total volume of 70 µL.
3. Mix the reaction gently and spin at 400 × g for 1 min at 4°C.
4. Run the 500K Label program: 37°C, 4 h; 95°C, 15 min; hold at 4°C.
5. Spin the plate briefly at 400 × g to collect the reaction at the bottom of the tube. Samples can be stored at −20°C if not proceeding to the next step.
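Looking back at Steps 6 and 7, the quantification and dilution arithmetic that feeds the labeling reaction can be sketched in a few lines of Python. The function names are illustrative; the code assumes the 1:100 dilution used for the absorbance reading (2 µL in 198 µL), the convention of 50 µg/mL per A260 unit for double-stranded DNA, and that water makes up the remainder of the diluted Fragmentation Reagent, as in the worked example of Step 7.

```python
def dsdna_conc_ug_per_ul(a260, dilution_factor=100.0):
    """Stock concentration from the A260 of a diluted aliquot.

    One A260 unit = 50 ug/mL = 0.05 ug/uL of double-stranded DNA.
    """
    return a260 * 0.05 * dilution_factor


def fragmentation_input_ul(conc_ug_per_ul, target_ug=90.0, well_ul=45.0):
    """Return (uL of purified PCR product, uL of RB buffer) for one well."""
    dna_ul = target_ug / conc_ug_per_ul
    if dna_ul > well_ul:
        raise ValueError("PCR product too dilute to fit 90 ug in 45 uL")
    return dna_ul, well_ul - dna_ul


def fragmentation_reagent_dilution_ul(stock_u_per_ul, final_u_per_ul=0.05,
                                      final_ul=120.0, buffer_10x_ul=12.0):
    """Y = final concentration * final volume / X, plus buffer and water."""
    reagent_ul = final_u_per_ul * final_ul / stock_u_per_ul
    water_ul = final_ul - buffer_10x_ul - reagent_ul
    return {
        "water": water_ul,
        "10x Fragmentation Buffer": buffer_10x_ul,
        "Fragmentation Reagent": reagent_ul,
    }


if __name__ == "__main__":
    conc = dsdna_conc_ug_per_ul(a260=0.5)   # hypothetical reading
    print(conc, fragmentation_input_ul(conc))
    print(fragmentation_reagent_dilution_ul(stock_u_per_ul=2.0))
```

A stock of 2 U/µL is the value implied by the dilution example in Step 7 (3 µL of reagent in 120 µL); the actual value must be read from the tube label for each lot.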

10.1.9. Step 9: Target Hybridization

Before proceeding: – It is important to allow the arrays to equilibrate to room temperature completely. Unwrap the array and leave on the bench top for 15 min. – DMSO is light sensitive. It should be contained in a dark glass bottle. – Preparation of the 12× MES Stock: 70.4 g MES Hydrate; 193.3 g MES Sodium salt; 800 mL molecular biology-grade water.

Integrative Oncogenomic Analysis of Microarray Data in Hematologic Malignancies

Mix and adjust the volume to 1,000 mL. The pH should be between 6.5 and 6.7. Filter through a 0.2-µm filter. Do not autoclave. Store between 2 and 8°C, and shield from light.
1. Prepare the Hybridization Cocktail Master Mix in the order described. For multiple samples, prepare a 5% excess: 12 µL of 12× MES; 13 µL DMSO; 13 µL of 50× Denhardt’s Solution; 3 µL of 0.5 M EDTA; 3 µL HSDNA (10 mg/mL); 2 µL OCR 0100; 3 µL human Cot-1 DNA (1 mg/mL); 1 µL of 3% Tween-20; and 140 µL of 5 M TMACL. Mix well.
2. Transfer each of the labeled samples to a 1.5-mL Eppendorf tube. Aliquot 190 µL of the Hybridization Cocktail Master Mix into the 70 µL of labeled DNA sample, giving a final volume of 260 µL.
3. Heat the 260 µL of hybridization mix and labeled DNA at 99°C in a heat block for exactly 10 min to denature.
4. Cool on crushed ice for 10 s.
5. Spin briefly at 400 × g in a microfuge to collect any condensate.
6. Place the tubes at 49°C for 1 min.
7. Inject 200 µL of denatured hybridization cocktail into the array.
8. Hybridize at 49°C for 16–18 h at 60 rpm in the oven. The remaining hybridization mix can be stored at −20°C for future use.
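As an illustration of the 5% excess rule in step 1, the following Python sketch scales the per-sample Hybridization Cocktail volumes to a batch of samples. The function name `scale_master_mix`, the `excess` parameter, and the choice of eight samples are assumptions made for this example and are not part of the protocol; the per-sample volumes are those listed in step 1.

```python
# Per-sample volumes (in microliters) of the Hybridization Cocktail Master Mix,
# kept in the order of addition given in step 1.
HYB_COCKTAIL_UL = [
    ("12x MES", 12.0),
    ("DMSO", 13.0),
    ("50x Denhardt's Solution", 13.0),
    ("0.5 M EDTA", 3.0),
    ("HSDNA (10 mg/mL)", 3.0),
    ("OCR 0100", 2.0),
    ("Human Cot-1 DNA (1 mg/mL)", 3.0),
    ("3% Tween-20", 1.0),
    ("5 M TMACL", 140.0),
]


def scale_master_mix(recipe, n_samples, excess=0.05):
    """Scale per-sample volumes to n_samples plus a fractional excess.

    Hypothetical helper: returns (reagent, volume) pairs in the input order,
    with volumes rounded to 0.01 uL.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + excess)
    return [(name, round(volume * factor, 2)) for name, volume in recipe]


# Volume of master mix added to each labeled sample.
per_sample_total = sum(volume for _, volume in HYB_COCKTAIL_UL)

# Example: a batch of eight samples (an arbitrary number chosen for illustration).
for name, volume in scale_master_mix(HYB_COCKTAIL_UL, n_samples=8):
    print(f"{name:28s}{volume:9.2f} uL")
```

The same helper could be applied to the Labeling Mix of Step 8 by passing its three per-sample volumes instead.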

11. Notes

1. The source of RNA is a major determinant of the success of each individual microarray experiment. In this procedure, between 3 and 50 µg of high-quality RNA (usually corresponding to a 15- to 100-mm³ tumor biopsy) is needed. Ideally, tumor biopsies snap-frozen in liquid nitrogen immediately after surgical resection (or at least stored at −80°C to prevent RNA degradation) should be used (158, 159). This requirement limits the study of large series of patient samples, most of which are not stored under adequate conditions, especially in retrospective analyses or in series of rare tumors collected from different institutions. In addition, this requirement makes it problematic to study early tumors or biopsies obtained through minimally invasive methods such as fine-needle aspiration (159–161). An alternative method to preserve biological specimens involves suspending the tissue in a preservative such as RNAlater

Martínez-Climent et al.

(Ambion, Austin, TX, USA), followed by snap freezing of the tissue the next day. This method obviates the immediate need for liquid nitrogen and preserves the integrity of the RNA to be used in microarray experiments (160).
2. Although there are novel methods to extract high-quality RNA from small amounts of tumor (even from a single cell) and from formalin-fixed tissues, the utility of these RNAs should be extensively evaluated and carefully validated in gene expression microarrays (160, 162).
3. Tumors are composed of different cell types, including malignant cells, stromal and inflammatory cells, and blood vessels. The proportion of these cell populations varies between and within tumors. Because this heterogeneity can complicate the interpretation of microarray results, careful selection of the tumors to be included in the study is an important step. In addition, a detailed histopathological analysis of each tumor sample is mandatory. In cases with a low percentage of tumor cells, microdissection of the tumor cells in biopsies, or cell sorting by flow cytometry in blood, marrow aspirates, effusions, or disaggregated lymph nodes, may be a good choice (163, 164). However, expression of the nonmalignant surrounding cells may also be informative, and in some situations analysis of both isolated tumor cells and whole tumors may be worthwhile (55). One additional issue is the inclusion of normal cell populations in the microarray study to allow comparison of the genetic profiles of tumors with those of their putative cells of origin and of the normal surrounding cells (3, 165).
4. In array-based CGH using BAC clones, several factors influence the success of the analysis. First, the general heterogeneity of the spotted BACs, which differ in their proportion of repetitive sequences and gene content, produces variable hybridization signal intensities. Second, as in gene expression microarrays, the sample may contain “contaminating” nontumoral surrounding cells. These normal cells have two DNA copies genome wide (with the exception of the X and Y chromosomes in male patients) and, in contrast to gene expression profiling, their analysis does not provide any biological information to the study. Thus, array–CGH should be limited to cases with more than 50% tumoral cells, because lower proportions may yield a normal genomic profile corresponding to the normal cells (76). Third, production varies among the arrays printed at each laboratory, including the few commercially available BAC microarrays. Fourth, these arrays can be used to analyze paraffin-embedded tissues, but success largely depends on the quality and integrity of the DNA isolated from fixed cells.

Despite these difficulties, whole-genome BAC arrays of approximately 1 Mb resolution (including 3,000–4,000 probes) have been successfully applied to search for genomic changes in many cancer types, allowing an accurate description and mapping of areas of genomic amplification and deletion (14, 120, 136, 166–168). These alterations can be easily confirmed and visualized in the tumoral cells by complementary fluorescence in situ hybridization (FISH) using the same BACs as probes (169).
5. Array–CGH devices can also be applied to scan tumor genomes in other species, predominantly laboratory mice (112, 170), dogs (171), and Drosophila (172).
6. More recently, tiling-resolution human DNA microarrays with more than 32,000 overlapping BAC clones covering the entire human genome have been developed, allowing the identification of minute DNA alterations in tumors that had not previously been detected (109, 173). However, the large number of BAC clones, which cannot be individually verified, and the inclusion of only one BAC per array (instead of the three to five BACs spotted on the 1-Mb BAC arrays) have limited the application of these initial arrays.
7. One advantage of SNP–CGH arrays is that only the test (tumoral) DNA is hybridized on the chip; no normal DNA is needed as a control. Results for a particular sample are generally “normalized” against available data obtained from a pool of normal DNAs; however, to increase sensitivity and avoid false-positive results, analysis of tumoral and normal DNA from each individual on two different arrays is usually recommended (174–176). Important limitations of this technology include the poor-quality results obtained with DNA extracted from paraffin-embedded tissues and the limited ability to detect areas of UPD in biopsies with more than 50% nontumoral cells.
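The effect of nontumoral cells described in Notes 4 and 7 can be illustrated with a simple mixture model. The sketch below assumes that the measured copy number is the average of the tumor and normal copy numbers weighted by the tumor cell fraction, and that normal cells carry two copies (autosomes only). The function name and the example values are hypothetical and are not taken from the protocol.

```python
import math


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2(test/reference) ratio for a locus in a mixed sample.

    Assumed model: observed copies = normal + fraction * (tumor - normal).
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    observed = normal_copies + tumor_fraction * (tumor_copies - normal_copies)
    if observed <= 0:
        # Homozygous deletion in a pure tumor sample: no signal expected.
        return float("-inf")
    return math.log2(observed / normal_copies)


# A single-copy loss (1 copy in tumor cells) at decreasing tumor content.
for fraction in (1.0, 0.7, 0.5, 0.3):
    ratio = expected_log2_ratio(tumor_copies=1, tumor_fraction=fraction)
    print(f"tumor fraction {fraction:.0%}: expected log2 ratio {ratio:+.2f}")
```

Under this model, a single-copy loss in a sample with 50% tumor cells yields a log2 ratio of about −0.42 instead of −1, which shows how the signal shrinks toward the normal profile as tumor content falls.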

12. Integrative Oncogenomics: Correlation of Genomic Aberrations and Gene Expression Data

We describe, step by step, our recommended sequence of algorithms and statistical tests for integrating expression data with copy number data.
1. Derive gene expression levels and raw copy number data
The data from the expression and copy number .cel files must be preprocessed to remove noise and to make the arrays comparable with one another.

For gene expression data, RMA, GCRMA (177, 178), dChip, or other methods can be used. The authors recommend RMA, because it has become the de facto standard for obtaining the expression level of a gene. To derive the raw copy number, there are several methods, such as CNAT, CNAG (179), dChip, and Aroma.affymetrix (180). The differences among them are marginal. The most accurate seems to be Aroma.affymetrix, but it requires the user to be comfortable with the R programming language. CNAT, CNAG, and dChip provide convenient user interfaces that Aroma does not. There are other packages for the R programming language (SNPChip). The main disadvantage of these packages, which Aroma.affymetrix does not share, is that all of the information in the .cel files must be held in memory, limiting the number of arrays that can be analyzed to a few tens, depending on the type of array. Affymetrix chips come with special annotation files called chip definition files (cdf), which specify how individual probes are grouped into probe sets. We recommend using the cdf provided by the Brainarray Website instead of the Affymetrix default files (181). This website updates these files frequently, improving the results of the analysis. In addition, in these definition files each probe set corresponds to a single gene, whereas in the Affymetrix default files a gene can be represented by several probe sets, making it difficult to know which one is correct.
2. Segmentation of the raw copy number data
Copy number alterations occur in segments of the genome: a whole chromosome, a chromosome arm, or part of an arm. This fact can be used to extract the parts (segments) of the genome that have the same copy number. The procedure that obtains these parts from the raw copy number data is called segmentation. There are various segmentation algorithms. The three most widely used are circular binary segmentation (CBS) (176), hidden Markov models (HMM), and CGHSeq (182). CNAT, CNAG, and dChip provide HMM segmentation, whereas Aroma uses CBS segmentation. CGHSeq must be used under the Matlab platform. A major drawback of HMM is that the number of states, and the corresponding copy numbers, have to be established beforehand. If a tumor sample is contaminated with normal tissue, the copy number will no longer be an integer, and HMM may fail to discern copy number alterations. CBS and

CGHSeq do not have this problem; they provide an estimate of the copy number for each of the segments.
3. Assign copy number values to the genes
After the computations described above, each SNP has a copy number assigned. These data have to be combined to assign each gene its corresponding copy number. The copy number of a gene is the mean of the copy numbers of the genomic segments in which the gene is located. Special attention has to be paid to genes within which the copy number changes, because aberrant splice forms can occur.
4. Remove effects in expression data that are not related to the position in the genome
Segmentation can also be applied to expression data to locate segments of the genome whose genes are overexpressed or underexpressed. For expression data, it is better to apply CBS or CGHSeq, because there is no obvious way to establish the states beforehand (as HMM requires). Another possibility is to apply a filter (a moving average across genomic position) to the normalized expression data. The weights of this filter can follow a Gaussian distribution (Gaussian filter).
5. Detection of cytobands whose genes have their copy number or their expression significantly modified
The authors suggest performing a hypergeometric test to detect which cytobands contain genes that show a significant variation in copy number (increase or decrease). The hypergeometric test has four parameters: N (the total number of genes), n (the total number of genes in the cytoband), K (the total number of genes with increased copy number), and k (the number of genes within the cytoband with increased copy number). The test provides a p value that describes whether the number of genes with increased copy number is especially large, i.e., statistically significant, for a particular cytoband. The test can be performed for all of the cytobands in the genome (~300) and for all of the samples within the study. The same procedure can be applied to gene expression to detect cytobands whose genes show a significant variation in their expression.
6. Global analysis of copy number and expression changes within a study
A simple way to describe which loci in the genome vary within a study is to show the percentage of samples that have a variation in copy number (increase or decrease) and a coherent variation in gene expression, e.g., the percentage of samples that show both an increase in copy number and upregulation (Fig. 2).
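The hypergeometric test of step 5 can be sketched in Python as follows. This is a minimal illustration that uses only the standard library; the function name and the example counts are hypothetical, and the one-sided upper-tail p value, P(X ≥ k), is an assumed reading of "especially large" in the text.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of genes
    K: total number of genes with increased copy number
    n: number of genes in the cytoband
    k: number of genes in the cytoband with increased copy number
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    lowest = max(k, 0, n + K - N)   # smallest feasible count that is >= k
    highest = min(n, K)             # largest feasible count
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(lowest, highest + 1)
    )
    return favourable / comb(N, n)


# Hypothetical example: 20,000 genes, 1,500 of them gained in this sample;
# a cytoband holding 60 genes, 15 of which are gained.
p_value = hypergeom_upper_tail(N=20000, K=1500, n=60, k=15)
print(f"p = {p_value:.3g}")
```

The same function applies to copy number decreases or to expression changes by redefining K and k accordingly.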

Fig. 2. Results of the study of 29 lymphoma cell lines. Two chromosomes are shown (chromosomes 17 and 18), each with two graphics. The upper plot shows the percentage of samples with copy number increased and decreased. The lower plot shows the percentage of samples with copy number and expression increased (or decreased). It can be seen, for example, that 17q21.31 shows several genes that have increased both their copy number and their expression. 18q21.31 also shows an increase in copy number and expression. The gene BCL2 is located in this region.
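The percentages plotted in Fig. 2 (step 6) could be computed along the following lines. The sketch assumes that each sample has already been given a discrete call per locus (+1 for increase, −1 for decrease, 0 for no change) for both copy number and expression. How those calls are thresholded is not specified in the text, and the function name and example data are hypothetical.

```python
def summarize_locus(copy_calls, expr_calls):
    """Percentage of samples with copy number changes and coherent expression.

    copy_calls, expr_calls: per-sample calls for one locus, in the same
    sample order, each +1 (increase), -1 (decrease), or 0 (no change).
    """
    if len(copy_calls) != len(expr_calls) or not copy_calls:
        raise ValueError("need two non-empty call lists of equal length")
    n_samples = len(copy_calls)
    pairs = list(zip(copy_calls, expr_calls))

    def percent(count):
        return 100.0 * count / n_samples

    return {
        "gain": percent(sum(c > 0 for c, _ in pairs)),
        "loss": percent(sum(c < 0 for c, _ in pairs)),
        "gain_and_up": percent(sum(c > 0 and e > 0 for c, e in pairs)),
        "loss_and_down": percent(sum(c < 0 and e < 0 for c, e in pairs)),
    }


# Made-up calls for one locus in eight samples.
copy_calls = [1, 1, 0, -1, 1, 0, 0, -1]
expr_calls = [1, 0, 0, -1, 1, 1, 0, 0]
summary = summarize_locus(copy_calls, expr_calls)
for label, value in summary.items():
    print(f"{label:14s}{value:6.1f}%")
```

Repeating this over all loci of a chromosome would give the two tracks of each panel: the first two entries for the upper plot and the last two for the lower plot.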

References 1. Chung CH, Bernard PS, Perou CM. (2002) Molecular portraits and the family tree of cancer. Nature genetics. 32(Suppl), 533–540. 2. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, et al. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nature genetics. 14, 457–460. 3. Brentani RR, Carraro DM, Verjovski-Almeida S, Reis EM, Neves EJ, de Souza SJ, et al. (2005) Gene expression arrays in cancer research: methods and applications. Critical Reviews in oncology/hematology. 54, 95–105. 4. Staudt LM, Dave S. (2005) The biology of human lymphoid malignancies revealed by gene expression profiling. Advances in immunology. 87, 163–208. 5. Hoheisel JD. (2006) Microarray technology: beyond transcript profiling and genotype analysis. Nature reviews. Genetics. 7, 200–210. 6. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 403, 503–511. 7. Tinker AV, Boussioutas A, Bowtell DD. (2006) The challenges of gene expression microarrays for the study of human cancer. Cancer cell. 9, 333–339. 8. Sotiriou C, Piccart MJ. (2007) Taking geneexpression profiling to the clinic: when will molecular signatures become relevant to patient care? Nature reviews. Cancer. 7, 545–553. 9. Unger MA, Rishi M, Clemmer VB, Hartman JL, Keiper EA, Greshock JD, et al. (2001) Characterization of adjacent breast tumors using oligonucleotide microarrays. Breast cancer research. 3, 336–341. 10. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, et al. (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science (New York, NY). 258, 818–821. 11. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nature genetics. 20, 207–211. 12. 
Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, et al. (1997) Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes, chromosomes & cancer. 20, 399–407. 13. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, et al. (2000) Quantitative mapping of amplicon structure by array

CGH identifies CYP24 as a candidate oncogene. Nature genetics. 25, 144–146. 14. Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, et al. (2001) Assembly of microarrays for genome-wide measurement of DNA copy number. Nature genetics. 29, 263–264. 15. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature genetics. 23, 41–46. 16. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, et al. (2004) Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proceedings of the National Academy of Sciences of the United States of America. 101, 17765–17770. 17. Lindblad-Toh K, Tanenbaum DM, Daly MJ, Winchester E, Lui WO, Villapakkam A, et al. (2000) Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide polymorphism arrays. Nature biotechnology. 18, 1001–1005. 18. Zender L, Lowe SW. (2008) Integrative oncogenomic approaches for accelerated cancer-gene discovery. Current opinion in oncology. 20, 72–76. 19. Lowenberg B, Downing JR, Burnett A. (1999) Acute myeloid leukemia. The New England journal of medicine. 341, 1051–1062. 20. Rowley JD. (2001) Chromosome translocations: dangerous liaisons revisited. Nature reviews. Cancer. 1, 245–250. 21. Bullinger L, Dohner K, Bair E, Frohling S, Schlenk RF, Tibshirani R, et al. (2004) Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. The New England journal of medicine. 350, 1605–1616. 22. Valk PJ, Verhaak RG, Beijen MA, Erpelinck CA, Barjesteh van Waalwijk van Doorn-Khosrovani S, Boer JM, et al. (2004) Prognostically useful gene-expression profiles in acute myeloid leukemia. The New England journal of medicine. 350, 1617–1628. 23. Qian Z, Fernald AA, Godley LA, Larson RA, Le Beau MM. 
(2002) Expression profiling of CD34+ hematopoietic stem/progenitor cells reveals distinct subtypes of therapy-related acute myeloid leukemia. Proceedings of the National Academy of Sciences of the United States of America. 99, 14925–14930. 24. Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, et al. (2002) Classification,

subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer cell. 1, 133–143. 25. Ferrando AA, Neuberg DS, Staunton J, Loh ML, Huard C, Raimondi SC, et al. (2002) Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer cell. 1, 75–87. 26. Kari L, Loboda A, Nebozhyn M, Rook AH, Vonderheid EC, Nichols C, et al. (2003) Classification and prediction of survival in patients with the leukemic phase of cutaneous T cell lymphoma. The Journal of experimental medicine. 197, 1477–1488. 27. Thiede C, Steudel C, Mohr B, Schaich M, Schakel U, Platzbecker U, et al. (2002) Analysis of FLT3-activating mutations in 979 patients with acute myelogenous leukemia: association with FAB subtypes and identification of subgroups with poor prognosis. Blood. 99, 4326–4335. 28. Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, et al. (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature genetics. 30, 41–47. 29. Armstrong SA, Kung AL, Mabon ME, Silverman LB, Stam RW, Den Boer ML, et al. (2003) Inhibition of FLT3 in MLL. Validation of a therapeutic target identified by gene expression based classification. Cancer cell. 3, 173–183. 30. Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England journal of medicine. 346, 1937–1947. 31. Hans CP, Weisenburger DD, Greiner TC, Gascoyne RD, Delabie J, Ott G, et al. (2004) Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood. 103, 275–282. 32. Lossos IS, Czerwinski DK, Alizadeh AA, Wechser MA, Tibshirani R, Botstein D, et al. (2004) Prediction of survival in diffuse largeB-cell lymphoma based on the expression of six genes. 
The New England journal of medicine. 350, 1828–1837. 33. Lossos IS, Morgensztern D. (2006) Prognostic biomarkers in diffuse large B-cell lymphoma. Journal of clinical oncology. 24, 995–1007. 34. Lam LT, Davis RE, Pierce J, Hepperle M, Xu Y, Hottelet M, et al. (2005) Small molecule inhibitors of IkappaB kinase are selectively toxic for subgroups of diffuse large B-cell lymphoma defined by gene expression profiling. Clinical cancer research. 11, 28–40.

35. Lam LT, Wright G, Davis RE, Lenz G, Farinha P, Dang L, et al. (2008) Cooperative signaling through the signal transducer and activator of transcription 3 and nuclear factor-{kappa} B pathways in subtypes of diffuse large B-cell lymphoma. Blood. 111, 3701–3713. 36. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. (2002) Diffuse large B-cell lymphoma outcome prediction by geneexpression profiling and supervised machine learning. Nature medicine. 8, 68–74. 37. Monti S, Savage KJ, Kutok JL, Feuerhake F, Kurtin P, Mihm M, et al. (2005) Molecular profiling of diffuse large B-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood. 105, 1851–1861. 38. Chen L, Monti S, Juszczynski P, Daley J, Chen W, Witzig TE, et al. (2008) SYK-dependent tonic B-cell receptor signaling is a rational treatment target in diffuse large B-cell lymphoma. Blood. 111(4), 2230–2237. 39. Su TT, Guo B, Kawakami Y, Sommer K, Chae K, Humphries LA, et al. (2002) PKC-beta controls I kappa B kinase lipid raft recruitment and activation in response to BCR signaling. Nature immunology. 3, 780–786. 40. Smith PG, Wang F, Wilkinson KN, Savage KJ, Klein U, Neuberg DS, et al. (2005) The phosphodiesterase PDE4B limits cAMP-associated PI3K/AKT-dependent apoptosis in diffuse large B-cell lymphoma. Blood. 105, 308– 316. 41. Robertson MJ, Kahl BS, Vose JM, de Vos S, Laughlin M, Flynn PJ, et al. (2007) Phase II study of enzastaurin, a protein kinase C beta inhibitor, in patients with relapsed or refractory diffuse large B-cell lymphoma. Journal of clinical oncology. 25, 1741–1746. 42. Shipp MA. (2007) Molecular signatures define new rational treatment targets in large B-cell lymphomas. Hematology/the Education Program of the American Society of Hematology. 2007, 265–269. 43. Polo JM, Dell’Oso T, Ranuncolo SM, Cerchietti L, Beck D, Da Silva GF, et al. 
(2004) Specific peptide interference reveals BCL6 transcriptional and oncogenic mechanisms in B-cell lymphoma cells. Nature medicine. 10, 1329–1335. 44. Polo JM, Juszczynski P, Monti S, Cerchietti L, Ye K, Greally JM, et al. (2007) Transcriptional signature with differential expression of BCL6 target genes accurately identifies BCL6-dependent diffuse large B cell lymphomas. Proceedings of the National Academy of Sciences of the United States of America. 104, 3207–3212.

45. Parekh S, Polo JM, Shaknovich R, Juszczynski P, Lev P, Ranuncolo SM, et al. (2007) BCL6 programs lymphoma cells for survival and differentiation through distinct biochemical mechanisms. Blood. 110, 2067–2074. 46. Savage KJ, Monti S, Kutok JL, Cattoretti G, Neuberg D, De Leval L, et al. (2003) The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma. Blood. 102, 3871–3879. 47. Rosenwald A, Wright G, Leroy K, Yu X, Gaulard P, Gascoyne RD, et al. (2003) Molecular diagnosis of primary mediastinal B cell lymphoma identifies a clinically favorable subgroup of diffuse large B cell lymphoma related to Hodgkin lymphoma. The journal of experimental medicine. 198, 851–862. 48. Kuppers R, Klein U, Schwering I, Distler V, Brauninger A, Cattoretti G, et al. (2003) Identification of Hodgkin and Reed-Sternberg cell-specific genes by gene expression profiling. The journal of clinical investigation. 111, 529–537. 49. Klein U, Tu Y, Stolovitzky GA, Mattioli M, Cattoretti G, Husson H, et al. (2001) Gene expression profiling of B cell chronic lymphocytic leukemia reveals a homogeneous phenotype related to memory B cells. The journal of experimental medicine. 194, 1625–1638. 50. Orchard JA, Ibbotson RE, Davis Z, Wiestner A, Rosenwald A, Thomas PW, et al. (2004) ZAP-70 expression and prognosis in chronic lymphocytic leukaemia. Lancet. 363, 105–111. 51. Wiestner A, Rosenwald A, Barry TS, Wright G, Davis RE, Henrickson SE, et al. (2003) ZAP-70 expression identifies a chronic lymphocytic leukemia subtype with unmutated immunoglobulin genes, inferior clinical outcome, and distinct gene expression profile. Blood. 101, 4944–4951. 52. Crespo M, Bosch F, Villamor N, Bellosillo B, Colomer D, Rozman M, et al. (2003) ZAP-70 expression as a surrogate for immunoglobulin-variable-region mutations in chronic lymphocytic leukemia. The New England journal of medicine. 348, 1764–1775. 53. 
Hummel M, Bentink S, Berger H, Klapper W, Wessendorf S, Barth TF, et al. (2006) A biologic definition of Burkitt’s lymphoma from transcriptional and genomic profiling. The New England journal of medicine. 354, 2419–2430. 54. Dave SS, Fu K, Wright GW, Lam LT, Kluin P, Boerma EJ, et al. (2006) Molecular diagnosis of Burkitt’s lymphoma. The New England journal of medicine. 354, 2431–2442.

55. Dave SS, Wright G, Tan B, Rosenwald A, Gascoyne RD, Chan WC, et al. (2004) Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. The New England journal of medicine. 351, 2159–2169. 56. Husson H, Carideo EG, Neuberg D, Schultze J, Munoz O, Marks PW, et al. (2002) Gene expression profiling of follicular lymphoma and normal germinal center B cells using cDNA arrays. Blood. 99, 282–289. 57. Rosenwald A, Wright G, Wiestner A, Chan WC, Connors JM, Campo E, et al. (2003) The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer cell. 3, 185–197. 58. Martinez N, Camacho FI, Algara P, Rodriguez A, Dopazo A, Ruiz-Ballesteros E, et al. (2003) The molecular signature of mantle cell lymphoma reveals multiple signals favoring cell survival. Cancer research. 63, 8226–8232. 59. Basso K, Liso A, Tiacci E, Benedetti R, Pulsoni A, Foa R, et al. (2004) Gene expression profiling of hairy cell leukemia reveals a phenotype related to memory B cells with altered expression of chemokine and adhesion receptors. The journal of experimental medicine. 199, 59–68. 60. Shaughnessy JD, Jr, Zhan F, Burington BE, Huang Y, Colla S, Hanamura I, et al. (2007) A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1. Blood. 109, 2276–2284. 61. Davies FE, Dring AM, Li C, Rawstron AC, Shammas MA, O’Connor SM, et al. (2003) Insights into the multistep transformation of MGUS to myeloma using microarray expression analysis. Blood. 102, 4504–4511. 62. Zhan F, Barlogie B, Arzoumanian V, Huang Y, Williams DR, Hollmig K, et al. (2007) Genexpression signature of benign monoclonal gammopathy evident in multiple myeloma is linked to good prognosis. Blood. 109, 1692–1700. 63. Zhan F, Huang Y, Colla S, Stewart JP, Hanamura I, Gupta S, et al. 
(2006) The molecular classification of multiple myeloma. Blood. 108, 2020–2028. 64. Krivtsov AV, Twomey D, Feng Z, Stubbs MC, Wang Y, Faber J, et al. (2006) Transformation from committed progenitor to leukaemia stem cell initiated by MLL-AF9. Nature. 442, 818–822. 65. Ngo VN, Davis RE, Lamy L, Yu X, Zhao H, Lenz G, et al. (2006) A loss-of-function RNA interference screen for molecular targets in cancer. Nature. 441, 106–110.

66. Lenz G, Davis RE, Ngo VN, Lam L, George TC, Wright GW, et al. (2008) Oncogenic CARD11 mutations in human diffuse large B cell lymphoma. Science (New York, NY). 319, 1676–1679. 67. Peer D, Park EJ, Morishita Y, Carman CV, Shimaoka M. (2008) Systemic leukocytedirected siRNA delivery revealing cyclin D1 as an anti-inflammatory target. Science (New York, NY). 319, 627–630. 68. Schlabach MR, Luo J, Solimini NL, Hu G, Xu Q, Li MZ, et al. (2008) Cancer proliferation gene discovery through functional genomics. Science (New York, NY). 319, 620–624. 69. Silva JM, Marran K, Parker JS, Silva J, Golding M, Schlabach MR, et al. (2008) Profiling essential genes in human mammary cells by multiplex RNAi screening. Science (New York, NY). 319, 617–620. 70. Palomero T, Sulis ML, Cortina M, Real PJ, Barnes K, Ciofani M, et al. (2007) Mutational loss of PTEN induces resistance to NOTCH1 inhibition in T-cell leukemia. Nature medicine. 13, 1203–1210. 71. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, et al. (2005) MicroRNA expression profiles classify human cancers. Nature. 435, 834–838. 72. Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, et al. (2005) Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature. 433, 769–773. 73. Calin GA, Dumitru CD, Shimizu M, Bichi R, Zupo S, Noch E, et al. (2002) Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proceedings of the National Academy of Sciences of the United States of America. 99, 15524–15529. 74. Calin GA, Liu CG, Sevignani C, Ferracin M, Felli N, Dumitru CD, et al. (2004) MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias. Proceedings of the National Academy of Sciences of the United States of America. 101, 11755–11760. 75. Pekarsky Y, Santanam U, Cimmino A, Palamarchuk A, Efanov A, Maximov V, et al. 
(2006) Tcl1 expression in chronic lymphocytic leukemia is regulated by miR-29 and miR-181. Cancer research. 66, 11590–11593. 76. Pinkel D, Albertson DG. (2005) Comparative genomic hybridization. Annual review of genomics and human genetics. 6, 331–354. 77. Fukuhara N, Tagawa H, Kameoka Y, Kasugai Y, Karnan S, Kameoka J, et al. (2006) Characterization of target genes at the 2p15–16 amplicon


Chapter 14

Cancer Gene Profiling in Pancreatic Cancer

Felip Vilardell and Christine A. Iacobuzio-Donahue

Summary

High levels of RNases present in the normal pancreas and the abundance of desmoplastic stroma of most pancreatic cancers have traditionally caused difficulty in the extraction of high-quality RNA and gene expression profiling from pancreatic tissues. However, a variety of innovative strategies have made it possible to successfully perform a molecular analysis of pancreatic cancer, and the expression profiles that have been generated have provided tremendous insight into the nature of this aggressive disease. Here, we describe some of these techniques.

Key words: Pancreas, Ductal adenocarcinoma, RNA extraction, Gene profiling

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576 DOI 10.1007/978-1-59745-545-9_14, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

As recently as 10 years ago, it was assumed that global expression profiling would be impossible in pancreatic tissues because of the high levels of RNases and other enzymes in the pancreas and the low neoplastic cellularity of most pancreatic cancers. However, these hurdles have been overcome by a variety of approaches, and the resulting expression profiles have provided a wealth of information regarding this aggressive disease (1–5, 7, 8, 10). A review of these studies indicates that different investigators have used varying and innovative strategies to discover those genes or pathways most characteristic of pancreatic cancers.

The sample types that have proven most useful for gene expression profiling in the pancreas include human-derived cultured cell lines (normal and malignant) and surgically resected tissue specimens from the pancreas. Both sample types have inherent advantages and disadvantages related to sample processing and interpretation of data, but a good understanding of these details should allow the researcher to perform expression profiling experiments on pancreatic cancer tissues successfully and to obtain meaningful results.

Pancreatic cell lines or cell line xenografts are very useful because they are pure populations of epithelial cells. One can therefore obtain an undiluted view of gene expression patterns (9). Additionally, neoplastic cell lines are particularly useful for evaluating the response of the neoplastic cells to various treatment strategies and for delineating signaling cascades or cellular functions that may be altered by various experimental conditions. However, although cell lines are clearly useful, one must also appreciate their limitations. Cell lines are grown in artificial conditions that can result in the dysregulation of gene expression, particularly the downregulation of genes related to the interactions of epithelial cells (both normal and neoplastic) with their surrounding extracellular matrix components. Although this feature of cell lines may not affect some directed gene expression studies, it is nonetheless important to be aware of this limitation when interpreting gene expression data based solely on the analysis of cell lines (6).

Surgically resected tissue specimens, because they represent the neoplasm in its “native” state, are also essential for gene expression studies. However, two concerns exist regarding the use of surgically resected pancreatic tissue samples, namely the predominance of nonneoplastic stromal cells within the tumor tissue specimens and the extent of messenger RNA (mRNA) degradation in pancreatic tissues.
Typically, resected pancreatic cancers are composed of a minor population of infiltrating neoplastic epithelial cells surrounded by a predominance of dense fibrous (or desmoplastic) nonneoplastic stroma (Fig. 1). This stroma contains proliferating fibroblasts, small endothelial-lined vessels, inflammatory cells, and trapped residual atrophic parenchymal components of the organ invaded. A consistently low ratio of infiltrating neoplastic epithelial cells to this abundant nonneoplastic desmoplastic response is a distinctive feature of ductal adenocarcinomas of the pancreas, in contrast to infiltrating carcinomas arising in other organ or tissue types. Microdissection or other methods of purification of the epithelial component have been used successfully to overcome this perceived obstacle (2, 4). Alternatively, some investigators have successfully coanalyzed resected samples of chronic pancreatitis together with resected pancreatic cancers, or cell lines together with resected cancers, as a means to determine those genes solely overexpressed within the neoplastic tissues (3, 5). All approaches are valid, and knowledge of the gene(s) that serve as markers of a particular cell type within the expression profiling data generated can help in interpretation. Table 1 indicates those genes that are reproducible markers of the different normal and neoplastic cell populations analyzed in expression profiling.

Fig. 1. Histopathology of representative pancreatic samples. (a) Pancreatic ductal adenocarcinoma. The neoplastic glands show marked cytologic and nuclear atypia and are surrounded by extensive desmoplasia. (b) Healthy pancreas. The healthy pancreas is predominantly composed of pyramidal and basophilic acinar epithelial cells, with scattered islets of Langerhans and normal pancreatic ducts lined by a low cuboidal epithelium also seen. (c) Advanced chronic pancreatitis. The pancreas shows a chronic lymphocytic infiltrate in association with atrophy of pancreatic acini and lobular fibrosis. (d) Pancreatic cancer cell line. Unlike the pancreatic cancer tissue shown in (a), cell lines represent a pure population of neoplastic epithelial cells, seen as cells with a high nuclear/cytoplasm ratio and marked atypia. No contaminating normal cells or fibrosis are present.

The second common perception regarding the use of pancreatic tissues is that they contain a large amount of endogenous RNases, which can potentially interfere with the mRNA extraction methods that precede gene expression studies. RNases are a major secretory product of normal pancreatic acinar cells. However, there is commonly a significant loss of acinar cells within infiltrating pancreatic cancers due to atrophy or destruction of the gland by the neoplasm, which facilitates the study of mRNA expression patterns within these otherwise stroma-rich tissues. Thus, with careful technique, adequate amounts of mRNA can be extracted from quickly frozen surgically resected samples (5).
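The coanalysis strategy described above (comparing cancers with chronic pancreatitis so that stromal and inflammatory signals can be set aside) can be illustrated with a minimal sketch. It assumes that log2 fold changes relative to normal pancreas have already been computed; the gene names, values, and the threshold of 1.0 are hypothetical placeholders and are not taken from the cited studies.

```python
# Hypothetical log2 fold changes relative to normal pancreas.
# Gene names and values are illustrative only, not data from any study.
cancer_vs_normal = {"GENE_A": 3.1, "GENE_B": 2.4, "GENE_C": 0.2, "GENE_D": 2.8}
pancreatitis_vs_normal = {"GENE_A": 0.3, "GENE_B": 2.2, "GENE_C": 0.1, "GENE_D": -0.4}


def neoplasm_specific(cancer, pancreatitis, threshold=1.0):
    """Return genes overexpressed in cancer but not in chronic pancreatitis."""
    selected = []
    for gene, cancer_lfc in cancer.items():
        # Assumption: a gene absent from the pancreatitis comparison is
        # treated as unchanged (log2 fold change of 0).
        pancreatitis_lfc = pancreatitis.get(gene, 0.0)
        if cancer_lfc >= threshold and pancreatitis_lfc < threshold:
            selected.append(gene)
    return sorted(selected)


candidates = neoplasm_specific(cancer_vs_normal, pancreatitis_vs_normal)
print(candidates)
```

In this toy input, GENE_B is excluded because it is elevated in pancreatitis as well, which is the kind of shared stromal or inflammatory signal the coanalysis is meant to remove. A real analysis would add replicate-aware statistics rather than a fixed cutoff.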


Table 1 Markers of normal and neoplastic cell populations in expression profiling data

Name                Normal cellular function                                     Predominant cell population represented
Muc4                Apomucin, epithelial protection                              Neoplastic duct epithelium
Claudin 4           Component of epithelial tight junctions                      Neoplastic duct epithelium
Fascin              Cytoskeletal protein, cellular motility                      Neoplastic duct epithelium
Mesothelin          GPI-anchored protein                                         Neoplastic duct epithelium
PSCA                GPI-anchored protein                                         Neoplastic duct epithelium
Trypsin             Serine protease                                              Normal acinar epithelium
Chymotrypsin        Serine protease                                              Normal acinar epithelium
CA19–9              Tetrasaccharide carbohydrate, role in cell–cell recognition  Neoplastic duct epithelium
DUPAN-2             Tetrasaccharide carbohydrate, role in cell–cell recognition  Normal and neoplastic duct epithelium
Hsp47               Collagen-specific chaperone                                  Desmoplastic stroma
Apolipoprotein C-1  Secreted lipid carrier                                       Desmoplastic stroma
Apolipoprotein D    Secreted lipid carrier                                       Desmoplastic stroma
Thrombospondin-1    Extracellular matrix (ECM) component                         Neoplastic duct epithelium, desmoplastic stroma
MMP2                Matrix remodeling                                            Desmoplastic stroma
MMP11               Matrix remodeling                                            Desmoplastic stroma
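As a rough illustration of how the markers in Table 1 might be used when interpreting a profile, the sketch below tallies which cell populations are indicated by the markers detected in a sample. The marker-to-population mapping is transcribed from Table 1; the function name, the example profile, and the notion of a marker being "detected" are assumptions made for illustration only.

```python
from collections import Counter

# Marker-to-population assignments transcribed from Table 1.
MARKER_POPULATIONS = {
    "Muc4": ["Neoplastic duct epithelium"],
    "Claudin 4": ["Neoplastic duct epithelium"],
    "Fascin": ["Neoplastic duct epithelium"],
    "Mesothelin": ["Neoplastic duct epithelium"],
    "PSCA": ["Neoplastic duct epithelium"],
    "Trypsin": ["Normal acinar epithelium"],
    "Chymotrypsin": ["Normal acinar epithelium"],
    "CA19-9": ["Neoplastic duct epithelium"],
    "DUPAN-2": ["Normal and neoplastic duct epithelium"],
    "Hsp47": ["Desmoplastic stroma"],
    "Apolipoprotein C-1": ["Desmoplastic stroma"],
    "Apolipoprotein D": ["Desmoplastic stroma"],
    "Thrombospondin-1": ["Neoplastic duct epithelium", "Desmoplastic stroma"],
    "MMP2": ["Desmoplastic stroma"],
    "MMP11": ["Desmoplastic stroma"],
}


def populations_represented(detected_markers):
    """Count how many detected Table 1 markers point to each cell population."""
    tally = Counter()
    for marker in detected_markers:
        # Markers that are not in Table 1 are ignored.
        for population in MARKER_POPULATIONS.get(marker, []):
            tally[population] += 1
    return tally


# Hypothetical list of markers called "detected" in one bulk-tissue profile.
profile = ["Mesothelin", "Fascin", "Hsp47", "MMP2", "MMP11", "Trypsin"]
summary = populations_represented(profile)
for population, count in summary.most_common():
    print(f"{population}: {count}")
```

A tally dominated by stromal markers would suggest that the profile reflects desmoplastic stroma more than neoplastic epithelium, which may argue for microdissection or coanalysis before drawing conclusions.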

2. Materials (Before beginning see Note 1)

2.1. For RNA Extraction from Bulk Tissues and Determination of RNA Integrity

1. Frozen specimens of pancreatic adenocarcinoma.
2. RNase-free iron plate.
3. Suitably sized vessels for tissue homogenization (e.g., 1.5-ml tubes).
4. Containers with dry ice and wet ice.
5. Sterile surgical blades.
6. Scientific precision balance.
7. Polytron in a cold room.
8. Microcentrifuge in a cold room.
9. QIAGEN RNA extraction kit (depending on the starting amount of tissue, the suitable kit may range from Mini Kit to Maxi Kit) or PicoPure RNA Isolation Kit (Catalog #KIT0204, suitable for large amounts of tissue).
10. Diethylpyrocarbonate (DEPC)-treated distilled water.
11. Agarose-LE (Catalog #AM9040).
12. 10× Gel Prep/Running Buffer (AMBION Catalog #016R057898A).
13. Power supply.
14. Electrophoresis chamber, gel casting tray, and sample combs.
15. Ethidium bromide, a fluorescent dye used for staining nucleic acids (mutagenic).
16. Glyoxal Load Dye (AM #8551), which contains bromophenol blue and also ethidium bromide.
17. Transilluminator (ultraviolet light box) or a more modern molecular imager such as a Gel Doc XR System (BioRad Laboratories), which will be used to visualize ethidium bromide-stained nucleic acids in gels.

2.2. For Sectioning of Frozen Tissue Samples (See Note 2)

1. Frozen samples of normal pancreas and pancreatic adenocarcinoma, preferably snap frozen.
2. Sterile surgical blades and forceps.
3. Container with dry ice.
4. Cryostat with disposable microtome blades.
5. Tissue-Tek® OCT compound (VWR Catalog #25608-930).
6. 100% ethanol for cleaning the knife holder and antiroll plate in the cryostat.
7. Hematoxylin and eosin for routine histologic staining.
8. Polystyrene distyrene 80, dibutyl phthalate plasticizer, and xylene (DPX) mounting media.
9. Glass slides.

2.3. For RNA Extraction from Pancreatic Microdissected Tissue

1. Microscope with laser capture microdissection (LCM) system or with P.A.L.M. laser pressure catapulting system (Zeiss/P.A.L.M. Laser Technologies).
2. If the LCM system is going to be used, the HistoGene™ LCM Frozen Section Staining Kit is recommended for preparing and staining tissues, preserving intact nucleic acids from the captured cell populations.
3. Slides covered with membrane of polyethylene naphthalate (PEN-membrane slides, by P.A.L.M. Laser Technologies) if the P.A.L.M. system is going to be used. These PEN-membrane slides can be acquired DNase and RNase free. Otherwise, RNase can be inactivated by heating at 180°C for 4 h.
4. Tightly sealed containers for freezing and rethawing the slides, such as a 50-ml Falcon tube or a microslide box.
5. A small desiccator or slide box containing desiccant (anhydrous calcium sulfate and cobaltous chloride) to transport the mounted slides from one location to another (VWR Scientific Products Catalog #22890-900).
6. Ice-cold 70% ethanol.
7. Ice-cold RNase-free water.
8. Histological staining such as hematoxylin and eosin, Methyl Green, Cresyl Violet, or Nuclear Fast Red.
9. PALM AdhesiveCaps or usual RNase-free plastic tubes of 0.5-ml size for collecting catapulted samples.
10. Lysis buffer, e.g., QIAGEN RLT buffer, and RNeasy® Micro Kit (QIAGEN, #74004), for RNA extraction from P.A.L.M. microdissected tissue.
11. PicoPure RNA Isolation Kit (Arcturus, Catalog #KIT0202), or RNAqueous®-Micro Kit Catalog #AM1931 for RNA extraction from LCM microdissected cells.

2.4. For RNA Extraction from Cultured Cells

1. Cultured cells, 80–90% confluent in 75-cm² tissue culture flasks. 2. Phosphate-buffered saline (PBS): 8.0 mM Na2HPO4·2H2O (1.44 g/l), 1.5 mM KH2PO4 (0.30 g/l), 2.7 mM KCl (0.20 g/l), 137 mM NaCl (8.00 g/l), adjust pH to 7.8 with K2HPO4. Store at room temperature and cool before using. 3. RNeasy® Mini, Midi, or Maxi RNA isolation kits (QIAGEN). 4. RNase-free DNase set (QIAGEN). 5. 100% ethanol. 6. DEPC-treated water. 7. Cycloheximide (optional).
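
The g/l figures in a buffer recipe such as the PBS above can be cross-checked against the stated molarities with a short calculation. The following is a minimal sketch in Python; the molar masses are standard textbook values that are not given in the protocol, and the function name is purely illustrative.

```python
# Molar masses in g/mol (standard values, assumed here; not part of the protocol).
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
}


def millimolar(grams_per_litre: float, molar_mass: float) -> float:
    """Convert a mass concentration (g/l) into a molar concentration (mM)."""
    return grams_per_litre / molar_mass * 1000.0


# Mass concentrations taken from the PBS recipe.
nacl_mM = millimolar(8.00, MOLAR_MASS["NaCl"])
kcl_mM = millimolar(0.20, MOLAR_MASS["KCl"])

print(f"NaCl: {nacl_mM:.0f} mM, KCl: {kcl_mM:.1f} mM")
```

The same function can be applied to any other component once its molar mass is added to the table.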

3. Methods

3.1. Checking RNA Integrity

Checking the integrity of the RNA of bulk tissue samples is recommended before beginning any more sophisticated procedure such as laser microdissection. First, identifying samples with poor RNA quality at this stage avoids wasting valuable time and reagents on microdissection. Second, finding poor RNA quality after laser microdissection of bulk samples that were first deemed satisfactory may indicate RNA degradation during the microdissection procedure. Extractions for checking RNA integrity may easily be performed from snap-frozen bulk tissue with a QIAGEN RNeasy® Mini Kit as follows:

1. Use RNase-free pincers and a sterile scalpel blade for cutting the specimen on an RNase-free iron plate (e.g., 15 × 10 cm) placed in a container with dry ice.
2. Place the container with dry ice near a balance. To determine the amount of tissue, subtract the tare weight of an empty 1.5-ml tube from the weight of the tube containing the tissue. Do not use more than 30 mg. All of the remaining steps have to be performed on wet ice.
3. Place the fragment of tissue into a clean 1.5-ml tube and add 594 µl of lysis buffer RLT plus 6 µl of β-mercaptoethanol.
4. Keeping the tube with the sample and the lysis buffer in a beaker with wet ice, disrupt the tissue by means of a Polytron placed in a cold room. Homogenize the sample three times at full speed for 30 s each, waiting 15 s between homogenizations. To avoid heating the sample, an alternative is to perform more homogenizations of shorter duration. Allow the homogenized solution to stand for 5 min.
5. Proceed with the rest of the RNeasy® Mini Handbook protocol.
6. Check the quality of the eluted RNA by gel electrophoresis. This may be performed using an Agilent BioAnalyzer or by routine agarose gel electrophoresis as described below.
   (a) Before beginning, spray or wipe the surfaces of the glassware and electrophoresis equipment to be used with RNaseZap to remove RNases. Rinse twice with DEPC-treated water.
   (b) Place the gel tray containing combs in the electrophoresis chamber. The combs must be placed closest to the cathode (negative/black) lead.
   (c) From the stock of 10× Gel Prep/Running Buffer, prepare a 1× dilution, e.g., by bringing 100 ml of 10× Running Buffer to 1,000 ml with DEPC-treated water.
   (d) Prepare a 1% agarose gel by melting 1 g of agarose in 100 ml of 1× Running Buffer for every 100 ml of gel needed. Heat in a microwave oven until the agarose is completely dissolved.
   (e) Add 9 µl of an ethidium bromide solution at 10 mg/ml to the melted agarose. Ethidium bromide is known to be mutagenic and should be handled carefully. Let the solution cool to approximately 60°C.
   (f) Up to 30 µg of RNA can be loaded in each well. Usually we load 20 µl of prepared sample in each well, comprising 10 µl of sample RNA and an equal volume of Glyoxal Load Dye. For larger sample volumes, less Glyoxal Load Dye can be used, but never use less than one half volume. Incubate the samples in a heating block at 50°C for 30 min (60 min if less than one volume of Load Dye was used).
   (g) After incubation, briefly spin the samples. If they will not be loaded into the gel immediately, place them on ice. The samples can also be stored at −20°C at this stage for several days.
   (h) Pour the gel into the casting tray to approximately 6 mm in thickness and pop the bubbles with an RNase-free cover slide or pipet tip. Let the gel solidify.
   (i) Fill the electrophoresis chamber with 1× Running Buffer until the gel is covered by approximately 1 cm of buffer. Remove the combs.
   (j) Load the samples into the gel by means of RNase-free pipet tips, placing the tip inside the top of the well. Be careful not to trap air at the end of the tip when picking up each sample.
   (k) Run the gel at 5 V/cm of distance between electrodes. RNA and bromophenol blue will migrate toward the anode (positive/red) electrode. Free ethidium will migrate in the opposite direction of the RNA, running off the top of the gel. Run the gel until the bromophenol blue front migrates almost to the bottom of the gel.
   (l) The gel can be viewed and photographed under UV light on a transilluminator or in a Gel Doc. Place plastic wrap beneath the gel to avoid contamination with RNases from the transilluminator surface.
   (m) The 28S and 18S ribosomal RNA (rRNA) bands should be clearly visible in the intact RNA sample, and the 28S rRNA band should be approximately twice as intense as the 18S rRNA band. This 2:1 ratio (28S:18S) is a good indication of RNA quality. Partially degraded RNA will have a smeared appearance, will lack the sharp rRNA bands, or will not exhibit a 2:1 ratio. Completely degraded RNA will appear as a very low molecular weight smear (Fig. 2) (see Note 3).
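
The arithmetic behind the gel preparation (diluting the 10× buffer, weighing agarose for a 1% gel, and setting the voltage at 5 V/cm) can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the 15-cm electrode spacing is an assumed value that must be replaced by the spacing of the chamber actually used.

```python
def stock_volume_ml(final_volume_ml: float, stock_x: float = 10.0,
                    final_x: float = 1.0) -> float:
    """Volume of concentrated buffer needed, from C1 * V1 = C2 * V2."""
    return final_volume_ml * final_x / stock_x


def agarose_grams(gel_volume_ml: float, percent_w_v: float = 1.0) -> float:
    """Mass of agarose for a gel of the given volume and % (w/v)."""
    return gel_volume_ml * percent_w_v / 100.0


def running_voltage(electrode_distance_cm: float,
                    volts_per_cm: float = 5.0) -> float:
    """Voltage to set for a given field strength and electrode spacing."""
    return electrode_distance_cm * volts_per_cm


buffer_ml = stock_volume_ml(1000.0)   # 10x stock needed for 1,000 ml of 1x buffer
agarose_g = agarose_grams(100.0)      # 1% gel, 100 ml
volts = running_voltage(15.0)         # assumed 15-cm electrode spacing

print(f"{buffer_ml:.0f} ml of 10x buffer, {agarose_g:.1f} g agarose, {volts:.0f} V")
```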


Fig. 2. Analysis of RNA quality. Extracted RNA was run on an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). From left to right, the gel images for the RNA ladder, six samples of RNA extracted from pancreatic cancer tissues, and one sample of RNA extracted from a cell line are shown. RNA from tumor samples 1–4 and the cell line show a 2:1 ratio abundance of the 28S to the 18S ribosomal RNA and are acceptable for use in expression profiling. By contrast, tumor samples 5 and 6 show significant degradation.
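
When band or peak intensities are available as numbers, the 2:1 criterion can be applied programmatically. The sketch below is a hedged illustration in Python: the intensity values are invented, and the tolerance of 0.5 around the 2:1 target is an arbitrary assumption rather than a threshold taken from this chapter. Visual inspection for smearing remains necessary.

```python
def rrna_ratio(intensity_28s: float, intensity_18s: float) -> float:
    """Return the 28S:18S intensity ratio."""
    if intensity_18s <= 0:
        raise ValueError("18S intensity must be positive")
    return intensity_28s / intensity_18s


def looks_intact(ratio: float, target: float = 2.0,
                 tolerance: float = 0.5) -> bool:
    """Flag a sample whose ratio lies within an assumed tolerance of 2:1."""
    return abs(ratio - target) <= tolerance


# Hypothetical (28S, 18S) intensities in arbitrary units.
bands = {"sample A": (2100.0, 1000.0), "sample B": (600.0, 900.0)}

for name, (b28, b18) in bands.items():
    ratio = rrna_ratio(b28, b18)
    verdict = "acceptable" if looks_intact(ratio) else "check for degradation"
    print(f"{name}: 28S:18S = {ratio:.2f} ({verdict})")
```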

3.2. Sample Selection

3.2.1. From Fresh Tissue

1. Embed the specimen in embedding medium, usually a viscous compound called OCT (Tissue-Tek®), to allow histologic sectioning (see Note 6). Place an empty, labeled cryomold on dry ice for 1 min. It should remain on dry ice during the entire embedding procedure.
2. Cover the bottom of the cryomold with embedding medium, approximately 1–2 mm deep. Place the tissue for freezing against the bottom of the cryomold in the medium. To facilitate cutting, the tissue should be relatively small (1 cm in maximum dimension) and the desired cutting surface should face the bottom.
3. Fill the cryomold containing the base of embedding medium and tissue with more embedding medium. Cover the dry ice container and allow the embedding medium to harden (it will turn from translucent to white when frozen). The blocks of frozen tissue can be stored at −80°C until needed.
4. Within the cryochamber of the cryostat, remove the block of tissue encased within embedding medium from the cryomold and attach it to the specimen holder disk of the cryostat with additional embedding medium.
5. Allow the block to equilibrate to the cryochamber temperature (−20°C) for at least 15 min.

3.2.2. From Previously Snap-Frozen Tissue

Tissue frozen in liquid nitrogen usually yields higher quality RNA than tissue frozen in dry ice.

1. Remove the tube containing the sample from the −80°C freezer and place it in dry ice for transfer to the cutting room.
2. Cool a cryomold in dry ice, add OCT embedding medium until approximately two thirds full, and leave the OCT to become viscous but not hard. Take the frozen specimen with clean forceps and dip it in the OCT, pressing the tissue down against the bottom of the cryomold.
3. Cover the specimen completely by adding more OCT, and freeze the tissue completely by keeping the cryomold on dry ice for 5 min. The tissue can now be sectioned, or the frozen OCT tissue block can be stored at −80°C until ready for use.

3.3. RNA Extraction from Microdissected Tissue

Tissue microdissection performed by either a laser capture microdissection (LCM) system or a P.A.L.M. laser pressure catapulting system (Zeiss/P.A.L.M. Laser Technologies) is explained in detail in its specific chapter (also see Notes 4, 5 and 7).

3.4. RNA Extraction from Cultured Cells

1. Remove the medium from cultured cells that are 80–90% confluent in a 75-cm² flask. Wash the cells carefully with sterile PBS at 4°C (see Note 8).
2. Add 5 ml PBS. Put the flask on ice and collect the cells by scraping with a sterile cell scraper. Transfer the cells to a new 12-ml tube (white cap). Add 5 ml PBS to the original flask and continue scraping. Collect all cells.
3. Centrifuge for 10 min at 1,500 rpm (150 × g) at room temperature.
4. Carefully remove the supernatant by inverting the tube. Optional: wash the pellet first with cold sterile PBS.
5. Put the tube on wet ice.
6. Proceed with the suitable RNeasy® Mini, Midi, or Maxi Kit (QIAGEN) according to the instructions for RNA isolation from cell cultures.
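
Because the relation between rpm and relative centrifugal force depends on the rotor, the speed in step 3 may need to be recalculated for a different centrifuge. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The 6-cm rotor radius is an assumption chosen for illustration, not a value given in the protocol.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(target_g: float, radius_cm: float) -> float:
    """Speed in rpm required to reach a target force at a given rotor radius."""
    return math.sqrt(target_g / (RCF_CONSTANT * radius_cm))


radius_cm = 6.0  # assumed rotor radius; read the real value from the rotor manual
print(f"1,500 rpm at {radius_cm} cm gives about {rcf(1500, radius_cm):.0f} x g")
print(f"150 x g at {radius_cm} cm needs about {rpm_for_rcf(150, radius_cm):.0f} rpm")
```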


4. Notes

1. A special area for working with RNA is critical. The surface of the workspace should be cleaned of RNases with special cleaning solutions, e.g., RNaseZap (AMBION, #9780) or RNase AWAY® (Molecular BioProducts). Gloves should be changed frequently and a mask should be worn. All plastic tubes and filtered tips should be sterile. All of the glass reservoirs and tools should be treated with diethylpyrocarbonate (DEPC) and baked at 180°C for several hours before use. In addition, all solutions should be prepared in advance to contain 0.1% DEPC and be kept on ice until use.
2. A major reason for preparing frozen sections is not RNA extraction but confirmation of the histology (healthy, pancreatitis, cancer) when using bulk tissues.
3. Extracted RNA should be resuspended in RNA elution buffer or ethanol, but not in DEPC-treated water.
4. Before microdissecting, make sure to confirm the sample histology. For this purpose, cut a 5-µm section onto a standard glass slide and quickly perform hematoxylin and eosin (H&E) staining.
   (a) Dip the previously fixed slides five to six times in RNase-free deionized water.
   (b) Stain for 1 min in Mayer’s hematoxylin solution.
   (c) Rinse for 1 min in DEPC-treated water.
   (d) Stain for 10 s in eosin.
   (e) Quickly dehydrate in 70% ethanol, then 96% ethanol, and then 100% ethanol, for 15 s each. Make sure to continuously dip the slide during dehydration.
   (f) Mount the H&E-stained section with DPX and check that the histology of the tissue corresponds to the expected histology. If not, reject the current OCT block and choose another one. Repeat the process with the new OCT-embedded specimen.
   (g) If the new sample shows the expected histology, go ahead with the procedure, making 8-µm-thick sections of the same mounted specimen. Mount the sections at the center of a room-temperature LCM microslide or a precooled P.A.L.M. PEN-membrane slide.
5. H&E staining is also frequently used for laser microdissection. If hematoxylin staining is chosen, the duration of staining with hematoxylin should be minimized, and the eosin can be omitted entirely if good visualization of the cytoplasm is not required during the microdissection. Other stains, such as Cresyl Violet, Methyl Green, or Nuclear Fast Red, can be regarded as alternatives. For tissues particularly rich in RNases, such as normal pancreatic tissue, a short staining procedure such as the Cresyl Violet method is recommended as follows:
   (a) Dissolve solid cresyl violet acetate at a concentration of 1% (w/v) in 100% EtOH at room temperature by stirring for several hours or even overnight. Filter the solution before use.
   (b) Dip the previously fixed slides for 1 min in the 1% cresyl violet acetate solution.
   (c) Remove the excess stain on an absorbent surface.
   (d) Dip into 70% ethanol.
   (e) Dip into 100% ethanol.
   (f) Air-dry on a Kimwipe for 1–2 min.
   Methyl Green is a good staining procedure because it is very fast and therefore helps to preserve RNA quality.
   (a) Dip the previously fixed slides five to six times in RNase-free deionized water.
   (b) Dip the slides in Methyl Green solution (DAKO, #S1962) for 1 min.
   (c) Rinse for 30 s in a jar with DEPC-treated water.
   (d) Rinse for 30 s in a second jar with DEPC-treated water.
   (e) Air-dry on a Kimwipe for 1–2 min.
   The lack of dehydration steps in the Methyl Green staining method may cause a poor morphologic appearance of the tissue sections. This may be improved by first mounting the slide on the P.A.L.M. microscope, then adding 5 µl of 70% or 100% ethanol onto the area of interest with an RNase-free pipet tip. When using 100% ethanol, some destaining of the tissue may occur, but the tissue dries much more quickly than when using 70% ethanol.
6. Some helpful features to distinguish a well-differentiated ductal adenocarcinoma from trapped and reactive glands in chronic pancreatitis with atrophy and fibrosis are provided in Table 2.
7. Extract RNA from microdissected samples according to the PicoPure RNA Isolation Kit (Arcturus, Catalog #KIT0202) procedures for LCM-obtained samples or by means of the RNeasy® Micro Kit (QIAGEN, #74004) for P.A.L.M. microdissected samples. Both procedures involve passing cell extracts through affinity spin columns, which bind and immobilize the RNA. The bound RNA is then washed and total RNA can be eluted in a final volume of 10 µl. If the RNeasy® Micro Kit is used, adding 50% ethanol instead of 70% ethanol to the cleared lysate containing buffer RLT can increase the RNA yield.

Table 2
Helpful features to distinguish ductal adenocarcinoma from chronic pancreatitis

Feature                                          Carcinoma                 Pancreatitis
Glandular architecture                           Haphazard                 Lobular
Variation in nuclear size                        Variable (4:1 or more)    Uniform
Nucleoli                                         Huge irregular nucleoli   Single, regular
Luminal necrosis                                 May be present            Absent
Incomplete glands                                May be present            Absent
Perineural invasion                              May be present            Absent
Vascular invasion                                May be present            Absent
Glands immediately adjacent to muscular artery   May be present            Absent

8. Optional use of cycloheximide in RNA extractions from cell lines. Cycloheximide is a protein translation inhibitor and may have a positive effect on the level of mutant transcripts present. If desired, in a fume hood, prepare a 100 mg/ml stock solution of cycloheximide. Incubate the cultured cells in the presence of 100 µg/ml of cycloheximide for 4–8 h prior to harvesting.

References

1. Buchholz, M., M. Braun, A. Heidenblut, H. A. Kestler, G. Kloppel, W. Schmiegel, S. A. Hahn, J. Luttges, and T. M. Gress. (2005). Transcriptome analysis of microdissected pancreatic intraepithelial neoplastic lesions. Oncogene 24, 6626–36.
2. Crnogorac-Jurcevic, T., E. Efthimiou, T. Nielsen, J. Loader, B. Terris, G. Stamp, A. Baron, A. Scarpa, and N. R. Lemoine. (2002). Expression profiling of microdissected pancreatic adenocarcinomas. Oncogene 21, 4587–94.
3. Friess, H., J. Ding, J. Kleeff, L. Fenkell, J. A. Rosinski, A. Guweidhi, J. F. Reidhaar-Olson, M. Korc, J. Hammer, and M. W. Buchler. (2003). Microarray-based identification of differentially expressed growth- and metastasis-associated genes in pancreatic cancer. Cell Mol Life Sci 60, 1180–99.
4. Grutzmann, R., C. Pilarsky, O. Ammerpohl, J. Luttges, A. Bohme, B. Sipos, M. Foerder, I. Alldinger, B. Jahnke, H. K. Schackert, H. Kalthoff, B. Kremer, G. Kloppel, and H. D. Saeger. (2004). Gene expression profiling of microdissected pancreatic ductal carcinomas using high-density DNA microarrays. Neoplasia 6, 611–22.
5. Iacobuzio-Donahue, C. A., R. Ashfaq, A. Maitra, N. V. Adsay, G. L. Shen-Ong, K. Berg, M. A. Hollingsworth, J. L. Cameron, C. J. Yeo, S. E. Kern, M. Goggins, and R. H. Hruban. (2003). Highly expressed genes in pancreatic ductal adenocarcinomas: a comprehensive characterization and comparison of the transcription profiles obtained from three major technologies. Cancer Res 63, 8614–22.
6. Iacobuzio-Donahue, C. A., B. Ryu, R. H. Hruban, and S. E. Kern. (2002). Exploring the host desmoplastic response to pancreatic carcinoma: gene expression of stromal and neoplastic cells at the site of primary invasion. Am J Pathol 160, 91–9.
7. Kim, H. N., D. W. Choi, K. T. Lee, J. K. Lee, J. S. Heo, S. H. Choi, S. W. Paik, J. C. Rhee, and A. W. Lowe. (2007). Gene expression profiling in lymph node-positive and lymph node-negative pancreatic cancer. Pancreas 34, 325–34.
8. Logsdon, C. D., D. M. Simeone, C. Binkley, T. Arumugam, J. K. Greenson, T. J. Giordano, D. E. Misek, R. Kuick, and S. Hanash. (2003). Molecular profiling of pancreatic adenocarcinoma and chronic pancreatitis identifies multiple genes differentially regulated in pancreatic cancer. Cancer Res 63, 2649–57.
9. Ryu, B., J. Jones, N. J. Blades, G. Parmigiani, M. A. Hollingsworth, R. H. Hruban, and S. E. Kern. (2002). Relationships and differentially expressed genes among pancreatic cancers examined by large-scale serial analysis of gene expression. Cancer Res 62, 819–26.
10. Ryu, B., J. Jones, M. A. Hollingsworth, R. H. Hruban, and S. E. Kern. (2001). Invasion-specific genes in malignancy: serial analysis of gene expression comparisons of primary and passaged cancers. Cancer Res 61, 1833–8.

Chapter 15

Cancer Gene Profiling in Prostate Cancer

Adam Foye and Phillip G. Febbo

Summary

Gene profiling and expression analysis using microarrays have made a significant impact on our biological understanding of prostate cancer. The procedures for generating high-quality expression data from prostate cancer cell lines and tumors are not trivial. However, during the past 9 years, methods by which to process samples for gene profiling have been developed. In this chapter, techniques to process prostate cancer specimens either en bloc (macrodissection) or using laser capture microdissection are presented in detail along with extensive technical notes. Although we focus on prostate cancer and discuss the specific methods utilized in our lab, the processes discussed are generalizable to other tumors and amenable to the substitution of alternative instruments and/or commercially available kits.

Key words: Prostate cancer, Expression analysis, Laser capture microdissection, En bloc analysis

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_15, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

Prostate cancer remains the most common nondermatological cancer in men in the United States and the second leading cause of cancer death in men (1). Understanding the molecular pathogenesis of prostate cancer has great potential to improve disease control and cure rates for men diagnosed with this disease. Expression analysis has been applied to prostate cancer cell lines, xenografts, and human tumors; it has improved our biological understanding of prostate cancer and is contributing to the improved clinical management of patients.

Expression analysis has resulted in perhaps the most profound discovery in prostate cancer biology this decade: the identification of chromosomal translocations involving the ETS transcription factors in prostate cancer. Using a novel approach to gene expression called cancer outlier profile analysis (COPA), a group of investigators identified aberrant expression of ERG (21q22.3) and ETV1 (7p21.2) in a subset of prostate cancers compared with other solid tumors. These genes were subsequently found to be genetically translocated downstream of TMPRSS2, an androgen-regulated protease (2). Multiple groups have now confirmed the frequent presence of these translocations in early prostate cancer, and this discovery has had profound implications for the direction of prostate cancer research.

A second biological discovery from microarray analysis with a major impact on our understanding of prostate cancer was the finding that increased androgen receptor (AR) RNA expression is the single most consistent RNA change associated with castration-refractory growth and that increased AR expression was both sufficient and necessary for castration-resistant growth of the xenografts (3). The finding of continued AR signaling despite castrate levels of testosterone was subsequently demonstrated in human samples of castration-resistant prostate cancer (4). Interestingly, even with increased expression of the AR gene and androgen-metabolizing enzymes, the transcriptional activity of AR still decreases in castration-resistant prostate cancer (5, 6). This suggests that while the levels of AR activity likely decrease in hormone-refractory prostate cancer, the energy allotted to maintain some AR signaling increases, underscoring the importance of maintaining at least some AR activity. These observations have reinvigorated investigations on how to further inhibit the AR in advanced prostate cancer so as to improve the duration and quality of life for men with advanced prostate cancer.

Finally, multiple groups using microarrays have demonstrated significant differential gene expression between malignant and benign prostate specimens (7–12).
Of the genes found to be differentially expressed by most groups, AMACR has been further studied and validated in multiple independent studies using immunohistochemistry (9, 13) and now serves as a clinically deployed biomarker to help differentiate between tumor and benign prostate tissues.

Although these examples represent only a fraction of the body of work where microarray analysis of prostate cancer has provided biological or clinical insight, they demonstrate the central role global transcriptional analysis of prostate cancer cell lines, xenografts, and tumors has played and portend the continued importance of expression analysis as an investigational method with which to interrogate prostate cancer. In this chapter, we provide the materials and methods required for expression analysis of prostate cancer. In addition, the methods are annotated with procedural insight and helpful hints to facilitate the adoption and deployment of this technique. This chapter is focused on preparing samples for oligonucleotide arrays such as those available from Affymetrix, but the methods are readily adapted if spotted dual-hybridization arrays (e.g., Agilent) or bead-based assays (e.g., Illumina) are preferred.

2. Materials

2.1. Fresh-Frozen Tissue Block Preparation, Storage, and Microdissection

1. Cryostat (Leica 1800, Leica CE knife holder for disposable low-profile blades) with heat extractor. 2. Sakura® Accu-Edge® low-profile disposable microtome blades. 3. Tissue pallets for cryostat, minimum of 5. 4. Sakura® Tissue-Tek® Optimal Cutting Temperature (OCT) compound, embedding medium. 5. Sakura® Tissue-Tek® Mega-Cassette® plastic tissue storage cartridges. 6. Liquid nitrogen-based storage system. 7. Small syringe needle (for tissue manipulation, ~18 gauge). 8. Straight-edged razor blades.

2.2. Slide Staining and Dehydration

1. 70% ethanol (95% ethanol and nuclease-free water). 2. 95% ethanol. 3. 100% ethanol stored with molecular sieves (Sigma-Aldrich, cat. no. 69839-250G). 4. Gill #2 formulation hematoxylin. 5. Alcoholic eosin-Y. 6. Xylene. 7. Nuclease-free water. 8. 11 Coplin staining jars, 55 mL. 9. Compressed air or nitrogen gas line.
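
The 70% ethanol in item 1 is prepared from 95% ethanol and nuclease-free water, which is a simple C1V1 = C2V2 dilution. A minimal sketch of the calculation in Python follows; the function name is illustrative, and the 55-ml batch (one Coplin jar) is only an example. The calculation ignores the slight volume contraction that occurs when ethanol and water are mixed, so in practice the ethanol is measured and the mixture is brought to the final volume with water.

```python
def ethanol_dilution(stock_percent: float, target_percent: float,
                     final_volume_ml: float) -> tuple[float, float]:
    """Return (ml of stock ethanol, ml of water) for the requested dilution."""
    if not 0 < target_percent <= stock_percent:
        raise ValueError("target must be positive and not exceed the stock")
    stock_ml = final_volume_ml * target_percent / stock_percent
    return stock_ml, final_volume_ml - stock_ml


stock_ml, water_ml = ethanol_dilution(95.0, 70.0, 55.0)
print(f"{stock_ml:.1f} ml of 95% ethanol + {water_ml:.1f} ml of water")
```

The same function covers other dilutions from the 95% stock by changing the target percentage.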

2.3. Laser Capture Microdissection

1. Arcturus (Molecular Devices) Veritas LCM Instrument. 2. CapSure® Macro LCM Caps or HS LCM Caps (Molecular Devices, cat. no. LCM0212 or LCM0214). 3. GeneAmp thin-walled reaction tubes, 0.5 mL.

2.4. Frozen Tissue Macrodissection

1. Thermo Savant FastPrep® FP120 Homogenizer. 2. MP Biomedicals Lysing Matrix A. 3. Straight-edged razor blades. 4. Cryostat (Leica 1800, Sakura® Tissue-Tek® disposable microtome blades) with heat extractor.

2.5. RNA Isolation

1. Stratagene® Absolutely RNA® Nanoprep Kit. 2. 70% ethanol (see Subheading 2.2). 3. Ambion® RNaseZap®. 4. Heating blocks preset to 37°C and 60°C. 5. Ambion® mirVana™ miRNA Isolation Kit. 6. Heating block preset to 95°C. 7. 100% ethanol (see Subheading 2.2).

2.6. RNA Quality Verification

1. Agilent 2100 Bioanalyzer. 2. Agilent RNA 6000 Pico Kit. 3. Molecular biology-grade water, nuclease free (Cellgro, cat. no. 46-000-Cl). 4. Turner Biosystems TBS-380 mini-fluorometer. 5. Molecular Probes Inc. RiboGreen® RNA Quantification Kit. 6. Wheaton 200-µL round glass cuvettes, 5 × 31 mm.

2.7. RNA Amplification and Biotin Labeling

1. NuGEN Ovation™ Biotin RNA Amplification and Labeling System. 2. Zymo Research DNA Clean & Concentrator™-25 purification kit. 3. Qiagen DyeEx 2.0 Spin kit. 4. 80% ethanol (room temperature, prepared with 95% ethanol and nuclease-free water, see Subheading 2.2). 5. NanoDrop® ND-1000 spectrometer.

3. Methods

3.1. Fresh-Frozen Tissue Block Preparation and Storage

1. Set the cryostat temperature to −20°C. Position the heat extractor for perpendicular movement to the pallet-holding platform.
2. Prepare several tissue collection pallets by applying OCT compound directly to the cooled aluminum cryostat pallets.
3. Once the OCT compound has become partially opaque (the circles on the surface of the pallet will disappear, see Note 1), use the heat extractor cylinder to put a flat surface on the top of the OCT pallet.
4. Separate the OCT compound and pallet from the heat extractor using a sharp straight-edged razor blade.
5. Repeat steps 1–3 for each sample pallet necessary to collect the tissue sample.
6. Place the pallets in either a pallet holder surrounded with dry ice or simply place the pallets on dry ice (see Note 1). Keeping the pallets cold is essential to freezing the fresh OCT compound. Bring OCT compound, gloves, and a small syringe needle to the collection.
7. At the collection, place tissue samples directly onto the pallets of OCT. The tissue will freeze on contact, so align the tissue prior to making contact.
8. Surround the tissue with OCT compound, forming a perimeter around the fresh tissue. Slowly add more OCT over the top, making sure that the previous layer does not solidify (become opaque) before adding more. Allowing the layers to solidify can cause the block to fragment during sectioning.
9. Cover the tissue specimen completely with OCT and wait for the compound to solidify. If the tissue specimen is too large to cover on the pallet, cover as much as possible and recoat the specimen on the cryostat.
10. At the cryostat, separate the original OCT layer on the pallet from the layer surrounding the frozen tissue using a razor blade. The layers should separate easily.
11. Invert the tissue block so that the exposed tissue is facing up. Place just enough OCT compound over the tissue to cover the top of the block once the heat extractor is lowered.
12. Lower the heat extractor to flatten the surface and freeze the OCT. Separate the OCT from the extractor with a razor blade as before.
13. The frozen tissue should now be completely surrounded with OCT. Place the tissue block into a labeled Mega-Cassette® cartridge and store in a freezer box (9 × 9 cardboard grid with every other vertical divider removed) in vapor phase liquid nitrogen. Storage at −80°C is considered adequate for short-term storage (<1 month).

3.2. Frozen Tissue Sectioning and Slide Preparation for Microdissection

1. Set the cryostat temperature to −25°C or warmer depending on the tissue (see Note 2). Position the heat extractor for perpendicular movement to the pallet-holding platform.
2. Label all of the slides that will be required for the sample being sectioned.
3. Apply enough OCT to an empty pallet so that the surface is covered.
4. When the surface of the pallet is no longer visible (the bottom layer of OCT has become opaque), place the block of tissue to be sectioned in the center of the pallet.
5. Immediately apply the heat extractor to the surface of the tissue block. This places the tissue block flat for easier blade alignment.
6. Insert the pallet into the microtome. Set the cutting depth to 7 µm (see Note 3).
7. Cut sections of tissue and place two sections, centered, on each slide. Having two sections to choose from during LCM can increase the chances of a successful run.
8. Immediately place each slide into the chilled slide box in the cooler of dry ice. Make every effort to minimize the amount of time the tissue section spends thawed after cutting and before refreezing in the slide box. RNA will degrade during this time.
9. When complete, store frozen sections in a covered slide box at −80°C for up to a few weeks. Ideally, slides are taken immediately from the cryostat to the staining setup prior to LCM to minimize RNA degradation (see Note 4).

3.3. Hematoxylin and Eosin (H&E) Slide Staining and Dehydration (Truncated Protocol, See Note 5)

1. Dip the frozen slide into 70% ethanol for 30 s.
2. Briefly wash the slide in nuclease-free water by dipping the slide until streaking is gone.
3. Dip the slide in Gill #2 hematoxylin repeatedly for 3–5 s.
4. Wash off excess hematoxylin in nuclease-free water until the stain streaks have cleared.
5. Soak in 70% ethanol for 30 s.
6. Soak in 95% ethanol for 30 s.
7. Dip the slide in alcoholic eosin for 2–3 s.
8. Soak in 95% ethanol for 30 s, initially rocking the slide gently to wash off excess eosin.
9. Continue to dehydrate the tissue in 100% ethanol for 30 s.
10. Perform two successive washes in xylenes for 1 min each.
11. After the second wash, quickly wipe off excess xylenes from the back and edges of the slide with a small wipe (do not touch the tissue).
12. Dry the slide under diffuse nitrogen gas or a compressed air line (10 psi or less) for 10–15 s. Be sure the air pressure is not strong enough to disrupt the tissue, particularly with bone tissue (see Note 6). When the section is adequately dried for LCM, it will appear lighter or even a bit chalky once the xylenes have evaporated.
13. Begin LCM immediately.
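
Because RNA degrades while the section is being handled, it can help to know how long the timed steps of the truncated protocol take in total. The sketch below simply encodes the stated durations as data and sums them. It is an illustration only: the step labels are shorthand, and the water washes and the wipe are left out because the protocol gives no fixed time for them.

```python
# (step label, minimum seconds, maximum seconds), from the timed steps above.
TIMED_STEPS = [
    ("70% ethanol, initial dip", 30, 30),
    ("hematoxylin", 3, 5),
    ("70% ethanol", 30, 30),
    ("95% ethanol", 30, 30),
    ("alcoholic eosin", 2, 3),
    ("95% ethanol after eosin", 30, 30),
    ("100% ethanol", 30, 30),
    ("xylenes, first wash", 60, 60),
    ("xylenes, second wash", 60, 60),
    ("gas drying", 10, 15),
]

min_total = sum(low for _, low, _ in TIMED_STEPS)
max_total = sum(high for _, _, high in TIMED_STEPS)

print(f"Timed steps take {min_total}-{max_total} s, "
      f"about {max_total / 60:.1f} min plus untimed washes")
```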

3.4. Laser Capture Microdissection

Cancer Gene Profiling in Prostate Cancer

299

1. Open the Veritas software and select New Session from the materials control box (the Veritas instrument and the computer should be left powered on). 2. Load the stained slide into the LCM slide stage as well as cartridges with the appropriate LCM caps into the cap tray (see Note 7). If the goal is to isolate RNA from the captured cells, only load one slide per session to minimize the time each section spends exposed prior to isolation (see Note 8 for RNA handling during LCM). 3. Mark the caps present and complete the necessary information for the slide including slide name, study, and any notes. When OK is selected, the machine door will close and the session will begin. 4. The LCM machine will first acquire a roadmap image of the slide. Do not begin selecting anything until this process is over. 5. The system automatically begins with the 2× objective. Scan the tissue section for the region of interest for capture. This can be done using any objective (microscope control box), but 10× is often the most efficient. The larger, flatter regions of tissue typically lead to the best LCM performance. Avoid

Fig. 1. Bone fragments in a were removed using a clearing cap prior to LCM of cancer cells. b shows the cleared tissue. c and d show cap melting performance with (c) and without (d) the interfering bone fragments. When the cap can rest flat on the tissue surface, melting is more consistent. IR laser settings: single pulse at 80 mW, 2,500 ms for c and d.

300

Foye and Febbo

folded or raised tissue because it often impairs melting performance (Fig. 1). Imperfections in sections such as folds can be corrected prior to slide loading. See Note 9 for tips on macrodissection and pre-LCM slide preparations.
6. Once a region is chosen, center the live viewing window on the middle of the region, right click, and select place cap at region center. During cap movement, do not press any command buttons.
7. Once the cap has been placed on the slide and the machine has returned to rest, dim the visible light (microscope control box) and use the infrared (IR) laser focus bar (capture laser control box) to hone the beam to a focused spot. Reset the visible light level and test fire the IR laser (double click) over either a highly visible section of targeted tissue or a location without tissue. Begin with the following laser parameters: 65 mW, 2,500-ms duration, two pulses. See Note 10 for more information on laser setup for various tissue types and LCM conditions.
8. The laser spot size is configured by adjusting the listed parameters from the capture laser control box. Spot size will vary depending on the cap height above the tissue, tissue dryness, and inconsistencies in both the cap membrane and the tissue surface (see Note 11). Smaller spot sizes typically offer higher LCM precision but result in longer capture times. Use the largest spot possible that fires consistently and offers the required amount of precision. Settings for the capture laser parameters can vary dramatically from slide to slide. As a general guide, increased pulse durations will lead to larger spot diameters and often require a decreased pulse power. Pulse power should be adjusted to ensure proper melting. If the laser test spot appears as a blurred round spot, the membrane has not melted to the tissue/slide surface and the laser power needs to be increased.
9. Once a reliable spot is created, use the spot size function (capture laser control box) to record the diameter of the melted spot. Click on either side of the inside edge of the black spot perimeter (this marks the two end points of the spot diameter). If measuring a spot off-tissue, mark the edges on the spot perimeter to account for the larger spot size made over the tissue (see Note 11).
10. Begin targeting in one of two ways: from a static image or from the live video window. To capture from the live video screen, skip to step 11. To acquire a static image, use the box selection tool (red rectangle) to draw a box over the region of interest. This region must be within the circumference of the placed cap to capture the cells without repositioning the cap and going through an additional laser setup. See Note 12

Cancer Gene Profiling in Prostate Cancer


Fig. 2. Pre-LCM (left) and post-LCM (right) static images with post-LCM cap image (bottom).

for hints on successful region selection for LCM. Once the region is drawn, deselect the box tool, right click on the box, and select acquire region. All tissue targeting will be done from the static image that will appear once each tiled image is taken (Fig. 2).
11. To begin marking tissue, select one of the marking tools from the microdissection control box. The Veritas system, as well as other LCM units, offers a spot tool, a line drawing tool, and a region drawing tool (shown as a red polygon). Choose the tool that best matches the regions of interest. In most cases, especially with plain glass slides, basic LCM (IR laser only) will be the best method of capture. Note 13 highlights some joint uses of the IR and UV lasers to eliminate capture of unwanted tissue. Marking tools can be changed throughout the marking process. Be sure to leave a comfortable margin between the target spots and unwanted tissue. A good width to start with is the radius of the spot size. The total area of the cells targeted will appear in the capture groups control box.
12. Once the desired area of tissue is targeted (Fig. 3), select LCM capture selected cells under the Image menu. If targeting was done on the live video window, the Go – Cut and


Fig. 3. Targeted regions on a static image prior to LCM.

Capture button can be selected from the microdissection control box. Do not click any commands while the instrument is performing LCM. Regardless of targeting method, the live window will move to the top so that the LCM process can be viewed.
13. Once LCM is complete, move the live video window to a wide open region of the slide, right click, and select move cap to region center. This will move the cap to a clear section of slide to allow visualization of the captured tissue. Once the cap move is complete, the box tool can be used as during targeting to take a tiled static image of the cap after capture (Fig. 2).
14. If satisfied after inspection of the captured tissue, move the cap to an offload bay in the materials control box. Wait for the machine to complete all movement before proceeding. If unwanted tissue was pulled up by the cap, ablation may be necessary. Note 14 offers troubleshooting tips for poor LCM performance. Ablation using the UV laser will require more time and could damage good tissue adjacent to the UV beam. To ablate, drag the cap to the QC position in the offload bay. Start with a UV laser power setting of medium at roughly 30 for smaller-scale ablation. Larger-scale ablation can be done at higher power, but settings in the high category are often overkill. The command alt-x will activate the UV laser, and movement is achieved by dragging the live


Fig. 4. (a) Cap image showing a region of captured tissue from Fig. 2. The tissue in the upper left (a) was not targeted and was removed through UV laser ablation. b shows the tissue on the cap after ablation.

video window (Fig. 4). When ablation is complete, drag the cap to an offload bay in the materials control box.
15. Return the live video screen to the region of capture and autofocus. Reacquire a static image for postcapture visualization (Fig. 2).
16. LCM is complete at this point; click New Session to open the door to unload the caps.
17. Using a GeneAmp 0.5-mL tube prepared with lysing buffer (see Subheading 2.5), open the tube, invert the LCM cap over the top of the tube, and press to form a seal. Invert again, cap side down, and vortex thoroughly to pull cells off of the LCM cap. See Note 15 for combining samples with small capture areas to yield higher RNA concentrations.
18. The lysed cell solution can be stored at −30°C for several weeks, or placed on ice if RNA isolation is to begin immediately. The lysis buffer discussed in Subheading 3.6 contains guanidine thiocyanate to stabilize RNA; thus, storage at −30°C is sufficient.

3.5. Frozen Tissue Macrodissection

1. Allow the tissue block to warm to the cryostat temperature of −30°C.
2. Using the razor blade, cut away the surrounding OCT medium to expose the flat surface of the tissue block. See Note 16 for cutting tips.
3. Using a microcentrifuge tube and stand, tared to zero on an accurate balance, place pieces of the tissue block into the tube until a mass of 50–100 mg is reached.
4. Resurface the remaining tissue block with OCT and return the block to cryogenic storage.
5. Keep the weighed tissue sample on ice.


6. Add 600 µL of lysis buffer from the mirVana® miRNA isolation kit (or the appropriate volume of lysis buffer from the RNA isolation kit of choice).
7. Add the contents of one 2-mL Lysing Matrix A magnetic beads tube.
8. Agitate in the FastPrep homogenizer for 20 s at the level 6 setting (see Note 17).
9. Spin the sample in a centrifuge set to 13,000 × g for 1 min at room temperature.
10. Remove lysate solution (top layer) and set aside.
11. Spin again at 13,000 × g for 1 min.
12. Take the lysate (top layer), add to the previous lysate, and proceed with the RNA isolation using the mirVana® kit.

3.6. RNA Isolation (See Note 18)

3.6.1. LCM Tissue: Stratagene®¹

1. Label two spin cups and one filter cup with a cap for each sample to be isolated. Pretreat the filter by adding 100 µL of lysis buffer to the filter cup, mount in a spin cup, and centrifuge at 10,000 × g for 60 s. Discard buffer.
2. Thaw samples prior to isolation if working with frozen lysed cells from LCM. Once thawed, add a volume of 70% ethanol equal to the volume of lysis buffer initially added. Vortex thoroughly. For combined samples, mix the contents of all combined tubes in a clean, RNase-free tube (see Note 19), then add an equal volume of 70% ethanol.
3. Transfer the contents into the corresponding spin/filter cup. Cap and spin at 13,000 × g for 60 s at 4°C. For combined samples, add up to 400 µL of the cell/ethanol mix. Discard the filtrate and repeat this step until the entire combined sample volume has passed through the spin cup.
4. Discard the filtrate and reseat the filter cup. Add 300 µL of Low-Salt Wash Buffer. Spin at 13,000 × g for 60 s at 4°C.
5. At this point, take the DNAse I enzyme out of the −30°C freezer and place it on ice. This enzyme solution is sensitive to degradation, so mix gently when necessary. Prepare the DNAse solution according to the following chart (all volumes in µL). Pipet up and down slowly to mix, then place on ice.
6. Discard the filtrate and reseat the filter cup. Add 15 µL of DNAse treatment directly onto the fiber matrix. Replace the cap and incubate in a 37°C heating block for 15 min.
7. Fill a 0.6-mL tube with elution buffer according to the chart, then place in a 60°C heating block.

1 The protocol is derived directly from the Stratagene® Absolutely RNA® Nano Prep Kit Instruction Manual, Catalog #400753, Revision #033001a.


# of samples   DNAse I   DNAse digestion buffer   Elution buffer
4              11        55                       75
5              14        70                       90
6              17        85                       105
7              20        100                      120
8              23        115                      135
9              26        130                      150
10             29        145                      165
11             31        160                      180
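For batch work, the chart above can be encoded as a small lookup so that the volumes for a given number of samples are read off rather than retyped. The following is a minimal sketch only: the names `DNASE_CHART`, `reagent_volumes`, and `rows_off_ratio` are illustrative and not part of the kit protocol, and the values are copied from the chart as printed (all volumes in µL). The helper `rows_off_ratio` simply reports which rows do not have a 1:5 enzyme-to-buffer ratio; the 11-sample row is the only one, and it is kept as printed, so it may be worth checking against the kit manual.

```python
# Volumes in µL, copied from the chart as printed.
# Key: number of samples. Value: (DNAse I, DNAse digestion buffer, elution buffer).
DNASE_CHART = {
    4: (11, 55, 75),
    5: (14, 70, 90),
    6: (17, 85, 105),
    7: (20, 100, 120),
    8: (23, 115, 135),
    9: (26, 130, 150),
    10: (29, 145, 165),
    11: (31, 160, 180),
}


def reagent_volumes(n_samples):
    """Return the chart row for n_samples as a dict of volumes in µL."""
    if n_samples not in DNASE_CHART:
        low, high = min(DNASE_CHART), max(DNASE_CHART)
        raise ValueError(f"chart covers {low}-{high} samples, got {n_samples}")
    dnase, digestion_buffer, elution_buffer = DNASE_CHART[n_samples]
    return {
        "dnase_i": dnase,
        "digestion_buffer": digestion_buffer,
        "elution_buffer": elution_buffer,
    }


def rows_off_ratio(ratio=5):
    """List sample counts whose buffer volume is not ratio x the DNAse I volume."""
    return [
        n
        for n, (dnase, digestion_buffer, _) in sorted(DNASE_CHART.items())
        if digestion_buffer != ratio * dnase
    ]


if __name__ == "__main__":
    print(reagent_volumes(8))
    print(rows_off_ratio())
```

Sample counts outside the printed range raise an error rather than being extrapolated, because the chart does not state a formula.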

8. After incubation, add 300 µL of High-Salt Wash Buffer to the filter cups. Spin at 13,000 × g for 60 s at 4°C.
9. Discard the filtrate and reseat the filter cup. Add 300 µL of Low-Salt Wash Buffer. Spin at 13,000 × g for 60 s at 4°C. Discard the filtrate and repeat the low-salt wash.
10. Discard the filtrate and reseat the filter cup. Spin the tubes at 13,000 × g for 3 min at 4°C to dry the fiber matrix.
11. Discard the spin cup and reseat the filter cup into a new collection tube. Add 15 µL of warmed elution buffer directly to the fiber matrix and incubate at room temperature for 2 min (see Note 19).
12. Spin the tubes at 13,000 × g for 5 min at 4°C.
13. Carefully remove the tubes and transfer the eluted RNA back onto the fiber matrix for a second elution pass. Spin the tubes at 13,000 × g for 5 min at 4°C.
14. Discard the filter cup and aliquot RNA out of the collection tube into appropriate sample tubes, typically one 10-µL aliquot for amplification and one 4.5-µL aliquot for quality verification.

3.6.2. Tissue Block: Ambion®²

1. Add 1/10th volume of microRNA (miRNA) Homogenate Additive to the tissue lysate and mix well by vortexing (e.g., 30 µL of miRNA Homogenate Additive for 300 µL of lysate).
2. Leave the mixture on ice for 10 min.
3. Add a volume of acid-phenol:chloroform equal to the tissue lysate volume before the addition of the miRNA Homogenate Additive.
4. Vortex for 30–60 s to mix.

2 The protocol is derived directly from the Ambion® mirVana miRNA Isolation Kit Instruction Manual, Catalog #1560, Manual Version 0601.


5. Centrifuge for 5 min at ≥10,000 × g at room temperature to separate the aqueous and organic phases. After centrifugation, the interphase should be compact; if it is not, repeat the centrifugation.
6. Remove the aqueous (upper) phase without disturbing the lower phase and transfer it to a fresh tube. Note the volume removed.
7. Preheat the nuclease-free water for elution to 95°C and ensure that the 100% ethanol is at room temperature.
8. Add 1.25 volumes of 100% ethanol to the aqueous phase and mix thoroughly (1.25× the volume removed in step 6).
9. Place a filter cartridge into a collection tube for each sample.
10. Pipet the lysate/ethanol mix onto the filter cartridge (see Note 20 for volume limitations and serial filtrations).
11. Centrifuge for 15 s at 10,000 × g.
12. Discard the filtrate. If necessary, repeat the spin to pass all of the lysate through the filter.
13. Add 700 µL of miRNA wash solution 1 to the filter cartridge and centrifuge for 10 s at 10,000 × g. Discard the filtrate.
14. Add 500 µL of wash solution 2/3. Spin at 10,000 × g for 10 s. Discard the filtrate.
15. Repeat step 14 for a second wash. Discard the filtrate.
16. Return the filter cartridge to the collection tube and spin at 10,000 × g for 1 min to remove any remaining fluid.
17. Transfer the filter cartridge to a fresh collection tube.
18. Add 100 µL of the preheated nuclease-free water to the center of the filter and close the cap.
19. Spin at ≥10,000 × g for 30 s to recover the RNA.
20. The eluted liquid will contain the RNA. Store at ≤−30°C.

3.7. RNA Quality Verification

3.7.1. Agilent 2100 Bioanalyzer RNA 6000 Pico Kit³

1. Allow the package of RNA 6000 Pico reagents to equilibrate to room temperature for 30 min. Also allow the RNA samples and RNA ladder aliquot to defrost (see Note 21 for RNA ladder aliquot information). Power the 2100 Bioanalyzer on, and load the software.
2. Vortex the dye concentrate (blue cap) for 10 s, spin down, and add 1 µL of dye to one of the prepared 65-µL aliquots of filtered gel (see Note 22 for filtration steps).
3. Vortex the solution well. Spin the tube at 13,000 × g for 10 min at room temperature.

3 The protocol is a modified version of the original provided by Agilent Technologies, Agilent RNA 6000 Pico Kit Guide, Manual PN G2938-90044, Edition 08/2006, Waldbronn Germany, ©Agilent Technologies, Inc. 2006.


4. While the gel–dye mix is spinning, pipet 1.0 µL of each RNA sample and ladder into separate 0.6-mL microcentrifuge tubes. Keep all samples on ice until loading onto the chip.
5. Place a new RNA 6000 Pico chip in the priming station. Be sure that the base plate is in position C.
6. Pipet 9.0 µL of the filtered gel–dye mix into the well labeled G with the dark circle.
7. Be sure that the syringe is at the 1.0-mL mark and that the plunger clip is in the highest position. Close the priming station and press the plunger until it is held by the clip.
8. Wait exactly 30 s and release the plunger clip. The syringe will slowly rise; when it stops, slowly pull the plunger back up to the 1.0-mL mark.
9. Open the station and add 9.0 µL of gel–dye mix to the other two wells labeled G.
10. Pipet 9.0 µL of conditioning solution (white cap) into the well marked CS.
11. Before loading samples, pipet 350 µL of nuclease-free water into a cleaning chip and load into the 2100 Bioanalyzer to clean the electrodes. Leave the chip in for 5–10 min while preparing the RNA samples for chip loading. The lid of the machine must be open for at least 30 s after the wash to allow the electrodes to dry.
12. Pipet 5.0 µL of marker (green cap) into each 1.0-µL aliquot of RNA, including the ladder.
13. Vortex each sample mix and spin down.
14. Load each sample into the wells numbered from 1 to 11.
15. Pipet 6.0 µL of marker (green cap) into each unused sample well.
16. Pipet the RNA ladder into well 12 (the well marked with a small ladder).
17. Load the chip into the 2100 Bioanalyzer and be sure the chip plate is set to position 1. If all electrodes are making contact, the machine icon will change to an RNA 6000 Pico chip and the run can begin.
18. Begin the run by clicking start. During the run, samples can be labeled by selecting the Data and Assay icon and filling in the appropriate fields.
19. When the run is completed, wash the electrodes, as before, for 5–10 min.

The 18S and 28S peaks on the electropherogram will be indicative of RNA sample quality (Fig. 5). Depending on the experiments to follow, the height of the 28S peak will likely dictate the level of RNA quality (see Note 23). High-quality RNA will have a 28S peak double the height of the corresponding 18S peak, although after LCM work, 28S


Fig. 5. Electropherograms from various RNA samples run for analysis via RNA Pico kit on the Agilent 2100 Bioanalyzer. a Diluted sample of RNA from cultured LNCaP cells. b, c, and d RNA from laser-captured samples. Note the difference in 28S peaks, with cell line RNA (a) being of the highest quality. b represents relatively good RNA quality for LCM samples, and c and d show degraded, and low-concentration degraded RNA, respectively. Marker peaks are at approximately 20 s.

peaks as low as half the 18S peak height can yield good results after amplification. See Note 24 for troubleshooting with the Pico Kit.

3.7.2. RiboGreen® RNA Quantification Assay⁴

1. Allow all RNA samples and the diluted ribosomal RNA standard to thaw on ice. See Note 25 for dilution of the ribosomal RNA standard included in the RiboGreen® kit.
2. Remove the RiboGreen® dye reagent from the freezer, wrap in foil to protect from light, and place on ice. See Note 25 for handling of the dye reagent.
3. Prepare a dilution of TE buffer using the 20× stock, stored at 4°C. Five hundred microliters of 20× TE buffer in 9.5 mL nuclease-free water will yield more than enough buffer for eight RNA samples. Keep the 1× buffer on ice.
4. Cover a 5 mL or larger RNase-free tube with foil to block light. Transfer 3 mL of 1× TE buffer to the covered tube and place on ice.
5. Unwrap the 96-well plate and place it in a separate ice bucket, applying pressure down to form a solid contact between the bottom of the plate and the ice. Pack the surrounding ice

4 The protocol is derived from the Golub Lab, Dana-Farber Cancer Institute (14).


down to avoid any ice contamination inside the wells. Keep the plate covered until the first standards are loaded.
6. Pipet 180 µL of 1× TE buffer into well A1.
7. Pipet 100 µL of 1× TE buffer into wells A2–A9.
8. Pipet 20 µL of the diluted ribosomal RNA standard (2.0 µg/mL) into well A1 and pipet repeatedly to mix.
9. Prepare the ribosomal RNA standards via a serial dilution. Transfer 100 µL from well A1 to well A2 and pipet repeatedly to mix. Continue the serial dilution through well A8. The resulting RNA concentrations are outlined in the table below.

Plate well   Volume 1× TE (µL)   Volume of RNA standard (µL)   Volume of 1:2,000 dye (µL)   Final RNA conc. (ng/mL)
A1           180                 20 (of 2 µg/mL standard)      100                          100
A2           100                 100 from A1                   100                          50
A3           100                 100 from A2                   100                          25
A4           100                 100 from A3                   100                          12.5
A5           100                 100 from A4                   100                          6.25
A6           100                 100 from A5                   100                          3.13
A7           100                 100 from A6                   100                          1.56
A8           100                 100 from A7                   100                          0.78
A9           100                 None                          100                          0
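The arithmetic behind the standards table, and the regression used later in this subheading to convert fluorometer readings into sample concentrations, can be sketched in a few lines. This is an illustration under stated assumptions rather than the laboratory's own spreadsheet: the function names are hypothetical, each well is assumed to hold 100 µL of standard when the 100 µL of dye is added (as the table implies), and the fluorometer readings in the usage example are invented, perfectly linear numbers used only to exercise the code. The fit is an ordinary least-squares line of reading versus concentration, and the back-calculation multiplies by 200 for the 1 µL into 199 µL sample dilution.

```python
def standard_concentrations(stock_ng_per_ml=2000.0, n_wells=8):
    """Final RNA concentration (ng/mL) in wells A1..A8 after dye is added."""
    # A1: 20 µL of the 2 µg/mL standard in 200 µL total (a 1:10 dilution).
    conc = stock_ng_per_ml * 20.0 / 200.0
    finals = []
    for _ in range(n_wells):
        # 100 µL of standard plus 100 µL of dye halves the concentration.
        finals.append(conc / 2.0)
        # 100 µL carried into 100 µL of TE halves it for the next well.
        conc /= 2.0
    return finals


def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def sample_concentration(readings, slope, intercept, dilution_factor=200):
    """Average replicate readings, invert the fit, and undo the sample dilution."""
    mean_reading = sum(readings) / len(readings)
    return (mean_reading - intercept) / slope * dilution_factor


if __name__ == "__main__":
    concs = standard_concentrations() + [0.0]      # A1..A8 plus the A9 blank
    readings = [2.0 * c + 1.0 for c in concs]      # hypothetical readings
    slope, intercept, r_squared = linear_fit(concs, readings)
    if r_squared < 0.99:
        print("R^2 below 0.99: rerun the assay")
    print(sample_concentration([21.0, 21.0], slope, intercept), "ng/mL")
```

The result is in ng/mL of the undiluted sample; divide by 1,000 for ng/µL.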

10. Transfer 1.5 µL of the thawed RiboGreen® dye reagent to the covered tube of 3 mL of 1× TE buffer. Vortex briefly to mix. Place on ice.
11. Using a repeater pipet, add 100 µL of the diluted dye reagent to each RNA standard, wells A1–A9. Remove the plate from ice and gently swirl to mix the contents of the wells.
12. Cover the plate with an ice bucket lid and allow the dye to mix at room temperature for 60 s.
13. Power on the minifluorometer and calibrate the unit for blue fluorescence using the prepared blank (A9) and the 12.5 ng/mL standard (A4). Be sure the standard value of the fluorometer is set to 12.5.
14. Immediately begin reading the samples by loading the full 200 µL into a cuvette, starting with A9 and A4. Continue in order from lower to higher concentration (A8–A1).
15. Record the results in Microsoft Excel, creating a linear regression of fluorometer reading versus the associated RNA standard concentration.


16. Return the 96-well plate to the ice bucket.
17. Begin taking 1-µL aliquots of each RNA sample and place them in separate 0.5-mL tubes.
18. Add 199 µL of 1× TE buffer to each sample and pipet gently to mix.
19. Add 100 µL of each sample to the plate wells in row C.
20. Using the repeater pipet, add 100 µL of the diluted dye reagent to each loaded well. Swirl the plate and cover at room temperature for 60 s.
21. Read each sample and record the results in the spreadsheet.
22. Repeat steps 19–21 with the remaining diluted samples. Load in plate row E.
23. Once all data have been recorded, remove from the linear regression any RNA standards that may reflect loading or read errors. If the R² value of the regression is less than 0.99, rerun the assay (see Note 26).
24. Use the regression line equation to calculate the concentrations of the samples from the fluorometer readings. Use the average of the two readings for each sample in the calculation and multiply by 200 to account for the sample dilution.

3.8. RNA Amplification⁵

(See Note 27)

3.8.1. First-Strand Complementary DNA (cDNA) Synthesis (Reagent Set A, Blue)

1. Thaw the first-strand reagents (Set A, blue caps). Mix each reagent, spin down, and place on ice.
2. For each sample, place 5 µL of total RNA into a 0.2-mL PCR tube and place on ice (see Note 28). Eight-tube strips are convenient for running multiple samples. Label the tubes to retain the proper orientation.
3. Add 2 µL of first-strand primer mix A1, flick tubes to mix, and spin down.
4. Place the tubes in a thermal cycler running the following program: 65°C for 5 min, 4°C forever. Wait until the cycler reaches 65°C before loading samples. Use the heated lid function of the cycler, if available.
5. After the program completes the 65°C incubation, immediately remove the samples and snap cool on ice.
6. Make the first-strand master mix by combining 12 µL of buffer mix A2 with 1 µL of enzyme mix A3 for each sample. Account for some liquid loss by using an X + 0.2 master mix formula, where X represents the number of samples (e.g., eight samples would require 98.4 µL A2 and 8.2 µL A3 for a 106.6 µL total volume of master mix).

5 The protocol is taken from the NuGEN™ Technologies, Inc., Ovation™ Biotin RNA Amplification and Labeling System Version 1.0 User Guide, Catalog #D01002, Version 09.06.05.


7. Mix the first-strand master mix, spin down, and place on ice.
8. Add 13 µL of the first-strand master mix to each sample tube. Mix and spin down.
9. Place tubes in a thermal cycler running the following program: 48°C for 60 min, 70°C for 15 min, and 4°C forever.

3.8.2. Second-Strand cDNA Synthesis (Reagent Set B, Yellow)

1. Thaw the second-strand reagents (Set B, yellow caps). Mix each reagent, spin down, and place on ice.
2. Once the thermal cycler reaches 4°C from the previous program, remove the tubes, spin down, and place on ice.
3. Make the second-strand master mix by combining 18 µL of buffer mix B1 with 2 µL of enzyme mix B2 for each sample. Use the X + 0.2 formula as with the first-strand master mix.
4. Mix the second-strand master mix, spin down, and place on ice.
5. Add 20 µL of the second-strand master mix to each first-strand reaction tube. Mix and spin down.
6. Place tubes in a thermal cycler running the following program: 37°C for 30 min, 75°C for 15 min, and 4°C forever.
7. When the cycler reaches 4°C, remove the reaction tubes, spin down, and place on ice.
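The X + 0.2 master mix formula used for the first- and second-strand mixes amounts to scaling each per-sample volume by the number of samples plus 0.2. A minimal sketch follows; the function name `master_mix` and the dictionary names are illustrative, and the per-sample volumes (in µL) are those given in the steps above.

```python
# Per-sample reagent volumes in µL, as given in the protocol steps.
FIRST_STRAND = {"buffer_mix_A2": 12.0, "enzyme_mix_A3": 1.0}
SECOND_STRAND = {"buffer_mix_B1": 18.0, "enzyme_mix_B2": 2.0}


def master_mix(per_sample_ul, n_samples, overage=0.2):
    """Scale per-sample volumes by (n_samples + overage); volumes in µL."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples + overage
    # Round to 0.01 µL so floating-point noise does not show up in the output.
    mix = {name: round(volume * factor, 2) for name, volume in per_sample_ul.items()}
    mix["total"] = round(sum(per_sample_ul.values()) * factor, 2)
    return mix


if __name__ == "__main__":
    print(master_mix(FIRST_STRAND, 8))
    print(master_mix(SECOND_STRAND, 8))
```

For eight samples the first-strand call reproduces the worked example in the text (98.4 µL A2, 8.2 µL A3, 106.6 µL total).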

3.8.3. SPIA™ Amplification (Reagent Set C, Red)

1. Thaw the SPIA™ amplification reagents (Set C, red caps). Vortex C1 and C2-C. Invert C3-C five times. Spin all tubes down and place on ice.
2. Make SPIA™ master mix by combining 72 µL of C2-C, 4 µL of C1, 4 µL of D1 (nuclease-free water, green cap), and 40 µL of C3-C for each sample. Use the X + 0.2 formula as before.
3. Mix the SPIA™ master mix, spin down, and place on ice.
4. On ice, add 120 µL of SPIA™ master mix to each second-strand reaction tube. Mix and spin down.
5. Place half of the 160-µL reaction into a separate 0.2-mL PCR tube (or a second eight-tube set of PCR tubes, labeled accordingly). Cap tightly and spin down.
6. Place both sets of tubes in a thermal cycler running the following program: 48°C for 60 min, 95°C for 5 min, and 4°C forever.
7. When the cycler reaches 4°C, remove the reaction tubes and spin down.
8. Combine the contents of the two tubes for each sample, spin down, and place on ice. Proceed immediately to the Zymo Research purification step, or store the SPIA™ cDNA at −20°C for purification and labeling later.
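Because each stage adds reagent to the same tube, it can be useful to confirm that the per-sample additions sum to the reaction volumes quoted in the steps. The short sketch below tallies the additions from first-strand synthesis through SPIA™ amplification; the names `ADDITIONS_UL`, `SPIA_PER_SAMPLE_UL`, and `running_volumes` are illustrative, and the volumes (in µL) are taken from the steps above.

```python
# Per-sample additions in µL, in the order they enter the reaction tube.
ADDITIONS_UL = [
    ("total RNA", 5.0),
    ("first-strand primer mix A1", 2.0),
    ("first-strand master mix", 13.0),
    ("second-strand master mix", 20.0),
    ("SPIA master mix", 120.0),
]

# Per-sample composition of the SPIA master mix in µL.
SPIA_PER_SAMPLE_UL = {"C2-C": 72.0, "C1": 4.0, "D1 (water)": 4.0, "C3-C": 40.0}


def running_volumes(additions):
    """Return (reagent name, cumulative tube volume in µL) after each addition."""
    total = 0.0
    ledger = []
    for name, volume in additions:
        total += volume
        ledger.append((name, total))
    return ledger


if __name__ == "__main__":
    for name, total in running_volumes(ADDITIONS_UL):
        print(f"after {name}: {total:.1f} µL")
    final_volume = running_volumes(ADDITIONS_UL)[-1][1]
    # The reaction is split in half before the amplification program.
    print(f"each half: {final_volume / 2:.1f} µL")
```

The tally ends at 160 µL, so each half placed in the thermal cycler holds 80 µL, and the SPIA™ components sum to the 120 µL added per sample.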


3.8.4. Purification of SPIA™ cDNA: Zymo Research Clean and Concentrator™-25

1. Add 320 µL of DNA binding buffer to a clean 1.5-mL tube for each SPIA™ cDNA sample.
2. Add the 160 µL of amplified SPIA™ cDNA product to each corresponding 1.5-mL tube. Vortex and spin down.
3. Place a Zymo-Spin II column into a collection tube for each cDNA sample to be purified.
4. Load the entire 480 µL of each cDNA sample onto the appropriate Zymo-Spin II column.
5. Spin the columns for 10 s at 10,000 × g and discard the filtrate.
6. Wash each sample by adding 200 µL of room temperature 80% ethanol.
7. Spin the columns for 10 s at 10,000 × g and discard the filtrate.
8. Repeat step 6. Spin the columns for 30 s at 10,000 × g and discard the filtrate.
9. Blot each column tip on filter paper to absorb any residual ethanol wash.
10. Place each column in a clean 1.5-mL tube.
11. Add 30 µL of room temperature, nuclease-free water (D1, green cap) to the center of the column.
12. Spin the columns for 30 s at 10,000 × g.
13. Collect the sample, which should be approximately 30 µL of purified cDNA.
14. Vortex each sample, spin down, and proceed to fragmentation and labeling.

3.8.5. Fragmentation and Labeling (Reagent Set F, Purple)

1. Thaw the fragmentation and labeling reagents (Set F, purple caps). Invert F2 to mix, then spin down. Vortex all other Set F reagents and spin down. Place F2 and F4 on ice. Leave F1, F3, and F5 at room temperature.
2. Place 25 µL of each SPIA™ cDNA sample into a clean 0.2-mL PCR tube (or a strip of eight PCR tubes).
3. Add 5 µL of fragmentation buffer F1 to each tube and mix.
4. Add 5 µL of fragmentation buffer F2 to each tube. Mix and spin down.
5. Place the set of tubes in a thermal cycler running the following program: 50°C for 30 min and 4°C forever.
6. Once the cycler reaches 4°C, remove the tubes and spin down.
7. Add 5 µL of labeling buffer F3 to each sample and mix.
8. Add 2.5 µL of biotin reagent F4 to each sample. Mix and spin down.


9. Place the set of tubes in a thermal cycler running the same program used in step 5.
10. Once the cycler reaches 4°C, remove the tubes, spin down, and place on ice.
11. Add 7.5 µL of stop buffer F5 to each sample tube. Mix and spin down.

3.8.6. Purification of Biotin-Labeled SPIA™ cDNA: Qiagen DyeEx 2.0 Spin Kit

1. Prepare a DyeEx column for each cDNA sample by gently vortexing each column to resuspend the resin (see Note 29).
2. Loosen the cap of the column a quarter turn to avoid creating a vacuum and snap off the bottom closure of the column.
3. Place each column in a clean 2.0-mL collection tube.
4. Centrifuge the columns for 3 min at 750 × g.
5. Discard the filtrate and transfer each column to a clean collection tube.
6. Remove and discard the column cap. Apply each cDNA sample to the center of the slanted gel bed surface. See Subheading 4 for further details regarding this step.
7. Centrifuge the columns for 3 min at 750 × g.
8. Remove and discard the spin columns. The purified fragmented and labeled cDNA product is in the collection tube.
9. The final concentration and cDNA quality can be measured using a NanoDrop® ND-1000 spectrophotometer.

4. Notes

1. Fresh-frozen tissue block preparation and storage
OCT and the Pallet: When forming pallets for frozen block preparation, wait until the bottom layer of OCT has solidified (turns opaque) before lowering the heat extractor. If the extractor is lowered too early, the OCT compound will not bind to the surface of the pallet and the OCT wafer can separate during the tissue collection.
Pallet Holder: A sample holder can be any fabricated device that is an excellent thermal conductor and can hold the pallets flat so that OCT does not run off before solidification during the collection. A solid sample holder can be made from soldering copper pipes into a stand that can sit inside a cooler. The stand has a larger (¾-in. diameter) pipe that is parallel to the ground


Fig. 6. Copper pallet holder with frozen OCT pad on a pallet.

and has perpendicular holes drilled at a diameter that can snugly hold the sample pallet posts. The copper pipe quickly takes on the temperature of surrounding dry ice, facilitating tissue freezing in addition to holding the pallets level (Fig. 6).

2. Frozen tissue sectioning temperature
The temperature of the cryostat will change the cutting performance of the microtome. In general, softer tissues are cut more cleanly at lower temperatures. For example, breast tissue cuts well at −25°C or lower. Harder tissues, such as bone, will cut more easily at higher temperatures. Bone biopsies for metastatic prostate cancer typically need to be cut as warm as −10°C. Handling of tissue blocks can become difficult above −15°C because the OCT medium melts more quickly than if the chamber is set colder. Having a set of forceps prechilled on dry ice can facilitate handling tissue blocks at warmer chamber temperatures. A good starting temperature is −20 to −25°C. As you begin removing the outer layer of OCT, keep track of the blade’s motion as it slices. Chattering across OCT and tissue cracking or excessively curling back can mean the temperature is too low or the tissue block has not warmed to the chamber temperature. If the section does not form a clean slice off the razor and the tissue bunches near the blade, try using a colder temperature.

3. Frozen tissue section thickness
A good section depth for LCM work is 7–8 µm. This is thin enough to provide clarity from a histology perspective, yet thick enough to yield decent amounts of RNA. Sections greater than 10 µm often compromise the stain quality (via H&E) and have little, if any, benefit in RNA yield. Because LCM done with the IR laser and a macro cap pulls only from the upper surface of the tissue, LCM performed on thicker sections would often leave cells bound to the glass. Membrane slides can be used, but they do not address the stain quality concerns. Lysing captured tissue from a


cap used on a membrane slide is more challenging than with glass slides. The membrane on the slide and the cap fuse from the UV laser beam and form a sort of laminated seal that lysis buffers have difficulty penetrating.

4. RNA integrity with frozen sections
A great deal of RNA degradation can take place during a freeze and thaw process. It is therefore critical to stain and microdissect the frozen section immediately after sectioning, or store the tissue right away in an ultralow freezer until staining and LCM are performed. Because sectioning takes place at roughly −20°C and the tissue actually thaws when placed on a slide, a second freeze-thaw cycle can be eliminated if the section is stained and microdissected right away rather than put in storage.

5. Departures from standard H&E stain
This protocol differs from a standard H&E staining protocol to maximize RNA quality. Water contact has been minimized and the overall staining time has been reduced. The basic bluing reagent following the hematoxylin stain has been dropped. Exposure to xylenes has been shortened by several minutes and the single xylenes wash has been replaced with two brief washes to dissolve ethanol and dehydrate the tissue more efficiently. Rather than leaving the slide out to air dry for more than 10 min, the use of compressed air or nitrogen has the same drying effect in seconds. The protocol can be modified further depending on tissue type. Dip times in both hematoxylin and eosin can be altered to change stain coloring and intensity. Tissues with large amounts of stroma will often require less time in eosin to avoid overly pink coloring. Tissue with higher cell density will often require less time in hematoxylin.

6. Tissue dehydration and drying
Tissue dehydration and dryness are critical for accurate LCM performance. If LCM caps are not melting properly over flat sections of tissue, try increasing the soaking time in the last 95% and 100% ethanol steps.
Also try leaving the slide under compressed gas longer. Both of these changes will increase the dryness of the tissue and should yield better LCM performance if the standard staining protocol is not working.

7. LCM cap choice
Macro caps are “all around use” caps that consist of an IR-absorbing membrane bonded to a plastic cap. These will offer the largest area for capture, but require the target tissue to be dry and very flat. With a well-prepared sample, minimum spot sizes can reach as low as 5 µm in diameter, but achieving melting consistency at this size can be challenging. High sensitivity caps (HS) have a raised ring that rests between the membrane and the tissue surface, creating a reliable 12-µm-thick separation between membrane and tissue (Fig. 7). Theoretically this improves spot melting consistency and allows for smaller spot sizes. However,


Fig. 7. Macro LCM cap (a) and High Sensitivity (HS) LCM cap (b). The black ring on an HS cap is 12 µm in height, raising the cap membrane surface to a more uniform distance off the tissue compared with the macro caps. The actual performance difference between these caps will vary depending on tissue conditions.

an HS cap has seldom worked successfully when a macro cap failed under the same conditions.

8. RNA considerations
Special care must be taken when performing LCM for RNA-based projects. Time, temperature, and water exposure are all factors that will substantially degrade RNA in tissues. Unfortunately, the LCM process will inevitably introduce factors that will degrade RNA, but minimizing these factors can greatly improve results. The most important factor to address is time. Any opportunity to reduce the time that a tissue sample spends between frozen section and lysed cell solution should be taken. LCM should only be performed on one slide at a time, and a single cap should be used to avoid having slides and completed samples waiting in the machine while LCM is performed. Targeting tissue from a static image can greatly reduce targeting time by increasing the viewing area for marking. With respect to setting up LCM spots, the largest spots with the least overlap will result in the shortest LCM times. Use a spot size that is as large as possible while still providing the accuracy necessary to select the tissue of interest. However, avoid excessive pulse duration times (>6,000 ms) to maintain melting and cap adhesion reliability. Ablation is also a key consideration with RNA work because it requires extra time and introduces UV damage to tissue neighboring ablated cells. A balance must be achieved between eliminating unwanted tissue and minimizing time. If more than a minute of ablation is necessary to clean up a cap, there may be other methods of prepping tissue that are more efficient. Some of these methods are mentioned later in this section.

Cancer Gene Profiling in Prostate Cancer

9. Preparing tissue for LCM
For most tissue types, preparation beyond dehydration during staining is not necessary. Folded tissue can make cap placement difficult, but it can be dealt with simply by using a straight-edged razor blade to cut and scrape the raised part of the section off the slide.

Tissue sections involving bone can be particularly challenging for LCM. Bone fragments often create large discrepancies in tissue depth, and cells of interest may have to be targeted in different layers. LCM is primarily a two-dimensional process once the cap has been lowered, so, before loading the slide, take steps to ensure that your targeted cells are in one accessible plane. The following methods can be used to clear a tissue section of bone fragments and other unwanted, raised tissue:

Macrodissection: Using a sharp straight-edged razor blade or a scalpel (and magnifying glass, if necessary), carefully cut off and scrape away sections of dense bone that rise above the 7-µm tissue height (the section depth). Also cut away any pieces of tissue that are detached from the slide, because these can hinder cap melting performance or become stuck to the cap when LCM is completed and the cap is raised off the tissue.

Adhesive Paper: Bone fragments and other loose pieces of tissue can be removed with a sticky piece of paper, such as the adhesive strip of a Post-It® note. This is a delicate process because only a very small amount of adhesion is needed to pull up tissue. With a gloved hand, tap the adhesive section of the paper repeatedly to decrease the adhesiveness. Little or no pressure is required to stick loose tissue to the paper, so start as lightly as possible. It is very easy to accidentally remove the majority of the section with this method; however, it is fast and effective at removing bone. If this method removes too much tissue, try the cap clearing method below.
Cap Clearing Method: Another method of clearing unwanted tissue from a slide is to use an extra macro cap. When using this method, be sure to have extra caps loaded for each session. Complete the LCM method through step 7. Instead of using the cap to target cells of interest, use it to adhere to loose tissue fragments and to pull up raised regions of tissue. Because bone does not adhere to the glass slide surface as well as other tissues do, bone fragments very commonly stick to the cap membrane even if those areas are not targeted with the IR laser. By using a clearing cap, most fragments that would contaminate the actual sample cap will be removed, resulting in a purer sample (Fig. 1). Starting from step 7, set up the LCM laser using a higher power than normal (e.g., 85–100 mW). Given the rough surface of tissue prior to clearing, membrane melting performance can be inconsistent, and higher laser power will increase the chance of sticking to fragments. Use either the line tool or the spot tool to quickly mark areas for removal. Regions do not need to be

covered, just tagged with LCM spots. When tagging is complete, simply drag the cap to an offload bay, return to the tissue, and resume the LCM protocol at step 6.

10. IR laser settings
The IR laser parameters will most likely have to be adjusted for every sample. It is helpful to become familiar with the trends in adjusting laser power and pulse duration in particular. This is best done through practice rather than by reading a protocol, but the following notes address some basic trends.

Laser Power: The default setting for laser power is 70 mW. For a well-prepared, dry, and flat sample, this is too high. An ideal laser spot will be a clearly defined black circle with a very clear center region. The thick border represents the pour formed in the melted membrane. If the spot appears blurry in the center, the pour did not dip low enough to hit tissue, and power should be increased. If a black dot appears in the center of the melted spot, this represents overmelted membrane, either from splashing back toward the cap or from prolonged melting in the center region (Fig. 8). If this appears, lower the laser power. With inconsistent melting on uneven tissue, it is often necessary to allow some overmelting in order to achieve any melting at all in other regions of tissue. Overmelted membrane spots often still capture well, and the tissue will not be adversely affected.

Fig. 8. The top image is a comparison of varying pulse duration with constant power (single 70-mW shot). The spot diameter increases consistently with increased pulse duration up to approximately 4,500 µs, with diminishing returns thereafter. The lower image is a comparison of varying power settings with constant pulse duration (single 2,500-µs shot). Overmelting is seen at 100 mW, while the power was not enough to completely melt the membrane at 40 mW.

Pulse Duration: Pulse duration is the length of time the IR laser is on during each fire. The default value is 2,500 µs, which is often a good value for a medium spot size. On dry tissue, settings of 4,000–5,500 µs can yield very large spot sizes (>60 µm), which can be excellent for larger areas of cells. Generally, the higher the pulse duration, the larger the melted spot. Laser power typically has to be lowered as pulse duration is increased to avoid an excessively long, powerful laser pulse. Short durations (<1,000 µs) can be used for high-precision LCM. With well-prepared tissue, pulse durations as low as 500 µs can be used in conjunction with higher power (>70 mW) to achieve spot sizes below 7 µm (Fig. 8).

Pulse: This is the number of times the laser diode will fire for a single spot on the target window. The default setting is one pulse. Using two pulses will often increase the reliability of spot size and has a negligible effect on capture time. More than two pulses will have an effect similar to that of increased pulse duration (increased spot size), but can increase capture time depending on pulse duration.

11. Setting spot size
Setting a proper spot size is crucial for LCM accuracy, timing, and sample yield. Variability in each cap membrane and in the surface of the tissue section leads to inconsistencies in spot melting, so the spot size obtained when the IR laser is test fired during setup may change during the capturing process. For this reason, allow a margin of error when targeting tissue and recording the spot size. When marking regions, leave a buffer between the tissue of interest and the adjacent unwanted tissue. This will account for any spot size increase during capture and prevent overlap that can contaminate your sample. For the same reason, mark the edge of the spot inside the actual edge of the melted circle when measuring spot size.
This will adjust how the LCM software marks the tissue by recording a smaller-than-actual spot diameter. If the melted spot size decreases during the capture process, less tissue will be lost to gaps between spots. Spot overlap can also be adjusted from the capture groups control box. Select the properties button in the bottom right corner of the control box, then select the capture properties tab. Under the glass slide column, horizontal and vertical overlap can be adjusted. A value of 40 for each results in a spot overlap that increases reliability of capture. Increasing the value on samples with a high number of melting inconsistencies could improve LCM results, but adjusting IR laser settings may prove more helpful.

12. Picking regions for LCM
Region selection often determines the length of capturing time. Numerous small regions often take longer to capture than one large region, and smaller spot sizes lead to longer capture times.

Especially when working with RNA, capture time should be kept to a minimum to reduce RNA degradation. When selecting an area for capture, keep an eye on the time spent targeting as well as the area. Brief test experiments comparing capture area with RNA yield showed diminishing returns in RNA yield once the targeted area of malignant cells surpassed 3.0 mm². Yields vary tremendously with tissue type, condition, and many other factors. However, we typically target roughly 2–3 mm² of cells and find that, with larger regions of metastatic disease, this area is the best balance of capture time and RNA yield.

13. Combined UV/IR laser use and alternatives to UV ablation
There are generally two ways in which unwanted tissue adheres to a cap membrane.

The first arises because LCM performed on a glass slide is more of an adhesion and tissue tearing process than a cutting process. The melted membrane attaches to the tissue surface, and once the cap is lifted, the region of tissue is torn from the original section. Depending on tissue type, a border of tissue near the edge of the targeted region can be pulled up with the cap. This issue can be addressed by using the cut and capture LCM tools instead of simply the capture tools. When using a glass slide, this will add a perimeter cut via the UV laser to separate the targeted region of tissue from surrounding tissue. If using the cut and capture tools does not solve the problem, use this method in conjunction with the “cap clearing” method above. Instead of doing both the cutting and LCM on the same cap, perform the cutting on the first cap, discard it, then perform LCM using a clean cap. The unwanted tissue near the border of the UV-guided perimeter should be stuck to the first cap.

The second occurs when tissue that is not well fixed to the glass slide sticks to the cap and is lifted when the cap is removed. This can be handled using the cap clearing method mentioned above.
Often in the case of poorly fixed pieces of tissue, little or no LCM on the clearing cap is necessary. Simply placing a cap down over the region of interest and lifting it can pull unwanted tissue off the area; however, using the IR laser to create tags on the unwanted sections will increase the effectiveness of the method.

14. Troubleshooting
Getting the laser to melt a reliable spot is the key to successful LCM and thus is also the source of most problems. There is no single trick to achieving a reliable cap melt, but there are several approaches that can increase the chances of success. The following list presents possible solutions in the order in which they should be tried. The first several involve adjusting the laser and LCM cap; these attempt to fix problems while salvaging the tissue section and LCM session. Once problems arise, keep an eye on the time spent tinkering with LCM settings. A cutoff time will likely need to be set to ensure good RNA quality if the run

ultimately works. A good deal can be learned from botched LCM runs, so even if you determine that you will not get RNA from a given section, continue working on the laser and cap adjustments so that a remedy may be found more quickly during future sessions.

Increase Laser Power: Typically, a misfiring IR laser just needs more power to melt the cap down to the tissue. This should be the first adjustment made if a test fire does not melt properly.

Increase Laser Pulse Duration: For the same reasons mentioned with laser power, sometimes the membrane will melt if given a longer laser pulse. Pulse duration typically adjusts spot size (longer durations yield larger spot diameters).

Refocus IR Laser: If the IR laser is out of focus, the full power is not being channeled to the cap membrane. This will greatly affect cap melting. Note that the laser spot seen on the screen is in fact a 650-nm visible light laser diode that is set to mimic the IR laser and facilitate targeting and focus. The actual IR laser used for LCM is not visible.

Reposition the Cap: Cap positioning has a major impact on LCM performance. With good positioning, a huge range of laser settings will yield good LCM performance. If test fires of the IR laser are not working at all, chances are quite good that the cap is positioned over either a fold in the tissue or an uneven region of the section. The cap is essentially dropped by a robotic arm over the tissue section, so any unevenness on the surface of the section (even 1-µm differences) will result in varying gaps between the cap membrane and the tissue. There is no user-enabled function that can “level” a cap once it has been placed, so the only alternative is to find a flat region of tissue. The cap does not need to be centered over the targeted region of tissue. Try placing the cap off center so that the majority of the cap mass is over a flat region.
Keep in mind that the section itself is a 7-µm plateau, so placing the cap over the edge can lead to inconsistent spot sizes. If you must place a cap over the edge of the section, make sure more than 50% of the surface area of the cap is on the tissue section. That will not guarantee successful cap melting, but it will increase the likelihood that the cap will stay somewhat level over the tissue.

Replace the Cap: Caps can have defects that prevent them from melting. It is rare, but sometimes a membrane will separate from the cap. Other times, a cap may just refuse to melt properly. A faulty cap is an unlikely cause of poor melting, but it is a possibility and should be considered if other options fail to solve the problem.

Problems with Tissue Preparation: Insufficient tissue dryness is one of the most common causes of LCM problems. Any moisture in the tissue can prevent the cap from melting properly or prevent the membrane from adhering to the tissue. See Note 6 for steps on further dehydration of tissue sections.
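
The guidance in this note to keep more than 50% of the cap surface on the section can be checked with simple geometry. The following Python sketch assumes a circular cap footprint and a straight section edge, both of which are simplifications; the function name and the cap radius used in the example are hypothetical and should be replaced by measured values.

```python
import math


def fraction_on_tissue(cap_radius, center_offset):
    """Fraction of a circular cap footprint lying on the tissue section.

    `center_offset` is the signed distance from the section edge to the cap
    center: positive when the center is over tissue, negative when it is
    over bare glass. Assumes a straight edge and a circular cap.
    """
    if cap_radius <= 0:
        raise ValueError("cap_radius must be > 0")
    # Clamp so a cap lying entirely on one side gives 1.0 or 0.0.
    d = max(-cap_radius, min(cap_radius, center_offset))
    # Area of the circular segment that overhangs the section edge.
    overhang = cap_radius ** 2 * math.acos(d / cap_radius) - d * math.sqrt(
        cap_radius ** 2 - d ** 2
    )
    return 1.0 - overhang / (math.pi * cap_radius ** 2)


if __name__ == "__main__":
    radius_mm = 3.0  # hypothetical cap radius, for illustration only
    for offset_mm in (-1.0, 0.0, 1.0, 3.0):
        share = fraction_on_tissue(radius_mm, offset_mm)
        print(f"center {offset_mm:+.1f} mm from edge: {share:.0%} on tissue")
```

Under these assumptions, the fraction exceeds 50% exactly when the cap center lies on the tissue side of the edge, which gives a quick visual rule during placement.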

15. Combining small samples for higher RNA yield
Some samples will have fewer than 1,000 mm2 of cells that can be captured. A higher RNA yield can be achieved by combining lysed cells from numerous LCM caps during the isolation protocol. For these samples, use a total volume of 50 µL for the lysis buffer mix and freeze each tube separately, immediately after LCM. During isolation, up to 200 µL of lysed cells (four caps of tissue) can be passed through the filter cup at a time (total capacity of 400 µL, as a 1:1 mixture with 70% EtOH). With multiple 200-µL passes during isolation, there is technically no limit on the number of LCM caps that can be combined into one isolation tube. However, even with the smallest bone biopsy samples, our protocol has never required more than three passes.

16. Macrodissection
When coarse cutting a frozen tissue block, be careful to control the tissue on either side of the razor blade. Simply cutting down into the block will often cause one or both pieces to shoot away from the blade. Place the razor between your thumb and index finger in such a way that the fingertips are in contact with the tissue. Advancing the blade between the fingers (by a rolling/squeezing motion) will lower the razor through the tissue slowly while maintaining fingertip contact with both pieces of tissue. Be sure to minimize the time that fingers are in contact with the tissue block to avoid melting. If the block becomes soft, allow it to equilibrate to the cryostat chamber temperature before continuing any macrodissection.

17. Cell agitation
During the agitation process, it is normal for the sample to become foamy. The two spins are essential to achieve proper separation of the beads from the lysed tissue suspension. After each spin, the beads will form the bottom layer and the supernatant will contain the lysed cells in buffer.

18. Kit selection
The Stratagene kit provides both the ability to get RNA from very few cells and a very low elution volume requirement, which increases the concentration of the yield for the small sample sizes associated with LCM. The Stratagene Absolutely RNA® kits do not isolate microRNAs, and this should be considered if analysis of microRNA is to be included in your project.

19. Stratagene® RNA isolation notes
RNA Handling: Every precaution must be taken to preserve the small amount of RNA present when isolating RNA from LCM samples. Be sure every step of the isolation process is done wearing gloves that have been washed with RNaseZap or an equivalent RNase inhibitor wash. All bench-top surfaces, centrifuge rotors and buttons, and heating block surfaces should be washed

with the inhibitor as well. All plasticware should be handled with gloves and stored in closed containers to avoid dust and human contact that can introduce or activate RNase enzymes.

Elution: The elution step determines the resulting concentration of RNA, which is critical for the amplification protocol to follow. Elution into 15 µL allows for one 10-µL aliquot for amplification and roughly 4 µL for running quantification and quality assays (RiboGreen®, Molecular Probes Inc., and Agilent RNA 6000 Pico Assay, respectively). Smaller elution volumes will give higher RNA concentrations; however, less than 10 µL is not recommended and will likely not provide enough sample to run the necessary assays in addition to amplification.

20. Filter cartridge loading
The filter cartridge can only handle 700 µL of liquid per spin. If combining samples for higher RNA yield, or if the lysate/ethanol mix of one sample exceeds this volume, serial spins can be performed. Load ≤700 µL at a time, perform the spin, and discard the filtrate. Repeat until the entire sample has been loaded.

21. Preparing the RNA ladder
Remove the ladder from the reagent kit immediately upon arrival. Transfer the contents (10 µL) to a certified nuclease-free 1.5-mL tube. Heat denature the ladder for 2 min at 70°C. Immediately cool on ice. Add 90 µL of certified RNase-free water and mix thoroughly. Prepare aliquots from this stock. Three-microliter aliquots will provide enough ladder for two RNA 6000 Pico chips. Store the aliquots at −80°C.

22. Preparing filtered gel aliquots
Pipet 550 µL of RNA 6000 Pico gel matrix (red tube) into one of the provided spin filters. Spin at 1,500 × g for 10 min at room temperature. Aliquot 65 µL of the filtered gel into 1.5-mL microcentrifuge tubes. Store aliquots at 4°C. One aliquot is enough gel to run two RNA 6000 Pico chips.

23. Verifying RNA quality: 18S and 28S peaks
The electropherogram should yield three distinctive peaks for a good sample of RNA.
The first peak occurs quickly (~20–25 s into the run) and represents the marker. After approximately 35–40 s, the 18S peak should appear. The height of the 28S peak that follows is the best indicator of RNA quality. Robust RNA samples from cultured cells should yield 28S peaks that are double the height of the 18S peak. Given the stress of the LCM process on RNA quality, 18S and 28S peaks of equal height indicate a higher-quality RNA sample. 28S peaks as low as half the height of the corresponding 18S peak can often yield good results after amplification (Fig. 5). The cutoff for a “good” sample will depend on the specific project, the quality and success

of amplification, and the genomic data provided by the microarray chip.

24. Troubleshooting
The RNA 6000 Pico kit is much more sensitive than the RNA 6000 Nano kit, and thus errors are more likely to arise.

Keeping Electrodes Clean: Thorough cleaning of the electrodes before and after each run will reduce failed runs. Limiting RNA concentration to within the range of the assay (no more than 5 ng/µL of RNA) is also important for keeping the electrodes clean. Higher concentrations can be run with the Pico chip, but more extensive and frequent electrode cleaning will be necessary, and sample dilution may be a simpler alternative.

Identifying a Failed Run: The entire run is dependent on the RNA ladder. This is the first sample to load and has a distinctive pattern of successive peaks. If the ladder peaks appear too late in the electropherogram or are not present, none of the samples will run properly. It is therefore very important to use a fresh aliquot of ladder for each run and to be sure to heat denature the original ladder stock.

Air Bubbles in Chip Wells: Each sample must be loaded without introducing air bubbles for the electrodes to get a proper read of each sample. Pipet into the wells with the tip touching the bottom of the well, but at an angle to allow the liquid to dispense slowly. Do not take the pipet plunger into the “blow out” range; simply stop the plunger at the set volume and slowly retract the pipet tip. This is especially important for loading the gel-dye mix and the ladder because these wells affect each sample.

25. Reagent preparations for RiboGreen®
Ribosomal RNA Standard: The RNA standards included in the RiboGreen® kit are 100 µg/mL each. Dilute 10 µL of the stock RNA standard in 500 µL of 1× TE buffer to create a 2.0 µg/mL standard. Create 50-µL aliquots of this standard for use with the assay and store at −80°C.

Dye Reagent: The RiboGreen® dye reagent is light sensitive.
Overhead lighting should be turned off for the assay, and both the stock dye vial and any dilution tubes should be covered with foil to prevent light-induced degradation. The dye also has a high freezing point relative to RNA samples diluted in water or elution buffers. To reduce thawing time, thaw the dye wrapped in foil and placed just outside of an ice bucket. The dye can be thawed at room temperature but should be returned to ice once thawed.

26. Assay sensitivity and troubleshooting
Because of the repeated dilutions and the sensitivity of the dye reagent, the RiboGreen® assay is extremely sensitive to errors and deviations in timing. Differences in the time between adding dye and reading on the

fluorometer will cause discrepancies between samples. Therefore, sample reading should be done quickly and as consistently as possible. Slight errors in performing dilutions can also lead to misread samples or inaccurate results. It is helpful to perform the standard readings and make a decision on assay reliability before running samples. The R² value of the linear regression will be available before diluting RNA samples if the Excel file is created in advance. If the standards are inaccurate, create a new set of standard dilutions in row B of the plate. The quantity of 2 µg/mL RNA standard, 1:2,000 dye reagent, and 1× TE created is enough to run two sets of standards per assay as a precaution.

27. Amplification kit selection
NuGEN’s Ovation amplification and labeling system offers an all-in-one solution for preparing biotin-labeled cDNA for use with Affymetrix GeneChip® microarrays from small amounts of RNA (5 ng total RNA or less). The kit provides reliable, isothermal, linear DNA amplification. Although most of our work has been done using the Ovation version 1 system, the WT-Ovation™ amplification system has since been released; it can allow for smaller initial RNA concentrations and provides amplification to cDNA without a 3′ bias. This kit must be combined with the FL-Ovation™ Biotin V2 kit to offer a complete solution from total RNA to biotin-labeled cDNA.

28. RNA quality and quantity requirements
RNA should be analyzed using an Agilent 2100 Bioanalyzer for quality control prior to amplification. See 3.7.1 for more information. RNA of low quality will typically not amplify with an accurate representation of the genome, leading to skewed array results. RNA quantity is a little more flexible than quality, but running an amplification with total RNA levels lower than 5 ng is taking a risk. Sometimes less than 5 ng of RNA will amplify to enough cDNA to place on an array chip.
The 5 ng threshold is a good indicator of amplification success, but it is not a black-and-white boundary that should deter you from running an important yet lower-concentration sample.

29. Qiagen DyeEx 2.0 purification
The DyeEx columns have fragile resin blocks once spun down. These columns must not be spun at more than 750 × g. If possible, spin them at 700 × g, since the resin blocks can crack even at the recommended centrifugation settings. Using a fixed-angle rotor will result in an angled resin surface for sample loading, whereas a pivoting bucket rotor will create a flat surface. It is very important that the pipet tip does not come into contact with the resin block, which can crack or fragment. If this happens to a column, use a new one if possible.
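
The quantification checks described in Notes 25, 26, and 28 (diluting the standard, judging the standard curve by its R² before reading samples, and comparing the recovered RNA with the 5 ng guideline) can be sketched in a few lines of Python. This is an illustration, not the authors' Excel workbook: the function names are hypothetical, and the standard concentrations, fluorescence readings, and dilution factor in the example are made-up values chosen to be perfectly linear.

```python
def diluted_concentration(stock_conc, stock_volume, diluent_volume):
    """Concentration after mixing stock into diluent (same units as stock_conc)."""
    return stock_conc * stock_volume / (stock_volume + diluent_volume)


def fit_standard_curve(concentrations, fluorescence):
    """Least-squares line through the standards; returns (slope, intercept, r2)."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(fluorescence) / n
    pairs = list(zip(concentrations, fluorescence))
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for y in fluorescence)
    return slope, intercept, 1.0 - ss_res / ss_tot


def total_rna_ng(reading, slope, intercept, dilution_factor, volume_ul):
    """Total RNA (ng) contained in `volume_ul`, for a curve fitted in ng/mL."""
    conc_ng_per_ml = (reading - intercept) / slope * dilution_factor
    return conc_ng_per_ml * volume_ul / 1000.0  # 1 mL = 1000 uL


if __name__ == "__main__":
    # Note 25: 10 uL of 100 ug/mL stock into 500 uL of TE (about 2 ug/mL).
    print(f"standard: {diluted_concentration(100.0, 10.0, 500.0):.2f} ug/mL")

    standards_ng_per_ml = [0.0, 5.0, 25.0, 50.0]  # hypothetical standards
    readings = [2.0, 52.0, 252.0, 502.0]           # hypothetical fluorescence
    slope, intercept, r2 = fit_standard_curve(standards_ng_per_ml, readings)
    print(f"R^2 of standards: {r2:.4f}")

    # Hypothetical sample: reading 102, diluted 100-fold, 10 uL for amplification.
    ng = total_rna_ng(102.0, slope, intercept, dilution_factor=100.0, volume_ul=10.0)
    print(f"RNA in aliquot: {ng:.1f} ng; at or above 5 ng guideline: {ng >= 5.0}")
```

Fitting the standards first mirrors the advice to decide on assay reliability before any sample is diluted; a poor R² signals that a fresh set of standards is needed.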

References
1. Jemal, A., R. Siegel, E. Ward, T. Murray, J. Xu, and M.J. Thun, (2007). Cancer statistics, 2007. CA Cancer J Clin, 57(1), p. 43–66.
2. Tomlins, S.A., D.R. Rhodes, S. Perner, S.M. Dhanasekaran, R. Mehra, X.W. Sun, et al., (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 310(5748), p. 644–8.
3. Chen, C.D., D.S. Welsbie, C. Tran, S.H. Baek, R. Chen, R. Vessella, et al., (2004). Molecular determinants of resistance to antiandrogen therapy. Nat Med, 10(1), p. 33–9.
4. Stanbrough, M., G.J. Bubley, K. Ross, T.R. Golub, M.A. Rubin, T.M. Penning, et al., (2006). Increased expression of genes converting adrenal androgens to testosterone in androgen-independent prostate cancer. Cancer Res, 66(5), p. 2815–25.
5. Holzbeierlein, J., P. Lal, E. LaTulippe, A. Smith, J. Satagopan, L. Zhang, et al., (2004). Gene expression analysis of human prostate carcinoma during hormonal therapy identifies androgen-responsive genes and mechanisms of therapy resistance. Am J Pathol, 164(1), p. 217–27.
6. Tomlins, S.A., R. Mehra, D.R. Rhodes, X. Cao, L. Wang, S.M. Dhanasekaran, et al., (2007). Integrative molecular concept modeling of prostate cancer progression. Nat Genet, 39(1), p. 41–51.
7. Dhanasekaran, S.M., T.R. Barrette, D. Ghosh, R. Shah, S. Varambally, K. Kurachi, et al., (2001). Delineation of prognostic biomarkers in prostate cancer. Nature, 412(6849), p. 822–6.
8. Lapointe, J., C. Li, J.P. Higgins, M. van de Rijn, E. Bair, K. Montgomery, et al., (2004). Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci USA, 101(3), p. 811–6.
9. Luo, J., S. Zha, W.R. Gage, T.A. Dunn, J.L. Hicks, C.J. Bennett, et al., (2002). Alpha-methylacyl-CoA racemase: a new molecular marker for prostate cancer. Cancer Res, 62(8), p. 2220–6.
10. Luo, J., D.J. Duggan, Y. Chen, J. Sauvageot, C.M. Ewing, M.L. Bittner, et al., (2001). Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. Cancer Res, 61(12), p. 4683–8.
11. Singh, D., P.G. Febbo, K. Ross, D.G. Jackson, J. Manola, C. Ladd, et al., (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), p. 203–9.
12. Welsh, J.B., L.M. Sapinoso, A.I. Su, S.G. Kern, J. Wang-Rodriguez, C.A. Moskaluk, et al., (2001). Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res, 61(16), p. 5974–8.
13. Rubin, M.A., M. Zhou, S.M. Dhanasekaran, S. Varambally, T.R. Barrette, M.G. Sanda, et al., (2002). Alpha-methylacyl coenzyme A racemase as a tissue biomarker for prostate cancer. JAMA, 287(13), p. 1662–70.
14. Febbo, P.G., A. Thorner, M.A. Rubin, M. Loda, P.W. Kantoff, W.K. Oh, et al., (2006). Application of oligonucleotide microarrays to assess the biological effects of neoadjuvant imatinib mesylate treatment for localized prostate cancer. Clin Cancer Res, 12(1), p. 152–8.

Chapter 16

Cancer Gene Profiling for Response Prediction

B. Michael Ghadimi and Marian Grade

Summary
Preoperative treatment strategies are now recommended for a variety of human cancers. Unfortunately, the response of individual tumors to preoperative treatment is not uniform and ranges from complete regression to resistance. This poses a considerable clinical dilemma, because patients with a priori resistant tumors could either be spared exposure to radiation or DNA-damaging drugs (i.e., be referred to primary surgery) or be offered dose-intensified protocols. Because the response of an individual tumor and therapy-induced side effects represent the major limiting factors of current treatment strategies, identifying molecular markers of response or of treatment toxicity has become exceedingly important. However, complex phenotypes such as tumor responsiveness to multimodal treatments probably do not depend on the expression levels of just one or a few genes and proteins. Therefore, methods that allow comprehensive interrogation of genetic pathways and networks hold great promise in delivering such tumor-specific signatures, because expression levels of tens of thousands of genes can be monitored simultaneously. During the past few years, microarray technology has emerged as a central tool in addressing pertinent clinical questions, the answers to which are critical for the realization of a personalized genomic medicine, in which patients will be treated based on the biology of their tumor and their genetic profile (1–4).

Key words: Gene expression profiling, Microarrays, Rectal cancer, Preoperative chemoradiotherapy, Response prediction, Personalized medicine

1. Introduction
The major advantage of microarray technology over other techniques that study expression levels of genes is that tens of thousands of genes can be studied simultaneously in one single experiment. It has been shown that gene expression profiles of cancer cell lines correlate with drug activity (5–7) or radiosensitivity (8).

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_16, © Humana Press, a part of Springer Science + Business Media, LLC 2010

Ghadimi and Grade

It has also been demonstrated that gene expression signatures predicting sensitivity to chemotherapeutic drugs in vitro can accurately predict clinical response in patients treated with these drugs in vivo (9). In analogy to these model systems, gene expression signatures have been identified that predict the response of breast cancers to preoperative chemotherapy (for review, see ref. 10), of esophageal carcinomas to preoperative chemoradiotherapy (11), and of colon cancers to postoperative chemotherapy (12). In addition, prognostic signatures have been established for patients with breast cancers (13–15) and non-small cell lung cancer (16), leading to the initiation of multicenter trials to test the clinical effectiveness of these gene sets (17, 18). We recently demonstrated that gene expression profiling might be useful for predicting the response of locally advanced rectal cancers to preoperative chemoradiotherapy (19). These results led us to initiate prospective profiling of tumor samples from patients enrolled in the ongoing CAO/ARO/AIO-04 trial of the German Rectal Cancer Study Group, which is integrated into a Clinical Research Unit (KFO 179) funded by the German Research Foundation (DFG).

This chapter describes the general principles of a microarray experiment. First, total RNA is isolated from frozen tumor samples. Second, the messenger RNA (mRNA) is amplified; during amplification, amino allyl UTP nucleotides are incorporated, which are later chemically coupled to an N-hydroxysuccinimidyl (NHS) ester dye. After purification to remove unincorporated nucleotides, the labeled sample is combined with a differentially labeled common-reference sample and subsequently hybridized onto a spotted oligonucleotide microarray slide (see Fig. 1).
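
For each probe, the two-color design outlined above yields a pair of dye intensities, one from the tumor sample and one from the common reference, which are usually summarized as a log ratio. The Python sketch below shows that calculation in its simplest form; the background subtraction, the intensity floor, and the function name are assumptions made for illustration, and dye normalization, which real analyses require, is deliberately left out.

```python
import math


def log2_ratio(sample, reference, sample_bg=0.0, reference_bg=0.0, floor=1.0):
    """Log2 ratio of background-corrected sample vs. reference intensity.

    `floor` is a hypothetical lower bound that keeps corrected intensities
    positive so the logarithm is always defined.
    """
    if floor <= 0:
        raise ValueError("floor must be > 0")
    corrected_sample = max(sample - sample_bg, floor)
    corrected_reference = max(reference - reference_bg, floor)
    return math.log2(corrected_sample / corrected_reference)


if __name__ == "__main__":
    # Made-up probe intensities: (tumor channel, common-reference channel).
    probes = {"probe_A": (800.0, 200.0), "probe_B": (100.0, 400.0), "probe_C": (300.0, 300.0)}
    for name, (tumor, ref) in probes.items():
        print(f"{name}: log2(tumor/reference) = {log2_ratio(tumor, ref):+.2f}")
```

Positive values indicate higher signal in the tumor channel and negative values higher signal in the reference channel. Because every tumor is compared with the same common reference, ratios from different slides can be compared with each other.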

2. Materials

2.1. Sample Accrual and Storage

• RNAlater (Ambion, Austin, TX). Make aliquots of 1 ml in polypropylene tubes and store at room temperature for up to 6 months.

2.2. RNA Isolation

1. TRIzol (Invitrogen, Carlsbad, CA). Cover the bottle with aluminum foil.
2. Anatomical forceps that can be sterilized. You need one forceps per sample.
3. Handheld homogenizer, e.g., Polytron (Kinematica, Littau, Switzerland).
4. Chloroform (Mallinckrodt Baker, Phillipsburg, NJ).

Fig. 1. Principle of a two-color microarray experiment. RNA samples from tumor and control tissues are individually amplified, labeled with different fluorescent dyes, and hybridized to a single DNA microarray. The fluorescence intensity is measured for each probe in both samples, and relative gene expression levels can be calculated.

5. Glycogen, 20 mg/ml (Invitrogen, Carlsbad, CA).
6. Isopropyl alcohol (Mallinckrodt Baker, Phillipsburg, NJ).
7. 200-proof ethyl alcohol (Warner-Graham, Cockeysville, MD).
8. DEPC-treated water (Research Genetics, Huntsville, AL).
9. Recommended: Spectrophotometer (Nanodrop, Rockland, DE).
10. Recommended: Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA).
11. RNA-, DNA-, RNase-, DNase-free sterile, cotton-plugged pipette tips.
12. RNA-, DNA-, RNase-, DNase-free microcentrifuge tubes.
13. Always wear gloves!

2.3. RNA Amplification

• Amino Allyl MessageAmp aRNA kit (Ambion, Austin, TX)

2.4. Indirect Labeling and Hybridization

1. The NHS ester dyes are provided as dried samples (Amersham Biosciences, Piscataway, NJ). Dissolve one vial of each Cy3 ester and Cy5 ester in 73 µl dimethyl sulfoxide (Amino Allyl MessageAmp aRNA kit). Prepare 5-µl aliquots, and store them at −80°C (seal tubes with Parafilm).
2. Nuclease-free water (Amino Allyl MessageAmp aRNA kit).
3. Coupling buffer (Amino Allyl MessageAmp aRNA kit).
4. Hydroxylamine (Amino Allyl MessageAmp aRNA kit).
5. Antisense RNA (aRNA) binding buffer (Amino Allyl MessageAmp aRNA kit).
6. 200-proof ethyl alcohol.
7. aRNA filter cartridge (aRNA collection tube; Amino Allyl MessageAmp aRNA kit).
8. aRNA wash buffer (Amino Allyl MessageAmp aRNA kit).
9. aRNA collection tube (Amino Allyl MessageAmp aRNA kit).
10. Microcon YM-30 columns (Millipore, Billerica, MA).
11. 10× fragmentation buffer (Amino Allyl MessageAmp aRNA kit).
12. Stop solution (Amino Allyl MessageAmp aRNA kit).
13. LifterSlips (25 × 60 mm; Erie Scientific, Portsmouth, NH).
14. Spotted oligonucleotide microarray glass slides (Hs-Operon; Advanced Technology Center of the National Cancer Institute, Gaithersburg, MD).
15. Deionized formamide (Ambion).

16. Prepare prehybridization solution (5× saline-sodium citrate [SSC], 1% bovine serum albumin [BSA], 0.1% sodium dodecyl sulfate [SDS]) and warm up to 42°C.
17. Prepare 2× hybridization buffer (50% deionized formamide, 10× SSC, 0.2% SDS) and warm up to 48°C.
18. Hybridization cassettes (TeleChem International, Sunnyvale, CA).
19. Prepare washing solutions: solution 1: 2× SSC, 0.1% SDS (200 ml total); solution 2: 1× SSC (200 ml total); solution 3: 0.2× SSC (200 ml total).
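The washing solutions listed above are simple dilutions, so the required volumes follow from C1·V1 = C2·V2. The following is a minimal sketch only: the stock concentrations (20× SSC and 10% SDS) and the helper name `dilution_volumes` are assumptions made for illustration and are not specified in this protocol.

```python
def dilution_volumes(final_ml, targets, stocks):
    """Return the volume (ml) of each stock and of water for one solution.

    targets and stocks map a component name to its concentration; both
    values for a component must use the same unit ("x" for SSC, "%" for SDS).
    """
    volumes = {}
    for name, target in targets.items():
        stock = stocks[name]
        if target > stock:
            raise ValueError(f"{name}: target exceeds stock concentration")
        volumes[name] = final_ml * target / stock  # C1*V1 = C2*V2
    water = final_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stock volumes exceed the final volume")
    volumes["water"] = water
    return volumes


# Assumed stock solutions (not given in the protocol): 20x SSC and 10% SDS.
STOCKS = {"SSC": 20.0, "SDS": 10.0}

# Target compositions of the three 200-ml washing solutions from the list.
WASHES = {
    "solution 1": {"SSC": 2.0, "SDS": 0.1},
    "solution 2": {"SSC": 1.0},
    "solution 3": {"SSC": 0.2},
}

for label, targets in WASHES.items():
    print(label, dilution_volumes(200, targets, STOCKS))
```

With other stock concentrations, only the `STOCKS` dictionary needs to change.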

3. Methods

The setup of a gene expression microarray experiment depends largely on two factors: the amount of RNA available from a given sample and the microarray platform. Depending on the samples used, RNA amplification may be required. Some microarray manufacturers, such as Agilent Technologies, already include an RNA amplification step in their protocols. If RNA amplification is necessary to obtain sufficient amounts of RNA (e.g., for repeat hybridizations), many companies provide dedicated kits. We have experience with Arcturus' RiboAmp RNA amplification kit (Mountain View, CA) and Ambion's Amino Allyl MessageAmp aRNA kit, and both yielded good results. However, it should be noted that some kits generate sense RNA, whereas others generate antisense RNA. This is not a factor when hybridizing to complementary DNA (cDNA) arrays, but must be taken into consideration for oligonucleotide arrays. Additionally, some kits enable two rounds of amplification for higher yield. The choice of kit therefore depends on the design of the microarray platform, because (spotted) microarrays can represent single- or double-stranded sequences.

Another very important aspect to consider is whether one-color or dual-color hybridizations should be performed. Both techniques are accepted in the microarray field for use with specific platforms, and each has advantages and disadvantages, which are discussed elsewhere (20, 21). As mentioned above, we performed dual-color hybridizations, in which each tumor sample is hybridized against a common-reference aRNA pool.

It should also be noted that Cy5 is very sensitive to high environmental ozone concentrations (22). This problem can be avoided by performing one-color hybridizations with Cy3. If dual-color hybridizations are required, chemical preservatives can be added to the washing solutions.

Other alternatives are to install carbon filters in the air-handling system or to perform the hybridization and washing steps in a closed hood in which the air is purified through an activated charcoal filter (23).

Finally, because of the dynamic nature of this technology, it should be noted that this protocol is optimized for the microarrays that we purchased. Commercially available microarrays require different protocols. The general considerations outlined here and the protocol for RNA isolation, however, hold true for those as well.

3.1. Sample Accrual and Storage

The time interval between sampling and storage is very important because even partial degradation dramatically impairs microarray analyses. For gene expression profiling, we therefore strongly recommend accruing tissue samples directly in the operating room or in an endoscopic unit. The samples should be immediately stored in an RNA stabilization reagent or frozen directly in liquid nitrogen. We and others have good experience with RNAlater from Ambion. The advantage of RNA stabilization reagents is that they are ready for use, and can be stored in cups or tubes at room temperature for months.

3.2. RNA Isolation

The single most critical factor for a successful microarray experiment is the RNA, i.e., its purity and integrity. Many different protocols are available for RNA isolation. Because we have not only focused on the cellular transcriptome, but also on genomic and proteomic analysis, we have been primarily using TRIzol. The isolation protocol described here is based on the manufacturer's recommendation with minor modifications according to our experience. In our hands, we have been able to successfully isolate sufficient amounts of RNA from cancer biopsies with weights ranging from 5 to 150 mg.
1. Thaw tumor samples that have been stored in RNAlater and, using a sterile forceps, transfer the tissue immediately into a 15-ml polypropylene tube containing 4 ml of the TRIzol reagent (see Note 1).
2. Thoroughly homogenize samples to disrupt cells and dissolve components, which usually takes approximately 30 s, and incubate for 5 min at room temperature to dissociate nucleoprotein complexes.
3. Add 0.8 ml chloroform (0.2 ml chloroform per 1 ml of TRIzol), tightly cap tubes, and shake vigorously for 30 s.
4. Allow phase separation for 15 min on ice, and centrifuge at 12,000 × g for 15 min at 4°C (phase separation).
5. Transfer very carefully only the upper aqueous phase (colorless), containing mostly RNA, to a new 15-ml polypropylene tube (see Note 2).

6. Add 1 µl glycogen and mix briefly (see Note 3).
7. Add 2.0 ml isopropyl alcohol (0.5 ml isopropyl alcohol per 1 ml TRIzol) to precipitate the RNA.
8. Vortex tube and incubate for at least 1 h at −20°C (see Note 4).
9. Centrifuge at 12,000 × g for 30 min at 4°C.
10. Remove the supernatant (see Note 5), and add 4 ml of 75% ethanol (1 ml ethanol per 1 ml TRIzol) to wash off residuals of TRIzol.
11. Break up the pellet by pipetting up and down and vortex for a few seconds.
12. Wash the pellet for 10 min at room temperature on a rotator (see Note 6).
13. Centrifuge at 7,500 × g for 15 min at 4°C to pellet the RNA.
14. Remove the supernatant, and add 1 ml of 75% ethanol.
15. Break up the pellet by pipetting up and down, and transfer it to a new RNase-free 1.5-ml microcentrifuge tube.
16. Vortex, and wash the pellet for 10 min at room temperature on a rotator.
17. Centrifuge at 7,500 × g for 15 min at 4°C.
18. Carefully remove the supernatant, and briefly air-dry the pellet at room temperature (see Note 7).
19. Resuspend the pellet in 20–100 µl DEPC-treated water (see Note 8), and incubate at 65°C for 5 min on a shaking Thermomixer.
20. Cool down the sample on ice, and determine the quantity, purity, and integrity of your RNA (see Note 9).
21. Store RNA at −80°C (see Note 10).

3.3. RNA Amplification

The basic principle of the Amino Allyl MessageAmp aRNA amplification procedure is as follows: First, total RNA is reverse transcribed into cDNA using an oligo(dT) primer to which a T7 promoter is attached. Second, T7 RNA polymerase is used to transcribe the cDNA into antisense RNA (aRNA), which represents the actual amplification step. The Ambion protocol is straightforward, and all required reagents are included in the kit: first-strand and second-strand cDNA synthesis (reverse transcription), cDNA purification, in vitro transcription (aRNA synthesis), aRNA purification, dye-coupling reaction, and purification of labeled aRNA. The handbook is in principle designed like a cookbook, and we followed the protocol without any modifications. Therefore, we simply refer to Ambion's website (http://www.ambion.com/techlib/prot/fm_1752.pdf). There are, however, several points to consider, which are discussed below:
1. It is recommended to use 100–2,000 ng of total RNA. Since we needed 5 µg aRNA for subsequent hybridizations, we decided to start with 5 µg of total RNA (which is the maximum suggested RNA input).
2. The in vitro transcription can be performed for 6–14 h. For convenience, we therefore incubated overnight.
3. The Amino Allyl MessageAmp aRNA kit allows one round or two rounds of amplification.
4. A DNase I treatment is optional, and we recommend including this step (especially if further validation using real-time polymerase chain reaction [PCR] is intended).
5. Common-reference pool: For dual-color hybridizations, we strongly recommend creating a pool of amplified reference RNA. Because it is mandatory to use aliquots of the same reference pool for all microarray analyses within a given experiment, estimate the total number of anticipated hybridizations and amplify sufficient quantities of reference RNA. To monitor the stability of this aRNA pool over time, quality controls using a Bioanalyzer 2100 should be performed routinely.

3.4. Microarray Hybridization

3.4.1. NHS Ester Coupling

During in vitro transcription of cDNA into aRNA, amino allyl UTP nucleotides (aaUTP) have been incorporated. Within the coupling reaction, NHS ester dyes form a chemical bond to the reactive primary amino group of the aaUTP (C5 position of uracil).
1. Thaw resuspended NHS–Cy dyes in the dark.
2. Heat up the frozen RNA sample for 5 min at 65°C and put on ice (see Note 11).
3. Transfer 5 µg aRNA into a new tube and vacuum dry in a Speed Vac (see Note 12).
4. Add 9 µl coupling buffer, pipette up and down, vortex, and spin down briefly.
5. Incubate for 10 min at 37°C.
6. Add 5 µl NHS–Cy dyes, flick, and spin down briefly.
7. Incubate for 1 h at room temperature in the dark.
8. Add 4.5 µl of 4 M hydroxylamine, flick, and spin down briefly, and incubate for 15 min at room temperature in the dark.

3.4.2. Purification

1. Heat up nuclease-free water to 60°C.
2. Add 80.5 µl preheated nuclease-free water to each tube, resulting in a final volume of approximately 100 µl per tube, and mix well by pipetting up and down.

3. Add 350 µl aRNA binding buffer, and mix well by pipetting up and down.
4. Add 250 µl of 100% ethanol, and mix well by pipetting up and down.
5. Immediately transfer the entire volume to an aRNA filter cartridge (aRNA collection tube), and spin for 1 min at 10,000 × g.
6. Discard the flow-through, and place the filter cartridge back in the original tube, taking care not to touch the tip of the cartridge.
7. Add 650 µl aRNA wash buffer, and spin for 1 min at 10,000 × g.
8. Discard the flow-through and spin again for 1 min at 10,000 × g.
9. Transfer the filter cartridge to a new aRNA collection tube.
10. Add 50 µl preheated nuclease-free water to the middle of the filter, incubate for 2 min at room temperature, and spin for 2 min at 10,000 × g.
11. Repeat step 10, and place the samples on ice (see Note 13).
12. Measure the incorporated dye concentration for the labeled aRNA with a Nanodrop.

3.4.3. Fragmentation

1. Combine tumor and reference sample (see Note 14) and transfer the entire volume to a Microcon YM-30 column.
2. Spin down for 6 min at 8,000 × g to decrease the reaction volume.
3. Flip the column, place it into a new vial, and spin for 3 min at 1,000 × g.
4. If necessary, add nuclease-free water to end up with a final volume of 9 µl (see Note 15).
5. Add 1 µl of the 10× fragmentation buffer, flick, and spin down briefly.
6. Incubate for 15 min at 70°C.
7. Spin down briefly, add 1 µl stop solution, and place the tube on ice.
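Step 1 combines the tumor and reference samples in equal amounts of dye rather than equal amounts of aRNA (see Note 14). The following sketch shows one way to work out the volumes from the measured dye concentrations; the function name `balance_by_dye` and the example readings are hypothetical and serve only as an illustration.

```python
def balance_by_dye(tumor_pmol_per_ul, reference_pmol_per_ul,
                   tumor_ul, reference_ul):
    """Return volumes (µl) of each labeled sample that carry equal dye.

    The sample with less total dye is used completely; the other sample
    is scaled down so that both contribute the same number of pmol.
    """
    values = (tumor_pmol_per_ul, reference_pmol_per_ul, tumor_ul, reference_ul)
    if min(values) <= 0:
        raise ValueError("concentrations and volumes must be positive")
    tumor_pmol = tumor_pmol_per_ul * tumor_ul
    reference_pmol = reference_pmol_per_ul * reference_ul
    shared_pmol = min(tumor_pmol, reference_pmol)
    return {
        "tumor_ul": shared_pmol / tumor_pmol_per_ul,
        "reference_ul": shared_pmol / reference_pmol_per_ul,
        "dye_pmol_each": shared_pmol,
    }


# Hypothetical readings for two 100-µl eluates (2 × 50 µl elution each).
mix = balance_by_dye(tumor_pmol_per_ul=2.0, reference_pmol_per_ul=2.5,
                     tumor_ul=100.0, reference_ul=100.0)
print(mix)
```

In this example the tumor eluate limits the mix, so only part of the reference eluate is used and the remainder can be stored.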

3.4.4. Prehybridization of Array Slides

1. Wash LifterSlips and array slide with isopropyl alcohol.
2. Place LifterSlip onto the array and carefully apply 80 µl warm prehybridization solution at one end. The solution should be drawn under the LifterSlip onto the hybridization area.
3. Place the array slide into a hybridization chamber and incubate for 30–60 min in a 42°C waterbath.

4. Wash the array slide in DEPC-treated water and then in isopropyl alcohol by dipping it into the corresponding solutions, and air-dry for no longer than 1 h at room temperature.

3.4.5. Preparation of Hybridization Solution

1. Add 29 µl nuclease-free water to bring the final volume to 40 µl.
2. Denature the probe mix at 90°C for 2 min and immediately place on ice.
3. Add 40 µl preheated hybridization buffer, mix thoroughly, and spin down briefly.
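The volumes in these steps can be checked with a short calculation. The sketch below assumes that the fragmented probe from Subheading 3.4.3 has a volume of 11 µl (9 µl sample, 1 µl fragmentation buffer, 1 µl stop solution) and that the buffer added in step 3 is the 2× hybridization buffer listed in the Materials; the variable and function names are illustrative only.

```python
def final_concentration(stock_conc, stock_ul, total_ul):
    """Concentration of a buffer component after dilution into total_ul."""
    return stock_conc * stock_ul / total_ul


# Volumes in µl, taken from Subheadings 3.4.3 and 3.4.5.
fragmented_ul = 9 + 1 + 1            # sample + fragmentation buffer + stop solution
probe_ul = fragmented_ul + 29        # step 1: add water
buffer_ul = 40                       # step 3: 2x hybridization buffer (assumed)
total_ul = probe_ul + buffer_ul

# Composition of the 2x hybridization buffer from the Materials section.
BUFFER_2X = {"formamide_percent": 50.0, "SSC_x": 10.0, "SDS_percent": 0.2}

final = {name: final_concentration(conc, buffer_ul, total_ul)
         for name, conc in BUFFER_2X.items()}

print("probe volume (µl):", probe_ul)
print("hybridization volume (µl):", total_ul)
print("final buffer composition:", final)
```

Under these assumptions the mix comes to 80 µl, which matches the volume applied to the LifterSlip, and the buffer components end up at half their 2× concentrations.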

3.4.6. Hybridization

1. Place the LifterSlip onto the array slide.
2. Spin down the probe briefly, and apply the entire volume (80 µl) to one end of the LifterSlip, allowing it to be drawn under to the hybridization area.
3. Add 20 µl of nuclease-free water to the hybridization cassette to prevent evaporation of the hybridization solution.
4. Tightly seal the chamber and incubate submerged at the bottom of a 42°C waterbath for 16 h.

3.4.7. Washing

1. Wash slides for 2 min in solution 1 (shaking).
2. Wash slides for 2 min in solution 2 (shaking).
3. Wash slides for 2 min in solution 3 (shaking).
4. Spin array slides in a centrifuge at 650 rpm for 3 min to dry, and store them in the dark.

3.4.8. Scanning

Scan the slides within 24 h, preferably directly after the washes. Use settings according to the recommendations in the scanner manual.
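After scanning, the two fluorescence intensities measured for each probe can be converted into relative expression values (see Fig. 1). The following is a minimal sketch of that calculation, not the analysis pipeline used by the authors: it assumes background-corrected intensities are already available, that the tumor sample was labeled with Cy5 and the reference with Cy3, and it uses global median centering as a deliberately simple stand-in for normalization. All names and numbers are illustrative.

```python
import math
from statistics import median


def log2_ratios(tumor, reference, floor=1.0):
    """Per-probe log2(tumor / reference) from two intensity lists.

    Intensities below `floor` are raised to `floor` so that zero or
    negative background-corrected values do not break the logarithm.
    """
    if len(tumor) != len(reference):
        raise ValueError("both channels must cover the same probes")
    return [math.log2(max(t, floor) / max(r, floor))
            for t, r in zip(tumor, reference)]


def median_center(values):
    """Subtract the median so that the typical probe has a log2 ratio of 0."""
    center = median(values)
    return [v - center for v in values]


# Hypothetical intensities for three probes (Cy5 = tumor, Cy3 = reference).
cy5 = [800.0, 200.0, 400.0]
cy3 = [200.0, 200.0, 400.0]

ratios = median_center(log2_ratios(cy5, cy3))
print(ratios)
```

A positive value indicates higher signal in the tumor channel than in the common reference for that probe, and a negative value indicates lower signal.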

4. Notes

1. TRIzol is toxic and should be handled under a fume hood. We recommend weighing the tissue samples at this point, because the weight gives a rough estimate of the amount of RNA to expect. Additionally, you can freeze your samples at −80°C at this point.
2. Three phases should be visible: a lower red phenol–chloroform phase (proteins), an interphase (DNA), and a colorless upper aqueous phase (RNA). Be careful not to disturb the interphase when removing the upper phase; it is better to lose some RNA than to risk contamination with DNA. If you wish to subsequently isolate DNA and proteins as well, you need to keep the phenol–chloroform phase and the interphase (please read the manufacturer's protocol).

3. Glycogen serves as an inert co-precipitant and increases nucleic acid recovery. It also helps to visualize the RNA pellet after precipitation, and does not inhibit further reactions.
4. One may wish to stop at this point and leave the tubes overnight at −20°C.
5. Depending on the size of the tissue samples, this pellet might be very small and difficult to see (see Note 3).
6. You can stop at this point, and store your sample at −20°C (months).
7. Do not vacuum-dry the RNA pellet, and do not air-dry it completely; otherwise, its solubility will be decreased.
8. The volume of DEPC-treated water to be added is strongly influenced by the amplification protocol that you wish to use. For our purposes, we needed an RNA concentration of >0.5 µg/µl.
9. To evaluate RNA quantity and purity, perform spectrophotometric readings at wavelengths of 260 and 280 nm. Depending on the quality of the RNA, expect 260/280 ratios between 1.9 and 2.1. For subsequent microarray experiments, we strongly recommend analyzing your samples with Agilent's Bioanalyzer or a similar technique. A spectrophotometer does not provide information about the RNA integrity, and even RNA with a perfect 260/280 ratio can be degraded. Even though a Bioanalyzer is not capable of determining the percentage of full-length mRNA, it is, in our opinion, more reliable than a conventional denaturing agarose gel and requires a smaller amount of RNA.
10. There is an ongoing discussion about long-term storage of RNA in DEPC-treated water, which is slightly acidic and may result in RNA degradation. In our own experiments, and by comparing RNA samples over time, this has not been a specific problem. Alternatives for long-term storage are 70% ethanol or 0.1× TE, although the EDTA may need to be removed to prevent inhibition of subsequent enzymatic reactions in the protocol.
11. Heating samples at 65°C disrupts secondary structures. Even though it is well known that repeated freezing–thawing impairs RNA integrity, this is a necessary step.
12. Make sure not to over-dry RNA samples. Clean the Speed Vac prior to use.
13. It usually takes time to perform the subsequent measurements and calculations; therefore, the labeled aRNA should be placed on ice.
14. Make sure to use equal amounts of the two samples to be hybridized based on the dye concentration, not the aRNA concentration. In general, measured dye concentrations of 1.5 pmol/µl result in good hybridization signals. If you wish, you may stop at this point and store the labeled probes at −80°C.
15. If the volume remains >9 µl, vacuum dry briefly.
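The spectrophotometric check described in Notes 8 and 9 can be sketched as follows. The conversion factor of 40 µg/ml per A260 unit is the standard value for single-stranded RNA at a 1-cm path length; the thresholds are taken from the Notes, whereas the function name `rna_qc` and the example readings are hypothetical. As stated in Note 9, passing this check says nothing about RNA integrity.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # standard factor for RNA, 1-cm path length


def rna_qc(a260, a280, dilution_factor=1.0,
           min_ug_per_ul=0.5, ratio_range=(1.9, 2.1)):
    """Concentration (µg/µl) and 260/280 ratio from absorbance readings."""
    if a260 <= 0 or a280 <= 0:
        raise ValueError("absorbance readings must be positive")
    # µg/ml in the undiluted sample, divided by 1,000 to obtain µg/µl.
    ug_per_ul = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / 1000.0
    ratio = a260 / a280
    low, high = ratio_range
    return {
        "ug_per_ul": ug_per_ul,
        "ratio_260_280": ratio,
        "concentration_ok": ug_per_ul > min_ug_per_ul,  # Note 8
        "ratio_ok": low <= ratio <= high,               # Note 9
    }


# Hypothetical reading of a 1:50 dilution measured in a 1-cm cuvette.
print(rna_qc(a260=0.5, a280=0.25, dilution_factor=50))
```

Instruments that report concentrations directly make the first part of the calculation unnecessary; the threshold checks still apply.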

Acknowledgments

The authors thank Drs. Michael J. Difilippantonio and Jochen Gaedcke, and Mr. Patrick Hörmann for their advice. This work was supported by the Deutsche Krebshilfe and the Deutsche Forschungsgemeinschaft (KFO 179).

References

1. Quackenbush J. (2006) Microarray analysis and tumor classification. N Engl J Med. 354, 2463–2472.
2. Jensen EH, McLoughlin JM, Yeatman TJ. (2006) Microarrays in gastrointestinal cancer: is personalized prediction of response to chemotherapy at hand? Curr Opin Oncol. 18, 374–380.
3. Bol D, Ebner R. (2006) Gene expression profiling in the discovery, optimization and development of novel drugs: one universal screening platform. Pharmacogenomics. 7, 227–235.
4. Nevins JR, Potti A. (2007) Mining gene expression profiles: expression signatures as cancer phenotypes. Nat Rev Genet. 8, 601–609.
5. Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, Pergamenschikov A, Lee JC, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet. 24, 227–235.
6. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN. (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet. 24, 236–244.
7. Mariadason JM, Arango D, Shi Q, Wilson AJ, Corner GA, Nicholas C, Aranes MJ, Lesser M, Schwartz EL, Augenlicht LH. (2003) Gene expression profiling-based prediction of response of colon carcinoma cells to 5-fluorouracil and camptothecin. Cancer Res. 63, 8791–8812.
8. Torres-Roca JF, Eschrich S, Zhao H, Bloom G, Sung J, McCarthy S, Cantor AB, Scuto A, Li C, Zhang S, Jove R, Yeatman T. (2005) Prediction of radiation sensitivity using a gene expression classifier. Cancer Res. 65, 7169–7176.
9. Potti A, Dressman HK, Bild A, Riedel RF, Chan G, Sayer R, Cragun J, Cottrill H, Kelley MJ, Petersen R, Harpole D, Marks J, Berchuck A, Ginsburg GS, Febbo P, Lancaster J, Nevins JR. (2006) Genomic signatures to guide the use of chemotherapeutics. Nat Med. 12, 1294–1300.
10. Lønning PE, Knappskog S, Staalesen V, Chrisanthar R, Lillehaug JR. (2007) Breast cancer prognostication and prediction in the postgenomic era. Ann Oncol. 18, 1293–1306.
11. Luthra R, Wu TT, Luthra MG, Izzo J, Lopez-Alvarez E, Zhang L, Bailey J, Lee JH, Bresalier R, Rashid A, Swisher SG, Ajani JA. (2006) Gene expression profiling of localized esophageal carcinomas: association with pathologic response to preoperative chemoradiation. J Clin Oncol. 24, 259–267.
12. Del Rio M, Molina F, Bascoul-Mollevi C, Copois V, Bibeau F, Chalbos P, Bareil C, Kramar A, Salvetat N, Fraslon C, Conseiller E, Granci V, Leblanc B, Pau B, Martineau P, Ychou M. (2007) Gene expression signature in advanced colorectal cancer patients select drugs and response for the use of leucovorin, fluorouracil, and irinotecan. J Clin Oncol. 25, 773–780.
13. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature. 415, 530–536.
14. van de Vijver MJ, He YD, van 't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R. (2002) A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 347, 1999–2009.
15. Buyse M, Loi S, van 't Veer L, Viale G, Delorenzi M, Glas AM, d'Assignies MS, Bergh J, Lidereau R, Ellis P, Harris A, Bogaerts J, Therasse P, Floore A, Amakrane M, Piette F, Rutgers E, Sotiriou C, Cardoso F, Piccart MJ; TRANSBIG Consortium. (2006) Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst. 98, 1183–1192.
16. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, Koontz J, Kratzke R, Watson MA, Kelley M, Ginsburg GS, West M, Harpole DH Jr, Nevins JR. (2006) A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med. 355, 570–580.
17. Bogaerts J, Cardoso F, Buyse M, Braga S, Loi S, Harrison JA, Bines J, Mook S, Decker N, Ravdin P, Therasse P, Rutgers E, van 't Veer LJ, Piccart M; TRANSBIG Consortium. (2006) Gene signature evaluation as a prognostic tool: challenges in the design of the MINDACT trial. Nat Clin Pract Oncol. 3, 540–551.
18. Anguiano A, Potti A. (2007) Genomic signatures individualize therapeutic decisions in non-small-cell lung cancer. Expert Rev Mol Diagn. 7, 837–844.
19. Ghadimi BM, Grade M, Difilippantonio MJ, Varma S, Simon R, Montagna C, Füzesi L, Langer C, Becker H, Liersch T, Ried T. (2005) Effectiveness of gene expression profiling for response prediction of rectal adenocarcinomas to preoperative chemoradiotherapy. J Clin Oncol. 23, 1826–1838.
20. de Reyniès A, Geromin D, Cayuela JM, Petel F, Dessen P, Sigaux F, Rickman DS. (2006) Comparison of the latest commercial short and long oligonucleotide microarray technologies. BMC Genomics. 7, 51.
21. Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L, Wolfinger RD. (2006) Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol. 24, 1140–1150.
22. Fare TL, Coffey EM, Dai H, He YD, Kessler DA, Kilian KA, Koch JE, LeProust E, Marton MJ, Meyer MR, Stoughton RB, Tokiwa GY, Wang Y. (2003) Effects of atmospheric ozone on microarray data quality. Anal Chem. 75, 4672–4675.
23. Branham WS, Melvin CD, Han T, Desai VG, Moland CL, Scully AT, Fuscoe JC. (2007) Elimination of laboratory ozone leads to a dramatic improvement in the reproducibility of microarray gene expression measurements. BMC Biotechnol. 7, 8.

Chapter 17

The EGFR Pathway as an Example for Genotype: Phenotype Correlation in Tumor Genes

Ulrike Mogck, Eray Goekkurt, and Jan Stoehlmacher

Summary

Tumor-specific and germ-line variations of DNA significantly contribute to tumor growth and its ability to develop resistance. Among several mechanisms that cause resistance to cancer treatment, the genotype of certain growth factors, like epidermal growth factor receptor (EGFR), is critical. EGFR signals requests for proliferation and survival toward the nucleus of the cancer cell. Several polymorphic DNA sequences of EGFR and the mutational status of the Kirsten-Ras (KRAS) gene appear to be determinants of response to new drugs that inhibit EGFR. We describe the correlation between the EGFR genotype, including the KRAS mutation, and the consequences of the resulting genotype for anti-EGFR therapy in colorectal cancer.

Key words: KRAS, EGFR polymorphism, Genotype, Colorectal cancer

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_17, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

The phenotypes of genes involved in cancer development, growth, and resistance to chemotherapy have been shown to be highly relevant in the treatment of solid tumors. Most of these processes are determined by the individual genetic makeup. Tumor-specific aberrations of the genome may lead to significant changes in the expression and/or functionality of the encoded gene product. The efficacy of cell replication and the response to cancer treatment are linked to the current state of functionality of essential tumor genes such as growth factors or DNA repair genes. Differences in this genetically determined phenotype may translate directly into variations in tumor behavior. These differences are observed daily in clinical practice (1).

Changes within these genes may occur as polymorphic DNA sequences or mutations. Most of these variations have no effect on the further development of the cancer cell. However, some genetic aberrations have a significant impact on protein function (2). These tumor-specific changes are the basis for the treatment responsiveness of the tumor (1). In addition, DNA sequence variations among individuals that are inherited and transmitted from one generation to the next, so-called germ-line polymorphisms, are responsible for many differences in phenotype with respect to cancer treatment-related side effects. Thus, genotype-dependent differences are responsible for both the main and side effects of current antitumor therapies.

The epidermal growth factor receptor (EGFR) pathway has become of major interest for the treatment of several solid tumors including colorectal cancer, lung cancer, and cancers of the head and neck (3, 4). It has been demonstrated that the activation of processes downstream from EGFR is related to genetic changes in the coding sequence of the receptor, which include different polymorphisms and mutations (5). The effects of EGFR activation on growth, survival, and apoptosis are transmitted into the nucleus by processes involving the Kirsten-Ras (KRAS) pathway (6). Activating point mutations within codons 12 and 13 of exon 1 of the KRAS gene have been known for many years to contribute to the carcinogenesis of different cancers. Because several drugs in modern oncology treatment regimens target EGFR or its tyrosine kinase, these mutations have become very important. Recent data in colorectal cancer demonstrated that the strategy of EGFR inhibition is ineffective in patients possessing a KRAS mutation (7, 8). Therefore, the constantly activated EGFR pathway based on a single KRAS point mutation determines efficacy of cancer treatment and tumor growth.
Acknowledging these close genotype–phenotype interactions, the European Medicines Agency (EMEA) approved panitumumab, an anti-EGFR directed antibody, for the treatment of colorectal cancer only in patients who are demonstrated to have wild-type KRAS. In addition to the KRAS status, the EGFR polymorphisms R497K, C-191A, G-216T, A61G, (CA)nVNTR in exon 1 have been linked to tumor growth under EGFR inhibition.
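To illustrate what the KRAS status refers to at the sequence level, the sketch below compares codons 12 and 13 of a coding sequence with the wild-type codons and reports amino acid changes. It is an illustration under stated assumptions, not a diagnostic tool and not the genotyping method of this chapter: the wild-type codons (GGT and GGC, both glycine) and the 5' fragment of the KRAS coding sequence used in the example are given from memory and should be verified against a current reference sequence, and the function name `kras_codon_status` is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed wild-type codons at positions 12 and 13 (both encode glycine).
WILD_TYPE = {12: "GGT", 13: "GGC"}


def kras_codon_status(coding_sequence):
    """List amino acid changes at codons 12 and 13, e.g. ['G12D'].

    The sequence must start at the ATG start codon. An empty list means
    that no amino acid change was found at these two codons.
    """
    seq = coding_sequence.upper()
    changes = []
    for number, wild_codon in WILD_TYPE.items():
        start = 3 * (number - 1)
        codon = seq[start:start + 3]
        if len(codon) != 3 or set(codon) - set(BASES):
            raise ValueError(f"codon {number} is missing or ambiguous")
        wild_aa, found_aa = CODON_TABLE[wild_codon], CODON_TABLE[codon]
        if found_aa != wild_aa:
            changes.append(f"{wild_aa}{number}{found_aa}")
    return changes


# Assumed first 17 codons of the wild-type KRAS coding sequence.
WT_FRAGMENT = "ATGACTGAATATAAACTTGTGGTAGTTGGAGCTGGTGGCGTAGGCAAGAGT"
mutant = WT_FRAGMENT[:33] + "GAT" + WT_FRAGMENT[36:]  # codon 12 GGT -> GAT

print(kras_codon_status(WT_FRAGMENT))
print(kras_codon_status(mutant))
```

Synonymous changes are ignored in this sketch because they do not alter the encoded amino acid.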

2. Materials

2.1. DNA Extraction from Blood

1. Lysis buffer: 1 mM NH4HCO3, 115 mM NH4Cl in water.
2. White cell lysis buffer: 100 mM Tris–HCl (pH 7.6), 40 mM EDTA (pH 8.0), 50 mM NaCl, 0.05% sodium acetate, 0.2% SDS in water. Note: autoclave buffer before adding SDS.
3. Saturated NaCl (~6 M) in water.
4. Ethanol absolute; store at 4°C.
5. TE-buffer: 10 mM Tris base; 1 mM EDTA in water.

2.2. DNA Extraction from Formalin-Fixed, Paraffin-Embedded (FFPE) Tissue

1. Xylene.
2. Lysis buffer: 10 mM Tris–HCl (pH 7.6), 1 mM EDTA (pH 8.0), 0.01% SDS in water.
3. 20 mg/ml proteinase K in water (Sigma); store in single-use aliquots at −20°C.
4. Phenol:chloroform:isoamyl alcohol (25:24:1); store in a glass bottle at 4°C.
5. Chloroform; store in a glass bottle at 4°C.
6. Sodium acetate: 3 M, pH 4.6, in water.
7. Isopropanol.

2.3. Direct Sequencing

1. Sodium acetate: 3 M, pH 4.6, in water.
2. Ethanol absolute; store at 4°C.
3. AmpliTaq BigDye terminator premix (Applied Biosystems).
4. Sequence-specific primer.

2.4. Polymerase Chain Reaction (PCR)-Restriction Fragment Length Polymorphisms (RFLP)

1. Taq DNA polymerase (Invitrogen) or Hot Start Taq DNA polymerase (Qiagen).
2. 100 mM dNTP stock solution (Invitrogen).
3. Q-solution (Qiagen).
4. Suitable enzyme (New England Biolabs).
5. 3% (w/v) Agarose (Invitrogen) in 1× TBE; prepare 10× stock solution: 108 g Tris base, 55 g boric acid, 9.3 g EDTA per liter in water, pH 8.3.

2.5. GeneScan Analyses

1. TBE buffer (10×): 108 g Tris base, 55 g boric acid, 9.3 g EDTA per liter in water, pH 8.3.
2. Urea.
3. Long Ranger (Bio-Rad).
4. Ammonium persulfate: 10% solution in water; store in single-use aliquots at −20°C.
5. TEMED (Sigma).
6. Formamide.
7. Loading buffer: 50% glycerol, 0.05% bromophenol blue, 100 mM EDTA.
8. ROX standard 500 (Applied Biosystems).

3. Methods

3.1. DNA Extraction

DNA, which is the basis of genotype–phenotype analysis, can be extracted from host leukocytes (normal cells) or from tumor cells from either FFPE or fresh-frozen tissue. For germ-line polymorphisms, genotyping of normal host cells might be sufficient but might not always reflect the tumor genotype. Indeed, tumor-specific changes of the genome (e.g., chromosomal aberrations) might lead to discrepancies between genotyping results from normal cells and from tumor cells. Therefore, in the case of tumor-specific mutations, DNA from tumor cells has to be analyzed. As an alternative to the following DNA extraction protocols, several commercial kits for DNA extraction from all tissue types are available.

3.1.1. DNA Extraction from Blood

1. Collect 5 ml of whole blood in a 15-ml Falcon tube and add 10 ml of lysis buffer. Mix completely by inversion.
2. Spin for 10 min in a table centrifuge.
3. Discard the supernatant, resuspend the cell pellet in 10 ml lysis buffer, and repeat the centrifugation step.
4. Discard the supernatant and resuspend the cell pellet in 1.8 ml white cell lysis buffer. The cell/buffer solution should be gelatinous.
5. Cell lysate can be stored at 4°C.
6. For final DNA extraction, add 150 µl saturated NaCl solution to 400 µl white cell lysate.
7. Mix by inverting and vortexing well and incubate on ice for 10 min.
8. Pipet the supernatant into a new 1.5-ml tube and add 1 ml of absolute ethanol. Mix by inversion, and a DNA precipitate should be visible.
9. Spin in a microcentrifuge for 1 min at maximum speed and discard the supernatant.
10. Wash the pellet with 1 ml of 70% ethanol, air-dry, and resuspend in TE buffer (see Note 1).

3.1.2. DNA Extraction from FFPE Tissue by Microdissection

1. One 3-µm-thick slice is needed for hematoxylin and eosin (HE) staining and two to five 10-µm-thick slices are needed for DNA isolation.
2. The HE-stained thin section is reviewed by a pathologist, who outlines the areas of interest (see Note 2).
3. Compare the HE-stained slice with the 10-µm-thick slices and, with a blade, mark the areas of interest on the thick slices.

The EGFR Pathway as an Example for Genotype


4. With a blade, carefully scrape the tissue from the slide and put it into a 1.5-ml tube. Use one tube per sample.
5. For deparaffinization, add 1.2 ml xylene to the tube and mix thoroughly. Centrifuge for 5 min at maximum speed.
6. Carefully remove the supernatant with a pipet and add 1.2 ml absolute ethanol to remove the xylene. Mix carefully and centrifuge for 5 min at maximum speed.
7. Carefully remove the supernatant and repeat once.
8. Add 1 ml of 70% ethanol, mix carefully, and centrifuge for 5 min at maximum speed; remove the supernatant with a pipet and air-dry the pellet at 37°C until the ethanol is completely removed.
9. Resuspend the pellet in 500 µl lysis buffer supplemented with proteinase K at a concentration of 2 mg/ml.
10. Mix well by vortexing and incubate overnight at 56°C until all tissue fragments are dissolved completely.
11. On the next day, add 500 µl phenol:chloroform:isoamyl alcohol and mix thoroughly by vortexing.
12. Centrifuge at 13,000 × g for 10 min and, with a 100-µl pipet, transfer the upper phase into a new tube. Add 1 volume of chloroform and mix thoroughly by vortexing. Centrifuge for 5 min at 13,000 × g.
13. Carefully remove the upper aqueous phase and transfer it into a new tube. Add 0.1 volume sodium acetate and 1 volume ice-cooled isopropanol and incubate overnight at −20°C.
14. Centrifuge at 13,000 × g at 4°C for 15 min; the DNA should be visible as a small pellet at the bottom of the tube.
15. Carefully discard the supernatant and wash the pellet once with 70% ethanol. Centrifuge for 5 min at 13,000 × g at 4°C and remove the supernatant. Air-dry the pellet.
16. Dissolve the DNA in 50 µl (or an appropriate volume) of water.
17. For optimal DNA quality, storage at −20°C is recommended.

3.2. Genotyping Methods

Depending on the DNA variation of interest, various methods for genotyping can be considered. Allele-specific direct sequencing provides a method for detecting all kinds of mutations and polymorphisms. The most frequently seen DNA variations are point mutations, either as germline single-nucleotide polymorphisms (SNPs) or as tumor-specific mutations. In these cases, PCR-based RFLP techniques or real-time PCR strategies (not shown) provide easy-to-perform genotyping methods. In the case of tandem repeat or insertion/deletion polymorphisms, genotyping can be performed by GeneScan analyses using a genetic analyzer (sequencer), which is able to detect length differences of 1 bp. Comprehensive SNP analyses may be carried out using SNP chip technology, either with whole-genome SNP chip arrays (screening) or with customized chip arrays (e.g., metabolic pathways).

3.2.1. Direct Sequencing

1. For sequencing, PCR is performed in a total volume of 25 µl with specific primers and suitable salt and dNTP conditions.
2. The PCR product has to be purified by ethanol precipitation. Adjust the volume to 150 µl with water, add 15 µl sodium acetate (3 M, pH 4.6) and 375 µl absolute ethanol (4°C), mix, and centrifuge at 15,000 × g for 15 min at room temperature. After centrifugation, carefully remove the supernatant by pipetting, wash the pellet with 250 µl of 70% ethanol, and centrifuge again at 15,000 × g for 5 min.
3. Air-dry the pellet and resuspend it in water.
4. Mix 2 ng DNA with 4 µl AmpliTaq BigDye terminator premix and 0.2 µM sequencing primer (the same primer as used for PCR) in a total volume of 10 µl.
5. The sequencing reaction is performed in a thermal cycler with the following conditions: denaturation for 10 s at 95°C and a combined annealing and extension step for 90 s at 58°C. This is repeated 19 times (see Note 3).
6. To remove unincorporated ddNTPs, a second ethanol precipitation must be performed (10 µl reaction mix, 140 µl water, 15 µl sodium acetate, 375 µl ethanol). The dry pellet is resuspended in 20 µl HPLC-quality water and can be analyzed in a genetic sequencer (see Note 4).
7. Data interpretation is carried out with the ABI Sequence Analysis 5.2 software (Fig. 1).

3.2.2. PCR-RFLP

1. PCR is performed in a total reaction volume of 25–50 µl containing 50 ng template DNA, 0.4 µM specific primer, 2 mM MgCl2, 2 mM each dNTP, and 1 U Taq DNA polymerase.
2. Under suitable conditions, PCR is performed in a thermal cycling system (Table 1).
3. Because of the high GC content of the EGFR promoter region, an optimized PCR is used for the promoter polymorphisms: Q-solution is added to the reaction mix and a Hot Start Taq DNA polymerase is used instead of standard Taq DNA polymerase. The Hot Start polymerase is activated by a 15-min incubation step at 95°C.
4. For RFLP, mix 15 µl PCR product with 3 U enzyme and suitable buffer in a 25-µl reaction volume.
5. Incubate the mix for 3 h at 37°C and load it onto a 3% agarose gel. DNA fragments are visualized under UV light and documented by taking a picture (Fig. 2).
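The remark about high GC content can be checked quickly for any primer or amplicon. The following is a minimal sketch, not part of the original protocol: the function name `gc_fraction` is hypothetical, and the two sequences are the primer pair that appears twice in Table 1, which we take to be the pair used for the EGFR promoter polymorphisms.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    # Keep only A/C/G/T so that spaces in grouped primer strings are ignored.
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc_count = sum(1 for b in bases if b in "GC")
    return gc_count / len(bases)


# Primer pair listed twice in Table 1 (assumed here to be the promoter pair).
promoter_primers = {
    "forward": "TCTGCTCCTCCCGATCCCTCCT",
    "reverse": "GAGGTGGCCTGTCGTCCGGTCT",
}

for name, seq in promoter_primers.items():
    print(f"{name}: GC content = {gc_fraction(seq):.0%}")
```

A GC fraction well above 50% is a common reason to add PCR enhancers or to use a hot-start enzyme, which is consistent with step 3 above.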


Fig. 1. Direct sequencing of parts of exon 1 from KRAS with examples for mutations in codon 12 and codon 13.

3.2.3. GeneScan Analyses

1. The instructions assume the use of a Genetic Analyzer 377 XL from Applied Biosystems.
2. Prepare a 0.7-mm-thick 10% gel by mixing 18 g urea, 25.5 g distilled water, 5 ml of 10× TBE buffer, 5 ml Long Ranger, 250 µl ammonium persulfate solution, and 35 µl TEMED.
3. Clean the glass plates with Alconox, rinse them with water, and dry them with special tissues.
4. Pour the gel and allow it to polymerize for at least 2 h or, preferably, overnight (see Note 5).
5. Prepare the running buffer by diluting the concentrated TBE buffer tenfold.
6. Prepare the samples by mixing 2 µl PCR product with 5 µl FLS (formamide and loading buffer 4:1) and 0.55 µl ROX 500 Standard. Denature the samples for 2 min at 95°C and place them on ice immediately after denaturation (see Note 6).
7. Using a needle, rinse the wells with running buffer before loading 1.8 µl of sample.
8. The run is performed with a gel temperature of 51°C, a laser power of 200 mV, and a collection time of 2 h.
9. Data are analyzed with the ABI PRISM GeneScan software (Fig. 3).

Table 1 Primer and PCR conditions for genotyping the EGFR pathway

| Polymorphism | Primer sequence | Fragment length | Enzyme | Cycle conditions |
|---|---|---|---|---|
| Kras 12/13 | F 5′ TAGTGGTGGAGTATTTGATAGT 3′; R 5′ CATGAAAATGGTCAGAGAAACC 3′ | 275 bp | – | 95°C 30 s, 54°C 30 s, 72°C 45 s, 35 cycles |
| EGFR(CA)n | F_FAM 5′ GTT TGA AGA ATT TGA GCC AAC C 3′; R 5′ TTC TTC TGC ACA CTT GGC AC 3′ | 116–128 bp | – | 94°C 30 s, 55°C 30 s, 72°C 30 s, 30 cycles |
| EGFR C-191A | F 5′ TCTGCTCCTCCCGATCCCTCCT 3′; R 5′ GAGGTGGCCTGTCGTCCGGTCT 3′ | 224 bp | Sac II | 98°C 5 s, 59°C 10 s, 72°C 20 s, 38 cycles |
| EGFR G-216T | F 5′ TCTGCTCCTCCCGATCCCTCCT 3′; R 5′ GAGGTGGCCTGTCGTCCGGTCT 3′ | 224 bp | BseR I | 98°C 5 s, 59°C 10 s, 72°C 20 s, 38 cycles |
| EGFR R497K | F 5′ TGC TGT GAC CCA CTC TGT CT 3′; R 5′ CCA GAA GGT TGC ACT TGT CC 3′ | 155 bp | BstN I | 94°C 60 s, 62°C 45 s, 72°C 30 s, 35 cycles |
| EGF A61G | F 5′ TGTCACTAAAGGAAAGGA 3′; R 5′ TTCACAGAGTTTAACAGCCC 3′ | 150 bp | Alu I | 95°C 30 s, 51°C 30 s, 72°C 30 s, 35 cycles |


Fig. 2. RFLP of EGFR promoter polymorphisms (−191 and −216). Lanes 1 and 8: 50-bp ladder; lane 2: EGFR-191 wild type; lane 3: EGFR-191 mutant; lane 4: EGFR-191 heterozygote; lane 5: EGFR-216 wild type; lane 6: EGFR-216 mutant; lane 7: EGFR-216 heterozygote.
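Reading an RFLP gel such as the one in Fig. 2 follows a simple rule: a lane showing only the uncut PCR product is homozygous for the allele lacking the restriction site, a lane showing only the digestion fragments is homozygous for the allele carrying the site, and a lane showing both is heterozygous. The sketch below illustrates this logic only. The function name, the size tolerance, and the digestion fragment sizes (150 and 74 bp) are hypothetical; only the 224-bp product length comes from Table 1, and which allele is cut by which enzyme is not stated here and must be taken from the assay design.

```python
def call_rflp_genotype(bands, uncut_size, cut_sizes, tolerance=5):
    """Classify one gel lane from its observed band sizes (in bp).

    bands      -- observed band sizes in the lane
    uncut_size -- expected size of the undigested PCR product
    cut_sizes  -- expected sizes of the digestion fragments
    tolerance  -- allowed sizing error in bp (an assumed value)
    """
    def present(expected):
        return any(abs(band - expected) <= tolerance for band in bands)

    has_uncut = present(uncut_size)
    has_cut = all(present(size) for size in cut_sizes)

    if has_uncut and has_cut:
        return "heterozygote"
    if has_uncut:
        return "homozygous, restriction site absent"
    if has_cut:
        return "homozygous, restriction site present"
    return "no call"


# Hypothetical lanes for a 224-bp product cut into 150 + 74 bp.
print(call_rflp_genotype([224], 224, [150, 74]))
print(call_rflp_genotype([151, 73], 224, [150, 74]))
print(call_rflp_genotype([225, 150, 75], 224, [150, 74]))
```

In practice, very small fragments may be faint on a 3% agarose gel, so a stricter or looser rule for the cut fragments may be appropriate.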

Fig. 3. Detection of the EGFR (CA)n intron 1 polymorphism using the GeneScan method, (a) heterozygote (18/20), (b) homozygote (18/18).

4. Notes

1. DNA should not be resuspended by pipetting. Store the solution overnight at 4°C to resuspend the DNA completely. For optimal DNA quality, storage at −20°C is recommended.
2. The surface area of the dissected tumor tissue correlates with the DNA yield. To assess the surface area of the tissue to be dissected, use a grid based on millimeter paper.
3. There are different chemistries for the sequencing PCR. For short fragments, use a mix with a high concentration of ddNTPs.
4. When using a capillary sequencer, choose the right polymer (e.g., POP4 is better for sequencing short fragments; POP7 is recommended for high throughput).
5. It is very important that there are no air bubbles in the gel. Pour the gel while tapping on the glass plates.
6. FLS is not stable; it can be stored at 4°C for 1 week. For better results, always prepare a fresh solution.

References

1. McLeod H. (2006) Individualizing cancer chemotherapy. Clin Adv Hematol Oncol 4(4), 259–61.
2. Evans W. E., and Relling M. V. (2004) Moving towards individualized medicine with pharmacogenomics. Nature 429, 464–8.
3. Overmann M. J., and Hoff P. M. (2007) EGFR-targeted therapies in colorectal cancer. Dis Colon Rectum 50, 1259–70.
4. Reuter C. W., Morgan M. A., and Eckardt A. (2007) Targeting EGF-receptor-signalling in squamous cell carcinomas of the head and neck. Br J Cancer 96, 408–16.
5. Gebhardt F., Bürger H., and Brandt B. (2000) Modulation of EGFR gene transcription by secondary structures, a polymorphic repetitive sequence and mutations - a link between genetics and epigenetics. Histol Histopathol 15, 929–36.
6. McCubrey J. A., Steelman L. S., Chappell W. H., Abrams S. L., Wong E. W., Chang F., et al. (2007) Roles of the Raf/MEK/ERK pathway in cell growth, malignant transformation and drug resistance. Biochim Biophys Acta 1773, 1263–84.
7. Lievre A., Bachet J. B., LeCorre D., Boige V., Laudi B., Emile J. F., et al. (2006) KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res 66, 3992–5.
8. Amado R. G., Wolf M., Freeman D., Peeters M., Van Cutsem E., Siena S., et al. (2007) Wild-type KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer: results from a randomized, controlled trial. ECCO 2007, LBA#7.

Chapter 18

Quantitation of CD39 Gene Expression in Pancreatic Tissue by Real-Time Polymerase Chain Reaction

Martin Loos, Beat Künzli, and Helmut Friess

Summary

Within the past decade, the field of gene expression analysis has constantly evolved, with numerous technologies being available for RNA quantification, including differential display, serial analysis of gene expression (SAGE), quantitative real-time (qRT) polymerase chain reaction (PCR), and microarrays. Although every technique has its specific application, the high levels of accuracy, reproducibility, sensitivity, and specificity have established qRT-PCR as a standard method for detection and quantification of gene expression. In this chapter, all steps of the qRT-PCR procedure, including purification of total RNA from animal tissues, reverse transcription to complementary DNA (cDNA), and quantification of relative gene expression, are discussed. We chose qRT-PCR analysis of CD39 in pancreatic tissue as an example that is applicable to any gene of interest. CD39/ecto-nucleoside triphosphate diphosphohydrolase-type-1 (ENTPD1) is the dominant vascular ecto-nucleotidase that hydrolyzes extracellular nucleotides to integrate purinergic signaling responses. It has recently been associated with tumor growth and proliferation in melanoma cells and linked to pancreatic cancer progression.

Key words: Gene expression analysis, Quantitative real-time polymerase chain reaction (qRT-PCR), Pancreatic cancer, CD39

1. Introduction

Gene expression analysis is widely used in biological and biomedical research. In cancer research, quantitative RNA analysis plays a fundamental role in the identification of aberrant gene expression, which is responsible not only for the development and progression of malignant diseases but also for their resistance to treatment. DNA microarrays are one of the most popular techniques for high-throughput transcriptional profiling.

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_18, © Humana Press, a part of Springer Science + Business Media, LLC 2010


The power of microarrays lies mostly in the simultaneous quantification of thousands of different target genes, allowing extensive comparisons for gene expression changes between different groups, e.g., healthy and cancer tissues. Although the reproducibility of microarrays has been improved, other more accurate techniques have been established. Therefore, microarrays are mostly used as a broad messenger RNA (mRNA) “fishing” strategy to narrow down potential gene targets that should be further validated. Quantitative real-time (qRT) polymerase chain reaction (PCR) represents a powerful tool for detection and quantification of gene expression (1, 2). It is a refinement of the original PCR developed by Kary Mullis and co-workers in the mid-1980s (3). In addition to the amplification of a specific DNA template, qRT-PCR allows the investigator to quantify the amount of template prior to the start of the PCR process. All qRT-PCR systems depend on the detection and quantitation of a fluorescent reporter signal, which increases directly in proportion to the amount of PCR product in a reaction. Data are collected at each cycle, making it possible to monitor the PCR during the exponential amplification phase (where the amount of PCR product reflects the initial amount of target template), unlike conventional PCR. The most commonly used methods for detecting target templates include 5′ nuclease assays. The 5′ nuclease assay uses an oligonucleotide probe (TaqMan probe, Applied Biosystems, Foster City, CA, USA), which specifically anneals to a complementary sequence between the forward and reverse primer sites (4). For visualization, the oligonucleotide probe carries a fluorescein group such as 6-carboxyfluorescein (6-FAM) at the 5′ end and a quencher such as 6-carboxy-tetramethyl-rhodamine (TAMRA) at the 3′ end.
When the probe is intact, the proximity of the reporter dye to the quencher dye results in suppression of the reporter fluorescence, primarily by Förster-type energy transfer (5). As the Taq DNA polymerase extends the PCR primer in the 5′ to 3′ direction, the 5′ exonuclease activity of the polymerase degrades the oligonucleotide probe. As a result, the reporter dye is separated from the quencher dye, resulting in increased fluorescence of the reporter (Fig. 1). Because the fluorescent signal is generated only if the oligonucleotide probe hybridizes with its complementary target, nonspecific amplification is not detected. The increase in fluorescence can be plotted on a graph through a single reading at the end of each PCR cycle (amplification plot, Fig. 2).

Relative quantitation can be used for quantitative measurement. In this method, a comparison within a sample is made between the gene of interest and a control gene: the cycle threshold (Ct) of the control gene is subtracted from the Ct of the gene of interest. The Ct value is calculated based on the time (measured in PCR cycle numbers) at which the reporter fluorescence emission increases beyond a threshold level (based on background levels). The Ct value is correlated to the level of starting mRNA. The higher the starting mRNA level, the lower the Ct value, because fewer PCR cycles are required for the reporter fluorescence emission intensity to reach the threshold (6). The resulting difference in cycle number (ΔCt) is the exponent of the base 2 (due to the doubling function of PCR), representing the fold difference of template for these two genes.

A prerequisite for the application of the relative quantitation method is that the genes analyzed have similar PCR efficiencies, preferably >95%. The PCR efficiency can be measured by performing a tenfold serial dilution of a positive control template and plotting the Ct as a function of the log10 concentration of template (standard curve, Fig. 3). The resulting slope of the line is a function of the PCR efficiency. A slope of −3.32 indicates that the PCR is 100% efficient, meaning that the amount of template is doubled after each cycle.

In this chapter, we describe the isolation of total RNA from frozen tissues, the reverse transcription to complementary DNA (cDNA), and the analysis of mRNA expression using qRT-PCR. We chose qRT-PCR analysis of CD39 in pancreatic tissues as an example that is applicable to any gene of interest (7). Relative mRNA expression of CD39 for normal pancreas, chronic pancreatitis, and pancreatic cancer is shown in Fig. 4. CD39 mRNA expression is significantly upregulated in chronic pancreatitis and pancreatic cancer, compared with healthy pancreas.

Fig. 1. 5′ Nuclease assay using TaqMan probe. This probe hybridizes to the template between the standard PCR primers. The oligonucleotide probe carries a fluorescein group (6-FAM) (a) at the 5′ end and a quencher (TAMRA) (b) at the 3′ end. During the extension phase, degradation of the probe by the activity of the Taq DNA polymerase cleaves the reporter dye from the probe and generates a fluorescent signal that can be detected.

Fig. 2. Amplification plot. The threshold is calculated as ten times the standard deviation of the average signal of the baseline fluorescent signal. A fluorescent signal above the threshold is considered a real signal that can be used to define the threshold cycle (Ct). The Ct is calculated based on time (measured in PCR cycle numbers) at which the reporter fluorescent emission increases beyond the threshold.

Fig. 3. Standard curve plot. The standard curve is used for calculation of PCR efficiency and quantification (PCR efficiency = 10^(−1/S), where S is the slope of the standard curve). The resulting Ct values for each input amount of template are plotted as a function of the log10 concentration of input amounts (cross marks), and a linear trendline is fit to the data.
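The relation between the slope of the standard curve and the PCR efficiency can be made concrete with a short calculation. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the dilution series and Ct values are invented for illustration, and efficiency is expressed as (amplification factor − 1) × 100%, so that a perfect doubling per cycle corresponds to 100%.

```python
def fit_slope(log10_conc, ct_values):
    """Least-squares slope of Ct plotted against log10(template amount)."""
    n = len(ct_values)
    mean_x = sum(log10_conc) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, ct_values))
    return sxy / sxx


def amplification_factor(slope):
    """Fold increase of product per cycle, 10^(-1/slope); 2.0 means doubling."""
    return 10 ** (-1.0 / slope)


def efficiency_percent(slope):
    """PCR efficiency in percent; 100% corresponds to a factor of 2 per cycle."""
    return (amplification_factor(slope) - 1.0) * 100.0


# Invented tenfold dilution series (log10 of relative input) and Ct values.
log10_input = [0, 1, 2, 3, 4]
ct = [33.28, 29.96, 26.64, 23.32, 20.00]

slope = fit_slope(log10_input, ct)
print(f"slope = {slope:.2f}, efficiency = {efficiency_percent(slope):.1f}%")
```

With real data, the fit should also be checked for linearity before the efficiency is trusted.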


Fig. 4. mRNA expression of CD39 in human pancreas. Relative mRNA expression for healthy pancreas, chronic pancreatitis, and pancreatic cancer is shown for CD39. Expression is shown as number of copies. Asterisk indicates statistically significant overexpression of CD39 in chronic pancreatitis and pancreatic cancer, compared with healthy pancreas.

2. Materials

2.1. Isolation of Total RNA

For isolation of total RNA, we use the Qiagen RNeasy® Protect Mini Kit, Cat. #74124.

1. Steel mortars and pestles.
2. Sterile pipette tips.
3. Microcentrifuge (with rotor for 2-ml tubes).
4. Disposable gloves.
5. RNeasy Mini Spin Columns.
6. QIAshredder Spin Columns.
7. 1.5-ml Collection tubes.
8. 2-ml Collection tubes.
9. Buffer RW1. Store at room temperature (RT).
10. Buffer RLT. Store at RT.
11. Buffer RLC. Store at RT.
12. Buffer RPE. Store at RT.
13. 14.3 M β-mercaptoethanol (β-ME). Store at 4°C.
14. RNase-free water. Store at RT.
15. 96–100% Ethanol. Store at RT.
16. 70% Ethanol. Store at RT.


2.2. Reverse Transcription

For the preparation of cDNA, we use TaqMan® Reverse Transcription Reagents (Cat. #4304134). 1. MgCl2. 2. 25× dNTPs. 3. 10× RT buffer. 4. RT random primers. 5. RNase inhibitor. 6. Reverse transcriptase. 7. RNase-free water. 8. Total RNA. 9. Spectrophotometer, e.g., GeneQuant II RNA/DNA calculator (Pharmacia Biotech, Amersham Biosciences). 10. RNAsecure™ reagent (Ambion).

2.3. PCR

1. Double-distilled H2O (RNase free and DNase free).
2. 10× Taq buffer A (Applied Biosystems). Store at −20°C.
3. 10 mM dNTP mix.
4. 5 U/µl Taq DNA polymerase. Store at −20°C.
5. 200 µM Forward primer (e.g., of CD39 and 18S, Applied Biosystems). Store at 4°C.
6. 200 µM Reverse primer (e.g., of CD39 and 18S, Applied Biosystems). Store at 4°C.
7. 100 µM Probe (e.g., of CD39 and 18S, labeled with FAM and TAMRA, Applied Biosystems). Store at 4°C.

2.4. Determination of CD39 by RT-PCR

1. 96-Well optical reaction plates with optical caps (eight caps/strip, both Applied Biosystems).
2. ABI PRISM 7700 Sequence Detection System (Applied Biosystems) or compatible real-time cycler with sequence-detection software.

3. Methods

3.1. Purification of Total RNA from Mouse Pancreatic Tissues

We use the Qiagen RNeasy® Protect Mini Kit, Cat. #74124, according to the manufacturer’s protocol, with individual modifications.

1. β-ME must be added to Buffer RLT before use. Add 10 µl β-ME per 1 ml Buffer RLT. Dispense in a fume hood. The reagent can be stored at RT for up to 1 month.
2. Before using Buffer RPE, add 4 volumes of 96–100% ethanol to obtain a working solution.
3. Place 30 mg of frozen, stabilized pancreatic tissue in the cooled mortar. Add liquid nitrogen into the mortar.
4. Grind the sample thoroughly by twisting the pestle. To obtain high RNA yields, the tissue must be finely pulverized (but not thawed).
5. Transfer the suspension of pulverized tissue and liquid nitrogen into an RNase-free, liquid nitrogen-cooled, 2-ml microcentrifuge tube. Place the tube on dry ice and allow the liquid nitrogen to evaporate, but do not allow the tissue to thaw.
6. Add 600 µl of Buffer RLT and immediately start homogenizing the tissue lysate. Load up to 700 µl of lysate onto a QIAshredder spin column placed in a 2-ml collection tube and spin for 2 min at 20,000 × g in a microcentrifuge. The lysate is homogenized as it passes through the spin column.
7. Centrifuge the lysate for 3 min at 20,000 × g at RT. Carefully remove the supernatant by pipetting, and transfer it to a new microcentrifuge tube. Use only this supernatant in subsequent steps.
8. Add 1 volume of 70% ethanol to the cleared lysate, and mix immediately by pipetting. Do NOT centrifuge.
9. Transfer up to 700 µl of the sample, including any precipitate that may have formed, to an RNeasy spin column placed in a 2-ml collection tube. Close the lid gently, and centrifuge for 15 s at >10,000 × g. Discard the flow-through. Reuse the collection tube in step 10.
10. Add 700 µl Buffer RW1 to the RNeasy spin column. Close the lid gently, and centrifuge for 15 s at >10,000 × g to wash the spin column membrane. Discard the flow-through. Reuse the collection tube in step 11.
11. Add 500 µl Buffer RPE to the RNeasy spin column. Close the lid gently, and centrifuge for 15 s at >10,000 × g to wash the spin column membrane. Discard the flow-through. Reuse the collection tube in step 12.
12. Add 500 µl Buffer RPE to the RNeasy spin column. Close the lid gently, and centrifuge for 2 min at >10,000 × g to wash the spin column membrane. Carefully remove the RNeasy spin column from the collection tube so that the column does not contact the flow-through, to avoid carryover of ethanol.
13. Place the RNeasy spin column in a new 1.5-ml collection tube. Add 30–50 µl RNase-free water directly to the spin column membrane. Close the lid gently, and centrifuge for 1 min at 10,000 × g to elute the RNA. The RNA is now ready for reverse transcription. It can be stored at −20°C for several weeks.


3.2. RNA Concentration Measurement

The purity and quantity of total RNA must be determined spectrophotometrically. Here, we describe the use of a spectrophotometer designed for this purpose, the GeneQuant II RNA/DNA calculator (Pharmacia Biotech, Amersham Biosciences). The sample is taken up into a quartz capillary tube, which allows the measurement of sample volumes as small as 2 µl. The machine is referenced with nuclease-free dH2O containing 1× RNAsecure™ reagent. The spectral absorption at 260 and 280 nm is measured and the purity of the RNA is determined from the A260/280 ratio. Values between 1.8 and 2.0 are observed for high-quality RNA, whereas lower values correspond to poor-quality RNA. The concentration of RNA is calculated from the spectral absorption at 260 nm using the Beer–Lambert law: C = A/(εl), where C is the RNA concentration (in mg/ml); A is the absorption (260 nm); ε is the RNA extinction coefficient (38 mg/ml); and l is the pathlength (0.05 cm).
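For readers who record absorbance values by hand, the purity check and the concentration estimate can be sketched as follows. This is an illustration only, not the instrument's internal calculation: the function names are hypothetical, and the conversion factor of 40 µg/ml RNA per A260 unit at a 1-cm pathlength is the commonly used rule of thumb, which is an assumption here and differs slightly from the coefficient quoted above. Whether a given instrument reports absorbance already normalized to 1 cm must be checked in its manual.

```python
def rna_concentration_ug_per_ml(a260, pathlength_cm=1.0, dilution_factor=1.0,
                                ug_per_ml_per_a260=40.0):
    """Estimate RNA concentration from absorbance at 260 nm.

    ug_per_ml_per_a260 is the assumed conversion factor for a 1-cm pathlength
    (40 is a widely used value for RNA; adjust it for your instrument).
    """
    if pathlength_cm <= 0:
        raise ValueError("pathlength must be positive")
    return a260 * ug_per_ml_per_a260 * dilution_factor / pathlength_cm


def purity_ratio(a260, a280):
    """A260/A280 ratio used as a purity indicator."""
    return a260 / a280


def meets_purity_threshold(a260, a280, minimum=1.8):
    """True if the A260/A280 ratio reaches the lower bound given in the text."""
    return purity_ratio(a260, a280) >= minimum


# Hypothetical readings, already normalized to a 1-cm pathlength.
print(rna_concentration_ug_per_ml(0.5))
print(round(purity_ratio(1.0, 0.52), 2), meets_purity_threshold(1.0, 0.52))
```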

3.3. Reverse Transcription of Total RNA to cDNA

1. For preparation of the Mastermix for a single 100-µl reaction, thaw all reagents on ice. Mix all reagents (except reverse transcriptase) thoroughly by gentle pipetting and spin down; volumes are indicated in Table 1.
2. Combine all reagents (except RNA) and mix thoroughly by gentle pipetting. Then spin down.
3. Aliquot the Mastermix into PCR tubes.
4. Add individual sample RNA to the appropriate PCR tubes.

Table 1 Preparation of the Mastermix for a single 100-µl reaction

| Reagent | Volume (µl) | Final concentration |
|---|---|---|
| 25 mM MgCl2 | 22.0 | 5.5 mM |
| dNTP mixture | 20.0 | 500 µM/dNTP |
| 10× TaqMan RT buffer | 10.0 | 1× |
| Random hexamer (50 µM) | 5.0 | 2.5 µM |
| RNase inhibitor (20 U/µl) | 2.0 | 0.4 U/µl |
| Reverse transcriptase (50 U/µl) | 2.5 | 1.25 U/µl |
| Total RNA content of each individual sample | | 0.5 µg |
| RNase-free water | Up to 100 µl of total volume | |
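The volumes in Table 1 follow from the usual dilution relation C1·V1 = C2·V2. The sketch below recomputes them and derives the water volume for a given RNA stock; it is an illustration under assumptions, not part of the kit protocol. The function names and the example RNA concentration (0.25 µg/µl) are hypothetical, and the dNTP volume is taken directly from the table because the stock concentration is not stated there.

```python
def stock_volume_ul(stock_conc, final_conc, reaction_volume_ul):
    """Solve C1*V1 = C2*V2 for V1; both concentrations must share one unit."""
    return final_conc * reaction_volume_ul / stock_conc


def rt_reaction_volumes(rna_conc_ug_per_ul, rna_amount_ug=0.5,
                        reaction_volume_ul=100.0):
    """Return the volume (in µl) of each component of one RT reaction."""
    v = reaction_volume_ul
    volumes = {
        "MgCl2 (25 mM stock)": stock_volume_ul(25.0, 5.5, v),
        "10x RT buffer": stock_volume_ul(10.0, 1.0, v),
        "random hexamers (50 uM stock)": stock_volume_ul(50.0, 2.5, v),
        "RNase inhibitor (20 U/ul stock)": stock_volume_ul(20.0, 0.4, v),
        "reverse transcriptase (50 U/ul stock)": stock_volume_ul(50.0, 1.25, v),
        # Stock concentration not given in Table 1: scale the tabulated volume.
        "dNTP mixture": 20.0 * v / 100.0,
        "total RNA": rna_amount_ug / rna_conc_ug_per_ul,
    }
    water = v - sum(volumes.values())
    if water < 0:
        raise ValueError("RNA stock too dilute for this reaction volume")
    volumes["RNase-free water"] = water
    return volumes


# Hypothetical RNA stock at 0.25 µg/µl.
for reagent, volume in rt_reaction_volumes(0.25).items():
    print(f"{reagent}: {volume:.2f} µl")
```

For several samples, the RNA-independent components can be multiplied by the number of reactions (plus a small excess) and combined first, as described in steps 2 to 4.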


5. Mix the contents by pipetting and spin down to remove any air bubbles.
6. Place the PCR tube into a thermal cycler.
7. Thermal cycling parameters:
   Incubation: 10 min at 25°C
   Reverse transcription: 45 min at 42°C
   Inactivation: 5 min at 95°C
8. Store all cDNA at −20°C.

3.4. RT-PCR Using the 7700 Sequence Detector

3.4.1. Initial Preparation

1. Remove the TaqMan universal PCR Mastermix (TaqMan®, Cat. #4304437), the specific primer of interest (Applied Biosystems), and the individual cDNA sample from the −20°C freezer and thaw on ice before you begin the preparation of the Mastermix.
2. Dilute cDNA 1:5 in PCR water at RT and mix by gentle pipetting.
3. Prepare the Mastermix according to the following protocol:
   – 12.5 µl/well of TaqMan universal PCR Mastermix (TaqMan®, Cat. #4304437).
   – 1.25 µl/well of specific primer of interest (Applied Biosystems).
   – 6.25 µl/well of H2O.
   – 5 µl/well of 1:5 diluted individual cDNA sample.
4. For an internal control, we used the following primers from Applied Biosystems:
   – 18S ribosomal RNA (rRNA), Probe dye VIC-MGB (Cat. #4319413E-0312010).
5. Make sure you sufficiently mix the contents by gentle pipetting.
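The per-well volumes above add up to a 25-µl reaction and scale linearly with the number of wells. The following minimal sketch shows that scaling. The function name and the 10% pipetting excess are assumptions, and the cDNA is treated here as a component added to each well separately, so that the same mix can also serve the no-template controls.

```python
# Per-well volumes (µl) taken from step 3 of the protocol above.
PER_WELL_UL = {
    "TaqMan universal PCR Mastermix": 12.5,
    "primer of interest": 1.25,
    "water": 6.25,
}
CDNA_PER_WELL_UL = 5.0  # 1:5 diluted cDNA, pipetted into each well separately


def scale_mastermix(n_wells, overage=0.10):
    """Return total volumes (µl) for n_wells, with a fractional excess.

    overage=0.10 (10% extra to cover pipetting loss) is an assumed default.
    """
    if n_wells < 1:
        raise ValueError("at least one well is required")
    factor = n_wells * (1.0 + overage)
    return {name: volume * factor for name, volume in PER_WELL_UL.items()}


# Example: one gene, three samples in triplicate plus three no-template controls.
for reagent, volume in scale_mastermix(12).items():
    print(f"{reagent}: {volume:.2f} µl")
print(f"dispense {sum(PER_WELL_UL.values()):.1f} µl mix per well, "
      f"then add {CDNA_PER_WELL_UL:.1f} µl cDNA")
```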

3.4.2. Basic PCR Plate Set Up

The final 96-well PCR plate should include the following samples (Table 2): 1. No-template control samples (NTC), including all PCR components, except the individual template (at least three NTC per gene). 2. Unknown samples (S1, S2, S3, …) of which, each one should be run at least in triplicate. 3. Control samples without reverse transcriptase (NRT), containing RNA instead of cDNA; at least one NRT per sample (S1, S2, S3, …).

3.4.3. Data Analysis

1. To read the prepared plate on the 7700 Sequence Detector, click on “Analyze” in the application menu of the sequence detection software and scroll to “Analyze data” to get the amplification curves.


Table 2 96-well plate with no-template control (NTC), control sample without reverse transcriptase (NRT), individual samples S1, S2, S3, …, and 18S RNA expression of sample 1 (18S(S1)), sample 2 (18S(S2)), and sample 3 (18S(S3)). Triplicates of samples are mandatory for a precise analysis

| | 1 | 2 | 3 | 4 | 5 | 6–12 |
|---|---|---|---|---|---|---|
| A | NTC | S1 | S2 | S3 | … | |
| B | NTC | S1 | S2 | S3 | … | |
| C | NTC | S1 | S2 | S3 | … | |
| D | NRT | 18S(S1) | 18S(S2) | 18S(S3) | … | |
| E | NRT | 18S(S1) | 18S(S2) | 18S(S3) | … | |
| F | NRT | 18S(S1) | 18S(S2) | 18S(S3) | … | |
| G | | | | | | |
| H | | | | | | |
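The layout in Table 2 can be written down programmatically, which helps when the plate document for the sequence-detection software is filled in by hand. The sketch below is only an illustration of the scheme shown in the table: the function name is hypothetical, and the limit of 11 samples per plate simply follows from reserving column 1 for the controls.

```python
def build_plate_layout(n_samples):
    """Map well names (e.g. 'A2') to their contents, following Table 2."""
    if not 1 <= n_samples <= 11:
        raise ValueError("this layout holds 1 to 11 samples per plate")
    target_rows = "ABC"      # triplicates for the gene of interest
    reference_rows = "DEF"   # triplicates for 18S of the same sample
    layout = {}
    for row in target_rows:
        layout[f"{row}1"] = "NTC"
    for row in reference_rows:
        layout[f"{row}1"] = "NRT"
    for i in range(1, n_samples + 1):
        column = i + 1       # column 1 is reserved for the controls
        for row in target_rows:
            layout[f"{row}{column}"] = f"S{i}"
        for row in reference_rows:
            layout[f"{row}{column}"] = f"18S(S{i})"
    return layout


# Print the plate as a grid; wells that are not used are shown as '-'.
layout = build_plate_layout(3)
for row in "ABCDEFGH":
    print(row, " ".join(f"{layout.get(f'{row}{col}', '-'):>8}" for col in range(1, 13)))
```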

Table 3 Example for calculating the ΔCt value of a randomly chosen sample standardized to the expression of 18S. Standardization of the ΔCt value to 18S (STDZ Δ18S) is demonstrated

| Sample type | Detector gene | Ct value (well) | Ct value (well) | Ct value (well) | Average Ct value | STDZ Δ18S | Relative expression |
|---|---|---|---|---|---|---|---|
| S1 | CD39 | 32.07 (A2) | 32.06 (B2) | 32.09 (C2) | 32.07 | 20.98 | 48.35 |
| 18S (S1) | 18S | 11.03 (D2) | 11.15 (E2) | 11.07 (F2) | 11.09 | | |

2. Use the same threshold level when comparing the Ct values of the standards with one another or when comparing the Ct values of the unknown samples.
3. The PCR efficiency needs to be at least 95%.
4. Generate a report file and export it in Microsoft Excel format for further analysis, as displayed in Table 3.
5. To calculate the ΔCt value of the detector gene CD39, the relative RNA expression is standardized to a defined housekeeping gene (e.g., 18S). The standardized Δ18S value (STDZ Δ18S) is the average Ct value of sample 1 minus the average Ct value of 18S in the same sample (STDZ Δ18S = average Ct [S1] − average Ct [18S {S1}]). The relative RNA expression level of CD39 in sample 1 (S1) is then calculated with the formula POWER(2, −STDZ Δ18S) multiplied by a correction factor of 1e8 (48.35 in our example). The correction factor is chosen arbitrarily to give easily manageable numbers for further analysis.
6. An important note has to be made about the comparison of data points. Relative RNA expression values standardized to a housekeeping gene can only be compared within related sample populations, ideally run on the same 96-well plate.
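The spreadsheet calculation described in step 5 can be reproduced in a few lines. The following is a minimal sketch; the function name is hypothetical, and the input values are the rounded average Ct values listed in Table 3, so the result should land close to the relative expression value given there.

```python
def relative_expression(ct_target, ct_reference, correction_factor=1e8):
    """Relative expression as 2^(-dCt) times an arbitrary scaling factor.

    ct_target    -- average Ct of the gene of interest (here CD39)
    ct_reference -- average Ct of the housekeeping gene (here 18S)
    """
    delta_ct = ct_target - ct_reference   # STDZ value in the text
    return 2.0 ** (-delta_ct) * correction_factor


# Rounded average Ct values taken from Table 3.
average_ct_cd39 = 32.07
average_ct_18s = 11.09

print(f"dCt = {average_ct_cd39 - average_ct_18s:.2f}")
print(f"relative expression = {relative_expression(average_ct_cd39, average_ct_18s):.2f}")
```

Because the scaling factor is arbitrary, only ratios between samples processed in the same way carry meaning, which is the point of step 6.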

4. Notes

1. Changes in the gene expression pattern can occur due to specific and nonspecific RNA/DNA degradation. RNA is easily degraded by ribonucleases (RNases), which are abundant in the environment and difficult to eliminate. Sterile, RNase-free microcentrifuge tubes and pipet tips should be used at all times. Gloves should be changed regularly and designated pipets should be used for RNA work. All solutions containing RNA need to be kept on ice at all times to minimize RNA degradation by contaminating ribonucleases.
2. As with all enzymatic reactions, mix all non-enzymatic components first and then add the enzymatic components.
3. Always wear a suitable lab coat, disposable gloves, and safety goggles when working with chemicals. Buffer RLC (containing guanidine hydrochloride), Buffer RLT (containing guanidine thiocyanate), and β-mercaptoethanol are harmful chemicals. Inhalation, ingestion, and skin and eye contact should be avoided at all times.
4. The use of a Mastermix markedly reduces the number of reagent transfers per sample and minimizes reagent loss and sample-to-sample variation. In addition, the use of multichannel pipettes helps to minimize pipetting errors.

References

1. Higuchi, R., Fockler, C., Dollinger, G., Watson, R. (1993) Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology 11, 1026.
2. Gibson, U.E., Heid, C.A., Williams, P.M. (1996) A novel method for real time quantitative RT-PCR. Genome Res. 6, 995–1001.
3. Mullis, K.B., Faloona, F.A. (1987) Specific synthesis of DNA in vitro via polymerase-catalyzed chain reaction. Methods Enzymol. 155, 335–350.
4. Holland, P.M., Abramson, R.D., Watson, R. (1991) Detection of specific polymerase chain reaction product by utilizing the 5′-3′ exonuclease activity of Thermus aquaticus DNA polymerase. Proc. Natl. Acad. Sci. USA 88, 7276–7280.
5. Foerster, V.T. (1948) Intermolecular energy transfer and fluorescence. Ann. Phys. 2, 55–75.
6. Winer, J., Jung, C.K., Shackl, I., Williams, P.M. (1999) Development and validation of real-time quantitative reverse transcriptase-polymerase chain reaction for monitoring gene expression in cardiac myocytes in vitro. Anal. Biochem. 270, 41–49.
7. Künzli, B.M., Berberat, P.O., Giese, T., Csizmadia, E., Kaczmarek, E., Baker, C., Halaceli, I., Büchler, M.W., Friess, H., Robson, S.C. (2007) Upregulation of CD39/NTPDases and P2 receptors in human pancreatic disease. Am. J. Physiol. Gastrointest. Liver Physiol. 291, G223–G230.

Chapter 19

Functional profiling methods in cancer

Joaquín Dopazo

Summary

The introduction of new high-throughput methodologies such as DNA microarrays constitutes a major breakthrough in cancer research. The unprecedented amount of data produced by such technologies has opened new avenues for interrogating living systems although, at the same time, it has demanded the development of new data analysis methods as well as new strategies for testing hypotheses. A history of early successful applications in cancer boosted the use of microarrays and fostered further applications in other fields. Keeping pace with these technologies, bioinformatics offers new solutions for data analysis and, more importantly, permits the formulation of a new class of hypotheses inspired by systems biology, oriented more toward pathways or, in general, toward modules of functionally related genes. Although these analytical methodologies are new, some options are already available and are discussed in this chapter.

Key words: Functional profiling, Functional enrichment, Gene-set analysis, Pathway, Gene ontology, Systems biology, Microarray

1. Introduction

Among the battery of high-throughput methodologies that are revolutionizing cancer research, DNA microarrays can be considered the standard due to their popularity and characteristics. Although many different questions can be addressed through microarray experiments, there are usually three types of objective in this context: “class comparison,” “class prediction,” and “class discovery” (1, 2). The first two objectives usually involve the application of tests to define differentially expressed genes, or the use of different procedures to predict class membership on the basis of the values observed for a number of “key” genes.

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_19, © Humana Press, a part of Springer Science + Business Media, LLC 2010


Clustering methods belong to the last category, also known as unsupervised analysis, because no previous information about the class structure of the data set is used in the study. When strategies for microarray data analysis are considered from a historical perspective, an initial period can be distinguished in which almost all publications were related to reproducibility and sensitivity issues. Many classic microarray papers dating from the late 1990s were mainly proof-of-principle experiments (3, 4). Consequently, the methodological approaches used for analysis were mainly related to clustering and, in general, unsupervised approaches. This subsequently caused confusion about the choice of an appropriate methodology for proper data analysis, as noted by some authors (5). Later, sensitivity became a main concern as a natural reaction against very liberal interpretations of microarray experiments, such as the use of fold-change criteria to select differentially expressed genes. It soon became obvious that genome-scale experiments should be analyzed carefully because many apparent associations happen merely by chance (6). In this scenario, methods for the adjustment of p-values, which are considered standard today, started to be used extensively (7, 8). The increasing use of microarrays as predictors of clinical outcomes (9), despite not being free of criticism (5), fueled the use of the methodology because of its practical implications. Comparative studies show that, although intra-platform reproducibility seems to be high, cross-platform and cross-laboratory coherence is still an issue (10). Another aspect that soon became of major importance was the interpretation of microarray experiments in terms of their biological implications, rather than restricting them to a mere comparison of lists of gene identifiers (11, 12).
Thus, a number of methods were developed that essentially search for the overrepresentation of functional modules within groups of genes previously defined in the experiment. Examples of repositories widely used to define gene modules are Gene Ontology (GO) (13), KEGG pathways (14), and Biocarta (http://www.biocarta.com). Programs such as GOMiner (15) and FatiGO (16) can be considered representatives of a family of methods that use these functional definitions of gene modules to conjecture about the interpretation of the results of microarray experiments (17). The difficulty of defining reproducible lists of genes of interest across laboratories and platforms, even when common experimental and statistical methods are used (18), has led several groups to propose different approaches that aim to select genes taking into account their functional properties. The Gene Set Enrichment Analysis (GSEA) (19, 20) has pioneered a family of methods devised not to find individual genes but to search for groups of functionally related genes with a coordinated (although not necessarily high) overexpression or underexpression across a list of genes ranked by differential expression between the two classes compared in a microarray experiment. Different tests have recently been proposed for microarray data,


with this aim in mind (12, 21–25) and also for expressed sequence tags (ESTs) (26), and some of them are available on web servers (12, 27). In particular, the FatiScan procedure (12, 27) can deal with ordered lists of genes independently of the type of data from which they originated. This interesting property allows for its application to a broad range of experimental designs (case–control, multiclass, survival, etc.) as well as to other types of high-throughput data apart from microarrays. Thus, in addition to the conventional study of individual genes and proteins, genome-wide approaches based on high-throughput methodologies have helped to uncover fundamental principles of tumorigenesis, and increasing evidence points to cooperative, systems-level events as important factors in understanding the mechanisms by which cancer gene products coordinately promote cellular transformation (28, 29). Moreover, modern trends in the pharmaceutical industry also point toward the use of functional genomics and systems biology-oriented studies (30, 31) as fundamental steps of the drug discovery pipeline.
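As a rough illustration of the gene-set idea outlined above (score every gene, then ask whether the members of a module lie together toward one end of the scores), the following Python sketch computes a simple parametric z-score for a module. It is written in the spirit of parametric gene-set tests such as (25) but is not a published implementation. The gene names, the scores, and the function names `gene_set_z_score` and `two_sided_p` are hypothetical, and the normal approximation is an assumption that is only reasonable for modules of adequate size.

```python
import math
import statistics


def gene_set_z_score(scores, gene_set):
    """Compare the mean score of a module with the mean score of all genes.

    scores:   dict mapping gene name -> differential expression score
    gene_set: iterable of gene names that define the module
    """
    members = [scores[g] for g in gene_set if g in scores]
    if not members:
        raise ValueError("no gene of the module is present in the scored list")
    all_values = list(scores.values())
    mu = statistics.fmean(all_values)        # mean over all scored genes
    sigma = statistics.pstdev(all_values)    # standard deviation over all scored genes
    return (statistics.fmean(members) - mu) * math.sqrt(len(members)) / sigma


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores (for example, t-statistics from a two-class comparison).
scores = {"g1": 2.0, "g2": 1.5, "g3": 1.0, "g4": 0.0,
          "g5": 0.0, "g6": -1.0, "g7": -1.5, "g8": -2.0}

z_up = gene_set_z_score(scores, ["g1", "g2", "g3"])     # module at the top of the ranking
z_down = gene_set_z_score(scores, ["g6", "g7", "g8"])   # module at the bottom of the ranking
print(round(z_up, 3), round(z_down, 3), round(two_sided_p(z_up), 3))
```

A positive score suggests coordinated overexpression of the module and a negative score coordinated underexpression. Because only an ordered or scored list is required, the same calculation could in principle be applied to scores from designs other than two-class comparisons.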

2. Materials

2.1. Definition of Gene Modules: Sources of Information

Any functional analysis relies on the definition of gene modules related by biological properties of interest. Probably the most widely used source of definition of functional modules is the Gene Ontology (GO) catalog (13). GO represents the biological knowledge as a tree (more precisely as a directed acyclic graph [DAG], in which a node can have more than one parent) where functional terms near the root of the tree make reference to more general concepts while deeper functional terms near the leaves of the tree make reference to more specific concepts. If a gene is annotated to a given level then it is automatically considered to be annotated at all of the upper levels (all of the parent levels) up to the root. Because genes are annotated at different levels of the GO hierarchy, it is common to use this abstraction to choose a predefined level in the hierarchy instead of using directly the original levels of annotation of the genes (11, 32), which increases the power of the enrichment tests (11, 12, 33, 34). The KEGG pathways database (14) or the Biocarta pathways (http://www.biocarta.com) are two extensively used sources of functional information. There are also databases that contain functional motifs mapped to proteins, such as the Interpro database (35) and many others. In addition, other types of modules, such as transcriptional ones, can be defined as groups of genes under the same regulatory control.
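The propagation of GO annotations to parent terms described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy graph, the term names, and the functions `ancestors` and `propagate` are hypothetical and do not reproduce real GO identifiers or any tool's code. The graph is stored as a child-to-parents mapping so that a term may have more than one parent, as in a DAG.

```python
def ancestors(term, parents):
    """Return every ancestor of a term in a DAG given as {child: [parents]}."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in direct_annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


# Hypothetical toy ontology; "lipid_signaling" has two parents.
parents = {
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "lipid_signaling": ["lipid_metabolism", "signaling"],
}
direct = {"geneA": ["lipid_signaling"], "geneB": ["signaling"]}

annotations = propagate(direct, parents)
print(sorted(annotations["geneA"]))
```

Once annotations are propagated in this way, choosing a predefined level of the hierarchy amounts to keeping, for each gene, only the terms that belong to that level.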


Databases that collect regulatory motifs are available. Among the most popular are cisRED (36) and TRANSFAC, which contains predictions of transcription factor binding sites (37). In addition, negative regulation mediated by microRNAs has recently gained relevance. The miRBase (38) contains putative gene targets of such microRNAs. Genes sharing one or more of these regulatory motifs can be considered a putative regulatory module. Other ways of defining modules of a different nature include the use of information obtained through text-mining procedures (39), chromosomal location (40, 41), protein–protein interactions, etc.

2.2. Bioinformatics Tools

Beyond other technical or statistical considerations, the approximate level of acceptance of different functional profiling methods among the scientific community is reported in Tables 1 and 2. Table 1 presents an exhaustive list of bioinformatics tools available for functional profiling that implement tests for functional enrichment. Here the number of Google Scholar citations has been used as an approximate popularity index, because it reflects the number of academic documents (mostly papers) citing a particular paper. Following this criterion, the most popular tools, each with more than 200 citations, are EASE (42), DAVID (43), GOMiner (15), Babelomics/FatiGO (12, 16, 34), MAPPFinder (44), GOstat (45), and Ontotools (46). In the case of gene-set analysis (GSA) methods, Table 2 shows that more than 75% of the Google Scholar citations are monopolized by two tools: GSEA and Babelomics.

3. Methods

3.1. Functional Enrichment Methods

In the conventional approach for the functional annotation of microarray experiments, known as functional enrichment analysis, the functional interpretation of the data is performed in two steps: in a first step, genes of interest are selected using different procedures. In a subsequent step, the selected genes of interest are compared with a background (usually the rest of the genes) to find enrichment in any gene module. This comparison with the background is essential because an apparently high proportion of a given functional module could easily be nothing but a reflection of a high proportion of this particular module in the whole genome but not a proper enrichment. Actually, both enrichments and depletions of gene modules are potentially of interest. Therefore, unless there is a specific reason not to consider enrichment or depletion, two-sided tests are appropriate (47). This comparison between the selected genes and the background can be carried out


Table 1. Functional enrichment data analysis tools with at least ten Google Scholar citations

Tool                Application type or URL for web servers                           References            Citations(a)
EASE                Windows application                                               (42)                  603
DAVID               http://www.DAVID.niaid.nih.gov                                    (43)                  504
GOMiner             http://discover.nci.nih.gov/gominer/                              (15, 55)              408
Babelomics          http://www.babelomics.org                                         (12, 16, 34, 50, 56)  402
MAPPFinder          http://www.GenMAPP.org                                            (44)                  379
FatiGO              http://www.fatigo.org                                             (16)                  341
GOstat              http://gostat.wehi.edu.au/                                        (45)                  249
Ontotools           http://vortex.cs.wayne.edu/ontoexpress/                           (32, 46, 57–59)       223
GOTM                http://genereg.ornl.gov/gotm/                                     (60)                  164
GO::TermFinder      Perl script                                                       (61)                  152
FunSpec             http://funspec.med.utoronto.ca                                    (62)                  100
GeneMerge           http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge.html  (63)                  96
FuncAssociate       http://llama.med.harvard.edu/Software.html                        (64)                  91
BINGO               Cytoscape plugin                                                  (65)                  75
GOToolBox           http://gin.univ-mrs.fr/GOToolBox                                  (66)                  74
GFINDer             http://www.medinfopoli.polimi.it/GFINDer/                         (67, 68)              49
WebGestalt          http://bioinfo.vanderbilt.edu/webgestalt/                         (69)                  46
GOSurfer            R package                                                         (70)                  45
CLENCH              Perl script                                                       (71)                  26
Pathway Explorer    https://pathwayexplorer.genome.tugraz.at/                         (72)                  25
Ontology Traverser  R package                                                         (73)                  24
THEA                Java standalone                                                   (74)                  11
WebBayGO            http://blasto.iq.usp.br/~tkoide/BayGO/                            (75)                  10
GOstats             R package                                                         (76)                  10

(a) Citations are taken from Google Scholar (as of January 2008). Google Scholar counts are only an indirect estimate of citations in papers, but they give an idea of the impact in the scientific community.


Table 2. Tools available for functional profiling by gene-set analysis with at least ten Google Scholar citations

Tool                            Application type or URL for web servers     References            Test      Citations(a)
GSEA                            http://www.broad.mit.edu/gsea/              (19, 20)              GS, C     1,013
Babelomics (FatiGO + FatiScan)  http://www.babelomics.org                   (12, 16, 34, 50, 56)  FE/GS, C  402
FuncAssociate                   http://llama.med.harvard.edu/Software.html  (64)                  FE/GS, C  91
Global test                     R package                                   (22)                  GS, SC    89
PAGE                            Python script                               (25)                  GS, C     42
ErmineJ                         Java                                        (77)                  GS, C     35
FatiScan                        http://www.babelomics.org                   (50)                  GS, C     34
GO-mapper                       Windows, Perl script                        (24)                  GS, C     33
SAFE                            R package                                   (49)                  GS, C     27
GOAL                            http://microarrays.unife.it                 (78)                  GS, C     25
Catmap                          Perl script                                 (79)                  GS, C     19
PLAGE                           http://dulci.biostat.duke.edu/pathways/     (80)                  GS, SC    18
GODist                          MATLAB program                              (81)                  GS, SC    17
t-Profiler                      http://www.t-profiler.org/                  (82)                  GS, C     12

Type of test: GS, gene set; C, competitive; FE, functional enrichment; SC, self-contained.
(a) Citations are taken from Google Scholar (as of January 2008). Google Scholar counts are only an indirect estimate of citations in papers, but they give an idea of the impact in the scientific community.

by means of the application of different tests, such as the hypergeometric, Fisher’s exact, χ², and binomial tests, which are considered to give similar results (47). Because many tests are conducted to check all the gene modules, an adjustment for multiple testing, such as the false discovery rate (FDR) (7) or others, must be used.

3.2. Gene-Set Analysis Methods
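Before turning to gene-set methods, the two-step test of Subheading 3.1 can be made concrete with a small Python sketch: a two-sided Fisher's exact test per module, followed by an FDR adjustment across modules. Several points are assumptions rather than prescriptions of this chapter: the adjustment shown is the Benjamini–Hochberg step-up procedure (one common FDR variant), the module names and counts are made up, and the function names `fisher_two_sided` and `benjamini_hochberg` are hypothetical. Only the Python standard library is used.

```python
from math import comb


def fisher_two_sided(k, n, K, N):
    """Two-sided Fisher's exact test for module membership.

    k: selected genes annotated to the module
    n: selected genes in total
    K: genes annotated to the module among all N genes
    N: all genes considered (selected genes plus background)
    """
    lowest = max(0, n - (N - K))    # smallest feasible overlap
    highest = min(n, K)             # largest feasible overlap
    total = comb(N, n)
    probs = {x: comb(K, x) * comb(N - K, n - x) / total
             for x in range(lowest, highest + 1)}
    observed = probs[k]
    # Sum the probabilities of all tables no more likely than the observed one,
    # so that both enrichment and depletion count as extreme outcomes.
    p = sum(q for q in probs.values() if q <= observed * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):    # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (k, n, K, N) for three made-up modules.
modules = {
    "module_A": (8, 10, 9, 16),
    "module_B": (3, 10, 5, 16),
    "module_C": (5, 10, 8, 16),
}
raw = {name: fisher_two_sided(*counts) for name, counts in modules.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name in modules:
    print(name, round(raw[name], 4), round(adjusted[name], 4))
```

Expressing the comparison through the totals `K` and `N` is equivalent to the 2 × 2 table of selected genes versus the rest of the genes, which is the usual choice of background.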

The interpretation of a genome-scale experiment using the two-step functional enrichment approach is far from optimal, given that the thresholds imposed in the first step, which assume independence, preclude the detection of many gene modules. Methods directly inspired by systems biology focus on collective properties


of the genes more than on individual gene expression values. Modules of genes related by common functionality, regulation, or other interesting biological properties will simultaneously fulfill their roles in the cell and, consequently, they are expected to display a coordinated expression. In its simplest formulation, the GSA method uses a ranking of values derived from the experiment analyzed. Mootha et al. (19) ranked the genes according to their differential expression when two predefined classes (diabetics versus healthy controls) were compared by means of any appropriate statistical test (48). The position of the genes (which act cooperatively to define modules) within this ranked list is related to their participation in the trait studied in the experiment. Consequently, each module that is a causative agent of the differences between the classes compared will be found at the extremes of the ranked list with the highest probability. Thus, instead of testing differential activities of genes, which implicitly assumes independent behavior (an aspect often ignored by researchers applying the test), and later searching for enrichment in gene modules among the selected genes, GSA directly tests for gene modules significantly accumulated at the extremes of a ranked list of genes. In this way, artificial prior thresholds, which inadvertently change the meaning of the hypothesis-testing scheme, are avoided. Different methods have been proposed for this purpose, such as the GSEA (19, 20) or the SAFE (49) methods, which use a nonparametric version of a Kolmogorov–Smirnov test. Other strategies proposed are the direct analysis of functional terms weighted with experimental data (24) or model-based methods (22). Methods with similar accuracy, although conceptually simpler and quicker, have also been proposed, for instance, the parametric counterpart of the GSEA, PAGE (25), or the segmentation test, FatiScan (50).

3.3. Functional Profiling in Array–CGH Experiments

Genetic alterations, such as losses (deletions), gains (amplifications), or losses of heterozygosity (LOH) of genetic material that affect certain regions of the genome, have been shown to be the basis of many types of cancer (51). New technologies such as array–CGH, along with the use of expression arrays, offer for the first time the opportunity to accurately characterize the alterations in genomic copy number and the dependence of gene expression on these alterations (52). Despite the obvious fact that such alterations affect a large number of genes, most research is still focused on finding only one or a few genes responsible for a disease or a trait and ignores the chromosomal context (52). In particular, the putative impact that the local distribution of functions could have on the symptomatology of diseases that harbor copy number alterations or, in general, on gene regulation and/or silencing is largely unexplored. Actually, only a few attempts at


analyzing copy number alterations in terms of gains or losses of all or part of a gene team have been made to date (40, 41). Programs such as ISACGH (41) detect copy number alterations using conventional algorithms and allow a functional enrichment analysis of the regions with detected alterations.

3.4. Gene-Set Analysis in Genotyping

Another field in which a gene set-based approach could be very useful is genotyping. Association and linkage studies using chips of increasing density have the frustrating effect of decreasing the power of the tests because of the strict multiple-testing corrections that must be applied. Most genetic disorders have a complex inheritance and can be considered the combined result of variants in many genes, each contributing only weak effects to the disease. Given that, in any disorder, most of the disease genes will be involved in only a few different molecular pathways, knowledge of the relationships (functional, regulatory, interactions, etc.) between the genes can help in the assessment of possible candidates (which may reside in different loci) with a joint basis for the disease etiology. The use of different gene module definitions (GO, KEGG, protein interactions, and coexpression) in an integrated network was recently applied to interrelate positional candidate genes from different disease loci and then to test 96 heritable disorders in the Online Mendelian Inheritance in Man database (53). This gene set-based strategy resulted in a 2.8-fold increase over random selection.
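The loss of power caused by strict corrections can be illustrated with simple arithmetic. The Python sketch below uses the Bonferroni correction only because it is the simplest to compute; the chapter does not prescribe a particular correction, and the numbers of markers and gene sets, as well as the function name `bonferroni_threshold`, are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


alpha = 0.05
n_markers = 500_000    # hypothetical number of markers tested individually
n_gene_sets = 1_000    # hypothetical number of gene sets tested instead

per_marker = bonferroni_threshold(alpha, n_markers)
per_gene_set = bonferroni_threshold(alpha, n_gene_sets)

print(f"per-marker threshold:   {per_marker:.1e}")
print(f"per-gene-set threshold: {per_gene_set:.1e}")
print(f"ratio: {per_gene_set / per_marker:.0f}")
```

Under these assumed numbers, each individual marker must reach a threshold several hundred times stricter than each gene set, which is one way to see why weak effects spread over many genes are hard to detect one marker at a time.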

3.5. Conclusion

As cancer research increasingly benefits from the introduction of high-throughput technologies, new hypotheses, inspired by systems biology concepts, can be addressed and checked (54). Bioinformatics has become an essential tool, not only as an instrument for managing the huge amount of data produced by these new technologies, but also as a means of implementing a new generation of algorithms and concepts that are opening the doors to the understanding of cancer as a system (28, 29). Biomedicine is becoming more computational, and cancer research is pioneering this transformation.

Acknowledgments

This work is supported by grants from the Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, project BIO 2008-04212 from the Spanish Ministry of Education and Science, and the National Institute of Bioinformatics (http://www.inab.org), a platform of Genoma España. EA is supported by a fellowship from the FIS of the Spanish Ministry of Health (FI06/00027).


References

1. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.
2. Allison, D.B., Cui, X., Page, G.P. and Sabripour, M. (2006) Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet, 7, 55–65.
3. Perou, C.M., Jeffrey, S.S., van de Rijn, M., Rees, C.A., Eisen, M.B., Ross, D.T., Pergamenschikov, A., Williams, C.F., Zhu, S.X., Lee, J.C., et al. (1999) Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA, 96, 9212–9217.
4. Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.
5. Simon, R., Radmacher, M.D., Dobbin, K. and McShane, L.M. (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst, 95, 14–18.
6. Ge, H., Walhout, A.J. and Vidal, M. (2003) Integrating ‘omic’ information: a bridge between genomics and systems biology. Trends Genet, 19, 551–560.
7. Benjamini, Y. and Yekutieli, D. (2001) The control of false discovery rate in multiple testing under dependency. Ann Stat, 29, 1165–1188.
8. Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA, 100, 9440–9445.
9. van ‘t Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.
10. Moreau, Y., Aerts, S., De Moor, B., De Strooper, B. and Dabrowski, M. (2003) Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet, 19, 570–577.
11. Al-Shahrour, F. and Dopazo, J. (2005) In Azuaje, F. and Dopazo, J. (eds.), Data analysis and visualization in genomics and proteomics. Wiley, pp. 99–112.
12. Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L. and Dopazo, J. (2005) BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res, 33, W460–W464.
13. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25, 25–29.
14. Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. and Hattori, M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res, 32, D277–D280.
15. Zeeberg, B.R., Feng, W., Wang, G., Wang, M.D., Fojo, A.T., Sunshine, M., Narasimhan, S., Kane, D.W., Reinhold, W.C., Lababidi, S., et al. (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol, 4, R28.
16. Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics, 20, 578–580.
17. Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21, 3587–3595.
18. Bammler, T., Beyer, R.P., Bhattacharya, S., Boorman, G.A., Boyles, A., Bradford, B.U., Bumgarner, R.E., Bushel, P.R., Chaturvedi, K., Choi, D., et al. (2005) Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods, 2, 351–356.
19. Mootha, V.K., Lindgren, C.M., Eriksson, K.F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet, 34, 267–273.
20. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 102, 15545–15550.
21. Goeman, J.J., Oosting, J., Cleton-Jansen, A.M., Anninga, J.K. and van Houwelingen, H.C. (2005) Testing association of a pathway with survival using gene expression data. Bioinformatics, 21, 1950–1957.
22. Goeman, J.J., van de Geer, S.A., de Kort, F. and van Houwelingen, H.C. (2004) A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20, 93–99.
23. Tian, L., Greenberg, S.A., Kong, S.W., Altschuler, J., Kohane, I.S. and Park, P.J. (2005) Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci USA, 102, 13544–13549.
24. Smid, M. and Dorssers, L.C. (2004) GOMapper: functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms. Bioinformatics, 20, 2618–2625.
25. Kim, S.Y. and Volsky, D.J. (2005) PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics, 6, 144.
26. Chen, Z., Wang, W., Ling, X.B., Liu, J.J. and Chen, L. (2006) GO-Diff: mining functional differentiation between EST-based transcriptomes. BMC Bioinformatics, 7, 72.
27. Al-Shahrour, F., Minguez, P., Tarraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. and Dopazo, J. (2006) BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res, 34, W472–W476.
28. Khalil, I.G. and Hill, C. (2005) Systems biology for cancer. Curr Opin Oncol, 17, 44–48.
29. Kitano, H. (2004) Cancer as a robust system: implications for anticancer therapy. Nat Rev Cancer, 4, 227–235.
30. Butcher, E.C. (2005) Can cell systems biology rescue drug discovery? Nat Rev Drug Discov, 4, 461–467.
31. Searls, D.B. (2005) Data integration: challenges for drug discovery. Nat Rev Drug Discov, 4, 45–58.
32. Khatri, P., Sellamuthu, S., Malhotra, P., Amin, K., Done, A. and Draghici, S. (2005) Recent additions and improvements to the OntoTools. Nucleic Acids Res, 33, W762–W765.
33. Al-Shahrour, F., Arbiza, L., Dopazo, H., Huerta-Cepas, J., Minguez, P., Montaner, D. and Dopazo, J. (2007) From genes to functional classes in the study of biological systems. BMC Bioinformatics, 8, 114.
34. Al-Shahrour, F., Minguez, P., Tarraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. and Dopazo, J. (2006) BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res, 34, W472–W476.
35. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bradley, P., Bork, P., Bucher, P., Cerutti, L., et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res, 33, D201–D205.
36. Robertson, G., Bilenky, M., Lin, K., He, A., Yuen, W., Dagpinar, M., Varhol, R., Teague, K., Griffith, O.L., Zhang, X., et al. (2006) cisRED: a database system for genome-scale computational discovery of regulatory elements. Nucleic Acids Res, 34, D68–D73.
37. Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Pruss, M., Reuter, I. and Schacherer, F. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res, 28, 316–319.
38. Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A. and Enright, A.J. (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res, 34, D140–D144.
39. Minguez, P., Al-Shahrour, F., Montaner, D. and Dopazo, J. (2007) Functional profiling of microarray experiments using text-mining derived bioentities. Bioinformatics, 23, 3098–3099.
40. Conde, L., Montaner, D., Burguet-Castell, J., Tarraga, J., Al-Shahrour, F. and Dopazo, J. (2007) Functional profiling and gene expression analysis of chromosomal copy number alterations. Bioinformation, 1, 432–435.
41. Conde, L., Montaner, D., Burguet-Castell, J., Tarraga, J., Medina, I., Al-Shahrour, F. and Dopazo, J. (2007) ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling. Nucleic Acids Res, 35, W81–W85.
42. Hosack, D.A., Dennis, G., Jr., Sherman, B.T., Lane, H.C. and Lempicki, R.A. (2003) Identifying biological themes within lists of genes with EASE. Genome Biol, 4, R70.
43. Dennis, G., Jr., Sherman, B.T., Hosack, D.A., Yang, J., Gao, W., Lane, H.C. and Lempicki, R.A. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol, 4, P3.
44. Doniger, S.W., Salomonis, N., Dahlquist, K.D., Vranizan, K., Lawlor, S.C. and Conklin, B.R. (2003) MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol, 4, R7.
45. Beissbarth, T. and Speed, T.P. (2004) GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics, 20, 1464–1465.
46. Khatri, P., Bhavsar, P., Bawa, G. and Draghici, S. (2004) Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res, 32, W449–W456.
47. Rivals, I., Personnaz, L., Taing, L. and Potier, M.C. (2007) Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics, 23, 401–407.
48. Cui, X. and Churchill, G.A. (2003) Statistical tests for differential expression in cDNA microarray experiments. Genome Biol, 4, 210.
49. Barry, W.T., Nobel, A.B. and Wright, F.A. (2005) Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics, 21, 1943–1949.
50. Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2005) Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information. Bioinformatics, 21, 2988–2993.
51. Albertson, D.G. and Pinkel, D. (2003) Genomic microarrays in human genetic disease and cancer. Hum Mol Genet, 12(Spec No 2), R145–R152.
52. Pinkel, D. and Albertson, D.G. (2005) Array comparative genomic hybridization and its applications in cancer. Nat Genet, 37(Suppl), S11–S17.
53. Franke, L., van Bakel, H., Fokkens, L., de Jong, E.D., Egmont-Petersen, M. and Wijmenga, C. (2006) Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet, 78, 1011–1025.
54. Kitano, H. (2002) Computational systems biology. Nature, 420, 206–210.
55. Zeeberg, B.R., Qin, H., Narasimhan, S., Sunshine, M., Cao, H., Kane, D.W., Reimers, M., Stephens, R.M., Bryant, D., Burt, S.K., et al. (2005) High-throughput GoMiner, an ‘industrial-strength’ integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics, 6, 168.
56. Al-Shahrour, F., Minguez, P., Tarraga, J., Medina, I., Alloza, E., Montaner, D. and Dopazo, J. (2007) FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res, 35, W91–W96.
57. Draghici, S., Khatri, P., Bhavsar, P., Shah, A., Krawetz, S.A. and Tainsky, M.A. (2003) Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res, 31, 3775–3781.
58. Khatri, P., Desai, V., Tarca, A.L., Sellamuthu, S., Wildman, D.E., Romero, R. and Draghici, S. (2006) New Onto-Tools: PromoterExpress, nsSNPCounter and Onto-Translate. Nucleic Acids Res, 34, W626–W631.
59. Khatri, P., Voichita, C., Kattan, K., Ansari, N., Khatri, A., Georgescu, C., Tarca, A.L. and Draghici, S. (2007) Onto-Tools: new additions and improvements in 2006. Nucleic Acids Res, 35, W206–W211.
60. Zhang, B., Schmoyer, D., Kirov, S. and Snoddy, J. (2004) GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics, 5, 16.
61. Boyle, E.I., Weng, S., Gollub, J., Jin, H., Botstein, D., Cherry, J.M. and Sherlock, G. (2004) GO::TermFinder – open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics, 20, 3710–3715.
62. Robinson, M.D., Grigull, J., Mohammad, N. and Hughes, T.R. (2002) FunSpec: a web-based cluster interpreter for yeast. BMC Bioinformatics, 3, 35.
63. Castillo-Davis, C.I. and Hartl, D.L. (2003) GeneMerge – post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, 19, 891–892.
64. Berriz, G.F., King, O.D., Bryant, B., Sander, C. and Roth, F.P. (2003) Characterizing gene sets with FuncAssociate. Bioinformatics, 19, 2502–2504.
65. Maere, S., Heymans, K. and Kuiper, M. (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 21, 3448–3449.
66. Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D. and Jacq, B. (2004) GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol, 5, R101.
67. Masseroli, M., Galati, O. and Pinciroli, F. (2005) GFINDer: genetic disease and phenotype location statistical analysis and mining of dynamically annotated gene lists. Nucleic Acids Res, 33, W717–W723.
68. Masseroli, M., Martucci, D. and Pinciroli, F. (2004) GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining. Nucleic Acids Res, 32, W293–W300.
69. Zhang, B., Kirov, S. and Snoddy, J. (2005) WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res, 33, W741–W748.
70. Zhong, S., Storch, K.F., Lipan, O., Kao, M.C., Weitz, C.J. and Wong, W.H. (2004) GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in Gene Ontology space. Appl Bioinformatics, 3, 261–264.
71. Shah, N.H. and Fedoroff, N.V. (2004) CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology. Bioinformatics, 20, 1196–1197.
72. Mlecnik, B., Scheideler, M., Hackl, H., Hartler, J., Sanchez-Cabo, F. and Trajanoski, Z. (2005) PathwayExplorer: web service for visualizing high-throughput expression data on biological pathways. Nucleic Acids Res, 33, W633–W637.
73. Young, A., Whitehouse, N., Cho, J. and Shaw, C. (2005) OntologyTraverser: an R package for GO analysis. Bioinformatics, 21, 275–276.
74. Pasquier, C., Girardot, F., Jevardat de Fombelle, K. and Christen, R. (2004) THEA: ontology-driven analysis of microarray data. Bioinformatics, 20, 2636–2643.
75. Vencio, R.Z., Koide, T., Gomes, S.L. and Pereira, C.A. (2006) BayGO: Bayesian analysis of ontology term enrichment in microarray data. BMC Bioinformatics, 7, 86.
76. Falcon, S. and Gentleman, R. (2007) Using GOstats to test gene lists for GO term association. Bioinformatics, 23, 257–258.
77. Lee, H.K., Braynen, W., Keshav, K. and Pavlidis, P. (2005) ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics, 6, 269.
78. Volinia, S., Evangelisti, R., Francioso, F., Arcelli, D., Carella, M. and Gasparini, P. (2004) GOAL: automated Gene Ontology analysis of expression profiles. Nucleic Acids Res, 32, W492–W499.
79. Breslin, T., Eden, P. and Krogh, M. (2004) Comparing functional annotation analyses with Catmap. BMC Bioinformatics, 5, 193.
80. Tomfohr, J., Lu, J. and Kepler, T.B. (2005) Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6, 225.
81. Ben-Shaul, Y., Bergman, H. and Soreq, H. (2005) Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression. Bioinformatics, 21, 1129–1137.
82. Boorsma, A., Foat, B.C., Vis, D., Klis, F. and Bussemaker, H.J. (2005) T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res, 33, W592–W595.

Chapter 20

Calibration of Microarray Gene-Expression Data

Hans Binder, Stephan Preibisch, and Hilmar Berger

Summary

Calibration of microarray measurements aims at removing systematic biases from the probe-level data to get expression estimates that linearly correlate with the transcript abundance in the studied samples. The improvement of calibration methods is an essential prerequisite for estimating absolute expression levels, which, in turn, are required for quantitative analyses of transcriptional regulation, for example, in the context of gene profiling of diseases. We address hybridization on microarrays as a reaction process in a complex environment and express the measured intensities as a function of the input quantities of the experiment. Popular calibration methods such as MAS5, dChip, RMA, gcRMA, vsn, and PLIER are briefly reviewed and assessed in light of the hybridization model and of previous benchmark studies. We present our hook method, a new calibration approach that is based on a graphical summary of the actual hybridization characteristics of a particular microarray. Although single-chip related, hook performs as well as the multi-chip-related gcRMA, presently one of the best state-of-the-art methods for estimating expression values. The hook method, in addition, provides a set of chip summary characteristics that evaluate the performance of a given hybridization. The algorithm of the method is briefly described and its performance is exemplified.

Key words: Gene expression, Microarray calibration, Preprocessing methods, Transcript concentration, Hook curve, Hybridization, Langmuir isotherm

1. Introduction

In this chapter, we focus on GeneChip microarray data analysis after the chips have been hybridized and scanned and the images have been summarized into hundreds of thousands of probe intensity values. With this enormous amount of data, we need standardized systems and tools for data management to analyze the results in a proper and sound way, as well as to be able to benefit from other publicly available gene expression data sets.

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_20, © Humana Press, a part of Springer Science + Business Media, LLC 2010


The basic principle of microarray experiments relies on the fluorescence intensity measurement for an individual probe to infer the transcript abundance specific for a selected gene. Properly extracting the expression degree from the measured intensity raises several difficult issues. Calibration of microarray measurements aims at removing consistent and systematic sources of variation to allow mutual comparison of measurements acquired from different probes, arrays, and experimental settings. Calibration is also called preprocessing because it usually constitutes the first step in the microarray analysis pipeline. It potentially influences the results of all subsequent steps of “higher-level” analyses as well as the biological interpretation of these results, and is therefore a crucial step in the processing of microarray data.

The chapter is organized into three parts: (1) As the essential premise for evaluating existing and developing new calibration methods, we address hybridization on microarrays as a reaction process in a complex environment and express the measured intensities as a function of input quantities of the experiment. (2) Over the past years, microarray preprocessing has adopted a few generally accepted methodologies. In the second part of the chapter, we briefly review these options with regard to the underlying hybridization process and we judge advantages and disadvantages in the light of previous benchmark studies. Because we focus on Affymetrix GeneChip arrays, special attention is dedicated to the question of whether a mismatch-based chip design provides benefits for intensity calibration. (3) Finally, we present our hook method, a new calibration approach that is based on a graphical summary of the hybridization characteristics of each microarray. It uses a kind of natural metric for intensity calibration, with the potential to estimate expression values on an absolute scale. We briefly describe the algorithm and exemplify its performance.

2. Calibration of Microarrays

Microarray experiments aim to estimate the “expression degree” of thousands of specific target sequences using the integral intensity response of the respective probe spots on the chip surface. The detected intensity is affected by parasitic effects owing to the “technical” variability of repeated measurements and systematic biases that disturb the one-to-one relationship between the input and the output quantity of the measurement (1). The task of making estimates of the input quantity of a measurement from observations of its output is called calibration. First, it requires the determination of the model describing the basic relationship between the probe intensity and the specific transcript concentration with consideration given to all relevant parasitic effects that should be straightened out. Second, the magnitude of these effects should be estimated using the intensity information of a given chip or of a series of chips. Third, one needs practicable algorithms that estimate the expression degree from the intensity values.

2.1. The Langmuir Hybridization Model

The hybridization of a microarray probe (P) can be described by the following reversible second-order reactions referring to specific (S) and nonspecific (N) target binding, respectively:

$$P^f + S^f \rightleftharpoons PS \quad \text{and} \quad P^f + N^f \rightleftharpoons PN. \tag{1}$$

Accordingly, free complementary RNA (cRNA; or complementary DNA, cDNA) fragments with completely complementary ($S^f$) and partly complementary ($N^f$) sequences in solution compete for duplex formation (PS and PN) with free DNA probe oligonucleotides attached to the chip surface ($P^f$). The equilibrium constants for specific and nonspecific hybridization characterize the “affinity” of the respective targets for duplexing with the probe,

$$K^S \equiv \frac{[PS]}{[P^f]\cdot[S^f]} \quad \text{and} \quad K^N \equiv \frac{[PN]}{[P^f]\cdot[N^f]}. \tag{2}$$

The brackets denote the respective concentrations. Making use of the condition of material balance for the probe oligonucleotides, $[P] = [PS] + [PN] + [P^f]$, and assuming an excess of free species, so that $[N] = [N^f] + [PN] \approx [N^f]$ and $[S] = [S^f] + [PS] \approx [S^f]$, one obtains the fraction of occupied probe oligonucleotides after insertion into Eq. 2 and rearrangement,

$$\Theta \equiv \frac{[PS] + [PN]}{[P]} = \frac{K^S\cdot[S] + K^N\cdot[N]}{1 + K^S\cdot[S] + K^N\cdot[N]}. \tag{3}$$

Typically, the hybridization solution contains a very large number of nonspecific fragments of different lengths and sequences. For the sake of simplicity, we subsume this diversity under the term $K^N\cdot[N] \equiv \sum_i K^N_i\cdot[N]_i$, referring to a single, effective species.
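The occupancy of Eq. 3 can be evaluated numerically with a few lines of code. The following is a minimal sketch rather than part of the original protocol: the function names `occupancy` and `effective_nonspecific` are hypothetical, and the binding constants and concentrations are arbitrary illustrative numbers in mutually consistent units.

```python
def effective_nonspecific(k_values, concentrations):
    """Collapse many nonspecific species into the single term K^N*[N].

    k_values[i] and concentrations[i] describe the i-th nonspecific
    fragment species (illustrative inputs, not measured data).
    """
    return sum(k * c for k, c in zip(k_values, concentrations))


def occupancy(ks_s, kn_n):
    """Fraction of occupied probe oligonucleotides (Eq. 3).

    ks_s : product K^S*[S] for specific binding
    kn_n : product K^N*[N] for (effective) nonspecific binding
    """
    x = ks_s + kn_n
    return x / (1.0 + x)


if __name__ == "__main__":
    # Half occupancy is reached for K^S*[S] = 1 without nonspecific binding.
    print(occupancy(1.0, 0.0))
    # A nonspecific background raises the occupancy at the same [S].
    background = effective_nonspecific([1.0, 2.0], [0.1, 0.2])
    print(occupancy(1.0, background))
```

The second call illustrates why the nonspecific term must be estimated separately: the same specific transcript concentration yields a higher occupancy, and hence a higher intensity, when nonspecific fragments are present.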

Our reaction scheme, Eq. 1, considers only the bimolecular duplexing between probes and targets for the sake of simplicity. Note that the available concentrations of free probes and targets are, however, reduced by parasitic reactions such as bulk dimerization between different targets and intramolecular folding of probes and targets. These effects can be taken into account by substituting the equilibrium constants for the bimolecular binding reactions (Eq. 2) with effective reaction constants depending on the reaction constants of the additional processes (see, e.g., ref. 1 for details). Typically, the effective binding constants are decreased compared with their values in the absence of parasitic reactions.

After the hybridization step, free targets are removed by washing, and bound targets are labeled with fluorescent markers that attach to biotinylated nucleotides inserted into the target sequences before hybridization. Finally, the fluorescence emission of the probe spot is scanned and processed into one intensity value. Assuming good gridding, it directly relates to the fraction of occupied probe oligonucleotides, $\Theta$, i.e.,

$$I = M\cdot\Theta + O. \tag{4}$$

Here, M denotes the proportionality constant in intensity units and O the “optical” background, referring to the residual intensity measured in the absence of bound transcripts owing to, e.g., adsorbed free labels or the dark current of the detector.

2.2. Probe and Chip Design

On GeneChip expression arrays, each gene is interrogated by a set of $N_{\mathrm{set}}$ = 11–20 probe pairs. Each of them consists of a perfect match (PM) and a mismatch (MM) version. The PM sequence perfectly matches a segment of the target gene with a length of 25 nucleotides. The MM sequence is identical to that of the corresponding PM probe except for the middle (13th) base, which is changed to its Watson–Crick complement. The MM probes are intended to provide an estimate of the background of the respective PM. The probe set forms a series of pseudo-replicates probing the same target with different probe sequences to increase the certainty of the expression estimate.

GeneChip microarrays can be viewed as a sort of multi-photometer chip, each of which assembles approximately $10^5$–$10^6$ virtually independent dual-channel micro-photometers on an area of approximately 1 cm². This analogy implies that each PM probe spot constitutes the “sample” channel for detecting RNA fragments of a given sequence, whereas the MM spot serves as the “reference” channel for nonspecific background correction. The apparatus function given by Eq. 4 applies to each of these “micro-photometers,” however, with different sequence- and transcript-specific values of the parameters. With Eq. 3 one obtains:

$$I^P_{p,c} = \frac{L^P_{p,c}}{1 + M_c^{-1}\cdot L^P_{p,c}} + O_c \quad \text{with} \quad L^P_{p,c} = S^P_{p,c} + N^P_{p,c}. \tag{5}$$

The concentration dependence of the intensity of a PM/MM probe pair is illustrated in Fig. 1. In Eq. 5, probe-related properties are indexed by the superscripts P = PM, MM to account for the probe type, and by the subscripts p = 1, …, $N_{\mathrm{probe}}$ and c = 1, …, $N_{\mathrm{chip}}$ for the probe and chip effects in terms of the probe number on the chip and the chip number in a series of microarray hybridizations, respectively. Each gene/transcript is subsumed in the chip effect because its expression degree is a sample- and, thus, chip-related property. $L^P_{p,c}$ is the linear approximation of the amount of target binding in intensity units. It additively decomposes into contributions due to nonspecific and specific hybridization, $N^P_{p,c}$ and $S^P_{p,c}$, respectively. The latter term can be further split into factors characterizing the affinity ($A^P_p$) and the expression degree ($E_c$) according to Eq. 3,

$$S^P_{p,c} = A^P_p\cdot E_c = M_c\cdot K^{P,S}_p\cdot[S]_c. \tag{6}$$

The right-hand side of Eq. 6 refers to an absolute scale where the expression degree is given in concentration units (material per volume, e.g., mole per liter). Note that the inverse of the binding constant defines the concentration of “half occupancy,” $[S] = 1/K^S_p$, at which 50% of the probe oligonucleotides become occupied in the absence of nonspecific hybridization (see Eq. 3 with $K^S_p\cdot[S] = 1$ and $[N] = 0$). In contrast, the middle part of Eq. 6 defines the expression degree and affinity in arbitrary units with an uncertainty of a constant factor.

Microarray calibration experiments using sets of spiked-in transcripts at different concentrations confirmed the predicted nonlinear intensity response to a good approximation (2–6). This hyperbolic function levels off into an intensity asymptote of $I^P_{p,c} \to M_c + O_c$ upon saturation of the probe spots with bound transcripts at high transcript concentrations (see Fig. 1). It can be “linearized,” provided the asymptotic and optical background values are known,

$$L^P_{p,c} = \frac{I^P_{p,c} - O_c}{1 - M_c^{-1}\cdot\left(I^P_{p,c} - O_c\right)}. \tag{7}$$
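That Eq. 7 is the exact inverse of Eq. 5 can be checked numerically. The sketch below is illustrative only: the function names `intensity` and `linearize` and the parameter values (M = 10,000 and O = 50 intensity units) are assumptions chosen for the example, not values taken from the chapter.

```python
def intensity(l, m, o):
    """Eq. 5: saturating probe intensity for the linear binding term L.

    l : L = S + N, amount of target binding in intensity units
    m : saturation parameter M of the chip
    o : optical background O
    """
    return l / (1.0 + l / m) + o


def linearize(i, m, o):
    """Eq. 7: recover L from a measured intensity.

    Only defined below saturation, i.e. for 0 <= I - O < M.
    """
    net = i - o
    if not 0.0 <= net < m:
        raise ValueError("net intensity must lie in the interval [0, M)")
    return net / (1.0 - net / m)


if __name__ == "__main__":
    m, o = 10000.0, 50.0          # hypothetical chip parameters
    for l in (10.0, 500.0, 20000.0):
        i = intensity(l, m, o)
        print(l, round(i, 2), round(linearize(i, m, o), 2))
```

For small L the intensity is nearly L + O, whereas for L well above M it approaches the asymptote M + O; in that regime small intensity errors are strongly amplified by the denominator of Eq. 7.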

The linearizing transformation of Eq. 7 is illustrated in the right part of Fig. 1.

2.3. Calibration Error: Linear or Logarithmic Scale

The raw intensity data are highly “noisy.” Application of simple error propagation formalisms to Eq. 5 provides the intensity error in the linear and logarithmic scales:

$$\varepsilon \equiv \delta(I - O) \approx \pm\sqrt{(\delta b)^2 + \left((I - O)\cdot\delta g\right)^2}, \qquad \log\varepsilon \approx \delta\log(I - O) = \pm\sqrt{\left(\frac{\delta b}{I - O}\right)^2 + (\delta g)^2}. \tag{8}$$

It splits into an additive contribution due to fluctuations of the transcript concentrations and the optical background, $\delta b \propto \delta[S] \propto \delta[N] \propto \delta O \in N(0, \sigma_b)$; and into a multiplicative term caused by variations of the binding affinity, $\delta g \propto \delta\log K \in N(0, \sigma_g)$. The former term dominates at small intensities, whereas the multiplicative contribution is the most significant source of variation at higher intensities. Most of the available data analysis algorithms assume a homoscedastic, intensity-independent gaussian error. The linear scale meets this assumption at small intensities but progressively underestimates the error with increasing signal. In turn, log-transformed data underestimate the error at low intensities. Mostly, relevant expression values refer to medium and higher intensity levels. Therefore, for most purposes, the data analysis is more adequately performed in log scale than in the linear scale.

Fig. 1. Langmuir hybridization isotherm (left part, Eq. 5) and linearized isotherm (right part, Eq. 7) of PM and MM probes. The row of figures below shows the “hook” plot in Δ vs. Σ coordinates. Error limits are shown by dashed lines (Eq. 8). The hybridization regimes are indicated in the left part of the figure (see text). The optical background is omitted for the sake of clarity.

An apparently better alternative makes use of the so-called generalized logarithm, $\mathrm{glog}(x) \equiv \log\left(\tfrac{1}{2}\left(x + \sqrt{x^2 + c}\right)\right)$. It behaves linearly at small and logarithmically at large arguments, ensuring a virtually constant error width, $\mathrm{glog}\,\varepsilon \approx \delta g$. However, its proper use requires scaling of the argument and of the parameter c (7, 8). Note that the standard deviations of the considered distributions are only constants in the absence of saturation. Otherwise, the error width decreases with progressive saturation at high intensities according to:

$$\sigma_g \approx \left(1 - M^{-1}\cdot(I - O)\right)\cdot\sigma_g^0 \quad \text{and} \quad \sigma_b \approx \sqrt{\left(\left(1 - M^{-1}\cdot(I - O)\right)\cdot\sigma_g^0\right)^2 + \left(\sigma_O^0\right)^2}. \tag{9}$$
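The behavior of the error model and of the generalized logarithm can be made concrete with a short numerical sketch. It assumes the glog definition given above; the function names and the noise levels (σ_b = 20 intensity units, σ_g = 0.1) are hypothetical values chosen only for illustration.

```python
import math


def glog(x, c):
    """Generalized logarithm: log(0.5 * (x + sqrt(x^2 + c)))."""
    return math.log(0.5 * (x + math.sqrt(x * x + c)))


def linear_error(net, sigma_b, sigma_g):
    """Eq. 8, linear scale: error of the net intensity I - O."""
    return math.sqrt(sigma_b ** 2 + (net * sigma_g) ** 2)


def log_error(net, sigma_b, sigma_g):
    """Eq. 8, log scale: error of log(I - O), natural-log units."""
    return math.sqrt((sigma_b / net) ** 2 + sigma_g ** 2)


if __name__ == "__main__":
    sigma_b, sigma_g = 20.0, 0.1   # illustrative noise levels
    for net in (10.0, 200.0, 5000.0):
        print(net,
              round(linear_error(net, sigma_b, sigma_g), 2),
              round(log_error(net, sigma_b, sigma_g), 3))
    # glog stays finite for zero or negative arguments,
    # and approaches the ordinary logarithm for large ones.
    print(glog(-3.0, 4.0), glog(0.0, 4.0), glog(1e6, 4.0) - math.log(1e6))
```

The printed errors show the two regimes described in the text: the linear-scale error is roughly constant at low net intensities and grows at high ones, whereas the log-scale error is large at low net intensities and levels off at σ_g.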

The error limits of the hybridization isotherm are illustrated in Fig. 1 by dashed lines.

2.4. Reference Probes: MM or Half-Price Solution

The use of MM probes as background references for the PM probes, as originally intended, brings up two practical problems: (1) For a considerable fraction of probe pairs, the MM fluoresce brighter than the PM. This observation appears “unphysical” because MM probes are assumed to bind transcripts in amounts at most equal to, but never higher than, those bound by the PM. (2) The MM probe intensities, on average, scatter more strongly than those of the PM. As a consequence, calibration algorithms either empirically attenuate the MM intensity values to ensure strictly positive PM–MM intensity differences or they dispense with MM data entirely (see Subheading 3.1 below). Half-price solutions for chips without MM have been proposed, which replace the “superfluous” MM with additional PM probes (9). New GeneChip generations such as the Exon 1.0 arrays are designed as PM-only chips without MM probes.

However, intensity calibration of microarray data is still a challenging task and the question of whether the use of internal reference probes such as MM can bring some real benefit to chip analysis has not been answered yet. For example, the problem of bright MM can be rationalized in terms of the “reversed” base pairings that form the complementary middle bases of the PM and MM probe sequences upon nonspecific hybridization and of the purine–pyrimidine asymmetry of binding strengths of RNA/DNA interactions (10, 11). Additionally, the variability problem of the MM probes can be, at least partially, explained on the level of base pairings of the middle base: In the MM, it changes from a complementary Watson–Crick pairing in the nonspecific duplexes into a mismatched pairing in the specific duplexes, whereas the respective pairing of the PM remains virtually unchanged (10, 11).

Below, we present a new calibration algorithm that explicitly accounts for these effects. Moreover, this “hook” method uses the MM probes not only as a background reference, but also interprets them as a sort of “weak” PM that also responds to specific hybridization according to Eq. 5. In this approach, the MM operate as a hybridization reference over the full range of transcript concentrations. In this way, they enable scaling of the intensities in a natural metric system. We suggest that this idea opens a new view on the potential design and use of mismatched reference probes.

2.5. The Calibration Tasks

The intensity contribution due to specific hybridization, S, measures the expression degree on a relative scale. Consequently, the inversion of Eq. 5 with respect to S and the solution of Eq. 6 with respect to E (or [S]) furnishes a starting point to discuss the essential tasks for calibrating probe-level data. It implies the need for estimating:

1. The background contributions, N and O.
2. The sequence-specific affinities $K^S$/A and $K^N$ affecting S and N, respectively; and
3. The degree of saturation in terms of the saturation parameter M for correcting the intensity of each probe.

Microarray intensity data are noisy with non-gaussian frequency distributions. Proper calibration also requires, therefore, the consideration of:

4. Appropriate error models based on the frequency distribution of the intensities and of their specific and nonspecific contributions (Eq. 5).

The special design of GeneChip arrays raises two additional tasks for probe intensity calibration, namely:

5. The aggregation of the individual probe-level expression values of one probe set into one transcript-related expression value; and
6. The proper use of the MM probes to adjust the PM data.

Usually, the expression measure E is given in arbitrary units that are related to the special conditions of a particular hybridization. For comparison with other chips, calibration therefore requires finally:

7. Adjustment of the chip-related expression measures into one common scale, which is, ideally, the absolute scale of transcript abundance in concentration units.
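Tasks 1–3 and 5 can be chained into a toy calculation, assuming for the moment that the background N, the optical offset O, the saturation parameter M, and the probe affinities A are already known. This is a sketch under those assumptions only: the function names are hypothetical, the numbers are invented, and the median of the log values is merely one possible aggregation rule, not the summarization used by any particular method discussed below.

```python
import math
import statistics


def probe_expression(i, n, o, m, a):
    """Invert Eq. 5 for S and Eq. 6 for E (arbitrary units).

    i : measured probe intensity      n : nonspecific contribution N
    o : optical background O          m : saturation parameter M
    a : probe affinity A
    """
    net = i - o
    if not 0.0 <= net < m:
        raise ValueError("net intensity must lie in the interval [0, M)")
    l = net / (1.0 - net / m)      # saturation correction (task 3)
    s = l - n                      # background correction (task 1)
    return s / a                   # affinity correction (task 2)


def probe_set_expression(probe_values):
    """Aggregate probe-level values of one probe set (task 5).

    Uses the median in log scale over positive values (an assumption
    made for this sketch).
    """
    logs = [math.log10(e) for e in probe_values if e > 0.0]
    if not logs:
        raise ValueError("no positive probe-level values")
    return 10.0 ** statistics.median(logs)


if __name__ == "__main__":
    # Three hypothetical probes of one set: (I, N, A), common O and M.
    probes = [(270.0, 50.0, 2.0), (160.0, 40.0, 1.0), (95.0, 30.0, 0.5)]
    values = [probe_expression(i, n, 30.0, 10000.0, a) for i, n, a in probes]
    print([round(v, 1) for v in values], round(probe_set_expression(values), 1))
```

In practice, none of the inputs N, O, M, and A is known beforehand; estimating them from the chip data is precisely what distinguishes the preprocessing methods reviewed in the next section.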

3. Preprocessing: State of the Art

Microarray data calibration is usually called preprocessing because it is performed prior to higher-level statistical analysis, such as the selection of differentially expressed genes. A preprocessing method for GeneChips typically consists of three basic “ingredients”: background correction, normalization, and summarization. The background correction step is typically done in an attempt to remove nonspecific binding and the optical background; the normalization step reduces systematic variation between chips; and the summarization step generates an expression value for each gene/probe set. Background correction typically uses information only from one array, normalization makes a series of arrays comparable, and summarization can be performed alternatively on the basis of single-chip or multi-chip data.

Numerous algorithms exist for the steps dealing with one or several of the calibration tasks specified in the previous section. Many of them can be applied in different combinations and order, providing numerous potential preprocessing methods with apparently little consensus regarding which is the most suitable. In the next sections, we give a short overview of some of the most popular methods and review their performance on the basis of the results of different benchmark studies.

3.1. Linear Approximations

The linear approximation of Eq. 5, $I^P_{p,c} \approx L^P_{p,c} + O_c = S^P_{p,c} + N^P_{p,c} + O_c$, neglects saturation at high transcript concentrations. It is used in basically all popular preprocessing methods: Microarray Suite 5 (MAS5, (12)), robust multiarray analysis (RMA, (13, 14)), gcRMA (15), dChip (16), probe logarithmic intensity error (PLIER, (17)), and variance stabilization normalization (vsn, (7)). The kernel of these methods, except vsn, essentially deals with the baseline correction and summarization steps, which, in principle, can be independently combined with stand-alone normalization algorithms such as quantile (18), global mean (19), loess, or invariant probe set normalizations (16) (see below). In contrast, vsn provides baseline-corrected and normalized probe-level expression values that can be further processed with any stand-alone summarization algorithm, such as median polish (see below). To clarify, by “method,” we mean the complete processing pipeline starting from raw intensity data and ending up with transcript-related expression values.

Available algorithms can be roughly divided into global and probe-specific baseline-correction algorithms (see Table 1 for an overview). RMA and vsn, referring to the former group, correct all probe intensities of a selected microarray by one common background, whereas the other algorithms estimate a specific background value for each probe, partly, using the MM probe intensities. For summarization, all methods, except MAS5, process a series of chips in parallel. The obtained expression values are consequently context sensitive and require reprocessing upon elimination, substitution, or addition of arrays in the respective series. The methods can also differ with respect to the error model used, which fits the data either in linear, log, or glog scale.


Table 1
Comparison of preprocessing methods with respect to background correction, scaling of the expression values, and chip processing. The asterisks indicate adequate and useful approaches with respect to probe-specific effects, error propagation, and single-chip analysis

              vsn      RMA     gcRMA      PLIER      dChip      MAS5       Hook
Background    Global*  Global  Specific*  Specific*  Specific*  Specific*  Specific*
Scale         glog*    log*    log*       glog*      Lin        log*       glog*
# of chips    Multi    Multi   Multi      Multi      Multi      Single*    Single*

In the following, we outline the algorithmic backbone of the selected methods:

3.1.1. Microarray Analysis Suite 5

MAS5 is a single-chip background and summarization method. It performs background correction in two steps: First, the optical background is estimated by dividing the chip surface into a 4 × 4 grid, taking the average over the 2% weakest intensities within each zone and subtracting an interpolated background depending on the x–y position of each probe to account for spatial inhomogeneities. Second, the MM intensities serve as estimates for the N contribution, $S^{\mathrm{MAS5}} = I^{\mathrm{PM}} - I^{\mathrm{MM}*}$, where, however, “bright” MM are substituted by “representative” values $I^{\mathrm{MM}*}$, which transform negative differences ($I^{\mathrm{PM}} - I^{\mathrm{MM}}$) into small positive ones, ($I^{\mathrm{PM}} - I^{\mathrm{MM}*}$) ≥ 0, to obtain strictly positive specific signals for each probe, $S^{\mathrm{MAS5}}$ ≥ 0. Finally, the $S^{\mathrm{MAS5}}$ values are transformed into log scale and summarized for each probe set using one-step Tukey’s biweight median, which effectively removes signals with large median absolute deviations. In addition to the expression measure, MAS5 calculates the detection call, a useful qualitative value, which indicates whether a transcript is reliably detected (present) or not detected (absent). MAS5 uses global normalization as standard, which simply rescales the log intensities of each probe by a chip-specific factor that ensures agreement between all chip averages in the considered series.
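The robust summarization step can be illustrated with a generic one-step Tukey biweight estimator. The sketch below is not the vendor implementation: the function name is hypothetical, and the tuning constant c = 5 and the small offset eps = 0.0001 are assumed values that should be checked against the MAS5 documentation before any practical use.

```python
import statistics


def one_step_tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate.

    values : log-scale probe signals of one probe set
    c      : tuning constant (assumed here to be 5)
    eps    : small offset avoiding division by zero (assumed value)
    """
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    weights = []
    for v in values:
        u = (v - med) / (c * mad + eps)
        # Points farther than c * MAD from the median get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    log_signals = [7.0, 7.1, 6.9, 7.0, 20.0]   # last value is an outlier
    print(sum(log_signals) / len(log_signals))  # plain mean, pulled upward
    print(one_step_tukey_biweight(log_signals)) # robust estimate near 7
```

At least half of the values lie within one median absolute deviation of the median and therefore receive nonzero weight, so the weighted average is always defined.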

3.1.2. dChip

Two alternative versions of this method provide either PM-only or PM–MM estimates of the expression degree using the equations $I^{\mathrm{PM}}_{p,c} = A^{\mathrm{PM}}_p\cdot E_c + B_p + \varepsilon$ or $I^{\mathrm{PM}} - I^{\mathrm{MM}} = A^{\mathrm{PM-MM}}_p\cdot E_c + \varepsilon$ to fit the respective intensities by nonlinear least squares ($\varepsilon$ is the additive error term). The model assumes equal background contributions of the PM and MM on all chips of a series, and includes the optical contribution, $B_p = N_p + O$ with $N_p = N^{\mathrm{PM}}_p = N^{\mathrm{MM}}_p$. The method constrains the squared set average of the affinity to unity, $\langle A_p^2\rangle_{p\in\mathrm{set}} = 1$, with the consequence that the expression degree is obtained as the affinity-weighted average of the specific signal over the probe set, $E_c = \langle S_{p,c}\cdot A_p\rangle_p$, with larger weights given to high-affinity probes. dChip uses invariant-set normalization as standard: This method selects a subset of PM probes with small rank differences of their intensities in a series of arrays, and calculates an intensity-dependent correction curve from this subset, which is then applied to all probes.

3.1.3. Robust Multiarray Analysis

To get strictly positive expression estimates (S ≥ 0), RMA decomposes the frequency distribution of the intensities into an exponential signal distribution ($P^S(S) \sim \exp(-\alpha\cdot S)$) and a gaussian background distribution ($P^B(B) \sim N(B, \mu_c, \sigma_c)$): $P^I(I_p) = P^B(B)\cdot P^S(S_p)$. The distribution parameters $\alpha$, $\mu_c$, and $\sigma_c$ are estimated from the chip data. The background-corrected signal referring to a given intensity is then obtained as the weighted average over the background and signal distributions, with the constraint $S^{\mathrm{RMA}}_p = I_p - B^{\mathrm{RMA}}_c \ge 0$: $B^{\mathrm{RMA}}_c = \mu_c + \sigma_c\cdot(\sigma_c\cdot\alpha - \Delta\phi)$ ($\Delta\phi$ is the difference of normalized error functions). Summarization is performed by the fit of the log-transformed specific data of each probe set in a series of chips to the additive model, $\log(S^{\mathrm{RMA}}_{p,c}) = \log E^{\mathrm{RMA}}_c + \log A^{\mathrm{RMA}}_p + \log\varepsilon$, using median polish to minimize the residual log error. The constraint used, $\mathrm{median}(\log A^{\mathrm{RMA}}_p)_{p\in\mathrm{set}} = 0$, results in expression measures that are roughly related to the median of the log signal, i.e., $\log E_c \sim \mathrm{median}(\log S_{p,c})_{p\in\mathrm{set}}$. RMA uses quantile normalization as standard. This algorithm transforms the different intensity distributions of a chip series into one “average” distribution.
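Both multi-chip steps named above, median polish and quantile normalization, are simple enough to sketch in plain code. The following is a generic textbook version written for illustration, not the code of any released RMA implementation: the function names are hypothetical, ties in the quantile step are ignored, and the fixed number of polish iterations is an assumption.

```python
import statistics


def quantile_normalize(chips):
    """Give every chip the same ("average") intensity distribution.

    chips : list of equally long lists, one list of intensities per chip.
    Ties are not treated specially in this sketch.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value over chips.
    reference = [sum(col) / len(chips) for col in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda k: chip[k])
        out = [0.0] * n
        for rank, k in enumerate(order):
            out[k] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(log_s, n_iter=10):
    """Fit log S[p][c] = log E[c] + log A[p] + residual by median polish.

    Returns (log_e, log_a) with median(log_a) = 0.
    """
    n_p, n_c = len(log_s), len(log_s[0])
    resid = [row[:] for row in log_s]
    overall, probe_eff, chip_eff = 0.0, [0.0] * n_p, [0.0] * n_c
    for _ in range(n_iter):
        for p in range(n_p):                       # sweep probe (row) medians
            m = statistics.median(resid[p])
            probe_eff[p] += m
            resid[p] = [r - m for r in resid[p]]
        m = statistics.median(chip_eff)
        overall += m
        chip_eff = [e - m for e in chip_eff]
        for c in range(n_c):                       # sweep chip (column) medians
            m = statistics.median([resid[p][c] for p in range(n_p)])
            chip_eff[c] += m
            for p in range(n_p):
                resid[p][c] -= m
        m = statistics.median(probe_eff)
        overall += m
        probe_eff = [e - m for e in probe_eff]
    return [overall + e for e in chip_eff], probe_eff


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    print(median_polish([[4.0, 6.0], [5.0, 7.0], [6.0, 8.0]]))
```

Because both functions use all chips of a series at once, their output for a given chip changes when chips are added or removed, which is the context sensitivity noted for the multi-chip methods.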

3.1.4. gcRMA

This method is essentially identical to the RMA method, except for the background correction step. Here gcRMA accounts for the sequence specificity of nonspecific hybridization using the intensity of pseudo-MM as “representatives” taken from a subset of the MM possessing the same GC content as the PM probe of interest. Then the logarithm of the specific signal, $\log S^{\mathrm{gcRMA}}$, is calculated as the weighted average over the gaussian background distribution and a signal distribution following a power law. As in RMA, the center of the background distribution is shrunken with respect to that of the pseudo-MM due to correlations with the PM, i.e., $B^{\mathrm{gcRMA}}_{p,c} = \exp\left(\rho\cdot\ln I^{\mathrm{MM}}_p + (1 - \rho)\cdot\mu_c\right)$ ($\rho$ is the coefficient of correlation between the PM and the MM data and $\mu_c$ is the center of the MM distribution).

3.1.5. Variance Stabilization Normalization

The vsn approach shifts and rescales the intensity of a series of chips to transform their intensity-dependent heteroscedastic error distribution into an intensity-independent homoscedastic distribution. Instead of the logarithm, it uses the arcsinh function as a special case of the glog transformation, arcsinh(x/2) = glog(x) with c = 4 (see Subheading 2.3), to get the background-corrected signal, $\mathrm{arcsinh}(S^{\mathrm{vsn}}_{p,c}) = \mathrm{arcsinh}\left((I_{p,c} - B^{\mathrm{vsn}}_c)/F^{\mathrm{vsn}}_c\right)$. The chip-specific parameters $B^{\mathrm{vsn}}_c$ and $F^{\mathrm{vsn}}_c$ are obtained via maximum likelihood optimization for a subset of virtually invariant genes in a series of chips. The arcsinh-transformed probe-level expression values can then be summarized using, e.g., median polish, according to $\mathrm{arcsinh}(S^{\mathrm{vsn}}_{p,c}) \approx \log A^{\mathrm{vsn}}_p + \log E^{\mathrm{vsn}}_c + \log\varepsilon$ (for $S^{\mathrm{vsn}}_{p,c} > 1$).

3.1.6. Probe Logarithmic Intensity Error

This method uses the MM probes for background correction and the glog transformation for appropriate error handling. It fits the equation $S^{\mathrm{PLIER}}_{p,c} = A^{\mathrm{PLIER}}_p\cdot E^{\mathrm{PLIER}}_c = \varepsilon\cdot I^{\mathrm{PM}}_{p,c} - \varepsilon^{-1}\cdot I^{\mathrm{MM}}_{p,c}$ using an outlier-resistant nonlinear least squares algorithm for minimizing the error term $\log(\varepsilon) = \mathrm{glog}(S^{\mathrm{PLIER}}) - \mathrm{glog}(I^{\mathrm{PM}} - I^{\mathrm{MM}})$ with $c = 4\cdot I^{\mathrm{PM}}\cdot I^{\mathrm{MM}}$. The fit returns strictly positive signals $S^{\mathrm{PLIER}}$ ≥ 0 for all nonnegative intensities independently of the relation between the PM and MM values, i.e., including also bright MM, $I^{\mathrm{MM}} > I^{\mathrm{PM}}$.

For the sake of completeness, we note the existence of alternative and partly interesting approaches such as the positional-dependent nearest neighbor (PDNN) method, which uses a nonlinear, sequence-specific model (20); TM, which is based in a very simple but effective fashion on the trimmed mean of PM–MM differences (21); factor analysis for robust microarray summarization (FARMS), a probe-specific RMA-like, multivariate approach (22); and a method based on strict signal deconvolution based expression detection (23).
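The PLIER error term as written above has a convenient property that is easy to verify numerically: with c = 4·I_PM·I_MM, glog(I_PM − I_MM) equals log(I_PM), so the term stays defined for bright MM. The following check is only a sketch of that identity under the glog definition of Subheading 2.3; the function names and intensities are hypothetical, and the code is not the PLIER fitting algorithm itself.

```python
import math


def glog(x, c):
    """Generalized logarithm: log(0.5 * (x + sqrt(x^2 + c)))."""
    return math.log(0.5 * (x + math.sqrt(x * x + c)))


def plier_log_error(signal, i_pm, i_mm):
    """log(eps) = glog(S) - glog(I_PM - I_MM) with c = 4 * I_PM * I_MM."""
    c = 4.0 * i_pm * i_mm
    return glog(signal, c) - glog(i_pm - i_mm, c)


if __name__ == "__main__":
    i_pm, i_mm = 100.0, 300.0          # a "bright MM" probe pair
    c = 4.0 * i_pm * i_mm
    # glog of the negative difference is finite and equals log(I_PM).
    print(glog(i_pm - i_mm, c), math.log(i_pm))
    # A signal generated with eps = 2 returns log(2) as the error term.
    eps = 2.0
    signal = eps * i_pm - i_mm / eps
    print(plier_log_error(signal, i_pm, i_mm), math.log(eps))
```

The check confirms that the model equation and the error term are mutually consistent: inserting S = ε·I_PM − I_MM/ε into the error term returns log ε for any positive ε.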

3.2. Benchmark Criteria and Calibration Data

In the preceding section, we briefly outlined some of the most popular preprocessing methods. The diversity of competing algorithmic approaches implies profound effects on the derived expression measures, with consequences for subsequent higher-level statistical analysis. The correct choice of a method might depend on the scientific question being asked and on the particular experimental design and microarray data structure. Here, benchmark studies might permit users to judge each method using scientifically meaningful summaries. Two basic benchmark criteria, precision and accuracy, are essential for judging calibration methods. The accuracy specifies the systematic bias of the method in terms of the deviation of the expression estimates from their true (usually unknown) values. In turn, the precision characterizes the resolution (or “uncertainty”) of the expression estimates. It is inversely related to their variability in replicate measurements. Different test scenarios are used for calibration/benchmark studies to model different experimental situations: In the Latin-square spiked-in experiment, the concentrations of a small set of ~15–40 transcripts are varied in definite concentration steps in a hybridization solution containing a cell extract as a constant background (24). These calibration data are suited to assess the concentration dependence of the intensity and the
performance of the background correction and summarization steps. The small number of variable transcripts, affecting less than 1% of the available probe sets, and the Latin-square design of the experiment, which cyclically permutes the spikes among the chips, give rise to a rather small inter-chip variability. This makes the data less than optimal for judging normalization algorithms. In contrast, in the golden spike experiment, a relatively high number of transcripts referring to ~25% of all probe sets are hybridized on the chips without special background addition (25). The concentration of approximately one half of these spikes is varied in a “treatment versus control” design. Experiments of the golden spike type might help to develop new, improved normalization algorithms because the basic assumptions of global normalization methods are violated in many expression studies. In particular, normalization methods such as quantile and global mean normalization presume that only a small fraction of genes is differentially expressed, and that there are roughly equal numbers of upregulated and downregulated genes. These assumptions are rather restrictive and prevent the exploration of global changes of the expression level (see below). In dilution experiments, the total amount of RNA in the hybridization solution is changed in definite steps (24). In the closely related mixing experiments, two RNA extracts are mixed in different proportions, leaving the total amount of RNA constant (26). These types of experiments provide a good basis for studying the effect of the mutual interference between different transcript fractions in the hybridization solution on the performance of preprocessing methods. Another approach uses quantitative real-time PCR (27) as the gold standard method of measuring gene expression in tissue samples for the evaluation of microarray calibration.
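The two benchmark criteria can be made concrete for spiked-in data. The sketch below is only an illustration under stated assumptions: accuracy is summarized by the least-squares slope of the mean log expression against the log spiked-in concentration (a slope of one means unbiased differential expression), and precision by the mean standard deviation over replicates. The function names and the numbers are hypothetical.

```python
import statistics


def least_squares_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def spike_in_benchmark(log_conc, log_expr_replicates):
    """Return (accuracy slope, mean replicate standard deviation)."""
    means = [statistics.mean(reps) for reps in log_expr_replicates]
    slope = least_squares_slope(log_conc, means)
    replicate_sd = statistics.mean(
        [statistics.stdev(reps) for reps in log_expr_replicates]
    )
    return slope, replicate_sd


# Hypothetical log10 concentrations and duplicate log10 expression estimates.
log_conc = [0.0, 1.0, 2.0, 3.0]
log_expr = [[0.0, 0.2], [0.5, 0.7], [1.0, 1.2], [1.5, 1.7]]
slope, sd = spike_in_benchmark(log_conc, log_expr)
print(slope, sd)
```

In this made-up example the estimates are fairly precise but strongly compressed (slope of about 0.5), which is the kind of trade-off the benchmark studies discussed here are designed to expose.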
Alternative studies analyze statistical characteristics, such as the false discovery rate (21), correlations between genes (28, 29), or sources of variation between samples (30) to validate preprocessing methods on “real” data sets collected in a biomedical context. The practical relevance and consistency of the criteria used must be checked case by case: For example, correlation-based criteria favor methods that produce, on average, zero correlations between randomly selected genes (28). Such criteria prefer methods that remove biases but, unfortunately, also the “valuable” expression signals. In addition, computer simulations are an interesting option for comparing preprocessing methods. However, there is the problem of avoiding inherent circularities, e.g., if the data model relies on assumptions used in the analysis algorithm. For example, it is not surprising that methods ignoring probe-specific background levels perform well on data synthesized without probe-specific background contributions (31). Therefore, results from simulation studies must be critically assessed in the context of the actual simulation design.

3.3. Which Method is the Best?

Numerous studies have assessed preprocessing methods in a wide range of conditions to benchmark their performance. In a general sense, there is apparently no “best” method that clearly outperforms the others under all circumstances. Moreover, all of these methods have been proven in numerous applications to provide reasonable results. For example, in patient-cohort studies, researchers typically select sets of genes that are differentially expressed between certain known conditions (supervised approach) or they attempt to detect biological relations between samples or genes by grouping them according to their expression profiles (unsupervised approach). Often the goal is to obtain predictors for prognostically relevant categories. It has been argued that the choice of the preprocessing method has less influence on the final outcome, especially in studies based on large numbers of arrays, whereas it can have important effects on the results of smaller studies (29). The existence of a certain minimum number of differentially expressed genes is obviously sufficient for predictor selection without the need of exact quantification of the observed changes. Clearly, the reliability of such analyses will improve with the number of samples and/or with the significance level for detecting differential expression. On the other hand, genomic regulation is governed by the specificity of molecular interactions between genomic, transcriptomic, and proteomic factors, and their mutual relations and levels. Particularly, the estimation of transcript levels on an absolute scale using microarrays is a challenging task that becomes necessary for exploring mechanisms of gene regulation. For these issues, exact calibration and the choice of appropriate methods is an essential prerequisite. 
Calibration data reproducing the basic concentration dependence of the intensity without complex inter-chip variations of the hybridizations clearly show that the nonspecific hybridization background correction is the main factor that explains differences between the methods (25, 27, 28, 32). Global background correction algorithms such as vsn and RMA obviously underestimate the level of nonspecific hybridization, leading to attenuated estimates of differential expression with strong negative biases, especially at low expression levels. Methods with MM corrections such as MAS5, PLIER, and dChip outperform methods discarding MM data at medium and higher expression levels, providing much better accuracies. On the other hand, MM corrections give rise to highly variable expression estimates at low intensity levels, with partly high rates of false-positive detections. The correct balance between accuracy and precision depends on the signal intensity, with the problem that the gain in precision at low intensities must be paid for by a penalty in accuracy and vice versa. It seems that RMA and vsn, owing to their much lower variability, estimate the expression of low-abundance genes in a biased but very precise manner. Minimizing variability for biased estimates, however, produces a dangerous sense of confidence in potentially incorrect data. In contrast, a higher variability at low intensities at least circumvents such incorrect conclusions as long as the variability exceeds the bias. Here, the sequence-based background adjustment of gcRMA emerges as the method that may come closest to the optimum across the whole intensity range (25, 27, 32). Generally, one has to keep in mind that the precision of expression measures can be improved by replicate measurements and also by further developing statistical concepts, e.g., by explicit consideration of the measurement error derived from the hybridization mechanism in a probe-dependent fashion (see above). In contrast, the accuracy of calibration methods cannot be improved by replicates. It requires an understanding of the essential factors that govern microarray hybridization and their implementation in feasible algorithms. All considered methods systematically underestimate the expression level at high RNA concentrations because they neglect saturation. Here, nonlinear hybridization models such as the two-species Langmuir isotherm provide a more adequate concept to account for this effect. Other important challenges for the improvement of calibration methods are the need for better probe-specific background corrections, for normalization algorithms that conserve differential expression between the samples on an absolute scale, and also for better affinity corrections for more precise data. Note that the most highly expressed genes are not necessarily the key players in genomic regulation.
Hence, better background and affinity corrections should increase the resolution of the method to also detect relatively small changes of the expression level.
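The statement that replicates improve precision but not accuracy can be illustrated with a small calculation. The sketch assumes independent replicate errors, so that the root-mean-square error of the mean of n replicates is sqrt(bias² + sd²/n); the two "methods" and their numbers are hypothetical.

```python
import math


def rmse_of_replicate_mean(bias, sd, n_replicates):
    """RMSE of the mean of n independent replicates with a fixed bias."""
    return math.sqrt(bias ** 2 + sd ** 2 / n_replicates)


# Hypothetical log-scale errors: a precise but biased method versus an
# unbiased but noisy method.
for n in (1, 4, 16):
    biased_precise = rmse_of_replicate_mean(bias=0.5, sd=0.1, n_replicates=n)
    unbiased_noisy = rmse_of_replicate_mean(bias=0.0, sd=0.6, n_replicates=n)
    print(n, round(biased_precise, 3), round(unbiased_noisy, 3))
```

With increasing n the error of the unbiased method keeps shrinking, whereas the error of the biased method levels off at the bias itself.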

4. Hook Calibration: Toward Absolute Expression Measures

Our hook calibration method analyzes the intensity data of a given GeneChip microarray in terms of the two-species Langmuir isotherm (Eq. 5). The method uses the MM probe intensities as a reference for the PM over the whole concentration range to discern typical hybridization regimes, namely those of predominant nonspecific binding (N), mixed hybridization (mix), predominant specific binding (S), saturation (sat), and asymptotic binding (as), as illustrated in Fig. 2. The intensity data are aggregated into one mean hybridization characteristic called a hook curve because of its characteristic shape, which is predicted by the Langmuir model (see Figs. 1 and 2). The method uses the position-dependent nearest-neighbor model to account for the probe-specific binding affinity for specific and nonspecific hybridization. It corrects the probe intensities for probe-specific background, affinity, and saturation limit. Note that our model differs from that of Zhang et al. (20), who restrict the positional dependence by a common weight function for the nearest-neighbor free energy terms. Our position-dependent terms are freely adjusted (see Eq. 10 below). The hook method is a single-chip approach, which provides essential hybridization summaries such as the fraction of not-expressed probe sets (%N), the mean background intensity ($N_c^{PM}$), and the PM/MM sensitivity gain on specific binding ($s_c$).

Fig. 2. The hook method. The raw intensity data of one GeneChip microarray are plotted into the D = log(PM/MM) vs. S = 1/2 log(PM ∙ MM) coordinate system and smoothed to get the raw hook curve. Then, probes from the N and S hybridization regimes are used to calculate four sets of 16 position-dependent nearest-neighbor sensitivity profiles of the affinity model (nonspecific and specific for the PM and MM each). After affinity correction of the intensities, one obtains the corrected hook curve. It is used to get improved sensitivity profiles in a second iteration step. The mix, S, and sat ranges of the corrected hook are well fitted using the two-species Langmuir hybridization model. The dimensions of the hook, its width and height, provide hybridization characteristics of the chip such as the binding strength of nonspecific hybridization and the mean PM/MM gain of the binding affinity, respectively.

4.1. Algorithm

The algorithm consists of the following basic steps (see also Fig. 2):

1. The intensity data are corrected for the optical background using the Affymetrix zone algorithm (19).

2. The PM and MM probe intensity data are plotted into a special type of M–A plot, where the ordinate value is the log difference, $D = \log I^{PM} - \log I^{MM}$, and the abscissa value is the set-averaged log sum, $S = 0.5\,\langle \log I^{PM} + \log I^{MM} \rangle_{set}$.

3. The data are smoothed using a sliding window over ~100 probe sets along the abscissa. The obtained D vs. S relationship is called a raw hook curve because of its characteristic shape. It divides into four characteristic parts: the N range referring to the relatively flat starting region, the subsequent mix range of positive slope, the S range near the maximum, and the sat range with a negative slope beyond the maximum.

4. The intensities of the probes from the N and S ranges are used to fit the position-dependent nearest-neighbor model. It decomposes the log intensity variation about its set average into a sum of additive sensitivity terms, $\delta\varepsilon^{P,h}_{k}(BB')_{p}$, where BB′ is the couple of adjacent bases at positions k and k + 1 of the probe sequence (k = 1, …, 24; BB′ = AA, AT, …, CC). The model is parameterized separately for nonspecific and specific binding (h = N, S) of the PM and MM (P = PM, MM), respectively (10, 11), thus providing four sets of 16 BB′ sensitivity profiles, which, in turn, are used to calculate the affinity correction in a sequence-specific fashion:

$$\log A^{P,h}_{p,c} = \sum_{k=1}^{24} \delta\varepsilon^{P,h}_{k,c}(BB')_{p}. \qquad (10)$$

5. Next, the probe intensities are corrected for sequence-specific affinities using the model adjusted in the previous step. In the mix range, we use a weighted superposition of the N and S contributions,

$$I^{P,corr}_{p,c} = I^{P}_{p,c} \cdot 10^{-\left(x^{P,S}_{p,c} \cdot \log A^{P,S}_{p,c} + (1 - x^{P,S}_{p,c}) \cdot \log A^{P,N}_{p,c}\right)},$$

where $x^{P,S}_{p,c} = \max(1 - N^{P}_{c} \cdot A^{P,N}_{p,c} / L^{P}_{p,c},\ 0)$ is the fraction of specific hybridization contributing to the intensity.

6. The affinity-corrected intensities are used to get the corrected version of the hook curve with the coordinates $S^{hook}$ and $D^{hook}$ and an improved set of sensitivity profiles by reiteration of steps 2–5. Note the significant differences between the raw and the corrected hooks: Affinity correction clearly reduces the width of the N range and also the scattering of the data in the remaining hybridization regimes.

7. The mix, S, and sat ranges of the corrected hook curve are fitted using the two-species Langmuir isotherm (see Subheading 4.2). The fit and the separate analysis of the N range provide chip characteristics such as the mean background level ($N_c^{PM}$), the saturation intensity ($M_c$), the width and correlation coefficient of the background distribution ($\sigma$ and $\rho$), and the mean PM/MM sensitivity gains ($n_c$ and $s_c$) that are used for calibration of the probe-level intensity data in the next step.

8. The probe intensities are linearized using Eq. 7. Then, the probe-level expression degree is estimated as the weighted glog average of the total signal minus the respective nonspecific background contribution according to Eq. 5:

$$\mathrm{glog}(S^{PM}_{p,c}) = \int \mathcal{N}(N^{PM}_{c}, \sigma^{N}) \cdot \mathrm{glog}\left(L^{PM}_{p,c} - 10^{x} \cdot N^{PM}_{c} \cdot A^{PM,N}_{p,c}\right) dx, \qquad (11)$$

where $\mathcal{N}(N^{PM}_{c}, \sigma^{N})$ is the Gaussian distribution of the nonspecific, affinity-corrected PM signal. Alternatively, we also calculate a PM–MM version by replacing the glog term in the integral of Eq. 11 with

$$\mathrm{glog}\left((L^{PM}_{p,c} - L^{MM}_{p,c}) - 10^{x} \cdot N^{PM}_{c} \left(A^{PM,N}_{p,c} - (n_c^{-1} \cdot N^{PM}_{c})^{-\rho} \cdot A^{MM,N}_{p,c} \cdot 10^{(\rho - 1) \cdot x}\right)\right).$$

This approach uses the bivariate marginal distribution of the PM–MM background, where $\rho$ denotes the coefficient of correlation between the PM and MM background intensity values.

9. The probe-level specific signals are affinity corrected according to $E^{PM}_{p,c} = S^{PM}_{p,c} \cdot (A^{PM,S}_{p,c})^{-1}$ for the PM-only and $E^{PM-MM}_{p,c} = S^{PM-MM}_{p,c} \cdot (A^{PM,S}_{p,c} - s_c^{-1} \cdot A^{MM,S}_{p,c})^{-1}$ for the PM–MM estimates, and then summarized by means of the Tukey biweight median to get robust transcript-level expression estimates.

4.2. Natural Metrics of Expression Values

The hook-like shape of the D vs. S dependence can be reproduced using the two-species Langmuir isotherm (see Fig. 1). First, we applied Eq. 5 separately to the intensities of the PM and MM and then transformed the predicted intensities into D vs. S coordinates. The obtained theoretical function fits the experimental data to a good approximation (Fig. 2). The hook curve considers all probes of a given chip. It consequently summarizes the properties of a particular hybridization into a sort of mean binding isotherm. The hook curve is divided into five characteristic ranges, which can be assigned to different hybridization regimes (see step 3 of Subheading 4.1 and also Fig. 2): In the N regime, the probes hybridize almost exclusively nonspecifically owing to the absence or low concentrations of specific transcripts. In the subsequent mix regime, both specific and nonspecific transcripts significantly contribute to the observed intensity of the probes. In the S regime, the probes predominantly hybridize with specific transcripts. In the sat regime, the probes become progressively saturated with bound transcripts. This effect first and foremost affects the PM owing to their higher specific-binding constant. As a consequence, the concentration dependence of the intensity progressively becomes nonlinear and D starts to decrease. In the “as” range, the intensities of the PM and MM reach their asymptotic values owing to complete saturation. In typical hybridizations, this region is usually not reached. Note that the D vs. S coordinates are simply linear combinations of the logarithms of the PM and MM intensities. Hence, the hook curve can be interpreted as a special representation of the binding isotherm where the explicit dependence of the probe intensities on the (usually unknown) transcript concentrations is replaced by the (experimentally available) relation between the PM and the MM probe intensities. Here, the MM probes serve as an internal reference subjected essentially to the same hybridization law as the PM, however, with modified characteristics. In particular, one expects to find different binding constants for specific and, possibly, also nonspecific binding. Let us denote the respective PM/MM ratios by $s_c \equiv K_c^{PM,S} / K_c^{MM,S}$ and $n_c \equiv K_c^{PM,N} / K_c^{MM,N}$, respectively. Other hybridization characteristics are the mean background intensity of the PM due to nonspecific binding, $N_c^{PM}$, and the maximum intensity, $M_c$, referring to completely saturated probe spots. The coordinates of the start and end points of the hook curve, and, to a good approximation, also its maximum, can be directly related to basic hybridization characteristics.
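The hybridization regimes described above can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' implementation: Eq. 5 is not restated in this section, so the isotherm is assumed here to have the simple form I = M·X/(1 + X) with X the sum of the specific and nonspecific binding strengths, the optical background is ignored, and the MM binding strengths are obtained by dividing the PM values by the gains s and n. All function names and parameter values are hypothetical.

```python
import math


def langmuir_intensity(x_specific, x_nonspecific, m_sat):
    """Assumed two-species Langmuir isotherm: I = M * X / (1 + X)."""
    x_total = x_specific + x_nonspecific
    return m_sat * x_total / (1.0 + x_total)


def hook_coordinates(i_pm, i_mm):
    """D = log(PM/MM) and S = 1/2 log(PM * MM) on a log10 scale."""
    d = math.log10(i_pm) - math.log10(i_mm)
    s = 0.5 * (math.log10(i_pm) + math.log10(i_mm))
    return d, s


def theoretical_hook(x_n, s_gain, n_gain, m_sat, log_xs_grid):
    """(D, S) points for increasing specific binding strength of the PM."""
    points = []
    for log_xs in log_xs_grid:
        x_s = 10.0 ** log_xs
        i_pm = langmuir_intensity(x_s, x_n, m_sat)
        i_mm = langmuir_intensity(x_s / s_gain, x_n / n_gain, m_sat)
        points.append(hook_coordinates(i_pm, i_mm))
    return points


# Hypothetical chip characteristics, loosely in the range of Table 2.
grid = [-6.0 + 0.25 * i for i in range(41)]
hook = theoretical_hook(x_n=1e-3, s_gain=10.0, n_gain=1.2,
                        m_sat=10.0 ** 4.5, log_xs_grid=grid)
d_values = [d for d, _ in hook]
i_max = d_values.index(max(d_values))
print(hook[0], hook[i_max], hook[-1])
```

Under these assumptions the curve starts near D = log n in the N range, rises through the mix range to a maximum, and falls back toward D = 0 and S = log M as both probes saturate.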
For example, the S coordinates of the start and end points, $S(0) \approx \log(N_c^{PM}) - \tfrac{1}{2}\log(n_c)$ and $S(\infty) \approx \log(M_c)$, estimate the mean nonspecific background and the saturation intensity, respectively. The D coordinates of the start point and of the maximum, $D^{s}(0) \approx \log(n_c)$ and $D^{max} \approx \log(s_c) + \log(n_c)$, are measures of the mean log difference between the binding constants of the PM and MM for nonspecific and specific binding, respectively. Making use of these data, one obtains the “width” and the “height” of the hook curve, which estimate the mean binding strength of nonspecific hybridization, $S^{as}(\infty) - S^{s}(0) \approx \log(X_c^{PM,N}) = -\log(K_c^{PM,N} \cdot [N])$, and the mean affinity gain for specific binding of the PM relative to the MM, $D^{max} - D^{s}(0) \approx \log(s_c)$, respectively. The binding strength, $X_c^{PM,N}$, is a dimensionless measure of the concentration in units of the respective binding constant. A value of unity refers to a surface coverage of Θ = 0.5 in the absence of specific transcripts. The mean affinity gain is directly related to the free energy difference due to the replacement of the complementary Watson–Crick pairing with a mismatched base pairing in the respective probe/transcript duplexes (11). In summary, the hook curve spans a sort of natural metrics system for the expression estimates. It reflects essential hybridization characteristics in terms of its geometric dimensions: width, height, and “start” coordinates.

4.3. Examples: Chip Characteristics

Figure 3 shows a collection of representative hook curves taken from six hybridizations of human genome chips of different generations, a plant chip (Arabidopsis thaliana chip ATH1-121501), and alternative hybridizations with cRNA and cDNA. Along the chip generations, the spot size of the probes decreases from 20 µm (U95), to 18 µm (U133A and U133Av2), and to 11 µm (U133-plus2). The reduction of spot size has enabled the number of probe sets per chip to be increased from 16,000 to 22,000, and to 54,000, respectively (33, 34). In addition, this development is accompanied by modifications of the reagent kits and the scanning technique. Importantly, probe selection has also been improved by applying more sophisticated genomic and thermodynamic criteria, especially after the U95 generation. The different shapes of the uncorrected hook curves of the U95 and U133 chips, particularly the broader N range of the former, can be explained by the partially suboptimal probe quality of the U95 generation, which contains a relatively high number of weak-affinity probes. For the U133 series, the N range narrows considerably, essentially due to the better quality of the probes. It is important to note that our affinity correction levels out this difference to a large extent, providing corrected hook curves of very similar shape for the U95 and U133 chips. The width of the fitted hook curves estimates the binding strength of the nonspecific background in “intrinsic” units of the respective binding constant (see above). A wider hook curve is equivalent to a lower level of nonspecific background and, thus, to an increased dynamic measurement range of the probe spots. The widths of the fits shown in Fig. 3 indicate that this range slightly increases with the chip generations (see also Table 2).
In general, microarray technology takes advantage of either of two types of chemical entities as the labeled target, cRNA or cDNA, which are considered to be virtually equivalent for the purpose of expression analysis. Here we compare both options to illustrate the effect of the two binding “chemistries” on the chip characteristics. The substitution of cRNA by cDNA gives rise to essentially two effects (see Fig. 3): First, it increases the dynamic range by reducing the background level, and, second, it reduces the variability of the uncorrected background intensity. Of the two options, the hook curve of the DNA hybridization is improved to a much lesser extent by affinity correction. The higher nonspecific background level and variability of the RNA hybridization were attributed to relatively stable mismatched “G·u wobble” base pairings in the RNA/DNA duplexes, which give rise to less specific binding compared with DNA/DNA hybridizations without such stable mismatch pairings (24).
To generalize the discussed single-chip-related results, we collect mean values of these characteristics over experimental series taken from different studies dealing with calibration issues, biological samples, cancer specimens, different chip generations, and species (Table 2). In essence, most of the chip characteristics provide relatively similar values for the different series, despite the very heterogeneous origin of the data. The maximum intensity and the optical and nonspecific background levels vary roughly over three orders of magnitude. The PM affinity gain parameter for specific hybridization shows that the central mismatch of the MM causes, on average, a tenfold (s ~ 7–11) increased affinity of the PM compared with that of the MM. In contrast, for nonspecific binding, one expects, on average, the same affinity for the PM and MM. The respective PM/MM gain parameter, however, indicates a small but significantly increased PM affinity, n ~ 1.05–1.25. We tentatively attribute this effect to false-positive detections in the N range, i.e., to a certain amount of specific hybridization among the absent probes (see below). The relatively narrow distributions of hybridization characteristics reflect the common physical–chemical basics of the method, for example, the oligonucleotide density and size of the probe spots, the common MM probe design, and hybridization conditions.
The position-dependent sensitivity terms, δε (Eq. 10), represent another type of chip characteristic because they are used to adjust the intensities of each microarray. Figure 4 shows the sensitivity profiles of the MM probes for three of the chips taken from Fig. 3; note the similar profiles of the two selected RNA hybridizations. Generally, one observes C>G>T>A for most of the sequence positions. In contrast, for the DNA hybridization, this order changes to G>C>A≈T. The position-dependent sensitivity terms, δε, are directly related to the binding strength of base pairings in the probe/target duplexes (10, 11, 35), which are basically independent of a particular hybridization but change with the chemical entity. In Fig. 4, we aggregated the 16 nearest-neighbor profiles into four single-base profiles for the sake of clarity. In addition, the maximum and minimum NN profiles are shown. For the RNA hybridizations, for example, adjacent CC provide the strongest intensity increment, whereas for DNA hybridization, one gets GG and CG. Note also the “dents” in the middle of the specific MM profiles. They reflect the effect of the mismatches on the binding strength with “molecular resolution.”

Fig. 3. Hook curves of six different microarray hybridizations: raw hook (lower panel), affinity-corrected hook and number distribution (middle), and the fit of the specific part of the hook (upper panel) for human genome GeneChips of different generations (upper row of figures, Affymetrix HG-U95, HG-U133, and HG-U133_plus2) taken from the spiked-in data sets (37) and mixing series (26); and of a plant genome (lower row; Arabidopsis thaliana, ATH1-121501 array) and of hybridizations with cRNA and cDNA (24). The vertical dotted line indicates the “break” of the hook curve that was used to estimate the number of “absent” probe sets given in percent for each hybridization in the figures. See also Table 2 for the mean hybridization characteristics of the respective experimental series.

Table 2 Mean hybridization characteristics of GeneChips estimated from different experimental series. The values are given as MED ± MAD, where MED is the median and MAD the median absolute deviation calculated from the respective values over the experimental series in logarithmic scale (log10)

| Data set | Ref. | Affymetrix chip (# of chips) | Optical background, log O | Nonspecific background, log N | Saturation intensity, log M | N binding strength, log X^N | PM/MM gain (s), log s | PM/MM gain (n), log n | Standard deviation of N-BG, σ^N |
|---|---|---|---|---|---|---|---|---|---|
| Calibration data sets | | | | | | | | | |
| GeneLogic dilution | (36) | HG-U95A (74) | 1.74 ± 0.13 | 1.54 ± 0.22 | 4.27 ± 0.15 | 2.75 ± 0.20 | 1.00 ± 0.05 | 0.10 ± 0.015 | 0.28 ± 0.008 |
| Affymetrix spiked-in | (37) | HG-U95A (59) | 1.93 ± 0.06 | 1.70 ± 0.05 | 4.14 ± 0.09 | 2.44 ± 0.10 | 0.89 ± 0.04 | 0.07 ± 0.006 | 0.30 ± 0.008 |
| Affymetrix spiked-in | (37) | HG-U133A (42) | 1.47 ± 0.02 | 1.54 ± 0.05 | 4.20 ± 0.04 | 2.66 ± 0.05 | 0.85 ± 0.04 | 0.08 ± 0.005 | 0.29 ± 0.003 |
| Barnes dilution | (26) | HG-U133_plus2 (12) | 1.62 ± 0.01 | 1.47 ± 0.08 | 4.48 ± 0.03 | 3.01 ± 0.10 | 1.02 ± 0.03 | 0.08 ± 0.005 | 0.32 ± 0.0013 |
| Eklund spiked-in (cRNA) | (24) | HG-U133Av2 (6) | 1.63 ± 0.01 | 1.55 ± 0.14 | 4.51 ± 0.13 | 2.96 ± 0.15 | 1.08 ± 0.04 | 0.07 ± 0.01 | 0.36 ± 0.04 |
| Eklund spiked-in (cDNA) | (24) | HG-U133Av2 (6) | 1.58 ± 0.02 | 1.06 ± 0.01 | 4.22 ± 0.03 | 3.16 ± 0.02 | 0.93 ± 0.03 | 0.04 ± 0.002 | 0.29 ± 0.003 |
| Patient cohort and cell line studies | | | | | | | | | |
| Frontal brain | (38) | HG-U95Av2 (6) | 1.81 ± 0.13 | 1.87 ± 0.18 | 4.80 ± 0.28 | 2.93 ± 0.25 | 0.91 ± 0.02 | 0.10 ± 0.01 | 0.30 ± 0.006 |
| Malignant lymphomas | (39) | HG-U133A (221) | 1.84 ± 0.06 | 1.96 ± 0.15 | 4.49 ± 0.10 | 2.43 ± 0.15 | 0.85 ± 0.06 | 0.09 ± 0.01 | 0.35 ± 0.03 |
| Colon cancer | (40) | HG-U133Av2 (20) | 1.88 ± 0.13 | 1.62 ± 0.12 | 4.63 ± 0.04 | 3.01 ± 0.12 | 0.96 ± 0.05 | 0.06 ± 0.01 | 0.31 ± 0.01 |
| Lymphocytic leukemia | (41) | HG-U133_plus2 (20) | 1.60 ± 0.05 | 1.29 ± 0.08 | 4.32 ± 0.14 | 3.03 ± 0.15 | 0.87 ± 0.04 | 0.06 ± 0.006 | 0.30 ± 0.009 |
| Renal carcinoma | (42) | HG-U133_plus2 (47) | 1.80 ± 0.11 | 1.99 ± 0.09 | 4.73 ± 0.09 | 2.72 ± 0.10 | 0.82 ± 0.03 | 0.10 ± 0.006 | 0.38 ± 0.02 |
| Mouse | (43) | MOE430A (33) | 1.86 ± 0.05 | 1.55 ± 0.11 | 4.42 ± 0.12 | 2.87 ± 0.11 | 0.98 ± 0.03 | 0.06 ± 0.01 | 0.29 ± 0.008 |
| Arabidopsis | (44) | ATH1-121501 (16) | 1.84 ± 0.09 | 1.41 ± 0.15 | 4.46 ± 0.06 | 3.01 ± 0.15 | 0.99 ± 0.06 | 0.03 ± 0.007 | 0.26 ± 0.004 |
| Yeast | (45) | Yeast-2 (41) | 1.85 ± 0.06 | 1.44 ± 0.07 | 4.60 ± 0.07 | 3.16 ± 0.10 | 1.05 ± 0.04 | 0.002 ± 0.03 | 0.31 ± 0.03 |
| Rice | (46) | Rice (25) | 1.62 ± 0.03 | 1.31 ± 0.06 | 4.51 ± 0.09 | 3.20 ± 0.10 | 1.0 ± 0.03 | 0.03 ± 0.008 | 0.30 ± 0.01 |

Median (med(x)) and median absolute deviation: MAD = 1.4·med(|x − med(x)|) (the factor accounts for asymptotic normal consistency)
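The MED ± MAD summaries reported in Table 2 follow directly from the definition given in its footnote. A minimal sketch, using the factor 1.4 quoted there; the function name and the example values are hypothetical.

```python
import statistics


def med_mad(values, factor=1.4):
    """Median and scaled median absolute deviation, MAD = factor * med(|x - med(x)|)."""
    med = statistics.median(values)
    mad = factor * statistics.median([abs(v - med) for v in values])
    return med, mad


# Hypothetical log10 background levels of five chips, one of them an outlier.
log_n = [1.0, 2.0, 3.0, 4.0, 100.0]
print(med_mad(log_n))
```

Both summaries are insensitive to the single outlying chip, which is the reason for preferring them over mean and standard deviation in heterogeneous experimental series.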

Fig. 4. Sensitivity profiles of three chips shown in Fig. 3. Only the MM profiles for nonspecific (above) and specific (below) hybridization are shown. The PM profiles look similar to the nonspecific MM profiles. The 16 nearest-neighbor (NN) profiles are aggregated into four single-base profiles for the sake of clarity (symbols). In addition, each figure shows the two NN profiles with the largest positive and negative values. The profiles of the RNA hybridizations differ from those of the DNA hybridization due to the different binding chemistry.
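The way such sensitivity profiles enter the affinity correction (Eq. 10) can be sketched as follows. The code assumes that each of the 16 base couples has a profile of 24 position-dependent terms and simply sums the term of the couple found at each position of a 25-mer probe; the profile values and function names are hypothetical placeholders, not fitted values.

```python
PROBE_LENGTH = 25


def log_affinity(sequence, profiles):
    """Sum of nearest-neighbor sensitivity terms over the 24 couple positions."""
    if len(sequence) != PROBE_LENGTH:
        raise ValueError("expected a 25-mer probe sequence")
    return sum(profiles[sequence[k:k + 2]][k] for k in range(PROBE_LENGTH - 1))


def affinity_corrected(intensity, log_a):
    """Divide an intensity by the affinity A = 10**log_a."""
    return intensity * 10.0 ** (-log_a)


# Hypothetical flat profiles: zero everywhere except a constant CC increment.
couples = [a + b for a in "ACGT" for b in "ACGT"]
profiles = {bb: [0.0] * (PROBE_LENGTH - 1) for bb in couples}
profiles["CC"] = [0.02] * (PROBE_LENGTH - 1)

print(log_affinity("C" * 25, profiles), log_affinity("A" * 25, profiles))
```

In the hook method, four such sets of profiles (nonspecific and specific, PM and MM) are used; the sketch handles one set at a time.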

4.4. Examples: Expression Values

For further validation of the method, we analyzed the Affymetrix Latin-square spiked-in and the GeneLogic dilution data sets (24). The corrected hook curves of selected chips of these series are shown in Figs. 5 and 7, respectively. The hook curves of the spiked-in series mainly reflect the hybridization of the cell extract, which was added in equal amounts to all hybridizations (Fig. 5). In addition, each chip contains a set of “spiked-in” probes covering the whole concentration range of the spikes (0–512 pM). The D vs. S coordinates of these spikes spread over the full range of the hook curve (see circles in Fig. 5). Their positions shift along the hook to the right with increasing transcript concentration. Probes without specific transcripts and probes with only tiny spiked-in concentrations accumulate mainly within the N range of the hook curve. In a simple approximation, we classify these probes as “absent” in analogy with the absent calls calculated by MAS5 (19). The inset in Fig. 5 shows that both methods, hook and MAS5, provide very similar absent rates for the spikes. Note that the vertical shift between the MAS5 and hook data is due to the somewhat arbitrary choice of the threshold parameters used in both methods. It can simply be reduced by appropriate adjustment. Figure 6 shows the expression measures obtained from selected preprocessing methods as a function of the spiked-in concentration. Perfect calibration consequently refers to a diagonal line of slope unity in this double-logarithmic plot. The hook and gcRMA methods clearly outperform MAS5 and RMA with respect to this criterion. Note that the reduced slope of the RMA curve indicates a systematic bias: the estimated differential expression is roughly the square root of the true change, $FC_{RMA} \approx (FC_{true})^{0.5}$. Figure 6 also reveals that saturation gives rise to the flattening of all curves at high concentrations except that of the hook method, which corrects the data for this effect. Dilution of the hybridization solution in the dilution series gives rise to the progressive shift of the N range of the hook curve toward smaller abscissa values, leaving the position of the asymptotic “as” range unchanged (Fig. 7). The associated “widening” of the curve is compatible with the global decrease of the transcript concentration in this experiment (see above). This trend is also paralleled by the disappearance of the “sat” range, i.e., dilution globally decreases the occupancy of the probes.

Fig. 5. Hook curve of one spiked-in hybridization (HG-U133A). The open circles refer to the spiked-in probes. Their positions move along the hook to the right with increasing spiked-in concentration of the respective specific transcripts. The vertical line indicates the breakpoint between the N and mix regimes, which classifies the probes into absent and present ones. The inset shows the fraction of absent probes as a function of the spiked-in concentration obtained from the hook and the MAS5 methods.
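The consequence of a reduced slope in the double-logarithmic calibration plot can be worked out in one line: a slope b maps a true fold change FC to an apparent fold change FC^b. The sketch below only restates this arithmetic; the function name and the example fold change are hypothetical.

```python
def apparent_fold_change(true_fold_change, slope):
    """Fold change reported by a method whose log-log calibration slope is `slope`."""
    return true_fold_change ** slope


# A true 16-fold change seen through a slope of 0.5 (as quoted for RMA)
# and through a slope of 1 (perfect calibration).
print(apparent_fold_change(16.0, 0.5), apparent_fold_change(16.0, 1.0))
```

With a slope of 0.5, a true 16-fold change therefore appears as only a 4-fold change.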

Fig. 6. Mean expression degree of all spiked-in probe sets as a function of the spiked-in concentration. The comparison of different preprocessing methods shows that the single-chip hook method performs roughly as well as the multi-chip method gcRMA. The diagonal lines of slope one refer to optimum calibration. The dotted diagonals indicate fivefold changes with respect to the dashed diagonal line. The smaller slopes of MAS5 and especially of RMA compared with those of hook and gcRMA indicate the accuracy penalty of these methods. Note that the MAS5 and gcRMA curves are vertically shifted for the sake of clarity.

Figure 7b shows that the background intensity indeed changes almost linearly with dilution. The mean nonspecific background (N) is the log-intensity average over the N range of the respective hooks. The optical background (O), which refers to the darkest 2% of the probes, is obtained in step 1 of the algorithm. The total background (N + O) is obtained independently by omitting this optical background correction in the hook algorithm. The relation between the background levels indicates that the optical contribution gradually decreases with increasing transcript concentrations. Moreover, the residual slope of the O data shows that the “optical” background correction probably also comprises small contributions from nonspecific hybridization.

Simple dilution does not change the composition of the hybridization solution. Consequently, the fraction of absent probe sets is expected to remain invariant across the dilution steps. The respective fractions of absent probes obtained from the hook curves confirm this expectation (Fig. 7c). In contrast, MAS5 reports an increasing fraction of absent probes at smaller transcript concentrations, probably because the underlying algorithm progressively converts probes with smaller intensities into absent ones. The hook method uses the N region as the classification criterion for absent probes. This criterion is evidently more robust against dilution effects than the probe intensity criterion used by MAS5 (19).

Fig. 7. (a) Hook curves of the dilution experiments for different amounts of RNA (see figure). The dashed curves are calculated using the two-species Langmuir isotherm assuming a common asymptotic maximum intensity value. On dilution, the position of the left branch of the hook shifts to smaller abscissa values, indicating the decrease of nonspecific hybridization. (b) Background level on dilution. The total background (N + O) decomposes into contributions due to the optical effects (O) and nonspecific hybridization (N). (c) The hook method provides a virtually constant fraction of absent probes on dilution, whereas MAS5 progressively overestimates absent calls.

Figure 8 illustrates the effect of dilution on the expression levels of selected probe sets. The expression data obtained from the hook algorithm correctly reflect the linear decrease of transcript concentration on dilution, in contrast to the MAS5 and RMA expression levels, which remain virtually constant. The latter effect results from the normalization algorithms used, which, for MAS5 (global mean normalization) and RMA (quantile normalization), balance the probe-level data relative to a mean characteristic over all dilution steps. This relative scale remains virtually invariant in this type of experiment. In contrast, the hook method uses an absolute scale, which responds sensitively to dilution effects. A set of special probes, the so-called hybridization controls, is spiked into the hybridizations at equal concentrations. The global normalizations spuriously report variable expression degrees for these probes over the dilution series (e.g., AFFXBioB3_at, see Fig. 8), whereas the hook expression values remain virtually constant, as expected.

Fig. 8. Expression values of selected probes and methods on dilution. The concentration of the specific transcripts linearly decreases as reflected by the hook estimate. The other methods provide different, mostly constant expression estimates owing to normalization. Note that AFFXBioB3 is a hybridization control that is spiked into the hybridization solution with constant concentration. Again, the hook method reproduces this behavior well.

Another effect is also revealed in the expression data shown in Fig. 8: The mean expression levels of the selected transcripts differ by more than three orders of magnitude. These absolute changes are accompanied by distinct variations between the expression levels provided by the different methods. For example, one gets RMA > hook at intermediate expression levels (31432_at in Fig. 8) but partly hook > RMA at high (31463_at) and low (31491_at) levels. These trends can be attributed to the better linearity of the hook method over the whole concentration range, which reduces systematic biases due to background and saturation effects compared, e.g., with RMA (see also Fig. 6).

4.5. Download

The beta version of the hook program can be downloaded from http://www.izbi.de. The stand-alone JAVA program processes single chips and chip series in a batch mode according to the scheme given in Fig. 2. Chip and probe set-related characteristics such as expression degrees, hook curves, and sensitivity profiles are exported in tabular form and .jpg graphics. The detailed description of the method and selected applications are given in refs. (47, 48).

5. Conclusions

The improvement of microarray calibration methods is an essential prerequisite for obtaining absolute expression estimates, which, in turn, are required for quantitative analysis of transcriptional regulation. Benchmark studies indicate that the correction for nonspecific background intensity contributions is the crucial preprocessing step. Here, mismatch (MM) probes provide essential information not available from PM-only approaches. Among established linear calibration approaches, gcRMA emerges as the method that makes the best compromise between accuracy and precision across the whole intensity range. The Langmuir hybridization model provides a physically adequate and computationally feasible approach for microarray intensity calibration, with the potential to improve on existing linear methods. Our hook calibration method uses this model together with the position-dependent nearest-neighbor affinity correction. Although it is a single-chip analysis, the hook method performs roughly as well as the multi-chip gcRMA method in estimating expression values. In addition, the hook method provides a set of chip summary characteristics that evaluate the performance of a given hybridization in terms of simple parameters such as the mean nonspecific background intensity, its saturation value, the mean PM/MM sensitivity gain, and the fraction of absent probes.
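The saturation behavior that motivates the Langmuir model can be illustrated with a minimal base-R sketch. It uses a simplified single-species Langmuir isotherm rather than the two-species form used by the hook method, and the binding constant K, the saturation intensity i_max, and the concentration grid are assumed, illustrative values.

```r
# Single-species Langmuir isotherm: intensity as a function of transcript
# concentration. K (binding constant) and i_max (saturation intensity) are
# assumed, illustrative values, not fitted parameters.
langmuir <- function(conc, K = 0.01, i_max = 1e4) {
  x <- K * conc
  i_max * x / (1 + x)
}

conc <- c(1, 10, 100, 1000, 10000)   # arbitrary concentration units
intensity <- langmuir(conc)

# Local slope in the double-logarithmic plot: close to 1 at low occupancy
# (linear regime) and approaching 0 near saturation
loglog_slope <- diff(log10(intensity)) / diff(log10(conc))
print(round(loglog_slope, 2))
```

The decreasing slope shows why a linear calibration compresses fold changes at high concentrations unless the data are corrected for saturation.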

Acknowledgments

We thank Anke Wendschlag for performing some of the data calculations. The work was supported by the Deutsche Forschungsgemeinschaft under grant no. BIZ 6/4. H. Berger was supported by the Molecular Mechanisms in Malignant Lymphomas Network Project of the Deutsche Krebshilfe (grant no. 70-3173-Tr3), to which we are grateful for the use of the MMML gene expression data.

References

1. Binder, H. (2006), Thermodynamics of competitive surface adsorption on DNA microarrays – theoretical aspects, Journal of Physics Condensed Matter 18, S491–523.
2. Hekstra, D., Taussig, A. R., Magnasco, M., and Naef, F. (2003), Absolute mRNA concentrations from sequence-specific calibration of oligonucleotide arrays, Nucleic Acids Research 31, 1962–68.
3. Burden, C. J., Pittelkow, Y. E., and Wilson, S. R. (2004), Statistical analysis of adsorption models for oligonucleotide microarrays, Statistical Applications in Genetics and Molecular Biology 3, 35.
4. Binder, H., Kirsten, T., Loeffler, M., and Stadler, P. (2004), The sensitivity of microarray oligonucleotide probes – variability and the effect of base composition, Journal of Physical Chemistry B 108, 18003–14.
5. Binder, H., and Preibisch, S. (2006), GeneChip microarrays – signal intensities, RNA concentrations and probe sequences, Journal of Physics Condensed Matter 18, S537–66.
6. Burden, C. J., Pittelkow, Y. E., and Wilson, S. R. (2006), Adsorption models of hybridization and post-hybridization behaviour on oligonucleotide microarrays, Journal of Physics Condensed Matter 18, 5545–65.
7. Huber, W., von Heydebreck, A., Sueltmann, H., Poustka, A., and Vingron, M. (2002), Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics 1, 1–9.
8. Durbin, B. P., Hardin, J. S., Hawkins, D. M., and Rocke, D. M. (2002), A variance-stabilizing transformation for gene-expression microarray data, Bioinformatics 18, 105–10.
9. Wu, Z., and Irizarry, R. A. (2005), A statistical framework for the analysis of microarray probe-level data, Johns Hopkins University, Dept. of Biostatistics Working Paper 73, 1–31.
10. Binder, H., and Preibisch, S. (2005), Specific and non-specific hybridization of oligonucleotide probes on microarrays, Biophysical Journal 89, 337–52.
11. Binder, H., Preibisch, S., and Kirsten, T. (2005), Base pair interactions and hybridization isotherms of matched and mismatched oligonucleotide probes on microarrays, Langmuir 21, 9287–302.
12. Affymetrix (2001), Affymetrix Microarray Suite 5.0, in “User Guide”, Affymetrix, Inc., Santa Clara, CA.
13. Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003), Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Research 31, e15.
14. Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003), Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics 4, 249–64.
15. Wu, Z., Irizarry, R. A., Gentleman, R., Murillo, F. M., and Spencer, F. (2003), A model based background adjustment for oligonucleotide expression arrays, Johns Hopkins University, Dept. of Biostatistics Working Paper 1.
16. Li, C., and Wong, W. H. (2001), Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection, Proceedings of the National Academy of Sciences of the United States of America 98, 31–36.
17. Affymetrix (2005), Guide to probe logarithmic intensity error (PLIER) estimation.
18. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003), A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics 19(2), 185–93.
19. Affymetrix (2002), Statistical Algorithms Description Document, Santa Clara.
20. Zhang, L., Miles, M. F., and Aldape, K. D. (2003), A model of molecular interactions on short oligonucleotide microarrays, Nature Biotechnology 21, 818–28.
21. Shedden, K., et al. (2005), Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data, BMC Bioinformatics 6, 26.
22. Hochreiter, S., Clevert, D.-A., and Obermayer, K. (2006), A new summarization method for Affymetrix probe level data, Bioinformatics 22, 943–49.


23. Havilio, M. (2005), Signal deconvolution based expression-detection and background adjustment for microarray data, Journal of Computational Biology 13, 63–80.
24. Eklund, A. C., Turner, L. R., Chen, P., Jensen, R. V., deFeo, G., Kopf-Sill, A. R., and Szallasi, Z. (2006), Replacing cRNA targets with cDNA reduces microarray cross-hybridization, Nature Biotechnology 24, 1071–73.
25. Choe, S., Boutros, M., Michelson, A., Church, G., and Halfon, M. (2005), Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset, Genome Biology 6, R16.
26. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B., and Pavlidis, P. (2005), Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms, Nucleic Acids Research 33, 5914–23.
27. Qin, L.-X., Beyer, R., Hudson, F., Linford, N., Morris, D., and Kerr, K. (2006), Evaluation of methods for oligonucleotide array data via quantitative real-time PCR, BMC Bioinformatics 7, 23.
28. Ploner, A., Miller, L., Hall, P., Bergh, J., and Pawitan, Y. (2005), Correlation test to assess low-level processing of high-density oligonucleotide microarray data, BMC Bioinformatics 6, 80.
29. Verhaak, R., Staal, F., Valk, P., Lowenberg, B., Reinders, M., and de Ridder, D. (2006), The effect of oligonucleotide microarray data preprocessing on the analysis of patient-cohort studies, BMC Bioinformatics 7, 105.
30. Zakharkin, S., Kim, K., Mehta, T., Chen, L., Barnes, S., Scheirer, K., Parrish, R., Allison, D., and Page, G. (2005), Sources of variation in Affymetrix microarray experiments, BMC Bioinformatics 6, 214.
31. Freudenberg, J., Boriss, H., and Hasenclever, D. (2004), Comparison of preprocessing procedures for oligo-nucleotide microarrays by parametric bootstrap simulation of spike-in experiments, Methods of Information in Medicine 5, 434–38.
32. Irizarry, R. A., Wu, Z., and Jaffee, H. A. (2006), Comparison of Affymetrix GeneChip expression measures, Bioinformatics 22, 789–94.
33. Affymetrix (2001), Array Design for the GeneChip Human Genome U133 Set.
34. Affymetrix (2003), GeneChip Human Genome U133 Arrays.
35. Binder, H., Kirsten, T., Hofacker, I., Stadler, P., and Loeffler, M. (2004), Interactions in oligonucleotide duplexes upon hybridisation of microarrays, Journal of Physical Chemistry B 108, 18015–25.
36. GeneLogic dilution data: http://www.GeneLogic.dilution.com/.
37. Affymetrix spiked-in data set: http://www.affymetrix.com/support/technical/sample_data/datasets.affx.
38. Deng, V., et al. (2007), FXYD1 is an MeCP2 target gene overexpressed in the brains of Rett syndrome patients and Mecp2-null mice, Human Molecular Genetics 16, 640–50.
39. Hummel, M., et al. (2006), A biologic definition of Burkitt’s lymphoma from transcriptional and genomic profiling, The New England Journal of Medicine 354, 2419–30.
40. Juhasz, A., Markel, S., Gaur, S., Wu, X., and Doroshow, J. (2007), Inhibition of NOX1 Gene Expression with shRNA in Human Colon Cancer, Gene Expression Omnibus GSE4561.
41. Malek, S. N., and Ouilette, P. N. (2007), Chronic lymphocytic leukemia (CLL) gene expression comparison, Gene Expression Omnibus GSE9250.
42. Furge, K. A., Chen, J., Koeman, J., Swiatek, P., Dykema, K., Lucin, K., Kahnoski, R., Yang, X. J., and Teh, B. T. (2007), Detection of DNA copy number changes and oncogenic signaling abnormalities from gene expression data reveals MYC activation in high-grade papillary renal cell carcinoma, Cancer Research 67, 3171–76.
43. zur Nieden, N. I., Price, F. D., Davis, L. A., Everitt, R. E., and Rancourt, D. E. (2007), Gene profiling on mixed embryonic stem cell populations reveals a biphasic role for β-catenin in osteogenic differentiation, Molecular Endocrinology 21, 674–85.
44. Stepanova, A. N., Yun, J., Likhacheva, A. V., and Alonso, J. M. (2007), Multilevel interactions between ethylene and auxin in Arabidopsis roots, The Plant Cell 19, 2169–85.
45. Li, C. M., and Klevecz, R. R. (2006), From the cover: A rapid genome-scale response of the transcriptional oscillator to perturbation reveals a period-doubling path to phenotypic change, Proceedings of the National Academy of Sciences of the United States of America 103, 16254–59.
46. Jain, M., Nijhawan, A., Arora, R., Agarwal, P., Ray, S., Sharma, P., Kapoor, S., Tyagi, A. K., and Khurana, J. P. (2007), F-box proteins in rice. Genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development, and regulation by light and abiotic stress, Plant Physiology 143, 1467–83.


47. Binder, H., Krohn, K., and Preibisch, S. (2008), “Hook” calibration of GeneChip-microarrays: chip characteristics and expression measures, Algorithms for Molecular Biology 3:11.


48. Binder, H., and Preibisch, S. (2008), “Hook” calibration of GeneChip-microarrays: Theory and algorithm, Algorithms for Molecular Biology 3:12.

Chapter 21

Meta-analysis of Cancer Gene-Profiling Data

Xinan Yang and Xiao Sun

Summary

DNA microarray profiles are plagued by having a large number of variables but a small number of samples, and they are notorious for their low signal-to-noise ratio in clinical applications. There is therefore a growing need for meta-analysis techniques that yield more valid and informative results than each experiment analyzed separately. By exploiting the power of several studies in a single analysis, meta-analysis of cancer gene-profiling data increases the statistical power to detect differentially expressed genes and allows heterogeneity to be assessed. OrderedList is a method that was proposed specifically for the meta-analysis of cancer gene expression data. It is superior to other methods in that it does not rely on strong effects of differential gene expression in a single study but on consistently regulated genes across multiple studies. This chapter introduces the R implementation of this methodology on real data sets to identify biomarkers for lung adenocarcinoma.

Key words: Microarray, Gene-list comparison, Expression, Meta-analysis

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576 DOI 10.1007/978-1-59745-545-9_21, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

With high-dimensional variables (thousands of genes), microarray data suffer from small numbers of samples and are often notorious for their low signal-to-noise ratio. However, microarray technologies are becoming more prevalent in cancer research, and it is now usual to find several gene expression data sets from different laboratories that employ the same or different technologies to identify genes related to the same condition. Meta-analysis is a statistical technique for combining such quantitative findings from independent studies. Therefore, meta-analysis of gene-profiling data is increasingly required to integrate data sets that investigate a common theme or disorder, and to yield more valid and informative results than each experiment separately (1).

Meta-analysis is a way to enlarge the sample size of microarray data by integrating studies of the same theme. Moreover, it makes it possible to discover measurable commonalities between independent studies of related topics. The danger is that, in amalgamating a large set of different studies, the construct definitions can become imprecise, and thus it may be difficult to interpret the results meaningfully. The general assumption of meta-analysis is that many studies of one topic can be combined into one large study in terms of “effect sizes.” The effect size encodes the selected research findings (effects) on a numeric scale. It provides information on how much of an expression change is evident either across all studies or for a subset of studies (2). Two common strategies for modeling the effect size of microarray data are transforming gene expression measures across studies and generating summaries such as p values, probabilities, or ranks (3, 4).

This chapter describes the problems of meta-analysis, e.g., comparing different chip platforms, and the methods to solve these problems. Moreover, it introduces the related Bioconductor package OrderedList (5) and useful links. It focuses on combining summary measures of expression rather than expression measures, which better overcomes the difficulty of incorporating data across multiple platforms and laboratories. By accessing the original raw data (if available) that yielded the initial results, and observing the orderings of two independent statistics, OrderedList uncovers whether two lists of differentially expressed genes are significantly similar. This similarity is evaluated by a weighted sum of the sizes of the overlaps in the top ranks. The method is demonstrated using public data from the same as well as different platforms. Corresponding R example code is given in Subheading 3.3.

2. Materials (Public Microarray Databases)

The Minimum Information About a Microarray Experiment (MIAME) guidelines outline the minimum information that should be included when describing a microarray experiment. Popular public repositories that store cancer microarray data and support the MIAME requirements include:
1. Gene Expression Omnibus (GEO) – A database at NCBI for the public use and dissemination of gene expression data (6) (http://www.ncbi.nlm.nih.gov/geo/) (see Note 1).
2. ArrayExpress – A public repository for microarray-based gene expression data maintained by the European Bioinformatics Institute (7) (http://www.ebi.ac.uk/arrayexpress/).
3. Oncomine – A collection of publicly available cancer microarray studies and data mining tools to efficiently query genes and data sets of interest (8). Oncomine Research Edition is freely available to academic and nonprofit organizations at http://www.oncomine.org.
4. Stanford Microarray Database (SMD) – Stores data from two-color spotted DNA microarrays for the entire scientific community (9) (genome-www5.stanford.edu/).
5. Others.

3. Methods

The validity of a meta-analysis depends on the quality of the systematic review on which it is based. Good meta-analyses aim at complete coverage of all relevant studies, examine heterogeneity between studies, and explore the robustness of the main findings using sensitivity analysis. Thus, the design of the meta-analysis is the core task and involves clinical and biological knowledge. In each study, samples are divided into at least two distinct classes. To evaluate multiple independent data sets that investigate a common theme or disorder in an integrated way, it is important to choose the classes one is interested in comparing. Here, we simply compare healthy lung and adenocarcinoma (a type of lung cancer) as an example.

Multiple comparisons are based on pair-wise comparisons. The methods of the main package, OrderedList, provide a comparison of two summary measures, say, effect sizes. Each effect size is generated separately, for each study, from the two chosen conditions of a certain clinical/biological theme or disorder (see Note 2). Importantly, although a single study might not reveal significant changes, one might still observe considerable overlap in the top-ranking genes. Moreover, consensus changes that reflect the addressed theme or disorder are expected to rank near the top of the ordered lists in different studies. Hence, the number of overlapping genes is first computed for the pair of lists along their ranks. Then a similarity score is assigned to this comparison of two ranked (ordered) gene lists (see Note 3). In principle, the similarity score is a weighted sum of the sizes of the overlaps in the top ranks, with more weight placed on the top ranks (10).
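The idea of a weighted overlap score can be sketched in base R as follows. This is an illustration only and not the OrderedList implementation: the exponentially decaying weights exp(-alpha * n), the inclusion of both ends of the lists, the helper names overlap_curve and similarity_score, and the simulated rankings are all assumptions made for this example.

```r
# Size of the overlap between the top-n entries of two ranked gene lists,
# for n = 1..N (both lists must contain the same identifiers)
overlap_curve <- function(list1, list2) {
  stopifnot(setequal(list1, list2))
  n <- length(list1)
  rank_in_2 <- match(list1, list2)
  # a gene at rank i in list1 and rank j in list2 enters the overlap at n = max(i, j)
  enters_at <- pmax(seq_len(n), rank_in_2)
  cumsum(tabulate(enters_at, nbins = n))
}

# Weighted sum of overlaps; weights decay with rank (assumed form exp(-alpha * n)).
# Top ranks and bottom ranks (reversed lists) are both taken into account.
similarity_score <- function(list1, list2, alpha = 0.1) {
  w <- exp(-alpha * seq_along(list1))
  top <- overlap_curve(list1, list2)
  bottom <- overlap_curve(rev(list1), rev(list2))
  sum(w * (top + bottom))
}

set.seed(1)
genes <- paste0("g", 1:200)
study1 <- genes                                      # ranking in study 1
study2 <- genes[order(seq_along(genes) + rnorm(200, sd = 5))]  # perturbed ranking
random <- sample(genes)                              # unrelated ranking

s_similar <- similarity_score(study1, study2)
s_random <- similarity_score(study1, random)
cat("similar lists:", round(s_similar, 2), " random list:", round(s_random, 2), "\n")
```

In this sketch, two lists that agree in their top and bottom ranks receive a much larger score than a pair with an unrelated ordering. A smaller alpha lets more ranks contribute to the score.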


3.1. For Gene Lists with Expression Levels

3.1.1. Data Selection and Collection

Select the experiments addressing the same clinical/biological problem. For example, to compare the differential gene expression between human adenocarcinoma and healthy lung tissues, we select two GEO data sets and four Oncomine gene lists. The two GEO data sets are based on different microarray technologies and are used as examples of meta-analysis data with expression levels. The gene lists from Oncomine provide an example of performing meta-analysis directly on gene lists.
1. GSE1987 (see Subheading “Download and Convert Expression Data”): GSE1987 is a non-small cell lung cancer data set with 37 cases, based on Affymetrix Hgu95av2 gene chips. It includes seven samples of adenocarcinoma, seven samples of healthy lung tissues adjacent to the tumors, two commercial samples of normal lung RNA (see Note 4), and others.
2. GDS619 (see Subheading “Download, Impute, and Convert Expression Data”): GDS619 collects high-grade human lung tumor groups, including 12 adenocarcinoma and 19 healthy lung samples, based on two-color spotted complementary DNA (cDNA) microarrays.
3. Four gene lists covering all expressed genes (see Table 1), with t statistics for healthy versus adenocarcinoma tissue, downloaded from Oncomine Research Edition, version 3.5 (see Note 5, Subheading “Download and Read Gene Lists into R”).

3.1.2. Data Preprocessing

We recommend preprocessing all data for which raw profiles are available (11), either separately or together (see Note 6). For the data stored in GEO, one can simply download the preprocessed data in Simple Omnibus Format in Text (SOFT).
1. Preprocessed expression values are then base-two log-transformed where applicable (see Note 7, Subheading “Get the GSE Data You Wanted”).
2. Process the phenoData of each study and decide on the features (conditions) to be compared across studies (see Subheadings “Build phenoType Table from the Description of Each Sample” and “Observe and Process the phenoData”).
3. Build the ExpressionSet object for each study separately (see the two occurrences of Subheading “Convert to ExpressionSet Object” under Subheading 3.3.1).
4. Microarray features for which more than half of the values are missing across all arrays of a study are excluded from further analysis (see Subheadings “Remove the Probesets with NA Values Across Samples” and “Keep the Probesets with Less Than 50% NA Values”).
5. To use the package OrderedList, the remaining missing values are imputed by nearest-neighbor averaging (see Note 8, Subheading “Impute Missing Expression Values”).
6. Variables (genes) are matched for the comparison across studies. If the studies are based on different platforms and only a subset of genes can be mapped from one chip to the other, one must provide this information via the argument mapping of the function prepareData. Here, we used a mapping between the manufacturer’s identifiers and UniGene identifiers as an example (see Note 9, Subheading “Observe the Common Identifiers” under Subheading 3.3.1).
7. Prepare a collection of two expression sets of class exprSet by calling the function prepareData (see Notes 10 and 11, Subheading “Combine Two Studies Into One Expression Set”). These data sets are then merged into one exprSet together with the rearranged phenoData and, if the studies are based on different platforms, the argument mapping.

Table 1
The gene lists downloaded from Oncomine (Research Edition version 3.5) for comparisons of healthy lung tissue versus lung adenocarcinoma

ID   Author          Platform                     No. healthy   No. cancer   Reference
G1   Beer            HumanGeneFL Array            10            86           (17)
G2   Bhattacharjee   Human Genome U95Av2 Array    17            139          (13)
G3   Stearman        Human Genome U95Av2 Array    19            20           (14)
G4   Garber          Spotted Array                6             40           (18)

3.1.3. Evaluate the Significance of the Similarity Score

1. Within each study, a gene-wise test on the difference of class means is conducted to obtain the effect size (12). As an example, we performed a t test with regularized variances (z test) (see Note 12).
2. Decide the parameter beta ∈ {0.5, 1}:
   beta = 1: The class labels of the two studies match each other. That is, the first class label of study A has the same interpretation as the first class label of study B, and the same applies to the second class labels (see Subheading “Detect Similarities of Two Expression Studies”).
   beta = 0.5: The class labels do not match. For example, study A compares different tumor grades whereas study B compares different tissues. In this case, the orientation of the two lists is not clear. Thus, both the similarity of the originally ordered lists and the similarity of one list to the other list in flipped orientation are taken into account.
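The gene-wise statistic of step 1 can be sketched in base R as follows. The sketch divides the difference of class means by the pooled standard error plus a fudge constant s0. Setting s0 to the median of the gene-wise standard errors is an assumption made here for illustration, and the helper name regularized_t and the simulated data are hypothetical. OrderedList computes its scores internally, so this is not the package's own implementation.

```r
# Gene-wise regularized t-type statistic (illustrative sketch)
# expr:   genes x samples matrix of log2 expression values
# labels: factor with exactly two levels (class labels)
regularized_t <- function(expr, labels, s0 = NULL) {
  labels <- factor(labels)
  stopifnot(nlevels(labels) == 2, length(labels) == ncol(expr))
  a <- expr[, labels == levels(labels)[1], drop = FALSE]
  b <- expr[, labels == levels(labels)[2], drop = FALSE]
  na <- ncol(a)
  nb <- ncol(b)
  # difference of class means: second level minus first level
  diff_means <- rowMeans(b) - rowMeans(a)
  # pooled within-class sum of squares and standard error per gene
  ss <- rowSums((a - rowMeans(a))^2) + rowSums((b - rowMeans(b))^2)
  se <- sqrt(ss / (na + nb - 2) * (1 / na + 1 / nb))
  if (is.null(s0)) s0 <- median(se)   # assumed choice of the fudge constant
  diff_means / (se + s0)
}

set.seed(42)
expr <- matrix(rnorm(100 * 10), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))
labels <- factor(rep(c("healthy", "AC"), each = 5), levels = c("healthy", "AC"))
expr[1:5, labels == "AC"] <- expr[1:5, labels == "AC"] + 4   # five simulated changed genes

z <- regularized_t(expr, labels)
print(head(sort(z, decreasing = TRUE), 5))   # largest scores: higher in AC
```

Sorting the genes by such a score gives the ordered list that enters the comparison. The fudge constant keeps genes with very small variance from dominating the top ranks.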


Fig. 1. A data-driven parameter alpha helps to provide the best signal-to-noise separation for similarity scores. The optimal alpha is chosen where the pAUC score indicates maximal separation between the distributions of observed and random similarity scores (11). A vertical line marks the optimal alpha.

3. Decide how many ranks should be taken into account (10). This is controlled by two parameters:
   B is the number of internal subsamplings used to determine an optimized alpha*. An example result is shown in Fig. 1.
   alpha is a vector of weighting parameters. If set to NULL (the default), the parameters are computed such that the top 100–2,500 ranks receive weights above min.weight = 1 × 10^-5. A smaller alpha takes more ranks of the ordered gene list into account when calculating the similarity score. The optimal alpha attains the highest pAUC score, i.e., it best separates the random scores from the alternative scores obtained in the B subsamplings, and thus provides the best signal-to-noise separation (see Fig. 2).

3.1.4. Get the Contributing Identifiers that Drive the Similarity

The output of the function OrderedList also gives a vector with the sorted probe IDs of the overlapping genes that together contribute a given percentage, percent (95% by default), of the overall similarity score (see Subheading “Get the Contributing Identifiers that Drive the Similarity” under Subheading 3.3.1).

Fig. 2. The red (right) curve corresponds to simulated observed scores and the black (left) curve corresponds to simulated random scores. The vertical red line denotes the actually observed similarity score. These two kernel density estimates of score distributions underlie the pAUC score for the optimal alpha, as shown in Fig. 1. The bottom rugs mark the simulated values.

3.2. For Gene Lists Without Expression Levels (for Each Sample)

3.2.1. Compare Between the Same Technological Platforms

Examples of comparisons within the same platform are given in Subheading “Compare Between the Same Affymetrix Chips” for Affymetrix Hgu95av2 (GPL91). Two preprocessed gene lists are used, one reported by Bhattacharjee et al. (13) and the other by Stearman et al. (14) (see Table 1). In addition, Subheading “Compare Between Different Affymetrix Chips” gives R code for a comparison between different Affymetrix chips.
1. Use probeset IDs as identifiers when comparing within the same platform; otherwise, use the UniGene ID (see Note 13).
2. Find the common identifiers between the two studies (see Note 14).
3. Order the lists of identifiers to be compared.
4. Compare the ordered lists by the weighted overlap score using the function compareLists.
5. Get the contributing identifiers that drive the similarity by calling the function getOverlap (see Fig. 3).

Fig. 3. The numbers of overlapping genes in the two gene lists generated from different platforms but for the same comparison of healthy versus adenocarcinoma lung tissue. The overlap size is drawn as a step function over the respective ranks. The top ranks correspond to upregulated and the bottom ranks to downregulated genes. In addition, the expected overlap and 95% confidence intervals derived from a hypergeometric distribution are shown as filled background. The similarity is also significant here (p = 0).

3.2.2. Comparison Between Different Technological Platforms

Examples of comparisons between different microarray technological platforms, i.e., cDNA arrays and Affymetrix oligonucleotide arrays, are given in Subheading “Comparison Between Different Microarray Technological Platforms.” Although the comparability of platforms remains a matter of debate, it has been reported that the log ratios of highly expressed genes are strongly correlated, especially between Affymetrix and cDNA arrays (15).
1. Make sure that each gene has exactly one statistic in the lists to be compared. This step is required because many genes are detected by multiple probesets, which makes it difficult to map them between different microarray technological platforms. Such genes are first represented by the probeset with the highest statistic or the highest variance (see Note 15).
2. Check the one-to-one relationship between the two lists to be compared.
3. Observe the common one-to-one mapped identifiers.
4. Order the lists of identifiers to be compared.
5. Compare the ordered lists by the weighted overlap score by calling the function compareLists.
6. Get the contributing identifiers that drive the similarity, if it is significant.

3.3. Examples

3.3.1. R Examples for Comparing Gene Lists with Expression Data

3.3.1.1. Download and Convert Expression Data

# adenocarcinoma (AC) vs. healthy human lung samples
# R version 2.6.0 (2007-10-03)
library("GEOquery")
library("impute") # (see Note 5)
library("OrderedList")
library("hu6800")
library("hgu95av2")
require("Biobase")

3.3.1.1.1. Get the GSE Data You Wanted

gse <- getGEO("GSE1987", "/your_local/GSE1987_family.soft.gz")
probesets <- Table(GPLList(gse)[[1]])$ID
data.matrix <- log2(do.call("cbind", lapply(GSMList(gse), function(x) { # (see Note 16)
  tab <- Table(x)
  mymatch <- match(probesets, tab$ID_REF)
  return(tab$VALUE[mymatch])
})))
## Only the 12,625 probesets have values in this data set
data.matrix <- data.matrix[1:12625,]

3.3.1.1.2. Build phenoType Table from the Description of Each Sample

varLabels <- c("disease", "gender", "stage", "sample") # Due to the data set
clinical <- matrix(nrow = ncol(data.matrix), ncol = length(varLabels))
colnames(clinical) <- varLabels
gsm <- GSMList(gse)
getpData <- function(gsm, cli) {
  for (i in 1:nrow(cli)) {
    C <- Meta(gsm[[i]])$description
    varValues <- unlist(strsplit(C, "[.]"))
    if (length(varValues) == 2) varValues <- c(varValues, "")
    cli[i,] <- c(varValues, Meta(gsm[[i]])$title)
  }
  return(cli)
}
cli <- getpData(gsm, clinical)
## Abbreviate, merge the same subtype with different characters or spacing
cli[which(cli[,1] %in% c("Adenocarcinoma", "Adenocar cinoma")), 1] <- "AC"
cli[which(cli[,1] == "Squamous Cell Carcinoma"), 1] <- "SCC"
cli[which(cli[,1] == "Commercial normal lung RNA"), 1] <- "normal"
## label the healthy tissue adjacent to a diseased sample as "Normal"
cli[, "disease"] <- sapply(cli[, "disease"], function(x) unlist(strsplit(x, " "))[1])

3.3.1.1.3. Convert to ExpressionSet Object

rownames(data.matrix) <- probesets[1:12625]
colnames(data.matrix) <- names(gsm)
pdata <- data.frame(cli)
rownames(pdata) <- names(gsm)
metaData <- data.frame(labelDescription = c("Adenocarcinoma, normal, Squamous Cell Carcinoma …", "male/female", "Two Stage", "Sample ID"))
ADF <- new("AnnotatedDataFrame", data = pdata, varMetadata = metaData)
eset <- new("ExpressionSet", phenoData = ADF, exprs = data.matrix)
## Store the result to save time
save(eset, file = "/your_local/GSE1987.eSet.rdat", compress = TRUE)
eset1 <- eset
cli1 <- pData(eset1)

3.3.1.1.4. Remove the Probesets with NA Values Across Samples (See Note 17)

## The current OrderedList package cannot deal with data containing NA values,
## so count them first
length(which(is.na(exprs(eset1))))
## remove the 26 probesets without expression levels for any sample in this data set
data <- exprs(eset1)
eset1 <- eset1[-which(is.na(data[,1])),]

3.3.1.2. Download, Impute, and Convert Expression Data

3.3.1.2.1. Get the GDS Data You Wanted

gds <- getGEO("GDS619", "/your/local/GDS619.soft.gz")
eset <- GDS2eSet(gds, do.log2 = FALSE)
data2 <- exprs(eset)

3.3.1.2.2. Observe and Process the phenoData

## Abbreviate, merge the same subtype with different characters or spacing
cli <- pData(eset)
rownames(cli) <- cli[,"sample"]
disease <- as.vector(cli[,"cell.type"])
disease[which(disease == "adenocarcinoma")] <- "AC"
disease[which(disease == "small cell lung carcinoma")] <- "SCLC"
cli[,"cell.type"] <- disease

3.3.1.2.3. Convert to ExpressionSet Object

pdata <- data.frame(cli)
metaData <- data.frame(labelDescription = c("Sample ID", "Adenocarcinoma, normal, small cell lung carcinoma …", "description"))
ADF <- new("AnnotatedDataFrame", data = pdata, varMetadata = metaData)
eset <- new("ExpressionSet", phenoData = ADF, exprs = data2)
save(eset, file = "/your_local/GDS619.eSet.rdat", compress = TRUE)

3.3.1.2.4. Keep the Probesets with Less Than 50% NA Values (See Note 17)

data2 <- exprs(eset)
## number of NA values per probeset (row)
n.na <- apply(data2, 1, function(x) length(which(is.na(x))))
data2 <- data2[which(n.na < ncol(data2) * 0.5),]
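The filter above can be checked on a small matrix before it is applied to a full data set. The following sketch uses only base R; the matrix `toy` and its values are hypothetical and serve only to show which rows survive the "less than 50% NA" rule.

```r
# Hypothetical 4 probesets x 4 samples matrix with missing values
toy <- matrix(c(1.2, NA,  3.1, 2.0,
                NA,  NA,  NA,  4.5,
                2.2, 2.4, NA,  NA,
                5.0, 5.1, 4.9, 5.2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("probe", 1:4), paste0("s", 1:4)))

# Fraction of missing values per probeset (row)
na.frac <- rowMeans(is.na(toy))

# Keep probesets with strictly less than 50% missing values
kept <- toy[na.frac < 0.5, , drop = FALSE]
print(rownames(kept))
```

Note that a probeset with exactly 50% missing values (probe3 here) is dropped, because the comparison is strict.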

3.3.1.2.5. Impute Missing Expression Values

if (exists(".Random.seed")) rm(.Random.seed)
d2.imputed <- impute.knn(data2)
phe <- phenoData(eset)
eset2 <- new("ExpressionSet", exprs = d2.imputed$data, phenoData = phe)
cli2 <- pData(eset)
g2 <- geneNames(eset2)

3.3.1.3. Observe the Common Identifiers

## There are 7,524 common genes between GPL91 (1759) and GPL962 (7104) (GEO version 30 Oct 2007).

3.3.1.3.1. Download and Construct the Mapping Table

## downloaded from [http://ailun.stanford.edu/compareData/gpl_91_962.txt]
map91 <- read.delim("/your_local/gpl_91_962.txt", header = FALSE)
## downloaded from [http://ailun.stanford.edu/compareData/gpl_962_91.txt]
map962 <- read.delim("/your_local/gpl_962_91.txt", header = FALSE)
map <- map91
map[,2] <- map962[,1]
colnames(map) <- c("GPL91", "GPL962", "Gene", "Notation")

3.3.1.3.2. To Ensure a One-to-One Relationship Between the Two Lists To Be Compared

oneGene2oneProbe <- function(eset, map, whichCol) {
  ## a gene described by multiple probesets is represented by the probeset
  ## with the highest statistic or highest variance.
  variances <- apply(exprs(eset), 1, var)
  ignoreProbe <- NULL
  dupID <- grep("///", map[,whichCol])
  for (i in 1:length(dupID)) {
    probeGroup <- strsplit(as.character(map[dupID[i], whichCol]), split = "///")
    vGroup <- variances[unlist(probeGroup)]
    thisProbe <- which(vGroup == max(vGroup))
    if (length(thisProbe) > 0) ignoreProbe <- c(ignoreProbe, names(vGroup)[-thisProbe])
  }
  ignoreProbe <- unique(ignoreProbe)
  eset <- eset[-which(geneNames(eset) %in% ignoreProbe),]
  return(eset)
}
eset1 <- oneGene2oneProbe(eset1, map, "GPL91")
g1 <- geneNames(eset1) #9948
eset2 <- oneGene2oneProbe(eset2, map, "GPL962")
g2 <- geneNames(eset2) #34967
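The selection rule used by oneGene2oneProbe (one probeset per gene, chosen by highest variance) can be illustrated without any annotation package. The sketch below is a base R simplification, not the chapter's function: the expression values, the probeset names, and the `probe2gene` mapping are all hypothetical.

```r
# Hypothetical expression matrix: 4 probesets x 3 samples
expr <- matrix(c(1, 2, 3,    # p1, variance 1
                 1, 5, 9,    # p2, variance 16
                 4, 4, 5,    # p3, variance 1/3
                 2, 2, 2),   # p4, variance 0
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2", "s3")))

# Hypothetical probeset-to-gene mapping; GENE_A is measured by two probesets
probe2gene <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B", p4 = "GENE_C")

# Variance of each probeset across samples
v <- apply(expr, 1, var)

# For each gene, keep the probeset with the largest variance
best <- sapply(split(names(v), probe2gene[names(v)]),
               function(p) p[which.max(v[p])])

expr.unique <- expr[best, , drop = FALSE]
print(best)
```

Ties are resolved here by taking the first probeset with maximal variance, which is one possible convention among several.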

3.3.1.3.3. Generate the Argument Mapping for the Function prepareData

id1 <- which(g1 %in% map$GPL91)
length(id1) #[1] 5519
eset1 <- eset1[id1,]
id2 <- which(g2 %in% map$GPL962)
length(id2) #[1] 4415
eset2 <- eset2[id2,]
map <- map[which(map$GPL91 %in% geneNames(eset1)),]
map <- map[which(map$GPL962 %in% geneNames(eset2)),]
dim(map) #[1] 3273 4

3.3.1.4. Combine Two Studies into One Expression Set

### "map" contains the appropriate mapping between the identifiers of the two platforms.
## The character strings with comparison labels (names) should be the same as
## those given to the colnames of mapping.
eset1 <- eset1[as.vector(map$GPL91),]
eset2 <- eset2[as.character(map$GPL962),]
MetaSet <- prepareData(
  list(data = eset1, name = "GPL91", var = "disease", out = c("Normal", "AC"), paired = FALSE),
  list(data = eset2, name = "GPL962", var = "cell.type", out = c("normal", "AC"), paired = FALSE),
  mapping = map[,c("GPL91", "GPL962")]
)

3.3.1.5. Detect Similarities of Two Expression Studies

x <- OrderedList(MetaSet, B = 1000, test = "z", beta = 1)
plot(x, "pauc") # (see Fig. 1)
dev.copy2eps(file = "result/pauc.eps")
plot(x, "scores") # (see Fig. 2)
dev.copy2eps(file = "result/scores.eps")
x$p #[1] 0.000999001

3.3.1.6. Get the Contributing Identifiers that Drive the Similarity

res <- x$intersect
xx <- as.list(hgu95av2SYMBOL)
## Remove probes that do not map to any GENENAME
xx <- xx[!is.na(xx)]
GPL91id <- sapply(res, function(x) unlist(strsplit(x, split = "/"))[1])
res <- unlist(xx[GPL91id])
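The value 0.000999001 printed for x$p in Subheading 3.3.1.5 equals 1/1001, which is what an empirical p-value of the form (b + 1)/(B + 1) yields when none of B = 1000 resampled scores reaches the observed score. Whether OrderedList computes its p-value exactly this way is an assumption here; the sketch only shows the arithmetic, with a hypothetical helper `empirical.p` and simulated null scores.

```r
# Hypothetical helper: empirical p-value from resampled (null) scores
empirical.p <- function(observed, null.scores) {
  (sum(null.scores >= observed) + 1) / (length(null.scores) + 1)
}

set.seed(1)
null.scores <- rnorm(1000)  # stand-in for B = 1000 resampled similarity scores
observed <- 10              # an observed score far beyond every null score

p <- empirical.p(observed, null.scores)
print(p)
```

With this estimator the smallest attainable p-value is 1/(B + 1), so a larger B is needed to report smaller values.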

3.3.2. R Example for Comparing Gene Lists Without Expression Data

3.3.2.1. Download and Read Gene Lists into R

g1 <- read.table("/your_local/Beer_NL_AC_diffGeneList_1.txt", row.names = NULL, header = TRUE, sep = "\t")
g2 <- read.table("/your_local/Bhattacharjee_NL_AC_diffGeneList_1.txt", row.names = NULL, header = TRUE, sep = "\t")
g3 <- read.table("/your_local/Stearman_NL_AC_diffGeneList_1.txt", row.names = NULL, header = TRUE, sep = "\t")
g4 <- read.table("/your_local/Garber_NL_AC_diffGeneList_1.txt", row.names = NULL, header = TRUE, sep = "\t", comment.char = "%")

3.3.2.2. Compare Between the Same Affymetrix Chips

3.3.2.2.1. Use Probeset ID as rownames (Identifiers)

rownames(g2) <- g2[,"Gene.Symbol"]
rownames(g3) <- g3[,"Gene.Symbol"]

3.3.2.2.2. Observe the Common Identifiers

comProbes <- intersect(as.vector(g2[,3]), as.vector(g3[,3]))
length(comProbes)
## Since g2 had 11,158 probesets and g3 had 12,625 probesets,
## the common probesets were used.
g2 <- g2[comProbes,]
g3 <- g3[comProbes,]

3.3.2.2.3. Order the Lists of Identifiers to be Compared

list1 <- rownames(g2)[order(g2[,"Statistic"], decreasing = FALSE)]
list2 <- rownames(g3)[order(g3[,"Statistic"], decreasing = FALSE)]

3.3.2.2.4. Compare Ordered Lists with Weighted Overlap Score

x <- compareLists(list1, list2)
x$pvalue ##<0.002
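To make the idea of a weighted overlap score concrete, the following base R sketch sums the overlaps of the top n and bottom n ranks of two lists, down-weighting deeper ranks exponentially. It is a simplified illustration in the spirit of the score described for OrderedList (5, 10), not the package's implementation; the function name `weightedOverlap`, the weight parameter `alpha`, and its default value are assumptions made for this example.

```r
# Simplified weighted overlap score (illustrative only, not compareLists itself)
weightedOverlap <- function(list1, list2, alpha = 0.1, max.rank = length(list1)) {
  stopifnot(length(list1) == length(list2), setequal(list1, list2))
  # Size of the intersection of the first n entries, for n = 1..max.rank
  overlapAt <- function(a, b) {
    sapply(seq_len(max.rank), function(n) length(intersect(a[1:n], b[1:n])))
  }
  w <- exp(-alpha * seq_len(max.rank))         # weights decay with rank
  top <- overlapAt(list1, list2)               # overlap among top ranks
  bottom <- overlapAt(rev(list1), rev(list2))  # overlap among bottom ranks
  sum(w * (top + bottom))
}

genes <- paste0("g", 1:20)                      # hypothetical identifiers
same <- weightedOverlap(genes, genes)           # identical orderings
reversed <- weightedOverlap(genes, rev(genes))  # opposite orderings
print(c(same = same, reversed = reversed))
```

Identical orderings give the largest score and reversed orderings a much smaller one; in practice the observed score is judged against scores from randomly ordered lists.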

3.3.2.2.5. Get the Contributing Identifiers that Drive the Similarity

res <- getOverlap(x, max.rank = 100)
plot(res)
res$intersect

3.3.2.3. Compare Between Different Affymetrix Chips

3.3.2.3.1. Use Symbols of Gene as rownames

rownames(g1) <- g1[,"Gene.Symbol"]
rownames(g2) <- g2[,"Gene.Symbol"]

3.3.2.3.2. Observe the Common Identifiers (Unigene ID)

xx <- as.list(hu6800UNIGENE)
## Remove probe identifiers that do not map to any UniGene ID
xx <- xx[!is.na(xx)]
genes1 <- xx[rownames(g1)]
xx <- as.list(hgu95av2UNIGENE)
## Remove probe identifiers that do not map to any UniGene ID
xx <- xx[!is.na(xx)]
genes2 <- xx[rownames(g2)]
## Ensure the one-to-one relationship between the two lists to be compared
mergeProbe <- function(Statistic, identifiers) {
  ## The parameter Statistic is a vector to order the list
  ## For identifiers that occur more than once, the highest statistic is kept
  dupGenes <- which(duplicated(identifiers))
  uniGene <- unique(identifiers)
  S <- Statistic[!duplicated(identifiers)]
  names(S) <- uniGene
  for (i in seq_along(dupGenes)) {
    thisGeneID <- dupGenes[i]
    j <- which(uniGene %in% identifiers[thisGeneID])
    if (Statistic[thisGeneID] > S[j]) S[j] <- Statistic[thisGeneID]
  }
  return(S)
}
## s1, s2 are two statistics with unique names of Unigene ID
s1 <- mergeProbe(g1[,"Statistic"], genes1)
s2 <- mergeProbe(g2[,"Statistic"], genes2)
comUnigenes <- intersect(names(s1), names(s2))
length(comUnigenes) #[1] 4943
## Let the two lists to be compared contain the same identifiers
s1 <- s1[comUnigenes]
s2 <- s2[comUnigenes]
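The effect of mergeProbe, keeping the highest statistic for each identifier, can be reproduced on a toy vector with base R's tapply. The statistics and the UniGene-style identifiers below are hypothetical, and tapply is used here only as a compact stand-in for the loop above.

```r
# Hypothetical statistics for five probesets mapping to three identifiers
stat <- c(2.1, -0.5, 3.4, 1.0, 0.7)
ids <- c("Hs.1", "Hs.2", "Hs.1", "Hs.3", "Hs.2")

# One statistic per identifier: the maximum over its probesets
merged <- tapply(stat, ids, max)
print(merged)

# Restrict two such vectors to their shared identifiers before ordering
other <- c(Hs.1 = 1.5, Hs.3 = -2.0, Hs.9 = 0.3)
common <- intersect(names(merged), names(other))
print(common)
```

Taking the maximum favours upregulated probesets; for a gene whose probesets are all strongly negative, other summaries (for example, the largest absolute value) may be preferable, depending on the question.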

3.3.2.3.3. Order the Lists of Identifiers to be Compared

list1 <- names(s1)[order(s1, decreasing = FALSE)]
list2 <- names(s2)[order(s2, decreasing = FALSE)]

3.3.2.3.4. Compare the Ordered Lists with the Weighted Overlap Score

x <- compareLists(list1, list2)
x$pvalue ## <0

3.3.2.3.5. Get the Contributing Identifiers that Drive the Similarity

res <- getOverlap(x, max.rank = 100)
plot(res)
dev.copy2eps(file = "result/overlap.eps") #(see Fig. 3)
res$intersect
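Figure 3 shows the observed overlap against an expected overlap and a 95% confidence band derived from a hypergeometric distribution. A minimal sketch of how such a band can be obtained is given below, assuming two random orderings of the same N identifiers: the overlap of the two top-n sets is then hypergeometric. The values of N and the ranks are hypothetical, and the exact computation used by the package may differ.

```r
# Hypothetical setting: N shared identifiers, overlap inspected at several ranks n
N <- 5000
ranks <- c(50, 100, 200)

# For random orderings, the overlap of two top-n sets follows a hypergeometric
# distribution: n "marked" identifiers, N - n unmarked, n draws
expected <- ranks^2 / N
lower <- qhyper(0.025, m = ranks, n = N - ranks, k = ranks)
upper <- qhyper(0.975, m = ranks, n = N - ranks, k = ranks)

null.band <- data.frame(rank = ranks, expected = expected,
                        lower = lower, upper = upper)
print(null.band)
```

An observed overlap well above the upper bound at a given rank indicates more agreement between the two lists than random orderings would produce.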

3.3.2.4. Compare Between Different Microarray Technological Platforms

3.3.2.4.1. Make One Gene Equal One Statistic for the Lists to be Compared

## A gene described by multiple probesets is represented by the probeset
## with the highest statistic or highest variance.
names1 <- g1[,"Gene.Name"]
names4 <- g4[,"Gene.Name"]
## s1, s4 are two statistics with unique gene names
s1 <- mergeProbe(g1[,"Statistic"], names1)
s4 <- mergeProbe(g4[,"Statistic"], names4)

3.3.2.4.2. Observe the Common Identifiers

comGenes <- intersect(names(s1), names(s4))
length(comGenes) # [1] 3589

3.3.2.4.3. Observe the Common One-to-One Mapped Identifiers

s1 <- s1[comGenes]
s4 <- s4[comGenes]

3.3.2.4.4. Order the Lists of Identifiers to be Compared

list1 <- names(s1)[order(s1, decreasing = FALSE)]
list2 <- names(s4)[order(s4, decreasing = FALSE)]

3.3.2.4.5. Compare the Ordered Lists with the Weighted Overlap Score

x <- compareLists(list1,list2)

3.3.2.4.6. Get the Contributing Identifiers that Drive the Similarity

res <- getOverlap(x, max.rank = 100)
res$intersect # [1] "ADH1C" "EST" "TSPAN7"

4. Notes

1. The Bioconductor package GEOquery can download data from NCBI GEO and convert them to Bioconductor “ExpressionSet” or limma “MAList” objects.
2. The preprocessed gene expression values are assumed to be on an additive scale, that is, a logarithmic or log-like scale. Thus, fold changes of molecule abundance correspond to differences in the normalized data.
3. In any case, a large positive test score corresponds to upregulation in the first class of samples and a large negative value corresponds to downregulation. The genes within each study are sorted according to their test scores. Top ranks correspond to highly upregulated genes and bottom ranks to highly downregulated genes.
4. A GDS record can be directly converted to an ExpressionSet. However, starting from a GSE Series Matrix, one first needs to process the phenoData and then build an ExpressionSet.
5. The Oncomine Research Edition is free but has limited resources. To download the entire expression measurements, one can set a threshold of 1 to the differentially expressed gene lists.
6. Where raw expression data are available, a common preprocessing protocol is recommended because of the poor concordance among the results of different preprocessing methods.
7. Note that some probeset summary methods already output values on a log2 scale. As an alternative to log2 scaling, and as advisable for low expressed genes, the variance-stabilizing transformation VSN may also be used. VSN can be used with cDNA or Affymetrix data.
8. Note that some imputation can be inappropriate. In addition, most of the functions in R can deal with missing values if you set the argument na.rm = TRUE, and some do this by default. Therefore, impute the data only if you plan to use a classification method that cannot handle missing values. More than one R package implements the imputation of missing values, e.g., the impute.knn method from the impute package.
9. Make sure that your expression matrices list the genes in the same order; no mapping is then needed if all studies are based on the same platform. There is a new resource, AILUN (http://ailun.stanford.edu/), developed by Stanford. It reannotates all gene expression/proteomics data from GEO by relating all probe IDs to Entrez Gene IDs once per month (16).
10. Both data sets have to be preprocessed beforehand, either together or independently of each other (see Note 4). Moreover, for each data set, one has to specify features in the corresponding phenoData according to which samples are grouped into two distinct classes.
11. Only a subset of genes can be mapped from one chip to the other; this information must be provided via the argument mapping. To get one-to-one mapping across different platforms, genes detected with multiple probesets need to be merged into one value (see Subheading “To Ensure a One-to-One Relationship Between the Two Lists To Be Compared”).
12. Alternative tests are the common t test or just the log ratio test, which is the difference of means on an additive scale.
13. We use Unigene ID here as an example. One can also use another special mapping, such as the one provided by Affymetrix (http://www.affymetrix.com/support/technical/comparison_spreadsheets.affx).
14. Make sure that the two lists received as arguments are matched against each other according to a given mapping, for example, probeset ID. If mapping is NULL, the two lists are expected to contain the same identifiers and there must be a one-to-one relationship between them.
15. Some studies use the average expression level for genes with multiple probesets.
16. When using the GEOquery package to convert a GEO data set to a Bioconductor “ExpressionSet” or limma “MAList,” one should decide whether the data need to be log2 transformed before inserting them into a new data structure. Sometimes, the data were log2 transformed before submission. For Affymetrix platforms, the function expresso in the package affy performs the steps of background correction, normalization, probe-specific correction, and summary value computation. It gives expression measures on a log2 scale.
17. Missing values can be informative, but this depends on how the missing values were generated. One commonly filters genes with more than 70% missing (across arrays) to avoid spurious results. Sometimes arrays with more than, for example, 50% missing values can be indicative of array problems.
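Notes 2 and 12 can be illustrated with a short calculation: on a log2 scale, the difference of class means is the log ratio, and raising 2 to that power recovers the fold change. The intensities below are hypothetical and chosen so that each tumor value is four times the corresponding normal value.

```r
# Hypothetical raw intensities for one gene in two classes
raw.normal <- c(100, 120, 80)
raw.tumor <- c(400, 480, 320)

# Move to an additive (log2) scale
log.normal <- log2(raw.normal)
log.tumor <- log2(raw.tumor)

# Log ratio test score: difference of means on the additive scale
log.ratio <- mean(log.tumor) - mean(log.normal)
fold.change <- 2^log.ratio
print(c(log.ratio = log.ratio, fold.change = fold.change))
```

The sign follows the convention of Note 3 when the tumor samples are taken as the first class: a positive score means upregulation in that class.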

Acknowledgments

The author thanks Dr. Lottaz C. for helpful advice and Zhang Q.Q. for carefully proofreading the example R code. This work was supported by the Natural Science Foundation, 60671018, 60771024.

References

1. Fishel, I., Kaufman, A. and Ruppin, E. (2007) Meta-analysis of gene expression data: a predictor-based approach. Bioinformatics 23(13), 1599–1606.
2. Hedges, L.V. and Olkin, I. (1985) Statistical Methods for Meta-Analysis. Orlando, FL: Academic.
3. Rhodes, D.R., Barrette, T.R., Rubin, M.A., Ghosh, D. and Chinnaiyan, A.M. (2002) Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 62, 4427–4433.
4. Conlon, S.M., Song, J.J. and Liu, A. (2007) Bayesian meta-analysis models for microarray data: a comparative study. BMC Bioinformatics 8, 80.
5. Lottaz, C., Yang, X., Scheid, S. and Spang, R. (2006) OrderedList – a bioconductor package for detecting similarity in ordered gene lists. Bioinformatics 22, 2315–2316.
6. Barrett, T., Suzek, T.O., Troup, D.B., et al. (2005) NCBI GEO: mining millions of expression profiles – database and tools. Nucleic Acids Res. 33, D562–D566.
7. Parkinson, H., Sarkans, U., Shojatalab, M., et al. (2005) ArrayExpress – a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 33, D553–D555.
8. Rhodes, D.R., Kalyana-Sundaram, S., Mahavisno, V., et al. (2007) Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia 9, 166–180.
9. Gollub, J., Ball, G.A., Binkley, G., et al. (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res. 31, 94–96.
10. Yang, X.N., Bentink, S., Scheid, S. and Spang, R. (2006) Similarities of ordered gene lists. J Bioinform Comput Biol. 4(3), 693–708.
11. Irizarry, R.A., Wu, Z. and Jaffee, H.A. (2006) Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22(7), 789–794.
12. Glass, G.V. (1978) Integrating findings: the meta-analysis of research. Rev Res Educ 5, 351–379.
13. Bhattacharjee, A., Richards, W.G., Staunton, J., Li, C., Monti, S., et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98, 13790–13795.
14. Stearman, R.S., Dwyer-Nield, L., Zerbe, L., Blaine, S.A., Chan, Z., et al. (2005) Analysis of orthologous gene expression between human pulmonary adenocarcinoma and a carcinogen-induced murine model. Am J Pathol 167(6), 1763–1775.
15. Park, P.J., Cao, Y.A., Lee, S.Y., et al. (2004) Current issues for DNA microarrays: platform comparison, double linear amplification, and universal RNA reference. J Biotechnol 112(3), 225–245.
16. Chen, R., Li, L. and Butte, A.J. (2007) AILUN: reannotating gene expression data automatically. Nat Methods 4(11), 879.
17. Beer, D.G., Kardia, S.L., Huang, C.C., Giordano, T.J., Levin, A.M., Misek, D.E., Lin, L., Chen, G., Gharib, T.G., Thomas, D.G., Lizyness, M.L., Kuick, R., Hayasaka, S., Taylor, J.M., Iannettoni, M.D., Orringer, M.B., Hanash, S. (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Med 8, 816–824.
18. Garber, M.E., Troyanskaya, O.G., Schluens, K., Petersen, S., Thaesler, Z., Pacyna-Gengelbach, M., van de Rijn, M., Rosen, G.D., Perou, C.M., Whyte, R.I., Altman, R.B., Brown, P.O., Botstein, D., Petersen, I. (2001) Proc Natl Acad Sci USA 98, 13784–13789.

Chapter 22

Target Gene Discovery for Novel Therapeutic Agents in Cancer Treatment

Ole Ammerpohl, Sanjay Tiwari, and Holger Kalthoff

Summary

Target identification of novel therapeutic drugs is pivotal (1) for establishing new anticancer regimens, (2) for controlling side effects of the drugs, and (3) for identifying appropriate combinations with established drugs. Here, we describe several in vitro assays suitable for characterizing different properties of tumor cells. Furthermore, we present a protocol for establishing a reporter gene system for in vivo imaging, allowing for the study of drug effects in small animal models.

Key words: Apoptosis, Cell cycle, Tumor invasion, FACS, In vivo imaging, Fluorescence

Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_22, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction

Worldwide, a large number of studies and projects are in progress to identify and isolate new anticancer drugs from plants, animals, or microorganisms from all parts of the world, from the rain forests to the oceans. Every year, high-throughput assays identify numerous potential drugs that await further investigation to determine their mode of action. This kind of translational approach also covers the evaluation of putative side effects and the establishment of combination therapies. Cancer progression, spreading, and prognosis are characterized by different parameters such as cell proliferation, cell motility, cell viability, or angiogenesis. Unfortunately, no single assay addresses all of these biological processes. Thus, several assays have to be performed to characterize a putative new drug. In this chapter, we present various tests that can be very useful to identify the mode of action of new drugs and the signaling pathway by which they mediate their activity. With this knowledge, it is also possible to further delineate putative target genes. This is especially true when gene expression profiling or DNA methylation data showing the status of specific genes involved in different pathways can be combined with functional assays. For the molecular assays, we refer to the other chapters of this book.

Numerous known anticancer drugs induce cell cycle arrest, often followed by induction of apoptosis. Because this primarily affects rapidly dividing cells (like tumor cells), drugs affecting the cell cycle and inducing apoptosis are useful for many fast-growing cancers. To test the ability of an unknown drug to influence the cell cycle, a standard ex vivo assay is the staining of treated tumor cells and untreated control cells with propidium iodide (PI) followed by analysis with a cytometer (1, 2). If only the cell viability or the total number of cells is of interest, other methods that depend less on specialized equipment and are suitable for high throughput are available, such as the MTT test (3–6), which measures the activity of mitochondrial enzymes and therefore cell viability, or the crystal violet assay, which determines an optical density corresponding to the number of viable cells (7).

Besides increased cell division, another important aspect of many malignant tumors is their ability to spread and to induce metastasis. To spread to a new location, a tumor cell has to leave its current tissue. For this purpose, it must gain the ability to destroy or digest the connective tissue or extracellular matrix. Many tests are available that investigate different aspects of tumor cell spreading. Here we present a simple assay that determines the tumor cells’ ability to digest organic material (8, 9). Last but not least, we present some information on the in vivo imaging of xenotransplanted animals.
In contrast to the in vitro assays discussed above, this approach offers the possibility of investigating the effect of a new drug and identifying the mechanisms and pathways involved in an in vivo situation. Here, side effects of the drug and additional contributions of other organs and cells (metabolism of other organs, effect of the immune system) to tumor growth or metastasis can be studied. Because molecular imaging is a relatively new field that merges molecular and cellular biology with state-of-the-art technology, it is appropriate to first present an overview of this rapidly advancing field and its application in drug discovery.

Molecular imaging encompasses a range of modalities including magnetic resonance imaging (MRI), single-photon emission computed tomography (SPECT), positron emission tomography (PET), ultrasound (US), and optical imaging (OI) techniques (10–12). Optical imaging utilizes a reporter system based on bioluminescent or fluorescent proteins, whereas PET and SPECT are nuclear imaging modalities utilizing γ-ray-emitting probes. MRI uses magnetic fields to alter the spin state of hydrogen nuclei in the body's water molecules and then uses radiofrequencies to measure the time for these altered states to relax. Currently, no modality fulfills all of the characteristics of an ideal in vivo imaging system. For example, MRI provides a high degree of spatial resolution that is well suited for tumor phenotyping and anatomic detail but has low molecular sensitivity. Highly sensitive approaches such as PET and optical imaging are preferable for monitoring tumor cell biology, as well as tumor burden, progression, and metastases. However, PET and SPECT have limited resolution, which may sometimes be problematic for applications in small animals. In addition, the requirement for specific radioisotopes with short half-lives is a major obstacle. Optical imaging provides a safe alternative to radioisotope assays, and it is faster and cheaper than the other modalities. However, the major drawback of optical imaging is its limited depth sensitivity, because light is absorbed by chromophores such as hemoglobin, melanin, or water and scattered by cellular components, especially membranes. Optical imaging with near infrared (NIR) and mid-infrared emitted light (650–900 nm) is receiving greater attention because absorption and scattering are at a minimum in this range and several imaging probes that emit in the NIR have recently been developed (13, 14). Although poor tissue penetration constrains optical imaging applications in the clinic, it is the modality of choice for preclinical studies. This is because optical reporters can be built into animals and linked to target genes and cells. When cloned into promoter/enhancer sequences or engineered into fusion proteins, imaging reporters enable the interrogation of signal transduction pathways and protein–protein interactions in live cells and animals (15).
The in vivo visualization of specific molecular pathways enables the identification of key targets for the development of novel therapeutic agents in cancer treatment. Furthermore, many of the optical tools can be combined with other imaging modalities without modification, to provide both anatomic and functional information.

The application of molecular imaging in drug discovery has been influenced by scientific and technological advancements on several fronts. First, the technical advances in imaging instruments, imaging probes, and three-dimensional reconstruction algorithms allow for the sensitive measurement of fundamental properties of biological processes such as proliferation, apoptosis, angiogenesis, and inflammation. Second, advances in genomics and proteomics have led to the identification of a wealth of novel targets, and consequently there is an increasing need to support drug identification and development in the living animal. Third, advances in the ability to manipulate the mouse genome allow researchers to generate models that faithfully recreate aspects of the human disease (16).

With these advances in mind, molecular imaging can be applied to target identification at multiple levels. It can be used to identify regulatory mechanisms in a subpopulation of normal cells, such as stem cells or immune cells, which may provide novel insights into cancer stem cells or inflammation associated with immune escape, respectively. At another level, tumor mouse models can be utilized to monitor minimal residual disease and micrometastases. In such models, the components of the microenvironment that allow tumor dormancy or engraftment and subsequent growth at the secondary site can be examined. The evaluation of treatment efficiency and the optimization of new drugs in reporter models can also lead to the further elucidation and characterization of these pathways, thus improving the identification of novel drug targets. Finally, breeding a reporter mouse that reports a gene implicated in disease into a model of a given disease would allow for the investigation of the molecular events leading to the onset and progression of disease.

Following the identification of a target gene, gene ablation using short interfering RNA (siRNA) is often utilized to determine whether the target gene is a “druggable” target. The effect of gene ablation can be assessed in vivo utilizing a reporter gene fused to the target gene of interest. For a discussion of the various strategies for the integration of genetically encoded imaging reporters for the deciphering of complex biological responses, the reader is referred to a recent review by Piwnica-Worms and colleagues (15).

2. Materials

2.1. Cell Culture and Cell Preparation

1. Cell culture media (appropriate for the cell line investigated, e.g., Dulbecco’s Modified Eagle’s Medium [DMEM]; Ham’s F12; or RPMI-1640) supplemented with 10% fetal calf serum (FCS) (e.g., Pan-Biotech, Aidenbach, Germany). Depending on the cell line used for the experiments, additional supplements might be mandatory. Information about cell line prerequisites is available from ATCC, DSMZ, ECACC, or the specific literature.
2. Phosphate-buffered saline (PBS): sodium hydrogen phosphate (Na2HPO4, 10 mM), sodium dihydrogen phosphate (NaH2PO4, 2 mM), sodium chloride (130 mM), pH 7.4.
3. Solution of trypsin (0.25%) and ethylenediamine tetraacetic acid (EDTA, 1 mM) in PBS.

All solutions used for cell culturing must be sterile, either by autoclaving or by sterile filtration.
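As a worked example for the PBS recipe in item 2, the amounts to weigh can be derived from the stated concentrations as mass = concentration × molar mass × volume. The sketch below assumes anhydrous salts and a final volume of 1 l; both are assumptions, and the molar masses must be replaced by those of the hydrate actually used (see the product label).

```r
# Molar masses in g/mol, assuming ANHYDROUS salts (adjust for hydrates)
mw <- c(Na2HPO4 = 141.96, NaH2PO4 = 119.98, NaCl = 58.44)

# Target concentrations from the recipe, in mM
conc.mM <- c(Na2HPO4 = 10, NaH2PO4 = 2, NaCl = 130)

volume.L <- 1  # assumed final volume in liters

# mass (g) = concentration (mol/l) x molar mass (g/mol) x volume (l)
grams <- conc.mM / 1000 * mw * volume.L
print(round(grams, 2))
```

After dissolving the salts, the pH still has to be checked and adjusted to 7.4 as stated in the recipe.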

2.2. Cell Cycle Investigations by Fluorescence-Activated Cell Sorting (FACS) (PI Staining)

1. PI stain: Prepare a 0.1% (1 mg/ml) stock solution of PI in PBS. PI is a mutagen and carcinogen. It should be handled with care (see manufacturer’s instructions).
2. Ethanol (absolute).
3. EDTA (1 mM) in PBS.
4. RNase A (20 mg/ml). RNase A solution can be purchased from several providers (e.g., Invitrogen) or prepared by dissolving RNase A in PBS to get a 20 mg/ml solution.

2.3. Assays

2.3.1. Cell Viability Assay (MTT Assay)

1. MTT solution: Dissolve 50 mg MTT powder (3-[4,5-dimethyl-2-thiazolyl]-2,5-diphenyl-2H-tetrazolium bromide; Sigma-Aldrich, Munich, Germany) in 100 ml RPMI-1640 (without any other additives such as FCS) to get a 0.05% solution.

2.3.2. Cell Assay (Crystal Violet Stain)

1. Solution of 0.5% hexamethylpararosaniline chloride (also known as crystal violet, methyl violet 10B, or gentian violet) in 20% methanol and Aqua Dest (Carl Roth, Karlsruhe, Germany). Crystal violet is a putative mutagen and carcinogen. It should be handled with care (see the manufacturer’s instructions).

2.3.3. Easy Tumor Cell Invasion Assay

1. Trypan blue: Prepare a 0.2% solution of trypan blue in PBS. Alternatively, dilute a commercially available 0.4% trypan blue solution (Invitrogen, Karlsruhe, Germany) 1:2 with PBS.

If photometric measurement is intended, a lysis buffer is required. We recommend the PolyATtract® GTC Extraction Buffer available from Promega (Promega, Mannheim, Germany).

2.4. Stable Transfection

1. Accutase (PAA Laboratories GmbH).
2. 1% low-melting soft agarose (Sigma-Aldrich).
3. 2× DMEM + 20% FBS.
4. G418 200 mg/ml stock solution (Carl Roth GmbH).
5. LipofectAmine Plus (Invitrogen).
6. Cloning cylinder (Fisher).

3. Methods

3.1. Cell Cycle Investigations by FACS (PI Staining)

3.1.1. Preparing Cells for FACS Analysis

1. Nonadherent cells

(a) When investigating nonadherent cells, transfer the cells including the media into a Falcon tube.


2. Adherent cells

(a) When using adherent cells, remove the media including the nonadherent cells (save the media if these cells are to be analyzed also).

(b) Carefully add prewarmed (37°C) PBS until the cells are covered, taking care not to detach them.

(c) Remove the PBS by decantation or aspiration.

(d) Cover the cell layer with the trypsin solution.

(e) Keep the cells at 37°C in a cell culture incubator until the cells have detached. Detaching can be supported by several careful thumps of the culture vessel against the open hand every 5 min during the incubation. Depending on the cell line, the cells usually have detached after 10–20 min.

(f) Stop the trypsin activity by adding 1 volume cell culture medium including 10% FCS. Transfer the cell suspension into a suitable centrifuge tube.

(g) If nonadherent cells are to be included in the analysis, add the medium containing the floating cells that was saved in step a.

3. Spin at 500 × g (approximately 1,400 rpm in a standard benchtop centrifuge) for 5 min to pellet the cells.

4. Remove the medium by decantation or aspiration, being careful not to disturb the cell pellet.

5. Resuspend the cell pellet carefully in at least 10 volumes (15–30 ml) PBS.

6. Spin at 500 × g (approximately 1,400 rpm in a standard benchtop centrifuge) for 5 min to pellet the cells.

7. Resuspend the cell pellet carefully in 1–5 ml PBS.

8. Count the cells using a hemocytometer. Numerous hemocytometers are available (e.g., Fuchs-Rosenthal, Thoma, Bürker, or Neubauer). If you are not familiar with the available system, refer to the manufacturer’s manual for your chamber.

9. If you do not continue directly with Subheading 3.1.2, store the remaining cells on ice (keep this time as short as possible).

3.1.2. Staining Cells for FACS Analysis

1. Spin the remaining cells not used for counting at 500 × g (approximately 1,400 rpm in a standard benchtop centrifuge) for 5 min to pellet the cells.

2. Add PBS to a final concentration of 2–5 × 10⁶ cells/ml and resuspend the cell pellet.

3. Transfer 1 ml cell suspension into a new centrifuge tube (e.g., 15-ml conical tube).

Target Gene Discovery for Novel Therapeutic

4. While vortexing the cell suspension very carefully, add ethanol (absolute) dropwise to 1 ml.

5. Incubate for 30 min at room temperature. Fixation in solvents often produces considerable aggregates of cells. See step 2b of Subheading 3.1.3 and Note 1.

6. Spin the suspension at 500 × g (approximately 1,400 rpm in a standard benchtop centrifuge) for 5 min to pellet the fixed cells.

7. Remove the supernatant by decantation, and tap the tube upside down against absorbent paper to remove any remaining alcohol (do not lose the pellet!).

8. Add 500 µl PBS containing 1 mM EDTA.

9. Add 2.5 µl RNase A solution to a final concentration of 50 µg/ml and resuspend the pellet carefully by flicking the tube. Incubate at room temperature for 30 min.

10. Transfer the cells into an appropriate FACS tube.

11. Spin at 500 × g (approximately 1,400 rpm in a standard benchtop centrifuge) for 5 min.

12. Remove the supernatant by decantation; tap the tube upside down against absorbent paper to remove any remaining liquid.

13. Resuspend the pellet in 500 µl PBS containing 200 µg/ml PI.

14. Incubate for at least 30 min at 4°C in the dark (the samples can be stored overnight if desired).

3.1.3. Run FACS Analysis

A wide range of FACS equipment is available on the market. Thus, we cannot give a detailed description that is appropriate for every cytometer. Instead, we present some general considerations and explain some steps as examples for a FACScan™ or FACSCalibur™ running CellQuest™ software (Becton Dickinson, Heidelberg, Germany), WinMDI (http://facs.scripps.edu/software.html; freeware), or Cyflogic (http://www.cyflogic.com).

1. In contrast to, e.g., DAPI staining, which requires a UV laser, a standard blue laser (488 nm) is sufficient for PI staining. For detection, use the orange fluorescence channel (FL2), which is available on most cytometers.

2. For data acquisition, we recommend creating the following plots:

(a) Forward scatter (FSC) vs. side scatter (SSC) either as dot plot or density plot: This plot provides insights into the quality and integrity of your cells. Even if this plot is not essential for cell cycle analysis, it would provide additional information that might be interesting for putative trouble shooting.

(b) FL2-A (FL2-area) vs. FL2-W (FL2-width) either as dot plot or density plot: This plot allows the discrimination of single cells from cell aggregates, which may distort DNA cell cycle measurements. A cell in the G2 phase has twice the DNA content of a cell in the G1 phase. Thus, when a cell in G2 passes the laser beam, it produces a stronger fluorescence signal than a cell in G1. Two cells in G1 that clump together and pass the beam at the same time are difficult to distinguish from a single cell in G2. However, analysis of the photomultiplier signal allows such events to be discriminated when the two cells pass through the laser beam in single file. This is accomplished with the FL2-A vs. FL2-W plot. By adjusting the amplifiers (FL2-H, FL2-A, and FL2-W), cells in G1 phase should be located around channel 200 (linear presentation). Events generated by cell aggregates should be excluded from further analysis by gating on the signals generated by single cells. Figure 1 provides the typical view of a cell cycle analysis. It should be noted that there are several limitations to this approach (see Note 1). In some cases, one might prefer to define a region that contains the G1 phase and acquire a fixed number of cells in this region. In this case, the histograms have similar heights. However, this might be critical when some of your samples show a strong arrest in a cell cycle phase other than the G1 phase.

(c) FL2-A histogram plot: This histogram displays the different phases of the cell cycle. Only signals from single cells are shown in this histogram (they have been gated and selected in the FL2-A vs. FL2-W plot). The statistics function of the FACS software allows the cell numbers in G0/G1, S, or G2 phase to be determined. When information about apoptotic cells is of interest, the subG1 fraction, which includes apoptotic cells, can be included in the gate in the FL2-A vs. FL2-W plot. However, DNA-containing cellular fragments of nonapoptotic origin (e.g., mechanically damaged cells or aneuploid cells) might contribute to the subG1 signals, and this might cause misleading results. When an assay specific for apoptosis is desired, we recommend other assays such as Annexin V–PI staining (detection of phosphatidylserine at the cell surface) or TUNEL staining.

3.2. Cell Viability Assay (MTT Assay)

The viability assay can be performed in numerous formats, from 96-well plates to T75 cell culture flasks. Here, we describe an assay suitable for 12-well plates. According to the individual conditions, the assay can easily be scaled up or down. The assay presented here works with adherent cells only.
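One simple way to scale the assay is to keep the cell number and reagent volume per unit of growth area constant. The following sketch illustrates this idea under stated assumptions: the growth areas are approximate nominal values that differ between manufacturers, and proportional scaling by area is a rule of thumb, not a procedure prescribed by this protocol.

```python
# Approximate growth areas per well or flask in cm^2. These are typical nominal
# values and an assumption of this sketch; check the data sheet of your plasticware.
GROWTH_AREA_CM2 = {
    "96-well": 0.32,
    "24-well": 1.9,
    "12-well": 3.8,
    "6-well": 9.5,
    "T75": 75.0,
}


def scale_by_area(amount: float, from_format: str, to_format: str) -> float:
    """Scale a cell number or volume in proportion to the growth area."""
    return amount * GROWTH_AREA_CM2[to_format] / GROWTH_AREA_CM2[from_format]


# Guideline values for one well of a 12-well plate from this subheading
cells_24 = scale_by_area(2e5, "12-well", "24-well")
mtt_ul_24 = scale_by_area(500, "12-well", "24-well")
print(f"24-well plate: {cells_24:.0f} cells and {mtt_ul_24:.0f} ul MTT solution per well")
```

Volumes obtained this way should still be checked visually, because the cell layer must be covered completely.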

Fig. 1. Cell cycle analysis of a pancreatic cancer cell line. Left column (a, c): density plots; right column (b, d): histograms. The upper row (a, b) shows a typical result of a cell cycle analysis of a pancreatic cancer cell line growing under standard cell culture conditions. (a) Signals obtained from single cells are located in region R1. This region is pivotal for the cell cycle analysis and should be gated for further analysis. Region R2 contains cell aggregates (doublets), which cause artifactual results and should therefore be excluded from the analysis. Signals in region R3 (subG1) represent apoptotic cells, apoptotic bodies, or otherwise broken cells. (b) Histogram of the cells gated in region R1 of the density plot. The first peak corresponds to the G0/G1 phase, whereas the second peak represents cells in the G2 phase. The valley between the peaks contains cells in the S phase of the cell cycle. The lower row (c, d) shows a cell population treated with a strong inhibitor of the cell cycle, leading to a cell cycle arrest in the G2 phase. Although signals obtained from cells in G1 and S phase are substantially reduced, the peak corresponding to the G2 phase is significantly increased. In parallel, the subG1 fraction (R3) is increased, indicating increased cell death.
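The evaluation shown in Fig. 1 can be approximated outside the FACS software once FL2-A values of single cells have been exported. The sketch below is a simplified illustration, not the algorithm used by CellQuest, WinMDI, or Cyflogic: the gate boundaries are hypothetical and merely assume a G1 peak near channel 200 (linear scale), and doublets are assumed to have been excluded beforehand.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical gate boundaries in linear FL2-A channels (lower bound inclusive,
# upper bound exclusive), assuming the G1 peak sits near channel 200.
GATES = [("subG1", 0, 160), ("G0/G1", 160, 240), ("S", 240, 360), ("G2", 360, 480)]


def classify(fl2_area):
    """Return the name of the gate that contains this event, or 'ungated'."""
    for name, low, high in GATES:
        if low <= fl2_area < high:
            return name
    return "ungated"


def phase_fractions(events):
    """Fraction of all events that falls into each gate."""
    counts = Counter(classify(value) for value in events)
    total = len(events)
    return {name: (counts[name] / total if total else 0.0) for name, _, _ in GATES}


def g1_cv_percent(events):
    """Coefficient of variation (%) of the events inside the G0/G1 gate."""
    g1 = [value for value in events if classify(value) == "G0/G1"]
    return pstdev(g1) / mean(g1) * 100.0


# Toy data set of ten single-cell events (FL2-A channel values)
events = [120, 195, 200, 205, 210, 280, 300, 395, 400, 410]
fractions = phase_fractions(events)
print(fractions)
print(f"CV of G1 peak: {g1_cv_percent(events):.1f}%")
```

The CV function relates to Note 1, which states that phase estimation becomes difficult when the CV of the G1 peak reaches 10% or more. Real data sets contain thousands of events, and dedicated software usually fits the peaks instead of using fixed boundaries.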

1. Seed the cells for the experiments. For most experiments, the cells should not have reached confluence at the time the experiment is performed. As a guideline, seed 2 × 10⁵ cells per well (12-well plate) 24 h before you perform the MTT assay. When an additional treatment of the cells with drugs is intended, adjust the time.

2. Start treatment of the cells at the applicable time point (e.g., 12 h before starting the MTT assay).

3. Remove the medium from the cells by aspiration, being careful not to disturb the cells.

4. Add the MTT solution. The cells should be covered completely. As a guideline, add 500 µl MTT solution per well (12-well plate). Do not add the solution directly to the cells, but, rather, add it to the border of the wells to prevent detachment of the cells.

5. Incubate at 37°C in a cell culture incubator.

6. Check staining of the cells every 10 min. Depending on the cell viability, dark violet crystals become visible. The incubation time should be adjusted so that the wells do not become completely deep violet, because this makes the measurement difficult. Usually an incubation time between 30 and 60 min is sufficient.

7. Remove the MTT solution completely from the cells by aspiration, taking care not to disturb the cells.

8. Add 300 µl 2-propanol (absolute) to the wells.

9. Incubate on a shaker at 200 rpm at room temperature until all crystals are completely dissolved (usually 20 min).

10. Transfer 50–150 µl of the alcoholic solution (depending on its density) into a well of a new 96-well plate.

11. Measure the optical density of the solution at 570 nm (with reference at 650 nm) in an optical plate reader. When more than one plate is being measured, store the other plates at +4°C until ready (see Note 2).

12. When appropriate controls are included (e.g., untreated cells), the effect of a specific treatment on the cell viability can be determined.

3.3. Cell Assay (Crystal Violet Stain)

This is an alternative assay to stain viable cells and to get an overview of the viable cells in a culture vessel. This cell assay can be performed in numerous formats, from 96-well plates to T75 cell culture flasks. Here, we describe an assay suitable for 24-well plates. The assay presented here works with adherent cells only.

1. Seed the cells for the experiments. For most experiments, the cells should not have reached confluence by the time of performing the crystal violet stain. As a guideline, seed 0.5–1 × 10⁵ cells per well (24-well plate) 24 h before you perform the crystal violet stain. When an additional treatment of the cells with drugs is intended, adjust the time.

2. Start treatment of the cells at the applicable time point (e.g., 12 h before crystal violet staining).

3. Remove the medium from the cells by aspiration, being careful not to scratch the cells.

4. Add 100 µl of crystal violet solution (24-well plate). The cells should be covered completely. Do not add the solution directly to the cells, but, rather, add it to the border of the wells to prevent detachment of the cells.

5. Incubate for 15–30 min with slight shaking.

6. Aspirate the crystal violet solution completely.

7. Wash three to four times with Aqua Dest by adding water to the border of the wells (to prevent detachment of the cells) followed by aspiration.

8. Let the wells dry completely at room temperature.

9. Add 400 µl methanol to each well.

10. Shake at room temperature until the dye is completely dissolved.

11. Transfer 100 µl of the alcoholic solution into a well of a new 96-well plate.

12. Determine the optical density at 590 nm with an optical plate reader.

13. When appropriate controls are included (e.g., untreated cells), the effect of a specific treatment on the amount of cells can be estimated.

3.4. Easy Tumor Cell Invasion Assay

1. Seed 3 × 10⁵ cells, preferably a fibroblast cell line (e.g., normal diploid human skin fibroblasts), in 1 ml complete cell culture medium, into one well of a 24-well cell culture plate.

2. Let the cells grow until they have reached confluence.

3. Remove the medium from the cells by aspiration, being careful not to disturb the cells.

4. Add 1 ml PBS to each well. Do not add PBS directly to the cells, but, rather, add it to the border of the wells to prevent detachment of the cells.

5. Remove PBS from the cells by aspiration, being careful not to disturb the cells.

6. Add 400 µl DMSO to the cells carefully to prevent cell detachment.

7. Incubate for 1 h at room temperature.

8. Remove the DMSO from the cells by aspiration, being careful not to disturb the cells. Particular care is needed at all steps after DMSO treatment because the confluent cell layer can be easily detached (see Note 3).

9. Add 1 ml PBS to each well. Do not add PBS directly to the cells, but, rather, add it to the border of the wells to prevent detachment of the cells.

10. Remove PBS from the cells by aspiration, being careful not to scratch the cells.

11. Repeat steps 9 and 10 two times.

12. Seed 2 × 10⁴ invasively growing cells in 1 ml complete cell culture medium on top of the DMSO-treated cells.

13. Add drugs to the medium where applicable.

14. Incubate in a cell culture incubator for 24–48 h.

15. Remove the medium from the cells by aspiration, being careful not to disturb the cells.

16. Add 1 ml PBS to each well. Do not add PBS directly to the cells, but, rather, add it to the border of the wells to prevent detachment of the cells.

17. Remove PBS from the cells by aspiration, being careful not to disturb the cells.

18. Add PBS–trypan blue solution to the cells carefully. Avoid disturbing the cells.

19. Incubate for 20 min at room temperature.

20. Remove the staining solution from the cells by aspiration. Do not disturb the cells.

21. Add 1 ml PBS carefully to each well.

22. Remove PBS from the cells by aspiration. Do not disturb the cells.

23. Repeat steps 21 and 22 two times.

24. Only the DMSO-fixed cells, which simulate the organic layer to be destroyed and invaded by the tumor cells, become stained by trypan blue. Cell staining can be measured by microscopy or photometry:

Microscopy: For microscopic analysis, take pictures of numerous areas of the culture vessel and measure stained or unstained areas using your preferred software.

Photometry:

(a) For photometric analysis, add 300 µl lysis buffer (e.g., PolyATtract GTC extraction buffer) to each well.

(b) Incubate with shaking at room temperature for 30 min.

(c) Depending on the viscosity of the solution, it might be necessary to shear the chromatin. This can be done either by ultrasound or by aspiration of the solution through a 20- or 24-gauge needle several times.

(d) Transfer 100 µl of this solution into a well of a new 96-well plate and determine the optical density at 590 nm.

(e) A high optical density corresponds to a low tumor cell invasion. A sample without any tumor cell addition might be useful to determine the maximal optical density (no invasion at all). A sample without trypan blue staining will present the minimal density (maximal invasion, matrix completely destroyed).

3.5. Establishment of a Stable Transfected Cell Line with a Reporter Gene

3.5.1. Overview

A reporter gene with a strong viral vector permits not only longitudinal monitoring of cell trafficking and/or tumor growth but also monitoring of cell survival after engraftment into the mouse because constitutive expression of the reporter gene occurs only when the cell is alive. It is necessary to stably express the reporter gene to visualize all implanted cells, to get an accurate readout of tumor growth, and to ensure that the reporter expression is not diluted on cell division. As shown in Fig. 2, a molecular construct (plasmid) is obtained commercially that contains the reporter gene (DsRed2) and its promoter. The plasmid also contains a drug selection marker (e.g., neomycin-resistant gene [Neo]). The plasmid vector can be used directly to transfect the cells; the entry into cells is facilitated by cationic lipid-based transfecting agents or by electroporation.

After transfection, cells are subjected to a selection procedure in which only cells that have integrated the reporter gene in the genome will be selected. This is achieved by incubation of cells with a suitable antibiotic drug; cells not expressing the antibiotic-resistant gene (e.g., Neo) will be killed by the drug (e.g., neomycin). Clones of surviving cells will be collected and amplified for further characterization. The level of reporter gene expression may vary among the clones; therefore, the clone that stably expresses the highest level of the reporter gene is chosen.

Fig. 2. Representative reporter gene plasmid is shown with the reporter gene under the control of a eukaryotic or viral regulatory element (promoter 1). A drug selection marker, e.g., a neomycin-resistant gene with its promoter (promoter 2), is also present, allowing selection in both eukaryotic cells and bacteria.

3.5.2. Titrating G418 (Neomycin) to Establish a Kill Curve

Because each cell line has a different sensitivity to G418, you should determine the optimal concentration of drug for selection.

1. Split confluent cells 1:5 in 10 ml DMEM + 10% FCS media.

2. Transfer 0.5 ml cell suspension into a 24-well plate containing 500 µl of media with G418. Use a G418 range starting at 50 µg/ml, with the highest concentration at 1 mg/ml.

3. Use the lowest concentration of drug that begins to give massive cell death in 3 days and kills all of the cells within 2 weeks.
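When preparing the kill curve, keep in mind that adding 0.5 ml cell suspension to 500 µl G418 medium halves the drug concentration in the well. The sketch below works through this arithmetic under two assumptions that the protocol does not state explicitly: the range of 50 µg/ml to 1 mg/ml refers to the final concentration in the well, and the intermediate concentrations are hypothetical examples. The 200 mg/ml stock is the one listed in Subheading 2.4.

```python
STOCK_UG_PER_ML = 200_000.0  # 200 mg/ml G418 stock listed in Subheading 2.4


def premix_concentration(final_ug_per_ml, premix_ml=0.5, cells_ml=0.5):
    """Concentration the G418 medium needs before the cell suspension dilutes it."""
    return final_ug_per_ml * (premix_ml + cells_ml) / premix_ml


def stock_ul_per_ml_premix(premix_ug_per_ml):
    """Microliters of G418 stock per milliliter of G418 medium."""
    return premix_ug_per_ml / STOCK_UG_PER_ML * 1000.0


# Hypothetical intermediate steps between the endpoints of 50 and 1,000 ug/ml
final_series = [50, 100, 200, 400, 600, 800, 1000]
for final in final_series:
    premix = premix_concentration(final)
    print(f"{final:>5} ug/ml final -> {premix:>6.0f} ug/ml premix, "
          f"{stock_ul_per_ml_premix(premix):.2f} ul stock per ml medium")
```

For the lowest concentrations, the calculated stock volumes are too small to pipette accurately, so an intermediate dilution of the stock is advisable.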

3.5.3. Transfection and Drug Selection

1. Grow cells to ~80% confluence in complete medium and transfect your plasmid (preferably linearized, see Note 4) with the appropriate method, for example, LipofectAmine Plus (Invitrogen). Include a mock transfection control that contains only the transfection reagent but no DNA.

2. At 24–48 h after transfection, split the cells 1:10, 1:20, or 1:50 into two 15-cm plates containing 25 ml of DMEM + 10% FCS + the appropriate concentration of G418.

3. Observe cell growth every 2–3 days and change the medium with G418 every week, or more often if necessary. After 2–4 weeks, isolated colonies should begin to appear. At this time point, the mock-transfected control cells should be dead and you can proceed with the cloning.

3.5.4. Isolation of Drug-Resistant Clones

This can be done in one of two ways (see also Note 5):

(A) Adapted from Dario Neri’s laboratory, Institute of Pharmaceutical Sciences, ETHZ.

1. Take four to six (or more) 96-well plates and fill them with 80 µl of G418-supplemented medium. Detach cells using Accutase (gives no clumps!) and resuspend an aliquot of cells in a 15-ml Falcon tube. Then make one or two serial dilutions (e.g., one in ten) and determine the cell concentration with the cytometer. Check your dilutions until you can only count 1–5 cells in the cytometer.

2. Prepare a sufficient amount of a 3 cells/20 µl (150 cells/ml) dilution.

3. Let the cells grow in the wells for approximately 2 weeks. Approximately 1 week after plating, you can also add 100 µl of fresh G418 medium to each well to reestablish a high G418 concentration (G418 may be broken down with time) and to prevent contamination.

4. Once “large” colonies (approximately one sixth of the well diameter) are visible by eye when viewing the wells from the bottom of the plate, and the color of the medium starts to change (typically after ~2 weeks), colonies can be screened for expression by observing the plate under a fluorescence microscope.

5. Choose the highest expressing clones and transfer them to a 24-well plate. Let them grow and determine the protein expression levels again once the cells have reached a suitable cell number. At the level of 24- or 6-well plates, it is also possible to make a first cryotube of each selected clone.

6. Grow several best-expressing candidates up to the 75-cm² flask level.

(B) Adapted from The Morimoto Lab, Northwestern University:

1. Melt 1% low-melting agarose in a microwave and incubate the agarose in a water bath at 37°C until it has cooled down to 37°C. Mix 10 ml of 1% agarose and 10 ml of 2× DMEM + 20% FCS and pour it into a 15-cm plate. Leave the plate at room temperature for less than 30 min. Place the plate into a CO2 incubator.

2. Mark large, healthy, and well-separated colonies and put a colony separator (see Fig. 3) directly on the surface of the soft agarose. Apply gentle pressure to the top of the separator to prevent movement.

3. Add 100 µl of Accutase, pipetting several times gently to penetrate the soft agarose surrounding the isolated colony, and incubate at RT.

4. Place the Accutase-treated colony into one well of a 48-well plate containing 1 ml of DMEM + 10% FBS + G418.

5. Split clones that reach ~80% confluence into one 12-well plate and one 6-well plate. Use one plate for checking protein expression/induction. Check the protein expression of each clone by its fluorescence intensity under a fluorescence microscope.

6. Split only expression-positive clones into a 10-cm plate and store in liquid N2.

7. For subcloning, replate approximately 100 cells per plate. For example, immediately after splitting, take 10–100 µl of culture and replate it in a 10-cm plate.

8. Repeat steps 5–11 for up to six colonies to check for protein expression.

Fig. 3. Colony separator. These separators are usually made of glass or stainless steel. For successful colony separation, the lower end should be greased.

3.6. Image Analyses

Many optical in vivo imaging devices are available commercially. Suppliers include Hamamatsu, Berthold, Caliper Life Science, Raytest, CRI, LiCOR, GE Healthcare, and VisenMedical. Particular features of respective systems include multispectral imaging, three-dimensional quantification, the ability to image both fluorescence and bioluminescence in the same system, and the ability to overlay anatomical X-ray imaging with fluorescence imaging. Below is an outline of the procedure used to monitor tumor growth in mice using an in vivo imager purchased from Berthold. Four- to six-week-old SCID beige mice were inoculated orthotopically with PancTuI cells, stably expressing DsRed2 fluorescent protein. The Berthold LB983 NightOwl optical imager (EG&G Berthold, Bad Wildbad, Germany) was used to monitor tumor growth by detection of DsRed2 fluorescence. The imager contains a Peltier cooled backlit CCD camera (2,184 × 1,472 pixels) housed within a light-tight enclosure. The excitation source is a ring light used for epi-illumination, mounted 12 cm above the mice. For excitation, a 550-nm (10-nm) filter, and for emission, a 605-nm (55-nm) filter was used. The exposure time was 2 s. Using the WinLight 32 software (Berthold), fluorescent signals from the images were calculated by selecting a circular region of interest around the materials and integrating the signal from that area. Signals were expressed in ph/s (or fluorescent tumor area, see Note 6). In addition, color-enhanced overlays of fluorescence images on photographic images were created using the WinLight software (Fig. 4).
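The signal integration over a circular region of interest described above can be illustrated with a few lines of code. This is a minimal sketch on a toy image and does not reproduce the WinLight software; converting raw pixel values into ph/s requires the instrument calibration, which is not modeled here. The function name and the toy data are assumptions made for illustration.

```python
def integrate_circular_roi(image, center_row, center_col, radius):
    """Sum the pixel values whose centers lie within `radius` of the ROI center.

    Returns a tuple (integrated signal, number of pixels in the ROI).
    """
    total = 0.0
    n_pixels = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2:
                total += value
                n_pixels += 1
    return total, n_pixels


# Toy 5 x 5 image: background of 1, a bright pixel (10) with four neighbors of 5
image = [
    [1, 1, 1, 1, 1],
    [1, 1, 5, 1, 1],
    [1, 5, 10, 5, 1],
    [1, 1, 5, 1, 1],
    [1, 1, 1, 1, 1],
]
signal, n_pixels = integrate_circular_roi(image, center_row=2, center_col=2, radius=1)
print(f"Integrated signal: {signal} over {n_pixels} pixels")
```

Using the same ROI size and position rule for every animal and time point keeps the integrated signals comparable.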

Fig. 4. Tumor growth rate of the pancreatic adenocarcinoma cell line, PancTu1. Orthotopic implantation of 0.5 × 10⁶ stably transfected DsRed2 cells into SCID mice was performed. DsRed2 fluorescence emission was collected at regular intervals.

4. Notes

1. Aggregate discrimination in FACS analyses

The detection of aggregates based on FL2-A vs. FL2-W suffers from two notable limitations. First, the approach requires that the shape of the aggregate is different from that of a single cell. This is the case when a doublet passes the laser beam in single file, but not when the two cells cross the beam at the same time, one behind the other. In the latter case, the fluorescence profile of two G1 cells cannot be distinguished from that of a G2 cell. Second, in a population of cells that are heterogeneous in shape, an oblong G2 cell cannot be easily distinguished from a G1 doublet on the basis of peak or width vs. area. This can be the case, for example, with epithelial cells derived from tumor tissue. Therefore, cells should be in a single-cell suspension as much as possible. Vortexing pellets while adding solutions is important to minimize cell clumping. Fixation in ice-cold alcohol may also help. If there is obvious clumping, pass the cells through a 40- to 70-µm nylon mesh before analysis on the FACS and vortex each sample just before running it through the FACS. If the coefficient of variation (CV) of the G1 peak is high (i.e., 10% or more), estimation of the fraction of cells in different phases of the cell cycle is difficult. A broad CV may be due to insufficient fixation, RNA contamination, or a concentration of stain that is either too low or too high, leading to nonstoichiometric or nonspecific binding.

2. Spectrophotometric analyses in MTT assay

MTT enters the cell by endocytosis and is converted by dehydrogenases to insoluble purple formazan crystals, which are read spectrophotometrically after organic solvents have dissolved the crystals. The formazan formed (Subheading 3.2) has been shown to be unstable in solution at room temperature. Therefore, it is recommended to standardize the time before the plates are read in the spectrophotometer and to store the plates at 4°C if several plates need to be read at the same time.

The MTT assay is less effective if cells have been cultivated in the same media for many days, which may lead to underestimation of control and untreated samples.

3. Cell detachment in the invasion assay

The invasion assay (Subheading 3.4) relies on the invasion and/or digestion of the fibroblast layer by tumor cells, which leads to areas with mostly unstained cells. The dimensions of these unstained regions can be estimated semiquantitatively, and they correspond to the invasive potential of the tumor cells. Therefore, it is paramount that the confluent fibroblast layer is not detached while adding PBS or seeding the invasive cells. Particular care needs to be taken after DMSO treatment because the fibroblast layer can detach very easily.

4. Linear vs. circular plasmid for transfection

In the establishment of a stably transfected cell line (Subheading 3.5), circular or linearized plasmid can be used for transfection. It is recommended that linearized plasmid be used and that the plasmid be cut at a unique site that does not disrupt the coding sequence of the reporter gene or the drug resistance gene. Stable transfection with a circular plasmid may result in recombination into the genome at a random position within the plasmid and can lead to antibiotic-resistant clones that do not express the reporter gene. A screening of transfection methods and reagents (e.g., Amaxa, GT porator) may be initially required to obtain a high transfection efficiency for the cell line of interest.

5. Colony screening of stable transfectants

If the method of picking colonies described in Subheading 3.5.4 is not successful, a simpler method is to wait until sizeable colonies appear and then, with the help of an inverted microscope, mark the position of individual colonies on the underside of the dish with a marker pen. Then, in a sterile tissue culture hood, wash the plate with a little trypsin and add just enough fresh trypsin to cover the plate. To obtain individual colonies, use a 1-ml Pipetman to suck up the colony, using the marking on the underside of the plate to locate that colony.

6. Imaging of fluorescent tumor area

A disadvantage of optical imaging is the problem of depth sensitivity. Thus, for two-dimensional planar imaging, as described in Subheading 3.6, the emission signal detected depends on the proximity of the reporter gene to the surface. Tumors that grow close to the skin may be perceived as being larger than tumors growing deeper inside the animal. Hoffman and colleagues have demonstrated that for tumors of less than 1,500 mm³, measurement of the fluorescent tumor area detected by planar optical imaging correlated strongly with the tumor volume measured by MRI (17). Therefore, measurement of fluorescent tumor area may be a better indication of tumor growth and drug efficacy than total relative light unit count.
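The difference between fluorescent tumor area and total light count discussed in Note 6 can be made concrete with a small example. The sketch below counts the pixels above an intensity threshold and converts them into an area; the threshold, the pixel size, and the toy image are hypothetical values chosen for illustration, and this is not the procedure implemented in any particular imaging software.

```python
def fluorescent_area(image, threshold, pixel_area_mm2):
    """Area covered by pixels brighter than `threshold`, in mm^2."""
    n_pixels = sum(1 for row in image for value in row if value > threshold)
    return n_pixels * pixel_area_mm2


# Toy 5 x 5 image: background of 1, a bright pixel (10) with four neighbors of 5
image = [
    [1, 1, 1, 1, 1],
    [1, 1, 5, 1, 1],
    [1, 5, 10, 5, 1],
    [1, 1, 5, 1, 1],
    [1, 1, 1, 1, 1],
]
area_mm2 = fluorescent_area(image, threshold=4, pixel_area_mm2=0.01)
total_signal = sum(sum(row) for row in image)  # total count, including background
print(f"Fluorescent area: {area_mm2:.2f} mm^2, total signal: {total_signal}")
```

Because the area depends only on which pixels exceed the threshold, it is less affected by the depth-dependent brightness of the signal than the total count, provided the same threshold is applied to all images.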

Acknowledgments

ST acknowledges the contribution of MOIN-SH in supporting molecular imaging-based preclinical projects.

References

1. Ammerpohl, O., et al. (2007) Complementary effects of HDAC inhibitor 4-PB on gap junction communication and cellular export mechanisms support restoration of chemosensitivity of PDAC cells. Br J Cancer. 96(1): p. 73–81.

2. Orntoft, T.F., S.E. Petersen, and H. Wolf. (1988) Dual-parameter flow cytometry of transitional cell carcinomas. Quantitation of DNA content and binding of carbohydrate ligands in cellular subpopulations. Cancer. 61(5): p. 963–70.

3. Asklund, T., et al. (2004) Histone deacetylase inhibitor 4-phenylbutyrate modulates glial fibrillary acidic protein and connexin 43 expression, and enhances gap-junction communication, in human glioblastoma cells. Eur J Cancer. 40(7): p. 1073–81.

4. Svechnikova, I., O. Ammerpohl, and T.J. Ekstrom. (2007) p21waf1/Cip1 partially mediates apoptosis in hepatocellular carcinoma cells. Biochem Biophys Res Commun. 354(2): p. 466–71.

5. Ammerpohl, O., et al. (2004) HDACi phenylbutyrate increases bystander killing of HSV-tk transfected glioma cells. Biochem Biophys Res Commun. 324(1): p. 8–14.

6. Appelskog, I.B., et al. (2004) Histone deacetylase inhibitor 4-phenylbutyrate suppresses GAPDH mRNA expression in glioma cells. Int J Oncol. 24(6): p. 1419–25.

7. Tolboom, T.C. and T.W. Huizinga. (2007) In vitro matrigel fibroblast invasion assay. Methods Mol Med. 135: p. 413–21.

8. Casey, R.C., et al. (2003) Establishment of an in vitro assay to measure the invasion of ovarian carcinoma cells through mesothelial cell monolayers. Clin Exp Metastasis. 20(4): p. 343–56.

9. Trauzold, A., et al. (2005) CD95 and TRAF2 promote invasiveness of pancreatic cancer cells. FASEB J. 19(6): p. 620–2.

10. Glunde, K., A.P. Pathak, and Z.M. Bhujwalla. (2007) Molecular-functional imaging of cancer: to image and imagine. Trends Mol Med. 13(7): p. 287–97.

11. Kelloff, G.J., et al. (2005) The progress and promise of molecular imaging probes in oncologic drug development. Clin Cancer Res. 11(22): p. 7967–85.

12. Stell, A., et al. (2007) Multimodality imaging: novel pharmacological applications of reporter systems. Q J Nucl Med Mol Imaging. 51(2): p. 127–38.

13. Frangioni, J.V. (2003) In vivo near-infrared fluorescence imaging. Curr Opin Chem Biol. 7(5): p. 626–34.

14. Weissleder, R., et al. (1999) In vivo imaging of tumors with protease-activated near-infrared fluorescent probes. Nat Biotechnol. 17(4): p. 375–8.

15. Villalobos, V., S. Naik, and D. Piwnica-Worms. (2007) Current state of imaging protein-protein interactions in vivo with genetically encoded reporters. Annu Rev Biomed Eng. 9: p. 321–49.

16. Carver, B.S. and P.P. Pandolfi. (2006) Mouse modeling in oncologic preclinical and translational research. Clin Cancer Res. 12(18): p. 5305–11.

17. Bouvet, M., J. Spernyak, M.H. Katz, R.V. Mazurchuk, S. Takimoto, Y.M. Rustum, R. Bernacki, A.R. Moossa, and R.M. Hoffman. (2005) High correlation of whole-body red fluorescent protein imaging and magnetic resonance imaging on an orthotopic model of pancreatic cancer. Cancer Res. 65(21): p. 9829–33.

INDEX

A Adapter............................................................13, 102, 156, 158–160, 218–219, 222–223, 244, 255 Adenomatous polyposis .................................................. 171 Affymetrix .............. 64, 67, 69–72, 78, 80, 81, 85, 156, 158, 162, 165, 166, 173–186, 193, 194, 234, 239, 246–248, 250, 251, 255, 258, 266, 295, 325, 376, 391, 395, 399, 412, 415, 416, 421, 424, 425 Agar ........................................ 112, 143, 219, 220, 223–229 Agarose .................................................. 105, 110, 111, 116, 125, 129, 137, 146, 152, 153, 160, 161, 164, 175, 177, 180–182, 199, 201, 205, 210, 219, 220, 222, 248, 251, 283, 285, 286, 337, 343, 346, 431, 441, 442 Agilent® ...............................................................22, 24, 46, 67, 72, 78, 127, 145, 146, 246, 250, 285, 287, 295, 296, 306, 308, 323, 325, 330, 331, 337 Algorithm ...... 183, 184, 242, 265, 266, 370, 376, 377, 380, 381, 383–389, 391–392, 401, 402, 429 Alkaline phosphatase ......................................220, 226, 227 Amplification ............... 51, 57, 80, 103, 105, 114, 136, 138, 140–141, 145, 148–149, 156, 157, 159, 165, 173–176, 179, 181, 186, 187, 190, 192, 229, 232, 237, 239, 240, 242, 243, 248, 251, 265, 296, 305, 308, 310–313, 323–325, 330–331, 333–334, 337, 352, 353, 359 Antibody...........................32, 57, 58, 63, 72, 156, 158, 186, 215–217, 220, 226, 228, 234, 342 Antisense RNA (aRNA) ............... 135–138, 145, 146, 152, 330, 331, 333–335, 337 Arcturus................................................... 40, 138, 144, 284, 290, 295, 331 Array CGH .... 63, 72, 78, 79, 232, 236, 237, 239, 240, 242, 244, 245, 256–257, 264, 265, 369–370 Autoantibody ......................................................... 215–217

B Bacteriophage .........................................177, 179, 224, 225 Banking ......................................................................... 3 Benchmark ..................................... 376, 383, 386–388, 404 Biohazard ......................................................14, 15, 17, 26 Bioinformatic ...........................3, 75–77, 79, 81, 85, 91, 97, 232, 242, 366, 370, 411 Biomarker .................................... 6, 78, 215–217, 232, 294

Biotin..................................... 180, 190, 192, 251, 253–255, 296, 312, 313, 325 Biotinylation .......................................................... 190, 191 Bromophenol blue...................................111, 199–201, 204, 283, 286 Bovine serum albumin (BSA)........................112, 139, 141, 143, 146, 147, 149, 180, 226, 258, 331 Buffy Coat ........................................................... 13, 15–17

C Calibration..............................................162, 163, 375–405 Cancer Genome Anatomy Project (CGAP).................... 92 Carcinoma .................... 4, 9, 21, 27, 52, 216, 217, 239, 244, 280, 328, 418, 419 Chemotherapy .................................... 8, 232–234, 328, 341 Clinical information ............................................ 1–28, 242 Clinical trial ............................................................... 3, 234 Collection ................................2–3, 5–8, 10–13, 17, 20, 24, 25, 27, 28, 40, 43, 44, 46, 51–53, 66, 69, 90, 108, 144, 202, 245, 251–253, 296, 297, 305, 306, 312, 313, 330, 335, 347, 355, 357, 394, 411–413 Comparative genome hybridization ............................... 62, 63, 68, 72, 78–82, 232, 236, 237, 239–242, 244, 245, 247–249, 255–265, 369–370 Complementary DNA (cDNA) ............................... 50, 66, 69, 71, 79, 90, 92, 102, 103, 105, 108–110, 114, 116–117, 119–127, 129, 136–140, 145–148, 214, 217–222, 225, 228, 233, 241, 251–254, 310–313, 325, 331, 333, 334, 354, 356, 358–359, 377, 394, 395, 412, 416, 424 Complementary RNA (cRNA) .....................246, 251–254, 377, 394, 395 Coomassie ......................................................199, 201, 203 CpG ................................................................156, 157, 163, 165, 166, 244 Cresyl Violet ......................................... 41, 42, 46, 284, 290 Cryoconservation............................................................. 45 Cryosection ..........................................................41, 42, 46

D Database .................... 4, 19, 26, 27, 46, 67, 69, 71–75, 83, 84, 90, 91, 365, 366, 370, 410–411 2D-DIGE ............................................................. 197–210


Denhardt’s Solution................ 158, 162, 177, 183, 248, 263 Diethyl pyrocarbonate (DEPC) ...... 37, 110, 114, 115, 117, 129, 138, 146, 218, 221, 283–285, 289, 290, 330, 333, 336, 337 Differential Display ........................................... 95, 99–131 DIGE ............................................................. 197–210 Dimethyl sulfoxide (DMSO) .....10, 17, 140, 149, 158, 161, 162, 177, 183, 254, 262, 263, 437, 438, 444 Ditag .............................................................137, 140–142, 148–153 DNA ................................7, 50, 61, 91, 100, 136, 155–170, 172, 214, 232, 296, 329, 341, 351, 363, 377, 411, 428 dNTP ............. 105, 110–112, 139, 140, 143, 146, 149, 152, 158, 161, 177, 181, 218, 221, 222, 247, 248, 252, 256, 259, 343, 346, 350, 356 2D-PAGE ..................................................................... 198 Dry ice .............15, 16, 20, 41, 42, 52, 56, 60, 123, 139, 146, 283, 285, 287, 288, 297, 298, 314, 357 Ductal carcinoma in situ (DCIS) ...................................... 9

E Electrophoresis .......................102, 103, 105, 107, 108, 111, 120, 121, 140, 141, 152, 163, 197–210, 218, 222, 250, 251, 283, 285, 286 Eosin ............................ 33, 41, 42, 44, 46, 57, 59, 138, 144, 283, 284, 289, 295, 298, 315, 344 Epidermal growth factor receptor (EGFR) .................... 84, 341–350 Epigenomics .................................................................... 62 Ethanol .......................41, 42, 110, 111, 114, 115, 121–123, 140–144, 148–151, 175, 177–179, 185, 201, 205, 219, 222, 223, 246, 247, 249–251, 253, 256, 283, 284, 289, 290, 295, 296, 298, 304, 306, 312, 315, 323, 333, 335, 337, 343–346, 355–357, 431, 433 Ethical ......................................................................... 6, 25 Expressed sequence tag (EST) .............66, 89–97, 365, 423 Expression profiling..............62, 66–79, 136, 173, 202, 203, 209, 231, 233–235, 244, 245, 264, 279–282, 287, 294, 328, 332, 428

F Fluorescence .............................................. 32, 56, 122, 157, 162, 176, 177, 179, 192, 200, 205, 206, 214, 240, 265, 309, 329, 352, 354, 376, 378, 431, 433, 434, 441, 442 Fluorescence-activated cell sorting (FACS).................... 32, 431–434, 443 Fluorescence in situ hybridization (FISH) ............... 56–59, 240, 265 Fragmentation ....................... 156–161, 163–165, 174, 175, 180, 182, 183, 186–188, 194, 208, 248, 254, 259, 261, 262, 312–313, 330, 335

Functional enrichment ...................................366–368, 370

G Gene set analysis (GSA).................................366, 368–370 Genomic DNA........................................................91, 110, 114, 156, 159, 164, 172, 175, 177–180, 185, 187, 190, 191, 247, 248, 251, 256–258 Genomiphi® ........................................................... 157, 159 Genotype ............................... 173, 174, 176, 183–185, 187, 190, 192–193, 258, 341–350 Glass slide ...................... 33, 34, 37, 41, 42, 51, 56, 59, 138, 143, 144, 257, 283, 289, 301, 315, 317, 319, 320, 330 Glioblastoma ................................................................. 244 GoldenGate .................... 174–176, 178, 184, 186, 190–192

H Hematoxylin ...........................33–35, 57, 59, 138, 144, 283, 284, 289, 295, 298, 315, 344 Heterozygosity ............................... 172, 173, 192, 232, 369

I Illumina ......................................67, 70–72, 78, 80, 82, 125, 173–176, 178, 184–188, 192–194, 295 Immunohistochemistry .............. 55, 58, 198, 232, 234, 294 Ischemia ...........................................................5, 8, 21, 143 Iso-electric focussing (IEF) ............................200, 204, 210

K KEGG ...............................................................364, 365, 370

L Labelling ........................4, 13, 51, 69, 102, 104, 105, 112, 126, 156, 158, 161, 165, 246–248, 250, 251, 253, 254, 256, 262, 296, 311–313, 325, 330–331 Lambda phage ................................................225, 228, 229 Laser ...............................32, 33, 39–45, 105, 111, 123, 135, 199, 206, 208, 218, 283–285, 288, 295, 299–304, 308, 314, 315, 317–321, 347, 433, 434, 443 Laser capture .....................40, 135, 283, 288, 295, 299, 308 Leukemia ............................................................. 233–241 Ligation .................. 102, 147, 148, 156, 158, 160, 164, 174, 179–182, 191, 218–219, 222–223, 229, 259 Linear .............................136, 162, 183, 309, 310, 325, 354, 379–381, 383–386, 393, 402, 404, 444 Linkage disequilibrium ............................................ 80, 193 Liquid nitrogen (LN2) ................5, 9, 10, 15–18, 20, 33, 41, 42, 45, 143, 263, 288, 295, 297, 332, 357, 385 Lobular carcinoma in situ (LCIS) ..................................... 9 LongSAGE ........................................................... 135–153 Lymphoma ............................................................231, 233, 235–241, 243, 244, 268

M Manual .................................. 12, 31–46, 55, 109, 135–153, 161, 162, 188, 336, 344, 432 Meta-analysis .............................................73, 75, 409–425 Methylation ...................................................155–170, 244, 245, 428 Microdissection ................................... 24, 31–46, 135–153, 264, 280, 283, 285, 288–290, 295, 297–299, 301, 302, 344 Microsatellite ....................................................80, 171, 172 Multiplex ....................................................................... 14

N Non-polyposis colon cancer........................................... 171 Normalization ........................102, 157, 162, 163, 166, 167, 383–387, 389, 402, 403, 425 Northern blot ................................. 108, 109, 112, 125–127 Nylon membrane ........................................................... 112

O Oncogene ........................................... 80, 82–84, 100, 106, 197, 214, 233, 235–237, 239, 240, 242–244

P Pancreas Cancer .........................................4, 279–281, 355 PCR tube ........................41, 110, 111, 178, 181–183, 260, 310–312, 358, 359 PEG ...............................................................219, 222, 229 Peutz Jeghers syndrome ................................................. 171 Phage .............................. 131, 219, 220, 223–225, 227–229 Phenotype ............................................... 32, 155, 235, 240, 341–350, 413, 417 PicoGreen® ........................................................... 175–177, 179, 187 Plasma ........................................................13–16, 114, 202 Poly-A RNA ................................................................. 252 Polyacrylamide .......................103, 105, 121, 137, 149, 150, 198, 200, 205, 218 Polyacrylamide gel electrophoresis (PAGE) ................. 198, 200, 201, 205, 210, 218, 369 Polymerase.... 24, 59, 66, 102, 110–112, 119, 120, 136, 139, 140, 143, 145, 146, 149, 152, 156, 158, 161, 172, 176, 177, 181, 190, 218, 222, 232, 248, 251, 252, 256, 259, 333, 334, 343, 346, 351–361 Polymerase chain reaction (PCR) ................ 24, 41, 66, 102, 137, 156, 172, 218, 232, 310, 334, 343, 351–361, 387 Preprocessing ......................... 376, 382–384, 386–388, 399, 401, 404, 412–413, 424 Prostate Cancer .............8, 9, 33, 40, 81, 243, 244, 293–325 Proteinase K ........................... 157, 159, 163, 250, 343, 345 Proteomics .........................39, 197–210, 214, 218, 424, 429

Q Quality control (QC)......................... 2, 11–24, 58, 72, 162, 174, 175, 180, 184, 189–190, 194, 250, 251, 302, 325, 334 Quantitative RT-PCR (qRT-PCR) ...............108, 352, 354

R Random primer .............................. 136, 176, 247, 256, 356 Raw data ......................................... 162–163, 173, 192, 410 Real time ...................24, 108, 126, 136, 334, 345, 351–361 Record ........................................5, 8, 10, 11, 13, 15–17, 20, 27, 28, 300, 309, 310, 423 Response prediction............................................... 327–338 Restriction Enzyme ....................... 102, 156, 157, 177, 179, 180, 219, 221, 223, 258–259 Reverse transcription (RT) .... 102, 108, 110, 114, 116–120, 126–129, 218, 221, 333, 354–359, 442 RNA ............................. 5, 33, 40, 50, 61, 89, 102, 135–152, 174, 197, 218, 232, 280, 294, 328, 351, 377, 412, 430 RNA in-situ hybridization (RNA-ISH) ................... 56, 57 RNA integrity number (RIN) ....................................... 250 RNAse ..............................42, 110, 111, 128, 129, 138, 143, 144, 146, 163, 218, 221, 222, 229, 249–254, 282, 284–286, 289, 290, 296, 304, 308, 322, 323, 330, 333, 355–357, 431, 433 RNaseH..........................................................137, 139, 146 RT-PCR ................................. 102, 108, 126, 128, 356, 359

S Safety ....................................................................6, 26, 361 SEREX .................................................................. 218, 221 Serial analysis of gene expression (SAGE) ................... 100, 135, 136, 138, 152, 153 Serum ...............................10, 11, 13–17, 27, 177, 199, 203, 215, 217, 218, 226, 331, 430 Single nucleotide polymorphism (SNP) ........62, 68, 71, 78, 80–81, 171–194, 232, 239–242, 248–249, 251, 258, 265, 267, 345, 346 Standard operating procedures (SOPs) .....................2, 3, 6, 11–20, 24, 143 Storage ................. 3, 5, 9–12, 14–19, 24, 27, 41, 42, 51, 52, 82, 143, 164, 174, 176, 178, 187, 193, 207, 210, 254, 295–297, 303, 313, 315, 328, 332, 337, 345, 349 Superfrost plus ........................................................... 41, 42 Superscript™ ...................................................146, 252, 378 SYBR Green ..........................................141, 142, 149, 150

T T4 Ligase .....................................................140, 142, 147, 148, 150, 223 T7-Oligo(dT) Primer.................................................... 252

Taq .........................................110, 111, 119, 120, 140, 143, 149, 152, 177, 181, 248, 259, 343, 346, 352, 353, 356, 359 Taqman® .................................................352, 353, 356, 359 Target ............................. 39, 69, 80, 85, 101, 102, 107, 108, 165, 172, 189, 190, 232, 234–237, 240, 242, 244, 246, 251, 254, 262–263, 301, 315, 317, 319, 320, 342, 352, 376–379, 394, 398, 427–445 TBE .............................................. 111, 121, 122, 177, 248, 251, 260, 262, 343, 347 TE ..................................112, 141, 142, 149, 151, 177–180, 187, 246, 248, 258, 308–310, 324, 325, 337, 343, 344 Tetramethylammonium chloride (TMACL) ............... 158, 162, 177, 183, 248, 263 Tissue microarray (TMA) ......................49–60, 63, 72, 158 Tissue repository ...............2, 3, 5–10, 12, 14, 18–21, 23–28 Tissue Tek® ..................52, 56, 138, 143, 183, 287, 295, 296

TRIzol® ............................................. 221, 246, 249, 328, 332, 333, 336 Tumor associated autoantibodies (TAAB) ............ 215–218 Tumor suppressor gene ................ 80, 82, 84, 100, 101, 109, 214, 237, 239, 243 Tween-20 ...................................... 162, 177, 183, 220, 263

V Validation ..................................................80, 83, 334, 399

W Western blot .......................................................... 198, 218 Whole genome ........................ 91, 172, 174, 176, 178, 187, 193, 265, 346, 366

X Xylol .......................................................................... 41, 42