volume 28 number 10 october 2010
Focus on: epigenetics
Editorial
© 2010 Nature America, Inc. All rights reserved.
1031 Making a mark
The painting Histone Subunit Exchange portrays the dynamic nature of nucleosome structure. This issue focuses on the role of epigenetics in health and disease and discusses the therapeutic prospects of targeting the epigenetic machinery. Credit: David Sweatt.
Opinion and comment
Commentary
1033 Linking cell signaling and the epigenetic machinery
     Helai P Mohammad & Stephen B Baylin
1039 Tackling the epigenome: challenges and opportunities for collaboration
     John S Satterlee, Dirk Schübeler & Huck-Hui Ng
1045 The NIH Roadmap Epigenomics Mapping Consortium
     Bradley E Bernstein, John A Stamatoyannopoulos, Joseph F Costello, Bing Ren, Aleksandar Milosavljevic, Alexander Meissner, Manolis Kellis, Marco A Marra, Arthur L Beaudet, Joseph R Ecker, Peggy J Farnham, Martin Hirst, Eric S Lander, Tarjei S Mikkelsen & James A Thomson
1049 Epigenomics reveals a functional genome anatomy and a new approach to common disease
     Andrew P Feinberg
Computational biology
Commentary
1053 Putting epigenome comparison into practice
     Aleksandar Milosavljevic
Research
Reviews
1057 Epigenetic modifications and human disease
     Anna Portela & Manel Esteller
1069 Epigenetic modifications as therapeutic targets
     Theresa K Kelly, Daniel D De Carvalho & Peter A Jones
1079 Epigenetic modifications in pluripotent and differentiated cells
     Alexander Meissner
1089 Genomics tools for unraveling chromosome architecture
     Bas van Steensel & Job Dekker

Nature Biotechnology (ISSN 1087-0156) is published monthly by Nature Publishing Group, a trading name of Nature America Inc. located at 75 Varick Street, Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York, NY 10013-1917. Tel: (212) 726 9335, Fax: (212) 696 9753. Annual subscription rates: USA/Canada: US$250 (personal), US$3,520 (institution), US$4,050 (corporate institution). Canada add 5% GST #104911595RT001; Euro-zone: €202 (personal), €2,795 (institution), €3,488 (corporate institution); Rest of world (excluding China, Japan, Korea): £130 (personal), £1,806 (institution), £2,250 (corporate institution); Japan: Contact NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746. POSTMASTER: Send address changes to Nature Biotechnology, Subscriptions Department, 342 Broadway, PMB 301, New York, NY 10013-3910. Authorization to photocopy material for internal or personal use, or internal or personal use of specific clients, is granted by Nature Publishing Group to libraries and others registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification code for Nature Biotechnology: 1087-0156/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed by Publishers Press, Inc., Lebanon Junction, KY, USA. Copyright © 2010 Nature America, Inc.
All rights reserved. Printed in USA.
Editorial
987 Teetering on the brink
News
Uncertain future for ESC research, p 987 and p 991
989 Geron trial resumes, but standards for stem cell trials remain elusive
990 China’s $2.4 billion splurge
991 US courts throw ES cell research into disarray
992 Drug user fees top $1 million
992 Sugar beets still in the game
992 Roche backs Aileron’s stapled peptides
994 Life swallows Ion Torrent
994 Anti-anemics price hike
994 Genzyme resumes shipping as Sanofi-aventis hovers
995 Cancer research fund launches biologics pilot plant
996 Wellcome partners with India
996 Hungary eyes biotech jobs
996 Monsanto relaxes restrictions on sharing seeds for research
997 Newsmaker: Constellation Pharmaceuticals
998 Data page: Drug pipeline: Q310
999 News feature: Turning the tide in lung cancer
1003 News feature: At the heart of genetic testing
Bioentrepreneur
Building a business
1007 Why you need a lawyer
     Craig Shimasaki
Screening for rare heart conditions, p 1003
Opinion and comment
Correspondence
1010 Safe and effective synthetic biology
1012 The regulatory bottleneck for biotech specialty crops
1015 ProHits: integrated software for mass spectrometry–based interaction proteomics
1017 More sizzle than fizzle
Commentary
1018 Case study: The path less costly
     Brady Huggett
Feature
Patents
1019 Faculty and employee ownership of inventions in Australia
     Amanda McBratney & Julie-Anne Tarr
1023 Recent patent applications in gene synthesis
1023 Selected patent expirations/extensions in the second half of 2010
IP windfall for faculty Down Under, p 1019
News and Views
1025 Timing is everything in the human embryo (see also p 1115)
     Ann A Kiessling
1026 Taking the measure of the methylome (see also p 1097 and p 1106)
     Stephan Beck
1028 Tracing cancer networks with phosphoproteomics
     David B Solit & Ingo K Mellinghoff
1030 Research highlights
Compound-directed biomarker discovery, p 1028
Research
Analysis
1097 Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications (see also p 1026)
     R A Harris, T Wang, C Coarfa, R P Nagarajan, C Hong, S L Downey, B E Johnson, S D Fouse, A Delaney, Y Zhao, A Olshen, T Ballinger, X Zhou, K J Forsberg, J Gu, L Echipare, H O’Geen, R Lister, M Pelizzola, Y Xi, C B Epstein, B E Bernstein, R D Hawkins, B Ren, W-Y Chung, H Gu, C Bock, A Gnirke, M Q Zhang, D Haussler, J R Ecker, W Li, P J Farnham, R A Waterland, A Meissner, M A Marra, M Hirst, A Milosavljevic & J F Costello
1106 Quantitative comparison of genome-wide DNA methylation mapping technologies (see also p 1026)
     C Bock, E M Tomazou, A B Brinkman, F Müller, F Simmer, H Gu, N Jäger, A Gnirke, H G Stunnenberg & A Meissner
Benchmarking DNA methylation analysis, p 1097 and p 1106
Article
1115 Non-invasive imaging of human embryos before embryonic genome activation predicts development to the blastocyst stage (see also p 1025)
     C C Wong, K E Loewke, N L Bossert, B Behr, C J De Jonge, T M Baer & R A Reijo Pera
Letter
1123 Substrate elasticity provides mechanical signals for the expansion of hemopoietic stem and progenitor cells
     J Holst, S Watson, M S Lord, S S Eamegdool, D V Bax, L B Nivison-Smith, A Kondyurin, L Ma, A F Oberhauser, A S Weiss & J E J Rasko
1129 Errata and corrigenda
Careers and recruitment
1131 Portfolio managing for scientists
     David Sable
1132 People
Insights into early human development, p 1115
In this issue
Focus on epigenetics

Driven by new technologies that allow easier mapping of chromatin modifications and DNA methylation, the field of epigenetics has seen an enormous increase both in new insights and in attention by the scientific community and the wider public. This issue of Nature Biotechnology surveys the current state of the field and sketches out where the future might take us.

Much of the renewed interest in epigenetics stems from the discovery that aberrant placement of epigenetic marks is strongly correlated with a diverse range of human diseases, most notably cancer [Review, p. 1057]. A first generation of drugs manipulating global DNA methylation and histone acetylation patterns has shown therapeutic effects, particularly in certain hematologic cancers, and there is considerable excitement about the use of anomalously placed epigenetic marks for diagnostic purposes. Currently, a second generation of compounds that target other histone modifications and microRNAs is under development, whereas more site-specific manipulations of chromatin states lie in the more distant future [Review, p. 1069].

In normal cells, the epigenetic code is written during development or upon differentiation of adult stem cells. The understanding of the epigenetic mechanisms involved in cell lineage commitment is therefore indispensable for the targeted manipulation of embryonic, adult or induced pluripotent stem cells [Review, p. 1079]. In addition to technologies that map the classic epigenetic mechanisms of covalent DNA methylation and histone modifications, a new breed of genome-wide technologies for mapping the three-dimensional structure of the DNA in the nucleus is allowing first glimpses into a previously less appreciated level of transcriptional control [Review, p. 1089].
Currently, we have only a rudimentary understanding of the mechanisms that guide the placement of epigenetic marks at specific sequences in the genome and how epigenetic regulation ties in with the rest of the cellular machinery that controls gene expression. The elucidation of these pathways needs to be a primary focus of future research [Commentary, p. 1033]. To obtain a more complete picture of the diversity of the patterns of epigenetic modifications, several large-scale efforts are underway to provide reference measurements of the localization of histone and methylation marks in many different cell types, individuals and disease states [Commentary, p. 1039]. In particular, the National Institutes of Health Roadmap Epigenomics Mapping Consortium is poised to contribute substantially to this goal [Commentary, p. 1045]. Beyond the projects currently underway, a thorough appreciation of the involvement of epigenetics in human diseases will require the analysis of large-scale cohorts for DNA methylation and chromatin modifications, which will entail the development of new approaches that drive down the cost and increase the throughput of epigenetic assays [Commentary, p. 1049]. The scale and heterogeneity of epigenomic data present computational challenges that may be addressed by comparative analysis, which has been a successful strategy for filtering signal from noise and for identifying functional elements in genomes [Commentary, p. 1053].

A word of thanks goes to our sponsors, EpiNova DPU (a GlaxoSmithKline discovery unit), Cellzome, and Active Motif, whose generous support was essential in producing this supplement, the contents of which will be freely available online for six months. Finally, we wish to express our gratitude to the authors of the commentaries and reviews in this focus. ME
Written by Kathy Aschheim, Markus Elsner & Michael Francisco

A window onto human development

In a study that may improve the success rates of in vitro fertilization (IVF) clinics, Reijo Pera and colleagues have analyzed a large set of human embryos by time-lapse microscopy, revealing new details about early human development. Working with 242 IVF embryos, far more than in previous studies, the authors search for visual cues that would allow them to predict whether a 2-day-old embryo will develop into a blastocyst at day 5 or 6. Embryos that meet three criteria have a high likelihood of reaching the blastocyst stage: a first cytokinesis lasting 0–33 minutes, an interval of 7.8–14.3 hours between the first and second mitoses and an interval of 0–5.8 hours between the second and third mitoses. New imaging software tracks these three parameters automatically. The authors also measure gene expression in whole embryos and in individual blastomeres and discover that the cells of a single embryo are strikingly heterogeneous, with some cells dominated by maternal transcripts and others having activated embryonic transcription. If implemented in clinical IVF programs, this noninvasive imaging approach could increase the chances of selecting embryos that will lead to successful pregnancies. [Articles, p. 1115; News and Views, p. 1025] KA
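The three timing windows quoted above amount to a simple rule, which can be sketched in a few lines of Python. This is only an illustration of the criteria as summarized here, not the authors' imaging software: the names `EmbryoTimings` and `meets_all_criteria` are hypothetical, and treating both ends of each window as inclusive is an assumption.

```python
from dataclasses import dataclass

# Windows as quoted in the summary; inclusive bounds are an assumption.
FIRST_CYTOKINESIS_MIN = (0.0, 33.0)   # duration of first cytokinesis, minutes
MITOSIS_1_TO_2_HOURS = (7.8, 14.3)    # first to second mitosis, hours
MITOSIS_2_TO_3_HOURS = (0.0, 5.8)     # second to third mitosis, hours


@dataclass(frozen=True)
class EmbryoTimings:
    """Hypothetical record of the three measured parameters for one embryo."""
    first_cytokinesis_min: float
    mitosis_1_to_2_hours: float
    mitosis_2_to_3_hours: float


def _within(value: float, window: tuple) -> bool:
    """Return True if value lies inside the closed interval window."""
    low, high = window
    return low <= value <= high


def meets_all_criteria(t: EmbryoTimings) -> bool:
    """True only if all three timing parameters fall inside their windows."""
    return (
        _within(t.first_cytokinesis_min, FIRST_CYTOKINESIS_MIN)
        and _within(t.mitosis_1_to_2_hours, MITOSIS_1_TO_2_HOURS)
        and _within(t.mitosis_2_to_3_hours, MITOSIS_2_TO_3_HOURS)
    )


if __name__ == "__main__":
    example = EmbryoTimings(15.0, 11.0, 1.0)  # made-up values for illustration
    print(meets_all_criteria(example))
```

An embryo failing any single window is rejected by this rule; how the published method weighs borderline values is not stated in the summary.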
Benchmarking DNA methylation mapping

Over the next few years, the DNA methylation patterns of at least 1,000 cell types will be determined in an international effort to create high-quality reference methylomes. In addition, many researchers investigate methylation profiles in their own projects using a multitude of different methods. So far, it has remained unclear how these methods compare in terms of accuracy, cost and genome coverage, and how well the methylation maps derived from the different technologies correspond to each other. Bock et al. and Harris et al. present a systematic comparison of the most commonly used technologies. Harris et al. compare four techniques that use high-throughput sequencing as readout and detect methylated cytosines either by bisulfite conversion or affinity enrichment of sequences with methylated cytosines. Bock et al. evaluate three of the sequencing-based methods and one methylation-sensitive array. Overall, both studies find an encouragingly high concordance between the methylation calls made by the different methods, although the methods differ significantly in genome coverage and cost per cytosine assayed. [Analysis, p. 1106, p. 1097; News and Views, p. 1026] ME

Stem cells and elasticity

Biomechanical forces such as shear stress and elasticity are known to influence the behavior of certain types of stem cell. Rasko and colleagues have now investigated the effects of elasticity on hematopoietic stem and progenitor cells. Mouse bone marrow cells or human cord blood cells are cultured on dishes coated with tropoelastin, the precursor of elastin, which confers elasticity to the skin and other tissues. Culture on tropoelastin leads to a several-fold expansion of primitive hematopoietic cell populations. The increase in cell numbers is similar to that achieved by a cytokine cocktail, and the two effects are additive. These findings suggest that manipulation of substrate elasticity may be a valuable complement to other strategies for in vitro expansion of hematopoietic stem cells. [Letters, p. 1123] KA
Patent roundup

A recent decision by the Australian High Court means that, unless faculty are bound by an assignment or intellectual property policy, they may own inventions resulting from their research. McBratney and Tarr discuss the case’s implications for inventors and the prospects of Bayh-Dole-style legislation coming to fruition in Australia. [Patent Article, p. 1019] MF

Recent patent applications in gene synthesis. [New patents, p. 1023] MF
Next month in
• Differentiation of hES cells towards chondrocytes
• Antibody discovery using small libraries
• pH-dependent binding prolongs antibody longevity
• Multicolor in situ hybridization in whole embryos
• Vascular stem cells cultured for natural products
www.nature.com/naturebiotechnology
EDITORIAL OFFICE
[email protected] 75 Varick Street, Fl 9, New York, NY 10013-1917 Tel: (212) 726 9200, Fax: (212) 696 9635 Chief Editor: Andrew Marshall Senior Editors: Laura DeFrancesco (News & Features), Kathy Aschheim (Research), Peter Hare (Research), Michael Francisco (Resources and Special Projects) Business Editor: Brady Huggett Associate Business Editor: Victor Bethencourt News Editor: Lisa Melton Associate Editors: Markus Elsner (Research), Craig Mak (Research) Editor-at-Large: John Hodgson Contributing Editors: Mark Ratner, Chris Scott Contributing Writer: Jeffrey L. Fox Senior Copy Editor: Teresa Moogan Managing Production Editor: Ingrid McNamara Production Editor: Amanda Crawford Senior Illustrator: Katie Vicari Illustrator: Marina Corral Cover design: Erin DeWalt Senior Editorial Assistant: Ania Levinson
MANAGEMENT OFFICES
NPG New York 75 Varick Street, Fl 9, New York, NY 10013-1917 Tel: (212) 726 9200, Fax: (212) 696 9006 Publisher: Melanie Brazil Executive Editor: Veronique Kiermer Chief Technology Officer: Howard Ratner Head of Nature Research & Reviews Marketing: Sara Girard Circulation Manager: Stacey Nelson Production Coordinator: Diane Temprano Head of Web Services: Anthony Barrera Senior Web Production Editor: Laura Goggin
NPG London The Macmillan Building, 4 Crinan Street, London N1 9XW Tel: 44 207 833 4000, Fax: 44 207 843 4996 Managing Director: Steven Inchcoombe Publishing Director: Peter Collins Editor-in-Chief, Nature Publications: Philip Campbell Marketing Director: Della Sar Director of Web Publishing: Timo Hannay
NPG Nature Asia-Pacific Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843 Tel: 81 3 3267 8751, Fax: 81 3 3267 8746 Publishing Director, Asia-Pacific: David Swinbanks Associate Director: Antoine E. Bocquet Manager: Koichi Nakamura Operations Director: Hiroshi Minemura Marketing Manager: Masahiro Yamashita Asia-Pacific Sales Director: Kate Yoneyama Asia-Pacific Sales Manager: Ken Mikami
DISPLAY ADVERTISING
[email protected] (US/Canada)
[email protected] (Europe)
[email protected] (Asia) Global Head of Advertising and Sponsorship: Dean Sanderson, Tel: (212) 726 9350, Fax: (212) 696 9482 Global Head of Display Advertising and Sponsorship: Andrew Douglas, Tel: 44 207 843 4975, Fax: 44 207 843 4996 Asia-Pacific Sales Director: Kate Yoneyama, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746 Display Account Managers: New England: Sheila Reardon, Tel: (617) 399 4098, Fax: (617) 426 3717 New York/Mid-Atlantic/Southeast: Jim Breault, Tel: (212) 726 9334, Fax: (212) 696 9481 Midwest: Mike Rossi, Tel: (212) 726 9255, Fax: (212) 696 9481 West Coast: George Lui, Tel: (415) 781 3804, Fax: (415) 781 3805 Germany/Switzerland/Austria: Sabine Hugi-Fürst, Tel: 41 52761 3386, Fax: 41 52761 3419 UK/Ireland/Scandinavia/Spain/Portugal: Evelina Rubio-Hakansson, Tel: 44 207 014 4079, Fax: 44 207 843 4749 UK/Germany/Switzerland/Austria: Nancy Luksch, Tel: 44 207 843 4968, Fax: 44 207 843 4749 France/Belgium/The Netherlands/Luxembourg/Italy/Israel/Other Europe: Nicola Wright, Tel: 44 207 843 4959, Fax: 44 207 843 4749 Asia-Pacific Sales Manager: Ken Mikami, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746 Greater China/Singapore: Gloria To, Tel: 852 2811 7191, Fax: 852 2811 0743 NATUREJOBS
[email protected] (US/Canada)
[email protected] (Europe)
[email protected] (Asia) US Sales Manager: Ken Finnegan, Tel: (212) 726 9248, Fax: (212) 696 9482 European Sales Manager: Dan Churchward, Tel: 44 207 843 4966, Fax: 44 207 843 4596 Asia-Pacific Sales & Business Development Manager: Yuki Fujiwara, Tel: 81 3 3267 8765, Fax: 81 3 3267 8752 SPONSORSHIP
[email protected] Global Head of Sponsorship: Gerard Preston, Tel: 44 207 843 4965, Fax: 44 207 843 4749 Business Development Executive: David Bagshaw, Tel: (212) 726 9215, Fax: (212) 696 9591 Business Development Executive: Graham Combe, Tel: 44 207 843 4914, Fax: 44 207 843 4749 Business Development Executive: Reya Silao, Tel: 44 207 843 4977, Fax: 44 207 843 4996 SITE LICENSE BUSINESS UNIT Americas: Tel: (888) 331 6288 Asia/Pacific: Tel: 81 3 3267 8751 Australia/New Zealand: Tel: 61 3 9825 1160 India: Tel: 91 124 2881054/55 ROW: Tel: 44 207 843 4759
[email protected] [email protected] [email protected] [email protected] [email protected]
CUSTOMER SERVICE www.nature.com/help
Senior Global Customer Service Manager: Gerald Coppin
For all print and online assistance, please visit www.nature.com/help
Purchase subscriptions:
Americas: Nature Biotechnology, Subscription Dept., 342 Broadway, PMB 301, New York, NY 10013-3910, USA. Tel: (866) 363 7860, Fax: (212) 334 0879
Europe/ROW: Nature Biotechnology, Subscription Dept., Macmillan Magazines Ltd., Brunel Road, Houndmills, Basingstoke RG21 6XS, United Kingdom. Tel: 44 1256 329 242, Fax: 44 1256 812 358
Asia-Pacific: Nature Biotechnology, NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 3 3267 8751, Fax: 81 3 3267 8746
India: Nature Biotechnology, NPG India, 3A, 4th Floor, DLF Corporate Park, Gurgaon 122002, India. Tel: 91 124 2881054/55, Tel/Fax: 91 124 2881052
REPRINTS
[email protected] Nature Biotechnology, Reprint Department, Nature Publishing Group, 75 Varick Street, Fl 9, New York, NY 10013-1917, USA. For commercial reprint orders of 600 or more, please contact: UK Reprints: Tel: 44 1256 302 923, Fax: 44 1256 321 531 US Reprints: Tel: (617) 494 4900, Fax: (617) 494 4960
News feature
At the heart of genetic testing
Genetic testing for rare heart conditions might someday expand to more common cardiac ailments. Already there are signs testing is dramatically changing how some conditions are treated and doctors’ definition of who a patient is. Stephen Strauss reports.

It has not been a very happy year for those hoping that genetic testing was going to revolutionize our ability to predict who was and who wasn’t going to come down with major heart diseases. Not to mention using that knowledge to do something about the conditions. In February, an article in the Journal of the American Medical Association found that when 19,000 American women were followed an average of 12 years, an analysis of their genetic differences “did not improve cardiovascular risk prediction”1. The catchline of an article in Science magazine in June declared “So far, genome-wide association studies have not found common genes with a big impact on heart health”2. And The New York Times also in June declared “after 10 years of effort, geneticists are almost back to square one in knowing where to look for the roots of common disease”3.

Buried beneath the gloom is what might be termed a good news asterisk. It reads: none of the above is true if we shift our gaze from common heart conditions to a wide range of less common, but genetically linked, cardiac diseases. Over the past five years or so, testing for gene mutations connected to them has been transforming how doctors diagnose illnesses, treat patients and expand that treatment to include family members.

It has also given birth to a commercial gene testing industry that believes it is perched on the brink of a major leap forward.

Preventing early death

The difference between what is happening in the two domains is so acute that David Margulies, cofounder and CEO of Correlagen, a Waltham, Massachusetts–based genetic diagnostics company (now a LabCorp subsidiary), is absolutely tart in his criticism of any linkage between what are called genome-wide association studies in heart disease and monogenic sequencing tests for gene-specific heart conditions. “It’s like comparing apples to zebras,” he says.

A classic example of a testing apple can be seen in the use the Canadian province of Newfoundland and Labrador has been making of genetic screening for the heart disorder known as arrhythmogenic right ventricular cardiomyopathy (ARVC). ARVC causes a fatty buildup in the heart, which often without warning generates a highly irregular heartbeat and then no heartbeat at all. ARVC has become infamous in the world of sport as one explanation for why previously ostensibly healthy athletes suddenly collapse after a competition.

[Photo: Cameroon soccer star Marc-Vivien Foe collapsed and died of hypertrophic cardiomyopathy at the age of 28. Credit: Corbis.]

“ARVC often goes undetected until a person drops dead,” says Kathy Hodgkinson, a geneticist and clinical epidemiologist at Memorial University in St. John’s, Newfoundland, who wrote her PhD thesis on ARVC genetics in the province. “And it appears in Newfoundland families more often than it does elsewhere.”

Although it is estimated that ARVC worldwide afflicts roughly 1-in-5,000 people, that number may be as high as 1-in-1,000 people in Newfoundland. The high incidence is the fruit of a highly penetrant mutation, which a study of old records and family bibles suggests first appeared in the late 1700s in descendants of a British immigrant.

After ARVC was first clinically described in the 1980s, researchers at Memorial University in St. John’s, Newfoundland began to study the genetics of the early and unexpected heart attacks occurring in the province. In 1997, the group initiated a formal search for the gene that gives rise to the condition. They first localized it to chromosome 3 and then in 2007 uncovered the exact gene where the Newfoundland-rooted mutation occurs.

This research has, first of all, allowed scientists to get a more precise measurement of the deadly demographics of ARVC in Newfoundland. When 18 extended families carrying the mutations were studied (the largest one comprising 1,200 people with records of heart deaths extending over ten generations), it turned out that the median age of death for men is 41. Women, probably because of the mitigating effect of estrogen, on average die at 71.

But equally important, when you know who carries the gene defect, there is something you can do about it. Newfoundland doctors are now counseling family members with the mutation to have a cardiac defibrillator implanted. The recommendation is being made to boys in their late teens and girls in their late 20s, even if there is no overt sign of any heart disease.

With their families’ history of early deaths on their minds, implantation of an implantable cardioverter defibrillator (ICD) is an option that Newfoundlanders are seizing upon. By 2009, 104 adults who carry the mutation had been offered an ICD, and only nine had refused implantation.

And the intervention is working. Last year the Memorial researchers reported that the five-year mortality rate in men who had an ICD implanted was zero. This compares with a death rate of 28% for men who didn’t have the implantation.

“We have been able to take a heart attack, which in the past was seen as an act of God, explain it as an act of genetics, and then do something to keep their genetics from prematurely killing people,” says Terry-Lynn Young, a professor of molecular genetics at Memorial University, who has been spearheading the study of the mutation in the province.
Spawning diagnostics

The gene became part of a generalized ARVC screening test that Newton, Massachusetts–based PGxHealth offers for five genes associated with variants of the condition. But more significantly, it has now become part of PGxHealth’s suite of heart disease gene screening. Having begun in 2004 with a test for long QT syndrome (LQTS), which is also a sudden and unexpected heart killer, PGxHealth now tests for six separate heart conditions.

In total, upwards of 100 genes associated with genetically linked heart conditions are being screened for by various companies (Table 1). The tests have become increasingly sophisticated and can now quantify the percentage of cases that can be linked to each individual gene mutation. The differences are rather striking.
Table 1 Genetics of rare heart conditions
Disease | Number of genes | Approximate frequency | Treatments
Hypertrophic cardiomyopathy | 17 | 1 in 500 | Beta blockers, implantable cardioverter-defibrillator, lifestyle changes
Dilated cardiomyopathy | 23 | 1 in 2,500 | Avoidance of alcohol, lowered salt intake, various heart failure drugs, implantable cardioverter-defibrillator, heart transplants
Long QT syndrome | 12 | 1 in 5,000–7,000 | Beta blockers, implantable cardioverter-defibrillator, avoidance of strenuous activities
Brugada syndrome | 6 | 1 in 2,000–10,000 | Implantable cardioverter-defibrillator
Arrhythmogenic right-ventricular cardiomyopathy | 7 | 1 in 1,000–10,000 | Beta blockers, implantable cardioverter-defibrillator, avoidance of strenuous activities
Catecholaminergic polymorphic ventricular tachycardia | 2 | 1 in 10,000 | Beta blockers

Thus, whereas in dilated cardiomyopathy (one of a group of diseases in which the heart muscle wastes away) 12 genes associated with the condition account for no more than 6% of the cases, in hypertrophic cardiomyopathy (a thickening of the heart, particularly of the left ventricle) two of nine associated genes used by several companies in gene testing account for as much as 40–60% of the cases.

The growing number of genes associated with these disorders is important, not simply because it leads to a deeper understanding of the biological pathways involved, but because for certain genes, the specific mutation a person carries may have profound clinical significance. Effectively, conditions that before genetic testing were seen as singular illnesses have in the past few years been grouped into closely related conditions, each of which may manifest itself, and be treated, differently.

For example, the genetic tests for LQTS differentiate several varieties of the condition associated with different genes. Type 1 LQTS accounts for 35% of the cases, type 2 for 30% and type 3 for 10%. The other ten genes currently associated with the condition collectively account for only about 2% of the cases.

The triggers for the variants can be quite different. Strenuous exercise, particularly swimming, has been associated with attacks and deaths in type 1 LQTS. However, Peter Schwartz, a cardiologist at the University of Padua in Italy, who has been studying the condition since the early 1970s, says “we found, and that was a surprise, that [those with] type 2 and 3 are at very low risk during exercise, as it is not a trigger for them.” What triggers type 2 LQTS are loud noises: think of a telephone suddenly ringing or an alarm clock bell. Conversely, in type 3 LQTS, the most important triggers are depression and sleeping.

What has also followed from the splitting of the condition into three genetically differentiated disorders is a partial realization of the dream of personalized medicine. Doctors now recommend that people with type 1 LQTS limit strenuous activities but those with type 2 or type 3 need not. What’s more, there are implications for drug prescription. Schwartz has shown that beta blockers, which typically were given to everyone diagnosed with LQTS, are significantly more protective for those with type 1 LQTS than for those with type 2, and perhaps not at all effective for type 3.

“Screening is, in variance with what a lot of people think, not just a research tool; it is a clinical tool. There is no doubt that cardiac genetics is allowing us to modify disease management,” remarks Schwartz.

Screening exercises

Another part of screening’s clinical significance is that it has added a significant new tool to cardiologists’ diagnostic armamentarium. Many of the classic diagnostic technologies that indicate heart disease fail when it comes to conditions in which the heart suddenly stops beating because of a genetic abnormality.

“Many times people with these conditions can have a normal EKG [electrocardiogram], because your EKG is just a spot look,” says Sherri Bale, co-president and clinical director of GeneDx, a gene screening diagnostics company in Gaithersburg, Maryland. “It is a minute-and-a-half, or three minutes, or whatever, snapshot of your heart. If an arrhythmia doesn’t occur during that time, you don’t see anything.”

One marker of the significance of gene screening for diagnosis is that professional organizations are beginning to recommend that screening for disease-causing gene mutations become a normal part of the diagnosis process. For example, the European Task Force on Diagnosing ARVC recently recommended that the diagnostic criteria be revised to include “identification of a pathogenic mutation categorized as associated or probably associated with ARVC/D in the patient under evaluation”4.
Who’s your patient? The diagnostic reach of cardiac disease testing is doing more than improving diagnoses, it is now forcing physicians to reconfigure their view as to whom their patients are. “Traditionally cardiologists are good at seeing the disease in front of them and then nailing it, attacking it, treating it. They are hardwired to treat an individual patient well,” says Michael Ackerman, a pediatric cardiologist, who is director of Mayo Clinic’s Long QT Syndrome Clinic in Rochester, Minnesota. “What we are not good at in cardiology historically is thinking of these as genetic diseases and reflecting ‘I now have to think like a family medicine doctor. I now have to take care of all the family’,” he says. Some of the changes require an organizational reconfiguration. Ackerman points to the LQTS clinic he set up at Mayo in 2000 that is geared to evaluate, counsel and treat all affected family members, regardless of age rather than having the children seen in one medical facility and the adults seen in another across the city. This is important because potentially quite a lot of family members might come in to be treated, particularly if the gene is dominant and therefore could have been passed on to half of close blood relatives. Heidi Rehm, a geneticist at Harvard Medical School and director of the Laboratory for Molecular Medicine at Partners HealthCare Center for Personalized Genetic Medicine in Cambridge, Massachusetts, is preparing a paper on the genetic testing of over 2,000 people with hypertrophic cardiomyopathy at her facility from 2004 to 2010. Of the first 533 individuals who tested positive for the mutation, 255 subsequently brought in at least one family member to be tested. All told, an average of 3.4 people per family were tested with the range being a single family member to 33 members of one huge extended family. Even so, expanding these practices to include gene-carrying family members has proven
daunting to doctors, in part because, as Schwartz remarks, “a large majority of physicians grew up not knowing a thing about genetics.” As a consequence, gene diagnosis companies are trying to bridge the information gap by having genetic counselors on staff whose specific job it is to counsel not patients but the doctors who must treat them. The issues are both complex and varied. For example, family testing means cardiologists must now confront a new emotional element in their practices. “There is a lot of sudden anxiety when people have to deal not only with the death of a family member but with something that now can affect the rest of the family. There is an emotional overload for a lot of people coming to get this type of testing,” says Amy Daly, a genetic counselor with GeneDx. At the same time, cardiologists must deal with family members’ refusing genetic testing for themselves, and with the dire consequences of that ignorance. The Newfoundland group wrote in a recent paper of a 31-year-old man who declined to be tested, even though ARVC had been detected in his family. He subsequently died while golfing from what turned out to be ARVC5. “There sometimes is total denial, people just saying ‘this isn’t going to happen to me’,” Memorial University’s Young explains. A different, and happier, result was reached when it was discovered that a young Newfoundland man training to become a commercial pilot carried the ARVC mutation with its risk of sudden death. “We just talked things through,” says geneticist Hodgkinson, “and he decided to change careers.” An interesting conundrum for physicians is what to do if subjects at high risk of sudden death choose to ignore the information and continue in a profession where their condition might put other lives in danger. Parents must also decide whether or not to test their potentially at-risk children for the mutations.
The difficulties that these and other genetic screening and diagnosis issues introduce into a medical practice have fed into what is seen as a general reluctance by many cardiologists to expand their treatment to include gene testing and gene counseling for family members. Some feel only a legal impetus is going to change this. In a recent editorial in the Journal of the American College of Cardiology, Schwartz has argued that only the threat of malpractice will produce a general acceptance of what is being termed ‘cascade screening’6. “I am afraid the turning point will be when someone will be convicted in court for not having recommended genetic screening and someone died,” he says.
The mutational conundrum
Whereas the rapid expansion and almost immediate applications of genetic screening for less common heart conditions clearly have been beneficial, they have brought with them several unresolved issues. One is the meaning and the multiplicity of mutations. Rehm points to data she analyzed several years ago, where she found that out of more than 1,000 mutations in her database “850 of them were pathogenic, 150 were not.” What is unclear in the extreme is how to differentiate the dangerous from the benign when it comes to mutations. “When you scan a large group of healthy volunteers … rare variants pop up in them, pop up right next door to amino acids in which there is no doubt about disease mutations,” says Ackerman. Not to mention the effect of multiple mutations. About 7% of the people in Rehm’s study have at least one additional mutation. “The significance of the second mutation isn’t always clear,” says Rehm. This is confusing to doctors. “Physicians have a lot of questions about what we call ‘variants of unknown significance’,” says GeneDx’s Daly. But it may be even more confusing to patients and family members who have to decide if they are going to initiate treatments or actions to reduce their risks. In a soon-to-be-published paper, Rehm and her associates describe how, when a positive mutation result came in, one mother decided to severely reduce the activity of one of her children, only to be informed a year later that the laboratory that screened for the disease had decided the mutation was benign.
The money game
And then there is the question of who pays and how much they pay for the testing. Indeed, people point out that the differences between countries when it comes to paying for gene testing are almost a litmus test for each country’s medical system.
Ackerman says that when the tests for LQTS genes first became commercially available in 2004, there was a great deal of excitement because it was felt that the tests were finally going to be of clinical significance. This was in part driven by the fast turnaround time of the commercial tests: 6 to 8 weeks, as opposed to months or even years when university laboratories alone oversaw testing. “But guess what? What we learned: our patients’ insurance was not paying for it,” Ackerman notes. It has only been in the past couple of years that many US payers have been picking up most, generally about 75%, of the price of the testing. Part of what has convinced them has been the economics of a negative screening. In place of conducting yearly magnetic resonance imaging or EKGs on patients whose susceptibility
to the disease is unknown, noncarriers can be excluded from the testing lists. Others point out that governments in places like New Zealand and Canada are more willing to pay for the screenings because it is in their long-term economic interest. “The payer who is paying for the test is same [one] who pays for the treatment of heart disease two decades later,” says Correlagen CEO Margulies. Because this is not the case in the US, gene test prices are not so much what actually gets paid as the opening level at which negotiations between payers and gene screening companies begin. “We are paid very different amounts by different payers on different days,” says Margulies.
Moving the technology forward
Ask people involved what the future holds for heart gene testing and the first words that come out are “more, better, cheaper.” Using what are called next-generation or third-generation sequencing platforms, companies are racing to increase the number of genes being tested and decrease the costs of the tests. Ackerman foresees the day in five or ten years when everyone gets a test for their gene variations for less than $1,000. That might mean that today’s specific tests for specific heart genes may be folded into a generalized gene screening. “I believe we are in a ten-year window for disease-specific genetic testing,” Ackerman says. GeneDx’s Bale, on the other hand, doesn’t believe gene-specific diagnostic tests for inherited heart failure are going to cease to be conducted. With more genes will come more complexity and “unfortunately we will identify tons of stuff we don’t know how to interpret,” she says. Nonetheless, change is happening now. Rehm says Partners HealthCare is working on a heart screening test for 65–70% of the most frequent mutations associated with rare heart conditions. “I think we will catch half of all positives with this screening test.
You never will get an inclusive result, because we will only test for variants we know the significance of.” That change won’t be five or ten years away and cost $1,000. “The goal is to do that testing for under $500. We hope to have such a test available by the end of the year,” Rehm says.
Stephen Strauss, Toronto
1. Paynter, N.P. et al. J. Am. Med. Assoc. 303, 631–637 (2010).
2. Couzin-Frankel, J. Science 328, 1220–1221 (2010).
3. Wade, N. The New York Times, 12 June 2010.
4. Marcus, F.I. et al. Circulation 121, 1533–1541 (2010).
5. Hodgkinson, K. et al. Genet. Med. 11, 859–865 (2009).
6. Schwartz, P.J. J. Am. Coll. Cardiol. 55, 2577–2579 (2010).
building a business
Why you need a lawyer
Craig Shimasaki
What’s involved in formally starting a biotech company?
Craig Shimasaki is CEO of BioSource Consulting, Oklahoma City, USA.
Creating a sustainable biotech company is analogous to driving from New York City to Los Angeles. There are myriad routes to get there, but if you start out heading north, you will never arrive. More to the point, if you’re headed north and not legally licensed to drive, not only will you fail to reach your destination but you may also experience disastrous consequences. For would-be entrepreneurs, establishing a venture as a legal entity is the key first step in making the business a reality and moving it forward. This article summarizes the key tasks in legally founding your company and outlines the different types of legal expertise you will need to recruit. Doing this correctly at the beginning will pay dividends in terms of your ability to attract capital, align business and scientific goals, and set your company on the path to success.
The legal team
So you have a concept for your new venture. Your first step in making it a reality is to find a great attorney. You might ask, “Why do I need an attorney? Aren’t there legal forms available online that can save me a lot of money?” Yes, there are, and in fact most attorneys use their own boilerplate documents. But when you hire an attorney, you are paying for experienced legal advice and business guidance, not for someone who fills out forms. You should consider your attorney the most critical employee for your budding organization because his or her counsel and advice will directly impact the direction you take in corporate and financing matters. For instance, your attorney will advise you on the impact of terms for founders’ agreements, your strategy for issuing stock options, the implications of tax law, and securities and financing
Box 1 Count the costs
Legal expenses are typically greater than you might anticipate, but getting your business established correctly will save you major headaches later. Depending on their experience and locale, corporate attorney rates for biotech startup expertise can range from $200 to more than $750 per hour. All attorneys should give a complimentary initial visit to discuss your situation. If they insist on charging you for an initial consultation, find another attorney. Getting your company established and drawing up founder and employee documents and a license agreement can cost $5,000–$25,000 or more. Cost depends on the complexity of your business, the number of founders and the issues related to a technology license. Legal assistance for closing a round of capital can be $10,000–$50,000 or more depending on the size of the round, the number of investors and other terms related to funding. Your attorney should provide you with a good estimate before beginning any transaction, and some may even give you a flat rate if the work is clearly defined. For larger deals, such as closing on a venture capital round of financing, you may be able to get a commitment for a maximum limit on legal fees. Some attorneys that specialize in startup organizations may even accept deferred compensation but may charge a higher fee and take a small equity position.
issues. Your attorney will also give you advice on the best practices in intellectual property (IP) protection, how to interpret employment law matters and how best to structure various contracts and agreements. The truth is, the biotech entrepreneur will need help from three types of attorneys: corporate, patent and securities. When establishing a company, you should first retain a corporate attorney. A corporate attorney specializes in corporate and business matters for biotech startups and practices business law. He or she should be experienced in startup issues, such as organizational structure, employment agreements, stock options and financing structures—particularly venture capital deals. You will also need a patent attorney who specializes in patent law and biotech patent prosecution—litigation in particular. Make sure this person understands your technology area. Look for a patent attorney with a combined background or dual degree in the area of your technology, such as someone with a JD and a PhD or ChemE. These individuals provide added value because they understand the science
and can add to the patent in ways that only an experienced scientist can. During the early stages of your organization one of the most valuable assets you have is your IP, so be sure that it is managed well. If you are the inventor, you already have a working relationship with a patent attorney. If you licensed IP from an institution, your patent portfolio is already being managed by a patent attorney. However, be sure you are confident in the capabilities of this person, or find another. The final type of legal expertise you’ll require is a securities attorney. This person specializes in the legal aspects of acquiring funding, handling private placements and dealing with securities laws. He or she will provide guidance on many issues related to raising capital and will be sure that you are complying with securities laws and protecting the company’s interests as you raise money. Occasionally, you may be able to locate a good corporate attorney who is also experienced in securities. Finding the right attorney is probably easier said than done, as it’s unlikely you’ll know experienced biotech attorneys when first starting your firm. One of the best ways
Box 2 Changing names
There are certain situations in which you might want to consider changing an established company name. Here are some examples:
• If the company has a troubled past that haunts the new management as it tries to raise money, or if you are reorganizing the company or doing a restart.
• If the name is a source of confusion because it was strongly associated with a former focus and the company has a new focus.
• If the previous management had a notorious reputation and a clear separation is needed.
• If the current name is problematic for business because it ties the company to an unrelated field.
to find one is through networking: start by asking other biotech entrepreneurs whom they would recommend. Search for reputable law firms specializing in startup biotechs in your area. You should try to find an attorney with offices in your city because you don’t want to be boarding a plane just to have a face-to-face meeting. But if you don’t live in a biotech hub, you may have no other option than to hire an attorney who does. Long-distance travel isn’t optimal; however, a lawyer living in a biotech hub can provide advantages: these experienced lawyers usually have venture capital contacts and access to seasoned biotech executives, which can help with financing and recruiting. Ideally, you will want to work with an attorney who is a partner or senior member in a small- to medium-sized law firm; this is preferable to working with less-experienced junior staff at a mega law firm. Of course, your fees will be higher working with a senior partner, but you get what you pay for (Box 1).
Establishing your company
Before you incorporate your company you need a name that brands the company and its future. Barring anything unforeseen (and usually bad), you’ll keep that name for the life of the company (Box 2). There are at least four aspects to consider when choosing a company name: does it represent the current and future focus of the organization, is it relatively easy to pronounce and recognize, is it unique enough that it will not be confused with the names of other organizations and will it work well with envisioned products? There are, of course, other issues to think about, too (Nat. Biotechnol. 28, 16–19, 2010). After selecting a company name, the next step is to formally incorporate and set up a legal structure. This allows for the issuance of stock to potential investors, founders or future employees and it reduces your exposure to liabilities and protects personal assets.
It also lets you take full advantage of tax laws, including carry-forward losses for the business. Another important decision is the choice of corporate structure, which should be discussed with your attorney and will be based upon your current plans and future direction. There are five corporate structure options in the US: sole proprietorship, partnership, limited liability company (LLC), S corporation (S-corp) and C corporation (C-corp). In the UK, there are also limited (Ltd.), public limited (PLC) and unlimited corporations. The selection of your legal structure impacts how the business is taxed and sets differences in liabilities to the owners and fiduciary agents of the company. Some startups may begin as an LLC until they get significant investments. However, because we are talking about a biotech company, ultimately any enterprise in the US will need to be a C-corp, which is this industry’s standard business entity because of laws pertaining to ownership, structuring flexibility, finances and taxation. When incorporating a business, your attorney files the company’s articles of incorporation and bylaws. This filing designates the number of authorized company shares, the number of board members and other related matters. Your state of incorporation can be where you are actually located, but before you secure venture or institutional capital, you’ll likely need to be incorporated in Delaware, where corporate laws and tax laws are more favorable. Your attorney can handle this.
Issuing stock
Next, your corporate counsel will assist with issuing stock or stock options to the founders, inventors, IP holders and key staff. You should issue stock soon after the organization is established rather than waiting until after capital is raised. When shares are issued upon company formation, they can be granted to the founders at minimum value. If stock is issued after raising a significant amount of capital, there is a specific value imputed to the enterprise. If shares are issued at a discount to that value, the shareholder could have large tax consequences.
For instance, upon securing investor financing there is a ‘fair market’ value imputed to company shares based on the amount that investor paid. If shares are simultaneously discounted to founders or key employees, there could be a tax liability based on the difference between the fair market value and the amount of money these founders paid for their stock. There is no reason for founders or key employees to be paying taxes on shares at this stage of the company. Your attorney will guide you through any tax consequences of issuing stock or obtain the help of tax counsel. Your corporate attorney should also give advice on what types of stock to issue, choosing from founders’ stock, restricted stock, preferred shares, common shares, voting and nonvoting shares, and two kinds of stock options: incentive stock options (ISOs) and nonqualified options (NQOs). These all have different privileges, rights and restrictions. Vesting schedules are usually given with stock options (NQOs and ISOs) and restricted stock. If this is all sounding foreign to you, then you’re beginning to see why hiring an attorney is one of the first things you should do. Many biotech companies are formed by more than one founder, and they all usually receive founders’ shares. It’s tempting to equally divide allotted shares among each founder, but you should first consider what each individual has contributed to establishing the company and what their roles will be going forward. Will they all be working full time? And are they all committed to sticking around to see it through to success? The answers to these questions will help determine the split of founders’ shares. You’ll also need a founders’ agreement that outlines the provisions and considerations given in exchange for work, contribution and IP rights. This document should include a provision that the company can buy back a certain amount of its shares should one of the founders later leave the organization. This prevents a founder who leaves from watching his or her shares rise in value on the labor and sweat of others.
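To make the arithmetic behind the timing of stock issuance concrete, the short Python sketch below computes the gap between an imputed fair market value and the price a founder pays, which is the difference on which a tax liability could be based. Every figure in it (one million shares, a $0.001 price at formation, a $1.00 value after financing) and the function name `discount_spread` are hypothetical and chosen only for illustration; how that gap is actually taxed is a question for your attorney or tax counsel.

```python
from decimal import Decimal


def discount_spread(shares: int, fair_market_value: Decimal, price_paid: Decimal) -> Decimal:
    """Total difference between fair market value and the price paid.

    Hypothetical helper: it only computes the gap described in the text,
    not any tax owed on it.
    """
    per_share_gap = max(fair_market_value - price_paid, Decimal("0"))
    return per_share_gap * shares


# Assumed scenario 1: shares issued at formation, when value and price match.
at_formation = discount_spread(1_000_000, Decimal("0.001"), Decimal("0.001"))

# Assumed scenario 2: same shares issued after a financing that imputes $1.00 per share.
after_round = discount_spread(1_000_000, Decimal("1.00"), Decimal("0.001"))

print("Gap at formation:", at_formation)
print("Gap after financing:", after_round)
```

Under these assumed numbers the gap is zero at formation and nearly the full post-financing value afterward, which is why issuing founders’ shares early matters.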
Beyond that, there are several other agreements needed for founders and employees alike (Box 3).
The board
Your articles of incorporation will stipulate that you set up a board of directors. This group has a legal obligation to the company in that they possess a fiduciary (trustee) responsibility to look after the best interests of the overall organization. You and your shareholders elect the board (even if, at startup, the shareholders are just you and a few angel investors). Carefully select board members based on expertise and ability. Do not include friends and family unless they are actually qualified, and even then be aware of the pitfalls. Remember that difficult issues are decided by the board
and you do not want personal relationships influencing decisions. The board has two main duties. The first is called ‘duty of care’, meaning it has an obligation to make decisions in a reasonable, careful and prudent manner. All decisions have risk, and any decision can be second guessed, but if the board made a rational decision that’s considered judicious at the time, it has operated under the duty of care. The second is the ‘duty of loyalty’, meaning all decisions or transactions with and for the company must not be motivated by self-dealing or any conflict of interest. If a conflict arises, that board member should disclose it and abstain from voting on that particular issue. A board needs a chairman, and if the CEO is not the chairman, it’s usually a board member appointed by the major shareholders (investors or otherwise). If you are fortunate enough to have good venture capitalists with depth of experience in your field, they will guide and strengthen the remaining board member selection. Odd numbers of board members are chosen to avoid voting logjams, and your board should grow in size as the company grows. In the beginning, the board may consist of only three members. Later, it may grow to five or even seven. A publicly traded biotech company may have nine to eleven members, but it is always advantageous to have fewer instead of more. Board members who are investors or executives of the company are not usually compensated for their participation as they are simply managing their investment. As the company grows and independent board members are added, board compensation is usually a mix of cash, such as an annual retainer, and some form of equity compensation. Depending on the stage of the company, the compensation may simply be reimbursement for out-of-pocket expenses or may be up to several thousand dollars annually.
Generally, equity compensation for directors is given as stock options, though it can also be in other forms of stock, as discussed previously. The amount of stock may be between 0.25% and 2% of outstanding shares, or more, depending on the value of these members to the organization.
The SAB
The scientific advisory board (SAB) is called upon for advice and assistance in matters pertaining to the science. An SAB should be formed early and should be selected based on expertise and knowledge in the technology
Box 3 The dotted line for all
These are some typical agreements that cover founders and employees, and they protect intellectual property (IP) assets and provide the assurances that are expected by any new investor in the company.
Confidential Disclosure Agreement or Nondisclosure Agreement. This protects the company by requiring that each employee appropriately handle confidential information. By doing this, the company protects its know-how and IP from competitors.
Invention Assignment Agreement. This transfers assignment of any and all new inventions conceived by the employee to the company. This ensures that the organization owns the IP required to develop and market its products. There are allowances given for inventions before hire.
Non-compete Agreement. This prevents an employee from quitting and starting an identical business in the same field using the same technology. It protects the company from disgruntled founders or key employees going out and starting a competitive business with the information they have been using in your company.
Employment Agreement. This contains any other provisions that constitute employment, especially for those who may be considered key employees; these provisions may be combined with the other agreements.
or science of the company; these individuals should be considered experts by their peers. An SAB is not a legally constituted board, and its members do not have fiduciary responsibilities. For that matter, this group could be called a scientific advisory committee if preferred. The number of SAB members will vary, though three to seven is usually sufficient. Have your corporate attorney provide a thorough SAB agreement, which contains member duties, type of compensation, a confidential disclosure or nondisclosure agreement, and specifications about publications and inventions. A secondary purpose of the SAB is to bolster credibility for your company’s science. Individuals considered experts in your field indirectly give credibility to the business venture and are reassuring to potential investors. The SAB members should be willing to present reports on the scientific progress at conferences. Using SABs in this manner can also accelerate acceptance of the company’s work in the eyes of future investors. Having SAB members co-author peer-reviewed publications shows their involvement in and contribution to developing the science. Like the board of directors, the SAB is typically compensated with either stock options or restricted stock. The amount of stock options granted varies depending on the company and the critical need of each
individual. Ranges for stock options can include 0.1%–2% of outstanding shares. Ranges for restricted stock can be 0.1%–0.5% of outstanding shares. If your members are highly sought after, sometimes you may need to pay a per-meeting fee or nominal annual retainer to the SAB at early stages. However, it is not unusual to just provide equity and cover out-of-pocket expenses that members incur to attend SAB meetings. After later-stage funding, you may add an annual retainer or a per-meeting fee when the finances of the company can support this.
Conclusions
The importance of a good attorney cannot be overstated. I have observed potential investors walk away from investing in an organization because of sloppy corporate structure, missing employment and IP agreements, or convoluted and overly complicated licensing agreements. Investors need to have confidence in the management’s ability to run an organization before they will invest. You don’t want to learn later that the optimal route was not taken for your company’s development or that critical agreements were not drafted appropriately. Setting a solid legal framework with appropriate and detailed contracts, licenses and agreements gives new investors confidence and is a key first step to setting the foundation for your business’ future success.
To discuss the contents of this article, join the Bioentrepreneur forum on Nature Network:
http://network.nature.com/groups/bioentrepreneur/forum/topics
correspondence
Safe and effective synthetic biology
To the Editor:
A letter in your January issue highlights the need for harmonizing biosecurity oversight for gene synthesis1. The US government is currently preparing to publish its final, formal ‘guidelines’ on the procedures at DNA synthesis companies for screening incoming orders for sequences of potential dual-use concern. As the research community continues to debate the promise and risks of synthetic biology, we report here discussions at two major synthetic biology conferences with important implications for safe and effective progress within the field. The 2009 National Academies Keck Futures Initiative on Synthetic Biology (NAKFI-SB) took place in Irvine, California, on November 19–22 and convened more than 160 experts to explore the engineering, scientific and social impact of synthetic biology. Participants were asked to consider such basic questions as what tools and technologies are required to advance the field, why man-made biologic systems are more fragile than natural ones and how to create and improve intercellular communication. Discussions also covered risk assessments, the religious and ethical implications of synthetic biology and how best to leverage the technologies to explore other biological systems. Although the primary focus of NAKFI-SB was to discuss future research and promote interdisciplinary cooperation, the significant inherent risks and potential bioethical implications of synthetic biology were recognized by attendees. In terms of risk assessment, the NAKFI-SB discussions focused on the value of revisiting the self-examination and self-regulation imposed on early adopters of recombinant DNA technology at the Asilomar meeting2 in light of the increased complexity and ambitious goals for synthetic biology. Attendees also recognized the need for a ‘safety switch’ to disable undesirable ‘neo-organisms’ (Table 1).
A second meeting, convened by the American Association for the Advancement of Science (AAAS) Center for Science, Technology and Security Policy on January 11 in Washington, DC, at the request of the US Department of Health and Human Services (DHHS) and the US Department of State, focused on the government’s perspective on minimizing the risk of synthetic biology and critiqued the recent DHHS draft set of voluntary guidelines entitled “Screening Framework Guidance for Synthetic Double-Stranded DNA Providers” released in November 2009 (ref. 3). Comments were solicited from representatives of the US government agencies, gene-synthesis provider organizations and the biotech and
Table 1 Summary of deliberations at NAKFI-SB meeting
Question
Response
What is needed to facilitate synthetic biology?
• Integration of biological vocabulary within computer programming. • Improved analytical and design modeling. • Novel cellular monitoring techniques. • Improved screening technologies. • Enhanced cell lines to improve productivity. • Cheaper technology. • Techniques to create complex entities. • ‘Fail-safe’ systems. • A ‘kill switch’ for neo-organisms.
What are the bioethical considerations?
• Synthetic biology is similar, but not identical, to other genetic engineering techniques. • Implications require regulatory oversight. • Novel ethical issues necessitate specific risk-benefit evaluation. • Ongoing public communication and input is vital.
Is synthetic biology useful as an investigative modality?
• Can be used to evaluate intracellular systems.
Is synthetic biology useful for multicellular systems?
• Can be used to evaluate extra-cellular communication and integration.
How do we make synthetic systems as stable as natural ones?
• Integrate redundancy.
Is synthetic biology useful for multiorganism systems?
• Can evaluate inter-organism interaction.
Are there alternatives to using genes within synthetic biology?
• Chemical and physical interactions can be used to modify biological reactions.
• Would require advances in current technology, but that is expected. • A sharable library of results is essential but that requires standardization of a context-sensitive archiving format. • Could create novel tissues, organs and complete organisms.
• Increase adaptability. • Improve evaluation techniques. • Can search for unique genetic material. • Requires improved database administration. • Unique nongenetic compounds can be developed to influence outcomes. • Alternative engineering techniques (e.g., application of computer design tools) will likely improve results. • Create novel methods for system interfaces and interactions (e.g., optical inputs and outputs). • Isolate created functions from natural processes (e.g., create synthetic organelles or ‘subroutines’).
Is it important that synthetic biologic systems ‘evolve’?
• Provides adaptability.
What is required to fulfill the potential of synthetic biology?
• Enhanced education opportunities at all levels.
• Improved modeling would be valuable. • Need techniques to speed up process to be useful. • Improved and consistent public education and communication.
volume 28 number 10 OCTOBER 2010 nature biotechnology
pharmaceutical industries, as well as biosecurity experts, academics, industry players and other concerned parties (the meeting’s main themes are described elsewhere4 and summarized in Table 2). As expected from the diverse nature of the participants, some of the concerns raised were contradictory, but the conference deliberations were constructive in providing the perspective of the major companies involved in commercial gene synthesis and highlighting perceived weaknesses within the current strategy for verifying sequences of potential concern.
The two conferences provided two contrasting perspectives on the field. NAKFI-SB was a broad evaluation of the current status of synthetic biology and the final recommendations focused on methods to advance the field. Besides outlining some technical improvements currently needed to improve productivity, the participants recognized the paramount importance of public communication and of lay participation in regulation and oversight to address potential bioethical issues. They also advocated specific technological steps to improve the stability of engineered biological
systems, including enhanced redundancy and adaptability as characterized by a capacity to evolve to improve efficacy. In terms of applications, participants suggested that synthetic biology is likely to be employed in the evaluation and synthesis of more complex biological systems in the coming years and to progress beyond using single genes to create more complex gene circuits with mechanisms that regulate these novel systems. As the AAAS meeting was convened to comment on proposed US governmental safety regulations, the recommendations were understandably narrower. The importance
© 2010 Nature America, Inc. All rights reserved.
Table 2 Summary of deliberations at AAAS meeting4

Theme: DHHS guidance
Comments:
• May inhibit competition and innovation.
• How will proprietary information be protected?
• No mechanism for ‘garage biology’ oversight.
• No mechanism for DNA providers to share customer information.
• No ongoing, updated database of entities prohibited from obtaining synthetic biology technology.
• DNA providers may refuse to fill orders for sequences that require additional expenses to participate in oversight programs.
• No oversight of synthesis providers to assure security and safety.
• Although the purchase of synthesis technology is a private transaction, there is a lack of an established appeal process for refused orders.
Recommendation:
• Coordinate customer and sequence screening to assure safety and security across all DNA providers.
• Provide a mechanism to assure safety and security of synthetic biology technology providers.
• Enhance accountability of all aspects of synthetic biology including reporting and appeal mechanisms.

Theme: Customer screening
Comments:
• No mechanism to determine who is the end user of technology.
• Costs associated with compliance may be prohibitory.
Recommendation:
• Supply precise customer screening modalities and criteria to assure safety and security.
• Shift some compliance requirements from providers to customer institutions, including ‘Biosafety Committee–like’ review boards.
• Compile, review and update a database of approved customers and consider a licensing requirement to allow purchase of synthetic biology technology.

Theme: Sequence search methodology
Comments:
• Automated reviews of DNA sequences are inadequate.
• Screening against a list does not consider the possible context of use since ‘sequence does not necessarily predict function’.
• Innovation and discovery would be inhibited if orders are limited to previously described sequences.
• Mandatory reporting of DNA sequence orders may compromise proprietary information.
• Cannot identify sequences changed by end users.
• ‘Best match’ determinations that search for sequences that are more similar to harmful than nonharmful patterns are better than ‘thresholds’ but may be below current industry standards.
• Labeling a sequence as potentially ‘of concern’ does not determine actual harmful nature.
• Proprietary screening software is inadequate.
• 200 bp minimum size for sequence screening is inadequate.
Recommendation:
• Human review of all sequence orders.
• Compile, review and update a database of harmful sequences.
• Promote research to determine the fundamentals of harmful sequences and use this information for screening.
• Create and promote protocols for sequence screening ‘best practices’.
• Establish list of subject matter experts for each potentially harmful select agent.
• Screen each order against any potentially harmful sequence not just those on select agent and commercial control lists.
• Mandate the use of open-source screening software that is continuously updated.
• Screen all orders irrespective of sequence length.

Theme: Implementation and evaluation
Comments:
• Success is determined by degree of implementation.
• The costs of implementation are minimal when compared with other costs of doing business.
• Regulatory compliance is difficult to determine.
Recommendation:
• Ongoing, regular governmental communication and interaction with industry and research institutions is critical.
• Models of illegal and noncompliance methods should be used to evaluate screening modalities.
• Screening methods require continuous governmental and industry evaluations of effectiveness.
• Screening methods require ongoing evaluation of financial impact on industry.
• Effectiveness can be determined in part by the number of providers that claim compliance with regulations and by the number that perform follow-up screening.
• DNA providers should be certified.

Theme: International engagement
Comments:
• Voluntary compliance and cooperation is crucial to assure safety and security.
Recommendation:
• Coordinate and streamline international screening of sequences, customers and industry providers.
of improved oversight along the entire chain of production within synthetic biology was emphasized. Increased oversight included improvements in customer and end-product screening modalities and greater cooperation between governments, industry and academics both within the US and elsewhere. Some of the AAAS participants noted that the increased financial burden required to comply with these regulations may impede private industry’s investment in the technology. Discussions at both conferences recognized that the promise of synthetic biology is associated with the potential for significant harm. There is a need to prepare for malicious acts using purely synthetic or hybrid synthetic and/or natural neo-organisms. Additionally, strategies should be in place to predict and prevent such events and to trace the source of such materials should they surface. Current prevention efforts rely on voluntary participation in a software-based matching system that checks orders against select agent sequences to head off the commercial synthesis of select agent genes, but, as the AAAS report details4, that system could be improved. In addition, it is imperative to identify a robust method to label synthetic genes so they can readily be identified as such. Unencrypted watermarks have already been reported in published sequences of synthetic genes (http://www.wired.com/wiredscience/2008/01/venter-institut/). Although such watermarks are feasible, there are currently no regulatory controls against surreptitious insertions of sequence; synthetic genes can be tagged with DNA encoding natural amino acids, but the ability to remove, modify or even counterfeit such sequences using conventional molecular biology tools suggests that more robust strategies will be needed.
One potential solution would be to create a ‘serial number’ that could be traced back to individual synthesis laboratories or even individual synthesis machines, and encoded into the synthetic gene using an appropriate combination of public-key and private-key hash algorithms. Going forward, public-private cooperation will be vital for safe and effective progress within synthetic biology and to ensure that the field is not restrained by public fears. There must be a concerted effort to minimize the expense associated with regulatory compliance; however, the inherent risks of synthetic biology mandate rigorous oversight, especially because the burdens of a major ‘accident’ will be borne by the public. The financial expenditures that companies synthesizing genes will have to bear to
proactively reduce the risk of potential misuse of the technology are substantially less than the estimated costs to respond to a biological disaster. Safety must be designed into the system and not become a secondary concern. In this respect, the attempt to shift the oversight burden from the gene manufacturers to their customers through the creation of institutional ‘biosafety review boards’ modeled after institutional animal care and use committees is likely to be problematic, as it would further decentralize the review process and rely on committee structures that were not designed to preemptively detect hazardous modalities. The AAAS4 and NAKFI-SB5 meetings were an excellent starting point for debate, and we strongly recommend that the discussions be expanded and that the subsequent safety recommendations be implemented expeditiously.
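The ‘serial number’ idea above can be illustrated with a toy sketch. It is not a scheme proposed in this letter or used by any synthesis provider: it assumes a 2-bit-per-base encoding (A, C, G, T), a hypothetical serial string, and, for simplicity, swaps in a symmetric keyed hash (HMAC-SHA256 from the Python standard library) where a real deployment would more likely use an asymmetric signature. It also ignores biological constraints such as reading frames, GC content and repeats.

```python
import hashlib
import hmac

BASES = "ACGT"  # assumed encoding: A=00, C=01, G=10, T=11 (2 bits per base)


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as DNA, four bases per byte, most significant bits first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Decode a DNA string produced by bytes_to_dna back into bytes."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_tag(serial: bytes, key: bytes, mac_len: int = 8) -> str:
    """Return a DNA tag carrying the serial number plus a truncated keyed hash."""
    mac = hmac.new(key, serial, hashlib.sha256).digest()[:mac_len]
    return bytes_to_dna(serial + mac)


def verify_tag(tag: str, key: bytes, mac_len: int = 8):
    """Return the serial number if the keyed hash checks out, otherwise None."""
    raw = dna_to_bytes(tag)
    serial, mac = raw[:-mac_len], raw[-mac_len:]
    expected = hmac.new(key, serial, hashlib.sha256).digest()[:mac_len]
    return serial if hmac.compare_digest(mac, expected) else None


# Hypothetical serial: laboratory 042, synthesis machine 7.
tag = make_tag(b"LAB042-M7", key=b"demo-key")
print(tag, verify_tag(tag, key=b"demo-key"))
```

Because the tag includes a keyed hash of the serial number, an edited or counterfeited tag fails verification unless the forger holds the key, which is the property the plain watermarks described above lack.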
COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
David A LaVan1 & Louis M Marmon2 1Materials Science and Engineering Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, USA. 2Department of Surgery, Division of Thoracic and General Pediatric Surgery, Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Medical Center, George Washington University School of Medicine, Washington, DC, USA. e-mail: [email protected]
1. Fischer, M. & Maurer, S.M. Nat. Biotechnol. 28, 20–22 (2010).
2. Berg, P. et al. Proc. Natl. Acad. Sci. USA 72, 1981–1984 (1975).
3. Department of Health and Human Services. Fed. Reg. 74, 62319–62327 (November 27, 2009).
4. Marfatia Berger, K., Pinard, W., Coat, G. & Epstein, G.L. Scientists’ Views on the U.S. Government’s Guidance on Synthetic Genomics (AAAS, Washington, DC, 2010). http://cstsp.aaas.org/files/syn%20bio%20summary%20012110.pdf
5. Synthetic Biology: Building on Nature’s Inspiration (The National Academies Press, Washington, DC, 2010).
The regulatory bottleneck for biotech specialty crops
To the Editor: Specialty crops, which include fruits, vegetables, nuts, turf and ornamental crops, are important components of human diets and provide environmental amenities1. In 2007, such crops represented ~40% of the $140 billion in total agricultural receipts, despite being cultivated on just 4% of the total cropped area2. Although tomato became the first genetically modified (GM) food crop to be commercialized, in 1994, the only GM specialty crop traits currently marketed are virus-resistant papaya and squash, insect-resistant sweet corn and violet carnations. All of these received initial regulatory approval over 10 years ago. As a group, GM specialty crops have garnered limited market share (the exception is GM papaya resistant to papaya ringspot virus1, which now accounts for 90% of Hawaii’s crop). In contrast, GM field crops, such as soybean, maize, cotton and canola, have come to dominate the markets in countries where they have been released3. What is responsible for this disparity in the commercialization of GM field crops versus specialty crops? One possibility is that the dearth of GM specialty crops indicates a lack of current research or of beneficial traits
for crop improvement through genetic engineering. Alternatively, research may have continued but progression through the regulatory process to the marketplace may have failed. Anticipated lack of market acceptance could have stopped either research or regulatory submissions. To find out why specialty crops with GM traits have fared so poorly, we have analyzed the research, regulatory and market pipeline to determine which steps in the process may be responsible for the limited range of commercially available products. To assess the recent research and development pipeline for GM specialty crops, we conducted an extensive global search for scientific journal articles, published between January 2003 and October 2008, that describe work in specialty crops using recombinant DNA (transgenic) methods (Supplementary Table 1). In most cases, these reports demonstrate proof of concept of the effectiveness of the transgene in producing the phenotypic trait in the species studied. Among 313 published articles on specialty crops, 46 species were represented, of which tobacco, potato and tomato accounted for 59% of the total reports, in part due to their use as easily transformed
[Figure 1 graphic: bar charts. (a) y-axis: number of journal articles; x-axis: crop species; inset: tobacco 24%, potato 23%, tomato 20%. (b) y-axis: number of journal articles; x-axis: country; inset: United States 27%, India 10%, China 10%, Japan 8%.]
Figure 1 International scientific journal publications on transgenic crops. (a) Number of published articles describing research on the top 20 GM specialty crops (of 46 total species). The percentage of reports on each crop is also shown (inset). (b) Number of published articles according to country of origin. The percentage of total articles by country is also shown (inset). A complete list of all publications is in Supplementary Table 1.
model plants in research laboratories (Fig. 1a). Although the United States is the leader in the number of articles published, many reports originate from the European Union (EU; Brussels), India, Japan and China (Fig. 1b). Other plant biotech surveys also indicate that a number of GM specialty crops are being developed in China4,5. Following laboratory studies and proof of concept, development of GM crops generally proceeds to field trials. Since countries began establishing independent regulatory processes specifically for GM organisms in the early 1990s, thousands of field trial permits have been granted worldwide. The Organization for Economic Co-operation and Development (OECD; Paris) developed the UNU-MERIT field trial database, which collates GM trials that are ongoing in 24 developed countries, although data for China and India are not included (A. Arundel, OECD, personal communication). During the six-year period 2003–2008, the United States accounted for ~70% of all field trials, with 15% of the total field trials being conducted on specialty crops (Fig. 2a). The United States and Canada were responsible for 88% of the 1,231 permitted field trials on specialty crops, with the majority of the Canadian trials focused on mustard crops. The Information Systems for Biotechnology database (http://gophisb.biochem.vt.edu) was also queried to identify all approved field test permit applications in the United States between 1992 and October 2008. Field trials of specialty crops averaged 39% of the number in commodity crops from 1992 to 2002, but only 18% since 2003 (Fig. 2b). Qualitative data on GM crops under development internationally confirm that although laboratory and field trials
have been conducted on GM specialty crops in many countries, none has progressed to commercial production outside the United States, except perhaps virus-resistant tomato and pepper in China, the commercial status of which is currently uncertain6,7. To further evaluate the scope of research that has been conducted on GM specialty crops, we categorized the traits from scientific reports and field trials into two categories: output traits, which would directly benefit consumers; and input traits, which primarily benefit producers and only indirectly benefit consumers through reduced agricultural inputs, higher productivity, lower cost or reduced environmental impacts. This compilation identified 77 specialty crops (listed in Supplementary Table 2) and 260 unique traits (Supplementary Data and Supplementary Table 1). The output traits included modifications in oil, sugar and starch content, protein quality and amino acid composition, vitamin content and nutritional quality, flavor and postharvest quality as well as reduced allergenicity. Input traits included tolerance to abiotic and biotic stresses, insect and nematode resistance, herbicide tolerance, nitrogen acquisition and yield. These data demonstrate that there is a broad global research pipeline for GM specialty crops using traits that would be beneficial to both producers and consumers. Governmental approval is required before GM crops can be marketed. Since 1992, 24 governmental bodies have approved or deregulated a total of 84 unique plant and trait combinations (http://www.cera. gmc.org/). Regulatory approvals of GM specialty crops averaged 48% of the number in commodity crops from 1992 to 2002, but only 5% since 2003 (Fig. 2c). Although
21 approvals have been granted by all governmental bodies for nine specialty crops, only two have occurred since 2000. These two transgenic events are reduced nicotine content in tobacco and virus resistance in plum. The tobacco product was marketed briefly in the United States as an aid to smoking cessation, and the GM plum variety still awaits final approval from the US Environmental Protection Agency before it can be grown commercially. The distribution of all regulatory approvals exhibits two distinct phases (Fig. 2c). Approvals initially peaked in 1995, followed by a decline to only one approval each in 2000 and 2001. The number of approvals then increased, albeit slowly, but only for commodity crops. A recent analysis shows that innovations in agbiotech were on an exponentially increasing trend during the 1990s, which then abruptly leveled off around 1998, with a decline in subsequent years8. Furthermore, new innovations entering the pipeline after 1998 were less likely to move toward commercialization. These patterns were attributed to a global change in regulatory and market policies toward GM crops, notably the moratorium on new approvals and therefore marketing in the EU beginning in 1998. Our results indicate that in contrast to the pre-1998 era, only commodity crop developers were able to participate successfully in this new regulatory and market environment. There are a number of possible reasons why GM specialty crops are not progressing past the research phase, and exploring these deserves further research. Previous analyses have documented that the $1–15 million in additional costs per insertion event associated with receiving regulatory approval9,10 (which is not required for varieties developed using other breeding methods) are out of proportion to the potential additional market value that can be recovered on the limited areas devoted to these crops11. 
Similarly, a review on ornamental specialty crops concluded that although there is considerable technology available and valuable traits to be exploited, GM varieties are still unattractive from an economic perspective, primarily due to regulatory costs9. Lack of demand or market rejection of GM specialty crops could also be the reason for their absence. This is undoubtedly the case in some countries and markets that unconditionally ban GM products, but the hypothesis is difficult to test, as until they receive regulatory approval, GM products
Note: Supplementary information is available on the Nature Biotechnology website.
Acknowledgements J.K.M. is partially funded through the UC Discovery Fellows program (http://ucdiscovery.org/). This study also received support from Specialty
[Figure 2 graphic: (a) pie charts of field trials by crop class (commodity crops 80%, specialty crops 15%, forest tree crops 5%) and of specialty crop field trials by country (United States 69%, Canada 19%, other 5%, with Germany, Spain, Sweden, The Netherlands and Australia at 1–3% each). (b) Number of field trials per year, 1992–2008. (c) Number of new regulatory approvals per year, 1992–2008, for specialty crops and commodity crops.]
are not available for consumers to accept or reject. For example, although Indian Minister for the Environment Jairam Ramesh cited a lack of public confidence when he recently blocked regulatory approval of insect-resistant GM brinjal (eggplant)12, his action precluded consumers from having the opportunity to demonstrate their preferences in the marketplace. Given the limited number of GM specialty crops that have received regulatory approval, consumer acceptance remains largely untested in the market. Our interviews with specialty crop seed companies and nurseries provide extensive anecdotal evidence that many potentially marketable GM products have been created and tested in the private sector, but the cost and uncertainty of the regulatory process have made further development uneconomical and prevented them from testing actual market acceptance. The justification for requiring costly regulatory testing of GM plants is to ensure that potential risks are fully assessed before commercial release. Thus, it can be argued, if specialty crops cannot meet this standard economically, that is the price to be paid to eliminate risk. However, even virtually identical traits do not require such approval if developed using non-GM methods, and no actual risks unique to the recombinant DNA process per se have been experienced with the GM crops currently marketed. On the other hand, the constriction in commercialization of GM traits has resulted in lost societal benefits due to foregone innovations that are estimated to be in the billions of dollars10,13. When GM crops could reduce environmental impacts or improve health and nutrition relative to current varieties (Supplementary Data), failure to use them also constitutes risks that generally are not considered in regulatory evaluations14.
Although research on GM specialty crops continues to explore a wide range of input and output applications, their commercialization may depend upon a reexamination of the balance between potential risks versus foregone societal benefits and consequent adjustments in regulatory requirements.
Figure 2 Field trials and regulatory approvals. (a) Using the UNU-MERIT database, field trials conducted in 24 developed countries between 2003 and 2008 were separated on the basis of commodity, forest tree or specialty crop. From this, the specialty crops were further subdivided based on the country in which the field trial was conducted. (b) The numbers of field trial permits acknowledged or issued in the United States are plotted by year for commodity crops and specialty crops. (c) The 84 unique transgenic events that have been granted regulatory approval by one or more countries are plotted by year of approval. If the year of approval varied among countries, the first year of regulatory approval granted by any agency for a given event was used.
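The tallying rule described for panel c (count each event once, in the first year any agency approved it) and the specialty-to-commodity comparison quoted in the text can be sketched as follows. This is only an illustration: the record layout and the example records are hypothetical, and the comparison is computed here as a ratio of period totals, which is one assumed reading of "averaged 48% of the number in commodity crops".

```python
from collections import Counter


def approvals_per_year(records):
    """Count each event once, in the earliest year any agency approved it.

    records: iterable of (event_id, crop_class, agency, year) tuples.
    Returns a Counter keyed by (first_year, crop_class).
    """
    first_year = {}
    crop_class = {}
    for event, cls, _agency, year in records:
        first_year[event] = min(year, first_year.get(event, year))
        crop_class[event] = cls
    return Counter((first_year[e], crop_class[e]) for e in first_year)


def specialty_share(counts, start, end):
    """Specialty approvals as a percentage of commodity approvals in [start, end]."""
    def total(cls):
        return sum(n for (year, c), n in counts.items() if c == cls and start <= year <= end)

    commodity = total("commodity")
    return 100.0 * total("specialty") / commodity if commodity else None


# Hypothetical records, not taken from the approvals database.
records = [
    ("E1", "commodity", "US", 1995),
    ("E1", "commodity", "CA", 1997),  # later approval of the same event is ignored
    ("E2", "specialty", "CA", 1998),
    ("E2", "specialty", "US", 1996),  # earliest year wins
    ("E3", "commodity", "US", 2005),
    ("E4", "commodity", "JP", 1996),
]
counts = approvals_per_year(records)
print(specialty_share(counts, 1992, 2002), specialty_share(counts, 2003, 2008))
```

Averaging the yearly ratios instead of dividing period totals would give different numbers when yearly counts are small, so the choice should be stated alongside any figure derived this way.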
Crop Regulatory Assistance (http://www.specialtycropassistance.org/).
COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
Jamie K Miller & Kent J Bradford
Seed Biotechnology Center, University of California, Davis, California, USA. e-mail: [email protected]
1. Alston, J.M. & Pardey, P.G. Hortscience 43, 1461–1470 (2008).
2. USDA-NASS. Summary and State Data Vol. 1 (USDA-NASS, 2009).
3. Brookes, G. & Barfoot, P. AgBioForum 12, 184–208 (2009).
4. Huang, J.K. et al. Science 295, 674–677 (2002).
5. Wang, D.P. J. Integrative Plant Biol. 49, 1281–1283 (2007).
6. Evenson, R.E. in Regulating Agricultural Biotechnology: Economics and Policy (eds. Just, R.E., Alston, J.M. & Zilberman, D.) 103–123 (Springer Publishers, New York, 2006).
7. Stein, A.J. & Rodriguez-Cerezo, E. The Global Pipeline of New GM crops: Implications of Asynchronous approval for International Trade. (Joint Research Center Scientific and Technical Reports, Brussels, 2009).
8. Graff, G.D., Zilberman, D. & Bennett, A.B. Nat. Biotechnol. 27, 702–704 (2009).
9. Dobres, M.S. in Floriculture, Ornamental and Plant Biotechnology, Vol. V (ed. Teixeira da Silva, J.A.) 1–14 (Global Science Books, 2008).
10. Kalaitzandonakes, N., Alston, J.M. & Bradford, K.J. Nat. Biotechnol. 25, 509–511 (2007).
11. Bradford, K.J., Alston, J.M. & Kalaitzandonakes, N. in Regulating Agricultural Biotechnology: Economics and Policy (eds. Just, R.E., Alston, J.M. & Zilberman, D.) 683–697 (Springer Publishers, New York, 2006).
12. Bagla, P. Science 327, 767–767 (2010).
13. Graff, G.D., Hochman, G. & Zilberman, D. AgBioForum 12, 34–36 (2009).
14. Potrykus, I. Nature 466, 561–561 (2010).
correspondence
ProHits: integrated software for mass spectrometry–based interaction proteomics
To the Editor: Affinity purification coupled with mass spectrometric identification (AP-MS) is now a method of choice for charting novel protein-protein interactions and has been applied to a large number of both small-scale and high-throughput studies1. However, general and intuitive computational tools for sample tracking, AP-MS data analysis and annotation have not kept pace with rapid methodological and instrument improvements. To address this need, we have developed the ProHits laboratory information management system platform. ProHits is a complete open source software solution for MS-based interaction proteomics that manages the entire pipeline from raw MS data files to fully annotated protein-protein interaction data sets. It was designed to provide an intuitive user interface from the biologist’s perspective and can accommodate multiple instruments within a facility, multiple user groups, multiple laboratory locations and any number of parallel projects. ProHits can manage all project scales and supports common experimental pipelines, including those using gel-based separation, gel-free analysis and multidimensional protein or peptide separation. This software platform is a client-based HTML program written in PHP (PHP: Hypertext Preprocessor) that runs a MySQL database on a dedicated server. The complete ProHits software solution consists of two main components: a ‘Data Management’ module and an ‘Analyst’ module (Fig. 1a; see Supplementary Fig. 1 for data structure tables). These modules are supported by an ‘Admin Office’ module, in which projects, instruments, user permissions and protein databases are managed (Supplementary Fig. 2).
A simplified version of the software suite (‘ProHits Lite’), consisting only of the Analyst module and Admin Office, is also available for users with preexisting data management solutions or who receive precomputed search results from analyses performed in a core MS facility (Supplementary Fig. 3). A step-by-step installation package, installation guide and user manual (Supplementary Data) are available on the ProHits website (http://www.prohitsMS.com/).
In the Data Management module, raw data from all mass spectrometers in a facility or user group are copied to a single secure storage location in a scheduled manner. Data are organized in an instrument-specific manner, with folder and file organization mirroring the organization on the acquisition computer. ProHits also assigns unique identifiers to each folder and file. Log files and visual indicators of current connection status assist in monitoring the entire system. The Data Management module monitors the use of each instrument for reporting purposes (Supplementary Figs. 4 and 5). Raw MS files can be automatically converted to appropriate file formats using the open source ProteoWizard converters (http://proteowizard.sourceforge.net/). Converted files may be subjected to manual or automated database searches, followed by statistical analysis of the search results, according to any user-defined schedule; search engine parameters are also recorded to facilitate reporting and compliance with MIAPE (Minimum Information about a Proteomics Experiment) guidelines2. Mascot3, X!Tandem4 and the Trans-Proteomic Pipeline (TPP5) are fully integrated with ProHits via linked search engine servers (Supplementary Figs. 6 and 7). The Analyst module organizes data by project, bait, experiment and/or sample, for gel-based or gel-free approaches (Fig. 1a; for description of a gel-based project, see Supplementary Fig. 8). To create and analyze a gel-free affinity purification sample, the user specifies the bait gene name and species. ProHits automatically retrieves the amino acid sequence and other annotation from its associated database. Bait annotation may then be modified as necessary, for example, to specify the presence of an epitope tag or mutation (Supplementary Fig. 9). A comprehensive annotation page tracks experimental details (Supplementary Fig.
10), including descriptions of the Sample, Affinity Purification protocol, Peptide Preparation methodology and liquid chromatography-tandem MS (LC-MS/MS) procedures. Controlled vocabulary lists for experimental descriptions can be added by drop-down menus to facilitate compliance with annotation guidelines, such as MIAPE6 and MIMIx (Minimum
Information about a Molecular Interaction Experiment)7, and to facilitate the organization and retrieval of data files. Free text notes for cross-referencing laboratory notebook pages, adding experimental details not captured in other sections, describing deviations from reference protocols and links to gel images or other file types may be added in the ‘Experimental Detail’ page. Once an experiment is created, multiple samples may be linked to it (e.g., technical replicates of the same sample or chromatographic fractions derived from the same preparation). All baits, experiments, samples and protocols are assigned unique identifiers. Once a sample is created, it is linked to both the relevant raw files and database search results. For multiple samples in high-throughput projects, automatic sample annotation may be established by using a standardized file-naming system (Supplementary Fig. 11) or files may be manually linked. Alternatively, search results obtained outside of ProHits (with the X!Tandem or Mascot search engines) can be manually imported into the Analyst module (Supplementary Fig. 12). The ProHits Lite version enables uploading of external search results for users with an established MS data management system. In the Analyst module, MS data can be explored in an intuitive manner, and results from individual samples, experiments or baits can be viewed and filtered (Supplementary Figs. 13 and 14). A user interface enables alignment of data from multiple baits or MS analyses using the ‘Comparison’ viewing tool. Data from individual MS runs, or derived from any user-defined sample group, are selected for visualization in a tabular format, for side-by-side comparisons (Fig. 1b and Supplementary Figs. 15–17). In the Comparison view, control groups and individual baits, experiments or samples are displayed by column. 
Proteins identified in each MS run or group of runs are displayed by row, and each cell corresponds to a putative protein hit, according to a user-specified database search score cutoff. Cells display spectral count number, unique peptides, scores from search engines and/or protein coverage information; a mouse-over function
reveals all associated data for each cell in the table. For each protein displayed in the Comparison view, an associated ‘Peptide’ link (Fig. 1b) may also be selected to reveal information such as sequence, location, spectral counts and score, for each associated peptide. Importantly, all search results can be filtered. For example, ProHits allows the removal of nonspecific background proteins from the hit list, as defined by negative controls, search engine score thresholds or contaminant lists. Links to the external US National Center for Biotechnology Information (NCBI) and the Biological General Repository for Interaction Datasets (BioGRID)8 databases are provided for each hit to facilitate data interpretation. Overlap with published interaction data housed in the BioGRID database8 can be displayed to allow immediate identification of new interaction partners. A flexible export function enables visualization in a graphical format with Cytoscape9, in which spectral counts, unique peptides and search engine scores can be visualized as interaction edge attributes. The Analyst module also includes advanced search functions, bulk export functions for filtered or unfiltered data, and management of experimental protocols and background lists (Supplementary Figs. 18–20).
Deposition of all MS-associated data in public repositories is likely to become mandatory for publication of proteomics experiments2,7,10. Open access to raw files is essential for data reanalysis and cross-platform comparison; however, data submission to public repositories can be laborious due to strict formatting requirements. ProHits facilitates extraction of the necessary details in compliance with current standards and generates Proteomics Standards Initiative (PSI) v2.5 compliant reports11, either in the PSI-MI tab-delimited (MITAB) format for BioGRID8 or in XML format for submission to International Molecular Exchange (IMEx) consortium databases12, including IntAct13 (Supplementary Fig. 21).
MS raw files associated with a given project can also be easily retrieved and grouped for submission to data repositories, such as Tranche14.
ProHits was developed to manage many large-scale in-house projects, including a systematic analysis of kinase and phosphatase interactions in yeast, consisting of 986 affinity purifications15. Smaller-scale projects from individual laboratories are readily handled in a similar manner. Examples of AP-MS data from both yeast and mammalian projects are provided in a demonstration version of ProHits (http://www.prohitsMS.com/) and in Supplementary Data. The modular architecture of ProHits will accommodate additional new features, as dictated by future experimental and analytical needs. Although ProHits has been designed to handle protein interaction data, simple modifications of the open source code will enable straightforward adaptation to other proteomics workflows.
Figure 1 Overview of ProHits. (a) Modular organization of ProHits. The Data Management module backs up all raw MS data from acquisition computers and handles data conversion and database searches. The Analyst module organizes data by project, bait, experiment and sample (gel-free project shown; see Supplementary Fig. 8 for gel-based organization). Search results from the Data Management module are parsed to individual samples defined within the Analyst module. ProHits can handle large collaborative projects and offers several security layers. In the Analyst module, several view, filter and export functions enable data analysis. Functions provided by external software are listed on the right. (b) ProHits Comparison page. On the left are shown filtered Comparison results for four human baits and one negative control (see Supplementary Fig. 17 for unfiltered data). Display, sort, filter and literature overlap options are listed on the top; selected options in this example are shown in red. Filtered results are displayed at the bottom of the page. Columns represent individual baits. Comparison at the Experiment or Sample levels is also possible. Rows list the hits that pass selected filters. Color coding and intensity in each cell are based on the property selected for visualization, shown for this example as total peptide numbers; mouseovers of each cell will list all properties. A star or triangle inside the cell indicates an interaction identified in previous high-throughput (star) or low-throughput (triangle) studies in BioGRID. Each term in the hits column is hyperlinked to external databases (EntrezGene, BioGRID or NCBI Protein) or to the list of identified peptides. The top right shows the visualization of data in Cytoscape with MS information encoded as an edge attribute. Interactions detected for the example bait protein WASL that are not reported in BioGRID are shown as blue edges, with color intensity mapped to spectral counts and thickness mapped to number of unique peptides; overlap interactions detected in both the experiment and in BioGRID are shown in green; interactions detected only in BioGRID are shown in gray. At the bottom right is an example of the Peptide view for the protein WIPF3 in the WASL AP-MS experiment.
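The background filtering and literature overlap described for the Analyst module can be illustrated with a minimal Python sketch. This is not ProHits code: the `Hit` record, the function names, the score scale and all bait and prey names and values below are hypothetical, chosen only to show the logic of dropping hits that fall below a score cutoff, appear on a contaminant list or are also seen in a negative control, and then separating previously reported interactions from new ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One prey protein identified in one purification (hypothetical record)."""
    bait: str            # bait used for the affinity purification
    prey: str            # protein identified by MS
    spectral_count: int  # number of spectra assigned to the prey
    score: float         # search engine score (scale is engine-specific)


def filter_hits(hits, control_bait, min_score, contaminants):
    """Keep bait hits that pass the score cutoff and are not background."""
    # Apply the score cutoff first, to bait and control runs alike.
    passing = [h for h in hits if h.score >= min_score]
    # Any prey that passes the cutoff in the negative control is background.
    background = {h.prey for h in passing if h.bait == control_bait}
    return [
        h for h in passing
        if h.bait != control_bait
        and h.prey not in contaminants
        and h.prey not in background
    ]


def split_by_literature(hits, known_pairs):
    """Partition hits into previously reported and new (bait, prey) pairs."""
    reported = [h for h in hits if (h.bait, h.prey) in known_pairs]
    novel = [h for h in hits if (h.bait, h.prey) not in known_pairs]
    return reported, novel


# Invented toy data; none of these values come from the paper.
hits = [
    Hit("BAIT1", "PREY_A", 12, 95.0),
    Hit("BAIT1", "PREY_B", 8, 80.0),
    Hit("BAIT1", "KERATIN", 30, 99.0),   # on the contaminant list
    Hit("BAIT1", "PREY_C", 5, 70.0),     # also seen in the control
    Hit("BAIT1", "PREY_D", 2, 20.0),     # below the score cutoff
    Hit("CONTROL", "PREY_C", 6, 75.0),
]

filtered = filter_hits(hits, control_bait="CONTROL", min_score=50.0,
                       contaminants={"KERATIN"})
reported, novel = split_by_literature(filtered, {("BAIT1", "PREY_A")})
print([h.prey for h in reported], [h.prey for h in novel])
```

In this sketch the control is treated as a hard exclusion list; a spectral-count ratio between bait and control would be a softer alternative, but that is a design choice not described in the text.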
Note: Supplementary information is available on the Nature Biotechnology website.
AUTHOR CONTRIBUTIONS G.L. and J.Z. devised and coded all aspects of the platform; C.S. and B.-J.B. implemented protein annotation and provided advice on database architecture; Y.D. wrote the Mascot parser; B.L., A.B., Z.-Y.L., K.C., A.P., A.I.N., T.P., J.L.W. and B.R. provided suggestions on software features; M.T. conceived and guided the project; A.-C.G., B.R. and G.L. wrote the instruction manuals; M.T. and A.-C.G. co-directed project development; A.-C.G. wrote the manuscript with input from B.R. and M.T.
ACKNOWLEDGMENTS We thank G. Bader, H. Hermjakob, S. Orchard, J.A. Vizcaíno, C. Le Roy, R. Beavis and members of the Tyers and Gingras laboratories for helpful discussions. We are grateful to D. Figeys, S. Angers, D. Fermin, T. LeBihan, F. Ellisma, C. Poitras and B. Coulombe for testing beta versions of ProHits. We thank W. Dunham, E. Deutsch, D. Fermin, T. Glatter, M. Goudreault, L. D’Ambrosio and R. Ewing for critical reading of the manuscript and instruction manual and L. Ng, J. Wei and N. Mohammad for IT support. Supported by grants from the CIHR (MOP-84314 to A.-C.G., MOP12246 to M.T., MOP-81268 to B.R., GSP-36651 to T.P., J.L.W. and M.T., FRN 82940 to M.T. and a resource grant to T.P., A.-C.G., J.L.W. and M.T.), the NIH (5R01RR024031 to M.T., 1R01GM094231-01 to A.I.N. and A.-C.G., and CA-126239 to A.I.N.), MRI-ORF (T.P., J.L.W. and A.-C.G.), the Canada Foundation for Innovation (T.P., J.L.W., A.-C.G. and M.T.), and Genome Canada through Ontario Genomics Institute (T.P. and J.L.W.). We wish to acknowledge support from the Mount Sinai Hospital Foundation; Canada Research Chairs in Functional Genomics and Bioinformatics to M.T., in Proteomics and Molecular Medicine to B.R., and in Functional Proteomics to A.-C.G.; the Lea Reichmann Chair in Cancer Proteomics to A.-C.G.; and a Scottish Universities Life Sciences Alliance Research Professorship and a Royal Society Wolfson Research Merit Award to M.T.
COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
Guomin Liu1, Jianping Zhang1, Brett Larsen1, Chris Stark1, Ashton Breitkreutz1, Zhen-Yuan Lin1, Bobby-Joe Breitkreutz1, Yongmei Ding1, Karen Colwill1, Adrian Pasculescu1, Tony Pawson1,2, Jeffrey L Wrana1,2, Alexey I Nesvizhskii3, Brian Raught4, Mike Tyers1,2,5 & Anne-Claude Gingras1,2
1Centre for Systems Biology, Samuel Lunenfeld Research Institute, Toronto, Ontario, Canada. 2Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. 3Departments of Pathology and Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA. 4Ontario Cancer Institute and McLaughlin Centre for Molecular Medicine, Toronto, Ontario, Canada. 5Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK.
e-mail: [email protected] or [email protected]
1. Gingras, A.C., Gstaiger, M., Raught, B. & Aebersold, R. Nat. Rev. Mol. Cell Biol. 8, 645–654 (2007).
2. Taylor, C.F. et al. (MIAPE). Nat. Biotechnol. 25, 887–893 (2007).
3. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Electrophoresis 20, 3551–3567 (1999).
4. Craig, R. & Beavis, R.C. Bioinformatics 20, 1466–1467 (2004).
5. Keller, A., Eng, J., Zhang, N., Li, X.J. & Aebersold, R. Mol. Syst. Biol. 1, 2005.0017 (2005).
6. Ewing, R.M. et al. Mol. Syst. Biol. 3, 89 (2007).
7. Orchard, S. et al. Nat. Biotechnol. 25, 894–898 (2007).
8. Breitkreutz, B.J. et al. Nucleic Acids Res. 36, D637–640 (2008).
9. Shannon, P. et al. Genome Res. 13, 2498–2504 (2003).
10. Cottingham, K. J. Proteome Res. 8, 4887–4888 (2009).
11. Hermjakob, H. et al. Nat. Biotechnol. 22, 177–183 (2004).
12. Orchard, S. et al. Proteomics 7 Suppl 1, 28–34 (2007).
13. Kerrien, S. et al. Nucleic Acids Res. 35, D561–565 (2007).
14. Falkner, J.A., Hill, J.A. & Andrews, P.C. Proteomics 8, 1756–1757 (2008).
15. Breitkreutz, A. et al. Science 328, 1043–1046 (2010).
More sizzle than fizzle
To the Editor:
In an echo of Mark Twain’s response when reading his own published obituary that “The report of my death has been exaggerated,” I should like to correct an inaccuracy about GlaxoSmithKline’s (Brentford, UK) EpiNova Discovery Performance Unit (DPU), which was mentioned in Catherine Shaffer’s news article entitled “Pfizer explores rare disease path” from the September issue. The article suggested that EpiNova had “fizzled out.” As vice president and head of the EpiNova DPU, I can confirm that, on the contrary, this early drug discovery unit continues to research and build alliances in our search to apply the knowledge of epigenetics to the quest for new medicines for patients. In fact, with our own first-class science, innovation and entrepreneurial spirit, and our external alliances with leading epigenetics research units, including the biotech company Cellzome (Heidelberg, Germany), and Cambridge (UK), Harvard (Cambridge, MA), Oxford (UK) and Rockefeller (New York) universities, we are in glowing health. Many of your readers who have heard our presentations at the American Chemical Society meeting this August in Boston, as well as Miptec 2010 and the Society for Medicines Research Epigenetics Meeting held last month in Basel and London, respectively, will already know this. We are also sponsoring the “Epigenetics of Chromatin Modifications in Inflammation” meeting, taking place in Oxford this December. We look forward to building on the excellent collaborations that are already in place and advancing our work in the area of immuno-inflammation long into the future, with the aim of bringing out new innovative medicines for immuno-inflammatory diseases.
COMPETING FINANCIAL INTERESTS The author declares competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/
Kevin Lee EpiNova DPU, iiCEDD, GlaxoSmithKline, Stevenage, UK. e-mail: [email protected]
commentary
case study
The path less costly
Brady Huggett
When faced with a competitive threat, two companies took diametrically opposite approaches. Both were ultimately successful, but Genzyme’s decision proved to be the cleaner and cheaper option.
As the world’s leader in developing enzyme-replacement drugs, Genzyme has always understood the importance of being first to market. In 1991, the company obtained approval for Ceredase (alglucerase injection), an enzyme replacement therapy for the lysosomal storage disease (LSD) type 1 Gaucher disease. Three years later, its second-generation product, Cerezyme (imiglucerase for injection), was also cleared for commercialization. The lack of treatments for lysosomal storage diseases and effective patient outreach and marketing meant that Genzyme could command soaring prices for its orphan treatments. In 2000, the two drugs alone provided 66% of Genzyme’s entire product revenue.
At this time, Novazyme Pharmaceuticals was a young company developing a preclinical product for another lysosomal storage disease, Pompe disease. Its patented phosphotransferase technology was designed to add mannose-6-phosphate and N-acetylglucosamine sugars to recombinant alpha-glucosidase produced in Chinese hamster ovary (CHO) cells, thus increasing uptake of the enzyme in the patient. The idea was that less drug would be needed per patient, and any eventual product would be priced lower than competitors, although approval lay some years away. By June 2001, however, the company was in financial trouble: it had about $5.5 million in cash and equivalents, with no incoming revenue and a six-month burn of $11.9 million. The company needed help if it was to handle the expense of clinical trials.
There was interest from Genentech, which offered a collaboration, including $30 million upfront, milestone payments and 50% of any sales. But Genzyme, which had its own product in development for Pompe, wanted more than a partnership. It offered to buy the company outright for $137.5 million, pegging another $87.5 million on milestones surrounding approval of one or more drugs that incorporated Novazyme’s platform technology.
Genzyme was already conducting an extension of a phase 2 trial with transgenically derived human alpha-glucosidase for Pompe disease, and also a phase 2 trial of a CHO cell–derived alpha-glucosidase product in-licensed from Synpac, of Research Triangle Park, North Carolina, for the same indication. It also had its own, internally developed compound for Pompe. By taking Novazyme’s drug in house, Genzyme could line it up against its three candidates in a massive preclinical trial and move the best one forward based on results. Ultimately, the best replacement protein turned out not to be Novazyme’s drug; worse still, the data surrounding the technology platform proved to be nonreproducible. It seemed the buyout was a bust all round.
But consider another company’s dance with a potential competitor: Genentech of South San Francisco, California, and its twisting history with Tanox and Novartis of Basel. Houston-based Tanox was founded in 1986 to focus on anti-IgE antibodies and by 1989 was looking for a clinical development partner; it sent samples of its candidate to both Genentech and Ciba Geigy (the company that would later become Novartis). Genentech passed. Ciba Geigy, however, began working with Tanox on anti-IgE antibodies for allergic diseases. Yet Genentech clearly had interest in the area, because it began its own anti-IgE program a few years later, a move that prompted a misappropriation suit from Tanox. The companies fought in court for three years before Genentech, Tanox and Novartis reached a settlement and entered a cross-licensing agreement for anti-IgE antibodies.
That was hardly the end for Genentech. After the three companies surveyed their collective R&D efforts and pushed Genentech’s anti-IgE product (eventually called Xolair; omalizumab) to the front, Tanox took it upon itself to develop its discarded anti-IgE antibody (TNX-901) for peanut allergy, believing this to be well within its rights under the original three-way contract. Tanox’s partners did not see it the same way, and the trio went back to court to settle matters, with Tanox eventually receiving $6.6 million in 2004 with the stipulation that it let TNX-901 die.
Nor was this the end. Xolair was approved in 2003 for asthma, and has gone on to sell well: it approached blockbuster status in 2009 and through last year has sold about $3.3 billion worldwide. Tanox had a slice of sales through the cross-licensing agreement, which became the impetus for Genentech to approach Tanox in December 2005 with a merger offer. The deal was announced in November 2006 at $919 million, a 46% premium to Tanox’s stock price pre-announcement. One can only wonder what the merger would have cost Genentech 16 years before, when Tanox first came knocking. Instead, Genentech rebuffed that initial proposition and went to court with Tanox twice before finally swallowing it in a move that had more to do with accounting than science or patients.
In both of these mergers, the gem compound eventually died, hardly a surprise given the rates of attrition in drug pipelines. At the same time, both of these buyouts could technically be called a success. Bringing aboard Novazyme and its product temporarily muddied the waters for Genzyme, but it knew that whatever eventually floated to the top, it would own. The acquisition crisply and cleanly removed a noisy competitor, leaving Genzyme to focus on approval: Myozyme was cleared in 2006 and has since surpassed $1 billion in worldwide sales. The numbers worked out fine for Genentech, too (the Xolair revenue slice pays for the price of Tanox), but the company wasted resources and money in court and ended up paying more than it might have. It also received negative press for its court fight over Tanox’s attempts to develop a peanut allergy drug. Crisp and clean this was not.
Brady Huggett is Business Editor at Nature Biotechnology.
patents
Faculty and employee ownership of inventions in Australia
Amanda McBratney & Julie-Anne Tarr
A recent Australian legal decision means that, unless faculty members are bound by an assignment or intellectual property policy, they may own inventions resulting from their research.
Amanda McBratney and Julie-Anne Tarr are in the Faculty of Business, Queensland University of Technology, Brisbane, Queensland, Australia. Amanda McBratney is also a consultant with McCullough Robertson Lawyers, Brisbane, Queensland, Australia. e-mail: [email protected]
Photo: The Australian Federal Court building in Melbourne. Source: Creative Commons/Adz
Thirty years after its introduction, the US Bayh-Dole Act, which vests ownership of employee inventions in the employer university or research organization, has become a model for commercialization around the world. In Australia, despite recommendations that a Bayh-Dole–style regime be adopted, the recent decision in University of Western Australia (UWA) v. Gray1 has moved the default legal position in a diametrically opposite direction. A key focus of the debate was whether faculty’s duty to carry out research also encompasses a duty to invent. Late last year, the Full Federal Court confirmed a lower court ruling that it does not, and this year the High Court refused leave to appeal (denied certiorari). Thus, Gray stands as Australia’s most faculty-friendly authority to date.
The US common-law position
Absent an express written agreement assigning the rights in faculty inventions to a university, or a legally binding faculty handbook or university intellectual property (IP) policy, the common-law position on ownership of faculty and other employees’ inventions remains unclear. The usual starting point is the US Supreme Court’s decision in Standard Parts Co. v. Peck2, which states that an employer owns employees’ inventions if the employee was ‘hired to invent’. Under the later United States v. Dubilier Condenser case3, if the employment was merely ‘general’, then even if the invention is in the employee’s field, and relevant to the employer’s business, and even if it was developed on the employer’s time and/or with the employer’s resources, the employee owns it. The employer gets a shop right as compensation: an irrevocable, royalty-free, nonexclusive, largely nontransferable, implied license to use the invention. If the employment was general and no employer time or resources were used, the employee owns all rights unencumbered.
Whether an employee was hired to invent often depends on the specificity of the task delegated by the employer. Generally, being engaged to do research or improve products is insufficient; if courts were too ready to vest ownership in the employer, inventive creativity might be discouraged4. Thus, the employee must be hired to invent the invention at issue; if hired to invent A, they will not lose the right to invention B.
In the university context, despite increasing commercialization, most researchers are still not explicitly hired to invent for the university’s commercial gain. The core of ‘public good’ still lingers, and the pursuit of knowledge for its own sake has yet to disappear. The difficulty lies in deciding whether a faculty member’s duty to research encompasses a duty to invent.
Those arguing in the affirmative invariably cite Speck v. North Carolina Dairy Foundation5. Speck, a professor and researcher at North Carolina State University, developed a process for producing a sweet-tasting acidophilus milk. He successfully drove efforts to have the milk mass produced, but the university refused to pay him anything out of its licensing royalties. Speck sued, and the case turned on whether he could show he had a property interest in the process. The North Carolina Supreme Court found that Speck was hired to invent, so he had no rights in the process. Proponents of Speck usually argue that the Houghton v. United States case lends further substance to the proposition that researchers are hired to invent:
Let a case be supposed of a charitable foundation, which employs chemists and physicians to study diseases, with a view of discovering a cure for them, one of whose employees, in the course of experiments conducted for it, discovers a remedy which it is seeking, and for the discovery of which the experiments are conducted, and procures a patent on it. Should such employee be allowed to withhold the patent from the foundation for his own profit, merely because the foundation does not desire to monopolize the remedy but to give the benefit of the discovery to mankind?…To ask such a question is to answer it…6.
However, the Houghton case involved a chemist expressly directed to develop a particular fumigant, and the court was merely dispelling the argument that the hired-to-invent doctrine would not apply if the employer were uninterested in patenting. In addition, many commentators have criticized Speck, particularly because the court considered Speck’s use
of the university’s time and resources as a factor indicating he was hired to invent, rather than, as settled case law indicates, a factor indicating a shop right should be awarded. So, although the university ‘permitted and encouraged’ the research, there was no evidence Speck’s research agenda was controlled by his department or the university; it was ‘motivated simply by his scientific curiosity’7. Some have lamented the policy signals Speck sends: if faculty are hired to invent, how does this sit with the usual notions of academic freedom8? This is the precise question the Australian courts grappled with, as discussed below.
The position under Bayh-Dole
Given the uncertainty in US common law, the passage of the Bayh-Dole Act in 1980 was a welcome (for some) introduction of an overriding standard for determining ownership of inventions created with the use of federal research funds. The Act aims to promote the commercialization of inventions by allowing universities and other recipients of federal research funds to elect to take title to subject inventions if they agree to file a timely patent application. The university must then retain title to the inventions and share licensing proceeds with the employee inventors, and the balance of licensing income must be used to support scientific research or education.
Against the backdrop of Bayh-Dole, most universities have in recent times adopted a dual approach, by issuing IP policies that vest ownership of faculty inventions in the university, and by requiring that faculty sign invention assignment agreements (either as a condition of employment or as a belated attempt to stitch up the ownership question). A fundamental tenet of contract law is that parties are free to enter bargains as they see fit. Consequently, where faculty assign inventions, it is assumed that this has been compensated by the payment of wages.
So courts have upheld assignments over objections that they are unconscionable or were coerced under duress, even where the university has paid as little as $1 or where continued employment is the only consideration9.
However, although the arrangement under Bayh-Dole has recently garnered support10, there is a growing body of agitators for faculty ownership. In an early article, Chew11 argues that, in fact, university ownership is not legally required by Bayh-Dole. Faculty ownership would allow universities to maintain their academic mission and primacy of basic research. It would also enhance faculty creativity. Universities, Chew argues, would not lose significant revenue because royalty income represents a relatively minor part of most university funding. In any event, she suggests that
university technology transfer offices (TTOs) could negotiate a percentage of royalty income in return for marketing faculty inventions, and that this could offset losses incurred. Similar arrangements are proposed by Kulkarni12 and Smith13.
More recently, Clements uses law and economics to argue that Bayh-Dole’s impact has been marginalized by the practicalities of implementation: refusals to disclose inventions, difficulties of technology transfer due to information asymmetries and deadweight losses caused by exclusive licenses. He argues that faculty ownership could be achieved by amendments to the Act and, possibly, by courts’ refusals to uphold assignment agreements. Like Chew, Clements argues that universities would not forgo significant revenue because universities’ income from licensing is proportionately small compared to other revenue sources. Faculty ownership would remove at least part of the perceived issue with publication delay and secrecy because many ‘archetypal academic scientists’ who ‘hate the Act’ would choose to publish rather than patent. It would also reverse the effect of the Act ‘steer[ing] research down less interesting avenues’ because faculty members who chose to publish rather than patent would channel their efforts toward more basic research14.
Kenney and Patton similarly argue that university ownership is not optimal in terms of either maximizing economic efficiency or advancing the social interest of rapidly commercializing technology and encouraging entrepreneurship. Under faculty ownership, the inventors would be the principals and could choose their agent TTO; TTOs would thus be forced to become more competitive. Faculty ownership would also shrink the gray market in faculty inventions: in one study, over 20% of professors had founded firms without university licenses; in another, 42% of professors who patented did so without informing their TTOs15.
Nevertheless, Kenney and Patton candidly acknowledge that faculty ownership brings its own problems, including the obvious rejoinder to Clements that, in fact, problems with secrecy and nondisclosure may be exacerbated by the faculty’s increased stake in the rewards of patenting. Further, there is a risk that some inventions would not be commercialized, and that some inventors may be incompetent at commercialization, but they maintain that this decentralized ineptitude is better than the present centralized ineptitude of many TTOs that affects all university inventors.
However, Kenney and Patton also point to a significant detriment of faculty ownership for which they offer no remedy: it could discourage collaborative research, particularly large-scale multi-institutional collaborative research, because the large number of co-owners makes logistics extremely difficult. As Bruun explains, the fragmented ownership problem faced by large-scale collaborative projects was one of the main reasons that Finland followed other European countries and abolished the ‘teacher exception’ (faculty ownership) in 2007 and adopted Bayh-Dole–like reforms16.
For every commentator applauding Bayh-Dole, there are others now querying its efficacy. It does not look like the debate will be settled any time soon. In the meantime, proselytizing aside, for most US academics it’s business as usual: contractual arrangements are likely to be binding and will decide the ownership issue in the university’s favor.
The UWA v. Gray case
In Australia, employers derive ownership rights under §15(1)(b) of the Patents Act. This states that employers “would, on the grant of a patent…be entitled to have the patent assigned” to them. Entitlement to assignment is determined by a common-law principle established by the English case of Sterling Engineering v. Patchett17, which (like the US Supreme Court’s decisions in Standard Parts and Dubilier) dictates that the employer will own when an employee, in the course of employment, makes an invention that it was his or her duty to make.
The question of whether Australian faculty were hired to invent was, until UWA v. Gray, left unclear. Some thought the question depended on the discipline and on whether the research might yield an invention18. For example, if research into diabetes could result in an invention, it was thought that the duty to research might encompass a duty to invent.
The protagonist in the case, Bruce Gray, was first employed at Melbourne University in the early 1980s, where he initiated research into the treatment of liver cancer by selectively delivering anticancer therapies to tumor sites.
The University of Western Australia subsequently employed Gray from 1985 to teach and “undertake research and to organize and generally stimulate research among the staff and students.” Gray continued in the same line of research, and after patenting various inventions, he assigned his IP to his commercialization company Sirtex Medical (Lane Cove, New South Wales, Australia). Sirtex went on to float on the Australian Stock Exchange, and its current market capitalization is more than AUS$275 million; it is now one of Australia’s largest biotech companies. Gray’s employment contract referred to the UWA Statutes and Regulations, which included Patents Regulations and, later, IP Regulations.
However, the earlier Regulations did not vest IP rights in faculty inventions in UWA; they merely assumed that the university had such rights. The later Regulations did purport to vest ownership in UWA, but the trial judge and Court of Appeal held these Regulations invalid: the Act establishing the University did not allow the Senate to make regulations divesting its faculty’s IP rights. So the invalid Regulations could not be incorporated into Gray’s contractual terms. An additional problem was that the Regulations had not been properly promulgated and so were not effective until after Gray’s employment at UWA had fundamentally changed.

So UWA was thrown back on its common-law position. It argued that Gray’s contract was subject to a term, implied by law, that he must assign ownership of any inventions to the University. Assignment was said to be required wherever an employee is engaged, instructed or authorized to solve technical problems, improve the employer’s technology or undertake research from which such invention may arise.

The Australian ruling

The appeals court, like the trial judge, found that the university/faculty relationship raised such distinctive considerations that it was inappropriate to accept that there is a general presumption that a university will own inventions developed in the course of its faculty’s research. Gray’s circumstances of employment also weighed against implication of the term. In particular:

(i) Gray was not under any (express) duty to invent anything. Consistent with traditional notions of academic freedom, Gray was free to choose his line of research and the manner of its pursuit.

(ii) Gray was free to publish his research results and any inventions developed, notwithstanding that this might destroy the patentability of any inventions. The fact that UWA did not impose any obligations of secrecy was also consistent with traditional notions of academic freedom, and inconsistent with an intention by UWA to own and commercialize faculty inventions. The Court of Appeal considered that the importance of the freedom to publish was ‘self-evident’. Implication of a term as posited by UWA would have a ‘significant collateral impact’ on academics if it were underpinned by a corresponding duty not to disclose.

(iii) Gray expended much time and effort applying for external funds. As with many cash-strapped universities, UWA wanted to foster, but could not fund, Gray’s research. Unlike the general employer-employee scenario, in which the employer has funded the employee’s work, implying UWA’s term would ‘allow UWA to reap where various entities had sown’.

(iv) Gray engaged in significant collaborative work with external organizations. The need for inter-institutional cooperation weighed against exclusive appropriation of the end product by one institution via an implied term. In addition, the evidence on information exchanges in Gray’s field of research demonstrated that sharing of research results, and know-how, was both necessary and accepted, which further argued against implication of the term.

Ultimately, UWA was unsuccessful on all counts. In what might seem the final legal snub, Australian law has yet to recognize the existence of a shop right, so UWA was left without any rights whatsoever to Gray’s inventions. As the High Court drily remarked in refusing leave to appeal, the case emphasizes the need for express contractual arrangements on ownership.

However, it didn’t all go well for Gray. Sirtex successfully cross-claimed against him for breach of his directors’ duties and for misleading and deceptive conduct (for failing to inform the company about the potential ownership problems), resulting in an order for Gray to pay it almost AUS$2 million.

The aftermath

So the common-law position on faculty ownership in Australia seems to have been resolved, for the moment, in favor of faculty. Not surprisingly, Gray ignited a rash of university soul-searching, in which the implications of faculty ownership and how best to secure university ownership were pondered. Yet the Gray case is in many respects an unsatisfactory precedent, and there is certainly an array of very particular facts and circumstances that will allow later courts to sidestep it and discount it as an all-embracing authority.
Nevertheless, Gray’s emergence on the legal landscape will likely lead to renewed calls for tailored statutory intervention. Christie et al. (ref. 19) raise the usual arguments:

“The default position should not vest ownership of patents in employee inventors nor funding agencies. This is because employees may not recognize the commercial value of their inventions and because of the potential problems with fragmentation of ownership. Indeed, experience in Canada has shown that academic staff members in universities often lack the time and expertise required for commercialisation. Funding agencies are also not well placed to assume ownership rights, as they are one step removed from the inventive process…”

There is little doubt that Bayh-Dole has gone a considerable distance towards clarifying title for research with federal funding links. It means that, like the hamster bite that kills, the uncertainty of the common law is rarely a problem (but when it is, it can still be ugly). However, it must be acknowledged that a spectrum of issues, from capturing ownership outside the Act through to how it might best be overhauled to mitigate a host of anticompetitive byproducts, continues to raise concerns.

So although arguments can be assembled in favor of the adoption of a Bayh-Dole–style approach in Australia, as always a careful balancing process should be followed. On the one hand, a new statute will inevitably limit future flexibility and interfere with private contracting rights. On the other, given the vexed issue of ownership and the complications of Gray, the answer is probably that some form of statutory clarification is justified.

Whether legislation like Bayh-Dole would have a good fit with the Australian research landscape is another matter. The Australian Productivity Commission (Canberra, Australian Capital Territory, Australia) recently urged a “cautious” approach to adopting an Australian Bayh-Dole Act. Although Bayh-Dole was introduced in response to concerns that many university inventions were not being commercialized, there is little evidence of a similar phenomenon in Australia. The Commission found that there were already financial incentives for Australian universities to commercialize, such as the one-third royalty-sharing arrangement most commonly divided among the university, the inventor and the inventor’s academic department or faculty. The Commission also pointed out the potential for new Bayh-Dole–style legislation to adversely affect the incentives operating within universities, suggesting that the delicate balance between commercialization activities and the ‘academic traditions of openness and curiosity driven research’ might be disrupted (ref. 20).

For now, Australian inventors and researchers are left with Gray, and the prospects of Bayh-Dole reforms appear to be slim to none in the immediately foreseeable political future. For those few faculty members or researchers at other institutions who, through good luck or good management, are not bound to contractual assignments, the decision is a windfall. For most Australian faculty, as in the United States, contract will usually reign supreme, and universities will usually win the ownership tug-of-war. Nevertheless, the Gray decision is a salient reminder that the issue of ownership should never be underestimated in regard to its ability to provoke a good fight.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
1. University of Western Australia (UWA v. Gray) [2008] FCA 498 (first instance); [2009] FCAFC 116 (Full Federal Court); [2010] HCATrans 11 (High Court of Australia refusing special leave to appeal (i.e., certiorari denied)).
2. Standard Parts Co. v. Peck, 264 U.S. 52 (1924).
3. U.S. v. Dubilier Condenser Corp., 289 U.S. 706 (1933).
4. Chisum, D.S. Chisum on Patents, 8–22 (LexisNexis, Matthew Bender, 1978).
5. Speck v. North Carolina Dairy Foundation, 307 S.E.2d 785 (N.C. 1983) rev’d 319 S.E.2d 139 (1984).
6. Houghton v. United States, 23 F.2d 386, 390 (1928, CCA, 4th cir).
7. Chew, P.K. Wis. L. Rev. 259–314 (1992).
8. Browning, C.G. Jr. N.C.L. Rev. 63, 1248–1259 (1985).
9. Merges, R. Harv. J. Law & Tec. 13, 1–54 (1999).
10. Luppino, A.J. UMKC Law Review 78, 367–427 (2009).
11. Chew, P.K. Wis. L. Rev. 259–314 (1992).
12. Kulkarni, S.R. Hastings L.J. 47, 221–256 (1995).
13. Smith, K.G. 1 V.A J.L. & Tech. 1(4), 15 (1997).
14. Clements, J.D. IDEA 49, 469–516 (2009).
15. Kenney, M. & Patton, D. Research Policy 39, 1407–1422 (2009).
16. Bruun, N. Ga. St. U. L. Rev. 23, 913–935, 921 (2007).
17. Sterling Engineering Co. Ltd. v. Patchett [1955] AC.
18. Monotti, A. & Ricketson, S. Universities and Intellectual Property (Oxford University Press, 2003).
19. Christie, A. et al. Analysis of the legal framework for patent ownership in publicly funded research institutions (Intellectual Property Research Institute of Australia, 2003).
20. Australian Productivity Commission. Public support for science and innovation (APC Final Report) (2007).
patents
Recent patent applications in gene synthesis

| Patent number | Description | Assignee | Inventor | Priority application date | Publication date |
| --- | --- | --- | --- | --- | --- |
| WO 2010079039 | A method for modulating macronutrients, comprising producing a synthetic gene coding for at least one enzyme, expressing and optionally activating the enzyme and contacting it with macronutrients; useful in, e.g., pharmaceutical composition. | Nestec (Vevey, Switzerland) | Arigoni F, Bureau-Franz I, Maynard F, Pridmore R | 1/6/2009 | 7/15/2010 |
| WO 2010071602 | A method of synthesizing a nucleic acid molecule involving assembling overlapping oligonucleotides and amplification with PCR, using a single PCR, with distinct oligonucleotides and annealing temperatures. | Agency for Science, Technology & Research (Singapore) | Li M, Ye H, Ying JY | 12/19/2008 | 6/24/2010 |
| KR 2010040634 | A method of manufacturing a deletion cassette, comprising artificial sequence linkers, by performing PCR amplification using a primer pair existing in the artificial sequence linker and removing the artificial sequence linker. | Korea Research Institute of Bioscience & Biotechnology (Daejeon, S. Korea) | Han S, Heo G, Kim D, Lee A, Lee H, Lee M, Nam M, Noh E, Park H | 10/10/2008 | 4/20/2010 |
| WO 2010025395, US 20100055085 | A new isolated, synthetic or recombinant nucleic acid comprising, e.g., a nucleic acid (polynucleotide) encoding at least one polypeptide, useful, e.g., for encoding immobilized polypeptide used as food, and in a cosmetic or a cream. | Verenium (Cambridge, MA, USA) | Barton NR, Bueno A, Cuenca J, Dayton CLG, Hitchman T, Kline KA, Lyon J, Miller ML, Wall MA | 8/29/2008 | 3/4/2010 |
| WO 2009138954 | A method of synthesizing a polynucleotide on a solid support comprising partitioning a polynucleotide into an ordered set of palindromeless subunits and ligating the oligonucleotide precursors of each subset. | British Columbia Cancer Agency (Vancouver, BC, Canada) | Coope R, Holt RA, Horspool D | 5/14/2008 | 11/19/2009 |

Source: Thomson Scientific Search Service. The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1800 Diagonal Road, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) 337-9368 (http://www.thomson.com/scientific).
Selected patent expirations/extensions in the second half of 2010

| Generic drug (brand) name | Company | Indication | Patent information | Drug information |
| --- | --- | --- | --- | --- |
| Cidofovir (Vistide) | Gilead; Pfizer | Cytomegalovirus retinitis (in AIDS patients) | The last patent covering the selective nucleotide inhibitor for viral DNA polymerase, which was approved by FDA in 6/96, expired on 6/26/10. | S-N-(3-hydroxy-2-phosphonylmethoxy) propylcytosine derived from the purine analog, S-9-(3-hydroxy-2-phosphonomethoxy) propyl-adenine |
| Bivalirudin (Angiomax) | Biogen Idec; The Medicines Company | Percutaneous coronary interventions (PCIs) | On 8/6/10, USPTO granted a one-year interim extension of US patent no. 5,196,404, which covers the bivalent peptide thrombin inhibitor used in PCIs such as bypass surgery, until 8/13/11. | Bivalirudin (D-FPRPGGGGDGDFEEIPEEYL) is a bivalent thrombin inhibitor comprising a moiety (D-FPRP) that binds thrombin’s active-site cleft and a hirudin-like C-terminal region (DGDFEEIPEEYL) that binds to the thrombin anion-binding exosite |
| rHuPH20 (Hylenex) | Halozyme; Baxter | Adjuvant agent, for increasing drug absorption or dispersion | On 8/12/10, US patent no. 7,767,429 was issued, claiming the proprietary platform for the rHuPH20 PEGylated glycoprotein, which was approved by the FDA in December 2005. The claim, which extends until 9/23/27, also covers formulation of rHuPH20 with other pharmaceutical agents. The European counterpart, EP1603541, provides protection until 3/5/24. | Recombinant human hyaluronidase |
| Docetaxel (Taxotere) | Sanofi-aventis | Various cancers | On 11/14/10, the patents for Taxotere will expire. The patent covering the drug substance expired on 5/14/10. | Taxane, the substance from which Taxotere is derived, is found in the needles of the European yew tree |
| Gemcitabine (Gemzar) | Eli Lilly | Various cancers | On 11/15/10, the compound patent for Gemzar will expire. | Deoxycytidine analog that inhibits DNA synthesis |
| Topotecan (Hycamtin) | GlaxoSmithKline | Various cancers | On 11/28/10, the compound and drug use claim patents for Hycamtin will expire. | A semisynthetic derivative of camptothecin and an anti-tumor drug with topoisomerase I-inhibitory activity |
| Donepezil (Aricept) | Eisai | Alzheimer’s disease | On 11/25/10, the substance patent for Aricept will expire. FDA approved the first generic version of donepezil on 12/11/09. | Reversible acetylcholinesterase inhibitor |

FDA, US Food and Drug Administration. NA, not available. Source: http://biomedtracker.com/
news and views
Timing is everything in the human embryo Ann A Kiessling
A noninvasive imaging method for predicting how human embryos will develop may improve the success and safety of in vitro fertilization.

Ann A. Kiessling is in the Department of Surgery, Harvard Medical School, Boston, Massachusetts, USA.

Humans may be the least fertile mammals on earth (refs. 1,2). Although in vitro fertilization (IVF) technology has helped many couples (an estimated 3 million IVF babies had been born worldwide by 2002), its success rates are low (~25% per procedure), and it has led to an epidemic of high-risk multiple births (refs. 3,4). In this issue, Wong et al. (ref. 5) report a noninvasive strategy for evaluating human embryos that could both improve IVF success rates and decrease the likelihood of multiple gestations. Using time-lapse videography, they discovered characteristics of initial cleavages that predict successful development to the blastocyst stage with >93% accuracy.

A new human life depends upon robust, steadily increasing signals from the fertilized egg to the mother. Failure to elaborate sufficient signal (e.g., the pregnancy hormone, human chorionic gonadotropin) results in a menstrual cycle and expulsion of the fertilized egg, thus freeing maternal resources for another attempt with a new egg. How the early human embryo sends the robust, steadily increasing signals is poorly understood, but one mechanism could be rapid duplication of embryonic DNA with a concomitant increase in signaling output to the mother. This suggests a direct correlation between the speed of chromosome duplication, cell division and successful pregnancy, an observation often reported by programs of assisted reproduction (ref. 4).

Figure 1 Early human development. (a) The zygote possesses one pronucleus containing egg chromosomes and another pronucleus containing sperm chromosomes. Both sets of chromosomes are duplicated before the first cleavage to 2 cells. Wong et al. (ref. 5) discovered that, for successful development to the blastocyst stage, the first cleavage furrow should be only 14 ± 6 min from the beginning to the appearance of 2 cells; the 2-cell stage should last only 11 ± 2 h; and the cleavage of each of the 2 cells to its daughter cells in the 4-cell stage should occur within 1 ± 1.6 h of each other. The morula forms at the 8- to 16-cell stage, trapping 1 or 2 cells inside that undergo commitment to become the inner cell mass (ICM) within the blastocyst. The ICM gives rise to the fetus. The outer cells of the blastocyst become committed to trophoblast, precursor to the placenta. (b) Theoretical aneuploidy in early development. The schematic depicts the highest rate of aneuploidy (purple cells) that could form the ICM from a euploid cell (green) and produce a normal fetus. This theory is supported by several lines of evidence, including chromosomal analyses that reveal both aneuploid and euploid cells in human blastocysts (ref. 8).

IVF involves hormone stimulation of a woman’s ovaries in order to mature multiple eggs, which are removed, fertilized in the laboratory, cultured for 2 to 6 days, and transferred back to her uterus for gestation. Fertilized on day 1, an egg that has duplicated its chromosomes twice and reached the 4-cell stage by early day 2, and reached the 8-cell stage by early day 3, has a higher likelihood of giving rise to an offspring than an egg that duplicated its chromosomes only once and reached the 2-cell stage on day 2 and the 4-cell stage on day 3, but the correlation is imperfect. Faster-cleaving embryos also have a higher likelihood of developing to the blastocyst stage, which marks the first cell-commitment event in early development, but that correlation, too, is imperfect. Many IVF programs extend embryo culture to day 5 or 6 to transfer a single blastocyst. This practice successfully decreases the risk
of multiple gestations while yielding a higher pregnancy rate for women under the age of 36 (ref. 6). But fertilized eggs from many patients do not form blastocysts in culture. Moreover, the well-studied mouse embryo model has taught us that the rapid cleavage rate that occurs in vivo between the 4-cell and 16-cell stage is not reproduced in vitro under existing culture conditions. Because blastocyst formation begins at a defined interval after fertilization, independent of the number of cell divisions, mouse embryos developed in vivo have more than twice as many cells at
the blastocyst stage than embryos developed in culture (ref. 7). Should the situation be the same for human embryos, extended culture would lead to blastocysts with fewer cells available to form the fetus, a possible explanation for the low birth weight reported for some IVF babies (refs. 3,4).

The obvious way to improve IVF rates of pregnancy and of singleton births is to choose one healthy embryo, as soon after fertilization as possible, for transfer at the ideal time into the uterus of the prospective mother. Herein lies the value of the new work by Wong et al. (ref. 5). By stunning time-lapse photography of 100 fertilized human eggs cultured to day 5 or 6, they discovered three characteristics that could predict progression to the blastocyst stage with a sensitivity and a specificity of 93% and 94%, respectively: (i) duration of the first cleavage furrow leading to 2 cells of 14 ± 6 min, (ii) a 2-cell stage lasting only 11 ± 2 h and (iii) the 2-cell blastomeres cleaving to 4 cells within 1.0 ± 1.6 h of each other (Fig. 1a). These noninvasive, early-cleavage parameters should be easily adaptable by IVF programs to help select the best embryo for transfer.

In addition to establishing cell-division guidelines that predict blastocyst development, Wong et al. (ref. 5) correlated patterns of cleavage with gene expression in an additional 142 embryos. Errors in early cleavages that give rise to chromosomal aneuploidy have been cited as leading to embryonic failure (refs. 4,8). But genome-wide analyses of gene expression in normal-appearing, 8-cell human embryos (ref. 9) suggested that aneuploidy may be common in the early cleavage stages of apparently normal embryos. These studies revealed a lack of cell cycle checkpoints, such as Rb and Wee1, and overexpression of cell cycle drivers, such as Cyclins A, B and E, and Myc, which allows for the rapid rates of gene amplification needed for maternal signaling.
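The three timing criteria quoted above from Wong et al. can be sketched as a simple window check. This is only an illustration: it assumes that each reported "mean ± spread" can be read as a hard acceptance window, which may not be how the study built its predictor, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CleavageTimings:
    """Timings for one embryo taken from time-lapse images (hypothetical container)."""
    first_cytokinesis_min: float  # duration of the first cleavage furrow, in minutes
    two_cell_stage_h: float       # length of the 2-cell stage, in hours
    second_division_gap_h: float  # time between the two blastomeres dividing, in hours


# Assumption: each "mean ± spread" from the text is treated as a closed window.
FIRST_CYTOKINESIS_MIN = (8.0, 20.0)  # 14 ± 6 min
TWO_CELL_STAGE_H = (9.0, 13.0)       # 11 ± 2 h
MAX_DIVISION_GAP_H = 2.6             # 1.0 + 1.6 h; a gap cannot be negative


def within_reported_windows(t: CleavageTimings) -> bool:
    """Return True only if all three timings fall inside the assumed windows."""
    furrow_ok = FIRST_CYTOKINESIS_MIN[0] <= t.first_cytokinesis_min <= FIRST_CYTOKINESIS_MIN[1]
    two_cell_ok = TWO_CELL_STAGE_H[0] <= t.two_cell_stage_h <= TWO_CELL_STAGE_H[1]
    gap_ok = 0.0 <= t.second_division_gap_h <= MAX_DIVISION_GAP_H
    return furrow_ok and two_cell_ok and gap_ok


# An embryo sitting exactly on the reported means passes all three checks.
print(within_reported_windows(CleavageTimings(14.0, 11.0, 1.0)))  # True
```

A check like this says nothing about the reported 93% sensitivity and 94% specificity; those figures come from the study's own data and cannot be reproduced from the summary values alone.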
In fact, aneuploidy in early-cleaving embryos may not be lethal to fetal development because most of the early cells will form trophoblast (Fig. 1b). An important feature of early mammalian development is the enormous size of the egg; thus, DNA duplication and cell division to approximately the 64-cell stage occur without the need for cell growth or, perhaps, for growth factor stimulation. Geometrically, at the 16-cell stage, 1 or 2 cells trapped in the middle of the cell mass initiate commitment to inner cell mass (ICM) cells (precursor fetal cells), while the outer cells form the trophoblast lineage (precursor placental cells). Cultured ICM cells can also give rise to embryonic stem cells, which do express Rb and Wee1 (ref. 9), suggesting that the ICM has active cell cycle checkpoints to help maintain the chromosome integrity of the developing embryo.
Wong et al. (ref. 5) describe several aberrant embryo phenotypes, as well as gene expression analyses of individual blastomeres, that reveal asynchrony in cell division, in gene expression and in the degradation of maternal messages. These results suggest that the current view of the early-cleaving embryo (that all cells are equivalent and chromosomally balanced) should be revised to recognize that aneuploidy after the 2-cell stage may be common (ref. 8), that aneuploid trophoblast cells may be able to give rise to a fully functioning placenta and that only ICM cells (as few as 12% at the 16-cell stage) must be chromosomally balanced to give rise to a normal fetus.

Balancing rapid cell divisions with accurate chromosome allocation is the yin and yang of early human development. A new human life arises from fewer than 30% of fertilized human eggs (refs. 1–4). By careful comparison of cell division characteristics with the emerging lists of cell cycle gene elements (ref. 10), we may be able not only to predict which embryos have the greatest potential for life, but also to develop therapies for early embryos that will balance their yin and yang in favor of development to offspring.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Evers, J.L.H. Lancet 360, 151–159 (2002).
2. Macklon, N.S. et al. Hum. Reprod. Update 8, 333–343 (2002).
3. Adashi, E.Y. et al. Reprod. Biomed. Online 7, 515–542 (2003).
4. de Mouzon, J. et al. Hum. Reprod. 24, 2310–2320 (2009).
5. Wong, C.C. et al. Nat. Biotechnol. 28, 1115–1121 (2010).
6. Stillman, R.J. et al. Fertil. Steril. 92, 1895–1906 (2009).
7. Kiessling, A.A. et al. J. Exp. Zool. 258, 34–47 (1991).
8. Northrop, L.E. et al. Mol. Hum. Reprod. 16, 590–600 (2010).
9. Kiessling, A.A. et al. J. Assist. Reprod. Genet. 27, 265–276 (2010).
10. Neumann, B. et al. Nature 464, 721–727 (2010).
Taking the measure of the methylome
Stephan Beck

Two comparative studies from the International Human Epigenome Project find high concordance between different methods for measuring genomic methylation.

Stephan Beck is at the UCL Cancer Institute, University College London, London, UK.

With the rapid development of new methods for epigenomic analysis, the need for a systematic assessment of available technologies has become acute. In this issue, Harris et al. (ref. 1) and Bock et al. (ref. 2) compare the performance of commonly used techniques for DNA methylation analysis in terms of cost, resolution, genome coverage and accuracy. The findings provide a first benchmark of which method works best for which part of the methylome.

In humans, DNA methylation occurs predominantly at cytosine bases in the form of methyl cytosines (mCs), methyl cytosine guanine dinucleotides (mCGs), hydroxymethyl cytosines (hmCs) and, possibly, in other, yet unknown forms. Collectively, these modifications define the DNA methylome of a cell. Together with the study of other epigenetic marks, methylome analysis forms an integral part of ongoing efforts to elucidate the epigenomes of healthy and diseased cell types. Such methylome maps will allow the identification of genomic regions involved in cell differentiation and disease.

Following years of planning by the Epigenome Taskforce (ref. 3) and other initiatives, the studies of Harris et al. (ref. 1) and Bock et al. (ref. 2) mark another milestone for the International Human Epigenome Project (ref. 4). Together with recent papers by Li et al. (ref. 5) and Robinson et al. (ref. 6) (not discussed here), they compare the performance of the main technologies for mCG methylome analysis. Until now, we did not know how well these methods work, what their particular strengths and weaknesses are, or the extent to which the resulting methylation maps overlap. Understanding these issues is especially important when choosing a method for generating so-called reference methylomes, which will be used as definitive resources in future research and must therefore be as accurate and comprehensive as possible.

In all, Harris et al. (ref. 1) and Bock et al. (ref. 2) tested six methods, of which five are sequencing-based and one is array-based. Three of the methods, MethylC-seq (data from ref. 7),
volume 28 number 10 October 2010 nature biotechnology
© 2010 Nature America, Inc. All rights reserved.
news an d v iews the blastocyst stage than embryos developed in culture7. Should the situation be the same for human embryos, extended culture would lead to blastocysts with fewer cells available to form the fetus—a possible explanation for the low birth weight reported for some IVF babies3,4. The obvious way to improve IVF rates of pregnancy and of singleton births is to choose one healthy embryo, as soon after fertilization as possible, for transfer at the ideal time into the uterus of the prospective mother. Herein lies the value of the new work by Wong et al.5. By stunning time-lapse photography of 100 fertilized human eggs cultured to day 5 or 6, they discovered three characteristics that could predict progression to the blastocyst stage with a sensitivity and a specificity of 93% and 94%, respectively: (i) duration of the first cleavage furrow leading to 2 cells of 14 ± 6 min, (ii) a 2-cell stage lasting only 11 ± 2 h and (iii) the 2-cell blastomeres cleaving to 4 cells within 1.0 ± 1.6 h of each other (Fig. 1a). These noninvasive, early-cleavage parameters should be easily adaptable by IVF programs to help select the best embryo for transfer. In addition to establishing cell-division guidelines that predict blastocyst development, Wong et al.5 correlated patterns of cleavage with gene expression in an additional 142 embryos. Errors in early cleavages that give rise to chromosomal aneuploidy have been cited as leading to embryonic failure4,8. But genome-wide analyses of gene expression in normal-appearing, 8-cell human embryos9 suggested that aneuploidy may be common in the early cleavage stages of apparently normal embryos. These studies revealed a lack of cell cycle checkpoints, such as Rb and Wee1, and overexpression of cell cycle drivers, such as Cyclins A, B and E, and Myc, which allows for the rapid rates of gene amplification needed for maternal signaling. 
In fact, aneuploidy in early-cleaving embryos may not be lethal to fetal development because most of the early cells will form trophoblast (Fig. 1b). An important feature of early mammalian development is the enormous size of the egg; thus, DNA duplication and cell division to approximately the 64-cell stage occurs without the need for cell growth or, perhaps, for growth factor stimulation. Geometrically, at the 16-cell stage, 1 or 2 cells trapped in the middle of the cell mass initiate commitment to inner cell mass (ICM) cells (precursor fetal cells), while the outer cells form the trophoblast lineage (precursor placental cells). Cultured ICM cells can also give rise to embryonic stem cells, which do express Rb and Wee1 (ref. 9), suggesting that the ICM has active cell cycle checkpoints to help maintain the chromosome integrity of the developing embryo. 1026
Wong et al.5 describe several aberrant embryo phenotypes, as well as gene expression analyses of individual blastomeres, that reveal asynchrony in cell division, in gene expression and in the degradation of maternal messages. These results suggest that the current view of the early-cleaving embryo—that all cells are equivalent and chromosomally balanced— should be revised to recognize that aneuploidy after the 2-cell stage may be common8, that aneuploid trophoblast cells may be able to give rise to a fully functioning placenta and that only ICM cells (as few as 12% at the 16-cell stage) must be chromosomally balanced to give rise to a normal fetus. Balancing rapid cell divisions with accurate chromosome allocation is the yin and yang of early human development. A new human life arises from fewer than 30% of fertilized human eggs1–4. By careful comparison of cell division characteristics with the emerging lists of cell
cycle gene elements10, we may be able not only to predict which embryos have the greatest potential for life, but also to develop therapies for early embryos that will balance their yin and yang in favor of development to offspring. COMPETING FINANCIAL INTERESTS The author declares no competing financial interests. 1. Evers, J.L.H. Lancet 360, 151–159 (2002). 2. Macklon, N.S. et al. Hum. Reprod. Update 8, 333–343 (2002). 3. Adashi, E.Y. et al. Reprod. Biomed. Online 7, 515–542 (2003). 4. de Mouzon, J. et al. Hum. Reprod. 24, 2310–2320 (2009). 5. Wong, C.C. et al. Nat. Biotechnol. 28, 1115–1121 (2010). 6. Stillman, R.J. et al. Fertil. Steril. 92, 1895–1906 (2009). 7. Kiessling, A.A. et al. J. Exp. Zool. 258, 34–47 (1991). 8. Northrop, L.E. et al. Mol. Hum. Reprod. 16, 590–600 (2010). 9. Kiessling, A.A. et al. J. Assist. Reprod. Genet. 27, 265–276 (2010). 10. Neumann, B. et al. Nature 464, 721–727 (2010).
Taking the measure of the methylome Stephan Beck Two comparative studies from the International Human Epigenome Project find high concordance between different methods for measuring genomic methylation. With the rapid development of new methods for epigenomic analysis, the need for a systematic assessment of available technologies has become acute. In this issue, Harris et al.1 and Bock et al.2 compare the performance of commonly used techniques for DNA methylation analysis in terms of cost, resolution, genome coverage and accuracy. The findings provide a first benchmark of which method works best for which part of the methylome. In humans, DNA methylation occurs predominantly at cytosine bases in the form of methyl cytosines (mCs), methyl cytosine guanine dinucleotides (mCGs), hydroxymethyl cytosines (hmCs) and, possibly, in other, yet unknown forms. Collectively, these modifications define the DNA methylome of a cell. Together with the study of other epigenetic marks, methylome analysis forms an integral part of ongoing efforts to elucidate the epi genomes of healthy and diseased cell types. Stephan Beck is at the UCL Cancer Institute, University College London, London, UK. e-mail: [email protected]
Such methylome maps will allow the identification of genomic regions involved in cell differentiation and disease. Following years of planning by the Epigenome Taskforce3 and other initiatives, the studies of Harris et al.1 and Bock et al.2 mark another milestone for the International Human Epigenome Project4. Together with recent papers by Li et al.5 and Robinson et al.6 (not discussed here), they compare the performance of the main technologies for mCG methylome analysis. Until now, we did not know how well these methods work, what their particular strengths and weaknesses are, or the extent to which the resulting methylation maps overlap. Understanding these issues is especially important when choosing a method for generating so-called reference methylomes, which will be used as definitive resources in future research and must therefore be as accurate and comprehensive as possible. In all, Harris et al.1 and Bock et al.2 tested six methods, of which five are sequencingbased and one is array-based. Three of the methods—MethylC-seq (data from ref. 7),
News and Views

Table 1 Key metrics of the technology comparison

Metric | MethylC-seq | MeDIP-seq | MethylCap-seq | MBD-seq | RRBS | Infinium-27K(a)
Genomic DNA | 5 μg | 0.3–5 μg | 1 μg | 3 μg | 0.03–0.05 μg | 0.5–1 μg
Readout | Sequence | Sequence | Sequence | Sequence | Sequence | Array
Assay | Bisulfite conversion | Capture with monoclonal antibody | Capture with MBD of MeCP2 | Capture with MBD of MBD2 | Bisulfite conversion | Bisulfite conversion
Resolution | 1 bp | 100–1,000 bp | 100–1,000 bp | 100–1,000 bp | 1 bp | 1 bp
Theoretical coverage | Whole-genome (~100%) | Whole-genome (~100%) | Whole-genome (~100%) | Whole-genome (~100%) | Genome-wide (~10%) | Genome-wide (~0.1%)
Actual coverage (1 read threshold) | ~95% (ref. 1) | ~67% (ref. 1) | ~67% (ref. 2) | ~61% (ref. 1) | ~12% (ref. 1) | (~0.1%)
Actual coverage (5 reads threshold) | ~87% (ref. 1) | ~23% (ref. 1) | ~28% (ref. 2) | ~28% (ref. 1) | ~10% (ref. 1) | (~0.1%)
Actual coverage (10 reads threshold) | ~76% (ref. 1) | ~9% (ref. 1) | ~14% (ref. 2) | ~20% (ref. 1) | ~9% (ref. 1) | (~0.1%)
Cost | ~$100K(b) | ~$2K | ~$3K | ~$2K | ~$2K | ~$0.2K
Concordance (6-, 5-way) | NA | NA | NA | NA | NA | NA
Conclusion | Gold standard but issue with hmC | Good all-rounder | Good all-rounder | Good all-rounder | Good for CpG islands | Good for promoters

Concordance between methods (one value per method in each comparison):
Concordance (4-way) (ref. 1): ~99%, ~99%, ~99%, ~99%
Concordance (3-way) (ref. 1): ~100%, ~100%, ~100%
Concordance (2-way) (ref. 1): ~96%, ~96%
Concordance (2-way) (ref. 1): ~96%, ~96%
Concordance (2-way) (ref. 2): ~84%, ~84%
Concordance (2-way) (ref. 2): ~88%, ~88%
Concordance (2-way) (ref. 1): ~91%, ~91%
Concordance (2-way) (ref. 1): ~97%, ~97%
Concordance (2-way) (ref. 2): ~92%, ~92%

Where appropriate, numbers are rounded or shown as a range. As sequencing costs are falling rapidly, the estimates shown are approximate and based on the assumption of ~$1K per lane on an Illumina Genome Analyser (see refs. 1 and 2 for details on the models used). To determine the maximum achievable genome coverage, MethylC and RRBS were subjected to saturation sequencing, entailing 2 lanes of sequencing for RRBS. MeDIP and MBD were subjected to 2 lanes and MethylCap to 3 lanes of sequencing.
(a) For the MethylC data reported by Lister et al.7 in 2009, the exact number of lanes could not be determined and the estimated 100 lanes (resulting in estimated costs of $100K) are likely to represent the upper limit. The current cost for a MethylC methylome is closer to $20K.
(b) A 450K upgrade of the current 27K Infinium array has been announced for later this year.
reduced representation bisulfite sequencing (RRBS) and the Infinium-27K bead-array—use sodium bisulfite treatment of DNA, which converts unmethylated but not methylated cytosine to uracil. The other three—methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq) and methylated DNA binding domain sequencing (MBD-seq)—rely on capture of methylated DNA by a monoclonal antibody or by the recombinant methyl-binding domains of MECP2 or MBD2, respectively. Each method was subjected to rigorous quality control, and all results were supported by comprehensive statistical analysis of at least two replicate samples. Table 1 summarizes some of the metrics examined. In addition to cost, the other important parameters when choosing a method for a particular methylome analysis are resolution, coverage and accuracy. With respect to resolution, the choice is straightforward between the high resolution (1 bp) achieved with the bisulfite-based methods and the low resolution (≥100 bp) of capture-based methods. Although the highest possible resolution
is usually desirable, single-base-pair resolution is not always required because the methylation status of adjacent CpG sites is highly correlated for up to 1,000 bp. Coverage and accuracy are much more difficult to assess as the different methods have different dependencies—including CpG density, fragment length, capture affinity, read length, read depth and, for capture methods, absence of reads in unmethylated regions—making a direct comparison challenging. Based on the fraction of the genome that can potentially be analyzed by each method, the theoretical mCG coverage is ~100% for MethylC-seq, MeDIP-seq, MethylCap-seq and MBD-seq, ~10% for RRBS and ~0.1% for Infinium-27K. Determining whether this coverage is actually achievable in practice requires the generation and analysis of saturation data for each method. This was done only for MethylC-seq and RRBS and is not applicable to Infinium-27K. Applying thresholds of 1 to 10 reads per mCG, the determined actual coverage ranges from 96–76% for MethylC-seq and 12–9% for RRBS, which is close to the theoretical limits of coverage, particularly for RRBS. Because
saturation sequencing was not carried out for MeDIP-seq, MethylCap-seq and MBD-seq, the actual coverage data presented for these methods are less representative, as evident from the large variation (67–9%) in coverage when applying the same thresholds of 1 to 10 reads per mCG. Both studies1,2 assessed accuracy by comparing overlapping data sets between methods. For most comparisons, the Infinium rather than the more comprehensive MethylC data were used as a common standard. This is somewhat unfortunate as the resulting comparisons are therefore between sequencing- and array-derived data and limited to the small set of highly selected CpG sites on the Infinium-27K array. Nevertheless, the overall concordance is encouragingly high (84–100%), depending on the comparison (2- to 4-way comparisons were conducted) and the parameters. This is good news and lends confidence to the many existing data sets already generated by any of the methods. Both studies1,2 conclude that all of the evaluated methods are capable of producing accurate data, and neither recommends
a particular method for the generation of reference methylomes, although Harris et al.1 suggest the possibility of hybrid methods and show improved results for MeDIP-seq integrated with MRE-seq (based on methylation-sensitive restriction). Although the two studies1,2 have successfully resolved many long-standing questions in the epigenomics community, several challenges remain. The most pressing concern is that a full methylome analysis should include mC and hmC in addition to mCG, although the biological functions of these modifications have yet to be determined. Another challenge is that bisulfite-based methods (the current gold standard of methylation analysis) cannot distinguish between methylation and hydroxymethylation8, which has implications for all bisulfite-based data already deposited in public databases. As the International Human Epigenome Consortium gears up to generate 1,000 reference epigenomes, the participating laboratories will undoubtedly use different methylome analysis methods. It will therefore be important to develop a procedure for assigning quality values
to the methylation status of each cytosine. A similar metric proved to be very helpful in the assembly and use of the draft sequence of the human genome. For the future, there are great expectations that one day we will be able to read the different forms of DNA methylation directly using methods such as nanopore9 and single-molecule, real-time10 sequencing. For now, however, with careful management, our current technology is adequate to move ‘AHEAD’.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Harris, R.A. et al. Nat. Biotechnol. 28, 1097–1105 (2010).
2. Bock, C. et al. Nat. Biotechnol. 28, 1106–1114 (2010).
3. Jones, P.A. & Martienssen, R. Cancer Res. 65, 11241–11246 (2005).
4. Satterlee, J. Nat. Biotechnol. 28, 1039–1044 (2010).
5. Li, N. et al. Methods published online, doi:10.1016/j.ymeth.2010.04.009, 27 April 2010.
6. Robinson, M. et al. Epigenomics 2, 587–598 (2010).
7. Lister, R. et al. Nature 462, 315–322 (2009).
8. Huang, Y. et al. PLoS ONE 5, e8888 (2010).
9. Clarke, J. et al. Nat. Nanotechnol. 4, 265–270 (2009).
10. Flusberg, B.A. et al. Nat. Methods 7, 461–465 (2010).
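The "actual coverage" figures discussed in this article depend on a read-depth threshold per CpG (1, 5 or 10 reads). The following is a minimal sketch of that calculation, not the procedure used in refs. 1 or 2. The function name `coverage_at_thresholds` and the example read depths are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Sequence


def coverage_at_thresholds(
    read_depths: Iterable[int],
    thresholds: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Return, for each threshold t, the fraction of CpG sites with >= t reads."""
    depths = list(read_depths)
    if not depths:
        raise ValueError("need at least one CpG site")
    return {t: sum(d >= t for d in depths) / len(depths) for t in thresholds}


# Hypothetical read depths for ten CpG sites (illustrative values only).
example_depths = [0, 1, 3, 5, 7, 10, 12, 2, 0, 25]
coverage = coverage_at_thresholds(example_depths)
print(coverage)
```

Raising the threshold can only lower the covered fraction, which is why a method that has not been sequenced to saturation shows a steep drop between the 1-read and 10-read figures.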
Tracing cancer networks with phosphoproteomics
David B Solit & Ingo K Mellinghoff
A mass-spectrometry approach for identifying downstream events in cancer signaling pathways may help to tailor therapies to individual patients.
David B. Solit and Ingo K. Mellinghoff are in the Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, USA. e-mail: [email protected] or [email protected]
The clinical success of kinase inhibitors such as Gleevec (imatinib) has provided a glimpse of what can be achieved by targeting the signaling pathways involved in the growth of cancer cells1. But these signal-transduction networks are still poorly understood, hampering efforts to apply this paradigm more broadly to patients with advanced cancer. Two recent studies, by Moritz et al.2 and Andersen et al.3, show how this challenge might be addressed with ‘compound-centric’ phosphoproteomics. The findings, reported in Science Signaling2 and Science Translational Medicine3, not only provide new insights into the signaling circuitry responsible for cell proliferation
but may also be of value in identifying the greatest vulnerabilities of particular tumor cells and, therefore, in optimizing therapies for individual patients. Many human cancers have alterations in the phosphatidylinositol-3-OH kinase (PI(3)K) pathway, which has become an area of particular interest in drug development4. Rapamycin (Rapamune, sirolimus), an immunosuppressive agent that inhibits the kinase mTOR (‘mammalian target of rapamycin’) and is used clinically in organ transplantation, was the first inhibitor of a PI(3)K signaling intermediate to enter broad clinical testing for cancer5. But despite compelling preclinical results (particularly in models with aberrant PI(3)K pathway activation) and modest efficacy in patients with renal cell carcinoma, the overall clinical success of rapamycin in oncology has been disappointing. These failures may be due in part to activation of AKT and
MAPK by de-inhibition of negative feedback loops6,7 and to redundant regulation of key downstream effectors of transformation by parallel signaling pathways8. In many respects, this experience exemplifies the challenges of targeting a signaling network that is insufficiently understood. The two new studies2,3 used global mass spectrometry (MS)-based approaches to identify substrates of serine (Ser)/threonine (Thr) kinases downstream of receptor tyrosine kinases (RTKs), RAS, PI(3)K and mTOR. Selected members of the signaling network comprising RAS, PI(3)K and the mTOR-containing complexes TORC1 and TORC2 (ref. 9) are shown in Figure 1a. Each of these core signaling pathways activates kinases that phosphorylate their substrates in a context-specific manner, depending on the amino acids flanking the phosphorylation site. Both studies2,3 used phosphomotif-specific antibodies for immunoaffinity purification before MS analysis and quantified the effects of various pathway inhibitors on the newly identified Ser/Thr-phosphorylation sites using an approach based on stable isotope labeling with amino acids in cell culture (SILAC)10 (Fig. 1b). Moritz et al.2 identified >300 substrates in three human cancer cell lines with mutations in either epidermal growth factor receptor (EGFR), hepatocyte growth factor receptor (MET) or platelet-derived growth factor receptor α (PDGFRA); almost half of these substrates were identified for the first time. Phosphorylation of 21 proteins decreased significantly in all three cell lines after inhibition of the oncogenic RTK. The targets include the previously reported Akt-RSK-S6 kinase substrates glycogen synthase kinase 3A and B, ribosomal protein S6 (RPS6) and the proline-rich Akt1 substrate (PRAS40).
The study by Andersen et al.3 focused on the PI(3)K branch of the network and used a PTEN-deficient human prostate cancer cell line, a broader immunoaffinity purification scheme (enriching for AKT substrates, MAPK substrates and PDK1-docking motifs) and a different set of pathway inhibitors (targeting PDK1, AKT and both PI(3)K and mTOR). The authors identified 375 nonredundant phosphopeptides, of which about a quarter showed a substantial change in phosphorylation in response to pathway perturbation. Some proteins (e.g., RPS6 and PRAS40) showed decreased phosphorylation in response to all three pathway inhibitors, whereas others showed more selective responses to particular inhibitors (e.g., RPS6KA6 for the PDK1 inhibitor). The authors then focused on PRAS40 and showed that its phosphorylation at Thr246 positively correlates with phosphorylation of
AKT at Ser473 and with the sensitivity of cancer cell lines to an allosteric AKT inhibitor. Until now, the technical challenges of MS-based detection of phosphopeptide substrates have limited our ability to detect Ser/Thr phosphorylation events in cancer. Beyond providing new information about the signaling circuitry downstream of Akt, MAPK, RSK and S6K, these pioneering studies2,3 open the entire space of Ser/Thr protein phosphorylation for further study. Nonetheless, it remains possible that some of the observed drug-induced phosphorylation changes represent ‘off-target’ effects. Additional, confirmatory experiments involving genetic approaches and more specific compounds will be needed before we can revise our picture of the RTK/RAS/PI(3)K/mTOR network. The studies2,3 are equally promising from the perspective of clinical drug development. First, they document the effects of compounds on a large number of phosphorylation events, which can be quantified in clinical tumor samples using various antibody-based proteomic assays. Such information can guide dosing decisions with molecularly targeted therapies during early clinical drug evaluations and help to prevent drug development resources from being wasted on compounds that do not achieve sufficient target inhibition in tumor tissue11. Second, especially if linked to the detection of phosphotyrosine protein modifications in the same sample3, compound-centric phosphoproteomics may uncover unexpected effects of the drug on upstream or parallel signaling networks that mediate drug resistance, identify mechanisms of ‘off-target’ drug toxicity or suggest new opportunities for combination therapies12. Thus far, kinase inhibitor therapy has been most successful for cancers with an activating mutation that can be readily identified in routinely collected clinical samples using genomic assays.
Examples include non-small-cell lung cancers with mutations in the EGFR kinase domain or melanomas with BRAF mutations. It remains unclear which mutations predict responsiveness to PI(3)K/mTOR pathway inhibitors. Moreover, signaling through this pathway can be deregulated by many molecular alterations. This genomic
[Figure 1 graphic. Panel a labels: RTK, RAS, MEK, ERK, RSK, PI(3)K, PDK1, AKT, TORC1, TORC2, S6K, and protein substrates with Ser/Thr phosphorylation sites (GSK3, PRAS40, RPS6, many others); inhibitors: RTKi, U0126, wortmannin, PDK1i, AKTi, rapamycin, PI-103. Panel b labels: vehicle and inhibitor; 12C-Arg/12C-Lys and 13C-Arg/13C-Lys; SILAC labeling of human cancer cell lines and short-term treatment with inhibitors of RTKs, PI(3)K, AKT, MEK, mTOR; mix cell lysates 1:1; immunoaffinity purification with antibodies specific to AGC kinase and MAPK kinase phosphomotifs; mass spectrometry.]
Figure 1 Identification of (Ser)/(Thr) phosphorylation substrates in core cancer signaling pathways. (a) Probing the RAS, PI(3)K and mTOR signaling pathways with inhibitors. Members of the RAS-PI(3)K-mTOR signaling network9 are shown in black and inhibitors used by Moritz et al.2 and Andersen et al.3 are shown in red. ERK, extracellular signal-regulated kinase; GSK3, glycogen synthase kinase 3; MEK, mitogen-activated protein/extracellular signal-regulated kinase kinase; PDK1, 3-phosphoinositide-dependent protein kinase 1; PI(3)K, phosphatidylinositol-3-OH-kinase; PRAS40, proline-rich AKT1 substrate 1; RPS6, ribosomal protein S6; RSK, ribosomal S6 kinase; RTK, receptor tyrosine kinase; S6K, p70 ribosomal protein S6 kinase; TORC1/2, mammalian target of rapamycin complex 1/2. (b) SILAC-based mass spectrometry10 to quantify inhibitor-induced changes in Ser/Thr phosphorylation. Cancer cell lines are grown either in ‘light’ medium containing the normal forms of the amino acids lysine (12C6-Lys) and arginine (12C6-Arg) or in ‘heavy’ medium containing 13C6-Lys and 13C6-Arg. After short-term treatment with inhibitor, cells are lysed and lysates pooled before immunoaffinity purification with antibodies specific to phosphomotifs of interest. Inhibitor-induced changes in phosphorylation patterns are quantified by comparing protein abundance using the light and heavy peaks in the mass spectra. AGC, cAMP (adenosine 3′,5′-monophosphate)-dependent, cGMP (guanosine 3′,5′-monophosphate)-dependent, and protein kinase C; MAPK, mitogen-activated protein kinase.
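The light/heavy comparison described in panel b can be illustrated with a small calculation. This is a minimal sketch, not the analysis pipeline of refs. 2 or 3. It assumes that vehicle-treated cells carry the light label and inhibitor-treated cells the heavy label (the caption does not specify the assignment), and the peptide names and peak intensities are hypothetical.

```python
import math


def silac_log2_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Return log2(heavy / light) for one phosphopeptide peak pair."""
    if light_intensity <= 0 or heavy_intensity <= 0:
        raise ValueError("peak intensities must be positive")
    return math.log2(heavy_intensity / light_intensity)


# Hypothetical (light, heavy) peak intensities. Assumed labeling:
# light = vehicle-treated cells, heavy = inhibitor-treated cells.
peaks = {
    "PRAS40_Thr246": (8000.0, 2000.0),    # phosphorylation drops on inhibition
    "unaffected_site": (5000.0, 5000.0),  # no change
}
ratios = {name: silac_log2_ratio(light, heavy) for name, (light, heavy) in peaks.items()}
print(ratios)
```

Under the assumed labeling, a negative log2 ratio indicates less phosphorylated peptide after inhibitor treatment, and a ratio near zero indicates a site that the inhibitor leaves unchanged. Because the lysates are mixed 1:1 before purification, both peaks pass through the same enrichment and measurement steps.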
complexity represents a rate-limiting step in the further development of PI(3)K pathway inhibitors and has spurred interest in transcriptional13 or proteomic markers of aberrant pathway activation. It remains to be seen whether at least a subset of cancers display ‘pathway addiction’ as opposed to ‘oncogene addiction’ to particular components within the PI(3)K pathway. Perhaps the combination of a robust pathway-activation marker (e.g., phospho-PRAS40 Thr246) with focused mutational analysis (e.g., mutational profiling of PIK3CA) will offer a reasonable compromise for patient stratification into PI(3)K/AKT inhibitor trials, as suggested by Andersen et al.3. Clearly, much work remains to be done to realize the clinical potential of genomics and proteomics. Nonetheless, these two studies2,3 represent outstanding examples of hypothesis-driven biomarker discovery, which,
once validated in a broader genetic context, are likely to produce new pharmacological opportunities for disrupting cancer-associated signaling networks.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Varmus, H. Science 312, 1162–1165 (2006).
2. Moritz, A. et al. Sci. Signal. 3, ra64 (2010).
3. Andersen, J.N. et al. Sci. Transl. Med. 2, 43ra55 (2010).
4. Courtney, K.D., Corcoran, R.B. & Engelman, J.A. J. Clin. Oncol. 28, 1075–1083 (2010).
5. Sabatini, D.M. Nat. Rev. Cancer 6, 729–734 (2006).
6. Cloughesy, T.F. et al. PLoS Med. 5, e8 (2008).
7. Carracedo, A. et al. J. Clin. Invest. 118, 3065–3074 (2008).
8. She, Q.B. et al. Cancer Cell 18, 39–51 (2010).
9. Shaw, R.J. & Cantley, L.C. Nature 441, 424–430 (2006).
10. Mann, M. Nat. Rev. Mol. Cell Biol. 7, 952–958 (2006).
11. Sawyers, C.L. Nature 452, 548–552 (2008).
12. Wallin, J.J. et al. Sci. Transl. Med. 2, 48ra66 (2010).
13. Saal, L.H. et al. Proc. Natl. Acad. Sci. USA 104, 7564–7569 (2007).
Research Highlights
Chimeric mouse with a rat pancreas
The successful generation of human induced pluripotent stem cells (iPSCs) has made the production of patient-specific organs for transplantation conceivable, although the immense technical difficulties of creating such organs in vitro make it unlikely that this aim will be achieved in the near future. Kobayashi et al. now suggest a potential alternative route. Working with rats and mice, they show that it is possible to gestate rat–mouse chimeras to term, a feat achieved previously only once for animals belonging to different genera: the geep, a chimera between goats and sheep. The authors inject either rat or mouse iPSCs into a blastocyst of the respective other species and transplant the blastocysts into pseudopregnant mothers of the species of the donor blastocyst. The resulting animals show a strong contribution by the chimeric cells to all organs tested. To test if it is possible to derive an organ entirely from iPSCs, Kobayashi et al. use a mouse line with a deletion of the transcription factor Pdx1, which is essential for the development of the pancreas. When rat iPSCs are injected into mouse Pdx1–/– blastocysts, the chimeric animal grows a pancreas derived entirely from rat cells that can functionally substitute for the missing mouse organ. Although considerable conceptual, technical, ethical and legal challenges remain in translating this approach to human organs, this paper may open up a new route to patient-specific organ production. (Cell 142, 787–799, 2010) ME
A methylome map of lineage commitment
How do epigenetic marks control the present and future identity of a cell? Several studies have compared the DNA methylome of pluripotent and differentiated cells, but a new paper by Ji et al. is the first to map the methylome of a differentiation hierarchy that encompasses several stages of progressive cell-fate restriction. The authors measure global CpG methylation in mouse hematopoietic multipotent progenitor cells and in their two progeny, common lymphoid progenitors and common myeloid progenitors. They also analyze cell populations further down these pathways: thymocyte progenitors and granulocyte/macrophage progenitors. The data reveal considerable epigenetic plasticity over the course of differentiation and show that lymphopoiesis involves far more DNA methylation than myelopoiesis. The study also identifies many new genes associated with the choice between lymphoid and myeloid fate. (Nature 467, 338–342, 2010) KA
Plug-and-play sequence analysis
Genome sequencers are being increasingly used in the research laboratory and in the clinic, resulting in an acute need for robust software for analyzing the data. McKenna et al. describe a software tool kit with code for performing important low-level data-management tasks, which provide the foundation for higher-level analyses of personal genomes, cancer genomes and exomes, for example. The software, called the Genome Analysis Toolkit, describes much needed conceptual patterns (‘abstractions’ in the parlance of computer scientists) for simplifying how programmers structure their code. These abstractions, and previous efforts such as SAMtools, Galaxy and the ShortRead package, are essential as they reduce software errors, promote sharing of knowledge and should speed the development of analytical pipelines. Given that not all of the potential consumers of high-throughput sequence data will have the bioinformatics resources to engineer customized pipelines from scratch, an open source code base of low-level software libraries should be valuable, much as libraries for internet communication protocols aided computer networking. (Genome Res. 20, 1297–1303, 2010) CM
Written by Kathy Aschheim, Laura DeFrancesco, Markus Elsner, Peter Hare & Craig Mak
Antimalarial drug candidate
Evidence of the emerging resistance to artemisinin derivatives has increased the urgency of identifying new classes of compounds to treat malaria. Despite substantial advances in our understanding of the biology of Plasmodium in recent years, the success rates of efforts to rationally design drugs that target molecular vulnerabilities in the malarial parasite have been disappointing. Rottmann et al. report that a more traditional approach (using cellular proliferation assays to screen chemical libraries) has identified spirotetrahydro-β-carbolines, or spiroindolones, as promising antimalarials. Cell-based assays and rodent data suggest that their most promising compound, NITD609, has all of the attributes needed for an antimalarial treatment. These include efficacy at concentrations that are potentially compatible with a once-daily oral dosing regimen. Unlike most commonly used antimalarial drugs, NITD609 rapidly stops protein synthesis, possibly by disrupting ion homeostasis maintained by a Plasmodium falciparum P-type ATPase. This protein seems a likely site for the emergence of resistance should the promising preclinical data for the drug translate to its broad clinical use. (Science 329, 1175–1180, 2010) PH
Obesity leaves its methylation marks
Associating genetic variants with complex phenotypes like obesity or heart disease has proven elusive, but Feinberg et al. have done just that with a set of epigenetic markers. Using their comprehensive, high-throughput, array-based, relative methylation technology, they analyze samples from a group of Icelandic individuals who participated in a more than decade-long study called Age, Gene/Environment Susceptibility. By analyzing 4.5 million CpG sites from 74 people whose lymphocytes provided ample DNA on two occasions, the researchers identify 227 regions that show interindividual variations or variably methylated regions (VMRs), which encompass genes important in development and morphogenesis. The authors also assess differences within individuals, identifying two distinct classes of genes, 41 of which vary and 199 of which remain stable. Using cross-sectional linear regression analysis for each VMR in relation to body mass index (BMI), they go on to determine the relationship of the VMRs to obesity. From this analysis, 13 VMRs are found to co-vary with BMI, four of which did so stably over the course of the 11 years of study. Although the sample size is small, this study suggests the utility of VMR association in defining the epigenetic basis of a disease. In addition, it is the first comprehensive study demonstrating stable methylation marks that uniquely identify individuals. (Sci. Transl. Med. 2, 49ra67, 2010) LD
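The regression step mentioned in this highlight can be sketched as an ordinary least-squares fit of methylation at one VMR against BMI. This is an illustrative sketch only, not the authors' analysis. The direction of the regression (methylation as the response), the function name `ols_slope_intercept` and the example values are all assumptions.

```python
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: BMI and methylation fraction at one VMR for four people.
bmi = [20.0, 25.0, 30.0, 35.0]
methylation = [0.30, 0.40, 0.50, 0.60]
slope, intercept = ols_slope_intercept(bmi, methylation)
print(slope, intercept)
```

In a real analysis, one such fit would be run per VMR and the slopes assessed for statistical significance with correction for multiple testing; that step is omitted here.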
Editorial
Making a mark High-throughput technologies are enabling epigenetic modifications to be mapped on a genome-wide scale, but whether such knowledge can be rapidly translated into biomedical applications remains unclear.
Despite its inception over 60 years ago, epigenetics is very much in its formative stages. Even the term ‘epigenetics’ means different things to different people. The best working definition for the field is that it is the study of traits heritable through meiosis or mitosis that are not dependent on the primary DNA sequence. Even so, British geneticist Adrian Bird has commented, “Epigenetics is a useful word if you don’t know what’s going on—if you do, you use something else.” In the past year, ‘what’s going on’ has become a good deal clearer. The first DNA methylomes for different human cell types have now been worked out; the long-sought mammalian DNA demethylase has been identified; heritable epigenetic marks have been demonstrated to not only depend on genetic variation, but also vary in ways associated with disease predisposition; an expanding group of noncoding RNAs has been shown to interact with the epigenetic machinery; the role of methylation in regulating alternative splicing has been established; and additional evidence has accrued that chromatin modifications are important for neuronal plasticity and protracted changes in brain function. At the same time, efforts to create genome-wide catalogs of covalent modifications of DNA and histones have been spurred by next-generation sequencing and array technologies that offer greater throughput and sensitivity. The molecular actors participating in what’s going on have also become clearer: covalent modifications both to DNA (e.g., methylcytosine and hydroxymethylcytosine nucleotides) and to histones/histone variants (acetylation, methylation, phosphorylation and so on) as well as noncoding RNA molecules (microRNAs, small nucleolar RNAs and large intergenic noncoding RNAs (lincRNAs)), transcription factors, DNA-binding proteins and even cytoplasmic signaling factors. It is now evident that, unlike changes to DNA sequence, most chromatin states are remarkably reversible and transient.
Even DNA methylation—long considered a permanent, gene silencing, epigenetic mark—can be removed in certain instances. These chromatin signatures change during aging and are influenced by environmental factors, such as maternal behavior, physical exercise and diet. And dysregulation of epigenetic silencing is associated with several diseases, including imprinting disorders, Rett syndrome, facioscapulohumeral muscular dystrophy and even autism. But it is in the realm of cancer, particularly leukemias, where epigenetic research has yielded insights into abnormalities in histone marks on promoters, aberrant DNA methylation at CpG islands and microRNAs. For solid tumors, malignancies have been associated with spontaneous defects in tumor suppressor gene silencing, and breast cancer invasiveness/metastasis recently has been linked to lincRNA-mediated retargeting of a histone methylase. Aberrant chromatin remodeling has also been implicated in the process of somatic cell nuclear transfer (SCNT) used to clone animals. Few cloned embryos survive to term and many of the offspring die postnatally or are abnormal. That inappropriate epigenetic signatures are responsible for these defects is evident from the fact that the offspring
of these cloned animals—the second generation—are phenotypically normal. In this context, it is sobering that work published in Nature (467, 280–281, 2010) last month suggests that induced pluripotent stem (iPS) cells show less complete epigenetic reprogramming than embryonic stem cells produced via SCNT. Of course, if high-throughput technologies can help determine the appropriate signature of epigenetic marks, it may be possible to screen for more fully reprogrammed iPS cells. But given the plasticity of many histone modifications, it remains uncertain whether epigenetic signatures alone will have sufficient predictive or diagnostic value. In most cases, we have no way to tell whether a particular epigenetic signature is a cause of disease or merely a consequence of the pathological state. From a therapeutic standpoint, there is particular reason for optimism, as four drugs acting on DNA methyltransferase and histone deacetylase enzymes have already been approved. At least in blood cancers, this provides validation that pharmacological alteration of chromatin modifications has tangible clinical benefit, and these successes are spurring industry interest in the development of inhibitors of other epigenetic targets, such as histone methyltransferases. Currently, however, all epigenetic drugs act in a nonspecific, pangenomic manner and, consequently, are associated with significant dose-limiting toxicities. This is perhaps unsurprising as chromatin-modifying enzymes have no inherent specificity for a particular nucleosome (or its associated gene). Rather, they are recruited by DNA binding proteins or co-factors or RNAs that localize the complex to a specific stretch of sequence. This issue goes beyond simple, drug-related, off-target effects: agents that modify the chromatin state across the genome may also awaken undesirable elements, such as endogenous retroviruses.
Thus, if epigenetic therapies are to succeed outside cancer—in neurological indications, for example—their activity needs to be more directed. We are currently witnessing a renaissance in epigenetics research. Much of the recent growth in the field can be attributed to the technology-enabled ability to survey epigenetic modifications on a genome-wide scale. The success of epigenetic therapy in hematological malignancies has also engendered confidence in the translational potential of the field. But greater emphasis now needs to be placed on elucidating not only the molecular mechanisms by which an expressed or silent state is transmitted through cell division but also the interplay between DNA and/or chromatin modifications and RNAs, transcription factors, nuclear organizing factors and signal transduction pathways in different cell types, at different ages and under different developmental and disease states. With this knowledge in hand, epigenetics has the potential to make an even greater mark on the practice of medicine. Nature Biotechnology is grateful to sponsors GlaxoSmithKline’s EpiNova DPU, Cellzome and Active Motif, whose support enables this focus to be freely available to readers online.
Commentary
Linking cell signaling and the epigenetic machinery Helai P Mohammad & Stephen B Baylin
One of the biggest gaps in our knowledge about epigenomes is how their interplay with cellular signaling influences development, adult cellular differentiation and disease.
An array of high-throughput technologies is providing us with ever more detailed maps of the positions of epigenetic marks in the genomes of various cell types under assorted conditions. The striking differences observed in these experiments have increased our appreciation of the importance and functional consequences of chromatin remodeling. However, how the machinery that governs the various epigenetic processes ties into the larger cellular context remains largely unknown. Specifically, we need to understand what signals a cell must receive and send to appropriately orchestrate the epigenome for its role in developmental biology, cell differentiation and renewal of adult stem cells, and how aberrant signaling to the epigenetic machinery contributes to disease development. The basic premise of this commentary is that we must add three-dimensional information to linear epigenome mapping by elucidating the environmental cues and signal transduction cascades that result in alterations in chromatin structure and ultimately gene expression.
Helai P. Mohammad and Stephen B. Baylin are at the Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins Medical Institutions, Baltimore, Maryland, USA. e-mail: [email protected]

The epigenome and the cellular environment
The genomic distributions of the three main modulators of the epigenome (reviewed in this issue1)—DNA methylation, histone modifications and nucleosome positioning—are rapidly being elucidated across the genomes of multiple cell types using a growing series of sequencing- and microarray-based technologies2,3. We will undoubtedly establish over the next several years the patterns of all epigenetic marks that accompany heritable transcriptional states in all cell types. What we currently know much less about, and must come to understand, are the cues in both normal and abnormal cellular environments that signal cells to alter their epigenomes. The complex interaction between these cues and changes in the epigenome must be the result of a highly orchestrated set of events that involves communication of each cell with its environment. The initiation of proper cell differentiation and tissue organization must involve signal transduction cascades that create epigenome alterations only when new sets of gene expression patterns become heritable. Epigenetic processes will thereby regulate the balance between stem, progenitor and mature cells in adult and developing tissues as well as fostering abnormal cell population states, such as in cancer. It is increasingly apparent that genes that are important in development overlap considerably with those that govern adult renewal systems and whose expression patterns are altered in diseases, such as cancer. Most of these genes are subject to a stringent epigenetic control of their transcription. Our goal for this article is to consider, using selected examples, what we may need to explore to understand the bi-directional relationship between the epigenome and the cellular environment.

Highly regulated epigenome remodeling in the embryo
A premier setting for investigating the signals that control switching of cell states by changing epigenetic states is, of course, embryonic development. Here, the relatively open cellular chromatin status of the zygote and of embryonic stem cells (ESCs), which is characterized
by comparatively low levels of DNA methylation and high levels of histone acetylation, must be progressively converted to ever-changing variations of more-closed, deacetylated and methylated chromatin states that facilitate cell lineage commitment and the formation of specialized tissues4. Genome-wide studies of the localization of histone modifications, histone modifying enzymes and DNA methylation are providing us with valuable insights into the linear positioning of the epigenetic determinants of ESCs and clues to changes occurring with lineage commitment (reviewed in this issue5). The ability to reprogram mature cells to an embryonic-like state by nuclear transfer or by inducing the expression of key transcription factors has provided us with critical opportunities to linearly map the epigenetic parameters that are essential for attaining pluripotency. As we obtain knowledge of the above genomic patterns, we are challenged to understand what cellular signaling processes dictate their development and how they contribute to cellular differentiation. One way to envision dissecting this is to consider the Waddington landscape model, as recently interpreted by several authors6–10. Waddington envisioned development starting with a marble at the top of a hill and initially having the totipotent state of a zygote cell (Fig. 1). As the totipotent zygotic cell rolls down the hill, it—with some degree of stochasticity—enters a series of furrows that induce increasingly restrictive, more committed cell fates as the totipotent cell changes to a multipotent adult stem cell and then to differentiated cells of adult tissues. This trip through the valleys is further associated with the evolving patterns of epigenetic states that maintain the cell fate changes key to each developmental and differentiation stage7. During cellular reprogramming to an induced pluripotent state
[Figure 1 labels. Axes: developmental potential; epigenetic status. Stages: totipotent (zygote); pluripotent (ICM/ES cells, EG cells, EC cells, mGS cells, iPS cells); multipotent (adult stem cells; partially reprogrammed cells?); unipotent (differentiated cell types, including macrophage, B cell, fibroblast and muscle). Signals and factors shown: environmental stimulus, signal transduction, Wnt, Shh, Notch, RA, TGFs/growth factors, LIF/cytokines, BMP, Wnt3a, PcG, miRNA, lincRNA, Oct4, Sox2, Nanog, Klf4, c-Myc, Pax5, Pdx1, Ngn3, Mafa, MyoD.]
Figure 1 Depiction of potential cell signaling in Waddington’s model of epigenetic determination of development, as interpreted by Hochedlinger and Plath7. Colored marbles correspond to differentiation states. Arrows represent the directionality of factor influence for development with ‘+’ indicating addition and ‘–’ indicating removal of a given factor or signal. The downward blue arrow at the top left of the ‘hill’ reflects direction of normal development, whereas the upward blue arrow at the bottom right of the hill depicts the direction of cellular reprogramming during generation of iPSCs. Coloring of text for names of factors and signaling pathways corresponds to their function within the given developmental stage.
(induced pluripotent stem cells (iPSCs)), the marble is pushed progressively back ‘uphill’ in a process that must reverse many of the mature epigenomic modifications. Dissecting this process through epigenomic analyses is providing an unparalleled opportunity to understand the linear patterns of epigenomic features that characterize different stages in development. We now suggest that understanding the integration between the environment of Waddington’s furrows with the epigenome alterations experienced by Waddington’s marble provides a model for understanding what orchestrates epigenomic changes (Fig. 1).

Pluripotency and intra-nuclear regulation. Studies of the mechanisms of iPS cell generation have further clarified the need for orchestration of the epigenetic landscape in early development. The critical step during dedifferentiation to iPSCs is the activation of transcription factors that maintain the embryonic state and the downregulation of factors promoting cell differentiation, whereas epigenetic silencing of pluripotency genes, such as OCT4 and NANOG, is associated with the onset of lineage commitment9. The silencing of these genes appears to be a molecular progression. The presence of repressive histone modifications emerges first to dictate transcriptional silencing. This is then locked in by the imposition of DNA methylation10–13, which further prevents reprogramming to the undifferentiated state10–12 (Fig. 2). Studies of iPSC generation have identified potential roles for the enzyme activation-induced cytidine deaminase (AID) and the Tet family of proteins as DNA demethylases for the pluripotency genes in the final steps of converting committed cells back to ESC-like cells14,15. The specific signals that regulate the levels and targeting of the machinery that demethylates and methylates DNA at the pluripotency genes are largely unknown, but important possibilities are discussed below. As genes in ESCs undergo changes in their epigenetic status, transcription factors regulate downstream target genes that are critical to the epigenetic landscapes that evolve during development. OCT4, SOX2 and NANOG are part of a regulatory network that includes their own promoters and promoters of genes that are targets of the polycomb-group (PcG) protein complexes that mediate long-term gene silencing16,17. A hallmark of ESCs is that some gene promoters are simultaneously occupied by the polycomb-associated histone repression mark, H3K27me3, as well as the activation marks H3K4me2 and H3K4me3 placed by the trithorax (Trx) complex—a state termed bivalent chromatin18,19. During
conversion from the pluripotent state of ESCs to the multipotent state of more adult tissue stem/progenitor cells, this bivalent state is remodeled, and either the active or the repressive mark is enhanced, depending on the transcription state required for lineage commitment18. Although these chromatin states are being defined in mapping experiments, the questions, from a signaling standpoint, are what triggers this shift in balance to the Trx- and PcG-mediated histone modifications and what is the mechanism by which such a shift may occur (Fig. 2)?

Extrinsic signaling and cellular niche/environment. From studies of human and mouse ESCs and the reprogramming of cells to iPSCs, several pathways have emerged as major candidate regulators of epigenetic remodeling (Fig. 1). In essence, these are the signal transduction systems that induce and maintain the stemness of ESCs as well as those that convert ESCs to and maintain them as more committed progenitors. The first example is the Wnt pathway that is involved at all of these stages. In terms of supporting pluripotency, the ligand Wnt3a can replace overexpression of the nuclear oncogene product, c-Myc, a key downstream target of Wnt pathway signaling, in potentiating the generation of iPSCs (Fig. 1)20.
Growth factors are a second major group of signaling molecules to be considered. This class of morphogens, critical for appropriate development, includes bone morphogenetic proteins, transforming growth factors (TGFs) and fibroblast growth factors (FGFs). Specifically, FGFs and TGFs have been shown to sustain expression, by means of downstream signaling to Smad proteins, of the pluripotency factors Oct4, Sox2 and Nanog that promote the undifferentiated potential of human ESCs21. Growth factor withdrawal has been exploited to push ESCs toward differentiation, thus allowing inferences with respect to the influence of growth factor signaling on epigenetic alterations during differentiation22,23. Genes marked by bivalent chromatin patterns undergo remodeling upon growth factor withdrawal such that either more active or repressive chromatin states emerge, depending upon the specific lineage commitment induced20,24. As a third example, the cytokine leukemia inhibitory factor (LIF) has been associated with supporting the undifferentiated state of mouse ESCs. LIF confers its signal to a cascade that results in activation of STAT3, expression of which has been shown to be important in retention of the pluripotent state of ESCs25,26. Whereas the above examples are pathways that maintain stemness of ESCs, signaling cascades are also critical to the commitment of these cells during development and maintenance of the progenitors for various cell lineages that emerge in development. In some instances, the same pathway can function both in ESC maintenance and differentiation. For example, activation of the Wnt pathway is important for the differentiation of neuronal precursors and the specification of the forebrain in vivo27,28. Disruption in the downstream mediator of Wnt, β-catenin, results in decreased proliferation of neural progenitors and defects in neuronal migration29,30.
Retinoic acid signaling is also essential for neural development and involved in specification, differentiation and outgrowth of axons (reviewed in ref. 31). Additionally, retinoic acid can induce differentiation of human and mouse ESCs, and embryonic carcinoma (EC) cells32. Similarly, Notch signaling appears critical for key stages of development. It is important for cell-type specification and differentiation but is also involved in the proliferation and self-renewal of stem cells (reviewed in refs. 33,34). Finally, Sonic Hedgehog (Shh) signaling is critical for the development of many organs including the brain, the lung and endocrine system. It promotes the appropriate differentiation of ESCs (reviewed in ref. 35), but has also been implicated in self-renewal of normal stem cells.
[Figure 2 labels. Left panel: stemness. Right panel: commitment. Signals shown: TGFβ, Wnt, Notch, Shh. Candidate mechanisms: kinase activity? altered expression of machinery? miRNAs? lincRNAs? altered positioning of machinery? AID? TET? retention of unmethylated state. Components shown: Oct4, Sox2, Nanog; histone demethylases, PcG, Trx; bivalent and/or PcG genes.]
Figure 2 Potential mechanisms by which the chromatin of key developmental genes may be regulated by cellular signaling. The left panel represents the ESC state whereby extrinsic signaling may impinge upon regulation of the DNA methylation status of pluripotency genes. The transcription start sites (arrows) of the three genes are depicted at the bottom left with circles representing CpG sites as DNA unmethylated (white) or methylated (black). The DNA methylated gene is transcriptionally repressed (red circle over transcription start) and nucleosomes (blue) are in a more compact structure in contrast to the more open structure of the expressed genes on the bottom left. Right panel represents the committed progenitor state that ensues when the pluripotency factors are silenced in ESCs. Subsequent resolution of bivalency to active or inactive target gene transcriptional states is depicted as discussed in the text.
The key question is how we tie the signaling from the above pathways to specific interactions that control the epigenetic landscapes during critical stages of development. Although our understanding in this arena is in the early stages, we can consider the critical steps that must be subject to control by the signaling pathway and discuss the possible mechanisms that are beginning to emerge. In doing so, it is first imperative to separate classic signal transduction from true epigenetic regulation. The former involves events that directly modulate gene expression by altering the levels, post-translational modification or positioning of transcription factors and their co-factors in response to the extracellular environment or state of the cell36,37. These events may then trigger changes in the transcription of a given gene or a program of responsive genes. Only then might a new epigenetic state that stabilizes these signal transduction-induced changes come into play, which renders the new expression states heritable, even in the absence of the initial signals37. The imposition of epigenetic states establishes heritable activation and repression of gene transcription through a host of activating or repressing histone modifications, in some cases coupled to DNA methylation and changes in nucleosome positioning38. All of these steps involve a battery of enzymes and protein complexes, including histone methyltransferases (HMTs), histone acetyltransferases (HATs), histone deacetylases (HDACs), histone demethylases (KDMs), DNA methyltransferases (DNMTs), DNA demethylases and nucleosome remodeling complexes (reviewed in this issue1 and in ref. 39). The activity of all of these proteins could be regulated by signaling molecules. Once a gene transcription state is altered by the enzymatic chromatin remodelers, it becomes locked in a heritable pattern and the maintenance of the state may not require continued activity of the factors that originally created it. Obviously, the interplay between classic signal transduction and epigenetic control is likely to be complex and may often be intertwined. Thus, regulation of DNA-binding transcriptional regulatory machinery may interact with chromatin modifying enzymes to lock in epigenetic states.
[Figure 3 labels. Signaling pathways: Notch, Wnt, growth factors, Shh, Ras. Effectors: RBP-J, c-Myc. Chromatin machinery: KDM, PcG, DNMT, HMT, HDAC.]
Figure 3 Modeling signaling that may promote cancer-specific DNA hypermethylation imposed on a normally non-DNA methylated, PcG-marked gene. Black solid arrows, direct regulation; black dashed line arrows, potential regulation and intersection of signal transduction with chromatin regulating machinery; red arrows, feedback in that the signaling pathway itself becomes activated in association with genes that are abnormally DNA methylated and silenced; yellow star, active 2/3meH3K4; red star, the PcG-associated 3meH3K27; green polygon, AcH3K9; black circles, methylated CpG sites. HMT, histone methyltransferase; KDM, lysine demethylase; HDAC, histone deacetylase; DNMT, DNA methyltransferase.
Hormone receptors, such as those that bind retinoic acid, are one example of DNA-binding transcription factors that also interact with chromatin-modifying enzymes (reviewed in ref. 40). These receptors are typically localized to the cytoplasm and translocate to the nucleus upon steroid hormone binding, where they act as DNA-binding factors mediating gene expression changes. At least in the case of the mouse mammary tumor virus (MMTV) promoter (reviewed in refs. 40,41), hormone activation can occur through activation of two such receptors, glucocorticoid and progesterone, as well as through progestin-activated ERK signaling41. Activation of MMTV and other progesterone receptor target genes is further coupled to activity of HATs, ultimately resulting in repositioning of histones, providing an accessible landscape for receptor binding at hormone response elements41. Another example of transcription factor–mediated alterations of the epigenome involves STATs, the downstream effectors of cytokine signaling, important for maintenance of the pluripotent state, as described above. STAT4
can promote an active chromatin environment, whereas STAT6 is associated with a transcriptionally inactive epigenetic state42. Earlier studies indicated that transcription factors, such as STATs, do have differential requirements for co-activators, likely helping to confer additional specificity to their ability to modulate the local chromatin43. Specifically, for maintenance of ESC stemness, LIF activates STAT3, which, in turn, activates Klf4 that activates the pluripotency factor, Sox226. A third, recent example involves the interaction of the Notch effector RBP-J with the lysine demethylase, KDM5a. Methylation of histone H3 Lys 4 is dynamically altered at RBP-J sites upon inhibition or reactivation of Notch signaling44. Finally, a fourth example is the transcriptional mediators of TGF-beta signaling, the SMAD proteins. SMADs have been associated with transcriptional activation or repression in a variety of contexts45. SMADs recruit either co-repressors or co-activators, generally associated with transcriptional machinery such as HDAC or p300, that ultimately alter
acetylation states of histones within TGF-beta responsive gene promoters45. The aforementioned scenarios provide just a few examples by which signal transduction cascade effectors can modulate chromatin through interactions with the histone-modifying machinery. There are many additional examples and probably numerous others that have yet to be found. The theme, however, is clear in that extrinsic signaling may lead to a cytoplasmic cascade of events that are ultimately translated to the nucleus by transcription factors. The transcription factors may recruit complexes that contain histone modifiers to specific gene promoters thereby modulating the local chromatin to promote or inhibit transcription. Ultimately, as in the previously described example of the pluripotency factors, these transcriptional states may be locked in by DNA methylation (refs. 10–13 as above). From the standpoint of Waddington’s model in Figure 1, we might surmise that any environmental signaling for control of ESCs might influence the balance between stemness retention and commitment. Central to this would be the regulation of embryonic transcription factors, such as OCT4, SOX2 and NANOG (Fig. 2). For maintenance of stemness, one might imagine signaling pathway convergence upon steps that initially allow transcription of these genes with factors creating a favorable open chromatin structure, which includes a promoter free of DNA methylation and histones with activating post-translational modifications, such as lysine acetylation. For the onset of loss of pluripotency and for subsequent lineage commitment, it appears that signal transduction must first initiate silencing of the pluripotency genes through the appearance of repressive histone modifications. Subsequently, promoter DNA methylation ensures that lineage commitment is maintained through the many rounds of mitosis that follow in the lifetime of the organism10.
Might then the epigenetic control of embryonic transcription factors be influenced by pathways such as Wnt, bone morphogenetic proteins, TGFs, FGFs or cytokines, such as LIF? Does it involve regulation of the DNA demethylase steps recently proposed to be essential for the induction or retention of embryonic factor expression14,15? Downstream from these events, how does the cell signal transduction machinery control the balance between Trx and PcG proteins in establishing the bivalent chromatin balance discussed earlier? One pathway linked to such control is Shh, which increases progenitor cell number in a manner dependent upon the PcG factor, Bmi1. Bmi1 is a component of the PcG complex that recognizes the repressive histone mark H3K27me3. In cell culture models, addition
of Shh ligands or overexpression of Gli, a downstream effector of Shh signaling, increases Bmi1 expression46. Mice engineered to be null for Bmi1 expression in cerebellar granule precursor cells are defective in cerebellar development as well as proliferation of neural progenitor cells in response to Shh47,48. The convergence between Bmi1, a PcG constituent, and Shh signaling highlights one mechanism by which signal transduction may alter the balance between Trx and PcG through modulation of a key component of the PcG machinery. Additional study is required to elucidate how this may specifically alter the bivalent chromatin balance for Bmi1 target genes. Notch is another pathway that is increasingly implicated in the control of the epigenetic landscape. As mentioned earlier, a recent study describes a new role for the histone demethylase, KDM5A, as playing an integral function in a Notch repressor complex. This PcG interacting histone demethylase49,50, which would decrease the key transcriptional activating mark, H3K4me3, associates with the Notch nuclear effector protein, RBP-J, in a manner essential for Notch/RBP-J target gene silencing51. Furthermore, the KDM5A and RBP-J interaction is critical in Notch-mediated patterning, growth and tumorigenesis in vivo, as shown in Drosophila44. The control of the balance between ESC stemness and commitment also appears to involve microRNAs (miRNAs) and other noncoding RNAs. Specifically, a family of c-Myc-regulated miRNAs regulates the expression of pluripotency and differentiation factors52,53. Within this group, a family of miRNAs, mir-200, has recently been shown to regulate a key PcG component, SUZ12 (ref. 54). Long intergenic noncoding RNAs (lincRNAs) have also been shown to have a role in pluripotency and are themselves transcriptionally regulated by key transcription factors such as Oct4 and Nanog55.
Such RNAs may also be critical for PcG occupancy and the correct targeting of histone modifiers such as LSD1 (refs. 56,57). Signaling may thus involve regulation of the expression status of noncanonical, noncoding RNAs, adding an additional layer of complexity by which signaling can alter cell states to ultimately promote heritable alterations to the epigenome (Figs. 1 and 2). We can consider maintenance of ESC stemness and subsequent differentiation during development as a model for signaling-mediated alterations in the epigenome. To this end, we have tried to emphasize the need to step up research efforts to elucidate the interactions between cell signaling pathways and the molecular machinery regulating switches in the epigenetic landscape that drive key stages in embryogenesis. We now have the opportunity
to map such interactions across the genome, as analytic tools and the understanding of the molecular epigenetic machinery are available and/or rapidly developing.
Adult cell renewal and epigenomes
Understanding the mechanisms of adult cell renewal will also require the exploration of the impact of cellular signaling on the epigenetic landscapes of these cells (Fig. 1). Many of the same concepts that govern embryonic development, such as regulation by bivalent chromatin marks, can also be found in adult stem cells. In addition, the same nuclear transcription factors and signaling pathways discussed previously for embryonic development are major players in adult tissue homeostasis. The environment for this cell renewal, the ‘stem cell niche’, might be imagined as the furrows (those at the very bottom of the model) for interaction with stem and progenitor cells in the Waddington model58. Stem cell niches provide special microenvironments for adult stem cells. The signaling factors in the niche are thought to regulate the epigenetic mechanisms responsible for maintaining the balance between adult stem cell self-renewal and lineage commitment. Indeed, genome-wide chromatin tiling studies have defined the epigenetic changes that accompany different stages of differentiation in adult tissues59,60. One great challenge in studying these processes is obtaining high-quality samples from key isolated cell types in sufficient purity and quantity for today’s genome-wide technologies for mapping epigenetic modifications. Clearly, improved methods for the isolation of such populations and adapting assay platforms to very small cell numbers must be developed. Use of mouse and other model organisms may be instrumental for many such studies.
Disease as a scenario for understanding epigenome regulation
It is increasingly being recognized that epigenetic abnormalities are critical to disease pathogenesis (see ref. 1, this issue). 
Studies of epigenetic changes associated with different conditions can not only improve our understanding of the biology of the diseases and hold great promise for improving their management (for a review, see ref. 61, this issue), but also be invaluable for providing insights into basic aspects of epigenetic regulation. At present, cancer is by far the most studied disorder with respect to epigenetic abnormalities. Silencing of tumor suppressor genes by aberrant DNA hypermethylation of normally unmethylated promoter region CpG islands is the best understood of these molecular changes. Dissecting the molecular origins of
these silencing events, which might involve hundreds of genes depending on the tumor of the specific patient (reviewed in refs. 62,63), can teach us much about the cell signaling events that govern epigenetics (Fig. 3). Perhaps the most instructive aspects link epigenetic abnormalities in cancer to the developmental events in cell signaling that have been a major focus of this article so far. Many studies have stressed that cancer is a disease driven by cells arrested in embryonic-like states and/or reprogrammed to adopt such states (reviewed in refs. 64,65). Recent gene expression array studies have found an ‘embryonic stem cell–like’ signature in many cancers, particularly in the most aggressive tumors66. How can we tie the similarities between tumor cells and ESCs to the signaling events that may guide epigenomes in both? Some clues have emerged in recent years. First, most of the factors generally used for the generation of iPSCs have defined roles as oncogenes67. Second, deletion of tumor suppressor genes, including perhaps the most common abnormally epigenetically silenced gene in cancer, Ink4a, can facilitate iPSC formation68, and can reprogram older cells to behave as younger ones. Inactivation of another epigenetically silenced gene in cancer, Arf, an alternative reading frame product of the Cdkn2a locus, along with inactivation of the tumor suppressor Rb, can convert post-mitotic myocytes to myoblast colonies that retain the ability to differentiate69. Third, exciting recent work has linked maintenance of cancer stem-like cells, including their role in therapy resistance, to high expression of Jarid1a and Jarid1b, two proteins that erase the transcriptional activating mark, H3K4me3 (refs. 70,71). Fourth, as human cell systems age, their stem cell numbers actually increase72,73 and DNA hypermethylation of many of the same genes modified in cancer increases with age74. 
Finally, a sizeable fraction of the overexpressed genes in the ESC-like signature for tumors61 encode members of the PcG complexes that are vital for establishing developmental epigenome patterns17,75–77. Also, the overexpression of these PcG genes has been experimentally linked to cellular transformation78–81 and to induction of cancer-related abnormal gene silencing and progression of abnormal DNA methylation82,83.
Another scenario illustrates how the links between embryogenesis and cancer epigenomes provide clues on how cell signaling is involved (Fig. 3). Multiple laboratories have shown that virtually half the genes with aberrant promoter hypermethylation in cancer carry PcG and/or bivalent chromatin marks in ESCs and/or embryonic progenitor cells82,84,85. These genes do not have DNA methylation in their promoter
CpG islands in embryonic cells (Fig. 2)82,84,85. It is hypothesized that DNA methylation arises to replace or augment PcG occupancy for a more stable silencing of the involved genes (reviewed in refs. 65,86,87). Although somewhat preliminary, multiple studies have shown how signaling events are involved in abnormal DNA methylation in cancer (Fig. 3). Many signal transduction pathways that drive cell transformation and tumor progression lead to the upregulation of PcG and/or components of the DNA methylation machinery. Experimental overactivation of the Ras pathway demonstrates a requirement for such epigenetic machinery proteins and can induce abnormal gene promoter DNA methylation88. Additionally, through largely unknown mechanisms, overexpression of c-myc can give rise to a specific signature of CpG island hypermethylation in culture and in vivo models of T-cell lymphoma (ref. 89). Overactivation of the Shh pathway with possible downstream involvement of the PcG protein Bmi1 is linked to maintenance of cancer stem-like cells46. Similar functions have been reported for Wnt and Notch pathways (discussed above)90,91. Furthermore, there is mounting evidence for involvement of epigenetic abnormalities in facilitating the tumorigenic activities of all of these pathways. For example, epigenetic gene silencing may be obligatory for Shh-driven formation of cerebellar tumors47 and the oncogenic activity of Notch requires the interaction with a H3K4me3 demethylase44. Multiple genes encoding proteins that antagonize Wnt pathway activity become DNA methylated and silenced, thus facilitating hyperactivity of the pathway in cancer (reviewed in ref. 92). Much remains to be understood about the precise molecular events that seem to link cell signaling and the cancer epigenetic abnormalities discussed above. 
Doing so, however, should provide important insights into how epigenomic states can be regulated, and the knowledge will clearly be invaluable for improving the treatment of cancer.
Conclusions
We have tried to illustrate, using selected examples, how vital it will be to understand what may now be one of the biggest gaps in our knowledge about the nature of epigenomes: how their different states are orchestrated by cell signaling at key stages of development, in adult cellular differentiation and in important disease states. We have pointed out accruing experimental evidence for the signaling pathways that control the epigenetic machinery and for the molecular mechanisms involved. We have hypothesized how we may build on these clues to study the events in far greater detail. New technologies and our rapidly accelerating knowledge of the machinery that maintains epigenetic states will facilitate our dissection of the regulatory mechanisms that govern the epigenome and will help in the development of translational applications.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
1. Portela, A. & Esteller, M. Nat. Biotechnol. 28, 1057–1068 (2010). 2. Satterlee, J. Nat. Biotechnol. 28, 1039–1044 (2010). 3. Bernstein, B. et al. Nat. Biotechnol. 28, 1045–1048 (2010). 4. Surani, M.A., Hayashi, K. & Hajkova, P. Cell 128, 747–762 (2007). 5. Meissner, A. et al. Nat. Biotechnol. 28, 1079–1088 (2010). 6. Hemberger, M., Dean, W. & Reik, W. Nat. Rev. Mol. Cell Biol. 10, 526–537 (2009). 7. Hochedlinger, K. & Plath, K. Development 136, 509–523 (2009). 8. Zhou, Q. & Melton, D.A. Cell Stem Cell 3, 382–388 (2008). 9. Yamanaka, S. & Blau, H.M. Nature 465, 704–712 (2010). 10. Athanasiadou, R. et al. PLoS ONE 5, e9937 (2010). 11. Feldman, N. et al. Nat. Cell Biol. 8, 188–194 (2006). 12. Epsztejn-Litman, S. et al. Nat. Struct. Mol. Biol. 15, 1176–1183 (2008). 13. Li, J.Y. et al. Mol. Cell. Biol. 27, 8748–8759 (2007). 14. Bhutani, N. et al. Nature 463, 1042–1047 (2010). 15. Ito, S. et al. Nature 466, 1129–1133 (2010). 16. Sridharan, R. et al. Cell 136, 364–377 (2009). 17. Lee, T.I. et al. Cell 125, 301–313 (2006). 18. Bernstein, B.E. et al. Cell 125, 315–326 (2006). 19. Guenther, M.G. & Young, R.A. Science 329, 150–151 (2010). 20. Marson, A. et al. Cell Stem Cell 3, 132–135 (2008). 21. Xu, R.H. et al. Cell Stem Cell 3, 196–206 (2008). 22. Conti, L. et al. PLoS Biol. 3, e283 (2005). 23. Brustle, O. et al. Science 285, 754–756 (1999). 24. Chi, A.S. & Bernstein, B.E. Science 323, 220–221 (2009). 25. Niwa, H., Burdon, T., Chambers, I. & Smith, A. Genes Dev. 12, 2048–2060 (1998). 26. 
Niwa, H., Ogawa, K., Shimosato, D. & Adachi, K. Nature 460, 118–122 (2009). 27. Hirabayashi, Y. et al. Development 131, 2791–2801 (2004). 28. Gunhaga, L. et al. Nat. Neurosci. 6, 701–707 (2003). 29. Machon, O., van den Bout, C.J., Backman, M., Kemler, R. & Krauss, S. Neuroscience 122, 129–143 (2003). 30. Backman, M. et al. Dev. Biol. 279, 155–168 (2005). 31. Maden, M. Nat. Rev. Neurosci. 8, 755–765 (2007). 32. Andrews, P.W. Dev. Biol. 103, 285–293 (1984). 33. Bray, S.J. Nat. Rev. Mol. Cell Biol. 7, 678–689 (2006). 34. Kopan, R. & Ilagan, M.X. Cell 137, 216–233 (2009). 35. Bertrand, N. & Dahmane, N. Trends Cell Biol. 16, 597–605 (2006). 36. Bryant, G.O. et al. PLoS Biol. 6, 2928–2939 (2008). 37. Ptashne, M. Curr. Biol. 19, R234–R241 (2009). 38. Jenuwein, T. & Allis, C.D. Science 293, 1074–1080 (2001). 39. Richly, H., Lange, M., Simboeck, E. & Di Croce, L. Bioessays 32, 669–679 (2010). 40. Biddie, S.C., John, S. & Hager, G.L. Trends Endocrinol. Metab. 21, 3–9 (2010).
41. Vicent, G.P. et al. Mol. Endocrinol. 3, 1–2 (2010). 42. Wei, L. et al. Immunity 32, 840–851 (2010). 43. Korzus, E. et al. Science 279, 703–707 (1998). 44. Liefke, R. et al. Genes Dev. 24, 590–601 (2010). 45. Massague, J., Seoane, J. & Wotton, D. Genes Dev. 19, 2783–2810 (2005). 46. Liu, S. et al. Cancer Res. 66, 6063–6071 (2006). 47. Leung, C. et al. Nature 428, 337–341 (2004). 48. Zencak, D. et al. J. Neurosci. 25, 5774–5783 (2005). 49. Klose, R.J. et al. Cell 128, 889–900 (2007). 50. Pasini, D. et al. Genes Dev. 22, 1345–1355 (2008). 51. Borggrefe, T. & Oswald, F. Cell. Mol. Life Sci. 66, 1631–1646 (2009). 52. Lin, C.H., Jackson, A.L., Guo, J., Linsley, P.S. & Eisenman, R.N. EMBO J. 28, 3157–3170 (2009). 53. Dang, C.V. EMBO J. 28, 3065–3066 (2009). 54. Iliopoulos, D. et al. Mol. Cell 39, 761–772 (2010). 55. Guttman, M. et al. Nature 458, 223–227 (2009). 56. Rinn, J.L. et al. Cell 129, 1311–1323 (2007). 57. Tsai, M.C. et al. Science 329, 689–693 (2010). 58. Voog, J. & Jones, D.L. Cell Stem Cell 6, 103–115 (2010). 59. Barski, A. et al. Cell 129, 823–837 (2007). 60. Barski, A. & Zhao, K. J. Cell. Biochem. 107, 11–18 (2009). 61. Kelly, T.K., De Carvalho, D.D. & Jones, P.A. Nat. Biotechnol. 28, 1069–1078 (2010). 62. Herman, J.G. & Baylin, S.B. N. Engl. J. Med. 349, 2042–2054 (2003). 63. Jones, P.A. & Baylin, S.B. Cell 128, 683–692 (2007). 64. Feinberg, A.P., Ohlsson, R. & Henikoff, S. Nat. Rev. Genet. 7, 21–33 (2006). 65. Ohm, J.E. & Baylin, S.B. Cell Cycle 6, 1040–1043 (2007). 66. Ben-Porath, I. et al. Nat. Genet. 40, 499–507 (2008). 67. Takahashi, K. & Yamanaka, S. Cell 126, 663–676 (2006). 68. Li, H. et al. Nature 460, 1136–1139 (2009). 69. Pajcini, K.V., Corbel, S.Y., Sage, J., Pomerantz, J.H. & Blau, H.M. Cell Stem Cell 7, 198–213 (2010). 70. Sharma, S.V. et al. Cell 141, 69–80 (2010). 71. Roesch, A. et al. Cell 141, 583–594 (2010). 72. Rossi, D.J., Jamieson, C.H. & Weissman, I.L. Cell 132, 681–696 (2008). 73. Chambers, S.M. et al. PLoS Biol. 5, e201 (2007). 
74. Toyota, M. & Issa, J.P. Semin. Oncol. 32, 521–530 (2005). 75. Valk-Lingbeek, M.E., Bruggeman, S.W. & van Lohuizen, M. Cell 118, 409–418 (2004). 76. Gil, J., Bernard, D. & Peters, G. DNA Cell Biol. 24, 117–125 (2005). 77. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007). 78. Varambally, S. et al. Nature 419, 624–629 (2002). 79. Bracken, A.P. et al. EMBO J. 22, 5323–5335 (2003). 80. Kirmizis, A., Bartley, S.M. & Farnham, P.J. Mol. Cancer Ther. 2, 113–121 (2003). 81. Kleer, C.G. et al. Proc. Natl. Acad. Sci. USA 100, 11606–11611 (2003). 82. Ohm, J.E. et al. Nat. Genet. 39, 237–242 (2007). 83. Mohammad, H.P. et al. Cancer Res. 69, 6322–6330 (2009). 84. Schlesinger, Y. et al. Nat. Genet. 39, 232–236 (2007). 85. Widschwendter, M. et al. Nat. Genet. 39, 157–158 (2007). 86. Cedar, H. & Bergman, Y. Nat. Rev. Genet. 10, 295–304 (2009). 87. Baylin, S.B. in StemBook (ed. L. Gerard) (Harvard Stem Cell Institute, 2009). 88. Gazin, C., Wajapeyee, N., Gobeil, S., Virbasius, C.M. & Green, M.R. Nature 449, 1073–1077 (2007). 89. Opavsky, R. et al. PLoS Genet. 3, 1757–1769 (2007). 90. Reya, T. & Clevers, H. Nature 434, 843–850 (2005). 91. Fre, S. et al. Nature 435, 964–968 (2005). 92. Ying, Y. & Tao, Q. Epigenetics 4, 307–312 (2009).
COMMENTARY
Tackling the epigenome: challenges and opportunities for collaboration
John S Satterlee, Dirk Schübeler & Huck-Hui Ng
What are the key considerations to take into account when large-scale epigenomics projects are being implemented?
John S. Satterlee is at the US National Institute on Drug Abuse, Bethesda, Maryland, USA; Dirk Schübeler is at the Friedrich Miescher Institute, Basel, Switzerland; and Huck-Hui Ng is at the Genome Institute of Singapore, Singapore. e-mail: [email protected]
Epigenetic changes have been correlated with important biological processes and disease states; however, the global epigenetic landscape of most cell types has not been comprehensively investigated. Recent advances in genomics technology, in particular high-throughput sequencing, have enabled genome-wide analysis of histone modifications and analysis of DNA methylation at nucleotide resolution. Here we discuss the scientific opportunities and current challenges in this area of research, including strategies for balancing the breadth and depth of genome-wide projects; the selection of cell types and assays; data visualization and analysis; and approaches for exploiting the resulting data to learn more about human health and disease. We also provide a brief overview of current large-scale projects (Box 1).
What is epigenomics?
Human genomes consist of the DNA encoding our genetic information, whereas epigenomes include DNA modifications and histone modifications layered on top of the genome. These marks constitute part of the instructions directing the genome to express genes at particular places and times1,2. Scientific understanding of the DNA ‘hardware’ of the human genome is well established, but the epigenomic ‘software’ has not yet been systematically investigated at a genome-wide level. A chief hurdle for such an endeavor is the large number of epigenomes even within an individual. Each of us has essentially one genome; however, each cell type in each individual is believed to have
a distinct epigenome that reflects its developmental state3. Thus, there are likely to be at least as many human epigenomes as there are distinct cell types in the human body. The epigenetic state of a cell is affected by developmental as well as environmental influences, and both of these inputs may leave epigenetic traces that the cell ‘remembers’ (referred to as cellular memory)4. Furthermore, the history of transcription and environmental influences, such as nutrition, toxins, drugs of abuse, infection, disease state and exposure to toxic agents, can also affect DNA and histone modifications5. Thus, the epigenome may provide a crucial interface between the environment and the genome. The stability of chromatin changes can vary: some may be transient changes, whereas others are longer lasting. Some chromatin changes are mitotically heritable and can affect somatic tissues, whereas others may even be inherited through meiosis and affect the next generation6. Epigenetic states are also likely to be influenced by an individual’s specific constellation of genetic variation; however, the extent to which this is the case is unknown. Thus, there is potentially an extremely large number of possible epigenomes that could be mapped. The field of epigenomics (the study of epigenetic changes at the level of the genome) has changed rapidly, largely owing to advances in DNA sequencing technology7. Until recently, researchers were using microarray approaches and performing relatively small-scale studies. However, with the widespread adoption of next-generation sequencing, genome-wide epigenomic mapping experiments can now be performed at unprecedented resolution8,9.
Potential scientific benefits of large-scale epigenomics
Large-scale epigenomic mapping studies have the potential to enhance three major areas of
science: basic gene regulatory processes, cellular differentiation and reprogramming, and the role of epigenetic regulation in disease. Although chromatin modifications are becoming better characterized at the genome-wide level, further work is necessary to understand their role in nuclear processes, such as gene regulation. Epigenome-wide maps will provide a comprehensive list of chromatin features that can serve as a launch point for ‘upstream’ investigations to identify the transcription factors, regulatory molecules and pathways that initiate, modulate or maintain epigenomic features10. These maps may also allow pursuit of ‘downstream’ investigations to identify genes with similar suites of epigenetic features that suggest coordinated regulation of gene expression in particular cell types11. Mapping of DNA methylation, histone modifications and noncoding RNAs simultaneously in the same cell types will allow scientists to begin to better understand the cross-talk that occurs among these epigenetic regulatory mechanisms. Epigenomic data sets also have remarkable power to identify functional chromosomal regions. For example, epigenomic information in concert with other data has been used to predict cis-regulatory sequences such as enhancers, microRNA genes, imprinted loci, and loci poised for activation12–18. Given the complexity involved in cellular regulation, epigenome maps will undoubtedly reveal new principles in the regulation of genome structure and function. Compared with differentiated cells, the epigenomes of human embryonic stem cells (hESCs) are unusual, especially with respect to DNA methylation19,20. Understanding how the epigenomic state of hESCs changes during the differentiation process is crucial for understanding both normal development and disruptions in development that might lead to
adverse birth outcomes or disease conditions in the child or adult. Similarly, understanding the extent to which the epigenome can be reprogrammed, as in the case of induced pluripotent stem cells, may be essential to enable regenerative medicine to reach its full potential for treating diseases in which cells have been permanently lost or impaired21–23. With respect to health and disease, epigenetic regulation has been implicated in certain types of cancers, and the most detailed disease epigenetics investigations have taken place in the area of cancer biology24. Even so, a growing number of other diseases seem to occur at least in part as the result of epigenetic dysregulation. For example, it is hypothesized that certain neuropsychiatric and neurodevelopmental disorders have a significant epigenetic component25. Because the epigenomic status of cells may be altered by exogenous influences, it is likely that epigenetic regulation is important in the development, severity and course of other common diseases. However, the extent to which epigenetic dysregulation might be a consequence of, or itself lead to, other common disease states is poorly understood. Whether or not these aberrant epigenetic states can affect subsequent generations is even less clear but is an important area for investigation. Epigenomic maps of cell types and tissues important in specific diseases may provide a unique resource allowing researchers to identify upstream factors and pathways that might contribute to the disease state as well as downstream genes affected by the disease state. In addition, epigenetic states, regardless of whether they are causal for a given disease, have great promise as potential biomarkers for disease states or environmental stressors (e.g., exposure to toxins, infections, drugs of abuse or psychosocial stress) and thus may be useful for diagnosis of disease or disease progression26. 
Both genes and environment are important in the development of human diseases27. Genome-wide association studies have been successful in identifying genetic variants associated with many different diseases28. In the case of diseases that have a strong environmental component, epigenome-wide association studies that statistically correlate epigenetic variation with disease states or phenotypes could be of great value. Epigenomic maps of specific cell types or tissues could serve as a foundation for the design of future epigenome-wide association studies to systematically investigate individual epigenetic variation and its potential role in human diseases. Taking this one step further, studies investigating both genetic and epigenetic variation in disease, such as a recent study29 looking at the role of maternal or paternal contribution of gene variants to
type 2 diabetes and other common diseases, illustrate the potential value of such approaches to investigating human diseases29. A deeper understanding of the influence of individual epigenetic and genetic variation in disease susceptibility could enable personalized prevention measures in the future. Epigenetic changes are inherently more plastic and dynamic than genetic changes and thus may be particularly useful targets for therapeutic intervention30. Indeed, there are at least three ‘epigenetic therapeutics’ approved by the US Food and Drug Administration for treating specific cancers and seizure disorders, and other compounds are in clinical trials31,32. A histone deacetylase inhibitor was even used to treat a patient with a genetic mutation that was believed to be the cause of a seizure disorder, suggesting that in certain cases, epigenetic alterations might be able to override genetic disease33.
Potential practical benefits of large-scale epigenomics
In addition to the potential scientific and medical benefits described above, several practical benefits could arise from the coordination of large-scale epigenomics projects. These include improved comparability between data sets, avoiding duplication of effort, exploiting economies of scale and developing ‘best practices’ for epigenomic studies. Given the possible combinations of cell types, environmental exposures, disease states and individual genomic variation, the sheer number of possible distinct epigenomes that could be analyzed seems astronomical. The scientific community is likely to benefit most from an organized and systematic mapping effort in which similar epigenetic features are mapped in a defined set of cell types with a standardized set of protocols and quality controls, creating high-quality data sets that are comparable with one another. A standardized approach would more easily enable the identification of epigenomic features correlated with particular cell states. 
Many scientists are becoming interested in investigating the interplay between epigenetics and disease, as indicated by the large increase in the number of US National Institutes of Health (NIH) grants investigating epigenetic processes (Fig. 1). Generation of epigenomic maps for cell types and tissues relevant to important biological processes and diseases would allow scientists to use the available public data to look at epigenomic regulation of their gene or process of interest rather than duplicating efforts by generating their own epigenomic maps. Of course, community epigenomics projects will merely serve as a foundation to be exploited
and built upon by researchers as they pursue their unique scientific interests. Because large-scale mapping groups are already geared toward data production, they typically can generate epigenomic data more inexpensively, rapidly and reproducibly than researchers pursuing smaller-scale projects within their own laboratories. Such groups will also be able to troubleshoot problems that arise in methods development, reagent validation, data standardization and data quality measurement, and begin to converge upon the best scientific practices in this area. These groups will also be well positioned to investigate the sources of experimental variation that arise and determine what steps need to be taken to minimize this variation. Solutions to technical problems in epigenomics will enable individual investigators to more readily apply epigenomic techniques to their biological problem of interest. As high-throughput sequencers become more widely available, many scientists will begin to consider performing epigenomic analyses within their own laboratories. Although there will inevitably be healthy competition between research groups exploring small-scale epigenomics questions, it will be important to encourage coordination between larger-scale community projects to minimize duplication and maximize the exploration of epigenomic features in a wide array of cell types and tissues.
Planning large-scale epigenomics projects
Given the broad scientific interest in this area, it is conceivable that large-scale epigenomics projects could be initiated in several fields, depending upon what data a community desires the most. One could readily imagine individual projects focusing on stem cells, cancer, neuroepigenomics, genetic-epigenetic interactions, developmental biology or individual epigenomic variation, because each of these fields has a distinct set of important biological questions and challenges that need to be addressed. 
Compared with the Human Genome Project, large-scale epigenomics projects have the potential to be quite open-ended, and thus a key consideration is to limit the scope of such projects so that they achieve defined and compelling scientific outcomes within the fiscal and time constraints of each project. In particular, the depth and breadth of a project must be clearly defined through the selection of tissue or cell types to be analyzed, epigenomic features to be assayed and functional correlates to be tested. Project planners also need to consider the best way to manage, analyze and visualize the large amounts of data that will be generated. These key considerations are addressed in more detail below.
Box 1 Overview of current large-scale epigenomics projects Several medium- and large-scale epigenomics efforts have already been initiated. We briefly describe selected examples below.
the project will provide crucial insights into the links and interplay between genetics and epigenetics.
Asian projects. Institutions in several countries have already developed technological platforms to enable the generation of large-scale sequencing data. These centers are in a good position to launch epigenomics projects. Scientists from Yonsei University (Seoul), the Japanese National Cancer Center, the Shanghai Cancer Institute and the Genome Institute of Singapore are already organizing annual conferences to promote interactions and collaborations. As epigenomics research begins to move to center stage, Asian scientists are likely to make increasing contributions.
NIH Roadmap Epigenomics Program. Initiated in 2008, the NIH’s Roadmap Epigenomics Program (http://www. roadmapepigenomics.org) has an epigenomic mapping component described in detail in an accompanying article61. In brief, the Roadmap Epigenomics Mapping Consortium is conducting in-depth epigenomic mapping of several highpriority human cell types. In this subset of tissues, genome-wide histone modification analysis will be performed for >30 histone modifications using ChIP-seq, and DNA methylation analysis will be performed genome-wide at single-base resolution using the MethylC-seq technique. The first two DNA methylomes for human cell types (the H1 hESC line and the IMR90 fibroblast cell line) have recently been published19. To capture the breadth of epigenomic differences among cell types, the consortium also intends to map >100 human cell types and tissues in a more focused way. Six informative histone modifications have been selected for ChIP-seq analysis in these cell types. In addition, DNA methylation will be mapped at single-base resolution, but only on a subset of the genome, using reduced-representation bisulfate sequencing or similar methods. DNase I–hypersensitive sites are expected to be analyzed for tissues from which sufficient material is available. Gene expression data will be collected using microarray or RNAsequencing strategies. The consortium is developing standards and best practices for individual and integrative analyses of the different data types to provide a reference for the larger epigenomic community as it builds upon these data. Data are being released immediately (with a 9-month embargo for publication of genome-wide analyses) and will be permanently archived in the GEO database (http://www.ncbi.nlm.nih.gov/epigenomics) at the US National Center for Biotechnology Information (NCBI). 
Other aspects of the Roadmap Epigenomics Program include development of new technologies for epigenomic analysis and imaging, identification of new epigenetic modifications and investigations into the role of the epigenome in a wide array of human diseases and environmental effects (http://nihroadmap. nih.gov/epigenomics/fundedresearch.asp).
Canadian and Australian projects. The Canadian Institutes of Health Research (Ottawa) is leading an effort to develop a potential broad-ranging initiative on ‘Epigenetics, Environment and Health’. Australia has been the site of several workshops and meetings devoted to epigenetics, and in 2008 researchers there formed the Australian Alliance for Epigenetics (http://www. epialliance.org.au/). European epigenomics projects. Although research in Europe is funded at the national level as well as Europe-wide through the research program of the European Union, we exclusively focus on the latter here. As yet, the EU has no program specifically dedicated to epigenomics, but several research groups working in the area have been funded within more broadly defined programs. Among these are the HEROIC program (http://www.heroic-ip. eu/) on epigenomics in mouse stem cells and differentiated cells and the EPITRON initiative (http://www.epitron.eu/) on cancer epigenetics, which were both early adopters of next-generation sequencing. Furthermore, the SMARTER initiative (http:// www.smarter-chromatin.eu/) aims to develop small inhibitors of chromatin-modifying enzymes. Particularly internationally visible is the Epigenome Network of Excellence, which fosters the epigenetics research community in Europe, in part through support of junior scientists, organization of focal meetings (e.g., on technical aspects of epigenomics), development of tools (e.g., a popular protocol database; http://www.epigenome-noe.net/ WWW/researchtools/protocols.php) and an informative website aimed at the layman (http://www.epigenome-noe.net/WWW/ index.php). The Epigenome Network of Excellence exemplifies the importance of networking, and the experience it has gained will be useful for the coordination and implementation of a more international effort in epigenomics. The ENCODE and ICGC projects. 
The Encyclopedia of DNA Elements (ENCODE) project launched by the US National Human Genome Research Institute aims to identify all functional elements in the human genome sequence, whereas the modENCODE project has the same goal with respect to model organisms59. These projects use a wide array of different assays to identify functional elements, and epigenomic profiling is thus an important component of the programs but not their major thrust (http://www.genome.gov/10005107). Another project, the International Cancer Genome Consortium (ICGC), is investigating genomic changes that occur in various types of cancer (http://www.icgc.org/), with the goal of obtaining a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes60. As many samples from one tumor type or subtype will be analyzed in great detail,
The International Human Epigenome Consortium. Over the years, a grassroots group of scientists has championed a sustained international epigenomics effort62,63. This is now taking definitive shape in the form of an initiative to develop an IHEC, which builds on the NIH effort in epigenomics and will attempt to create a truly global epigenomics project (http://ihec-epigenomes.org). Even the Roadmap Epigenomics Program, the largest epigenomics effort to date, will be able to map only a small number of the many epigenomes of interest. Although plans for the consortium are not finalized, it may expand the number of human cell types and tissues being mapped, and its goals could also include epigenomic analysis of nonhuman cells and tissues, which are not being characterized in the NIH Roadmap Program. The IHEC could also help to develop best practices and standards for epigenomic data generation and analysis, so investigators can perform successful epigenomic analyses more quickly and avoid problems already encountered and solved by other researchers.
Tissue and cell type selection. One of the primary issues for large-scale epigenomic research projects is the selection and prioritization of cell types and tissues, which can be affected by both technical factors and community demand, and will vary depending upon the precise nature of the planned project. Given finite resources, scientific prioritization of cell types and tissues for analysis is crucial. Which cells or tissues are most likely to provide valuable insights into important biological processes, diseases or environmental exposures? For community resource projects, what epigenomic maps would be of the greatest value to researchers? Another question is what cell types or tissues are available in the necessary quantities for the proposed assays. This is especially important given that current technologies for epigenomic analysis are generally not amenable to the use of very small numbers of cells, although this could change rapidly34,35. Similarly, most tissues are composed of several specialized cell types (e.g., different types of neuronal and glial cells in the brain), which could make interpretation of the resulting epigenomic data difficult and perhaps require approaches to isolate single cells out of tissues. Cell lines are more homogenous than tissues, but in vitro growth conditions have the potential to substantially affect the epigenomic state of the cells so that they may not accurately reflect the in vivo state36. In the case of human tissues, individuals have different genomes and environmental exposure histories37. Thus, there might be an advantage to performing epigenomic analyses on different tissues from a single individual, in whom these variables are held constant. Investigations into individual epigenomic variation and the interplay between the genome and the epigenome are important areas for future research, so it would be ideal to use cells and tissues of known DNA sequence. 
Alternatively, tissue samples should be stored for future analysis as sequencing costs continue to drop.

Another important consideration is the organism to be investigated. Epigenomic maps of human tissues will no doubt be important for understanding human disease. Even so, mapping of tissues and cell types from model organisms (e.g., mouse, zebrafish, fly, worm, sea slug, plant and yeast) may be very useful because the genotype and environmental exposure can be controlled and these systems are more amenable to functional testing or manipulation of epigenomic states. Epigenomic studies in established animal models of environmental exposure or disease may also be valuable, particularly for transgenerational investigations or for studies of cell types that are difficult to obtain from human sources.
[Figure 1: bar chart. Vertical axis: grants with keyword epigenetic/epigenomic (0 to 2,000). Horizontal axis: years (2005 to 2009).]
Figure 1 The growth of grants related to epigenetics at the NIH. The NIH RePORTER grants database (http://projectreporter.nih.gov/reporter.cfm) was searched using the keywords epigenetic or epigenomic for each of the years indicated. Graph shows number of grants that contain a title, abstract or specific aims with one or both of these keywords.
Comparative epigenomic analyses of specific cell types from different species may also be valuable for identifying epigenomic regulatory features.

Epigenomic assays. Which marks or features should be mapped? Again, technical and fiscal constraints affect assay selection and prioritization. Assays that measure histone modifications, DNA modifications, and small and long noncoding RNAs are essential, but information about other chromatin features (transcription factor binding sites, chromatin-interacting proteins, histone variants and nucleosome position) may add substantial value to the epigenomic data captured (see below). Of course, reagents or high-throughput assays may not be available for all features one might wish to examine. For the purposes of this article, we limit our discussion to analysis of DNA modifications and histone modifications. Several assays are available for large-scale DNA methylation analysis, and these differ in their resolution and comprehensiveness. For example, the genome-wide antibody-based methylated DNA immunoprecipitation assay has a resolution on the order of hundreds of nucleotides, whereas assays such as reduced-representation bisulfite sequencing and targeted bisulfite sequencing achieve single-base resolution but interrogate only a subset of the genome and may therefore miss information concerning important regions of the epigenome22,38–43. MethylC-seq, the most comprehensive DNA methylation technique currently available, provides genome-wide single-nucleotide-resolution data and recently has been used to create a high-density DNA methylation map for two human cell types19. Unfortunately, genome-wide MethylC-seq is expensive, making comprehensive DNA methylation analysis of large numbers of cell types and tissues less feasible for the time being.
Even so, truly comprehensive epigenomic maps should include whole-methylome data whenever possible. The recent identification of the covalent DNA modification, hydroxymethylcytosine44,45 (hmC), also poses challenges. Antibodies for hmC should permit specific hydroxymethyl-DNA immunoprecipitation assays. However, because mC and hmC are both resistant to bisulfite conversion46, it will be a priority to develop technology that can distinguish between both modifications at a single-base level. In terms of histones, >100 distinct posttranslational modifications have been identified thus far, and more seem to be discovered every month. The functions of most of these modifications are largely unknown, although some modifications are associated with active chromatin, whereas others are linked to silenced chromatin47. The two major assays for profiling histone modifications both rely on chromatin immunoprecipitation (ChIP), in which an antibody against a histone modification is used to immunoprecipitate cross-linked chromatin48. The DNA regions associated with the histone modification can then be analyzed using either microarray analysis (ChIP-chip) or high-throughput sequence analysis (ChIP-seq). ChIP-seq has the advantage of not being limited to the sequences present on the microarray, and the output of ChIP-seq is more quantitative than that of ChIP-chip. ChIP-seq is also more cost-efficient than genome-wide microarrays, and the cost of DNA sequencing is expected to drop even further48. However, ChIP-seq creates large amounts of data that multiply with the number of histone modifications profiled, creating new computational challenges with regard to storage and analysis49,50. The lack of antibodies with the specificity and efficiency of enrichment needed for a ChIP experiment limits the study of many post-translational histone modifications. 
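The bisulfite limitation mentioned above for hmC can be illustrated with a toy simulation. This is only a sketch: the single-letter codes `M` (for mC) and `H` (for hmC) and the function name `bisulfite_read` are hypothetical conventions invented for this example, and the model assumes idealized, complete conversion in which unmodified cytosine is read as T while mC and hmC are both read as C.

```python
# Toy model of bisulfite sequencing read-out.
# Hypothetical single-letter codes used only in this sketch:
#   "C" = unmodified cytosine, "M" = 5-methylcytosine (mC),
#   "H" = 5-hydroxymethylcytosine (hmC).

def bisulfite_read(template: str) -> str:
    """Return the base calls expected after idealized bisulfite treatment."""
    # Unmodified C is converted and read as T; mC and hmC resist
    # conversion and are both read as C. Other bases are unchanged.
    converted = {"C": "T", "M": "C", "H": "C"}
    return "".join(converted.get(base, base) for base in template)

methylated = "ACMGTCG"         # template carrying mC at position 3
hydroxymethylated = "ACHGTCG"  # same template carrying hmC instead

read_m = bisulfite_read(methylated)
read_h = bisulfite_read(hydroxymethylated)

# Both templates yield the same read, so the two marks are
# indistinguishable from bisulfite data alone.
print(read_m, read_h, read_m == read_h)
```

Under these assumptions the two templates give identical reads, which is why a method that separates the two modifications at single-base level is described as a priority.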
For comprehensive epigenomic maps, one might want to assay every modification for which there is a useful reagent (currently around 30); however, this is an expensive proposition if one wishes to look at a large number of tissues or cell types. An alternative strategy to reduce cost is to identify a set of key histone modifications for which high-quality reagents are available and that are highly informative of cellular state. For example, the NIH Roadmap Epigenomics Mapping Consortium identified a subset of six histone modifications that was felt to be maximally informative at this point in time (H3K4me1, H3K4me3, H3K9ac, H3K9me3, H3K27me3, H3K36me3). To be comparable, ChIP-based assays rely on validated antibodies supplied in continuous
quality. Commercially available antibodies, however, are frequently polyclonal and thus are only available in finite amounts with batch-to-batch variations, leading to antibody-dependent differences in the output data. Monoclonal antibodies offer a renewable standardized source of antibodies; however, because they bind to a single epitope, they do not always work well in ChIP assays. Clearly, the development of renewable standardized affinity reagents for ChIP studies would be of great benefit to the scientific community. For all its power, the ChIP assay has certain limitations. Some modifications may be masked by other proteins, rendering them inaccessible to the antibody and not readily detected. Moreover, large numbers of cells are currently required. Although ChIP assays will continue to be a widely used technique for epigenomics studies, the development of alternative approaches that complement ChIP would be of great value. For example, the routine ability to isolate chromatin from a particular genetic locus and analyze it via mass spectrometry to identify all the proteins and post-translational modifications present at the locus would greatly enhance researchers’ ability to investigate regulation of gene expression51.

Value-added information. In addition to the technical aspects of generating epigenomic data sets described above, additional experimental information regarding the biological sample should be included whenever possible. For example, as much phenotypic data as possible should be collected for each tissue assayed so that epigenomic state can be correlated with environmental exposures, disease state, age, gender and other measures. As mentioned above, genetic variation may affect epigenomic states, so capturing the DNA sequence of the cell type or tissue assayed would allow these correlations to be made. Measurement of gene expression levels is crucial for correlating epigenomic state with transcription.
Gene expression can be measured through microarray analysis or next-generation sequencing and can focus on coding or noncoding RNA, depending on the experimental design. The role of noncoding RNAs in epigenomic processes remains somewhat unclear; however, in some species there is convincing evidence that certain noncoding RNAs are associated with specific histone modifications and with DNA52,53. Thus, strategies for measuring gene expression that include quantification of noncoding RNA levels would help scientists understand how these molecules correlate with other epigenomic features.
For some projects, it will be worthwhile to measure chromatin features such as binding of transcription factors or chromatin-interacting proteins, histone variants and positioning of nucleosomes48. Characterization of chromatin structure using DNase I hypersensitivity–based assays may allow correlation of epigenetic states with chromatin accessibility54. Epigenomic features are likely to affect higher-order chromatin structure, and the introduction of new methods for analyzing higher-order chromatin (Hi-C, or chromatin interaction analysis using paired-end tag sequencing) may provide additional chromatin structure information that can be correlated with epigenomic features55,56. Analysis of epigenomic maps is, by nature, correlative, and thus understanding of function and mechanism will require examination of the relationships between the observed epigenomic features and proposed biological processes. Manipulation of the epigenome can be achieved globally using pharmacological modulators of epigenetic-modifying enzymes or effector molecules. Similarly, genetic deletion or knockdown, as well as overexpression approaches, can be used to manipulate protein levels. Currently, locus-specific epigenetic manipulation involves expression of fusion proteins, and variations of this approach will probably produce valuable tools for examining the functions of epigenomic features57.

Epigenomic data considerations. The volume of data generated by a large-scale epigenomics project is great and thus creates a need for efficient data storage and processing. Furthermore, pipelines must be in place for handling the different data types (e.g., ChIP-seq, MethylC-seq, gene expression, DNase I hypersensitivity), performing quality control, comparing replicates and providing statistically correct normalization measures. Further processing includes genome alignments and deposition into databases.
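Two of the pipeline steps named above, normalization and replicate comparison, can be sketched in a few lines. The text does not specify particular methods, so everything here is an assumption made for illustration: reads-per-million scaling stands in for normalization, Pearson correlation of binned read counts stands in for replicate comparison, and the function names and the 0.9 cutoff are hypothetical.

```python
from math import sqrt

def reads_per_million(bin_counts):
    """Scale per-bin read counts by library size (reads per million)."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("library contains no reads")
    return [count * 1_000_000 / total for count in bin_counts]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant signal")
    return cov / sqrt(var_x * var_y)

def replicates_agree(rep1, rep2, min_r=0.9):
    """Flag replicate pairs whose normalized signals correlate well.

    The 0.9 default is an arbitrary illustrative cutoff, not a standard.
    """
    r = pearson(reads_per_million(rep1), reads_per_million(rep2))
    return r >= min_r

# Hypothetical read counts in four genomic bins.
rep_a = [10, 200, 30, 5]
rep_b = [20, 400, 60, 10]   # same profile sequenced twice as deeply
rep_c = [200, 10, 5, 30]    # a discordant profile

print(replicates_agree(rep_a, rep_b))
print(replicates_agree(rep_a, rep_c))
```

A production pipeline would likely use more robust statistics and assay-specific normalization; the point of the sketch is only that sequencing depth must be factored out before replicates are compared.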
Controlled-access databases, such as the Database of Genotypes and Phenotypes (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap) or the European Genotype Archive (http://www.ebi.ac.uk/ega/page.php), are used to store DNA sequence information from consenting participants to allow their genomic data to be accessible. Such aligned and quality-controlled data can then be analyzed in a variety of ways58. Coincidence of epigenetic marks at a particular gene locus within a cell or tissue type can be examined to group genes on the basis of their complement of epigenetic features. This type of analysis aims to identify co-occurrence of epigenetic features and thus potential cross-talk between epigenomic regulatory processes. Epigenomic maps from different cell types can be compared
to identify features at particular gene loci that are indicative of a particular cell type or tissue. Data from normal and diseased tissue can also be compared to identify epigenetic features at gene loci that might be associated with a particular disease state or environmental exposure. Epigenomic data can also be correlated with phenotype, genetic variation (single-nucleotide polymorphisms or copy-number variants), gene expression levels, chromatin accessibility or other data types. Recent integrative analysis has begun to reveal the predictive value of epigenetic marks, as described above. Computational approaches for some of these epigenomic analyses are being developed8,11. In addition to sophisticated computational analysis, visualization tools are also needed for intuitive data display that allows the nonexpert to check the epigenetic state of their gene of interest.

Conclusions
Large-scale epigenomic mapping projects have the potential to provide global, integrated views of different cellular states. This information will almost certainly provide new biological insights for different fields of science, particularly in the areas of basic gene regulatory processes, cellular differentiation and reprogramming, and the role of epigenetic regulation in disease processes. It is hoped that a deeper understanding of the influence of epigenetic processes will lead to better knowledge of disease mechanisms, improve disease diagnosis, enable prevention and potentially allow the development of new therapeutic agents.

ACKNOWLEDGMENTS
We thank members of the Roadmap Epigenomics Consortium and Workgroup as well as the Interim Steering Committee of the International Human Epigenome Consortium for their input.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Bird, A. Nature 447, 396–398 (2007).
2. Suzuki, M.M. & Bird, A. Nat. Rev. Genet. 9, 465–476 (2008).
3. Murrell, A., Rakyan, V.K. & Beck, S. Hum. Mol. Genet. 14 Spec. No. 1, R3–R10 (2005).
4. Ng, R.K. & Gurdon, J.B. Cell Cycle 7, 1173–1177 (2008).
5. Zhang, T.Y. & Meaney, M.J. Annu. Rev. Psychol. 61, 439–466 (2010).
6. Youngson, N.A. & Whitelaw, E. Annu. Rev. Genomics Hum. Genet. 9, 233–257 (2008).
7. Bernstein, B.E., Meissner, A. & Lander, E.S. Cell 128, 669–681 (2007).
8. Hawkins, R.D., Hon, G.C. & Ren, B. Nat. Rev. Genet. 11, 476–486 (2010).
9. Bentley, D.R. et al. Nature 456, 53–59 (2008).
10. Berger, S.L., Kouzarides, T., Shiekhattar, R. & Shilatifard, A. Genes Dev. 23, 781–783 (2009).
11. Ernst, J. & Kellis, M. Nat. Biotechnol. 28, 817–825 (2010).
12. Ozsolak, F. et al. Genes Dev. 22, 3172–3183 (2008).
13. Heintzman, N.D. et al. Nature 459, 108–112 (2009).
14. Bernstein, B.E. et al. Cell 125, 315–326 (2006).
15. Dindot, S.V., Person, R., Strivens, M., Garcia, R. & Beaudet, A.L. Genome Res. 19, 1374–1383 (2009).
16. Barski, A. et al. Genome Res. 19, 1742–1751 (2009).
17. Guttman, M. et al. Nature 458, 223–227 (2009).
18. Hon, G.C., Hawkins, R.D. & Ren, B. Hum. Mol. Genet. 18, R195–R201 (2009).
19. Lister, R. et al. Nature 462, 315–322 (2009).
20. Meissner, A. Nat. Biotechnol. 28, 1079–1088 (2010).
21. Ball, M.P. et al. Nat. Biotechnol. 27, 361–368 (2009).
22. Deng, J. et al. Nat. Biotechnol. 27, 353–360 (2009).
23. Doi, A. et al. Nat. Genet. 41, 1350–1353 (2009).
24. Gronbaek, K., Hother, C. & Jones, P.A. APMIS 115, 1039–1059 (2007).
25. Tsankova, N., Renthal, W., Kumar, A. & Nestler, E.J. Nat. Rev. Neurosci. 8, 355–367 (2007).
26. Mulero-Navarro, S. & Esteller, M. Crit. Rev. Oncol. Hematol. 68, 1–11 (2008).
27. Hunter, D.J. Nat. Rev. Genet. 6, 287–298 (2005).
28. Altshuler, D., Daly, M.J. & Lander, E.S. Science 322, 881–888 (2008).
29. Kong, A. et al. Nature 462, 868–874 (2009).
30. Haberland, M., Montgomery, R.L. & Olson, E.N. Nat. Rev. Genet. 10, 32–42 (2009).
31. Sharma, S., Kelly, T.K. & Jones, P.A. Carcinogenesis 31, 27–36 (2010).
32. Mack, G.S. J. Natl. Cancer Inst. 98, 1443–1444 (2006).
33. Almeida, A.M. et al. N. Engl. J. Med. 356, 1641–1647 (2007).
34. Goren, A. et al. Nat. Methods 7, 47–49 (2010).
35. Gu, H. et al. Nat. Methods 7, 133–136 (2010).
36. O’Neill, L.P., VerMilyea, M.D. & Turner, B.M. Nat. Genet. 38, 835–841 (2006).
37. Bjornsson, H.T. et al. J. Am. Med. Assoc. 299, 2877–2883 (2008).
38. Weber, M. et al. Nat. Genet. 37, 853–862 (2005).
39. Ammerpohl, O., Martin-Subero, J.I., Richter, J., Vater, I. & Siebert, R. Biochim. Biophys. Acta 1790, 847–862 (2009).
40. Meissner, A. et al. Nature 454, 766–770 (2008).
41. Eckhardt, F. et al. Nat. Genet. 38, 1378–1385 (2006).
42. Beck, S. & Rakyan, V.K. Trends Genet. 24, 231–237 (2008).
43. Li, J.B. et al. Genome Res. 19, 1606–1615 (2009).
44. Kriaucionis, S. & Heintz, N. Science 324, 929–930 (2009).
45. Tahiliani, M. et al. Science 324, 930–935 (2009).
46. Hayatsu, H. & Shiragami, M. Biochemistry 18, 632–637 (1979).
47. Campos, E.I. & Reinberg, D. Annu. Rev. Genet. 43, 559–599 (2009).
48. Park, P.J. Nat. Rev. Genet. 10, 669–680 (2009).
49. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
50. Barski, A. et al. Cell 129, 823–837 (2007).
51. Dejardin, J. & Kingston, R.E. Cell 136, 175–186 (2009).
52. Kloc, A., Zaratiegui, M., Nora, E. & Martienssen, R. Curr. Biol. 18, 490–495 (2008).
53. Nagano, T. & Fraser, P. Mamm. Genome 20, 557–562 (2009).
54. Hesselberth, J.R. et al. Nat. Methods 6, 283–289 (2009).
55. Fullwood, M.J. et al. Nature 462, 58–64 (2009).
56. Lieberman-Aiden, E. et al. Science 326, 289–293 (2009).
57. Hansen, K.H. et al. Nat. Cell Biol. 10, 1291–1300 (2008).
58. Bock, C. & Lengauer, T. Bioinformatics 24, 1–10 (2008).
59. Birney, E. et al. Nature 447, 799–816 (2007).
60. The International Cancer Genome Consortium. Nature 464, 993–998 (2010).
61. Bernstein, B. Nat. Biotechnol. 28, 1045–1048 (2010).
62. Jones, P.A. & Martienssen, R. Cancer Res. 65, 11241–11246 (2005).
63. The American Association for Cancer Research Human Epigenome Task Force, and the European Union, Network of Excellence, Scientific Advisory Board. Nature 454, 711–715 (2008).
COMMENTARY
The NIH Roadmap Epigenomics Mapping Consortium
Bradley E Bernstein, John A Stamatoyannopoulos, Joseph F Costello, Bing Ren, Aleksandar Milosavljevic, Alexander Meissner, Manolis Kellis, Marco A Marra, Arthur L Beaudet, Joseph R Ecker, Peggy J Farnham, Martin Hirst, Eric S Lander, Tarjei S Mikkelsen & James A Thomson

The NIH Roadmap Epigenomics Mapping Consortium aims to produce a public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.
Recent years have seen remarkable progress in understanding of human genetics, enabled by the availability of the human genome sequence and increasingly high-throughput technologies for DNA analysis1. Yet despite their breadth and comprehensiveness, purely DNA sequence–level investigations do not shed light on a crucial component of human biology: how the same genome sequence can give rise to over 200 different cell types through remarkably consistent differentiation programs. This process of developmental specification, classically termed ‘epigenesis’, is now known to involve differential regulation of genes and their products2. Aberrant regulation of such phenomena has been extensively linked to human diseases and, additionally, can be influenced by environmental inputs3–5. Gene regulation and genome function are intimately related to the physical organization of genomic DNA and in particular to the way it is packaged into chromatin, a complex
nucleoprotein structure comprising histones, DNA binding factors, accessory protein complexes and noncoding RNAs6–9 (Fig. 1). Chromatin is a dynamic entity that is subject to modification of both its DNA and protein components, with direct structural and functional consequences. The term ‘epigenome’ is used to describe the way in which these modifications and structural features are distributed across the genome in a given cell population. The epigenomic landscapes and the associated gene expression programs are maintained within a given cell lineage through complex processes that involve transcription factors, chromatin regulators, histone modifications and variants, and RNAs10–12, but that remain poorly understood in mammals. Although the mechanisms remain obscure, a now overwhelming body of evidence supports central roles for epigenomic changes in disease susceptibility and pathogenesis. Multiple disease processes, including cancer, are now
well known to be associated with characteristic alterations in the patterns of chromatin, DNA methylation and gene expression3,5. In addition, epidemiological studies have linked early environmental exposures, such as in utero starvation, to long-term health consequences ranging from metabolic disorders to psychiatric diseases13. A causal role for epigenomic aberrations is supported by several lines of evidence, including mutations of genes encoding chromatin regulators in developmental disorders and cancer4,14–16, and by the therapeutic efficacy of small-molecule inhibitors of DNA methyltransferases and histone-modifying enzymes17. Major epigenomic features can now be interrogated comprehensively by combining cellular, biochemical and molecular techniques with high-throughput sequencing. Production of genome-wide maps of cytosine methylation, histone modifications, chromatin accessibility and RNA transcripts represents a powerful and
Bradley E. Bernstein, Alexander Meissner, Manolis Kellis, Eric S. Lander and Tarjei S. Mikkelsen are at the Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA; Bradley E. Bernstein is also at the Howard Hughes Medical Institute, Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA; and Alexander Meissner is in the Department of Stem Cell and Regenerative Biology at Harvard University, Cambridge, Massachusetts, USA. John A.
Stamatoyannopoulos is in the Departments of Genome Sciences and Medicine, University of Washington School of Medicine, Seattle, Washington, USA. Joseph F. Costello is in the Department of Neurosurgery, University of California at San Francisco, San Francisco, California, USA. Bing Ren is at the Ludwig Institute for Cancer Research, University of California San Diego School of Medicine, La Jolla, California, USA. Aleksandar Milosavljevic and Arthur L. Beaudet are in the Department of Molecular and Human Genetics, Baylor College of Medicine,
Houston, Texas, USA. Marco A. Marra and Martin Hirst are at the Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada. Joseph R. Ecker is in the Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, California, USA. Peggy J. Farnham is at the Genome Center, University of California at Davis, Davis, California, USA. James A. Thomson is at the University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA. e-mail: [email protected]
[Figure 1 labels: The epigenome; DNA methylation; DNA accessibility; histone modifications; Polycomb complex; histone; DNA; RNA; DNA binding proteins. Credit: B. Wong.]
Figure 1 Layers of genome organization. Genome function and cellular phenotypes are influenced by DNA methylation and the protein-DNA complex known as chromatin. In mammals, DNA methylation occurs on cytosine bases, primarily in the context of CpG dinucleotides. Accessible chromatin that is hypersensitive to DNase I digestion marks promoters and functional elements bound by transcription factors or other regulatory proteins. Histone modifications, associated proteins such as Polycomb repressors and noncoding RNAs constitute an additional layer of chromatin structure that affects genome function in a context-dependent manner.
general approach for surveying the regulatory state of the genome in a cell type of interest. The resulting data define the locations and activation states of diverse functional elements, including genes and their transcriptional control elements (e.g., promoters, enhancers and insulators), noncoding transcripts and epigenetic effectors, such as imprinting control regions18–25. More globally, such maps can provide insight into developmental state and potential, for example of a stem cell population, and shed light on aberrant regulatory programs in diseased tissues.

Here we describe the aims and scope of the US National Institutes of Health (NIH) Roadmap Epigenomics Mapping Consortium, which has set out to provide a publicly accessible resource of epigenomic maps in stem cells and primary ex vivo tissues. These maps will detail the genome-wide landscapes of DNA methylation, histone modifications and related chromatin features, and are intended to provide a reference for studies of the genetic and epigenetic events that underlie human development, diversity and disease. Below, we describe the organizational structure, goals and anticipated deliverables of the consortium.

A coordinated study of human epigenomes
In 2008, the NIH Roadmap Epigenomics Mapping Consortium (http://www.roadmapepigenomics.org/) was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The consortium leverages experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and RNA transcripts in
stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease. The mapping of such normal epigenomes is being undertaken by four Epigenomics Mapping Centers and supported by a Data Analysis and Coordination Center, which collectively coordinate experimental and analytical efforts to maximize consistency, data quality and overall coverage of the epigenomic landscape. Because the epigenomic landscape varies markedly across tissue types (and between individuals), there is no single ‘reference’ epigenome. Rather, the consortium expects to deliver a collection of normal epigenomes for different tissues and individuals, intended to provide a framework or reference for comparison and integration within a broad array of future studies. A core goal of the consortium is to close the gap between data generation and its public dissemination by rapid release of raw sequence data, profiles of epigenomic features and higher-level integrated maps, in coordination with the US National Center for Biotechnology Information (NCBI). The consortium is also committed to the development, standardization and dissemination of protocols, reagents and analytical tools to enable the research community to utilize, integrate and expand upon this body of data (Fig. 2).

Reference maps for major epigenomic features
The Epigenomics Mapping Centers have collaboratively established data collection pipelines to produce high-quality, comprehensive epigenomic maps. Specific data
types have been prioritized that offer broad insight into genome regulation, are generally applicable to diverse cell populations and can be evaluated comprehensively and accurately by high-throughput sequencing. These include genomic maps for DNA methylation, histone modifications, chromatin accessibility and RNA expression. The Mapping Centers work with the Data Analysis and Coordination Center to evaluate, compare and integrate the different data types and formats to ensure data quality and standards that enable the larger community to build upon these data. The first of these data types, DNA methylation, is assayed by sequencing DNA that has been treated with sodium bisulfite (BS-seq), or enriched by methylcytosine pulldown (methylated DNA immunoprecipitation (MeDIP)-seq) or methylation-sensitive restriction enzymes (MRE-seq). BS-seq, applied either to whole genomes or to reduced-representation samples, has been designated as a primary assay because it provides accurate and consistent nucleotide-resolution data. The consortium is implementing MeDIP-seq and MRE-seq on a more limited basis to benchmark and compare these widely applied approaches. A second type of data, histone modifications, is assayed by sequencing DNA enriched by chromatin immunoprecipitation with modification-specific histone antibodies (ChIP-seq). The consortium has implemented rigorous specificity tests that use arrays of differentially modified histone tail peptides to ensure antibody specificity. In addition, common cell sources are collectively profiled and compared, ensuring consistency between the different data-collection centers. Chromatin accessibility is assayed by sequencing DNase I cleavage sites in nuclear chromatin. These assays are performed at high sequencing depth to provide a global survey of accessible regions as well as high-resolution information regarding the protein occupancy of specific sequences24.
Finally, RNA expression is assayed by sequencing mRNAs or size-selected small RNA fractions to high depths. These expression data are intended to augment and illuminate the functional output of the epigenomic profiles. Given its mandate to deliver epigenomic maps for hundreds of different cell populations, the consortium must balance breadth of cell coverage with the depth to which different epigenomic features are investigated. High-value cell types, such as human embryonic stem cells (hESCs), will be subjected to deep exploration of a very broad range of histone modifications and comprehensive, single nucleotide–resolution analysis of
DNA methylation. Although it is not yet possible to specify a definitive set of features that represent a minimal epigenome, the consortium has initially identified DNA methylation, six major histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K9ac, H3K27me3 and H3K36me3), chromatin accessibility and RNA as essential features that will be assayed in most or all designated cell populations. This combination of deep and broad analysis is expected to maximize coverage of cellular diversity and disease-relevant human tissues, while ensuring that a broad range of epigenomic features is explored.

Prioritized cells and tissues

The consortium will investigate a diverse collection of cell and tissue models, including hESCs and adult stem cells and their differentiated progeny; induced pluripotent stem cells; and primary ex vivo human fetal and adult tissues. These cells and tissues were prioritized on the basis of broad scientific and biomedical interest, tractability, phenotypic diversity and under-representation in other collaborative projects. Because of their biomedical importance, hESCs and major lineage derivatives have been selected for intensive investigation. The resulting data will offer insight into the distributions, dynamics and inter-relationships among epigenomic features, and catalyze study of their functions in development, epigenetic control and genome regulation. The consortium will also target additional stem cell models, including mesenchymal and neural stem cells, and reprogrammed cells, as in vitro models of development with particular relevance to regenerative medicine. Broader coverage of human cellular diversity will be achieved through study of primary cells and tissues relevant to metabolic and cardiovascular disease, cancer, neuropsychiatric disease, aging and other leading health issues.
These will be acquired from primary sources and sorted or otherwise manipulated to obtain suitably homogeneous cell populations that will be directly channeled to data-collection pipelines. Prioritized cell types include sorted hematopoietic lineages, liver, muscle and adipose, as well as selected cell types from breast and neural tissues. In addition, fetal tissues will be analyzed for insight into epigenomic landscapes of early development. Maps for such primary ex vivo tissues are urgently needed because most of our current knowledge has come from either transformed cell lines or cultured cells, both of which experience marked nonphysiologic changes to their chromatin environment, including aberrant DNA hypermethylation and loss of heterochromatin integrity. Collectively, profiles for these diverse cell models should offer unprecedented insight into the breadth and dynamics of human epigenomes and provide a durable framework for future explorations of epigenomic changes associated with human disease.

Figure 2 Portal for the NIH Roadmap Epigenomics Mapping Consortium. A public portal (http://www.roadmapepigenomics.org/) provides general information about the consortium and its participants, along with links to experimental protocols, consortium data and interfaces for visualizing epigenomic maps.

Integration and dissemination of human epigenomes

The consortium aims to provide the scientific community ready access to a critical mass of high-quality epigenomic data for cells and tissues representative of normal human biology. These data will comprise multiple levels of information, from raw sequencing data and epigenomic profiles for an individual epigenomic feature in a single cell or tissue type, to integrated epigenomic maps that represent a composite of multiple epigenomic profiles for an individual cell type or, alternatively, that capture biological variation of such features across different cell types. The consortium will also develop and disseminate software tools and algorithms to facilitate use of these results by the community, for example through the ability to search for epigenomic signatures common across genes or loci, or to identify distinguishing features of cell lineages, developmental stages, cellular environments or derivation history. The latter may also be used to classify disease states or to identify aberrant epigenomic features or regulatory programs that underlie human pathology.
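As a rough illustration of the kind of signature search described above, the following Python sketch represents each cell type's integrated map as a mapping from locus to the set of histone marks called there, and returns the loci that carry a chosen combination of marks in every cell type supplied. The data layout and the function names are hypothetical, chosen only for illustration; they are not drawn from consortium software or data.

```python
from typing import Dict, FrozenSet, List

# Hypothetical layout: locus name -> set of histone marks called as present.
Profile = Dict[str, FrozenSet[str]]


def loci_with_signature(profile: Profile, signature: FrozenSet[str]) -> List[str]:
    """Return the loci whose mark set contains every mark in the signature."""
    return sorted(locus for locus, marks in profile.items() if signature <= marks)


def shared_across_cell_types(
    profiles: Dict[str, Profile], signature: FrozenSet[str]
) -> List[str]:
    """Return the loci that carry the signature in every cell type supplied."""
    hits = [set(loci_with_signature(p, signature)) for p in profiles.values()]
    # With no cell types there is nothing to intersect, so report no loci.
    return sorted(set.intersection(*hits)) if hits else []
```

A real tool would work on genomic intervals and quantitative signal rather than on named loci and present/absent calls; the set-based form is used here only to keep the idea visible.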
The primary web portal for the consortium (http://www.roadmapepigenomics.org/) offers detailed descriptions of the overall project, target cell and tissue types, and epigenomic assays used by the consortium. The portal links to companion sites managed by NCBI (http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/) and the Data Analysis and Coordination Center (http://www.epigenomeatlas.org/) that provide access to raw and processed consortium data along with tools for visualization, analysis and integration of epigenomic data.

Progress and challenges

The use of established technologies and approaches has enabled the consortium to rapidly initiate data production. Notable progress during the initial phase included production of comprehensive DNA methylomes for an hESC line (H1) and primary fibroblasts26, and generation of hundreds of data sets (for major histone modifications, targeted DNA methylation analysis, RNA expression and chromatin accessibility) representing dozens of cell types, including multiple stem cell lines and ex vivo adult and developing tissues. These data sets are now available for download and viewing at the web portals referenced above. Guided by other NIH genomics projects27, the consortium has adopted a data release policy under which users will have immediate access to the data
but are expected to abide by a moratorium on submission or presentation of works that incorporate these data for the 9 months following their release. Any effort of this scope inevitably faces challenges and obstacles. The chief issues have revolved around cell-type selection and acquisition, assay standardization and developing the infrastructures for integration and dissemination of epigenome-scale data sets.

Cell type selection and acquisition. A key ongoing challenge relates to the identification and prioritization of cells and tissues by the consortium. Ideally, models are selected on the basis of pervasive biological and medical importance. However, the decisions are confounded by issues of tractability. Many high-value primary tissues are available in limited quantities that push the detection boundaries of current technologies. In addition, isolating relatively homogeneous populations from certain complex tissues can involve extensive preparative steps that may themselves effect changes to the epigenome. Finally, our relatively crude understanding of inter-individual epigenomic variation leaves open the question of how many samples of a given tissue type must be analyzed to yield a representative map. These challenges highlight the importance of technology development, including effective procedures for isolating homogeneous cell populations, interrogating small samples and increasing the throughput of the assays.

Standardization of assays. The consortium is implementing the latest epigenomic technologies based on next-generation sequencing technology. Because these technologies continue to evolve and are inherently dependent on preparative steps, there is an ongoing need to benchmark and validate assays. In the case of histone modification assays, substantial resources must be committed to procurement and validation of high-quality antibody reagents, including confirmation of biochemical specificity and ChIP-seq efficacy.
In the case of DNA methylation, there is a need to benchmark and standardize different assay types, including BS-seq applied either to reduced representations of the genome or to the whole genome, as well as various enrichment methods in widespread use by the scientific community28.
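One simple way to picture such a benchmark is sketched below in Python, under stated assumptions: each assay is reduced to a per-site methylation estimate between 0 and 1, and two assays are compared on the sites they share by Pearson correlation and mean absolute difference. The site identifiers, the choice of these two summary statistics and the function names are illustrative assumptions, not the consortium's benchmarking procedure.

```python
from math import sqrt
from typing import Dict, Tuple


def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads reporting a methylated cytosine at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def compare_assays(a: Dict[str, float], b: Dict[str, float]) -> Tuple[int, float, float]:
    """Compare two assays on the sites they both cover.

    Returns (number of shared sites, Pearson r, mean absolute difference).
    """
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared sites")
    xs = [a[s] for s in shared]
    ys = [b[s] for s in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant estimates")
    pearson_r = sxy / sqrt(sxx * syy)
    mean_abs_diff = sum(abs(x - y) for x, y in zip(xs, ys)) / n
    return n, pearson_r, mean_abs_diff
```

Correlation alone can hide a systematic offset between assays, which is why the sketch also reports the mean absolute difference.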
Data integration and dissemination. Several challenges have emerged at the level of data handling and analysis. First, a clearer understanding of the underlying data sets in terms of sensitivity, specificity and precision is needed and is being pursued as a joint effort among the centers. Second, the sheer volume and complexity of consortium-generated data has pushed the limits of existing analytical and visualization tools. Thus, the development of a new generation of tools for integration, dissemination and interpretation of epigenomic data is vital to the overall success of the program.
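For readers less familiar with these terms, the Python sketch below computes them for a set of called regions against a trusted reference set. It assumes that 'precision' is meant in the classification sense (the fraction of calls that are correct), which the text does not specify; the region identifiers and the function name are hypothetical.

```python
from typing import NamedTuple, Set


class CallMetrics(NamedTuple):
    sensitivity: float  # true positives / all true regions
    specificity: float  # true negatives / all non-true regions
    precision: float    # true positives / all called regions


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate_calls(called: Set[str], truth: Set[str], universe: Set[str]) -> CallMetrics:
    """Score called regions against a reference set within a fixed universe."""
    tp = len(called & truth)               # called and truly present
    fp = len(called - truth)               # called but not in the reference
    fn = len(truth - called)               # in the reference but missed
    tn = len(universe - called - truth)    # neither called nor in the reference
    return CallMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        precision=_ratio(tp, tp + fp),
    )
```

In practice the difficult part is the reference set itself, since no assay provides a perfect standard for epigenomic features.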
Future and context

The long-term goal of epigenomics research is a fuller understanding of how global changes in diverse functional features superimposed on the human genome sequence contribute to cellular phenotypes in health and disease. This is a complex and ambitious undertaking, the realization of which will ultimately require systematic dissection and analysis of tissues, characterization of disease models and detailed exposition of regulatory mechanisms through model-organism studies. The efforts of the Roadmap Epigenomics Mapping Consortium to establish an expansive resource of epigenomic maps of normal cell and tissue phenotypes represent an important step in this direction. By catalyzing subsequent mechanistic studies of chromatin, DNA methylation and transcription, these efforts should provide a springboard for disease-focused studies, such as those currently being pursued under the parallel Roadmap program Epigenomics of Human Health and Disease. These Roadmap efforts will also be complemented by other major initiatives, such as the International Human Epigenome Consortium, which was established to accelerate and coordinate epigenomics research worldwide (see accompanying paper29). More broadly, the consortium aims to foster synergistic interactions with related collaborative projects, including the Encyclopedia of DNA Elements (ENCODE) Consortium18, the International HapMap Project and the 1000 Genomes Project30. The Epigenomics Mapping Consortium is distinguished from these efforts by the broad set of normal primary tissues and stem cell–derived developmental models that it will survey. As such, it will provide a highly complementary resource through which the in vivo state and behavior of DNA elements catalogued under ENCODE or implicated in studies of genome variation may be understood. Such information will be essential for appreciating the relevance of detected genomic elements and variants to normal development and human disease. In the coming years, the Roadmap Epigenomics Program and other complementary efforts should vastly improve understanding of the organization of the human epigenome and how it varies across tissues, individuals and disease states, information that may translate directly into the identification of aberrant epigenetic events that underlie susceptibility to specific diseases and environmental exposures.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

ACKNOWLEDGMENTS
We thank R. Waterland, C. Epstein, N. Shoresh and all consortium members, as well as the NIH Epigenomics Implementation Group, for discussions and feedback in the drafting of this document.

1. Altshuler, D., Daly, M.J. & Lander, E.S. Science 322, 881–888 (2008).
2. Bird, A. Nature 447, 396–398 (2007).
3. Feinberg, A.P. Nature 447, 433–440 (2007).
4. Jaenisch, R. & Bird, A. Nat. Genet. 33 Suppl, 245–254 (2003).
5. Jones, P.A. & Baylin, S.B. Cell 128, 683–692 (2007).
6. Kouzarides, T. Cell 128, 693–705 (2007).
7. Bernstein, B.E., Meissner, A. & Lander, E.S. Cell 128, 669–681 (2007).
8. Fraser, P. & Bickmore, W. Nature 447, 413–417 (2007).
9. Zaratiegui, M., Irvine, D.V. & Martienssen, R.A. Cell 128, 763–776 (2007).
10. Schwartz, Y.B. & Pirrotta, V. Nat. Rev. Genet. 8, 9–22 (2007).
11. Grewal, S.I. & Moazed, D. Science 301, 798–802 (2003).
12. Henikoff, S. Nat. Rev. Genet. 9, 15–26 (2008).
13. Jirtle, R.L. & Skinner, M.K. Nat. Rev. Genet. 8, 253–262 (2007).
14. Hess, J.L. Crit. Rev. Eukaryot. Gene Expr. 14, 235–254 (2004).
15. Hansen, R.S. et al. Proc. Natl. Acad. Sci. USA 96, 14412–14417 (1999).
16. Dalgliesh, G.L. et al. Nature 463, 360–363 (2010).
17. Batty, N., Malouf, G.G. & Issa, J.P. Cancer Lett. 280, 192–200 (2009).
18. Birney, E. et al. Nature 447, 799–816 (2007).
19. Heintzman, N.D. et al. Nature 459, 108–112 (2009).
20. Eckhardt, F. et al. Nat. Genet. 38, 1378–1385 (2006).
21. Meissner, A. et al. Nature 454, 766–770 (2008).
22. Cokus, S.J. et al. Nature 452, 215–219 (2008).
23. Barski, A. et al. Cell 129, 823–837 (2007).
24. Hesselberth, J.R. et al. Nat. Methods 6, 283–289 (2009).
25. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
26. Lister, R. et al. Nature 462, 315–322 (2009).
27. Toronto International Data Release Workshop Authors Nature 461, 168–170 (2009).
28. Suzuki, M.M. & Bird, A. Nat. Rev. Genet. 9, 465–476 (2008).
29. Satterlee, J. Nat. Biotechnol. 28, 1039–1044 (2010).
30. Frazer, K.A. et al. Nature 449, 851–861 (2007).
COMMENTARY

Epigenomics reveals a functional genome anatomy and a new approach to common disease
Andrew P Feinberg

Epigenomics provides the context for understanding the function of genome sequence, analogous to the functional anatomy of the human body provided by Vesalius a half-millennium ago. Much of the seemingly inconclusive genetic data related to common diseases could therefore become meaningful in an epigenomic context.
New Year’s Eve in 2014 will mark the five-hundredth anniversary of the birth of Andreas van Wesel, commonly known as Vesalius, author of De humani corporis fabrica1, a treatise almost as influential in its time as was On the Origin of Species over three centuries later. Vesalius pioneered the rigorous study of human anatomy and introduced experimental observation into medical education as a substitute for hearsay. The late Victor McKusick, who helped to create the Human Genome Project and mapped the first human autosomal gene, called gene mapping “neo-Vesalian”2, as it represented an anatomy of the genome, similar to Vesalius’ anatomy of the body, for finding genes. Vesalius was more than a mapper, though: he challenged the dogma of both Galen and Aristotle on the anatomy of blood circulation by using the arrangement of structures in the body to correctly deduce their functions. Similarly, the particular order of genes on chromosomes and the arrangement of the chromosomes themselves have only recently been found to be meaningful biologically, not just as a map. I suggest here that epigenomics (that is, the genome-scale study of epigenetics) has transformed genome science by showing that the organization of the genome is important for gene function, just as Vesalius showed that the organization of anatomic structures allowed the function of organs. Moreover, the combination of new epigenomic tools with conventional genetics, and a new mathematical language for their interface, may have as much impact on understanding of human disease as did Vesalius’ anatomy a half-millennium ago.

Andrew P. Feinberg is at the Center for Epigenetics and Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.

Epigenomics provides a functional anatomy of the genome

Epigenomics has helped to reveal several surprising large-scale functional relationships among genes themselves and the surrounding nongenic DNA, previously hinted at by the β-globin cluster. One is the generality of large (tens to thousands of kilobases) genomic regions regulating gene expression. Although the β-globin gene cluster had been studied for decades3 and progressive chromatin changes had been linked to globin gene switching during development4, the generality and size of multigene chromatin domains emerged only with large-scale epigenomic mapping. As increasing numbers of imprinted genes were found, it was discovered that they were organized in gene clusters, often with common regulatory elements, such as CCCTC binding factor (CTCF) binding sites5. With the advent of genome-scale mapping of histone modifications, many large regions of heterochromatin modifications have been found, such as specific modifications associated with the inactive X chromosome6. Moreover, large autosomal regions of heterochromatin modification across Hox gene clusters have been determined to be more highly conserved across species than the underlying DNA sequence and do not simply correspond to exonic boundaries7. Thus, epigenomic studies have revealed
that the functional genome is at least an order of magnitude greater in scope than what was suspected on the basis of the sequence alone. Epigenomics has provided the genome with the kind of functional anatomy that Vesalius gave gross anatomy five centuries ago. Another unexpected large-scale genomic relationship is frequent intra- and interchromosomal interactions mediated by chromatin proteins. These were discovered through chromatin-capture methods, described in detail elsewhere in this issue8, designed to preserve chromatin-mediated interactions over long distances. DNA loop structures, mediated by chromatin, highly dynamic and surprisingly common, are associated with function. For example, several interleukin genes in the 200-kilobase (kb) mouse TH2 cytokine locus, when transcriptionally active, are folded into numerous loops anchored by special AT-rich sequence-binding protein (SATB) at their bases9. Remarkably, trans interactions between chromosomes involve some of the same sequences that epigenetically regulate imprinted gene domains, such as the H19 differentially methylated region, and may act through transvection to regulate genes in trans10. A recent example of large-scale genomic organization mediated by chromatin is the link between long RNAs, heterochromatin modification and gene activity. At the ‘Biology of Genomes’ meeting held at Cold Spring Harbor, New York, USA on 11–15 May 2005, Tom Gingeras of Cold Spring Harbor Laboratory asked for a wager on the number of genes that will ultimately be agreed upon, arguing that the nearly 50% of the genome that may be untranslated RNA will be proved
[Figure 1 plot: approximate number of publications per year, 1985–2009, for selected-gene and genome-scale studies, plotted on separate scales.]
Figure 1 The rate of increase of genome-scale publications addressing cancer genetics has become greater than that of publications in the same area focused on selected genes. Whereas published genome-scale studies represent only about 2% of cancer epigenetics, the rate of increase over the past 5 years of cancer epigenomic studies is double that of conventional analyses based on selected genes. Numbers are approximate, from PubMed citation analysis; scales are different for gene-based and genome-based plots; 2010 data are extrapolated.
functional11. Growing evidence indicates that much of this RNA mediates chromatin structure. For example, antisense RNAs appear to establish heterochromatin in mammalian genes, independently of Dicer and the post-translational microRNA machinery12. These regions may span >100 kb12, affect multiple genes and involve Argonaute-family proteins13. An exciting recent discovery is the role of long intergenic noncoding RNAs (lincRNAs) in establishing heterochromatin. For example, HOTAIR is a lincRNA that retargets PRC2 over HOX domains, leading to marked changes in gene expression relevant to cancer progression14. Finally, large organized chromatin lysine (K) modifications (or LOCKs) have been shown to organize the genome into very large blocks (hundreds to thousands of kilobases), some of which are differentiation-specific in their location and extent and correspond to lamin-associated domains (LADs)15–17. These very large regions may provide a dynamic mechanism for functional organization of the genome and are altered in cancer15. Large-scale mapping studies offer additional clues that many such large-scale epigenetic networks profoundly influence cellular development and genome function. For example, CTCF, which mediates H19 imprinting, seems to play a general role in defining the boundaries of functional gene regions18. Likewise, target genes of Polycomb, a protein thought to be involved in stable gene silencing, may alternate between functionally active and silent states
over large gene regions19. That such networks have a general role in organizing the genome functionally is suggested by the identification of chromosome territories and the spatial proximity of gene-rich chromosomes20. Epigenomics may supersede single-gene epigenetic disease research Just as epigenomics provides a functional anatomy of the normal genome, genome-scale studies of epigenetic disease are helping us understand epigenetic pathology. And just as cancer was the vanguard for gene-specific disease epigenetics21, genome-scale epigenetic studies of disease have also focused first on cancer, revealing much more genetic pathology than was suggested by candidate-gene approaches. For example, methylation changes can affect large genomic regions in colorectal cancer22, and widespread methylation changes are even more striking outside of the usually examined CpG islands (i.e., in shores and gene bodies)23. Similarly, it came as a surprise to most when widespread alterations in histone acetylation and methylation were found to be ubiquitous in cancer24. Stem cells, the focus for a wide range of both basic and applied research on disease, have shown promiscuous methylation differences from somatic cells on a genome-wide scale, notably including differences at non-CpG sites25. Remarkably, the sites of differential methylation largely overlap, with strong statistical significance, across physiological states—the same sites appear, for example, in normal cells compared with cancer cells, in stem cells compared with differentiated cells
and in comparisons of tissues derived from different germ layers26. Thus, the language of epigenomic organization seems to be common for normal development and for disease, just as the language of anatomy is common for normal and abnormal physiology. Increasing appreciation of the importance of large-scale epigenetic control in regulating gene function has influenced how disease-based genomic studies are being organized. Although published genome-scale studies represent only about 2% of cancer epigenetics, the rate of increase over the past five years of cancer epigenomic studies is more than double that of conventional gene-based analyses of cancer (Fig. 1). The same relative increase in genome-scale studies also seems to apply in the nascent field of noncancer human disease epigenetics, such as epigenetics of cardiovascular, immunological and neuropsychiatric disease27,28. These differences are driven in part by the availability of new technology, of course, but also by the growing realization that variation in both DNA methylation and chromatin are widespread across the genome and may be organized into large genomic domains. Another important factor driving such ‘disease epigenomics’ is the relatively limited yield to date of conventional single-nucleotide polymorphism (SNP)–based genetic analysis in explaining most common human diseases. As has been widely described in both scientific29,30 and lay publications31, it was anticipated a decade ago that genetic analysis would be much more successful at attributing risk of disease to specific genetic markers. How is epigenomics transforming the search for genetic causes of common human diseases? Many have suggested that environmentally driven epigenetic variation may be an important contributing factor in disease risk, particularly as a surrogate for mutational change32–34 (Table 1). 
But researchers should also consider another dimension to this epigenetic argument for common disease, an aspect that has received comparatively less attention. Because the actual ‘genome anatomy’ target for disease is probably much larger than scientists previously realized—perhaps involving more than half of the genome—and because understanding of the normal function of this genome anatomy requires epigenomics, it is possible that much of what appears to be negative genetic-association data could become meaningful in an epigenomic context (Table 1). For example, most genome-wide association studies (GWAS) identify not genes, but nearby regions or intergenic deserts. Yet these same regions frequently harbor differentially methylated regions that discriminate tissue types or distinguish cancer
Table 1 How epigenomics is transforming the search for genetic causes of common human disease
Epigenome anatomy | Possible disease link | New approach to common disease search
Environmentally driven epigenetic variation | Epigenome changes in absence of sequence variant | Methylome arrays, capture bisulfite sequencing, chromatin immunoprecipitation with sequencing
Noncoding RNAs | Regulatory site or expression | RNA sequencing and methods above
Intra- and interchromosomal interactions | Key disease sequences unlinked to target genes | Chromatin network mapping
Coregulated gene clusters | Regulatory sequence distant from gene | Genome-scale methylation, chromatin mapping
Sequence-defined methylation | Sequence variants controlling epigenome | Linked GWAS and epigenome studies
New class of VMRs | Sequence variants controlling epigenomic variance | New statistics for reexamining and integrating GWAS
LOCKs and LADs | Domain disruption, anchoring proteins | Native chromatin whole-genome analysis
from normal cells. They are also the canonical regions for lincRNAs that help establish chromatin structure and normal gene function. Furthermore, gene deserts may promote trans associations of chromosomes in epigenetic regulation35. Another way in which disease-associated DNA sequence variants might affect disease risk is through their linkage to DNA sequences that regulate DNA methylation, chromatin modification or binding factors. Substantial association of SNPs with DNA methylation has already been found36,37. An additional possibility my group has proposed is that DNA sequence variants themselves might affect the stochastic or environmentally influenced variance in the epigenome. According to this model, individuals in a complex species would gain an evolutionary advantage by including alleles for increased epigenetic variation per se (i.e., genetic alleles that increase epigenetic variance without affecting the mean)38. This would be like an evolutionary ‘hedging one’s bet’ and would confer an advantage for genes in pathways whose environment changes epochally (e.g., in response to the abundance of food and water). Examining inbred mice from the same litter and living in the same cage, we identified hundreds of variably methylated regions (VMRs) that are highly enriched by functional annotation for key genes in development and embryonic pattern formation38. Thus, development itself, which is regulated by epigenetics, probably includes a great deal of stochasticity at the epigenetic level. Genetic variants that increase this developmental plasticity at specific targets may confer an evolutionary advantage but might be deleterious to some individuals after a recent epochal change in the environment, such as the recent Western diet38. Intriguingly, several VMRs have recently been linked to body mass index39. Finally, researchers are only beginning to understand the role of LOCKs and LADs in functional genome organization.
Their assessment in disease will require robust genome-scale approaches to native chromatin measurement and availability of clinical specimens permitting such analyses (Table 1).
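The distinction drawn above between a change in epigenetic variance and a change in the mean can be made concrete with a small Python sketch. It assumes methylation has been summarized as one level between 0 and 1 per region per individual, and it flags regions whose variance across individuals exceeds a chosen cutoff. The region names, the cutoff and the use of a plain variance threshold are illustrative assumptions; this is not the statistical method used to identify VMRs in the cited studies.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def summarize(levels: List[float]) -> Tuple[float, float]:
    """Return (mean, population variance) of methylation across individuals."""
    return mean(levels), pvariance(levels)


def variable_regions(
    levels_by_region: Dict[str, List[float]], min_variance: float
) -> List[str]:
    """Return regions whose across-individual variance reaches the cutoff."""
    return sorted(
        region
        for region, levels in levels_by_region.items()
        if pvariance(levels) >= min_variance
    )
```

Two regions can share a mean of 0.5 while one is constant across individuals and the other swings between 0.2 and 0.8; only the second would be flagged, which is the sense in which variance can differ while the mean is unchanged.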
Future technology development

What potential areas for future technology development will fuel growth in this area? Of course, as in non-epigenetic genome science, all roads lead to sequencing, including bisulfite genome-scale sequencing for DNA methylation. The rollout of inexpensive, comprehensive and high-throughput single-molecule sequencing has been slower than promised, and second-generation sequencing is still impractical for large-scale epidemiological studies involving thousands of patients, except for capture-based methods, such as padlock probes40. The dilemma in capture-based studies is that although they offer enormous advantages in throughput, single-base resolution and allele-specific data, they will not reveal regions of differential methylation where we do not already know to look, a problem that may be vast as epigenomics is applied to an ever increasing number of diseases. At the same time, high-throughput sequencing is relatively cheap now for examining chromatin modifications, but that is true only for studies working, for example, with modifications on a fairly small fraction of the genome purified by chromatin immunoprecipitation. For large regional changes, such as LOCKs, there are cost limitations similar to those for whole-genome bisulfite sequencing. An important advance will come from reagents, such as the arrays from Illumina (San Diego) and others, that are cheap and amenable to processing by typical university core laboratories. For example, a soon-to-be-released methylation chip from Illumina will provide ~450,000 targets, including all CpG islands and shores, as well as DNase-hypersensitive sites and other regions identified and curated for this purpose by a consortium of laboratories organized by Tom Hudson of McGill University in Montreal.
Although this reagent may not be next year’s or even this year’s most comprehensive tool, 450,000 targets isn’t bad—and such cooperative approaches open epigenomic research to any general laboratory, a very exciting development. Other exciting technological initiatives include epigenomic analysis of microdissected
samples or even single cells, and enrichment of small chromosomal fragments for biochemical analysis of chromatin41.

A new epigenetic epidemiology will need to be crafted. Research can no longer consider genetic variation in isolation when looking for disease relationships. Samples in ongoing and future large-scale cohorts must be preserved to allow analysis of DNA methylation and chromatin. But retrospectively, a great deal can be added to existing cohort studies, as DNA methylation is stable over decades. Much of the existing genetic data might be made clearer by supplementing those studies with epigenomic analysis. New cohort sampling should include standard sources, such as lymphocytes, but also, as much as possible, target tissues affected by the disease.

Additionally, we need to develop new statistical and epidemiological tools for disease epigenomics and for its synthesis with conventional genetic analysis. For example, unlike SNPs, epigenetic variation is inherently quantitative and thus does not lend itself to simple allele designation (for example, quantitative levels of DNA methylation or Polycomb complex members). The quantitative nature of epigenome variation can help explain complex traits with a smaller number of contributing loci, as they do not necessarily require as many of the additive signals originally proposed by R.A. Fisher42. Such an approach is being applied, for example, to the analysis of quantitative traits associated with VMRs39.

The apparent additional complexity that epigenomics brings to genetics may seem daunting. But I don’t think Vesalius would have been intimidated, and I know Victor would have been delighted.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

ACKNOWLEDGMENT
I thank E. Pujadas, K. Reddy and R. Ohlsson for comments on the manuscript. This work was supported by US National Institutes of Health grant 5R37CA054358.

1. Vesalius, A. De humani corporis fabrica libri septem (J. Oporini, Basel, Switzerland, 1543).
2. McKusick, V.A. J. Am. Med. Assoc. 286, 2289–2295 (2001).
3. Proudfoot, N.J., Shander, M.H., Manley, J.L., Gefter, M.L. & Maniatis, T. Science 209, 1329–1336 (1980).
4. Crossley, M. & Orkin, S.H. Curr. Opin. Genet. Dev. 3, 232–237 (1993).
5. Viville, S. & Surani, M.A. Bioessays 17, 835–838 (1995).
6. Boggs, B.A. et al. Nat. Genet. 30, 73–76 (2002).
7. Bernstein, B.E. et al. Cell 120, 169–181 (2005).
8. van Steensel, B. & Dekker, J. Nat. Biotechnol. 28, 1089–1095 (2010).
9. Cai, S., Lee, C.C. & Kohwi-Shigematsu, T. Nat. Genet. 38, 1278–1288 (2006).
10. Sandhu, K.S. et al. Genes Dev. 23, 2598–2603 (2009).
11. Kapranov, P., Willingham, A.T. & Gingeras, T.R. Nat. Rev. Genet. 8, 413–423 (2007).
12. Yu, W. et al. Nature 451, 202–206 (2008).
13. MacFarlane, L.A., Gu, Y., Casson, A.G. & Murphy, P.R. Mol. Endocrinol. 24, 800–812 (2010).
14. Gupta, R.A. et al. Nature 464, 1071–1076 (2010).
15. Wen, B., Wu, H., Shinkai, Y., Irizarry, R.A. & Feinberg, A.P. Nat. Genet. 41, 246–250 (2009).
16. Hawkins, R.D. et al. Cell Stem Cell 6, 479–491 (2010).
17. Peric-Hupkes, D. et al. Mol. Cell 38, 603–613 (2010).
18. Smith, S.T. et al. Dev. Biol. 328, 518–528 (2009).
19. Schwartz, Y.B. et al. PLoS Genet. 6, e1000805 (2010).
20. Lieberman-Aiden, E. et al. Science 326, 289–293 (2009).
21. Feinberg, A.P. & Vogelstein, B. Nature 301, 89–92 (1983).
22. Frigola, J. et al. Nat. Genet. 38, 540–549 (2006).
23. Irizarry, R.A. et al. Nat. Genet. 41, 178–186 (2009).
24. Fraga, M.F. et al. Nat. Genet. 37, 391–400 (2005).
25. Lister, R. et al. Nature 462, 315–322 (2009).
26. Doi, A. et al. Nat. Genet. 41, 1350–1353 (2009).
27. Satterlee, J., Schübeler, D. & Ng, H. Nat. Biotechnol. 28, 1039–1044 (2010).
28. Portela, A. & Esteller, M. Nat. Biotechnol. 28, 1057–1068 (2010).
29. Manolio, T.A. et al. Nature 461, 747–753 (2009).
30. Goldstein, D.B. N. Engl. J. Med. 360, 1696–1698 (2009).
31. Wade, N. A decade later, genetic map yields few new cures. New York Times (12 June 2010).
32. Bjornsson, H.T., Fallin, M.D. & Feinberg, A.P. Trends Genet. 20, 350–358 (2004).
33. Petronis, A., Paterson, A.D. & Kennedy, J.L. Schizophr. Bull. 25, 639–655 (1999).
34. Jiang, Y.H., Bressler, J. & Beaudet, A.L. Annu. Rev. Genomics Hum. Genet. 5, 479–510 (2004).
35. Gondor, A. & Ohlsson, R. Nature 461, 212–217 (2009).
36. Kerkel, K. et al. Nat. Genet. 40, 904–908 (2008).
37. Gibbs, J.R. et al. PLoS Genet. 6, e1000952 (2010).
38. Feinberg, A.P. & Irizarry, R.A. Proc. Natl. Acad. Sci. USA 107 Suppl 1, 1757–1764 (2010).
39. Feinberg, A.P. et al. Sci. Transl. Med. 2, 49ra67 (2010).
40. Deng, J. et al. Nat. Biotechnol. 27, 353–360 (2009).
41. Bernstein, B.E. et al. Nat. Biotechnol. 28, 1045–1048 (2010).
42. Barton, N.H., Briggs, D.E.G., Eisen, J.A., Goldstein, D.B. & Patel, N.H. Evolution (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, USA, 2007).
COMMENTARY
Putting epigenome comparison into practice
Aleksandar Milosavljevic
Comparative analysis of epigenomes offers new opportunities to understand cellular differentiation, mutation effects and disease processes. But the scale and heterogeneity of epigenetic data present numerous computational challenges.
Aleksandar Milosavljevic is at The NIH Epigenomics Roadmap Data Analysis and Coordination Center, Molecular and Human Genetics Department, Baylor College of Medicine, Houston, Texas, USA. e-mail: [email protected]

Many layers of epigenomic information are being mapped using methods based on high-throughput sequencing and microarrays, but thus far, integrative analysis of epigenomic data has been limited by the relatively few types of cells that have been assayed1–4. The most recent achievement in this area is computational inference of chromatin states5 defined by combinations of histone marks. New initiatives6, enabled by high-throughput sequencing–based assays, aim to systematically sample many diverse cell types. In addition, falling costs for DNA sequencing are making it feasible to conduct smaller-scale projects focused on specific diseases. This denser sampling of the space of epigenomic variation by large and small projects alike should provide unprecedented opportunities for discovery by comparative analysis of epigenomes.

Unlike DNA sequence, however, epigenomic data are not digital. Furthermore, epigenomes may be measured at several levels of resolution, from the 1-base-pair (bp) resolution of DNA methylation detected by whole-genome bisulfite sequencing to >100-bp-resolution maps of histone marks or of methylation measured via methylated DNA immunoprecipitation and high-throughput sequencing (MeDIP-seq)7. In addition, epigenomic signals may be spread throughout the genome and may not necessarily be associated with any specific genomic element.

Epigenomic information may vary between cell types, between individuals and even between cells of the same type in a population. It may also be influenced by many molecular processes, including transcriptional regulation, splicing, and DNA recombination, replication and repair8. Epigenomic diversity spans several timescales, ranging from short-term physiological processes, such as memory formation9 and cell differentiation10, to long-term processes, such as aging11 and evolutionary variation12. Epigenomic variation is also influenced by genetic, environmental, disease-associated and experimental perturbations.

The wide spectrum of biological processes involving epigenomic variation points to an opportunity for discovery by comparative epigenome analysis. Comparative analysis has been successfully applied to genomic DNA sequences and to perturbations of gene expression patterns13. As the sampling of epigenomic diversity improves, comparative analyses of epigenomes will provide increasing opportunities for discovery by identifying, at ever finer levels of detail, epigenomic changes that correlate with each other and with biologically significant variables. Here I describe two applications of comparative analysis of epigenomes and then consider the relevant computational and cyberinfrastructure challenges.

Comparing epigenomes to map cellular differentiation
Waddington’s epigenetic landscape concept14,15 suggests a bifurcating branching pattern of cellular differentiation. The now iconic picture of the landscape is a visual representation of cellular differentiation along specific trajectories in the abstract multi-dimensional space of molecular states within a cell. This totality of molecular states includes what we now refer to as the epigenome. Epigenomes from several related cell types might provide sufficient
information to infer the bifurcating branching patterns of the epigenetic landscape. Studies of differentiation mediated by the Polycomb-Trithorax system suggest that this will be possible. In embryonic stem cells, Polycomb-Trithorax regulates genes containing CpG islands in their promoters. Such genes reside in a ‘bivalent’ or ‘poised’ state, defined by the presence of both trimethylated lysine 4 on histone H3, an epigenetic mark associated with active genes, and trimethylated lysine 27 on histone H3 (H3K27me3), a mark associated with inactive genes4,16. Genes marked with this chromatin state may be activated or inactivated upon differentiation.

A recent study17 has identified extensive patterns of H3K27me3 shared by two pancreatic cell types, beta cells and acinar cells, which is consistent with their common developmental history. Specifically, this study found that the epigenomes of beta cells contain H3K27me3 marks characteristic of the endodermal lineage of the pancreatic cells, whereas the gene expression signature of beta cells largely resembles those of ectoderm-derived neural tissues. Additional results suggest that the neural expression program of beta cells is activated during late pancreatic cell differentiation by a small number of transcriptional regulators. This case shows that epigenomes provide information about cell lineages that may not be available at the level of gene expression.

One method to reconstruct the presumably bifurcating patterns of differentiation is the cladistic method18, which has been used to recover evolutionary branching patterns of speciation. Unlike purely numerical methods that use the totality of measurements of a single type, the cladistic method focuses on select evidence (‘characters’) from a diversity of sources relevant for the reconstruction of a tree pattern19.
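To make the idea of scoring a tree against discrete 'characters' concrete, the following is a minimal sketch using Fitch's small-parsimony count, one standard way of evaluating candidate trees in cladistic analysis. It is not taken from the studies cited here. The cell-type names, the binary characters (1 = mark present at a locus, 0 = absent) and the three candidate trees are invented for illustration only.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass for a single character.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its observed state
    Returns (set of possible states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one state change is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, characters):
    """Total number of state changes a tree requires across all characters."""
    return sum(fitch(tree, character)[1] for character in characters)


# Hypothetical characters, e.g. presence of a repressive mark at four loci.
characters = [
    {"ES": 0, "beta": 1, "acinar": 1, "neuron": 0},
    {"ES": 0, "beta": 1, "acinar": 1, "neuron": 0},
    {"ES": 0, "beta": 1, "acinar": 0, "neuron": 1},
    {"ES": 0, "beta": 0, "acinar": 0, "neuron": 1},
]

# Three candidate bifurcating trees, each rooted with "ES" as the outgroup.
candidate_trees = {
    "beta+acinar": ("ES", (("beta", "acinar"), "neuron")),
    "beta+neuron": ("ES", (("beta", "neuron"), "acinar")),
    "acinar+neuron": ("ES", (("acinar", "neuron"), "beta")),
}

scores = {name: parsimony_score(tree, characters)
          for name, tree in candidate_trees.items()}
best = min(scores, key=scores.get)
```

With these toy characters the tree that groups the two pancreatic cell types requires the fewest changes. Real analyses would need many more characters and a principled way of choosing them, which is the hard part of the approach.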
By focusing on the bifurcating tree as the underlying structure, the cladistic method succeeded in integrating evidence from paleontological and molecular data18. By analogy, in the case of cell differentiation, the method holds promise for integrating data obtained by direct measurements on partially differentiated cell types and from reconstructions based on fully differentiated ones.

Comparing epigenomes to understand genetic variation
A comparison of two epigenomes may reveal differences that are due to the variation in the underlying genomic sequence. This may be accomplished by identifying differences between the epigenomes that coincide with changes in genomic sequences in the same locus. The effects of genetic variation on the epigenome are just beginning to be comprehended20, with the exception of a few relatively well-understood genomic loci where variants cause human diseases. In some cases, such as in Rett syndrome, where the methyl-CpG–binding protein MeCP2 is mutated, genetic mutation acts in trans, in that mutation at a single locus alters genome-wide patterns of epigenome maintenance. Alternatively, genetic variants may act in cis to alter local patterns of epigenomic marks, as shown, for example, by a recent high-resolution genome-wide comparison of DNA methylation and single-nucleotide polymorphisms (SNPs) in humans21. This study found that allele-specific skewing of methylation levels occurs at >35,000 sites across the genome, suggesting that sequence variants have pervasive effects on the epigenome. Moreover, genetic mutations are known to affect local epigenetic marks in diseases such as fragile X and facioscapulohumeral muscular dystrophy.

The frequency with which sequence variants cause phenotypically significant changes in the epigenome is an open question. A plan has been proposed22 to use patterns of allele-specific epigenomic marks to identify SNPs of functional significance within critical regions detected by genome-wide association studies.

Epigenome comparisons may also help identify functional consequences of structural variants. Cahan et al.23 recently provided indirect evidence that the effects of copy-number variants on the epigenome may be widespread. The study reports that the effects of copy-number variation on gene expression are not limited to the genes within copy-altered loci, as had commonly been assumed. In fact, most of the affected genes reside far from the structural change, leading the authors to hypothesize that the effects of structural variants may be mediated by local changes in chromatin structure. Epigenome comparisons are likely to be useful in testing this hypothesis.

[Figure 1 diagram labels: Project 1, Project 2, Project 3; data levels 0–3; data processing cluster; comparison cluster; database cluster; human epigenome atlas; comparison. Credit: Katie Vicari.]
Figure 1 A proposed cyberinfrastructure for epigenome analysis and comparison. The cyberinfrastructure would connect users and resources that are geographically distributed over the network. A clinical researcher conducting a study of disease-related epigenomic perturbations would rely almost completely on remote resources distributed over the web for primary processing of the data (data levels 0–3) and comparative analysis using a human epigenome atlas.
Computational and engineering challenges ahead
Comparing epigenomes to each other and to other types of data is challenging because the resolution of epigenomic signals is assay dependent and may not match the resolution of the other data sets. For example, assays of DNA methylation based on bisulfite sequencing yield data at nucleotide resolution, whereas MeDIP assays offer hundred-base-pair resolution7. There are a number of different solutions to this problem. One is to average signals over fixed-size ‘windows’ across the genome or over features such as exons, introns or enhancer elements. An alternative is to parse epigenomic signals into discrete peaks. This is suitable for punctate peaks, such as trimethylation of lysine 4 on histone H3, but not for the broad peaks associated with many other signals, such as trimethylation of lysine 36 on histone H3. There will probably be numerous ways in which the genome-wide signals are transformed into numerical data for epigenome comparison, with each transformation being appropriate for specific purposes.

Epigenomes may be compared by searching for similarity or by detecting differences. Searches for similarity among epigenomes may borrow from methods developed for whole-genome comparison. In particular, comparing epigenomes may require a combination of global and local ‘alignment’ methods. Unlike genomic sequence, however, which provides a convenient concept of ‘locality’ in the one-dimensional base-pair coordinate system, comparing epigenomes may require sets of noncontiguous loci to be analyzed together to accommodate our knowledge of the three-dimensional organization of chromosomes in the nucleus or our knowledge of thousands of loci spread throughout the genome that are co-regulated by master regulators of development. Such sets may be created by grouping genomic regions containing binding sites of specific master regulators, genes related to a particular differentiation pathway or gene elements such as promoters.
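The fixed-size window strategy mentioned above can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any published pipeline. The signal values, the 4-bp window (real windows would typically span hundreds of base pairs) and the function names are all hypothetical. A base-resolution signal is averaged into windows so that it can be compared, here by Pearson correlation, with a signal that exists only at window resolution.

```python
from statistics import mean


def window_means(signal, window):
    """Average a per-base signal over consecutive fixed-size windows."""
    return [mean(signal[i:i + window]) for i in range(0, len(signal), window)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical base-resolution methylation calls (1 = methylated) for 12 positions.
base_resolution = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1]

# Hypothetical coarse signal for the same region, one value per 4-bp window.
window_resolution = [0.8, 0.2, 0.9]

binned = window_means(base_resolution, window=4)
similarity = pearson(binned, window_resolution)
```

Averaging discards detail below the window size, so the choice of window is itself one of the purpose-specific transformations the text refers to.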
Interpreting specific differences between two epigenomes will depend on our understanding of the background variation in the signal. In analogy to DNA sequence comparisons, we need to understand which epigenomic marks are conserved at a specific locus and which are under looser constraint in the same locus. Of course, the immediate problem is that we currently do not have much knowledge about the conservation of epigenomic marks across genomic loci.

Gradual accumulation of data will solve this problem but probably not in a definitive way, because variation is not only locus dependent but may also be highly context dependent. For example, variation during development in one cell lineage may have a different meaning than variation in a different lineage or variation due to aging. Consequently, observed epigenomic differences will be open to context-dependent reinterpretation as more data accumulate.

The comparative interpretation of epigenomic signals will also pose several technical and engineering challenges that are often grouped under the term ‘cyberinfrastructure’. These challenges include the standards, resources and tools for computer-aided discovery, data sharing and
Table 1 Key concepts for epigenomics research cyberinfrastructure

Requirement | Concept | Description and examples of relevance for epigenomics
Data reuse and integration | Data level(a) | Data level 0 refers to DNA sequence reads, typically in short read format (SRF) or fastq format. Data level 1 refers to reads mapped to a reference assembly, typically in sequence alignment/map (SAM), binary equivalent of SAM (BAM) or browser-extensible data (BED) formats. Level 1 data can be used to identify both genomic and epigenomic variation. These data also include the unmapped (repetitive) fraction of reads. Data level 2 refers to ‘raw epigenomic signal’ such as read density plots, CpG methylation counts28 or other statistics, frequently in the bigWig UCSC Genome Browser format29. Data level 3 refers to typically discrete data such as chromatin immunoprecipitation with sequencing (ChIP-seq) peak calls or hidden Markov model segmentations of the genome into chromatin states. These data are obtained by analyzing individual or multiple marks from a single sample. Depending on data volume, they are stored either in high-density or in simple tab-delimited (GFF, LFF) formats. Data level 4 refers to results of epigenome comparisons. Syntax and semantics for this data level are still under development.
Data reuse and integration | Syntax | Data formats to meet the often conflicting requirements of storage efficiency for high-volume data (bigWig), simplicity (tab-delimited) and machine readability (JavaScript Object Notation, or JSON; Extensible Markup Language, or XML).
Data reuse and integration | Semantics | Theory of meaning. This term is commonly used in connection with controlled vocabularies and ontologies, such as the widely used Gene Ontologies and other ontologies produced by the Open Biomedical Ontologies Foundry and other projects.
Data reuse and integration | Semantic Web (Web 3.0) | Set of technologies developed by the World Wide Web Consortium, including Resource Description Framework for knowledge representation, that allows programmatic communication and automated reasoning about information shared across the web.
Data reuse and integration | Metadata | Data about data, a key requirement for data reuse. Various minimal standards have been recommended by groups such as the Minimum Information for Biological and Biomedical Investigations project. In coordination with the European Bioinformatics Institute and the DNA Database of Japan, and guided by feedback from the NIH Epigenomics Roadmap initiative and other users, NCBI has now developed version 1.2 of a Sequence Read Archive (SRA)-XML metadata format for assays with sequencing readouts. Shared metadata formats will be essential for successful coordination of international epigenome projects.
Tool integration | Pipeline | A set of analysis tools that are invoked sequentially to perform a data analysis task. Galaxy30 is a software suite with an interactive interface and an online service for pipeline design. One example is integration of the EpiGRAPH software for epigenome analysis using Galaxy31 to identify epigenomic modifications that characterize highly polymorphic (SNP-rich) promoters.
Tool integration | Workflow | A formal, portable, programmatically executable description of a data analysis process. May be used as metadata to document and ensure reproducibility of data analysis. Projects developing workflow systems include Galaxy, GenePattern and Taverna.
Tool integration | Workbench | An environment for integration of data analysis and visualization tools and data sets (for example, CLC Genomics Workbench and Genboree Workbench).
Web services and programmatic interoperability | URI and URL | The address system of the Web, used to uniquely identify objects, such as web pages and epigenome maps, for access by web browsers and other computer programs via Hypertext Transfer Protocol (HTTP) and other protocols.
Web services and programmatic interoperability | REST API | Representational State Transfer Application Programming Interface. A programming interface, typically implemented using HTTP, that is developed using a set of design principles to ensure efficient communication of computer programs over the web. Provides access to data and computing resources over the web using scripts written in a programming language such as Perl, Python, Ruby or JavaScript.
Access to computing resources and services | Cloud computing | Access to scalable, on-demand computing and storage services over the web.
Access to computing resources and services | Software as a service | Access to software applications over the web, such as those for epigenomic data processing and comparison (Fig. 1). This is a key aspect of Web 2.0 (see below).
Access to computing resources and services | Authentication protocol | Protocol (for example, OpenID) allowing users or computer programs acting as their agents to be recognized by multiple web servers.
Collaboration and publication | Web 2.0 | Web hosting of collaborative processes such as grant review at the NIH or epigenomic data processing and comparison (Fig. 1).
Collaboration and publication | Databases, knowledge bases and archival repositories | Examples include NCBI Gene Expression Omnibus and SRA archives, Ensembl, UCSC Genome Browser and more specialized resources such as the human epigenome atlas (Fig. 1).

(a) This abstraction captures commonalities and facilitates development of data formats and tools for a diversity of genomic and epigenomic assays. Examples in the table focus on assays with sequencing readouts.
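The data levels in Table 1 lend themselves to a small lookup structure. The sketch below is only an illustration of how the abstraction might be encoded. The level names (READS, MAPPED_READS and so on) and the helper function are assumptions introduced here, whereas the format lists are the ones named in the table.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Data levels 0-4 as described in Table 1 (member names are illustrative)."""
    READS = 0            # DNA sequence reads
    MAPPED_READS = 1     # reads mapped to a reference assembly
    RAW_SIGNAL = 2       # read density, CpG methylation counts, other statistics
    DISCRETE_CALLS = 3   # peak calls, chromatin-state segmentations
    COMPARISON = 4       # results of epigenome comparisons


# Typical formats per level, taken from the table; level 4 is listed as
# still under development, so no format is given for it.
TYPICAL_FORMATS = {
    DataLevel.READS: ("SRF", "fastq"),
    DataLevel.MAPPED_READS: ("SAM", "BAM", "BED"),
    DataLevel.RAW_SIGNAL: ("bigWig",),
    DataLevel.DISCRETE_CALLS: ("GFF", "LFF"),
    DataLevel.COMPARISON: (),
}


def levels_for_format(fmt):
    """Return the data levels whose typical formats include fmt (case-insensitive)."""
    wanted = fmt.lower()
    return [level for level, formats in TYPICAL_FORMATS.items()
            if wanted in (name.lower() for name in formats)]
```

A lookup such as `levels_for_format("BAM")` then identifies the file as level 1 data, which is the kind of check a shared pipeline could run before accepting an upload.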
collaboration over the web. The problems of high-volume data capture, visualization, interpretation and reuse are currently recognized as key limiting factors across scientific disciplines24. Table 1 lists infrastructure requirements and concepts relevant (but not necessarily specific) to epigenome research. A few of these are described in detail below.

Data reuse. One practical cyberinfrastructure challenge for epigenomics research is to enable effective data exchange and reuse. The first step in this direction is to develop a unifying framework for the multiple layers of heterogeneous information generated by sequencing- and array-based assays. Data standards are emerging from the coordination between the Cancer Genome Atlas, the 1000 Genomes Project, the Encyclopedia of DNA Elements and the US National Institutes of Health (NIH) Epigenomics Roadmap (see ‘data levels’ in Table 1). The abstract data levels codify commonalities across the diversity of assays and technologies used to obtain data. As the diversity of derived data and knowledge increases, advanced methods for knowledge representation and exchange, such as the Resource Description Framework derived in the context of the Semantic Web, will need to be applied25.

Metadata standards. Metadata is a key requirement for reuse of epigenomic data in the public domain for comparative analyses because it provides the biological and experimental context in which the data were generated. One example is the Sequence Read Archive XML schema developed by the US National Center for Biotechnology Information (NCBI) and adapted by the NIH Epigenomics Roadmap initiative for epigenomic data.

Reproducibility. Another practical challenge is to ensure reproducibility of reported analysis results26. This problem may be tackled by encapsulating all aspects of computational analyses in the form of workflow descriptions and distributing them as metadata with analysis results.

Data storage and computing power. Epigenome comparisons and higher-level interpretations will require substantial computational resources. The use of multiple data
and computing resources that are geographically distributed over the web, and of ‘cloud computing’ (using shared remote computer hardware) and programming frameworks such as the Genome Analysis Toolkit27, may be helpful.

A human epigenome atlas. How will cyberinfrastructure be used to facilitate epigenome comparison? Figure 1 illustrates a hypothetical scheme that includes several projects and could involve clinical researchers using web-based services to process epigenomic data and perform comparative analyses. This model, known as ‘software as a service’, is appealing because fewer local resources would be required. Such an arrangement would be particularly important for adoption of epigenomics in the context of translational and disease-focused studies, where local bioinformatics resources and expertise may be limited.

Many projects could use cloud computing and well-tested pipelines with built-in quality-characterization steps that take in sequencing data (data level 0) as it is delivered from sequencers and generate epigenomic signals at the level of individual samples (data levels 1–3). These signals would be compared against a human epigenome atlas, which would serve as a reference data set much like the reference human genome. Other types of visualization and analysis are possible. Upon publication, raw data and the results of analyses would be archived and incorporated into the human epigenome atlas and other specialized repositories.

One open issue is how best to involve the research community in the continued development and maintenance of repositories such as a human epigenome atlas. To stimulate the contribution of smaller projects to these data and knowledge commons, the NIH Epigenomics Roadmap Consortium is collaborating with the NCBI to develop standards for epigenomic metadata and define reference pipelines for uniform processing and characterization of the quality of a variety of epigenomic assays.
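The suggestion made under Reproducibility, that workflow descriptions travel as metadata with analysis results, can be illustrated with a short sketch. Everything named here is hypothetical: the step names, parameters, file name and record layout are assumptions for illustration and do not correspond to Galaxy, GenePattern, Taverna or any NCBI schema. The sketch serializes a level 0 to level 3 pipeline description deterministically and attaches its digest to a result record, so that two results can be checked for having come from the same workflow.

```python
import hashlib
import json


def describe_workflow(steps):
    """Serialize workflow steps deterministically; return (json_text, sha256_hex)."""
    text = json.dumps({"steps": steps}, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return text, digest


# Hypothetical pipeline moving one sample from data level 0 to data level 3.
steps = [
    {"name": "map_reads", "input_level": 0, "output_level": 1,
     "params": {"reference": "hypothetical_assembly"}},
    {"name": "read_density", "input_level": 1, "output_level": 2,
     "params": {"window_bp": 200}},
    {"name": "call_peaks", "input_level": 2, "output_level": 3,
     "params": {}},
]

workflow_text, workflow_digest = describe_workflow(steps)

# Hypothetical result record: the digest ties the output to the exact workflow.
result_record = {
    "result_file": "sample.peaks.gff",
    "workflow_sha256": workflow_digest,
}
```

Sorting keys before hashing means that the digest depends only on the content of the description, not on the order in which a tool happened to write its fields.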
Conclusions
In summary, comparative analysis of epigenomes is likely to provide many novel insights. Mapping the bifurcating tree of cellular differentiation should be useful for understanding development. Precise and comprehensive mapping of epigenomic perturbations should reveal consequences of genomic mutations and environmental influences on human development and disease. To achieve these goals, we must develop conceptual and computational approaches that address the heterogeneity and context dependence of epigenetic data. In addition, discovery would be aided by the building of a cyberinfrastructure that includes shared repositories and knowledge bases able to accommodate the unprecedented volume of data and diversity of applications.

COMPETING FINANCIAL INTERESTS
The author declares competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/

1. Birney, E. et al. Nature 447, 799–816 (2007).
2. Barski, A. et al. Cell 129, 823–837 (2007).
3. Heintzman, N.D. et al. Nat. Genet. 39, 311–318 (2007).
4. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
5. Ernst, J. & Kellis, M. Nat. Biotechnol. 28, 817–825 (2010).
6. Bernstein, B.E. et al. Nat. Biotechnol. 28, 1045–1048 (2010).
7. Harris, R.A. et al. Nat. Biotechnol. 28, 1097–1105 (2010).
8. Kouzarides, T. Cell 128, 693–705 (2007).
9. Levenson, J.M. et al. J. Biol. Chem. 281, 15763–15773 (2006).
10. Reik, W. Nature 447, 425–432 (2007).
11. Rakyan, V.K. et al. Genome Res. 20, 434–439 (2010).
12. Bernstein, B.E. et al. Cell 120, 169–181 (2005).
13. Lamb, J. et al. Science 313, 1929–1935 (2006).
14. Slack, J.M. Nat. Rev. Genet. 3, 889–895 (2002).
15. Waddington, C.H. The Strategy of the Genes: A Discussion of Some Aspects of Theoretical Biology (Allen & Unwin, London, 1957).
16. Bernstein, B.E. et al. Cell 128, 669–681 (2007).
17. van Arensbergen, J. et al. Genome Res. 20, 722–732 (2010).
18. Ridley, M. Evolution and Classification: the Reformation of Cladism (Longman, London UK, 1989).
19. Hennig, W. Phylogenetic Systematics (University of Illinois Press, Urbana, Illinois, 1966).
20. Meaburn, E.L. et al. Epigenetics 5, 578–582 (2010).
21. Schalkwyk, L.C. et al. Am. J. Hum. Genet. 86, 196–212 (2010).
22. Tycko, B. Am. J. Hum. Genet. 86, 109–112 (2010).
23. Cahan, P. et al. Nat. Genet. 41, 430–437 (2009).
24. Hey, T., Tansley, S. & Tolle, K. (eds). The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft Research, Seattle, 2009).
25. Wang, X. et al. Nat. Biotechnol. 23, 1099–1103 (2005).
26. Mesirov, J.P. Science 327, 415–416 (2010).
27. McKenna, A. et al. Genome Res. 20, 1297–1303 (2010).
28. Xi, Y. & Li, W. BMC Bioinformatics 10, 232 (2009).
29. Rosenbloom, K.R. et al. Nucleic Acids Res. 38, D620–D625 (2010).
30. Goecks, J. et al. Genome Biol. 11, R86 (2010).
31. Bock, C. et al. Methods Mol. Biol. 628, 275–296 (2010).
REVIEW
Epigenetic modifications and human disease
Anna Portela1 & Manel Esteller1,2

1Cancer Epigenetics and Biology Program (PEBC), Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Catalonia, Spain. 2Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain. Correspondence should be addressed to M.E. ([email protected]). Published online 13 October 2010; doi:10.1038/nbt.1685

Epigenetics is one of the most rapidly expanding fields in biology. The recent characterization of a human DNA methylome at single nucleotide resolution, the discovery of the CpG island shores, the finding of new histone variants and modifications, and the unveiling of genome-wide nucleosome positioning maps highlight the accelerating speed of discovery over the past two years. Increasing interest in epigenetics has been accompanied by technological breakthroughs that now make it possible to undertake large-scale epigenomic studies. These allow the mapping of epigenetic marks, such as DNA methylation, histone modifications and nucleosome positioning, which are critical for regulating gene and noncoding RNA expression. In turn, we are learning how aberrant placement of these epigenetic marks and mutations in the epigenetic machinery is involved in disease. Thus, a comprehensive understanding of epigenetic mechanisms, their interactions and alterations in health and disease, has become a priority in biomedical research.

Even before DNA was identified as the molecule of inheritance, scientists knew that not every gene in an organism can be active in each cell at all times. Even so, all cells in an organism share the same genetic information. Conrad Waddington coined the term ‘epigenetic landscape’1,2 for the molecular mechanisms that convert this genetic information into observable traits or phenotypes. In many instances, epigenetic gene expression patterns and associated phenotypes persist through mitosis or even meiosis, although no change in the primary DNA sequence has occurred. Consequently, epigenetics is generally understood to be the study of mechanisms that control gene expression in a potentially heritable way.

Recent breakthroughs in the understanding of the mechanisms underlying epigenetic phenomena and their prevalence as contributors to the development of human disease have led to a greatly enhanced interest in epigenetic research. On a molecular level, covalent modifications of cytosine bases and histones, and changes in the positioning of nucleosomes are commonly regarded as the driving epigenetic mechanisms. They are fundamental to the regulation of many cellular processes, including gene and microRNA expression, DNA-protein interactions, suppression of transposable element mobility, cellular differentiation, embryogenesis, X-chromosome inactivation and genomic imprinting.

In multicellular organisms, the ability of epigenetic marks to persist during development and potentially be transmitted to offspring may be necessary for generating the large range of different phenotypes that arise from the same genotype1,3–5. For instance, cloned animals generated from the same donor DNA are not identical to, and develop diseases with different penetrance from, their donor1,3. Human clones that arise spontaneously (monozygotic twins) are identical at the DNA sequence level, but have different DNA methylation4,5 and histone modification profiles4 that might affect the penetrance of several diseases, such as cancer4 or autoimmune disorders6. But this phenomenon is also observed at a single cell level: how can stem cells develop into any type of cell and how does a liver cell always give rise to two new liver cells after cell division? Again, epigenetics seems to be part of the answer as it has been described as one of the key factors in cellular differentiation7,8 (see the review by Meissner9 in this issue).

The importance of epigenetics in maintaining normal development and biology is reflected by the observation that many diseases develop when the wrong type of epigenetic marks are introduced or are added at the wrong time or at the wrong place10. For instance, a clear causal role for DNA methylation in cancer is suggested by hypermethylation of some genes (e.g., p16INK4a, p14ARF and MGMT) as an early event in tumorigenesis, as well as by tumor type-specific methylation landscape11. Here we summarize recent progress in the field of epigenetic research and its role in disease, preparing ourselves for the surprises that epigenetics might hold in the future.

Epigenetic modifications and their machineries
For didactic purposes, epigenetic modifications can be grouped into three main categories: DNA methylation, histone modifications and nucleosome positioning. It is important to keep in mind the interplay between epigenetic factors (as the observed outcome is always the sum of their interactions) and the many positive and negative feedback mechanisms.

DNA methylation. The most widely studied epigenetic modification in humans is cytosine methylation.
DNA methylation occurs almost exclusively in the context of CpG dinucleotides. The CpG dinucleotides tend to cluster in regions called CpG islands1, defined as regions of more than 200 bases with a G+C content of at least 50% and a ratio of observed to statistically expected CpG frequencies of at least 0.6. CpG dinucleotides are usually quite rare in mammalian genomes (~1%). About 60% of human gene promoters are associated with CpG islands and are usually unmethylated in normal cells, although some of them (~6%) become methylated in a tissue-specific manner during early development or in differentiated tissues12 (Fig. 1a). In general, CpG-island methylation is associated with gene silencing. DNA methylation plays a key role in genomic imprinting, where hypermethylation at one of the two parental alleles leads to monoallelic expression13. A similar gene-dosage reduction is observed in X-chromosome inactivation in females14. DNA methylation can inhibit gene expression by various mechanisms. Methylated DNA can promote the recruitment of methyl-CpG-binding domain (MBD) proteins. MBD family members in turn recruit histone-modifying and chromatin-remodeling complexes to methylated sites15,16. DNA methylation can also directly inhibit transcription by precluding the recruitment of DNA-binding proteins to their target sites17. In contrast, unmethylated CpG islands generate a chromatin structure favorable for gene expression by recruiting Cfp1, which associates with histone methyltransferase Setd1, creating domains rich in the histone methylation mark H3K4 trimethylation (H3K4me3; see below)18.
[Figure 1 panels, normal state versus altered state: (a) unmethylated versus methylated CpG island; (b) unmethylated versus methylated CpG island shore (~2 kb from the island); (c) methylated versus unmethylated gene body; (d) methylated versus unmethylated repetitive sequence (transposition, recombination, genome instability).]
Figure 1 DNA methylation patterns. DNA methylation can occur in different regions of the genome. The alteration of these patterns leads to disease in the cells. The normal scenario is depicted in the left column and alterations of this pattern are shown on the right. (a) CpG islands at promoters of genes are normally unmethylated, allowing transcription. Aberrant hypermethylation leads to transcriptional inactivation. (b) The same pattern is observed when studying island shores, which are located up to 2 kb upstream of the CpG island. (c) However, when methylation occurs at the gene body, it facilitates transcription, preventing spurious transcription initiations. In disease, the gene body tends to demethylate, allowing transcription to be initiated at several incorrect sites. (d) Finally, repetitive sequences appear to be hypermethylated, preventing chromosomal instability, translocations and gene disruption through the reactivation of endoparasitic sequences. This pattern is also altered in disease.
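The CpG island criteria quoted above (length above 200 bases, G+C content of at least 50%, observed-to-expected CpG ratio of at least 0.6) can be checked on a candidate sequence with a few lines of Python. This is a minimal sketch, not a published island caller: the function names are hypothetical, and the expected CpG count is taken as (number of C × number of G) / length, a commonly used convention that the text itself does not spell out.

```python
def cpg_stats(seq: str) -> tuple[int, float, float]:
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")  # CpG dinucleotides on this strand
    # Assumed convention: expected CpG count = (#C * #G) / length.
    expected = (c * g) / n if n else 0.0
    gc_fraction = (c + g) / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return n, gc_fraction, ratio


def meets_cpg_island_criteria(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three thresholds given in the text to one candidate region."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n > min_len and gc_fraction >= min_gc and ratio >= min_ratio


if __name__ == "__main__":
    cpg_rich = "CG" * 150             # 300 bases made entirely of CpG dinucleotides
    cpg_poor = "C" * 150 + "G" * 150  # same base composition, a single CpG
    print(meets_cpg_island_criteria(cpg_rich))  # True
    print(meets_cpg_island_criteria(cpg_poor))  # False
```

The two toy sequences have identical G+C content and differ only in how often C is followed by G, which is exactly what the observed-to-expected ratio is meant to capture. A genome-wide scan would additionally need a windowing strategy, which is outside the scope of this sketch.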
DNA methylation does not occur exclusively at CpG islands. The term CpG island shores, referring to regions of lower CpG density that lie in close proximity (~2 kb) to CpG islands, has recently been coined. The methylation of these CpG island shores is closely associated with transcriptional inactivation (Fig. 1b). Most of the tissue-specific DNA methylation seems to occur not at CpG islands but at CpG island shores19,20. Differentially methylated CpG island shores are sufficient to distinguish between specific tissues and are conserved between human and mouse. Moreover, 70% of the differentially methylated regions in reprogramming are associated with CpG island shores20,21. DNA methylation is less frequently coupled with transcriptional activation, as when, for instance, it occurs at gene bodies (Fig. 1c). Gene body methylation is common in ubiquitously expressed genes and is positively correlated with gene expression22. It has been proposed that it might be related to elongation efficiency and prevention of spurious initiations of transcription23. DNA methylation and DNA methylation–associated proteins not only participate in gene transcription regulation in cis, but also act in trans, being involved in nuclear organization and in the establishment of specific chromosomal territories. An imprinted region can physically interact with sequences distant in the primary sequence or on different chromosomes. These physical interactions in trans can regulate transcription, as shown for the H19 imprinting control region and the Osbpl1a/Impact loci24. Other examples of epigenetic players that cause three-dimensional (3D) rearrangements of the genome to regulate gene expression are the DNA methylation enzyme DNA methyltransferase 1 (DNMT1), which participates in the maintenance of the nucleolar compartment architecture25, and the methyl-CpG-binding domain (MBD) protein MeCP2, which is required for the formation of a silent chromatin loop at the Dlx5-Dlx6 locus26 (the 3D organization of the genome is discussed in more detail in the review by van Steensel and Dekker27 in this issue).
[Figure 2 panels: (a) DNA methylation: de novo DNMTs (DNMT3A, DNMT3B) and maintenance DNMT (DNMT1); (b) histone modifications: methylation (HMT, HDM), acetylation (HAT, HDAC) and phosphorylation (kinases); (c) chromatin remodeling: SWI/SNF, ISWI, Mi-2 and INO80.]
Figure 2 Epigenetic machinery and interplay among epigenetic factors. Epigenetic marks are catalyzed by different epigenetic complexes, whose principal families are illustrated here. (a–c) Epigenetic regulation depends on the interplay among the different players: DNA methylation (a), histone marks (b) and nucleosome positioning (c). The interaction among the different factors brings about the final outcome. This figure illustrates selected examples of the possible interrelations among the various epigenetic players.
DNA methylation is not only linked to gene transcription regulation. A significant fraction of deeply methylated CpGs is found in repetitive elements (Fig. 1d). This DNA methylation is needed to protect chromosomal integrity, which is achieved by preventing reactivation of endoparasitic sequences that cause chromosomal instability, translocations and gene disruption11. Although DNA methylation mainly occurs in the CpG dinucleotide context in mammals, non-CG methylation has recently been described in humans at CHG and CHH sites (where H is A, C or T). CHG and CHH methylation has been found in stem cells and seems to be enriched in gene bodies, where it is directly correlated with gene expression, and to be depleted in protein-binding sites and enhancers28. The levels of non-CpG methylation decrease during differentiation and
are restored in induced pluripotent stem cells, suggesting a key role in the origin and maintenance of the pluripotent state28,29. The mechanisms of non-CpG methylation remain unclear29. In addition to 5-methylcytosine, 5-hydroxymethyl-2′-deoxycytidine has also been observed. So far, 5-hydroxymethyl-2′-deoxycytidine has been reported in Purkinje cells (constituting 0.6% of total nucleotides) and in granule cells (constituting 0.2% of total nucleotides), but it seems not to be present in cancer cell lines30. These new DNA modifications need to be further studied to determine their implications for normal and diseased epigenetic regulation. More work is also required in the development of new technological approaches31,32 and powerful analytical tools33, which have proven to be crucial for the progress of the field34. Massively parallel sequencing is providing large amounts of data, but its accurate analysis and interpretation, and its cost, remain the last obstacles to working with DNA methylomes at base resolution. Beyond sequencing-based technologies, the recently released, refined methylation arrays are worth considering for certain genomic questions. DNA methylation is mediated by the DNMT family of enzymes, which catalyze the transfer of a methyl group from S-adenosyl methionine to DNA. In mammals, five members of the DNMT family have been reported: DNMT1, DNMT2, DNMT3A, DNMT3B and DNMT3L, but only DNMT1, DNMT3A and DNMT3B possess methyltransferase activity.
The catalytic members of the DNMT family are customarily classified into de novo DNMTs (DNMT3A and DNMT3B) and maintenance DNMTs (DNMT1). DNMT3A and DNMT3B are thought to be responsible for establishing the pattern of methylation during embryonic development. The de novo DNMTs are highly expressed in embryonic stem (ES) cells and downregulated in differentiated cells15. The DNMT3 family contains a third member, DNMT3L, which is required for establishing maternal genomic imprinting, despite being catalytically inactive35. DNMT3L is expressed during gametogenesis when genomic imprinting takes place. It acts as a general stimulatory factor for DNMT3A and DNMT3B and interacts and co-localizes with them in the nucleus36,37. The maintenance DNMT, DNMT1, has a 30- to 40-fold preference for hemimethylated DNA, and also has de novo DNMT activity. DNMT1 is the most abundant DNMT in the cell and is transcribed mostly during the S phase of the cell cycle. It is most often needed to methylate hemimethylated sites that are generated during semi-conservative DNA replication (Fig. 2). In a cellular context, the affinity of DNMT1 for newly synthesized DNA is increased by its interaction with the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA), ensuring localization to the replication fork38. The ubiquitin-like plant homeodomain and RING finger domain-containing protein 1 (UHRF1) could perform a similar function, tethering DNMT1 to hemimethylated DNA through its SET and RING-associated domain, which shows strong preferential binding to hemimethylated CpGs39 (Fig. 2a). However, the division of labor between de novo and maintenance methylation is not always so clear, and a revised model has recently been proposed by Jones and Liang40. The updated model still supports the idea that the bulk of DNA methylation in dividing cells would be maintained by DNMT1 in conjunction with UHRF1 and PCNA.
But it also proposes that DNMT3A and DNMT3B, which have been shown to anchor strongly to nucleosomes containing methylated DNA41 (Fig. 2a), are compartmentalized in methylated regions, methylating the sites missed by DNMT1 at the replication fork. Finally, DNMT2, despite containing all the catalytic signature motifs of conventional DNMTs, has almost no DNMT activity. However, it has been reported that DNMT2 methylates tRNAAsp (ref. 42). One of the most intriguing questions in the DNA methylation field is how the DNA methylation machinery is directed to specific sequences in the genome. Several mechanisms have been proposed, mainly suggesting interaction of DNMTs with other epigenetic factors41,43–47 (Fig. 2). More recently, small interfering RNA (siRNA)-mediated, RNA-directed DNA methylation has also been described. In plants, RNA-directed DNA methylation is a stepwise process initiated by double-stranded RNAs that recruit DNMTs to catalyze de novo DNA methylation of specific regions, including not only gene promoters but also repetitive sequences48–51. Although the process is well studied in plants and some of the RNA-directed DNA methylation components are conserved in mammals, it is still unclear whether similar processes are involved in regulating DNA methylation in animals. There are no reports suggesting the involvement of long intergenic ncRNAs (lincRNAs) in DNA methylation.
Histone modifications. Histones are key players in epigenetics. The core histones H2A, H2B, H3 and H4 group into two H2A-H2B dimers and one H3-H4 tetramer to form the nucleosome. A 147-bp segment of DNA is wrapped in 1.65 turns around the histone octamer, and neighboring nucleosomes are separated by, on average, ~50 bp of free DNA. The core histones are predominantly globular except for their N-terminal tails, which are unstructured52. Histone H1 is called the linker histone. It does not form part of the nucleosome but binds to the linker DNA (that is, the DNA separating two histone complexes), sealing off the nucleosome at the location where DNA enters and leaves53.
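The numbers just quoted imply a simple piece of arithmetic: 147 bp of wrapped DNA plus roughly 50 bp of linker gives a repeat of about 197 bp per nucleosome. The sketch below works this through in Python. The 1-Mb region is an arbitrary illustrative length, the constant and function names are hypothetical, and the ~50-bp linker is an average, so the results are rough estimates only.

```python
CORE_DNA_BP = 147      # DNA wrapped around one histone octamer (from the text)
LINKER_DNA_BP = 50     # average free DNA between neighbors (approximate, from the text)
TURNS_PER_CORE = 1.65  # turns of DNA around the octamer (from the text)


def nucleosome_estimates(region_bp: int) -> dict:
    """Rough nucleosome arithmetic for a region, assuming evenly spaced nucleosomes."""
    repeat_bp = CORE_DNA_BP + LINKER_DNA_BP
    return {
        "repeat_bp": repeat_bp,
        "nucleosomes": region_bp // repeat_bp,
        "wrapped_fraction": CORE_DNA_BP / repeat_bp,
        "bp_per_turn": CORE_DNA_BP / TURNS_PER_CORE,
    }


if __name__ == "__main__":
    est = nucleosome_estimates(1_000_000)  # illustrative 1-Mb region
    print(est["repeat_bp"])                   # 197
    print(est["nucleosomes"])                 # 5076
    print(round(est["wrapped_fraction"], 2))  # 0.75
    print(round(est["bp_per_turn"], 1))       # 89.1
```

Under these assumptions roughly three quarters of the DNA in a region is wrapped around octamers at any one time, which gives a sense of why nucleosome placement matters for access to the underlying sequence.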
All histones are subject to post-translational modification. Several post-translational modifications occur in histone tails: acetylation, methylation, phosphorylation, ubiquitination, SUMOylation and ADP-ribosylation52,54, among others (Fig. 3). Histone modifications have important roles in transcriptional regulation, DNA repair55, DNA replication, alternative splicing56 and chromosome condensation52. In relation to its transcriptional state, the human genome can be roughly divided into actively transcribed euchromatin and transcriptionally inactive heterochromatin. Euchromatin is characterized by high levels of acetylation and trimethylated H3K4, H3K36 and H3K79. On the other hand, heterochromatin is characterized by low levels of acetylation and high levels of H3K9, H3K27 and H4K20 methylation57. Recent studies have demonstrated that histone modification levels are predictive of gene expression. Actively transcribed genes are characterized by high levels of H3K4me3, H3K27ac, H2BK5ac and H4K20me1 in the promoter and H3K79me1 and H4K20me1 along the gene body58 (Fig. 4). However, the notion of heterochromatin as a transcriptionally inactive region has been challenged by the discovery of numerous noncoding RNAs (ncRNAs) derived from heterochromatic loci51. For instance, Schizosaccharomyces pombe centromeric regions express siRNAs that bind to the RNA-induced transcriptional silencing complex and provide sequence specificity to the complex. The RNA-induced transcriptional silencing complex is required for H3K9 methylation at centromeric repeats and for the recruitment of the histone methylation enzyme Clr4, which is essential for the spreading of heterochromatic domains51,59,60. But centromeric siRNAs are not the only ncRNAs that are capable of directing histone modifications61. Well-known examples of this phenomenon in humans are the ncRNAs XIST and HOTAIR.
XIST is involved in the silencing of the inactive X chromosome in females, through the recruitment of Polycomb repressive complexes (PRCs) with methyltransferase and histone ubiquitinase activity62,63. HOTAIR is a lincRNA transcribed from the HOXC cluster that represses genes in the HOXD cluster by recruiting the histone methyltransferase PRC2 (ref. 64). All the modifications described so far are covalent post-translational modifications. However, a new type of modification has recently been described. The histone H3 tail is clipped after the Ala21 residue, cutting off the N-terminal 21 residues and the associated post-translational modifications. This modification represents the first massive clearing of histone marks to be reported. Histone H3 clipping seems to be inhibited by H3K4 methylation65. Histones can be modified at different sites simultaneously. The core histones forming the nucleosome can each have several modifications, giving rise to cross-talk among the different marks. Communication among histone modifications can occur within the same site66, in the same histone tail67 and among different histone tails68 (Fig. 2b). Thus, a single histone mark does not determine outcome alone; instead, it is the combination of all marks in a nucleosome or region that specifies outcome. A recent paper has described the existence of up to 51 distinct ‘chromatin states’ based on the enrichment of specific combinations of histone modifications. Distinct biological roles are suggested for the different chromatin states69. An interesting case of co-existing histone modifications is found in ES cells within the ‘bivalent domains’, where the H3K4me3 active mark is found together with the H3K27me3 repressive mark at promoters of developmentally important genes. Bivalent domains enable ES cells to tightly regulate and rapidly activate gene expression during different developmental processes, but are lost with cell commitment70,71.
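Mark names such as H3K4me3 follow a regular pattern (histone, residue, position, modification), and the bivalent-domain idea described above amounts to checking which marks co-occur at a promoter. The following is a minimal sketch in Python under simplifying assumptions: the function names and the four output labels are hypothetical, only H3K4me3 and H3K27me3 are considered, and real chromatin-state models such as the 51-state analysis cited above combine many more marks.

```python
import re

# Names like H3K4me3, H2BK5ac, H3S10ph: histone, residue letter, position, modification.
MARK_PATTERN = re.compile(r"^(H1|H2A|H2B|H3|H4)([A-Z])(\d+)(me[123]?|ac|ph|ub)$")


def parse_mark(name: str) -> tuple[str, str, int, str]:
    """Split a histone mark name into (histone, residue, position, modification)."""
    match = MARK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification = match.groups()
    return histone, residue, int(position), modification


def classify_promoter(marks: set[str]) -> str:
    """Label a promoter from the two marks the text uses to define bivalent domains."""
    active = "H3K4me3" in marks       # active mark
    repressive = "H3K27me3" in marks  # repressive mark
    if active and repressive:
        return "bivalent"
    if active:
        return "active"
    if repressive:
        return "repressed"
    return "unclassified"


if __name__ == "__main__":
    print(parse_mark("H3K4me3"))                       # ('H3', 'K', 4, 'me3')
    print(classify_promoter({"H3K4me3", "H3K27me3"}))  # bivalent
    print(classify_promoter({"H3K4me3", "H3K27ac"}))   # active
```

Treating a promoter's marks as a set makes the point in the text concrete: the label depends on the combination present, not on any single mark.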
As mentioned before, all the epigenetic players interact with each other. An interesting example of the interplay between histone modifications and DNA methylation is the relationship between DNMT3L and H3K4. DNMT3L specifically interacts with histone H3 tails, inducing de novo DNA methylation by recruitment of DNMT3A; however, this interaction is strongly inhibited by H3K4 methylation43. Furthermore, several histone methyltransferases have also been reported to direct DNA methylation to specific genomic targets by recruiting DNMTs44,45, helping in this way to set the silenced state established by the repressive histone marks. Moreover, histone methyltransferases and demethylases can also modulate the stability of DNMT proteins, thereby regulating DNA methylation levels46,47 (Fig. 2b). On the other hand, DNA methylation can also direct histone modifications. For instance, methylated DNA mediates H3K9me through MeCP2 recruitment72. Many enzymes that catalyze covalent post-translational modifications have been described52,73. Because the modifications are dynamic, enzymes that remove these post-translational modifications have also been reported52,73,74. However, the list of histone modifications, and of their writers and erasers, might not yet be complete. Of the enzymes that modify histones, histone methyltransferases, histone demethylases and kinases are the most specific to individual histone subunits and residues52,75. Conversely, most of the histone acetyltransferases (HATs) and histone deacetylases (HDACs) are not highly specific and modify more than one residue. Many transcriptional co-activators (e.g., GCN5, PCAF, CBP, p300, Tip60 and MOF) have been reported to possess intrinsic HAT activity, whereas many transcriptional co-repressor complexes (e.g., mSin3a, NCoR/SMRT and Mi-2/NuRD) contain subunits with HDAC activity66. Surprisingly, it has recently been reported that HDACs and HATs are both targeted to transcribed regions of active genes by phosphorylated RNA polymerase II. Thus, most HDACs in the human genome function to reset chromatin by removing acetylation at active genes, whereas HATs, by contrast, are mainly linked to transcriptional activation76.
[Figure 3 shows partial amino acid sequences of histones H1.4, H2A, H2B, H3.1 and H4, with modified residues marked A (acetylation), M (methylation), P (phosphorylation) or U (ubiquitination) and numbered by position.]
Figure 3 Histone modifications. All histones are subject to post-translational modifications, which mainly occur in histone tails. The main post-translational modifications are depicted in this figure: acetylation (blue), methylation (red), phosphorylation (yellow) and ubiquitination (green). The number in gray under each amino acid represents its position in the sequence.
Nucleosome positioning. Nucleosomes are a barrier to transcription: they block access of activators and transcription factors to their sites on DNA and, at the same time, inhibit the elongation of transcripts by engaged polymerases. The packaging of DNA into nucleosomes appears to affect all stages of transcription, thereby regulating gene expression. In particular, the precise position of nucleosomes around the transcription start sites (TSSs) has an important influence on the initiation of transcription. A preferential positioning of nucleosomes can be described at any given genomic locus. Nucleosome displacements of as few as 30 bp at the TSS have been implicated in changes in the activity of RNA polymerase II. Moreover, the 5′ and 3′ ends of genes possess nucleosome-free regions needed to provide space for the assembly and disassembly of the transcription machinery. The loss of a nucleosome directly upstream of the TSS is tightly correlated with gene activation, whereas the occlusion of the TSS by a nucleosome is associated with gene repression77,78 (Fig. 4). Nucleosome positioning not only determines accessibility of the transcription factors to their target DNA sequence but has also been reported to play an important role in shaping the methylation landscape79 (Fig. 4). Besides transcription regulation, nucleosome occupancy also participates in directing meiotic recombination events80. The precise function of nucleosomes is influenced by the incorporation of different histone variants. Histone variants are distinguished from core histones by the fact that they are expressed outside of S phase and are incorporated into chromatin independently of DNA replication. They differ from core histones in their tails, in their domain structure and in a few key amino acids57. Histone variants regulate nucleosome positioning and gene expression23. For example, the incorporation of the histone variant H2A.Z protects genes against DNA methylation81. Thus, the interplay among different epigenetic partners becomes evident once more. The nucleosome remodeling machinery is influenced by DNA methylation82 and has been linked with specific histone modifications83 (Fig. 2c). MicroRNAs (miRNAs) can also regulate histone variant replacement84 or interact with chromatin remodeling complexes, mediating the exchange of specific subunits85. Several groups of large macromolecular complexes are known to move, destabilize, eject or restructure nucleosomes in an ATP hydrolysis–dependent manner. These complexes, known as chromatin remodeling complexes, can be classified into four families (SWI/SNF, ISWI, CHD and INO80) that share similar ATPase domains but differ in the composition of their unique subunits86. In the first of these families, the SWI/SNF family, members have as a catalytic unit either Brahma (BRM) or BRG1, which share ~75% sequence identity but differ in their first 60 amino acids. SWI/SNF family complexes are master regulators of gene expression, regulating expression of FOS, CSF-1, CRYAB, MIM-1, p21 (also known as CDKN1A), HSP70, VIM and CCNA2, among others. Moreover, SWI/SNF has also been reported to modulate alternative splicing87. Many members of the second class, the ISWI family, such as ACF and CHRAC, have been reported to promote chromatin assembly and to repress transcription. However, NURF, another complex of this family, is capable of activating RNA polymerase II, thus participating in transcriptional activation88. In the CHD family, some members participate in the sliding and ejection of nucleosomes, promoting transcription; however, others, such as the Mi-2/NuRD complex, have repressive roles and contain HDAC activity and MBD proteins88 (Fig. 2c). Members of the last group, the INO80 family, have been reported to participate in multiple cellular processes: transcriptional activation, DNA repair, telomere regulation, chromosome segregation and DNA replication, among others86.
However, the SWR1 member has the unique ability to restructure the nucleosome, removing the H2A-H2B dimers and replacing them with H2A.Z-H2B dimers88 (Fig. 2c).
Epigenetic modifications in cancer
In addition to featuring classic genetic mutations, cancer cells present a profoundly distorted epigenetic landscape (Table 1). The cancer epigenome is characterized by global changes in DNA methylation, histone modification patterns and chromatin-modifying enzyme expression profiles11,89, which play important roles in cancer initiation and progression.
DNA methylation. Cancer cells are characterized by a massive global loss of DNA methylation90 (20–60% less overall 5-methyl-cytosine). At the same time, the acquisition of specific patterns of hypermethylation at the CpG islands of certain promoters is frequently observed (Fig. 1a). Global hypomethylation occurs mainly at repetitive sequences, promoting chromosomal instability, translocations, gene disruption and reactivation of endoparasitic sequences90,91 (Fig. 1d). A clear case is the LINE family member L1, which has been shown to be hypomethylated in a wide range of cancers, including breast, lung, bladder and liver tumors92. Hypomethylation at specific promoters can activate the aberrant expression of oncogenes and induce loss of imprinting (LOI) in some loci. For instance, MASPIN (also known as SERPINB5), a tumor suppressor gene that becomes hypermethylated in breast and prostate epithelial cells93, appears to be hypomethylated in other tumor types. MASPIN hypomethylation, and therefore its expression, increases with the degree of dedifferentiation of some types of cancer cells94,95. S100P in pancreatic cancer, SNCG in breast and ovarian cancers and melanoma-associated gene (MAGE) and dipeptidyl peptidase 6 (DPP6) in melanomas are other well-studied examples of hypomethylated genes in cancer19,92. The most common LOI event due to hypomethylation affects insulin-like growth factor 2 (IGF2) and has been reported in a wide range of tumor types, including breast, liver, lung and colon cancer96. In contrast to global DNA hypomethylation, hypermethylation is observed at specific CpG islands (Fig. 1a). The transcriptional inactivation caused by promoter hypermethylation affects genes involved in the main cellular pathways: DNA repair (hMLH1, MGMT, WRN, BRCA1), vitamin response (RARB2, CRBP1), Ras signaling (RASSF1A, NORE1A), cell cycle control (p16INK4a, p15INK4b, RB), p53 network (p14ARF, p73 (also known as TP73), HIC-1) and apoptosis (TMS1, DAPK1, WIF-1, SFRP1), among others15. Hypermethylated promoters have been proposed as a new generation of biomarkers and hold great diagnostic and prognostic promise for clinicians97 (reviewed in more detail by Jones and colleagues98 in this issue). However, even though the focus of most studies is on CpG islands located in promoters, recent findings suggest that most of the aberrant DNA methylation in cancer occurs in CpG island shores (e.g., in HOXA2 and GATA2) (Fig. 1b). Notably, most changes in CpG island shores (45–65%) seem to be associated with regions that become hypermethylated during normal tissue differentiation (e.g., in TGFB1 and PAX5)19,20. Differential DNA methylation seems to correlate with gene expression at CpG island shores just as it does with CpG islands21. Human tumors are also characterized by an overall miRNA downregulation99, often caused by hypermethylation at the miRNA promoters100. For example, miR-124a is repressed by hypermethylation, mediating CDK6 activation and Rb phosphorylation101. Interestingly, inactivation of miRNA expression by hypermethylation is linked not only to cancer but also to metastasis development.
Silencing of miR-148, miR-34b/c and miR-9 by promoter hypermethylation favors tumor dissemination from the original location102. Hypermethylation patterns are tumor-type specific and it is still unclear why certain regions become hypermethylated, whereas others remain unmethylated. One possibility is that inactivation of particular genes confers a growth advantage, resulting in clonal selection15. In some cases, it has also been proposed that aberrant CpG-island methylation could be due to the recruitment of DNMTs and HDACs to specific target genes mediated by fusion proteins, such as the promyelocytic leukemia–retinoic acid receptor-α (PML–RARA) fusion protein, expressed in some leukemias103. Another possibility is that the spreading of methylation from highly methylated sequences to their surroundings is more pronounced in cancer. It has been reported that epigenetic silencing by DNA methylation can span 1-Mb-long regions of a chromosome104, resembling the loss of heterozygosity often observed in human tumors. This global distortion of the DNA methylation pattern could also be mediated by dysregulation of DNMT expression. DNMT1 and DNMT3B are overexpressed in many tumor types105. Moreover, DNMT expression can also be regulated by miRNAs. The miR-29 family is known to directly target and downregulate DNMT3A and DNMT3B, and indirectly target DNMT1 (ref. 106) (Fig. 2a).
Histone modifications. The most prominent alteration in histone modification in cancer cells is a global reduction of monoacetylated H4K16 (ref. 107). Loss of acetylation is mediated by HDACs, which have been found to be overexpressed108 or mutated109 in different tumor types. The main class of HDACs implicated in this process is the Sirtuin family of proteins110. Gene expression and deacetylase activity of SirT1 are upregulated in several cancer types. Moreover, SirT1 interacts with DNMT1, thus affecting DNA methylation patterns25.
HDAC expression can be regulated by miRNAs, such as miR-449a, which, by repressing the expression of HDAC-1 in prostate cancer cells, regulates cell growth and viability111 (Fig. 2b). In addition to alteration in HDAC expression, several cancer types (e.g., colon, uterus, lung and leukemia) also bear translocations leading to the formation of aberrant fusion proteins, mutations or deletions in HATs and HAT-related genes112,113, thus contributing to the global imbalance of histone acetylation.
[Figure 4 depicts positioned nucleosomes annotated with histone marks (labeled M and A) on H3K4, H3K9, H3K27, H3K36, H3K79 and H4K20.]
Figure 4 Nucleosome positioning patterns. Nucleosome positioning plays an important role in transcriptional regulation. Transcriptionally active gene promoters possess a nucleosome-free region at the 5′ and 3′ untranslated region, providing space for the assembly and disassembly of the transcription machinery. The loss of a nucleosome directly upstream of the TSS is also necessary for gene activation, whereas the occlusion of this position leads to transcription repression. DNA methylation regulates transcription, and thus interferes with nucleosome positioning. Methylated DNA seems to be associated with ‘closed’ chromatin domains, where DNA is condensed into strictly positioned nucleosomes, thereby impeding transcription. Conversely, unmethylated DNA is associated with ‘opened’ chromatin domains, which allow transcription.
Besides the global loss of H4K16ac, cancer cells suffer a global loss of the active mark H3K4me3 (ref. 114) and the repressive mark H4K20me3 (ref. 107), and a gain in the repressive marks H3K9me (ref. 115) and H3K27me3 (ref. 116). Altered distribution of the histone methyl marks in cancer cells is mainly due to the aberrant expression of both histone methyltransferases and histone demethylases75. A recent publication has described inactivating mutations in the histone methyltransferase SETD2 and in the histone demethylases UTX and JARID1C in renal carcinomas117. Another example is the histone methyltransferase EZH2, a subunit of the PRC2 and PRC3 complexes, which enhances proliferation and neoplastic transformation and is overexpressed in several cancer types. Overexpression of the lincRNA HOTAIR in breast tumors and metastases retargets PRC2 and alters the H3K27me3 landscape118. Moreover, EZH2 expression is upregulated in many tumors due to the genomic loss of miR-101 (ref. 119). In addition to its histone methyltransferase activity, EZH2 interacts with DNMTs, directly controlling DNA methylation116. NSD1, another histone methyltransferase, has been reported to undergo promoter DNA methylation-dependent silencing in neuroblastomas120. DOT1L, the major H3K79 histone methyltransferase, is essential for the establishment of a euchromatic state that allows the expression of tumor suppressor genes121,122.
In leukemias, the presence of mixed lineage leukemia (MLL) fusion oncoproteins leads to aberrant patterns of H3K79 and H3K4 methylation, resulting in altered gene expression of MLL targets123,124. Some histone demethylases (e.g., GASC1, LSD1, JmjC and UTRX) have also been shown to be upregulated or amplified in several cancers, including prostate cancer and squamous cell carcinomas125. Although further studies are needed, histone phosphorylation also seems to be relevant in cancer. Histone phosphorylation plays a role in the DNA damage-repair response, chromosome stability and apoptosis. Recently JAK2, a nonreceptor tyrosine kinase that regulates several cellular processes by inducing cytoplasmic signaling cascades, has been reported also to be present in the nucleus, directly phosphorylating H3Y41 (Fig. 2b). Phosphorylated H3Y41 (H3Y41ph) levels are regulated by cytokine signaling. H3Y41ph prevents the binding of heterochromatin protein 1α (HP1α) to this region of H3, increasing the expression of the genes located there, as was reported for the lmo2 promoter. JAK2 is frequently activated by chromosomal translocations or point mutations in hematological malignancies126.
Nucleosome positioning. All families of chromatin remodelers have been tied to cancer, although in most cases the molecular mechanisms underlying their function remain unclear. For instance, BRG1 and BRM, the ATPase subunits of SWI/SNF complexes, have been characterized as tumor suppressors and are silenced in about 15–20% of primary non-small-cell lung cancers127. Surprisingly, an oncogenic role for BRG1 as a p53 destabilizer has also been proposed128. Mutations in SNF5, a subunit of the SWI/SNF remodeling complex, have been observed in sporadic renal rhabdoid tumors and in choroid plexus carcinomas, medulloblastomas and central primitive neuroectodermal tumors129. Nucleosome remodeling is also involved in the transcriptional repression by promoter hypermethylation (Fig. 4). Promoter hypermethylation results in the occupation of the TSS by a nucleosome, as has been reported for MLH1 in colon cancer130. The genes encoding subunits of the chromatin remodeling complexes (e.g., CHD5 (ref. 131)) are themselves also targets of CpG island hypermethylation in cancer, which downregulates their expression and impairs the normal chromatin remodeling processes (Fig. 2c). In addition to nucleosome positioning, histone variants have also been related to cancer. For example, increased expression of MacroH2A is involved in senescence. Thus, lung tumors with highly expressed MacroH2A have a better prognosis, with lower proliferation rates and less frequent recurrence132.
Epigenetic modifications in neurodevelopmental disorders
The central nervous system is one of the most complex systems in humans. Not only do the different regions of an organ present different expression patterns, but the same cell type has different transcriptional regulation depending on its localization in the organ133. The mitotic exit, when neural cells lose their multipotency, is a key step in nervous system development85,134, requiring a very precise tuning of the transcriptional program. Epigenetic factors are key players in this regulation. Mutations in genes encoding components of the epigenetic machinery cause dysfunctions that lead to certain neurodevelopmental disorders. Here, we classify them according to the epigenetic machinery that becomes mutated.
Table 1 Epigenetic modifications in human diseases
Aberrant epigenetic mark | Alteration | Consequences | Examples of genes affected and/or resulting disease
Cancer
DNA methylation | CpG island hypermethylation | Transcription repression | MLH1 (colon, endometrium, stomach11), BRCA1 (breast, ovary11), MGMT (several tumor types11), p16INK4a (colon11)
DNA methylation | CpG island hypomethylation | Transcription activation | MASPIN (pancreas92), S100P (pancreas92), SNCG (breast and ovary92), MAGE (melanomas92)
DNA methylation | CpG island shore hypermethylation | Transcription repression | HOXA2 (colon20), GATA2 (colon20)
DNA methylation | Repetitive sequences hypomethylation | Transposition, recombination genomic instability | L1 (ref. 11), IAP11, Sat2 (ref. 107)
Histone modification | Loss of H3 and H4 acetylation | Transcription repression | p21WAF1 (also known as CDKN1A)11
Histone modification | Loss of H3K4me3 | Transcription repression | HOX genes
Histone modification | Loss of H4K20me3 | Loss of heterochromatic structure | Sat2, D4Z4 (ref. 107)
Histone modification | Gain of H3K9me and H3K27me3 | Transcription repression | CDKN2A, RASSF1 (refs. 115–116)
Nucleosome positioning | Silencing and/or mutation of remodeler subunits | Diverse, leading to oncogenic transformation | BRG1, CHD5 (refs. 127–131)
Nucleosome positioning | Aberrant recruitment of remodelers | Transcription repression | PML–RARα103 recruits NuRD
Nucleosome positioning | Histone variants replacement | Diverse (promotion cell cycle/destabilization of chromosomal boundaries) | H2A.Z overexpression/loss
Neurological disorders
DNA methylation | CpG island hypermethylation | Transcription repression | Alzheimer’s disease (NEP)135
DNA methylation | CpG island hypomethylation | Transcription activation | Multiple sclerosis (PADI2)135
DNA methylation | Repetitive sequences aberrant methylation | Transposition, recombination genomic instability | ATRX syndrome (subtelomeric repeats)135,143
Histone modification | Aberrant acetylation | Diverse | Parkinson’s and Huntington’s diseases135
Histone modification | Aberrant methylation | Diverse | Huntington’s disease and Friedreich’s ataxia135
Histone modification | Aberrant phosphorylation | Diverse | Alzheimer’s disease135
Nucleosome positioning | Misposition in trinucleotide repeats | Creation of a ‘closed’ chromatin domain | Congenital myotonic dystrophy151
Autoimmune diseases
DNA methylation | CpG island hypermethylation | Transcription repression | Rheumatoid arthritis (DR3)154,155
DNA methylation | CpG island hypomethylation | Transcription activation | SLE (PRF1, CD70, CD154, AIM2)6
DNA methylation | Repetitive sequences aberrant methylation | Transposition, recombination genomic instability | ICF (Sat2, Sat3), rheumatoid arthritis (L1)152,155
Histone modification | Aberrant acetylation | Diverse | SLE (CD154, IL10, IFN-γ)6
Histone modification | Aberrant methylation | Diverse | Diabetes type 1 (CLTA4, IL6)159
Histone modification | Aberrant phosphorylation | Diverse | SLE (NF-κB targets)
Nucleosome positioning | SNPs in the 17q12-q21 region | Allele-specific differences in nucleosome distribution | Diabetes type 1 (CLTA4, IL6)
Nucleosome positioning | Histone variants replacement | Interferes with proper remodeling | Rheumatoid arthritis (histone variant macroH2A at NF-κB targets)157
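The marks listed in Table 1 (for example H3K4me3, H4K20me3 or H3K27me3) follow a compact naming pattern: histone, one-letter amino acid, residue position, modification and, for methylation, the number of methyl groups. As an illustrative sketch only, and not something taken from the review, the following Python snippet parses such names and rejects chemically inconsistent ones. The names `parse_mark` and `HistoneMark`, and the simplified residue rules (methylation on K or R, acetylation on K, phosphorylation on S, T or Y), are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str                 # e.g. "H3"
    residue: str                 # one-letter amino acid code, e.g. "K"
    position: int                # residue number within the histone
    modification: Optional[str]  # "me", "ac", "ph" or None if only a site is named
    degree: Optional[int]        # 1-3 methyl groups; None if not stated


# Simplified rule set (an assumption for this sketch, not an exhaustive list).
ALLOWED_RESIDUES = {
    "me": {"K", "R"},
    "ac": {"K"},
    "ph": {"S", "T", "Y"},
}

_PATTERN = re.compile(r"(H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac|ph)?([123])?")


def parse_mark(name: str) -> HistoneMark:
    """Split a mark name such as 'H3K27me3' into its components."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised histone mark name: {name!r}")
    histone, residue, position, modification, degree = match.groups()
    if degree is not None and modification != "me":
        raise ValueError(f"a degree is only meaningful for methylation: {name!r}")
    if modification is not None and residue not in ALLOWED_RESIDUES[modification]:
        raise ValueError(f"{modification!r} is not expected on residue {residue!r}: {name!r}")
    return HistoneMark(
        histone,
        residue,
        int(position),
        modification,
        int(degree) if degree is not None else None,
    )


if __name__ == "__main__":
    for mark in ("H3K27me3", "H4K16ac", "H3Y41ph", "H3K9me"):
        print(mark, "->", parse_mark(mark))
```

Under these assumptions, `parse_mark("H3K27me3")` yields histone H3, residue K, position 27, modification `me` and degree 3, whereas a name that pairs phosphorylation with a lysine raises `ValueError`, which makes this kind of check useful for catching transcription slips in mark names.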
DNA methylation. Rett syndrome is an X-linked neurological disease caused by point mutations in the MBD protein MeCP2. Both upregulation and downregulation of MeCP2 in the brain are associated with neurodevelopmental defects. Traditionally, MeCP2 has been considered to function as a gene silencer, mediating the recruitment of HDACs to methylated DNA (Fig. 2b). Recently, new data have highlighted important roles for MeCP2 in chromatin architecture, regulation of mRNA splicing135,136 and active transcription of genes (e.g., Sst, Gprin1)137. Although transcriptional alterations have been described in some genes (e.g., Fkbp5, Mobp, Ddc and S100a9)138, imprinted regions (e.g., DLX5) and miRNAs (e.g., miR-184)139,140, MeCP2 deficiency does not result in high levels of genome-wide transcriptional alteration. It still remains unknown whether or not the described alterations are causative.
Histone modifications. Rubinstein-Taybi syndrome is an autosomal dominant disorder associated with the dysfunction of a HAT. It is a genetically heterogeneous disease associated in ~55% of cases with mutations in the cAMP-response element binding protein (CREB)-binding protein (CBP), in another 3% of cases with mutations in EP300 and in ~42% of cases with an unidentified cause. CBP and EP300 function as transcriptional co-activators in addition to their HAT activity135. In Cbp+/− mice, H2B acetylation is reduced by more than 30%, suggesting that the failure in long-term memory formation could be explained by chromatin changes in one or several loci that control memory storage141. The neurodevelopmental disease Coffin-Lowry syndrome is a rare X-linked disorder caused by loss-of-function mutations in RSK2, a serine/threonine protein kinase. RSK2 participates in the MAP kinase pathway, inducing the transient transcription of a set of genes. RSK2 mediates H3S10ph directly, changing chromatin structure and facilitating the binding of CBP, which acetylates H3 residues. Thus, RSK2 promotes gene transcription through chromatin opening142.
Nucleosome positioning. ATRX syndrome is an X-linked disorder caused by mutations in ATRX, a member of the Snf2 family of chromatin remodelers. The ATRX protein interacts with the SET domain of the histone methyltransferase EZH2, the Daxx transcriptional cofactor, MeCP2 and the chromoshadow domain of HP1 proteins. It participates, among other cellular processes, in heterochromatin formation, chromosome alignment at the meiotic spindle, chromosome cohesion in somatic cells and maintenance of X-chromosome inactivation in women. Because no DNA repair defects or genomic instability occur in ATRX patients, it has been suggested that ATRX may regulate the transcription of a specific set of target genes. Although global DNA methylation is unchanged in ATRX patients, aberrant DNA methylation in some repetitive sequences has been reported135,143.
Epigenetic modifications in neurodegenerative and neurological diseases
Recent studies have also shed some light on the relationship between epigenetic alterations and neurodegenerative and/or neurological diseases. The majority of the evidence centers on DNA methylation and histone modification (Table 1).
DNA methylation. DNA methylation patterns appear to be distorted in a great many neurological diseases, giving rise to hyper- and hypomethylated sites. For instance, FMR1 promoter hypermethylation has been described in Fragile X syndrome patients. Fragile X syndrome is caused by a CGG trinucleotide repeat expansion in the 5′-untranslated region of FMR1. Expansion of the CGG trinucleotide repeats to >200 copies induces methylation of FMR1, leading to its transcriptional silencing144. Other reported cases of hypermethylated promoters include neprilysin (NEP, also known as MME) in Alzheimer’s disease, FXN in Friedreich’s ataxia and SMN2 in spinal muscular atrophy135. Conversely, hypomethylated sites have also been reported. For example, the substantia nigra of Parkinson’s patients overexpresses tumor necrosis factor alpha (TNFα) due to its promoter hypomethylation, thereby inducing apoptosis of neuronal cells145. Other cases of hypomethylation were reported in the promoter region of PADI2 for multiple sclerosis patients135 and in the Avp enhancer for mice subjected to early-life stress146. Alterations in DNA methylation patterns not only affect gene promoters but may also lead to LOI. Classic examples of LOI are the Prader-Willi and the Angelman syndromes. Both diseases involve aberrant DNA methylation in the imprinting control region at 15q11-q13. Prader-Willi syndrome arises from the loss of paternally expressed genes in this region, whereas Angelman syndrome arises from the loss of the maternally expressed UBE3A gene147.
Histone modifications. The pattern of histone marks is also altered in neurological diseases, histone hypoacetylation being the most frequently observed change. A good example of histone hypoacetylation is amyotrophic lateral sclerosis (ALS). ALS patients have aggregates of the protein FUS in cytoplasmic deposits of misfolded proteins. FUS is able to bind CBP, strongly inhibiting its HAT activity, and to negatively regulate specific CREB target genes. Thus, overexpression of FUS induces histone hypoacetylation135. Other cases of hypoacetylation in neurological diseases are found in Parkinson’s and Huntington’s disease135 and Friedreich’s ataxia148. Besides histone hypoacetylation, other changes linking neurological diseases and histone marks have been reported. For example, histone acetylation and phosphorylation alterations are typical in Alzheimer’s disease and epilepsy, H3K9 hypertrimethylation has been described in Huntington’s disease135 and Friedreich’s ataxia149 and the histone demethylase PHF8 has been implicated in X-linked mental retardation150.
Nucleosome positioning. It has been suggested that the amplification of CTG repeats in congenital myotonic dystrophy is a very strong nucleosome positioning signal that mediates the creation of a closed chromatin domain151. Apart from this observation, which needs further investigation, little is known about the possible implications of nucleosome positioning or histone variants in neuronal malignancies.
Epigenetic modifications in autoimmune diseases
Autoimmune diseases are characterized by the breakdown of immune tolerance to specific self-antigens. Different types of epigenetic alterations have been reported in this type of disorder (Table 1).
DNA methylation. Most of the research relating autoimmune disorders to epigenetic changes has focused on DNA methylation alterations. In fact, one of the best known autoimmune diseases, the ICF (immunodeficiency, centromeric instability and facial anomalies) syndrome, is caused by heterozygous mutations in DNMT3B. ICF patients show marked DNA hypomethylation in the pericentromeric satellite 2 and 3 repeats, alpha satellite sequences, Alu sequences and the D4Z4 and NBL2 repeats. Conversely, ICF patients have almost unchanged global DNA methylation levels143,147, although several genes regulating development, neurogenesis and immune function have aberrant expression152. Other autoimmune diseases, unrelated to mutations in the DNA methylation machinery, also present global hypomethylation, as is the case for systemic lupus erythematosus (SLE) and rheumatoid arthritis. The hypomethylated regions are not yet well defined, although some hypomethylated sites have been reported. SLE patients have DNA hypomethylation in PRF1, CD70, CD154, IFNGR2, MMP14, LCN2, CSF3R and AIM2 among other genes, and also in the ribosomal RNA gene promoter, 18S and 28S (ref. 6). The mechanisms responsible for this widespread hypomethylation are beginning to be revealed. It has been recently reported that hypomethylation in SLE is partially mediated by miR-21 and miR-148a, which directly and indirectly target DNMT1 (ref. 153). In rheumatoid arthritis, not only hypomethylated sites (e.g., in L1 and IL6) but also hypermethylated sites (e.g., in DR3) have been described154,155.
Histone modifications. Little is known about the role of histone modifications in autoimmune diseases, although initial studies are beginning to shed some light in this area. In human SLE T cells, the HDAC inhibitor trichostatin A reverses the aberrant expression of CD154, IL10 and interferon (IFN)-γ products156. A role for histone modifications in rheumatoid arthritis has also been described. Because the transcription factor NF-κB, a key regulator of inflammation, binds very poorly to nucleosomal DNA, histone modifications are needed to allow efficient NF-κB binding to its targets: histone H3K9 and S10 phosphoacetylation, reduction in H3K9me and increase in H3/H4 acetylation157. Thus, in rheumatoid arthritis, the reduced activity of HDACs plays a key role in regulating NF-κB–mediated gene expression158. Patients with type 1 diabetes also present a characteristic pattern of histone marks, with lymphocytes, but not monocytes, showing increased H3K9me2 in a subset of genes associated with autoimmune and inflammatory pathways (e.g., CLTA4, IL6)159. However, histone modifications have a role not only in transcription regulation. Nucleosomes are key autoantigens in SLE, being present in the circulation because of increased apoptosis and/or insufficient clearance. In apoptosis, histone modifications occur, such as H2BS14 phosphorylation160, H3T45 phosphorylation161, H3K4 trimethylation162, H4 triacetylation at K8, K12 and K16 (ref. 163) as well as H2BK12 acetylation164. It has been suggested that histone modifications arising during apoptosis make released apoptotic nucleosomes more immunogenic, leading to activation of antigen-presenting cells, which could result in autoantibody production162.
Nucleosome positioning. No studies have yet made a connection between nucleosome positioning and autoimmune diseases. Notably, it has recently been reported that single-nucleotide polymorphisms in the 17q12-q21 region, which have been associated with a higher risk of asthma, type 1 diabetes, primary biliary cirrhosis and Crohn’s disease, lead to allele-specific differences in nucleosome distribution165. Moreover, in rheumatoid arthritis, the incorporation of the histone variant macroH2A interferes with the binding of the transcription factor NF-κB and impedes SWI/SNF-dependent remodeling157.
Conclusions and perspectives
In the past decade the fast-evolving field of epigenetics has taken center stage, as shown by the results of a simple PubMed search of the term ‘epigenetic’: there were around 200 papers published in 1999, but more than 2,500 in 2009. Such startling growth in the number of publications attests to the intense research activity being undertaken in the field. Great progress has been made in the description of epigenetic modifications in normal and diseased tissues. Thus far, efforts in epigenetic research have mainly focused on cancer, but as the field has grown, it has provided new insights into other types of diseases, particularly neurological and autoimmune diseases. Epigenetic alterations are likely to be found in other disorders; indeed, they have already been described in cardiovascular diseases166–168, metabolic diseases169, myopathies170 and children born from assisted reproductive treatments171.
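To put the PubMed figures quoted above into perspective, the short calculation below converts them into a fold increase and an implied average yearly growth rate. It is only a back-of-the-envelope sketch: the inputs are the rounded counts given in the text ("around 200" and "more than 2,500"), so the results are approximate lower bounds, and the function names are chosen here for illustration.

```python
def fold_change(start: float, end: float) -> float:
    """Ratio of the final count to the initial count."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Rounded publication counts quoted in the text (approximate values).
PAPERS_1999 = 200
PAPERS_2009 = 2500

fold = fold_change(PAPERS_1999, PAPERS_2009)
rate = annual_growth_rate(PAPERS_1999, PAPERS_2009, 2009 - 1999)

print(f"fold increase 1999-2009: {fold:.1f}x")
print(f"implied average annual growth: {rate:.1%}")
```

With these rounded inputs the literature grew at least 12.5-fold over the decade, which corresponds to an average growth of a little under 30% per year.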
In recent months, we have witnessed a flood of new discoveries: the description of comprehensive DNA methylomes of humans22 and viruses146, the putative identification of non-CpG methylation28, the definition of CpG island shores19, the involvement of aberrant DNA methylation in other diseases besides cancer6,135, the description of new histone modifications and histone variants and their roles45,126,161, the report of new epigenetic machinery such as the DNA demethylase Tet1 (refs. 172,173) and the histone kinase JAK2 (ref. 126), the description of new mutations in the epigenetic machinery99 and the flurry of ncRNA studies that highlight the importance of RNA-mediated regulation in epigenetics174,175. Many key questions remain unanswered: what are the functions of non-CpG methylation and 5-hydroxymethylcytosine in human cells? Are there new DNA or histone modifications yet to be discovered? What are the rules of the so-called histone code? What are the roles and functions of ncRNAs and how many more ncRNAs are yet to be described? How are the placement of epigenetic marks and its specificity regulated? How are causative epigenetic changes going to be distinguished from mere bystander alterations? Is it always clear whether a specific epigenetic modification is a cause or a consequence of a certain process? One of the most intriguing questions is how the various epigenetic players interact and what mechanisms confer sequence specificity on the enzymes involved. Further research is needed, and efforts focused on such questions will be key in our progress toward a complete map of epigenetic regulation. Advances in technological development are enabling epigenomic analysis on a large scale. The first whole-genome, high-resolution maps for epigenetic modifications are appearing, but we should not stop here. Detailed human DNA methylomes, histone modification and nucleosome positioning maps in healthy and diseased tissues are needed. In this regard several international projects and initiatives have been established: the NIH Roadmap Epigenomics Program, the ENCODE Project, the AHEAD Project and the Epigenomics NCBI browser, among others (see the commentaries by Bernstein and colleagues176 and Satterlee and colleagues177 in this issue). The detailed study of the epigenetic maps would be of enormous use in basic and applied research and would be relevant for focusing pharmacological research on the most promising epigenetic targets. A key topic for future research is the implementation of mechanisms for the release of whole genome methylation and histone modification maps into public databases.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Esteller, M. Epigenetics in evolution and disease. Lancet 372, S90–S96 (2008). 2. Waddington, C.H. Introduction to Modern Genetics (Macmillan, 1939). 3. Rideout, W.M., III, Eggan, K. & Jaenisch, R. Nuclear cloning and epigenetic reprogramming of the genome. Science 293, 1093–1098 (2001). 4. Fraga, M.F. et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc. Natl. Acad. Sci. USA 102, 10604–10609 (2005). 5. Kaminsky, Z.A. et al. DNA methylation profiles in monozygotic and dizygotic twins. Nat. Genet. 41, 240–245 (2009). 6. Javierre, B.M. et al. Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res. 20, 170–179 (2010). 7. Chi, A.S. & Bernstein, B.E. Developmental biology. Pluripotent chromatin state. Science 323, 220–221 (2009). 8. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008). 9. Meissner, A. Epigenetic modifications in pluripotent and differentiated cells. Nat. Biotechnol. 28, 1079–1088 (2010). 10. Esteller, M. CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene 21, 5427–5440 (2002). 11. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286–298 (2007). 12. Straussman, R. et al.
Developmental programming of CpG island methylation profiles in the human genome. Nat. Struct. Mol. Biol. 16, 564–571 (2009). 13. Kacem, S. & Feil, R. Chromatin mechanisms in genomic imprinting. Mamm. Genome 20, 544–556 (2009). 14. Reik, W. & Lewis, A. Co-evolution of X-chromosome inactivation and imprinting in mammals. Nat. Rev. Genet. 6, 403–410 (2005). 15. Esteller, M. Epigenetic gene silencing in cancer: the DNA hypermethylome. Hum. Mol. Genet. 16 Spec No 1, R50–R59 (2007). 16. Lopez-Serra, L. & Esteller, M. Proteins that bind methylated DNA and human cancer: reading the wrong words. Br. J. Cancer 98, 1881–1885 (2008). 17. Kuroda, A. et al. Insulin gene expression is regulated by DNA methylation. PLoS ONE 4, e6953 (2009). 18. Thomson, J.P. et al. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082–1086 (2010). 19. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009). 20. Doi, A. et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat. Genet. 41, 1350–1353 (2009). 21. Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010). 22. Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007). 23. Zilberman, D., Gehring, M., Tran, R.K., Ballinger, T. & Henikoff, S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat. Genet. 39, 61–69 (2007). 24. Zhao, Z. et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341–1347 (2006). 25. Espada, J. et al. 
Epigenetic disruption of ribosomal RNA genes and nucleolar architecture in DNA methyltransferase 1 (Dnmt1) deficient cells. Nucleic Acids Res. 35, 2191–2198 (2007). 26. Horike, S., Cai, S., Miyano, M., Cheng, J.F. & Kohwi-Shigematsu, T. Loss of silent-chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat. Genet. 37, 31–40 (2005). 27. van Steensel, B. & Dekker, J. Genomics tools for unraveling chromosome architecture. Nat. Biotechnol. 28, 1089–1095 (2010). 28. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009). 29. Laurent, L. et al. Dynamic changes in the human methylome during differentiation. Genome Res. 20, 320–331 (2010). 30. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009). 31. Berman, B.P., Weisenberger, D.J. & Laird, P.W. Locking in on the human methylome. Nat. Biotechnol. 27, 341–342 (2009). 32. Weisenberger, D.J. et al. DNA methylation analysis by digital bisulfite genomic sequencing and digital MethyLight. Nucleic Acids Res. 36, 4689–4698 (2008).
33. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008). 34. Laird, P.W. Principles and challenges of genome-wide DNA methylation analysis. Nat. Rev. Genet. 11, 191–203 (2010). 35. Bourc’his, D., Xu, G.L., Lin, C.S., Bollman, B. & Bestor, T.H. Dnmt3L and the establishment of maternal genomic imprints. Science 294, 2536–2539 (2001). 36. Chen, Z.X., Mann, J.R., Hsieh, C.L., Riggs, A.D. & Chedin, F. Physical and functional interactions between the human DNMT3L protein and members of the de novo methyltransferase family. J. Cell. Biochem. 95, 902–917 (2005). 37. Holz-Schietinger, C. & Reich, N.O. The inherent processivity of the human de novo DNA methyltransferase 3A (DNMT3A) is enhanced by DNMT3L. J. Biol. Chem. 285, 29091–29100 (2010). 38. Chuang, L.S. et al. Human DNA-(cytosine-5) methyltransferase-PCNA complex as a target for p21WAF1. Science 277, 1996–2000 (1997). 39. Bostick, M. et al. UHRF1 plays a role in maintaining DNA methylation in mammalian cells. Science 317, 1760–1764 (2007). 40. Jones, P.A. & Liang, G. Rethinking how DNA methylation patterns are maintained. Nat. Rev. Genet. 10, 805–811 (2009). 41. Jeong, S. et al. Selective anchoring of DNA methyltransferases 3A and 3B to nucleosomes containing methylated DNA. Mol. Cell. Biol. 29, 5366–5376 (2009). 42. Goll, M.G. et al. Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science 311, 395–398 (2006). 43. Ooi, S.K. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714–717 (2007). 44. Tachibana, M., Matsumura, Y., Fukuda, M., Kimura, H. & Shinkai, Y. G9a/GLP complexes independently mediate H3K9 and DNA methylation to silence transcription. EMBO J. 27, 2681–2690 (2008). 45. Zhao, Q. et al. PRMT5-mediated methylation of histone H4R3 recruits DNMT3A, coupling histone and DNA methylation in gene silencing. Nat. Struct. Mol. Biol.
review
Epigenetic modifications as therapeutic targets
© 2010 Nature America, Inc. All rights reserved.
Theresa K Kelly1,2, Daniel D De Carvalho1,2 & Peter A Jones1
Epigenetic modifications work in concert with genetic mechanisms to regulate transcriptional activity in normal tissues and are often dysregulated in disease. Although they are somatically heritable, modifications of DNA and histones are also reversible, making them good targets for therapeutic intervention. Epigenetic changes often precede disease pathology, making them valuable diagnostic indicators for disease risk or prognostic indicators for disease progression. Several inhibitors of histone deacetylation or DNA methylation are approved for hematological malignancies by the US Food and Drug Administration and have been in clinical use for several years. More recently, histone methylation and microRNA expression have gained attention as potential therapeutic targets. The presence of multiple epigenetic aberrations within malignant tissue and the abilities of cells to develop resistance suggest that epigenetic therapies are most beneficial when combined with other anticancer strategies, such as signal transduction inhibitors or cytotoxic treatments. A key challenge for future epigenetic therapies will be to develop inhibitors with specificity to particular regions of chromosomes, thereby potentially reducing side effects.
Epigenetics encompasses the wide range of heritable changes in gene expression that do not result from an alteration in the DNA sequence itself. DNA methylation, the reversible post-translational modification of the range of histone variants, and nucleosome positioning collectively define the epigenetic landscape of a cell1,2. DNA methylation occurs when a methyl group is added to the 5′ position of the cytosine ring of CpG dinucleotides. Recently, methylation in embryonic stem cells was also suggested to occur at sites other than CpG dinucleotides, mainly on the cytosine of CHH or CHG trinucleotides (where H = A, C or T)3.
In addition, it was recently shown that 5-methylcytosine can be converted into 5-hydroxymethylcytosine by members of the TET protein family4, mainly in embryonic stem cells and Purkinje cells5. The biological relevance of these recently described types of methylation is an area of active investigation. Histones can be covalently modified after translation by the addition of methyl, acetyl, phosphoryl, ubiquityl or sumoyl groups. Whether the modification facilitates or inhibits transcription depends on the histone residue modified and the type of modification. The localization of nucleosomes within genomic regulatory regions has an important role in creating environments that either permit or prevent transcription. Nucleosomes consist of DNA wrapped around a core of two copies of each of the H2A, H2B, H3 and H4 histone proteins, thus linking DNA methylation and histone modifications. The presence of particular variants of core histone proteins, such as H3.3 and H2A.Z, at specific genomic loci influences the stability of nucleosome occupancy. Thus, multiple levels of epigenetic control account for appropriate orchestration of gene expression in healthy cells and dysregulated gene expression in disease.
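The sequence contexts named above (CpG dinucleotides versus CHG and CHH trinucleotides, where H = A, C or T) can be made concrete with a short sketch. This is only an illustration under stated assumptions: the function name `cytosine_contexts` is hypothetical, only the forward strand is scanned, and cytosines followed by an ambiguous base or lying too close to the end of the sequence are skipped. It is not a method taken from the studies cited here.

```python
def cytosine_contexts(seq):
    """Label each forward-strand cytosine as CG, CHG or CHH (H = A, C or T).

    Hypothetical helper for illustration; returns (0-based position, label) pairs.
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1 : i + 3]  # up to two bases downstream of the cytosine
        if len(nxt) >= 1 and nxt[0] == "G":
            label = "CG"   # CpG dinucleotide
        elif len(nxt) == 2 and nxt[0] in "ACT" and nxt[1] == "G":
            label = "CHG"
        elif len(nxt) == 2 and nxt[0] in "ACT" and nxt[1] in "ACT":
            label = "CHH"
        else:
            label = None   # sequence end reached, or an ambiguous base such as N
        if label is not None:
            contexts.append((i, label))
    return contexts


if __name__ == "__main__":
    for position, context in cytosine_contexts("ACGTCAGCTTC"):
        print(position, context)
```

Scanning the reverse complement in the same way would give the contexts for cytosines on the opposite strand.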
1Departments of Urology and Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, California, USA. 2These authors contributed equally to this work. Correspondence should be addressed to P.A.J. ([email protected]). Published online 13 October 2010; doi:10.1038/nbt.1678
Here, we focus on recent examples in which epigenetic modifications have been used to evaluate disease risk, progression and clinical response. We aim to provide a broad overview of the accomplishments, remaining challenges and unrealized potential of epigenetic therapies in a range of diseases, with a particular emphasis on cancer.
Epigenetic disease mechanisms and their clinical relevance
Epigenetic aberrations have been well established in cancer6,7 and occur in several other diseases, including diabetes8, lupus9, asthma10 and a variety of neurological disorders7,11–13 (Table 1 and references within). In cancer cells, a global loss of DNA methylation (hypomethylation), particularly in gene bodies and intergenic regions (including repetitive elements), leads to genomic instability. This global hypomethylation is accompanied by increased de novo methylation (hypermethylation) of many promoters of tumor suppressors and other genes that are contained within CpG islands. This results in stable gene silencing (Fig. 1). In addition to changes in DNA methylation, cancer cells are characterized by a global loss of histone H4 Lys16 (H4K16) acetylation and H4K20 trimethylation. There is also increased expression of BMI1, a component of the polycomb repressive complex (PRC)-1, and EZH2, a histone-methylating component of PRC2, which both inhibit gene expression6,14. Notably, recent evidence has shown that genes targeted by the PRC in embryonic stem cells are more likely than others to become methylated in cancer15–17, suggesting that aberrant linkage between polycomb repression and the silencing of gene expression by DNA methylation may at least partly account for early changes seen during oncogenesis. Further understanding of the basis of this switch in epigenetic silencing mechanisms may provide new avenues to evaluate the tumorigenic potential of abnormal tissue.
Table 1 Selected examples of epigenetic alterations associated with disease

| Epigenetic aberration | Enzyme responsible | Disease | Epigenetic alteration | Comments | Reference |
|---|---|---|---|---|---|
| DNA methylation | DNMT1, DNMT3A, DNMT3B and DNMT3L | Rett syndrome | Inability to ‘read’ DNA methylation | MECP2 mutation | 11–13 |
| | | Diabetes | Hypermethylation of PPARGC1A promoter | | 8 |
| | | Cancer | Global hypomethylation, hypermethylation of some CpG island promoters, including CIMP | | 6,7,11 |
| | | Systemic lupus erythematosus | Hypomethylation of CpG islands at specific promoter regions | Decreased DNMT1 and DNMT3B expression | 9 |
| | | ICF syndrome | Hypomethylation at specific sites | DNMT3B mutation | 11–13 |
| | | ATR-X syndrome | Hypomethylation of specific repeat and satellite sequences | ATRX mutation | 11,12 |
| Histone acetylation | HATs and HDACs | Rubinstein-Taybi syndrome | Hypoacetylation | Mutation in gene encoding CBP, a known HAT | 11–13 |
| | | Diabetes | Hyperacetylation at promoters of inflammatory genes | | 8 |
| | | Asthma | Hyperacetylation | Increased HAT activity and decreased HDAC activity | 10 |
| | | Cancer | H4K16 acetylation loss | Hypomethylation of DNA repetitive sequences | 6 |
| Histone methylation | HMTs and HDMs | Cancer | H4K20me3 loss | Hypomethylation of DNA repetitive sequences | 6 |
| | | Sotos syndrome | Decreased H4K20me3 and H3K36me3 | Loss of function of NSD1, a HMT | 113 |
| | | Huntington’s disease | Increased H3K9me3 and possibly increased H3K27 trimethylation | Increased expression of the HMT ESET; enhanced PRC2 activity | 12 |
| miRNA expression | N/A | Cancer | Decreased miR-101 | Increased EZH2, H3K27 trimethylation | 74,87 |
| | | | Decreased miR-143 | Increased DNMT3A | 88 |
| | | | Decreased miR-29 | Increased DNMT3A and DNMT3B | 89 |
| | | | Increased miR-21 | Decreased PTEN | 96 |
| | | | Increased miR-155 | Lower survival rates | 95 |

ATR-X, alpha-thalassemia X-linked; CIMP, CpG island methylator phenotype; HAT, histone acetyltransferase; HDM, histone demethylase; HMT, histone methyltransferase; ICF, immunodeficiency, centromere instability and facial anomalies; me3, trimethylation.

Epigenetic modifications can be used to stratify disease subtypes, severity or treatment responsiveness18 and to predict clinical outcomes19,20. H3 acetylation and H3K9 dimethylation can discriminate between cancerous and nonmalignant prostate tissue, and H3K4 trimethylation can predict the recurrence of prostate-specific antigen accumulation after prostatectomy21. EZH2 expression is an independent prognostic marker that is correlated with the aggressiveness of prostate, breast and endometrial cancers22. Expression of the DNA repair gene O(6)-methylguanine-DNA methyltransferase (MGMT) antagonizes chemotherapy and radiation treatment23. Accordingly, silencing of MGMT by endogenous hypermethylation is correlated with positive treatment response. Furthermore, epigenetic alterations can precede tumor formation and are thus potential diagnostic indicators of disease risk24. For example, infection with Helicobacter pylori is associated with DNA hypermethylation of specific genes, which are often methylated in cancer25. Thus, reversal of epigenetic alterations that occur as a result of an acute illness may prevent progression to a more chronic disease state.
The growing development of technologies to analyze the epigenome has led to the emergence of pharmacoepigenomics, the use of epigenetic profiles to identify molecular pathways most sensitive to cancer drugs26 as a means of prioritizing therapeutic strategies. In non–small-cell lung cancer, an unmethylated IGFBP3 promoter indicates responsiveness to cisplatin-based chemotherapy27. A polymorphism in the gene encoding the CYP2C19*2 variant of a cytochrome P450 protein necessitates the use of higher doses of valproic acid (VPA) to achieve target plasma concentrations28. Furthermore, epigenetic changes can be monitored to measure treatment efficacy and disease progression. Methylation of PITX2 can be used to predict outcomes of individuals with early-stage breast cancer after adjuvant tamoxifen therapy29. Patients with hypermethylation of the gene encoding p16 (CDKN2A) have lower recurrence rates
of bladder cancer compared to patients with no hypermethylation after interleukin-2 treatment30. As epigenetic mechanisms determine which genes, and thus signaling pathways, can be activated, the presence of distinct modifications on specific genes and subsets of genes can aid at several steps in determining and monitoring optimal therapeutic approaches.
The reversibility of epigenetic modifications makes them more ‘druggable’ than attempts to target or correct defects in the gene sequence itself. Moreover, it is possible that cancer cells can become ‘addicted’ to the aberrant epigenetic landscape resulting from multiple epigenetic abnormalities31, rendering them more sensitive than normal cells to epigenetic therapy through a mechanism similar to an inverted oncogene addiction. A classic example of oncogene addiction is mesenchymal-epithelial transition factor (MET), a tyrosine kinase that acts as a receptor for hepatocyte growth factor and controls tissue homeostasis in normal cells32. MET can be aberrantly activated in cancer by ligand-dependent mechanisms or by overexpression32. Although MET has roles in both normal and cancer cells, the latter are more sensitive to MET inhibition owing to their greater reliance on MET signaling32. Thus, cancer cells become dependent on (and consequently addicted to) increased activity of a few highly important oncogenes. It is possible that cancer cells undergo a parallel process by which they become dependent on aberrant silencing or inactivation of a few crucial tumor suppressor genes. As it is well known that several tumor suppressor genes are silenced in cancer by epigenetic mechanisms6, it is possible that cancer cells become addicted to their aberrant epigenetic landscape and consequently become more sensitive to epigenetic therapy than normal cells. There is some evidence that cancer cells are preferentially affected by epigenetic therapies33.
We next consider progress and remaining challenges in manipulating DNA methylation and histone modifications for therapeutic purposes, including microRNAs (miRNAs), which can also affect gene expression without altering DNA sequence and regulate as well as be regulated by epigenetic mechanisms. What are the merits and limitations of therapeutic strategies that intervene at these distinct levels of regulation of the epigenetic landscape? Moreover, how might they be used together or in combination with nonepigenetic therapies to prevent disease and remission?
[Figure 1 image not reproduced. Panels: Normal and Cancer. Legend: acetylation; trimethylation (H3K4, H3K27); methylated CpG; unmethylated CpG.]
Figure 1 Epigenetic aberrations of CpG island promoters in cancer cells and the epigenetic therapies that target them. Tumor suppressor genes (such as FBXO32, MLH1 and RUNX3) are expressed in normal cells and become silenced in cancer cells. This can occur either by PRC reprogramming (as for FBXO32), where the polycomb group protein EZH2 catalyzes the methylation of H3K27, or by 5-methylcytosine (5mC) reprogramming (as for MLH1 and RUNX3) owing to de novo DNA methylation by DNMT3A and DNMT3B. Polycomb-mediated repression can be targeted by inhibitors of PRC2, such as DZNep, and re-expression of these genes can be enhanced by HDAC and LSD1 inhibitors allowing acetylation of H3 and H4 and methylation of H3K4, respectively. Polycomb-mediated repression can also be reversed by inducing miR-101 expression, which inhibits the expression and function of EZH2. 5mC reprogramming can be reversed, mainly by DNMT inhibitors, but also by re-expression of miR-143 and miR-29, two miRNAs that target de novo DNMTs. LSD1 inhibitors may also reactivate tumor suppressor genes by inhibiting DNMT1 stabilization, leading to loss of DNA methylation maintenance. Genes that are polycomb-repressed in normal cells (such as PAX7) can undergo epigenetic switching by DNA methylation, thus losing their plasticity during transformation. It is not known whether treatment of cancer cells with DNMT inhibitors alone can reverse epigenetic switching to restore the polycomb-repressed state or whether it will reactivate this set of genes. Cancer-testis antigens (CTAs, such as NY-ESO-1) can become silenced by DNA methylation in cancer. Treatment with DNMT inhibitors can induce CTA expression, allowing the immune system to recognize and kill the cancer cells. Red arrows represent epigenetic alterations during transformation; green arrows represent reversion of these alterations by epigenetic therapy.
DNA methylation
Cancer is characterized by global hypomethylation, with hypermethylation of a subset of gene promoters contained within CpG islands leading to gene silencing (Fig. 1)6. This hypermethylation has recently been described to extend past the boundaries of CpG islands into so-called DNA shores34. DNA (cytosine-5)-methyltransferase (DNMT)-3A and DNMT3B are responsible for de novo DNA methylation patterns, which are then copied to daughter cells during S phase by DNMT1. DNA methylation inhibitors have been well characterized and tested in clinical trials35. 5-Azacytidine (5-Aza-CR; Vidaza; azacitidine), a nucleoside analog that is incorporated into RNA and DNA, is approved to treat patients with high-risk myelodysplastic syndromes (MDS) and successful clinical results have recently been reported (Tables 2 and 3)36. 5-Aza-2′-deoxycytidine (5-Aza-CdR; Dacogen; decitabine) is the deoxy derivative of 5-Aza-CR and is incorporated only into DNA. At low doses, both azanucleosides act by sequestering DNMT enzymes after incorporation into DNA, leading to global demethylation as cells divide. At higher doses, they
induce cytotoxicity. Zebularine is a cytidine analog that acts similarly to 5-Aza-CR but has lower toxicity and greater stability and specificity37. Another drug for which promising preclinical data are available is S110, a decitabine derivative with better stability and activity than 5-Aza-CdR (Fig. 2)38. In addition to inhibiting DNMT activity, azanucleosides act through nonspecific mechanisms, which are likely to contribute to their clinical effectiveness. Analysis of promoter DNA methylation can classify cancers26,39,40, predict the progression of cancer41,42 and direct therapy43,44. For example, DNA methylation of specific promoters may identify a subset of colorectal cancers that are responsive to 5-fluorouracil43. Furthermore, use of DNA methylation inhibitors to reverse the silencing of MLH1 restores sensitivity to cisplatin45. This suggests that combining DNA methylation inhibitors with conventional chemotherapy drugs increases therapeutic efficacy. Successful conventional chemotherapy depends on activation of proapoptotic genes that respond to cytotoxic agents, leading to cell death. DNA methylation of these proapoptotic genes can prevent cell death, which in turn confers resistance to chemotherapy. Thus, reactivation of epigenetically silenced apoptotic genes should increase the efficacy of chemotherapy. For example, APAF1 is silenced in metastatic melanoma cells, and treatment with 5-Aza-CdR restores expression and chemosensivity44. Conversely, methylation-induced silencing of DNA repair genes can be detrimental (by leading to microsatellite instability46) or beneficial (by preventing the repair of genes targeted by chemotherapy, causing cells to undergo apoptosis rather than repair47). Methylationinduced silencing of cancer-testis antigens, such as NY-ESO-1, can protect cancer cells from being recognized by T cells. 
Treating cancer cells with demethylating agents can induce the expression of these antigens, allowing recognition and killing by engineered cytotoxic 1071
Table 2 Selected clinical trials of epigenetic cancer therapies with published findings

| Regimen | Epigenetic target | Agent | Phase of study | Disease | Number of subjects | Findings | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DNMT inhibitor alone | DNMTs | 5-Aza-CR | 2/3 | MDS and AML | 309 | Complete remission in 10–17% and hematological improvement in 23–36% | 114 |
| DNMT inhibitor alone | DNMTs | 5-Aza-CR | 3 | MDS | 358 | Better overall survival than with conventional care (24.5 vs. 15 months) | 36 |
| DNMT inhibitor alone | DNMTs | 5-Aza-CdR | 2 | MDS and CMML | 95 | Anti-MDS and anti-CMML activities with a safe toxicity profile; 34% of patients achieved complete response and 73% had objective response | 115 |
| HDAC inhibitor alone | HDAC | Phenylbutyrate | 1 | MDS and AML | 27 | Well tolerated; no patients achieved complete or partial remission, although four achieved hematological improvement | 116 |
| HDAC inhibitor alone | HDAC | Vorinostat (SAHA) | 1 | Relapsed or refractory AML, CLL, MDS, ALL and CML | 41 | Seven of 31 AML patients showed hematological improvement, including two complete responses and two complete responses with incomplete blood count recovery | 117 |
| HDAC inhibitor alone | HDAC | Vorinostat (SAHA) | 1 | Advanced solid and hematologic malignancies | 73 | One complete response (diffuse large B-cell lymphoma), three partial responses (cutaneous T-cell lymphoma) | 118 |
| Combination therapy | DNMTs and HDAC | 5-Aza-CR and VPA | 1 | Advanced solid cancers | 55 | Combination is safe; 25% of patients showed stable disease (median, 6 months) | 110 |
| Combination therapy | DNMTs and HDAC | 5-Aza-CR and phenylbutyrate | 1 | Refractory solid tumors | 27 | Combination is safe; no clinical benefit | 112 |
| Combination therapy | HDAC | Vorinostat and doxorubicin | 1 | Solid tumors | 32 | Two of 24 showed partial responses (breast and prostate cancer) and two stable disease for more than 8 months (melanoma) | 119 |
| Combination therapy | HDAC | Vorinostat plus carboplatin and paclitaxel | 1 | Advanced non–small-cell lung cancer | 94 | Better response ratio (34% versus 12.5%), progression-free survival (6 versus 4.1 months) and overall survival (13 versus 9.7 months) than with placebo plus carboplatin and paclitaxel | 106 |

ALL, acute lymphoblastic leukemia; CLL, chronic lymphocytic leukemia; CML, chronic myelogenous leukemia; CMML, chronic myelomonocytic leukemia; SAHA, suberoylanilide hydroxamic acid.
T lymphocytes48. This suggests the possibility of augmenting the efficacy of immunotherapy by combining it with drugs that modulate epigenetic regulation (Fig. 1).

Despite the clinical successes achieved with DNA methylation inhibitors, there is still considerable room for improvement. The available DNA methylation inhibitors block DNA methylation by trapping DNMT enzymes on DNA, preventing methylation at other genomic loci. Notwithstanding the therapeutic benefits of simultaneously counteracting the broad hypermethylation of tumor suppressor genes characteristic of most cancers, global hypomethylation may lead to activation of oncogenes and/or increased genomic instability. Moreover, DNA hypomethylation can activate promoters within repetitive elements. For example, hypomethylation of long interspersed nuclear element-1 can activate an alternative transcript of the MET oncogene in bladder cancer49. In addition, DNA methylation inhibitors have been implicated in defects in memory-associated neural plasticity, suggesting a link between DNA methylation and neural plasticity associated with learning and memory50. Developing DNA methylation inhibitors that target specific genes or groups of genes would overcome the perceived risks of agents that cause global DNA demethylation.

Furthermore, because DNA methylation inhibitors act during the S phase of the cell cycle, they preferentially affect rapidly growing cells. This is advantageous when treating rapidly dividing cancer cells but may be less clinically useful in treating diseases that are not characterized by rapid cell cycling. Moreover, the observation that DNA methylation returns to pretreatment levels upon withdrawal of azanucleosides11 suggests a continual need for DNMT inhibition. Thus, despite the clinical success of DNA methylation inhibitors, their lack of specificity, cell cycle dependency and need for continuous administration leave room for the development of better therapies.

Histone modifications

Whereas DNA methylation is considered to be a very stable epigenetic modification, histone modifications are more labile. Levels of histone modifications are maintained by the balance between the activities of histone-modifying enzymes that add or remove specific modifications. As aberrant histone modification levels result from an imbalance in these modifying enzymes in diseased tissue, correcting the increased or decreased level of a particular enzyme should restore the natural equilibrium in the affected cells. Cancer cells are characterized by dysregulation of histone methyltransferases and histone demethylases, overexpression of histone deacetylases (HDACs), and a global reduction in levels of histone acetylation6,14,51–53.

HDAC inhibitors have long been studied in the clinical setting as potential therapies (Fig. 2), and recent clinical trials of these agents have been extensively reviewed elsewhere (see also Tables 2 and 3)54. HDAC inhibitors can also affect the acetylation
[Figure 2: chemical structures, grouped by epigenetic target and stage of development. FDA approved: 5-azacytidine and 5-aza-2′-deoxycytidine (DNA methylation); vorinostat (SAHA) and romidepsin (histone acetylation); N/A (histone methylation). Clinical trials: hydralazine (DNA methylation); phenylbutyrate and entinostat (MS-275) (histone acetylation); N/A (histone methylation). Preclinical: S110 (DNA methylation); PCI-34051 (histone acetylation); SL11144 and DZNep (histone methylation).]
Figure 2 Chemical structures of selected compounds that target epigenetic modifications. Several molecules that target epigenetic alterations in pathological states are currently at different stages of drug development. The nucleoside analogs 5-azacytidine and 5-aza-2′-deoxycytidine are approved by the US Food and Drug Administration (FDA) to treat high-risk MDS, and successful clinical results have been reported. The drug hydralazine is currently being investigated in clinical trials as a putative demethylating agent against solid tumors. S110, a dinucleotide containing 5-aza-CdR, has been shown in vitro to demethylate DNA and is more stable than 5-aza-CdR because it is less sensitive to deamination by cytidine deaminase. Targeting of histone acetylation has also been a successful example of epigenetic therapy. Several HDAC inhibitors are FDA approved, including the hydroxamic acid–based compound SAHA and the depsipeptide romidepsin, whereas others are currently in clinical trials for cancer (phenylbutyrate and entinostat) and neurologic diseases (entinostat). New molecules targeting specific HDACs are under preclinical investigation (such as PCI-34051, which targets HDAC8). More recently, significant effort is under way to find new molecules able to target histone methylation. To our knowledge, no drugs targeting histone methylation are FDA approved or in clinical trials. Even so, preclinical trials suggest antitumor activity of the oligoamine analog SL11144, which inhibits LSD1, and the S-adenosylhomocysteine hydrolase inhibitor DZNep, which depletes cellular levels of PRC2 components.
of proteins other than histones, potentially leading to more global effects54. Furthermore, because HDAC inhibitors only target ~10% of all acetylation sites55, more work is necessary to understand the underlying basis for target specification of global and isoform-specific HDAC inhibitors.

Substantial efforts are currently under way to find new molecules that can selectively inhibit specific HDACs56,57 and thus avoid the side effects that occur with a global HDAC inhibitor, including cardiac toxicity54 and deficits in hematopoiesis58 and memory formation59–61. To date, specific inhibitors of HDAC6 (class II) and HDAC8 (class I) have been developed56,57. When combined with a better understanding of the pathophysiology of diseases associated with alterations in HDACs, the development of specific HDAC inhibitors will allow more rational therapy and potentially reduce side effects. For example, the HDAC inhibitor PCI-34051, which is derived from a low-molecular-weight hydroxamic acid scaffold, selectively inhibits HDAC8 and induces apoptosis in T-cell lymphomas but not other tumor or normal cells. This indicates that HDAC8 has an important role in the pathophysiology of this disease and suggests that therapy with an HDAC8-specific inhibitor can reduce undesirable side effects57.

Other HDAC inhibitors are selective to a group of HDAC isoforms, rather than a specific isoform, allowing their use for a wider range of diseases while minimizing side effects. For example, MGCD0103 (mocetinostat), which inhibits HDAC isoforms 1, 2 and 3 (class I) and 11 (class IV), was shown in clinical trials to be tolerable and to inhibit histone deacetylation in patients with advanced solid tumors62. MGCD0103 was also shown to be safe and to have antileukemia effects63. Although the identification of additional specific HDAC inhibitors will increase specificity and the possibility of personalized treatments, it may also limit the likelihood of their successful incorporation into combinatorial therapies.

Histone methyltransferase and demethylase enzymes are generally more specific than HDACs in that they target fewer residues64. However, like HDACs, lysine and arginine methyltransferase enzymes also methylate proteins other than histones65,66. A great deal of effort is under way to find drugs able to revert specific histone methylation marks or to selectively target histone methyltransferases or histone demethylases. In this regard, a new class of oligoamine analogs was recently found to act as potent inhibitors of lysine-specific demethylase-1 (LSD1; Fig. 2). LSD1 targets the activating H3K4 mono- and dimethylation mark but can also target the repressive H3K9 dimethylation (H3K9me2) mark when complexed with the androgen receptor51,67. Treatment of colon cancer cells with LSD1 inhibitors (such as SL11144) increases H3K4 methylation, decreases H3K9me2, and restores expression of SFRP2 (ref. 68), indicating context specificity of LSD1 and its inhibitors. LSD1 inhibition in neuroblastoma results in decreased proliferation in vitro and reduced xenograft growth69. Notably, LSD1 can also demethylate DNMT1, resulting in destabilization and loss of global maintenance of DNA methylation70. The ability of LSD1 to affect both histone and DNA methylation makes it a promising target for epigenetic therapy.

The repression mediated by the H3K27 trimethylation (H3K27me3) mark occurs through the actions of two multisubunit complexes, PRC1 and PRC2. The H3K27me3 mark deposited by EZH2 is recognized and bound by PRC1, which can further recruit additional proteins to establish a repressed chromatin configuration6. Gene promoters that are marked by PRC2 (that is, polycomb target genes) in embryonic stem cells have recently been shown to be far more likely than other genes to become methylated in cancer15–17. Similarly, polycomb targets in normal prostate cells also become methylated in prostate cancer71. Thus, alterations in chromatin structure do not always coincide with changes in gene expression associated with disease. Instead, DNA methylation replacement of polycomb repressive marks ‘locks in’ an inactive chromatin state through a process called epigenetic switching71. Although the mechanism underlying the predisposition of polycomb targets for DNA methylation is not fully understood, some links have recently been uncovered. CBX7, a component of the PRC1 complex, can directly interact with DNMT1 and DNMT3B at polycomb target genes72.

Although drugs that target histone methyltransferases and demethylases have considerable potential, more work is necessary to determine their specificities and the stabilities of the changes they effect. There are currently no such drugs in clinical trials. Preclinical studies suggest that the S-adenosylhomocysteine hydrolase inhibitor 3-deazaneplanocin A (DZNep) shows the most promise (Fig. 2). DZNep depletes cellular levels of PRC2 components (EZH2, EED and SUZ12) and consequently reduces H3K27me3 levels and induces apoptosis in breast cancer cells, but not in normal cells73. The effect of DZNep is similar to that observed when EZH2 is depleted by RNA interference, suggesting that this drug may be more effective in cancers of the prostate and breast, which rely on abnormally high EZH2 expression levels74. In contrast, a subsequent study showed that DZNep also decreases H4K20me3. This demonstration that DZNep lacks specificity and acts more as a global histone methylation inhibitor underscores the need for further development of histone methylation inhibitors75.
Table 3 Epigenetic cancer therapies under commercial development (either in safety and efficacy trials or approved)

| Class | Drug | Sponsor | Indication | Clinical status |
| --- | --- | --- | --- | --- |
| DNMT inhibitors | 5-Aza-CdR (Dacogen) | Eisai (Tokyo) | MDS | Approved May 2006 |
| DNMT inhibitors | 5-Aza-CdR (Dacogen) | Eisai (Tokyo) | AML 1st line | Phase 3 in 480 patients |
| DNMT inhibitors | 5-Aza-CdR (Dacogen) | Eisai (Tokyo) | CML | Phase 2 in 19 patients |
| DNMT inhibitors | 5-Aza-CR (Vidaza) | Celgene (Summit, NJ, USA) | MDS | Approved May 2004 |
| DNMT inhibitors | 5-Aza-CR (Vidaza) | Celgene (Summit, NJ, USA) | AML | Phase 3 targeting 480 patients |
| DNMT inhibitors | 5-Aza-CR (Vidaza) | Celgene (Summit, NJ, USA) | Hematologic cancer | Phase 2 |
| DNMT inhibitors | S110 (dinucleotide prodrug of decitabine) | SuperGen (Dublin, CA, USA) | MDS and AML | New Drug Application |
| HDAC inhibitors | Romidepsin (Istodax; a cyclic depsipeptide) | Celgene | CTCL | Approved November 2009 |
| HDAC inhibitors | Romidepsin (Istodax; a cyclic depsipeptide) | Celgene | NHL | Phase 2 |
| HDAC inhibitors | Vorinostat (Zolinza; suberoylanilide hydroxamic acid) | Merck (Whitehouse Station, NJ, USA) | CTCL | Approved October 2006 |
| HDAC inhibitors | Vorinostat (Zolinza; suberoylanilide hydroxamic acid) | Merck (Whitehouse Station, NJ, USA) | Mesothelioma | Phase 3 targeting 660 patients |
| HDAC inhibitors | Vorinostat (Zolinza; suberoylanilide hydroxamic acid) | Merck (Whitehouse Station, NJ, USA) | MDS, NHL, brain cancer and NSCLC | Phase 2 |
| HDAC inhibitors | Vorinostat + bortezomib (Velcade) | Merck (Whitehouse Station, NJ, USA) | Multiple myeloma | Phase 2 and 3 targeting 742 patients |
| HDAC inhibitors | Panobinostat (LBH589; hydroxamate analog) | Novartis (Basel) | Hodgkin’s lymphoma | Phase 3 in 367 patients |
| HDAC inhibitors | Panobinostat (LBH589; hydroxamate analog) | Novartis (Basel) | CML, AML and MDS | Phase 2/3 |
| HDAC inhibitors | Panobinostat + bortezomib + dexamethasone | Novartis (Basel) | Multiple myeloma | Phase 3 targeting 676 patients |
| HDAC inhibitors | Belinostat (PXD101; hydroxamate analog) | Spectrum Pharmaceuticals (Irvine, CA, USA) | AML, CTCL, MDS, NHL and ovarian cancer | Phase 2 |
| HDAC inhibitors | Mocetinostat dihydrobromide (MGCD0103; aminopyrimidine analog) | MethylGene (Montreal, QC, Canada) | AML, CLL, Hodgkin’s lymphoma, NHL, pancreatic cancer and thymic carcinoma | Phase 2 |
| HDAC inhibitors | Entinostat (SNDX-275; synthetic benzamide derivative) | Syndax Pharmaceuticals (Waltham, MA, USA) | Breast cancer, Hodgkin’s lymphoma and NSCLC | Phase 2 |
| HDAC inhibitors | PCI-24781 (CRA-024781; hydroxamic acid derivative) | Pharmacyclics (Sunnyvale, CA, USA) | Hematologic cancer and sarcoma | Phase 1/2 |
| Other | 131I-conjugated monoclonal antibody targeting DNA–histone H1 complexes (Cotara) | Peregrine Pharmaceuticals (Tustin, CA, USA) | Glioblastoma multiforme | Phase 2 in 40 patients |

CLL, chronic lymphocytic leukemia; CML, chronic myelogenous leukemia; CTCL, cutaneous T-cell lymphoma; NHL, non-Hodgkin’s lymphoma; NSCLC, non–small-cell lung cancer. Sources: BioMedTracker, Thomson Pharma Partnering and PubMed.
EZH2 activity can also be regulated by signaling cascades. For example, AKT phosphorylates EZH2 at Ser21, suppressing its methyltransferase activity and thereby reducing levels of H3K27me3 (ref. 76). H3K27me3 levels can be restored using LY294002, an inhibitor of the phosphatidylinositol-3-kinase and AKT pathway, opening a new therapeutic opportunity to repair epigenetic alterations by targeting upstream signaling pathways. Furthermore, in prostate cancer, the oncogenic ETS transcription factor ERG can bind to the EZH2 promoter and induce overexpression. Thus, pharmacological disruption of ERG activity could reduce the EZH2 overexpression observed in cancer77. EZH2 is a particularly important example because it is frequently overexpressed and aberrantly targeted to genes in cancer71, a process termed PRC reprogramming (Fig. 1).

G9a and G9a-like protein (GLP) are histone methyltransferases that catalyze H3K9 dimethylation and are often overexpressed in tumors78. Knockdown of G9a in prostate cancer cells indicates a crucial role for this protein in regulating centrosome duplication and chromatin structure. The likely importance of G9a in perpetuating the malignant phenotype and its promise as a target in cancer therapy79 have generated substantial interest in developing G9a and GLP inhibitors. Thus far, the most efficient inhibitor is BIX-01294, a diazepine-quinazolineamine derivative that transiently reduces global H3K9me2 levels in several cell lines80. BIX-01294 binds to the SET domain of GLP in the same groove at which the target lysine (H3K9) binds. This prevents the binding of the peptide substrate and, consequently, the deposition of methylation marks at H3K9 (ref. 81).

Several other histone methyltransferases and demethylases have also been associated with diseases, making them potential targets for epigenetic therapy. For instance, MMSET, an H4K20 methyltransferase, is overexpressed in myeloma cell lines and is required for cell viability82. SMYD3, an H3K4 methyltransferase, is also highly expressed in cancer and seems to have a role in carcinogenesis as a coactivator of estrogen receptor-alpha83. Expression of GASC1, an H3K9 and H3K36 demethylase, is often amplified in cancer, and its inhibition decreases rates of cell proliferation84.

Although the challenges associated with targeting specific histone modifications have not prevented considerable clinical success with this group of targets, it seems likely that therapeutics capable of targeting specific histone-modifying enzymes could retain or increase therapeutic success rates while decreasing side effects resulting from the lack of specificity. In contrast, targeting individual histone-modifying enzymes may decrease clinical efficacy if histone-modifying enzymes not targeted by the drug in question compensate for any changes and thereby confer drug resistance. Designing personalized cocktails of inhibitors based on an individual’s need may help overcome the potential problems of compensation and resistance.

MicroRNAs

Small, noncoding miRNAs are able to induce heritable changes in gene expression without altering DNA sequence and thus contribute to the epigenetic landscape. In addition, miRNAs can both regulate and be regulated by other epigenetic mechanisms. Expression of miRNAs is dysregulated in several diseases, including cancer85 and certain neurodegenerative disorders86. For example, miR-101 targets EZH2 for degradation and is downregulated in several types of cancer, leading to increased EZH2 expression (and consequently higher H3K27me3 levels) and decreased expression of tumor suppressor genes74,87. Restoring expression of miR-101 leads to reduced H3K27me3 and inhibits colony formation and cancer cell proliferation74,87. Expression of miR-143 in colorectal cancer cells88 and the miR-29 family in lung
cancer cells89 reduces DNMT3A and DNMT3B levels, respectively, and results in decreased cell growth and colony formation. Treatment of cells with 5-Aza-CdR and 4-phenylbutyric acid results in miR-127 activation, which in turn downregulates the BCL6 oncogene in bladder cancer cells90. In fact, treatment with 5-Aza-CdR alone is sufficient to reactivate miR-148a, miR-34b/c and miR-9, a group of miRNAs capable of suppressing metastasis91.

In addition to inducing aberrantly repressed miRNAs using epigenetic drugs, replacement gene therapy may also be useful in reestablishing miRNA expression. Viral vectors generated by cloning individual or groups of human miRNAs have been successful in preclinical assays using a mouse model of hepatocellular carcinoma, in which miR-26a expression from an adeno-associated virus results in apoptosis and inhibition of cancer cell proliferation in the absence of toxicity92. Gene therapy using miRNAs has an advantage over conventional RNA interference in that it is unlikely to generate a strong type I interferon response because double-stranded RNA is not introduced to the cell93.

Abnormally high expression of miRNAs can be targeted using recently developed locked nucleic acid (LNA)–modified phosphorothioate oligonucleotide technology. LNA-modified oligonucleotides contain an extra bridge in their chemical composition, leading to enhanced stability compared to their unmodified counterparts. These LNA-modified phosphorothioate oligonucleotides can be designed to target miRNAs, creating LNA–anti-miRNAs that can be delivered systemically. In preclinical assays with primates, intravenous injections of LNA–anti-miRNA complementary to the 5′ end of miR-122 antagonized liver-specific expression of this miRNA without toxicity94. Phase 1 trials based on these promising results are currently under way. LNA–anti-miRNAs may be used to target aberrantly expressed miRNAs in other diseases, such as cancer. For example, miR-155 is upregulated in lung adenocarcinoma compared to noncancerous lung tissue, and patients with higher miR-155 expression have lower survival rates than do patients with lower miR-155 expression. This suggests that miR-155 is a promising target for LNA–anti-miRNA therapy (Table 1)95. Several other miRNAs are upregulated in cancer and could theoretically be used as LNA–anti-miRNA targets. For example, miR-21 is upregulated in several types of cancer (lung, breast, colon, gastric and prostate carcinomas; endocrine pancreatic tumors; glioblastomas; and cholangiocarcinomas) and targets the tumor suppressor PTEN (Table 1)96.

Thus, miRNAs can both alter the epigenetic machinery and be regulated by epigenetic alterations. This creates a highly controlled feedback mechanism, making miRNAs suitable targets for epigenetic therapy and possibly epigenetic drugs themselves. One unique advantage of targeting miRNAs is the ability of one miRNA to regulate several target genes and multiple cellular processes. In that way, if the level of one or a few miRNAs has changed in a pathological state, several different pathways could consequently be altered. Rather than trying to identify and directly target the proteins in multiple pathways, it would be more effective to restore the physiological level and functions of the dysregulated miRNA(s). This clinical potential highlights the importance of better understanding miRNA profiles in healthy and diseased tissues in order to develop better therapeutic strategies. Furthermore, multiple miRNAs that target different steps of an overactive pathway could be combined to increase efficacy and allow for customization of therapies to individual patients. Although the unique composition of miRNA-based therapy provides many benefits, additional research is necessary to determine the best method of delivery and increase miRNA stability to ensure efficacy.

Combined epigenetic therapies

The presence of multiple epigenetic aberrations in a single tissue, the ability of diseased cells to develop resistance, and the discovery that common sets of genes are regulated by distinct epigenetic mechanisms at different biological stages collectively point to the likely feasibility of combinatorial approaches to target epigenetic modulators. Efforts for more than 25 years to enhance therapeutic efficacy by combining epigenetic strategies97,98 have revealed both additive and synergistic effects, depending on the targets11,52.

Extensive work on the clinical benefits of combined DNMT and HDAC inhibition has been comprehensively reviewed elsewhere6,46,54,99,100. A recent phase 2 multicenter study examining the combination of 5-Aza-CR and the HDAC inhibitor VPA in patients with higher-risk MDS found that therapeutic levels of VPA may increase the efficacy of 5-Aza-CR28. Sequential administration of DNMT and HDAC inhibitors resulted in clinical efficacy in patients with hematologic malignancies54,99. However, other studies found no correlation between baseline methylation levels or methylation reversal and positive clinical outcome in patients with MDS or acute myeloid leukemia (AML) after combined treatment with 5-Aza-CR and entinostat101. The mechanism behind the clinical efficacy of sequential DNMT and HDAC inhibition remains controversial, and additional studies investigating potential genetic or epigenetic determinants of responsiveness will be helpful.

Besides inducing apoptosis in cancer cells, another therapeutic approach involves inducing differentiation of cancer cells. To this end, following 5-Aza-CR and VPA treatment with all-trans-retinoic acid resulted in global hypomethylation and histone acetylation and clinical response in nearly half of treated patients with AML or high-risk MDS102.

Although targeting of histone demethylases is still in its infancy, early preclinical studies show promise for using such drugs alone (as described above) or together with other epigenetic therapies. Restoration of the expression of SFRP2, a negative regulator of Wnt signaling, in a human colon cancer model after LSD1 and DNMT inhibition has been associated with significant growth inhibition of established tumors68. Notably, in addition to demethylating histone residues, LSD1 can demethylate DNMT1. This provides the ability to target both histone methylation and DNA methylation using a single compound. Cotreatment with the HDAC inhibitor panobinostat further enhances DZNep-mediated reduction in EZH2 levels, leading to increased p16, p21, p27 and FBXO32 expression and apoptosis in cultured AML cells and mouse models103. These promising data suggest that an absence of clinical trials targeting histone methylation and demethylation enzymes should not diminish enthusiasm for their therapeutic potential.

Epigenetic and cytotoxic therapies

Conventional chemotherapy can rapidly induce cell death in cancer cells, although resistance to standard chemotherapy often arises through epigenetic and DNA repair mechanisms27. As a result, epigenetic therapeutics can be combined with more conventional therapies to induce responsiveness or overcome resistance to cytotoxic treatments. Preconditioning with epigenetic drugs could reverse the epigenetic alteration(s) that confer resistance, restoring chemotherapeutic sensitivity. For example, 5-Aza-CR treatment can reverse DNA methylation, thereby overcoming the gene silencing that led to chemotherapeutic resistance104. In contrast, methylation-induced silencing of DNA repair genes, such as MGMT, is correlated with a positive clinical response to chemotherapy. Thus, the potential for success of combinations of DNA methylation inhibitors and chemotherapy may depend on the epigenetic profile of an individual tumor. Responses of patients with previously untreated non–small-cell lung cancer to combinations of the HDAC inhibitor vorinostat with carboplatin and paclitaxel were sufficiently promising to warrant a phase 2 study, which also showed encouraging results (Table 2)105,106.
Conclusions

Several molecular regulators of the cellular epigenetic landscape have been established as effective targets in successful therapies for a variety of malignancies. In particular, inhibitors of DNMTs or HDACs have been approved for cancer treatment. Although the mechanism(s) behind the therapeutic benefit of DNMT and HDAC inhibition are not fully understood, ongoing and future studies that combine genomic sequencing and expression data may provide the keys to understanding the mechanism(s) underlying responsiveness. Besides their methylation and acetylation, histones can be phosphorylated, ubiquitylated and sumoylated. These modifications, which have been less well studied in the context of disease, may expand current possibilities for therapeutic intervention.

Given the importance of epigenetic mechanisms in controlling development and normal cellular behavior, it seems that approaches capable of targeting specific epigenetic alterations, rather than affecting global modifications, would greatly enhance clinical efficiency while lowering toxicity and side effects. This is an important priority for the field.

Another major challenge in advancing epigenetic therapy will be to discriminate between so-called driver genes (those that must be epigenetically silenced for disease to occur) and so-called passenger genes (those that are epigenetically silenced owing to aberrant activity of the epigenetic machinery, but are not necessary for disease to occur). Recent advances in high-throughput technologies such as genome-wide sequencing, combined with RNA profiling, chromatin immunoprecipitation or bisulfite conversion, have generated large amounts of data that can be integrated to form a comprehensive understanding of the epigenetic alterations that are common and specific to various disease states. Assimilating these large datasets is likely to assist in identifying epigenetic alterations that are causative and those that are merely correlative107,108. Thus, it may eventually be possible for patients to be screened, using high-throughput technologies, and classified by epigenetic alterations of the driver genes responsible for their illness. Along with the development of targeted inhibitors of epigenetic modifications, this could open the way for the use of personalized targeted therapies.

Currently, despite the successful clinical use of epigenetic therapies to treat hematological malignancies, there has been little success in treating solid cancers (Tables 2 and 3)109–112. Initial clinical trials, which used treatment regimens later found to be less than optimal, resulted in low rates of positive clinical response. Administering more recently developed dosing and treatment schedules, and classifying tumor subtypes based on molecular signatures, may increase the efficacy of epigenetic therapy for solid tumors. Solid tumors invariably comprise heterogeneous populations of cells, many at different stages of differentiation. Clinical success may therefore require more effective approaches to determine which of these cells harbor epigenetic alterations and new strategies to ensure that therapeutic agents maintain stability and are able to penetrate the cellular mass and reach affected cells.

The recognition of epigenetics as a significant contributor to normal development and disease has opened new avenues for drug discovery and therapeutics, with a range of prospects that continues to expand as our knowledge of epigenetic regulation advances. Epigenetic therapies could be combined with conventional therapies to develop personalized treatments, render unresponsive tumors susceptible to treatment and reduce dosing. These advances may limit the side effects of treatment, improving compliance with dosing regimens and overall quality of life.
Acknowledgments
Supported by R37CA082422 and R01CA083867 (P.A.J.). We thank members of the Jones laboratory for helpful discussions and careful reading of the manuscript, particularly H. Han for help in drawing chemical structures.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).
2. Meissner, A. Epigenetic modifications and their role in pluripotency. Nat. Biotechnol. 28, 1079–1088 (2010).
3. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
4. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
5. Wu, S.C. & Zhang, Y. Active DNA demethylation: many roads lead to Rome. Nat. Rev. Mol. Cell Biol. 11, 607–620 (2010).
6. Jones, P.A. & Baylin, S.B. The epigenomics of cancer. Cell 128, 683–692 (2007).
7. Fouse, S.D. & Costello, J.F. Epigenetics of neurological cancers. Future Oncol. 5, 1615–1629 (2009).
8. Villeneuve, L.M. & Natarajan, R. The role of epigenetics in the pathology of diabetic complications. Am. J. Physiol. Renal Physiol. 299, F14–F25 (2010).
9. Javierre, B.M. et al. Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res. 20, 170–179 (2010).
10. Adcock, I.M., Ito, K. & Barnes, P.J. Histone deacetylation: an important mechanism in inflammatory lung diseases. COPD 2, 445–455 (2005).
11. Egger, G., Liang, G., Aparicio, A. & Jones, P.A. Epigenetics in human disease and prospects for epigenetic therapy. Nature 429, 457–463 (2004).
12. Urdinguio, R.G., Sanchez-Mut, J.V. & Esteller, M. Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies. Lancet Neurol. 8, 1056–1072 (2009).
13. Feng, J. & Fan, G. The role of DNA methylation in the central nervous system and neuropsychiatric disorders. Int. Rev. Neurobiol. 89, 67–84 (2009).
14. Sharma, S., Kelly, T.K. & Jones, P.A. Epigenetics in cancer. Carcinogenesis 31, 27–36 (2010).
15. Widschwendter, M. et al. Epigenetic stem cell signature in cancer. Nat. Genet. 39, 157–158 (2007).
16. Schlesinger, Y. et al. Polycomb-mediated methylation on Lys27 of histone H3 premarks genes for de novo methylation in cancer. Nat. Genet. 39, 232–236 (2007).
17. Ohm, J.E. et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat. Genet. 39, 237–242 (2007).
18. Shen, L. et al. Integrated genetic and epigenetic analysis identifies three different subclasses of colon cancer. Proc. Natl. Acad. Sci. USA 104, 18654–18659 (2007).
19. Seligson, D.B. et al. Global histone modification patterns predict risk of prostate cancer recurrence. Nature 435, 1262–1266 (2005).
20. Figueroa, M.E. et al. DNA methylation signatures identify biologically distinct subtypes in acute myeloid leukemia. Cancer Cell 17, 13–27 (2010).
21. Ellinger, J. et al. Global levels of histone modifications predict prostate cancer recurrence. Prostate 70, 61–69 (2010).
22. Bachmann, I.M. et al. EZH2 expression is associated with high proliferation rate and aggressive tumor subgroups in cutaneous melanoma and cancers of the endometrium, prostate, and breast. J. Clin. Oncol. 24, 268–273 (2006).
23. Weller, M. et al. MGMT promoter methylation in malignant gliomas: ready for personalized medicine? Nat. Rev. Neurol. 6, 39–51 (2010).
24. Kanai, Y. Genome-wide DNA methylation profiles in precancerous conditions and cancers. Cancer Sci. 101, 36–45 (2010).
25. Kondo, T. et al. Accumulation of aberrant CpG hypermethylation by Helicobacter pylori infection promotes development and progression of gastric MALT lymphoma. Int. J. Oncol. 35, 547–557 (2009).
26. Shen, L. et al. Drug sensitivity prediction by CpG island methylation profile in the NCI-60 cancer cell line panel. Cancer Res. 67, 11335–11343 (2007).
27. Ibanez de Caceres, I. et al. IGFBP-3 hypermethylation-derived deficiency mediates cisplatin resistance in non-small-cell lung cancer. Oncogene 29, 1681–1690 (2010).
28. Voso, M.T. et al. Valproic acid at therapeutic plasma levels may increase 5-azacytidine efficacy in higher risk myelodysplastic syndromes. Clin. Cancer Res. 15, 5002–5007 (2009).
29. Martens, J.W., Margossian, A.L., Schmitt, M., Foekens, J. & Harbeck, N. DNA methylation as a biomarker in breast cancer. Future Oncol. 5, 1245–1256 (2009).
30. Jarmalaite, S. et al. Promoter hypermethylation in tumour suppressor genes and response to interleukin-2 treatment in bladder cancer: a pilot study. J. Cancer Res. Clin. Oncol. 136, 847–854 (2010).
31. Baylin, S.B. & Ohm, J.E. Epigenetic gene silencing in cancer—a mechanism for early oncogenic pathway addiction? Nat. Rev. Cancer 6, 107–116 (2006).
32. Comoglio, P.M., Giordano, S. & Trusolino, L. Drug development of MET inhibitors: targeting oncogene addiction and expedience. Nat. Rev. Drug Discov. 7, 504–516 (2008).
33. Cheng, J.C. et al. Preferential response of cancer cells to zebularine. Cancer Cell 6, 151–158 (2004).
34. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009).
35. Issa, J.P. & Kantarjian, H.M. Targeting DNA methylation. Clin. Cancer Res. 15, 3938–3946 (2009).
36. Fenaux, P. et al. Efficacy of azacitidine compared with that of conventional care regimens in the treatment of higher-risk myelodysplastic syndromes: a randomised, open-label, phase III study. Lancet Oncol. 10, 223–232 (2009).
37. Yoo, C.B., Cheng, J.C. & Jones, P.A. Zebularine: a new drug for epigenetic therapy. Biochem. Soc. Trans. 32, 910–912 (2004).
38. Yoo, C.B. et al. Delivery of 5-aza-2′-deoxycytidine to cells using oligodeoxynucleotides. Cancer Res. 67, 6400–6408 (2007).
39. Toyota, M., Ohe-Toyota, M., Ahuja, N. & Issa, J.P. Distinct genetic profiles in colorectal tumors with or without the CpG island methylator phenotype. Proc. Natl. Acad. Sci. USA 97, 710–715 (2000).
40. Toyota, M. et al. CpG island methylator phenotype in colorectal cancer. Proc. Natl. Acad. Sci. USA 96, 8681–8686 (1999).
41. Tanemura, A. et al. CpG island methylator phenotype predicts progression of malignant melanoma. Clin. Cancer Res. 15, 1801–1807 (2009).
42. Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510–522 (2010).
43. Van Rijnsoever, M., Elsaleh, H., Joseph, D., McCaul, K. & Iacopetta, B. CpG island methylator phenotype is an independent predictor of survival benefit from 5-fluorouracil in stage III colorectal cancer. Clin. Cancer Res. 9, 2898–2903 (2003).
44. Soengas, M.S. et al. Inactivation of the apoptosis effector Apaf-1 in malignant melanoma. Nature 409, 207–211 (2001).
45. Strathdee, G., MacKean, M.J., Illand, M. & Brown, R. A role for methylation of the hMLH1 promoter in loss of hMLH1 expression and drug resistance in ovarian cancer. Oncogene 18, 2335–2341 (1999).
46. Ma, X., Ezzeldin, H.H. & Diasio, R.B. Histone deacetylase inhibitors: current status and overview of recent clinical trials. Drugs 69, 1911–1934 (2009).
47. Fukushima, T., Takeshima, H. & Kataoka, H. Anti-glioma therapy with temozolomide and status of the DNA-repair gene MGMT. Anticancer Res. 29, 4845–4854 (2009).
48. Wargo, J.A. et al. Recognition of NY-ESO-1+ tumor cells by engineered lymphocytes is enhanced by improved vector design and epigenetic modulation of tumor antigen expression. Cancer Immunol. Immunother. 58, 383–394 (2009).
49. Wolff, E.M. et al. Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer. PLoS Genet. 6, e1000917 (2010).
50. Feng, J. & Fan, G. The role of DNA methylation in the central nervous system and neuropsychiatric disorders. Int. Rev. Neurobiol. 89, 67–84 (2009).
51. Shi, Y. Histone lysine demethylases: emerging roles in development, physiology and disease. Nat. Rev. Genet. 8, 829–833 (2007).
52. Yoo, C.B. & Jones, P.A. Epigenetic therapy of cancer: past, present and future. Nat. Rev. Drug Discov. 5, 37–50 (2006).
53. Nakagawa, M. et al. Expression profile of class I histone deacetylases in human cancer tissues. Oncol. Rep. 18, 769–774 (2007).
54. Lane, A.A. & Chabner, B.A. Histone deacetylase inhibitors in cancer therapy. J. Clin. Oncol. 27, 5459–5468 (2009).
55. Choudhary, C. et al. Lysine acetylation targets protein complexes and co-regulates major cellular functions. Science 325, 834–840 (2009).
56. Haggarty, S.J., Koeller, K.M., Wong, J.C., Grozinger, C.M. & Schreiber, S.L. Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation. Proc. Natl. Acad. Sci. USA 100, 4389–4394 (2003).
57. Balasubramanian, S. et al. A novel histone deacetylase 8 (HDAC8)-specific inhibitor PCI-34051 induces apoptosis in T-cell lymphomas. Leukemia 22, 1026–1034 (2008).
58. Bruserud, Ø., Stapnes, C., Ersvaer, E., Gjertsen, B.T. & Ryningen, A. Histone deacetylase inhibitors in cancer treatment: a review of the clinical toxicity and the modulation of gene expression in cancer cell. Curr. Pharm. Biotechnol. 8, 388–400 (2007).
59. Fischer, A. et al. Recovery of learning and memory is associated with chromatin remodeling. Nature 447, 178–182 (2007).
60. Guan, J.S. et al. HDAC2 negatively regulates memory formation and synaptic plasticity. Nature 459, 55–60 (2009).
61. Ptak, C. & Petronis, A. Epigenetics and complex disease: from etiology to new therapeutics. Annu. Rev. Pharmacol. Toxicol. 48, 257–276 (2008).
62. Siu, L.L. et al. Phase I study of MGCD0103 given as a three-times-per-week oral dose in patients with advanced solid tumors. J. Clin. Oncol. 26, 1940–1947 (2008).
63. Garcia-Manero, G. et al. Phase 1 study of the oral isotype specific histone deacetylase inhibitor MGCD0103 in leukemia. Blood 112, 981–989 (2008).
64. Lall, S. Primers on chromatin. Nat. Struct. Mol. Biol. 14, 1110–1115 (2007).
65. Huang, J. & Berger, S.L. The emerging field of dynamic lysine methylation of non-histone proteins. Curr. Opin. Genet. Dev. 18, 152–158 (2008).
66. Lee, Y.H. & Stallcup, M.R. Minireview: protein arginine methylation of nonhistone proteins in transcriptional regulation. Mol. Endocrinol. 23, 425–433 (2009).
67. Metzger, E. et al. LSD1 demethylates repressive histone marks to promote androgen-receptor-dependent transcription. Nature 437, 436–439 (2005).
68. Huang, Y. et al. Novel oligoamine analogues inhibit lysine-specific demethylase 1 and induce reexpression of epigenetically silenced genes. Clin. Cancer Res. 15, 7217–7228 (2009).
69. Schulte, J.H. et al. Lysine-specific demethylase 1 is strongly expressed in poorly differentiated neuroblastoma: implications for therapy. Cancer Res. 69, 2065–2071 (2009).
70. Wang, J. et al. The lysine demethylase LSD1 (KDM1) is required for maintenance of global DNA methylation. Nat. Genet. 41, 125–129 (2009).
71. Gal-Yam, E.N. et al. Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line. Proc. Natl. Acad. Sci. USA 105, 12979–12984 (2008).
72. Mohammad, H.P. et al. Polycomb CBX7 promotes initiation of heritable repression of genes frequently silenced with cancer-specific DNA hypermethylation. Cancer Res. 69, 6322–6330 (2009).
73. Tan, J. et al. Pharmacologic disruption of Polycomb-repressive complex 2-mediated gene repression selectively induces apoptosis in cancer cells. Genes Dev. 21, 1050–1063 (2007).
74. Varambally, S. et al. Genomic loss of microRNA-101 leads to overexpression of histone methyltransferase EZH2 in cancer. Science 322, 1695–1699 (2008).
75. Miranda, T.B. et al. DZNep is a global histone methylation inhibitor that reactivates developmental genes not silenced by DNA methylation. Mol. Cancer Ther. 8, 1579–1588 (2009).
76. Cha, T.L. et al. Akt-mediated phosphorylation of EZH2 suppresses methylation of lysine 27 in histone H3. Science 310, 306–310 (2005).
77. Yu, J. et al. An integrated network of androgen receptor, polycomb, and TMPRSS2-ERG gene fusions in prostate cancer progression. Cancer Cell 17, 443–454 (2010).
78. Huang, J. et al. G9a and Glp methylate lysine 373 in the tumor suppressor p53. J. Biol. Chem. 285, 9636–9641 (2010).
79. Kondo, Y. et al. Downregulation of histone H3 lysine 9 methyltransferase G9a induces centrosome disruption and chromosome instability in cancer cells. PLoS ONE 3, e2037 (2008).
80. Kubicek, S. et al. Reversal of H3K9me2 by a small-molecule inhibitor for the G9a histone methyltransferase. Mol. Cell 25, 473–481 (2007).
81. Chang, Y. et al. Structural basis for G9a-like protein lysine methyltransferase inhibition by BIX-01294. Nat. Struct. Mol. Biol. 16, 312–317 (2009).
82. Marango, J. et al. The MMSET protein is a histone methyltransferase with characteristics of a transcriptional corepressor. Blood 111, 3145–3154 (2008).
83. Kim, H. et al. Requirement of histone methyltransferase SMYD3 for estrogen receptor-mediated transcription. J. Biol. Chem. 284, 19867–19877 (2009).
84. Cloos, P.A. et al. The putative oncogene GASC1 demethylates tri- and dimethylated lysine 9 on histone H3. Nature 442, 307–311 (2006).
85. Croce, C.M. Causes and consequences of microRNA dysregulation in cancer. Nat. Rev. Genet. 10, 704–714 (2009).
86. Eacker, S.M., Dawson, T.M. & Dawson, V.L. Understanding microRNAs in neurodegeneration. Nat. Rev. Neurosci. 10, 837–841 (2009).
87. Friedman, J.M. et al. The putative tumor suppressor microRNA-101 modulates the cancer epigenome by repressing the polycomb group protein EZH2. Cancer Res. 69, 2623–2629 (2009).
88. Ng, E.K. et al. MicroRNA-143 targets DNA methyltransferases 3A in colorectal cancer. Br. J. Cancer 101, 699–706 (2009).
89. Fabbri, M. et al. MicroRNA-29 family reverts aberrant methylation in lung cancer by targeting DNA methyltransferases 3A and 3B. Proc. Natl. Acad. Sci. USA 104, 15805–15810 (2007).
90. Saito, Y. et al. Specific activation of microRNA-127 with downregulation of the proto-oncogene BCL6 by chromatin-modifying drugs in human cancer cells. Cancer Cell 9, 435–443 (2006).
91. Lujambio, A. et al. A microRNA DNA methylation signature for human cancer metastasis. Proc. Natl. Acad. Sci. USA 105, 13556–13561 (2008).
92. Kota, J. et al. Therapeutic microRNA delivery suppresses tumorigenesis in a murine liver cancer model. Cell 137, 1005–1017 (2009).
93. McCaffrey, A.P. et al. The host response to adenovirus, helper-dependent adenovirus, and adeno-associated virus in mouse liver. Mol. Ther. 16, 931–941 (2008).
94. Lanford, R.E. et al. Therapeutic silencing of microRNA-122 in primates with chronic hepatitis C virus infection. Science 327, 198–201 (2010).
95. Yanaihara, N. et al. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 9, 189–198 (2006).
96. Calin, G.A. & Croce, C.M. MicroRNA signatures in human cancers. Nat. Rev. Cancer 6, 857–866 (2006).
97. Jahangeer, S., Elliott, R.M. & Henneberry, R.C. beta-Adrenergic receptor induction in HeLa cells: synergistic effect of 5-azacytidine and butyrate. Biochem. Biophys. Res. Commun. 108, 1434–1440 (1982).
98. Cameron, E.E., Bachman, K.E., Myohanen, S., Herman, J.G. & Baylin, S.B. Synergy of demethylation and histone deacetylase inhibition in the re-expression of genes silenced in cancer. Nat. Genet. 21, 103–107 (1999).
99. Kuendgen, A. & Lubbert, M. Current status of epigenetic treatment in myelodysplastic syndromes. Ann. Hematol. 87, 601–611 (2008).
100. Chen, J., Odenike, O. & Rowley, J.D. Leukaemogenesis: more than mutant genes. Nat. Rev. Cancer 10, 23–36 (2010).
101. Fandy, T.E. et al. Early epigenetic changes and DNA damage do not predict clinical response in an overlapping schedule of 5-azacytidine and entinostat in patients with myeloid malignancies. Blood 114, 2764–2773 (2009).
102. Soriano, A.O. et al. Safety and clinical activity of the combination of 5-azacytidine, valproic acid, and all-trans retinoic acid in acute myeloid leukemia and myelodysplastic syndrome. Blood 110, 2302–2308 (2007).
103. Fiskus, W. et al. Combined epigenetic therapy with the histone methyltransferase EZH2 inhibitor 3-deazaneplanocin A and the histone deacetylase inhibitor panobinostat against human AML cells. Blood 114, 2733–2743 (2009).
104. Crea, F. et al. Epigenetic mechanisms of irinotecan sensitivity in colorectal cancer cell lines. Mol. Cancer Ther. 8, 1964–1973 (2009).
105. Ramalingam, S.S. et al. Phase I and pharmacokinetic study of vorinostat, a histone deacetylase inhibitor, in combination with carboplatin and paclitaxel for advanced solid malignancies. Clin. Cancer Res. 13, 3605–3610 (2007).
106. Ramalingam, S.S. et al. Carboplatin and paclitaxel in combination with either vorinostat or placebo for first-line therapy of advanced non-small-cell lung cancer. J. Clin. Oncol. 28, 56–62 (2010).
107. Satterlee, J., Schübeler, D. & Ng, H. Tackling the epigenome: challenges and opportunities for collaborative efforts. Nat. Biotechnol. 28, 1039–1044 (2010).
108. Bernstein, B.E. et al. The NIH Roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
109. Rasheed, W., Bishton, M., Johnstone, R.W. & Prince, H.M. Histone deacetylase inhibitors in lymphoma and solid malignancies. Expert Rev. Anticancer Ther. 8, 413–432 (2008).
110. Braiteh, F. et al. Phase I study of epigenetic modulation with 5-azacytidine and valproic acid in patients with advanced cancers. Clin. Cancer Res. 14, 6296–6301 (2008).
111. Balch, C., Fang, F., Matei, D.E., Huang, T.H. & Nephew, K.P. Minireview: epigenetic changes in ovarian cancer. Endocrinology 150, 4003–4011 (2009).
112. Lin, J. et al. A phase I dose-finding study of 5-azacytidine in combination with sodium phenylbutyrate in patients with refractory solid tumors. Clin. Cancer Res. 15, 6241–6249 (2009).
113. Berdasco, M. et al. Epigenetic inactivation of the Sotos overgrowth syndrome gene histone methyltransferase NSD1 in human neuroblastoma and glioma. Proc. Natl. Acad. Sci. USA 106, 21830–21835 (2009).
114. Silverman, L.R. et al. Further analysis of trials with azacitidine in patients with myelodysplastic syndrome: studies 8421, 8921, and 9221 by the Cancer and Leukemia Group B. J. Clin. Oncol. 24, 3895–3903 (2006).
115. Kantarjian, H.M. et al. Update of the decitabine experience in higher risk myelodysplastic syndrome and analysis of prognostic factors associated with outcome. Cancer 109, 265–273 (2007).
116. Gore, S.D. et al. Impact of the putative differentiating agent sodium phenylbutyrate on myelodysplastic syndromes and acute myeloid leukemia. Clin. Cancer Res. 7, 2330–2339 (2001).
117. Garcia-Manero, G. et al. Phase 1 study of the histone deacetylase inhibitor vorinostat (suberoylanilide hydroxamic acid [SAHA]) in patients with advanced leukemias and myelodysplastic syndromes. Blood 111, 1060–1066 (2008).
118. Kelly, W.K. et al. Phase I study of an oral histone deacetylase inhibitor, suberoylanilide hydroxamic acid, in patients with advanced cancer. J. Clin. Oncol. 23, 3923–3931 (2005).
119. Munster, P.N. et al. Phase I trial of vorinostat and doxorubicin in solid tumours: histone deacetylase 2 expression as a predictive marker. Br. J. Cancer 101, 1044–1050 (2009).
review

Epigenetic modifications in pluripotent and differentiated cells
Alexander Meissner1–3

1Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. 2Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. 3Broad Institute, Cambridge, Massachusetts, USA. Correspondence should be addressed to A.M. ([email protected]). Published online 13 October 2010; doi:10.1038/nbt1684

Epigenetic modifications constitute a complex regulatory layer on top of the genome sequence. Pluripotent and differentiated cells provide a powerful system for investigating how the epigenetic code influences cellular fate. High-throughput sequencing of these cell types has yielded DNA methylation maps at single-nucleotide resolution and many genome-wide chromatin maps. In parallel to epigenome mapping efforts, remarkable progress has been made in our ability to manipulate cell states; ectopic expression of transcription factors has been shown to override developmentally established epigenetic marks and to enable routine generation of induced pluripotent stem (iPS) cells. Despite these advances, many fundamental questions remain. The roles of epigenetic marks and, in particular, of epigenetic modifiers in development and in disease states are not well understood. Although iPS cells appear molecularly and functionally similar to embryonic stem cells, more genome-wide studies are needed to define the extent and functions of epigenetic remodeling during reprogramming.

Epigenetics in its classic definition describes mitotically heritable modifications of DNA or chromatin that do not alter the primary nucleotide sequence1,2. A wider definition that is still consistent with the literal meaning (‘epi’; Greek for ‘on top of’ or ‘in addition to’) would include stable yet reversible molecular mechanisms that lead to a given phenotype without a change in genotype. Epigenetic states can be mitotically inherited and thereby provide a mechanism for the long-term maintenance of cellular identity. Despite their stability, however, epigenetic marks can be readily reprogrammed experimentally using various strategies, including nuclear transfer, cell fusion and ectopic expression of transcription factors.

In recent years, pluripotent stem cells—both embryonic stem (ES) cells and iPS cells—have become a well-studied tool for epigenetics research and a principal cell type in major projects, including the ENCODE Project and the US National Institutes of Health (NIH) Roadmap Epigenomics Program. Some of the interest can be explained by the opportunity to generate large numbers of customized iPS cells for regenerative medicine, disease modeling and other applications3. From a basic research perspective, pluripotent stem cells provide a powerful model to study the interplay of epigenetic modifications and dynamics during cellular differentiation. Pluripotent cells have a unique and characteristic epigenetic signature that reflects their broad developmental potential. More generally, the epigenetic landscape of any cell is likely to be a sensitive indicator of its past and current developmental state and may predict its future potential. This review focuses on DNA methylation and histone modifications,
which have been studied more extensively than other mechanisms of epigenetic regulation, such as histone variants and non-coding RNAs. I begin by reviewing the role of these marks and modifiers in normal development. The focus will be on functional and genomic studies and the impact of this work on our understanding of epigenetic marks in pluripotent and differentiated cells. I then summarize recent genome-scale studies and discuss what has been learned about the epigenetic reprogramming involved in induced pluripotency.

Epigenetic modifications in development
Mutations in chromatin-modifying enzymes have long been known to cause abnormal developmental phenotypes in model organisms. Several inherited human diseases, including Rett syndrome and immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome, are associated with components of the epigenetic machinery, and over the past two decades aberrant DNA methylation has been clearly linked with cancer. Although we know that epigenetic marks are essential for development (Fig. 1), many important questions remain regarding the exact roles of individual modifications across various developmental stages and cell types. Recent work using conditional deletions in specific tissues has renewed interest in the regulatory role of DNA methylation.

At the earliest stages of development, the fertilized oocyte undergoes substantial epigenetic remodeling4,5. In particular, the paternal pronucleus undergoes rapid changes. During spermatogenesis, small basic proteins called protamines replace the majority of nucleosomes, which are reintroduced rapidly after fertilization and before pronuclear fusion. Recent work on human sperm has shown that some nucleosomes are retained and that these show characteristic posttranslational histone modifications6. The few percent of residual, modified histones are enriched near developmental genes, suggesting that these histones support early development6. Upon fertilization the paternal genome undergoes rapid protamine-to-histone exchange, which is followed by further epigenetic changes. Antibodies against 5-methyl cytosine have been used to show that the paternal genome is actively demethylated (between pronuclear stage PN0 and PN5) before pronuclear fusion4,7,8. Additional evidence for this observation comes from several locus-specific DNA methylation assays7,9,10. Subsequently, the maternal genome is demethylated, presumably through a passive mechanism during early cleavage divisions8. Several exceptions to this simplified model exist, including most imprinted genes and some repetitive elements such as intracisternal A-particle (IAP) elements4,11. It is currently unclear how and why these genomic elements evade epigenetic reprogramming in the early embryo.

[Figure 1: schematic with two panels, In vivo and In vitro, each with Knockout phenotype and Epigenetic marks (DNA methylation, H3K27me3) columns. Time axis (d.p.c.): E0.5, E3.5, E5.5, E10.5, E15.5, E19.5 (birth), adult. Stage labels: sperm, oocyte, polar body, PN, zygote, ICM, TE, embryo (EN, ME and EC), germ cells, explant, ES cells. Gene labels, with symbols defined in the legend: Eset (SetDB1)#, Eed##, Dnmt1##, G9A##, Ezh2##, Dnmt1#, Ring1B#, Dnmt3a/b#, Ezh2#, Suz12, G9A#, Eed#, Dnmt3b#, Dnmt3a#*, Dnmt2**, Bmi1***, Dnmt3L****.]

Figure 1 Epigenetic dynamics during in vitro and in vivo differentiation. Left (in vivo): Sperm and oocyte come together at fertilization to form the totipotent zygote. After extrusion of the second polar body the maternal and paternal pronuclei (PN) migrate and fuse after several hours. Both genomes, paternal and maternal, subsequently undergo substantial epigenetic changes although at different rates. These changes are indicated for two epigenetic marks as examples to the right. Many of the central enzyme genes have been knocked out and result in a lethal phenotype. The respective phenotypes and approximate time observed are shown in the middle. Far right (in vitro): ES cells are derived from the hypomethylated ICM and regain genome-wide DNA methylation and other epigenetic marks by the time ES cell lines are established. For most of the investigated cell types these marks appear not to change globally although locus-specific changes are observed upon differentiation. As indicated by the simplified schematic of two epigenetic marks (DNA methylation and H3K27me3), many details about their presence during normal development are still lacking. The drawings are simplified and indicate global levels that remain stable. Both marks will differ between cell types in their distribution. #, lethal. ##, maintenance fine, but has differentiation defects. *Dnmt3a knockout mice die around 3 weeks postnatally and are smaller/runted. **No observed phenotype, no observed effect on DNA methylation, effect on RNA methylation not well studied but possible. ***Mice are viable, but have hematopoietic and neural abnormalities. ****Homozygous male mice are sterile, offspring of homozygous female mice and heterozygous crosses show imprinting defects and die. *****Wild-type ES cells cannot differentiate into trophectodermal cells. Loss of Dnmt1 and global loss of DNA methylation restores this developmental potential. d.p.c., days post coitum; E, embryonic; P, paternal; M, maternal; S, sperm; O, oocyte; PN, pronuclei; EN, endoderm; ME, mesoderm; EC, ectoderm; TE, trophectoderm.

The zygote and early blastomeres are totipotent, which means that they can differentiate into all embryonic and extraembryonic cell types12. The first specification into the pluripotent inner cell mass (ICM) and more lineage-restricted trophectoderm involves several well-studied
transcription factors12,13. The trophectoderm will form only extraembryonic parts and can be used to derive multipotent trophoblast stem cell lines12,14. The latter can differentiate in vitro into trophoblast subtypes, and, after transfer to blastocysts, can contribute to the trophoblast lineage, including extraembryonic ectoderm, ectoplacental cone and giant cells, but not the epiblast or ICM-derived extraembryonic tissues14. By contrast, the pluripotent ICM will form the epiblast and generate the three embryonic germ layers (endoderm, ectoderm and mesoderm). When explanted under appropriate culture conditions, ICM cells give rise to pluripotent ES cells. Mouse blastocysts are typically explanted around 3.5 days after fertilization15,16, whereas 5–9-day-old embryos have been used to derive human ES cells17–19. ES cells are stably maintained in a self-renewing, pluripotent state by an autoregulatory network of transcription factors20,21. Although the pluripotent state can be maintained for extended periods in culture, it exists only transiently in vivo and is lost upon implantation and specification of the epiblast20.
DNA methylation. DNA methylation is essential for mammalian development and is required in most somatic cells2,22,23. It is established and maintained by three catalytically active enzymes: DNA methyltransferase (Dnmt)1, Dnmt3a and Dnmt3b (refs. 1,24). Two additional, homologous enzymes, Dnmt2 and Dnmt3l, are expressed in several cell types, including ES cells24. Deletion of Dnmt2 causes no apparent phenotype in vitro or in vivo24,25. The ability of Dnmt2 to methylate a specific cytosine in the anticodon loop structure of tRNAs suggests that it might not function as a DNA methyltransferase26. Loss of Dnmt1 results in embryonic lethality around embryonic day (E)8.5–9, and Dnmt1 mutant embryos retain only one-third of the normal amount of DNA methylation27. Dnmt1-deficient embryos show rudiments of the major organs, but they are smaller than normal and appear to be developmentally delayed27. Dnmt3b mutant embryos appear to develop normally before E9.5 but show multiple developmental defects later and do not develop to term28. Conditional deletion of Dnmt3b in mouse embryonic fibroblasts (MEFs) results in partial loss of methylation, indicating the importance of this enzyme, together with Dnmt1, for maintaining epigenomic patterns in proliferating cells29. Unlike mice lacking Dnmt1 or Dnmt3b, homozygous Dnmt3a knockout mice can develop to term but become runted and die ~1 month after birth28. Conditional deletion of Dnmt3a results in imprinting defects in the germline30. Homozygous Dnmt3l mutant mice are viable, but male mice are sterile and heterozygous offspring of homozygous females die owing to imprinting defects31,32. This phenotype is similar to that of Dnmt3a-deficient mice and suggests that both enzymes might be involved in establishing correct imprinting patterns. Dnmt3l is a close homolog of Dnmt3a and Dnmt3b that lacks the catalytic domain but is highly expressed in the early embryo, ES cells and germ cells31.
It has been suggested to function as a co-regulator of both Dnmt3a and Dnmt3b and has recently been shown to interact with the N-terminal tail of histone H3 when it lacks methylation at lysine 4 (refs. 33,34). Importantly, the genome is transiently hypomethylated during two phases of normal development without adverse effects4. As described above, the first phase is preimplantation development. The totipotent zygote and blastomeres, the pluripotent blastomeres, the pluripotent ICM cells and trophectoderm cells do not require substantial DNA methylation. A second wave of demethylation commences after the specification of primordial germ cells (PGCs) around day E7.25 (refs. 4,35). Genome-wide bisulfite sequencing was used to show that E13.5 PGCs have only 5–20% of genomic DNA methylation left36, confirming that PGCs show transient reduction of DNA methylation without adverse effects on viability.
Histone modifications. Histone modifications provide an additional and complex layer of the epigenetic code37. Many of the enzymes that regulate these modifications have been studied extensively, including histone acetyltransferases, deacetylases, methyltransferases and histone demethylases38. Among the best-characterized mediators are protein complexes of the polycomb (PcG) and trithorax (trxG) groups39–41. PcG proteins catalyze two distinct histone modifications: tri-methylation of lysine 27 of histone H3 (H3K27me3) by polycomb repressive complex (PRC) 2 (ref. 42) and mono-ubiquitination of lysine 119 of histone H2A (H2AK119ub1) by PRC1 (ref. 40). H3K27 is tri-methylated by the enhancer of zeste (Ezh2 or KMT6), which contains a SET (su[var]3–9, enhancer of zeste, Trx) domain40–43 and, together with Eed (embryonic ectoderm development) and Suz12 (suppressor of zeste 12), is a component of PRC2 (ref. 42). Loss of any one of the PRC2 subunits results in severe gastrulation defects, highlighting its essential role in normal development44–46.
Ezh2 knockout embryos are underdeveloped and die around E8.5 (refs. 45,47). Ezh2 is upregulated upon fertilization and its expression remains
high during pre-implantation development45. Its close homolog, Ezh1, is expressed in the fertilized oocyte but is barely detectable at the blastocyst stage45. However, it is expressed in ES cells and found later in the adult47,48. Eed does not appear until day E5.5, suggesting that Ezh2 and maybe Ezh1 also have roles in preimplantation development that are independent of PRC2 and Eed49. Eed-deficient embryos show gastrulation defects and do not maintain X inactivation in extraembryonic cells49. Like mice deficient in the other PRC2 components, Suz12 homozygous mice die during the early postimplantation stages (before day E10.5)46. Similar to loss of PRC2, loss of PRC1 components, such as Ring1B, results in an early embryonic lethal phenotype50. Bmi-1–null mice show several hematopoietic and neurological abnormalities51, and loss of the H3K9 methyltransferases Eset (SetDB1) or G9a causes peri- and postimplantation lethality52,53. Finally, although not discussed here, mutants for most of the chromatin remodelers and histone chaperones also show early embryonic lethality (for a summary, see refs. 54,55). Together, the knockout studies have clearly established that DNA methylation and histone modification are essential for normal development. But many questions remain regarding the specific contributions of these epigenetic marks to the regulation of gene expression throughout development. The genomic distribution and global patterns of these marks have not been studied in detail. Mice with mutations in most of these genes die early, probably owing to failure to establish early epigenetic patterns that are presumed to dictate later developmental decisions. It is less clear what the effect of such mutations would be after initial specification has taken place. Strategies for exploring these questions in future research include conditional deletions in mouse somatic lineages and cell types and genome-wide mapping of epigenetic modifications in early development.
Epigenetic modifications in ES cells
Both undifferentiated and differentiated ES cells are widely used to study epigenetic mechanisms: the former because they express many epigenetic modifiers and the latter because they serve as a model of dynamic chromatin remodeling. Functional studies have shown that most epigenetic marks, including DNA methylation, are not required for the survival of pluripotency marker–positive cells in culture. Although ES cells lacking epigenetic marks can be maintained, they cannot properly execute their developmental potential. A central question in the field is the extent to which epigenetic marks regulate, rather than simply reflect, the pluripotent state in vitro. In this section I describe loss-of-function and genome-scale studies of the epigenetic landscape. These studies have begun to shed light on this question and have provided new insights into the distinct and overlapping functions of DNA methylation and histone modifications in ES cells.
Loss-of-function studies in ES cells. The pluripotent state has the potential to generate every cell type, and this potential is reflected in its unique and characteristic epigenetic signature41,56–64. All five DNA methyltransferases (Dnmt1, 2, 3a, 3b and 3l) and the core PRC1 and 2 subunits are highly expressed in undifferentiated mouse ES cells. Like somatic cells, ES cells show high global levels of DNA methylation, with ~60–80% of all CpG dinucleotides being methylated65. Although the global mCpG content is similar, the distribution of the mark is unlike that of any other somatic cell type63 and also very distinct from the hypomethylated ICM. Similarly, the distribution and enrichment of various histone modifications constitutes a unique signature of ES cells, as discussed below. ES cells, like pluripotent cells in vivo, can tolerate global loss of epigenetic marks, including DNA methylation and H3K27 methylation, as shown using knockouts for the Dnmts or PRC2 components27,28,65–68.
Although methylation-deficient ES cells cannot stably differentiate22,66, they maintain the potential to regain pluripotency and to contribute to germline-competent chimeras upon restoration of Dnmts66,69. According to genomic studies, most developmental transcription factors seem not to be regulated by DNA methylation, and pluripotency genes that are, such as Oct4 and Nanog, are unmethylated in the pluripotent state59,63. However, DNA methylation does regulate some genes in pluripotent cells70,71. Using mouse ES cells deficient in Dnmt1, 3a and 3b, one study identified a group of genes that is regulated specifically by DNA methylation and is distinct from the genes regulated by PRC1 and 2 and the core pluripotency factors Oct4, Sox2 and Nanog70. For instance, the transcription factor gene Elf5, which is important in the regulation of trophectoderm development, is highly methylated and repressed in undifferentiated ES cells70,71. Dnmt1-deficient ES cells show hypomethylation of the Elf5 promoter and gain the ability to differentiate into trophectoderm71. Loss of DNA methylation prevents ES cells from differentiating and often creates sharp colonies with less background differentiation than is normally observed. In contrast, cells deficient in PRC2 are more prone to differentiation44–46,68,72. This can be explained in part by the distinct targets of DNA methylation and H3K27 methylation in ES cells. As discussed above, PcGs silence developmental transcription factors that are not regulated by DNA methylation59,63. Loss of polycomb marks does not lead to full-blown expression of these transcription factors, but it increases the transcription of many PRC2 target genes above a basal level72. This seems to increase the susceptibility of ES cells to differentiation under suboptimal conditions72 and has been used as an argument in favor of the idea that these marks serve to buffer commitment towards any given lineage73.
Loss of any of the subunits (Eed, Suz12 or Ezh2) results in global reduction or loss of H3K27me3 (refs. 46,47,68,72). Although PRC2 suppresses differentiation, it is not required to maintain pluripotency68. Low- and high-passage Eed−/− ES cells generated early embryonic chimeras and contributed to all germ layers. But no Eed−/− MEFs could be derived, and Eed−/− ES cells contributed only rarely in late-gestation embryos (beyond E12.5). As Eed is required for proper PRC2 assembly, this result suggests that PRC2 is essential for differentiation but not for molecular pluripotency. In another study45, loss of Ezh2 led to impaired outgrowth when blastocysts were explanted and failure to derive ES cells. However, more recently it was shown that blastocysts from heterozygous mice can give rise to Ezh2-null ES cells at the expected frequency47, which suggests that Ezh2 is in fact not required for the establishment of ES cells. The discrepancy might be explained by differences in the gene deletions or in the ES cell derivation protocols47. Interestingly, H3K27me3 was still found at previously identified PRC2 target genes in the null ES cells47,72,74, but was lost in the Eed and Suz12 knockout ES cells46,68. Eed-deficient ES cells showed loss of H3K27me2 and H3K27me3 and significant loss of H3K27me1, whereas Ezh2 deletion affected H3K27me3 significantly, H3K27me2 slightly less and H3K27me1 apparently not at all47. ES cells lacking Suz12 show normal H3K27me1, slightly reduced H3K27me2 and loss of H3K27me3 (ref. 46). In addition to the PcG complexes, other histone methyltransferases have also been studied in mouse ES cells. Depletion of the H3K9 methyltransferase Suv39h (KMT1A/B) in mouse ES cells led to notable enrichment of transcripts that corresponded to all classes of repeats75. A short hairpin RNA (shRNA) screen for chromatin regulators involved in pluripotency identified SetDB1 (KMT1E), another H3K9 methyltransferase, as a crucial component for stem cell maintenance76. 
Notably, SetDB1-occupied genes were found to be a subset of the bivalent genes discussed above. One of the functions of SetDB1 in maintaining ES cells seems to be the repression of trophectoderm differentiation. Conditional depletion of SetDB1 results in decreased H3K9 methylation and upregulation of Cdx2 (ref. 77). In summary, DNA methylation and histone modifications seem to have distinct targets and roles in undifferentiated ES cells. Because of their largely non-overlapping functions, it may be possible to delete any one modification without completely disrupting the undifferentiated state. It will be important to use double and higher-order knockouts to dissect such compensatory effects. More dynamic and genome-scale data on epigenetic changes during differentiation will certainly advance our understanding of the respective roles of these epigenetic modifications.
The epigenetic landscape in ES cells. Whereas genetic knockout and biochemical studies have shed light on the functions of particular DNA and histone modifications, genome-scale studies have provided a broader picture of the functional relevance and roles of epigenetic marks. Recent technological advances have led to comprehensive maps of DNA methylation in mouse and human pluripotent cells61–63. Genome-scale studies in the mouse confirmed the results of previous studies78,79, which showed that the methylation levels of CpGs in both wild-type ES cells and somatic cells have a bimodal distribution, with most genomic regions being either ‘largely unmethylated’ or ‘largely methylated’63. The methylation status of CpGs is highly correlated with the local CpG density. Nearly all high-CpG promoters in ES cells are enriched for H3K4me3 and are devoid of DNA methylation59,63. This anti-correlation is observed for H3K4me1, H3K4me2 and H3K4me3 and may, at least in the germline, be linked to the ability of Dnmt3l to bind only unmodified H3K4 (see above)34. In ES cells, CpGs in low-CpG promoters, which are generally associated with tissue-specific genes79, are mostly methylated, with the exception of a small subset (<10%) that are enriched for H3K4me3 or H3K4me2 (ref. 63).
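The bimodal pattern described here can be illustrated with a minimal sketch that converts bisulfite read counts into per-CpG methylation levels and bins them. The 20% and 80% cutoffs, the function names and the read counts are all illustrative assumptions; they are not taken from the cited studies.

```python
from collections import Counter


def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of bisulfite reads reporting a methylated cytosine at one CpG."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def classify_cpg(level: float, low: float = 0.2, high: float = 0.8) -> str:
    """Bin a methylation level; the low/high cutoffs are assumed, not published values."""
    if level <= low:
        return "largely unmethylated"
    if level >= high:
        return "largely methylated"
    return "intermediate"


# Hypothetical (methylated reads, total reads) pairs for six CpGs.
counts = [(0, 25), (1, 30), (28, 30), (19, 20), (12, 24), (2, 40)]
labels = [classify_cpg(methylation_level(m, t)) for m, t in counts]
summary = Counter(labels)
print(summary)
```

In a bimodal data set most CpGs fall into the two outer bins and few into the intermediate one, which is the pattern the toy counts are constructed to mimic.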
These results are consistent with those of previous studies and support the notion that DNA methylation and some histone modifications show clear correlations80. Similarly, all pluripotency genes in ES cells are generally enriched for H3K4 methylation and show DNA hypomethylation59,81,82. As expected, nearly all loci that are not enriched for H3K4 and H3K27 methylation show widespread DNA methylation in ES cells70. Genome-wide bisulfite sequencing in human ES cells61,62 has confirmed the findings of these previous, more limited, mouse63,64,70 and human studies79. Methylation at sites other than CpG occurs at low levels in mouse ES cells and early embryos65,83–85 but is nearly undetectable in somatic cells83. The level of non-CpG methylation in mouse ES cells depends on Dnmt3a and Dnmt3b, decreasing from ~3% in wild-type cells to background levels in cells that lack both enzymes85. By contrast, the level of CpG methylation remains largely unaffected in the double knockout cells and only decreases slowly over many passages owing to the incomplete fidelity of Dnmt1 (ref. 66). Newly integrated retroviral DNA is rapidly methylated de novo in ES cells and also shows evidence of non-CpG methylation that depends on Dnmt3a and b (ref. 85). Furthermore, overexpression of Dnmt3a in Drosophila, which lacks functional Dnmts and DNA methylation24, induces low levels of non-CpG methylation, consistent with the idea that non-CpG methylation is a consequence of de novo methylation activity83. More extensive bisulfite sequencing maps in mouse ES cells have confirmed the presence of non-CpG methylation63,65. Similarly, human methylomes revealed the presence of non-CpG methylation in human ES cells61,62. The level of non-CpG methylation per site is much lower (85% of the sites show only 10–40% methylation) than for CpG methylation (in H1 ES cells, 77% of mCpG sites were 80–100% methylated)6,61,62,65,83,85. 
Although the total levels of non-CpG methylation in human ES cells (~25% combined CNG and CHH methylation) at first appear much higher than those in mouse ES cells (~3% combined), they are in fact very similar, and the discrepancy can be explained by a difference in the calculation rather than a difference in biology. Early studies in the mouse referred to non-CG methylation in general without distinguishing between CNG and CHH methylation65,83,85. More recent work on human cells, in analogy to the plant literature, distinguishes between CNG and CHH methylation61,62. In plants, CNG methylation depends on the plant-specific methyltransferase CMT3 (refs. 86,87). No homolog of CMT3 has been described in mammals, in contrast to the conserved de novo methyltransferases (Dnmt3a and b in mouse and human and DRM2 in plants) and maintenance methyltransferases (Dnmt1 in mouse and human and MET1 in plants)24,87. In both mice and humans, non-CpG methylation is low in differentiated cells62,83 and, as expected, reappears in iPS cells62. In plants, the role of non-CpG methylation has been extensively studied87. However, only one study has described the presence and possible functional role of non-CpG methylation in somatic cells of higher organisms88. This study indicated that CpA methylation is involved in the regulation of enhancers that are required for olfactory receptor choice in the mouse brain88. As all bisulfite sequencing–based approaches61–63 readily detect non-CpG methylation, in the future it will be possible to determine the extent and functions, if any, of these modifications. In addition to 5-methyl cytosine, other covalent modifications to DNA have been found in some mammalian cell types. The TET (ten-eleven translocation) family members catalyze the conversion of 5-methyl cytosine to 5-hydroxymethylcytosine (5hmC)89. 5hmC is detectable in undifferentiated mouse ES cells as well as in Purkinje neurons and granule cells (0.6% and 0.2% of total nucleotides, respectively)90.
During ES cell differentiation, the amount of TET1 and 5hmC decreases89, and TET1 knockdown impairs the self-renewal and maintenance of ES cells91. Some targets, including the Nanog promoter, have been suggested in connection with the observed phenotype91, but more work is needed to fully understand the role of this modification in ES cell biology. Most current technologies for genome-scale DNA methylation profiling, including bisulfite sequencing, cannot discriminate between 5mC and 5hmC, but specific antibodies91 have been developed and will probably be used, in a similar way to 5mC antibodies, for methylated DNA immunoprecipitation and high-throughput sequencing (meDIP-Seq) in the near future92. A more detailed discussion of TET proteins and 5hmC can be found in a recent review93. Finally, it should be noted that other modifications, such as N6-methyladenine, that are frequently found in bacteria94 have not been studied in much detail in mammalian genomes. Similar to the extensive catalog of DNA methylation maps, dozens of chromatin state maps from mouse and human pluripotent cells have been published44,57–63. Globally, ES cells show an open chromatin structure, and active chromatin domains are widespread56–59,95,96. Nearly 75% of promoters (active and inactive) are enriched for H3K4 methylation in human ES cells95. Although most of these promoters experience transcriptional initiation, only a subset is enriched for the elongating form of Pol II and H3K36 methylation95. H3K27 methylation is a key factor in balancing the self-renewal and differentiation of ES cells56,72. In addition to genome-wide chromatin maps, the localization of many of the core subunits of PRC1 and PRC2 has been reported56,72,74. Ezh1 was shown to directly interact with the other PRC2 components, and co-localization of Ezh1 and H3K27me3 in both wild-type and Ezh2-null cells suggests that Ezh1 has a direct role in the establishment of this mark47.
The combinatorial pattern of histone marks is complex, and many new marks and states (combination of marks) are likely to be discovered in the coming years. A particularly well-studied combination
is the co-occurrence of the repressive H3K27me3 with the active H3K4me3, termed a ‘bivalent domain’59,60. Interestingly, about half of the identified bivalent domains in ES cells have binding sites for at least one of the three pluripotency-associated transcription factors (Oct4, Nanog and Sox2)72,97. Better and more binding data for the different transcription factors will probably enhance our ability to investigate the nuances of this co-occupation further. Bivalent domains generally show PcG occupancy, but can be subdivided into two groups on the basis of co-occupancy of both PRC1 and PRC2 or occupancy by PRC2 alone56. Promoters that are ‘co-occupied’ by both complexes can retain PcG-mediated chromatin structure more efficiently upon differentiation56. Finally, H3K4me3 and H3K56ac have been shown to share occupancy in human ES cells98. Co-localization of NANOG, SOX2 and OCT4 is more often associated with H3K56Ac than H3K4me3, providing an additional link with the core pluripotency network98. Genomic regions that are associated with gene silencing—including transposon and repetitive elements—frequently possess the well-known heterochromatin marks H3K9me3 and H4K20me3 (refs. 40,59,75), but additional marks have been connected with these regions, including the globular H3K64me3 that is enriched at pericentric heterochromatin. H3K64me3 shows enrichment in mouse ES cells compared with differentiated cells99, consistent with the observation that epigenetic patterns at repeats change substantially during differentiation75,99. The open ES cell chromatin structure, which is enriched in noncompact euchromatin, allows easy access for transcription factors and the transcriptional machinery and may explain observed global ‘hypertranscription’. By contrast, lineage commitment is accompanied by the accumulation of regions of highly condensed, transcriptionally inactive heterochromatin100. 
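The combinatorial logic behind bivalent domains and their two PcG subgroups can be written out as a small sketch. It assumes that enrichment and occupancy have already been reduced to yes/no calls per promoter (for example by peak calling, which is not shown); the function names, labels and loci are hypothetical.

```python
def promoter_state(k4me3: bool, k27me3: bool) -> str:
    """Name the chromatin state implied by two binary enrichment calls."""
    if k4me3 and k27me3:
        return "bivalent"
    if k4me3:
        return "H3K4me3 only"
    if k27me3:
        return "H3K27me3 only"
    return "neither mark"


def bivalent_subgroup(prc1: bool, prc2: bool) -> str:
    """Split bivalent promoters by PcG occupancy, following the two groups in the text."""
    if prc1 and prc2:
        return "PRC1 and PRC2 co-occupied"
    if prc2:
        return "PRC2 only"
    return "unclassified"


# Hypothetical promoters: (H3K4me3 call, H3K27me3 call).
calls = {
    "locus_1": (True, True),
    "locus_2": (True, False),
    "locus_3": (False, True),
    "locus_4": (False, False),
}
states = {name: promoter_state(k4, k27) for name, (k4, k27) in calls.items()}
print(states)
```

Real data are quantitative rather than binary, so any such classification depends on the enrichment thresholds chosen upstream.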
Overall, the genome-scale studies have provided detailed information on the distribution of various epigenetic marks. This has turned out to be a powerful source for understanding the role and relationship of individual marks and enabled more precise annotation of genomic features such as enhancers. Finding answers to many of the remaining questions regarding their regulatory roles will be aided by additional maps in gain- and loss-of-function studies as well as by studying the dynamic changes during cellular differentiation. In the following section I will summarize some of the existing results, but clearly more data are required. Dynamic epigenetic changes during differentiation. Despite significant advances in mapping technologies, it is still difficult to investigate lineage specification and the associated global epigenetic remodeling for many cell types in vivo. But the number of cells that is required for epigenetic analysis continues to decrease, suggesting that these exciting studies will become possible in the near future101,102. In the meantime, ES cells provide a powerful in vitro system to study the role and extent of epigenetic modification during lineage commitment. For humans, ES cells are the only available model in which to study many questions of lineage commitment and cell fate decisions. Most of the published genome-scale DNA methylation and histone-modification studies have compared pluripotent cells with in vitro differentiated or donor-derived somatic cells. DNA methylation patterns during the differentiation of pluripotent cells are quite dynamic. Using high-throughput bisulfite sequencing, we compared the DNA methylation patterns of one million CpGs in undifferentiated and differentiated mouse ES cells63. Notably, most of the changes in DNA methylation that were associated with differentiation occurred at distal putative regulatory regions between 1 and 100 kb from known promoters63. Many of these regions might contain
functional enhancers. A recent study described an important connection between the epigenetic marking in ES cells and the transcriptional competence of tissue-specific enhancers103. Controlled through the target-specific action of the transcription factor FoxD3, individual CpG sites remain unmethylated in ES cells and, upon differentiation, allow the recruitment of other transcription factors when directed towards endoderm (e.g., the Alb1 enhancer) or become methylated in mesoderm and ectoderm differentiation103. Genome-wide maps at nucleotide resolution across many or most cell types will help to define the relevance of individual CpG sites. In one of the recent human methylome reports, a pairwise comparison of undifferentiated human ES cells (line H1) and IMR90 fetal lung fibroblasts showed that the latter had lower levels of CpG methylation in a large proportion of the genome62. Large partially methylated regions (<70% average methylation) were found on many autosomes and the majority of the X chromosome. When a sliding window approach was applied to the two cell types, 491 differentially methylated regions were identified that were more methylated in IMR90 cells than in H1 cells. Many of the genes in these regions had been previously suggested to be important for ES cell function62. The second human methylome study also found differentially methylated regions in the promoter regions of many genes associated with pluripotency and differentiation61. As well as being required for viability upon differentiation, DNA methylation contributes to the repression of core pluripotency genes, such as Oct4 and Nanog. Both of these genes must be repressed during the initial differentiation of ES cells. Although DNA methylation does not seem to be involved in the initial downregulation of Oct4, it is
[Figure 2 schematic. Columns: MEFs (day 0, uninduced), nascent iPS cells (day 10–14, Oct4 positive; unknown epigenetic remodeling), iPS cells (passage 5–8) and ES cells. Rows: Gapdh, Dnmt1, Ink4a, Snai1, MyoD, Lin28, Fgf4, Oct4 and Nanog. Tracks per cell type: K4me3, K27me3, DNA me, Expr. Key: enriched for K4me3, enriched for K27me3, not enriched; expressed, not expressed; hypermethylated, hypomethylated; unknown.]
Figure 2 Epigenetic reprogramming during iPS cell derivation. Shown are selected genes with their chromatin and expression state across distinct cell types color-coded as shown on the bottom (data taken from ref. 81). Data are for uninduced mouse embryonic fibroblasts (MEFs), a hypothetical primary iPS cell colony (nascently reprogrammed) as well as an established iPS (MCV8.1) and ES (V6.5) cell line. Upon induction of Oct4, Sox2, Klf4 and c-Myc (OSKM), the MEF epigenome begins remodeling. The initial events and required factors have not been described yet. After ~10–14 days, iPS cell colonies appear that express markers such as Oct4-GFP. The expression of housekeeping genes is not affected, and they remain active throughout the reprogramming process (Gapdh and Dnmt1 are representative examples). With the exception of a few marker genes, the global extent of remodeling is unknown for this stage. Primary colonies are then picked and expanded as clonal iPS cell lines. Usually several more passages are needed before extensive marker stains can be performed. Typically, at least 5–8 passages are required to obtain sufficient material for genome-wide studies. The chromatin and expression states for the selected marker genes are identical in the iPS cell line MCV8.1 and in a wild-type ES cell line (V6.5) used to construct this schematic81. Ink4a (Cdkn2a) remains bivalent and sensitive to rapid induction in normal (not transformed or immortalized) cells. Overexpression of OSKM or extended cell culture can induce expression of Ink4a, but ES and iPS cells show bivalent marks and lack of DNA methylation.
Snai1 is an expressed somatic gene that becomes repressed and regains bivalency upon reprogramming. MyoD is silent in both MEF and iPS cells, but switches from H3K27 only to a bivalent state upon reprogramming. Lin28 and Fgf4 are repressed by H3K27 methylation, whereas Oct4 and Nanog are repressed by DNA methylation, and all become transcriptionally reactivated only upon reprogramming.
required to stably maintain the repressed state104,105. This is consistent with the general view that DNA methylation functions as a secondary silencing mechanism that provides long-term stability and memory. For example, retroviral elements are rapidly silenced in pluripotent cells and are targets for de novo methylation. However, they are silenced even in the absence of DNA methylation106. A recent study suggests that proviral silencing may occur by a DNA methylation–independent pathway107. The use of DNA methylation to repress genes that are no longer required (e.g., Oct4 or genes on the inactive X) and the use of more flexible histone marks for developmental genes that will be induced rapidly seems intuitively plausible. However, recent work using conditional deletion of Dnmt1 suggests that DNA methylation is also involved in developmental regulation during differentiation. Differentiation-associated genes become misexpressed and cause differentiation when DNA methylation is removed by deletion of Dnmt1 or Uhrf1 or by overexpression of Gadd45 in epithelial progenitor cells108. DNA methylation was also shown to be involved in hematopoietic differentiation, by conditional deletion of Dnmt1 in hematopoietic stem cells (HSCs)109,110 and in committed B lymphocytes109. These studies were recently partly complemented by DNA methylation data across hematopoietic progenitor and differentiated cell types. Unfortunately, HSCs were not included due to the low numbers of cells available for analysis111. Notably, it had been previously reported that in HSCs de novo methylation is essential for self-renewal but not for differentiation112. More of these conditional studies will be needed to determine the roles of DNA methylation in gene regulation across additional somatic cell populations.
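Comparisons of methylation between two cell types, such as the sliding-window analysis of H1 and IMR90 cells mentioned earlier in this section, can be illustrated with a deliberately simplified sketch. It is not the procedure used in the cited methylome study: the window size, the fixed difference cutoff, the merging rule and the input values are all assumptions, and a real analysis would use genomic coordinates, read depth and statistical testing.

```python
from statistics import mean


def sliding_window_dmrs(levels_a, levels_b, window=5, min_diff=0.3):
    """Return merged half-open (start, end) index ranges over consecutive CpGs
    where sample B is more methylated than sample A by at least min_diff
    on average within a window."""
    if len(levels_a) != len(levels_b):
        raise ValueError("both samples must cover the same CpGs")
    regions = []
    for start in range(len(levels_a) - window + 1):
        end = start + window
        diff = mean(levels_b[start:end]) - mean(levels_a[start:end])
        if diff >= min_diff:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous hit: extend that region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Hypothetical per-CpG methylation levels for two samples over 14 CpGs.
sample_a = [0.9] * 4 + [0.2] * 6 + [0.9] * 4
sample_b = [0.9] * 14
dmrs = sliding_window_dmrs(sample_a, sample_b)
print(dmrs)
```

With these toy values the function reports a single merged region, which extends slightly beyond the six low-methylation CpGs because any window containing at least three of them passes the cutoff.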
Bivalent domains maintain genes in a state that is repressed but poised for activation60. Such domains initially seemed to be almost exclusive to pluripotent cells60, but genome-wide studies in additional cell types identified the domains also in cells with more restricted developmental potential59. It is likely that developmental loci stay bivalent (poised) as long as their expression may be needed and switch off completely once cells have reached a terminally differentiated state. Another group of bivalent loci (unchanged during differentiation) includes genes such as Ink4a (Cdkn2a), which is directly regulated by PcG proteins113. Through its function in controlling the cell cycle, this locus fits well with the idea of a gene that is repressed but poised for rapid activation. Importantly, bivalency rather than stable DNA methylation provides the required flexibility for rapid induction. A key question in chromatin research is how epigenetic modifiers, such as PcGs, are regulated and recruited to target loci. In one study, chromatin data in undifferentiated and differentiated human ES cells revealed a region between HOXD11 and HOXD12 that seems to function as a mammalian polycomb response element114. However, additional analysis will be required as this region also harbors a CG island114, and these have been previously implicated in the recruitment of epigenetic modifiers56. Many other proteins, including Jarid2 (refs. 115–118), and non-coding RNAs119,120 have been implicated in the recruitment and regulation of PcG activity. Recruitment factors, including YY1 (the mammalian homolog of Pho), and their roles in mammalian cells are still under active investigation. The regulation of transcriptional repression has recently been discussed in more detail121.
Epigenetic reprogramming and iPS cells
It is well established that single or multiple transcription factors can convert one cell type into another.
The mechanism by which ectopic transcription factors override the existing epigenetic state and change it into a specific alternative state without passing through normal development or complete resetting of all marks is still largely unknown. The key factors involved in the remodeling have not been identified, and many questions regarding the dynamics of this process remain. Several approaches have been used to return differentiated cells to the pluripotent state (reviewed in ref. 122). In the most promising one, overexpression of Oct4, Sox2, Klf4 and c-Myc was sufficient to reprogram somatic cells to an iPS cell state123. Translation of the approach to human cells has allowed the generation of patient-specific stem cells and transformed the field of regenerative biology124–126. Hundreds of studies have followed initial reports of the generation of germline-competent iPS cells125,127–129. Advances in the use of small molecules have raised hopes that in the future it will be possible to reprogram any cell with a cocktail of small molecules81,130,131. In the mouse system, several functional assays exist to determine the developmental potential or limitations of pluripotent cells21. These assays cannot be carried out for human ES or iPS cells, making a comprehensive and careful characterization of the cellular or epigenetic state an essential substitute. We do not know how much epigenetic remodeling really occurs during the initial reprogramming phase, until the time when the first pluripotency markers appear in individual colonies (Fig. 2). Extended passaging of human132 and mouse133 iPS cells improves the level of reprogramming. This might explain many of the conflicting reports on iPS cell variation, given that investigators rarely control for passage numbers and sometimes fail even to report them. 
Most of the experiments published to date should therefore be compared only with caution, both because derivation and culture conditions may have varied and because passage numbers are rarely reported and difficult to compare.
Interestingly, somatic cell nuclear transfer has often been described as more efficient than reprogramming with transcription factors. However, embryos derived from nuclear transfer show major epigenetic reprogramming deficiencies5. For example, genes expressed in the donor cell nuclei can be expressed in an inappropriate lineage of the embryo134. In embryos of neurectodermal origin, more than half overexpressed the neural marker Sox2 in endodermal cells134. Even a second round of nuclear transfer could not establish a fully reprogrammed state, suggesting that epigenetic memory was persistent. This cellular memory was apparently the result of the euchromatic histone variant H3.3 (ref. 135), further highlighting the complexity of epigenetic remodeling during reprogramming. More evidence of incomplete or aberrant epigenetic reprogramming during nuclear transfer comes from the finding that cloned mice are heavier than normal mice and show several behavioral and metabolic alterations that are consistent with obesity136. The obese phenotype was not transmitted through the germline, suggesting that it was probably caused by reversible epigenetic changes. Although the oocyte is primed for extensive chromatin remodeling and reprogramming of the paternal and maternal genomes, it often fails to accomplish complete reprogramming of somatic donor cells5. When the expression of Oct4 and ten Oct4-related genes was tested in blastocysts derived by somatic cell nuclear transfer using cumulus cells as donor cells, nearly 40% of the blastocysts failed to express at least one of the 11 genes, whereas all 15 fertilization-derived blastocysts expressed the complete set137. In somatic cells, most of these genes are silenced by DNA methylation of their promoters82. Notably, many of the genes that are frequently not reprogrammed by the oocyte are also inefficiently reprogrammed during the generation of iPS cells.
Oct4, Dppa2, Dppa3, Dppa4 and Dppa5 remain inactive until late stages of reprogramming or are expressed only in established iPS cell lines81. This group of genes is highly DNA methylated in MEFs, which might explain why it is difficult to reprogram them. Indeed, hypomethylation has been shown to facilitate reprogramming by both somatic cell nuclear transfer138 and transcription factors81. By combining gene expression data and genome-wide epigenetic maps, our group found a strong correlation between the chromatin state in fibroblasts and the activation of genes that are expressed in pluripotent cells81. Genes in an open chromatin state (marked by H3K4me3 or H3K4me3 and H3K27me3) are efficiently reactivated, whereas genes that are silenced by H3K27me3 methylation alone remain mostly repressed. Both near promoters and in intergenic regions, most (97%) of the high CpG promoters that lack H3K4me3 enrichment in MEFs regain this mark in one of the iPS cell lines that we investigated81. These genes fall into at least two groups. One group shows the reactivation and associated gain of H3K4me3 at key pluripotency factors. These can be further subdivided into two classes: the first includes genes such as Lin28, Sox2 and Fgf4, which are repressed by H3K27me3 and lack detectable H3K4me3; the second includes the genes Oct4, Nanog and Dppa, which are repressed by DNA methylation and also lack detectable H3K4me3. The other major group describes genes, including MyoD, that are repressed by H3K27me3 and highly enriched for developmental transcription factors. To create a truly pluripotent cell line, these loci must remain repressed but have to acquire H3K4me3 to reestablish their bivalency and thus their developmental competence for all germ layers and cell types. Failure to do so is not detectable in gene expression data and can only be observed at the epigenetic level81.
Overall, gene expression patterns would suggest a much less dynamic transition from the differentiated state to pluripotency (Fig. 2).
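The correlation between fibroblast chromatin state and reactivation described above can be restated as a small lookup. The sketch below is illustrative only: the `Promoter` record, the boolean mark calls and the state labels are hypothetical simplifications (real chromatin maps are quantitative enrichment signals that must be thresholded), and the toy entries merely encode the states that the text assigns to these genes in fibroblasts; they are not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promoter:
    """Hypothetical, simplified promoter record for a starting somatic cell."""
    gene: str
    h3k4me3: bool                 # activating mark called as present
    h3k27me3: bool                # PcG-associated repressive mark called as present
    dna_methylated: bool = False  # promoter DNA methylation called as present


def chromatin_state(p: Promoter) -> str:
    """Collapse the mark calls into one coarse state label (labels are assumptions)."""
    if p.h3k4me3 and p.h3k27me3:
        return "bivalent"
    if p.h3k4me3:
        return "active"
    if p.h3k27me3:
        return "H3K27me3-only"
    if p.dna_methylated:
        return "DNA-methylated"
    return "unmarked"


def likely_reactivated(p: Promoter) -> bool:
    """Open states (H3K4me3 alone, or H3K4me3 with H3K27me3) tend to be reactivated."""
    return chromatin_state(p) in {"active", "bivalent"}


# Toy entries following the fibroblast states named in the text.
examples = [
    Promoter("Sox2", h3k4me3=False, h3k27me3=True),
    Promoter("Oct4", h3k4me3=False, h3k27me3=False, dna_methylated=True),
    Promoter("hypothetical_poised_gene", h3k4me3=True, h3k27me3=True),
]

for p in examples:
    print(p.gene, chromatin_state(p), likely_reactivated(p))
```

The rule captures only the overall trend. The text itself notes exceptions, such as key pluripotency genes that lack H3K4me3 in fibroblasts yet regain it in iPS cells, so a state label of this kind is a starting point for comparison rather than a prediction.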
Conclusions
More than 60 years ago, the best-characterized epigenetic mark, 5-methyl cytosine, was reported to be a minor constituent of mammalian genomes139. Now, 10 years after the first draft of the human genome sequence140, we have several DNA methylation maps of the human genome at single-nucleotide resolution and dozens of genome-wide chromatin state maps41,56–64. These studies have provided novel insights into transcriptional regulation and the role of epigenetic modifications across cell types. Knowledge of the genomic distribution of many histone modifications has helped us to understand the roles of these modifications and enabled more efficient annotation of the genome, including the identification of putative enhancers141, miRNAs142 and large intergenic non-coding (linc)RNAs143. The expanding catalog of lincRNAs (>3,000) and their association with PRC2 points to a possible general mechanism whereby RNAs can guide chromatin-modifying complexes to their specific sites of action144. More generally, the rapidly advancing field of noncoding RNAs underscores that we are still far from understanding how the epigenetic machinery operates in cells. In the short term, investigation of the chromatin state may help to explain differences in the responses of cells to the ectopic expression of transcription factors and lead to more efficient methods of reprogramming. Furthermore, a large repertoire of in vivo epigenome maps will also facilitate studies to determine the quality and utility of cells generated in vitro by directed differentiation or reprogramming. The enormous increase in sequencing capabilities clearly indicates that this is only the beginning.
Coordinated national and international large-scale efforts, including the NIH Roadmap Epigenomics Program and the International Human Epigenome Consortium (IHEC), are underway to comprehensively map the entire human epigenome or at least capture as much of the epigenomic space (cell types × epigenetic marks × different backgrounds, such as age, genetics, diet) as possible. Epigenome reference maps and data from these projects will significantly increase our understanding of normal biology, ES cells and epigenetic reprogramming as well as undesired changes in disease states.

ACKNOWLEDGMENTS
I thank C. Bock, Z. Smith and B. Bernstein for comments on the manuscript. A.M. is supported by the Massachusetts Life Science Center, the Pew Charitable Trusts and the US National Institutes of Health Roadmap Initiative on Epigenomics (U01ES017155).

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002). 2. Jaenisch, R. & Bird, A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33 Suppl, 245–254 (2003). 3. Yamanaka, S. Strategies and new developments in the generation of patient-specific pluripotent stem cells. Cell Stem Cell 1, 39–49 (2007). 4. Reik, W., Dean, W. & Walter, J. Epigenetic reprogramming in mammalian development. Science 293, 1089–1093 (2001). 5. Rideout, W.M. III, Eggan, K. & Jaenisch, R. Nuclear cloning and epigenetic reprogramming of the genome. Science 293, 1093–1098 (2001). 6. Hammoud, S.S. et al. Distinctive chromatin in human sperm packages genes for embryo development. Nature 460, 473–478 (2009). 7. Oswald, J. et al. Active demethylation of the paternal genome in the mouse zygote. Curr. Biol. 10, 475–478 (2000). 8. Santos, F., Hendrich, B., Reik, W. & Dean, W. Dynamic reprogramming of DNA methylation in the early mouse embryo. Dev. Biol. 241, 172–182 (2002). 9. Mayer, W., Niveleau, A., Walter, J., Fundele, R. & Haaf, T. Demethylation of the zygotic paternal genome. Nature 403, 501–502 (2000). 10. Kafri, T. et al. Developmental pattern of gene-specific DNA methylation in the mouse embryo and germ line. Genes Dev. 6, 705–714 (1992).
11. Lane, N. et al. Resistance of IAPs to methylation reprogramming may provide a mechanism for epigenetic inheritance in the mouse. Genesis 35, 88–93 (2003). 12. Rossant, J. Stem cells and early lineage development. Cell 132, 527–531 (2008). 13. Niakan, K.K. et al. Sox17 promotes differentiation in mouse embryonic stem cells by directly regulating extraembryonic gene expression and indirectly antagonizing self-renewal. Genes Dev. 24, 312–326 (2010). 14. Tanaka, S., Kunath, T., Hadjantonakis, A.K., Nagy, A. & Rossant, J. Promotion of trophoblast stem cell proliferation by FGF4. Science 282, 2072–2075 (1998). 15. Evans, M.J. & Kaufman, M.H. Establishment in culture of pluripotential cells from mouse embryos. Nature 292, 154–156 (1981). 16. Martin, G.R. Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells. Proc. Natl. Acad. Sci. USA 78, 7634–7638 (1981). 17. Chen, A.E. et al. Optimal timing of inner cell mass isolation increases the efficiency of human embryonic stem cell derivation and allows generation of sibling cell lines. Cell Stem Cell 4, 103–106 (2009). 18. Cowan, C.A. et al. Derivation of embryonic stem-cell lines from human blastocysts. N. Engl. J. Med. 350, 1353–1356 (2004). 19. Thomson, J.A. et al. Embryonic stem cell lines derived from human blastocysts. Science 282, 1145–1147 (1998). 20. Silva, J. & Smith, A. Capturing pluripotency. Cell 132, 532–536 (2008). 21. Jaenisch, R. & Young, R. Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell 132, 567–582 (2008). 22. Jackson-Grusby, L. et al. Loss of genomic methylation causes p53-dependent apoptosis and epigenetic deregulation. Nat. Genet. 27, 31–39 (2001). 23. Jones, P.A. & Baylin, S.B. The epigenomics of cancer. Cell 128, 683–692 (2007). 24. Goll, M.G. & Bestor, T.H. Eukaryotic cytosine methyltransferases. Annu. Rev. Biochem. 74, 481–514 (2005). 25. Okano, M., Xie, S. & Li, E. 
Dnmt2 is not required for de novo and maintenance methylation of viral DNA in embryonic stem cells. Nucleic Acids Res. 26, 2536–2540 (1998). 26. Goll, M.G. et al. Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science 311, 395–398 (2006). 27. Li, E., Bestor, T.H. & Jaenisch, R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69, 915–926 (1992). 28. Okano, M., Bell, D.W., Haber, D.A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247–257 (1999). 29. Dodge, J.E. et al. Inactivation of Dnmt3b in mouse embryonic fibroblasts results in DNA hypomethylation, chromosomal instability, and spontaneous immortalization. J. Biol. Chem. 280, 17986–17991 (2005). 30. Kaneda, M. et al. Essential role for de novo DNA methyltransferase Dnmt3a in paternal and maternal imprinting. Nature 429, 900–903 (2004). 31. Bourc’his, D., Xu, G.L., Lin, C.S., Bollman, B. & Bestor, T.H. Dnmt3L and the establishment of maternal genomic imprints. Science 294, 2536–2539 (2001). 32. Bourc’his, D. & Bestor, T.H. Meiotic catastrophe and retrotransposon reactivation in male germ cells lacking Dnmt3L. Nature 431, 96–99 (2004). 33. Jia, D., Jurkowska, R.Z., Zhang, X., Jeltsch, A. & Cheng, X. Structure of Dnmt3a bound to Dnmt3L suggests a model for de novo DNA methylation. Nature 449, 248–251 (2007). 34. Ooi, S.K. et al. DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714–717 (2007). 35. Hayashi, K. & Surani, M.A. Resetting the epigenome beyond pluripotency in the germline. Cell Stem Cell 4, 493–498 (2009). 36. Popp, C. et al. Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature 463, 1101–1105 (2010). 37. Strahl, B.D. & Allis, C.D. The language of covalent histone modifications. Nature 403, 41–45 (2000). 38. Shi, Y.
Histone lysine demethylases: emerging roles in development, physiology and disease. Nat. Rev. Genet. 8, 829–833 (2007). 39. Francis, N.J. & Kingston, R.E. Mechanisms of transcriptional memory. Nat. Rev. Mol. Cell Biol. 2, 409–421 (2001). 40. Campos, E.I. & Reinberg, D. Histones: annotating chromatin. Annu. Rev. Genet. 43, 559–599 (2009). 41. Bernstein, B.E., Meissner, A. & Lander, E.S. The mammalian epigenome. Cell 128, 669–681 (2007). 42. Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039–1043 (2002). 43. Zhang, Y., Cao, R., Wang, L. & Jones, R.S. Mechanism of Polycomb group gene silencing. Cold Spring Harb. Symp. Quant. Biol. 69, 309–317 (2004). 44. Faust, C., Lawson, K.A., Schork, N.J., Thiel, B. & Magnuson, T. The Polycomb-group gene eed is required for normal morphogenetic movements during gastrulation in the mouse embryo. Development 125, 4495–4506 (1998). 45. O’Carroll, D. et al. The polycomb-group gene Ezh2 is required for early mouse development. Mol. Cell. Biol. 21, 4330–4336 (2001). 46. Pasini, D., Bracken, A.P., Jensen, M.R., Lazzerini Denchi, E. & Helin, K. Suz12 is essential for mouse development and for EZH2 histone methyltransferase activity. EMBO J. 23, 4061–4071 (2004). 47. Shen, X. et al. EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Mol. Cell 32,
48. Ezhkova, E. et al. Ezh2 orchestrates gene expression for the stepwise differentiation of tissue-specific stem cells. Cell 136, 1122–1135 (2009). 49. Schumacher, A., Faust, C. & Magnuson, T. Positional cloning of a global regulator of anterior-posterior patterning in mice. Nature 383, 250–253 (1996). 50. Hanson, R.D. et al. Mammalian Trithorax and polycomb-group homologues are antagonistic regulators of homeotic development. Proc. Natl. Acad. Sci. USA 96, 14372–14377 (1999). 51. van der Lugt, N.M. et al. Posterior transformation, neurological abnormalities, and severe hematopoietic defects in mice with a targeted deletion of the bmi-1 protooncogene. Genes Dev. 8, 757–769 (1994). 52. Dodge, J.E., Kang, Y.K., Beppu, H., Lei, H. & Li, E. Histone H3–K9 methyltransferase ESET is essential for early development. Mol. Cell. Biol. 24, 2478–2486 (2004). 53. Tachibana, M. et al. Histone methyltransferases G9a and GLP form heteromeric complexes and are both crucial for methylation of euchromatin at H3–K9. Genes Dev. 19, 815–826 (2005). 54. Li, E. Chromatin modification and epigenetic reprogramming in mammalian development. Nat. Rev. Genet. 3, 662–673 (2002). 55. Surani, M.A., Hayashi, K. & Hajkova, P. Genetic and epigenetic regulators of pluripotency. Cell 128, 747–762 (2007). 56. Ku, M. et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242 (2008). 57. Zhao, X.D. et al. Whole-genome mapping of histone H3 Lys4 and 27 trimethylations reveals distinct genomic compartments in human embryonic stem cells. Cell Stem Cell 1, 286–298 (2007). 58. Pan, G. et al. Whole-genome analysis of histone H3 lysine 4 and lysine 27 methylation in human embryonic stem cells. Cell Stem Cell 1, 299–312 (2007). 59. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007). 60. Bernstein, B.E. et al.
A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006). 61. Laurent, L. et al. Dynamic changes in the human methylome during differentiation. Genome Res. 20, 320–331 (2010). 62. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009). 63. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008). 64. Mohn, F. et al. Lineage-specific polycomb targets and de novo DNA methylation define restriction and potential of neuronal progenitors. Mol. Cell 30, 755–766 (2008). 65. Meissner, A. et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 33, 5868–5877 (2005). 66. Jackson, M. et al. Severe global DNA hypomethylation blocks differentiation and induces histone hyperacetylation in embryonic stem cells. Mol. Cell. Biol. 24, 8862–8871 (2004). 67. Tsumura, A. et al. Maintenance of self-renewal ability of mouse embryonic stem cells in the absence of DNA methyltransferases Dnmt1, Dnmt3a and Dnmt3b. Genes Cells 11, 805–814 (2006). 68. Chamberlain, S.J., Yee, D. & Magnuson, T. Polycomb repressive complex 2 is dispensable for maintenance of embryonic stem cell pluripotency. Stem Cells 26, 1496–1505 (2008). 69. Holm, T.M. et al. Global loss of imprinting leads to widespread tumorigenesis in adult mice. Cancer Cell 8, 275–285 (2005). 70. Fouse, S. et al. Promoter CpG methylation contributes to ES cell gene regulation in parallel with Oct4/Nanog, PcG complex, and histone H3 K4/K27 trimethylation. Cell Stem Cell 2, 160–169 (2008). 71. Ng, R.K. et al. Epigenetic restriction of embryonic cell lineage fate by methylation of Elf5. Nat. Cell Biol. 10, 1280–1290 (2008). 72. Boyer, L.A. et al. Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349–353 (2006). 73. Chi, A.S. & Bernstein, B.E.
Developmental biology. Pluripotent chromatin state. Science 323, 220–221 (2009). 74. Bracken, A.P., Dietrich, N., Pasini, D., Hansen, K.H. & Helin, K. Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes Dev. 20, 1123–1136 (2006). 75. Martens, J.H. et al. The profile of repeat-associated histone lysine methylation states in the mouse epigenome. EMBO J. 24, 800–812 (2005). 76. Bilodeau, S., Kagey, M.H., Frampton, G.M., Rahl, P.B. & Young, R.A. SetDB1 contributes to repression of genes encoding developmental regulators and maintenance of ES cell state. Genes Dev. 23, 2484–2489 (2009). 77. Lohmann, F. et al. KMT1E mediated H3K9 methylation is required for the maintenance of embryonic stem cells by repressing trophectoderm differentiation. Stem Cells 28, 201–212 (2010). 78. Rollins, R.A. et al. Large-scale structure of genomic methylation patterns. Genome Res. 16, 157–163 (2006). 79. Weber, M. et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat. Genet. 39, 457–466 (2007). 80. Cedar, H. & Bergman, Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat. Rev. Genet. 10, 295–304 (2009). 81. Mikkelsen, T.S. et al. Dissecting direct reprogramming through integrative genomic analysis. Nature 454, 49–55 (2008). 82. Imamura, M. et al. Transcriptional repression and DNA hypermethylation of a small set of ES cell marker genes in male germline stem cells. BMC Dev. Biol. 6, 34 (2006).
83. Ramsahoye, B.H. et al. Non-CpG methylation is prevalent in embryonic stem cells and may be mediated by DNA methyltransferase 3a. Proc. Natl. Acad. Sci. USA 97, 5237–5242 (2000). 84. Haines, T.R., Rodenhiser, D.I. & Ainsworth, P.J. Allele-specific non-CpG methylation of the Nf1 gene during early mouse development. Dev. Biol. 240, 585–598 (2001). 85. Dodge, J.E., Ramsahoye, B.H., Wo, Z.G., Okano, M. & Li, E. De novo methylation of MMLV provirus in embryonic stem cells: CpG versus non-CpG methylation. Gene 289, 41–48 (2002). 86. Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215–219 (2008). 87. Chan, S.W., Henderson, I.R. & Jacobsen, S.E. Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat. Rev. Genet. 6, 351–360 (2005). 88. Lomvardas, S. et al. Interchromosomal interactions and olfactory receptor choice. Cell 126, 403–413 (2006). 89. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009). 90. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009). 91. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129–1133 (2010). 92. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008). 93. Wu, S.C. & Zhang, Y. Active DNA demethylation: many roads lead to Rome. Nat. Rev. Mol. Cell Biol. 11, 607–620 (2010). 94. Jeltsch, A. Beyond Watson and Crick: DNA methylation and molecular enzymology of DNA methyltransferases. ChemBioChem 3, 274–293 (2002). 95. Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R. & Young, R.A. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77–88 (2007). 96.
Heintzman, N.D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007). 97. Boiani, M. & Scholer, H.R. Regulatory networks in embryo-derived pluripotent stem cells. Nat. Rev. Mol. Cell Biol. 6, 872–884 (2005). 98. Xie, W. et al. Histone H3 lysine 56 acetylation is linked to the core transcriptional network in human embryonic stem cells. Mol. Cell 33, 417–427 (2009). 99. Daujat, S. et al. H3K64 trimethylation marks heterochromatin and is dynamically remodeled during developmental reprogramming. Nat. Struct. Mol. Biol. 16, 777–781 (2009). 100. Efroni, S. et al. Global transcription in pluripotent embryonic stem cells. Cell Stem Cell 2, 437–447 (2008). 101. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010). 102. Goren, A. et al. Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat. Methods 7, 47–49 (2010). 103. Xu, J. et al. Transcriptional competence and the active marking of tissue-specific enhancers by defined transcription factors in embryonic and induced pluripotent stem cells. Genes Dev. 23, 2824–2838 (2009). 104. Epsztejn-Litman, S. et al. De novo DNA methylation promoted by G9a prevents reprogramming of embryonically silenced genes. Nat. Struct. Mol. Biol. 15, 1176–1183 (2008). 105. Feldman, N. et al. G9a-mediated irreversible epigenetic inactivation of Oct-3/4 during early embryogenesis. Nat. Cell Biol. 8, 188–194 (2006). 106. Cherry, S.R., Biniszkiewicz, D., van Parijs, L., Baltimore, D. & Jaenisch, R. Retroviral expression in embryonic stem cells and hematopoietic stem cells. Mol. Cell. Biol. 20, 7419–7426 (2000). 107. Matsui, T. et al. Proviral silencing in embryonic stem cells requires the histone methyltransferase ESET. Nature 464, 927–931 (2010). 108. Sen, G.L., Reuter, J.A., Webster, D.E., Zhu, L. & Khavari, P.A.
DNMT1 maintains progenitor function in self-renewing somatic tissue. Nature 463, 563–567 (2010). 109. Bröske, A.M. et al. DNA methylation protects hematopoietic stem cell multipotency from myeloerythroid restriction. Nat. Genet. 41, 1207–1215 (2009). 110. Trowbridge, J.J., Snow, J.W., Kim, J. & Orkin, S.H. DNA methyltransferase 1 is essential for and uniquely regulates hematopoietic stem and progenitor cells. Cell Stem Cell 5, 442–449 (2009). 111. Ji, H. et al. Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467, 338–342 (2010). 112. Tadokoro, Y., Ema, H., Okano, M., Li, E. & Nakauchi, H. De novo DNA methyltransferase is essential for self-renewal, but not for differentiation, in hematopoietic stem cells. J. Exp. Med. 204, 715–722 (2007). 113. Jacobs, J.J., Kieboom, K., Marino, S., DePinho, R.A. & van Lohuizen, M. The oncogene and Polycomb-group gene bmi-1 regulates cell proliferation and senescence through the ink4a locus. Nature 397, 164–168 (1999). 114. Woo, C.J., Kharchenko, P.V., Daheron, L., Park, P.J. & Kingston, R.E. A region of the human HOXD cluster that confers polycomb-group responsiveness. Cell 140, 99–110 (2010). 115. Li, G. et al. Jarid2 and PRC2, partners in regulating gene expression. Genes Dev. 24, 368–380 (2010). 116. Pasini, D. et al. JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306–310 (2010). 117. Peng, J.C. et al. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290–1302 (2009).
118. Shen, X. et al. Jumonji modulates polycomb activity and self-renewal versus differentiation of stem cells. Cell 139, 1303–1314 (2009). 119. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009). 120. Gupta, R.A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076 (2010). 121. Guenther, M.G. & Young, R.A. Transcription. Repressive transcription. Science 329, 150–151 (2010). 122. Hochedlinger, K. & Jaenisch, R. Nuclear reprogramming and pluripotency. Nature 441, 1061–1067 (2006). 123. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006). 124. Park, I.H. et al. Reprogramming of human somatic cells to pluripotency with defined factors. Nature 451, 141–146 (2008). 125. Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872 (2007). 126. Yu, J. et al. Induced pluripotent stem cell lines derived from human somatic cells. Science 318, 1917–1920 (2007). 127. Maherali, N. et al. Global epigenetic remodeling in directly reprogrammed fibroblasts. Cell Stem Cell 1, 55–70 (2007). 128. Meissner, A., Wernig, M. & Jaenisch, R. Direct reprogramming of genetically unmodified fibroblasts into pluripotent stem cells. Nat. Biotechnol. 25, 1177–1181 (2007). 129. Wernig, M. et al. In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318–324 (2007). 130. Huangfu, D. et al. Induction of pluripotent stem cells by defined factors is greatly improved by small-molecule compounds. Nat. Biotechnol. 26, 795–797 (2008). 131. Ichida, J.K. et al. A small-molecule inhibitor of Tgf-β signaling replaces Sox2 in reprogramming by inducing Nanog. Cell Stem Cell 5, 491–503 (2009).
132. Chin, M.H. et al. Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell 5, 111–123 (2009). 133. Polo, J.M. et al. Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells. Nat. Biotechnol. 28, 848–855 (2010). 134. Ng, R.K. & Gurdon, J.B. Epigenetic memory of active gene transcription is inherited through somatic cell nuclear transfer. Proc. Natl. Acad. Sci. USA 102, 1957–1962 (2005). 135. Ng, R.K. & Gurdon, J.B. Epigenetic memory of an active gene state depends on histone H3.3 incorporation into chromatin in the absence of transcription. Nat. Cell Biol. 10, 102–109 (2008). 136. Tamashiro, K.L. et al. Cloned mice have an obese phenotype not transmitted to their offspring. Nat. Med. 8, 262–267 (2002). 137. Bortvin, A. et al. Incomplete reactivation of Oct4-related genes in mouse embryos cloned from somatic nuclei. Development 130, 1673–1680 (2003). 138. Blelloch, R. et al. Reprogramming efficiency following somatic cell nuclear transfer is influenced by the differentiation and methylation state of the donor nucleus. Stem Cells 24, 2007–2013 (2006). 139. Hotchkiss, R.D. The quantitative separation of purines, pyrimidines, and nucleosides by paper chromatography. J. Biol. Chem. 175, 315–332 (1948). 140. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001). 141. Heintzman, N.D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007). 142. Marson, A. et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134, 521–533 (2008). 143. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009). 144. Khalil, A.M. et al. 
Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
review
Genomics tools for unraveling chromosome architecture
Bas van Steensel1 & Job Dekker2

1Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, The Netherlands. 2Program in Gene Function and Expression, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA. Correspondence should be addressed to B.v.S. ([email protected]) or J.D. ([email protected]). Published online 13 October 2010; doi:10.1038/nbt.1680

The spatial organization of chromosomes inside the cell nucleus is still poorly understood. This organization is guided by intra- and interchromosomal contacts and by interactions of specific chromosomal loci with relatively fixed nuclear ‘landmarks’ such as the nuclear envelope and the nucleolus. Researchers have begun to use new molecular genome-wide mapping techniques to uncover both types of molecular interactions, providing insights into the fundamental principles of interphase chromosome folding.

The three-dimensional (3D) architecture of interphase chromosomes is one of the most fascinating topological problems in biology. Decades of microscopy studies have revealed several important general principles that govern chromosome architecture1–3. First, interphase chromosomes each occupy their own territory in the nucleus, with only a limited degree of intermingling. Second, genomic loci tend to be nonrandomly positioned within the nuclear space and relative to each other, strongly suggesting that chromosomes adopt a configuration that is at least partially reproducible. Finally, the degree of compaction of the chromatin fiber varies locally, and is often, but not always, inversely linked to transcriptional activity and gene density. These important insights have been mostly obtained by fluorescence in situ hybridization (FISH) and in vivo tagging of selected genomic loci1–3. The power of these methods lies in their ability to visualize individual loci inside single cell nuclei by light microscopy. However, the resolution limits of light microscopy and the practical restriction that only a few loci can be visualized simultaneously have hampered the construction of detailed models of chromosome architecture. Fortunately, over the past few years several new molecular techniques have been developed toward this goal. These techniques directly probe molecular interactions and thereby offer new views beyond the resolution limits of microscopy. Moreover, by taking advantage of genome-wide detection methods such as high-density microarrays and massively parallel sequencing, researchers can now make comprehensive measurements of structural parameters of chromatin for entire genomes in a single experiment. In essence, the new techniques focus on detecting two distinct classes of molecular contacts involving the chromatin fiber (Fig. 1 and Table 1). One set of techniques identifies physical interactions of genomic loci with relatively fixed nuclear structures (landmarks) such as the nuclear envelope or the nucleolus. This can yield important information about the position of genomic loci in nuclear space. A second set of techniques monitors physical associations between
linearly distant sequences that come together by folding or bending of the chromatin fiber. Such associations may also occur between loci on different chromosomes. Knowledge of intra- and interchromosomal contacts provides insight into the local or global folding of chromo somes, and into the positioning of chromosomes relative to one another. Various chromatin-landmark interactions and chromatinchromatin contacts have now been mapped systematically. Here, we highlight these new technological developments and the biological understanding they have yielded so far. Molecular mapping of genome interactions with nuclear landmarks The nuclear envelope is the main fixed structure of the nucleus, and has long been thought to provide anchoring sites for interphase chromo somes, and thus to help organize the genome inside the nucleus. The nuclear envelope consists of a double lipid membrane punctured by nuclear pore complexes (NPCs), which act as channels for nuclear import and export4. In most metazoan cells, the nucleoplasmic surface of the inner nuclear membrane is coated by a sheet-like protein structure called the nuclear lamina. Its major constituents are nuclear lamins, which form a dense network of polymer fibers5–7. Both the nuclear lamina and NPCs were proposed decades ago to provide anchoring sites for interphase chromosomes8,9. Indeed, many FISH microscopy studies have supported this model: some genomic loci are preferentially located close to the nuclear envelope, whereas other loci are typically found in the nuclear interior3,10,11. However, because of resolution limits it was generally impossible to tell whether these loci are in molecular contact with the nuclear lamina or the NPCs. Recent genome-wide mapping techniques have begun to provide more global insights into the molecular inter actions of chromosomes with components of the nuclear envelope. Interactions of the genome with the nuclear lamina have been mapped by means of DamID technology (Fig. 2). 
In this application, a protein of the nuclear lamina (typically a lamin) is fused to DNA adenine methyltransferase (Dam) from Escherichia coli. When it is expressed in cells, this chimeric protein is incorporated into the nuclear lamina. As a consequence, DNA that is in molecular contact with the nuclear lamina in vivo is methylated by the tethered Dam. The resulting tags, which are unique because DNA adenine methylation does not occur endogenously in most eukaryotes, can be mapped using a microarray-based readout12,13. Through this approach, nuclear lamina interactions have been mapped in detail in Drosophila melanogaster, mouse and human cells14–16. In all three species, interactions with the nuclear lamina involve very large genomic domains, rather than focal sites. Mouse and human genomes have >1,000 lamina-associated domains (LADs) with a median size of ~0.5 megabases (Mb). In human cells, several sequence elements demarcate the borders of many LADs, indicating that LAD organization is at least partially encoded in the genome sequence15.

Although LADs have on average a relatively low gene density, when combined they nevertheless harbor thousands of genes. Notably, most of these genes are transcriptionally inactive15,16. This suggests that the nuclear lamina has a repressive role in gene regulation. Consistent with this, deletion of the major lamin in D. melanogaster causes upregulation of some genes associated with nuclear lamina17. Moreover, artificial tethering to the nuclear lamina can cause the downregulation of reporter and some endogenous genes, although this may depend on the reporter or its genomic integration site18–20. Furthermore, during differentiation, hundreds of genes show altered interactions with nuclear lamina. For many genes, detachment from the nuclear lamina occurs concomitant with transcriptional activation; other detached genes initially remain silent but are more prone to activation in a second differentiation step, suggesting that interaction with the nuclear lamina locks these genes in a stably repressed state16.

Figure 1 Cartoon of nucleus depicting the spatial interactions that contribute to the overall architecture of interphase chromosomes. Labels A–D refer to corresponding entries in Table 1. [Labels in cartoon: A, B, C, D; NPC; Lamina; Nucleolus.]

© 2010 Nature America, Inc. All rights reserved.

Table 1 Genome contacts and mapping techniques
Genome contacts | Techniques
A. Nuclear lamina | DamID
B. Nuclear pores | ChIP, DamID
C. Nucleolus | Fractionation
D. Intra- and interchromosomal | 3C and derivatives

Interactions of the genome with NPCs have been studied by both DamID and chromatin immunoprecipitation (ChIP). The latter technique uses cross-linking of protein-DNA interactions with formaldehyde (and sometimes other cross-linking chemicals), followed by mechanical fragmentation of the DNA and subsequent immunoprecipitation using antibodies, in this case antibodies for NPC proteins (Nups). Genome-wide tiling microarrays have been used to identify the immunoprecipitated DNA sequences. In yeast, D. melanogaster and human cells, hundreds of genes are associated with various Nups21–25. Notably, detailed analyses in D. melanogaster established that a substantial proportion of these binding events occur in the nuclear interior, involving freely diffusing Nups23,24. Although this sheds light on an NPC-independent regulatory role of certain Nups, it also implies that most genome-wide maps of Nup interactions cannot be easily interpreted in terms of spatial organization of the genome, unless one conducts ChIP or DamID experiments with Nups that are only present in the NPC and not in the nucleoplasm. Fornerod and colleagues compared DamID maps obtained with engineered Nups that are either exclusively NPC-associated or mostly nucleoplasmic23. True NPC-associated loci thus identified are rather short sequences of <2 kilobases (kb) that do not overlap with the larger nuclear lamina–associated domains, in agreement with the spatial separation of NPCs and the nuclear lamina seen by high-resolution microscopy26. The NPC-interacting sites tend to be located on genes that are transcribed at moderate levels23.

Both ChIP and DamID have some limitations. In its current implementation, DamID has low temporal resolution13 and therefore cannot capture the dynamics of nuclear lamina and NPC interactions, for example during cell cycle progression. Development of a rapidly switchable Dam enzyme should overcome this limitation. ChIP has better temporal resolution because formaldehyde cross-linking occurs within minutes. However, it has so far been difficult to generate ChIP maps of nuclear lamina components, for reasons that are not understood.

Another nuclear landmark that acts as an anchoring site for DNA is the nucleolus. Originally, this nuclear compartment was thought to harbor only the genes encoding ribosomal RNA, which are transcribed by RNA polymerase I. To find other sequences that may interact with nucleoli, a recent study used simple sedimentation fractionation to isolate nucleoli from human cells. The associated DNA was then characterized by massively parallel sequencing and microarray hybridizations27. In addition to rRNA genes, many large genomic regions called nucleolus-associated domains (NADs) were identified. NADs are large genomic segments (median size 750 kb) that are highly enriched in centromeric satellite repeats and specific inactive gene clusters; this is consistent with the preferential localization of centromeres around nucleoli27,28. Notably, the genes encoding 5S rRNA and transfer RNAs, which are transcribed by RNA polymerase III, also preferentially associate with the nucleolus, in agreement with earlier microscopy observations29.
Other NAD-embedded genes tend to take part in specific biological processes, such as odor perception, tissue development and the immune response, suggesting that nucleolus interactions may help coordinate the expression of specific gene sets. Together, these results demonstrate that distinct sets of chromosomal regions interact specifically with the nuclear lamina, NPCs and nucleoli.

Mapping of long-range chromatin interactions

Microscopic analysis of interphase chromosomes suggests that they form amorphously shaped territories, with seemingly little internal organization. Yet, chromosomes must be folded in intricate patterns, for example, to accommodate association of silent loci with the nuclear periphery, while simultaneously allowing expressed loci to congregate at sites of active transcription (‘transcription factories’). Furthermore, gene expression is modulated by cis regulatory elements, such as enhancers, that often are located hundreds of kilobases from their target genes. Many enhancers are thought to physically associate with the promoters they regulate, leading to formation of chromatin loops. A human chromosome contains hundreds to thousands of genes and each interacts, when active, with a set of regulatory elements. This array of long-range interactions will constrain the chromatin fiber into a highly complex 3D network. The precise topology of these chromatin interaction networks, and how these networks are embedded inside the nucleus, is still mostly unknown, but new molecular and genome-wide approaches are now starting to clarify the folding principles of chromosomes.
Figure 2 Mapping of interactions of the genome with nuclear landmarks, here shown for the nuclear lamina. See text for explanation. Adenine-methylated DNA is specifically amplified using a PCR-based protocol using restriction endonucleases that selectively digest DNA depending on the adenine-methylation state, as described elsewhere12,13. NL, nuclear lamina. [Workflow shown: express Dam-fusion protein (Dam fused to an NL protein) in cells or tissue; DNA in contact with NL becomes adenine-methylated; isolate genomic DNA; selectively amplify adenine-methylated DNA fragments; label and hybridize to genomic tiling array. Example plot: Dam-fusion/Dam log ratio (y axis, –3 to 1) against position on chromosome 5 (x axis, 90–102 Mb).]
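As an illustration of the Dam-fusion/Dam log ratio plotted in Figure 2, the following minimal Python sketch computes a per-probe log2 ratio and groups runs of consecutive positive probes into candidate lamina-associated domains. The function names, the pseudocount, the zero threshold, the minimum run length and the toy intensities are all assumptions made for illustration; this is not the normalization or domain-calling procedure of the cited DamID studies12–16.

```python
import math


def damid_log2_ratio(fusion_signal, dam_only_signal, pseudocount=1.0):
    """Per-probe log2(Dam-fusion / Dam-only); the pseudocount (an assumption) avoids log(0)."""
    return [
        math.log2((f + pseudocount) / (d + pseudocount))
        for f, d in zip(fusion_signal, dam_only_signal)
    ]


def call_domains(positions, log_ratios, threshold=0.0, min_probes=3):
    """Return (start, end) for runs of at least min_probes consecutive probes above threshold."""
    domains, run = [], []
    for pos, ratio in zip(positions, log_ratios):
        if ratio > threshold:
            run.append(pos)
        else:
            if len(run) >= min_probes:
                domains.append((run[0], run[-1]))
            run = []
    if len(run) >= min_probes:  # close a run that reaches the last probe
        domains.append((run[0], run[-1]))
    return domains


# Toy probe positions (Mb) and array intensities; all values are hypothetical.
positions = [90.0, 90.5, 91.0, 91.5, 92.0, 92.5, 93.0, 93.5]
fusion = [7, 15, 31, 15, 1, 0, 3, 7]
dam_only = [7, 3, 3, 3, 7, 7, 3, 1]

ratios = damid_log2_ratio(fusion, dam_only)
domains = call_domains(positions, ratios)
print(ratios)   # [0.0, 2.0, 3.0, 2.0, -2.0, -3.0, 0.0, 2.0]
print(domains)  # [(90.5, 91.5)]
```

Requiring several consecutive positive probes reflects the observation in the text that lamina interactions involve large domains rather than focal sites; the single positive probe at 93.5 Mb is therefore not reported.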
The most widely used molecular method to probe the spatial folding of chromatin is chromosome conformation capture30 (3C). 3C determines the relative frequency with which pairs of genomic loci are in direct physical contact. Chromatin is cross-linked with formaldehyde, after which DNA is digested and then re-ligated under dilute conditions that favor intramolecular ligation of cross-linked fragments (Fig. 3 and Table 2). This creates a genome-wide library of 3C ligation products, each of which is composed of a pair of restriction fragments that were sufficiently close to become cross-linked. Interactions detected by 3C can be mediated by proteins that bridge the two loci, but can also reflect coassociation of loci with larger protein complexes, or perhaps even larger subnuclear structures such as nucleoli and transcription factories. Combined, the 3C library reflects the population-averaged folding of the entire genome, at a resolution of several kilobases.

In conventional 3C, the relative abundance of individual ligation products is determined using semiquantitative PCR. Initial 3C analyses in yeast revealed long-range interactions between telomeres, and between centromeres located on different chromosomes, consistent with earlier microscopic observations30. The first 3C studies that demonstrated long-range looping interactions between genes and their enhancers focused on the well-studied β-globin locus31.
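To make the idea of a relative interaction frequency concrete, the sketch below normalizes hypothetical semiquantitative PCR signals for one anchor fragment against a control template, then reports fragments whose normalized signal exceeds that of both genomic neighbors. Because contact frequency generally falls off with genomic distance, such an interior peak is one simple indication of a candidate looping interaction. The control-template normalization, the function names and every number here are assumptions for illustration, not the published 3C protocol30.

```python
def relative_interaction_frequency(pcr_3c, pcr_control):
    """Divide each 3C PCR signal by the same primer pair's signal on a control template."""
    return [s / c for s, c in zip(pcr_3c, pcr_control)]


def interior_peaks(distances_kb, frequencies):
    """Return distances whose frequency exceeds that of both flanking fragments."""
    return [
        distances_kb[i]
        for i in range(1, len(frequencies) - 1)
        if frequencies[i] > frequencies[i - 1] and frequencies[i] > frequencies[i + 1]
    ]


# Toy data: fragments at increasing distance from one anchor (all values hypothetical).
distances_kb = [10, 20, 40, 60, 80]
pcr_3c = [80.0, 30.0, 12.0, 27.0, 4.0]           # signal from the 3C library
pcr_control = [100.0, 100.0, 120.0, 90.0, 80.0]  # signal from a control template

frequencies = relative_interaction_frequency(pcr_3c, pcr_control)
peaks = interior_peaks(distances_kb, frequencies)
print(peaks)  # [60]
```

Dividing by a control signal is meant to cancel differences in primer efficiency between fragment pairs, so that the remaining variation reflects ligation frequency in the 3C library.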
Long-range interactions have now been identified in many candidate loci, for example, the Igf2 locus32, the TH2 cytokine locus33 and the α-globin locus34, and in a variety of species, establishing that looping between genes and regulatory elements is a common mechanism for gene regulation. In many cases, gene promoters interact with multiple elements, and these elements often also interact with each other, leading to the formation of complex looped structures, sometimes called chromatin hubs31.

In an effort to map chromatin interactions at a genome-wide scale, researchers have developed several detection methods to more comprehensively interrogate 3C libraries. 4C and 5C methods detect targeted subsets of 3C ligation products (Table 2)35–37. In 4C, inverse PCR is used to amplify all fragments ligated to a single ‘anchor’ fragment to obtain a genome-wide interaction profile for the anchor locus. 5C uses multiplex ligation-mediated amplification to amplify millions of preselected 3C ligation junctions in parallel, for example, between a set of promoters and a set of enhancers. ChIP-loop (also called 6C) and chromatin interaction analysis using paired-end tag sequencing (ChIA-PET) methods include a ChIP step to selectively identify 3C ligation products that are bound by a protein of interest, for example, a transcription factor38–40. All these high-throughput methods use microarrays or high-throughput sequencing to analyze the amplified ligation junctions. Careful experimental design of 3C-based methods is crucial to avoid artifacts and misinterpretations, as has been discussed in detail elsewhere41,42.

Results obtained with these methods confirm that long-range interactions are widespread, and have also been used to identify several new phenomena. First, long-range interactions can occur over very large genomic distances, up to tens of megabases, suggesting that chromosomes are extensively folded back on themselves.
Second, interactions not only occur between specific short functional elements, such as enhancers and promoters, but also occur over larger chromosomal domains. Some groups of genes have many interactions with each other all along their lengths, suggesting these genes are in close spatial proximity, perhaps owing to association with the same subnuclear structure such as the nuclear envelope, or with a transcription factory. Third, interactions occur not only along chromosomes, but also between them. For instance, the X chromosome–inactivation center (Xic) of one X chromosome transiently interacts with the Xic of the other X-chromosome while X-chromosome inactivation is established43–45. Another example is the trans association of imprinted genes, which may contribute to their regulation46.

Recently, it has become possible to determine chromatin interactions in a truly unbiased and genome-wide manner, that is, without the need to limit the analysis to one selected anchor or a group of them, or to sites bound by a specific protein47–49. The Hi-C technology is also based on 3C, but includes a step before ligation in which the staggered ends of the restriction fragments are filled in with biotinylated nucleotides48. As a result, ligation junctions are marked with biotin, allowing subsequent purification after DNA shearing using streptavidin-coated beads. Ligation junctions are then analyzed by paired-end high-throughput sequencing to identify the interacting loci. Hi-C data can be used to study the overall folding of genomes. Currently, for large genomes such as those of human and mouse, Hi-C analysis will produce an interaction map with a resolution of ~0.1 to 1 Mb. This resolution is limited only by the number of sequence reads that current platforms can produce, and expected future increases in throughput and decreases in cost will allow the generation of interaction maps with substantially higher resolution.
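The dependence of Hi-C resolution on read number can be illustrated with a short Python sketch. It bins paired-end reads into a sparse contact map and counts how many bin pairs a genome yields at a given bin size. The tuple layout of a read pair, the function names and the round figure of 3,000 Mb for a human-sized genome are assumptions for illustration; this is not the processing pipeline of the cited Hi-C study48.

```python
from collections import Counter


def bin_contacts(read_pairs, bin_size):
    """Count read pairs per unordered pair of genomic bins (a sparse contact map)."""
    counts = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in read_pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        counts[tuple(sorted((bin_a, bin_b)))] += 1  # order of the two ends is irrelevant
    return counts


def n_bin_pairs(genome_size, bin_size):
    """Number of distinct bin pairs (including self-pairs) that reads are spread over."""
    n_bins = -(-genome_size // bin_size)  # ceiling division
    return n_bins * (n_bins + 1) // 2


# Toy paired-end reads: (chromosome, position, chromosome, position); hypothetical values.
read_pairs = [
    ("chr1", 150_000, "chr1", 2_300_000),
    ("chr1", 2_900_000, "chr1", 700_000),
    ("chr1", 400_000, "chr2", 5_100_000),
    ("chr2", 5_900_000, "chr1", 10_000),
]
contact_map = bin_contacts(read_pairs, bin_size=1_000_000)
print(contact_map[(("chr1", 0), ("chr1", 2))])  # 2
print(contact_map[(("chr1", 0), ("chr2", 5))])  # 2

genome = 3_000_000_000  # assumed approximate size of a human-sized genome, in bp
print(n_bin_pairs(genome, 1_000_000))  # 4501500
print(n_bin_pairs(genome, 100_000))    # 450015000
```

Under these assumptions, moving from 1-Mb to 0.1-Mb bins multiplies the number of bin pairs roughly 100-fold, so a fixed number of reads is spread far more thinly, which is why sequencing throughput limits the achievable resolution.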
The first Hi-C maps of the human genome confirm several features of nuclear organization that were also detected by microscopy, and these maps have already been used to uncover several new aspects of chromosome architecture and nuclear organization48. First, chromosomes extensively interact with each other, with some chromosome pairs showing preferred associations. Thus, chromosomes seem to occupy preferred locations with respect to each other. Second, chromosomes are spatially compartmentalized to form two types of nuclear neighborhoods, called A- and B-type compartments. A-type compartments contain active loci (as indicated by gene expression level and the presence of chromatin features associated with active chromatin such as sites that are hypersensitive to DNase I) whereas B-type compartments are composed of inactive chromatin. Spatial separation of active and inactive domains is consistent with earlier observations obtained for individual loci by microscopy50 and by 4C35. Third, Hi-C data, like any 3C-based data, can be modeled using polymer models to uncover folding states of chromatin (for example, refs. 30,51). Computational modeling of Hi-C data revealed that at a length scale of up to several megabases, human chromatin may be folded in a polymer state called a fractal globule48. This densely packed state is characterized by the absence of knots and entanglements. This unique conformation allows easy folding and unfolding of sections of chromosomes, which may be relevant for activating and repressing genes.

A variant of Hi-C has also been described that marks ligation junctions with a biotinylated oligonucleotide to facilitate their purification49. This method was applied to analysis of the 3D structure of the yeast genome. The data confirmed all the known hallmarks of nuclear organization, including clustering of centromeres and telomeres52. Furthermore, interchromosomal interactions were found to occur between tRNA genes and between origins of replication that fire early in S phase.

Together, 3C-based studies suggest a bewildering complexity in long-range communication among a variety of genomic elements across chromosomes and the genome. There is still room for further technological improvements. For instance, there may be some local biases in the interaction maps caused by differences in cross-linkability between chromatin types, and differential access of sequences to the enzymes used in the protocol. Refining the technology may overcome some of these potential limitations. We are only starting to explore the spatial folding of chromosomes, and the new genome-wide 3C methods will probably provide a wealth of new insights.

Figure 3 Principles of the major 3C-based technologies. All protocols start with treatment of cells with formaldehyde (not shown), leading to cross-linking of DNA segments in close proximity to one another. After digestion with one or more restriction enzymes, linked restriction fragments are intramolecularly ligated. In the case of Hi-C, the ends of the restriction fragments are first filled in with biotinylated dNTPs before ligation to facilitate purification of ligation junctions using streptavidin-coated beads. Single or multiple ligation events are detected directly (using 3C, 4C, 5C and Hi-C), or immunoprecipitation is first used to enrich for DNA associated with a protein of interest (using ChIP-loop and ChIA-PET). See Table 2 for overview of different detection strategies and their scope. [Labels in diagram: Digestion; Ligation; DNA purification; Ligation product library; Immunoprecipitation; Hi-C: fill in with biotin-dCTP; 3C, 4C, 5C, Hi-C, ChIP-loop, ChIA-PET.]

Table 2 Scope and detection methods of 3C-based technologies
Method | Scope | Detection | Example reference
3C | Interaction between two selected loci | Quantitative PCR | 30
4C | Genome-wide interactions of one selected locus | Inverse PCR followed by detection with microarray or sequencing | 35
5C | All interactions among multiple selected loci | Multiplex LMA followed by detection with microarray or sequencing | 37
Hi-C | Unbiased genome-wide interaction map | Making of junctions with biotin, shearing and ligation junction purification, followed by sequencing | 48
ChIP-loop | Interaction between two selected loci bound by a particular protein | Quantitative PCR | 38
ChIA-PET | Unbiased genome-wide interaction map of loci bound by a particular protein | Insertion of linker into junction, followed by sequencing | 40
See Figure 3 for protocols for these methods. LMA, ligation-mediated amplification.

Toward an integrated view of chromosome architecture

With several new genome-wide detection methods in place, an integrated picture of chromosome architecture seems within reach. Unfortunately, the maps produced so far are derived from diverse cell lines or from different species, so direct comparisons are not yet possible. Nevertheless, we can make some conclusions and reasonable speculations. At least in D. melanogaster, NPCs and the nuclear lamina clearly interact with different chromosomal regions, and thus provide two distinct sets of anchoring points. In human cells, LADs and NADs both tend to include centromeric regions15,27, suggesting that centromeres in each nucleus are distributed between the nuclear lamina and nucleoli. LADs and B-type domains show some marked similarities (in size range and an overall lack of gene activity), suggesting that they must overlap at least in part. If this is true, it suggests that LADs may interact or intermingle with other LADs and form aggregates of compacted chromatin near the nuclear lamina (Fig. 4). This model would explain the substantial amounts of heterochromatin in close contact with the nuclear lamina that have been observed by microscopy.

Evidence is accumulating that some epigenetic marks are linked to nuclear organization. The timing of DNA replication along the genome shows a block-like structure of alternating large early- and late-replicating segments53,54. A genome-wide comparison indicates that late-replicating domains roughly correspond to LADs16, consistent with the enrichment of late-replicating sequences at the nuclear periphery53,55. However, LADs and late-replicating domains do not overlap perfectly16, indicating that they are related but not identical. Late-replicating domains also are markedly similar to the B-type domains as identified by Hi-C56. Furthermore, the histone modification H3K9me2 has a domain pattern similar to those of LADs15,16,57 and of segments of late-replicating DNA56,58. Taken together, LADs, late-replicating DNA, H3K9me2 domains, and B-type domains all seem closely related, but more systematic comparisons are needed to understand their precise relationships.

The active compartments of the genome, for example, the A-domains identified by Hi-C, may also have cytological correlates. Expressed genes have been observed to cluster at subnuclear foci enriched in transcription machineries, which are sometimes called transcription factories (Fig. 4). In addition, these domains seem to correlate with open chromatin that is replicated early in S phase56,59.

Another emerging theme is the critical role of the CTCF protein, a multifunctional DNA-binding protein60. Extensive 3C-based evidence indicates that CTCF can mediate long-range interactions, both in cis32,60–62 and in trans45 (Fig. 4). In addition, borders of human LADs are frequently demarcated by CTCF-binding sites15, suggesting that CTCF helps control LAD organization. How these observations are linked remains to be elucidated, but CTCF is clearly an important factor in the regulation of chromosome topology.

Figure 4 Speculative cartoon model of chromatin organization. LADs may consist of relatively condensed chromatin (thick lines) and aggregate at the nuclear lamina. Other repressed regions may interact with each other in the nuclear interior, as do active regions. Complexes formed by components of the transcription machinery (transcription factories) and CTCF may tether active regions together. Parts of only two chromosomes are depicted, each in a different color for clarity. Most interactions occur within chromosomes, and relatively few occur between chromosomes. [Labels in cartoon: NPC; Lamina; CTCF; Transcription machinery.]

Stochastic nature of interactions

So far, all genome-wide datasets that describe chromosome architecture are derived from large pools of cells. Yet microscopy studies have shown that the location of individual genomic loci is highly variable from cell to cell, even in clonal cell lines. This variability has two biological sources. First, within each nucleus, chromatin is mobile to a certain degree63,64. Second, in a newly formed nucleus after mitosis, the relative positioning of chromosomes may be substantially driven by stochastic processes65.

It is difficult to calibrate the genome-wide interaction datasets in terms of absolute contact frequencies. Currently this can only be approximated by FISH, which is hampered by insufficient resolution and the possible disruption of chromosome folding by the harsh denaturation conditions the technique requires. Most long-range interactions between chromosomal loci, as detected by 3C-based methods, probably occur in less than 10–20% of cells at a given time point35,66–68. Contacts of individual LADs and NADs with their respective landmarks may occur in 10–50% of cells14,27. We emphasize that these are only rough estimates, subject to arbitrary definitions of contacts used in the respective studies.

The stochastic nature of chromosome architecture raises important questions related to gene regulation. For example, if LADs contact the nuclear lamina only transiently, or only in a subpopulation of cells, then how can such interactions contribute to robust gene repression? One possibility is that a transient contact with the nuclear lamina causes a long-lasting change in the chromatin, for example through a histone-modifying enzyme embedded in the nuclear lamina. Except for enhancer-promoter interactions, the functional relevance of stochastic, relatively low-frequency contacts between linearly distant genes (‘gene kissing’) is mostly unclear. In some cases these contacts correlate with gene expression66, but to establish causal relationships researchers must experimentally modulate these contacts, for example, by specifically disrupting them and assessing the impact on gene expression and regulation.
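The caveat that contact-frequency estimates depend on how a contact is defined can be shown with a few lines of Python. The sketch takes hypothetical FISH distances between two loci, one measurement per nucleus, and reports the fraction of nuclei scored as being in contact under two different distance cutoffs. The function name, the distances and both cutoffs are assumptions chosen only to illustrate the point; they are not taken from the cited studies.

```python
def contact_fraction(distances_nm, cutoff_nm):
    """Fraction of nuclei in which two loci lie within cutoff_nm of each other."""
    in_contact = sum(1 for d in distances_nm if d <= cutoff_nm)
    return in_contact / len(distances_nm)


# Hypothetical locus-to-locus distances (nm), one measurement per nucleus.
distances_nm = [120, 180, 350, 420, 640, 800, 950, 1100, 1300, 1500]

strict = contact_fraction(distances_nm, cutoff_nm=200)  # stringent definition of contact
loose = contact_fraction(distances_nm, cutoff_nm=500)   # permissive definition of contact
print(strict)  # 0.2
print(loose)   # 0.4
```

With the same measurements, the estimated contact frequency doubles when the cutoff is relaxed, which is why figures such as those quoted in the text are best read as rough, definition-dependent estimates.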
Future outlook

A notable theme emerging from studies so far is that metazoan genomes are linearly segmented into large multigene domains, which have specific interactions with nuclear landmarks and each other. This raises the possibility that chromosomal aberrations such as translocations and inversions, which are found in a variety of human genetic disorders69 and in many types of cancer70, can disrupt the spatial organization of the affected chromosomes and perhaps thereby alter gene expression71. Notably, it was recently shown that this logic can also be turned around: 3C-derived techniques can identify chromosomal aberrations on the basis of altered spatial relationships between loci72. Inversely, the spatial organization of the genome may also affect the spectrum of any translocations that could occur in that cell. Loci that are spatially proximal may more frequently engage in translocation than more distant ones73–75.

Another class of human disorders that may be of interest in the context of chromosome architecture is the so-called laminopathies. These disorders are caused by congenital defects in proteins of the nuclear lamina. For example, mutations in A-type lamins cause a markedly diverse spectrum of disorders including progeria, muscular dystrophy and cardiomyopathy76. Some of these disorders may involve changes in chromosome architecture due to altered interactions with the nuclear lamina. Indeed, in cells from patients suffering from Hutchinson-Gilford progeria syndrome (HGPS), which show abnormal accumulation of lamin A at the nuclear lamina, changes have been observed in the morphology and localization of heterochromatin77,78, although this may be an indirect effect of misregulation of certain chromatin proteins79. Mapping of genome–nuclear lamina interactions and chromosome conformation in cells from laminopathy patients may provide important insights into the etiology of this class of disorders.
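Returning to the point that 3C-derived techniques can reveal chromosomal aberrations through altered spatial relationships between loci, the following Python sketch shows one simple way such a signal might be searched for in a binned contact map. Loci on different chromosomes normally contact each other rarely, so a pair of bins with far more interchromosomal contacts than the background is a candidate translocation junction. The data layout, the median-based background, the 10-fold cutoff and the chromosome names are assumptions for illustration; this is not the method of the cited study72.

```python
from statistics import median


def flag_interchromosomal_outliers(contacts, fold=10):
    """Return interchromosomal bin pairs whose count exceeds fold x the median such count."""
    inter = {
        pair: n for pair, n in contacts.items()
        if pair[0][0] != pair[1][0]  # keep only pairs on different chromosomes
    }
    if not inter:
        return []
    background = median(inter.values())
    return sorted(pair for pair, n in inter.items() if n > fold * background)


# Toy contact map: ((chromosome, bin), (chromosome, bin)) -> read-pair count (hypothetical).
contacts = {
    (("chrA", 10), ("chrA", 11)): 500,  # neighbouring bins on one chromosome, ignored
    (("chrA", 10), ("chrB", 3)): 2,
    (("chrA", 12), ("chrB", 7)): 60,    # unusually frequent interchromosomal contact
    (("chrA", 20), ("chrB", 1)): 1,
    (("chrA", 5), ("chrC", 9)): 3,
    (("chrB", 2), ("chrC", 4)): 2,
}

candidates = flag_interchromosomal_outliers(contacts)
print(candidates)  # [(('chrA', 12), ('chrB', 7))]
```

A real analysis would also need to account for the preferred chromosome pair associations and local biases described earlier, so a flagged pair should be treated as a lead to verify rather than as evidence of a rearrangement.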
The initial results of various new genome-wide approaches have already uncovered some important principles of chromosome architecture. Higher-resolution views, particularly for Hi-C, will become available as sequencing throughput continues to ramp up. Yet the probabilistic and dynamic nature of chromatin organization poses practical and conceptual challenges. It would be extremely helpful if techniques for the molecular mapping of chromatin architecture could be scaled down to single cells, as this would directly capture cell-to-cell variation. Although this will be technically demanding, the rapid advances in high-throughput single-molecule DNA sequencing technologies, combined with further development of methods to detect interactions, may offer new opportunities toward reaching this goal.

Acknowledgments
We thank members of the van Steensel and Dekker labs and M. Walhout for suggestions. This work was supported by the Netherlands Genomics Initiative and a Netherlands Organization for Scientific Research–Earth and Life Sciences (NWO-ALW) VICI grant to B.v.S., a grant from the US National Institutes of Health (HG003143) and a W.M. Keck Foundation Distinguished Young Scholar Award to J.D.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
Analysis
Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

R Alan Harris1,*, Ting Wang2, Cristian Coarfa1, Raman P Nagarajan3, Chibo Hong3, Sara L Downey3, Brett E Johnson3, Shaun D Fouse3, Allen Delaney4, Yongjun Zhao4, Adam Olshen3, Tracy Ballinger5, Xin Zhou2, Kevin J Forsberg2, Junchen Gu2, Lorigail Echipare6, Henriette O’Geen6, Ryan Lister7, Mattia Pelizzola7, Yuanxin Xi8, Charles B Epstein9, Bradley E Bernstein9–11, R David Hawkins12, Bing Ren12,13, Wen-Yu Chung14,15, Hongcang Gu9, Christoph Bock9,16–18, Andreas Gnirke9, Michael Q Zhang14,15, David Haussler5, Joseph R Ecker7, Wei Li8, Peggy J Farnham6, Robert A Waterland1,19, Alexander Meissner9,16,17, Marco A Marra4, Martin Hirst4, Aleksandar Milosavljevic1 & Joseph F Costello3

*A full list of author affiliations appears at the end of the paper.
Published online 19 September 2010; doi:10.1038/nbt.1682

Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.

DNA methylation plays a vital role in embryonic development, maintenance of pluripotency, X-chromosome inactivation and genomic imprinting through regulation of transcription, chromatin structure and chromosome stability1. It occurs at the C5 position of cytosines within CpG dinucleotides2–4 and at non-CpG cytosines in plants and in mammalian embryonic stem cells (ESCs). 5-Hydroxymethylation of cytosine also occurs in certain human and mouse cells5,6 and is catalyzed by Tet proteins acting on methylated cytosine7. Several experimental methods detect methylation but not hydroxymethylation, whereas others detect both but cannot distinguish them.

Understanding the role of DNA methylation in development and disease requires knowledge of the distribution of these modifications in the genome. The availability of reference genome assemblies and massively parallel sequencing has led to methods that provide high-resolution, genome-wide profiles of 5-methylcytosine8–16. In contrast to arrays, sequencing-based methods can interrogate DNA methylation in repetitive sequences and more readily allow epigenetic states to be assigned to specific alleles. The unique characteristics of each method leave uncertainty about how to select the method best suited to answer particular biological questions. DNA methylation maps are being produced by many laboratories worldwide, and their integration forms a basis for emerging international epigenome projects17. Thus, it is critical to determine the precision of each method, and how reliably the methods can be compared.

Here, we provide a detailed and quantitative comparison of four sequencing-based methods for genome-wide DNA methylation profiling. We focused on two methods that use bisulfite conversion (MethylC-seq8 and RRBS9), and two methods that use enrichment of methylated DNA (MeDIP-seq10,11 and MBD-seq12). We also developed an integrative methodology combining MeDIP-seq to detect methylated CpGs with MRE-seq13,14 to detect unmethylated CpGs. Unlike the enrichment methods alone, the integrative method can accurately identify regions of intermediate methylation, which, in conjunction with single nucleotide polymorphism (SNP) profiling from the sequencing data, permits genome-wide identification of allele-specific epigenetic states.

RESULTS
Generation of DNA methylation profiles from human ESCs
Four individual sequencing-based methods and one integrative method were used to generate and compare DNA methylation profiles of three biological replicates of H1 ESCs.
MethylC-seq (data used here are from ref. 8) involves shotgun sequencing of DNA treated with bisulfite, a chemical that converts unmethylated cytosines but not methylated cytosines to uracil. The second bisulfite-based method, RRBS9, reduces the portion of the genome analyzed through MspI digestion and fragment size selection. MeDIP-seq10,11 and MBD-seq12 involve enrichment of methylated regions followed by sequencing. In MeDIP-seq, an anti-methylcytosine antibody is used to immunoprecipitate methylated single-stranded DNA fragments. MBD-seq uses the MBD2 protein methyl-CpG binding domain to enrich for methylated double-stranded DNA fragments. As a complementary approach for use in conjunction with methylated fragment enrichment methods, unmethylated CpGs are identified by sequencing size-selected fragments from parallel DNA digestions with the methyl-sensitive restriction enzymes (MREs) HpaII (C^CGG), Hin6I (G^CGC) and AciI (C^CGC) (MRE-seq)13.

Figure 1 CpG coverage by each method. (a,b) The percentage of CpGs covered genome-wide (a) or in CpG islands (b) is plotted as a function of read-coverage threshold (x axis: read coverage threshold for CpGs, 1–29; y axis: percent CpGs covered, 0–100; series: MethylC-seq no. 3, RRBS no. 3, MeDIP-seq no. 2, MBD-seq no. 2 and MRE-seq no. 2). (c) The percentage of genome-wide CpGs (28,163,863) covered by multiple, single or no methods is shown.

To reliably identify biological variation in methylation among samples from different individuals or biological states, one must determine the variation attributable to biological and technical replication. As an initial assessment of DNA methylation concordance among three H1 ESC biological replicates, the methylation status of 27,578 CpGs was assayed on the widely used bisulfite-based Infinium bead-array. The Infinium method involves bisulfite conversion and hybridization, rather than sequencing. The beta values, roughly representing CpG methylation levels, in the replicates were compared by calculating concordance correlation coefficients. The coefficients were very high, ranging from 0.992 to 0.996 (Supplementary Fig. 1). Replicate no. 1 and replicate no. 2 were run a second time on the Infinium platform to assess technical variation (data not shown). Most (98.9%) of the total variation (technical and biological) was technical. Thus, platform comparisons using these replicates should be very informative.
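The replicate comparison above rests on concordance correlation coefficients computed from paired beta values. The text does not give the formula, so the sketch below assumes Lin's concordance correlation coefficient, which penalizes both scatter and any systematic shift between replicates. The function name and the beta values are hypothetical and serve only as illustration.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired samples.

    Assumption: this is the coefficient meant by "concordance correlation
    coefficients" in the text; the paper's exact computation is not shown.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance.
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    # The squared mean difference lowers the score when one replicate is shifted.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical beta values for the same five CpGs in two replicates.
replicate_a = [0.05, 0.10, 0.50, 0.90, 0.95]
replicate_b = [0.06, 0.12, 0.48, 0.88, 0.97]
ccc = concordance_correlation(replicate_a, replicate_b)
print(f"concordance correlation coefficient: {ccc:.3f}")
```

Unlike a Pearson correlation, this coefficient equals 1 only when the paired values lie on the identity line, which is the property that matters when asking whether two replicates report the same methylation level.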
As a second and more comprehensive analysis of variation in methylation calls, RRBS (covering ~1.6 million CpGs), MeDIP-seq and MRE-seq were performed on all three biological replicates. The correlation between the biological replicates was high for RRBS (Supplementary Fig. 2), as it was for MeDIP-seq and MRE-seq (Supplementary Fig. 3). These results show that cell passage–related ‘biological variation’ in methylation is present but minimal on the scale of the genome. The rare biological variation in methylation levels was confirmed by pyrosequencing of selected loci (Supplementary Fig. 4 and Supplementary Table 1).

Several algorithms are available for mapping bisulfite-treated short reads; differences among them might alter local read density in a map and ultimately affect methylation calls. Our assessment of overall concordance between aligners, including Bowtie18, BSMAP19, Pash20, RMAP21 and ZOOM22 applied to a subset of the MethylC-seq data9, indicated that, despite differences in speed and accuracy, aligner choice was unlikely to have a significant impact on the platform comparisons (Supplementary Table 2).

Figure 1c Genome-wide CpGs covered by method(s)
Method(s)                      Genome-wide CpGs covered
Coverage by 4 methods
  MethylC, RRBS, MeDIP, MBD    6.32%
Coverage by 3 methods
  MethylC, RRBS, MeDIP         0.81%
  MethylC, RRBS, MBD           1.46%
  MethylC, MeDIP, MBD          39.09%
  RRBS, MeDIP, MBD             0.31%
Coverage by 2 methods
  MethylC, RRBS                2.30%
  MethylC, MeDIP               19.95%
  MethylC, MBD                 10.27%
  RRBS, MeDIP                  0.03%
  RRBS, MBD                    0.68%
  MeDIP, MBD                   0.61%
Coverage by 1 method
  MethylC                      14.73%
  RRBS                         0.37%
  MeDIP                        0.09%
  MBD                          1.77%
No coverage
  None                         1.21%

There are several important parameters in choosing an appropriate method for particular experimental goals, including the total number and local context of CpGs interrogated and the amount of sequencing required. To determine the impact of sequencing depth on coverage, we plotted CpG coverage genome-wide (Fig. 1a) and in CpG islands (Fig. 1b) as a function of read coverage threshold for CpGs. For MeDIP-seq and MBD-seq, the CpG coverage does not include CpGs for which a lack of methylation could be inferred from lack of reads (Fig. 1a,b). Thus, because CpG islands are predominantly unmethylated, the CpG coverage in CpG islands is lower for the enrichment methods than for either RRBS or MethylC-seq. As an indicator of the cost efficiency for each method, we also plotted the CpG coverage normalized to a single giga base pair (Gbp) of sequence depth in the methylome maps (Supplementary Fig. 5). Enrichment methods had the lowest cost per CpG covered genome-wide, whereas RRBS had the lowest cost per CpG covered in CpG islands.

For the enrichment methods we examined the potential effect of CpG density on read coverage. Most of the genome is methylated and CpG poor, but a small fraction is unmethylated and CpG rich (that is, CpG island). Consistent with this, MeDIP-seq and MBD-seq enrich primarily for low CpG density regions, along with a small subset of methylated CpG islands. In contrast, MRE-seq interrogates higher CpG density regions because they have an abundance of unmethylated recognition sites for these enzymes. Therefore, the coverage of MRE-seq and enrichment methods is notably complementary (Supplementary Fig. 6).
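The coverage curves of Fig. 1a,b report, for each read-coverage threshold, the percentage of CpGs covered by at least that many reads. A minimal sketch of that calculation follows; the function name and the toy read counts are hypothetical, and real input would be one read count per CpG in the genome or in CpG islands.

```python
def percent_covered(read_counts, thresholds):
    """Percent of CpGs whose read count meets each minimum-read threshold."""
    total = len(read_counts)
    if total == 0:
        raise ValueError("read_counts must not be empty")
    return {
        threshold: 100.0 * sum(count >= threshold for count in read_counts) / total
        for threshold in thresholds
    }


# Hypothetical per-CpG read counts (one entry per CpG in the region of interest).
toy_counts = [0, 1, 1, 3, 5, 8, 12, 30]

# Odd thresholds from 1 to 29, matching the x axis of Fig. 1a,b.
curve = percent_covered(toy_counts, thresholds=range(1, 30, 2))
for threshold, percent in curve.items():
    print(f"at least {threshold:2d} reads: {percent:5.1f}% of CpGs")
```

Dividing each value by the Gbp of sequence in the corresponding methylome map would give a per-Gbp normalization in the spirit of the cost comparison, although the exact normalization used for Supplementary Fig. 5 is not described here.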
A major advantage of the sequencing-based methods over microarrays is their ability to interrogate CpGs in repetitive elements. Approximately 45% of the human genome is derived from transposable elements, a major driving force in the evolution of mammalian gene regulation23,24, with nearly half of all CpGs falling within these repetitive regions. The extent to which different sequencing-based methods interrogate repeats is therefore of considerable interest. In general, genome-wide CpG coverage (Fig. 1a) was proportional to CpG coverage in repeats (Table 1). The percent of interrogated CpGs in repeats was similar across all four methods, with MBD-seq capturing the highest fraction of repeat sequences (59.1%). Each of these methods is therefore useful for investigating this important and largely unexplored area. MRE-seq, however, only minimally interrogates repeats, consistent with their dense methylation.
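Genome-wide coverage per method can be recovered from the Fig. 1c partition, in which every CpG falls into exactly one combination of methods. The sketch below sums the combinations that include each method; the percentages are transcribed from Fig. 1c, while the dictionary layout and function name are illustrative choices.

```python
# Percent of the 28,163,863 genome-wide CpGs covered by each exact combination
# of methods, transcribed from Fig. 1c. The empty tuple is "no coverage".
FIG_1C = {
    ("MethylC", "RRBS", "MeDIP", "MBD"): 6.32,
    ("MethylC", "RRBS", "MeDIP"): 0.81,
    ("MethylC", "RRBS", "MBD"): 1.46,
    ("MethylC", "MeDIP", "MBD"): 39.09,
    ("RRBS", "MeDIP", "MBD"): 0.31,
    ("MethylC", "RRBS"): 2.30,
    ("MethylC", "MeDIP"): 19.95,
    ("MethylC", "MBD"): 10.27,
    ("RRBS", "MeDIP"): 0.03,
    ("RRBS", "MBD"): 0.68,
    ("MeDIP", "MBD"): 0.61,
    ("MethylC",): 14.73,
    ("RRBS",): 0.37,
    ("MeDIP",): 0.09,
    ("MBD",): 1.77,
    (): 1.21,
}


def coverage_by_method(partition):
    """Sum the disjoint combinations that include each method."""
    totals = {}
    for methods, percent in partition.items():
        for method in methods:
            totals[method] = totals.get(method, 0.0) + percent
    return totals


totals = coverage_by_method(FIG_1C)
for method, percent in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{method}: {percent:.2f}% of genome-wide CpGs")
```

Because the combinations are disjoint, the sixteen entries sum to 100%, and the per-method totals round to about 95% (MethylC-seq), 67% (MeDIP-seq), 61% (MBD-seq) and 12% (RRBS).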
Table 1 Critical parameters in sequencing-based DNA methylation profiling
Method       H1 DNA sample no.  Total bases generated (Gbp)  Total high quality bases (Gbp)  Total bases in map (Gbp)  Maximum resolution (bp)  1-read coverage of CpGs in repeats (no., %)  Percentage of assayed CpGs in repeats (%)
MethylC-seq  no. 3              172.49                       115                             87.5                      1                        13,303,415 (91.8)                            49.7
RRBS         no. 3              1.58                         1.43                            1.28                      1                        1,646,649 (11.4)                             47.5
MeDIP-seq    no. 1              3.42                         2.07                            1.95                      150                      10,004,670 (68.3)                            52.9
MeDIP-seq    no. 2              3.02                         1.84                            1.73                      150                      10,101,868 (68.9)                            53.2
MeDIP-seq    nos. 1 + 2         6.44                         3.91                            3.68                      150                      11,693,059 (79.8)                            53.5
MBD-seq      no. 2              5.67                         3.71                            2.21                      150                      10,080,007 (68.8)                            59.1
MRE-seq      no. 1              3.61                         1.31                            0.96                      1                        306,635 (2.07)                               21.7
MRE-seq      no. 2              4.03                         1.69                            1.3                       1                        232,885 (1.59)                               18.6

Sequencing statistics and CpG coverage are shown for MethylC-seq (207 lanes; data analyzed here were from ref. 8), RRBS (2 lanes), MeDIP-seq (4 lanes each), MBD-seq (3 lanes) and MRE-seq (3 lanes each). As the amount of sequence produced per lane is increasing, we also provide “Gbp of sequence” as a measure of the relative cost of each method. The methods differ significantly in total bases generated by the Illumina sequencer, total high-quality bases passing Illumina chastity filtering and mapping uniquely and total bases used for generating methylome maps (high-quality bases passing redundancy filters). The H1 replicates assayed and the Gbp of sequence at successive processing stages by each method are shown. The bisulfite-based methods and MRE-seq resolve the methylation status of individual cytosines, whereas the MeDIP-seq and MBD-seq read mappings are extended to 150 bp, resulting in a maximum resolution of 150 bp. This extension is applied to calculations of CpG coverage but is not applied to the Gbp of sequence at the processing stages. Coverage information is shown for repeats (primarily transposon sequences) genome-wide. Although maximum resolution of each method is reported, resolution can be assessed at various levels. As the level of resolution decreases, as a consequence of averaging of methylation scores over a window of larger size, for example, imperfect coverage and limited accuracy become less limiting, provided that the average score is not affected by systematic biases in coverage and accuracy. Thus, methylome coverage and accuracy in methylation calls are a function of resolution.
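One way to read the three "Gbp of sequence" columns of Table 1 is as successive filtering stages, so that the share of generated bases surviving to the methylome map can be compared across methods. The sketch below does this arithmetic with values transcribed from Table 1; the dictionary layout and function name are illustrative, and the retained fractions are a derived quantity that the table itself does not report.

```python
# (total bases generated, total high-quality bases, total bases in map), in Gbp,
# transcribed from Table 1.
TABLE_1_GBP = {
    "MethylC-seq no. 3": (172.49, 115.0, 87.5),
    "RRBS no. 3": (1.58, 1.43, 1.28),
    "MeDIP-seq no. 1": (3.42, 2.07, 1.95),
    "MeDIP-seq no. 2": (3.02, 1.84, 1.73),
    "MBD-seq no. 2": (5.67, 3.71, 2.21),
    "MRE-seq no. 1": (3.61, 1.31, 0.96),
    "MRE-seq no. 2": (4.03, 1.69, 1.3),
}


def retained_fraction(generated, high_quality, in_map):
    """Fractions of generated bases that are high quality and that reach the map."""
    return high_quality / generated, in_map / generated


for name, bases in TABLE_1_GBP.items():
    high_quality, mapped = retained_fraction(*bases)
    print(f"{name}: {high_quality:.0%} high quality, {mapped:.0%} in map")
```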
Only CpGs interrogated in common can be compared directly. The intersections of CpGs covered by the four methods were therefore determined (Fig. 1c). Overall, at the sequencing depth investigated, MethylC-seq provided the highest CpG coverage at 95% followed by MeDIP-seq at 67% and MBD-seq at 61%. RRBS covered the fewest CpGs genome-wide (12%), which drove the overlap of all methods to 6% of genome-wide CpGs.

For any given method, how deeply to sequence the library is an open question. As the sequencing depth increases, the number of unique reads covering a particular region approaches the total possible reads present in the library for each enriched region. This saturation occurs when further sequencing fails to discover additional regions above background. To understand the extent to which we sampled the regions represented in the RRBS, MeDIP-seq and MBD-seq libraries, saturation analysis was performed. RRBS approaches but does not reach saturation at the current sequencing depth (Supplementary Fig. 7a). For MeDIP-seq and MBD-seq, saturation was observed when false-discovery rate thresholds were applied, but not when unthresholded data were plotted (Supplementary Fig. 7b,c). Saturation was not observed for MRE-seq (Supplementary Fig. 7d,e), although the average restriction site was represented 13 times within each library, indicating that additional reads would mostly resample restriction sites already interrogated. Sequencing beyond saturation improves confidence in the observations and increases the CpG coverage, though at greater cost per CpG covered. Thus, sequencing below or up to saturation maximizes the number of samples that can be analyzed, whereas sequencing beyond saturation maximizes CpG coverage and improves confidence in methylation calls.

Comparison of bisulfite-based methods
Several observations from the CpG coverage analysis of MethylC-seq and RRBS are important to consider before assessing their concordance in methylation calls.
First, RRBS provides substantial coverage of CpGs in CpG islands, but low CpG coverage genome-wide (Fig. 1a,b). In contrast, MethylC-seq offers greater CpG coverage genome-wide. When coverage is normalized to 1 Gbp of sequence in the methylome map, RRBS shows higher coverage of CpGs in CpG islands at all read depths (Supplementary Fig. 5). This difference points to RRBS as the method of choice if CpG islands are the main focus of a study. However, at lower read thresholds, MethylC-seq sampled far more CpGs in CpG islands than RRBS (Fig. 1b).

A major advantage of bisulfite-based methods is that they allow quantitative comparisons of methylation levels at single-base resolution. For MethylC-seq and RRBS, we calculated and compared the proportion of methylated reads at individual CpGs genome-wide. High concordance was observed using a simple method that makes methylation status calls at different minimum read depths and allows multiple methylation value cutoffs to be examined (Fig. 2a). The difference in methylation proportions between MethylC-seq and RRBS at a minimum read depth of 5 was calculated for individual CpGs, and concordance was declared if the difference did not exceed a given threshold (Fig. 2b). Of the CpGs compared between MethylC-seq and RRBS, just 12.75% displayed identical methylation levels (a difference threshold of zero). When the difference threshold was relaxed to 0.1 or 0.25, the concordance increased to 53.85% or 81.82%, respectively. This analysis was also performed at minimum read depths of 2 and 10 (Supplementary Fig. 8a,b), which, for the 0.25 threshold, showed concordance of 80.28% and 83.89%, respectively, demonstrating that read depth has only a modest effect on concordance.

We also performed this analysis for MethylC-seq on replicate no. 3 compared to RRBS on replicate nos. 1 and 2, which showed a similar concordance (79.64% for nos. 3 and 1; 82.95% for nos. 3 and 2) (Supplementary Fig. 8c–f). The concordance between MethylC-seq and RRBS, both on replicate no. 3 (81.82%), falls between the concordances for different replicates. RRBS on replicate nos. 1 and 2 was also compared (Supplementary Fig. 8g,h) and showed a higher concordance (91.54%) than any of the comparisons between MethylC-seq and RRBS, consistent with their high correlation coefficient (Supplementary Fig. 2). The RRBS and MethylC-seq discordant calls were not attributable to the local CpG density or genomic context of the individual CpGs (Fig. 2c,d). Taken together, these analyses suggest that differences between replicates are attributable to technical or stochastic factors as well as modest biological variation.
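The threshold-based concordance described above can be sketched as follows: compute the methylated proportion, methylated reads/(methylated reads + unmethylated reads), for each CpG in both data sets, keep CpGs that meet the minimum read depth in both, and count those whose proportions differ by no more than the threshold. The data layout, function names and toy read counts are hypothetical; this is not the authors' pipeline.

```python
def methylation_proportion(methylated, unmethylated):
    """Methylated reads / (methylated reads + unmethylated reads)."""
    return methylated / (methylated + unmethylated)


def concordance(calls_a, calls_b, max_difference, min_depth=5):
    """Percent of shared CpGs whose proportions differ by at most max_difference.

    calls_a and calls_b map a CpG position to (methylated_reads, unmethylated_reads).
    Only CpGs with at least min_depth reads in both data sets are compared.
    """
    compared = concordant = 0
    for position in calls_a.keys() & calls_b.keys():
        meth_a, unmeth_a = calls_a[position]
        meth_b, unmeth_b = calls_b[position]
        if meth_a + unmeth_a < min_depth or meth_b + unmeth_b < min_depth:
            continue
        compared += 1
        difference = abs(
            methylation_proportion(meth_a, unmeth_a)
            - methylation_proportion(meth_b, unmeth_b)
        )
        # Small tolerance so that a difference of exactly 0.1 is not lost to
        # floating-point rounding.
        if difference <= max_difference + 1e-9:
            concordant += 1
    if compared == 0:
        raise ValueError("no CpGs pass the depth filter in both data sets")
    return 100.0 * concordant / compared


# Hypothetical read counts: position -> (methylated, unmethylated).
method_a = {100: (9, 1), 150: (5, 5), 200: (0, 10), 300: (2, 1)}
method_b = {100: (8, 2), 150: (7, 3), 200: (0, 6), 300: (0, 3), 400: (5, 5)}

for threshold in (0.0, 0.1, 0.25):
    print(f"threshold {threshold}: {concordance(method_a, method_b, threshold):.2f}% concordant")
```

Position 300 is dropped by the depth filter and position 400 is absent from one data set, so three CpGs are compared in this toy example.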
Given the notable presence of non-CpG cytosine methylation in H1 ESCs8, we also examined concordance between MethylC-seq and RRBS at CHH and CHG cytosines. Because CHH sites are asymmetric with respect to strand and 98% of CHG sites are hemi-methylated8, reads mapping to each strand were considered separately. When non-CpG cytosines were considered, either including (Supplementary Fig. 9) or excluding (Supplementary Fig. 10) sites with a methylation percentage of zero (lack of methylation), concordance was higher than concordance at CpGs. However, a lower degree of variation at non-CpG sites is expected because of the relatively narrow range of methylation levels for non-CpG sites.
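The CG, CHG and CHH contexts are defined by the two bases downstream of a cytosine on the same strand, with H standing for any base other than G. A minimal sketch of strand-aware context classification follows; the function names are hypothetical, and reverse-strand cytosines are handled by classifying the reverse complement, an illustrative choice rather than the authors' stated procedure.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Reverse complement, so reverse-strand cytosines can be classified too."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def cytosine_context(sequence, index):
    """Classify the cytosine at index (read 5' to 3') as CG, CHG or CHH.

    Returns None when the base is not C or when the downstream bases are
    missing or ambiguous (N), so the context cannot be decided.
    """
    seq = sequence.upper()
    if seq[index] != "C":
        return None
    downstream = seq[index + 1:index + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


forward = "ACGTCAGTCATT"
for i, base in enumerate(forward):
    if base == "C":
        print(i, cytosine_context(forward, i))
```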
Figure 2 Comparison of bisulfite-based methods. (a) Calls of highly/partially/weakly methylated (0.80–0.20 or 0.75–0.25 cutoff) or highly/weakly methylated (0.20 cutoff) were made for CpGs covered at several minimum read depths by MethylC-seq and by RRBS (both on replicate no. 3). The number and percent of genome-wide CpGs covered and the percent of concordant calls are shown for each minimum read depth and methylation call cutoff. (b) Differences (MethylC-seq − RRBS) in methylated proportions (methylated reads/(methylated reads + unmethylated reads)) for CpGs with a minimum coverage of five reads by both methods (1,681,719 CpGs). Percentages of concordant and discordant methylation were determined at cutoffs of ±0.1 (green dashed lines) and ±0.25 (red dashed lines). (c,d) CpG density in a 400-bp window (c) and genomic context of concordant and discordant CpGs at the 0.25 cutoff (d). CpG density classes: high (>7%), medium (5–7%), low (<5%). Genomic context classes: promoter, coding exon, UTR, intron, intergenic.

Figure 2a (table)
Minimum      CpGs         Percent genome-   % concordant   % concordant   % concordant
read depth   covered      wide CpGs         (0.80–0.20)    (0.75–0.25)    (0.20)
2            2,542,763    9.03              68.35          72.86          94.14
5            1,681,719    5.97              67.40          72.28          96.15
10           913,230      3.24              67.79          73.20          97.13
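The per-CpG comparison summarized in Figure 2b can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout (a dictionary mapping a site to methylated and unmethylated read counts) and the function names are hypothetical, and treating a difference exactly equal to the cutoff as concordant is an assumption.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]    # (chromosome, position); hypothetical layout
Counts = Tuple[int, int]  # (methylated reads, unmethylated reads)


def methylated_proportion(counts: Counts) -> float:
    """methylated reads / (methylated reads + unmethylated reads)."""
    methylated, unmethylated = counts
    return methylated / (methylated + unmethylated)


def difference_concordance(
    a: Dict[Site, Counts],
    b: Dict[Site, Counts],
    min_depth: int = 5,
    max_diff: float = 0.25,
) -> Tuple[int, float]:
    """Return (number of shared CpGs, percent concordant).

    A CpG is compared only if both methods cover it with at least
    `min_depth` reads (min_depth must be >= 1). It counts as concordant
    when the two proportions differ by at most `max_diff`
    (assumption: the boundary value itself is concordant).
    """
    shared = [
        site for site in a
        if site in b
        and sum(a[site]) >= min_depth
        and sum(b[site]) >= min_depth
    ]
    if not shared:
        return 0, float("nan")
    agree = sum(
        abs(methylated_proportion(a[s]) - methylated_proportion(b[s])) <= max_diff
        for s in shared
    )
    return len(shared), 100.0 * agree / len(shared)


# Toy data only; these are not values from the paper.
methylc = {("chr1", 100): (9, 1), ("chr1", 200): (2, 8), ("chr1", 300): (1, 1)}
rrbs = {("chr1", 100): (8, 2), ("chr1", 200): (6, 4), ("chr1", 300): (5, 5)}
n_sites, pct = difference_concordance(methylc, rrbs)
```

In the toy data the third site is dropped because one method covers it with only two reads, which mirrors how raising the minimum read depth shrinks the set of CpGs compared.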
For both CpG (Fig. 2b) and non-CpG cytosine (Supplementary Figs. 9 and 10) methylation, MethylC-seq showed slightly higher methylation proportions than RRBS on the same DNA, as demonstrated by the longer tail on the positive side of the graphs. This trend was also observed in comparisons of MethylC-seq to RRBS performed on replicate nos. 1 and 2 (Supplementary Fig. 8c–f), suggesting that technical aspects are driving this difference.

Comparison of methylated-cytosine enrichment methods
Concordance analyses for enrichment methods differ from bisulfite methods in two fundamental ways. First, binary methylation calls are used in enrichment methods, because methylation levels are not easily determined. Second, because of the lack of single CpG resolution inherent in enrichment methods, a windows-based approach is used. The windows can include CpGs that are not directly covered by a read. Thus the percent of genome-wide CpGs contained in the compared windows is naturally higher than the percent of individual CpGs that overlap in the coverage comparison (Fig. 1c).

We therefore assessed concordance between MeDIP-seq and MBD-seq by comparing binary highly methylated and weakly methylated calls from the average methylation across 1,000- and 200-bp windows (Online Methods). For both window sizes, concordance was >90% at all read depths examined and improved with increasing minimum read depths (Fig. 3a). We confirmed the concordance between MeDIP-seq and MBD-seq at selected loci by bisulfite treatment of the DNA, PCR, cloning and sequencing (Supplementary Fig. 11 and Supplementary Table 3). The substantially higher concordance relative to the bisulfite-based methods is in part related to the inference common to both enrichment methods that neighboring CpGs within a given window have similar methylation levels and to the binary rather than quantitative methylation calls.
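The windows-based comparison described above can be illustrated with a short sketch. It assumes each method yields per-CpG records of (chromosome, position, methylation score, read depth); that layout, the fixed non-overlapping windows and the `score_cutoff` value are all hypothetical placeholders, since the actual scoring and calling rules are given in the Online Methods and are not reproduced here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

CpG = Tuple[str, int, float, int]  # (chrom, pos, score, depth); hypothetical
Window = Tuple[str, int]           # (chrom, window index)


def window_calls(
    cpgs: Iterable[CpG],
    window_size: int = 1000,
    min_depth: int = 5,
    score_cutoff: float = 0.5,  # placeholder, not the paper's cutoff
) -> Dict[Window, bool]:
    """Average the scores of sufficiently covered CpGs in each window and
    make a binary call (True = highly methylated, False = weakly)."""
    scores = defaultdict(list)
    for chrom, pos, score, depth in cpgs:
        if depth >= min_depth:
            scores[(chrom, pos // window_size)].append(score)
    return {w: mean(vals) > score_cutoff for w, vals in scores.items()}


def percent_concordant(
    calls_a: Dict[Window, bool], calls_b: Dict[Window, bool]
) -> Tuple[int, float]:
    """Compare only windows called by both methods."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0, float("nan")
    agree = sum(calls_a[w] == calls_b[w] for w in shared)
    return len(shared), 100.0 * agree / len(shared)


# Toy data only; these are not values from the paper.
medip = [("chr1", 100, 0.9, 6), ("chr1", 400, 0.7, 5),
         ("chr1", 1500, 0.1, 8), ("chr1", 2100, 0.9, 2)]
mbd = [("chr1", 250, 0.8, 7), ("chr1", 1700, 0.6, 5), ("chr1", 2200, 0.9, 9)]
n_windows, pct_windows = percent_concordant(window_calls(medip), window_calls(mbd))
```

Note that the two toy methods never cover the same CpG, yet two windows are still comparable. This is the sense in which a window can include CpGs not directly covered by a read from both methods.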
When applied in the context of the enrichment methods, the minimum read depths limit the analysis to regions with at least a minimal methylation level. At sufficiently high sequencing depth, however, greater confidence can be placed in the lack of methylation inferred from lack of reads. However, at lower sequencing depth, lack of methylation cannot be distinguished from lack of coverage due to the stochastic nature of read coverage. This is an important difference from the bisulfite-based methods, which can identify unmethylated regions at a sequencing depth well below saturation.

The 1,000-bp windows covered at a minimum read depth of 5, representing 99.8% concordance, were examined for potential biases related to CpG density (Fig. 3b) and genomic context (Fig. 3c) on concordance between MeDIP-seq and MBD-seq calls. Concordant and discordant calls were similar in their genomic context, but discordant calls were shifted toward regions of lower CpG density compared to concordant calls. Thus, although these two methods differ in the extent of CpG coverage and read depth at sites covered (Fig. 1a), in windows with even minimal coverage by both methods, the concordance is exceptionally high.

To further examine the accuracy of the calls, we compared the methylation calls from MeDIP-seq to those from MethylC-seq. For regions with methylation detectable by MethylC-seq, MeDIP-seq and MBD-seq, calls of highly methylated were made in nearly every case (Fig. 3d). To examine the reliability of an enrichment-based method specifically for inferring weakly methylated regions at different CpG densities, we compared MeDIP-seq to MethylC-seq (Supplementary Fig. 4). These analyses and limited validation by pyrosequencing suggest that MeDIP-seq allows accurate inferences of lack of methylation and/or weak methylation in regions of high and medium CpG density, whereas accuracy is moderately reduced in regions of low CpG density. Thus, increasing the sequencing depth of MeDIP-seq or using a complementary methodology targeting unmethylated CpGs may be useful.
Although MeDIP-seq and MBD-seq methylation calls are highly concordant in sequences represented in both data sets, interesting differences exist between the regions each interrogates, and in the sensitivity of each method to detect non-CpG methylation. First, the rate of enrichment differs slightly with respect to local CpG density, with MeDIP-seq enriching more at regions with relatively low CpG density and MBD-seq enriching more at regions with slightly higher CpG density (Supplementary Fig. 6), which is also reflected in their moderate (46.33%) overlap in CpG coverage. This substantial amount of non-overlap suggests that methylated fragments with low CpG density may bind more efficiently to the 5-methylcytosine antibody, or alternatively, these fragments may be selectively eliminated during enrichment in MBD-seq, depending on the salt concentration used to elute the DNA. Second, the ability of MeDIP-seq or MBD-seq to detect non-CpG methylation could be particularly important for evaluating the methylome of ESCs, which contains abundant non-CpG methylation9. To address this, we examined read densities in gene bodies with similar CpG methylation levels but different CHG methylation
Figure 3 Comparison of methylated DNA enrichment methods. (a) Calls of highly/weakly methylated were made by averaging methylation scores for CpGs covered at varying minimum read depths by MeDIP-seq or MBD-seq in 1,000- and 200-bp windows. The number of windows, percent of genome-wide CpGs covered and the percent of concordant calls are shown for each minimum read depth and window size. (b,c) For the 1,000-bp windows with a minimum read depth of 5, the CpG density (b) and genomic context (c) of the concordant and discordant windows are shown. The inset in b shows a close-up of the concordance/discordance of CpG densities consistent with CpG islands. (d) For the 1,000-bp windows with a minimum read depth of 5, MethylC-seq methylation proportions for CpGs and non-CpG cytosines covered at a minimum read depth of 5, 444,590 windows, were summed and the windows were binned by the sum. For each of these bins, the number of windows called highly methylated by MeDIP-seq or MBD-seq is shown on the left y axis and the percent of total windows with calls of highly methylated is shown on the right y axis. Windows with a MethylC-seq methylation proportion sum >15, representing 83% of all windows, were called highly methylated by MeDIP-seq and MBD-seq in 99.9% of cases. The windows with a methylation proportion sum of 1–15, representing 17% of all windows, were called highly methylated by MeDIP-seq and MBD-seq in at least 99.1% of cases.

Figure 3a (table)
             1,000-bp windows                            200-bp windows
Minimum      Number of    % genome-    Percent           Number of    % genome-    Percent
read depth   windows      wide CpGs    concordant        windows      wide CpGs    concordant
2            1,189,545    61.82        98.80             2,136,710    37.96        92.41
5            446,096      32.65        99.80             753,329      17.72        99.01
10           162,661      15.07        100.00            273,767      7.74         99.97
levels as measured by MethylC-seq. MeDIP-seq signal increased with increasing non-CpG cytosine methylation, whereas MBD-seq did not (Supplementary Fig. 12), suggesting a differential sensitivity in these two enrichment methods. However, the power to distinguish CpG methylation signal from CHG methylation signal is low, because non-CpG cytosine methylation is often embedded within regions with high CpG methylation. As a negative control, regions in the genome that contain no CpGs were examined. MeDIP-seq and MBD-seq had only background level reads, consistent with the non-CpG cytosines being unmethylated in these regions (Supplementary Fig. 13).

Comparison of all methods
To examine concordance of CpG methylation calls from the two bisulfite-based methods and the two methylation enrichment–based methods, a four-way comparison was performed. This can be viewed as combining the two previous pair-wise comparisons, but with three differences. First, to make the bisulfite-based methods comparable to the highly/weakly methylated categorization of MeDIP-seq/MBD-seq scores, a binary calling scheme was applied with highly methylated defined as >0.20 methylation and weakly methylated defined as ≤0.20 methylation. When this calling scheme for individual CpGs was applied to bisulfite-based data alone, the concordance between methods was 94.14% for two reads, 96.15% for five reads and 97.13% for ten reads. Second, to perform the comparison at the same level of resolution, the methylation proportions for individual CpGs in MethylC-seq and RRBS were averaged across windows. Third, to compare the bisulfite-based methods to the enrichment-based methods without inferring an unmethylated state from complete absence of reads in enrichment methods, the comparison excluded regions lacking reads.

Methylation calls were made for 1,000-bp windows where all of the methods had at least one CpG covered by a minimum of five or ten reads, allowing for comparison of 199,438 or 87,363 windows, respectively. Of all the windows covered by a minimum of five reads, 2.45% completely encompassed CpG islands and 5.5% overlapped with CpG islands. The four-way comparison revealed a high degree of concordance of methylation calls among all methods (Fig. 4a,b and Supplementary Table 4). To investigate the effect of applying different highly/weakly methylated cutoffs to MethylC-seq and RRBS, we performed the four-way comparison at several cutoffs (Supplementary Fig. 14). Concordance remained >90% up to a highly/weakly methylated cutoff of 0.55, suggesting the concordance results we report are applicable to a wide range of methylation call cutoffs. This result is congruent with the known partitioning of the genome into methylated and unmethylated zones.

As the limited coverage by RRBS constrained the number of windows that could be compared, a three-way comparison excluding RRBS was also performed. This allowed for the comparison of 444,494 1,000-bp windows or 32% of CpGs genome-wide compared to 18% in the four-way comparison, which showed a three-way concordance of 99.69%. Using different minimum read depth and window sizes had little effect on concordance (Supplementary Table 5a,b).

To further evaluate the performance of the four methods, we compared them individually to the widely used Infinium bead-array. For the bisulfite-based methods, the differences in methylation for individual CpGs compared to beta values from the array assaying replicate no. 3 were calculated. At a difference threshold of 0.25, high concordance was observed between the array and MethylC-seq (96.41%; 20,885 CpGs) and between the array and RRBS (97.31%; 5,475 CpGs) (Supplementary Fig. 15). For the enrichment-based methods, the average methylation score was calculated for CpGs covered by a minimum of five reads in 200-bp windows centered on CpGs assayed by the array and used to make the binary methylation call. For the array assaying replicate no. 2, highly methylated was defined as >0.20 beta value and weakly methylated defined as ≤0.20 beta value. Both MeDIP-seq (96.19%; 4,960 windows) and MBD-seq (90.80%; 4,163 windows) calls showed high concordance with the array. This high degree of agreement between very different methods further supports the validity of comparing methylation profiles across platforms.
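The bookkeeping behind the four-way comparison can be sketched as below. This is an illustrative sketch rather than the authors' code: the function names and the dictionary layout are hypothetical. The >0.20 binary cutoff for window-averaged bisulfite proportions and the grouping of methods that make the same call into parentheses follow the description in the text and in Figure 4a.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Iterable, Sequence

METHODS = ("MethylC", "RRBS", "MeDIP", "MBD")


def bisulfite_window_call(proportions: Sequence[float], cutoff: float = 0.20) -> bool:
    """Average per-CpG methylation proportions across a window and apply
    the binary scheme: highly methylated if > cutoff, weakly otherwise."""
    return mean(proportions) > cutoff


def partition_label(calls: Dict[str, bool]) -> str:
    """Group methods that make the same call, e.g. '(MethylC, MeDIP, MBD)(RRBS)'.
    Larger groups are listed first; ties keep the METHODS order."""
    reference = calls[METHODS[0]]
    same = [m for m in METHODS if calls[m] == reference]
    other = [m for m in METHODS if calls[m] != reference]
    groups = [g for g in (same, other) if g]
    groups.sort(key=lambda g: (-len(g), METHODS.index(g[0])))
    return "".join("(" + ", ".join(g) + ")" for g in groups)


def summarize(windows: Iterable[Dict[str, bool]]) -> Dict[str, float]:
    """Percent of windows falling into each agreement pattern."""
    counts = Counter(partition_label(c) for c in windows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy data only; these are not values from the paper.
agree = {"MethylC": True, "RRBS": True, "MeDIP": True, "MBD": True}
rrbs_differs = {"MethylC": True, "RRBS": False, "MeDIP": True, "MBD": True}
summary = summarize([agree, agree, agree, rrbs_differs])
```

Only windows in which every method has a call should be passed in, which corresponds to the exclusion of regions lacking reads described above.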
Figure 4 Comparison of all methods. (a) The table shows the percentage of 1,000-bp windows with concordant and discordant MethylC-seq (replicate no. 3), RRBS (replicate no. 3), MeDIP-seq (replicate no. 2) and MBD-seq (replicate no. 2) calls at minimum read depths of 5 and 10. Methods making the same call are grouped together in parentheses. Calls were made for MethylC-seq and RRBS by averaging the methylation proportion of CpGs within the window that were covered at the minimum read depth and applying a highly/weakly methylated cutoff of 0.2. Calls were made for MeDIP-seq and MBD-seq by averaging the methylation score of CpGs within the window that were covered at the minimum read depth. (b) Genome browser view of the 100-kb CpG rich Protocadherin alpha cluster (PCDHA), exemplifying the significant concordance in methylation status seen on a genome-wide level. For MethylC-seq and RRBS, the y axis displays methylation scores of individual CpGs. Scores range between −500 (unmethylated) and 500 (methylated) and the zero line is equivalent to 50% methylated. Negative scores are displayed as green bars and positive scores are displayed as orange bars. For MeDIP-seq (1), MeDIP-seq (2) and MBD-seq, the y axis indicates extended read density. Browsable genome-wide views of these data sets are available at http://www.genboree.org/ and http://genome.ucsc.edu/.

Figure 4a (table)
                                  Minimum read depth of 5         Minimum read depth of 10
                                  199,438 windows                 87,363 windows
                                  (18.01% of genome-wide CpGs)    (9.39% of genome-wide CpGs)
Methods                           Percent windows                 Percent windows
(MethylC, RRBS, MeDIP, MBD)       97.64                           98.30
(MethylC, RRBS, MeDIP)(MBD)       0.07                            0
(MethylC, RRBS, MBD)(MeDIP)       0.07                            0
(MethylC, MeDIP, MBD)(RRBS)       1.98                            1.60
(RRBS, MeDIP, MBD)(MethylC)       0.03                            0.02
(MethylC, RRBS)(MeDIP, MBD)       0.20                            0.07
(MethylC, MeDIP)(RRBS, MBD)       0.01                            0
(MethylC, MBD)(RRBS, MeDIP)       0                               0

Figure 5 Integrative method increases methylome coverage and enables identification of a DMR. (a) MRE-seq involves parallel digests with methylation-sensitive restriction enzymes (HpaII, AciI and Hin6I), selection of cut fragments of ~50–300 bp, pooling the digests, library construction and sequencing. For every 600-bp window along chromosome 21, MeDIP-seq scores were plotted against MRE-seq scores. The plot depicts the inverse relationship between MRE-seq and MeDIP-seq signals. (b) Coverage of CpGs in the human genome by MeDIP-seq alone (red), MRE-seq alone (green), both (yellow) or neither method (no fill). Sequence from replicate nos. 1 and 2 were used in these calculations. (c) UCSC Genome Browser view of ZNF331 in H1 ESC, showing overlap of MeDIP-seq, MRE-seq and H3K4me3 (from ChIP-seq) signals at bisulfite region 1 and only MeDIP-seq signal at bisulfite region 2. (d) Clonal bisulfite sequencing results for specified regions in ESC from replicate no. 1. A filled circle represents a methylated CpG and an open circle indicates an unmethylated CpG.

Integrative method
To increase DNA methylome coverage while maintaining modest sequencing requirements, MeDIP-seq was integrated with MRE-seq13. The integration is advantageous because the two methods are largely non-overlapping in the regions they interrogate, and because it allows intermediate methylation states to be identified, which is less reliable using MeDIP-seq alone. The methylation scores from MRE-seq were inversely correlated with MeDIP-seq scores (Fig. 5a). The two methods combined assessed the DNA methylation status at 22 million CpGs, 78% of genome-wide CpGs (Fig. 5b). In regions where MRE-seq scores were high and MeDIP-seq scores were low, the MRE-seq reads corroborate the lack of methylation inferred from the absence of MeDIP-seq reads. Interestingly, there is a small but significant number of CpG islands with overlapping MeDIP-seq and MRE-seq signals (Supplementary Table 6), indicating an intermediate methylation level.

We tested two regions from one locus, ZNF331, by clonal bisulfite sequencing (Fig. 5c,d and Supplementary Table 7). Region 1 of ZNF331 showed overlap of signals from MeDIP-seq and MRE-seq, with bisulfite sequencing confirming intermediate and potentially monoallelic methylation. In contrast, region 2 exhibited MeDIP-seq signal only, and bisulfite sequencing confirmed nearly complete methylation. ZNF331 exhibits paternal monoallelic expression in multigenerational CEPH pedigrees consistent with imprinting25,26. In addition, allelic skewing of DNA methylation at ZNF331 was reported using SNP arrays27, further supporting a provisional status of ZNF331 as a novel imprinted gene. Histone H3 lysine 4 trimethylation (H3K4me3), a mark enriched at promoters, overlapped with region 1 but not region 2 (Fig. 5c). A third CpG island at the 5′ end of ZNF331 was fully unmethylated and had an even stronger H3K4me3 peak. Thus, our integrative approach identified a differentially methylated region (DMR) in ZNF331 that may be a DNA methylation–regulated promoter for one of the ZNF331 transcripts.

The analysis of ZNF331 suggested the possibility of using MeDIP-seq and MRE-seq to generate a list of candidate DMRs genome-wide (Supplementary Tables 6 and 7). Ultimately this could define all regions with an intermediate methylation level, encompassing DMRs of all imprinted genes in the genome, or the imprintome, and sites of non-imprinted monoallelic epigenetic regulation. Consistently, our candidate list includes 16 of 19 previously identified DMRs of
imprinted genes, including BLCAP, GRB10, H19, INPP5F, KCNQ1, MEST, SGCE, SNRPN, ZIM2, GNAS, GNASAS, DIRAS3, DLK1, NDN, PLAGL1 and TP73. Two of the known DMRs, in PEG3 and MEG3, appeared mostly methylated, potentially representing loss of imprint marks28. One of the 19 known DMRs (for NAP1L5) is not within a CpG island but did in fact exhibit intermediate methylation (Supplementary Fig. 16). Thus, extension of this analysis to include CpG-rich regions that are not strictly CpG islands will be useful. The data indicate that intermediate DNA methylation states, which characterize DMRs within known imprinted regions and others, are readily identifiable using an integrative approach.

Figure 5a,b (panel values) Axes in a: H1 ES MeDIP CpG score versus H1 ES MRE CpG score. CpG coverage in b: MeDIP-seq only (20.65M), MRE-seq only (0.71M), both (1.04M), none (5.6M).

Monoallelic methylation and gene expression
Sequencing-based methods present a unique opportunity to assign epigenetic marks and gene transcripts to specific alleles. We explored this possibility in the ESCs by identifying SNPs within sequence reads, focusing on the top 1,000 CpG island loci with extensive overlap between MRE-seq and MeDIP-seq signals (Fig. 6a and Supplementary Tables 8 and 9). Of the 1,000 loci examined, 203 contained an informative SNP and 63 of these exhibited monoallelic DNA methylation (Fig. 6a). The remaining 140 of the 203 loci with an informative SNP represent intermediate methylation states that may reflect heterogeneity in methylation across the cell population. In total, 119 of the 1,000 loci exhibited evidence of monoallelic epigenetic modification and/or expression. Four DMRs were identified that were monoallelic in DNA methylation and histone methylation and were associated with a gene exhibiting monoallelic expression (Supplementary Fig. 17). Strong corroborating evidence for monoallelic DNA methylation was obtained from similar analyses of the MethylC-seq data (Supplementary Fig. 18). These results demonstrate the excellent capabilities of sequencing-based epigenomic and transcriptome assays for identifying genes exhibiting monoallelic epigenetic marks and monoallelic expression.

To further assess the accuracy of methylation status predictions, eight regions (total of 17 nonoverlapping PCR products), which exhibited apparent monoallelic methylation from the MeDIP-seq and MRE-seq SNP analyses (Fig. 6a and Supplementary Table 8), were selected for clonal bisulfite sequencing. Adjacent CpG island loci containing only MRE-seq reads were confirmed to be largely unmethylated (Fig. 6b), whereas loci containing only MeDIP-seq reads were heavily methylated (Supplementary Table 7). Individual bisulfite clones from two known imprinted genes, INPP5F and GRB10, were either methylated or unmethylated at nearly all CpGs (Fig. 6b and Supplementary Table 7). GRB10 exhibited DNA methylation consistent with an isoform-specific imprint mark, as previously reported29. Seven (BCL8, FRG1, ZNF331, IAH1, MEFV, POTEB, ZFP3) of the eight putative DMRs showed evidence of differential methylation (Fig. 6c and Supplementary Table 7). Bisulfite analysis of a DMR upstream of POTEB at 15q11.2 provided direct evidence for allele-specific methylation (Fig. 6c, lower panel). The H3K9me3 signal at this locus is also monoallelic, as two nucleotides identified as heterozygous from the MethylC-seq reads both showed only a single allele in the H3K9me3 sequence reads (chr15:19346665, T in 4 of 4 reads; and chr15:19348112, C in 13 of 14 reads). In the 150 kb proximal to POTEB, three additional CpG islands exhibit intermediate methylation levels, including one near the noncoding RNA, CXADRP2 and one encompassing the 5′ end of BCL8. The allelic pattern of DNA methylation of BCL8 was confirmed by bisulfite sequencing (Supplementary Table 7).

Figure 6 Allelic DNA methylation, histone methylation and gene expression in ESCs. (a) Venn diagram summarizing the number of loci exhibiting monoallelic DNA methylation, histone methylation or monoallelic expression and their overlap. The top 1,000 loci (average size of 2.9 kb and encompassing a CpG island) with potential allelic DNA methylation were further evaluated, using the following assays: MRE-seq and MeDIP-seq for allelic DNA methylation within the loci, MethylC-seq and expression data for monoallelic expression of genes associated (±50 kb) with the loci, MethylC-seq and histone modifications H3K4me3 and H3K9me3 for monoallelic histone methylation within 1 kb from the loci. (b,c) Validation of known and novel DMRs identified from MeDIP-seq and MRE-seq. DMRs are presented in a UCSC Genome Browser window with MeDIP-seq and MRE-seq signals in human H1 ESC, along with bisulfite sequencing results. The results from the biological replicates (nos. 1 and 2) were very similar. (b) Imprinted gene GRB10 including a known DMR (Bisulfite region 1) and an upstream unmethylated CpG island (Bisulfite region 2). (c) Novel DMR upstream of POTEB, which exhibits allele-specific DNA methylation. Open circle indicates an unmethylated CpG site. Filled circle represents a methylated CpG site. ‘x’ indicates absence of a CpG site due to a heterozygous SNP, which destroyed the 28th CpG. All clones without the CpG were unmethylated, whereas all the clones containing the CpG were methylated. Furthermore, the alleles could be distinguished in the sequence reads from MeDIP-seq (G allele, 9 of 9 reads) and MRE-seq (A allele, 30 of 30 reads).

DISCUSSION
Our quantitative comparison of four sequencing-based DNA methylation methods revealed that all four methods yield largely comparable methylation calls, but differ in CpG coverage, resolution, quantitative accuracy, efficiency and cost. The greater coverage provided by MethylC-seq comes at a >50-fold increase in cost compared to RRBS, MeDIP-seq and MBD-seq. These analyses should be widely useful in understanding the extent to which sequencing-based DNA methylation profiles generated by different methods and different laboratories can be compared to define true biological differences. Given the international investment in mapping human DNA methylomes, and other epigenomic marks, high concordance is essential.

Quantifying differences among the four methods highlighted their strengths and weaknesses. Strengths of bisulfite methods include single-base resolution and an ability to quantify methylation levels. The quantification is imperfect, however, with the methylation level of ~18% of CpGs varying by >25% between RRBS and MethylC-seq, and the methylation level of ~5–8% of CpGs varying by >25% in RRBS biological replicates. MethylC-seq is superior in genome-wide CpG coverage, whereas RRBS carries a significantly lower ratio of cost to CpGs covered, particularly at CpG islands. A strength of the enrichment methods is even lower cost per CpG covered genome-wide relative to the bisulfite methods, albeit at reduced resolution. A second potential strength is that in the enrichment methods all four nucleotides are retained, which modestly increases the rate of uniquely mappable sequence reads and permits a greater number of genotype-epigenotype correlations.
Enrichment methods do not allow precise quantification of methylation levels, and their methylation calls are therefore fit into two or three categories. Using binary methylation calls, the enrichment methods are remarkably reproducible and highly (99%) concordant, regardless of whether the window size is 200 bp or 1 kb. The inability of enrichment methods to quantify methylation was addressed by integrating MeDIP-seq to map methylated regions with MRE-seq to map unmethylated CpG sites. The integrative approach increases CpG coverage with only a modest increase in cost, and permits accurate identification of intermediate methylation states, such as the methylation states of imprinted genes or cell type–specific methylation within complex tissues.

The methods also differ in their abilities to detect methylation at non-CpG cytosines and to discriminate between these residues and CpG methylation. However, the high degree of concordance, approaching 100% between MeDIP-seq and MBD-seq, suggests that this differential ability to detect non-CpG methylation does not have a significant impact on the relative methylation levels within 1,000-bp windows. This observation may be related to the low levels of methylation at non-CpG sites, and their presence in regions with high CpG methylation.

Our finding that MeDIP-seq enriches for regions with lower CpG density compared to MBD-seq is seemingly in contrast to a previous finding30 that MeDIP-seq was more sensitive to regions of high CpG density than MBD-seq. However, it has also been shown30 that increasing eluent salt concentrations in MBD-seq enriches for increasingly higher CpG densities. Our comparison between MeDIP-seq and MBD-seq used a salt concentration of 1 M compared to 700 mM30, which could account for the differences.
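The logic of integrating MeDIP-seq (methylated regions) with MRE-seq (unmethylated CpG sites) can be summarized in a small sketch. It assumes one MeDIP-seq score and one MRE-seq score per region; the function name, the category labels and both signal thresholds are hypothetical, and the paper's actual scoring algorithms are not reproduced here.

```python
def classify_region(
    medip_score: float,
    mre_score: float,
    medip_min: float = 1.0,  # hypothetical threshold for "MeDIP-seq signal present"
    mre_min: float = 1.0,    # hypothetical threshold for "MRE-seq signal present"
) -> str:
    """Combine an enrichment signal for methylated DNA (MeDIP-seq) with a
    restriction-enzyme signal for unmethylated CpGs (MRE-seq)."""
    has_medip = medip_score >= medip_min
    has_mre = mre_score >= mre_min
    if has_medip and has_mre:
        # Overlapping signals suggest an intermediate methylation level,
        # for example a candidate DMR.
        return "intermediate"
    if has_medip:
        return "methylated"
    if has_mre:
        # MRE-seq reads corroborate the lack of methylation that would
        # otherwise only be inferred from absent MeDIP-seq reads.
        return "unmethylated"
    return "not covered"


# Toy scores only; these are not values from the paper.
examples = {
    "MeDIP-seq and MRE-seq": classify_region(12.0, 3.5),
    "MeDIP-seq only": classify_region(40.0, 0.0),
    "MRE-seq only": classify_region(0.0, 4.2),
    "neither": classify_region(0.0, 0.0),
}
```

The fourth category matters in practice: without MRE-seq reads, a region lacking MeDIP-seq signal cannot be distinguished from a region that was simply not sampled.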
Variation in DNA methylation is a topic of wide interest. Variation is observed between individuals, cell and tissue types or within one cell type over time. Our biological replicates displayed variation that was similar in magnitude to variation from limited technical replicates, suggesting the concordance estimates may be marginally higher than what we report. Thus, to identify potentially rare variation in methylation between biological samples, the magnitude of technical variation should be considered.

There are numerous opportunities to increase methylome coverage. First, for RRBS or MRE-seq, for example, selecting additional enzymes, increasing the size range of selected fragments and increasing sequencing depth could dramatically increase CpG coverage. Second, increasing read length or using paired-end sequencing could also positively affect each method. Third, integrative approaches could include MeDIP-seq or MBD-seq coupled with MRE-seq or RRBS, particularly for direct rather than inferred calling of unmethylated CpGs within high CpG density regions. Versatile methods such as ‘bisulfite padlock probes’ allow more targeted profiling and could also complement the enrichment methods14,31.

Sequencing-based methods are unique in that they allow assessment of the methylation status of repetitive elements, which encompass nearly half of all CpGs in the methylome. The epigenetic status of this entire genomic compartment has been inaccessible to microarrays, but is a critical component of epigenetic gene regulation, as many of the sequences have a regulatory function23,32. Furthermore, the labile DNA methylation status of a particular transposon in the mouse agouti locus influences susceptibility to diabetes and cancer33,34. These and other studies indicate that there is a great deal to be learned about the epigenetic regulation of these abundant but enigmatic elements.
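The allele assignment used in the monoallelic methylation analysis (for example at POTEB, where MeDIP-seq reads carried the G allele in 9 of 9 reads and MRE-seq reads carried the A allele in 30 of 30 reads) can be sketched as follows. This is only an illustration of the idea: the function names, the minimum read count and the minimum allele fraction are hypothetical assumptions, not criteria taken from the paper.

```python
from collections import Counter
from typing import Iterable, Optional


def dominant_allele(
    bases: Iterable[str],
    min_reads: int = 4,          # hypothetical minimum coverage at the SNP
    min_fraction: float = 0.9,   # hypothetical fraction needed to call one allele
) -> Optional[str]:
    """Return the allele seen in nearly all reads at a heterozygous SNP,
    or None if coverage is too low or both alleles are well represented."""
    counts = Counter(bases)
    total = sum(counts.values())
    if total < min_reads:
        return None
    allele, n = counts.most_common(1)[0]
    return allele if n / total >= min_fraction else None


def is_monoallelic_methylation(medip_bases: Iterable[str], mre_bases: Iterable[str]) -> bool:
    """Methylated reads (MeDIP-seq) and unmethylated reads (MRE-seq) must
    each be dominated by one allele, and the two alleles must differ."""
    methylated = dominant_allele(medip_bases)
    unmethylated = dominant_allele(mre_bases)
    return (
        methylated is not None
        and unmethylated is not None
        and methylated != unmethylated
    )


# Read counts as reported for the POTEB DMR in Figure 6c.
poteb = is_monoallelic_methylation("G" * 9, "A" * 30)
# Toy counter-example: MeDIP-seq reads split between both alleles.
mixed = is_monoallelic_methylation("G" * 5 + "A" * 4, "A" * 30)
```

A locus that fails this test while still showing both MeDIP-seq and MRE-seq signal would fall into the intermediate class that may reflect heterogeneity across the cell population rather than allele-specific methylation.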
Sequencing-based methylation analysis methods are also unique in that the sequence reads themselves can be used to construct a partial map of genetic variation, including common and rare variants. The comprehensiveness of the genetic map is a function of read coverage and whether reads contain three nucleotides (bisulfite methods) or four nucleotides (enrichment methods). The sites of genetic variation enable local epigenetic states to be associated with specific alleles. SNP microarrays have been similarly deployed for allelic DNA methylation analysis, but the detection of variants is confined to those present on the microarray35. Our combined epigenomic-genomic analyses identified all CpG islands with intermediate methylation states in H1 ESCs, many of which were confirmed as monoallelic DNA methylation, and in some cases, also monoallelic for histone methylation and gene expression. This represents an initial step toward characterizing the human imprintome and genome-wide monoallelic epigenetic states, a goal of basic biological and clinical importance in epigenomic research.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Data accession. Additional data related to this paper are available at http://www.genboree.org/java-bin/project.jsp?projectName=Methylation%20Platform%20Comparison&isPublic=yes and hgwdev-remc.cse.ucsc.edu. Data used in this paper are available for download from the GEO NIH Roadmap Epigenomics Project Data Listings (http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/) and the Epigenomics Atlas (http://genboree.org/epigenomeatlas/edaccDataFreeze1.rhtml).

Note: Supplementary information is available on the Nature Biotechnology website.
Acknowledgments
We would like to thank the US National Institutes of Health (NIH) Roadmap Epigenomics Program, sponsored by the National Institute on Drug Abuse (NIDA) and the National Institute of Environmental Health Sciences (NIEHS). J.F.C. and M.H. are supported by NIH grant 5U01ES017154-02. A. Milosavljevic is supported by NIH grant 5U01DA025956-02. A. Meissner and B.E.B. are supported by NIH grant 6U01ES017155-02. J.R.E. and B.R. are supported by NIH grant 5U01ES017166-02. R.P.N. was supported by NIH T32 CA10846204 and F32CA141799. S.L.D. was supported by CIRM TB1-01190. S.D.F. was supported by NIH T32 CA108462-06. B.E.J. was supported by NIH T32 GM008568. M.A.M. is a Terry Fox Young Investigator and a Michael Smith Senior Research Scholar. We thank Z. Zhang and H. Li for modifying the ZOOM algorithm for bisulfite alignments.

AUTHOR CONTRIBUTIONS
J.F.C., R.A.H., T.W., M.H., M.A.M. and A. Milosavljevic conceived and designed the experiments. R.P.N., C.H., S.L.D., B.E.J., S.D.F., Y.Z. and M.H. performed the MeDIP, MRE and bisulfite sequencing experiments. R.A.W. and X.Z. designed and performed pyrosequencing and data analyses. H.G., C.B., A.G. and A. Meissner performed and analyzed RRBS. L.E., H.O., P.J.F., B.E.B., C.B.E., R.D.H. and B.R. performed and analyzed ChIP-seq experiments. R.L., M.P. and J.R.E. analyzed MethylC-seq data and performed Bowtie aligner testing. R.A.H., T.W., K.J.F., J.G., C.C., M.H., X.Z., A.D. and A.O. performed data analysis. T.W., T.B. and D.H. developed MeDIP and methyl-sensitive restriction enzyme scoring algorithms and performed coverage analyses including repetitive sequence analyses. Y.X., W.-Y.C., R.L., M.Q.Z. and W.L. compared bisulfite sequence aligners. J.F.C., R.A.H., M.H., T.W., R.P.N. and R.A.W. wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Robertson, K.D. DNA methylation and human disease. Nat. Rev. Genet. 6, 597–610 (2005).
2. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002).
3. Feinberg, A.P. & Vogelstein, B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301, 89–92 (1983).
4. Gama-Sosa, M.A. et al. Tissue-specific differences in DNA methylation in various mammals. Biochim. Biophys. Acta 740, 212–219 (1983).
5. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
6. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009).
7. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129–1133 (2010).
8. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
9. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008).
10. Jacinto, F.V., Ballestar, E. & Esteller, M. Methyl-DNA immunoprecipitation (MeDIP): hunting down the DNA methylome. Biotechniques 44, 35–43 (2008).
11. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008).
12. Serre, D., Lee, B.H. & Ting, A.H. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 38, 391–399 (2010).
13. Maunakea, A.K. et al. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253–257 (2010).
14. Ball, M.P. et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat. Biotechnol. 27, 361–368 (2009).
15. Cokus, S.J. et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452, 215–219 (2008).
16. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
17. The American Association for Cancer Research Human Epigenome Task Force & European Union, Network of Excellence, Scientific Advisory Board. Moving AHEAD with an international human epigenome project. Nature 454, 711–715 (2008).
18. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
19. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).
20. Coarfa, C. & Milosavljevic, A. Pash 2.0: scaleable sequence anchoring for next-generation sequencing technologies. Pac. Symp. Biocomput. 2008, 102–113 (2008).
21. Smith, A.D. et al. Updates to the RMAP short-read mapping software. Bioinformatics 25, 2841–2842 (2009).
22. Lin, H., Zhang, Z., Zhang, M.Q., Ma, B. & Li, M. ZOOM! Zillions of oligos mapped. Bioinformatics 24, 2431–2437 (2008).
23. Wang, T. et al. Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc. Natl. Acad. Sci. USA 104, 18613–18618 (2007).
24. Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
25. Pant, P.V.K. et al. Analysis of allelic differential expression in human white blood cells. Genome Res. 16, 331–339 (2006).
26. Pollard, K.S. et al. A genome-wide approach to identifying novel-imprinted genes. Hum. Genet. 122, 625–634 (2008).
27. Schalkwyk, L.C. et al. Allelic skewing of DNA methylation is widespread across the genome. Am. J. Hum. Genet. 86, 196–212 (2010).
28. Pick, M. et al. Clone- and gene-specific aberrations of parental imprinting in human induced pluripotent stem cells. Stem Cells 27, 2686–2690 (2009).
29. Arnaud, P. et al. Conserved methylation imprints in the human and mouse GRB10 genes with divergent allelic expression suggests differential reading of the same mark. Hum. Mol. Genet. 12, 1005–1019 (2003).
30. Li, N. et al. Whole genome DNA methylation analysis based on high throughput sequencing technology. Methods published online, doi:10.1016/j.ymeth.2010.04.009 (27 April 2010).
31. Deng, J. et al. Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat. Biotechnol. 27, 353–360 (2009).
32. Bourque, G. Transposable elements in gene regulation and in the evolution of vertebrate genomes. Curr. Opin. Genet. Dev. 19, 607–612 (2009).
33. Duhl, D.M., Vrieling, H., Miller, K.A., Wolff, G.L. & Barsh, G.S. Neomorphic agouti mutations in obese yellow mice. Nat. Genet. 8, 59–65 (1994).
34. Waterland, R.A. & Jirtle, R.L. Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol. Cell. Biol. 23, 5293–5300 (2003).
35. Hellman, A. & Chess, A. Gene body-specific methylation on the active X chromosome. Science 315, 1141–1143 (2007).
1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA. 2Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA. 3Brain Tumor Research Center, Department of Neurosurgery, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, USA. 4Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada. 5Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California, USA. 6Department of Pharmacology and the Genome Center, University of California-Davis, Davis, California, USA. 7Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA. 8Division of Biostatistics, Dan L. Duncan Cancer Center, Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA. 9Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. 10Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA. 11Center for Cancer Research, Massachusetts General Hospital, Boston, Massachusetts, USA. 12Ludwig Institute for Cancer Research. 13Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, California, USA. 14Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA. 15Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, Dallas, Texas, USA. 16Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. 17Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. 18Max Planck Institute for Informatics, Saarbrücken, Germany. 19USDA/ARS Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, Texas, USA. Correspondence should be addressed to J.F.C. ([email protected]).
nature biotechnology VOLUME 28 NUMBER 10 OCTOBER 2010
ONLINE METHODS
ESCs. H1 cells were grown in mTeSR1 medium36 on Matrigel (BD Biosciences) for 10 passages on 10 cm² plates and harvested at passage 27. Cells were harvested by scraping before snap freezing for DNA isolation. Cells were also harvested from passages 30 and 32 and divided for isolation of DNA, RNA and chromatin.
Illumina Infinium methylation assay. We used 500 ng of genomic DNA per sample for the Infinium methylation assay (Illumina), which measures methylation at 27,578 CpGs, with ~2 probes per gene (14,475 RefSeq genes). Bisulfite conversion was performed with the EZ DNA methylation kit (Zymo Research) and each sample was eluted in 12 μl water. Amplification and hybridization to the Illumina HumanMethylation27 BeadChip were carried out according to the manufacturer’s instructions at the UCSF Genomics Core Facility. Beta values, representing quantitative measurements of DNA methylation at individual CpGs, were generated with Illumina GenomeStudio software. Beta values were normalized to background and filtered to remove those with low signal intensity. The filtered data were used for all subsequent analyses.
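As a rough illustration of how a beta value and a low-signal filter of this kind can be computed, the following minimal Python sketch may help. It assumes the commonly used definition beta = M / (M + U + offset), where M and U are the methylated and unmethylated probe intensities and the offset is 100. The intensity threshold `min_total` and both function names are hypothetical; they are not taken from GenomeStudio or from the analysis described here.

```python
from typing import Optional


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset); negative intensities are clipped to 0."""
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


def filtered_beta(meth: float, unmeth: float,
                  min_total: float = 1000.0) -> Optional[float]:
    """Return the beta value, or None when the total signal is too low.

    min_total is an assumed cutoff chosen only for this illustration.
    """
    if max(meth, 0.0) + max(unmeth, 0.0) < min_total:
        return None
    return beta_value(meth, unmeth)


if __name__ == "__main__":
    print(filtered_beta(4900.0, 0.0))   # strongly methylated probe
    print(filtered_beta(50.0, 50.0))    # low-intensity probe is dropped
```

The offset keeps beta values stable when both intensities are small, which is also why a separate low-signal filter is still useful: the offset alone pulls weak probes toward zero rather than removing them.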
Shotgun bisulfite sequencing (MethylC-seq). As described8.
RRBS. RRBS analysis was performed as described previously37,38, using ~30 ng of H1-derived DNA as input. The steps of the experimental protocol were as follows. (i) DNA digestion using the MspI restriction enzyme, which cuts DNA at its recognition site (CCGG) independent of the CpG methylation status. (ii) End repair and ligation of adapters for Illumina sequencing. (iii) Gel-based selection of DNA fragment sizes ranging from 40 bp to 220 bp. (iv) Two successive rounds of bisulfite treatment, after which we observed 98.4% converted cytosines outside of CpGs. Due to the presence of non-CpG methylation in ESCs, this value is an underestimate of the actual bisulfite conversion rate. (v) PCR amplification of the bisulfite-converted library and sequencing on the Illumina Genome Analyzer II according to the manufacturer’s protocol. A total of two lanes were sequenced, and the data were processed using Illumina’s standard pipeline for image analysis and base calling. The alignment was performed using custom software developed at the Broad Institute9. The non-RepeatMasked reference sequence was generated by size-selecting from an in silico digest with the MspI restriction enzyme, and before the alignment all Cs in the reference sequence and in the aligned reads were converted into Ts. The alignment itself uses a straightforward seed-and-extension algorithm, identifying all perfect 12-bp alignments and extending without gaps from either end of the seed. The best alignment is kept only in cases where the second-best alignment has at least three more mismatches, whereas all reads that match multiple times are discarded. The DNA methylation level of a specific CpG is calculated as the number of C-to-C matches between the unconverted reference sequence and the aligned read sequence divided by the sum of the number of C-to-C matches and C-to-T mismatches.
MBD-seq. As described above, 3 μg of isolated gDNA was sheared to ~300 bp using the Covaris E210 sonicator (Covaris) and size separated by PAGE (8%). The 200- to 400-bp DNA fraction was excised, eluted overnight at 4 °C in 200 μl of elution buffer (5:1, LoTE buffer (3 mM Tris-HCl, pH 7.5, 0.2 mM EDTA)-7.5 M ammonium acetate) and purified using a QIAquick purification kit (Qiagen). The size-selected DNA was end-repaired, A-tailed and ligated to 2.5 mMol of ‘paired-end’ adapters (IDT) following the manufacturer’s recommended protocol (Illumina). The resulting product was purified on a Qiaquick MinElute column (Qiagen) and assessed and quantified using an Agilent DNA 1000 series II assay and Qubit fluorometer (Invitrogen), respectively. We subjected 100 ng of pre-adapted, size-selected product to immunoprecipitation using the MethylMiner Methylated DNA Enrichment Kit (Invitrogen) following the manufacturer’s recommended protocol. The bound fraction was eluted at 600 mM, 1 M and 2 M NaCl and concentrated by the addition of 1 μl (20 μg/μl) mussel glycogen, 1/10th v/v 3 M sodium acetate (pH 5.2) and 2x v/v 100% ethanol. Samples were incubated at −80 °C for 2 h and subsequently centrifuged for 15 min at 16,000g at 4 °C. Pellets were washed twice with 500 μl cold 70% ethanol, with 5 min centrifugation at 16,000g at 4 °C between washes, and resuspended in 60 μl nuclease-free water. After purification, eluted products were subjected to PCR using Illumina paired-end adapters (Illumina) with 15 cycles of amplification. PCR products were purified on Qiaquick MinElute columns (Qiagen), assessed and quantified using an Agilent DNA 1000 series II assay and size separated by PAGE (8%). The 320- to 520-bp DNA fraction was excised and purified as described above. The products were assessed and quantified using an Agilent DNA 1000 series II assay and Qubit fluorometer (Invitrogen), respectively. A 1 μl aliquot of each library was used as template in two independent PCR reactions to confirm enrichment for methylated (SNRPN promoter) and de-enrichment for unmethylated (CpG-less sequence on Chr15) sequences (see ref. 13 for primer sequences). Cycling was 95 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s with 30 cycles. PCR products were visualized by 1.8% agarose gel electrophoresis. Each library was diluted to 8 nM for sequencing on an Illumina Genome Analyzer following the manufacturer’s recommended protocol.
MeDIP-seq. As described above, 2–5 μg of isolated DNA was sonicated to ~100–500 bp with a Bioruptor sonicator (Diagenode). Sonicated DNA was end-repaired, A-tailed and ligated to adapters following the standard Illumina protocol. After agarose size-selection to remove unligated adapters, adaptor-ligated DNA was used for each immunoprecipitation using a mouse monoclonal anti-methylcytidine antibody (1 mg/ml, Eurogentec). DNA was heat denatured at 95 °C for 10 min, rapidly cooled on ice and immunoprecipitated with 1 μl primary antibody per microgram of DNA overnight at 4 °C with rocking agitation in 500 μl immunoprecipitation (IP) buffer (10 mM sodium phosphate buffer, pH 7.0, 140 mM NaCl, 0.05% Triton X-100). To recover the immunoabsorbed DNA fragments, 1 μl of rabbit anti-mouse IgG secondary antibody (2.5 mg/ml, Jackson ImmunoResearch) and 100 μl Protein A/G beads (Pierce Biotechnology) were added and incubated for an additional 2 h at 4 °C with agitation. After immunoprecipitation, a total of six IP washes were performed with ice-cold IP buffer.
A nonspecific mouse IgG IP (Jackson ImmunoResearch) was performed in parallel to the methyl DNA IP as a negative control. Washed beads were resuspended in TE with 0.25% SDS and 0.25 mg/ml proteinase K for 2 h at 55 °C and then allowed to cool to 25 °C. MeDIP and supernatant DNA were purified using Qiagen MinElute columns and eluted in 16 μl elution buffer (EB) (Qiagen). Fifteen cycles of PCR were performed on 5 μl of the immunoprecipitated DNA using the single-end Illumina PCR primers. The resulting reactions were purified over Qiagen MinElute columns, after which a final size selection (220–420 bp) was performed by electrophoresis in 2% agarose. Libraries were quality checked by spectrophotometry and Agilent DNA Bioanalyzer analysis, which indicated an average fragment size of 150 bp. An aliquot of each library was diluted in EB to 5 ng/μl and 1 μl was used as template in four independent PCR reactions to confirm enrichment for methylated and de-enrichment for unmethylated sequences, compared to 5 ng of input (sonicated DNA). Two positive controls (SNRPN and MAGEA1 promoters) and two negative controls (a CpG-less sequence on Chr15 and the GAPDH promoter) were amplified (see ref. 13 for primer sequences). Cycling was 95 °C for 30 s, 58 °C for 30 s and 72 °C for 30 s with 30 cycles. PCR products were visualized by 1.8% agarose gel electrophoresis.
Calculating MeDIP-seq and MBD-seq scores for single CpGs. MeDIP-seq and MBD-seq reads were mapped to the non-RepeatMasked human genome assembly (hg18) with Mapping and Assembly with Quality (MAQ). An algorithm was developed to calculate methylation scores for individual CpGs based on MeDIP-seq or MBD-seq data. Each uniquely mapped, non-redundant sequence read was extended to a length of 150 bp, representing an individual DNA fragment pulled down in the methylation enrichment experiment.
The algorithm makes two assumptions. First, each fragment is assigned to a CpG site that it covers; when a fragment covers more than one CpG, the probability of assigning it to a particular CpG is proportional to the level of methylation of that CpG site, and the assignment probabilities over all CpGs covered by the fragment always sum to 1. Second, for a given CpG site, the number of fragments assigned to it is proportional to the level of methylation of this CpG site. The algorithm is initialized by assigning a score of 1 to all CpGs, and then it iterates through two steps. In the first step, fragments are assigned to CpGs based on their scores. In the first round, because all CpGs have the same score of 1, an equal fraction of a fragment is assigned to each CpG that the fragment covers, and this is done for all fragments. In the second step, all the fractions of reads each CpG received in step 1 are added up, and this weighted sum is
used as a methylation score for this CpG site. Then, the first step is repeated, except that individual CpGs may now have different priors for assigning reads. A fraction of each fragment is assigned to the CpGs that the fragment covers based on their methylation scores; that is, the fraction assigned to each CpG is proportional to its methylation score. These updated fragment counts are summed again in step 2 and used as the methylation scores for individual CpGs. The algorithm iterates through these two steps until the methylation scores converge. In essence, these scores are read densities normalized by CpG density.
Methylation-sensitive restriction enzyme sequencing (MRE-seq). Three parallel digests were performed (HpaII, AciI and Hin6I; Fermentas), each with 1 μg of DNA. Five units of enzyme per microgram DNA were added and incubated at 37 °C in Fermentas “Tango” buffer for 3 h. A second dose of enzyme was added (5 units of enzyme per microgram DNA) and the DNA was incubated for an additional 3 h. Digested DNA was precipitated with sodium acetate and ethanol and 500 ng of each digest were combined into one tube. Combined DNA was size-selected by electrophoresis on a 1% agarose TBE gel. A 100–300 bp gel slice was excised using a sterile scalpel and gel-purified using Qiagen Qiaquick columns, eluting in 30 μl of Qiagen EB buffer. Library construction was performed using the Illumina Genomic DNA Sample Kit (Illumina) with single-end adapters, following the manufacturer’s instructions with the following changes. For the end-repair reaction, T4 DNA polymerase and T4 polynucleotide kinase were excluded and the Klenow DNA polymerase was diluted 1:5 in water and 1 μl used per reaction. For single-end oligo adaptor ligation, adapters were diluted 1:10 in water and 1 μl used per reaction. After the second size selection, DNA was eluted in 36 μl EB buffer using Qiagen Qiaquick columns, and 13 μl was used as template for PCR, using Illumina reagents and cycling conditions with 18 cycles.
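The iterative scoring scheme described above for MeDIP-seq and MBD-seq can be illustrated with a minimal Python sketch. It assumes that read mapping and extension to 150 bp have already been done, so that each fragment is given simply as the list of CpG positions it covers. The function name, the convergence tolerance and the iteration cap are hypothetical choices made for illustration; this is not the published implementation.

```python
from typing import Dict, List, Sequence


def score_cpgs(fragments: Sequence[Sequence[int]],
               max_iter: int = 1000,
               tol: float = 1e-9) -> Dict[int, float]:
    """Iteratively distribute fragments over the CpGs they cover.

    fragments: one entry per extended read, listing covered CpG positions.
    """
    # Ignore fragments without CpGs and duplicate positions within a fragment.
    covered: List[List[int]] = [list(dict.fromkeys(f)) for f in fragments if f]
    # Initialization: every covered CpG starts with a score of 1.
    scores = {cpg: 1.0 for frag in covered for cpg in frag}
    if not scores:
        return {}
    for _ in range(max_iter):
        new_scores = dict.fromkeys(scores, 0.0)
        for frag in covered:
            # Step 1: split the fragment among its CpGs in proportion
            # to their current scores (fractions sum to 1 per fragment).
            total = sum(scores[c] for c in frag)
            for c in frag:
                # Step 2: add up the fractions that each CpG receives.
                new_scores[c] += scores[c] / total
        delta = max(abs(new_scores[c] - scores[c]) for c in scores)
        scores = new_scores
        if delta < tol:  # assumed convergence criterion
            break
    return scores


if __name__ == "__main__":
    # Three fragments: two cover only CpG 10, one covers CpGs 10 and 20.
    print(score_cpgs([[10], [10], [10, 20]]))
```

Because every fragment contributes fractions that sum to 1, the scores always add up to the number of CpG-containing fragments, which is a convenient sanity check on any implementation of this scheme.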
After cleanup with Qiagen MinElute columns, each library was examined by spectrophotometry (Nanodrop, Thermo Scientific) and Agilent DNA Bioanalyzer (Agilent).
Methyl-sensitive restriction enzyme scores. MRE-seq reads were mapped to the human genome assembly (hg18) with MAQ, with the additional constraint that the 5′ end of a read must map to the CpG site within a methyl-sensitive restriction enzyme site. An MRE-score was defined for each CpG site as the number of MRE-reads that map to the site, regardless of orientation, normalized by the number of million reads generated by the specific enzyme. An MRE-score for each genomic window (e.g., any given 600-bp window) was defined as the average MRE-score of all CpGs within the window that have a score.
RNA-seq. Polyadenylated RNA was purified from 20 μg of DNase I (Invitrogen)-treated total RNA using the MACS mRNA Isolation Kit (Miltenyi Biotec). Double-stranded cDNA was synthesized from the purified polyA+ RNA using the Superscript Double-Stranded cDNA Synthesis kit (Invitrogen) and random hexamer primers (Invitrogen) at a concentration of 5 μM. The resulting cDNA was sheared using a Sonic Dismembrator 550 (Fisher Scientific) and size separated by PAGE (8%). The 190–210 bp DNA fraction was excised, eluted overnight at 4 °C in 300 μl of elution buffer (5:1, LoTE buffer (3 mM Tris-HCl, pH 7.5, 0.2 mM EDTA)-7.5 M ammonium acetate) and purified using a QIAquick purification kit (Qiagen). The sequencing library was prepared following the Illumina Genome Analyzer paired-end library protocol (Illumina) with 10 cycles of PCR amplification. PCR products were purified on Qiaquick MinElute columns (Qiagen) and assessed and quantified using an Agilent DNA 1000 series II assay and Qubit fluorometer (Invitrogen), respectively. The resulting libraries were sequenced on an Illumina Genome Analyzer IIx following the manufacturer’s instructions.
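The MRE-score definitions given above (per CpG and per genomic window) amount to a normalized read count followed by a window average. The short Python sketch below shows one way to express this. The input layout, the function names and the choice to add up contributions when a CpG lies in the sites of more than one enzyme are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def mre_scores(read_sites: Iterable[Tuple[str, int]],
               reads_per_enzyme: Dict[str, int]) -> Dict[int, float]:
    """Per-CpG MRE-scores.

    read_sites: (enzyme, CpG position) for every mapped read, both
                orientations pooled.
    reads_per_enzyme: total number of reads generated per enzyme.
    """
    counts = Counter(read_sites)
    scores: Dict[int, float] = {}
    for (enzyme, pos), n in counts.items():
        millions = reads_per_enzyme[enzyme] / 1e6
        # Reads per million for this enzyme; summing across enzymes
        # is an assumption of this sketch.
        scores[pos] = scores.get(pos, 0.0) + n / millions
    return scores


def window_score(scores: Dict[int, float],
                 start: int, end: int) -> Optional[float]:
    """Average score of the CpGs in [start, end) that have a score."""
    vals = [s for pos, s in scores.items() if start <= pos < end]
    return sum(vals) / len(vals) if vals else None


if __name__ == "__main__":
    reads = [("HpaII", 100)] * 4 + [("AciI", 250)]
    totals = {"HpaII": 2_000_000, "AciI": 500_000}
    per_cpg = mre_scores(reads, totals)
    print(per_cpg, window_score(per_cpg, 0, 600))
```

Windows without any scored CpG return `None` rather than 0, so that missing coverage is not mistaken for complete methylation.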
Image analysis and base calling were performed by the GA pipeline v1.1 (Illumina) using phasing and matrix values calculated from a control phiX174 library run on each flowcell.
ChIP-seq. Protocols for the chromatin immunoprecipitation assay and Illumina library construction are described in detail elsewhere39. Briefly, cross-linked hESCs were obtained from Cellular Dynamics, and chromatin was extracted and sonicated to an average size of 500 bp. Individual ChIP assays were performed using 50 μg chromatin (equivalent to 5 × 10⁶ cells) and 2 μg of antibody were added to each ChIP reaction. The histone antibodies used in this study include H3AcK9 (Millipore), H3me3K4 (CST), H3me3K27
(CST), H3me3K9 (Abcam), H3me3K36 (Abcam) and H3me1K4 (Abcam). ChIP libraries were created40 using the entire purified ChIP sample. All ChIP samples except H3me1K4 were amplified using paired-end Illumina primers for a total of 18 cycles. Libraries were then run on a 2% agarose gel, and the 150- to 500-bp fraction of the library was extracted and purified. The H3me1K4 library was constructed by performing size selection of the 200- to 400-bp library fragment before a 15-cycle amplification. The libraries were quantified using a BioAnalyzer and sequenced. ChIP-seq peaks were called using the Sole-Search software41.
Bisulfite pyrosequencing. Site-specific analysis of CpG methylation was performed by bisulfite pyrosequencing. Genomic DNA (1.0 μg) was bisulfite modified and pyrosequencing was performed as previously described42. The quantitative performance of each pyrosequencing assay was verified by measuring methylation standards composed of known proportions of unmethylated (whole genome-amplified) and fully methylated (SssI-treated) genomic DNA43. Comparison was performed on combinations of DNA methylome platforms: MethylC-seq versus reduced representation bisulfite sequencing (RRBS) and MethylC-seq versus methylated DNA immunoprecipitation sequencing (MeDIP-seq). H1 cell lines of different passage number were used in these experiments (Batch 3 for MethylC-seq, Batch 1 for RRBS and MeDIP). CpGs showing >80% difference in methylation for the MethylC-seq − RRBS comparison, or >80% difference between the methylated proportion and the methylation score for the MethylC-seq and MeDIP comparison, were identified, and regions with clusters of these sites were selected for pyrosequencing. Based on the distribution of target CpGs, we looked for genomic regions of appropriate length (50 bp to 75 bp), with few or no non-CG cytosines and two or more target CpGs.
Pyrosequencing assays were designed and carried out in 16 regions selected for validation; 14 of these yielded reliable results. Genomic coordinates and primers used for pyrosequencing for the validated regions are listed in Supplementary Table 1.
Clonal bisulfite sequencing. Further validation of genome-wide data, particularly sites with apparent allelic DNA methylation, was performed by bisulfite sequencing. Total genomic DNA underwent bisulfite conversion following an established protocol44 with modified conversion conditions: 95 °C for 1 min and 50 °C for 59 min, for a total of 16 cycles. Bisulfite PCR primers (Supplementary Table 4) were used to amplify regions of interest, and the products were subsequently cloned using pCR2.1/TOPO (Invitrogen). Single-colony PCR and sequencing (QuintaraBio) provided contigs that were aligned for analysis.
Data analyses. Comparison of CpG or non-CpG site methylation. Repeat masking of the reference genome assembly was not used in any of these analyses. For bisulfite-based methods, reads that mapped to the positive and negative strands were combined for CpG methylation calculations, but not for CHG and CHH methylation calculations, due to the strand asymmetry of non-CpG methylation9. The methylated proportion was calculated for each CpG or non-CpG as (methylated reads/(methylated reads + unmethylated reads)). Comparisons of methylation status calls were performed by imposing minimum requirements of 2, 5 or 10 reads covering a CpG or non-CpG site and applying varying methylated proportion cutoffs (0.80–0.20, 0.75–0.25 or 0.20) to make calls on the methylation status. Methylated proportion differences were calculated as (MethylC-seq proportion − RRBS proportion). Methylation proportion difference graphs were generated by counting the number of CpGs with a particular methylated proportion difference and plotting the count on the y-axis.
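To make the quantities above concrete, here is a minimal Python sketch of the methylated proportion, a cutoff-based status call and the tabulation of per-CpG differences between two bisulfite-based data sets. The default read minimum and cutoffs use values quoted in the text (5 reads, 0.80 and 0.20), but the handling of boundary values, the "intermediate" label, the bin width and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def methylated_proportion(meth: int, unmeth: int,
                          min_reads: int = 5) -> Optional[float]:
    """methylated / (methylated + unmethylated), or None if coverage is low."""
    total = meth + unmeth
    if total < min_reads:
        return None
    return meth / total


def call_status(prop: float, high: float = 0.80, low: float = 0.20) -> str:
    """Call methylation status; inclusive cutoffs are an assumption."""
    if prop >= high:
        return "methylated"
    if prop <= low:
        return "unmethylated"
    return "intermediate"  # hypothetical label for values between cutoffs


def difference_histogram(methylc: Dict[int, float],
                         rrbs: Dict[int, float]) -> Counter:
    """Count shared CpGs by (MethylC-seq - RRBS) difference, in 0.1 bins."""
    hist: Counter = Counter()
    for pos in methylc.keys() & rrbs.keys():
        hist[round(methylc[pos] - rrbs[pos], 1)] += 1
    return hist


if __name__ == "__main__":
    p = methylated_proportion(8, 2)
    print(p, call_status(p))
    print(difference_histogram({1: 1.0, 2: 0.25}, {1: 0.5, 2: 0.25}))
```

Only CpGs present in both data sets enter the histogram, mirroring the requirement that a site be covered by the minimum number of reads on each platform before it is compared.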
Concordance was then calculated as the percentage of CpGs with a methylation proportion difference less than 0.1 or 0.25. For enrichment-based methods, methylation scores inferred for individual CpGs were averaged across CpGs covered by a varying minimum number of reads in 1,000- or 200-bp windows. Windows were called highly methylated (methylation score >8) or weakly methylated (methylation score ≤8) based on the average methylation score for each window in which at least one CpG was covered by the minimum number of reads.
Genomic context of concordant and discordant CpGs. The overlap of concordant and discordant CpGs with annotated genes, as defined by the UCSC Genome Browser RefSeq Gene track (2010-01-24 version; http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=refGene), was identified. To deal with overlapping genes and multiple isoforms of genes, CpGs were classified into gene components based on the following prioritization order: Promoter (within 8,000 bp upstream of a transcription start site), Coding Exon, UTR and Intron. CpGs that did not overlap with any of these gene components were identified as Intergenic.
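The prioritization rule above can be sketched in a few lines of Python. The sketch assumes that gene components for one chromosome are supplied as half-open (component, start, end) intervals; the interval layout, the coordinate convention for promoters and the function names are hypothetical and are not taken from the original analysis.

```python
from typing import Iterable, Tuple

# Priority order as stated in the text; anything else is Intergenic.
PRIORITY = ["Promoter", "Coding Exon", "UTR", "Intron"]

Feature = Tuple[str, int, int]  # (component, start, end), end exclusive


def promoter_interval(tss: int, strand: str,
                      upstream: int = 8000) -> Tuple[int, int]:
    """Half-open interval covering `upstream` bp upstream of the TSS."""
    if strand == "+":
        return max(0, tss - upstream), tss
    # On the minus strand, upstream lies at higher coordinates.
    return tss + 1, tss + upstream + 1


def classify_cpg(pos: int, features: Iterable[Feature]) -> str:
    """Assign a CpG to the highest-priority component that overlaps it."""
    hits = {comp for comp, start, end in features if start <= pos < end}
    for comp in PRIORITY:
        if comp in hits:
            return comp
    return "Intergenic"


if __name__ == "__main__":
    start, end = promoter_interval(10_000, "+")
    feats = [("Promoter", start, end),
             ("Intron", 10_000, 15_000),
             ("Coding Exon", 12_000, 12_200)]
    for cpg in (5_000, 12_100, 13_000, 40_000):
        print(cpg, classify_cpg(cpg, feats))
```

A linear scan over all features is adequate for illustration; a genome-wide analysis would more plausibly use an interval index, but the priority logic would stay the same.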
36. Ludwig, T.E. et al. Feeder-independent culture of human embryonic stem cells. Nat. Methods 3, 637–646 (2006).
37. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).
38. Smith, Z.D., Gu, H., Bock, C., Gnirke, A. & Meissner, A. High-throughput bisulfite sequencing in mammalian genomes. Methods 48, 226–232 (2009).
39. O’Geen, H., Frietze, S. & Farnham, P.J. Using ChIP-seq technology to identify targets of zinc finger transcription factors. Methods Mol. Biol. 649, 437–455 (2010).
40. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).
41. Blahnik, K.R. et al. Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data. Nucleic Acids Res. 38, e13 (2010).
42. Waterland, R.A., Lin, J., Smith, C.A. & Jirtle, R.L. Post-weaning diet affects genomic imprinting at the insulin-like growth factor 2 (Igf2) locus. Hum. Mol. Genet. 15, 705–716 (2006).
43. Shen, L., Guo, Y., Chen, X., Ahmed, S. & Issa, J.J. Optimizing annealing temperature overcomes bias in bisulfite PCR methylation analysis. Biotechniques 42, 48, 50, 52 passim (2007).
44. Grunau, C., Clark, S.J. & Rosenthal, A. Bisulfite genomic sequencing: systematic investigation of critical experimental parameters. Nucleic Acids Res. 29, E65 (2001).
doi:10.1038/nbt.1682
ANALYSIS
Quantitative comparison of genome-wide DNA methylation mapping technologies
Christoph Bock1–4,6, Eleni M Tomazou1–3,6, Arie B Brinkman5, Fabian Müller1–4, Femke Simmer5, Hongcang Gu1, Natalie Jäger1–3, Andreas Gnirke1, Hendrik G Stunnenberg5 & Alexander Meissner1–3
1Broad Institute, Cambridge, Massachusetts, USA. 2Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. 3Harvard Stem Cell Institute, Cambridge, Massachusetts, USA. 4Max Planck Institute for Informatics, Saarbrücken, Germany. 5Radboud University Department of Molecular Biology, Nijmegen Center for Molecular Life Sciences, Nijmegen, The Netherlands. 6These authors contributed equally to this work. Correspondence should be addressed to C.B. ([email protected]) or A.M. ([email protected]).
Published online 19 September 2010; doi:10.1038/nbt.1681
DNA methylation plays a key role in regulating eukaryotic gene expression. Although mitotically heritable and stable over time, patterns of DNA methylation frequently change in response to cell differentiation, disease and environmental influences. Several methods have been developed to map DNA methylation on a genomic scale. Here, we benchmark four of these approaches by analyzing two human embryonic stem cell lines derived from genetically unrelated embryos and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. Our analysis reveals that methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq), reduced representation bisulfite sequencing (RRBS) and the Infinium HumanMethylation27 assay all produce accurate DNA methylation data. However, these methods differ in their ability to detect differentially methylated regions between pairs of samples. We highlight strengths and weaknesses of the four methods and give practical recommendations for the design of epigenomic case-control studies.
DNA methylation is a common mechanism of epigenetic regulation in eukaryotes. It occurs most frequently at cytosines that are followed by guanines (CpG). High levels of DNA methylation in promoter regions are typically associated with robust gene silencing1. Twenty-five years of research on cancer epigenetics have firmly established the prevalence of aberrant DNA methylation in cancer cells2–6. Moreover, recent studies have investigated the role of DNA methylation in neural and autoimmune diseases, its correlation with physiological conditions and its response to environmental influences7–9. Comprehensive mapping of DNA methylation in relevant clinical cohorts is likely to identify new disease genes and potential drug targets, help to establish the relevance of epigenetic alterations in disease and provide a rich source of potential biomarkers10. DNA methylation mapping could also facilitate quality control of cultured cells by exploiting the fact that cell states and differentiation potential of stem cells are reflected in their DNA methylation patterns11.
Several methods have been developed to map DNA methylation on a genomic scale. Most of these methods combine DNA analysis by microarrays or high-throughput sequencing with one of four ways of translating DNA methylation patterns into DNA sequence information or library enrichment. (i) MeDIP-seq uses an antibody that is specific for 5-methylcytosine to retrieve methylated fragments from sonicated DNA12,13. (ii) MethylCap-seq employs a methyl-binding domain protein to obtain DNA fractions with similar methylation levels14–16. (iii) Bisulfite-based methods use a chemical reaction that selectively converts unmethylated, but not methylated, cytosines into uracils, thus introducing methylation-specific, single-nucleotide polymorphisms into the DNA sequence11,17,18. (iv) Methylation-sensitive digestion uses prokaryotic restriction enzymes to selectively fractionate only methylated or only unmethylated DNA19–21.
The diversity of methods to map DNA methylation and the absence of an uncontested commercial market leader raise questions about each method’s strengths and weaknesses, questions that researchers have to answer for themselves when selecting the most appropriate technology for any given project. The goal of this study was to comprehensively evaluate four popular methods (MeDIP-seq12, MethylCap-seq14, RRBS22 and the Infinium HumanMethylation27 assay17), with a special emphasis on their practical utility for biomedical research and biomarker development. All four methods are relatively easy to set up because detailed protocols have been published and/or commercial kits are available.
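The principle behind the bisulfite-based methods listed above can be illustrated with a small in silico example. The Python sketch below simulates conversion on the top strand only (unmethylated C is read as T after amplification, methylated C stays C) and then estimates per-cytosine methylation as C / (C + T) across reads. It is a toy model under stated assumptions: reads are full-length, error-free and already aligned, and both function names are hypothetical.

```python
from typing import Dict, List, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Convert unmethylated C to T; Cs at `methylated` indices are protected."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylation_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Per-cytosine methylation level, C / (C + T), from aligned reads.

    Assumes every read spans the whole reference (same coordinates).
    """
    levels: Dict[int, float] = {}
    for i, base in enumerate(reference):
        if base != "C":
            continue
        c = sum(read[i] == "C" for read in reads)  # unconverted: methylated
        t = sum(read[i] == "T" for read in reads)  # converted: unmethylated
        if c + t:
            levels[i] = c / (c + t)
    return levels


if __name__ == "__main__":
    ref = "ACGTCGA"  # cytosines at indices 1 and 4
    reads = [bisulfite_convert(ref, m) for m in ({1}, {1, 4}, {1}, set())]
    print(reads)
    print(methylation_levels(ref, reads))
```

Each read reports the state of one DNA molecule, so the level at a cytosine is simply the fraction of molecules in which it escaped conversion.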
We chose RRBS because it targets bisulfite sequencing to a well-defined set of genomic regions with moderate to high CpG density22, which makes RRBS substantially more cost efficient than genome-wide bisulfite sequencing. The Infinium HumanMethylation27 assay, also a bisulfite-based method, was included because of its wide use and easy integration with existing genotyping pipelines; it is the only microarray-based method in our comparison. Methods that use tiling microarrays were excluded because they have been benchmarked previously20 and because next-generation sequencing enables higher resolution and/or higher genomic coverage at competitive cost. Methylation-specific digestion was excluded because no algorithm exists that could accurately infer quantitative DNA methylation data from digested read frequencies19. An outline of the experimental and analytical procedure of this technology comparison is shown in Figure 1.
Figure 1 Outline of the DNA methylation technology comparison. Four methods for DNA methylation mapping were compared on two pairs of samples. The resulting 16 DNA methylation maps were bioinformatically analyzed and benchmarked against each other. In addition, clonal bisulfite sequencing was performed on selected genomic regions to validate DNA methylation differences that were detected exclusively by one method.

Workflow steps shown in Figure 1:
DNA for two pairs of samples: two human ES cell lines derived from unrelated embryos; a colon tumor and matched normal colon tissue from the same patient.
MeDIP-seq: 1. Sonication of DNA. 2. Library preparation. 3. Denaturation and enrichment with antibody for 5-methylcytosine. 4. Library amplification.
MethylCap: 1. Sonication of DNA. 2. Enrichment with methyl-binding domain protein. 3. Washing and elution. 4. Library preparation and amplification.
RRBS: 1. Digestion with MspI. 2. Library preparation. 3. Gel-based size selection. 4. Bisulfite treatment. 5. Library amplification.
Infinium: 1. DNA preparation. 2. Bisulfite conversion. 3. Hybridization onto Illumina bead arrays (Infinium HumanMethylation27). 4. Data normalization using the Illumina BeadStudio software.
Validation: 1. Primer design. 2. Bisulfite conversion. 3. PCR amplification. 4. Amplicon cloning. 5. Sanger sequencing. 6. Data processing using the BiQ Analyzer software.
High-throughput sequencing: 1. Sequencing on the Illumina Genome Analyzer II (30–40 million reads per sample). 2. Image processing, base calling and genome alignment.
Bioinformatic analysis: 1. Accuracy analysis and quantification of DNA methylation levels. 2. Assessment of genomic coverage and statistical power to detect DNA methylation differences. 3. Identification of differentially methylated regions (DMRs), cross-method comparison and validation. 4. Saturation analysis estimating the effect of sequencing depth. 5. DNA methylation analysis of repetitive DNA.

RESULTS

DNA methylation mapping by four methods

Genome-wide DNA methylation mapping is most commonly used as a discovery tool to identify differentially methylated regions (DMRs) as candidates for further research. Typical examples are cancer-specific DMRs, which are increasingly used as biomarkers for cancer diagnosis and therapy optimization10. To emulate the case-control approach that is widely used for epigenetic biomarker development, we focused on sample pairs that we statistically compare with each other. Specifically, we selected two human embryonic stem (ES) cell lines that were derived from genetically unrelated embryos23, and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. We applied each of the four methods (MeDIP-seq, MethylCap-seq, RRBS, Infinium) to all four samples (HUES6 ES cells, HUES8 ES cells, colon tumor and matched normal colon tissue), generating a total of 16 genome-scale DNA methylation maps. All data were processed with a standardized bioinformatic pipeline, and the technical data quality turned out to be similarly high across all samples and methods (Table 1).

When plotting the DNA methylation data as genome browser tracks, we found excellent visual agreement between all four methods (Fig. 2; tracks are available online for interactive browsing: http://meth-benchmark.computational-epigenetics.org/). MeDIP-seq and MethylCap-seq gave rise to peaks of methylated DNA that were similar in shape, size and location, indicating that MeDIP-seq's monoclonal antibody and MethylCap-seq's methyl-binding domain enrich for similar DNA fragments. However, MeDIP-seq exhibited higher baseline levels and lower peak heights than MethylCap-seq. This smaller dynamic range is already apparent from Figure 2 (note the different scale of the y axis) and becomes more obvious when plotting MeDIP and MethylCap-seq tracks along an entire chromosome (Supplementary Fig. 1). This observation was quantitatively confirmed by plotting the mean read frequency for enriched and depleted fractions of the genome (Supplementary Fig. 2). We also observed high visual agreement between RRBS and Infinium, with the limitation that Infinium covers two orders of magnitude fewer CpGs than RRBS (Table 1). Finally, the bisulfite-based methods (RRBS, Infinium) generally confirm the results of the enrichment-based methods (MeDIP, MethylCap-seq), although there are deviations in repeat-rich as well as in CpG-poor genomic regions (Supplementary Fig. 3).

Accuracy of DNA methylation mapping

For a more quantitative assessment of measurement accuracy, we compared the results of the three sequencing-based methods (MeDIP-seq, MethylCap-seq, RRBS) with the Infinium HumanMethylation27 assay as a common reference (Fig. 3). The Infinium assay was used as reference because its quantitative accuracy has been established in previous studies17,24, which reported correlation coefficients around 0.9 relative to the GoldenGate and MethyLight assays. Note, however, that the probes of the Infinium assay cover only a small percentage of all CpGs in the genome and are preferentially located in unmethylated promoter regions. To compensate for this potential source of bias, we calculated two correlation coefficients, one across the entire spectrum of methylation levels and the other focusing only on those CpGs that exhibit at least 20% methylation according to the Infinium assay.

RRBS and Infinium data can be compared directly and without normalization, because both methods measure absolute DNA methylation levels. For a total of 5,088 single CpGs that were covered by both an Infinium probe and at least five RRBS reads, we observed a Pearson correlation of 0.92 across all DNA methylation levels and a Pearson correlation of 0.83 when we excluded unmethylated CpGs. Because neighboring CpGs tend to exhibit highly correlated DNA methylation levels18,25, we also evaluated the correlation for RRBS measurement averages over a 200-base pair (bp) sequence window around each Infinium probe. Again, we observed excellent agreement between the two methods (Fig. 3c), with an overall Pearson correlation of 0.92 across all DNA methylation levels and a Pearson correlation of 0.84 when we excluded unmethylated CpGs. This second comparison supports the hypothesis that a single-CpG measurement can often act as an indicator of the DNA methylation levels at neighboring, unmeasured CpGs.

Comparison with MeDIP-seq and MethylCap-seq is less straightforward because both methods measure the relative enrichment of methylated DNA rather than absolute DNA methylation levels. When we correlated the number of sequencing reads per 1-kb region with the DNA methylation measurements of the Infinium assay, the Pearson correlation did not exceed 0.6 across all DNA methylation levels and 0.4 when we excluded unmethylated CpGs (Supplementary Fig. 3a,b). High density of repetitive DNA was identified as a major source of spurious read enrichment in regions with low absolute DNA methylation levels. In contrast, low CpG density gave rise to low read numbers in regions with high levels of DNA methylation (Supplementary Fig. 3c,d). The confounding effect of DNA sequence is also visible in Figure 2. Low read counts can indicate either the relative absence of CpGs (e.g., region 1 in Fig. 2) or the absence of DNA methylation in the presence of CpGs (Fig. 2, region 2); and strong peaks can occur in genomic regions that are incompletely methylated if the CpG density is sufficiently high to give rise to substantial read enrichment (Fig. 2, region 3).
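The two correlation coefficients used in the accuracy analysis (one across all methylation levels, one restricted to CpGs with at least 20% methylation in the reference assay) can be sketched as follows. This is an illustrative outline only: the function names, the toy values and the reading of the 200-bp window as plus or minus 100 bp around the probe are assumptions, not details taken from the study's code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def accuracy_correlations(reference, test, min_methylation=0.2):
    """Correlation across all CpGs and across CpGs methylated in the reference."""
    overall = pearson(reference, test)
    kept = [(r, t) for r, t in zip(reference, test) if r >= min_methylation]
    methylated_only = pearson([r for r, _ in kept], [t for _, t in kept])
    return overall, methylated_only


def window_mean(cpg_levels, center, width=200):
    """Mean methylation of CpGs within a window centered on a probe position.

    cpg_levels maps genomic position to methylation level (0..1).
    Returns None if the window contains no measured CpG.
    """
    half = width // 2
    values = [v for pos, v in cpg_levels.items() if abs(pos - center) <= half]
    return sum(values) / len(values) if values else None


# Toy values (hypothetical): reference assay versus a sequencing-based method.
infinium = [0.02, 0.05, 0.10, 0.30, 0.55, 0.80, 0.95]
rrbs = [0.00, 0.08, 0.05, 0.40, 0.45, 0.85, 0.90]
overall, methylated_only = accuracy_correlations(infinium, rrbs)
print(round(overall, 2), round(methylated_only, 2))
```

Restricting the second coefficient to methylated CpGs removes the large cluster of points near zero, which otherwise dominates the overall correlation when most reference probes sit in unmethylated promoters.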
Table 1 Summary of DNA methylation mapping experiments

MeDIP-seq and MethylCap-seq (runs 1–8)

| Run no. | Method | Sample name | Number of lanes (a) | Number of reads (total) | Number of reads (aligned) | Alignment rate | Number of reads (unique) | Number of reads (duplicates) | Unique read rate (b) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | MeDIP-seq | HUES6 ES cell line | 2 | 37,086,239 | 22,798,831 | 61.5% | 12,849,623 | 9,949,208 | 56.4% |
| 2 | MeDIP-seq | HUES8 ES cell line | 2 | 36,078,308 | 24,266,670 | 67.3% | 12,287,174 | 11,979,496 | 50.6% |
| 3 | MeDIP-seq | Primary colon tumor | 2 | 33,453,797 | 18,582,183 | 55.5% | 7,006,484 | 11,575,699 | 37.7% |
| 4 | MeDIP-seq | Matched normal colon tissue | 2 | 37,789,936 | 21,793,567 | 57.7% | 10,360,103 | 11,433,464 | 47.5% |
| 5 | MethylCap-seq | HUES6 ES cell line | 3 | 38,436,495 | 23,401,511 | 60.9% | 21,712,433 | 1,689,078 | 92.8% |
| 6 | MethylCap-seq | HUES8 ES cell line | 3 | 38,735,596 | 21,670,301 | 55.9% | 19,585,988 | 2,084,313 | 90.4% |
| 7 | MethylCap-seq | Primary colon tumor | 3 | 37,718,830 | 23,206,054 | 61.5% | 21,600,129 | 1,605,925 | 93.1% |
| 8 | MethylCap-seq | Matched normal colon tissue | 3 | 38,330,519 | 22,724,002 | 59.3% | 21,290,282 | 1,433,720 | 93.7% |

RRBS (runs 9–12)

| Run no. | Method | Sample name | Number of lanes (a) | Number of reads (total) | Number of reads (aligned) | Alignment rate | Number of CpGs (total) | Number of CpGs (unique) | Mean CpG coverage |
|---|---|---|---|---|---|---|---|---|---|
| 9 | RRBS | HUES6 ES cell line | 2 | 30,004,147 | 12,150,905 | 40.5% | 22,181,147 | 2,181,128 | 10.2x |
| 10 | RRBS | HUES8 ES cell line | 2 | 28,395,040 | 12,670,034 | 44.6% | 29,704,332 | 2,185,751 | 13.6x |
| 11 | RRBS | Primary colon tumor | 4 (c) | 40,015,958 | 9,545,423 | 23.9% | 16,891,325 | 1,297,296 | 13.0x |
| 12 | RRBS | Matched normal colon tissue | 4 (c) | 32,072,287 | 6,214,732 | 19.4% | 10,190,227 | 1,134,963 | 9.0x |

Infinium (runs 13–16)

| Run no. | Method | Sample name | Number of arrays | Number of CpGs (total) | Number of CpGs (valid) | Number of CpGs (unique) | Valid probe rate |
|---|---|---|---|---|---|---|---|
| 13 | Infinium | HUES6 ES cell line | 1 | 27,578 | 27,192 | 27,192 | 98.6% |
| 14 | Infinium | HUES8 ES cell line | 1 | 27,578 | 27,090 | 27,090 | 98.2% |
| 15 | Infinium | Primary colon tumor | 1 | 27,578 | 27,561 | 27,561 | 99.9% |
| 16 | Infinium | Matched normal colon tissue | 1 | 27,578 | 27,478 | 27,478 | 99.6% |

(a) All sequencing was performed in 2009 using the Illumina Genome Analyzer II (36-bp, single-end reads). As of June 2010, we routinely observe total read numbers per lane averaging ~40 million for MeDIP-seq and MethylCap-seq and close to 30 million for RRBS. Current alignment rates range from 60% to 80% for all three methods.
(b) The unique read rate was calculated by dividing the number of reads that map to a unique position in the genome (defined by chromosome, read start position and strand) by the total number of aligned reads.
(c) Samples 11 and 12 were part of a sequencing-optimization run that resulted in lower sequencing yield and reduced alignment rates. Four lanes were sequenced to reach the target of 30–40 million reads per sample and method.
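As a worked check of the derived columns in Table 1, the rates can be recomputed from the raw counts. The sketch below uses the values of runs 1, 9 and 13. The unique read rate follows the definition in footnote b; the formulas for the alignment rate, the mean CpG coverage (total CpG measurements divided by unique CpGs) and the valid probe rate are assumptions that happen to reproduce the tabulated numbers.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Run 1 (MeDIP-seq, HUES6), counts copied from Table 1.
run1 = {"total": 37_086_239, "aligned": 22_798_831,
        "unique": 12_849_623, "duplicates": 9_949_208}
alignment_rate = percent(run1["aligned"], run1["total"])     # 61.5
unique_read_rate = percent(run1["unique"], run1["aligned"])  # 56.4

# Run 9 (RRBS, HUES6): assumed definition of mean CpG coverage.
mean_cpg_coverage = round(22_181_147 / 2_181_128, 1)         # 10.2

# Run 13 (Infinium, HUES6): assumed definition of valid probe rate.
valid_probe_rate = percent(27_192, 27_578)                   # 98.6

print(alignment_rate, unique_read_rate, mean_cpg_coverage, valid_probe_rate)
```

The unique and duplicate read counts of run 1 sum to the number of aligned reads, which is consistent with duplicates being defined relative to the aligned set.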
It has previously been reported that statistical correction for CpG density can improve the quantification of DNA methylation levels based on MeDIP-seq data12,26. We therefore constructed a linear regression model that corrects for the confounding effect of DNA sequence, and we observe substantially improved results (Fig. 3a,b). Across all DNA methylation levels the correlation between the statistically corrected read counts and the DNA methylation measurements of the Infinium assay amounted to 0.84 for MeDIP-seq and to 0.88 for MethylCap-seq. However, the correlations dropped to 0.57 (MeDIP-seq) and 0.66 (MethylCap-seq) when we excluded unmethylated CpGs. These results indicate that MeDIP-seq and MethylCap-seq can distinguish between methylated and unmethylated regions almost as precisely as RRBS, but are less accurate for quantifying the DNA methylation levels in partially methylated genomic regions.

Genomic coverage of DNA methylation mapping

The single-bp resolution of the two bisulfite-based methods comes at the cost of reduced genomic coverage compared to the two enrichment-based methods. RRBS reads cover less than 10% of the 28 million CpGs in the human genome and Infinium is by design restricted to 27,578 promoter-associated CpGs (Table 1). In contrast, MeDIP-seq and MethylCap-seq are theoretically able to identify methylated genomic regions located anywhere in the genome, although they too are subject to intrinsic limitations27. To assess the empirical genomic coverage of each method, we calculated the number of reads (MeDIP-seq, MethylCap-seq) or CpG methylation measurements (RRBS, Infinium) for each of the following genomic regions: (i) CpG islands, (ii) gene promoters and (iii) a 1-kb tiling of the genome. The results are shown in Figure 4, and coverage details for a total of 13 types of genomic regions are available online (http://meth-benchmark.computational-epigenetics.org/).
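The coverage tally described above (number of measurements per region, grouped into coverage categories) can be sketched for a single chromosome as follows. This is a simplified illustration: the function names are hypothetical, regions are half-open intervals, and the bin boundaries are an assumption loosely modeled on the categories of the Figure 4 legend.

```python
from bisect import bisect_left
from collections import Counter

# (upper bound, label) pairs; counts above the last bound fall into ">=50".
COVERAGE_BINS = [(0, "no coverage"), (1, "1"), (4, "2-4"), (9, "5-9"),
                 (24, "10-24"), (49, "25-49")]


def tile_genome(length, size=1000):
    """Non-overlapping tiles: each tile starts where the previous one ends."""
    return [(start, min(start + size, length)) for start in range(0, length, size)]


def count_in_regions(regions, positions):
    """Number of measurement positions inside each half-open region."""
    ordered = sorted(positions)
    return [bisect_left(ordered, end) - bisect_left(ordered, start)
            for start, end in regions]


def coverage_bin(count):
    """Map a measurement count to its coverage category."""
    for upper, label in COVERAGE_BINS:
        if count <= upper:
            return label
    return ">=50"


def coverage_histogram(regions, positions):
    """Number of regions per coverage category."""
    return Counter(coverage_bin(n) for n in count_in_regions(regions, positions))


tiles = tile_genome(3000)                 # three 1-kb tiles
read_starts = [10, 20, 500, 1500, 1500]   # toy read start positions
print(coverage_histogram(tiles, read_starts))
```

The same counting function applies to CpG islands or promoters by passing their intervals instead of tiles, and to RRBS or Infinium by passing CpG measurement positions instead of read starts.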
As expected, MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions. However, the practically relevant differences in genomic coverage are lower than Figure 4 may suggest. This is because a minimum number of reads are required in at least one sample to reliably detect differential methylation among a given pair of samples. We illustrate this point by two statistical power calculations, which were performed with G*Power 3 (ref. 28). Assume that a genomic region is covered by five MeDIP-seq or MethylCap-seq reads in one sample. Then it has to contain at least 20 reads in the second sample to be detected as hypermethylated (assuming a statistical power of 80% and a P-value of 5% without multiple-testing correction). Similarly, RRBS would detect a DNA methylation increase from 30% to 70% only when at least 25 measurements are available in each sample (again assuming a statistical power of 80% and a P-value of 5% without multiple-testing correction).

Identification of differentially methylated regions

Genome-wide DNA methylation mapping is most commonly used for detecting DNA methylation differences, for example, between diseased and healthy tissue or between genetically modified and unmodified control cells. To assess how well MeDIP-seq, MethylCap-seq and RRBS perform on this task, we developed a bioinformatic method that identifies statistically significant DMRs from multiple types of sequencing data (the Infinium assay requires a different approach and is discussed in a separate section below). For a predefined set of genomic regions we count the numbers of sequenced reads (for MeDIP-seq and MethylCap-seq) or, alternatively, the numbers of methylated versus unmethylated CpGs (for RRBS), and we test for statistically significant differences between two samples using Fisher's exact test. When applied to a complete tiling of the human genome, this method performs genome-wide DMR detection. Alternatively, it can be targeted to specific region types such as CpG islands, gene promoters or putative enhancers, which often leads to more sensitive detection of small differences because the multiple-testing burden is reduced compared to genome-wide DMR detection.
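The region-wise test described above can be sketched for the RRBS case, where each region contributes a 2x2 table of methylated and unmethylated CpG counts in two samples. This is a minimal illustration, not the authors' implementation (which is described in their Online Methods): the Benjamini-Hochberg procedure is assumed as the multiple-testing correction, the thresholds (difference above 20 percentage points, false-discovery rate below 0.1) are taken from the Figure 5 legend, and all function names and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_rrbs_dmrs(regions, min_difference=0.2, max_fdr=0.1):
    """regions maps name -> (meth_a, unmeth_a, meth_b, unmeth_b); returns DMR names."""
    names = list(regions)
    qvalues = benjamini_hochberg([fisher_exact_two_sided(*regions[n]) for n in names])
    dmrs = []
    for name, q in zip(names, qvalues):
        ma, ua, mb, ub = regions[name]
        difference = abs(ma / (ma + ua) - mb / (mb + ub))
        if difference > min_difference and q < max_fdr:
            dmrs.append(name)
    return dmrs


# Toy counts: region_3 has a 40-point difference but too few measurements.
regions = {"region_1": (45, 5, 10, 40), "region_2": (12, 8, 10, 10),
           "region_3": (3, 2, 1, 4)}
print(call_rrbs_dmrs(regions))  # ['region_1']
```

For MeDIP-seq and MethylCap-seq, the same test could be applied to read counts, for example with reads inside versus outside the region as the two columns; that table layout is an assumption, since the text only states that read numbers are compared.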
We pursued both the unbiased and the annotation-guided approach in parallel, focusing our comparison on three types of genomic regions: (i) CpG islands, (ii) gene promoters and (iii) a 1-kb tiling of the genome (Fig. 5 and Supplementary Figs. 4–8). Overall, we observed high correlation for each of the two sample pairs, but also outliers suggesting the presence of DMRs. Based on the RRBS data, we obtained Pearson correlations around 0.9 for all three region types, both between the two ES cell lines (HUES6 and HUES8) and between the colon tumor and matched normal colon tissue. For MethylCap-seq and MeDIP-seq, the correlations were somewhat lower and ranged from 0.75 to 0.92 (Fig. 5 and Supplementary Figs. 4–8). Using the DMR detection algorithm (Online Methods), we identified several hundred to several thousand DMRs in both sample pairs. There was substantial, but by no means perfect, overlap between the DMRs identified by all three methods. For the two human ES cell lines, 277 out of 44,440 CpG islands were detected as differentially methylated by each of the three methods (Fig. 5d). Pairwise comparisons for each sample and region type (Supplementary Figs. 4–8) confirmed that the agreement between the three methods was statistically significant in all cases (P < 0.01, Fisher's exact test). In total, we observed that up to 1,000 CpG islands, 405 promoter regions or 1,924 of the 1-kilobase tiling regions (that is, <0.1% of the genome) were detected as differentially methylated by at least two methods. Note, however, that it is not possible to combine these values into a single sum of DMRs because many CpG islands overlap with promoter regions and every CpG island and promoter region overlaps with at least one tiling region. Nor does the number of differentially methylated tiling regions provide an accurate estimate of the 'true' number of DMRs because a sizable number of DMRs are no longer statistically significant when split into 1-kb regions. Despite these conceptual difficulties, which preclude us from giving a single 'true' number of DMRs for each sample pair, our data clearly indicate that, on average, MethylCap-seq identifies more DMRs than RRBS, and MeDIP-seq identifies the fewest DMRs. This order was observed not only when we focused on the total number of DMRs per method, but also when we considered only those DMRs that were detected by at least two methods, indicating that the comparison is not distorted by high numbers of method-specific artifacts.

Figure 2 Comparison of DNA methylation maps obtained with four different methods. The screenshot shows genome browser tracks for MeDIP-seq (first two tracks, in green), MethylCap-seq (three tracks in blue, gray and red), RRBS (stacked light blue tracks) and Infinium (single black track with percentage values) across the HOXA cluster in a human ES cell line (HUES6). Each track represents data from a single sequencing lane (MeDIP-seq, MethylCap-seq, RRBS) or microarray hybridization (Infinium). MeDIP-seq and MethylCap-seq data are visually similar to ChIP-seq data, with peaks in regions that show high density of the target molecule (5-methylcytosine) and troughs in regions with low density of methylated cytosines. The heights of the peaks represent the number of reads in each genomic interval, for each track normalized to the same genome-wide read count. RRBS gives rise to clusters of CpGs with absolute DNA methylation measurements, separated by regions that are not covered due to the reduced-representation property of the RRBS protocol. Each data point corresponds to the methylation level at a single CpG, and dark blue points indicate higher methylation levels than light blue points. Infinium data is represented in a similar way to the RRBS data, and the methylation levels at single CpGs are shown as percentage values. For reference, the CpG density is indicated by stacked points (black) at the bottom of the diagram, and CpG islands (red) as well as known genes (blue) are listed as described previously55,56.
[Figure 2 tracks, chr7 (HOXA cluster), 50-kb scale bar, with regions 1, 2 and 3 marked: MeDIP (lane 1); MeDIP (lane 2); MethylCap (high methylation fraction); MethylCap (medium methylation fraction); MethylCap (low methylation fraction); RRBS (lane 1); RRBS (lane 2); Infinium HumanMethylation27; CpG islands; CpG dinucleotides; RefSeq genes.]

Validation of method-specific DMRs

To pinpoint potential problems of MeDIP-seq, MethylCap-seq or RRBS, we manually inspected a large number of regions that were identified as significant DMRs by only one method. The most common reasons why DMRs identified by one method were missed by the other methods were insufficient genomic coverage (RRBS, Infinium) and low read numbers conferring insufficient statistical power to detect differential DNA methylation (MeDIP-seq, MethylCap-seq). No cases were identified in which the RRBS and Infinium data were in direct contradiction with each other. However, we could identify a few cases in which MeDIP-seq or MethylCap-seq were inconsistent with RRBS and/or Infinium data. These were almost exclusively located in repetitive regions, indicating that high copy-number repeats can amplify minor differences in the efficiency of methylated DNA enrichment and give rise to a small number of spurious DMRs. In contrast, RRBS seems more robust toward such fluctuations because it measures DNA methylation based on the DNA sequence of the reads in a given region, rather than on their read frequency. We also assessed whether copy-number variation was a major confounding factor for DMR discovery. This does not seem to be the case for our data. The vast majority of DMRs were shorter than 10 kb (Supplementary Fig. 9), whereas it is not uncommon for cancer-specific as well as germline-transmitted copy-number variations to extend for much longer distances29,30.
Figure 3 Quantification of DNA methylation with MeDIP-seq, MethylCap-seq and RRBS. (a–c) Absolute DNA methylation levels were calculated from the data obtained by MeDIP-seq (a), MethylCap-seq (b) and RRBS (c), respectively, and compared to DNA methylation levels determined by the Infinium assay. For MeDIP-seq and MethylCap-seq, sequencing reads were counted in 1-kb regions surrounding each CpG that is interrogated by the Infinium assay, and a regression model was used to infer absolute DNA methylation levels. Scatter plots and correlation coefficients were calculated on a test set that was not used for model fitting or feature selection. For RRBS, the DNA methylation level was determined as the percentage of methylated CpGs within 200 bp surrounding each CpG that is interrogated by the Infinium assay. Data shown are for the HUES6 human ES cell line, and regions that did not have sufficient sequencing coverage were excluded.
[Figure 3 panels, x axis: DNA methylation level (Infinium); y axis: DNA methylation level by (a) MeDIP-seq, Pearson's r = 0.84; (b) MethylCap, Pearson's r = 0.88; (c) RRBS, Pearson's r = 0.92. Both axes range from 0 to 1.0.]
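The idea of inferring absolute methylation levels from enrichment read counts with a regression model, evaluated on held-out regions, can be sketched as below. The actual model and features of the study are not given in this section, so everything here is an assumption for illustration: the two features (log-transformed read count and CpG density), the ordinary least-squares fit, the clamping to the range 0 to 1, and all toy values.

```python
from math import log2


def features(reads, cpg_density):
    """Hypothetical feature vector for one 1-kb region."""
    return [log2(reads + 1), cpg_density]


def fit_ols(rows, targets):
    """Ordinary least squares with intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes a non-singular system).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, row):
    """Predicted methylation level, clamped to the valid range [0, 1]."""
    value = coef[0] + sum(c * x for c, x in zip(coef[1:], row))
    return min(1.0, max(0.0, value))


# Toy regions: (read count, CpG density, reference methylation level).
train = [(0, 2.0, 0.02), (3, 1.0, 0.60), (12, 3.0, 0.85),
         (1, 6.0, 0.03), (30, 8.0, 0.90), (8, 7.0, 0.30)]
test = [(5, 2.0, 0.70), (2, 7.0, 0.05)]  # held out, not used for fitting

coef = fit_ols([features(r, d) for r, d, _ in train], [m for _, _, m in train])
for reads, density, reference in test:
    print(reference, round(predict(coef, features(reads, density)), 2))
```

Keeping the test regions out of the fit mirrors the caption's statement that correlation coefficients were calculated on data not used for model fitting, which avoids overstating the accuracy of the corrected read counts.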
As an additional validation, we selected eight method-specific DMRs based on the ES cell comparison, and we investigated DNA methylation patterns in the two ES cell lines by clonal bisulfite sequencing (Table 2). These genomic regions were handpicked such that one method clearly identified them as DMRs whereas the two other methods did not show a trend in either direction. Note that this preselection makes the validation substantially harder than confirming randomly selected DMRs, because the magnitude of the DNA methylation difference tends to be lower for method-specific DMRs than for DMRs that are detected by multiple methods. As an additional complication, some of the selected DMRs are highly repetitive or overlap with known copy-number variations. Sequencing an average of 11 clones per sample and region we were able to confirm three out of three MethylCap-seq–specific DMRs and two out of two RRBS-specific DMRs. In contrast, two MeDIP-seq–specific DMRs could not be confirmed, and for the third region the agreement was marginal (Table 2 and Supplementary Data 1).

To assess the practical relevance of the method-specific differences, we asked whether biologically interesting hits were missed by any of the three methods. For this analysis we focused on the colon samples because of the large number of genes with a known or suspected role in colon cancer. Our results show that several interesting DMRs are detected by all methods, including tumor-specific hypermethylation in the promoters of GATA2 (ref. 31) and GATA5 (ref. 32). However, a considerable number of interesting DMRs were missed by MeDIP-seq, whereas MethylCap-seq and RRBS both detected those regions; these include tumor-specific hypermethylation in the promoter regions of SOX17 (ref. 33), POU2AF1 (ref. 34) and SEPT9 (ref. 35). Somewhat more rarely, we also observed interesting DMRs being missed by MethylCap-seq or RRBS. For example, MethylCap-seq overlooked tumor-specific hypermethylation at the promoter of SFRP1 (ref. 36), and RRBS missed tumor-specific hypermethylation at the promoter of DKK2 (ref. 37).

The effect of sequencing depth on mapping performance

MeDIP-seq, MethylCap-seq and RRBS use DNA sequencing as a way of counting DNA fragments to determine the percentage of methylation-enriched reads that align to specific genomic regions (MeDIP-seq, MethylCap-seq) or to calculate the ratio of methylated and unmethylated cytosines at single CpGs (RRBS). Conceptually, sequencing can be thought of as random sampling from a large pool of DNA fragments. It is therefore expected that the performance of these methods increases when sequencing more DNA fragments, until it levels off as the sequencing depth approaches saturation. To quantify this effect, we repeated the accuracy analysis (Fig. 3) and the DMR detection (Fig. 5) on randomly sampled subsets of sequencing reads.

First, we benchmarked each method against the Infinium data, assessing their ability to quantify DNA methylation levels based on reduced read numbers (Supplementary Fig. 10). The results show that all three methods give rise to accurate DNA methylation measurements based on as little as 20% of the total read coverage, and almost no improvement was observed between 50% and 100% sequencing depth. Although these data suggest that relatively low sequencing depths are often sufficient for obtaining accurate DNA methylation levels, this cannot be generalized to the entire genome. Infinium probes tend to be located in CpG-rich genomic regions, which are also preferentially covered by MeDIP-seq, MethylCap-seq and RRBS measurements (Fig. 4), such that saturation is reached earlier in the vicinity of Infinium probes than in CpG-poor genomic regions.

Second, we tested how many DMRs were still detected among the two sample pairs when the number of sequencing reads in each of the samples was reduced (Supplementary Fig. 11). For MeDIP-seq, the number of detected DMRs dropped to less than half when the sequencing depth was reduced to 50%, and there was little indication that the number of MeDIP-seq DMRs approaches saturation even at the highest sequencing depth. For MethylCap-seq the decrease in the number of detected DMRs is less dramatic and there is a trend toward saturation. RRBS quickly approaches saturation especially for the ES-cell comparison (Supplementary Fig. 11). Overall, the saturation analysis reinforced a conceptual difference between RRBS on the one hand and MeDIP-seq and MethylCap-seq on the other hand. In RRBS, all sequencing is focused on a well-defined, CpG-rich 'reduced representation' of the genome, which leads to relatively early saturation but limited coverage of DMRs in CpG-poor genomic regions. In contrast, MeDIP-seq and MethylCap-seq reads are widely distributed over the genome (albeit with a significant tendency toward high coverage in CpG-rich regions), and deep sequencing increasingly uncovers weak DMRs located in CpG-poor genomic regions.

Figure 4 Genomic coverage of MeDIP-seq, MethylCap-seq, RRBS and Infinium. Genomic coverage was quantified by the number of DNA methylation measurements that overlap with CpG islands (top row), gene promoters (center row) and a 1-kb tiling of the genome (bottom row). For MeDIP-seq and MethylCap-seq, the number of measurements is equal to the number of unique sequencing reads that fall inside each region. For RRBS, it refers to the number of valid DNA methylation measurements at CpGs within each region (one RRBS sequencing read typically yields one measurement, but can also give rise to more than one measurement if it contains several CpGs). For Infinium, the number of measurements is equal to the number of CpGs within each region that are present on the HumanMethylation27 microarray. CpG islands were calculated using CgiHunter (http://cgihunter.bioinf.mpi-inf.mpg.de/), requiring a minimum CpG observed versus expected ratio of 0.6, a minimum GC content of 0.5 and a minimum length of 700 bp55. Promoter regions were calculated based on Ensembl gene annotations, such that the region starts 1 kb upstream of the annotated transcription start site (TSS) and extends to 1 kb downstream of the TSS. The genomic tiling was obtained by sliding a 1-kb window through the genome such that each tile starts at the position where the previous tile ends. No repeat-masking was performed for any of the three types of genomic regions. Data are shown for the HUES6 human ES cell line.
[Figure 4 panels, columns: MeDIP-seq, MethylCap-seq, RRBS, Infinium; rows: CpG islands (length ≥ 700 bp), 44,440 regions genome-wide; promoter regions (2 kb centered on TSS), 23,690 regions genome-wide; whole genome (1 kb sliding window), 2,858,143 regions genome-wide. Each panel is divided into coverage categories ranging from 'No coverage' to ≥100 measurements per region.]

Figure 5 Detection of DMRs with MeDIP-seq, MethylCap-seq and RRBS. Average DNA methylation measurements were calculated for each CpG island and compared between two human ES cell lines (HUES6 and HUES8). (a–c) Total read frequencies are shown for MeDIP-seq (a) and MethylCap-seq (b), and mean DNA methylation levels are shown for RRBS (c). Regions with insufficient sequencing coverage were excluded. (d) The Venn diagram displays the total number and mutual overlap of differentially methylated CpG islands that could be identified by each method. CpG islands were classified as hypermethylated or hypomethylated (depending on the directionality of the difference) if the absolute DNA methylation difference exceeded 20 percentage points (for RRBS) or if there was at least a twofold difference in read number between the two samples (for MeDIP-seq and MethylCap-seq), but only if Fisher's exact test with multiple-testing correction gave rise to an estimated false-discovery rate of differential DNA methylation that was <0.1.
[Figure 5 panels: (a) MeDIP-seq read frequency for HUES6 versus HUES8, Pearson's r = 0.86; (b) MethylCap-seq read frequency for HUES6 versus HUES8, Pearson's r = 0.86; (c) RRBS measurement for HUES6 versus HUES8, Pearson's r = 0.95; (d) Venn diagram of CpG islands with higher or lower methylation in HUES6 for MeDIP-seq, MethylCap-seq and RRBS. Number of CpG islands genome-wide: 44,440.]

DNA methylation mapping of repetitive DNA

DNA methylation differences in repetitive regions have frequently been ignored by genome-wide studies, due to technical difficulties such as ambiguous read alignment (for sequencing) and cross-hybridization (for microarrays). This is unfortunate given that loss of DNA methylation in repetitive DNA was the first epigenetic alteration shown to play a role in cancer4 and has been an area of active research ever since38. In the current study, we explored two complementary approaches to test for repeat-associated DNA methylation differences. First, we included repetitive regions alongside nonrepetitive regions in the DMR detection described above (Fig. 5 and Supplementary Figs. 4–8), rather than discarding all sequencing reads that map to repetitive portions of the genome. It was thus possible to identify repeat-associated DMRs in a similar way as nonrepetitive DMRs, and we could validate several such cases by clonal bisulfite sequencing (Table 2). However, the focus on specific genomic regions makes it difficult to detect global trends that affect certain repeat classes independent of their exact location in the genome. We therefore developed a second approach, which was motivated by the common origin of many repetitive regions from a small number of retrotransposons.
The basic concept was to align sequencing reads to prototypic sequences (e.g., of Alu and L1 elements) to obtain DNA methylation measures per repeat class rather than per repeat instance. To that end, we obtained a manually curated list of 1,267 prototypic repeat sequences that spans the spectrum of repetitive DNA present in the human genome39, and we aligned the sequencing reads of all three methods to this collection of repeat sequences. Approximately 20% of all MeDIP-seq, MethylCap-seq and RRBS reads could be aligned with high confidence, enabling us to estimate the global DNA methylation levels for 553 prototypic repeat sequences. The results of the three methods were in excellent agreement with each other (Supplementary Data 2) and detected substantial differences in the DNA methylation levels of different repeat classes. Among Alu, SVA (SINE-VNTR-Alus) and satellite repeat sequences we observed consistently high levels of DNA methylation, whereas most LINE (long interspersed nuclear elements), LTR (long terminal repeat) and DNA repeat sequences exhibited low levels of DNA methylation in the four samples that we investigated. However, we found that the repeat sequences with the highest copy-number throughout the genome were highly methylated for all repeat classes. When we compared the DNA methylation levels in the two sample pairs (Supplementary Data 3), we observed widespread but relatively moderate hypomethylation in the colon tumor relative to matched normal colon tissue. The most common targets were Alu, SVA and satellite repeat sequences, consistent with previous reports about cancer-specific hypomethylation38. 
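The idea of measuring DNA methylation per prototypic repeat sequence rather than per repeat instance can be illustrated with a minimal Python sketch. It assumes bisulfite-style input (per-read counts of methylated and total CpG calls, as RRBS would provide); the function name, the input layout and the minimum-coverage filter are hypothetical, the toy numbers are invented, and the sketch does not cover how the enrichment-based read counts were converted into methylation estimates.

```python
from collections import defaultdict


def methylation_per_repeat(read_calls, min_cpg_calls=10):
    """Pool CpG calls from all reads aligned to the same prototypic repeat.

    read_calls: iterable of (repeat_name, methylated_cpgs, total_cpgs),
                one tuple per aligned read.
    Returns {repeat_name: percent methylation} for repeats with enough calls.
    """
    totals = defaultdict(lambda: [0, 0])
    for repeat_name, methylated, total in read_calls:
        totals[repeat_name][0] += methylated
        totals[repeat_name][1] += total
    return {
        name: 100.0 * methylated / total
        for name, (methylated, total) in totals.items()
        if total >= min_cpg_calls  # hypothetical coverage cutoff
    }


# Toy usage with invented read-level calls.
calls = [("AluY", 3, 4), ("AluY", 5, 6), ("L1HS_5end", 1, 10), ("MER1", 1, 2)]
levels = methylation_per_repeat(calls)
print(levels)
```

Pooling across all genomic copies trades locus-specific resolution for a robust estimate per repeat sequence, which is the trade-off the text describes.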
A notable difference was identified between the two ES cell lines on the one hand and the two colon samples on the other hand: the only human-specific LINE repeat sequence in our collection (L1HS_5end) exhibited high levels of DNA methylation in the two colon samples, but was largely unmethylated and even marked by histone H3K4 trimethylation in the two ES cell
Table 2 Validation of method-specific DMRs for MeDIP-seq, MethylCap-seq and RRBS

| DMR location | Description | Experimental validation | MeDIP-seq | MethylCap-seq | RRBS |
| --- | --- | --- | --- | --- | --- |
| MeDIP-seq–specific DMR chr10:88,149,016–88,149,732 | Intergenic CpG island ~30 kb upstream of GRID1, partial overlap with degenerate L1 element | HUES6: 38/56 (68%) methylated CpGs; HUES8: 26/44 (59%) methylated CpGs → insignificant (P = 0.41) | Hypermethylated (Q = 1.1E-04) | Insignificant (Q = 0.59) | Insignificant (Q = 0.43) |
| MeDIP-seq–specific DMR chr16:31,142,904–31,143,799 | CpG island overlapping with the terminal exon of TRIM72 | HUES6: 342/362 (94%) methylated CpGs; HUES8: 466/523 (89%) methylated CpGs → marginally hypermeth. (P = 0.0051) | Hypermethylated (Q = 1.2E-05) | Insignificant (Q = 0.73) | Insufficient coverage |
| MeDIP-seq–specific DMR chr1:211,290,079–211,290,896 | CpG island overlapping with the putative promoter region of RPS6KC1 | HUES6: 53/60 (88%) methylated CpGs; HUES8: 45/50 (90%) methylated CpGs → insignificant (P = 1.0) | Hypermethylated (Q = 3.0E-06) | Insignificant (Q = 0.97) | Insignificant (Q = 0.29) |
| MethylCap-seq–specific DMR chr20:29,526,646–29,527,380 | CpG island overlapping with the putative promoter region of REM1 | HUES6: 5/72 (7%) methylated CpGs; HUES8: 78/84 (93%) methylated CpGs → hypomethylated (P = 1.4E-30) | Insufficient coverage | Hypomethylated (Q = 1.8E-09) | Insufficient coverage |
| MethylCap-seq–specific DMR chr2:151,825,938–151,826,902 | CpG island overlapping with the putative promoter region of RBM43 and a known copy-number variation | HUES6: 161/208 (77%) methylated CpGs; HUES8: 9/104 (9%) methylated CpGs → hypermethylated (P = 3.3E-33) | Insignificant (Q = 0.18) | Hypermethylated (Q = 7.3E-09) | Insufficient coverage |
| MethylCap-seq–specific DMR chr13:44,348,934–44,349,700 | Intergenic CpG island ~60 kb upstream of NUFIP1, partial overlap with degenerate Alu element | HUES6: 80/88 (91%) methylated CpGs; HUES8: 41/79 (52%) methylated CpGs → hypermethylated (P = 1.2E-08) | Insignificant (Q = 0.40) | Hypermethylated (Q = 8.3E-07) | Insufficient coverage |
| RRBS-specific DMR chr3:186,889,821–186,890,200 | CpG island overlapping with an internal exon and intron of IGF2BP2 | HUES6: 5/90 (6%) methylated CpGs; HUES8: 88/90 (98%) methylated CpGs → hypomethylated (P = 4.3E-42) | Insufficient coverage | Insignificant (Q = 0.18) | Hypomethylated (Q = 3.5E-40) |
| RRBS-specific DMR chr3:32,609,320–32,609,612 | Intergenic CpG island ~20 kb upstream of DYNC1LI1 | HUES6: 41/121 (34%) methylated CpGs; HUES8: 130/143 (91%) methylated CpGs → hypomethylated (P = 3.5E-23) | Insufficient coverage | Insignificant (Q = 0.52) | Hypomethylated (Q = 2.9E-26) |
Experimental validation of method-specific DMRs between two ES cell lines (HUES6 and HUES8). The table summarizes the results of clonal bisulfite sequencing for eight regions that showed clear-cut DNA methylation differences according to one method but not according to the other two. The P values in column 3 were calculated from the clonal bisulfite sequencing data using Fisher’s exact test, based on the DNA methylation levels of individual CpGs. The Q values in columns 4–6 were derived from the DNA methylation maps as described in the Online Methods. One out of three MeDIP-seq–specific DMRs, three out of three MethylCap-seq–specific DMRs and two out of two RRBS-specific DMRs could be confirmed by clonal bisulfite sequencing data (bold print). All genomic coordinates are relative to the NCBI36 (hg18) genome assembly and refer to the amplicon on which the validation was performed. A detailed documentation of the validation experiments is available in Supplementary Data 1.
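The P values in column 3 of Table 2 come from Fisher's exact test on counts of methylated and unmethylated CpGs. A minimal standard-library sketch of such a test is given below. It assumes a two-sided test on the pooled CpG counts shown in the table (the legend does not state the sidedness), and the function name is hypothetical; it is not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Here: a, b = methylated, unmethylated CpGs in sample 1;
          c, d = methylated, unmethylated CpGs in sample 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated CpGs in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Counts taken from Table 2: REM1 promoter region (HUES6 5/72, HUES8 78/84)
# and GRID1 upstream region (HUES6 38/56, HUES8 26/44).
p_rem1 = fisher_exact_two_sided(5, 72 - 5, 78, 84 - 78)
p_grid1 = fisher_exact_two_sided(38, 56 - 38, 26, 44 - 26)
print(p_rem1, p_grid1)
```

Exact integer binomial coefficients avoid the floating-point underflow that can affect very small P values such as those reported for the confirmed DMRs.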
lines (Supplementary Data 2). These data suggest that young retrotransposons find ways to evade silencing by DNA methylation in pluripotent cells, which may contribute to their ability to maintain activity in spite of an elaborate epigenetic genome defense40.

DMR discovery using the Infinium assay

Our study used the Infinium HumanMethylation27 assay as a common reference for evaluating the accuracy of the sequencing-based methods, which was justified by prior studies showing high quantitative accuracy of the Infinium assay17,24. However, no prior study investigated the Infinium HumanMethylation27 assay’s power to detect DMRs on a genome-wide scale, hence we could not use the Infinium assay as reference when evaluating DMR discovery by the sequencing-based methods. In fact, its low genomic coverage is expected to limit the utility of the Infinium assay for DMR discovery in spite of its well-established accuracy (Fig. 4). To empirically address this question, we initially performed statistical testing in much the same way as was done for Figure 5. However, most CpG islands were covered by only two Infinium probes, which resulted in low statistical power to detect significant differences. Specifically, paired-samples t-tests identified just three significant DMRs among the ES cell lines and two DMRs between the colon tumor and matched normal colon tissue (data not shown). Thus, we reformulated our question and asked how many true DMRs exhibited suggestive (albeit insignificant) DNA methylation differences in the Infinium data. As an approximation of true DMRs, we focused on those CpG islands that were detected by at least two sequencing-based methods (which are unlikely to contain a high number of technical artifacts according to the comparative validations described above). Between the two ES cell lines a total of 1,000 consensus DMRs were identified (corresponding to the sum of all center fields in Fig. 5), of which 251 were covered by at least one Infinium probe.
Similarly, we identified 463 consensus DMRs between the colon tumor and matched normal colon tissue, of which 177 were covered by at least one Infinium probe. In most cases, the directionality of the difference was consistent between the consensus DMRs and the Infinium data (Supplementary Fig. 12). But when we imposed a minimum threshold of 20 percentage points DNA methylation difference in the same way as for RRBS, the number of Infinium-detected DMRs dropped to 162 (ES-cell comparison) and 95 (colon cancer comparison). In other words, the Infinium assay detected approximately one-fifth of the consensus DMRs that we identified by the sequencing-based methods.

DISCUSSION

Over the last decade, DNA methylation mapping has played an important role in establishing the prevalence of altered DNA methylation in cancer cells41,42. More recently, researchers have also started to systematically study the role of DNA methylation in a wide range of non-neoplastic diseases43. This is indeed a good time to probe for epigenetic alterations that contribute to human diseases. Genome-wide association studies have been completed for all common diseases and point to a major role of nongenetic factors in the etiology of most diseases44. Furthermore, it has been suggested that epigenetic events could provide a tractable link between the genome and the environment, with the epigenome emerging as a biochemical record of relevant life events45,46. Systematic investigation of these topics requires powerful, accurate and cost-efficient methods for identifying DNA methylation differences across many samples. The goal of this study was to evaluate current methods for global DNA methylation mapping and to compare their performance when applied under real-world conditions.
To mimic a typical disease-centered case-control study, we worked with primary patient material (colon samples) and used lower amounts of input DNA than in most previous studies (MeDIP-seq: 300 ng; MethylCap-seq: 1 μg; RRBS: 50 ng; Infinium: 1 μg). We focused on cell types that are known to exhibit relatively moderate DNA methylation differences31,47, in contrast to the massive DNA methylation alterations that are frequently observed in cultured somatic cells11 and cancer cell lines48. Finally, because all four methods included in the current study are widely available and not excessively costly, there are few obstacles to using this technology comparison as a blueprint for individual laboratory efforts as well as large-scale epigenomic case-control studies investigating the epigenetics of human diseases. Overall, our data confirmed that all four methods provide accurate DNA methylation measurements and can be used to detect DMRs in clinical samples. In terms of accuracy, the bisulfite-based methods (RRBS, Infinium) performed slightly better than the enrichment-based methods and did not require any statistical correction of CpG bias. Furthermore, the genomic coverage was moderately higher for MethylCap-seq than for MeDIP-seq, RRBS coverage was by design focused on CpG-rich regions and the Infinium assay covered a relatively small number of preselected genomic regions. Despite the striking differences in genomic coverage, a substantial fraction of DMRs detected by MeDIP-seq or MethylCap-seq were also identified by RRBS, and vice versa. This somewhat counterintuitive observation can be explained by the role of region-specific read coverage for the ability to identify statistically significant DMRs. If a genomic region is CpG poor and thus rarely sequenced by MeDIP-seq or MethylCap-seq, both methods have low statistical power to detect differential DNA methylation. In contrast, CpG-rich genomic regions tend to be more amenable to DMR detection by MeDIP-seq and MethylCap-seq and are also frequently covered by RRBS measurements.
Finally, we observed that MethylCap-seq was able to detect roughly twice as many DMRs as MeDIP-seq at comparable sequencing depths, RRBS detected more DMRs than MeDIP-seq but fewer DMRs than MethylCap-seq, and the Infinium assay detected only 20% of the consensus DMRs identified by the sequencing-based methods. These differences could be reproduced in two independent pairwise comparisons, providing strong indication that they are robust across biological replicates and cannot be explained by random experimental variation. On the other hand, we used one specific protocol for each method, and it is quite possible that protocol variations (e.g., different antibody for MeDIP-seq, different elution procedure for MethylCap-seq or different size selection for RRBS) would produce different results. Our study also reinforces the importance of sequencing depth as a key parameter determining the power to detect differential methylation with any of the sequencing-based methods. To allow for a fair and practically relevant comparison, we sequenced ~30–40 million reads for each sample and method. However, it became evident that deeper sequencing would identify further DMRs, especially for MeDIP-seq and MethylCap-seq (Supplementary Fig. 11). For disease-centered studies it is therefore necessary to make an informed decision about how to distribute the available resources between sequencing fewer samples more deeply and sequencing more samples less deeply. Such a decision can be guided by statistical power calculations when some prior knowledge exists about the characteristics of expected DMRs (e.g., magnitude of difference, location in CpG-rich versus CpG-poor genomic regions), or it can be dictated by practical considerations such as the number of available samples.
In our experience and at current sequencing costs, a range of ~30–60 million reads per sample for MeDIP-seq and MethylCap-seq, and a range of ~10–20 million reads per sample for RRBS constitute a viable compromise between breadth and depth of sequencing. In contrast, whole-genome bisulfite sequencing49 provides comprehensive genomic coverage at the cost of
having to sequence over a billion reads per sample. On the other end of the spectrum, low sequencing depths are often sufficient to detect strong differences such as global loss of DNA methylation but do not provide reliable locus-specific information50.

Genome-wide studies tend to ignore repetitive regions due to technical difficulties, and the few studies that focused specifically on mapping DNA methylation in repetitive regions did so at relatively low coverage51–53. The current data set was well-suited to analyze DNA methylation in repetitive regions because the joint results obtained by three different experimental methods helped us to control for technical artifacts that can burden the analysis of repetitive DNA. We observed that repeat sequences are most highly methylated when they are CpG rich and highly prevalent in the human genome (Supplementary Data 2). In contrast, the DNA methylation levels varied widely among repeat sequences that are either CpG poor or infrequent in the genome. These results lend support to the hypothesis that DNA methylation provides a mechanism for keeping active retrotransposons in check54. They also argue for a highly specific mechanism of repeat repression, which targets DNA methylation mostly to those repeat sequences that threaten genome integrity, whereas many ‘benign’ repeat sequences may remain unmethylated.

In summary, we benchmarked four methods for genome-scale DNA methylation mapping in terms of their accuracy and power to detect DNA methylation differences. These results will facilitate the selection of suitable methods for studying the role of DNA methylation in disease and development.

Methods

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments

We thank A. Crenshaw and M. Parkin (Broad Institute) for assistance with the Infinium assay and K. Halachev (Max Planck Institute for Informatics) for the provision of genome annotation files. C.B. is supported by a Feodor Lynen Fellowship from the Alexander von Humboldt Foundation. A.B.B. is supported by the Dutch Cancer Foundation (KWF, grant KUN 2008-4130). A.M. is supported by the Massachusetts Life Science Center and the Pew Charitable Trusts. The described work was in part funded by the Pew Charitable Trusts, the US National Institutes of Health Roadmap Initiative on Epigenomics (U01ES017155) and the European Union’s CANCERDIP project (HEALTH-F2-2007-200620).

AUTHOR CONTRIBUTIONS

C.B., E.M.T. and A.M. conceived and designed the study; E.M.T., A.B.B., F.S. and H.G. performed the experiments; C.B., F.M. and N.J. analyzed the data; C.B., A.G., H.G.S. and A.M. interpreted the results; and C.B. and A.M. wrote the paper.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6 (2002).
2. Baylin, S.B. & Ohm, J.E. Epigenetic gene silencing in cancer—a mechanism for early oncogenic pathway addiction? Nat. Rev. Cancer 6, 107–116 (2006).
3. Esteller, M. Epigenetics in cancer. N. Engl. J. Med. 358, 1148–1159 (2008).
4. Feinberg, A.P. & Tycko, B. The history of cancer epigenetics. Nat. Rev. Cancer 4, 143–153 (2004).
5. Issa, J.P. CpG island methylator phenotype in cancer. Nat. Rev. Cancer 4, 988–993 (2004).
6. Jones, P.A. & Laird, P.W. Cancer epigenetics comes of age. Nat. Genet. 21, 163–167 (1999).
7. Richardson, B. Primer: epigenetics of autoimmunity. Nat. Clin. Pract. Rheumatol. 3, 521–527 (2007).
8. Tobi, E.W. et al. DNA methylation differences after exposure to prenatal famine are common and timing- and sex-specific. Hum. Mol. Genet. 18, 4046–4053 (2009).
9. Urdinguio, R.G., Sanchez-Mut, J.V. & Esteller, M. Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies. Lancet Neurol. 8, 1056–1072 (2009).
10. Bock, C. Epigenetic biomarker development. Epigenomics 1, 99–110 (2009).
11. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008).
12. Down, T.A. et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat. Biotechnol. 26, 779–785 (2008).
13. Weber, M. et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862 (2005).
14. Brinkman, A.B. et al. Whole-genome DNA methylation profiling using MethylCap-seq. Methods published online, doi:10.1016/j.ymeth.2010.06.012 (11 June 2010).
15. Rauch, T. & Pfeifer, G.P. Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer. Lab. Invest. 85, 1172–1180 (2005).
16. Serre, D., Lee, B.H. & Ting, A.H. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 38, 391–399 (2010).
17. Bibikova, M. et al. Genome-wide DNA methylation profiling using Infinium assay. Epigenomics 1, 177–200 (2009).
18. Eckhardt, F. et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat. Genet. 38, 1378–1385 (2006).
19. Brunner, A.L. et al. Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver. Genome Res. 19, 1044–1056 (2009).
20. Irizarry, R.A. et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res. 18, 780–790 (2008).
21. Oda, M. et al. High-resolution genome-wide cytosine methylation profiling with simultaneous copy number analysis and optimization for limited cell numbers. Nucleic Acids Res. 37, 3829–3839 (2009).
22. Gu, H. et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat. Methods 7, 133–136 (2010).
23. Cowan, C.A. et al. Derivation of embryonic stem-cell lines from human blastocysts. N. Engl. J. Med. 350, 1353–1356 (2004).
24. Weisenberger, D.J. et al. Comprehensive DNA methylation analysis on the Illumina Infinium assay platform (Illumina, San Diego, California, USA, 2008). 〈http://www.illumina.com/Documents/products/appnotes/appnote_infinium_methylation.pdf〉.
25. Bock, C. et al. Inter-individual variation of DNA methylation and its implications for large-scale epigenome mapping. Nucleic Acids Res. 36, e55 (2008).
26. Pelizzola, M. et al. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 18, 1652–1659 (2008).
27. Robinson, M.D., Statham, A.L., Speed, T.P. & Clark, S.J. Protocol matters: which methylome are you actually studying? Epigenomics 2, 587 (2010).
28. Faul, F. et al. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191 (2007).
29. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
30. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
31. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178–186 (2009).
32. Hellebrekers, D.M. et al. GATA4 and GATA5 are potential tumor suppressors and biomarkers in colorectal cancer. Clin. Cancer Res. 15, 3990–3997 (2009).
33. Zhang, W. et al. Epigenetic inactivation of the canonical Wnt antagonist SRY-box containing gene 17 in colorectal cancer. Cancer Res. 68, 2764–2772 (2008).
34. Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631–637 (2008).
35. Lofton-Day, C. et al. DNA methylation biomarkers for blood-based colorectal cancer screening. Clin. Chem. 54, 414–423 (2008).
36. Caldwell, G.M. et al. The Wnt antagonist sFRP1 in colorectal tumorigenesis. Cancer Res. 64, 883–888 (2004).
37. Hirata, H. et al. Wnt antagonist gene DKK2 is epigenetically silenced and inhibits renal cancer progression through apoptotic and cell cycle pathways. Clin. Cancer Res. 15, 5678–5687 (2009).
38. Ehrlich, M. DNA hypomethylation in cancer cells. Epigenomics 1, 239–259 (2009).
39. Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).
40. Bestor, T.H. & Tycko, B. Creation of genomic methylation patterns. Nat. Genet. 12, 363–367 (1996).
41. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 8, 286–298 (2007).
42. Jones, P.A. & Baylin, S.B. The epigenomics of cancer. Cell 128, 683–692 (2007).
43. Feinberg, A.P. Phenotypic plasticity and the epigenetics of human disease. Nature 447, 433–440 (2007).
44. Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
45. Foley, D.L. et al. Prospects for epigenetic epidemiology. Am. J. Epidemiol. 169, 389–400 (2009).
46. Heijmans, B.T. et al. The epigenome: archive of the prenatal environment. Epigenetics 4, 526–531 (2009).
47. Doi, A. et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat. Genet. 41, 1350–1353 (2009).
48. Smiraglia, D.J. et al. Excessive CpG island hypermethylation in cancer cell lines versus primary human malignancies. Hum. Mol. Genet. 10, 1413–1419 (2001).
49. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
50. Popp, C. et al. Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature 463, 1101–1105 (2010).
51. Horard, B. et al. Global analysis of DNA methylation and transcription of human repetitive sequences. Epigenetics 4, 339–350 (2009).
52. Rodriguez, J. et al. Genome-wide tracking of unmethylated DNA Alu repeats in normal and cancer cells. Nucleic Acids Res. 36, 770–784 (2008).
53. Weisenberger, D.J. et al. Analysis of repetitive element DNA methylation by MethyLight. Nucleic Acids Res. 33, 6823–6836 (2005).
54. Yoder, J.A., Walsh, C.P. & Bestor, T.H. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 13, 335–340 (1997).
55. Bock, C. et al. CpG island mapping by epigenome prediction. PLoS Comput. Biol. 3, e110 (2007).
56. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35 (Database issue), D61–D65 (2007).
ONLINE METHODS
Sample origin and cell culture. Human ES cells were cultured in knockout serum replacement (KOSR) medium according to established protocols23 and genomic DNA was extracted as described previously57. DNA for the colon tumor and matched normal colon tissue was purchased from BioChain (lot number A704198). Both samples originate from the same donor, an 81-year-old male patient diagnosed with moderately differentiated adenocarcinoma.

Methylated DNA immunoprecipitation (MeDIP-seq). MeDIP-seq12 was performed using the EZ DNA methylation kit (Zymo Research). A total of 300 ng DNA per sample was sonicated using Bioruptor (Diagenode) with 8 intervals of 10 min (30 s on, 30 s off). Sonicated DNA was end-repaired and ligated with sequencing adapters as described previously12. After gel-based size selection, methylated DNA immunoprecipitation was performed according to the manufacturer’s protocol. A total of 1 μg of monoclonal antibody against 5-methylcytosine (included in the EZ DNA methylation kit) was used for immunoprecipitation. The immunoprecipitated DNA was PCR-amplified and the specificity of the enrichment was confirmed by qPCR for selected loci as described previously58. Two lanes of 36-bp single-ended sequencing were performed on the Illumina Genome Analyzer II according to the manufacturer’s standard protocol. Maq with default parameters was used to align the sequencing reads to the NCBI36 (hg18) assembly of the human genome59.

Methylated-DNA capture (MethylCap-seq). MethylCap-seq14 was performed in a robotized procedure using a SX-8G / IP-Star (Diagenode). 2 μg of His6GST-MBD (Diagenode) was combined with 1 μg of sonicated DNA in 200 μl of binding buffer (BB, 20 mM Tris-HCl pH 8.5, 0.1% Triton X-100) containing 200 mM NaCl. This solution was incubated at 4 °C for 2 h. Magnetic GST-beads were prepared by washing 35 μl of a well-mixed MagneGST glutathione particle suspension (Promega) with 200 μl of binding buffer plus 200 mM NaCl at 4 °C.
Washing was repeated once and the supernatant was removed. The GST-MBD-DNA solution was added to the washed and collected beads, and this suspension was rotated for another hour at 4 °C. After removal of the supernatant (this is the flow-through) the beads-GST-MBD-DNA complexes were eluted by washing. 200 μl of binding buffer with different concentrations of NaCl was added and the suspension was rotated for 10 min at 4 °C. Beads were captured using a magnet, and the supernatant was collected. The elution procedure consisted of 1 × 300 mM (wash), 2 × 400 mM (wash), 1 × 500 mM (“low” eluate), 1 × 600 mM (“medium” eluate), 1 × 800 mM NaCl (“high” eluate). The collected eluates were purified using QIAquick PCR purification spin columns (Qiagen), eluted with 100 μl elution buffer and prepared for sequencing as described previously14. A single lane of 36-bp single-ended sequencing on the Illumina Genome Analyzer II was performed for the low, medium and high eluates, respectively. The sequencing reads were aligned to the NCBI36 (hg18) assembly of the human genome using Illumina’s analysis pipeline (ELAND) with default parameters. The lanes for each of the three eluates are shown separately in Figure 2, and we tested whether the accuracy relative to the Infinium assay could be improved by taking this additional information into account. However, a linear model that was based on the separate read counts of the three lanes did not outperform a model that was based on the sum of the three lanes, which is why we combined the reads from all three libraries per sample for the analyses described in this paper.

Reduced representation bisulfite sequencing (RRBS). RRBS22 was performed according to a previously published protocol57 with some optimizations for clinical samples and low amounts of input DNA22. The main steps were: (i) A total of 50 ng (ES cells) or 1 μg (colon samples) genomic DNA was digested by 5 U to 20 U of MspI (New England Biolabs, NEB) for up to 16 h.
(ii) End-repair and adenylation of digested DNA were performed in a 20 μl reaction consisting of 10 U of Klenow fragments (3′→5′ exo-, NEB), 2 μl premixed nucleotide triphosphates (1 mM dGTP, 10 mM dATP, 1 mM 5′ methylated dCTP). The reaction was incubated at 30 °C for 30 min followed by 37 °C for an additional 30 min. (iii) Preannealed 5-methylcytosine-containing Illumina adapters were ligated with adenylated DNA fragments in a 20 μl reaction containing 1 μl concentrated T4 ligase (NEB) and 1–2 μl of 15 μM adapters at 16 °C for 16 to 20 h. (iv) Gel-based selection for fragments with insertion sizes of 40 to 120 bp and 120 to 220 bp was performed as described previously22.
doi:10.1038/nbt.1681
(v) Bisulfite treatment with the EpiTect Bisulfite Kit (Qiagen) was conducted following the protocol designated for DNA isolated from formalin-fixed and paraffin-embedded tissues. Two rounds of conversion were performed in order to maximize bisulfite conversion rates. The final bisulfite-converted DNA was eluted with 2 × 20 μl pre-heated (65 °C) EB buffer. (vi) To determine the minimum number of PCR cycles for final library enrichment, analytical (10 μl) PCR reactions containing 0.5 μl of bisulfite-treated DNA, 0.2 μM each of Illumina PCR primers LPX1.1 and 2.1 and 0.5 U PfuTurbo Cx Hotstart DNA polymerase (Stratagene) were set up. The thermocycler conditions were: 5 min at 95 °C, varied cycle numbers (10–20) of 20 s at 95 °C, 30 s at 65 °C, 30 s at 72 °C, followed by 7 min at 72 °C. PCR products were visualized by running on a 4–20% polyacrylamide Criterion TBE Gel (Bio-Rad) and stained by SYBR Green. The final libraries were generated in eight 25 μl PCR reactions, each containing 2–3 μl of bisulfite-converted template, 1.25 U PfuTurbo Cx Hotstart polymerase and 0.2 μM each of Illumina LPX1.1 and 2.1 PCR primers. The libraries were PCR amplified and sequenced on the Illumina Genome Analyzer II as described previously22. The sequencing reads were aligned to the NCBI36 (hg18) assembly of the human genome using a custom alignment software that was developed for RRBS data11.

Microarray-based epigenotyping (Infinium). Infinium17 analysis was performed by the Genetic Analysis Platform at the Broad Institute. A total of 1 μg of genomic DNA per sample was bisulfite-treated according to the manufacturer’s protocol and hybridized onto Infinium HumanMethylation27 bead arrays (Illumina). We previously observed almost perfect agreement between technical replicates (Pearson’s r > 0.98), which is why only a single hybridization was performed for each sample.

Data preparation and quality control.
For MeDIP-seq and MethylCap-seq, the aligned reads were extended to the mean fragment length obtained during sonication, and from each group of duplicate reads (that is, reads aligned to the exact same start position on the same chromosome) all but one read were discarded, in order to minimize the impact of PCR bias on downstream analysis. For RRBS, the aligned reads were compared to the reference genome, and the DNA methylation status was determined using custom software as described previously22. Infinium HumanMethylation27 data were processed with Illumina’s BeadStudio 3.2 software, using the default background subtraction method for normalization. UCSC Genome Browser tracks were constructed by custom scripts implemented in the Python programming language (http://www.python.org/).

Quantification of absolute DNA methylation levels. We used linear regression models to estimate the absolute DNA methylation levels from the MeDIP-seq and MethylCap-seq read counts. Based on a number of different feature selection experiments, we found that the following combination of variables was robustly predictive of DNA methylation levels: (i) the square root of the total number of MeDIP-seq or MethylCap-seq reads within the given region, (ii) the square root of the total number of whole-cell extract (WCE) reads within the region (based on a cross-tissue WCE track that we routinely use for ChIP-seq data normalization), (iii) the logit of the CpG frequency within the region, (iv) the relative GC content of the region, (v) the ratio of Cs relative to CpGs, and (vi) the relative repeat content of the region as determined by RepeatMasker (http://www.repeatmasker.org/). For both MeDIP-seq and MethylCap-seq, we observed that the read frequencies were strongly positively associated with the absolute methylation level obtained using the Infinium assay, whereas the repeat content was moderately positively associated.
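The six predictor variables listed above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, the clipping constant `eps`, and the exact definitions of CpG frequency (CpG dinucleotides per base) and of the C-to-CpG ratio are assumptions made for illustration only.

```python
import math

def logit(p, eps=1e-6):
    """Log-odds of a proportion, clipped away from 0 and 1 (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def region_predictors(seq, enrich_reads, wce_reads, repeat_fraction):
    """Return the six predictors for one genomic region.

    seq             -- region sequence in upper-case A/C/G/T
    enrich_reads    -- MeDIP-seq or MethylCap-seq reads within the region
    wce_reads       -- whole-cell extract (WCE) reads within the region
    repeat_fraction -- fraction of the region annotated as repeat
    """
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    return {
        "sqrt_enrich_reads": math.sqrt(enrich_reads),
        "sqrt_wce_reads": math.sqrt(wce_reads),
        # Assumed definition: CpG dinucleotides per base of the region.
        "logit_cpg_freq": logit(cpg / n),
        "gc_content": (c + g) / n,
        # Assumed definition: count of C bases divided by count of CpGs.
        "c_to_cpg_ratio": c / cpg if cpg else float("nan"),
        "repeat_content": repeat_fraction,
    }

# Hypothetical toy region, for illustration only.
features = region_predictors("ACGTCGGG", enrich_reads=16, wce_reads=4,
                             repeat_fraction=0.25)
```

A design matrix built from such rows could then be passed to any ordinary least-squares routine, with Infinium methylation levels as the response.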
In contrast, the logit of the CpG frequency was highly negatively associated with DNA methylation, and all other variables as well as the model’s intercept exhibited a moderately negative association. For model fitting and performance evaluation, the current data set was split into equally sized training and test sets. All model fitting was performed using the R statistics package (http://www.r-project.org/). Identification of differentially methylated regions. In our experience, classical peak detection60,61 is not well-suited for DMR identification because of the high number of spurious hits encountered when borderline peaks are detected in one sample but not in the other (C.B., unpublished observation).
Instead, we used a statistical test to compare two samples directly with each other. For a given region with RRBS data, we count the number of methylated vs. unmethylated CpGs in both samples and perform Fisher’s exact test to obtain a p-value that is indicative of the likelihood of the region being a DMR. Similarly, for MeDIP-seq and MethylCap-seq we count the numbers of reads that align inside the region for both samples and use Fisher’s exact test to contrast these values with the total numbers of reads that align elsewhere in the genome. And for the Infinium assay we use a paired-samples t-test to compare the two samples’ β-values of all Infinium probes inside the region. These tests are performed on a large number of genomic regions in parallel (e.g., on all CpG islands), and the p-values are corrected for multiple testing using the q-value method62. Genomic regions with a q-value of less than 0.1 are flagged as hypermethylated or hypomethylated (depending on the directionality of the difference), but only if the absolute DNA methylation difference exceeds 20 percentage points (for RRBS and Infinium) or if there is at least a twofold difference in the read number (for MeDIP-seq and MethylCap-seq). These thresholds were chosen by their practical utility in a number of comparisons between different cell types and have no further justification. We also mark genomic regions with insufficient sequencing coverage, but do not exclude them from DMR analysis. For MeDIP-seq and MethylCap-seq we require at least ten reads per 10 million total reads for the sample with higher read coverage, and for RRBS we require a minimum of five CpGs with at least five reads each in both samples. This statistical approach to DMR identification requires us to define sets of genomic regions on which the analysis is being performed. We pursued a two-way strategy to maximize the chances of finding interesting DMRs. 
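The RRBS comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scripts: the function names are hypothetical, the two-sided p-value uses the common convention of summing all tables no more probable than the observed one, and the q-value correction across regions is omitted. The exact computation shown here is practical only for small counts such as per-region CpG observations, not for genome-wide read totals.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)

def rrbs_region_test(meth_a, unmeth_a, meth_b, unmeth_b, min_diff=0.20):
    """Compare methylated vs. unmethylated CpG counts of two samples in one region."""
    level_a = meth_a / (meth_a + unmeth_a)
    level_b = meth_b / (meth_b + unmeth_b)
    difference = level_a - level_b
    return {
        "p_value": fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b),
        "difference": difference,
        # 20-percentage-point threshold as stated in the text.
        "passes_difference_threshold": abs(difference) > min_diff,
    }

# Hypothetical counts, for illustration only.
result = rrbs_region_test(meth_a=1, unmeth_a=9, meth_b=11, unmeth_b=3)
```

In a full analysis, the p-values from all tested regions would be converted to q-values before the 0.1 cutoff is applied.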
On the one hand, we focused specifically on CpG islands and gene promoters, which are prime candidates for epigenetic regulation. This approach provides increased statistical power for regions with well-known functional roles because the relatively low number of CpG islands and gene promoters reduces the burden of multiple-testing correction compared to the genomewide case. On the other hand, we used a 1-kilobase tiling of the genome to detect DMRs that are located outside of any candidate regions. And to cast an even wider net, we collected a comprehensive set of 13 types of genomic regions, which includes not only CpG islands and gene promoters, but also CpG island shores31, enhancers63, evolutionarily conserved regions and other types of genomic regions. DMR data for all of these region sets were calculated using a set of Python and R scripts and are available online (http://meth-benchmark.computational-epigenetics.org/).

Experimental validation. Based on the CpG islands that were detected as differentially methylated between the two ES cell lines (Fig. 5), we manually selected eight method-specific DMRs for experimental validation. To that end, those CpG islands that were identified as statistically significant DMRs by one method (but not by the other two methods) were visually inspected in the UCSC Genome Browser, and regions were selected for validation only if the data fully supported their classification as method-specific DMRs. In particular, regions were not selected if a second method already picked up a suggestive but insignificant trend in the same direction as the first method, or when the data of the first method already suggested that the DMR was a false-positive hit (e.g., because of contradictory trends in the vicinity of the DMR). Experimental validation was performed by clonal bisulfite sequencing following established protocols64. Primers were designed using MethPrimer65 such that the amplicon overlapped with those CpGs that exhibited the highest
levels of differential methylation according to our original data. To prepare for bisulfite sequencing, 1 μg of DNA was bisulfite-converted using the EpiTect kit (Qiagen); 50 ng of bisulfite-converted DNA was PCR-amplified (Supplementary Data 1 for primer sequences); and purified amplicons were cloned using the TOPO TA cloning kit (Invitrogen). For each region an average of 11 clones were randomly chosen for sequencing. All sequencing data were processed using the BiQ Analyzer software66, and the results are summarized in Supplementary Data 1. Analysis of repetitive DNA. Repeat sequences were obtained from database version 14.07 of RepBase Update39, which is publicly available online (http://www.girinst.org/server/RepBase/index.php). From a total of 11,670 prototypic repeat sequences we selected those 1,267 that were annotated either to human or to its ancestors in the taxonomic tree, and we combined these prototypic repeat sequences into a pseudo-genome file. Maq with default parameters was used to align MeDIP-seq, MethylCap-seq, RRBS, ChIP-seq (H3K4me3) and whole-cell extract (WCE) sequencing reads against this pseudo-genome59. For RRBS, both the reads and the reference genome were bisulfite-converted in silico before the alignment. The epigenetic status of each prototypic repeat sequence was quantified as follows: (i) For MeDIP-seq, MethylCap-seq and ChIP-seq we calculated the odds ratios relative to the WCE data. (ii) For RRBS we computed the number of methylated CpGs, total number of CpG measurements and percentage of DNA methylation based on the comparison of the aligned reads with the prototypic repeat sequence. We discarded rare repeats with WCE coverage below 100 aligned reads or RRBS coverage below 25 CpG measurements, resulting in 553 prototypic repeat sequences that were used for further analysis. 
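The odds-ratio quantification and the rare-repeat filter described above can be sketched as follows. The text does not give the exact formula, so this sketch assumes the odds of a read hitting a given prototypic repeat (hits versus all other aligned reads) in the enriched sample divided by the same odds in the WCE sample. All function and variable names are hypothetical.

```python
def odds_ratio_vs_wce(sample_hits, sample_total, wce_hits, wce_total):
    """Odds of a read aligning to one repeat in the sample, relative to WCE.

    Assumed definition: (hits / non-hits) in the sample divided by
    (hits / non-hits) in the whole-cell extract.
    """
    sample_odds = sample_hits / (sample_total - sample_hits)
    wce_odds = wce_hits / (wce_total - wce_hits)
    return sample_odds / wce_odds

def repeat_enrichment(sample_counts, wce_counts, sample_total, wce_total,
                      min_wce_reads=100):
    """Odds ratio per prototypic repeat, skipping repeats with low WCE coverage."""
    result = {}
    for name, wce_hits in wce_counts.items():
        if wce_hits < min_wce_reads:
            continue  # rare repeat: WCE coverage below 100 aligned reads
        result[name] = odds_ratio_vs_wce(
            sample_counts.get(name, 0), sample_total, wce_hits, wce_total
        )
    return result

# Hypothetical read counts, for illustration only.
enrichment = repeat_enrichment(
    sample_counts={"repeat_A": 200, "repeat_B": 5},
    wce_counts={"repeat_A": 100, "repeat_B": 10},
    sample_total=1200,
    wce_total=1100,
)
```

Here `repeat_B` is dropped by the coverage filter, and `repeat_A` receives an odds ratio of about 2, indicating enrichment relative to WCE.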
Among these were 97 LINE class sequences (92 of them from the L1 family), 51 SINEs (48 of them from the Alu family), 6 SVAs, 62 DNA repeats, 15 satellite repeats, 315 LTRs, 1 low-complexity repeat and 6 RNA repeats (Supplementary Data 2). To quantify differential methylation between a pair of MeDIP-seq and MethylCap-seq samples, we calculated the pairwise odds ratio of the read coverage for each prototypic repeat sequence. The absolute DNA methylation difference was used in the case of RRBS (Supplementary Data 3). The significance of the difference was assessed using Fisher’s exact test in the same way as for the nonrepetitive genome (described above).

57. Smith, Z.D. et al. High-throughput bisulfite sequencing in mammalian genomes. Methods 48, 226–232 (2009).
58. Rakyan, V.K. et al. An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome Res. 18, 1518–1529 (2008).
59. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
60. Bock, C. & Lengauer, T. Computational epigenetics. Bioinformatics 24, 1–10 (2008).
61. Park, P.J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669–680 (2009).
62. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
63. Heintzman, N.D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
64. Hajkova, P. et al. DNA-methylation analysis by the bisulfite-assisted genomic sequencing method. Methods Mol. Biol. 200, 143–154 (2002).
65. Li, L.C. & Dahiya, R. MethPrimer: designing primers for methylation PCRs. Bioinformatics 18, 1427–1431 (2002).
66. Bock, C. et al. BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing. Bioinformatics 21, 4067–4068 (2005).
doi:10.1038/nbt.1681
Articles
Non-invasive imaging of human embryos before embryonic genome activation predicts development to the blastocyst stage
Connie C Wong1,2,7, Kevin E Loewke1–3,6,7, Nancy L Bossert4, Barry Behr2, Christopher J De Jonge4, Thomas M Baer5 & Renee A Reijo Pera1,2

We report studies of preimplantation human embryo development that correlate time-lapse image analysis and gene expression profiling. By examining a large set of zygotes from in vitro fertilization (IVF), we find that success in progression to the blastocyst stage can be predicted with >93% sensitivity and specificity by measuring three dynamic, noninvasive imaging parameters by day 2 after fertilization, before embryonic genome activation (EGA). These parameters can be reliably monitored by automated image analysis, confirming that successful development follows a set of carefully orchestrated and predictable events. Moreover, we show that imaging phenotypes reflect molecular programs of the embryo and of individual blastomeres. Single-cell gene expression analysis reveals that blastomeres develop cell autonomously, with some cells advancing to EGA and others arresting. These studies indicate that success and failure in human embryo development is largely determined before EGA. Our methods and algorithms may provide an approach for early diagnosis of embryo potential in assisted reproduction.

Little is known about the basic pathways and events of early human embryo development, including factors that would aid in predicting success or failure to develop. Consequently, to increase the chances of pregnancy through IVF, multiple embryos are often transferred to the uterus, despite the potential for well-documented adverse outcomes. Development of the human embryo begins with the fusion of sperm and egg, the epigenetic reprogramming of the gametic pronuclei and a series of cleavage divisions that culminate with activation of the embryonic genome by day 3 of development1. The embryo compacts to form a morula and subsequently a blastocyst, containing the outer trophectoderm and inner cell mass1.
Although development of the human embryo shares many features with other species, there are also some notable differences, including unique gene-expression and epigenetic patterns and a protracted period of transcriptional silence through the first 3 d after fertilization1–9. In the mouse, by contrast, activation of the zygotic genome is initiated concurrent with the first cleavage division on day 1 (refs. 7,8). Human embryo development is also more fragile than that of many other species. Human fecundity rates are relatively low, largely due to pre- and post-implantation embryo loss10,11. In vitro, 50–70% of IVF embryos fail to reach the blastocyst stage12,13. Most human embryo research has been based on a small number of samples generated under diverse experimental conditions1,14–17. Studies that involve imaging have been limited to measurements of early development, such as pronuclear formation and fusion and time to first cleavage18–21, and molecular profiling studies have generally
required pooling of oocytes, embryos or blastomeres, which masks differences in gene expression between embryos or between single blastomeres within an embryo15–17,22,23. Here we sought to overcome these limitations and to define critical pathways and events of human embryo development by correlating imaging profiles and molecular data throughout preimplantation development from the zygote to the blastocyst stage. We studied a large set of supernumerary IVF embryos that had been cryopreserved at the zygote stage 12–18 h after fertilization (Fig. 1). The embryos appeared representative of the typical IVF population, as they were frozen at the two-pronucleate (2PN) stage and thus indiscriminately selected for cryopreservation relative to those selected for culture. This is in contrast to embryos cryopreserved at the 8-cell stage or later, which are not selected for transfer during fresh IVF cycles and may therefore be of lower quality. With this unique set of embryos, we carried out a large-scale study that correlated time-lapse image analysis and gene expression profiling to show that successful development to the blastocyst stage can be predicted by the 4-cell stage, before EGA. RESULTS Cytokinesis as an embryo quality marker A normal human zygote undergoes the first cleavage division early on day 2, at ~24–27 h after fertilization18–20,24 (Fig. 2a, embryo H in Supplementary Video 1). Subsequently, the embryo cleaves to a 4- and 8-cell embryo on days 2 and 3, respectively, before compacting
1Institute for Stem Cell Biology and Regenerative Medicine, School of Medicine, Stanford University, Stanford, California, USA. 2Department of Obstetrics and Gynecology, School of Medicine, Stanford University, Stanford, California, USA. 3Department of Mechanical Engineering, Stanford University, Stanford, California, USA. 4Reproductive Medicine Center, University of Minnesota, Minneapolis, Minnesota, USA. 5Stanford Photonics Research Center, Department of Applied Physics, Stanford University, Stanford, California, USA. 6Present address: Auxogyn, Inc., Menlo Park, California, USA. 7These authors contributed equally to this work. Correspondence should be addressed to R.A.R.P. ([email protected]). Received 5 April; accepted 3 September; published online 3 October 2010; doi:10.1038/nbt.1686
[Figure 1 schematic: day 1, thaw 1-cell human embryos (Expt. 1: n = 61; Expt. 2: n = 80; Expt. 3: n = 64; Expt. 4: n = 37); time-lapse imaging on multiple microscopes; harvest a mixture of normal and arrested embryos on consecutive days (day 1–2, day 3, day 4, day 5–6) as single embryos or single blastomeres; high-throughput, single-cell qPCR analysis.]

Figure 1 Experimental plan. We tracked the development of 242 two-pronucleate stage embryos in four experimental sets (containing 61, 80, 64 and 37 embryos, respectively). In each set of experiments, human zygotes were thawed on day 1 and cultured in small groups on multiple plates. Each plate was observed independently with time-lapse microscopy under dark-field illumination on separate imaging stations. At ~24 h intervals, one plate of embryos was removed from the imaging system and collected as either single embryos or single cells (blastomeres) for high-throughput qRT-PCR gene expression analysis. Each plate typically contained a mixture of embryos that reached the expected developmental stage at the time of harvest (termed ‘normal’) and those that were arrested or delayed at earlier development stages, or fragmented extensively (termed ‘abnormal’). Gene expression analysis was carried out on single intact embryos or on single blastomeres of dissociated embryos. One hundred of the 242 embryos were imaged until day 5 or 6 to monitor blastocyst formation.
into a morula on day 4 and forming a blastocyst on days 5 to 6. For the purposes of this study, embryos that reached the blastocyst stage were considered developmentally competent and designated ‘normal’, whereas embryos that arrested at a stage before the blastocyst stage were considered developmentally incompetent and designated ‘abnormal’. We tracked the development of 242 IVF embryos in four independent experimental sets using multiple time-lapse microscopes equipped with low-power, dark-field illumination. Of the 242 embryos, 100 were cultured to day 5 or 6, whereas the remaining 142 were removed at various stages for quantitative real-time (qRT) PCR gene expression analysis. Among the 100 embryos cultured to day 5 or 6, 33–53% formed blastocysts (Fig. 2b), and the remaining embryos arrested at different developmental stages, usually between the 2- and 8-cell stages. To identify quantitative imaging parameters that would predict success in development to the blastocyst stage, we extracted and analyzed several parameters from the time-lapse videos, including blastomere size, thickness of the zona pellucida, degree of fragmentation, length of the first cell cycles, time intervals between the first few mitoses and duration of the first cytokinesis. As the embryos in this study were cryopreserved 12–18 h after fertilization, we did not measure parameters before the onset of the first cytokinesis, such as time to first cleavage or length of the first cell cycle, properties that have been evaluated previously18–20. 
Out of the set of parameters measured, three collectively predicted blastocyst formation: (i) duration of the first cytokinesis (the very brief last step in mitosis that physically separates the two daughter cells), (ii) time interval between the end of the first mitosis and the initiation of the second and (iii) the time interval between the second and third mitoses (the time between the appearance of the cleavage furrows of the second and third mitoses) (Fig. 2c). The third parameter represents the synchronicity in the formation of the two sets of granddaughter cells. The mean values and s.d. for these three parameters for the embryos that developed to the blastocyst stage were (i) 14.3 ± 6.0 min, (ii) 11.1 ± 2.2 h and (iii) 1.0 ± 1.6 h, respectively. It is important to note that the first three mitotic events yield a 4-cell embryo from a 1-cell embryo, as opposed to the first three cleavage divisions, which yield an 8-cell embryo (Supplementary Fig. 1). Embryos that reached the blastocyst stage could be predicted, with a sensitivity and specificity of 94% and 93%, respectively, by having a first cytokinesis of 0–33 min, a time between first and second mitoses of 7.8–14.3 h and a time between second and third mitoses of 0–5.8 h (Fig. 2d, Supplementary Fig. 2 and Supplementary Data Set 1). Conversely, embryos that exhibited values outside of one or more of these windows were predicted to arrest. We further examined the behavior of cytokinesis in both normal and abnormal embryos. Embryos that reached the blastocyst stage initiated and completed cytokinesis in a smooth, controlled manner over a narrow time window of 14.3 ± 6.0 min (n = 36), from appearance of the cleavage furrows to complete separation of the daughter cells (Fig. 2e, first panel, and Supplementary Video 2). In contrast, abnormal embryos showed a diverse range of behaviors that can be classified into three aberrant cytokinesis phenotypes (Supplementary Fig. 3). In the least frequent and mildest phenotype, the morphology and mechanism of cytokinesis appears normal, but the time required to complete the process is increased by a few minutes to an hour (Fig. 2e, second panel, Supplementary Video 3 and Supplementary Fig. 3, top panel). A small fraction of the embryos that underwent a slightly prolonged cytokinesis still developed into a blastocyst.
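The three prediction windows reported above amount to a simple rule, which the following Python sketch illustrates together with the sensitivity and specificity calculation. The function names are hypothetical and treating the window bounds as inclusive is an assumption. The example values are the reported blastocyst-group means, not data from individual embryos.

```python
# Prediction windows from the text: (lower bound, upper bound).
WINDOWS = {
    "cytokinesis_min": (0.0, 33.0),     # duration of first cytokinesis, minutes
    "mitosis_1_to_2_h": (7.8, 14.3),    # time between 1st and 2nd mitoses, hours
    "mitosis_2_to_3_h": (0.0, 5.8),     # time between 2nd and 3rd mitoses, hours
}

def predicts_blastocyst(cytokinesis_min, mitosis_1_to_2_h, mitosis_2_to_3_h):
    """True if all three parameters fall inside their windows (bounds inclusive)."""
    values = {
        "cytokinesis_min": cytokinesis_min,
        "mitosis_1_to_2_h": mitosis_1_to_2_h,
        "mitosis_2_to_3_h": mitosis_2_to_3_h,
    }
    return all(low <= values[key] <= high for key, (low, high) in WINDOWS.items())

def sensitivity_specificity(predicted, observed):
    """Sensitivity and specificity from parallel lists of booleans."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    fn = sum((not p) and o for p, o in zip(predicted, observed))
    fp = sum(p and (not o) for p, o in zip(predicted, observed))
    return tp / (tp + fn), tn / (tn + fp)

# An embryo at the reported mean values is predicted to reach the blastocyst stage.
at_means = predicts_blastocyst(14.3, 11.1, 1.0)
# A prolonged first cytokinesis alone is enough to predict arrest.
prolonged = predicts_blastocyst(45.0, 11.1, 1.0)
```

Because a single out-of-window value predicts arrest, the rule is conservative: it favors specificity whenever any one of the three measurements is aberrant.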
In the second phenotype, embryos formed a unipolar cleavage furrow and displayed unusual morphological behavior for several hours before finally cleaving and fragmenting into smaller pieces (Fig. 2e, third panel, Supplementary Video 4 and Supplementary Fig. 3, middle panel). In the third phenotype, embryos displayed membrane ruffling and/or multiple cleavage furrows before cleaving and fragmenting into smaller pieces (Supplementary Fig. 3, bottom panel). Together, the second and third abnormal cytokinesis phenotypes confirm that abnormal cytokinesis is one of the mechanisms for embryo fragmentation, a common observation in abnormal human embryo development. Moreover, we observed that fragmentation in abnormal embryos rarely reversed, whereas moderate fragmentation in normal embryos sometimes reversed at the 2-cell stage before the second mitosis (Supplementary Fig. 4). To determine whether cryopreservation and thawing altered the kinetics of development, we also imaged a small set (n = 10) of embryos that had not been cryopreserved (Fig. 2e, fourth panel, Supplementary Video 5 and Online Methods). Analysis of our three dynamic imaging parameters suggested that cryopreserved embryos are not developmentally delayed by the cryopreservation process. Validation of imaging parameters by automated analysis Our time-lapse imaging data showed that human embryo development varies substantially between embryos within a cohort and that embryos exhibit a wide range of behaviors during cell division. However, characterization of developmental events, such as the duration of cytokinesis, by human observers may be distorted by subjective interpretation. To validate our method for predicting blastocyst formation, we developed an algorithm for automated tracking of cell divisions up to the 4-cell stage. Our tracking algorithm employs a probabilistic model estimation technique based on sequential Monte Carlo methods. 
This technique works by generating distributions of hypothesized embryo models, simulating images based on a simple
[Figure 2 image panels: (a) developmental time line showing day 1, day 2 a.m., day 2 p.m., day 3, day 4, day 5 and day 6; (b) stacked bar chart of the percentage of embryos (0–100%) at the blastocyst, 8-cell, 4- to 7-cell, 2- to 3-cell and 1-cell stages for Expt. 1 (n = 25), Expt. 2 (n = 39), Expt. 3 (n = 9) and Expt. 4 (n = 27); (c) schematic of the 1st, 2nd and 3rd mitoses marking the duration of 1st cytokinesis, the time between 1st and 2nd mitoses and the synchronicity of 2nd and 3rd mitoses; (d) three-dimensional scatter plot with axes 1st cytokinesis duration (h), time between 1st and 2nd mitoses (h) and time between 2nd and 3rd mitoses (h), showing blastocyst embryos (Expt 1, n = 9; Expt 2, n = 17; Expt 3, n = 3; Expt 4, n = 7) and arrested embryos (Expt 1, n = 14; Expt 2, n = 18; Expt 3, n = 6; Expt 4, n = 20); (e) time-stamped image series of the first cytokinesis; (f) normal and abnormal embryos.]
Figure 2 Abnormal embryos exhibit abnormal cytokinesis and mitosis timing during the first divisions. (a) The developmental time line of a healthy human preimplantation embryo. Scale bar, 50 μm. (b) The distribution of normal and arrested embryos among samples that were cultured to day 5 or 6. (c) Cytokinesis duration was measured from the appearance of a cleavage furrow to complete daughter-cell separation during the first division. Time between the first and second mitoses was measured from the completion of the first mitosis to the appearance of cleavage furrow of the second mitosis. Synchronicity of the second and third mitoses was defined as the time between the appearance of the cleavage furrows of the second and third mitoses. (d) Normal embryos followed strict timing in cytokinesis and mitosis during early divisions, before EGA begins. Out of the 100 embryos imaged to day 5 or 6, six were excluded from subsequent image analysis due to technical issues (e.g., inability to track identity after media change, or loss of image focus). Raw data for this plot are included as Supplementary Data Set 1, and additional views can be seen in Supplementary Figure 2. (e) Normal cytokinesis (first row) was typically completed in 14.3 ± 6.0 min in a smooth, controlled manner. In the mild phenotype (second row), the cytokinesis mechanism appears normal although it is slightly prolonged. In the severe phenotype (third row), a one-sided cytokinesis furrow is formed, accompanied by unusual ruffling of cell membranes for a prolonged period of time. Cytokinesis was defined by the first appearance of the cytokinesis furrow (arrows) to the complete separation of daughter cells. Imaging was also performed on a subset of triploid embryos (fourth row), which exhibited a distinct phenotype of dividing into three cells in a single event. Scale bar, 50 μm. 
(f) Embryos that underwent abnormal development and behavior (right) would occasionally appear morphologically similar to normal embryos (left) at the time of sample collection. In this particular case, time-lapse video data showed that what appeared to be a six to eight-cell embryo (right) was in fact the product of a highly aberrant cell division (Supplementary Video 10). Thus, the correlated imaging data served to ensure the accuracy of sample selection and identification for the gene expression analysis.
optical model and comparing these simulations to the observed image data (Fig. 3a and Supplementary Video 6). Embryos were modeled as a collection of ellipses with position, orientation and overlap indices (to represent the relative heights of the cells). With these models, the duration of cytokinesis and time between mitoses can be extracted. Cytokinesis is typically defined by the first appearance of the cytokinesis furrow (where bipolar indentations form along the cleavage axis) to the complete separation of daughter cells. We simplified the problem by approximating cytokinesis as the duration of cell elongation before a 1-cell to 2-cell division. A cell is considered elongated if its major axis has increased by >15% (chosen empirically). The time between mitoses is straightforward to extract by counting the number of cells in each model. We tested our algorithm on 14 human embryos from the set of 100 that were imaged up to the blastocyst stage (Fig. 3b and Supplementary Video 7) and compared the automated measurements to manual image
analysis (Fig. 3c). In this data set, eight embryos reached the blastocyst stage with good morphology (Fig. 3d, top). The automated measurements were closely matched to the manual measurements, and all eight embryos were correctly predicted to reach the blastocyst stage by both methods. Two embryos reached the blastocyst stage with poor morphology (poor quality of inner cell mass; Fig. 3d, bottom). For these embryos, manual assessment indicated that one would reach the blastocyst stage and one would arrest, whereas the automated assessment predicted that both would arrest. Finally, four embryos arrested before the blastocyst stage; all four were correctly predicted to arrest by both methods. These results suggest that a systematic, automated prediction of blastocyst formation can be achieved as early as the 4-cell stage.

Gene expression and cytokinesis

To assess whether imaging parameters that predict success or failure of development are associated with transcriptional patterns, we
[Figure 3 image panels: (a) original and tracked images at frames 15, 125, 127, 269, 270, 276 and 314; (b) the set of embryos analyzed; (c) duration of first cytokinesis (min, top) and time between 1st and 2nd mitoses (h, bottom) plotted per embryo number for good-morphology blastocysts, poor-morphology blastocysts and arrested embryos, with legend entries "Window for blast", "Manual" and "Automatic"; (d) good-morphology blastocyst (embryo no. 6) and poor-morphology blastocyst (embryo no. 9).]
Figure 3 Automated image analysis confirms the utility of the imaging parameters to predict blastocyst formation. (a) Results of tracking algorithm for a single embryo. Images were captured every 5 min, and only a select group is displayed. The top row shows frames from the original time-lapse image sequence, and the bottom row shows the overlaid tracking results. (b) Set of 14 embryos that were analyzed (Supplementary Video 6). One embryo was excluded as it was floating and out of focus. (c) Comparison of image analysis by a human observer and automated analysis of the duration of cytokinesis (top) and of the time between first and second mitoses (bottom). There is excellent agreement between the two methods for embryos that reached the blastocyst stage with good morphology. The few cases of disagreement occurred mostly for abnormal embryos and were caused by unusual behavior that is difficult to characterize by either method. The gray shaded region shows the window for blastocyst prediction. The two methods agreed on blastocyst prediction except in the case of embryo 10, which was predicted as abnormal by the automated method and normal by the manual method. (d) Comparison of blastocysts with good (top) and bad (bottom) morphology.
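The simplified timing measurements used in the automated analysis (elongation of more than 15% before the 1-cell to 2-cell division, and counting cells per frame) can be illustrated with the short Python sketch below. It is not the authors' tracking algorithm: it assumes that per-frame major-axis lengths and cell counts have already been estimated, and that elongation is measured relative to the first frame. Function names and input values are hypothetical. The 5 min frame interval follows the Figure 3 caption.

```python
def cytokinesis_duration_min(major_axis, cell_count,
                             frame_interval_min=5.0, threshold=0.15):
    """Minutes of continuous elongation just before the 1-cell to 2-cell division.

    major_axis -- per-frame major-axis length of the tracked 1-cell embryo
    cell_count -- per-frame number of cells in the fitted model
    """
    division = next((i for i, n in enumerate(cell_count) if n >= 2), None)
    if division is None:
        return None  # no division observed
    baseline = major_axis[0]  # assumption: first frame is the reference length
    frames = 0
    i = division - 1
    # Walk backwards from the division while the cell counts as elongated.
    while i >= 0 and major_axis[i] > baseline * (1 + threshold):
        frames += 1
        i -= 1
    return frames * frame_interval_min

def hours_between_counts(cell_count, n_from, n_to, frame_interval_min=5.0):
    """Hours between the first frame with n_from cells and the first with n_to cells."""
    first_from = next((i for i, n in enumerate(cell_count) if n >= n_from), None)
    first_to = next((i for i, n in enumerate(cell_count) if n >= n_to), None)
    if first_from is None or first_to is None:
        return None
    return (first_to - first_from) * frame_interval_min / 60.0

# Hypothetical tracking output, for illustration only.
axis = [100, 101, 100, 118, 125, 130, 60, 60]
cells = [1, 1, 1, 1, 1, 1, 2, 2]
duration = cytokinesis_duration_min(axis, cells)  # three elongated frames
```

With images every 5 min, the resolution of both measurements is limited to one frame interval, which is small relative to the 33 min cytokinesis window.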
analyzed the expression of nine putative cytokinesis-related genes in both normal and arrested embryos (Supplementary Table 1 and Supplementary Data Set 2). Aberrant cytokinesis seen in the time-lapse image data correlated strongly with reduced expression of key cytokinesis genes. Like their morphological phenotypes, the gene expression profiles of embryos that arrested were diverse and variable. For example, an arrested 2-cell embryo that displayed a slightly prolonged cytokinesis and an unusual plasma membrane ruffling (Supplementary Video 8) expressed all nine cytokinesis genes examined at significantly lower levels (P < 0.05) compared with developmentally normal embryos at the same stage (Fig. 4a). On the other hand, an arrested 4-cell embryo (Fig. 4b) that underwent prolonged, unipolar cytokinesis during its first division (Supplementary Video 9) showed significantly reduced expression (P < 0.05) of only two cytokinesis genes, ANLN and ECT2. We also examined genes in categories other than cytokinesis in arrested and normal embryos at the 1- and 2-cell stage. For this purpose we calculated average expression levels for each of 52 additional genes that included housekeeping genes, germ cell markers, maternal factors, EGA markers, trophoblast markers, inner cell mass markers, pluripotency markers, epigenetics regulators, transcription factors, hormone receptors and others based primarily on published data in model organisms1,7,8. Normal 1-cell embryos were identified as having undergone successful fusion of the two pronuclei (syngamy) on day 1 and displaying a round, firm appearance. Eighteen of the 52 genes showed statistically significant differences in expression between normal and arrested embryos (P < 0.05), with certain gene categories affected more severely than others (Fig. 4c). In abnormal embryos, expression of most of the housekeeping genes, hormone receptors and maternal factors was not appreciably altered, but many genes involved in cytokinesis and in microRNA (miRNA) biogenesis, such as DGCR8, DICER1 and TARBP2, were expressed at highly reduced levels. Two of the most severely affected genes, CPEB1 and SYMPK, belong to the same molecular pathway, which regulates maternal mRNA storage and reactivation by modulating the length of poly(A) tails on oocyte/embryo transcripts25.
Figure 4 panels (images not reproduced). (a) Time-lapse frames and relative expression of the cytokinesis genes ANLN, CFL1, DIAPH1, DIAPH2, DNM2, ECT2, MKLP2, MYLC2 and RHOA in an arrested 2-cell embryo versus normal 2-cell embryos (n = 9). (b) Time-lapse frames and relative expression of the same genes in an arrested 4-cell embryo versus normal 4-cell embryos (n = 12). (c) Radar plot comparing normal (n = 5) and arrested (n = 6) 1- and 2-cell embryos across gene categories: cytokinesis, transcription factor, receptor, pluripotency, housekeeping, miRNA biogenesis, RNA processing and maternal effect.
Figure 4 Distinct gene expression profiles of developmentally delayed or arrested embryos. (a) An arrested 2-cell embryo that showed abnormal membrane ruffling during the first cytokinesis had significantly (P < 0.05) reduced expression levels of all cytokinesis genes tested. Scale bar, 50 μm. (b) An arrested 4-cell embryo that underwent aberrant cytokinesis with a one-sided cytokinesis furrow and extremely prolonged cytokinesis during the first division showed lower expression of ANLN and ECT2. Scale bar, 50 μm. (c) The average expression levels of 52 genes from six abnormal 1- to 2-cell embryos and five normal 1- to 2-cell embryos were plotted in a radar graph on a logarithmic scale. Arrested embryos in general expressed less mRNA than normal embryos, with genes related to cytokinesis, RNA processing and miRNA biogenesis most severely affected. Genes highlighted in orange with an asterisk showed a statistically significant difference (P < 0.05) between normal and abnormal embryos as determined by the Mann-Whitney test.
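The caption above names the Mann-Whitney test for comparing normal and arrested embryos. As a rough illustration of what such a comparison involves for groups this small (n = 5 and n = 6), the sketch below computes an exact two-sided P value by enumerating every way of assigning ranks to the first group. The function names and the sample values are hypothetical, and the paper does not state whether an exact or an asymptotic version of the test was used.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_exact_p(group_a, group_b):
    """Exact two-sided P value from the permutation distribution of the rank sum."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    expected = n_a * (len(ranks) + 1) / 2  # rank sum of group A under the null
    observed_deviation = abs(sum(ranks[:n_a]) - expected)
    total = extreme = 0
    for subset in combinations(range(len(ranks)), n_a):
        total += 1
        if abs(sum(ranks[i] for i in subset) - expected) >= observed_deviation - 1e-12:
            extreme += 1
    return extreme / total


# Made-up relative expression values, for illustration only.
normal = [9.1, 8.7, 9.4, 8.9, 9.0]
arrested = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]
p_value = mann_whitney_exact_p(normal, arrested)
print(f"P = {p_value:.4f}")
```

With complete separation between groups of five and six, only 2 of the 462 possible rank assignments are as extreme as the observed one, so even these small sample sizes can yield P < 0.05.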
Embryonic stage–specific patterns
To further correlate our three imaging parameters with gene expression, we measured expression of two slightly different but overlapping sets of 96 genes (Supplementary Table 1) at multiple time points of embryo development. Time-lapse imaging was used to aid the identification and classification of normal and abnormal embryos because occasionally embryos that developed and behaved abnormally would appear morphologically normal at the time of sample collection (Fig. 2f and Supplementary Video 10). By analyzing the gene expression patterns of 141 of the 242 embryos that had apparently normal development as assessed by imaging (and without any prior assumptions), we derived four unique embryonic stage–specific patterns (ESSPs) of gene expression (Fig. 5a, Supplementary Fig. 5 and Supplementary Table 2). ESSP1 describes maternally inherited oocyte mRNAs destined for degradation. These transcripts were expressed at high levels at the zygote stage and declined during development to the blastocyst stage. Their half-life was just 21 h (Supplementary Fig. 6). ESSP2 includes embryonic-activated genes, first transcribed on day 3, at approximately the 8-cell stage. ESSP3 comprises genes not expressed until the blastocyst stage. Finally, ESSP4 includes persistent transcripts that maintained stable expression relative to the reference genes from the zygote to blastocyst stages. The half-life of ESSP4 genes was 193 h, more than nine times longer than that of ESSP1 genes (21 h) (Supplementary Fig. 6). Fourteen of the 96 genes analyzed did not fit into any of the four ESSP patterns and were labeled ‘undefined’ (Supplementary Table 2). We confirmed the four patterns of gene expression in two additional independent experimental sets using both single, intact normal embryos and isolated single blastomeres (Supplementary Fig. 7).
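The half-lives quoted above (21 h for ESSP1, 193 h for ESSP4) can be related to an expression time course with a simple fit. The sketch below assumes first-order (exponential) decay and a least-squares fit on log-transformed values; the paper does not describe its fitting procedure in this section, so both the model and the function name are assumptions, and the time series is synthetic.

```python
import math


def half_life_hours(times_h, expression):
    """Fit ln(expression) = intercept - k * t by least squares; return ln(2) / k."""
    logs = [math.log(value) for value in expression]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_log) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("expression does not decline; half-life is undefined")
    return math.log(2) / -slope


# Synthetic series generated with a 21 h half-life (not data from the study).
times = [0.0, 24.0, 48.0, 72.0, 96.0]
series = [10.0 * 0.5 ** (t / 21.0) for t in times]
estimate = half_life_hours(times, series)

# Ratio of the reported ESSP4 and ESSP1 half-lives.
fold_difference = 193.0 / 21.0
print(round(estimate, 2), round(fold_difference, 1))
```

The ratio 193/21 is about 9.2, which matches the statement that the ESSP4 half-life is more than nine times that of ESSP1.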
We compared our qRT-PCR data in 1-cell and 2-cell embryos to published microarray data on human oocytes23 (Supplementary Data Set 3). In ref. 19, the expression values for individual genes in the microarray data were normalized against the geometric mean of GAPDH and RPLP0, which were the same reference genes used in our studies. Among the 86 genes that we analyzed, almost every gene that was expressed in 1- and 2-cell embryos was also expressed or upregulated in oocytes, with the exception of TACC3 and H2AFZ. In addition, by dividing the genes into low-, medium- and high-expression genes, we observed good correlation between the two data sets among all gene sets, especially between the highly expressed genes. A comparison of our data to a study of cell cycle genes expressed in the 8-cell human embryo26 showed agreement for the two genes that were assayed in both studies, AURKA and CCNA1.
Individual blastomeres show cell autonomy
Individual blastomeres in an intact early human embryo are usually assumed to be synchronized in constitution and developmental programming, and developmental success or failure is considered a property of the whole embryo. We measured expression of ten maternal transcript genes and ten embryonic genes in single blastomeres of 36 normal and abnormal embryos between the 2- and 10-cell stage. Notably, this experiment revealed a subset of normal embryos that contained blastomeres whose gene expression signatures corresponded to different developmental ages. Among 24 morphologically and developmentally normal embryos, 6 (25%) contained blastomeres of different transcriptional ages
Figure 5 panels (images not reproduced). (a) Relative expression (n = 141) of ESSP1, ESSP2, ESSP3 and ESSP4 across the stages 1c, 2c, 3c, 4c, 6c, 8c, 9c, M and B. (b) Maternal/embryonic ratio for 2-cell, 4-cell, 6-cell and 8-cell embryos, with examples of abnormal embryos.
Figure 5 Gene expression analysis of single human embryos and blastomeres. (a) Genes analyzed in human embryos are defined by four distinct ESSPs. Relative expression level of an ESSP was calculated by averaging the expression levels of genes with similar expression patterns. (b) The ratio of maternal to embryonic genes in embryos changes during preimplantation development (left). Some embryos contained blastomeres of different developmental ages (right). The expression levels of the maternal and embryonic programs were calculated by averaging the relative expression of ten ESSP1 and ten ESSP2 markers, respectively.
(Fig. 5b). Among 12 abnormal embryos arrested between the 2- and 10-cell stage, this phenomenon was detected in 8 (66%). In some cases (e.g., Fig. 5b, right), the transcriptional profile of individual blastomeres varied to such an extent that some blastomeres in both normal and abnormal embryos may have been arrested for a considerable amount of time while the others progressed in development.
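The blastomere comparison described above can be sketched as a ratio of averaged maternal to averaged embryonic marker expression, computed per blastomere and then compared within an embryo. In the sketch below the gene names are a subset of the panels listed in the Online Methods, but the expression values, the helper names and the fold threshold used to call an embryo "mixed" are all hypothetical; the paper does not give a numerical cutoff here.

```python
from statistics import mean

# Subsets of the maternal and embryonic panels named in the Online Methods.
MATERNAL = ("DAZL", "GDF9", "ZAR1", "ZP1")
EMBRYONIC = ("CCNA1", "EIF1AX", "LSM3", "PABPC1")


def maternal_embryonic_ratio(expression):
    """Ratio of the averaged maternal program to the averaged embryonic program."""
    maternal = mean(expression[gene] for gene in MATERNAL)
    embryonic = mean(expression[gene] for gene in EMBRYONIC)
    return maternal / embryonic


def has_mixed_ages(ratios, fold_threshold=4.0):
    """Flag an embryo whose blastomere ratios differ by more than fold_threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return max(ratios) / min(ratios) > fold_threshold


def blastomere(maternal_level, embryonic_level):
    """Build a made-up expression profile with one level per program."""
    profile = {gene: maternal_level for gene in MATERNAL}
    profile.update({gene: embryonic_level for gene in EMBRYONIC})
    return profile


# Two blastomeres that have activated embryonic genes and one that still
# looks maternal (all values invented for illustration).
embryo = [blastomere(1.0, 4.0), blastomere(1.2, 3.6), blastomere(6.0, 0.5)]
ratios = [maternal_embryonic_ratio(cell) for cell in embryo]
print(has_mixed_ages(ratios))

# Fractions of embryos with mixed-age blastomeres, from the counts in the text.
print(6 / 24, 8 / 12)
```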
DISCUSSION
We have carried out a large-scale study of preimplantation human embryos that correlated time-lapse image analysis and gene expression profiling with development from the zygote to the blastocyst. Our results shed light on human embryo development and provide an approach for predicting which embryos will reach the blastocyst stage using three dynamic imaging parameters (Fig. 6). First, we showed that human embryos that develop to the blastocyst stage follow a strict and predictable developmental timeline that is correlated with predictable gene expression patterns. This timeline enabled us to derive an algorithm to automatically measure our imaging parameters and to predict blastocyst formation systematically and reliably by the 4-cell stage, before EGA. The finding that embryo development can be predicted at this early stage suggests that success or failure is likely to be determined at least in part by inheritance of maternal transcripts, which we observed to be expressed at altered levels in abnormal embryos. Other factors that may contribute to abnormal development before EGA include inherited genetic mutations, aneuploidy, environmental insult to germ cells, events during fertilization and sperm-related factors27–29.
Second, we found that gene expression in preimplantation human embryos is cell autonomous and follows four distinct patterns. Maternally inherited transcripts in ESSP1 have a half-life after fertilization of ~21 h; ESSP2 and ESSP3 are expressed at EGA and thereafter, respectively; and ESSP4 genes are stably expressed, with a half-life of ~193 h. Previous studies of gene expression in the human embryo from the oocyte to day 3 have not analyzed single blastomeres, primarily because of the technical difficulty of obtaining single-cell data1,26. At the whole-embryo level, maintenance of maternal mRNA expression profiles and failure to progress to EGA has never been observed past the first cell division1. Our single-cell gene expression analysis shows that individual blastomeres in an embryo can differ, with some maintaining maternal mRNAs whereas others progress to EGA. The frequency of this observation (in >25% of embryos) indicates that individual blastomeres in human embryos are cell autonomous. Moreover, the observation that maternal mRNAs can be maintained even after 3 days of development suggests that the degradation of the maternal programs is not simply a passive process.
Figure 6 Proposed model for human embryo development. Human embryos begin life with a set of oocyte RNAs inherited from the mother. After fertilization, a subset of maternal RNAs specific to the egg (ESSP1) must be degraded as the transition from oocyte to embryo begins. As development continues, other RNAs are partitioned equally to each blastomere (ESSP4). At EGA, ESSP2 genes are transcribed in a cell-autonomous manner. During the cleavage divisions, embryonic blastomeres may arrest or progress independently. ‘Feature extraction’ indicates the three imaging parameters for predicting successful development to the blastocyst stage: cytokinesis, the time between 1st and 2nd mitoses, and the time between 2nd and 3rd mitoses.
Figure 6 schematic (image not reproduced): a time line from oocyte to embryo transfer with stage, molecular and imaging tracks. Interval labels: 24 h, 15 min, 11 h, 1 h, 18 h, 24 h, 24 h. Molecular labels: oocyte provides mRNAs; onset of degradation of ESSP1 mRNA; each blastomere inherits half of stable mRNA (ESSP4); embryonic gene activation (ESSP2); blastomeres are cell autonomous. Imaging labels: automated tracking; feature extraction (duration of 1st cytokinesis, time between mitoses); transfer prior to EGA; embryo transfer.
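The Figure 6 caption lists three imaging parameters used to predict development to the blastocyst stage. One simple way such a predictor could be organized is as a set of acceptance windows, one per parameter, with an embryo predicted to succeed only if all three measurements fall inside their windows. The sketch below illustrates that idea only: the window bounds are placeholders chosen loosely around the intervals shown in the Figure 6 time line (15 min, 11 h, 1 h), not the thresholds used in the study, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Closed interval of accepted values for one imaging parameter."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Placeholder windows, NOT the study's thresholds.
ILLUSTRATIVE_WINDOWS = {
    "cytokinesis_min": Window(0.0, 30.0),    # duration of first cytokinesis
    "mitosis_1_to_2_h": Window(8.0, 14.0),   # time between 1st and 2nd mitoses
    "mitosis_2_to_3_h": Window(0.0, 3.0),    # time between 2nd and 3rd mitoses
}


def predict_blastocyst(parameters, windows=ILLUSTRATIVE_WINDOWS):
    """Return True only if every parameter lies inside its window."""
    return all(windows[name].contains(parameters[name]) for name in windows)


typical = {"cytokinesis_min": 15.0, "mitosis_1_to_2_h": 11.0, "mitosis_2_to_3_h": 1.0}
prolonged = {"cytokinesis_min": 70.0, "mitosis_1_to_2_h": 11.0, "mitosis_2_to_3_h": 1.0}
print(predict_blastocyst(typical), predict_blastocyst(prolonged))
```

Requiring all three windows to hold is a design choice made for this sketch; any real classifier would need its bounds fitted to, and validated on, measured embryo data.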
Previous studies in the mouse have sought to understand how and when the first cell fate decisions in the embryo are established. It has been suggested that the first cleavage division itself determines the blastocyst axis and that subsequently all blastomeres are not equivalent30–31. These studies may imply that cell lineage fate is determined in the first division32; more likely is that the first cleavage division affects the probabilities of fates in subsequent divisions33. Our results support the conclusion that some aspects of embryo fate, especially success or failure to reach the blastocyst stage, are determined very early in development and likely inherited from the oocyte, as described above. Moreover, they imply that each cell-autonomous blastomere is capable of contributing, or not, to subsequent lineages.
Third and finally, given that embryo developmental potential can be assessed with a combination of cytokinetic and mitotic parameters in the first two cleavage divisions, it may be feasible to translate these basic studies to clinical applications. Current morphological and growth criteria that are commonly used to assess embryo viability on day 3 in assisted reproduction clinics may both underestimate and overestimate embryo potential, with well-documented consequences, such as multiple births, the need for fetal reduction and miscarriage34. Given the uncertainties associated with evaluation at day 3, some clinics have turned to longer culture to assess embryo potential, as embryos transferred at the blastocyst stage have a higher implantation rate compared with embryos transferred at day 3 (refs. 13,35–38). However, this practice involves prolonged in vitro culture and may increase the chance of altered gene expression and epigenetic inheritance39–41. Thus, a method to predict blastocyst formation at day 2 could improve IVF outcomes by increasing pregnancy rates while reducing the risk of multiple gestations.
This question will be evaluated in future clinical studies.
Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
Note: Supplementary information is available on the Nature Biotechnology website.
Acknowledgments
We thank R. Raja for help with the microarray analysis and early imaging experiments, the members of the Reijo Pera laboratory for technical assistance and discussions, S. Walker for advice regarding the cell tracking algorithm and K. Salisbury for providing K.E.L. with hardware and software resources. We acknowledge funding contributions from the Stanford Institute for Stem Cell Biology and Regenerative Medicine, a generous, anonymous donor and the March of Dimes (6-FY06-326).
AUTHOR CONTRIBUTIONS
C.C.W. and K.E.L. performed and designed experiments, analyzed data and assisted in writing and editing of the manuscript. K.E.L. designed cell tracking algorithms. N.L.B. assisted in performing the experiments. B.B., N.L.B. and C.J.D.J. assisted in analyzing data and editing the manuscript. T.M.B. and K.E.L. designed and built the imaging instrumentation. T.M.B. and R.A.R.P. designed experiments, interpreted results and assisted in writing and editing the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.
Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Dobson, A.T. et al. The unique transcriptome through day 3 of human preimplantation development. Hum. Mol. Genet. 13, 1461–1470 (2004).
2. Braude, P., Bolton, V. & Moore, S. Human gene expression first occurs between the four- and eight-cell stages of preimplantation development. Nature 332, 459–461 (1988).
3. Memili, E. & First, N.L. Zygotic and embryonic expression in cow: a review of timing and mechanisms of early gene expression as compared with other species. Zygote 8, 87–96 (2000).
4. Beaujean, N. et al. Effect of limited DNA methylation reprogramming in the normal sheep embryo on somatic cell nuclear transfer. Biol. Reprod. 71, 185–193 (2004).
5. Fulka, H., Mrazek, M., Tepla, O. & Fulka, J. Jr. DNA methylation pattern in human zygotes and developing embryos. Reproduction 128, 703–708 (2004).
6. Duranthon, V., Watson, A.J. & Lonergan, P. Preimplantation embryo programming: transcription, epigenetics, and culture environment. Reproduction 135, 141–150 (2008).
7. Wang, Q.T. et al. A genome-wide study of gene activity reveals developmental signaling pathways in the preimplantation mouse embryo. Dev. Cell 6, 133–144 (2004).
8. Zeng, F. & Schultz, R. RNA transcript profiling during zygotic gene activation in the preimplantation mouse embryo. Dev. Biol. 283, 40–57 (2005).
9. Vanneste, E. et al. Chromosome instability is common in human cleavage-stage embryos. Nat. Med. 15, 577–583 (2009).
10. Macklon, N.S., Geraedts, J.P.M. & Fauser, B.C.J.M. Conception to ongoing pregnancy: the “black box” of early pregnancy loss. Hum. Reprod. Update 8, 333–343 (2002).
11. Evers, J.L. Female subfertility. Lancet 360, 151–159 (2002).
12. French, D.B., Sabanegh, E.S. Jr., Goldfarb, J. & Desai, N. Does severe teratozoospermia affect blastocyst formation, live birth rate, and other clinical outcome parameters in ICSI cycles? Fertil. Steril. 93, 1097–1103 (2010).
13. Gardner, D.K., Lane, M. & Schoolcraft, W. Culture and transfer of viable blastocysts: a feasible proposition for human IVF. Hum. Reprod. 15 (Suppl 6), 9–23 (2000).
14. Payne, D., Flaherty, S.P., Barry, M.F. & Matthews, C.D. Preliminary observations on polar body extrusion and pronuclear formation in human oocytes using time-lapse video cinematography. Hum. Reprod. 12, 532–541 (1997).
15. Adjaye, J., Bolton, V. & Monk, M. Developmental expression of specific genes detected in high-quality cDNA libraries from single human preimplantation embryos. Gene 237, 373–383 (1999).
16. Assou, S. et al. The human cumulus–oocyte complex gene-expression profile. Hum. Reprod. 21, 1705–1719 (2006).
17. Kimber, S.J. et al. Expression of genes involved in early cell fate decisions in human embryos and their regulation by growth factors. Reproduction 135, 635–647 (2008).
18. Nagy, Z.P., Liu, J., Joris, H., Devroey, P. & Steirteghem, A.V. Time-course of oocyte activation, pronucleus formation and cleavage in human oocytes fertilized by intracytoplasmic sperm injection. Hum. Reprod. 9, 1743–1748 (1994).
19. Fenwick, J., Platteau, P., Murdoch, A.P. & Herbert, M. Time from insemination to first cleavage predicts developmental competence of human preimplantation embryos in vitro. Hum. Reprod. 17, 407–412 (2002).
20. Lundin, K., Bergh, C. & Hardarson, T. Early embryo cleavage is a strong indicator of embryo quality in human IVF. Hum. Reprod. 16, 2652–2657 (2001).
21. Lemmen, J.G., Agerholm, I. & Ziebe, S. Kinetic markers of human embryo quality using time-lapse recordings of IVF/ICSI-fertilized oocytes. Reprod. Biomed. Online 17, 385–391 (2008).
22. Bermudez, M.G. et al. Expression profiles of individual human oocytes using microarray technology. Reprod. Biomed. Online 8, 325–337 (2004).
23. Kocabas, A.M. et al. The transcriptome of human oocytes. Proc. Natl. Acad. Sci. USA 103, 14027–14032 (2006).
24. Rienzi, L. et al. Significance of morphological attributes of the early embryo. Reprod. Biomed. Online 10, 669–681 (2005).
25. Bettegowda, A. & Smith, G.W. Mechanisms of maternal mRNA regulation: implications for mammalian early embryonic development. Front. Biosci. 12, 3713–3726 (2007).
26. Kiessling, A.A. et al. Evidence that human blastomere cleavage is under unique cell cycle control. J. Assist. Reprod. Genet. 26, 187–195 (2009).
27. Schatten, H. & Sun, Q. The role of centrosomes in fertilization, cell division and establishment of asymmetry during embryo development. Semin. Cell Dev. Biol. 21, 174–184 (2010).
28. Ostermeier, G.C., Miller, D., Huntriss, J.D., Diamond, M.P. & Krawetz, S.A. Reproductive biology: delivering spermatozoan RNA to the oocyte. Nature 429, 154 (2004).
29. Hammoud, S.S. et al. Distinctive chromatin in human sperm packages genes for embryo development. Nature 460, 473–478 (2009).
30. Zernicka-Goetz, M. Patterning of the embryo: the first spatial decisions in the life of a mouse. Development 129, 815–829 (2002).
31. Plusa, B. et al. The first cleavage of the mouse zygote predicts the blastocyst axis. Nature 434, 391–395 (2005).
32. Hiiragi, T., Louvet-Vallee, S., Solter, D. & Maro, B. Embryology: does prepatterning occur in the mouse egg? Nature 442, E3–4 (2006).
33. Zernicka-Goetz, M. The first cell-fate decisions in the mouse embryo: destiny is a matter of both chance and choice. Curr. Opin. Genet. Dev. 16, 406–412 (2006).
34. Racowsky, C. High rates of embryonic loss, yet high incidence of multiple births in human ART: Is this paradoxical? Theriogenology 57, 87–96 (2002).
35. Milki, A.A., Hinckley, M., Fisch, J., Dasig, D. & Behr, B. Comparison of blastocyst transfer with day 3 embryo transfer in similar patient populations. Fertil. Steril. 73, 126–129 (2000).
36. Gardner, D.K., Lane, M., Stevens, J., Schlenker, T. & Schoolcraft, W.B. Blastocyst score affects implantation and pregnancy outcome: towards a single blastocyst transfer. Fertil. Steril. 73, 1155–1158 (2000).
37. Gardner, D.K. & Lane, M. Towards a single embryo transfer. Reprod. Biomed. Online 6, 470–481 (2003).
38. Gardner, D.K. et al. Single blastocyst transfer: a prospective randomized trial. Fertil. Steril. 81, 551–555 (2004).
39. Manipalviratn, S., DeCherney, A. & Segars, J. Imprinting disorders and assisted reproductive technology. Fertil. Steril. 91, 305–315 (2009).
40. Niemitz, E.L. & Feinberg, A. Epigenetics and assisted reproductive technology: a call for investigation. Am. J. Hum. Genet. 74, 599–609 (2004).
41. Horsthemke, B. & Ludwig, M. Assisted reproduction: the epigenetic perspective. Hum. Reprod. Update 11, 473–482 (2005).
ONLINE METHODS
Sample source. All embryos used in this study were supernumerary embryos from the Lutheran General Hospital IVF Program that were donated to research by informed consent. Embryos were moved to the Reproductive Medicine Center at the University of Minnesota after the Lutheran General Hospital IVF Program closed in 2002. Before the program closed, the embryos were collected over several years. Oocytes were fertilized and cryopreserved by multiple embryologists. The average number of embryos per patient in our study was 3, and all age groups encountered in a routine IVF center were included. All embryos were generated by IVF, not intracytoplasmic sperm injection, so they were derived from sperm able to penetrate the cumulus, zona and oolemma and form pronuclei. Stimulation protocols were standard long Lupron protocols. The embryos were cryopreserved by placing them in freezing medium (1.5 M 1,2 propanediol + 0.2 M sucrose) for 25 min at 22 ± 2 °C and then freezing them using a slow-freeze protocol (−1 °C/min to −6.5 °C; hold for 5 min; seed; hold for 5 min; −0.5 °C/min to −80 °C; plunge in liquid nitrogen). The embryos were approved for research by the University of Minnesota Internal Review Board and the Stanford University Internal Review Board and Stem Cell Research Oversight Committee. No protected health information could be associated with the embryos. We chose to use this embryo set after consideration of alternatives. Three sources of IVF embryos are theoretically available: (i) day 1 embryos obtained from a clinic for immediate analysis without cryopreservation, (ii) clinical embryos destined for transfer for reproductive purposes if our imaging system was set up in a clinic and (iii) cryopreserved embryos that are validated with regard to key developmental landmarks. Clinical practices and general guidelines pose considerable practical obstacles to alternatives (i) and (ii).
Moreover, ‘fresh’ embryos donated for research are generally abnormal in development. We therefore chose to study a large set of cryopreserved zygotes available for research. The following considerations suggest that cryopreservation did not adversely affect our results. First, the timing of developmental landmarks was similar to that of normal embryos, including cleavage to 2 cells (occurred early day 2), onset of RNA degradation (occurred on days 1 to 3), cleavage to 4 and 8 cells (occurred on late day 2 and day 3, respectively), EGA (on day 3 at the 8-cell stage) and formation of the morula and blastocyst (occurred on days 4 and 5, respectively)1,2. Second, the fraction of embryos that reached the blastocyst stage is typical of IVF embryos in a clinical setting12,13,42. This is most likely because the embryos were cryopreserved at the 2PN stage and represented the spectrum of embryos encountered in an IVF clinic. No triage was done before cryopreservation. Third, embryos frozen at the 2PN stage have been shown to possess similar potential for development, implantation, clinical pregnancy and delivery compared with fresh embryos43–45. Other studies have also shown similar results for frozen oocytes 24,46. Fourth, we focused on parameters that were not dependent on time of fertilization or thaw time. As described in the manuscript, the first parameter, duration of the first cytokinesis, is short (~10–15 min) and is not dependent on the time of fertilization. The other parameters we measured are relative to this initial measurement point and compared between embryos that succeed in developing to blastocyst and those that do not. Control for cryopreservation. In addition to the observations described above, which support the use of cryopreserved embryos, we also examined a small set of embryos obtained from the Stanford IVF clinic that were not cryopreserved as a control. These embryos were 3PN (triploid) starting at the single-cell stage. 
3PN embryos have been shown to follow the same time line of landmark events as normal fresh embryos through at least the first three cell cycles47–49. These embryos were imaged before our main experiments to validate the imaging systems (but for technical reasons were not followed out to blastocyst). Out of this set of fresh embryos, three of the embryos followed a similar time line of events as our cryopreserved 2PN embryos, with duration of cytokinesis ranging from 15 to 30 min, time between first and second mitoses ranging from 9.6 to 13.8 h and time between second and third mitoses ranging from 0.3 to 1.0 h. However, in seven of the embryos we observed a unique cytokinesis phenotype that was characterized by the simultaneous appearance of three cleavage furrows, a slightly prolonged cytokinesis and ultimate separation into three daughter cells (Fig. 2e, fourth panel, and Supplementary Video 5). These embryos had a duration of cytokinesis
ranging from 15 to 70 min (characterized as the time between the initiation of the cleavage furrows until complete separation into three daughter cells), time between first and second mitoses (3-cell to 4-cell) ranging from 8.7 to 12.7 h, and time between second and third mitoses (4-cell to 5-cell) ranging from 0.3 to 2.6 h. This observation, together with the diverse range of cytokinesis phenotypes displayed by abnormal embryos, suggests that our cryopreserved embryos are not developmentally delayed by the cryopreservation process and behave similarly to fresh zygotes that cleave to two blastomeres. The data also demonstrate that abnormal cytokinesis may be associated with underlying abnormalities in chromosomal composition (as demonstrated by zygotes that form three blastomeres initially). This hypothesis is consistent with previous observations correlating aneuploidy with morphology50–52. Human embryo culture and microscopy. Human embryos were thawed by removing the cryovials from the liquid nitrogen storage tank and thawing them at 22 ± 2 °C. Once a vial was thawed, it was opened and the embryos were visualized under a dissecting microscope. The contents of the vial were then poured into the bottom of a 3003 culture dish. The embryos were located in the drop and the survival of each embryo was assessed and recorded. At 22 ± 2 °C, the embryos were transferred to a 3037 culture dish containing 1.0 M 1,2 propanediol + 0.2 M sucrose for 5 min, then 0.5 M 1,2 propanediol + 0.2 M sucrose for 5 min and 0.0 M 1,2 propanediol + 0.2 M sucrose for 5 min. Subsequently, embryos were cultured in Quinn’s Advantage Cleavage Medium (Cooper Surgical) supplemented with 10% Quinn’s Advantage Serum Protein Substitute (SPS; Cooper Surgical) between days 1 and 3, and Quinn’s Advantage Blastocyst Medium (Cooper Surgical) with 10% SPS after day 3 using microdrops under oil.
All of the experiments used the same type of cleavage-stage medium, except for two stations during the first experiment, which used a Global medium (LifeGlobal). In this small subset (12 embryos), the embryos exhibited a slightly lower blastocyst formation rate (3 out of 12, or 25%) but the sensitivity and specificity of our predictive parameters were both 100% for this group. Time-lapse imaging was performed on multiple systems to accommodate concurrent analysis of multiple samples as well as to validate the consistency of the data across different platforms. The systems consisted of seven individual microscopes: (i) two modified Olympus IX-70/71 microscopes equipped with Tokai Hit heated stages, white-light Luxeon LEDs, and an aperture for dark-field illumination; (ii) two modified Olympus CKX-40/41 microscopes equipped with heated stages, white-light Luxeon LEDs, and Hoffman Modulation Contrast illumination (note: these systems were used only during the first of four experiments after it was decided that dark-field illumination was preferable for measuring the parameters); and (iii) a custom-built three-channel miniature microscope array that fits inside a standard incubator, equipped with white-light Luxeon LEDs and apertures for dark-field illumination. We observed no important difference in developmental behavior, blastocyst formation rate or gene expression profiles between embryos cultured on these different systems; indeed, our parameters for blastocyst prediction were consistent across multiple systems and experiments. The light intensity for all systems was considerably lower than the light typically used on an assisted reproduction microscope owing to the low power of the LEDs (relative to a typical 100 W halogen bulb) and high sensitivity of the camera sensors.
Using an optical power meter, we determined that the power of a typical assisted-reproduction microscope (Olympus IX-71 Hoffman Modulation Contrast) at a wavelength of 473 nm ranges from roughly 7 to 10 mW depending on the magnification, whereas the power of our imaging systems was measured to be between 0.2 and 0.3 mW at the same wavelength. Images were captured at a 1 s exposure time every 5 min for up to 5 or 6 d, resulting in ~24 min of continuous light exposure. At a power of 0.3 mW, this is equivalent to roughly 1 min of exposure under a typical assisted-reproduction microscope. To track the identity of each embryo during correlated imaging and gene expression experiments, we installed a video camera on the stereomicroscope and recorded the process of sample transfer during media change and sample collection. We performed control experiments with mouse preimplantation embryos (n = 56) and a small subset of human embryos (n = 22), and observed no significant difference (P = 0.96) in the blastocyst formation rate between imaged and control embryos.
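The light-exposure figures quoted in this section can be checked with a short calculation. The sketch below counts the images taken at one 1 s exposure every 5 min and then scales the cumulative exposure by the ratio of optical powers. Treating exposure as equivalent when power multiplied by time is equal is an assumption of this sketch, and the function names are hypothetical.

```python
def cumulative_exposure_min(days, interval_min=5.0, exposure_s=1.0):
    """Total illuminated time in minutes for periodic image capture."""
    n_images = days * 24 * 60 / interval_min
    return n_images * exposure_s / 60.0


def equivalent_exposure_min(exposure_min, system_mw, reference_mw):
    """Minutes on a reference microscope delivering the same power x time."""
    return exposure_min * system_mw / reference_mw


five_days = cumulative_exposure_min(5)   # 1,440 images of 1 s each
six_days = cumulative_exposure_min(6)    # 1,728 images of 1 s each

# Upper system power (0.3 mW) against the 7 to 10 mW reference range.
equivalent_low_ref = equivalent_exposure_min(five_days, 0.3, 7.0)
equivalent_high_ref = equivalent_exposure_min(five_days, 0.3, 10.0)
print(five_days, six_days, equivalent_low_ref, equivalent_high_ref)
```

Five days of imaging gives 24 min of cumulative exposure, which at 0.3 mW corresponds to between about 0.7 and 1.0 min on a 7 to 10 mW microscope, consistent with the "roughly 1 min" stated above.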
doi:10.1038/nbt.1686
High-throughput qRT-PCR analysis. For single embryo or single blastomere qRT-PCR analysis, embryos were first treated with Acid Tyrode’s solution to remove the zona pellucida. To collect single blastomeres, the embryos were incubated in Quinn’s Advantage Ca2+ Mg2+–free medium with HEPES (Cooper Surgical) for 5–20 min at 37 °C with rigorous pipetting. Samples were collected directly into 10 μl of reaction buffer; subsequent one-step reverse transcription/pre-amplification reaction was performed as previously described53. Pooled 20× ABI assay-on-demand qRT-PCR primer and probe mix (Applied Biosystems) were used as gene-specific primers during the reverse transcription and pre-amplification reactions. High throughput qRT-PCR reactions were performed with Fluidigm Biomark 96.96 Dynamic Arrays as previously described53 using the ABI assay-on-demand qRT-PCR probes listed in Supplementary Data Set 4. All samples were loaded in three or four technical replicates. qRT-PCR data analysis was performed with qBasePlus (Biogazelle), Microsoft Excel and a custom-built software. Certain genes were omitted from data analysis owing to either poor data quality (e.g., poor PCR amplification curves) or consistent low to no expression in the embryos assessed. For the analysis of blastomere age, the maternal transcript panel used includes DAZL, GDF3, IFITM1, STELLAR, SYCP3, VASA, GDF9, PDCD5, ZAR1 and ZP1, whereas the embryonic gene panel includes ATF7IP, CCNA1, EIF1AX, EIF4A3, H2AFZ, HSPA1B, JARID1B, LSM3, PABPC1 and SERTAD1. The expression value of each gene relative to the reference genes GAPDH and RPLP0, as well as relative to the gene average, was calculated using the geNorm and ΔΔCt methods (Supplementary Data Sets 5–7)54,55. GAPDH and RPLP0 were selected as the reference genes for this study empirically based on the gene stability value and coefficient of variation: 1.18 and 46% for GAPDH and 1.18 and 34% for RPLP0. 
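The normalization against the reference genes GAPDH and RPLP0 described above can be sketched in a few lines. The example assumes perfect amplification efficiency (a doubling per cycle), in which case the geometric mean of the reference quantities corresponds to the arithmetic mean of their Ct values. It is a simplified stand-in for what geNorm-based tools such as qBasePlus compute, not a reproduction of the authors' pipeline, and the function names and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, reference_cts, efficiency=2.0):
    """Expression of a target relative to the geometric mean of reference genes.

    With a fixed amplification efficiency, the geometric mean of the reference
    quantities equals efficiency ** -(arithmetic mean of the reference Cts).
    """
    reference_ct = mean(reference_cts)
    return efficiency ** (reference_ct - ct_target)


def fold_change(ct_target_a, refs_a, ct_target_b, refs_b):
    """Delta-delta-Ct style fold change of sample A relative to sample B."""
    return relative_expression(ct_target_a, refs_a) / relative_expression(ct_target_b, refs_b)


# Made-up Ct values; each reference tuple is (GAPDH, RPLP0).
normal = relative_expression(24.0, (20.0, 22.0))
arrested = relative_expression(27.0, (20.5, 21.5))
change = fold_change(27.0, (20.5, 21.5), 24.0, (20.0, 22.0))
print(normal, arrested, change)
```

In this invented example both samples have a mean reference Ct of 21, so the three-cycle delay in the target corresponds to an eightfold lower relative expression in the arrested sample.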
These were the most stable among the ten housekeeping genes that we tested and well within range of a typical heterogeneous sample set56. Second, we observed that in single blastomeres, as expected, the amount of RPLP0 and GAPDH transcripts decreased by ~1 Ct value per division between 1-cell and 8-cell stage, congruent with expectations that each cell inherits approximately one-half of the pool of mRNA with each cleavage division, in the absence of new transcripts before EGA during the first 3 d of human development (Supplementary Fig. 8). Third, we noted that the expression level of these reference genes in single blastomeres remained stable from the 8-cell to morula stage, after EGA began. At the intact embryo level, the Ct values of both RPLP0 and GAPDH remained largely constant throughout development until the morula stage with a slight increase following in the blastocyst stage perhaps due to increased transcript levels in the greater numbers of blastomeres present. Most of the gene expression analysis performed in this study focused on developmental stages before the morula stage, however, when the expression level of the reference genes was extremely stable. Normal and arrested embryos. In the experiments described in this work, arrested embryos were defined as embryos that did not reach the expected developmental stage at the time of sample collection. For example, when a plate of embryos was removed from the imaging station on late day 2 for sample collection, any embryo that had reached 4-cell stage and beyond would be identified as normal, whereas those that failed to reach 4-cell stage would be labeled as arrested. These arrested embryos were categorized by the developmental stage at which they became arrested, such that an embryo with only 2 blastomeres on late day 2 would be analyzed as an arrested 2-cell embryo. 
Care was taken to exclude embryos that morphologically appeared to be dead and porous at the time of sample collection (e.g., degenerate blastomeres). Only embryos that appeared alive (for both normal and arrested) were used for gene expression analysis. However, it is possible that embryos that appeared normal during the time of collection might ultimately arrest if they were allowed to grow to a later stage. Multiplex qRT-PCR reactions of up to 96 genes belonging to different categories were assayed per sample, including housekeeping genes, cytokinesis components, germ cell markers, maternal factors, embryonic genome activation (EGA) markers, trophoblast markers, inner cell mass markers, pluripotency markers, epigenetics regulators, transcription factors, hormone receptors and others. Two slightly different but overlapping sets of genes were assayed in three different experimental sets (Supplementary Table 1 and Supplementary Data Set 2). Individual blastomeres and maternal and embryonic programs. Based on our data, we defined a maternal program as the average expression values of the
ten markers most quantitatively representative of the ESSP1 genes (maternal transcripts), and the embryonic program by the average expression values of ten markers most representative of the ESSP2 genes (embryonic transcripts). Thus, the gene expression profile of a human embryo at any given time is the sum of maternal mRNA degradation and embryonic or EGA transcripts. Maternal transcripts typically comprise the bulk of transcripts in a young blastomere of early developmental age relative to EGA transcripts, and the opposite holds true for an older blastomere at a more advanced developmental age (Fig. 5b). Automated cell tracking. Our cell tracking algorithm uses a probabilistic framework based on sequential Monte Carlo methods, which in the field of computer vision is often referred to as the particle filter. The particle filter tracks the propagation of three main variables over time: the state, the control and the measurement. The state variable is a model of an embryo and is represented as a collection of ellipses. The control variable is an input that transforms the state variable and consists of our cell propagation and division model. The measurement variable is an observation of the state and consists of our images acquired by the time-lapse microscope. Our estimate of the current state at each time step is represented with a posterior probability distribution, which is approximated by a set of weighted samples called particles. We use the terms particles and embryo models interchangeably, where a particle is one hypothesis of an embryo model at a given time. After initialization, the particle filter repeatedly applies three steps: prediction, measurement and update. Prediction. Cells are represented as ellipses in two-dimensional space, and each cell has an orientation and overlap index. The overlap index specifies the relative height of the cells. In general, there are two types of behavior that we want to predict: cell motion and cell division. 
For cell motion, our control input takes a particle and randomly perturbs each parameter for each cell, including position, orientation, and length of major and minor axes. The perturbation is randomly sampled from a normal distribution with relatively small variance (5% of the initialized values). For cell division, we use the following approach. At a given point in time, for each particle, we assign a 50% probability that one of the cells will divide. This value was chosen empirically, and spans a wide range of possible cell divisions while maintaining good coverage of the current configuration. If a division is predicted, then the dividing cell is chosen randomly. When a cell is chosen to divide (Supplementary Fig. 9, left), we apply a symmetric division along the major axis of the ellipse, producing two daughter cells of equal size and shape (Supplementary Fig. 9, middle). We then randomly perturb each value for the daughter cells (Supplementary Fig. 9, right). Finally, we randomly select the overlap indices of the two daughter cells while maintaining their collective overlap relative to the rest of the cells. After applying the control input, we convert each particle into a simulated image. This is achieved by projecting the elliptical shape of each cell onto the simulated image using the overlap index. The corresponding pixel values are set to a binary value of 1 and dilated to create a membrane thickness comparable to the observed image data. Because the embryos are partially transparent and out-of-focus light is collected, cell membranes at the bottom of the embryo are only visible sometimes. Accordingly, occluded cell membranes are added with 10% probability. In practice, we have found that these occluded membrane points are crucial for accurate shape modeling, but it is important to make them sparse enough so that they do not resemble a visible edge. Measurement. 
Once we have generated a distribution of hypothesized models, the corresponding simulated images are compared to the actual microscope image. The microscope image (Supplementary Fig. 10, left) is pre-processed to create a binary image of cell membranes using a principal curvature-based method (Supplementary Fig. 10, middle) followed by thresholding (Supplementary Fig. 10, right). The accuracy of the comparison is evaluated using a symmetric truncated chamfer distance, which is then used to assign a weight, or likelihood, to each particle. Update. After weights are assigned, particles are selected in proportion to these weights to create a new set of particles for the next iteration. This focuses the particle distribution in the region of highest probability. Particles with low probability are discarded, whereas particles with high probability are multiplied. Particle re-sampling is performed using the low variance method.
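The update step can be sketched as follows. This is a minimal, generic version of low-variance (systematic) resampling as it is commonly described for particle filters. It is not the authors' code, and the function and argument names are illustrative.

```python
import random


def low_variance_resample(particles, weights, rng=random):
    """Draw len(particles) particles in proportion to their weights.

    A single random offset is drawn, and equally spaced pointers are then
    swept across the cumulative weights. High-weight particles are
    duplicated and low-weight particles tend to drop out.
    """
    n = len(particles)
    total = sum(weights)
    if n == 0 or total <= 0:
        raise ValueError("need at least one particle with positive weight")

    step = total / n
    pointer = rng.uniform(0.0, step)   # the only random draw
    cumulative = weights[0]
    index = 0
    resampled = []
    for _ in range(n):
        # Advance to the particle whose cumulative weight covers the pointer.
        while pointer >= cumulative and index < n - 1:
            index += 1
            cumulative += weights[index]
        resampled.append(particles[index])
        pointer += step
    return resampled


# Hypothetical example: four embryo-model hypotheses with unequal weights.
models = ["model_a", "model_b", "model_c", "model_d"]
print(low_variance_resample(models, [0.1, 0.1, 0.7, 0.1], random.Random(0)))
```

Because only one random number is drawn per resampling pass, the number of copies of each particle stays close to its expected value, which is the usual motivation for this scheme over independent draws.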
Once the embryos have been modeled, we can extract the dynamic imaging parameters such as duration of cytokinesis and time between mitoses, as discussed in the main text. Our cell tracking software was previously implemented in Matlab, and computation times ranged from a couple of seconds to half a minute for each image depending on the number of particles. Our current version of the software is implemented in C/C++, and computation times range from 1 to 5 s per image depending on the number of particles.
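The prediction step described earlier (random perturbation of each ellipse, plus an occasional symmetric division) can be sketched in Python under stated assumptions. The `Cell` class, the function names and the geometry of the daughter cells are illustrative choices, not the authors' implementation. The noise is passed in as per-parameter standard deviations (the text's "5% of the initialized values" is left to the caller to interpret), and the overlap index and occluded membranes are omitted.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cell:
    x: float      # centre x (pixels)
    y: float      # centre y (pixels)
    a: float      # semi-axis along the orientation angle
    b: float      # semi-axis perpendicular to the orientation angle
    theta: float  # orientation of the `a` axis (radians)


def perturb(cell, sd, rng):
    """Add zero-mean Gaussian noise to every parameter of one cell.

    `sd` is a Cell whose fields hold the standard deviation per parameter.
    """
    return Cell(
        x=rng.gauss(cell.x, sd.x),
        y=rng.gauss(cell.y, sd.y),
        a=max(rng.gauss(cell.a, sd.a), 1e-6),   # keep axes positive
        b=max(rng.gauss(cell.b, sd.b), 1e-6),
        theta=rng.gauss(cell.theta, sd.theta),
    )


def divide(cell):
    """Split an ellipse into two equal daughters placed along its `a` axis.

    Assumption of this sketch: each daughter keeps `b` and takes half of
    `a`, so the two daughters together cover the parent's area.
    """
    dx = 0.5 * cell.a * math.cos(cell.theta)
    dy = 0.5 * cell.a * math.sin(cell.theta)
    first = replace(cell, x=cell.x - dx, y=cell.y - dy, a=cell.a / 2)
    second = replace(cell, x=cell.x + dx, y=cell.y + dy, a=cell.a / 2)
    return first, second


def predict(particle, sd, rng, p_divide=0.5):
    """One prediction step for one particle (a list of Cell objects)."""
    cells = list(particle)
    if rng.random() < p_divide:              # 50% chance that one cell divides
        i = rng.randrange(len(cells))        # dividing cell chosen at random
        cells[i:i + 1] = divide(cells[i])
    return [perturb(c, sd, rng) for c in cells]


# Hypothetical one-cell embryo model and noise levels.
zygote = Cell(x=50.0, y=50.0, a=20.0, b=15.0, theta=0.0)
noise = Cell(x=1.0, y=1.0, a=1.0, b=0.75, theta=0.05)
print(len(predict([zygote], noise, random.Random(0))))
```

Running `predict` on every particle yields the set of hypotheses that are then rendered as simulated images and weighted in the measurement step.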
Discrepancies. In cases where there were discrepancies between manual and automated measurements, abnormal embryos were correctly assessed by the automated software. Moreover, our results indicated that the use of automated image analysis software may potentially provide improved ability to differentiate not only the success or failure to reach blastocyst but also blastocyst quality; however, further data and analysis are needed to support this observation.
Applications. Our imaging analysis was validated through the development of a novel cell tracking algorithm that can automatically extract the dynamic imaging parameters. We believe this to be the first documented report of human embryo development being subjected to diagnosis by automated image analysis algorithms. Recently, cell tracking algorithms have generated much interest in areas such as predicting stem cell fate57, characterizing subcellular and membrane dynamics58,59, and lineage tracing during embryogenesis in Caenorhabditis elegans60. We anticipate that our algorithms, and in particular the underlying probabilistic model estimation technique, will be useful in other applications of time-lapse image analysis that deal with arbitrary motion dynamics and measurement uncertainties.

42. Zhang, J.Q. et al. Reduction in exposure of human embryos outside the incubator enhances embryo quality and blastulation rate. Reprod. Biomed. Online 20, 510–515 (2010).
43. Veeck, L.L. et al. Significantly enhanced pregnancy rates per cycle through cryopreservation and thaw of pronuclear stage oocytes. Fertil. Steril. 59, 1202–1207 (1993).
44. Miller, K.F. & Goldberg, J.M. In vitro development and implantation rates of fresh and cryopreserved sibling zygotes. Obstet. Gynecol. 85, 999–1002 (1995).
45. Damario, M.A., Hammitt, D.G., Galanits, T.M., Session, D.R. & Dumesic, D.A. Pronuclear stage cryopreservation after intracytoplasmic sperm injection and conventional IVF: implications for timing of the freeze. Fertil. Steril. 72, 1049–1054 (1999).
46. Vajta, G., Nagy, Z., Cobo, A., Conceicao, J. & Yovich, J. Vitrification in assisted reproduction: myths, mistakes, disbeliefs and confusion. Reprod. Biomed. Online 19, 1–7 (2009).
47. Liebermann, J. et al. Blastocyst development after vitrification of multipronuclear zygotes using the Flexipet denuding pipette. Reprod. Biomed. Online 4, 146–150 (2002).
48. Tarin, J.J., Trounson, A. & Sathananthan, H. Origin and ploidy of multipronuclear zygotes. Reprod. Fertil. Dev. 11, 273–279 (1999).
49. Sathananthan, A.H. et al. Development of the human dispermic embryo (CD-ROM). Hum. Reprod. Update 5, 553–560 (1999).
50. Baltaci, V. et al. Relationship between embryo quality and aneuploidies. Reprod. Biomed. Online 12, 77–82 (2006).
51. Fino, E. et al. How good is embryo morphology at predicting chromosomal integrity? When is aneuploidy PGD useful? Fertil. Steril. 84, S98–S99 (2005).
52. Kearns, W. et al. Aneuploidy rates of human preimplantation embryos in relation to morphology and development. Fertil. Steril. 86, S474 (2006).
53. Foygel, K. et al. A novel and critical role for Oct4 as a regulator of the maternal-embryonic transition. PLoS ONE 3, e4109 (2008).
54. Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT Method. Methods 25, 402–408 (2001).
55. Vandesompele, J. et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, 0034.1–0034.12 (2002).
56. Hellemans, J., Mortier, G., De Paepe, A., Speleman, F. & Vandesompele, J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol. 8, R19.1–R19.14 (2007).
57. Cohen, A.R., Gomes, F.L., Roysam, B. & Cayouette, M. Computational prediction of neural progenitor cell fates. Nat. Methods 7, 213–218 (2010).
58. Jaqaman, K. et al. Robust single-particle tracking in live-cell time-lapse sequences. Nat. Methods 5, 695–702 (2008).
59. Sergé, A., Bertaux, N., Rigneault, H. & Marguet, D. Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nat. Methods 5, 687–694 (2008).
60. Bao, Z. et al. Automated cell lineage tracing in Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 103, 2707–2712 (2006).
letters
Substrate elasticity provides mechanical signals for the expansion of hemopoietic stem and progenitor cells
Jeff Holst1,2, Sarah Watson1, Megan S Lord3, Steven S Eamegdool4, Daniel V Bax4, Lisa B Nivison-Smith4, Alexey Kondyurin5, Liang Ma6, Andres F Oberhauser6, Anthony S Weiss4 & John E J Rasko1,2,7
Surprisingly little is known about the effects of the physical microenvironment on hemopoietic stem and progenitor cells. To explore the physical effects of matrix elasticity on well-characterized primitive hemopoietic cells, we made use of a uniquely elastic biomaterial, tropoelastin. Culturing mouse or human hemopoietic cells on a tropoelastin substrate led to a two- to threefold expansion of undifferentiated cells, including progenitors and mouse stem cells. Treatment with cytokines in the presence of tropoelastin had an additive effect on this expansion. These biological effects required substrate elasticity, as neither truncated nor cross-linked tropoelastin reproduced the phenomenon, and inhibition of mechanotransduction abrogated the effects. Our data suggest that substrate elasticity and tensegrity are important mechanisms influencing hemopoietic stem and progenitor cell subsets and could be exploited to facilitate cell culture.
Stem cells require signals from their environment to retain their phenotype. These signals arise from soluble factors, including growth factors, cytokines and chemokines1, from cell-cell contact and from extracellular matrix proteins2–4. Such signals are detected by receptors present on the surface of stem cells5. In addition to these well-characterized signaling pathways, cell structure and function may be determined by the mechanical forces of tensegrity (tensional integrity)6,7. Indeed, shear stress has recently been shown to promote embryonic hemopoiesis from progenitor cells8, and mesenchymal stem cells have been shown to sense alterations in compressive elasticity, differentiating according to the stiffness or elasticity of their substrate9.
In this study we have taken advantage of the unique extensional elasticity properties of tropoelastin, the most elastic biomaterial known10, to examine the effects of elasticity on hemopoietic stem and progenitor cell populations ex vivo. Mouse bone marrow cells were cultured on control or tropoelastin-coated tissue culture plates for 3 d (Fig. 1a,b and Supplementary Fig. 1). In the presence or absence of cytokines, there was a significant (P = 0.0071 and P = 0.0051) increase in the percentage of lineage-negative (Lin−) cells after culture on tropoelastin (Supplementary Fig. 1c,d). For this study we tested a number of
published cytokine cocktails, and the combination of interleukin (IL)-3, IL-6 and stem cell factor was chosen as the optimal mixture. We have previously used this combination to support the culture of repopulating stem cells11–13. Furthermore, compared to controls, the Lin− cells cultured on tropoelastin-coated plates contained a significantly (P = 0.0001) higher percentage of Sca-1+ c-Kit+ cells, which resulted in a greater number of total Lin− Sca-1+ c-Kit+ (LSK) cells (Fig. 1b and Supplementary Fig. 1e,f). There was no evidence of more cell death in the cultured cells (Supplementary Fig. 1a,b); there was also more Sca-1 mRNA and Sca-1 protein on the surface of each cell (Supplementary Fig. 2a,b). There was, however, little effect on the percentage of mature cells, including T cells, B cells, granulocytes and macrophages in this population (Supplementary Fig. 2c). The higher percentage and greater number of LSK cells observed when cells were cultured on tropoelastin-coated plates in the absence of cytokines compared to control (Fig. 1b and Supplementary Fig. 1e) were similar to those observed when cytokines were added to control plates (Fig. 1b and Supplementary Fig. 1f). This suggested that tropoelastin mediated a similar effect on the maintenance of LSK cells as that produced by the combination of IL-3, IL-6 and stem cell factor, and may be able to replace these cytokines for ex vivo culture. However, tropoelastin and the cytokines act through different mechanisms, as the combination of both produced an additive effect on the percentage and number of LSK cells (Fig. 1b and Supplementary Fig. 1e,f). As the extracellular matrix proteins collagen and fibronectin have previously been used to enhance the growth of hemopoietic cells, we compared their ability to increase the percentage of LSK cells in vitro with tropoelastin’s (Supplementary Fig. 3).
When bone marrow cells were cultured on tropoelastin-coated plates for up to 7 d, there was a significant (see figures for P-values) increase in the percentage of LSK cells at each time point compared to controls (Fig. 1c and Supplementary Fig. 3). There was a significant (see figures for P-values) increase in LSK cells after 1 (Supplementary Fig. 3a) and 3 d (Supplementary Fig. 3b) in the tropoelastin-coated plates compared to control, fibronectin- or collagen-coated plates. After 5 (Supplementary Fig. 3c) or 7 d (Supplementary Fig. 3d) the increase in LSK cells with tropoelastin compared to control remained significant (P = 0.0022 and P = 0.0008, respectively).
1Gene & Stem Cell Therapy Program, Centenary Institute, Camperdown, New South Wales, Australia. 2Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia. 3Graduate School of Biomedical Engineering, The University of New South Wales, Sydney, New South Wales, Australia. 4School of Molecular Bioscience, University of Sydney, Sydney, New South Wales, Australia. 5School of Physics, University of Sydney, Sydney, New South Wales, Australia. 6Department of Neuroscience and Cell Biology, University of Texas Medical Branch, Galveston, Texas, USA. 7Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown, New South Wales, Australia. Correspondence should be addressed to J.E.J.R. ([email protected]). Received 16 June; accepted 7 September; published online 3 October 2010; doi:10.1038/nbt.1687
nature biotechnology VOLUME 28 NUMBER 10 OCTOBER 2010
Figure 1 Tropoelastin increased mouse hemopoietic stem and progenitor cells. Mouse bone marrow cells were cultured on control or tropoelastin-coated plates for 7 d. (a,b) On day 3, cells were harvested, counted, and analyzed by flow cytometry (a), and the numbers of LSK cells were compared to those in fresh uncultured bone marrow (baseline), expressed as mean ± s.e.m. (b; n = 4–5). (c) On days 1, 3, 5 and 7, cells were analyzed by flow cytometry, expressed as mean ± s.e.m. (days 1, 5 and 7, n = 3; day 3, n = 10). (d) Cells were labeled with CFSE, cultured for 3 d and analyzed by flow cytometry; results are expressed as mean ± s.e.m., with left y axis denoting undivided cells and right axis denoting cell divisions 1–4 (n = 4). (e,f) Cells cultured for either 3 (n = 4) or 5 d (n = 3) were subsequently cultured in MethoCult medium and colonies enumerated (expressed as mean ± s.e.m.). (g) Cells cultured for 3 d were analyzed for SLAM markers by flow cytometry, expressed as mean ± s.e.m. (n = 4). CD45.1+ bone marrow cells cultured for 3 d were injected into irradiated CD45.2+ mice and analyzed after 8 weeks for engraftment. (h) The number of transplanted cells were plotted against the percentage of mice with unsuccessful engraftment to determine the frequency of repopulating cells (n = 25 recipients per group from five separate experiments).
The increased numbers of LSK cells after culture on tropoelastin suggested that these cells were preferentially expanding ex vivo (Fig. 1b). Compared to starting numbers of LSK cells, there was a 5.7-fold increase in cell numbers after culture on tropoelastin without cytokines, and a 14.2-fold increase in cell numbers after culture on tropoelastin with cytokines (Fig. 1b). To further examine this expansion of hemopoietic cells ex vivo, we used carboxyfluorescein succinimidyl ester (CFSE)-labeled bone marrow cells to directly show the division of LSK cells. After 3 d in culture, there was an increase in the number and percentage of LSK cells on tropoelastin-coated plates that had divided and maintained their phenotype, compared to control plates (Fig. 1d and Supplementary Fig. 4a,b). We next sought to define the effects of tropoelastin on specific primitive hemopoietic cell subsets, including progenitors and repopulating stem cells. To determine the clonogenic potential of cells cultured on tropoelastin, we analyzed equal numbers of cells by colony-forming assay (Fig. 1e,f). A significant (day 3, P = 0.0046; day 5, P = 0.0043) increase in the total number of colonies was observed in cells cultured on tropoelastin-coated plates compared to controls (Fig. 1e,f). No difference in the size or type of colonies was observed (Supplementary Fig. 5). The signaling lymphocytic activation molecule (SLAM) markers CD48 and CD150 can be used to more accurately determine the presence of long-term repopulating hemopoietic cells within the LSK population14,15. We observed a significant (P = 0.0039) two- to threefold increase in the CD48− CD150+ LSK population after culture on tropoelastin-coated plates compared to control (Fig. 1g). To further assess the hemopoietic repopulating
stem cells, we measured the engraftment of cultured cells transplanted into irradiated congenic recipient mice. Cells were cultured on control or tropoelastin-coated plates and transplanted into mice, and the percentage of mice with engrafted donor cells was determined. The results showed a higher frequency of repopulating cells after culture on tropoelastin-coated plates compared to control (Fig. 1h; n = 25). The frequency of repopulating cells was also determined using extreme limiting dilution analysis (ELDA) software, showing a significantly higher number of repopulating cells after culture on tropoelastin-coated plates (1 in 2.91 × 10^6) compared to control (1 in 7.34 × 10^6; P = 0.0029). Mouse and human hemopoietic cells differ in their cell surface markers and in some aspects of their biology. To determine whether tropoelastin could induce similar effects on human hemopoietic progenitor cells to those observed in mouse cells, we cultured human umbilical cord blood cells on control or tropoelastin-coated plates for 3 d. There was a significant (P = 0.0039) increase in the percentage of Lin− CD34+ CD38+ cells, as well as a small increase in the level of CD34 and CD38 staining per cell, in tropoelastin-coated compared to control plates (Fig. 2a,b). There was no evidence of more cell death as measured by cell count or flow cytometry (Supplementary Fig. 6). Additionally, human cells were analyzed by colony-forming assay and, consistent with mouse cells, there were significantly (P = 0.0025) more progenitor cells after culture on tropoelastin-coated plates (Fig. 2c). There was no difference in the size or type of colonies observed between the control and tropoelastin-coated plates (Supplementary Fig. 7).
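The limiting-dilution estimate of repopulating-cell frequency reported above can be illustrated with a minimal Python sketch. It assumes the single-hit Poisson model, in which the probability that a recipient fails to engraft after receiving `dose` cells is exp(−frequency × dose), and finds the maximum-likelihood frequency by bisection. ELDA's own fitting procedure, confidence intervals and significance tests are not reproduced here. The function names and the example counts are hypothetical and are not the study's data.

```python
import math


def _inv_expm1(z):
    """Return 1 / (e^z - 1), treating very large z as 0 to avoid overflow."""
    return 0.0 if z > 700.0 else 1.0 / math.expm1(z)


def score(freq, groups):
    """Derivative of the single-hit Poisson log-likelihood w.r.t. freq.

    `groups` is a list of (dose, n_tested, n_negative) tuples.
    """
    total = 0.0
    for dose, n_tested, n_negative in groups:
        n_positive = n_tested - n_negative
        total += -n_negative * dose + n_positive * dose * _inv_expm1(freq * dose)
    return total


def estimate_frequency(groups, iterations=200):
    """Maximum-likelihood frequency of repopulating cells (per cell)."""
    has_positive = any(neg < n for _, n, neg in groups)
    has_negative = any(neg > 0 for _, _, neg in groups)
    if not (has_positive and has_negative):
        raise ValueError("need both engrafted and non-engrafted recipients")
    doses = [dose for dose, _, _ in groups]
    lo, hi = 1e-6 / max(doses), 50.0 / min(doses)
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)        # bisect on a log scale
        if score(mid, groups) > 0.0:    # likelihood still rising
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical data: (cells transplanted, recipients, recipients not engrafted).
example_groups = [(1e6, 5, 4), (3e6, 5, 2), (1e7, 5, 0)]
freq = estimate_frequency(example_groups)
print(f"about 1 repopulating cell in {1.0 / freq:,.0f} cells")
```

The log-likelihood is concave in the frequency, so its derivative changes sign only once and bisection is sufficient for this sketch.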
Figure 2 Tropoelastin increased human hemopoietic progenitor cells. Human umbilical cord blood cells were cultured on control or tropoelastin-coated plates. (a,b) After 3 d cells were analyzed by flow cytometry with representative dot plots shown for lineage negative gated co-expression of CD34 and CD38. (b) The percentage of Lin−CD34+CD38+ cells is shown (mean ± s.e.m.; n = 5). (c) After 3 d, cells were cultured in MethoCult medium and colonies enumerated (expressed as mean ± s.e.m.) (n = 4). Statistical significance was determined using a two-tailed Wilcoxon signed rank t test.
To determine the mechanism by which tropoelastin mediates the increase of hemopoietic progenitor cells, we purified three different tropoelastin truncation mutants (ELN27-540, ELN27-365 and ELN297-595) and full-length tropoelastin and used them to coat tissue culture plates (Fig. 3a). Mouse bone marrow cells were cultured in the plates for 3 d to determine whether a subdomain of the protein conferred its functionality. Plates with intact tropoelastin or ELN27-540 had a higher percentage of LSK cells than control plates, whereas plates with ELN27-365 or ELN297-595 alone did not differ from control plates (Fig. 3b). ELN27-365 and ELN297-595 overlap, and together contain all intact amino acid sequences comprising the ELN27-540 mutant. To determine whether the increase in LSK cells was due to domains present in the ELN27-540, we coated tissue culture plates with ELN27-365 and ELN297-595 together. The combination of these two truncations, however, did not mediate an increase in the percentage of LSK cells (Fig. 3b). This result shows that a property unique to the intact region ELN27-540, and not any intrinsic ability of the amino acid sequences to bind to cells, was responsible for the cellular effects. To determine whether these effects were due to a loss of extensional elasticity in the truncated tropoelastin proteins, we used single-molecule atomic force microscopy (AFM) to determine their extensibilities. Force-extension measurements confirmed that ELN27-540 was more extensible than ELN27-365 or ELN297-595 compared to full-length tropoelastin (Fig. 3c). To further confirm that the elastic properties of tropoelastin were required for its biological effects on hemopoietic progenitor cells, we cultured mouse bone marrow cells on full-length tropoelastin that had been cross-linked using glutaraldehyde. At concentrations ≥0.1% glutaraldehyde, most of the biological effects of tropoelastin were lost (Fig. 3d).
This result confirmed that the physical properties of tropoelastin were responsible for its cellular effects. To confirm that glutaraldehyde cross-linking had reduced the elasticity of tropoelastin, we performed single-molecule AFM on the cross-linked tropoelastin (Fig. 3c). These data established that there was a correlation between the extent of the elasticity of tropoelastin and maintenance of progenitor cells. For example, in a very low concentration of 0.01% glutaraldehyde, tropoelastin maintained a mean contour length of 186 nm and increased the percentage of LSK cells compared to higher glutaraldehyde concentrations, which abrogated the effects (P = 0.0079). The overall comparisons of the atomic force measurements for truncations and glutaraldehyde cross-linking revealed a threshold for the mean extensional elasticity, showing that extension lengths >125 nm were required to mediate increased percentages of LSK cells (Fig. 3e, dotted line).
Quartz crystal microbalance with dissipation monitoring (QCM-D) analyses were performed to further characterize the tropoelastin coatings, as well as fibronectin and collagen. Collagen type I formed a multilayered structure on the oxidized polystyrene surface, as shown by the low Δ frequency values (Fig. 4a,b). Fibronectin and tropoelastin, however, formed a single-layer configuration with higher Δ frequency values, and lower thickness than collagen type I (Fig. 4a,b). The higher Δ dissipation value for tropoelastin showed increased viscoelasticity compared to fibronectin (Fig. 4a). The high Δ dissipation for collagen type I is a consequence of the highly hydrated multilayer configuration on the oxidized polystyrene surface, which leads to a thickness of >100 nm (Fig. 4b). Intact tropoelastin and ELN27-540 were found to rapidly adsorb onto oxidized polystyrene with differences in the binding evident from the Df plot (Fig. 4a). ELN27-540 bound to the surface and underwent rearrangement, as shown by the decrease in dissipation, whereas intact tropoelastin did not undergo rearrangement once adsorbed onto the surface.
Modeling of the QCM-D data revealed that intact tropoelastin and ELN27-540 each bound to oxidized polystyrene in a monolayer (Fig. 4b,c). Intact tropoelastin bound to oxidized polystyrene in a thicker layer than ELN27-540 (Fig. 4b) and at a higher mass density (Fig. 4c) owing to the difference in molecular weight of the two proteins (60 and 44.4 kDa, respectively). Both proteins bound to oxidized polystyrene with the same circular footprint (Fig. 4d). Taken together with the thickness data presented in Figure 4b, this suggests that the C-terminal region protruded from the surface and was not required for the mechanical signals that led to expansion of hemopoietic stem and progenitor cells (Fig. 3). Tropoelastin molecules did not appreciably interact as only nanogram amounts were present on the coated surfaces. Tropoelastin intermolecular interactions display an association rate constant (ka) of (1.71 ± 0.31) × 10^5 M−1 s−1 and a dissociation rate constant (kd) of (3.8 ± 0.22) × 10^−3 s−1, resulting in an equilibrium dissociation constant (KD) of (2.28 ± 0.29) × 10^−8 M at a χ^2 of 1.18 (Baldock, C. et al., unpublished data). This low dissociation constant and molecular tethering means that biomolecular interactions between surface-bound tethered tropoelastin molecules would be transient and rare. Furthermore, these data do not suggest that tropoelastin forms a multilayer structure that could result in increased surface area being presented to the cells. The AFM image confirmed that nonhydrated tropoelastin coated the polystyrene surface although some ‘holes’ were not coated (Supplementary Fig. 8a). The area coated by tropoelastin was as smooth as the initial polystyrene substrate. Such island-like protein coats also occur for horseradish peroxidase on polystyrene16. The roughness histogram shows the distribution of surface pixel events of independent random events (Supplementary Fig. 8b). The position of the bottom peak was 1.69 nm, whereas the position of the upper peak was 8.11 nm, with an average thickness of the tropoelastin coating being 6.42 nm.
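As a quick arithmetic check of the values quoted above, the following Python sketch recomputes the equilibrium dissociation constant as the ratio of the dissociation and association rate constants, and the coating thickness as the difference between the two histogram peaks. It assumes that the three kinetic values are, in order, an association rate constant, a dissociation rate constant and an equilibrium dissociation constant; the variable names are illustrative.

```python
# Values as quoted in the text (central estimates only).
k_a = 1.71e5               # association rate constant, M^-1 s^-1
k_d = 3.8e-3               # dissociation rate constant, s^-1
K_D_reported = 2.28e-8     # reported equilibrium dissociation constant, M
K_D_reported_err = 0.29e-8 # reported uncertainty, M

# Equilibrium dissociation constant from the two rate constants.
K_D = k_d / k_a
within_error = abs(K_D - K_D_reported) <= K_D_reported_err
print(f"K_D from rate constants: {K_D:.2e} M, within reported error: {within_error}")

# Coating thickness as the distance between the two roughness-histogram peaks.
bottom_peak_nm = 1.69      # polystyrene substratum
upper_peak_nm = 8.11       # tropoelastin upper surface
coating_thickness_nm = upper_peak_nm - bottom_peak_nm
print(f"coating thickness: {coating_thickness_nm:.2f} nm")
```

The ratio of the central estimates comes out slightly below the reported constant but inside its stated uncertainty, which is what one would expect if the reported value came from a global fit and not from dividing the two rounded rate constants.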
The roughness of the polystyrene substratum and tropoelastin upper surface was defined by the width of these peaks. The roughness of each surface was less than the thickness
[Figure 3, panels a, b and d. (a) Schematic of tropoelastin (residues 27–724) and the truncations ELN27-540, ELN27-365 and ELN297-595; legend: KP and KA cross-linking domains, VGVAPG hexapeptide domain, hydrophobic domains, integrin binding domain. (b,d) y axis: fold increase (% Lin–Sca-1+cKit+); groups in b: control, tropoelastin, ELN27-540, ELN27-365, ELN297-595, ELN27-365 + ELN297-595.]
Figure 3 Effect of tropoelastin truncations and cross-linking on the ability of mouse hemopoietic cells to respond to tropoelastin. (a) Schematic of full-length tropoelastin and truncations, including both domain structure and amino acid numbering of each construct end. (b,d) Mouse bone marrow cells were cultured on control, tropoelastin-coated or truncated tropoelastin–coated plates (b) or on glutaraldehyde–cross-linked tropoelastin-coated plates (d). After 3 d, cells were analyzed by flow cytometry for the absence of lineage-specific surface markers and for coexpression of Sca-1 and c-Kit. Shown is the fold increase in percentage of LSK cells relative to control plates. In b,d, data are mean ± s.e.m. from three to six separate experiments. (c) Tropoelastin or truncations were deposited on glass slides (with or without cross-linking with glutaraldehyde) and their extensibilities (contour lengths) were determined by atomic force microscopy using the worm-like chain model of polymer elasticity. Data are the percentage of total events, including a Gaussian nonlinear fit curve. (e) The mean ± s.e.m. of the contour length was compared. A putative threshold of extensional elasticity to retain biological activity is shown (dotted line).
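The putative threshold in panel e can be illustrated with a short Python sketch that compares the mean contour lengths printed on the panel c histograms with the 125 nm cut-off mentioned in the text. This only sorts the constructs by the threshold; it is not the authors' analysis and it does not model the LSK readout. The dictionary and variable names are hypothetical.

```python
# Putative extensional-elasticity threshold from the text (Fig. 3e, dotted line).
THRESHOLD_NM = 125.0

# Mean contour lengths (nm) as printed on the Fig. 3c histograms.
mean_contour_nm = {
    "Tropoelastin": 236,
    "ELN27-540": 149,
    "ELN27-365": 66,
    "ELN297-595": 110,
    "0.01% glutaraldehyde": 186,
    "0.1% glutaraldehyde": 87,
    "1% glutaraldehyde": 99,
}

# True where the mean contour length exceeds the threshold.
above_threshold = {
    name: length > THRESHOLD_NM for name, length in mean_contour_nm.items()
}

for name, is_above in above_threshold.items():
    label = "above" if is_above else "below"
    print(f"{name}: {mean_contour_nm[name]} nm ({label} {THRESHOLD_NM:.0f} nm)")
```

Under these numbers, full-length tropoelastin, ELN27-540 and the 0.01% glutaraldehyde condition fall above the threshold, in line with the conditions the text describes as retaining activity.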
[Figure 3c–e. (c) Histograms of events (% of total) versus contour length (nm). Mean contour lengths: tropoelastin, 236 nm; ELN27-540, 149 nm; ELN27-365, 66 nm; ELN297-595, 110 nm; 0.01% glutaraldehyde, 186 nm; 0.1% glutaraldehyde, 87 nm; 1% glutaraldehyde, 99 nm. (d) x axis: glutaraldehyde (%), at 0, 0.01, 0.1 and 1. (e) Contour length (nm) for each construct and cross-linking condition.]
of a single tropoelastin coating. The distribution of tropoelastin on polystyrene was calculated based on the thickness estimation for the tropoelastin coating. The AFM image was analyzed to obtain a distribution of holes >7 nm in depth (Supplementary Fig. 8c). Nonhydrated tropoelastin covered 70–75% of the surface. Therefore, because all cell assays were performed on hydrated tropoelastin, in which >75% coverage would be expected, every cell that was in contact with the culture dish would be expected to interact with many tropoelastin molecules.
The regulation of intracellular and extracellular tension relies on the balance between adhesion and actomyosin contractility, which combine to affect gene expression by the process of mechanotransduction17. As the truncation and cross-linking studies definitively showed that the physical properties of tropoelastin were required to increase the percentage of LSK cells, we set out to test whether tropoelastin acts through the mechanotransduction machinery. This machinery requires the activity of myosin II, and so we used specific inhibitors of either myosin II heavy chain (blebbistatin) or myosin light chain kinase (ML-7). Both inhibitors led to a significant (blebbistatin, P = 0.0001; ML-7, P = 0.0156) abrogation or reduction in the effects of culturing LSK cells on tropoelastin, with blebbistatin-treated cells on tropoelastin indistinguishable from control cells (Fig. 4e). Neither blebbistatin nor ML-7 had an effect on control cells (Fig. 4e,f).
Molecular signals originating from the ligation of hemopoietic stem cell surface receptors including integrins, growth factor receptors and cytokine receptors have been studied in detail for their ability to maintain quiescence18, induce proliferation or promote differentiation19. However, little is known regarding the mechanisms by which the stiffness or elasticity of the microenvironment may affect cellular functions or indeed how the sensing apparatus and transduction of mechanical signals function20,21. The most likely mechanism by which cells could monitor substrate elasticity involves two-way interactions with the actin-myosin cytoskeleton, coupled through membrane receptors such as integrins22. The response of hemopoietic cells to tropoelastin that we observed did not require integrin signaling, as truncated tropoelastin, lacking the integrin binding C terminus (Fig. 3a)23, retained most of the capacity to increase the percentage of LSK cells in vitro (Fig. 3b). Glycosaminoglycans have also been shown to mediate cell binding to tropoelastin24, as has the elastin binding protein/elastin-laminin receptor25. Whereas the elastin binding protein recognition sequence (VGVAPG) hexapeptide repeat present in tropoelastin mutant ELN482-525 may be required for binding of tropoelastin to the cells26, any signals generated by this domain alone were not sufficient for ELN297-595 to maintain LSK cells (Fig. 3b). Like integrins, this receptor has previously been shown to mediate mechanotransduction27,28. Furthermore, culturing cells on plates coated with cross-linked tropoelastin, whose signaling domains are intact but whose elasticity is impaired, significantly (see Fig. 3d for P values) reduced the percentage of LSK cells compared to cells grown on untreated tropoelastin (Fig. 3d). The presence of a monolayer of tropoelastin in the experiments described herein would not permit the cells to exert a compressive effect, because tropoelastin monomers can be extended, but not compressed, from their native form. Taken
VOLUME 28 NUMBER 10 OCTOBER 2010 nature biotechnology
[Figure 4 panels. (a) Δ dissipation (1E-6) versus Δ frequency (Hz) for collagen type I, fibronectin, tropoelastin and ELN27-540. (b) Thickness (nm). (c) Mass (ng/cm²). (d) Circular footprint (diameter, nm). (e) Fold increase (% Lin–Sca-1+cKit+) for control, tropoelastin and tropoelastin + blebbistatin; P values shown: 0.3027, 0.0001, 0.0001. (f) Fold increase (% Lin–Sca-1+cKit+) for control, tropoelastin and tropoelastin + 20 µM ML-7; P values shown: 0.0047, 0.0002, 0.0156.]
Figure 4 QCM-D analysis of collagen, fibronectin, tropoelastin and ELN27-540 binding to oxidized polystyrene, and the effect of myosin II heavy chain and myosin light chain kinase inhibitors on the ability of mouse hemopoietic cells to respond to tropoelastin. Intact collagen, fibronectin, tropoelastin and ELN27-540 adsorbed onto oxidized polystyrene were monitored by QCM-D for 1 h at 20 ± 0.1°C. (a) Changes in dissipation (Δ dissipation) versus changes in frequency (Δ frequency) are presented for the third overtone (Dƒ plot). (b–d) These data were analyzed using the Voigt model to determine adsorbed layer thickness (b), mass (c) and protein circular footprint (d). Circular footprint diameter measurements were performed assuming globular proteins of tropoelastin (60 kDa), ELN27-540 (44.4 kDa), fibronectin (440 kDa) and collagen (300 kDa). Data are presented as mean ± s.d. from three separate experiments. (e,f) Mouse bone marrow cells were cultured on control (uncoated) or tropoelastin-coated plates in the presence or absence of blebbistatin (e) or DMSO ± ML-7 (f). On day 3, cells were analyzed by flow cytometry to determine the percentage of LSK cells, relative to control (uncoated) plates. Data are mean ± s.e.m. from three separate experiments. Statistical significance was determined using a two-tailed Wilcoxon signed rank t test.
together, these data suggest that the property of extensional elasticity itself conferred the increase in LSK cells. The ability of cells to sense their microenvironment by coupling actin fibers and integrins together may provide a capability to push and pull the matrix, testing elasticity and thus determining cell fate29. Inhibition of myosin II heavy chain and myosin light chain kinase confirmed the role of the actin-myosin cytoskeleton in this process, again suggesting that mechanical signals are involved (Fig. 4e,f). As myosin II is a highly elastic molecule, it is conceivable that elasticity extends continuously from the extracellular tropoelastin deep inside the cell30. It is known that certain disease states in which tissue elasticity is abnormal may result in suboptimal cellular differentiation31. The AFM measurements of tropoelastin used in this study showed that a cell would be able to extend tropoelastin up to 200 nm; however, truncations maintaining elasticity of over 125 nm also retained most of the functionality. As membrane tension has also recently been shown to be an important determinant for cell protrusions, our results lend support to a tensegrity model of the stem cell niche32. The stem cell and its ‘niche synapse’ may establish a stable mechanical structure that actively promotes the pluripotent state. Shear stress has recently been shown to enhance mouse embryonic hemopoiesis, suggesting that biomechanical forces are involved in embryonic hemopoietic development8. The data presented herein provide evidence that both mouse and human hemopoietic stem and progenitor cells can also respond directly to biomechanical forces through extensional elasticity. Taken together, these data suggest that throughout development, hemopoietic stem and progenitor cells constantly sense and react to the physical signals provided by their niche environments.
As such, biomimetic surface and ex vivo culture design should include consideration of physical properties, including elastic and compressive extensibility and shear stress33. Furthermore, we produced highly purified good laboratory practice (GLP)-grade tropoelastin, which showed a significant increase in maintenance of hemopoietic progenitor cells compared to laboratory-grade tropoelastin (Supplementary Fig. 9). GLP-grade tropoelastin exhibited an increase in mean extensional length compared head to head with laboratory-grade tropoelastin (229 ± 35 nm versus 198.3 ± 35.6 nm, respectively). Recent data showing that primitive hemopoietic cells emerge from the endothelium in the aortic floor during development
suggest that elastin might be in contact with stem or progenitor cells during early blood cell development34–38. However, it remains to be determined whether elastin is present in the microenvironmental niche or whether it has a physiological role in hemopoiesis. Either way, biomaterials engineered with unique mechanical properties such as tropoelastin may be used to mimic these niche environments ex vivo. The evidence presented here supports the use of optimal physical substrates that may replace, complement or add to existing approaches to support and expand stem cells33. In this way, elastic substrates such as tropoelastin may offer an approach to biomaterial design39,40 aimed at achieving optimal mechanical culture conditions for the maintenance of stem cells ex vivo.
Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
Note: Supplementary information is available on the Nature Biotechnology website.
Acknowledgments
We thank D. Vignali for scientific discussion. J.H. received grant support from the Cancer Institute of New South Wales, A.S.W. from the Australian Research Council and National Health and Medical Research Council, A.F.O. received support from the US National Institutes of Health and J.E.J.R. from the National Health and Medical Research Council and the Cell and Gene Trust, Cure The Future.
AUTHOR CONTRIBUTIONS
J.H. and J.E.J.R. designed the experiments and wrote the paper, J.H. and M.S.L. analyzed the data, J.H., S.W., A.F.O., L.M., S.S.E., M.S.L. and A.K. generated the data, A.S.W. and J.E.J.R. provided conceptual input and D.V.B., L.B.N.-S. and S.S.E. provided tropoelastin reagents.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.
Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Sauvageau, G., Iscove, N.N. & Humphries, R.K. In vitro and in vivo expansion of hematopoietic stem cells. Oncogene 23, 7223–7232 (2004).
2. Carter, W.G. & Wayner, E.A. Characterization of the class III collagen receptor, a phosphorylated, transmembrane glycoprotein expressed in nucleated human cells. J. Biol. Chem. 263, 4193–4201 (1988).
3. Ghaffari, S., Dougherty, G.J., Lansdorp, P.M., Eaves, A.C. & Eaves, C.J. Differentiation-associated changes in CD44 isoform expression during normal hematopoiesis and their alteration in chronic myeloid leukemia. Blood 86, 2976–2985 (1995).
4. Williams, D.A., Rios, M., Stephens, C. & Patel, V.P. Fibronectin and VLA-4 in haematopoietic stem cell-microenvironment interactions. Nature 352, 438–441 (1991).
5. Wilson, A. & Trumpp, A. Bone-marrow haematopoietic-stem-cell niches. Nat. Rev. Immunol. 6, 93–106 (2006).
6. Janmey, P.A. & McCulloch, C.A. Cell mechanics: integrating cell responses to mechanical stimuli. Annu. Rev. Biomed. Eng. 9, 1–34 (2007).
7. Ainsworth, C. Cell biology: stretching the imagination. Nature 456, 696–699 (2008).
8. Adamo, L. et al. Biomechanical forces promote embryonic haematopoiesis. Nature 459, 1131–1135 (2009).
9. Engler, A.J., Sen, S., Sweeney, H.L. & Discher, D.E. Matrix elasticity directs stem cell lineage specification. Cell 126, 677–689 (2006).
10. Knowles, T.P. et al. Role of intermolecular forces in defining material properties of protein nanofibrils. Science 318, 1900–1903 (2007).
11. Holst, J. et al. Generation of T-cell receptor retrogenic mice. Nat. Protoc. 1, 406–417 (2006).
12. Holst, J., Vignali, K.M., Burton, A.R. & Vignali, D.A. Rapid analysis of T-cell selection in vivo using T cell-receptor retrogenic mice. Nat. Methods 3, 191–197 (2006).
13. Holst, J. et al. Scalable signaling mediated by T cell antigen receptor-CD3 ITAMs ensures effective negative selection and prevents autoimmunity. Nat. Immunol. 9, 658–666 (2008).
14. Kiel, M.J. et al. SLAM family receptors distinguish hematopoietic stem and progenitor cells and reveal endothelial niches for stem cells. Cell 121, 1109–1121 (2005).
15. Foudi, A. et al. Analysis of histone 2B-GFP retention reveals slowly cycling hematopoietic stem cells. Nat. Biotechnol. 27, 84–90 (2009).
16. Gan, B.K., Kondyurin, A. & Bilek, M.M. Comparison of protein surface attachment on untreated and plasma immersion ion implantation treated polystyrene: protein islands and carpet. Langmuir 23, 2741–2746 (2007).
17. Clark, K., Langeslag, M., Figdor, C.G. & van Leeuwen, F.N. Myosin II and mechanotransduction: a balancing act. Trends Cell Biol. 17, 178–186 (2007).
18. Passegué, E., Wagers, A.J., Giuriato, S., Anderson, W.C. & Weissman, I.L. Global analysis of proliferation and cell cycle gene expression in the regulation of hematopoietic stem and progenitor cell fates. J. Exp. Med. 202, 1599–1611 (2005).
19. Kiel, M.J. & Morrison, S.J. Uncertainty in the niches that maintain haematopoietic stem cells. Nat. Rev. Immunol. 8, 290–301 (2008).
20. Wang, N., Tytell, J.D. & Ingber, D.E. Mechanotransduction at a distance: mechanically coupling the extracellular matrix with the nucleus. Nat. Rev. Mol. Cell Biol. 10, 75–82 (2009).
21. Discher, D.E., Mooney, D.J. & Zandstra, P.W. Growth factors, matrices, and forces combine and control stem cells. Science 324, 1673–1677 (2009).
22. Even-Ram, S., Artym, V. & Yamada, K.M. Matrix control of stem cell fate. Cell 126, 645–647 (2006).
23. Bax, D.V., Rodgers, U.R., Bilek, M.M. & Weiss, A.S. Cell adhesion to tropoelastin is mediated via the C-terminal GRKRK motif and integrin alphaVbeta3. J. Biol. Chem. 284, 28616–28623 (2009).
24. Broekelmann, T.J. et al. Tropoelastin interacts with cell-surface glycosaminoglycans via its COOH-terminal domain. J. Biol. Chem. 280, 40939–40947 (2005).
25. Mecham, R.P. et al. Elastin binds to a multifunctional 67-kilodalton peripheral membrane protein. Biochemistry 28, 3716–3722 (1989).
26. Rodgers, U.R. & Weiss, A.S. Cellular interactions with elastin. Pathol. Biol. (Paris) 53, 390–398 (2005).
27. Spofford, C.M. & Chilian, W.M. The elastin-laminin receptor functions as a mechanotransducer in vascular smooth muscle. Am. J. Physiol. Heart Circ. Physiol. 280, H1354–H1360 (2001).
28. Spofford, C.M. & Chilian, W.M. Mechanotransduction via the elastin-laminin receptor (ELR) in resistance arteries. J. Biomech. 36, 645–652 (2003).
29. Galbraith, C.G., Yamada, K.M. & Galbraith, J.A. Polymerizing actin fibers position integrins primed to probe for adhesion sites. Science 315, 992–995 (2007).
30. Schwaiger, I., Sattler, C., Hostetter, D.R. & Rief, M. The myosin coiled-coil is a truly elastic protein structure. Nat. Mater. 1, 232–235 (2002).
31. Puttini, S. et al. Gene-mediated restoration of normal myofiber elasticity in dystrophic muscles. Mol. Ther. 17, 19–25 (2009).
32. Ji, L., Lim, J. & Danuser, G. Fluctuations of intracellular forces during cell protrusion. Nat. Cell Biol. 10, 1393–1400 (2008).
33. Lutolf, M.P., Gilbert, P.M. & Blau, H.M. Designing materials to direct stem-cell fate. Nature 462, 433–441 (2009).
34. Zovein, A.C. et al. Fate tracing reveals the endothelial origin of hematopoietic stem cells. Cell Stem Cell 3, 625–636 (2008).
35. Chen, M.J., Yokomizo, T., Zeigler, B.M., Dzierzak, E. & Speck, N.A. Runx1 is required for the endothelial to haematopoietic cell transition but not thereafter. Nature 457, 887–891 (2009).
36. Bertrand, J.Y. et al. Haematopoietic stem cells derive directly from aortic endothelium during development. Nature 464, 108–111 (2010).
37. Boisset, J.C. et al. In vivo imaging of haematopoietic cells emerging from the mouse aortic endothelium. Nature 464, 116–120 (2010).
38. Kissa, K. & Herbomel, P. Blood stem cells emerge from aortic endothelium by a novel type of cell transition. Nature 464, 112–115 (2010).
39. Mithieux, S.M., Rasko, J.E. & Weiss, A.S. Synthetic elastin hydrogels derived from massive elastic assemblies of self-organized human protein monomers. Biomaterials 25, 4921–4927 (2004).
40. Mitragotri, S. & Lahann, J. Physical approaches to biomaterial design. Nat. Mater. 8, 15–23 (2009).
ONLINE METHODS
Extracellular matrix proteins. We have previously described the expression and purification from bacteria of recombinant human tropoelastin isoforms41,42. SHELΔ26A (synthetic human elastin without domain 26A) is the 60-kDa mature form of the secreted protein after removal of the signal peptide (ELN27-724; tropoelastin). This is the most common normal splice variant of human tropoelastin, as exon 26A is expressed only in certain disease states43,44. Tropoelastin truncations have been described previously, including ELN27-540 (exons 2–25)45, ELN27-365 (exons 2–18)46 and ELN297-595 (exons 17–27)47 (Fig. 3a). Full-length tropoelastin or truncations (1.5 mg/ml in PBS) were deposited on 6- or 24-well plates (Cellstar tissue culture–treated plates, Greiner) overnight at 4 °C. Plates were washed twice with PBS to remove excess tropoelastin immediately before use. Tropoelastin solution was reused multiple times without any detectable difference in performance. Collagen-precoated six-well plates (Becton Dickinson) were washed twice with PBS before use. Fibronectin (1 mg/ml; Millipore) was deposited on 6- or 24-well plates for 30 min at 37 °C according to the manufacturer’s instructions, and the plates were washed twice with PBS immediately before use. Glutaraldehyde was diluted in PBS and deposited on washed tropoelastin-precoated plates for 1 h at room temperature. The glutaraldehyde was subsequently removed, and the plates were quenched twice with 1 M Tris base solution (pH 7.0) and then washed twice with PBS to remove remaining glutaraldehyde and Tris buffer. Two washes after glutaraldehyde treatment were sufficient to avoid any observable cellular toxicity, and incubation of cells in glutaraldehyde at 0.001% caused no reduction in the percentage of LSKs (data not shown).
Cell culture. C57BL/6J mice were obtained from the Animal Resource Centre (Perth, Australia), and all animal experiments were performed in a specific pathogen–free facility according to national and institutional guidelines.
Bone marrow was harvested from the femur, tibia and spine using a mortar and pestle in PBS supplemented with 2% FCS as previously described11–13. Bone marrow cells (2.5 × 10⁶ cells per ml) in duplicate or triplicate were incubated at 5% CO2 in a fully humidified atmosphere in complete Dulbecco’s Modified Eagle’s Medium with 20% FCS (cDMEM), with or without mouse IL-3 (20 ng/ml), human IL-6 (50 ng/ml) and mouse stem cell factor (50 ng/ml; all from Peprotech) in 6- or 24-well plates, with or without precoating with extracellular matrix proteins. Bone marrow cells were harvested using TrypLE (Invitrogen). Inhibitors (blebbistatin, 50 μM; ML-7, 20 μM; both from Merck) were added to the initial culture and again after 48 h. ML-7 was dissolved in DMSO, which was also added to control wells. For CFSE labeling, 5 × 10⁷ cells per ml in DMEM were rapidly mixed with CFSE to a final concentration of 5 μM and incubated at 37 °C for 10 min, before washing four times in cold cDMEM. Human umbilical cord blood mononuclear cells were isolated from fresh samples using Ficoll-Paque density gradient centrifugation following the manufacturer’s instructions. Mononuclear cells were cultured in triplicate in six-well plates, with or without precoating with tropoelastin, harvested after 3 d using TrypLE, washed, and resuspended in PBS with 2% FCS for analysis. Colony assays were performed by plating equal numbers of cells in triplicate in MethoCult medium containing mouse or human cytokines following the manufacturer’s instructions (Stem Cell Technologies). Colonies containing more than ten cells were enumerated and classified after 7–10 d using a light microscope. Recipient mice (C57BL/6J or B6SJL-PtprcaPep3b/BoyJ) were irradiated to 900 cGy using a cesium irradiator before intravenous injection of cultured bone marrow cells (2–4 × 10⁶ cells/mouse).
Reconstituted mice were killed after 8 weeks, at which time spleen and peripheral blood were analyzed by flow cytometry for CD45.1 and CD45.2 expression. The frequency of stem cells and the comparison of the two groups were determined with the ELDA web tool for limiting dilution analysis, or using a nonlinear fit algorithm (x-linear, y-log) with analysis at 37% unsuccessful engraftment48.
Flow cytometric analysis. Bone marrow cells were incubated with lineage-specific antibodies (B220, CD3, Gr-1, Ter119 and CD11b; BioLegend), preconjugated to biotin, together with antibodies against Sca-1–fluorescein isothiocyanate (Sca-1–FITC; BioLegend) or Sca-1–allophycocyanin (Sca-1–APC; BioLegend) and c-Kit–phycoerythrin (c-Kit–PE; Becton Dickinson). SLAM antibody markers used were CD150-APC (BioLegend) and CD48–Pacific Blue (BioLegend). Cells were washed twice in PBS with 2% FCS and stained with streptavidin-PE/indotricarbocyanine (streptavidin PE/Cy5; Becton
Dickinson). Peripheral blood and spleen cells from transplanted mice were stained with lineage-specific antibody combinations together with antibodies against CD45.1–Pacific Blue and CD45.2–Alexa Fluor-700 (BioLegend) to detect recipient and donor cells.
Atomic force microscopy. The mechanical properties of tropoelastin proteins were studied using a purpose-built single-molecule atomic force microscope as previously described49,50. The spring constant of each individual cantilever (MLCT-AUHW: silicon nitride gold-coated cantilevers; Veeco Metrology Group) was calculated using the equipartition theorem51. The r.m.s. force noise (1-kHz bandwidth) was ~10 pN. The pulling speed of the different force-distance curves was in the range of 0.1–0.5 nm/ms. In a typical experiment, purified tropoelastin protein (≤50 μl, 10–100 μg/ml) was adsorbed to a clean glass coverslip for ~10 min and then rinsed with PBS pH 7.4. Random segments of tropoelastin molecules were then picked up by adsorption to the cantilever tip. Less than 30% of AFM pulls showed force-extension curves consistent with either single or multiple tropoelastin molecules. These were separated into single peaks that fit the worm-like chain (WLC) model (Supplementary Fig. 10a) or multiple peaks suggesting that more than one tropoelastin molecule was extended (Supplementary Fig. 10b). The single peak shows the nonlinear relationship of extension length and force. Parameters that are consistent with those for a single full-length tropoelastin molecule include a contour length, Lc, of 210 nm and a persistence length of 0.32 nm. The predicted values for a single 697-amino-acid-long unstructured (random coil) polypeptide chain are 254 nm for Lc and ~0.4 nm for the persistence length. Hence, this recording corresponded to the stretching of a single molecule to almost its full length.
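The worm-like chain fit described above can be sketched in Python as follows. The text names the WLC model but does not give its functional form, so this sketch assumes the widely used Marko-Siggia interpolation formula, a thermal energy of about 4.11 pN·nm (roughly room temperature) and a contour length of about 0.365 nm per residue; these three choices, and the function names, are assumptions rather than details taken from the paper.

```python
KT_PN_NM = 4.11              # assumed thermal energy near 25 °C, in pN·nm
NM_PER_RESIDUE = 0.365       # assumed contour length per amino acid, in nm


def wlc_force(extension_nm, contour_nm, persistence_nm, kt=KT_PN_NM):
    """Force (pN) from the Marko-Siggia worm-like chain interpolation."""
    x = extension_nm / contour_nm
    if not 0 <= x < 1:
        raise ValueError("extension must be in [0, contour length)")
    return (kt / persistence_nm) * (0.25 / (1 - x) ** 2 - 0.25 + x)


def predicted_contour_nm(n_residues, nm_per_residue=NM_PER_RESIDUE):
    """Contour length of a fully unstructured chain of n_residues."""
    return n_residues * nm_per_residue


# Parameters quoted in the text for a single full-length molecule.
fitted_lc_nm, fitted_p_nm = 210.0, 0.32

print(f"predicted Lc for 697 residues: {predicted_contour_nm(697):.0f} nm")
for ext in (50, 105, 150, 200):
    force = wlc_force(ext, fitted_lc_nm, fitted_p_nm)
    print(f"extension {ext:>3} nm -> force {force:6.1f} pN")
```

With these assumptions the predicted contour length for 697 residues is close to the 254 nm quoted in the text, and the force rises steeply as the extension approaches the fitted contour length, which is the nonlinear behavior the single-peak curves show.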
Truncated tropoelastin molecules exhibit a shorter extension length and a different WLC curve to that of full-length tropoelastin (Supplementary Fig. 10c). Only single-peak data were used to generate the extension measurements shown in Figure 3c.
Polystyrene films of 100 nm nominal thickness were prepared by spin coating onto (100) silicon substrates (10 × 10 mm) at 2,000 rpm using a SCS G3P-8 Spincoater. The spin coating solution consisted of polystyrene (Austrex 400 from Polystyrene Australia) dissolved to a concentration of 10 g/l in toluene (Sigma Aldrich; purity >99.9%). Solutions produced a homogeneous film thickness over the entire surface of the silicon wafer. The absence of toluene in the resulting polystyrene films was confirmed by FTIR spectroscopy. After physisorption of tropoelastin, the unbound protein was removed by washing with buffer solution, followed by a brief aqueous immersion before being allowed to dry. AFM images of samples were collected on a Pico SPM instrument in tapping mode, at a scan rate of 0.5 lines/s over areas of 5 × 5 and 1 × 1 μm. Analysis of the AFM images was performed using the WSxM software (version 3, Nanotec Electronica S.L., Spain).
Quartz crystal microbalance. Protein adsorption onto oxidized polystyrene was quantified by quartz crystal microbalance (Q Sense AB) with dissipation monitoring (QCM-D). Gold QCM-D crystals were spin coated with polystyrene and oxidized as described previously52. Experiments were performed at 20 ± 0.1 °C. A stable measurement in PBS (pH 7.2) was established before the addition of tropoelastin (20 μg/ml), ELN27-540 (20 μg/ml), fibronectin (20 μg/ml) or collagen type I (10 μg/ml) for 1 h followed by two rinses with PBS. Adsorbed mass estimates were derived using the Voigt model. Tropoelastin monomers are globular proteins with diameters of ~5–7 nm53. Our QCM-D thickness estimate of 4.1 ± 0.3 nm presented in Figure 4b therefore shows that tropoelastin exists as a single monolayer.
The tropoelastin monolayer has previously been reported to be approximately 350 ng/cm², which is in agreement with the results presented in this manuscript (Fig. 4c)54. Furthermore, tropoelastin was present as a single structural species, as shown by X-ray and neutron scattering (Baldock, C. et al., unpublished data). The circular footprint describes the circular area, given in diameter, covered by each protein molecule, which is calculated from the QCM-D mass and protein molecular weight assuming monolayer adsorption, according to the following equation:

$$\text{circular footprint (cm)} = 2 \times \sqrt{\frac{\mathrm{MW}}{\text{mass} \times N_A \times \pi}}$$

where MW = molecular weight of the protein (ng/mol), mass = mass of adsorbed protein (ng/cm²) and N_A = Avogadro’s number (6.02 × 10²³ mol⁻¹). The circular footprint calculations were used to determine the binding footprint of intact tropoelastin, ELN27-540, fibronectin and collagen.
Statistical analysis. All primary cell culture experiments were performed on duplicate or triplicate samples, with the figure legend showing the number (n) of separate experiments performed. Statistical analysis was performed using GraphPad Prism 5 (GraphPad Software). Statistical significance was determined using a two-tailed Mann-Whitney t test unless stated otherwise in the figure legend.
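The circular footprint calculation in the quartz crystal microbalance section can be sketched in Python as below. The sketch reads the equation as the diameter of a circle whose area equals the surface area available per adsorbed molecule, that is, twice the square root of MW divided by (mass × Avogadro's number × π); that reading follows from the geometry and the stated units and is an assumption here. The function name and unit conversions are illustrative, and the 350 ng/cm² input is the literature monolayer value quoted in the text, not a measurement from Figure 4c.

```python
import math

AVOGADRO = 6.02e23  # mol^-1, the value used in the text


def circular_footprint_nm(mw_kda, mass_ng_per_cm2):
    """Diameter (nm) of the circular area occupied by one adsorbed molecule."""
    mw_ng_per_mol = mw_kda * 1e3 * 1e9          # kDa -> g/mol -> ng/mol
    molecules_per_cm2 = mass_ng_per_cm2 * AVOGADRO / mw_ng_per_mol
    area_cm2 = 1.0 / molecules_per_cm2          # area available per molecule
    diameter_cm = 2.0 * math.sqrt(area_cm2 / math.pi)
    return diameter_cm * 1e7                    # cm -> nm


# Tropoelastin (60 kDa) at the reported monolayer mass of ~350 ng/cm^2.
print(f"{circular_footprint_nm(60.0, 350.0):.2f} nm")
```

For these inputs the footprint is about 6 nm, which sits inside the ~5–7 nm diameter range given for tropoelastin monomers and so is consistent with monolayer adsorption.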
41. Martin, S.L., Vrhovski, B. & Weiss, A.S. Total synthesis and expression in Escherichia coli of a gene encoding human tropoelastin. Gene 154, 159–166 (1995).
42. Wu, W.J., Vrhovski, B. & Weiss, A.S. Glycosaminoglycans mediate the coacervation of human tropoelastin through dominant charge interactions involving lysine side chains. J. Biol. Chem. 274, 21719–21724 (1999).
43. Debelle, L. & Tamburro, A.M. Elastin: molecular description and function. Int. J. Biochem. Cell Biol. 31, 261–272 (1999).
44. Ostuni, A., Lograno, M.D., Gasbarro, A.R., Bisaccia, F. & Tamburro, A.M. Novel properties of peptides derived from the sequence coded by exon 26A of human elastin. Int. J. Biochem. Cell Biol. 34, 130–135 (2002).
45. Wu, W.J. & Weiss, A.S. Deficient coacervation of two forms of human tropoelastin associated with supravalvular aortic stenosis. Eur. J. Biochem. 266, 308–314 (1999).
46. Rodgers, U.R. & Weiss, A.S. Integrin alpha v beta 3 binds a unique non-RGD site near the C-terminus of human tropoelastin. Biochimie 86, 173–178 (2004).
47. Toonkool, P., Jensen, S.A., Maxwell, A.L. & Weiss, A.S. Hydrophobic domains of human tropoelastin interact in a context-dependent manner. J. Biol. Chem. 276, 44575–44580 (2001).
48. Hu, Y. & Smyth, G.K. ELDA: extreme limiting dilution analysis for comparing depleted and enriched populations in stem cell and other assays. J. Immunol. Methods 347, 70–78 (2009).
49. Miller, E., Garcia, T., Hultgren, S. & Oberhauser, A.F. The mechanical properties of E. coli type 1 pili measured by atomic force microscopy techniques. Biophys. J. 91, 3848–3856 (2006).
50. Greene, D.N. et al. Single-molecule force spectroscopy reveals a stepwise unfolding of Caenorhabditis elegans giant protein kinase domains. Biophys. J. 95, 1360–1370 (2008).
51. Florin, E.L. et al. Sensing specific molecular interactions with the atomic force microscope. Biosens. Bioelectron. 10, 895–901 (1995).
52. Lord, M.S. et al. Monitoring cell adhesion on tantalum and oxidised polystyrene using a quartz crystal microbalance with dissipation. Biomaterials 27, 4529–4537 (2006).
53. Mecham, R.P. & Heuser, J.E. The elastic fiber. in Cell Biology of Extracellular Matrix (ed. Hay, E.D.) 79–110 (Plenum Press, New York, 1991).
54. Yin, Y. et al. Covalent immobilisation of tropoelastin on a plasma deposited interface for enhancement of endothelialisation on metal surfaces. Biomaterials 30, 1675–1681 (2009).
doi:10.1038/nbt.1687
ERRATA AND CORRIGENDA
Erratum: Pfizer explores rare disease path Catherine Shaffer Nat. Biotechnol. 28, 881–882 (2010); published online 9 September 2010; corrected after print 22 September 2010 In the version of this article initially published, it was reported that GlaxoSmithKline’s (GSK’s) EpiNova was one of several “biotech-like ideas” that “have been known to fizzle in pharma hands”; in fact, EpiNova has not “fizzled” but is in its second year of operation as a discovery performance unit of GSK focusing on epigenetic approaches to autoimmune disease. The error has been corrected in the HTML and PDF versions of the article.
Erratum: Public biotech 2009—the numbers Brady Huggett, John Hodgson & Riku Lähteenmäki Nat. Biotechnol. 28, 793–799 (2010); published online 9 August 2010; corrected after print 13 October 2010
In the version of this article initially published, in Table 6, Acorda was said to have entered into a licensing agreement with Bayer. In fact, Acorda entered into a licensing agreement with Biogen, not Bayer. The error has been corrected in the HTML and PDF versions of the article.
Corrigendum: Food firms test fry Pioneer’s trans fat-free soybean oil Emily Waltz Nat. Biotechnol. 28, 769–770 (2010); published online 9 August 2010; corrected after print 13 October 2010 The version of the article originally published states that Monsanto petitioned the USDA for deregulation of two “soybean products with modified oil profiles, one with omega-3 fatty acids for nutrition and the other with enhanced texture and functionality, called high stearic acid soybeans.” The article should have stated that “Monsanto has petitioned for deregulation of Vistive Gold soybeans, with mono-unsaturated fat levels similar to that of olive oil, and saturated fat levels similar to canola oil, which would produce an oil more stable than regular soybean oil at high frying temperatures.” The high stearate soybeans are still in development. The error has been corrected in the HTML and PDF versions of the article.
Corrigendum: Glyphosate resistance threatens Roundup hegemony Emily Waltz Nat. Biotechnol. 28, 537–538 (2010); published online 7 June 2010; corrected after print 13 October 2010 The version of the article originally published erroneously states that “Unlike pesticide use, herbicide use is not regulated by the US federal government.” The article should have stated “Unlike insect resistance, the US government does not have a mandated herbicide-resistance program.” The error has been corrected in the HTML and PDF versions of the article.
Corrigendum: Pluripotent patents make prime time: an analysis of the emerging landscape Brenda M Simon, Charles E Murdoch & Christopher T Scott Nat. Biotechnol. 28, 557–559 (2010); published online 7 June 2010; corrected after print 13 October 2010 In the version of this article initially published, the authors state: “The patents have been cross-licensed, protecting against unlicensed use of either method. Both the Sakurada and Yamanaka patents are part of the portfolio held by iPierian, a company recently formed by the merger of iZumi Bio, a San Francisco Bay Area biotech and Boston-based Pierian.” This statement is incorrect. The Yamanaka patent (owned by Kyoto University) is not licensed to iPierian. The Sakurada patent (owned by iPierian) is not licensed to Kyoto University. The error has been corrected in the HTML and PDF versions of the article.
nature biotechnology volume 28 number 10 OCTOBER 2010
careers and recruitment
Portfolio managing for scientists David Sable A doctor-turned-portfolio manager finds the ever-changing economic and business environment stimulating.
At 10:44 AM on the Tuesday after Labor Day, seven of the 26 tiny companies currently in the Special Situations Life Sciences Fund are trading up, ten are trading down and eight have not yet traded. The fund has risen by eight basis points (0.08%). The general market indices are trading down for the day and have been flat for the year. I sit in front of six computer screens, four of which are filled with flashing red and green numbers, one showing a research report and a biomedical statistics program and the last filled with e-mail, instant messaging, and Twitter and news feeds. From this mix of data sources I look for situations where I can be right while everyone else is wrong, and triage my investors’ capital to those places where it can do the most good and generate the highest returns. This is not what I envisioned when I graduated from medical school. There are plenty of white coats hanging in the closets of portfolio managers, analysts, investment bankers and traders. A lot of scientists, doctors and engineers work on Wall Street; the skill sets needed to succeed in any of these fields are very similar. Moreover, given the increasing complexity of the platforms on which many biotech, life sciences and medical technology firms are based, familiarity with the basic vocabulary of molecular biology, epidemiology and biostatistics is essential to making educated investment analyses, recommendations or decisions. There are several routes from the laboratory to the business world. Most scientists get their starts on Wall Street as ‘sell-side’ analysts, supporting investment bankers who broker sales of companies, stock offerings and merger activity by evaluating the quality of the scientific work underlying potential new client companies, and later publishing written reports on those companies as they progress through their business plans. The scope of the analyses broadens over
David Sable is at Special Situations Life Sciences Fund, New York, New York, USA. e-mail: [email protected]
time from a narrow focus on just the scientific platform of a company to a comprehensive evaluation that also includes critiques of the management, capital structure and other core business considerations. Experienced analysts are therefore bilingual, conversant in the languages of science and business. Many analysts later move to the ‘buy-side’, that is, to hedge funds, venture capital or private equity funds and mutual funds, where they help the portfolio manager decide how to invest clients’ money. Some subsequently become portfolio managers themselves.

From bedside to buy-side

I took a somewhat unusual route to being a portfolio manager. After graduating with a degree in economics, I attended medical school, completed a residency in obstetrics and gynecology and a fellowship in reproductive endocrinology, and practiced medicine for 11 years. During this time I also started and funded a business, performed consulting work for colleagues in venture capital and continued studying finance and accounting. In 2003, I was asked to evaluate the healthcare portfolio of a trading desk at Deutsche Bank, which in turn led to an offer to manage the healthcare portfolio of the Special Situations Funds, a New York hedge fund group. As portfolio manager I decide which of the almost 3,000 public life science companies to include in our funds, negotiate the terms of structured stock offerings in which we participate, collect industry and company-specific data and formulate our funds’ strategies to respond to general science and healthcare shifts, such as healthcare reform or changes in government funding for stem cell research. A typical week includes meetings with five or six company management teams, negotiations with bankers over two or three major stock sales and discussions with senior managers of companies in which we have invested, all punctuated by fact checks and on-the-fly opinions with my partners and frequent glances at the flashing red and green numbers showing market activity. I travel a couple of times a month for medical conferences, company site visits or board meetings. I can duplicate my desk by means of the internet from anywhere in the world.

Although there are similarities between the laboratory and the business world, there are fundamental differences as well. Those of us trained in science, having grown accustomed to reproducible outcomes, the scientific method and evidence-based decision making, face an adjustment period: in the business world the standards for data presentation vary with the judgment and intentions of the presenter. The technical and very specialized platforms on which biotech and life science companies are built give senior management teams an advantage when dealing with potential investors, particularly those who lack the science vocabulary to adequately assess the value that these companies may represent. Intentional omissions of relevant data or background material, misleading presentations of data and outright errors in statistical assumptions are not unusual.

Labor Day week has ended. A small biotech company was acquired by a large pharma company, whereas a large biotech company tries to avoid the same fate, or extract a higher price. The National Institutes of Health has resumed stem cell funding. My fund reinvested in newborn health. I spoke at length with the CEO of one of our portfolio companies about funding his clinical trial, then had him walk me through how it was powered. Another company in my fund announced that their pivotal data will be released in a few days, whereas a third pushed their phase 2 proof-of-concept trial back a few months. Many data points to analyze, ideas to generate, decisions to make, outcomes to observe, record and report. Science after all.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.
people
George F. Horner III (left) was named chairman of the board of directors of Luxembourg-based Creabilis. He brings a track record of building successful biotech businesses, most recently as CEO of Prestwick Pharmaceuticals, which was sold to Biovail and Ovation Pharmaceuticals in late 2008. Previously, Horner was the CEO of Vicuron. Horner said he was “attracted to this opportunity by the company’s rich pipeline of clinical and preclinical drug candidates focused on dermatological indications, its deep understanding of the science behind a number of serious skin disorders, as well as its unique technology platform with broad utility. I am sure that from this strong base we will be able to generate significant value for Creabilis shareholders.”
Agios Pharmaceuticals (Cambridge, MA, USA) has appointed Scott Biller CSO. He joins Agios from Novartis Pharmaceuticals, where he was vice president and head of global discovery chemistry at the Novartis Institutes for BioMedical Research. Effective on December 31, Dario Carrara will resign from his position as senior vice president and managing director of the pharmaceutical group at Antares Pharma (Ewing, NJ, USA). Carrara will transition to Ferring International Center, which in November 2009 purchased from Antares certain assets and assumed a leased facility in Switzerland along with a majority of the site’s employees. MethylGene (Montreal) has announced the resignation of Donald F. Corcoran as president, CEO and director. He joined MethylGene in 1997. Charles Grubsztajn, who has been with MethylGene since 2005, most recently serving as vice president, business development, has been appointed president and CEO. Intarcia Therapeutics (Hayward, CA, USA) has announced the appointment of Kurt Graves as executive chairman of the board. Graves most recently served as executive vice president, head of corporate and strategic development and chief commercial officer at Vertex Pharmaceuticals. Before joining Vertex, he held senior leadership positions at Novartis Pharmaceuticals including US general manager and head of commercial operations and then global head of general medicines and chief marketing officer for the pharmaceuticals division.
Aya Jakobovits was appointed president and CEO of cancer immunotherapy developer Kite Pharma (Los Angeles). She brings over 20 years of experience, previously serving as executive vice president and head of R&D at Agensys, which became an affiliate of Astellas Pharmaceuticals in December 2007. Before joining Agensys in 1999, Jakobovits served as director, discovery research and principal scientist at Abgenix, which was spun out of Cell Genesys in 1996 based on the XenoMouse technology developed under her leadership. Kite Pharma also announced the appointment of Gloria Lee as chief medical officer. Lee comes from Syndax Pharmaceuticals, where she served as vice president of clinical development. Frank Kelly Jr. has been appointed a director at InVasc Therapeutics (Atlanta). His 40 years of experience include serving as president and CEO of a joint venture between The Coca-Cola Company and Nestlé Refreshments. He is a principal at MFK Global, a marketing and consulting firm. Steven Lo joined Corcept Therapeutics (Menlo Park, CA, USA) as vice president, commercial operations. Lo has worked 15 years in the pharma and biotech industry, most recently leading the endocrinology marketing and sales organization at Genentech. Kane Biotech (Winnipeg, Manitoba, Canada) has appointed Philip Renaud to its board of
directors. Renaud is managing director of investment advisory firm Church Advisors and was previously a founding partner of Change Capital Partners. He serves on the boards of Yorbeau Resources, Diagnos and Dia Bras Exploration. Gene Logic (Gaithersburg, MD, USA), an Ocimum Biosolutions company, appointed Norrie J.W. Russell president. Russell has a 30-year history in pharmaceutical R&D holding leadership roles at NovaRx, Invitrogen, Aviva Biosciences and Lynx Therapeutics. He also was formerly global head, biological science and technology at AstraZeneca. FluoroPharma (Boston) has named Thijs Spoor as CEO and a member of its board of directors. Spoor has 15 years of industry experience, most recently as CFO of Sunstone BioSciences. He previously served in regulatory affairs, new product development and as the global brand head for nuclear cardiology at GE Healthcare. CytoGenix (New Braunfels, TX, USA) named Cy Stein chairman of the company’s board of directors, replacing Randy Moseley. Stein, a professor of medicine and molecular pharmacology at the Albert Einstein College of Medicine in New York, has been on the CytoGenix board for 7 years and has served as chairman of the scientific advisory board. Additionally, Moseley has stepped down as CFO. His replacement is Steven M. Plumb, former president of Clear Financial Solutions. He also co-founded and served as CFO of Houston Pharma and 3A Pharma. Marc Tessier-Lavigne, executive vice president for research and CSO at Genentech, will become the next president of Rockefeller University (New York) effective March 1, 2011. His is the first departure from Genentech’s top scientific ranks since its acquisition by Roche in March 2009. Tessier-Lavigne will succeed Paul M. Nurse, who announced his plans in April to become president of the UK’s Royal Society. George Yu was appointed president and CEO of Sinobiomed (Hong Kong). 
He most recently served as managing partner of Bay2Peak, a financial advisory and investment management firm. His experience includes small-cap hedge and venture capital funds in emerging markets and investment banking at Lehman Brothers.
Editorial
Teetering on the brink The US Congress must authorize federal funding of human embryonic stem cell research.
With his August 23 preliminary injunction banning federal funding of research on human embryonic stem cells (hESCs), US District Court Judge Royce Lamberth has singlehandedly upended one of the most promising fields of biomedical science. On September 9, in response to an appeal by the Department of Justice, the injunction was temporarily lifted by the US Court of Appeals in Washington, DC. Still to be decided are the plaintiffs’ appeal of the appellate court’s decision and the original lawsuit, Sherley v. Sebelius. The legal wrangling, which may reach the US Supreme Court, could drag on for some time. In the meantime, a cloud of uncertainty hangs over US hESC research, spreading confusion and anguish among US National Institutes of Health (NIH)-funded scientists and their collaborators, disrupting careers, damaging the prospects of companies working in the area of regenerative medicine and impeding the search for new therapies. Lamberth’s ruling rested on his interpretation of the ambiguous Dickey-Wicker amendment, a rider attached annually to the federal appropriations bill covering the NIH. Introduced in 1996, two years before hESCs were first derived, Dickey-Wicker prohibits the use of federal money for “research in which a human embryo or embryos are destroyed, discarded, or knowingly subjected to risk of injury or death.” Lamberth’s reading of the amendment, that it excludes federal funding of hESC research, contradicts the interpretations of US Presidents Clinton, Bush and Obama; of Congress, which twice passed bills supporting federal funding of hESC research and has never acted to ban such funding; and of the NIH, responsible for administering grants in accordance with Dickey-Wicker.
Their view that federally funded hESC research is permissible relied on a 1999 analysis by Harriet Rabb, then the general counsel of the Department of Health and Human Services, who noted that “human pluripotent stem cells […] are not a human embryo within the statutory definition.” Although some US stem cell scientists and their supporters had worried about Dickey-Wicker for years, most were stunned by the latest turn of events. The field was thriving as never before. The NIH had funded hESC grants since 2002, Obama had removed Bush-era funding restrictions and offered an eloquent defense of hESC research, and support among the public and Congress was continuing to rise. To those familiar with Sherley v. Sebelius, the plaintiffs’ case appeared weak. In hindsight, it is clear that many were lulled into a false sense of security and that Congress miscalculated in failing to codify Obama’s March 2009 Stem Cell Executive Order when the political climate was more favorable. Now, only weeks before an election defined by an anti-incumbent mood that has politicians reluctant to take on controversial issues, the chances for Congressional action seem slim. The chilling effect from Lamberth’s ruling is likely to include an exodus of US scientists from the stem cell field, the departure of others to continue their studies abroad and loss of US leadership in a field with exceptional therapeutic promise. Although the ruling targets only federally funded scientists, it will surely affect regenerative medicine
companies as well, harming efforts to translate basic research on stem cells into therapies. Research on hESCs has always been whipsawed by politics and inconsistent policy. If the US government abandons the field now, the consequences for tech transfer from federally funded universities and for industry partnerships with NIH-funded academics, as well as questions over the eligibility of hESC-based therapies for reimbursement, will make regenerative medicine even less attractive to investment. Investors, boards and industry scientists are already skeptical about how close hESCs are to therapies, and in a bad economic environment for biotech generally, a new round of uncertainty may prove to be the field’s undoing. The bright spot in this sorry tale is California, where the California Institute for Regenerative Medicine has provided $351 million for projects that include hESCs (not counting money spent on facilities and training grants). Although this and other smaller state initiatives have come under pressure from severe state budget crunches, such investments now seem prudent given that they have buffered the effects of fickle federal policy. On scientific grounds, the argument for continued research on hESCs is irrefutable. Different hESC lines behave differently; understanding these differences and developing therapeutic strategies will require comparisons among a large number of lines. Although induced pluripotent stem cells have many advantages and may one day replace hESCs, in the foreseeable future the latter will remain an indispensable cell type for studying pluripotency and differentiation. During the Bush administration, many US scientists and companies interested in working on hESCs concluded that the obstacles were simply too great. Obama’s executive order placed the presidential imprimatur on this young, controversial science, granting it unprecedented legitimacy and encouraging those who had stayed on the sidelines to proceed. 
But as the Lamberth decision makes clear, Dickey-Wicker represents a sword of Damocles over the field. It has often been pointed out that allowing excess in vitro fertilization (IVF) embryos to be discarded while outlawing federally funded research on these embryos is inconsistent. US prohibitions on embryo research reach back as far as 1974, when opponents of Roe v. Wade claimed that the ruling would bring about worst-case scenarios, including indiscriminate embryo experimentation. For 35 years, federal moratoria on embryo research have made investigation of infertility, early human development, reproductive medicine and pre-natal diagnosis off-limits to most US scientists and clinicians. The US needs laws that will protect free inquiry in these areas, and particularly in the field of hESCs, in accordance with ethical norms and the NIH’s competitive, merit-based formula. On the day he signed his Stem Cell Executive Order, Obama also issued a memorandum requiring the development of “a strategy for restoring scientific integrity to government decision making.” To ensure the scientific integrity of stem cell research, whatever the outcomes of the court cases, the best solution is swift legislative action.
As stem cell research findings percolate into the clinic, there is a growing realization that a lack of procedural and regulatory standards is creating huge translational potholes. “I can tell you based on my experience,” says Tara Clark, general manager of North American clinical operations for Bergisch Gladbach, Germany–based Miltenyi Biotec, “that two investigators from different institutions have submitted a similar stem cell research proposal to the FDA [US Food and Drug Administration]. One came back regulated as a device and the other came back regulated as a biologic.” For Geron, regulatory confusion has translated into a two-year stop/go/stop/go as it has attempted to embark on the first clinical trials using cells derived from human embryonic stem cells (hESCs) to treat spinal cord injuries (Nat. Biotechnol. 27, 877, 2009). The FDA has now lifted the clinical hold it imposed last August on the biotech after studies revealed that mice used in preclinical work had developed cysts. After almost a year of delay, the Menlo Park, California–based firm has the go-ahead to start recruiting patients with spinal cord injuries. Geron is paving the way for human testing of hESC-derived products, but the lack of standards for translational work is seen as so serious and pervasive that a slew of efforts is underway to address the problem. The International Society for Cellular Therapy (ISCT), a researcher/industry organization headquartered in Vancouver, Canada, the International Society for Stem Cell Research (ISSCR) based in Deerfield, Illinois, the European Medicines Agency (EMA) of London, and the California Institute for Regenerative Medicine (CIRM) in San Francisco, all have standardization initiatives in the works. The reasons underlying the push are multifold. The most immediate is that hESC-derived therapies have now joined adult stem cell therapies under investigation in the clinic. According to a recent estimate, about 68 stem cell-based approaches are
currently in clinical development (Stem Cells 6, 517–520, 2010). However, there is no proven approval pathway for either companies or regulators to refer to. “This is the very first time that FDA [has] had to review an IND [investigational new drug] application like this one,” says Anna Krassowska, head of investor and media relations for Geron, which has faced a procession of regulatory red lights and green lights. “So we and they don’t have a lot of information to go on to say ‘do this, and this and then this,’” Krassowska notes. Standardizations are also proving hard to arrive at because unlike new chemical entities, for example, cellular products are often heterogeneous and difficult to standardize. “Every type of stem cell seems to be different in its behavior and thus in its underlying biology,” remarks Lawrence Goldstein, director of the University of California at San Diego’s Stem Cell Program and an ISSCR spokesperson. Paul de Sousa, senior research fellow, MRC Centre for Regenerative Medicine, University of Edinburgh and chief scientist at Roslin Cells, also in Edinburgh, agrees. “Stem cells are by their nature the epitome of a dynamic entity…they can be one thing one minute and something else another minute,” he says. One consequence of this variability is that the proof of a therapeutic efficacy must be joined to some sort of regulatory process which ensures that the cell type which began the research hasn’t transmogrified into something quite different by the time it is implanted in a person. Guarantees of the uniformity of the cells are further complicated by the reality that in many cases stem cells are harvested from one patient and then reimplanted in someone else. These cells carry the unique genetic fingerprint of the donor. 
“Intrinsically the products are different between one patient and another,” says Mohamad Mohty, a professor of hematology at the University of Nantes, France, and chairman of the prospective clinical trials committee of the European Group for Blood and Marrow Transplantation.
Another effect of stem cells’ variability is the lack of a standardized approach to growing them in culture. Jon Rowley, director of cell therapy process development at Lonza, a Basel-based company that specializes in a good manufacturing practice approach to stem cell production, points out that a stem cell line’s phenotypic development can be significantly altered by the medium it is grown in. Unfortunately, the fetal animal serum
Sebastian Kaulitzki/istockphoto
Geron trial resumes, but standards for stem cell trials remain elusive
Geron has clearance to resume its pioneering trial of human ESC-derived oligodendrocyte precursor cell therapy in humans with spinal cord injuries.
NEWS
in brief China’s $2.4 billion splurge
c40/ZUMA Press/Newscom
Biopharma projects will receive billions.

The Chinese government is pouring an estimated 16 billion yuan ($2.4 billion) into shoring up drug development while introducing policies to promote the biotech sector. The new policies, designed to boost seven emerging strategic industries, from sustainable energies to biotech, came under a resolution issued by the State Council, China’s cabinet, on September 8. China’s key new drug R&D scheme was launched in 2009. In its first stage, which will last until 2011, central government will invest nearly 6 billion yuan ($882.5 million) to support more than 900 drug development projects as well as several innovative technology platforms. This is followed by a second stage, running from 2011 to 2015, with an expected 10 billion yuan ($1.47 billion). The biopharma sector is expected to be one of the main beneficiaries of this funding push, although the government’s recent announcement did not provide a breakdown of the investments. Central government plans to couple this financial support with moves to strengthen intellectual property protection, and promote favorable taxation and lending policies. Zailin Yu, chairman and CEO of Tianjin-based protein drug developer SinoBiotech, who is funded by the scheme, says there is no preference for biologics or chemical drugs, as long as the proposals are strong. Mingde Yu, president of China Pharmaceutical Enterprise Management Association, in Beijing, says Chinese firms are unlikely to develop original chemical compounds, and he believes the opportunities lie in developing biotech drugs. But despite this strong governmental support, biopharma researchers complain the money is spread thin among hundreds of projects. The promised funding also arrives late, takes a long time to reach scientists and is too tightly regulated, leaving researchers little flexibility to modify their research plans.
In addition, most contract research organizations (CROs) and large international pharma with facilities in China are not invited to participate in the scheme, despite their expertise manufacturing to international standards. “In China, most of the huge government support goes to academics who lack industrial experience and to state-owned pharmaceuticals because of the gap between the public institutions and privately and foreign-owned industries. This is a big loss to innovative drug development,” says Shoufu Lu, founder and CEO of Shanghai’s Zhangjiang-based startup Aqbio Pharma. “We CROs charge more, so academics do not accept us. But we are happy to cut our prices in order to be involved in the State-funded projects as long as there is mutual understanding between academics and us,” says the CEO of a leading CRO in Shanghai’s Zhangjiang, who requested anonymity. Hepeng Jia
mediums used in many university laboratories carry safety risks whereas the push toward animal product–free media during commercial scale-ups can create the phenotypic drift everyone worries about. “And,” says Rowley, “if there is too big a change (in phenotype) you may have to re-run expensive preclinical or even early human clinical trials.” Clinical trial design is a further challenge. Traditional small molecules and antibodies have a limited life in the body. If you cease administering the drug the body eventually washes it out. But hESCs and other specialized stem cells don’t leave the body; they become part of it in a manner akin to the implantation of a medical device. “In many cases, the introduction of cells into a human patient, at least with current technology, is often an irreversible intervention,” says Goldstein. Stem cells’ idiosyncratic biology also creates unique intellectual property issues for people looking for ways of standardizing patent claim processes. “There are patent thickets everywhere,” says Debra Mathews, assistant director for science programs, the Johns Hopkins Berman Institute of Bioethics, and principal investigator in the Hinxton Group Project. Her institute is trying to come up with ways of optimizing stem cell innovation while at the same time ensuring its products reach as many people as quickly as possible (Nat. Biotechnol. 28, 544–546, 2010). “The unique property of [hESCs] makes for a particularly sticky wicket, as a pluripotent stem cell is a gateway technology,” says Mathews. “And patent control over a line of [hESCs] gives the patent holder control over downstream research, such as that which differentiates stem cells into neurons, islet cells, isolate proteins, et cetera,” she says. And to all of the above must be added what is described as the ‘low hanging fruit’ complication.
There are already treatments for most simple conditions, and stem cells are held up as a treatment for the high hanging and as-yet intractable conditions. Geron’s potential treatment to restore limb movement after a spinal injury is a classic example. Benchmarking effectiveness of a therapy in such a condition is a substantial challenge. Finally, the plethora of standardization uncertainty can translate into a translational funding paralysis. “It is clear not enough information is available for new investors to make informed decisions,” remarks Robert Deans, senior vice president of Regenerative Medicine of Cleveland-based Athersys, and chair of the ISCT Commercialization
Committee. Unsurprisingly, the multiplicity of issues to be resolved has created a certain caution in those groups seeking to have various parts of the translation process become more standardized. Deans says the ISCT is not at this point programmatic but seeks rather to bring industry and researchers together to arrive at a consensus. “We want to give regulators, such as the FDA, exposure to certain tests and scientific models and let them hear from a number of academic investigators what the bottom line should be,” he says. Elona Baum, general counsel for CIRM and the point person in CIRM’s efforts to come up with standards, says that her organization has actively begun to investigate what the standardization priorities should be. Working with the Washington, DC–based lobbying group, the Alliance for Regenerative Medicine, they are looking at what standards and guidelines already exist and are asking key players in the field what should be done and in what order. “We all agree with the need to move ahead, now we are trying to identify what our priorities should be,” she says. Goldstein says ISSCR’s core belief is that “the most important thing is protection of the people who will participate in the trials, or who will potentially purchase marketed therapies.” With this in mind ISSCR recently created a website to provide information by which patients and physicians can judge the bona fides of stem cell–based cures being promoted on the internet by clinics around the world (Nat. Biotechnol. 28, 885, 2010). In Europe, EMA has been pushing active consultations on various areas of stem cell research and applications that need regularization. It hopes to have a guidance document adopted by November, which should go up on its website soon after. But with all the push for adopting uniform standards, some in the field fear more regulatory paralysis.
University of Nantes’s Mohty points out that a European directive in 2001 that aimed to standardize all clinical trial procedures and thus speed up the approval process actually has had the opposite effect when it comes to stem cells. “It is very difficult, maybe even nearly impossible, to perform clinical trials in the field of hematopoietic stem cell transplantation because this activity cannot be compared with single drugs,” he says. “Consequently, there has been a big drop in the number of clinical trials performed in Europe after that directive.” Simply put, the regulatory approval bar has been raised too high. Stephen Strauss Toronto
volume 28 number 10 october 2010 nature biotechnology
NEWS
in brief
China’s $2.4 billion splurge
c40/ZUMA Press/Newscom
Biopharma projects will receive billions.
The Chinese government is investing an estimated 16 billion yuan ($2.4 billion) to shore up drug development while introducing policies to promote the biotech sector. The new policies, designed to boost seven emerging strategic industries, from sustainable energies to biotech, came under a resolution issued by the State Council, China’s cabinet, on September 8. China’s key new drug R&D scheme was launched in 2009. In its first stage, which will last until 2011, central government will invest nearly 6 billion yuan ($882.5 million) to support more than 900 drug development projects as well as several innovative technology platforms. A second stage, running from 2011 to 2015, will follow, with an expected 10 billion yuan ($1.47 billion). The biopharma sector is expected to be one of the main beneficiaries of this funding push, although the government’s recent announcement did not provide a breakdown of the investments. Central government plans to couple this financial support with moves to strengthen intellectual property protection and to promote favorable taxation and lending policies. Zailin Yu, chairman and CEO of Tianjin-based protein drug developer SinoBiotech, who is funded by the scheme, says there is no preference for biologics or chemical drugs, as long as the proposals are strong. Mingde Yu, president of China Pharmaceutical Enterprise Management Association, in Beijing, says Chinese firms are unlikely to develop original chemical compounds, and he believes the opportunities lie in developing biotech drugs. But despite this strong governmental support, biopharma researchers complain the money is spread thin among hundreds of projects. The promised funding also arrives late, takes a long time to reach scientists and is too tightly regulated, leaving researchers little flexibility to modify their research plans.
In addition, most contract research organizations (CROs) and large international pharma with facilities in China are not invited to participate in the scheme, despite their expertise in manufacturing to international standards. “In China, most of the huge government support goes to academics who lack industrial experience and to state-owned pharmaceuticals because of the gap between the public institutions and privately and foreign-owned industries. This is a big loss to innovative drug development,” says Shoufu Lu, founder and CEO of Shanghai’s Zhangjiang-based startup Aqbio Pharma. “We CROs charge more, so academics do not accept us. But we are happy to cut our prices in order to be involved in the State-funded projects as long as there is mutual understanding between academics and us,” says the CEO of a leading CRO in Shanghai’s Zhangjiang, who requested anonymity.
Hepeng Jia
mediums used in many university laboratories carry safety risks whereas the push toward animal product–free media during commercial scale-ups can create the phenotypic drift everyone worries about. “And,” says Rowley, “if there is too big a change (in phenotype) you may have to re-run expensive preclinical or even early human clinical trials.” Clinical trial design is a further challenge. Traditional small molecules and antibodies have a limited life in the body. If you cease administering the drug, the body eventually washes it out. But hESCs and other specialized stem cells don’t leave the body; they become part of it in a manner akin to the implantation of a medical device. “In many cases, the introduction of cells into a human patient, at least with current technology, is often an irreversible intervention,” says Goldstein. Stem cells’ idiosyncratic biology also creates unique intellectual property issues for people looking for ways of standardizing patent claim processes. “There are patent thickets everywhere,” says Debra Mathews, assistant director for science programs at the Johns Hopkins Berman Institute of Bioethics and principal investigator in the Hinxton Group Project. Her institute is trying to come up with ways of optimizing stem cell innovation while at the same time ensuring its products reach as many people as quickly as possible (Nat. Biotechnol. 28, 544–546, 2010). “The unique property of [hESCs] makes for a particularly sticky wicket, as a pluripotent stem cell is a gateway technology,” says Mathews. “And patent control over a line of [hESCs] gives the patent holder control over downstream research, such as that which differentiates stem cells into neurons, islet cells, isolate proteins, et cetera,” she says. And to all of the above must be added what is described as the ‘low-hanging fruit’ complication.
There are already treatments for most simple conditions, and stem cells are held up as a treatment for the high-hanging and as-yet intractable conditions. Geron’s potential treatment to restore limb movement after a spinal injury is a classic example. Benchmarking the effectiveness of a therapy in such a condition is a substantial challenge. Finally, the plethora of standardization uncertainties can translate into a paralysis in translational funding. “It is clear not enough information is available for new investors to make informed decisions,” remarks Robert Deans, senior vice president of Regenerative Medicine of Cleveland-based Athersys, and chair of the ISCT Commercialization Committee. Unsurprisingly, the multiplicity of issues to be resolved has created a certain caution in those groups seeking to have various parts of the translation process become more standardized. Deans says the ISCT is not at this point programmatic but seeks rather to bring industry and researchers together to arrive at a consensus. “We want to give regulators, such as the FDA, exposure to certain tests and scientific models and let them hear from a number of academic investigators what the bottom line should be,” he says. Elona Baum, general counsel for CIRM and the point person in CIRM’s efforts to come up with standards, says that her organization has actively begun to investigate what the standardization priorities should be. Working with the Washington, DC–based lobbying group, the Alliance for Regenerative Medicine, they are looking at what standards and guidelines already exist and are asking key players in the field what should be done and in what order. “We all agree with the need to move ahead, now we are trying to identify what our priorities should be,” she says. Goldstein says ISSCR’s core belief is that “the most important thing is protection of the people who will participate in the trials, or who will potentially purchase marketed therapies.” With this in mind, ISSCR recently created a website to provide information by which patients and physicians can judge the bona fides of stem cell–based cures being promoted on the internet by clinics around the world (Nat. Biotechnol. 28, 885, 2010). In Europe, the EMA has been pushing active consultations on various areas of stem cell research and applications that need regularization. It hopes to have a guidance document adopted by November, which should go up on its website soon after. But with all the push for adopting uniform standards, some in the field fear more regulatory paralysis.
University of Nantes’s Mohty points out that a European directive in 2001 that aimed to standardize all clinical trial procedures and thus speed up the approval process actually has had the opposite effect when it comes to stem cells. “It is very difficult, maybe even nearly impossible, to perform clinical trials in the field of hematopoietic stem cell transplantation because this activity cannot be compared with single drugs,” he says. “Consequently, there has been a big drop in the number of clinical trials performed in Europe after that directive.” Simply put, the regulatory approval bar has been raised too high. Stephen Strauss Toronto
US courts throw ES cell research into disarray
luismmolina/istockphoto
Embryonic stem cells are proving hard to standardize.
Funds for human embryonic stem cell (hESC) research are flowing again following a temporary ban on federal support for such research. A lower court injunction imposed by US District Court Judge Royce Lamberth on August 23 was lifted mid-September by the US Court of Appeals for the District of Columbia. When the injunction was issued, the US National Institutes of Health (NIH) responded by broadly suspending its funding for grants and contracts involving hESC research, including for projects that took shape under the restrictive federal policies of the Bush Administration. For now, that blanket ban is lifted, but the issue is far from resolved. The lawsuit that led to the injunction is still pending. “We are pleased with the Court’s interim ruling, which will allow promising stem cell research to continue while we present further arguments to the Court,” says NIH director Francis Collins. The ongoing legal battle is seen as harmful to hESC research across the world as it will slow progress and stymie collaborations with US researchers. As Ian Wilmut at the MRC Centre for Regenerative Medicine in the UK puts it: “Any disruption of [hESC] research, such as that imposed by the present injunction, will have a chilling effect on research throughout the world.” According to Elaine Fuchs of Rockefeller University, New York, president of the International Society for Stem Cell Research, “Halting federal funding for such research impedes efforts aimed at ‘translating’ this knowledge into new and improved treatments for patients.” The lawsuit was brought in part by the Alliance Defense Fund (ADF), a group of “Christian attorneys and like-minded organizations,” based in Scottsdale, Arizona.
ADF, acting on behalf of “doctors opposed to the [Obama] Administration’s [hESC research] policy,” argues that this policy violates the federal Dickey-Wicker Amendment, which prohibits “federal funding of research involving the destruction of human embryos.” The Administration says that its hESC research policy complies with that law because cells from human embryos are donated from private sources and no federal funds are used to obtain them. Congress, with Representative Diana DeGette (D-CO) as a chief sponsor, twice passed legislation that would explicitly permit federal funding for hESC research, but former President Bush vetoed those bills. Although President Obama would surely sign such a bill, moving it through Congress seems unlikely anytime soon.
Jeffrey L Fox Washington, DC
in their words
“This is not a fixer-upper, this is beachfront property.” Genzyme’s CEO Henri Termeer explains why he rejected the unsolicited $69 a share offer from pharma giant Sanofi-aventis. (Boston Globe, 1 September 2010)
“I wasn’t looking to move away. In fact, this is probably the only job that could have lured me away from Genentech.” Marc Tessier-Lavigne, who will give up his role of chief scientific officer at Genentech to become president of Rockefeller University. (The New York Times, 8 September 2010)
“The makers of Viagra would jump at the chance to sponsor the largest pole in North America.” City councillor Howard Moscoe of Toronto, where a 410 foot pole to fly the Canadian flag is being proposed, makes a pitch for corporate support from erectile dysfunction drugmaker Pfizer. (Pharmalot, 27 August 2010)
“What they’ve been doing for years is buying off doctors to sell their products and a doctor’s primary obligation should be to the patient not the pharmaceutical company.” Paul Thacker, an investigator working for Republican Chuck Grassley on the US Senate Finance Committee who recently stepped down to join a non-profit, highlights the Washington perspective on conflicts of interest. (Pharmalot, 23 September 2010)
in brief
Drug user fees top $1 million
For the ninth straight year, the US Food and Drug Administration (FDA) is raising the fees companies must pay to have their drugs reviewed. As of October 1, new applications will cost over a million dollars. User fees were instituted in 1992 by the Prescription Drug User Fee Act (PDUFA) to provide funding so that the FDA can conduct timely reviews of drugs. The fees have risen from $100,000 in 1993 to $1,542,000 for a new drug application with clinical trial data. Whether PDUFA has been good for the biotech industry is debatable. Reducing the time to approval (50% reduction since the late 1990s) has meant millions of dollars in revenue, as drugs can be brought to market earlier in their patent lives, according to Mary Olson, at Tulane University in New Orleans. “This expected revenue for most drugs greatly exceeds the user fee even with the proposed increases,” she says. However, Kurt Karst, a lawyer at Hyman, Phelps, and McNamara in Washington, DC, with clients in the biotech industry, says the fees are a concern for smaller companies deciding whether to seek approval for a drug. In a letter to the FDA, the Biotechnology Industry Organization of Washington, DC, pointed out that PDUFA fees now pay a greater share of the budget for drug reviews, almost two-thirds in 2008 up from 42.5% in 2006, and called for transparency on how the fees are used.
Laura DeFrancesco
Sugar beets still in the game
Seed producers will be allowed to plant biotech sugar beets again following a September decision from the United States Department of Agriculture’s crop approval arm to allow planting under interim guidelines. The Animal and Plant Health Inspection Service (APHIS) will issue limited permits to seed developers authorizing genetically modified (GM) beet planting this fall as long as the harvested beets are not allowed to flower. The permits are a legal way around a federal judge’s 13 August decision to ban all commercial farming of Monsanto’s Genuity Roundup Ready sugar beets beyond that date. GM sugar beets planted before the ruling may be harvested, processed and sold without restriction and the beets remain eligible for future commercial approval pending USDA/APHIS’s full environmental review of the beets. A federal judge had revoked APHIS’s beet deregulation and prohibited further planting and sale on the grounds that the agency had not adequately considered the potentially irreparable harm GM beets might cause related species through cross-fertilization (Nat. Biotechnol. 27, 970, 2009). APHIS has announced it will expedite the sugar beets review, which will take about two years. Luther Markwart, of the American Sugarbeet Growers Association and Sugar Industry Biotech Council, Washington, DC, says GM beet farmers, who grow 95% of the US crop, already voluntarily maintain 4-mile isolation from related crops to prevent cross-fertilization. “Most of the interim measures that we’re looking at…are things that we’re already doing,” he says.
Lucas Laursen
Roche backs Aileron’s stapled peptides
Atomic structure of a single-turn stapled peptide bound to its target. Stapling locks peptides into stable, biologically active alpha-helices. Credit: Aileron.
A company that staples peptides into drugs to target ‘undruggable’ proteins has landed a $1.1 billion deal with Swiss drug maker Roche. The deal signed in August will see Aileron pocket $25 million upfront in technology and access fees and R&D support. More than that, it provides validation from big pharma for Aileron’s stapling platform. Chemically stapled peptides result in helical peptides that reputedly combine high stability with the ability to cross the cell membrane to hit cellular targets. This is the first major industry collaboration for Aileron, although the Cambridge, Massachusetts–based biotech has already received the industry’s collective imprimatur. Last year, the corporate venture arms of no less than four pharmaceutical firms (Roche Venture Fund, Lilly Ventures, Novartis Venture Funds and GlaxoSmithKline-owned SR One) backed the company’s vision of peptide modification with a $40 million investment round. “They’ve looked very hard at this question. In most cases, without naming names, they have tried [to do this themselves]. And I suspect they will continue to try,” says Aileron CEO Joseph Yanchik. Notwithstanding such concerted industry support, converting the promise of stapled peptides into clinically validated drug molecules is going to be a complex and difficult challenge, the scale of which is not lost on its promoters, or its investors. “You can imagine we’ve had to run a pretty difficult scientific gauntlet,” says Yanchik. “The length and nature of the due diligence was extraordinary.”
Peptides make more attractive medicines than proteins or nucleic acids. They have evolved in nature to take on highly specific functions, work with great potency and are far smaller than recombinant proteins and antibodies. But they are inherently unstable chains. As soon as a job is done, they are degraded quickly by proteases, a factor that has tended to limit their utility as pharmaceuticals. “The catch-22 is you want the peptide for its biological activity, but you don’t want the peptide for its pharmacological vulnerability,” says Loren Walensky, assistant professor of pediatrics at Harvard Medical School, in Boston, and a member of Aileron’s scientific advisory board. Exposure of their amide bonds renders peptides susceptible to proteolytic breakdown, and their polarity makes cell penetration difficult. “The major problem in the discovery of peptide-based drugs has been the ability to get robust cell penetration,” says Gregory Verdine, professor of chemistry at Cambridge-based Harvard University and chairman of Aileron’s scientific advisory board. “We’re not the first to stabilize helices.”
Stapled peptides are locked into an α-helical, and thus biologically active, conformation. To achieve this, hydrocarbon cross-links are added between two non-natural amino acid residues inserted at each end of the target peptide sequence. A ruthenium-catalyzed olefin metathesis reaction generates the hydrocarbon linkages that impart structural stability to the stapled peptide and render it resistant to proteolytic breakdown. The method is general in its scope. “You can apply this to any peptide that is naturally inclined to be helical,” says Walensky. Stapled peptides have a dual role, serving both as molecular probes for studying biological processes, such as protein-protein interactions, and as drug leads that target those same processes. “This really changes the paradigm, in that we can create bioactive secondary structures and use them in vivo to target a disease and study the biology,” Walensky says. For example, his group has generated a stapled peptide, based on the BH3 domain found in the BCL2 protein family member BID, that can activate apoptosis in human leukemia xenografts (Science 305, 1466–1470, 2004). More recently, they have identified a second function for the pro-apoptotic BCL2 protein BAD, in insulin secretion and beta cell survival. Stapled peptides, based on the BAD BH3 domain, act directly on glucokinase and thereby influence glucose-stimulated insulin secretion (Nat. Med. 14, 144–153, 2008). Over the summer, the Walensky group also reported on a stapled peptide that was a highly selective inhibitor of MCL1, an anti-apoptotic protein implicated in tumor survival (Nat. Chem. Biol. 6, 595–601, 2010). Similarly, Verdine’s group has used stapled peptides to demonstrate inhibition of the Notch transcription factor complex,
Table 1 Selected therapeutic peptides either registered or in late-stage development.
| Company | Product description | Clinical stage |
| --- | --- | --- |
| Scios (Fremont, California) | Natrecor (nesiritide; JNS-004, a recombinant B-type brain natriuretic peptide) | FDA approved for acute decompensated congestive heart failure |
| Novo Nordisk (Bagsvaerd, Denmark) | Victoza (liraglutide, a once-daily subcutaneous modified (Arg34, Lys26-[N-epsilon(gamma-Glu[N-alpha-hexadecanoyl])]) glucagon-like peptide 1 [7-37] analog) | FDA approved for type 2 diabetes |
| Roche/Trimeris (Durham, North Carolina) | Fuzeon (enfuvirtide, a 36-amino-acid peptide derived from the C-terminal (heptad repeat sequence 2; HR2) domain of human immunodeficiency virus (HIV) gp41) | FDA approved for HIV infection |
| Amylin (San Diego)/Lilly (Indianapolis) | Byetta (exendin-4, a 39-amino-acid peptide exhibiting 52% structural identity to human GLP-1) | FDA approved for type 2 diabetes and polycystic ovary syndrome |
| NPS Allelix (Mississauga, Ontario, Canada) | Gattex (teduglutide; ALX-0600, a subcutaneous injectable GLP-2 analog, containing an Ala→Gly substitution at position 2) | Phase 3 for gastrointestinal diseases, including short bowel syndrome, enterocolitis and pediatric disorders |
| HealOr (Rehovot, Israel) | A topical formulation of peptide pseudosubstrates that bind protein kinase C isoforms | Phase 3 for decubitus ulcers, varicose ulcers and diabetic foot ulcers |
| Access Pharmaceuticals (Dallas) | Cytolex (pexiganan acetate, a cream formulation of a 22-amino-acid topical peptide based on a protein discovered in frog skin, which disrupts the integrity of bacterial cell membranes) | Phase 3 for the treatment of bacterial infections associated with diabetic foot ulcers |
| Zensun Sci&Tech (Shanghai, China) | Neucardin (an injectable recombinant peptide fragment of the beta 2a isoform of human neuregulin-1) | Phase 3 for the treatment of chronic heart failure |
FDA, US Food and Drug Administration.
a hitherto notoriously difficult target (Nature 462, 182–188, 2009). Verdine first reported on the stapling technique a decade ago (J. Am. Chem. Soc. 122, 5891–5892, 2000). Ten years on, with the first clinical trials looming, he makes no claims that the technology is the finished article. “Every new drug modality has its own pharmacological challenges,” he says. For that reason, getting large pharma—and its attendant pharmacological expertise—on board at an early stage has been an explicit goal of the company. Verdine likens the situation to Whitehouse Station, New Jersey–based Merck’s acquisition of small-interfering-RNA specialist Sirna Therapeutics, of San Francisco, in 2006. “When Merck acquired Sirna, they took on the forward challenge of learning how to deliver these things,” he says. The corresponding challenge with stapled peptides is learning how to measure their pharmacodynamic properties. “I don’t think that’s the work of a moment,” says Kevin Johnson, newly appointed life sciences partner at London-based Index Ventures. “You’re getting into small-molecule realms.” And as small molecules, stapled peptides are most likely to be eliminated through the kidneys. “That almost inevitably makes the half-life short,” says Erkki Ruoslahti, distinguished professor at Sanford Burnham Medical Research Institute, in Santa Barbara, California. Ruoslahti, who is developing tumor-penetrating peptides that can increase the efficacy of other drugs, also questions the likely potency of stapled peptides. “One of the things about peptides is they tend to have low affinities,” he says. “That means that a lot of peptide is needed.”
The peptides enter cells by means of what Verdine and co-workers have described as “an active, endocytic, peptide import mechanism.” Serum levels of stapled peptides, therefore, are not a good proxy for intracellular levels, as they are for small-molecule drugs that enter cells by mass action. “The rate of clearance from the cells is very different from the rate of clearance from the blood,” Verdine says. Learning about their routes of metabolism and routes of clearance, and learning how to measure these processes, are all outstanding challenges. Moreover, a full understanding of the cell-penetrating properties of stapled peptides remains elusive for now, although the issue is the subject of intense scrutiny. “The molecules that have been reported have fantastic properties, and even better, they are perfect tools to help understand how peptide-like molecules gain entry to the cell and [how they] traffic once they do so,” says Alanna Schepartz, who holds chairs in chemistry and in molecular, cellular and developmental biology at Yale University, in New Haven (Table 1). “The biggest questions have to do with how the molecules get out of endosomes rather than how they get in.”
The immediate attraction of peptides, particularly those of human origin, is that they generally have a benign toxicity profile. Aileron’s biggest claim in support of stapled peptides is that their cell-penetrating abilities will open up a target universe that was previously off limits to drug developers. Between them, monoclonal antibodies, which can recognize only extracellular binding sites or secreted ligands, and small molecules, which can bind only hydrophobic pockets found in a small fraction of proteins, hit little more than 10% of all possible targets, says Verdine. “That’s the operating theatre of the entire biotechnology industry,” he argues. Verdine does not claim that Aileron alone will be able to access this newly emerging landscape. “I think in the next ten years we’re going to see a real efflorescence of new drug modalities,” he says. Schepartz identifies β-peptides, peptoids and miniature proteins, which were invented in her laboratory, among alternative modification technologies that have promise. Cambridge, UK–based Bicycle Therapeutics recently raised seed funding from two of Aileron’s investors (Novartis Venture Funds in Basel and SR One in Conshohocken, Pennsylvania) to develop further its chemically constrained cyclic peptides, for which it claims high target specificity and binding affinity, as well as resistance to proteolytic breakdown. Its cofounders include antibody pioneer Greg Winter. Despite the large headline value of the Roche deal, the field of peptide modification remains distinctly early stage. Basel-based Roche committed only $25 million in guaranteed funding. A successful outcome to Aileron’s first clinical trial next year, in an as-yet unidentified oncology indication, would therefore provide important momentum to the alliance. “It’s very important for us as a company to get this first clinical trial right. It’s not just the drug being judged, it’s the platform,” says Yanchik. In the meantime, he says, the deal has probably already enlivened the wider field. “I’d be willing to place a bet with the announcement we’ve just made, we’ve got another few companies funded.”
Cormac Sheridan Dublin
Life swallows Ion Torrent
Instruments provider Life Technologies has acquired sequencing firm Ion Torrent of Guilford, Connecticut and South San Francisco in a deal worth $725 million, a price tag that has left some industry observers reeling. In August, the Carlsbad, California–based Life paid $375 million upfront, with potential for an additional $350 million in milestones. The prize is Ion Torrent’s Personal Genome Machine, a system that uses semiconductors rather than optics for sequencing DNA. According to Life, the first-generation system, due in Q4 2010, will cost $50,000, and its potential scalability suggests it could tackle entire genomes relatively soon. This machine cannot readily compete with the multi-gigabase output of San Diego-based Illumina’s HiSeq2000 or Life’s SOLiD 4, and Ion Torrent founder and CEO Jonathan Rothberg stressed at a recent meeting that it is not intended to do so. “In the near term, there could be some virology and pathogen applications, and longer term there could be some clinical diagnostic applications,” says Doug Schenkel, managing director and senior research analyst at Cowen & Company, New York. However, Life’s investment considerably exceeds its target market, estimated at $200 million, suggesting a focus on long-term opportunities. Success is contingent upon both expansion of the sequencing market and the impact of other powerful contenders: newcomers Pacific Biosciences and Complete Genomics have recently filed initial public offerings, and market leader Illumina is unlikely to rest on its laurels.
Michael Eisenstein
Anti-anemics price hike
New payment rules for dialysis services could further erode the use of erythropoietin-stimulating agents (ESAs), already under scrutiny for potential safety risks. The US Centers for Medicare & Medicaid Services are changing how Medicare pays for end-stage renal disease services. From 1 January 2011, payment will bundle equipment and drugs into a single base rate, which will be increased from $198 to $229.60. This single rate will include injectable ESAs, prescribed to stimulate red blood cell production, which are currently reimbursed separately. “The move could affect prescribing patterns for ESAs and may discourage healthcare providers from using large doses of erythropoietin for patients as it could lead to financial loss,” says Aparna Krishnan, senior research analyst at IHS Global Insight in Lexington, Massachusetts. Makers of all versions of epoetin alpha are likely to be affected. The US Food and Drug Administration already requires a risk evaluation and mitigation strategy for ESAs, following studies linking an increase in tumor growth or risk of cardiovascular events to the drugs (Nat. Biotechnol. 28, 303, 2010). With the new rules, “Companies that manufacture ESAs will be forced to reduce drug prices or risk loss [of] market share,” says Swetha Shantikumar, research associate at Frost & Sullivan, Chennai, India.
Emma Dorey
Genzyme resumes shipping as Sanofi-aventis hovers
Henri A. Termeer, Genzyme’s Chairman, President and Chief Executive Officer, has been fending off Sanofi-aventis’ overtures while dealing with manufacturing problems.
Genzyme is moving towards resolving the manufacturing issues that have curtailed supplies of its biologics to treat Gaucher’s disease and Fabry’s disease for over a year. In late August, in the midst of reacting to a hostile takeover bid from French drug maker Sanofi-aventis, the biotech sent patient communities separate letters detailing the company’s near-term plans for supply of the drugs. In September, people with Gaucher’s disease would receive two full doses of Cerezyme (imiglucerase; recombinant human (rh) β-glucocerebrosidase), the same as before the company had to cut back supplies after discovery of vesivirus 2117 contamination at its Allston, Massachusetts, manufacturing facility (Nat. Biotechnol. 27, 681, 2009). Individuals treated for Fabry’s disease would receive one full dose of Fabrazyme (agalsidase β; rh α-galactosidase A) in September and another this month, which is double what the Cambridge, Massachusetts–based firm had been supplying, but still below full dosage. But the company now expects the remediation work at the Allston plant to take four years. This is up from the two to three years it had estimated earlier this year, when it signed a draft consent decree with the US Food & Drug Administration that detailed the process for completing that work and the penalties for missing deadlines (Nat. Biotechnol. 28, 388, 2010). As part of that process, Genzyme is required to complete an initial inspection of the facility later this year. The good news is that in the end, Genzyme should have a more efficient production process. By introducing a new working cell bank for Fabrazyme, for example, Genzyme has already increased productivity 30%, and hopes to go 30% higher than that.
By controlling the process parameters around cell density, “we think we’ll be able to get the additional productivity,” said Scott Canute, newly hired president, global manufacturing and corporate operations, on the conference call. “Every company emerges from a consent decree in much better shape,” says William Tanner, biotech analyst with Lazard Capital Markets in New York. “Operating under a consent decree, things are going to be tighter, protocols more tightly adhered to. It stands to reason your production costs should go down.” What’s more, the lost revenue from discarded batches of a high-value biologic “far eclipses the cost of having some people on the ground to assure that they are in compliance with the consent decree,” he says. That said, with competitors aiming at the Gaucher’s and Fabry’s markets, the timing of these problems couldn’t have been worse for Genzyme. Basingstoke, UK–based Shire obtained EU approval for its Vpriv (velaglucerase alfa) Gaucher’s therapy, on the heels of a US approval in March 2010. It also sells Replagal (agalsidase alfa) for Fabry’s in the EU and other countries (it is under review in the US). And Protalix, in Carmiel, Israel, is partnering with Pfizer, in New York, to commercialize plant-derived glucocerebrosidase (taliglucerase alfa); it is also in early-stage development of a plant-derived enzyme drug to treat Fabry’s (Nat. Biotechnol. 28, 107–108, 2010). “It’s irreparable damage,” says Tanner. His initial projections for Vpriv, for example, were for 10–15% of the market but now, based on physician feedback, they’re at 30–40%. These issues haven’t stopped Sanofi-aventis, however, from pursuing a takeover of Genzyme. After months of discussions, on August 29, the Paris-based pharma made a formal offer at $69 per share, or $18.5 billion, which Genzyme promptly rejected. However, Tanner estimates that Genzyme lost around $1–1.3 billion in value because of its manufacturing stumbles. 
“If they were better able to hang onto the Gaucher[’s] and Fabry[’s] franchises,” he says, “fair value would be $4–5 per share higher.” In mid-September, Genzyme sold its Genetic Testing Unit to LabCorp of America Holdings, located in Burlington, North Carolina, for $925 million and, in a cost-cutting exercise, the biotech will implement over 1,000 job cuts.
Mark Ratner, Cambridge, Massachusetts
Sipra Das/The India Today Group/Getty Images
in brief
Cancer research fund launches biologics pilot plant
Researcher working on monoclonal antibody Chi Lob 4/7, Cancer Research UK’s first attempt to produce small batches of investigational drugs for clinical trials. Credit: Cancer Research UK.

Cancer Research UK, the country’s largest cancer research funding charity, has opened a £18 ($28) million facility at Clare Hall, Hertfordshire, to serve as a pilot plant for investigational biologics. The charity’s small-scale Biotherapeutics Development Unit (BDU), launched on July 30, will produce small batches of clinical grade material ready for testing, in what could become an attractive new model for industry-academia collaborations.

The BDU is not the first such initiative to take shape: a slew of small-scale manufacturing initiatives at academic institutions and research organizations signals a growing awareness that in-house drug production could avoid the delays that often stall a promising agent at the very early stages. Such pilot manufacturing plants have been adopted by the Mayo Clinic, in Rochester, Minnesota, as well as by the National Cancer Institute (NCI) and the National Institute of Allergy and Infectious Diseases, each of which owns manufacturing facilities in Frederick, Maryland. Other academic institutions with their own manufacturing units include the UK’s University of Oxford and University of Bristol, and the Baylor College of Medicine in Houston.

At the NCI, the Biopharmaceutical Development Program (BDP), the US counterpart to the BDU, plays a dual role. On the one hand, it supplies the host research institution with timely drugs for medical trials, and on the other, it takes on commercially unattractive projects, whose products may not be picked up for manufacture by big pharma. “Typical projects undertaken by BDP would be higher risk for a commercial entity: rare diseases, pediatric indications, small or uncertain markets, or concepts with significant technical or regulatory challenges or need for proof-of-principle of a first-in-class approach,” explains Joseph Tomaszewski, deputy director, Division of Cancer Treatment and Diagnosis of the NCI in Bethesda, Maryland.

Before the BDP was in place, the manufacture of small, trial-scale batches of new therapies had to be outsourced to big pharma, which was time consuming and expensive. “The little tiny jobs like we do – might get bounced out of the queue if [the company] has a commercial job coming up,” says Stephen Creekmore, chief of the Biological Resources Branch of the NCI, which houses the BDP. At Cancer Research UK, BDU head Heike Lentfer agrees. “Having our own BDU will prove to be more efficient and cost effective than outsourcing clinical trial stage drugs to contract manufacturers,” she says. “The new facility allows us to be more flexible in scheduling new projects and to develop in-house expertise in the production of biologics.”

Such facilities can also take on novel therapies that have yet to establish a commercial track record or show significant results in early clinical trials. “There are a lot of projects developed at research labs that need a lot more work, and you really need some sort of lab at the development and early scale-up stage,” notes Creekmore. “That’s very hard to outsource without a lot of money.”

The first project of the UK-based BDU will be the production of Chi Lob 4/7, an anti-CD40 monoclonal antibody, which will enter phase 1 testing for large B-cell non-Hodgkin’s lymphoma. Plans are also underway to work on five other projects over the coming year. The unit, operated by a 15-member team, is certified to meet current good manufacturing practice and contains two suites to separately work on mammalian and microbial production. “[Projects] come to us when they’re ready for process development and optimization and scale-up,” says Nigel Blackburn, director of Cancer Research UK’s Drug Development Office.

As the experimental drugs move through the different clinical trial phases, the BDUs develop large-scale manufacturing protocols gearing up for the agents’ eventual commercial manufacture at external facilities, after regulatory approval. If phase 1 and phase 2 trials are successful, the units may at that point liaise with commercial entities to take the agent into phase 3 trials and beyond, often through cooperative R&D agreements. “After a novel idea is shown to ‘work’ it usually is not hard to identify commercial interest,” says NCI’s Tomaszewski. NCI collaborated with researchers at the University of California, San Diego, on the monoclonal antibody Erbitux (cetuximab) now marketed by Eli Lilly of Indianapolis. “NCI contractors manufactured purified mouse antibodies that were used by the academic investigators in extensive preclinical experiments, followed by small clinical trials to show that such antibodies could target tumors well enough to image the patient’s cancers,” writes Tomaszewski. NCI then made a chimeric antibody.

Another small-scale manufacturing facility with an impressive track record is the pilot bioproduction facility at the government-funded Walter Reed Army Institute for Research (WRAIR), in Silver Spring, Maryland. The Walter Reed pilot facility was set up around 1958, and has nursed several projects through phase 1 and 2 clinical trials, partnering with pharma companies to carry out later-stage human testing.
One recent project was the Walter Reed pilot facility’s collaboration with London-based GlaxoSmithKline (GSK) on a dengue virus vaccine now in phase 2 trials. The pilot bioproduction facility at the Walter Reed Army Institute of Research has played a significant role in the GSK-WRAIR dengue vaccine program, notes Katie Moore, director of media, vaccines global public health at GSK. The manufacturing facility is attractive for GSK, Moore says, because of “its experienced personnel and because of its close interactions with the other departments at the institute.” What’s more, the small manufacturing plant has produced the clinical grade material needed to move projects from preclinical research into phase 1/2 clinical trials, she adds.

Another pilot bioproduction facility success is the Japanese encephalitis virus (JEV) purified inactivated vaccine, manufactured and distributed as Ixiaro (inactivated JEV strain SA14-14-2 with aluminum hydroxide adjuvant). Ixiaro received US Food and Drug Administration approval last year, and is now distributed and manufactured in the US by Novartis of Basel under license from Intercell of Vienna.

Ken Eckels, who leads the research team at the Walter Reed pilot facility, has no doubt that biomanufacturing units springing up in publicly funded organizations provide a valuable service. The key, he says, is keeping up with regulatory protocols such as current good manufacturing practice and ensuring that the appropriate quality control and quality assurance checks are in place.
Nidhi Subbaraman, Boston

Wellcome partners with India
A £45 ($70) million fifty-fifty partnership between the UK’s Wellcome Trust and India’s Department of Biotechnology (DBT) to support development of “affordable healthcare products” is just the kind of boost small Indian biotech companies hankered after. The initiative announced 29 July builds on the existing £80 ($124) million alliance launched in 2008 to strengthen the biomedical research base in India (Nat. Biotechnol. 26, 1202, 2008). The added impetus is for translating research into medical products “that are not totally market driven but are required by people at [an] affordable price,” says DBT secretary Maharaj Kishan Bhan. Venture capitalists usually shy away from backing products that do not have a big market, he says, and the new partnership plugs this gap. Chandrasekhar Nair, director of Bigtec, a Bangalore-based startup, which has developed a diagnostic handheld microarray, is investigating biomarker detection for early identification of chronic diseases. Nair says that under the Wellcome-DBT alliance his company may consider sourcing microfluidics capabilities from UK universities to fast-track the device’s development. Banda Ravi Kumar of XCyton Diagnostics, Bangalore, says a government loan enabled the initial development of their diagnostic DNA Macro Chips device. “Thanks to the new initiative, we are looking actively to develop another such platform for oncology with Oxford Biodynamics that has an epigenetics-based technology,” he says.
Killugudi Jayaraman

Hungary eyes biotech jobs
The Hungarian Ministry for National Economy has unveiled a $4.5 billion scheme aimed at creating one million jobs within ten years. The New Széchenyi Development Plan will bolster small and medium enterprises (SMEs) across all industries, including biotech. The launch of a series of consultations, slated for September 2011, will provide SMEs with resources from local government and EU funds by 2013. The key points include developing healthcare and ‘green’ industries, improving science and innovation, promoting business growth, and investing in housing, employment and transport. “What we see is promising, but the plan is only one piece of the policy. We need to see how it will work all together,” notes Ernö Duda, CEO of SOLVO, headquartered in Budapest, and founder and president of the Hungarian Biotechnology Association. “It is still too early to say how much of the funding will go into the biotechnology industry, but we hope that the government will recognize that while biotechnology is a small sector, it is growing – even while Hungary was in recession, the biotechnology sector grew by around 50% a year,” says Duda. The Hungarian Biotechnology Association, which was founded only seven years ago and already has over 100 members, has compiled a strategic report on the biotech industry for the government. “We see the Széchenyi plan as being in line with our strategy, and we feel that this will give the industry a boost,” says Duda.
Suzanne Elvidge
Monsanto relaxes restrictions on sharing seeds for research
Agronomic research scientists are now free to study Monsanto’s commercial seeds.
Public sector scientists who complained last year that seed companies were curbing their rights to study commercial biotech crops are negotiating research agreements with industry. In August, the Agricultural Research Service (ARS), an agency within the US Department of Agriculture in Washington, DC, finalized an umbrella license with St. Louis–based Monsanto that gives ARS scientists the freedom to study Monsanto’s commercial seeds without asking the company for permission on each project. “[The agreement] is extremely good and specific. ARS will be allowed to do basically everything that could be desired,” says one ARS scientist who asked to remain anonymous. ARS scientists were part of a group of 26 researchers who lodged an anonymous public complaint in February 2009 that charged that seed companies were thwarting public sector research. They said a legal contract called a “stewardship agreement” forbade research from being conducted on the companies’ crops and seeds, no matter how they were obtained. The scientists said they felt forced to seek permission from the seed companies before conducting studies, even on crops that had been on the market for years (Nat. Biotechnol. 27, 880–882, 2009). “No truly independent research can be legally conducted on many critical questions involving these crops” because of company-imposed restrictions, the scientists wrote in their public comment. In response to the complaint and the press reports that followed, seed companies reexamined their research agreements with the public sector. Indianapolis-based Dow AgroSciences, Basel-based Syngenta and Johnston, Iowa–based Pioneer Hi-Bred have all begun discussions with ARS over new umbrella agreements, according to the companies. These industry players, along with Monsanto, have also been working with universities on similar licenses.
The Monsanto-ARS agreement obtained by Nature Biotechnology allows ARS scientists to conduct agronomic research: studies on how crops interact with local environments and which varieties perform best. Studies outside of agronomic research, such as breeding, reverse engineering or characterizing the genetic composition of the crop, require separate contracts with the company. The agreement is nearly identical in scope to Monsanto’s licenses with universities, but is more specific. An appendix included in ARS’s license lists more than 25 examples of the specific types of studies that are considered “agronomic” and therefore permissible, a definition that has been unclear to public sector scientists in the past. “It allows us to do our research under a blanket agreement instead of negotiating everything [with Monsanto] every time,” says Larry Chandler, an area director at ARS who facilitated the negotiations. “This is much more efficient for all parties.”
Emily Waltz, Nashville, Tennessee
NEWS maker Constellation Pharmaceuticals
Constellation’s $22 million series B financing this summer again drew pundits’ attention toward a field of research that has been tilled with particular vigor over the past year or so. The basic science underpinning epigenetics, manipulating gene expression without altering the sequence itself, is widely regarded as conceptually sound. Four epigenetic drugs have thus far received US Food and Drug Administration (FDA) approval: two that take aim at (among other things) histone deacetylases (HDACs) and two that target DNA methyltransferases (DNMTs), which control chemical tags on histones or cytosines in the gene sequence, respectively. For Constellation, the question is whether its therapeutic focus, histone methylases and demethylases, which modify the proteins that package and order DNA, will prove as successful. Constellation started out, fueled by $32 million in series A funding, in April 2008. It spent the first year and a half hiring key personnel, building infrastructure and optimizing assays. Part of that initial funding came from Third Rock Ventures, and a partner at the fund, Mark Levin, the former CEO of Cambridge, Massachusetts–based Millennium Pharmaceuticals, served as interim CEO of Constellation. One industry insider says Levin’s reputation at Millennium enabled him to sell the figurative sizzle to investors before there was any clinical ‘steak’. Constellation’s work remains only at the preclinical stage. Now with 50 employees, up from about 30 this time last year, the company aims to develop drugs targeting a broad range of histone methylation enzymes. So far, Constellation scientists have published evidence that the histone lysine methyltransferase G9a/KMT1C regulates chromatin structure by promoting the methylation of the histone H1.4K26 in vivo in mammals (J. Biol. Chem. 284, 8395–8405, 2009). Constellation claims to have nailed down programs that identify enzymes with specific linkages to disease.
The company will not reach the clinic by next year, but could by 2012. Mark Goldsmith, Constellation’s president and CEO, says the firm’s research is bolstered
by enhanced understanding of the role of histone methylation in modulating chromatin through action by enzymes and proteins that act as ‘writers’, ‘readers’ and ‘erasers’ to activate or deactivate genes. ‘Writers’ add chemical groups, ‘readers’ bear binding regions that recognize changes and ‘erasers’ remove the marks. Now that a survey of human methylomes—the map of human methylation patterns—has been published (Nature 462, 315–322, 2009), Goldsmith says the linkage between DNA methylation and biological consequences can be brought into sharper focus. Skepticism concerning the safety and efficacy of pharmacological interventions in the DNA and chromatin remodeling machinery has receded with the approval of several drugs in hematological cancers: HDAC inhibitors Istodax (romidepsin) and Zolinza (vorinostat), and DNMT inhibitors Vidaza (azacitidine) and Dacogen (decitabine). Indeed, big pharma is investing heavily in the area, with such deals as the $200 million agreement in March between Cambridge, UK–based CellCentric and Takeda Pharmaceutical, of Tokyo. The same month, London-based GlaxoSmithKline inked a $644 million epigenetics pact with Cellzome, of Cambridge, UK. Both deals grew out of existing relationships with nonepigenetic concerns, and say little about whether Constellation can prove itself to suitors as well, but a bubbling epigenetics pot has led would-be partners to discuss potential arrangements, according to Goldsmith. Meanwhile, as the company holds fast to three programs of special focus, Constellation is casting a wide net to consider target classes beyond those validated so far. Goldsmith wants to leave no would-be opportunities on the table, he says, but the firm’s determination to mine varied classes of enzymes for their possibilities could become a rate-limiting factor. Indeed, the nascent biology surrounding many of these targets could make a slowdown inevitable, in the view of Jean-Pierre Issa, co-director of cancer epigenetics at M.D. 
Anderson Cancer Center, in Houston. Despite acknowledging the considerable scientific expertise at Constellation, he suspects the broad approach will mean only plodding progress. Issa likens the needle-in-the-haystack approach to that taken by companies that first began investigating tyrosine kinases, and predicts the road for epigenetics could be similarly fraught with failure. As an example of one target that almost every company is pursuing, Issa points to histone-lysine N-methyltransferase EZH2, an enzyme that in humans is encoded by the EZH2 gene. This histone-modifying enzyme belongs to the polycomb group family, and three papers published in the past year in Nature Genetics (42, 181–185; 665–667; 722–726, 2010) have suggested that EZH2 could act as a tumor suppressor. Constellation would not confirm any work on EZH2, but says the target is interesting. Stuart Hwang, director of business development at SuperGen of Dublin, California, says Constellation’s plan to use approaches other than the more popular HDAC and DNMT inhibitors is logical because drugs targeting HDACs do not seem to work against solid tumors, and oral versions bring toxicity, whereas DNMT blockers display only a short half-life, which makes them unsuitable for solid tumors as well. But histone methyltransferases outside of the two main classes come in many flavors, and it’s an open question whether their pharmacological inhibition will prove successful. Hwang doesn’t think so, mainly because of the problem that has beset EZH2 work: turning on a gene or genes can mean shutting down an equal number of them. Biological benefits are starting to emerge, Hwang says, but outside the two known categories of epigenetic drugs, clinical proof of efficacy could yet lie far off.
Randy Osborne, Atlanta, Georgia

Replete with investor funds, the Cambridge, Massachusetts–based epigenetics firm is taking aim at methylases and demethylases linked to disease.
Left to right: Constellation’s CEO Mark Goldsmith, and founders Danny Reinberg, Professor of Biochemistry at NYU School of Medicine, and Yang Shi, Professor of Pathology at Harvard Medical School. Credit: Seacia Pavao.
data page
Drug pipeline: Q310
Wayne Peng
The number of small-molecule approvals declined more sharply than that of biologics over the past decade. However, new targets, such as atrium-specific K+ channel, phosphodiesterase-4 and renal Na+-glucose co-transporter, continue to open up new opportunities. Such novel targets are not without risk, as Eli Lilly found this quarter when its gamma-secretase inhibitor failed to meet its endpoints in Alzheimer’s. Meanwhile, MannKind’s inhaled insulin, Afrezza, demonstrated both efficacy and safety in a key trial. Approvals are also expected for Benlysta (belimumab), ipilimumab and Bydureon (exenatide LAR).
FDA approvals by drug molecule type
Notable trial results (June–September 2010) Company/drug Indication name Bristol-Myers Type 2 Squibb and diabetes AstraZeneca/dapagliflozin
Fewer small molecules are being approved than before. Small molecule Peptide Protein
50 40
Steroid Polyclonal antibody Antisense nucleic acid
Carbohydrate Monoclonal antibody Cells/bacteria/viruses1
2 31
30 20 7
10
1 21 3 1 1 4
175 4 1 4
09
3
1/
1–
9/
3
20
08
20
07
20
06
20
05
20
04
20
03
20
02
20
01
20
00
20
99
20
98
19
97
19
96
19
19
19
95
2 0
6 12 91
10
Number of FDA approvals
© 2010 Nature America, Inc. All rights reserved.
60
Source: US Food and Drug Administration and BioMedTracker, a service of Sagient Research (http://biomedtracker.com/). Includes vaccines approved by the FDA.
Notable regulatory approvals (June–September 2010) Company/drug name Genentech-Roche/ Lucentis (ranibizumab) Forest Lab/Daxas (roflumilast) Shire/Vpriv (velaglucerase alfa) Cardiome Pharma/ Kynapid (vernakalant) Savient/Krystexxa (pegloticase)
Indication Retinal venous occlusion Chronic obstructive pulmonary disease Gaucher’s disease Atrial fibrillation
Gout
Approvals FDA, 6/22/10 (sBLA)
Drug description Humanized anti-VEGF monoclonal antibody Fab fragment EMA, 7/6/10 The first selective phosphodiesterase-4 inhibitor EMA, 8/26/10; Gene-activated human FDA, 2/26/10 glucocerebrosidase EMA, 9/1/10 Small-molecule blocker for atrium-specific potassium channel Kv1.5 FDA, 9/14/10 PEG-conjugated recombinant human uricase
Source: FDA and EMA. FDA, US Food and Drug Administration. EMA, European Medicines Agency. sBLA, supplemental Biologic License Application. VEGF, vascular endothelial growth factor.
Notable development setbacks (June-September 2010) Company/drug name MedImmuneAstraZeneca/ Numax (motavizumab)
Indication
Setback summary
Respiratory On 6/2/10, an FDA panel voted against approval. On syncytial virus 8/30/10, the FDA issued Complete Response Letter (RSV) infection requesting additional trials to support the risk-benefit profile. Motavizumab is a humanized monoclonal antibody against the fusion (F) protein of RSV. Merck/Peg-Intron Melanoma Phase 3 trial did not meet either primary or secondary (peg-interferon endpoints; treatment is no better than conventional alpha-2b) low-dose interferon treatment. (American Society of Clinical Oncology Annual Meeting, 6/05/10, Abstract LBA8506) Human Genome Multiple Phase 2 study showed that treatment did not sigSciences myeloma nificantly improve disease response or progression(mapatumumab) free survival. (Company press release, 06/09/10) Mapatumumab is a human monoclonal antibody agonist to TRAIL receptor-1. Eli Lilly/ Alzheimer’s Company discontinued development of the gammaSemagacestat disease secretase inhibitor after interim analysis of phase 3 trial (LY450139) data showed that treatment resulted in worse outcomes than placebo. (Company press release, 8/17/10) Roche Type 2 diabetes On 9/10/10, company announced suspension of (taspoglutide) phase 3 trials for the long-acting glucagon-like peptide-1 analog because serious side effects led too many patients to drop out of the trial. Source: BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/)
Result summary
Phase 3 study met primary and secondary endpoints after 24 weeks of treatment with this small-molecule inhibitor of the renal sodium glucose co-transporter 2 (SGLT-2). (Diabetes Care, doi: 10.2337/dc10-0612)
Morphoteck-Eisai/farletuzumab | Ovarian cancer | Phase 2 study met primary endpoints and demonstrated benefits of this humanized monoclonal antibody against folate receptor alpha compared with conventional carboplatin+taxane treatment. (American Society of Clinical Oncology Annual Meeting, 7/07/10, Abstract 5001)
Jerini-Shire/Firazyr (icatibant) | Hereditary angioedema | Phase 3 study showed significant benefit of this selective peptide antagonist of bradykinin B2 receptor. (N. Engl. J. Med. 363, 532–541)
ThromboGenetics/ocriplasmin (recombinant microplasmin) | Vitreomacular adhesion | Phase 2 study showed significantly increased nonsurgical resolution of vitreomacular adhesion by intravitreal injection of the recombinant human protein (Retina 30, 1122–1127). Preliminary phase 3 data also met primary endpoint. (American Society of Retina Specialists Annual Meeting, 8/31/10)
Source: BioMedTracker, a service of Sagient Research (http://biomedtracker.com/)
Notable upcoming approvals (Q4 2010)
Company/drug name | Indication | Approval decision
Amylin Pharmaceuticals/Bydureon (exenatide LAR) | Type 2 diabetes | 10/22/10 PDUFA date. Phase 3 trial met primary and secondary endpoints and showed significant superiority over comparators (American Diabetes Association Annual Meeting, 6/25–29/2010). This controlled release form of Byetta (exenatide, a 39-amino-acid peptide agonist of glucagon-like peptide-1, GLP-1) uses the Medisorb technology (microspheres made of polylactide co-glycolide polymer).
Human Genome Sciences/Benlysta (belimumab) | Systemic lupus erythematosus | 12/09/10 PDUFA date, priority review. MAA approval expected in H2 2011. Two phase 3 trials showed significant improvement in patient response after 52 weeks of treatment of this human monoclonal antibody against B-lymphocyte stimulator (BLyS). (European League Against Rheumatism Annual Congress, 6/17/10)
Medarex-Bristol-Myers Squibb (ipilimumab) | Metastatic melanoma | 12/25/10 PDUFA date. Priority review granted on 8/18/10. Ipilimumab is a fully human antibody against cytotoxic T-lymphocyte antigen-4 (CTLA-4). Phase 3 study met primary and secondary endpoints (N. Engl. J. Med. 363, 711–723, 2010)
MannKind/Afrezza (inhaled insulin, dry powder) | Diabetes, types 1 and 2 | 12/29/10 PDUFA date. In phase 3 trial, inhaled insulin was statistically noninferior to injected insulin. Over 52 weeks, there was no difference in pulmonary function between groups. Inhaled insulin was as effective and well tolerated. (The Lancet 375, 2244–2253, 2010)
LG Life Sciences/LB03002 (SR-rHGH) | Growth hormone deficiency | 09/10/10 - 01/03/11 PDUFA date range. MAA approval expected in H2 2010. LB03002 is the sustained release form of recombinant human growth hormone and requires once-weekly injection versus current daily treatment. Phase 3 study results showed significant superiority over placebo in adult patients after 26 weeks of treatment. (Endocrine Society Annual Meeting, 6/11/09, Abstract P2-746)
Other expected approvals in Q4 include Theratechnologies’ Egrifta (tesamorelin) and Novartis’ Gilenia (fingolimod). See Nat. Biotechnol. 28, 640, 2010, for details. Source: BioMedTracker, a service of Sagient Research (http://biomedtracker.com/). PDUFA, Prescription Drug User Fee Act. MAA, market authorization application. LAR, long-acting release. SR, sustained release.
Wayne Peng, Emerging Technology Analyst, Nature Publishing Group
news feature
Turning the tide in lung cancer

Researchers are testing a slew of targeted therapeutic strategies in lung cancer. Signs are emerging that these therapies are gaining increasing traction in what has long been one of oncology’s minefields. Malorye Allison investigates.

In June, “unprecedented” early trial results for crizotinib, a targeted cancer drug from New York–based Pfizer, brought a sliver of hope to lung cancer research¹. As good as the news was (almost 90% of participants had some measure of disease control), it was also a reminder that lung cancer is one of the toughest cancers to beat. Only 3–5% of lung cancer patients have the ALK (anaplastic lymphoma receptor tyrosine kinase) gene rearrangement that crizotinib targets. Add the patients who respond to the already approved epidermal growth factor receptor (EGFR) inhibitors, and just 14–20% of patients of European descent are likely to benefit from targeted therapy. (Response rates might be higher among Asians, who have a higher incidence of EGFR mutations, but that still leaves a large population of patients for whom no targeted therapy is yet available.)

On top of that, each bit of good news is invariably accompanied by a stream of late-stage drug failures. This trend continues despite hundreds of trials with dozens of new agents. As a recent editorial in Lancet Oncology lamented, “It is quite disheartening to see that a number of clinical trials with new [lung cancer] drugs have failed to meet even the most modest endpoints”².

Undeterred, biotechs and pharmas alike are going full throttle after new lung cancer drugs. Approximately 100 new compounds are being tested in an estimated 650 lung cancer trials worldwide. More importantly, lung cancer has become a hotbed of research innovation. Having hit the proverbial wall with traditional approaches, experts in this field are pioneering bold new adaptive trial designs, novel types of diagnostic and prognostic tests and breakthrough tools for early detection. The rewards for success, after all, should be considerable: unmet need remains so high that analysts at Waltham, Massachusetts–based Decision Resources are projecting the lung cancer market will double to over $684 million by 2017.

An unmet need with scant solutions

The most common malignancy in the world, lung cancer affects >200,000 people per year in the US alone and kills more people worldwide than any other cancer. One form of the disease, non-small cell lung cancer (NSCLC), causes ~85% of deaths.

Scientists blame the lung cancer drug drought on various factors. First, there was less money, relative to other cancers, early on. “Lung cancer research has been severely underfunded because it’s regarded as a smokers’ disease,” says Alice Shaw, an assistant professor and attending physician of the Thoracic Cancer Program at Massachusetts General Hospital in Boston. Second, lung tumors are typically discovered only late in the course of the disease because it’s difficult to detect them until they are large and symptomatic. About half the patients with advanced disease die within a year, even with treatment. Finally, it has become increasingly clear that NSCLC is highly heterogeneous, but the move to biomarker-based studies in lung cancer has been slow. A recent review of clinical trials in NSCLC found that only ~7% of trials in this disease have used biomarkers for patient selection: that’s just 34 out of 493 trials listed on ClinicalTrials.gov³.

The lack of biomarkers has been a major stumbling block. Iressa (gefitinib) from AstraZeneca of London was the first targeted therapy approved for lung cancer. It inhibits the tyrosine kinase domain of EGFR (Fig. 1). The drug garnered accelerated approval in Japan in 2002, then in the US in 2003. The overall response of ~11% from a phase 2 study was just enough to let the drug squeak past US regulators. But then phase 3 data suggested that those earlier numbers had painted a rosier picture of the drug than it deserved, and in 2005 the drug was relabeled in the US and restricted to patients who had already started treatment with it.

Figure 1 Target practice. EGFR signaling pathway offers a plethora of potential drug targets. (Reprinted with permission from Nat. Rev. Cancer 10, 618–629, 2010)

Because there were tantalizing leads implying subpopulation effects (within nonsmokers, women and Asians), a biomarker might have returned the drug to the market. But unlike Genentech in S. San Francisco, California (now part of Roche of Basel), which had a biomarker in its pocket when Herceptin (trastuzumab) proved ineffective in the general breast cancer population, AstraZeneca had no such option. It took until 2008 to determine that EGFR inhibitors are most effective in lung cancer patients with EGFR mutations.

During that interval, OSI Pharmaceuticals of Melville, New York, and its partner Genentech launched the EGFR inhibitor Tarceva (erlotinib). In 2004, Tarceva was approved in the US as a second-line therapy for individuals who didn’t respond to a chemotherapy regimen; since that time, the drug has claimed most of the US market. In April of this year, it was also approved as a maintenance therapy (Box 1).

Already, many US oncologists are testing for EGFR mutations up front and prescribing Tarceva, sidestepping chemotherapy, which represents a sea change from the typical standard of care. “Until recently, everyone with metastatic lung cancer got chemo,” Shaw says. Among those whose tumors carry EGFR mutations, ~70% respond, getting anywhere from several months to just over an extra year of life. But the response is variable. “I have one patient who has taken erlotinib for seven years before needing something different,” says Roy Herbst, chief of thoracic medical oncology at M.D. Anderson Cancer Center in Houston.

Iressa has not entirely lost the battle, however. In April, the European Medicines Agency approved the drug for use in lung cancer patients with EGFR mutations. AstraZeneca now reports that the drug is approved in 36 countries⁴. Several other tyrosine kinase inhibitors are also making their way through the clinic (Table 1).

Fine-tuning targeted treatment

The struggle with Iressa’s approval may have also taught companies a valuable lesson. Many are now looking for biomarkers and accepting a smaller initial market share in exchange for a speedier development pathway. Crizotinib illustrates this approach. A dual MET (mesenchymal-epithelial transition factor)/ALK-fusion inhibitor, the drug is being tested mainly in lung cancer patients whose tumors harbor a fusion of ALK and EML4 (echinoderm microtubule associated protein-like 4), resulting from a translocation. At this June’s annual American Society of Clinical
Oncology (ASCO) meeting held in Chicago, researchers reported seeing a dramatic and durable response (some lasting 15 months) in 57% of patients. Another 20–30% of patients responded less well but still benefited from the drug. Although only about 3–5% of patients in the US carry this fusion, it’s found in a higher percentage of people in Asia.

Crizotinib’s development has been rapid, in part due to some luck. “This drug was in clinical trials when we first learned about [EML4]ALK fusions in August 2007,” says Shaw, who is the principal investigator on the phase 2 and 3 trials now underway. The drug was slated for testing in other cancers, but the researchers quickly launched a lung cancer trial. The first patients were identified in November 2007 and enrolled a month later. Skeptics point out that results from a larger trial may temper enthusiasm for the drug, given that the data presented at ASCO included only 82 patients.

Thousand Oaks, California–based Amgen has also been doing up-front biomarker work with motesanib diphosphate (AMG706), an angiogenesis inhibitor that antagonizes vascular endothelial growth factor receptors (VEGFR)1, 2 and 3, platelet-derived growth factor (PDGF) and c-Kit (stem cell growth factor) receptors⁵. The drug, which is being co-developed by Millennium of Cambridge, Massachusetts, and Takeda of Osaka, Japan, competes with Genentech’s Avastin (bevacizumab) and is being tested in combination with chemotherapy. “Many companies are working on antiangiogenic drugs and we need biomarkers to inform treatment decisions,” says David Chang, vice president of global oncology development at Amgen.

At this year’s ASCO meeting, data were presented that looked at five biomarkers in blood samples from patients enrolled in three phase 2 trials of the Amgen drug, including a NSCLC study⁶. Across those studies, individuals who responded to the drug were more likely to have elevated placental growth factor (PlGF) levels after treatment had begun. This kind of marker is only useful after a patient begins treatment with a drug. According to Scott Patterson, Amgen’s executive director of medical sciences, “Like everyone else, we are also looking for baseline biomarkers predictive of response.”

Digging up biomarkers has been arduous work so far, and there’s much left to be done. “We need to find the driving mutation for every cancer and a drug for every mutation,” says Daniel Haber, director of the Massachusetts General Hospital Cancer Center. At this point, no one even knows how many subtypes of NSCLC exist.

Box 1 Moving into maintenance therapy

Another positive development in lung cancer has been the introduction of maintenance therapy. Both Alimta (an anti-folate drug from Lilly of Indianapolis) and Tarceva have been approved for this type of use by the US Food and Drug Administration. But the way this market has shaped up sheds some light on evolving lung cancer market dynamics.

Maintenance is given after an initial treatment round, and before a cancer returns, to prevent a recurrence. “There is very potent data that Alimta, given as a second treatment, can delay recurrences by months,” says Sloan-Kettering’s Kris. Because physicians are becoming more accustomed to giving a targeted therapy up front, rather than chemotherapy, “doctors are putting those two data points together, and saying ‘start it [Alimta] early and keep it going,’” says Kris.

Data from Waltham, Massachusetts–based oncology data firm IntrinsiQ support this trend. “Maintenance in NSCLC was going to be slow to uptake anyway, because it was a big change,” explains Ed Kissel, vice president of quantitative analysis at the company. Previously, doctors would just start one treatment, then switch the patient if that started to fail. Now, says Kissel, there is a push to maintenance therapy, but it’s benefiting Alimta a lot more than Tarceva. “People are using it [Alimta] more creatively than even the data suggested.”

The brass ring: early detection

Because a key problem with lung cancer is that the tumors are found late, early diagnosis could be a game changer. The Canary Foundation of Palo Alto, California, and Victoria, British Columbia, Canada, is supporting work at Stanford University in California that aims to address several hurdles simultaneously, using two different approaches. “Ideally, you’d have a low-cost blood test as the first step followed by molecular imaging,” says Sanjiv (Sam) Gambhir, head of nuclear medicine at Lucile Packard Children’s Hospital at Stanford and director of the Canary Center at Stanford for Cancer Early Detection.

Gambhir and his collaborators already have a candidate molecular imaging agent in hand. It binds to alpha V beta 3, an integrin expressed on new blood vessels and tumor cell surfaces⁷. Their investigational new drug application was approved to study this agent using positron emission tomography (PET) and the radiolabel fluorine 18.
The agent is “highly specific” for the integrin and with tomography it “gives you a molecular map of the lungs, showing small new lesions and whether they are likely to be malignant or not,” Gambhir says. The group is also collaborating with researchers at the Seattle-based Fred Hutchinson Cancer Center using proteomics to look for signatures of early lung cancer. Although a better imaging tool alone would be a step forward, Gambhir points out that given the expense of imaging, it will be best if it can be done selectively. But finding a good blood biomarker for early detection is extremely challenging. The limits for tumor detection are not yet even known, so Gambhir and his colleagues have been creating mathematical models relating blood biomarker levels to tumor burden so they can determine how small a tumor they can detect⁸.
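To make the idea of relating blood biomarker levels to tumor burden concrete, here is a minimal sketch. It is not the Stanford group's model. It assumes a simple one-compartment, steady-state picture in which every tumor cell sheds marker at a constant rate, a fixed fraction of that marker reaches the blood, and the marker is cleared with first-order kinetics. All function names and parameter values are hypothetical placeholders chosen only for illustration.

```python
import math

# Illustrative one-compartment model. Every number used with these functions
# is a hypothetical placeholder, not a measured or published value.


def steady_state_level(n_cells, shed_per_cell, fraction_to_blood,
                       half_life_days, plasma_volume_ml, baseline=0.0):
    """Steady-state plasma concentration (ng/ml) for a tumor of n_cells.

    shed_per_cell is in ng per cell per day; baseline is the marker level
    (ng/ml) contributed by healthy tissue.
    """
    k_clear = math.log(2) / half_life_days                # first-order clearance, 1/day
    influx = n_cells * shed_per_cell * fraction_to_blood  # ng/day entering the blood
    return baseline + influx / (k_clear * plasma_volume_ml)


def smallest_detectable_cells(assay_threshold, shed_per_cell, fraction_to_blood,
                              half_life_days, plasma_volume_ml, baseline=0.0):
    """Invert the model: tumor size at which the marker reaches the threshold."""
    excess = assay_threshold - baseline
    if excess <= 0:
        raise ValueError("threshold must exceed the healthy baseline level")
    k_clear = math.log(2) / half_life_days
    return excess * k_clear * plasma_volume_ml / (shed_per_cell * fraction_to_blood)


def sphere_diameter_mm(n_cells, cells_per_mm3=1e6):
    """Diameter of a spherical tumor, assuming about 1e6 cells per cubic mm."""
    volume_mm3 = n_cells / cells_per_mm3
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical marker: shedding rate, blood fraction, half-life and
    # baseline are all made up for the sake of the example.
    params = dict(shed_per_cell=1e-7, fraction_to_blood=0.1,
                  half_life_days=2.0, plasma_volume_ml=3000.0, baseline=0.5)
    cells = smallest_detectable_cells(assay_threshold=1.0, **params)
    print(f"cells needed: {cells:.3g}, diameter: {sphere_diameter_mm(cells):.2f} mm")
```

Under these assumptions the detection limit scales linearly with the gap between the assay threshold and the healthy baseline, which is one way to see why a marker that healthy tissue also produces is hard to use for small tumors.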
Novel therapeutics

Whereas the progress against tumors with EGFR and ALK mutations is encouraging, “for the bulk of lung cancer patients, their disease is more genetically complicated,” says Ira Mellman, vice president of research oncology at Genentech. Smokers’ tumors in particular, he says, “have many more mutations and alterations.” Finding effective treatment for these tumors is likely to be much more difficult.

Genentech is betting on a radically different approach in its next wave of cancer therapeutics: drug-antibody conjugates. According to Mellman, the company has put a lot of effort into choosing optimal antibodies and making sure the toxin is linked securely to the antibody. “The linker is critical in ensuring safety. If the payload breaks off you can get terrific toxicity to the patient,” Mellman says, pointing to Wyeth/Pfizer’s Mylotarg (gemtuzumab ozogamicin; an anti-CD33 humanized monoclonal antibody linked to the cytotoxic agent calicheamicin), which received accelerated approval but was withdrawn a decade later because mortality was higher in follow-up than in earlier trials. Other conjugates have used bacterial toxins, which are highly immunogenic. If the patient’s immune system makes an antibody to the toxin, that can limit the conjugate’s effectiveness.

Genentech’s conjugates use a new kind of payload, microtubule poisons, which will be linked to antibodies targeting tumor cell surface proteins. The company is not revealing which surface proteins are being targeted. “The preclinical studies look great, but they always do,” Mellman says. Data from archived tumors show that 85% carry the targeted protein. The trials are being designed “to get a lot of information out of them,” Mellman says. The researchers will monitor levels of the biomarkers in patients and will also use radioactive isotopes and immunoPET imaging studies to show whether the drug is actually getting where it is supposed to.
Nine studies of drug conjugates in cancer will be starting over the next year. One in NSCLC uses the anti-mitotic auristatin as its payload. A clinical study of Herceptin conjugated with an anti-microtubule maytansine-derivative, Herceptin-DM1, is also quite far along and has researchers very excited about conjugates as an approach. Ultimately, lung and other deadly tumors will probably require treatment with multiple, targeted drugs. As Mellman says, “Cancers are protean and resistance mutations spring up quickly.” Just as with HIV, he says, doctors will need to bombard tumors with combination therapies to limit the cells’ options to mutate.
“You have to think one step ahead of where the cancer cell is likely to go.”

In keeping with that, researchers are also looking at mutations that confer resistance to the targeted therapies they already have. The EGFR mutation T790M, for example, confers resistance to tyrosine kinase inhibitors of the receptor. “There is a worldwide developmental push to find a drug to target this defect,” says Mark Kris, chief of the thoracic oncology service at Memorial Sloan-Kettering Cancer Center, New York.

Better diagnostics

The search for new diagnostics is broad and aggressive. Genes in the EGFR family and related pathways are one focal point in terms of cellular receptors. Overexpression of the EGFR family member ERBB3, for example, has been implicated in poor prognosis and survival among lung cancer patients, and the gene is emerging as a key player, according to Gunamani Sithanandam, a staff scientist at the US National Cancer Institute in Bethesda, Maryland. ERBB3 activates phosphoinositide 3 kinase (PI3K)/Akt (protein kinase B) signaling, which is involved in tumorigenesis. RAS family gene products, including KRAS, act downstream of ERBB3, and KRAS mutations (which 15–30% of patients have) make tumors less responsive to EGFR inhibitors. KRAS mutations are the most common ones found in smokers and are thought by some to be associated with unfavorable outcomes⁹. Sithanandam thinks ERBB3 could also be an important therapeutic target.

“More and more, we are thinking of treatment and prevention along molecular pathways,” says Nita Maihle, a professor in pharmacology and molecular medicine at Yale University in New Haven, Connecticut. The challenge, Maihle says, is to really work out the biology. “We like to simplify things, but many of these biomarkers are extremely
complex.” HER2 (also known as ERBB2) testing in breast cancer, she points out, has been fraught with difficulty not just because the quality of testing is so variable, but because the very meaning of ‘HER2 positive’ is now being challenged¹⁰.

There is also a growing sense that mutations, although they make great targets and biomarkers, are not going to explain all the variations in response. “We need to do some of these tests in DNA, RNA and protein if we are focusing on advanced metastatic disease,” says Herbst. At M.D. Anderson, Herbst’s group is including genomic and proteomic studies in its landmark BATTLE (biomarker-integrated approaches of targeted therapy for lung cancer elimination) trials, which employ a novel adaptive design. The trials are designed to evaluate 11 biomarkers related to four pathways against multiple lung cancer drugs simultaneously. All patients must have a lung biopsy, which undergoes extensive molecular and histological analysis. Individuals are randomly assigned to an experimental therapy and monitored for eight weeks. If subjects respond, they stay on the drug; if not, they are shifted to a different experimental therapy. BATTLE-II, which is underway, will include combinations of targeted therapies.

“It’s a speed trial, where we reboot each time we reassign a patient to a new treatment regimen,” Herbst says. They hope to quickly amass a range of biomarkers of response, but “expression signatures take a lot of work to validate,” he admits. They found, for example, that 61% of patients with a KRAS mutation who took Nexavar (sorafenib) had disease control at eight weeks, compared with 32% for the other three drugs. Tarceva did best against tumors with EGFR mutations and Zactima (vandetanib; a 4-anilinoquinazoline derivative selective for VEGF receptor) against those with high VEGF receptor 2 expression, whereas the combination of Tarceva and Targretin (bexarotene; a retinoid X receptor (RXR)-selective antitumor retinoid) was most effective against tumors with cyclin D1 defects or EGFR amplification.

Table 1 Selected trials of second-generation tyrosine kinase inhibitors in NSCLC
Drug | Company (location) | Target | Development phase
Zactima | AstraZeneca | EGFR and VEGFR2 | Phase 3
BIBW-2992 | Boehringer Ingelheim (Ridgefield, Connecticut) | EGFR and HER2 | Phase 3
Pelitinib | Wyeth (Madison, New Jersey) | EGFR | Phase 2
Canertinib | Pfizer | EGFR and HER2 | Phase 2
Tykerb (lapatinib) | GlaxoSmithKline (Brentford, UK) | EGFR and HER2 | Phase 2
Votrient (pazopanib) | GlaxoSmithKline (Brentford, UK) | VEGFR1, 2, 3; PDGF and c-Kit | Phase 2
XL647 | Symphony Evolution (Rockville, Maryland) | EGFR and VEGFR2 | Phase 2
Crizotinib | Pfizer | MET/ALK fusion | Phase 1/2
XL184 | Exelixis (S. San Francisco, California) | VEGFR2, MET and RET | Phase 1

Biodesix, a molecular diagnostics company based in Broomfield, Colorado, has what may be the first clinically available mass spectrometry (MS)-based protein assay, the Veristrat test. The test, which was launched in spring 2009, requires only a blood sample and uses a proprietary algorithm to determine how likely it is that a patient will respond to EGFR inhibitor therapy. “We have made mass spec a clinical tool for large protein biomarkers,” says CEO David Brunel. The samples are processed in Biodesix’s CLIA (Clinical Laboratory Improvement Amendments)-approved laboratory. A key advantage of the test is that it doesn’t require a tumor sample, which can be difficult to obtain.

The Biodesix system looks at eight MS peaks comprising protein and peptide fragments. Half of these are variants or isoforms of serum amyloid A, which appear to interact with cancer-related pathways, including MAPK (mitogen-activated protein kinase). The other half have been harder to name. The company will likely have to identify the proteins in the test to convince skeptics of its value.

Amgen’s Patterson says that for now, the more common use of proteomics continues to be for quantifying a few known proteins, as Amgen did in its motesanib/PlGF study, and not for fishing for complex markers and/or signatures. In the motesanib study, the company used a traditional immunoassay-based approach. “A lot of people using this [mass spec-based] approach ended up looking at the same proteins and fragments in the end, and a lot of the signature work has been disproven,” Patterson says. Gene expression has been beset by similar challenges. If researchers are going to use either of these tools in lung cancer trials, accuracy and reproducibility need to be improved.
Some of the steps that need to be taken are actually surprisingly simple. For example, researchers at the Cambridge Research Institute in Cambridge, UK, recently published a simple way of better preserving lung cancer biopsies¹¹. Getting enough tissue to test for anything is often a challenge. “Only about 30% of patients are suitable to have their tumor cut out. The other 70% only have tiny fragments removed,” says Malcolm Lawson,
one of the Cambridge researchers. RNA, in particular, tends to degrade quickly. Lawson and his colleagues can increase the levels of RNA preserved by 50–70% just by using a known RNA preservative before freezing the samples. As a result, more of the precious tissue from samples is available later. “Microarray technology is on the cusp of being more widely used,” says Lawson. “We really need to start thinking about these practical steps that can greatly improve our results.” The tools are now less expensive, but there remains an urgent need for standards.

Circulating tumor cells are another promising new strategy for lung cancer diagnostics, but until now, it has been impractical to measure them because they are rare compared to other blood components. “The standard commercial technology is not very sensitive and is harsh on the cells,” says Haber. His group is developing a microfluidics-based microchip (the circulating tumor cell (CTC) chip) that uses antibodies to epithelial cell adhesion molecule to snag the circulating tumor cells. The core technology was developed by Mehmet Toner, a professor of surgery at the Massachusetts General Hospital. The latest version uses a series of V-shaped indentations to create currents that bring more cells into contact with the walls of the device. Haber hopes that within a year or two, “we’ll be monitoring the genetic status of tumors using CTC [chip]s, in real time.”

S. San Francisco–based Nodality is taking a similar approach. Developed in the laboratory of geneticist Gary Nolan of Stanford University in Stanford, California, the company’s technology, single-cell network profiling, is based on flow cytometry. Antibodies are used to measure phosphoproteins in signaling pathways. Nolan developed a means of permeabilizing the cell membrane so that antibodies could enter cells and measure internal pathways.
This approach “can characterize not just the surface of cells, but also the activation levels of pathways related to response and resistance cell by cell,” says CEO David Parkinson. The technology, and related algorithms, allows the analysis of hundreds of thousands of individual cells, resting and stimulated. “At its core, personalized medicine is the individual characterization of complex biology,”
says Parkinson. It may be that some subsets of cancer patients will not be identified until a deeper level of analysis is applied to them. Nodality has already used the technology to characterize responsiveness of acute myeloblastic leukemia to standard induction therapy, which works in about 60% of patients¹². The researchers are finishing the validation of that test. They have also characterized subtypes of acute myeloid leukemia based on survival, DNA damage and apoptosis pathways¹³. Circulating tumor cells from lung cancer patients are one of their next targets.

Many of these new approaches are several years, or more, away from realization. Still, there’s finally some optimism filtering into the conversation about lung cancer. “Ten years ago lung cancer was the least studied malignancy and the one with the fewest options. Now, because of genetics, we’re doing better with it than some other cancers,” says Haber. Kris concurs. “Tens of thousands of people with lung cancer are now having their lives made better, and we are leading the way with mutation testing,” he says. As more pathways are mapped out, new tools for studying lung cancer will be developed, and these tools will undoubtedly help researchers in other fields of oncology as well.

Although there is optimism, the way forward still is not clear and a lot of hope is pinned on new technologies and novel trial designs, like BATTLE’s. For these efforts to be successful, everyone, including big pharma and regulators, will need to get truly on board.

Malorye Allison, Acton, Massachusetts
1. Chustecka, Z. Medscape Today (7 June 2010).
2. Stinchcombe, T.E. & Govindan, R. Lancet Oncol. 11, 604–605 (2010).
3. Subramanian, J. et al. J. Thorac. Oncol. 5, 1116–1119 (2010).
4. AstraZeneca. IRESSA Label Change Press Release (AstraZeneca, London) (17 June 2005).
5. Polverino, A. et al. Cancer Res. 66, 8715–8721 (2006).
6. Bass, M.B. et al. J. Clin. Oncol. 28, 15s, Suppl; abstr 3037 (2010).
7. Zhang, X. et al. J. Nucl. Med. 47, 113–121 (2006).
8. Gambhir, S.S. et al. PLoS Med. 5, 1287–1297 (2008).
9. Pao, W. et al. PLoS Med. 2, e73 (2005).
10. Allison, M. Nat. Biotechnol. 28, 383–384 (2010).
11. Lawson, M.H. et al. J. Thorac. Oncol. 5, 956–963 (2010).
12. Kornblau, S.M. et al. Clin. Cancer Res. 16, 3721–3733 (2010).
13. Rosen, D.B. et al. PLoS ONE 5, e12405 (2010).