GLOSSARY
A note on nomenclature

Laura H Reid & Janet A Warrington

Laura H. Reid, Expression Analysis, Inc., 2605 Meridian Parkway, Durham, North Carolina 27713, USA. Janet A. Warrington, Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA.

The following glossary defines key terms and concepts that are used throughout the MicroArray Quality Control (MAQC) consortium manuscripts. Wherever possible, the definitions are based on the Clinical and Laboratory Standards Institute harmonized terminology database (http://www.clsi.org).

Detection call. A qualitative value that indicates a level of confidence in the signal calculated for a probe. In the MAQC study, the detection calls were binary: either ‘0’ for ‘not detected’ or ‘1’ for ‘detected’. For some platforms, the detection call reflects the quality of the nucleic acid spot on the microarray, similar to ‘Flag/No Flag’ scores. On other platforms, it reflects the abundance of the target transcript or the concordance of results among multiple probes in a probe set, similar to ‘Absent/Present’ calls. Although the final detection call is qualitative, it is usually based on quantitative assessments and complex statistics.

External RNA control. An RNA species added to a biological sample during processing for the purpose of assessing the technical performance of a gene expression assay. Different external RNA controls may be used to monitor different processes. In microarray research, external RNA controls are added either to a total RNA sample (to assess the enzymatic processes involved and the hybridization step) or to the labeled cRNA (to assess hybridization efficiency only).

Gene. An expanded definition of this term was adopted by the MAQC consortium to denote both a DNA segment and the collection of RNA transcripts derived from it. In the DNA usage, a gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions. In the RNA usage, a gene often refers to the targets measured in a gene expression assay.

Probe. A discrete piece of nucleic acid used to identify specific DNA or RNA molecules bearing the complementary sequence. Some microarray platforms rely on a single oligonucleotide probe to assay an RNA target; others combine data from multiple probes, arranged in a probe set, when calculating expression values for a target. Bead-based assays attach oligonucleotide probes to a microscopic bead surface. PCR-based assays use a pair of oligonucleotide primers (also referred to here as probes) to identify and amplify their intended RNA target, and in some cases an oligonucleotide detection probe is hybridized to the amplified target.

Repeatability. The ability to provide closely similar results from replicate samples processed in parallel at the same test site using the same gene expression assay.

Reproducibility. The ability to provide closely similar results from replicate samples processed with different microarray platforms or at different test sites using the same gene expression assay.

Signal. The quantitative expression value for each probe, derived from a hybridization image after preprocessing steps, such as background subtraction and summarization of data from multiple probes, as well as normalization procedures that remove systematic artifacts. Signals are not the raw fluorescence or chemiluminescence intensities captured in a pixelated microarray image. (An illustrative sketch of these steps follows this glossary.)

Target. Nucleic acid whose identity and/or abundance is revealed during the assay. The gene expression assays in the MAQC study have RNA targets. Multiple RNA targets can be transcribed from a single gene, and individual transcripts can be alternatively spliced into multiple targets with different functions and expression patterns. Thus, a gene expression assay designed for one target may actually detect multiple RNA transcripts.
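The ‘Signal’ entry above describes a generic preprocessing chain (background subtraction, probe summarization, normalization) without committing to any vendor’s algorithm. The following is a purely illustrative sketch of how those steps might compose in Python; the probe intensities, background estimate and scaling factor are hypothetical values, not drawn from any MAQC platform.

```python
import numpy as np

def probe_set_signal(probe_intensities, background, array_scale=1.0):
    """Illustrative signal computation for one probe set:
    background subtraction -> probe summarization -> normalization.
    A sketch only, not any vendor's actual algorithm."""
    # 1. Subtract the local background estimate, flooring at 1.0 so
    #    the log transform below stays defined.
    corrected = np.maximum(probe_intensities - background, 1.0)
    # 2. Summarize the multiple probes in the set with a robust
    #    average on the log scale (median, for simplicity).
    summarized = np.exp(np.median(np.log(corrected)))
    # 3. Normalize: array_scale would be derived from the whole array
    #    (e.g., a target mean divided by this array's trimmed mean)
    #    to remove systematic between-array artifacts.
    return summarized * array_scale

# Hypothetical raw intensities for an 11-probe set on one array.
raw = np.array([812.0, 790.0, 1050.0, 640.0, 930.0, 720.0,
                880.0, 1010.0, 760.0, 695.0, 850.0])
print(probe_set_signal(raw, background=120.0))
```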
IN THIS ISSUE
MicroArray Quality Control project

Since 2004, when the US Food and Drug Administration (FDA; Rockville, MD) started accepting voluntary genomic data submissions, the number and scope of DNA microarray–based expression data analyses filed as accompanying (but non-binding) information with new drug applications has been steadily increasing. And although the potential value of the information contained in these submissions is undisputed, no clear guidelines and standards have as yet been established for their use as part of a regulatory decision-making process [Foreword, p. 1103; Commentary, p. 1105].

But human healthcare is not the only area where microarrays represent a promising technology; environmental monitoring of pollutants through toxicogenomics, for instance, could also greatly benefit from their adoption. In a similar manner to their use in drug monitoring, microarrays could also be applied to detect early, subchronic exposure to pollutants using model systems or, at the very least, to characterize some of the underlying molecular mechanisms of toxicity [Commentary, p. 1108].

The practical challenges in implementing microarrays for the above applications will not be trivial, however. To translate the outcome of microarray analyses into the clinical and regulatory realms, many questions regarding sensitivity, reproducibility and, ultimately, biological significance remain to be answered [Commentary, p. 1112]. It is in this context that the MicroArray Quality Control (MAQC) project was conceived by a group of regulatory, academic and industrial partners to comprehensively tackle some of the technical issues surrounding the robustness and comparability of some of the most widely used microarray platforms. Starting with two well-defined, commercially available RNA samples, this consortium has carried out a side-by-side evaluation of seven different platforms with the aim of establishing a series of metrics that would facilitate future standardization approaches [Article, p. 1151].

To validate that microarray data are comparable to data obtained from other, more traditional gene expression assays, the MAQC data set was also assessed against three quantitative molecular assays for measuring gene transcription; it turns out that the overlap is encouragingly high [Analysis, p. 1115]. Another important question addressed by the MAQC consortium was the use of RNA aliquots, external to the actual samples, that can serve as internal technical controls for evaluating performance at different steps of the experimental protocol, from reverse transcription to labeling of the samples [Analysis, p. 1132]. If adopted widely by the community, these and similar external RNA controls could provide researchers with a qualitative assessment of their assay’s performance. In a separate experiment, the consortium also put the quantification capabilities of the different platforms to the test: using a series of titration samples, it reported good concordance between predicted and actual measurements across platforms [Analysis, p. 1123].

In the early days of microarrays, two-color detection protocols were often preferred to those using one-color labeling of RNA because they could compensate for some of the imperfections and inaccuracies in microarray probe spotting.
However, with improvements in microarray manufacture, the performance of one-color versus two-color platforms is becoming a central question for high-volume data generation with microarrays, in that robust and reliable single-color protocols would greatly facilitate implementation, and reduce the cost, of analyses [Analysis, p. 1140].

In a final report, the MAQC group applies its approach to a real-world toxicogenomic analysis of rats exposed to three plant-derived carcinogenic compounds: aristolochic acid, riddelliine and comfrey. Again, the results across platforms showed high accuracy, reproducibility and biological relevance [Article, p. 1162]. AM & GTO
Next month in
• Activated sludge metagenomics
• Genome of a bioplastic producer
• Knock-ins for knockout anti-inflammatory mAbs
• Arrested protein chip fabrication
• High-definition microarray for DNA binding site searches
In This Issue written by Michael Francisco, Peter Hare, Sabine Louët, Andrew Marshall, Gaspar Taroncher-Oldenburg & Jan-Willem Theunissen.
Patent roundup
• Timothy Caulfield and colleagues report that policy makers may respond more to media controversies than to systematic data on gene patenting. [Patent Article, p. 1091] MF
• A US federal appeals court ruled on August 3 that Cambridge-based Transkaryotic Therapies (TKT), acquired last year by Shire, has infringed two patents held by Amgen for the production of erythropoietin. [News in Brief, p. 1048] SL
• Recent patent applications in tissue engineering. [New Patents, p. 1095] MF
EDITORIAL
Making the most of microarrays A major, multicenter study of microarray performance is a first step in translating the technology from bench to bedside.
No technology embodies the rise of ‘omic’ science more than the DNA microarray. First reduced to practice in the early 1990s, it has since undergone numerous iterations, adaptations and refinements to achieve its present status as the platform of choice for massively parallel gene expression profiling. Today, several thousand papers describing data from microarrays are published each year. Sales of arrayers, array scanners and microarray kits to the academic and industrial R&D community represent a multi-billion-dollar business. The microarray has even made its first forays into the clinic, with the US Food and Drug Administration’s approval of the ‘AmpliChip’ to help physicians tailor patient dosages of drugs that are metabolized differentially by cytochrome P450 enzyme variants.

And yet doubts linger about the reproducibility of microarray experiments at different sites, the comparability of results on different platforms and even the variability of microarray results within the same laboratory. After 15 years of research and development, broad consensus is still lacking concerning best practice not only for experimental design and sample preparation, but also for data acquisition, statistical analysis and interpretation. Though problematic for bench research, the lack of resolution of these issues even more seriously hampers translation of microarray technology into regulatory and clinical settings. Indeed, several regulatory authorities have been wrestling with the problem of how and when (and indeed whether) to implement microarray expression profiling data as part of their decision-making processes. The move in the past two years by regulatory agencies overseeing human and environmental safety to accept voluntary genomic data submissions was the first in a long series of steps that will be needed.

One of the next steps can be found in this issue, which presents the first formal results of the MicroArray Quality Control (MAQC) Consortium—an unprecedented, community-wide effort, spearheaded by FDA scientists, that seeks to experimentally address the key issues surrounding the reliability of DNA microarray data. MAQC brings together more than a hundred researchers at 51 academic, government and commercial institutions to assess the performance of seven microarray platforms in profiling the expression of two commercially available RNA sample types. Results are compared not only at different locations and between different microarray formats but also in relation to three more traditional quantitative gene expression assays. Although the direct comparison of microarray platforms and the establishment of common controls for microarray experiments are nothing new—several cross-format studies have already been published, and other groups, such as the External RNA Controls Consortium (ERCC), are developing standardized RNA controls—it is the size and comprehensiveness of the data set generated by the MAQC effort that is unique. In the main study, ~60 hybridizations were carried out on each of the seven platforms; >1,300 microarrays were used during the entire project.
MAQC’s main conclusions confirm that, with careful experimental design and appropriate data transformation and analysis, microarray data can indeed be reproducible and comparable among different formats and laboratories, irrespective of sample labeling format. The data also demonstrate that fold-change results from microarray experiments correlate closely with results from assays like quantitative reverse-transcription PCR. The levels of variation observed between microarray runs by MAQC were relatively low and largely attributable to cross-platform differences in probe binding to alternatively spliced transcripts or to transcripts that show a high degree of cross-hybridization to probes other than their own. Thus, although factors as diverse as day-to-day fluctuations in atmospheric ozone levels (which affect cyanine 5 fluorescence), nuclease levels in sample tissues and the quality of microarray production between batches have all been cited as influencing array performance, on the basis of the data presented here, experimental variability appears manageable.

Another clear finding is that the days of the simple two-sample t-test as a means of ranking differentially expressed genes are surely numbered. A key take-home message is that statistical analysis in regulatory submissions and clinical diagnostics is likely to be different from that used in basic research and discovery. In the case of the MAQC study—where the goal was to optimize intra- and inter-platform reproducibility—the approach was to limit the number of transcripts identified and to sort differentially expressed genes using fold-change ranking with a nonstringent P-value cutoff. But for experiments that seek to identify differentially expressed transcripts at or near the lower limits of detection, this tradeoff between reproducibility on the one hand and precision and sensitivity on the other is likely to shift, and a different type of statistical analysis will be required. There is no one-size-fits-all statistical solution.

Overall, the MAQC study represents a landmark in DNA microarray research because it provides the community with a thoroughly characterized reference data set against which new refinements in platforms and probe sets can be compared. It complements other initiatives, such as the ERCC, in providing the community with two commercially available human reference RNA samples that can be used to calibrate arrays in ongoing quality control and performance validation efforts. It can be used as the foundation for combining other microarray studies, thereby realizing the true cumulative potential of microarray data, which will undoubtedly lead to new insights. And from a clinical perspective, it validates the DNA microarray as a tool that is sufficiently robust and reliable to be embraced for use on hard-to-obtain human tissue samples.

Clearly, microarrays have a long way to go before they can be used to support regulatory decision making or accurate and consistent prediction of patient outcomes in the clinic. But the MAQC study has given us a solid foundation on which to build.
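To make the ranking strategy described above concrete, here is a minimal sketch of fold-change ranking gated by a nonstringent P-value cutoff. It illustrates the general approach rather than the MAQC consortium’s actual pipeline; the expression matrix, replicate counts and thresholds are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def rank_by_fold_change(log2_a, log2_b, p_cutoff=0.05):
    """Order genes by |log2 fold change| after a deliberately loose
    P-value filter, rather than ranking by t-statistic alone.
    log2_a, log2_b: genes x replicates arrays of log2 signals."""
    fold = log2_a.mean(axis=1) - log2_b.mean(axis=1)  # log2 fold change
    _, pvals = ttest_ind(log2_a, log2_b, axis=1)      # nonstringent gate
    keep = np.flatnonzero(pvals < p_cutoff)
    return keep[np.argsort(-np.abs(fold[keep]))]      # largest changes first

# Hypothetical data: 1,000 genes, 5 replicates per sample type,
# with the first 50 genes truly up-regulated in sample B.
rng = np.random.default_rng(0)
a = rng.normal(8.0, 0.3, size=(1000, 5))
b = a + rng.normal(0.0, 0.3, size=(1000, 5))
b[:50] += 1.0
print(rank_by_fold_change(a, b)[:10])
```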
EDITORIAL
Can Europe accelerate out of trouble? Europe should seriously consider the ‘accelerator’ concept to foster the sustainability of its biotech companies.
Many of Europe’s biotech firms appear permanently stuck in a state of arrested development. Indeed, compared with their counterparts over the ocean, European startup companies continue to find it hard to achieve the size and stature requisite for commercial success. A recent report reveals that, compared with the United States, Europe has an awful lot of small companies that, on average, grow much more slowly than their US counterparts. But help may be at hand in the form of a new incubator concept pioneered on the West Coast of the United States.

According to the latest report from EuropaBio and consultants Critical I, Biotech in Europe: 2006 Comparative Study, two-thirds of European companies have fewer than 20 employees, whereas two-thirds of US companies employ more than 20 people. One would expect companies established for, say, 2 years or less to be small, and the report confirms this, both for Europe and for the United States. In the United States, however, the initial phase of company growth is rapid: by the time US companies are 5 years old, 75% of them have more than 20 employees; in Europe, by contrast, companies with fewer than 20 employees remain the largest group until firms are more than 15 years old.

One reason that US biotech fares better is that US entrepreneurs and investors continue to look for ways of growing companies more efficiently. One of the models that is growing in popularity is the accelerator. Like incubators, accelerators provide customizable laboratory and business space for young companies. Unlike incubators, which bring small chunks of fluffy capital, cramped facilities and low-grade access to a centralized team of distracted and generically qualified management mentors, accelerators provide a combination of concentrated capital overlaid with specific and committed technical, clinical or market expertise. The availability of greater amounts of seed and startup cash (on the order of ~$4 million per company) certainly reduces one of the major risks that young companies face, and by favoring companies that are past the point of discovery, accelerators certainly cut out a large chunk of technology risk.

However, accelerators endeavor to take risk reduction even further. Consider, for instance, the eponymous Seattle, Washington–based Accelerator, started in 2003. Leroy Hood of the Institute for Systems Biology is Accelerator’s president (p. 1055), and Amgen’s venture fund is a founding partner. That gives companies backed by Accelerator (five, so far) instant access to world-class understanding of technology and market issues. Through its founders and management, Accelerator has close ties to several of the Pacific Northwest’s (and America’s) leading venture capital firms, such as MPM, Versant and ARCH. Although Accelerator backs companies addressing various slow steps in the healthcare product development process, other accelerators focus on particular areas of clinical practice. One of the most highly focused is the Hackensack, New Jersey, firm Advanced Technologies,
which has started or restarted six companies, each developing medical devices for interventional cardiology. The team running Advanced Technologies includes seasoned investors, cardiologists and clinicians, all of whom have clear roles to play in speeding up the development, clinical adoption and commercialization of cardiovascular devices, and hence in providing expedited investment and business exits. More accelerators are on the way. A consortium of large pharmaceutical firms is said to be considering creating one in the Cambridge, Massachusetts, biotech cluster. And another may be built in the San Diego biotech cluster.

Oddly, just as accelerators are finding new ways to make the milieu for new US firms more encouraging and less risky, the opposite may be true in Europe. In the United States and the more advanced parts of Europe, the rate of formation of new companies has slowed in recent years. Consequently, a large proportion of the new European foundlings are arising in nations or regions that are themselves new to biotech. Often, there are precious few biotech-relevant resources in these locations beyond a bit of seed money: there are no substantial finance streams, no management skills, no biotech-experienced support infrastructure of lawyers, accountants and consultants. Such environments are precisely the opposite of accelerators, and are likely to have precisely the opposite effect.

Global competition and technology supersession mean that biotech firms need to have a ‘Red Queen’ mentality. But trying to ‘run as fast as you can just to stay still’ is difficult if you are wading through mud. The lesson for companies in nations with new, fledgling biotech sectors is that they need to reach out beyond national borders to management and financiers in other, more established biotech clusters. It’s important to work with these experienced executives and investors because they are familiar with the idiosyncrasies and protracted timelines of life science ventures, and they have the requisite historical and international perspective to place new biotech platforms or products in their proper global competitive context.

In this respect, the Accelerator model looks particularly interesting. Given the difficulty of pooling investors and management expertise, and the relative scarcity of truly globally competitive ventures emerging at the national level, perhaps a pan-European accelerator could be an effective approach. Certainly, if European centers of scientific excellence don’t want much of their first-class intellectual property to be hamstrung by underfunding, naive management and unsupportive surroundings, they should seriously consider the concept. Europe doesn’t need more biotech ventures; it needs more successful ones. And starting biotech accelerators would be one means of bringing together the sort of expertise and funding that could increase the chances of that happening.
NEWS
Companies eye slice of age-related macular degeneration market

Genentech’s new antibody therapy Lucentis (ranibizumab), approved in June, has the potential to dominate the market for a common eye disease. In a few years, however, it may face direct competition from Avastin (bevacizumab), a sister drug made by the same company. Meanwhile, OSI Pharmaceuticals and QLT, which already have drugs approved for the same eye disease, are trying to consolidate their market positions. As yet, several other potential competitors with drugs in phase 2 development for the same indication have not demonstrated any advantages over Lucentis.

On June 30 the US Food and Drug Administration (FDA) announced it had approved Lucentis, a treatment for wet age-related macular degeneration (AMD). The drug is administered as an injection into the eye and is a humanized antibody Fab V2 fragment that targets vascular endothelial growth factor (VEGF), a protein associated with the growth and leakage of blood vessels that cause vision to decline. Lucentis, which is made by S. San Francisco-based Genentech, is the first approved treatment to restore sight in a significant percentage of patients afflicted with the disease. AMD is a major cause of blindness in people over 50 years old. Until now, drugs approved for AMD could only slow the progression of the disease, rather
ALSO IN THIS SECTION
New clinical trials policy at FDA p1043
Amgen’s TPO mimic faces stiff competition p1044
BioXell: an Italian biotech success story? p1045
News in brief p1048
Profile: Abe Abuchowsky p1050
An eye surgeon performs microsurgery on a patient with age-related macular degeneration. New drugs, such as Lucentis, could reduce the need for such procedures.
than reverse it. But in phase 3 clinical studies of Lucentis, vision improved in more than one-third of the individuals who took it. Experts say Lucentis will likely steal significant market share from the two AMD treatments already on the market: Pfizer/OSI Pharmaceuticals’ aptamer Macugen (pegaptanib) and Novartis/QLT’s small molecule Visudyne (verteporfin). Lucentis may also discourage companies from
developing new therapies. “The success of Lucentis has raised the bar so much that it makes it difficult to come up with a drug that’s better,” says Julia Haller, a professor of ophthalmology at Johns Hopkins University in Baltimore. But physicians are discovering on their own that Genentech’s approved cancer drug, Avastin, an anti-angiogenic antibody that binds VEGF, may work for AMD just as well and just as safely
Table 1 Drugs currently in development for the treatment of wet AMD

Product | Company | Mechanism of action | Phase
Evizon (squalamine) | Genaera (Plymouth Meeting, Pennsylvania) | Anti-angiogenic; inhibits VEGF, PDGFβ, thrombin & bFGF intracellular pathways | 3
PTK787 (vatalanib) | Novartis (Basel)/Schering (Berlin) | Small-molecule VEGFR kinase inhibitor | 3 (cancer); 2 (AMD)
Retaane (anecortave acetate) | Alcon (Fort Worth, Texas) | Small-molecule angiostatic cortisene; inhibits angiogenesis induced by basic fibroblast growth factor, VEGF and other known stimulators | 3 (application withdrawn from EMEA after it asked for more data; awaiting FDA approval)
AG-13958 | Pfizer (New York) | Inhibits tyrosine kinases, including the VEGF receptor | 2
CAND5 | Acuity Pharmaceuticals (Philadelphia, Pennsylvania) | Gene-silencing siRNA therapy that reduces production of VEGF | 2
Combretastatin A4 Prodrug (combretastatin) | OXiGENE (Watertown, Massachusetts) | Tubulin inhibitor; disrupts the structure of endothelial cells lining the tumor vasculature to stop the flow of blood and nutrients to the tumor | 2
VEGF Trap | Regeneron Pharmaceuticals (Tarrytown, New York) | Recombinant decoy receptor fusion protein that binds all forms of VEGF-A and placental growth factor | 2

Source: Evaluate Pharma (http://www.evaluatepharma.com/) and company websites. PDGFβ, platelet-derived growth factor β; VEGFR, VEGF receptor. EW
as Lucentis—and for a fraction of the price. Industry insiders say that if independent investigators complete enough safety and efficacy studies, off-label Avastin prescribed for AMD may become Lucentis’ greatest competitor (Box 1). However, analysts such as Joshua Schimmer, at SG Cowen in New York, believe Lucentis will likely capture 65% of the $1-billion US market.

Indeed, Lucentis’ arrival on the market will be a major blow to Macugen, which was developed by OSI Pharmaceuticals and is marketed by Pfizer. Researchers say the difference between the drugs may be the amount of VEGF each drug inhibits. Whereas Lucentis can bind and inhibit all the active molecular forms of VEGF, Macugen binds to only one form of VEGF, called VEGF-165. “Macugen is effective but not as effective as Lucentis,” says Tony Adamis, chief scientific officer of OSI. “Lucentis’ data is so impressive…even I have to admit that.” In August, OSI announced that, because of the competition, it would suspend or curtail all R&D for eye diseases, except on Macugen.

OSI’s Adamis says he believes Macugen could find a spot on the market as a maintenance drug—something individuals can take after they’ve reaped the benefits of Lucentis. Based on its interpretation of Genentech’s data, OSI theorizes that patients could take Lucentis for their first three doses and then switch to Macugen, which costs about half as much and, on average, is injected less frequently. The company has set up a clinical trial to test this proposal. Many experts, however, say OSI’s theory is based on little or no data and is a last attempt to salvage the product. “They are trying to use a marketing ploy to feed off ophthalmologists who are not in the know,” says Peter Campochiaro, a professor of ophthalmology at the Wilmer Eye Institute at Johns Hopkins. “They’ve been pretty shameless.”

Lucentis’ other approved predecessor, Visudyne, is a photodynamic therapy, in which the drug is injected into the bloodstream and activated in the eye by a light beam. Like Macugen, it can slow the progression of AMD but usually cannot reverse it. “It’s hard to see it playing a meaningful role going forward,” says Schimmer. Visudyne may be useful for individuals who cannot endure an injection in the eye, he says. It may also be used in combination with Lucentis, although data so far have not supported this treatment regimen, he says.

As Lucentis edges out drugs already on the market, it may also douse enthusiasm and funding for some early-stage AMD candidates. Just two months after Genentech announced its phase 3 results for Lucentis in July 2005, Alnylam Pharmaceuticals, a biotech
Box 1 Avastin could become Lucentis’ greatest competitor

With little immediate competition from candidates in the development pipeline, Lucentis’ greatest competitor may be its sister drug, Avastin. The drug stems from the same murine monoclonal antibody as Lucentis. Avastin, however, is a full-length antibody, whereas Lucentis is an antibody fragment. Avastin is also designed as an intravenous drug and has a longer half-life than Lucentis. Genentech scientists say these properties make Lucentis better tailored for the eye, with less chance of inflammation and better binding to VEGF.

When a standard vial of Avastin is split into eye-sized doses, the drug costs less than $50 per injection, compared with the nearly $2,000 per dose estimated for Lucentis. In anecdotal reports and small independent studies of Avastin used off-label for AMD, the drug appears safe and effective, and word has spread in the ophthalmology community. “Reports have been small and anecdotal,” says Jeffrey Heier, a vitreoretinal specialist at Ophthalmic Consultants of Boston. “But all of us have used it on enough patients to know that the results are real.” Genentech, however, decided in the late 1990s to stop pursuing Avastin as an AMD drug. In response, some clinicians are trying to organize their own large-scale study so that doctors will have more concrete data on which to base their prescription choices. EW
Lucentis timeline: the evolution of two anti-VEGF drugs under one roof.

1989 • Napoleone Ferrara at Genentech discovers and clones VEGF
1993 • Ferrara and colleagues publish preclinical data showing that an anti-VEGF antibody can suppress tumor growth and angiogenesis—the formation of new blood vessels (Nature 362, 841–844, 1993)
1994 • Studies suggest VEGF may have a role in ocular diseases (N. Engl. J. Med. 331, 1480–1487, 1994; Am. J. Ophthalmol. 118, 445–450, 1994)
1996 • Adamis and other researchers at Massachusetts Eye and Ear Infirmary in Boston discover that a mouse monoclonal antibody against VEGF could be injected into monkey eyes to prevent blood vessels from growing (Arch. Ophthalmol. 114, 66–71, 1996). The cross-species experiment didn’t cause inflammation, suggesting that a humanized version might not cause inflammation if injected into human eyes
1996 • Genentech humanizes the anti-VEGF antibody
1997 • Phase 1 trials begin for Avastin, a full-length monoclonal antibody targeting VEGF
1997 • Genentech compares full-length anti-VEGF antibodies with antibody fragments (Fab) and finds that the fragments better penetrate the retina (Toxicol. Pathol. 27, 536–544, 1999). The findings compel the company to steer Avastin down a cancer pipeline and to develop a new therapy—Lucentis—for the eye. Researchers later suggest that the study was flawed. “While the Fab appeared to penetrate better than the full-length antibody, the study was flawed due to the fact that the two molecules recognized different antigens: the Fab was directed against VEGF, and the full-length antibody was directed against an antigen expressed within the inner retina known as HER2,” writes Philip Rosenfeld, an ophthalmologist at the University of Miami’s Bascom Palmer Eye Institute, in a 2006 issue of Ophthalmology
1999 • Phase 1a trial begins for Lucentis, an antibody fragment targeting VEGF made from the same murine monoclonal antibody as Avastin
2004 • FDA approves Avastin for metastatic cancer of the colon or rectum
2005 • Stephan Michels and his colleagues suggest in a small study that Avastin is safe and can improve macular anatomy and vision in people with wet AMD (Ophthalmology 112, 1035–1047, 2005)
2005 • Small studies and anecdotal reports by clinicians support Michels’ findings
2006 • Ziad Bashshur and colleagues at the American University of Beirut Medical Center in Lebanon publish the first prospective study of Avastin for AMD (Am. J. Ophthalmol. 142, 1–9, 2006). Conducted in Lebanon on 17 human subjects, the study found marked improvement in nearly every eye studied, with no side effects
2006 • FDA approves Lucentis for wet AMD
2006 • Clinicians vow to conduct a large-scale US clinical study of Avastin

EW
company in Boston, announced that it would halt development of its AMD drug because of competition. But others persevere. Among the most promising candidates in the development pipeline, some experts say, is the VEGF Trap by Regeneron Pharmaceuticals. Scientists believe the drug works by binding more effectively with VEGF, thereby blocking VEGF receptors. A phase 1 study showed that a single injection lasted at least six weeks. Some companies are exploring drug candidates that can be delivered systemically, or into
the bloodstream (Table 1). Although Lucentis is one step ahead of competitors, it has some drawbacks, and industry insiders say there is still some room on the market for new products. Eye injections are rough on patients and carry risk. Lucentis must be injected into the eye every month for the first four months, and then at varying frequency afterward. A drug that lasts longer or can be administered less invasively and less frequently than Lucentis has potential. Emily Waltz, New York
NEWS

New clinical trials policy at FDA
In a bid to speed drug development, the US Food and Drug Administration (FDA) is encouraging drug companies to design clinical trials with flexible enrollment, dosing and other parameters. Called ‘adaptive design,’ the approach promises quicker results with smaller trials, but it also carries risks of manipulation, according to observers.

At a July meeting in Washington, DC, FDA deputy commissioner for medical and scientific affairs Scott Gottlieb laid out the agency’s plan to develop five guidance papers over the next several years. Although not binding, the documents will help drug companies design and implement adaptive trials that the FDA considers up to snuff. “We have a dilemma. [Trial] costs are spiraling upwards, trials are getting bigger, patient resources are shrinking, there are a lot of drugs in the pipeline, and it’s getting harder to measure endpoints. The old paradigm just isn’t working,” says Brian Schwartz, senior vice president for research at Ziopharm Oncology, of New York.

In a typical clinical trial, parameters such as drug dosages and the number of patients in each arm of the trial are predetermined and immutable. Adaptive trials, in contrast, allow tweaking of dosages, patient pool sizes and so on in response to incoming data. Proponents describe adaptive trials as iterative, with each new round of parameters informed by lessons learned on the fly. “It’s more of a seamless approach,” says Gottlieb. Gottlieb also says that adaptive trials will rule out unsafe or ineffective drug candidates more quickly. “The ability to fail faster is an important advance,” but “adaptive procedures are more complicated to design and analyze, and in some settings more difficult to implement.”

In addition to these challenges, the FDA could have trouble getting buy-in on the concept, says Mark Senak, a consultant at Fleishman-Hillard who runs the ‘Eye on FDA’ blog. “The agency and industry will have a tough time selling the concept to policy makers and to a public that is already skeptical of clinical trial design and safety,” he says. Already, though, industry is embracing the concept. Wyeth recently hired a new vice president for adaptive trials, and Robert O’Neill, director of the office of biostatistics at the FDA’s Center for Drug Evaluation and Research, says that each of the FDA’s drug evaluation branches has received adaptive trial proposals. “The FDA is very interested in the concept,” says Mark Chang, a biostatistician at Millennium Pharmaceuticals, Cambridge, Massachusetts. “They’ve begun working closely with industry on adaptive trial designs, and they’re encouraging companies to
Bayer Healthcare was one of the first companies to follow an adaptive clinical trial protocol to determine for which cancer its drug Nexavar would be most potent.
approach them early in the process.”

Although the concept is widely embraced—few would argue against speeding up phase 2 and 3 clinical trials, which can drag on for five or more years—the mechanics of adaptive trials present thorny statistical challenges, says Chang. And Schwartz says that companies interested in adaptive trials tend to underestimate the difficulty of collecting real-time data. “By the time they look at the first 300 patients, there might be 900 patients in the trial,” he says.

Companies need to develop simulations to test adaptive scenarios, says Chang. In his models, the two most common variations involve ongoing assessment of sample size and enrichment of the treatment arm with the patients most likely to benefit. For instance, Chang will model a range of patients’ responses to a drug, a key factor in sizing trials—smaller variations require smaller sample sizes (see the sketch at the end of this story). Enrichment scenarios, by comparison, often call for first discovering biomarkers in the best responders and then adding more of those patients to the protocol. Chang and Gottlieb also envision ‘pivotal’ trials that combine phase 2 dosing and phase 3 effectiveness studies. “You can run a lot more doses, maybe five instead of two,” says Chang.

Bayer Healthcare, based in Leverkusen, Germany, adopted an adaptive approach for its phase 2 trial of a new cancer drug. Without knowing which types of cancer Nexavar (sorafenib tosylate) would fight best, the company enrolled patients suffering from a range of advanced cancers. “We knew pretty quickly, within ten or so patients, that kidney cancer was the best responder,” says Schwartz, who helped
run the trial before joining Ziopharm earlier this summer. Bayer then designed a traditional phase 3 trial for renal cell cancer. This approach illustrates another impetus for adaptive trials: many new cancer drugs stop a tumor from spreading but don’t necessarily shrink it. “The traditional endpoint of tumor shrinkage just doesn’t make sense anymore,” says Schwartz.

He cautions, however, that committees for evaluating and modifying trials on the fly need to be “completely independent” of sponsors. “That’s the only way to maintain integrity. Industry can’t be within an arm’s length” of the evaluation committee, meaning “companies will have to give up some control,” says Schwartz. He urges the FDA to “very explicitly” spell out the role of the new committees. Most large trials already deploy a data safety and monitoring board empowered to end trials if wide benefits or severe adverse events appear early. However, these traditional committees simply collate data against predetermined stopping points; the new committees will have much more power.

Gottlieb says the FDA will issue two guidance papers in January 2007, with three more to follow. The first will provide guidelines for evaluating multiple trial endpoints; the second will outline how to enrich trials with the patients most likely to benefit. “This is a wonderful opportunity,” comments Schwartz. “We want to get drugs to patients quickly, and it’s frustrating to look back at some of our trials and see that if we had changed this or that we could’ve had the drug to patients six months earlier.” Brian Vastag, Washington, DC
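Chang’s point that smaller response variation permits smaller trials follows from the standard two-sample power calculation, which an adaptive design can re-run as interim estimates of variability arrive. Below is a minimal sketch of that textbook formula; the effect size and standard deviations are hypothetical, not taken from any trial described here.

```python
import math
from statistics import NormalDist

def per_arm_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a mean difference delta
    between two arms with common standard deviation sigma:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical 5-point response difference: halving the patient
# variability cuts the required arm size roughly fourfold, which is
# why an adaptive trial re-estimates sigma as data accumulate.
print(per_arm_sample_size(delta=5, sigma=20))  # ~252 per arm
print(per_arm_sample_size(delta=5, sigma=10))  # ~63 per arm
```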
NEWS
Amgen’s TPO mimic faces stiff competition

With seemingly novel science, unsatisfactory existing treatment options and a sizeable potential patient population, it’s hard to see how Amgen’s phase 3 platelet growth factor candidate to treat blood disorders could go wrong. But with several small molecules in the pipeline, it could face stiff competition in the platelet growth factor market.

Amgen, headquartered in Thousand Oaks, California, has a first-in-class treatment for deficient platelet counts slated to wrap up phase 3 trials by the end of this year and hit the market in 2007. If approved for several related blood disorders, the drug—known as AMG 531—could be a treatment option for more than half a million people in the US and Europe. And for many indications marked by reduced platelet counts, such as immune thrombocytopenic purpura, an illness causing abnormal bleeding, and chemotherapy-induced low platelet count, the current treatment options are meager, comprising steroids and the infusion of platelets.

AMG 531 is the result of more than a decade of effort to create protein-based thrombopoietin-stimulating agents to increase platelet counts. First discovered in 1994, recombinant versions of human thrombopoietin (rTPO) had the catastrophic side effect of inducing the production of antibodies that cross-reacted with the subject’s own TPO, producing low platelet counts in normal subjects. Several companies, including Amgen, Genentech, Pfizer, Johnson & Johnson and Schering-Plough, subsequently abandoned their rTPO efforts once it was clear that the protein had the unintended effect of lowering rather than boosting platelet counts.

Around this time, the peptide company Affymax of Palo Alto, California, published research describing a peptide that functioned as an erythropoietin (EPO) mimetic by binding to and stimulating the EPO receptor. This never bore fruit, because the peptide was less effective than EPO, so Affymax “then published on a similar strategy to identify a TPO mimetic peptide, which was close to TPO in specific activity,” remembers Kenneth Kaushansky, chair of the department of medicine at the University of California, San Diego. “Hence was born a peptide approach to stimulating the TPO receptor,” he adds. “Others thought that screening large libraries of small organic molecules could also net mimetics, and that is where several other small-molecule mimics have come from.”

In the wake of the rTPO debacle, research has advanced along these two strategic paths—peptide and small-molecule development—to
A new generation of platelet growth factors could succeed where recombinant thrombopoietin has failed.
create TPO mimetics. By the late 1990s, researchers had been quite successful at identifying a number of small molecules and peptides that bound the TPO receptor. That’s when the peptide part of AMG 531 was identified in Amgen’s laboratory; it increases platelet production by binding to the TPO receptor and stimulating megakaryocytes, large cells in the bone marrow from which pieces break off to form platelets. Once Amgen researchers had identified an effective peptide that did not seem likely to trigger an antibody response, they needed to improve the peptide’s life span in the bloodstream. To create the AMG 531 peptibody, Amgen combined its preselected peptide with a carrier molecule that extends the life of the drug in the patient’s circulatory system, according to Roy Baynes, Amgen’s vice president of oncology and supportive care. If it is approved by the US Food and Drug Administration, AMG 531 will be the first drug known as a peptibody to make it to market.

Still, there are at least a half-dozen small-molecule and small-protein platelet-stimulating agents at various stages of clinical development for diseases marked by platelet deficiency, according to life sciences clinical trial research firm La Merie, located in Barcelona, Spain. AMG 531 is among the most clinically advanced treatments; but although it does not cross-react like rTPO, it is still a relatively inconvenient treatment requiring weekly intravenous doses. By contrast, eltrombopag, developed by GlaxoSmithKline (GSK) in London and also in phase 3 trials, is a small-molecule treatment for patients with low platelet counts. Eltrombopag may be among the first small molecules to modulate protein-protein interactions, a particularly hard target for this platform, according to market research firm Decision Resources, based in Waltham, Massachusetts. This scientific advance translates into a market advantage: oral administration via a tablet.

Mark Schoenebaum, a research analyst at investment bank Bear Stearns in New York City who follows Amgen, thinks this could be a big obstacle for AMG 531. He expects the candidate, if approved, to peak at a mere $300 million in sales. “It’s not thought to be a big drug,” asserts Schoenebaum. “It’s going to face serious competition from a pill from GSK if they are both approved. Since that’s an oral pill, it is cheaper to manufacture and more convenient.”

The initial indication targeted by both Amgen and GSK is immune thrombocytopenic purpura, a condition common in HIV-infected people, in which the body produces antibodies against platelets in the blood. But the next indications targeted for approval are likely to include a whole range of conditions characterized by low platelet counts, including chemotherapy-induced thrombocytopenia. In the chemotherapy market, where Amgen has several major products, including the anemia treatment Epogen (erythropoietin), which requires regular intravenous infusions, AMG 531 may still have an edge. But neither Amgen nor GSK is likely to have the last word. “There are 20 or 30 companies that are quietly working on small molecules with much better pharmacologic properties than the GSK molecule,” concludes Kuter. Stacy Lawrence, San Francisco
NEWS

BioXell: an Italian biotech success story?
Although BioXell’s successful initial public offering (IPO) on the Swiss exchange SWX on June 22, which grossed CHF 57.8 million ($46.9 million), can be considered a relatively unremarkable event for a company at its stage of development, it could have a wider significance for the Italian biotech sector. Indeed, Italian biotech has so far struggled to convert the country’s strength in life sciences research into a thriving commercial industry. Other, less heralded developments, including a series of regional initiatives and the entry of new investors into the sector, also provide some grounds for optimism on the part of the industry’s supporters, but the sector still faces considerable financial and cultural constraints that could choke further development.

BioXell’s decision to seek a listing in Zürich rather than in its hometown of Milan underlines the lack of a fully fledged investment infrastructure for Italian biotech. In a similar vein, Villa Guardia-based Gentium, a 2001 spinout of Crinos Industria Farmacobiologica, raised cash in recent, successive offerings on the American Stock Exchange and on Nasdaq in New York City, whereas the Italian founders of NiCox opted to establish that company as a French entity, located in Sophia Antipolis and quoted on the Euronext exchange in Paris. A Milan IPO “wasn’t really considered,” says BioXell CEO Francesco Sinigaglia, whereas the Zürich exchange is home to several biotech successes and has the support of investors who understand the sector. Even so, the share offering, which was launched shortly after the general decline in global stock markets in early June, was priced at the bottom of the indicative range of CHF 44–48 ($35.5–$38.8) that the company published, and investors took up the minimum number of shares on offer. However, the share price has held up since the IPO, hovering close to the initial offering price for the first six weeks of trading.

The BioXell success remains a largely isolated one in the Italian landscape. Despite the country’s prominence in fields such as oncology, immunology and neuroscience, Italy has been Europe’s most egregious underperformer in biotech during the past decade. Italy came bottom of a league table of 14 western European states that measured each country’s gross domestic product against its total number of biotech companies, according to the 2006 Ernst & Young biotech report “Beyond Borders.” An absence of risk capital, deficits in areas such as patenting and technology transfer, a historic inattention to the sector on the part of government and a general lack of interest in commercial biotech on the part of academic scientists have all contributed to this state of underdevelopment.

“There is still very modest entrepreneurship in the biotech sector and not many structured and savvy intermediaries. Deal flow is not significant compared to other EU countries of similar size,” says Joël Besse, senior partner with Atlas Venture in London, who participated in investments in two Italian biotechs: Milan-based Novuspharma and Bresso-headquartered Newron Pharmaceuticals. These two firms, along with BioSearch Italia and Milan-based BioXell, were all established as either spinouts from or management buyouts of international pharma R&D centers that had been located in the country. Only BioXell and Newron Pharmaceuticals remain independent. BioSearch Italia merged with Versicor to form Vicuron Pharmaceuticals, an
Francesco Sinigaglia, BioXell’s CEO, is at the helm of one of Italian biotech’s success stories.
anti-infectives specialist located in King of Prussia, Pennsylvania. Pfizer then acquired Vicuron for ~$1.9 billion in cash in September 2005. Novuspharma was acquired in January 2004 by Cell Therapeutics, of Seattle, in a stock-based deal initially valued at $236 million.

It is difficult to predict whether other biotechs will follow the example of the likes of BioXell. The gap between these companies, all of which were established with relatively broad clinical development pipelines, seasoned management and access to international venture capital finance, and the rest of Italy’s fragmented and, for the most part, undercapitalized biotech industry has been considerable. The great challenge for the sector has been how to close that gap. Regional authorities, notably in Lombardy and Piedmont, where the bulk of Italy’s 160 biopharmaceutical firms are based, are actively involved in promoting biotech, through funding technology transfer agencies, incubators and seed funds (Box 1). Newer initiatives have sprung up in Tuscany and Sardinia, too.
“Clearly the lack of Italian specialist venture capital funds is a problem,” says Sinigaglia. However, individual companies are pursuing alternative funding models. Some, most notably MolMed, have managed to raise cash directly from financial institutions and private investors. MolMed is located in the San Raffaele Science Park, adjacent to the San Raffaele University Hospital and the eponymous Scientific Institute, the country’s largest private clinical research center. It has so far secured some €60 million ($77 million) by this route and may undertake an IPO during the first half of 2007. “We have a broad pipeline and we think we would be ready in the near future,” says Marina Del Bue, general manager at Milan-based MolMed, which is developing cell-based therapies and biotech drugs for cancer.

Elsewhere, investors in Genextra, a holding company with a controlling interest in four companies, agreed this summer to double their commitment, to €60 million ($77 million), following their participation in a $41-million investment round in Intercept
Box 1 Italian biotech park taps into traditional industries

Italians are often praised for making up for the deficiencies of their country—burdened as it is by bureaucracy and lack of flexibility—with individual creativity. The Canavese Bioindustry Park may be proof that there is some truth in this cliché. The park is located near the northern city of Turin, and its creation was supported in the 1990s by the Piedmont region with the aim of reinforcing the high-tech dimension of the local economy after a major crisis. As a result, the park’s shareholders are 70% public and 30% private.

Since 2004, seed capital for startups has been available thanks to a financing model based on the business angel concept, devised to bypass the lack of interest from venture capital investors in early-stage projects. “We collect money from wealthy people with no experience in biotech, such as local small entrepreneurs in the textile or mechanical sector, lawyers or accountants. Before meeting us they never thought of becoming business angels,” says Silvano Fumero, who conceived the park when he was still head of R&D at Serono. Thirty people gave a total of €3 million ($3.8 million), each contributing a small sum and becoming founding members of a seed capital society called Eporgen Venture. “We are hopeful [that] in a couple of years the most promising newborn companies may attract investments from [the] biggest players, maybe one [of] the international venture capitalists we have involved in the selection of projects,” explains the park’s project manager, Fabrizio Conicella. The birth rate is unusually high by Italian standards, with five new startups born last year and the intention of starting another five companies by mid-2007.

The initial success of the project is already a blow against the cultural and political foot-dragging of the country, but is it replicable? “We are actually examining the way to implement a similar model in the Rhône-Alpes region [of France], but it’s not so easy,” says Valérie Ayache, managing director of Adebag, the biotech association near Grenoble. She points out that the motivations of the people investing in Eporgen are very much tied to the history of the territory, the charisma and experience of the project’s fathers, and the highly integrated model they have created between the park and Eporgen. Other Italian regions are trying to learn a lesson from the Canavese experience, too: until now, biotech has played a minor role in the national business angels network (Iban), but its secretary general, Tomaso Marzotto Caotorta, thinks it is time to create a club of senior managers to scout Italian life sciences institutes for innovative ideas. Anna Meldolesi, Rome
Pharmaceuticals, a company headquartered in New York City but based on research into the bile acid–activated nuclear receptor farnesoid X performed at the University of Perugia. “Although it is supplying mentoring and administrative support, Genextra is neither an incubator nor an investor. We are not an investment fund. We are a biotechnology group,” says Paolo Fundaro, Genextra’s chief financial officer. Milan-based Genextra has high visibility in Italy because its backers include leading entrepreneurs and industrialists, such as its founder, telecoms entrepreneur Francesco Micheli; Marco Tronchetti Provera, chairman of Pirelli & Telecom Italia; FIAT chairman Luca Cordero di Montezemolo; and Diego Della Valle, CEO and chairman of the luxury shoemaker Tod’s. The model is borrowed directly from that of another Micheli-led enterprise, the internet and telecoms group eBiscom, now FastWeb, which raised $1.5 billion at the beginning of the decade. Genextra’s progress, along with that of BioXell—now the country’s flagship biotech firm—could help shape investor sentiment toward the sector.

Assobiotec, the Milan-based association that represents the industry, thinks the country’s new national government can help as well. One measure, says Assobiotec president Roberto Gradnik, would be to create a national agency for innovation that would support technology transfer and partnering. “At the moment, if anybody, such as a private investor, is interested in investing in biotechnology, they don’t know where to go,” he says. Risk-averse Italian investment funds might engage with the sector if a ‘guarantee fund’ were put in place—a sort of voluntary insurance scheme that would allow venture capital funds to offset their investment losses against profits on more successful ventures. Assobiotec is also trying to persuade the government to adapt to the Italian tax code the ‘Young Innovative Company’ concept, originally developed in France to provide tax breaks and other fiscal support to research-intensive startup companies.

Italy had a change of government in May. In the new cabinet, led by Prime Minister Romano Prodi, responsibility for innovation policy was transferred from the research ministry to the industry ministry, headed by Pier Luigi Bersani. Gradnik interprets this as a positive move. But, says Sinigaglia, a real shift away from manufacturing and toward a knowledge-based economy still needs to happen: “We need to see government commit to that switch.” Cormac Sheridan, Dublin
NEWS IN BRIEF
200% ethanol boost from Oz sugar
University of Queensland (UQ), Australia, molecular geneticist Robert Birch got more than he bargained for when he introduced a bacterial gene into sugarcane to convert sucrose into its high-value isomer, isomaltulose. In fact, the gene encoding sucrose isomerase, cloned from Pantoea dispersa, a harmless colonist of the crop’s leaves, delivered twice as much sugar as Birch expected. Some of the transgenic plants were producing isomaltulose at up to 110% of the normal concentration of sucrose. Others produced little or no isomaltulose, but yielded up to 100% more sucrose. In the past 50 years, breeders had been unable to improve sugarcane’s yield by even 1%. Last August, CSR, Australia’s biggest sugar refiner, and its commercial partner, UQ’s commercial arm UniQuest, received an AUD$5 ($3.8) million federal research grant, under AusIndustry’s Renewable Energy Development Initiative, to develop Birch’s high-yield sugarcane, dubbed ‘SugarBoost’, as a source of the ‘green’ fuel ethanol. The partners recently planted the first, small-scale, contained field trial of the transgenic sugarcane, with approval from the Office of the Gene Technology Regulator. Queensland University of Technology molecular geneticist James Dale, also founder and CEO of Brisbane-based ‘biopharming’ company Farmacule, describes the development as “huge” in terms of its significance to Australia’s nascent ethanol industry. Indeed, Australia has been battling to keep its sugar industry alive in the face of cheap sugar from Brazil. Dale adds that it might eventually be possible to engineer similar yield increases in other ethanol feedstock crops like sugar beet and maize. GON
Senate compromise on SBIR reform
A bill has been approved by the US Senate committee on small business and entrepreneurship that would allow companies that are primarily owned by venture capitalists (VCs) to obtain small business innovation research (SBIR) grants. Since 2003, companies whose majority investors are VCs have been ineligible for SBIR funds. Still, companies with some VC investment have been able to access the grants: in 2004, about 22% of National Institutes of Health SBIR grants, or $127 million, went to companies in which VCs held minority stakes, according to a General Accounting Office (GAO) report released in the first half of this year. The Small Business Administration Reauthorization bill includes an amendment that would commit one-quarter of SBIR funds to companies that are majority-backed by VCs. “We’re supportive of the compromise and we look forward to working with the Senate,” asserts Alan Eisenberg, executive vice president for capital formation and business development at the Washington, DC-based Biotechnology Industry Organization. The bill is now on its way to consideration by the full Senate. StL
Bioengineered scents available soon from New Zealand
The Horticultural and Food Research Institute of New Zealand—known as HortResearch—has filed patent applications covering the use of the genes that produce the scent of green apples and red roses. Auckland-based HortResearch examined its databases of fruit genes and compounds to find the genes encoding the enzymes alpha-farnesene synthase (green apple scent) and germacrene D synthase (rose scent). HortResearch used its flavor compounds databases to build maps of hypothetical pathways describing how the scent compounds are synthesized in the fruit or flower. These hypothetical pathways then allowed the scientists to postulate what type of enzyme might catalyze each step. The scientists then looked in their gene databases for genes that encode enzymes capable of performing these steps. Likely genes were tested in Escherichia coli to see if they did produce those compounds, and then in model plants. To manufacture the enzymes, HortResearch uses biofermentation. “What we are suggesting is that you could actually use real enzymes from the plant,” says HortResearch scientist Richard Newcomb, “and it’s even more ‘nature identical’.” Steve Meller, head of Global Biosciences at Procter & Gamble in Cincinnati, Ohio, believes that a technological process that could cost-effectively produce the flavors and perfumes manufacturers need would be a benefit. He adds: “The really desirable odorants out there are those that are much more complex, so I think that’s really where the hurdle is going to be.” HortResearch’s work on producing flavors and fragrances is the flipside of the work done by Californian company Senomyx, which focuses on the receptors that enable humans to perceive taste (Nat. Biotechnol. 22, 1203–1205, 2004). KG
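The gene-hunting workflow HortResearch describes (hypothesize the pathway steps behind a scent compound, predict the enzyme class for each step, then search a gene database for matching annotations) can be sketched in a few lines of code. The sketch below is purely illustrative: the pathway entries, enzyme classes and gene names are invented placeholders, not HortResearch data.

```python
# Minimal sketch of a candidate-gene search: map a target scent compound to
# hypothesized pathway steps, then shortlist genes annotated with a matching
# enzyme class. All entries are invented placeholders.

pathway_hypotheses = {
    "green-apple scent": [
        {"step": "farnesyl diphosphate -> alpha-farnesene",
         "enzyme_class": "sesquiterpene synthase"},
    ],
}

gene_database = [
    {"gene": "candidate_synthase_1", "enzyme_class": "sesquiterpene synthase"},
    {"gene": "cell_wall_enzyme_7", "enzyme_class": "polygalacturonase"},
]

def candidate_genes(compound):
    """Return genes whose annotated enzyme class matches a hypothesized step."""
    wanted = {step["enzyme_class"] for step in pathway_hypotheses.get(compound, [])}
    return [entry["gene"] for entry in gene_database
            if entry["enzyme_class"] in wanted]

# Shortlisted genes would then be expressed in E. coli and assayed for the
# scent compound, as described in the story above.
print(candidate_genes("green-apple scent"))  # ['candidate_synthase_1']
```

In practice each filtering step would run against sequence databases and enzyme-family models rather than exact string matches, but the funnel from hypothesized chemistry to a testable shortlist of genes is the same.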
Europe backs ES cells
After a heated debate, the EU voted in late July to continue funding embryonic stem (ES) cell research, but with narrower criteria. The EU council agreed to continue to support research on ES cells, but not their procurement, which often requires destruction of the embryo. Several countries with strict laws on stem cell research, notably Germany, attempted to block the decision. The funding is part of the EU’s €72.7 ($93) billion research budget for 2007–2013. The vote came just days after US President George W. Bush blocked the passage of a bill that would have allowed federal money to fund similar stem cell research in the US. The bill would have supported use of embryos destined for disposal at in vitro fertilization clinics. The contrasting decisions of the EU and US may give European biotechs an edge in recruiting scientists, experts say. “This is a missed opportunity for the US to assert leadership in the field,” says Michael Werner, president of the Werner Group, a Washington, DC-based biotech research consulting firm, and former chief of policy at the Biotechnology Industry Organization, also in Washington, DC. “The EU is taking advantage of that.” EW

News in Brief written by Alla Katsnelson, Kim Griggs, Stacy Lawrence, Linda Nordling, Graeme O’Neill, Peter Vermij & Emily Waltz
New products

Lucentis (ranibizumab injection), Genentech (S. San Francisco, California): On June 30, the US Food and Drug Administration approved Lucentis, a recombinant humanized antibody fragment, for the treatment of neovascular (wet) age-related macular degeneration (AMD). An estimated 1.7 million people in the US suffer from severe AMD, the leading cause of blindness in the elderly. The ‘wet’ form of the condition is caused by growth of abnormal blood vessels that leak fluid and blood, leading to retinal scarring. Lucentis inhibits the activity of the angiogenesis protein human vascular endothelial growth factor A (VEGF-A). In trials, Lucentis maintained vision or partially restored lost vision in most wet AMD patients. The recommended treatment is an intravitreal injection administered once a month.

Elaprase (idursulfase), Shire Pharmaceutical Group (Basingstoke, UK): On July 24, the FDA approved Elaprase, an enzyme replacement therapy for the treatment of Hunter syndrome. Also called mucopolysaccharidosis II, Hunter syndrome is a life-threatening X-linked recessive condition resulting from absence or insufficiency of iduronate-2-sulfatase, causing the accumulation of cellular waste products in tissues and organs. The condition affects about 1 in 65,000 to 132,000 births. Elaprase, the first-ever treatment for Hunter syndrome, is administered in weekly infusions. It is also under review by the EMEA in Europe. AK
UK panel urges tightening of phase 1 rules
British pharmaceutical industry organizations say they are generally pleased with draft recommendations by a scientific expert panel for tighter rules governing phase 1 trials of “novel and potentially higher risk drugs…such as monoclonal antibodies.” The UK government convened the panel earlier this year after six volunteers experienced very serious adverse effects from TGN1412, a T cell–targeting ‘super monoclonal antibody’ with agonist activity developed by TeGenero of Würzburg, Germany (Nat. Biotechnol. 24, 475–476, 2006). In an interim report released on July 25, 2006, the panel said that in higher risk studies “the first dose in man should be given to one person only, leaving sufficient time for any adverse reaction to develop before further administration or administration to additional people.” The experts urged drug developers to inform regulators earlier about elevated risks and suggested enrolling people with the targeted disease, rather than healthy volunteers, into phase 1 trials of higher risk drug candidates, “particularly if the drug is expected to affect the immune system.” The recommendations generally echoed those published a day earlier by a joint task force of the Association of the British Pharmaceutical Industry and the UK BioIndustry Association, both based in London, including the proposal to set starting doses in first-in-man trials of biologicals below a point at which no biological effect is expected. The industry task force, however, limited some of its advice to “novel agents stimulating the immune system,” excluding from extra scrutiny agents with inhibitory effects. Such agents, the task force writes, “are widely used” and “rarely have acute adverse effects.” The expert panel is due to issue its final report in November. PV
GM sorghum stalled in SA
In July, the South African government rejected an application to conduct field trials of genetically modified (GM) sorghum on its soil—research that received $16.9 million from the Bill and Melinda Gates Foundation. The decision, which received a lot of media interest throughout Africa, was based on a judgment that the proposed containment level was too low for a native African plant. Similar concerns about contamination of native plants have been raised in Mexico in the past, as the country tried to develop GM corn (Nat. Biotechnol. 23, 6, 2005). Gatsha Mazithulela, executive director of the biosciences arm of the Council for Scientific and Industrial Research (CSIR) in Pretoria, says the rejection, far from destroying the public image of biotech, actually could inspire confidence. “It’s giving a clear message that the South African GM [organisms] legislation is working and if you don’t submit the right application you won’t go through,” he says. “The issue here is that sorghum’s center of origin is Africa and that’s why there’s a cautious approach,” explains Jocelyn Webster, executive director of AfricaBio, a non-governmental organization supporting research, development and application of biotech in Africa, adding, “There’s more information required by the applicants, and I suspect that there will be the usual process followed by the regulators.” Meanwhile, researchers working on the sorghum project are hopeful that a second application, which proposes higher containment levels, will be accepted before the end of this month. LN
TKT infringes Amgen EPO patents
A US federal appeals court ruled on August 3 that Cambridge, Massachusetts-based Transkaryotic Therapies (TKT), acquired last year by Shire, has infringed two patents held by Amgen for the production of erythropoietin (EPO). The ruling effectively bars sales of TKT/Shire’s EPO product, Dynepo, in the US until the patents expire in 2015. However, the court also ruled one of Amgen’s patents invalid and sent another claim back for review. Although both sides won two battles, notes Kevin Noonan, partner at the law firm McDonnell Boehnen Hulbert & Berghoff in Chicago, Illinois, Amgen won the war. “In the grand scheme, the patentee only has to win one” to demand an injunction against the competitor. But such rulings give “some certainty to how these claims can be interpreted,” he added, and can be a “spur for other [companies] to figure out how to get around them.” An additional suit is pending against Swiss company Roche, which plans to sell its new product CERA, a long-acting EPO, in the US. The ruling “could widen the window of opportunity for Roche to craft an infringement defense for CERA that capitalizes on these new interpretations,” writes David Witzke, biotech analyst at Banc of America in New York, in a research note. Amgen’s patents expired in Europe in 2004, and European sales of Dynepo are slated to begin this year. CERA was submitted to the European Medicines Agency for review in April. AK
Abe Abuchowski

As CEO of one of the first companies to make protein delivery into a profitable business, Abe Abuchowski knows what it takes to bring a new technology to market. Although his technology—PEGylation—is now considered an industry gold standard, its three-decade development history illustrates the often rocky path to commercial success for platforms.
Drug delivery is a dicey business, and Abe Abuchowski was one of the first to make it work. At the dawn of the biotech industry, proteins’ promise as therapeutics was undisputed. But taken from animal or recombinant bacterial sources, their therapeutic potential was often undone by high immunogenicity. Short circulating life posed an additional problem—frequent doses were required to maintain therapeutic levels, again increasing the likelihood of an immune response. As luck would have it, when Abuchowski began his doctoral research in biochemistry in 1971, his thesis advisor at Rutgers University, Frank Davis, put him to work on this very problem. A few years earlier, Davis had happened upon a paper suggesting that poly(ethylene glycol) (PEG), a polymer widely used in foods and cosmetics, could provide a solution. Initial studies indeed showed that “hanging a bit of PEG” onto a protein reduced immunogenicity and improved circulating life, recalls Davis, and along with two colleagues he patented a technique for PEG-protein delivery.

Within a few years, Abuchowski and his colleagues, looking for a general method for attaching PEG to a protein, hit the jackpot: a formulation of PEGylated bovine serum albumin. This was the first protein molecule created that was neither immunogenic nor antigenic. “It was a real Eureka moment,” says Abuchowski. “Even after we did it we couldn’t believe it, quite honestly.” More importantly, the researchers went on to show in mice that a PEGylated protein could cure a previously untreatable enzyme deficiency. At a time when researchers were just beginning to venture into the commercial side of discovery, Abuchowski was happy to take the leap. “I think Abe very quickly saw the business applications,” says Davis, who is now retired. In 1982, the duo formed Enzon Corporation in New Jersey to bring PEG-based treatments to the clinic. In 1990 the company’s first product, PEGylated adenosine deaminase enzyme (ADA), known as Adagen, gained US Food and Drug Administration (FDA) approval—making Enzon the fifth company to have a biotech drug approved. Inherited absence of ADA had recently been found to cause one type of severe combined immunodeficiency disorder. Without PEG, ADA has no therapeutic effect. Four years later, the company received approval for Oncaspar, PEGylated L-asparaginase for acute lymphoblastic leukemia.

“A company doesn’t exist to do research, but to get products on the market,” says Abuchowski. The decision to go after two products with almost no market was a deliberate one. “I think Enzon was pretty smart,” notes Roger Harrison, an associate at Plexus Ventures, a global pharma consultancy based in Maple Glen, Pennsylvania, and an independent consultant specializing in drug delivery. “There’s an established belief that anything you do [to a protein] will create a problem with the FDA,” he says. But both Adagen and Oncaspar minimized this added uncertainty because both were made possible by the technology, and both approval processes could be expedited by orphan drug status. Even with Enzon’s irrefutable clinical data on Adagen, Abuchowski notes, “up until the day of [FDA] approval, I probably had half of Enzon management betting against me.” Ultimately, getting the two products out in quick succession essentially proved the technology.
Meanwhile, big pharma was beginning to appreciate PEG’s potential. Enzon signed a deal with Schering-Plough to develop a PEGylated version of alpha-interferon (PegIntron) for treating hepatitis C. But as Enzon’s management waited to see whether the project would succeed, resources dwindled, the stock price fell and disagreement began to brew. A messy restructuring ensued, its outcome being a much-diminished R&D program and Abuchowski’s departure—not just from Enzon but, for a time, from biotech. PegIntron’s approval in 2001 pushed Enzon into profitability, and also marked the first time that a second-generation protein proved superior to the first generation thanks to a biotech improvement. Within a year, it had captured about 65% of the market share of a protein that had already been on the market for over a decade. “Had Enzon had the ability to prepare their own proteins and chosen more of a therapeutics model than a drug delivery model, they could have done alpha-interferon on their own,” notes Robert Shorr, who served as vice president of research and development from 1991 to 1997. Shorr is now CEO of Cornerstone Pharmaceuticals in New York. He also serves as a scientific advisor to Abuchowski’s new company, Prolong Pharmaceuticals in Monmouth Junction, New Jersey. By all indications, Abuchowski is not about to make the same mistake twice. “Enzon was a company that developed the technology and
introduced it,” he says. “Prolong is a product company.” Part of the plan is to realize some of the projects that languished in Enzon’s deep-freezer. But with several biotech drugs coming off patent and manufacturing costs falling, Prolong is also looking to Asia. For Abuchowski, one of the lessons of Enzon was that the sooner you get to revenue, the more freedom you have to decide where to go next. “There’s an alignment in philosophy between Abe and many players in Asia,” notes Gurinder Shahi, director of the Global Biobusiness Initiative at the University of Southern California in Los Angeles. Unlike in the United States, “in Asia there is no risk capital, so companies are forced to use a quick-to-revenue strategy and use that revenue to make a product.” And because PEGylation is now well established, it can add a proprietary dimension to generics. In the five years since the technology’s acceptance became official with the approval of PegIntron, about ten other PEG products have come to market, with several more in trials. Yet other technologies are emerging. Even for old products, notes Walter Blatter, CEO of ImmunoGen in Cambridge, Massachusetts, protein modifications other than PEG, such as the two additional N-glycosylation sites in Amgen’s long-acting erythropoietin Aranesp, can increase the circulation life of a molecule. Whether PEG has the capacity to surpass its primary role as a second-generation modification remains to be seen, says Samuel Zalipsky, associate director of protein and linker chemistry at ALZA in Mountain View, California. He concludes: “It’s still got some life in it, though it’s not the only game in town as it was in the past.” Alla Katsnelson, New York
DATA PAGE
Biotech R&D goes further afield
The governments in New Zealand, Korea and Canada are placing big bets on their biotech sectors. But the United States continues to dwarf other countries in terms of total investment; last year, the US public sector spent $30 billion on the life sciences, with the private sector contributing $18 billion. Biotech firms increasingly dominate dealmaking compared with big pharma, which now contributes just under half of the funding for all deals; the value and number of these biotech partnerships remained buoyant.

US life sciences federal research funding
The life sciences continued to account for 54% of federal R&D funding for the third year in a row. [Chart: life sciences funding ($ billions) and life science funding as a percentage of total federal funding, 1983–2004. Source: National Science Foundation]

Top 20 pharma as portion of R&D deals
The last three years have seen a rapid decline in the share of biotech R&D deals conducted by big pharma. [Chart: top 20 pharma–biotech deals as a percentage of the total, 2001–2005, by deal value (falling from 77% to 48%) and by deal number (falling from 41% to 27%). Source: Windhover, Burrill & Company, Nature Biotechnology]
Biopharmaceutical research and product alliances
Windhover’s data report $12 billion in R&D deals; Burrill pegs the figure at a more outstanding $17 billion. [Chart: number of deals and total deal value, 2001–2005: 514 deals worth $9.1 billion in 2001, 606 worth $10.3 billion in 2002, 621 worth $12.3 billion in 2003, 555 worth $10.4 billion in 2004 and 617 worth $12.2 billion in 2005.] Windhover data include any deal involving a biotech firm and use only the first indication figures (if provided), whereas Burrill looks at all potential products and focuses on the research money going to biotech firms. Source: Windhover, Burrill & Company

Number of partnership deals by stage
Very late stage and discovery deals actually declined last year, whereas all other categories held steady or increased. [Chart: number of deals in 2002–2005 by stage (marketed, approved, filed for approval, phase 3, phase 2, phase 1, preclinical and discovery). Source: Windhover, Burrill & Company]
R&D spend on biotech from business in OECD countries
Iceland, Denmark and New Zealand are pouring anywhere from one-quarter to one-half of all their business R&D money into biotech. [Chart: biotech R&D spending by companies ($ millions) and biotech R&D as a percentage of all business R&D spending, by country. Based on earliest year available, 2002, 2003 or 2004. Source: Organisation for Economic Co-operation and Development]

Biotech public R&D spending
New Zealand, Korea, and Canada have devoted the largest share of their public research funding to biotech. [Chart: international biotech public sector R&D spending ($ millions) and public biotech spending as a percentage of total, by country. Includes government and higher education biotech R&D spending. Based on earliest year available, 2002, 2003 or 2004. Source: Organisation for Economic Co-operation and Development]

Stacey Lawrence
NEWS FEATURE
Systems Biology, Incorporated?

As the first ‘systems biology’ companies achieve some measure of success, the question remains whether systems biology can provide a viable business model. Karl Thiel investigates.
In June, privately held VLST Corp. in Seattle announced that it had raised $55 million in a Series B venture financing round. At the time, that was reported to be the 16th largest venture capital deal of the year across all industries1—no small feat for an upstart biotech when even established players in the field were finding it tough to curry favor with investors. But the deal was particularly remarkable for another reason. VLST (which derives its name from Viral Logic Systems Technology) is the first company to graduate from Seattle’s Accelerator Corp., a venture-backed life sciences incubator formed in 2003 by a group of venture investors—MPM Capital, Arch Ventures, Versant Ventures, Alexandria Real Estate Equities and later OVP Venture Partners and Amgen Ventures—in conjunction with the Institute for Systems Biology (ISB) of Seattle. As such, the recent financing seemingly marks a victory for the young ISB and a new milestone in the commercialization of systems biology—a group of marquee venture capitalists (VCs) putting major money into a very early-stage platform technology company at a time when most VCs are avoiding biotech altogether or are trying to reduce risk with various accelerated commercialization strategies2,3. Is systems biology finally coming of age?

Systems biology catches on
Certainly, systems biology as a discipline has gained in popularity over the past several years. Since the ISB was founded in 2000 by Leroy Hood, close to a dozen independent systems biology institutes have been created around the world, and many more universities have created systems biology departments. But the commercial success of the discipline has thus far been ambiguous. Systems biology ideally seeks to understand complex biological systems in their entirety by integrating all levels of functional information into a cohesive model. That stands in contrast to the reductionist approaches that became standard in the twentieth century, with biologists teasing out functional information on organisms one gene or one protein at a time. Strategies for systems biology vary, but generally come down to some combination
of bottom-up data collection (for instance, amassing comprehensive information on an organism’s genome, proteome, “transcriptome,” “metabolome,” “interactome,” “transportome” and any other “-omic” approach, at all possible levels of complexity) and top-down computational modeling and simulation, in which known functions and behaviors of biological components are described mathematically and linked into complex models that allow for the dynamic interaction of large numbers of variables. Hood insists that both approaches are necessary, and that true systems biology requires, whenever possible, a global attitude toward data collection. Some companies are indeed using both top-down and bottom-up approaches to discover new knowledge, but many have focused on modeling and simulation, gathering their basic data from scientific literature or collaborative partners and tacitly accepting the greater rates of error or uncertainty that go with incomplete understanding of an organism’s constituent parts and how they interact. But there are more than technical challenges to pursuing a systems approach to biology. Because it requires understanding of biological function from phenotype down to the molecular or even atomic level, a systems approach
requires biologists, chemists and physicists of many stripes. And because of the intense data collection, processing, modeling, and simulation required, systems biology also requires computer scientists, mathematicians, software engineers and other people not usually found in university biology departments. Hood left the University of Washington in Seattle to found ISB because he believed that he couldn’t effectively build the necessary cross-disciplinary teams in a university environment, nor find the financial backing necessary to create the required infrastructure. And he believed that systems biology would produce an enormous amount of valuable intellectual property that could be better managed outside a university setting.

[Photo: The Institute for Systems Biology (ISB) brings together researchers from diverse backgrounds and provides a place for interdisciplinary work outside the usual academic setting.]

That would seem to make systems biology better suited for a private, commercial enterprise. But there are challenges here, too. Theoretically, a systems approach to understanding and treating human disease should identify the best means of therapeutic intervention. But others will still need to translate that information into an actual therapy. Therefore, systems biology sounds like just one more ‘tool’ strategy for drug discovery and development—a platform that will feed new targets, or new interventional strategies, to drug makers. And that’s exactly what VCs don’t want to hear right now. “That’s the noninvestable model,” says Carl Weissman, president of Accelerator and a venture partner at MPM Capital of Cambridge, Massachusetts. “That’s what VCs are not interested in—people who are trying to build up some sort of a stacked royalty and services business.” That would seem to put the young companies trying to commercialize systems biology into a tough position. How do they create a winning
business model in a field that is still struggling to define itself?

Enter Accelerator
Unlike most incubators, the Accelerator was formed specifically to nurture startups that would benefit from an affiliation with ISB, and to provide management support and equity-based backing along with the more typical facilities and infrastructure addressed by many life sciences incubators. Indeed, says Weissman, Accelerator was initially conceived as a vehicle to specifically nurture ideas spun out of the ISB, as an early-stage testing ground where, for a relatively small investment—usually in the $2 million to $5 million range—new concepts could either prove worthy of significant further investment or be shut down without major loss. That’s still the idea, but Accelerator’s reach has widened. VLST, for instance, did not come out of ISB, but rather was founded by Craig Smith and Steve Wiley, two scientists who most recently came from Immunex/Amgen of Thousand Oaks, California. Smith codiscovered the rheumatoid arthritis drug Enbrel (etanercept) while at Immunex. The company’s platform technology uses virulence factors found in various viral genomes as a guide to drug targets for autoimmune and chronic inflammatory diseases. According to VLST president and CEO Martin Simonetti, Smith hypothesized that many viruses rely on these secreted proteins to slow or evade the immune system and thus gain a foothold in their host. For instance, he says, Smith found that many viruses encode a protein that “looks a lot like” the p75 tumor necrosis factor (TNF) receptor—a recombinant form of which ultimately became Enbrel. At the same time, the p55 TNF receptor, which some researchers investigated as a potential drug, did not prove effective. After retrospective analysis, Simonetti says, viral genomes may explain why: “We couldn’t find any viruses that coded for p55.” The idea that virulence factors could lead to what he calls “prevalidated” targets was retrospectively validated not only with Enbrel, but with other targets like interleukin (IL)-1 and CD30, he says. “If you knew what the virus was telling you, you would have saved yourself a lot of time and money in the clinic.” The company plans to use a bioinformatics approach to identify virulence factors in viruses, then use proteomics to identify the specific target, and finally to create therapeutics that mimic the behavior of the virulence factors. The $55-million Series B round is a big step up from the approximately $4.5 million the company initially got from Accelerator. “When I first joined, we weren’t going to raise anywhere near that kind of money,” acknowledges Simonetti. “But when we sat back and thought about it, the real transforming event is the proof-of-concept phase 2 clinical trial.” Thus, the Series B round, divided into three tranches, is intended to take the company through a phase 2a trial, at which point a successful outcome should make further financing relatively easy. VLST is, in short, aiming to be a fully integrated drug company and to simply bypass the whole ‘tool’ conundrum.

Ceci n’est pas Systems Biology
The only problem is, despite its affiliation with ISB, VLST is not really a systems biology company by most measures. It is not seeking a systems-level integration of the human immune system to better understand targeted diseases; rather, it is using the adaptive evolution of viruses as a shortcut guide through the darkness to better targets. Hood takes it a step further and asserts that none of the companies at Accelerator—including one called Homestead spun out of his own lab—are really pursuing systems biology. “It’s still too soon,” he says. Homestead “is using systems thinking to identify biomarkers in the blood that may be useful in diagnostics. I think that kind of company has a chance of making a real contribution, but it’s not really a systems biology company—it just defines one aspect of a systems approach.” The same goes for two other companies—MacroGenics of Rockville, Maryland and NanoString of Seattle—that were spun out of ISB but are not part of Accelerator, and indeed, by Hood’s standards, for most other companies claiming to be working in the space. “Any company that claims to be in systems biology is doing it on a very marginal basis,” he says. “Because we’re just now developing the necessary tools.” But for those companies that are at least basing their businesses on modeling and simulation of complex systems, the problem remains—how do they successfully turn an essentially tool-oriented platform into a growth opportunity? For some companies, the answer has been to go after as much capital as possible and try to build a fully integrated drug company—an approach that requires finding willing VCs with long time horizons, a steep challenge these days. And one company initially pursuing this path, BG Medicine of Waltham, Massachusetts, actually switched from a drug discovery model to a service model4 in 2005. For others, slow growth and modest capital budgets have been the key. One of the earliest simulation and modeling companies to begin operations was Foster City, California’s Entelos. Founded in 1996, it has created a series of ‘PhysioLabs,’ dynamic models of various disease states that integrate information down to individual protein interactions, derived from published literature, with behaviors represented as differential equations and linked into a simulated patient. Different ‘virtual patients’ can then be created to represent either known variations in—or uncertainty about—the underlying parameters. Groups of diverse ‘patients’ are then used to simulate various interventional outcomes. Entelos has certainly had some tangible success both in terms of partnerships and in the fact that after raising about $50 million in venture capital, it went public on London’s Alternative Investment Market (AIM) in April, raising $20 million in its initial public offering (IPO). But the company’s path has not always been smooth. Its approximately $78-million market capitalization upon IPO was only slightly more than the roughly $70 million it has raised in private and public rounds, suggesting that the market does not yet see a great deal of surplus value in what the company has created with its capital. Part of that could come down to the revenue model, which has thus far been based mostly on some form of fee-for-service compensation. But the company is now expanding its deal structures to take a greater stake in some of its projects. In February 2005, Entelos announced it had expanded a collaboration on rheumatoid arthritis therapies with Organon of Oss, The Netherlands, into a codevelopment, comarketing deal that gives it a bigger piece of the possible upside. “Royalty deals are great, and we all want to get them,” says Entelos CEO James Karis, “but getting single digits on something that’s ten years out—I’m not sure that’s got a whole lot of value. But when you get to collaborate with someone and have the opportunity to codevelop and potentially comarket a drug, that’s a different level of value added.” “We also in some cases own other aspects of the biology that come out of our relationships,” says chief technical officer Alex Bangs, noting that the company has filed for patents on potential drug targets it has identified through its simulations, which it could choose to later out-license or even develop. The urge to move from service to products motivates more than just Entelos. San Diego’s Genomatica, a company that has used a systems approach in modeling microbes and mammalian cells primarily to help clients improve the production efficiency of chemicals and recombinant proteins, has like many systems biology companies raised very little venture capital (Table 1). After an initial $3.5-million round from Iceland Genomic Ventures in 2000, Genomatica has relied on organic growth and government funding to move its business forward.
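The ‘virtual patient’ idea described above, in which literature-derived behaviors are expressed as differential equations and parameter variation stands in for patient diversity, can be made concrete with a toy sketch. The following is an invented two-variable model for illustration only, not Entelos’s actual PhysioLab code; every equation, parameter name and value is hypothetical.

```python
import numpy as np

# Toy disease model: an inflammatory mediator M drives tissue damage D.
#   dM/dt = production - (clearance + drug_effect) * M
#   dD/dt = k_damage * M - k_repair * D
# The equations and parameters are invented for illustration.

def simulate(params, drug_effect=0.0, t_end=100.0, dt=0.1):
    """Integrate the toy model with forward Euler; return final damage D."""
    M, D = 1.0, 0.0
    for _ in range(int(t_end / dt)):
        dM = params["production"] - (params["clearance"] + drug_effect) * M
        dD = params["k_damage"] * M - params["k_repair"] * D
        M += dt * dM
        D += dt * dD
    return D

rng = np.random.default_rng(0)

def virtual_patient():
    """One 'virtual patient' = one plausible draw of the uncertain parameters."""
    return {
        "production": rng.uniform(0.5, 1.5),
        "clearance": rng.uniform(0.2, 0.6),
        "k_damage": rng.uniform(0.05, 0.2),
        "k_repair": rng.uniform(0.05, 0.2),
    }

# Simulate a diverse cohort with and without a hypothetical intervention.
cohort = [virtual_patient() for _ in range(200)]
untreated = np.array([simulate(p) for p in cohort])
treated = np.array([simulate(p, drug_effect=0.3) for p in cohort])
print(f"mean damage untreated {untreated.mean():.2f}, treated {treated.mean():.2f}")
```

In a PhysioLab-scale model the handful of toy equations would be replaced by hundreds of literature-derived relationships, but the workflow (sample plausible parameter sets, simulate each ‘patient’ with and without an intervention, compare outcomes across the cohort) is the same in spirit.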
Table 1 Selected systems biology companies

Entelos/1996 (LSE: ENTL) (Foster City, California)
  Focus: Complex human disease models, PhysioLabs.
  Equity financing background: About $50 million in Series A–D rounds; $20-million IPO on London’s AIM market in 2006.

Genstruct/2001 (Cambridge, Massachusetts)
  Focus: Drug discovery based on iterative wet lab data collection and in silico modeling.
  Equity financing background: Raised $6.5-million Series A round in 2003; none since.

Genomatica/2000 (San Diego)
  Focus: SimPheny client-server application builds predictive models of organisms based on cellular metabolism.
  Equity financing background: Initial $3.5-million venture round from Iceland Genomic Ventures; none since.

GeneGo/2000 (St. Joseph, Michigan)
  Focus: MetaCore platform integrates and visualizes cellular function data into complex models.
  Equity financing background: $1.4 million from Michigan Life Sciences Corridor; no institutional backing.

Ariadne Genomics/2002 (Rockville, Maryland)
  Focus: “Natural language processing” and statistical algorithms. PathAssist software for visualization and analysis of regulatory pathways.
  Equity financing background: Various grants and government funding; no reported venture backing.

Gene Network Sciences/2000 (Ithaca, New York)
  Focus: VisualCell data integration tool. Licensees include ISB. Builds predictive models of cells.
  Equity financing background: Some angel investor backing.

Ingenuity Systems/1998 (Mountain View, California)
  Focus: Pathway analysis software.
  Equity financing background: Venture backers include Affymetrix as well as institutional VC firms.

BG Medicine/2000 (Waltham, Massachusetts) (founded as Beyond Genomics)
  Focus: Systems pharmacology, including biomarkers for liver toxicity. Wet lab as well as in silico work. Switched from drug discovery to service focus in 2005.
  Equity financing background: Over $26 million in institutional and strategic funding.

BioSeek/2002 (Burlingame, California)
  Focus: Human disease models, used for partners and internal discovery.
  Equity financing background: $8.4-million Series A (2002); $19 million in total private equity.

Target Discovery/2002 (Palo Alto, California)
  Focus: Diagnostics based on protein isoforms.
  Equity financing background: $7-million Series A (2002–2003); none since.
But now, the company is migrating from what was essentially fee-for-service consulting work to product ownership. “I think the business plan has to be built around chemical and biological products,” says Christophe Schilling, Genomatica’s president and chief scientific officer. That means negotiating royalties from the sales of drugs and biologics that Genomatica helps clients produce, but could also mean outright ownership of some future projects. Schilling acknowledges that when the company was starting out a few years ago, the technology was at an early stage and needed further development and validation before it would “present the kind of business case where we would want to raise tens of millions of dollars.” So the company instead opted for a business plan that didn’t require much capital and relied mostly on slow, organic growth. But now Schilling believes that Genomatica’s technology has proven its value and that he can offer investors a compelling growth opportunity. “We’re at a point today where that scenario has definitely changed,” he says. Although he believes the company could be successful on a smaller scale with a low-capital, slow-growth approach, Schilling now sees reason to accelerate the process. Still, not all companies want to create products. Gene Network Sciences of Ithaca, New York, is using a modeling approach to reverse-engineer experimental data into integrated models of complex biological networks, explains CEO Colin Hill. “We are definitely a tool company, a
platform company. We are not trying to make drugs ourselves,” he says. But he, too, feels the pressure to move towards a product focus. “We’ve had various people discuss that with us—’Why don’t you become a drug maker?’ And I honestly think that’s a horrible idea, at least right now,” he says. “Many young companies even now get forced into doing that,” he says, when they have no particular competitive advantage—indeed, many considerable disadvantages of scale—in drug development. “Until our technology really demonstrates a huge, huge improvement in drug development success rates, I don’t see why it makes sense for a platform company to switch—unless they have no choice because the investors are pushing them to do that,” he asserts. Gene Network, even though it has raised almost $12 million from government grants and angel backers, does not have any institutional backing. The companies that have combined a computer-based simulation and modeling approach to systems biology with internal drug discovery programs have, not surprisingly, raised more venture capital than their in silico-only counterparts, but it remains to be seen whether the advantages that a systems approach to biology bring are enough to overcome the challenges of establishing a new drug development organization. At the same time, there’s something to be said for keeping companies small. Gerry Langeler, a general partner at Seattle’s OVP Venture Partners—one of the backers of VLST, Accelerator and ISB-spinout NanoString— echoes MPM’s Weissman when he says his firm is “relatively uninterested” in platform
companies with variations on fee-for-service models, which he sees as being unlikely to reach a scale that will make institutional backing a worthwhile investment. But that doesn’t mean companies in the space can’t succeed with that model. “I think there is sometimes a mistaken belief that unless you scale to a very large size, you haven’t been successful,” says Langeler. “But if you can build a $20-million-a-year company that’s throwing off $3 million to $4 million a year in profit, hats off to you. Don’t take my money, keep it for yourself! There’s a lot to be said for the modest-sized company that may never be the big home run but can be a very successful enterprise for the entrepreneurs and maybe a few small angel backers.” That approach may also help young companies still seeking to prove the value of their platforms mature into something that can sustain large capital investment. “In this market, where the venture community is largely trying to de-risk their opportunities and look at tried-and-true products, it’s hard to imagine how you get many systems biology companies funded,” acknowledges Weissman. Tomorrow’s systems biology successes may have to work outside the system. Karl Thiel, Portland, Oregon

1. Cook, J. Biotech startup VLST gets $55 million. Seattle Post-Intelligencer, June 16, 2006.
2. Lawrence, S. Bioentrepreneur, published online 22 December 2005 (doi:10.1038/bioent897).
3. Thiel, K. Nat. Biotechnol. 22, 1087–1092 (2004).
4. Hendrickson, D. Mass High Tech, published online 9 September 2005 (http://masshightech.bizjournals.com/masshightech/stories/2005/09/12/story5.html).
NEWS FEATURE

Clinical trial data: to disclose or not to disclose?

Clinical trial databases are sprouting like weeds, but do they provide the information the public needs? Aaron Bouchie investigates.

On August 3, US Senators Enzi (R-WY) and Kennedy (D-MA) introduced legislation, which, if enacted, could help bolster the public’s confidence in the drug industry and the government agency that regulates it. The bill, called The Enhancing Drug Safety and Innovation Act of 2006 (S. 3807), calls for the establishment of a mandatory clinical trials registry and results database. In requiring that outcomes be included, such a registry differs significantly from the existing government database, ClinicalTrials.gov, which mainly lists ongoing clinical trials1. The Enzi-Kennedy bill is a response to increasing public distrust of the drug industry and its oversight by the US Food and Drug Administration (FDA) that is a result of recent high-profile drug safety debacles (Box 1). But the bill has sparked controversy. Public advocacy groups say it does not go far enough, whereas critics from industry say that releasing clinical trial data is unnecessary and may actually stifle innovation. As outlined in the Enzi-Kennedy bill, however, greater transparency of clinical trial data appears to offer little threat to drug developers, as the most sensitive business information—that on early-stage, exploratory trials—will remain in companies’ hands.

[Photo: Senators Mike Enzi (left) and Edward Kennedy introduced legislation in early August that would mandate the public disclosure of late-stage clinical trial data.]

The current approach
The FDA Modernization Act of 1997 (FDAMA) required the US Department of Health and Human Services to set up a registry of clinical trials “of experimental treatments for serious or life-threatening diseases or conditions2.” To achieve this, the National Library of Medicine (NLM) launched ClinicalTrials.gov in 2002, the primary purpose of which is to help patients and physicians find information on nearby clinical trials. According to the NLM website, the registry currently contains ~31,700 clinical studies in over 130 countries. Many believe that, although ClinicalTrials.gov is a good start, it could (and should) do more to benefit the medical community. In September 2004, the International Committee of Medical Journal Editors (ICMJE, a small working group of journal editors) called for the registry to include all trials that test for efficacy (excluding only those early-stage trials that test for safety), not just those that are experimental and for life-threatening diseases3. All 11 ICMJE member journals now require a trial to be registered at or before the onset of patient enrollment in order to be considered for publication. This is no idle threat considering heavyweights such as The Lancet, The New England Journal of Medicine and the Journal of the American Medical Association are members. Most pharmaceutical and biotech companies have complied with the ICMJE’s policy, not just so they can get published in reputable journals, but also because in excluding early-stage trials, sensitive business information is not revealed. In explaining this exclusion, Alan Goldhammer, vice president of regulatory affairs at the industry group Pharmaceutical Researchers and Manufacturers of America (PhRMA) in Washington, DC, points out that phase 1 trials are exploratory, or “hypothesis-generating,” and the drugs being tested in this early stage are still far from regulatory approval. “If you’re breaking ground in a new therapeutic area, then listing phase 1 trials would be telling the competition what you’re doing,” says Goldhammer (Box 2). For this reason, many in the drug industry believe the recommendations published in May 2006 by the World Health Organization (WHO), based in Geneva—which state that all clinical studies should be registered, including phase 1 trials—would stifle innovation4. Furthermore, as most phase 1 trials are small and use healthy volunteers, sick patients wouldn’t need to know about them. “It is unclear how disclosing active phase 1 trials would benefit patients,” says Goldhammer. Following PhRMA’s lead, the Biotechnology Industry Organization (BIO) in Washington, DC, encourages all of its members to register all “hypothesis-testing” trials (that is, late-stage, some phase 2 and all phase 3) on ClinicalTrials.gov. Most drug developers interviewed for this article agree with this strategy, although some still submit only those that are required by law. One notable exception is GlaxoSmithKline (GSK; Brentford, UK), which was sued in June 2004 by New York Attorney General Eliot Spitzer for suppressing negative results from clinical trials with the anti-depressant drug Paxil (paroxetine hydrochloride) in adolescents. GSK makes public all of its active clinical trials, including phase 1, on ClinicalTrials.gov. “We have decided to include all phase 1 trials in the public registry to support the movement of transparency, which was led by WHO and ICMJE,” explains Rick Koenig, GSK’s vice president of R&D communications. GSK’s policy, though admirable, is not required by law, nor will it be if the Enzi-Kennedy bill passes. Although the new bill would require the registration of all late-stage trials, not just those that are experimental and for life-threatening diseases, it does not go so far as to require companies to report early-stage trials. The senators clearly have listened to industry’s interests by not including phase 1 trials in the bill’s registry requirements.

The call for greater transparency
A few months after Spitzer sued GSK, Merck of Whitehouse Station, New Jersey, voluntarily withdrew its anti-inflammatory drug Vioxx (rofecoxib) from the shelves because it increased the risk of heart trouble. Although it created bad publicity for the pharmaceutical industry, these two events also highlighted apparent deficiencies in the FDA’s post-marketing surveillance processes. The industry responded by launching a number of databases of clinical data on their marketed products in hopes of improving its public image (Table 1 and Supplementary Table online). Clinical data can also be submitted to ClinicalTrials.gov, for example, by linking to a published journal article, although such disclosure is not required by law. The Enzi-Kennedy
bill, if passed, would mandate disclosure of results from some late phase 2 trials and all phase 3 and 4 trials. Under the legislation, failure to comply has dire consequences—it could hold up drug approval or the release of funds to trials funded by federal agencies. Even so, some patient advocates believe this bill does not go far enough, and that complete data transparency as soon as possible after the trial is completed is necessary to benefit patients. For example, a patient looking to enroll in a trial should be able to base the decision on existing clinical data for all products that are in trials, argues Sidney Wolfe, director of Public Citizen’s Health Research Group, a nonprofit watchdog based in Washington, DC. Wolfe also objects to a provision that allows a company to delay making trial results public for up to two years if it is applying for marketing approval or attempting to publish the data. “There needs to be the shortest amount of time possible between a trial ending and the data being made public,” says Wolfe. Art Caplan, director of the University of Pennsylvania Medical Center’s Center for Bioethics in Philadelphia, says that such a database should be a requirement, not an option. He believes that companies have an obligation to make the data public. Patients entering clinical trials are promised that the results will be made known to help advance medicine, but companies often renege on that promise—especially if the results are negative. “This [database] fulfills companies’ promises to patients,” says Caplan. But not everyone agrees. Henry Miller, fellow at the Hoover Institution and the Competitive Enterprise Institute in Palo Alto, California, thinks that concerns about drug companies obscuring negative results are exaggerated. “Except for offering a bonanza to plaintiffs’ attorneys trolling for business, the benefit of a publicly available database of clinical trial results would be minimal,” says Miller. Nonetheless, some companies are already complying with the provisions in the bill through a voluntary database set up by PhRMA shortly after the Vioxx withdrawal. PhRMA recommends that its members make public “the results of all hypothesis-testing clinical trials…regardless of outcome.” Many big pharma companies, such as Lilly in Indianapolis, Indiana, Roche in Basel, Switzerland, GSK and AstraZeneca in London, publish such data on their websites as well (Table 1). Another provision of the Enzi-Kennedy bill is the requirement for summaries along with the raw data. Such summaries are important, according to Greg Simon, president of the biomedical think tank FasterCures of Washington, DC, because he is “more worried about burying the public in data sets and statistics.” Debra Aronson, director of BIO’s bioethics committee, believes that data are best presented to patients through peer-reviewed journal articles, but if not there, then the results and a summary should be verifiable before going into a public database. “I think there should be a peer-review process for such a database. I know some don’t like that answer, but that would be best,” says Aronson.

Share and share alike
Most agree that making phase 1 data public would not help patients. As around 80% of drugs fail at this stage, and for many drugs safety data are obtained by giving the drug to healthy volunteers, such data would not benefit the public. Merrill Goozner of the Center for Science in the Public Interest in Washington, DC, says that in some cases, such as hormone replacement therapies, efficacy might be seen in phase 1 trials. In these cases, Goozner believes that the data should be made public, but these instances are very rare. The WHO cited a more recent clinical trial disaster (Würzburg, Germany-based TeGenero’s phase 1 antibody trial, in which six healthy volunteers experienced severe immune reactions that have left some with lasting medical problems) as another reason why the public’s trust in the drug industry is waning and why full transparency is necessary to regain that trust. Goozner believes the issue with phase 1 safety is an issue of communication among companies more than an issue of making data public. The biotech and pharma industries could benefit from sharing such data for all phase 1 failures, even when the effects are not as drastic as those seen in the TeGenero study, by eliminating the duplication of dead-end studies. By sharing some details of their failures, the knowledge base of medicine would grow much faster, and the drug industry as a whole would become more efficient. “Competition in business is understandable, but science doesn’t work that way. Failures advance the field,” says Goozner. Ray Woosley, head of the Critical Path Institute (C-Path) in Tucson, Arizona, agrees that companies need to learn from each other’s mistakes. “C-Path was created to do just that,” he explains. Woosley points to the institute’s Predictive Safety Testing Consortium as an example of such collaboration. Although the consortium was just launched in March, already 14 companies are working on a ‘precompetitive’ way of developing better preclinical safety tests. And getting companies to collaborate in a

Box 1 Enzi-Kennedy bill basics
In addition to the clinical trials registry and results database, the Enzi-Kennedy bill has three other elements:
It outlines a plan to improve post-market monitoring of drugs by the FDA and companies. Before a drug can be approved, a company will be required to submit a risk evaluation and management strategy (REMS) that will help the FDA respond to risks identified after a product reaches market. Noncompliance results in fines of up to $250,000 per violation. According to FasterCures’ Simon, the FDA should receive additional appropriations to take on this added authority, rather than relying on user fees.
It creates the Reagan-Udall Institute for Applied Biomedical Research, a public-private partnership that would foster the creation of a new generation of predictive tools to speed product development and increase safety. This institute would identify and coordinate research priorities and distribute grants. “The FDA analyzes drugs using technologies that are 20 years old,” according to Peter Pitts, president of the nonprofit Center for Medicine in the Public Interest. Thus, the institute would “help the FDA move to the edge of 21st century medicine.”
It increases transparency and predictability in the FDA’s process for screening advisory committee members for potential financial conflicts of interest. Last month, the FDA announced it was looking into how to improve the process. C-Path’s Woosley says that it would be difficult to get balanced opinions without bringing in people with industry experience. “If you want an expert opinion, then you want that expert opinion, no matter where the person works,” he says. AB

Box 2 What is competitive business information in clinical trials?
The typical clinical trial scenario—phase 1 for safety, phase 2 for toxicology and dosage determination and phase 3 for efficacy—has evolved over the years. Now, press releases come out daily that describe a drug in a phase 1/2 trial, or phase 2b, or some other name that further breaks down the stage of clinical development. When determining which trials harbor competitive business information, it may be more useful to think in terms of two categories: hypothesis generating (also called exploratory) and hypothesis testing (also called confirmatory or pivotal). When a company is testing a drug, it performs lots of clinical trials to try out different delivery methods, indications and patient subpopulations. According to the Hoover Institution’s Miller, “At last count, on average the results of more than 70 clinical trials are submitted by a corporate sponsor to support a submission to the FDA for approval to market a new drug, but generally only two or three of these are ‘pivotal’ trials that provide the required definitive evidence of safety and efficacy.” In other words, the pivotal trials, which would most likely be phase 3 or late-stage phase 2, would be hypothesis testing. The other 67 trials, mainly phase 1 and early phase 2, would be hypothesis generating, and data from these trials would not have to be made public under the Enzi-Kennedy bill. PhRMA’s Goldhammer says that companies aren’t as concerned about disclosing data from hypothesis-testing trials because they are “closer to the finish line.” At that point, the timeline to approval is not as long as when exploratory phase 1 trials are being done, so disclosure will have less impact on the company’s pipeline. Goldhammer also notes that companies have a fiduciary duty to investors and the Securities and Exchange Commission to disclose results of late-stage trials, because they can have a greater impact on the success of a company than phase 1 trials. AB
Table 1 Selected clinical trial results databases

Clinicaltrials.gov (launched 2002; http://clinicaltrials.gov/). Organization: National Library of Medicine (NLM). Description: Mandatory registry of trials "of experimental treatments for serious or life-threatening diseases or conditions." Companies can register other trials and submit results, but that is not required by law.

Clinical Study Results (launched 2004; www.clinicalstudyresults.org). Organization: Pharmaceutical Research and Manufacturers of America (PhRMA; Washington, DC). Description: Voluntary results database of hypothesis-testing clinical trials regardless of outcome. Contains information on trials of over 200 drugs from about 50 companies.

SearchClinicalTrials.org (projected launch end of 2006; www.searchclinicaltrials.org). Organization: The Center for Information & Study on Clinical Research Participation (CISCRP; Dedham, Massachusetts, USA). Description: Provides access to multiple registries. CISCRP is a nonprofit with support from individuals, government and research institutions, foundations and corporations.

Eli Lilly and Company Clinical Trial Registry (launched 2004; www.lillytrials.com). Organization: Eli Lilly and Company (Indianapolis, Indiana, USA). Description: Registers all their phase 2, 3 and 4 clinical trials at initiation and results of phase 1, 2 and 3 trials for all commercially marketed products when the drug is available for patient use. Posts any significant safety findings as soon as possible.

Clinical Trial Protocol Registry and Results Database (launched 2005; www.roche-trials.com). Organization: Roche (Basel). Description: Registers all their ongoing phase 2, 3 and 4 clinical trials and data from phase 2, 3 and 4 'confirmatory' trials.

AstraZeneca Clinical Trials (launched 2005; www.astrazenecaclinicaltrials.com). Organization: AstraZeneca (London). Description: Registers all their ongoing hypothesis-testing trials and results from all hypothesis-testing trials for its marketed products.

GlaxoSmithKline Clinical Trials Register (launched 2004; http://ctr.gsk.co.uk/welcome.asp). Organization: GlaxoSmithKline (Brentford, UK). Description: Holds data and summaries from all their clinical trials, including phase 1, for all marketed products. Results are posted for nonmarketed products if GSK sees a safety problem related to mechanism of action.

Sources: Organization websites and organization spokespersons.
replacement therapies, efficacy might be seen in phase 1 trials. In these cases, Goozner believes, the data should be made public, but such instances are very rare. The WHO cited a more recent clinical trial disaster (the phase 1 antibody trial run by Würzburg, Germany-based TeGenero, in which six healthy volunteers experienced severe immune reactions that left some with lasting medical problems) as further evidence that the public's trust in the drug industry is waning and that full transparency is necessary to regain it.

Goozner believes the issue with phase 1 safety is one of communication among companies more than one of making data public. The biotech and pharma industries could benefit from sharing such data for all phase 1 failures, even when the effects are not as drastic as those seen in the TeGenero study, by eliminating the duplication of dead-end studies. By sharing some details of their failures, companies would grow the knowledge base of medicine much faster, and the drug industry as a whole would become more efficient. "Competition in business is understandable, but science doesn't work that way. Failures advance the field," says Goozner.

Ray Woosley, head of the Critical Path Institute (C-Path) in Tucson, Arizona, agrees that companies need to learn from each other's mistakes. "C-Path was created to do just that," he explains. Woosley points to the institute's Predictive Safety Testing Consortium as an example of such collaboration. Although the consortium was launched only in March, 14 companies are already working on a 'precompetitive' way of developing better preclinical safety tests. And getting companies to collaborate in a
similar way on phase 1 data would be the next step, says Woosley. "If companies find the current efforts good, then I will approach pharma about clinical data," he says.

Caplan believes mishaps could be avoided if the FDA were given a little more money to be more vigilant. "The FDA by law gets phase 1 safety stuff and they should be much more aggressive about sharing it with others even if corporate or researcher secrets are jeopardized," says Caplan. "In terms of human subjects, they [the companies] should understand the point of the study is to generate safety information and how that will be shared with the FDA—that is the goal of phase 1 studies—not [general] knowledge," says Caplan.

BIO's Aronson worries, however, that such sharing of phase 1 data could harm biotech firms. "Biotechs rely on venture capital money, and venture capitalists are investing in intellectual property. It would be hard to get investors if all your development ideas were shared with your competitors," she says. There is always a balance to be kept between the need to share information so that others can use it and learn from it, and the need to keep some information protected so that the idea can be developed into an innovative therapeutic, she adds. But who determines which data are shared to help progress the field and which are kept protected? "Establishing that balance is sometimes difficult and often will depend on the timing of disclosures," says Aronson.

Post-marketing blues

The decision to deemphasize disclosure of early-phase trial results in the Enzi-Kennedy bill
not only mollifies company and investor concerns about competitiveness, but also may focus efforts on what many see as the more serious problem. "The weakest part of regulatory oversight is once products get on the market," according to Caplan. He cites the safety problems of Merck's Vioxx and of a cardiac pacemaker from Minneapolis-based device manufacturer Guidant as examples of the FDA's lack of teeth. "Anyone who thinks the current system is working is dreaming," says Caplan.

In this respect, there already may be a solution in the wings. Woosley thinks the Agency for Healthcare Research and Quality (AHRQ), in Rockville, Maryland, would be ideal to fix the problem. AHRQ gets about $300 million a year to fund 11 Centers for Education and Research on Therapeutics (CERTs), which are congressionally mandated to perform post-marketing studies on drugs, such as the head-to-head comparisons that companies tend to avoid. Post-marketing studies are not currently gathered and made public, according to Woosley, and CERTs could play a role in helping patients understand their therapies. "The FDA is a passive system, driven by what people bring to it," he says. Right now only the FDA and companies are educating the public about drugs, and Vioxx and Paxil have shown that this system isn't nearly enough. "What is missing is a learned intermediary," explains Woosley.

Aaron Bouchie, New York City

1. ClinicalTrials.gov. http://www.clinicaltrials.gov
2. FDA Modernization Act of 1997. http://www.fda.gov/cber/fdama.htm
3. De Angelis, C. et al. Ann. Intern. Med. 141, 477–478 (2004).
4. Sim, I. et al. Lancet 367, 1631–1633 (2006).
The promise of the East: India and China as R&D options

Simon Goodall, Bart Janssens, Kim Wagner, John Wong, Wendy Woods & Michael Yeh

The East provides increasing opportunities for biotech companies seeking to optimize product development and accelerate time to market. But any undertaking in China or India requires close scrutiny of the risks.
Small to medium-sized enterprises (SMEs) in the biotech sector face a long, arduous journey toward successful commercialization of early-stage products. To get their products to market more efficiently and to realize their true commercial potential, biotechs are looking for new resources to tap for a productivity boost, and for new markets for their products. If pursued wisely, one of the most promising and practicable solutions is the sourcing of selected tasks to Asia, particularly to India and China. Both countries have already attracted considerable investment and involvement from pharma multinational corporations and could provide smaller biotechs with comparable opportunities. Consider some of the potential advantages: a huge and inexpensive talent pool (each country produces annually more than three times as many chemistry graduates as the US does), including an increasing number of Western-trained returnees; a vast patient population available for clinical trials; strong government support for biotech, both through investment (as in science parks) and
Simon Goodall is at The Boston Consulting Group, 355 South Grand Avenue, Suite 3200, 32nd Floor, Los Angeles, California 90071, USA; Bart Janssens is at The Boston Consulting Group, 14th Floor, Nariman Bhavan, 227 Nariman Point, Mumbai 400.021, India; Kim Wagner is at The Boston Consulting Group, 430 Park Avenue, New York, New York 10022, USA; John Wong is at The Boston Consulting Group, 34th Floor, Shell Tower, Times Square, Causeway Bay, Hong Kong, China; and Wendy Woods and Michael Yeh are at The Boston Consulting Group, Exchange Place, 31st Floor, Boston, Massachusetts 02109, USA. e-mail:
[email protected]
Figure 1 Indian and Chinese partnering opportunities along the R&D value chain. The chart maps key activities and technologies from research (chemistry and biology) through preclinical and clinical development (phases 1–4), distinguishing common from less common service offerings in India and China. HTS/UHTS, high-throughput screening/ultra-high-throughput screening; SAR, structure–activity relationship; PKDM, pharmacokinetics and drug metabolism.
through policies (such as tax concessions); and increasing private-sector funding and involvement. By making shrewd use of these attributes and actively working to manage the risks, your biotech could conduct operations in a leaner, more cost-effective and perhaps faster way.

There are dangers, however, and a considered approach remains the watchword. First, any involvement in the region should be undertaken as part of a global R&D strategy, not as ad hoc and opportunistic forays. Then, you need to think of a regional strategy, not a country-specific strategy: the opportunity is a matter of China and India, not China or India. You need to consider that the offshoring process, though designed to ease the challenges and expenses of R&D, labors under its own set of complexities and inefficiencies. Although some opportunities will likely suit your company, others, equally appealing, might not, so you need to make precise evaluations each time. And even the surest opportunity involves possible
risk—most notably the risk to intellectual property (IP) and the chance of delays through red tape. Biotech SMEs may have more to lose with offshoring than a large pharmaceutical concern does, as they may lack the scale to tolerate IP theft or the failure of an outsourcing venture. They can also ill afford the diversion of internal resources to find the right set of sourcing partners or opportunities. The potential benefits do look increasingly viable, but at this point they remain more potential than proven.

If you are seriously considering outsourcing to India or China, you need to start moving toward an integrated and effective strategy. There are three key issues to consider when doing so: your motivation in investing, the location of investment and the risks inherent in the activity.

Motivations for investment

The four likeliest motives for offshoring work to China and India are saving on R&D costs,
reducing capacity bottlenecks, accessing talent and increasing market access. When broaching a strategy, you first need to clarify what weight to give to each of these motives. And then you need to assess how the available Indian and Chinese opportunities measure up in each case.

The advantages of cost cutting go without saying, but offsetting them are the dangers inherent in any form of outsourcing: the possible need for greater supervision, and the potential for slower and lower-quality output. Reducing capacity bottlenecks is particularly advantageous for resource-strapped firms; by offshoring lower-priority projects, they can concentrate on higher priorities. Similarly, accessing talent to fill gaps as needed should give biotechs the freedom to concentrate on their core strengths. As for increased market access, the advantages again go without saying. Although the market is currently modest, its potential is very sizeable.

Locating the investment

Although India and China both offer outsourcing opportunities across all phases of the innovation value chain, the capabilities are uneven, and some of the more complex activities remain out of reach (Fig. 1). But don't make any assumptions: new skills and resources keep coming online. A year ago, you would have scoured both countries in vain for preclinical services of US Food and Drug Administration/good laboratory practice (GLP) quality; today, Bridge Pharmaceuticals in Beijing or CDRI in Lucknow, India, will be happy to oblige. And if you need target discovery or validation, you could try various providers in Zhangjiang Life Science Park near Shanghai, or Triesta Sciences in Bangalore.

Though less advanced overall than vendors in the developed world, Asian vendors have a clear advantage when it comes to price, offering cost savings of at least 60% in many areas, such as basic chemistry or clinical trials. Just make sure each time that those cost savings aren't going to be canceled out by extra administrative expenses on your side, or lower productivity on the provider's. With the right provider, you should be able to ease some of your pipeline bottlenecks and capacity constraints at a stroke.

Which country should you choose for any particular activity or project? And which should you emphasize when devising a strategy? As things stand today, India's greatest value is in giving you quick access to specific drug-development resources, so it might prove the better bet if your priorities are shorter time frames, easy setup, rapid results and very high cost savings. China's main attraction is in potentially strengthening your foothold in its huge and
fast-growing biopharma market, so if you have a particularly commercial agenda—developing government contacts, for example, with an eye to increasing market access—you would probably opt for China. And if you have a longer time frame, you might also favor China, and pursue lengthier projects there through an alliance partner, perhaps one of the prestigious government-funded research institutes. But a fully rounded strategy will leverage the assets of both countries, rather than just one of them, taking full advantage of their differences.

In capabilities, China is considerably ahead in biology, though still at a modest level compared with developed nations' standards. Chinese scientists participated in the Human Genome Project, and have made some notable advances in gene therapy and stem cell work. In 2003, Shenzhen-based SiBiono GeneTech was granted the world's first license for a gene therapy medication. In chemistry, on the other hand, India arguably has a solid lead, with some vertically integrated suppliers now able to offer end-to-end services. As for clinical trials, India once again is quicker off the mark, with contract research organizations typically able to secure approvals and get launched within 3 to 4 months, against a norm of 9 to 12 months in China. India also possesses superior strengths in information technology–dependent areas, most notably biostatistics and clinical trials data management.

There are also some broader considerations. India has the unquantifiable benefit of very high proficiency in the English language. And arguably, its managerial and scientific/educational culture is more Westernized than China's—more open to breaking with tradition and more innovation minded. That said, Chinese scientists with advanced training from Western institutions are returning at ever-increasing rates, often to take management positions at Chinese biopharma companies. What's more, China has the distinctive strategic benefit of increased commercial potential for biotech products themselves (see Box 1). Companies that invest in China stand to enhance their commercial prospects by impressing doctors, key opinion leaders and officialdom. By raising technology standards in the country, R&D investors will earn government goodwill that could raise their chances of expedited approvals and easier market access.

The risks, singly and jointly

On the downside, there are risk factors specific to each of the two countries. If operations are ever disrupted by workforce disputes or animal-rights activists, that would be in India; if by government interference, that would more likely be in China. The infrastructure is also far more reliable in China; India still suffers from interrupted power supplies, antiquated ports and inadequate highways in many regions. China's GLP standards are still evolving, and lag behind those of India—with few labs in either country being internationally GLP-approved. And the bureaucratic hurdles differ: the Indian authorities grant approvals for clinical trials far faster than their Chinese counterparts. But at the preclinical stage, Indian regulations are particularly stringent, making it difficult for
Box 1 The region’s market for biotech products The markets for biotech products in China and India are quite different from those in the developed world—with a far lower proportion of consumers who can pay even a tiny fraction of Western prices. But given the high rate of growth of the region, especially within the middle class, the opportunity may eventually be a lucrative one, especially in China. China’s overall pharmaceutical market is already 2–3 times more valuable than India’s, and will remain so. It should rise from $12 billion in 2005 to a predicted $37 billion in 2015 (graduating to become the world’s fifth most valuable market en route), against India’s $5.3 billion and $16 billion, in 2005 and 2015, respectively. What’s more, the proportion of generics (currently over 70% by value in both markets) versus branded drugs is declining more steadily in China than it is in India. And the price realization, though lower than that of developed nations, is considerably higher in China than in India. In each country, the target market for high-priced biotech drugs is probably no more than 5% of the population—those with private health insurance. Still, that’s 5% of a billion-strong population in each country. It all adds up: sales of biotech products in China reached $2.5 billion in 2005. Drugs that qualify as blockbusters in the United States can reach annual sales of $50 to $100 million in China with rapid success. GlaxoSmithKline’s (Brentford, UK) Heptodin (lamivudine) reached $80 million in annual sales in China within five years of launch. That said, the commercial factor is less a current consideration than a future one. Biotechs about to launch new products, at least for the next few years, may best be advised to outlicense them to established pharma companies with proper scale in China or India. M.Y.
laboratories to source genetically modified animals and to import and export human tissue or blood samples.

Viewed more broadly, the main risks apply to both countries: red tape and insecure IP. In each case, the two governments have taken corrective steps, easing the bureaucratic constraints and tightening the IP statutes. How these measures translate into reality isn't yet clear. There are cultural and human factors at work, not just regulatory ones. Western ideas of urgency and privacy may take some time to permeate. Although laws that approach Western standards now exist, their enforcement in the realm of biopharma, especially in biologicals, has not yet been established (see Box 2 for further details).

Biotechs can reduce their IP risk in both India and China through proactive management. First, you should carefully weigh the critical value of the IP against the perceived benefits of entering India or China, and refrain from any project with an unfavorable balance. When selecting a partner or vendor, you should make all necessary due diligence evaluations of the candidates on your shortlist. In particular, check on their IP-protection measures—physical, electronic and other. One biotech, for instance, disables its printer drivers and tracks all data downloads. Some local vendors literally erect 'Chinese walls'—separate rooms and facilities for client activities—and even withhold the client's name from the workforce. And when negotiating contractual arrangements, you should ensure that legal recourse, both local and abroad, is properly secured. Vendors such as Beijing-based Bridge Pharmaceuticals and Aurigene in Bangalore maintain US-based operations in part to give assurance that they comply with all US IP regulations—and to give customers the option of pursuing US-based litigation if they don't.

Even if not offshoring work to India or China, biotechs might still consider it prudent to protect their most valuable and vulnerable IP in these countries. By licensing IP to Chinese or Indian companies, they stand a better chance of preempting patent infringement, or of being represented by a party with a 'home court' advantage in case of litigation.

Choosing a sourcing model

Let's assume that after weighing the risks and potential benefits scrupulously, you've decided to take the plunge, or at least to test the water. You now need to choose an optimal business model. There are three basic models—outsourcing, partnership and captive investment
Box 2 IP developments in India and China

Among executives contemplating offshoring, IP protection remains a key concern, especially for discovery work. The main IP laws in both India and China are new and relatively untested, so caution is appropriate.

After major changes in India's IP laws in April 2005 that shifted from process to product protection, India now appears to have a reassuringly tough set of IP standards. Strong trade secret laws and the new Contract Act, based closely on IP statutes in the UK, protect a company against risks related to information leakage or employee switching. In addition, they allow companies to pursue litigation in Western courts against Indian companies for IP breaches. Another source of comfort is the presence of R.A. Mashelkar, director general of the Council for Scientific and Industrial Research. Mashelkar is a leading proponent of biotech partnerships and a global authority on IP protection in developing nations, serving as vice chair of the Commission on Intellectual Property Rights, Innovation and Public Health for the World Health Organization (Geneva). Although India's new IP laws have appeared to work well in other industries that handle sensitive company data, such as business process outsourcing, it remains to be seen whether they will work as well for biotech and for biological products. After all, the Indian pharma industry as a whole does have a tradition of patent challenges and deep reverse-engineering skills.

China too has a strong set of IP protection laws in place, though perhaps not quite as strong as India's overall, and perhaps not quite as strong for biologicals as for chemical molecules. Enforcement has been an ongoing issue, and the judicial protection of IP still has to prove itself. But since its accession to the World Trade Organization in 2001, the country has been subject to the Agreement on Trade-Related Aspects of Intellectual Property Rights, so the government is under pressure to enforce international standards. Its previous efforts to change underlying attitudes toward IP protection were not unqualified successes: the patent process remains awkward, Chinese courts continue to struggle with IP cases and protection is not always applied equally across domestic and foreign parties. M.Y.
model—offering different degrees of flexibility and control.

For biotech SMEs, the starting point would generally be the outsourcing model: hands-off and low-commitment, and therefore involving minimal supervision and easy entry and exit. Of course, it also involves minimal control over output and IP, and for those dual reasons the projects outsourced would tend to be low-complexity work of less strategic import.

Once your company has gained confidence and has decided upon a longer-term commitment to the region or to a particular vendor, you may choose to advance to a partnership model, assigning projects of higher complexity or greater breadth to a Chinese or Indian provider, with more of your own participation in supervising, training and monitoring. This would afford you greater control over quality and should improve communication and trust. However, by moving your partner up the learning curve, you risk finding that they use the enhanced know-how of their workforce to serve potential competitors of yours.

The most committed model, captive investment—where a company acquires and operates its own R&D base in China or India—is unlikely to be adopted by smaller or cash-constrained biotechs. It certainly affords increased
control and IP security, but at the cost of a heavy investment of time and resources. It also means a host of new responsibilities. There is no longer a streetwise local intermediary to deal with red tape or make good any unexpected infrastructure gaps. One biotech that set up a captive base in China admits ruefully that it has had to manufacture its own rodent cages.

Finding the right partner

To match corporate investors with the right vendor or collaborator, both India and China have quasi-official matchmaking agencies. In China, you would approach the administration in any of the biotech parks, and they would recommend a suitable match from the list of firms based there. In India, you would approach the Ministry of Science and Technology's Department of Biotechnology or the Council for Scientific and Industrial Research, and they would fix you up with a potentially ideal partner.

But it's worth ranging far wider than these sources. After all, finding the right partner will make a big difference to your offshoring experience, so don't stint on the time and effort invested. In both countries, develop 'guan xi'—good relations with influential people—to get the best advice and also some help in sealing the deal. Investors and providers are heavily networked, and you should link in to these
networks right up to the last minute, as the landscape changes quickly.

At this early stage, biotechs can afford to be cautious and methodical in their approach, as limited vendor capacity is not currently an issue. Over time, vendor capacity should grow to keep pace with demand, with perhaps more of a focus toward smaller biotechs as the sourcing market develops. That said, the earlier you take the plunge, the sooner you can reap the cost savings and the better your chances of accessing proven and established vendors.

Looking ahead

The virtue of the sourcing option goes beyond cost and time efficiencies. Biotech talent and drive are increasingly abundant in China and India, and innovative ideas, which can't be far off, will be equally amenable to tapping. After all, the governments of the two countries aren't investing in biotech to create sourcing opportunities but to establish vigorous high-tech industries of their own. Specific areas of China and India represent rapidly growing clusters of biopharma
expertise and may ultimately be as important to biotech as San Diego or the Bay Area. You need only look at the innovation emerging from the Taiwanese computer industry to see the parallels with Indian and Chinese biotech, and the pattern of success the two countries seem set to emulate. Small Western biotechs with large ambitions and a taste for adventure can get in at the ground floor and harness Asian innovation, rather than simply offshoring their own.

One other possibility that India and China are opening up is a new model of biotech product development (and perhaps of manufacturing, too). Call it the 'modular model': a kind of decentralized R&D system in which different aspects of R&D are distributed globally and conducted almost autonomously in different locations.

But you don't have to look that far ahead. The opportunities in China and India are rapidly developing, with key pieces falling into place. Weigh the options carefully, delve into the realities and risks of operating within the two countries and decide whether you
want to enter. If you do, devise a precise and methodical strategy, find the right partners and implement the strategy with full commitment. With the right strategy, you stand to give your biotech SME a productivity boost and a handsome competitive advantage.

ACKNOWLEDGMENTS
While most of the material in this article derives from client work, it is backed by a detailed survey conducted by the Boston Consulting Group in 2005 and 2006, collating the views and experiences of executives at over 90 vendors in China and India, of officers at several government research institutes in the two countries, and of senior executives at over ten biopharma MNCs operating there. A report summarizing the findings from this study (Looking Eastward: Tapping China and India To Reinvigorate the Global Biopharmaceutical Industry, August 2006), along with other publications on the opportunities for biopharma R&D in India and China, can be found at http://www.bcg.com.

This story was reprinted with some modification from the Building a Business section of the Bioentrepreneur web portal (http://www.nature.com/bioent), 25 July 2006, doi:10.1038/bioent910.
CORRESPONDENCE
How to stay out of a BIND

To the editor: Your very sympathetic editorial in the February issue (Nat. Biotechnol. 23, 215, 2006) regarding the demise of the Biomolecular Interaction Network Database (BIND) assigns the blame for this resource's passing to "...bureaucratic delays [and] government fiscal nitpicking..." and calls on science funding agencies to provide more long-term funding for databases. Worthy as your crusade to better direct my tax dollars may be, I don't find BIND to be a particularly suitable poster child for the effort.

According to your account, BIND, via the Blueprint Initiative, burned through $25 million in about two years. Even in Canadian dollars, that burn rate is nothing short of shocking, especially given BIND's relatively modest scope and the ease with which its data were to be 'scraped' from a relatively small number of scientific publications (I have quite a bit of professional experience in this domain, so I say this with some insight). Personally, I admire Genome Canada's decision to stop the bleeding.

I'm sure there were, and are, those who have found BIND useful. Whether it was another $20.8 million worth of 'useful', or a total of $46 million worth, given all the other worthy scientific uses to which that sum could be put, was the question, and Genome Canada decided it in the negative, citing concerns regarding management, budget justification and financial plan—concerns your editorial brushed aside without comment.

A happy consequence of Genome Canada's decision is that BIND is now where many such efforts belong: in private hands (albeit under the same management), where the rigors of the marketplace can impose upon its owners some deep regard for efficiency and utility. If BIND is truly valuable, then
Christopher Hogue can charge users a modest access fee; perhaps research funding agencies will view their grantees' carefully justified requests for these small sums with favor. He may then use such hard-won revenues prudently to sustain and improve the product. If, on the other hand, BIND isn't a particularly important resource, then users won't be willing to pay, and it will pass on. This is as it should be.

Much the same may be said for the Alliance for Cellular Signaling's Molecule Pages, which never really amounted to much (numerically, at least). Now under Nature Publishing Group's cost- and profit-conscious guidance, they will, no doubt, either flourish or fold.

Rather than arguing for the importance
of long-term database funding by granting agencies, BIND's saga in fact argues for greater caution and more demanding oversight when these agencies elect to fund a database's initial development. Realistic plans for long-term sustainability must be demanded, as must some basic enterprise management ability on the grant recipient's part. Such expectations are anything but fiscal nitpicking; they are a fiduciary responsibility.

I have no bone to pick with researchers who bemoan the intermingling of capitalism and scientific research (if, in this Bayh-Dole era, there's anyone left who can still do so with a straight face). But those who feel this way should be prepared to make every precious tax dollar go as far as it possibly can. Those who fail at this should be quicker to blame themselves, and slower to blame 'bureaucrats'.

William B Busa, Busa Consulting, 201 Johns Schools Road, Renfrew, Pennsylvania 16053, USA. e-mail:
[email protected]
The dog as a cancer model

To the editor: The dog has long been used as a model in drug discovery and development research because of its similarities to human anatomy and physiology, particularly with respect to the cardiovascular, urogenital, nervous and musculoskeletal systems. Compared with other animal models, it may also prove invaluable in research and development on cancer drugs, because dogs naturally develop cancers that share many characteristics with human malignancies. The completion of a high-coverage (7.5×) canine genome sequence1 now paves the way for the development of critical resources that will allow the integration of naturally occurring canine cancers within the mainstream of cancer research. To initiate and facilitate collaborative efforts and leverage the opportunities provided by the dog in
cancer research, scientific and clinical leaders from both human and veterinary oncology have come together to form a multidisciplinary consortium, the Canine Comparative Oncology and Genomics Consortium (CCOGC).

Cancers in pet dogs are characterized by tumor growth over long periods of time in the setting of an intact immune system, interindividual and intratumoral heterogeneity, the development of recurrent or resistant disease, and metastasis to relevant distant sites. In these ways, dog cancers capture the 'essence' of the problem of human cancer in a manner not possible with other animal model systems. Compared with other large animals commonly used in biomedical research, such as pigs and nonhuman primates, an additional advantage offered by pet dogs is that they are cared for into the ages
commonly associated with the highest risk for cancer. This risk, coupled with their large population size (>70 million in the United States), results in a cancer rate sufficient to power clinical trials, including assessment of new drugs. Using crude estimates of cancer incidence, in the United States alone there are ~4 million new cancer diagnoses made each year in dogs2. Examples of these cancers include non-Hodgkin lymphoma, osteosarcoma, melanoma, prostate carcinoma, lung carcinoma, head and neck carcinoma, mammary carcinoma and soft-tissue sarcoma. For many of these cancers, strong similarities to human cancers are seen, including histological appearance, tumor genetics, biological behavior and response to conventional therapies. The compressed course of cancer progression seen in dogs allows timely assessment of new cancer therapies.

With the recent release of the canine genome sequence, the dog is now also amenable to comparative genomic analysis. Indeed, preliminary assessment of the canine genome suggests that the dog and human lineages are more similar than the human and rodent lineages in terms of both nucleotide divergence and rearrangements.

The CCOGC initially plans to take advantage of these opportunities through the following actions:

• Develop a robust and well-annotated biospecimen repository of canine cancers and tissues—funding of a large, accessible biospecimen repository is difficult using existing resources.

• Improve opportunities to link the efforts of veterinary and comparative oncologists with the work of basic cancer researchers and clinicians.

• Initiate non-clinical trials using pet dogs with cancers that are integrated into the development path of new cancer drugs. Mechanisms for review of these non-clinical trials by regulatory bodies should be developed such that information from these studies, where appropriate, may help to focus the scope of early human clinical trials.

To date, non-clinical studies in dogs with cancer have answered questions that would have been difficult or impossible to answer in either mice or humans. The lack of gold-standard veterinary treatments also provides the opportunity for the early and humane evaluation of new therapies for dogs with
cancer. Following institutional review of trials, pet owners would be given the option to enter their dogs into clinical trials and in so doing receive access to novel cutting-edge treatment options for cancer, many of which are less toxic than conventional treatment options currently available. Accordingly, studies in pet dogs offer opportunities in both human and animal healthcare.

First, pet dog trials will help better define the safety and activity of new anticancer agents. They may also assist in the identification of relevant biomarkers associated with response or exposure to these drugs. Furthermore, these studies may allow rational development of combination strategies that will improve the success of these new drugs in human clinical trials. These data may be useful before the filing of an investigational new drug application (IND) at the US Food and Drug Administration (FDA; Rockville, MD) and as a means to optimize the development of anticancer agents currently in early human trials.

Second, data generated through such studies may inform the development of new cancer treatments for animals. Research and development of new anticancer treatments is increasingly recognized as an area of need in the field of animal health. In this way, pet dogs with cancer will be directly helped through access to these new drugs; results may be translated and extended to the development of better cancer drugs for humans and other pet dogs.

An opportunity window now exists. With the realization of the need for more useful animal models in human cancer drug development, the organization of a number of consortia and collective groups, the completion of the canine genome sequence, the increasing availability of dog-specific biological reagents and investigative methodologies (e.g., antibodies specific for dog proteins or dog-specific oligonucleotide arrays) and the interest of the animal health biotech and drug industry, the CCOGC hopes to further stimulate efforts to fully exploit the many advantages of the dog in cancer drug research.

The 2.4-billion-bp (7.5× coverage) sequence of a female boxer dog (pictured), published in December 2005 (ref. 1), together with that of a poodle released in 2003, should facilitate the use of dogs in cancer studies.
Chand Khanna1, Kerstin Lindblad-Toh2, David Vail3, Cheryl London4, Philip Bergman5, Lisa Barber6, Matthew Breen7, Barbara Kitchell8, Elizabeth McNeil9, Jaime F Modiano10, Steven Niemi11, Kenine E Comstock12, Elaine Ostrander13, Susan Westmoreland11 & Stephen Withrow3

1Comparative Oncology Program, Center for
Cancer Research, National Cancer Institute, 9610 Medical Center Drive, Room 315, Rockville, Maryland 20815, USA. 2Broad Institute of Harvard and Massachusetts Institute of Technology, 320 Charles Street, Cambridge, Massachusetts 02141, USA. 3Animal Cancer Center, Colorado State University, Fort Collins, Colorado 80523, USA. 4Department of Veterinary Biosciences, The Ohio State University, Columbus, Ohio 43210, USA. 5The Animal Medical Center, New York, New York 10021, USA. 6Department of Clinical Sciences, Tufts University School of Veterinary Medicine, North Grafton, Massachusetts 01536, USA. 7Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina 27606, USA. 8Center for Comparative Oncology, Michigan State University, East Lansing, Michigan 48824, USA. 9Department of Veterinary Clinical Sciences, University of Minnesota, St. Paul, Minnesota 55108, USA. 10Integrated Department of Immunology and AMC Cancer Research Center, University of Colorado at Denver and Health Sciences Center, Denver, Colorado 80214, USA. 11Center for Comparative Medicine, Massachusetts General Hospital, Charlestown, Massachusetts 02129, USA. 12University of Michigan, 5111 Cancer Center, Ann Arbor, Michigan 48109, USA. 13National Human Genome Research Institute, National Institutes of Health, 50 South Drive, MSC 8000, Building 50, Bethesda, Maryland 20892-8000, USA. e-mail:
[email protected] or
[email protected]

1. Lindblad-Toh, K. et al. Nature 438, 803–819 (2005).
2. Vail, D.M. & MacEwen, E.G. Cancer Invest. 18, 781–792 (2000).
GM sterile mosquitoes—a cautionary note

To the editor: The article in your November issue by Andrea Crisanti and colleagues (Nat. Biotechnol. 23, 1414–1417, 2005) reported the development of a transgenic strain of Anopheles stephensi, an Asian malaria vector, that the authors suggested may be useful as a sexing strain in a sterile insect technique (SIT) program against this vector. The SIT relies on the release of massive numbers of sterilized male mosquitoes to reduce the reproductive capacity of wild populations that transmit malaria1–4. Sterile females can still transmit disease, hence the need for efficient sex separation systems. It is beyond doubt, therefore, that this new methodology addresses an important need of mosquito SIT programs currently under development.

As can be seen from the data, the use of this sexing method under experimental small-scale conditions was successful, but we do wish to respond to the suggestion that this methodology, and even this strain, can immediately be transferred to a large-scale SIT program. On the basis of our experience with the development of comparable systems in other species, we expect that strain evaluations will have to be extremely thorough and carried out under appropriate conditions before it will be possible to judge whether a strain or particular sexing procedure is suitable for use in mosquito control programs integrating the SIT. These strains will have to be reared at high levels of production and for an extended period of time before sufficiently reliable and realistic data on overall fitness, the accuracy and efficiency of the sexing procedure and the stability of the sexing system become available. In addition, the field performance of these strains will need to be evaluated. All these data will be used by decision makers to weigh any potential negative characteristics of the strains against the benefits they provide, and only then can a judgment on the suitability of a particular strain for inclusion in an SIT program be made.

Radiation-induced sterility provides
some level of risk mitigation when transgenic insects are released, and this approach has been proposed for a first evaluation of the use of this technology5. In operational programs, where insect competitiveness is a key factor for success, there is currently a trend to reduce the radiation dose to a level that maximizes sterility induction in the wild population. In the case of transgenic strains, however, this level will depend on regulatory requirements and the type of strain that is being released; for example, what type of transgene is used, in combination with what operational strategy (that is, eradication versus suppression). It is conceivable that a lower dose chosen for a conventional strain would not be appropriate for a transgenic strain.

More troubling is the perception, fuelled by comments made in the press, that any efficient transgenic sexing strain can be easily incorporated into a mosquito SIT program without much further consideration. Although release of sterile males for mosquito control has been practiced in the past, direct inclusion of modern biotechnological approaches such as transgenesis should not be taken for granted. This technology needs to be considered systemically and holistically and be integrated into a broader social context6—a notion that larger development agencies like the World Bank (Washington, DC) have recognized for many years, but which still appears to have eluded some scientists and funding bodies.

Mosquito genetic control specialists have been discussing the merits and limitations of modern biotech for over five years from molecular genetic7, ecological8 and transitional9 perspectives. It is evident from these discussions that only when the benefits are judged to outweigh the publicly perceived risk of the technology10 will the release of genetically modified (GM) mosquitoes become a reality. Thus, it is imperative that important stakeholders, in particular end-beneficiaries, participate in the scientific and development process11. If not, millions will be poured into technologies that are
not acceptable or feasible, betraying those most in need. A participative, iterative and strategic approach to malaria control is necessary to cope with the intrinsic uncertainties of the interventions and changes in the 'environment'12. The inclusion of ethical, legal and social aspects in this debate has been rudimentary at best. Although it is argued that the use of GM insects for disease control is still in its infancy, we contend that several negative developments in the field of GM organisms may seriously impede the future applicability of this approach. Given the intricacies of stakeholder management, even in developed parts of the world13, we propose a three-pronged strategy to anticipate potential antagonism.

The first and most critical step will be to gain public support. The establishment of trust through openness and direct involvement of stakeholders, including public authorities and the press, in the decision-making process will be critical. Failure in this regard could result in the polarization of viewpoints and scaremongering; indeed, in India in the 1970s, claims of biological warfare in the press led to the abandonment of a World Health Organisation (Geneva, Switzerland)-funded mosquito genetic control program just two days before the start of releases and after several years of research and development14. To prevent history from repeating itself, the establishment of equitable partnerships with scientists in disease-endemic countries, combined with the transfer of 'problem ownership', is necessary. Scientific funding agencies should appreciate the complexity of such issues and the resulting need to communicate through means other than the peer-review process, as rationality and reductionism are embedded in the scientific method and culture15 and are not necessarily the perspectives required to tackle complexity. This would lead to a research agenda also driven by developing nations.

A second need relates to oversight. The search for potential field sites to release transgenic mosquitoes is currently proceeding, backed by hastily established and cosmetic partnerships with scientists and institutions in situ. It follows that in the absence of any governance over this process and research progression in years to come, serious problems may develop. Inasmuch as
developing countries are actively developing policy to engage with GM crops, there is very little going on in terms of GM insects, which, for the record, will ignore national boundaries. An international entity with broad, adaptive and adequate representation is therefore urgently called for. Given the right mandate, it could safeguard against uncontrolled expansion of activities while serving as a shield against antagonistic influences through active stakeholder engagement.

Finally, following the foregoing multiple-perspective debates on GM mosquitoes, we propose the rapid initiation of an international gathering to start addressing the complexity of the ethical, legal and social aspects of GM mosquitoes for disease control, a process that should already have taken place16,17.

We conclude that contrary to there being a 'green light for mosquito control,' as announced in your journal18, research on SIT using transgenic insects has, for now at least, stalled at a yellow light.

Bart G J Knols1, Rebecca C Hood-Nowotny1, Hervé Bossin1, Gerald Franz1, Alan Robinson1, Wolfgang R Mukabana2 & Samuel K Kemboi2

1Entomology Unit, FAO/IAEA Agriculture and Biotechnology Laboratory, A-2444 Seibersdorf, Austria. 2University of Nairobi, P.O. Box 29053, Nairobi, Kenya. e-mail:
[email protected]
1. Dyck, V.A., Hendrichs, J. & Robinson, A.S. (eds.) The Sterile Insect Technique: Principles and Practice in Area-Wide Integrated Pest Management (Springer, Heidelberg, 2005).
2. Catteruccia, F. et al. Science 299, 1225–1227 (2003).
3. Andreasen, M. & Curtis, C.F. Med. Vet. Entomol. 19, 238–244 (2005).
4. Franz, G. Genetica 116, 73–84 (2002).
5. Benedict, M. & Robinson, A.S. Trends Parasitol. 19, 349–355 (2003).
6. Scott, T.W., Takken, W., Knols, B.G.J. & Boëte, C. Science 298, 117–119 (2002).
7. Alphey, L. et al. Science 298, 119–121 (2002).
8. Takken, W. & Scott, T.W. (eds.) Ecological Aspects for Application of Genetically Modified Mosquitoes (Kluwer Academic Publishers, Dordrecht, The Netherlands, 2005).
9. Knols, B.G.J. & Louis, C. (eds.) Bridging Laboratory and Field Research for Genetic Control of Disease Vectors (Springer, Berlin, 2005).
10. The Royal Society. Risk Analysis, Perception and Management. Report of the Royal Society Study Group (The Royal Society, London, 1992).
11. Wynne, B. Global Environ. Change 2, 111–127 (1992).
12. Rondinelli, D. Development Projects as Policy Experiments (Routledge, London & New York, 1993).
13. Lusk, J.L. & Rozan, A. Trends Biotechnol. 23, 386–387 (2005).
14. World Health Organisation. WHO Chronicle 30, 131–139 (1976).
15. Ison, R.L. Rangeland J. 15, 154–166 (1993).
16. Macer, D. Ethical, Legal and Social Issues of Genetically Modified Disease Vectors in Public Health. TDR/STR/SEB/ST/03.1 (World Health Organisation, Geneva, Switzerland, 2003).
17. Touré, Y.T. & Knols, B.G.J. in Genetically Modified Mosquitoes for Malaria Control (Boëte, C., ed.) (Landes Bioscience, Georgetown, Texas, USA, in the press, 2006).
18. Atkinson, P. Nat. Biotechnol. 23, 1371–1372 (2005).
Peter Atkinson responds: Knols et al. draw attention to two important points: that any new genetic strain developed for use in the sterile insect technique must undergo rigorous testing to ensure that it meets the necessary quality control standards required for the successful application of this technique; and that there must be full consultation with the public, stakeholders and any other interested parties before transgenic
strains can be released. These self-evident facts are not in dispute; rather, the advance reported by Crisanti and colleagues in Nature Biotechnology illustrates that recombinant techniques are now generating genetic strains that may be appropriate for assessment and, pending the outcome, deployment in insect genetic control programs. The application of these developments does need to be openly discussed in the type of forum outlined by Knols et al. and, toward this goal, preliminary workshops on this topic have already been convened1.

1. Takken, W. & Scott, T.W. (eds.) Ecological Aspects for Application of Genetically Modified Mosquitoes. Reports from a Workshop held at Wageningen University and Research Center, June 2002 (Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003).
Sequencing errors or SNPs at splice-acceptor guanines in dbSNP?

To the editor: Single-nucleotide polymorphisms (SNPs) are the most frequent type of human genetic variation. They are the major basis of our phenotypic individuality, particularly with respect to heritable differences in disease susceptibility. Large collections of mapped SNPs, public and private, are powerful tools for genetic studies1. The most comprehensive public SNP database, dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP), currently contains more than 12 million human SNPs (version 126). This wealth of data is extensively used by a broad community, including clinical, experimental and computational scientists, for both locus-specific and genome-wide studies. Therefore, the quality and completeness of dbSNP are of paramount importance, and a recent meta-analysis of four confirmation studies estimated a false-positive rate of ~15–17%2.

As we have an interest in alternative splicing in general3 and with respect to diseases in particular4, we searched dbSNP for human variations in a nine-nucleotide context (three exon and six intron positions) of all splice-donor/acceptor sites of mRNA RefSeqs. Contrary to our expectation for the highly conserved intron positions +1, +2 (donor) and –2, –1 (acceptor), the acceptor G at –1 showed a variability comparable to that of the random position –4 (Fig. 1a). As the disruption of the G at –1 normally results in the loss of the
acceptor site5, we questioned whether this surprising variability could be compensated by any known biological process (for example, RNA editing) or is an indication of a yet unknown biological phenomenon.

As we could not shape a plausible explanation for our observation, and before we considered undertaking a challenging, lengthy and potentially fruitless search for an unknown biological mechanism, we decided next to evaluate the possibility that false-positive entries in dbSNP account for the inexplicable variability of position –1. To this end, we first used the dbSNP validation status description and classified the RefSNPs (dbSNP entries) into three categories: (C1) validated by frequency or genotype data from HapMap6 or any other submitter; (C2) validated by independent submissions, observation of the minor allele in at least two chromosomes or submitter confirmation; and (C3) single submission without confirmation. Conspicuously, position –1 showed the highest fraction in C3 (305 of 364, 84%; Fig. 1b).

As experimental verification of RefSNPs depends on the availability of appropriate population samples and assays, it was not feasible for us to carry out such a study on a large scale. Therefore, we switched to a verification procedure making use of the electropherograms derived from automatic fluorescence-based DNA sequencing instruments (traces).
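To make the bookkeeping concrete, the following minimal sketch tallies RefSNPs per acceptor-window position and per validation category, which is essentially the counting behind Figure 1a,b. It is illustrative only: the record fields and the plus-strand coordinate convention are hypothetical simplifications, not the authors' actual pipeline or dbSNP's real flag vocabulary.

```python
# Purely illustrative sketch (hypothetical record layout; not the authors'
# actual pipeline): bin RefSNPs near splice acceptors by window position
# and by the C1/C2/C3 validation categories defined above.
from collections import Counter

def classify_refsnp(rec: dict) -> str:
    """C1 > C2 > C3, by descending strength of dbSNP validation evidence."""
    if rec.get("frequency_data") or rec.get("genotype_data"):
        return "C1"  # frequency/genotype validation (e.g., from HapMap)
    if (rec.get("independent_submissions", 1) >= 2
            or rec.get("minor_allele_in_two_chromosomes")
            or rec.get("submitter_confirmed")):
        return "C2"  # independently corroborated
    return "C3"      # single submission, unconfirmed

def acceptor_window(first_exon_base: int) -> dict:
    """Map window labels to genomic coordinates around one acceptor site.

    Labels -6..-1 are intronic (-1 is the invariant acceptor G, -2 the A);
    labels +1..+3 are the first three exonic bases (plus-strand simplification).
    """
    win = {-k: first_exon_base - k for k in range(1, 7)}      # intronic
    win.update({+k: first_exon_base + k - 1 for k in range(1, 4)})  # exonic
    return win

def tally(acceptors, refsnps):
    """Count RefSNPs per (window position, validation category)."""
    by_coord = {rec["coord"]: rec for rec in refsnps}
    counts = Counter()
    for e in acceptors:
        for label, pos in acceptor_window(e).items():
            if pos in by_coord:
                counts[(label, classify_refsnp(by_coord[pos]))] += 1
    return counts

# Toy data: one acceptor whose first exon base is at coordinate 100; an
# unconfirmed RefSNP at the -1 G (coordinate 99) and a HapMap-validated
# one at the random intronic position -4 (coordinate 96).
snps = [{"coord": 99}, {"coord": 96, "genotype_data": True}]
print(tally([100], snps))  # Counter({(-1, 'C3'): 1, (-4, 'C1'): 1})
```

A real implementation would additionally have to handle strand orientation, per-chromosome coordinates and the validation-status flags actually exported by the dbSNP release in use.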
Figure 1 RefSNPs and sequence confidence. (a) Apparent hypervariability at splice-acceptor Gs (number of RefSNPs at each donor and acceptor intron-border position). (b) Classification of RefSNPs at the splice acceptors according to their validation status (C1–C3). (c) Electropherograms (traces) illustrating the 'G after A' problem at splice-acceptor sites in the 5′-to-3′ sequencing direction (examples: rs6089914, rs12039312). (d,e) Sequence-confidence (Phred) values of trace-data-supported RefSNPs (d) classified according to their validation status and of G/H RefSNPs (e) classified according to the 5′ nucleotide; numbers in d and e are percentages.
Currently, 76% of all RefSNPs are supplied with trace references and for nearly 60% these data are accessible via the US National Center for Biotechnology Information (NCBI) Trace Archive (http://www.ncbi.nlm.nih.gov/Traces; Supplementary Notes). We manually examined the available traces for RefSNPs at acceptor positions –2, –1 and +1 and collected false-positive entries, which we classified as sequencing errors (wrong base calling due to low signal-to-noise ratio) and database errors (identity of the genomic RefSeq and the trace-supported RefSNP allele, or ambiguous alignment in microsatellites). Sequencing errors were mainly detected among C3 RefSNPs that are solely based on single-pass trace data. Database errors occurred both in C2 and C3 RefSNPs independently of their trace coverage (single trace, multiple traces of the same strand, traces from both strands; Supplementary Notes online). The astonishing error rate of 93% among 181 RefSNPs with trace data at acceptor position –1 was exclusively caused by the well-known suppression of G after A incorporation by thermostable, genetically engineered DNA polymerases in dye terminator sequencing reactions7 (Fig. 1c). Naturally, this problem occurs at acceptor sites only in forward (5′-to-3′) traces because the AG is CT in the reverse sequencing direction. Moreover, the ‘G after A’ problem is further enhanced by the polypyrimidine tract preceding the acceptor AG in the splice consensus8. Homopolymer stretches of T and C are known to cause problems with sequence accuracy as a result of polymerase slippage9, thus leading to elevated error rates not only at position –1 but also at –2 and +1. Altogether, we estimated false-positive rates at acceptor positions –2, –1 and +1 of 17%, 82% and 11%, respectively (Supplementary Tables 1–3 online). Excluding the estimated false positives, no significant difference in variability between acceptor positions –1 and –2 remains. Thus, we conclude that a systematic sequencing error (‘suppressed G after A’) and not a previously unknown biological phenomenon causes the high
(Figure 1, panels b–e: bar charts of RefSNP counts by validation category, example traces rs6089914 and rs12039312, and Phred-value distributions; see legend above.)
frequency of RefSNPs in splice-acceptor position –1. Sensitized by this analysis, we then asked to what extent dbSNP contains sequencing errors in general. First, a scan of all RefSNPs for the sequence confidence of the allele alternative to the genomic RefSeq confirmed our initial observation that false positives are very likely enriched among C3 entries (18% with a Phred confidence value <30; because the Phred value is defined as Q = –10 log10 P(error), this corresponds to an error probability of more than 1 in 1,000 (ref. 10)) and comparably rare among C1 and C2 entries (Fig. 1d; Supplementary Notes online). Moreover, the ‘suppressed G after A’ problem is not restricted to acceptor sites: among all G/H (genomic RefSeq allele/non-RefSeq allele, where H stands for A, C or T) C3 RefSNPs with traces, the fraction of low-confidence entries among A(G/H) variations is twice as large as for the remaining contexts (Fig. 1e; Supplementary Notes online). For a concluding estimate of sequencing errors in dbSNP, we selected a
set of 10,000 random SNPs and manually examined representative trace sets for all possible N(N/N)N contexts (where N is any nucleotide). Along with the expected A(G/H)N, the C(A/Y)C and G(A/C)C contexts also showed false-positive rates >10%. Altogether, we estimated that there are about 256,000 sequencing and 124,000 database errors, representing 3.2% and 1.5% of all RefSNPs. Among sequencing errors, the vast majority (85%) are caused by the ‘suppressed G after A’ problem. Most interestingly, some of the false RefSNPs were investigated in the HapMap project6 (Supplementary Tables 1–3 online) and, as expected, did not show any variation in any of the genotyped populations. The described error rates in dbSNP might both introduce serious biases in large-scale bioinformatic studies and misdirect experimental efforts, particularly if a special sequence context such as the acceptor AG is considered. Therefore,
we emphatically recommend that all users of dbSNP refer to the ‘validation status’ tag and use a simple SNP classification scheme, as described above, that aims at extracting RefSNPs with lower error rates. According to our classification, dbSNP (version 124) contains 2,077,680 entries in C1, 2,946,840 in C2 and 3,470,166 in C3. To investigate the differences between these three classes, we extracted the available confidence information. C1 and C2 RefSNPs have higher average values (both 51.4) than SNPs in C3 (43.2; Supplementary Notes online). Furthermore, about 87% in C1 and C2 have confidence values of at least 40, in contrast to only 63% in C3 (Fig. 1d). As a low confidence value indicates a potential sequencing error, we recommend that bioinformatic and/or experimental efforts either use only C1 and C2 RefSNPs or exclude from C3 all dbSNP entries with Phred <40 (ref. 11), as in the sketch below.
Note: Supplementary information is available on the Nature Biotechnology website.
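As a concrete rendering of that recommendation, a filter along the following lines would retain C1 and C2 entries and keep C3 entries only above the Phred cutoff. It reuses the hypothetical classify_refsnp helper sketched earlier and assumes each entry carries a Phred confidence for its non-RefSeq allele; the data layout is our assumption, not an actual dbSNP API:

```python
PHRED_CUTOFF = 40  # below this, a C3 entry is a potential sequencing error

def reliable_refsnps(entries):
    """Yield RefSNP entries that pass the classification/Phred filter
    recommended above. Each entry is assumed to be a dict with keys
    'flags' (a set of validation-status strings) and 'phred' (confidence
    of the allele alternative to the genomic RefSeq)."""
    for entry in entries:
        category = classify_refsnp(entry["flags"])
        # C1 and C2 pass unconditionally; C3 only with high confidence
        if category in ("C1", "C2") or entry["phred"] >= PHRED_CUTOFF:
            yield entry
```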
Matthias Platzer1, Michael Hiller2,
Karol Szafranski1, Niels Jahn1, Jochen Hampe3, Stefan Schreiber3, Rolf Backofen2 & Klaus Huse1 1Genome Analysis, Leibniz Institute for Age Research–Fritz Lipmann Institute, Beutenbergstr. 11, 07745, Jena, Germany. 2Institute of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany. 3Institute for Clinical Molecular Biology, Christian-Albrechts-University Kiel, Schittenhelmstr. 12, 24105, Kiel, Germany. e-mail:
[email protected]
1. Kruglyak, L. Nat. Genet. 17, 21–24 (1997). 2. Mitchell, A.A., Zwick, M.E., Chakravarti, A. & Cutler, D.J. Bioinformatics 20, 1022–1032 (2004). 3. Hiller, M. et al. Nat. Genet. 36, 1255–1257 (2004). 4. Valentonyte, R. et al. Nat. Genet. 37, 357–364 (2005). 5. Krawczak, M., Reiss, J. & Cooper, D.N. Hum. Genet. 90, 41–54 (1992). 6. International HapMap Consortium. Nature 426, 789–796 (2003). 7. Korch, C. & Drabkin, H. Genome Res. 9, 588–595 (1999). 8. Stephens, R.M. & Schneider, T.D. J. Mol. Biol. 228, 1124–1136 (1992). 9. Kotlyar, A.B., Borovok, N., Molotsky, T., Fadeev, L. & Gozin, M. Nucleic Acids Res. 33, 525–535 (2005). 10. Ewing, B. & Green, P. Genome Res. 8, 186–194 (1998). 11. Hiller, M. et al. Am. J. Hum. Genet. 78, 291–302 (2006).
Data integration gets ‘Sloppy’ To the editor: Data integration in life sciences currently faces a conundrum1–4. On the one hand, the diversity of data is increasing as explosively as its volume. This makes it imperative that some degree of data formatting standardization
is agreed upon by the diverse community generating and using that data. On the other hand, the value of individual data sets can only be appreciated when enough of those distinct pieces of the systemic puzzle are put together. Therefore, it is also imperative that
standard formats not be enforced so strictly as to be an obstacle to reporting the very novel data that brings value to the targeted systemic integration. We present here a prototype application, termed Simple Sloppy Semantic Database (S3DB), that provides a bridge between loosely structured raw data annotated using personal ontologies and a globally referenceable semantic representation indexed to controlled vocabularies. Wide adoption of this database formalism has the potential to facilitate and optimize data management in a range of research fields, from molecular epidemiology to basic biology. For most types of biological data, the agreed-upon communal format has a complexity that is far from trivial and requires specialized converters that were not available when the analytical method was first developed. For example, an agreed-upon Minimum Information about Microarray Experiments (MIAME) standard was defined in 2001 (ref. 5), but the jury is still out for much older and widely used techniques such as gel-based proteomics (for example, see ref. 6). Even when, after much consultation, a community standard emerges, the rigidity of minimal descriptions eventually becomes insufficient for standalone reposition7. Like many others before us, we have reached the conclusion that complementary efforts in proteomics8, transcriptomics9 and genomics10 can only be integrated in a common representation within a semantic framework2,11. We have specifically argued2 for the need to migrate to RDF (Resource Description Framework) from the more widely used XML (Extensible Markup Language) hierarchies or relational structures, a view also espoused by the World Wide Web Consortium Life Sciences interest group (http://www.w3.org/2001/sw/hcls/). However, that formalism is cumbersome for configuring information management systems and trades human intuitiveness for machine-processable expressiveness. This combination of implementation and interface challenges typically loses the very contribution that is needed to put the systemic puzzle together: that of the ‘biology domain’ expert.
Figure 1 Example of a S3DB application. The indexing scheme is described by the table in the upper left, where the connecting lines identify the three clauses, (a)–(c), verified by the validation engine for a new statement. Three snapshots of the S3DB application for the example discussed in the text are displayed: directed graph depiction of the rules (1), validation log for submission of a literal (nuclear data element such as ‘30 years’) (2) and validation log for the association of two resources (3).
A bridge between poorly structured raw data annotated to personal ontologies and a globally referenceable semantic representation indexed to controlled vocabularies is thus needed. Such a bridge should raise no obstacles to data submission and should instead allow the incremental editing of the underlying data model without compromising the data already submitted. It should also be deployable as a web-server application such that collaborating users can share a common repository independently of their location. Finally, it should allow referencing to external controlled vocabularies and should be exportable as RDF2. With this in mind, we have developed a prototype application, S3DB, that incorporates all these characteristics. The proposed implementation of editable semantic reposition relies on a relational backbone made of three tables: rules, statements and resources. The concept driving the configuration of S3DB is purely semantic and relies solely on documenting an entity relationship (ER) model12 of the type [Subject][Property][Object]. The solution that enables editable data model reposition consists of an indexing scheme where the permanence of the indexes, rather than the permanence of the element names, allows renaming and reassociation without loss of content. Although the relational backbone of S3DB consists of three tables, they do not establish a relational model. Instead, their interoperation relies on a validation engine that checks for syntactic and semantic consistency. Data are submitted as statements made of five-element vectors, [Subject][UID][Property][Object][Value], that are verified for the following: (i) that the triple [Subject][Property][Object] exists in the rules table; (ii) that the resource unique index (UID) pair [Subject][UID] exists in the resources table; and (iii) that if [Object] is a resource (if it is declared as having a UID in rules), then [Value] has to be a valid UID for that resource (i.e., the pair [Object][Value] exists in the resources table). This solution mimics locally the sort of evolution of the data model that we expect to achieve for global representations using RDF2. Its workings are illustrated using an example in Figure 1. To assign an age to a patient, the first step is to add that property to the domain of discourse (for example, to include an entry in the rules table saying that people have age as a demographic property; see popup window inset). Subsequently, a statement can be made for an existing patient, UID 115, saying that she is 30 years of age (literal value) and then that a sample, UID 308, was collected.
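A minimal sketch of that validation logic, with in-memory sets standing in for the rules and resources tables (the names, data layout and helper function are our illustration, not the actual S3DB schema or code):

```python
class ValidationError(Exception):
    """Raised when a statement violates one of the three clauses."""

def validate_statement(stmt, rules, resources, resource_objects):
    """Check a five-element statement [Subject][UID][Property][Object][Value].

    `rules` is a set of (subject, property, object) triples; `resources`
    is a set of (entity, uid) pairs; `resource_objects` lists object
    names declared in the rules as having UIDs (i.e., resources rather
    than literals). All of these stand-ins are hypothetical.
    """
    subject, uid, prop, obj, value = stmt
    # (i) the triple [Subject][Property][Object] must exist in rules
    if (subject, prop, obj) not in rules:
        raise ValidationError(f"no rule {subject}->{prop}->{obj}")
    # (ii) the pair [Subject][UID] must exist in the resources table
    if (subject, uid) not in resources:
        raise ValidationError(f"unknown resource {subject}:{uid}")
    # (iii) if [Object] is a resource, [Value] must be a valid UID for it;
    # otherwise [Value] is a literal (e.g., '30 years') and needs no lookup
    if obj in resource_objects and (obj, value) not in resources:
        raise ValidationError(f"{value!r} is not a valid UID for {obj}")

# The example from the text: patient 115 is 30 years old (a literal),
# and sample 308 was collected (an association between two resources).
rules = {("patient", "age", "years"), ("patient", "collected", "sample")}
resources = {("patient", 115), ("sample", 308)}
resource_objects = {"sample"}  # 'years' holds literals; 'sample' has UIDs
validate_statement(("patient", 115, "age", "years", "30 years"),
                   rules, resources, resource_objects)
validate_statement(("patient", 115, "collected", "sample", 308),
                   rules, resources, resource_objects)
```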
Because not just all resources, but also all rules and statements, are uniquely indexed (UID), their contents can subsequently be edited for renaming of resources and rewiring of relations. The result is a conveyor belt of successive editing into more structured and global formats. The S3DB prototype was developed with open-source languages and is made freely available with open source for unrestricted use and modification (http://www.s3db.org).
ACKNOWLEDGEMENTS
This work was partially funded by the National Heart, Lung and Blood Institute of the US National Institutes of Health, under contract no. N01-HV-28181, and the PREVIS project (Pneumococcal Resistance Epidemicity and Virulence, An International Study), contract number LSHM-CT-2003-503413 from the European Commission.
Jonas S Almeida1*,3, Chuming Chen2, Robert Gorlitsky2, Romesh Stanislaus2, Marta Aires-de-Sousa3, Pedro Eleutério3, João Carriço3, António Maretzek3, Andreas Bohn3, Allen Chang1, Fan Zhang4, Rahul Mitra4,5, Gordon B Mills4, Xiaoshu Wang2 & Helena F Deus3
1Department of Biostatistics and Applied Mathematics, University of Texas, 1515 Holcombe Blvd., Box 0447, Houston, Texas 77030-4009, USA. 2Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, South Carolina 29425, USA. 3Instituto de Tecnologia Química e Biológica da Universidade Nova de Lisboa (ITQB/UNL), Av. da República (EAN), 2781-901 Oeiras, Portugal. 4Kleberg Center for Molecular Markers, Department of Molecular Therapeutics, University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Box 0317, Houston, Texas 77030-4009, USA. e-mail:
[email protected]
1. Hey, T. & Trefethen, A.E. Science 308, 817–821 (2005). 2. Wang, X., Gorlitsky, R. & Almeida, J.S. Nat. Biotechnol. 23, 1099–1103 (2005). 3. Buetow, K.H. Science 308, 821–824 (2005). 4. Foster, I. Science 308, 814–817 (2005). 5. Brazma, A. et al. Nat. Genet. 29, 365–371 (2001). 6. Stanislaus, R. et al. BMC Bioinformatics 5, 9 (2004). 7. Shields, R. Trends Genet. 22, 65–66 (2006). 8. Stanislaus, R. et al. Bioinformatics 21, 1754–1757 (2005). 9. Almeida, J.S. et al. Compar. Func. Genomics 6, 132–137 (2005). 10. McKillen, D.J. et al. BMC Genomics 6, 34 (2005). 11. Neumann, E. Sci. STKE 2005, pe22 (2005). 12. Chen, P.P.S. Assoc. Comput. Machinery Trans. Database Syst. 1, 9–36 (1976).
Replacing cRNA targets with cDNA reduces microarray cross-hybridization To the editor: Gene-expression microarrays are designed to measure relative concentrations of transcripts through the specific hybridization of an immobilized DNA probe to its complementary target. This technology is viable to the extent that a single, rather permissive hybridization condition allows most probes to bind specifically to their targets. Despite efforts to maximize stringency, a significant hybridization signal can still be detected on various oligonucleotide-based platforms, even when there are a few mismatches between probe and target1–3. Furthermore, several groups have detected widespread cross-hybridization in microarray measurements4,5, and on the order of 10% of the probes on a common oligonucleotide array platform were predicted to be susceptible to cross-hybridization5. Efforts to optimize probe length found that longer probes enjoy stronger signal intensity but also suffer from increased
propensity toward cross-hybridization6. Therefore, nonspecific binding remains a significant source of measurement error and may be the reason why quantitative reverse transcription (qRT)-PCR fails to confirm about 10–20% of difference calls made by microarray analysis (reviewed in ref. 7). Here, we report that a high level of promiscuity in DNA-RNA hybridization underlies widespread cross-hybridization in microarrays. This cross-hybridization can be reduced by using cDNA targets in place of cRNA. From its inception, microarray technology took advantage of either of two types of biochemical entities as the labeled target, cRNA8 or cDNA9. Although in many respects these two types of labeled target are considered to be equivalent for the purpose of microarray analysis, the use of cRNA has held an important methodological advantage. Because RNA polymerase does not require a primer, it was rather straightforward to design a near-linear
target amplification method, which has been extremely useful in experiments with small amounts of starting RNA8, including most clinical studies. Even so, there are hints in the literature indicating that DNA-RNA hybridization might be less specific than DNA-DNA hybridization. One relevant observation is that in free-solution DNA-RNA hybrids, certain mismatch ‘wobble’ base pairs are more stable than complementary base pairs10. This effect is, however, absent in DNA-DNA hybrids11,12, indicating that targets made of DNA might be a more specific alternative to standard RNA targets. Although free-solution studies cannot be automatically extrapolated to the microarray process (microarray hybridizations are not in free solution, are not in equilibrium13 and are burdened with labeling moieties), we postulated that using cDNA instead of cRNA targets may reduce cross-hybridization on microarrays. Further stimulus for testing this hypothesis was provided by the recent development of a novel target-preparation method using a chimeric DNA-RNA primer and isothermal DNA amplification to generate cDNA that is subsequently labeled for use on standard oligo microarrays14. This method provides near-linear amplification with labeled cDNA as the end product. To compare the extent of cross-hybridization using cRNA and cDNA targets, we used a baseline RNA specimen from the human T-cell leukemia–derived Jurkat cell line with or without an added set of ‘spikes’ comprising hemoglobin transcripts HbA1, HbA2 and HbB. The comparative microarray analysis of these two samples is expected to identify all probes affected by cross-hybridization with the spikes. Therefore, the two RNA samples were processed in triplicate, either with a standard one-round T7 in vitro transcription protocol to produce cRNA target or with the linear isothermal protocol15 to produce cDNA target. Each labeled sample was hybridized to Affymetrix (Santa Clara, CA, USA) U133A 2.0 microarrays containing 22,277 probe sets, nine of which were intended to detect the spike-in transcripts (for further details, see Supplementary Methods online; raw data from the 12 arrays are available from GEO, accession no. GSE4532). These nine probe sets were indeed the most differentially expressed, regardless of the type of target used (Fig. 1a,b). Along with these true changes, there were other probe sets that seemed to have lesser, but
Figure 1 Comparison of microarray data from Jurkat-cell RNA with and without spiked transcripts. (a,b) At the probe-set level, cRNA target (a) detected more false changes than did cDNA target (b). Here, a probe set was considered changed if it had present calls in at least three of six samples, twofold change in mean expression and Student’s t-test P < 0.05. (c,d) At the individual probe level, cRNA target (c) was more likely than cDNA target (d) to hybridize with off-target probes with some complementarity, as determined with BLAST. For reference, the score for an alignment with no mismatches is approximately equal to twice the number of identical bases.
still substantial, differential expression. Using simple criteria, we detected 791 false changes using cRNA and 19 false changes using cDNA (Fig. 1a,b). The Affymetrix probe set comprises multiple individual probes, each with a different sequence. As expected, we found that all probes with a sequence exactly matching those of the spikes were increased using either type of target. Additionally, many off-target probes were increased or decreased, and this effect was substantially greater with cRNA than with cDNA (Fig. 1c,d). In general, fold change of a probe corresponded with maximum sequence similarity between that probe and any of the three spikes (Fig. 1c,d). Thus, it seems that cDNA target is better able to discriminate between the correct probe and similar but incorrect probes. It is unlikely that the increased specificity using cDNA targets instead of cRNA targets
comes at the price of decreased sensitivity. First, the percentage of probe sets called ‘present’ was very similar in the baseline sample (cRNA range 60.5–61.2; cDNA range 60.4–61.4) and greater with cDNA than with cRNA after addition of spikes (cRNA range 51.0–53.0; cDNA range 59.3–60.2). Also, in a recent paper, Barker et al.16 compare the same cDNA- and cRNA-based protocols for their ability to identify differentially expressed genes in the same pair of RNA aliquots. To validate differential gene-expression calls, they ran independent qRT-PCR-based measurements on 106 genes whose expression varied by several orders of magnitude. Across the entire concentration range of independently quantified genes, measurements using cDNA targets were better correlated with qRT-PCR than were measurements using cRNA targets16. To assess microarray specificity, we chose to compare two RNA samples
that differ only by a set of three high-concentration spikes. It seems unlikely that a physiologically relevant experiment would involve gene-level changes of this magnitude, so our results are likely to indicate an upper limit of the effects of cross-hybridization. However, in an experiment in which many genes change levels, we expect that cross-hybridization could affect a larger number of probes, albeit to a lesser degree. As indicated by our results, the use of multiple probes against the same transcript in Affymetrix gene chips may partially compensate for the effect of a few nonspecific probes within a given probe set. Nonetheless, there is a general tendency even on this platform to reduce the number of probes per transcript. For example, the recently introduced exon arrays from Affymetrix contain only four probes per exon. Interestingly, DNA rather than RNA target is used in this type of microarray, which may explain why four probes per probe set are sufficient. Other widely used microarray platforms, such as those from Agilent (Palo Alto, CA, USA), GE Healthcare (Little Chalfont, UK), Illumina (San Diego, CA, USA) and Applied Biosystems (Foster City, CA, USA), generally use a single probe per transcript; therefore, when a probe is prone to cross-hybridization, its false signals cannot be corrected by other probes. We thus conclude that replacing cRNA with cDNA seems to provide a simple way to eliminate a significant portion of undesirable nonspecific signals.
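For reference, the ‘simple criteria’ used for the change calls in Figure 1a,b (present calls in at least three of six samples, a twofold change in mean expression and Student’s t-test P < 0.05) amount to a filter of roughly the following form. The function name and data layout are our assumptions for illustration, not the authors’ actual analysis code:

```python
import numpy as np
from scipy.stats import ttest_ind

def probe_set_changed(log2_baseline, log2_spiked, n_present_calls):
    """Apply the change-call criteria described in the Figure 1 legend.

    `log2_baseline` and `log2_spiked` hold the three log2 MAS5 signals
    from the triplicate hybridizations of each sample; `n_present_calls`
    counts 'present' detection calls across all six arrays. This layout
    is assumed, not taken from the authors' pipeline.
    """
    if n_present_calls < 3:                 # present in >=3 of 6 samples
        return False
    mean_diff = abs(np.mean(log2_spiked) - np.mean(log2_baseline))
    if mean_diff < 1.0:                     # 1.0 on the log2 scale = twofold
        return False
    _, p_value = ttest_ind(log2_baseline, log2_spiked)
    return p_value < 0.05                   # Student's t-test
```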
Genomic tiling arrays have recently been used to determine the true extent of the transcribed portion of the human genome (reviewed in ref. 17). Because of the lack of appropriate controls for false-positive detection17, these experiments are particularly sensitive to errors caused by cross-hybridization. In light of our results, it is therefore not surprising that a side-by-side analysis of the same transcript pool by RNA- and cDNA-based analysis revealed only a 35% overlap between the positive probe pairs identified by the two methods18. Consequently, it is rather reassuring that most tiling array experiments searching for previously unidentified transcripts used labeled cDNA as target18. Although the results presented here demonstrate the increased promiscuity of RNA-DNA hybridization relative to DNA-DNA hybridization, it will take further experimentation to see whether the same difference holds in vivo. An affirmative result may have significant implications for the design of DNA-based antisense therapy, RNA-mediated chromatin remodeling and DNA methylation, which may be initiated by RNA-DNA hybridization19.
Note: Supplementary information is available on the Nature Biotechnology website.
Aron C Eklund1,2, Leah R Turner3, Pengchin Chen3, Roderick V Jensen4, Gianfranco deFeo3, Anne R Kopf-Sill3 & Zoltan Szallasi1 1Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and
Technology (CHIP@HST), Harvard Medical School, Boston, Massachusetts 02115, USA. 2Center for Neurologic Diseases, Brigham and Women’s Hospital, 65 Landsdowne St., Cambridge, Massachusetts 02139, USA. 3NuGEN Technologies, Inc., 821 Industrial Rd, Unit A, San Carlos, California 94070, USA. 4Department of Physics, University of Massachusetts Boston, Boston, Massachusetts 02125, USA. e-mail:
[email protected]
1. Ramakrishnan, R. et al. Nucleic Acids Res. 30, e30 (2002). 2. Chudin, E. et al. Genome Biol. 3, RESEARCH0005 (2002). 3. Hughes, T.R. et al. Nat. Biotechnol. 19, 342–347 (2001). 4. Zhang, J., Finney, R.P., Clifford, R.J., Derr, L.K. & Buetow, K.H. Genomics 85, 297–308 (2005). 5. Wu, C., Carta, R. & Zhang, L. Nucleic Acids Res. 33, e84 (2005). 6. Relogio, A., Schwager, C., Richter, A., Ansorge, W. & Valcarcel, J. Nucleic Acids Res. 30, e51 (2002). 7. Draghici, S., Khatri, P., Eklund, A.C. & Szallasi, Z. Trends Genet. 22, 101–109 (2006). 8. Lockhart, D.J. et al. Nat. Biotechnol. 14, 1675–1680 (1996). 9. Schena, M., Shalon, D., Davis, R.W. & Brown, P.O. Science 270, 467–470 (1995). 10. Sugimoto, N., Nakano, M. & Nakano, S. Biochemistry 39, 11270–11281 (2000). 11. Allawi, H.T. & SantaLucia, J., Jr. Biochemistry 37, 2170–2179 (1998). 12. Allawi, H.T. & SantaLucia, J., Jr. Biochemistry 37, 9435–9444 (1998). 13. Sekar, M.M., Bloch, W. & St John, P.M. Nucleic Acids Res. 33, 366–375 (2005). 14. Kurn, N. et al. Clin. Chem. 51, 1973–1981 (2005). 15. Dafforn, A. et al. Biotechniques 37, 854–857 (2004). 16. Barker, C.S. et al. BMC Genomics 6, 57 (2005). 17. Johnson, J.M., Edwards, S., Shoemaker, D. & Schadt, E.E. Trends Genet. 21, 93–102 (2005). 18. Kampa, D. et al. Genome Res. 14, 331–342 (2004). 19. Matzke, M.A. & Birchler, J.A. Nat. Rev. Genet. 6, 24–35 (2005).
Why spurning food biotech has become a liability Henry I Miller, Gregory Conko & Drew L Kershen By rejecting gene-spliced ingredients in their products, some major food companies may be making foods that are less safe and wholesome for consumers—and that expose them to litigation.
In the late 1990s, a singular phenomenon swept the world. One after another, food and beverage companies capitulated to strident xenogenophobic voices that called for elimination of gene-spliced ingredients from their product lines. In the United States, fast food giant McDonald’s (Chicago) banned transgenic ingredients from its menu, food manufacturers Heinz (Pittsburgh) and Gerber (Fremont, MI, USA) dropped them from their baby food lines, and Frito-Lay (Atlanta) told its growers to stop planting corn containing Bacillus thuringiensis (Bt) toxin or risk exclusion from its snacks business. Elsewhere, brewers Kirin (Shinkawa, Japan) and Carlsberg (Valby, Denmark) eliminated gene-spliced ingredients from their beers. These actions were rationalized variously as “protecting stakeholder interests,” “ensuring human safety” and “safeguarding the environment.” Ironically (and also surprisingly in these litigious times), in their eagerness to avoid biotech and the mainstream media’s “if it bleeds, it leads” coverage of the outlandish accusations and speculations of anti-biotech activists, these companies have exposed themselves to richly deserved legal jeopardy.
Henry I. Miller is a fellow at the Hoover Institution, Stanford University, Stanford, California 94305-6010, USA. Gregory Conko is at the Competitive Enterprise Institute, 1001 Connecticut Avenue NW, Washington, DC 20036, USA. Barron’s has named their book, The Frankenfood Myth: How Protest and Politics Threaten the Biotech Revolution, one of the 25 Best Books of 2004. Drew L. Kershen is at the University of Oklahoma College of Law, Norman, Oklahoma 73019-5081, USA. e-mail:
[email protected]
What’s on the menu? Consumers could react litigiously if baby food giants like Gerber or Heinz continue to prioritize the perceived risks of gene-spliced foods over the clear and present dangers of food allergies and mycotoxins.
Toxic food
Every year, scores of packaged food products are recalled from the US market because of the presence of (all-natural) contaminants like insect parts, toxic molds, bacteria and viruses. Because farming takes place out of doors and in dirt, such contamination is a fact of life. Over the centuries, the main culprits in mass food poisoning have often been mycotoxins, such as ergotamine from ergot (Claviceps purpurea) or fumonisin from Fusarium spp., resulting from the fungal contamination of unprocessed crops. This process is exacerbated when insects attack food crops, opening wounds in the plant
cuticle and epidermis that provide an opportunity for pathogen invasion. Once the molds get a foothold, poor storage conditions also promote their post-harvest growth on grain. Fumonisin and some other mycotoxins are highly toxic, causing fatal diseases in livestock that eat infected corn and esophageal cancer in humans. Fumonisin also interferes with the cellular uptake of folic acid, a vitamin that is known to reduce the risk of neural tube defects in developing fetuses. Because fumonisin prevents the folic acid from being absorbed by cells, the toxin can, in effect, induce functional folic acid deficiency—and thereby cause neural
tube defects such as spina bifida—even when the diet contains what otherwise would be sufficient amounts of folic acid. Regulatory agencies, such as the US Food and Drug Administration and the UK Food Standards Agency, are acutely aware of the danger of mycotoxins. They have established recommended maximum fumonisin levels in food and feed products made from corn. Although highly processed cornstarch and corn oil are unlikely to be contaminated with fumonisin, unprocessed corn or lightly processed corn (e.g., corn meal) can have fumonisin levels that exceed recommended levels. In 2003, the UK Food Standards Agency tested six organic corn meal products and 20 conventional corn meal products for fumonisin contamination. All six organic corn meals had elevated levels—from nine to forty times greater than the recommended levels for human health—and they were voluntarily withdrawn from grocery stores.
A role for biotech
The conventional way to combat mycotoxins is simply to test unprocessed and processed grains and discard those found to be contaminated—an approach that is both wasteful and unreliable. But modern technology—specifically, products derived from recombinant DNA technology (also known as food biotech, gene-splicing or genetic modification)—offers a way to prevent the problem. Contrary to the claims of biotech critics, who single out such crops as posing the risk of introducing new allergens, toxins or other nasty substances into the food supply (none of which has been proven actually to have occurred), such products would offer the food industry a proven and practical means of tackling fungal contamination at its source. An excellent example is corn crafted by splicing into commercial corn varieties a gene (or genes) encoding natural toxins from the bacterium B. thuringiensis. The Bt gene expresses a protein that is toxic to corn-boring insects but is harmless to birds, fish and mammals, including humans. As the Bt corn fends off insect pests, it also reduces the levels of the mold Fusarium, thereby reducing the levels of fumonisin. Thus, switching to gene-spliced, insect-resistant corn for food processing would lower the levels of fumonisin—as well as the concentration of insect parts—likely to be found in the final product. Indeed, researchers at Iowa State University in Ames and the US Department of Agriculture found that in Bt corn the level of fumonisin is reduced by as much as 80% compared with conventional corn1,2. Thus, on the basis of both theory and empirical knowledge, there should be potent
incentives—legal, commercial and ethical—to use such gene-spliced grains more widely. One would expect public and private sector advocates of public health to demand that such improved varieties be cultivated and used for food—not unlike requirements for drinking water to be chlorinated and fluoridated. Food producers who wish to offer the safest and best products to their customers—to say nothing of being offered the opportunity to advertise ‘new and improved!’—should be competing to get gene-spliced products into the marketplace. Alas, none of this has come to pass. Activists have mounted vocal and intractable opposition to food biotech, in spite of demonstrated, significant benefits, including reduced use of chemical pesticides, less runoff of chemicals into waterways, greater use of farming practices that prevent soil erosion, higher profits for farmers and less fungal contamination. Inexplicably, government oversight has also been an obstacle, subjecting the testing and commercialization of gene-spliced crops to unscientific and draconian regulations that have vastly increased testing and development costs and limited the use and diffusion of food biotech. The result is jeopardy for everyone involved in food production and consumption: consumers are subjected to avoidable, and often undetected, health risks, and food producers have placed themselves in legal jeopardy. The first point is obvious, the latter less so, but it makes a fascinating story: agricultural processors and food companies may face at least two kinds of civil liability for their refusal to purchase and use fungus-resistant, gene-spliced plant varieties, as well as other superior products.
(Baby) food for thought
In 1999, the Gerber foods company succumbed to activists’ pressure, announcing that its baby food products would no longer contain any gene-spliced ingredients. Indeed, Gerber went further and promised it would attempt to shift to organic ingredients that are grown without synthetic pesticides or fertilizers. Because corn starch and corn sweeteners are often used in a range of foods, this meant wholesale changes to Gerber’s entire product line. As noted above, not only is gene-spliced corn likely to have lower levels of fumonisin than conventional varieties, but organic corn is likely to have the highest levels because it suffers greater insect predation due to less effective pest controls. If a mother some day discovers that her ‘Gerber baby’ has developed liver or esophageal cancer, or a neural tube defect such as spina bifida, she might have a valid legal claim against Gerber3. On the child’s behalf, a plaintiff’s lawyer can allege strict products liability
based on mycotoxin contamination in the baby food as the causal agent of the cancer or neural tube defects. The contamination would be considered a manufacturing defect under products liability law because the baby food did not meet its intended product specifications or level of safety. Gerber could be found liable “even though all possible care was exercised in the preparation and marketing of the product,” simply because the contamination occurred. The plaintiff’s lawyer could also allege a design defect in the baby food, because Gerber knew of the existence of a less risky design—namely, the use of gene-spliced varieties that are less prone to Fusarium and fumonisin contamination—but deliberately chose not to use it. Instead, Gerber chose to use non-gene-spliced, organic food ingredients, knowing that the foreseeable risks of harm posed by them could have been reduced or avoided by adopting a reasonable alternative design—that is, by using gene-spliced Bt corn, which is known to have a lower risk of mycotoxin contamination. Gerber might answer this design defect claim by contending that it was only responding to consumer demand, but that alone would not be dispositive. Products liability law subjects defenses in design defect cases to a risk-utility balancing in which consumer expectations are only one of several factors used to determine whether the product design (e.g., the use of only non-gene-spliced ingredients) is reasonably safe. A jury might conclude that whatever consumer demand there may be for nonbiotech ingredients does not outweigh Gerber’s failure to use a technology that is known to lower the health risks to consumers. Even if Gerber were able to defend itself from the design defect claim, the company might still be liable because it failed to provide adequate instructions or warnings about the potential risks of non-gene-spliced ingredients. For example, Gerber could have labeled its non-gene-spliced baby food with a statement such as: “This product does not contain gene-spliced ingredients. Consequently, this product has a very slight additional risk of mycotoxin contamination. Mycotoxins can cause serious diseases, such as liver and esophageal cancer and birth defects.”
Hypoallergenic foods
Whatever the risk of toxic or carcinogenic fumonisin levels in nonbiotech corn may be (probably low in industrialized countries, where food producers generally are cautious about such contamination), a more likely scenario is potential legal liability when a food product causes an allergic reaction4. Between 6% and 8% of children and between 1% and 2% of adults are allergic to one or
another food ingredient, and an estimated 150 US citizens die each year from exposure to food allergens5. Allergies to proteins from peanuts, soybeans and wheat, for example, are quite common and can be severe. Although only about 1% of the population is allergic to peanuts, some individuals are so highly sensitive that exposure causes anaphylactic shock, killing dozens of people every year in North America6. Protecting those with true food allergies is a daunting task. Farmers, food shippers and processors, wholesalers and retailers, and even restaurants must maintain meticulous records and labels and ensure against cross-contamination. Still, in a country where about a billion meals are eaten every day, missteps are inevitable. Dozens of processed food items must be recalled every year due to accidental contamination or inaccurate labeling. Fortunately, biotech researchers are well along in the development of crops in which the genes encoding allergenic proteins have been silenced or removed. According to University of California, Berkeley, biochemist Bob Buchanan, hypoallergenic varieties of wheat could be ready for commercialization within a decade, and nuts soon thereafter (R. Buchanan, personal communication; ref. 7). Once these products are commercially available, agricultural processors and food companies that refuse to use these safer food sources will open themselves to products-liability, design-defect lawsuits4.
Property damages and personal injury
Potato farming is a growth industry, primarily due to the vast consumption of french fries at fast-food restaurants. However, growing potatoes is not easy because they are preyed upon by a wide range of voracious and difficult-to-control pests, such as the Colorado potato beetle, virus-spreading aphids, nematodes, potato blight and others. To combat these pests and diseases, potato growers use an assortment of fungicides (to control blight), insecticides (to kill aphids and the Colorado potato beetle) and fumigants (to control soil nematodes). Although some of these chemicals are quite hazardous to farm workers, forgoing them could jeopardize the sustainability and profitability of the entire potato industry. Standard application of synthetic pesticides enhances yields more than 50% over organic potato production, which prohibits most synthetic inputs. Consider a specific example. Many growers use methamidophos, a toxic organophosphate nerve poison, for aphid control. Although methamidophos is a US Environmental Protection Agency–approved pesticide, the
agency is currently reevaluating the use of organophosphates and could ultimately prohibit or greatly restrict the use of this entire class of pesticides. As an alternative to these chemicals, Monsanto (St. Louis) developed a potato dubbed NewLeaf that contains a Bt gene to control the Colorado potato beetle. The ORF-1 (open reading frame 1) and ORF-2 regions from potato leafroll luteovirus (PLRV) were later added to confer resistance to PLRV infection spread by the aphids. The resulting NewLeaf-Plus potato, which received US regulatory approval for food/feed use and environmental release in 1998, is resistant to these two scourges of potato plants and allows growers who adopt it to reduce their use of chemical controls and increase yields. Farmers who planted NewLeaf and NewLeaf-Plus became convinced that these varieties were the most environmentally sound and economically efficient way to grow potatoes, but after five years of excellent results they encountered an unexpected snag. Under pressure from anti-biotech organizations, McDonald’s, Burger King (Miami) and other restaurant chains informed their potato suppliers that they would no longer accept gene-spliced potato varieties for their french fries. As a result, potato processors such as J.R. Simplot (Boise, ID, USA) inserted a nonbiotech potato clause into their farmer-processor contracts and informed farmers that they would no longer buy gene-spliced potatoes. In spite of its substantial environmental, occupational and economic benefits, NewLeaf became a sort of contractual poison pill and is no longer grown commercially. Now, assume that a farmer who is required by contractual arrangement to plant nonbiotech potatoes sprays his potato crop with methamidophos (the organophosphate nerve poison) and that the pesticide drifts into a nearby stream and onto nearby farm laborers. As a result, thousands of fish die in the stream and the laborers report to hospital emergency rooms complaining of neurological symptoms. This hypothetical scenario is, in fact, not at all far-fetched. Fish kills attributed to pesticide runoff from potato fields are commonplace. In the potato-growing region of Prince Edward Island, Canada, for example, a dozen such incidents occurred in one thirteen-month period alone, between July 1999 and August 2000 (ref. 8). According to the United Nations’ Food and Agriculture Organization (Rome), “normal” use of the pesticides parathion and methamidophos is responsible for some 7,500 pesticide poisoning cases in China each year. In our hypothetical scenario, the state environmental agency might bring an administrative action for civil damages to recover the cost
of the fish kill, and a plaintiff’s lawyer could file a class-action suit on behalf of the farm laborers for personal injury damages.
Who’s legally responsible?
Several possible circumstances could enable the farmer’s defense lawyer to shift culpability for the alleged damages to the contracting processor and the fast-food restaurants that are the ultimate purchasers of the potatoes4. These circumstances include: the farmer’s having planted Bt potatoes for the previous several years; his contractual obligation to the potato processor and its fast-food retail buyers to provide only nonbiotech varieties; and his demonstrated preference for planting gene-spliced, Bt potatoes, were it not for the contractual proscription. If these conditions could be proven, the lawyer defending the farmer could name the contracting processor and the fast-food restaurants as cross-defendants, claiming either contribution in tort law or indemnification in contract law for any damages legally imposed upon the farmer client. The farmer’s defense could be that those companies bear the ultimate responsibility for the damages because they compelled the farmer to engage in higher-risk production practices than he would otherwise have chosen. The companies chose to impose cultivation of a non-gene-spliced variety upon the farmer although they knew that, to avoid severe yield losses, he would need to use organophosphate pesticides. Thus, the defense could argue that the farmer should have a legal right to pass any damages (arising from contractually imposed production practices) back to the processor and the fast-food chains.
Food giants—watch out!
Companies that insist upon farmers’ using production techniques that involve foreseeable harms to the environment and humans may be held legally accountable for that decision. If agricultural processors and food companies manage to avoid legal liability for their insistence on nonbiotech crops, they will be ‘guilty’ at least of externalizing their environmental costs onto the farmers, the environment and society at large.
1. Munkvold, G.P., Hellmich, R.L. & Rice, L.G. Plant Dis. 83, 130–138 (1999). 2. Dowd, P. J. Econ. Entomol. 93, 1669–1679 (2000). 3. Kershen, D.L. Food Drug Law J. 61, 197–236 (2006). 4. Kershen, D.L. Oklahoma Law Rev. 53, 631–652 (2000). 5. Sicherer, S., Munoz-Furlong, A., Wesley Burks, A. & Sampson, H. J. Allergy Clin. Immunol. 103, 559–562 (1999). 6. Bock, S.A., Munoz-Furlong, A. & Sampson, H.A. J. Allergy Clin. Immunol. 107, 191–193 (2001). 7. Weise, E. Biotechnology appears to be withering as a food source. USA Today February 2 (2005), p. 8D. 8. Nickerson, C. Potatoes, pesticides divide island. The Boston Globe August 30 (2000), p. A1.
The breeder’s dilemma—yield or nutrition? Cindy E Morris & David C Sands The emphasis of traditional crop production on yield is counter-productive for human nutrition.
Plant breeders, challenged to create more nutritious crops, face seemingly radical choices that constitute a ‘breeder’s dilemma’. In the search for higher yields and lower farming costs, have breeders inadvertently selected for crops with reduced nutritional quality? To create foods that keep pace with our growing understanding of what constitutes healthy diets, plant breeders may need to make a significant shift away from traditional selection criteria. Subsidizing crop nutritional value rather than yield could be an important and economical driver for this shift in perspective.
How healthy is food?
The wide variety and availability of DNA and proteomic tests for human health and disease treatment are among the principal technological consequences of the Human Genome Project. This is leading to a growing understanding of the molecular basis of human health and of genetic predisposition to diseases, such as obesity, type 2 diabetes mellitus, cardiovascular disease and colorectal and other cancers. The pivotal role that diet plays in both the cause and the remediation of these and other health problems is also becoming increasingly clear. Our challenge is to narrow the growing gap between what we should eat to maintain optimal health and the nutritional quality of the staple foods in modern diets. Plants are a fundamental constituent of the human diet, either as direct sources of nutrients or indirectly as feed for animals. Modern
Cindy E. Morris is at the Unité de Pathologie Végétale, INRA-Avignon, Montfavet 84140, France. She and David C. Sands are at the Department of Plant Sciences and Plant Pathology, 119 Plant Biosciences Building, Montana State University, Bozeman, Montana 59717-3150, USA. e-mail:
[email protected]
plant breeding has been historically oriented toward high agronomic yield, easy and consistent processing, and disease and pest resistance. This strategy may have unwittingly led to the proliferation of foods that are at the root of certain dietary problems. The biochemical quality of certain staple plant foods—and not simply the quantities consumed—might be a predisposing factor for obesity and cardiovascular disease. Furthermore, some plants, although efficient as feeds for animal production, may adversely affect the nutritional qualities of animal-based foods. For example, they might not provide sources of certain types of polyunsaturated fatty acids. Creating staple foods that are more nutritious might require selecting crop cultivars that are lower yielding, more sensitive to pests, possess unusual flavors or other uncommon properties, or otherwise do not meet the traditional criteria of plant breeders. Creation of oil crops and animal feeds that enhance the health-promoting quality of animal-derived foods might involve some concerted genetic modification of our current crops or even replacement of traditional canola-, soy-, wheat- and corn-based products with new crops.
Problems with staples
Wheat breeding, for example, has been historically oriented toward increasing yield and the amounts of amylopectin, gluten and protein. Amylopectin and gluten contents ensure baking and processing qualities. After cooking, however, amylopectin (branched starch) is more readily digestible than amylose (straight starch). Results of feeding trials suggest that quickly digested starch, such as amylopectin, promotes the development of insulin resistance in rats. The relatively slow time course of this condition resembles the normal development of insulin resistance in humans1. Insulin resistance is the leading risk factor for type 2 diabetes mellitus and is aggravated by obesity2. In
For staple crops, dietary problems, such as intolerance to gluten in wheat or immune reactions to the Bet v 6 minor allergen in strawberries, and improved dietary value could be addressed if plant breeding programs were to broaden from their narrow focus on agronomic traits, such as increased yield.
contrast, consumption of high-amylose foods normalizes the insulin response of hyperinsulinemic human subjects. This has potential benefit for diabetics3. Gluten, the major storage protein in wheat (and similar proteins in barley and rye) causes an autoimmune response that damages the small intestine of certain genetically predisposed individuals. The damaged mucosal lining of the small intestine leads to chronic malnutrition whose symptoms are impaired physical health and emotional state. This genetic disorder is known as celiac disease. According to the US National Institutes of Health Consensus Development Conference Statement on Celiac
Disease of June 2004 (ref. 4), this disease has been underdiagnosed by the medical community and may affect as many as 0.5–1% of people in the United States and Europe. Our staple crops may also have inherent deficiencies that contribute to emerging dietary problems. Corn is a case in point. About 60% of corn seed proteins consist of prolamins (or zeins) that are almost completely devoid of the essential amino acids lysine and tryptophan. Attempts to select corn lines with enhanced lysine and tryptophan have invariably led to reductions in zein content. The resulting corn lines had soft chalky endosperm and consequently also suffered increased mechanical damage during harvest. They were also more susceptible to diseases and lower yielding, and thus have never attracted significant commercial interest5. Corn-based diets (animal or human) require lysine and tryptophan supplementation for adequate protein synthesis. Tryptophan is also the precursor for the synthesis of some neurotransmitters and for niacin6. Historically, the nutritional deficiency pellagra developed where corn was an important dietary staple and where protein intake was low. It is caused by niacin deficiency due to the absence of its precursor, tryptophan, in the diet. Symptoms are severe dermatitis, diarrhea, dementia and eventually death7. Pellagra is rather uncommon today outside the poorest regions of the world. But in those parts of the world where corn is still an important component of the diet, there may be other consequences of low tryptophan consumption that we are ignoring. The neurotransmitter serotonin, synthesized in the brain from tryptophan, is responsible for feelings of well-being, calmness, personal security, relaxation, confidence and concentration; it is a key player in overall mood, in aggressiveness8 and in the development of depression. Could consumption of tryptophan-rich foods play a role in reducing the prevalence of depression and aggression in society? Lipids also play an important role in human health, both from the standpoint of caloric intake and as a source of essential fatty acids. The ratio of omega-3 to omega-6 polyunsaturated fatty acids in our modern diet has shifted from an optimal ratio of 1:1 or 1:2 to a current ratio of 1:25 to 1:30. This nonoptimal ratio results from high consumption of red meats rather than cold-water fish, a lack of plant sources of long chain omega-3 fatty acids in our diets and the use of animal feeds rich in omega-6 versus omega-3 fatty acids. Development of cooking oils has led to widespread availability of mono- and polyunsaturated oils, such as canola and soy, that have largely replaced saturated fats. Unfortunately, these
oils are relatively low in long chain omega-3 fatty acids and high in omega-6 fatty acids. This skewed ratio is a key factor in the prevalence of cardiovascular diseases and inflammatory/autoimmune diseases9,10. Fish oils, high in long chain omega-3 fatty acids, cannot replace plant-derived oils for widespread use in cooking and feeds. There is a need for crops that can provide abundant quantities of these fatty acids. It is axiomatic that one of the aims of plant production is the reduction of crop loss from predation and disease. But we wonder whether, in our efforts to boost yields to feed growing populations, we haven’t overlooked what pests could be telling us about the nutritional value of crops. Plants and bacteria are the only organisms that can synthesize the full complement of protein amino acids. Animals must consume certain preformed amino acids. The ten essential amino acids are the same for all animals (apart from a few exceptions, such as aphids and termites, which harbor bacterial symbionts that can make essential amino acids and furnish them to their hosts11,12). Nevertheless, plant-derived food that is nutritionally good for insects, rodents, deer and nematodes, for example, is fundamentally good food for humans. Likewise, many of the compounds that plants produce to inhibit herbivory, such as alkaloids, cyanogenic glycosides, glucosinolates and terpenoids, have wide-spectrum activities across the animal kingdom. Furthermore, most animals are equipped with taste receptors and internal feedback mechanisms that allow them to sort through stimuli to obtain necessary nutrients and avoid toxins. Thus, if crops are bred for undesirability to insects, could this mean something with regard to the quality of these crops as food for humans? Corn and wheat are both deficient in lysine and methionine, and there is some effort to increase the content of these amino acids in these crops. Could this lead to increased desirability to pests? We are not proposing that plant breeding ignore or enhance the susceptibility of crops to pests. Rather, we are pointing out that this is part of the breeder’s dilemma in selecting more nutritious crops. Breeding for yield, fruit size and shelf life has also inadvertently led to changes in the flavor qualities of fruits and vegetables. Tomatoes13 and strawberries14 are well-studied examples. Many of the plant volatiles that contribute to flavor are derived from essential long-chain polyunsaturated fatty acids or essential amino acids. In tomato, virtually all of the major volatiles are linked to compounds that are essential nutrients13. With the exception of volatiles that originate from lycopene
(which remain high in tomatoes because of selection for red-colored fruits), flavor components related to essential nutrients have been diminished through breeding13.

Shifting goals for breeders?
What are the technological routes to developing highly nutritious foods, particularly from staple crops? Clearly, genetic and metabolic engineering are likely to be very effective (and in some cases the only possible) routes to modify our current staple crops by insertions of specific genes, gene silencing and immunomodulation. Gene insertion has been used to create rice high in provitamin A15 and gene silencing has led to slight increases in lysine and tryptophan contents of wheat5. Recently, heavy-chain antibodies from llamas have been used to immunomodulate starch branching enzyme A in potatoes, leading to higher amylose content of tubers16.

We also need to seriously consider alternative plant species as candidates for major staple crops. To create gluten-free grain crops or enhance omega-3 production in plants, should breeders focus on genetic modifications of wheat, canola or soy, or could alternative plant species more efficiently and effectively lead to solutions for the associated dietary problems? To avoid damage of more nutritious crops by pests, breeders might need to design multiline or composite crops. Each component of these crops might lack a specific nutrient, but the composite would have all of the essential amino acids, for example, in optimal quantities. This might avoid enhancing their desirability to pests. The route to more nutritious crops might also involve consideration of the full gamut of insights arising from genomics, proteomics and metabolomics as applied to humans, nematodes, insects, plants and other organisms, and what these approaches are telling us about the biochemical differences between health and disease. Furthermore, to create widespread acceptance of nutritious crops, breeders may need to bravely address the biology of food addiction and satiation, two very real drivers of food preferences.

Breeders have always confronted dilemmas; it is their stock in trade. The breeder's dilemma we describe arises from the confrontation between the emerging data-driven insights into the physiology of human health and traditional agricultural practices and economics of food production. Wholesale creation of highly nutritious crops may threaten current commodity crops. Solving the breeder's dilemma requires a radical shift in perspective. Improving our diet is an investment in human capital. It will have important positive
spillovers for education and behavior; ultimately, it could improve the quality of life. Can we eventually consider that gains in work time and higher learning performance, for example, are part of the economic results of plant breeding programs? Can an economic system that subsidizes yield be converted to one that, by subsidizing crops with high nutritional quality, concomitantly reduces other costs to society related to human health?
1. Byrnes, S.E., Miller, J.C. & Denyer, G.S. J. Nutr. 125, 1430–1437 (1995).
2. Hirosumi, J. et al. Nature 420, 333–336 (2002).
3. Behall, K.M. & Howe, J.C. Am. J. Clin. Nutr. 61, 334–340 (1995).
4.
5. Huang, S. et al. J. Agric. Food Chem. 52, 1958–1964 (2004).
6. Heine, W., Radke, M. & Wutzke, K.D. Amino Acids 9, 191–205 (1995).
7. Breoton, B.P. Nutr. Anthropol. 22, 2–9 (1994).
8. Young, S.N. & Leyton, M. Pharmacol. Biochem. Behav. 71, 857–865 (2002).
9. Kris-Etherton, P.M., Harris, W.S. & Appel, L.J. Circulation 106, 2747–2757 (2002).
10. Simopoulos, A.P. J. Am. Coll. Nutr. 21, 495–505 (2002).
11. Douglas, A.E. Annu. Rev. Entomol. 43, 17–37 (1998).
12. Douglas, A.E., Minto, L.B. & Wilkinson, T.L. J. Exp. Biol. 204, 349–358 (2001).
13. Goff, S.A. & Klee, H.J. Science 311, 815–819 (2006).
14. Aharoni, A. et al. Plant Cell 16, 3110–3131 (2004).
15. Potrykus, I. Plant Physiol. 125, 1157–1161 (2001).
16. Jobling, S.A. et al. Nat. Biotechnol. 21, 77–80 (2003).
INVESTOR’S LAB
The protection racket
Tom Jacobs
Biotech stocks can be white-knuckle volatile, but there is a way you can profit from, or insure against, such rapid and dramatic price changes. Through put and call options, you may not only make more profits, but also buy and sell 'protection' for your investments (with no need for 'knuckle' sandwiches from your neighborhood mobster). Despite their potential benefits, options are definitely not for beginners, but everyone—especially biotech investors—should know about these valuable tools.

Calls and puts as profit makers
Call and put options resemble futures contracts for oil, wheat, coffee or pork bellies, except that the buyer has the right, but not the obligation, to buy or sell the underlying asset at a certain time ('expiration') for a certain price (the 'strike price'). You can buy or sell options just like stock trades through your broker, paying a commission and, as a buyer, a charge called an options premium.

You buy options for a larger potential gain than if you owned the stock. For (hypothetical) biotech Gene Genie, shares sell for $10. You might pay a $1 premium for a 'call' option on the shares, with a strike price of $10 and January 2008 expiration. Let's say shares rise to $15 some time before expiration. The call option premium rises $4 to $5 (it won't always move dollar-for-dollar with the stock, but that's a more complex topic), and if you sold, you would pocket a terrific net gain of $4 ($5 minus your $1 premium), or four times your money. Meanwhile, the stock owner profited only 50%. These powerful gains are why few people actually hold the options to expiration and buy the stock, but there is a red flag warning:
If Gene Genie shares stay at or below the $10 strike through expiration, the call expires worthless and you lose 100% of your investment (between $10 and $11 you recover only part of the $1 premium), while the stock owner loses only a part. Using the same principle, if you think the Gene Genie stock price will go down from its price of $10, you can pay a $1 premium to buy a 'put' option at $10 a share strike price for January 2008 expiration. You then profit if the stock falls below that price at or before expiration, just as if you had shorted the stock itself.
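The arithmetic behind the call example can be scripted in a few lines. Below is a minimal sketch (our own illustration; the function name is ours, and the numbers follow the hypothetical Gene Genie trade above):

```python
# Long call P&L at expiration: intrinsic value minus the premium paid.
def long_call_pnl(stock_price, strike, premium):
    intrinsic = max(stock_price - strike, 0.0)  # zero if the stock ends below the strike
    return intrinsic - premium

# Gene Genie: $10 strike call bought for a $1 premium.
for s in (8, 10, 11, 15):
    print(f"stock at ${s}: P&L = ${long_call_pnl(s, 10, 1):+.2f}")
# $8 or $10 -> -1.00 (the whole premium), $11 -> +0.00 (break-even),
# $15 -> +4.00 (the fourfold gain described above)
```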
Calls and puts as insurance
The reverse is how you insure against disaster. If you bought Gene Genie at $10, you could buy a $10 put as protection, profiting on the put option if the stock drops, even though you would lose money on the stock itself. If you are shorting Gene Genie at $10, you buy the $10 call as protection. Then, if the stock rises and your short loses, you compensate with some profit because the call option increases in value. True, this strategy would reduce potential profits by the amount of option premium you pay for the insurance. On the other hand, it also downsizes your risk because you will make some profit on the option if you suffer losses on the stock.
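To put numbers on the insurance effect, here is a small sketch (again our own illustration, reusing the hypothetical Gene Genie figures) of a stock position hedged with a protective put:

```python
# Protective put: long stock hedged with a long put at the same strike.
def protective_put_pnl(stock_price, buy_price, strike, put_premium):
    stock_pnl = stock_price - buy_price
    put_payoff = max(strike - stock_price, 0.0)  # the put pays off as the stock falls
    return stock_pnl + put_payoff - put_premium

# Bought Gene Genie at $10 and a $10 put for a $1 premium.
for s in (5, 10, 15):
    print(f"stock at ${s}: hedged P&L = ${protective_put_pnl(s, 10, 10, 1):+.2f}")
# $5  -> -1.00 (loss capped at the premium; -5.00 if unhedged)
# $15 -> +4.00 (upside trimmed by the $1 insurance cost)
```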
A seller or a buyer be?
So far, we have discussed only buying options, but for every buyer, there is a seller. This is the cool part: if you sell (or write) the insurance, you pocket the premium today and may never have to pay out on the 'claim.' Remember, if you buy Gene Genie calls or puts, you pay the option premium to a seller on the open market in a normal trade via your broker. A seller pockets the cash immediately. If Gene Genie is not above $10 (for calls) or not below $10 (for puts) at January 2008 expiration, your option expires worthless to you, but the seller keeps the premium. Because most option contracts expire worthless or close to it, the option seller has a statistical advantage. Indeed, the larger your investing account, the more you can supplement your income by selling options selectively. Thus, my business partner and friend Jeff Fischer has supplemented his income for years by selling puts on two biotechs: Millennium Pharmaceuticals (Cambridge, MA, USA; Nasdaq:MLNM) whenever shares dropped to the mid-single digits and Northfield Laboratories (Northbrook, IL, USA; Nasdaq: NFLD) at prices below $10. (This is not a recommendation, but an illustration.)
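The seller's side of the trade is the mirror image; a sketch with the same hypothetical numbers (not a recommendation):

```python
# Written (short) put: the seller keeps the premium but must absorb the
# shortfall if the stock finishes below the strike.
def short_put_pnl(stock_price, strike, premium):
    assignment_cost = max(strike - stock_price, 0.0)
    return premium - assignment_cost

# Sold a $10 put for a $1 premium.
for s in (6, 9, 10, 12):
    print(f"stock at ${s}: seller P&L = ${short_put_pnl(s, 10, 1):+.2f}")
# At or above $10 the put expires worthless and the seller keeps the full $1;
# well below $10 the obligation to buy at $10 swamps the premium collected.
```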
Risks and benefits
Options involve significant risks and restrictions. First, when you sell a put or a call, you agree to buy or sell the shares at a given price. If the option works against you and for some reason you have to buy or sell, you can lose a lot of money. The topics of 'covered' or 'naked' puts and calls require much more space than is available here, but you should know what your potential losses are with each option. They can cripple you if you aren't prepared. Second, not all stocks have options, so you can't profit with, insure or sell insurance for all holdings, and the options that do exist often run for only short periods before expiration, not over a year away as in our Gene Genie example. You can find these and other options data via online sources such as 'Yahoo! Finance' or online brokers, including Chicago-based OptionsXpress, which specializes in options. Lastly, most retirement accounts prohibit options except selling covered calls, so you are restricted to a taxable brokerage account for most options activity.

These caveats aside, the volatility of the biotech sector makes it fertile ground for options investors. Volatility injects uncertainty, prompting the options seller (rather like an insurance company) to charge a higher premium. That's why options premiums on average are much higher for a speculative biotech stock than for, say, Pfizer (New York, NY, USA), Microsoft (Seattle, WA, USA) or other large companies (hurricane insurance is after all more expensive in Miami than in Toronto). When biotechs move dramatically, they can provide excellent options opportunities.

Tom Jacobs is co-founder of Complete Growth Investor, http://www.completegrowth.com, a stock service for individual investors. Tom was neither long nor short either shares or options of the companies mentioned here at the time of writing. Options can involve significant risk and should be studied carefully before investing.
FEATURE
Oversight of US genetic testing laboratories
Kathy L Hudson, Juli A Murphy, David J Kaufman, Gail H Javitt, Sara H Katsanis & Joan Scott
Despite the boom in genetic tests available in US laboratories, oversight remains patchy. A survey of laboratory directors suggests that mandatory proficiency testing would result in fewer errors.
Today, genetic tests for close to 1,000 diseases are clinically available, with hundreds more under development1. Results from these tests can lead to profound, life-changing decisions, such as whether to undergo prophylactic mastectomy, terminate a pregnancy or take a particular drug or dosage of a drug. An incorrect test result can lead to misdiagnosis and inappropriate or delayed treatment; therefore, it is imperative that results from genetic tests be accurate and reliable. To explore whether creation of a genetic testing specialty with specific proficiency testing (PT) standards could improve the quality of genetic testing, we have examined not only the relationship between participation in PT for genetic testing and laboratory quality but also the attitudes of laboratory directors toward current genetic testing regulation, the value of a genetic testing specialty and the value of PT in ensuring quality testing. The data from our survey clearly demonstrate that participation in PT correlates with test quality. What's more, most laboratory directors support moves to create formal registration under a genetic testing specialty for centers that carry out such analyses.

The testing landscape
Over the past three decades, genetic testing has played an increasingly important role in clinical medicine. The first genetic test, for the prenatal
Kathy L. Hudson, Juli A. Murphy, David J. Kaufman, Gail H. Javitt and Joan Scott are at the Genetics and Public Policy Center, Berman Bioethics Institute, Johns Hopkins University, 1717 Massachusetts Avenue, NW, Suite 530, Washington, DC, 20036, USA and Sara H. Katsanis is at the DNA Diagnostic Laboratory, Johns Hopkins Hospital, 600 N. Wolfe St., Baltimore, MD 21287, USA. e-mail: [email protected]
Genetic testing without quality control may be cause for concern. (Source: Genetics and Public Policy Center, Washington, DC.)
diagnosis of sickle cell disease, was developed in 1978 and signaled the birth of modern clinical molecular genetics2. What began as a handful of academic laboratories performing genetic testing for rare and often debilitating diseases has grown into a multimillion-dollar commercial industry3. Fueled by information gained from the Human Genome Project, new genetic tests are quickly transitioning from the research bench to clinical practice (Fig. 1).

Currently, a patchwork of oversight mechanisms is in place to help ensure the quality of genetic testing. Only a few genetic tests—those marketed by companies as 'test kits'—require FDA premarket review. Most tests are developed in-house by clinical laboratories (so-called home brews) and are not subject to government review before they are made clinically available.

In 1988, the US Congress enacted the Clinical Laboratory Improvement Amendments (CLIA) in response to reports of rampant errors and poor quality laboratory testing services, par-
ticularly with regard to Pap smear results. Any laboratory performing testing on human specimens and reporting patient-specific results must be certified under the provisions of CLIA and adhere to general requirements for quality control (QC) standards, personnel qualification and documentation/validation of test procedures4. Research laboratories are exempt only if they "do not report patient-specific results for the diagnosis, prevention, or treatment of any disease or impairment of, or the assessment of the health of individual patients" (Box 1).

Laboratories performing tests categorized as high complexity under CLIA must enroll in the appropriate specialty area, if one is available. Specialty areas provide more detailed requirements than the general CLIA regulations. In particular, many specialties require enrollment in a CLIA-approved PT program. However, a specialty area for molecular and biochemical genetic testing has not yet been created, so there are no specific QC, personnel or PT
[Figure 1  Growth of genetic testing, including both clinical and research testing. Y axis: diseases for which testing is available (0–1,300); x axis: year (1993–2005). (Source: Gene Tests database 2005, http://www.genetests.org/.)]
standards required by CLIA for these kinds of tests. In the absence of a formal PT program, CLIA states that “a laboratory must establish and maintain the accuracy of its testing procedures” and “have a system for verifying the accuracy of its test results at least twice a year.” Thus, CLIA does not require genetic testing laboratories to enroll in a formal PT program, although some accrediting entities do (e.g., New York State requires laboratories located in New York State or doing business in New York State to participate in PT programs if they
are available). Moreover, formal PT programs are available for only a small fraction of the genetic tests offered today. When a laboratory cannot or chooses not to enroll in a formal PT program, it can perform PT by exchanging samples with other laboratories performing similar testing, retesting archived specimens or splitting samples and comparing results. Few empirical data exist on genetic testing laboratory errors and testing quality, and no data have been made available that directly assess the relationship between the extent of
Box 1 How CLIA works CLIA defines a clinical laboratory as “a facility for the biological, microbiological, serological, chemical, immunohematological, biophysical, cytological, pathological, or other examination of materials derived from the human body for the purpose of providing information for the diagnosis, prevention, or treatment of any disease or impairment of, or assessment of the health of a human being.” (United States Code, Title 42 Section 263a.) Under CLIA, laboratory tests are categorized based on their degree of complexity. Tests are graded based on seven criteria: (i) knowledge, (ii) training and experience, (iii) reagents and materials preparation, (iv) characteristics of operational steps, (v) calibration, quality control and PT materials, (vi) test system troubleshooting and equipment maintenance and (vii) interpretation and judgment. Tests requiring higher skills and knowledge to perform and interpret, such as tests for HIV, other infectious diseases, or molecular diagnostics, are categorized as “moderate” or “high complexity” tests. For these tests CLIA develops specialty areas (e.g., virology, toxicology) that provide additional QC, personnel and other standards specific to that type of testing. CLIA also requires laboratories performing moderate or high complexity testing to enroll in approved PT programs for each specialty in which the laboratory is certified, to provide an independent, external assessment of how well a laboratory is able to perform that type of testing (commonly referred to as a formal PT program). Laboratories enrolled in these formal PT programs periodically receive blinded specimens from the program to be tested in the same manner as samples received from patients. The PT program determines how often a laboratory obtains and reports correct results on these tests, which helps laboratories identify procedural problems and take corrective actions. Proficiency test results are graded as either satisfactory or unsatisfactory depending on how many deficiencies (errors) are detected. Unsatisfactory performance is reported to the CLIA-accrediting organization, and laboratories that consistently perform poorly risk losing their accreditation and CLIA certification.
participation in formal and informal PT programs and the types or frequency of genetic testing errors. A review of the literature from both genetic5,6 and nongenetic7–10 testing laboratories finds that although error rates can vary widely from study to study, the distribution of errors across the pre-analytic, analytic and post-analytic phases of testing remains remarkably consistent for all types of clinical laboratory testing, including genetic testing. The majority of reported laboratory errors occur in either the pre-analytic (e.g., mislabeling specimens, incorrect test ordering) or post-analytic phases of testing (e.g., transcription or interpretation errors). Analytic errors, which are the types of errors that CLIA was intended to address, are estimated to account for 4%–32% of all laboratory errors7. In a 1999 survey of 42 molecular genetic testing laboratories, analytic errors accounted for only 6.1% of all reported problems5. Another survey of 245 molecular genetic testing laboratories found that participation in PT was a leading indicator of higher quality assurance scores6. Quality assurance scores were assigned based on the number of American College of Medical Genetics (ACMG; Bethesda, MD, USA) standards for proper procedures met by a laboratory; the study did not assess laboratory errors. The study’s conclusions were based only on the potential for laboratory errors to occur. In the survey of laboratory directors presented below, we study the quality of genetic testing laboratories, as measured by the level of participation in PT programs, the number of PT deficiencies, the number of incorrect test reports issued and the percent of laboratory directors who cite an analytic error as the laboratory’s most common problem. In addition, we document attitudes of laboratory directors toward current CLIA regulation, the value of a genetic testing specialty and the value of PT in ensuring quality testing. Survey results Overall, 190 laboratory directors responded to our survey (see Box 2 for methodology). They provided information on CLIA certification and specialty, their use of formal or informal PT methods, the effect of PT on laboratory quality, the overall number and type of incorrect test results and their enthusiasm for more stringent oversight of the testing sector. Demographics. Of the 190 respondents, 55% worked in laboratories that perform only clinical testing, 42% in laboratories that offered both clinical and research testing, and 3% in laboratories that perform only research testing, but provided test results to patients and
providers. When respondents worked in a setting that offered research testing, we asked them to consider only the research testing they did that resulted in a report back to the patient or provider. Nearly one in four respondents (23%) was the director of a commercial or independent laboratory, half were at a university or medical school laboratory and 22% were in other hospitals. More than half of the directors (58%)
were PhDs, whereas 22% held an MD or DO degree, 18% were MD/PhDs and the remainder (2%) held another degree. Most directors (77%) reported that their laboratories perform molecular genetic tests, whereas 5% reported that their laboratories perform biochemical genetic tests and 17% reported performing both types. The number of distinct tests offered, the estimated yearly volume of tests performed and other characteristics are found in Table 1.
CLIA certification and specialty. Laboratory directors were asked, “By which organizations is your laboratory accredited or licensed as a molecular or biochemical diagnostic laboratory?” A laboratory was considered CLIA certified if it was accredited by either CLIA or one of three ‘deemed’ accrediting organizations (the Joint Commission on Accreditation of Healthcare Organizations, the College of American Pathologists Laboratory
Box 2 Survey methodology In the absence of a comprehensive directory of US genetic testing laboratory directors, our search strategy for potential participants was designed to cast a wide net and capture as many genetic testing laboratory directors as possible. Survey design. A list of 680 potential participants was compiled using the current GeneTests Clinic Directory1 (n = 226), the Association for Molecular Pathology membership directory15 (n = 274), New York State Department of Health’s online directory of certified biochemical16 and molecular genetic17 testing laboratories (n = 120), laboratories participating in the 2005 National Tay-Sachs and Allied Diseases’ Quality Control Program18 (n = 79), the Canavan Foundation’s laboratory directory19 (n = 91), Washington G-2 Reports’ 2005 Lab Outreach Buyer’s Guide: Providers of Laboratory Outreach Products and Services20 (n = 57) and Veteran Administration hospital laboratories selected from the Veterans Administration website21 (n = 8), as well as potential participants from laboratories identified in Google searches using a variety of search terms (n = 9). Many potential participants appeared on more than one of these lists. All 680 potential participants were mailed an initial invitation to participate in an online survey of genetic testing laboratory directors. This was followed several days later by an e-mail invitation. To be eligible to complete the survey, a potential participant had to identify himself or herself as the director of a molecular or biochemical testing laboratory that reports test results to patients or providers. Potential participants were excluded if they were not laboratory directors, were directors of laboratories that did not provide results to patients or providers or were directors of laboratories that test only for paternity, identity, ancestry, cytogenetics, infectious diseases tissue typing or newborn screening. Up to eight periodic mail, e-mail and phone call reminders were made to nonresponders over a three-month period. Of 680 potential participants, 404 responded. Of these, 199 respondents were ineligible based on the above criteria and were not offered the survey, whereas 190 were eligible and completed the survey. Fifteen additional eligible laboratory directors began the survey but did not complete it, and were excluded from analyses. No response was received from the remaining 276 potential participants. To calculate a response rate among eligible laboratory directors, we estimated the total number of eligible laboratory directors in our list of 680 potential participants by extrapolating the proportion of respondents who were eligible to the 276 nonrespondents22 (Supplementary Methods online). In this way, we estimated that 345 of our potential participants had been eligible for the survey, for a valid response rate of 190/345, or 55%.
A 65-question survey, qualified by the Johns Hopkins University Institutional Review Board as exempt (Application no. NA_00001533), was developed to collect data on the current laboratory practices and opinions of molecular and biochemical genetic testing laboratory directors in the United States. A small pretest was conducted with directors of six genetic testing laboratories, and feedback was incorporated into the final survey instrument. The survey collected data about the laboratory setting, types of testing performed (molecular or biochemical or both; research or clinical or both), the qualifications of the laboratory director, laboratory accreditation and certification, test volume and menu, quality control practices, the nature and frequency of laboratory errors, and PT practices. Knowledge Networks, a survey research firm in Menlo Park, California, administered the Web-based instrument. The data provided to the Genetics and Public Policy Center were anonymized with respect to respondents' identifying information. Potential participants were told that data collected from the survey would be reported only in aggregate, and that analyses would not identify any particular laboratory or director. An incentive in the form of a $25 donation to one of four organizations (College of American Pathologists Foundation, American College of Medical Genetics Foundation, American Red Cross or America's Second Harvest) was offered in exchange for a laboratory director's participation.

Survey analyses. Analyses included examination of the relationship between laboratory characteristics and the level of participation in both formal and informal PT programs, the number of deficiencies reported in formal PT programs, the number of incorrect test reports and the types of errors observed. Data on annual laboratory test volume were collected by asking respondents to choose a range corresponding to the number of biochemical genetic tests and the number of molecular genetic tests the laboratory performs in a year (ranges provided for both questions were 0, 1–249, 250–999, 1,000–4,999, 5,000–9,999, 10,000–14,999, ≥15,000). To create an estimate of the total annual genetic test volume for a given laboratory, we added the midpoint of the range for the number of molecular tests to the midpoint of the range for the number of biochemical tests. These sums fell into four clusters, resulting in categories of 1–1,999, 2,000–5,999, 6,000–14,999 and ≥15,000 total tests performed annually. Observations based on these ranges should be interpreted with the understanding that they are estimates of laboratory volume. To assess the relationship between survey variables, we implemented general linear, Poisson and logistic regression models using SAS version 9.1. Key variables used in regression are listed in column 1 of Table 1.
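A sketch of the volume-estimation step just described (range midpoints and category boundaries are from the text; the value used for the open-ended ≥15,000 range is our assumption, since the authors do not state how they handled it):

```python
# Midpoints of the reported ranges; the open-ended top range has no midpoint,
# so the 15,000 used here is our assumption, not a choice stated by the authors.
MIDPOINTS = {
    "0": 0, "1-249": 125, "250-999": 624.5, "1,000-4,999": 2999.5,
    "5,000-9,999": 7499.5, "10,000-14,999": 12499.5, ">=15,000": 15000,
}

def volume_category(molecular_range, biochemical_range):
    """Sum the two midpoints and map the total to the four reported clusters."""
    total = MIDPOINTS[molecular_range] + MIDPOINTS[biochemical_range]
    if total < 2000:
        return "1-1,999"
    if total < 6000:
        return "2,000-5,999"
    if total < 15000:
        return "6,000-14,999"
    return ">=15,000"

print(volume_category("1,000-4,999", "250-999"))  # -> "2,000-5,999"
```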
Accreditation Program, the Commission on Office Laboratory Accreditation), or held a New York State clinical laboratory permit. Ninety-five percent of respondents indicated that their laboratory was CLIA certified (Table 1). All of the laboratories that were not CLIA certified were low volume laboratories that process <2,000 tests yearly; 86% of these low volume laboratories were CLIA certified, compared to 100% of laboratories that perform ≥2,000 tests annually (p = 0.0001). Additionally, certification rates increased significantly as the menu of different tests offered increased (p = 0.006). The majority of laboratories that performed only research testing and reported patient-specific results were not CLIA certified.

Nearly all CLIA-certified laboratories (97%) were certified for high complexity testing. However, 16% reported no specialty area certification. Approximately a third of laboratories with the highest test volumes (35%) and largest test menus (29%) reported having no specialty certification (Table 1). Among CLIA-certified laboratories, 41% were certified in a single specialty area, and 43% listed multiple specialties. The most common specialty certifications were pathology (48%), chemistry (46%) and clinical cytogenetics (41%).

Table 1  Extent of CLIA certification, specialty certification and proficiency testing among laboratories. "No specialty" is the percent of CLIA-certified labs with no specialty certification; the five right-hand columns give the percent of tests subjected to proficiency testing (formal or informal)*.

| Type of laboratory | N | % | CLIA certified (%) | No specialty (%) | 0–24% | 25–74% | 75–99% | 100% | <100% |
|---|---|---|---|---|---|---|---|---|---|
| All respondents | 190 | 100 | 95 | 16 | 8 | 8 | 18 | 65 | 35 |
| Clinical or research testing | | | | | | | | | |
| Clinical only | 104 | 55 | 98 | 13 | 6 | 7 | 19 | 68 | 32 |
| Clinical and research | 80 | 42 | 98 | 19 | 9 | 10 | 19 | 63 | 37 |
| Research only | 6 | 3 | 17 | 100 | 33 | 17 | 0 | 50 | 50 |
| Setting | | | | | | | | | |
| Commercial | 43 | 23 | 98 | 25 | 7 | 12 | 12 | 70 | 30 |
| Univ./medical school | 101 | 53 | 92 | 17 | 10 | 7 | 23 | 60 | 40 |
| Other hospital | 46 | 24 | 100 | 7 | 4 | 9 | 15 | 72 | 28 |
| Director's education (one missing) | | | | | | | | | |
| MD or DO | 41 | 22 | 98 | 6 | 10 | 10 | 20 | 61 | 39 |
| PhD | 119 | 58 | 95 | 23 | 7 | 9 | 17 | 66 | 34 |
| MD/PhD | 34 | 18 | 97 | 7 | 6 | 6 | 21 | 68 | 32 |
| Other | 5 | 3 | 80 | 0 | 20 | 0 | 20 | 60 | 40 |
| Estimate of annual test volume (one missing) | | | | | | | | | |
| 1–1,999 | 65 | 34 | 86 | 13 | 15 | 8 | 9 | 68 | 32 |
| 2,000–5,999 | 71 | 38 | 100 | 18 | 4 | 7 | 25 | 63 | 36 |
| 6,000–14,999 | 35 | 19 | 100 | 9 | 6 | 6 | 17 | 71 | 29 |
| 15,000+ | 18 | 10 | 100 | 35 | 0 | 22 | 22 | 56 | 44 |
| Number of distinct tests offered (one missing) | | | | | | | | | |
| 1–4 | 45 | 24 | 87 | 7 | 13 | 4 | 4 | 78 | 21 |
| 5–19 | 77 | 41 | 96 | 8 | 7 | 8 | 21 | 65 | 35 |
| 20+ | 67 | 35 | 100 | 29 | 6 | 12 | 25 | 57 | 43 |
| Molecular or biochemical testing | | | | | | | | | |
| Molecular | 147 | 77 | 94 | 20 | 7 | 6 | 17 | 70 | 30 |
| Biochemical | 10 | 5 | 100 | 20 | 30 | 20 | 10 | 40 | 60 |
| Both | 33 | 17 | 100 | 0 | 6 | 15 | 27 | 52 | 48 |

*Row totals may not add up to 100% due to rounding.

Participation in formal PT. All respondents were asked, "Does your laboratory participate in a formal external proficiency testing program?" Two-thirds of directors said their laboratory participated in "all available formal external proficiency testing programs," whereas 17% said, "Yes, for some formal, external proficiency testing programs." Sixteen percent indicated they do not participate in any formal PT programs. Significantly more laboratory directors at university sites (66%, p = 0.03) and other hospitals (82%, p = 0.01) than commercial laboratory directors (56%) reported using all of the formal external PT programs available to them, after excluding directors who said no formal programs were available for the tests they offer (n = 19).

The 43 directors who responded either that their laboratory did not participate in formal external PT programs or that their laboratory participated in only some formal programs were asked to select up to five possible reasons for their nonparticipation. Sixty-three percent of respondents indicated they did
not participate because of “the lack of availability of formal testing programs.” Another 17% responded that internal PT is adequate. Very few laboratory directors responded that “formal external proficiency testing is too expensive” (7%) or “formal external proficiency testing does not provide timely feedback” (2%). Twenty-four percent selected “another reason” in response to this question and were provided the opportunity to type in their response. Other reasons provided were that the laboratory was for research or teaching purposes only, that the diseases tested for were too rare, that a formal testing program for rare diseases was being established or that other informal means of PT were used. Some respondents indicated that they would participate if PT programs were available. Use of informal PT methods. For tests where no formal external proficiency test is available, CLIA requires that the laboratory “have a system for verifying the accuracy of the test result at least twice a year.” All respondents were asked, “When a formal external proficiency testing program is not available, does your laboratory perform proficiency testing using some other
mechanism?" A majority of respondents said "yes" for all (77%) or some (15%) tests, whereas 8% said "no." Respondents whose laboratories offer 1–4 different tests were twice as likely as those offering a larger menu of tests to say that they used no additional informal PT methods (16% versus 8%, p = 0.02). Half of the laboratories that perform only research testing used no informal PT methods.

Respondents (n = 42) whose laboratories did not always perform informal PT on tests when no external program was available were asked "Which of the following, if any, are reasons your lab does not perform proficiency testing using some other mechanism when a formal program does not exist?" The most common response (53%) was "We use competency testing to document our laboratory proficiency." Forty percent answered, "We are the sole source of the test"; 21% said, "Our test volume is too low to justify developing a proficiency testing program"; and 3% said, "Proficiency testing is not necessary for the types of tests we perform."

Overall extent of PT use. We also asked respondents, "For what percentage of the genetic
tests offered by your laboratory do you conduct some sort of proficiency testing?" More than one-third of respondents (35%) offered some genetic tests for which they perform no PT at all, including 8% who conducted either formal or informal PT on less than a quarter of the tests they offer (Table 1). Three percent conduct no PT for any of their tests. Nearly two-thirds of participants (65%) said that their laboratory performs either formal or informal PT on every test offered (Table 1). After adjusting for key variables, laboratories that perform only molecular genetic tests were significantly more likely to complete either formal or informal PT on all their tests, compared to laboratories performing any biochemical genetic tests (70% versus 49%, p = 0.006). Additionally, the smaller the menu of tests offered, the more likely laboratories were to perform some type of PT on all of their tests (p = 0.02). No significant differences in any of the other key variables modeled were noted with respect to the extent of PT employed.

Influence of PT on laboratory test quality. A laboratory participating in a formal external proficiency program is given a deficiency if the
laboratory is unable to ascertain and report the correct test results in a timely manner. Among laboratories that participate in formal external proficiency programs (n = 159), 78% reported that their laboratory had no deficiencies over the past two years, 16% reported one deficiency during that period and 7% reported two or more. Table 2 shows that as the percentage of tests on which formal or informal PT is done in a laboratory increased, the number of formal deficiencies decreased. In addition, laboratories that do not perform formal or informal PT on all of their tests were eight times as likely to report multiple deficiencies (16% versus 2%, p = 0.001). After adjusting for key variables, the percentage of tests on which formal or informal PT is done was the strongest predictor of the number of formal PT deficiencies reported over the past two years (p = 0.004), that is, the number of deficiencies decreased with increasing use of PT. After adjusting for extent of PT participation, only annual test volume was significantly related to the number of reported PT deficiencies. Laboratories that performed >2,000 tests annually reported significantly
more PT deficiencies than low volume laboratories. Findings did not differ when the six directors whose laboratories perform only research testing were excluded.

Table 2  Frequency of proficiency test deficiencies and incorrect test reports issued. Columns 4–6: "How many times in the past 2 years has your lab been found to be deficient in any way on a formal external proficiency test?" Columns 7–9: "What is your best estimate of how many incorrect test reports were issued by your lab in the past 2 years?" All values are percentages.

| Type of laboratory | N | % | Never | 1 time | 2+ times | None | 1–3 | 4+ |
|---|---|---|---|---|---|---|---|---|
| All respondents | 190 | 100 | 78 | 16 | 7 | 28 | 37 | 35 |
| Clinical or research testing | | | | | | | | |
| Clinical only | 104 | 55 | 79 | 17 | 3 | 23 | 38 | 39 |
| Clinical and research | 80 | 42 | 75 | 13 | 12 | 29 | 38 | 32 |
| Research only | 6 | 3 | — | — | — | 83 | 17 | 0 |
| Setting | | | | | | | | |
| Commercial | 43 | 23 | 77 | 17 | 6 | 28 | 25 | 48 |
| Univ./medical school | 101 | 53 | 76 | 18 | 6 | 27 | 44 | 29 |
| Other hospital | 46 | 24 | 83 | 10 | 7 | 30 | 34 | 36 |
| Estimate of annual test volume | | | | | | | | |
| 1–1,999 | 65 | 34 | 86 | 12 | 2 | 41 | 44 | 15 |
| 2,000–5,999 | 71 | 38 | 73 | 18 | 10 | 25 | 45 | 30 |
| 6,000–14,999 | 35 | 19 | 82 | 15 | 3 | 18 | 21 | 61 |
| 15,000+ | 18 | 10 | 71 | 18 | 12 | 12 | 12 | 77 |
| Number of distinct tests offered | | | | | | | | |
| 1–4 | 45 | 24 | 79 | 14 | 7 | 48 | 35 | 18 |
| 5–19 | 77 | 41 | 79 | 16 | 6 | 25 | 44 | 32 |
| 20+ | 67 | 35 | 76 | 16 | 7 | 19 | 32 | 49 |
| Percent of tests subjected to proficiency testing (formal or informal) | | | | | | | | |
| 0–24% | 15 | 8 | 50a | 33a | 17a | 54 | 31 | 15 |
| 25–74% | 16 | 8 | 67 | 8 | 25 | 27 | 33 | 40 |
| 75–99% | 35 | 18 | 70 | 18 | 12 | 15 | 42 | 42 |
| 100% | 124 | 65 | 84 | 14 | 2 | 28 | 37 | 35 |
| <100% | 66 | 35 | 67 | 16 | 16 | 26 | 38 | 36 |
| Number of PT deficiencies in the past 2 years (35 missing) | | | | | | | | |
| None | 121 | 78 | — | — | — | 28 | 36 | 35 |
| 1 | 24 | 15 | — | — | — | 9 | 48 | 43 |
| 2+ | 10 | 7 | — | — | — | 0 | 40 | 60 |

Row totals may not add up to 100% because of rounding. aThis category excludes those performing no PT, because they cannot have PT errors.

Incorrect test results. All respondents were asked to provide their best estimate of how many incorrect test reports were issued to patients or providers by their laboratory over the past two years (Table 2). Among respondents (n = 177), 28% said no incorrect test reports had been issued by their laboratory during that period, 37% reported between one and three incorrect reports and 35% reported four or more incorrect reports. The average number of incorrect reports reported over the past two years was 5.1. Not surprisingly, the number of incorrect test reports increased significantly with the volume of testing (p < 0.0001). However, adjusting for key variables, the number of incorrect test reports detected also increased significantly
with the number of deficient proficiency tests in the same period. A 20% increase in the number of incorrect test reports is associated with each additional PT deficiency (p = 0.03). This finding did not differ when laboratories that perform only research testing were excluded. Types of laboratory errors reported. All respondents were provided a list of seventeen types of laboratory errors and asked to indicate which had been observed in their laboratory over the last two years. Respondents were then asked to select the most common type of error seen in their laboratory. These were grouped into pre-analytic, analytic and post-analytic errors (Table 3). The most commonly observed errors occurred during the pre-analytic phase of testing; 45% of the most common errors were pre-analytic, 30% were analytic and 24% were post-analytic. Adjusting for key variables, the strongest predictor of whether the most com-
monly observed error occurred during the analytic phase of testing was annual testing volume. Lower-volume laboratories were more likely than those in higher-volume laboratories to identify an analytic error as the most common error (p = 0.03). The second strongest predictor of whether a laboratory's most common error was analytic was the percentage of tests on which formal or informal PT is performed (Table 4). The odds that the most common error was analytic increased 40% with each decrease in level of PT completed (p = 0.06, Table 4). When analysis was restricted to laboratories that complete ≥2,000 tests annually, those that do not perform some form of PT on all of their genetic tests were significantly more likely than those who complete PT on all tests to identify an analytic error as the most common type (p = 0.02). This finding did not differ when the laboratories that perform only research testing were excluded.

Table 3  Type and frequency of laboratory errors. Column 3: percent of directors who reported detecting this type of error during the past two years; column 4: percent naming it the most common type of error over the past 2 years.

| Test phase | Error | Detected (%) | Most common (%) |
|---|---|---|---|
| Pre-analytic | Referrer ordered incorrect test | 74 | 27 |
| Pre-analytic | Referrer labeled specimen incorrectly | 68 | 10 |
| Pre-analytic | Contamination before receipt by laboratory | 19 | 4 |
| Pre-analytic | Transcription error at specimen receipt | 32 | 2 |
| Pre-analytic | Sample switch at specimen receipt | 16 | 2 |
| Pre-analytic | Error in written protocol | 7 | 1 |
| Pre-analytic | Patient's transfusion not reported by referrer | 13 | 0 |
| | Total pre-analytic | | 45 |
| Analytic | Faulty reagent | 52 | 13 |
| Analytic | Equipment failure | 52 | 11 |
| Analytic | Human error in data analysis | 44 | 3 |
| Analytic | Contamination during specimen testing | 18 | 2 |
| Analytic | Sample switch during specimen testing | 27 | 1 |
| | Total analytic | | 30 |
| Post-analytic | Typographical error on test report | 55 | 17 |
| Post-analytic | Data transcription error | 42 | 5 |
| Post-analytic | Misinterpretation of data | 19 | 1 |
| Post-analytic | Wrong results reported to patient/provider | 20 | 1 |
| Post-analytic | Software error in data analysis | 8 | 0 |
| | Total post-analytic | | 24 |
| Other | Other | 4 | 1 |

Laboratory directors' attitudes. A majority of respondents (73%) agreed or strongly agreed that "CLIA should create a genetic testing specialty for molecular and biochemical tests." Directors of laboratories that perform testing for both clinical and research purposes showed somewhat greater approval for a new specialty (79%) than directors of laboratories that provide only clinical genetic testing (66%, p = 0.07). There was no difference in support for a new CLIA specialty based on setting (commercial, academic, other), test volume or on the type of testing performed.

Sixty percent of respondents found PT to be "very useful" to "improve the quality of genetic testing performed by the laboratory industry" and another 32% said PT was "somewhat useful." The perceived value of PT was similarly high in both clinical and research laboratories, in laboratories that do and do not perform biochemical testing, and across laboratory settings and annual test volume. Respondents whose laboratories conducted some type of PT on fewer than half their tests also showed high support for PT in general: 47% said it was very useful and 40% said it was somewhat useful (p = 0.35).

Discussion
Results of this survey indicate that participation in PT—whether through a formal program or through other measures—has a clear association with laboratory quality as measured by the number of reported deficiencies and the frequency of reported analytic errors. In this survey, the number of reported deficiencies decreased as the percentage of tests for which any PT was performed increased. In addition, the number of incorrect test reports
VOLUME 24 NUMBER 9 SEPTEMBER 2006 NATURE BIOTECHNOLOGY
F E AT U R E
increased 20% with each additional reported deficiency. Furthermore, laboratories that perform PT on a lower percentage of tests were more likely to report that their most common error occurred during the analytic phase of testing, which is the phase of testing that PT is intended to evaluate.

Table 4  Relationship between extent of proficiency testing and type of most common error

| Percent of tests on which some PT is done | Pre-analytic (%) | Analytic (%) | Post-analytic (%) |
|---|---|---|---|
| 0–24 | 29 | 50 | 21 |
| 25–74 | 53 | 33 | 13 |
| 75–99 | 37 | 34 | 29 |
| 100 | 48 | 26 | 26 |
| All respondents | 45 | 30 | 24 |

Row totals may not add up to 100% because of rounding.

A limitation of our study stems from the fact that there are no comprehensive baseline data describing the numbers, types and sizes of genetic testing laboratories in the United States that would allow us to determine whether the study sample is representative. Therefore, the extrapolation of the results to the universe of US genetic testing laboratories should be made with some caution. Respondents may over-represent laboratory directors with strong opinions, or under-represent those reluctant to share information about their attitudes or practices. In addition, because we collected data regarding the annual volume of tests and the size of the test menu as ranges (e.g., 250–999 test requisitions per year), we could not completely account for the effect of differences in volume and menu size on respondents' answers to other questions.

The significant rates of nonparticipation in PT reported by directors of laboratories of all sizes demonstrate that merely being certified under CLIA is insufficient to ensure quality: nearly a third of respondents reported that their laboratories perform PT for only some tests they offer. Mandating participation in PT (formal or informal) would increase the number of laboratories performing PT and thereby enhance the quality of genetic testing.

Genetic testing has become an increasingly integral component in the diagnosis, treatment, management and prevention of numerous diseases and conditions. Information gained from genetic test results can have a significant impact on medical decision making. Incorrect test results stemming from laboratory errors can lead to misdiagnosis, inappropriate and/or delayed treatment, anxiety and, in rare cases, even death. Thus, it is critical that mechanisms are in place to detect and reduce laboratory errors and to ensure that the laboratories performing genetic testing are of high quality.
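For readers unfamiliar with how a Poisson model (Box 2) yields the "20% per deficiency" figure cited above, the sketch below shows the multiplicative interpretation. The coefficient is back-calculated from the reported rate ratio, not taken from the authors' model output, and the baseline count is illustrative only:

```python
import math

# A Poisson regression models log(expected incorrect reports) as linear in the
# number of PT deficiencies, so each extra deficiency multiplies the expected
# count by exp(beta). The reported "20% increase" corresponds to a rate ratio
# of 1.2; the baseline below is illustrative (the survey's overall two-year
# average was 5.1 incorrect reports), not a fitted intercept.
rate_ratio = 1.2
beta = math.log(rate_ratio)   # ~0.18 on the log scale

baseline = 5.1
for deficiencies in range(4):
    expected = baseline * math.exp(beta * deficiencies)
    print(f"{deficiencies} PT deficiencies -> ~{expected:.1f} expected incorrect reports")
```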
Since the mid-1990s a number of federal government advisory groups have questioned the adequacy of US regulatory oversight of both genetic tests and the laboratories performing them. In 2000, the US Centers for Disease Control recommended that the Centers for Medicare and Medicaid Services (CMS), the agency that oversees CLIA, create a genetic testing specialty area under CLIA11. Nearly three out of four respondents to this survey approved of such a measure. To date, CMS has not issued a rule for the creation of a genetic testing specialty. Although the US Department of Health and Human Services placed the issuance of a proposed rule for a genetic testing specialty on its regulatory agenda12 in April, with a target publication date of November 2006, more recent statements by CMS officials indicate the agency believes a specialty is not needed. In enacting CLIA, the US Congress stated that PT “should be the central element in determining a laboratory’s competence, as it provides a measure of actual performance on laboratory test procedures rather than only gauging the potential for accurate outcomes”13. The importance of PT in evaluating and monitoring laboratory quality is underscored by the fact that errors can be difficult to detect, and self-reported error rates may not accurately reflect the actual occurrence of errors in the laboratory or the quality of the laboratory. A laboratory may be making errors but not have mechanisms in place to detect them, whereas another laboratory may rarely make errors but detect them more often as a result of redundant checks and balances that have been instituted in the laboratory. Thus, PT is a useful and objective means of evaluating a laboratory’s ability to get the correct test result and to identify potential sources of error. Creation of a genetic testing specialty under CLIA by CMS is a prerequisite to mandating enrollment in specified, CLIA-approved PT programs for genetic testing laboratories. In the absence of CLIA-approved PT programs, laboratories have adopted different practices with regard to PT. Some laboratories enroll in all available formal PT programs, whereas
others do not. When a formal external PT program is not available, some laboratories seek to comply with CLIA's general requirement to ensure accuracy through alternative PT methods, whereas others do not. Lack of availability of formal PT programs was a key reason cited by respondents for failure to perform PT. In the absence of formal PT programs, some laboratory directors use competency testing as a means to assess proficiency. However, competency testing is not a comparable substitute as it assesses an individual laboratory employee's performance and not the actual ability of a laboratory to get the correct test result.

In a recent US Senate hearing, CMS stated that genetic tests are adequately covered by other specialties14. However, the survey data show 16% are not certified in any specialty, including one-third of high volume laboratories. Furthermore, the most common specialty certifications held by genetic testing laboratories have questionable relevance to establishing quality for genetic testing.

Establishing additional formal PT programs for genetic testing laboratories and requiring enrollment as a condition of CLIA certification would require additional resources. Even so, more than nine out of ten laboratory directors surveyed regard PT as useful for improving the quality of the genetic testing performed by the laboratory industry and almost no one said cost is a driver of nonparticipation in programs. Furthermore, a majority of laboratory directors support creation of a genetic testing specialty under CLIA. Given these observations, and the demonstrated association between PT and laboratory quality, we conclude that the creation of a genetic testing specialty and the associated requirement to enroll in specified CLIA-approved PT programs would improve the quality of genetic testing laboratories.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
The Genetics and Public Policy Center is supported at Johns Hopkins University by The Pew Charitable Trusts. The opinions expressed in this report are those of the authors and do not necessarily reflect the views of The Pew Charitable Trusts. The authors are grateful to Linda Bradley, Michele Caggana, Wayne Grody and Michele Schoonmaker for their helpful review of an earlier draft of this manuscript, and to GeneTests for providing their Clinic Directory.

1. GeneTests. http://www.genetests.org/
2. Kan, Y.W. et al. Polymorphism of DNA sequence adjacent to human β-globin structural gene: relationship to sickle mutation. Proc. Natl. Acad. Sci. USA 75, 5631–5635 (1978).
3. Frost & Sullivan. U.S. Genetic Diagnostics Markets, F463–552 (2005).
4. United States Code, Title 42, Section 263(a).
5. Hofgartner, W.T. & Tait, J.T. Frequency of problems during clinical molecular genetic testing. Am. J. Clin. Pathol. 112, 14–21 (1999).
6. McGovern, M.M. et al. Quality assurance in molecular genetic testing laboratories. J. Am. Med. Assoc. 281, 835–840 (1999).
7. Bonini, P. et al. Errors in laboratory medicine. Clin. Chem. 48, 691–698 (2002).
8. Witte, D.L. et al. Errors, mistakes, blunders, outliers, or unacceptable results: how many? Clin. Chem. 43, 1352–1356 (1997).
9. Howanitz, P.J. Errors in laboratory medicine: practical lessons to improve patient safety. Arch. Pathol. Lab. Med. 129, 1252–1261 (2005).
10. Hollensead, S.C. et al. Errors in pathology and laboratory medicine: consequences and prevention. J. Surg. Oncol. 88, 161–181 (2004).
11. Federal Register, vol. 65, p. 25,928, May 4, 2000.
12. Federal Register, vol. 71, p. 22,595, April 24, 2006.
13. H.R. Rep. No. 100-899 (1988).
14. At Home DNA Tests: Marketing Scam or Medical Breakthrough? (Testimony of Thomas Hamilton, Director, Survey and Certification Group, Centers for Medicare and Medicaid Services) before the Senate Special Committee on Aging, 109th Cong. (2006).
15. Association for Molecular Pathology (AMP). Membership Directory (AMP, Rockville, MD, 2005).
16. New York State Department of Health. Database of clinical laboratories currently holding a New York State Department of Health permit in the specified category of testing (2005). Genetic testing/biochemistry. http://www.wadsworth.org/labcert/clep/CategoryPermitLinks/CategoryListing.
17. New York State Department of Health. Database of clinical laboratories currently holding a New York State Department of Health permit in the specified category of testing (2005). Genetic testing/molecular. http://www.wadsworth.org/labcert/clep/CategoryPermitLinks/CategoryListing.htm
18. National Tay-Sachs and Allied Diseases Association. 2005 Directory: NTSAD Quality Control Program Participating Laboratories (2005). http://www.ntsad.org/pages%5Cqclabs2005.htm
19. Canavan Foundation. Canavan Foundation Directory of Testing Centers (2005). http://www.canavanfoundation.org/screening.php
20. Washington G-2 Reports. Lab Outreach Buyer's Guide: Providers of Laboratory Outreach Products and Services (Washington G-2 Publications, New York, 2005).
21. U.S. Department of Veterans Affairs. Facilities Locator and Directory (2005). http://www1.va.gov/directory/guide/rpt_fac_list.cfm
22. American Association for Public Opinion Research. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, ed. 4 (AAPOR, Lenexa, Kansas, 2006).
PATENTS
Evidence and anecdotes: an analysis of human gene patenting controversies
Timothy Caulfield, Robert M Cook-Deegan, F Scott Kieff & John P Walsh
When it comes to gene patenting, policy makers may be responding more to high-profile media controversies than to systematic data about the issues.
Gene patenting has attracted intense scrutiny for decades, raising a host of ethical, legal and economic concerns. Much of the policy debate has focused on seemingly quantifiable and practical concerns about the effect of patents on access to useful technologies in the contexts of both research and the clinic. Here, we summarize the dominant policy concerns and the events that have motivated these debates. We then reflect on what the evidence now says about the major concerns articulated in policy reports. We conclude by discussing what might explain some of the disparity between the empirical evidence and the policy focus.

Although policymakers and advisory groups have long recognized the moral and ethical concerns associated with human gene patents1–3, such concerns have only rarely led to concrete proposals for reform4. A systematic review of the content and timing of major policy documents highlights the fact that policy activity has been largely stimulated by a convergence of a general social unease, the emergence of preliminary data and literature on the possible adverse practical ramifications of gene patents, and several high-profile patent protection controversies.
Timothy Caulfield is at the Health Law Institute, University of Alberta, Canada; Robert M. Cook-Deegan is at the IGSP Center for Genome Ethics, Law & Policy, Sanford Institute of Public Policy and Duke University School of Medicine, Durham, North Carolina, USA; F. Scott Kieff is at the Washington University School of Law and the Hoover Institution, Stanford University, Stanford, California, USA; and John P. Walsh is at the School of Public Policy, Georgia Institute of Technology, Atlanta, Georgia, USA. e-mail: [email protected]
The timing of the policy activity reflects this tendency. The recommendations for diagnostic-use licensing, for example, followed the international controversy associated with Myriad Genetics' decision to enforce the patents over the BRCA1 and BRCA2 mutations5 (Fig. 1). There have been other gene patenting controversies, such as the furor over patents related to Canavan disease, or the attempt by the US National Institutes of Health in the early 1990s to patent over 7,000 expressed sequence tags (ESTs)6. The mid-1990s was also a period of rapid (roughly 50% per annum) growth in DNA-related patents in the United States7. Internationally, however, the Myriad controversy coincides with the most policy activity. Indeed, as Figure 2 (and Supplementary Data online) shows, the Myriad Genetics–BRCA1/2 story is, by far, the most referenced patent controversy in the policy documents we reviewed.

These controversial gene-patenting stories raised several concerns in the academic and policy literature. A prominent concern was that of a "tragedy of the anticommons," or the possibility that the large number of patents on genes and their diverse set of owners would make it difficult to acquire the rights to all necessary research inputs, which could, in turn, result in the underuse of valuable technologies8. Second is the longstanding concern that the owners of patents on fundamental technologies will exercise their rights to exclude in ways that will prevent others from developing or accessing the technology9–11. The Myriad case was held out as an example and as a harbinger of the coming problems associated with human gene patents5. Such restrictions on access to patented genes were viewed as especially pernicious given a belief that such patents could not be invented around, because of the unique role that genes play in biological processes.
A closely related concern was that the strong commercial incentive built into recent policy changes, and the associated pro-commercial milieu in universities, were undermining the norms of open science12,13, leading researchers to be more secretive about their ongoing research, to delay publication of results, and to be less likely to share research materials or data. These behaviors, it was held, would retard the progress of science and technology. Starting around 2001, this literature, together with the Myriad Genetics controversy and similar ones, began to stimulate significant policy activity (Fig. 1). In Canada, an Ontario government report recommended a variety of reforms, including strengthening the research exemption and revising the compulsory licensing provisions in the Patent Act to create an exemption for genetic diagnostic and screening tests14. The UK’s Nuffield Council on Bioethics made similar recommendations2. In the United States, the National Academy of Sciences issued two reports7,15, both of which recommended a research exemption as a means of dealing with the anticommons and restricted access problems. These reports were clearly influenced by emerging empirical evidence about the effects of gene patents on genetic testing services16,17 and the Myriad controversy (the production of the Ontario report immediately followed the eruption of controversy over Myriad’s patent in Canada, and the Nuffield Council and the National Academy’s 2005 report both used Myriad as a case study)18.
Reflecting on the evidence
With the passage of time and the accumulation of more data, we can now reflect on what the available data do and do not say about the anecdotes, theories and initial evidence that spurred so much policy activity. Indeed, the policy
debates around these concerns have both led to and been informed by a number of empirical studies designed to find out where and to what extent each of these concerns is manifest in the practice of biomedical research. The results of these empirical efforts have been fairly consistent. First, the effects predicted by the anticommons problem are not borne out in the available data. The effects are much less prevalent than would be expected if its hypothesized mechanisms were in fact operating. The data do show a large number of patents associated with genes. A recent study found that nearly 20% of human genes were associated with at least one US patent, and many had multiple patents19. Another study estimated that in the United States over 3,000 new DNA-related patents have issued every year since 1998, and more than 40,000 such patents have been granted7. But despite the large number of patents and the numerous, heterogeneous actors—including large pharmaceutical firms, biotech startups,
universities and governments—studies that have examined the incidence of anticommons problems find them relatively uncommon20–24. These studies span both academics and industry, and include data from the United States, Germany, Australia and Japan. Studies on access to upstream research tools find that although some researchers or firms are denied access to a particular technology, others do have access to the same technology, suggesting that the resulting limitations have more to do with a willingness to accept the market price and access terms25,26. Similarly, among academic biomedical researchers in the United States, only 1% report having had to delay a project and none having abandoned a project as a result of others’ patents, suggesting that neither anticommons nor restrictions on access were seriously limiting academic research21—despite the fact that these researchers operate in a patent-dense environment, without the benefit of a clear research exemption.
One important exception is in the area of gene patents that cover a diagnostic test. Here, there are more instances of researchers and firms claiming that the patent owner is asserting exclusivity or license terms that are widely viewed as inappropriate, thus lending some empirical evidence to support the concerns highlighted by the Myriad Genetics story. For example, 30% of clinical labs report not developing or abandoning testing for the HFE gene after the patent issued17. In addition, 25% of labs had abandoned one or more genetic tests as a result of patents, with Myriad’s patents among the most frequently mentioned27. Such unlicensed lab testing, from the perspective of the patent owner, competes with its commercial activity, and hence it is not surprising to find owners asserting their rights. There is also substantial empirical evidence that university researchers are becoming more secretive and less willing to share research results or materials28–32. The causes of this secrecy, however, are still in dispute. In particular, we cannot determine the impact of patents themselves on secrecy, in part because many studies of academic secrecy28,32 use composite measures and, as a result, it is difficult to tease out specific causes thereof. Some studies find that patents per se have little effect on discussion of ongoing research or on sharing of research materials21,29. In contrast, several studies have found that commercial activity, as well as scientific competition and the cost and effort involved in sharing, all have negative effects on open science21,28,32. Industry funding is also often associated with delayed publication29,33,34. This failure to share research materials seems to have a negative impact on research. For example, Walsh et al. find that 19% of recent requests were not fulfilled (and that failures to supply materials are increasing), and that at least 8% of respondents had a project delayed owing to an inability to get timely access to research materials (compared to 1% who were delayed by an inability to get a patent license)21. Finally, some studies show reduced citations of publications once a corresponding patent is granted35,36. However, the causes and implications of such a relationship are unclear. In particular, is this a result of a change in research practices or simply of citation practices (that is, an unwillingness to announce infringement in print)? Even if it is the former, does this simply reflect changing incentives causing a shift by researchers (especially industry researchers) toward less encumbered research areas? The overall social welfare implications of this redirection are also uncertain, as there are both the potential loss from fewer people working on a problem and a potential gain from a more diverse research portfolio36,37.
Figure 1 Timeline of gene patenting cases, decisions and studies, and corresponding significant policy activity (refs. 45–56). (Timeline spanning 1995/1996 to 2006; entries are coded as data sources, policy reports, guidelines, and cases, grants and related events.)
Figure 2 Explicit references to controversial biotechnology patents and firms in major policy documents after 2002. (Bar chart; the controversies shown include Myriad Genetics, BRCA1/2, Cellpro, HFE, Canavan, Huntington, FAP, Factor V Leiden, APOE, Fragile X, CMT-1A/CMT-X, myotonic dystrophy, SCA1/SCA2/SCA3/SCA6 and D/B dystrophy, with Myriad Genetics and BRCA1/2 receiving by far the most references.)
Analyzing the concerns, evidence and anecdotes
The survey of policy reports reveals that the Myriad Genetics controversy was used as a primary tool for justifying patent reform—thus highlighting the potential of a single high-profile controversy to mobilize both governmental and non-governmental policy makers. In Belgium, for instance, the controversy directly incited the adoption of a research exemption38. There were certainly other gene patenting controversies that might have been used in a similar fashion, but it was the Myriad case that emerged as emblematic of the fear that patents on human genetic material would have an adverse impact on access to useful technologies, both for research and for clinical use. This is most likely because the controversy, more than any other, resonated so well with the theoretical concerns that existed in the literature. In addition, the clinical consequences were easy
to understand and highly visible breast cancer constituencies were engaged. Although the available evidence suggests that the concerns associated with the Myriad case have merit in the context of diagnostic tests, the data are hardly definitive, and empirical research suggests that data about diagnostics cannot be generalized to other uses. Furthermore, five years later, there have been few similar gene patent controversies. One possibility is that the Myriad story has become a cautionary tale for the holders of similar gene patents, guiding them toward more constructive patent enforcement strategies. The evidence regarding the anticommons and restricted access concerns is clearer. The empirical research suggests that the fears of widespread anticommons effects that block the use of upstream discoveries have largely not materialized. The reasons for this are numerous and are often straightforward matters of basic economics39. In addition to licensing being widely available40, researchers make use of a variety of strategies to develop working solutions to the problem of access, including inventing around, going offshore, challenging questionable patents and using technology without a license. Though it has been suggested that this latter strategy is an inappropriate and unstable policy15,41, it is important to remember that the stability of this unlicensed use is supported by a combination of the difficulty of enforcing patents owing to the secrecy of research programs, costs of lost goodwill among researchers, costs of litigation, the relatively small damages to be collected from blocking research use, and the interest of the patent owner in allowing
research advances in most cases. An anticommons or restricted access–type failure requires not that any one strategy be unavailable, but that the entire suite be simultaneously ineffective, which may explain why, empirically, such failures are much less common than was first posited. Finally, the data concerning the increasing secrecy of university researchers seem to indicate that there may be a conflation of patenting and commercial and/or scientific competition as the cause of this trend. It appears that academic researchers are becoming more secretive, but that is not shown to be attributable to the patenting process, suggesting that the solution might not reside in modifying patent policy. Some have suggested tempering the commercial orientation of faculty and facilitating the flow of research materials42,43. Another approach might be recognizing the inherently competitive nature of the academic process44 and developing additional and improved mechanisms for exchange among its members.
Conclusions
Looking back on years of policy debates and the associated empirical work on gene patents, what lessons can be drawn? First, although there may have been good reasons for concern, the feared problems have not widely manifested. And the problems that the data do reveal may have less to do with patents than with commercial concerns, scientific competition and frictions in sharing physical materials. Second, despite the growing acknowledgment of this empirical work, there is still a tendency to recommend policy interventions, usually
including a ‘research exemption.’ Yet, given the research noted above, a strengthened research exemption seems unlikely to address the anticommons or restricted access problems, especially in diagnostic testing. And such reforms need to be sensitive to the incentives that patents can provide for developing and distributing research technologies. The combination of a lack of empirical evidence of problems and a mismatch between the problems and proposed solutions may explain why there has been little actual policy change. In addition, our review of the lively policy debate and the limited empirical support for the claims that are driving that debate suggest that policymakers may be responding more to a high-profile anecdote or arguments with high face validity than they are to systematic data on the issues. However, we must acknowledge that one effect of these various high-profile policy debates may have been to sensitize both administrative and funding agencies (for example, the US Patent and Trademark Office and National Institutes of Health) and patent holders to the possible adverse consequences of the overly liberal granting of patents and overly restrictive licensing practices. Whether this swing of the pendulum will help, hurt or have no effect on innovation and the progress of science remains an open question. Thus, further research on the exact mechanisms underlying these effects, as well as their net impacts, should be encouraged.
Note: Supplementary information is available on the Nature Biotechnology website.
ACKNOWLEDGMENTS
We would like to thank Lori Sheremeta, Richard Gold, Michael Sharp, C.J. Murdoch and Robyn Hyde-Lay for the invaluable research assistance; Genome Alberta, AHFMR, the Stem Cell Network and AFMNet for the funding support; and the US National Human Genome Research Institute and Department of Energy (R.C.-D.). We would also like to thank all of the participants of the Genome Alberta Banff Patenting Workshop (May 2006) for insightful comments.
1. Danish Council of Ethics. Patenting Human Genes and Stem Cells (Danish Council of Ethics, Copenhagen, 2004).
2. The Nuffield Council on Bioethics. The Ethics of Patenting DNA: A Discussion Paper (Nuffield Council on Bioethics, London, 2002).
3. Resnik, D.B. J. Law Med. Ethics 29, 152–165 (2001).
4. House of Commons. Standing Committee on Health. Assisted Human Reproduction: Building Families (Government of Canada, Ottawa, 2001).
5. Williams-Jones, B. Health Law J. 10, 123–146 (2002).
6. Kevles, D. & Berkowitz, A. Brooklyn Law Rev. 67, 233–248 (2001).
7. National Academy of Sciences. Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health (National Academies Press, Washington, DC, 2005).
8. Heller, M. & Eisenberg, R. Science 280, 698–701 (1998).
9. Merges, R.P. & Nelson, R.R. Columbia Law Rev. 90, 839–916 (1990).
10. Scotchmer, S. J. Econ. Perspect. 5, 29–41 (1991).
11. Caulfield, T. Community Genet. 8, 223–227 (2005).
12. Nelson, R.R. Res. Policy 33, 455–471 (2004).
13. David, P.A. J. Theoret. Institutional Econ. 160, 1–26 (2004).
14. Ontario Ministry of Health. Genetics, Testing and Gene Patenting: Charting New Territory in Healthcare (Government of Ontario, Toronto, 2002).
15. National Academy of Sciences. A Patent System for the 21st Century (National Academies Press, Washington, DC, 2004).
16. Cho, M. Am. Assoc. Clin. Chem. Newslett. 47–53 (1998).
17. Merz, J.F., Kriss, A.G., Leonard, D.G.B. & Cho, M.K. Nature 415, 577–579 (2002).
18. Benzie, R. The National Post A:15 (September 20, 2001).
19. Jensen, K. & Murray, F. Science 310, 239–240 (2005).
20. Walsh, J.P., Cohen, W.M. & Arora, A. Science 299, 1021 (2003).
21. Walsh, J.P., Cho, C. & Cohen, W.M. Science 309, 2002–2003 (2005).
22. Nicol, D. & Nielsen, J. Patents and medical biotechnology: An empirical analysis of issues facing the Australian industry—Occasional Paper No. 6 (Centre for Law & Genetics, Sandy Bay, Australia, 2003).
23. Nagaoka, S. Presentation to OECD Conference on Research Use of Patented Inventions (Madrid, May 18–19, 2006).
24. Straus, J. Presentation to the BMBF & OECD Workshop on Genetic Inventions, Intellectual Property Rights and Licensing Practices (Berlin, January 24–25, 2002).
25. Cohen, J. Science 285, 28 (1999).
26. Walsh, J.P., Cohen, W.M. & Arora, A. Patenting and licensing of research tools and biomedical innovation. in Cohen, W.M. & Merrill, S. (eds.) Patents in the Knowledge-Based Economy (National Academies Press, Washington, DC, 2003).
27. Cho, M.K. et al. J. Mol. Diagn. 5, 3–8 (2003).
28. Campbell, E.G. et al. J. Am. Med. Assoc. 287, 473–480 (2002).
29. Walsh, J.P. & Hong, W. Nature 422, 801–802 (2003).
30. Grushcow, J. J. Legal Studies 33, 59–84 (2004).
31. Vogeli, C. et al. Acad. Med. 81, 128–136 (2006).
32. Blumenthal, D. et al. Acad. Med. 81, 137–145 (2006).
33. Cohen, W.M., Florida, R. & Goe, R. University-industry research centers in the United States. Report to the Ford Foundation (Carnegie Mellon University, Pittsburgh, 1994).
34. Bekelman, J.E., Li, Y. & Gross, G.P. J. Am. Med. Assoc. 289, 454–465 (2003).
35. Stern, S. & Murray, F.E. Do formal intellectual property rights hinder the free flow of scientific knowledge? An empirical test of the anticommons hypothesis: NBER Working Paper No. W11465 (2005).
36. Agrawal, A. & Henderson, R. Manage. Sci. 48, 44–60 (2002).
37. Dasgupta, P. & Maskin, E. Econ. J. 97, 581–595 (1987).
38. Van Overwalle, G. & Van Zimmeren, E. Chizaiken Forum 64, 42–49 (2006).
39. Kieff, F.S. Northwestern Univ. Law Rev. 95, 691–706 (2001).
40. Pressman, L. et al. Nat. Biotechnol. 24, 31–39 (2006).
41. Eisenberg, R. Science 299, 1018–1019 (2003).
42. Rohrbaugh, M.I. Fed. Regist. 70, 18413–18415 (2005).
43. Grimm, D. Science 312, 1862–1866 (2006).
44. Ravetz, J.R. Scientific Knowledge and Its Social Problems (Oxford Univ. Press, New York, 1973).
45. Canadian Biotechnology Advisory Committee. Human Genetic Materials, Intellectual Property and the Health Sector (CBAC, Ottawa, 2006).
46. World Health Organization. Public Health Innovation and Intellectual Property Rights (WHO Press, Geneva, 2006).
47. Australian Government Advisory Committee on Intellectual Property. Patents and Experimental Use (ACIP, Sydney, 2005).
48. Canadian Biotechnology Advisory Committee Expert Working Party on Human Genetic Materials. Human Genetics Materials: Making Canada’s Intellectual Property Regime Work for the Health of Canadians (CBAC, Ottawa, 2005).
49. National Research Council Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation. Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation and Public Health (National Academies Press, Washington, DC, 2005).
50. World Health Organization. Genetics, Genomics and the Patenting of DNA: Review of Potential Implications for Health in Developing Countries (WHO Press, Geneva, 2005).
51. Australian Law Reform Commission. Report 99—Genes and Ingenuity: Gene Patenting and Human Health (SOS Printing Group, Sydney, 2004).
52. Federal Trade Commission. To Promote Innovation: The Proper Balance of Competition and Patent Law and Policy (FTC, Washington, DC, 2003).
53. The Royal Society. Keeping Science Open: The Effects of Intellectual Property Policy on the Conduct of Science (TRS, London, 2003).
54. Public Health Genetics Unit. Intellectual Property Rights and Genetics (PHGU, Cambridge, 2003).
55. Organization for Economic Co-operation and Development. Genetic Inventions, Intellectual Property Rights & Licensing Practices (OECD Publications, Paris, 2002).
56. Canadian Biotechnology Advisory Committee. Patenting of Higher Life Forms and Related Issues (CBAC, Ottawa, 2002).
Recent patent applications in tissue engineering
WO 2006067080. Magnetic pole matrices useful for tissue engineering and targeting systematic therapy for cardiovascular disease. The matrices provide a source of strong localized magnetic field gradient for targeted drug delivery and distribute the magnetic nanoparticles on the artificial surface locally and uniformly. Assignee(s): Steinbeis Center for Heart and Circulation Research (Rostock, Germany); Li W, Ma N, Steinhoff G, Steinhoff K. Inventor(s): Li W, Ma N, Nan M, Steinhoff G, Steinhoff K, Wenzhong L. Priority application date: 12/22/2004. Publication date: 6/29/2006.
WO 2006068972. A device for the repair and regeneration of tissue such as skin, bone and cartilage, comprising a support scaffold layer and cell sheet layer; provides equivalent cell viability and cell distribution to seeding by cell suspension. Assignee(s): Ethicon (Somerville, NJ, USA); Gosiewska A, Seyda A. Inventor(s): Buensuceso CS, Colter DC, Geesin JC, Gosiewska A, Scopelianos AG, Seyda A, Sridevi D. Priority application date: 12/21/2004. Publication date: 6/29/2006.
US 20060128012. A structure for growing isolated differentiable human mesenchymal cells for use in tissue engineering; includes a three-dimensional matrix of fibers that forms a scaffold for growing the isolated differentiable human mesenchymal cells. Assignee(s): Arinzeh T, Jaffe M, Shanmugasundaram S. Inventor(s): Arinzeh T, Jaffe M, Shanmugasundaram S. Priority application date: 12/3/2004. Publication date: 6/15/2006.
WO 2006059953. A new cell culture medium comprising tumor growth factor-β1, useful for developing research/drug screening kits and liver tissue engineering and biosensors for detecting chemical/biological warfare agents. Assignee(s): National University of Singapore. Inventor(s): Chia SM, Yu H. Priority application date: 11/30/2004. Publication date: 6/8/2006.
WO 2006055261. Preparation of a biocompatible and biodegradable polyurethane foam, comprising mixing at least one biocompatible polyol, water, at least one stabilizer and at least one cell opener to form a resin mix; useful as scaffolds for bone tissue engineering. Assignee(s): Carnegie Mellon University (Pittsburgh). Inventor(s): Didier J, Guelcher SA, Hollinger JO, Patel V. Priority application date: 11/5/2004. Publication date: 5/26/2006.
WO 2006046490. Cellular-tissue microchips comprising multiple cell-retaining cavities for forming cellular tissues in uniform configuration and size over a prolonged period of time; useful in, for example, tissue engineering. Assignee(s): Kitakyushu Foundation for the Advancement of Industry, Science and Technology (Kitakyushu, Japan). Inventor(s): Fukuda J, Nakazawa K. Priority application date: 10/29/2004. Publication date: 5/4/2006.
WO 2006044832. An implant comprising several parallel layers spaced apart by several members, and having a uniform thickness and several openings to permit fluid flow through the implant; useful in a scaffold for tissue engineering applications. Assignee(s): Cleveland Clinic Foundation (Cleveland). Inventor(s): Fleischman AJ, Mata A, Muschler GF, Roy S. Priority application date: 10/15/2004. Publication date: 4/27/2006.
CN 1746295. A tissue engineering cartilage based on filled stem cells from placenta, with a short cell culture time and easy quality control; useful for functional repair of joint cartilage injuries. Assignee(s): School of Basic Medicine, Military Medical University (Xi’an, China). Inventor(s): Dong L, Duan C, Guo X, Jiang H, Li J, Wang C, Zhou X. Priority application date: 9/9/2004. Publication date: 3/15/2006.
US 20050276791. A polymer scaffold for, for example, tissue engineering applications such as wound healing and tissue regeneration, comprising polymer layer(s) having uniform structural features with predetermined geometries. Assignee(s): Ohio State University (Columbus, OH, USA). Inventor(s): Ferrell N, Hansford DJ, Yang S. Priority application date: 2/22/2005. Publication date: 12/15/2005.
US 20050272153. A three-dimensional tissue scaffold implant for supporting tissue on-growth comprising a lattice having a matrix of interconnected pores; an inert, biocompatible material covering the surfaces; and at least one growth factor covering the material; useful for bone regeneration. Assignee(s): Bunger C, Li H, Xuenong Z. Inventor(s): Bunger C, Li H, Xuenong Z. Priority application date: 1/27/2004. Publication date: 12/8/2005.
KR 2005039960. A foam dressing material using chitosan, whose material has increased stretch capacity and tensional strength and is useful for wound dressing and other structures of tissue engineering. Assignee(s): Hyosung Corp. (Seoul). Inventor(s): Choi YB, Kim DS, Kim SK. Priority application date: 10/27/2003. Publication date: 5/3/2005.
Source: Thomson Scientific Search Service (formerly Derwent). The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1725 Duke Street, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) DERWENT (http://www.thomson.com/scientific).
NEWS AND VIEWS
Artificial sperm and epigenetic reprogramming
Diana Lucifero & Wolf Reik
Diana Lucifero and Wolf Reik are in the Laboratory of Developmental Genetics and Imprinting, The Babraham Institute, Cambridge CB2 4AT, UK. e-mail: [email protected], [email protected]
Sperm cells derived from embryonic stem cells give rise to mice with genetic imprinting defects.
Of the many differentiated cell types that have been derived in vitro from mouse embryonic stem (ES) cells, among the most intriguing are cells resembling male and female gametes1–4. Such ES cell–derived germ cells have been shown to undergo meiosis to form haploid gametes that can support early development, but their capacity to support post-natal development remained untested. In a recent report in Developmental Cell, Nayernia et al.5 address this question by demonstrating that ES cell–derived male gametes can give rise to viable offspring. ES cells are derived from the inner cell mass of the pre-implantation embryo at the blastocyst stage of early development. As pluripotent cells, they are able to generate cells of every embryonic lineage6. When introduced back into embryos, genetically modified ES cells can be used to generate knockout mice because they give rise to functional germ cells that further differentiate into male and female gametes (sperm and oocytes). ES cells can also be differentiated in vitro into various specialized cell types, which are of interest both for biological studies and for cell therapies. Eventually, the generation of gametes from human ES cells may be relevant to assisted reproductive technologies. Nayernia et al. provide the first demonstration that ES cell–derived gametes can lead to the birth of viable mice. Reporter gene–based systems have previously been used to select for a population of stem cells that express germ cell–specific genes such as Oct4 (also known as Pou5f1) or others1–3, and thereby to enrich the population of cells with the most promise of becoming gametes in vitro. Nayernia et al.
took this approach further by using a clever two-step selection system in combination with special culture conditions (Fig. 1). First, they created an ES cell line to select for spermatogonial stem cells by introducing a promoter (Stra8) active in early male germ cells linked to a marker gene encoding enhanced green fluorescent protein (eGFP). The selected cell population already had the characteristics of male germ cells ready to enter the initial stages of meiosis. Using these enriched cells, they then performed another round of selection by introducing the promoter of a gene expressed in more mature haploid male germ cells (Prm1) linked to another fluorescence marker gene, dsRED. After inducing differentiation by retinoic acid treatment in the predominantly green fluorescent spermatogonial stem cells, they obtained red fluorescent cells, some of which appeared motile. The appearance of red fluorescent cells suggested that the emerging haploid cells had undergone the final stages of spermatogenesis. The shape of the resulting sperm was, however, abnormal. Approximately one-third of the cell population obtained by selection with Stra8-eGFP turned into haploid cells when induced to differentiate. Next, the authors investigated the functionality of their in vitro–derived germ cells. Presumably because the sperm were not normal enough to fertilize oocytes by themselves, the authors injected them into oocytes using an assisted reproductive technology called intra-cytoplasmic sperm injection. A proportion of the fertilized zygotes developed into normal-looking pre-implantation embryos. Out of 65 pre-implantation embryos, 7 developed into live mice carrying the Prm1-dsRED transgene. However, most newborn transgenic mice were either smaller or larger than control mice, and died between five days and five months after birth. Thus, the in vitro–derived sperm did not give rise to normal mice. The high rate of abnormal development in these manipulated gametes is likely to be
epigenetic, rather than genetic, in nature. Such epigenetic changes may involve DNA modifications (such as methylation) and/or chromatin modifications that in turn regulate gene expression essential for normal development. In this context, imprinted genes, a subset of mammalian genes that are methylated either in the male or female germ line and hence expressed only from one of the parental chromosomes in the offspring7, are of particular interest. These genes tend to affect fetal growth, and their parent-specific DNA methylation marks need to be erased in early germ cells (primordial germ cells), and re-established according to the sex of the gamete at later stages of gametogenesis—in this case, spermatogenesis (Fig. 1). An important question is whether this epigenetic reprogramming occurs normally as ES cells undergo germ cell development in vitro; conversely, if reprogramming does not occur properly, the resulting sperm would have abnormal patterns of imprinted methylation. Nayernia et al. examined the DNA methylation state of three imprinted genes with well-defined differentially methylated regions. Overall, the results suggest that some reprogramming did occur in the culture system, but that not all methylation marks were properly erased and re-established. Indeed, the mice that were born from ES cell–derived male germ cells also had variable imprinted methylation profiles. Together, these findings suggest that the poor viability and growth abnormalities of the mice derived from ES cell–derived sperm may be at least partly explained by incomplete epigenetic reprogramming. Follow-up experiments that rigorously investigate the methylation and expression status of a wider panel of imprinted genes in the ES cell–derived germ cells at multiple time points during the differentiation process will be important to help clarify the extent to which reprogramming occurs normally in such engineered germ cells. A similar, more detailed approach should also be taken with
offspring from the ES cell–derived sperm to reveal whether the phenotypic abnormalities observed are directly linked to imprinting defects.
Figure 1 Offspring from ES cell–derived male germ cells. (a) Simplified version of the normal course of events during murine spermatogenesis and the normal timing of imprint erasure, establishment and maintenance. (b) The approach of Nayernia et al. to create ES cell–derived male germ cells for derivation of viable offspring. The stages of normal male germ cell development and those described by Nayernia et al. were matched for simplicity; this rough depiction may not be biologically accurate. The in vitro derivation of sperm results in aberrant imprint erasure, establishment and/or maintenance. PGCs, primordial germ cells; ICSI, intra-cytoplasmic sperm injection.
The demonstration that sperm derived from ES cells can give rise to viable offspring is an important step, but the finding that these mice are abnormal and have imprinting defects suggests that the clinical applications of this procedure are still far off in the future. Clearly, ES cell–derived sperm can be used as a model system for investigating both normal and pathological germ cell development, especially in the case of human germ cells, where ethical considerations preclude access to the fetal material that would be needed for such analysis. However, the relevance of this technology for infertility treatments and as a source of gametes is worthy of continued investigation. The production of oocytes from human ES cells would be of particular value for the derivation of stem cells by somatic cell nuclear transfer. The study by Nayernia et al. raises many interesting questions for future experiments. How extensive is epigenetic reprogramming, and at what time points does it occur in the culture system as compared with the in vivo situation? Reprogramming of sequences other than imprinted genes would also be important to study. Are other epigenetic marks such as chromatin signatures also affected? Can the normal reprogramming of imprints be encouraged in vitro? As the players involved in reprogramming are still not known, could ES cell–derived sperm be used as an experimental system to dissect out which genes and factors
are essential for normal reprogramming? Do the ES cells used by Nayernia et al. show normal methylation profiles on imprinted differentially methylated regions to begin with? The overall efficiency of isolating sperm from ES cells is also a concern, with only 3% of oocytes microinjected with in vitro–derived male gametes giving rise to adult transgenic viable mice. Can this yield be improved? Are the embryos that fail to develop to term also dying because of aberrant imprinting reprogramming? Can a similar two-step selection approach be used for female germ cells, or are
functional oocytes inherently more difficult to obtain? Answers to these questions will enhance insight into the biology of reproduction, and may improve regenerative medicine and assisted reproductive technologies.
1. Hubner, K. et al. Science 300, 1251–1256 (2003).
2. Geijsen, N. et al. Nature 427, 148–154 (2004).
3. Toyooka, Y., Tsunekawa, N., Akasu, R. & Noce, T. Proc. Natl. Acad. Sci. USA 100, 11457–11462 (2003).
4. Clark, A.T. et al. Hum. Mol. Genet. 13, 727–739 (2004).
5. Nayernia, K. et al. Dev. Cell 11, 125–132 (2006).
6. Solter, D. Nat. Rev. Genet. 7, 319–327 (2006).
7. Reik, W. & Walter, J. Nat. Rev. Genet. 2, 21–32 (2001).
The worm turns for antimicrobial discovery
Amit P Bhavsar & Eric D Brown
Amit P. Bhavsar and Eric D. Brown are in the Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada. e-mail: [email protected]
High-throughput screening using an in vivo infection model identifies nontraditional antimicrobials.
Pathogenic bacteria have evolved resistance to all the major antibiotics used to defeat them, partly in response to the use of ‘broad-spectrum’ antibiotics. This has led to calls for new
therapeutic approaches that do not contribute to the spread of antibiotic resistance1. One such strategy would be to target the virulence mechanisms of pathogenic bacteria so as to disrupt their ability to infect the host but not their viability1. An important advance in this direction was recently reported by Ausubel and colleagues2 in the Proceedings of the National Academy of Sciences USA. They describe an anti-infective screen based upon rescue of
the nematode Caenorhabditis elegans from a persistent infection by the human pathogen Enterococcus faecalis. Screening of over 7,000 purified compounds or natural extracts for their ability to promote survival of the infected worms revealed 16 synthetic compounds and 9 natural extracts, many of which appear to either activate host immunity or attenuate pathogen virulence. High-throughput bacterial virulence screens have previously been used to discover inhibitors of virulence for the pathogenic organisms Vibrio cholerae3, Yersinia pseudotuberculosis4 and enteropathogenic Escherichia coli5. However, in each of these cases, the bacteria were screened in the absence of a host organism. Moreover, as the V. cholerae and Y. pseudotuberculosis assays screened for decreased expression of known virulence genes, they relied on detailed knowledge of the pathways these organisms use for infection. In contrast, the E. coli screen relied on recognizing compounds that impaired the general bacterial virulence mechanism of protein secretion. Interestingly, the inhibitor of V. cholerae virulence substantially reduced in vivo colonization in an infant mouse model3, and the inhibitor of Y. pseudotuberculosis virulence was partially effective in a HeLa cell infection model6. What makes the method of Ausubel and colleagues remarkable is that it marries high-
throughput screening with an in vivo infection model to identify molecules that are true anti-infectives. This simple, elegant approach involves infecting nematodes with E. faecalis by allowing them to ingest the bacteria. Once a persistent infection occurs in their intestines, the worms are systematically exposed to test compounds added to liquid medium in multiwell plates. Several days later, worm survival is scored manually and compared with survival in control media. The 25 compounds that promoted worm survival undoubtedly act through very different mechanisms. Notably, the failure of some of these small molecules to affect bacterial survival in vitro suggests that they would have been overlooked in a more conventional screen for antibiotic activity. Whereas some of the compounds appear to act on E. faecalis directly, others seem to modulate worm responses to the bacteria (Fig. 1). Certain members of the latter class of compounds even allow worms to tolerate colonization by the pathogen. Other molecules had in vivo activities at lower concentrations than the minimum levels needed to inhibit in vitro growth, suggesting multiple modes of action. Antimicrobials that act directly on the bacteria could have at least two outcomes with respect to bacterial persistence. Inhibition of virulence could prevent the pathogens from
persisting in the worm intestine, thereby allowing the immune system to clear the infection and decrease bacterial numbers. Alternatively, a compound might have little effect on persistence but render the bacteria avirulent and therefore harmless. This would manifest as a healthy worm that remained colonized by the pathogen. These two outcomes might also occur if the compound modulated the worm immune system so that it could now readily clear the persistent infection or tolerate bacterial colonization. Whereas the former mechanism may involve upregulating nematode immunity, the mechanisms underlying increased host tolerance are unclear and very intriguing. Of course, the next challenge is to identify the mechanisms of action for each of the compounds that promote worm survival. In cases where the compound alters pathogen growth, the target might be identified genetically—for example, through multicopy suppression, where target overproduction facilitates target discovery7. Another approach, recently described for the identification of FabF, a bacterial enzyme involved in fatty acid biosynthesis, as the target of the antibiotic platensimycin8, exploits target depletion through antisense RNA inhibition of target production. Mutagenesis can also yield target information, as demonstrated by the identification of ToxT,
a key DNA-binding protein that allows transcription of virulence factors, as the target of virstatin, a V. cholerae virulence inhibitor3.
Figure 1 Compounds that promote survival of Caenorhabditis elegans after persistent infection with Enterococcus faecalis can either prevent persistence of the pathogen in the intestine or enable the host to tolerate chronic innocuous intestinal colonization. Both protective outcomes may arise from either inhibiting bacterial virulence (red) or stimulating host defenses (blue). A striking difference in the appearance of live worms (sinusoidal posture) and worms that have succumbed to the infection (straight, rigid appearance resulting from proliferation of the pathogen) facilitates high-throughput screening for antimicrobials.
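The survival readout at the center of this screen reduces to a simple per-well scoring rule. The Python sketch below is a hypothetical illustration of that logic, not the authors' pipeline; the well counts, compound names and three-fold enrichment cutoff are all assumptions.

def survival_fraction(alive, total):
    """Fraction of worms in a well still showing the sinusoidal live posture."""
    return alive / total

def call_hits(wells, control_fraction, fold=3.0):
    """Flag compounds whose wells show at least `fold` times the survival of
    E. faecalis-infected control worms (the cutoff is an assumption)."""
    hits = []
    for compound, alive, total in wells:
        if survival_fraction(alive, total) >= fold * control_fraction:
            hits.append(compound)
    return hits

# Example: control wells average 10% survival after several days of infection.
wells = [("cmpd-1", 28, 40), ("cmpd-2", 5, 40), ("cmpd-3", 14, 40)]
print(call_hits(wells, control_fraction=0.10))   # -> ['cmpd-1', 'cmpd-3']

Automated imaging, as discussed below, would replace the manual counts feeding such a rule, but the hit-calling step itself stays this simple.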
Although identifying mechanisms of action that target host responses will be challenging, a recent high-throughput screen of C. elegans to identify growth-altering compounds9 seems noteworthy. In this work, a library of 180,000 randomly mutated C. elegans strains was used to identify a calcium channel subunit as the target of nemadipine-A. Although the in vivo virulence screen designed by Ausubel and colleagues shows great potential, it may require further optimization to become compatible with industrial-scale high-throughput screening. A promising development in this regard is the use of automated inoculation and imaging of nematodes by Kwok et al.9. The easily scored morphological differences between live and dead worms observed in screens for nontraditional antimicrobials
should be amenable to such automation. Finally, the suitability of C. elegans as a model infection system deserves reconsideration. As pointed out by the authors, C. elegans can feed on several human pathogens, including the troublesome Gram-negative pathogen Pseudomonas aeruginosa. However, although this model is responsive to E. faecalis, it is not susceptible to infection by other human pathogens, such as Enterococcus faecium.
1. Brown, E.D. & Wright, G.D. Chem. Rev. 105, 759–774 (2005).
2. Moy, T.I. et al. Proc. Natl. Acad. Sci. USA 103, 10414–10419 (2006).
3. Hung, D.T., Shakhnovich, E.A., Pierson, E. & Mekalanos, J.J. Science 310, 670–674 (2005).
4. Kauppi, A.M. et al. Chem. Biol. 10, 241–249 (2003).
5. Gauthier, A. et al. Antimicrob. Agents Chemother. 49, 4101–4109 (2005).
6. Nordfelth, R. et al. Infect. Immun. 73, 3104–3114 (2005).
7. Li, X. et al. Chem. Biol. 11, 1423–1430 (2004).
8. Wang, J. et al. Nature 441, 358–361 (2006).
9. Kwok, T.C.Y. et al. Nature 441, 91–95 (2006).
The sweet side of biomarker discovery
Carlos J Bosques, S Raguram & Ram Sasisekharan
Carlos J. Bosques, S. Raguram and Ram Sasisekharan are in the Biological Engineering Division, Harvard-MIT Division of Health Sciences and Technology, Center for Environmental Health Sciences, Center for Biomedical Engineering, Massachusetts Institute of Technology, 15-561, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA. e-mail: [email protected]
Glycomics offers exciting possibilities for discovering serum biomarkers.
Glycans are involved in each and every aspect of tumor progression, from cellular proliferation to angiogenesis and metastasis1,2, and in theory could be used to diagnose, predict susceptibility to and monitor the progression of cancer. Despite the potential of glycans as diagnostic and prognostic biomarkers1, however, progress in developing reliable clinical tools has been slow. In two recent articles in Molecular and Cellular Proteomics3 and the Journal of Proteome Research4, researchers led by Carlito Lebrilla and Suzanne Miyamoto describe a promising strategy for discovering glycan biomarkers based on analyzing total glycan profiles rather than the sugars on particular glycoproteins. Protein glycosylation is one of the most common post-translational modifications
in humans. In fact, most secreted proteins are glycosylated, including important tumor biomarkers such as prostate-specific antigen and the ovarian cancer marker CA125. Many publications have suggested the use of glycans for cancer diagnostics1. Glycans might reflect pathologies in instances when reliable changes in protein profiles cannot be identified. Present at the cell surface and in the extracellular matrix, glycans are critically important in the remodeling of the microenvironment during tumorigenesis. Alterations to the normal function of the glycosylation machinery are increasingly recognized as a consistent indication of malignant transformation and oncogenesis. The extreme sensitivity of glycosylating enzymes to pathological changes is reflected by large alterations in the distribution of glycoforms presented on glycoproteins. For example, upregulation of N-acetylglucosaminyltransferase V and sialyltransferases (leading to increased β-1,6-GlcNAc branching and sialylation of N-linked glycans, respectively) is a major hallmark of cancer progression5. Specifically, increased branching of N-linked glycans has been associated with invasion, angiogenesis and metastasis6, and increased sialylation on the cell surface can, for example,
promote cell detachment from primary tumors through charge repulsion, thereby inducing tumor proliferation2. Increased branching of O-linked glycans by core 2 β-1,6-N-acetylglucosaminyltransferase has also been associated with tumorigenesis7. With at least two notable exceptions8,9, earlier studies have aimed to use carbohydrates from a specific glycoprotein as the diagnostic fingerprint. Although this approach has shed light on the functions of particular glycoproteins in many diseases, it has been difficult to correlate disease progression with specific glycosylation patterns on the protein of interest. Owing to the dynamic nature of these co- and post-translational modifications and their pleiotropic regulation, a broad overview of the total pathological changes to glycans in a tissue or body fluid (‘glycomics’) may be more informative than characterizing the glycans on particular proteins. In the new papers, the groups of Lebrilla and Miyamoto, focusing on O-linked oligosaccharide markers in ovarian and breast cancer models, rapidly profiled the total glycans released from glycoproteins in the culture media of various cancer cell lines or in the sera of diseased mice or patients. Their approach differs from earlier studies focused on single glycoproteins in that the O-linked glycans of all serum glycoproteins are released by β-elimination and analyzed. After a simple purification through graphitized carbon solid-phase extraction cartridges, fractions of glycan mixtures are directly analyzed by matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (Fig. 1). Because no high-performance liquid chromatography separation is required, the procedure is fairly fast. Another advantage is high sensitivity, as only 50-nl samples are needed. Furthermore, as serum albumin is not glycosylated, this approach also overcomes many of the problems faced by proteomics in removing serum albumin and other abundant proteins before analysis. Profiling of the glycans in the media of four breast cancer cell lines revealed that similar breast cancer cell lines had essentially the same oligosaccharide profile, whereas a more precancerous ductal carcinoma cell line and a noncancer cell line showed different glycan profiles. This suggests that differences between glycans in different cell lines may reflect distinct stages of breast cancer. Remarkably, glycans that appeared as the disease advanced were also found in the breast cancer cell lines, but not in control cells. Furthermore, a mouse model of breast cancer revealed significant changes in glycan profiles during disease progression. Although results in mice may not adequately
represent human cancers, an advantage of these model systems in evaluating the potential of biomarkers is that matching samples can be obtained from genetically identical animals.
Figure 1 Global serum glycomics for biomarker discovery. In contrast to most glycan analysis studies that focus on particular glycoproteins, total glycans from all serum glycoproteins are cleaved and preconcentrated before global glycoprofiling using matrix-assisted laser desorption/ionization mass spectrometry. Spectra from the sera of cancer patients and healthy subjects are analyzed to identify cancer-associated glycan signatures. Efficient bioinformatics tools with feature recognition capabilities will likely be indispensable in identifying cancer-associated glycomic signatures and translating them into noninvasive diagnostic tests.
The authors also compared the serum glycoprofiles from a limited number of cancer patients and healthy individuals. Although the carbohydrates of the graphitized carbon cartridge fractions from both sets of samples were similar, glycans from cancer samples eluted at higher acetonitrile concentrations than those from noncancer samples. This result could arise from different isomeric structures. However, it is also possible that other features, such as differences in relative glycan amounts, could affect the elution profiles. For example, as glycans are synthesized through an interconnected (and very sensitive) circuit of enzymes, pathophysiological alterations could easily alter the expression ratios of glycans by converting some glycan precursors to glycan products. Owing to its low detection limits, efficiency and rapid characterization capabilities, mass spectrometry remains one of the most popular
techniques for biomarker discovery. As demonstrated in the papers by Lebrilla and Miyamoto, matrix-assisted laser desorption ionization mass spectrometry provides a good alternative for efficiently analyzing complex glycan profiles from cell lines or serum for the purpose of discovering new biomarkers. However, the glycan structural analyses that are necessary and sufficient will likely vary with the particular application. Furthermore, if specific glycan signatures are to serve as clinically accepted biomarkers, these analytical techniques must be optimized for high reproducibility and sensitivity so that large numbers of human samples can be analyzed with statistical significance. Such studies will generate vast and complicated data sets, making data analysis a possible bottleneck. Therefore, better glycan-based bioinformatics tools must also be developed (Fig. 1)10. Although currently there are initiatives to use mass spectrometry in the clinic for diagnostic purposes, other techniques, such as lectin and glycan arrays, will also be indispensable in transforming glycan biomarkers into reliable clinical tests.
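To make "feature recognition" concrete, its most minimal form can be sketched as a presence/absence comparison of rounded peak lists. Everything in the Python sketch below (the m/z values, the rounding precision and the rule of keeping features seen in every cancer spectrum but in no normal one) is invented for illustration; real glycomic data would require calibration, deisotoping and statistical validation across many samples.

def binned(peaks, decimals=1):
    """Round each m/z value so peaks can be matched across spectra."""
    return {round(mz, decimals) for mz in peaks}

def signature(cancer_spectra, normal_spectra, decimals=1):
    """m/z features present in every cancer serum spectrum but no normal one."""
    in_all_cancer = set.intersection(*(binned(s, decimals) for s in cancer_spectra))
    in_any_normal = set.union(*(binned(s, decimals) for s in normal_spectra))
    return sorted(in_all_cancer - in_any_normal)

# Invented peak lists (m/z values) for two cancer and two normal sera:
cancer = [[675.2, 837.3, 999.4], [675.2, 837.3, 1040.5]]
normal = [[675.2, 917.3], [675.2, 999.4]]
print(signature(cancer, normal))   # -> [837.3]

Real tools would also weigh relative intensities and structural subgroups, as the figure suggests, rather than simple presence or absence.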
Glycans have great potential as cancer biomarkers because of their involvement in all stages of tumor progression1,2,11. If we can capture this vast information content in an efficient and meaningful manner and develop appropriate technologies for clinical translation, these biomolecules could become an alternative or a complement to DNA and proteins in the difficult endeavor of early cancer diagnosis.
1. Dube, D.H. & Bertozzi, C.R. Nat. Rev. Drug Discov. 4, 477–488 (2005).
2. Fuster, M.M. & Esko, J.D. Nat. Rev. Cancer 5, 526–542 (2005).
3. Kirmiz, C. et al. Mol. Cell. Proteomics 10.1074/mcp.M600171-MCP200 (2006).
4. An, H.J. et al. J. Proteome Res. 5, 1626–1635 (2006).
5. Dennis, J.W., Granovsky, M. & Warren, C.E. Biochim. Biophys. Acta 1473, 21–34 (1999).
6. Dennis, J.W. et al. Science 236, 582–585 (1987).
7. Shimodaira, K. et al. Cancer Res. 57, 5201–5206 (1997).
8. Callewaert, N. et al. Nat. Med. 10, 429–434 (2004).
9. Butler, M. et al. Glycobiology 13, 601–622 (2003).
10. Raman, R. et al. Nat. Methods 2, 817–824 (2005).
11. http://grants.nih.gov/grants/guide/rfa-files/RFA-CA07–020.html
RESEARCH HIGHLIGHTS
PALM images stack up
Despite significant progress in recent years, ‘super-resolution’ techniques for overcoming the diffraction-limited resolution of far-field optical microscopy have yet to achieve macromolecular resolution. Using a new super-resolution approach and total internal reflection fluorescence microscopy, Betzig et al. have succeeded in imaging proteins labeled with a photoactivatable fluorescent protein at a resolution below 10 nm. Photoactivated localization microscopy (PALM) resolves the signals of multiple proteins within a diffraction-limited region by an iterative process of photoactivation, localization and bleaching. Each cycle reveals the location of a small fraction of all the labeled proteins in the sample. A stack of ~10^4–10^5 images is then combined to generate a single image showing the positions of ~10^5–10^6 individual proteins. As the authors demonstrate, the labeled proteins can be visualized in their cellular context by superimposing a PALM image and a transmission electron microscope image. Compared with transmission electron microscopy of immunolabeled samples, PALM shows the labeled proteins at a much higher density (up to 10^5 per µm^2). The method can be applied to cryosections of pelleted cells or to fixed whole cells. (Sciencexpress 10 August 2006 10.1126/science.1127344) KA
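The accumulate-and-localize logic at the heart of PALM is compact enough to sketch in code. The Python fragment below is an illustration of the principle only, not the authors' software: the threshold-and-centroid localization, the rendering grid and all names are simplifying assumptions (real implementations fit each spot to a point-spread-function model).

import numpy as np
from scipy.ndimage import label, center_of_mass

def localize_frame(frame, threshold):
    """Find sub-pixel centroids of the few well-separated emitters active in one frame."""
    mask = frame > threshold
    labels, n = label(mask)                      # isolate each diffraction-limited spot
    # the centroid of each spot approximates the molecule's true position
    return center_of_mass(frame, labels, range(1, n + 1))

def palm_reconstruct(frames, threshold, upsample=10):
    """Accumulate per-frame localizations from ~1e4-1e5 activation cycles into
    one super-resolved image rendered on a grid `upsample` times finer."""
    h, w = frames[0].shape
    image = np.zeros((h * upsample, w * upsample))
    for frame in frames:                          # one activate/localize/bleach cycle each
        for y, x in localize_frame(frame, threshold):
            image[int(y * upsample), int(x * upsample)] += 1
    return image

The final image is sharp not because any single frame beats the diffraction limit, but because each sparse frame can be localized far more precisely than the width of its spots.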
Compound relationships
Determining the mode of action of new compounds can take years, but now Parsons et al. describe a way to map out the functions of bioactive compounds rapidly. Extending previous work on lethal genetic profiles in yeast deletion mutants, the researchers used Ron Davis’ barcode microarray approach to test a set of ~5,000 deletion mutant strains for sensitivity to a range of compounds connected to human therapeutics—small molecules and natural products, both purified and in crude extracts, including 23 FDA-approved drugs. By analyzing the data with probabilistic sparse matrix factorization analysis, they were able to create multi-factorial groupings (as opposed to hierarchical clustering, which allows a compound to reside in only a single group). Whereas some obvious groups emerged, some unexpected relationships were revealed. For example, the profile of the cytotoxic anti-HIV compound papuamide B (pap B) resembled that of peptides that disrupt membranes. Molecular biochemical analysis of pap B-resistant mutants showed that pap B’s target is phosphatidylserine, an important component of the yeast cell membrane. This result explains how pap B interferes with HIV infection, and illustrates the power of the barcode approach in elucidating a drug’s mode of action. (Cell 126, 611–625, 2006) LD
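The difference between hard clusters and multi-factorial groupings is easy to see in a toy example. The Python sketch below substitutes scikit-learn's non-negative matrix factorization for the probabilistic sparse matrix factorization actually used in the study (the algorithms differ in detail), and the sensitivity matrix is invented for illustration; the point is simply that a compound's row of factor loadings can spread across several groups, which a hard clustering cannot express.

import numpy as np
from sklearn.decomposition import NMF

# Toy sensitivity matrix: rows = compounds, columns = deletion strains
# (1 = strain hypersensitive to the compound). Values are illustrative only.
X = np.array([[1, 1, 0, 0, 0],    # compound A: one mechanism's profile
              [1, 1, 0, 0, 1],    # compound B: overlaps two mechanisms
              [0, 0, 1, 1, 0],    # compound C: a second mechanism's profile
              [0, 0, 1, 1, 1]])   # compound D: overlaps two mechanisms

model = NMF(n_components=3, init='nndsvda', max_iter=1000)
W = model.fit_transform(X)        # compound-by-factor loadings
# Unlike a hard cluster assignment, each row of W can carry weight on several
# factors, so compounds B and D associate with both mechanisms they overlap.
for name, row in zip("ABCD", W):
    print(name, np.round(row / row.max(), 2))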
MicroRNAs suppress angiogenesis inhibitors
MicroRNAs have been associated with the regulation of several cellular processes, including differentiation, cell proliferation and apoptosis. Now a role for a cluster of these short (~22 nt) non-coding RNAs has been elucidated in the control of endogenous angiogenesis inhibitors in colon cancer. After studying the role of MYC proto-oncogene
overexpression in tumor angiogenesis and in the growth of colonocytes containing mutations in the KRAS proto-oncogene and the TP53 tumor suppressor gene implanted into mice, Dews et al. demonstrate that levels of the microRNA polycistron miR-17-92—which is transcriptionally activated by Myc and amplified in B-cell lymphomas—are elevated in Myc-overexpressing colon cancer cells. They then go on to show that some of these microRNAs directly downregulate mRNAs encoding the anti-angiogenic proteins thrombospondin-1 (Tsp1) and connective tissue growth factor (CTGF). Direct overexpression of the miR-17-92 locus in the absence of oncogenic Myc partially restores Myc-dependent phenotypes by downregulating Tsp1 and CTGF mRNAs and increasing tumor angiogenesis; conversely, administration of a mixture of antisense 2´-O-methyl oligoribonucleotides (antagomirs) specific for microRNAs derived from the miR-17-92 polycistron re-establishes Tsp1 and CTGF protein expression. The study suggests putative therapeutic potential for antagomirs directed against these microRNAs in colon cancer. (Nat. Genet. 38, 1060–1065, 2006) JWT
Submersible rice
Seasonal flooding destroys an estimated $1 billion worth of rice crops each year in south and southeast Asia. But now researchers have identified a gene that allows some cultivars to survive submergence for up to two weeks. The gene, Sub1A, is part of the 182-kilobase Sub1 (Submergence 1) quantitative trait locus, which comprises three ethylene-response factors (ERFs) and ten other genes unrelated to tolerance. Sub1A is overexpressed when plants are submerged. One allele, Sub1A-1, found only in tolerant cultivars, carries a single-nucleotide polymorphism in a mitogen-activated protein kinase site, which may explain the effectiveness of the Sub1A ERF, as phosphorylation can affect ERF-DNA binding. Intolerant japonica rice plants transformed with a copy of the allele survived 11 days of submergence, though they were somewhat smaller than normal japonica plants. To further engineer a flood-hardy rice without interfering with the variety’s desirable traits, Xu et al. introgressed the Sub1 genes into the widely grown Swarna Indian rice variety and used marker-assisted selection to pick out the progeny plants with the fewest chromosomal segments from the Sub1-source plant. The resultant Swarna-Sub1 lines survive extended submergence yet show yield and plant height comparable to Swarna’s. The introduction of flood-resistant varieties is an important step in protecting farmers from flood-related losses. (Nature 442, 705–708, 2006) TM
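Operationally, the marker-assisted step reduces to a count: genotype each progeny plant at markers across the genome and keep those that retain donor alleles only around Sub1. A toy sketch with invented marker positions and genotypes (not the markers used by Xu et al.):

```python
import numpy as np

rng = np.random.default_rng(2)
n_markers, n_progeny = 120, 40
# Hypothetical layout: markers 55-60 flank the Sub1 locus and must stay donor.
sub1_markers = list(range(55, 61))

# 1 = donor (Sub1-source) allele, 0 = recurrent-parent (Swarna) allele.
genotypes = (rng.random((n_progeny, n_markers)) < 0.25).astype(int)
genotypes[:, sub1_markers] = 1  # assume every progeny already carries Sub1

def donor_background(g):
    """Count donor alleles outside the target locus (lower is better)."""
    mask = np.ones(n_markers, dtype=bool)
    mask[sub1_markers] = False
    return int(g[mask].sum())

scores = [donor_background(g) for g in genotypes]
best = int(np.argmin(scores))
print(f"select progeny {best}: {scores[best]} donor markers outside Sub1")
```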
Antibody inflammatory sweet spot Although antibodies are best known for their ability to target foreign antigens and trigger inflammation, administration of IgG mixtures pooled from thousands of donors can relieve certain inflammatory autoimmune disorders. Ravetch and colleagues appear to have resolved this paradox by demonstrating that in mice, two terminal sialic acid moieties on a glycan linked to the IgG Fc domain account for the antibody’s anti-inflammatory effects. Removal of the sugar residues promotes inflammation. Despite differences between mouse and human IgG subclasses and Fc receptors, this suggests that antibodies developed to treat autoimmune diseases, such as arthritis, lupus and asthma, should be optimally sialylated, whereas sialylation of cytotoxic antibodies designed to counteract diseases such as cancer should be restricted. The glycosyltransferase and glycosidase activities in antibody-producing cells that presumably limit inflammation to periods of infection may be promising alternative therapeutic targets. (Science 313, 670–673, 2006) PH
FOREWORD
Empowering microarrays in the regulatory setting

The mission of the US Food and Drug Administration (FDA) is to protect and promote the public health. One of the ways we carry out this mission is by advancing innovations that make medicines and foods safer, more effective or more affordable. Currently, almost nine out of ten investigational pharmaceuticals fail during clinical development. These failures are thought to be a consequence of both the variability among patients caused by intrinsic and extrinsic factors, and the inability to predict the effects of molecular entities in people based on in vitro and animal studies. There is a critical need for methodologies that can describe altered gene expression and cellular protein profiles in terms of their early metabolic consequences and relate these to developing, established or regressing pathologies in humans. Because of this need, drug companies are pursuing active R&D projects to develop reliable biomarkers of efficacy and toxicity using various technologies, often supported by DNA microarray data. Biomarkers can be defined as measurable characteristics that reflect physiological, pharmacological, toxicological or disease processes in humans or animals. The FDA is an active participant, with the regulated industry and the scientific community, in promoting innovative tools that will advance its mission. This participation is reflected in several significant documents, including the FDA white paper, The Critical Path to New Medical Products1 (which identifies pharmacogenomics as crucial to advancing medical product development and personalized medicine), the Draft Guidance on Pharmacogenetic Tests and Genetic Tests for Heritable Markers2 and the Guidance for Industry: Pharmacogenomic Data Submissions3. This last document recognizes that, at present, most pharmacogenomic data are of an exploratory or research nature, and FDA regulations do not require that these data be submitted to an investigational new drug application or that complete reports be submitted to a new drug application or biologics licensing application. However, to be prepared to appropriately evaluate anticipated future submissions, FDA and industry scientists need to develop an understanding of relevant scientific issues, including the following: the types of genetic loci or gene expression profiles being explored for pharmacogenomic testing; the test systems and techniques being employed; the problems encountered in applying pharmacogenomic tests to drug development and to clinical outcomes; and the ability to transmit, store and process large amounts of complex
pharmacogenomic data streams with retention of fidelity. As described in the Guidance for Industry: Pharmacogenomic Data Submissions3, the FDA is asking sponsors conducting such programs to consider providing pharmacogenomic data to the agency, voluntarily, when such data are not otherwise required by regulation. DNA microarray technology, a tool that can evaluate simultaneously the relative expression of thousands of genes, has developed rapidly and has been suggested as the presently preferred technology to identify early biomarkers of toxicity and disease. The outcome of microarray studies can be affected by many technical, instrumental, computational and interpretative factors. Indeed, a major criticism voiced about microarray studies has been the lack of reproducibility and accuracy of the derived data. To address this concern, the microarray community and regulatory agencies have developed a consortium to establish a set of quality assurance and quality control criteria to assess and assure data quality, to identify critical factors affecting data quality and to optimize and standardize microarray procedures so that biological interpretation and decision making are not based on unreliable data. These fundamental issues are addressed by the MicroArray Quality Control (MAQC) project. The MAQC project aims to establish quality control metrics and thresholds for objectively assessing the performance achievable by different microarray platforms and for evaluating the merits and limitations of various data analysis methods. It is anticipated that the MAQC project will help improve microarray technology and foster its appropriate application in discovery, development and review of FDA-regulated products. The results of these efforts are published in the compendium of papers that follows. It is anticipated that the efforts made by the contributors from diverse sectors of the scientific and regulatory communities will serve as a solid foundation on which to build a consensus on the use of microarray data in a regulatory setting. The development and validation of microarray quality control procedures will also serve as a foundation for integrating proteomics and metabonomics to accomplish an applied systems biology approach that elucidates complex disease pathways, identifies and validates therapeutic targets, and identifies disease biomarkers. Understanding of disease pathways will accelerate drug discovery and make clinical trials more informative by providing pharmacodynamic and safety information earlier in the drug development process. Ultimately, exploitation of microarray-based biomarkers will help bring about the transition from population-based medical treatment to true personalized medicine.

Daniel A. Casciano & Janet Woodcock

Daniel A. Casciano is professor at the University of Arkansas for Medical Sciences, 47 Marcella Drive, Little Rock, Arkansas 72223, USA, and a former director of the US Food and Drug Administration’s National Center for Toxicological Research in Jefferson, Arkansas, USA. Janet Woodcock is Deputy Commissioner for Operations, US Food and Drug Administration, Rockville, Maryland 20857, USA. e-mail: [email protected]

1. http://www.fda.gov/oc/initiatives/criticalpath/
2. http://www.fda.gov/cdrh/oivd/guidance/1549.pdf
3. http://www.fda.gov/cder/guidance/6400fnl.pdf
Impact of microarray data quality on genomic data submissions to the FDA

Felix W Frueh

How can microarray data best be exploited and integrated into the regulatory decision-making process?
Five years ago, the completion of the sequencing of the human genome was announced1,2, triggering many comments about the value of this knowledge for new approaches and insights into drug development. However, although genomics is used in an increasing number of drug development programs, the genomics-led ‘revolution’ in drug development has not happened yet. This can be attributed to a variety of reasons; one is the lack of a thorough evaluation of the quality of novel technologies, such as DNA microarrays, as well as of the manner in which the results of such experiments are analyzed and interpreted. To investigate the challenges presented to regulators by microarray data, the US Food and Drug Administration (FDA) spearheaded the formation of the MicroArray Quality Control (MAQC) consortium, which brings together researchers from the government, industry and academia to assess the key factors contributing to variability and reproducibility of microarray data. Ultimately, the data from this initiative will help determine a new set of standards and guidelines for the use of DNA microarray data.

Genomic data matures
Several factors have encouraged the adoption and integration of genomic data in drug development and regulatory assessment, including a better understanding of disease pathophysiology and improved targeting of drug molecules to their sites of action. However, there are challenges to further
Felix W. Frueh, US Food and Drug Administration, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA. e-mail: [email protected]
[Figure 1 is a bar chart; y axis: number of submissions (0–20); x axis: quarter of year, Q1 '04 through Q2 '06; two series: Consults and VGDS.]
Figure 1 Increase in formal requests (consults) for genomic data review (data submitted as part of regular INDs, NDAs or BLAs) to the Office of Clinical Pharmacology, and voluntary genomic data submissions (VGDS) to the FDA, since 2004. IND, investigational new drug; NDA, new drug application; BLA, biologic license application.
expansion of genomics use; one key issue frequently discussed is that genomic science has evolved more quickly than technologies suitable for generating consistent, high-quality genomic data. Before 2004, genomic information was largely absent from the investigational new drug submissions or new drug applications received by the FDA; today, that situation is changing (Fig. 1). This more than likely reflects the timelines associated with the drug development process overall and the integration of genomics within that process. It is therefore logical that by this time, we should be starting to see an increase in submissions to the FDA containing genomic information; indeed, the number of data submissions containing genomic information is increasing significantly (Fig. 1). On the basis of 20 voluntary genomic data submissions that have been submitted to the
FDA so far, it appears that the technologies for generating genomic data have only recently become a commodity of broader application. Recently, the integration of large-scale screening approaches (e.g., gene expression profiling or whole-genome single-nucleotide polymorphism (SNP) scans) has been observed in different stages of drug discovery and now also in drug development. Consequently, at this point, the generation and exploitation of genomic data generated from such large-scale efforts in modern drug development requires a regulatory environment adequately equipped to review such data.

The agency responds
Shortly after the human genome sequence was announced, a seminal paper by the FDA’s Lesko and Woodcock3 was published highlighting
Table 1 Focus areas of voluntary genomic data submissions as of February 2006

Therapeutic areas          Scientific field
Cancer (multiple types)    Biomarkers
Alzheimer disease          Genotyping devices
Hypertension               Microarrays
Hypoglycemia               Analysis software, databases
Depression                 Metabolic pathways
Obesity                    Enrichment design
Rheumatoid arthritis       Registry design
All                        Toxicology
All                        Biostatistics
the importance of new guidance for regulatory submissions containing genomic information. This ‘call to arms’ was followed by a series of workshops on pharmacogenomics organized by the FDA, the Drug Information Association (Horsham, PA, USA) and the Pharmaceutical Research and Manufacturers of America (PhRMA; Washington, DC, USA), which led to the development of a guidance document and ultimately facilitated a new type of voluntary data submission process—voluntary genomic data submission (VGDS). This process allowed for a new informal interaction between sponsors of voluntary submissions and regulators to discuss the science of novel, exploratory uses of pharmacogenomics. The Guidance for Industry: Pharmacogenomic Data Submissions4, released as a final guidance document in 2005, was accompanied by two additional documents explaining, respectively, the newly created VGDS path and the function and responsibilities of a newly created FDA-wide Interdisciplinary Pharmacogenomic Review Group (IPRG). At the same time, the FDA launched a new website (http://www.fda.gov/cder/genomics), which serves as a portal for regulatory information in the area of genomics. Together, these new regulatory resources allow and promote the submission of exploratory, cutting-edge genomic data to the FDA. This exploratory information is not used by regulators or industry as part of regulatory decision making, which is a critical aspect as it is understood that many of the data sets generated with this new technology are not yet sufficiently mature to contribute to critical regulatory decisions that have a wide-ranging impact on entire drug development programs. Nonetheless, these data are of value to regulators in understanding the changes underway in the processes, approaches and direction of drug research and development programs. It is also important to note that the Guidance for Industry: Pharmacogenomic Data Submissions4 is not a guidance about ‘voluntary’ submissions alone;
instead, in very general terms, it explains what types of genomic data need to be submitted to the FDA and when, and what types of data can be submitted on a voluntary basis.

Voluntary genomic data submission
The VGDS program creates a forum for scientific data exchange and discussions with the FDA outside of the regular review process. The VGDS program is used for a variety of strategic purposes and continues to evolve. For example, sponsors submit data on a voluntary basis to discuss the potential impact of using this information in the drug development program: this leads to questions such as ‘how can we test the hypothesis and how can it be validated’ or ‘will this approach provide us with a clinically useful answer,’ but also to such questions as ‘how do we best analyze the data’ or ‘what is the most suitable approach for a biological (that is, mechanistic) interpretation of the data?’ To date, the FDA/IPRG has received and reviewed ~20 voluntary genomic data submissions. These submissions varied significantly in content and focus (Table 1), and a large number contained microarray gene expression data. Even though most of the microarray data were generated using a photolithographically synthesized oligonucleotide chip platform (Affymetrix; Santa Clara, CA, USA), the heterogeneity of the data submissions was surprising, illustrating to the agency two key problems: first, the need for standardization in data generation, normalization and submission; and second, the need for measures of data quality. Although VGDS data allow the FDA to gain insight into specific drug development programs and genomic data used within them, the data often do not allow a systematic assessment of quality measures. Consequently, these VGDS data are ideal for creating snapshots of the state of the art in industry generation and use of genomic data, but they may or may not be consistent with more general quality standards. This poses a challenge to the interpretation of
the data themselves and the conclusions drawn from such data sets. Even so, our experience with reviewing microarray data sets has already given us invaluable information that has helped us both to design the experiments and strategies needed to create such standards and to point to the most critical aspects of data analysis and interpretation.

The genesis of MAQC
Together with a variety of other motivating factors outlined elsewhere in this issue of Nature Biotechnology, reviewers in the IPRG created a list of issues that need to be addressed for microarray data to become acceptable for regulatory review. For example, data normalization was identified as a major source of differences when comparing results and data interpretations performed by sponsors and FDA reviewers. The use of different data analysis protocols, as well as of different data interpretation tools, such as software for pathway analyses, could also explain differences. In other words, the VGDS process, and data received in voluntary submissions, have helped to identify the impact of different data analysis strategies, but the data themselves cannot be used to address and solve the issues. To do this, a broader, well-defined and generalizable process needs to be used, such as the MAQC, which allows the systematic exploration of all sources of variability, the assessment of the importance of each factor (e.g., how much a difference in data normalization contributes to the overall variability seen in the data) and, ultimately, the determination of a set of standards and best practices to be followed. It is reasonable to expect, however, that different parameters (technical as well as practical) may continue to hamper the implementation of ‘best practices’ in the future under certain circumstances. We are aware that the studies conducted in the MAQC occur in an ‘optimized’ environment that may not always be possible to implement because of limitations to infrastructure, slower turnaround time of sample processing or other restrictions that real-life settings bear. Regardless, it is critically important to identify, and to evaluate the importance of, each individual step in the generation, processing and interpretation of microarray data to be able to assess the extent to which these steps may contribute to data variability. For this reason—and because a regulatory agency must not only be able to understand the steps that led to the generation of data that are submitted, but should also be able to set this information into the context of other, similar data to assess data consistency and overall quality—it is important to know what could be considered a ‘gold standard.’
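The question of how much a normalization choice alone moves the results can be made concrete: push the same raw data through two normalizations and compare the resulting top-gene lists. The sketch below does this in Python on simulated data (not VGDS or MAQC data); the two normalizations shown are generic choices, not methods named in any submission.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_arrays = 5000, 10
log_expr = rng.normal(8, 2, size=(n_genes, n_arrays))
log_expr += rng.normal(0, 0.5, size=n_arrays)   # array-specific bias
log_expr[:200, 5:] += 1.0                       # true 2-fold changes, arrays 5-9

def median_scale(x):
    """Shift each array so that all arrays share a common median."""
    return x - np.median(x, axis=0) + np.median(x)

def quantile_normalize(x):
    """Force every array onto the mean empirical distribution."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_dist = np.sort(x, axis=0).mean(axis=1)
    return mean_dist[ranks]

def top_genes(x, k=200):
    # Arrays 0-4 are group A, arrays 5-9 are group B in this simulation.
    fold = x[:, 5:].mean(axis=1) - x[:, :5].mean(axis=1)
    return set(np.argsort(fold)[-k:])

a = top_genes(median_scale(log_expr))
b = top_genes(quantile_normalize(log_expr))
print(f"overlap of top-200 gene lists under the two normalizations: {len(a & b)}")
```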
Ultimately, this knowledge will help the agency to better understand and interpret data that have been generated under less ideal settings, because the sources of the largest data variability or uncertainty in data interpretation are known and can be addressed more adequately. It has been suggested that, for example, GLP (Good Laboratory Practices, 21 CFR part 58)5 could be used to clarify some of the issues around microarray standards and data variability. The requirements of 21 CFR part 58 apply to nonclinical studies submitted to support safety findings, including nonclinical pharmacogenomic studies intended to support regulatory decision making. Given the exploratory nature of many of the microarray studies, however, it seems unreasonable to expect these types of studies to comply with the full rigor of part 58. At the same time, it may not be feasible to conduct separate, long-term, non-GLP preclinical studies: sampling of tissues from GLP studies is a valuable means of conducting additional exploratory, investigational studies. Although the removal of tissue samples and the reason for removal (e.g., exploratory or mechanistic study, tissue banking) should be specified in the protocol, the removal of specimens for investigational purposes from a study does not invalidate the GLP status of the main toxicology study, if otherwise acceptable (see ref. 4, section IV.D. for more details).

The ultimate goal—standards
Data generated during modern drug development are becoming increasingly complex, and large data sets, such as microarray data, need to be handled and processed in an efficient and coherent fashion. The FDA has started to implement new data standards, such as the ones recommended by the Clinical Data Interchange Standards Consortium for new regulatory submissions. To date, these standards are available for such data sets as pharmacokinetics/pharmacodynamics; they are not yet available for genomic data submissions. Although we feel that it is too early for definitive recommendations on how, and in what format, to submit genomic information, it is advisable to work toward such standards, even at this early stage. Lessons learned from the VGDS program have already been helpful in explaining and recommending aspects of submitting genomic data, and the agency feels that efforts such as the MAQC will further help it create ‘best practices’ and recommendations detailing the preferred formats and extent of genomic data submissions that should accompany regulatory filings with the FDA. From the analysis of approximately ten voluntary data submissions that contained
microarray data, the agency has found that the results are heavily dependent on the quality of the starting material that is being used for a microarray experiment, the data analysis protocol and the biological pathway analysis tools available to interpret lists of statistically significant genes. Sample storage and preparation are critical for the reproducibility of these data. Poor sample quality can prevent data interpretation from being conclusive. A second critical factor is the data analysis protocol. Different sets of gene expression signatures with different biological contexts can be generated from the same raw data by different data analysis protocols. Different biological contexts can also be generated from the same gene expression signature by different biological pathway analysis tools. The biological interpretation of the data is common currency for VGDS discussions and regulatory review between the sponsors and the FDA. The uniqueness of a list of genes in a signature is not in and of itself the goal of exploratory biomarker investigations submitted as part of a VGDS. It could, however, be important in the selection of signatures for validation studies. Consequently, for the FDA to interpret microarray data that are submitted for regulatory purposes, it is critical for sponsors of genomic data submissions to include a precise description of the steps involved before the actual array experiment, including the method of sample collection, storage, RNA extraction and labeling, as well as the data analysis protocol and biological pathway interpretation tools applied to these data. Much has been published about the concordance, or lack thereof, of data generated on different gene expression analytical platforms6,7. Although the MAQC addresses this issue with the goals of establishing quality parameters for microarray experimentation and of identifying sources of variability, the agreement (overlap) of different platforms in real-life settings might actually be less important, especially in situations where a gene signature is to be identified that might be narrowed down to a handful of key genes. In these cases, it is likely that the assay itself will be moved onto a different platform (e.g., from a high-density microarray platform onto low-density arrays or quantitative PCR), which would require new and independent validation. For this scenario, it is first important to identify the particular subset of genes that is predictive of a given state (e.g., disease, treatment effect); this subset may or may not resemble the actual full set of genes altered in expression in that state. The use of one particular (e.g., high-density)
microarray platform can be used to screen for and identify these ‘predictive’ genes without the need to get a full ‘representative’ picture of the transcriptome, with the intention of producing a signature set of genes that can be used in downstream applications, such as clinical trials and clinical practice. Consequently, it is sufficient and reasonable to expect that only a subset of the transcriptome will be analyzed for this purpose. This approach is not unlike the use of haplotypes and ‘tag-SNPs’ when genotyping (rather than gene expression profiling) is performed to characterize a particular disease or disease state. Additional sources of discordance among platforms include, for example, the fact that often different locations in the same gene are used as probes on different platforms, which may either result in different reported intensities (that are therefore interpreted as different fold changes) or result in the analysis of particular splice variants of a gene.

Conclusions
Despite the limitations outlined above, we believe that microarray platforms are suitable tools to produce high-quality and reliable data that will prove useful in drug development and regulatory decision making. Understanding the limitations and assessing the variability is imperative, however. As those in regulatory agencies are exposed to, and expected to adequately analyze, microarray data, we need a better understanding of the technology and agreement on standards and formats for submission of data and the interpretation of the results. The MAQC provides an excellent and unprecedented resource to determine ‘best microarray practices,’ including the use of reference material, data assembly and formats. This and other efforts, such as the VGDS program at the FDA, are instrumental to the efficient and effective use of microarray data in the regulatory review process. Given the increasing number of genomic data submissions to the agency, these initiatives are happening just at the right time.

Disclaimer: The views expressed in this article are those of the author and not necessarily those of the US Food and Drug Administration.

1. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).
2. Venter, J.C. et al. Science 291, 1304–1351 (2001).
3. Lesko, L.J. & Woodcock, J. Pharmacogenomics J. 2, 20–24 (2002).
4. http://www.fda.gov/cder/guidance/6400fnl.pdf
5. http://www.cfsan.fda.gov/~dms/opa-pt58.html
6. Shi, L. et al. Expert Rev. Mol. Diagn. 4, 761–777 (2004).
7. Shi, L. et al. BMC Bioinformatics 6 (suppl. 2), S12 (2005).
A framework for the use of genomics data at the EPA

David J Dix, Kathryn Gallagher, William H Benson, Brenda L Groskinsky, J Thomas McClintock, Kerry L Dearfield & William H Farland

The US Environmental Protection Agency is developing a new guidance that outlines best practice in the submission, quality assurance, analysis and management of genomics data for environmental applications.
Four years ago, the US Environmental Protection Agency’s (EPA) paper Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA1 identified four areas of oversight likely to be influenced by genomics data. These were the prioritization of contaminants and contaminated sites, environmental monitoring, reporting provisions and risk assessment. The paper also identified a critical need for analysis and acceptance criteria for genomics information
David J. Dix is in the Office of Research and Development, US Environmental Protection Agency, National Center for Computational Toxicology (D343-03), Research Triangle Park, North Carolina 27711, USA; Kerry Dearfield is at the US Department of Agriculture, Food Safety and Inspection Service and, together with Kathryn Gallagher and William H. Farland, is in the Office of the Science Advisor (8105R), US Environmental Protection Agency, Ariel Rios Building, 1200 Pennsylvania Avenue, NW, Washington, DC 20460, USA; William H. Benson is in the Office of Research and Development, US Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Gulf Ecology Division, Sabine Island Drive, Gulf Breeze, Florida 32561, USA; Brenda L. Groskinsky is at the US Environmental Protection Agency, Region 7, 901 North 5th Street, Kansas City, Kansas 66101, USA; and J. Thomas McClintock is at the Office of Prevention, Pesticides and Toxic Substances, US Environmental Protection Agency, MC 7101M, 1200 Pennsylvania Avenue, NW, Washington, DC 20460, USA. e-mail: [email protected].
in scientific and regulatory applications. As a response to these challenges, the Genomics Technical Framework and Training Workgroup was formed and is currently developing an Interim Guidance for Microarray-Based Assays: Regulatory and Risk Assessment Applications at EPA. This guidance will address genomics data submission, quality assurance, analysis and management in the context of current possible applications by the EPA and the broader academic and industrial community. The guidance will also identify future actions that are needed to incorporate genomics information more fully into the EPA’s risk assessments and regulatory decision making.

The growing impact of genomics
Toxicogenomics is the examination of changes in gene expression, protein and metabolite profiles within cells and tissues, complementary to more traditional toxicological methods. Genomics tools provide detailed molecular data about the underlying biochemical mechanisms of disease or toxicity (that is, disease etiology and biochemical pathways) and could represent sensitive measures of exposure, new approaches for detecting effects of such exposures or methods for predicting genetic susceptibilities to particular stressors in the environment. Thus, genomics, proteomics and metabonomics/metabolomics can provide useful weight-of-evidence data along the source-to-outcome continuum, when appropriate bioinformatic and computational methods are available for integrating molecular, chemical and toxicological information (Fig. 1). Identification of changes in gene expression using DNA microarrays (a collection of microscopic DNA spots attached to a solid
surface) is becoming an important genomics tool for understanding toxicological processes, and informing hazard identification and mode of action analysis. In fact, microarray data have already been encountered in agency program offices. For example, a pesticide registrant cited a published genomics article2 as part of the mode-of-action data package submitted for a product registration to the EPA’s Office of Pesticide Programs. It is not unreasonable to expect similar submissions to be made by other pesticide registrants, or other stakeholders, in support of mode-of-action analyses. Thus, there is an important need for the agency to be proactive and develop processes and policies to address how genomics data will be used in agency decision making. The EPA anticipates the development of increasing volumes of microarray data by environmental researchers, and as a part of the regulatory process. To ensure optimal use of these data, the EPA is developing science policy and guidance to address the submission, analysis and storage of microarray data (Table 1). The first step in this process was the Interim Policy on Genomics3.

The Interim Policy on Genomics
With the advent and growth of genomics data, a major consideration for the EPA was what to do with information currently being generated by genomics technologies and available to the agency. Although it was clear that genomics data are already available, much of these data have not been correlated with frank adverse effects, such as cancer or reproductive impairment. Therefore, in June 2002, the EPA issued an interim policy position—the Interim Policy on Genomics—which provides guidance concerning how and when genomics information
[Figure 1 is a flow diagram: source/stressor formation → environmental concentration → exposure → dose → biological event → effect/outcome, with genomics/proteomics/metabonomics and computational methods/bioinformatics spanning the continuum.]
Figure 1 Genomics, proteomics and metabonomics/metabolomics can provide useful weight-of-evidence data along the source-to-outcome continuum when appropriate bioinformatic and computational methods are applied toward integrating molecular, chemical and toxicological information. The source-to-outcome continuum captures the entire paradigm from the source of environmental contaminants and stressors, through to exposure, effects and ultimate outcomes on human health and ecological populations.
should be used to assess the risks of environmental contaminants under the various regulatory programs implemented by the agency3. This policy encourages and supports continued genomics research for understanding the molecular basis of toxicity and for developing indicators of exposure, effects and susceptibility; however, the interim policy clearly states that genomics data alone are currently insufficient as a basis for risk assessment and management decisions. The interim policy does state that genomics data may be useful in a weight-of-evidence approach for human health and ecological health risk assessments and can be used in concert with all the other information the EPA considers for a particular assessment or decision.

Implications for regulatory oversight
After the release of the Interim Policy on Genomics, the EPA’s Science Policy Council created an intra-agency Genomics Task Force and charged it with examining the broader implications genomics is likely to have on the EPA programs and policies. In 2004, this Genomics Task Force developed a genomics white paper1 that identified four areas likely to be influenced by genomics information, both within the EPA and in submissions to the agency: (i) prioritization of contaminants and contaminated sites, (ii) monitoring, (iii) reporting provisions and (iv) risk assessment. The first of these four regulatory applications relates to the prioritizations done by many agency programs to help focus resources on greater hazards or risks. Genomics data can be used as part of the body of information considered in the EPA’s prioritization efforts,
including testing to more fully investigate a hazard and making predictions based on this testing. Examples include the EPA’s voluntary high production volume (HPV) program, in which chemicals manufactured in large amounts are identified and their hazards characterized according to chemical category. Here, genomics may be part of a suite of tools to help confirm category groupings of HPV chemicals and identify which chemicals (or groups of chemicals) may present greater hazard or risk. The second regulatory area relates to monitoring activities at the EPA, generally for compliance and assessment purposes. Monitoring activities include the following: (i) chemical and physical analyses of air, water, soil and sediment; (ii) toxicity testing of various environmental media or chemicals; (iii) analysis of plant, animal and human tissue residues for various chemicals or their breakdown products; (iv) ecological community structure analyses; and (v) microbial community and pathogenic microorganism analyses of air, water, soil and sediment. These activities may be one of the first nearer term applications of genomics data by the EPA. For example, they are being applied to microbial source tracking to help identify nonpoint sources responsible for the fecal pollution of water systems4. As a third regulatory application of genomics, the white paper designated reporting provisions under EPA statutes—for example, Toxic Substances Control Act (TSCA) section 8(e) and Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) section 6(a)(2). Clearly, to have an effect on reporting provisions, the linkage of genomic changes to adverse effects or response pathways must be established and addressed.
NATURE BIOTECHNOLOGY VOLUME 24 NUMBER 9 SEPTEMBER 2006
As the predictability and validity of genomics methods increase, the EPA will reevaluate this stance on reporting provisions. The fourth and final area likely to be influenced by genomics information is risk assessment. Genomics data, along with conventional toxicological data, may help identify which molecular events are crucial to the biological processes that represent a mode of action of a chemical or stressor. Comparative genomics might aid in the interpretation of the human relevance of animal toxicity findings and assist in assessing impacts on susceptible populations and life stages. For example, genomics data may facilitate elucidation of the possible key events in the modes of carcinogenic action, such as mutagenicity, mitogenesis, inhibition of cell death, cytotoxicity with regenerative cell proliferation and immune suppression. A mode of action comprising the same set of key events may apply to many different compounds. Thus, mRNA transcript, protein or metabolite profiles may be developed that can advance the screening of individual chemicals and allow faster, more accurate categorization into defined classes according to their mode of action. Better understanding of toxicity pathways may also provide insights into chemical interactions, and possibly improve mixtures and cumulative risk assessments. The genomics white paper1 not only identifies these four regulatory and risk assessment applications of genomics data, but also highlights some challenges and needs for the EPA5. These include research needs to link genomic responses to adverse effects and support proper interpretation of genomics data, development of acceptance criteria for genomics data submissions to the EPA, management and storage of the large amount of genomics information that the EPA is projected to handle, and training of the EPA risk assessors and managers so that they can interpret and understand genomics data. To address some of these needs, the EPA has formed the Genomics Technical Framework and Training Workgroup. This ‘workgroup’ has facilitated coordination efforts across the agency as well as with other federal agencies (e.g., the US Food and Drug Administration (FDA) and the US Department of Agriculture) and continues science policy development efforts for use of genomics data in regulatory and risk assessment applications (Table 1).

An interim guidance for microarray-based assays
The Genomics Workgroup considered all of the ‘omics’ technologies and applications and has decided that an interim guidance on the use of data generated by DNA microarray technology
would be the most beneficial at this time to the agency and its academic and industrial communities. Thus, a document is currently under development that describes (i) the data that should be submitted to the EPA for microarray studies, (ii) a performance approach to quality assessment parameters, (iii) microarray data analysis approaches and (iv) data management and storage issues for microarray data submitted to, or used by, the EPA. The purpose of the Interim Guidance for Microarray-Based Assays will be to provide information to the community and other interested parties regarding the submission of DNA microarray data to the EPA, and to provide guidance for reviewers in evaluating and using such data or information. The interim guidance is intended to be used by EPA Program and Regional Offices to determine the applicability of specific genomics information to the evaluation of specific hazards or risks. It is important to note that microarray technology is rapidly changing, such that methodologies for generating genomics data and ensuring their quality will also likely change. Even so, the need to ensure consistency and quality in generating, analyzing and using the data will remain. Thus, as the science develops, the EPA expects to revisit and revise the interim guidance. With respect to quality assurance, the interim guidance will not prescribe specific methods to be used in microarray experiments beyond compliance with MIAME (minimum information about a microarray experiment) standards. Indeed, a slightly modified version of the MIAME standards6 is proposed in the interim guidance as a data submission template for the EPA, recognizing that this submission template will be subject to change as the technology evolves. In addition, the interim guidance will provide information regarding submission of microarray data to the EPA to facilitate appropriate review and consistent evaluation. The interim guidance will likely assert that a systematic approach to data analysis is necessary for the use of genomics data in risk assessments.
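In practice, such a submission template amounts to a structured metadata record covering MIAME’s six required sections. The sketch below is a hypothetical illustration only; the field names and the validation check are invented, not an EPA or MGED schema.

```python
# A hypothetical, MIAME-style submission record; the six top-level sections
# mirror MIAME's required information, but every field name here is
# illustrative, not an official schema.
submission = {
    "experimental_design": {
        "aim": "hepatic gene expression after 14-day chemical X exposure",
        "factors": ["dose", "time"],
        "replicates_per_group": 3,
    },
    "array_design": {"platform": "hypothetical-array-v1", "n_probes": 22000},
    "samples": [
        {"id": "liver_ctrl_1", "organism": "Rattus norvegicus", "dose_mg_kg": 0},
        {"id": "liver_hi_1", "organism": "Rattus norvegicus", "dose_mg_kg": 50},
    ],
    "hybridizations": [{"sample": "liver_ctrl_1", "array": "A001"}],
    "measurements": {"raw_files": ["A001.cel"], "processed_files": ["expr.txt"]},
    "normalization": {"method": "quantile", "software": "example v1.0"},
}

def validate(record, required=("experimental_design", "array_design", "samples",
                               "hybridizations", "measurements", "normalization")):
    """Flag any missing top-level MIAME-style section before submission."""
    return [section for section in required if section not in record]

assert validate(submission) == []  # all six sections present
```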
As an interim solution, it will put forward a genomics ‘data evaluation record’ template as a tool for systematic extraction and organization of data from genomics studies. The transfer of these evaluations, and the underlying genomics data, into searchable, electronic databases will be essential to make these complex data truly useful in risk assessments. Furthermore, development of the EPA’s databases containing gene expression profiles for a wide variety of compounds will facilitate creation of the statistical and computational methods for predicting the toxic potential of a chemical.

Additional initiatives
In concert with other federal agencies, the EPA has also begun to investigate and evaluate the currently available computational tools for genomics data analysis. The agency has been testing the functionality of toxicogenomics data management and analysis solutions and how these solutions can be applied toward the EPA’s efforts. For example, the FDA’s National Center for Toxicological Research’s ArrayTrack database7 is being tested. The EPA has also been collaborating with the FDA, the US National Institutes of Health, the National Institute of Standards and Technology and other stakeholders on the MicroArray Quality Control (MAQC) project to establish protocols for genomics data analysis. The first results of this initiative are presented in this special issue of Nature Biotechnology. Furthermore, the agency has participated in US National Academy of Sciences (Washington, DC, USA) workshops and International Life Sciences Institute (Washington, DC, USA) projects on the application of genomics to toxicology and risk assessment. Building on these previous efforts, the interim guidance suggests continued exploration of genomics tools appropriate to the application of genomics data in risk assessments and regulatory decision making. Ultimately, the EPA will need quantitative and predictive modeling tools, which will likely require the development of new algorithms and models. These tools will need to provide reliable and repeatable genomics data analysis and the consistent and necessary information for
EPA risk assessments and decision-making processes. The scientific, mathematical and statistical methods that are used for these models and analyses will need to be validated and standardized. Because of the large volumes of genomics and associated toxicological data projected, it is essential that the EPA consider the development of a complete data management solution. A preliminary outline of the functional requirements of such a solution is provided in the interim guidance. In addition, this data management solution would need to address requirements unique to scientifically based risk assessments, confidential and proprietary data security, public access and other aspects of regulatory application. The Genomics Workgroup has noted that consistency, scientific and operational robustness, common but controlled access, and availability in a scalable environment are also part of these data management requirements. Although the EPA has begun to use and develop bioinformatics research approaches, both intramurally (e.g., in the National Center for Computational Toxicology; http://www.epa.gov/comptox) and extramurally (e.g., STAR-funded Environmental Bioinformatics Centers, including the Research Center for Environmental Bioinformatics and Computational Toxicology at the University of Medicine & Dentistry of New Jersey, Piscataway, New Jersey, and the Carolina Environmental Bioinformatics Research Center at the University of North Carolina, Chapel Hill; http://www.epa.gov/comptox/award_biocenters.html), an agency-wide data management solution integrating genomics, toxicological and other key data required for regulatory applications is not yet realized. The interim guidance will conclude with the Genomics Workgroup’s recommendations for follow-up activities. These will include the following: first, further development of genomics training materials and modules, to be offered throughout the agency to risk assessors and decision makers, who will be faced with the challenge of interpreting and applying genomics information in regulatory and prioritization processes; second, continued
Table 1 US EPA development of science policy for the use of genomics data in regulatory and risk assessment applications

Year: 2002
Publication: Interim Policy on Genomics
Purpose: Defined EPA's initial approach to using genomics information in risk assessment and decision making.
URL: http://www.epa.gov/osa/spc/genomics.htm

Year: 2004
Publication: Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA
Purpose: Identified the impact genomics is likely to have on (i) prioritization of contaminants and contaminated sites, (ii) monitoring, (iii) reporting provisions and (iv) risk assessment.
URL: http://www.epa.gov/osa/genomics.htm

Year: external review pending
Publication: Interim Guidance for Microarray-Based Assays: Regulatory and Risk Assessment Applications at EPA
Purpose: Describes (i) microarray data submission to the agency, (ii) quality assessment parameters, (iii) data management, analysis and evaluation and (iv) training needs for risk assessors and decision makers.
URL: http://www.epa.gov/osa/index.htm (pending)
collaboration of EPA personnel with staff from other federal agencies and stakeholders in the development of tools for the analysis of genomics data; third, application of this guidance to a series of case studies to evaluate its utility in risk assessment and regulatory applications; and fourth, the updating of this guidance as needed and as the technology evolves.

Conclusions
The advent of genomics and the burgeoning amount of genomics-related data present opportunities and challenges to the EPA in fulfilling its regulatory and risk-assessment responsibilities. To meet these opportunities and challenges, the EPA has initiated a series of activities to properly address the use of genomics information and has reorganized its research activities into a coordinated Computational Toxicology Program. To clarify how genomics information will be considered
going forward at the EPA, the agency has provided guidance in its Interim Policy on Genomics3. The EPA has also developed a genomics white paper1 outlining implications and applications of genomics at the EPA. The final part of the EPA’s response has been to generate a technical framework that focuses on a specific genomics technology, DNA microarrays. This framework will be outlined in the forthcoming Interim Guidance for Microarray-Based Assays. It represents a first step for the EPA in developing formats, methodologies and consistent approaches for dealing with genomics information submitted to the agency.

ACKNOWLEDGMENTS
This perspective is based on the efforts of the dedicated EPA staff within the Office of the Science Advisor, the Science Policy Council, the Agency-wide Genomics Task Force and the subsequent Genomics Technical Framework Workgroups.
Disclaimer: This work was reviewed by EPA and approved for publication but does not necessarily reflect official agency policy.

1. US Environmental Protection Agency. Potential Implications of Genomics for Regulatory and Risk Assessment Applications at EPA. Science Policy Council. EPA Publication No. EPA 100/B-04/002 (EPA, Washington, DC, 2004). http://www.epa.gov/osa/genomics.htm
2. Genter, M.B., Burman, D.M., Vijayakumar, S., Ebert, C.L. & Aronow, B.J. Physiol. Genomics 12, 35–45 (2002).
3. US Environmental Protection Agency, Science Policy Council. Interim Policy on Genomics (EPA, Washington, DC, 2002). http://www.epa.gov/osa/spc/genomics.htm
4. US Environmental Protection Agency. Microbial Source Tracking Guide Document (EPA, Washington, DC, June 2005). http://www.epa.gov/ORD/NRMRL/pubs/600r05064/600r05064.htm
5. Dearfield, K.L., Benson, W.H., Gallagher, K. & Johnson, J. in Genetics and Environmental Policy: Ethical, Legal and Regulatory Perspectives (ed. Marchant, G.), in press (Johns Hopkins University Press, Baltimore, 2007).
6. http://www.mged.org/Workgroups/MIAME/miame.html
7. http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/
Data quality in genomics and microarrays

Hanlee Ji & Ronald W Davis

Objective quality control indices are needed to facilitate clinical implementation of DNA microarrays used in transcriptional profiling as well as other types of genomic analysis.
DNA microarrays are increasingly used for investigating gene expression in human diseases with the hope of identifying signatures that correlate with specific clinical outcomes. The discovery of these signatures offers the tantalizing possibility that they could be translated into fully fledged clinical diagnostic tests. Significant hurdles exist, however, in transitioning microarray technology and gene expression analysis into the complicated realm of the clinic. Namely, the quality of gene expression data, a measure of its general reproducibility and, ultimately, its true biological relevance, requires significant improvement1. For example, comparing gene expression studies using different microarray formats is fraught with difficulty, even under circumstances in which the same type of tissue is analyzed2. In one recent prominent example, different clinical conclusions were derived from the same gene expression data set3. Currently, few if any objective metrics or established quality control standards are used to evaluate the quality of microarray studies. Often, the assessment of microarray data quality requires running replicates and making intra-sample comparisons to determine reproducibility. Using replicate arrays is an expensive strategy and cannot be routinely applied where quantities of precious biological samples, such as tumor biopsies, are limited. The majority of clinically related studies do not have replicates, leaving genomic data purveyors little in the way of guidance to determine the overall quality of
Hanlee Ji is in the Department of Medicine, Division of Oncology and Ronald W. Davis is in the Department of Biochemistry and Department of Genetics at Stanford University School of Medicine, 269 Campus Drive, CCSR 1115, Stanford, California 94305-5151, USA. e-mail: [email protected]
submitted microarray data. Two major efforts currently under way, however, offer an opportunity to improve genomic data quality for gene expression.

Looking at gene expression data quality
Several studies have addressed the issues of genomic data quality in the realm of gene expression analysis through comparison of different formats of microarrays4–8. To date, the MicroArray Quality Control (MAQC) project—the first results of which are presented in this issue—and the External RNA Controls Consortium (ERCC) are the most comprehensive efforts in assessing and comparing gene expression data derived from common samples among different microarray platforms9,10. Both projects are focused on the analysis of highly calibrated reference RNA pools with the potential for wide distribution to the research community. Analysis of the MAQC and ERCC RNA sets has resulted in extensive gene expression data sets with validation across many microarray platforms and systems (e.g., by quantitative reverse transcription (qRT)-PCR)9,10. The public release of these results should spawn new applications to evaluate gene expression data quality. A vital part of the MAQC project has been the identification of common transcripts that are mutually represented among the various microarray platforms included in the analysis9. This aspect will enormously facilitate cross-platform comparisons of gene expression and open the door to robust meta-analyses of clinical gene expression studies.
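Concretely, ‘common transcripts’ means mapping each platform’s probe identifiers to shared gene identifiers and comparing measurements only on the intersection. A minimal sketch with invented probe IDs and log-ratios (not MAQC mappings):

```python
import numpy as np

# Hypothetical probe-to-gene maps for two platforms (identifiers invented).
platform_a = {"A_0001": "TP53", "A_0002": "EGFR", "A_0003": "MYC", "A_0004": "GAPDH"}
platform_b = {"B_9001": "EGFR", "B_9002": "TP53", "B_9003": "GAPDH", "B_9004": "BRCA1"}

log_ratio_a = {"A_0001": 1.2, "A_0002": -0.4, "A_0003": 2.1, "A_0004": 0.0}
log_ratio_b = {"B_9001": -0.5, "B_9002": 1.0, "B_9003": 0.1, "B_9004": 0.8}

# Re-key each platform's measurements by gene, then take the intersection.
by_gene_a = {gene: log_ratio_a[probe] for probe, gene in platform_a.items()}
by_gene_b = {gene: log_ratio_b[probe] for probe, gene in platform_b.items()}
common = sorted(set(by_gene_a) & set(by_gene_b))

a = np.array([by_gene_a[g] for g in common])
b = np.array([by_gene_b[g] for g in common])
print(common, np.corrcoef(a, b)[0, 1])  # cross-platform agreement on shared genes
```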
The completion of these projects also provides an opportunity to advocate for the adoption of genomic data quality control processes into clinically oriented studies. It is critical that there be wide acceptance of some type of quality control standards at the planning stages of clinically oriented projects. Accurate and routinely reproducible data will improve the clinical validity of molecular signatures and speed transition into the clinical setting. For translational research, adoption of quality control will be faster if there is easy accessibility of quality standards to any size research group. We anticipate that establishing quality control standards for genomic data will substantially reduce genomic analysis costs by eliminating the need for replicate experiments and improve the design and implementation of large translational studies involving hundreds if not thousands of patient samples. Another benefit is that genomic data quality standards will facilitate future technology development. When established standards exist, it is much easier to conduct proof-of-principle studies using new systems.

Moving beyond RNA
There is a general recognition that quality control standards for transcriptional profiling experiments are an absolute necessity given the complexities of working with RNA, the wide variety of methodologies and different microarray platforms. We suggest that it is equally important to establish such standards for all other microarray formats, including array comparative genomic hybridization (CGH) and genotyping. When microarrays are used, the analysis of DNA has major advantages over RNA in terms of its physical and biochemical properties. Even so, many of the inherent issues of microarray performance and reliability in analyzing RNA are just as relevant. One could imagine a future consortium, similar to the MAQC and ERCC, developing a universal set of standardized DNA references with known genotypes and gene-copy alterations for use in high-throughput genotyping, sequencing and gene-copy microarray technologies.
[Figure 1 panels: Clinical sample preparation: RNA extraction (protocol, sample amount, sample quality, degradation, contamination). Array processing: enzymatic preparation (temperature, time of incubation, enzyme quality); quality control checks (temperature, enzyme quality, length of incubation, quality of product); hybridization and staining (temperature, time of incubation, properties of staining, array quality); washing (buffer salt concentration, temperature, array quality). Interrogation: scanning (PMT (photomultiplier tube detector) settings, saturation of signal, array quality).]
Figure 1 DNA microarray analysis of human tissues involves multiple steps and protocols. As a result, these assays are susceptible to variance throughout the process. Improved methods are needed for monitoring experimental variation during this workflow. (Pictures used with permission of Affymetrix.)
establishing a consensus about the samples to be included. A set of DNA references would enable quality control assessment, facilitate data set comparisons among different microarray platforms and provide a valuable resource for validating new genomic technologies.

Controls to assess genomic data quality
Numerous studies have identified sources of inter- and intralaboratory error and variability in microarray experimental results6,7,11. These include variation in tissue processing, RNA extraction, microarray assay protocols and inherent biological differences in normal tissue11. The MAQC's and ERCC's RNA pools of highly characterized transcripts could be incorporated into the microarray workflow. For example, an individual site could analyze the RNA pools characterized by the MAQC project to make performance comparisons. Leveraging the MAQC data sets will prompt the development of methods to increase confidence that the differential expression of specific genes will be reproducible. For example, we and others (H.J. and R.W.D., unpublished data; Lin, G., He, G., Shi, L. & Zhong, S., personal communication) are currently developing algorithmic methods that use the MAQC data set to account for interlaboratory variation in the discovery of differentially expressed genes.

Another strategy for monitoring genomic data quality would rely on highly characterized external controls or RNA pools at every step of a microarray experimental protocol12. We and others have suggested the incorporation of a universal set of nucleic acid controls tailored to measure performance at individual steps of microarray analysis. Several RNA transcripts or pools, PCR products, oligonucleotides or other external 'spiked' controls would be added at each individual step of the microarray analysis process. For example, one could imagine multiple spiked external control sequences that would directly measure the quality, and subsequent level of degradation, of nucleic acid extracted from processed tissues; other controls would assess an assay's enzyme quality, and some would be specific for the hybridization process. These external controls would be assessed via microarray hybridization. To facilitate the development of external controls, one could design synthetic sequences as probes to avoid problems of cross-hybridization and to reduce the interfering effects of nucleic acid secondary structure. Incorporating synthetic sequence probes and targets would be quite similar to the development of oligonucleotide barcodes in microarrays, which has proven to be quite robust13. Another application that would improve genomic data quality is the inclusion of universal external controls at different concentrations for normalization in individual microarray experiments. As the final step of a quality control assessment process, a formal report of quality control performance would be incorporated into the resulting data file output. Building in a quality control assessment and an incorporated report of quality metrics would be enormously useful in a variety of settings (Fig. 1). We offer some hypothetical examples: a genomic data quality report would assist the individual researcher in measuring the performance of an experiment 'on the fly'; it would provide journal editors with external criteria for judging the quality of submitted data sets; and embedded quality metrics in genomic data reports would substantially speed the complicated task of regulatory agency analysis and review. A universal set of quality control reagents for genomic data quality assessment also has the potential to decrease
costs. Individual researchers could assess their genomic data quality at the very beginning of a project and avoid costly mistakes. Like its MAQC and ERCC predecessors, any future effort would require general agreement and coordination among the research community, government agencies, microarray manufacturers and producers of biological reagents. We believe this is a realistic goal.

Conclusions
The completion of the MAQC and ERCC collaborative projects sets the foundation for future consortia working toward universal genomic data quality control standards. These projects herald a movement in the genomics community to improve the reliability of microarray technologies in both basic and clinical research. Perhaps our greatest aspiration is that, through these efforts, we will improve genomic data quality sufficiently to spur rapid development of the next generation of genomic diagnostics and thus have a positive impact on the provision of human healthcare.

1. Steinmetz, L.M. & Davis, R.W. Nat. Rev. Genet. 5, 190–201 (2004).
2. Tan, P.K. et al. Nucleic Acids Res. 31, 5676–5684 (2003).
3. Tibshirani, R. N. Engl. J. Med. 352, 1496–1497 (2005).
4. Jarvinen, A.K. et al. Genomics 83, 1164–1168 (2004).
5. Bammler, T. et al. Nat. Methods 2, 351–356 (2005).
6. Irizarry, R.A. et al. Nat. Methods 2, 345–350 (2005).
7. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Nat. Methods 2, 337–344 (2005).
8. Shi, L. et al. BMC Bioinformatics 6 (suppl. 2), S12 (2005).
9. MAQC Consortium. Nat. Biotechnol. 24, 1151–1161 (2006).
10. Baker, S.C. et al. Nat. Methods 2, 731–734 (2005).
11. Cobb, J.P. et al. Proc. Natl. Acad. Sci. USA 102, 4801–4806 (2005).
12. van Bakel, H. & Holstege, F.C. EMBO Rep. 5, 964–969 (2004).
13. Eason, R.G. et al. Proc. Natl. Acad. Sci. USA 101, 11046–11051 (2004).
ANALYSIS
Evaluation of DNA microarray results with quantitative gene expression platforms

Roger D Canales1,10, Yuling Luo2,10, James C Willey3,10, Bradley Austermiller3, Catalin C Barbacioru1, Cecilie Boysen4, Kathryn Hunkapiller1, Roderick V Jensen5, Charles R Knight6, Kathleen Y Lee1, Yunqing Ma2, Botoul Maqsodi2, Adam Papallo5, Elizabeth Herness Peters6, Karen Poulter1, Patricia L Ruppel7, Raymond R Samaha1, Leming Shi8, Wen Yang2, Lu Zhang1 & Federico M Goodsaid9

We have evaluated the performance characteristics of three quantitative gene expression technologies and correlated their expression measurements to those of five commercial microarray platforms, based on the MicroArray Quality Control (MAQC) data set. The limit of detection, assay range, precision, accuracy and fold-change correlations were assessed for 997 TaqMan Gene Expression Assays, 205 Standardized RT (StaRT)-PCR assays and 244 QuantiGene assays. (TaqMan is a registered trademark of Roche Molecular Systems, Inc.) We observed high correlation between quantitative gene expression values and microarray platform results and found few discordant measurements among all platforms. The main cause of variability was differences in probe sequence, and thus target location. A second source of variability was the limited and variable sensitivity of the different microarray platforms for detecting weakly expressed genes, which affected the interplatform and intersite reproducibility of differentially expressed genes. From this analysis, we conclude that the MAQC microarray data set has been validated by alternative quantitative gene expression platforms, supporting the use of microarray platforms for the quantitative characterization of gene expression.

To evaluate the performance characteristics of gene expression measurement technologies and the data they generate, one must identify alternative quantitative platforms that can be used as references. The MAQC consortium used the TaqMan assay, StaRT-PCR and
1Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404, USA. 2Panomics, Inc., 6519 Dumbarton Circle, Fremont, California 94555, USA. 3University of Toledo, Toledo, Ohio 43614, USA. 4ViaLogy Corp., 2400 Lincoln Avenue, Altadena, California 91001, USA. 5University of Massachusetts-Boston, 100 Morrissey Blvd., Boston, Massachusetts 02125, USA. 6Gene Express, Inc., 975 Research Drive, Toledo, Ohio 43614, USA. 7Innovative Analytics, 7107 Elm Valley Dr., Kalamazoo, Michigan 49009, USA. 8National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 9Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Ave., Silver Spring, Maryland 20993, USA. 10These authors contributed equally to this work. Correspondence should be addressed to F.M.G. ([email protected]).
Published online 8 September 2006; doi:10.1038/nbt1236
QuantiGene platforms for this purpose because these platforms had been shown to have high assay specificity and detection sensitivity, broad linear dynamic range and high signal-to-analyte response1–4. These platforms were used to evaluate some of these performance characteristics in each commercial whole-genome microarray platform investigated in the MAQC study. In addition, we report the fold-change correlation of each alternative quantitative platform relative to these microarray platforms. We observed high correlations between the quantitative platform measurements and the data derived from the microarrays, and we were also able to identify the sources of variability among microarray platforms relative to the quantitative platforms. Here we define validation as a measure of the concordance and discordance of the microarray data with the quantitative reference platforms selected: we used the results of the quantitative platforms as a reference against which to evaluate the microarray platforms. We have thus not attempted to establish a 'gold standard' for expression measurements but rather a solid reference point to allow data validation.

Quantitative, real-time PCR has been developed over the last decade to specifically measure template molecule numbers4,5. The development of fluorogenic probes6 enabled accurate quantification of PCR products through measurement of a fluorescence signal during the exponential amplification phase. TaqMan Gene Expression Assays are based on the use of the 5′ nuclease activity of Taq polymerase to hydrolyze a target-specific, dual-labeled, fluorogenic hybridization probe during the extension phase7. The number of template transcript molecules in a sample is determined by recording the amplification cycle in the exponential phase (the cycle threshold, or CT) at which the fluorescence signal can be detected above background fluorescence. Thus, the starting number of template transcript molecules is inversely related to CT: the more template transcript molecules at the beginning, the lower the CT7,8. TaqMan assays have been used in recent studies to validate microarray data9–11.
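To make the inverse CT-abundance relationship concrete, the following minimal Python sketch converts a raw CT value into an approximate starting copy number. It assumes 100% amplification efficiency and uses the calibration stated in the Methods of this paper (a CT of 35 corresponds to roughly 5 molecules on this system); the function name and the example CT values are ours, for illustration only.

```python
# Hedged sketch: estimate starting template molecules from a raw CT value,
# assuming 100% PCR efficiency (product doubles each cycle) and the
# calibration point given in Methods (CT 35 ~ 5 molecules).
CT_REF, COPIES_REF = 35.0, 5.0

def estimated_copies(ct: float) -> float:
    # Reaching threshold one cycle earlier implies twice the starting template.
    return COPIES_REF * 2.0 ** (CT_REF - ct)

for ct in (8.0, 24.78, 32.0, 35.0):
    print(f"CT {ct:5.2f} -> ~{estimated_copies(ct):,.0f} molecules")
```

Consistent with the Results, a CT of 8 maps to >10^8 molecules (18S rRNA) and a CT of ~24.78 to roughly 6,000 molecules.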
StaRT-PCR4,12 is a competitive PCR-based platform that enables endpoint quantification of PCR products. After RNA is converted to cDNA, the cDNA is added to a standardized mixture of internal standards (SMIS) containing competitive templates, aliquoted into microplate wells containing gene-specific PCR primers and amplified for 35 cycles. The individual endpoint StaRT-PCR products are then separated by size and quantified by high-throughput microfluidic electrophoresis. StaRT-PCR has also been used in studies to validate microarray data1 and to generate potential biomarkers for disease stratification13,14.

The QuantiGene Reagent System15 detects DNA and RNA directly without a reverse transcription step. It is a sandwich nucleic acid hybridization platform in which targets are captured through cooperative hybridization of multiple probes16. This complex is detected through signal amplification by a branched DNA amplifier and chemiluminescence signal generation. The QuantiGene assay has been used in US Food and Drug Administration-approved clinical diagnostic products for quantitative viral load determination of HIV, hepatitis C virus and hepatitis B virus, with detection sensitivity of <50 transcript molecules17–19. Because the QuantiGene assay can measure gene expression either by measuring RNA directly without a reverse transcription step, or by measuring cDNA without PCR amplification, it provides a method of measurement independent of the quantitative reverse transcription (RT)-PCR and microarray platforms.

Application of these quantitative platforms in the MAQC project increased confidence in the concordance observed between the microarray platforms. In addition, the results obtained from these platforms allowed us to explore the sources of variability among microarray platforms. With this comprehensive evaluation, we demonstrate the value of alternative quantitative platforms as tools for the independent validation of microarray data and the resolution of discordant results.

RESULTS
Assay performance of three alternative quantitative platforms
The MAQC consortium selected a list of 1,297 genes to evaluate and compare the performance of microarray and alternative quantitative platforms and to identify and analyze discordant results. TaqMan assays, StaRT-PCR and QuantiGene assays were performed on 997, 205 and 244 of the 1,297 genes, respectively. Gene lists used for the analysis of selected performance metrics for the quantitative platforms, and for the analysis of concordance between the quantitative platforms and microarrays, are shown in Supplementary Table 1 online. Four RNA samples, A, B, C and D, provided by the MAQC consortium were analyzed20. TaqMan assays were done in quadruplicate, and StaRT-PCR assays in triplicate, on cDNA generated from 10 ng total RNA (Supplementary Methods online). Both the TaqMan assays and StaRT-PCR were based on cDNA from a single reverse transcription reaction. QuantiGene assays were performed in triplicate directly from 500 ng of total RNA (Table 1). The performance metrics presented are not
directly comparable because each platform assayed a different gene set and had a different assay range and signal-to-analyte response.

Detection sensitivity
TaqMan assay quantification is directly related to CT; a gene is not detectable when the average CT > 35 cycles. By this definition, 857 genes (86%) were detectable in both A and B. The StaRT-PCR detection limit is defined as ten transcript molecules; by this definition, 193 genes (94%) were detectable in both A and B. For QuantiGene, the detection limit is defined as a signal three standard deviations (s.d.) above background; by this standard, 223 genes (91.4%) were detectable in both A and B.

Assay range
The assay range represents the difference in signals, measured on a log10 scale, between the genes with the highest and the lowest expression. The assay range for TaqMan assays was 8.1, with CT values ranging from 8 (>10^8 transcript molecules) for 18S rRNA to 35 (~5 transcript molecules) for low expressors. For StaRT-PCR, the assay range was 6.8, from 6.4 × 10^7 normalized transcript molecules for 18S rRNA to 10 transcript molecules for low expressors. For QuantiGene, the assay range was 4.1, with the highest detectable signal of 599 relative luminescence units (RLU) for LDHA and the lowest of 0.045 RLU for SPARCL1.

Precision
The precision of the three alternative quantitative platforms was measured by the coefficient of variation (CV) (Fig. 1 and Table 1) or s.d. (Supplementary Fig. 1 online). There were interplatform differences in the number of transcript molecules (RNA or cDNA) loaded into each assay. Because of differences in the amount of sample loaded (Table 1), a majority of the genes measured with QuantiGene contained >6,000 transcript molecules in the assay, whereas a majority of those measured by TaqMan assays and StaRT-PCR had less. These two platforms were therefore used to assess the previously reported stochastic relationship between the number of transcript molecules loaded and CV21. A clear trend of increased CV with decreasing transcript abundance was observed for TaqMan assays and StaRT-PCR when <6,000 transcript molecules were loaded (below the dashed line in Fig. 1), as also specified in Table 1.
Table 1 Summary of platform performance metrics

Metric                                     TAQ (TaqMan)            GEX (StaRT-PCR)         QGN (QuantiGene)
Number of genes tested                     997                     205                     244
Sample input                               cDNA from 10 ng         cDNA from 10 ng         500 ng total RNA
                                           total RNA (one RT)      total RNA (one RT)
Assay replicates                           Four (cDNA)             Three (cDNA)            Three (RNA directly)
Data presentation                          Normalized against      Normalized against      Original data
                                           POLR2A                  beta-actin
Both A & B above LOD (a)                   857 (86%)               193 (94%)               223 (91%)
Both A & B below LOD (a)                   38 (3.8%)               4 (2.0%)                5 (2.0%)
Dynamic range (log10) (b)                  8.1                     6.8                     4.1
Precision, median CV, all data (c)         3.46                    6.26                    2.16
Precision, median CV, >6,000 molecules (c) 2.42                    3.82                    2.12
Linearity (R2) (e)                         0.950                   0.96 (h)                0.994
RA score, % median (d,f)                   3.6                     0.4 (h)                 1.0
RA score, % variance (d,g)                 9.4                     21.1 (h)                5.0

(a) Detection sensitivity: the number (percent) of genes detectable or undetectable in both samples A and B, based on each platform's detection limit. (b) Assay range: based on the ratio of highest to lowest detectable signal across all genes and samples measured on each platform. (c) Precision: median CV measured either across all genes and samples, or only in assays loaded with 6,000 transcript molecules or more. (d) Relative accuracy (RA) based on the formulas C′ = 0.75A + 0.25B and D′ = 0.25A + 0.75B for TaqMan assays and QuantiGene, and C′ = 0.88A + 0.12B and D′ = 0.45A + 0.55B for StaRT-PCR. (e) Linearity: median R2 of the linear fit of assay signal from samples A, B, C and D for all detectable genes with greater than twofold difference between A and B; 829, 125 and 223 genes were analyzed for TaqMan, StaRT-PCR and QuantiGene, respectively. (f) RA score (% median): the RA scores for samples C and D for a gene are defined as (C − C′)/C′ and (D − D′)/D′, the percent difference of experimental from expected; the median RA score for samples C and D combined is presented. Only genes detectable in both A and B were analyzed for each platform. (g) RA score (% variance): the median of the absolute RA scores for samples C and D combined. (h) Based on a recalibrated data set (Supplementary Methods).
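The precision entries in Table 1 are median coefficients of variation across replicate assays. As a minimal illustration of that computation (the replicate signals below are hypothetical, and the sample s.d. is assumed):

```python
import statistics

# Hypothetical replicate signals for one gene in one sample (not MAQC data).
replicates = [5200.0, 4800.0, 5100.0, 4900.0]

mean = statistics.mean(replicates)
cv_percent = 100.0 * statistics.stdev(replicates) / mean  # sample s.d. assumed
print(f"mean = {mean:.0f}, CV = {cv_percent:.1f}%")
# The per-platform metric in Table 1 is the median of such per-gene CVs.
```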
Figure 1 Effect of the number of transcript molecules on assay precision. The measured (StaRT-PCR) or estimated (TaqMan assays and QuantiGene) number of transcript molecules loaded into an assay for each gene in sample A or B was plotted against its CV. The data for the three platforms were transformed to be on the same x-axis scale as described in Methods. The vertical dashed line is ~6,000 transcript molecules; blue symbols, assays detecting <6,000 transcript molecules; orange, assays detecting >6,000 transcript molecules; green, assays below the limit of detection (LOD). [Panels: coefficient of variation versus transcript molecules for TaqMan assays, StaRT-PCR and QuantiGene, in samples A and B.]

For the TaqMan and StaRT-PCR platforms, each cDNA sample was split for replicate measurements, so the precision measurement did not include the reverse transcription reaction. For the QuantiGene platform, replication encompassed the entire process from total RNA to chemiluminescent detection.

Relative accuracy
Relative accuracy was defined as the proximity of observed expression values for C and D to the predicted values based on measured expression values for A and B. Error handling for all platforms was on a linear scale, with the exception of TaqMan assays, in which errors increased exponentially because CT is transformed to number of molecules. The percent difference between the predicted signals C′ and D′ and the actual assay signals C and D could be used as an indication of relative assay accuracy (RA). The RA scores ∆C and ∆D for a target gene were defined as (C − C′)/C′ and (D − D′)/D′, respectively. The distribution of percent difference from expected (RA score) for each gene is presented in a box plot for each platform (Fig. 2 and Table 1). The median percent difference from expected for both C and D was 3.6, 0.4 and 1.0 for TaqMan assays, StaRT-PCR and QuantiGene, respectively; all are closely centered around zero. The median of the absolute values of the RA scores (|∆C| and |∆D|) indicates the variance of the percent difference between the predicted signals C′ and D′ and the actual assay signals C and D. For TaqMan assays, the median variance value for 856 genes for both C and D was 9.4; for StaRT-PCR (193 genes) it was 21.1; and for QuantiGene (223 genes) it was 5.0. The data for the QuantiGene platform are notable given that these values encompass the system-wide accuracy of the platform.
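A minimal sketch of the RA-score computation follows, assuming the nominal mixing design C′ = 0.75A + 0.25B and D′ = 0.25A + 0.75B used for TaqMan assays and QuantiGene; the gene names and signals are hypothetical, for illustration only.

```python
import statistics

# Hypothetical per-gene signals: (signal_A, signal_B, measured_C, measured_D).
genes = {
    "GENE1": (1000.0, 200.0, 790.0, 410.0),
    "GENE2": (50.0, 450.0, 160.0, 340.0),
}

ra_scores = []
for a, b, c, d in genes.values():
    c_pred = 0.75 * a + 0.25 * b  # nominal 3:1 volumetric mixture
    d_pred = 0.25 * a + 0.75 * b  # nominal 1:3 volumetric mixture
    ra_scores += [100.0 * (c - c_pred) / c_pred, 100.0 * (d - d_pred) / d_pred]

print("median RA (%):", statistics.median(ra_scores))              # bias
print("median |RA| (%):", statistics.median(map(abs, ra_scores)))  # variance
```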
Figure 2 Analysis of assay accuracy. The values measured for C and D were compared to the values expected (% difference) based on measured A and B values. Formulas used to calculate expected C and D are provided in the text. Box plot components: horizontal line, median; box, interquartile range; whiskers, 1.5× interquartile range; black squares, outliers. TAQ, TaqMan assays; GEX, StaRT-PCR assays; QGN, QuantiGene assays; LOD, limit of detection. The number of genes for each platform is shown. [Panels: percent difference for samples C and D, for genes with A & B above LOD (TAQ, 850; GEX, 192; QGN, 223) and with A & B above 6,000 molecules (TAQ, 95; GEX, 75; QGN, 183).]
Fold-change correlation
To evaluate the concordance of fold changes between the alternative quantitative platforms, we performed regression analysis of fold differences in sample A compared to sample B. This analysis was performed using pair-wise common gene sets between platforms, because the overlap among all three platforms was limited to 48 genes (Fig. 3). The R2 and slope for TaqMan assays versus StaRT-PCR (92 common genes) were 0.88 and 0.93, respectively; for QuantiGene versus TaqMan assays (193 common genes), 0.81 and 0.78, respectively; and for QuantiGene versus StaRT-PCR (55 common genes), 0.85 and 0.77, respectively. Although linear regression analysis indicates good fold-change correlation across the three platforms, the respective slopes indicate compression or expansion effects between the platforms (a minimal sketch of such a fit is shown after Fig. 3 below).

Concordance of microarrays with alternative quantitative platforms
We used the results of the alternative quantitative platforms as a reference to evaluate concordance with the microarray platforms. For cross-platform comparison to microarrays, we evaluated four parameters (Figs. 4 and 5): (i) detection sensitivity, the ability of the microarrays to detect genes that were called 'present' by each alternative quantitative platform; (ii) the fold-change correlation between microarrays and each alternative quantitative platform; (iii) the true positive rate (TPR), the concordance of genes called statistically differentially expressed by the TaqMan assay that were also called statistically differentially expressed on the microarrays; and (iv) the false discovery rate (FDR), the proportion of genes differentially expressed on microarrays that were not differentially expressed in the TaqMan assay. TaqMan assays were evaluated for all parameters, whereas StaRT-PCR and QuantiGene were evaluated only for parameters (i) and (ii), because fewer genes were assayed on these platforms.
Figure 3 Correlation of fold change between alternative quantitative platforms. The sample B over sample A (B/A) fold changes (log2) for each gene common between two platforms were subjected to bivariate analysis. (a) TaqMan assays versus StaRT-PCR. (b) QuantiGene versus TaqMan assays. (c) QuantiGene versus StaRT-PCR. The dashed line on each graph represents the ideal slope of 1.0. The solid lines represent a linear regression fit. The overlapping gene list among the alternative quantitative platforms is represented in the Venn diagram. Linear fit: TaqMan assay versus StaRT-PCR, Y = –0.03647 + 0.9347X, R2 = 0.879; QuantiGene versus TaqMan assay, Y = 0.14 + 0.7825X, R2 = 0.8118; QuantiGene versus StaRT-PCR, Y = 0.4095 + 0.7707X, R2 = 0.8497.
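The linear fits reported in Figure 3 are ordinary least-squares regressions of one platform's log2 fold changes on another's; a slope below 1.0 indicates compression. A minimal sketch of such a fit, with hypothetical paired values standing in for a common gene set:

```python
from scipy.stats import linregress

# Hypothetical paired log2(B/A) fold changes for genes shared between two
# platforms (the real analysis used pair-wise common sets of 92, 193 and
# 55 genes).
taqman_fc = [-3.1, -1.2, 0.1, 0.9, 2.4, 4.8, -0.5, 1.7]
start_fc = [-2.9, -1.0, 0.2, 0.8, 2.1, 4.4, -0.6, 1.5]

fit = linregress(taqman_fc, start_fc)
# A slope below 1.0 would indicate compression of one platform's
# fold changes relative to the other's.
print(f"slope = {fit.slope:.2f}, R^2 = {fit.rvalue**2:.2f}")
```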
Detailed site-by-site analysis of genes is provided for StaRT-PCR and QuantiGene in Supplementary Table 2 online and for TaqMan assays in Supplementary Figure 2 online.

Detection sensitivity analysis was done for each alternative quantitative platform using the genes common to that platform and each of the microarray platforms. For this reason, assay ranges and expression characteristics of the gene sets differed. There were 845, 157 and 197 genes determined to be present in sample A by TaqMan assays, StaRT-PCR and QuantiGene, respectively. At the lower ranges of gene expression, for each microarray, the fraction of genes detected decreased relative to each of the alternative quantitative platforms (Fig. 4a–c). In addition, detection sensitivities relative to each alternative quantitative platform varied among the microarray platforms.

A fold-change comparison between each alternative quantitative platform and each microarray platform was also performed using LOWESS smoothing (Fig. 4d–f, ref. 22), which does not assume a linear relationship of fold-change values between platforms. We used a total of 392, 101 and 83 genes that were present in samples A and B at each site measured by each microarray platform and shared with TaqMan assays, StaRT-PCR and QuantiGene, respectively. Although excellent fold-change correlations were observed, varying degrees of compression of signal-to-analyte response relative to the alternative quantitative platforms were also found. These data are consistent with the analysis presented elsewhere in this issue20. An additional analysis showed that compression effects are detectable for both low and high expressors (Supplementary Fig. 3 online).

Traditionally, analysis of accuracy is carried out by analyzing the true positive rate (TPR) and false discovery rate (FDR). In this case, the actual rates were unknown; for this reason, we used TaqMan as the reference platform against which the microarray platforms were compared. Using TaqMan assay calls as the reference, we constructed contingency tables against the microarray platforms, in which concordance was determined taking into consideration both the P-value significance of the t-test and the fold-change directionality (up- or downregulation). Specifically, true positives (TP) are genes differentially expressed (significant P value for the t-test) in both TaqMan and microarray platforms with fold change in the same direction; true negatives (TN) are genes not differentially expressed in either platform; false positives (FP) consist of two sets of genes: (i) genes not differentially expressed in TaqMan but differentially expressed on microarrays, or (ii) genes differentially expressed in both platforms with fold change in the opposite direction; and false negatives (FN) are genes differentially expressed for TaqMan but not for microarrays.
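A minimal sketch of this contingency classification, with TaqMan as the reference; the Call structure, the example gene pairs and the resulting tallies are hypothetical, for illustration only.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Call:
    sig: bool   # differentially expressed (significant t-test after FDR control)
    fc: float   # log2(B/A) fold change

def classify(taqman: Call, array: Call) -> str:
    if taqman.sig and array.sig:
        # Same direction -> TP; opposite direction counts as FP.
        return "TP" if taqman.fc * array.fc > 0 else "FP"
    if taqman.sig and not array.sig:
        return "FN"
    if not taqman.sig and array.sig:
        return "FP"
    return "TN"

pairs = [
    (Call(True, 2.0), Call(True, 1.6)),    # concordant call -> TP
    (Call(True, 1.2), Call(True, -0.8)),   # opposite direction -> FP
    (Call(True, -2.5), Call(False, -0.3)), # missed by the array -> FN
    (Call(False, 0.1), Call(False, 0.0)),  # concordant negative -> TN
]
counts = Counter(classify(t, m) for t, m in pairs)
tpr = counts["TP"] / (counts["TP"] + counts["FN"])
fdr = counts["FP"] / (counts["TP"] + counts["FP"])
print(f"TPR = {tpr:.2f}, FDR = {fdr:.2f}")
```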
For TPR analysis, microarrays were compared to genes considered by TaqMan assays to be differentially regulated at fold-change cut-offs of 0, 1.5 and 2.0 (Fig. 5a–c, Supplementary Table 3 online). For microarrays, differential expression was measured using a t-test, controlling the FDR at a 5% level23, for genes present in either sample A or B. For approximately half of the assay range assessed by TaqMan assays, TPR values were consistent across array platforms. At low expression, however, detection percentages were directly proportional to TPR; as a result, there was also variation (up to 20%) in TPRs between array platforms (Fig. 5a, Supplementary Table 3 online). FDR analysis (Fig. 5d–f, Supplementary Table 3 online) using TaqMan assays as a reference also showed consistent FDRs across the microarray platforms for genes expressed at medium and high levels. As expected, the alternative quantitative platforms showed ~5% discordance with arrays, in agreement with the FDR cut-off used for defining differential expression in microarrays. However, genes expressed at low levels showed a variable and inverse relationship to FDR values (Fig. 5d, Supplementary Table 3 online). These results support the idea that differential expression measurement depends on the detection limit of each microarray platform.

Discordant gene analysis
Alternative quantitative platforms can also be used to resolve discordance among the microarray platforms, because specific assays can easily be designed to identify the source of the discordance by probing different regions. Analysis of extremely discordant results among the 997 genes shared by the microarray platforms and TaqMan assays identified 9 genes (~1%) that exhibited twofold or greater changes in opposite directions on different platforms with P < 0.0001 (Supplementary Table 4 online). Some of these genes, such as POMC, LTA and EPHA7 (Supplementary Fig. 4 online), were considered low expressors by TaqMan assays (CT values > 32) and, as expected, were undetected on a majority of the microarray platforms. However, some genes appeared to exhibit true discordance, of which three (ELAVL1, IGFBP5, ABCD1) were selected for further analysis by the three alternative quantitative platforms. To investigate the nature of the discordance, we designed probes against different regions of the three genes. For IGFBP5 and ABCD1, the alternative quantitative platform probes indicated consistently lower expression in sample A along the length of the transcripts (Fig. 6, Supplementary Table 5 online). These results suggest that discordance between the platforms is in some cases likely to be a result of cross-hybridization of microarray probes with other sequences. For ELAVL1, the alternative quantitative platform probes were able to evaluate the differential expression characteristics of the 5′ and 3′ ends of the gene. This result is consistent with a mapping
study showing that ELAVL1 has two alternative polyadenylation sites (unpublished observations).

We also investigated some genes (DPYD, PTGS2, FURIN) that were discordant between the alternative quantitative platforms. The discordant results for DPYD were determined to be a result of probing different sequence locations in the gene: when probes from each alternative quantitative platform were designed to interrogate similar sequences, expression characteristics along the length of the gene were found to be concordant. Although the more 5′ probes appeared to show discrepancies in the directionality of expression, these differences were statistically insignificant (P > 0.01). Multiple probe locations for PTGS2 generated expression differences in the same direction of change across all three platforms. The only gene that remained discordant after using multiple probe designs for each of the three platforms was FURIN. For this gene, both TaqMan assays and StaRT-PCR detected differential expression with probes specific to the 5′ end of the gene. Although all platforms interrogate this region of the gene, the smaller probes (TaqMan assays, bases 25–95; StaRT-PCR, bases 22–182) may be detecting a splice variant not detected by probes interrogating a longer region of the gene (QuantiGene, bases 1–501). Thus, by designing probes against different regions of a gene, alternative quantitative platforms can confirm location-specific expression characteristics of genes and aid in the resolution of discordant gene expression data.
DISCUSSION
We have assessed three quantitative gene expression measurement technologies for their performance metrics, correlated the results obtained with them to DNA microarray data and then used them as a means to identify sources of discordance among microarray platforms. Our results show a good correlation between quantitative platform measurements and microarray data, regardless of whether RNA or cDNA levels were measured.

A primary focus of this study was to identify possible sources of discordance. On the basis of the data reported here, we have identified specific reasons that partially explain why, as previously reported22, groups of genes detected as differentially expressed on a particular microarray platform are occasionally not reproducible across microarray platforms. Whereas the alternative quantitative platforms could detect over 85% of the genes shared across the alternative quantitative and array platforms in this study, the microarray platforms were less sensitive in detecting the lower expressed genes in this set (Fig. 4a–c, Supplementary Table 2 and Supplementary Fig. 2 online). In addition, relative to the alternative quantitative platforms, detection levels varied by as much as 60% among microarray platforms for the lower expressed genes in this set. Since significant differential expression in microarrays is largely dependent on the ability to reliably detect expression, intersite and interplatform variation can lead to discordant results in the gene lists. Using TaqMan assays as a reference, TPR and FDR for the various microarray platforms differed across the assay range (Fig. 5a,d, Supplementary Table 3 online). TPR was directly correlated to the percentage of detectable genes, whereas FDR was inversely correlated, indicating that although this metric reflects the ability of each platform to detect expression, it may also be subject to the stringency defined by the array manufacturer in applying detection calls. The consequence of these varying stringencies is that whereas a relaxed stringency in detection calls can lead to better detection and differential expression concordance, it will also produce a higher percentage of false positives. Supplementary Figure 2 online verifies that the discordance in differential expression is related to the intersite and interplatform variation in detection. Using StaRT-PCR or QuantiGene as references, applying more stringent criteria (a fold-change cutoff of 2.0 for genes considered present in at least three out of five replicates in both samples A and B) did not eliminate intersite or interplatform variation in the detection of differentially expressed genes (Supplementary Table 2 online). This variation occurs nearly exclusively for genes expressed at low levels. Even with these more stringent selection criteria, intersite variation in detection resulted in intersite and interplatform variation in lists of differentially expressed genes.

Another source of discordance in differentially expressed genes in this study was interplatform variation in compression. Using the alternative quantitative platforms as a reference, interplatform variation in signal-to-analyte response was observed (Fig. 4d–f), and it was particularly large among genes expressed in the high or low range (Supplementary Fig. 3 online). This platform-dependent compression was associated with discordance in differentially expressed genes (Supplementary Table 2 online). Whereas these results identified specific causes of discordance in lists of detected and/or differentially expressed genes, we found excellent fold-change correlation between each quantitative platform and each microarray platform for those genes that were detected by the microarray platforms (Fig. 4d–f). Of the 845 genes detected on the microarray platforms and commonly mapped to one or more of the alternative quantitative platforms, only 9 (1%) were 'extremely' discordant.

Figure 4 Performance of microarray platforms relative to alternative quantitative platforms. (a–c) Sensitivity of detection. Each microarray platform was compared to TaqMan (a), StaRT-PCR (b) or QuantiGene (c) for the ability to detect genes expressed in sample A. Genes were analyzed based on present call criteria of being present in 3/5 replicates at one of the three microarray sites and in the majority of replicates for each alternative quantitative platform (at least 3/4 for TaqMan, 2/3 for StaRT-PCR and QuantiGene). Genes detected by each alternative quantitative platform were sorted according to their signals (scaling as described in Fig. 1), and the percent of genes detected by both microarray and alternative quantitative platforms from bins of 30 consecutive genes (y axis) was plotted against the average signal of those genes measured by the alternative quantitative platform (x axis). (d–f) Correlation of fold change measured by each microarray platform compared to TaqMan (d), StaRT-PCR (e) or QuantiGene (f). Pair-wise sample A to sample B fold-change comparison, measured by each alternative quantitative platform (x axis) compared to each microarray platform (y axis). For each microarray platform, only genes present in both samples at each site were called present. Each line represents the LOWESS smoothing fitting curve. The number of genes involved in each analysis varies with the platforms compared.
Figure 5 Assessment of true positive rates and false discovery rates using TaqMan assays. (a–c) True positive rate (TPR) assessment using TaqMan assays. All genes common between TaqMan assays and the microarray platforms were used for the TPR analysis. TPR was defined as the percentage of genes differentially expressed in sample A compared to sample B detected by each microarray platform out of those detected by TaqMan assays, taking the TaqMan data as truth [TPR = TP/(TP + FN)], where TP is true positive and FN is false negative in microarray. Differential expression was detected by t-test, with the false discovery rate (FDR) controlled at the 5% level and fold-change filters of 0 (a), 1.5 (b) and 2.0 (c). For TaqMan assays, genes were ordered according to the average signals of A and B, and for bins of 50 consecutive genes we compared the significant-difference calls between each microarray platform and TaqMan assays. Concordance of differential expression was assessed for each platform. (d–f) False discovery rate (FDR) assessment using TaqMan assays. All genes common between TaqMan assays and the microarray platforms were used for the FDR analysis. FDR was defined as FP/(TP + FP), where FP is false positive in microarrays. The FDR represents the percentage of differentially expressed genes detected only by microarray platforms out of all genes differentially expressed on the microarray platforms. Note that the FDR (relative to TaqMan assays) is slightly larger than 5%, which is expected from the Benjamini-Hochberg (BH) adjustment for multiple testing. Differential expression was detected by t-test (FDR at 5%), with fold-change filters of 0 (d), 1.5 (e) and 2.0 (f).
A major factor contributing to these infrequent discordant results is differences in probe location. Assays designed against different locations of the discordant genes in this study demonstrated the utility of the alternative quantitative platforms (Fig. 6) for independently validating gene expression measurements from array platforms. This analysis was also useful in the study of discordance observed between the alternative quantitative platforms themselves. For example, the discordant expression results for FURIN observed among the alternative quantitative platforms are consistent with a probe location difference. The limited common gene list precluded a detailed analysis of the discordance caused by genes expressed at low levels among the alternative quantitative platforms. In addition, another potential source of discordance, the difference between measuring mRNA directly and measuring cDNA, was not analyzed here.

In summary, analysis of the MAQC samples by three alternative quantitative platforms revealed excellent fold-change correlation with microarray platform data while enabling identification of possible sources of intersite and interplatform discordance in lists of genes measured as differentially expressed. The advantages of the alternative quantitative platforms were partially due to assay specificity, lower detection thresholds and expanded assay range. Another advantage was the ease with which they interrogated specific gene locations, owing to their flexible assay design. Further, analysis by these alternative quantitative technologies contributed to the characterization of the MAQC samples and confirmed their value in guiding optimization of gene expression methods.

METHODS
Sample definition. Sample A was Universal Human Reference RNA (Stratagene) and sample B was human brain total RNA (Ambion). Concentrations of A and B were normalized on the basis of total RNA as measured by OD260. C was a 3:1 volumetric mixture of A and B, and D was a 1:3 volumetric mixture of A and B.

Selection of genes for validation by alternative quantitative platforms. A list of 1,297 RefSeqs was selected by the MAQC consortium. Over 90% of these genes were selected from a subset of 9,442 RefSeqs common to the four platforms (Affymetrix, Agilent, GE Healthcare and Illumina) used in the MAQC Pilot-I Study (RNA Sample Pilot), based on annotation information provided by
manufacturers in August 2005. This selection ensured that the genes would cover the entire intensity and fold-change ranges and include any bias due to RefSeq itself. To aid the titration study, we included a subset of ~100 genes based on tissue specificity (A versus B). To address cross-platform data inconsistency, we also included another subset that showed the largest variability in log2 fold change across platforms in the Pilot-I Study. Platform vendors were queried about their 'favorite' genes (e.g., the CYP family, PPARA and the HDAC family), and a small number of these were included. Consideration was also given to the inclusion of genes available on the QuantiGene and StaRT-PCR platforms. The final list was therefore not completely unbiased.

Gene list for the MAQC study by alternative quantitative platforms. TaqMan assays: 1,000 TaqMan gene expression assays matching the MAQC gene list were used in the study. These 1,000 assays were selected from >200,000 available human TaqMan assays (>20,000 NCBI genes) and covered 997 genes (3 genes had more than one assay). StaRT-PCR: 103 genes matching the MAQC gene list were selected from the nearly 800 genes for which StaRT-PCR reagents are already available. All genes that overlap with those measured by TaqMan assays and QuantiGene were included, as well as an additional 102 genes, for a total of 205. QuantiGene: we selected 245 QuantiGene assays (covering 244 genes) that matched the MAQC gene list from the nearly 2,600 genes for which QuantiGene probe sets are already available. All genes that overlap with those measured by TaqMan assays and StaRT-PCR were included. 55 genes were common to all three alternative quantitative platforms.

TaqMan assays. RNA samples: total RNA samples A (Universal Human Reference RNA (UHRR), Stratagene), B (brain, Ambion), C (3 UHRR:1 brain) and D (1 UHRR:3 brain), as described earlier, were used for all TaqMan assays. There was no additional treatment of these samples before cDNA preparation. cDNA preparation: cDNA was prepared from total RNA samples A, B, C and D using the Applied Biosystems cDNA Archive Kit and random primers. Multiple reactions containing 10 µg total RNA per 100 µl reaction volume were run for each sample following the manufacturer's recommendations. Individual reactions were pooled by sample and used for TaqMan assay analysis. TaqMan assays: each TaqMan Gene Expression Assay consists of two sequence-specific PCR primers and a FAM-labeled MGB (minor groove binder) probe. Primer and probe design is described in Supplementary Methods. Each TaqMan assay was run in four replicates for each RNA sample; 10 ng total cDNA (as total input RNA) in a 10 µl final volume was used for each replicate assay. Assays were run with 2× Universal Master Mix without uracil-N-glycosylase on an Applied Biosystems 7900 Fast Real-Time PCR System using universal cycling conditions (10 min at 95 °C; then 40 cycles of 15 s at 95 °C and 1 min at 60 °C). The assays and samples were analyzed across a total of 44 384-well plates. Robotic methods (Biomek FX) were used for plate setup, and each sample and assay replicate was tracked on a per-well, per-plate basis. Data normalization: in qRT-PCR, an endogenous control gene is used to normalize the data and control for variability between samples as well as plate, instrument and pipetting differences. POLR2A was chosen as the reference gene because its CT value was within the range of most of the genes in the study and showed the least variation across the samples (Supplementary Fig. 5a,b online). Each replicate CT was normalized to the average CT of POLR2A on a per-plate basis by subtracting the average CT of POLR2A from each replicate to give the ∆CT, which is equivalent to the log2 difference between the endogenous control and the target gene. Data analysis and filtering: the ∆CT of each replicate for each of the 1,000 assays was presented in the final data set as the normalized data. When TaqMan gene expression assays are run on a 7900HT system in a 10 µl reaction volume, a raw CT value of 34 represents approximately ten transcript molecules (assuming 100% amplification efficiency). At a copy number of less than five, stochastic effects dominate and the data generated are less reliable. Thus, a raw CT of 35 was set as the limit of detection in this study: individual replicates with CT values >35 were considered not detected and flagged as not expressed (A, absent); replicates with CT < 35 were considered detectable and identified as expressed (P, present). A CT > 32 and < 35 (~5–40 transcript molecules) was considered to indicate a low-expressing gene. For the ∆CT calculations, we used a CT of 35 for any replicate with CT > 35. Fold-change calculation: the log2 fold change between two samples was calculated using the ∆∆CT method21: the average ∆CT of sample A was subtracted from that of sample B.
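A minimal sketch of the per-plate ∆CT normalization and fold-change calculation just described, with hypothetical CT values; the sign convention used below (log2(B/A) = ∆CT(A) − ∆CT(B)) is our assumption about how the subtraction maps onto the reported B/A fold changes.

```python
import statistics

# Hypothetical replicate CTs for one target gene; replicates with CT > 35
# would be set to 35 before this step. POLR2A is the endogenous control.
ct_target = {"A": [26.1, 26.3, 26.0, 26.2], "B": [28.4, 28.6, 28.5, 28.3]}
ct_polr2a = {"A": 22.0, "B": 22.1}  # average control CT per plate

# Delta-CT: target CT minus control CT; larger values mean lower expression.
dct = {s: statistics.mean(ct_target[s]) - ct_polr2a[s] for s in ("A", "B")}

# Assumed sign convention: log2(B/A) = dCT(A) - dCT(B).
log2_fc_b_over_a = dct["A"] - dct["B"]
print(f"dCT A = {dct['A']:.2f}, dCT B = {dct['B']:.2f}, "
      f"log2 FC (B/A) = {log2_fc_b_over_a:.2f}")
```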
StaRT-PCR. StaRT-PCR assays were performed according to procedures previously described in detail4,12. Reverse transcription: for each of the four MAQC samples, two 20 µg aliquots of RNA were reverse transcribed. Each reverse-transcription reaction took place in a 90 µl volume containing Moloney murine leukemia virus (MMLV) reverse transcriptase (1,500 units), MMLV RT 5× first-strand buffer (final concentrations 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2) (both from Invitrogen), oligo dT primers (1.5 µg), RNasin (70 units) and deoxynucleotide triphosphates (dNTPs) (10 mM) (all from Promega). Calibration of cDNA: after reverse transcription, the two 90 µl cDNA products for each sample were combined into a single 180 µl volume. Each sample was then calibrated. A 2 µl aliquot of undiluted, tenfold-diluted or 100-fold-diluted cDNA from each sample was PCR-amplified in the presence of 2 µl of a standardized mixture of internal standards (SMIS). Each µl of SMIS contains 600,000 molecules of ACTB internal standard (IS). It was determined that for each MAQC cDNA sample, a 50-fold dilution would result in approximate equivalence between the ACTB native template (NT) and IS PCR products when equivalent volumes of each were included in the PCR reaction. After 50-fold dilution, there were 4,500 µl of each cDNA sample. It was then confirmed for each sample that the amount of ACTB cDNA in 1 µl was approximately in balance with the 600,000 ACTB internal standard molecules in 1 µl of SMIS. The amount of RNA contributing to each µl of each 50-fold-diluted working solution was 4 ng. StaRT-PCR reaction conditions: for each StaRT-PCR reaction, a 20 µl reaction volume was prepared containing 2 µl of the calibrated cDNA sample, 2 µl of SMIS, 0.5 units of Taq polymerase, 2.2 µl of buffer, 0.6 µl of MgCl2, 1 µl of each primer, 0.45 µl of dNTPs and 10.65 µl of water. Range-finding step: the expression level of each gene in each sample was initially unknown. Thus, to ensure that each measurement was within the range of quantification (NT/IS > 1/10 and < 10/1), a range-finding measurement was conducted for each gene in each sample with E SMIS. Each µl of E SMIS contains 600 molecules of the target gene IS and 600,000 molecules of ACTB IS. After PCR amplification and electrophoretic separation of the PCR products, the SEM Center software determined whether the NT/IS ratio of the PCR products was acceptable or, if not, predicted which SMIS should be used for quantification. This prediction was 95% accurate. Quantification: each 20 µl reaction volume contained 2 µl of the calibrated cDNA sample and 2 µl of the appropriate SMIS (that is, A–F) predicted to be correct in the range-finding step. Triplicate measurements were made of each gene in each sample. The fold-change calculation for each gene was based on the ratio of the gene transcript in sample B over that in sample A.
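As a rough illustration of competitive-template quantification, the sketch below assumes (this is our simplification, not the published StaRT-PCR algorithm) that the measured NT/IS peak-area ratio scales the known number of IS molecules in the reaction, with expression reported relative to ACTB. The ratios are hypothetical; the IS copy numbers are taken from the E SMIS composition stated above.

```python
# E SMIS composition per microliter (from the text); 2 ul of SMIS per reaction.
IS_TARGET = 600 * 2      # target-gene IS molecules per reaction
IS_ACTB = 600_000 * 2    # ACTB IS molecules per reaction

nt_over_is_target = 2.4  # hypothetical electrophoresis peak-area ratio (target)
nt_over_is_actb = 1.1    # hypothetical ratio for the ACTB control

# Both ratios fall within the stated range of quantification (1/10 to 10/1).
target_molecules = nt_over_is_target * IS_TARGET
actb_molecules = nt_over_is_actb * IS_ACTB

# Expression is conventionally reported relative to ACTB.
per_million_actb = 1e6 * target_molecules / actb_molecules
print(f"~{target_molecules:.0f} target molecules; "
      f"~{per_million_actb:.0f} per 10^6 ACTB molecules")
```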
QuantiGene. Assay procedure: the QuantiGene assays were performed according to the procedure of the QuantiGene Reagent System (Panomics), which has been described in detail previously24,25. Briefly, 10 µl of starting total RNA (500 ng) from sample A, B, C or D was mixed with 40 µl of Lysis Mixture (Panomics), 40 µl of Capture Buffer (Panomics) and 10 µl of target gene-specific probe set (CE (capture extender), 1.65 fmol/µl; LE (label extender), 6.6 fmol/µl; BL (blocker), 3.3 fmol/µl). Each sample mixture was then dispensed into an individual well of a Capture Plate (Panomics). The Capture Plate was sealed with foil tape and incubated at 53 °C for 16–20 h. The hybridization mixture was removed and the wells were washed three times with 250 µl of wash buffer (0.1× SSC, 0.03% lithium lauryl sulfate). Residual wash buffer was removed by centrifuging the inverted Capture Plate at 1,000g. Signals for the bound target mRNA were developed by sequential hybridization with branched DNA (bDNA) amplifier and alkaline phosphatase-conjugated label probe, at 46 °C for 1 h each. Two washes with wash buffer were used to remove unbound material after each hybridization step. Substrate dioxetane was added to the wells and incubated at 46 °C for 30 min. Luminescence from each well was measured using an Lmax microtiter plate luminometer (Molecular Devices). Three replicate assays measuring RNA directly (independent sampling, n = 3) were performed for all described experiments. Genomic DNA contamination in the RNA sample, if any, does not affect the QuantiGene assay, as it remains double-stranded throughout the entire procedure and thus cannot hybridize to the probe sets at the temperature used in the assay. Data analysis and filtering: the QuantiGene assays of 244 genes were performed on MAQC samples A, B, C and D. For all samples, background signals were determined in the absence of RNA and subtracted from signals obtained in the presence of RNA. Because the QuantiGene assay measures RNA directly, no data normalization against a reference gene is required. The presence or absence call is determined by the limit of detection (LOD) of the assay, where LOD = background + 3 s.d. of background. If at least two samples out of A, B, C and D have signals below the LOD for a gene, we call the gene absent. To determine the gene expression fold change in sample A versus sample B, we calculated the fold change using the formula log2 fold change = log2(SA/SB), where SA represents the assay signal for a target gene in sample A and SB represents the assay signal for the target gene in sample B. A gene is considered for fold-change analysis if the signal in both sample A and sample B passes the LOD.
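A minimal sketch of the LOD-based detection call described above (LOD = mean background + 3 s.d. of background); all RLU values are hypothetical.

```python
import statistics

# Hypothetical no-RNA background wells (RLU).
background = [0.010, 0.012, 0.009, 0.011]
lod = statistics.mean(background) + 3 * statistics.stdev(background)

# Hypothetical raw signals for one gene across the four MAQC samples.
signals = {"A": 0.460, "B": 0.011, "C": 0.340, "D": 0.130}
below = [s for s, rlu in signals.items() if rlu < lod]

# The gene is called absent if at least two of A, B, C, D fall below LOD.
call = "absent" if len(below) >= 2 else "present"
print(f"LOD = {lod:.4f} RLU; below LOD in {below or 'none'}; call: {call}")
```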
Figure 6 Resolution of fold-change discrepancy results. Fold changes were calculated for sample B versus sample A in all platforms. Each panel shows the expression characteristics of a discordant gene (ABCD1, DPYD, ELAVL1, FURIN, IGFBP5 and PTGS2) across the transcript length. The y axis is log2 fold change; the x axis represents gene coordinates starting from the 5′ end of the transcript. The gray bar graphically illustrates the transcript and the red vertical lines represent the exon-exon junctions. Colored bars represent the expression value of each probe along the length of the transcript (TaqMan assays, StaRT-PCR assays, QuantiGene assays and array platforms); the length of each colored bar represents the region interrogated by the probe for that platform. Two probes for FURIN (base 1–501 and base 217–2133) produced indistinguishable fold-change values in the QuantiGene assay.
Relative accuracy calculation: relative accuracy measures the proximity of observed expression values for C and D to the predicted values based on measured expression values for A and B. Concentrations of samples A and B were each quantified and normalized on the basis of total RNA (OD260). They were then mixed on a volumetric basis to yield sample C (0.75A/0.25B) and sample D (0.25A/0.75B). If the assay signal for the target mRNA is within the linear dynamic range of the assay, the predicted assay signals for samples C and D can be calculated as C′ = 0.75A + 0.25B and D′ = 0.25A + 0.75B. TaqMan assay and QuantiGene sample input was based on total RNA; for this reason, the predicted values of C and D can be calculated directly from the volumetric proportions of A and B using these formulas. With StaRT-PCR, as with the microarrays, each measurement was normalized to mRNA instead of the starting total RNA. As described in refs. 26 and 27, if the fraction of mRNA is higher in sample A than in sample B, the predicted C and D values will differ from the formulas provided above. Based on analysis of optimal linearity among the MAQC samples for the StaRT-PCR data, the most likely formulas were determined to be C′ = 0.88A + 0.12B and D′ = 0.45A + 0.55B. A data set recalibrated on the basis of these assumed formulas (Supplementary Methods) was used to assess relative accuracy for StaRT-PCR.

Multi-platform data transformation for Figure 1. For StaRT-PCR, 6,000 transcript molecules were defined by a value of 6,000, or log2(6,000) = 12.55. For TaqMan assays, the CT values were first transformed from a decreasing to an increasing copy-number scale by taking the absolute value of the difference between every TaqMan CT value and the lowest TaqMan CT (40). This rescaling preserves the assay range measured by TaqMan assays in log2 space. Given that a TaqMan CT value of 35 is estimated to correspond to 5 transcript molecules, the extrapolated CT equivalent for 6,000 transcript molecules is ~24.78, which corresponds to |24.78 − 40| = 15.22 on the transformed scale. To align this with the StaRT-PCR value for 6,000 transcript molecules, a rescaling value of 2.66025 was applied to all values; this factor is the difference between the pre-scaling TaqMan value corresponding to 6,000 transcript molecules (15.22) and the StaRT-PCR value corresponding to 6,000 transcript molecules (12.55). The same transformation was applied to QuantiGene values, resulting in a rescaling factor of 13.55, generated from the estimate that 6,000 transcript molecules correspond to 0.5 RLU, or −1.0 on a log2 scale. These transformations result in all platforms having a post-scaling value of 12.55 on a log2 scale at the approximate threshold of 6,000 transcript molecules.
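A minimal sketch of these transformations follows. The direction in which each rescaling factor is applied (subtracted for TaqMan, added for QuantiGene) is our inference from the stated endpoint values, since all three scales must meet at 12.55 for ~6,000 molecules.

```python
import math

# Put all three platforms on a common log2 scale where ~6,000 transcript
# molecules map to 12.55, per the transformation described above.

def start_pcr_scale(molecules: float) -> float:
    return math.log2(molecules)          # StaRT-PCR is already in molecules

def taqman_scale(ct: float) -> float:
    # Flip the decreasing CT axis (|CT - 40|), then shift by the stated
    # factor 2.66025 (subtraction inferred: 15.22 - 2.66025 ~ 12.55).
    return abs(ct - 40.0) - 2.66025

def quantigene_scale(rlu: float) -> float:
    # 0.5 RLU (log2 = -1.0) is estimated to equal ~6,000 molecules.
    return math.log2(rlu) + 13.55

print(start_pcr_scale(6000.0))   # ~12.55
print(taqman_scale(24.78))       # ~12.56
print(quantigene_scale(0.5))     # 12.55
```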
Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We would like to acknowledge the contribution to this manuscript from the following members of the MAQC team: Shawn B. Baker, Anne Bergstrom Lucas, Jim Collins, Eugene Chudin, Stephanie Fulmer-Smentek, Damir Herman, Richard Shippy, Chunlin Xiao and Necip Mehmet.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA. The FDA has approved this work for publication, but it does not necessarily reflect official Agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA, nor does it imply that the items identified are necessarily the best available for the purpose.
COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Vondracek, M. et al. Transcript profiling of enzymes involved in detoxification of xenobiotics and reactive oxygen in human normal and simian virus 40 T antigen-immortalized oral keratinocytes. Int. J. Cancer 99, 776–782 (2002).
2. Urdea, M. et al. Branched DNA amplification multimers for the sensitive, direct detection of human hepatitis virus. Nucleic Acids Symp. Ser. 24, 197–200 (1991).
3. Gleaves, C.A. et al. Multicenter evaluation of the Bayer VERSANT HIV-1 RNA 3.0 assay: analytical and clinical performance. J. Clin. Virol. 25, 205–216 (2002).
4. Bustin, S.A. (ed.). A-Z of Quantitative PCR (International University Line Biotechnology Series, La Jolla, California, USA, 2004).
5. Wong, M.L. & Medrano, J.F. Real-time PCR for mRNA quantitation. Biotechniques 39, 75–85 (2005).
6. Lee, L.G., Connell, C.R. & Bloch, W. Allelic discrimination by nick-translation PCR with fluorogenic probes. Nucleic Acids Res. 21, 3761–3766 (1993).
7. Heid, C.A., Stevens, J., Livak, K.J. & Williams, P.M. Real time quantitative PCR. Genome Res. 6, 986–994 (1996).
8. Gibson, U.E., Heid, C.A. & Williams, P.M. A novel method for real time quantitative RT-PCR. Genome Res. 6, 995–1001 (1996).
9. Qin, L.X. et al. Evaluation of methods for oligonucleotide array data via quantitative real-time PCR. BMC Bioinformatics 7, 23 (2006).
10. Kuo, W.P. et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 24, 832–840 (2006).
11. Wang, Y. et al. Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays. BMC Genomics 7, 59 (2006).
12. Willey, J.C. et al. Standardized RT-PCR and the standardized expression measurement center. Methods Mol. Biol. 258, 13–41 (2004).
13. Rots, M.G. et al. mRNA expression levels of methotrexate resistance-related proteins in childhood leukemia as determined by a standardized competitive template-based RT-PCR method. Leukemia 14, 2166–2175 (2000).
14. Mullins, D.N. et al. CEBPG transcription factor correlates with antioxidant and DNA repair genes in normal bronchial epithelial cells but not in individuals with bronchogenic carcinoma. BMC Cancer 5, 141 (2005).
15. Flagella, M. et al. A multiplex branched DNA assay for parallel quantitative gene expression profiling. Anal. Biochem. 352, 50–60 (2006).
16. Yao, J.D. et al. Multicenter evaluation of the VERSANT Hepatitis B Virus DNA 3.0 assay. J. Clin. Microbiol. 42, 800–806 (2004).
17. Elbeik, T. et al. Multicenter evaluation of the performance characteristics of the Bayer VERSANT HCV RNA 3.0 assay (bDNA). J. Clin. Microbiol. 42, 563–569 (2004).
18. Stenman, J. & Orpana, A. Accuracy in amplification. Nat. Biotechnol. 19, 1011–1012 (2001).
19. Cleveland, W. Robust locally weighted regression and smoothing scatter plots. J. Am. Stat. Assoc. 74, 829–836 (1979).
20. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).
21. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Met. 57, 289–300 (1995).
22. Shippy, R. et al. Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations. BMC Genomics 5, 61 (2004).
23. Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^-ΔΔCT method. Methods 25, 402–408 (2001).
24. Kern, D. et al. An enhanced-sensitivity branched-DNA assay for quantification of human immunodeficiency virus type 1 RNA in plasma. J. Clin. Microbiol. 34, 3196–3202 (1996).
25. Wang, J. et al. Regulation of insulin preRNA splicing by glucose. Proc. Natl. Acad. Sci. USA 94, 4360–4365 (1997).
26. Shippy, R. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 24, 1123–1131 (2006).
27. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).
ANALYSIS
Using RNA sample titrations to assess microarray platform performance and normalization techniques

Richard Shippy1, Stephanie Fulmer-Smentek2, Roderick V Jensen3, Wendell D Jones4, Paul K Wolber2, Charles D Johnson5, P Scott Pine6, Cecilie Boysen7, Xu Guo8, Eugene Chudin9, Yongming Andrew Sun10, James C Willey11, Jean Thierry-Mieg12, Danielle Thierry-Mieg12, Robert A Setterquist13, Mike Wilson5, Anne Bergstrom Lucas2, Natalia Novoradovskaya14, Adam Papallo3, Yaron Turpaz8, Shawn C Baker9, Janet A Warrington8, Leming Shi15 & Damir Herman12

We have assessed the utility of RNA titration samples for evaluating microarray platform performance and the impact of different normalization methods on the results obtained. As part of the MicroArray Quality Control project, we investigated the performance of five commercial microarray platforms using two independent RNA samples and two titration mixtures of these samples. Focusing on 12,091 genes common across all platforms, we determined the ability of each platform to detect the correct titration response across the samples. Global deviations from the response predicted by the titration ratios were observed. These differences could be explained by variations in relative amounts of messenger RNA as a fraction of total RNA between the two independent samples. Overall, both the qualitative and quantitative correspondence across platforms was high. In summary, titration samples may be regarded as a valuable tool, not only for assessing microarray platform performance and different analysis methods, but also for determining some underlying biological features of the samples.
1GE Healthcare, 7700 S. River Pkwy., Suite #2603, Tempe, Arizona 85284, USA. 2Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, California 95051, USA. 3University of Massachusetts-Boston, 100 Morrissey Blvd., Boston, Massachusetts 02125, USA. 4Expression Analysis, Inc., 2605 Meridian Pkwy., Durham, North Carolina 27713, USA. 5Asuragen, Inc., 2150 Woodward, Austin, Texas 78744, USA. 6Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland 20993, USA. 7ViaLogy, 2400 Lincoln Ave., Altadena, California 91001, USA. 8Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA. 9Illumina, Inc., 9885 Towne Centre Dr., San Diego, California 92121, USA. 10Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404, USA. 11University of Toledo, Toledo, Ohio 43606, USA. 12National Center for Biotechnology Information, Bethesda, Maryland 20894, USA. 13Applied Biosystems, 2150 Woodward, Austin, Texas 78744, USA. 14Stratagene, 11011 N. Torrey Pines Rd., La Jolla, California 92037, USA. 15National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. Correspondence should be addressed to R.S. ([email protected]).
Published online 8 September 2006; doi:10.1038/nbt1241
Microarrays are widely used to simultaneously measure the levels of thousands of RNA targets in a biological sample. Despite their widespread use, many in the community are concerned with the comparability of the results obtained using different microarray platforms and thus the biological relevance of the qualitative and quantitative results obtained. Microarray platform performance has been evaluated before on the criteria of sensitivity, specificity, dynamic range, precision and accuracy1–12. As part of the MicroArray Quality Control (MAQC) project, similar assessments have also been reported13,14. Other studies have used defined mixtures of RNA samples (titration samples) for interplatform2,15 and interlaboratory15 comparisons. Here we have investigated an alternative performance metric: the abilities of different microarray platforms to accurately detect a signal trend produced by mixing samples (titration trend) and the effects of normalization and other data analysis practices on this performance characteristic. Gene-expression levels were measured for two pure samples and two mixtures using five different commercial whole-genome platforms at three different test sites per platform. The five commercially available whole-genome platforms tested were Applied Biosystems (ABI), Affymetrix (AFX), Agilent Technologies (AG1), GE Healthcare (GEH) and Illumina (ILM). The level of accurate titration response was quantified by determining the number of probes for which the average signal response in the titration samples was consistent with the response in the independent, reference RNA samples. We analyzed every platform at each site, and here we present comparisons of the various platforms using various data processing and normalization techniques. To assess the titration response of as many genes as possible, an a priori expectation of differential expression of many transcripts was necessary. On the basis of results from pilot titration studies (data not shown), we elected to use two independent samples (A, Stratagene Universal RNA, and B, Ambion Human Brain RNA) that showed large, statistically significant differences in expression for a large number of transcripts to generate the two titration samples (C and D, consisting of 3:1 and 1:3 ratios of A to B, respectively; see Fig. 1). We defined the series of mean signals generated by a gene on a microarray platform across these samples as its titration response. For these analyses, we assumed
[Figure 1 schematic: independent samples A and B; titration samples C (75% A + 25% B) and D (25% A + 75% B).]
Figure 1 RNA samples. We used expression measurements from two independent total RNA samples, A and B, and mixtures of these two samples at defined ratios of 3:1 (C) and 1:3 (D). The titration mixtures were generated once for all experiments, with samples A and B at equal total RNA concentrations as determined by A260.
that the expression measurement of a transcript in a titration sample follows a linear titration relationship: the signal of any given transcript in the two titration samples should be a linear combination of the signals produced by the two independent samples. From the signal intensities in the microarray titration experiments, we obtained the percentage of genes on each platform that showed a monotonic titration response and analyzed that percentage as a function of the magnitude of differential expression between A and B or as a function of the signal intensity.

Many normalization methods have been developed that are commonly used for different microarray platforms16–24, including those methods that have been recommended by the array manufacturers for the MAQC project13 (see Methods). Differences in these methods significantly influence several aspects of microarray performance, including precision and sensitivity9,16–20,23,24. However, no clear consensus exists in the microarray community as to which method is best under a given set of circumstances. The optimal normalization or scaling method for a given dataset may depend both on the experiment and on many attributes of that microarray dataset, including signal distribution and noise characteristics25. The experimental design used here is valuable for assessing the influence of different data processing techniques on the self-consistency of microarray data with regard to titration response. In addition, the different data processing techniques were also analyzed with respect to their impact on the statistical power of these platforms to distinguish between the independent and titration samples. The titration analysis presented here was applied to all commercial whole-genome microarray platforms tested in the MAQC project13, using various data processing techniques, to evaluate the self-consistency and statistical power of the resulting data.

When assessing accuracy in experimental systems, the goal is to compare observed results to the expected 'true' values of the system. For most experiments measuring gene expression, the 'true' values are either unknown or difficult to measure independently. However, the titration response results presented here can provide some quantitative information about the relative accuracy of measurements of differential gene expression. Monotonicity in the titration response indicates a self-consistent relationship among the expression measurements from the four samples. Because many inferences drawn from microarray experiments depend as much or more on the direction of expression changes
as on their magnitudes, the consistency with which microarray assays determine direction of change is an important performance characteristic. The main advantages of our method are that titration responses can be assessed on a large scale, independent of a designated reference platform, and that it does not require substantial assumptions to be made about the data2,25.

RESULTS
The experimental design of the main MAQC study is described in detail elsewhere13. Briefly, two independent RNA samples were chosen for study and used to generate two titration samples. The gene-expression profiles of these samples, all split from a single pool, were measured on ten gene-expression measurement platforms. For each of the five whole-genome microarray platforms examined in this study, the samples were analyzed at three different test sites, each with ≤5 replicate assays per sample, for a total of 293 microarray hybridizations at 15 different sites. Data from all platforms were then processed using the recommended method from each array manufacturer, as represented in the main MAQC paper13, as well as one or more alternative normalization methods. Using probe sequence information, we identified 12,091 genes that were uniquely targeted by at least one probe on all five commercial whole-genome microarray platforms. For each platform, only the probe closest to the 3′ end of the gene was considered13. We chose to exclude genes that were not detected across all samples and focused on genes whose signals were above the noise level and therefore more reliable10. Each manufacturer provided quantitative detection calls characterizing the probability that a gene was detected in a given replicate13. For most analyses, only genes detected in at least three replicates for a given sample and site were considered. This detection-call protocol is the same as described in the main MAQC paper13.

Measuring titration response as a function of fold change
The chief advantage of an experiment that evaluates gene expression in a series of known mixtures of two samples is that the rank order of measured expression levels of any given gene across the series can be predicted from the relative expression levels in the two original samples. For the series described in this paper, if the true expression level ($A_i$) of any gene $i$ in sample A is greater than the true expression level ($B_i$) of the same gene $i$ in sample B, then $A_i > C_i > D_i > B_i$, where $C_i$ and $D_i$ are the true expression levels of gene $i$ in samples C and D. If $B_i > A_i$, then $B_i > D_i > C_i > A_i$. In our case, if we postulate $A_i > B_i$ on the basis of the observed sample mean of $A_i$ ($\bar{A}_i$) being significantly larger ($P < 0.001$) than the observed sample mean of $B_i$ ($\bar{B}_i$), then we expect $\bar{A}_i > \bar{C}_i > \bar{D}_i > \bar{B}_i$. Finally, if $\bar{A}_i \approx \bar{B}_i$, then the order of observed means will be nearly random.

In Figure 2, the percentage of genes in a 100-gene moving window that produce the expected titration response for each site and platform is plotted as a function of the average $\bar{A}_i/\bar{B}_i$ ratio of those 100 genes, when $\bar{A}_i > \bar{B}_i$ (left side of graph), or of the $\bar{B}_i/\bar{A}_i$ ratio, when $\bar{B}_i > \bar{A}_i$ (right side of graph). The x-axis origin of these graphs is at $\bar{A}_i/\bar{B}_i = \bar{B}_i/\bar{A}_i = 1$, the ratio at which the titration response changes direction. The overall shapes of all of the curves are similar: as expected from theory, they rise from a value near zero at $\bar{A}_i/\bar{B}_i = \bar{B}_i/\bar{A}_i = 1$ to an asymptote of 100% at larger values of $\bar{A}_i/\bar{B}_i$ or $\bar{B}_i/\bar{A}_i$.
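A minimal sketch of this moving-window calculation (the variable and function names are ours, not from the MAQC code base), assuming `mean_a` through `mean_d` are NumPy arrays of per-gene mean log2 signals at one site:

```python
import numpy as np

def titration_fraction(mean_a, mean_b, mean_c, mean_d, window=100):
    """For genes sorted by decreasing A/B mean ratio, return the moving-window
    average A/B ratio (linear scale) and the moving-window fraction of genes
    showing the expected monotonic ordering A > C > D > B."""
    log_ratio = mean_a - mean_b                    # log2(A/B) per gene
    order = np.argsort(log_ratio)[::-1]            # largest A/B ratio first
    a, b = mean_a[order], mean_b[order]
    c, d = mean_c[order], mean_d[order]
    monotone = (a > c) & (c > d) & (d > b)         # expected ordering when A > B
    kernel = np.ones(window) / window
    frac = np.convolve(monotone.astype(float), kernel, mode="valid")
    avg_log_ratio = np.convolve(log_ratio[order], kernel, mode="valid")
    return 2.0 ** avg_log_ratio, frac

# The B > A direction is symmetric: swap the roles of (A, C) and (B, D).
```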
Figure 2 also illustrates how alternative normalization methods (for AFX, alternative data reduction methods for the individual features) affect the quantitative outcome. For example, the data from the different test sites for AG1 show distinct behaviors under the standard normalization, but exhibit much more similar titration behaviors when normalized using the alternative method. In addition, for the AFX data, GCRMA processing26 (a modified version of robust multichip analysis (RMA) processing that models intensity of probe-level data as a function of GC content) results in titration curves with a broader spread than those produced by probe logarithmic intensity error (PLIER)21 or RMA18. It should be noted that the different data processing techniques also yield different numbers of genes showing significant deviations in expression values between samples A and B (Fig. 2 and Table 1), which can also influence titration performance. The most striking differences resulting from normalization techniques are seen with the ILM data, where the alternative method, invariant scaling, resulted in many fewer significant genes on the left side of the panel as well as lower percentages of genes that titrate at lower fold changes. The quantitative differences between the various curves shown in Figure 2 are listed in Table 1, which presents the ratios at which 50%, 75% or 90% of the detected genes show a monotonic titration response. The performances observed for different sites and platforms were similar but not identical (Table 1). Many different platforms and sites identified the correct ordering of the titration samples for more than 90% of genes with a twofold difference between A and B (Table 1, rows 14 and 17), which suggests that the DNA microarrays can reliably distinguish very small fold differences in the mixture samples. The differences resulting from alternative normalization techniques are also apparent in the results presented in Figure 2 and Table 1.
[Figure 2 appears here: twelve panels (ABI quantile; ABI scaling; AFX PLIER; AFX MAS5; AFX RMA; AFX GCRMA; AG1 median scaling; AG1 75th percentile scaling; GEH median scaling; GEH quantile; ILM quantile; ILM invariant scaling), each plotting the percentage of genes that titrate against the average linear A/B or B/A ratio. Per-site gene counts for each monotonic direction are listed in Table 1, rows 8 and 9.]

Figure 2 Percentage of genes showing the monotonic titration responses $\bar{A}_i > \bar{C}_i > \bar{D}_i > \bar{B}_i$ and $\bar{B}_i > \bar{D}_i > \bar{C}_i > \bar{A}_i$ plotted against the linear $\bar{A}_i/\bar{B}_i$ and $\bar{B}_i/\bar{A}_i$ ratios, respectively, for each commercial whole-genome microarray platform, using various normalization methods. All graphs were generated from the set of 12,091 genes common across whole-genome platforms, with outlier arrays excluded per manufacturer's recommendations13. Genes detected across all four samples per site that were also significantly differentially expressed (P < 0.001) in independent samples A and B were used in the calculations (Table 1, rows 4 and 5). A two-sample t-test, with equal variance, was performed within each site on log2 expression values. For each platform, a 100-probe moving window, based on sorted $\bar{A}_i/\bar{B}_i$ ratios (left side of plot) or $\bar{B}_i/\bar{A}_i$ ratios (right side of plot), was used to calculate the percentage of self-consistent monotonic titration response genes (y-axis) as a function of the corresponding moving average of $\bar{A}_i/\bar{B}_i$ or $\bar{B}_i/\bar{A}_i$ ratios (x-axis) within each site. Graphs are plotted with a scale break between –1 and 1, with reassignment of the x-axis for clarity. Each graph contains six series of data points (three sites in two monotonic directions), which were smoothed using a distance-weighted least-squares method. Blue, site 1; red, site 2; gray, site 3. Total numbers of genes showing the monotonic trend for each site are indicated in each graph, for both directions ($\bar{A}_i > \bar{C}_i > \bar{D}_i > \bar{B}_i$ for $\bar{A}_i/\bar{B}_i$ ratios >1 and $\bar{B}_i > \bar{D}_i > \bar{C}_i > \bar{A}_i$ for $\bar{B}_i/\bar{A}_i$ ratios >1), and are also listed in Table 1 (rows 8 and 9). The normalization methods highlighted in yellow for each platform represent the manufacturer's recommended method used in the MAQC main paper13.

Measuring titration response as a function of signal intensity
To further explore the impact of different normalization techniques, we assessed titration response as a function of signal intensity. In Figure 3, we plot the fraction of genes that titrate relative to the total number of genes in the given intensity range, as a function of the lowest signal in the monotonic titration trend. That is, for the monotonic trend $\bar{A}_i > \bar{C}_i > \bar{D}_i > \bar{B}_i$, we plotted this fraction against the signal intensity $\bar{B}_i$ (solid lines), whereas for the opposite trend $\bar{B}_i > \bar{D}_i > \bar{C}_i > \bar{A}_i$, we used the intensity $\bar{A}_i$ (dashed lines). We observed that, in general, the fraction of genes that titrate is inversely proportional to the signal intensity. The signal plotted on the x-axis is the lowest signal in the series; therefore, when this signal is low, the probes are more likely to show the expected titration response, as the fold differences will tend to be larger. When the magnitude of this lowest signal increases, the possible fold difference between A and B will decrease. Differences in distribution among platforms and normalization methods are evident.
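A sketch of the binning behind this intensity view (again with our own names; the bin width and the more-than-ten-genes rule follow the Figure 3 legend):

```python
import numpy as np

def fraction_by_intensity(mean_a, mean_b, mean_c, mean_d,
                          bin_width=0.5, min_genes=10):
    """Bin genes by the lowest mean log2 signal in the expected titration
    series and return, per bin center, the fraction ordered monotonically."""
    up = mean_a > mean_b                           # direction A > B per gene
    lowest = np.where(up, mean_b, mean_a)          # lowest signal in the series
    monotone = np.where(
        up,
        (mean_a > mean_c) & (mean_c > mean_d) & (mean_d > mean_b),
        (mean_b > mean_d) & (mean_d > mean_c) & (mean_c > mean_a),
    )
    centers = np.round(lowest / bin_width) * bin_width
    fractions = {}
    for center in np.unique(centers):
        in_bin = centers == center
        if in_bin.sum() > min_genes:               # plot only bins with >10 genes
            fractions[float(center)] = float(monotone[in_bin].mean())
    return fractions
```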
Table 1 Gene counts for AFX and ABI (top) and AG1, GEH and ILM (bottom) for each normalization method

Top half. Each cell lists sites 1/2/3, in the column order: ABI quantile | ABI scaling | AFX PLIER | AFX MAS 5.0 | AFX RMA | AFX GCRMA.

Row 1, Detected in A · B · C · D: 8,049 7,863 8,550 | 8,049 7,863 8,550 | 7,359 7,006 7,424 | 7,359 7,006 7,424 | 7,359 7,006 7,424 | 7,359 7,006 7,424
Row 2, A > B: 4,284 4,191 4,509 | 4,308 4,219 4,424 | 4,423 4,291 4,557 | 4,244 4,040 4,267 | 4,414 4,192 4,440 | 4,356 4,125 4,376
Row 3, B > A: 3,765 3,672 4,041 | 3,741 3,644 4,126 | 2,936 2,715 2,867 | 3,115 2,966 3,157 | 2,945 2,814 2,984 | 3,003 2,881 3,048
Row 4, A > B and P < 0.001: 3,144 2,298 3,046 | 3,143 2,376 3,037 | 3,723 3,632 3,848 | 2,982 2,934 3,168 | 3,559 3,491 3,670 | 3,420 3,273 3,490
Row 5, B > A and P < 0.001: 2,572 1,886 2,436 | 2,571 1,930 2,494 | 2,356 2,176 2,306 | 2,074 1,999 2,182 | 2,272 2,274 2,372 | 2,224 2,172 2,303
Row 6, A > C > D > B: 3,063 2,924 3,159 | 3,296 3,104 3,256 | 3,042 3,751 3,616 | 2,493 3,111 3,258 | 2,862 3,462 3,479 | 2,708 3,297 3,407
Row 7, B > D > C > A: 2,471 2,424 2,622 | 2,670 2,487 2,772 | 1,924 2,154 2,222 | 1,873 2,089 2,170 | 1,858 2,100 2,087 | 1,829 2,071 2,075
Row 8, A > C > D > B and P < 0.001: 2,806 2,169 2,740 | 2,960 2,285 2,807 | 2,938 3,520 3,517 | 2,290 2,772 2,966 | 2,772 3,305 3,365 | 2,581 3,092 3,227
Row 9, B > D > C > A and P < 0.001: 2,240 1,803 2,198 | 2,355 1,844 2,312 | 1,869 2,038 2,132 | 1,696 1,834 1,951 | 1,781 2,020 2,015 | 1,720 1,931 1,956
Row 10, (A > C > D > B) / (A > B): 0.71 0.70 0.70 | 0.77 0.74 0.74 | 0.69 0.87 0.79 | 0.59 0.77 0.76 | 0.65 0.83 0.78 | 0.62 0.80 0.78
Row 11, (B > D > C > A) / (B > A): 0.66 0.66 0.65 | 0.71 0.68 0.67 | 0.66 0.79 0.78 | 0.60 0.70 0.69 | 0.63 0.75 0.70 | 0.61 0.72 0.68
Row 12, 50% titrate when A/B =: 1.35 1.35 1.36 | 1.28 1.32 1.32 | 1.30 1.13 1.20 | 1.52 1.28 1.30 | 1.40 1.18 1.25 | 1.60 1.28 1.32
Row 13, 75% titrate when A/B =: 1.58 1.65 1.65 | 1.45 1.60 1.60 | 1.65 1.20 1.30 | 1.98 1.45 1.50 | 1.70 1.32 1.42 | 2.05 1.47 1.58
Row 14, 90% titrate when A/B =: 1.80 1.98 1.99 | 1.68 1.90 1.94 | 2.10 1.30 1.52 | 3.00 1.67 1.78 | 2.10 1.42 1.61 | 2.80 1.68 1.85
Row 15, 50% titrate when B/A =: 1.43 1.42 1.45 | 1.34 1.35 1.40 | 1.39 1.20 1.22 | 1.53 1.30 1.36 | 1.44 1.22 1.30 | 1.63 1.35 1.47
Row 16, 75% titrate when B/A =: 1.77 1.80 1.88 | 1.60 1.75 1.83 | 1.68 1.37 1.38 | 1.82 1.45 1.52 | 1.75 1.40 1.50 | 2.22 1.65 1.80
Row 17, 90% titrate when B/A =: 2.08 2.23 2.40 | 1.85 2.12 2.30 | 2.05 1.49 1.50 | 2.50 1.75 1.87 | 2.15 1.58 1.68 | 2.90 2.10 2.30
Row 18, A/B > 2.00: 1,794 1,664 1,830 | 1,813 1,718 1,808 | 1,703 1,602 1,832 | 1,759 1,548 1,756 | 1,693 1,468 1,702 | 2,178 2,062 2,255
Row 19, B/A > 2.00: 1,636 1,562 1,745 | 1,634 1,548 1,793 | 1,171 1,028 1,136 | 1,360 1,202 1,346 | 1,172 1,017 1,141 | 1,462 1,378 1,501
Row 20, A/B > 2.00 (P < 0.001): 1,772 1,558 1,802 | 1,793 1,626 1,782 | 1,703 1,602 1,832 | 1,732 1,542 1,748 | 1,693 1,468 1,700 | 2,168 2,049 2,233
Row 21, B/A > 2.00 (P < 0.001): 1,613 1,423 1,672 | 1,612 1,435 1,716 | 1,171 1,028 1,136 | 1,350 1,195 1,335 | 1,171 1,017 1,141 | 1,447 1,365 1,487

Bottom half. Each cell lists sites 1/2/3, in the column order: AG1 median scaling | AG1 75th percentile scaling | GEH median scaling | GEH quantile | ILM quantile | ILM invariant scaling.

Row 1, Detected in A · B · C · D: 8,322 8,468 9,121 | 8,322 8,468 9,121 | 10,416 10,505 10,289 | 10,416 10,505 10,289 | 7,995 7,761 7,555 | 7,995 7,761 7,555
Row 2, A > B: 5,046 4,922 5,051 | 4,624 4,705 5,027 | 6,324 6,537 6,161 | 6,173 6,275 6,123 | 4,505 4,349 4,221 | 3,670 3,512 3,009
Row 3, B > A: 3,276 3,546 4,070 | 3,698 3,763 4,094 | 4,092 3,968 4,128 | 4,243 4,230 4,166 | 3,490 3,412 3,334 | 4,325 4,249 4,546
Row 4, A > B and P < 0.001: 3,711 3,763 3,710 | 3,443 3,624 3,807 | 3,998 4,753 4,393 | 4,042 4,582 4,512 | 3,657 3,289 2,808 | 2,868 2,479 1,769
Row 5, B > A and P < 0.001: 2,057 2,439 2,839 | 2,447 2,707 2,958 | 2,238 2,352 2,632 | 2,409 2,586 2,772 | 2,713 2,473 2,051 | 3,384 3,068 2,960
Row 6, A > C > D > B: 4,249 3,714 2,923 | 3,430 3,218 3,460 | 4,413 4,314 4,381 | 4,637 4,308 4,917 | 3,204 3,170 2,924 | 2,097 1,945 1,989
Row 7, B > D > C > A: 2,304 2,357 2,848 | 2,384 2,377 2,703 | 2,167 2,230 2,258 | 2,718 2,653 2,833 | 2,198 2,153 2,059 | 3,426 3,221 3,697
Row 8, A > C > D > B and P < 0.001: 3,654 3,435 2,697 | 3,138 3,048 3,254 | 3,809 4,063 4,034 | 3,902 3,977 4,352 | 3,128 3,002 2,543 | 1,981 1,755 1,542
Row 9, B > D > C > A and P < 0.001: 1,977 2,168 2,589 | 2,164 2,256 2,538 | 1,918 2,008 2,091 | 2,251 2,326 2,496 | 2,136 2,038 1,792 | 3,152 2,882 2,900
Row 10, (A > C > D > B) / (A > B): 0.84 0.75 0.58 | 0.74 0.68 0.69 | 0.70 0.66 0.71 | 0.75 0.69 0.80 | 0.71 0.73 0.69 | 0.57 0.55 0.66
Row 11, (B > D > C > A) / (B > A): 0.70 0.66 0.70 | 0.64 0.63 0.66 | 0.53 0.56 0.55 | 0.64 0.63 0.68 | 0.63 0.63 0.62 | 0.79 0.76 0.81
Row 12, 50% titrate when A/B =: 1.24 1.35 1.60 | 1.38 1.48 1.43 | 1.34 1.45 1.40 | 1.25 1.38 1.25 | 1.32 1.30 1.34 | 1.52 1.55 1.32
Row 13, 75% titrate when A/B =: 1.39 1.66 2.15 | 1.53 1.75 1.70 | 1.50 1.70 1.53 | 1.40 1.62 1.38 | 1.50 1.49 1.54 | 2.08 2.08 1.65
Row 14, 90% titrate when A/B =: 1.55 2.09 3.20 | 1.68 2.02 2.02 | 1.65 1.95 1.66 | 1.60 1.95 1.55 | 1.65 1.70 1.72 | 2.72 2.80 2.15
Row 15, 50% titrate when B/A =: 1.39 1.45 1.40 | 1.52 1.57 1.48 | 1.46 1.44 1.51 | 1.30 1.35 1.30 | 1.44 1.45 1.41 | 1.26 1.30 1.25
Row 16, 75% titrate when B/A =: 1.76 1.87 1.70 | 1.90 1.92 1.87 | 1.65 1.65 1.70 | 1.50 1.58 1.50 | 1.74 1.81 1.69 | 1.42 1.47 1.47
Row 17, 90% titrate when B/A =: 2.30 2.60 2.05 | 2.50 2.35 2.33 | 1.87 1.85 1.88 | 1.72 1.80 1.72 | 2.00 2.14 1.93 | 1.65 1.70 1.75
Row 18, A/B > 2.00: 2,570 2,435 2,284 | 2,179 2,236 2,262 | 2,363 2,772 2,640 | 2,216 2,522 2,570 | 1,620 1,602 1,446 | 1,377 1,298 1,063
Row 19, B/A > 2.00: 1,556 1,714 1,901 | 1,790 1,843 1,916 | 1,351 1,351 1,453 | 1,373 1,432 1,451 | 1,382 1,371 1,254 | 2,008 1,969 2,227
Row 20, A/B > 2.00 (P < 0.001): 2,504 2,393 2,249 | 2,136 2,197 2,227 | 2,339 2,757 2,616 | 2,200 2,508 2,545 | 1,620 1,602 1,430 | 1,377 1,290 1,045
Row 21, B/A > 2.00 (P < 0.001): 1,458 1,673 1,883 | 1,672 1,802 1,901 | 1,340 1,347 1,443 | 1,356 1,427 1,437 | 1,382 1,365 1,238 | 2,004 1,942 2,146

Row 1 lists the number of genes detected in all four samples for each platform, separated by site. Rows 2 and 3 represent the number of concordantly detected genes for $\bar{A} > \bar{B}$ and $\bar{B} > \bar{A}$, respectively. The sum of rows 2 and 3 for each column is identical to the gene count in row 1. Rows 4 and 5 represent the number of concordantly detected, statistically significant (P < 0.001) genes for $\bar{A} > \bar{B}$ and $\bar{B} > \bar{A}$. Rows 6 and 7 represent the number of detected genes that show the monotonic titration trends $\bar{A} > \bar{C} > \bar{D} > \bar{B}$ and $\bar{B} > \bar{D} > \bar{C} > \bar{A}$. Rows 8 and 9 represent the number of statistically significant (P < 0.001), concordantly detected genes that show the monotonic titration trends $\bar{A} > \bar{C} > \bar{D} > \bar{B}$ and $\bar{B} > \bar{D} > \bar{C} > \bar{A}$. The statistical test used was a two-sample t-test, using equal variance, calculated within each site and comparing log2 expression values between the independent samples A and B. The gene counts in rows 8 and 9 are also indicated in Figure 2 for each monotonic direction. Rows 10 and 11 translate the previous rows into percentages of genes showing the monotonic titration trend. Rows 12–17 summarize Figure 2 for three specific y-axis values (50%, 75% and 90% of genes titrate at the listed average fold changes). Rows 18 and 19 show the numbers of genes for which $\bar{A}/\bar{B} > 2$ and $\bar{B}/\bar{A} > 2$. Rows 20 and 21 show the numbers of statistically significant (P < 0.001) genes used to create the box plots in Figure 4. Columns highlighted in blue, for each platform, represent the manufacturer's recommended normalization methods used in the main MAQC paper13. More detailed gene counts with cross-site intersections can be found in Supplementary Table 1 online.
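As an illustrative consistency check on rows 10 and 11 (our own arithmetic, using the ABI site 1 quantile column): row 10 is row 6 divided by row 2, and row 11 is row 7 divided by row 3,

```latex
\frac{(A>C>D>B)}{(A>B)} = \frac{3{,}063}{4{,}284} \approx 0.71,
\qquad
\frac{(B>D>C>A)}{(B>A)} = \frac{2{,}471}{3{,}765} \approx 0.66 .
```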
[Figure 3 appears here: twelve panels matching the layout of Figure 2 (ABI quantile; ABI scaling; AFX PLIER; AFX MAS5; AFX RMA; AFX GCRMA; AG1 median scaling; AG1 75th percentile scaling; GEH median scaling; GEH quantile; ILM quantile; ILM invariant scaling); the y-axis of each panel is the fraction of genes that titrate and the x-axis is the average log2 signal, with solid lines for the A > C > D > B direction and dashed lines for B > D > C > A, plotted per site.]

Figure 3 Impact of normalization on the distributions of titrating genes as a function of signal intensity. Fractions of genes showing the monotonic titration responses $\bar{A}_i > \bar{C}_i > \bar{D}_i > \bar{B}_i$ and $\bar{B}_i > \bar{D}_i > \bar{C}_i > \bar{A}_i$ are plotted against $\bar{B}_i$ (solid line) and $\bar{A}_i$ (dashed line), respectively. Histograms in each panel represent data from a different platform and normalization technique, separated by site and direction. Blue, site 1; red, site 2; gray, site 3. The data for these graphs were generated from the set of 12,091 genes common across the platforms that were significantly differentially expressed (P < 0.001) in samples A and B and detected in all four samples (Table 1, rows 4 and 5). All data are plotted on the same scale: the x-axis is normalized signal in log2 units and the y-axis shows the fraction of titrating probes relative to the total number of probes in the given intensity range. Bin centers are 0.5 apart on the log2 scale. To avoid spurious oscillations in the lowest and highest signal intensities, we plotted only bins with more than ten genes. Differences between normalization techniques are demonstrated by the differing signal ranges within a platform for the monotonic titration response. The normalization methods highlighted in yellow for each platform represent the manufacturer's recommended method used in the MAQC main paper13.
For ABI, the fraction of genes that titrate follows the same trend as for the other platforms when A > B (Fig. 3, solid lines), but when B > A (dashed lines), these data show a sudden increase in that fraction at high intensity. This effect, although still present, is much less distinct for the scaled than for the quantile-normalized data. We saw improved reproducibility among sites and concordance between the two titration trends with AG1 75th percentile scaling relative to median scaling. For the AFX-PLIER data, the signal range across which a titration response is elicited is smaller than for the other platforms and normalization methods, possibly owing to the variance stabilization used in the PLIER method. In all cases, the AFX data show lower percentages for site 1, as in Figure 2. For the GEH data, median normalization results in a very clear distinction between the two different titration patterns; this distinction is moderated by quantile normalization. The data for the ILM rank-invariant scaling indicate a larger number of genes showing the titration response $\bar{B}_i > \bar{D}_i > \bar{C}_i > \bar{A}_i$ than showing the opposite trend, a result not seen for any other platform or normalization method. Unlike in Figure 2, the percentage of titrating genes never reaches 100% because, at all signal ranges, some genes show only very small differences in expression across the samples and are more likely to yield a near-random ordering in their titration responses.

Analysis of titration mixtures
An underlying assumption for this study was that the proportions of each mRNA in the mixture samples (C and D) derived from each of the original samples (A and B) are equivalent to the mixing proportions of the total RNA. For this assumption to be true, the fractions of mRNA in the total RNA samples A and B had to be the same and had to be processed by the various biochemical systems with equal efficiencies. Using mathematical modeling, we investigated whether we could derive the relative mRNA contents of the two independent samples using the microarray data from the independent and titration samples (see Methods). Such
modeling defines the true fractions of mRNA derived from sample A in titration samples C and D as αC and αD, and the true fractions of mRNA derived from sample B in titration samples C and D as βC and βD (see Box 1 and Supplementary Fig. 5). Figure 4 shows the results of this modeling for all the platforms and normalization methods, with the y-axes representing the estimates of βC (bottom) and βD (top). The lower charts show median values of βC centered on 0.18 but usually larger for $\bar{A}_i > \bar{B}_i$ (left) than for $\bar{B}_i > \bar{A}_i$ (right), and the upper charts show median values of βD centered on 0.67. These deviations from the expected values of 0.25 and 0.75 based on the 3:1 mixtures of total RNA suggest that the mRNA concentrations of the A and B samples were not identical. From these results, we estimate the mRNA concentration in the B sample to be approximately two-thirds of the concentration in the A sample (see Box 1). An empirical evaluation of mRNA content in samples A and B is consistent with our estimates of 3% and 2%, respectively (see Methods). The values calculated from the different platforms and normalization methods are generally similar, with two clear exceptions. For ILM, invariant scaling results in much lower estimates for βC and βD than the other platforms and normalization methods when A > B (left side) but not when B > A. This difference is consistent with the results noted for the titration response (Figs. 2 and 3). For ABI, the estimates of βC and βD are consistent with the other platforms when A > B but lower than the other platforms when B > A. This result was seen with both normalization methods, although to different extents, and may be related to the differences noted in Figure 3. The deviations for βC and βD are particularly noteworthy because of the relatively small errors of the ABI data in this analysis. The individual microarray measurements for the titration coefficients shown in Figure 4 indicate that normalization and data-processing differences are not the primary cause for the deviations from the theoretical values. Differences in mRNA abundance contribute to these deviations and may not be circumvented with normalization alone. Additionally, further analysis of microarray measurements from these titration mixtures may provide greater-resolution observations of the global tendency (Fig. 4) of estimates of βC and βD to be larger for A > B than for B > A (see Supplementary Fig. 1 online).

Box 1 Modeling of titration mixtures
Ideally, the mRNA expression levels of each gene in samples C and D may be mathematically expressed as

C = αC·A + βC·B and D = αD·A + βD·B,

where A and B are the measured mRNA abundances of the gene in samples A and B, respectively, and αC, βC, αD and βD are the mixture coefficients. If we impose the requirement that

αC + βC = 1 and αD + βD = 1 (if A = B, then C = A = B = D),

then elementary algebra can be used to derive simple formulas for βC and βD:

βC = (C – A)/(B – A) and βD = (D – A)/(B – A).

If the mRNA fractions in samples A and B are identical and the normalization of samples A, B, C and D exactly the same, then the measured fractions should be centered on the ideal mixture fractions of βC = 0.25 and βD = 0.75 (implying αC = 0.75 and αD = 0.25). However, different mRNA concentrations in the A and B samples and differences in the normalization of the four samples for different platforms, sites and normalization methods can lead to deviations from these expected values (Fig. 4). For example, if the mRNA fractions for the A and B samples (termed a and b, respectively) are unequal (a ≠ b), then

C = ((0.75a)A + (0.25b)B)/(0.75a + 0.25b) and D = ((0.25a)A + (0.75b)B)/(0.25a + 0.75b).

We can express the true ratio of the B to A mRNA fractions as b/a = 3βC/(1 – βC) = βD/[3(1 – βD)] (see Supplementary Fig. 5). Using the empirical measurements of βC and βD, we can then estimate these true mRNA fractions. For example, if the B fraction of sample C is βC ≈ 0.18, as indicated by the microarray median values in Figure 4 (bottom), then we can deduce that the true ratio of mRNA fractions b/a is approximately 2:3. Moreover, these results predict that βD = 9βC/(1 + 8βC) ≈ 0.67, which is consistent with the empirical microarray results in Figure 4 (top).
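A small sketch of the Box 1 estimates in code (a minimal illustration with our own names; the functions work per gene on scalars or NumPy arrays):

```python
def beta_coefficients(a, b, c, d):
    """Mixture coefficients betaC = (C - A)/(B - A), betaD = (D - A)/(B - A)."""
    denom = b - a
    return (c - a) / denom, (d - a) / denom

def mrna_fraction_ratio(beta_c):
    """b/a = 3*betaC/(1 - betaC), from the 3:1 total-RNA mix in sample C."""
    return 3.0 * beta_c / (1.0 - beta_c)

# With the median betaC ~ 0.18 reported in Figure 4:
print(mrna_fraction_ratio(0.18))   # ~0.66, i.e., b/a of roughly 2:3
print(9 * 0.18 / (1 + 8 * 0.18))   # predicted betaD ~ 0.66, near the observed 0.67
```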
Effects of outlier data
During execution and analysis of the MAQC study, the consortium identified one outlier site and multiple outlier arrays on the basis of objective criteria of data quality13. In some cases, we evaluated the effects of not censoring such data from the analysis. The results (data not shown) were as expected: inclusion of low-quality data degraded both intra- and intermethod reproducibility. This result, although predictable, is nonetheless noteworthy because microarray experiments are expensive and are sometimes used to analyze samples that are available in very limited quantities, so low-quality microarray data are discarded only at great cost. It is therefore important that the community develop shared standards of microarray data quality to allow use and interpretation of less-than-perfect data while preventing overinterpretation. The well-characterized RNA samples and all of the data (including outliers) produced by the MAQC study are a good start on the road to such data-quality standards. In particular, the titration experimental design used in this work may prove to be an important tool for developing such standards, as the experiments can be interpreted using a small number of plausible assumptions.

DISCUSSION
The MAQC titration study was conceived as an experiment that could be implemented across several platforms, with a minimum of assumptions. One of the initial goals of the titration study was to assess relative accuracy by comparing observed expression in the titration samples with the expression expected on the basis of the known mixing ratios of the two independent samples. This analysis proved to be more complex than originally anticipated, largely owing to the effects of different mRNA fractions in the two independent samples. However, the qualitative expectation of a particular signal ordering is still valid and provides a sensitive tool for differentiating microarray platform performance and normalization methods. As the measurement of titration response illustrates, different platforms and data analysis methods have slightly different performance optima: design and processing choices that increase the number of detected genes also tend to increase noise in the titration series. In addition to differences in the number of genes analyzed, the variations seen in Figure 2 and Table 1 can also result from differences in expression-ratio compression (leading to different ratios observed for any given gene) as well as from differing levels of noise in each measurement. In general, the behaviors of the various sites and platforms are quite similar.

The analysis of the titration mixtures reveals some interesting observations about the data. These results show asymmetry in the titration responses (Figs. 2 and 3) and in the estimates of the true fractions of mRNA in the titration samples (Fig. 4). This asymmetry may be caused in part by additional differences in the normalization of the A and B samples (Supplementary Fig. 1), may relate to greater difficulty in distinguishing A from C at low signal or may be a consequence of nonlinearity in the signal response relative to the concentration amounts (Supplementary Fig. 2 online). In addition, the results presented here demonstrate that the mRNA content of the two independent samples is not equal. This conclusion is supported by additional lines of evidence. First, an apparent power analysis27–30 (Supplementary Figs. 3 and 4 online) is asymmetric between the sample pairings (A, C) and (B, D). This asymmetry is probably the result of the A sample being more similar to C than B is to D. Second, the slopes of the linear trends for the titration sample/independent sample ratios (Supplementary Fig. 1) suggest that the ratio of sample A to B in sample C differs from the value expected from the total RNA ratios. Third, external spike-in RNA controls were included for several platforms; these controls were amplified and labeled along with the sample RNA and indicate that the A sample contains a higher percentage of mRNA relative to the B sample31. Finally, a preliminary empirical analysis of mRNA content in the A and B samples (see Methods) confirmed that the mRNA content differs between the samples.

The discovery of a difference in the mRNA content of samples A and B has important implications for the future use of these commercially available samples in method calibration, proficiency testing and other activities requiring well-characterized, complex RNA. As a result of the MAQC study, these samples are probably the best-characterized complex RNA preparations available. The RNA-measurement community should complete the characterization of these samples by more accurately measuring the fraction of mRNA in each preparation, so that the scientific community can make better use of this resource.

The utility of the titration samples for assessing normalization and data preprocessing methods can be seen throughout the analyses presented here. Notably, for all platforms except AFX and ILM, the performance of the MAQC 'standard' normalization or data preprocessing method was slightly inferior to that of the secondary method, especially in the apparent power analysis (Supplementary Fig. 3). This result highlights the observation, noted throughout this study, that data processing methods determined to be optimal under one set of circumstances may not always prove appropriate under all conditions, particularly if primary assumptions underlying those data processing methods are violated.

A great strength of the design presented here is that, despite the added complexities of varying mRNA content, the qualitative expectation of a particular signal ordering is still valid, provided that the different data sets are properly scaled relative to one another. Therefore, this design is very valuable for assessing microarray performance. Specifically, as we have shown here, the titration response can be used to distinguish between normalization methods that are sensitive to changes in mRNA fraction and methods that are robust despite such changes. One observation of this study is that the robustness of a normalization method depends in part on the subset of data used to determine the scaling constant or function. Our results indicate a path toward objective optimization of this normalization set.

The differences in gene expression among samples may be greater and the variability across replicates smaller in this study than in typical biological experiments; nonetheless, the lessons learned regarding the use of titration mixtures to evaluate the performance and normalization of large-scale gene-expression measurements may have widespread application in more realistic settings. In addition, the wide range of gene expression in these samples probably served to amplify data processing-derived differences that would have been more difficult to detect in analyses of more closely matched samples. Finally, it should be noted that the majority of genes considered here yielded very similar behavior across all platforms, in spite of the complications noted in this manuscript. Therefore, these results should be considered a testament to the underlying strength of all of the methods examined. Improvement of mRNA quantification methods remains an important objective, and the MAQC study has produced samples and data that will aid the community in making such improvements. The concordance of the data presented here demonstrates that the methods used are sound and, when properly implemented and interpreted, can be used to measure expression levels of thousands of RNA targets simultaneously.

METHODS
Preparation of the RNA sample titrations. RNA samples are described in detail in the main MAQC paper13. Briefly, two commercially available total RNA solutions and their 3:1 and 1:3 mixtures were chosen at the outset by the members of the MAQC project. For simplicity, these samples were designated A, B, C and D. A and B are independent total RNA samples: A is derived from a collection of ten human cell lines and B from human brain tissue. Sample A is sold commercially under the name Universal Human Reference RNA (catalog number 740000, Stratagene). Sample B is sold commercially under the name FirstChoice Human Brain Reference RNA (catalog number 6050, Ambion).
[Figure 4 appears here: box plots of the estimated mixture coefficients βC = (C – A)/(B – A) (bottom) and βD = (D – A)/(B – A) (top) for each platform and normalization method (ABI quantile, scaling; AFX PLIER, MAS 5.0, RMA, GCRMA; AG1 median scaling, 75th percentile scaling; GEH median scaling, quantile; ILM quantile, invariant scaling), separated by site and by fold-change direction (A > B, left; B > A, right).]

Figure 4 Titration-response concordance for each commercial whole-genome microarray platform, using different normalization methods, with data from each platform separated by site and fold-change direction. Data shown are from the 12,091 genes common across whole-genome platforms. Box plots were generated in cases where a gene was detected across all samples per site and had a statistically significant (P < 0.001) A/B ratio >2 in the direction indicated. A two-sample t-test, with equal variance, was performed within each site on log2 expression values. Data for each site were split by direction of fold change: left, genes where A/B > 2; right, genes where B/A > 2 (all differences significant, P < 0.001, for both directions). The number of genes used for each box plot is indicated by the individual site counts in Table 1 (rows 20 and 21). Each box represents the interquartile range, with the median marked by a horizontal black line and the 10th and 90th percentiles marked by the outer whiskers. Blue, site 1; red, site 2; gray, site 3. The horizontal dashed black lines represent expected values assuming 3% and 2% mRNA abundance levels for samples A and B, respectively. In other words, when the mRNA/total RNA fraction in A is equal to 3% and in B is equal to 2%, then βC = (C – A)/(B – A) = 0.18 (bottom two charts) and βD = (D – A)/(B – A) = 0.67 (top two charts). Refer to Box 1 for further details. Normalization methods highlighted in yellow for each platform represent the manufacturer's recommended method used in the MAQC main paper13.
RNA titration samples were generated once for all MAQC experiments (Fig. 1), with samples A and B at equal concentrations as measured by A260. Sample C was made by mixing sample A with sample B at a volumetric ratio of 75:25, and sample D was made by mixing sample A with sample B at a volumetric ratio of 25:75.
Normalization methods used in this study. For ABI, we used quantile normalization17 independently for each test site, and 90% trim-mean scaling. For trim-mean scaling, the highest 5% and lowest 5% of signals are removed, and the remaining 90% of signals are used to calculate the mean. The mean of each array is scaled to the same level, and the scaling factor for each array is used to scale its signals. The trim-mean scaling was calculated independently for each test site.

For AG1, the data were transformed so that signal values below 5 were set to 5. After this transformation, each measurement was divided by the median of all detected measurements in that sample (for median scaling) or by the 75th percentile of all measurements in that sample (for 75th percentile scaling).

For AFX data, we used PLIER21, MAS 5.0, RMA18 and GCRMA27 for data preprocessing and normalization. The PLIER method produces a summary value for a probe set by accounting for experimentally observed patterns in feature behavior and handling error appropriately at low and high abundance. PLIER accounts for the systematic differences between features by means of parameters termed feature responses, using one such parameter per feature (or pair of features, when using mismatch (MM) probes to estimate cross-hybridization signal intensities for background). Feature responses represent the relative differences in intensity between features hybridizing to a common target; they are calculated using experimental data across multiple arrays. PLIER produces a probe-set signal by using these feature responses to interpret intensity data, applying dynamic weighting by empirical feature performance. PLIER also uses an error model that assumes error is proportional to the observed intensity rather than to the background-subtracted intensity, which ensures that the error model can adjust appropriately for relatively low and high abundances of target nucleic acids. Here, PLIER was run with the default options (quantile normalization and PM-MM) with the addition of an offset of 16 to each expression value13. The AFX MAS 5.0 algorithm is a method for calculating probe-set signal values; it is implemented on a chip-by-chip basis and is not applied across an entire set of chips. The signal value is calculated from the background-adjusted PM and MM values of the probes in the set using a robust biweight estimator. Here, MAS 5.0 was implemented with default options, and global scaling (96% trim mean) was used for normalization. RMA18 fits a robust linear model to the probe-level data and conducts a multichip analysis. The algorithm includes a model-based background correction, quantile normalization and an iterative median-polishing procedure to generate a single expression value for each probe set. GCRMA substantially refines the RMA algorithm by replacing the model for background correction with a more sophisticated computation that uses each probe's sequence information to adjust the measured intensity for the effects of nonspecific binding, according to the different bond strengths of the two types of base pairs; it also takes into account the optical noise present in data acquisition. Both RMA and GCRMA were implemented using the ArrayAssist Lite package with default settings (Affymetrix; http://www.affymetrix.com/products/software/specific/arrayassist_lite.affx).

For GEH data, we compared median scaling and quantile normalization. For the median-scaling approach, each measurement was divided by the median of all measurements within each array, so the median signal is scaled to 1 for each array. The quantile normalization approach16 was applied to log2-transformed expression values across all samples and replicates within each site.

For ILM data, we compared quantile normalization16, with the addition of an offset of 15 counts to each probe signal13, and normalization by a robust least-squares fit of rank-invariant genes. For the latter method, array data corresponding to sample A were averaged and used as a reference for each site independently. Signals from each array in the experiment were compared to the reference, and probes with relative rank changes of less than 5% (considering only probes ranked between the 50th and 90th percentiles) were considered rank invariant. Normalization coefficients were computed with iteratively reweighted linear least squares using the Tukey bisquare weight function. Background signal, estimated as the mean signal of negative controls, was subtracted before normalization. Each ILM array contains approximately 1,600 negative control probes, which are thermodynamically equivalent to regular probes but do not have specific targets in the transcriptome. Gene signals were ranked relative to signals of negative controls, and the detection flag was set to present if the gene signal exceeded 99% of the signals of the negative controls.
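For orientation, here are minimal sketches of two of the normalizations described above (our own simplified implementations, not the vendors' code; ties in the quantile step are handled naively):

```python
import numpy as np

def quantile_normalize(x):
    """x: genes x arrays. Give every array the same distribution by replacing
    each value with the across-array mean of values at the same rank."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_by_rank = np.sort(x, axis=0).mean(axis=1)
    return mean_by_rank[ranks]

def trim_mean_scale(x, trim=0.05):
    """Scale each array so its 90% trimmed mean (middle 90% of signals)
    matches the across-array average of those trimmed means."""
    lo, hi = np.quantile(x, [trim, 1.0 - trim], axis=0)
    tmeans = np.array([col[(col >= l) & (col <= h)].mean()
                       for col, l, h in zip(x.T, lo, hi)])
    return x * (tmeans.mean() / tmeans)
```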
Purification of mRNA to empirically determine abundance in samples A and B. In a follow-up experiment, mRNA was isolated in duplicate from 100 µg of sample A and sample B total RNA using the Absolutely mRNA purification kit (Stratagene) according to the manufacturer's protocol. Briefly, 50 µl of mRNA oligo(dT) magnetic particles were combined with 100 µl of total RNA and washed four times, and the mRNA was eluted with 100 µl of elution buffer. mRNA quantity and quality were evaluated using an ND-1000 NanoDrop spectrophotometer (NanoDrop Technologies) and an Agilent 2100 Bioanalyzer with the RNA 6000 Nano LabChip kit (Agilent Technologies). This empirical evaluation of mRNA content yielded, per 100 ng of total RNA, an average of 2.870 ± 0.095 ng of mRNA for sample A and 2.003 ± 0.124 ng for sample B (mean ± s.d.).
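As a consistency check (our arithmetic, not a calculation reported here), these yields correspond to mRNA mass fractions of about 2.9% and 2.0%, a ratio of roughly 1.4, in line with the approximately 1.5-fold difference in mRNA content between samples A and B invoked elsewhere in the MAQC analyses:

\[
\frac{f_A}{f_B} \;=\; \frac{2.870\,\mathrm{ng}/100\,\mathrm{ng}}{2.003\,\mathrm{ng}/100\,\mathrm{ng}} \;\approx\; 1.43 .
\]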
Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
This study used a number of computing resources, including the high-performance computational capabilities of the Biowulf PC/Linux cluster at the US National Institutes of Health in Bethesda, Maryland (http://biowulf.nih.gov). This research was supported in part by the Intramural Research Program of the US National Institutes of Health, National Library of Medicine.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA and the NIH. This work has been approved for publication by these agencies, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The following authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/nbt/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/
1. Barczak, A. et al. Spotted long oligonucleotide arrays for human gene expression analysis. Genome Res. 13, 1775–1785 (2003).
2. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B. & Pavlidis, P. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res. 33, 5914–5923 (2005).
3. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565–572 (2005).
4. Dorris, D.R. et al. Oligodeoxyribonucleotide probe accessibility on a three-dimensional DNA microarray surface and the effect of hybridization time on the accuracy of expression ratios. BMC Biotechnol. 3, 6 (2003).
5. Hughes, T.R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347 (2001).
6. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods 2, 345–350 (2005).
7. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Independence and reproducibility across microarray platforms. Nat. Methods 2, 337–344 (2005).
8. Li, J., Pankratz, M. & Johnson, J.A. Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicol. Sci. 69, 383–390 (2002).
9. Naef, F., Socci, N.D. & Magnasco, M. A study of accuracy and precision in oligonucleotide arrays: extracting more signal at large concentrations. Bioinformatics 19, 178–184 (2003).
10. Shippy, R. et al. Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations. BMC Genomics 5, 61 (2004).
11. Yuen, T., Wurmbach, E., Pfeffer, R.L., Ebersole, B.J. & Sealfon, S.C. Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays. Nucleic Acids Res. 30, e48 (2002).
12. Chudin, E. et al. Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip arrays. Genome Biol. 3, RESEARCH0005 (2002).
13. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).
14. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl.), S12 (2005).
15. Thompson, K.L. et al. Use of a mixed tissue RNA design for performance assessments on multiple microarray formats. Nucleic Acids Res. 33, e187 (2005).
16. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).
17. Irizarry, R.A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).
18. Irizarry, R.A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003).
19. Irizarry, R.A., Wu, Z. & Jaffee, H.A. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22, 789–794 (2006).
20. Parrish, R.S. & Spencer, H.J. III. Effect of normalization on significance testing for oligonucleotide microarrays. J. Biopharm. Stat. 14, 575–589 (2004).
21. Guide to probe logarithmic intensity error (PLIER) estimation. Affymetrix Technical Note.
22. Statistical algorithms description document. Affymetrix.
23. Cope, L.M., Irizarry, R.A., Jaffee, H.A., Wu, Z. & Speed, T.P. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 20, 323–331 (2004).
24. Wu, Z. & Irizarry, R.A. Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J. Comput. Biol. 12, 882–893 (2005).
25. Sendera, T.J. et al. Expression profiling with oligonucleotide arrays: technologies and applications for neurobiology. Neurochem. Res. 27, 1005–1026 (2002).
26. Wu, Z., Irizarry, R.A., Gentleman, R., Martinez Murillo, F. & Spencer, F. A model based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917 (2004).
27. Seo, J., Gordish-Dressman, H. & Hoffman, E.P. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics 22, 808–814 (2006).
28. Hwang, D., Schmitt, W.A. & Stephanopoulos, G. Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics 18, 1184–1193 (2002).
29. Tibshirani, R. A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics 7, 106 (2006).
30. Page, G.P. et al. The PowerAtlas: a power and sample size atlas for microarray experimental design and research. BMC Bioinformatics 7, 84 (2006).
31. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).
Evaluation of external RNA controls for the assessment of microarray performance
Weida Tong1, Anne Bergstrom Lucas2, Richard Shippy3, Xiaohui Fan1,4, Hong Fang5, Huixiao Hong5, Michael S Orr6, Tzu-Ming Chu7, Xu Guo8, Patrick J Collins2, Yongming Andrew Sun9, Sue-Jane Wang6, Wenjun Bao7, Russell D Wolfinger7, Svetlana Shchegrova2, Lei Guo1, Janet A Warrington8 & Leming Shi1

External RNA controls (ERCs), although important for microarray assay performance assessment, have yet to be fully implemented in the research community. As part of the MicroArray Quality Control (MAQC) study, two types of ERCs were implemented and evaluated: one was added to the total RNA in the samples before amplification and labeling; the other was added to the copy RNAs (cRNAs) before hybridization. ERC concentration-response curves were used across multiple commercial microarray platforms to identify problematic assays and potential sources of variation in the analytical process. In addition, the behavior of different ERC types was investigated, resulting in several important observations, such as the sample-dependent attributes of performance and the potential of using these control RNAs in a combinatorial fashion. This multiplatform investigation of the behavior and utility of ERCs provides a basis for articulating specific recommendations for their future use in evaluating assay performance across multiple platforms.

ERCs are synthetic or naturally occurring RNA species that are added to an RNA sample for the purpose of quality control of the assay. Most commercial microarray platforms contain probes specifically designed for interrogating ERC transcripts. These probes have been extensively prototyped and optimized for performance on each microarray platform. To provide an enhanced assessment of the analytical performance of the system during data collection, a variety of ERCs can be added to the sample in a range of concentrations spanning high to low abundance by
1National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 2Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, California 95051, USA. 3GE Healthcare, 7700 S. River Pkwy., Suite #2603, Tempe, Arizona 85284, USA. 4Pharmaceutical Informatics Institute, Zhejiang University, Hangzhou 310027, China. 5Z-Tech Corporation, National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 6Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Ave., Silver Spring, Maryland 20993, USA. 7SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. 8Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA. 9Applied Biosystems, 850 Lincoln Centre Dr., Foster City, California 94404, USA. Correspondence should be addressed to W.T. ([email protected]).
Published online 8 September 2006; doi:10.1038/nbt1237
evaluating assay performance across the expected range of concentrations in the sample1. A well-constructed concentration-response series of ERCs is useful in many ways for assessing assay performance. Depending on the point in the assay at which the ERCs are added, they can be used to identify potentially failed steps during the assay process. Recognizing the potential importance of ERCs for analytical performance assessment, the External RNA Control Consortium (ERCC) was established in 2003 with the objective of developing a set of ERC transcripts that could be used with various gene expression profiling technologies, including microarray platforms2.

ERCs can also be useful for evaluating different data analysis methods3. The cRNA data set from Affymetrix, known as the Latin square data set (http://www.affymetrix.com/support/technical/sample_data/datasets.affx), consists of data from 42 cRNAs that were prelabeled and added to a hybridization solution at various known concentrations. A similar data set is provided by GeneLogic (http://www.genelogic.com/newsroom/studies/index.cfm). Both data sets are freely available and have been widely used in the research community for comparative performance analysis of GeneChip-specific normalization and gene selection methods4–7. Recently, Choe et al.8,9 demonstrated the value of using a large number of cRNA transcripts at concentration ratios varying from one- to fourfold to compare the performance of different data analysis scenarios.

The MAQC study10 provides a rich data resource for investigating various issues associated with DNA microarray platforms, including the performance of ERCs across platforms. In this project, the probes for the ERC transcripts (Supplementary Methods online) are unique nonmammalian sequences selected to minimize cross-hybridization with transcripts from mammalian species such as human, mouse and rat. Seven microarray platforms were evaluated, and ERCs were used on the following platforms: Applied Biosystems Genome Survey Microarray, Affymetrix GeneChip, both of Agilent's One-Color and Two-Color platforms, GE Healthcare CodeLink and Eppendorf (data not shown). With these data sets, the following questions were asked: (i) Do the ERCs behave in the expected manner? (ii) Can outlying assays be identified using ERCs? (iii) Can ERCs assess the accuracy of ratios between different samples? (iv) Can ERCs provide information other than assay quality? (v) How does the choice of normalization and data processing methods affect the ERC data?

RESULTS
The utility and performance behavior of ERCs were investigated using two independent sets of data: the MAQC data set10 and a rat toxicogenomics
(TGx) data set11. Because the results in this paper are derived from two independent experiments, the following nomenclature is used to provide clarity. The subset of the MAQC data set used for the present analysis corresponds to four genome-wide commercial microarray platforms: Affymetrix GeneChip (AFX), Applied Biosystems Genome Survey Microarray (ABI), Agilent One-Color (AG1) and Agilent Two-Color (AGL) microarrays. Data were generated for each of these platforms by three different test sites, with five technical replicates for each of the four RNA samples (A, B, C and D10,12). Each data set is denoted by platform_site_replicate; for example, AG1_2_A1 denotes the Agilent One-Color platform, test site 2, sample A and replicate 1. The rat TGx data set, denoted platform_Rat, contains data from Affymetrix (AFX_Rat), Agilent One-Color (AG1_Rat), Applied Biosystems (ABI_Rat) and GE Healthcare (GEH_Rat). This experiment was performed at one test site with six biological replicates for each of six different treatments. The site designation described above is therefore not applicable, but a distinction between samples is still necessary; it is provided in Methods and within the figures.

Two types of ERCs were investigated. One type is added to the total RNA (called tERC hereafter) before initiating the cDNA synthesis and in vitro transcription steps of the RNA labeling procedure. When added in this manner, the tERC generally assesses the efficiency of the target preparation as well as the performance of the hybridization and scanner. The other type of ERC is added to the cRNA (called cERC hereafter) immediately before hybridization, which allows assessment of assay performance from the hybridization step onward. The Applied Biosystems and Affymetrix platforms used both types of ERCs in their respective protocols, whereas Agilent used tERCs only and GE Healthcare used cERCs only (Fig. 1). The concentration-response behavior of both tERCs and cERCs was evaluated using linear regression analysis in an effort to identify microarray assays that show outlier behavior. This approach is attractive because the analysis is self-contained within each microarray and therefore does not require replicates to assess outliers. The behavior of both ERC types was investigated further to determine whether additional ERC-specific analysis methods could be useful for analytical performance assessment.

External RNA control concentration-response curves
The ERC transcripts span a range of concentrations on the Affymetrix, Agilent and GE Healthcare microarray platforms, making them suitable for concentration-response analyses. The Agilent One-Color platform has ten tERCs that span six logs of concentration and interrogate the lower and upper limits of assay signal detection (Supplementary Table 1 online). The Affymetrix platform has four tERCs that span one and a half logs of concentration, and the GE Healthcare platform has six ERCs that span three logs of concentration. For the Applied Biosystems platform, ERCs are spiked at a single fixed concentration, rendering them unsuitable for a concentration-response analysis. Figure 2 depicts the concentration-response curves for AG1, AFX and GEH_Rat. In general, all platforms exhibited accurate concentration-response patterns. In addition, performance
differences are observed for tERCs relative to cERCs, as seen in the AFX data, where the tERCs show decreased linear correlations compared with the cERC plots (Fig. 2, compare the second and third rows of graphs for the AFX platform). This result is somewhat expected, as the tERCs are introduced earlier in the assay process and are subject to the multiple sources of variation introduced during sample amplification and labeling, more closely approximating the analytical manipulation of the biological sample. In contrast, the cERCs are added just before hybridization, and their more stable performance reflects the fewer sample manipulations after these controls are added. Two assays generated at AG1 site 2 (AG1_2_D2 and AG1_2_A3) have noticeably higher signals for the tERCs at the lowest concentrations, indicating potential assay outliers. However, the specific problematic step of the assay for these two data sets cannot be identified, because tERC behavior reflects the performance of multiple steps of the experiment. The benefit of using both tERCs and cERCs is demonstrated with the AFX platform, where the combination was used to elucidate procedural problems in the assay. In this example, the AFX cERC performance is stable and consistent across all three test sites, but the tERCs at site 1 have lower y-intercepts than at the other two sites, indicating that the target preparation yield or labeling efficiency at site 1 differed from the other sites (Fig. 2).

Concentration-response curves in one-color microarray assays
In addition to visually inspecting the concentration-response curves to interrogate performance over the dynamic range of an assay, we calculated linear regression statistics, including R2 correlations and slopes, for the linear portion of the curves to identify outliers. Figure 3 (Supplementary Table 2 online) plots the linear regression slope versus the R2 correlation for AG1, AFX and GEH_Rat. Three outlying assays were identified at AG1 site 2 (Fig. 3a): AG1_2_D1 has a normal R2 with a low slope, AG1_2_D2 has a normal slope with a low R2, and AG1_2_A3 has both a low slope and a low R2. A concentration-response slope of one indicates no compression of the signal, because the values of x and y increase identically across the regression fit. By inspecting the slopes in Figure 3, different degrees of compression in the gene expression data are observed among the three platforms.
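A compact sketch of this per-array screen (ours, with invented spike concentrations and simulated assays; the two-s.d. flag mirrors the outlier criterion applied later to the two-color ratio fits, not a threshold stated for this analysis):

import numpy as np

def erc_regression(conc, signal):
    """Slope and R^2 of log10(signal) vs. log10(concentration) for one array,
    using only the linear portion of the curve (saturated and noise-level
    spikes are assumed to have been excluded already)."""
    x, y = np.log10(conc), np.log10(signal)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, r ** 2

rng = np.random.default_rng(2)
conc = np.array([1e-6, 1e-5, 1e-4, 1e-3, 1e-2])   # hypothetical spike ratios
arrays = [10 ** (3 + 0.95 * np.log10(conc) + rng.normal(0, 0.05, conc.size))
          for _ in range(20)]                      # 20 simulated assays, true slope ~0.95

stats = np.array([erc_regression(conc, s) for s in arrays])
slopes, r2 = stats[:, 0], stats[:, 1]
outliers = r2 < r2.mean() - 2 * r2.std()           # flag R^2 more than 2 s.d. below the mean
print(f"mean slope {slopes.mean():.2f}; outlying assays: {np.where(outliers)[0]}")

A slope persistently below 1 in such fits is what the text calls compression: observed signal grows more slowly than spiked concentration.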
[Figure 1 diagram: the microarray assay process (total RNA → reverse transcription → cDNA → IVT → cRNA → fragmentation → hybridization), with tERCs added to the total RNA (Affymetrix: four poly-A controls; Agilent: ten in vitro synthesized, polyadenylated transcripts for both one- and two-color arrays; Applied Biosystems: three IVT controls and three RT controls) and cERCs added to the cRNA before hybridization (GE Healthcare: six positive controls; Applied Biosystems: three hybridization controls; Affymetrix: four hybridization controls).]
Figure 1 Overview of external RNA controls (ERCs) implemented in Affymetrix, Agilent, Applied Biosystems and GE Healthcare platforms. Two types of ERCs are implemented in these four commercial microarray platforms. The first type of ERC is added to the total RNA (tERC) before initiating the cDNA synthesis and IVT (in vitro transcription) steps of the RNA labeling procedure. The second type of ERC is added to the cRNA (cERC) just before the cRNA is placed into the hybridization mixture. Applied Biosystems and Affymetrix platforms use both types of ERCs in their respective protocols, whereas Agilent uses the tERC and GE Healthcare uses cERC in this study. RT, reverse transcription.
Figure 2 Concentration-response curves for ERCs on the Agilent, Affymetrix and GE Healthcare microarray platforms. Each concentration-response curve is generated from an individual microarray data set and plots the concentration of either the tERC (spiked poly-A molar ratio) or the cERC (spiked concentration in pM) on the x-axis against normalized signal intensity on the y-axis. The amount of cERC added to the hybridization mixture is expressed as a molar concentration based on the mass of the cERC transcript added to a specific volume of the hybridization mixture. The assumptions used to calculate the poly-A mass ratios for the different tERCs were that the average percentage of mRNA in total RNA is 2%, the average transcript length is 2,000 bases and the average molecular weight of a single base is 330 g/mol. The cERC concentrations and tERC poly-A molar ratios used for this figure are summarized in Supplementary Table 1 online. The Agilent platform is presented in the first row, where the seven of the ten tERCs with the highest concentrations are plotted to better compare scales with the other platforms (the full concentration-response curve is presented in Supplementary Fig. 9 online). The Affymetrix platform is presented in the second and third rows, illustrating the combinatorial approach of using both tERCs (second row) and cERCs (third row). The GE Healthcare platform is presented in the fourth row, illustrating the cERC concentration-response from the rat toxicogenomics study. This figure illustrates the different approaches each manufacturer employs, using tERCs, cERCs or both, when assessing assay quality with ERCs. Two microarrays from AG1 site 2 (AG1_2_D2 and AG1_2_A3) exhibit higher than expected signals for the tERCs with the lowest concentrations, indicating that these could be outlying assays. AA, aristolochic acid; RDL, riddelliine; CFY, comfrey. 'L' indicates samples isolated from livers and 'K' samples isolated from kidneys of treated rats. CTR, control (liver or kidney from untreated rats).
The AG1 platform shows very little compression, with a tERC slope close to 1. However, the ERC data for the AFX and GEH_Rat experiments appear compressed to a similar extent, with slopes detectably <1. The effect of normalization methods on this compression was also investigated. For AFX, the PLIER13, MAS5 (ref. 14), RMA6, GCRMA15 and dChip16 algorithms were used, whereas median scaling and quantile normalization were applied for both AG1 and GEH. For AFX, dChip compresses the gene expression data more than the other methods, whereas GCRMA shows little compression or even a small degree of expansion (Fig. 3b)15. For AG1, quantile normalization tends to separate A and C from D and B by slope, in accordance with the mRNA abundance of the samples, as shown in the 'Performance of external RNA controls' section (Fig. 3a). This sample-dependent behavior associated with quantile normalization is also observed for AFX (PLIER, RMA and GCRMA) (Fig. 3b) and GEH_Rat (Fig. 3c).

External RNA controls in two-color microarray assays
Agilent was the only two-color platform to use ERCs in the MAQC study. Agilent formulates its two-color ERCs into two different mixtures that span 2.3 logs of concentration and are combined at different concentrations to give the following expected ratios: 1:10, 1:3, 1:1, 3:1 and 10:1 (Supplementary Table 3 online). This type of ERC formulation adds an additional dimension to the typical one-color concentration-response analysis: not only should the tERCs generate signals proportional to their concentrations within each sample, but the two-color assays should also generate observed ratios equal to the expected ratios (or log10 ratios) once the data sets are dye-normalized and analyzed. This accuracy assessment is contained within each probe interrogating a specific ERC transcript.

The observed versus expected ERC log10 ratio plots for AGL are presented in Figure 4. Two outlying assays at AGL site 1 showed major assay performance failures, generating log10 ratios close to zero across the assay: AG1_1_D1 had to be rescanned three weeks after the initial experiment and had faded significantly, and AG1_1_B5 had the same ERC control mixture added to both samples, resulting in log10 ratios of 0 for all ten ERC transcripts. Two assays at AGL site 2 were also determined to be outliers.
These outliers were found to have increased within-feature noise, which might result from sample contamination by the reagents used to purify the labeled cRNA. A similar observation is obtained when comparing the linear regression correlation coefficients from the observed versus expected ratios, where outliers are defined as assays whose R2 for the linear fit falls more than two s.d. below the mean for that site (Supplementary Fig. 1 online). In the MAQC study, the two-color microarray assays used only samples A and B, with a dye-swap experimental design. The y-intercept was >0 (shifted up) for Cy5(B)/Cy3(A) at all three sites and <0 (shifted down) for Cy5(A)/Cy3(B) at all three sites (Fig. 4). This shift indicates differences in mRNA abundance between sample A and sample B, which will be further analyzed in the following section.
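The accuracy assessment above reduces to regressing observed against expected log10 ratios for each assay. A sketch (ours; the expected ratios below are illustrative stand-ins for the values in Supplementary Table 3 online):

import numpy as np

expected = np.log10(np.array([0.1, 1 / 3, 1.0, 3.0, 10.0]))          # 1:10 ... 10:1 formulations
rng = np.random.default_rng(4)
observed = expected * 0.97 + rng.normal(0, 0.03, expected.size)      # one assay's dye-normalized ratios

slope, intercept = np.polyfit(expected, observed, 1)
r2 = np.corrcoef(expected, observed)[0, 1] ** 2
# A failed assay (e.g., the same ERC mixture in both channels) shows observed
# ratios near zero: slope ~0, with the y-intercept reflecting any mRNA-content shift.
print(f"slope {slope:.2f}, intercept {intercept:.2f}, R2 {r2:.3f}")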
Performance of external RNA controls
The MAQC data sets were generated from four RNA samples with an incremental increase in brain total RNA across the samples: A (0%), C (25%), D (75%) and B (100%). Because the relative mRNA abundance is not expected to be the same between Stratagene Universal Human Reference RNA (UHRR, or sample A) and Ambion Human Brain Reference RNA (brain, or sample B), the effect of mRNA abundance on ERC behavior was investigated in terms of signal intensity, with the objective of developing additional ERC-specific analysis methods for assay assessment. The tERC signal intensity increases in proportion to increasing concentrations of brain mRNA in the sample mixture, whereas the signal intensity from the biological probes exhibits the reverse trend (Supplementary Fig. 2 online).
Figure 3 Concentration-response linear regression results for the Agilent, Affymetrix and GE Healthcare microarray platforms. (a–c) R2 correlation coefficients (y-axis) versus slope (x-axis) from a regression analysis based on the linear portion of the concentration-response curves for AG1 (a), AFX (b) and GEH_Rat (c). Data used in creating this figure are in Supplementary Table 2 online. Abbreviations are as defined in Figure 2. For AG1, two data normalization methods are presented for both the MAQC and TGx data sets: raw/median scaling and quantile normalization. For AFX, five data normalization methods are presented for both the MAQC and TGx data sets: PLIER, MAS5, dChip, RMA and GCRMA. For GEH_Rat, the raw/median data are presented for the TGx data set. This analysis indicates that (i) a degree of signal compression, with slopes <1, is evident for the Affymetrix and GE Healthcare platforms; (ii) the quantile normalization method causes the data to separate by sample type; and (iii) three outlying assays are identified at AG1 site 2.
Figure 4 Expected versus observed log10 ratio comparison for Agilent Two-Color ERC data. The expected log10 ratios on the x-axis were based on the quantity of each tERC transcript spiked into the total RNA (Supplementary Table 3 online). The dye-normalized signal ratios obtained from the Agilent Feature Extraction software are plotted as observed log10 ratios on the y-axis. Plots are grouped by site and ordered by sample combination. In the two-color assay, four pairs of RNA samples were generated using only samples A and B. The samples are named AA, BB, AB and BA, where the letters represent the RNA sample type, with the first letter denoting the sample labeled with Cy5 and the second the sample labeled with Cy3. Four outlying assays are highlighted in red, two from site 1 and two from site 2.
The general trend was conserved across the different normalization methods when PLIER, MAS5, RMA and dChip were examined for AFX and when median scaling and quantile normalization were applied to AG1 (Supplementary Fig. 3 online). This behavior was more pronounced when the ratio of the median tERC signal intensity to the corresponding median biological probe intensity was plotted against the percentage of brain RNA in the biological target sample, as depicted in Figure 5 (Supplementary Table 4 online): a positive linear correlation was observed across three different one-color platforms (ABI, AFX and AG1), with slopes >0 and high correlation coefficients (R2 > 0.8). Two titration points (sample C and sample D) were plotted based on the amount of brain RNA in the sample according to the volumetric mixing of samples A and B, where C = 75%A + 25%B and D = 25%A + 75%B (Fig. 5). This plot is accurate only if the percentage of mRNA is equal between sample A and sample B. However, the Agilent two-color tERC data indicate that the percentage of mRNA was higher in sample A than in sample B (Fig. 4 and Supplementary Fig. 4 online). If we assume that sample A has 1.5-fold more mRNA than sample B12, the percentage of brain RNA becomes 18% for sample C and 67% for sample D. When these values are used on the x-axis of Supplementary Fig. 5 online, the correlation coefficients improve for all of the samples at all of the sites for the three microarray platforms, further supporting the hypothesis that the samples have different percentages of mRNA.

The effect of the mRNA abundance differences between the four samples on cERC signal intensities was also investigated. Unlike the tERC signal intensities, the cERC signal intensities across the four RNA samples for ABI and AFX exhibited no significant difference (Supplementary Fig. 6 online), indicating that cERCs added before hybridization are unaffected by differences in the relative abundance of the sample mRNA tested in this set of experiments. This observation is also not affected by the choice of normalization (Supplementary Fig. 7 online). This result further supports the hypothesis that the differences between the biological samples arise at an earlier stage of target preparation.
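The adjusted brain percentages quoted above follow from weighting the volumetric mixing fractions by mRNA content (our reconstruction of the arithmetic, taking sample A to carry 1.5-fold the mRNA of sample B per unit of total RNA):

\[
C:\ \frac{0.25 \times 1}{0.75 \times 1.5 + 0.25 \times 1} \approx 18\%, \qquad
D:\ \frac{0.75 \times 1}{0.25 \times 1.5 + 0.75 \times 1} \approx 67\%.
\]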
Additional analyses using external RNA controls
For most assays identified as problematic, one or several ERCs behave differently from the others, a pattern that should be captured by an intensity-based unsupervised analysis such as principal component analysis (PCA)17 or hierarchical cluster analysis (HCA)17. PCA based on tERC signal intensity identified AG1_2_D2, AG1_1_A1 and AG1_3_B3 as outliers, consistent with the PCA plot based on the entire microarray (Fig. 6a). Agilent's Feature Extraction QC Report uses a different algorithm: the fit to the linear portion of the concentration-response curve is performed on a log-log plot after a parameterized sigmoidal curve fit of the data. The R2 correlations and slopes from the AG1 QC Report are shown in Figure 6b. This type of sigmoidal curve fitting ignores the differences seen in the tERCs outside the linear range and identifies a set of outlying assays different from that in the analysis shown in Figure 3a, but matching the assays identified in the PCA analysis (Fig. 6a). Similar results are also observed using HCA (Supplementary Fig. 8 online). These analyses, together with the approaches based on the concentration-response curve (Figs. 2 and 3), demonstrate the value of combining various ERC-specific approaches to enhance the capability of assay assessment.

DISCUSSION
A number of microarray manufacturers use ERCs to assess the technical performance of their gene expression assays. This study investigated the utility of ERCs, with emphasis on cERCs and tERCs, for assay assessment across five commercial microarray platforms, using the MAQC data set10 and a rat toxicogenomics data set11. Several different uses of ERCs for assay assessment were explored. First, the observed ERC signal intensities were examined against the expected concentrations to visually detect potential outlying assays, which tend to deviate from the expected concentration-response trend. Second, the concentration-response curves were modeled to identify potential outlying assays using output variables from linear regression analysis. These two approaches take advantage of the unique characteristic of ERCs spiked across a wide range of concentrations. However, for some platforms, such as Applied Biosystems, ERCs are spiked in at a constant concentration, requiring analysis methods other than concentration-response curve analysis. Thus, PCA and HCA were conducted on the ERC signal intensities, and the ERC-identified outlying data sets were consistent with the analysis results based on the biological whole-microarray data. These approaches are complementary and can be used in conjunction to enhance the discrimination of outlier identification.
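For ERCs spiked at a single concentration, the screen reduces to an unsupervised look at the assay-by-ERC intensity matrix. A minimal PCA sketch (ours, on synthetic data; the study's analyses were run in the tools cited in Methods):

import numpy as np

def pca_scores(X, n_components=2):
    """Project assays (rows) onto their first principal components via SVD."""
    Xc = X - X.mean(axis=0)                  # center each ERC probe column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # assay scores in PC space

rng = np.random.default_rng(3)
X = rng.normal(10, 0.2, size=(60, 15))       # 60 assays x 15 ERC probes (log signals)
X[7] += 1.5                                  # one assay with globally shifted ERC signals
scores = pca_scores(X)
dist = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
print("candidate outliers:", np.where(dist > dist.mean() + 3 * dist.std())[0])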
Our key findings can be summarized as follows. The cERCs exhibit stable and consistent performance across both samples and sites. ABI tERC signal intensities increased, and the biological probe signal intensities decreased, in proportion to increasing amounts of brain RNA in the samples. When the tERC is added to total RNA samples, it is assumed that the tERC transcripts are at different relative proportions to the pool of biological RNA transcripts. As the abundance of mRNA is relatively higher in sample A than in brain (sample B), the median signal of the biological probes was found to be higher in sample A than in sample B, whereas the median tERC signal had the inverse relationship. We further determined that different levels of compression in gene expression exist across commercial platforms, indicating that care must be taken when conducting a cross-platform comparison with respect to making absolute fold-change assessments. And, finally, we determined that quantile-based normalization approaches, such as those used in PLIER, RMA and GCRMA for Affymetrix and for the Agilent One-Color and GE Healthcare platforms, reveal variability in the concentration-response slope estimates. This increase in variability may result from the differences in percentage mRNA between samples A and B. Although the median-normalized signals of the tERCs and cERCs are relatively consistent, their relative ranks within samples A and B are different, and quantile normalization forces the distributions of all data sets to be identical, moving the signals for the tERCs and cERCs away from their original raw expression values.

Figure 5 Illustration of the sample-dependent behavior of tERC signal across the MAQC samples. The ratio of the median tERC signal to the median biological signal is plotted against the percentage of brain RNA in the different samples (0%, 25%, 75% and 100% for A, C, D and B, respectively). In all nine groupings (three sites for each of three platforms), the slope was greater than zero with high correlation coefficients, indicating that the tERC signal intensity is dependent on the abundance of mRNA or biological differences of the samples. Data used in creating this figure, along with the statistical assessment, are summarized in Supplementary Table 4 online.

ERCs added at different steps in the assay offer quality control for different steps of the assay process. cERCs are tolerant to differences in the mRNA abundance of the total RNA samples and provide the advantage of being able to assess assay performance independent of total RNA sample complexity (Fig. 2). A limitation of the cERCs is their inability to detect variability that may occur during target preparation. Because tERCs are added to the assay process at a very early stage, they can reveal failures during sample collection, storage, labeling and amplification, as well as hybridization, scanning and data collection. As poor target quality is a common reason for aberrant assay results, there is value in being able to use tERCs to assess this independently, while using cERCs to differentiate post-labeling sources of variation.
Therefore, these two types of ERCs are most valuable when used in combination. This utility was demonstrated through the analysis of the AFX site 1 data, where the combination of tERC and cERC information helped determine that the sample amplification and labeling yields differed from those of the other sites and underlay the spread in the variability data.
Figure 6 Alternative analyses using ERCs. (a) Principal component analysis (PCA) of the Agilent tERC signal intensities compared with PCA of the Agilent biological signal intensities. The graphs are colored by sample and shaped by site (site 1, triangle; site 2, square; site 3, circle). The same three assays (AG1_1_A1, AG1_2_D2 and AG1_3_B3) are potential outliers based on their shift in both the tERC and the biological signal. (b) Similar to Figure 3a, except that the linear regression data from the parameterized sigmoidal curve fits of the Agilent QC Report concentration-response curves were used to compare R2 correlations (y-axis) and slopes (x-axis). The same three outlying assays identified in the PCA appear as potential outliers in this analysis (circled in red), demonstrating agreement in outlier identification between two fairly different analyses.
Box 1 Recommendations for the implementation of external RNA controls
• One key benefit of external RNA controls (ERCs) is the ability to get a qualitative assessment of assay performance. This benefit will be more fully realized when an extensive set of ERCs is available.
• A comprehensive study is needed for modeling concentration-response behavior based on large data sets to determine the tolerance ranges for linear fit, slope and y-intercept for assay assessment, specifically in the context of false positives and false negatives.
• The development of ERC-specific analysis approaches is encouraged.
• ERCs added at both the total RNA level and the cRNA level are valuable, as they enable failure analysis for different steps of the assay. Using both types of ERCs in the same assay is beneficial for monitoring quality at multiple steps in the process.
Because no single common standard set of external RNA controls using an extended concentration range and a Latin square design is in place for use across platforms in the microarray community, it is not yet possible to run the ideal set of external controls for a study of this nature1. Thus, the intent of this study was to identify key attributes of ERC performance that should be considered for designing better ERCs and associated analysis approaches in the future, which is one of many important ERCC endeavors1. Based on the findings of this study, several points of consideration are summarized in Box 1.

METHODS
MAQC and TGx data sets. Two types of data sets were considered in this study; both were generated within the MAQC project. The difference between the two data sets is the nature of the RNA samples used to generate the gene expression data. The MAQC data set used two calibrated RNA samples (A, Stratagene Universal Human Reference RNA; B, Ambion Human Brain Reference RNA) and their two mixtures (C, 75% A/25% B; D, 25% A/75% B). Applied Biosystems (ABI), Affymetrix GeneChip (AFX) and Agilent One-Color (AG1) data were generated using these four RNA samples. Each platform comprises a total of 60 microarrays: five technical replicates for each of the four samples (A, B, C and D) at each test site (20 microarrays per site), with data from three test sites. In addition, Agilent Two-Color (AGL) data were also generated, but using only samples A and B. For AGL, four sets of assays were conducted with five replicates each: two dye-swap experiments using brain-Cy5/UHRR-Cy3 (sample BA) and UHRR-Cy5/brain-Cy3 (sample AB), along with two types of self-self hybridizations, brain-Cy5/brain-Cy3 (sample BB) and UHRR-Cy5/UHRR-Cy3 (sample AA), resulting in a total of 20 assays.

The toxicogenomics (TGx) data set used RNA samples from rats in a TGx study. The detailed experimental protocol is described elsewhere11. Briefly, six-week-old Big Blue rats were treated with three compounds for 12 weeks and then killed. The compounds were aristolochic acid, a potent nephrotoxin and carcinogen present in plants used in herbal medicines; riddelliine, a carcinogenic pyrrolizidine alkaloid that contaminates various plants; and comfrey, a plant consumed by humans that is a rat liver carcinogen. RNA samples were isolated from the livers of rats treated with the three compounds, along with a liver control. In addition, RNA samples were isolated from the kidneys of rats treated with aristolochic acid, along with a kidney control. Thus, there were a total of six types of rat RNA samples (four from liver and two from kidney), with six biological replicates (rats) for each type. The gene expression data were generated on four microarray platforms: Applied Biosystems (ABI_Rat), Affymetrix GeneChip (AFX_Rat), Agilent One-Color (AG1_Rat) and GE Healthcare CodeLink (GEH_Rat). For each platform, 36 microarrays were generated, six for each of the six groups.

Applied Biosystems external RNA controls. The Applied Biosystems arrays contain a suite of controls (>1,592 control probes) that can be used to check the quality of many aspects of an expression profiling experiment. These controls include
the following: blank features, control ladders, hybridization controls, in vitro transcription (IVT) labeling controls, reverse transcription labeling controls, negative controls, spatial calibration controls and manufacturing quality controls. Among these, we used only the IVT and reverse transcription labeling controls and the hybridization controls, which are spiked at a single fixed concentration. For the hybridization controls, three unlabeled probes are spotted on the microarray: HYB_Control_1_Cp (60 replicates), HYB_Control_2_Cp (60 replicates) and HYB_Control_3_Cp (115 replicates). The hybridization cERCs consist of three digoxigenin-labeled 60-mer oligonucleotide control targets supplied with the chemiluminescence detection kit: HYB_Control_1_Ct, HYB_Control_2_Ct and HYB_Control_3_Ct. The digoxigenin-labeled oligonucleotide targets (cDNA or cRNA) are added to the hybridization mixture; the presence of signal indicates that hybridization occurred, and signal strength indicates hybridization stringency. The IVT controls consist of three synthetic double-stranded cDNAs with a T7 promoter and bacterial control gene sequences: bioB (1,000-nt ds-cDNA), bioC (750-nt ds-cDNA) and bioD (600-nt ds-cDNA). Five probes were used for each of the three bacterial control genes, targeting different regions of each gene, giving 15 probes, each spotted eight times. The reverse transcription controls consist of three synthetic mRNAs with bacterial control gene sequences: lys (1,000-nt mRNA with poly(A) tail), phe (1,400-nt mRNA with poly(A) tail) and dap (1,900-nt mRNA with poly(A) tail). The synthetic mRNAs are added to the reverse transcription reaction with the RNA sample when using the reverse transcription labeling kit or the RT-IVT labeling kit. There are five control probes for each reverse transcription control gene, targeting different regions of the gene, and each probe is spotted eight times, for a total of 120 reverse transcription control probes. More detail on these controls can be found at http://docs.appliedbiosystems.com/pebiodocs/00113259.pdf and http://docs.appliedbiosystems.com/pebiodocs/04338853.pdf.

Affymetrix external RNA controls. ERCs on GeneChip eukaryotic microarrays include poly-A controls (lys, phe, thr and dap) and hybridization controls (bioB, bioC, bioD and cre). The poly-A controls are Bacillus subtilis genes that are modified by the addition of poly-A tails and then cloned into pBluescript vectors. The GeneChip Poly-A RNA Control Kit (P/N 900433) contains a presynthesized mixture of lys, phe, thr and dap. These poly A–tailed sense RNAs can be spiked into isolated RNA samples as controls for the labeling and hybridization processes. The hybridization controls consist of bioB, bioC, bioD and cre. bioB, bioC and bioD represent genes in the biotin synthesis pathway of Escherichia coli; cre is the recombinase gene from the P1 bacteriophage. The GeneChip Eukaryotic Hybridization Control Kit (P/N 900299 and 900362) contains a mixture of biotin-labeled cRNA transcripts of bioB, bioC, bioD and cre. These can be spiked into the hybridization mixture, independent of RNA sample preparation, and used to evaluate sample hybridization efficiency. More detail can be found in the GeneChip Expression Analysis Technical Manual (http://www.affymetrix.com/support/technical/manual/expression_manual.affx) and GeneChip Expression Analysis Data Analysis Fundamentals (http://www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf).

Agilent external RNA controls.
The Agilent One-Color ERC Kit contains a mixture of ten in vitro synthesized, polyadenylated transcripts derived from the Adenovirus E1A gene. These transcripts are premixed at concentrations that span six logs and differ by one-log or half-log increments (Supplementary Table 1 online). The ERC mixture is added to the total RNA, amplified and labeled with Cy3 dye. When the ERCs are used in processing Agilent One-Color microarray assays, the Agilent Feature Extraction (version 8.5) QC Report contains a number of tables and graphs providing information on system performance, including an indication of the linear portion of the dynamic range of the microarray experiment, the high and low detection limits of the experiment and the reproducibility of the controls, with coefficient of variation (CV) percentage calculations across the replicate probes for each of the ten ERCs. For more details, see http://www.chem.agilent.com/scripts/literaturePDF.asp?iWHID=42629. The Agilent Two-Color ERC Kit contains the same ten tERC transcripts used in the Agilent One-Color platform. Each transcript is premixed into two different ERC mixtures at known concentrations such that the ten transcripts are present in mass equivalents extending across 2.3 logs of concentration and represent ratios spanning from 1:10 to 10:1 (Supplementary Table 3 online). These two mixtures are spiked into
either the Cy3 or Cy5 labeling reactions and colabeled with the total RNA. The Agilent Feature Extraction (version 8.5) QC Report contains a number of tables and graphs providing information on system performance, including a measure of the expected versus observed log ratios, which provides an indication of system accuracy, as well as a determination of the reproducibility of the controls, with CV percentage calculations across the replicate probes for each of the ten ERCs. For more details, see http://www.chem.agilent.com/scripts/literaturePDF.asp?iWHID=40485.

GE Healthcare external RNA controls. Each CodeLink Whole Genome bioarray from GE Healthcare contains a set of positive-control probes designed against six E. coli genes. For each of the six bacterial genes there are five unique probe sequences, each represented at 8× redundancy per rat bioarray; there are therefore a total of 240 positive-control probes on each bioarray, which are used to assess microarray quality by reporting dynamic range and sensitivity. Each of the six bacterial transcripts is supplied individually as poly-A(+) mRNA, ranging in size from 1,000 to 1,300 ribonucleotides. These control RNAs can be spiked at different concentrations into the total RNA starting material or labeled individually with biotin and spiked into the cRNA before hybridization. The cRNA spiking method, as used in this study, is the manufacturer's recommendation for independently measuring bioarray quality, because effects due to sample integrity and purity are circumvented. The positive-control poly-A(+) mRNAs supplied with the CodeLink Expression Assay Reagent Kit are araB, entF, fixB, hisB, gnd and leuB. These transcripts are reverse transcribed and amplified individually, incorporating biotin, and arranged in a dilution series from 50 fM to 50 pM in fourfold concentration increments. The final concentrations of the biotinylated spikes in the hybridization solution are araB (51.2 pM), entF (12.8 pM), fixB (3.2 pM), hisB (0.80 pM), gnd (0.20 pM) and leuB (50.0 fM). For more details, see http://www4.amershambiosciences.com/APTRIX/upp00919.nsf/Content/WD%3AExternal+RNA+co%28274354027-B500%29?OpenDocument&hometitle=WebDocs.

Microarray data preprocessing and normalization. Data preprocessing and normalization were performed in ArrayTrack, an FDA microarray data management, analysis and interpretation software package18,19. For Affymetrix GeneChip data, five sets of normalized data were used: PLIER, MAS5, dChip, RMA and GCRMA. Present and Absent calls were generated for each probe set. For the Agilent One-Color microarray, the raw data (gProcessedSignal), median-scaled data and quantile-normalized data were used; negative values and ERCs were not included in the normalization. For the Two-Color microarray, only the dye-normalized log ratio data were used, without any further normalization. For the Applied Biosystems microarray, signal intensity is associated with two measurements, the signal/noise ratio and the detection call (or flag); spots having a ratio >3 and a flag <8,191 were considered Present. For GE Healthcare CodeLink, the raw data and quantile-normalized data were used.

Concentration-response curve analysis. An ERC commonly has multiple replicates placed at different positions on a microarray. In the concentration-response analysis, the ERC signal is the mean intensity over the replicates for AG1 and AGL.
For the Affymetrix, Applied Biosystems and GE Healthcare platforms, an ERC gene consists of multiple probes targeting different regions of the ERC gene; the ERC signal is therefore calculated by first averaging the signals from the different probes of the same gene and then taking the mean over the multiple replicates. The concentration-response curves shown in Figure 2 were generated by plotting the concentration of either the tERC (spiked poly-A molar ratio) or the cERC (spiked concentration in pM) on the x-axis against signal intensity on the y-axis. The amount of cERC added to the hybridization mixture can be expressed in molar terms based on the mass of the cERC transcript added to a specific volume of the hybridization mixture. Determining the final molar amount of tERCs in the final hybridization mixture is more difficult. One method is to express the ERC as a mass fraction of the total RNA used in the experiment, as recommended by the ERCC1. A second method is to use a number of assumptions to determine the poly-A mass ratios. The assumptions used for this paper are that the average percentage of mRNA in total RNA is 2%, the average transcript length is 2,000 bases and the average molecular weight of a single base is 330 g/mol. Using these assumptions and the known lengths of the individual tERCs, the poly-A mass ratios for the different tERCs were calculated. Both the cERC concentrations and the tERC poly-A molar ratios used for analysis are summarized in Supplementary Table 1.
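Applied to a hypothetical spike, the stated assumptions give the poly-A molar ratio directly; the spike mass and length below are placeholders, not values from Supplementary Table 1.

AVG_MRNA_FRACTION = 0.02      # assumed mRNA fraction of total RNA
AVG_TRANSCRIPT_LEN = 2000     # assumed average mRNA length (bases)
BASE_MW = 330.0               # assumed g/mol per base

def poly_a_molar_ratio(spike_mass_ng, spike_len, total_rna_ng):
    """Moles of spiked tERC per mole of endogenous poly-A mRNA."""
    spike_moles = spike_mass_ng * 1e-9 / (spike_len * BASE_MW)
    mrna_moles = (total_rna_ng * AVG_MRNA_FRACTION * 1e-9
                  / (AVG_TRANSCRIPT_LEN * BASE_MW))
    return spike_moles / mrna_moles

# Hypothetical example: 0.01 ng of a 1,000-base tERC in 1,000 ng of total RNA.
print(f"1/{1 / poly_a_molar_ratio(0.01, 1000, 1000):,.0f}")  # -> 1/1,000 poly-A molar ratio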
The linear regression analysis of the concentration-response curve was based on the linear portion of the curve (Fig. 3) and was performed in JMP Genomics (http://www.jmp.com/). All ERCs were used in the analysis for both AFX and GEH_Rat, but only six of the ten tERCs were used for AG1, after removing the top tERC, which lies in the signal saturation range, and the three bottom tERCs, which lie at the noise level. Agilent's Feature Extraction QC Report uses a similar algorithm for the same analysis; in that method, the fit to the linear portion of the concentration-response curve is performed on a log-log plot after a parameterized sigmoidal curve fit of the data.

Note: Supplementary information is available on the Nature Biotechnology website.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA. The FDA has approved this work for publication, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/
1. ERCC. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).
2. ERCC. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
3. Hill, A.A. et al. Evaluation of normalization procedures for oligonucleotide array data based on spiked cRNA controls. Genome Biol. 2, RESEARCH0055 (2001).
4. Rajagopalan, D. A comparison of statistical methods for analysis of high density oligonucleotide array data. Bioinformatics 19, 1469–1476 (2003).
5. Irizarry, R.A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).
6. Irizarry, R.A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003).
7. Freudenberg, J., Boriss, H. & Hasenclever, D. Comparison of preprocessing procedures for oligo-nucleotide micro-arrays by parametric bootstrap simulation of spike-in experiments. Methods Inf. Med. 43, 434–438 (2004).
8. Choe, S.E., Boutros, M., Michelson, A.M., Church, G.M. & Halfon, M.S. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. 6, R16 (2005).
9. Dabney, A.R. & Storey, J.D. A reanalysis of a published Affymetrix GeneChip control dataset. Genome Biol. 7, 401 (2006).
10. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).
11. Guo, L. et al. Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol. 24, 1162–1169 (2006).
12. Shippy, R. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 24, 1123–1131 (2006).
13. Guide to Probe Logarithmic Intensity Error (PLIER) Estimation. Affymetrix Technical Note, http://www.affymetrix.com/support/technical/technotes/plier_technote.pdf.
14. Microarray Suite User's Guide, Version 5.0, http://www.affymetrix.com/support/technical/manuals.affx.
15. Wu, Z., Irizarry, R.A., Gentleman, R., Murillo, F.M. & Spencer, F. A model based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917 (2004).
16. Li, C. & Wong, W. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA 98, 31–36 (2001).
17. Fang, H., Xie, Q., Boneva, R., Fostel, J., Perkins, R. & Tong, W. Gene expression profile exploration of a large dataset on chronic fatigue syndrome. Pharmacogenomics 7, 429–440 (2006).
18. Tong, W. et al. ArrayTrack–supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819–1826 (2003).
19. Tong, W. et al. Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res. 549, 241–253 (2004).
Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project

Tucker A Patterson1, Edward K Lobenhofer2, Stephanie B Fulmer-Smentek3, Patrick J Collins3, Tzu-Ming Chu4, Wenjun Bao4, Hong Fang5, Ernest S Kawasaki6, Janet Hager7, Irina R Tikhonova7, Stephen J Walker8, Liang Zhang9, Patrick Hurban2, Francoise de Longueville10, James C Fuscoe1, Weida Tong1, Leming Shi1 & Russell D Wolfinger4

Microarray-based expression profiling experiments typically use either a one-color or a two-color design to measure mRNA abundance. The validity of each approach has been amply demonstrated. Here we provide a simultaneous comparison of results from one- and two-color labeling designs, using two independent RNA samples from the MicroArray Quality Control (MAQC) project, tested on each of three different microarray platforms. The data were evaluated in terms of reproducibility, specificity, sensitivity and accuracy to determine if the two approaches provide comparable results. For each of the three microarray platforms tested, the results show good agreement with high correlation coefficients and high concordance of differentially expressed gene lists within each platform. Cumulatively, these comparisons indicate that data quality is essentially equivalent between the one- and two-color approaches and strongly suggest that this variable need not be a primary factor in decisions regarding experimental microarray design.

Although microarray technology has now been available for more than ten years1–3, many fundamental questions remain about essentially every aspect of its use, including experimental design, data acquisition, data analysis and data interpretation.
1 National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 2Cogenics, A Division of Clinical Data, 100 Perimeter Park Drive, Suite C, Morrisville, North Carolina 27560, USA. 3Integrated Biology Solutions, Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, California 95052-8059, USA. 4SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513, USA. 5Division of Bioinformatics, Z-Tech Corporation at NCTR/FDA, 3900 NCTR Rd., Jefferson, Arkansas 72079, USA. 6NCI Advanced Technology Center, 8717 Grovemont Circle, Bethesda, Maryland 20892-4605, USA. 7Yale University, W.M. Keck Biotechnology Resource Laboratory, Microarray Resource, 300 George St., Suite 2110, New Haven, Connecticut 06511, USA. 8Department of Physiology & Pharmacology, Wake Forest University School of Medicine, 115 S. Chestnut St., Winston-Salem, North Carolina 27101, USA. 9CapitalBio Corporation, 18 Life Science Parkway, Changping District, Beijing 102206, P.R. China. 10Gene Expression Chips, Eppendorf Array Technologies (EAT), 20, rue du seminaire, 5000 Namur, Belgium. Correspondence should be addressed to T.A.P. ([email protected]).
Published online 8 September 2006; doi:10.1038/nbt1242
One of the first decisions encountered when planning a microarray experiment is whether to use a one-color or two-color approach. A one-color procedure involves the hybridization of a single sample to each microarray after it has been labeled with a single fluorophore (such as phycoerythrin, cyanine-3 (Cy3) or cyanine-5 (Cy5)), whereas in a two-color procedure, two samples (e.g., experimental and control) are labeled with different fluorophores (usually Cy3 and Cy5 dyes) and hybridized together on a single microarray.

There are advantages and disadvantages associated with each experimental approach. Although the two-color design was initially developed to reduce error associated with the variability in microarray manufacturing, the availability of high-quality commercial microarrays has decreased the variability due to microarray production and thereby improved the consistency of microarray results at both the signal and ratio level. In two-color designs, the hybridization of two samples to the same microarray allows a direct comparison, minimizing variability due to processing multiple microarrays per assay. This reduced variability theoretically results in increased sensitivity and accuracy in determining levels of differential expression between sample pairs. More complex hybridization schemes are also an option when using two-color platforms, including hybridization with common reference samples or the use of loop designs4. Although dye-specific biases can substantially affect results when experiments are performed using two-color designs, these biases can be mitigated by performing dye-reversed replicates (dye swaps or fluorophore reversals). Such technical replication adds to experimental costs, but can enhance both accuracy and sensitivity in measuring differential expression.

The primary advantages of one-color designs are experimental design simplicity and flexibility. Hybridization of a single sample per microarray facilitates comparisons across microarrays and between groups of samples. Data inconsistency across assays due to multiple sources of variability, including microarray fabrication and processing, can be reduced for one-color microarrays by performing sufficient biological and technical replicate assays.

Several groups have reported an inability to generate reproducible data across laboratories and across platforms5,6. More recent studies have demonstrated that under properly controlled conditions both inter- and intralaboratory comparisons show relatively good agreement7–10. Although a few recent studies have made one-color to two-color comparisons across different platforms11–14, this manuscript describes a
comprehensive study comparing one-color to two-color designs within the same platform and across multiple test sites. An advantage of this type of comparison is that results can be easily compared within a platform because the same microarray (thus identical probes), sample labeling protocols and detection technologies are used for both the one- and two-color designs. In this study we have used three different microarray platforms with the intent of focusing on the experimental design variable, rather than specific attributes of a given platform. Although comparison across platforms is possible, the purpose of this study is to compare results within and across design schemes for each platform.

Differential expression profiles from a pair of total RNA samples (Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA) were generated using both one-color and two-color assays on different microarray platforms (Agilent, CapitalBio and TeleChem). These data were used to evaluate the reproducibility, specificity and sensitivity of differential expression measurements between one- and two-color experimental designs within each platform. These analyses attempt to answer a fundamental question in microarray assay experimental design: are there significant differences between the results obtained with a one-color approach versus a two-color approach?

RESULTS
All data sets from the three platforms and five test sites (three sites for the Agilent platform and one site each for CapitalBio and TeleChem) were generated using the recommended protocols and methods of the respective manufacturers (including amplification, labeling, hybridization, image analysis and data preprocessing and filtering). The same lots of two distinct RNA samples (Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA) were used
for all data sets. For each of the Agilent and CapitalBio sites, 20 microarrays (10 two-color and 10 one-color) were used. For the TeleChem site, 30 microarrays (20 two-color and 10 one-color) were used. Across all five sites, a total of 110 microarrays were hybridized (60 two-color and 50 one-color), which assayed a total of 170 samples (see the Methods section for additional experimental design details). After data preprocessing and filtering, the numbers of probes used in subsequent analyses for the Agilent, CapitalBio and TeleChem platforms were 19,802, 11,735 and 12,453, respectively.

Reproducibility
To examine reproducibility within platforms, we calculated Pearson correlations on log2-scaled data for all pair-wise combinations of microarrays within a given sample, and then averaged across combinations of specific microarrays to enable different comparisons regarding technical or platform variability. Table 1 presents average intra- and intersite correlations of intensities or ratios within one- and two-color designs for each platform. Scatter plots representing a subset of the comparisons are illustrated in Supplementary Figure 1 online. For the two-color designs, intensity reproducibility was calculated both within and across the two different dyes to assess the impact of the dye on the resulting measurement. For the within-dye calculations, the technical replicates of samples labeled with the same dye across the microarrays were considered, and for the across-dyes calculations, all of the replicates for a given sample when labeled with either dye were evaluated. The ratio results were separated according to whether the values used were calculated from within or across dye-swap configurations. Most of the average correlations are well above 0.9, indicating high reproducibility.
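As a minimal sketch of this calculation (not the consortium's actual code; the array layout and simulated values are hypothetical), the average pairwise Pearson correlation for a set of replicate arrays can be computed as follows:

```python
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr

def average_pairwise_correlation(log2_intensities):
    """Mean and s.d. of Pearson correlations over all pairs of replicate
    microarrays. Rows are probes; columns are replicate hybridizations."""
    n_arrays = log2_intensities.shape[1]
    r_values = [pearsonr(log2_intensities[:, i], log2_intensities[:, j])[0]
                for i, j in combinations(range(n_arrays), 2)]
    return np.mean(r_values), np.std(r_values, ddof=1)

# Simulated example: 1,000 probes measured on 5 replicate arrays
rng = np.random.default_rng(0)
truth = rng.normal(8.0, 2.0, size=(1000, 1))
replicates = truth + rng.normal(0.0, 0.2, size=(1000, 5))
print(average_pairwise_correlation(replicates))
```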
Table 1 Averages and standard deviations of Pearson correlations for both one-color and two-color data from each of the three platforms

Platform                Comparison                           Average one-color       Average two-color
                                                             correlation (s.d.)      correlation (s.d.)
Agilent (three sites)   Intrasite Within Dye/A               0.992 (0.005)           0.990 (0.013)
                        Intrasite Within Dye/B               0.993 (0.004)           0.980 (0.038)
                        Intrasite Within Dye Swap (Ratio)    n/a                     0.980 (0.032)
                        Intrasite Across Dye/A               n/a                     0.984 (0.015)
                        Intrasite Across Dye/B               n/a                     0.977 (0.029)
                        Intrasite Across Dye Swap (Ratio)    n/a                     0.950 (0.019)
                        Intersite Intra Dye/A                0.959 (0.018)           0.982 (0.015)
                        Intersite Intra Dye/B                0.965 (0.015)           0.970 (0.038)
                        Intersite Within Dye Swap (Ratio)    n/a                     0.968 (0.031)
                        Intersite Across Dye/A               n/a                     0.977 (0.016)
                        Intersite Across Dye/B               n/a                     0.966 (0.033)
                        Intersite Across Dye Swap (Ratio)    n/a                     0.950 (0.023)
CapitalBio (one site)   Intrasite Within Dye/A               0.959 (0.010)           0.913 (0.073)
                        Intrasite Within Dye/B               0.975 (0.006)           0.912 (0.078)
                        Intrasite Within Dye Swap (Ratio)    n/a                     0.955 (0.038)
                        Intrasite Across Dye/A               n/a                     0.916 (0.074)
                        Intrasite Across Dye/B               n/a                     0.918 (0.075)
                        Intrasite Across Dye Swap (Ratio)    n/a                     0.950 (0.038)
TeleChem (one site)     Intrasite Within Dye/A               0.931 (0.018)           0.902 (0.042)
                        Intrasite Within Dye/B               0.885 (0.023)           0.910 (0.041)
                        Intrasite Within Dye Swap (Ratio)    n/a                     0.805 (0.072)
                        Intrasite Across Dye/A               n/a                     0.887 (0.032)
                        Intrasite Across Dye/B               n/a                     0.884 (0.043)
                        Intrasite Across Dye Swap (Ratio)    n/a                     0.543 (0.106)

Correlations are computed from log2 normalized intensity values, except for rows containing (Ratio), in which case they are computed from log2 normalized ratios.
[Figure 1 appears here; panels a–i are not reproducible in text form.]
Figure 1 Volcano plots depicting estimated fold change (log2, x-axis) and statistical significance (–log10 P value, y-axis). Columns correspond to results from ANOVA model 1 (one-color intensity), model 2 (two-color intensity) and model 3 (two-color ratio). Rows correspond to manufacturers. (a–c) Agilent. (d–f) CapitalBio. (g–i) TeleChem. Each point represents a gene, and colors correspond to ranges of negative log10 P and log2 fold-change values. Red: 20 < –log10 P < 50 and 3 < log2 fold < 9 or –9 < log2 fold < –3; blue: 10 < –log10 P < 50 and 2 < log2 fold < 3 or –3 < log2 fold < –2; yellow: 4 < –log10 P < 50 and 1 < log2 fold < 2 or –2 < log2 fold < –1; pink: 10 < –log10 P < 20 and 3 < log2 fold or log2 fold < –3; light blue: 4 < –log10 P < 10 and 2 < log2 fold or log2 fold < –2; light green: 2 < –log10 P < 4 and 1 < log2 fold or log2 fold < –1; gray: –log10 P < 2 or log2 fold < 1 and log2 fold > –1.
As expected, the correlations decline when computed across known sources of variability (dye and site). Interestingly, log2 ratios appeared to be slightly less reproducible than log2 intensities for Agilent and TeleChem, but more reproducible for CapitalBio. This result could be driven by a larger microarray-to-microarray variability for CapitalBio, or by the manual channel balancing that was performed while scanning two-color, but not one-color, CapitalBio microarrays. The overall lower correlation values for TeleChem appear to be driven by a nonlinear dye bias (data not shown). The intersite, one-color results for the Agilent sites are presented elsewhere15 and reveal that the Agilent data are very consistent between sites.

To determine if the one-color and two-color designs are revealing the same biology, we compared the reproducibility of the lists of genes identified as differentially expressed by each approach within each platform. Common gene lists were generated comparing the number of differentially expressed genes for one-color and two-color data within each platform (Table 2). Comparisons are given for combinations of two P values (P < 0.05 and P < 0.01) and three fold-change (FC) thresholds (FC > 1.5, FC > 2.0 and FC > 4.0), with differentially expressed genes identified using a one-sample t-test of the sample B to sample A (B/A) ratio
data, including five replicates for each site. Concordances of differentially expressed genes are consistently >80% for all three Agilent sites, regardless of the P-value or fold-change criteria used. Similarly, the CapitalBio concordances are consistently ~70%. The TeleChem concordances are less consistent across P values and fold changes and are generally lower than those for the CapitalBio and Agilent data, which is in agreement with the lower overall correlation values for this platform.
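The concordance percentages reported in Table 2 below are consistent with counting the common genes once per approach and dividing by the combined size of the two lists. A minimal sketch, with hypothetical gene identifiers:

```python
def concordance_percent(one_color_degs, two_color_degs):
    """Common-gene percentage as reported in Table 2: genes called by both
    approaches, counted once per approach, divided by the combined size of
    the two lists, i.e. 2*|A & B| / (|A| + |B|) (the Dice coefficient)."""
    a, b = set(one_color_degs), set(two_color_degs)
    return 100.0 * 2 * len(a & b) / (len(a) + len(b))

# Cross-check against Table 2 (Agilent 1, FC > 1.5, P < 0.05):
# 2 * 11,053 / (13,043 + 12,709) = 85.8%, matching the reported 86%
print(concordance_percent({"G1", "G2", "G3"}, {"G2", "G3", "G4"}))  # 66.7
```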
Table 2 Common gene list results for one- versus two-color microarray data based on differentially expressed genes

                           ------------- P < 0.05 --------------    ------------- P < 0.01 --------------
Test site    Fold change   One color   Two color   Common genesa    One color   Two color   Common genesa
Agilent 1    FC > 1.5      13,043      12,709      11,053 (86%)     11,771      12,506      10,175 (84%)
             FC > 2        9,701       8,812       7,767 (84%)      9,273       8,678       7,467 (83%)
             FC > 4        3,998       3,494       3,055 (82%)      3,979       3,447       3,029 (82%)
Agilent 2    FC > 1.5      13,308      12,345      10,992 (86%)     12,673      11,410      9,940 (83%)
             FC > 2        9,792       8,686       7,712 (83%)      9,526       8,043       7,071 (80%)
             FC > 4        4,077       3,623       3,104 (81%)      4,042       3,261       2,886 (79%)
Agilent 3    FC > 1.5      12,968      12,545      11,192 (88%)     12,537      12,056      10,580 (86%)
             FC > 2        9,363       8,720       7,721 (85%)      9,266       8,373       7,397 (84%)
             FC > 4        3,728       3,596       3,058 (84%)      3,716       3,399       2,987 (84%)
CapitalBio   FC > 1.5      7,344       6,336       5,129 (75%)      6,238       6,098       4,529 (73%)
             FC > 2        5,383       4,154       3,426 (72%)      5,004       4,078       3,203 (71%)
             FC > 4        2,207       1,599       1,283 (67%)      2,081       1,580       1,187 (65%)
TeleChem     FC > 1.5      2,883       3,306       1,491 (48%)      1,079       3,305       760 (35%)
             FC > 2        2,220       1,133       659 (39%)        997         1,133       458 (43%)
             FC > 4        645         178         148 (36%)        475         178         140 (43%)

Values are presented using two different statistical comparisons (P < 0.05 or P < 0.01) and three different fold-change (FC > 1.5, 2 or 4) criteria. aThe values in parentheses represent the percentage of common genes, based on the number of genes identified as differentially expressed in both one- and two-color approaches (counted once per approach) divided by the total number of differentially expressed genes from both approaches combined.
Specificity and sensitivity
In addition to evaluating the reproducibility of the data from the one- and two-color assays, we also considered the sensitivity and specificity. Specificity defines the ability of an assay to determine differences only when they truly exist (that is, the true-negative rate). Sensitivity is the power to detect true differences (that is, the true-positive rate). Both of these measures make a tacit assumption that the truth is binary, which in this case means the mRNA levels derived from a gene are either the same for samples A and B or they are different. The actual truth is that the levels are likely to always differ, but for a substantial fraction of genes the difference is small enough relative to technical noise that the mRNA levels can be considered to be the same. When the binary truth is known, the trade-off between sensitivity and specificity is typically portrayed using a receiver operating characteristic (ROC) plot. However, here the truth is unknown with respect to A versus B gene expression, as is the case with most gene expression profiling experiments. Therefore, relative specificity and sensitivity are compared in terms of distributions of statistical modeling results. By using a P-value criterion to declare genes differentially expressed, the false-positive rate (and hence the specificity) can be controlled at the desired level. The accuracy of this control depends, at least in part, on the standard t-test assumptions, which can be shown to be approximately valid for these data. Once specificity is bounded, the total number of differentially expressed genes can be compared as a measure of sensitivity. To more rigorously assess sensitivity in this fashion, we fit and compared results from three different gene-by-gene ANOVA models (see Methods for details):

Model 1: log2(Intensity) = Mean + Sample + Site + Error
Model 2: log2(Intensity) = Mean + Sample + Dye + Sample*Dye + Site + Microarray + Error
Model 3: log2(Ratio) = Mean + Dye + Site + Error

Model 1 is applied to the one-color data, model 2 is applied directly to the two-color intensity data without forming ratios and model 3 is applied to the ratios formed from the two-color data. Direct modeling of intensities in models 1 and 2 enables a straightforward comparison between results for the one- and two-color data. Furthermore, the results from models 2 and 3 are quite similar, and so model 2 provides a bridge between models 1 and 3 that can be used for comparisons with ratio results that are commonly computed with two-color data.
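To make the modeling concrete, the following sketch fits model 1 to simulated data for a single gene using ordinary least squares. This is a simplification: the study treats Site as a random effect and uses mixed-model software, and all names, effect sizes and the layout below are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated one-color data for one gene: samples A and B, three sites,
# two replicates per sample per site (hypothetical layout)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "sample": np.repeat(["A", "B"], 6),
    "site": np.tile(np.repeat(["S1", "S2", "S3"], 2), 2),
})
effect = {"A": 8.0, "B": 9.5}                  # true log2 means for this gene
df["log2_intensity"] = [effect[s] for s in df["sample"]] + rng.normal(0, 0.3, 12)

# Model 1: log2(Intensity) = Mean + Sample + Site + Error (Site fixed here)
fit = smf.ols("log2_intensity ~ sample + site", data=df).fit()
log2_fc = fit.params["sample[T.B]"]            # estimated log2 fold change (B vs. A)
neg_log10_p = -np.log10(fit.pvalues["sample[T.B]"])
print(log2_fc, neg_log10_p)
```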
Before discussing primary results from these models, it should be noted that there is an imbalance in the number of samples hybridized for the one-color and two-color designs, which improves the sensitivity of the two-color results. More specifically, for each of the Agilent and CapitalBio sites, there are ten one-color microarrays and ten two-color microarrays; hence, there are twice as many samples hybridized on the two-color microarrays. That is, the one-color results effectively have half as much data, as only one sample was hybridized to each microarray. This degree of imbalance is even greater for the TeleChem platform, for which 20 two-color but only 10 one-color hybridizations were processed, resulting in four times as much two-color data. Subsequent results should be interpreted with this in mind.

The three models were fit to the preprocessed Agilent, CapitalBio and TeleChem data and several output summary statistics were collected for each gene. Volcano plots (Fig. 1) compare the estimated log2 fold change (x-axes) against its statistical significance (y-axes). Large numbers of genes are identified as differentially expressed as a result of the analyses of data from all three platforms, as is expected when comparing a brain sample to a tissue pool sample. All of the volcano plots visually have a similar distribution and range for the statistical significance values (y-axes) within each platform, except for model 1 for the TeleChem data (Fig. 1g), which has a substantially smaller range that (as noted above) may be due to differences in the total number of microarrays processed for each approach. For all three platforms there is a tendency for the one-color data to exhibit larger fold changes but smaller significance scores (that is, the volcano plots are shorter and wider for one-color as compared to two-color).

Figure 2 provides a more detailed depiction of the results from models 1, 2 and 3. Estimated log2 fold changes are compared in a scatter plot matrix for one-color intensities (model 1), two-color intensities (model 2) and two-color ratios (model 3) for the Agilent, CapitalBio and TeleChem data. The estimated fold changes are very consistent, especially between the two two-color methods (far right column). The fold changes estimated from the one-color data tend to be larger than those estimated by either model for the two-color data, as indicated by the slopes shown in Figure 2.

The scatter plots in Figure 3 display negative log10 P-value comparisons from the Agilent, CapitalBio and TeleChem data. Larger negative log10 P values mean more significant results. Therefore, when the negative log10 P values from different methods are compared graphically on different axes and the majority of the data points lie above the 45° reference line, it suggests that the method depicted on the y-axis is more sensitive than that depicted on the x-axis (or vice versa if the majority of points lie below the reference line).
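A sensitivity comparison of the kind shown in Figure 3 can be drawn with a few lines of plotting code. This sketch uses simulated –log10 P values rather than the study data, and the axis labels are illustrative only:

```python
import numpy as np
import matplotlib.pyplot as plt

def sensitivity_scatter(neglog10_p_x, neglog10_p_y, label_x, label_y):
    """Scatter of -log10 P values from two methods: points above the 45
    degree line indicate that the y-axis method is the more sensitive."""
    lim = max(neglog10_p_x.max(), neglog10_p_y.max())
    plt.scatter(neglog10_p_x, neglog10_p_y, s=4, c="gray", alpha=0.4)
    plt.plot([0, lim], [0, lim], "r-")         # 45 degree reference line
    plt.xlabel(label_x)
    plt.ylabel(label_y)
    plt.show()

# Simulated -log10 P values; the second method is made slightly more sensitive
rng = np.random.default_rng(2)
p_one = rng.exponential(2.0, 5000)
p_two = np.clip(1.2 * p_one + rng.normal(0.0, 0.3, 5000), 0.0, None)
sensitivity_scatter(p_one, p_two, "One-color intensity (-log10 P)",
                    "Two-color intensity (-log10 P)")
```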
[Figure 2 appears here; panels a–i are not reproducible in text form.]
Figure 2 Comparison of log2 fold-change estimate results from three different modeling approaches for the three different platforms. (a–c) Agilent. (d–f) CapitalBio. (g–i) TeleChem. Columns correspond to log2 fold-change comparisons of one-color intensity versus two-color intensity, one-color intensity versus two-color ratio and two-color intensity versus two-color ratio. Each gray point represents a feature on the microarray. The red lines are 45° reference lines and the contours represent density levels for the points. Statistics for correlation (R) and slope (S) are inset in each graph.
The scatter plots for the Agilent data suggest that the two-color intensity-based analysis (model 2) has more power (sensitivity) than both the one-color intensity-based (model 1) and two-color ratio-based (model 3) analyses. The one-color analysis appears to have slightly more power than the two-color ratio analysis in the lower portion of the significance range, whereas the two-color ratio has more power in the upper range. When Figure 1a,c is also considered, the one-color data tend to exhibit larger fold changes, which explains why more differentially expressed genes were observed for the one-color data (Table 2). Figure 3 (row 1) shows that although the power of these two methods is similar, the relationship between them is nonlinear. For the CapitalBio data in Figure 3, both two-color models produce very similar results, and both appear to have more power (sensitivity) than the one-color intensity analyses. For the TeleChem data, the difference is even more striking. As detailed above, these observed differences may be due to the differences in the amount of data for each approach, as twice as much data were obtained from a two-color assay compared
to a one-color assay. Because of this imbalance in the data, the power comparisons shown here are not a completely fair assessment of the sensitivity of one- versus two-color procedures, although they do help to demonstrate the effectiveness of increasing sample sizes without also increasing the number of microarrays used. For example, from Table 2, when identical thresholds for significance are used, in most instances the two-color ratio data produce fewer differentially expressed genes than the one-color data, which indicates either that one-color platforms are more sensitive in identifying differentially expressed genes or that the fold changes reported by the one-color platform are less compressed than the two-color fold changes. The data modeled here suggest that the latter explanation is more likely.

For two-color experimental designs, specificity can also be addressed by analysis of self-self hybridizations. In experimental designs that include a dye swap, such as this one, systematic errors are reduced by inclusion of the dye-flip control. One can, therefore, assess the false-positive rate
from self-self designs if one half of the self comparisons have the polarity reversed before calculation of significance. This analysis was performed for one of the Agilent test sites, for both pairs of self-self experiments. In this analysis, four of the self-self hybridizations were combined, with two randomly chosen microarrays selected for polarity reversal. For the A sample, 98 of 41,000 genes were detected as significantly differentially expressed (that is, false positives; P < 0.01). For the B sample, 61 of 41,000 genes were detected as significantly differentially expressed (P < 0.01). Both counts are well below the ~410 false positives nominally expected at P < 0.01 for 41,000 genes, indicating that the false-positive rate is controlled conservatively.

To further address the question of which design (one-color or two-color) provides greater sensitivity, we examined correlations of one-color and two-color data for one of the Agilent test sites without any filtering based on detection calls (see Supplementary Fig. 2 online). Fold-change values correlated well between the two approaches across the entire intensity range, indicating that the approaches have similar levels of sensitivity. Furthermore, when thresholds for differential expression were applied (P < 0.01 and FC > 1.5), there was a 69% overlap of the genes identified by both approaches. Each approach uniquely identified 13–18% of the total number of differentially expressed genes and only a very small subset of the genes were found to be anticorrelated (18, or 0.09%).

Accuracy
Whereas specificity and sensitivity refer to a binary version of the truth, a more direct assessment of the accuracy of the platforms can be obtained when the truth is quantitative. Again, the true quantitative differences between the mRNA levels of samples A and B for each gene are unknown, but a well-accepted surrogate can be obtained from orthogonal quantitative technologies (e.g., TaqMan assays). As detailed above, when data from one of the Agilent test sites were analyzed, ~31% of the total number of differentially expressed genes detected by one approach was not also identified by the other. To discern whether these discordant data points are false positives on one or another of the approaches, we compared both to results generated using TaqMan assays. Genes were selected for measurement in these samples by TaqMan assays as part of the main MAQC study15. Most of the genes assayed by TaqMan were randomly selected from a set of RefSeq genes that were common to four commercial microarray platforms (Affymetrix, Agilent, GE Healthcare and Illumina). More details on the selection of these genes can be found elsewhere16.

Figure 4 illustrates the comparison of the one-color, two-color and TaqMan assay data, and is colored based on the significance (P < 0.01 and FC > 1.5) of the ratio between the B and A samples for the three different platforms (one-color, two-color and TaqMan assays). Data shown represent either all probes with TaqMan mapped data (Fig. 4a, N = 906) or only probes that were mapped as persistently detected in Agilent one- and two-color experiments (filtered as described in Methods) and detected in at least three of four replicates for both samples in the TaqMan assay data (Fig. 4b, N = 519). The results show a good overall correlation between the TaqMan assay data and both the one-color and two-color data. The 18 probes that were anticorrelated between one- and two-color data were not in the subset of genes assessed with TaqMan assays in this study. However, for those genes identified as discordant between the Agilent one- and two-color data, some were verified with TaqMan assays for each platform. A slightly higher percentage of probes found to be significant for only the two-color design were verified with TaqMan assays (51 of 85 or 60% for one-color, versus 39 of 55 or 71% for two-color; Fig. 4a), thereby indicating that both approaches have similar levels of accuracy.
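The self-self false-positive check described earlier in this subsection can be sketched as follows, using simulated null data rather than the actual Agilent self-self hybridizations (the observed counts of 98 and 61 cannot be reproduced from this simulation):

```python
import numpy as np
from scipy import stats

def self_self_false_positives(log_ratios, alpha=0.01):
    """Reverse the polarity of half the self-self arrays, then test each
    gene's mean log ratio against zero with a one-sample t-test and count
    genes passing the significance cutoff (empirical false positives)."""
    flipped = log_ratios.copy()
    flipped[:, ::2] *= -1.0                    # polarity reversal for half the arrays
    t, p = stats.ttest_1samp(flipped, 0.0, axis=1)
    return int((p < alpha).sum())

# Simulated null data: 41,000 genes on 4 self-self arrays (noise only);
# roughly 41,000 * 0.01 = 410 false positives are expected at alpha = 0.01
rng = np.random.default_rng(3)
null_ratios = rng.normal(0.0, 0.25, size=(41000, 4))
print(self_self_false_positives(null_ratios))
```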
DISCUSSION
Every aspect of microarray experimentation, including RNA isolation and purification, labeling and amplification, microarray fabrication, hybridization, data acquisition, analysis and statistical methods, has
seen major advancements in the last several years. With the variety of platform choices available that have benefited from these advancements, a natural question arises regarding the characteristics of data generated from one-color and two-color assays. The results presented here describe a comprehensive study comparing one-color to two-color assays within three different platforms, and across multiple test sites for one of the platforms, using two distinct RNA samples. Differential expression data from a pair of total RNA samples (Stratagene Universal Human Reference total RNA and Ambion Human Brain Reference total RNA) were generated using both one-color and two-color assays on different microarray platforms (Agilent, CapitalBio and TeleChem) and used to evaluate the relative reproducibility, specificity, sensitivity and accuracy of the two approaches.

One of the strengths of this analysis is that comparison of the one-color and two-color assays is not dependent on interplatform analysis, thus avoiding many of the complications inherent to such a comparison (including probe sequence issues as well as target labeling and detection technology differences). In addition, the filtered gene lists used for the analysis presented here are consistent between the two different design schemes on each platform, but differ between the platforms, which would further complicate interplatform comparisons.

Overall, the results from the one- and two-color assays compare well, which aligns with expectations generated by numerous independent successes of one- and two-color microarray applications. Here we provide a statistical validation of this expectation. Reproducibility between the one-color and two-color assays is quite similar for each platform, as demonstrated by the consistency of Pearson correlation values. When ratios are generated from the two distinct RNA samples, the differentially expressed gene lists are highly consistent across one- and two-color data when using widely accepted P-value and fold-change thresholds for significance. Just as important, the stability of the differentially expressed gene lists is consistent within individual platforms. Correlation coefficients in Table 1 are higher for the Agilent data, leading to greater overall concordance, but for all the platforms the one-color data and two-color data are comparable when assessing concordance using differentially expressed gene lists.

Three ANOVA models are defined to provide a statistical framework for comparison of relative intraplatform specificity and sensitivity. Model 1 applies to one-color log2 intensities and model 3 to two-color log2 ratios. Model 2 handles two-color log2 intensities, and serves as a bridge between models 1 and 3. The use of these models avoids the problem of arbitrarily defining ratios for the one-color data, and enables adjustment for all known sources of variability. In addition, model 2 is shown to have slightly more sensitivity than model 3 for the Agilent data. Modeling two-color intensities directly, as in model 2, is not common practice, but offers several advantages, including the ability to study sample-dye interactions.

Overall, the relative specificity and sensitivity of the three platforms as determined by the three models is very similar between one- and two-color assays within each platform (Figs. 1–3). The results suggest that the two-color assays have a slight advantage with regard to power (sensitivity) and the detection of small fold changes (Figs.
1 and 3), especially when considering an equal number of microarrays. The one-color data do appear to be less compressed than the two-color data, as indicated by the slopes shown in Figures 2 and 4, which should be considered when using filtering rules that apply directly to estimated fold changes.

In addressing the accuracy of the one-color and two-color assays using data from the Agilent platform, the results also show a good overall correlation with the TaqMan assay data. In some cases the TaqMan assay data have better agreement with the one-color data and in others the TaqMan assay data have better agreement with the two-color data. In
many cases the differential expression results were consistent in direction between the one-color and two-color assays, but failed to meet the applied fold-change or significance criteria. In those cases when genes are reported as significantly differentially expressed by TaqMan assays, but not by either the one-color or two-color microarray assays, the differences may be attributable to the fact that the technologies are targeting and measuring different regions of a particular gene and/or splice variant. Also, most of the genes reported as significantly differentially expressed in the TaqMan assay, but not in the microarray data, are below the detection level of the microarray assay (Fig. 4) and may be indicative of the higher sensitivity of the PCR-based method. Finally, the significance of the microarray and TaqMan assays is not directly comparable, as a different level of replication was undertaken for the TaqMan assay data16.

In summary, by presenting the experimental design and performance advantages of both modes, researchers are now provided insight and guidance for properly selecting the best approach (one- or two-color) to meet their research needs. When assessing the reproducibility of the biology across the two approaches by comparing the concordance of differentially expressed gene lists, performance was approximately equal (Table 2 and Fig. 4). Cumulatively, these results indicate that data generated from both one- and two-color assays are approximately equivalent and provide similar levels of biological insight. It should be noted that these results may not apply to microarray platforms for which manufacturing variability is high (such as may occur with some suboptimal, in-house, robotically spotted arrays with poor quality control). All microarrays used in this study were obtained and processed at approximately the same time. Although in all three platforms multiple manufacturing lots of microarrays were used, no effort was made to control which manufacturing lots were grouped together in the study. Hence, the magnitude of the variance of the one-color and two-color results may differ from those presented here if the data were specifically generated and assessed as individual groups across multiple manufacturing lots. In essence, the variability due to manufacturing lot has not been addressed in this study, since the array populations for each platform were heterogeneous in terms of batch specificity.
[Figure 3 appears here; panels a–i are not reproducible in text form.]
Figure 3 Comparison of negative log10 P-value estimate results from three different modeling approaches for the three different platforms. (a–c) Agilent. (d–f) CapitalBio. (g–i) TeleChem. Columns correspond to negative log10 P-value estimates of one-color intensity versus two-color intensity, one-color intensity versus two-color ratio and two-color intensity versus two-color ratio. Each gray point represents a feature on the microarray. The red lines are 45° reference lines and the contours represent density levels for the points. Statistics for correlation (R) and slope (S) are inset in each graph.
[Figure 4 appears here; panels a and b are not reproducible in text form.]
Figure 4 Comparison of Agilent one-color and two-color data with TaqMan assay data. The figure illustrates the comparison of the one-color, two-color and TaqMan assay data, and is colored based on the significance of the ratio between B and A samples for the three different sets of data as illustrated. Significance was based on a P < 0.01 and a 1.5 fold change. Data shown represent either of two possibilities. (a) All probes with TaqMan mapped data (N = 906). (b) Only probes that were mapped as persistently detected in Agilent one- and two-color experiments (filtering as described in Methods) and that were detected in at least three of four replicates for both samples in the TaqMan assay data (N = 519). The numbers in gray refer to the number of genes that are not detected as significantly differentially expressed (based on given FC and P-value criteria) by any of the three assays. Lines shown represent the orthogonal fit to the data with slope (m) and correlation (R) as shown in the inset.
Ultimately, the decision to use either a one-color or two-color approach will be determined by cost, experimental design considerations and personal preference.

METHODS
Hybridization. Three independent test sites were used for the Agilent platform and one test site each was used for the CapitalBio and TeleChem platforms (five total test sites). All test sites received the same lot numbers of two different total RNA samples (Stratagene Universal Human Reference total RNA (SUHRR, sample A) and Ambion Human Brain Reference total RNA (AHBRR, sample B)). The hybridization-dye pairings and RNA descriptions were as follows: two-color hybridization: a, SUHRR-Cy3 versus SUHRR-Cy5; b, AHBRR-Cy3 versus AHBRR-Cy5; c, SUHRR-Cy3 versus AHBRR-Cy5; d, AHBRR-Cy3 versus SUHRR-Cy5; one-color hybridization: e, SUHRR-Cy3; f, AHBRR-Cy3.

The two-color self-self hybridizations (codes a and b) provide information about the reproducibility and specificity of the two-color hybridizations, but are not used for the majority of the analyses described in this paper, because of space constraints and to more evenly balance the comparisons between the one- and two-color results within a platform. However, they are included in the available data set. For each of the Agilent and CapitalBio sites, 5 microarrays were used for each of the RNA codes c, d, e and f, for a total of 20 microarrays (10 two-color and 10 one-color) at each of these sites. For the TeleChem site, 10 microarrays were used for RNA codes c and d, and 5 microarrays for codes e and f, for a total of 30 microarrays (20 two-color and 10 one-color). Across all five sites, a total of 110 microarrays were hybridized (60 two-color and 50 one-color), which assayed a total of 170 samples; a tally is sketched below.
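The array and sample totals quoted above follow directly from the per-site design (site labels below are hypothetical shorthand, not identifiers used in the study):

```python
# Tally of the hybridization design: codes c and d are two-color arrays
# carrying 2 samples each; codes e and f are one-color arrays with 1 sample.
sites = {
    "Agilent_1":  {"two_color": 10, "one_color": 10},
    "Agilent_2":  {"two_color": 10, "one_color": 10},
    "Agilent_3":  {"two_color": 10, "one_color": 10},
    "CapitalBio": {"two_color": 10, "one_color": 10},
    "TeleChem":   {"two_color": 20, "one_color": 10},
}
arrays = sum(s["two_color"] + s["one_color"] for s in sites.values())
samples = sum(2 * s["two_color"] + s["one_color"] for s in sites.values())
print(arrays, samples)  # 110 microarrays, 170 hybridized samples
```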
RNA quantification and purity assessment. RNA samples were quantified using a NanoDrop ND-1000 UV-VIS spectrophotometer. Each test site performed three replicate measurements for each sample using 1.5 µl and reported the values as average ± s.d.

RNA intactness assessment. SUHRR and AHBRR (200 ng) were run on the Agilent Bioanalyzer 2100 in triplicate (all samples on one chip) by each test site. The rRNA ratio (28S/18S) and RNA Integrity Numbers (RIN) are reported as average ± s.d. Acceptable values were defined as: A260/A280 ratio in the range of 1.8–2.2, rRNA ratio (28S/18S) > 0.9 and RIN value > 8.0.

Labeling and hybridizations on the Agilent platform. Five hundred nanograms of total RNA was converted into labeled cRNA with nucleotides coupled to a fluorescent dye (either Cy3 or Cy5) using the Low RNA Input Fluorescent Linear Amplification Kit (version 4.0 protocol) (Agilent Technologies). The quality and quantity of the resulting labeled cRNA were assessed using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies) and an Agilent 2100 Bioanalyzer. Individually labeled cRNAs were not pooled before hybridization. Equal amounts of Cy3- and Cy5-labeled cRNA (1.5 µg) from two different samples (for the two-color protocol), or a single Cy3-labeled cRNA (for the one-color protocol), were hybridized (see hybridization configurations above) to Agilent Human Whole Genome Oligo Microarrays (G4112A) for 17 h at 65 °C. The hybridized microarrays were then washed using the manufacturer's recommended conditions and scanned using an Agilent G2565BA scanner. Data were extracted from the scanned image using Agilent Technologies' Feature Extraction software version 8.5 (FE8.5). All data columns present in the extracted data files are described in detail in the Agilent G2567AA FE8.5 Software Reference Guide (http://www.chem.agilent.com/scripts/LiteraturePDF.asp?iWHID=41954).
Labeling and hybridizations on the CapitalBio platform. The human genome-wide long oligonucleotide microarray was constructed in-house at CapitalBio Corporation. Briefly, 5′-amino-modified 70-mer probes representing 21,329 H. sapiens genes from the Human Genome Oligo Set Version 2.1 (Qiagen), together with internal and external controls, were printed on amino-silanized glass slides using a SmartArray microarrayer (CapitalBio Corp.). Fluorescently labeled DNA (Cy3- and Cy5-dCTP) was produced through Eberwine's linear RNA amplification method and a subsequent enzymatic reaction. This procedure has been previously described in detail17. Briefly, double-stranded cDNA containing the T7 RNA polymerase promoter sequence was synthesized from 5 µg of total RNA using the Reverse Transcription System, RNase H, DNA polymerase I and T4 DNA polymerase, according to the manufacturer's recommended protocol (Promega). The resulting labeled DNA (labeled control and test samples) was quantitatively adjusted based on the efficiency of Cy-dye incorporation and mixed into 80 µl of hybridization solution (3× SSC, 0.2% SDS, 25% formamide and 5× Denhardt's). Individually labeled cRNAs were not pooled before hybridization. Hybridization on a microarray (see hybridization configurations above) was performed under a LifterSlip (Erie Company). The hybridization chamber was laid on a three-phase Tiling Agitator (CapitalBio Corp.) to facilitate the microfluidic circulation under the coverslip. The microarray was hybridized at 42 °C overnight and washed with two consecutive washing solutions (0.2% SDS, 2× SSC at 42 °C for 5 min, and 0.2% SSC for 5 min at 22 °C) before scanning with a confocal LuxScan scanner (CapitalBio Corp.). For two-color microarrays, the scanning settings for the Cy3 and Cy5 channels were manually balanced by visual inspection of the external control spots. The data from the obtained images were extracted with SpotData software (CapitalBio Corp.).

Labeling and hybridizations on the TeleChem H25K platform. Two micrograms of each sample was amplified using a Genisphere SenseAmp Plus Amplification kit (generating amplified poly A–tailed senseRNA), according to the manufacturer's recommended protocol. The resulting tailed senseRNA was reverse transcribed with amino-allyl indirect labeling using a SuperScript Indirect cDNA Labeling Kit (Invitrogen) with slight modifications. Each first-strand cDNA generation reaction used 5 µg of senseRNA with Superscript II and aa-dUTP at 42 °C for 2 h. cDNA was purified using a MinElute PCR Purification Kit and conjugated with a mono-functional Cy3 or Cy5 dye aliquot (GE Healthcare) for 1 h at 22 °C in the dark. Dye-conjugated cDNA was purified with a MinElute PCR Purification Kit. Dye:base labeling efficiency was determined at this point for all dye-conjugated cDNA. Hybridization was done manually in TeleChem Hybridization cassettes using LifterSlips (Erie Company). cDNAs were labeled independently and not pooled before hybridization. In one-color experiments, Cy3-labeled cDNA samples were denatured independently and one sample was applied to each microarray. For two-color experiments, Cy3- and Cy5-paired cDNA samples were combined and denatured before being applied to individual microarrays (see hybridization configurations above). Hybridization mixes (55 µl total volume) consisted of 38.5 µl labeled cDNA, 5.5 µl 2% SDS, 7.0 µl 20× SSC, 3.0 µl poly dA (5 µg/µl) and 1.0 µl Cot-1 DNA (1 µg/µl).
Hybridization cassettes and slides were preheated to 55 °C before samples were added, and 3× SSC was added to the humidity grooves in the cassette. Samples were applied to the microarrays and hybridized for 16 h at 55 °C in a water bath. After hybridization, slides were washed (10 min, 2× SSC/0.1% SDS at 42 °C; 10 min, 0.2× SSC/0.1% SDS at 42 °C; 10 min, 0.2× SSC at 22 °C, twice) before centrifugation in 50-ml conical tubes at 201g for 5 min to dry. Scanning was performed on Axon 4200A or 4200B instruments at a PMT setting yielding 1% or fewer saturated spots.

Agilent data preprocessing, normalization and filtering. For one-color experiments, gProcessedSignal values from Agilent's Feature Extraction software were used as input into experimental analyses. This ProcessedSignal is generated after background subtraction and includes correction for multiplicative surface trends. Features were marked as Absent (A) when the processed signal intensity was less than twofold the value of the processed signal error (these features were transformed by setting their processed intensity value to that of the processed signal error value). Features were marked as Marginal (M) when the measured intensity was at a saturated value or if there was a substantial amount of variation in the signal intensity within the pixels of a particular feature. Features not considered Absent or Marginal were marked Present (P).
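A hypothetical re-implementation of these flagging rules for a single feature might look like the following (the actual Feature Extraction criteria for saturation and pixel variation are more involved than the boolean inputs used here):

```python
def detection_call(signal, signal_error, saturated, high_pixel_variation):
    """Sketch of the flagging rules described above for one feature;
    returns the (possibly floored) signal and a detection call."""
    if signal < 2.0 * signal_error:
        return signal_error, "A"   # Absent: signal floored to the error value
    if saturated or high_pixel_variation:
        return signal, "M"         # Marginal
    return signal, "P"             # Present

print(detection_call(50.0, 40.0, False, False))   # (40.0, 'A')
print(detection_call(500.0, 40.0, True, False))   # (500.0, 'M')
print(detection_call(500.0, 40.0, False, False))  # (500.0, 'P')
```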
For the two-color microarrays, raw data signals were preprocessed in a similar fashion as those for one-color microarrays, but did not include a surface-trend correction and did include additional preprocessing to adjust for possible dye bias within a microarray. Data used in the two-color analyses were either the red and green ProcessedSignal or LogRatio values from Agilent's Feature Extraction software. Dye normalization of two-color Agilent microarrays includes both linear scaling and Lowess normalization to a rank-invariant set of microarray features. For some of the analyses (see Table 2, Fig. 4 and Supplementary Fig. 2), LogRatio values, which are calculated from the ProcessedSignals by Agilent's Feature Extraction software, were used. When LogRatio was used for the two-color data, the sign of LogRatio was changed for half of the RNA comparisons to accommodate the dye swap.

Generation of a filtered feature list for Agilent one- and two-color data was conducted as follows, with a sketch of the corresponding computation given below: (i) Agilent flagging rules were applied, setting all Absent and Marginal features to missing. (ii) To derive a reliable common gene set across both one- and two-color data, features with fewer than 50% Present calls across all microarrays were filtered. (iii) Features with fewer than five Present calls from each sample group (A or B) across sites for one-color data, or fewer than five Present calls across sites for two-color data, were also filtered. (iv) This filtering results in 19,802 genes in the final common gene set that was used for much of the statistical analysis presented, from a total of 41,000 non-control probes on the microarray. For the analyses presented in Figure 4 and Supplementary Figure 2, all 41,000 non-control probes were included.

Further details on the data processing steps used to generate the Agilent one- and two-color output columns can be found in the Agilent G2567AA FE8.5 Software Reference Guide (http://www.chem.agilent.com/scripts/LiteraturePDF.asp?iWHID=41954).

Data were median normalized for the statistical analyses in Figures 1–3 and Supplementary Figure 1 through JMP Genomics software (http://www.jmp.com/). For the remainder of the analyses, normalization of the Agilent one-color data was performed in GeneSpring GX as follows: (i) Values below 5.0 were set to 5.0. (ii) Each measurement was divided by the 50th percentile of all measurements in that sample, with the percentile calculated using only genes marked Present. For the analyses presented in Figure 4 and Supplementary Figure 2 only, specific samples were normalized to one another: all samples were normalized against the median of the control (A) samples, with each measurement for each gene divided by the median of that gene's measurements in the corresponding control samples.
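A simplified, per-array sketch of the filtering and one-color normalization steps, using simulated data (the GeneSpring implementation and the per-sample-group Present-call counts of step iii are not reproduced here):

```python
import numpy as np

def filter_features(present_mask, min_fraction=0.5):
    """Step ii of the filtering scheme: keep features flagged Present on at
    least `min_fraction` of the microarrays."""
    return present_mask.mean(axis=1) >= min_fraction

def percentile_normalize(intensities, present_mask):
    """One-color normalization sketch: floor values at 5.0, then divide each
    array (column) by the 50th percentile of its Present measurements."""
    x = np.maximum(intensities, 5.0)
    for j in range(x.shape[1]):
        x[:, j] /= np.percentile(x[present_mask[:, j], j], 50)
    return x

rng = np.random.default_rng(4)
raw = rng.lognormal(5.0, 1.0, size=(1000, 10))
present = rng.random((1000, 10)) > 0.2
normalized = percentile_normalize(raw, present)
print(filter_features(present).sum(), "features retained")
```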
For two-color data, an additional linear Lowess normalization was applied to the background-subtracted data. This was performed by scaling each channel to a median intensity of 100 and then normalized. For one-color data, each microarray was scaled to a median intensity of 1,000. TeleChem data preprocessing, normalization and filtering. All one-color and two-color images were analyzed using Axon GenePix Pro 5.0 software, and raw data were provided in the form of one tab-delimited text (.gpr) file per microarray. Features automatically marked as Absent (A) had a numerical value of –75 and corresponded to features in the Axon (.gal) file that show ID ‘empty’. Features marked Not Found (NF) had a numerical value of –50 and were defined as features with less than 6 pixels, or the feature diameter was greater than the lesser of three nominal diameters set in Block Properties of the (.gal) file, or the diameter that would cause it to overlap an adjacent feature of nominal diameter, or the feature was found at a position that would overlap an adjacent feature. Features marked Bad (B) had a numerical value of –100 and were defined by visual inspection during spot finding as having major noise associated with either the spot or background signal. All probes with a value less than 0 on at least one microarray were removed across all microarrays. Features marked Present (P) had
a numerical value of 0 and were considered acceptable for further analysis. The common filtered genes between one- and two-color microarrays were retained. The subset was based on the list of 12,453 genes from a total of 27,648 spots (including controls) on the microarray. Analysis was based on intensity values: (F532_Median)−B532 intensity was used for one- and two-color data, in addition to (F635_Median)−B635 for two-color data. Lowess normalization of intensities was applied within individual two-color microarrays, and median normalization was applied across the microarrays.

The aforementioned preprocessing and normalization methods for all three platforms followed manufacturers' recommendations to reflect what will most likely occur in common practice. The methods differ somewhat across platforms but are consistent within platforms in order to make intraplatform one- and two-color comparisons fair. The primary difference in normalization techniques between the three platforms is found in the TeleChem two-color data: for all the other data analyses, median scaling was applied to the data before Lowess normalization, whereas with the TeleChem two-color data this order was reversed. To compare all the data using the same normalization workflow, we applied median scaling to the TeleChem two-color data before Lowess normalization and compared the result to the original normalization process (Lowess before median scaling). This comparison is shown in Supplementary Figure 3 online. These additional data confirm that the minor differences in normalization procedure have very little impact on the data.

Outlier assays. For the Agilent data set, microarrays identified as outliers based on single-microarray quality metrics (AG1_1_A1, AG1_2_A3, AG1_3_B3, AGL_1_B5, AGL_1_D1, AGL_2_A1, AGL_2_C4) were not removed for the majority of the analysis presented here. The analysis presented in Figure 4 and Supplementary Figure 2 did exclude outlier microarrays.

Generation of common differentially expressed gene lists. Data used for the generation of the common differentially expressed gene lists (Table 2) were from the genes that passed the data preprocessing and filtering criteria for each platform and included 19,802 genes for Agilent, 11,735 genes for CapitalBio and 12,453 genes for TeleChem. Data normalization for the Agilent data was performed as described above. For both CapitalBio and TeleChem, ArrayTrack20 median scaling was used for one-color data and Linear & Lowess for two-color data (default median target intensity = 1,000). Significant differentially expressed genes were identified with a one-sample t-test of the log2(B/A) ratios of five replicates, testing whether the mean differs from 0; a sketch of this calculation is given below. For two-color data, the dye-swap results were averaged before performing the t-test. For both one-color and two-color data, all combinations of P values of 0.05 and 0.01 and fold changes of 1.5, 2.0 and 4.0 were used to determine the percentage of common differentially expressed genes. The percentage of common genes was calculated by dividing the number of common genes identified as differentially expressed in both one- and two-color approaches by the total number of differentially expressed genes from both approaches combined. The common manufacturer ID was used to identify the common genes from the gene lists.
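A sketch of this differential-expression call on simulated log2(B/A) ratios (dye-swap pairs are assumed to have been sign-corrected and averaged already, as described above):

```python
import numpy as np
from scipy import stats

def call_degs(log2_ratios, p_cut=0.05, fc_cut=1.5):
    """One-sample t-test of replicate log2(B/A) ratios against 0, combined
    with a fold-change threshold, as used for the Table 2 gene lists."""
    t, p = stats.ttest_1samp(log2_ratios, 0.0, axis=1)
    mean_log2_fc = log2_ratios.mean(axis=1)
    return (p < p_cut) & (np.abs(mean_log2_fc) > np.log2(fc_cut))

# Simulated data: 10,000 genes x 5 replicate ratios; 500 genes get a 4-fold change
rng = np.random.default_rng(6)
ratios = rng.normal(0.0, 0.2, size=(10000, 5))
ratios[:500] += 2.0
print(call_degs(ratios, p_cut=0.01, fc_cut=2.0).sum())
```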
ANOVA models. Several analyses are based on fitting three different models to the preprocessed and normalized data:

Model 1: log2(Intensity) = Mean + Sample + Site + Error
Model 2: log2(Intensity) = Mean + Sample + Dye + Sample*Dye + Site + Microarray + Error
Model 3: log2(Ratio) = Mean + Dye + Site + Error

Separate models are fitted to the data from each feature within each platform. Model 1 is used for the one-color data, and models 2 and 3 are used for the two-color data. In these models, Intensity refers to a particular intensity value for one gene; Ratio refers to a particular ratio value for one gene; Mean indicates an overall mean value, which corresponds to mean log2(Intensity) for models 1 and 2 and mean log2(Ratio) for model 3; Sample indicates whether the intensity measurement is from sample A or B (this term is not needed in model 3 because ratios between A and B are being modeled); Site indicates the test site (included for Agilent data only, because the CapitalBio and TeleChem data came from a single site); Dye indicates the dye effect in model 2 and the dye-swap configuration in model 3; Sample*Dye refers to an interaction effect between samples and dyes; Microarray indicates the microarray from which the data were measured; Error indicates
random error, which is assumed to be normally distributed with mean zero and a gene-specific variance. Along with the Error term, the Site and Microarray effects are also assumed to be normally distributed with mean zero and constant variance. This arises from the assumption that the effects of Site and Microarray are drawn from a normal population. They are so-called random effects, and estimates of their variances are known as variance components. All other effects are assumed to be fixed; that is, they have a finite number of levels, the mean value of which is estimated during the model fitting process. Model 2 is the most complex of the three models but is easily fitted to two-color data using standard mixed-models software. The random Microarray effect is critical, as it models the correlation between pairs of intensities observed on the same microarray. This model enables a more refined analysis of two-color results than model 3 by including estimates of the overall mean intensity and the Sample*Dye interaction. Model 2 and its variants have been used successfully for the past five years in a variety of microarray applications21–23. For each feature on each platform, an estimate of the log2 fold change between samples A and B is computed in models 1 and 2 as the difference between the two levels of the estimated Sample effect. The ANOVA model output also includes a standard error and degrees of freedom for this difference, from which a –log10 P value is computed using a t-distribution. For model 3, the estimate of the Mean effect represents the estimated log2-scaled fold change (B/A), because the ratios were computed by dividing the B intensity by the A intensity; a –log10 P value is computed in the same way as in models 1 and 2. Statistical results for all three models are based on mixed-model theory21–24.

Comparison of Agilent one-color and two-color data with TaqMan assay data. One-color data were normalized in Agilent GeneSpring GX as described above, including the normalization of specific samples to each other (Fig. 4). Two-color data were analyzed using the following scheme. The processed signal data from the Agilent Feature Extraction software were loaded into Agilent's GeneSpring GX software. To account for dye swap, we reversed the signal channel and control channel measurements for all dye-swap microarrays. Each gene's measured intensity was divided by its control channel value in each sample. TaqMan assay data were generated as part of the MAQC study as described elsewhere16. TaqMan assay data were imported into Agilent's GeneSpring GX from the data file provided by the MAQC after splitting it into individual files for each sample. For the TaqMan assay comparisons, the mapping from the final 12,091 genes was used for cross comparison between the Agilent probes and TaqMan assays15. The processed ('intensity like') TaqMan assay data were imported into GeneSpring GX based on the mapping, and ratios were calculated as follows: each measurement for each gene in those specific samples was divided by the median of that gene's measurements in the corresponding control (A, SUHRR) samples. P values were calculated for the Agilent and TaqMan assay data using a one-sample t-test with the appropriate number of replicates (four or five for the microarray assays, depending on the comparison, and four for the TaqMan assays), with the mean intensity value (as calculated above) compared to 1.
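As a rough illustration of how model 2 can be fitted with standard mixed-models software, here is a sketch using Python's statsmodels. It treats Microarray as a random intercept and, for simplicity, Site as a fixed effect (the text above treats both as random), and its P values are asymptotic rather than t-based, so it only approximates the analysis described here; the column names are assumptions.

```python
import numpy as np
import statsmodels.formula.api as smf

def fit_model2(df):
    """Approximate ANOVA model 2 for one gene's two-color data.

    `df` is assumed to have one row per measurement, with columns:
    log2_intensity, sample ('A'/'B'), dye, site, microarray.
    Microarray enters as a random intercept; Site is fitted as fixed
    here, a simplification of the model described in the text.
    """
    model = smf.mixedlm("log2_intensity ~ sample * dye + site",
                        data=df, groups=df["microarray"])
    result = model.fit()
    log2_fc = result.params["sample[T.B]"]   # estimated log2(B/A)
    p = result.pvalues["sample[T.B]"]        # asymptotic, not t-based
    return log2_fc, -np.log10(p)
```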
Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
The authors thank the MicroArray Quality Control (MAQC) consortium for generating the large data sets used in this study. E.K.L. and P.H. acknowledge the Advanced Technology Program of the National Institute of Standards and Technology, whose generous support provided partial funding of this research (70NANB2H3009).

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA and the NIH. This work has been approved for publication by these agencies, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.
COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).
Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Fodor, S.P. et al. Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767–773 (1991).
2. Fodor, S.P. et al. Multiplexed biochemical assays with biological chips. Nature 364, 555–556 (1993).
3. Schena, M. et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
4. Churchill, G.A. Fundamentals of experimental design for cDNA microarrays. Nat. Genet. 32 (Suppl.), 490–494 (2002).
5. Li, J., Pankratz, M. & Johnson, J. Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicol. Sci. 69, 383–390 (2002).
6. Tan, P. et al. Evaluation of gene expression measurements from commercial platforms. Nucleic Acids Res. 31, 5676–5684 (2003).
7. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565–572 (2005).
8. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods 2, 345–349 (2005).
9. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Independence and reproducibility across microarray platforms. Nat. Methods 2, 337–343 (2005).
10. Kuo, W.P. et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 24, 832–840 (2006).
11. Järvinen, A-K. et al. Are data from different gene expression microarray platforms comparable? Genomics 83, 1164–1168 (2004).
12. de Reynies, A. et al. Comparison of the latest commercial short and long oligonucleotide microarray technologies. BMC Genomics 7, 51 (2006).
13. Wang, Y. et al. Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays. BMC Genomics 7, 59 (2006).
14. Bammler, T. et al. Standardizing global gene expression analysis between laboratories and across platforms. Nat. Methods 2, 351–356 (2005).
15. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).
16. Canales, R.D. et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol. 24, 1115–1122 (2006).
17. Guo, Y. et al. Genomic analysis of anti-hepatitis B virus (HBV) activity by small interfering RNA and lamivudine in stable HBV-producing cells. J. Virol. 79, 14392–14403 (2005).
18. Barczak, A. et al. Spotted long oligonucleotide arrays for human gene expression analysis. Genome Res. 13, 1775–1785 (2003).
19. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl. 2), S12 (2005).
20. Tong, W. et al. Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res. 549, 241–253 (2004).
21. Wolfinger, R.D. et al. Assessing gene significance from cDNA microarray data via mixed models. J. Comput. Biol. 8, 625–637 (2001).
22. Jin, W., Riley, R., Wolfinger, R.D., White, K.P., Passador-Gurgel, G. & Gibson, G. Contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat. Genet. 29, 389–395 (2001).
23. Chu, T-M., Deng, S., Wolfinger, R.D., Paules, R.S. & Hamadeh, H.K. Cross-site comparison of gene expression data reveals high similarity. Environ. Health Perspect. 112, 449–455 (2004).
24. Chu, T-M., Deng, S. & Wolfinger, R.D. Modeling Affymetrix data at the probe level. in DNA Microarrays and Statistical Genomics Techniques: Design, Analysis, and Interpretation of Experiments (eds. Edwards, J.W., Beasley, T.M., Page, G.P. & Allison, D.B.) 197–222 (Chapman & Hall/CRC, Boca Raton, FL, 2006).
The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements

MAQC Consortium*

Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.
Recently, pharmacogenomics and toxicogenomics have been identified both by the US Food and Drug Administration (FDA) and the US Environmental Protection Agency (EPA) as key opportunities in advancing personalized medicine1,2 and environmental risk assessment3. These agencies have issued guidance documents to encourage scientific progress and to facilitate the use of these data in drug development, medical diagnostics and risk assessment (http://www.fda.gov/oc/initiatives/criticalpath/; http://www.fda.gov/cder/guidance/6400fnl.pdf; http://www.fda.gov/cdrh/oivd/guidance/1549.pdf; http://www.epa.gov/osa/genomics.htm). However, although DNA microarrays represent one of the core technologies for this purpose, concerns have been raised regarding the reliability and consistency of microarray technology, and hence its potential application in clinical and regulatory settings. For example, a widely cited study reported little overlap among lists of differentially expressed genes derived from three commercial microarray platforms when the same set of RNA samples was analyzed4. Similar low levels of overlap have been reported in other interplatform and/or cross-laboratory microarray studies5–8. Although similar results continue to appear in peer-reviewed journals9,10, raising doubts about the repeatability, reproducibility and comparability of microarray technology11–13, several studies have also been recently published showing increased reproducibility of microarray data generated at different test sites and/or using different platforms14–18. It follows that before this technology can be applied in clinical practice and regulatory decision making, microarray standards, quality measures and consensus on data analysis methods need to be developed2,19–21.
Here we describe the MAQC project, a community-wide effort initiated and led by FDA scientists involving 137 participants from 51 organizations. In this project, gene expression levels were measured from two high-quality, distinct RNA samples in four titration pools on seven microarray platforms in addition to three alternative expression methodologies. Each microarray platform was deployed at three independent test sites and five replicates were assayed at each site. This experimental design and the resulting data set provide a unique opportunity to assess the repeatability of gene expression microarray data within a specific site, the reproducibility across multiple sites and the comparability across multiple platforms. Objective assessment of these technical metrics is an important step towards understanding the appropriate use of microarray technology in clinical and regulatory settings. This study also addresses many other needs of the scientific community pertaining to the use and analysis of microarray data (see MAQC goals in Supplementary Data online). The MAQC project has generated a rich data set that, when appropriately analyzed, reveals promising results regarding the consistency of microarray data between laboratories and across platforms. In this article, we detail the study design, describe its implementation and summarize the key findings of the MAQC main study. The accompanying set of articles22–26 provides additional analyses and related data sets. Although the sample types used in this study are not directly representative of a relevant biological study, the study provides technical insights into the capabilities and limitations of microarray technology. Similar levels of concordance in cross-laboratory and interplatform comparisons have been independently reported using a toxicogenomics study26.
*A list of authors and their affiliations appears at the end of the paper. Correspondence and requests for materials should be addressed to L.S. ([email protected]). Received 6 June; accepted 31 July; published online 8 September 2006; doi:10.1038/nbt1239
RESULTS

Experimental design
The MAQC project (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/) repeatedly assayed four pools derived from two RNA sample types on a variety of gene expression platforms and at multiple test sites. The two RNA sample types used were a Universal Human Reference RNA (UHRR) from Stratagene and a Human Brain Reference RNA (HBRR) from Ambion. The four pools included the two reference RNA samples as well as two mixtures of the original samples: Sample A, 100% UHRR; Sample B, 100% HBRR; Sample C, 75% UHRR:25% HBRR; and Sample D, 25% UHRR:75% HBRR. This combination of biologically different RNA sources and known titration differences provides a method for assessing the relative accuracy of each platform based on the differentially expressed genes detected. A unique feature of the MAQC project is that both sample type A and sample type B will remain commercially available to the community for a few years to come, in the exact batches used by the MAQC project.

Six commercially available microarray platforms were tested: Applied Biosystems (ABI); Affymetrix (AFX); Agilent Technologies (AGL for two-color and AG1 for one-color); GE Healthcare (GEH); Illumina (ILM) and Eppendorf (EPP). In addition, scientists at the National Cancer Institute (NCI) generated spotted microarrays using oligonucleotides obtained from Operon. The RNA sample types were also tested on three alternative gene expression platforms: TaqMan Gene Expression Assays from Applied Biosystems (TAQ; TaqMan is a registered trademark of Roche Molecular Systems, Inc.); StaRT-PCR from Gene Express (GEX) and QuantiGene assays from Panomics (QGN). Each microarray platform provider selected three sites for testing. In most cases, five replicate assays for each of the four sample types were processed at each of the test sites. Six of the microarray providers used one-color protocols in which one labeled RNA sample was hybridized to each microarray (Table 1). The Agilent two-color and NCI microarrays were tested using a two-color protocol, in which two differently labeled RNA samples were simultaneously hybridized to the same microarray. The Eppendorf assay contained two identical microarrays on one glass slide, which were independently hybridized to two samples. Although only a single fluorescent dye was used, the Eppendorf data are presented in a ratio format.
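Because samples C and D are known mixtures of A and B, each gene's signals should, on the linear scale, fall in a monotone order between the A and B signals. A minimal sketch of such a titration-order check (ignoring differences in the mRNA fractions of the two reference samples, which the companion analyses23 address) might look like this; the function name and inputs are hypothetical.

```python
import numpy as np

def titration_consistent(a, b, c, d):
    """Flag genes whose signals respect the titration order.

    a, b, c, d: arrays of normalized, linear-scale signals per gene for
    samples A, B, C (75% A) and D (25% A). For a gene with A > B the
    design implies A >= C >= D >= B; the inequalities flip when B > A.
    This deliberately ignores mRNA-fraction differences between the
    two reference samples.
    """
    up = a > b
    ok_up = up & (a >= c) & (c >= d) & (d >= b)
    ok_down = ~up & (b >= d) & (d >= c) & (c >= a)
    return ok_up | ok_down
```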
Each microarray provider used its own software to generate a quantitative signal value and a qualitative detection call for each probe on the microarray. This attention to the qualitative calls of each platform meant that a potentially different number of genes was used in each calculation. It also had an impact on data analysis, because some, but not all, of the platforms removed suspect or low-intensity data. In addition, 11 hybridizations were removed from further analysis due to quality issues. Table 1 notes the number of hybridizations used in the final data analysis for each microarray platform. Further details are presented in Methods and Tables S1–S4 in Supplementary Data online. Prehybridization and posthybridization quality information for the samples is available as Supplementary Table 1 online. A direct comparison of results across platforms was challenging because of inherent differences in protocols, number of data points per platform and data preprocessing methods. Whenever possible, all platforms were included in the comparisons, but occasionally results from one or two platforms were excluded from an analysis because the data comparison was untenable and forced contrivance that was ultimately uninformative. Although some data from the alternative platforms are presented in this article, a more thorough discussion is included elsewhere22.

Probe mapping
Microarray experiments generally rely on a hybridization intensity measurement for an individual probe to infer a transcript abundance level for a specific gene. This relationship raises several difficult issues, including which gene corresponds to which probe, and how sensitive and specific the probe is. Previous publications have suggested that some of the variability in cross-platform studies was due to annotation problems that made it difficult to reconcile which genes were measured by specific probes27–30. Although the human genome sequence is complete, the final list of actual genes has yet to be determined. All identifiers are moving targets, and even the NCBI hand-curated reference sequences are often modified. Another issue is that a gene expression assay designed to measure a given RNA target may unknowingly detect multiple alternatively spliced transcripts, which may have different functions and expression patterns. Thus, the number of genes or transcripts
Table 1 Gene expression platforms and data analyzed in the MAQC main study

| Manufacturer | Code | Protocol | Platform | Number of probes^a | Number of test sites | Number of samples | Number of replicates | Total number of microarrays^b |
|---|---|---|---|---|---|---|---|---|
| Applied Biosystems | ABI | One-color microarray | Human Genome Survey Microarray v2.0 | 32,878 | 3 | 4 | 5 | 58 |
| Affymetrix | AFX | One-color microarray | HG-U133 Plus 2.0 GeneChip | 54,675 | 3 | 4 | 5 | 60 |
| Agilent | AGL | Two-color microarray^c | Whole Human Genome Oligo Microarray, G4112A | 43,931 | 3 | 2 | 10 | 56 |
| Agilent | AG1 | One-color microarray | Whole Human Genome Oligo Microarray, G4112A | 43,931 | 3 | 4 | 5 | 56 |
| Eppendorf | EPP | One-color microarray | DualChip Microarray | 294 | 3 | 4 | 5 | 60 |
| GE Healthcare | GEH | One-color microarray | CodeLink Human Whole Genome, 300026 | 54,359 | 3 | 4 | 5 | 60 |
| Illumina | ILM | One-color microarray | Human-6 BeadChip, 48K v1.0 | 47,293 | 3 | 4 | 5 | 59 |
| NCI_Operon | NCI | Two-color microarray | Operon Human Oligo Set v3 | 37,632 | 2 | 4 | 5 | 33 |
| Applied Biosystems | TAQ | TaqMan assays | >200,000 assays available | 1,004 | 1 | 4 | 4 | N/A |
| Panomics | QGN | QuantiGene assays | ~2,600 assays available | 245 | 1 | 4 | 3 | N/A |
| Gene Express | GEX | StaRT-PCR assays | ~1,000 assays available | 207 | 1 | 4 | 3 | N/A |
| Total | | | | | | | | 442 |

^a A global definition of probes is used to include individual probes, probe sets or primer pairs depending on the gene expression platform. The numbers listed in this table are derived from product literature and may include some platform duplication. Alternative figures for the number of probes analyzed are provided as Table S5 in Supplementary Data online.
^b Maximum number of microarrays per one-color protocol is 60 (3 sites × 4 sample types × 5 replicates). As described in the text, replacement hybridizations but not outlier hybridizations are included in the main study data analysis. Only data from 386 microarrays were analyzed in this article. Additional data sets are described in Table S4 in Supplementary Data online.
^c Although not presented in this paper, the Agilent two-color data (56 microarrays) are discussed elsewhere24.

In the remaining figures, test sites and sample types are referenced using the following nomenclature: "platform code_test site_sample ID". Sample A = 100% UHRR; Sample B = 100% HBRR; Sample C = 75% UHRR:25% HBRR; and Sample D = 25% UHRR:75% HBRR.
[Figure 1 shows box plots of replicate CV (%) by platform and sample type, with the number of detected genes on a secondary axis; only the caption is reproduced here.]
Figure 1 Repeatability of expression signal within test sites. For the one-color platforms, the CV of the expression signal values between site replicates of the same sample type was calculated for all generally detected genes. The distributions of these replicate CVs are presented in a series of twelve box-and-whiskers plots for each microarray platform: one for each of the four sample types at the three test sites. The plots are highlighted to distinguish the sample replicates: sample A (white), sample B (light blue), sample C (light purple) and sample D (dark blue). The twelve plots showing results from the platforms with three test sites are presented in the following order from left to right: A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2 and D3. For the two-color NCI platform, the CV of the expression Cy3/Cy5 ratios between site replicates of the same sample type was similarly calculated. The distributions of these replicate CVs are presented in a series of eight box-and-whiskers plots from the two NCI test sites in the following order from left to right: A1, A2, B1, B2, C1, C2, D1 and D2. The median (gap), interquartile range as well as the 10th and 90th percentile values are indicated in each plot. Only genes from the 12,091 common set that were detected in at least three of the replicates were included in the box plots and CV calculations. This number varies by platform/sample/test site and is noted as the line plot with the secondary axis and as Table S6 in Supplementary Data online. The platforms and sample types are labeled according to the nomenclature presented in Table 1.
detected with a gene expression platform is inherently difficult to define and quantify. A unique advantage of the MAQC project is that most of the sequence information for the probes used in each gene expression technology was provided by the manufacturers. We mapped the probes (see Supplementary Methods online and Supplementary Notes online) to the RefSeq human mRNA database31 (http://www.ncbi.nlm.nih.gov/RefSeq) and to the AceView database32 (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly), a less curated but more comprehensive database that includes all the RefSeq, GenBank and dbEST human cDNA sequences. Although the total number of probes varied across platforms, the six high-density microarray platforms assayed similar numbers of Entrez genes (15,429–16,990) and had similar percentages of probes (68–84%) that aligned to AceView transcripts (see Table S5 in Supplementary Data online). We found that 23,971 of the 24,157 RefSeq NM accessions from the March 8, 2006 release were assayed by at least one platform (Supplementary Table 2 online) and that 15,615 accessions were assayed by all high-density microarray platforms used in the MAQC study. Because of alternative splicing, each platform mapped to roughly four RefSeq transcripts per three Entrez genes. To simplify the interplatform comparison, we condensed the complex probe-target relationships to a 'one-probe-to-one-gene' list. The 15,615 RefSeq entries on all of the high-density microarray platforms represented 12,091 Entrez genes. For each gene, we selected a single RefSeq entry (Supplementary Table 4 online), primarily the one annotated by TaqMan assays, or secondarily the one targeted by the majority of platforms. When a platform contained multiple probes matching the same RefSeq entry, only the probe closest to the 3′ end was included in the common set (Supplementary Table 3 online). In this way, we selected for each high-density platform 12,091 probes matching a common set of 12,091 reference sequences from 12,091 different genes (Supplementary Table 5 online).
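The 'one-probe-to-one-gene' reduction can be illustrated with a short, hypothetical pandas sketch; the column names, and the use of the largest transcript coordinate as a proxy for '3′-most', are assumptions rather than the consortium's actual pipeline.

```python
import pandas as pd

def one_probe_per_gene(probes: pd.DataFrame) -> pd.DataFrame:
    """Collapse a probe table to one probe per (platform, gene).

    `probes` is assumed to have columns: platform, entrez_gene, refseq,
    match_start (alignment position on the transcript). After sorting
    by position, the last probe per group (largest coordinate, i.e.
    closest to the 3' end) is kept, mirroring the common-set selection
    described above.
    """
    return (probes.sort_values("match_start")
                  .groupby(["platform", "entrez_gene"])
                  .tail(1))
```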
Intraplatform data repeatability and reproducibility
We examined microarray data for consistency within each platform by reviewing both the intrasite repeatability and the intersite reproducibility at two levels: the quantitative signal values and the qualitative detection calls. Only genes that were detected in at least three of the five sample replicates (the generally detected genes) were included in most of these calculations. This filter accounts for the different manner in which the microarray platforms identified genes below their quality thresholds, and directs our analysis away from the less confident, noisy results. The number of generally detected genes for each sample type at each site varied from 8,000 to 12,000 for the high-density microarray platforms, but was relatively consistent between test sites using the same platform (Fig. 1).

The coefficient of variation (CV) of the quantitative signal values between the intrasite replicates was calculated using the generally detected subset of the 12,091 common genes for each sample type at every test site. The distribution of the replicate CV measures across the set of detected genes is displayed in a series of box-and-whiskers plots in Figure 1. Most of the one-color microarray platforms and test sites demonstrated similar replicate CV median values of 5–15%, although the distributions of replicate CV results differed between platforms. For the two-color NCI microarrays, the replicate CVs were calculated using the Cy3/Cy5 ratios. (Sample type A was used as the Cy5 reference in all NCI hybridizations.) These values were only slightly larger than those of the one-color signals for the same sample type.

We next examined the total CV of the quantitative signal, which included both the intrasite repeatability and the variation due to intersite differences. By definition, the total CV measure (n ≤ 15) will be larger than the replicate CV measures (n ≤ 5). Median values for the total CV distribution and the average of the three replicate CV medians for each platform are presented in Figure 2. Overall, the total CV median was very consistent across all platforms, ranging from 10% to just over 20% and not dramatically higher than the replicate CV median values. In general, the total CV median was up to twice as large as the replicate CV median, but this result is not unexpected and simply implies that site-related effects should be taken into account when combining data from multiple sites using the same platform.
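A minimal sketch of the replicate CV calculation, including the 'generally detected' filter (detected in at least three replicates), might look as follows; the array shapes and names are illustrative.

```python
import numpy as np

def replicate_cv(signals, detected, min_detected=3):
    """Per-gene CV across replicates of one sample type at one site.

    signals: (n_genes, n_replicates) array of normalized signals.
    detected: boolean array of the same shape with detection calls.
    Only generally detected genes (detected in >= min_detected
    replicates) receive a CV; all others are returned as NaN.
    """
    keep = detected.sum(axis=1) >= min_detected
    cv = np.full(signals.shape[0], np.nan)
    means = signals[keep].mean(axis=1)
    cv[keep] = 100.0 * signals[keep].std(axis=1, ddof=1) / means
    return cv
```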
[Figure 2 shows, for each platform and sample type, the replicate CV and total CV of signal (%), with the number of detected genes across sites on a secondary axis; only the caption is reproduced here.]
Figure 2 Signal variation within and between test sites. For each of the four sample types, the replicate CV of signal within a test site (blue bar) and the total CV of signal across and within sites (red bar) are presented. As in Figure 1, genes detected in at least three of the replicates of a sample type at a single test site are included in the replicate CV calculation. Genes present in the intersection of these gene lists are included in the total CV calculation. (These gene lists are therefore slightly different from those in Figure 1.) The number of such genes within each platform and sample type is noted by blue dots connected by lines and is read on the secondary axis. It is also reported as Table S6 in Supplementary Data online. Intrasite normalization was performed according to default settings for each manufacturer, and intersite normalization was performed by scaling between sites (see main text). The NCI platform is omitted because data from only two test sites were available in the main study, so intersite reproducibility measures may not be representative. The platforms and sample types are labeled according to the nomenclature presented in Table 1.
To assess variation in the qualitative measures, the percentage of the 12,091 common genes with concordant detection calls between replicates of the same sample type was calculated for each of the four sample types on each platform (Fig. 3). These figures include either all sample replicates at a single site (n ≤ 5) or all sample replicates across the test sites (n ≤ 15). Most one-color test sites demonstrated 80–95% concordance in the qualitative calls for the sample replicates within their facility. The value dropped to 70–85% concordance for the reproducibility of the qualitative calls across all three test sites. It is not surprising that platforms with more detected calls (Fig. 1) generally had higher concordance percentages. For example, the NCI microarrays detected almost all of the 12,091 common genes and had concordance percentages near 100% between test sites. Microarray platforms with lower numbers of detected genes generally had reduced concordance percentages. Interestingly, the GE Healthcare platform had both a large number of genes detected (~11,000 per hybridization) and approximately 85% concordance between test sites.

Interplatform data comparability
Expression values generated on different platforms cannot be directly compared because unique labeling methods and probe sequences result in variable signals for probes that hybridize to the same target. Instead, the relative expression between a pair of sample types should be maintained across platforms. For this reason, we examined the microarray data for comparability between platforms by reviewing sample type B relative to sample type A expression values with three different metrics: differential gene list overlap, log ratio compression and log ratio rank correlation. For log ratio compression and rank correlation, only generally detected genes from the common 12,091-gene list were included in the analysis. For the gene list overlap, all 12,091 common genes were considered. A list of differentially expressed genes was generated for each test site and compared to lists from other test sites using the same platform and those using a different platform. A percent score was calculated to indicate the number of genes in common between each pair of test site lists. The percentage of overlap for each comparison is displayed in Figure 4. Note that the graphic comparisons are asymmetrical, indicating that the analysis is performed in two directions: the percentage of test site Y genes on the list from test site X can be different from the percentage of test site X genes on the test site Y list.
[Figure 3 shows, for each platform, sample type and test site, the percentage of genes with consistent detection calls; only the caption is reproduced here.]
Figure 3 Concordance of detection calls within and between test sites. For the 12,091 common genes, detection calls within each platform were categorized as either 'detected' or 'not detected'. For each sample type within each platform, the percentage of genes with calls that were perfectly concordant as 'detected' within the replicates for a given site is plotted as blue dots, and the corresponding percentage of genes with calls perfectly concordant as 'detected' across all sites is plotted as the blue bars. The total percentage of genes with perfectly concordant calls (detected and not detected) within a site is plotted as the yellow dots, and the corresponding percentage of genes with calls perfectly concordant across all sites is plotted as the top of the yellow bars. The bars are split between perfectly detected genes (blue portion) and perfectly not detected genes (yellow portion) across all test sites. It is not expected that detected genes are concordant across sample types. The number of perfectly detected genes for each test site is provided as Table S6 in Supplementary Data online. As described in the main text, the stringency with which individual platforms determine that the data for a gene are sufficiently reliable to be called detected has different manufacturer defaults, leading to altered concordance percentages. Changes in the settings for sensitivity/specificity may shift the proportion of the bar assigned to each detection category. Because reliability depends on platform-specific details, detected calls do not correspond directly to relative abundance and may vary between platforms. Note: as some platforms have removed outlier hybridizations, the number of replicates within (n ≤ 5) and between sites (n ≤ 15) varies for determining concordance.
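The detection-call concordance plotted in Figure 3 counts genes whose calls agree perfectly across replicates. A minimal sketch, assuming boolean calls, is:

```python
import numpy as np

def concordance_percent(calls):
    """Percent of genes with perfectly concordant detection calls.

    calls: (n_genes, n_replicates) boolean array ('detected' = True)
    for one sample type, pooled within a site or across sites. A gene
    is concordant when every replicate agrees (all detected or all
    not detected).
    """
    all_detected = calls.all(axis=1)
    none_detected = (~calls).all(axis=1)
    return 100.0 * (all_detected | none_detected).mean()
```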
For all but the NCI test sites, the gene list overlap is at least 60% for each test site comparison (both directions), with many site pairings achieving 80% or more between platforms and 90% within platforms. Typically, the genes that the NCI microarray platform identified as differentially expressed were also identified on the other platforms, suggesting a low false-positive rate for this platform. However, the converse was not necessarily true, most likely owing to the greater log ratio compression observed in the NCI platform and the use of a stringent P-value threshold.

Each microarray platform has a defined background correction method and dynamic range of signal detection, which can lead to over- or underestimates of log ratios and fold changes in expression between sample types. To examine the level of compression or expansion in log ratios, we determined the best fitted line for the log ratio estimates between pairs of test sites.

[Figure 4 is a matrix plot of the percent overlap between test-site gene lists; only the caption is reproduced here.]
Figure 4 Agreement of gene lists. This graph indicates the concordance of genes identified as differentially expressed for pairs of test sites, labeled as X and Y. A list of differentially expressed genes between sample type A replicates versus sample type B replicates was generated for each test site (using the 12,091 common genes with ≥ twofold change and P < 0.001 thresholds) and compared for commonality to other test sites. The size of these gene lists is reported as Table S7 in Supplementary Data online. No filtering related to the qualitative detection call was performed. The color of the square in the matrix reflects the percent overlap of genes on the list for test site Y (listed in row) that are also present on the list for test site X (listed in column). A light-colored square indicates a high percent overlap between the gene lists at both test sites. A dark-colored square indicates a low percent overlap, suggesting that most genes identified in site Y were not identified in site X. Numerical values for the percent overlap are presented as Table S9 in Supplementary Data online. Note: the graph is asymmetric and not complementary. Only the six high-density microarray platforms are presented. As described in the text, data from some platforms were omitted from these calculations because of quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location.

[Figure 5 comprises two matrix plots comparing log ratios between test-site pairs; only the caption is reproduced here.]
Figure 5 Agreement of log ratios across platforms and test sites. (a) Log ratio compression/expansion. This graph indicates the percent difference from equivalency between platforms/sites (corresponding to a slope value of 1 for the best fitted line using orthogonal regression) of the log ratio differential expression using A and B replicates. A dark spot implies equivalency (slope = 1; percent difference = 0). A positive percent difference in slope from the ideal line (aqua) indicates compression of log signal for test site Y relative to test site X. A negative percent difference from the ideal line (magenta) indicates expansion. Read as "What is the difference from equivalence in slope (m = 1) for test site Y versus test site X?" Only genes detected by both test sites in at least three replicates of sample type A and three replicates of sample type B are included in the calculation, and the number for each pair is reported as Table S8 in Supplementary Data online. Numerical values for the percent difference are presented as Table S10 in Supplementary Data online. Note: the graph is asymmetric, but approximately complementary. As described in the text, data from some platforms were omitted from these calculations due to quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location. (b) Rank correlation of log ratios. This graph indicates the correlation of the log ratio differential expression values (using A versus B replicates) when we examine their rank. Large positive log ratio values would be ranked high and large negative log ratio values would be ranked low. Read as "What is the correlation of the rank log ratio values between test site Y and test site X?" Only genes generally detected in both sample types A and B and by both test sites are included in the calculation, and the number for each pair is reported as Table S8 in Supplementary Data online. Numerical values for the rank correlation are presented as Table S11 in Supplementary Data online. Note: the graph is symmetric. As described in the text, data from some platforms were omitted from these calculations due to quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location.
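The asymmetric percent-overlap metric of Figure 4 can be written compactly; this is a sketch, not the consortium's code:

```python
def percent_overlap(list_x, list_y):
    """Percent of test site Y's differentially expressed genes that
    also appear on test site X's list (the row-wise metric of Fig. 4).
    Swapping the arguments gives the other direction, which is why the
    matrix is asymmetric.
    """
    set_x, set_y = set(list_x), set(list_y)
    return 100.0 * len(set_x & set_y) / len(set_y)
```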
[Figure 6 comprises seven scatter plots (one per microarray platform: ABI, AFX, AG1, EPP, GEH, ILM and NCI) of microarray log ratios versus TaqMan log ratios, with per-site gene counts (n) and correlations (r) shown in each panel; only the caption is reproduced here.]
Figure 6 Correlation between microarray and TaqMan data. The scatter plots compare the log ratio differential expression values (using A versus B replicates) from each microarray platform relative to values obtained by TaqMan assays. Each point represents a gene that was measured on both the microarray and TaqMan assays. The spot coloring indicates whether the data were generated in test site 1 (black), test site 2 (blue) or test site 3 (red) for the microarray platform. Only genes that were generally detected in sample type A replicates and sample type B replicates were used in the comparisons. The exact number of probes analyzed for each test site and its correlation to TaqMan assays are listed in the bottom right corner of each plot. As described in the text, data from some platforms were omitted from these calculations because of quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The line shown is the ideal 45° line.
The percent difference of the slope for each comparison is displayed in Figure 5a. An ideal slope of 1 would result in a percent difference of 0; negative or positive percent differences from the ideal slope indicate compression or expansion of the log ratios in one test site relative to the other. For each commercial one-color platform, good agreement was observed between its three test sites. Most of the interplatform test site comparisons also showed little compression or expansion. Test site 1 for the NCI microarrays produced consistently different results from the other test sites, both within and between platforms.

The comparability of results across platforms was also examined using a rank correlation metric. Log ratios for the differential expression observed between sample B replicates and sample A replicates were calculated for the generally detected common genes and then compared between test sites and across platforms. The rank correlations of the log ratios are displayed visually in Figure 5b. Good agreement was observed between all sites, even those using different microarray platforms. In fact, the median rank correlation was 0.87, and the smallest rank correlation value between the microarray platforms was 0.69.

Assessing relative accuracy
The relative accuracy of the microarray platforms can be assessed using either the titrated mixtures of the RNA samples23 or gene abundance measurements collected with alternative technologies22. Figure 5, as well as Tables S12 and S13 in Supplementary Data online, illustrates the relative rank correlation and compression/expansion values for log(B/A) between microarray-based and alternative gene expression technologies. Further comparisons between each microarray platform and the TaqMan assays are presented as scatter plots in Figure 6.
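A sketch of the two Figure 5 metrics follows, assuming numpy and scipy: the orthogonal-regression (total least squares) slope used for compression/expansion, and the Spearman rank correlation. Details such as centering are illustrative, and the paper reports the slope as a percent difference from 1.

```python
import numpy as np
from scipy.stats import spearmanr

def log_ratio_agreement(lr_x, lr_y):
    """Compare per-gene log ratios from two test sites.

    Returns the orthogonal-regression slope of Y on X (values below 1
    suggest compression of site Y's log ratios relative to site X) and
    the Spearman rank correlation of the two sets of log ratios.
    """
    x = lr_x - lr_x.mean()
    y = lr_y - lr_y.mean()
    cov = np.cov(x, y)
    # The first principal axis of the (x, y) cloud gives the total
    # least squares (orthogonal regression) slope.
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]
    slope = major[1] / major[0]
    rho, _ = spearmanr(lr_x, lr_y)
    return slope, rho
```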
The log ratios of sample type B to sample type A expression detected by the TaqMan assays were compared to the log ratios obtained for the same genes on the microarray assays. Only genes that were generally detected in both sample A and B replicates on the TaqMan assays and on the microarray were included in this analysis. The relative accuracy of each high-density platform to the TaqMan assay data was generally higher for those microarray platforms with fewer genes detected, as indicated by the number and magnitude of deviations from the ideal 45° line in Figure 5a and Figure 6.

Correlation with alternative platforms
Similarly, the Affymetrix, Agilent and Illumina platforms displayed high correlation values of 0.90 or higher with TaqMan assays based on comparisons of ~450–550 genes, whereas the GE Healthcare and NCI platforms had a reduced average correlation of 0.84, but included almost 30% more genes in the data comparisons. These additional genes were not identified as 'not detected' during the data review process, but may represent less confident results because lower signals exhibit greater variance. Thus, much of the difference in comparability metrics may be a reflection of the algorithm used to assign detection calls. Similar correlation values for the microarray platforms were observed relative to each of the other alternative platforms, StaRT-PCR and QuantiGene22.

DISCUSSION
The results of the MAQC project provide a framework for assessing the potential of microarray technologies as a tool to provide reliable gene expression data for clinical and regulatory purposes. All one-color microarray platforms had a median CV of 5–15% for the quantitative signal (Fig. 1) and a concordance rate of 80–95% for the qualitative detection call (Fig. 3) between sample replicates. This variation increased when data from different test sites using
the same platform were included (Figs. 2 and 3). However, lists of differentially expressed genes averaged ~89% overlap between test sites using the same platform and ~74% overlap across one-color microarray platforms (Fig. 4). Importantly, the ranks of log ratios were highly correlated among the microarrays (minimum R = 0.69; Fig. 5b), indicating that all platforms were detecting similar changes in gene abundance. These results indicate that, for these sample types and these laboratories, microarray results were generally repeatable within a test site, reproducible between test sites and comparable across platforms, even when the platforms used probes with sequence differences as well as unique protocols for labeling and expression detection.

Within the MAQC study, there were notable differences in various dimensions of performance between microarray platforms. Some platforms had better overall intrasite repeatability (e.g., Illumina), better intersite reproducibility (e.g., Affymetrix) or more consistency in the detection calls (e.g., GE Healthcare). Likewise, some platforms were more comparable to TaqMan assays (e.g., Applied Biosystems and Agilent one-color), whereas others demonstrated signal compression (e.g., NCI_Operon). Some of these differences were manifest in the apparent power analyses (see Figure S1 in Supplementary Data online), as test sites with smaller CV values (Fig. 1) typically had more power to discriminate differences between groups, as would be expected. Other differences might have been related to the platform's signal-to-analyte response characteristics22. It is important to note that 11 (2.4%) of the 453 microarray hybridizations were removed from the analysis due to quality issues (listed as Table S1 in Supplementary Data online). The relative performance of some platforms might have been altered if this data filter had not been applied. Each microarray platform has made different trade-offs with respect to repeatability, sensitivity, specificity and ratio compression.

One interesting result was that platforms with divergent approaches to measuring expression often generated comparable results. For example, data from Affymetrix test sites, which use multiple short oligonucleotide probes per target with perfect match and mismatch sequences, and Illumina test sites, which use plasma-etched silicon wafers containing beads with long oligonucleotide probes, were remarkably similar in the numbers of genes detected and in the detection call consistency, gene list overlap and ratio compression analyses. In other words, the expression patterns generated were reflective of the biology regardless of the differences in technology.

Some of the results were affected by differences in data analysis and detection call algorithms. This effect is most noticeable in the fold-change compression observed in the two-color results from the NCI microarrays, which generally included low-intensity probes, resulting in an over 95% detection call rate. The comparability of the NCI microarrays relative to the other platforms improves when background is based on 'alien' or negative control sequences. This alternative method reduces the detection call rate to 60–70%, while generally increasing the absolute fold changes in up- and downregulated genes (E.S.K., unpublished data). Interestingly, the NCI platform had lower intrasite repeatability (Fig. 1), but demonstrated comparable rankings in log ratios when compared to the other platforms (Fig. 5b).
Additional analyses of the MAQC data are provided in the accompanying articles. For example, the microarray platforms detected known differences in gene abundance between defined RNA mixtures23 and generated differential expression results that were comparable with other gene expression platforms22,24. The comparability of the gene expression results increased when the microarrays and other methodologies analyzed overlapping
sequences from the same gene22. Furthermore, external RNA controls included in some microarray platforms were useful predictors of technical performance25.

Direct comparison of different microarray platforms is neither a new nor an original idea in the realm of high-throughput biology. However, the data set generated by the MAQC project is unique in both its size and content. The main study compares seven different microarray platforms and includes ~60 hybridizations per platform using well-characterized, commercially available RNA sample types. Including the reagents used in the two pilot studies and the toxicogenomics validation study26, 1,327 microarrays have been used for this project (see Table S4 in Supplementary Data online). Moreover, the availability of the probe sequences in the MAQC project enabled us to approach the interplatform comparisons with greater scientific rigor. We performed detailed probe mapping to confirm identity and reveal potential sequence- or target-based differences between the gene expression platforms. This analysis confirmed that the great majority of probes were very carefully chosen and of high quality. Most of the results in this report are based on a set of 12,091 common genes that are represented on six high-density microarray platforms, but which are generally detected with different probe sequences. Our probe selection procedure may have introduced a bias into the study because the imposed criteria neither reflect the platform design philosophies nor account for the very rich underlying biology. More than one probe per target can be a highly desirable feature on microarray platforms because a single probe may not capture all tissue-specific effects. We also found a number of probes that were not gene specific, suggesting a strategy of targeting multigene families.

The MAQC data set captures intrasite, intersite and interplatform differences. However, it does not address protocol, time or other technical variables within a test site, because all test sites used the same protocol and generated replicate data at approximately the same time (except as noted under data filtering). The effects and levels of these sources of variation have been described in other studies15,33. Furthermore, our analysis does not include performance metrics based on 'biology' (e.g., Gene Ontology terms or pathways)26. Although a relatively high level of concordance of differentially expressed gene lists was observed in this study, it is possible that a higher level of agreement would be detected using these other methods of gene list concordance34, or that a lower level would be observed with sample types that were more realistically similar.

It should be noted that the results presented in this paper in terms of log ratios and overlap of lists of differentially expressed genes were derived from comparing sample types A and B, which exhibited the greatest differences among the four sample types used in the MAQC project. In practical applications, the expected differences between sample types (e.g., treated versus control animals) are usually much smaller than those seen between sample types A and B. Therefore, the comparability of microarray data reported in this paper does not necessarily mean that the same level of consistency would be achieved in toxicogenomic or pharmacogenomic applications.
This difference can be seen from the relatively lower power and smaller overlap of gene lists (see Figures S1–S2 in Supplementary Data online) when comparing sample types C and D, where the maximum fold change is three. The MAQC data set can be used to compare normalization methods23 and data analysis algorithms26 (see Figure S2 in Supplementary Data online), similar to a currently available website (http://affycomp.biostat.jhsph.edu) that illustrates the impact of different data analysis methods on expression results30,34. It is our hope that
future studies will add to the MAQC data set. For example, microarray providers could submit gene expression results from new microarrays with updated probe content and then use the MAQC data set to confirm consistency with older versions of the microarray. In an effort to represent all platforms equally and to present results in a timely manner, this publication analyzed only 386 microarray hybridizations from 20 test sites. However, additional data sets from the MAQC main study are available (listed as Tables S1–S4 in Supplementary Data online). Although most sites generated quality results, some differences were detected between test sites using the same platform. Thus, microarray studies need unified metrics and standards, which can be used to identify suboptimal results and monitor performance in microarray facilities.

Previous reports have relied heavily on statistical significance (P value) rather than on the actual measured quantity of differential expression (fold change or ratio) in identifying differentially expressed genes. This strict reliance on P values alone has contributed to the apparent lack of agreement between sites and microarray platforms20,26. Our results from analyzing the MAQC human data sets (see Figure S2 in Supplementary Data online) and the rat toxicogenomics data set26 indicate that a straightforward approach of fold-change ranking plus a nonstringent P cutoff can be successful in identifying reproducible gene lists (a minimal sketch of this selection rule appears below), whereas ranking and selecting differentially expressed genes solely by the t-test statistic predestines poor concordance in results, in particular for shorter gene lists, owing to the relatively unstable nature of the variance (noise) estimate in the t-statistic. More robust methods, such as ranking using the test statistic from Significance Analysis of Microarrays (SAM)35, did not generate more reproducible results than fold-change ranking in our cross-laboratory and interplatform comparisons. Our results are consistent with previously published data20. Furthermore, the impact of normalization methods on the reproducibility of gene lists becomes minimal when the fold change, instead of the P value, is used as the ranking criterion for gene selection24,26.

Two initiatives for microarray reference materials are currently in progress. A group led by FDA's Center for Drug Evaluation and Research (CDER) developed two mixed-tissue RNA pools with known differences in tissue-selective genes that can be used as rat reference materials36, whereas the External RNA Controls Consortium (ERCC) is testing polyadenylated transcripts that can be added to each RNA sample before processing to monitor the technical performance of the assay37. The MAQC project complements these efforts by establishing several commercially available human reference RNA samples, and an accompanying large data set, which can be used by the scientific community to compare results generated in their own laboratories for quality control and performance validation efforts. In fact, the commercial availability of the MAQC reference sample types allowed several laboratories to generate and submit additional gene expression data to the MAQC project after the official deadline (listed as Table S4 in Supplementary Data online).
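The fold-change-ranking-with-nonstringent-P rule referenced above can be sketched as follows; the thresholds and list length are illustrative, not prescribed values.

```python
import numpy as np

def fc_ranked_gene_list(log2_fc, p_values, n_genes=100, p_cutoff=0.05):
    """Select genes by fold-change ranking with a nonstringent P cutoff.

    log2_fc and p_values are per-gene arrays. Genes passing the loose
    P threshold are ranked by absolute log2 fold change, and the
    indices of the top n_genes are returned.
    """
    passing = np.flatnonzero(p_values < p_cutoff)
    order = passing[np.argsort(-np.abs(log2_fc[passing]))]
    return order[:n_genes]
```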
Repeated intersite comparisons, such as proficiency testing, are required three times a year for many Clinical Laboratory Improvement Amendments (CLIA) assays and may also be useful in microarray facilities to monitor the comparability and consistency of data sets generated over time38. For example, a proficiency testing program evaluated the performance of 18 different laboratories over a 9-month period by repeatedly hybridizing three replicates of the same two RNA sample types to Affymetrix microarrays (L.H.R. and W.D.J., unpublished results). This study revealed the range of quality metrics and the impact of protocol differences on the microarray results. The MAQC human reference RNA
sample types could be used in this kind of intersite proficiency testing program. In summary, the technical performance of microarrays as assessed in the MAQC project supports their continued use for gene expression profiling in basic and applied research and may lead to their use as a clinical diagnostic tool as well. International organizations such as the ERCC37, the Microarray Gene Expression Data Society39 and this MAQC project are providing the microarray community with standardization of data reporting, common analysis tools and useful controls that can help provide confidence in the consistency and reliability of these gene expression platforms.

METHODS
Probe mapping. Affymetrix, Agilent, GE Healthcare, Illumina and Operon (whose oligonucleotides are used by the NCI) provide publicly available probe sequences for their microarray platforms in a spreadsheet format (websites listed in Supplementary Data online). The probe sequences for Applied Biosystems microarrays can be individually obtained through the Panther database (http://www.pantherdb.org), and the sequences of the intended regions for QuantiGene (Panomics) assays are available upon request. Probe sequences for Eppendorf microarrays are not yet publicly available, but were provided to the MAQC project for confidential analysis. Gene Express provided annotation and approximate forward and reverse primer locations for the StaRT-PCR assays, which were sufficient to localize the intended target. For TaqMan assays, Applied Biosystems provided the Assay ID, amplicon size, assay location on the RefSeq and a context sequence (the exact 25-nt sequence that includes the TaqMan assay detection probe). The MAQC probe mapping (Supplementary Methods online and Supplementary Notes online) used the March 8, 2006 RefSeq release containing 24,000 curated accessions, to which we added 157 entries that were recently either withdrawn or retired from the NCBI curation. AceView comparisons were based on the August 2005 database32. An exact match of the probe sequence to the database entry was required. Probes matching only the reverse strand of a transcript were excluded, as were probes matching more than one gene. For Affymetrix, an exact match of 80% of the probes within a probe set (usually 9 probes out of 11) was required (a minimal sketch of these criteria appears after this section). The results based on these stringent criteria are provided as Supplementary Tables 2-5 online and summarized as Table S5 in Supplementary Data online. The counts for the StaRT-PCR and TaqMan assays were based on the annotation provided by Gene Express and Applied Biosystems. In the AceView analysis, the mapping was tolerant to low levels of noncentral mismatches, but applied a stringent gene-specific filter so that probes that potentially cross-hybridize were removed even if they had a single exact match.

RNA preparation. The total RNA sources were tested and selected based on the results of 160 microarrays from Pilot Project I (data not shown). The Universal Human Reference RNA (catalog no. 740000) and Human Brain Reference RNA (catalog no. 6050) were generously donated by Stratagene and Ambion, respectively. The four titration mixtures of the samples were selected based on the results of 254 microarrays from Pilot Project II (data not shown) and prepared as described elsewhere23. The titration pools were mixed at the same time at one site using a documented protocol (MAQC_RNA_Preparation_SOP.doc) available at the MAQC website (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/).
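As noted above, a minimal sketch of the probe-mapping criteria follows. It is illustrative only; the data structures and the simple substring search are our assumptions, not the MAQC mapping pipeline (Supplementary Methods online):

from collections import Counter

def map_probe(probe_seq, transcripts, gene_of):
    # transcripts: {accession: sense-strand mRNA sequence}; gene_of: {accession: gene id}.
    # Searching only the sense strand excludes probes matching the reverse strand.
    hits = {gene_of[acc] for acc, seq in transcripts.items() if probe_seq in seq}
    # Probes with exact matches to more than one gene are excluded.
    return hits.pop() if len(hits) == 1 else None

def map_probe_set(probe_seqs, transcripts, gene_of, fraction=0.8):
    # The probe-set rule: at least 80% of the probes in a probe set
    # (usually 9 of 11) must map exactly to the same gene.
    genes = [g for g in (map_probe(p, transcripts, gene_of) for p in probe_seqs) if g]
    if not genes:
        return None
    gene, n = Counter(genes).most_common(1)[0]
    return gene if n >= fraction * len(probe_seqs) else None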
Each test site received 50-µg aliquots of the four sample types and confirmed the RNA quality using a Bioanalyzer (Agilent) before initiating target preparation.

Target preparation and quality assessments. Every test site was provided with instructions (MAQC_Sample_Processing_Overview_SOP.doc) on the processing of RNA samples, conducting quality assessment of RNA reference samples, target preparations and replication guidelines, standardized nomenclature for referencing samples and a template for reporting quality assessment data (MAQC_RNA_Quality_Report_Template.xls). The gene expression vendors generously provided all reagents to the test sites. Each microarray test site assessed cRNA yields using a spectrophotometer and determined the median transcript sizes using a Bioanalyzer (Agilent). Prehybridization and posthybridization quality metrics are presented as Supplementary Table 1 online.
Some statistically significant differences were observed in these quality metrics between sites (data not shown). Affymetrix, Agilent, Applied Biosystems and Eppendorf test sites added platform-specific external RNA controls to the samples before processing25. Data were submitted to the FDA's National Center for Toxicological Research (FDA/NCTR) directly from each test site and distributed to the eleven official analysis sites for review. Lists of the gene expression test sites and data analysis centers are available as Tables S1 and S2 in Supplementary Data online. All test sites for one vendor used the same target preparation protocols and processed all replicates at approximately the same time, with two exceptions: (i) microarray slides at the NCI test sites were scanned at 100% laser power, but the photomultiplier setting varied from slide to slide, and (ii) some outlier hybridizations were repeated at a later date as described below. Exact protocols for sample processing are available at the MAQC website (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/) and are briefly described in Supplementary Data online.

Data filters. Outlier hybridizations were repeated or removed from the analysis after the original data submission deadline in October 2005. One site each for the NCI and GE Healthcare platforms repeated all sample types in the MAQC study (NCI_2 and GEH_2) due to protocol issues. One Illumina site (ILM_2) repeated two samples in the MAQC study due to low cRNA yield, and another Illumina site (ILM_1) did not hybridize one sample replicate for the same reason. Data quality from 11 hybridizations at seven test sites (ABI_2, ABI_3, AG1_1, AG1_2, AG1_3, AGL_1 and AGL_2) was not satisfactory. More details are provided as Table S3 in Supplementary Data online.

Data processing. The platform-specific methods used for background subtraction, data normalization and the optional incorporation of offset values are described in Supplementary Data online. Each test site submitted its data (including image files) to the FDA/NCTR. All data were imported into the ArrayTrack database system40,41 and preprocessed and normalized according to the manufacturer's suggested procedures. Each gene was reviewed for quality and marked with a detection call, using the manufacturer's protocol. Data in a uniform format were distributed to all test sites and official data analysis sites for independent study.

Data analysis. Data analyses were performed on either all of the 12,091 common genes or a subset of this group based on the qualitative detection call reported for each hybridization. The size of these subsets in each of the test sites for each sample type is reported as Table S6 in Supplementary Data online.

Signal repeatability and reproducibility. The coefficient of variation (CV) of the signal or Cy3/Cy5 values (not log transformed) between the intrasite replicates (n ≤ 5) was calculated for genes that were detected in at least three replicates of the same sample type within a test site. The distributions of these replicate CV values are displayed in Figure 1. The replicate CV medians from three test sites are included in Figure 2. A total CV (Fig. 2) of the signal values was calculated for all replicates across three test sites (n ≤ 15) using the intersection of the generally detected gene lists (that is, genes detected in at least three replicates at all three sites).
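As a minimal sketch of this calculation, assuming a genes-by-replicates matrix of linear-scale signals and a matching detection matrix (the names and synthetic data are illustrative, not the code used in the study):

import numpy as np

rng = np.random.default_rng(0)
signal = rng.lognormal(8.0, 1.0, size=(1000, 5))   # linear-scale signals, n <= 5 replicates
detected = rng.random((1000, 5)) > 0.2             # stand-in detection calls

def replicate_cv(signal, detected, min_detected=3):
    # CV of the untransformed signal for genes detected in >= 3 replicates.
    keep = detected.sum(axis=1) >= min_detected
    s = signal[keep]
    return s.std(axis=1, ddof=1) / s.mean(axis=1)

print(np.median(replicate_cv(signal, detected)))   # replicate CV median, as summarized in Figure 2

The total CV follows the same formula after pooling the replicate columns from all three sites (n ≤ 15) for the genes generally detected at every site.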
A global scaling normalization is inherently applied to data from the GE Healthcare and Agilent platforms, but is not part of data extraction and normalization on the Applied Biosystems, Affymetrix (using PLIER+16) and Illumina platforms. To account for these differences, Applied Biosystems, Affymetrix and Illumina provided scaling factors for each test site that were included when measuring the total CV.

Concordance of detection call. Analyses were performed on all 12,091 common genes using the feature quality metrics provided by the manufacturers. All calls were resolved to a Detected or Not Detected status. Details on each platform's method of determining qualitative calls are provided in Supplementary Data online. In general, results are reported for the consistency of the resolved detection calls. If the call was missing because the microarray feature was absent, the detection value was not considered. Otherwise, the qualitative call was considered, including those cases where the signal value was missing.
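The resolution and scoring of detection calls can be sketched as follows (illustrative Python; the encoding of missing features and the example values are our assumptions, and each platform's actual call logic differs as described in Supplementary Data online):

def call_concordance(calls_x, calls_y):
    # calls_x, calls_y: per-gene calls resolved to 1 (Detected) or 0 (Not Detected);
    # None marks a call missing because the feature is absent, which is excluded.
    # A resolved call is still counted even when only the signal value is missing.
    pairs = [(a, b) for a, b in zip(calls_x, calls_y) if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)

print(call_concordance([1, 0, None, 1], [1, 0, 1, 0]))   # 2 of 3 comparable calls agree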
Gene list agreement. A list of differentially expressed genes was identified for each test site using the usual two-group t-test that assumes equal variances between groups, resulting in a pooled estimate of variance. This calculation is based on log signal. The criteria were P < 0.001 and a mean difference of at least twofold. No filtering related to gene detection was performed. For each pair of test sites, the number of genes on both lists was identified. Percent overlap (Fig. 4) was calculated as the number of genes in common divided by the number of genes on the list from one test site. For example, the agreement score for test site Y relative to test site X equals the number of genes on both lists divided by the number of genes on the test site Y list.

Log ratio comparability. The log ratio of each gene is defined as the average of the log signal for all sample B replicates minus the average of the log signal for all sample A replicates. (This value is equivalent to the log of the ratio of the geometric average of signal for all sample B replicates to the geometric average of signal for all sample A replicates.) Only genes that were detected in at least three sample A replicates and in at least three sample B replicates for both test sites were included. To detect compression or expansion (Fig. 5a), the slope (m) was calculated for each pair of test sites using orthogonal regression, due to the potential measurement error at both sites (see the sketch following this section). This analysis is based on the formula y = mx + b, where y is the log ratio from test site Y and x is the log ratio from test site X. As the ideal slope is 1, the percent difference from ideal is simply m - 1. Comparability between a pair of test sites was also examined using Spearman rank correlations of the log ratios (Fig. 5b). This value compares the relative position of a gene in the test site X rank order of the log ratio (fold change) values against its position in the test site Y rank order. Scatter plots of the log ratios from all sites against the log ratios generated with the TaqMan assays are presented in Figure 6.

Accession numbers. All data are available through GEO (series accession number: GSE5350), ArrayExpress (accession number: E-TABM-132), ArrayTrack (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/) and the MAQC website (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/).

Note: Supplementary information is available on the Nature Biotechnology website.
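As referenced above, the comparability metrics can be sketched as follows (a minimal Python illustration with synthetic log ratios; orthogonal regression is implemented here as total least squares via the principal axis of the centered data, one standard way to realize the method named in the text):

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)                # per-gene log ratios, test site X
y = 0.9 * x + rng.normal(0.0, 0.1, 1000)      # test site Y, with mild compression

def orthogonal_slope(x, y):
    # Total least squares: the slope of the principal axis of the centered data,
    # which minimizes perpendicular distances (measurement error at both sites).
    X = np.column_stack([x - x.mean(), y - y.mean()])
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    v = evecs[:, np.argmax(evals)]
    return v[1] / v[0]

m = orthogonal_slope(x, y)
print(m - 1.0)            # percent difference from the ideal slope of 1
rho, _ = spearmanr(x, y)
print(rho)                # rank-order agreement of the log ratios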
ACKNOWLEDGMENTS
All MAQC participants freely donated their time and reagents for the completion and analysis of the MAQC project. Participants from the National Institutes of Health (NIH) were supported by the Intramural Research Program of NIH, Bethesda, Maryland. D.H. thanks Ian Korf for BLAST discussions. This study utilized a number of computing resources, including the high-performance computational capabilities of the Biowulf PC/Linux cluster at the NIH (http://biowulf.nih.gov/) as well as resources at the analysis sites.

DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA, the EPA and the NIH. This work has been approved for publication by these agencies, but it does not necessarily reflect official agency policy. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA, the EPA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).

Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Lesko, L.J. & Woodcock, J. Translation of pharmacogenomics and pharmacogenetics: a regulatory perspective. Nat. Rev. Drug Discov. 3, 763-769 (2004).
2. Frueh, F.W. Impact of microarray data quality on genomic data submissions to the FDA. Nat. Biotechnol. 24, 1105-1107 (2006).
3. Dix, D.J. et al. A framework for the use of genomics data at the EPA. Nat. Biotechnol. 24, 1108-1111 (2006).
4. Tan, P.K. et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 31, 5676-5684 (2003).
5. Ramalho-Santos, M., Yoon, S., Matsuzaki, Y., Mulligan, R.C. & Melton, D.A. "Stemness": transcriptional profiling of embryonic and adult stem cells. Science 298, 597-600 (2002).
6. Ivanova, N.B. et al. A stem cell molecular signature. Science 298, 601-604 (2002).
7. Miller, R.M. et al. Dysregulation of gene expression in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine-lesioned mouse substantia nigra. J. Neurosci. 24, 7445-7454 (2004).
8. Fortunel, N.O. et al. Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature". Science 302, 393; author reply 393 (2003).
9. Miklos, G.L. & Maleszka, R. Microarray reality checks in the context of a complex disease. Nat. Biotechnol. 22, 615-621 (2004).
10. Frantz, S. An array of problems. Nat. Rev. Drug Discov. 4, 362-363 (2005).
11. Marshall, E. Getting the noise out of gene arrays. Science 306, 630-631 (2004).
12. Michiels, S., Koscielny, S. & Hill, C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488-492 (2005).
13. Ein-Dor, L., Zuk, O. & Domany, E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc. Natl. Acad. Sci. USA 103, 5923-5928 (2006).
14. Petersen, D. et al. Three microarray platforms: an analysis of their concordance in profiling gene expression. BMC Genomics 6, 63 (2005).
15. Dobbin, K.K. et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin. Cancer Res. 11, 565-572 (2005).
16. Irizarry, R.A. et al. Multiple-laboratory comparison of microarray platforms. Nat. Methods 2, 345-350 (2005).
17. Larkin, J.E., Frank, B.C., Gavras, H., Sultana, R. & Quackenbush, J. Independence and reproducibility across microarray platforms. Nat. Methods 2, 337-344 (2005).
18. Kuo, W.P. et al. A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat. Biotechnol. 24, 832-840 (2006).
19. Shi, L. et al. QA/QC: challenges and pitfalls facing the microarray community and regulatory agencies. Expert Rev. Mol. Diagn. 4, 761-777 (2004).
20. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl. 2), S12 (2005).
21. Ji, H. & Davis, R.W. Data quality in genomics and microarrays. Nat. Biotechnol. 24, 1112-1113 (2006).
22. Canales, R.D. et al. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat. Biotechnol. 24, 1115-1122 (2006).
23. Shippy, R. et al. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 24, 1123-1131 (2006).
24. Patterson, T.A. et al. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. 24, 1140-1150 (2006).
25. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132-1139 (2006).
26. Guo, L. et al. Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat. Biotechnol. 24, 1162-1169 (2006).
27. Mecham, B.H. et al. Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements. Nucleic Acids Res. 32, e74 (2004).
28. Carter, S.L., Eklund, A.C., Mecham, B.H., Kohane, I.S. & Szallasi, Z. Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements. BMC Bioinformatics 6, 107 (2005).
29. Draghici, S., Khatri, P., Eklund, A.C. & Szallasi, Z. Reliability and reproducibility issues in DNA microarray measurements. Trends Genet. 22, 101-109 (2006).
30. Irizarry, R.A., Wu, Z. & Jaffee, H.A. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22, 789-794 (2006).
31. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501-D504 (2005).
32. Thierry-Mieg, D. & Thierry-Mieg, J. AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol. 7 (Suppl. 1), S12 (2006).
33. Bammler, T. et al. Standardizing global gene expression analysis between laboratories and across platforms. Nat. Methods 2, 351-356 (2005).
34. Harr, B. & Schlotterer, C. Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons. Nucleic Acids Res. 34, e8 (2006).
35. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116-5121 (2001).
36. Thompson, K.L. et al. Use of a mixed tissue RNA design for performance assessments on multiple microarray formats. Nucleic Acids Res. 33, e187 (2005).
37. Baker, S.C. et al. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731-734 (2005).
38. Reid, L.H. The value of a proficiency testing program to monitor performance in microarray laboratories. Pharm. Discov. 5, 20-25 (2005).
39. Ball, C.A. et al. Standards for microarray data. Science 298, 539 (2002).
40. Tong, W. et al. ArrayTrack–supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819-1826 (2003).
41. Tong, W. et al. Development of public toxicogenomics software for microarray data management and analysis. Mutat. Res. 549, 241-253 (2004).
AUTHORS
The following authors contributed to project leadership: Leming Shi1, Laura H Reid2, Wendell D Jones2, Richard Shippy3, Janet A Warrington4, Shawn C Baker5, Patrick J Collins6, Francoise de Longueville7, Ernest S Kawasaki8, Kathleen Y Lee9, Yuling Luo10, Yongming Andrew Sun9, James C Willey11, Robert A Setterquist12, Gavin M Fischer13, Weida Tong1, Yvonne P Dragan1, David J Dix14, Felix W Frueh15, Federico M Goodsaid15, Damir Herman16, Roderick V Jensen17, Charles D Johnson18, Edward K Lobenhofer19, Raj K Puri20, Uwe Scherf21, Jean Thierry-Mieg16, Charles Wang22, Mike Wilson12,18, Paul K Wolber6, Lu Zhang9,23, William Slikker, Jr1
Project leader: Leming Shi1
Manuscript preparation team leader: Laura H Reid2
MAQC Consortium: Leming Shi1, Laura H Reid2, Wendell D Jones2, Richard Shippy3, Janet A Warrington4, Shawn C Baker5, Patrick J Collins6, Francoise de Longueville7, Ernest S Kawasaki8, Kathleen Y Lee9, Yuling Luo10, Yongming Andrew Sun9, James C Willey11, Robert A Setterquist12, Gavin M Fischer13, Weida Tong1, Yvonne P Dragan1, David J Dix14, Felix W Frueh15, Federico M Goodsaid15, Damir Herman16, Roderick V Jensen17, Charles D Johnson18, Edward K Lobenhofer19, Raj K Puri20, Uwe Scherf21, Jean Thierry-Mieg16, Charles Wang22, Mike Wilson12,18, Paul K Wolber6, Lu Zhang9,23, Shashi Amur15, Wenjun Bao24, Catalin C Barbacioru9, Anne Bergstrom Lucas6, Vincent Bertholet7, Cecilie Boysen25, Bud Bromley25, Donna Brown26, Alan Brunner3, Roger Canales9, Xiaoxi Megan Cao27, Thomas A Cebula28, James J Chen1, Jing Cheng29, Tzu-Ming Chu24, Eugene Chudin5, John Corson6, J Christopher Corton14, Lisa J Croner30, Christopher Davies4, Timothy S Davison18, Glenda Delenstarr6, Xutao Deng22, David Dorris12, Aron C Eklund17, Xiao-hui Fan1, Hong Fang27, Stephanie Fulmer-Smentek6, James C Fuscoe1, Kathryn Gallagher31, Weigong Ge1, Lei Guo1, Xu Guo4, Janet Hager32, Paul K Haje33, Jing Han20, Tao Han1, Heather C Harbottle34, Stephen C Harris1, Eli Hatchwell35, Craig A Hauser36, Susan Hester14, Huixiao Hong27, Patrick Hurban19, Scott A Jackson28, Hanlee Ji37, Charles R Knight38, Winston P Kuo39, J Eugene LeClerc28, Shawn Levy40, Quan-Zhen Li41, Chunmei Liu4, Ying Liu42, Michael J Lombardi17, Yunqing Ma10, Scott R Magnuson43, Botoul Maqsodi10, Tim McDaniel4, Nan Mei1, Ola Myklebost44, Baitang Ning1, Natalia Novoradovskaya13, Michael S Orr15, Terry W Osborn38, Adam Papallo17, Tucker A Patterson1, Roger G Perkins27, Elizabeth H Peters38, Ron Peterson45, Kenneth L Philips19, P Scott Pine15, Lajos Pusztai46, Feng Qian27, Hongzu Ren14, Mitch Rosen14, Barry A Rosenzweig15, Raymond R Samaha9, Mark Schena33, Gary P Schroth23, Svetlana Shchegrova6, Dave D Smith47, Frank Staedtler45, Zhenqiang Su1, Hongmei Sun27, Zoltan Szallasi48, Zivana Tezak21, Danielle Thierry-Mieg16, Karol L Thompson15, Irina Tikhonova32, Yaron Turpaz4, Beena Vallanat14, Christophe Van7,
Stephen J Walker49, Sue Jane Wang15, Yonghong Wang8, Russ Wolfinger24, Alex Wong6, Jie Wu27, Chunlin Xiao9, Qian Xie27, Jun Xu22, Wen Yang10, Liang Zhang29, Sheng Zhong50, Yaping Zong51, William Slikker, Jr1
Scientific management (National Center for Toxicological Research, US Food and Drug Administration): Leming Shi, Weida Tong, Yvonne P. Dragan, William Slikker, Jr.
Affiliations:
1National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA; 2Expression Analysis, Inc., 2605 Meridian Parkway, Durham, North Carolina 27713, USA; 3GE Healthcare, 7700 S. River Parkway, Suite 2603, Tempe, Arizona 85284, USA; 4Affymetrix, Inc., 3420 Central Expressway, Santa Clara, California 95051, USA; 5Illumina, Inc., 9885 Towne Centre Drive, San Diego, California 92121, USA; 6Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, California 95051, USA; 7Eppendorf Array Technologies, rue du Séminaire 20a, 5000 Namur, Belgium; 8NCI Advanced Technology Center, 8717 Grovemont Circle, Bethesda, Maryland 20892, USA; 9Applied Biosystems, 850 Lincoln Centre Drive, Foster City, California 94404, USA; 10Panomics, Inc., 6519 Dumbarton Circle, Fremont, California 94555, USA; 11Medical University of Ohio, 3000 Arlington Avenue, Toledo, Ohio 43614, USA; 12Ambion, An Applied Biosystems Business, 2130 Woodward Street, Austin, Texas 78744, USA; 13Stratagene Corp., 11011 North Torrey Pines Road, La Jolla, California 92130, USA; 14Office of Research and Development, US Environmental Protection Agency, 109 TW Alexander Drive, Research Triangle Park, North Carolina 27711, USA; 15Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA; 16National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, USA; 17University of Massachusetts-Boston, 100 Morrissey Boulevard, Boston, Massachusetts 02125, USA; 18Asuragen, Inc., 2150 Woodward, Austin, Texas 78744, USA; 19Cogenics, A Division of Clinical Data, Inc., 100 Perimeter Park Drive, Suite C, Morrisville, North Carolina 27560, USA; 20Center for Biologics Evaluation and Research, US Food and Drug Administration, 29 Lincoln Drive, Bethesda, Maryland 20892, USA; 21Center for Devices and Radiological Health, US Food and Drug Administration, 2098 Gaither Road, Rockville, Maryland 20850, USA; 22UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, California 90048, USA; 23Solexa, Inc., 25861 Industrial Boulevard, Hayward, California 94545, USA; 24SAS Institute, Inc., 100 SAS Campus Drive, Cary, North Carolina 27513, USA; 25Vialogy Corp., 2400 Lincoln Avenue, Altadena, California 91001, USA; 26Operon Biotechnologies, 2211 Seminole Drive, Huntsville, Alabama 35805, USA; 27Z-Tech Corp., 3900 NCTR Road, Jefferson, Arkansas 72079, USA; 28Center for Food Safety and Applied Nutrition, US Food and Drug Administration, 8401 Muirkirk Road, Laurel, Maryland 20708, USA; 29CapitalBio Corp., 18 Life Science Parkway, Changping District, Beijing 102206, China; 30Biogen Idec, 5200 Research Place, San Diego, California 92122, USA; 31US Environmental Protection Agency, Office of the Science Advisor, 1200 Pennsylvania Avenue, NW, Washington, DC 20460, USA; 32Yale University, W.M. Keck Biotechnology Resource Laboratory, Microarray Resource, 300 George Street, New Haven, Connecticut 06511, USA; 33TeleChem ArrayIt, 524 E. Weddell Drive, Sunnyvale, California 94089, USA; 34Center for Veterinary Medicine, US Food and Drug Administration, 8401 Muirkirk Road, Laurel, Maryland 20708, USA; 35Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard, Woodbury, New York 11797, USA; 36Burnham Institute, 10901 North Torrey Pines Road, La Jolla, California 92037, USA; 37Stanford University School of Medicine, 318 Campus Drive, Stanford, California 94305, USA; 38Gene Express, Inc., 975 Research Drive, Toledo, Ohio 43614, USA; 39Harvard School of Dental Medicine, Department of Developmental Biology, 188 Longwood Avenue, Boston, Massachusetts 02115, USA; 40Vanderbilt University, 465 21st Avenue South, Nashville, Tennessee 37232, USA; 41University of Texas Southwestern Medical Center, 6000 Harry Hines Boulevard/ND6.504, Dallas, Texas 75390, USA; 42University of Texas at Dallas, Department of Computer Science, MS EC31, Richardson, Texas 75083, USA; 43GenUs BioSystems, Inc., 1808 Janke Drive, Unit M, Northbrook, Illinois 60062, USA; 44Norwegian Microarray Consortium, Rikshospitalet-Radiumhospitalet Health Centre, Montebello, N-0310 Oslo, Norway; 45Novartis, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA; 46MD Anderson Cancer Center, Breast Medical Oncology Department-Unit 1354, 1155 Pressler Street, Houston, Texas 77230, USA; 47Luminex Corp., 12212 Technology Boulevard, Austin, Texas 78727, USA; 48Harvard Medical School, Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Boston, Massachusetts 02115, USA; 49Wake Forest University School of Medicine, Department of Physiology and Pharmacology, Medical Center Boulevard, Winston-Salem, North Carolina 27157, USA; 50University of Illinois at Urbana-Champaign, Department of Bioengineering, 1304 W. Springfield Avenue, Urbana, Illinois 61801, USA; 51Full Moon Biosystems, Inc., 754 N. Pastoria Avenue, Sunnyvale, California 94085, USA.
Rat toxicogenomic study reveals analytical consistency across microarray platforms Lei Guo1, Edward K Lobenhofer2, Charles Wang3, Richard Shippy4, Stephen C Harris1, Lu Zhang5, Nan Mei1, Tao Chen1, Damir Herman6, Federico M Goodsaid7, Patrick Hurban2, Kenneth L Phillips2, Jun Xu3, Xutao Deng3, Yongming Andrew Sun8, Weida Tong1, Yvonne P Dragan1 & Leming Shi1 To validate and extend the findings of the MicroArray Quality Control (MAQC) project, a biologically relevant toxicogenomics data set was generated using 36 RNA samples from rats treated with three chemicals (aristolochic acid, riddelliine and comfrey) and each sample was hybridized to four microarray platforms. The MAQC project assessed concordance in intersite and cross-platform comparisons and the impact of gene selection methods on the reproducibility of profiling data in terms of differentially expressed genes using distinct reference RNA samples. The real-world toxicogenomic data set reported here showed high concordance in intersite and cross-platform comparisons. Further, gene lists generated by fold-change ranking were more reproducible than those obtained by t-test P value or Significance Analysis of Microarrays. Finally, gene lists generated by fold-change ranking with a nonstringent P-value cutoff showed increased consistency in Gene Ontology terms and pathways, and hence the biological impact of chemical exposure could be reliably deduced from all platforms analyzed.
To validate and extend the findings of the MAQC project1, described elsewhere in this issue, we generated a toxicogenomics data set using a rat chemical exposure study. One of the objectives of the MAQC project was to assess the reproducibility of gene expression profiling data across laboratories and platforms. Analysis of the MAQC data set shows the high reproducibility of microarray data under well-controlled conditions and further indicates that the criteria used to define differentially expressed genes can have a dramatic impact on the overlap of the resulting gene lists. In particular, lists of differentially expressed genes generated using fold change, rather than t-test P value, for gene selection have previously been proposed to be more reproducible1,2. The two RNA samples used in the MAQC project were reference samples with no explicit biological connection: the Stratagene Universal Human Reference RNA (comprising RNA from ten different cell lines) and the Ambion Human Brain Reference RNA1. The availability of these data provides an invaluable resource for benchmarking laboratory performance and for testing and validating new procedures, equipment and reagents, for example. Although data from these reference samples address technical performance and reproducibility of results from microarray technology, they cannot address whether microarray data from different laboratories or platforms would result in the same biological interpretation of real-world
samples. We therefore sought to apply the findings of the MAQC study to a set of experimental toxicogenomic data to validate the approach. Several recent publications have investigated the genotoxicity of three botanical carcinogens: aristolochic acid, riddelliine and comfrey3-6. In the present study, 36 RNA samples were isolated from the kidney and/or liver of rats exposed to one of these compounds or from vehicle controls. To corroborate the findings of the MAQC project and to determine whether the same biological interpretations would result from cross-platform comparisons, we hybridized these samples to four commercially available platforms (Affymetrix, Agilent, Applied Biosystems and GE Healthcare). To address intersite performance, we used the Affymetrix platform at two different test sites. The results from this study are consistent with those of the MAQC project in that good concordance is found between data generated at different sites, as well as from different platforms. Furthermore, when fold-change ranking is used as the primary criterion for selecting differentially expressed genes, the overlap between gene lists from different laboratories using either the same or different platforms is high. In contrast, when a t-statistic (P-value) ranking is used as the primary criterion, the cross-site or cross-platform overlap is substantially lower1,2. The selection criteria for differential expression can thus affect the apparent reproducibility of microarray data as well as the biological interpretation of the data.
1National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA; 2Cogenics, A Division of Clinical Data, 100 Perimeter Park Drive, Suite C, Morrisville, North Carolina 27560, USA; 3UCLA David Geffen School of Medicine, Transcriptional Genomics Core, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Los Angeles, California 90048, USA; 4GE Healthcare, 7700 S. River Parkway, Suite #2603, Tempe, Arizona 85284, USA; 5Solexa, 25861 Industrial Boulevard, Hayward, California 94545, USA; 6National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, USA; 7Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA; 8Applied Biosystems, 850 Lincoln Centre Drive, Foster City, California 94404, USA. Correspondence should be addressed to L.G. ([email protected]) or L.S. ([email protected]).
Received 5 June; accepted 18 July; published online 8 September 2006; doi:10.1038/nbt1238
Figure 1 Hierarchical clustering of platform-specific microarray data separates samples by tissue and treatment. For each platform, the log2 intensity data from all 36 microarrays after filtering for genes flagged as below the detection level were hierarchically clustered using an average linkage algorithm and Euclidean distance as the distance metric. (a) Data from the Applied Biosystems platform (ABI). (b) Affymetrix site 1 (AFX). (c) Agilent (AG1). (d) GE Healthcare (GEH). The sample labels are colored based on treatment/tissue group. Black, control kidney; purple, aristolochic acid–treated kidney; blue, control liver; red, aristolochic acid–treated liver; orange, riddelliine-treated liver; green, comfrey-treated liver.
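The clustering step itself is straightforward to reproduce in outline. The following is a minimal sketch, assuming a samples-by-genes matrix of log2 intensities already filtered for below-detection genes (the synthetic data and names are ours, not the study's):

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(2)
log2_intensity = rng.normal(8.0, 2.0, size=(36, 5000))   # stand-in for filtered data

# Average linkage with a Euclidean distance metric, as in the caption above.
Z = linkage(log2_intensity, method="average", metric="euclidean")
tree = dendrogram(Z, no_plot=True)   # set no_plot=False (with matplotlib) to draw the tree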
By using fold-change ranking plus a nonstringent P-value cutoff, the overlap of differentially expressed gene lists is increased, leading to improved agreement of the biological interpretation of the data in terms of enriched Gene Ontology (GO) nodes and pathways. Furthermore, data generated by this approach led to novel biological findings concerning chemical exposure. These findings are reproducible across laboratories and platforms when the preferred gene selection criteria are used. Together, these results further support the findings of the MAQC project, highlight the importance of appropriate data analysis procedures and demonstrate that microarray data generated from different platforms not only result in similar biological interpretation, but also reveal novel findings.

RESULTS
RNA was isolated from the target organs of rats exposed to aristolochic acid, riddelliine or comfrey, from studies that have been detailed previously3-6. In total there were six treatment/tissue groups: kidney from aristolochic acid–treated rats, kidney from vehicle
control, liver from aristolochic acid–treated rats, liver from riddelliine-treated rats, liver from comfrey-treated rats and liver from vehicle control. Within each treatment/tissue group there were six biological replicates. Aliquots of these samples were prepared and distributed to each of the test sites for gene expression profiling using microarrays from four different platforms. Laboratory procedures were identical to those in the MAQC project1. Unless otherwise stated, the platform manufacturer's recommendations were used for data processing.

Hierarchical clustering analysis
To assess the overall reproducibility of microarray data from the four platforms, we performed hierarchical clustering analyses for each platform. Within each platform, samples were largely clustered first by tissue type and then by treatment (Fig. 1). Within each platform there are individual samples that did not cluster with the other members of their respective treatment/tissue group; however, the only sample that was consistently different across all platforms was sample no. 4 from the aristolochic acid–treated kidney
Table 1 Average Pearson correlation coefficients of log2-normalized intensity data for each treatment/tissue group

Test site | No. of probe(set)s | Aristolochic acid kidney(a) | Control kidney | Aristolochic acid liver | Comfrey liver | Control liver | Riddelliine liver
Applied Biosystems (ABI) | 26,857 | 0.9586 (0.9623) | 0.9742 | 0.9636 | 0.9737 | 0.9634 | 0.9705
Affymetrix no. 1 (AFX) | 31,099 | 0.9748 (0.9828) | 0.9881 | 0.9871 | 0.9861 | 0.9876 | 0.9867
Affymetrix no. 2 (AFX2) | 31,099 | 0.9736 (0.9818) | 0.9879 | 0.9860 | 0.9827 | 0.9862 | 0.9836
Agilent (AG1) | 41,071 | 0.9610 (0.9711) | 0.9701 | 0.9642 | 0.9659 | 0.9740 | 0.9675
GE Healthcare (GEH) | 35,129 | 0.9697 (0.9739) | 0.9761 | 0.9690 | 0.9690 | 0.9687 | 0.9734

(a) Numbers in parentheses represent data after excluding sample no. 4.
group. Similar results were obtained using principal components analysis (Supplementary Fig. 1 online). DNA adduct data indicate that this sample had 50% fewer DNA adducts compared to the other five animals in the same treatment group (Mei, N. et al.,
unpublished data), suggesting that the consistent failure of this sample to cluster with its treatment/tissue group may be biologically based. It was also determined that aristolochic acid–treated liver samples showed a relatively small difference in expression profiles when compared to their tissue-matched control group. This result was reproduced across all platforms and is consistent with previous observations that kidney, not liver, is the target organ of aristolochic acid–mediated carcinogenesis7. The reproducibility of the microarray data was further explored by calculating the Pearson correlation coefficients of the log2 intensity data for all pair-wise sample comparisons within a treatment/tissue group for each platform. Table 1 shows the average correlation of biological replicates within each treatment/tissue group for each platform and further demonstrates the high degree of similarity of these data. Because of the presence of an animal that had a diminished treatment response, the aristolochic acid–treated kidney group had a significantly lower correlation, as expected, compared to other groups (e.g., P = 0.0024, two-sided, paired t-test compared to the control kidney group). Removal of sample no. 4 from the aristolochic acid–treated kidney group resulted in a less significant difference (P = 0.085, two-sided, paired t-test compared to the control kidney group). These data, coupled with the DNA adduct data, consistently indicate that sample no. 4 from the aristolochic acid–treated kidney group has a different response relative to the other group members. Therefore, for the assessment of cross-platform data consistency, the data from this sample have been excluded.

Overlap of differentially expressed gene lists across sites
One of the fundamental goals of a gene expression profiling experiment is to identify those genes that are differentially expressed within the system being studied. There are a large number of methods for selecting such genes, and ultimately, the genes that are identified have a fundamental impact on the biological interpretation of the data. Therefore, this toxicogenomics study was used to validate the findings in regard to gene selection methods by employing different selection criteria and determining the percentage of overlap between different laboratories or platforms1,2. The overlap across the two sites that generated data using the Affymetrix platform is high (85-90%) when the genes (from a few up to ~2,000) are selected by rank ordering the genes based on fold change (Supplementary Fig. 2 online). As more genes are considered differentially expressed (that is, moving to the right on the x-axis), the percentage of overlap begins to decline because of the inclusion of more genes demonstrating smaller fold changes, which are less likely to be reproducible across sites.
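In outline, the overlap metric behind these curves can be sketched as follows (illustrative Python with synthetic fold changes; the study computed per-site fold changes from the real treatment and control groups):

import numpy as np

def overlap_pct(fc_x, fc_y, n):
    # Percentage of genes shared by the top-n lists when genes are rank
    # ordered by absolute log2 fold change at each site.
    top_x = set(np.argsort(-np.abs(fc_x))[:n])
    top_y = set(np.argsort(-np.abs(fc_y))[:n])
    return 100.0 * len(top_x & top_y) / n

rng = np.random.default_rng(3)
true_fc = rng.normal(0.0, 1.0, 10000)            # shared underlying effect
fc_x = true_fc + rng.normal(0.0, 0.2, 10000)     # site-specific noise
fc_y = true_fc + rng.normal(0.0, 0.2, 10000)
for n in (100, 1000, 5000):
    print(n, round(overlap_pct(fc_x, fc_y, n), 1))   # overlap as a function of list size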
Figure 2 Hierarchical clustering of all individual sample data from all microarray platforms separated by tissue and treatment. Within each platform/site, a fold change was calculated and log2 transformed for all 5,112 common genes that did not have any missing values (n = 4,609) for each of the 24 treated individual samples compared to a tissue-matched control. These values were then hierarchically clustered using a Euclidean distance metric and average linkage. Each row represents the results from an individual treated animal assayed on a particular platform. Each row is labeled with a platform designation first, followed by the organ assayed for kidney samples, and then the treatment and unique animal identifier (1-6). ABI, Applied Biosystems platform; AFX, Affymetrix site 1; AFX2, Affymetrix site 2; AG1, Agilent; and GEH, GE Healthcare. K, kidney; AA, aristolochic acid; RDL, riddelliine; and CFY, comfrey. The yellow boxes highlight areas in which replicates of the same sample across multiple platforms and/or sites have clustered together.
There is a small decrease in the overlap when a P-value cutoff of 0.01 or 0.05 is applied to the fold change–based gene-selection methods. This results from the P-value threshold altering the composition of the total list of genes such that each test site has a different list of genes to begin with in the gene selection process, thereby increasing the intersite inconsistency. Supplementary Figure 2 also illustrates the overlap when genes are selected based on P-value rank ordering alone or with a fold-change criterion of 2.0 or 1.4. For P value–based gene-selection methods, the overlap gradually increases as the number of differentially expressed genes increases. An increase in the overlap is also observed when a fold-change cutoff of 1.4 or 2.0 is applied in conjunction with the P-value criterion. This is understandable, since larger fold changes are more easily reproduced than smaller ones. The impact of different normalization methods on the overlap of gene lists was also assessed by comparing the overlap of gene lists derived from two normalization methods using the same gene selection method on the same sample pair comparison from data generated at the same test site (Supplementary Fig. 3 online). When P value is used as the criterion for gene selection, the overlap from different normalization methods is relatively low. However, when genes are ranked and selected based on fold change with or without a P-value cutoff, the overlap between different normalization methods is very high (>90%). Furthermore, global scaling methods do not alter the rank order of genes based on fold change (hence the gene lists); therefore, the overlap between raw, mean- or median-scaled data is 100% when using the fold change for ranking and selecting genes. However, these scaling factors can affect the magnitude of the fold changes and the P values and thus will only affect the gene list when a P-value criterion is involved in gene selection. Our results are consistent with those reported elsewhere8. In addition to the standard t-test, numerous different statistical tests have been used for the identification of differentially expressed genes9. One commonly used method is Significance Analysis of Microarrays (SAM)10. Supplementary Figure 4 online illustrates the intersite concordance results of differentially expressed genes selected based on fold-change ranking, SAM, t-test and random selection when the data from the comfrey-treated liver samples are compared to their corresponding controls. The site-site concordance based on SAM was clearly improved over that based on a simple t-test, but did not achieve the same level of concordance as that reached based on fold-change ranking. Similar results were obtained when other sample pairs or cross-platform data were analyzed in the same manner (data not shown). Cumulatively, these results illustrate that fold change–based selection methods usually offer a higher level of consistency of lists of differentially expressed genes.
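The rank-invariance argument for global scaling can be checked with a small worked example (ours, with arbitrary scale factors): multiplying each array by a constant shifts every log2 ratio by the same amount, leaving the fold-change ordering intact.

import numpy as np

rng = np.random.default_rng(4)
treated = rng.lognormal(8.0, 1.0, 500)
control = rng.lognormal(8.0, 1.0, 500)

fc = np.log2(treated / control)
fc_scaled = np.log2((2.5 * treated) / (1.7 * control))   # arbitrary per-array scale factors

# Every log ratio shifts by the same constant, log2(2.5/1.7), so the
# fold-change rank order (and any rank-based gene list) is unchanged.
assert np.array_equal(np.argsort(fc), np.argsort(fc_scaled))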
Figure 3 Intralaboratory overlap of differentially expressed gene lists generated using different selection criteria. For each platform, the liver control and comfrey treatment groups were equally and randomly divided into two experiments and the differentially expressed genes were identified independently from the two experiments using different gene selection criteria. Differentially expressed genes were selected from a subset of genes that are detectable by both experiments. The x-axis represents the number of genes selected as differentially expressed, and the y-axis represents the overlap (%) of two gene lists for a given number of differentially expressed genes. Each line on the graph represents the intralaboratory overlap of differentially expressed gene lists based on one of six different gene ranking/selection methods. Red, fold-change rank ordering only; orange, P-value rank ordering only; light green, fold-change rank ordering and P < 0.01; blue, fold-change rank ordering and P < 0.05; teal, P-value rank ordering and fold change >1.4; and purple, P-value rank ordering and fold change >2.0. (a) Applied Biosystems (ABI). (b) Affymetrix site 1 (AFX). (c) Affymetrix site 2 (AFX2). (d) Agilent (AG1). (e) GE Healthcare (GEH).
Figure 4 Intralaboratory overlap of enriched GO terms. The control and treatment groups were equally and randomly divided into two experiments. From each experiment, the top 200 genes based on either a fold change (blue line) or P value (pink line) ranking were selected. The GO terms associated with these genes were then rank ordered and the overlap between the two experiments was identified and graphed to compare the percentage of overlap (y-axis) against the total number of GO terms present in both experiments. The results depicted are derived from the comfrey-treated comparisons for each platform, but similar results were generated with the other treatment comparisons. (a) Applied Biosystems (ABI). (b) Affymetrix site 1 (AFX). (c) Affymetrix site 2 (AFX2). (d) Agilent (AG1). (e) GE Healthcare (GEH).
Overlap of differentially expressed gene lists across platforms
To assess the reproducibility of data across multiple microarray platforms, we identified the list of genes that was measured by all four of the microarray platforms using the March 2006 version of the RefSeq database and the methods described by the MAQC project1. This resulted in the identification of 5,112 common genes, which were used in all subsequent cross-platform comparisons. Consistent with results from intersite comparisons (Supplementary Fig. 2 online), the cross-platform data comparisons reveal the same trends. Specifically, the percentage of overlap for differentially expressed gene lists is highest when fold change–based gene selection methods are used (Supplementary Fig. 5a online). Not surprisingly, the cross-platform overlap is higher (~80%) in all instances when genes that are not reproducibly detected on the microarrays are omitted (e.g., those probes that are flagged as 'not present') (Supplementary Fig. 5b online). These results, combined with the intersite results, further corroborate the findings of the MAQC project that fold change–based selection criteria for differentially expressed genes generate more reproducible results1,2. No measure of sensitivity or specificity of the approach was included in the analysis. Within each platform/site, the fold change was calculated for all 5,112 common genes that did not have any missing values (n = 4,609) for each of the 24 treated individual samples compared to a tissue-matched control, and these values were then hierarchically clustered (Fig. 2). The resulting dendrogram illustrates that the samples are separated by tissue and then by treatment. Each of the four major branches of the dendrogram contains all of the biological replicate data for a given treatment/tissue group regardless of the site or platform that was used to generate the data. Within each of these branches, the platform, as opposed to the biological replicate, is the next major division. There are a few notable exceptions to this observation. When the same platform is performed at different test sites, the replicates of the same sample assayed at different sites cluster more closely together. In a few instances, the results from multiple different platforms for the same biological sample cluster together (e.g., aristolochic acid–treated liver sample no. 5). Because no gene selection criteria were used to generate this visualization, these results further indicate that interlaboratory and cross-platform data are highly reproducible.
Agreement of biological interpretation with GO and pathways
Typically, a microarray-based experiment is performed in a single laboratory using a single platform. Furthermore, it is relatively common to use three biological replicates in a toxicogenomic study when multiple groups of samples are involved. To explore whether or not a similar biological response was obtained when comparing results within a given laboratory, we generated data from six biological replicates. The control and treatment groups were then equally and randomly divided into two artificial experiments. Consistent with the interlaboratory and cross-platform results, the overlap of differentially expressed genes using different gene selection criteria from the intralaboratory results revealed the same trend, namely that fold change–based selection criteria generate more reproducible results (Fig. 3). For each of the ABI, AFX and AFX2 intralaboratory comparisons, the overlap of gene lists was almost identical with or without a P cutoff (<0.05) for up to ~1,000 genes selected as differentially expressed; for AG1 and GEH, the use of a P cutoff (<0.05) slightly increased the overlap of gene lists. However, the use of a more stringent P cutoff (<0.01) decreased the overlap of gene lists. These intralaboratory comparison results are consistent with those of the interlaboratory comparisons (Supplementary Fig. 2 online). Therefore, a modest P cutoff (<0.05) appeared to be reasonable for data sets of this small sample size (n = 3). Furthermore, the use of a fold-change threshold increased the overlap of gene lists derived from P-value ranking; a more stringent fold-change threshold leads to higher overlap of gene lists (Fig. 3 and Supplementary Fig. 2 online). The differences in overlap of gene lists based on selection criteria were further investigated by assessing the impact on the associated GO terms. From each artificial experiment, the top 200 genes based on either a fold-change (with P < 0.05 cutoff) or P-value ranking were selected. The P value from Fisher's exact test was calculated for each GO term associated with these genes. For each artificial experiment, the GO terms were then rank-ordered based on the P value. The overlap between the two artificial experiments was determined by dividing the number of GO terms commonly meeting a P-value ranking criterion in both of the artificial experiments by the total number of GO terms meeting the P-value criterion for either experiment. Figure 4 illustrates the percentage of overlapping GO terms plotted against a defined number of the highest-ranking GO terms from both experiments. Clearly, the overlap of GO terms was much higher when genes are selected by fold change compared to those selected by P value. Similar results were obtained when the gene lists were mapped to KEGG pathways (Fig. 5) or other pathway databases (e.g., Ingenuity) (data not shown). These results clearly show that common biological responses are evident when genes are selected by criteria that lead to reproducible gene lists. Nonoverlapping lists of differentially expressed genes generally lead to inconsistent biological interpretation of microarray results in terms of GO terms and pathways.
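In outline, the enrichment test and the overlap measure defined above can be sketched as follows (illustrative Python over toy gene sets; the study performed the per-term enrichment analysis with Fisher's exact test):

from scipy.stats import fisher_exact

def go_term_p(selected, background, term_genes):
    # 2x2 table: selected vs. not selected, crossed with in-term vs. not,
    # over the common-gene background; returns the Fisher's exact P value.
    a = len(selected & term_genes)
    b = len(selected) - a
    c = len((background - selected) & term_genes)
    d = len(background - selected) - c
    return fisher_exact([[a, b], [c, d]])[1]

def go_term_overlap(terms_1, terms_2):
    # Overlap = terms meeting the criterion in both experiments divided by
    # the terms meeting it in either experiment, as defined in the text.
    return 100.0 * len(terms_1 & terms_2) / len(terms_1 | terms_2)

# Toy example: 20 selected genes from a background of 100, one GO term.
bg = set(range(100))
sel = set(range(20))
term = set(range(10)) | {50, 60}
print(go_term_p(sel, bg, term))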
to those selected by P value. Similar results were obtained when the gene lists are mapped to KEGG pathways (Fig. 5) or other pathway databases (e.g., Ingenuity) (data not shown). These results clearly show that common biological responses are evident when genes are selected by criteria that lead to reproducible gene lists. Nonoverlapping lists of differentially expressed genes generally lead to inconsistent biological interpretation of microarray results in terms of GO terms and pathways.
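The overlap metric defined above reduces to a simple set calculation. A minimal sketch, assuming two rank-ordered GO-term lists (the identifiers and rankings below are placeholders, not study results):

def go_overlap_percent(ranked_a, ranked_b, k):
    # Terms meeting the top-k criterion in both experiments, divided by
    # the terms meeting it in either experiment.
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    return 100.0 * len(top_a & top_b) / len(union) if union else 0.0

exp1 = ["GO:0006878", "GO:0006776", "GO:0008152", "GO:0006955"]
exp2 = ["GO:0006776", "GO:0006878", "GO:0006955", "GO:0007049"]
print(go_overlap_percent(exp1, exp2, 2))  # 100.0: the top-2 sets coincide
print(go_overlap_percent(exp1, exp2, 4))  # 60.0: 3 common terms / 5 in either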
Figure 5 Intralaboratory overlap of differentially enriched KEGG pathways. The control and treatment groups were equally and randomly divided into two experiments. From each experiment, the top 200 genes based on either a fold change (blue line) or P-value (pink line) ranking were selected. The KEGG pathways associated with these genes were then rank ordered and the overlap between the two experiments was identified and graphed to compare the percentage of overlap (y-axis) against the total number of KEGG pathways present in both experiments. The results depicted are derived from the comfrey-treated comparisons for each platform, but similar results were generated with the other treatment comparisons. (a) Applied Biosystems (ABI). (b) Affymetrix site 1 (AFX). (c) Affymetrix site 2 (AFX2). (d) Agilent (AG1). (e) GE Healthcare (GEH).
Agreement of biological response
To further explore the agreement of biological response across the microarray platforms, we used the cross-platform common gene list (5,112 genes) and compared data from the six comfrey-treated liver samples to data from the six control liver samples for each platform. A t-test was performed and genes with P < 0.05 were identified. This filtered gene set was then rank-ordered by fold change, and for each platform the top 250 up- and downregulated genes were selected, generating a list of the top 500 differentially expressed genes for each of the five platform/site combinations (the overlap in genes between the gene lists for any two platforms is >70%). A GO enrichment analysis was performed for each platform by comparing the content of the top 500 differentially expressed genes to the content of the 5,112 common gene list using Fisher's exact test in GoMiner11,12, resulting in an enrichment P value for each GO term. A comparison of P values across platforms identified 101 nodes that were significantly over- or underenriched (P < 0.05) in at least four of the five platforms, with nearly 60% of these terms being significant in all five platforms. Inspection of these enriched categories confirmed that the different microarray platforms were reporting the same biological responses in these samples, and also provided novel insight into the effects of comfrey exposure.
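The per-term enrichment test can be written out explicitly. The sketch below assumes set-valued inputs and a generic two-sided Fisher's exact test; GoMiner performs this calculation with additional bookkeeping, so this is an illustrative approximation rather than the tool's implementation:

from scipy.stats import fisher_exact

def go_enrichment_p(term_genes, selected, background):
    # 2x2 table: GO-term membership among the selected genes versus the
    # rest of the background list (all arguments are sets of gene IDs).
    rest = background - selected
    table = [[len(selected & term_genes), len(selected - term_genes)],
             [len(rest & term_genes), len(rest - term_genes)]]
    return fisher_exact(table)[1]  # two-sided P value

background = {f"gene{i}" for i in range(5112)}   # stands in for the common gene list
selected = {f"gene{i}" for i in range(500)}      # stands in for the top 500 genes
term = {"gene1", "gene7", "gene80", "gene4000"}  # hypothetical GO-term members
print(go_enrichment_p(term, selected, background))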
Comfrey is a perennial plant that has been widely used for over 2,000 years as an herbal medicine for a wide variety of ailments. However, comfrey has been shown to be both genotoxic and hepatotoxic13. The exact molecular mechanism underlying these toxicities is not fully understood, but it is known to be associated with the pyrrolizidine alkaloids present in comfrey, which can be metabolically activated and bind to DNA6,14. Considering that there are >350 different pyrrolizidine alkaloids found in over 6,000 different species15, it has been suggested that pyrrolizidine alkaloids are "probably the
most common poisonous plant constituents that poison livestock, wildlife, and humans, worldwide"14. Examination of the 101 significant GO terms revealed at least two that were noteworthy: copper ion homeostasis (GO:0006878) and vitamin A metabolism (GO:0006776). Dietary or medicinal exposure to several pyrrolizidine alkaloid–containing plants has been shown to result in decreased levels of vitamin A in the liver and increased liver levels of copper16–18, but there is no indication that these effects had previously been observed in response to comfrey exposure. These results suggest that comfrey influences copper and vitamin A levels similarly to other pyrrolizidine alkaloid–containing plants. Furthermore, these data are the first indication that changes in liver vitamin A and copper levels in response to pyrrolizidine alkaloid exposure are transcriptionally regulated. Interestingly, only four genes associated with copper ion homeostasis are present in the common gene list, and in all instances each platform identified the same two of these genes as significantly upregulated: amyloid beta (A4) precursor protein (APP) and prion protein (PRNP). Previously, both of these genes were shown to bind copper and to be upregulated in response to chronic copper exposure19–21. Cumulatively, these findings indicate that comfrey, like several other pyrrolizidine alkaloid–containing plants, may affect liver levels of vitamin A and copper. Importantly, these data demonstrate that different microarray platforms can consistently report novel biological findings at the level of biological processes and of individual genes.
DISCUSSION
In this study, a data set was created that could validate and extend the findings of the MAQC project by focusing on a biologically relevant set of samples. Specifically, a large toxicogenomics data set was generated using 36 RNA samples from rats treated with three chemicals and four commercial microarray platforms to investigate the agreement of intersite and cross-platform gene lists. When anywhere from a few to 2,000 genes are selected as differentially expressed from different sites using the same microarray platform, the percentage of overlap is ~85% based on a fold-change criterion for gene ranking and selection (Supplementary Fig. 2 online). A lower percentage of overlap is observed using P as the criterion for gene ranking and selection, in particular when fewer genes are selected as differentially expressed. This same trend was also observed when gene selection methods were compared across platforms using the subset of 5,112 common genes (Supplementary Fig. 5 online). In addition,
the widely used SAM approach did not achieve the high level of concordance generated by fold-change ranking (Supplementary Fig. 4 online). These results are also consistent with those based on the MAQC human samples and highlight the problems with commonly used gene selection methods that are based solely on t-test P values1,2. As expected, the degree of overlap of gene lists directly affects the ability to consistently identify the same biological response in terms of GO terms (Fig. 4) and KEGG pathways (Fig. 5). Therefore, to ensure reproducible biological interpretation of microarray results, it is important that the criteria for generating lists of differentially expressed genes are selected properly. The lack of overlap of lists of differentially expressed genes selected using a P-value criterion may be explained by the fact that fold change is calculated by comparing the signal intensity for a given gene as directly measured using a microarray, whereas the P-value calculation incorporates the signal-to-noise ratio. Therefore, if the signal intensity for the gene is more reproducible across laboratories or platforms than the associated noise level, fold change–based gene selection methods will be more reproducible. However, the impact of the proposed analysis method on two other parameters, sensitivity and specificity, will also have to be assessed before any final conclusions can be drawn regarding the generalizability of this approach. Sample size is another important factor that affects the concordance of lists of differentially expressed genes. It is interesting to compare the results of Figure 3 (AFX and AFX2) with those of Supplementary Figure 2b online: for the same microarray platform, one can observe an overall increase in the overlap of differentially expressed genes when six replicates from different laboratories are compared, as opposed to three replicates from within the same laboratory. This increase is observed despite the potential for interlaboratory variation, which would affect the six-replicate comparisons but not comparisons of three replicates within a laboratory. This demonstrates the gain in reliable detection of differential expression that comes with the increased statistical power of larger sample sizes. It is worth noting that differences between individual biological replicates also contribute to the relatively lower overlap observed in Figure 3. To illustrate the importance of using gene selection criteria that maximize overlap of gene lists, we first filtered the data (comfrey compared to control) using a relatively nonstringent P cutoff (< 0.05) and then rank-ordered the remaining genes by fold change. By selecting the top 250 up- and downregulated genes from each platform and performing a GO enrichment analysis, not only was the cross-platform reproducibility of GO terms demonstrated, but a novel biological finding was also revealed on all platforms and at all sites. Specifically, comfrey, like several other pyrrolizidine alkaloid–containing plants, affects liver levels of vitamin A and copper; furthermore, these changes are, at least in part, transcriptionally regulated.
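The contrast between the two ranking statistics can be made concrete with a small simulation. The toy data below mirror only the three-replicate design; the numbers are assumptions for illustration, not study data:

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
log_treated = rng.normal(8.0, 0.5, size=(5112, 3))  # 3 replicates per group
log_control = rng.normal(8.0, 0.5, size=(5112, 3))

# Fold change depends only on the mean signals...
fold_change = log_treated.mean(axis=1) - log_control.mean(axis=1)
# ...whereas the equal-variance t-test also divides by a per-gene noise estimate.
t_stat, p_values = ttest_ind(log_treated, log_control, axis=1)

rank_by_fc = np.argsort(-np.abs(fold_change))  # largest |log2 fold change| first
rank_by_p = np.argsort(p_values)               # smallest P value first
top200_fc, top200_p = set(rank_by_fc[:200]), set(rank_by_p[:200])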
Microarray technology has had a profound impact on biological research, partly owing to its ability to identify differentially expressed genes that may be used to develop potential biomarkers, elucidate molecular mechanisms and group similar samples based on gene signatures. Therefore, the reproducibility and reliability of the data from a study, and the choice of methods that lead to the identification of concordant lists of differentially expressed genes, are critical for biological interpretation. Concerns have been raised regarding the reliability of microarray results owing to the apparent lack of overlap of the lists of differentially expressed genes22–28. The results from this study suggest that the disappointingly low concordance reported in
some earlier publications can be attributed in large part to the practice of deriving differentially expressed gene lists by ranking genes solely on a statistical significance measure. Furthermore, these results demonstrate that microarray data generated from different platforms can not only result in a similar biological interpretation, but also reveal novel findings.
METHODS
Microarray processing. The in vivo portion of this study has been described in detail elsewhere3–6. Briefly, groups of six 6-week-old Big Blue rats were gavaged with riddelliine (1 mg/kg body weight) or aristolochic acid (10 mg/kg body weight) five times a week for 12 weeks, or Big Blue rats were fed a diet of 8% comfrey roots for 12 weeks. The animals were sacrificed after 12 weeks of treatment, and the tissues were isolated, frozen quickly in liquid nitrogen and stored at –80 °C. RNA was isolated from tissues of rats that had been exposed to aristolochic acid (liver and kidney), riddelliine (liver), comfrey (liver) or a control group (liver and kidney). There were six biological replicates for each treatment/tissue group, for a total of 36 samples. The samples were randomly labeled and each test site was provided an aliquot of each sample. To avoid potential confounding factors in experimental implementation, the identity of the RNA samples was withheld from the test sites until the data were submitted to FDA/NCTR. The sample ID, RNA Integrity Number, OD ratio, microarray ID and data file names are provided in Supplementary Table 1 online. Each of the RNA samples was labeled and hybridized to a microarray from one of four commercial platforms: Affymetrix (Rat Genome 230 2.0), Agilent (Whole Rat Genome Oligo Microarray, G4131A), Applied Biosystems (Rat Genome Survey Microarray) and GE Healthcare (Rat Whole Genome Bioarray, 300031). Except for Affymetrix, which was run at two independent test sites, each platform was used at a single test site, with 36 microarrays assaying the biological replicate RNA samples. The labeling and hybridizations were performed according to each manufacturer's recommendations, using methods detailed in the MAQC project1.
Data analysis. Unless otherwise stated, the manufacturer's recommended normalization methods were used: quantile normalization for Applied Biosystems, PLIER with an offset value of 16 for Affymetrix and median scaling for both Agilent and GE Healthcare1. To assess the impact of normalization methods on microarray results, we compared a limited number of commonly used normalization methods: raw, mean, median and quantile (Supplementary Fig. 3 online). The toxicogenomics data set generated in this study has also been used for the evaluation of microarray assay performance based on external RNA controls29. Six different gene selection methods were used: (i) fold-change rank ordering only; (ii) fold-change rank ordering with P < 0.01; (iii) fold-change rank ordering with P < 0.05; (iv) t-test P-value (assuming equal variance) rank ordering only; (v) P-value rank ordering with fold change >1.4; and (vi) P-value rank ordering with fold change >2.0. The percentage of overlapping genes between these differentially expressed gene lists was then calculated as described elsewhere1. ArrayTrack30 was used for GO and KEGG pathway mapping, whereas GO enrichment analyses were performed using High-Throughput GoMiner11,12.
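As an illustration of these selection methods, the sketch below implements fold-change rank ordering with an optional P cutoff and one common convention for the percentage of overlapping genes (shared genes divided by the list length); the exact MAQC formula is given in ref. 1, so the denominator convention and the inputs here are assumptions:

import numpy as np

def select_genes(genes, fold_change, p_values, n, p_cutoff=None):
    # Fold-change rank ordering, optionally restricted to genes with P < p_cutoff.
    fc = np.abs(np.asarray(fold_change, dtype=float))
    p = np.asarray(p_values, dtype=float)
    keep = np.arange(len(genes)) if p_cutoff is None else np.flatnonzero(p < p_cutoff)
    order = keep[np.argsort(-fc[keep])]
    return {genes[i] for i in order[:n]}

def percent_overlap(list_a, list_b, n):
    # Shared genes as a percentage of the list length n.
    return 100.0 * len(list_a & list_b) / n

genes = [f"g{i}" for i in range(10)]
site1 = select_genes(genes, [3.0, 2.5, 0.1, 2.0, 1.0, 0.2, 0.3, 1.5, 0.4, 0.5],
                     [0.01, 0.20, 0.50, 0.03, 0.04, 0.90, 0.80, 0.02, 0.70, 0.60],
                     n=3, p_cutoff=0.05)
site2 = select_genes(genes, [2.8, 2.6, 0.2, 2.2, 0.9, 0.1, 0.2, 1.4, 0.3, 0.6],
                     [0.02, 0.15, 0.40, 0.01, 0.06, 0.90, 0.70, 0.03, 0.80, 0.50],
                     n=3, p_cutoff=0.05)
print(percent_overlap(site1, site2, 3))  # 100.0 for these toy inputs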
Cross-platform sequence mapping to RefSeq. Probe sequences from each microarray platform were mapped onto the NCBI-curated rat RefSeq database from March 2006. The same mapping criteria as reported for the main MAQC study were used1. The primary mapping criterion is a perfect match between a probe sequence and the target transcript sequence: a probe perfectly matches a transcript provided that an identical sequence of length equal to the probe length is found anywhere on the transcript. The only exception to this rule is for the Affymetrix platform, in which a probe set is considered a perfect match to a transcript as long as 80% of the probes within the probe set (usually nine out of 11) perfectly match the same transcript. To simplify the cross-platform data analysis, a mapping table was generated with one probe per gene. Consistent with the MAQC main study1, if more than one probe from a platform perfectly matched the same gene, the probe closest to the 3′ UTR was selected, resulting in 5,204 common non-model RefSeq mRNAs (NMs) mapped across 5,112 common genes (Supplementary Table 2 online).
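The mapping rules translate directly into code. A minimal sketch with illustrative sequences (not actual probe or RefSeq data):

def probe_matches(probe, transcript):
    # Perfect match: an identical probe-length sequence anywhere on the transcript.
    return probe in transcript

def probeset_matches(probes, transcript, frac=0.8):
    # Affymetrix exception: a probe set maps if >=80% of its probes
    # (usually nine out of 11) perfectly match the same transcript.
    hits = sum(probe_matches(p, transcript) for p in probes)
    return hits >= frac * len(probes)

transcript = "ATGGCGTACGTTAGCGGATCCAAGTTTGGC"
print(probe_matches("TACGTTAGC", transcript))  # True
print(probeset_matches(["ATGGCG", "GGATCC", "TTTGGC", "AAAAAA"], transcript))  # False (3/4 < 0.8)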
Accession numbers. All data are available through GEO (series accession number: GSE5350), ArrayExpress (accession number: E-TABM-132), ArrayTrack (http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/) and the MAQC web site (http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/).
Note: Supplementary information is available on the Nature Biotechnology website.
ACKNOWLEDGMENTS
E.K.L., K.L.P. and P.H. acknowledge Agilent Technologies, Inc. and Affymetrix, Inc. for their material contributions to this work, thank John Pufky, Stephen Burgin and Jennifer Troehler for their outstanding technical assistance, and gratefully acknowledge the Advanced Technology Program of the National Institute of Standards and Technology, whose generous support provided partial funding of this research (70NANB2H3009). C.W. acknowledges Affymetrix, Inc. for material contributions to this work. R.S. acknowledges technical support of Alan Brunner for generating GE Healthcare microarray data. L.G. and L.S. thank X. Megan Cao, Stacey Dial, Carrie Moland and Feng Qian for their superb technical assistance.
DISCLAIMER
This work includes contributions from, and was reviewed by, the FDA and the NIH. Certain commercial materials and equipment are identified in order to adequately specify experimental procedures. In no case does such identification imply recommendation or endorsement by the FDA or the NIH, nor does it imply that the items identified are necessarily the best available for the purpose.
COMPETING INTERESTS STATEMENT
The authors declare competing financial interests (see the Nature Biotechnology website for details).
Published online at http://www.nature.com/naturebiotechnology/
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/
1. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24, 1151–1161 (2006).
2. Shi, L. et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 6 (Suppl 2), S12 (2005).
3. Chen, L., Mei, N., Yao, L. & Chen, T. Mutations induced by carcinogenic doses of aristolochic acid in kidney of Big Blue transgenic rats. Toxicol. Lett. 165, 250–256 (2006).
4. Mei, N., Chou, M.W., Fu, P.P., Heflich, R.H. & Chen, T. Differential mutagenicity of riddelliine in liver endothelial and parenchymal cells of transgenic Big Blue rats. Cancer Lett. 215, 151–158 (2004).
5. Mei, N., Heflich, R.H., Chou, M.W. & Chen, T. Mutations induced by the carcinogenic pyrrolizidine alkaloid riddelliine in the liver cII gene of transgenic Big Blue rats. Chem. Res. Toxicol. 17, 814–818 (2004).
6. Mei, N., Guo, L., Fu, P.P., Heflich, R.H. & Chen, T. Mutagenicity of comfrey (Symphytum officinale) in rat liver. Br. J. Cancer 92, 873–875 (2005).
7. Arlt, V.M., Stiborova, M. & Schmeiser, H.H. Aristolochic acid as a probable human cancer hazard in herbal remedies: a review. Mutagenesis 17, 265–277 (2002).
8. Patterson, T.A. et al. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. 24, 1140–1150 (2006).
9. Allison, D.B., Cui, X., Page, G.P. & Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 7, 55–65 (2006).
10. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001).
11. Zeeberg, B.R. et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 4, R28 (2003).
12. Zeeberg, B.R. et al. High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics 6, 168 (2005).
13. Stickel, F. & Seitz, H.K. The efficacy and safety of comfrey. Public Health Nutr. 3, 501–508 (2000).
14. Fu, P.P., Xia, Q., Lin, G. & Chou, M.W. Pyrrolizidine alkaloids–genotoxicity, metabolism enzymes, metabolic activation, and mechanisms. Drug Metab. Rev. 36, 1–55 (2004).
15. Betz, J.M., Eppley, R.M., Taylor, W.C. & Andrzejewski, D. Determination of pyrrolizidine alkaloids in commercial comfrey products (Symphytum sp.). J. Pharm. Sci. 83, 649–653 (1994).
16. Cheeke, P.R. Toxicity and metabolism of pyrrolizidine alkaloids. J. Anim. Sci. 66, 2343–2350 (1988).
17. Huan, J. et al. Dietary pyrrolizidine (Senecio) alkaloids and tissue distribution of copper and vitamin A in broiler chickens. Toxicol. Lett. 62, 139–153 (1992).
18. Moghaddam, M.F. & Cheeke, P.R. Effects of dietary pyrrolizidine (Senecio) alkaloids on vitamin A metabolism in rats. Toxicol. Lett. 45, 149–156 (1989).
19. Armendariz, A.D., Gonzalez, M., Loguinov, A.V. & Vulpe, C.D. Gene expression profiling in chronic copper overload reveals upregulation of Prnp and App. Physiol. Genomics 20, 45–54 (2004).
20. Hesse, L., Beher, D., Masters, C.L. & Multhaup, G. The beta A4 amyloid precursor protein binding to copper. FEBS Lett. 349, 109–116 (1994).
21. Varela-Nallar, L., Toledo, E.M., Chacon, M.A. & Inestrosa, N.C. The functional links between prion protein and copper. Biol. Res. 39, 39–44 (2006).
22. Tan, P.K. et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 31, 5676–5684 (2003).
23. Ramalho-Santos, M., Yoon, S., Matsuzaki, Y., Mulligan, R.C. & Melton, D.A. "Stemness": transcriptional profiling of embryonic and adult stem cells. Science 298, 597–600 (2002).
24. Ivanova, N.B. et al. A stem cell molecular signature. Science 298, 601–604 (2002).
25. Fortunel, N.O. et al. Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature". Science 302, 393; author reply 393 (2003).
26. Marshall, E. Getting the noise out of gene arrays. Science 306, 630–631 (2004).
27. Miller, R.M. et al. Dysregulation of gene expression in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine-lesioned mouse substantia nigra. J. Neurosci. 24, 7445–7454 (2004).
28. Frantz, S. An array of problems. Nat. Rev. Drug Discov. 4, 362–363 (2005).
29. Tong, W. et al. Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 24, 1132–1139 (2006).
30. Tong, W. et al. ArrayTrack–supporting toxicogenomic research at the US Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819–1826 (2003).
ERRATA, CORRIGENDA AND RETRACTION
Erratum: Alfimeprase to succeed Genentech’s alteplase? Brian Vastag Nat. Biotechnol. 24, 875–876 (2006)
In the 6th paragraph, the statement “in 1996, Genentech launched Alteplase” is incorrect. Genentech received FDA approval of Alteplase for heart attack in 1987. In 1996, it received approval for Alteplase’s second indication, stroke.
Erratum: Diversifying chemical arrays Laura DeFrancesco Nat. Biotechnol. 24, 799 (2006) In the print version of the article, the author of the featured article is incorrectly identified as Brandord et al. The author’s name is Bradner.
Corrigendum: All in the RNA family Beverly L. Davidson Nat. Biotechnol. 24, 951–952 (2006) In the fifth paragraph, the abbreviation for prostate-specific membrane antigen (PSMA) was mistakenly written several times as PMSA. This error also appears in Figure 1.
Corrigendum: Engineering and characterization of a superfolder green fluorescent protein Jean-Denis Pédelacq, Stéphanie Cabantous, Timothy Tran, Thomas C Terwilliger & Geoffrey S Waldo Nat. Biotechnol. 24, 79–88 (2005) In the legend for Figure 4b and in the last line of paragraph 6 in Methods, "number of moles" should be "moles." Also in Methods, paragraph 3, "superfolder GFP (27.747 kDa/mole)..." should read "superfolder GFP (27,747 g/mole)" and "folding reporter GFP (27.742 kDa/mole)..." should read "...folding reporter GFP (27,742 g/mole)." The error has been corrected in the PDF version of the article.
Retraction: Identification of genes that function in the TNF-α-mediated apoptotic pathway using randomized hybrid ribozyme libraries Hiroaki Kawasaki, Reiko Onuki, Eigo Suyama & Kazunari Taira Nat. Biotechnol. 20, 376–380 (2002) Although the gene discovery technology described in this paper has been demonstrated to have practical utility by several independent researchers, the first author of the paper failed to maintain a proper data notebook to support the results presented. As this constitutes nonadherence to the ethical standards in scientific research, and in accordance with the recommendations from the National Institute of Advanced Industrial Science & Technology (AIST), R. Onuki, E. Suyama and K. Taira respectfully retract this paper. H. Kawasaki declines to associate himself with this retraction and maintains that all the data contained in the paper are valid.
CAREERS AND RECRUITMENT
Five attributes of a successful manager in a research organization Grace H W Wong What does it take to make the transition from scientist to manager?
Grace H. W. Wong is chief scientific officer at ActoKine Therapeutics and president of Student Vision. e-mail: [email protected]
Little in the education, training or background of scientists prepares them for management. Good managers tend to be good with people; they look at the larger picture, are good at motivating their team, adapt to unanticipated business events, are comfortable working to budgets and are able to assess and respond to risk. In contrast, researchers at the bench spend most of their time focusing on narrow scientific questions, designing experiments and budgeting resources needed for those experiments. One would think that these two professions—business management and scientific research—were mutually exclusive, and in the vast majority of cases, one would be right. But a talented few have been successful in making the transition from the bench to the boardroom. I asked several of these individuals to identify the key attributes of their success and the factors that influenced their transition from academia into management (Box 1).
The industry research manager
It's difficult enough managing a team of people in any business. One must manage budgets, prioritize time, delegate tasks, motivate a team and provide clear leadership. But managers in a research-intensive organization, such as a biotech or pharmaceutical company, also have to contend with several additional challenges. First and foremost, a research manager's team (that is, scientists) comprises probably one of the least manageable groups of people on the planet. As Bob Ruffalo, president of R&D at Wyeth (Madison, NJ, USA), succinctly puts it: "Although I enjoy heading a group of scientists, they are, by their nature, very difficult to
manage, and they are not always comfortable with change." Second, the team's business goal—discovering (if not developing and marketing) drugs—is an endeavor with one of the highest rates of failure and attrition of any industry. This means that a manager is often faced with the decision to close or shelve projects—projects that their teams often are invested in intellectually—and the highs do not necessarily outweigh the disappointments. Third, the drug sector is so incredibly diverse that expertise may not be transferable. For example, a manager at a small startup venture faces different challenges from one heading a large team at a multinational pharma company. Business and management skills acquired in a small-to-medium enterprise (SME) environment, where money and resources are at a premium, may be less relevant to teams in big pharma, and vice versa. The jobs themselves are intensive and demand many different skill sets. Although most research managers remain married to the science—devoting as much time as possible to planning experiments, carrying out secondary analysis of data and reviewing key and pertinent literature—they spend an equal amount of their time performing managerial activities (personnel and site-wide meetings), prioritizing workloads for a particular day and attending various conferences. Many of the managers interviewed emphasized the importance of remaining close to their teams, and visiting the laboratories under their supervision to
demonstrate interest and support and to answer any questions.
Five key attributes
Given the demands and responsibilities of a manager in a drug research group—whatever its size or focus—it's no surprise that it takes a talented person to succeed. Exceptional individuals may each have a unique way of tackling their jobs, but on the basis of feedback from our respondents, most successful R&D managers share several common attributes.
Determination. In a sector where progress often appears to take the form of two steps forward and one step back, several executives regard staying power, quiet determination and persistence in the face of adversity as a key characteristic. Lex Van der Ploeg, site head for Merck Research Laboratories in Boston, Massachusetts, believes that research managers need "motivation and drive" and a resolve not to "get discouraged by failures." Wyeth's Ruffalo agrees, saying that you need to "learn how to cope with disappointment. The thing that frustrates me most is the enormous risk that we face in drug discovery and development. Most people do not understand the kinds of risk. . . the pharmaceutical industry is an extremely hard place to work."
Drive and diligence. Long hours and hard work are a given in biotech and pharma research groups. A typical day for Van der Ploeg starts between 4 and 5 a.m. "I get up early, walk the dog and then head to work. Because the days are generally packed with meetings and events, I make sure to reserve space and time for reflection. Evenings are spent with my family or with late-day work events. After 9 p.m., I do a bit more work. This daily cycle runs six days per week."
Robert Lewis, former senior vice president at Aventis and former chief scientific officer at Seattle-based Cell Therapeutics, also puts in long hours. At Aventis his daily routine included "at least two group scientific meetings on basic or early development projects (1–2 1/2 hours each), a group site management meeting (human resources, budget, site services, etc.) (1/2–1 1/2 hours), up to three one-on-one meetings with colleagues/scientists from the organization (1/2–1 hour each), time for reading enclosed materials on e-mail and answering/initiating correspondence (1 1/2–2 hours),
with the remainder of the time spent reading scientific journal articles. This, without lunch, amounted to an 11- to 12-hour day." William Shek, senior scientific director at Charles River Laboratories (Wilmington, MA, USA), spends the majority of his waking hours at his company. His working day largely involves resolving "technical and management problems." But once the laboratory has emptied in the late afternoon to early evening, he uses the time to "concentrate on writing and computer programming, which has become an avocation and occupation of mine because
of the critical role of information management in the laboratory.” He usually goes home “around midnight.” Because of these long and intense workdays, Lewis is keenly aware of the need for managers to develop time management skills.
Box 1 In their own words
William R. Shek, senior scientific director, research animal diagnostic services, Charles River Laboratories, Wilmington, MA, USA. "I decided to become a veterinarian when I was just 13. Consequently, I went to a high school with a special program in agriculture and spent my summers working on dairy farms in upstate New York. At that time, farm experience was a requirement for entry into vet school. I graduated from high school and went on to attend the College of Agriculture at Cornell University, where I majored in biology. After three years as an undergraduate, including a semester at Tel Aviv University, I was accepted to the Cornell New York State College of Veterinary Medicine, where I matriculated in the fall of 1974. During the summer of 1975, I started graduate research in microbiology at the veterinary school. I graduated from there in 1977, and went on to complete MSc and PhD degrees in 1979 and 1982, respectively. Although I had been offered a position as assistant professor at Cornell's State Veterinary Diagnostic Laboratory, I decided after 12 years it was time to move on. And so I accepted a job as director of microbiology and immunology at Charles River Laboratories, where I began work in the spring of 1982 and have been employed ever since."
Gary Peltz, head of genetics and genomics, Roche Palo Alto, Palo Alto, CA, USA. "I was an MD/PhD student at Stanford University who did a residency in internal medicine and a fellowship in rheumatology at the University of California, San Francisco. Although I had planned to go into academic medicine, I changed course when I looked at several academic positions in the early 1990s. The very low level of research funding was discouraging and was coupled with very demanding clinical obligations placed on junior faculty. This made it very difficult to engage in the type of cutting-edge research that I wanted to pursue. Therefore, my first job was at Syntex Research, which subsequently became part of Roche."
Scott Wadsworth, research fellow, medical devices group, Center for Biomaterials & Advanced Technologies, Johnson & Johnson, Somerville, NJ, USA. "I had an MSc in agricultural biochemistry/marine sciences and wanted to be a marine biologist. When I realized only a few jobs were available, I rethought my options and spent two years in a rheumatology laboratory at Children's Hospital of Philadelphia, which inspired me to obtain a PhD in immunology at the University of Pennsylvania. Between 1985 and 1989, I held postdoctoral/staff fellow positions at the National Institute of Allergy and Infectious Diseases, studying the role of integrins in T cell development and function. And from there I joined J&J as a
senior scientist. Since 1995, I have been biology leader for J&J's p38 kinase inhibitor program in anti-inflammatory drug discovery, putting four compounds into preclinical development. Since 2002, I have worked on various drug-device combination products, resulting in four prototypes handed off to operating companies for preclinical development. Currently at the Center for Biomaterials & Advanced Technologies, I am continuing work on discovery/development of novel drug/biologic-device combination products for indications in orthopedics, postsurgical adhesion, postoperative ileus and drug-eluting stents."
Martin Wasserman, former Pfizer, GlaxoSmithKline, Bristol-Myers Squibb, Roche, Aventis and AtheroGenics (Alpharetta, GA, USA) executive. "I began my career with an undergraduate degree in pharmacy and spent five years as a registered pharmacist in a drugstore. I decided to matriculate to The University of Texas Medical Branch in Galveston to pursue a PhD degree in pharmacology and toxicology, which I received in 1972. I was immediately recruited by The Upjohn Company in Kalamazoo, Michigan (now Pfizer), where I spent over nine years as a bench researcher in the hypersensitivity diseases research department. I was then recruited by SmithKline & French (now GlaxoSmithKline) to head the pharmacology department. When SK&F merged with Beecham in the late 1980s, my position was eliminated and I sought a position with Bristol-Myers Squibb as their first director of human pharmacology (a newly created position in clinical research), where my group performed creative phase 1 studies. After spending over three years at BMS, I was sought out and hired by Hoffmann-La Roche as director of bronchopulmonary pharmacology in research, from which four years later I was recruited by Marion Merrell-Dow to become the group director of three departments (immunology, metabolic diseases and respiratory research). Soon after, MMD became Hoechst Marion Roussel and later Aventis, and then Sanofi-Aventis, where my title was vice president and senior distinguished scientist in the respiratory and rheumatoid arthritis disease group and the acting head of oncology. After seven years, an interesting opportunity arose at a small startup biotech company, AtheroGenics. I became senior vice president of discovery research and chief scientific officer. After four and a half years, I chose to officially retire after 35 years in the drug industry and relocate to be closer to my children in California. Now settled there, I am commencing a campaign to explore opportunities to consult with the industry, academia or institutions."
"A manager should ask him/herself how able he/she is to thoughtfully delegate and monitor tasks without micromanaging, on one hand, or being blindly dependent upon others, on the other hand," he says. "It is important for a scientific manager to be able to modulate the pace of his/her day and not to be constantly overworked. There is no reward for burnout!"
Passion. Given the workload and day-to-day frustrations associated with working in the drug industry, a common motivation among the interviewees was altruism—to help reduce human suffering through the discovery of new medicines. Franz Hefti, formerly at Genentech and Merck, and now executive vice president at Rinat Neuroscience (S. San Francisco, CA, USA), says: "It has always been my dream and goal to bring better medication to people who suffer from diseases of the nervous system. The ability to help [do] this is [my] dominant motivational force." Van der Ploeg is also upbeat about the drug discovery endeavor: "I am in this business because of great science, stimulating and excellent colleagues, and motivated teams." At Roche Palo Alto in California, head of genetics and genomics Gary Peltz emphasizes the research challenge: "I enjoy solving scientific problems that can impact human health. I am particularly fortunate to work with a motivated and talented multidisciplinary team of scientists (in genetics, statistics, computation, genomics and biology) that can undertake high risk/high reward projects." Several other managers also emphasize the rewards of participating in the research endeavor. "I enjoy the collegial nature of scientific/biomedical pursuit, the people-to-people interactions and achieving global recognition for my work," says Martin Wasserman, a former manager at five pharma companies who recently retired from his post as chief scientific officer at AtheroGenics (Alpharetta, GA, USA).
Box 2 Starting out
The research managers interviewed for this article had several pieces of advice for those thinking of moving from the bench into research management at a company. Wyeth's Bob Ruffalo exhorts fledgling managers to "work very hard, publish extensively and remember that discovering and developing new drugs is one of the most noble professions, which patients depend upon us to do." But what practical steps can you take to increase your chances of making the transition? Roche's Gary Peltz says that when he visited universities and met with graduate students and postdocs, he was initially quite surprised to find one universally asked question: "What was it like in industry?" "They were more concerned with my answer to that question than discussing their science," he says. "It was clear that virtually all academic programs offer very little career counseling or direction for trainees, which is a major deficiency." Rinat's Franz Hefti agrees: "It's important to understand the differences between academic and industrial research. The goal of academic research is to understand nature; the goal of biopharmaceutical research is to find effective treatment for human diseases. Academic research favors an individualistic approach that emphasizes the contribution of an individual; industrial research favors teamwork and emphasizes the common goal." Peltz's pragmatic suggestions for students: "First, inquire about and explore a number of options before choosing a career path. Second, realize that there is a wide range of options within industry. Just as the experience at Stanford is very different from that at a local community college, the cultures and experiences in small startup companies differ from those in large pharma companies. Lastly, I strongly suggest that students read Tom Friedman's book, The World Is Flat. Things are changing within the pharma industry; the pace of change is going to accelerate, and you'd better be prepared for it." Martin Wasserman advises those interested in research management careers to "consider an undergraduate degree in pharmacy, which permits exposure to most of the biomedical disciplines, unlike a pre-med degree," adding, "Where possible, take courses in biotechnology." Charles River's William Shek also notes, "some of my colleagues have gotten MBA degrees and gone on to senior management." Wasserman also stresses the importance of attending job fairs and local and national meetings for exposure and appointments. "Try to network with recruiting firms," he says, "and consider investing in society memberships; invest in the FASEB [the Federation of American Societies for Experimental Biology] directory of members." Finally, he advises, "learn who the executives are and try to set up appointments with them."
Elsewhere, Aventis' Lewis praises "the people, individually, the science (as a continuous learning experience) and the opportunity to drive many new and potentially productive ideas into actual experiments that challenge hypotheses." Reinhard Ebner, principal scientist at Avalon Pharmaceuticals (Germantown, MD, USA), feels that a manager's capacity to play an instrumental, or even leading, role in a team that finds the answer to a complex, long sought-after problem can be an "indescribably rewarding experience, second only to the involvement in an initiative that succeeds in making a concrete contribution to the development of a solution for a previously unmet medical need." And it's not only the altruistic side of the drug discovery business that galvanizes people. For Scott Wadsworth, research fellow at Johnson & Johnson (New Brunswick, NJ, USA), it's "the independence and entrepreneurial spirit that exists, despite being part of a huge corporation. I like the opportunity to have an impact in a large corporation."
Broad experience. Given the diverse responsibilities and skills required in a research manager position, it helps to be well read and to develop as broad a scientific and business knowledge as possible. Wadsworth says it is important to "diversify your experience," adding, "Make sure you have demonstrated significant, quantifiable, reproducible successes in your early career. Network as much as possible within your company and outside. Gain as much exposure outside your company as possible, via speaking engagements, chairing meetings, publishing, etc. Gain as much
management experience as possible, by leading project teams, mentoring postdocs, hosting interns, etc. Do all that and management positions will come naturally." Charles River's Shek also emphasizes the importance of broad horizons. "I have found it to be particularly important to acquire knowledge and skills beyond my field of scientific specialization in the areas of quality control, project management and bioinformatics," he says. He adds, "I have had the resources to do many interesting things, to expand my knowledge and skills and to collaborate with highly intelligent and talented colleagues at and/or outside [the company]. As a member of the research models and services division of Charles River, I have participated in a wide array of projects involving diverse disciplines including genetics, diagnostics, engineering, bioinformatics and so forth."
Flexibility, inspiration and leadership. There is no doubt that the pharma industry is currently undergoing a difficult period in terms of sustaining growth, meeting investor expectations and managing public perception. Working to improve the poor productivity and high attrition of drug pipelines is a key goal for many research managers. Roche's Peltz puts the problem like this: "My major challenge is maintaining momentum and progress within a constantly changing environment that has an increasingly near-term outlook. This makes it more difficult to maintain cohesion among the large number of individuals performing
the work, and with the stakeholders concerned about the outcome. Discovery science is a lot like cooking; if you open the oven door too often, the cake will not rise." There is a burgeoning demand for experienced research managers in biotech companies; many of these are being recruited from pharma. But as Avalon's Ebner, who has consecutively worked for established large, growing medium-sized and entirely new startup biotech companies, points out, the decision paths in small and large enterprises are very different. "The younger and more unfinished an institution, the more demanding, wide-ranging and intensive the management problems. This is most acute in the startup setting, which is a bit like starting a family restaurant, where everyone has to help out on every front." The increasingly cross-disciplinary nature of research and the need to collaborate intramurally and extramurally also create management headaches. "Some of the bigger and most difficult, yet most important, questions can only be answered by the coordinated studies of many investigators, often from different institutions and countries. This has been true for many fields of discovery for a while, but is now increasingly apparent in the biological sciences. Making the best use of combined efforts almost always requires a great deal of organizational, communicative, planning and even diplomatic skills," says Ebner. A research manager's job also offers the opportunity to mentor and reward excellence
and achievement within a team. According to Lewis, what he really enjoyed about senior management positions was the chance to do "things that seriously affect the quality of life for employees in a positive way; this means that the 'power' of a senior job is most useful when it is used to (appropriately) enrich the lives of the most junior colleagues."
Conclusions
There are several keys to success in making the transition from the bench to the boardroom (Box 2). Research managers need determination, diligence and passion to do work that matters and that makes a difference. They must also possess the experience to lead and the ability to inspire their team. The most effective managers are patient, have a sense of humor, respect their colleagues and are willing to subordinate their ego for the benefit of the organization. If you love being a scientist but crave the financial and professional benefits of management, heading a research group as a department or division leader at a company offers several opportunities. In this role, you will have greater supervisory and budget management responsibilities, and the compensation that comes with them. A person who is a great bench scientist will never be happy being a mediocre manager, but a great scientist who has the ability and desire to move into management has a whole new set of opportunities to achieve important and satisfying results.
PEOPLE
Archemix (Cambridge, MA, USA) has announced the appointment of Robert Schaub as vice president of preclinical discovery. Dr. Schaub comes to Archemix after 16 years with Genetics Institute and Wyeth Pharmaceuticals, most recently as the assistant vice president for cardiovascular and metabolic diseases. "Archemix's aptamer technology has the potential to bring forth an entirely new class of therapeutics for the treatment of acute and chronic diseases," says Dr. Schaub. "I look forward to leveraging my experience with both biotherapeutics and small-molecule drug candidates to this new class of therapeutics." In addition, Archemix has elevated Page Bouchard to lead its research and preclinical development group. Dr. Bouchard joined Archemix in November 2004 as senior vice president of preclinical drug discovery and development. "Page possesses the unique combination of scientific experience and leadership skills necessary to guide our rapidly expanding pipeline of therapeutic aptamers through research and preclinical development," says Archemix president and CEO Errol De Souza. "We are privileged to have a leader of his caliber directing our R&D efforts."
Genomic Vision (Paris) has appointed founder Aaron Bensimon as president and CEO. He will be assisted by Daniel Nerson, who has been named chief operating officer. Dr. Bensimon has been head of the genome stability unit at the Institut Pasteur since 1994, where he developed molecular combing technology and its use in the precise study of genomes. The technology has resulted in 13 patents granted to the Institut Pasteur, for which Genomic Vision has an exclusive license. Novavax (Malvern, PA, USA) has announced the appointment of Jeffrey Church as vice president, chief financial officer and treasurer. He joins the company from GenVec, where he had served as CFO, treasurer and corporate secretary since 1988. Bernhard R. M. Ehmer has been appointed to the supervisory board of Hybrigenics (Paris) as a non-executive independent director. Dr. Ehmer is currently CEO of BioPheresis Technologies, and previously served at Merck KGaA in several capacities, most recently as vice president for corporate strategic planning and alliance management. Sylvie Gregoire has been appointed executive chair of the board of directors at IDM Pharma (Irvine, CA, USA). She has been a board member since August 2005. Dr. Gregoire previously served
as president and CEO of GlycoFi, and currently serves on the boards of Cubist Pharmaceuticals and Caprion Pharmaceuticals. Algeta (Oslo, Norway) has appointed Johan Harmenberg as chief medical officer and Michael Dornish as chief scientific officer. Before joining Algeta, Dr. Harmenberg spent nine years at Medivir as vice president of development. Dr. Dornish has nearly 25 years’ research experience in the life sciences industry, most recently as vice president, R&D at FMC Biopolymer. Dr. Dornish replaces Algeta cofounder Roy Larsen, who has decided to pursue other interests and opportunities but will continue as a consultant to the company. Peter Hnik has been named chief medical officer at iCo Therapeutics (Vancouver, BC, Canada). Dr. Hnik most recently served as associate director of clinical research with QLT, playing a critical role in designing and directing Visudyne clinical trials in AMD and diabetic retinopathy. Celera Genomics Group (Rockville, MD, USA) has named Joel Jung vice president of finance. Mr. Jung has held several executive and senior positions with Chiron, including most recently vice president and treasurer. Rosemary Mazanet has joined the board of directors of Cellumen (Pittsburgh, PA, USA).
Dr. Mazanet is presently CEO of Breakthrough Therapeutics and acting CEO of Access Pharmaceuticals. Previously, she served as the CSO and general partner of Oracle Partners, and before that was director of clinical research at Amgen. James A. Ratigan has joined Nitric BioTherapeutics (Philadelphia, PA, USA), formerly known as Theranox, as CFO. He previously served as executive vice president and CFO of Orapharma, where he raised private capital for the startup, directed its IPO in 2000 and helped orchestrate its sale to Johnson & Johnson. ArQule (Woburn, MA, USA) has named Nigel J. Rulewski as chief medical officer. Dr. Rulewski brings to ArQule more than two decades of experience in R&D, regulatory affairs and commercialization, having previously served as senior vice president of BioAccelerate and vice president, medical affairs and chief medical officer at Astra USA. ADVENTRX Pharmaceuticals (San Diego, CA, USA) has announced that Joachim P. H. Schupp has been appointed to the newly created position of vice president of medical affairs. Dr. Schupp served most recently as vice president of clinical business solutions and clinical data services at ProSanos. Steve Toon has joined the board of Simcyp (Sheffield, UK), which offers in silico simulation and prediction of pharmacokinetics and drug-drug interactions in virtual patient populations. Dr. Toon has over 20 years' experience in the pharmaceutical industry, previously serving as CEO of Medeval. AAIPharma (Wilmington, NC, USA) has named Martin Tyson to the position of senior vice president, information systems and technology. Mr. Tyson was most recently senior vice president and chief information officer for Quintiles Transnational. The company also announced the appointment of Ninad Deshpanday to the newly created position of vice president of pharmaceutical business development. Previously, Dr. Deshpanday was vice president of drug product development for Synta Pharmaceuticals.