Annual Reports in Combinatorial Chemistry and Molecular Diversity Volume 1 (Annual Reports in Combinatorial Chemistry & Molecular Diversity)


Author: W.H. Moos | M.R. Pavia | B.K. Kay | A.D. Ellington



Annual Reports in Combinatorial Chemistry and Molecular Diversity
Volume 1

edited by

W.H. Moos
MitoKor, San Diego, CA 92121, U.S.A.

M.R. Pavia
Sphinx Pharmaceuticals, Cambridge, MA 02139, U.S.A.

B.K. Kay
University of North Carolina, Chapel Hill, NC 27599-3280, U.S.A.

A.D. Ellington
Indiana University, Bloomington, IN 47405, U.S.A.

KLUWER ACADEMIC PUBLISHERS
New York / Boston / Dordrecht / London / Moscow

eBook ISBN: 0-306-46904-9
Print ISBN: 9-072-19923-5

©2002 Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow
Print ©1997 ESCOM Science Publishers B.V., Leiden

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

Preface

Combinatorial chemistry and molecular diversity approaches to scientific inquiry and novel product research and development (R&D) have exploded in the 1990s. For example, in the preparation of drug candidates, the automated, permutational, and combinatorial use of chemical building blocks now allows the generation and screening of unprecedented numbers of compounds. Drug discovery – better, faster, cheaper! Notably, more compounds have been made and screened in the 1990s than in the last 100 years of pharmaceutical research combined, and new drug candidates are possibly for the first time making their way into clinical development pipelines in a more efficient way.

After leading the way in 1995–1996 with the first scientific journal to represent this new field and discipline, ESCOM leads the way again with ‘Annual Reports in Combinatorial Chemistry and Molecular Diversity’ (ARCCMD). Just as ‘Annual Reports in Medicinal Chemistry’ (Academic Press) has established a premier position for yearly reviews in pharmaceutical chemistry and related topics, ARCCMD will do the same for the burgeoning new arena of combinatorial chemistry and molecular diversity. This annual is a ‘must read’ for everyone involved in pharmaceutical and biotechnology R&D, and also for those who are training students for careers in the industry. Moreover, as the field of molecular diversity expands into areas such as bio-organic chemistry, materials science, and beyond, the series will be of considerable benefit to readers who work at state-of-the-art interfaces of science and technology.

This first volume in the series pulls together authors from the U.S., Europe, and the Pacific Rim. Coedited by Michael Pavia (Lilly-Sphinx), Brian Kay (University of North Carolina), Andy Ellington (Indiana University), and Walter Moos (MitoKor), it covers three major areas: (i) combinatorial chemistry; (ii) combinatorial biology and evolution; and (iii) informatics and related topics. Within each section, chapters have been prepared by experts in the field. The first section covers: mixture pools versus parallel individual compound synthesis, solution versus solid-phase synthesis and solid supports, analytical tools, automation, and small-molecule libraries. The second section includes: theoretical issues, phage display, synthetic peptide libraries, and nucleic acid libraries. The third and final section includes: databases and library design, high throughput screening, coding strategies versus deconvolutions, intellectual property issues, deals and collaborations, successes to date, and a compendium of solid-phase chemistry publications.

We hope you enjoy this new annual forum!

Walter Moos Michael Pavia Brian Kay Andy Ellington


Contents

Preface

Section I: Combinatorial Chemistry

Combinatorial chemistry: A perspective
M.R. Pavia (Sphinx Pharmaceuticals, Cambridge, MA, U.S.A.)

Techniques for mixture synthesis
J.S. Kiely (Houghten Pharmaceuticals Inc., San Diego, CA, U.S.A.)
Introduction
Methods for the preparation of mixture-based libraries
Iterative library deconvolutions
Heterocyclic libraries
Peptidomimetic, peptide and receptor libraries
Newer deconvolution methods and expansions of library diversity
Rationale for mixture-based combinatorial libraries
Conclusions

Techniques for single-compound synthesis
S. Sarshar and A.M.M. Mjalli (Ontogen Corp., Carlsbad, CA, U.S.A.)
Issues to consider
Spatially dispersed strategy
The Multipin SPOC system
The DIVERSOMER method
The OntoBLOCK system
Split and recombine strategy
Comparison of techniques
Conclusions and future prospects

Recent advances in solid-phase synthesis
S.E. Hall (Sphinx Pharmaceuticals, Durham, NC, U.S.A.)
Introduction
Solid-phase chemistry
Cleavage methodology
Reactions
Solid-phase-supported reagents and scavengers
Conclusions

Selection of supports for solid-phase organic synthesis
I. Sucholeiki (Sphinx Pharmaceuticals, Cambridge, MA, U.S.A.)
Introduction
Monitoring the solid support
Types of supports
Kieselguhr-polyacrylamide composite
Polystyrene-polyacrylamide composite
Polystyrene-polyethylene glycol composite
Polyethylene glycol-polyacrylamide composite (PEGA)
Controlled pore glass (CPG)
Kieselguhr
Polyethylene-based supports
Cellulose-based supports
The effect of the support on the rate of solid-phase reactions
Enhancing support loadings
Summary and Conclusions

Solution-phase combinatorial chemistry
D.M. Coe and R. Storer (Glaxo Wellcome Research and Development, Stevenage, U.K.)
Introduction
Solution-phase synthesis of pools of compounds
Solution-phase synthesis of discrete compounds
Liquid-phase synthesis
Use of supported reagents
Conclusions

Analytical methods for the quality control of combinatorial libraries
W.L. Fitch (Affymax Research Institute, Santa Clara, CA, U.S.A.)
Introduction
Analytical methods used in solid-phase synthesis
Measuring yields and loading
Following reactions and determining structures of intermediates on-bead
Cleave and characterize methods for evaluating intermediates and products
Analytical methods for evaluating libraries
Parallel synthesis
Split/pool solid-phase libraries and solution-phase pools
Conclusions

Automated synthesis
S.H. DeWitt (DIVERSOMER Technologies Inc. and Parke-Davis Pharmaceutical Research, Ann Arbor, MI, U.S.A.)
Introduction
System designs
Reactors
Architecture
Valving automation
Robotic automation
Applications
Process optimization
Peptide synthesis
Organic synthesis and combinatorial chemistry
Conclusions

Applications of combinatorial technology to drug discovery
D.V. Patel (Versicor Inc., South San Francisco, CA, U.S.A.)
Introduction
Synthesis and screening of oligomeric peptide and non-peptide libraries
Synthesis and screening of orthogonal and binary encoded libraries
Combinatorial synthesis and screening of heterocyclic, drug-like scaffold libraries
Parallel synthesis and screening of bioactive pharmacophore libraries
Solid-phase synthesis of heterocyclic, drug-like molecules
Conclusions

Section II: Combinatorial Biology and Evolution

Combinatorial biology and evolution: A perspective
B.K. Kay and A.D. Ellington (University of North Carolina at Chapel Hill, Chapel Hill, NC and Indiana University, Bloomington, IN, U.S.A.)

Models and search strategies for applied molecular evolution
B. Levitan (Santa Fe Institute, Santa Fe, NM, U.S.A.)
Introduction
Laboratory technique-based models and search strategies
Affinity distribution models
Comparing methods of molecular search
Models and search strategies
SELEXION model
SELEXION results
Mandecki et al. model of phage display
Mandecki model results
Levitan/Kauffman model of phage display
Levitan/Kauffman model results
Conclusions from laboratory technique-based models
Fitness landscape-based models and search strategies
Definitions
Sequence space
Other molecular spaces
Fitness landscapes
Properties of fitness landscapes
Models of fitness landscapes
Spin glass-like models
RNA secondary structure models
Other approaches
Library design and search strategies
Designing the initial library
Measuring molecular landscape properties
Search on spin glass-like landscapes
Search on RNA secondary structure landscapes
Pooling strategies
Conclusions from landscape-based models
Conclusions

Molecular evolutionary biology
P. Schuster (Institut für Theoretische Chemie und Strahlenchemie der Universität Wien, Wien, Austria and Santa Fe Institute, Santa Fe, NM, U.S.A.)
Introduction
Evolutionary dynamics
The RNA model
Evolutionary biotechnology
Concluding remarks and perspectives

Landscapes for molecular evolution: Lessons from in vitro selection experiments with nucleic acids
S.D. Jhaveri, I. Hirao, S. Bell, K.W. Uphoff and A.D. Ellington (Indiana University, Bloomington, IN, U.S.A.)
Introduction
In vitro selection
A cartoon representation of landscapes that map sequence to function
How rugged are landscapes?
How does the length of the random sequence pool affect the landscape?
Is there a tyranny of short motifs?
How does nucleic acid chemistry affect the landscape?
What effect does the target have on the landscape?
The relationship between affinity and specificity
Conclusions

Synthetic peptide libraries
Z.-G. Zhao and K.S. Lam (The University of Arizona, Tucson, AZ, U.S.A.)
Introduction
Peptide chemistry
Solid supports
Anchoring linkers
Amino acid side-chain diversity
Peptide bond formation
Cyclic libraries
Spatially addressable parallel libraries
Multipin method
SPOT synthesis
Light-directed peptide synthesis on chips
Synthetic library methods requiring deconvolution
Iterative approach
Positional scanning
Orthogonal partition
Recursive deconvolution
Affinity selection of soluble peptide libraries
One-bead-one-compound peptide library method
Perspectives

Phage display
J. Collins (Gesellschaft für Biotechnologische Forschung, Braunschweig, Germany)
Introduction
Phage and phagemid display: Some important parameters
Library type and suitability
Binding kinetics and library size
Recognition of synergistic intramolecular interaction: Phage versus synthetic combinatorial libraries
Epitope mapping: Is phage display the best way?
How good are the banks?
Vector and library stability
Codon usage, bias and representation
Somatic hypermutation
‘Landscape’ and ‘mosaic’ phage
λ packaging: A help in generating more primary clones
λ phage as an alternative phage-display system?
Constraints and context
Constrained peptide display libraries
Designed presentation matrices
Increasing diversity by reassortment and recombination
In vivo site-specific recombination to generate large combinatorial libraries
λ recombination to generate combinatorial libraries
Cre/loxP recombination to generate combinatorial libraries
Exon shuffling
Direct interaction rescue
Panning on cells
Innovative screening protocols
‘Substrate phage’
Guiding specificity
SH3-ligands and mirror-imaging with D-peptide ligands
Selecting disease-specific mimotopes
Selecting human antibodies directed to a particular epitope
Summary and future trends

Section III: Informatics and Related Topics

Informatics and related topics: A perspective
W.H. Moos (MitoKor, San Diego, CA, U.S.A.)

Modern chemical and biological databases
B.D. Christie and J.G. Nourse (MDL Information Systems Inc., San Leandro, CA, U.S.A.)
Introduction
Representing generic structures
Working with generic structures
Enumeration
Clipping
Searching generic structures

Practical high throughput screening
M. Reichman and A.L. Harris (Ligand Pharmaceuticals, San Diego, CA and Chiron Corporation, Emeryville, CA, U.S.A.)
Historical perspective
Molecular ‘diversity’
Compound acquisition and registration for HTS
Overview of new software for HTS
Fundamental approaches to data analysis and interpretation
HTS assay validation guidelines
Rationale
Standard operating procedures (SOPs)
Example of the importance of detailed SOPs
Validity of protocols
State-of-the-art in laboratory robotics today
New leads discovery: Anticipating the future
Advances in miniaturization
Advances in laboratory robotics
Advances in information management
Multiplexing in HTS
Summary and key issues

Deconvolution methods in solid-phase synthesis
J.J. Baldwin and R.E. Dolle (Pharmacopeia Inc., Princeton, NJ, U.S.A.)
Introduction
Combinatorial libraries
Mixture libraries and deconvolution
Split synthesis approach
Mass spectrometry and structure determination
Encoded combinatorial libraries
Peptide encoding strands
Oligonucleotide encoding strands
Molecular tags as an encoding strategy
Radio frequency encoding
Conclusions

Patent strategies in molecular diversity
K. Bozicevic (Fish & Richardson P.C., Menlo Park, CA, U.S.A.)
Introduction
Real and intellectual property
Types of intellectual property
Types of patent claims
Process claims
Product claims
Product-by-process claims
Enforcing patent rights
Recent developments
Economic facts
Conclusions

Combinatorial chemistry alliances in the 1990s: Review of deal structures
M.G. Edwards (Recombinant Capital Inc., San Francisco, CA, U.S.A.)
Introduction and Methods
Selected combinatorial chemistry alliances
Pre-commercial payments
Evolving alliance structures
Compound exclusivity
Conclusions

Combinatorial chemistry: The promise fulfilled?
J. Hauske (Sepracor Inc., Marlboro, MA, U.S.A.)
The practitioners
The clinical results
HPI
What therapy area?
Is the molecular target new?
What is the structural class and corporate identification number of the candidate?
How long from idea inception to initiation of the clinical trial?
Did the candidate molecule arise from an optimization library of an existing lead or is it an analog of a lead structure discovered utilizing combinatorial methods?
Eli Lilly
What therapy area?
Is the molecular target new?
What is the structural class and corporate identification number of the candidate?
How long from idea inception to initiation of the clinical trial?
Did the candidate molecule arise from an optimization library of an existing lead or is it an analog of a lead structure discovered utilizing combinatorial methods?

A compendium of solid-phase chemistry publications
I.W. James (Chiron Technologies, Clayton, VIC, Australia)
Introduction
Solid-phase organic chemistry (SPOC)
Reviews
Products
General reactions
Named reactions
Asymmetric reactions
Enzyme-catalyzed reactions
Transition metal chemistry
Solid-phase inorganic chemistry (SPIC)

Author Index

Subject Index


Section I

Combinatorial Chemistry


Combinatorial chemistry: A perspective

Michael R. Pavia
Sphinx Pharmaceuticals, A Division of Eli Lilly & Co., 840 Memorial Drive, Cambridge, MA 02139, U.S.A.

Chemically generated libraries are now being widely embraced by researchers in both industry and academia. The field of combinatorial chemistry began only a decade ago. For much of its existence researchers were devoted to preparing large libraries of peptides. Next came the progression to peptide-like oligomeric compounds and, most recently, the preparation of small organic molecules. Today, the vast majority of efforts in the field are devoted toward the preparation of these small, non-oligomeric molecules. The first examples of non-oligomeric small-molecule diversity generation were reported only four years ago. Since then we have seen the steady development of new chemistries and equipment applied to library generation. It is now possible to synthesize a broad structural range of compounds using these methodologies, and innovative methods and chemistries are being reported on a weekly basis.

Section 1, ‘Combinatorial chemistry’, reviews the current state of this rapidly progressing field. This first section focuses on three major areas: library generation method (mixtures and single compounds); chemistry (solid-phase and solution synthesis); and quality/speed (analytical and automation). Finally, the section ends with a chapter reviewing examples where combinatorial chemistry/rapid organic synthesis has been used for discovering novel lead structures for pharmaceutical purposes.

Much of the early work in combinatorial chemistry focused on the preparation of large mixtures of compounds. The most widely used technique for mixture synthesis is the split/recombine method which assures that each component of the mixture is present in approximately equimolar concentrations. The structures of the bound ligands are determined either through an iterative, or recursive, deconvolution strategy or through the use of encoded libraries. The chapter by Kiely reviews the preparation of libraries as mixtures from 1995 to the present. The important areas which are reviewed are methods of preparation, means of identifying active compounds from the mixtures, and why one might wish to prepare mixtures. The chapter also includes the preparation of mixtures of non-peptide molecules, as well as peptidomimetic molecules, and even a few examples of peptides. Examples are presented where mixture synthesis and screening have been effective in identifying interesting biological leads.

Many investigators retain legitimate concerns around screening mixtures of compounds. The screening of single, structurally defined molecules has a proven track record in the industry. For this reason, a large number of groups have focused on the synthesis of single compounds, often referred to as rapid organic synthesis. A number of laboratories have developed methods for the rapid, simultaneous synthesis of large numbers of compounds using array synthesis that results in one single and well-defined compound being prepared at each site.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 3–5 © 1997 ESCOM Science Publishers B.V.


The chapter by Sarshar and Mjalli reviews the present state of single-compound synthesis. The methods reviewed include the preparation of spatially addressable libraries using unique reaction vessels and automation, and also the use of split and recombine strategies which have the ability to afford single compounds for testing. This latter approach may be a very powerful combination of rapid chemical synthesis with a proven screening methodology. Compounds prepared by single-compound methods are being used for both lead generation and lead optimization, and may be especially useful in meeting the strict requirements of purity and structural characterization required for medicinal chemistry structure–activity studies.

Solid-phase or solution synthesis? A large percentage of the libraries initially produced were prepared by solid-phase organic synthesis. The range and complexity of organic reactions which can be successfully carried out on solid phase is rapidly expanding. Recently, we are seeing an increased sophistication in the complexity of chemistries which can be carried out in solution as well. The introduction of solid-phase-bound reagents and scavengers has made solution chemistry a very viable methodology for library synthesis. The answer to the question ‘solid-phase or solution synthesis’ is clear. The methods are complementary and both should be included in the chemist’s toolbox.

Hall reviews recent advances in solid-phase chemistry, focusing on new reactions reported in 1995 or later for the generation of non-oligomeric molecules. With the range of reactions available to the investigator rapidly expanding, we are on the verge of being able to prepare nearly any type of molecule desired. A brief discussion of resin-bound reagents and resin-bound scavengers is also included (further coverage of this latter topic can also be found in the chapter by Coe and Storer).
While solid-phase chemistry is valuable, it is not without its problems, such as difficulties in the analysis of resin-bound products, low loading capacities of many resins, and physical properties of resins which can severely affect reaction rates and site accessibility. Sucholeiki reviews the selection of supports for solid-phase organic synthesis. To address many of the problems presented above, we are seeing the use of a wide range of solid supports from the traditional Merrifield resin to newer composites, cellulose, silica, and others. This chapter presents the pros and cons of many of these potential supports, reviews modifications to enhance the solution-like behavior of the bound materials, and describes methods to increase the loading capacities.

An increasing number of groups are generating libraries in which parallel reactions occur in solution. Until very recently, solution-based syntheses were primarily utilized for the preparation of simple structures. However, the introduction of resin-bound reagents, scavenger resins, and parallel purification schemes is allowing more complex chemistries to be performed in parallel solution format. The chapter by Coe and Storer reviews the current state of library generation using solution-based techniques. Examples are given where these techniques are being used to prepare both mixtures and single compounds for use in both lead discovery and optimization. Possibly the most valuable use of these solution techniques is for the rapid optimization of existing leads.

Quality/speed: To achieve high-quality libraries in an efficient fashion, it is important to have access to appropriate analytical tools and automation. In his chapter, Fitch reviews the issues relating to analytical methods for the quality control of combinatorial libraries. He raises and discusses three crucial issues: What tools can we use for solid-phase reactions? How can we analyze all these samples? How much characterization of libraries is possible or appropriate?

The use of automation is an important component in library generation. Considerable efforts to develop automated systems for organic chemistry are underway which not only free the chemist for more productive endeavors but also assure consistency in repetitive procedures. DeWitt reviews automated synthesis from 1990 to the present, and concludes that full exploitation of automation in the chemistry laboratory has not yet been realized. However, many of the important, but repetitive, manipulations needed to perform an effective library preparation are now routinely automated. As library synthesis becomes commonplace in the industry, the chemist will require even more automated systems. Clearly, we can expect exciting advances in this area in the future.

Finally, the chapter by Patel reviews the use of combinatorial libraries in drug discovery. It is clear that this technology is being widely applied across the pharmaceutical industry. Examples are presented using traditional rational design principles, novel structural lead identification by optimizing the range of diversity space being explored, and simple brute force approaches. Ligands for a wide range of interesting biological targets including enzymes (thermolysin, trypsin, HIV protease, phospholipase, collagenase, and others), receptors (vasopressin, neurokinins, estrogen, and others), and several other classes of targets are reviewed; the list is growing daily.

The future: The field of combinatorial chemistry has moved at an incredibly fast pace in the last several years. What can we expect to see next year in Volume 2?
Advances in adapting solution chemistry to solid-phase organic synthesis will continue and we can expect that nearly all of the standard reactions in organic chemistry will be successfully carried out on solid phase. The creation of novel cleavable linkers for tethering reaction components to supports and improvements aimed at identifying novel solid supports to improve reaction yields and product loading will all contribute to this effort. An expanded scope of reactions that can occur with solution techniques using solid-bound reagents and scavengers, as well as increasingly effective parallel purification schemes, is expected. In short, soon we will have the ability to make nearly any class of molecule we wish using these rapid techniques.

Improved generations of automated instrumentation built to perform organic synthesis and create libraries largely under computer control are also expected. Furthermore, progress in analytical methodology will continue. Successes resulting from combinatorial synthesis have already been reported for both lead generation and optimization and we can expect that these successes will be reported at an ever-increasing pace. We can also expect to see further advances in applying combinatorial techniques outside the traditional pharmaceutical venue, such as in materials science, agricultural science, and many other areas.

We hope you enjoy this section. We are already beginning to prepare for next year’s volume, and expect to see a continuing stream of exciting new breakthroughs.


Techniques for mixture synthesis

John S. Kiely
Houghten Pharmaceuticals Inc., 3550 General Atomics Court, San Diego, CA 92121, U.S.A.

Introduction

This chapter is intended to provide an overview of the field of mixture-based combinatorial chemistry. It deals with reports of library preparation and utilization for the period from the beginning of 1995 to mid-1996. As this is the first Report in this planned series, some discussion of pre-1995 work will be included to set the proper context for the more recent work. The discussion will cover the methods of preparation of mixture-based libraries, the means of identifying actives from the mixtures, and the rationale for using mixtures. A substantial number of reports have appeared on new organic syntheses being carried out on solid support. In general, these are important to mixture-based library production, but in a number of these reports the syntheses were not extended to produce actual libraries or were employed for the synthesis of arrays. For these reasons, these types of reports are beyond the scope of this chapter and will not be discussed. An effort has been made to include the important reports and patents describing actual libraries.

A significant number of reviews and books on combinatorial libraries [1–4], molecular diversity [5–8], and solid-phase synthesis [9,10] have appeared in the last 3 years. For a more in-depth look at the evolution of this area, the reader is referred to these publications. In addition, a new journal has been started dealing with molecular diversity. This journal, Molecular Diversity, is published by ESCOM and is available in printed and electronic form. Associated with it is an Internet worldwide web site devoted to combinatorial chemistry and molecular diversity which can be accessed as www.vesta.pd.com.

Independently, two laboratories working in peptide research developed array (matrix) methods for the multiple simultaneous synthesis of single peptides. The method of Houghten and co-workers employed a porous polypropylene mesh packet to enclose standard polystyrene resin for solid-phase synthesis [11]. Alternatively, Geysen’s method employed polyacrylate-grafted polypropylene rods to accomplish multiple syntheses in a discrete 96-well microtiter plate-based array [12]. Along with others, these same laboratories generated the first reports on mixture-based syntheses, which came shortly after the array publications [13–15]. In two ‘Perspectives’ published in 1994 in the Journal of Medicinal Chemistry, workers at Affymax very clearly laid out the evolutionary progress in preparing combinatorial libraries as mixtures [3,4].

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 6–18 © 1997 ESCOM Science Publishers B.V.

Methods for the preparation of mixture-based libraries

The synthesis of mixture-based libraries in any of the various formats relies on the divide, couple, and recombine paradigm (Fig. 1) to assemble the mixture [13–15]. In this method, the solid support is divided into appropriate portions, each portion is coupled (reacted) with a different building block, the portions are recombined and mixed to a homogeneous state, reproportioned into the proper number of new pools (now as a mixture of the first building blocks), and coupled to a second set of building blocks. If the synthesis is to continue, the pools are remixed and the cycle is repeated. When the synthesis is complete, the pools are not remixed. This separation as the final set of pools serves as the first level in the deconvolution. This technique is amenable to essentially any number of building blocks and as many divide, couple, and recombine cycles as are desired. However, one must keep in mind the statistical nature of the divide step. As the library grows in size (number of compounds), there must be more copies of the individual compounds in order to assure that in each divide step all pools will contain at least one copy of every compound. The means to account for this has been described [16,17].

Fig. 1. Divide, couple, and recombine method (scheme steps shown: divide, couple, recombine, divide, couple).
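The statistical point about the divide step can be made concrete with a small calculation. The sketch below is only an illustration and is not the treatment given in refs. [16,17]: it assumes that each bead falls into one of the pools independently and with equal probability, and the function names and the tolerance `max_expected_missing` are hypothetical choices made for this example.

```python
from math import ceil, log, prod


def library_size(blocks_per_cycle):
    """Number of distinct compounds after all divide/couple/recombine cycles."""
    return prod(blocks_per_cycle)


def prob_pool_misses_compound(copies, pools):
    """Chance that one given pool receives none of the `copies` beads of a compound.

    Assumption: every bead lands in a pool independently and uniformly at random.
    """
    return (1.0 - 1.0 / pools) ** copies


def expected_missing(compounds, copies, pools):
    """Expected number of (compound, pool) pairs left empty by one divide step."""
    return compounds * pools * prob_pool_misses_compound(copies, pools)


def copies_needed(compounds, pools, max_expected_missing=0.01):
    """Smallest number of beads per compound that keeps the expected number of
    empty (compound, pool) pairs at or below `max_expected_missing`."""
    if pools == 1:
        return 1  # a single pool always receives every bead
    # Solve compounds * pools * (1 - 1/pools)**n <= bound for n.
    n = log(max_expected_missing / (compounds * pools)) / log(1.0 - 1.0 / pools)
    return max(1, ceil(n))


if __name__ == "__main__":
    blocks = [10, 10, 10]  # three cycles, ten building blocks each
    print("library size:", library_size(blocks))
    # Before the last coupling, 10 * 10 = 100 compounds are divided into 10 pools.
    print("beads per compound needed:", copies_needed(100, 10))
```

Under this simplified model, the required number of beads per compound grows only logarithmically with the number of compounds, but the total amount of resin still scales with the library size, which is the practical constraint the text points to.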

Iterative library deconvolutions

Heterocyclic libraries

Iterative deconvolution is the original deconvolution method and remains quite reliable. The method relies on synthesizing the library by the divide, couple, and recombine method to prepare a series of mixtures in which one selected diversity position is defined, with a different residue at that position in each mixture. An active mixture(s) is selected and a resynthesis is performed whereby a second diversity position is defined. This is repeated until the resynthesis produces individual compounds. The highly active individual compounds this yields are the ‘actives’ observed in the original active pool(s) of the library. The iterative method has been modeled by computer simulations. The reported results indicate that, even when experimental variability is accounted for, an iterative deconvolution will converge to a molecule(s) that is the most active or very close to the most active (within 1 kcal), even for very large pools (~65 000 compounds/pool) [18,19].

A number of reports have appeared describing the preparation and deconvolution of iteration-based libraries (Fig. 2). A library based on piperazinediones (1) having three diversity sites was reported that comprised 10 pools of 100 compounds. No biological or screening data were initially reported [20,21]. This library was constructed by the reductive alkylation of a support-bound amino acid, followed by the coupling of a second amino acid and finally cyclization. A later footnote revealed that this library produced a piperazinedione with a high affinity for the neurokinin-2 receptor [22]. Additionally, a four-diversity-site piperazinedione (2) and a related morpholinedione (3) library were reported without screening data [23]. These libraries were prepared through the submonomer method, which provides for greater potential diversity within the library [24]. A library of 2000 compounds having a core structure based on 3-amino-5-hydroxybenzoic acid (4) was reported,
again, without activities [25]. For this library, the core structure was sequentially acylated at the amine and alkylated at the phenol to create the library diversity. A novel, highly potent ACE inhibitor analogue of captopril was discovered from the deconvolution of a 500-member mercaptoacyl proline (5) library. The library was prepared via the assembly of four components using amino acid aldimine azomethine ylide cycloaddition with a series of alkenes [26]. The preliminary studies for a 1,4-benzodiazepine-2,5-dione library (6) and a 1(2H)-isoquinolinone library (7), both based upon ‘submonomer’ peptoid synthesis methods, were reported by workers from Chiron [27,28]. Another report describes macrocyclization via Heck chemistry to prepare a small library of macrocyclic amides (8), reminiscent of macrocyclic antibiotics [29]. A library based on Kemp’s triacid was reported to provide 75 000 racemates (9), which were assayed for an unspecified biological activity, and the hits were deconvoluted directly via mass spectrometry [30]. A phenyl piperazine library (10) based on SN2 and SNAr displacement of alkyl or aryl halides was described to provide a small prototype library (< 200 compounds) without biological data [31]. Methods to prepare a 3,4-disubstituted β-lactam library (11) via the Staudinger reaction were reported along with a very small test library synthesis. For a larger version of this library, the variation in diastereomeric ratio for the individual compounds investigated would likely make library component equimolarity problematic [32]. A modest-sized library based on a cyclooxygenase-1 inhibiting thiazolidinone (12) was prepared and deconvoluted in a search for more potent analogues. Unfortunately, a more potent analogue was not found among the combinations of building blocks employed in this library. The deconvolution led to the active compound upon which the library was based. This reconfirms that controlled mixtures will accurately identify the activities present [33].

Fig. 2. Core structures of recent heterocyclic libraries.

Peptidomimetic, peptide and receptor libraries

A number of reviews have appeared covering the area of libraries based on peptidomimetics, peptides and receptors [34,35]. Pfizer chemists reported on an endothelin antagonist developed from a 30 000-compound peptidomimetic library and on the SAR information obtained during the deconvolution [22]. A 900-member peptidylphosphonate library (13) yielded three compounds with activities below 100 nM and with heretofore unreported groups in the P2′ position [36] (Fig. 3). A 1996 report described the preparation of peptide libraries using a Zymark robot and sophisticated analyses of the mixtures by NMR, tandem MS, and capillary electrophoresis [37]. A new on-resin dual-color colorimetric analysis method for screening peptide libraries, named PELICAN, was reported. This method relies on immunostaining, with compounds that bind to the target giving one color and those that bind to the immunoreagent itself giving a different color [38]. An on-resin tag-encoded peptidomimetic library designed to identify amide-based structures that bind to the Src SH3 domain was reported to provide novel leads with micromolar activity [39]. A peptide library designed to be conformationally defined was reported to yield antimicrobial peptides with improved activity following deconvolution [40,41]. µ-Opioid receptor selective peptides were uncovered in a free N-terminal peptide library and in the N-acetyl version of the same peptide library.
Several of the identified sequences were low nanomolar in potency and unrelated to the natural enkephalin sequence [42]. Peptides binding to the extracellular portion of the dopamine D2 receptor, with Kd’s as low as 0.1 µM, were found within a library of pentapeptides using a magnetic bead affinity selection methodology [43]. An endothelin receptor antagonist hexapeptide was developed from a small library via three rounds of deconvolution [44]. A library of phosphinic peptides provided subnanomolar selective tetrapeptide inhibitors of zinc endopeptidases [45]. A peptide mimic of the hepatitis A antigenic sequence was found from a hexapeptide library, and from this a series of effective antibodies was developed [46]. The use of mass spectrometry to deconvolute a peptide library was described by Youngquist et al. [47] and was the subject of an issued patent [48]. Novel calcium-independent antigens to a calcium-dependent monoclonal antibody with potencies greater than that of the original antigen were reported recently [49].

A phosphodiester-based library (14) was employed to identify, by iterative deconvolution, LTB4 and PLA2 binding inhibitors from 20 000+ compounds [50]. The library employed novel phosphoramidite building blocks and produced inhibitors with IC50’s in the range of 0.7–2.0 µM. However, to achieve the intended library size with the available building blocks, the library was composed of pentamers. This resulted in molecular weights substantially over the generally accepted 600–700 amu upper limit for oral availability. A report appeared on a combinatorial library of polyallylamine-based polymers (15) that behave as phosphatase catalysts [51]. Based on the method for preparing this particular polymer library, full deconvolution was not possible in any reasonable manner. An encoded library of synthetic ionophores (16) derived from the cyclen core was reported which yielded improved complexation of Cu2+ [52]. An artificial peptide/steroid-based receptor library (17) was constructed and shown to selectively bind [Leu5]enkephalin [53,54] (Fig. 3).
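The iterative deconvolution procedure described at the start of this section can be sketched in a few lines. This is a toy model only: it assumes that the measured activity of a pool equals the mean activity of its members and that one active pool is followed per round. The activity function and the ‘best’ compound are hypothetical, and the sketch does not reproduce the simulations of refs. 18 and 19.

```python
from itertools import product
from statistics import mean


def iterative_deconvolution(blocks, n_positions, activity):
    """Define one diversity position per round, keeping the most active pool."""
    fixed = []  # residues defined so far, in position order
    for position in range(n_positions):
        n_mixed = n_positions - position - 1  # positions still present as mixtures

        def pool_activity(residue):
            # Pool = all compounds sharing the defined residues plus `residue`.
            members = (tuple(fixed) + (residue,) + rest
                       for rest in product(blocks, repeat=n_mixed))
            return mean(activity(m) for m in members)

        # "Resynthesis": one pool per building block at the new position.
        fixed.append(max(blocks, key=pool_activity))
    return tuple(fixed)


def toy_activity(compound, best=("C", "A", "B")):
    """Hypothetical additive activity: one unit per residue matching `best`."""
    return sum(a == b for a, b in zip(compound, best))


if __name__ == "__main__":
    print(iterative_deconvolution(["A", "B", "C"], 3, toy_activity))
```

In the last round no mixed positions remain, so each ‘pool’ is a single compound, which mirrors the statement that the resynthesis is repeated until individual compounds are produced.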

Newer deconvolution methods and expansions of library diversity

The original iterative method for deconvolution remains robust and highly useful. However, some hold that the resynthesis steps make this deconvolution slow to yield individual hits, although no study is known that quantifies this comparatively. To overcome this perceived limitation, a number of research groups are actively developing additional deconvolution methods for mixture-based libraries.

A method to improve the efficiency of iterations, labeled ‘recursive deconvolution’, was introduced [55]. In this modified iteration method, at each split step of the split and mix process a portion of each pool is held back from the mixing step. Thus, in the subsequent iterative deconvolution, a sample of each pool at the previous split step is immediately available for coupling to the active component. This is postulated to speed up the complete deconvolution process by removing the need to return to the beginning of the synthesis to produce the needed pools. The drawbacks for large libraries in this format are clearly the very significant quantities of material that are required for archival purposes and the nontrivial information tracking required for the ‘recursed’ pools.

Fig. 3. Oligomeric libraries.

Library Format    Library Construction    Screening for Active Pools
OnXX              O1XX, O2XX, O3XX        O3XX
XOnX              XO1X, XO2X, XO3X        XO3X
XXOn              XXO1, XXO2, XXO3        XXO1

Synthesis of Individual Actives: O3O3O1

Fig. 4. Positional scan library format: three diversity positions, three elements per position. O = fixed position, X = equimolar random position.

A rather powerful deconvolution method is positional scan deconvolution, developed by Houghten and co-workers [56] (Fig. 4). In this method a number of copies of a given library are prepared simultaneously, the number of copies being equal to the number of diversity positions. In each copy of the library, a different diversity position is isolated and the possible variants at that position are investigated, while the other diversity positions are kept as equimolar mixtures of all the possible substituents. Each copy thus comprises one library pool for each variant at its defined diversity position. All pools from all copies are screened in parallel, and testing of the individual pools allows one to rank-order the variants and so identify the most active residue(s) at each diversity position. From this information, a series of individual compounds encompassing all the permutations of the active residues at each diversity position is synthesized and tested. In the published examples, which include peptides [49,57–59] and non-peptide polyamines [60], this yields a series of highly active compounds. A comparison of positional scan formatted libraries with libraries of identical diversity deconvoluted by the iterative method demonstrated that, for peptides, similar sequences could be identified by the two methods, although a number of conservative amino acid differences were noted at some positions [61]. With the power of the positional scan format demonstrated for peptide libraries, it seems logical to conclude that the format could be applied to any structural class of molecule. This has recently been explored both theoretically and experimentally by Freier, Konings, and co-workers [19,62].
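The positional scan format of Fig. 4 lends itself to a compact sketch. As before, this is a toy model under stated assumptions: pool activity is taken as the mean activity of the pool members, and the activity function is hypothetical, with the most active compound chosen as (3, 3, 1) simply to mirror the O3O3O1 outcome shown in Fig. 4.

```python
from itertools import product
from statistics import mean


def positional_scan(blocks, n_positions, activity):
    """For each position, rank the building blocks by the activity of the
    pool in which that position is defined (O) and all others are mixed (X)."""
    ranking = []
    for position in range(n_positions):

        def pool_activity(residue):
            members = (m for m in product(blocks, repeat=n_positions)
                       if m[position] == residue)
            return mean(activity(m) for m in members)

        ranking.append(sorted(blocks, key=pool_activity, reverse=True))
    return ranking


def candidates(ranking, keep=1):
    """All permutations of the top `keep` residues at each position; these
    are the individual compounds to synthesize and test."""
    return list(product(*(ranked[:keep] for ranked in ranking)))


def toy_activity(compound, best=(3, 3, 1)):
    """Hypothetical additive activity: one unit per residue matching `best`."""
    return sum(a == b for a, b in zip(compound, best))


if __name__ == "__main__":
    ranking = positional_scan([1, 2, 3], 3, toy_activity)
    print(candidates(ranking))          # single best residue per position
    print(len(candidates(ranking, 2)))  # following two residues per position
```

Unlike the iterative scheme, all pools (here 3 x 3 = 9) are made and screened once, and only individual compounds are synthesized afterwards. Raising `keep` corresponds to following more than one active residue per position.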
In the study by Freier, Konings, and co-workers, which used oligonucleotides binding to oligonucleotide targets as a computationally accessible model system, positional scan deconvolution was as effective as iterative deconvolution except where multiple alignments of pool molecules with the target were possible. In the case of multiple alignment, following multiple pools greatly increased the positional scan success rate. This has been demonstrated experimentally for peptides [56]. As positional scan deconvolution is applied further to heterocyclic libraries, the multiple alignment problem is not likely to be of concern, as the structures employed are not likely to have multiple modes of binding to their target. For heterocyclic libraries in a positional scan format, a key factor is devising a method for efficiently constructing the library in that format. For peptide libraries, the Houghten laboratory was able to employ mixtures of building blocks where the molar ratios had been adjusted to account for reactivity (kinetic rate) differences of incoming building blocks and thereby achieve equimolar mixtures. A non-peptide, two-dimensional positional scan library, identified as an indexed combinatorial library, was reported recently [63]. In this work a novel acetylcholinesterase inhibitor was uncovered from the mixture and confirmed through resynthesis of the individual compound.

An alternative library format, an orthogonal library, was introduced by Tartar and co-workers [64] (Fig. 5). With this format, a group of 25 D- and achiral amino acids was divided into five groups of five (the A1–A5 building block groups), and these were coupled to make all the possible trimers as 125 mixtures of 125 compounds. The original 25 building blocks were then redistributed into a second grouping of five groups of five, the B1–B5 building block groups. In this second grouping, no two building blocks that share an A group are placed together in the same B group. When the second set is used to construct trimers (also 125 per mixture), only one trimer structure is common to any given pair of A and B mixtures. Additionally, the groupings were done such that the diversity of the building blocks within a group was as high as possible. This was done to minimize additivity in the test signal. The utility of this format was demonstrated in a V2 vasopressin receptor binding assay wherein dose dependency studies produced one active A set library pool and two active B set pools. Synthesis of the individual compound common to the A pool and the first B pool yielded a 64 nM inhibitor of ligand binding to the V2 vasopressin receptor. The second compound, common to the A pool and the second B pool, showed no activity. The authors speculate that the original activity in this second B pool was due to an impurity.

Although not identical, the orthogonal and positional scan formatted libraries share the features that all mixtures are made at the start of the library process and that only individual compound synthesis is required after the first screening of mixtures. This is an extra initial effort with regard to the synthesis of mixtures when compared to an iterative method. The advantage is that no intermediate mixture syntheses will be required.
If prepared in sufficient quantity, the library can be screened over a large number of assays, and the added effort of initial mixture syntheses will be translated into an efficiency in deconvolution relative to the continual resynthesis of mixtures with iterative deconvolution.

A & B Library Building Block Sets

        B1    B2
A1      W     Y
A2      X     Z

Library Pools: A1A1, A2A1, A1A2, A2A2, B1B1, B2B1, B1B2, B2B2

Active Pools    Deconvolute Elements
A1A2            (WY)(XZ)
B1B1            (WX)(WX)

Active Building Blocks: WX

Fig. 5. Orthogonal library format: two-set, four-element library.
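The orthogonal format can be sketched as follows. The grouping used here, in which the A groups are the rows and the B groups the columns of a 5 x 5 grid, is an assumption chosen only because it guarantees that each A group shares exactly one building block with each B group. The published grouping [64] was additionally chosen to maximize diversity within each group, which this sketch does not model. Building blocks are represented by hypothetical integer labels.

```python
from itertools import product

SIDE = 5
BLOCKS = list(range(SIDE * SIDE))  # 25 hypothetical building-block labels

# Assumed grouping: A groups are grid rows, B groups are grid columns.
A = [set(BLOCKS[r * SIDE:(r + 1) * SIDE]) for r in range(SIDE)]
B = [set(BLOCKS[c::SIDE]) for c in range(SIDE)]


def mixture(groups, index):
    """All trimers whose i-th residue is drawn from groups[index[i]]."""
    return set(product(*(groups[g] for g in index)))


def deconvolute(a_index, b_index):
    """Trimers common to one active A mixture and one active B mixture."""
    return mixture(A, a_index) & mixture(B, b_index)


if __name__ == "__main__":
    active_a, active_b = (0, 1, 2), (3, 4, 0)
    print(len(mixture(A, active_a)))        # compounds per mixture
    print(deconvolute(active_a, active_b))  # the single shared trimer
```

Because each position of the shared trimer is fixed by the intersection of one row with one column, one active A mixture and one active B mixture point to a single compound, which is then synthesized and tested individually.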

TAG                          Tag Reagent                               Decoding Method
DNA                          Nucleotide phosphoramidite                PCR
Peptide                      Peptide activated ester                   Edman degradation, mass spectrometry
Haloaromatic phenol ether    Haloaromatic phenol ether diazoacetate    Electron capture gas chromatography

Fig. 6. Encoded libraries.

A demonstration of liquid-phase combinatorial synthesis was recently introduced whereby the highly soluble, but highly crystalline, monomethyl polyethylene glycol (PEG) was employed to allow solution synthesis of a peptide library and solid phase-like purification by a change to a solvent in which the PEG was insoluble. The demonstration pentapeptide library yielded the expected anti-β-endorphin binding sequence [65].

A second paradigm for deconvolution is through libraries based on an ‘encoded’ bead method (Fig. 6). In this method a readable chemical tag is attached to the individual bead at each step in the synthesis of the actual molecule on the bead [66]. Here the library is based on a one-bead–one-compound method, and the activity is measured for individual beads. Activity can be determined while the test compound is still attached to the bead, in which case the tag is read directly after the activity is measured. Alternatively, the test compound is cleaved from the bead in a manner that, once activity is detected, allows one to return directly to the individual bead in order to read the tag. The original research in this area was done by Lam and co-workers [14,67,68]. These efforts have progressed to the issuance of a U.S. patent on the ‘one-bead–one-compound’ methodology with claims for peptide libraries [69]. The consequences of the appearance of this patent for those using one-bead–one-compound methods are undetermined at this time. The coding strategies rely on using a binary or higher base code for the information coding and on the ability to read very low concentrations of the tag with accuracy or to amplify the signal in some manner. Additionally, tagging methods have the requirement that the chemistries needed to assemble the actual test molecule and the tag be mutually compatible (orthogonal). Several groups have reported on these methods (Fig. 6).
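The binary coding idea mentioned above can be illustrated with a minimal sketch. It assumes that a distinct set of tags is reserved for each synthesis step and that the presence or absence of each tag on a bead is read as one bit. The numbering of tags and the rule that building-block indices start at 1 (so that every step leaves at least one tag) are illustrative choices, not details taken from the cited reports.

```python
def encode(history, bits_per_step):
    """Tag ids attached to a bead with the given building-block history.

    `history[i]` is the 1-based index of the building block used at step i.
    """
    tags = set()
    for step, block in enumerate(history):
        if not 0 < block < 2 ** bits_per_step:
            raise ValueError("building-block index does not fit the code")
        for bit in range(bits_per_step):
            if (block >> bit) & 1:
                tags.add(step * bits_per_step + bit)
    return tags


def decode(tags, n_steps, bits_per_step):
    """Recover the building-block history from the tags read off one bead."""
    return [
        sum(1 << bit for bit in range(bits_per_step)
            if step * bits_per_step + bit in tags)
        for step in range(n_steps)
    ]


if __name__ == "__main__":
    bead_tags = encode([5, 1, 7], bits_per_step=3)
    print(sorted(bead_tags))
    print(decode(bead_tags, n_steps=3, bits_per_step=3))
```

With three tags per step, up to seven building blocks per step can be distinguished, so nine tags suffice to encode a three-step library of 343 compounds under these assumptions.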
Reported tags include PCR (polymerase chain reaction) amplifiable oligonucleotide tags [70–72]. With oligonucleotide tags, a long primer sequence at each end of the coding segment is needed to obtain good fidelity of PCR copying. This tag is of limited utility owing to the need to synthesize a very long tag string and to the acid lability of the oligonucleotide, which is incompatible with many chemistries. Peptides were employed as the coding tag by scientists at Selectide using a branched and orthogonally protected linkage to the bead. Here, one branch held the peptide tag and the other the test molecule [73,74]. These same workers extended this work to a library where the spacing between the side chains was variable but where the tagging method was unchanged [75]. The most robust of the encoded methods employs nonreactive haloaromatics introduced through carbene insertion reactions. After cleavage from the beads, the tags are detected by electron capture gas chromatography [76]. This methodology has been applied to a dihydrobenzopyran library and an acylpiperidine library. Both of these libraries gave active structures with IC50’s in the low nanomolar range against carbonic anhydrase [77,78]. The concept of employing a radiofrequency encodable tag has also been reported [79].

The concept of preparing a library from a library was introduced as a means of leveraging the initial effort in preparing a mixture-based library (Fig. 7). This is accomplished through a chemical transformation of the library as a whole into a structurally related, but physicochemically different, library [80,81]. This idea is applicable to transforming a single functionality present in all members of the library (a point transformation), for example the conversion of an olefin to isoxazole (16) [81]. Alternatively, the conversion of all identical functionalities contained in all molecules of a library (a global transformation) is possible and was demonstrated in converting a peptide library to an N-permethylated peptide library (17) [80,82]. Both transformations, ‘point’ and ‘global’, yield a new diversity library with no new library construction, only a single chemical transformation. This requires that sufficient library material be prepared in the initial synthesis, in some separable form, so that the library can be partitioned between the two forms.

A follow-on concept to the one-bead–one-compound concept was recently published as a ‘library of libraries’, whereby the authors demonstrated, for peptides, the idea of one-bead–one-motif [83].
For linear peptides, this method assumes that some positions are important for structure and some are important for contact with the target. The library construction is accomplished by combining elements of an iterative library with a parallel array library. The construction of the library breaks the structure down into positions defined as structural positions and motif (pharmacophore) positions. The structural positions are built with mixtures of building blocks and the motif positions are constructed through the divide, couple, and recombine procedure. The results reported showed that peptide binding motifs could be identified that reproduced known motifs.

Fig. 7. Libraries from libraries: point and global transformations.

Rationale for mixture-based combinatorial libraries

A number of commentaries on the reasons for doing combinatorial chemistry have appeared [84–88]. In general, the consensus reasoning behind combinatorial libraries of any type is the recognition that biological screening capabilities have become so efficient that pharmaceutical firms do not have, and cannot acquire, access to sufficient numbers of potential test compounds to satisfy the throughput capacity of the screens. It was recognized that the chemical synthesis of new molecules for high throughput screens by one-at-a-time methods was wholly inadequate. The need to redress this imbalance has driven both array-based library syntheses (see the chapter by Sarshar and Mjalli in this volume) and mixture-based library development.

The decision to utilize mixture-based library approaches derives from two considerations. The first is that current robotic techniques for preparing large numbers of molecules in single compound array libraries are still below the capacity of high throughput screening. The time frame for achieving robotic synthesis capabilities equal to screening throughput is not yet known. In contrast, mixture-based libraries do have the capacity to produce truly large numbers of compounds. This can effectively ‘feed’ a high throughput screen by producing large numbers of compounds even if the number of samples to be tested is not large. The second concerns the seemingly counterintuitive notion of testing a mixture to find active individual compounds. The use of mixtures in screening is well supported as a successful paradigm by the historical and ongoing success of natural products screening. Here mixtures are the norm and are demonstrably successful. With mixture-based libraries, the direct analogy to natural product mixtures is maintained and several improvements are added. In contrast to natural product mixtures, mixture-based synthetic libraries are prepared with known and well-controlled chemistry, thus assuring that activities can be reproduced. In all mixture formats, equimolar concentrations of all elements of the library are sought through control of the mixing aspects of the library synthesis and through reaction optimization efforts. These two aspects assure that all elements of the library to be prepared are in fact prepared, and in nearly equimolar concentrations. This equimolarity is unattainable with natural product mixtures and is a highly beneficial aspect of the synthetic library mixture.

Ultimately, the utility of library mixtures will rest on several factors, including the total number of compounds one wishes to screen, the time available to do the screening, the time necessary to run each assay, and the available supply of biological target/organism. If time and the supply of target are not of concern, then screening individual compounds may make sense. However, if the biological target is in such limited supply that individual testing would exhaust it, then screening mixture-based libraries offers a feasible and useful alternative. For example, a 500 000 individual compound library, if available, could exhaust the biological reagent supply, whereas a mixture-based library would be far more sparing of reagent. If time factors, either cumulative testing time for a large number of compounds or individual assay times, are long, then screening mixtures rather than individual compounds may be better. These points are not always considered when selecting a library format for a given assay. In the future, however, these considerations may be the primary basis upon which library formats are chosen with regard to a new assay.
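The reagent argument can be made concrete with a rough count of the assay samples each format requires. The figures below are illustrative assumptions rather than data from any cited study: 80 building blocks at each of three positions gives 512 000 compounds, close to the 500 000 mentioned above. The count ignores replicates and dose-response follow-up and assumes a single active pool is followed in each iterative round.

```python
def assay_counts(n_blocks, n_positions, followups_per_position=1):
    """Rough number of samples to assay under each screening format."""
    compounds = n_blocks ** n_positions
    pools = n_blocks * n_positions
    return {
        "compounds_in_library": compounds,
        # One assay per compound when compounds are screened individually.
        "individual": compounds,
        # Iterative: n_blocks pools per round, one round per position
        # (the final round consists of individual compounds).
        "iterative": pools,
        # Positional scan: all pools up front, then the individual
        # permutations of the residues followed at each position.
        "positional_scan": pools + followups_per_position ** n_positions,
    }


if __name__ == "__main__":
    for name, count in assay_counts(80, 3, followups_per_position=2).items():
        print(f"{name}: {count}")
```

Under these assumptions a mixture format consumes target for a few hundred samples rather than several hundred thousand, which is the sense in which it is more sparing of a scarce biological reagent.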

A U.S. patent has been issued covering nonnatural antisense structures in a combinatorial library format that claims the combinatorial library itself in an independent claim [89]. The consequences of this type of claim, and the likelihood that such claims will be granted for heterocyclic libraries, are unknown at this time.

Conclusions

The work described in this report illustrates that, while the field is still in its infancy, mixture-based libraries are being employed and explored as lead discovery tools. Successes are evident in identifying new structures that are interesting novel leads. To date, little use of mixture-based libraries for lead development/optimization has been reported. The emphasis on heterocyclic drug-like mixture-based libraries is so recent that it is too soon to expect that the leads that have been reported are more than initial investigational examples. It is highly likely that the more interesting leads suitable for full development remain proprietary and will only be disclosed in the future. The literature reviewed here makes it clear that efforts are continuing to refine and expand mixture-based libraries and their deconvolution methods as lead discovery tools. It is probably safe to say that the next review of this topic in this series will contain much new information.

References

1 Desai, M.C., Zuckermann, R.N. and Moos, W.H., Drug Dev. Res., 33 (1994) 174.
2 Furka, A., Drug Dev. Res., 36 (1995) 1.
3 Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, S.P.A. and Gordon, E.M., J. Med. Chem., 37 (1994) 1233.
4 Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, S.P.A. and Gallop, M.A., J. Med. Chem., 37 (1994) 1385.
5 Rinnova, M. and Lebl, M., Coll. Czech. Chem. Commun., 61 (1996) 172.
6 Terrett, N.K., Gardner, M., Gordon, D.W., Kobylecki, R.J. and Steele, J., Tetrahedron, 51 (1995) 8135.
7 Chaiken, I. and Janda, K.D. (Eds.) Exploiting Molecular Diversity and Solid-Phase Synthesis, American Chemical Society, Washington, DC, U.S.A., 1996.
8 Cortese, R. (Ed.) Combinatorial Libraries: Synthesis, Screening, and Application Potential, Walter de Gruyter, Berlin, Germany, 1996.
9 Thompson, L.A. and Ellman, J.A., Chem. Rev., 96 (1996) 555.
10 Fruchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
11 Houghten, R.A., Proc. Natl. Acad. Sci. USA, 82 (1985) 5131.
12 Geysen, H.M., Meloen, R.H. and Barteling, S.J., Proc. Natl. Acad. Sci. USA, 81 (1984) 3998.
13 Houghten, R.A., Pinilla, C., Blondelle, S.E., Appel, J.R., Dooley, C.T. and Cuervo, J.H., Nature, 354 (1991) 84.
14 Lam, K.S., Salmon, S.E., Hersh, E.M., Hruby, V.J., Kazmierski, W.M. and Knapp, R.J., Nature, 354 (1991) 82.
15 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., Int. J. Pept. Protein Res., 37 (1991) 487.
16 Zhao, P., Nachbar, R.B., Bolognese, J.A. and Chapman, K., J. Med. Chem., 39 (1996) 350.
17 Burgess, K., Liaw, A.I. and Wang, N., J. Med. Chem., 37 (1994) 2985.
18 Freier, S.M., Konings, D.A.M., Wyatt, J.R. and Ecker, D.J., J. Med. Chem., 38 (1995) 344.
19 Konings, D.A.M., Wyatt, J.R., Ecker, D.J. and Freier, S.M., J. Med. Chem., 39 (1996) 2710.
20 Gordon, D.W. and Steele, J., Bioorg. Med. Chem. Lett., 5 (1995) 47.
21 Safár, P., Stierandová, A. and Lebl, M., In Maia, H.L.S. (Ed.) Peptides 1994 (Proceedings of the 23rd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1995, pp. 471–472.
22 Terrett, N.K., Bojanic, D., Brown, D., Bungay, P.J., Gardner, M., Gordon, D.W., Mayers, C.J. and Steele, J., Bioorg. Med. Chem. Lett., 5 (1995) 917.
23 Scott, B.O., Siegmund, A.C., Marlowe, C.K., Pei, Y. and Spear, K.L., Mol. Div., 1 (1995) 125.
24 Zuckermann, R.N., Kerr, J.M., Kent, S.B.H. and Moos, W.H., J. Am. Chem. Soc., 114 (1992) 10646.
25 Dankwardt, S.M., Phan, T.M. and Krstenansky, J.L., Mol. Div., 1 (1995) 113.
26 Murphy, M.M., Schullek, J.R., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 117 (1995) 7029.
27 Goff, D.A. and Zuckermann, R.N., J. Org. Chem., 60 (1995) 5748.
28 Goff, D.A. and Zuckermann, R.N., J. Org. Chem., 60 (1995) 5744.
29 Hiroshige, M., Hauske, J.R. and Zeng, P., J. Am. Chem. Soc., 117 (1995) 11590.
30 Kocis, P., Issakova, O., Sepetov, N.F. and Lebl, M., Tetrahedron Lett., 36 (1995) 6623.
31 Dankwardt, S.M., Newman, S.R. and Krstenansky, J.L., Tetrahedron Lett., 36 (1995) 4923.
32 Ruhland, B., Bhandari, A., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 118 (1996) 253.
33 Look, G.C., Schullek, J.R., Holmes, C.P., Chinn, J.P., Gordon, E.M. and Gallop, M.A., Bioorg. Med. Chem. Lett., 6 (1996) 701.
34 Eichler, J., Appel, J.R., Blondelle, S.E., Dooley, C.T., Doerner, B., Ostresh, J.M., Perez-Paya, E., Pinilla, C. and Houghten, R.A., Med. Res. Rev., 15 (1995) 481.
35 Eichler, J. and Houghten, R.A., Mol. Med. Today, 1 (1995) 174.
36 Campbell, D.A., Bermak, J.C., Burkoth, T.S. and Patel, D.V., J. Am. Chem. Soc., 117 (1995) 5381.
37 Boutin, J.A., Hennig, P., Lambert, P., Berth, S., Petit, L., Mahieu, J., Volland, J. and Fauchere, J., Anal. Biochem., 234 (1996) 124.
38 Buettner, J.A., Dadd, C.A., Baumbach, G.A., Masecar, B.L. and Hammond, D.J., Int. J. Pept. Protein Res., 47 (1996) 70.
39 Combs, A.P., Kapoor, T.M., Feng, S., Chen, J.K., Daude-Shaw, L.F. and Schreiber, S.L., J. Am. Chem. Soc., 118 (1996) 287.
40 Dooley, C.T. and Houghten, R.A., Life Sci., 52 (1993) 1509.
41 Blondelle, S.E., Takahashi, E., Houghten, R.A. and Perez-Paya, E., Biochem. J., 313 (1996) 141.
42 Dooley, C.T., Kaplan, R.A., Chung, N.N., Bidlack, J.M. and Houghten, R.A., Pept. Res., 8 (1995) 124.
43 Sasaki, S., Takagi, M., Tanaka, Y. and Maeda, M., Tetrahedron Lett., 37 (1996) 85.
44 Neustadt, B., Wu, A., Smith, E.M., Nechuta, T., Fawzi, A., Zhang, H. and Ganguly, A.K., Bioorg. Med. Chem. Lett., 5 (1995) 2041.
45 Jiracek, J., Yiotakis, A., Vincent, B., Lecoq, A., Nicolaou, A., Checler, F. and Dive, V., J. Biol. Chem., 270 (1995) 21701.
46 Mattioli, S., Imberti, L., Stellini, R. and Primi, D., J. Virol., 69 (1995) 5294.
47 Youngquist, R.S., Fuentes, G.R., Lacey, M.P. and Keough, T., J. Am. Chem. Soc., 117 (1995) 3900.
48 Sepetov, N., Issakova, O., Krchnák, V. and Lebl, M., U.S. Patent 5 470 753, 28 November 1995.
49 Pinilla, C., Buencamino, J., Appel, J.R., Hopp, T.P. and Houghten, R.A., Mol. Div., 1 (1995) 21.
50 Davis, P.W., Vickers, T.A., Wilson-Lingardo, L., Wyatt, J.R., Guinosso, C.J., Sanghvi, Y.S., DeBaets, E.A., Acevedo, O.L., Cook, P.D. and Ecker, D.J., J. Med. Chem., 38 (1995) 4363.
51 Menger, F.M., Eliseev, A.V. and Migulin, V.A., J. Org. Chem., 60 (1995) 6666.
52 Burger, M.T. and Still, W.C., J. Org. Chem., 60 (1995) 7382.
53 Still, W.C., Acc. Chem. Res., 29 (1996) 155.
54 Cheng, Y., Suenaga, T. and Still, W.C., J. Am. Chem. Soc., 118 (1996) 1813.
55 Erb, E., Janda, K.D. and Brenner, S., Proc. Natl. Acad. Sci. USA, 91 (1994) 11422.
56 Pinilla, C., Appel, J.R., Blanc, P. and Houghten, R.A., BioTechniques, 13 (1992) 901.
57 Pinilla, C., Appel, J.R., Blondelle, S.E., Dooley, C.T., Eichler, J., Ostresh, J.M. and Houghten, R.A., Drug Dev. Res., 33 (1994) 133.
58 Pinilla, C., Appel, J., Blondelle, S., Dooley, C., Dorner, B., Eichler, J., Ostresh, J. and Houghten, R.A., Biopolym. Pept. Sci. Sect., 37 (1995) 221.
59 Pinilla, C., Buencamino, J., Appel, J.R., Houghten, R.A., Brassard, J.A. and Ruggeri, Z.M., Biomed. Pept. Protein Nucleic Acids, 1 (1995) 199.
60 Dooley, C.T. and Houghten, R.A., Analgesia, 1 (1995) 400.
61 Blondelle, S.E., Houghten, R.A. and Perez-Paya, E., J. Biol. Chem., 271 (1996) 4093.
62 Wilson-Lingardo, L., Davis, P.W., Ecker, D.J., Hebert, N., Acevedo, O., Sprankle, K., Brennan, T., Schwarcz, L., Freier, S.M. and Wyatt, J.R., J. Med. Chem., 39 (1996) 2720.
63 Pirrung, M.C. and Chen, J., J. Am. Chem. Soc., 117 (1995) 1240.
64 Deprez, B., Williard, X., Bourel, L., Coste, H., Hyafil, F. and Tartar, A., J. Am. Chem. Soc., 117 (1995) 5405.
65 Han, H., Wolfe, M.M., Brenner, S. and Janda, K.D., Proc. Natl. Acad. Sci. USA, 92 (1995) 6419.
66 Janda, K.D., Proc. Natl. Acad. Sci. USA, 91 (1994) 10779.
67 Lebl, M., Krchnák, V., Sepetov, N.F., Seligmann, B., Strop, P., Felder, S. and Lam, K.S., Biopolym. Pept. Sci. Sect., 37 (1995) 177.
68 Madden, D., Krchnák, V. and Lebl, M., Perspect. Drug Discov. Design, 2 (1994) 269.
69 Lam, K.S. and Salmon, S.E., U.S. Patent 5 510 240, 23 April 1996.
70 Needels, M.C., Jones, D.G., Tate, E.H., Henkel, G.L., Kochersperger, L.M., Dower, W.J., Barrett, R.W. and Gallop, M.A., Proc. Natl. Acad. Sci. USA, 90 (1993) 10700.
71 Nielsen, J., Brenner, S. and Janda, K.D., J. Am. Chem. Soc., 115 (1993) 9812.
72 Brenner, S. and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 5381.
73 Nikolaiev, V., Stierandova, A., Krchnák, V., Seligmann, B., Lam, K.S., Salmon, S.E. and Lebl, M., Pept. Res., 6 (1993) 161.
74 Salmon, S.E., Lam, K.S., Lebl, M., Kandola, A., Khattri, P.S., Healy, S., Wade, S., Patek, M., Kocis, P., Krchnák, V., Thorpe, D. and Felder, S., In Hodges, R.S. and Smith, J.A. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 13th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1994, pp. 1001–1002.
75 Krchnák, V., Weichsel, A.S., Cabel, D. and Lebl, M., Pept. Res., 8 (1995) 198.
76 Ohlmeyer, M.H.J., Swanson, R.N., Dillard, L.W., Reader, J.C., Asouline, G., Kobayashi, R., Wigler, M. and Still, W.C., Proc. Natl. Acad. Sci. USA, 90 (1993) 10922.
77 Baldwin, J.J., Burbaum, J.J., Henderson, I. and Ohlmeyer, M.H.J., J. Am. Chem. Soc., 117 (1995) 5588.
78 Burbaum, J.J., Ohlmeyer, M.H.J., Reader, J.C., Henderson, I., Dillard, L.W., Li, G., Randle, T., Sigal, N.H., Chelsky, D. and Baldwin, J.J., Proc. Natl. Acad. Sci. USA, 92 (1995) 6027.
79 Moran, E.J., Sarshar, S., Cargill, J.F., Shahbaz, M.M., Lio, A. and Mjalli, A.M.M., J. Am. Chem. Soc., 117 (1995) 10787.
80 Ostresh, J.M., Husar, G.M., Blondelle, S.E., Dorner, B., Weber, P.A. and Houghten, R.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 11138.
81 Pei, Y. and Moos, W.H., Tetrahedron Lett., 35 (1994) 5825.
82 Houghten, R.A., Ostresh, J.M., Husar, G.M., Dorner, B. and Blondelle, S.E., In Maia, H.L.S. (Ed.) Peptides 1994 (Proceedings of the 23rd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1995, pp. 459–460.
83 Sepetov, N.F., Krchnák, V., Stankova, M., Wade, S., Lam, K.S. and Lebl, M., Proc. Natl. Acad. Sci. USA, 92 (1995) 5426.
84 Krchnák, V. and Lebl, M., Mol. Div., 1 (1996) 193.
85 Mitscher, L.A., Chemtracts Org. Chem., 8 (1995) 19.
86 Czarnik, A.W., Chemtracts Org. Chem., 8 (1995) 13.
87 Pirrung, M.C., Chemtracts Org. Chem., 8 (1995) 5.
88 Lyttle, M.H., Drug Dev. Res., 35 (1995) 230–236.
89 Summerton, J.E. and Weller, D.D., U.S. Patent 5 506 337, 9 April 1996.


Techniques for single-compound synthesis

Sepehr Sarshar and Adnan M.M. Mjalli
Ontogen Corp., 2325 Camino Vida Roble, Carlsbad, CA 92009, U.S.A.

Issues to consider

Traditionally, lead discovery in the pharmaceutical industry is achieved either through rational drug design or high throughput screening of sample collections, plant extracts or animal tissues. Subsequently, medicinal chemists optimize these leads through single-compound synthesis. Such a process may take 6.5 years on average before producing suitable clinical candidates [1]. With the advent of high throughput biological screening, the synthesis of compounds has become the bottleneck in drug discovery. Such a wide gap in efficiency between medicinal chemistry output and high throughput screening outlines the need for novel high throughput synthetic technologies geared towards the automated production of large libraries of small organic molecules.

The idea behind high throughput synthesis is not new and has been applied to peptides and oligonucleotides for many years [2]. The success of this methodology is partly due to the pioneering work of Merrifield and his use of solid supports for the synthesis of polypeptides [3]. While this technology was eventually automated [4], Merrifield’s original apparatus could only produce a single polypeptide at a time. Therefore, in order to realize the automated production of large libraries of small non-peptidyl organic molecules, the logistics related to the synthesis, isolation and identification of thousands of compounds had to be addressed. Such strategies eventually led to the development of the field of combinatorial chemistry [2a].

In one case, single compounds are synthesized in parallel and in a spatially dispersed format where the coordinates of the synthesis site in a two-dimensional array can be directly associated with a distinct chemical structure. Although the identification process is simplified in this case, the size of the potential libraries is limited to 10^4 compounds. Another approach which takes advantage of the split and recombine strategy [5] necessitates the use of chemical [6] or physical tags [7]. With this method, compound libraries with >10^6 components may be produced at the expense of time-consuming tagging and decoding techniques. While both methods have advantages and shortcomings, when employed in concert they provide the medicinal chemist with powerful tools in lead discovery and optimization.

This review will focus on the latest methods and automated technologies directed at the high throughput synthesis of single compounds. These methods include the production of spatially dispersed libraries using uniquely designed reaction vessels and robotic instruments as well as a split and recombine strategy which takes advantage of a radio frequency encoding system.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 19–29 © 1997 ESCOM Science Publishers B.V.


Spatially dispersed strategy

Automated synthesis of peptide and oligonucleotide libraries was initiated about 10 years ago [4]. Within the last three years, there has been much attention focused on the generation of combinatorial libraries of small molecules. As with biopolymers, the use of solid resin support was central to the advance of this field. In solid-phase synthesis, one of the reactants is covalently bound to the solid support and an excess of the other reactants may be used in each step to drive reactions to completion. Purification of the intermediates and final product is easily achieved through extensive washing of the resin after each chemical step. For the purpose of high throughput synthesis, cleavage of the final product directly into the wells of a standard 96-well microtiter plate provides a spatially dispersed combinatorial library of pure single compounds in solution. These libraries may be screened directly in a variety of biological assays and the structure of each compound is assigned based on its physical location in the microtiter plate.

Fig. 1. The Multipin SPOC system.

Scheme 1. Multipin SPOC synthesis of peptoid libraries.

Recent literature contains a multitude of examples of synthetic organic methodologies which have been optimized and applied to the solid-phase synthesis of combinatorial libraries [2]. In fact, the production of a number of these libraries has been realized on such systems as the Multipin™ SPOC [8], the DIVERSOMER™ [9] and the OntoBLOCK [10]. All three apparatuses allow the automated production of spatially dispersed combinatorial libraries and facilitate the isolation, identification, screening and archiving of single compounds in distinct physical locations, all of which are crucial during lead discovery and optimization.
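The assignment of structure by plate position can be illustrated with a short sketch. It assumes a hypothetical two-step library built from 8 inputs of one kind and 12 of another, laid out row by row on a single 96-well plate; the building-block names and the layout rule are illustrative assumptions, not part of any of the systems described here.

```python
from itertools import product
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # rows A-H of a 96-well plate
COLS = range(1, 13)          # columns 1-12


def plate_map(first_inputs, second_inputs):
    """Assign one (first, second) building-block pair to each well, row by row."""
    wells = [f"{row}{col}" for row in ROWS for col in COLS]
    combos = list(product(first_inputs, second_inputs))
    if len(combos) > len(wells):
        raise ValueError("more compounds than wells on one plate")
    return dict(zip(wells, combos))


# Hypothetical building blocks: one first-step input per row, one second-step input per column.
amines = [f"amine_{i}" for i in range(1, 9)]
acids = [f"acid_{j}" for j in range(1, 13)]

layout = plate_map(amines, acids)
print(layout["B3"])  # the structure in well B3 is known from its position alone
```

Because the layout is fixed before synthesis, an active well identified in a screen can be decoded by a dictionary lookup, with no tag to read.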

Scheme 2. Multipin SPOC synthesis of benzodiazepine libraries.


Fig. 2. The DIVERSOMER apparatus.

The Multipin SPOC system

In 1984, Geysen and co-workers introduced a method for the parallel synthesis of peptides using spatially dispersed polymeric pins [11]. This Multipin SPOC (solid-phase organic chemistry) system [8] is now commercially available from Chiron Technologies. The design is based on a modular 8 × 12 matrix of polypropylene pins grafted with a methacrylic acid-dimethylacrylamide copolymer. The pins are functionalized with a variety of linkers to allow flexibility in the design of library synthesis. Currently, the loading levels of the pins vary from 1–1.5 µmol (Multipin SPOC 1) to 5–7 µmol (Multipin SPOC 5); however, a 50 µmol pin is due to be released in 1997.

A standard 96-well microtiter plate may be used to carry out the parallel synthesis of up to 96 single compounds (Fig. 1). The reactants are added to each well of the microtiter plate (A) either by hand or using an automated delivery system. The pin array (B) is then placed on top of the plate and the resin is allowed to incubate with the reactants. After each chemical step, B is removed, any excess reactants are rinsed off, and the process is repeated until the synthesis is complete. The final products are cleaved into a 96-well microtiter plate. According to the manufacturer, the polypropylene polymer from which the pins are made softens at temperatures above 90 °C, a matter which should be taken into consideration when planning the library synthesis.
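As a rough guide to scale, the pin loadings quoted above can be converted into an upper bound on product mass. The sketch below assumes quantitative yield and a hypothetical product molecular weight of 400 g/mol; both are illustrative assumptions, not manufacturer data.

```python
def max_yield_mg(loading_umol, mol_weight_g_per_mol):
    """Upper bound on product mass in mg: umol x g/mol gives ug, divided by 1000 gives mg."""
    return loading_umol * mol_weight_g_per_mol / 1000


MOL_WEIGHT = 400.0  # hypothetical small-molecule product, g/mol

# Loading ranges (umol per pin) as quoted in the text.
pins = {
    "Multipin SPOC 1": (1.0, 1.5),
    "Multipin SPOC 5": (5.0, 7.0),
}

for name, (low, high) in pins.items():
    print(name, max_yield_mg(low, MOL_WEIGHT), "to", max_yield_mg(high, MOL_WEIGHT), "mg")
```

Under these assumptions a single pin yields well under 3 mg of product, and real yields after multistep synthesis and cleavage would be lower.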

Scheme 3. Examples of libraries synthesized using the DIVERSOMER system.


Using the Multipin SPOC systems, researchers at Chiron have demonstrated the automated solid-phase synthesis of ‘peptoids’ 1 (Scheme 1) [12], while Ellman and co-workers have completed the synthesis of a combinatorial library of 11 200 spatially dispersed benzodiazepines 2 (Scheme 2) [13].

The DIVERSOMER method

The DIVERSOMER method [9] was developed by DeWitt and co-workers at Parke-Davis. The apparatus consists of a 4 × 2 array of glass reservoirs embedded in an anodized aluminum reaction block (A) which mates with a Teflon holder block (B) fitted with a series of long glass fritted tubes (C) designed to contain up to 100 mg of polymer support (Fig. 2). Solvent and reactants are placed in the reservoirs of A and reaction takes place as the reactants diffuse through the glass frit at the bottom of C and come into contact with the resin. This design facilitates the filtration and washing of the polymer after each chemical step. The A block may be cooled or heated based on the requirements of the reaction. In the latter case, the long glass tubes (C) act as reflux condensers when a current of chilled inert gas is circulated within the chamber. Teflon gaskets and an acrylic manifold provide an inert and airtight environment so that moisture- and air-sensitive reactions may be carried out on solid support using the setup in D (Fig. 2). The DIVERSOMER eight-pin synthesizer (D) is commercially available from CHEMGLASS Inc.

The synthesis of non-oligomeric compounds such as highly substituted benzodiazepines 3, hydantoins 4 and quinolones 5 (Scheme 3) has been successfully automated at Parke-Davis using the DIVERSOMER system [9].

The OntoBLOCK system

The OntoBLOCK system [10] is a fully automated synthesis-analysis platform. A set of two anodized aluminum blocks (labeled ‘alpha’ and ‘beta’ OntoBLOCKs), each containing 48 2-ml replaceable reaction chambers (Fig. 3), are placed in a PC-controlled robotic synthesis station (Fig. 4) which delivers solvents and reactants to each well according to a customized script. Each chamber contains a polypropylene frit which supports the synthesis resin. The blocks are mirror images of each other and are designed so that the ‘alpha’ block delivers its contents to the 48 even wells of a microtiter plate whereas the ‘beta’ block delivers its contents to the 48 odd wells. The blocks are septa sealed and can be pressurized up to 30 psi. A docking station and gas inlets provide an inert environment for moisture- and air-sensitive reactions.

Fig. 3. Alpha and beta OntoBLOCKs.

Fig. 4. The OntoBLOCK synthesis station.

The temperature of the reactions may be varied from –100 °C up to +120 °C. Low temperatures are obtained by pumping liquid N2 through the block. When a high-temperature environment is desired during the course of a reaction, the sealed blocks are placed in a preheated laboratory oven. Reaction block docking stations are located inside the oven. These docking stations provide orbital agitation for the reaction blocks during the course of the reaction.

Fig. 5. The OntoBLOCK system synthesis, analysis and archival strategy.

Scheme 4. Examples of libraries synthesized using the OntoBLOCK system.

Extensive washing of the resin with a series of appropriate solvents removes excess reactants and impurities from the solid support after each chemical step (Fig. 5). This task is carried out very efficiently by an automated wash station which utilizes two arrays of 48 luer lock needles to simultaneously add solvent to a set of alpha–beta reaction blocks. Microsoft Windows™-based control software provides a custom-designed ‘wash profile’ for each combinatorial experiment. This profile specifies the identity, volume, order and delivery speed of each wash solvent, as well as the agitation speed and time between solvent deliveries.

Once the final compound has been cleaved from the solid support, the contents of the blocks are drained into a standard 96-well microtiter plate. The solvent is removed in a vacuum oven and appropriate daughter plates (master chemistry plate, master biology plate and analytical plate) are prepared using a 96-channel pipetting robot. The average production capacity of such a work station is between 1000 and 2000 single spatially dispersed compounds per day. Ontogen has also automated the synthesis of a variety of heterocyclic pharmacophores [14] (Scheme 4) using the OntoBLOCK system.
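The ‘wash profile’ described above is essentially an ordered list of solvent deliveries with their parameters. The following sketch shows one way such a profile could be represented and used to estimate solvent consumption for an alpha–beta block pair. The class name, field names, solvents and values are all hypothetical; the actual control software and its data format are not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashStep:
    """One solvent delivery in a wash profile (all fields are illustrative)."""
    solvent: str              # identity of the wash solvent
    volume_ml: float          # volume delivered to each reaction chamber
    delivery_ml_per_s: float  # delivery speed
    agitation_rpm: int        # agitation speed after delivery
    agitation_s: int          # time before the next solvent delivery


def total_solvent_ml(profile, chambers=96):
    """Total volume of each solvent needed to wash every chamber once per step."""
    totals = {}
    for step in profile:
        totals[step.solvent] = totals.get(step.solvent, 0.0) + step.volume_ml * chambers
    return totals


# The order of the list is the order of delivery.
profile = [
    WashStep("DMF", 1.0, 0.5, 300, 60),
    WashStep("MeOH", 1.0, 0.5, 300, 60),
    WashStep("DCM", 1.0, 0.5, 300, 60),
    WashStep("DMF", 1.0, 0.5, 300, 60),
]

print(total_solvent_ml(profile))
```

Keeping the profile as plain data means the same wash sequence can be reused or adjusted per experiment without changing the code that drives the station.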

Split and recombine strategy

For linear multistep syntheses, the split and recombine strategy is a fast and effective method of producing large libraries. In fact, a synthesis with x inputs per step and y chemical steps gives rise to a library of x^y components. Chemical tagging methods have been used extensively for the identification of active components in such libraries [6]. An alternative to chemical tagging is a coding system that uses radio frequency (rf) encodable microchips.
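The x^y relationship, and the reason pooling is efficient, can be checked with a few lines of code. This is a minimal sketch with hypothetical building-block labels: it enumerates every member of a small library and compares the library size with the number of reaction vessels needed when the resin is pooled and split between steps (x vessels per step, y steps).

```python
from itertools import product


def library_size(x, y):
    """Number of distinct products from x inputs per step over y steps."""
    return x ** y


def pooled_reactions(x, y):
    """Reaction vessels needed with split and recombine: x per step, y steps."""
    return x * y


def enumerate_library(inputs, steps):
    """Every ordered combination of building blocks, one per library member."""
    return list(product(inputs, repeat=steps))


inputs = ["A", "B", "C", "D", "E"]  # hypothetical building blocks
members = enumerate_library(inputs, 3)

print(len(members), library_size(5, 3), pooled_reactions(5, 3))
```

For five inputs and three steps the library has 125 members but requires only 15 pooled reactions, which is the source of the method's efficiency.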


Fig. 6. Split synthesis strategy using the OntoCODE system.

Glass-encased microchips that are pretuned to emit a unique binary code when pulsed with electromagnetic radiation [15] are widely used to tag and identify laboratory animals via subcutaneous injection. The number of unique identifier tags available is limited only by the length of the transponder binary code. Association of an encodable transponder with a synthesis site during a split and recombine synthesis could allow the storage of relevant background information such as reaction path, chemical inputs or chemical transformation. Hence, uploading these data from the microchip at any time during the experiment would provide detailed information about the identity of the chemical structure associated with each synthesis site. In this technique, an rf encodable microchip is coupled with a capsule of derivatized polystyrene resin such that each unique synthesis site can be tagged with a unique identifier code. The inert nature of the rf transponder construction renders this tagging strategy compatible with virtually all synthetic methods. Additionally, the noninvasive transmission or retrieval of information from any capsule is unambiguous and instantaneous, avoiding the possibility of long reaction and/or analysis times associated with chemical tags.

Scientists at Ontogen have developed the OntoCODE system [16] which is the result of a modified split synthesis strategy [5] used in conjunction with the OntoBLOCK [10] system. In this approach, the capsules were pooled in different beakers and the location of each capsule was recorded via scanning of its rf transponder code (Fig. 6). The codes were uploaded to a PC database program. Transponders (IMI type) were scanned with a DAS-5004 Integrated ID Data Acquisition Module (BioMedic Data Systems Inc.). Lists of transponder ID codes were uploaded via serial port to a PC using the DASLINK communication software and parsed into an Excel spreadsheet program. Software subroutines were constructed using a Visual Basic Application and were nested in the Excel program. Subsequently, the first round of synthesis was performed. The capsules were scanned and redistributed according to a flow chart provided by the program. This would ensure that all possible members of the library would be synthesized. A histogram of the synthesis was developed in the PC database by following this determinate distribution procedure for all capsules.

Scheme 5. Cinnamate peptide inhibitors of PTP1B.

With the Furka split synthesis strategy [5], a stochastic distribution of multiple copies of individual synthesis sites (resin beads) ensures the successful synthesis of all possible chemical combinations. In the OntoCODE method, this is done by the determinate distribution of synthesis sites (capsules). Of course, determinate distribution of capsules between synthetic steps is only possible if the code on an individual capsule can be retrieved during the synthesis. After the final round of synthesis, transponders were scanned and distributed into individual wells of OntoBLOCK reaction vessels. The final products were cleaved from the solid support and transferred directly to 96-well microtiter plates. Thus, a split and recombine strategy was employed to produce a spatially dispersed library of single compounds without the need for any chemical tagging.

As an example of this technology, a library of 125 cinnamate capped tripeptides was synthesized on Rink resin [17] using five amino acid building blocks per coupling step (Scheme 5) [16]. This library was screened against the human protein tyrosine phosphatase PTP1B enzyme and a number of potent inhibitors were identified (40 nM < IC50 < 200 nM). Recently, researchers from IRORI have developed a similar technique based on microchips to which information is written before each chemical step [18].
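The determinate distribution of capsules can be sketched in a few lines. The example mirrors the size of the 125-member library (five building blocks, three coupling steps), but the tag identifiers, building-block labels and function names are hypothetical, and this is not a reconstruction of the Ontogen spreadsheet routines: it only illustrates the idea of assigning each tagged capsule a target sequence in advance and sorting capsules into reaction vessels before each step.

```python
from collections import defaultdict
from itertools import product


def assign_targets(tag_ids, building_blocks, steps):
    """Give each tagged capsule one unique target sequence of building blocks."""
    targets = list(product(building_blocks, repeat=steps))
    if len(tag_ids) != len(targets):
        raise ValueError("need exactly one capsule per library member")
    return dict(zip(tag_ids, targets))


def sort_for_step(targets, step):
    """Group capsules by the building block each must receive at this step."""
    vessels = defaultdict(list)
    for tag, sequence in targets.items():
        vessels[sequence[step]].append(tag)
    return dict(vessels)


blocks = ["aa1", "aa2", "aa3", "aa4", "aa5"]   # hypothetical amino acid inputs
tags = [f"RF{n:04d}" for n in range(125)]      # hypothetical transponder codes

targets = assign_targets(tags, blocks, steps=3)

for step in range(3):
    vessels = sort_for_step(targets, step)
    print(step + 1, {block: len(capsules) for block, capsules in vessels.items()})
```

Each step uses five vessels of 25 capsules, and because every capsule follows a distinct preassigned path, all 125 products are made exactly once without relying on statistical oversampling.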

Comparison of techniques

A successful lead optimization process is, in general, a result of careful and systematic analysis of structure-activity relationships (SAR) within an active series of compounds. The validity of the SAR obtained from screening a combinatorial library is a reflection of an accurate knowledge of the identity, purity and quantity of each library member. Also important is the ease and speed with which information can be relayed from the biological screen to the medicinal chemist. In fact, herein lies the main difference between the two techniques discussed so far.

The libraries obtained from split and recombine syntheses which produce a single compound per resin bead have the advantage of producing >10^6 compounds. However, there are many drawbacks to this method. First, each bead in the resin pool must be chemically tagged at every step. This is a time-consuming process and also presents challenges to the chemists who must design the syntheses in order to avoid the potential chemical incompatibilities between the compounds and the tags on each bead. Second, only nanomole quantities of material are available from each bead, rarely enough material for multiple biological assays, structure identification and archiving. Hence, after the chemical tags have been removed and analyzed, the medicinal chemist must resynthesize and purify larger quantities of the active compound in order to carry out further biological testing. Third, the components of the library are often screened while still bound to the resin. This may lead to erroneous results since solution and polymer-bound conformations of these compounds may differ drastically. Similarly, steric interactions between the polymer surface and the enzyme or receptor may add to this problem. Finally, the larger sizes of these libraries prohibit the establishment of comprehensive SAR since only a small percentage of the active components are chosen for structural identification. Hence a compound could be interpreted as inactive when in fact it may not have formed during the reaction.

The problems outlined above are easily circumvented with the use of spatially dispersed synthesis techniques. Both the DIVERSOMER and OntoBLOCK methods allow the synthesis and isolation of milligram quantities of each library component without any need for chemical tagging. Once the compounds have been screened in solution, their identity is immediately assigned based on their physical location in the microtiter plate. The purity of each component can also be evaluated using a variety of analytical techniques (e.g. NMR, MS, HPLC).

Researchers at Ontogen have designed special computer software which allows rapid analysis of the biological screening data from combinatorial libraries. Before a library is synthesized, a registration program generates structures for each library component. Each structure is then assigned a systematic number and is stored in a customized Daylight database. A customized Oracle database is then used to associate this systematic number with the location, biological and analytical data of each compound in a combinatorial library. The final data may be plotted as 3D histograms (plate location versus biological activity) and SAR can be established almost immediately.

The OntoCODE system effectively combines the advantages of both the spatially dispersed and split and recombine strategies and allows the chemist to build large archivable combinatorial libraries with milligram quantities of each compound and without the need for chemical tagging.
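The link between a systematic number, a plate location and screening data can be illustrated with plain dictionaries. This is only a sketch of the association described above: it uses in-memory Python structures rather than the Daylight and Oracle databases named in the text, and every identifier, building block and IC50 value is invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical registry: systematic number -> building blocks and plate location.
registry = {
    1: {"r1": "aa1", "r2": "aa1", "plate": "P001", "well": "A1"},
    2: {"r1": "aa1", "r2": "aa2", "plate": "P001", "well": "A2"},
    3: {"r1": "aa2", "r2": "aa1", "plate": "P001", "well": "B1"},
    4: {"r1": "aa2", "r2": "aa2", "plate": "P001", "well": "B2"},
}

# Hypothetical screening results keyed by physical location (IC50 in micromolar).
assay = {
    ("P001", "A1"): 0.05,
    ("P001", "A2"): 0.15,
    ("P001", "B1"): 4.0,
    ("P001", "B2"): 6.0,
}


def annotate(registry, assay):
    """Attach the assay value found at each compound's plate location."""
    rows = []
    for number, record in registry.items():
        ic50 = assay.get((record["plate"], record["well"]))
        rows.append({"id": number, **record, "ic50_uM": ic50})
    return rows


def mean_ic50_by(rows, position):
    """Average IC50 for each building block used at the given position."""
    groups = defaultdict(list)
    for row in rows:
        if row["ic50_uM"] is not None:
            groups[row[position]].append(row["ic50_uM"])
    return {block: mean(values) for block, values in groups.items()}


rows = annotate(registry, assay)
print(mean_ic50_by(rows, "r1"))
```

In this toy data set, compounds carrying aa1 at the first position are markedly more potent on average, which is the kind of trend a location-indexed database makes visible as soon as the screen is read.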

Conclusions and future prospects

The automated spatially dispersed synthesis of single compounds has effectively shortened lead discovery time by properly addressing the strict requirements of traditional medicinal chemistry. Pure, single compounds are synthesized on a milligram scale, can be identified without chemical tagging and archived for future screens. However, the inability to accurately quantify the isolated amount of every library component is a major obstacle during lead optimization. High throughput purification, identification and quantification of combinatorial libraries could potentially resolve this issue. Such a task could be automated with the advent of parallel HPLC-MS systems with in-line evaporative light scattering (ELS) detectors.

Solid-phase organic chemistry is another area where analytical techniques are not as efficient as their solution counterparts. However, high-resolution NMR spectra of polymer-bound low molecular weight organic compounds have been obtained using magic angle spinning (MAS) probes including the Varian liquid Nanoprobe™ [19]. Such techniques should facilitate the optimization of reaction conditions on solid support and may also obviate the need for chemical tags in the future.

References

1 Kreeger, K.Y., Scientist, 10 (1996) 1.
2 a. Thompson, L.A. and Ellman, J.A., Chem. Rev., 96 (1996) 555.
  b. Ellman, J.A., Acc. Chem. Res., 29 (1996) 132.
  c. Früchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
  d. Gordon, E.M., Gallop, M.A. and Patel, D.V., Acc. Chem. Res., 29 (1996) 144.
3 Merrifield, R.B., J. Am. Chem. Soc., 85 (1963) 2149.
4 Merrifield, R.B., Angew. Chem., Int. Ed. Engl., 24 (1985) 799.
5 a. Sebestyen, F., Dibo, G., Kovacs, A. and Furka, A., Biomed. Chem. Lett., 3 (1993) 413.
  b. Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., Int. J. Pept. Protein Res., 37 (1991) 487.
  c. Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., Abstracts of the 14th International Congress of Biochemistry, Vol. 5, Prague, Czechoslovakia, 1988, p. 47.
6 a. For a recent review on this subject, see Janda, K., Proc. Natl. Acad. Sci. USA, 91 (1994) 10779.
  b. Oligonucleotide tags: Nielsen, J., Brenner, S. and Janda, K.D., J. Am. Chem. Soc., 115 (1993) 9812.
  c. Peptide tags: Kerr, J.M., Banville, S.C. and Zuckermann, R.N., J. Am. Chem. Soc., 115 (1993) 2529.
  d. Haloaromatic tags: Still, W.C., Acc. Chem. Res., 29 (1996) 155.
  e. Baldwin, J.J., Burbaum, J.J., Henderson, I. and Ohlmeyer, M.H.J., J. Am. Chem. Soc., 117 (1995) 5588.
7 For the use of ‘tea bags’ in the synthesis of peptide libraries, see: Houghten, R.A., Proc. Natl. Acad. Sci. USA, 82 (1985) 5131.
8 Multipin SPOC is available from Chiron Technologies ([email protected]).
9 a. DeWitt, S.H. and Czarnik, A.W., Acc. Chem. Res., 29 (1996) 114.
  b. Cody, D.M.R., DeWitt, S.H., Hodges, J.C., Kiely, J.S., Moos, W.H., Pavia, M.R., Roth, B.D., Schroeder, M.C. and Stankovic, C.J., U.S. Patent No. 5 324 483, 28 June 1994.
  c. DeWitt, S.H., Kiely, J.S., Stankovic, C.J., Schroeder, M.C., Cody, D.M.R. and Pavia, M.R., Proc. Natl. Acad. Sci. USA, 90 (1993) 6909.
10 a. Cargill, J.F., Maiefski, R.R. and Toyonaga, B.E., International Symposium on Laboratory Automation and Robotics Proceedings (ISLAR ’95), Boston, MA, U.S.A., Zymark Corp., 1996, pp. 221–234.
  b. Cargill, J.F. and Maiefski, R.R., Lab. Robotics Autom., 8 (1996) 139.
11 Geysen, H.M., Meloen, R.H. and Barteling, S.J., Proc. Natl. Acad. Sci. USA, 81 (1984) 3998.
12 Simon, R.J., Kania, R.S., Zuckermann, R.N., Huebner, V.D., Jewell, D.A., Banville, S., Ng, S., Wang, L., Rosenberg, S., Marlowe, C.K., Spellmeyer, D.C., Frankel, A.D., Santi, D.V., Cohen, F.E. and Bartlett, P.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 9367.
13 Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Methods Enzymol., 267 (1996) 448.
14 a. Sarshar, S., Siev, D. and Mjalli, A.M.M., Tetrahedron Lett., 37 (1996) 835.
  b. Zhang, C., Moran, E.J., Woiwode, T.F., Short, K.M. and Mjalli, A.M.M., Tetrahedron Lett., 37 (1996) 751.
  c. Mjalli, A.M.M., Sarshar, S. and Baiga, T.J., Tetrahedron Lett., 37 (1996) 2943.
15 a. U.S. Patent No. 5 252 962, Biomedic Data Systems, Maywood, NJ, U.S.A., October 1993.
  b. U.S. Patent No. 5 351 052, Texas Instruments Inc., Austin, TX, U.S.A., September 1994.
16 a. Mjalli, A.M.M. and Toyonaga, B.E., Net. Sci., 1 (1995) (http://www.awod.codnetsci).
  b. Moran, E.J., Sarshar, S., Cargill, J.F., Shahbaz, M.M., Lio, A., Mjalli, A.M.M. and Armstrong, R.W., J. Am. Chem. Soc., 117 (1995) 10787.
17 Rink, H., Tetrahedron Lett., 28 (1987) 3787.
18 Nicolaou, K.C., Xiao-Yi, X., Parandoosh, Z., Senyei, M. and Nova, M.P., Angew. Chem., Int. Ed. Engl., 34 (1995) 2289.
19 a. Keifer, P.A., J. Org. Chem., 61 (1996) 1558.
  b. Sarkar, S.K., Garigipati, R.S., Adams, J.L. and Keifer, P.A., J. Am. Chem. Soc., 118 (1996) 2305.


Recent advances in solid-phase synthesis

Steven E. Hall
Sphinx Pharmaceuticals, A Division of Eli Lilly & Co., 4615 University Drive, Durham, NC 27717, U.S.A.

Introduction

Interest in combinatorial chemistry continues to grow as the efficient generation of large numbers of compounds finds utility not only for pharmaceutical research but also for the rapid synthesis of other classes of organic molecules [1]. There have been several recent descriptions of combinatorial libraries generated by solution chemistry techniques [2–4]; however, the majority of researchers have taken advantage of solid-phase chemistry procedures. Several recent reviews have summarized the general area of combinatorial chemistry [5–18] as well as solid-phase chemistry [19,20]. As a result, this review will focus only on work published from 1995 to May 1996. It will also emphasize new reactions developed for the generation of non-oligomeric molecules.

Solid-phase chemistry

Cleavage methodology

Most combinatorial chemistry laboratories have employed either cross-linked polystyrene or polystyrene grafted with polyethylene glycol (Tentagel). Many of the linkers used to connect the polymer to the growing molecule of interest were originally developed for use in peptide synthesis, but nonetheless remain useful for small-molecule synthesis. One disadvantage of these linkers is that cleavage from the resin results in an invariant functional group, usually an acid, carboxamide, or alcohol. Two groups [21,22] have recently described a silicon-based linker that leaves no trace of the linker (Fig. 1). A third group [23] demonstrated that halodesilylation could also be used to cleave the compounds from the resin to produce aryl bromides or iodides.

Reactions

Workers at Sphinx [24] have published a synthetic route to a series of highly functionalized biphenyl compounds. The key reactions in this sequence are the biaryl Stille and/or Suzuki couplings to form the biphenyl nucleus and the use of Mitsunobu chemistry to attach the variable side chains (Fig. 2). The illustrated route utilizes a solution-based diaryl coupling followed by on-resin Mitsunobu reactions. Although the diaryl coupling also proceeds on solid phase, the group's desire to maximize resin loading prompted them to adopt the solution-based coupling. Chemists from Merck [25] have also described the use of the Mitsunobu reaction for the functionalization of either phenols or benzyl alcohols using TMAD/Bu3P. Similar chemistry has been applied successfully to intermediate phenols prepared using the Mimotope pin technology [26].

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 30–40 © 1997 ESCOM Science Publishers B.V.


Fig. 1. Resin cleavage by protodesilylation.

Fig. 2. Mitsunobu reactions on solid phase.

Researchers have focused much of their attention on developing new carbon-carbon bond forming reactions. Two groups have described the successful application of an intramolecular Heck reaction (Fig. 3). In one example, the reaction is used to close six-membered lactam 3 [27], whereas the second group [28] utilized the reaction to provide excellent yields of macrocyclic lactams 4.

Fig. 3. Solid-phase intramolecular Heck reactions.

Several routes to cycloalkanes have been developed (Fig. 4). Bolton [29] described the synthesis of azabicyclo[4.3.0]nonen-8-ones using an intramolecular Pauson–Khand cyclization. The relative stereochemistry was controlled in this cyclization step, which proceeded in good yield regardless of whether the nitrogen atom bore an allyl (shown) or propargyl (not shown) substituent. The ene reaction was employed in a route to trans-substituted cyclopentanes and cyclohexanes [30]. Reductive cleavage from the resin with LiBH4 provided the diol or, alternatively, cleavage with Ti(OEt)4 produced the diester. A route to bicyclic alkanes was published by Ley et al. [31], which involved a double Michael addition with a resin-bound unsaturated ester to afford the highly functionalized bicyclo[2.2.2]octane (Fig. 5).


Fig. 4. Cyclizations using carbon-carbon bond formation.

A number of complex heterocycles have been assembled using dipolar cycloadditions (Fig. 6). The Affymax group [32] published an approach to the synthesis of tetrasubstituted pyrrolidines by the reaction of azomethine ylids with electron-deficient olefins. A similar approach was described by researchers at Monsanto; however, the aldehyde component was bound to the resin instead of the amino acid [33]. Kurth and co-workers [34] described a route to 2,5-disubstituted tetrahydrofurans using a nitrile oxide cycloaddition as the key reaction. Mjalli et al. [35] synthesized highly substituted pyrroles using the dipolar cycloaddition of intermediate 5 with mono- or disubstituted acetylenes.

Fig. 5. Tandem Michael addition reaction.


Fig. 6. Dipolar cycloaddition reactions.

Solid-phase syntheses of a wide variety of heterocycles have been published over the past year (Fig. 7). A number of these involve the relatively straightforward application of known solution-based sequences to solid phase. These include a solid-phase version of the Biginelli reaction to prepare pyrimidinones [36], as well as syntheses of dihydropyridines [37], pyrazoles [38], and phthalhydrazides [39].


Fig. 7. Solid-phase heterocycle synthesis.

A solid-phase-supported version of the Pictet–Spengler reaction has been demonstrated for the synthesis of mono- and disubstituted tetrahydro-β-carbolines [40,41]. Disubstituted quinolines have been prepared via solid-phase synthesis utilizing three classes of building blocks: aryl methyl ketones, ω-functionalized acids, and primary amines [42].

Several variants of the Wittig olefination reaction have been adapted to solid phase (Fig. 8). Williard et al. [43] prepared a series of stilbenes using the Horner–Emmons reaction on resin-bound aldehydes. A route to either substituted or unsubstituted unsaturated carboxamides was developed using resin-bound phosphonates under Masamune-Roush conditions [44].

As illustrated in Fig. 6, imines serve as important intermediates for a number of solid-phase syntheses. Additional uses for these versatile intermediates are described in Fig. 9. Resin-bound thioketene acetals have been shown to condense with imines to provide, after reductive cleavage, a route to substituted aminoalcohols 6 [45]. Aminophosphinic acids 7 were prepared by allowing bis(trimethylsilyl)phosphonite to react with resin-bound imines [46].

A number of selective transformations (Fig. 10) have been described which include the selective allylation of alcohols in the presence of amides [47], the Lewis acid catalyzed cleavage of benzyl alcohol esters with secondary amines to afford tertiary amides [48], the synthesis of ketones from Weinreb-type amides [49], and the synthesis of tertiary amines by a Michael addition/alkylation/Hofmann elimination sequence [50].

A number of other solid-phase chemistries have been described for the generation of combinatorial libraries. These include the synthesis of urea-linked diamines [51], bisamide phenols [52,53], polyphenols [54], thiazolidinones [55,56], thiazolines [57], hydantoins [58], and diaminoalcohols [59,60]. Readers interested in more details of these libraries are referred to the section on small molecule libraries.

Fig. 8. Olefination reactions on solid phase.

Fig. 9. Imines in solid-phase synthesis.

Fig. 10. Miscellaneous functional group manipulations.

Solid-phase-supported reagents and scavengers
All of the reactions discussed up to this point are conducted in such a manner that the growing molecule of interest is bound to a solid support. Another important method utilizes reagents which are bound to a solid support. There have been numerous descriptions of new reagents and new applications of existing reagents over the past year. Perhaps the most notable comes from the Monsanto laboratories, where Parlow [61] described the synthesis of heterocyclic ether 8 by a three-step, one-pot sequence (Fig. 11). The resins necessary for oxidation, bromination, and alkylation were mixed simultaneously for 16 h at 65 °C with 1-phenethyl alcohol to provide a 48% yield of heterocycle 8. Solid-supported reagents which have found utility include Nafion-scandium Lewis acid catalyst (allyl additions to aldehydes) [62], HOBt (medium-ring lactamization) [63], EDC (preparation of active esters) [64], and thiazolium hydrotribromide (brominations) [65]. A review has also appeared describing the use of supported reagents in separation science, primarily for the selective sequestration of metal ions [66].

Fig. 11. Solid-supported reagents in synthesis: (a) poly(4-vinylpyridinium dichromate), cyclohexane, 65 °C, 12 h; (b) perbromide on Amberlyst A-26, cyclohexane, 65 °C, 12 h; (c) Amberlite IRA-900 (4-chloro-1-methyl-5-(trifluoromethyl)-1H-pyrazol-3-ol), cyclohexane, 65 °C, 12 h; (d) reagents a, b, c, 65 °C, 16 h.
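To put the 48% yield of the three-step, one-pot sequence in perspective, the following minimal sketch computes the geometric-mean yield per step. It assumes that the three transformations contribute independently and equally, which the original report does not state; the function name is purely illustrative.

```python
def mean_step_yield(overall_yield: float, steps: int) -> float:
    """Geometric-mean yield per step for a multistep sequence.

    overall_yield -- isolated yield of the whole sequence as a fraction (0-1]
    steps         -- number of chemical steps in the sequence
    """
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must be in (0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # If every step gave the same yield y, then y**steps == overall_yield.
    return overall_yield ** (1.0 / steps)


# 48% over three steps (oxidation, bromination, alkylation), as quoted in the text.
per_step = mean_step_yield(0.48, 3)
print(f"average yield per step: {per_step:.1%}")
```

Under that assumption each step would need to proceed in roughly 78% yield, which gives a sense of how well the three resins tolerate one another in a single pot.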

Conclusions
As recently as three years ago, the synthetic chemist had a limited portfolio of reactions which could be conducted on solid phase, and most of the useful reactions involved carbon-heteroatom bond formation, in particular carbon-nitrogen bonds. An explosion of interest in combinatorial chemistry techniques has brought solid-phase synthesis back into the spotlight. During the past 18 months, a number of groups have published on the solid-phase synthesis of complex, functionality-rich small molecules. At the current rate of progress, one would expect that, within a few years, the synthetic chemist will have the ability to tackle nearly any synthesis using solid-phase techniques.

References
1 Hsieh-Wilson, L.C., Xiang, X.-D. and Schultz, P.G., Acc. Chem. Res., 29 (1996) 164.
2 Keating, T.A. and Armstrong, R.W., J. Am. Chem. Soc., 118 (1996) 2574.
3 Cheng, S., Comer, D.D., Williams, J.P., Myers, P.L. and Boger, D.L., J. Am. Chem. Soc., 118 (1996) 2567.
4 Han, H. and Janda, K.D., J. Am. Chem. Soc., 118 (1996) 2539.
5 Terrett, N.K., Gardner, M., Gordon, D.W., Kobylecki, R.J. and Steele, J., Tetrahedron, 51 (1995) 8135.
6 Lee, J., Barrett, R.E. and Bovy, P.R., Lett. Pept. Sci., 2 (1995) 253.
7 Seligmann, B., Abdul-Latif, F., Al-Obeidi, F., Flegelova, Z., Issakova, O., Kocis, P., Krchnak, V., Lam, K., Lebl, M., Ostrem, J., Safar, P., Sepetov, N., Stierandova, A., Strop, P. and Wildgoose, P., Eur. J. Med. Chem., 30 (1995) 319.
8 Gordon, E.M., Curr. Opin. Biotechnol., 6 (1995) 624.
9 Chabala, J.C., Curr. Opin. Biotechnol., 6 (1995) 632.
10 DeWitt, S.H. and Czarnik, A.W., Curr. Opin. Biotechnol., 6 (1995) 640.
11 Doyle, P.M., J. Chem. Technol. Biotechnol., 64 (1995) 317.
12 Sofia, M.J., Drug Discov. Technol., 1 (1996) 27.
13 Gordon, E.M., Gallop, M.A. and Patel, D.V., Acc. Chem. Res., 29 (1996) 144.
14 Armstrong, R.W., Combs, A.P., Tempest, P.A., Brown, S.D. and Keating, T.A., Acc. Chem. Res., 29 (1996) 123.
15 Still, W.C., Acc. Chem. Res., 29 (1996) 155.
16 DeWitt, S.H. and Czarnik, A.W., Acc. Chem. Res., 29 (1996) 114.
17 Ellman, J.A., Acc. Chem. Res., 29 (1996) 132.
18 Thompson, L.A. and Ellman, J.A., Chem. Rev., 96 (1996) 555.
19 Fruchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
20 Hermkens, P.H.H., Ottenheijm, H.C.J. and Rees, D.C., Tetrahedron, 52 (1996) 4527.
21 Chenera, B., Finkelstein, J.A. and Veber, D.F., J. Am. Chem. Soc., 117 (1995) 11999.
22 Plunkett, M.J. and Ellman, J.A., J. Org. Chem., 60 (1995) 6006.
23 Han, Y., Walker, S.D. and Young, R.N., Tetrahedron Lett., 37 (1996) 2703.
24 Pavia, M.R., Cohen, M.P., Dilley, G.J., Dubuc, G.R., Durgin, T.L., Forman, F.W., Hediger, M.E., Milot, G., Powers, T.S., Sucholeiki, I., Zhou, S. and Hangauer, D.G., Bioorg. Med. Chem., 4 (1996) 659.
25 Rano, T.A. and Chapman, K.T., Tetrahedron Lett., 36 (1995) 3789.
26 Valerio, R.M., Bray, A.M. and Patsiouras, H., Tetrahedron Lett., 37 (1996) 3019.
27 Goff, D.A. and Zuckermann, R.N., J. Org. Chem., 60 (1995) 5748.
28 Hiroshige, M., Hauske, P. and Zhou, P., J. Am. Chem. Soc., 117 (1995) 11590.
29 Bolton, G.L., Tetrahedron Lett., 37 (1996) 3433.
30 Tietze, L.F. and Steinmetz, A., Angew. Chem., Int. Ed. Engl., 35 (1996) 651.
31 Ley, S.V., Mynett, D.M. and Koot, W.-J., Synlett, (1995) 1017.
32 Murphy, M.M., Schullek, J.R., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 117 (1995) 7029.
33 Hamper, B.C., Dukesherer, D.R. and South, M.S., Tetrahedron Lett., 37 (1996) 3671.
34 Beebe, X., Chiappari, C.L., Olmstead, M.M., Kurth, M.J. and Schore, N.E., J. Org. Chem., 60 (1995) 4204.
35 Mjalli, A.M.M., Sarshar, S. and Baiga, T.J., Tetrahedron Lett., 37 (1996) 2943.
36 Wipf, P. and Cunningham, A., Tetrahedron Lett., 36 (1995) 7819.
37 Gordeev, M.F., Patel, D.V. and Gordon, E.M., J. Org. Chem., 61 (1996) 924.
38 Marzinzik, A.L. and Felder, E.R., Tetrahedron Lett., 37 (1996) 1003.
39 Nielsen, J. and Rasmussen, P.H., Tetrahedron Lett., 37 (1996) 3351.
40 Mohan, R., Chou, Y.-L. and Morrissey, M.M., Tetrahedron Lett., 37 (1996) 3963.
41 Kaljust, K. and Unden, A., Tetrahedron Lett., 36 (1995) 9211.
42 Ruhland, T. and Kunzer, H., Tetrahedron Lett., 37 (1996) 2757.
43 Williard, R., Jammalamadaka, V., Zava, D., Benz, C.C., Hunt, C.A., Kushner, P.J. and Scanlan, T.S., Chem. Biol., 2 (1995) 45.
44 Johnson, C.R. and Zhang, B., Tetrahedron Lett., 36 (1995) 9253.
45 Kobayashi, S., Hachiya, I., Suzuki, S. and Moriwaki, M., Tetrahedron Lett., 37 (1996) 2809.
46 Boyd, E.A., Chan, W.C. and Loh Jr., V.M., Tetrahedron Lett., 37 (1996) 1647.
47 Motorina, I.A., Parly, F. and Grierson, D.S., Synlett, (1996) 389.
48 Barn, D.R., Morphy, J.R. and Rees, D.C., Tetrahedron Lett., 37 (1996) 3213.
49 Dinh, T.Q. and Armstrong, R.W., Tetrahedron Lett., 37 (1996) 1161.
50 Morphy, J.R., Rankovic, Z. and Rees, D.C., Tetrahedron Lett., 37 (1996) 3209.
51 Hutchins, S.M. and Chapman, K.T., Tetrahedron Lett., 36 (1995) 2583.
52 Dankwardt, S.M., Phan, T.M. and Krstenansky, J.L., Mol. Div., 1 (1995) 113.
53 Meyers, H.V., Dilley, G.J., Durgin, T.L., Powers, T.S., Winssinger, N.A., Zhu, H. and Pavia, M.R., Mol. Div., 1 (1995) 13.
54 Green, J., J. Org. Chem., 60 (1995) 4287.
55 Holmes, C.P., Chinn, J.P., Look, G.C., Gordon, E.M. and Gallop, M.A., J. Org. Chem., 60 (1995) 7328.
56 Look, G.C., Schullek, J.R., Holmes, C.P., Chinn, J.P., Gordon, E.M. and Gallop, M.A., Bioorg. Med. Chem. Lett., 6 (1996) 707.
57 Patek, M., Drake, B. and Lebl, M., Tetrahedron Lett., 36 (1995) 2227.
58 Dressman, B.A., Spangle, L.A. and Kaldor, S.W., Tetrahedron Lett., 37 (1996) 937.
59 Kick, E.K. and Ellman, J.A., J. Med. Chem., 38 (1995) 1427.
60 Wang, G.T., Li, S., Wideburg, N., Krafft, G.A. and Kempf, D.J., J. Med. Chem., 38 (1995) 2995.
61 Parlow, J.J., Tetrahedron Lett., 36 (1995) 1395.
62 Kobayashi, S. and Nagayama, S., J. Org. Chem., 61 (1996) 2256.
63 Huang, W. and Kalivretenos, A.G., Tetrahedron Lett., 36 (1995) 9113.
64 Adamczyk, M., Fishpaugh, J.R. and Mattingly, P.G., Tetrahedron Lett., 36 (1995) 8345.
65 Kessat, A. and Babadjamian, A., Eur. Polymer J., 32 (1996) 193.
66 Alexandratos, S.D. and Crick, D.W., Ind. Eng. Chem. Res., 35 (1996) 635.


Selection of supports for solid-phase organic synthesis
Irving Sucholeiki
Sphinx Pharmaceuticals, A Division of Eli Lilly & Co., 840 Memorial Drive, Cambridge, MA 02139, U.S.A.

Introduction
The field of peptide synthesis has been one of the engines that have driven the development of new supports and linkers for solid-phase organic synthesis. Pioneering work by Merrifield and others in peptide synthesis has made 1% cross-linked polystyrene the standard support for solid-phase organic chemical reactions [1,2]. The synthesis of non-peptide molecules has placed demands on this support that, in some cases, have produced less-than-optimal results. Several issues unique to the solid-phase production of non-peptide molecules differentiate it from peptide synthesis. A combinatorial library is expected to contain a larger variety of compounds, which translates to a larger number of different reactions being attempted. This is in contrast to standard solid-phase peptide synthesis, where one is dealing with a set number of reaction types (i.e., base or acid deprotection, amide coupling, peptide cleavage).

Therefore, several important issues must be considered before choosing a support for combinatorial library production. The first is whether the support is compatible with the intended reaction conditions and/or reaction solvent. Even if that does not pose a problem, one will find that transferring a reaction from solution to a solid support can dramatically affect the reaction rate. Additionally, there is the issue of how to monitor the solid-phase reaction; the type of support one chooses can sometimes restrict the methods available for doing so. Lastly, there is the issue of the loading capacity of the support, which is proving to be far more important for non-peptide combinatorial chemistry than for, say, solid-phase peptide or oligonucleotide synthesis. These are the issues that will be dealt with in the following review and, as in many things, one will find that achieving the ideal in one area will, in many instances, require compromises in other areas.

Monitoring the solid support
An evaluation of a potential solid support begins by first determining which method or methods will be used to monitor what is attached to the particular support. Certain types of supports are not compatible with all of the available methods of analysis or, at least, are not as easy to evaluate as others. Table 1 lists the major techniques available to monitor solid-phase reactions. Measuring the absorbance of a UV-active group cleaved off a support and elemental analysis of the support itself are both methods commonly used to measure the loading level [3–6]. Other techniques which are more qualitative yet give more structural information about the bound molecule include gel-phase carbon NMR, gel-phase proton NMR, FTIR and laser desorption mass spectrometry [7–14]. In general, inorganic supports such as those based on silica rely on elemental analysis, FTIR and spectrophotometric measurements of a cleaved UV-active molecule to gauge the efficiency of a solid-phase reaction. Polymeric supports, in addition to these techniques, can also be monitored using gel-phase carbon and proton NMR. It will be discussed later in this review how these techniques have been used not only to give structural information on the bound molecules but also to give information regarding their environment. In addition, the chapter by Fitch in this book will give a more comprehensive review of available analytical techniques.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 41–49 © 1997 ESCOM Science Publishers B.V.

TABLE 1
TECHNIQUES USED TO MONITOR SOLID-PHASE ORGANIC SYNTHESIS

Method                      Example combinatorial references
Elemental analysis          3 and 4
Spectrophotometric          5 and 6
Gel-phase carbon NMR        7 and 8
Gel-phase proton NMR        9 and 10
FTIR                        11 and 12
Mass spectrometry           13 and 14
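The spectrophotometric determination of loading mentioned above reduces to a Beer–Lambert calculation. The sketch below is one plausible way to set it up, not a procedure taken from the cited references; the function name, its parameters and every numerical value in the example are assumptions chosen only for illustration, and the molar absorptivity in particular must be replaced by the value appropriate to the chromophore actually cleaved.

```python
def loading_mmol_per_g(absorbance, epsilon, volume_ml, resin_mass_mg,
                       path_cm=1.0, dilution=1.0):
    """Estimate resin loading from the absorbance of a cleaved chromophore.

    absorbance    -- measured absorbance of the (possibly diluted) solution
    epsilon       -- molar absorptivity of the chromophore, L mol^-1 cm^-1
    volume_ml     -- volume of the cleavage solution before dilution, mL
    resin_mass_mg -- dry mass of the resin sample that was cleaved, mg
    path_cm       -- cuvette path length, cm
    dilution      -- dilution factor applied before measurement (1.0 = none)
    """
    if min(epsilon, volume_ml, resin_mass_mg, path_cm, dilution) <= 0:
        raise ValueError("epsilon, volume, mass, path and dilution must be positive")
    # Beer-Lambert law: c = A / (epsilon * l), in mol/L, corrected for dilution.
    conc_mol_per_l = absorbance * dilution / (epsilon * path_cm)
    # mol/L multiplied by mL gives mmol of chromophore released.
    released_mmol = conc_mol_per_l * volume_ml
    # Divide by the resin mass in grams to obtain mmol/g.
    return released_mmol / (resin_mass_mg / 1000.0)


# Hypothetical measurement: all numbers are placeholders, not literature values.
example = loading_mmol_per_g(absorbance=0.50, epsilon=7800.0,
                             volume_ml=10.0, resin_mass_mg=5.0)
print(f"loading = {example:.3f} mmol/g")
```

Because the result scales inversely with the resin mass, weighing errors on a few milligrams of dry resin are likely to dominate the uncertainty of such an estimate.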

Types of supports
There are many types of supports available for solid-phase organic synthesis. They can be configured to take on many shapes and sizes (i.e., beads, pins, disks, sheets, etc.) [15,16]. The most common support is polystyrene cross-linked with 1–2% divinylbenzene. This support has the advantage of exhibiting high loading levels (up to 2 mmol reactive groups/g) and has a proven track record (>20 years) in solid-phase peptide synthesis, which began with Merrifield [1]. Derivatization is usually accomplished through direct chloromethylation or lithiation followed by treatment with some electrophile [2]. The support also exhibits large variations in swelling volume when exposed to solvents of varying polarity. For example, it can swell up to 3 times its dry volume in dimethylformamide and up to 4 times its dry volume in tetrahydrofuran. As a result of this variability in swelling volume, its application in continuous-flow solid-phase peptide or oligonucleotide synthesis became impractical. Many new composite supports were then created which exhibited less variation in swelling when exposed to various polar and/or aqueous solvents. Some of these new composite supports as well as other materials now available for combinatorial chemistry are listed below.

Kieselguhr-polyacrylamide composite
This is a composite support consisting of polyacrylamide gel trapped in the porous structure of kieselguhr. The loading is typically 0.1–0.2 mmol amine/g resin [17–19].

Polystyrene-polyacrylamide composite
Also known as Polyhype-based composites, this is a composite of 10–50% cross-linked polystyrene containing a very high pore volume (≈90%) and containing polyacrylamide inside the cavities. Both covalently attached and noncovalently attached polystyrene-polyacrylamide composites are available, with substitutions in the range of 0.1–2.0 mmol amine/g resin [20,21].

Polystyrene-polyethylene glycol composite
This is a composite of 1% cross-linked polystyrene and covalently attached polyethylene glycol (PEG) (2000–3000 Da). One form, sold under the name TentaGel (Rapp Polymere), is made by polymerizing ethylene oxide on to a primary alcohol located on the cross-linked polystyrene [22,23]. A second form, sold under the name PEG-PS (Perseptive), is made by attaching, through the formation of an amide bond, an already formed amino terminal PEG chain (Jeffamine) to 1% cross-linked polystyrene [24]. The loading capacities for both types of supports are typically in the range of 0.2–0.3 mmol amine/g resin.

Polyethylene glycol-polyacrylamide composite (PEGA)
This is a polyacrylamide-PEG composite produced through the copolymerization of N,N-dimethylacrylamide, bis-2-acrylamidoprop-1-yl-PEG1900 and sarcosine ethyl ester. This support is reported to have good swelling characteristics in a broad range of solvents ranging from toluene to water. The support has reported loading capacities in the range of 0.4–0.8 mmol amine/g [25,26].

Controlled pore glass (CPG)
This is a highly porous form of nearly pure silica (~96% SiO₂) which is available in many pore diameters in the range of 40–2500 Å and exhibits a very narrow pore diameter range (±10 Å). This support has primarily been used in oligonucleotide synthesis but it has also been used in solid-phase peptide synthesis [27–30]. Typical loadings are in the range of 0.06–0.17 mmol amine/g.

Kieselguhr
This is a silica-based support (85–90% SiO₂) containing alumina (3–6%) and can be found in nature in the form of fossilized diatoms. This type of support has found primary usage in both continuous-flow oligonucleotide and peptide synthesis. Typical loadings are in the range of 0.05–0.13 mmol amine/g, and the support is sold under the names NovaSyn KD (Calbiochem-Novabiochem Corp.) and Macrosorb K (Phase Separations Ltd.) [31].

Polyethylene-based supports
There are a couple of polyethylene-based composites that have shown some success in solid-phase organic synthesis. Merrifield and co-workers [32] introduced a polystyrene-polyethylene composite support in the form of a sheet for solid-phase peptide synthesis, which exhibited an amine loading level of 1.0 mmol amine/g. This material has since been molded into various shapes, including tubing for a continuous-flow peptide synthesizer (reported loading = 0.21 mmol amine/m tube or 0.67 mmol amine/g) [33]. Geysen and co-workers [34,35] synthesized a polyethylene-polymethacrylate copolymer which was molded to the shape of a pin to function as a solid support for peptide synthesis. The pins were later composed of a polypropylene-polymethacrylate-dimethylacrylamide composite and applied not only to peptide synthesis but also to non-peptide combinatorial library production [36]. At least two types of detachable pin crowns are available (Chiron Technologies), a 1.1 µmol amine/crown (around 0.025 mmol amine/g) and a 5–7 µmol amine/crown [37].

Cellulose-based supports
Cellulose supports in the form of paper sheets have been used for the multiple simultaneous synthesis of peptides. An amino acid is attached to the surface of the cellulose support through the formation of an ester bond at the C-terminal end of the amino acid [38,39]. Typical loading levels for this support are in the range of 0.5–0.6 µmol/cm². Peptide libraries synthesized on this support have been primarily tested while still attached to the cellulose surface. Cotton has also been used as a support for the synthesis of peptide libraries, exhibiting loading levels of around 0.04–0.12 mmol/g (1–3 µmol/cm²) [40].
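When a target loading is known in advance, the ranges quoted above can be screened mechanically. The following sketch collects the per-gram loadings given in the text into a small table and filters it; the class, list and function names are illustrative assumptions, supports quoted only per crown or per unit area are left out, and loading is of course only one of several selection criteria discussed in this review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Support:
    name: str
    min_loading: Optional[float]  # mmol/g; None where the text gives only an upper bound
    max_loading: float            # mmol/g


# Loading ranges as quoted in the text (mmol functional group per gram).
SUPPORTS = [
    Support("Polystyrene, 1-2% divinylbenzene", None, 2.0),
    Support("Kieselguhr-polyacrylamide composite", 0.1, 0.2),
    Support("Polystyrene-polyacrylamide composite", 0.1, 2.0),
    Support("Polystyrene-PEG composite (TentaGel, PEG-PS)", 0.2, 0.3),
    Support("PEGA", 0.4, 0.8),
    Support("Controlled pore glass", 0.06, 0.17),
    Support("Kieselguhr", 0.05, 0.13),
    Support("Polystyrene-polyethylene sheet", 1.0, 1.0),
]


def candidates(required_mmol_per_g: float) -> List[Support]:
    """Supports whose quoted upper loading meets the requirement, highest first."""
    hits = [s for s in SUPPORTS if s.max_loading >= required_mmol_per_g]
    # sorted() is stable, so supports with equal loading keep their listed order.
    return sorted(hits, key=lambda s: s.max_loading, reverse=True)


for s in candidates(0.5):
    print(f"{s.name}: up to {s.max_loading} mmol/g")
```

A fuller version of such a table would add columns for solvent compatibility and for the monitoring methods each support allows, since those constraints often decide the choice before loading does.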

The effect of the support on the rate of solid-phase reactions
In an ideal system the reaction kinetics of a solid-phase reaction will be solely dependent on the rate of diffusion of reagents and reaction products into and out of the pores of the support. Yet other factors, such as the type of solid support and/or reaction solvent used, can have a dramatic effect on both the reaction rate and the overall yield of the desired product. In the case of polystyrene, higher levels of cross-linking can negatively affect the rate of a solid-phase organic chemical reaction. In order to understand why this is the case, one must first examine the physical nature of the resin-bound substrate. Lightly cross-linked polystyrene (with 1–2% divinylbenzene) can be seen as a sponge with the majority of the active sites located in the cavities. Solvents such as methylene chloride cause the support to expand, while solvents such as methanol cause the support to contract. Higher levels of cross-linking reduce the extent to which the support expands and contracts and also affect the level of reactivity.

Work by Regen [41,42] using electron spin resonance (ESR) spectroscopy of nitroxide radical probes showed that the bound substrate was more restricted than a substrate dissolved in the swollen particle. Regen also showed that a higher degree of swelling was associated with a decrease in the rotational correlation time. Such a decrease can be interpreted to mean that the bound nitroxide probe exhibits a greater mobility in the expanded state of the cross-linked polystyrene. In addition, Regen provided evidence that with greater cross-linking, which translates to less expansion of the support, the internal viscosity of the solvent increased.

Gel-phase ¹³C NMR has also been used to evaluate the extent of mobility of bound substrates [43–49]. Giralt and co-workers [45–47] have successfully used gel-phase ¹³C NMR to correlate the line widths of amino acids bound to various supports with their peptide coupling yields. They found that a broadened ¹³C line spectrum was related to restricted mobility, and that the narrower the line widths, the higher the coupling yield for the bound amino acid. Additionally, they compared five different supports (polystyrene 1% cross-linked with divinylbenzene, macroporous polystyrene, Kel-F-g-styrene, polyacrylamide and controlled pore glass) and found higher peptide fragment coupling rates for polystyrene-1%-divinylbenzene than for the other four supports.

One can dramatically sharpen the gel-phase ¹³C NMR line widths (and, possibly, enhance the reaction rate) by attaching a spacer arm between the bound molecule and the support. Bayer and Rapp [48,49] have found that attaching a long polyethylene glycol spacer (MW = 2000–3000 Da) can dramatically sharpen the line widths and increase the relaxation time of the bound component. In fact, both commercially available PEG-incorporated supports (TentaGel and PEG-PS) exhibit line widths almost 3 times narrower than those of standard polystyrene-1%-divinylbenzene and the polystyrene-polyacrylamide (Polyhype)-based composites [50,51].

How does this dramatic narrowing of the line width for the PEG-incorporated supports translate to a solid-phase reaction? In the case of oligonucleotide synthesis, the TentaGel support was found to give a much higher isolated yield (87%) in the solid-phase synthesis of a 5-oligonucleotide-PEG400 fragment as compared to polystyrene-1%-divinylbenzene (11%) [52]. However, some studies show that in the synthesis of larger oligonucleotides (>20 nucleotides), both CPG and Teflon exhibit higher coupling yields per cycle (>98%) as compared to the TentaGel support (90–91%) [53]. In peptide synthesis, the PEG-PS support was shown to give higher purity levels (84%) in the synthesis of the hydrophobic peptide sequence H-Ala₁₀Val-NH₂ as compared to polystyrene-1%-divinylbenzene (36%) [54].

As was mentioned earlier, the rate of the solid-phase reaction is ideally diffusion controlled and, therefore, a smaller and uniform diameter bead should exhibit higher, consistent reaction rates. This idea was tested in the synthesis of a 17 amino acid peptide sequence in which a monosized TentaGel (15 µm) was compared with both standard TentaGel (60–100 µm) and standard polystyrene resin (40–80 µm). The results showed that the much smaller and uniform TentaGel support (15 µm) gave a purer product than the other two less uniform and larger diameter supports [55]. The use of uniform-sized beads has become even more important in one-bead–one-structure combinatorial libraries, where consistent solid-phase reaction rates are crucial to the production of a uniform and reproducible library. Typical size selection is accomplished through sieving, although other techniques such as ‘continuous-flow flotation’ have recently been applied for combinatorial library production [56].
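Two of the quantitative points above lend themselves to a quick calculation: how a per-cycle coupling yield compounds over a long oligomer, and how strongly bead diameter affects an idealised diffusion time. The sketch below assumes a constant yield for every cycle and a diffusion time that scales with the square of the bead diameter at equal diffusion coefficient; both are simplifications, the 80 µm diameter is simply the midpoint of the 60–100 µm range quoted for standard TentaGel, and the function names are illustrative.

```python
def overall_yield(per_cycle_yield: float, cycles: int) -> float:
    """Overall yield after a number of identical coupling cycles."""
    if not 0.0 <= per_cycle_yield <= 1.0:
        raise ValueError("per_cycle_yield must be a fraction between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Each cycle multiplies the amount of full-length product by the same factor.
    return per_cycle_yield ** cycles


def relative_diffusion_time(diameter_um: float, reference_diameter_um: float) -> float:
    """Ratio of characteristic diffusion times, t ~ d**2 / D, at equal D."""
    if diameter_um <= 0 or reference_diameter_um <= 0:
        raise ValueError("diameters must be positive")
    return (diameter_um / reference_diameter_um) ** 2


# Per-cycle yields quoted in the text for CPG/Teflon (98%) and TentaGel (90-91%).
for y in (0.98, 0.91, 0.90):
    print(f"{y:.0%} per cycle over 20 couplings: {overall_yield(y, 20):.1%} overall")

# Assumed 80 um bead (midpoint of 60-100 um) compared with the 15 um monosized bead.
ratio = relative_diffusion_time(80.0, 15.0)
print(f"80 um vs 15 um bead: about {ratio:.0f} times longer diffusion time")
```

The compounding explains why a difference of a few percent per cycle, barely visible in a short fragment, becomes decisive for oligonucleotides of more than 20 residues.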

Enhancing support loadings
Attaching a long hydrophilic spacer arm such as PEG between the support and the bound molecule may enhance the overall yield and purity of the desired product, but it also reduces the overall loading capacity of the bound molecule. A reduction in loading capacity of greater than a factor of 3 has been observed when going from standard aminomethylated polystyrene resin (~0.7–1.0 mmol amine/g) to the PEG-incorporated supports (~0.2–0.3 mmol amine/g). Even when taking swelling characteristics into consideration, one still finds a disparity in the loading capacity (mmol/ml) of at least a factor of 2 between the two types of supports when comparing them in solvents such as dimethylformamide [57].

One solution to this problem has been to incorporate lysine branching off the support. First introduced by Tam [58] for the solid-phase synthesis of immunogenic peptide antigens, lysine branching was later used to enhance resin loadings of other types of supports. Using the branched lysine approach, the loading capacity for the PEG-incorporated TentaGel support rose from 0.25 mmol amine/g to 0.76 mmol amine/g (1, Fig. 1) [59]. Lysine branching has also been used on other supports such as cotton [60]. One problem with the use of lysine branching is the possibility of intramolecular side reactions occurring between the two proximal lysine amines. This problem has been addressed by attaching amine functionality at both the proximal and terminal ends of the PEG chain, as in the ‘high-load’ PEG-PS support 2, which exhibits a loading capacity of between 0.3 and 0.5 mmol/g (Fig. 1) [61]. Recently, another support has become commercially available which has two PEG chains (30–40 units long) branching from a single carbon attachment point [62]. This PEG-incorporating (ArgoGel) support 3 is reported to have a loading capacity of 0.5 mmol amine/g (Fig. 1) [63]. In addition, both the PEG-PS- and TentaGel-type supports have recently incorporated polyethylene oxide chains of lower molecular weight in order to increase the overall loading capacity even more [61].

If one requires higher loading capacities and is willing to sacrifice some measure of solution-like character, one can use 1% cross-linked polystyrene directly. One of the simplest ways of derivatizing cross-linked polystyrene (1% divinylbenzene) is through lithiation with butyllithium [64]. Treatment of the lithiated product with powdered dry ice, followed by quenching with acid, gives a carboxylated polystyrene support with a loading capacity of 2–3 mmol COOH/g. This type of support has been successfully used in the production of non-peptide combinatorial libraries [65,66]. If even higher loadings are required, then one must move away from standard polystyrene and into acrylamide-based supports. Epton and co-workers synthesized a support (4) consisting of cross-linked poly[N-[2-(4-hydroxyphenyl)ethyl]acrylamide] with an initial loading capacity of 5.0 mmol hydroxyl/g, the theoretical maximum (Fig. 2). Using this support they were able to produce multigram quantities of various small peptides [67,68].

Fig. 1. Branched supports – lysine branched TentaGel 1, ‘high-load’ PEG-PS 2 and ArgoGel 3.

Fig. 2. Ultra-high load poly[N-[2-(4-hydroxyphenyl)ethyl]acrylamide] support 4.
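The idea of a theoretical maximum loading for support 4 can be checked with a short calculation: if every repeat unit carries one hydroxyl group, the ceiling is simply the reciprocal of the repeat-unit molar mass. The sketch below assumes a repeat unit of composition C11H13NO2 (derived here from the monomer name, not taken from the cited papers), rounded standard atomic masses, and no cross-linker; the helper names are illustrative.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol from a mapping of element symbol to atom count."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def max_loading_mmol_per_g(repeat_unit_mass: float, sites_per_unit: int = 1) -> float:
    """Upper bound on loading when every repeat unit carries reactive sites."""
    if repeat_unit_mass <= 0 or sites_per_unit < 1:
        raise ValueError("mass must be positive and sites_per_unit at least 1")
    # 1 g of polymer contains 1000 / M mmol of repeat units.
    return 1000.0 * sites_per_unit / repeat_unit_mass


# Assumed repeat unit of poly[N-[2-(4-hydroxyphenyl)ethyl]acrylamide]: C11H13NO2.
repeat_unit = {"C": 11, "H": 13, "N": 1, "O": 2}
mass = molar_mass(repeat_unit)
ceiling = max_loading_mmol_per_g(mass)
print(f"repeat unit: {mass:.2f} g/mol, ceiling: {ceiling:.2f} mmol OH/g")
```

This simple estimate lands slightly above the quoted 5.0 mmol hydroxyl/g. A cross-linker that carries no hydroxyl group would lower the ceiling, which may account for the difference, although the composition of the cross-linked polymer is not given here.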

Summary and Conclusions
This review lists the various types of supports that are available for solid-phase organic chemical synthesis and also briefly mentions the types of techniques that can be used to monitor the extent to which a solid-phase reaction has progressed. There is also a discussion of how supports are modified so that the molecules attached to them may exhibit more solution-like behavior and how this translates to enhanced overall yields. Lastly, various strategies for enhancing the overall resin loading capacity, including the concept of resin branching, have been discussed. As was mentioned in the Introduction section, achieving the ideal in one area of solid-phase synthesis sometimes requires compromises in other areas. We have seen that one pays a price, in the form of lower loading capacities, for a support on which bound molecules exhibit good solution-like characteristics. It is clear that as more attention is paid to this subject, the disparity will lessen.

This review has only lightly touched upon the topic of support stability under various reaction conditions. The main reason for this is that very little has been written on the subject. Many of those developing new supports have primarily done so for the fields of solid-phase peptide and/or oligonucleotide synthesis, areas which encompass a very narrow range of reaction conditions. As the field of combinatorial chemistry matures and new reaction types are attempted, an overall stability profile will gradually emerge.

References
1 Barany, G. and Merrifield, R.B., In Gross, E. and Meienhofer, J. (Eds.) The Peptides, Vol. 2, Academic Press, New York, NY, U.S.A., 1980, pp. 1–284.
2 Fréchet, J.M.J., In Hodge, P. and Sherrington, D.C. (Eds.) Polymer-Supported Reactions in Organic Synthesis, Vol. 6, Wiley, New York, NY, U.S.A., 1980, pp. 294–342.
3 Tietze, L.F. and Steinmetz, A., Angew. Chem., Int. Ed. Engl., 35 (1996) 651.
4 Forman, F.W. and Sucholeiki, I., J. Org. Chem., 60 (1995) 523.
5 Chu, S.S. and Reich, S.H., Bioorg. Med. Chem. Lett., 5 (1995) 1053.
6 Green, J., J. Org. Chem., 60 (1995) 4287.
7 DeWitt, S.H., Kiely, J.S., Stankovic, C.J., Schroeder, M.C., Cody, D.M.R. and Pavia, M.R., Proc. Natl. Acad. Sci. USA, 90 (1993) 6909.
8 Dressman, B.A., Spangle, L.A. and Kaldor, S.W., Tetrahedron Lett., 37 (1996) 937.
9 Fitch, W.L., Detre, G., Holmes, C.P., Shoolery, J.N. and Keifer, P.A., J. Org. Chem., 59 (1994) 7955.
10 Keifer, P.A., J. Org. Chem., 61 (1996) 1558.
11 Yan, B., Kumaravel, G., Anjaria, H., Wu, A., Petter, R., Jewell, C.F. and Wareing, J.R., J. Org. Chem., 60 (1995) 5736.
12 Chen, C., Randall, L.A.A., Miller, R.B., Jones, A.D. and Kurth, M.J., J. Am. Chem. Soc., 116 (1994) 2661.
13 Egner, B.J., Langley, G.J. and Bradley, M., J. Org. Chem., 60 (1995) 2652.
14 Fitzgerald, M.C., Harris, K., Shevlin, C.G. and Siuzdak, G., Bioorg. Med. Chem., 6 (1996) 979.
15 Moos, W.H., Green, G.D. and Pavia, M.R., Annu. Rep. Med. Chem., 28 (1993) 315.
16 Fruchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
17 Atherton, E., Brown, E., Sheppard, R.C. and Rosevear, A., J. Chem. Soc. Chem. Commun., (1981) 1151.
18 Gait, M.J., Matthes, H.W., Singh, M., Sproat, B.S. and Titmas, R.C., Nucleic Acids Res., 10 (1982) 6243.
19 Minganti, C., Ganesh, K.N., Sproat, B.S. and Gait, M.J., Anal. Biochem., 147 (1985) 63.
20 Small, P.W. and Sherrington, D.C., J. Chem. Soc. Chem. Commun., (1989) 1589.
21 Bhaskar, N.K., King, B.W., Meyers, P. and Westlake, J.P., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower Worldwide, Birmingham, U.K., 1994, p. 451.
22 Bayer, E. and Rapp, W., Chem. Pept. Proteins, 3 (1986) 3.
23 Bayer, E., U.S. Patent 4 908 405, 1990.
24 Zalipsky, S., Albericio, F. and Barany, G., In Hruby, V.J., Deber, C.N. and Kopple, K.D. (Eds.) Peptides: Structure and Function (Proceedings of the Ninth American Peptide Symposium), Pierce Chemical Co., Rockford, IL, U.S.A., 1985, pp. 257–260.
25 Renil, M. and Meldal, M., Tetrahedron Lett., 36 (1995) 4647.
26 Meldal, M., Auzanneau, F.I., Hindsgaul, O. and Palcic, M.M., J. Chem. Soc. Chem. Commun., (1994) 1849.
27 Zhao, B.P., Panigrahi, G.B., Sadowski, P.D. and Krepinsky, J.J., Tetrahedron Lett., 37 (1996) 3093.
28 Truffert, J.C., Lorthioir, O., Asseline, U., Thuong, N.T. and Brack, A., Tetrahedron Lett., 35 (1994) 2353.
29 Haralambidis, J., Duncan, L. and Tregear, G.W., Tetrahedron Lett., 28 (1987) 5199.
30 Minganti, C., Ganesh, K.N., Sproat, B.S. and Gait, M.J., Anal. Biochem., 147 (1985) 63.
31 Atherton, E., Brown, E., Sheppard, R.C. and Rosevear, A., J. Chem. Soc. Chem. Commun., (1981) 1151.
32 Berg, R.H., Almdal, K., Pedersen, W.B., Holm, A., Tam, J.P. and Merrifield, R.B., J. Am. Chem. Soc., 111 (1989) 8024.
33 Winther, L., Nielsen, C.S., Pedersen, W.B. and Berg, R.H., Peptides 1994, 23rd European Peptide Symposium, Braga, Portugal, September 4–10, 1994, poster no. 101.
34 Geysen, H.M., Meloen, R.H. and Barteling, S.J., Proc. Natl. Acad. Sci. USA, 81 (1984) 3998.
35 Maeji, N.J., Valerio, R.M., Bray, A.M., Campbell, R.A. and Geysen, H.M., React. Polym., 22 (1994) 203.
36 Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 4708.
37 Valerio, R.M., Bray, A.M. and Maeji, N.J., Int. J. Pept. Protein Res., 44 (1994) 158.
38 Frank, R., Tetrahedron, 48 (1992) 9217.
39 Kramer, A., Reineke, U., Malin, R., Schleuning, W., Vakalopoulou, E., Scholz, P. and Mergener, J.S., Pept. Res., 6 (1993) 314.
40 Eichler, J., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 1st International Symposium, Mayflower Worldwide, Birmingham, U.K., 1990, pp. 337–343.
41 Regen, S.L., J. Am. Chem. Soc., 96 (1974) 5275.
42 Regen, S.L., Macromolecules, 8 (1975) 689.
43 Epton, R., Goddard, P. and Irvin, K., J. Polym., 21 (1980) 1367.
44 Fréchet, J.M.J., Tetrahedron, 37 (1981) 663.
45 Giralt, E., Albericio, F., Bardella, F., Eritja, R., Feliz, M., Pedroso, E., Pons, M. and Rizo, J., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 1st International Symposium, Mayflower Worldwide, Birmingham, U.K., 1990, pp. 111–120.
46 Albericio, F., Pons, M., Pedroso, E. and Giralt, E., J. Org. Chem., 54 (1989) 360.
47 Giralt, E., Rizo, J. and Pedroso, E., Tetrahedron, 40 (1984) 4141.
48 Bayer, E. and Rapp, W., In Harris, J.M. (Ed.) Poly(Ethylene Glycol) Chemistry: Biotechnology and Biomedical Applications, Plenum, New York, NY, U.S.A., 1992, pp. 325–345.
49 Bayer, E., Albert, K., Willisch, H., Rapp, W. and Hemmasi, B., Macromolecules, 23 (1990) 1937.
50 Sucholeiki, I., Med. Chem. Res., 5 (1995) 618.
51 Sucholeiki, I., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 4th International Symposium, Mayflower Worldwide, Birmingham, U.K., in press.
52 Bayer, E., Bleicher, K., Maier, M., Gaus, H.J., Schmeer, K. and Bauer, T., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower Worldwide, Birmingham, U.K., 1994, pp. 9–20.
53 Zhao, B.P., Panigrahi, G.B., Sadowski, P.D. and Krepinsky, J.J., Tetrahedron Lett., 37 (1996) 3093.
54 Albericio, F., Bacardit, J., Barany, G., Coull, J.M., Egholm, M., Giralt, E., Griffin, G.W., Kates, S.A., Nicolás, E. and Solé, N.A., In Maia, H.L.S. (Ed.) Peptides 1994 (Proceedings of the 23rd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1995, pp. 271–272.
55 Eggenweiler, M., Clausen, N., Fritz, H., Zhang, L. and Bayer, E., In Maia, H.L.S. (Ed.) Peptides 1994 (Proceedings of the 23rd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1995, pp. 275–276.
56 Chenera, B., Finkelstein, J.A. and Veber, D.F., J. Am. Chem. Soc., 117 (1995) 11999.
57 General loadings and swelling volumes were taken from Novabiochem support specifications and Ref. 48.
58 Tam, J.P., Proc. Natl. Acad. Sci. USA, 85 (1988) 5409.
59 Butz, S., Rawer, S., Rapp, W. and Birsner, U., Pept. Res., 7 (1994) 20.
60 Eichler, J., Pinilla, C., Chendra, S., Appel, J.R. and Houghten, R.A., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower Worldwide, Birmingham, U.K., 1994, pp. 227–232.
61 McGuinness, B.F., Kates, S.A., Griffin, G.W., Herman, L.W., Solé, N.A., Vágner, J., Albericio, F. and Barany, G., In Kaumaya, P.T.P. and Hodges, R.S. (Eds.) Peptides: Chemistry and Biology (Proceedings of the 14th American Peptide Symposium), Mayflower Worldwide, Birmingham, U.K., 1996, pp. 125–126.
62 Martin, J.F., presentation in IBC’s Molecular Diversity and Combinatorial Chemistry – Applications for Drug Lead Discovery, San Diego, CA, U.S.A., 1996.
63 ArgoGel® is a product of Argonaut Technologies Inc.
64 Farrall, M.J. and Fréchet, J.M.J., J. Org. Chem., 41 (1976) 3877.
65 Meyers, H.V., Dilley, G.J., Durgin, T.L., Powers, T.S., Winssinger, N.A., Zhu, H. and Pavia, M.R., Mol. Div., 1 (1995) 13.
66 Pavia, M.R., Cohen, M.P., Dilley, G.J., Dubuc, G.R., Durgin, T.L., Forman, F.W., Hediger, M.E., Milot, G., Powers, T.S., Sucholeiki, I., Zhou, S. and Hangauer, D.G., Bioorg. Med. Chem., 4 (1996) 659.
67 Epton, R.E., Wellings, D.A. and Williams, A., React. Polym., 6 (1987) 143.
68 Epton, R.E. and Williams, A., Int. J. Biol. Macromol., 3 (1981) 334.


Solution-phase combinatorial chemistry Diane M. Coe and Richard Storer Glaxo Wellcome Research and Development, Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.

Introduction Solution-phase approaches in combinatorial chemistry have received significantly less publicity to date than have solid-phase approaches, as evidenced by the relatively small amount of published literature. This is, however, not a fair indication of the amount of effort being applied to solution-phase work, primarily in the major pharmaceutical companies but also in some specialist organisations. Many of these groups are beginning to present material at conferences that has yet to appear in the primary literature. This review will deal comprehensively, but exclusively, with published material to date in this field. Some indication of work in progress that has yet to be published was included in a recent review [1]. The areas covered in this chapter are divided into the following sections: solution-phase synthesis of pools of compounds; solution-phase synthesis of discrete compounds; liquid-phase synthesis; use of supported reagents. Solution-phase synthesis has both advantages and disadvantages over solid-phase synthesis, and the preferred approach in a particular case will depend on a variety of considerations. Solution-phase expertise is currently the stock-in-trade of most synthetic organic chemists and much of the wealth of developed reactions is potentially available for combinatorial work. Whilst solution-phase reaction development has often been employed prior to transfer to solid phase, it occupies an important position in its own right in the combinatorial armoury. Solution-phase work is free from a number of the chemical constraints which currently limit the general application of solid-phase methods, but has clear disadvantages with respect to purification of products and pool generation. The use of supported reagents offers an attractive option for improving the quality of products prepared using solution-phase chemistry. 
Additionally, liquid-phase synthesis, for example using PEG, provides opportunities to combine some of the benefits of solid-phase approaches with the versatility of solution-phase synthesis. Smart methods such as resin capture for isolating specific compounds from mixtures of products will also help to increase the utility of solution-based approaches. This chapter encompasses developments in each of these areas. The published work described herein serves to exemplify each of the above considerations. Both pool generation allied with potential decoding strategies and array syntheses of discrete compounds have been described. As with solid-phase approaches, the first reactions employed have been amide bond formations and similar reactions, but in both disciplines a wider variety of reactions is now being actively employed, leading to the production of many different product classes.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 50–58 © 1997 ESCOM Science Publishers B. V.


Solution-phase synthesis of pools of compounds

Pool synthesis in solution is fundamentally different from the more popular solid-phase approaches, which generate pools as mixtures of beads, each bearing a single compound, with all the advantages that this conveys with regard to decoding. Whilst there is an option to use mixtures of reactants in solution phase, the pools prepared require either an effective decoding strategy or a suitable isolation procedure to allow the identification of specific compounds of interest. The earliest report of a solution-phase combinatorial library of small drug-like molecules was by Smith et al. [2]. The reaction of 40 individual acid chlorides with an equimolar mixture of 40 nucleophiles (alcohols or amines), and of the individual nucleophiles with an equimolar mixture of the acid chlorides, produced a library of 1600 amides and esters as pools of 40 compounds. The orthogonal design strategy ensured that each possible product appeared in a unique pair of samples, allowing the rapid identification of lead compounds after screening. In a similar approach, Pirrung and Chen [3] prepared an ‘indexed’ library of 54 carbamates from a set of nine alcohols and six isocyanates. The library was prepared as 15 sublibraries in which each of the alcohols and isocyanates was reacted with an equimolar mixture of the complementary reactants. The product mixtures were tested as inhibitors of acetylcholinesterase and their activities were used as ‘indices’ to the rows and columns of a two-dimensional matrix reflecting the activities of individual products. An identical approach was used by the same authors in the preparation of a library of 72 tetrahydroacridines by the condensation of 12 cyclohexanones with six cyanoanilines in 18 sublibraries [4]. Rebek and co-workers [5–8] prepared a series of libraries based around two central rigid cores: 9,9-dimethylxanthene and cubane tetracid chlorides.
Unlike the above approaches, an iterative series of sublibrary generation and screening steps was used to identify increasingly potent compounds. All the above approaches identified novel lead structures of potential biological interest. A recently reported preparation of a pooled library used an unsymmetrical polyazacyclophane scaffold to prepare pools of 100 compounds resulting from N-alkylation using sets of 10 benzyl halides at two different sites on the molecule [9].
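The bookkeeping behind an indexed (orthogonal) library can be illustrated with a short sketch. The reagent labels, the activity values and the scoring rule (multiplying row and column pool activities) are assumptions made for illustration only; they are not taken from the cited papers.

```python
def indexed_pools(rows, cols):
    """Build the sublibraries of an indexed library.

    Each row pool is one row reagent reacted with the mixture of all
    column reagents; each column pool is the converse.
    """
    row_pools = {r: [(r, c) for c in cols] for r in rows}
    col_pools = {c: [(r, c) for r in rows] for c in cols}
    return row_pools, col_pools


def rank_products(row_activity, col_activity):
    """Rank individual products from pool activities.

    Hypothetical scoring: the score of product (r, c) is the product of
    the activities measured for its row pool and its column pool.
    """
    scores = {
        (r, c): row_activity[r] * col_activity[c]
        for r in row_activity
        for c in col_activity
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    alcohols = [f"A{i}" for i in range(1, 10)]     # nine alcohols (labels are placeholders)
    isocyanates = [f"I{j}" for j in range(1, 7)]   # six isocyanates (labels are placeholders)
    row_pools, col_pools = indexed_pools(alcohols, isocyanates)
    print(len(row_pools) + len(col_pools), "sublibraries")
    print(len(alcohols) * len(isocyanates), "products")
```

With nine row reagents and six column reagents the sketch gives 9 + 6 = 15 sublibraries covering 9 × 6 = 54 products, and every product occurs in exactly one row pool and one column pool, which is what makes the pair of pool activities usable as an index.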

Solution-phase synthesis of discrete compounds

Early work has again employed highly developed chemical reactions such as amide bond formation. Boger and co-workers [10–12] have published a series of articles that give details of the high-purity solution-phase parallel synthesis of libraries using a simple liquid–liquid extraction after each step (see Fig. 1). The libraries demonstrated the benefits of a number of the principles of the combinatorial approach. The nucleophilic opening of an anhydride which already contained a latent centre of diversity generated an acid group which was itself subjected to further modification. Selway et al. [13] have employed a selection of other chemistries in addition to acylation, notably sulphonylation, alkylation and aryl coupling using boronic acids, in the generation of solution-phase arrays in lead optimisation work.

Fig. 1. Solution-phase synthesis employing a purification protocol (each step is followed by purification by acid/base washing).

Among other chemistries being developed, the synthesis of heterocyclic molecules has proved a popular option from both chemical and biological perspectives. Two approaches employed are the modification of existing heterocyclic systems and the construction of the heterocycle during the reaction of interest. Whitten et al. [14] have exemplified the former in sequential halide displacement reactions of dichlorotriazines with amines. The latter approach has been exemplified in a number of published cases. An indication of the development requirements of well-known reactions for successful employment in array synthesis is provided by the work of Watson and co-workers [15] to prepare thiazole derivatives. Adaptation of the Hantzsch synthesis of 2-aminothiazoles (Fig. 2) from α-haloketones and thioureas was used in the one-step preparation of discrete samples. Inclusion in the array of a compound of known biological activity provided a valuable internal control. Hanessian and Yang [16] have reported a high-yielding general method for the preparation of 5-alkoxyhydantoins (Fig. 3) both in solution and on solid support using a four-step procedure in a strategy allowing access to three points of diversity. It was used to produce two libraries of 25 distinct hydantoins in which each compound was fully characterised. Keating and Armstrong [17,18] have developed a solution-phase Ugi four-component reaction (Fig. 4) which utilises a ‘universal’ isocyanide. The α-acylamino amides can be converted into new carboxylic acids, esters or thioesters, pyrroles and 1,4-benzodiazepine-2,5-diones. This methodology allows the formation of products offering a wide range of diversity rather than imposing the functionality derived from the commercially available monomers. In addition, the above work introduced the potentially important concept of resin capture, where the product is isolated by attachment to the resin via a suitable functional handle [19].

Fig. 2. Synthesis of 2-aminothiazoles.

Fig. 3. Preparation of 5-alkoxyhydantoins.

Fig. 4. Ugi four-component reaction.


Fig. 5. Synthesis on soluble polymeric support.

Liquid-phase synthesis

An extension of solution-phase approaches which allows the possibility of combining the advantages of solution- and solid-phase syntheses is afforded by the use of soluble polymeric supports (see Fig. 5). Janda and co-workers [20,21] developed a technique referred to as liquid-phase combinatorial synthesis (LPCS). This methodology is reliant on the physical properties of polyethylene glycol monomethyl ether (MeOPEG): its solubility allows reactions to be conducted in the liquid phase, whilst its propensity to crystallise in certain solvents allows the isolation and purification of the product of each step of the reaction. Janda validated the technique by the preparation of two libraries – one peptidic and one non-peptidic. Wong and co-workers [22] also applied the methodology in the preparation of a library of neomycin B mimetics (Fig. 6). Two libraries were prepared by a multiple-component condensation of a neamine-derived aldehyde, t-butyl isocyanide or isocyanoacetic acid methyl ester, N-CBZ amino acids (×13) and a glycine-conjugated polyethylene glycol methyl ether. The products of the reaction were isolated by precipitation. A related approach, termed dendrimer-supported combinatorial chemistry (DCC), has been disclosed by Kim et al. [23]. It uses dendrimers as soluble supports (see Fig. 7). In this case the reactions are performed in solution and the dendrimeric intermediates are isolated and purified by size exclusion chromatography. The strategy was validated by the preparation of a 3 × 3 × 3 combinatorial library using the Fischer indole synthesis.

Fig. 6. Neomycin B mimetics.

Fig. 7. Dendrimer-supported synthesis.

Fig. 8. Use of a polymer-supported scandium catalyst.


Fig. 9. Use of solid-supported scavenging agents.

Use of supported reagents

A complementary approach is afforded by the option to use reagents attached to suitable supports to allow the easy removal of excess reagents or derived by-products. A useful review of this field was published in 1981 and recent advances in combinatorial chemistry have prompted a revival of interest [24]. Parlow [25] has demonstrated the use of ammonium exchange resins in the preparation of pooled libraries of aryl and heteroaryl ethers. Two batches of Amberlite IRA-900 were prepared, each of which contained a mixture of 10 aryl or heteroaryl oxides; these were then reacted independently with an electrophile to afford a mixture of ethers. Kobayashi and Nagayama [26] have reported the preparation of a library of quinoline derivatives using a novel polymer-supported scandium catalyst (Fig. 8) in a three-component coupling reaction. The scandium catalyst has the advantage of being partially soluble in dichloromethane/acetonitrile mixtures, but it can be precipitated by the addition of hexanes and thus removed quantitatively by filtration. Kaldor et al. [27] have described a complementary approach which uses a solid-supported scavenging agent, nucleophile or electrophile, to remove excess reagents (see Fig. 9). The conditions have been developed for amine acylation using an amine scavenger and for amine alkylation using an isocyanate scavenger. Alkylations using reductive amination were also disclosed, using a combination of a resin-bound aldehyde to scavenge excess amine and a resin-bound borohydride to effect reduction. A two-step sequence using supported reagents was exemplified by the preparation of five trisubstituted ureas starting from amines via a reductive amination and subsequent acylation. A series of transformations was also accomplished by the simultaneous use of different polymer-supported reagents (Fig. 10). Parlow [28] published a multistep synthesis from sec-phenethyl alcohol using resin-supported dichromate, perbromide and pyrazol-3-ol.

Fig. 10. Simultaneous use of polymer-supported reagents. (i) Poly(4-vinylpyridinium) dichromate, cyclohexane, Δ. (ii) Perbromide on Amberlyst A-26. (iii) Amberlite IRA-900 (4-chloro-1-methyl-5-(trifluoromethyl)-1H-pyrazol-3-ol).

Conclusions

The work reviewed herein gives an indication of the potential of solution-phase approaches. In the next 12 months we believe there will be a significant increase in publications of solution-phase work, reflecting the variety of clean, robust chemistry currently being developed. Parallel purification methods and thoughtful application of complementary technologies to obtain good-quality products will ensure a bright future for solution-phase approaches. Among the newer approaches which have yet to be seriously exploited, the use of fluorous reagents [29–31] to allow further options in phase separations appears to have enormous potential.

References 1 Storer, R., Drug Discov. Today, 1 (1996) 248. 2 Smith, P.W., Lai, J.Y.Q., Whittington, A.R., Cox, B., Houston, J.G., Stylli, C.H., Banks, M.N. and Tiller, P.R., Bioorg. Med. Chem. Lett., 4 (1994) 2821. 3 Pirrung, M.C. and Chen, J., J. Am. Chem. Soc., 117 (1995) 1240. 4 Pirrung, M.C., Chau, J.H.-L. and Chen, J., Chem. Biol., 2 (1995) 621. 5 Carell, T., Wintner, E.A., Bashir-Hashemi, A. and Rebek, J., Angew. Chem., Int. Ed. Engl., 33 (1994) 2059. 6 Carell, T., Wintner, E.A. and Rebek, J., Angew. Chem., Int. Ed. Engl., 33 (1994) 2061. 7 Shipps, G.W., Spitz, U.P. and Rebek, J., Bioorg. Med. Chem., 4 (1996) 655. 8 Carell, T., Wintner, E.A., Sutherland, A.J., Rebek, J., Dunayevskiy, Y.M. and Vouros, P., Chem. Biol., 2 (1995) 171. 9 An, H. and Cook, P.D., Tetrahedron Lett., 37 (1996) 7233. 10 Cheng, S., Comer, D.D., Williams, J.P., Myers, P.L. and Boger, D.L., J. Am. Chem. Soc., 118 (1996) 2567. 11 Boger, D.L., Tarby, C.M., Myers, P.L. and Caporale, L.H., J. Am. Chem. Soc., 118 (1996) 2109. 12 Cheng, S., Tarby, C.M., Comer, D.D., Williams, J.P., Caporale, L.H., Myers, P.L. and Boger, D.L., Bioorg. Med. Chem., 4 (1996) 727. 13 Selway, C.N. and Terrett, N.K., Bioorg. Med. Chem., 4 (1996) 645. 14 Whitten, J.P., Xie, Y.F., Erickson, P.E., Webb, T.R., De Souza, E.B., Grigoriadis, D.E. and McCarthy, J.R., J. Med. Chem., 39 (1996) 4354. 15 Bailey, N., Dean, A.W., Judd, D.B., Middlemiss, D., Storer, R. and Watson, S.P., Bioorg. Med. Chem. Lett., 6 (1996) 1409. 16 Hanessian, S. and Yang, R.-Y, Tetrahedron Lett., 37 (1996) 5835. 17 Keating, T.A. and Armstrong, R.W., J. Am. Chem. Soc., 117 (1995) 7842. 18 Keating, T.A. and Armstrong, R.W., J. Am. Chem. Soc., 118 (1996) 2574. 19 Brown, S.D. and Armstrong, R.W., J. Am. Chem. Soc., 118 (1996) 6331.

20 Han, H., Wolfe, M.M., Brenner, S. and Janda, K.D., Proc. Natl. Acad. Sci. USA, 92 (1995) 6419. 21 Janda, K.D. and Han, H., Methods Enzymol., 267 (1996) 234. 22 Park, W.K.C., Auer, M., Jaksche, H. and Wong, C.-H., J. Am. Chem. Soc., 118 (1996) 10150. 23 Kim, R.M., Manna, M., Hutchins, S.M., Griffin, P.R., Yates, N.A., Bernick, A.M. and Chapman, K.T., Proc. Natl. Acad. Sci. USA, 93 (1996) 10012. 24 Akelah, A. and Sherrington, D.C., Chem. Rev., 81 (1981) 557. 25 Parlow, J.J., Tetrahedron Lett., 37 (1996) 5257. 26 Kobayashi, S. and Nagayama, S., J. Am. Chem. Soc., 118 (1996) 8977. 27 Kaldor, S.W., Siegel, M.G., Fritz, J.E., Dressman, B.A. and Hahn, P.J., Tetrahedron Lett., 37 (1996) 7193. 28 Parlow, J.J., Tetrahedron Lett., 36 (1995) 1395. 29 Curran, D.P. and Hadida, S., J. Am. Chem. Soc., 118 (1996) 2531. 30 Curran, D.P. and Hoshino, M., J. Org. Chem., 61 (1996) 6480. 31 Curran, D.P., Chemtracts, 9 (1996) 75.

Analytical methods for the quality control of combinatorial libraries William L. Fitch Affymax Research Institute, 3410 Central Expressway, Santa Clara, CA 95051, U.S.A.

Introduction New analytical chemistry technology developments have always been involved in the transitions to new generational eras in the drug discovery industry. The first or ‘natural product’ era depended on the early ability to determine the structures of natural compounds. The second or ‘synthetic organic chemical’ era utilized the advances in spectroscopy and chromatography to expedite the synthesis of a wide variety of structures. The third or ‘rational design’ era depended on new advances in biochemistry, molecular modeling and structural NMR to fit molecules to targets. The present, ‘combinatorial’ era relies on automation of synthesis and screening to overwhelm a target with possible effectors. As stated by Kuhr [1], ‘Recent advances .... may lead to exciting new breakthroughs in other areas when combined together .... . It is the job of the analytical chemist to recognize and utilize these diverse elements to produce instruments with capabilities that had never previously been imagined’. The new field of molecular diversity raises three issues which need to be addressed by the organic analytical chemistry community: (i) What tools can we use for following solidphase reactions? (ii) How can we analyze all these samples? (iii) How much characterization of libraries is possible or appropriate? This chapter deals with these problems and reviews the literature since a similar review written in June 1995 [2] (earlier seminal publications are described where appropriate). Other analytical issues such as decoding of combinatorial libraries or the applications of affinity separations and single-bead mass spectrometry for library deconvolution are dealt with in other chapters of this book.

Analytical methods used in solid-phase synthesis

Solid-phase chemistry has been, and will continue to be, central to the success of combinatorial chemistry, but, as pointed out in several previous reviews, new methods are needed to follow reactions and to assure their fidelity in the solid phase [3–6]. This process is routine for solution-phase organic chemistry and is usually done with chromatography (TLC, HPLC, GC). However, in a solid-phase reaction, the solid-bound starting material and product are not available for chromatography without cleavage and the reagents or building blocks are used in large excess, so their concentration does not change with reaction. When two or more solid-phase reactions are performed in sequence, this issue can become critical. Hard experience at Affymax has proven the value of following each and every reaction in some meaningful way.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 59–68 © 1997 ESCOM Science Publishers B.V.


Measuring yields and loading

In solution-phase chemistry, one will start a reaction with weighed amounts of pure starting materials. The weight of the purified product then allows calculation of the reaction yield. This is a key result, useful for comparing different reactions or the applicability of the reaction to different substructures. In solid-phase chemistry the additional issue of loading on the bead must be considered. Loading is the measure of how much ligand or reacting functional group is on the resin per unit weight of the resin. The measurement of loading is necessarily gravimetric because resins are insoluble, and is usually described in units of mmol/g. For example, in Merrifield’s synthesis of aminomethyl resin [7], the loading of the resin was determined by elemental analysis and titration to be 0.2 mmol/g (see Fig. 1). The yield of a solid-phase reaction is given by the loading of the product compared to the loading of the starting material. The elemental analysis of intermediates as a way of measuring loading is frequently reported [8–11], in spite of difficulties caused by the dilution factor of the solid support [3]. Care should be taken not to rely on the HPLC purity of cleaved products as a measure of yield in a multi-step solid-phase sequence, although this practice is common [12]. A better procedure is to do a few representative reactions on a larger scale for gravimetric yield measurement [13]. IR spectroscopy has been used to estimate the yield [10]. The UV spectrophotometric quantitative measurement of Fmoc release from derivatized amino groups is still a very common method for measuring loadings [14]. The importance of measuring and reporting yields cannot be overemphasized. The continued growth and success of solid-phase synthesis must be built on a solid foundation with rigorous characterization of solid supports, linkers and intermediates.
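The loading arithmetic described above can be sketched in a few lines. Two points in this sketch are assumptions not stated in the text: the correction of the theoretical product loading for the mass added to the resin at each site, and the default extinction coefficient of 7800 M⁻¹ cm⁻¹ near 301 nm for the Fmoc cleavage product, which is a commonly quoted value and should be replaced by whatever a given laboratory has calibrated. The function and parameter names are hypothetical.

```python
def theoretical_loading(start_loading, mass_change):
    """Loading (mmol/g) expected after 100% conversion.

    start_loading: loading of the starting resin in mmol/g.
    mass_change:   net mass added per reacting site in g/mol (assumed known).
    One gram of starting resin carries start_loading mmol of sites and
    weighs 1 + start_loading * mass_change / 1000 grams after reaction.
    """
    return start_loading / (1.0 + start_loading * mass_change / 1000.0)


def loading_yield(start_loading, product_loading, mass_change=0.0):
    """Fractional yield: measured product loading over theoretical loading."""
    return product_loading / theoretical_loading(start_loading, mass_change)


def fmoc_loading(absorbance, volume_ml, resin_mg, epsilon=7800.0, path_cm=1.0):
    """Loading (mmol/g) from UV absorbance of the Fmoc cleavage solution.

    Beer-Lambert: concentration (mol/L) = A / (epsilon * path length).
    epsilon is an assumed default, not a value taken from the text.
    """
    concentration = absorbance / (epsilon * path_cm)  # mol/L
    released_mmol = concentration * volume_ml          # mol/L x mL = mmol
    return released_mmol / (resin_mg / 1000.0)         # mmol per gram of resin


if __name__ == "__main__":
    print(round(theoretical_loading(0.2, 100.0), 4))
    print(round(fmoc_loading(0.78, 10.0, 10.0), 4))
```

For a resin at 0.2 mmol/g, adding 100 g/mol per site lowers the theoretical loading only slightly, but at higher loadings or with heavier ligands the correction becomes significant, so comparing raw loadings without it would understate the yield.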
Following reactions and determining structures of intermediates on-bead Qualitative color tests are frequently used to follow reactions to assure completion. The Kaiser ninhydrin test is the best known of these [15]. An improved method for the detection of secondary amines has been reported [16]. The Fmoc method for assuring complete deprotection of an amine is in wide use [13]. IR methods can be used to support a proposed intermediate in a reaction [17] or to follow the incorporation of a distinctive functional group (e.g. CHO) onto a resin [8,18]. IR is especially useful for surfaces such as pins or crowns, where NMR techniques are not useful. High-quality IR spectra can be obtained from single beads [19].

Fig. 1. Merrifield synthesis of aminomethyl resin.


Fig. 2. MAS 1H spectrum of succinoylated TentaGel S NH2, obtained from a slurry of 10 mg of resin in CDCl3.

Natural abundance gel-phase 13C NMR spectroscopy is used for assessing intermediate structures in solid-phase synthesis [20], but requires long acquisition times and so is not useful for following reactions. Affymax scientists [21] have pioneered the use of specifically 13C-labeled starting materials in conjunction with fast gel-phase 13C NMR to follow multi-step reactions [17,22,23]. 31P NMR has sensitivity comparable to 1H NMR but susceptibility to solid-phase line broadening more like 13C. Gel-phase 31P is therefore very useful in following the reactions of phosphorus-containing moieties [24]. Adequate signal to noise at 300 MHz is obtained from 50 mg of PEG PAL resin in a regular 5 mm tube in less than 10 min accumulation time. Integration of the 31P signals proved useful for quantifying the progress of the reaction. Similarly, Sn NMR has been used to follow a reaction [9]. Magic angle spinning (MAS) NMR brings the power of 1H NMR to solid-phase chemistry [2,25]. A simple MAS spectrum, that of the reaction product of succinic anhydride and TentaGel S NH2, is shown in Fig. 2. Presaturation of the PEG resonance at 3.6 ppm is critical to spectral quality. The complex peaks from 3.4 to 3.9 ppm are due to residual PEG, 13C satellites and the terminal CH2 of the PEG chain. The characteristic doubling of the solution-phase CDCl3 and TMS resonances can be seen.


Fig. 3. Structure of compound from ‘single-bead NMR’.

The special MAS probes are still difficult to obtain, but reports of their use in the general solid-phase literature are increasing [8,26,27]. The Varian group [28,29] has published comparisons of 1H MAS spectra for similar ligands on different resins and using different solvents. This group [30] has also published a comparison of the NMR spectra obtained by conventional probes versus MAS probes. The PEG-polystyrene resins (e.g. TentaGel) give by far the best line shapes on MAS. Spectra for the pure polystyrene resins (e.g. Wang, Merrifield) can benefit from the use of a spin-echo pulse sequence with appropriate τ values which preferentially suppresses the signals from the rotationally restricted polymer atoms compared to the ligand atoms [31]. Sarkar et al. [32] report the use of MAS for ‘single-bead NMR’. A simple molecule on Wang resin 1 (Fig. 3) was fully 13C-labeled. A 13C-filtered 1H NMR spectrum was then recorded for a single bead. The isotope editing eliminates the interferences from solvents and contaminants, allowing the signal to be observed. The only information obtained is the resonance for the methyl protons directly attached to the labeled carbon, making this experiment mainly a novelty. A Sandoz group [33–35] has pioneered the use of MAS 1H NMR on a Bruker CPMAS probe. The CPMAS technique is robust and more amenable to automation but does not offer as high a resolution compared to Varian’s Nano-NMR probe [30]. Cleave and characterize methods for evaluating intermediates and products Because of the difficulties associated with characterizing compounds on-bead, the most common method for characterizing a solid-phase synthesis product is to cleave it from the solid support and then use solution-phase analytical tools. 
Photolabile linkers [36] are especially useful for this because often the functional groups and protecting groups in a building block are designed to be photostable; acid cleavage (from a TFA-labile linker) will often remove protecting groups as well as free the compound from the support. All of the normal techniques of solution-phase analysis are available after the resin is removed: chromatography, NMR and mass spectrometry. These solution-phase methods need no further discussion here. The two mainstays of modern mass spectrometry, electrospray MS and MALDI/TOFMS, can both be used with adequate sensitivity for single-bead detection of molecular weight and structural information [37]. Single-bead MS using TFA cleavage and MALDI/TOF analysis is now a well-established technique [38–41]. A Scripps group [42] reports direct simultaneous photocleavage and MALDI photoionization of a ligand connected to a resin support with a photolabile linker. This experiment is remarkable in that it requires a photolysis in the solid phase followed by diffusion away from the bead into the matrix, where the excited-state matrix can donate a proton and ‘assist’ the transmission into the gas phase. In our own hands, a similar experiment required two steps. First the bead plus matrix was photocleaved with the laser. Then the target was removed from the instrument, and solvent was added to promote diffusion and subsequent cocrystallization of the ligand and matrix. Return to the laser target yielded an excellent MALDI/TOF signal [43]. If generally applicable, direct MALDI/TOFMS analysis will be a powerful addition to the usefulness of photocleavable linkers. An application of several of these analytical techniques can be found in a paper by Ley et al. [44]. In this work Wang resin 2 was elaborated to a variety of bicyclo[2.2.2]octanones 4 and 5 which were reductively alkylated and then cleaved in three different ways to yield substituted bicyclo[2.2.2]octanes (see Fig. 4). Reactions were optimized using IR and gel-phase 13C NMR and cleavage. To determine the 4/5 ratio, the resin was reductively cleaved to a mixture of diols which were purified by column chromatography and their basic structures determined by 1H NMR and MS. Their isomer ratios were determined by GC, 1H or 19F NMR and the exo/endo assignments were made with NOESY. Finally, yields were measured gravimetrically, comparing the weight of isolated product to the theoretical weight based on the known loading of the starting Wang resin. Similar procedures were used to assign the structures and stereochemistries in the isolated products 6, 7 and 8. This paper is an excellent illustration of the chemical and spectroscopic tools which need to be applied to completely characterize a complex set of reactions.
It is this rigor which is the heart of synthetic organic chemistry. Careful adherence to high standards allows the field to grow, each synthesis building with confidence on what has come before.

Fig. 4. Synthesis of bicyclo[2.2.2]octanes.

Fig. 5. Monthly mass spectrometry samples analyzed at Affymax Research Institute.

Analytical methods for evaluating libraries

Parallel synthesis

The central dogma of organic chemistry has been that compounds should be thoroughly characterized as to structure and purity before they are tested for any useful property. These rules are still enforced by the major organic chemistry journals [45–47] whenever work is published. Combinatorial chemistry could be seen to break these rules in that poorly characterized mixtures are subjected to testing. However, in all cases, results from the uncharacterized mixtures will be confirmed by retesting a traditionally synthesized and characterized authentic sample. Acknowledgement of this strategy is even written into the Notice to Authors of one of the journals [46].

This issue highlights the difference in characterization between parallel synthesis and combinatorial synthesis. Parallel synthesis is automated traditional organic chemistry. Each compound is made in a separate reactor, purified and characterized. There is no excuse for not fully characterizing compounds made by parallel synthesis. Jonathan Ellman’s laboratory at UC Berkeley has been a pioneering academic center for solid-phase chemistry development. His philosophy is ‘to synthesize libraries of discrete compounds in a spatially separate fashion, rather than libraries of compound mixtures, to allow for rigorous analytical characterization’ [48,49].

What parallel synthesis does is put a new burden on analytical throughput, which must keep up with the productivity of the automated synthesis laboratory. To illustrate the issue, Fig. 5 shows the increase in mass spectrometry samples analyzed at Affymax over the last 2 years. During this time there has been no increase in the number of chemists in the laboratories but a decided shift from one-at-a-time synthesis to automated solid-phase synthesis. Mass spectrometry has been able to cope with this workload thanks to the fortuitous maturity of flow-injection electrospray techniques and open access techniques [50]. Scientists at Ontogen [51] have gone so far as to put multiple autosamplers on one mass spectrometer to allow for subminute turnaround between injections in support of their high throughput synthesis efforts. NMR has trouble keeping up with these sampling requirements and should be used more selectively, for new chemistry development and where stereochemical outcomes are in question. Future developments in coupled LC/NMR may allow for fast, flow-injection NMR [52].

The yield and purity of discretely synthesized compounds can be determined in the traditional way: purify by extraction, crystallization or chromatography; measure the yield gravimetrically and confirm the purity by elemental analysis, chromatography or 1H NMR. Increasingly, the rigorous sample purification necessary for meaningful elemental analysis is deemed too slow, or an inadequate quantity of material is synthesized for the gravimetric measurement of yield. Scientists at ArQule [53] use quick HPLC method development, comparing six different 10 min HPLC gradients on model compounds, followed by HPLC/UV for the purity assessment of large parallel synthesis libraries. Using HPLC/UV alone as a measure of purity is fraught with danger; newer LC detectors such as the chemiluminescent nitrogen detector [54] or the evaporative light scattering detector offer promise of being more truly universal and predictable in their response. In order to report meaningful yields, Bunin et al. [49] went so far as to independently synthesize 20 of a set of 182 discretes so that authentic reference standards could be used to calibrate HPLC quantitative analyses of the combinatorially prepared benzodiazepines. Scientists at Lilly [55] are using TLC for evaluating the purity of discrete libraries; a parallel analysis to match parallel synthesis! MS and LC/MS are certainly useful for identifying components of a mixture, but, in spite of published attempts [56], should never be used for quantitative analysis without authentic reference standards.

Split/pool solid-phase libraries and solution-phase pools

Issues in the quality control of combinatorial libraries have been reviewed [5,6,57]. The solution-phase synthesis of libraries has been recently reviewed [58]. The analysis of mixtures is much more difficult than the analysis of single compounds. For single compounds, even fairly impure ones, it is straightforward to isolate the spectroscopic signal from the main component, confirming its structure, and to estimate its purity with chromatography and 1H NMR. With mixtures of more than a few components, we may be able to isolate spectra of pure components via chromatography/spectrometry, but reliable information as to the concentration of any component will only be obtainable if authentic standards are available. In spite of this, HPLC/UV is commonly used to evaluate small model libraries. Gas chromatography is preferred as a higher resolution separation method even if derivatization is required to see all of the members of a set [11]. Capillary electrophoresis has been used to separate a large peptide library into positively charged, neutral and negatively charged groups, and the relative integrated areas of the peaks were correlated to the theoretical distribution [59].
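The caution above about quantitation without authentic standards can be made concrete with a small external-calibration sketch. It is illustrative only: the function names and every concentration and peak area below are invented for the example, a linear detector response is assumed, and nothing here is taken from the cited work.

```python
def fit_calibration(concentrations, peak_areas):
    """Ordinary least-squares line: area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_a = sum(peak_areas) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (a - mean_a)
              for c, a in zip(concentrations, peak_areas))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_c
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (area - intercept) / slope


# Hypothetical authentic-standard injections (concentration in mM, UV peak area).
standard_conc_mM = [0.1, 0.2, 0.5, 1.0]
standard_area = [52.0, 101.0, 249.0, 502.0]

slope, intercept = fit_calibration(standard_conc_mM, standard_area)

# Hypothetical peak area measured for the same compound in a library sample.
estimated_conc_mM = concentration_from_area(300.0, slope, intercept)
```

The calibration line is only valid for the compound whose standard was injected; a different library member with a different chromophore would need its own standard, which is the practical reason a subset of compounds has to be resynthesized.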


The quality control of larger libraries must take a systems approach [2]. Building blocks must be checked rigorously [60]. Chemistries must be extensively rehearsed with single compounds and with small model libraries of a size where HPLC is likely to separate all of the members [61,62]. Incorporation of a positive control is a good way to check on the successful synthesis of a library [63]. Finally, steps should be taken to assure that reactions go to completion in the actual library production and that good documented practices are followed in handling automated equipment.

For all of the above good reasons, few publications have come out on combinatorial mixture analysis in this review period. Mixture analysis of peptides by MS and MS/MS has continued as an active field [2,59,64]. Nanospray (very low flow electrospray) has become the method of choice for studies of mixtures of peptides by MS/MS [65,66]. 2D NMR has been used to assure that unusual amino acids were correctly incorporated in a mixed synthesis [59].

Rather than analyze mixtures, we prefer to quality control split/pool libraries by single-bead methods. In most cases, single beads will be kept separate for cleavage into separate assays, so combining beads for analysis creates an artificial problem. As described above, IR and NMR data can be measured from single beads, but will yield little useful information on unknown structures. Single-bead mass spectrometry, however, can readily measure the molecular weight of the component from each bead. The statistical distribution of these molecular weights can then be compared to the theoretical distribution. For encoded solid-phase libraries, we can then compare the encoded structure with the independently determined molecular weight as a quality check on both the synthetic process and the encoding process. This sequence has proven very useful in our laboratories [43].
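The comparison of single-bead molecular weights with a theoretical distribution can be sketched as follows. This is a minimal illustration, assuming a toy three-position peptide library built from Gly, Ala and Phe; the library, the function names and the 0.5 Da matching tolerance are assumptions made for the example and are not the procedure of Ref. 43.

```python
from collections import Counter
from itertools import product

# Monoisotopic residue masses (Da) and the mass of water added for the free peptide.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "F": 147.06841}
WATER = 18.01056


def theoretical_masses(building_blocks_per_step):
    """Enumerate every library member and compute its monoisotopic mass."""
    masses = {}
    for combo in product(*building_blocks_per_step):
        sequence = "".join(combo)
        masses[sequence] = sum(RESIDUE_MASS[r] for r in combo) + WATER
    return masses


def nominal_mass_histogram(masses):
    """Theoretical distribution: how many members share each nominal mass."""
    return Counter(round(m) for m in masses.values())


def matching_members(observed_mass, masses, tolerance=0.5):
    """Library members consistent with one observed single-bead mass."""
    return sorted(seq for seq, m in masses.items()
                  if abs(m - observed_mass) <= tolerance)


library = theoretical_masses([["G", "A", "F"]] * 3)   # 3 x 3 x 3 = 27 members
expected = nominal_mass_histogram(library)

# Hypothetical bead measurement: consistent with three isobaric sequences.
candidates = matching_members(203.09, library)
```

Because several sequences share one mass, a molecular weight alone confirms composition rather than sequence; for an encoded library the decoded structure narrows the candidates to one, which is where the cross-check between code and measured mass becomes informative.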

Conclusions

Solid-phase synthesis practitioners should have elemental analysis, IR and NMR tools at hand for following reactions. Loadings of key intermediates should be measured and yields for new reactions reported. Mass spectrometry, HPLC and NMR will be needed for confirming structures after cleavage from the support. When new solid-phase chemistry or solution-phase parallel synthesis is undertaken, the products must be rigorously characterized for purity and structure. Mixture analysis methods are of little value for libraries of interesting size. Single-bead analysis methods are being developed which can be used to assure the quality of larger solid-phase split/pool libraries.

Acknowledgements

George Detre obtained the spectrum of Fig. 2 from a sample synthesized by Richard Wilson. Ted Baer and Dave Campbell offered helpful suggestions on this manuscript.

References

1 Kuhr, W.G., Book of Abstracts, 211th ACS National Meeting, New Orleans, LA, U.S.A., American Chemical Society, Washington, DC, U.S.A., 1996, ANYL-023.
2 Fitch, W.L., Detre, G. and Look, G.C., In Kerwin, J.F. and Gordon, E.M. (Eds.) Combinatorial Chemistry and Molecular Diversity in Drug Discovery, Wiley, New York, NY, U.S.A., 1997, in press.
3 Fruchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
4 Hermkens, P.H.H., Ottenheijm, H.C.J. and Rees, D., Tetrahedron, 52 (1996) 4527.
5 Terrett, N.K., Gardner, M., Gordon, D.W., Kobylecki, R.J. and Steele, J., Tetrahedron, 51 (1995) 8135.
6 Thompson, L.A. and Ellman, J.A., Chem. Rev., 96 (1996) 555.
7 Mitchell, A.R., Kent, S.B.H., Engelhard, M. and Merrifield, R.B., J. Org. Chem., 43 (1978) 2845.
8 Chenera, B., Finkelstein, J.A. and Veber, D.F., J. Am. Chem. Soc., 117 (1995) 11999.
9 Forman, F.W. and Sucholeiki, I., J. Org. Chem., 60 (1995) 523.
10 Kobayashi, S., Hachiya, I., Suzuki, S. and Moriwaki, M., Tetrahedron Lett., 37 (1996) 2809.
11 Tietze, L.F. and Steinmetz, A., Angew. Chem., Int. Ed. Engl., 35 (1996) 651.
12 Hiroshige, M., Hauske, J.R. and Zhou, P., Tetrahedron Lett., 36 (1995) 4567.
13 Tempest, P.A., Brown, S.D. and Armstrong, R.W., Angew. Chem., Int. Ed. Engl., 35 (1996) 640.
14 Ni, Z., Maclean, D., Holmes, C.P., Murphy, M.M., Ruhland, B., Jacobs, J.W., Gordon, E.M. and Gallop, M.A., J. Med. Chem., 39 (1996) 1601.
15 Kaiser, E., Colescott, R.L., Bossinger, C.D. and Cook, P.I., Anal. Biochem., 34 (1970) 595.
16 Vojkovsky, T., Pept. Res., 8 (1995) 236.
17 Gordeev, M.F., Patel, D.V. and Gordon, E.M., J. Org. Chem., 61 (1996) 924.
18 Hauske, J.R. and Dorff, P., Tetrahedron Lett., 36 (1995) 1589.
19 Yan, B., Kumaravel, G., Anjaria, H., Wu, A., Petter, R.C., Jewell Jr., C.F. and Wareing, J.R., J. Org. Chem., 60 (1995) 5736.
20 Dressman, B.A., Spangle, L.A. and Kaldor, S.W., Tetrahedron Lett., 37 (1996) 937.
21 Look, G.C., Holmes, C.P., Chinn, J.P. and Gallop, M.A., J. Org. Chem., 59 (1994) 7588.
22 Murphy, M.M., Schullek, J.R., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 117 (1995) 7029.
23 Holmes, C.P., Chinn, J.P., Look, G.C., Gordon, E.M. and Gallop, M.A., J. Org. Chem., 60 (1995) 7328.
24 Johnson, C.R. and Zhang, B., Tetrahedron Lett., 36 (1995) 9253.
25 Fitch, W.L., Detre, G., Holmes, C.P., Shoolery, J.N. and Keifer, P.A., J. Org. Chem., 59 (1994) 7955.
26 Ruhland, B., Bhandari, A., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 118 (1996) 253.
27 Gordon, E.M., Gallop, M.A. and Patel, D.V., Acc. Chem. Res., 29 (1996) 144.
28 Keifer, P.A., J. Org. Chem., 61 (1996) 1558.
29 Keifer, P.A. and Sehrt, B., High-Resolution NMR Spectra of Solid-Phase Synthesis Resins, Varian NMR Instruments, Palo Alto, CA, U.S.A., 1996.
30 Keifer, P.A., Baltusis, L., Rice, D.M., Tymiak, A.A. and Shoolery, J.N., J. Magn. Reson., 119 (1996) 65.
31 Garigipati, R.S., Adams, B., Adams, J.L. and Sarkar, S.K., J. Org. Chem., 61 (1996) 2911.
32 Sarkar, S.K., Garigipati, R.S., Adams, J.L. and Keifer, P.A., J. Am. Chem. Soc., 118 (1996) 2305.
33 Shapiro, M.J., personal communication, 1996.
34 Anderson, R.C., Jarema, M.A., Shapiro, M.J., Stokes, J.P. and Ziliox, M., J. Org. Chem., 60 (1995) 2650.
35 Anderson, R.C., Stokes, J.P. and Shapiro, M.J., Tetrahedron Lett., 36 (1995) 5311.
36 Holmes, C.P. and Jones, D.G., J. Org. Chem., 60 (1995) 2318.
37 Brummel, C.L., Vickerman, J.C., Carr, S.A., Hemling, M.E., Roberts, G.D., Johnson, W., Weinstock, J., Gaitanopoulos, D., Benkovic, S.J. and Winograd, N., Anal. Chem., 68 (1996) 237.
38 Haskins, N.J., Hunter, D.J., Organ, A.J., Rahman, S.S. and Thom, C., Rapid Commun. Mass Spectrom., 9 (1995) 1437.
39 Egner, B.J., Langley, G.J. and Bradley, M., J. Org. Chem., 60 (1995) 2652.
40 Egner, B.J., Cardno, M. and Bradley, M., J. Chem. Soc. Chem. Commun., (1995) 2163.
41 Zambias, R.A., Boulton, D.A. and Griffin, P.R., Tetrahedron Lett., 35 (1994) 4283.
42 Fitzgerald, M.C., Harris, K., Shevlin, C.G. and Siuzdak, G., Bioorg. Med. Chem. Lett., 6 (1996) 979.
43 Fitch, W.L., Lu, A., Tsutsui, K., Shah, N. and Falick, A.M., Proceedings of the 44th ASMS Conference on Mass Spectrometry and Allied Topics, Portland, OR, U.S.A., May 12–16, 1996, p. 1043.
44 Ley, S.V., Mynett, D.M. and Koot, W.-J., Synlett, (1995) 1017.

45 Editors, 1996 Guidelines for Authors, J. Org. Chem., 61 (1996) 7A.
46 Editors, Notice to Authors, J. Med. Chem., 39 (1996) 7A.
47 Editors, Notice to Authors of papers, J. Am. Chem. Soc., 118 (1996) 9A.
48 Ellman, J.A., Acc. Chem. Res., 29 (1996) 132.
49 Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 4708.
50 Taylor, L.C.E., Johnson, R.L. and Raso, R., J. Am. Soc. Mass Spectrom., 6 (1995) 387.
51 Moran, E. and Fang, L., presented at the CHI Conference on High Throughput Screening and Rapid Compound Characterization, San Diego, CA, U.S.A., 1996.
52 Fell, J.B., Chin, J.A., Shapiro, M.J. and Wareing, J.R., Book of Abstracts, 211th ACS National Meeting, New Orleans, LA, U.S.A., American Chemical Society, Washington, DC, U.S.A., 1996, ANYL-029.
53 Li, L.Y.T. and Kyranos, J.N., Proceedings of the 44th ASMS Conference on Mass Spectrometry and Allied Topics, Portland, OR, U.S.A., May 12–16, 1996, p. 1041.
54 Bizanek, R., Manes, J.D. and Fujinari, E.M., Pept. Res., 9 (1996) 40.
55 Meyers, H.V., personal communication at the IBC Conference on Molecular Diversity and Combinatorial Chemistry, 1996.
56 Smart, S.S., Mason, T.J., Bennell, P.S., Maeji, N.J. and Geysen, H.M., Int. J. Pept. Protein Res., 47 (1996) 47.
57 Lyttle, M.H., Drug Dev. Res., 35 (1995) 230.
58 Storer, R., Drug Discov. Today, 1 (1996) 248.
59 Boutin, J.A., Hennig, P., Lambert, P., Bertin, S., Petit, L., Mahieu, J., Serkiz, B., Volland, J. and Fauchere, J., Anal. Biochem., 234 (1996) 126.
60 Zuckermann, R.N., Martin, E.J., Spellmeyer, D.C., Stauber, G.B., Shoemaker, K.R., Kerr, J.M., Figliozzi, G.M., Goff, D.A., Siani, M.A., Simon, R.J., Banville, S.C., Brown, E.G., Wang, L., Richter, L.S. and Moos, W.H., J. Med. Chem., 37 (1994) 2678.
61 Carell, T., Wintner, E.A., Bashir-Hashemi, A. and Rebek Jr., J., Angew. Chem., Int. Ed. Engl., 33 (1994) 2059.
62 Pirrung, M.C. and Chen, J., J. Am. Chem. Soc., 117 (1995) 1240.
63 Terrett, N.K., Bojanic, D., Brown, D., Bungay, P.J., Gardner, M., Gordon, D.W., Mayers, C.J. and Steele, J., Bioorg. Med. Chem. Lett., 5 (1995) 917.
64 Dunayevskiy, Y., Vouros, P., Carell, T., Wintner, E.A. and Rebek Jr., J., Anal. Chem., 67 (1995) 2906.
65 Davis, M.T., Stahl, D.C., Hefta, S.A. and Lee, T.D., Anal. Chem., 67 (1995) 4549.
66 Wilm, M., Neubauer, G. and Mann, M., Anal. Chem., 68 (1996) 527.

Automated synthesis

Sheila H. DeWitt

DIVERSOMER Technologies Inc. and Parke-Davis Pharmaceutical Research, A Division of Warner-Lambert Company, 2800 Plymouth Road, Ann Arbor, MI 48105, U.S.A.

Introduction

The advent of combinatorial chemistry for the high-throughput synthesis of compounds [1–8] has driven the advancement of automated methods for synthesis as well as for reaction design, information management, analysis, and purification. Although laboratory automation has historically been implemented in clinical, analytical, process, or product development programs [9,10], it has become a necessity in chemical synthesis laboratories if the full potential of combinatorial chemistry efforts is to be realized. Some perspective articles [11–14] and reviews [15–21] of automated synthesis efforts have been published, but few have explored the shift to, and incorporation into, combinatorial or high-throughput chemistry [22–24]. This chapter will review the literature published since 1990 relevant to automated synthesis.

The execution of syntheses by manual or automated means is often dictated by the objectives for sample throughput and by the compatibility of the chemistry with existing instrumentation. For some combinatorial chemistry strategies, manual methods are satisfactory and automation is not warranted. For example, the solid-phase synthesis of six mixtures, containing a total of 34 012 224 hexapeptides (18 amino acids at each of 6 steps, i.e. 18^6), was achieved manually in six individual reactors by Dooley and Houghten [25]. On the other hand, the solid-phase synthesis of 48 individual 14-mer peptides was best implemented with an automated peptide synthesizer by Gausepohl et al. [26]. The method of synthesis (i.e. solution-phase or solid-phase) as well as the requisite pre- and post-synthesis operations (i.e. resin loading, reagent preparation, isolation, analysis, and purification) influence the need for, and subsequent implementation of, automated synthesis.
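The library size quoted above follows directly from multiplying the number of building blocks available at each step. A minimal sketch of that arithmetic is given here; the function name `library_size` is chosen for illustration only.

```python
import math


def library_size(building_blocks_per_step):
    """Number of distinct products when every combination of building blocks is made."""
    return math.prod(building_blocks_per_step)


# 18 amino acids at each of 6 coupling steps gives 18**6 hexapeptides.
hexapeptides = library_size([18] * 6)
print(hexapeptides)  # 34012224
```

The same function handles unequal steps, for example `library_size([20, 18, 18])`, which is the more common situation in non-peptide libraries where each step draws on a different reagent set.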

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 69–77. © 1997 ESCOM Science Publishers B.V.

System designs

Reactors

The design and selection of chemical reactors is influenced by the method of synthesis (solution-phase or solid-phase) and the interface with automation. Historically, glass reactors have been used with automation for process optimization of solution-phase chemistry [27–37]. Recently, similar reactors have been utilized for automated solution-phase combinatorial chemistry [38]. Reactors for solid-phase synthesis, however, require some mechanism for filtration and washing of the solid support. Representative reactors for automated solid-phase peptide synthesis include fritted polyethylene or polypropylene cartridges [26,39–45] and polyethylene or polypropylene pins [26,46–48]. Only two groups have attempted automated peptide synthesis utilizing cotton or tea bags as reactors [49,50]. Although these reactors are compatible with the reaction conditions for amide coupling, they are not compatible with the reaction conditions required for cleavage of the peptide from the solid support (i.e. HF or TFA). Therefore, with a few exceptions [51], most automated peptide synthesizers require manual transfer to a separate glass or Teflon reactor for cleavage [26].

As high-throughput synthesis strategies exploit more diverse chemistries and higher capacities, the reactors need to be compatible with both the reaction conditions and the automation methods. Although glass or Teflon reactors are optimal for full chemical and thermal compatibility [52], engineering challenges have limited the integration of these reactors with parallel and automated processing. As a result, automated synthesis systems which utilize glass or Teflon reactors in combination with parallel processing have only recently been introduced [53–67].

Architecture

Automated systems are designed to process operations in a serial or parallel fashion. A system that processes samples in parallel performs the same operation on a plurality of samples before proceeding to the next operation. A serial processing system, on the other hand, performs all of the operations on one sample before proceeding to the next sample, and often requires scheduling software to maximize the use of the system [27,38,68,69]. Without scheduling, a serial processing instrument will wait idly during a single reaction that requires 24 h to complete a synthesis cycle. At this rate, no more than about 30 reactions could be achieved in one month. Parallel processing significantly improves the throughput and efficiency of automated synthesis systems.

The architecture of an automated synthesis system can be modular or integrated, open or closed. A modular system is characterized by multiple, interchangeable components which are combined to provide multiple options for use (see Ref. 61 as an example). In contrast, an integrated system’s components are used together to achieve specific functions (see Ref. 70 as an example). An example of modularity in the laboratory is the combination of thermometer adapters, distillation head, condenser, and receiving flasks for distillation. The alternative, a one-piece Vigreux, jacketed distillation head, is a commercially available integrated piece of glassware.

An open system is distinguished from a closed one by the user’s ability to interact with, and maximize the use of, individual components. The closed system, on the other hand, represents a ‘turnkey’ approach which enables full automation of the entire process. However, such an instrument does not allow the user to interact with the individual system components during processing. An example of a closed system is an HPLC instrument, while flash chromatography represents an open system.

Two primary methods exist for the introduction and removal of reagents from reactors: a flow-through system employing inlet and outlet valves, or a liquid-handling robot with stand-alone reactors. Although a valving system always has a closed architecture, a robotic system may have an open or closed architecture.

Valving automation

The optimal configuration of a valve or flow-through system is achieved by mounting the reactors at a stationary location in the instrument, thus dictating a closed architecture [33–36,51,59,61,62,71–74]. The number of reactions that have been controlled simultaneously in this type of system has not exceeded 36 [62].

Robotic automation

A robotic system enables mobility of the reactors. The components of a robotic system include (i) a manipulator or end effector; (ii) a controller; and (iii) robot arms. Robots may be mounted as a stationary platform, on a gantry mechanism where the robot is suspended from a bridge-like frame, or on a linear track drive. A variety of robots exist for both laboratory and industrial use [75,76]. Laboratory robots are generally dedicated to either liquid handling or pick-and-place operations. When all of the functions of the automated synthesis operation are resident at the robot station, the system is closed [26,27,31–36,38–41,45,53–55,57,58,69–72,77–81]. These systems eliminate any manual intervention during a complete synthesis cycle. However, they also preclude the use of any system components for other desired laboratory operations. In contrast, an open robotic system employs dedicated workstations to enable the use of individual components (e.g. robot, agitator) [60,63–67,82,83]. For example, following robotic dispensing of reagents to reactors, the reactors can be transferred from the dispensing station to a reaction station, liberating the dispensing station for other functions.
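The serial versus parallel comparison made under Architecture can be put into numbers with a short sketch. It is an upper-bound estimate only, assuming a 30-day month, back-to-back cycles and no time lost to setup or workup; the function name and the choice of 36 parallel reactors (the largest valved system cited above) are illustrative assumptions.

```python
def reactions_per_month(cycle_hours, reactors_in_parallel=1, days=30):
    """Upper bound on completed reactions, ignoring setup and workup time."""
    cycles = int((days * 24) // cycle_hours)   # whole synthesis cycles that fit
    return cycles * reactors_in_parallel


# One 24 h reaction at a time on an unscheduled serial instrument.
serial = reactions_per_month(cycle_hours=24)

# The same 24 h chemistry run in 36 reactors at once.
parallel = reactions_per_month(cycle_hours=24, reactors_in_parallel=36)

print(serial, parallel)  # 30 1080
```

Any real instrument will fall short of these figures, which is consistent with the estimate in the text that a serial system completes fewer than 30 such reactions in a month.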

Applications

Process optimization

The effects of reaction parameters such as temperature, agitation, solvent, and concentration can be determined by repetitively performing the reaction while varying the parameter being studied and holding the other parameters constant. This type of ‘one variable at a time’ process optimization can be very time-consuming and labor-intensive when several parameters are thought to be important in a reaction. Even when statistical design techniques are employed [69], the number of experimental runs required to generate enough data to be confident in the results can be quite large. Reducing product development time therefore requires automating as much of this type of process as possible.

Lindsey and co-workers [27,69,70], Weglarz and Atkin [32], and Metivier and co-workers [31,81] have all developed and applied Zymark® robotic workstations to optimize chemistry. Lindsey and co-workers [69] completed a factorial design study (16 experiments) to examine the role of catalyst and reactant concentrations on porphyrin yield in less than 1 day of workstation time. Weglarz and Atkin at Dow Chemical Company [32] studied the effect of reaction parameters on (i) the alkoxy substitution of cellulosic ethers; (ii) the base-catalyzed conversion of phenethyl bromide to styrene; and (iii) the onset of crystallization employing a fiber optic probe. Metivier and co-workers at Rhône-Poulenc [31,81] focused on the evaluation of catalysts, reagents, and solvents for process optimization of numerous proprietary reactions. Boettger [84] developed and implemented a robotic system for the optimization of a palladium-catalyzed reaction by varying the concentration, temperature, and amount and type of catalyst and additives. Porte and co-workers have developed and applied numerous systems for automated process optimization. Representative reactions include the production of tetrahydrophthalic anhydride by a Diels–Alder reaction between maleic anhydride and butadiene [36], the preparation of 2-hexanol from the reaction of butylmagnesium chloride with acetaldehyde [35], the hydrolysis of benzaldehyde dioxolane using a stirred reactor in the repetitive and continuous modes [34], and the synthesis of glycine in an aqueous medium through ammonolysis of monochloroacetic acid with hexamethylenetetramine as a catalyst [33].

Peptide synthesis

The repetitive nature of oligomeric synthesis has enabled the rapid implementation of solid-phase and automated methods for DNA [20,21,85,86] and peptide combinatorial libraries. These systems, used for the synthesis of single compounds or mixtures of compounds, can manipulate multiple reaction vessels numbering 8 [45], 15 [80], 20 [59], 24 [50], 25 [57,58], 36 [53–55,77–79], 48 [26,39–41], or 96 [42–44]. Only a few of these systems enable automated resin mixing and splitting within the instrument to generate mixtures of compounds [53,59,78,87,88]. Integration of automated amide synthesis with automated purification has been implemented by workers at Bristol-Myers Squibb [89] using a Zymark robotic system.

Organic synthesis and combinatorial chemistry

Historically, the more diverse chemistries and sample manipulations of general organic synthesis have hindered the design and use of automated systems. Recently, however, combinatorial and high-throughput synthesis has emerged in response to increased competition and increased capacity needs in the pharmaceutical industry [1–8]. Therefore, more flexible systems to automate both solution-phase and solid-phase organic synthesis have been developed.

One of the earliest instruments for automated organic synthesis was developed at Takeda in the late 1980s [90]. This system was later utilized to generate more than 200 derivatives of substituted N-(carboxyalkyl)amino acids [72] by solution-phase chemistry. Since 1992, workers at Chiron have extended the use of their Zymark robot system from peptide synthesis to libraries containing up to 5000 peptoid [oligo(N-substituted glycine)] analogues [56,91–94]. Workers at Parke-Davis reported the DIVERSOMER® method in 1993 [63,64], which utilized proprietary reactors in combination with a Tecan® robot. This technology has been implemented to generate libraries of hydantoins [63–66,95], benzodiazepines [63–66,95], and quinolones [67,96,97] by solid-phase synthesis. Zeneca reported the execution of solution-phase chemistry with a Zymark robotic system, thereby generating 50 quinuclidine analogues to test for squalene synthetase inhibition [38]. Workers at Ontogen [82,83] reported the use of the OntoBLOCK system, which utilizes a Tecan robot. They have implemented this technology for solid-phase synthesis to generate over 50 000 compounds corresponding to libraries of pyrroles, phosphonates, phosphinates, lactams, imidazoles, hydantoin imides and thioimides, oxazoles, and β-lactams. These libraries have resulted in potent and selective inhibitors of iNOS, PTPases and cdc25 phosphatase, and in compounds that reverse the P-glycoprotein (Pgp)-based multiple drug resistance (MDR) phenomenon in cellular assays and in animal models [98]. In 1996, workers at GlaxoWellcome disclosed the solution-phase preparation of 2-aminothiazole combinatorial libraries [99]. Utilizing 20 glass vials and a DPC liquid dispensing robot, they prepared a library of 2500 compounds by this procedure. Also in 1996, Argonaut reported the use of their computer-controlled fluid delivery system for biaryl synthesis via a Suzuki coupling reaction [61]. A summary of representative systems for automated synthesis described in this chapter is shown in Table 1.

TABLE 1 REPRESENTATIVE AUTOMATED SYNTHESIS SYSTEMS

Company / Individual                                        No. of Reactors
ABIMED / Gilson [26,39–41]                                  48
Advanced Chemtech / Zinsser [42–44]                         96
Affymax [62]                                                36
Argonaut [61]                                               24
Boutin [57,58]                                              25
Bohdan [60]                                                 48
Chiron [53–56,77–79,93]                                     36
CS Bio [73]                                                 1
Dow Chemical [32]                                           1
DIVERSOMER Technologies / Parke-Davis [63–67,95–97]         40
Lindsey [27,69,70]                                          15
Ontogen [82,83,98]                                          48
Porte [33–36]                                               1
Selectide [59]                                              20
Selectide [80]                                              15
Shimadzu [45]                                               8
Rhône-Poulenc [31,81]                                       6
Takeda [71,72]                                              3
Zeneca [38]                                                 50

Table 1 also classifies each system by Phase (Soln or Solid), Automation (Valve or Robotic) and Processing (Serial or Parallel); the individual entries in those columns are not legible in this copy.
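The factorial design study mentioned under Process optimization (16 experiments) can be contrasted with the ‘one variable at a time’ approach by enumerating a full factorial run list. The sketch below assumes a two-level, four-factor layout, which is only one way to arrive at 16 runs; the source does not state the actual design, and the factor names and levels are hypothetical.

```python
from itertools import product


def full_factorial(levels_by_factor):
    """Return one dict per run, covering every combination of factor levels."""
    names = list(levels_by_factor)
    level_lists = [levels_by_factor[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors, each at a low and a high level.
factors = {
    "catalyst_mM": (1.0, 10.0),
    "reactant_a_mM": (5.0, 50.0),
    "reactant_b_mM": (5.0, 50.0),
    "oxidant_equiv": (0.5, 1.0),
}

design = full_factorial(factors)
print(len(design))  # 16
```

Each dictionary in `design` is one reactor's worth of conditions, so a run list of this kind maps naturally onto a robotic workstation that dispenses all 16 mixtures in one session.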

Conclusions

The full exploitation of automation in a synthetic chemistry laboratory has not yet been realized. No single system addresses the needs of every user and the full range of chemical reactions. As the demand for such systems increases, the need for new and improved solutions becomes increasingly evident. It is noteworthy that the execution of combinatorial chemistry strategies can be, and in many cases is, limited by the availability of equipment and instrumentation. Combinatorial chemistry efforts have begun to identify the limitations of traditional tools for synthetic chemists. Automated synthesis in combination with emerging technologies for productivity gains (i.e. statistical experimental design [100], expert systems [101], inkjet printing [102], microfabrication [103], and nanomaterials [104]) will continue to revolutionize the field of synthetic chemistry.


References

1 Pavia, M.R., Sawyer, T.K. and Moos, W.H., Bioorg. Med. Chem. Lett., 3 (1993) 387.
2 Moos, W.H., Green, G.D. and Pavia, M.R., Annu. Rep. Med. Chem., 28 (1993) 315.
3 Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, S.P.W. and Gallop, M.A., J. Med. Chem., 37 (1994) 1385.
4 Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, S.P.W. and Gordon, E.M., J. Med. Chem., 37 (1994) 1233.
5 DeWitt, S.H., In Wermuth, C.G. (Ed.) The Practice of Medicinal Chemistry, Academic Press, New York, NY, U.S.A., 1996, pp. 117–134.
6 Rinnova, M. and Lebl, M., Coll. Czech. Chem. Commun., 61 (1996) 171.
7 Storer, R., Drug Discov. Today, 1 (1996) 248.
8 Patel, D.V. and Gordon, E.M., Drug Discov. Today, 1 (1996) 134.
9 Kost, G.J., Handbook of Clinical Automation, Robotics and Optimization, Wiley, New York, NY, U.S.A., 1996.
10 Fawzi, M.B., In Strimaitis, J.R. and Little, J.N. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1991, pp. 19–35.
11 Bauer, B.E., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 233–243.
12 Domanico, P.L., Chemometr. Intell. Lab. Syst., 26 (1994) 115.
13 Beugelsdijk, T.J., Genet. Anal. Tech. Appl., 8 (1991) 217.
14 O’Connor, S., J. Autom. Chem., 15 (1993) 9.
15 Lindsey, J.S., Chemometr. Intell. Lab. Syst., 17 (1992) 15.
16 Guette, J., Crenne, N., Bulliot, H., Desmurs, J. and Igersheim, F., Pure Appl. Chem., 60 (1988) 1669.
17 Auffrett, A.D., Hirst, W., Meade, L.G. and Thacker, M.W., Pept. Biol. Fluids, 34 (1986) 15.
18 Andrews, R.P. and Summers, C., Am. Biotechnol. Lab., 4 (1986) 30.
19 Newton, R. and Fox, J.E., Adv. Biotechnol. Processes, 10 (1988) 1.
20 Newton, R., Am. Biotechnol. Lab., 7 (1989) 41.
21 Kaplan, B.E. and Itakura, K., In Narang, S.A. (Ed.) Synthesis and Applications of DNA and RNA, Academic Press, Orlando, FL, U.S.A., 1987, pp. 9–45.
22 Hardin, J.H. and Smietana, F.R., Mol. Div., 1 (1996) 270.
23 Fruchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
24 DeWitt, S.H. and Czarnik, A.W., Curr. Opin. Biotechnol., 6 (1995) 640.
25 Dooley, C.T. and Houghten, R.A., Life Sci., 52 (1993) 1509.
26 Gausepohl, H., Boulin, C., Kraft, M. and Frank, R.W., Pept. Res., 5 (1992) 315.
27 Corkan, L.A. and Lindsey, J.S., Chemometr. Intell. Lab. Syst., 17 (1992) 47.
28 Frisbee, A.R., Nantz, M.H., Kramer, G.W. and Fuchs, P.L., J. Am. Chem. Soc., 106 (1984) 7143.
29 Lantrip, D.A., Fuchs, P.L. and Kramer, G.W., Adv. Lab. Autom. Robotics, 5 (1989) 115.
30 Matsuda, R., Ishibashi, M. and Takeda, Y., Chem. Pharm. Bull., 36 (1988) 3512.
31 Metivier, P., Josses, P., Bulliot, H., Corbet, J.P. and Joux, B., Chemometr. Intell. Lab. Syst., 17 (1992) 137.
32 Weglarz, T.E. and Atkin, S.C., Adv. Lab. Autom. Robotics, 6 (1990) 435.
33 Fauduet, H., Nikravech, M. and Porte, C., Process Control Qual., 8 (1996) 41.
34 Porte, C., Canatas, A. and Delacroix, A., Lab. Robotics Autom., 7 (1995) 197.
35 Porte, C., Kouz, E.M. and Delacroix, A., Lab. Robotics Autom., 6 (1994) 119.
36 Delacroix, A., Desmoineaux, V., Guette, J.P., Petit, J. and Porte, C., Lab. Robotics Autom., 5 (1993) 3.
37 Hayashi, N., Sugawara, T., Shintani, M. and Kato, S., J. Autom. Chem., 11 (1989) 212.
38 Main, B.G. and Rudge, D.A., In Strimaitis, J.R. and Hawk, G.L. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1994, pp. 425–434.
39 Gausepohl, H., Kraft, M., Boulin, C. and Frank, R.W., In Rivier, J.E. and Marshall, G.R. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 11th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1990, pp. 1003–1004.
40 Gausepohl, H., Kraft, M., Boulin, C. and Frank, R.W., In Giralt, E. and Andreu, D. (Eds.) Peptides 1990 (Proceedings of the 21st European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1991, pp. 206–207.
41 Gausepohl, H. and Behn, C., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower Worldwide, Birmingham, U.K., 1994, pp. 175–182.
42 Schnorrenberg, G. and Lang, R.E., In Rivier, J.E. and Marshall, G.R. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 11th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1990, pp. 1029–1030.
43 Schnorrenberg, G., Wiesmüller, K.-H., Beck-Sickinger, A.G., Drechsel, H. and Jung, G., In Giralt, E. and Andreu, D. (Eds.) Peptides 1990 (Proceedings of the 21st European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1991, pp. 202–203.
44 Schnorrenberg, G., Chem. Oggi, 10 (1992) 33.
45 Nokihara, K., Yamamoto, R., Hazama, M., Wakizawa, O. and Nakamura, S., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 2nd International Symposium, Intercept, Andover, U.K., 1992, pp. 445–448.
46 Wiesmüller, K.-H., Treffer, U., Spohn, R. and Jung, G., In Schneider, C.H. and Eberle, A.N. (Eds.) Peptides 1992 (Proceedings of the 22nd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1993, pp. 308–309.
47 Gausepohl, H., Kraft, M., Boulin, C. and Frank, R.W., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 1st International Symposium, SPCC (UK), Birmingham, U.K., 1990, pp. 487–490.
48 Gausepohl, H. and Frank, R.W., In Schneider, C.H. and Eberle, A.N. (Eds.) Peptides 1992 (Proceedings of the 22nd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1993, pp. 310–311.
49 Lebl, M., Stierandova, A., Eichler, J., Patek, M., Pokorny, V., Jehnicka, J., Mudra, P., Zenisek, K. and Kalousek, J., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 2nd International Symposium, Intercept, Andover, U.K., 1992, pp. 251–257.
50 Pokorny, V., Mudra, P., Jehnicka, J., Zenisek, K., Pavlik, M., Voburka, Z., Rinnova, M., Stierandova, A., Lucka, A.W., Eichler, J., Houghten, R. and Lebl, M., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower Worldwide, Birmingham, U.K., 1994, pp. 643–648.
51 Neimark, J. and Briand, J.P., Pept. Res., 6 (1993) 219.
52 Cole-Parmer® Instrument Company: Chemical Resistance Charts, Cole-Parmer, Niles, IL, U.S.A., 1995–1996, pp. 1672–1680.
53 Zuckermann, R.N., Kerr, J.M., Siani, M.A. and Banville, S.C., Int. J. Pept. Protein Res., 40 (1992) 497.
54 Zuckermann, R.N. and Banville, S., U.S. Patent 5 240 680, 1993.
55 Zuckermann, R.N., Heubner, V.D., Santi, D.V. and Siani, M.A., U.S. Patent 5 252 296, 1993.
56 Zuckermann, R.N., Figliozzi, G.M., Banville, S.C., Kerr, J.M., Siani, M.A., Martin, E.J., Brown, E.G. and Wang, L., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower Worldwide, Birmingham, U.K., 1994, pp. 397–402.
57 Boutin, J.A., Hennig, P., Lambert, P.-H., Bertin, S., Petit, L., Mahieu, J.-P., Serkiz, B., Volland, J.-P. and Fauchere, J.-L., Anal. Biochem., 234 (1996) 126.
58 Boutin, J.A. and Fauchere, J.-L., In Strimaitis, J.R. and Hawk, G.L. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1995, pp. 197–210.
59 Bartak, Z., Bolf, J., Kalousek, J., Mudra, P., Pavlik, M., Pokorny, V., Rinnova, M., Voburka, Z., Zenisek, K., Krchnák, V., Lebl, M., Salmon, S.E. and Lam, K.S., Methods, 6 (1994) 432.
60 Harness, J.R., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 188–198.
61 Gooding, O., Hoeprich Jr., P.D., Labadie, J.W., Porco Jr., J.A., Van Eikeren, P. and Wright, P., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 199–206.
62 Sugarman, J., Rava, R., Kedar, H., Dower, W., Barrett, R., Gallop, M. and Needels, M., U.S. Patent 5 503 805, 1995.
63 DeWitt, S.H., Kiely, J.S., Stankovic, C.J., Schroeder, M.C., Cody, D.M.R. and Pavia, M.R., Proc. Natl. Acad. Sci. USA, 90 (1993) 6909.
64 DeWitt, S.H., Stankovic, C.J. and Schroeder, M.C., In Strimaitis, J.R. and Hawk, G.L. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1993, pp. 248–263.
65 DeWitt, S.H., Schroeder, M.C., Stankovic, C.J., Strode, J.E. and Czarnik, A.W., Drug Dev. Res., 33 (1994) 116.
66 DeWitt, S.H., Bear, B.R., Brussolo, J.S., Duffield, M.J., Hogan, E.M., Kibbey, C.E., MacDonald, A.A., Nickell, D.G., Rhoton, R.L. and Robertson, G.A., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 207–218.
67 DeWitt, S.H. and Czarnik, A.W., Acc. Chem. Res., 29 (1996) 114.
68 Aarts, R.J., Lindsey, J.S., Corkan, L.A. and Smith, S.F., Clin. Chem., 41 (1995) 1004.
69 Corkan, L.A., Plouvier, J.C. and Lindsey, J.S., Chemometr. Intell. Lab. Syst., 17 (1992) 95.
70 Corkan, A. and Lindsey, J.S., Adv. Lab. Autom. Robotics, 6 (1990) 477.
71 Sugawara, T., Kato, S. and Okamoto, S., J. Autom. Chem., 16 (1994) 33.
72 Hayashi, N., Sugawara, T. and Kato, S., J. Autom. Chem., 13 (1991) 187.
73 Chang, H.-W. and Alavazza, D.M., U.S. Patent 5 380 495, 1995.
74 Myers, P.L., In Strimaitis, J.R. and Hawk, G.L. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1995, pp. 235–248.
75 Strimaitis, J.R., J. Chem. Educ., 66 (1989) 8.
76 Beugelsdijk, T.J., Lab. Robotics Autom., 2 (1990) 219. 77 Zuckermann, R.N. and Banville, S.C., Pept. Res., 5 (1992) 169. 78 Zuckermann, R.N., Kerr, J.M., Siani, M.A., Banville, S.C. and Santi, D.V., Proc. Natl. Acad. Sci. USA, 89 (1992) 4505. 79 Zuckermann, R.N., Siani, M.A. and Banville, S.C., Lab. Robotics Autom., 4 (1992) 183. 80 Krchnák, V., Cabel, D. and Lebl, M., Pept. Res., 9 (1996) 45. 81 Josses, P., Joux, B., Barrier, R., Desmurs, J.R., Bulliot, H., Ploquin, Y and Metivier, P., Adv. Lab. Autom. Robotics, 6 (1990) 463. 82 Cargill, J.F., Maiefski, R.R. and Toyonaga, B.E., In Strimaitis, J.R. and Hawk, G.L. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1995, pp. 221–234. 83 Cargill, J.F. and Maiefski, R.R., Lab. Robotics Autom., 8 (1996) 139. 84 Boettger, S.D., Lab. Robotics Autom., 4 (1992) 169. 85 Lashkari, D.A., Hunicke-Smith, S.P., Norgren, R.M., Davis, R.W. and Brennan, T., Proc. Natl. Acad. Sci. USA, 92 (1995) 7912. 86 Hebert, N., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, p. 40. 87 Saneii, H., Shannon, J.D., Miceli, R.M., Fischer, H.D. and Smith, C.W., In Hodges, R.S. and Smith, J.A. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 13th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1994, pp. 1018–1020. 88 Zsigo, J. and Saneii, H., In Hodges, R.S. and Smith, J.A. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 13th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1994, pp. 91–92. ^

76

Automated synthesis 89 Lawrence, R.M., Fryszman, O.M., Poss, M.A., Biller, S.A. and Weller, H.N., In Strimaitis, J.R. and Hawk, G.L. (Eds.) International Symposium on Laboratory Automation and Robotics, Zymark Corp., Hopkinton, MA, U.S.A., 1995, pp. 211–220. 90 Hayashi, N. and Sugawara, T., Tetrahedron Comput. Methodol., 1 (1988) 237. 91 Zuckermann, R.N., Martin, E.J., Spellmeyer, D.C., Stauber, G.B., Shoemaker, K.R., Kerr, J.M., Figliozzi, G.M., Goff, D.A., Siani, M.A., Simon, R.J., Banville, S.C., Brown, E.G., Wang, L., Richter, L.S. and Moos, W.H., J. Med. Chem., 37 (1994) 2678. 92 Zuckermann, R.N., Kerr, J.M., Kent, S.B.H. and Moos, W.H., J. Am. Chem. Soc., 114 (1992) 10646. 93 Simon, R.J., Kania, R.S., Zuckermann, R.N., Huebner, V.D., Jewell, D.A., Banville, S., Ng, S., Wang, L., Rosenberg, S., Marlowe, C.K., Spellmeyer, D.C., Tan, R., Frankel, A.D., Santi, D.V., Cohen, F.E. and Bartlett, P.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 9367. 94 Figliozzi, G.M., Goldsmith, R., Ng, S.C., Banville, S.C. and Zuckermann, R.N., Methods Enzymol., 267 (1996) 437. 95 DeWitt, S.H., Kiely, J.S., Pavia, M.R., Stankovic, C.J. and Schroeder, M.C., U.S. Patent 5 324 483, 1994. 96 MacDonald, A.A., DeWitt, S.H., Hogan, E.M. and Ramage, R., Tetrahedron Lett., 37 (1996) 4815. 97 MacDonald, A.A., DeWitt, S.H. and Ramage, R., Chimia, 50 (1996) 266. 98 Harris, A.L. and Toyonaga, B.E., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 298–308. 99 Bailey, N., Dean, A.W., Judd, D.B., Middlemiss, D., Storer, R. and Watson, S.P., Bioorg. Med. Chem. Lett., 6 (1996) 1409. 100 Martin, E.J., Blaney, J.M., Siani, M.A., Spellmeyer, D.C., Wong, A.K. and Moos, W.H., J. Med. Chem., 38 (1995) 1431. 101 Lousse, E, Iskra, J.-L., Porte, C. and Delacroix, A,, Lab. Robotics Autom., 7 (1995) 93. 102 Nishioka, G.M., U.S. Patent 5 449754, 1995. 103 Begg, G.S., Simpson, R.J. 
and Burgess, A.W., U.S. Patent 5 516 698, 1996. 104 Martin, C.R., Science, 266 (1994) 1961.

77

Applications of combinatorial technology to drug discovery
Dinesh V. Patel
Versicor Inc., 270 East Grand Avenue, South San Francisco, CA 94080, U.S.A.

Introduction

The search for time- and cost-efficient ways to combat new diseases is a guiding principle of modern medicine, and new chemical entities are a critical requirement at the front end of drug discovery. In this context, combinatorial chemistry has emerged as a timely drug discovery tool. Because it can make large numbers of different compounds available in a very short period of time, combinatorial chemistry appears to be an ideal source of novel drug entities against emerging disease targets. The continuing development of high throughput bioassay screening formats, coupled with human genomics as a potential source of new biological targets, creates an acute need for large numbers and types of diverse molecules. Conventional sources of chemicals do not have the capacity to meet this need. The practical application of combinatorial chemistry to drug discovery is expected to do so, and may in turn reverse the problem by creating a demand for more efficient screening strategies. This continuing shift of challenges, together with the synergism between chemistry, biology, and automated instrumentation, will soon bring us to previously unattainable levels of sophistication and efficiency in novel drug discovery. Excellent reviews on combinatorial chemistry have appeared recently [1–6]; this chapter is confined to summarizing important applications of combinatorial chemistry to drug discovery through 1995, with major emphasis on non-peptidic small molecules.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 78–89 © 1997 ESCOM Science Publishers B.V.

Synthesis and screening of oligomeric peptide and non-peptide libraries

Historically, the field of combinatorial chemistry commenced with libraries of linear, oligomeric peptides whose preparation was based on the reliable amide bond forming solid-phase chemistry pioneered by Merrifield in the 1960s. Furka's split and pool protocol [7] gives access to an exponentially growing number of analogs from a small set of amino acid (AA) building blocks (BBs). The availability of hundreds of AA BBs and the simplicity of peptide bond construction have led to the synthesis and screening of large libraries and to the discovery of molecules with interesting biological activities, as illustrated by representative examples 1–3 in Fig. 1 [8–10]. While these results have clearly helped validate the concept of combinatorial technology, most drug discovery programs will limit the use of such peptidic molecules to initial hits that define important structural elements for binding and lay the foundation for further lead discovery and development. The limited potential of peptides themselves to succeed as orally bioavailable drugs has led to extensive investigation aimed at generating chemical diversity around non-amide scaffolds.

Fig. 1. Bioactive molecules from oligomeric peptide libraries.
1: H-(Y-G-G-F-M-A)-NH2, IC50 = 28 nM; µ-opioid ligand from a >52 million member 6-mer library [8]
2: H-(R-A-L-P-P-L-P-R-Y)-NH2, Kd = 7.8 µM; Src SH3 ligand from a >2 million member 9-mer library [9]
3: Ac-(D-Trp-L-Phe-D-Trp)-NH2, IC50 = 4.5 µM; serotonin reuptake inhibitor from a 10 648-member library [10]

Early efforts in this direction led to libraries based on the oligomeric arrangement of non-peptidic functionalities such as N-substituted glycines [11], carbamates [12], and ureas [13]. Such molecules retain most of the chemical, physical, and geometric 3D diversity provided by peptides, and are also expected to have better proteolytic stability, absorption, and pharmacokinetic properties. Thus, the synthesis and screening of 18 mixtures of a 204-member peptoid trimer library 4, in which the choice of side chains was biased to resemble known ligands of the 7-transmembrane G-protein-coupled receptor (7-TM/GPCR) family, led to the identification of nanomolar ligands 5 and 6 against µ-opiate and α1-adrenergic receptors, respectively (Fig. 2) [14,15].

The combinatorial assembly of linear but nonrepetitive BBs can provide a collection of valuable hetero-oligomeric molecules. Incorporating elements of mechanism-based rational design of enzyme inhibitors into such a strategy can be very effective for pharmacophore-based drug discovery. An early example in this category came in the form of tripeptidyl phosphonic acid analog libraries [15,16]. A phosphinyl acid moiety serves as a transition-state (TS) analog and a metal chelator, and has been successfully utilized in the past in the rational design of potent inhibitors of metalloproteases such as thermolysin and angiotensin converting enzyme (ACE) [17,18]. Employing efficient Mitsunobu-type coupling chemistry for phosphorus-oxygen (P-O) bond formation on solid support [15], a 540-member TS analog inhibitor library of trimer peptidyl phosphonates 7 was prepared and screened in a depletion assay for activity against the metalloprotease thermolysin (Fig. 2) [16]. Besides identifying the most potent peptidyl phosphonate inhibitor of thermolysin reported in the literature (table, compound 8, P2' = alanine, Ki = 49 nM), the serial deconvolution process also uncovered new analogs with basic (8, P2' = arginine and histidine) and neutral H-bonding (P2' = glutamine) side chains. This unexpected finding illustrates that, while individual analog synthesis may lead to prematurely biased approaches based on initial findings (e.g. a prior study of thermolysin inhibitors with only hydrophobic P2' side chains), the library approach is less biased and more versatile, and is thus expected to provide more diverse structure–activity relationship (SAR) data.

Fig. 2. Bioactive molecules from oligomeric non-peptide libraries.
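The exponential multiplicity offered by the split and pool protocol, mentioned at the start of this section, can be made concrete with a short sketch. The Python below is illustrative only: the function names and the three-letter building-block set are hypothetical, and each resin portion is idealized as a list that contains every sequence made so far. Under those assumptions it shows that k cycles with n building blocks need only n × k coupling reactions yet yield n^k sequences.

```python
def split_and_pool(building_blocks, cycles):
    """Enumerate the sequences produced by repeated split, couple, and pool steps."""
    pool = [()]          # start from bare resin, modeled as the empty sequence
    couplings = 0        # number of separate coupling reactions performed
    for _ in range(cycles):
        # Split: one portion of the pooled resin per building block, then couple.
        portions = [[seq + (bb,) for seq in pool] for bb in building_blocks]
        couplings += len(building_blocks)
        # Pool: recombine all portions before the next cycle.
        pool = [seq for portion in portions for seq in portion]
    return pool, couplings


def library_size(n_blocks, length):
    """Number of distinct oligomers of a given length from n_blocks building blocks."""
    return n_blocks ** length


library, couplings = split_and_pool(["A", "G", "F"], cycles=3)
print(len(library), "sequences from", couplings, "coupling reactions")
print(library_size(22, 3))
```

As an arithmetic observation only, a trimer library of 10 648 members (the size quoted for compound 3 in Fig. 1) is what 22 building blocks per position would give, since 22³ = 10 648; the text itself does not state the building-block count.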

Synthesis and screening of orthogonal and binary encoded libraries

Screening large pools of compounds has several disadvantages, and often results in missed or false activities. To obviate such limitations, different pooling and structure identification strategies, such as orthogonal libraries and binary encoded libraries, have been developed, and each has been reported to yield bioactive molecules.
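The orthogonal pooling idea can be sketched in a few lines. The example below uses the numbers of the first library discussed in this section (25 building blocks, trimers, two sets of 125 pools with 125 members each). The construction itself is an assumption made for illustration: the 25 building blocks are placed on a hypothetical 5 × 5 grid, set A pools are built from grid rows and set B pools from grid columns. The names `BB00`…`BB24`, `sublibraries`, and `shared_members` are placeholders, not terms from the cited work.

```python
from itertools import product

GRID = 5  # 25 building blocks arranged on a hypothetical 5 x 5 grid
blocks = [f"BB{i:02d}" for i in range(GRID * GRID)]            # placeholder names
rows = [blocks[r * GRID:(r + 1) * GRID] for r in range(GRID)]  # 5 groups of 5
cols = [blocks[c::GRID] for c in range(GRID)]                  # 5 groups, one block from each row


def sublibraries(groups, length=3):
    """One pool per ordered choice of a group at each position of the trimer."""
    return {
        choice: set(product(*(groups[g] for g in choice)))
        for choice in product(range(len(groups)), repeat=length)
    }


set_a = sublibraries(rows)   # 125 pools, each built from three row groups
set_b = sublibraries(cols)   # 125 pools, each built from three column groups


def shared_members(a_index, b_index):
    """Trimers common to one pool from set A and one pool from set B."""
    return set_a[a_index] & set_b[b_index]


print(len(set_a), len(set_b))                  # number of pools in each set
print(len(set_a[(0, 0, 0)]))                   # members per pool
print(shared_members((1, 2, 3), (4, 0, 2)))    # a single trimer
```

Because every row group and every column group share exactly one building block, an active pool from set A and an active pool from set B point to a single candidate trimer without any resynthesis.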

Fig. 3. Bioactive molecules from orthogonal and binary encoded libraries.


Deprez et al. [19] prepared two orthogonal sets of a trimer library from 25 AAs comprising 25³ = 15 625 members (Fig. 3). Each set (A and B) had 125 sublibraries (A1–125 and B1–125) with 125 members per sublibrary, but any sublibrary from A and any sublibrary from B shared only one trimer molecule. Screening of the two sets of 125 orthogonal pools for activity against the V2 vasopressin receptor, using an assay that measures inhibition of binding of radiolabeled vasopressin to renal LLC-PK1 cells, resulted in the orthogonal identification of 9 as a potent ligand (IC50 = 63 nM).

Smith et al. [20] prepared two orthogonal sets of a solution-phase 1600-member amide/ester dimer library from 40 acid chlorides and 40 amine and alcohol nucleophiles (Fig. 3). Subjecting the 80 sublibrary samples (40 members per sublibrary) to a variety of high throughput screens led to the identification of the moderately active neurokinin NK3 receptor antagonist 10 (IC50 = 60 µM) and the matrix metalloprotease collagenase-1 inhibitor 11 (IC50 = 55 µM).

Chabala et al. [21] have prepared encoded libraries of small molecules attached via a photolinker to solid support, and used a binary encoding strategy employing chemically inert haloaryl molecular tags (Fig. 3). These tags are introduced using a carbene insertion reaction and released via oxidation [22]. Various synthetic transformations such as acylations, sulfonylations, carbamoylations, halide displacements, reductive aminations, cyclocondensations, and Mitsunobu chemistry are compatible with the tagging chemistry and were used to prepare libraries of small molecules. Specifically, a 6727-member linear library 12 was prepared and screened against bovine carbonic anhydrase II (bCAII) using a fluorescence-based dansyl amide displacement assay, followed by decoding of the positive beads, to uncover compound 13 (Kd = 4 nM) as a potent carbonic anhydrase inhibitor.
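The logic of binary encoding can be illustrated with a minimal sketch. In it, each synthetic step is given its own group of tags, and the building block used in that step is recorded as the pattern of tags present or absent. The split of tags into groups of 3, 5, and 5 is an assumption chosen because (2³ − 1)(2⁵ − 1)(2⁵ − 1) = 7 × 31 × 31 = 6727, the library size quoted above; the text does not state the per-step building-block counts, and the function names are hypothetical.

```python
from math import prod

BITS_PER_STEP = (3, 5, 5)   # assumed number of tags reserved for each synthetic step


def encode(choice):
    """Return the set of tag numbers that records one building block per step.

    `choice` holds 1-based building-block indices; index 0 is never used, so
    every step always leaves at least one tag on the bead.
    """
    tags, offset = set(), 0
    for bits, index in zip(BITS_PER_STEP, choice):
        if not 1 <= index < 2 ** bits:
            raise ValueError(f"index {index} does not fit in {bits} tags")
        tags.update(offset + b for b in range(bits) if (index >> b) & 1)
        offset += bits
    return frozenset(tags)


def decode(tags):
    """Recover the building-block indices from the tags found on a bead."""
    choice, offset = [], 0
    for bits in BITS_PER_STEP:
        choice.append(sum(1 << b for b in range(bits) if offset + b in tags))
        offset += bits
    return tuple(choice)


capacity = prod(2 ** bits - 1 for bits in BITS_PER_STEP)
bead_tags = encode((5, 17, 31))
print(sorted(bead_tags), decode(bead_tags), capacity)
```

With this scheme, 13 distinct tags are enough to label every member of the library, which is the practical appeal of a binary code over one tag per building block.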

Combinatorial synthesis and screening of heterocyclic, drug-like scaffold libraries

Most drug molecules clearly fall in the category of non-oligomeric, small-sized, heterocyclic molecules. Thus, exploitation of combinatorial technology for drug discovery can be expected to rely heavily on the ability to generate chemical libraries of such molecular entities. Solid-phase synthesis (SPS) makes a scaffold amenable to split-pool protocols, and also offers general advantages such as driving difficult reactions to completion through the use of excess reagents, and easy purification of the desired immobilized products from unwanted reagents and side-products in solution. A consequence of this has been a frenzy of activity from various laboratories aimed at translating all important solution-phase chemistries of historical drug-like molecules into corresponding solid-phase and combinatorial formats. Unlike the linear, oligomeric peptidic and non-peptidic libraries described so far, heterocycles are non-polymeric scaffolds and are usually constructed from diverse building blocks employing nonrepetitive chemistry.

One of the first examples in this category was reported by Ellman and co-workers [23] through the preparation of libraries of benzodiazepines, an important heterocyclic class of pharmaceutical agents (Fig. 4). The synthesis proceeds through the intermediacy of 2-aminobenzophenone BBs 14 attached to a solid support bearing an acid-cleavable linker through a hydroxyl or carboxyl group of the aryl moiety. The parallel synthesis of a 192-member 1,4-benzodiazepine library 15 comprising analogs with diverse chemical functionalities such as amides, amines, phenols, carboxylic acids and indoles was performed [24]. Upon screening for binding to the cholecystokinin A receptor using a competitive radioligand binding assay, detailed structure–activity relationship (SAR) data were acquired, from which the indole analog 16 was identified as a potent CCK ligand. The limitation of the scarce availability of 2-aminobenzophenone BBs has recently been overcome by developing a more facile method for constructing 2-aminoaryl ketone derivatives on solid support using palladium-mediated Stille coupling between a 2-aminoaryl stannane on solid support and an acid chloride as the solution coupling partner [25]. Thus, the need for aminobenzophenone BBs in the overall synthesis is replaced by the readily available acid chlorides. More recently, the modified synthetic protocol has been utilized for preparing an 11 200-analog library from 20 acid chlorides, 35 AAs, and 16 alkylating agents [26].

Fig. 4. Bioactive molecules from the synthesis and screening of drug-like scaffold, small-molecule libraries.

An alternate synthetic strategy for benzodiazepines involves the condensation of resin-bound α-amino esters 17 with 2-aminobenzophenone imines followed by TFA treatment of the intermediate to effect cleavage and cyclization [27]. The parallel synthesis of 40 discrete benzodiazepine analogs 18 was performed, and expected SAR data were generated in a bioassay based on the inhibition of fluoronitrazepam binding (Fig. 4).

Resin-bound α-amino esters 17 react cleanly with aromatic and heteroaromatic aldehydes at room temperature in neat trimethyl orthoformate to afford the corresponding imines 20. Lewis acid mediated ionization of these immobilized imines leads to α-amino aldimines which undergo chelation-controlled regio- and stereoselective [3 + 2] cycloaddition with a wide variety of electron-deficient olefin dipolarophiles to yield the five-membered pyrrolidines. A pool of pyrrolidines prepared combinatorially from four AAs, five aromatic aldehydes, and four olefins was acylated with three different mercaptoacyl chlorides to provide, after appropriate deprotections and final TFA cleavage from resin, a library of >480 mercaptoacyl proline analogs 21. Such molecules have found well-precedented use as inhibitors of ACE, as exemplified by the antihypertensive drug Captopril. Serial deconvolutive screening of this library for activity against ACE led to the identification of 22 (Ki = 160 pM) as an analog 3 times more potent than Captopril [28].
Diketopiperazines (DKPs) are a well-studied class of dipeptide mimetics with a long history of success in the pharmaceutical industry. Recently, a three-step SPS of a 1000-member (10³) DKP library 24 employing two different sets of 10 AAs and one set of 10 aldehyde BBs has been reported (Fig. 4) [29]. Starting from resin-bound secondary amines 23 derived from sodium triacetoxyborohydride-mediated reductive alkylation of the amino group of immobilized α-amino esters 17 with aldehydes, PyBrop-mediated double coupling with the next set of Boc-protected AAs provides the penultimate dipeptide precursors. Boc removal with TFA followed by a short reflux in toluene to induce ring cyclization and simultaneous cleavage from the solid support afforded the desired DKPs 24 in solution as 10 pools with 100 members in each pool. Iterative screening and resynthesis of this library has led to several bioactive molecules, as exemplified by the identification of a neurokinin-2 receptor ligand 25 (IC50 = 313 nM) [3].

The dihydropyridine (DHP) scaffold has surfaced in numerous diverse bioactive molecules, the most successful examples being calcium channel blocker antihypertensive drugs such as Nifedipine and Nimodipine. A facile synthetic route for the SPS of DHPs proceeding through the intermediacy of immobilized β-keto esters and β-enamino esters 26 has recently been described by Gordeev and co-workers [30]. This methodology was extended toward the preparation of a 100-member DHP library 27 comprising 10 pools, with each pool bearing a distinct aryl group and 10 different R1/R2 substituents derived from the 10 β-keto esters employed for forming enamino esters. Serial deconvolutive screening for calcium channel blockade activity using a cortex membrane binding assay identified several active compounds including the commercial drug Nifedipine 28 (IC50 = 18 nM). This example represents a significant departure from conventional solid-phase and combinatorial chemistry in the sense that it does not utilize any AA building blocks in the construction of the core scaffold.
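The iterative (serial) deconvolution used for pooled libraries such as these can be sketched as a simple search. The code below is a toy model under stated assumptions: the pool signal is taken to be the mean activity of its members, the library has a single hypothetical active member, and `iterative_deconvolution`, `pool_signal`, and `toy_activity` are illustrative names rather than anything from the cited studies. For a 10 × 10 × 10 library the first round screens 10 pools of 100 members, matching the pool layout described for the DKP library.

```python
from itertools import product


def pool_signal(prefix, n_blocks, n_positions, activity):
    """Mean activity of the pool whose leading positions are fixed to `prefix`."""
    free = n_positions - len(prefix)
    members = [prefix + rest for rest in product(range(n_blocks), repeat=free)]
    return sum(activity(m) for m in members) / len(members)


def iterative_deconvolution(n_blocks, n_positions, activity):
    """Fix one position per round, keeping the building block of the best pool."""
    fixed, assays = (), 0
    for _ in range(n_positions):
        signals = {
            block: pool_signal(fixed + (block,), n_blocks, n_positions, activity)
            for block in range(n_blocks)
        }
        assays += len(signals)                  # one assay per resynthesized pool
        fixed += (max(signals, key=signals.get),)
    return fixed, assays


HIDDEN_ACTIVE = (3, 7, 1)   # hypothetical single active member


def toy_activity(member):
    """Toy assay: only the hidden member shows any activity."""
    return 1.0 if member == HIDDEN_ACTIVE else 0.0


hit, assays = iterative_deconvolution(n_blocks=10, n_positions=3, activity=toy_activity)
print(hit, assays)
```

In this idealized case 30 pooled assays replace 1000 single-compound assays. Real libraries are less forgiving: when several weak actives share a pool, or when activity is not dominated by one member, the best pool in an early round need not contain the best compound.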

Parallel synthesis and screening of bioactive pharmacophore libraries

The extension of combinatorial chemistry from lead discovery to lead optimization has resulted in a gradual shift from split-pool protocol-based libraries that generate mixtures of compounds to the parallel synthesis of discrete analogs (see Fig. 5). The emphasis in the latter case is not on the size of the libraries but rather on the yield, purity, and appropriate characterization of the analogs to allow for their meaningful biological evaluation and reliable SAR study. Numerous small-member libraries from the parallel synthesis of drug-like molecules have been reported recently.

Fig. 5. Parallel synthesis and screening of bioactive molecules.

Aldol condensation of the zinc enolate of resin-bound alkyl ester 29 with an aromatic aldehyde or ketone forms a β-hydroxy ester, which upon treatment with DIBAL-H undergoes simultaneous reduction and cleavage of the ester moiety from the resin to give a soluble 1,3-diol 31 [31]. Parallel synthesis utilizing three ester and nine carbonyl building blocks afforded a library of 27 analogs, which was screened for antioxidative efficiency using a ferric thiocyanate assay. A 24-analog hydroxystilbene library 34 was constructed from the Horner–Emmons olefination of four resin-bound hydroxybenzaldehyde BBs 33 and six different benzyl phosphonate anions [32]. Screening for activity in a cell-based estrogenic assay identified several analogs such as 35 with IC50 values in the 5–15 µM range. A four-component Ugi condensation of cinnamic acids 36 with a series of aldehydes and isocyanides in the presence of Rink resin serving as an ammonia synthon leads, after acidolytic cleavage, to clean isolation of NH-acyl amino amide type cinnamic acid derivatives 37. Screening of individual analogs against hematopoietic protein tyrosine phosphatase (HePTP) using p-nitrophenyl phosphate as a substrate led to the identification of 38 as a moderately active, but novel, non-phosphorus-based inhibitor of HePTP (IC50 = 4 µM) [33]. Using a set of six o-cyanoaniline 39 and 12 cyclohexanone 40 BBs, Pirrung and co-workers prepared two orthogonal sets of a 72-member tetrahydroacridine library 41. Screening against acetylcholinesterase led to the identification of 7-nitrotacrine 42 as a 10-fold more potent analog (Ki = 10 nM) than the parent compound tacrine [34].
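For a two-component library, the two orthogonal sets can be laid out as an indexed grid: one set of pools fixes the first building block, the other fixes the second, and the most active pool in each set points to a single product. The sketch below assumes that layout for a 6 × 12 library; the text does not describe the actual pool composition, and the building-block names and the activity table are placeholders.

```python
def indexed_pools(set_x, set_y):
    """Build the two orthogonal pool sets of a two-component indexed library."""
    x_pools = {x: [(x, y) for y in set_y] for x in set_x}   # one X block, all Y blocks
    y_pools = {y: [(x, y) for x in set_x] for y in set_y}   # one Y block, all X blocks
    return x_pools, y_pools


def locate_hit(x_pools, y_pools, activity):
    """Pick the most active pool in each set; their intersection is the candidate."""
    def signal(pool):
        return sum(activity.get(member, 0.0) for member in pool)

    best_x = max(x_pools, key=lambda x: signal(x_pools[x]))
    best_y = max(y_pools, key=lambda y: signal(y_pools[y]))
    return best_x, best_y


anilines = [f"aniline_{i}" for i in range(1, 7)]    # 6 placeholder names
ketones = [f"ketone_{j}" for j in range(1, 13)]     # 12 placeholder names
x_pools, y_pools = indexed_pools(anilines, ketones)

toy_activity = {("aniline_4", "ketone_9"): 1.0}     # hypothetical assay result
print(len(x_pools) + len(y_pools), "pools cover", len(anilines) * len(ketones), "products")
print(locate_hit(x_pools, y_pools, toy_activity))
```

The same layout applied to 40 × 40 building blocks gives 80 pools of 40 members each, which are the counts reported for the amide/ester dimer library of Smith et al. [20].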
Extensive efforts in the rational design of inhibitors of aspartyl proteases such as renin and HIV protease have led to the discovery of several TS analog mimics. These templates can be viewed as monomeric units around which molecular diversity can be generated by appropriate chemical modification of the other functional groups present on them. Examples of such monomer decoration strategies have been reported by two different groups for the SPS of HIV protease inhibitor libraries derived from hydroxyethylamine and 1,2-diol TS analog BBs [35,36].

Fig. 6. SPS of transition-state analog pharmacophores and libraries.


Fig. 7. Monomer decoration strategies for molecular diversity.

In the first instance, diamino alcohol and diamino diol BBs are immobilized through their hydroxyl groups as acetal 43 and ketal 44, respectively (Fig. 6). The monomeric units are then well suited for a bidirectional solid-phase synthesis strategy for preparing C2-symmetric molecules. A library of >300 discrete analogs 45 was prepared and screened against HIV protease to identify several potent inhibitors (IC50 < 100 nM) [35]. In the other method, a masked amino diol pharmacophore is attached through its hydroxyl group onto a dihydropyran-functionalized polystyrene support (Fig. 6). The tosyl and azido groups of pharmacophoric unit 46 provide convenient handles for bidirectional derivatization. The tosyl group can be displaced with primary amines, and the resulting secondary amine can be converted to amides or ureas. The azido group can be reduced to an amine and used for further functionalization. The practicality of this approach was illustrated by the SPS of various known HIV protease inhibitors 47 and 48 [36].

Acid chloride handles on templates 49–51 were reacted with equimolar mixtures of 19 different, appropriately protected amines to prepare non-peptidic polyamide libraries. The three templates possess different types of symmetry and offer varying degrees of 3D spatial diversity. Screening for trypsin inhibition employing a chromogenic assay identified the xanthene analog 52 (Ki = 9.4 µM) as a moderately active, but structurally novel, inhibitor of trypsin (see Fig. 7) [37].
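How template symmetry affects the number of distinct products in such mixture libraries can be estimated with Burnside's lemma. The sketch below is a counting exercise under explicit assumptions that are not taken from the text: each template is modeled as having four acylation sites, and three hypothetical symmetry types are compared (no symmetry, a two-fold symmetry that swaps the sites pairwise, and the rotational symmetry of four equivalent tetrahedral sites). Only the number of amines, 19, comes from the passage above.

```python
def cycle_count(perm):
    """Number of cycles in a permutation given as a tuple of images."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            node = start
            while node not in seen:
                seen.add(node)
                node = perm[node]
    return cycles


def generate_group(generators):
    """Close a set of permutations under composition."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        current = frontier.pop()
        for gen in generators:
            new = tuple(gen[i] for i in current)   # apply `current`, then `gen`
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group


def distinct_products(n_amines, generators):
    """Burnside count of site assignments that differ under the template symmetry."""
    group = generate_group(generators)
    return sum(n_amines ** cycle_count(p) for p in group) // len(group)


N_AMINES = 19
NO_SYMMETRY = [(0, 1, 2, 3)]                  # identity only
TWO_FOLD = [(3, 2, 1, 0)]                     # swaps sites 0<->3 and 1<->2
TETRAHEDRAL = [(0, 2, 3, 1), (1, 0, 3, 2)]    # rotations of four equivalent sites

for name, gens in [("none", NO_SYMMETRY), ("two-fold", TWO_FOLD), ("tetrahedral", TETRAHEDRAL)]:
    print(name, distinct_products(N_AMINES, gens))
```

Under these assumptions the same 19 amines give 130 321 distinct products on an unsymmetrical four-site template, 65 341 on the two-fold symmetric one, and 11 191 on the tetrahedral one, which illustrates why templates of lower symmetry offer more diversity per building-block set.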

Solid-phase synthesis of heterocyclic, drug-like molecules

Non-oligomeric and air- or moisture-sensitive chemistries such as cycloadditions, cyclizations, and carbanion condensations have already been successfully enabled on solid support, and some recent examples are discussed below. It is anticipated that these studies will be further extended to combinatorial formats and that the resulting libraries will be subjected to HTS in the near future.

Resin-bound α-amino esters, besides being traditionally used for making peptides, have served as key intermediates for the construction of various heterocyclic scaffolds. Thus, they react smoothly with isocyanates to form ureas, which upon heating under acidic conditions cyclize to form hydantoins 53 [27]. A one-pot, three-component condensation of resin-bound α-amino esters with aldehydes and α-mercapto acids affords 4-thiazolidinones 54 in good yields [38]. The reaction proceeds through the intermediacy of imines, as evidenced by the successful conversion of stable imines derived from aromatic aldehydes to the cyclized products upon treatment with mercapto acids. Such imines provide a convenient starting point for further generation of heterocyclic diversity. Besides reacting by way of azomethine ylides to form pyrrolidines as discussed previously, they also undergo smooth [2 + 2] cycloaddition with ketenes derived from acid chlorides to provide β-lactams 55 with good regiocontrol [39]. Heterocycles such as hydantoins, thiazolidinones, and β-lactams (see Fig. 8) possess a wide range of biological activities, and generic libraries of such scaffolds may provide new leads from random screening. Alternatively, these can be viewed as peptidomimetic fragments and utilized in the construction of libraries of larger non-peptidic molecules.

Fig. 8. SPS of drug-like scaffolds and pharmacophores.

Two different routes have been reported for the SPS of 1,4-benzodiazepine-2,5-diones 56 [40,41]. In the first method, the intermediacy of a tertiary amide is a critical requirement for efficient lactamization, presumably by favoring a cis conformation for the acyclic precursor [40]. The second method effects ring closure on solid support via an intramolecular aza-Wittig (Staudinger) reaction to construct a benzodiazepinedione scaffold embedded on an N-substituted glycine scaffold. An immobilized secondary amine is acylated with o-azidobenzoyl chloride and treated with Bu3P in toluene at room temperature to form the iminophosphorane. Heating at 130 °C for several hours leads to cyclization, and TFA treatment cleaves the desired product, which is typically obtained in adequate yield (34–90%) and good purity (60–95%) [41].

O-tethered β-keto esters, through the intermediacy of arylidene keto esters, have been efficiently utilized for the construction of immobilized dihydropyridines. Ceric ammonium nitrate (CAN) oxidation to pyridines followed by acidolytic cleavage provides a facile entry into nicotinic acid derivatives 57 [42]. A three-component Biginelli cyclization of ureas on resin with a solution mixture of aldehydes and β-keto esters provides dihydropyrimidines 58 in high yield and purity [43]. Heterocycles such as dihydropyridines and pyrimidines have historically proven to be a rich source of antimicrobial, antitumor, antiviral, and cardiovascular agents.


Conclusions

Any new invention needs to be followed by noteworthy applications to be truly successful. In combinatorial technology, we have quickly moved past the conceptual phase, and current efforts in various pharmaceutical organizations are centered on the practical exploitation of this technology to discover new drugs. Because of its capability to generate large collections of small-sized, drug-like molecules and screen them against multiple targets in very short periods of time, combinatorial technology will increase the rate at which new leads are discovered and ultimately shorten the timeline for transforming initial hits at the bench into final drug molecules in the clinic. It will hopefully also offer fresh leads against unprecedented human and microbial genomic targets for which traditional compound collections may be unsuitable. It cannot, however, be expected to totally displace previous drug discovery methods; instead, it will be most effective through synergistic integration with those conventional methods. In summary, combinatorial technology offers an interdisciplinary approach to drug discovery through the coordination of chemistry, biology, engineering and computational expertise. With serious commitment from the pharmaceutical industry worldwide to exploit combinatorial science as a critical drug discovery tool, we can look forward to the emergence of numerous combichem-initiated clinical candidates and drugs over the coming years.

Acknowledgements

I would like to thank Drs. Eric M. Gordon, Mikhail F. Gordeev and Jeff Jacobs at Versicor and Mark Gallop at Affymax for their valuable contributions toward establishing combinatorial chemistry as a critical drug discovery tool.

References 1 Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, S.P.A. and Gordon, E.M., J. Med. Chem., 37 (1994) 1233. 2 Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, S.P.A. and Gallop, M.A., J. Med. Chem., 37 (1994) 1385. 3 Terrett, N.K., Gardner, M., Gordon, D.W., Kobylecki, R.J. and Steele, J., Tetrahedron, 51 (1995) 8135. 4 Thompson, L.A. and Ellman, J.A., Chem. Rev., 96 (1996) 555. 5 Patel, D.V. and Gordon, E.M., Drug Discov. Today, 1 (1996) 134. 6 Gordon, E.M., Gallop, M.A. and Patel, D.V., Acc. Chem. Res., 29 (1996) 144. 7 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., Int. J. Pept. Protein Res., 37 (1991) 487. 8 Houghten, R.A., Appel, J.R., Blondelle, S.E., Cuervo, J.H., Dooley, C.T. and Pinilla, C., Pept. Res., 5 (1992) 351. 9 a. Chen, J.K., Lane, W.S., Brauer, A.W., Tanaka, A. and Schreiber, S.L., J. Am. Chem. Soc., 115 (1993) 12591. b. Chen, J.K. and Schreiber, S.L., Angew. Chem., Int. Ed. Engl., 34 (1995) 953. 10 Koppel, G., Dodds, C., Houchins, B., Hunden, D., Johnson, D., Owens, R., Chaney, M., Usdin, T., Hoffman, B. and Brownstein, M., Chem. Biol., 2 (1995) 483. 11 Simon, R.J., Kania, R.S., Zuckermann, R.N., Huebner, V.D., Jewell, D.A., Banville, S., Ng, S., Wang, L., Rosenberg, S., Marlowe, C., Spellmeyer, D.C., Tan, R., Frankel, A.D., Santi, D.V., Cohen, F.E. and Bartlett, P.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 9367.


12 Cho, C.Y., Moran, E.J., Cherry, S.R., Stephans, J.C., Fodor, S.P.A., Adams, C.L., Sundaram, A., Jacobs, J.W. and Schultz, P.G., Science, 261 (1993) 1303. 13 Burgess, K., Linthicum, D.S. and Shin, H., Angew. Chem., Int. Ed. Engl., 34 (1995) 907. 14 a. Zuckermann, R.N., Kerr, J.M., Kent, S.B.H. and Moos, W.H., J. Am. Chem. Soc., 114 (1992) 10646. b. Zuckermann, R.N., Martin, E.J., Spellmeyer, D.C., Stauber, G.B., Shoemaker, K.R., Kerr, J.M., Figliozzi, G.M., Goff, D.A., Siani, M.A., Simon, R.J., Banville, S.C., Brown, E.G., Wang, L., Richter, L.S. and Moos, W.H., J. Med. Chem., 37 (1994) 2678. 15 Campbell, D.A. and Bermak, J.C., J. Org. Chem., 59 (1994) 658. 16 Campbell, D.A., Bermak, J.C., Burkoth, T.S. and Patel, D.V., J. Am. Chem. Soc., 117 (1995) 5381. 17 a. Bartlett, P.A., Biochemistry, 26 (1987) 8553. b. Morgan, B.P., Scholtz, J.M., Ballinger, M.D., Zipkin, I.D. and Bartlett, P.A., J. Am. Chem. Soc., 113 (1991) 297. 18 Karanewsky, D.S., Badia, M.C., Cushman, S.W., DeForrest, J.M., Dejneka, T., Loots, M.J., Perri, M.G., Petrillo, E.W. and Powell, J.R., J. Med. Chem., 31 (1988) 204. 19 Deprez, B., Williard, X., Bourel, L., Coste, H., Hyafil, F. and Tartar, A., J. Am. Chem. Soc., 117 (1995) 5405. 20 Smith, P.W., Lai, J.Y.Q., Whittington, A.R., Cox, B., Houston, J.G., Stylli, C.H., Banks, M.N. and Tiller, P.R., Bioorg. Med. Chem. Lett., 4 (1994) 2821. 21 Chabala, J.C., Baldwin, J.J., Burbaum, J.J., Chelsky, D., Dillard, D.W., Henderson, I., Li, G., Ohlmeyer, M.H.J., Randle, T.L., Reader, J.C., Rokosz, L. and Sigal, N.H., Perspect. Drug Discov. Design, 2 (1994) 305. 22 Ohlmeyer, M.H.J., Swanson, R.N., Dillard, L.W., Reader, J.C., Asouline, G., Kobayashi, R., Wigler, M. and Still, W.C., Proc. Natl. Acad. Sci. USA, 90 (1993) 10922. 23 Bunin, B.A. and Ellman, J.A., J. Am. Chem. Soc., 114 (1992) 10997. 24 Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 4708.
25 Plunkett, M.J. and Ellman, J.A., J. Am. Chem. Soc., 117 (1995) 3306. 26 Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Methods Enzymol., in press. 27 DeWitt, S.H., Kiely, J.S., Stankovic, J.S., Schroeder, M.C., Cody, D.M.R. and Pavia, M.R., Proc. Natl. Acad. Sci. USA, 90 (1993) 6909. 28 Murphy, M.M., Schullek, J.R., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 117 (1995) 7029. 29 Gordon, D.W. and Steele, J., Bioorg. Med. Chem. Lett., 5 (1995) 47. 30 a. Gordeev, M.F., Patel, D.V. and Gordon, E.M., J. Org. Chem., 61 (1996) 924. b. Patel, D.V., Gordeev, M.F., England, B.P. and Gordon, E.M., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 58–69. 31 Kurth, M.J., Randall, L.A.H., Chen, C., Melander, C. and Miller, R.B., J. Org. Chem., 59 (1994) 5862. 32 Williard, R., Jammalamadaka, V., Zava, D., Benz, C.C., Hunt, C.A., Kushner, P.J. and Scanlan, T.S., Chem. Biol., 2 (1995) 45. 33 Cao, X., Moran, E.J., Siev, D., Lio, A., Ohashi, C. and Mjalli, A.M.M., Bioorg. Med. Chem. Lett., 5 (1995) 2953. 34 Pirrung, M.C., Chau, J.H.-L. and Chen, J., Curr. Biol., 2 (1995) 621. 35 Wang, G.T., Li, S., Wideburg, N., Krafft, G.A. and Kempf, D.J., J. Med. Chem., 38 (1995) 2995. 36 Kick, E.K. and Ellman, J.A., J. Med. Chem., 38 (1995) 1427. 37 Carell, T., Wintner, E.A. and Rebek, J., Angew. Chem., Int. Ed. Engl., 33 (1994) 2061. 38 Holmes, C.P., Chinn, J.P., Look, G.C., Gordon, E.M. and Gallop, M.A., J. Org. Chem., 60 (1995) 7328. 39 Ruhland, B., Bhandari, A., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 118 (1996) 253. 40 Boojamra, C.G., Burow, K.M. and Ellman, J.A., J. Org. Chem., 60 (1995) 5742. 41 Goff, D.A. and Zuckermann, R.N., J. Org. Chem., 60 (1995) 5744. 42 Gordeev, M.F., Patel, D.V., Wu, J. and Gordon, E.M., Tetrahedron Lett., 37 (1996) 4643. 43 Wipf, P. and Cunningham, A.A., Tetrahedron Lett., 36 (1995) 7819.


Section II

Combinatorial Biology and Evolution


Combinatorial biology and evolution: A perspective

Brian K. Kay (a) and Andrew D. Ellington (b)

(a) Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3280, U.S.A.
(b) Department of Chemistry, Indiana University, Bloomington, IN 47405, U.S.A.

Over the past several years, combinatorial libraries of biological molecules, such as peptides and nucleic acids, have proven invaluable as reagents with which to study molecular recognition of proteins and non-proteins. Such libraries have been used extensively to define the specificity of protein/protein, protein/RNA, protein/peptide, and RNA/small molecule interactions. The study of molecular recognition using in vitro or artificial selection techniques appears to stand somewhere between basic chemistry and basic biology. The chemist might wonder what is learned from selection experiments, as they apparently do not contribute to principles that can guide the rational design of molecular hosts and guests. The biologist might wonder what is learned from selection experiments, as they do not obviously provide insights into natural selection or evolutionary history. However, such speculations would be both premature and, ultimately, wrong. Artificially selected molecular interfaces are the unfettered universe from which both design and natural selection draw. Chemists may not always see what their molecular designs have in common with the products of natural selection, but can begin to appreciate how at least some of their efforts may be mirrored in the rapidly growing number of artificially selected interfaces. Biologists typically relegate much of the molecular detail of life to unknown historical choices, but artificial selection can identify what choices may have been available for history to make in the first place. Insights and materials garnered from selection experiments are already proving their worth to scientists in both academia and industry. For the academic scientist, artificial selections provide a wealth of new molecules to study; for example, the number of structures of artificially selected RNA molecules is beginning to rival the number of structures of naturally selected RNA molecules. 
The structures of the artificially selected RNA molecules contain many interesting new structural and functional motifs that might never have been otherwise observed. For the industrial scientist, a finer appreciation of the relationship between sequence, structure, and binding function has provided new insights into rational drug design and the development of novel affinity reagents.

This section of the Annual Reports comprises five contributions from experts in their fields. These contributions attempt to meld advanced technical aspects of biopolymer selection with more theoretical treatments. As a result, the interplay between the applied and basic scientific aspects of an emergent field becomes apparent. The opus by Dr. Bennett Levitan (Santa Fe Institute) provides one of the first coherent frameworks for understanding selection techniques. This chapter provides a firm connection between the experimental techniques used and the probabilistic models that of necessity underlie these techniques, and should prove to be a benchmark both in understanding why selections work and, more importantly, in how to make selections work better. The second chapter contains a theoretical treatment of nucleic acid landscapes, originally pioneered and here expanded upon by Dr. Peter Schuster (University of Vienna). This theoretical work can be compared with the results presented in the chapter by Dr. Andrew Ellington and coworkers (Indiana University), dealing with nucleic acid selection in the context of landscape models. In the chapter by Dr. Kit Lam (University of Arizona), there is a review of the synthesis and screening of soluble peptides by the ‘one-peptide–one-bead’ approach. Finally, in the contribution by Dr. John Collins (GBF-Braunschweig), the latest advances in phage display are discussed.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 93–94 © 1997 ESCOM Science Publishers B.V.


Models and search strategies for applied molecular evolution

Bennett Levitan

Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, U.S.A.

Introduction

In just a few years, molecular diversity techniques have revolutionized pharmaceutical design and experimental methods for studying receptor binding, consensus sequences, genetic regulatory mechanisms, and many other issues in biochemistry and chemistry [1–6]. Because of the enormous libraries of ligands that can be used and the rapidity of the techniques, methods of applied molecular evolution such as SELEX and phage display have become particularly popular [1,5,7–11]. These methods have been enormously successful, yet the theoretical work developed for them so far is quite limited. The success of these methods is not trivial: the huge number of sequences being searched through, the low concentrations of individual species, and the noise and biases inherent in the techniques would seem to make these experiments very difficult. Understanding why they work so well, and showing how they can perform better and for more complex molecular search problems, falls under the purview of theory. Recent theoretical work in several research groups, including ours, shows how and why applied molecular evolution works despite these intrinsic sources of error. Even more important, the models demonstrate that laboratory protocols can be optimized for a given set of experimental goals – for example, finding the best ligand in a library or finding a variety of high affinity ligands for a consensus sequence in as few rounds as possible. The simulations and calculations also suggest other possible techniques, such as using multivalent binding as a tool to manipulate selection pressure or purposely increasing nonspecific background binding to assist mutation in adding new ligands to the library. Short of knowing the affinity of every ligand in a combinatorial library, there is no way to know how well an experiment has done.
Theoretical approaches provide a basis for estimating the performance of an experiment, comparing different search methods, and choosing a method appropriate for a given experimental goal. The work reviewed below summarizes current theoretical approaches to molecular diversity techniques and shows the roles they can play in optimizing experimental design and in searching molecular fitness landscapes, a means of relating ligands to their properties. Because there is a large gulf between theoretical approaches based on laboratory techniques and those based on molecular fitness landscapes, this review is written in two parts: the first part reviews the laboratory technique-based models with special focus on the assumptions and simplifications behind them, and the second part reviews the terminology, models and search results for molecular fitness landscapes.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 95–152 © 1997 ESCOM Science Publishers B.V.

Laboratory technique-based models and search strategies

At present, many popular applied molecular evolution protocols do not involve mutation or recombination. The laboratory technique-based models presented in this section are of this type. Incorporating mutation requires fitness landscape models or some other means of relating molecular properties to particular sequences. The more abstract models reviewed later allow for mutation and recombination and are based heavily on landscape structure. The models in the present section are based on the affinity distribution p(Ka), the probability that a ligand chosen at random from the library has affinity Ka. While affinity distributions contain only a small amount of information compared to affinity landscapes, they are sufficient to design optimal search algorithms for certain problems. For example, when phage display and SELEX are implemented without mutation, they are only culling the best ligands from the initial library [8–10]. If the library is designed in an unbiased manner, the search depends only on the probability distribution of ligand affinities for the given target. If the library is designed in a biased manner, such as concentrating the initial ligands in one region of the sequence space or building a library based on the variable gene regions of lymphocyte RNA from an immunized animal [11], landscape-type information is needed to calculate the initial affinity distribution, but the actual search depends only on this distribution and not the landscape.

Affinity distribution models

For ligand affinities, several distributions for p(Ka) are commonly used. The Sips distribution is based on the Langmuir adsorption equation and gives a Gaussian-like distribution for binding free energies [12]. Unfortunately, the corresponding affinity distribution has mathematical properties that make it difficult to use in analytic work [13]. An often-used model is the log-normal distribution [13,14]. This distribution results from the assumption that binding free energies are Gaussian [15]. Yet another model is the receptor affinity distribution (RAD) model, which results from assuming that binding surface subsites on two molecules have a given probability for being either complementary or noncomplementary [16]. The number of complementary subsites, and hence the binding free energy, then follows a binomial distribution. This model has been successfully fit to data from studies of immunoglobulin binding and the human olfactory system [16]. Unfortunately, it is difficult to use analytically. A similar model is described elsewhere [17]. Other models, based on antibody affinity measurements, are multimodal [18]. In general, these distributions all share the intuitive property that the vast majority of ligands have low to moderate affinity, and fewer and fewer ligands have higher affinity. Though the Sips, log-normal and RAD distributions can be made visually quite similar, they will always differ in their moments and other mathematical properties. It should be stressed that these distributions are for affinities. For other properties, such as enzymatic activity, these distributions may not be relevant, though at least one model for catalytic activity is based on these distributions [19].
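The two simplest models above can be made concrete with a short sampling sketch. The Python below draws affinities from a log-normal distribution (Gaussian binding free energies) and from a binomial subsite model in the spirit of the RAD model. The function names, the temperature, the free energy per matched subsite and every other number are illustrative assumptions, not parameters taken from Refs. 13–16.

```python
import math
import random

RT = 0.593  # kcal/mol near 298 K (assumed temperature)

def lognormal_affinities(n, mean_dg=-6.0, sd_dg=1.5, seed=0):
    """Sample n affinities Ka, assuming Gaussian binding free energies (kcal/mol)."""
    rng = random.Random(seed)
    # Ka = exp(-dG / RT), so Gaussian dG gives log-normal Ka.
    return [math.exp(-rng.gauss(mean_dg, sd_dg) / RT) for _ in range(n)]

def subsite_affinities(n, subsites=10, p_match=0.3, dg_per_match=-1.0, seed=0):
    """Binomial subsite model: each subsite is complementary with probability p_match."""
    rng = random.Random(seed)
    affinities = []
    for _ in range(n):
        matches = sum(rng.random() < p_match for _ in range(subsites))
        affinities.append(math.exp(-matches * dg_per_match / RT))
    return affinities

def fraction_above(affinities, threshold):
    """Fraction of the sampled library with affinity at or above threshold."""
    return sum(ka >= threshold for ka in affinities) / len(affinities)

library = lognormal_affinities(10_000)
for threshold in (1e5, 1e6, 1e8):
    print(f"fraction with Ka >= {threshold:.0e}: {fraction_above(library, threshold):.4f}")
```

In both sketches most sampled ligands have low to moderate affinity and the high-affinity tail thins out quickly, which is the shared property noted above.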


Comparing methods of molecular search

Throughout the following section, different search methods and different parameters for a particular search method are compared. Comparing different search methods requires a performance measure, a probabilistic measure of satisfying well-defined criteria for successful search. A common performance measure is the enrichment function [20,21]. Enrichment can be defined as the ratio of either concentrations or mole fractions before and after selection as a function of affinity or as a function of ligand rank in the library; for example,

\[
E(K_a) = \frac{[\text{ligands after selection with affinity } K_a] \,/\, [\text{all ligands after selection}]}{[\text{ligands before selection with affinity } K_a] \,/\, [\text{all ligands before selection}]} \tag{1}
\]

E(Ka) > 1 means that ligands of affinity Ka are selected to a greater extent than the library on average, 0 < E(Ka) < 1 means that these ligands are selected to a lesser extent than the library on average, and E(Ka) = 0 means none of these ligands are selected. This definition of enrichment can also be expressed as

\[
E(K_a) = \frac{\text{fraction of ligands with affinity } K_a \text{ selected}}{\text{fraction of all ligands selected}} \tag{2}
\]
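As a minimal sketch of Eq. 1, the following Python computes the enrichment of each affinity class from ligand amounts before and after one round. The function name and the three-class library are hypothetical, and the sketch assumes every class is present before selection.

```python
def enrichment(before, after):
    """Eq. 1: enrichment of each affinity class.

    before, after: dicts mapping affinity Ka -> amount (copies or concentration).
    Classes absent from `after` are treated as not selected (enrichment 0).
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    result = {}
    for ka, amount_before in before.items():
        fraction_before = amount_before / total_before
        fraction_after = after.get(ka, 0.0) / total_after
        result[ka] = fraction_after / fraction_before
    return result

# Hypothetical library: three affinity classes and their copy numbers.
before = {1e5: 9_000, 1e7: 900, 1e9: 100}
after = {1e5: 90, 1e7: 90, 1e9: 50}

for ka, e in enrichment(before, after).items():
    print(f"Ka = {ka:.0e}: E = {e:.2f}")
```

The same numbers follow from Eq. 2: for the best class, half of its copies are selected while 230 of 10 000 ligands are selected overall, so both forms give the same ratio.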

Enrichment can be considered for either a single round of selection or over several rounds in succession. The enrichment function provides some measure of the effects of a particular laboratory protocol. However, it is not necessarily the best measure as (i) it is cumbersome to use for comparative purposes, since it is a function of affinity rather than a single number; (ii) it does not reflect the particular goals of a search, such as whether one wants one ligand with very high affinity or a variety of ligands with high affinities; and (iii) if target concentration does not vastly exceed the total ligand concentration, the enrichment function changes depending on the initial affinity distribution. For these reasons, we consider other performance measures. The key issue for a successful search, and hence performance measures, is mole fraction. Unlike computer optimization methods, it is not possible to detect the presence of single members of the population nor to store information on the best sequences after they are lost from the population. Instead, only a small number of ligands are sampled for detailed study in a laboratory protocol, typically 20–200. The probability that ligand i with mole fraction fi is included in a sample of size s from a large library is approximately 1 – (1 – fi)^s, which is small for small fi. With such a small fraction of the library being sampled, it is imperative that the desired ligands occupy a substantial portion of the library. For example, with s = 100 samples, achieving probability p = 0.99 of including a species i in the sample requires a mole fraction fi = 1 – (1 – p)^(1/s) ≈ 0.045, a high mole fraction for any one species in such large libraries. Good performance measures will incorporate mole fraction information.
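The sampling argument above is easy to check numerically. The sketch below uses two hypothetical helper functions, one for the inclusion probability and one for its inverse; it assumes independent picks from a library large enough that sampling does not change the mole fractions.

```python
def prob_in_sample(f, s):
    """Probability that a species with mole fraction f appears at least once in s picks."""
    return 1.0 - (1.0 - f) ** s

def required_mole_fraction(p, s):
    """Mole fraction needed to reach inclusion probability p with s picks."""
    return 1.0 - (1.0 - p) ** (1.0 / s)

# A species at one part per million is almost never seen in 100 picks.
print(prob_in_sample(1e-6, 100))

# Mole fraction needed for a 99% chance of inclusion with 100 picks.
print(required_mole_fraction(0.99, 100))
```

Inverting the same relation for the sample size s gives performance measure (iv) in the list of measures for search without mutation.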
For search without mutation, example measures are (i) the mole fraction of the highest affinity ligand (from the initial library) at a particular round; (ii) the probability that a given number of ligands with affinities above a specified threshold appear in a sample taken after a particular round; (iii) the generation at which the mole fraction of the best ligand (or several high affinity ligands) exceeds a given value; and (iv) the sample size required to achieve a given probability of including in the sample at least one copy of one or several ligands with affinities above a given threshold at a given round. For searches which add new species each round by mutation, measures such as the first and third need to be defined with respect to affinities above a threshold, not the rank of a given affinity in the initial library. It is helpful to classify modeling approaches as to whether they are deterministic or stochastic. Deterministic models are based on average equilibrium and kinetic behavior, while stochastic models take rigorous account of the probabilistic nature of library design and of each step in a laboratory protocol. Deterministic models are generally easier to design and are more amenable to analytic results, while stochastic models have more parameters, many of which are not precisely known. However, stochastic models can be used for the minuscule concentrations of individual species found in the first round of experiments or in every round when there is mutation. Deterministic models are unable to handle these two situations properly, as they act as if fractions of molecules can be retained and later amplified to full molecules again. The models discussed in the following section are of both types.

Models and search strategies

To my knowledge, there are only three laboratory technique-based models for applied molecular evolution. Each is reviewed in detail here.

Fig. 1. Laboratory technique-based models discussed in this review. Arrows show the different paths that can be taken by an individual phage during a round of selection and amplification. Where included, variables on the arrows indicate the fraction of ligands that follow that path. ‘T’ refers to the target, ‘P’ refers to plastic – to all nontarget surfaces to which phage can bind (background binding) such as the polystyrene well and a poor blocking agent. (a) SELEXION model [20]; (b) Mandecki et al.’s phage display model [21]; (c) Levitan and Kauffman’s phage display model [27]. Ltot: initial ligand; LBT: ligands bound to target after equilibrium; LBP: ligands bound to target after washing; LBXi: ligands bound to X (T = target, P = plastic) in valency i (monovalent or bivalent) after equilibrium; Lfree: unbound ligands after equilibrium; LwXi: ligands bound to X in valency i after washing; Lpart: ligands retained on nitrocellulose during partitioning; Lelu: ligands eluted from well; Lamp: ligands after amplification; Lren: ligands after renormalization; a: fraction of phage that nonspecifically bind; h(Ka): enrichment function for equilibrium only; exp(–koff twash): fraction of target-bound phage retained during washing; d: fraction of nonspecifically bound phage released during one wash cycle; w: number of wash cycles.

SELEXION model

Irvine et al. developed a mathematical model for the RNA design procedure SELEX (Systematic Evolution of Ligands by Exponential enrichment) [7,10] which they call SELEXION (SELEX with Integrated Optimization by Nonlinear analysis) [20]. The SELEXION model consists of three components: (i) deterministic equilibrium; (ii) partitioning; and (iii) amplification and renormalization (Fig. 1a). The equilibrium and partitioning steps together, and more generally the steps of equilibrium, washing and elution, are known as ‘affinity selection’ or ‘biopanning’.

Deterministic equilibrium

For the equilibrium step, Irvine et al. use a deterministic solution of the mass-action equations for n species of ligand competing for target. They solve the equilibrium numerically using Newton’s method [22]. To initialize the numerical method, they find an approximate solution using the bulk dissociation constant, 〈Kd〉, the effective dissociation constant representative of the population as a whole.
〈Kd〉 is directly measurable. For theoretical purposes, they estimate 〈Kd〉 using the reciprocal of the average affinity of the population:

\[
\langle K_d \rangle \approx \left( \sum_{i=1}^{n} f_i \, K_{a,i} \right)^{-1} \tag{3}
\]

where fi is the mole fraction of species i in the initial population. The term ‘bulk’ is meant to distinguish the effective dissociation constant of the population from the population’s average dissociation constant, which are not the same (Eq. 3). The 〈Kd〉 approximation becomes less valid with increasing protein concentration and for ligands with lower affinities, so it cannot be used as a general solution; however, the approximate solution is excellent for initializing the numerical solution. This approach works well for small n. However, it requires solving n equations simultaneously and is computationally complex. Solving the equilibrium becomes unwieldy and slow as n gets large (our simulations handled up to about 400 species using this approach).
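The equilibrium step can be sketched in a few lines under simplifying assumptions. For 1:1 binding of n ligand species to a single target, mass conservation reduces to one equation in the free target concentration, which the hypothetical `solve_equilibrium` below solves by bisection. This is a stand-in for illustration, not the Newton's-method scheme of Irvine et al., and all concentrations and affinities are made up. The helper `bulk_kd` follows the verbal definition of the bulk dissociation constant given above.

```python
def bulk_kd(mole_fractions, affinities):
    """Reciprocal of the mole-fraction-weighted mean affinity."""
    return 1.0 / sum(f * ka for f, ka in zip(mole_fractions, affinities))

def solve_equilibrium(t_tot, ligand_totals, affinities, iterations=200):
    """Return (free target, bound concentration per species) for 1:1 competitive binding."""
    def excess(t_free):
        # Target accounted for (free + bound) minus the total; increases with t_free.
        bound = sum(l * ka * t_free / (1.0 + ka * t_free)
                    for l, ka in zip(ligand_totals, affinities))
        return t_free + bound - t_tot

    lo, hi = 0.0, t_tot  # the root lies between no free target and all target free
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    t_free = 0.5 * (lo + hi)
    bound = [l * ka * t_free / (1.0 + ka * t_free)
             for l, ka in zip(ligand_totals, affinities)]
    return t_free, bound

# Hypothetical three-class library (concentrations in M, affinities in 1/M).
ligands = [1e-6, 1e-8, 1e-10]
affinities = [1e6, 1e7, 1e9]
t_free, bound = solve_equilibrium(1e-7, ligands, affinities)
print(t_free, [b / l for b, l in zip(bound, ligands)])

# Bulk Kd versus the plain average Kd for an equal mix of two species.
print(bulk_kd([0.5, 0.5], [1e6, 1e8]), 0.5 * 1e-6 + 0.5 * 1e-8)
```

The last line illustrates why the bulk and average dissociation constants differ: the bulk value is dominated by the tighter binder, the average by the weaker one.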


To bypass this limitation, they approximate libraries using only a few classes (≤33) of ligands. For example, in some simulations, class affinities were based on data from a SELEX experiment whose input was 65 536 RNAs made by varying eight sites in the gene 43 translational operator of bacteriophage T4. The best (highest affinity) ligand was assigned the affinity of the gene 43 wild-type sequence to T4 protein gp43, and other ligands were assigned affinities near the average affinity of the population or intermediate between those of the average and the wild type. Initial copy numbers of each class were 1 or 2 for the high and intermediate affinity classes and several tens of thousands for the average affinity classes. They observed little difference in simulation results as they changed the number of classes or the copy numbers of each class.

Partitioning

Partitioning refers to separating the bound and free ligands after equilibrium. The partitioning method modeled in SELEXION involves pouring the mixture of ligands and targets through a nitrocellulose filter. Free RNAs tend to fall through the filter, while those bound to protein attach to the filter. To model the noise (the nonspecific binding) in this process, they defined CP as the fraction of bound RNAs that remain on the filter (Correct Partitioning) and BG as the fraction of free RNAs that remain on the filter (BackGround). These parameters can be measured in experiments using wells with no target and wells with target using careful quantification of the target and ligand concentrations before and after elution. The concentration of RNA of type i partitioned onto the filter is then

\[
[L_i^{\mathrm{part}}] = \mathrm{CP}\,[L_i^{\mathrm{BT}}] + \mathrm{BG}\,[L_i^{\mathrm{free}}] \tag{4}
\]

where [L_i^part] is the concentration of RNA type i that is partitioned, [L_i^BT] is the concentration of RNA type i that bound target in the equilibrium and [L_i^free] is the concentration of RNA type i that remains unbound.
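A short sketch of the partitioning rule just described: each species keeps the fraction CP of its bound molecules and the fraction BG of its free molecules. The function names and the two-species amounts are hypothetical; the default CP and BG are the values the review quotes from unpublished experiments.

```python
def partition(bound, free, cp=0.8, bg=0.001):
    """Per-species amount retained on the filter: CP * bound + BG * free."""
    return [cp * b + bg * f for b, f in zip(bound, free)]

def mole_fractions(amounts):
    """Normalize a list of amounts so that it sums to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]

# Hypothetical amounts after equilibrium: a rare strong binder and an abundant weak one.
bound = [5.0, 1.0]
free = [5.0, 9999.0]

kept = partition(bound, free)
print(kept)
print(mole_fractions(kept))
```

Even with BG as small as 0.001, most of what is retained from the abundant weak species here comes from the background term, which is why background partitioning limits enrichment.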
(Note: The notations of the various papers are converted into a single notation used throughout this review (Table 1).) For optimal partitioning, one would like CP = 1 and BG = 0. From unpublished experiments, they observed CP = 0.8 and BG = 0.001. This partitioning model makes two assumptions. It assumes that the retention of RNAs bound to nitrocellulose is independent of the randomized component, an unpublished observation made both by Irvine et al. [20] and Mandecki et al. [21]. Sequence-independent background binding may indicate that flow or mechanical issues are responsible for the background component more than actual binding. It also assumes that the retention of RNAs bound to nitrocellulose-bound proteins is independent of the randomized component. This is far less likely, as the pouring of the equilibrium mixture through the filter is very similar to the washing steps used in phage display and other methods of partitioning, in which ligands are released according to their off-rates. The effect is to underestimate the degree of selection, as high affinity ligands are more likely to have lower off-rates.

Amplification and renormalization

Amplification refers specifically to the enlarging of the selected population by methods such as PCR. Renormalization refers to extracting a portion of the amplified population, typically on the order of the size of the original library, to obtain the input for the next round. In the SELEXION model, amplification and renormalization are combined into a single step. The partitioned population is scaled to the size of the original library, and the concentration of each species is scaled by the same factor. The mole fractions of each species are not altered by the combined amplification/renormalization step, but the total number of molecules is increased.

SELEXION results

Deterministic model

For their deterministic model, Irvine et al. use two performance measures: enrichment of the best-binding species and the number of rounds needed to amplify the best species to half the population. They define enrichment of species i as the ratio of the mole fraction of species i after a round to its mole fraction before the round (Eq. 1). Their simulations show enrichments qualitatively similar to the SELEX experiment they modeled. However, the simulations required more rounds to yield high concentrations of the best ligand than did the experiment, possibly reflecting the model's omission of the effect of different off-rates during partitioning. Irvine et al. demonstrate important points about the effect of background partitioning. With no background (BG = 0), enrichment of the best ligand is largest for low target concentrations (Fig. 2). As target concentration is increased, enrichment changes little until a threshold concentration, beyond which the enrichment decreases rapidly. This decrease occurs because high target concentrations make it easy for ligands of any affinity to bind, and the relative advantage of the best ligand decreases. In contrast, with background partitioning (BG > 0), there is an optimal target concentration that maximizes enrichment (Fig. 2). The peak in enrichment results from a tradeoff between competing trends. As [Ttot] increases, all ligands bind more easily and enrichment decreases. As [Ttot] decreases, all ligands bind in lower amounts, and the bound ligands are increasingly overwhelmed by background binding. Even worse, because there are more unbound ligands to partition nonspecifically, background partitioning is actually increased by lowering [Ttot]. As background partitioning (BG) increases, the optimal target concentration increases and the enrichment at this concentration decreases. These changes occur because the number of target-bound ligands does not change with BG, while the number of background partitioning ligands constantly increases with increasing BG.

TABLE 1
MAJOR VARIABLES USED IN THIS REVIEW

Symbol      Definition
n           Number of ligand types
Ttot        Total number of target molecules (a)
Tfree       Number of free target molecules after equilibrium
L_i^tot     Total number of ligand i molecules
L_i^free    Number of free ligand i molecules after equilibrium
L_i^BT      Number of ligand i molecules bound to target after equilibrium
L_i^part    Number of ligand i molecules retained after partitioning
L_i^elu     Number of ligand i molecules eluted off binding surface(s)
P_i^sel     Probability that one ligand i molecule is selected
Ka,i        Intrinsic affinity of species i for target (b)
kon,i       Intrinsic target on-rate for ligand i
koff,i      Intrinsic target off-rate for ligand i
〈Kd〉        Bulk dissociation constant
Kd1         Intrinsic target dissociation constant of best ligand in library
twash       Total time for wash
CP          Correct partitioning probability
BG          Nonspecific background partitioning probability
fi          Mole fraction of ligand i

(a) Variables refer to number of molecules as written. When enclosed in square brackets, they refer to concentrations.
(b) Intrinsic affinities and rates refer to the monovalent ligand binding to monovalent target in solution [15].

Fig. 2. Enrichment (defined as the change in mole fraction of the best ligand divided by the change in mole fraction of the entire library) as a function of target concentration for varying degrees of background binding. With no background, enrichment is largest and constant until a threshold target concentration. With background, there is a target concentration at which enrichment is optimal. As background increases, this optimal concentration increases and the optimal target enrichment decreases. (From Ref. 20.)

SELEXION provides a means to estimate this optimal target concentration under the conditions where the approximate equilibrium solution is valid (concentration of unbound target is small compared to the total amount of target). They calculate the optimal target concentration in terms of the bulk dissociation constant 〈Kd〉, the ratio BG/CP and the dissociation constant of the best ligand Kd1:

(5)

where [ligand] is the total concentration of ligand. The enrichment for this target concentration is also easily calculated:

(6)
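The tradeoff behind the optimal target concentration can be explored with a deliberately simplified sketch. It assumes that the free target concentration is the control variable, that a species with dissociation constant Kd is bound to the fraction T/(T + Kd), and that the library as a whole behaves like a single species with Kd equal to the bulk value. These are assumptions of the sketch, not necessarily the form of the SELEXION expressions, and the function names and numbers are hypothetical.

```python
import math

def enrichment_best(t_free, kd_best, kd_bulk, cp=0.8, bg=0.001):
    """One-round enrichment of the best ligand at a given free target concentration."""
    def retained(kd):
        bound = t_free / (t_free + kd)          # fraction bound at equilibrium
        return cp * bound + bg * (1.0 - bound)  # fraction kept on the filter
    return retained(kd_best) / retained(kd_bulk)

def scan_optimum(kd_best, kd_bulk, cp=0.8, bg=0.001, points=2001):
    """Log-spaced scan of free target from 1e-14 to 1e-4 M; returns (best T, best E)."""
    best_t, best_e = None, 0.0
    for k in range(points):
        t = 10.0 ** (-14.0 + 10.0 * k / (points - 1))
        e = enrichment_best(t, kd_best, kd_bulk, cp, bg)
        if e > best_e:
            best_t, best_e = t, e
    return best_t, best_e

kd_best, kd_bulk = 1e-9, 1e-6  # hypothetical best-ligand and bulk dissociation constants
t_best, e_best = scan_optimum(kd_best, kd_bulk)
print(f"peak near T = {t_best:.2e} M, enrichment = {e_best:.0f}")

# With no background, enrichment is highest at the lowest target concentration.
print(enrichment_best(1e-14, kd_best, kd_bulk, bg=0.0),
      enrichment_best(1e-6, kd_best, kd_bulk, bg=0.0))
```

Within this simplified model, setting the derivative of the enrichment to zero places the peak at a free target concentration of sqrt((BG/CP) 〈Kd〉 Kd1), and the peak enrichment stays below both 〈Kd〉/Kd1 and CP/BG. The numerical scan reproduces both features, along with the qualitative behavior described for the cases with and without background.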

Equations 5 and 6 show that it is the affinity of the best ligand in comparison to the bulk affinity (1/〈Kd〉) of the library that is important, not the actual affinity of this ligand. Similarly, the ratio of correct and background partitioning (CP/BG) is more important than their absolute values. For example, from Eq. 6, as long as CP/BG ≥ 1 (true in general) and 〈Kd〉/Kd1 ≥ 1 (true by definition), enrichment per round is bounded by these ratios:

(7)

The value of reducing background partitioning is evident from a plot of Eq. 6 (Fig. 3). Enrichment for a given ratio of 〈Kd〉/Kd1 is greatly increased as background partitioning is reduced. Their simulations also show that the number of rounds necessary until the best ligand has a mole fraction of at least 1/2 decreases as 〈Kd〉/Kd1 increases. When there is no estimate for Kd1, they calculate a target concentration that minimizes the number of rounds needed until a mole fraction of 1/2 for the realistic range of 4 ≤ 〈Kd〉/Kd1 ≤ 10^6. When there is also no estimate for CP/BG, they derive a similar formula over both the same 〈Kd〉/Kd1 range and the realistic range of 0.001 ≤ BG/CP ≤ 0.1. Because the coefficients in the formulae for these concentrations are determined by simulations, their general applicability is unclear. Unfortunately, these estimates degrade in performance as the population size increases, though the simulations in Ref. 20 are insufficient to determine the extent of this degradation.

Stochastic equilibrium

The stochastic equilibrium model in SELEXION is based on the probability that one copy of a particular ligand will bind target. This probability is approximated by dividing that ligand’s equilibrium bound concentration, as calculated from the deterministic model, by its initial concentration. By assuming that each species binds independently, a fair assumption which is essentially true under conditions of target excess, Irvine et al. calculate the target concentration for which ligands of affinity Ka have a specified probability of binding target. The utility of this calculation is that it provides a quantitative means to set selection stringency based on the competing demands of rapid enrichment and low probability of losing the best ligands.
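The stochastic argument can be illustrated with a small sketch. It assumes target excess, so that the bound fraction of a ligand with dissociation constant Kd is T/(T + Kd) and serves as the per-copy binding probability, and it assumes that copies bind independently. Partitioning losses are ignored, and the function names and numbers are hypothetical rather than taken from Ref. 20.

```python
def p_bind(t_free, kd):
    """Per-copy probability of binding target under target excess."""
    return t_free / (t_free + kd)

def p_survive(t_free, kd, copies):
    """Probability that at least one of `copies` independent copies binds target."""
    return 1.0 - (1.0 - p_bind(t_free, kd)) ** copies

def target_for_survival(p_target, kd, copies):
    """Free target concentration giving survival probability p_target."""
    p_single = 1.0 - (1.0 - p_target) ** (1.0 / copies)
    return kd * p_single / (1.0 - p_single)

kd_best = 1e-9  # hypothetical best-ligand dissociation constant (M)
for copies in (1, 2, 10):
    t = target_for_survival(0.95, kd_best, copies)
    print(f"{copies:2d} copies: T = {t:.2e} M gives P(survive) = {p_survive(t, kd_best, copies):.2f}")
```

The fewer copies of the best ligand are present, the more target is needed to keep it, which is the pressure toward low stringency in early rounds.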
Selection stringency, also called selection strength, reflects the degree to which an affinity difference induces differential enrichment. Stringency is most commonly increased by lowering target concentration, though other means, such as

Fig. 3. Effect of background partitioning on maximal enrichment of the best ligand in the library (enrichment defined as in Fig. 2). Enrichment increases with increasing 〈 Kd〉/ Kd1 and with decreasing BG/CP. (From Ref. 20.)


B. Levitan

increasing wash time, are also effective. High target concentrations (low stringency) ensure the survival of the best ligand, but at the cost of reducing enrichment and requiring many rounds for good selection (Fig. 2). Low target concentrations (high stringency), such as that defined in Eq. 5, yield high enrichment, but also risk losing some of the best ligands due to their initially very low concentrations. Laboratories have various schedules for increasing stringency. Irvine et al. suggest a two-tiered stringency schedule, in which the initial rounds are set for high probability of the best ligand surviving and the later rounds are set for high enrichment. They derive expressions dependent on the values for CP/BG, 〈Kd〉 and Kd1 for the number of rounds at which this change should take place. The number of high probability rounds needed drops rapidly as the difference between the affinity of the best ligand and the bulk affinity of the population increases. These expressions for the number of high probability rounds are based on the assumption that, for all these rounds, the concentration of unbound target is small compared to the total amount of target. This assumption is certainly true for the first round, but is increasingly suspect with each round as the concentrations of high affinity ligands increase. The error is such that the best ligand’s enrichment is increasingly overestimated in successive rounds. When applying these expressions to the laboratory, it may be prudent to reformulate them using an exact numerical solution for the equilibrium. These reformulations can be made using the equilibrium and washing solutions developed in the Levitan/Kauffman model discussed below.

Mandecki et al. model of phage display

Two important limitations in the SELEXION study are that (i) the model does not include washing and off-rates for the loss of bound ligands; and (ii) many results are based on ad hoc initial library affinity distributions and unrealistically small libraries. Mandecki et al. developed a deterministic model of phage display that includes washing and a more realistic affinity distribution p(Ka) [21]. They show that incorporating washing is critical for understanding the strength of selection. The model consists of three stages: setting the affinity distribution, equilibrium biopanning and dissociative biopanning (Fig. 1b).

Affinity distribution

Ligands are either grouped into m affinity classes with distribution p(Ka,i), i = 1 ... m, or viewed as drawn from a continuous distribution p(Ka). For most methods of phage display, the peptide sequence is inserted in the pIII minor coat protein, of which there are likely five copies at one end of the phage [23]. To avoid the complexities and approximations associated with multivalent binding, Mandecki et al. view the association constants in the model as ‘apparent binding constants’, indicating the avidity of the phage for the target. As discussed below and in the Levitan/Kauffman model, this approximation can give results quite distinct from those generated with a formal model of multivalent binding. For illustrative purposes, Mandecki et al. assumed an order of magnitude decrease in p(Ka) for each order of magnitude increase of Ka. This approximation is a reasonable assumption for intrinsic affinity, the affinity observed for a ligand monovalently binding monovalent target in solution [15]. However, if these values are interpreted as avidity, it is less realistic. To a first approximation, bivalent affinity is proportional to intrinsic


affinity squared, though in practice it is generally just a few orders of magnitude higher than the intrinsic affinity. This difference results in a much steeper decline in avidity probability p(Ka) than for affinity. Additionally, avidity has been shown to be a function of free receptor concentration [24]. The consequence is that the avidity for species i would change each round. Fortunately, Mandecki et al. only use this affinity distribution as a simple example to provide some sense of how the distribution changes over rounds, so these issues do not lessen the qualitative results gleaned from this work.

Equilibrium

Mandecki et al. use conservation of ligand and receptor equations and solution equilibrium mass-action equations to find the bound concentrations of each type of phage. They do not solve these equations explicitly, but solve for [LBT(Ka)], the concentration of ligand with affinity Ka bound to target after equilibrium, as a function of [Tfree], the concentration of free (unbound) target. However, even without an explicit solution, these results are useful for cases when [Tfree] can be approximated, such as when the target is in vast excess over phage or in the initial round or two, when most phage do not bind. For nonspecific binding, they assume a fraction a of input phage binds nonspecifically before equilibrium, independent of Ka and constant during all rounds (a is on the order of 0.05). The remainder of the phage are available for the equilibrium.

Dissociative biopanning

Dissociative biopanning is the preferential retention of ligands with slower off-rates during washing of solid support after equilibrium. The ‘partitioning’ step in SELEXION can be more generally considered as a combination of the washing and elution steps used in many laboratory protocols.
For specific binding, they model dissociation using the first-order kinetics of monovalent binding, such that

$$L_{BT\,i}(t_{wash}) = L_{BT\,i}(0)\, e^{-k_{off,i}\, t_{wash}}.$$

For nonspecific binding, they assume a fraction d of all nonspecifically bound phage are removed each wash (each addition and removal of wash buffer). d^w is the fraction of nonspecifically bound phage remaining after w washes.

Enrichment

They define enrichment function h(Ka) as the concentration of ligands with affinity Ka after selection divided by the corresponding concentration before selection. Note that this definition for enrichment is different from that used in SELEXION. For equilibrium without washing, enrichment as a function of unbound target is given by Eq. 8, where b is a normalization constant. The enrichment function when an equilibrium step is followed by a washing step is given by Eq. 9. Assuming that the amount of free target and the fraction of nonspecifically bound phage remain the same each round, the enrichment over r rounds of biopanning is (h(Ka))^r.

Mandecki model results

Equilibrium biopanning

The effect of the equilibrium is evident from plots of Eq. 8 (Fig. 4). This plot is based on Fig. 1 in Ref. 21, but I have included the background binding that was omitted from that figure. For a given [Tfree], enrichment is essentially a + b for Ka above a threshold and essentially a for Ka below another threshold. For


intermediate Ka, enrichment is proportional to log(Ka). Smaller amounts of free target after the equilibrium shift the logarithmic portion of the curve towards higher affinities, reflecting the increased selection pressure that results from smaller target concentrations. In the first round, since the library consists mostly of low affinity ligands, most target molecules will remain unbound, allowing the same approximation as used in SELEXION; that is, [Tfree] ≈ [Ttot]. This approximation provides a means to set the target concentration for the first round of an experiment. The logarithmic portion of the equilibrium enrichment curve is centered at Ka = (b – a)/[[Tfree](a + b)]. Thus, to retain ligands with affinities above Kmin in the first round, set [Ttot] = (b – a)/[Kmin(a + b)]. Unfortunately, good estimates for a and especially b are likely rare. However, if the background binding is negligible (a ≈ 0) or very low compared to specific binding (a << b), the appropriate target concentration reduces to [Ttot] = 1/Kmin. In later rounds, much more target is bound, [Tfree] < [Ttot], the logarithmic region shifts to higher affinities, and these settings for [Ttot] are not valid. This approach to setting target concentration does not incorporate the probability of surviving selection used to design a stringency schedule in SELEXION. However, since the upper threshold is not the affinity of the best ligand, but simply a threshold, ligands with higher affinities will have very high probabilities of binding. Because selecting ligands with Ka above a threshold is a less stringent performance criterion than selecting the best ligand, the target concentrations calculated here are less rigidly defined than those used for the SELEXION calculations. While the dependence of enrichment on [Ttot] in this model appears different from that in SELEXION, this difference is only an artifact of the different definitions used for enrichment: SELEXION uses a ratio of mole fractions and Mandecki et al. use a ratio of concentrations. With the same definition, the results are similar.

Fig. 4. Equilibrium enrichment function from Mandecki et al.’s model of phage display for different target concentrations (Eq. 8). Curves show enrichment over one round (defined as the ratio of concentrations for a ligand before and after equilibrium) as a function of affinity for different amounts of unbound target. Washing is not included in calculating these curves. As a greater fraction of the target is bound by phage, enrichment at any affinity is reduced and the logarithmic portion of the curve shifts towards higher affinities, reflecting the increased selection pressure that results from lower [Ttot]. The lower plateau is due to nonspecific binding of 5% (a = 0.05, see text) of the input phage. The upper plateau is the sum of the nonspecifically binding phage plus specifically binding phage (a + b in the model, where b is a normalizing constant). Because the equilibrium is not solved explicitly, b is set to maintain a constant total number of phage specifically bound (0.2 in these simulations).

Dissociative biopanning

The case of pure equilibrium selection is not realistic, as there must be some washing step to remove unbound ligands. Mandecki et al. also consider the more general case of selection with both equilibrium and washing. They show that the degree of background binding versus the amount of target binding is a critical determinant of the rate of enrichment, a result also seen in SELEXION (Eqs. 6 and 7). Figure 5 demonstrates this point. Three cases are shown: (i) a library where the pIII molecules of most phage have the insert (and thus are able to bind target) with a wash time of 1000 s; (ii) a library where most phage are lacking the insert and do not bind target (such as can occur with single-chain antibody libraries) with a wash time of 1000 s; and (iii) same as case (i) but with a wash time of 30 s. With a long wash and relatively little influence from background binding, the enrichment function approaches a step-function shape, changing by several orders of magnitude over a small range of Ka (Fig. 5a). For shorter washes or more background binding, the enrichment function changes more slowly (Fig. 5b). For long wash times, the wash step has the predominant influence on selection, as has been observed experimentally [25,26]. For short wash times, the equilibrium step is responsible for much more of the shape of the enrichment function. The difference in selection due to background binding is very apparent from the changes in the affinity distribution over rounds (Figs. 5c and d). For case (i), the vast majority of the library consists of high affinity ligands after the third round. In contrast, for case (ii), low affinity molecules still dominate the library after the third round. Case (iii) looks much like case (i).
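Two of the closed-form expressions quoted for the Mandecki et al. model lend themselves to a short numerical sketch: the first-round target concentration [Ttot] = (b – a)/[Kmin(a + b)] and the first-order retention of specifically bound phage during washing. The function names and the off-rate used in the usage note are illustrative assumptions, not values from Ref. 21.

```python
import math


def first_round_target_conc(k_min: float, a: float = 0.0, b: float = 1.0) -> float:
    """[Ttot] = (b - a) / (Kmin * (a + b)), valid only while [Tfree] ~ [Ttot].

    k_min is the lowest association constant (M^-1) to be retained; a and b
    are the nonspecific fraction and normalization constant from the text.
    With a = 0 the result reduces to 1 / Kmin.
    """
    if k_min <= 0.0 or b <= a or a < 0.0:
        raise ValueError("require k_min > 0 and b > a >= 0")
    return (b - a) / (k_min * (a + b))


def fraction_retained(k_off: float, t_wash: float) -> float:
    """First-order decay: fraction of specifically bound phage left after a wash.

    k_off is in s^-1 and t_wash in s; returns exp(-k_off * t_wash).
    """
    if k_off < 0.0 or t_wash < 0.0:
        raise ValueError("k_off and t_wash must be non-negative")
    return math.exp(-k_off * t_wash)
```

For a hypothetical off-rate of 10⁻³ s⁻¹, `fraction_retained` gives about 0.97 after a 30 s wash and about 0.37 after 1000 s, which illustrates why long washes discriminate so sharply between off-rates.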
For the range of parameters considered, higher background binding is much more harmful to phage display than having short wash times. An important concern with this model is that the equilibrium is not solved explicitly. This limits the utility of the enrichment function to those cases when the free target concentration can be approximated. The plots in Fig. 5 are made under the assumption that the amount of free target [Tfree], the fraction of nonspecifically bound phage a, and the normalization parameter b remain the same each round. These assumptions are increasingly invalid with each round as the average affinity of the library increases. A second difficulty that results from having no explicit solution is that it encouraged including the ad hoc normalization parameter b. Selecting values of b is somewhat arbitrary. For comparative purposes, a + b was kept constant in different simulations. However, it may not be the case that more phage binding target (higher b) means fewer phage nonspecifically binding (lower a). The additional phage that bound could come from the unbound phage, and maintaining constant a + b may be misleading. While these results are said to apply both to solution-phase binding (target and phage both in solution) and solid-phase binding (target attached to a microtiter well or paramagnetic beads and phage in solution), the mathematical development is completely for solution phase. Affinity and rate constants are known to differ considerably between these two phases and can result in considerable differences in simulation results. This issue is addressed in the Levitan/Kauffman model discussed below.

Levitan/Kauffman model of phage display

Building on Irvine et al.’s and Mandecki et al.’s work, Kauffman and I have developed


Fig. 5. Effects of background binding and washing time on enrichment functions and affinity distributions in Ref. 21. (a) and (b) show the enrichment function for equilibrium alone, washing alone, and their combined effect. (c) and (d) show the library affinity distributions in the initial library and over the first three rounds. [Ttot] = 10⁻⁸ M, d = 0.4, w = 10 washes. (a) Case (i): Most phage display the peptide, background binding is low, wash time is 1000 s (a = 0.05, b = 0.2). Here the washing dominates selection. (b) Case (iii): Same as in (a) but with a wash time of 30 s. Here equilibrium and washing both contribute to selection. (c) Case (i): Here biopanning works well, with the high affinity phage dominating the population after the third round. (d) Case (ii): A very small percentage (on the order of 0.001–0.1%) of phage display the peptide (a precise percentage cannot be defined using this model), background binding is high, wash time is 1000 s (a = 0.24, b = 0.01). Here affinity selection works poorly, with background phage dominating the population even after the third round.

a very detailed stochastic model of phage display [27,28]. The model is specifically designed to study the influence of the stochastic nature of each laboratory step, yet is amenable to analytic results and very rapid computer simulation. The model is shown in Fig. 1c.


Library design

Phage target affinities are drawn at random from the RAD p(Ka) model (see above) [16]. These are the intrinsic affinities, those observed for a peptide monovalently binding target in solution [15]. They are converted into ‘statistical’ affinities for multivalent binding in the multivalent mass-action equations. Affinity towards nontarget surfaces (which we generically call ‘plastic’, though they may include surfaces other than a polystyrene well) is set at a constant affinity for all phage.

Equilibrium

The competitive equilibrium is solved using the mass-action equations for n ligands binding to m targets, with different affinities between each ligand and each target. We consider the specific case of binding to an intended target and to the plastic (m = 2). The model allows for both monovalent and bivalent phage and for either surface or solution binding. In this review, I focus primarily on the surface binding, bivalent case.
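As a rough illustration of what solving a competitive equilibrium numerically involves, the sketch below treats a deliberately simplified case: n monovalent ligands competing for a single target in solution (m = 1), with no plastic binding and no bivalency. It is an assumption-laden stand-in, not the Levitan/Kauffman equations. Conservation of target gives [Ttot] = [Tfree] + Σ Ka,i [Li,tot][Tfree]/(1 + Ka,i[Tfree]), whose left-hand side grows monotonically with [Tfree], so bisection suffices. The function names are hypothetical.

```python
from typing import List, Sequence


def free_target(t_tot: float, ligand_totals: Sequence[float],
                kas: Sequence[float], rel_tol: float = 1e-12) -> float:
    """Free target concentration for n monovalent ligands and one target.

    Solves t + sum_i ka_i * L_i * t / (1 + ka_i * t) = t_tot for t in
    [0, t_tot] by bisection (the left side increases monotonically in t).
    """
    def total_target(t: float) -> float:
        bound = sum(ka * l * t / (1.0 + ka * t)
                    for ka, l in zip(kas, ligand_totals))
        return t + bound

    lo, hi = 0.0, t_tot
    for _ in range(200):  # far more halvings than double precision needs
        mid = 0.5 * (lo + hi)
        if total_target(mid) > t_tot:
            hi = mid
        else:
            lo = mid
        if hi - lo <= rel_tol * t_tot:
            break
    return 0.5 * (lo + hi)


def bound_fractions(t_free: float, kas: Sequence[float]) -> List[float]:
    """Fraction of each ligand species bound to target at equilibrium."""
    return [ka * t_free / (1.0 + ka * t_free) for ka in kas]
```

The bound fractions returned here play the role of the per-phage binding probabilities described in the text, which in the full model feed a multinomial draw over several binding states.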


Bivalent binding is described using the appropriate mass-action equations [24,29,30]. For binding in solution, the first and second interactions are both characterized by the intrinsic association constant and the intrinsic on- and off-rates. This is a reasonable assumption, as the pIII molecules on a filamentous phage are joined only at their base, distant from the peptide insert, and likely bind independently. For binding to targets immobilized on a surface, we use the ‘ad hoc’ surface binding model, in which the first association constant is unchanged, but the first interaction on- and off-rates are replaced with effective rate constants that incorporate both diffusion of phage to a surface and binding to immobilized target [29,30]. The second interaction is characterized by a cross-linking association constant and on and off cross-linking rates that take into account the intrinsic affinity, diffusion, and the limited mobility of a monovalently bound bivalent ligand [24,31]. We approximate the phage as a long, prolate spheroid to calculate its diffusion constant [32], which we find is about 2 × 10⁻⁷ cm²/s. Ka,x, the cross-linking association constant, can be approximated as Ka/d, where d is the distance between binding sites on a ligand [31]. For phage, d is on the order of 10 nm [33]. For a third interaction between phage and the surface, steric constraints will likely be the overwhelming determinant for the binding constants. These steric constraints also make trivalent binding less likely than bivalent binding. Additionally, proteolysis removes some of the pIII molecules, decreasing valency in some stochastic manner. For these reasons, we do not consider a binding valency greater than 2. This is just as well, as there is no simple means to estimate the higher valency cross-linking rates. We assume no phage cross-links both a target molecule and the plastic surface, as the targets are elevated off the surface.
The equilibrium equations can be solved exactly under the assumption that the free concentrations of target and plastic binding sites after equilibrium are negligible in comparison with their total concentrations. When this is not the case, the equations can be solved numerically. Once the deterministic solution to the equilibrium equations is found, probabilities for phage binding monovalently or bivalently, to target or plastic, or to remain free are found by dividing the respective concentrations by the initial concentration. These probabilities are used in a multinomial probability distribution to simulate equilibrium. One concern with this approach is that nonspecific binding may not be caused by an equilibrium between ligands and the nontarget surfaces. Background binding may result from flow or mechanical causes, as has been suggested by Mandecki et al. [21].

Washing

We solve the nonequilibrium situation for washing by using the differential form of the mass-action equations. Under washing conditions (free phage concentration is zero) and with the excess target and plastic approximations, these equations can be solved exactly and allow calculating the amount of phage left in a well. These deterministic results are converted into probabilities for use in a trinomial distribution to probabilistically simulate washing, giving the number of phage of each type that remain bound singly, remain bound doubly or are washed off.

Elution

The elution model is an extension of that used in SELEXION [20]. Elution is considered to be sequence independent, but with different properties for each target. Phage monovalently bound to target have probability CP of being eluted. Phage bivalently bound to target are eluted with probability (CP)², reflecting the increased difficulty of removing bivalently bound phage. Similarly, phage bound to plastic monovalently and bivalently have


elution probabilities of BG and (BG)², respectively. These probabilities are used in a binomial distribution for the number of phage eluted versus the number that remain bound.

Amplification and renormalization

Amplification and renormalization are combined in the simplified form used in SELEXION. Each phage surviving elution is scaled the same amount, yielding a renormalized library of the same size as the original library, but with the mole fractions that resulted from elution.

Sampling

The results of an experiment are determined by the small number of phage (typically 20–200) that are sampled from the eluted phage. Because this sample is such a minuscule fraction of the library, we include a probabilistic model of sampling in the model.
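The elution rule described above (CP or BG for monovalently bound phage, squared for bivalently bound phage, followed by a binomial draw) can be sketched as follows. The function names are hypothetical, the default CP and BG values are those quoted in the Fig. 7 caption, and the binomial draw is written as a sum of Bernoulli trials purely to stay within the Python standard library.

```python
import random


def elution_probability(surface: str, valency: int,
                        cp: float = 0.9, bg: float = 0.01) -> float:
    """Elution probability for one bound phage.

    surface is "target" (base probability CP) or "plastic" (base BG);
    valency 1 uses the base probability, valency 2 uses its square.
    """
    base = {"target": cp, "plastic": bg}[surface]
    if valency not in (1, 2):
        raise ValueError("only monovalent and bivalent binding are modeled")
    return base ** valency


def simulate_elution(n_bound: int, p_elute: float, rng: random.Random) -> int:
    """Binomial draw: number of the n_bound phage that are eluted."""
    return sum(1 for _ in range(n_bound) if rng.random() < p_elute)
```

With these defaults, bivalently target-bound phage elute with probability 0.81 and bivalently plastic-bound phage with probability 10⁻⁴, so squaring the base probability penalizes background phage far more than target-bound phage.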

Fig. 6. Effect of increasing stringency in the Levitan/Kauffman phage display model. Curves show the affinity distribution at screening rounds 0–4. The initial distribution, ‘round 0’, is that given by the RAD model [16]. (a) Constant stringency; (b) increasing stringency each generation, in which $[T^{tot}_{round\ i}] = 0.25\,[T^{tot}_{round\ i-1}]$.



Levitan/Kauffman model results

Effect of an increasing stringency strategy

The effect of different schedules for increasing selection stringency can be seen in affinity distribution changes over rounds. In Fig. 6a, the average affinity distribution is shown for the initial library and the first four rounds for the case of constant stringency. The initial library (round 0) follows the RAD distribution [16]. Subsequent rounds show decreases in low affinity phage mole fractions and increases in those of high affinity phage. The peak of the distribution after the first round is at about 2 × 10⁶ M⁻¹. The mole fractions of the highest affinity phage are not the peak for several reasons: (i) The extremely low concentrations of the highest affinity bins induce a transient disadvantage for such bins. This transient can last for several rounds when the selection stringency is low. (ii) These low concentrations increase the probability that some of the best few ligands will be lost from the library, reducing the average mole fraction of these bins. (iii) While the competitive advantage of the best few phage over the majority of the library is very large, their competitive advantage over phage with affinities 1 or 2 orders of magnitude lower is much smaller. It is at these affinities where the peak is located. The average results for simulations with an increasing stringency strategy are shown in Fig. 6b. The target concentration decreases by 75% each round. By the second round, the high affinity peak is at a higher affinity than in the constant stringency case. This peak also migrates to higher affinities with each round. However, the low affinity phage occupy much more of the library than in the constant stringency case, since lowering target concentration reduces the number of phage that bind target but does not influence the number of phage that bind nonspecifically.
For example, the round 4 low affinity phage mole fractions are comparable to those in round 3 of the constant stringency case. Thus, while an increasing stringency strategy allows higher affinity ligands to be found, more rounds will be needed to find them with the same probability. If the stringency increase is too rapid, most high affinity phage are lost. The optimal rate of stringency increase depends on the performance measure, the details of which are a topic of our current research.

Probabilities of binding, surviving washing and selection

In one round of selection, each phage has only two possible destinies: it can be retained after elution (selected), or it can be lost from the library. The probabilities associated with all the different paths towards elution in Fig. 1c can be collapsed into $p_i^{sel}$, the probability that a single phage of type i is selected. The equilibrium, washing and elution steps composing selection are mathematically equivalent to a single random process acting on the library, modeled with a binomial distribution. With this formalism, the probability distribution for the number of phage of type i selected is given by

$$p(L_i^{elu}) = \binom{L_i^{tot}}{L_i^{elu}} \left(p_i^{sel}\right)^{L_i^{elu}} \left(1 - p_i^{sel}\right)^{L_i^{tot} - L_i^{elu}} \qquad (10)$$

where the ligand variables refer to the number of phage and not their concentrations and $\binom{L_i^{tot}}{L_i^{elu}} = L_i^{tot}!/[L_i^{elu}!\,(L_i^{tot} - L_i^{elu})!]$ is the binomial coefficient. $p_i^{sel}$ can be expressed in terms of the probabilities of binding target monovalently or bivalently, the probabilities of binding nonspecifically monovalently or bivalently, CP, BG, and the probabilities that phage bound to target or plastic (monovalently or bivalently) remain bound after washing, all of which are derivable from the model.
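The binomial formalism above is easy to evaluate directly. The sketch below is illustrative only: it takes the per-phage selection probability as a given input (in the model it is derived from the binding, washing and elution steps), and the function names are hypothetical. The chance that at least one copy of a species is retained is one minus the zero-copy term of the binomial.

```python
from math import comb


def selection_pmf(l_tot: int, l_sel: int, p_sel: float) -> float:
    """Binomial probability that exactly l_sel of l_tot copies are selected."""
    if not 0 <= l_sel <= l_tot:
        raise ValueError("l_sel must lie between 0 and l_tot")
    return comb(l_tot, l_sel) * p_sel ** l_sel * (1.0 - p_sel) ** (l_tot - l_sel)


def survival_probability(l_tot: int, p_sel: float) -> float:
    """Probability that at least one of l_tot copies survives selection."""
    return 1.0 - selection_pmf(l_tot, 0, p_sel)
```

For a species present as a single copy the survival probability is simply the per-phage selection probability, which is why very low initial copy numbers make loss of the best ligands a real risk in early rounds.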


This binomial provides a simple means to consider the initial very low concentrations of ligands and the stochastic aspects of selection. For example, the probability that any ligand i will survive selection can be expressed:

$$p(L_i^{elu} > 0) = 1 - \left(1 - p_i^{sel}\right)^{L_i^{tot}} \qquad (11)$$

While the binding of one ligand depends on the degree to which the other ligands bind, this information is all collapsed within the definition of $p_i^{sel}$. The binomial formalism makes the binding of different ligands appear independent. The probability that two ligands i and j both survive selection is given by

$$p(L_i^{elu} > 0,\ L_j^{elu} > 0) = \left[1 - \left(1 - p_i^{sel}\right)^{L_i^{tot}}\right]\left[1 - \left(1 - p_j^{sel}\right)^{L_j^{tot}}\right] \qquad (12)$$

The probabilities for more complex ligand survival questions can be formulated similarly. Similar binomial distributions characterize the number of phage of type i that bind and the number that stay bound after washing. Probabilities for phage bivalently binding target are shown in Fig. 7a. These probabilities are analogous to Mandecki et al.’s enrichment function, shown in Figs. 4 and 5, but without background binding. Higher target concentrations increase the probability of binding for a given affinity phage. Probabilities for target-bound phage remaining bound after washing are shown in Fig. 7b. Longer wash times shift the steep portion of the curve towards higher affinities, reflecting the increased affinity needed to maintain a bond during a longer wash. The slope of the washing curve with surface-bound, bivalent phage is much steeper than that for binding, similar to the monovalent, solution-phase case shown by Mandecki et al. (Fig. 5). However, the steep region of the bivalent wash curves is at much lower affinities than that of the corresponding solution-phase, monovalent curves (Fig. 7b). The difference in location of the steep portion occurs for two reasons: (i) Bivalently bound phage are harder to wash off than monovalently bound phage.
(ii) The ad hoc model for surface binding shows that, as more and more ligands are dislodged from a receptor-filled surface, the rate at which ligands leave the surface is reduced. The just-released ligands either diffuse from the surface or rebind, and the likelihood of rebinding increases with increased free receptor concentration on the surface [29]. The solution-phase mathematics used for the monovalent case ignores this issue. This difference in location of the steep portion is quite significant. For realistic target concentrations and wash times, the bivalent wash probability is essentially 1 for any phage that bind, and washing removes few bound phage. The monovalent wash probability is significantly less than 1 for many phage that bind monovalently, and dominates the selection. Thus, our calculations and Mandecki et al.’s calculations indicate that the selection is mostly influenced by the equilibrium for multivalent binding and mostly influenced by washing for monovalent binding and long wash times. One implication for a mixture of monovalent and multivalent phage is that, over a certain range of affinities, multivalent ligands of some low affinity are more likely to be selected than monovalent ligands with a higher affinity. For these reasons, there is little visible difference between the binding probability curve (Fig. 7a) and the curve for probability of selection via binding to target (Fig. 7c), other than the slight loss of probability due to elution not releasing every target-bound phage. However, nonspecific background binding retains all phage at the same probability irrespective of their target affinities, resulting in a lower limit to the selection probability (Fig. 7d).

Fig. 7. Selection probabilities for the bivalent phage model with solid-phase equilibrium. The horizontal axis indicates intrinsic affinity of the displayed peptide to target. (a) Probability phage bind target for Ttot = 10⁸, 10¹⁰ and 10¹² target molecules. (b) Probability phage bound to target remain bound after washing for twash = 2, 20, 200 and 2000 s, Ttot = 10¹². The two curves marked ‘mono’ indicate the wash curves for monovalent binding in solution, plotted for comparison to other models discussed in this review. (c) Probability phage are selected via binding to target for twash = 200 s. (d) Probability phage are selected via binding either target or plastic for twash = 200 s. (10⁶ species of phage, CP = 0.9, BG = 0.01, Ptot = 10¹⁵, Ka to plastic = 10⁴ M⁻¹.)

Distribution of mole fractions

The binomial formulation can give a probability distribution for the mole fraction fi of the population that a phage of type i will occupy after one round of selection. Calculating a mole fraction needs the total number of selected phage. Since this total is the sum of n binomially distributed random variables, for n


large, as is the case in molecular diversity experiments, the central limit theorem shows that the total can be well approximated by a normal distribution whose mean $\mu_{sel}$ is the sum of the n means and whose variance $\sigma_{sel}^2$ is the sum of the n variances:

$$\mu_{sel} = \sum_{i=1}^{n} L_i^{tot}\, p_i^{sel}, \qquad \sigma_{sel}^2 = \sum_{i=1}^{n} L_i^{tot}\, p_i^{sel} \left(1 - p_i^{sel}\right) \qquad (13)$$

If there is only one copy of each type of phage before selection, p(fi) is of the form shown in Fig. 8a. Here, either the best ligand is not selected and its mole fraction is 0, or it is selected and its mole fraction is 1 divided by the total number of selected phage. p(fi) thus consists of a spike of value $p(L_i^{sel} = 0)$ at fi = 0 and a hump centered about $1/(1 + \mu_{sel})$, where here $\mu_{sel}$ does not include ligand i. When there are several copies of phage i, p(fi) consists



Fig. 8. Probability distributions for the mole fraction occupied by the ligand with highest affinity after one round of selection. An arrow indicates the probability for a mole fraction of 0. The rapid jumps to minute probabilities in the middle of the humps are caused by the tails of the adjacent humps and the discrete nature of the distribution. (a) One copy of each type of phage in the initial library. The best ligand is either not selected (mole fraction of 0) or selected (mole fraction of 1 divided by the total number of selected phage). (b) Three copies of each type of phage in the initial library. The three humps correspond to the cases of one, two or three copies of the best ligand being selected. (n = 100 species, CP = 0.9, BG = 0.1.)

of a peak at 0 and humps centered about all possible values of $L_i^{sel}$ divided by $L_i^{sel} + \mu_{sel}$ (again $\mu_{sel}$ not including ligand i) (Fig. 8b). When there are hundreds to thousands of copies of each phage, the humps merge into a more complex distribution.

Effect of sample size

A useful performance measure for a search strategy is the number of ligands that have to be sampled and sequenced to achieve a given goal. Sample size is an important performance measure, as plating out individual phage and sequencing


their inserted DNA can be costly and time-consuming for large samples. For example, a strategy that gives a 95% chance of including at least one copy of the best ligand in just 20 samples after the third round is superior to a strategy requiring 40 samples, all else being equal. Minimal sample sizes can be calculated given the mole fractions of the different species in a library. When the objective is to find the best phage in the initial library, the number of samples required for a given probability of finding that phage starts on the order of the size of the entire library and drops rapidly with each round (Fig. 9a). (Note: To avoid the confounding influence of loss of high affinity species in the first round, these simulations have 10 copies of each species in the initial library.) More often, however, the objective is to find a variety of sequences with high affinity, such as when searching for a consensus sequence or trying to deduce a motif. In this case, there is a risk of going too many rounds (Fig. 9b). What happens in this case is that, in the early rounds, the mole fractions of all the best few binders increase. As the rounds continue, the best binder starts to take over the population, and reduces the mole fractions of the second, third, etc. best phage. As a result, it is increasingly more difficult to include at least one copy of each of the best few ligands, and the minimal sample size increases. Eventually, some of the best few ligands are lost due to competition with the best ligand, and it becomes impossible to achieve the goal of the search. Two major determinants of the shapes of these ‘isoperformance’ curves are the initial target concentration and the selection stringency schedule. Figure 9c shows that more rapid drops in target concentration reduce the sample size needed to find the best ligand by a particular round. 
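For the single-ligand case, the minimal sample size follows from elementary probability. The sketch below assumes clones are drawn independently from a pool large enough that sampling does not change the mole fraction, so the chance of seeing at least one copy in n samples is 1 – (1 – f)^n. This is a simplification for illustration and not the sampling model used in the simulations; the function name is hypothetical, and the goal of finding several ligands at once is not covered.

```python
import math


def minimal_sample_size(mole_fraction: float, p_goal: float) -> int:
    """Smallest n such that 1 - (1 - f)**n >= p_goal.

    mole_fraction is the ligand's mole fraction f in the sampled pool;
    p_goal is the desired probability of sampling at least one copy.
    """
    if not 0.0 < mole_fraction <= 1.0:
        raise ValueError("mole_fraction must be in (0, 1]")
    if not 0.0 < p_goal < 1.0:
        raise ValueError("p_goal must lie strictly between 0 and 1")
    if mole_fraction == 1.0:
        return 1
    # log1p keeps precision when f or 1 - p_goal is very small
    return math.ceil(math.log1p(-p_goal) / math.log1p(-mole_fraction))
```

Under this assumption, a ligand at a mole fraction of 0.1 needs 29 samples for a 95% chance of being seen at least once, while one at 0.5 needs only 5, which shows how strongly the required sequencing effort depends on the enrichment achieved.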
Not shown in this curve is that, as stringency increases more rapidly, there eventually comes a rate beyond which the best ligand is lost. Much as described for the stochastic equilibrium model in SELEXION, there is a schedule for increasing stringency that gives the optimal probability of retaining the best ligand. When searching for several high affinity sequences, increasing the stringency rate also helps by shifting to earlier rounds the point at which the minimal number of samples is needed (Fig. 9d). However, higher stringency rates also bring the rapid increase in minimal sample size to earlier rounds, making it more critical to sample the library in the correct round. We are currently investigating whether these generic results can be solved for a particular molecular design problem.

Effect of initial copy number

For large sequence spaces, a library can only include a small fraction of all possible sequences. The maximally diverse library, the most likely type of library to result from random sampling of a large sequence space, has only one copy of each sequence. Having only one copy is a major source of noise and error in the selection. For example, in all our simulations, increasing the initial copy number of all species from 1 drastically increases both the mole fraction of the best phage and the library’s average affinity in all rounds. However, given a library size, the initial copy number can be increased only at the cost of reducing the diversity of the library. Doubling the number of copies halves the number of samples from p(Ka), resulting in lower maximal affinities for the library. In this case, the relationship between a performance measure and initial copy number is more complex. Depending on the target concentration, the degree of background binding and the stringency schedule, the advantage can go either to greater diversity with small copy numbers


Fig. 9. ‘Isoperformance curves’ for sampling the best or the five best phage in the initial library as a function of round number. Curves show the number of samples needed to ensure probability p (p = 0.5, 0.9, 0.95, 0.99) of including at least one copy of the best phage or at least one copy each of the best five phage. The initial library consists of 10 copies each of $10^{6}$ species. All curves start with $T_{tot} = 10^{11}$ molecules. Curves (a) and (b) both have target concentration dropping by 75% (stringency factor g = 0.25) each round. (a) Sample size for finding the best phage with probability p. (b) Sample size for finding each of the top five phage with probability p. After round 6, at least one of the top five phage has been lost from the library, due to competition from the higher affinity phage. (c) Sample size for finding the best phage with probability p = 0.5 for various stringency increase factors g. (d) Sample size for finding each of the top five phage with probability p = 0.5 for various stringency factors g.

(Fig. 10a) or to greater copy numbers with less diversity (Fig. 10b). This issue of whether there is an advantage to increasing copy number at the cost of lessened diversity requires further investigation.


Conclusions from laboratory technique-based models

The laboratory technique-based models all demonstrate qualitatively similar results. The models can all be easily altered for other techniques of applied molecular evolution, and many of the results are quite general. The simulations provide a great deal of insight into why phage display, SELEX and related techniques seem to work so well. Despite the sources of noise and nonspecific binding, the higher affinity ligands almost always occupy a large fraction of the population in just 3–4 rounds. On the other hand, the studies also suggest that experiments can yield significantly better results (higher affinities, greater variety of high affinity ligands, etc.) if experimental parameters are set appropriately. The studies suggest that there are two critical determinants of enrichment (defined as


Fig. 10. Effects of higher initial library copy numbers and degree of background binding on library average affinity. Copy numbers are 1, 2, 3, 4 and 10 with a constant library size of $10^{6}$. Doubling the copy number halves the variety of species, resulting in fewer high affinity species in the initial library. While the larger copy number increases the probability that at least one copy of the best few species is selected, the lower affinities of these best few species reduce the probability that any individual copy will be selected. The optimal tradeoff depends on the degree of background binding. (a) Modest background binding. The best results are obtained for copy numbers of 1 or 2. (b) High background binding. The best results are obtained for a copy number of 10.

in Eq. 1) for a species with affinity Ka: (i) the ratio of Ka to some measure of the average affinity of the library (such as 1/〈Kd〉); and (ii) the degree of background binding compared to specific binding (such as defined with BG/CP). Proteolysis and changes in peptide conformation clearly influence enrichment but, since these processes are not included in any of the models, the relative magnitude of their effects is not clear. For solid-phase,

bivalent binding, under the wide range of conditions considered in the Levitan/Kauffman model, washing did not play a critical role in selection, even for twash = 2000 s. Using a solution-phase model and monovalent binding, Mandecki’s work shows that washing can be either the dominant influence on selection or only a contributor along with the equilibrium. I use the term ‘solution-phase model’ to emphasize that modeling washing with solution-phase mathematics may not make sense, as washing implies a solid support against which liquid is poured. Washing also has less influence on selection in solid-phase binding than in solution-phase binding, since the solid-phase cross-linking affinity constant is much higher than that for solution-phase binding and the off-rate decreases with increasing free receptor concentration, making ligands much more difficult to wash off [29]. Selection stringency can be viewed as the strength of the competition between the best ligands and the rest of the library. With any degree of background binding, there exists a target concentration that yields optimal enrichment for ligands of a given Ka. However, this (generally low) target concentration is at odds with the high target concentrations needed to ensure retention of high affinity ligands in the early rounds. Lower target concentrations reduce the number of ligands selected via target, but do not change the number of ligands nonspecifically bound. As a result, the mole fraction of ligands selected for their affinity to target is reduced (Fig. 6). These points suggest that it may be of greater benefit to increase stringency by the use of very long wash times than through lower target concentrations, provided the wash is long enough to shift the steep portion of the wash curve to high enough affinities (Fig. 7). Longer washes affect both target-bound and nonspecifically bound ligands. 
As a result, long washes will not drastically reduce the mole fraction of target-bound ligands but will still preferentially remove low affinity ligands relative to high affinity ligands.

The washing probabilities in Fig. 7b suggest that a more refined control of selection is possible by manipulating phage valencies. Current studies show there are likely five pIII molecules at the end of a filamentous phage, of which any number may have the displayed peptide [23]. With the use of phagemids or other techniques, there is some ability to control the valencies of the peptide. Higher valency allows for translating the transitional region to higher or lower affinities depending on the target concentration. More significantly, the higher the valency, the more the equilibrium and washing probability curves look like step functions. A step-like selection function gives exquisite control of selection pressure by increasing the ability to discern between low and high affinity ligands. Step-like selection functions are also very amenable to theoretical study [34].

A useful role for theory in determining how many rounds to perform is suggested by the sample size curves in Fig. 9. If the goal is to find the highest affinity ligand in the library, then the round in which to sample is dictated by the sample size. The larger the sample, the fewer the rounds needed. If the goal is to collect a variety of high affinity ligands, then the round in which to sample is determined by the minimum of the sample size curve. These plots also point out the danger of going too many rounds – the molecules one is interested in can be competed out of the population. Another risk of going too many rounds is described in Ref. 35, where the error in measuring affinities can cause the continual loss of the best ligands in the library.

Many of the simulation and empirical results described in these models can be approximated analytically. Some of the derivations in SELEXION are good examples, and the

Fig. 11. Average highest affinity (average first-order statistic) in the initial library as a function of library size. The affinity distribution p(Ka) is log-normal with mean $3.2 \times 10^{6}$ M$^{-1}$ and standard deviation $10^{7}$ (from Ref. 14). While the average affinity of the best ligand always increases with increasing library size, the incremental increase for one more library member decreases as the library is made larger. This diminishing return relates to the tradeoff in library size versus ligand copy number described in the Levitan/Kauffman model (see text).

binomial characterization of Levitan and Kauffman is potentially useful in this regard. However, the derivations all ultimately require such estimates as average affinity and the few highest affinities in a library of a given size. One important means to estimate unknown parameters is with order statistics [36]. The affinity distribution p(Ka) can be used to find the expected affinity of the best ligand, second-best ligand, and so on, in the initial library (Fig. 11). Such statistics are called order statistics, as they provide properties of values drawn from a probability distribution sorted in numerical order. It is evident from Fig. 11 that there are diminishing returns from increasing the library size, in that the incremental increase in the highest affinity becomes increasingly small for large libraries. The limitation in using order statistics is that p(Ka) is generally unknown, limiting this approach to those cases for which there are estimates of p(Ka). Once there is such an estimate, many of the results in SELEXION can be cast in terms of p(Ka), thereby removing the need to make guesses for Kd1 and 〈Kd〉. Of particular interest would be finding target concentrations for the stringency schedule strategies described in SELEXION without having to consider a range for Kd1/〈Kd〉 that spans several orders of magnitude. The sample sizes calculated in the Levitan/Kauffman model could also be calculated in terms of order statistics, as can many other useful properties considered in all three models.

Several issues stand out as important for better application of theoretical approaches to experiments: (1) Better comparison of surface-phase to solution-phase experiments. Solution-phase rates and constants are significantly different from those for solid phase. A model comparing these approaches can potentially decide which approach to use for a given experimental objective. (2) Determining the degree to which multivalency drives the results in phage display.

Useful quantitative results require a model to match the actual chemistry. Additionally, controlling peptide valency is potentially a tool to manipulate selection pressure. (3) Determining whether background binding is caused by an equilibrium with nontarget surfaces or by flow and mechanical causes. The correct mechanism is needed for a model to yield quantitatively useful results. (4) Use of affinity distribution order statistics to assist in estimating experimental parameters. This approach is dependent on having estimates for p(Ka) and error bounds on the resulting order statistics. (5) Searching for optimal stringency schedules. Two stringency schedules are considered in these models: a two-tier stringency strategy, in which low stringency rounds are followed by high stringency rounds (SELEXION), and exponentially decreasing target concentrations (Levitan/Kauffman). There is no a priori reason that these schedules are better than others. Finding more optimal schedules that are matched to the performance measure, either by analytic derivation or searching through a space of schedules, would improve experimental design. (6) Selection stringency in an experimental protocol can be controlled in several ways; for example, decreasing blocking so as to increase background binding, reducing wash times, increasing target concentration or adding new ligands to the library each round. A comparison between these methods would be useful.
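The order-statistics estimate discussed above (Fig. 11, item 4) can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the calculation behind Fig. 11. It assumes that the quoted mean and standard deviation are those of Ka itself (not of log Ka), and it reports the median of the best affinity in a library of N independent draws, obtained from the quantile at 0.5^(1/N), in place of the average plotted in the figure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, sd):
    """Convert the mean and standard deviation of Ka to the (mu, sigma) of ln Ka."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def median_best_affinity(library_size, mean=3.2e6, sd=1.0e7):
    """Median of the highest Ka among `library_size` independent log-normal draws.

    The maximum of N draws is below q with probability F(q)**N, so its median
    is the quantile of the parent distribution at 0.5**(1/N).
    """
    mu, sigma = lognormal_params(mean, sd)
    quantile_level = 0.5 ** (1.0 / library_size)
    z = NormalDist().inv_cdf(quantile_level)
    return math.exp(mu + sigma * z)


for n in (10**3, 10**4, 10**5, 10**6):
    print(f"N = {n:>8d}   median best Ka ~ {median_best_affinity(n):.3g} M^-1")
```

With these assumed parameters, each tenfold increase in library size raises the best affinity by a roughly constant factor, so the gain per added library member falls as the library grows. This is the diminishing return noted for Fig. 11.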


Fig. 12. Sequence spaces for the sequence/structure shown in the right column. (a) Sequence for ssRNA of length 3 with monomers G and A only. (b) Part of the sequence space for the 1,4-benzodiazepin-2-one derivative library constructed in Ref. 156. For clarity, not all neighbor relations are shown with lines in this figure. Points which differ at one site are called 1-mutant neighbors. For example, the points marked o are all 1-mutant neighbors of the point marked •.

Fitness landscape-based models and search strategies

The concept of fitness is defined in more detail below, but can generally be regarded as the molecular property of interest, such as affinity towards a given receptor. This section focuses on the three arenas in which fitness landscape structure has relevance to molecular diversity techniques: (i) library design, in which methods of biasing the initial library can be evaluated based on landscape structure; (ii) mutation and recombination, for which protocol parameters can be determined from landscape properties; and (iii) variant search strategies such as pooling, in which there is no mutation but directed expansion of a subset of the library towards a sequence subspace. For laboratory protocols using random, unbiased library design and no mutation, models of the type described above are completely appropriate.

Landscape models are much more abstract than the laboratory technique-based models. As extensive as theory about evolution and optimization on fitness landscapes has become, there is still little work on matching a search algorithm to landscape properties. Additionally, much of this work is based on landscape properties that are presently very difficult to measure with any statistical significance for molecular landscapes. For these reasons, and for reasons of limited space, the landscape search results will be explained in much less detail than the laboratory-based techniques. This section is divided into four parts: (i) definitions of terms used in fitness landscape studies and caveats about their misuse; (ii) review of models for fitness landscapes; (iii) results from studies of search on fitness landscapes; and (iv) conclusions from these results.

Definitions

Sequence space

The set of candidate solutions considered by a search procedure is often called the search space of the problem.
For molecular design problems, there are several possible search spaces, the most common being sequence space, the space of all molecules being considered [4,37–39]. The concept of a sequence space is important because it provides a framework for formal theory and it has heuristic value in developing intuition for searches and communicating ideas. Sequence spaces are discrete, though search spaces in general may be discrete, continuous, or discrete on some axes and continuous on others. Depending on the type of molecule being designed, different coordinate systems may prove most appropriate for the sequence space. For linear polymers such as peptides, oligonucleotides, oligosaccharides and unnatural biopolymer mimetics, the most obvious coordinate system is an N-dimensional discrete space with A points along each axis, where N is the length of the polymer and A the number of possible monomers at any site (Fig. 12a). For nonlinear molecules, one can define ‘special-purpose’ spaces. For example, 1,4-benzodiazepine libraries have been designed by substituting various functional groups at certain sites on a benzodiazepine scaffold [40,41]. Each substitution site defines a different axis, and the number of possible substitutions at a site is the number of points along that axis (Fig. 12b). Sequence spaces may also be defined based on encoded libraries. In such libraries, different molecular species are labeled with readable tags that record the sequence of

reactions that formed them [41–43]. In encoded libraries, the dimensionality and size of the space can be based on the encoding, allowing the space to be independent of the structure of the macromolecule or the mapping between the labels and macromolecules. It is important to distinguish between a sequence space and the neighbor relationship between points in the space. Many studies of searching sequence space implicitly use point mutation to define the neighbor relationship; that is, points are neighbors in the space if the molecules they correspond to differ by one monomer or one site substitution. Such points are called 1-mutant neighbors. This neighbor relationship results in the intuitive and appealing Hamming distance measure between points in which the distance between points is the number of sites at which they differ [44]. However, the intuition that results from this implicit view can be misleading, as search algorithms can have much more complex mutation operations than point mutation, such as insertion, deletion and inversion. Properties that are defined for a Hamming metric may not be relevant to searches that use these operations. Additionally, with operations such as recombination (crossover), the neighbors of a point depend on what species are presently in the population, necessitating a much more complex definition of neighborhood [45–49]. One means to overcome this potential confusion is to consider several neighbor relationships, each defined by a different procedure that the algorithm performs [50]. Another is to drop the concept of a neighborhood completely and only use the mapping from sequences to fitness values. Any notion of neighborhood is then subsumed within the search algorithm and changes according to details of the algorithm. When describing various models, for both simplicity and comparison with the literature, I will generally keep to the view that neighbors are defined by point mutations and distances are Hamming distance. 
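The sequence space, 1-mutant neighbor relationship and Hamming distance described above can be made concrete for the small ssRNA example of Fig. 12a (length 3, monomers G and A). The following is a minimal sketch; the function names are hypothetical, and only point mutation is modeled, so insertion, deletion, inversion and recombination are outside its scope.

```python
from itertools import product


def sequence_space(alphabet, length):
    """All A**N sequences of the given length over the alphabet."""
    return ["".join(s) for s in product(alphabet, repeat=length)]


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def one_mutant_neighbors(seq, alphabet):
    """Sequences reachable from seq by substituting exactly one site."""
    neighbors = []
    for i, current in enumerate(seq):
        for monomer in alphabet:
            if monomer != current:
                neighbors.append(seq[:i] + monomer + seq[i + 1:])
    return neighbors


space = sequence_space("GA", 3)
print(len(space))                         # -> 8
print(one_mutant_neighbors("GGG", "GA"))  # -> ['AGG', 'GAG', 'GGA']
print(hamming("GGG", "AAA"))              # -> 3
```

Each point has N(A - 1) 1-mutant neighbors, so the neighborhood grows linearly with polymer length while the space itself grows as A^N.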
Other molecular spaces

A concept related to sequence space is shape space, in which various descriptions of the physical dimensions or other basic features important for molecular binding define the space [51–54]. The set of features is referred to as the generalized shape of a molecule. For example, axes might represent physical dimensions, charge, hydrophobicity, dipole moment, and the like [52]. Each molecule is represented by a point in this space. An antibody binding to an antigen can be represented as a cloud of varying density centered about the antigen to which it has maximal affinity. The cloud is of lesser density for nearby antigens towards which it has lesser affinity. Importantly, the affinity of one molecule for another can be represented as a function of their shape space coordinates. The fitness of a molecule is this affinity function when the other molecule is the given receptor. A benefit to these types of spaces is their having many fewer dimensions than the sequence spaces of the corresponding molecules, making them easier to search. Shape spaces are also continuous or consist of a mixture of continuous and discrete dimensions, unlike sequence spaces, and can be searched using a wider variety of search algorithms. However, generalized shape descriptions do not completely characterize a particular molecule, as many molecules may correspond to the same point in the space [54]. As a result, the spaces used in the past have not had sufficient specificity, so the results show many false positives for successful binding [51]. Additionally, since many molecules undergo solvent-dependent configurational changes upon binding, the shape description is not

always relevant. It has also been argued that the fitness landscapes in shape space will be very rugged with many discontinuities, making search very difficult [51]. Searches in shape space are thus presently of limited utility, though this may change as better computational facilities allow for higher dimensional representations with fewer simplifications and with improvements in the algorithms for docking between molecules. A space similar to shape space is catalytic task space [4]. Points in this space represent different reactions that a molecule can catalyze. This is more of an intuitive concept for developing qualitative arguments than a well-developed quantitative space, but it may eventually be made rigorous enough for theoretical molecular diversity applications.

Fitness landscapes

The term fitness in evolutionary search has many different definitions. The concept of fitness landscapes originated with Sewall Wright in the field of evolutionary biology [55]. He considered fitness as a function of either genotype or gene frequencies whose value is a measure of the reproductive success of the corresponding organism or species. In evolutionary biology, fitness can refer to the reproductive success of a single organism or to a measure of how a population is matched to its environment [4,55–57]. In physics, fitness is analogous to free energy, the minimization of which determines the configuration of a system [34,58–62]. In applied molecular evolution, fitness generally has one of two meanings: (i) It can refer specifically to how well a molecule performs a desired function, typically the affinity of a ligand for a given receptor or its catalytic activity for a given reaction. (ii) It can refer to the rate at which a molecule in a population of molecules is copied over one iteration, similar to the notion of enrichment in the molecular diversity literature.
This second definition is more complex, as fitness depends not only on the properties of a molecule but also on the properties of the rest of the population. Since fitness then changes each iteration as the population changes, the whole fitness landscape metaphor is weakened. For these reasons, I will restrict myself to the first definition of fitness. The fitness function is simply the mapping between points in sequence space and their fitnesses. The fitness landscape is the combination of the fitness function and the neighbor relationship. With neighbors defined by point mutation, for an N-site molecule, the landscape is the N-dimensional surface that results from plotting the fitness function over an N-dimensional Cartesian coordinate sequence space (Fig. 13). Any theoretical study of applied molecular evolution needs information on the fitnesses of the molecules in the search space, as it is not possible to characterize the performance of search algorithms without knowing properties of the landscape being searched [63]. Since the ideals of sequence-to-structure or sequence-to-function models are not yet possible, it is necessary to use approximations to these relationships or make assumptions about their functional form. To this end, a large variety of models have been developed, ranging from randomly choosing affinities from a probability distribution to detailed biophysical descriptions of sequence-structure prediction. These models are often used to study protein folding, the immune system and molecular evolution (the study of macromolecule evolution and the reconstruction of evolutionary histories), but they can also be used to study applied molecular evolution [4,39,53,64–67]. A number of these models are reviewed below.

Fig. 13. Example of a two-dimensional fitness landscape. x1 and x2 are coordinates in a space. The global peak is the point of highest fitness in the space. A local peak is a point all of whose neighbors, defined by some neighbor relationship, are of lower fitness. In this figure, neighbors are points that differ by one in either x1 or x2.
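The definitions of local and global peaks in this caption can be checked on a small discrete landscape. The following is a minimal sketch using a hypothetical 4 × 4 grid of fitness values, with neighbors defined, as in the figure, as points that differ by one in either x1 or x2. The function names and the grid are illustrative only.

```python
def local_peaks(fitness):
    """Grid points whose fitness is strictly higher than that of every neighbor.

    `fitness` is a list of rows; neighbors differ by one in x1 or x2.
    """
    rows, cols = len(fitness), len(fitness[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            neighbors = [
                (i + di, j + dj)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            if all(fitness[a][b] < fitness[i][j] for a, b in neighbors):
                peaks.append((i, j))
    return peaks


def global_peak(fitness):
    """Grid point of highest fitness in the whole space."""
    points = [(i, j) for i in range(len(fitness)) for j in range(len(fitness[0]))]
    return max(points, key=lambda ij: fitness[ij[0]][ij[1]])


# Hypothetical landscape with two peaks of different height.
grid = [
    [1, 2, 1, 0],
    [2, 5, 2, 1],
    [1, 2, 1, 3],
    [0, 1, 3, 9],
]
print(local_peaks(grid))  # -> [(1, 1), (3, 3)]
print(global_peak(grid))  # -> (3, 3)
```

A search that only moves to fitter neighbors and starts near (1, 1) would stop at fitness 5 and never reach the global peak at (3, 3).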

Properties of fitness landscapes

Fitness landscapes have been characterized by a variety of properties, many of which are based on the ideas of adaptive walks and local peaks [4,38,67–79]. An adaptive walk is a search technique that begins at some point (molecule) in the sequence space and successively moves (walks) to higher fitness neighbors. Neighbors are tried and discarded until a more fit one is found, with neighborhood defined in the general sense described above. A walk continues until a point is reached which is fitter than all its neighbors. Such points are called local peaks or local optima. Properties based on adaptive walks and local peaks are useful, as adaptive walks are a key component of many evolutionary search algorithms as well as of phylogenic evolution [4,37,38,80]. Walks can get stuck at local peaks and miss the global peak. For example, in antibody affinity maturation, the halting in affinity improvement despite attempts at additional mutation is akin to being trapped at a local optimum [4,14,81]. Some commonly used peak and adaptive walk-based properties are (i) the number of fitness peaks; (ii) the average length of walks from randomly chosen points to peaks; (iii) the average number of peaks that can be reached by walks from a randomly chosen point; and (iv) the average fraction of fitter neighbors as a function of fitness. Other useful properties are listed and evaluated for many landscape models in Refs. 4, 67, 69, 77 and 78.

One of the most important landscape properties is autocorrelation, a measure of similarity of fitnesses of neighboring points. Uncorrelated landscapes may have very dissimilar fitness values for adjacent points and are called rugged. Formally, for landscapes which are stationary (have the same mean, variance and autocorrelation throughout the space), autocorrelation is defined as

$$R(d) = \frac{\langle (f(a_x) - \mu_f)(f(a_{x+d}) - \mu_f) \rangle}{\sigma_f^2} \qquad (14)$$

where 〈x〉 means the average (expected) value of x over the sequence space, $a_x$ is one point in the landscape, $a_{x+d}$ is a point distance d from $a_x$, $f(a)$ is the fitness of point a, and $\mu_f$ and $\sigma_f^2$ are the mean and variance of the fitnesses. (Other definitions of autocorrelation are also used, but they all have a similar meaning [67,69,82,83]. Note that some authors use the term isotropic instead of stationary and that alternative definitions for stationarity are sometimes used.) R ranges from –1 to 1. Consider d = 1. An R near 1 (high correlation) means adjacent points have near-identical fitnesses and that the landscape is essentially flat. An R near –1 (high anticorrelation) means adjacent points have near-equal and opposite fitnesses and that the landscape essentially alternates between two values. An R of 0 (uncorrelated) means that the fitnesses of adjacent points are totally unrelated and that the landscape is completely random. Molecular affinity landscapes are universally considered to have a 1-mutant Hamming distance correlation between 0 and 1 [3,4,37,39,64–67,71–73,83].

Autocorrelation is an important property. The more correlated a landscape, the fewer local peaks it has, the less often adaptive walks will get trapped in its local optima, and, in general, the easier it is to search with adaptive walks and similar search methods. Very correlated landscapes have few local peaks, while uncorrelated and anticorrelated landscapes have many local peaks. As discussed below, good methods of library design and search take correlation into account. However, autocorrelation can be misleading if the search method and mutation operators are not kept in mind. If a search involves mutations of two or more sites at once, the definition of a 1-mutant neighbor local peak is not as relevant to the search.
If the search involves crossover or inversion, a molecule may convert in one step to a molecule very distant by the Hamming measure, and the concept of 1-mutant neighbor local peaks can be irrelevant [84]. Thus, the local peaks that are important are those within the neighbor relationship given by the search algorithm; they are not statistical properties of a landscape itself. Several types of autocorrelation are often used for landscapes. In several important papers, Weinberger and Stadler consider both autocorrelation between adjacent points along a random walk in the landscape and autocorrelation between points a given Hamming distance apart independent of any walk [67,77,78,82,83]. Both definitions yield similar information about the landscape and can be computed from one another for stationary landscapes. Other types of autocorrelation are based on neighborhoods defined by complex mutation operations such as crossover [45–49,85].
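The Hamming-distance form of autocorrelation can be computed exhaustively for a small sequence space. The following is a minimal sketch, assuming a binary alphabet and taking R(d) as the covariance of the fitnesses of all pairs of points at Hamming distance d, divided by the fitness variance. The two toy landscapes (an additive one and one with independently drawn fitnesses) and all names are hypothetical.

```python
import random
from itertools import product


def autocorrelation(fitness, d):
    """Autocorrelation of fitness over all ordered pairs at Hamming distance d.

    `fitness` maps each sequence (a tuple of monomers) to its fitness value.
    The landscape must not be constant, otherwise the variance is zero.
    """
    points = list(fitness)
    values = [fitness[p] for p in points]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    total, count = 0.0, 0
    for a in points:
        for b in points:
            if sum(x != y for x, y in zip(a, b)) == d:
                total += (fitness[a] - mean) * (fitness[b] - mean)
                count += 1
    return total / (count * var)


N = 8
space = list(product((0, 1), repeat=N))

# Additive landscape: every site contributes independently to fitness.
additive = {s: float(sum(s)) for s in space}

# Uncorrelated landscape: fitnesses drawn independently for every point.
rng = random.Random(0)
uncorrelated = {s: rng.gauss(0.0, 1.0) for s in space}

for d in (1, 4, 8):
    print(d, round(autocorrelation(additive, d), 3),
          round(autocorrelation(uncorrelated, d), 3))
```

For the additive landscape with equal site contributions, R(d) works out to 1 - 2d/N, so neighboring points are highly correlated, whereas the independently drawn landscape gives values near 0 at every distance.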


Models of fitness landscapes

It is important to clarify exactly what fitness landscape models are. In essence, they are simply means to assign fitnesses to points in a sequence space. The fitnesses together with a neighbor relationship form a landscape. There are an infinite number of landscapes for any sequence space. For a given problem, such as finding antibodies with high affinity to an antigen, the vast majority of landscapes are unrealistic. An appropriate fitness landscape model generates the more realistic ones. In general, fitness landscape models can be viewed as generating landscapes from a probability density function over all landscapes.

Quite often, a model is capable of producing all possible landscapes; however, for the vast majority of landscapes, the probability of the model producing them is extremely low. The quality of a fitness landscape model is determined by how closely its landscape probabilities match those of the problem being considered.

Spin glass-like models

The conflicting energetic constraints that make molecular fitness landscapes difficult to search closely resemble the constraints in models of magnetic materials known as spin glasses [86]. Because spin glasses have been studied in great detail, spin glass-like models have proven very instructive in studying applied molecular evolution. A spin glass is a disordered magnetic material in which the magnetic dipoles composing the material are oriented in different directions. A dipole’s preferred orientation, determined by which orientation has the lowest energy, depends on the orientations of nearby dipoles. Models of spin glasses consist of a lattice of ‘Ising spins’ with each spin pointing either up or down. When one spin switches to its preferred orientation, it generally will cause many other spins to switch their preferences. The system exhibits frustration in that many of the orientation preferences are mutually incompatible (Fig. 14a). If it were possible to satisfy all the preferences simultaneously, the material would have a low energy. Since frustration prevents this, the material has many possible configurations with higher energies that cannot be brought to lower energies by any spin changes. This frustration is responsible for multiple local peaks and local minima in landscapes [4,34,62,87]. Frustration in spin glasses is very similar to that caused by the conflicting energetic constraints in molecular fitness landscapes [4,34,58,87–90].
Frustration in a binding reaction can result from the optimal choice of a monomer based on certain constraints being incompatible with the optimal choices for the same monomer based on other constraints (Fig. 14b). When a spin glass-like model is used for molecular fitness landscapes, energy corresponds either to the free energy change during binding or to the affinity itself, spins correspond to the sites for components in the molecule, and spin orientations correspond to different choices for the components. The main differences between spin glass models and molecular landscapes are that there are usually more than two different monomers for a site and that the interactions in spin glasses are much more symmetric than those found in molecules. Nevertheless, the extensive work done on spin glasses is directly relevant to optimization on molecular landscapes.

Random energy model

The random energy model (REM) results from using a fitness distribution p(f) to assign fitnesses randomly to points in the landscape [14,59,60,70,71,81,91,92]. p(f) is the probability that a point in the sequence space has fitness f and is exactly analogous to the affinity distribution p(Ka). Such landscapes have zero correlation (are very rugged), have many local fitness peaks, and result in very short adaptive walks. Very few of the local peaks are accessible by adaptive walks from any particular point. REM landscapes have been applied to the maturation of the immune system. When exposed to a new antigen, antibodies undergo on average only 6–10 point mutations in the course of achieving typically a 50–100-fold increase in affinity [70,71]. Such short walks from random initial points to local peaks occur in rugged landscapes. The short walks and only modest affinity increase can be interpreted to indicate that the antibody affinity landscape is essentially uncorrelated [14,71,81,91]. However, because there is such

B. Levitan

Fig. 14. Examples of frustration, the inability to simultaneously satisfy conflicting constraints. (a) A two-dimensional spin glass. If the weights between all four points are all positive, then the preferences of all spins are satisfied when they point in the same direction. If all the weights are negative, all preferences are satisfied if the spins in two diagonal corners point one way and the other spins point the other way. Such systems do not demonstrate frustration. In this figure, the weights between points 1 and 2, points 2 and 3, and points 1 and 4 are positive, so these spin pairs prefer to be oriented in the same direction. The weight between points 3 and 4 is negative, so these spins prefer opposite orientations. It is not possible to simultaneously satisfy these four constraints, and the system is frustrated. (b) The line represents the binding pocket of a receptor. a_i and a_j are two amino acids deep in the pocket. Frustration can occur when local constraints make particular choices of amino acids optimal for a_i and a_j independently, but using one of these optimal choices precludes usage of the other.

a large variety of antibodies in organisms, the antibodies that initially respond have affinities on the order of 5 × 10⁴ to 1 × 10⁵ M⁻¹, corresponding to moderate to highly ranked fitnesses. On random landscapes, the corresponding points are almost always local peaks or one step from local peaks [70,92]. Since the walks from such points would be much shorter than those observed for the immune system, the 6–10 step walks also provide an argument for correlated landscapes. While it is generally believed that affinity landscapes are at least moderately correlated, the immune system data are subject to multiple interpretations. Depending on the particular assumptions and model parameters, the 6–10 step walks can imply that the antibody affinity landscapes have any degree of correlation [14,92]. Ref. 14 gives an enlightening discussion of the effects of these parameters. A different argument that affinity landscapes are correlated is given in Refs. 4, 70 and 91. If p(f) is kept constant, increasing the number of conflicting constraints results in increasingly uncorrelated landscapes whose average peak fitness drops towards the average fitness, a phenomenon called the complexity catastrophe [80]. Roughly, the complexity catastrophe occurs because the average fitness of local peaks becomes more similar to that of the entire landscape as the number of peaks increases. The complexity catastrophe is a serious concern for rugged landscapes, for even if selection is strong enough to pull an adapting population to accessible local peaks, those peaks on average are just mediocre [70,92]. Assuming a log-normal-like shape for p(Ka) with a mean affinity anywhere around 1 × 10² to 1 × 10⁴ M⁻¹, such a majority of mediocre peaks is at odds with the much higher affinities seen after affinity maturation. Thus, much of the recent work on molecular landscapes has focused on models that generate correlated landscapes.
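The short adaptive walks expected on uncorrelated landscapes can be illustrated with a minimal sketch. It assumes binary sequences (A = 2), a uniform p(f) on [0, 1), and a walk that steps to a randomly chosen fitter 1-mutant neighbor; the function names `make_rem`, `one_mutants` and `adaptive_walk` are hypothetical and are not taken from the cited work.

```python
import random

def make_rem(rng):
    """Random energy model: each sequence gets an independent random fitness.

    Fitnesses are drawn lazily (on first lookup) and cached, which is
    equivalent to filling the whole table in advance.
    """
    table = {}
    def fitness(seq):
        if seq not in table:
            table[seq] = rng.random()  # assumption: p(f) uniform on [0, 1)
        return table[seq]
    return fitness

def one_mutants(seq):
    """All binary sequences at Hamming distance 1 from seq (a tuple of 0/1)."""
    return [seq[:i] + (1 - seq[i],) + seq[i + 1:] for i in range(len(seq))]

def adaptive_walk(seq, fitness, rng):
    """Step to a random fitter 1-mutant until no fitter neighbor exists."""
    steps = 0
    while True:
        fitter = [m for m in one_mutants(seq) if fitness(m) > fitness(seq)]
        if not fitter:
            return seq, steps  # seq is a local peak
        seq = rng.choice(fitter)
        steps += 1

rng = random.Random(0)
n = 20
lengths = []
for _ in range(200):
    fitness = make_rem(rng)  # a fresh random landscape for every walk
    start = tuple(rng.randint(0, 1) for _ in range(n))
    _, steps = adaptive_walk(start, fitness, rng)
    lengths.append(steps)
mean_walk_length = sum(lengths) / len(lengths)
print(f"mean adaptive walk length for N = {n}: {mean_walk_length:.2f}")
```

Starting the walks from highly ranked points instead of random ones would be a natural extension for examining the argument about initial antibody affinities made above.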
NK model

The NK model is a simple landscape model that allows arbitrary degrees of correlation between 0 and 1 [4,92,93]. The sequence space is an N-dimensional space with A possible values at each site, corresponding to a polymer of length N with A different monomers. In this and the following sections, a particular sequence is denoted as a = (a_1, a_2, ..., a_N), where a_i is the monomer at site i, and the fitness of a is denoted as f(a). The keys to the model are: (i) The fitness of a sequence is the average of fitness contributions from each site,

$$ f(a) = \frac{1}{N} \sum_{i=1}^{N} f_i(a) $$

where f_i(a) is the fitness contribution from site i. (ii) The contribution of site i depends on the particular monomers at site i and at K other sites, where K ranges from 0 to N – 1. For example, the fitness contribution of site 4 may depend on a_3, a_4 and a_5 for a K = 2 model, though in general the K sites need not be adjacent to the contributing site (Table 1). (In fact, most important results are insensitive to where the sites are chosen [70,92].) For each site i, the values of the fitness contributions for each configuration of the K + 1 monomers are chosen randomly from some distribution, usually uniformly from 0–1. Clearly no biophysics or biology goes into the NK model. What makes it useful, and relevant to so many problems in biology and evolution, is how well it captures the notion of epistasis. Epistasis refers to interactions in which one gene in a genome alters the phenotypic expression of genes at other loci. For molecules, this is the degree to which changing a monomer at one site influences the energetics of other monomers. Epistasis is the primary cause for the conflicting constraints that result in frustration in the NK and other spin glass models (Fig. 15). K controls landscape ruggedness, as there are more conflicting constraints with larger K. For K = 0, each site is independent of other sites and there are no conflicts. The monomers that give the largest fitness contribution for each site individually determine the optimal sequence collectively. Changing any site from its optimal monomer reduces the total fitness, regardless of the monomers elsewhere. Thus, K = 0 landscapes are single-peaked. Because each monomer change only alters one of the N fitness contributions and changes the fitness by a small amount, K = 0 landscapes are also very correlated. As K increases, the increasing number of conflicts increases frustration and results in more local peaks, lower average fitnesses and decreased correlation.
At K = N – 1, the conflicts are so severe that changing any monomer results in a totally new fitness. These landscapes are completely random, have many low-fitness peaks, and have zero correlation. Being able to tune ruggedness by varying K allows for great flexibility in mapping the NK model onto real-world problems and in using NK landscapes as models for molecular fitness landscapes. NK landscapes have predicted statistical features of antibody affinity maturation to an antigen in the immune system [4,70,92]. They have been used to analyze and compare hill climbing, pooling and recombination as methods of applied molecular evolution [65]. They have also been used to consider the influence of measurement error on adaptive walks in sequence spaces [35]. The properties of NK landscapes have been well investigated [4,70,77,78,83,94–97]. There are also several variants of the NK model that have been informally considered, such as having different K’s at different sites, making the epistasis high in some regions and low in others (a property reminiscent of the block model described below), drawing the fitness values from various distributions, and making fitness the product of the site fitness contributions rather than their average.


Fig. 15. Example of frustration in an NK fitness landscape model. The tables list the fitness contributions for sites 3 and 4 as a function of their K = 2 epistatic inputs and their own values. The highest fitness contribution for f3 requires [a2,a3,a4] =[1,0,0], while for f4 it requires [a3,a4,a5]= [1,0,1]. These two constraints cannot be mutually satisfied, leading to frustration. As K increases, the number of such conflicts rises and results in an increasingly rugged landscape whose peaks are of increasingly lower average fitness.
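The NK construction and the effect of K on ruggedness can be sketched in a few lines. This is an illustrative sketch only, assuming binary monomers (A = 2), K epistatic inputs chosen at random for each site, and contributions drawn uniformly from 0–1; `make_nk` and `count_local_peaks` are hypothetical names, not code from the cited studies.

```python
import itertools
import random

def make_nk(n, k, rng):
    """Binary NK landscape: site i depends on itself and k randomly chosen other sites."""
    inputs = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{} for _ in range(n)]  # one contribution table per site, filled lazily
    def fitness(seq):
        total = 0.0
        for i in range(n):
            key = tuple(seq[j] for j in inputs[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()  # contribution drawn uniformly from 0-1
            total += tables[i][key]
        return total / n  # fitness is the average of the N site contributions
    return fitness

def count_local_peaks(n, fitness):
    """Count sequences with no fitter 1-mutant neighbor (exhaustive, small n only)."""
    peaks = 0
    for seq in itertools.product((0, 1), repeat=n):
        f = fitness(seq)
        neighbors = (seq[:i] + (1 - seq[i],) + seq[i + 1:] for i in range(n))
        if all(fitness(m) <= f for m in neighbors):
            peaks += 1
    return peaks

n = 10
for k in (0, 2, 5, 9):
    landscape = make_nk(n, k, random.Random(k))
    print(f"K = {k}: {count_local_peaks(n, landscape)} local peaks")
```

For K = 0 the count is a single peak, as argued in the text; the counts for larger K depend on the random seed but grow with K.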

p-Spin model

Derrida [59] and later Amitrano [34] developed a more general class of fitness functions called p-spin models. p-spin models merge the variable ruggedness of NK models with the amenability to mathematical analysis found in spin glass models. The sequence space for p-spin models is N-dimensional with site values of either 1 or –1. Fitness is defined as

$$ f(a) = \sum_{1 \le i_1 < \cdots < i_p \le N} J_{i_1,\ldots,i_p} \, a_{i_1} a_{i_2} \cdots a_{i_p} $$

where J_{i1,...,ip} are chosen independently from some random distribution. The sum is over all p-sized subsets of sites within 1...N. Each subset has an independent J value. The properties of p-spin landscapes have been studied in detail [34,79]. One observation of importance is that p plays a role similar to that of K. The number of local peaks increases exponentially with p [76]. This occurs because, as p increases, there are more conflicting constraints that cause the site values that maximize one J function to conflict with those site values that maximize other J functions. As with NK models, p-spin models are applicable to a wide range of biologic and molecular problems. One useful variant of the p-spin model is to define fitness as the sum of site interactions for several values of p. This variant shows parallels with the contact map method of inverse protein folding [98–100]. For example, p = 1 interactions may represent the free energy contributions for a given amino acid being buried or exposed to solvent or being found in certain secondary structures, and p = 2 or 3 interactions may represent the contributions of free energies when two or three given amino acids are in contact [98].

Block model

Another model that allows for variable degrees of correlation and has been applied to the evolution of antibodies is the block model [73]. This model is motivated by the observation that proteins are composed of functional domains, each of which has high epistasis internally, but between which there is much less epistasis. The block model breaks a sequence into subsequences or ‘blocks’, each of which has its own, independent fitness function. Blocks need not be contiguous; they can represent amino acids in tertiary structure domains. Mutations in one block affect the fitness of that block only, while the total fitness of the sequence is the sum of the fitnesses of all blocks. Landscape ruggedness can be decreased (and fitness correlation increased) either by increasing the correlation within blocks or by increasing the number of blocks for a constant sequence length. An advantage of the block model is that different fitness models and lengths can be used for each block, providing flexibility unavailable in any individual model. Different correlations and other properties can be imposed in different regions of the sequence space. For example, since particular monomers can be critical determinants of affinity [101], a spin glass model with no to low correlation can be used for a block corresponding to a binding site. A model with greater correlation or other properties might be more appropriate for the structural regions. Similarly, for a molecule with several binding sites, each can be modeled with a different block with a different landscape model.

RNA secondary structure models

A completely different class of landscape models has developed from studies of RNA secondary structure folding (recent review in Ref. 66, software available in Ref. 102). These studies are reviewed by Schuster elsewhere in this collection, so I will only summarize certain relevant results here. As with proteins, the spatial structure of RNA molecules is very hard to predict. As an alternative, models of RNA secondary structure can be used as an approximation to RNA spatial structure. Secondary structure models have proven quite useful because [66,67]: (i) the major part of the free energy of folding occurs in the base pairing and base pair stacking of RNA molecules, which are the basis of secondary structure models; (ii) secondary structure is generally sufficient to interpret RNA function and reactivity; (iii) RNA secondary structures are conserved in evolutionary phylogeny; (iv) statistical properties of RNA secondary structural landscapes are independent of the folding algorithm [103]; and (v) the models are amenable to rapid computer simulation and theoretical studies.
Comparisons between true and predicted RNA secondary structures are discussed in Refs. 104–106. The NK, p-spin and other models described above all assign fitnesses directly to sequences. In contrast, the secondary structure landscapes are best thought of in two parts:

sequence → structure → fitness

(i) Sequence-to-structure mappings, which indicate which structure a sequence folds into. This mapping is referred to as a combinatory map. (ii) Structure-to-fitness mappings, which indicate what fitness is assigned to a structure.

Algorithms for the sequence-to-structure mapping are based on minimum free-energy pairings of nucleotides, kinetic folding, and other methods (Ref. 103 and references cited therein). Graph theory-based models, which incorporate no biophysics, are also surprisingly successful at sequence-to-structure mapping [74,75,107–109]. For the mapping from structure to fitness, fitnesses can be assigned to the structures at random. A more realistic mapping results from defining the fitness of a structure as its similarity to a given ‘goal’ structure, the idea being that similar structures are more likely to have similar function. The structural similarity can be measured with a distance measure for tree-like graph representations of secondary structures, as demonstrated in Fig. 16 [74,110,111]. The energetics of RNA secondary structure folding demonstrate the same types of frustration as in spin glass models.
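A structure-to-fitness mapping of the ‘similarity to a goal structure’ kind can be sketched as follows. Note the substitutions: instead of a tree edit distance, this sketch uses the simpler base-pair distance (the number of base pairs present in one structure but not the other), and it assumes structures are written in dot-bracket notation, which is not used in the text above. The sequence-to-structure (folding) step is not modeled, and all function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def base_pair_distance(a, b):
    """Number of base pairs found in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))

def structure_fitness(structure, goal):
    """Fitness in [0, 1]: 1 for the goal itself, 0 when no base pairs are shared."""
    if len(structure) != len(goal):
        raise ValueError("structures must have the same length")
    worst = len(base_pairs(structure)) + len(base_pairs(goal))
    if worst == 0:
        return 1.0  # two fully unpaired structures are identical
    return 1.0 - base_pair_distance(structure, goal) / worst

goal = "((((....))))"
for candidate in ("((((....))))", "(((......)))", "............"):
    print(candidate, round(structure_fitness(candidate, goal), 3))
```

Because several sequences fold into the same structure, any such mapping assigns them identical fitnesses, which is how neutrality enters these landscapes.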


RNA secondary structure studies have yielded many insights of significance for molecular evolution, both natural and applied. Because the mapping from sequence-to-structure is many-to-one, the number of RNA structures is vastly smaller than the number of RNA sequences [67,75,112]. Additionally, the vast majority of sequences fold into the same few percent of all possible RNA structures, with the result that those structures that occur most commonly and hence are most often searched for compose only a small fraction of all possible structures. This fraction decreases with increasing chain length; so for sufficiently long RNAs, the majority of possible RNA secondary structures can be ignored. Those sequences that fold into the same structure are distributed throughout sequence space in connected chains of 1-mutant neighbors known as neutral networks [75,107–109]. As discussed below, neutral networks have important consequences for evolutionary search. For example, all common structures can be found within a ball in sequence space of a radius much smaller than that of the entire space. Calculations show this ‘covering radius’ to be 15 for 100 base long RNAs; in other words, mutating only 15 of the 100 bases in an arbitrarily chosen sequence is sufficient to find every common RNA structure [75,108,109,113]. The mutation schemes in laboratory search protocols can be matched to this property [114]. Note that while certain types of spin glass models produce landscapes with neutral networks, the spin glass-like models used for molecular properties generally do not.

Other approaches

The largest limitation of the spin glass landscape models described above is that they are, at best, only statistically similar to actual molecular affinity landscapes. The RNA secondary structure models may make good structural approximations, but the structure-to-fitness mappings in existence are clearly not good reflections of affinity.
Work in protein and nucleotide folding offers the prospect of much more realistic fitness functions, specific to a particular receptor and class of ligand. While work in three-dimensional quantitative structure-activity relationships (3D QSAR) is not generally cast

Fig. 16. Example of tree-like representation for RNA secondary structure. Each hairpin structure is shown next to its equivalent tree. With such representations, a graph-theoretic measure can quantify the ‘distance’ between these trees and help generate fitness values for a fitness landscape. For example, the distance between two structures may be defined as the minimal number of elementary graph operations (insert a point, switch an edge, etc.) needed to convert one tree into the other. Note that there are many variants of tree representations for RNA secondary structures and many definitions of graph distance. In low-resolution tree representations, several secondary structures can map to the same graph.


in a fitness function framework, if an appropriate sequence space is defined, 3D QSAR can also be viewed as a landscape model. These topics are outside the scope of this review, but reviews may be found in Refs. 115–117.

Library design and search strategies

Designing the initial library

Library size

A combinatorial library must be of a certain minimum size for it to have on average at least one ligand that can bind to arbitrary targets with a given minimum affinity. Such libraries are said to ‘cover’ or ‘saturate’ the sequence space of targets. Arguments based on shape space imply that a repertoire of 10⁶ different antibodies is sufficient to give a probability on the order of 0.99999 that any antigen is recognized by at least one antibody in the immune system [52,53]. ‘Recognition’ does not imply high affinity, as affinities on the order of 5 × 10⁴ to 10⁵ M⁻¹ are sufficient to initiate clonal selection and affinity maturation. In analogous fashion, applied molecular evolution techniques that use mutation can mutate modest affinity ligands from the initial library into high affinity ligands. This suggests that the libraries presently in use are of more than sufficient size for techniques using mutation, at least when the binding site for the ligand is on the order of the size of that for antibodies. For techniques not using mutation, the minimum affinity needed in the initial library is that which would be an adequate result from the search. There are no estimates for the repertoire sizes that cover target sequence space with higher minimal affinities; however, arguments based on catalytic task space suggest that 10⁸ molecules are sufficient for saturation [4]. If enzymes are considered to bind to transition states between reactants and products, then 10⁸ ligands may be adequate for techniques without mutation.
However, there is no estimate for the minimal affinity towards a transition state needed to catalyze a reaction, so it is unclear what exactly is achieved by a library of this size. An order statistics approach could also be used to choose a library size. Given an affinity distribution p(Ka) and a minimum desired affinity, order statistics calculations can yield the number of ligands needed to include at least a specified number of ligands with this affinity or greater with a specified probability (Fig. 11).

Library bias

Biasing a library involves any means of choosing sequences from the sequence space in a nonuniform manner. The goal of biasing is either to increase the maximal or average fitness in the initial library or to choose sequences such that mutation in successive rounds is more likely to find high affinity sequences. Depending on what information is known about the landscape, the best means to bias varies significantly. If absolutely no information is known about the landscape, sequences can be sampled in any fashion with no penalty [63]. However, this is not the case for molecular affinity landscapes. Both experimental and theoretical work suggest that affinity landscapes have some degree of correlation [4,65,70,71,73]. Simulations on correlated NK landscapes show that biasing the initial library by choosing points clustered together in sequence space on average produces libraries inferior to those produced by choosing points at random (Fig. 17). This difference occurs because high fitness values in a highly correlated landscape are tightly grouped in one or a few regions. The probability of the cluster of sampled points being near these regions is much lower than that of being distant from them. As correlation decreases, the high fitness points are increasingly scattered throughout the space, and the difference between random and biased sampling gradually becomes less significant and vanishes. For search techniques that use mutation, an additional disadvantage to clustering is that many of the sequences in the initial library will be regenerated by the mutation process, losing an opportunity to explore the landscape with increased diversity. There may also be other means of biasing the library that better take advantage of a known landscape correlation or other properties [118].

Libraries may also be biased by taking advantage of prior knowledge such as the affinities of a few sequences. Since many libraries are designed to improve on a ligand of known affinity, this is a common scenario. One approach might be to select new sequences at some optimal mutation distance from the initial, known sequence, a technique considered on NK landscapes [119] and another simple landscape [120]. Consider a point of fitness 0.5 on a correlated landscape with fitnesses scaled symmetrically between 0 and 1. Since 0.5 is the average fitness, at any Hamming distance from this point, half the mutants will be more fit than 0.5 and half less fit. Points at small Hamming distances will on average only be slightly different from 0.5, while points more distant than the correlation length will be distributed according to p(f) with no bias from the initial point’s fitness (Fig. 18a). To look for neighbors of high fitness, the optimal distance to search at is any distance above the correlation length, since the variation in fitnesses is largest there.

Fig. 17. Simulation of the effect of biasing an initial library when no information is initially known about the underlying affinity landscape. Library size n = 10, 100 or 1000 points were selected from an NK model of N = 12 and K= 0 ... 11. Points were selected either at random (unbiased) or tightly clustered in one region of the landscape (biased). The biased points are consecutive sequences counted in binary. The curves show the maximum fitness in the initial library as a function of increasing ruggedness (K). For all library sizes, the unbiased set contains higher fitness members than the biased set. The difference is largest for highly correlated landscapes and gradually disappears as correlation decreases. For larger library sizes, the best fitness in the library increases. While the bias can occasionally be favorable, the vast majority of the time the bias induced by picking sequences in a cluster is unfavorable.
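The comparison described in the caption of Fig. 17 can be approximated with a short simulation. The sketch below is a hedged reimplementation, not the code behind the figure: it assumes binary NK landscapes with randomly chosen epistatic inputs, takes the clustered library to be a run of consecutive integers written in binary, and averages the best library member over a number of independent landscapes. The names `make_nk`, `to_bits` and `compare` are hypothetical.

```python
import random

def make_nk(n, k, rng):
    """Binary NK landscape with k random epistatic inputs per site (lazy tables)."""
    inputs = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{} for _ in range(n)]
    def fitness(seq):
        total = 0.0
        for i in range(n):
            key = tuple(seq[j] for j in inputs[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n
    return fitness

def to_bits(value, n):
    """Binary representation of an integer as a tuple of n bits, most significant first."""
    return tuple((value >> shift) & 1 for shift in reversed(range(n)))

def compare(n, k, size, trials, rng):
    """Mean best fitness of random versus clustered libraries over several landscapes."""
    random_best = clustered_best = 0.0
    for _ in range(trials):
        fitness = make_nk(n, k, rng)
        unbiased = rng.sample(range(2 ** n), size)   # points chosen at random
        start = rng.randrange(2 ** n - size + 1)
        clustered = range(start, start + size)       # consecutive binary numbers
        random_best += max(fitness(to_bits(v, n)) for v in unbiased)
        clustered_best += max(fitness(to_bits(v, n)) for v in clustered)
    return random_best / trials, clustered_best / trials

rng = random.Random(0)
for k in (0, 2, 5, 11):
    unbiased, biased = compare(n=12, k=k, size=100, trials=20, rng=rng)
    print(f"K = {k:2d}: random {unbiased:.3f}  clustered {biased:.3f}")
```

With only a modest number of trials the gap at high K is within sampling noise, so more landscapes would be needed to resolve where the two curves merge.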


Fig. 18. Distribution of fitnesses as a function of mutant Hamming distance from current position on the landscape for an initial fitness of (a) 0.5 and (b) 0.543. Simulations were carried out on NK landscapes with N = 100 and K = 2, yielding the high nearest neighbor correlation of 0.97 and a correlation length of 33.3. Vertical bars show +1 and –1 standard deviation from the mean of fitnesses found at each search distance. If the best of six mutants at each distance is chosen, then the best mutant can be found at Hamming distance 33 from the fitness 0.5 point and at decreasing distances as the initial fitness increases. (From Ref. 119.)
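The tradeoff behind an optimal mutation distance can be made concrete with a small calculation. The sketch below is not the simulation of Ref. 119; it rests on several assumptions that should be read as such: the autocorrelation decays exponentially, ρ(d) = exp(–d/ℓ), with ℓ = N/(K + 1) (33.3 for N = 100, K = 2, as quoted in the caption); mutant fitnesses at distance d are Gaussian with mean 0.5 + ρ(d)(f0 – 0.5) and standard deviation σ√(1 – ρ(d)²); and σ² = 1/(12N), the variance of an average of N uniform contributions. The function names are hypothetical.

```python
import math

def expected_max_std_normal(m, lo=-8.0, hi=8.0, steps=4000):
    """E[max of m iid N(0,1)] via trapezoidal integration of x*m*pdf(x)*cdf(x)**(m-1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * m * pdf * cdf ** (m - 1)
    return total * h

def expected_best(f0, d, best_of, mean, sd, corr_length):
    """Expected fitness of the best of several mutants made at Hamming distance d."""
    rho = math.exp(-d / corr_length)           # assumed exponential autocorrelation
    spread = sd * math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    return mean + rho * (f0 - mean) + best_of * spread

def optimal_distance(f0, m, n_sites=100, k=2):
    """Distance in 1..n_sites that maximizes the expected best of m mutants."""
    mean = 0.5
    sd = math.sqrt(1.0 / (12.0 * n_sites))     # assumption: uniform site contributions
    corr_length = n_sites / (k + 1)
    best_of = expected_max_std_normal(m)
    return max(range(1, n_sites + 1),
               key=lambda d: expected_best(f0, d, best_of, mean, sd, corr_length))

for f0 in (0.5, 0.543, 0.6):
    print(f"f0 = {f0}: optimal distance for best of six = {optimal_distance(f0, m=6)}")
```

Under these assumptions the optimum sits at the largest available distance when the starting fitness equals the landscape mean and moves to shorter distances as the starting fitness rises, which is the qualitative behavior the caption describes; the specific distances depend on the assumed Gaussian form.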

If the point has greater fitness, say f = 0.6, then the fitnesses of mutants will average slightly below 0.6 at small distances and gradually return to the average fitness of 0.5 with increasing distance. If several mutants are generated at distance d, there will be a scattering of fitnesses about the average for distance d. However, since the fitnesses of mutants vary both above and below the mean, there exists a distance at which the scattering of mutant fitnesses to higher fitness is maximized (Fig. 18b). This is the optimal distance at which to generate this many mutants for the library. For landscapes with small, positive correlation, the optimal mutation distance from a point decreases slowly as the point’s fitness increases from the average fitness. For highly correlated landscapes, the optimal mutation distance decreases very rapidly, as seen in Fig. 18b. With a known average correlation for the landscape, the optimal mutation distance could be calculated. Thus, one means to build a library is to start with a sequence of measured affinity in a landscape of known correlation and generate a variety of mutants at the optimal distance. Then one or several of these mutants can be sequenced and their affinities measured. A second set of mutants can be generated at the optimal distance from this new point. Repeating this process can generate the entire library. This particular example can be generalized in many ways, such as biasing the mutants according to information from several sequences of known affinity. Issues worthy of additional study are, for a given library size, how many rounds of mutant generation to use, how many mutants at the optimal distance from a known sequence to make, and how many sequences should be measured for use in biasing the next set of mutants.

Additional arguments indicate that there is another disadvantage to biasing initial RNA libraries by clustering. As discussed above, there are far fewer RNA secondary structures than RNA sequences. The common structures, those whose sequences make up the vast majority of the sequence space, are distributed randomly throughout the space, but the same structures occupy many nearby points because they are also distributed on neutral networks. It is disadvantageous to choose samples clustered too tightly, as the neutral nets percolate through the cluster and reduce the diversity of the chosen sequences, while sampling at random can select from all common sequences.
Thus, to the extent that the RNA secondary structural models and results apply to true RNA structure and function, when there is no prior knowledge of RNA affinities, RNA libraries should be designed by random selection of sequences without clustering.

Measuring molecular landscape properties

This is a relatively unexplored field of great importance in designing optimal molecular landscape search methods, studying properties of binding between various classes of molecules, and studying phylogenetic molecular evolution. No search algorithms work well on all problems; rather, a search method performs best on certain problems and performs poorly on other problems [63]. While matching a search algorithm to a problem is still a relatively unexplored field, it is quite apparent that properties such as the landscape fitness distribution and autocorrelation can be very important in choosing an algorithm [67,77,83,119,121]. The general problem is that many properties measured for one region of the landscape may not be the same in other regions, so a vast number of points must be examined before property averages take on statistical significance. Properties that are the same throughout sequence space are called stationary. Generally, this is dealt with by examining properties that are either assumed or proven to be adequately stationary throughout the landscape. However, the level of stationarity that is ‘adequate’ depends on how sensitive the search algorithm is to variation in the property, which is generally unknown. In general, correlation is different in different regions of sequence space, and likely is higher for changes in the structural components of molecules than for changes in the binding site components.


With the exception of fitness distribution p(f) and landscape ruggedness or autocorrelation, the measurement of molecular landscape properties has been little considered [70,71,73,81,91,92]. Estimating affinity distributions from a small number of samples is well studied and will not be covered here [18,122–124]. Estimating the distributions of other molecular properties is not as well considered, though there are some promising techniques. For example, a large array of peptides can be synthesized in parallel in a spatially addressable manner, and their affinities characterized with epifluorescence microscopy [125]. The resulting affinities can give the correlation in the region of sequence space examined. A promising approach is Hamming chromatography [126]. In this method, random oligomers partially complementary to a given wild-type nucleotide sequence are synthesized and then separated by Hamming distance to the wild type in an affinity column or matrix. The column is coated with copies of the wild-type sequence, and temperature and other conditions are varied so that mutants are eluted in order of increasing complementarity to the wild type. The collected fractions are each a different Hamming distance from the complement of the wild type. Measuring the wild type’s fitness/affinity and the average fitness of the different fractions can give the autocorrelation and other properties of the landscape about the wild type. Repetition of this method using different sequences as the wild type gives an average correlation function as well as a sense of how stationary the correlation is. One concern with Hamming chromatography is that the collected fractions may overlap and contain sequences at several distances, particularly for long nucleotide sequences. The energy contribution of one site is small compared to the total contributions from all sites in long sequences, and noise may overwhelm the signal. But for short sequences, this technique should work very well.
It is also possible to use data collected from landscape searches to find landscape properties. Pooling or deconvolution strategies provide several profitable approaches: (i) Since the pools are already separated by subsequences, pool affinities reflect landscape structure. It may be possible to reexpress correlation or other properties in terms of pool affinities. These data may be particularly useful as they are based on sequences distributed throughout the landscape, providing very good average measurements. (ii) The fitness distribution of 1-mutant neighbors of optima from pooling strategies is very dependent on landscape ruggedness; as ruggedness increases, the fitness distribution decreases in mean and increases in variance [65]. This distribution can be measured after pooling and related to landscape correlation. (iii) When two independent 1-mutant variants of a protein are combined, the change in free energy of binding for the resulting double mutant may not reflect the sum of free energy differences for the independent changes. The effects of the mutated residues are generally additive when the residues are distant and not additive when they are in close contact [127,128]. The frequency with which superadditive and subadditive double mutants occur (free energy change greater or less than the sum of the independent changes) starting from the pooled optimum sequences is also a reflection of landscape ruggedness: superadditive and subadditive mutants do not occur on smooth (K = 0) NK landscapes but are increasingly likely as ruggedness increases. The frequency of such mutants can be measured and give a measure of ruggedness.
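The double-mutant test in item (iii) is easy to try on simulated NK landscapes. The following is a minimal sketch under stated assumptions: binary monomers, randomly placed epistatic inputs, and a randomly chosen reference sequence rather than a pooled optimum; fitness differences stand in for free energy changes. The names `make_nk`, `flip` and `nonadditive_fraction` are hypothetical.

```python
import itertools
import random

def make_nk(n, k, rng):
    """Binary NK landscape with k random epistatic inputs per site (lazy tables)."""
    inputs = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{} for _ in range(n)]
    def fitness(seq):
        total = 0.0
        for i in range(n):
            key = tuple(seq[j] for j in inputs[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n
    return fitness

def flip(seq, *sites):
    """Return seq with the listed sites mutated (0 <-> 1)."""
    return tuple(1 - s if i in sites else s for i, s in enumerate(seq))

def nonadditive_fraction(n, k, rng, tol=1e-9):
    """Fraction of site pairs whose double-mutant effect differs from the sum of single effects."""
    fitness = make_nk(n, k, rng)
    ref = tuple(rng.randint(0, 1) for _ in range(n))  # assumption: random reference sequence
    f_ref = fitness(ref)
    single = [fitness(flip(ref, i)) - f_ref for i in range(n)]
    pairs = list(itertools.combinations(range(n), 2))
    nonadditive = sum(
        abs((fitness(flip(ref, i, j)) - f_ref) - (single[i] + single[j])) > tol
        for i, j in pairs
    )
    return nonadditive / len(pairs)

for k in (0, 1, 3, 11):
    frac = nonadditive_fraction(12, k, random.Random(k))
    print(f"K = {k:2d}: fraction of nonadditive double mutants = {frac:.2f}")
```

The tolerance only absorbs floating-point rounding; in a laboratory setting it would be replaced by a threshold set by measurement error.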

B. Levitan

Finally, it may also be possible to extract useful data from the number and variety of consensus sequences found by iterative selection/amplification methods. The change in these data as mutation rates are changed may be particularly useful, though the relevant mathematics has not been developed.

Search on spin glass-like landscapes

The limited space for this review precludes a proper review of the extensive literature on this subject. I will confine this section to results concerning setting selection stringency and mutation rate, as these issues are critical to any laboratory implementation of search. There are four relevant bodies of literature: adaptive walk studies, theoretical immune system studies, formal spin glass studies, and theoretical genetic algorithm studies. Much of the adaptive walk literature has already been discussed above.

General issues

As discussed throughout the section on Laboratory technique-based models and search strategies, search without mutation benefits from an increasing selection stringency strategy. However, a constantly increasing stringency may not be appropriate when using point mutation. With mutation, there is a tradeoff between retaining the best binders found so far and searching unexplored regions of sequence space [4,35,64,67,80,110,129–131]. The shape generally assumed for the affinity distribution p(Ka) (see above) causes the vast majority of new mutants to be of lower affinity than the species already in the population. Retaining these mutants is important for search, as successive point mutations through low affinity regions of sequence space are one means of finding high affinity regions (‘dark horses’, as they are called by Yu and Smith [132]). A high stringency will remove the new mutants before there is an opportunity for successive mutation.
Too high a stringency may also remove the rare high affinity mutants before they have a chance to amplify, particularly after several rounds when the population has a high bulk affinity. However, too low a stringency relative to the mutation rate will result in landscape exploration at the cost of losing the good members already found, and a population centered around even a very high affinity peak in sequence space will ‘melt’ all over the landscape [80,129,131,133]. Thus, rather than constantly increasing stringency, searches with mutation may be better served with intermediate stringencies, at least until the last few rounds of the search. With crossover, there is the additional concern of population diversity [47,134–139]. One role of crossover is to add to the library sequences very distant (Hamming distance) from those already in the population. For this to happen, the library must consist of sufficiently diverse species so the recombined fragments differ. With just a few species in the population (low diversity), little is gained by recombining fragments of their sequences. High selection stringency can rapidly lessen population diversity. Point mutation can help retain diversity, but too high a mutation rate (beyond what is called the error threshold [80,129,133]) can result in the same population melting as caused by too low a selection stringency. Clearly, the relationships between mutation rate, crossover rate and selection stringency are very complex, and a theoretical approach is necessary to control them. The theory to calculate these relationships, particularly in the context of applied molecular evolution, is still sparse. The remainder of this section highlights some potentially very useful concepts and approaches for experiment protocol design. The summary of literature here is by no means complete.
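To give a feel for the error threshold mentioned above, the following sketch uses the standard single-peak quasispecies approximation, in which a master sequence of length L with selective superiority sigma is retained only while sigma (1 − p)^L > 1, where p is the per-site mutation rate. This approximation and the parameter names are assumptions brought in for illustration; the review itself does not derive them.

```python
import math


def error_threshold(superiority: float, length: int) -> float:
    """Largest per-site mutation rate p satisfying superiority * (1 - p)**length >= 1.

    superiority: replication advantage of the master sequence over the rest
    of the population (assumed > 1); length: number of sites in the sequence.
    """
    if superiority <= 1.0 or length < 1:
        raise ValueError("need superiority > 1 and length >= 1")
    return 1.0 - superiority ** (-1.0 / length)


def approx_error_threshold(superiority: float, length: int) -> float:
    """Common small-p approximation: p_max ~ ln(superiority) / length."""
    return math.log(superiority) / length


# Hypothetical example: a 100-site sequence with a tenfold advantage.
p_max = error_threshold(10.0, 100)
p_approx = approx_error_threshold(10.0, 100)
```

The inverse dependence on sequence length is the practical point: for a fixed selective advantage, longer ligands tolerate proportionally lower per-site mutation rates before the population melts.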

Applied molecular evolution

Spin glass and adaptive walk approaches

When there is no selection pressure at all, corresponding to all sequences having the same fitness (a flat fitness landscape), mutation results in the population walking randomly in sequence space [34,129]. If the initial population consists of randomly selected sequences, the population eventually is whittled down into families, each composed of descendants of one of the original sequences. Such families are known as quasispecies, which can be viewed as a small cloud of points in sequence space centered about a master sequence [80]. A quasispecies is analogous to a consensus sequence. With a flat landscape, the quasispecies meander throughout the landscape, as all sequences are selected and amplified at the same rates. However, after a number of generations on the order of n (for a library of size n), all sequences are members of the same family, with a comparatively small dispersion about the master sequence. The master sequence continuously wanders about the landscape, though, as there is no selection pressure. As selection stringency is increased, the number of quasispecies a population splits into increases, and the locations of the quasispecies become fixed [34,129]. High selection stringency can be viewed as breaking a sequence space up into allowed and forbidden regions. The allowed regions contain sequences with sufficient fitness to survive selection, and the forbidden regions contain sequences that almost always are not selected. Whenever a mutation moves a sequence from an allowed to a forbidden region, that sequence is essentially lost from the population, and the effect is to trap sequences within an allowed region. So one effect of high selection on moderate to very rugged landscapes is to partition the population into quasispecies in different, isolated allowed regions. Selection stringency can thus be viewed as also controlling the variety of consensus sequences that are found in a search.
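The collapse of a population into a single family on a flat landscape can be illustrated with a small resampling simulation. The sketch assumes a fixed library size and uniform random resampling each generation (a Wright-Fisher-style model); it tracks only which founder each member descends from, since mutation does not change descent. These modeling choices are assumptions for illustration, not the procedure of Refs. 34 or 129.

```python
import random


def generations_to_single_family(n: int, rng: random.Random) -> int:
    """Generations until all n members descend from one founding sequence.

    With no selection pressure every member is equally likely to be copied
    into the next generation, so family labels are resampled uniformly.
    """
    labels = list(range(n))          # one family per founding sequence
    generations = 0
    while len(set(labels)) > 1:
        labels = [rng.choice(labels) for _ in range(n)]
        generations += 1
    return generations


# Average over a few runs for a hypothetical library of 50 sequences.
rng = random.Random(0)
n = 50
runs = [generations_to_single_family(n, rng) for _ in range(20)]
average = sum(runs) / len(runs)
```

In this toy model the average scales with n, consistent with the statement that a single family takes over after a number of generations on the order of the library size.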
A different approach to setting mutation rates comes from studies of the immune system [67,113,130,140,141]. These studies show that the mutation rate schedule that maximizes affinity towards antigens in somatic mutation of B cells is one in which periods of high mutation rates alternate with periods of mutation-free growth. Such a strategy performs significantly better than constant rate schemes. During a high mutation rate generation, a large variety of new mutants are formed, providing an opportunity to explore the landscape. During the subsequent mutation-free round, the higher affinity members are preferentially amplified without risk of loss from mutation. Once they are amplified, there is less risk of losing the good binders during a later high mutation rate round. The theory that needs to be developed here is how to set the high mutation rate, how many mutation-free rounds should be followed by how many high mutation rounds, and how selection stringency should be varied during the course of these rounds.

Another approach is possible if the affinity landscape correlation is known. The mutation rate in a given round can be based on the average fitness of the library in that round. The optimal mutation distance (see above) for a correlated landscape depends on the fitness of the point being mutated and can be converted into a mutation rate. More generally, if there is an estimate of the distribution of fitnesses around a population’s average fitness as a function of number of rounds, theoretically a distribution of optimal mutation distances could be calculated, which could yield a mutation rate schedule for an evolving population.

Genetic algorithm approaches

Genetic algorithms (GAs) are a computer search technique very similar in approach to laboratory selection/amplification methods such as phage display and SELEX [135–137,142,143]. In a GA, candidate solutions to a problem are typically represented as binary strings (analogous to biopolymers). A random population of strings is generated (library design) and undergoes cycles of selection and amplification with both mutation and crossover. The fitness of a string reflects how well its encoded solution solves the problem. Because of this similarity to laboratory methods, applied molecular evolution methods can benefit from GA theoretical work. Generally, GA parameters have been chosen with ad hoc rules which are then refined by repeatedly running the GA on the problem. Newer approaches try to use fitness function properties to set GA parameters [85,120,144] and allow for increasing stringency or decreasing mutation rate schedules [120,134,145], though the best means of choosing the schedule is still unclear. It seems difficult to apply these approaches to the laboratory. However, recent approaches to studying GAs based on statistical mechanics show great promise for addressing laboratory protocol issues. In these approaches, an approximation to the average fitness distribution of the population p(f) over generations is calculated in terms of population size, selection stringency, mutation rate, crossover rate and other GA parameters. p(f) is approximated using cumulants, statistical measures of a probability distribution whose evolution over generations of the GA can be calculated. The p(f) approximation allows calculating average population fitness, average best fitness in the population, and other useful properties. There are also approaches to finding exact solutions to GA dynamics, though solving them is much more difficult than solving the statistical mechanical approximation [45,146]. However, a new approach to solving GA dynamics is much more analytically tractable and may be very applicable to applied molecular evolution [147].
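The GA cycle described above (random library, selection, amplification with crossover and point mutation) can be sketched in a few lines. The toy fitness function (number of 1 bits), truncation selection, one-point crossover and all parameter values are assumptions chosen for brevity; they do not correspond to any specific study cited here.

```python
import random


def onemax(bits):
    """Toy fitness: number of 1s in the string (stand-in for affinity)."""
    return sum(bits)


def evolve(pop_size, length, rounds, keep_fraction,
           mutation_rate, crossover_rate, rng):
    """Run a minimal GA and return the mean fitness after each round."""
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(rounds):
        # Selection: keep the top fraction (smaller fraction = higher stringency).
        population.sort(key=onemax, reverse=True)
        survivors = population[:max(2, int(keep_fraction * pop_size))]
        # Amplification back to full size, with crossover and point mutation.
        offspring = []
        while len(offspring) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            child = list(parent_a)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, length)      # one-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            offspring.append(child)
        population = offspring
        history.append(sum(map(onemax, population)) / pop_size)
    return history


history = evolve(pop_size=100, length=30, rounds=30, keep_fraction=0.3,
                 mutation_rate=0.01, crossover_rate=0.7, rng=random.Random(1))
```

Here keep_fraction plays the role of selection stringency and mutation_rate that of the per-site error rate, so stringency or mutation schedules could be explored by making either a function of the round number.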
The formal statistical mechanical approach was developed by Shapiro and Prügel-Bennett [47]. In Ref. 47, the cumulants were solved for two simple fitness functions and the simplifying assumption of maximum entropy, in which it is assumed that the population is as diversified over sequence space as possible given the constraint of p(f). Extensions of this approach allow for (i) using fitness landscape statistical measures instead of the maximum entropy assumption [46]; (ii) more complex landscape problems [46,138]; and (iii) using the average correlation between population members in addition to the cumulants to characterize the p(f) [138]. This last measure is useful for gauging the effectiveness of crossover. By incorporating performance measures of relevance to applied molecular evolution, these results can be applied to laboratory protocols. For example, with the sample size (see above) and the p(f) estimate (analogous to p(Ka)), one can calculate performance measures based on the probability of finding ligands with affinity above a given threshold in terms of all parameters in the model. These newer GA studies also suggest approaches for deriving selection stringency and mutation rate schedules. Shapiro and Prügel-Bennett demonstrate a mutation schedule that maximizes the average fitness of the best member in a population [139]. Back describes a means to set a mutation rate schedule based on calculating the optimal mutation distance given the population’s average fitness [120]. The derivation is for a single, simple landscape, but is generalizable in principle. There are also several procedures suggested for setting selection stringency so as to maintain sufficient population diversity for effective crossover [134,138,139]. All these approaches are potentially applicable to the analogous issues in applied molecular evolution.
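As a small illustration of the quantities involved, the sketch below computes the first four cumulants of a sample of fitnesses and then uses only the first two (a Gaussian truncation) to estimate the chance that at least one of a given number of sampled ligands exceeds a threshold. The Gaussian truncation, the use of plain sample moments rather than unbiased estimators, and the function names are assumptions for illustration; this is not the calculation of Ref. 47.

```python
import math


def cumulants(fitnesses):
    """First four cumulants (k1..k4) from plain sample moments."""
    n = len(fitnesses)
    k1 = sum(fitnesses) / n

    def central(r):
        return sum((f - k1) ** r for f in fitnesses) / n

    m2, m3, m4 = central(2), central(3), central(4)
    return k1, m2, m3, m4 - 3.0 * m2 ** 2


def prob_above(threshold, k1, k2):
    """P(f > threshold) if p(f) is approximated by a Gaussian with mean k1, variance k2."""
    return 0.5 * math.erfc((threshold - k1) / math.sqrt(2.0 * k2))


def prob_any_above(p_single, sample_size):
    """Chance that at least one of sample_size independent draws exceeds the threshold."""
    return 1.0 - (1.0 - p_single) ** sample_size


k1, k2, k3, k4 = cumulants([1.0, 2.0, 3.0, 4.0, 5.0])
```

Nonzero k3 and k4 signal skew and heavy or light tails, which is exactly where a Gaussian truncation would misestimate the probability of rare high affinity ligands.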


Search on RNA secondary structure landscapes

Search on RNA secondary structure landscapes is distinctly different from search on the spin glass-like models. The difference is a result of the neutral networks that percolate the space. Note that, in practice, the sequences on neutral networks need not have exactly the same fitness, but fitnesses whose differences are below a threshold determined by the mutation rate and noise in the system. As with search on spin glass landscapes, this topic is quite extensive and is reviewed in several papers [39,67,69,113] as well as in Schuster’s contribution to this collection, so I will only touch on a few key points. Exploration of unexamined regions of sequence space by point mutation is less risky on neutral net landscapes than on landscapes without neutral nets [64]. Since a library is of finite size, using a portion of the library to explore unexamined regions depletes that portion which stores the best fitnesses found so far. Because the exploring members will in general wander quite far from the highest fitness sequences found, too much exploration poses the risk of losing the results of the search. Neutral nets, however, are in a sense similar to an extended point that percolates throughout the space. All points on a net have the same fitness/structure but different genotypes. The 1-mutant neighbors of such extended structures are much greater in number than the 1-mutant neighbors accessible to individual points. The many different neighboring structures of a high fitness net already found can be explored without losing representatives of the high fitness net, an idea similar to those expressed in Refs. 38 and 148. The number of new secondary structures accessible by single base mutations off a neutral net increases linearly with distance traveled during a random walk along the neutral net [64].
This result holds even when some sites are constrained to certain bases, in an effort to partially simulate additional constraints in primary structure. Percolation along neutral nets also prevents populations from being caught in evolutionary traps such as local optima, the major problem for search on spin glass-like landscapes. The accessibility of many novel structures one point mutation off most neutral nets is not simply a consequence of a net’s percolating through sequence space. Different nets could shadow one another to a great extent, limiting the number of different structures easily accessed. Instead, networks meander so that the networks of the vast majority of all common structures are within only one point mutation of at least one point on the networks for every other common structure [64,75]. To control exploration in neutral net spaces, it is beneficial to think in terms of two mutation thresholds [107,110,149]. The genotypic error threshold is the mutation rate above which a population of sequences will percolate along a high fitness neutral net, but will not drift into nearby lower fitness nets. Genotype information is rapidly lost due to percolation, but the structure, and hence fitness, is maintained. The phenotypic error threshold is the mutation rate above which structural information is also lost due to genetic drift. The sequences migrate to nearby, lower fitness nets because selection pressure is overwhelmed by the error rate. Search in RNA sequence space can be regulated by adjusting a mutation rate between these thresholds, allowing for varying degrees of exploration and retention of known structures. Steps towards calculating these thresholds are described in Refs. 67, 107 and 149. Search in neutral net landscapes shows several other properties [110]: (i) An evolving population will often split into subpopulations which diffuse independently on different neutral nets. (ii) The ability to explore vast regions of sequence space without loss of a high fitness structure makes the results of search much less sensitive than in nonneutral net landscapes to which particular sequences were in the initial population. (iii) Autocorrelation measures are misleading measures for neutral net landscapes. Autocorrelation averages the changes in fitness from a point to its neighbors in all directions. This averaging completely hides the neutral net structure. Depending on how fitnesses are assigned to the networks, these landscapes can appear very rugged using a standard autocorrelation measure, even though the trajectories followed by mutating sequences subject to selection pressure may be highly correlated in fitness.

Pooling strategies

Pooling strategies, also known as iterative deconvolution methods, are a variant of evolutionary search methods [65,150–152]. The general technique is to iteratively partition a sequence space into pools with no sequences in common. The pools are tested for the property of interest, and the best pool is partitioned into subpools for the following cycle. For example, a hexapeptide library can be partitioned into 400 pools by setting the first two sites each to one of their 20 values and randomizing the remaining four. The pool with the highest measured bulk affinity can then be partitioned into 400 subpools differing in the next two sites. Testing these subpools and then partitioning and testing the best identifies a unique, high affinity sequence (the pool optimum) [151]. There are many variations on the pooling approach. While they work well in practice, they all suffer from the same limitation: the binding capacity of a pool reflects its bulk affinity more than the affinity of its best member. Consequently, a pool with several high or near-optimal binders may appear superior to one with a better binder, and the wrong pool may be used for the next cycle.
The degree of interference caused by these suboptimal binders determines the quality of the search. Kauffman and Macready [65] examined pooling strategies on NK fitness landscapes. They simulated two pooling strategies that differed in the order in which amino acid sites were used to partition the best pool each cycle (order of unrandomization) and the number of subpools they were partitioned into. Results were qualitatively the same for both. A pool’s fitness was defined as the average fitness of its members. Results depended heavily on landscape ruggedness. Both mean and highest fitnesses in the best pool increase each cycle (Fig. 19). For highly correlated landscapes (K = 0), the fitness distribution retained the same form and shifted up the same distance each cycle (Fig. 19a). With increasing ruggedness (increasing K), there were conflicting trends: the distribution widened each cycle, improving the fitness of the best member of the best pool. However, the mean increased substantially less in the early cycles, lowering the fitness of the best member (Fig. 19b). These conflicting trends have not been examined in detail, but they strongly imply that a pooling strategy (number of pools, size of pools, etc.) can be optimized for a problem if landscape ruggedness is estimated in advance. The simulations also show that pooling generally does not result in local peaks. Adaptive walks via point mutations from pool optima found local peaks of higher fitness. Interestingly, if the best two pools in each cycle were retained, recombination between the 2^c (c cycles) resulting pool optima generally only found points of lower fitness than the pool optima. But when the recombinants were then used as the start for adaptive walks to local peaks, sequences were found whose average fitness exceeded that resulting from adaptive walks directly from the pool optima. Kauffman et al. interpret these results to mean that pooling tends to find points near regions of the landscape with many high local peaks, but generally does not find those peaks.

Fig. 19. Distributions of fitnesses of the best pool during successive cycles (iterations) of a pooling strategy. Simulations are on an NK landscape with N = 6 and 20 monomers per site, corresponding to a hexapeptide. The initial 400 pools are fixed at sites 3 and 4. The second pools are fixed at sites 2 and 5, and the third pools are fixed at sites 1 and 6. (a) K = 0; (b) K = 3. (From Ref. 65.)

In contrast to these results, Freier et al. [150] show that deconvolution is very successful in finding the best or near-best member of a library. They simulated deconvolution of RNA libraries with affinities given by an RNA hybridization model modified to give affinity estimates. The pooling strategy was similar to those studied by Kauffman et al. [65], but they allowed for many different orders of unrandomizing the library, different-sized targets with multiple alignment sites possible between ligands and target, and measurement error. Pool performance was defined as the average affinity of the pool.
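The iterative deconvolution procedure, with pool performance defined as the average over pool members as in both simulation studies, can be sketched on a toy landscape. The landscape here is purely additive (each site contributes independently, comparable in spirit to K = 0), the alphabet is reduced to four monomers to keep the enumeration small, and all names are hypothetical; it is not a reimplementation of Refs. 65 or 150.

```python
import itertools
import random


def additive_landscape(length, alphabet, rng):
    """Toy landscape: fitness is the sum of independent random site contributions."""
    table = [[rng.random() for _ in range(alphabet)] for _ in range(length)]
    return lambda seq: sum(table[i][m] for i, m in enumerate(seq))


def deconvolute(fitness, length, alphabet, site_order):
    """Fix sites group by group, always keeping the pool with the best mean fitness."""
    fixed = {}
    for sites in site_order:
        free = [i for i in range(length) if i not in fixed and i not in sites]
        best_score, best_choice = None, None
        for choice in itertools.product(range(alphabet), repeat=len(sites)):
            trial = {**fixed, **dict(zip(sites, choice))}
            total, count = 0.0, 0
            # Enumerate every member of this pool (all fillings of the free sites).
            for fill in itertools.product(range(alphabet), repeat=len(free)):
                member = dict(trial)
                member.update(zip(free, fill))
                total += fitness([member[i] for i in range(length)])
                count += 1
            score = total / count            # pool performance = average fitness
            if best_score is None or score > best_score:
                best_score, best_choice = score, choice
        fixed.update(zip(sites, best_choice))
    return tuple(fixed[i] for i in range(length))


f = additive_landscape(6, 4, random.Random(0))
# Zero-based version of the order in Fig. 19: sites 3 and 4, then 2 and 5, then 1 and 6.
pool_optimum = deconvolute(f, 6, 4, [(2, 3), (1, 4), (0, 5)])
global_optimum = max(itertools.product(range(4), repeat=6), key=f)
```

On an additive landscape the pool average differs between pools only through the fixed sites, so the pool optimum coincides with the global optimum; replacing `additive_landscape` with a rugged model is where the interference from suboptimal binders discussed above would appear.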


For a library of nine-base-long (9-mer) ligands hybridizing to 9-mer or 18-mer targets, deconvolution always found the best binder for any order of unrandomization. Against a 6-mer target, the selected ligand varied with order of unrandomization, but its affinity was always within a factor of 5 of that of the best binder. This difference is attributed to the large variety of alignments possible between a 9-mer ligand and a 6-mer target. The additional alignments skew the ligand affinity distribution towards higher affinities, so that there are many more suboptimal binders to interfere with the search. For deconvolution with measurement error (simulated by adding Gaussian noise to the affinities), deconvolution was far more likely to find suboptimal binders, but these binders almost always had affinities close to that of the best binder. Measurement error is much less of a problem for the 6-mer target, because there are so many suboptimal binders that erroneously selected pools also give good, if not optimal, results. Kauffman et al.’s simulations imply that pooling strategies tend to find points that are not local optima but are located around a region of sequence space in which many good local optima are located. However, Freier et al.’s simulations show that pooling strategies can often find the global optimum or points with fitnesses very near that of the global optimum. There are at least three possible causes for this discrepancy: (i) The different fitness/affinity distributions. The NK model gives a Gaussian p(f), while the RNA hybridization model gives a log-normal-like distribution p(Ka). The tail of a Gaussian distribution tapers much more quickly than that of a log-normal, resulting in many more suboptimal binders to interfere with the search. (ii) The different fitness landscape models. The affinity landscape in Freier et al. is likely a neutral net landscape, as it is based on an RNA secondary structure prediction model (see above).
Search on NK landscapes is quite different from search on neutral net landscapes (see above). (iii) The different sizes of the sequence spaces. The smaller space in Freier et al. (209 RNAs) is much easier to search than that in Kauffman et al. (20100 peptides). A potential difficulty with both studies is that a pool’s average affinity differs from its bulk affinity. Since the key to success is whether the best pool is selected each cycle, this is an important consideration. The definition of ⟨Kd⟩, the bulk dissociation constant in SELEXION (Eq. 3), indicates that the bulk affinity is approximately the average affinity when the post-equilibrium free target concentration is small [20]. In general, as this free target increases, the observed bulk affinity decreases from the average affinity. It is not obvious that the relative performance of two pools cannot be changed with different target concentrations or other reaction equilibrium conditions. For deconvolution in a laboratory, these studies suggest: (i) Landscape ruggedness can be estimated by examining the changes in the fitness/affinity distribution of the best pool over cycles. For example, data in Ref. 152 show that pool mean affinities increase each cycle, suggesting that peptide landscapes are at least moderately correlated. (ii) Varying a pooling algorithm to include more than the single best pool each cycle will yield better pool optima. (iii) Rounds of uphill-search via point mutations will improve on the sequences that result from pooling. (iv) When keeping more than the best pool each cycle, rounds of uphill-search via point mutations after recombination of pool optima can give even better sequences. These studies also indicate areas needing further study: (i) Simulation or mathematical analysis of pooling using the correct pool bulk affinity instead of approximating it with average affinity. (ii) Determination of the degree to which pooling results depend on sequence length and the number of monomers at each site. (iii) Characterization of performance and tradeoffs as affinity landscape ruggedness is varied and how the deconvolution strategy can be matched to ruggedness.

Conclusions from landscape-based models

The landscape-based search methods show great promise for designing laboratory search protocols. With sufficiently advanced theory, they can give selection stringency schedules, mutation rate schedules, expected number of rounds to achieve a given search goal, pool sizes for deconvolution and other important laboratory parameters. The theory may also be able to characterize properties of true molecular affinity landscapes, either based on the data collected during such experiments or in specially designed experiments. To achieve this promise, two important limitations in the present literature need to be overcome. The first limitation is that the landscape models are very abstract. Their results apply to molecular search in a general way, but are difficult to relate to laboratory concerns. Ideally, future work will combine the mathematical rigor of landscape-based search with the chemical and experimental details of the laboratory technique-based models. Some work along these lines has started with calculating mutation rates for SELEX schemes based on RNA secondary structure landscape models [114]. The second limitation, which applies to both landscape-based and laboratory technique-based models, concerns the difficulty of applying models for a generic molecular search problem to a particular search problem. Biophysical and other arguments can make guesses for properties such as landscape correlation and affinity distribution p(Ka), but it is unknown how well these generic properties can be used to optimize a search strategy for a particular ligand and class of targets.
The relevant properties can always be measured in specially designed experiments, but the cost may be prohibitive and not offset by the benefits of using the model. Several issues stand out as important for advancing theory for landscape-based models:

(1) The REM, NK and p-spin models all are attempts to capture the important statistical properties of true molecular landscapes in a simple model. Because they contain no biophysical information, they are limited in how well they can achieve this. The block model is an important step in removing some of the simplifications in these models, as it allows for nonstationary properties that can be matched to different regions of molecules. Ideally, landscape models can be based on experimental data. Unfortunately, despite the tremendous interest in molecular optimization, there is still relatively little data that can be used this way. As more data are collected on the effects of substitutions in protein structural and loop regions, antibody CDRs and framework regions, etc., a block or other type of model can be developed that uses appropriate fitness functions for each block. Combined efforts by theoreticians and experimentalists may also help devise experiments that measure key true affinity landscape properties without excessive laboratory effort.

(2) The mathematical analyses of evolutionary dynamics on spin glass-like landscapes and the statistical mechanics and exact solution approaches to genetic algorithms can be related to the techniques and parameters used in laboratories. If performance measures of importance in applied molecular evolution are related to the approximated affinity distribution given by the mathematics, it is possible to compare and perhaps find optimal selection stringency and mutation rate schedules.

(3) The statistical mechanics approaches to genetic algorithm dynamics can be applied to the newly developed technique of sexual PCR, in which nucleotide sequences can be recombined at multiple sites [153,154].

(4) The optimal mutation distance approach [119,120] can be generalized for arbitrary classes of fitness landscapes and populations of molecules.

(5) The pooling strategies should be reexamined with the correct bulk affinity calculations and the areas for further study outlined above.

(6) In most drug design problems, finding a lead molecule is followed by the perhaps even more difficult process of lead optimization. If the properties that define the optimized molecule can be measured in vitro, the lead design and optimization searches can be combined into a single search. In landscape terms, this corresponds to searching several fitness landscapes simultaneously. There is a large body of literature on such multiobjective optimization problems, but the techniques are more appropriate for computer simulations than for implementation in chemistry. However, there is a growing body of literature for multiobjective optimization with genetic algorithms [155]. These methods are much more amenable to laboratory implementation. Considering multiobjective optimization in the context of applied molecular evolution offers exciting possibilities for finding molecules with several desired properties much more rapidly than at present.

(7) All the models in this review make the simplifications that only one type of ligand binds a target and that there is only one binding site per target molecule. This approximation is valid for small targets that only have room to bind one ligand at a time. However, when the target is a lengthy sequence or, as is becoming more common, an entire bacterium or cell, the approximation can lead to false results. The equilibrium and washing equations, such as those derived in the laboratory technique-based models, can be extended for multivalent ligands and multivalent targets with several different classes of binding sites on both ligands and targets. This would have the effect of merging the issues related to multivalent binding with those of multiobjective optimization.
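One basic ingredient of the multiobjective searches mentioned in item (6) is identifying candidates that are not outperformed in every measured property at once (the Pareto front). The sketch below assumes all objectives are to be maximized; the candidate names and scores are hypothetical, and this generic filter is not a specific method from Ref. 155.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Names of candidates not dominated by any other candidate.

    candidates maps a ligand name to a tuple of scores (higher is better),
    for example (affinity score, score for a second in vitro property).
    """
    return {name for name, score in candidates.items()
            if not any(dominates(other, score)
                       for other in candidates.values())}


# Hypothetical ligands scored on two properties.
ligands = {"A": (9, 1), "B": (5, 5), "C": (4, 4), "D": (1, 9)}
front = pareto_front(ligands)
```

In a selection experiment, retaining the whole front rather than the single best binder is one way to search several landscapes simultaneously without committing early to a fixed weighting of the properties.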

Conclusions

While I have made a clear distinction between laboratory technique-based and landscape-based models, the distinction is more artifactual than representative of fundamental differences. The laboratory technique-based models do not include mutation or crossover, so the only ‘landscape property’ they depend on is the affinity distribution p(Ka). Once mutation is included, some type of relationship between specific sequences and their affinities must be included. Landscapes are one means of including this relationship. Work with landscape-based models does not include laboratory techniques or parameters because the questions posed in this work do not require this added level of complexity and because of the paucity of experimental data to define actual affinity landscapes. If the landscape work is to solve actual laboratory protocol problems, the laboratory and chemistry details need to be included. Ideally, future work will include mathematically rigorous analyses of landscape-based models that incorporate chemical and experimental details.


Acknowledgements

I would like to thank Martijn Huynen, Stuart Kauffman, William Macready, Wlodek Mandecki, Melanie Mitchell, Richard Palmer, Alan Perelson, Christian Reidys and Bill Tozier for many helpful discussions on the material in this review and for reviewing early versions of the manuscript. I especially thank Margaret Alexander for assistance in collecting the papers for this review and Simon Fraser for help in crafting some figures.


Molecular evolutionary biology

Peter Schuster

Institut für Theoretische Chemie und Strahlenchemie der Universität Wien, Währingerstrasse 17, A-1090 Wien, Austria, and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, U.S.A.

Summary

Molecular evolution is described by a comprehensive model based on partitioning the complex dynamics of evolution into three simpler processes: (i) population dynamics; (ii) population support dynamics; and (iii) genotype–phenotype mapping. Only in the case of evolution of molecules in the test tube can all three processes be analysed in sufficient detail. Global relations between RNA sequences and secondary structures are studied by means of computer algorithms and analysed with a mathematical model based on random graphs. Three general results have implications for evolution experiments: (i) the existence of relatively few common and many rare structures; (ii) covering of shape space by small and connected regions of sequence space; and (iii) the existence of extended neutral networks in sequence space. Generalizations of the results derived from RNA secondary structures to three-dimensional structures and proteins are discussed.

Introduction

Evolutionary biology has traditionally dealt with phylogeny derived from comparative morphology of species and with the adaptation of species to their environment. This conventional macroscopic approach has been complemented by molecular biology since the early days of the discovery of the structure of DNA. Reconstruction of phylogenetic trees by means of molecular sequence data was the first issue of molecular evolution and, indeed, many details that are inaccessible to macroscopic biology were revealed by inspection and analysis of molecular data. A spectacular early discovery was the phenomenon of random drift of populations in a space of genotypes, called sequence space, by Kimura [1], who characterized this as neutral evolution. It occurs when the genotypes have identical fitness and hence are neutral with respect to selection. Almost at the same time, Spiegelman [2] and Eigen [3] laid the experimental and theoretical foundations of molecular evolution in the sense of evolution in vitro. These studies demonstrated that evolutionary phenomena are not a privilege of cellular life: nucleic acid molecules, when brought into suitable environments that sustain replication, show all features of Darwinian evolution. In particular, they are capable of variation through mutation; occasionally even recombination is observed. Populations of RNA molecules optimize fitness and adapt to changes in the environment. Since these early days in the late 1960s, molecular evolutionary biology has become an established topic in modern biology and has found direct practical applications in the form of its youngest daughter: evolutionary biotechnology or applied molecular evolution [4–6].

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 153–168 © 1997 ESCOM Science Publishers B.V.

The goal of evolutionary biotechnology is to design and optimize biomolecules according to Darwin's principle of variation and selection of the best-suited variants. The current progress in evolutionary biotechnology is strongly dependent on the state of the art of fundamental research in molecular and evolutionary biology. Planning and control of efficient selection experiments require detailed knowledge of the mechanisms of molecular evolution. Without a solid background, applied molecular evolution is doomed to fail, just as chemical engineering would be without chemical reaction kinetics. Thus, there is a strong need for further development of evolutionary theory. Two characteristics of molecular evolution are fundamentally different from chemical kinetics: (i) the potential numbers of molecular species, in particular polynucleotides and proteins, are extremely large; and (ii) template-induced replication reactions are inherently autocatalytic. The first issue is indeed prohibitive for systematic studies based on screening of all possible variants. For example, there are more than $10^{120}$ different RNA sequences of chain length n = 200. All these sequences are in principle accessible through consecutive (point) mutations. A population of nucleic acid molecules can thus cover only a tiny fraction of all possibilities. Amplification through autocatalysis is the basis of all evolutionary phenomena: even small differences in fitness increase steadily and may lead to complete loss of less fit types within a sufficiently large number of generations. Replication assays are nowadays available for DNA and RNA. Provided one succeeds in preparing an optimal gene, it can be readily amplified for any kind of further use. This chapter is concerned with a recently developed concept for the illustration and analysis of evolutionary processes at the molecular level [7].
The model makes several predictions for efficient strategies in the evolutionary design and optimization of biomolecules. The text is organized in the following way: at first we present the new view on evolutionary dynamics, which is based on an appropriate partitioning of evolutionary phenomena into simpler processes that can be studied at the molecular level. Our assumption can also be understood as an extension of population genetics, which visualizes evolutionary migration of populations in the space of genotypes and deals explicitly with relations between genotypes and phenotypes. Next, we present results derived from RNA secondary structures as a suitable and sufficiently simple model of genotype–phenotype relations. In vitro evolution of RNA molecules is chosen as an appropriate example to demonstrate the applicability of the theoretical concept to the problem of optimization of molecular properties. A generalization of the results from the RNA model to biopolymers in general concludes the chapter.
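The size estimate quoted in this introduction for RNA sequences of chain length n = 200 can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original chapter: the function names are illustrative, and the population size of 10^15 molecules is a hypothetical value chosen only to show how small the covered fraction of sequence space is.

```python
import math


def sequence_space_size(chain_length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of the given length (exact integer)."""
    return alphabet_size ** chain_length


def log10_sequence_space_size(chain_length: int, alphabet_size: int = 4) -> float:
    """Base-10 logarithm of the sequence space size."""
    return chain_length * math.log10(alphabet_size)


if __name__ == "__main__":
    n = 200
    print(f"log10(4^{n}) = {log10_sequence_space_size(n):.2f}")

    # Hypothetical population size, chosen only for illustration.
    assumed_population = 10 ** 15
    covered = math.log10(assumed_population) - log10_sequence_space_size(n)
    print(f"log10(fraction of sequence space covered) = {covered:.2f}")
```

Working with logarithms keeps the comparison readable; Python integers are exact, so `sequence_space_size(200)` can also be compared directly with `10**120`.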

Evolutionary dynamics

Evolution is an overwhelmingly complex phenomenon, no matter whether it is concerned with molecules or with organisms. In order to set the stage for understanding, analysing, and modeling evolution, we partition its dynamics into three distinct processes (Fig. 1), each of which can be studied independently of the other two [7]: (i) population dynamics; (ii) (population) support dynamics; and (iii) mapping of genotypes into phenotypes.

Fig. 1. Partitioning of the complex phenomenon of evolution into three simpler processes. Population dynamics is tantamount to population genetics of asexually reproducing haploid populations or chemical kinetics of polynucleotide replication. Population support dynamics describes the migration of populations in genotype or sequence space during the course of evolutionary optimization. Genotype–phenotype mapping deals with the unfolding of a phenotype under the conditions and constraints provided by the environment. It represents the major source of complexity in evolution.

In molecular evolution, population dynamics is tantamount to population genetics of asexually reproducing haploid individuals or to chemical reaction kinetics of polynucleotide replication and mutation. Replication–mutation kinetics is conventionally described in a space of particle numbers or concentrations, called the concentration space. Population dynamics of evolution experiments deals with selection of the 'fittest' genotype currently present in the population. It can be visualized as a hill-climbing process on a fitness landscape in the sense of Darwin's selection principle of survival of the fittest. Fitness is optimized and an optimal state of the population is approached. When many genotypes share identical (optimal) fitness values, populations may drift between different optimal solutions in a random walk-like manner. In the limit of (infinitely) large populations, selection kinetics of polynucleotides is properly modeled by the concept of molecular quasispecies, which accounts explicitly for the influence of mutation [8]. Depending on both fitness differences and mutation rates, two different scenarios of replication–mutation kinetics are observed. At sufficiently large fitness differences or small enough mutation rates, populations approach stationary mutant distributions, so-called quasispecies. The concept of molecular quasispecies is particularly useful for a deeper understanding of virus evolution and for the development of new antiviral strategies. If the mutation rate exceeds a (computable) threshold value, no stationary state is reached. Instead, all genotypes have some finite lifetime: new genotypes are formed at every instant and old ones die out. Above the threshold, populations migrate through sequence space. In the case of vanishing fitness differences or neutral evolution, the critical mutation rate becomes zero: all populations are non-stationary. Evolution at the primitive level of molecules in the test tube already involves two different time scales, selection kinetics being the fast process, complemented by slow migration of populations through sequence space. Chemical reaction kinetics commonly deals with the interconversion of small numbers of molecular species, all (except perhaps some intermediates) being present in large numbers. The opposite is commonly true for the evolution of biopolymers: the numbers of genotypes ($4^n$ DNA or RNA sequences of chain length n) exceed by many orders of magnitude the numbers of individuals in populations. Consequently, nature and researchers carrying out evolution experiments are handling only minute fractions of the possible.
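The two regimes of replication–mutation kinetics described above can be illustrated with a small numerical sketch. The chapter does not give explicit formulas, so the code below rests on assumptions that are not taken from the text: a single-peak fitness landscape (one master sequence replicating `sigma` times faster than all others), a uniform per-nucleotide error rate `p`, and neglect of back mutation to the master sequence. Under these assumptions the stationary fraction of the master sequence is (sigma·Q − 1)/(sigma − 1) with Q = (1 − p)^n, and it vanishes at the threshold error rate 1 − sigma^(−1/n). All names are illustrative.

```python
def copy_fidelity(error_rate: float, chain_length: int) -> float:
    """Probability Q that a whole sequence is copied without any error."""
    return (1.0 - error_rate) ** chain_length


def master_fraction(sigma: float, error_rate: float, chain_length: int) -> float:
    """Stationary fraction of the master sequence (single-peak landscape,
    back mutation neglected); zero above the error threshold."""
    q = copy_fidelity(error_rate, chain_length)
    return max(0.0, (sigma * q - 1.0) / (sigma - 1.0))


def error_threshold(sigma: float, chain_length: int) -> float:
    """Per-nucleotide error rate at which sigma * Q drops to 1."""
    return 1.0 - sigma ** (-1.0 / chain_length)


if __name__ == "__main__":
    sigma, n = 10.0, 50  # assumed superiority and chain length
    print(f"threshold error rate: {error_threshold(sigma, n):.4f}")
    for p in (0.0, 0.01, 0.02, 0.04, 0.06):
        print(f"p = {p:.2f}  master fraction = {master_fraction(sigma, p, n):.3f}")
```

Below the threshold the master sequence keeps a finite stationary share of the population (the stationary quasispecies regime); above it the share drops to zero in this approximation, which corresponds to the migrating regime described in the text.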
Understanding evolution implies knowing how populations migrate through sequence space. Every sequence is a point in sequence space and the distance between two points, generally known as the Hamming distance, is the minimal number of point mutations required to convert one sequence into another. The members of populations that were grown by replication and mutation are closely related in the sense that Hamming distances between them are small. This is, of course, not the case for populations of random polymers, which are also used as initial conditions in artificial selection experiments. The support of a population is the set of genotypes actually present, irrespective of their abundance. In other words, a genotype belongs to the support no matter whether it is carried by a single individual only or by a very large group of individuals. The support is represented by a connected area (or two or more disjoint areas) in sequence space. Support dynamics describes the development of the population support in time and, in essence, complements population dynamics by extending it into sequence space. It deals, for example, with the course of adaptive evolution and, in the case of selective neutrality, with random drift. Support dynamics also provides a useful illustration of the error threshold in replication–mutation systems. Depending on mutation rates, populations converge to some defined area in sequence space and form stationary quasispecies if mutation rates are below the threshold. Alternatively, migration of populations never comes to an end and they drift randomly through sequence space. The transition from the stationary to the drifting regime occurs in the form of a surprisingly sharp threshold. General analytical approaches to support dynamics are available only for the case of selective neutrality: Kimura's theory of neutral evolution [1], or a more recent approach based on the 'random energy model' [9]. All other examples studied so far relied on computer simulation [10,11].

Genotype–phenotype mapping is a core issue of all evolutionary processes. It provides the parameters for population dynamics in the form of rate and equilibrium constants and thus assigns fitness values to individual genotypes. It deals explicitly with the internal structure of biological objects and thus represents the true source of complexity in biology. Mapping genotypes onto phenotypes is tantamount to predicting the properties of the phenotype from a known DNA (or RNA) sequence. Needless to say, such an attempt is completely out of reach at the current state of the art for organisms like bacteria or more complex forms of life. For small RNA viruses, some information is available from the interpretation of life cycles: in fortunate cases viability can be derived from structures and sequences obtained from phylogenetic comparisons. More reliable predictions are possible for in vitro evolution of RNA molecules: here phenotypes are nothing but three-dimensional molecular structures [2]. Analysis of molecular structures is therefore an important issue for molecular evolution. The search for regularities in the relations between RNA sequences and structures is the topic of the next section. Genotype–phenotype mapping of polynucleotides calls for a representation in a space of phenotypes called shape space. It is indeed possible to define quantitative measures of distance between structures that have metric properties [12]. The arrows in the diagram of evolutionary dynamics shown in Fig. 1 indicate a basic relation: each of the three processes provides the input for the next one in the cycle. Genotype–phenotype mapping passes reaction parameters to population dynamics.
Population dynamics describes how the new genotypes are created through mutation (and recombination) and how the old ones die out. It thereby determines the migration of populations in sequence space, which is the subject of support dynamics. Support dynamics eventually defines the regions of sequence space towards which populations are moving. These are the regions that are relevant in genotype–phenotype mapping.
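The Hamming distance and the population support defined in this section translate directly into code. The sketch below is only an illustration: the toy population and the helper `support_diameter` (the largest Hamming distance between two genotypes in the support) are assumptions introduced here and do not appear in the chapter.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Minimal number of point mutations converting sequence a into b."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same chain length")
    return sum(x != y for x, y in zip(a, b))


def support(population: list[str]) -> set[str]:
    """Set of genotypes actually present, irrespective of their abundance."""
    return set(population)


def support_diameter(population: list[str]) -> int:
    """Largest Hamming distance between two genotypes in the support
    (hypothetical helper, used here as a crude measure of spread)."""
    genotypes = sorted(support(population))
    return max((hamming_distance(a, b) for a, b in combinations(genotypes, 2)),
               default=0)


if __name__ == "__main__":
    # Toy population: three copies of one genotype plus two mutants.
    population = ["GGCC", "GGCC", "GGCG", "GCCG", "GGCC"]
    print(sorted(support(population)))
    print(support_diameter(population))
```

A population grown by replication and mutation would show a small diameter relative to the chain length, whereas a population of random polymers would typically spread over a much larger part of sequence space.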

The RNA model

In the simplest known example, evolution of RNA molecules in the test tube, genotype–phenotype mapping is tantamount to describing the relation between RNA sequences and structures formed under specified conditions. The conventional biophysical approach is mainly concerned with structure predictions from given sequences, properly characterized as the 'one-sequence–one-structure' problem. This approach, no matter how involved the biophysical methods are, is not suitable for a description of evolutionary phenomena, since migrating populations cover regions of sequence space. A basic knowledge of global information on sequence-to-structure mappings is required. Global findings are, for example, statistical data on the accessibility of RNA structures by folding random sequences, data on the occurrence of structures in the neighbourhoods of reference sequences, or data on inverse folding (determining the set of sequences that form a given structure). Three-dimensional structures of RNA molecules are hard to predict, since experimental information on relations between biopolymer sequences and structures is scarce. Three-dimensional structures at atomic resolution are unique in the sense that no two sequences form the same structure. Molecular biology, however, tells us that many sequences are indistinguishable with respect to biological function [1]. Thus, appropriate coarse graining of structures is required in order to be able to relate sequences to structures and properties. The primary question thus is to find an appropriate level of coarse graining that is relevant for the problem considered. In general, this question is too complex to be answered at present. Nevertheless, we can study the consequences of coarse graining on structures at established and well-defined levels of coarse graining, such as RNA secondary structures or lattice models of proteins.

Within the last five years, we have pursued a research program aiming at an exploration of global relations between sequences and secondary structures of RNA molecules. Most of the work was done with minimum free energy structures, but the results are more generally valid [12]. At first we studied statistical properties of RNA structures derived from random sequences and compared them with those of natural molecules [13]. Application of combinatorics to counting the numbers of acceptable RNA secondary structures yielded estimates which showed that we are dealing with many orders of magnitude more sequences than structures. The next problem to be investigated was obvious: how are the sequences folding into the same structure distributed in sequence space? The following studies led to three important results: the verification of the existence of common (and rare) structures [14,15], the discovery of the principle of shape space covering [14], and the detection and analysis of extended neutral networks formed by sequences folding into the same structure [15,16].

TABLE 1
DISTRIBUTION OF COMMON STRUCTURES IN THE GC30 REFERENCE CASE OBTAINED BY FOLDING ALL GC-ONLY SEQUENCES OF CHAIN LENGTH n = 30

Rank r    Size of the neutral network m    Sequence of components Σ
1         1 568 485                        1 568 485
2         1 558 807                        1 558 807
3         1 348 203                        1 348 203
4         1 343 951                        1 343 951
5         1 328 606                        1 328 606
6         1 314 207                        1 314 205 ⊕ 1 ⊕ 1
7         1 240 483                        637 048 ⊕ 603 435
8         1 230 689                        631 582 ⊕ 599 107
9         1 215 577                        626 000 ⊕ 589 577
10        1 206 046                        622 112 ⊕ 583 934
...       ...                              ...
17        962 971                          962 971
18        962 268                          500 993 ⊕ 461 292 ⊕ 1
19        961 970                          961 970
20        954 425                          531 837 ⊕ 422 588

Structures are ranked according to the sizes of their neutral networks, i.e. the numbers of sequences (m) forming them under the specified conditions. (The most common structure has rank (r) 1, followed by the structures with rank 2, 3, ..., in decreasing frequency.) The GC30 reference system has $1.073 \times 10^9$ sequences, forming altogether 218 820 different secondary structures [15]. Components of neutral networks are connected regions in sequence space (see also Fig. 3). Numbers separated by '⊕' give the sizes of the individual components.
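The decomposition of a neutral network into components, as listed in the last column of Table 1, can be sketched as a connected-components search in sequence space. The code below is an illustration under a stated assumption: two sequences are treated as neighbours when they differ by a single point mutation (Hamming distance 1). The neighbourhood relation used in the cited work may differ, for example by also allowing base-pair exchanges, and the toy network is hypothetical rather than the result of any folding computation.

```python
from collections import deque
from typing import Iterable, Iterator


def point_mutants(sequence: str, alphabet: str = "GC") -> Iterator[str]:
    """Yield all sequences at Hamming distance 1 from the given sequence."""
    for i, base in enumerate(sequence):
        for other in alphabet:
            if other != base:
                yield sequence[:i] + other + sequence[i + 1:]


def component_sizes(network: Iterable[str], alphabet: str = "GC") -> list[int]:
    """Sizes of the connected components of a set of sequences,
    sorted from largest to smallest."""
    unvisited = set(network)
    sizes = []
    while unvisited:
        start = unvisited.pop()
        queue = deque([start])
        size = 1
        # Breadth-first search restricted to sequences in the network.
        while queue:
            current = queue.popleft()
            for neighbour in point_mutants(current, alphabet):
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    queue.append(neighbour)
                    size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Hypothetical set of sequences assumed to fold into the same structure.
    toy_network = {"GGGG", "GGGC", "GGCC", "CCCC"}
    print(" ⊕ ".join(str(s) for s in component_sizes(toy_network)))
```

Enumerating the point mutants of each sequence and testing set membership avoids comparing all pairs of sequences, which matters for networks with more than a million members such as those in Table 1.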


TABLE 2
DISTRIBUTION OF RARE STRUCTURES IN THE GC30 REFERENCE CASE

Size of the neutral network m    Number of different structures N(m)
1                                12 362
≤ 5                              41 487
≤ 10                             60 202
≤ 20                             80 355
≤ 30                             91 898
≤ 40                             100 044
≤ 50                             106 129
≤ 70                             114 938
≤ 100                            124 187

The table contains cumulative numbers of structures N(m), i.e. the numbers of structures that are formed by m or fewer (neutral) sequences. Thus there are 12 362 different structures that are formed by a single sequence only, and 41 487 structures that are formed by five sequences or fewer [15].

A relatively small number of common structures contrasts with a large fraction of all structures that are formed by a single sequence or only a few sequences. Rare structures (with non-vanishing probabilities) cannot be found through non-exhaustive search procedures. Accordingly, they cannot be found by natural or by in vitro evolution techniques and hence play no role in nature or in test-tube evolution. Rare structures, however, are accessible by exhaustive search procedures that fold all sequences of a certain chain length into structures. At the current state of the art, some $10^9$ RNA sequences can be handled by standard computational facilities. This implies that exhaustive folding cannot be applied to AUGC sequences of chain length n > 16 or to binary GC-only or AU-only sequences of chain length n > 30. For the purpose of illustration, we present data obtained for GC sequences of chain length n = 30 (Tables 1 and 2). Exhaustive folding of all $1.073 \times 10^9$ GC30 sequences yielded 218 820 different secondary structures. The frequencies of individual structures in sequence space are given in terms of the sizes of their neutral networks, i.e., by the numbers of sequences forming them. Frequencies differ by more than six orders of magnitude and, as known already [14,15], few common structures are complemented by many rare structures: the two most abundant structures are formed by more than 1.5 million sequences each, and the structure of rank 100 is present at 492 500 points in sequence space (Table 1). The rare end of the distribution is shown in Table 2: more than 50% of all structures are formed by 100 sequences or fewer each, and 12 362 structures occur only once in the entire sequence space. Frequent structures and their relative abundance can readily be determined by conventional statistical methods; rare structures, however, escape any approach other than exhaustive folding.
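The sequence counts behind the chain-length limits quoted above, and the spread of structure frequencies, follow from simple arithmetic. The sketch below uses only numbers given in the text and in Tables 1 and 2 (the largest neutral network with 1 568 485 sequences and the smallest with a single sequence); the function names are illustrative.

```python
import math


def sequence_count(alphabet_size: int, chain_length: int) -> int:
    """Number of sequences that exhaustive folding has to process."""
    return alphabet_size ** chain_length


def orders_of_magnitude(largest: int, smallest: int) -> float:
    """Base-10 logarithm of the ratio between two neutral network sizes."""
    return math.log10(largest / smallest)


if __name__ == "__main__":
    # Binary GC-only sequences of chain length 30 (the GC30 reference case).
    print(f"GC30:   {sequence_count(2, 30):,} sequences")
    # Four-letter AUGC sequences of chain length 16.
    print(f"AUGC16: {sequence_count(4, 16):,} sequences")
    # Spread between the most common structure (Table 1) and a structure
    # formed by a single sequence (Table 2).
    print(f"spread: {orders_of_magnitude(1_568_485, 1):.2f} orders of magnitude")
```

Both sequence spaces lie within a factor of about four of the 10^9 sequences that the text names as the practical limit, and each additional nucleotide multiplies the work by two or four, respectively.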
It turned out that a straightforward definition of ‘common’ is that a structure is formed by at least as many sequences as the average structure:

#{S_common} ≥ (total number of sequences) / (total number of structures).

In the special example, GC30, all common structures are formed by more than 1.073 × 10^9 / 218 820 = 4907 sequences. The number of common structures is 22 718 and, what is surprising indeed, 999 508 805 sequences fold into the common structures. In other words, 93.1% of all sequences fold into 10.4% of all structures. Investigations on the chain length dependence of the number of common structures and the percentage of sequences folding into these structures gave an important result: for sufficiently long chain lengths, the common structures make up a negligibly small fraction of all structures; however, almost all sequences fold into them [15]. Therefore, only the common structures play a role in natural evolution and in evolutionary biotechnology, as the rare ones are impossible to find by realistic (non-exhaustive) searching procedures.

Shape space covering by a small fraction of all sequences (Fig. 2) is straightforwardly predicted from two results: (i) the number of sequences that form a common structure is fairly large; and (ii) the observation, derived by inverse folding, that sequences folding into the same structure are distributed (almost) randomly in sequence space. A combination of both findings allows one to estimate the radius (in Hamming distance) around any arbitrarily chosen reference sequence that leads to a (multidimensional) ball which is sufficiently large to contain at least one sequence for every common structure. This conjecture was confirmed by analysing the data from exhaustive folding [15]. Shape space covering has direct implications for the design of evolution experiments, as it allows mutation rates and amplification factors to be planned such that areas of appropriate size in sequence space are scanned efficiently.

Sequences folding into the same structure form neutral networks in sequence space. A mathematical model based on random graph theory was designed [16] in order to allow for the derivation of analytical expressions. Neutral networks are represented by graphs in sequence space that show an interesting percolation phenomenon: depending on the
mean fraction of neutral nearest neighbors in sequence space (λ), the network consists either of several disjoint components representing islands or of a single large component spanning the whole sequence space. The first situation is encountered for λ values below a certain connectivity threshold (λ < λcr), whereas the second scenario is found for sufficiently large fractions of neutral neighbors (λ > λcr). The two cases are shown in Fig. 3. As shown in Table 1, the networks of common structures are generally above the connectivity threshold. Deviations from the predictions of the random graph model are readily interpreted in terms of specific features of the underlying molecular structures [15].

[Fig. 2 panel labels: Sequence Space; Shape Space]
Fig. 2. Shape space covering by a small spherical region of sequence space. In order to find (at least) one sequence for every common structure, it is only necessary to search a relatively small sphere around an arbitrarily chosen reference point in sequence space. For example, the covering radius for RNA molecules of chain length n = 100 was determined to be rcov = 15; the covering sphere thus contains about 4 × 10^24 sequences compared to 1.6 × 10^60 sequences in the entire sequence space.

[Fig. 3 panel labels: A; B]
Fig. 3. Neutral networks in sequence space. Depending on the fraction of nearest neighbors that are selectively neutral (λ), the network is either partitioned into a largest component, the so-called giant component, and many small components (A: λ < λcr) or it consists of a single component, usually spanning the whole sequence space (B: λ > λcr). In the case of RNA secondary structures, common structures form connected networks of type B.

Neutral networks play an important role in evolutionary optimization, as they enable populations to escape from evolutionary traps in the form of local fitness optima (Fig. 4). Populations migrate on neutral networks and, likewise, on a flat landscape by a diffusion-like mechanism [9,11]. Whenever optimization ends on a local fitness optimum, the population starts to drift on the neutral network belonging to the structure that corresponds to this optimum. Random drift continues until the population reaches an area in sequence space where the fitness values are higher than those on the network. Then, the next period of adaptive optimization sets in. A complete optimization run thus appears as a stepwise process: phases of increasing fitness are interrupted by ‘static’ periods with fitness values fluctuating around a constant value. When we are optimizing on the extended neutral networks of common structures, the population will eventually find the global optimum. Computer simulations [10,11] have indeed shown the occurrence of optimization on two time scales leading to steps in the time course of the fitness function: periods of fast increase in the average fitness of the population are interrupted by fairly long phases of constant fitness, due to random drift.

[Fig. 4 labels: Adaptive Walks without Selective Neutrality; Sequence Space]
Fig. 4. The role of neutral networks in evolutionary optimization through adaptive walks and random drift. In an adaptive walk, the next step may be chosen arbitrarily from all directions in which fitness is (locally) nondecreasing. Populations can bridge over narrow valleys with widths of a few point mutations. In the absence of selective neutrality (upper part) they are, however, unable to span larger Hamming distances and thus will approach only the next major fitness peak. Populations on rugged landscapes with extended neutral networks evolve along the network by a combination of adaptive walks and random drift at constant fitness (lower part). In this manner, populations bridge over large valleys and may eventually reach the global maximum of the fitness landscape.
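The stepwise time course described above can be illustrated with a deliberately simple toy model. The sketch below is not the RNA folding model of the chapter: it assumes a single walker rather than a population, binary sequences, and a plateau-rich block landscape chosen purely for illustration; all names and parameter values are hypothetical.

```python
import random

BLOCK = 4      # positions per block (illustrative choice)
N_BLOCKS = 5   # number of blocks, giving sequences of 20 binary positions

def fitness(seq):
    """Count the blocks in which every position equals 1 (many neutral plateaus)."""
    return sum(
        all(seq[i * BLOCK + j] for j in range(BLOCK))
        for i in range(N_BLOCKS)
    )

def walk(steps, allow_neutral, seed=0):
    """Single-mutant walk; returns the fitness recorded after every attempted step."""
    rng = random.Random(seed)
    seq = [0] * (BLOCK * N_BLOCKS)
    current = fitness(seq)
    trace = [current]
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        seq[pos] ^= 1                      # propose a point mutation
        proposed = fitness(seq)
        accept = proposed > current or (allow_neutral and proposed == current)
        if accept:
            current = proposed
        else:
            seq[pos] ^= 1                  # reject: undo the mutation
        trace.append(current)
    return trace

strict = walk(2000, allow_neutral=False)   # only strictly uphill steps
drifting = walk(2000, allow_neutral=True)  # uphill steps plus neutral drift
print("strictly uphill walk, final fitness:", strict[-1])
print("walk with neutral drift, final fitness:", drifting[-1])
```

In this toy landscape no single mutation of the starting sequence raises fitness, so the strictly uphill walk never moves, whereas the walk that accepts neutral steps can drift across a plateau until a fitness-increasing mutation becomes available. Its trace is non-decreasing and consists of flat phases separated by steps.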

Evolutionary biotechnology

The theory of molecular evolution and the in vitro evolution experiments suggest practical applications to the design of biopolymer molecules, as proposed already in the 1980s [4]. The basic principles of the so-called ‘irrational design’ of biomolecules are indeed identical with Darwin’s natural law of variation and selection. Molecular properties are improved iteratively in selection cycles in order to achieve an optimal match with the predefined target function. The process is sketched in Fig. 5. Every selection cycle consists of three phases: amplification, diversification, and selection. In these experiments, the fitness of a genotype is tantamount to its probability of entering the next selection round.

Information on various parameters is required in order to make evolutionary design of biomolecules efficient. Population sizes, amplification factors, mutation rates, reaction volumes, and many other factors strongly influence the outcome and the efficiency of an evolution experiment. The individual parameters are not independent, and therefore simple estimates based on uncorrelated superposition are likely to be incorrect. The model introduced earlier in this chapter allows computer simulations to be implemented that are suitable to clarify the role of individual parameters and to provide estimates of the optimal choices.

The results obtained for sequence–structure relations of RNA are highly relevant for the design of optimal protocols. Shape space covering may be used to choose mutation rates that are sufficiently large to explore the covering sphere. The radius of the covering sphere does not depend on the reference sequence, and thus RNA molecules with random sequences are as good as preselected molecules in the initial populations of selection experiments. The existence of neutral networks implies that evolutionary optimization proceeds in a stepwise manner.
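The size of the covering sphere invoked here can be checked by direct counting. The sketch below assumes the values quoted in the caption of Fig. 2 (chain length n = 100, covering radius 15, four-letter alphabet) and counts all sequences within that Hamming distance of a reference sequence; the function name is hypothetical.

```python
from math import comb

def ball_volume(n, radius, alphabet=4):
    """Number of sequences within Hamming distance `radius` of a reference sequence.

    At distance k there are C(n, k) choices of positions and (alphabet - 1)
    alternative letters at each chosen position.
    """
    return sum(comb(n, k) * (alphabet - 1) ** k for k in range(radius + 1))

n, r_cov = 100, 15          # values quoted in the caption of Fig. 2
ball = ball_volume(n, r_cov)
total = 4 ** n

print(f"covering sphere: {ball:.2e} sequences")
print(f"sequence space : {total:.2e} sequences")
print(f"fraction       : {ball / total:.1e}")
```

The count agrees in order of magnitude with the figures given in the caption, and shows how small a fraction of sequence space a mutation protocol has to explore under the covering argument.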
The stepwise course of optimization is properly accounted for by alternating applications of low and high mutation rates: low mutation rates are appropriate in the adaptive phases, whereas high mutation rates speed up the diffusion on networks. There is, however, an upper limit to the error rate: if it exceeds the critical value of a ‘phenotypic error threshold’ [11,17], the population falls off the neutral network and the selected structure is lost. The phenotypic error threshold is the analogue of the conventional (genotypic) error threshold [8] in the presence of selective neutrality.

Fig. 5. Selection cycles in evolutionary design of biopolymers. The optimization of the desired molecular properties and functions is achieved iteratively in consecutive selection cycles. Each cycle consists of the following three phases: amplification, diversification, and selection.

Amplification and diversification can be achieved in different ways by the current techniques of molecular biology. Several efficient replication assays are available for DNA and RNA. The previously used virus-specific RNA replicases, such as Qβ replicase, lost their dominating role because the sequences and the structures of templates have to fulfil several prerequisites for recognition by the enzyme. Currently used techniques, for example the polymerase chain reaction (PCR) [18] or the self-sustained sequence replication (3SR) reaction [19], are much less restrictive and work with almost every template sequence. The major difference between the two examples lies in the protocol: conventional PCR requires two different reaction temperatures, whereas the 3SR reaction works isothermally. As the 3SR reaction illustrates, RNA replication works more generally and more efficiently with DNA intermediates. Two strategies are commonly applied in the production of variants: (i) insertion of stretches with random sequences into DNA and RNA; or (ii) replication with artificially enhanced error rates. In the latter case, an increase in reaction temperature, enzyme modification and the addition of dyes that intercalate between base pairs serve the purpose of reducing the replication fidelity.

The real challenge of the evolutionary approach is the selection step. It requires either ingenious chemical or physical intuition or, in cases where intellectual skill is doomed to fail for reasons of principle, elaborate high-tech equipment for massively parallel processing of very small samples containing a few copies of the genotype, sometimes even a single molecule only. The SELEX technique [20] is the most widely used example of a selection procedure applied to pick suitable molecular candidates from a great number of variants in homogeneous solutions. A molecular ‘negative’ to which the desired molecule should bind optimally is covalently bound to the stationary phase of a chromatographic column. Molecules binding to the target are retained from the solution when it is poured over the column, whereas the non-suitable candidates remain dissolved and are discarded after leaving the column. Then, a series of solvents is applied in successive selection cycles that require higher and higher binding constants to the target for molecules to stay in the race. Eventually only the molecules with optimal binding constants ‘survive’.
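The logic of successive selection cycles with increasing stringency can be sketched in a few lines. The following is a schematic illustration only, not a model of column chromatography or of any published SELEX protocol: each molecule is reduced to a single hypothetical ‘binding score’ in arbitrary units, and the pool size, thresholds and mutation spread are assumptions made for the example.

```python
import random

def selex_sketch(pool_size=10_000, rounds=6, start=0.5, step=0.25,
                 mutation_sd=0.2, seed=1):
    """Repeat selection, amplification and diversification with rising stringency.

    Returns one tuple per completed round:
    (threshold, number of survivors, lowest score among the survivors).
    """
    rng = random.Random(seed)
    # Initial random pool: scores drawn from a standard normal distribution.
    pool = [rng.gauss(0.0, 1.0) for _ in range(pool_size)]
    history = []
    for r in range(rounds):
        threshold = start + r * step                     # stringency rises each round
        survivors = [x for x in pool if x >= threshold]  # selection
        if not survivors:
            break                                        # nothing was retained
        history.append((threshold, len(survivors), min(survivors)))
        # Amplification back to the pool size, with diversification by
        # a small random change of each copy.
        pool = [rng.choice(survivors) + rng.gauss(0.0, mutation_sd)
                for _ in range(pool_size)]
    return history

history = selex_sketch()
for threshold, n_survivors, lowest in history:
    print(f"threshold {threshold:.2f}: {n_survivors} retained, lowest score {lowest:.2f}")
```

Because every round discards all candidates below the current threshold before amplification, the retained molecules are pushed towards ever higher scores, which is the qualitative behaviour the text describes for rising binding-constant requirements.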
RNA molecules that can discriminate between closely related targets through binding constants that differ by several orders of magnitude, for example between caffeine and theobromine [5,21], were obtained by the SELEX procedure. Alternatives to SELEX make use of chemical tags that are attached to the molecular candidates for selection. These tags monitor the desired molecular properties, for example the reactivity of ribozymes [6].

Many classes of selection problems can be solved neither by a SELEX protocol nor by another technique starting from homogeneous solutions. Selection can then be achieved by an alternative method, based on spatial separation and screening. The solution carrying the candidates for selection is partitioned into small droplets that contain, on average, single genotypes. After processing, the samples are screened by means of a suitable analytical tool, preferably fluorescence spectroscopy, and the selected molecules are used as input for the next selection cycle. Depending on the particular problem, the number of samples that have to be screened for successful optimization may be very large. In order to achieve the desired result in reasonably short times and with acceptable consumption of materials, massively parallel techniques working with very small samples are needed. This simple and basic idea is contrasted by enormous difficulties encountered in the implementation of such a selection technique. A recent approach to building an ‘evolution machine’ was based on silicon wafer technology [22,23]. The wafer contains miniaturized reaction vessels in numbers from several thousands up to one million (maximum) on plates with diameters up to 20 cm. Fast processing and screening of samples is achieved by special equipment, such as a micropipette of liquid jetter type, a high-precision positioning device and a CCD camera as recorder. Problems that become solvable by means of such microtechnology are manifold and not restricted to selection experiments. They span a wide range from elaborate serial transfer experiments and molecular screening after spatial separation to software-controlled substance production in combinatorial chemistry as well as parallel synthesis of oligonucleotides and oligopeptides.

Microsystem technology appears to be particularly important for future evolutionary protein design in vitro. RNA molecules are rather limited in their biochemical functions. Proteins are certainly much more versatile. The principle of Darwinian evolution becomes applicable to protein design only if proteins can evolve together with their genes, on the DNA level or on the level of the mRNAs. The key problem here is indeed to provide tight coupling between the nucleic acid and the protein derived from it through translation. A batch system in which all RNAs and protein molecules are contained in the same solution would clearly not work. The technique of phage display [24] serves as a powerful example of in vivo coupling of genes and translation products. A possible in vitro coupling procedure isolates individual RNA molecules in compartments prior to translation, for example on the wafer. Translation is carried out in the reaction chambers. After translation, the proteins are subjected to screening for the desired properties. The use of massively parallel techniques is important since up to millions of variants may have to be compartmentalized, translated and tested in order to perform successful searches. The mRNA molecules are isolated from the reaction chambers with the best suited proteins and used in further selection cycles (Fig. 5). Variation is produced anew through replication and mutation of RNA (or DNA) molecules. Eventually an optimized protein is obtained in some reaction chamber from where the optimal gene can be isolated. The optimal gene can then be cloned and translated. Conventional genetic engineering techniques are available for protein production in larger quantities.
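The partitioning of a solution into droplets or reaction chambers that contain ‘on average’ a single genotype has a simple statistical consequence, sketched below. The calculation assumes that molecules are distributed independently among compartments, so that the number per compartment follows a Poisson distribution; this assumption, like the function name and the chosen mean occupancies, is introduced here for illustration and is not taken from the chapter.

```python
from math import exp, factorial

def poisson(k, mean):
    """Probability of finding exactly k molecules in a compartment (Poisson model)."""
    return mean ** k * exp(-mean) / factorial(k)

for mean in (0.1, 0.5, 1.0):
    empty = poisson(0, mean)
    single = poisson(1, mean)
    multiple = 1.0 - empty - single
    print(f"mean occupancy {mean:.1f}: "
          f"empty {empty:.3f}, single {single:.3f}, multiple {multiple:.3f}")
```

Under this assumption, a mean of one molecule per compartment still leaves a substantial share of compartments empty or multiply occupied, which is one reason why very large numbers of compartments and massively parallel screening are needed.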

Concluding remarks and perspectives

RNA secondary structures provide a kind of ‘toy universe’ that allows the mapping of genotypes into molecular phenotypes to be studied in great detail. Three global relations between RNA sequences and structures were found to be important for evolution:

(1) Few common structures are contrasted by a large number of rare structures that are very hard to find in sequence space by both natural and artificial search strategies.
(2) Shape space covering by small connected regions in sequence space shows that only a relatively small fraction of all sequences has to be searched in order to find any of the common structures.
(3) Sequences folding into the same structure form extended neutral networks that enable evolutionary dynamics to approach major peaks of fitness landscapes and eventually to reach the global optimum.

We were able to show that these three properties related to neutrality with respect to selection make evolution of RNA molecules for predefined purposes a much simpler problem than previously thought.

The results derived here for RNA secondary structures can be generalized to three-dimensional structures. Additional constraints provided by tertiary interactions will reduce the degree of neutrality. In order to apply the concepts presented here to real spatial structures, suitable coarse graining is required. Such techniques still need to be developed.

There are also strong hints that protein structures fulfil relations very similar to those reported for the RNA structures. In a recent paper [25] it was shown that the first regularity, the existence of a few common and many rare structures, is indeed observed with lattice protein models as well. Extensive neutrality seems to be present with proteins, as the data on the molecular clock of evolution derived from sequence comparisons show [1].

Elegant experimental studies have shown that molecular evolution can be applied successfully to the problem of the design of biopolymers for predefined purposes. Techniques based on RNA molecules are well developed already. Work on evolutionary protein design in vitro is making good progress, and suitable protocols will be available in the near future. In order to make evolutionary design of biopolymers a powerful and efficient technological tool, however, large-scale optimization of the currently exercised procedures is necessary. To reach this goal, a more detailed and comprehensive knowledge of the basis of evolutionary processes and their mechanisms is required. Although the understanding of evolution has been substantially improved by the accessibility of molecular data, the theory of molecular evolution is far from complete at present. Accordingly, further development of the theoretical concepts is needed in order to be able to provide firm instructions for planning and control of optimization experiments in evolutionary biotechnology.

Acknowledgements

The work reported here was supported financially by the Austrian ‘Fonds zur Förderung der wissenschaftlichen Forschung’ (Projects P-10578-MAT and P-11065), by the Commission of the European Union (Contract Study PSS*0884), and by the Santa Fe Institute.

References

1 Kimura, M., The Neutral Theory of Molecular Evolution, Cambridge University Press, Cambridge, U.K., 1983.
2 Spiegelman, S., Q. Rev. Biophys., 4 (1971) 213.
3 Eigen, M., Naturwissenschaften, 58 (1971) 465.
4 a. Eigen, M. and Gardiner, W.C., Pure Appl. Chem., 56 (1984) 967.
  b. Kauffman, S.A., J. Theor. Biol., 119 (1986) 1.
5 Ellington, A.D., Curr. Biol., 4 (1994) 427.
6 a. Beaudry, A.A. and Joyce, G.F., Science, 257 (1992) 635.
  b. Chapman, K.B. and Szostak, J.W., Curr. Opin. Struct. Biol., 4 (1994) 618.
7 a. Schuster, P., Physica D, (1997) in press (Santa Fe Institute Preprint No. 96-07-47).
  b. Schuster, P., In Morán, F., Moreno, A., Merelo, J.J. and Chacón, P. (Eds.) Advances in Artificial Life, Lecture Notes in Artificial Intelligence, Vol. 929, Springer, Berlin, Germany, 1995, pp. 3–19.
8 Eigen, M., McCaskill, J. and Schuster, P., Adv. Chem. Phys., 75 (1989) 149.
9 Derrida, B. and Peliti, L., Bull. Math. Biol., 53 (1991) 355.
10 a. Fontana, W. and Schuster, P., Biophys. Chem., 26 (1987) 123.
  b. Fontana, W., Schnabl, W. and Schuster, P., Phys. Rev., A40 (1989) 3301.
11 Huynen, M.A., Stadler, P.F. and Fontana, W., Proc. Natl. Acad. Sci. USA, 93 (1996) 397.
12 Tacker, M., Stadler, P.F., Bornberg-Bauer, E.G., Hofacker, I.L. and Schuster, P., Eur. Biophys. J., 25 (1996) 115.
13 Fontana, W., Konings, D.A.M., Stadler, P.F. and Schuster, P., Biopolymers, 33 (1993) 1389.
14 Schuster, P., Fontana, W., Stadler, P.F. and Hofacker, I.L., Proc. R. Soc. London, B255 (1994) 279.
15 a. Grüner, W., Giegerich, R., Strothmann, D., Reidys, C., Weber, J., Hofacker, I.L., Stadler, P.F. and Schuster, P., Monatsh. Chem., 127 (1996) 355.
  b. Grüner, W., Giegerich, R., Strothmann, D., Reidys, C., Weber, J., Hofacker, I.L., Stadler, P.F. and Schuster, P., Monatsh. Chem., 127 (1996) 375.
16 Reidys, C., Stadler, P.F. and Schuster, P., Bull. Math. Biol., 59 (1997) 339.
17 Reidys, C., Forst, C. and Schuster, P., (1997) to be published.
18 a. Saiki, R.K., Scharf, S., Faloona, F.A., Mullis, K.B., Horn, G.T., Ehrlich, H.A. and Arnheim, N., Science, 230 (1985) 1350.
  b. Mullis, K.B., Sci. Am., 262 (1990) 56.
19 a. Guatelli, J.C., Whitfield, K.M., Kwoh, D.Y., Barringer, K.J., Richman, D.D. and Gingeras, T.R., Proc. Natl. Acad. Sci. USA, 87 (1990) 1874.
  b. Gebinoga, M. and Oehlenschläger, F., Eur. J. Biochem., 235 (1996) 256.
20 Tuerk, C. and Gold, L., Science, 249 (1990) 505.
21 Jenison, R.D., Gill, S.C., Pardi, A. and Polisky, B., Science, 263 (1994) 1425.
22 Schober, A., Walter, N.G., Tangen, U., Strunk, G., Ederhof, T., Dapprich, J. and Eigen, M., Biotechniques, 18 (1995) 652.
23 a. Schober, A., Schwienhorst, A., Köhler, J.M., Fuchs, M., Günther, R. and Thürk, M., Microsystems Technol., 1 (1995) 168.
  b. Köhler, J.M., Pechmann, R., Scharper, A., Schober, A., Jovin, T.M., Thürk, M. and Schwienhorst, A., Microsystems Technol., 1 (1995) 202.
24 Winter, G. and Milstein, C., Nature, 349 (1991) 293.
25 Li, H., Helling, R., Tang, C. and Wingreen, N., Science, 273 (1996) 666.


Landscapes for molecular evolution: Lessons from in vitro selection experiments with nucleic acids

Sulay D. Jhaveri, Ichiro Hirao, Sabine Bell, Kyle W. Uphoff and Andrew D. Ellington

Department of Chemistry, Indiana University, Bloomington, IN 47405, U.S.A.

Introduction

The efficiency and complexity of natural biopolymers have long been a source of fascination to both biologists and chemists. Studies of molecular phylogeny have confirmed that the nanomachines of the cell are the products of evolution, just as earlier work in comparative physiology and zoology showed that macroscopic traits are under selective pressure. The understanding that molecules could evolve led inevitably to attempts to ‘breed’ better molecules, just as the understanding that genetics affected phenotype led to the startling diversity of dog, pigeon, and horse phenotypes originally commented on by Darwin. Starting with Spiegelman in the early 1970s [1], numerous researchers have carried out ‘test tube evolution’ experiments designed to produce novel molecular phenotypes.

In vitro selection

At root, the principles that guide the in vitro selection (Fig. 1) of molecules are much the same as those guiding the natural selection of organisms. However, genetic diversity is generated by the chemical synthesis of DNA oligonucleotides containing a random sequence core rather than by the accumulation of mutations in organismal genomes. While the natural selection of organisms is dependent on the complex, translated linkage between genotype and phenotype, the in vitro selection of nucleic acids is dependent on the simpler and more direct linkage between sequence and structure. Since nucleic acids are not just information-carrying macromolecules with the potential for self-replication, but can also fold into conformations that determine function, the random sequence pool represents a diverse set of ‘species’ that can potentially perform a variety of desired functions, such as binding to target molecules or catalyzing reactions. These functions are selected for either by physically separating species from the population (e.g., affinity immobilization) or by allowing only successful species to breed (e.g., ligation of a primer sequence required for the polymerase chain reaction (PCR)). Once a desired trait has been selected for, constant sequence regions flanking the random sequence core allow enzymatic amplification to take place in a test tube. Although this molecular mechanism is not as colorful as the behaviors frequently associated with the division and propagation of organisms, it is far more powerful: amplification factors of as much as 1 000 000 can be achieved in a single generation. As in nature, selection for the desired trait followed by amplification results in a functionally enriched pool of variants. Multiple cycles of selection and amplification lead to the purification of those molecular species that are most functional or fecund.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. I, pp. 169–91 © 1997 ESCOM Science Publishers B.V.

Fig. 1. In vitro selection scheme, showing an outline of the steps involved in the in vitro selection of functional aptamers. Sequences from an RNA pool created by combined chemical and enzymatic synthesis are partitioned according to their abilities to perform a desired task. Unfit RNAs are discarded, and the RNAs that are ‘fit’ to do the task are reverse transcribed, PCR amplified, and regenerated through in vitro transcription. Multiple cycles of selection and amplification should result in the selective enrichment of the fittest species.

Nucleic acids that can perform a wide variety of binding reactions have been selected from random sequence pools by affinity immobilization. Oliphant et al. [2] selected DNA molecules that could bind to the yeast transcriptional activator GCN4 from a random-sequence DNA pool that spanned nine positions. Since then, aptamers (nucleic acid ligands) have been selected against a variety of protein targets that naturally bind to nucleic acids, such as EF-Tu, ribosomal proteins, Qβ replicase, and reverse transcriptase (reviewed in Ref. 3). In addition, aptamers have been selected against intracellular and extracellular proteins that were not normally thought to interact with nucleic acids. These aptamers have been applied to the dissection or modulation of metabolic pathways. For example, aptamers have been found that bind to and inhibit the signalling functions of cytokines such as fibroblast growth factor, vascular endothelial growth factor, and nerve growth factor (reviewed in Ref. 4). Aptamers that can specifically inhibit enzymes such as protein kinase C [5] or human neutrophil elastase [6–8] have also been selected. Single-stranded DNA aptamers selected to bind thrombin have been found to specifically interact with the highly charged, anion-binding exosite of thrombin, fold into a G-quartet motif, and block blood coagulation both in vitro and in vivo [9–12].

In vitro selection experiments have also been carried out against small molecular targets. Sassanfar and Szostak [13] selected a high-affinity RNA binding site for the cellular metabolite ATP. Several groups have now selected aptamers that can bind to aminoglycoside antibiotics such as neomycin and kanamycin [14–16]. Other small-molecule targets have included metabolic cofactors such as flavin, nicotinamide, and cyanocobalamin, amino acids such as tryptophan and arginine, and drugs such as theophylline (reviewed in Refs. 17 and 18).

Finally, novel nucleic acid catalysts have also been selected from random sequence pools (reviewed in Ref. 19). Joyce and co-workers have manipulated the function of the Group I self-splicing ribozyme, selecting variants that can utilize calcium or cleave DNA from partially randomized pools [20,21]. Lorsch and Szostak [22] selected a polynucleotide kinase ribozyme from a completely random sequence pool that flanked a previously selected ATP binding site. Many of the novel ribozymes can catalyze reactions that are relevant to protein biosynthesis, bolstering arguments that translation may have arisen in a putative RNA world. For example, Lohse and Szostak [23] have selected ribozymes that can carry out an acyl transfer reaction, while Illangasekare et al. [24] have isolated a
ribozyme that aminoacylates its 2'(3') terminus when provided with phenylalanyladenosine monophosphate.

Beyond generating novel species for the study of molecular recognition and nucleic acid catalysis, these experiments can provide fundamental insights into the nature of the landscapes that govern molecular evolution. By analyzing the relationships between selected sequences and their attendant functions, we can better understand how difficult it may be to chance upon function, and what the probable course of evolution may be once a nascent function has been discovered.

Fig. 2. Geometric representation of a sequence space. Shown is the sequence space for a dinucleotide using a four-letter alphabet. Each point represents a sequence. Each double-arrowed line represents a Hamming distance h = 1.

Fig. 3. Cartoon representation of a fitness landscape.

A cartoon representation of landscapes that map sequence to function Before expanding on how in vitro selection experiments can be used to help define evolutionary landscapes, an explication of landscapes should be provided. A fitness landscape is a surface that is generated by graphing points in a sequence space to their corresponding fitnesses (i.e., plotting the fitness function [25]). A nucleic acid sequence space would therefore be the set of all possible unique sequences of length N, and would contain 4N points. Such sequence spaces [26] can best be represented by a multidimensional Boolean polyhedron (Fig. 2), where each vertex would represent one sequence. Relationships between sequences within such a space can be determined by calculating their Hamming distance (h), a metric that assesses the number of sites at which two sequences differ [27]. Hence, given the four-letter alphabet for RNA, a sequence of length N would have 3N nearest neighbors with h = 1. These nearest neighbors would extend in many dimensions, and all sequences would be topologically equivalent. Since this representation of sequence space would be extremely complicated to view and apprehend, we choose to use a simplified cartoon. The cartoon representation of a fitness landscape (Fig. 3) used in this chapter has only three dimensions. The z-axis scales the relative fitness of molecules. Fitness is arbitrarily defined, and may pertain to how well an RNA binds to a particular ligand, or how well it catalyzes a desired reaction. The sequences on the top of the cartoon peaks have a higher degree of fitness than the ones on the plateau. The xy-plane represents a two-dimensional apparition of much more complex sequence spaces. Sequences that have smaller Hamming distances would be closer 172

Landscapes for molecular evolution

in this plane than sequences with larger Hamming distances, but the mathematical relationships are not precise, nor is the topology defined as in a Boolean polyhedron. However, the cartoon representation can be immediately used to graph data gathered from in vitro selection experiments, and may prove extremely useful for extracting qualitative rather than quantitative conclusions. For example, by placing sequences derived from selection experiments in the xy-plane and separating them according to their Hamming distances, it may be possible to gain a rough map of the relationships between local and global optima in a fitness landscape. A caveat to translating the results of in vitro selections to RNA fitness landscapes is the incompleteness of the experimental exploration. A library that is randomized at 35 positions has 435 or 1021 unique sequences, while the typical number of RNA molecules used in a selection experiment is around 1014. This restriction is a practical rather than an absolute one: 1014 sequences is roughly 10 µg of nucleic acids, an amount that can be prepared relatively easily. Preparing more than 100 µg of nucleic acids is a difficult undertaking, and preparing the multiple grams of random sequences that would be required to probe a landscape corresponding to a 35-mer is (for now) impossible. Thus, the 35-mer fitness landscape is being probed at only 1 in every 10 000 000 positions. Moreover, usually only the sequences that survive a selection are characterized in any detail, while sequences that are not fit or fecund are seldom characterized at all. Therefore, while something may be known about the higher peaks, almost nothing is known about the valleys. Another consequence of the incomplete search is that only a few of the multiple possible local optima will be identified, and it is frequently unclear whether the global fitness peak has been found. 
Finally, sequences corresponding to selectively ‘neutral’ paths between peaks may be rare even in a complete description of the landscape and thus would be completely unobserved in a selection experiment. Overall, in vitro selection experiments can be thought of as a coarse, pointillistic investigation of fitness landscapes. Nonetheless, as we will see, such sketchy views can still be surprisingly informative, a testimony either to the skill of the experimenters or perhaps to fundamental properties of the landscapes themselves.

Schuster [28] has developed a compelling theoretical model of nucleic acid sequence space. In this model, a nucleic acid sequence maps to the secondary structure, and secondary structure is assumed to correlate directly with function. Given that most aptamer families seem to be almost completely described by their primary sequence and secondary structural folds, this assumption may be very tenable (see, for example, results from selections that target keratinocyte growth factor, described below). On examining the map of sequence to secondary structural folds, Schuster finds that a relatively limited portion of sequence space can encompass all possible structural folds. Thus, complete descriptions of a fitness landscape may not be necessary in order to gain all relevant insights into the map of sequence to function.

In the next several sections, we will examine parameters that affect the topography of fitness landscapes. First, we will examine the general ruggedness of fitness landscapes. We will then focus on how variables having to do with the RNA molecules themselves, length and chemistry, affect landscape topography. Conversely, we will compare landscapes for related targets and attempt to identify how the changes in target molecules lead to changes in selected landscapes. Finally, we will show how more fundamental insights into molecular recognition can sometimes be extracted from such cartoon views.


How rugged are landscapes?

At first glance, a fitness landscape can be ‘rugged’ or ‘smooth’. In a rugged landscape, there is very little similarity in fitness between adjacent sequences (Fig. 4). A smooth landscape would be one in which the fitness of adjacent points is very similar (this is also known as a high degree of autocorrelation [25]). Of course, there is no such thing as a perfectly smooth landscape. Depending on the scale and perspective of the cartoon, differences in function can range from extremely subtle to drastic.

One of the first questions that was answered by selection experiments was just how many sequence ‘solutions’ might exist for a given functional ‘problem’, and therefore just how rugged fitness landscapes might be. The answer turned out to be highly dependent on the target and the selection. For example, Tuerk and Gold [29] selected RNA molecules that could bind to T4 DNA polymerase from a pool that was constrained to form a stem–loop structure and that contained eight random sequence positions. After four cycles of selection and amplification, this selection returned a wild-type RNA known to bind the polymerase, and also an unanticipated, novel sequence. In contrast, Ellington and Szostak [30] selected RNA molecules that could bind to small organic dye molecules from a pool that was structurally unconstrained and contained 100 random sequence positions. After six cycles of selection and amplification, these authors retrieved multiple (estimated to be > 100 000) different sequence solutions that could bind a given dye. Since specific dye-binding sequences had not previously been discovered, all of the sequences were novel. Moreover, the sequences appeared to be largely unrelated to one another. In a cartoon view (Fig. 5), one landscape would appear extremely sparse (two functional


Fig. 4. Smooth and rugged landscapes. (a) In a smooth landscape, adjacent sequences have similar fitness. (b) In a rugged landscape, adjacent sequences do not have similar fitness.




Fig. 5. Landscapes with varying functional densities. (a) An extremely sparsely populated landscape; (b) a relatively populous landscape.

peaks), while the other would look relatively populous (many functional peaks). However, the density of peaks in these two experiments should be considered as well: $2/4^8 \approx 1/(3 \times 10^4)$ versus $10^5/10^{13} \approx 1/10^8$ ($10^{13}$ was the actual population size, as opposed to a potential population with a size of $4^{100} \approx 10^{60}$). Thus, the peaks in the sparse landscape would seem clustered relative to the peaks in the populated landscape.

While the differences in the results cited above may have been due to differences in the targets, the binding reactions or the pools, one key difference that always affects the number of sequence solutions recovered is the stringency of the selection. For example, while there are many possible explanations for the differing numbers of aptamers identified by the experiments carried out in the laboratories of Gold and Szostak (proteins may return fewer solutions than small organic targets, or small libraries may return fewer solutions than larger libraries), the most obvious difference between these experiments was the stringency of selection. The selection that targeted phage T4 polymerase strove to identify sequences that bound as well as the wild-type RNA, while the selections that targeted small organic dyes had no preconceived criterion other than to identify binding species. The aggregate Kd values observed for anti-dye aptamers (ca. 100 µM) were well above those observed in subsequent, more stringent selections that targeted small molecules [18].

The relationship between stringency and the apparent nature of the landscape that relates sequence to function can also be observed within individual selection experiments. For example, Lato et al. [14] selected RNA molecules that could bind to the aminoglycoside antibiotics kanamycin and lividomycin, and recovered millions of different


binding species (aggregate Kd: 220 nM). In contrast, other groups [15,16] have selected RNA molecules that could bind to the aminoglycoside antibiotics tobramycin and neomycin, respectively, and recovered relatively few binding species that centered on particular motifs (with Kd values of ca. 2 and 100 nM, respectively). The different answers derived by these different groups were not the results of the different targets. The aminoglycosides are chemically very similar and are known to interact strongly with RNA. Rather, the different groups were in fact seeking different answers. Lato et al. [14] sought to identify RNA molecules that bound aminoglycosides only as well as the ribosome (Kd of ca. 100 nM–1 µM), while Wallis et al. [15] and Wang and Rando [16] sought to identify the absolute highest affinity binding species. Therefore, Lato et al. carried out a relatively low stringency selection, while Wallis et al. and Wang and Rando carried out very stringent selections. When the RNA pool selected to bind to lividomycin was further winnowed for high-affinity species, one major sequence predominated [31]. Therefore, comparing different fitness landscapes must of necessity be qualitative, since even as basic a property as the apparent number of functional peaks is highly dependent on selection conditions. To put it in a different way, selection experiments tend to return quantized solutions (individual sequences that are as good as an arbitrary functional cutoff), while complete fitness landscapes may contain a gradient of solutions (the implied scale of the z-axis; Fig. 6). Irrespective of how selection stringency affects experimental views of fitness landscapes, the notion that some landscapes may be sparse while others contain numerous functional peaks is intriguing. However, neither of these extreme views adequately represents the reality of most selection experiments. 
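The dependence of the apparent number of functional peaks on stringency can be illustrated with a toy calculation. The sketch below is not a model of any real selection: it assumes, purely for illustration, that fitness values are drawn from an exponential distribution (many weak solutions, few strong ones), and it treats stringency as a simple fitness cutoff. All names are hypothetical.

```python
import random


def toy_landscape(n_sequences, seed=0):
    """Assign an arbitrary fitness to each of n_sequences sampled sequences.

    Assumption: fitness is exponentially distributed, so strong binders
    are rare. Real landscapes need not follow this distribution.
    """
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(n_sequences)]


def apparent_peaks(fitnesses, cutoff):
    """Count sequences that would survive a selection with this cutoff."""
    return sum(1 for f in fitnesses if f >= cutoff)


if __name__ == "__main__":
    landscape = toy_landscape(100_000)
    for cutoff in (0.0, 2.0, 5.0, 10.0):
        # The same landscape looks populous or sparse depending on cutoff.
        print(cutoff, apparent_peaks(landscape, cutoff))
```

Raising the cutoff never increases the count, which is the quantized view described above: one underlying gradient of solutions can look populous at low stringency and sparse at high stringency.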
In general, selections return a limited number of related sequence families that contain particular sequence or structural motifs. For example, selections that targeted cyanocobalamin produced five different sequence classes [32]. However, the motifs in these different families were unrelated to one another. Selections that targeted vascular endothelial growth factor (vEGF) returned six sequence families, each of which contained a well-defined but different sequence and structural motif [33]. Moreover, while the families differed in their affinity for vEGF, most of the measured dissociation constants were in the nanomolar range, implying that ‘lesser’ solutions, which

Fig. 6. Effects of stringent selection on the fitness landscape. The apparent number of functional peaks is a function of stringency.



Fig. 7. Representations of clusters of sequences. (a) A mesa in the fitness landscape would represent a cluster of sequences with similar activities. (b) A peak would represent a cluster of sequences that embody a gradient of activities.

could not bind as well, had been ruthlessly displaced from the population during the selection. The results of these (and many other) selections would be graphed as a limited number of high and distant peaks. As was mentioned earlier, since selection experiments provide only a sketchy view of the landscape, it is unclear whether it might be possible to move from one aptamer family to another via a series of sequence variants that would each still be able to bind the target ligand. Such ‘neutral paths’ have been theoretically postulated, but have not been experimentally demonstrated.

The fact that multiple sequences within a space may contain the same functional motif or variations on the motif means that in at least some dimensions functional solutions are related to one another. Partial randomization and re-selection experiments have revealed the extent to which a given motif can vary. For example, an anti-Rev aptamer selected from a completely random sequence pool was resynthesized as a pool in which 62.5% of the residues were wild-type and 37.5% were non-wild-type [34]. Thus, if the wild-type sequence contained a guanosine, the partially randomized pool contained 62.5% guanosine, 12.5% adenosine, 12.5% uridine, and 12.5% cytidine. Some residues showed no change from the original ligand following reselection, while others, such as U9, revealed new preferences (in this case, changing to a C). In one instance, A7 and U31 simultaneously changed to a G and a C, respectively, indicating a preference for a new Watson–Crick pairing between these positions. Finally, some residues, such as those in a terminal loop region, showed little or no sequence preference; instead, the distribution of bases at these positions mirrored the distributions in the original random sequence pool. Similar analyses have been carried out for an anti-reverse transcriptase aptamer [35], the Rev-binding element [36], and the Rex-binding element [37].
In each instance, some of the residues were absolutely conserved between selected aptamers, some co-varied and appeared to support secondary or tertiary structural interactions, and some stochastically seemed to track the original random sequence population. In our cartoon view, the results of these selection experiments can be roughly represented by showing the sloping sides of the peak (a cluster of sequences that embody a gradient of activities), or as a ‘mesa’ in the fitness landscape (a cluster of sequences with similar activities; Fig. 7). The degree of slope or the area of the mesa would be correlated with the number of sequences related to the functional peak.
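The partial randomization scheme described above (62.5% wild-type and 12.5% of each of the three other bases at every position) can be sketched in a few lines. The code below is an illustrative simulation only; the wild-type sequence in the example is hypothetical and is not the anti-Rev aptamer, and the function names are assumptions introduced here.

```python
import random

BASES = "ACGU"


def doping_distribution(wild_type_base, wt_fraction=0.625):
    """Base frequencies at one position of a partially randomized pool."""
    other = (1.0 - wt_fraction) / 3.0  # remaining 37.5% split three ways
    return {b: (wt_fraction if b == wild_type_base else other) for b in BASES}


def synthesize_variant(wild_type, rng, wt_fraction=0.625):
    """Draw one pool member by sampling every position independently."""
    variant = []
    for base in wild_type:
        dist = doping_distribution(base, wt_fraction)
        weights = [dist[b] for b in BASES]
        variant.append(rng.choices(BASES, weights=weights)[0])
    return "".join(variant)


def expected_mutations(length, wt_fraction=0.625):
    """Mean number of non-wild-type residues per pool member."""
    return length * (1.0 - wt_fraction)


if __name__ == "__main__":
    rng = random.Random(1)
    wild_type = "GGCUGGACUCGUACUUCGGUACUGGAGAAACAGCC"  # hypothetical sequence
    print(doping_distribution("G"))
    print(synthesize_variant(wild_type, rng))
    print(expected_mutations(len(wild_type)))
```

At this doping level a pool member differs from the wild-type sequence at 37.5% of its positions on average, so reselection samples a broad neighborhood around the original peak rather than only its nearest neighbors.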


How does the length of the random sequence pool affect the landscape?

As indicated above, one difference between the original selection experiments carried out by the laboratories of Gold and Szostak was the length of the random sequence tract. This variable does not of necessity affect the nature of the landscape: there may be landscapes that have few dimensions (i.e., landscapes corresponding to short sequences), yet many functional peaks, and conversely there may be landscapes that have many dimensions (i.e., landscapes corresponding to long sequences), yet few functional peaks. However, as a practical matter it may be that longer sequences can form more (and more interesting) structural and functional solutions than shorter sequences.

The hypothesis that ‘longer can be better’ is supported by a comparison of selection experiments that targeted arginine. Yarus and co-workers selected aptamers that could bind to arginine [38], or to arginine and guanosine [39], from pools that contained 25 random sequence positions. The dissociation constants for the arginine:aptamer complexes ranged from 0.2 to 4 mM. The selected RNAs formed secondary structures in which residues from the random sequence regions were paired with residues from the flanking constant regions. Thus, these aptamers may have been constrained in part by the arbitrary sequences found in the constant regions, and the loose affinities that were observed may have been a result of this skewing rather than the size of the pool. However, Tao and Frankel [40] selected aptamers that could bind to arginine from a pool containing 30 random sequence positions. The dissociation constants for the arginine:aptamer complexes were again in the millimolar range. In contrast, Famulok [41] selected anti-arginine aptamers from a pool that contained 74 partially random sequence positions, and the dissociation constants for the arginine:aptamer complexes were as low as 10 µM.
The same group also selected anti-arginine aptamers from a pool that contained 113 completely random sequence positions, and in this case the dissociation constant for the tightest aptamer:arginine complex was 330 nM [42]. Interestingly, the anti-arginine aptamers identified in all four selections bore little or no resemblance to one another (Fig. 8; adapted from S.E. Osborne and A.D. Ellington (in press)). While the differences between the results of these

Fig. 8. Consensus sequences of arginine aptamers. (a) Sequence derived from a pool containing 25 random sequence positions; Connell et al. [38]. (b) This sequence binds to both arginine and guanosine [39]. (c) 30 positions; Tao and Frankel [40]. (d) 74 random positions, partially random pool; Famulok et al. [41]. (e) 113 random positions; Geiger et al. [42].



Fig. 9. RNA landscapes of differing sequence lengths. Longer sequences may yield more higher peaks.

experiments may reflect differences in selection conditions or stringencies, it is instructive that as more residues were varied, tighter binding species became available.

To test the relationships between sequence spaces of different sizes more directly, two different sequence spaces can be explored for function in parallel. The smaller space is a subset of the larger space (Fig. 9), but the topographies of different regions are unknown. We used libraries that span either 30 or 71 random sequence positions as the starting points for selections. The lengths of the random sequence tracts were arbitrary, but may be instructive: 30-mers can form numerous minimal structural elements, such as stem–loops and pseudoknots, and many aptamers have been culled from 30-mer pools, while 71-mers can form more complex structural elements, such as helical junctions, and have the potential to generate architectures that are buttressed by tertiary structural interactions (e.g., tRNA at ca. 75 residues). RNA molecules that could bind to the tyrosine phosphatase from Yersinia pestis (Yop51*∆162) were selected from these pools (S. Bell and A.D. Ellington, unpublished results). After eight cycles of selection and amplification, only one motif remained in the pool of selected 30-mers, a sequence that is predicted to form a stem–loop. In the pool of selected 71-mers, a sequence motif similar to that found in the 30-mer pool was isolated, and had the same binding affinity for the target (Fig. 10). Similarly, anti-Rev aptamers were selected from random sequence pools that contained 18 and 32 random sequence positions [34,43]. In another example, Green et al. [44] selected aptamers with chemically modified bases that could bind to vEGF from pools containing 30 or 50 randomized positions. Thus, at first glance it appears as though the length of the random sequence pool does not significantly perturb the identity of selected sequences.

Is there a tyranny of short motifs?

There may be other reasons why pools of different sizes yield the same motifs. If we were able to query each and every point (sequence) in a space about its function, we would obtain a complete and accurate map of the landscape that maps sequence to


function. In practice, however, the problem is turned on its head: a limited number of selected, functional species (aptamers or ribozymes) are queried about their sequence. Functional sequences that are overrepresented in a population will be more frequently observed than functional sequences that are underrepresented. Because representation is inversely proportional to size, short sequence and structural motifs will predominate over longer sequence and structural motifs. The probability of finding a particular 10-nucleotide motif is 1 in $4^{10}$, or about one in a million. The probability of finding a particular 20-nucleotide motif is about one in a trillion. Thus, there are a million copies of the 10-nucleotide motif to every one of the 20-nucleotide motif. This truism has been termed the ‘tyranny of short motifs’, and may have consequences beyond its theoretical import. Even if the longer sequence has a discernibly higher affinity for a target molecule, the global optimum on a fitness landscape may never be reached unless the selection is carried out under extremely stringent conditions and/or many cycles of selection are performed. Thus, when the same motif is identified in parallel selections involving random sequence pools of different sizes, it is unclear whether this motif is truly optimal, or if a longer, better aptamer may have been unintentionally overlooked.

In the absence of a complete description of the fitness landscape it is difficult to tell whether the ‘tyranny of short motifs’ is an experimental reality. However, given the available data, the ‘tyranny’ can be demonstrated by proving that (i) longer sequence and structural motifs are functionally better than shorter sequence and structural motifs, yet (ii) longer sequence and structural motifs are found less frequently than shorter ones. For example, Bartel and Szostak selected RNA ligases from a pool that spanned 220 random sequence positions [45].
Three families of ligases were identified: two of the families contained multiple members and were structurally simple, while the final ribozyme was a single example of a structurally complex pseudoknot with multiple, pendant hairpins. Kinetic analyses showed that the short ribozyme ligases were slightly less efficient than their longer counterpart. Interestingly, the single, complex ribozyme should not have been found at all. An additional selection experiment was carried out in which the complex ribozyme was partially randomized and variants that could still perform the ligation reaction were selected [46]. By comparing the sequences of the reselected ribozymes it proved possible to identify invariant residues that were critical for function, and co-varying residues that contributed to structural features. Overall, the ‘information content’ of the complex ribozyme was so high that it should have been present only once in every $2.5 \times 10^{18}$ sequences, while the original pool spanned only $1.4 \times 10^{15}$ sequences. These results suggest either that Bartel was extraordinarily lucky, or that multiple, complex motifs were present in the population. If the experiment were carried out again with a different but equally complex pool, a different but equally complex (and hence equally improbable) ribozyme ligase might be found. As an aside, these results may have implications for the


Fig. 10. Comparison of N30 and N71 sequences for Yop51*∆162 aptamers.




Fig. 11. The two aptamer families isolated from the selection against FGF-2. R = A or G, Y = C or U, M = A or C, W = G or U, D = A or G or U, N = any base.

origin and evolution of life. It is thought that the first life forms may have been self-replicating nucleic acids. The landscape topography predicted by Ekland et al. [46] suggests that as the self-replicators grew in size, new catalytic activities may have become much more readily available to them.

Similarly, selections that targeted basic fibroblast growth factor (FGF-2) demonstrate both the reality of the ‘tyranny’ and how it can be thwarted by the stringency of selection. Jellinek et al. [47] used two RNA pools that spanned 30 random sequence positions as starting points for obtaining anti-FGF-2 aptamers. Two independent selection experiments were carried out, one in the presence of a specific competitor (heparin), and one without this competitor. Two aptamer families were isolated (Fig. 11). The selection in which heparin was not used as competitor yielded both families, while the selection in which heparin was used as a competitor yielded only the motif from Family 2 (see Fig. 11). Both families competed with heparin for binding to FGF-2, but Family 2 aptamers had roughly 10-fold higher affinity for FGF-2 than did Family 1 aptamers. The information content of these two families was calculated based on the number of conserved and semiconserved residues. The probability of finding the Family 1 motif was 1 in $3.6 \times 10^7$, while the probability of finding the Family 2 motif was 1 in $4.8 \times 10^{11}$. Therefore, it appears as though the inclusion of heparin in the binding reaction led to a more stringent selection and the concomitant identification of the longer, less frequent, but more optimal sequence motif.
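The arithmetic behind the ‘tyranny of short motifs’ can be reproduced directly from the numbers quoted in this section. The sketch below is illustrative only: the function names are hypothetical, a motif of length L is assumed to be fully specified (probability $4^{-L}$), and the chance of seeing a rare motif at least once is estimated with a Poisson approximation, which is an assumption introduced here rather than a calculation taken from the cited papers.

```python
import math


def motif_probability(length):
    """Chance that a random site matches a fully specified motif (4^-L)."""
    return 4.0 ** -length


def expected_copies(pool_size, one_in):
    """Mean number of pool members carrying a motif found 1 in one_in times."""
    return pool_size / one_in


def prob_at_least_one(pool_size, one_in):
    """Poisson estimate of the chance that the motif occurs in the pool."""
    return -math.expm1(-pool_size / one_in)


def information_bits(one_in):
    """Information content implied by a 1-in-one_in frequency, in bits."""
    return math.log2(one_in)


if __name__ == "__main__":
    # 10-mer versus 20-mer: the ratio of frequencies is 4^10, about a million.
    print(motif_probability(10) / motif_probability(20))
    # Complex ligase: 1 in 2.5e18 sequences, pool of 1.4e15 sequences.
    print(expected_copies(1.4e15, 2.5e18), prob_at_least_one(1.4e15, 2.5e18))
    # FGF-2 families: 1 in 3.6e7 (Family 1) versus 1 in 4.8e11 (Family 2).
    print(4.8e11 / 3.6e7, information_bits(3.6e7), information_bits(4.8e11))
```

With these figures the expected number of copies of the complex ligase in the original pool is well below one, and the Family 1 motif should outnumber the Family 2 motif by roughly four orders of magnitude before selection.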

How does nucleic acid chemistry affect the landscape?

By altering the base set of monomers, the relationship between sequence and function may be perturbed both for individual aptamers or ribozymes, and in the overall fitness landscape. In particular, it is unclear whether modified nucleotides will augment the chemistry of aptamers or ribozymes and thus allow shorter, simpler, modified sequence motifs to be as effective as longer, more complex, unmodified sequence motifs. A further relevant question is the degree to which nucleic acid chemistry and nucleic acid sequence track one another: Will modified sequence motifs that fulfill a given function be similar to unmodified sequence motifs that fulfill the same function, or will the two be virtually unrecognizable? Just as length effects were evaluated by parallel selections with pools of different sizes, the effects of chemistry on fitness landscapes can be determined by carrying out selections with and without modified nucleotides.


In the quest to adapt in vitro selection experiments to the development of pharmaceutical reagents, a major concern is how the stability of the nucleic acids in vivo can be improved. Several groups have incorporated nucleotides containing sugars with altered functional groups into selections in order to inhibit ribonuclease activity. For example, the incorporation of 2'-amino-2'-deoxypyrimidines in selections that targeted either FGF-2 or vEGF resulted in the identification of aptamers that were extremely stable in serum [44,48]. In addition, the aptamers identified in the selection with modified nucleotides were very different from those previously obtained via a selection using unmodified bases [47,48]. Not only were the sequences different, but the frequencies with which individual residues occurred in the sequences were also skewed, notably toward a higher incidence of guanosine residues.

DNA can be construed as a modification of RNA, and the results of similar selections that start from either a DNA or an RNA pool are informative. Aptamers against reverse transcriptase selected from RNA pools [49] have different sequences and structures compared to those selected from DNA pools [50]; the same is true for aptamers against ATP [13,51]. The DNA aptamer against thrombin forms a G-quartet [12], whereas the RNA aptamer is predicted to form a stem–loop structure [52].

Not only do modified nucleotides skew fitness landscapes, but different modified nucleotides seem to skew fitness landscapes in different ways. Selections against keratinocyte growth factor (KGF) were carried out using pools containing either 2'-fluoro- or 2'-amino-pyrimidines [53]. The resultant 2'-fluoro aptamers had higher binding affinities and bioactivities than the 2'-amino aptamers. Although there was no observed sequence conservation between the aptamers, the secondary structure was conserved. Selected sequences were predicted to fold into two related pseudoknot structural classes.
In contrast, the two classes of 2'-amino aptamers that were extracted could be folded into stem–loop structures containing either symmetric or asymmetric loops. Finally, in the broadest comparison that can be made, aptamers against human neutrophil elastase (hNE) have been selected from a modified RNA pool, a DNA pool, and a DNA pool conjugated to a peptide inhibitor [6–8]. Although consensus sequences and structures were found for aptamers derived from all four pools, they did not remotely resemble one another.

While modified nucleotides can clearly alter the location of functional optima in sequence spaces, some aptamers seem remarkably robust to substitution with modified nucleotides. We have selected anti-Rev aptamers containing 2'-amino-pyrimidines from the same random sequence pool that was used to select RNA aptamers, and retrieved identical sequence and structural motifs (L. Giver et al., unpublished results). Aptamers containing 5-(1-pentynyl)-2'-deoxyuridine in place of thymidine were generated against thrombin, and the resulting sequences were strikingly different [54]. The anti-vEGF aptamers that contained 2'-amino-pyrimidines could be further modified by the post-selection incorporation of 2'-O-methyl-purines [44]. When modified purines were introduced at 10 out of 14 ribopurine positions in the aptamer, a 17-fold improvement in binding ability was observed. Chemical modification interference of the binding pseudoknot ligand to HIV-RT revealed that, except for residues C13 and G14, all of the bases could be substituted with 2'-methoxyribose in place of ribose, and the aptamer still retained binding activity [35]. However, when the modified anti-bFGF aptamers were completely substituted with 2'-methoxypurines, they could no longer bind [44]. The fact that the inclusion


of modified nucleotides in a selection can result either in the identification of completely new functional optima or a modest alteration of the sequences of old optima is intriguing, and will be touched on again below.

Apart from the aptamer and its target, one must consider the environmental parameters that can affect molecular fitness landscapes. After all, the binding reactions are of a chemical nature and parameters such as monovalent and divalent cation concentrations, pH, temperature, and ionic strength of the buffer affect not only the structures and stability of the interacting ligand and target, but the stability of the interaction as well. In a selection of RNA aptamers specific for cyanocobalamin [32], the salt specificity of the aptamer was changed from Li⁺ to Na⁺ plus Mg²⁺ using a mutagenized version of the aptamer. The new aptamer interacted with the target in a completely different manner, making different contacts with the molecule, and it also demonstrated altered specificity, in that it could also bind weakly to adenosylcobalamin. The divalent cation dependence of the Tetrahymena group I intron was changed from Mg²⁺ to Ca²⁺ using in vitro evolution [20].

What effect does the target have on the landscape?

So far we have focused on variables that alter the nature of the random sequence pool and implicitly assumed that all targets are equivalent in terms of their ability to bind RNA molecules. Obviously, such an assumption is false. Not all targets are equally appealing to RNA, and the nature of the target can have a huge effect on the nature of the landscape that connects sequence and function. Targets that elicit RNA ligands more readily would be expected to have denser landscapes, while targets that elicit RNA ligands only grudgingly would be expected to yield sparse landscapes.

In this respect it is interesting to note that, despite the infancy of selection technology, several generalizations can be made about whether and which targets may yield RNA ligands. First, for small ligands it appears as though charged or aromatic compounds are much better selection targets than aliphatic compounds [18,31]. Second, although the

Fig. 12. Relationships between targets and sequences. (a) Related targets elicit related sequences; (b) related targets elicit unrelated sequences.



entire surface of a protein target is available during the selection process, most aptamers tend to bind to positively charged patches, such as heparin binding sites [47]. Finally, we have noted that the aptamers frequently inhibit the function of enzymes or regulatory factors by binding to active or allosteric sites [55]. The structural rationale behind this ‘homing principle’ is that the simplest way for a sequence to form multiple interactions with a target is to fit into a crevice or channel on a protein, rather than binding to a protruding epitope or splaying itself out on the surface of a protein. From another vantage, nucleic acids (or virtually any biopolymer) utilize less information to form balls that can present multiple bonding opportunities around their periphery than to form cusps that enfold a target. In contradiction to this explanation, while aptamers that bind small ligands act like ‘hosts’ enfolding ‘guests’, they do not seem to be more information-rich than aptamer ‘guests’ that bind to protein ‘hosts’. Moreover, at least some aptamers do appear to be able to enfold surface features of proteins: Rev interacts with RNA molecules via an arginine-rich motif (ARM). Anti-Rev aptamers enfold the α-helical Rev ARM within an expanded major groove [56,57]. As an aside, the fact that even short nucleic acids can readily form binding cusps may provide additional evidence for their hypothesized roles in the origin of life and the evolution of metabolism. In addition to deriving generalizations regarding what classes of targets may most readily elicit aptamers, an analysis of selection experiments can help us to relate fanciful fitness landscapes to the realities of molecular recognition. In particular, while selection experiments clearly work, it is unclear how they work. 
The supposition that in a large, random sequence pool there will always be some folded structure(s) that will be chemically complementary to the surface of a given target is plausible based on the evidence to date, but not immediately verifiable except by induction. However, it may be possible to test the consequences of this hypothesis more directly. If a map between ‘target space’ and nucleic acid sequence space is always possible, we might a priori imagine that in such a map related targets will elicit related sequences (Fig. 12a). Alternatively, it may be that a map between ‘target space’ and sequence space will be extremely convoluted, and that targets with related sequences and chemistries will elicit completely unrelated sequences (Fig. 12b). If the latter situation is found to prevail, it is more difficult to rationalize how any given target will map to at least one unique sequence without the appearance of ‘holes’ in the map. In other words, if the map is discontinuous between related points, then some phenomenon unrelated to what we know of molecular recognition must be operating in order to ensure complete coverage of ‘target space’ by nucleic acid sequence space.

In order to determine exactly how target molecules affect a selection, we will examine several selection experiments in which target molecules were known to be related to one another. First, aptamers that bind to thrombin seem to adopt a standard motif irrespective of slight movements in target space. Bock et al. [9] originally selected a DNA quadruplex that bound thrombin. Tsiang et al. [58] more recently carried out a selection against a thrombin variant that contained an arginine-to-glutamate substitution in the anionic exosite. Despite the fact that this mutation might have been expected to alter the electrostatic interactions with the aptamer, aptamers that bound this variant contained a single T-to-A substitution in one of the loops connecting the strands of the quadruplex.
Curiously, the new aptamer could bind to both wild-type and mutant thrombins.
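The notion of a 'slight movement' in sequence space can be made concrete with the Hamming distance [27]. The sketch below is illustrative only: it assumes the commonly cited 15-mer thrombin-binding quadruplex consensus as the starting sequence, and the position of the T-to-A change in the variant is hypothetical (the text states only that the substitution lies in a loop).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Commonly cited 15-mer thrombin-binding DNA quadruplex consensus (assumed here).
WILD_TYPE_APTAMER = "GGTTGGTGTGGTTGG"
# Hypothetical variant: one loop T replaced by A; the position is for illustration only.
VARIANT_APTAMER = "GGTTGGTGAGGTTGG"

distance = hamming(WILD_TYPE_APTAMER, VARIANT_APTAMER)
print(distance)
```

Under these assumptions the two aptamers are nearest neighbours in sequence space, which is the situation sketched in Fig. 12a.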

Landscapes for molecular evolution

Second, while we and others have previously described selections that targeted the Rev protein of HIV-1 [34,36,43], Rev is just one member of a class of proteins that bind their cognate RNAs via ARMs [59]. We have selected aptamers that bind to another ARM protein, the Rex protein of HTLV-I, from the same pool that was used to select anti-Rev aptamers (S.D. Baskerville et al., manuscript in preparation). The motifs that were returned in the Rex selection initially appeared unrelated to those returned in the Rev selection. At first glance, this result was unsurprising: despite the fact that both proteins are ARM family members, their RNA binding domains are likely quite different. While the Rev ARM forms an α-helix [60], the Rex ARM contains multiple proline residues that would prevent α-helix formation [37]. Nonetheless, on closer examination it appears that at least some anti-Rev and anti-Rex aptamers share common sequences and substructures. The best anti-Rev aptamer contains a short, paired sequence motif adjacent to a pyrimidine bulge: 5'-CUC ... UUGAG-3'. This sequence and structural motif has also been observed in the HIV-1 TAR element and is known to form a pocket that can bind a single arginine residue [61]. The guanidino group of arginine forms a pseudo-base pair with the Hoogsteen face of the 3'-most guanosine residue, while the 5'-most uridine in the bulge forms a base triple with the A:U base pair that pulls the negatively charged phosphate backbone into apposition with the positively charged amino acid. Similar arginine-binding pockets have now been observed in anti-arginine aptamers [40], in aptamers selected to bind to arginine-rich peptides [62], and in anti-Rex aptamers derived from the wild-type Rex-binding element [37]. Thus, targets that are related by the inclusion of arginine appear to elicit related sequence and structural motifs. However, the individual aptamers are by-and-large specific for their cognate targets.
Thus, it is possible that the differences between the aptamers may determine their different specificities. If such a relationship can be found and rationalized, it would represent a simple 'code' for protein–nucleic acid interactions that could identify how changes in target sequence or shape would map to changes in nucleic acid sequence or shape (and vice versa). Obviously, knowledge of such a code would be a boon to the de novo design of molecular interfaces. Third, we have selected aptamers that can bind to the βII isozyme of protein kinase C from an RNA pool that spanned 120 random positions [5]. The aptamers fell into several families, and individuals from two of the most prominent families were assayed for their ability to inhibit the enzymatic activity of PKC. While these aptamers efficiently inhibited the enzymatic activity of the βII isozyme, they had a 10-fold lower Ki for the βI isozyme (96% similar) and showed no activity against the α isozyme (80% similar). These specificities rival those seen with monoclonal antibodies. We have now selected aptamers that can bind to the α isozyme of PKC from the same RNA pool. Sequence comparisons of the anti-βII and anti-α aptamers (Fig. 13) suggest that the map that relates target space and sequence space is convoluted. For example, while one family of aptamers was returned from both selections, other families were unique for one or the other isozyme. Taken together, these results validate both of the postulated mappings between target space and sequence space. In the first example, closely related targets elicited closely related aptamers. In the second, more distantly related targets elicited similar aptamers. In the third, closely related targets elicited distinct aptamers. The final iteration, distant targets eliciting distinct aptamers, is the norm. Thus, it may be that some molecular recognition events are amenable to modest and continuous mappings between the sequences and structures of targets and the sequences of nucleic acid ligands, while other molecular recognition events more readily admit to discontinuous jumps between the structures of targets and the sequences of ligands. Most likely, both types of mappings are possible for any one target. The fact that similar targets do not always find similar sequences is likely due to the fact that not all gradations of structure and affinity can be formed by minor perturbations of a given ligand. In the next section, we shall find that not all gradations of structure and affinity can be formed, even by more major perturbations of ligands.

Fig. 13. Dendrogram of PKC aptamers: comparison of the anti-α and anti-βII aptamers.

The relationship between affinity and specificity

The realities of molecular recognition underlie all of the fitness landscapes that we have described. Conversely, these landscapes may serve to elucidate general principles that govern the evolution of molecular recognition. For example, Eaton and co-workers [63] have hypothesized that the natural or artificial selection of ligands having high affinity for their targets will in general beget ligands with high specificities for their targets. This hypothesis is based on the model that as ligands with progressively higher affinities for a target evolve, they will frequently form more bonds or interactions with a target and will therefore meld more precisely onto the surface of the target (Fig. 14a). In support of this argument, Eaton et al. have shown that RNA molecules selected to bind basic fibroblast growth factor have much greater affinity for their cognate target than for other, related fibroblast growth factors. Similarly, DNA molecules selected to bind thrombin do not bind other related serine proteases. And, as we have seen, RNA molecules selected to bind one protein kinase C isozyme bind with reduced affinity even to highly related isozymes [5].
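As a rough illustration of the energetic scale behind such discrimination, specificity can be expressed as the ratio of the constants measured for a related and a cognate target and converted into a free-energy difference. This is a standard thermodynamic relation offered as a sketch, not a calculation taken from the studies cited; the symbol S and the choice of T = 298 K are assumptions made for the example.

```latex
S = \frac{K_i^{\mathrm{related}}}{K_i^{\mathrm{cognate}}},
\qquad
\Delta\Delta G = RT \ln S

% Worked example: S = 10, T = 298 K, R = 1.987 \times 10^{-3} kcal mol^{-1} K^{-1}
\Delta\Delta G = (1.987 \times 10^{-3})(298)\,\ln 10 \approx 1.4\ \mathrm{kcal\,mol^{-1}}
```

On this scale a 10-fold discrimination corresponds to only about 1.4 kcal/mol, roughly the energy of a single weak interaction, so small changes at an interface can plausibly account for differences of this size.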


Fig. 14. Interactions between selected biopolymers and related targets. (a) Aptamers selected to bind to a given target may ‘fit’ that target so tightly that any perturbation of the interface in a related target will disrupt binding. (b) Aptamers selected to bind to a given target may bind to related targets because at least some of the portions of the surfaces of the targets are largely the same. (c) Aptamers selected to bind to a given target may ‘fit’ that target relatively loosely, forming multiple weak or nondirected interactions, and thus may be relatively impervious to structural differences found on the surfaces of related targets.


Fig. 15. Graphical representation of specificity classes. The axes on this triangular graph represent the relative contribution of a given aptamer’s binding ratio to the total affinity profile of the aptamer. The three values that place an aptamer on this graph were derived as follows. A given binding ratio (for either wild-type MS2, E89K, or E89T) was divided by the sum of these three binding ratios, and the result was normalized to 1. Amongst the operator-like aptamers, the outlier is the dual-specificity aptamer.
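The normalization described in the caption of Fig. 15 can be sketched as follows. The function name and the example binding ratios are hypothetical and are not data from the study; the sketch only shows how three binding ratios become coordinates that sum to 1 on a triangular graph.

```python
def specificity_coordinates(wt: float, e89k: float, e89t: float) -> tuple[float, float, float]:
    """Divide each binding ratio by the sum of all three so the result sums to 1."""
    total = wt + e89k + e89t
    if total <= 0:
        raise ValueError("binding ratios must sum to a positive value")
    return (wt / total, e89k / total, e89t / total)


# Hypothetical aptamer that binds all three variants equally: centre of the triangle.
print(specificity_coordinates(1.0, 1.0, 1.0))
# Hypothetical aptamer that favours wild-type MS2 fourfold: shifted toward one vertex.
print(specificity_coordinates(4.0, 1.0, 1.0))
```

An aptamer near a vertex is specific for one variant, whereas one near the centre binds all three variants about equally well.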

While the scenario presented by Eaton et al. is plausible and empirically supported, it is unclear whether it is of necessity true. For example, the ‘holes’ on selected ligands could seek out progressively larger sets of related ‘bumps’ on related target molecules while eschewing unrelated ‘bumps’ (Fig. 14b). Similarly, it is possible that the monomer sets available for the construction of biopolymer ligands are not sufficiently ‘granular’ to match any possible set of ‘bumps’ on target molecules, and thus may not always be able to discriminate between closely related shapes (Fig. 14c). In order to better test the generalization that affinity begets specificity, we selected aptamers that could bind to a series of closely related proteins, i.e., single- and double-substitution variants of the MS2 coat protein, from an RNA pool that had 30 random sequence positions (I. Hirao et al., Chem. Biol., manuscript submitted). Numerous families of aptamers were returned; similar aptamers were frequently found in selections that targeted different proteins. By assessing the complexity of the original population and the information content of the aptamers that were returned, it can be argued that the aptamers are optimal for their targets: that is, no better (short) binding sequences exist. The various families of aptamers were assayed for their ability to discriminate between the closely related protein targets. Despite the fact that optimal binding species were identified, many of the aptamers can readily cross-recognize the related targets. Figure 15 is a ‘specificity profile’ for the aptamer families relative to three MS2 coat protein variants that contain either glutamate (E), lysine (K), or threonine (T) at position 89 (a position that fronts the RNA binding site of MS2 coat protein). While some of the families are relatively specific and can discriminate between favored and disfavored ligands by factors of two- to fourfold, other families are quite catholic in their ability to bind these variants. Thus, it does not appear as though selections for affinity must of necessity beget ligands with high specificities. However, the in vitro selection protocol can be varied to circumvent this obstacle. One way to obtain aptamers that bind specifically to a particular target is to include a counterselection step, which eliminates those sequences that can bind to closely related targets. Using counterselection, aptamers have been selected that bind to theophylline, but not to caffeine, and the aptamers can discriminate against caffeine by >10 000-fold [64]. Since the hypothesized correlation between affinity and specificity has previously been shown to hold for other target sets (e.g., fibroblast growth factors), the question becomes: Why was it not observed for the MS2 selection? The model put forth by Eaton et al. [63] assumed that nucleic acid structure space was essentially continuous, and that some sequence could always be found that would fold into a structure closely complementary to a distinctive feature on the surface of a protein. In reality, both nucleic acids and proteins are constructed from monomers that have discrete chemistries and structures. There are no canonical nucleotides that are ‘intermediate’ between the purines adenosine and guanosine, nor canonical amino acids that fill the gap between the basic side chains of lysine and arginine.
If nucleotides, amino acids, and ultimately their polymers are structurally quantized or granular (capable of assuming a limited number of shapes), rather than structurally continuous (capable of assuming virtually any shape), then it should not be surprising that complexes between these polymers are similarly quantized or granular, and that strong complementarity may not always be achieved (Fig. 15). Since the strengths and specificities of molecular complexes can be significantly altered by structural changes on the order of tenths of Ångstroms, it is possible that for some molecular interfaces no possible ‘fits’ exist that will lead to both affinity and specificity. Thus, a correlation between affinity and specificity can best be observed when the distances between the sequences and structures of related protein targets are large enough that at least some RNA ligands can make distinctions (as was the case for aptamers selected to bind wild-type MS2 versus Qβ coat proteins). Conversely, a correlation between affinity and specificity may be lost when the distances between related protein targets grow so small that few or no RNA ligands can adequately make distinctions (as was the case for aptamers selected to bind the MS2 coat protein versus its variants). An interesting corollary to this analysis is that, if not all ‘fits’ of molecular interfaces are possible, then the best possible ‘fits’ for related interfaces may differ significantly in sequence. In other words, even if a target molecule changes slightly, its cognate ligand may have to change drastically in order to maintain the tightest possible interaction. In support of this hypothesis, we have observed that single amino acid changes in the sequence of a protein target were compensated for by large changes in the sequences and predicted structures of cognate RNA ligands.
Moreover, given the fact that pools with related chemistries (2'-hydroxyl- versus 2'-deoxy- versus 2'-amino- versus 2'-fluoro-pyrimidines) can yield quite different sequence answers when challenged with the same target, it is not surprising that a single pool would not generally yield similar sequence answers when challenged with related targets.
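The argument about pool complexity and information content can be made concrete with a short calculation. This is a minimal sketch: the random-region lengths (30 and 120 positions) come from the selections described in the text, but the pool size of 10^15 molecules is an assumption chosen for illustration, and the coverage figure is only an upper bound because it ignores duplicate sequences.

```python
import math


def sequence_space(n_random: int, alphabet: int = 4) -> int:
    """Number of distinct sequences with n_random fully random positions."""
    return alphabet ** n_random


def max_information_bits(n_random: int, alphabet: int = 4) -> float:
    """Upper bound on information content: log2(alphabet) bits per position."""
    return n_random * math.log2(alphabet)


# Assumption for illustration only: number of molecules in a starting pool.
ASSUMED_POOL_MOLECULES = 1e15

for n in (30, 120):  # random-region lengths mentioned in the text
    space = sequence_space(n)
    # Upper bound on the fraction of sequence space sampled (ignores duplicates).
    coverage = min(1.0, ASSUMED_POOL_MOLECULES / space)
    print(f"N={n}: {space:.2e} sequences, "
          f"{max_information_bits(n):.0f} bits max, coverage <= {coverage:.1e}")
```

Under this assumption even a 30-position pool is sampled only sparsely as complete sequences, which is why claims of optimality are restricted to short binding motifs.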


Conclusions

Thus far, the lessons from in vitro selection experiments provide rough cartoon-like sketches that are nevertheless useful in understanding the principles of evolution with species that have a remarkably direct link between genotype and phenotype. Given the limited amount of data that can be garnered from in vitro evolution experiments, we have been able to assess factors influencing the size and diversity of the sequence spaces, and offer a description of our vision of the fitness landscape. Finally, by examining the correlation between targets and the fitness landscapes generated, it is possible to make generalizations relating abstract fitness landscapes to the realities of specific molecular recognition.

References

1 Kramer, F.R., Mills, D.R., Cole, P.E., Nishihara, T. and Spiegelman, S., J. Mol. Biol., 89 (1974) 719.
2 Oliphant, A.R., Brandl, C. and Struhl, K., Mol. Cell. Biol., 9 (1989) 2944.
3 Uphoff, K.W., Bell, S.D. and Ellington, A.D., Curr. Opin. Struct. Biol., 6 (1996) 281.
4 Gold, L., Polisky, B., Uhlenbeck, O. and Yarus, M., Annu. Rev. Biochem., 64 (1995) 763.
5 Conrad, R., Keranen, L.M., Newton, A.C. and Ellington, A.D., J. Biol. Chem., 269 (1994) 32051.
6 Lin, Y., Qui, Q., Gill, S.C. and Jayasena, S.M., Nucleic Acids Res., 22 (1994) 5229.
7 Lin, Y., Padmapriya, A., Morden, K.M. and Jayasena, S.D., Proc. Natl. Acad. Sci. USA, 92 (1995) 11044.
8 Smith, D., Kirshenheuter, G.P., Charlton, J., Guidot, D.M. and Repine, J.E., Chem. Biol., 2 (1995) 741.
9 Bock, L.C., Griffin, L.C., Latham, J.A., Vermaas, E.H. and Toole, J.J., Nature, 355 (1992) 564.
10 Paborsky, L.R., McCurdy, S.N., Griffin, L.C., Toole, J.J. and Leung, L.L.K., J. Biol. Chem., 268 (1993) 20808.
11 Griffin, L.C., Tidmarsh, G.F., Bock, L.C., Toole, J.J. and Leung, L.L.K., Blood, 12 (1993) 3271.
12 Padmanabhan, K., Padmanabhan, K.P., Ferrara, J.D., Sadler, J.E. and Tulinsky, A., J. Biol. Chem., 268 (1993) 17651.
13 Sassanfar, M. and Szostak, J.W., Nature, 364 (1993) 550.
14 Lato, S.M., Boles, A.R. and Ellington, A.D., Chem. Biol., 2 (1995) 291.
15 Wallis, M.G., Von Ahsen, U., Schroeder, R. and Famulok, M., Chem. Biol., 2 (1995) 543.
16 Wang, Y. and Rando, R.R., Chem. Biol., 2 (1995) 281.
17 Ellington, A.D., Curr. Biol., 4 (1994) 427.
18 Ellington, A.D., Ber. Buns. Phys. Chem., 98 (1994) 1115.
19 Hager, A.J., Pollard Jr., J.D. and Szostak, J.W., Chem. Biol., 3 (1996) 717.
20 Lehman, N. and Joyce, G.F., Nature, 361 (1993) 182.
21 Beaudry, A.A. and Joyce, G.F., Science, 257 (1992) 635.
22 Lorsch, J.R. and Szostak, J.W., Nature, 371 (1994) 31.
23 Lohse, P.A. and Szostak, J.W., Nature, 381 (1996) 442.
24 Illangasekare, M., Sanchez, G., Nickles, T. and Yarus, M., Science, 267 (1995) 643.
25 Schuster, P. and Stadler, P.F., Comput. Chem., 18 (1994) 295.
26 Eigen, M., In Pines, D. (Ed.) Emerging Synthesis in Science: Proceedings of the Founding Workshops of the Santa Fe Institute, Santa Fe Institute, Santa Fe, NM, U.S.A., 1985, pp. 25-69.
27 Hamming, R.W., Bell Syst. Technol. J., 29 (1950) 147.
28 Schuster, P., J. Biotechnol., 41 (1994) 239.
29 Tuerk, C. and Gold, L., Science, 249 (1990) 505.
30 Ellington, A.D. and Szostak, J.W., Nature, 346 (1990) 818.
31 Lato, S.M. and Ellington, A.D., Mol. Div., 2 (1996) 103.
32 Lorsch, J.R. and Szostak, J.W., Biochemistry, 34 (1994) 973.
33 Jellinek, D., Green, L.S., Bell, C. and Janjic, N., Biochemistry, 33 (1994) 10450.
34 Jensen, K.B., Green, L., MacDougal-Waugh, S. and Tuerk, C., J. Mol. Biol., 235 (1994) 237.
35 Green, L., Waugh, S., Binkley, J.P., Hostomska, Z., Hostomsky, Z. and Tuerk, C., J. Mol. Biol., 247 (1995) 60.
36 Bartel, D.P., Zapp, M.L., Green, M.R. and Szostak, J.W., Cell, 67 (1991) 529.
37 Baskerville, S.D., Zapp, M. and Ellington, A.D., J. Virol., 69 (1995) 7559.
38 Connell, G.J., Illangasekare, M. and Yarus, M., Biochemistry, 32 (1993) 5497.
39 Connell, G.J. and Yarus, M., Science, 264 (1994) 1137.
40 Tao, J. and Frankel, A.D., Biochemistry, 35 (1995) 2229.
41 Famulok, M., J. Am. Chem. Soc., 116 (1994) 1698.
42 Geiger, A., Burgstaller, P., Von der Eltz, H., Roeder, A. and Famulok, M., Nucleic Acids Res., 24 (1996) 1029.
43 Giver, L., Bartel, D., Zapp, M., Pawul, A., Green, M. and Ellington, A.D., Nucleic Acids Res., 21 (1993) 5509.
44 Green, L.S., Jellinek, D., Bell, C., Beebe, L.A., Feistner, B.D., Gill, S.C., Jucker, F.M. and Janjic, N., Chem. Biol., 2 (1995) 683.
45 Bartel, D.P. and Szostak, J.W., Science, 261 (1993) 1411.
46 Ekland, E.H., Szostak, J.W. and Bartel, D.P., Science, 269 (1995) 364.
47 Jellinek, D., Lynott, C.K., Rifkin, D.B. and Janjic, N., Proc. Natl. Acad. Sci. USA, 90 (1993) 11227.
48 Jellinek, D., Green, L.S., Bell, C., Lynott, C.K., Gill, N., Vargeese, C., Kirschenheuter, G., McGee, D.P.C., Abesinghe, P., Pieken, W.A., Shapiro, R., Rifkin, D.B., Moscatelli, D. and Janjic, N., Biochemistry, 34 (1995) 11363.
49 Tuerk, C., MacDougal, S. and Gold, L., Proc. Natl. Acad. Sci. USA, 89 (1992) 6988.
50 Schnieder, D.J., Feigon, J., Hostomsky, Z. and Gold, L., Biochemistry, 34 (1995) 9599.
51 Huizenga, D.E. and Szostak, J.W., Biochemistry, 34 (1995) 656.
52 Kubik, M.F., Stephens, A.W., Schneider, D., Marlar, R.A. and Tasset, D., Nucleic Acids Res., 22 (1994) 2619.
53 Pagratis, N.C., Bell, C., Chang, Y.-F., Jennings, S., Fitzwater, T., Jellinek, D. and Dang, C., Nat. Biotechnol., 15 (1997) 68.
54 Latham, J.A., Johnson, R. and Toole, J.J., Nucleic Acids Res., 22 (1994) 2817.
55 Tian, Y., Adya, N., Wagner, S., Giam, C.Z., Green, M.R. and Ellington, A.D., RNA, 1 (1995) 317.
56 Leclerc, F., Cedergren, R. and Ellington, A.D., Nat. Struct. Biol., 1 (1994) 293.
57 Ye, X., Gorin, A., Ellington, A.D. and Patel, D.J., Nat. Struct. Biol., 3 (1996) 1026.
58 Tsiang, M., Gibbs, C.S., Griffin, L.C., Dunn, K.E. and Leung, L.L.K., J. Biol. Chem., 270 (1995) 19370.
59 Mattaj, I., Cell, 73 (1993) 837.
60 Tan, R., Chen, L., Buettner, J.A., Hudson, D. and Frankel, A.D., Cell, 73 (1993) 1031.
61 Puglisi, J.D., Ruoying, T., Calnan, B.J., Frankel, A.D. and Williamson, J.R., Science, 257 (1992) 76.
62 Xu, W. and Ellington, A.D., Proc. Natl. Acad. Sci. USA, 93 (1996) 7475.
63 Eaton, E.B., Gold, L. and Zichi, D.A., Chem. Biol., 2 (1995) 633.
64 Jenison, R.D., Gill, S.C., Pardi, A. and Polisky, B., Science, 263 (1994) 1425.


Synthetic peptide libraries

Zhan-Gong Zhao and Kit S. Lam

Arizona Cancer Center, Department of Medicine and Department of Microbiology and Immunology, College of Medicine, The University of Arizona, 1515 North Campbell Avenue, Tucson, AZ 85724, U.S.A.

Introduction

Combinatorial chemistry is a rapidly developing field. It is now considered one of the most important recent advances in medicinal chemistry. In addition to drug lead identification, combinatorial chemistry can also be applied to the optimization of the initial lead. Most pharmaceutical companies have been focusing their efforts on small-molecule library development because drug leads discovered from these libraries are more likely to be able to cross cellular membranes. Therefore, for intracellular targets, small-molecule libraries are preferred. However, for extracellular targets, particularly those with larger natural ligands such as insulin, growth factors, and cytokines, peptide or oligomeric non-peptide libraries may sometimes be very useful, since it is unlikely that a small molecule (e.g., MW 300-600) can bind to these receptors with high affinity. Peptide libraries are also useful for the development of drug leads for enzymes that use peptides as substrates, e.g., proteases and protein kinases. These peptide drug leads can then be converted to active peptidomimetics. The development of drugs for peptide hormone receptors, and of vaccines for both humoral and cellular immunity, are two additional areas where peptide libraries will play an important role. Combinatorial chemistry, in addition to facilitating the drug discovery process, is also an invaluable tool for basic research, particularly in the area of molecular recognition. Many biologically important molecular interactions involve proteins or peptides. Peptide libraries, therefore, will continue to be an extremely useful tool for basic research. Combinatorial chemistry began in 1984 when Geysen et al. first synthesized a limited array of peptides on polyethylene or polypropylene pins [1]. Two years later, the iterative approach of combinatorial library on pins was introduced [2].
In 1990, the successful application of the phage-displayed peptide library method for ligand identification was reported by three research groups [3–5]. In 1991, Lam et al. [6] introduced the ‘one-bead–one-compound’ combinatorial peptide library method and Houghten et al. [7] described the solution-phase combinatorial peptide library method using an iterative approach. Since then, the combinatorial library field has taken off and has now expanded to non-peptide oligomer libraries as well as small-molecule libraries [8]. Taking into consideration the three components of combinatorial library methods (production, screening, and structure determination), one may categorize the various peptide library methods into five distinct groups: (i) biological libraries; (ii) spatially addressable parallel solid-phase or solution-phase peptide libraries; (iii) peptide libraries requiring deconvolution; (iv) affinity selection of soluble peptide libraries; and (v) one-bead–one-compound combinatorial peptide libraries. For biological peptide library methods, please refer to the chapter by Collins in this volume. Recent advances in the remaining four synthetic peptide library methods, as well as in library-related peptide chemistry, will be briefly discussed in this review.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 192-209 © 1997 ESCOM Science Publishers B.V.

Peptide chemistry

Merrifield’s concept of solid-phase synthesis was first introduced three decades ago for peptide synthesis [9]. Subsequently, this approach has been extended to the synthesis of oligonucleotides [10,11], oligosaccharides [12,13], and small organic molecules [14]. Although approaches based on solution-phase synthesis have been used in combinatorial library synthesis [15,16], most of the chemical libraries synthesized to date have been constructed by solid-phase methods. The key elements of solid-phase peptide synthesis (e.g., solid support, anchoring linkage, side-chain diversity, coupling chemistry, and unnatural peptide structures) have been optimized for use in combinatorial formats.

Solid supports

The basic requirements of a solid support for solid-phase peptide synthesis include its mechanical stability, solvent swellability, and compatibility. The low cross-linked polystyrene-divinylbenzene beaded resin was first introduced by Merrifield [9] in a simple tetrapeptide synthesis. Three decades later, the ‘Merrifield resin’ continues to be one of the major solid supports for peptide synthesis. For combinatorial peptide library synthesis, the choice of solid supports is largely determined by the way the library is screened. For Houghten’s soluble synthetic combinatorial library (SCL) approach, the peptides are cleaved from the resin as mixtures before the screening; thus, no extra requirement is needed for the support. Polystyrene-based MBHA resin was frequently used in their library synthesis [17]. For libraries based on Lam’s one-bead–one-compound approach, the screening process is conducted on a single-bead basis. Therefore, the choice of the resin is particularly important.
Ideally, the resin bead should be (i) uniform in size and substitution; (ii) fully compatible with organic and aqueous conditions; (iii) nonsticky during the synthesis and screening process; (iv) nonfluorescent if a fluorescence screening assay is used; and (v) available in a variety of sizes, shapes, and porosities. Currently, none of the commercially available resins fulfills all five criteria; there is a definite need for the development of a new solid support. Polydimethylacrylamide beads (Pepsyn Gel, Millipore, now PerSeptive Biosystems) were used in the author’s laboratory to synthesize the first one-bead–one-compound peptide library [6]. It was later replaced by TentaGel or polyoxyethylene-grafted polystyrene resin [18]. Although TentaGel is heterogeneous in substitution, it is uniform in size, nonsticky, and compatible with organic and aqueous conditions. Therefore, TentaGel has now become the most popular resin for solid-phase library synthesis. It is important to note that only functional sites on the surface of the TentaGel resin are accessible to macromolecular targets. This property has recently been utilized as an advantage to create beads with spatially segregated, differential protecting groups (surface versus inside of beads). These beads are particularly useful in the generation and screening of encoded combinatorial libraries [19,20]. The other solid supports which have been used in ‘one-bead–one-peptide’ library synthesis include controlled-pore glass (CPG) beads [21], polyethylene glycol cross-linked polyamide (PEGA) resin [22], and polyhydroxylated methacrylate (Toyopearl®) beads [23]. Compared to TentaGel resin, these supports each possess an open structure permitting biologically active proteins to diffuse into the interior. Another polymer carrier similar to TentaGel resin in composition and properties is the PEG-PS resin [24] of Millipore (now PerSeptive Biosystems). The major difference between PEG-PS and TentaGel is that PEG-PS has the functional group next to the hydrophobic polystyrene chain and the polyoxyethylene chains only serve as ‘modifier’ of polymer properties, while TentaGel has the functionalizable group at the end of the polyoxyethylene chains. The other new solid supports introduced recently include Argonaut Technologies’ ArgoGel [25], which has high loading capacity, and Barany’s CLEAR resins, which are highly cross-linked and yet show excellent swelling properties and performance in batchwise and continuous-flow synthesis [26]. Besides classical resin beads, other polymeric carriers were also used for the synthesis of peptide libraries in various formats. Polyacrylate-grafted polypropylene ‘pins’ were used for the synthesis of the first combinatorial chemical library [1,2]. This type of support continues to be heavily used in multiple peptide [27] and non-peptide [28] library synthesis. Cellulose paper, originally used by Frank et al. as a solid-phase support for oligodeoxyribonucleotide synthesis [29], has also been used as the support for multiple ‘SPOT’ synthesis of peptide libraries [30,31]. Polystyrene-grafted polyethylene film (PS-PE) may also be used in combinatorial peptide library synthesis [32]. The specific feature of the membrane type of carrier is its dividability.
This feature has been used for the synthesis of libraries with a nonstatistical distribution of library members, where no compound is missing and none is represented more than once [33].

Anchoring linkers

Over the years, a number of cleavable linkers that are acid labile, base labile, or photolabile have been developed for solid-phase peptide synthesis. (This topic has been covered in detail by several review papers [34–36].) For libraries that require the linker to be cleaved before screening, most of these conventional linkers can be used. Several unconventional linkers have been found to be particularly useful and user-friendly for combinatorial applications (see Fig. 1). Among them are the methionine-containing linker [37] and the safety-catch benzylhydrylamine linker 1 [38]. Bray et al. [39] have utilized an orthogonal peptide–resin linker 2, which allows final deprotection and removal of contaminating chemicals before the peptide is released into an aqueous buffer. Hoffmann and Frank [40] recently described a novel safety-catch linker 3 based on the intramolecular catalytical hydrolysis of an ester bond. Peptides attached to the solid support through this linker can be released into neutral aqueous buffer. In the case of the one-bead–one-compound peptide library, no cleavable linker is necessary if the library is screened while the peptides are still attached to the solid support. The polyoxyethylene chain of TentaGel functions as a long hydrophilic linker for displaying the peptides. Traditional solid-phase peptide synthesis relies on a -COOH to -NH2 polymerization (i.e., C- to N-terminus) of the polypeptide chain; thus, only the N-terminus is free. In some cases, however, it is required that the carboxyl terminus be free for binding to occur. Since the synthesis of peptide from N- to C-terminus is not well developed, it is necessary to apply a technique to ‘reverse’ the peptide on solid beads. A reversible strategy via a cyclization/cleavage protocol has been adapted by several groups [41–43]: the peptide is synthesized using a linker included in the cyclic structure, and after cyclization the linker is selectively cleaved and the C-terminus of the peptide displayed. One version of this scheme generates a free N-terminus enabling N-terminal Edman degradation for sequencing [42]. An alternative approach uses a template that serves as both the linker and cyclization point [41]. The dual releasable linker developed by the Selectide group is useful to release peptides from a bead-library for solution-phase screening [44]. The sequential release of peptides in two steps provides the opportunity to screen large libraries and arrive at the active compounds without any additional deconvolution procedures (Fig. 2). Cardno and Bradley [45] recently reported a simple multiple release system achieved by treating aminomethyl resin with an equimolar mixture of commercially available linkers to generate multifunctionalized resin. The hydroxymethyl groups of the linkers were then functionalized and Fmoc peptide chemistry was carried out in the normal manner. The two steps of cleavage can be achieved by treating the resin first with 1% TFA and then with 95% TFA (Fig. 3).

Fig. 1. Structures of linkers 1, 2 and 3.

Fig. 2. The two-stage release of peptides from iminodiacetic acid-based dual cleavable linker.
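The logic of screening with two sequential releases can be sketched as a small simulation. This is a hypothetical illustration rather than the published protocol: the function name, the pooling of 100 beads per well, the library of 10,000 beads, and the identities of the active beads are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_screen(
    bead_ids: Sequence[int],
    is_active: Callable[[int], bool],
    beads_per_well: int,
) -> Tuple[List[int], int]:
    """Return (active beads found, total number of assays performed)."""
    # Stage 1: beads are pooled into wells; the first release is assayed per well.
    wells = [list(bead_ids[i:i + beads_per_well])
             for i in range(0, len(bead_ids), beads_per_well)]
    positive_wells = [w for w in wells if any(is_active(b) for b in w)]
    stage1_assays = len(wells)
    # Stage 2: beads from positive wells are separated, one per well,
    # and the second release is assayed bead by bead.
    candidates = [b for w in positive_wells for b in w]
    hits = [b for b in candidates if is_active(b)]
    stage2_assays = len(candidates)
    return hits, stage1_assays + stage2_assays


# Hypothetical library of 10,000 beads with three active members.
ACTIVE = {42, 4242, 9001}
hits, n_assays = two_stage_screen(range(10_000), lambda b: b in ACTIVE, beads_per_well=100)
print(hits, n_assays)
```

In this toy setting the active beads are located with a few hundred assays rather than one assay per bead, and each hit is traced to a single bead without resynthesis of sublibraries.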


Amino acid side-chain diversity

Both L- and D-amino acids, as well as many other unnatural amino acids, can be incorporated into synthetic peptide libraries. However, until very recently, most unnatural amino acids needed to be prepared first before they could be used as building blocks in the synthesis of unnatural peptides. Two new techniques have emerged that make solid-phase side-chain derivatization convenient. Rivier’s group [46] utilized aminoglycine as the building block for the synthesis of peptide scaffolds. The racemic α-Fmoc, α’-Boc-aminoglycines [Boc-D/L-Agl(Fmoc), Boc-D/L-Agl(Me,Fmoc)] were introduced into the peptide chain by Boc chemistry. The removal of the Fmoc protecting groups and acylation of the newly unmasked amino side chain with acyl halides or carboxylic acids results in side-chain diversity. Recently, O’Donnell et al. [47] reported the synthesis of unnatural peptides using a solid-phase synthesis method. In their procedure, three new steps were added to the normal solid-phase peptide synthesis sequence: activation of an N-terminal glycine residue on the growing peptide chain via Schiff base formation (with benzophenone imine), alkylation of the residue to make a desired side-chain modification, and hydrolysis to remove the activating group. The key to this technique is the use of mild, nonionic reagents called ‘Schwesinger bases’ [48].

Peptide bond formation

In peptide library synthesis, four major kinds of coupling methods (active esters, preformed symmetrical anhydrides, acid halides, and in situ reagents) have been used for the stepwise introduction of Nα-protected amino acids. However, protocols involving in situ reagents, including the carbodiimides and the HOBt-based reagents (BOP, HBTU, TBTU, PyBOP), have been more popular than the others.
Recently, efficient coupling reagents based on HOAt (1-hydroxy-7-azabenzotriazole), such as HATU, PyAOP, and HAPyU [49], were found to be superior to the HOBt-based reagents in the synthesis of peptides that include hindered amino acids. CF3-NO2-PyBOP, another powerful coupling reagent, has been shown to be very promising for the acylation of N-methylamino acids [50]. With the introduction of urethane-protected N-carboxyanhydrides (UNCAs) [51], condensation reactions to form peptides can be achieved with only CO₂ as the coproduct. Using these reagents, Zhu and Fuller [52] were able to conduct a general one-pot synthesis of protected and partially protected tripeptides. Since no coupling reagents are involved, the UNCA approach may be used in solution-phase peptide library synthesis.

Fig. 3. Multifunctionalized resin and conditions of sequential cleavage: (A) 1% TFA/DCM; (B) 95% TFA.


Cyclic libraries

Much attention has been given to the synthesis of cyclic peptide libraries. Cyclic peptides are resistant toward enzymatic degradation [53,54], and the conformational constraint imposed by cyclization may also be responsible for their high affinity to some biological receptors [55,56]. One-bead–one-compound disulfide cyclic peptide libraries have yielded ligands that bind to gpIIb/IIIa integrin, whereas similar linear libraries have not [57]. Potent and selective α-glucosidase inhibitors were identified from the iterative synthesis and screening of 26 cyclic libraries (disulfides and lactams) of varying size and composition [58]. Cellulose-immobilized cyclic hepta- and octapeptide libraries have been screened for ligands that bind to TGF-β [59]. Compounds with chymotrypsin inhibitory activity were also identified through the screening of a cyclic peptide template library by Houghten’s group [60]. Various cyclization techniques have been used in the past, including disulfide bond [61], side-chain lactam [59], and head-to-tail amide bond formation [62]. In most cases, the readiness of an open-chain precursor to cyclize depends on the size of the ring and the sequence of the peptide to be cyclized. Usually, ring closure with hexa- and pentapeptides is somewhat hampered; however, the cyclization may be enhanced by the presence of turn-inducing amino acids such as glycine, proline, and D-amino acids, as was the case in the study of McBride et al. [63]. Spatola et al. [64] have recently conducted extensive studies of these aspects, and several cyclic penta-, hexa-, and heptapeptide libraries were prepared and analyzed. Their results showed that rapid cyclization rates can be achieved with optimized synthesis and cyclization procedures, and that many combinations of cyclic peptides can be formed in high quality if they contain structural features that make cyclization more facile.

Spatially addressable parallel libraries

In spatially addressable synthesis, peptides with known amino acid sequences are synthesized at predetermined locations. The synthesis and screening of these libraries are performed in parallel. Depending on the method used, screening can be performed while the peptides are still attached to the solid support (e.g., binding assay), or the peptides can be released from the solid support for solution-phase assay. The three techniques based on this strategy are the multipin method, the cellulose-based SPOT method, and light-directed peptide synthesis on chips. The DIVERSOMER™ method described by DeWitt et al. [65], originally designed for non-peptide libraries, can certainly be used for the preparation of limited peptide libraries. The main advantage of the spatially addressable parallel libraries is that the amino acid sequences of the ligands are predetermined; therefore, subsequent structure determination of the positive ligands is not needed. The disadvantage of this method, however, is that only a limited number of peptides can be made by this approach.

Multipin method

Geysen et al. [1] introduced this method for generating peptides in a reusable form. In this method, peptides are synthesized on polyacrylate-grafted polyethylene pins arranged in standard 96-well microtiter plate format, and screening is performed by an enzyme-linked immunosorbent assay (ELISA) while peptides remain bound to the pins. Later, cleavable linkers were developed [39] so that peptides can be released from the pin for solution-phase assay. Recent applications of this method include T-cell proliferation studies [66–68], substance P-receptor binding studies [69,70], structure–activity relationship (SAR) studies of a series of potent hexapeptides that bind endothelin receptors [71,72], and searching for inhibitors of human heart chymase [73]. This technique was originally developed as an immunological tool for epitope mapping. In the early stage, less than 50 nmol of discrete peptide was synthesized on each pin. Continuing evolution of this technology now allows synthesis on a scale of up to 10 µmol per pin [74]. Therefore, this technology is becoming useful both as a screening tool and as an efficient method of producing individual peptides.

SPOT synthesis

This approach was developed in Frank’s laboratory [75], initially as an easy and economical alternative to the multipin method. Cellulose, in the form of paper [76–78], can be functionalized for peptide synthesis, and the synthesis is accomplished by ‘spotting’ the solution of protected amino acids onto the functionalized cellulose paper in the presence of activating reagents. Except for the individual coupling steps, all the other steps of the synthesis cycle can be performed simultaneously by treating the entire paper with the appropriate reagents and solutions. The peptide arrays can be used either directly in ELISA-based receptor binding studies or, after the spots have been punched out, individually cleaved and applied in solution (for a recent review, see Ref. 79). Using this technique, Frank [80] has analyzed a T-cell epitope with a series of soluble peptide variants. Recently, the SPOT synthesis technique has been extended to the synthesis of random libraries, which can be deconvoluted by iterative [31] or positional scanning approaches [81].
To facilitate SPOT synthesis, an automatic peptide synthesizer on paper with a SPOT format is now commercially available (Abimed, Langenfeld, Germany [82]).

Light-directed peptide synthesis on chips

This technique combines two well-developed techniques, semiconductor-based photolithography and solid-phase chemical synthesis. The first application of this technology was reported by Fodor et al. [83]. They synthesized minute quantities of 1024 different peptides on a single glass plate using a photolithographic masking process and photolabile N-α-amino protecting groups. Each peptide was synthesized in an area of 50 µm × 50 µm. Because of the limited quantity of each peptide available on the chip, biological assay is restricted to binding assays quantitated by fluorescence microscopy [84]. The maximum number of compounds that can be synthesized is governed by the spatial resolution of the photolithographic mask. It is reported that up to 65 536 peptides can now be synthesized in an area slightly greater than 1 cm² [85]. This technique has recently been used in an epitope mapping study [86]. A review of this technology can be found in Gallop et al. [85].
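The array sizes quoted above can be related to one another with a few lines of arithmetic. The following is a minimal sketch, not a description of the actual chip layouts: it assumes square synthesis sites packed edge to edge with no gaps, and the helper names (`array_area_um2`, `max_edge_um`) are hypothetical.

```python
import math

# Figures quoted in the text.
SITE_EDGE_UM = 50          # each peptide occupies 50 µm × 50 µm
EARLY_ARRAY_SIZE = 1024    # peptides on the first reported chip
LATER_ARRAY_SIZE = 65_536  # peptides reported in slightly more than 1 cm²

UM2_PER_MM2 = 1_000 ** 2   # 1 mm² = 10^6 µm²
UM2_PER_CM2 = 10_000 ** 2  # 1 cm² = 10^8 µm²


def array_area_um2(n_sites: int, edge_um: float) -> float:
    """Total area of n square sites, assuming no gaps between them."""
    return n_sites * edge_um ** 2


def max_edge_um(n_sites: int, area_um2: float) -> float:
    """Largest square site edge that lets n_sites tile the given area."""
    return math.sqrt(area_um2 / n_sites)


# Area covered by the 1024-peptide array at 50 µm per site.
early_area_mm2 = array_area_um2(EARLY_ARRAY_SIZE, SITE_EDGE_UM) / UM2_PER_MM2
# Area that 65 536 sites would need if the site size stayed at 50 µm (assumption).
later_area_cm2 = array_area_um2(LATER_ARRAY_SIZE, SITE_EDGE_UM) / UM2_PER_CM2
# Site edge needed to fit 65 536 sites into exactly 1 cm².
edge_for_1cm2 = max_edge_um(LATER_ARRAY_SIZE, UM2_PER_CM2)

print(f"{early_area_mm2:.2f} mm², {later_area_cm2:.2f} cm², {edge_for_1cm2:.1f} µm")
```

Under these assumptions, 1024 sites of 50 µm × 50 µm cover about 2.56 mm², whereas 65 536 sites of the same size would need about 1.64 cm²; fitting them into exactly 1 cm² would require sites of roughly 39 µm on a side. The text does not state the site size used for the larger array, so these figures only bracket the possibilities.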

Synthetic library methods requiring deconvolution

Methods in this category involve the synthesis of a number of peptide mixtures. These mixtures are then screened for a specific biological activity. Once an active mixture is identified, one can deduce the structure of the active component based on the synthetic history. Therefore, isolation of the active component for structural determination is not needed. Four general deconvolution approaches are summarized here: (i) iterative approach; (ii) positional scanning approach; (iii) orthogonal partition approach; and (iv) recursive deconvolution.

Fig. 4. Orthogonal partition combinatorial library approach: (A) synthetic scheme of 125 orthogonal sublibraries, AnAnAn (the same scheme is used for the synthesis of sublibraries BnBnBn); (B) building block partition matrix; (C) examples of deconvolution.

Iterative approach

Geysen et al. [2] first successfully applied this approach in the multipin system, using an on-pin ELISA to define a mimotope for a monoclonal antibody directed against a discontinuous epitope on the foot-and-mouth disease virus particle. Later, Houghten et al. [7] applied a similar iterative strategy in the generation and screening of a soluble synthetic combinatorial peptide library. In this approach, multiple rounds of library resynthesis and analysis are needed for deconvolution. This approach has resulted in the successful identification of antigenic determinants [87–90], opioid peptides [91,92], antimicrobial peptides [93], and enzyme inhibitors [94]. Recently, this approach was also used by Tegge et al. [31] to determine the substrate specificity of cyclic nucleotide-dependent protein kinases by screening a cellulose paper-bound library.

Positional scanning

This approach was introduced by Houghten’s group [95] to shorten the time required for the identification of active peptide sequences from a large peptide mixture. Positional scanning peptide libraries are composed of individual peptide mixtures in which a single position is defined with a single amino acid while the remaining positions are composed of mixtures of amino acids. For a hexapeptide library constructed from 20 amino acids, 120 positional scanning mixtures (6 positions × 20 amino acids) are synthesized and assayed, and the most active amino acid residues at each position are identified. All possible combinations of amino acids with activity exceeding a certain threshold level are then prepared and tested to identify explicit sequences with potent activities. So far, this approach has been used in identifying melittin inhibitors [96], opioid ligands [97], and antigenic determinants for different antibodies [98]. This approach has also been applied to cellulose membrane-bound peptide libraries for molecular recognition studies [81].

Orthogonal partition

Deprez et al. [99] reported that two libraries of the same soluble 15 625 tripeptides were synthesized by an orthogonal partition approach (Fig. 4).
Twenty-five different amino acids were partitioned in a 5 × 5 matrix format and divided into five groups in one way, labeled A1–A5, and orthogonally in another way, labeled B1–B5 (Fig. 4B). Each library resulted from the incorporation of one group of amino acids, An for library A and Bn for library B, at each of the three variable positions. Therefore, two orthogonally different but related libraries, AnAnAn and BnBnBn (5³ = 125 sublibraries each, together covering all 25³ = 15 625 tripeptides), were synthesized by the ‘split-only’ method (Fig. 4A). Ideally, in a given screen, the chemical structure of the active tripeptide can be deduced by examining the chemical composition of active sublibraries from A and B (Fig. 4C). However, in practice, the result is more complex, particularly when there are multiple different motifs for a given target.

Recursive deconvolution

This method was described by Erb et al. [100] in a model study in which a pentapeptide library incorporating glycine, leucine, phenylalanine, and tyrosine was constructed on a solid support by the method of ‘split synthesis’. In this method, each protected amino acid is first coupled to an aliquot of resin. A portion of the resin from each reaction vial is then set aside and catalogued; the remaining resin from each vial is combined and redivided. The steps of (i) coupling; (ii) saving and cataloguing; and (iii) randomizing are repeated until a library is obtained. The sublibraries in which the N-terminal amino acid is defined are screened against the target; the amino acid of the sublibrary that demonstrated the highest activity is then coupled to the partial libraries that had been set aside and catalogued in the previous split synthesis. This recursive deconvolution is repeated until the most potent peptides are identified. Since only one split synthesis is required, this strategy could potentially facilitate the iterative approach described above.
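The recursive deconvolution procedure described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the protocol of Erb et al.: the target sequence `TARGET`, the additive scoring function `activity`, and all function names are hypothetical, and a sublibrary's activity is taken to be the mean activity of its members.

```python
ALPHABET = "GLFY"   # glycine, leucine, phenylalanine, tyrosine (as in the text)
LENGTH = 5          # pentapeptide library
TARGET = "YGGFL"    # hypothetical best binder, used only to define a toy assay


def activity(peptide):
    """Toy assay (an assumption): number of positions matching TARGET."""
    return sum(a == b for a, b in zip(peptide, TARGET))


def mean_activity(pool):
    """Activity of a mixture, modeled as the average over its members."""
    return sum(activity(p) for p in pool) / len(pool)


def split_synthesis(alphabet, length):
    """Return saved[k][aa]: the aliquot set aside after round k whose most
    recently coupled (N-terminal) residue is aa. Chains grow from the C-terminus."""
    saved = {}
    mixed = [""]                                  # pooled resin before round 1
    for k in range(1, length + 1):
        pools = {aa: [aa + tail for tail in mixed] for aa in alphabet}  # (i) couple
        saved[k] = pools                                                # (ii) save and catalogue
        mixed = [p for pool in pools.values() for p in pool]            # (iii) randomize
    return saved


def recursive_deconvolution(saved, length):
    """Fix one residue per round, starting from the N-terminus."""
    prefix = ""        # N-terminal residues identified so far
    screened = 0       # number of sublibraries assayed
    for k in range(length, 0, -1):
        # Couple the residues already identified onto each saved aliquot of round k.
        candidates = {aa: [prefix + p for p in pool] for aa, pool in saved[k].items()}
        screened += len(candidates)
        best = max(candidates, key=lambda aa: mean_activity(candidates[aa]))
        prefix += best
    return prefix, screened


saved = split_synthesis(ALPHABET, LENGTH)
best_peptide, pools_screened = recursive_deconvolution(saved, LENGTH)
print(best_peptide, pools_screened)
```

With four building blocks and five positions, only 4 × 5 = 20 sublibraries are assayed, and every one of them is derived from aliquots saved during the single split synthesis. The greedy choice at each round is only guaranteed to reach the best sequence when position contributions are roughly additive, as they are in this toy assay.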

Affinity selection of soluble peptide libraries

In this method, peptide libraries are first prepared by solid-phase synthesis. The peptides are then cleaved off the resins and loaded onto an affinity column to select for peptides of interest. Songyang et al. have successfully applied this approach to identify peptides that specifically bind to SH2 domains [101] and kinase domains of protein kinases [102]. This approach has also been applied to oligonucleotide libraries, in which the eluted oligonucleotides can be amplified by PCR and recycled through the column several times for further enrichment [103]. However, since the eluted peptides, unlike the oligonucleotides, cannot replicate themselves, enough peptides must be collected for sequence determination. Ideally, the eluted peptides could be purified and microsequenced individually, but, in practice, this is often not feasible. Instead, Songyang et al. [101,102] microsequenced all the eluted peptides concurrently. As a result, only one predominant motif, rather than multiple distinct motifs, can be identified. A similar methodology, which combines affinity electrophoresis and mass spectrometry, has been introduced as a solution-based approach for screening peptide libraries. This method allows on-line, one-step selection and structural identification of candidate ligands from a limited peptide library [104,105].
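Why concurrent microsequencing of the eluted pool reveals only one predominant motif can be illustrated with a small sketch. The sequences and their proportions below are invented for illustration, and the function names are hypothetical; the model simply assumes that each sequencing cycle reports the residue frequencies at one position across the whole pool.

```python
from collections import Counter


def pooled_sequencing(peptides):
    """Per-cycle residue frequencies when all peptides are sequenced together."""
    length = len(peptides[0])
    profile = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


def consensus(profile):
    """Most abundant residue at each cycle, i.e. the predominant motif."""
    return "".join(max(freqs, key=freqs.get) for freqs in profile)


# Hypothetical eluted pool: a majority motif and a distinct minority motif.
eluted = ["ADEF"] * 6 + ["KLMN"] * 4
profile = pooled_sequencing(eluted)
motif = consensus(profile)
print(motif, profile[0])
```

The per-cycle profile still shows the minority residues (K, L, M, N at 40% each), but it does not record which residues occurred together in the same peptide, so a second motif cannot be reconstructed with confidence from the pooled data.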

One-bead–one-compound peptide library method

The one-bead–one-compound combinatorial peptide library method is based on the fact that when a library is synthesized by a ‘split-synthesis’ method, only one peptide species (about 10¹³ copies) is displayed on each bead (100 µm diameter). Furka et al. first described the split-synthesis method for synthesizing 27 soluble peptides in two abstracts [106,107] and later in a full paper [108], but in none of these reports did they recognize the one-bead–one-peptide concept and its potential for screening millions of peptides while they are still attached to the beads. Lam et al. [6,109] first reported the synthesis and screening of the one-bead–one-compound peptide library and identified ligands for anti-β-endorphin monoclonal antibody and streptavidin from a random pentapeptide library of over a million different peptides. In the last 5 years, numerous investigators have applied this one-bead–one-compound library method to peptide, non-peptide oligomer, and small-molecule libraries for a variety of different targets. For recent reviews, see Refs. 110–112. On-bead binding and functional assays, as well as solution-phase assays, have been developed to screen the bead-libraries. The on-bead binding assay involves the attachment of a reporter group to the molecular target; the reporter group could be an enzyme, a fluorescent tag, a color dye, or a radionuclide. The enzyme-linked colorimetric screening assay, using chromogenic substrates that yield colored insoluble precipitates, has been commonly used and has been applied to the identification of ligands for monoclonal antibodies [6,113–116], surface idiotypes of lymphoma cells [117], streptavidin [6,118], avidin [118], thrombin and factor Xa [110], MHC class I molecule [119], and glycosomal phosphoglycerate kinase [120]. A dual-color detection scheme has also been developed [121] to minimize the number of false-positive beads. Figure 5 illustrates the various methods that have been developed in the author's laboratory to screen the one-bead–one-compound combinatorial libraries.

Fig. 5. Examples of various screening methods for the one-bead–one-compound combinatorial libraries: (A) a positive bead in an enzyme-linked colorimetric assay using 5-bromo-4-chloro-3-indolylphosphate (BCIP) as the substrate; (B) a positive bead can be easily retrieved by a hand-held micropipette; (C) a dual-color enzyme-linked colorimetric assay using two different substrates sequentially, purple (BCIP + nitroblue tetrazolium), brown (BCIP + iodonitrotetrazolium); (D) a whole-cell on-bead binding assay with a monolayer of intact cells on the surface of a positive bead; (E) an autoradiogram showing positive [³²P]-labeled peptide beads in an on-bead functional assay for protein kinase peptide substrate; (F) an in situ solution-phase releasable assay for detecting anticancer agents, showing a clear zone of inhibition surrounding a positive bead; MTT was used to stain the live cells. Please refer to the text and references for the details of each method. Pepsyn Gel beads were used in panel C and TentaGel beads were used in the remaining panels.

Schreiber’s group used fluorescently labeled SH3 domains to screen for SH3-binding peptides [122] and peptidic compounds with non-peptidic elements [123]. The fluorescent approach is particularly appealing when the bead-library is analyzed by a fluorescence-activated cell sorter (FACS). Kassarjian et al. [124] used [¹²⁵I]-anti-β-endorphin monoclonal antibody in conjunction with autoradiography to probe bead-libraries, but this technique is rarely used for macromolecular targets because it is generally more cumbersome than the enzyme-linked colorimetric or fluorescent approach. Recently, Nestler et al. [125] developed a microautoradiographic screening method with a [¹⁴C]-radiolabeled probe to screen bead-libraries. This technique is particularly valuable for small molecular targets, since steric hindrance between a small molecular target and an enzyme may be a problem for the enzyme-linked approach. Still [126] used dye-labeled molecular probes to discover sequence-selective peptide ligands of synthetic receptors from an encoded one-bead–one-compound library. Synthetic receptors labeled with a single dye [127] and two dyes of different colors [128] were used for screening the bead-library. The disadvantage of using dye-labeled probes is that the dye itself tends to interact with many ligands. Lam et al. [129] used a small organic dye, indigo carmine, to screen their bead-libraries and identified specific peptide motifs that interact with the dye. Wennemers and Still [130] also reported similar work with other organic dyes. Recently, Pennington et al.
[131] reported the use of a whole-cell binding assay to identify binding peptides for α6β1 integrin on the cell surface of a prostate cancer line (DU145). In this method, intact cells were first mixed with the peptide bead-library in a small disposable column for 1–2 h. The free cells were then removed by washing the column with culture medium. Beads coated with a monolayer of intact cells could be easily identified under a dissecting microscope. Positive beads were then recycled (treated with 8 M guanidine HCl to remove the cells) and screened again in the presence of a specific blocking antibody. The beads that were negative in this secondary screen were then isolated for microsequencing. Sasaki et al. [132] reported the use of magnetic beads as a probe to screen random peptide bead-libraries for dopamine-D2 receptor binding peptides. Their probe was prepared by coupling the extracellular nonapeptide sequence of the CNS dopamine-D2 receptor to the magnetic beads. In addition to binding assays, the bead-library can also be screened for functional properties. For example, Lam's group [133,134] determined the preferred kinase substrate sequence by incubating peptide bead-libraries with [γ-³²P]ATP and a particular protein kinase. The [³²P]-labeled beads were then washed thoroughly and immobilized on glass plates with agar, and positive beads were identified by autoradiography and microsequenced. Using this approach, substrate motifs for cAMP-dependent protein kinase [133] and p60c-src protein tyrosine kinase [135,136] were identified. Certainly, similar approaches can also be applied to other posttranslational modification sites. Meldal et al. [137,138] developed a novel fluorogenic quenching screening method to identify proteolytic substrate motifs from a bead-library. More recently, a related technique was developed to identify peptides with protease inhibitory activity [139].


In addition to on-bead binding or functional assays, solution-phase assays have been developed for the screening of one-bead–one-compound libraries. In this approach, cleavable linkers are used (see the Peptide chemistry section above) so that peptides can be released from the beads. The two general approaches for solution-phase assay are (i) a two-stage release assay using a 96-well format [57,140]; and (ii) the in situ releasable solution-phase assay with immobilized beads [140]. In the in situ releasable solution-phase assay, Lerner and co-workers [141,142] immobilized the bead-library on a polyethylene sheet, which was then exposed to TFA vapor and neutralized with NH3 vapor. The sheet-immobilized beads were placed on an agar bed of melanophores (pigment cells) that had been transfected with plasmid DNA encoding a specific G-protein-coupled receptor. Pigment dispersion induced by the released peptides was monitored by video image subtraction. With this approach, potent peptides with agonist or antagonist activities for any G-protein-coupled receptor can, in principle, be developed. Using a different approach, Salmon et al. [143] immobilized beads with tumor cell lines in soft agar, where one of the linkers was cleaved and the peptides diffused outward from each bead. Forty-eight hours later, 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide (MTT) was added to stain the living cells. A clear zone of inhibition was detected around beads with potential anticancer activity. With this approach, anticancer peptides were discovered. One advantage of the one-bead–one-compound combinatorial library method is that all the ligands in the library are synthesized and screened in parallel; as a result, it is not uncommon to identify multiple different motifs in a single screen [113,117,118,136].
This is in contrast to the library methods that require deconvolution, through which usually only one predominant motif can be identified; if there are multiple equally prominent but distinct motifs, the results would probably be difficult to interpret. The main disadvantage of the one-bead–one-compound library method, however, is that the positive beads need to be isolated for structure determination. This is relatively straightforward for sequenceable peptides (via Edman degradation with an automatic protein sequencer). For non-peptide oligomer or small organic molecule libraries, however, a mass spectrometry method or an encoding strategy is needed for the structural determination of the positive compounds [110–112].
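The split-synthesis procedure that underlies the one-bead–one-compound concept can be simulated in a few lines. This is a minimal sketch under stated assumptions: the three building-block labels, the bead count, and the function name `split_and_mix` are hypothetical, and three blocks over three cycles were chosen only because they give 3³ = 27 sequences. Each bead is modeled as carrying a single growing sequence, because every site on a bead sees the same reagent in each coupling step.

```python
import random


def split_and_mix(n_beads, building_blocks, n_cycles, seed=0):
    """Simulate split synthesis; returns the sequence carried by each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads                      # one growing sequence per bead
    for _ in range(n_cycles):
        rng.shuffle(beads)                      # mix all beads together
        for i in range(n_beads):                # split into equal aliquots
            block = building_blocks[i % len(building_blocks)]
            beads[i] = block + beads[i]         # couple one block per aliquot
    return beads


BLOCKS = ["A", "G", "V"]                        # hypothetical building blocks
beads = split_and_mix(n_beads=2700, building_blocks=BLOCKS, n_cycles=3)
distinct = set(beads)
print(len(beads), len(distinct))
```

No bead ever carries a mixture, yet the pooled beads cover the whole sequence space when the number of beads is large compared with the number of possible sequences. This is why a positive bead can be picked out and sequenced directly.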

Perspectives

While combinatorial small-molecule libraries are certainly invaluable in drug development, one should not ignore the fact that peptide libraries also play an important role in lead identification as well as drug development for certain targets. Furthermore, most of the solid-support chemistry, linker chemistry, screening methods, and library strategies currently employed in small-molecule libraries were originally developed in synthetic peptide library research. Indisputably, combinatorial peptide libraries will continue to be an extremely useful tool for basic research. This review has presented a general overview of the synthesis and screening of peptide libraries to solve various biological and chemical problems. The future of the combinatorial peptide chemistry field is bright because, after all, many of the intercellular as well as intracellular signal pathways in living organisms involve peptides and proteins.


References

1 Geysen, H.M., Meloen, R. and Barteling, S., Proc. Natl. Acad. Sci. USA, 81 (1984) 3998.
2 Geysen, H.M., Rodda, H.M. and Mason, T.J., Mol. Immunol., 23 (1986) 709.
3 Cwirla, S.E., Peters, E.A., Barrett, R.W. and Dower, W.J., Proc. Natl. Acad. Sci. USA, 87 (1990) 6378.
4 Devlin, J.J., Panganiban, L.C. and Devlin, P.E., Science, 249 (1990) 404.
5 Scott, J.K. and Smith, G.P., Science, 249 (1990) 386.
6 Lam, K.S., Salmon, S.E., Hersh, E.M., Hruby, V.J., Kazmierski, W.M. and Knapp, R.J., Nature, 354 (1991) 82.
7 Houghten, R.A., Pinilla, C., Blondelle, S.E., Appel, J.R., Dooley, C.T. and Cuervo, J.H., Nature, 354 (1991) 84.
8 Thompson, L.A. and Ellman, J.A., Chem. Rev., 96 (1996) 555.
9 Merrifield, R.B., J. Am. Chem. Soc., 85 (1963) 2149.
10 Davis, P.W., Hudson, D., Little, M., Sinaha, N.D., Usman, N., Weiss, M. and Wright, P., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 2nd International Symposium, Intercept, Andover, U.K., 1992, pp. 63–75.
11 Beaucage, S.L. and Iyer, R.P., Tetrahedron, 48 (1992) 2223.
12 Douglas, S.P., Whitfield, D.M. and Kerpinsky, J.J., J. Am. Chem. Soc., 113 (1991) 5095.
13 Danishefsky, S.J., McClure, K.F., Randolph, J.T. and Ruggeri, R.B., Science, 260 (1993) 1307.
14 Leznoff, C.C., Acc. Chem. Res., 11 (1978) 327.
15 Cheng, S., Comer, D.D., Williams, J.P., Myers, P.L. and Boger, D.L., J. Am. Chem. Soc., 118 (1996) 2561.
16 Cheng, S., Tarby, C.M., Comer, D.D., Williams, J.P., Caporale, L.H., Myers, P.L. and Boger, D.L., Bioorg. Med. Chem., 4 (1996) 727.
17 Pinilla, C., Appel, J.R., Blondelle, S.E., Dooley, C.T., Eichler, J., Ostresh, J.M. and Houghten, R.A., Drug Dev. Res., 33 (1994) 133.
18 Rapp, W., In Jung, G. (Ed.) Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH, Weinheim, Germany, 1996, pp. 425–464.
19 Vagner, J., Krchnák, V., Sepetov, N.F., Strop, P., Lam, K.S., Barany, G. and Lebl, M., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower, Birmingham, U.K., 1994, pp. 347–352.
20 Vagner, J., Barany, G., Lam, K.S., Krchnák, V., Sepetov, N.F., Ostrem, J.A., Strop, P. and Lebl, M., Proc. Natl. Acad. Sci. USA, 93 (1996) 8194.
21 Singh, J., Allen, M.P., Ator, M.A., Gainor, J.A., Whipple, D.A., Solowiej, J.E., Treasurywala, A.M., Morgan, B.A., Gordon, T.D. and Upson, D.A., J. Med. Chem., 38 (1995) 217.
22 Meldal, M. and Svendsen, I., J. Chem. Soc. Perkin Trans. 1, (1995) 1591.
23 Buettner, J.A., Dadd, C.A., Baumbach, G.A., Masecar, B.L. and Hammond, D.J., Int. J. Pept. Protein Res., 47 (1996) 70.
24 Barany, G., Sole, N.A., Van Abel, R.J., Albericio, F. and Selsted, M.E., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 2nd International Symposium, Intercept, Andover, U.K., 1992, pp. 29–38.
25 Gooding, O., Hoeprich, P.O.J., Labadie, J.W., Porco, J.A.J., Van Eikeren, P. and Wright, P., In Chaiken, I.M. and Janda, K.D. (Eds.) Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, U.S.A., 1996, pp. 199–206.
26 Kempe, M. and Barany, G., J. Am. Chem. Soc., 118 (1996) 7083.
27 Maeji, N.J., Valerio, R.M., Bray, A.M., Campbell, R.A. and Geysen, H.M., React. Polym., 22 (1994) 203.
28 Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 4708.
29 Frank, R., Heikens, W., Heisterberg-Moutsis, G. and Blocker, H., Nucleic Acids Res., 11 (1983) 4365.


30 Kramer, A., Volkmer-Engert, R., Malin, R., Reineke, U. and Schneider-Mergener, J., Pept. Res., 6 (1993) 314.
31 Tegge, W., Frank, R., Hofmann, F. and Dostmann, W.R.G., Biochemistry, 34 (1995) 10569.
32 Berg, R.H., Almdal, K., Pedersen, W.B., Holm, A., Tam, J.P. and Merrifield, R.B., J. Am. Chem. Soc., 111 (1989) 8024.
33 Stankova, M., Wade, S., Lam, K.S. and Lebl, M., Pept. Res., 7 (1994) 292.
34 Fields, G.B., Tian, Z. and Barany, G., In Grant, G.A. (Ed.) Synthetic Peptides: A User’s Guide, Freeman, New York, NY, U.S.A., 1992, pp. 77–183.
35 Fruchtel, J.S. and Jung, G., In Jung, G. (Ed.) Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH, Weinheim, Germany, 1996, pp. 19–78.
36 Winter, M., In Jung, G. (Ed.) Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH, Weinheim, Germany, 1996, pp. 465–510.
37 Stankova, M. and Lebl, M., Mol. Div., 2 (1996) 75.
38 Patek, M. and Lebl, M., Tetrahedron Lett., 32 (1991) 3891.
39 Bray, A.M., Maeji, N.J. and Geysen, H.M., Tetrahedron Lett., 31 (1990) 5811.
40 Hoffmann, S. and Frank, R., Tetrahedron Lett., 35 (1994) 7763.
41 Holmes, C.P. and Rybak, C.M., In Hodges, R.S. and Smith, J.A. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 13th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1994, pp. 992–994.
42 Lebl, M., Krchnák, V., Sepetov, N.F., Nikolaev, V., Stierandova, A., Safar, P., Seligman, B., Strop, P., Lam, K.S. and Salmon, S.E., In Epton, R. (Ed.) Innovation and Perspectives in Solid Phase Synthesis, 3rd International Symposium, Mayflower, Birmingham, U.K., 1994, pp. 233–238.
43 Kania, R.S., Zuckermann, R.N. and Marlowe, C.K., J. Am. Chem. Soc., 116 (1994) 8835.
44 Kocis, P., Krchnák, V. and Lebl, M., Tetrahedron Lett., 34 (1993) 7251.
45 Cardno, M. and Bradley, M., Tetrahedron Lett., 37 (1996) 135.
46 Rivier, J.E., Jiang, G.-C., Simon, L., Koerber, S.C., Porter, J., Craig, A.G. and Hoeger, C.A., In Kaumaya, P.T.P. and Hodges, R.S. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 14th American Peptide Symposium), Mayflower, Birmingham, U.K., 1996, pp. 275–277.
47 O’Donnel, M.J., Zhou, C. and Scott, W.L., J. Am. Chem. Soc., 118 (1996) 6070.
48 Schwesinger, R., Willaredt, J., Schlemper, H., Keller, M., Schmidt, D. and Fritz, H., Chem. Ber., 127 (1994) 2435.
49 Kates, S.A., Carpino, L.A. and Albericio, F., In Kaumaya, P.T.P. and Hodges, R.S. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 14th American Peptide Symposium), Mayflower, Birmingham, U.K., 1996, pp. 893–895.
50 Wijkmans, J.C.H.M., Blok, F.A.A., Van der Marel, G.A., Van Boom, J.H. and Bloemhoff, W., In Kaumaya, P.T.P. and Hodges, R.S. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 14th American Peptide Symposium), Mayflower, Birmingham, U.K., 1996, pp. 92–93.
51 Fuller, W.D., Cohen, M.P., Shabankareh, M., Blair, R.K., Goodman, M. and Naider, F.R., J. Am. Chem. Soc., 112 (1990) 7414.
52 Zhu, Y.-F. and Fuller, W.D., Tetrahedron Lett., 36 (1995) 807.
53 Dimaio, J. and Schiller, P.W., Proc. Natl. Acad. Sci. USA, 77 (1980) 7162.
54 Sham, H.L., Bolis, G., Stein, H.H., Fesik, S.W., Marcotte, P.A., Plattner, J.J., Rempel, C.A. and Greer, J., J. Med. Chem., 31 (1988) 284.
55 Smith, D.D., Slaninova, V.J. and Hruby, V.J., J. Med. Chem., 35 (1992) 1558.
56 Al-Obeidi, F., Castrucci, A.M., Hadley, M.E. and Hruby, V.J., J. Med. Chem., 32 (1989) 2555.
57 Salmon, S.E., Lam, K.S., Lebl, M., Kandola, A., Khattri, P.S., Wade, S., Patek, M., Kocis, P., Krchnák, V., Thorpe, D. and Felder, S., Proc. Natl. Acad. Sci. USA, 90 (1993) 11708.
58 Eichler, J., Lucka, A.W., Pinilla, C. and Houghten, R.A., Mol. Div., 1 (1995) 233.
59 Kramer, A., Schuster, A., Reineke, U., Malin, R., Volkmer-Engert, R., Landgraf, C. and Schneider-Mergener, J., Methods (A Companion to Methods Enzymol.), 6 (1994) 388.


60 Eichler, J., Lucka, A.W. and Houghten, R.A., Pept. Res., 7 (1994) 300.
61 Bradshaw, C.G., Chollet, A.R. and Wells, T.N.C., In Schneider, C.H. and Eberle, A.N. (Eds.) Peptides 1992 (Proceedings of the 22nd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1993, pp. 318–319.
62 Darlak, K., Romanovskis, P. and Spatola, A.F., In Hodges, R.S. and Smith, J.A. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 13th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1994, pp. 981–983.
63 McBride, J.D., Freeman, N., Domingo, G.J. and Leatherbarrow, R.J., J. Mol. Biol., 259 (1996) 819.
64 Spatola, A.F. and Romanovskis, P., In Jung, G. (Ed.) Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH, Weinheim, Germany, 1996, pp. 327–347.
65 DeWitt, S.H., Kiely, J.S., Stankovic, C.J., Schroeder, M.C., Cody, D.M.R. and Pavia, M.R., Proc. Natl. Acad. Sci. USA, 90 (1993) 6909.
66 Maeji, N.J., Bray, A.M. and Geysen, H.M., J. Immunol. Methods, 134 (1990) 23.
67 Gammon, G., Geysen, H.M., Apple, R.J., Pickett, E., Palmer, M., Ametani, A. and Sercarz, E.E., J. Exp. Med., 173 (1991) 609.
68 Mutch, D.A., Rodds, S.J., Benstead, M., Valerio, R.M. and Geysen, H.M., Pept. Res., 4 (1991) 132.
69 Wang, J.-X., Bray, A.M., DiPasquals, A.J., Maeji, N.J. and Geysen, H.M., Bioorg. Med. Chem. Lett., 3 (1993) 447.
70 Wang, J.-X., DiPasquals, A.J., Bray, A.M., Maeji, N.J. and Geysen, H.M., Bioorg. Med. Chem. Lett., 3 (1993) 451.
71 Spellmeyer, D.C., Brown, S., Stauber, G.B., Geysen, H.M. and Valerio, R., Bioorg. Med. Chem. Lett., 3 (1993) 519.
72 Spellmeyer, D.C., Brown, S., Stauber, G.B., Geysen, H.M. and Valerio, R., Bioorg. Med. Chem. Lett., 3 (1993) 1253.
73 Bastos, M., Maeji, N.J. and Abeles, R.H., Proc. Natl. Acad. Sci. USA, 92 (1995) 6738.
74 Maeji, N.J., Bray, A.M., Valerio, R.M. and Wang, W., Pept. Res., 8 (1995) 33.
75 Frank, R., Tetrahedron, 48 (1992) 9217.
76 Blankemeyer-Menge, B. and Frank, R., Tetrahedron Lett., 29 (1988) 5871.
77 Frank, R. and Doring, R., Tetrahedron, 44 (1988) 6031.
78 Frank, R., Bioorg. Med. Chem. Lett., 3 (1993) 425.
79 Frank, R., Hoffmann, S., Kieb, M., Lahmann, H., Tegge, W., Behn, C. and Gausepohl, H., In Jung, G. (Ed.) Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH, Weinheim, Germany, 1996, pp. 363–386.
80 Frank, R., J. Biotechnol., 41 (1995) 259.
81 Schneider-Mergener, J., Kramer, A. and Reineke, U., In Cortese, R. (Ed.) Combinatorial Libraries: Synthesis, Screening, and Application Potential, Walter de Gruyter, Berlin, Germany, 1996, pp. 53–68.
82 Kramer, A. and Schneider-Mergener, J., In Maia, H.L.S. (Ed.) Peptides 1994 (Proceedings of the 23rd European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1995, pp. 475–476.
83 Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T. and Solas, D., Science, 251 (1991) 767.
84 Fodor, S.P.A., Rava, R.P., Huang, X.C., Pease, A.C., Holmes, C.P. and Adam, C.L., Nature, 364 (1993) 555.
85 Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, S.P.A. and Gordon, E.M., J. Med. Chem., 37 (1994) 1233.
86 Holmes, C.P., Adams, C.L., Kochersperger, L.M., Mortensen, R.B. and Aldwin, L.A., Biopolym. Pept. Sci. Sect., 37 (1995) 199.
87 Pinilla, C., Appel, J.R. and Houghten, R.A., In Zegers, N., Boersma, W. and Claasen, E. (Eds.) Immunological Recognition of Peptides in Medicine and Biology, CRC Press, Boca Raton, FL, U.S.A., 1995, pp. 1–14.
88 Pinilla, C., Appel, J.R. and Houghten, R.A., Mol. Immunol., 30 (1993) 577.

207

Z.-G. Zhao and K.S. Lam 89 Appel, J.R., Pinilla, C. and Houghten, R.A., Immunomethods, 1 (1992) 17. 90 Pinilla, C., Appel, J.R. and Houghten, R.A., Gene, 128 (1993) 71. 91 Dooley, C.T., Kaplan, R.A., Chung, N.N., Schiller, P.W., Bidlack, J.M. and Houghten, R.A., Pept. Res., 8 (1995) 124. 92 Dooley, C.T., Chung, N.N., Schiller, P.W. and Houghten, R.A., Proc. Natl. Acad. Sci. USA, 90 (1993) 10811. 93 Blondelle, S.E. and Houghten, R.A., Annu. Rep. Med. Chem., 27 (1992) 159. 94 Eichler, J. and Houghten, R.A., Biochemistry, 32 (1993) 11035. 95 Pinilla, C., Appel, J.R., Blanc, P. and Houghten, R.A., BioTechniques, 13 (1992) 901. 96 Pinilla, C., Appel, J.R., Blondelle, S.E., Dooley, C.T., Eichler, J., Ostresh, J.M. and Houghten, R.A., Drug Dev. Res., 33 (1994) 133. 97 Dooley, C.T. and Houghten, R.A., Life Sci., 52 (1993) 1509. 98 Pinilla, C., Appel, J.R., Milich, D. and Houghten, R.A., In Hodges, R.S. and Smith, J.A. (Eds.) Peptides: Chemistry, Structure and Biology (Proceedings of the 13th American Peptide Symposium), ESCOM, Leiden, The Netherlands, 1994, pp. 1016-1017. 99 Deprez, B., Williard, X., Bourel, L., Coste, H., Hyafil, F. and Tartar, A., J. Am. Chem. SOC., 117 (1995) 5405. 100 Erb, E., Janda, K.D. and Brenner, S., Proc. Natl. Acad. Sci. USA, 91 (1994) 11422. 101 Songyang, Z., Shoelson, S.E., Chandhuri, M., Gish, G., Pawson, T., Haser, W., King, E, Roberts, T., Ratnofsky, S., Lechleider, R.J., Neel, B.G., Birge, R.B., Fajardo, J.E., Chou, M.M., Hanafusa, H., Schaffhausen, B. and Cantley, L.C., Cell, 72 (1993) 767. 102 Songyang, Z., Carrway III, K.L., Eck, M.J., Harrison, S.C., Feldman, R.A., Modammadi, M., Schlessinger, J., Hubbard, S.R., Smith, D.P., Eng, C., Lorenzo, M.J., Ponder, B.A.J., Mayer, B.J. and Cantley, L.C., Nature, 373 (1995) 536. 103 Ellington, A.D. and Szostak, J.W., Nature, 346 (1990) 1104. 104 Chu, Y-H., Avila, L.Z., Biebuyck, H.A. and Whitesides, G.M., J. Org. Chem., 58 (1993) 648. 105 Chu, Y.-H., Dunayevskiy, Y.M., Kirby, D.P., Vouros, P. 
and Karger, B.L., J. Am. Chem. SOC., 118 (1996) 7827. 106 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., poster presented at the 14th International Congress of Biochemistry, Prague, Czechoslovakia, 1988. 107 Furka, A., Sebestyen, E, Asgedom, M. and Dibo, G., poster presented at the Xth International Symposium on Medicinal Chemistry, Budapest, Hungary, 1988. 108 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., Int. J. Pept. Protein Res., 37 (1991) 487. 109 Lam, K.S. and Salmon, S.E., U.S. Patent 5510240, 1996. 110 Lebl, M., Krchnák, V., Sepetov, N.F., Seligmann, B., Strop, P., Felder, S. and Lam, K.S., Biopolym. Pept. Sci. Sect., 37 (1995) 177. III Lam, K.S. and Lebl, M., In Jung, G. (Ed.) Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH, Weinheim, Germany, 1996, pp. 173–201. 112 Lam, K.S., Lebl, M. and Krchnák, V., Chem. Rev., (1997) in press. 113 Lam, K.S., Lake, D., Salmon, S.E., Smith, J., Chen, M.L., Wade, S., Abdul-Latif, F., Leblova, Z., Ferguson, R.D., Krchnák, V., Sepetov, N.F. and Lebl, M., Methods (A Companion to Methods Enzymol.), 9 (1996) 482. 114 Mattioli, S., Imberti, L., Stellini, R. and Primi, D., J. Virol., 69 (1995) 5294. 115 Steward, M.W., Stanley, C.M. and Obeid, O.E., J. Virol., 69 (1995) 7668. 116 De Koster, H.S., Amons, R., Benckhuijsen, W.E., Feijlbrief, M., Schellekens, G.A. and Drijfhout, J.W., J. Immunol. Methods, 187 (1995) 179. 117 Lam, K.S., Lou, Q., Zhao, Z.G., Chen, M.L., Smith, J., Pleshko, E. and Salmon, S.E., Biomed. Pept. Protein Nucleic Acids, 1 (1995) 205. 118 Lam, K.S. and Lebl, M., Immunomethods, 1 (1992) 11. 119 Smith, M.H., Lam, K.S., Hersh, E.M., Lebl, M. and Grimes, W.J., Mol. Immunol., 31 (1994) 1431. ^

^

^

208

Synthetic peptide libraries 120 Samson, I., Kerremans, L., Rozenski, J., Samyn, B., Van Beeumen, J., Van Aerschot, A. and Herdewijn, P., Bioorg. Med. Chem., 3 (1995) 257. 121 Lam, K.S., Wade, S., Abdul-Latif, F. and Lebl, M., J. Immunol. Methods, 180 (1995) 219. 122 Chen, J.K., Lane, W.S., Brauer, A.W., Tanaka, A. and Schreiber, S.L., J. Am. Chem. Soc., 115 (1993) 12519. 123 Combs, A.P., Kapoor, T.M., Feng, S., Chen, J.K., Daude-Snow, L.F. and Schreiber, S.L., J. Am. Chem. Soc., 118 (1996) 287. 124 Kassarjian, A., Schellenberger, V. and Truck, C.W., Pept. Res., 6 (1993) 129. 125 Nestler, H.P., Wennemers, H., Sherlock, R. and Dong, D.L.Y., Bioorg. Med. Chem. Lett., 6 (1996) 1327. 126 Still, W.C., Acc. Chem. Res., 29 (1996) 155. 127 Ohlmeyer, M.H.J., Swanson, R.N., Dillard, L.W., Reader, J.C., Asouline, G., Kobayashi, R., Wigler, M. and Still, W.C., Proc. Natl. Acad. Sci. USA, 90 (1993) 10922. 128 Boyce, R., Li, G., Nestler, H.P., Suenaga, T. and Still, W.C., J. Am. Chem. Soc., 116 (1994) 7955. 129 Lam, K.S., Zhao, Z.G., Wade, S., Krchnák, V. and Lebl, M., Drug Dev. Res., 33 (1994) 157. 130 Wennemers, H. and Still, W.C., Tetrahedron Lett., 35 (1994) 6413. 131 Pennington, M.E., Lam, K.S. and Cress, A.E., Mol. Div., 2 (1996) 19. 132 Sasaki, S., Takagi, M., Tanaka, Y. and Maeda, M., Tetrahedron Lett., 37 (1996) 85. 133 Wu, J., Ma, Q.N. and Lam, K.S., Biochemistry, 33 (1994) 14825. 134 Lam, K.S. and Wu, J., Methods (A Companion to Methods Enzymol.), 6 (1994) 401. 135 Lam, K.S., Wu, J. and Lou, Q., Int. J. Pept. Protein Res., 45 (1995) 587. 136 Lou, Q., Leftwich, M.E. and Lam, K.S., Bioorg. Med. Chem., 4 (1996) 677. 137 Meldal, M., Svendsen, I., Breddam, K. and Auzanneau, EI., Proc. Natl. Acad. Sci. USA, 91 (1994) 3314. 138 Meldal, M., Methods (A Companion to Methods Enzymol.), 6 (1994) 417. 139 Meldal, M. and Svendsen, I., J. Chem. Soc. Perkin Trans. 1, (1995) 1591. 140 Lam, K.S., Salmon, S.E., Hruby, V.J., Hersh, E.M. and AI-Obeidi, E, WO PCT 92/00091, 1992. 
141 Jayawickreme, C.K., Graminski, G.F., Quillan, J.M. and Lerner, M.R., Proc. Natl. Acad. Sci. USA, 91 (1994) 1614. 142 Jayawickreme, C.K., Quillan, J.M., Graminski, G.F. and Lerner, M.R., J. Biol. Chem., 269 (1994) 29846. 143 Salmon, S.E., Liu-Stevens, R.H., Zhao, Y., Lebl, M., Krchnák, V., Wertman, K., Sepetov, N. and Lam, K.S., Mol. Div., 2 (1996) 57. ^

^

209

Phage display

John Collins

Department of Applied Genetics, Gesellschaft für Biotechnologische Forschung, D-38124 Braunschweig, Germany

Introduction

The methodology and application of phage and phagemid display mimic the evolutionary process (Fig. 1), concentrating on a single molecular property or interaction. As such it depends on huge diversity, a selection criterion determined by the environment (fixed by the experimenter) and the time factor, which can be measured in doubling times of the organism. In combinatorial libraries the diversity is enormous and concentrated in small regions or domains. The time scale is dramatically shortened by using organisms which have a very short generation time (e.g. ca. 3 min for a bacteriophage) and where the ‘functional’ selection criterion is often confined to affinity for a particular immobilized molecular target. It should also be noted that there is a built-in selection, as in evolution, which works at the level of replication time of individual organisms (phage variants) during the biological amplification step.

The combinatorial approaches with biologically amplifiable systems and with synthetic libraries look, in principle, very similar (Fig. 1). The main distinction lies in the quantity of the variant species needed to obtain sufficient material for analysis of the enriched sequence. If a single phage with the required properties, from a population of 10^14 phage (easily contained in 1 ml), survives the initial panning round, this is sufficient for subsequent analysis. Chemically synthesized libraries require at least 5 orders of magnitude more material and, as such, are presently at the limits of the technology with a diversity of at most 10^8 variant molecules per ml. This is discussed in more detail below.

Perhaps one of the most important recent developments with the phage-display system comes from the use of recombination and somatic mutagenesis techniques, which again mimic the evolutionary process in natural systems, to generate reassortments between mutagenized regions and to accumulate modifications over larger areas in complex proteins.
Since the initial concept was first introduced in 1985 by Smith [1], the methodology has been applied to a wide spectrum of questions in an ever-broadening field of molecular biology. Although this makes a comprehensive review more difficult, I am indebted to other reviewers who facilitated the initial literature search; in particular Barbas and Burton [2], Burritt et al. [3], Choo and Klug [4], Clackson and Wells [5], Cortese (Ref. 6; including the many individual chapters), Cortese et al. [7], Jongsma et al. [8], Ladner [9] and Makowski [10].

The technology has been grouped together with other methods of combinatorial chemistry to be described as ‘irrational drug design’, in contrast to computer-aided model-based ‘rational design’. Even the most optimistic ‘rational’ protein designers admit that they cannot accurately predict the exact effect of a point mutation on the folding of an entire molecule, and may only model the first picosecond of an actual folding pathway, which may be of the order of a millisecond. To paraphrase one reviewer on the state-of-the-art of protein design [11]: ‘in semi-rigid models success comes more from fitting geometries based on the structure of known complex structures than on known structures of the uncomplexed components. Even in these cases assumptions of rigidity are invalid. In addition, simple scoring functions based on surface complementarity, surface area burial, solvation free energy, electrostatic-interaction energy, or the total molecular mechanics energy, often fail to distinguish near-native structures from structures far from native.’ If this is the situation, then an empirical ‘seeing how the cookie crumbles’ approach such as phage display may indeed be the order of the day. However, the proof of the pudding is in the eating.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1. pp. 210–262 © 1997 ESCOM Science Publishers B. V.

Combinatorial libraries and biotechnological evolution

Fig. 1. A flow chart comparison of the use of combinatorial libraries and the evolutionary process (see text).


In addition, the mathematics of the undertaking, in terms of the number of combinations of possible structures, is against the ‘rational’ approach. The number of variants of a small protein 80 amino acids in length far exceeds the estimated number of atomic particles in the universe. If an empirical approach works effectively, delivering more data and specific ligands in a shorter time, then one must consider the other approaches as irrational, or at least severely limited.

The innovative potential and the ingenuity which have gone into designing libraries, vectors and selection protocols are impressive. However, considerable input has come from structural modelling and design considerations, in the proliferation of phage-display scaffolds to provide constrained variable epitopes in different environments or ‘contexts’. In addition, the method has been used to understand protein folding, protein–protein interactions and structure–function relationships in catalysis. I hope that, in this review, I can do justice to this, recognizing, however, that not every recent contribution could be included. I have also, where relevant, discussed the distinct advantages, or even disadvantages, of phage display, in comparison with combinatorial chemical or other approaches, for the definition of intermolecular interactions, enzyme evolution and the generation of novel ligands.

The application of phage display to defining (functional) binding motifs covers a huge range (these references are, in general, in addition to the more detailed account given in the text below), e.g. metal chelators [12], ssDNA/RNA binding clones [13], DNA-binding transcription regulators [14], TCR mimotopes [15], definition of SH3 epitopes [16], immunogenetic markers [17], MHC specificity [18,19], oligosaccharide mimotopes [20], activation of DNA binding of p53 [21], platelet-aggregation antagonists [22], vaccine development [2,23,24], calmodulin binding peptides [25], protease inhibitors (many; see Table 2) and tissue-specific ligands [26], to give just a few examples. Readers are referred to Smith and Scott [27] for a background introduction to the development of the method, basic concepts and protocols as well as applications up to 1992, and to Kay et al. [28] for a more recent treatise.

Phage and phagemid display: Some important parameters

Many millions of peptide or protein variants, all of which I will refer to collectively as ligands, can be presented on the surface of bacteriophage (phage). The complexity of variant epitopes generated in even a modest library exceeds that of all the epitopes encoded in the human genome, excluding the generation of somatic diversity inherent in the immunoglobulin gene rearrangements [29]. In the original protocols the gene for the ligand is fused to one of the phage coat protein genes in the same translational reading frame. As the phage are produced from individual clones, each variant product is thus physically coupled to the gene coding for that particular variant. Consecutive rounds of affinity adsorption, followed by amplification, enrich for ligands with specific binding properties. The amplification of the enriched phage (or phagemid; see below) is carried out in an F+ E. coli host. Repeating selection and amplification cycles leads to the isolation of a small number of clones which exhibit good binding properties to the target molecule in question.
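The combinatorial argument made earlier in this section (an 80-residue protein versus the number of particles in the universe) can be checked with a few lines of exact integer arithmetic. The sketch below is illustrative only: it assumes the 20 standard amino acids, takes 10^80 as a commonly quoted order-of-magnitude estimate for the particle count (an assumption, not a figure from the text), and uses a random hexapeptide as a hypothetical contrast case. The function names are introduced here for illustration.

```python
# Illustrative arithmetic: size of the sequence space of a linear polypeptide.

AMINO_ACIDS = 20  # assumption: the 20 standard amino acids at every position

# Assumption (not from the text): commonly quoted order-of-magnitude estimate
# for the number of atomic particles in the observable universe.
PARTICLES_IN_UNIVERSE = 10**80


def sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length (exact integer)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet**length


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer, i.e. floor(log10(n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(str(n)) - 1


variants_80 = sequence_space(80)
print("20**80 is of order 10 **", order_of_magnitude(variants_80))
print("exceeds particle estimate:", variants_80 > PARTICLES_IN_UNIVERSE)

# Hypothetical contrast case: every possible hexapeptide.
hexapeptides = sequence_space(6)
print("hexapeptides:", hexapeptides, "fits in 10**8:", hexapeptides <= 10**8)
```

Under these assumptions only very short randomized stretches can be sampled exhaustively, which is consistent with the emphasis on diversity concentrated in small regions or domains.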
One cycle of panning (affinity adsorption) and amplification, which usually gives an enrichment for specific binding clones by a factor of between 100 and 10 000, only takes about 1 day. A quantitative increase in the proportion of the population binding can often be seen after the second or third panning step. These clones can be sequenced and consensus motifs defined. If a ranking of binding affinity has been done in parallel, the amino acid positions directly contributing to or, conversely, unimportant for binding can be rapidly visualized. The system, therefore, represents one of, if not the most rapid procedure(s) available for screening peptide binding motifs. More recently a wide range of small proteins have been developed as scaffolds for presenting constrained

TABLE 1 NOMENCLATURE FOR M13 TYPE PHAGE- AND PHAGEMID-DISPLAY VECTOR TYPES AS PROPOSED BY SMITH [209]

Vector type | Genotype | Helper phage / hybrid protein copies per phage particle | Typical vector(s) | Reference(s)
3 | Phage, hybrid-gIII | Not required / 5 | fUSE5, fAFF1, fdtetDOG1, fkSHO, fdCAT1 | 1, 40, 76, 212, 224, 225, 226
3+3 | Phagemid, hybrid-gIII | Delivers pIII / ca. 1 | pHEN, pCOMB3, pFAB4, pSKAN8, λSurfZap, pHMB, pM13Tsn::III, pLUCK, pICDlLS, pM835 (λint), pFab60, pCW93 + pCW99 (Cre/loxP) | 64, 67, 72, 73, 118, 149, 218, 227, 228, 229, 230, 231, 232, 233
3zip3 | Phagemid, leu-zipper (Jun)-gIII + hybrid-leu-zipper (fos) | Delivers pIII / ca. 1 | pJuFo, pBluescript66-12, *pOK2 (for DIRE) | 37, 168, 171
33 | Phage, hybrid-gIII + gIII | Not required / ca. 1 | M 13-‘33’, M 13704 | 156, 234
6+6 | Phagemid, hybrid-gVI | Delivers pVI / <<1 | pDONG61, lZLG6 (pZLG6) | 121, 235
66 | Phage, hybrid-gVI + gVI | Not required / ca. 1 | Theoretical |
8 | Phage, hybrid-pVIII | Not required but extension limited to ca. 7 aa / ca. 2700 | f8-1, fd8(series), PM52, PM54 | 113, 116, 194
88 | Phage, hybrid-pVIII | Not required / ca. 25–100 | MB56, MB49, fdAMPLAY88 | 194, 236
8+8 | Phagemid, hybrid-pVIII + pVIII | Delivers pVIII / 25–>100 | pC89, p8V5, pM13Tsn::VIII, MB26, pKfdH | 23, 33, 196, 201, 211, 226, 231
VV (lambda) | Phage, hybrid-V + V | Not required / ca. 50 | λfoo | 122
DD (lambda) | Phage, hybrid-D + D | Not required / >1 (2-35 | λpRH804, λfooDc | 124, 125

Phagemid may be propagated as plasmids without phage production. Upon bacterial infection with helper phage, both helper and phagemid are packaged by virtue of the M13 (or fd) replication/packaging origin being present on the phagemid. Type 3, 33 and 3+3 vectors fuse the presented ligand to the amino terminus of either full-length (all) or amino-terminally truncated pIII (e.g. 198–406 fragment; types 3+3 and 33 only). Types 6 and 8 are designated accordingly, indicating fusion to pVI (C-terminal extensions) and pVIII, respectively. Type 66 is presented as a hypothetical possibility. Lambda vectors, DD and VV, are explained in more detail in the text. λ (lambda) proteins are designated alphabetically, i.e. pV is protein ‘Vee’ not five, and pD is protein ‘Dee’ not fifty. The column ‘Helper phage/hybrid protein copies per phage’ indicates whether or not a helper phage is needed to supply coat protein and replication functions, and the valency of the displayed hybrid protein.

TABLE 2 PROTEINS PRESENTED WITH PHAGE DISPLAY

Protein

MW (kDa)

Vector typea

Bank

Designed for

Reference(s)

Cytokines/growth factors hGH IL-3 TNF IL-6 IL-8 CNTF Fos-C5a

22 15 3×17 20 2 × 8.4 23 8.4

3+3 3 3+3 3+3 3+3 3+3 3+3

Yes No No Yes No Yes (Yes

Protein design Ligand interaction studies " (and new scaffold) "

95, 221 237 5 238, 149 5 239, 240 241

Receptors/lectins CD4 FcεRI α-chain Ricin B chain PDGF receptor Protein A, B domain, mini Z-domain

25 19 32 55 6.2 (30 3.7

3 3 3 3 3+3, DD 3+3

No No No No Yes Yes

Ligand interaction studies " " "

159 242, 243 159 159 244, 124 153

Enzymes Trypsin Staphylococcus nuclease Alkaline phosphatase

23 16.5 2×60

33, 88 3+3 3, 3+3 3zip3b 3+3

No Yes No

?

Yes

Change substrate specificity

No No Yes No No Yes

Glutathione transferase (AI-I) Lysozyme β-Lactamase Subtilisin Dihydrofolate reductase cdk (kinases) β-Galactosidase

ca. 25 14.4 50

Affinity matrix? Minimal discont. epitopes

231 159, 103 245, 73 171 246

4×118

3+3 3, DD 3 3zip3 kip3 VV, DD

Enzyme inhibitors Tendamistat

8.2

33

Yes

General, constrained epitope

156

Protease inhibitors BPTI APPI PAI PSTI LDTI-C LACI-Dl Ecotin Cystatin

6.5 6.5 42 6.5 5.2 6.7 16 12

3, 6+6 3+3 3+3 3+3 3+3 3 3+3 3+3

Yes Yes Yes Yes Yes Yes Yes Yes

Inhibitor optimization (novel specificities)

87, 235 185 104 72 –e

25 2×25 50

3, 3+3, 33 Yes 3+3, 3, 8+8 Yes 3+3 Yes

Antibody fragments scFv Fab Diabody (divalent scFv dimer)

Test mechanism-based inhibitor selection

" " " " "

General General General, double valency

73 247, 248, 124 247 –c –d 122, 124

89 249 250 213 211, 228 214

TABLE 2 (continued)

Protein

DNA-binding proteins Zif268 Rop (ColE1) cI (lambda repressor)

MW (kDa)

Vector typea

Bank

Designed for

Reference(s)

10 14.3 (dimer) (dimer)

3+3 3+3

Yes Yes

251 252

3zip3

DNA-binding specificities Novel scaffold; 4 α-helical bundle; protein design

No

–f

Natriuretic peptides ANP (and minidomain)

3.1 1.6

3+3

Yes

Minimum size of constrained 5 epitope for receptor (152) interaction

cDNA expression Ancylostoma caninum Mouse B-cell S. aureus Human PBLs (β-actin)

– – >8 ca. 23

6+6 3zip3b 3+3 3zip3b

Yes Yes Yes Yes

20–>40

3zip3b

Yes

Select protease inhibitors Selection of B-cell cDNAs Pathogenicity factors HIV reverse transcriptase interaction Allergen isolation

3.6

8+8

Yes

12

3

Yes

120

VV

No

Aspergillus fumigatus ribotoxin reAsp fX Miscellaneous (designed) Cys2His2 zinc-finger consensus (CP-1) Cytochrome b562 Agglutinin

235 37 253 254 171, 216, 255

Metal binding site 150 (pharmacophore modelling) Novel scaffold; 4 α-helical 155 bundle; protein design 122

a For vector nomenclature, see Table 1. Table 2 includes and extends the data compiled by Clackson and Wells [5]. 'Hybrid protein' designates the presented ligand fused to a coat protein (minor coat proteins, normally five copies of each of the wild-type proteins; or the major coat protein (pVIII), normally present at ca. 2700 copies or more if the vector is larger).
b This vector allows separate expression of the cDNA fragments (zip2-X) and truncated pIII (zip1-pIII; designated 3zip3 type in Fig. 3) which associate extracellularly as heterodimers, via an antiparallel leucine zipper, stabilized by two cysteine–cysteine bridges.
c R. Crameri and D. Stuber, quoted as a personal communication in Ref. 210.
d Jaussi et al., quoted as a personal communication in Ref. 210.
e E. Auerswald and A.S. Tanaka, personal communication.
f R. Crameri, quoted as a personal communication in Ref. 210.
Abbreviations: ANP, atrial natriuretic protein; APPI, amyloid β-protein precursor inhibitor (Kazal inhibitor analogue); BPTI, bovine pancreatic trypsin inhibitor (Kunitz inhibitor); cI, λ phage repressor; CD4, helper T-cell determination antigen; cdk, intracellular cyclin-dependent kinases from man and hamster; CNTF, ciliary neurotrophic factor; Fab, heterodimer of light and heavy chain V and C1 domains; FcεRI, mast cell Fc receptor I; hGH, human growth hormone; IL, interleukin; LACI-D1, a Kunitz inhibitor domain of the human lipoprotein associated coagulation inhibitor; LDTI-C, a novel trypsin inhibitor from the leech; PAI, plasminogen activator inhibitor; PBL, peripheral blood lymphocyte; PDGF, platelet-derived growth factor; Protein A, Staphylococcus aureus immunoglobulin binding protein A; PSTI, pancreatic secretory trypsin inhibitor (Kazal inhibitor); Rop, copy number control protein of ColE1; scFv, single chain fusion of the light and heavy chain V domains of immunoglobulin; TNF, tumour necrosis factor; Zif268, zinc-finger DNA-binding protein.


epitopes. Also special vectors designed for the presentation of cDNA libraries, and for detailed analysis of structure–function relationships of molecular interactions, including catalysis, have been produced (see below and Table 2). The distinction between phage and phagemid is not always clearly made in the literature. This is often confusing if a vector is cited without recapping on its basic properties. The whole methodology is referred to as phage display, since usually the f-type (f1, fd, M13; see also the special section on new developments with lambdoid phage) phage coat is used to assemble the final packaged particle. There are, however, important differences between phage and phagemid with respect to the number of copies of the ligand presented (valency), e.g. monomeric (types 3+3 and 6+6 phagemid, and types 33 or 66 phage), pentameric (type 3 and type 6 phage; see Ref. 30) and higher valencies, ca. 25–100 (type 8+8 phagemid and type 88 phage) up to 2700 copies (type 8 phage) (see Table 1). It should be noted that many vectors which are of the 33 or 3+3 type use a fusion peptide in which only a carboxy-terminal portion of pIII is present in the fusion protein (fCA55 [31]; fKN16 [32]), which cannot mediate infectivity, on its own, but allows presentation of the ligand on the phage. The functional pIII protein, needed for infectivity, is supplied, along with other coat proteins for the type 3+3 vectors, by packaging in the presence of helper phage. High valency imposes restrictions on the ability to select very strong binding clones from a population with diverse affinities, due to the so-called ‘chelation effect’. This is comparable to the difference seen in the avidity of a pentavalent IgM molecule compared to the affinity of a monovalent Fv fragment. 
High valency vectors do allow isolation of weak binding clones in the absence of stronger binding clones and, as such, can be useful to find lead motifs which can be extended or mutated for further optimization (e.g. Ref. 33). If the affinity of a monovalent ligand were 10^-4 M, it could show an avidity as a pentameric molecule, on a high density target, of up to 10^-9 M. This effect can be avoided to some extent by panning on targets immobilized at low density, and by competition with a ligand during washing and/or elution.

Following on the review of Clackson and Wells [5], I have updated the listing of proteins which have been expressed by phage display (Table 2), with reference to which type of vector has been used (nomenclature as in Table 1). The use of fusions to the minor coat proteins pVII and pIX has been attempted [34], although with little success. Knowledge of the overall structure of the fd phage (see Fig. 2) comes largely from the spectroscopic analyses of Makowski (Ref. 35; see also detailed structural modelling from Marvin et al. [36]), who also predicts that in the fd-related phage Pf1 the pVIII major coat protein may be more tolerant of longer inserts, i.e. for the creation of novel type 8 vectors.

An interesting vector (which I call 3zip3 type in Table 2 and Fig. 3) is presented in Table 2. Since the cDNA gene product is fused to the carboxy terminus of the fusion protein (a leucine zipper), it has the advantage that it only requires the correct reading frame at the 5'-end of the cDNA in creating the bank [37] and as such would be some threefold more efficient in creating the bank.

Although there is no apparent limitation to the size of a protein which may be presented, it should be noted that with the exception of Zif268 (a zinc-finger protein), cytochrome b562, CP-1 (a designed zinc-finger consensus) and Rop (a small four-α-helical-bundle DNA replication control protein from ColE1), all the proteins displayed on phage directly as fusions to phage capsid proteins (type 3, 3+3, 6+6, 88 and 8+8 vectors) are normally secretable proteins. There is an additional limitation to the size of proteins (or peptides) which can be efficiently presented on pVIII, although type 8+8 vectors seem reasonably tolerant at low valencies. The expression of proteins as fusions with a leucine zipper (type 3zip3 vectors; Table 1; Fig. 3), which then associates with a leucine-zipper-pIII fusion protein, allows the presentation of other large nonsecreted proteins (β-actin, DHFR, cI, cdk; Table 2). Developments to construct λ phage-display vectors, which should allow the presentation of nonsecretable proteins, are treated separately below. The various modalities in which proteins are presented on the display vector are illustrated in Fig. 3 with respect to the association of monomers, heterodimers, and particularly for antibodies, the Fab, scFv and diabody display (see Refs. 38 and 39 for reviews). The ‘direct interaction rescue’ system is treated separately below.

Fig. 2. The structure of M13 (fd), according to Makowski [10,35], and λ bacteriophage, showing the proteins which have been tested or implemented in phage display. The phage are both oriented so that their ‘feet’, which they use for attachment to, and penetration through, the cell membrane, are at the top of the diagram. The M13 proteins are numbered with roman numerals. λ (lambda) proteins are designated alphabetically, i.e. pV is protein ‘Vee’ not five, and pD is protein ‘Dee’ not fifty.

Fig. 3. The presentation of proteins in different forms fused to a phage coat protein. pIII-protein fusions are given as an example. All proteins are oriented with their amino terminus to the left unless otherwise indicated. References: For pVI fusions see Ref. 79; Fab (e.g. Ref. 79); pVI fusion [211]; see Ref. 38 for a review. Fab presentation is a paradigm for any heterodimer presentation, where the second (or third or more?) subunit can be expressed separately with its (their) own signal peptide. Association of the subunits occurs in the periplasm or at the surface of the cell. Association can even be carried out with a constant second chain by renaturation out of a guanidinium hydrochloride solution on purified phage [212]; scFv [160,213]; diabodies [214,215]; review [39]. The correct configuration of the divalent bispecific antibody is preferentially induced by having a very short linker between the VL and VH domains, preventing them from associating with each other. The VHa-VLb fusion is expressed from a separate cistron and associates to the indicated divalent ‘diabody’ form after being secreted into the medium; with 3zip3 [37,171,216] an expressed cDNA, as a C-terminal extension to a leucine zipper (for example), associates via an antiparallel binding to a truncated pIII having an amino-terminal leucine-zipper extension; in ‘direct interaction rescue’ (DIRE; Refs. 168 and 169) a functional pIII molecule is only reconstituted if there is a strong interaction between the proteins X and Y, one of which would be either in hypermutated form or an expressed cDNA epitope.

Library type and suitability

Different users will emphasize various properties which they require in a phage-display library. I have made a selection of a few of the important considerations to be made in selecting or developing such a library. In some cases different motifs have been found for the same binding target, e.g. calcium-dependent calmodulin binding [25], although no clear explanation is apparent. Thus, in spite of the validity of the following general considerations, it may be advisable to try a range of display libraries to initiate a lead towards developing optimal ligands for a particular target.

Binding kinetics and library size One would like to attain a stringent selection for the strongest binding clones from a mixture of molecules with a wide range of affinities to the target, even if only very small amounts of the target are available. Barrett et al. [40] studied the kinetics of binding and dissociation, as well as considering the effect of lowering target concentration. The binding affinities of synthetic peptides and phage carrying the same peptide sequence were found to be very similar. If only small amounts of target are available, the initial binding kinetics with very large banks (e.g. with more than 1010 primary clones, where the concentration of an individual species in a population of 1013 particles/ml is less than 10–17 M) will be highly unfavourable in selecting the ideal variant. That the technique works at all is due to the fact that the reaction is driven by the concentration of the immobilized target and by increasing the exposed surface to which it is attached (e.g. using beads, or allowing association with a soluble, biotinylated target which is subsequently trapped on an avidincoated surface; the affinity of avidin for biotin is barely distinguishable from a covalent attachment, ca. 10–14 M). When a single ligand molecule is presented per particle, the onus for binding phage is thus placed on a low dissociation rate constant. This supports a rationale in which phagemid with multivalent presentation are used in initial screens [33]. Subsequently, dedicated subset libraries (so-called ‘second-generation libraries’) can be used in later optimization pannings. However, if a single ‘ideal’ phage variant survives the initial panning round, it is more highly favoured in the subsequent rounds, where it is then present at higher concentrations. To test if enrichment is indeed taking place during panning, an increase in the number of bound phage is checked initially by ELISA or titrating the transducing particles. 
This is usually seen after two to five rounds of panning. A number of randomly selected clones are then sequenced. In a successful panning, a few clones are usually represented more than once, and sequence homologies (consensus motif(s)) can often be recognized between several clones. Where random point mutation is used throughout the scaffold, as in the mimicry of ‘somatic maturation’ in antibodies [41], more than 10 rounds of ever-increasing washing stringency may be appropriate, as multiple mutations accumulate. Where a preferential enzyme substrate, e.g. for a protein kinase, is to be enriched, it is only the modified (e.g. phosphorylated; Ref. 42) phage which is selected. This modified ligand phage can, for instance, be enriched on an antibody, which can be immobilized in vast excess. In this case the enzyme need only be present in trace catalytic amounts, and indeed a limiting concentration of the enzyme is necessary to distinguish the preferred substrate for promiscuous kinases [43].

Recognition of synergistic intramolecular interaction: Phage versus synthetic combinatorial libraries

To identify an individual enriched ligand from synthetic chemical libraries (see the chapter by Zhao and Lam), the number of variants used is limited by the amount of material which is needed for sequence analysis. The use of smaller beads has been made possible by inclusion of an encryption tag, e.g. an oligonucleotide, the sequence of which can be determined after PCR [44,45]. The size of such banks is, all the same, similar to phage-display libraries with some 10^8 molecular species per ml, using 10 µm beads [46].
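The concentration figure quoted in the discussion of binding kinetics (an individual species at less than 10^-17 M in a bank of more than 10^10 clones at 10^13 particles/ml) can be checked with a few lines of arithmetic. The sketch below assumes that every clone is equally represented, which a real amplified bank will not satisfy exactly; the function name is hypothetical and chosen only for illustration.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def clone_molarity(particles_per_ml, n_clones):
    """Molar concentration of one clone, assuming equal representation.

    particles_per_ml: total phage particles per millilitre
    n_clones: number of distinct primary clones in the bank
    """
    copies_per_ml = particles_per_ml / n_clones
    copies_per_litre = copies_per_ml * 1000.0
    return copies_per_litre / AVOGADRO


# Values quoted in the text: 10^13 particles/ml, 10^10 primary clones.
conc = clone_molarity(1e13, 1e10)
print(f"{conc:.2e} M")  # about 1.7e-18 M, i.e. below 10^-17 M
```

With only about a thousand copies of each clone per millilitre, the target concentration, rather than the phage concentration, has to drive the binding reaction, as argued in the text.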

J. Collins

TABLE 3
CONSENSUS SEQUENCES FOR HUMAN PANCREATIC SECRETORY TRYPSIN INHIBITOR (PSTI) VARIANTS, SELECTED BY PHAGE DISPLAY IN THE pSKAN8 VECTOR, WHICH INHIBIT EITHER BOVINE CHYMOTRYPSIN (CHY) OR HUMAN LEUKOCYTE ELASTASE (HLE) WITH SUBNANOMOLAR Ki VALUES (REF. 72; AND P. RÖTTGEN AND J. COLLINS, UNPUBLISHED DATA)

Library pSKAN8-HyB (CX7C)

CHY-inhibitors
17: T3 P2 D1 R1 S1 L1
18: W4 Y2 F1 M1 L1
19: E3 D1 I1 A1 S1 L1 N1
20: L3 S2 Y1 R1 V1 I1
21: R7 K1 Q1
22: P3 W3 T1 R1 S1
23: V6 M1 I1 L1

HLE-inhibitors
No enriched motif

Library pSKAN8-HyC (CX6VC)

CHY-inhibitors
17: T1 G8 P7 Q3 L1 E1 V1 A1
18 19: W20 D7 Y5 V6 L4 S6 F3 N3 Mq T3 M3 I2 Q2 E1
20 21 22 23: L17 R31 P27 V – M7 K3 R3 W4 A2 R3 Q1 Y2 T1 H1

HLE-inhibitors
(1) 17 18 19: P2 V2 N1 S1
(1) 20 21 22 23: W1 R1 P1 V – Y1 M1
(2) 17 18 19: P4 A4 V3 T1
(2) 20 21 22 23: L4 Q1 W3 V – L1 A1 F1

Best CAPD-designed PSTI variants

CHY-inhibitors
17 18 19 20 21 22 23: T Y D Y R P V – – –

HLE-inhibitors
17 18 19 20 21 22 23: T V I Y – – V – F P –

The aa position is given above the bold major consensus sequence. The number of times that a particular amino acid was found at each position is shown as a suffix. Positions not varied are underlined. It should be noted that in the case of the CHY-inhibitor selection from two different banks (HyB (CX7C) and HyC (CX6VC)), a highly reproducible picture is obtained for both the main consensus, and for the ranking of amino acids which can occur as the second or third most favoured. The positions at which the highest (position 3) and lowest (position 5) variability is tolerated are also reproduced in the analyses with the two libraries. The HLE-inhibitor variants show a different situation. Although the variants are all enriched from one bank, they fall into two groups, implying a strong differential synergism between amino acids in the inhibitory loop. It should be noted that for HLE-inhibitors the most variable position is now position 5. Sequences optimized during a computer-aided protein design project [88] are shown for comparison, with the nonvaried positions (found in hPSTI) underlined.

The alternative approach, also using synthetic molecules, is to look at consensus sequences, where invariable positions can be recognized above background, e.g. in the determination of the phospho-peptide-binding specificity of SH2-domains [47,48], and peptide-binding to SH3-domains [49,50]. This will not provide information on optimal subsets where synergistic interactions between neighbouring amino acids within the variable sequence may play a major role, and which may indeed be the strongest binding clones (see Table 3). Such subspecies may be a very minor class and disappear in the consensus background. It is also noted that there are examples where a receptor may bind in two orientations, e.g. in the case of the SH3-domain [51], or where no dramatic anchor residues are essential, or even where more than one site on the target can act as a strong binding site. This may scramble a potential consensus sufficiently to make it uninterpretable.

Epitope mapping: Is phage display the best way?

There are innumerable examples of using phage display for identifying the (hopefully) linear epitope recognized by a given antibody (see Refs. 3 and 7). In 11 examples, most showed linearity with sequences in the original antigen; others seem to mimic complex epitopes in another fashion, e.g. Renschler [52] showed that peptides binding to an antibody produced by a lymphoma fell into three consensus classes. My personal opinion is that this can be done more efficiently with synthetic peptides. Where the antigen is known, epitope scanning with overlapping synthetic peptides based on the original sequence is routine (e.g. Ref. 53). Even when the original antigen is unknown, this can be done efficiently according to the matrix immobilized ‘Spot method’ [54–58] which generates 400 sets of peptides of the form (say) XXXO1O2XX, where O1 and O2 are defined amino acids for each set, and X is a mixture of all 20 amino acids. The preferred set is defined (O1 = D1, O2 = D2), e.g. by ELISA, and the next 400 sets prepared, of the form XXO1D1D2O2XX. After three to four rounds of selection (e.g. panning), short linear epitopes are defined. Constrained phage-displayed peptide libraries often yield the strongest binding mimotopes having consensus motifs unrelated to the original linear epitope (e.g. Refs. 59 and 60; and numerous examples below), so that a correlation between the consensus and a putative antigen would be impossible.
On the other hand, the situation is dramatically in favour of phage-display libraries where disease, or autoimmune disease, antigen epitopes are to be recognized in polyclonal serum (e.g. Ref. 61, discussed below), a problem which can hardly be approached by synthetic library methods.

How good are the banks?

Vector and library stability

A library which is to be distributed or to be used as a reference (e.g. an ‘immortalized’ antibody repertoire, or a general cDNA expression library) should exhibit stability as a gene bank, even when greatly amplified. In general, phagemid vectors are to be considered more stable than phage libraries, since the latter depend almost entirely on single-strand DNA propagation during amplification. In addition, problems may occur due to intolerance by E. coli for certain sequences during DNA replication, with perhaps deletion of repeats and palindromes (e.g. Refs. 62 and 63). The problem of deletions has been recognized in the vector pComb3, and more stable derivatives were constructed by deletion of short repeats (pICD1 LS; Ref. 64). Libraries constructed by Cre/loxP recombination [65–69], where the loxP sites are present as direct repeats, might be expected to show similar instability. Largely anecdotal examples and logical comparisons with other expression systems imply that a negative selection for a subpopulation may occur during the amplification steps. This may be due to poor translation or transcription, toxicity of the protein presented, lack of secretion (e.g. a run of more than eight consecutive hydrophobic amino acids, amongst many others) and wrong folding, followed by proteolytic degradation, particularly where a constrained scaffold protein is being used. In fact, the problem of toxicity (pIII; Ref. 70) can be readily demonstrated with some vectors during derepression of the pIII-hybrid protein production in the absence of phage production. In one detailed study with nonapeptide fusions to pIII in the fAFF vector, a dramatic reduction of phage titre (3–7 orders of magnitude) was seen, directly corresponding to the number of positively charged amino acids within the varied region [71]. Hosts carrying SecY-secretion pathway prlA-suppressor mutations almost completely restored the phage titre to normal. Another way to reduce the lethal effect of the gene products during propagation is to use strongly repressible promoters (such as λpL) for the control of production of the protein or peptide to be presented on the phage (e.g. Refs. 64, 72 and 73). Another study, in which a random decapeptide, (NNN)10, was inserted after amino acid 7 of the mature pIII in fUSE5 (type 3), showed no appreciable bias for codon usage or amino acid distribution [74]. The problem of folding may be combatted by choosing proteins for a presentation scaffold for which a tolerance of diversity in (a) specific loop(s) has been demonstrated or is implicit, e.g. conotoxins, serine protease inhibitors and antibodies. The negative effect of the presented ligand on propagation (i.e. in this case the small plaque phenotype) has been used effectively by O’Neil et al. [75] to investigate mutations in the immunoglobulin-binding B-domain of the staphylococcal protein G. Variants in which the folding of the domain could be destabilized gave larger plaques. It is often difficult to evaluate the quality of banks described in the literature with respect to the parameters described above, particularly with respect to folding. Often physical evidence that a hybrid protein is actually displayed is not presented.
In many cases, including peptide libraries, it is still unclear if infectivity is increased due to natural proteolytic cleavage of amino-terminally extended pIII hybrid proteins, and to what extent this proteolysis is actually occurring [76]. An investigation of the fraction of phage (vector type 3, 6 or 8) primary clones (selected as antibiotic-resistant colonies) which can actually generate recombinant phage would give a partial answer to the questions posed above, but, with the exception of Ref. 71, discussed above, I have not found such an analysis in the literature. The very low absolute titres of phage (where given), often only 10^10 to 10^11 pfu/ml, compared to over 10^12 for wild-type M13, during bank preparation imply that this could be a considerable problem. Since the whole protocol is subject at various stages to an evolutionary selection (Fig. 1), problems which affect viability and replication rates during propagation can dramatically counterselect subpopulations of clones. Since phage are constantly in the propagation mode it would seem reasonable that phage banks, as compared to phagemid banks, which are amplified mostly as plasmids, without phage protein synthesis, would be more susceptible to such negative selection. All these factors contribute to a reduction in the actual number and relative abundance of variants present in the library, particularly after amplification. The theoretical number of combinations present in the bank is usually considerably less than the number of possible combinations, if more than seven positions are randomized. How the clonal representation may further be influenced by the mutation strategy is treated in the following section.
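The remark that banks randomized at more than seven positions sample only part of sequence space can be made concrete with a simple sampling model. The sketch below assumes that all 20^n amino acid sequences are equally likely and that clones are drawn independently (a Poisson approximation); both are idealizations, and the bank size of 10^9 clones is a hypothetical value chosen only for illustration. The function names are likewise illustrative.

```python
import math


def possible_variants(n_positions, alphabet=20):
    """Number of distinct amino acid sequences for n fully randomized positions."""
    return alphabet ** n_positions


def expected_coverage(n_clones, n_variants):
    """Expected fraction of variants present at least once in the bank.

    Assumes equally frequent variants and independent sampling.
    """
    return 1.0 - math.exp(-n_clones / n_variants)


def clones_for_confidence(n_variants, confidence=0.99):
    """Clones needed so that any given variant is present with this probability."""
    return n_variants * -math.log(1.0 - confidence)


bank_size = 1e9  # hypothetical number of primary clones
for n in (6, 7, 8):
    variants = possible_variants(n)
    print(n, variants, round(expected_coverage(bank_size, variants), 3))
```

Under these assumptions a bank of 10^9 clones covers essentially all hexapeptides, roughly half of all heptapeptides and only a few per cent of all octapeptides. The excess needed for a given variant to be present with 99% probability is ln(100), about 4.6-fold, which is of the same order as the fivefold figure attributed to Clackson and Wells [5].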


Fig. 4. Computer model of the interaction of chymotrypsin and the PSTI variant containing the sequence CRWDLRRVC (aa 16–24) (J. Collins, P. Röttgen and U. Lessel, unpublished results). Asp19 and Arg17 from the PSTI variant are shown in the proximity of the active site of the catalytic triad (Ser195, His57 and Asp102), which is shown with space-filling structures. The α-carbonyl backbone of the inhibitory loop is marked in yellow. It appears that, contrary to other inhibitors of the Kazal or Kunitz class, two inhibitor side chains intrude into the active site, where Arg17 neutralizes the charge from Asp102 and actually moves it away from the histidine. Asp19 approaches the histidine ring from the other side. The aliphatic body of Arg17 lies over the histidine ring and would form a hydrophobic attraction. It is implied from this model that the inhibitor directly inactivates the catalytic triad, with the side chains of amino acids 18 and 21 occupying the substrate-binding pockets.

Fig. 5. Strategy for the use of mutational libraries following the protocol used by Schier et al. [94] (see also Refs. 85, 93, 128 and 130) for the optimization of a large 20 aa loop in the CDR3 of an antibody heavy chain. The sequence (theoretical) found in the variable region of the best binding clones from the first selection protocol, e.g. enriched population after three panning rounds, is shown as four separate regions, which are mutated separately, to produce four mutagenesis libraries. The 70:10:10:10 mutagenesis method is used. For amino acids encoded by degenerate codon sets, where the third base is undetermined, this means that in 51% of the cases the original amino acid will be substituted by one of the other 19 possibilities. For amino acids encoded by a single codon (methionine and tryptophan) the original codon will be replaced at a frequency of 66%, and there is 61% substitution for those encoded by two codons. Following further panning selection, an example is shown for one of the mutagenesis libraries of the enriched sequences. On the basis of these data the sequence of the best binder is seen to contain three changes compared to the original sequence. It is assumed that the banks mutated in regions 2 and 4 did not produce better binding clones, but that the optimal binder from region 3 has two mutations compared to the original (asterisks). The final optimization is carried out as shown, making combinations of the enriched mutations in regions 1 and 3, again yielding a further increase in binding affinity in the fifth combination shown. This optimized region can now be recombined by CDR- or chain-shuffling to yield combinations which have picomolar binding constants (Fig. 6).

Fig. 6. Overall compilation of strategies used for optimizing phage-presented antibody. Starting from an initial repertoire library [174,217–220], CDRs can be oligonucleotide-mutagenized [12,66]; the CDRs can be shuffled, i.e. exchanged (only shown for CDR3) [66,221]; the light and heavy chains recombined (‘chain shuffling’) (Refs. 65, 69 and 162; see Figs. 2 and 3); consensus-mutational libraries constructed (Ref. 94; see Fig. 5), optionally followed by a repeat of the former chain shuffling (combination testing; Figs. 8–10); and the product finally subjected to ‘somatic mutation’ throughout the framework. The final mutagenesis may be either via passaging the phage/phagemid through a mutD5 strain [41] or via error-prone PCR and reinsertion of the whole scFv cassette [102]. These principles could be applied to any multidomain heterodimer presented and selected on phage.

Fig. 7. A number of presentation scaffolds are presented, as ribbon diagrams of the backbone, showing the regions which have been hypermutated in libraries (chains of beads). α-Helices (red), β-sheets (ochre), and turns (blue) can be distinguished. S-S bonds are represented as short red bars. Structures were taken from the Brookhaven Protein Database under the indicated index number, given in brackets. The initial images were generated with RASMOL version 2.6 and modified with CorelDraw 5.0. (I am indebted to Jörn Glökler for his help in creating this compilation.) Note that ANP is essentially a constrained cyclic peptide; the protein A Z-domain (a third α-helix is missing in the representation behind and between the two helices illustrated) and cytochrome b562 are almost entirely α-helix, whereas scFv, tendamistat and the ‘minibody’ (not shown; almost the same general structure as tendamistat, but without the cysteine residues) are β-barrels. The proteinase inhibitors in the top row have mixed structures. The structures are all represented on the same scale to allow comparison of the area containing the hypervariable surfaces. References: (Alzheimer’s) amyloid β-protein precursor inhibitor [222]; bovine pancreatic trypsin inhibitor [87]; human pancreatic secretory trypsin inhibitor [72]; atrial natriuretic protein [152]; Z-domain of S. aureus protein A [153,154]; tendamistat [156]; cytochrome b562 [155]; scFv antibody fragment – numerous references are presented throughout the text.

Codon usage, bias and representation

As Clackson and Wells [5] pointed out in their review, the actual number of clones necessary to be sure (99%) that all possible variants are present at least once in the primary clone set is at least 5 times greater than the number of possible permutations of amino acid sequences (i.e. 20^n, where n is the number of codons fully randomized). The number required may be considerably higher due to e.g. biased base distribution in the random sequences inserted and the degenerate nature of the genetic code. Using the hypervariable codons NNG/T [77], which reduces stop-codon frequency to 1 in 32 (21% of octamer peptide coding sequences will be interrupted by at least one stop-codon TAG), as opposed to 3 in 64 for NNN, affects the distribution frequency of the amino acids: arginine, serine and leucine will occur threefold more frequently, and alanine, glycine, proline, threonine and valine twice as frequently as the rest of the amino acids [72,78]. This bias is similar to that found in natural proteins in terms of relative frequency ranking, but will be advantageous for methionine and tryptophan, to the deficit of isoleucine, aspartic acid, asparagine, cysteine, glutamic acid, glutamine, histidine, phenylalanine, lysine and tyrosine. The reduction in the frequency of aromatic amino acids and cysteine is, at least for systems using a constrained surface loop, to be considered as an advantage in view of the difficulties in secreting long hydrophobic stretches, as well as with respect to protein folding. The use of NNG/C (e.g. Ref. 79) also allows coding of all amino acids with the same effect on distribution as NNG/T. Some libraries (e.g. Ref. 78) have been constructed using NNB (B = C, G or T) codons. Although this reduces the frequency of stop codons to 1 in 48, which may be important in constructing longer hypervariable regions, the relative representation of the amino acids is more biased towards serine, arginine and leucine, which are then 5, 4 and 4 times more frequent, respectively, than methionine, glutamic acid, glutamine, lysine and tryptophan. Where a phage vector (type 3) bank (insert S(S or R)X18PG X18SR) was constructed in supE DH5αF', only two clones from a hundred were interrupted by a stop codon, although some 76% of the clones would have been expected to be so [80]. This implies that the suppressor was too weak to give effective suppression during the formation of the phage particles. In contrast, using a synthetic CDR-antibody library, also in a type 3 vector, amplified in TG1 supE [66,81], clones were selected which contained stop codons, which were assumed to be translated as glutamine.

Arkin and Youvan [82] have developed an algorithm for ‘doping’ the ratio of bases at any position in the codon (i.e. the addition of fixed ratios of the bases at each step during the synthesis of the hypervariable mutagenic oligonucleotide), so that a bias can be imposed with respect to the type of amino acid. The second base, in particular, determines the chemical nature of the subset: A gives hydrophilic residues; C small amphiphilic residues; T hydrophobic residues. A particular aim would be to find a codon bias which maximizes the chemical diversity (e.g. large aromatic versus charged or small aliphatic amino acids) of the randomized set with a minimum of sequences. Goldman and Youvan [83] use this algorithm for optimizing codon usage subsets for randomized libraries, in studies on random mutagenesis at several positions throughout the light harvesting antennae protein of Rhodobacter capsulatus. I found the criteria for choice of the subgroup of mutagenesis sites and substitutions curious and against the empirical strategy. This approach is certainly of great interest if developing (say) a leucine zipper, β-sheet, hydrophilic flexible linker, or as was done for a four-α-helical bundle, by alternating NAN and NTN codons (Ref. 84; unfortunately not done in conjunction with phage display!), or by doped randomization of a long (say, more than eight codons) sequence (see also the 70:10:10:10 mutagenesis library technique below [85]).

In view of the success of the somatic maturation approach [41], in which mutations are accumulated throughout the protein due to a relatively high background mutagenesis rate in E. coli mutD5 hosts [86], it would seem that the empirical approach may often be preferable to confining oneself to biased subsets. For example, the self-imposed restriction to study a total of 1000 subsets of replacements at five positions in the inhibitory loop of BPTI for the phage-display selection of elastase inhibitors [87] goes against the whole rationale of the empirical approach, which, at its best, should provide innovative new insights into structures in which new inter- and intramolecular synergistic interactions have occurred. Such issues of nonobviousness have become a critical issue in the intellectual property field (see the chapter by Bozicevic). We have used the empirical approach with a human pancreatic secretory trypsin inhibitor (PSTI) library in which 6, 7 or 8 (one additional compared to the original loop) positions were randomly varied (NNT/G) in the inhibitory loop (Ref. 72; and J. Collins and P. Röttgen, unpublished results in Table 3) with the aim, amongst others, of developing stronger inhibitors of a range of serine proteases.
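The codon statistics quoted in this section for NNG/T, NNG/C, NNB and NNN randomization follow directly from the standard genetic code and can be recomputed by enumeration. The sketch below is an illustration only: it uses the IUPAC ambiguity letters K (G or T) and S (G or C) as shorthand for the NNG/T and NNG/C schemes, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Bases allowed at each codon position for the randomization schemes in the text.
SCHEMES = {
    "NNN": ("TCAG", "TCAG", "TCAG"),
    "NNK": ("TCAG", "TCAG", "GT"),   # NNG/T
    "NNS": ("TCAG", "TCAG", "GC"),   # NNG/C
    "NNB": ("TCAG", "TCAG", "CGT"),
}


def codon_usage(scheme):
    """Count how many codons of a scheme encode each amino acid (or stop)."""
    return Counter(CODON_TABLE["".join(c)] for c in product(*SCHEMES[scheme]))


def interrupted_fraction(scheme, n_codons):
    """Fraction of inserts of n_codons containing at least one stop codon."""
    usage = codon_usage(scheme)
    stop_free = 1.0 - usage["*"] / sum(usage.values())
    return 1.0 - stop_free ** n_codons


for name in SCHEMES:
    usage = codon_usage(name)
    print(name, sum(usage.values()), "codons,", usage["*"], "stop;",
          "S/R/L =", usage["S"], usage["R"], usage["L"])
print(round(interrupted_fraction("NNK", 8), 3))
```

The enumeration reproduces the ratios given in the text (one stop codon in 32 for NNG/T, one in 48 for NNB, three in 64 for NNN, and the 3:2:1 and 5:4:4 amino acid biases). For octamer inserts it gives about 22% interrupted sequences with NNG/T, close to the 21% quoted above.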


The elastase inhibitors (Table 3) selected from these libraries fall into two narrow categories, in which one is very homologous to traditional inhibitors proposed from molecular modelling, but where the other differs strongly. The first set shows conservation of VX(W/Y)XP (aa positions 18–22), the other AXLXW. It would seem that this result implies a strong synergism between the conserved amino acids present in this highly constrained extended loop. Strangely, position 21, thought by modellers to be best fitted by arginine or phenylalanine [88] and essential for a strong binding, appears as the most variable position in this analysis. Amongst the subnanomolar chymotrypsin inhibitors, modelling of one of the best variants implied a novel inhibitory mechanism for protein serine protease inhibitors, in which two amino acid side chains (arginine and aspartic acid) intrude into the proximity of the catalytic triad of the protease rather than binding in the substrate-binding pockets (see Fig. 4). Markland et al. [89,90] have, by varying small (initially only 30 000 variants) biased subset (‘variegated’) sequences, succeeded in isolating subnanomolar inhibitor variants of LACI-D1 (see Table 2) for the serine proteases plasmin and kallikrein, although not for thrombin. The initial selection strategy for residues which can be varied is similar to the algorithm approach of Laskowski [91], based on comparisons with known inhibitors of the same class. The inherent assumption, thereby, that single residues contribute additively to the binding constant, independently of intramolecular synergisms within the inhibitor itself, has been criticized on the basis of experimental results [88]. The authors support their conservative approach by their wish to obtain variants which have minimal change(s) with respect to the original protein.
It has, however, been reported that a single amino acid exchange in a serine protease inhibitor can change the relative specificity of the inhibitor by 6 orders of magnitude [88]. In view of this, it is difficult to evaluate whether this ‘variegated’ approach succeeded due to their strategy, or to the fact that many variants on this geometrically optimized scaffold would have fulfilled the job to the same extent.

Ophir and Gershoni [92] created an algorithm to evaluate the optimal doping frequency in mutagenic libraries with synthetic cassettes of different length. As one example, for a sequence encoding 22 aa, an optimal doping frequency of 5% was found. At this level 6% would have no mutation; 18%, 1; 26%, 2; 23%, 3; 15%, 4; and 7%, 5 mutations. A library of 6.5 × 10^10 clones would be necessary to encompass all combinatorial sets of five mutations within this 22 aa coding region. In the mutational library approach of Yanofsky et al. (Ref. 85; see also Ref. 93), peptide libraries are generated based on an original, weakly binding, motif. This sequence is mutated at a rate of 50% per amino acid residue (for amino acids encoded by codons where the third position is completely degenerate; 66% mutation rate for amino acids encoded by a single codon sequence), and three additional, completely random, amino acids are added as extensions at both ends. This frequency of mutation is achieved by synthesizing oligonucleotides where the original base is incorporated at a frequency of 70% and the other three bases at 10% each. The method is, therefore, referred to as the 70:10:10:10 approach. The use of this method to develop strong EPO-receptor ligands is described in more detail below (see the ‘Constraints and context’ section).

This mutagenesis approach, in combination with light- and heavy-chain shuffling, is also the basis for the success in optimizing the affinity of an antibody directed towards the tumour antigen ErbB2, described by Schier et al. [94]. They initially isolated an anti-ErbB2 scFv antibody from a human nonimmunized phage-display repertoire library (Kd = 1.6 × 10^-8 M). Nine positions in the VL-CDR3 loop were partially randomized by the 70:10:10:10 method (see above), yielding, in combination with the original VH-chain, phage-display-selected clones with a Kd of 10^-9 M. The CDR3 of the heavy chain was 20 aa in length, for which four separate mutagenesis banks (each ca. 10^7 clones) were generated, varying 16 positions, four residues at a time, based partly on an alanine substitution scan, to determine which residues were likely to be of importance, and partly on structural considerations (e.g. a pair of cysteines, a tryptophan and a histidine were not mutated; Fig. 5). Clones selected under more stringent conditions showed a further improvement in binding, in combination with the best VL previously selected (Kd = 1.1 × 10^-10 M). An interesting point here is that there was no correlation between the alanine scan and the variability of the positions found amongst the best binding clones (in contrast to Ref. 95). I interpret this to be an indication of the plurality of synergistic effects which can occur in a randomized set, again a good argument for the empirical approach. Combinations of two and five mutations, respectively from the two banks which showed the strongest binding clones, yielded a further improvement to 1.3 × 10^-11 M. The increase in binding constant was shown to be almost entirely due to a decrease in the dissociation rate constant. This relative increase in affinity, compared to the original antibody, and the absolute dissociation constant achieved, represents one of the most successful attempts to obtain antibody maturation, either in vitro or in vivo. Yang et al. [96] also increased the binding affinity of an antibody from 6.3 nM to 15 pM by a similar combination of CDR mutagenesis and shuffling, which they refer to as ‘CDR walking’.
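The doping figures quoted above can be reproduced with short calculations. The sketch below is illustrative rather than a reimplementation of any published algorithm. It computes (i) the chance that a 70:10:10:10-doped codon no longer encodes the original amino acid, (ii) the distribution of mutation numbers over 22 residues, and (iii) the number of variants carrying exactly five substitutions. The per-residue substitution probability of 0.12 used in (ii) is an assumption chosen here because it reproduces the quoted percentages; it is not a value given in the text. All function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_probability(codon, keep=0.7):
    """Probability that a doped codon no longer encodes the original amino acid.

    Each base is retained with probability `keep`; each of the three
    other bases is incorporated with probability (1 - keep) / 3.
    """
    other = (1.0 - keep) / 3.0
    original = CODON_TABLE[codon]
    unchanged = 0.0
    for mutant, aa in CODON_TABLE.items():
        if aa == original:
            p = 1.0
            for old, new in zip(codon, mutant):
                p *= keep if old == new else other
            unchanged += p
    return 1.0 - unchanged


def mutation_count_distribution(n_residues, p_residue):
    """Binomial probabilities of 0, 1, 2, ... substituted residues."""
    return [math.comb(n_residues, k) * p_residue**k * (1 - p_residue)**(n_residues - k)
            for k in range(n_residues + 1)]


def variants_with_k_mutations(n_residues, k, alphabet=20):
    """Number of sequences differing from the parent at exactly k positions."""
    return math.comb(n_residues, k) * (alphabet - 1)**k


for codon in ("GCT", "GAT", "ATG"):  # Ala (4 codons), Asp (2 codons), Met (1 codon)
    print(codon, round(substitution_probability(codon), 3))
print([round(100 * p) for p in mutation_count_distribution(22, 0.12)[:6]])
print(f"{variants_with_k_mutations(22, 5):.2e}")
```

For a fourfold-degenerate codon the substitution probability is 1 - 0.7^2 = 0.51, for a two-codon amino acid about 0.61 and for a single-codon amino acid about 0.66, in line with the approximate 50% and 66% rates mentioned in the text. The count of five-mutation variants, C(22,5) × 19^5, comes to about 6.5 × 10^10.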
The use of synthetic trinucleotides as building blocks would obviate the problems of stop codons or other unwanted amino acids turning up randomly in hypervariable sequences, and would indeed allow the predetermined definition of amino acid codon frequency bias. This has been demonstrated by Glaser et al. [97] for phage-display library construction: see also Ref. 98 for a review of current DNA synthesis technology. Some authors have been concerned that the use of codons for which there is a low level of tRNA in E. coli might limit expression. This ‘rare codon’ effect has only been observed in extreme cases where a protein is massively overproduced and contains several rare Argcodons (e.g. Ref. 99). This completely depletes the concentration of charged tRNA in the cell. Since an effect will only be seen if translation undergoes chain termination at the slowing point, it would not be expected to play any role where only a few hundred molecules are being synthesized per cell, as in phage display. Somatic hypermutation Epitopes presented in constrained scaffolds may be prevented from optimal interactions with their target by local deleterious charge, hydrophobic, hydrophilic or steric interactions. Methods have, therefore, been developed which introduce hypervariability, not only into the highly variable epitopes but also into the framework which carries them. This has been achieved by the use of error-prone PCR or, more recently, in vivo hypermutation. Ouellette and Wright [100] in their review of PCR amplifiable DNA and RNA ligands, 229

J. Collins

including cyclic amplification and selection of targets (CASTing), target detection assays (TDA), selected and amplified binding site (SAAB) and systematic evolution of ligands by exponential enrichment (SELEX), emphasize reiterative affinity enrichment in the definition of transcription factor binding sites. This mimics the evolutionary process, whereby the constant addition of low-level mutagenesis during each amplification cycle plays an important role. This philosophy also lies behind the various protocols described below. Error-prone PCR has been used to introduce mutations at the rate of about 1% (i.e. one nucleotide per hundred is mutated after ca. 24 rounds of PCR) into the scaffold of phage-displayed proteins, e.g. the VH domain (Ref. 101; see also Ref. 102), staphylococcal nuclease [103] and PAI-1 [104], in an attempt to imitate the introduction of mutations during the somatic maturation (see Ref. 105) of antibodies. Hawkins et al. obtained, on average, 1.7 mutations per VH gene. From 40 000 mutated clones, a variant was selected which had increased its binding affinity by fourfold (to Ka = 1.9 × 109 M–1) even though a type 3 pentavalent vector was used, which does not allow efficient discrimination amongst tight binding clones. Gram et al. [102], with the same approach, but higher mutation frequencies (various libraries with mutation rates of 1–4%), selected clones from the most mutated population, on a type 3+3 vector, of 13- to 30-fold higher affinity compared to the parental antibody (10–5 M). These selected variants contained multiple mutations. Low et al. [41] have introduced a novel method, at least with respect to phage display, for the introduction of hypermutation throughout the scaffold. This depends on the high mutation rates of E. coli mutD5 strains [86]. The mutation rate in this strain, which bears a proof-reading-negative DNA polymerase III, is only about 50-fold above background (ca. 
10⁻⁷ per generation per codon) in minimal medium, but can be increased by 20- to 2000-fold by growth in rich medium. This is some 10³- to 10⁵-fold higher than the background mutation rate in non-mutD strains. Initial panning on haptens yielded a phage-displayed scFv clone which binds with an affinity of 320 nM. Further, more stringent, rounds of selection in the mutD5 strain gave consecutive accumulation of point mutations, equivalent to somatic maturation, to an affinity of 3.2 nM. A mutation rate of 1 in 2107 bases was obtained during a single passage in the mutator strain, where the phage were amplified from 199 to 7 × 10¹⁰ transducing units. The model system consisted of a phage (type 3) presenting a model anti-phOx scFv fragment. Selection conditions were changed from one panning round to the next (more stringent; competition with the soluble phOx hapten and more washes, sometimes 30) to reduce recovered phage to 10⁶ per round. Pools were taken after 4, 8 and 12 rounds of mutation, followed by three passages of selection without mutation for a total of 15 (!) rounds of panning. The improved binding of the optimal variant (Kd = 3.2 nM) was entirely due to a 100-fold decrease in the dissociation rate constant (koff = 4.3 × 10⁻³ s⁻¹). A genealogical tree was constructed that showed how the successive accumulation of five different mutations contributed to the final variant.

The fascinating concept developed by Stemmer [106–111] of DNA shuffling to reassort modified regions in vitro, by DNA synthesis priming with a collection of fragments from the mutagenized population, is an embellishment which generates additional diversity. This method reassembles a collection of plasmids using PCR on a mixture of DNase I
generated fragments from a mutant population, with [108] or without the addition of further primers. This automatically reassorts mutations which have occurred in different regions of the mutated gene, e.g. rescuing and recombining valuable mutations which were originally in the wrong combination. The procedure has been likened to the crossover reassortment seen during sexual recombination, and as such has been dubbed ‘sexual PCR’. The method has been used to create chimeras of related genes from different species [112] and has recently been applied to phage-display library construction [111]. It will undoubtedly find more applications in the near future.
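As a rough check on the affinity-maturation figures quoted for the mutD5 experiment in this section, the following Python sketch relates the dissociation constant to the rate constants. It assumes simple 1:1 binding (Kd = koff/kon); that model and the function name are illustrative assumptions, not part of the cited work. Only the numerical inputs (320 nM, 3.2 nM, 4.3 × 10⁻³ s⁻¹ and the 100-fold change in koff) are taken from the text.

```python
import math

def association_rate(kd_molar: float, koff_per_s: float) -> float:
    """Return kon (M^-1 s^-1), assuming a simple 1:1 equilibrium: Kd = koff / kon."""
    return koff_per_s / kd_molar

# Matured anti-phOx scFv, values quoted in the text.
KD_FINAL = 3.2e-9       # M
KOFF_FINAL = 4.3e-3     # s^-1

# Parental clone: 320 nM; the text attributes the whole gain to a
# 100-fold slower dissociation, so the parental koff is 100 times larger.
KD_PARENT = 320e-9      # M
KOFF_PARENT = KOFF_FINAL * 100

kon_final = association_rate(KD_FINAL, KOFF_FINAL)
kon_parent = association_rate(KD_PARENT, KOFF_PARENT)

# Half-life of the matured complex under first-order dissociation.
half_life_s = math.log(2) / KOFF_FINAL

print(f"kon (matured)  = {kon_final:.3g} M^-1 s^-1")
print(f"kon (parental) = {kon_parent:.3g} M^-1 s^-1")
print(f"complex half-life (matured) = {half_life_s:.0f} s")
```

Under these assumptions the two association rate constants come out equal, which is what one would expect if the 100-fold gain in affinity is indeed carried entirely by koff.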

‘Landscape’ and ‘mosaic’ phage

In a class by itself is the concept of ‘landscape clones’, using type 8 vectors, emanating from Smith’s laboratory [113]: ‘If ... the foreign peptide is displayed on every subunit, emergent properties may include not only local functionalities that reside in a single variable peptide and its immediate surroundings, but also global functions that inhere in the entire surface ‘landscape’.’ Previous work on type 8 vectors (as opposed to 8+8), particularly from Cortese’s (e.g. Refs. 23, 114 and 115) and Cesareni’s laboratories [34,116], had shown severe limitations on the size of insert which can be tolerated. Molecular modelling predicts that the reiterative surface ‘map unit’ on the landscape generated will be about 730 Å², of which the variable peptide covers over a third. The effectiveness of the library (3 × 10⁹ primary clones) was tested by selection for binding to dioxin or to an antibody to concanavalin A (ConA). Apparently submillimolar binding clones were found for dioxin, most of which contained the motif EPFP. Anti-ConA-binding clones, although not the strongest binders, also matched a linear epitope found in an exposed loop of ConA. This latter observation supports the idea that the displayed mimotopes behave as constrained epitopes in the context of their local surroundings. New application areas for landscape phage are suggested in metal-complexing surfaces with repeating geometry, which might provide new materials with novel catalytic, electrical, magnetic or optical properties, although I feel that if cysteine is underrepresented this may be limiting.

As an interesting aside of some practical significance to everyone working with phage presentation, the paper also describes the selection of chloroform-resistant phage. Resistance increased in the enriched population from the initial level of 0.002% up to 7% survival.
These clones were used to illustrate the potential of a novel concept designated ‘mosaic’ clone libraries. ‘Mosaic’ clones were produced in experiments with double infections of wild-type and chloroform-resistant clones. Such clones have a surface covered with roughly equimolar amounts of pVIII from each phage. If, in this example, one were to substitute variant phage for the wild-type phage, mosaic phage would be created whose surface contains two variant pVIII molecules. This would increase the diversity of presented surfaces in the library by many orders of magnitude, since a patchwork would result, presenting novel structures dependent on components from each variant (abstract, Braunschweig meeting, ‘Antibody technology and applications in health and environment’, September 1996). Although the (two?) variant phage would not be physically linked, they would be co-enriched during panning, since they would be packaged in the same mosaic coat. Referring to Iannolo et al. [116], it is suggested that larger inserts, covering a larger
portion of the surface, may be obtained by moving the randomized region by about 11 aa towards the carboxy terminus of pVIII (see also Ref. 117 for an X-ray diffraction analysis of the structure of three pVIII-hybrid proteins). Such clones exhibit a very low viability, but, nevertheless, sufficient variants should survive to provide an interesting new resource library.
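The claim that mosaic display would raise surface diversity ‘by many orders of magnitude’ can be made concrete with a simple count. The sketch below assumes, purely for illustration, that any two distinct variants from a library of the size quoted above for the landscape library (3 × 10⁹ primary clones) could be co-packaged; this is an upper bound, and the function name is hypothetical.

```python
import math

def mosaic_pairs(n_variants: int) -> int:
    """Number of unordered pairs of distinct variants: n(n-1)/2."""
    return n_variants * (n_variants - 1) // 2

# Library size quoted in the text for the landscape library (assumed here
# to be the pool from which both coat variants are drawn).
N_VARIANTS = 3_000_000_000

pairs = mosaic_pairs(N_VARIANTS)

# Gain in the number of distinct surfaces relative to single-variant display.
gain_orders = math.log10(pairs / N_VARIANTS)

print(f"single-variant surfaces: {N_VARIANTS:.1e}")
print(f"two-variant mosaics (upper bound): {pairs:.1e}")
print(f"gain: about {gain_orders:.1f} orders of magnitude")
```

In practice the number of mosaics actually sampled is limited by the number of doubly infected cells, so this figure describes the accessible combinatorial space rather than a realized library size.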

λ packaging: A help in generating more primary clones

Since it is the electroporation step which limits the efficiency of primary bank formation (ca. 10⁵ to 10⁶ clones per µg of ligated plasmid DNA; lower figures are given for bacteriophage vectors), lambda carrier vectors have been constructed which allow the efficient use of the λ in vitro packaging system for introducing the phagemid display vector into the cell via transduction. Since the phagemid vector is bracketed by the packaging-start and -stop sequences of bacteriophage M13, superinfection of the transduced λ-phagemid hybrid with an M13-type helper phage causes the phagemid genome to be efficiently excised, packaged, and secreted into the culture medium. The efficiency of in vitro packaging is some fourfold higher than the upper end of the efficiencies quoted for electroporation, namely 6 to 7 × 10⁷ pfu/µg insert. The ImmunoZAP vector of this type is designed to present light-chain and heavy-chain–pIII (truncated pIII, aa 198–406) fusions as an Fab fragment [118]. The ZipZap vector can be used to create pIII fusions of peptides or other proteins in the same manner [118]. The SurfZap vector contains two expression cassettes with pelB signal sequences for the creation of phagemid banks displaying heterodimers (i.e. also Fab fragments), where one chain is attached to the amino terminus of pIII [118]. It uses the rare-cutters NotI and SpeI in the cloning process. It was demonstrated that the frequency of positive clones was maintained in the library in spite of the numerous amplification steps involved in its preparation [119,120]. The λ ZLG6 vector system also uses the high efficiency of the λ in vitro packaging and transduction system to create phagemid gene banks.
However, Cre/loxP recombination is used to excise the phagemid from the λ genome (based on the λ ZIPLOX vector; Life Technologies, Gaithersburg) and the gene fusion site is designed to facilitate cDNA expression cloning as an extension to pVI at its 3'-end (type 6+6), with variant vectors in all three reading frames, namely pZLG61, pZLG62 and pZLG63 [121]. We are currently optimizing a novel protocol in which a cosmid vector is used to package and efficiently transduce up to nine phagemid genomes at a time. These are then resolved to individual phagemid genomes upon superinfection of the recipient bacterium with M13 helper phage. In validation experiments, efficiencies of over 3 × 10⁷ cosmid clones per microgram of ligated DNA can be obtained (J. Collins, T. Boldicke, J. Glökler and N. Chavand, unpublished results), each of which contains some six to eight phagemid-display vector genomes, i.e. an overall efficiency of ca. 2 × 10⁸ clones/µg ligated vector DNA.
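The overall efficiency quoted for the cosmid protocol follows directly from the two figures given above. The short Python sketch below simply multiplies them out; the function and variable names are illustrative only.

```python
def phagemid_yield(cosmid_clones_per_ug: int, phagemids_per_cosmid: int) -> int:
    """Phagemid clones per microgram = cosmid clones x phagemid genomes per cosmid."""
    return cosmid_clones_per_ug * phagemids_per_cosmid

# Figures quoted in the text: >3 x 10^7 cosmid clones per microgram,
# each carrying some six to eight phagemid genomes.
COSMID_CLONES_PER_UG = 30_000_000

low = phagemid_yield(COSMID_CLONES_PER_UG, 6)
high = phagemid_yield(COSMID_CLONES_PER_UG, 8)

print(f"{low:.1e} to {high:.1e} phagemid clones per microgram")
```

The range of 1.8 × 10⁸ to 2.4 × 10⁸ brackets the rounded figure of ca. 2 × 10⁸ given in the text.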

λ phage as an alternative phage-display system?

The redox potential of the bacterial cell is of a reducing nature compared to that in the eukaryotic cytoplasm. This has a negative effect on the folding of eukaryotic proteins
which depend on the formation of cysteine–cysteine bridges. The examples given in Table 2 for successful display of eukaryotic proteins with type 3, 3+3, 6+6, 88 and 8+8 vectors are, thus, largely restricted to secreted proteins. The exceptions are Zif268, cytochrome b562 and CP-1. CP-1 is essentially a newly designed scaffold. Many cytoplasmic proteins contain hydrophobic runs which will cause them to stick in the cell membrane if one attempts to secrete them. In addition, some (usually bacterial cytoplasmic) proteins contain numerous cysteines, none of which form a disulphide bridge. One would like, therefore, to extend the phage-display method to the world of bacterial and eukaryotic cytoplasmic proteins. It would be advantageous to have a system in which phage assembly takes place in the bacterial cytoplasm, rather than in the periplasm or medium.

This possibility has been investigated in four papers, two of which, those by Maruyama et al. [122] and Dunn [123], test whether the major tail protein, pV, of λ will tolerate substitutions. Mikawa et al. [124] and Sternberg and Hoess [125] demonstrate that the head protein, pD, can be used. In both the pV and pD display systems, a mixture of normal and fusion proteins was produced (see Fig. 2). In the first paper [122] the vector λfoo was constructed, in which the V gene (see Fig. 2) is fused upstream of the foreign protein so that the foreign gene product can also be expressed as free protein with its own ribosome binding site and initiation codon. In addition, the linker separating V and the foreign insert contains an in-frame amber stop codon. Depending on the level of amber suppression allowed, variable amounts of truncated pV and pV-fusion protein are formed. E. coli TG-1, Q1, EQ170 and EQ82 strains, for example, suppress too strongly to allow a good yield of viable phage displaying β-galactosidase (475 kDa as tetramer) or agglutinin (120 kDa).
Successful model panning experiments were carried out on anti-β-Gal antibody, for the β-Gal-displaying phage, and on mucin for the agglutinin display clone. Electron microscopy showed that ca. 50% of the phage obtained from the moderate suppressor strain 4358 were carrying at least one β-galactosidase molecule, i.e. incorporation of the hybrid protein into intact λ particles could be demonstrated. Vectors of this type would, to follow the nomenclature given above, be classified as true presenting phage of VV-type.

In the second paper [123] a plasmid expressing the V-gene fusion was present in the cell infected with a phage carrying a normal V gene. The pentapeptide RRASV was inserted as a C-terminal extension of pV, separated by the flexible linker (SGGG). In a further construct, β-galactosidase could be incorporated at a minimum of one copy per phage, again using the same incorporation site and linker. This does not yet represent a true phagemid or phage-display system on which panning selection could be imposed, but is a feasibility exercise to assay the tolerance of the phage for assembly and hybrid protein incorporation into viable phage. This demonstrates that the creation of λ phage or phagemid libraries is a real possibility.

One of the drawbacks of the λ display vectors is the difficulty of manipulating the large λ genome (50 kb). Sternberg and Hoess solve this problem by using a small plasmid which can integrate into a partially deleted λ genome using the Cre/loxP recombination system (see the next topic). They describe a vector with fusions to the 11 kDa protein pD, which plays a major role in stabilizing the prehead intermediate during λ morphogenesis. It is not essential if the lambda genome is under 82% of its normal size (74% is the minimal size which gives an infectious particle), although pD⁻ particles are EDTA
sensitive. The pD protein is assembled in a trimeric form, up to 405 copies per virus particle. All these factors taken together imply that pD may be ideal as a presentation molecule. Mikawa et al. [124] demonstrate that large protein fusions, i.e. β-lactamase, β-galactosidase, IgG-binding domains and S. aureus protein A, can be functionally expressed and tolerated at both the N- and C-termini of the pD protein. They vary the amount of pD presented on the phage by using high and low copy number plasmids expressing pD in the strain infected. The type DD phage display vector, λfooDc, is designed to express cDNA libraries as carboxy-terminal extensions of pD.

The Sternberg and Hoess lambda vector (λ Dam15 imm21 nin5) has the b538 deletion, which removes 21.5% of its genome. It also has a loxP site and a Dam15 mutation. The plasmid partner in this system, e.g. pRH800, has (for the validation experiment) an angiotensin II (AII) octamer peptide fused to the N-terminus of pD, under the control of the ara promoter, in one construct with the B1 domain of the streptococcal G protein as flexible linker. When bacteria carrying AII-D or AII-B1-D fusion plasmids were infected with λ Dam, hybrid pD molecules were incorporated into phage particles, as shown by Western blotting and biopanning with anti-angiotensin antibody. To move the plasmids into the λ genome, a loxP site was inserted downstream of gene D. Cointegration was achieved by infecting a constitutively cre+ strain containing the AII-(B1)-D/loxP plasmid, pRH804, with the λ loxP Dam15 b538 phage (multiplicity of infection = 1). Although no data are given on the physical cre-mediated cointegration products, and recombination experiments were carried out in suppressor-plus strains, it seems that the experiment worked as planned, since Apʳ clones were selected on infecting a cre+ host strain with the phage from the previous cross, and biopanning was demonstrated after recombination.
The cointegrate (phagemid) can be maintained in this strain by virtue of the integrated plasmid replication origin. The authors also point out the possibility of using this system to create somatic gene therapy vehicles, based on the observation that RGD-motif-displaying phage (on pVIII) can be taken up by Hep2 cells [126] (hepatocytes; see the section ‘Panning on cells’). In conclusion, these feasibility exercises were successful, but I am presently unaware of their use in actual library construction or selective panning.

The alternative system for cytoplasmic expression, which has been well characterized in generating gene banks and panning selection, is the plasmid–protein-complex display system (also called ‘peptides-on-plasmids’) developed by Schatz [127–130] using extensions to the C-terminus of the lac repressor. This is not a phagemid system, since the plasmid DNA is naked and is reintroduced into the cell by transformation.

Two other bacteriophage systems have also been investigated as potential phage-display vectors, namely bacteriophage T4 [131], where a C-terminal extension on fibritin encoded by gene wac (whisker’s antigen control) could be displayed, and P4 [132], where a decapeptide was successfully inserted and presented near the N-terminus of the capsid decoration component, Psu. Their relative usefulness cannot yet be evaluated.

Constraints and context

A number of references throughout this review mention the importance of constraints within the displayed epitope in finding strong binding clones (e.g. Refs. 5, 9, 72, 133 and
134). A simple consideration of the thermodynamics of binding implies that, if the peptide is constrained in a form which can fit a binding pocket on the target, this ligand will be at a relatively higher concentration in solution than a nonconstrained peptide, in which the optimal three-dimensional conformation is only achieved transiently. This also means that once bound it should dissociate at a lower rate. Furthermore, since induced fit is limited in a constrained loop, one might generally expect constrained peptides to exhibit a higher specificity for a particular target molecule amongst a set of given targets. The negative corollary is that the chance of finding the ‘correct configuration’ within the constrained loop might be massively reduced compared with a free peptide, which can associate with a broader range of targets by induced fit. This leads to the rationale that a large number of different libraries should be screened if one is using a target with a binding site of unknown configuration. Alternatively, one can take a dedicated library containing an essential motif (e.g. the proline-rich library to select for SH3-binding peptides [49,135–139], or serine protease inhibitors as a scaffold to search for specific inhibitors of other serine proteases [72,89,90,140]; see also Table 2).

Constrained peptide display libraries

Burritt et al. [3] have very recently reviewed the various oligopeptide libraries which are available. Of the 23 libraries mentioned, six can be considered as designed to be constrained: five contain cysteines bracketing the hypervariable region (4–9 aa long) and one [78] contains, in addition, a Pro-Gly ‘turn’ between two 18-mer hypervariable regions. Since biotin or avidin/streptavidin are often used to immobilize targets, it bears repeating that the early investigations defined HPQ [140,141] or HP(Q/M)-nonpolar [78] as ligands for streptavidin and that WXPPF(R/K) can be selected on biotin [142].
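When sorting selected sequences from such libraries, it can help to flag which peptides are of the cysteine-bracketed type and which carry an even number of cysteines. The Python sketch below is a minimal illustration using two cyclic peptides discussed in this section (CRRETAWAC and ACDCRGDCFCG) and one linear sequence chosen only as an example; the function name and output format are hypothetical, and an even cysteine count indicates only that full pairing is possible in principle, not that any particular disulphide actually forms.

```python
import re
from typing import Optional

def describe_peptide(seq: str) -> dict:
    """Summarize the cysteine content of a displayed peptide.

    loop_length is set only for the simple C-X(n)-C case: a cysteine at each
    end and no cysteine in between.
    """
    seq = seq.upper()
    n_cys = seq.count("C")
    match = re.fullmatch(r"C([^C]+)C", seq)
    loop_length: Optional[int] = len(match.group(1)) if match else None
    return {
        "cysteines": n_cys,
        # Even and non-zero: every cysteine could, in principle, be paired.
        "even_cysteines": n_cys > 0 and n_cys % 2 == 0,
        "loop_length": loop_length,
    }

examples = ["CRRETAWAC", "ACDCRGDCFCG", "GRGDSP"]  # last one: illustrative linear peptide
summary = {pep: describe_peptide(pep) for pep in examples}

for pep, info in summary.items():
    print(pep, info)
```

A classifier of this kind says nothing about binding; it is only a bookkeeping aid for separating cyclic candidates from linear ones before comparing affinities.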
It is also to be noted that some authors working with linear peptide presentation found that synthetic peptides did not always exhibit the binding properties of the nonconstrained peptide selected (e.g. Ref. 143), i.e. that the peptide on the phage is probably constrained by its local environment. This latter phenomenon forms the basis for the ‘landscape and mosaic phage’ concept described in a separate section above [113]. This is not the case for the constrained ‘cyclic’ peptides mentioned here.

The following examples demonstrate the power of using constrained epitope libraries for recognizing strongly binding motifs, which may represent the linear sequence of the normal ligand or mimic its steric structure, i.e. a so-called ‘mimotope’. The motif ArgGlyAsp (RGD) has been characterized by phage display as integral to the recognition of short peptides by integrins. Strong binding clones, e.g. of αIIbβ3 integrin, are of interest in preventing platelet aggregation, and have been isolated from constrained banks of the CX6C type [144], where it was shown that binding was strongly dependent on intramolecular disulphide bridge formation. A cyclic peptide, CRRETAWAC (note: not containing an RGD motif), was similarly isolated on α5β1 integrin and was shown to be an antagonist of cell binding to fibronectin but not vitronectin, again dependent on the peptide being in the nonreduced form. This illustrates that conformation can affect binding significantly, i.e. sequences other than those present in the free peptide can be selected as optimal binders from the constrained peptide libraries. Extending this, Koivunen et al. [145] selected peptides which bind specifically to a range of integrins,
using phage-display libraries of either the cysteine-constrained type mentioned before or presenting just CX9. Only peptides containing an extra cysteine, i.e. of the cyclic peptide type, were selected from this latter library. This may be due either to the role of constrained conformation in this molecular interaction or to the fact that clones with an odd number of cysteines are not viable. The RGD motif was common to all binding sequences, but binding was influenced by adjacent sequences and the length of the loop, e.g. ACDCRGDCFCG (two cysteine–cysteine bonds?) was 20-fold more potent than other cyclic peptides with a single disulphide and 200-fold more potent than the best linear peptides. A similar picture emerges in the selection of plasminogen activator inhibitor analogues, panning on an antibody to the inhibitor. A CX6C library was used and strong binding clones were isolated. The conservative (an O for an S) substitution of one cysteine by serine reduced binding of the optimized peptide by 2 orders of magnitude.

One of the most dramatic recent achievements in using phage display of peptide libraries to develop lead molecules for pharmaceutical drugs was the isolation of an agonist for the erythropoietin receptor (EPOR; Refs. 33 and 112), which normally mediates erythropoiesis in response to erythropoietin (EPO; Kd for the receptor = 200 pM). EPO has become a standard therapeutic for the treatment of various anaemic conditions. Soluble EPOR was available for use as an immobilized target. A type 8+8 library (3 × 10⁹ primary clones) of constrained (cyclic) random octamers, i.e. bracketed by cysteines, was used initially. The idea in using such a library was that, with multivalent presentation, motifs might be obtained which, even if exhibiting low affinity, would indicate the direction to take in developing further analogues. Initial experiments using acid elution were unsuccessful.
Positive results were obtained first by using a thrombin-cleavable site between the ligand and pVIII, which allowed release of target-bound phage by thrombin treatment. A synthetic peptide, based on the sequence of the selected clone (CRIGPITWVC), showed EPOR binding. To pursue this further, dedicated libraries were generated, using a type 3+3 phagemid vector (low valency of presentation), containing mutated sequences based on the original consensus, CRIGPITWVC and X3CRIGPITWVCX3, in which the mutation rate for each codon in the original consensus sequence was 50%. This strategy had been developed previously and was used successfully to isolate high-affinity interleukin-1 receptor antagonists [85] and E-selectin binding clones (Ref. 93; see Fig. 5). The consensus sequence enriched with the type 8+8 vector did not bind appreciably when presented at lower valency on the type 3+3 vector. These banks were panned on the EPOR, using EPO as a competitive ligand in the washing during the second and third panning rounds, with subsequent acid elution. This will enrich for higher affinity clones binding to the normal EPO binding site (see also Refs. 40 and 85). The first library showed no enrichment, whereas the second library yielded 34 binding clones showing the consensus sequence XYXCRMGP(I/M)TWVC(K/R/S/T)PX (bold type for the sequence present in the original sequence). None of these sequences has homology to the primary sequence of EPO. Mutation of a cysteine to a serine reduced binding 1000-fold, again emphasizing the importance of the structural constraint. Five peptides were chosen for synthesis on the basis of this consensus. The crystal structure of a representative EPOR agonist from this set, complexed with EPOR, was solved [112]. This showed that the first tyrosine is a contact point to the receptor and that GPXTW forms a β-turn which would stabilize the peptide structure. The EPO mimotope, termed EMP1, showed erythroid
colony stimulation on bone marrow cells, although on a molar basis 5 orders of magnitude less efficiently than EPO (200 nM cf. 20 pM for EPO). The mimotopes were shown to be fully functional in activating EPOR-specific signal transduction, as demonstrated by the tyrosine kinase activation pattern. Perhaps the most surprising finding is that the EMP1 peptide forms a symmetric dimer which binds symmetrically to stabilize a dimer of the receptor. The peptide–peptide interface is also stabilized by a number of hydrophobic interactions, the internal loop constraints being due not only to the cysteine–cysteine disulphide bond but also to hydrophobic interactions (Tyr4 and Phe8). Six hydrogen bonds form between each peptide monomer and the receptor, where the peptide dimer straddles the groove between the receptor monomers like a saddle. These studies are an elegant illustration of the strategy of using different phagemid-display libraries to work towards higher efficiency ligands, and of how, with the help of structural analysis and modelling, the way is opened for the development of novel therapeutic drugs. It is a commonly observed phenomenon that dimerization of a eukaryotic receptor, e.g. with antibody, can cause activation of the receptor, and that cytokine interactions often involve only a relatively small number of residues at the receptor–cytokine interface (see Ref. 85). Whether this approach can be generally applied to developing other pharmaceutical ligand ‘leads’ from peptide libraries remains to be seen. The studies on the interleukin-1 receptor antagonists [85] are in contrast with the EPOR-agonist paper, in that cyclic constrained peptides were not enriched in the former, but nanomolar inhibitors were still obtained. This set of papers coming from the Affymax laboratories [33,85,93] is impressive also for the number, diversity and size of the peptide-display libraries used.
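To give a feel for what a 50% per-codon mutation rate means for the dedicated EPOR libraries described above, the following Python sketch computes the distribution of the number of mutated codons in the ten-residue consensus CRIGPITWVC. It assumes, for illustration only, that each of the ten codons is mutated independently with probability 0.5 and that every position (including the cysteines) was subject to mutagenesis; neither assumption is stated in the text, and a mutated codon need not change the encoded amino acid.

```python
from fractions import Fraction
from math import comb

def mutated_codon_distribution(n_codons: int, p_mut: Fraction) -> list:
    """Binomial probabilities P(k codons mutated) for k = 0..n_codons."""
    return [
        comb(n_codons, k) * p_mut**k * (1 - p_mut) ** (n_codons - k)
        for k in range(n_codons + 1)
    ]

CONSENSUS = "CRIGPITWVC"   # starting sequence quoted in the text
P_MUT = Fraction(1, 2)     # 50% mutation rate per codon

dist = mutated_codon_distribution(len(CONSENSUS), P_MUT)
expected_mutated = sum(k * p for k, p in enumerate(dist))

print(f"P(no codon mutated)   = {float(dist[0]):.5f}")
print(f"P(5 codons mutated)   = {float(dist[5]):.3f}")
print(f"expected mutated codons = {float(expected_mutated):.1f}")
```

Under these assumptions only about one clone in a thousand retains the parental codon sequence, so such a library explores a broad neighbourhood around the consensus rather than a few point variants.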
Designed presentation matrices

It is obvious from the aforementioned examples that constraint of the hypervariable or random region plays an important role in the isolation of strong ligands from phage-display libraries. Antibody libraries present essentially six constrained loops (of which at least two are highly variable), and, where sufficient diversity is present, it seems that specificity can be obtained to any target. Antibodies have, however, the disadvantage that they are large and that it is difficult to model their exact three-dimensional structure, should one wish to continue designing smaller, perhaps completely synthetic mimotopes, even though this has in fact been done for the creation of mimetics based on the structure of one antibody CDR loop [146]. This line of thought has motivated a number of laboratories to examine alternative proteins of known structure as scaffolds for the presentation of constrained loops. These differ in size, in their core structure, in the flexibility allowed in the hypervariable region(s) and in the total surface of the hypervariable region [147]. In all cases the design involves using regions which are known to be well exposed to solvent. Figure 7 is a compilation of such scaffolds which have been used to create large libraries, showing the residues which were randomized. Three proteins for which I did not have structural data should also be mentioned, namely: the so-called ‘minibody’ [147–149]; a Kunitz inhibitor domain of the human lipoprotein-associated coagulation inhibitor (LACI-D1; Refs. 89 and 90), which resembles BPTI; and CP-1, a designed scaffold zinc-finger [150]. The minibody is a 61 aa designed
protein, based on the two pairs of three-stranded β-sheets of the immunoglobulin V-domain. As such its structure resembles a single domain of scFv structure, which is itself similar in structure to tendamistat. A library was generated varying two of the three loops furthest away from the N- and C-termini. The use of LACI-D1, APPI, BPTI and PSTI libraries to select protease inhibitors with altered specificity is described elsewhere in this text.

CP-1 is based on the 26 aa consensus zinc-finger motif of DNA-binding proteins, PYKCPECGKSFSQKSDLVKHQRTHTG, in which selected positions (bold type in the original) were varied in the library. The Cys₂His₂ tetrad (Cys4, Cys7, His20 and His24) forms a binding pocket for a zinc ion, where the cysteines are in a β-sheet and the histidines in an α-helix. The structure is further stabilized by hydrophobic packing of Tyr2, Phe11 and Leu17. The varied positions are solvent accessible and located on one side of the molecule. Selection for mimotopes of a Shigella lipopolysaccharide (LPS), by binding to an anti-LPS antibody, yielded a consensus sequence. Interestingly, binding was dependent on zinc, indicating that the constrained zinc-finger conformation was necessary for binding. Circular dichroism and Co(II)-complex absorption were used to confirm that the basic scaffold conformation had been maintained in the variants.

The other scaffolds of defined structure shown in Fig. 7 (Table 2) have also been used in phage-display gene libraries. The atrial natriuretic protein (ANP) libraries, varied in all but the cysteine bridge, would represent essentially a special case of a ‘cyclic’ peptide library in which the loop is 14 amino acids. Cunningham et al. [151] randomized throughout the whole 28 aa region, with the exception of the cysteines, with seven separate banks, in each of which three- to five-residue ‘windows’ were mutagenized. Phage display was used to select clones with specific affinity to the ANP-receptor A in the presence of competing ANP-receptor C.
Six mutations seen to shift the specificity in the desired direction were combined in a final construct with the required properties, with the additional bonus that some of the mutations led to a higher secretion of the variant. In a specialized approach [152] to miniaturize ANP while retaining its biological properties, protein design and phage-display selection methods involving binding to the ANP-receptor were applied. The final 15 aa derived peptide binds the ANP-receptor with only a fivefold lower affinity than the original 28 aa protein. A similar concept was applied to the 59 aa triple α-helical Z-domain derived from the IgG-binding B-domain of protein A. The final product was a double-helical 33 aa derivative, which can still bind to IgG with a Kd of 43 nM, compared to 10 nM for the original (only these two of the helices are shown in Fig. 7; Ref. 153). The effect of mutations within the triple-helical Z-domain, at the indicated positions on the solvent-accessible surface of over 600 Å², with respect to protein folding, and the creation of a monovalent phage-display library have been described by Nord et al. [154]. This triple-helical bundle represents perhaps the most highly constrained library of the group discussed here. Creating this library, via mutagenesis of specific positions over an extended region while maintaining the scaffold, required a special strategy for the reconstitution of the whole gene from a mixture of degenerate templates, bridges and PCR primers. The use of this system to select insulin- and Taq DNA polymerase-binding clones from a monovalent-presentation library of 4.5 × 10⁷ clones was recently described (communicated by K. Nord, E. Gunneriusson, M. Uhlén and P.-Å. Nygren, GBF-Symposium: ‘Antibody technology and applications in health and environment’, Braunschweig, Germany, September, 1996).

Ku and Schultz [155] demonstrated the use of cytochrome b562, a four-helix bundle, as a scaffold for phage display in a type 3, fdtet, vector. The two loops mutagenized can provide relatively flexible sites for the formation of binding pockets for small ligands, or a surface of some 800 Å² for interaction with larger molecules. The protein is thermally stable and can be produced at high levels. A library of 2 × 10⁸ recombinants was generated in which one loop had four, the other five, randomized amino acids. Strong selection of a consensus set, which resembled antibodies binding such ligands in the abundance of solvent-exposed aromatic amino acid residues, was obtained on N-methyl-N-p-nitrobenzylamine conjugated to BSA. The clones were specific for this conjugate, recognizing neither BSA alone nor a conjugate to ovalbumin.

Tendamistat, a 74 aa β-sheet α-amylase inhibitor from Streptomyces, was used to generate a phagemid-display library [156] of some 10⁸ clones in a type 33 phage vector (M13704), where one loop contained three varied residues and one loop six. Controls showed that a normal functional tendamistat could be presented, and Ala-screening was performed to determine which loop residues were important for binding to α-amylase. When the library was tested by panning on an anti-endothelin antibody, clones were obtained which showed no discernible consensus. The strongest binding clone bound essentially via the longer loop alone, with an affinity comparable to that of endothelin-presenting phage. The sequence in this loop (YIGSHG) interestingly shows no similarity to the LEPWL consensus found with libraries presenting either constrained or unconstrained peptides [157]. This finding is difficult to explain other than as a ‘context phenomenon’.
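The library sizes quoted for the two scaffolds above can be set against the number of sequences that nine randomized positions could in principle encode. The Python sketch below does this under the simplifying assumption that all 20 amino acids are equally likely at every randomized position (real codon schemes are biased and include stop codons); the function names are illustrative only.

```python
def theoretical_diversity(n_positions: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences for n fully randomized positions."""
    return alphabet_size ** n_positions

def max_fraction_sampled(library_size: int, n_positions: int) -> float:
    """Upper bound on the fraction of sequence space a library can contain."""
    return min(1.0, library_size / theoretical_diversity(n_positions))

# Figures quoted in the text.
cyt_b562 = max_fraction_sampled(200_000_000, 4 + 5)     # 2 x 10^8 clones, loops of 4 and 5
tendamistat = max_fraction_sampled(100_000_000, 3 + 6)  # ca. 10^8 clones, loops of 3 and 6

print(f"sequence space for 9 positions: {theoretical_diversity(9):.2e}")
print(f"cytochrome b562 library covers at most {cyt_b562:.2e} of it")
print(f"tendamistat library covers at most {tendamistat:.2e} of it")
```

On these assumptions both libraries sample well under 0.1% of the possible sequences, which is worth bearing in mind when interpreting the absence of a consensus among selected clones.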

Increasing diversity by reassortment and recombination

The total variability present amongst the primary clones generated in making a phage(mid) bank is the central factor on which the potential usefulness of the technique depends. In the classic approach, the creation of diversity in a phage(mid) library is limited by the transformation efficiency of E. coli. The libraries cited in Table 2, for example, ranged from a few thousand variants, where a role for key amino acid positions in a specific bimolecular interaction was being investigated, to some 3 × 10⁸ for banks which were intended to be used for general ligand selection on a variety of targets. Recently some gene banks have been generated by ‘brute force’, i.e. an industrial-scale production involving up to a thousand electroporations to generate libraries containing in excess of 10¹⁰ primary clones [158]. These are valuable resources, but beyond the means of university research laboratories wishing to generate their own libraries. A more elegant concept for the production of diversity is to mimic nature, which often uses recombination between domains or motif cassettes.

In vivo site-specific recombination to generate large combinatorial libraries

The broad applicability of phage display of antibodies has been amply demonstrated for a wide range of targets and extensively reviewed [38,159–161]. Many of the concepts for the increase in combinatorial library complexity have arisen from laboratories working on antibody libraries (see Figs. 1–3). Although, using constant light chains and variable heavy chains in phage vectors, only relatively weakly binding clones (>µM) were often
obtained (better from the larger banks), reassortment, so-called chain shuffling, of variable light chains with the heavy chains from the first selection allowed enrichment panning to yield clones with more than 100-fold higher affinities [162]. Assuming that each chain library contains some 10⁸ variants, the total number of combinations is of the order of 10¹⁶, a figure which can only be achieved in the PCR-amplified DNA/RNA-based combinatorial libraries [163–165]. The limitation to the size of combinatorial gene banks is thus a handicap if one wishes to find the original combination of chains in an antibody repertoire library. It is also noted that the larger the library, the greater the chances of obtaining a high-affinity member (e.g. Ref. 161). A number of methods, such as λ packaging vectors, and Cre/loxP and λ-recombinase-mediated in vivo recombination systems, have been developed to meet these demands. With further chain shuffling and mutational enrichment procedures, which simulate somatic maturation of the immune system [41], i.e. introducing additional variation to the bank, even higher affinities can be selected. These principles should be directly applicable to working with other phage-presentation libraries.

λ Recombination to generate combinatorial libraries

Two recombination systems have been described for use with phagemid display, both for the generation of hypervariable banks of antibody libraries by chain shuffling of the light and heavy chain subunits. The Cre/loxP system derives from the bacteriophage P1 and requires a 33 bp target sequence (loxP) which brackets the insert after recombination. The other system utilizes the lysogenization system of λ. λ phage uses site-specific recombination to integrate into the host chromosome during lysogenization. The only λ phage function required is the λ int gene. The recombination is irreversible in the absence of the λ xis gene product (Fig. 8B). Geoffrey et al. (Ref. 166; see also Ref. 167) have described a pair of vectors: a plasmid (pM827) containing attB, and a phagemid (pM834) containing attP. On induction of Int protein (heat shock induction of E. coli strain D1210HP) the two plasmids recombine irreversibly at these sites, forming a cointegrate, pM835 (Fig. 8A). pM834 contains the promoter for the chloramphenicol resistance gene (cat) near the att site, while pM827 carries the promoterless cat gene. After recombination the promoter is juxtaposed upstream of the resistance gene, so that the new cointegrate is a phagemid encoding chloramphenicol resistance. One vector encodes immunoglobulin VL-CL and the other encodes VH-CH1-pIII fusions. The final cointegrate is a phagemid (type 3+3), specifying chloramphenicol resistance, which will present Fab on packaging, after superinfection with a helper phage. Chain shuffling, and the generation of large combinatorial libraries, can thus be achieved by infecting a heavy chain library on the phagemid pM834 into a population carrying a light chain library in pM827. Each time the bank is created, new combinations of heavy and light chains are produced. The authors demonstrate that the principle works but, disappointingly, technical problems have apparently so far prevented the production of very large repertoires and the isolation of high-affinity antibodies to targets of interest.

Cre/loxP recombination to generate combinatorial libraries

Waterhouse et al. (Ref. 65; Fig. 9B) first demonstrated the use of Cre/loxP in vivo site-specific recombination for chain shuffling (the procedure is described in the figure
legend). A combinatorial library of Fab-presenting phage, estimated to contain in excess of 10¹⁰ clones, was created with this method by Griffiths et al. [66]. The Cre function was provided by subsequently infecting the cells with a P1Cre bacteriophage. Recombination was estimated to be efficient enough for 28% of the acceptor phage to have received

Fig. 8. The generation of a combinatorial Fab library according to Sodoyer et al. [166,167]. (A) The phagemid pM834 heavy-chain-pIII fusion library is introduced into the same strain as the light chain library, carried in the low copy number plasmid pM827. pcat refers to the promoter for the chloramphenicol resistance gene. cat is the structural gene for chloramphenicol transacetylase without a promoter. After the Int-mediated site-specific recombination the required phagemid form (pM835) is achieved. This is packaged on addition of helper phage. Only the recombined vector carries a functional chloramphenicol resistance gene (Cmr). Kmr = kanamycin resistance gene. Eori = replication origin of ColE1. (B) The two sites attB and attP give site-specific recombination mediated by the phage protein Int, to yield a cointegrate bordered by recombined att sites designated attPB and attP'B'. The reaction can only be reversed if Int and another phage protein, Xis, are present.

Fig. 9. The formation of combinatorial libraries between light and heavy variable regions of Fab antibody fragments by combinatorial infection and in vivo recombination. Waterhouse et al. [65] and Griffiths et al. [66] used Cre/loxP-mediated gene cassette exchange to shuffle the light and heavy chains in a type 3 library. (A) The number of different CDR1/2 combinations (expressed germline sequences [66] and hypervariable CDR3 regions (see e.g. Ref. 223)) created in two separate banks are shown, as well as the calculated number of combinations of light and heavy chains that could, theoretically, be produced. (B) fdDOG-2lox phage encode genes whose products can associate to form an Fab fragment presented on the phage, since the heavy chain is fused to pIII (see Fig. 3). The pUC plasmid, present in the cell, encodes only a heavy chain with a c-myc tag extension (not shown). The heavy chain genes in both phage and plasmid are bracketed by a copy of the loxP511 and loxPwt (wild-type) sites, allowing exchange of this cassette between the two replicons via Cre-mediated site-specific recombination. The mutant site recombines only with another mutant loxP site, not with a wild-type site, and vice versa. Recombination (including exchange of the heavy chain cassette) is induced by infecting the cells carrying the pUC plasmids and the upper fdDOG1-2lox phage with a P1-cre phage, which supplies the Cre recombinase. This acts in a reversible fashion to allow recombination between all homologous loxP sites. The final result of a reciprocal recombination between the homologous loxP sites would be a mixture of the two phage (illustrated), two plasmids (one illustrated) and higher multimer cointegrates (not tested). The formation of the required recombinant (lower phage) is seen in the form of a phagemid which can present the recombinant Fab antibody fragments fused to pIII. The formation of the recombinant DNA products was also tested by PCR.
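The pairing rule stated in the legend (a loxP511 site recombines only with loxP511, and a wild-type site only with wild-type) can be expressed as a small model of cassette exchange. This is a deliberately simplified sketch: it treats the exchange as a single, complete, one-way event, whereas the legend makes clear that the real reaction is reversible and yields a mixture of products including cointegrates. The `Replicon` class, the function names and the cassette labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Replicon:
    name: str
    left_site: str    # lox site upstream of the cassette
    cassette: str     # heavy-chain gene carried between the two sites
    right_site: str   # lox site downstream of the cassette


def can_recombine(site_a: str, site_b: str) -> bool:
    """A site pairs only with an identical site (511 with 511, wt with wt)."""
    return site_a == site_b


def exchange_cassette(a: Replicon, b: Replicon) -> Tuple[Replicon, Replicon]:
    """Swap cassettes when both flanking site pairs are compatible."""
    if can_recombine(a.left_site, b.left_site) and can_recombine(a.right_site, b.right_site):
        return (
            Replicon(a.name, a.left_site, b.cassette, a.right_site),
            Replicon(b.name, b.left_site, a.cassette, b.right_site),
        )
    return a, b  # incompatible sites: no exchange in this simplified model


if __name__ == "__main__":
    phage = Replicon("fdDOG-2lox", "loxP511", "VH-acceptor", "loxPwt")
    plasmid = Replicon("pUC", "loxP511", "VH-donor", "loxPwt")
    new_phage, new_plasmid = exchange_cassette(phage, plasmid)
    print(new_phage.cassette, new_plasmid.cassette)
```

Using two different site variants on either side of the cassette is what prevents the two sites on one replicon from recombining with each other and excising the insert.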

a heavy chain from the donor vector. The isolation of a range of high-affinity antibodies to a number of targets was described for this bank [66]. This library was considerably larger than any previously produced.
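The arithmetic behind chain shuffling is simple but worth making explicit. The sketch below uses the figures quoted in this section (some 10⁸ variants per chain library, and a recombined library of about 10¹⁰ clones); the function names are illustrative, and the calculation assumes that every heavy chain can pair with every light chain.

```python
def pairwise_combinations(heavy_variants: float, light_variants: float) -> float:
    """Theoretical number of heavy/light pairings from two independent libraries."""
    return heavy_variants * light_variants


def fraction_sampled(library_clones: float, combinations: float) -> float:
    """Upper bound on the fraction of pairings a library of a given size can hold."""
    return min(1.0, library_clones / combinations)


if __name__ == "__main__":
    combos = pairwise_combinations(1e8, 1e8)  # chain library sizes quoted in the text
    print(f"theoretical pairings: {combos:.0e}")
    print(f"fraction held by 1e10 clones: {fraction_sampled(1e10, combos):.0e}")
```

Even a library of 10¹⁰ clones can therefore hold at most about one in a million of the theoretical pairings, which is why in vivo recombination is attractive: the combinations are regenerated each time the bank is made rather than fixed at the transformation step.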

A system which similarly uses Cre/loxP recombination, but for chain shuffling in scFv-presenting type 3+3 phagemids, has been described by Tsurushita et al. (Ref. 67; Fig. 10). The system requires that there be a continuous open reading frame through the 33 bp loxP site, so that the linker thus generated between the VH- and the VL-chain must still be compatible with the formation of a correctly folded scFv fragment. This was demonstrated in the model system for two different antibodies. The formation of packageable Apr and Apr+Cmr clones increased 1000-fold on induction of Cre. Packaging the

Fig. 10. In vivo recombination to generate combinatorial type 3+3 phagemid presenting scFv according to Tsurushita et al. [67]. A VH-chain library is constructed in the pCW93 ampicillin resistance (Apr) plasmid vector, where the VH-chain coding region is preceded by a loxP511 and a normal loxP site (see Fig. 9 legend). A light chain library bracketed by these loxP sites is constructed in the low copy number chloramphenicol resistance (Cmr) phagemid pCW99. Since the plasmids have different replication origins, they can stably coexist in the same cell. The cell also contains pCre, which has a pSC101 replication origin, and encodes both tetracycline resistance and the Cre recombinase, which is under the control of the araB promoter. On addition of arabinose to the medium, Cre is produced. This initiates reversible recombination at the loxP sites, generating the phagemid plasmids A and B, which in turn can generate the original plasmids, plasmid D and phagemid C, amongst a range of other products, consisting mostly of higher concatemers of all these constructs. The large arrows in the centre designate phagemid. The black arrows designate phagemid (plasmid A and plasmid C) which can synthesize (under lac promoter control) and display scFv-pIII on superinfection with an f-type helper phage.

products of the recombination, which, it is noted, can be far more diverse than indicated in the figure or the publication, showed, surprisingly, that the majority of the clones (titre 5.9 × 10⁹/ml) were of the anticipated C-type phagemid. This is, therefore, an extremely efficient and straightforward method which should lead to the production of excellent, very large recombinatorial libraries.

Exon shuffling

An ingenious method has been developed which largely avoids the problem of having the loxP site (11 amino acids defined) in the reading frame of a gene fusion when using the Cre/loxP in vivo recombination system to shuffle contiguous regions. Fisch et al. (Ref. 69; Fig. 11) use a self-splicing intron, which removes itself from the mRNA. A loxP site has been inserted into the intron while still preserving its splicing function (Fig. 11C). The intron still requires that a few of the bases in the bordering donor and acceptor sequences be defined. This leads to the fixed sequence ALLRY between the two hypervariable decameric amino acid sequences created in the initial type 3 library. In spite of this limitation, the creation of more than 10¹¹ clones in this experiment is impressive. To test the effectiveness of the bank, panning was carried out on an anti-p53 antibody, bovine alkaline phosphatase and β-glucuronidase. Since the nonrearranged and nonspliced products cannot present the hypervariable sequences, due

Fig. 11. Exon shuffling, according to Fisch et al. [69]. (A) A library of exon1 variants, containing 10 contiguous randomized positions, is present initially in an E. coli TG1 (F+), pACYCaraCre strain. Exon1 is cloned between two different loxP sites. The loxP wt site (wt = wild type, shown as small open circles) lies downstream of exon1 in a loop of the type 1, self-splicing intron from Tetrahymena thermophila, where it does not affect the function of the intron (see (C)). The loxP 511 site (shown as small filled circles) differs from loxP wt by a single point mutation and should, in the presence of Cre recombinase, provided from the pACYCaraCre plasmid, preferentially recombine with other loxP 511 sites, as loxP wt should preferentially recombine with other loxP wt sites. The loxP sites are 34 bp long. The type 3 phage library is introduced into this strain by infection. The library fdDOG-loxP511-loxPwt-exon2 carries the designated elements in the given order, whereby loxP wt is also inserted into the intron as in the pUC vector. When Cre is induced, by the addition of arabinose to the medium, site-specific recombination takes place efficiently at the loxP sites. One of the possible recombination products has a replacement of the insert between the two loxP sites in the phage with exon1 from the plasmid. Initial phage and plasmid will also be present along with a range of other recombination products. The phage (original and recombinant) can be replicated and secreted into the medium, where they now represent a very large combinatorial library (estimated to be about 1.6 × 10¹¹ primary clones) in which exon1 and exon2 sequences have been reassorted. This could be repeated with a new exon1 or exon2 library to generate more novel variants.
(B) Sequences generated in the recombinant: the primary transcript has a sequence which could encode, to the left of the intron, from exon1, the pelB leader sequence immediately preceding the hypervariable decamer, then alanine and leucine followed by a stop codon. The sequence to the right originates from exon2. After splicing, a messenger is generated which has a continuous open reading frame for a pelBL-hypervariable decamer-ALLRY-second hypervariable decamer-linker-pIII protein, which will be displayed on the secreted phage from this strain (along with original exon2-pIII sequences; this latter will be diluted out on further amplification of the phage). (C) A generalized scheme of the RNA structure of the type 1 self-splicing intron from Tetrahymena thermophila into which the loxP wt sequence has been inserted. The splice sites are shown. Seven bases in the first exon and eight bases in the second exon are determined by the sequence in the intron as necessary for excision. This leads to the retained constant sequence (ALLRY) between the two hypervariable decamer sequences shown in (B).

to the stop codon at the 5'-end of the intron, only correctly fused, spliced constructs should be selected. Indeed, highly specific binding clones of this type were obtained: panning on the antibody yielded clones with the motif (R/K)HSV, in either the left or right hypervariable region (see Fig. 11B); panning on alkaline phosphatase yielded 12 different clones (Kd ca. 1 µM), nine of which had the same amino-terminal sequence and different carboxy-terminal sequences; panning on β-glucuronidase yielded five different clones (Kd ca. 20 nM), two with the same amino-terminal sequence, four with the sequence DP(V/L). In the latter two groups it was shown that the binding was due to the amino-terminal segment. An examination of the sequences reported for the clones showed that seven of 21 had the linker (ALLRY) directly fused to pIII, which the authors explain as due to a partial loss of exon2 during the assembly of the fdDOG1-exon2 repertoire. The variable length (8–10 aa) of the amino-terminal hypervariable sequence is not discussed.

In conclusion, this is a powerful novel concept for the generation of very large hypervariable libraries with a minimum of constraint with respect to the structures remaining at the recombination site. If the loxP/intron were in the scaffold between hypervariable domains this would present no limitation at all. In addition, as the authors note, the limitations imposed with respect to the fixed donor and acceptor sites may be transient, since it is possible to generate intron mutants, by altering internal complementarity within the intron, which will use other donor and acceptor sequences.
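Consensus motifs of the kind reported for these selections, (R/K)HSV and DP(V/L), translate directly into regular expressions, which is a convenient way to scan sequenced clones for them. The following is a minimal sketch; the example peptides in it are invented for illustration and are not sequences from the cited work, and the dictionary and function names are hypothetical.

```python
import re
from typing import Dict

# Motifs as quoted in the text, written as regular expressions.
MOTIFS = {
    "anti-p53 antibody": re.compile(r"[RK]HSV"),   # (R/K)HSV
    "beta-glucuronidase": re.compile(r"DP[VL]"),   # DP(V/L)
}


def find_motifs(peptide: str) -> Dict[str, int]:
    """Return the start index of each motif found in a one-letter peptide."""
    hits = {}
    for target, pattern in MOTIFS.items():
        match = pattern.search(peptide)
        if match:
            hits[target] = match.start()
    return hits


if __name__ == "__main__":
    # Hypothetical decamers, used only to exercise the function.
    for peptide in ("AGRHSVTLQE", "WDPLGAAAAA", "AAAAAAAAAA"):
        print(peptide, find_motifs(peptide))
```

Because the two hypervariable decamers are separated by the fixed ALLRY linker, each decamer would be scanned separately in practice, so that a match spanning the linker is not mistaken for a selected motif.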

Direct interaction rescue

The following papers, from 1994, present an interesting, if not well documented, method which uses a radically different selection protocol. Reviewing the literature, however, I am not aware of it having been used effectively by any other laboratory. It should perhaps, therefore, be considered under the heading of ‘emerging technologies’. The phage-display technique, referred to as ‘direct interaction rescue’ (DIRE; Ref. 168) or ‘selection and amplification of phage’ (SAP; Ref. 169), does not depend on panning at all. The principle requires that a specific molecular interaction between ligand and target reconstitutes a functional pIII protein on the phage particle. The C-terminal portion of pIII is responsible for attachment to the phage, but if shorter than about 200 amino acids it will not mediate binding to the F-pilus during infection of the cell. Conversely, the N-terminal 261 aa portion of pIII (total length normally 406 aa) cannot attach to the phage, but can bind the F-pilus. When two gene fusions are produced, NH2-pIII-fragment-X and Y-COOH-pIII-fragment (where X and Y are two different proteins or peptides; see Fig. 3), both functions can be present in a single complex only if the fusions associate strongly via affinity between X and Y. Leader peptides are of course present at the amino terminus of both constructs to allow secretion. This has been compared with the yeast two-hybrid system [170], which requires a similar intermolecular interaction to generate an active transcription complex for a marker gene.

Gramatikoff et al. [168], starting with pJuFo, a phagemid from Crameri and Suter [171], which already contains a double-cysteine-stabilized leucine zipper attached to a carboxy-terminal pIII fragment (Jun-zipper-C-pIII; see 3zip3 type vectors; Fig. 3), tested if an amino-terminal-pIII-fragment-leucine-zipper (N-pIII-fos-zipper) associated strongly enough to allow functional
reconstitution of pIII, i.e. the formation of particles able to infect F+ E. coli, packaged with the aid of a pIII-deleted helper phage. As was to be expected from the studies with pJuFo, this was indeed the case. The phagemid carrying both pIII-fragment fusions is electroporated into a recipient already carrying the pIII-deletion phage. cDNA clones could be isolated from a library where X (Fig. 3), as carboxy-terminal extensions on the amino-terminal pIII fragment, is a cDNA fragment library, and Y, as the ‘bait’, was maintained as the leucine zipper from Jun. Although there is little detail in the paper, coiled-coil, potential zipper-forming regions from a ribosomal protein and a portion of tropomyosin, which had been isolated as a Jun-zipper binding protein in the yeast two-hybrid system, as well as two other, unidentified, proteins were isolated.

Duenas and Borrebaeck [169] presented a very similar system (SAP), again with few details, in which the interacting moieties, as above, were: Y, antibody fragments directed against the V3 loop of gp120 (HIV) or hen lysozyme; and X, the corresponding antigens. The vectors were phagemids, and a pIII-deleted helper phage was used to prepare the packaged particles. Specific amplification of phage could only be obtained if the correct fusion protein was added to the phage presenting antibody fused to truncated carboxy-terminal pIII. A library was constructed where a naive antibody repertoire was presented fused to the C-terminal pIII fragment. An amplification of the library in the presence of the HIV-antigen fused to the N-terminal pIII fragment yielded 50% positive-binding clones after three panning rounds. A parallel experiment in which the same library was enriched by panning with normal helper phage yielded no positive clones. (This reviewer assumes this may be due to the different valencies of the SAP system, namely 5, and the phagemid 3+3 vector, namely 1. This is still surprising since one would assume that a high-affinity interaction is required to select in both systems (no data on this in either paper).) It should also be noted that the background could be quite high, since M13 phage deleted for pIII, or normal phage on an F⁻ host, can still infect E. coli in an F-independent fashion with an efficiency of up to 10⁻⁴ (dependent on the calcium concentration).

Panning on cells

The enrichment of cell-specific ligands by panning phage banks directly on cells has been technically challenging, although there have been a number of very successful examples in the literature (see Ref. 59 for another recent review). One of the first examples published was the search for scFv fragments which recognize the blood group antigens A, B, O, RhD and E, and Kell (Kpb) [172]. The scFv-pIII phage bank [173], containing a VH repertoire from two donors, was selected on intact erythrocytes. Selected clones were then tested in an agglutination assay. Anti-B, -RhD, -RhE and -Kell antibodies were found, although no anti-A or anti-O antibodies. This was a remarkable achievement, since the antigens are present as only 2–5000 copies per cell.

In a further report, Rapoport’s group [174,175] selected anti-thyroid peroxidase (anti-TPO) scFv fragments from a phagemid repertoire library, generated from mRNA extracted from B-cells invading the thyroid of autoimmune patients. Thyroid peroxidase was known to be an autoantigen. Thirty high-affinity antibodies (Kd < 10⁻¹⁰) were isolated, comprising four different H chains and three L chains. The library was panned on TPO-expressing (surface antigen) CHO cell clones, with subtraction of antibodies to other surface antigens on
TPO-negative CHO cells. After three rounds of panning, 100% of the clones were strong binding clones (round 0, 0.008%; round 1, 5%; round 2, 68%).

In a similar panning protocol, a uniform population of ADP-activated platelets, adherent to plastic, was used to select binding clones from a 15-mer peptide-presenting phage library [141]. The goal was to identify ligands to surface markers which would inhibit platelet aggregation [59]. The ratio of eluted to input phage increased 10-fold in the fourth round of panning, using urea elution. Ninety per cent of the eluted phage population bound specifically to platelets. Synthetic peptides representing the peptides displayed by the isolated phage were found to prevent platelet aggregation with an IC50 between 6 and 240 µM. A variety of peptide motifs were evident among the platelet-binding sequences. Five of the clones contained the sequence RGD, which is known to bind to the glycoprotein GPIIb-IIIa, an integrin (αIIbβ3) that binds fibrinogen, vitronectin, von Willebrand factor and fibronectin. One peptide inhibits the aggregation of platelets induced by ristocetin. Another peptide resembles a portion of the fibrinogen γ-chain, whereas others were similar to the snake venom protein kistrin or to lactoferrin (KRDS sequence). Finally, some peptides contained the motif KHS, which has no homology to known proteins. As the authors point out, the utility of this approach using peptide libraries lies in the diversity and biological activity of the peptides isolated.

Platelets were used directly as a target for panning a peptide library in the search for a thrombin receptor antagonist, using a peptide agonist as elution agent [176]. This led to the generation of a peptide which could antagonize platelet aggregation triggered by the agonist peptide, and inhibit serotonin release and tyrosine phosphorylation triggered by either thrombin or the agonist peptide.

Doyle et al. [59] report the isolation of urokinase plasminogen activator receptor (UPAR) agonists from a phage-displayed random peptide library. As the receptor was not easily available in soluble form (also Ref. 177), it was overexpressed both in Sf9 insect cells infected with baculovirus vectors and in COS7 cells. These two cell types were used in alternate rounds of panning so as to avoid enriching clones which bound to cell surface components other than the UPAR. Three rounds of panning yielded positive clones, with the short consensus of either LWXXΦ (where Φ is an aromatic amino acid) or XFXXYLW. The clone most frequently represented encoded a peptide which binds the receptor with a 10 nM affinity. The use of mock-infected Sf9 cells for negative selection and UPAR-expressing cells for positive selection was unsuccessful, as one should probably expect, since, at the most, only 10% of the binding clones will actually bind during panning. The authors also selected peptides which bind fibroblast growth factor (FGF) receptor using the same methods, isolating a clone displaying a peptide with submicromolar affinity.

Meulemans et al. [178] describe the use of a nonimmunized mouse scFv phagemid library (type 3+3; pCANTAB5 vector) to select antibodies specific for a cytoskeletal antigen by competitive elution with a monoclonal antibody against cytokeratins. The unique aspect of this work was that the phage were screened with methanol/acetone-fixed human ovarian carcinoma cells. To select for phage which reacted with the specific cytokeratin epitope target, the phage population resulting from four rounds of selection was panned on bladder carcinoma cells, using competitive elution with the anti-cytokeratin monoclonal antibody. It seems that at least one scFv clone was obtained which had some
specific affinity for cytokeratin 18, although neither the clone sequences nor the properties of the scFvs are described in sufficient detail for a critical evaluation.

Eggena [17] used panning on neutrophil cells to isolate a specific clone from an antibody repertoire library prepared from an ulcerative colitis patient’s lamina propria. This antibody represents an immunogenetic marker for the disease.

De Kruif et al. [179] describe an interesting use of cell sorting to select for scFvs (seven different light chains, 49 germline heavy chains with synthetically varied CDRs; 6–15 aa) which specifically bind to peripheral blood cells (PBLs) of different types. Phagemid particles were mixed with the PBLs overnight, washed twice and stained with fluorescent monoclonal anti-CD20 and anti-CD3 antibodies. This staining allows sorting of eosinophils, B- and T-lymphocytes, and other cells in a fluorescence-activated cell sorter. The shear stress developed during this process was considered sufficient to substitute for some of the washing steps during panning. Phage were eluted from the sorted fractions with acid, neutralized and propagated. This was repeated for the separate cell-specific fractions. Some 63–100% of the phagemid particles showed specific binding to the cell fraction on which they had been sorted. One clone was characterized as specifically recognizing a CD8 α-chain. Two B-lymphocyte binding clones recognized B-cell markers expressed at a very late stage in B-cell maturation, which have not hitherto been characterized. This demonstrates an elegant approach for the isolation, from synthetic repertoire libraries, of antibodies to markers which are present in their native form, even on rare subpopulations of cells.

Cai and Garen [180] investigated the selection of melanoma-specific antibodies by panning a scFv-presenting type 3 phage repertoire library derived from melanoma patients who had been immunized with autologous cells transformed with a retrovirus expressing interferon-γ. One library was generated using mRNA from PBLs enriched by binding to melanoma cells (bank 1) and the other from a nonenriched population (bank 2). After three rounds (bank 1), or two rounds (bank 2, which finally yielded all the specific clones) of panning on autologous melanoma cells, more than 90% of the clones showed specific binding to melanoma cells in an ELISA. These populations were subjected to a further round of negative selection on normal melanocytes. After testing specific binding on a wide range of cell types and tumour lines, it was concluded that the clones from one patient’s Ab repertoire specifically recognized cells of the melanocyte lineage. One clone obtained from the second patient’s repertoire library specifically recognized melanoma cells (also from another patient) but did not react with normal endothelial cells, melanocytes, fibroblasts or with normal cells present in melanoma sections. This clone only crossreacted with one of two pancreatic tumours in the panel tested. It is concluded that this type of search should best be carried out with a collection of Fvs derived from many different patients, since there is an obvious difference in the way they react to immunization with autologous tumour cells.

Finally, in one of the most amazing publications of the year, Pasqualini and Ruoslahti [26] describe the isolation of clones which bind to specific tissues, by panning in vivo. Phage populations were eluted from tissues surgically removed from an animal into which a phage library had been injected. Since tumour cells and lymphocytes can home to specific tissues, the authors concluded that these tissues display specific marker molecules accessible to circulating cells, and circulating phage. Constrained, or potentially constrained, libraries of the type X2CX5–7C, CX9, X2CX14CX2 and X2CX18 in the fUSE5,
type 3 vector (see Refs. 145, 181 and 182) were injected in the mouse tail vein (10¹⁴ to 10¹⁶ phage). Most of the phage remained in circulation during the 1–4 min before the animals were sacrificed by snap freezing. Phage were rescued from ground tissues in the presence of protease inhibitors by adding extracts directly to bacteria. Although some tissues, e.g. the liver, retained large amounts of phage nonspecifically, specific enrichment was found in the kidney and particularly in the brain. The brain-specific population showed three enriched sequence motifs (54% with SRL). Brain-isolated phage bound brain tissue about eightfold more effectively than kidney. Unselected clones showed no differential binding. Kidney-selected phage were isolated from the CX5/6C libraries with 60% of the clones containing two motifs; the most specific clone bound kidney sevenfold better than brain. Immunohistochemical studies showed that the brain-specific clones bound to capillary epithelium throughout the brain, whereas the kidney-specific clones bound to glomeruli and in between the tubules. Synthetic peptides based on the most frequent sequences were able to compete with the binding of brain-selected phage to brain tissues or, when attached to red blood cells, to target them to the brain. The authors point out the significance of such studies for the development of somatic gene therapy vectors, particularly those directed to tumour vasculature.

In summary, the use of alternating cycles of selection on different cell lines expressing the same antigen, and selective elution with competitive antibody or ligand, seem to be particularly effective protocols for targetting ligands to specific epitopes on proteins which can only be obtained intact in or on cells, i.e. are unavailable as purified proteins. In addition, two protocols showed that total PBLs and even whole-animal screening/panning can yield antibodies or ligands to interesting novel targets. The use of nonnaive libraries, i.e. libraries enriched for disease-specific antibodies, may be considered critical to success in isolating high-affinity antibodies (e.g. Refs. 174 and 175).
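The round-by-round figures quoted earlier in this section for the anti-TPO selection (0.008%, 5%, 68% and 100% strong binders in rounds 0 to 3) can be converted into fold enrichment per round, which is the quantity usually compared between panning protocols. The sketch below does only that arithmetic; the function name is illustrative.

```python
from typing import List, Sequence


def fold_enrichment(positive_fraction_by_round: Sequence[float]) -> List[float]:
    """Return the fold increase in positive clones from each round to the next."""
    pairs = zip(positive_fraction_by_round, positive_fraction_by_round[1:])
    return [later / earlier for earlier, later in pairs]


if __name__ == "__main__":
    # Percentages of strong binders quoted for rounds 0 to 3 of the anti-TPO selection.
    percent_positive = [0.008, 5.0, 68.0, 100.0]
    for round_number, fold in enumerate(fold_enrichment(percent_positive), start=1):
        print(f"round {round_number}: {fold:.1f}-fold enrichment")
```

The first round accounts for most of the enrichment (about 625-fold); the gain per round then falls (about 13.6-fold, then about 1.5-fold) simply because the population is approaching saturation with binders.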

Innovative screening protocols

‘Substrate phage’

Matthews and Wells [183] have presented an interesting variation on the normal selection protocol, designated ‘substrate phage’. They displayed human growth hormone (hGH) on phage linked to pIII with a random linker. This library is bound to immobilized hGH-receptor. Elution was carried out by treatment with a protease which normally did not cleave the phage or hybrid protein. The subset of clones enriched by cleavage in the linker thus presents a consensus for the preferential protease substrates. This method has also been used to characterize the preferential cleavage site of furin, a mammalian enzyme involved in the cleavage of many constitutively expressed protein precursors [184]. Vectors constructed with such proteolytic linkers can thus be released from their target during panning with a mild buffer containing the appropriate protease, without having to guess at optimal elution conditions (see Ref. 85; thrombin cleavage for elution).

Guiding specificity

The following examples use competition during panning to select for enhanced specificity for a particular target. Cunningham et al. [151] selected an A-type-ANP receptor-specific
agonist (atrial natriuretic peptide analogue or variant) using competition panning with soluble receptor B while selecting on receptor A. Dennis and Lazarus [185] selected APPI (Alzheimer’s β-amyloid precursor protein inhibitor) variants specific for blood-clotting factor VIIa which bound more weakly, or not at all, to FXa. They used competitive selection with FXIa, plasmin and kallikrein in the panning buffer to prevent enrichment for those with specificity for these proteases. An overall consensus inhibitor (TF7I-C), previously isolated, was a 1.9 nM inhibitor of FVIIa, but a 0.8 nM inhibitor of FXIa, 1.2 nM for kallikrein and 40 nM for plasmin. The competitive enrichment amongst variants of this latter inhibitor from a library in which nine positions were very partially randomized (ca. 360 000 variants) yielded inhibitors with a single change. Clear effects of counterselection were seen only in one of four banks. The final inhibitors, although all weaker than TF7I-C against FVIIa, were found to be 100-fold more selective relative to FXIa and kallikrein. An alternative to negative selection has been presented by Stausbøl-Grøn et al. [186], who demonstrated that the addition of control proteins during the adsorption and washing steps of panning can guide the specificity of the selection process towards a target which is absent in the control mixture. It is an interesting possibility that such a protocol would help specific enrichment in selection of disease-specific phagotopes as described above, whereby control patient sera would be added as competitor during panning. It is, of course, questionable whether the concentrations of the individual antibodies in the serum are high enough to produce a significant effect.
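The starting point for this counterselection is easier to appreciate when the inhibition constants quoted for TF7I-C are expressed as selectivity ratios. The sketch below defines selectivity as Ki(competing protease) / Ki(target), so that values above 1 favour the target; this definition and the names used are assumptions made for the illustration, not taken from the cited paper.

```python
from typing import Dict

# Inhibition constants quoted in the text for TF7I-C, in nM.
KI_NM = {
    "FVIIa": 1.9,
    "FXIa": 0.8,
    "kallikrein": 1.2,
    "plasmin": 40.0,
}


def selectivity_ratios(ki_nm: Dict[str, float], target: str) -> Dict[str, float]:
    """Ki(off-target) / Ki(target); values above 1 favour the target."""
    target_ki = ki_nm[target]
    return {
        protease: ki / target_ki
        for protease, ki in ki_nm.items()
        if protease != target
    }


if __name__ == "__main__":
    for protease, ratio in selectivity_ratios(KI_NM, "FVIIa").items():
        print(f"{protease}: {ratio:.2f}")
```

Ratios below 1 for FXIa and kallikrein show that TF7I-C binds these proteases more tightly than FVIIa, which is the situation that adding them as competitors during panning was intended to reverse.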
SH3-ligands and mirror-imaging with D-peptide ligands

Peptide ligands which bind Src-homology type 3 (SH3) domains have been investigated by a number of groups using both synthetic and phage-display systems, subsequent to Baltimore’s group [187] inferring that a proline-rich sequence was involved in the interaction. Screening on SH3 has revealed nonconstrained proline-rich core sequences [49,135–137], where the specificity for different SH3-domains lies mainly in adjacent sequences [188]. This latter paper is, in fact, a major ‘tour de force’, in which the consensus specific for a very wide range of SH3-domains is defined by phage display with a very large dedicated peptide library presenting (X)6PXXP(X)6. Binding affinities of peptides based on the motifs identified confirm the results with phage. Clones of type II (which bind in the opposite orientation [189]), which showed a broader binding specificity amongst SH3-domains, were used to screen mouse and human cDNA expression libraries, selecting 18 genes, of which 10 were novel [190].

One can compare these phage-display experiments with the synthetic peptide and rational design approach for Abl-SH3 binding clones. Pisabarro and Serrano [50] varied the decapeptide APTYPPPLPP, substituting single positions one at a time with two to four different amino acids (a total of 29 peptides). Binding varied between twofold better and 20 times worse, the latter particularly when the prolines were exchanged. Combining favourable substitutions yielded a 0.4 µM binder, APTYSPPPPP. In the model, the long Pro-run is favourable due to a reduction in entropic freedom. Although the consensus and/or optimal sequences arrived at with the two techniques are very similar, it seems to me that the relevant biological information obtained, relative to the effort involved, greatly favours the empirical approach.
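As a rough illustration of the PXXP core motif, the sketch below locates PXXP cores in the two Abl-SH3 decapeptides quoted in the text and counts the theoretical amino-acid diversity of an (X)6PXXP(X)6 library. It is written in Python; the helper names are hypothetical, and the count assumes that all 20 amino acids are equally allowed at each of the 14 X positions, which ignores codon bias and stop codons.

```python
import re


def pxxp_positions(peptide):
    """Return the 0-based start of every PXXP core, including overlapping ones.

    A zero-width lookahead is used so that overlapping cores in a
    proline run are all reported.
    """
    return [m.start() for m in re.finditer(r"(?=P..P)", peptide)]


def library_diversity(n_random_positions, alphabet=20):
    """Theoretical number of distinct peptides for fully random positions."""
    return alphabet ** n_random_positions


# (X)6PXXP(X)6 has 6 + 2 + 6 = 14 randomized positions.
SH3_LIBRARY_DIVERSITY = library_diversity(14)

for peptide in ("APTYPPPLPP", "APTYSPPPPP"):
    print(peptide, pxxp_positions(peptide))
print(f"theoretical diversity: {SH3_LIBRARY_DIVERSITY:.3e}")
```

Under these assumptions the theoretical diversity is 20^14, far beyond the number of clones in any physical library, so a dedicated library of this kind can only sample a small fraction of the possible sequences.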

The potential therapeutic value of targeting the signal transduction pathway has recently been demonstrated with specific Jak2-kinase inhibitors in an animal leukaemia model [191]. Since these pathways require that SH3 and SH2 domains interact with their appropriate ligands, this binding is also of interest as a target for therapeutic design. The development of ligands with clinical potential precludes the direct use of short peptides, which are easily degraded in vivo, either in the serum or in the cell cytoplasm. In a dramatic demonstration of a novel strategy, Schumacher et al. [133] have developed D-amino acid polypeptide ligands, which should be resistant to proteolysis but which, because of their small size, diffuse easily to and into target organs/tumour tissue. They used a synthetic enantiomeric Src-SH3 domain, composed entirely of D-amino acids, as the target during panning. With a type 3 phage library presenting a disulphide-constrained C(X)10C loop they identified binding epitopes as L-amino acid ligands. As expected from mirror-image symmetry, the synthetic D-amino acid enantiomers of these ligands bind to the natural L-protein. The D-forms of CLSGLRLGLVPC, CLMGLRLGLLPC and CAYGFKLGLIKC, with a consensus containing a long hydrophobic region interrupted by an arginine or lysine, bind normal Src-SH3. The binding is dependent on the constrained structure, i.e. on disulphide bond formation, with the best binding clones having a moderate affinity (Kd of 63 µM). NMR was used to investigate the binding site of the D-peptide on SH3. It fits into the shallow groove used for binding the natural ligand, which is in agreement with the competitive binding with natural ligands. Modelling implied that the conserved arginine/lysine residue binds in pocket-A, as does the conserved arginine in the natural ligand, but the portion of the cleft binding the proline residues is not involved in strong interactions with the D-peptide.
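The shared features of the three ligands quoted above can be checked mechanically. The following Python sketch (helper names are illustrative; the sequences are those given in the text, read in one-letter code) confirms that each fits the C(X)10C format and reports which columns are invariant and which are always arginine or lysine.

```python
import re

# Ligand sequences as quoted in the text (one-letter code).
LIGANDS = ["CLSGLRLGLVPC", "CLMGLRLGLLPC", "CAYGFKLGLIKC"]

# Cysteine, ten variable residues, cysteine: the C(X)10C loop format.
LOOP_FORMAT = re.compile(r"^C.{10}C$")


def conserved_columns(seqs):
    """Map 0-based column index to residue for columns identical in all sequences."""
    return {i: col[0] for i, col in enumerate(zip(*seqs)) if len(set(col)) == 1}


def basic_columns(seqs):
    """Return 0-based columns in which every sequence carries Arg (R) or Lys (K)."""
    return [i for i, col in enumerate(zip(*seqs)) if set(col) <= {"R", "K"}]


fits_format = all(LOOP_FORMAT.match(s) for s in LIGANDS)
invariant = conserved_columns(LIGANDS)
basic = basic_columns(LIGANDS)

print("all fit C(X)10C:", fits_format)
print("invariant columns:", invariant)
print("Arg/Lys columns:", basic)
```

The single Arg/Lys column sits in the middle of the otherwise hydrophobic stretch, which matches the consensus described in the text.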
The authors consider that D-enantiomer preparation by chemical synthesis is feasible for domains of up to 100 aa, and as such could be applicable for screening with any of the numerous small domains which occur in natural proteins (e.g. WW, PHD, SH2, etc.; Ref. 192). This approach is, of course, not confined to screening with phage-display libraries.

Selecting disease-specific mimotopes

Knowledge of the disease-related antigens in autoimmune disease is often lacking. Resolving this problem would simplify the search for the early immune response antigens correlated with the initiation of the disease, or at the very least provide a diagnostic tool to assess the disease status. In terms of vaccination and the development of an immune-competent status subsequent to either infection or vaccination, there is also the need for diagnosis of the immune response against protective antigens. The aim in both cases is to select a collection of antigens which can be used to recognize the presence of antibodies against specific epitopes in the serum (or cerebrospinal fluid), where, initially, the antigens are unknown. Although faced with the apparently insurmountable problem of challenging 10⁸ to 10⁹ phage variants with perhaps a million different antibodies present in the serum, a number of groups have attempted to carry out just such a selection. In one of the first studies, Dybwad et al. [193] used panning of a type 8 nonapeptide library, alternating positive selection on pooled sera from a number of rheumatoid arthritis (RA) patients with negative selection (see the reservations about the effectiveness of such an approach, above) on sera from a control group. The experiment was moderately successful, in that the enriched phage population contained five unrelated clones which could react with 44% of sera from RA patients, whereas they reacted with only 13% of sera from the control group.

Cortese’s group [7,60,61,115,194,195] have developed a different screening approach. In view of the concentration problems involved during the initial panning step, the high avidity of multivalent type 8 or type 8+8 vectors (see Refs. 114 and 196) is considered essential to the success of the enterprise, as in the previous example. Pools of positive sera are immobilized on beads for the panning, and pools of both positive and negative sera are used for screening gridded blots of up to 100 000 clones selected during the panning. The method has been applied to sera from prediabetic and high-risk insulin-dependent diabetes mellitus patients (Mennuni et al., in preparation, quoted in Ref. 195; preliminary results show significantly higher reactivity of some clones for patient compared to control sera), to cold, insoluble IgM-complex fractions from type II cryoglobulinemia patients (CryoII; Macchini et al., in preparation, cited in Ref. 195) and to the cerebrospinal fluid of multiple sclerosis patients [61]. In all three examples, phage (termed phagotopes) were isolated which could interact specifically with antibodies from patient sera or, in the last case, at high titre with antibodies from patient cerebrospinal fluid (CSF). In the case of the CSF-specific clones, three consensus motifs were found. However, the clones isolated bound to antibodies from both the multiple sclerosis and the control groups. The association with the disease state, although not implying an etiological relationship, was indicated by the finding that the antibodies binding phagotopes were present at a much higher titre in the CSF of a few individuals from the MS patient group. In the case of the CryoII study, phagotopes showed a motif which implied that the dominant antigen recognized by the disease-specific IgMs was the T-lymphocyte and natural killer cell surface antigen LAG-3.
This was confirmed by further experiments.

The same serum panning and selection strategies have been used to isolate phagotopes recognized by antibodies specifically present in hepatitis B virus (HBV; Ref. 115) and hepatitis C virus (HCV; Ref. 195) immune sera. In the first study two different clones were found, one of which was similar to the natural viral antigen sequence. At least one of the two antigens was recognized by 80% of sera from immunized individuals and by 40% of sera from HBV-infected patients. In the HCV study a whole panel of (at least 40) phagotope clones was selected (panned, then positively and negatively screened with HCV-infected patient sera and control sera) which reacted only with pooled patient sera. Although any one clone reacted with serum antibodies in only half of the patient samples, the total collection gave a ‘fingerprint’ of the immune status of the individuals, with close to 100% identification of positive sera and zero cross-reaction with negative control sera. This is a very encouraging result which should be extremely useful in the development of diagnostic kits.

The isolation of disease-related antibodies specific for melanoma [180], the autoimmune thyroid peroxidase antigen [174], or a neutrophil antigen characteristic of ulcerative colitis [17] from nonnaive patient repertoire phage libraries has been reviewed above in the ‘Panning on cells’ section. The isolation of anti-viral antibodies with diagnostic or therapeutic potential from antibody repertoire libraries has been described in earlier papers, e.g. against HIV [197], hepatitis B virus [198] and human respiratory syncytial virus [199]. The use of phage-display libraries to characterize a number of epitopes recognized by polyclonal autoimmune antibodies has also been reported for affinity-purified anti-TNFα autoantibodies [200]. A number of papers have also pointed out that multivalent type 8 phage are extremely good immunogens [24,201,202]. As such they have been proposed as potential vaccines.

Selecting human antibodies directed to a particular epitope

Jespers et al. [203] present a protocol whereby, using phage display and starting from a mouse monoclonal antibody to a particular epitope, a completely human antibody can be derived to the same epitope. The mouse light and heavy chains are cloned from the hybridoma. A phage library presenting human light chains is infected into an E. coli host expressing the mouse monoclonal heavy chain. Panning on the original antigen selects hybrid Fab directed to the same antigenic epitope. A host strain expressing the selected human light chain is then infected with a phage library of human heavy chains. Panning of the phage produced, which now display completely human Fab fragments, on the original antigen allows selection of human antibodies directed to the same epitope that is recognized by the original mouse antibody. In this manner human antibodies against the amino-terminal region of TNFα were produced with 15 nM dissociation constants for human TNFα. This differs from previous protocols [204–207] to ‘humanize’ antibodies in that (i) it is a straightforward approach, not requiring a knowledge of the structure of the antibody; and (ii) the final antibody is of completely human origin.
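The two-step logic of this chain-replacement protocol can be sketched schematically. The Python below is a toy model only: the chain names, the set `EPITOPE_COMPATIBLE` and the predicate `toy_binds` are hypothetical stand-ins for real libraries and for panning on antigen, and nothing here reflects the authors' actual vectors or conditions.

```python
def guided_selection(mouse_heavy, human_lights, human_heavies, binds):
    """Replace mouse chains by human chains in two selection steps.

    `binds(heavy, light)` stands in for panning on the original antigen:
    it returns True when the heavy/light pair recognizes the epitope.
    """
    # Step 1: human light-chain library paired with the fixed mouse heavy chain.
    selected_lights = [lc for lc in human_lights if binds(mouse_heavy, lc)]
    # Step 2: each selected human light chain paired with the human heavy-chain library.
    return [(hc, lc) for lc in selected_lights
            for hc in human_heavies if binds(hc, lc)]


# Hypothetical toy data: chains assumed able to contribute to epitope binding.
EPITOPE_COMPATIBLE = {"mouse-VH", "human-VH-2", "human-VL-1", "human-VL-3"}


def toy_binds(heavy, light):
    """Toy rule: a pair binds only if both chains are epitope-compatible."""
    return heavy in EPITOPE_COMPATIBLE and light in EPITOPE_COMPATIBLE


human_fabs = guided_selection(
    "mouse-VH",
    ["human-VL-1", "human-VL-2", "human-VL-3"],
    ["human-VH-1", "human-VH-2"],
    toy_binds,
)
print(human_fabs)
```

The point of the sketch is the ordering: the mouse heavy chain serves only as a template in the first step, so every pair returned by the second step consists of human chains alone.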

Summary and future trends

Many of the studies with phage display are directed towards the development of pharmaceuticals. Peptides and proteins are generally not favoured as drugs, because of their poor pharmacokinetic properties, immunogenicity and, for larger proteins, their poor penetration into tissue. On the other hand, the concept of a ‘lead molecule’ is prevalent in the development of small synthetic pharmaceutical drugs. This requires that three-dimensional structural predictions can be made for the isolated ligands, so that a synthetic chemist may initiate a project to make a mimotope, hopefully with more desirable characteristics [208]. Free peptides and larger proteins, such as antibodies, with flexible complex interactions cannot easily be modelled. This explains to some extent the tendency to produce small compact matrices, such as tendamistat, protein A (mini Z-domain), BPTI, PSTI, PAI, ecotin, ANP-minidomain and CP-1 (Table 2), to present constrained epitopes, as well as the use of constrained disulphide peptide libraries. The number of papers in which these approaches have been successfully applied and where structural models have been generated is increasing rapidly. A tendency is discernible towards specific context scaffolds and dedicated gene banks to approach particular problems (e.g. proline-rich for SH3-domains; constrained scaffolds in addition to antibodies, e.g. in the search for protease inhibitors or integrin competitors), and phage display is becoming a basic tool in protein design studies. Attempts to use phage display to define disease-specific epitopes in infections and autoimmune disease (e.g. Refs. 17 and 61), without knowing the specific antigens involved, are exciting possibilities which are at present difficult to evaluate.

Perhaps the most dramatic trend, apart from the diversity of the vectors, is that the original limitations with respect to the size of the libraries have been overcome to a great extent. A number of groups have put considerable effort (the ‘brute force’ approach) and ingenuity (λ packaging vectors; Cre/loxP or λ-recombinase in vivo recombination systems; exon shuffling) into producing a new generation of banks with 10¹¹ (e.g. Ref. 69) or more clones. With such banks, particularly with antibody libraries, higher affinity antibodies have been obtained via chain shuffling and mutational enrichment procedures, which simulate somatic maturation of the immune system. Sequential protocols in which one starts with a range of multivalent banks and works towards higher affinities with monovalent libraries, mutational libraries or extension libraries derived from the enriched clones have been shown to be highly effective (e.g. Refs. 33, 85 and 94). This and the ‘evolutionary’ approach in highly mutagenic E. coli hosts over many selection rounds (e.g. Ref. 41) will surely influence further experimental design.

Although there is still a limitation to the phage display of nonsecretable (cytoplasmic) proteins, which may be alleviated by the developments which have been initiated with λ display vectors, and although many laboratories had ‘teething problems’ with this technology in the early nineties, it is now clearly established as a major tool in nearly all areas of biotechnological, biomedical and molecular biological research. It now seems particularly attractive in conjunction with developments in cell targeting for somatic gene therapy (e.g. Ref. 26) and with the definition of novel gene family members, based on the recognition of common functional domains (e.g. Ref. 190), as a support activity for the human genome programme. As a last note, in this multimedia age data on combinatorial libraries are available on the WWW (world wide web or world binding domain?) at addresses compiled by Peters and Sikorski [134].
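To put library sizes of this order in perspective, the following Python sketch estimates what fraction of a theoretical sequence space a bank of a given size can be expected to contain. It assumes that every clone is an independent, uniformly random draw at the amino-acid level (a Poisson approximation that ignores codon bias and cloning losses). The function name and the hexapeptide comparison are illustrative assumptions, while the figure of 10^11 clones and the C(X)10C format come from the text.

```python
import math


def expected_coverage(n_clones, n_variants):
    """Expected fraction of distinct variants present at least once.

    Uses the Poisson approximation 1 - exp(-N/V), which assumes each
    clone is an independent uniform draw from the V possible variants.
    """
    return -math.expm1(-n_clones / n_variants)


# Ten fully random positions, as in a C(X)10C loop library.
LOOP_VARIANTS = 20 ** 10
# Six fully random positions (hypothetical hexapeptide library for comparison).
HEXAPEPTIDE_VARIANTS = 20 ** 6

coverage_loop = expected_coverage(1e11, LOOP_VARIANTS)
coverage_hexa = expected_coverage(1e9, HEXAPEPTIDE_VARIANTS)

print(f"C(X)10C space, 1e11 clones: {coverage_loop:.4f}")
print(f"hexapeptide space, 1e9 clones: {coverage_hexa:.6f}")
```

Under these assumptions even a bank of 10^11 clones samples only about 1% of the sequences possible for ten random positions, whereas a short random peptide space is covered essentially completely by a much smaller bank. This is one way of seeing why dedicated and secondary (mutational or extension) libraries are attractive.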

References

1 Smith, G.P., Science, 228 (1985) 1315.
2 Barbas III, C.F. and Burton, D.R., Trends Biotechnol., 14 (1996) 230.
3 Burritt, J.B., Bond, C.W., Doss, K.W. and Jesaitis, A.J., Anal. Biochem., 238 (1996) 1.
4 Choo, Y. and Klug, A., Curr. Opin. Biotechnol., 6 (1995) 431.
5 Clackson, T. and Wells, J.A., Trends Biotechnol., 12 (1994) 173.
6 Cortese, R. (Ed.) Combinatorial Libraries. Synthesis, Screening and Application Potential, Walter de Gruyter, Berlin, Germany, 1995.
7 Cortese, R., Monaci, P., Nicosia, A., Luzzago, A., Felici, F., Galfre, G., Pessi, A., Tramontano, A. and Sollazzo, M., Curr. Opin. Biotechnol., 6 (1995) 73.
8 Jongsma, M.A., Stiekema, W.J. and Bosch, D., Trends Biotechnol., 14 (1996) 331.
9 Ladner, R.C., Trends Biotechnol., 13 (1995) 426.
10 Makowski, L., Curr. Opin. Struct. Biol., 4 (1994) 225.
11 Rosenfeld, R., Vadja, S. and DeLisi, C., Annu. Rev. Biophys. Biomol. Struct., 24 (1995) 677.
12 Barbas III, C.F., Rosenblum, J.S. and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 90 (1993) 6385.
13 Cheng, X., Kay, B.K. and Juliano, R.L., Gene, 171 (1996) 1.
14 Choo, Y., Sanchez-Garcia, I. and Klug, A., Nature, 372 (1994) 642.
15 Stryhn, A., Andersen, P.S., Pedersen, L.O., Svejgaard, A., Holm, A., Thorpe, C.J., Fugger, L., Buus, S. and Engberg, J., Proc. Natl. Acad. Sci. USA, 93 (1996) 10338.
16 De Leo, F.R., Ulman, K.V., Davis, A.R., Jutila, K.L. and Quinn, M.T., J. Biol. Chem., 271 (1996) 17013.

17 Eggena, M., Targan, S.R., Iwanczyk, L., Vidrich, A., Gordon, L.K. and Braun, J., J. Immunol., 156 (1996) 4005.
18 Hammer, J., Takacs, B. and Sinigaglia, F., J. Exp. Med., 176 (1992) 1007.
19 Hammer, J., Valsasnini, P., Tolba, K., Bolin, D., Higelin, J., Takacs, B. and Sinigaglia, F., Cell, 74 (1993) 197.
20 Oldenburg, K.R., Longanathan, D., Goldstein, I.J., Scholtz, P.G. and Gallop, M.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 5393.
21 Stephen, C.W., Helminen, P. and Lane, D.P., J. Mol. Biol., 248 (1995) 58.
22 Smith, J.W., Tachias, K. and Madison, E.L., J. Biol. Chem., 270 (1995) 30486.
23 Felici, F., Luzzago, A., Folgori, A. and Cortese, R., Gene, 128 (1993) 21.
24 Willis, A.E., Perham, R.N. and Wraith, D., Gene, 128 (1993) 79.
25 Adey, N.B. and Kay, B.K., Gene, 169 (1996) 133.
26 Pasqualini, R. and Ruoslahti, E., Nature, 380 (1996) 364.
27 Smith, G.P. and Scott, J.K., Methods Enzymol., 217 (1992) 228.
28 Kay, B.K., Winter, J. and McCafferty, J. (Eds.) Phage Display of Peptides and Proteins. A Laboratory Manual, Academic Press, San Diego, CA, U.S.A., 1996.
29 Fields, C., Adams, M.D., White, O. and Venter, J.C., Nat. Genet., 7 (1994) 345.
30 Grant, R.A., Lin, T.-C., Konigsberg, W. and Webster, R.E., J. Biol. Chem., 256 (1981) 539.
31 Crissman, J.W. and Smith, G.P., Virology, 132 (1984) 445.
32 Nelson, F.K., Friedman, S.M. and Smith, G.P., Virology, 108 (1981) 338.
33 Wrighton, N.C., Farrell, F.X., Chang, R., Kashyap, A.K., Barbone, F.P., Mulcahy, L.S., Johnson, D.L., Barrett, R.W., Jolliffe, L.K. and Dower, W.J., Science, 273 (1996) 458.
34 Cesareni, G., Minenkova, O., Dente, L., Iannolo, G., Zucconi, A., Citterich, M.H., Lanfrancotti, A., Castagnoli, L. and Vetriani, C., In Cortese, R. (Ed.) Combinatorial Libraries. Synthesis, Screening and Application Potential, Walter de Gruyter, Berlin, Germany, 1995, pp. 111–126.
35 Makowski, L., Gene, 128 (1993) 5.
36 Marvin, D.A., Hale, R.D., Nave, C. and Citterich, M.H., J. Mol. Biol., 235 (1994) 260.
37 Light, J., Maki, R. and Assa-Munt, N., Nucleic Acids Res., 24 (1996) 4367.
38 Little, M., Breitling, F., Micheel, B. and Dübel, S., Biotechnol. Adv., 12 (1994) 539.
39 Holliger, P. and Winter, G., Curr. Opin. Biotechnol., 4 (1993) 446.
40 Barrett, R.W., Cwirla, S.E., Ackerman, M.S., Olson, A.M., Peters, E.A. and Dower, W.J., Anal. Biochem., 204 (1992) 357.
41 Low, N.M., Holliger, P. and Winter, G., J. Mol. Biol., 260 (1996) 359.
42 Westendorf, J.M., Rao, P.N. and Gerace, L., Proc. Natl. Acad. Sci. USA, 91 (1994) 714.
43 Schmitz, R., Baumann, G. and Gram, H., J. Mol. Biol., 260 (1996) 664.
44 Brenner, S. and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 5381.
45 Burbaum, J.J., Ohlmeyer, M.H.J., Reader, J.C., Henderson, I., Dillard, L.W., Li, G., Randle, T.L., Sigal, N.H., Chelsky, D. and Baldwin, J.J., Proc. Natl. Acad. Sci. USA, 92 (1995) 6027.
46 Needels, M.C., Jones, D.G., Tate, E.H., Heinkel, G.L., Kochersperger, L.M., Dower, W.J., Barrett, R.W. and Gallop, M.A., Proc. Natl. Acad. Sci. USA, 90 (1993) 10700.
47 Songyang, Z., Margolis, B., Chaudhuri, M., Shoelson, S.E. and Cantley, L.C., J. Biol. Chem., 270 (1995) 14863.
48 Muller, K., Gombert, O., Manning, U., Grossmüller, F., Graff, P., Zaegel, H., Zuber, J.F., Freuler, F., Tschopp, C. and Baumann, G., J. Biol. Chem., 271 (1996) 16500.
49 Yu, H., Chen, J.K., Feng, S., Dalgarno, D.C., Brauer, A.W. and Schreiber, S.L., Cell, 76 (1994) 933.
50 Pisabarro, M.T. and Serrano, L., Biochemistry, 35 (1996) 10634.
51 Finan, P., Koga, H., Zvelebil, M.J., Waterfield, M.D. and Kellie, S., J. Mol. Biol., 261 (1996) 173.
52 Renschler, M.F., Bhatt, R.R., Dower, W.J. and Levy, R., Proc. Natl. Acad. Sci. USA, 91 (1994) 3623.
53 Geysen, H.M., Mason, T.J. and Rodda, S.J., J. Mol. Recog., 1 (1988) 32.
54 Tegge, W., Frank, R., Hofmann, F. and Dostmann, W.R.G., Biochemistry, 34 (1995) 10569.

55 Frank, R., Tetrahedron, 48 (1992) 9217.
56 Frank, R., Güler, S., Krause, S. and Lindenmaier, W., In Giralt, E. and Andreu, D. (Eds.) Peptides 1990 (Proceedings of the 21st European Peptide Symposium), ESCOM, Leiden, The Netherlands, 1991, pp. 151–152.
57 Pinilla, C., Appel, J.R. and Houghten, R.A., Gene, 128 (1993) 71.
58 Sepetov, N.F., Krchnák, V., Stankova, M., Wade, S., Lam, K.S. and Lebl, M., Proc. Natl. Acad. Sci. USA, 92 (1995) 5426.
59 Doyle, M.V., Doyle, L.V., Fong, S., Goodson, R.J., Panganiban, L., Drummond, R., Winter, J. and Rosenberg, S., In Cortese, R. (Ed.) Combinatorial Libraries. Synthesis, Screening and Application Potential, Walter de Gruyter, Berlin, Germany, 1995, pp. 159–174.
60 Cortese, R., Felici, F., Galfre, G., Luzzago, A., Monaci, P. and Nicosia, A., Trends Biotechnol., 12 (1994) 262.
61 Cortese, I., Tafi, R., Grimaldi, L.M.E., Martino, G., Nicosia, A. and Cortese, R., Proc. Natl. Acad. Sci. USA, 93 (1996) 11063.
62 Collins, J., Cold Spring Harbor Symp. Quant. Biol., 45 (1981) 409.
63 Collins, J., Volckaert, G. and Nevers, P., Gene, 19 (1982) 139.
64 Courtney, B.C., Williams, K.C. and Schlager, J.J., Gene, 165 (1995) 139.
65 Waterhouse, P., Griffiths, A.D., Johnson, K.S. and Winter, G., Nucleic Acids Res., 21 (1993) 2265.
66 Griffiths, A.D., Williams, S.C., Hartley, O., Tomlinson, I.M., Waterhouse, P., Crosby, W.L., Kontermann, R.E., Jones, P.T., Low, N.M., Allison, T.J., Prospero, T.D., Hoogenboom, H.R., Nissim, A., Cox, J.P.L., Harrison, J.L., Zaccolo, M., Gherardi, E. and Winter, G., EMBO J., 13 (1994) 3245.
67 Tsurushita, M., Fu, H. and Warren, C., Gene, 172 (1996) 59.
68 Jespers, L.S., De Keyser, A. and Stanssens, P.E., Gene, 173 (1996) 179.
69 Fisch, I., Kontermann, R.E., Finnern, R., Hartley, O., Soler-Gonzalez, A.S., Griffiths, A.D. and Winter, G., Proc. Natl. Acad. Sci. USA, 93 (1996) 7761.
70 Boeke, J.D., Model, P. and Zinder, N.D., Mol. Gen. Genet., 186 (1982) 185.
71 Peters, E.A., Schatz, P.J., Johnson, S.S. and Dower, W.J., J. Bacteriol., 176 (1994) 4296.
72 Röttgen, P. and Collins, J., Gene, 164 (1995) 243.
73 Maenaka, K., Furuta, M., Tsumoto, K., Watanabe, K., Ueda, Y. and Kumagai, I., Biochem. Biophys. Res. Commun., 218 (1996) 682.
74 DeGraaf, M.E., Miceli, R.M., Mott, J.E. and Fischer, H.D., Gene, 128 (1993) 13.
75 O’Neil, K.T., Hoess, R.H., Raleigh, D.P. and DeGrado, W.F., Proteins, 21 (1995) 11.
76 Parmley, S.F. and Smith, G.P., Gene, 73 (1988) 305.
77 Scott, J.K. and Smith, G.P., Science, 249 (1990) 386.
78 Kay, B.K., Adey, N.B., He, Y.-S., Manfredi, J.P., Mataragnon, A.H. and Fowlkes, D.M., Gene, 128 (1993) 59.
79 Barbas III, C.F., Bain, J.D., Hoekstra, D. and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 4457.
80 Calaman, S.D., Carson, G.R., Henry, L.D., Kubinec, J.S., Kuestner, R.E., Ahmed, A., Wilson, E.M., Lin, A.Y., Rittershaus, C.W., Marsh Jr., H.C. and Jones, N.H., J. Immunol. Methods, 164 (1993) 233.
81 Nissim, A., Hoogenboom, H.R., Tomlinson, I.M., Flynn, G., Midgley, C., Lane, D. and Winter, G., EMBO J., 13 (1994) 692.
82 Arkin, A.P. and Youvan, D.C., Biotechnology, 10 (1992) 297.
83 Goldman, E.R. and Youvan, D.C., Biotechnology, 10 (1992) 1557.
84 Kamtekar, S., Schiffer, J.M., Xiong, H., Babik, J.M. and Hecht, M.H., Science, 262 (1993) 1680.
85 Yanofsky, S.D., Baldwin, D.N., Butler, J.H., Holden, F.R., Jacobs, J.W., Balasubramanian, P., Chinn, J.P., Cwirla, S.E., Peters-Bhatt, E., Whitehorn, E.A., Tate, E.H., Akeson, A., Bowlin, T.L. and Dower, W.J., Proc. Natl. Acad. Sci. USA, 93 (1996) 7381.
86 Fowler, R.G., Degnen, G.E. and Cox, E.C., Mol. Gen. Genet., 133 (1974) 179.
87 Roberts, B.L., Markland, W., Ley, A.C., Kent, R.B., White, D.W., Guterman, S.K. and Ladner, R.C., Proc. Natl. Acad. Sci. USA, 89 (1992) 2429.

88 Szardenings, M., Vasel, B., Hecht, H.-J., Collins, J. and Schomburg, D., Protein Eng., 8 (1995) 45.
89 Markland, W., Ley, A.C., Lee, S.W. and Ladner, R.C., Biochemistry, 35 (1996) 8045.
90 Markland, W., Ley, A.C. and Ladner, R.C., Biochemistry, 35 (1996) 8058.
91 Empie, M.W. and Laskowski Jr., M., Biochemistry, 21 (1982) 2274.
92 Ophir, R. and Gershoni, J.M., Protein Eng., 8 (1995) 143.
93 Martens, C.L., Cwirla, S.E., Lee, R.Y-W., Whitehorn, E., Chen, E.Y-F., Bakker, A., Martin, E.L., Wagstrom, C., Gopalan, P., Smith, C.W., Tate, E., Koller, K.J., Schatz, P.J., Dower, W.J. and Barrett, R.W., J. Biol. Chem., 270 (1995) 21129.
94 Schier, R., McCall, A., Adams, G.P., Marshall, K.W., Merritt, H., Yim, M., Crawford, R.S., Weiner, L.M., Marks, C. and Marks, J.D., J. Mol. Biol., 263 (1996) 551.
95 Lowman, H.B. and Wells, J.A., J. Mol. Biol., 234 (1993) 564.
96 Yang, W.P., Green, K., Pinz-Sweeney, S., Briones, A.T., Burton, D.R. and Barbas III, C.F., J. Mol. Biol., 254 (1995) 392.
97 Glaser, S.M., Yelton, D.E. and Huse, W.D., J. Immunol., 149 (1992) 3903.
98 Virnekas, B., Ge, L., Plückthun, A., Schneider, K.C., Wellnhofer, G. and Moroney, S.E., Nucleic Acids Res., 22 (1994) 5600.
99 Brinkmann, U., Mattes, R.E. and Buckel, P., Gene, 85 (1989) 109.
100 Ouellette, M.M. and Wright, W.E., Curr. Biol., 6 (1995) 65.
101 Hawkins, R.E., Russel, S.J. and Winter, G., J. Mol. Biol., 226 (1992) 889.
102 Gram, H., Marconi, L.-A., Barbas III, C.F., Collet, T.A., Lerner, R.A. and Kang, A.S., Proc. Natl. Acad. Sci. USA, 89 (1992) 3576.
103 Light, J. and Lerner, R.A., Bioorg. Med. Chem., 3 (1995) 955.
104 Pannekoek, H., Van Meijer, M., Schleef, R.R., Loskutoff, D.J. and Barbas III, C.F., Gene, 128 (1993) 135.
105 Ziegner, M., Steinhauser, G. and Berek, C., Eur. J. Immunol., 24 (1994) 2393.
106 Stemmer, W.P.C., Crameri, A., Ha, K.D., Brennan, T.M. and Heyneker, H.L., Gene, 164 (1995) 49.
107 Stemmer, W.P.C., Proc. Natl. Acad. Sci. USA, 91 (1994) 10747.
108 Crameri, A. and Stemmer, W.P.C., Biotechniques, 18 (1995) 6.
109 Crameri, A., Whitehorn, E.A., Tate, E. and Stemmer, W.P.C., Nat. Biotechnol., 14 (1996) 315.
110 Stemmer, W.P.C., Nature, 370 (1994) 389.
111 Crameri, A., Cwirla, S. and Stemmer, W.P.C., Nat. Med., 2 (1996) 100.
112 Livnah, O., Stura, E.A., Johnson, D.L., Middleton, S.A., Mulcahy, L.S., Wrighton, N.C., Dower, W.J., Jolliffe, L.K. and Wilson, I.A., Science, 273 (1996) 464.
113 Petrenko, V.A., Smith, G.P., Gong, X. and Quinn, T., Protein Eng., 9 (1996) 797.
114 Luzzago, A., Felici, F., Tramontano, A., Pessi, A. and Cortese, R., Gene, 128 (1993) 51.
115 Folgori, A., Tafi, R., Meola, A., Felici, F., Galfré, G., Cortese, R., Monaci, P. and Nicosia, A., EMBO J., 13 (1994) 2236.
116 Iannolo, G., Minenkova, O., Petruzzelli, R. and Cesareni, G., J. Mol. Biol., 248 (1995) 835.
117 Kishchenko, G., Batliwala, H. and Makowski, L., J. Mol. Biol., 241 (1994) 208.
118 Hogrefe, H.H., Amberg, J.R., Hay, B.N., Sorge, J.A. and Shopes, B., Gene, 137 (1993) 85.
119 Hogrefe, H.H., Mullinax, R.L., Lovejoy, A.E., Hay, B.N. and Sorge, J.A., Gene, 128 (1993) 119.
120 Amberg, J., Hogrefe, H., Lovejoy, H., Hay, B., Shopes, B., Mullinax, R. and Sorge, J.A., Strategies, 5 (1993) 2.
121 Zhu, Z., Zapata, G., Shalaby, R., Snedecor, B., Chen, H. and Carter, P., Biotechnology, 14 (1996) 192.
122 Maruyama, I.N., Maruyama, H.I. and Brenner, S., Proc. Natl. Acad. Sci. USA, 91 (1994) 8273.
123 Dunn, I.S., J. Mol. Biol., 248 (1995) 497.
124 Mikawa, Y.G., Maruyama, I.C. and Brenner, S., J. Mol. Biol., 262 (1996) 21.
125 Sternberg, N. and Hoess, R.H., Proc. Natl. Acad. Sci. USA, 92 (1995) 1609.
126 Hart, S.L., Knight, A.M., Harbottle, R.P., Mistry, A., Hunger, H.D., Cutler, D.F., Williamson, R. and Coutelle, C., J. Biol. Chem., 269 (1994) 12468.

127 Schatz, P.J., Biotechnology, 11 (1993) 1138.
128 Cull, M.G., Miller, J.F. and Schatz, P.J., Proc. Natl. Acad. Sci. USA, 89 (1992) 1865.
129 Gates, C.M., Stemmer, W.P.C., Kaptein, R. and Schatz, P.J., J. Mol. Biol., 255 (1996) 373.
130 Schatz, P.J., Cull, M.G., Martin, E.L. and Gates, C.M., Methods Enzymol., 267 (1996) 171.
131 Efimov, V.P., Nepluev, I.V. and Mesyanzhinov, V.V., Virus Genes, 10 (1995) 173.
132 Lindqvist, B.H. and Naderi, S., FEMS Microbiol. Rev., 17 (1995) 33.
133 Schumacher, T.N.M., Mayr, L.M., Minor Jr., D.L., Milhollen, M.A., Burgess, M.W. and Kim, P.S., Science, 271 (1996) 1854.
134 Peters, R. and Sikorski, R.S., Nat. Biotechnol., 14 (1996) 1031.
135 Rickles, R.J., Botfield, M.C., Weng, Z., Taylor, J.A., Green, O.M., Brugge, J.S. and Zoller, M.J., EMBO J., 13 (1994) 5598.
136 Cheadle, C., Ivashchenko, Y., South, V., Searfoss, G.H., French, S., Howk, R., Ricca, G.A. and Jaye, M., J. Biol. Chem., 269 (1994) 24034.
137 Sparks, A.B., Quilliam, L.A., Thorn, J.M., Der, C.J. and Kay, B.K., J. Biol. Chem., 269 (1994) 23853.
138 Cheadle, C., Ivashchenko, Y., South, V., Searfoss, G.H., French, S., Howk, R. and Jaye, M., J. Biol. Chem., 269 (1994) 24034.
139 Rickles, R.J., Botfield, M.C., Zhou, X.-M., Henry, P.A., Brugge, J.S. and Zoller, M.J., Proc. Natl. Acad. Sci. USA, 92 (1995) 10909.
140 McLafferty, M.A., Kent, R.B., Ladner, R.C. and Markland, W., Gene, 128 (1993) 29.
141 Devlin, J.J., Panganiban, L.C. and Devlin, P.E., Science, 249 (1990) 404.
142 Saggio, I. and Laufer, R., Biochem. J., 293 (1993) 613.
143 Smith, G.P., Schultz, D.A. and Ladbury, J.E., Gene, 128 (1993) 37.
144 O’Neil, K.T., Hoess, R.H., Jackson, S.A., Ramachandran, N.S., Mousa, S.A. and DeGrado, W.F., Proteins Struct. Funct. Genet., 14 (1992) 509.
145 Koivunen, E., Wang, B. and Ruoslahti, E., Biotechnology, 13 (1995) 265.
146 Saragovi, H.U., Fitzpatrick, D., Raktabutr, A., Nakanishi, H., Kahn, M. and Greene, M.I., Science, 253 (1991) 792.
147 Sollazzo, M., Bianchi, E., Felici, F., Cortese, R. and Pessi, A., In Cortese, R. (Ed.) Combinatorial Libraries. Synthesis, Screening and Application Potential, Walter de Gruyter, Berlin, Germany, 1995, pp. 127–143.
148 Bianchi, E., Venturini, S., Pessi, A., Tramontano, A. and Sollazzo, M., J. Mol. Biol., 236 (1994) 649.
149 Martin, F., Toniatti, C., Salvati, A.L., Ciliberto, G., Cortese, R. and Sollazzo, M., J. Mol. Biol., 255 (1996) 86.
150 Bianchi, E., Folgori, A., Wallace, A., Acali, S., Phalipon, A., Barbato, G., Bazzo, R., Cortese, R., Felici, F. and Pessi, A., J. Mol. Biol., 247 (1995) 154.
151 Cunningham, B.C., Lowe, D.G., Li, B., Bennett, B.D. and Wells, J.A., EMBO J., 13 (1994) 2508.
152 Li, B., Tom, J.Y.K., Oare, D., Yen, R., Fairbrother, W.J., Wells, J.A. and Cunningham, B.C., Science, 270 (1995) 1657.
153 Braisted, A.C. and Wells, J.A., Proc. Natl. Acad. Sci. USA, 93 (1996) 5688.
154 Nord, K., Nilsson, J., Nilsson, B., Uhlén, M. and Nygren, P.-Å., Protein Eng., 8 (1995) 601.
155 Ku, J. and Schultz, P.G., Proc. Natl. Acad. Sci. USA, 92 (1995) 6552.
156 McConnell, S.J. and Hoess, R.H., J. Mol. Biol., 250 (1995) 460.
157 Hoess, R.H., Mack, A.J., Walton, H. and Reilly, T.M., J. Immunol., 153 (1994) 724.
158 Vaughan, T.J., Williams, A.J., Pritchard, K., Osbourn, J.K., Pope, A.R., Earnshaw, J.C., McCafferty, J., Hodits, R.A., Wilton, J. and Johnson, K.S., Nat. Biotechnol., 14 (1996) 309.
159 Chiswell, D.J. and McCafferty, J., Trends Biotechnol., 10 (1992) 80.
160 Marks, J.D., Hoogenboom, H.R., Griffiths, A.D. and Winter, G., J. Biol. Chem., 267 (1992) 16007.
161 FitzGerald, K., Chiswell, D., Earnshaw, J., Smith, R., Kenten, J., Williams, R. and McCafferty, J., In Cortese, R. (Ed.) Combinatorial Libraries. Synthesis, Screening and Application Potential, Walter de Gruyter, Berlin, Germany, 1995, pp. 189–204.

162 Marks, J.D., Griffiths, A.D., Malmqvist, M., Clackson, T.P., Bye, J.M. and Winter, G., Biotechnology, 10 (1992) 779.
163 Ellington, A.D. and Szostak, J.W., Nature, 346 (1990) 818.
164 Ellington, A.D. and Szostak, J.W., Nature, 355 (1992) 850.
165 Bartel, D.P. and Szostak, J.W., Science, 261 (1993) 1411.
166 Geoffrey, F., Sodoyer, R. and Aujume, L., Gene, 151 (1994) 109.
167 Sodoyer, R., Aujume, L., Geoffrey, F., Pion, C., Peubez, I., Montegue, B., Jacquemot, P. and Dubayle, J., In Kay, B.K., Winter, J. and McCafferty, J. (Eds.) Phage Display of Peptides and Proteins. A Laboratory Manual, Academic Press, San Diego, CA, U.S.A., 1996, pp. 215–226.
168 Gramatikoff, K., Giorgiev, G. and Schaffner, W., Nucleic Acids Res., 22 (1994) 5761.
169 Duenas, M. and Borrebaeck, C.A.K., Biotechnology, 12 (1994) 999.
170 Luban, J. and Goff, S.P., Curr. Biol., 6 (1995) 59.
171 Crameri, R. and Suter, M., Gene, 137 (1993) 69.
172 Marks, J.D., Ouwehand, W.H., Bye, J.M., Finnern, R., Gorick, B.D., Voak, D., Thorpe, S.J., Hughes-Jones, N.C. and Winter, G., Biotechnology, 11 (1993) 1145.
173 Marks, J.D., Hoogenboom, H.R., Bonnert, T.P., McCafferty, J., Griffiths, A.D. and Winter, G., J. Mol. Biol., 222 (1991) 581.
174 Portolano, S., McLachlan, S.M. and Rapoport, B., J. Immunol., 151 (1993) 2839.
175 Chazenbalk, G.D., Portolano, S., Russo, D., Hutchison, J.S., Rapoport, B. and McLachlan, S., J. Clin. Invest., 92 (1993) 62.
176 Doorbar, J. and Winter, G., J. Mol. Biol., 244 (1994) 361.
177 Goodson, R.J., Doyle, M.V., Kaufman, S.E. and Rosenberg, S., Proc. Natl. Acad. Sci. USA, 91 (1994) 7129.
178 Meulemans, E.V., Slobbe, R., Wasterval, P., Ramaekers, F.C.S. and Van Eys, G 244 (1994) 353.
179 De Kruif, J., Terstappen, L., Boel, E. and Logtenberg, T., Proc. Natl. Acad. Sci. USA, 92 (1995) 3938.
180 Cai, X. and Garen, A., Proc. Natl. Acad. Sci. USA, 92 (1995) 6537.
181 Koivunen, E., Wang, B., Dickinson, G.D. and Ruoslahti, E., Methods Enzymol., 245 (1994) 346.
182 Koivunen, E., Wang, B. and Ruoslahti, E., J. Cell Biol., 124 (1994) 373.
183 Matthews, D.J. and Wells, J.A., Science, 260 (1993) 1113.
184 Matthews, D.J., Goodman, L.J., Gorman, C.M. and Wells, J.A., Protein Sci., 3 (1994) 1197.
185 Dennis, M.S. and Lazarus, R.A., J. Biol. Chem., 269 (1994) 22137.
186 Stausbøl-Grøn, B., Wind, T., Kjaer, S., Kahns, L., Hansen, N.J.V., Kristensen, P. and Clark, B.F.C., FEBS Lett., 391 (1996) 71.
187 Ren, R., Mayer, B.J., Cicchetti, P. and Baltimore, D., Science, 259 (1993) 1157.
188 Sparks, A.B., Rider, J.E., Hoffman, N.G., Fowlkes, D.M., Quilliam, L.A. and Kay, B.K., Proc. Natl. Acad. Sci. USA, 93 (1996) 1540.
189 Feng, S., Kasahara, C., Rickles, R.J. and Schreiber, S.L., Proc. Natl. Acad. Sci. USA, 92 (1995) 12408.
190 Sparks, A.B., Hoffman, N.G., McConnell, S.J., Fowlkes, D.M. and Kay, B.K., Nat. Biotechnol., 14 (1996) 741.
191 Meydan, N., Grunberger, T., Dadi, H., Shahar, M., Arpala, E., Lapidot, Z., Leeder, J.S., Freedman, M., Cohen, A., Gazit, A., Levitski, A. and Roifman, C.M., Nature, 379 (1996) 645.
192 Karlsson, T., Songyang, Z., Landgren, E., Lavergne, C., Di Fiore, P.P., Anafi, M., Pawson, T., Cantley, L.C., Claesson-Welsh, L. and Walsh, M., Oncogene, 10 (1995) 1475.
193 Dybwad, A., Forre, O., Kjeldssen Kragh, J., Natvig, J.B. and Sioud, M., Eur. J. Immunol., 23 (1993) 3189.
194 Malik, P., Terry, T.D. and Perham, R.N., In Kay, B.K., Winter, J. and McCafferty, J. (Eds.) Phage Display of Peptides and Proteins. A Laboratory Manual, Academic Press, San Diego, CA, U.S.A., 1996, pp. 127–140.
195 Nicosia, A., Monaci, P., Luzzago, A., Galfre, G., Felici, F., Prezzi, C., Mennuni, C., Meola, A., Mecchia, M. and Cortese, R., In Cortese, R. (Ed.) Combinatorial Libraries. Synthesis, Screening and Application Potential, Walter de Gruyter, Berlin, Germany, 1996, pp. 145–157.
196 Felici, F., Castagnoli, L., Musacchio, A., Jappelli, R. and Cesareni, G., J. Mol. Biol., 222 (1991) 301.
197 Burton, D.R., Barbas III, C.F., Persson, M.A.A., Koenig, S., Chanock, R.M. and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 88 (1991) 10134.
198 Zebedee, S.L., Barbas III, C.F., Hom, Y.-L., Caothien, R.H., Graff, R., DeGraw, J., Pyati, J., LaPolla, R., Burton, D.R., Lerner, R.A. and Thornton, G.B., Proc. Natl. Acad. Sci. USA, 89 (1992) 3175.
199 Barbas III, C.F., Crowe Jr., J.E., Cababa, D., Jones, T.M., Zebedee, S.L., Murphy, B.R., Chanock, R.M. and Burton, D.R., Proc. Natl. Acad. Sci. USA, 89 (1992) 10164.
200 Sioud, M., Dybwad, A., Jespersen, L., Suleyman, S., Natvig, J.B. and Forre, O., Clin. Exp. Immunol., 98 (1994) 520.
201 Greenwood, J., Willis, A.E. and Perham, R.N., J. Mol. Biol., 220 (1991) 821.
202 Di Marzo Veronese, F., Willis, A.E., Boyer-Thompson, C., Appella, E. and Perham, R.N., J. Mol. Biol., 243 (1994) 167.
203 Jespers, L.S., Roberts, A., Mahler, S.M., Winter, G. and Hoogenboom, H.R., Biotechnology, 12 (1994) 899.
204 Riechmann, L., Clark, M., Waldmann, H. and Winter, G., Nature, 332 (1988) 323.
205 Winter, G.P., Phil. Trans. R. Soc. London, B324 (1989) 537.
206 Gorman, S.D., Clark, M.R., Routledge, E.G., Cobbold, S.P. and Waldmann, H., Proc. Natl. Acad. Sci. USA, 88 (1991) 4181.
207 Carter, P., Presta, L., Gorman, C.M., Ridgway, J.B.B., Henner, D., Wong, W.L.T., Rowland, A.M., Kotts, C., Carver, M.E. and Shepard, H.M., Proc. Natl. Acad. Sci. USA, 89 (1992) 4285.
208 Kuntz, I.D., Science, 257 (1992) 1078.
209 Smith, G.P., Gene, 128 (1993) 1.
210 Suter, M., Foti, M., Ackermann, M. and Crameri, R., In Kay, B.K., Winter, J. and McCafferty, J. (Eds.) Phage Display of Peptides and Proteins. A Laboratory Manual, Academic Press, San Diego, CA, U.S.A., 1996, pp. 195–214.
211 Kang, A.S., Barbas III, C.F., Janda, K.D., Benkovic, S.J.
and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 88 (1991) 4363. Figini, M., Marks, J.D., Winter, G. and Griffiths, A.D., J. Mol. Biol., 239 (1994) 68. McCafferty, J., Griffiths, A.D., Winter, G. and Chiswell, D.J., Nature, 348 (1990) 552. McGuiness, B.T., Walter, G., FitzGerald, K., Schuler, P., Mahoney, W., Duncan, A.R. and Hoogenboom, H.R., Nat. Biotechnol., 14 (1996) 1149. Holliger, P., Prospero, T. and Winter, G., Proc. Natl. Acad. Sci. USA, 50 (1993) 6444. Crameri, R., Jaussi, R., Menz, G. and Blaser, K., Eur. J. Biochem., 226 (1994) 53. Ward, E.S., Güssow, D., Griffiths, A.D., Jones, P.T. and Winter, G., Nature, 341 (1989) 544. Barbas III, C.F., Kang, A.S., Lerner, R.A. and Benkovic, S.J., Proc. Natl. Acad. Sci. USA, 88 (1991) 7978. Huse, W.D., Sastry, L., Iverson, S.A., Kang, A.S., Alting-Mees, M., Burton, D.R., Benkovic, S.J. and Lerner, R.A., Science, 246 (1989) 1275. Griffiths, A.D., Malmqvist, M., Marks, J.D., Bye, J.M., Embleton, M.J., McCafferty, J., Baier, M., Holliger, K.P., Goricke, B.D., Hughes-Jones, N.C., Hoogenboom, H.R. and Winter, G., EMBO J., 12 (1993) 725. Wasik, M.A., Scand. J. Immunol., 35 (1992) 421. Dennis, M.S. and Lazarus, R.A., J. Biol. Chem., 269 (1994) 22129. Hoogenboom, H.R. and Winter, G., J. Mol. Biol., 227 (1992) 381. Cwirla, S.E., Peters, E.A., Barrett, R.W. and Dower, W.J., Proc. Natl. Acad. Sci. USA, 87 (1990) 6378. McCafferty, J., Griffiths, A.D., Winter, G. and Chiswell, D.J., Nature, 348 (1990) 552. Markland, W., Roberts, B.L., Saxena, M.J., Ladner, R.C. and Guterman, S.K., Gene, 109 (1991) 13. Bass, S., Greene, R. and Wells, J.A., Proteins Struct. Funct. Genet., 8 (1990) 309. 26 1

J. Collins 228 Hoogenboom, H.R., Griffiths, A.D., Johnson, K.S., Chiswell, D.J., Hudson, P. and Winter, G., Nucleic Acids Res., 19 (1991) 4133. 229 Jackson, R.H., McCafferty, J., Johnson, K.S., Pope, A.R., Roberts, A.J., Chiswell, D.J., Clackson, T.P., Griffiths, A.D., Hoogenboom, H.R. and Winter, G., In Rees, A.R., Sternberg, M.J.E. and Wetzel, R. (Eds.) Protein Engineering. A Practical Approach, Oxford University Press, Oxford, U.K., 1992, pp. 277-301. 230 Ohrum, H., Andersen, P.S., Oster, A., Johansen, L.K., Riise, E., Bjonvard, M., Svendsen, I. and Engberg, J., Nucleic Acids Res., 21 (1993) 4491. 231 Corey, D.R., Shiau, A.K., Yang, Q., Janowski, B.A. and Craik, C.S., Gene, 128 (1993) 129. 232 Geoffrey, F., Sodoyer, R. and Aujume, L., Gene, 151 (1994) 109. 233 Johansen, L.K., Albrechtsen, B., Andersen, H.W. and Engberg, J., Protein Eng., 8 (1995) 1063. 234 McConnell, S.J., Kendall, M.L., Reilly, T.M. and Hoess, R.H., Gene, 151 (1994) 115. 235 Jespers, L.S., Messens, J.H., De Keyser, A., Eeckhout, D., Van den Brande, I., Gansemans, Y.G., Lauwereys, M.J., Vlasuk, G.P. and Stanssens, P.E., Biotechnology, 13 (1995) 378. 236 Watson, E.K., Williamson, R. and Chapple, J., Br. J. Gen. Pract., 41 (1991) 237. 237 Gram, H., Strittmatter, U., Lorenz, M., Glück, D. and Zenke, G., J. Immunol. Methods, 161 (1993) 169. 238 Cabibbo, A., Sporeno, E., Toniatti, C., Altamura, S., Savino, R., Paonessa, G. and Ciliberto, G., Gene, 167 (1996) 41. 239 Saggio, I., Cloaguen, I. and Laufer, R., Gene, 152 (1995) 35. 240 Saggio, I., Cloaguen, I., Poiana, G. and Laufer, R., EMBO J., 14 (1995) 3045. 241 Hennecke, M., Kola, A., Baensch, M., Wrede, A., Klos, A., Bautsch, W. and Kohl, J., Gene, 184 (1997) 263. 242 Robertson, M.W., J. Biol. Chem., 268 (1993) 12736. 243 Scarselli, E., Esposito, G. and Traboni, C., FEBS Lett., 329 (1993) 223. 244 Djojonegoro, B.M., Benedik, M.J. and Willson, R.C., Biotechnology, 12 (1994) 169. 245 McCafferty, J., Jackson, R.H. 
and Chiswell, D.J., Protein Eng., 4 (1991) 955. 246 Widersten, M. and Mannervik, B., J. Mol. Biol., 250 (1995) 115. 247 Soumillion, P., Jespers, L., Bouchet, M., Marchand-Brynaert, J., Sartiaux, P. and Fastrez, J., Appl. Biochem. Biotechnol., 47 (1994) 175. 248 Soumillion, P., Jespers, L., Bouchet, M., Marchand-Brynaert, J., Winter, G. and Fastrez, J., J. Mol. Biol., 237 (1994) 415. 249 Wang, C.-I., Yang, Q. and Craik, C.S., J. Biol. Chem., 270 (1995) 12250. 250 Tanaka, A.S., Sampaio, C.A.M., Fritz, H. and Auerswald, E.A., Biochem. Biophys. Res. Commun., 214 (1995) 389. 251 Rebar, E.J. and Pabo, C.O., Science, 263 (1994) 671. 252 Vispo, N.S., Felici, F., Castagnoli, L. and Cesareni, G., Ann. Biol. Clin., 50 (1993) 917. 253 Jacobsson, K. and Frykberg, L., Biotechniques, 18 (1995) 878. 254 Hottiger, M., Grammatikoff, K., Georgiev, O., Chapponier, C., Schaffner, W. and Hiibscher, U., Nucleic Acids Res., 23 (1995) 736. 255 Crameri, R. and Blaser, K., Int. Arch. Allergy Immunol., 107 (1995) 460.

262

Section III

Informatics and Related Topics

Informatics and related topics: A perspective

Walter H. Moos
MitoKor, 11494 Sorrento Valley Road, San Diego, CA 92121, U.S.A.

The healthcare industry has encountered significant new challenges in the 1990s, and there will be additional medical and economic worries post-2000. As a result, cost containment and managed care, mega-mergers and acquisitions, venture-capital funded technology development and competition, and many other formidable issues are catalyzing significant paradigm shifts in therapeutic research and development programs. The resulting efforts to pursue drug discovery and preclinical development better, faster, and cheaper are being aided by innovative approaches to the design, synthesis, screening, and optimization of worthwhile product candidates. Today, combinatorial chemistry and molecular diversity (CCMD) technologies interface with numerous scientific and professional disciplines. ‘Informatics’ (see Fig. 1) is one of the buzz words of this new field, touching on many aspects of CCMD, as well as genomics and other hot topics of the 1990s, especially when prefixed with ‘bio’.

Fig. 1. Informatics and related topics in the context of combinatorial chemistry and molecular diversity.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 265–266 © 1997 ESCOM Science Publishers B.V.

The intellectual property generated by this new field has spawned multiple collaborations, mostly between ‘big pharma’ and start-up ‘biotech’ ventures. The wealth of information generated by combinatorial approaches has led to enhanced database methodologies, capable of handling an increasingly overwhelming amount of data. The focus on solid-phase chemistries has spurred the development of a substantial new repertoire of solid-phase organic and inorganic transformations and conditions. Of course, for the healthcare industry, none of the above is worth much unless interesting biological activity can be uncovered, and high throughput screening has become a field unto itself in the process of exploiting CCMD and other modern pharmaceutical and biotechnology platforms. Furthermore, several clever strategies for identifying active components in large libraries have been developed. Taken together, this advanced mix of technologies is catapulting a historically conservative industry infrastructure into the new millennium. For perhaps the first time, new (bio)pharmaceutical discovery and development can claim a more streamlined approach to filling the proverbial pipeline of new drug candidates. It is indeed a brave new world, and one that promises to deliver, if not immediately, then soon!

Modern chemical and biological databases

Bradley D. Christie and James G. Nourse
MDL Information Systems Inc., 14600 Catalina Street, San Leandro, CA 94577, U.S.A.

Introduction

Chemical structure databases and information management systems are a standard item in nearly all chemical structure research environments [1]. They are used to store and retrieve chemical structures as a data type as well as data associated with them. These systems have been designed for, and are customarily used for, single chemical structures that are reasonably well characterized. The recent surge in combinatorial chemistry work has provided a challenge to these systems since the number of chemical structures involved has grown dramatically. In addition, the actual entity studied may be a mixture of tens to millions of structures or more. Such mixtures can often be represented by a generic or Markush structure. This latter challenge will be emphasized in the discussion that follows.

Representing generic structures

Under the collective term of combinatorial chemistry is a wide range of methods for synthesizing molecules in parallel, including specific synthesis using robotic techniques, synthesis of mixtures of affixed molecules, and even ‘virtual experiments’, where structures are generated in silico for quantitative structure-activity relationships (QSARs) or other computational applications. The set of structures involved in these methods is often referred to as a combinatorial library. A single combinatorial library might represent anywhere from dozens to millions of structures. A large library can often be represented in its entirety by a single structure, known as an R-group, Markush, or generic structure.

Generic structures are represented as a core fragment with attached substituents. The core fragment is called the root or scaffold and is the fragment that is common to every one of the structures. Attached to the root are labels, shown as R1, R2, R3, etc. (Fig. 1). These labels represent R-groups and are the parts of the structure that vary. Each R-group is composed of a set of members. Every specific structure in the generic structure can be represented by selecting one member of each R-group to replace the corresponding R-group label.

Because of the wide variety of methods and uses for combinatorial chemistry, the software to manage these generic structures must be able to store and manipulate them in several different ways. For small libraries, the simplest way to manage the structures is to generate and store each structure individually. This is called enumeration. For larger libraries, enumeration is impractical, as it would enormously increase the size of a database, and would make searching and other operations very difficult.
This problem is analogous to one already addressed in chemical structure database information management systems, in which it is possible to query a database with a generic chemical structure [2]. While it might be possible to perform as many individual queries as there are structures implied by the generic structure, this would be impractical. Although the representation of these queries is very similar to that of a combinatorial library, there are some significant differences in both design and use. Since the exact compositions of the database structures are not determined until after the query has been posed, the query typically contains query features, such as fuzzy atoms and bonds and ambiguous connectivity. There is no need for these features to represent a combinatorial library, since all the structures are known. The syntax for representing R-groups in a query can also include features such as zero occurrences to reduce the retrieval of undesired records. Such a feature is unnecessary for a combinatorial library representation, since it is a complete set of specific structures and not an open-ended query.

Fig. 1. A combinatorial library represented as an R-group structure, and two enumerated structures.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 267–272 © 1997 ESCOM Science Publishers B.V.

Markush representations are also used to store and search structures in patents [3]. As with R-group queries, patents generally represent an open-ended set of structures and contain generic features such as an alkyl group of 1–6 carbons or even something as ambiguous as a heterocyclic ring. Storing and searching these open-ended features is difficult, and so searching patent structures is a challenging and ongoing problem. Since the goal of a combinatorial library representation is to be precise and specific, this complexity is undesirable. Limiting the representation to only specific atoms and bonds in roots and R-group members, without fuzzy atoms, bonds, other query features, or generic features, is both practical and reasonable. By limiting the scope of the representation, other challenges, such as storage and searching, become more tractable.
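The restricted representation described above (a root, R-group labels, and lists of fully specific members) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `GenericStructure`, the use of a text template with `{R1}`-style placeholders in place of a real connection table, and the example amide library are all hypothetical and are not taken from any particular database system.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Dict, Iterator, List


@dataclass
class GenericStructure:
    """A root (scaffold) with labelled R-groups, each a list of specific members."""
    root: str                      # text template standing in for the root, e.g. "O=C({R1})N{R2}"
    rgroups: Dict[str, List[str]]  # label -> members (specific fragments only, no query features)

    def size(self) -> int:
        # Number of specific structures implied: the product of the member counts.
        return prod(len(members) for members in self.rgroups.values())

    def enumerate(self) -> Iterator[str]:
        # Lazily substitute one member of each R-group for its label in the root.
        labels = list(self.rgroups)
        for choice in product(*(self.rgroups[label] for label in labels)):
            yield self.root.format(**dict(zip(labels, choice)))


# Hypothetical library: 3 members at R1 and 2 at R2 imply 6 specific structures.
library = GenericStructure(
    root="O=C({R1})N{R2}",
    rgroups={"R1": ["C", "CC", "c1ccccc1"], "R2": ["C", "CC"]},
)
```

Storing only the root and the member lists keeps the record small however large the library is, while `size()` and the lazy `enumerate()` show why full enumeration is cheap for this six-structure example and impractical when the product of member counts runs to millions.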

Working with generic structures

Chemical structure information management systems also need to support operations, required by the researcher or database administrator, that go beyond just storing and retrieving data. Two new capabilities to support combinatorial chemistry research are enumeration and clipping. Enumeration is the process of creating the specific structures implied by a generic structure. Clipping is the process of creating the members required for the R-groups of a generic structure. These will be discussed in turn.

Enumeration

The enumeration process takes a generic structure and performs the substitutions defined by one or more of the R-groups to generate a number of more specific structures.
If all the R-groups are expanded, then the result is a set of specific structures. If only some of the R-groups are expanded, then the result is a set of subgeneric structures. These subgeneric structures are useful in representing deconvolution experiments. This process is shown in Fig. 2.

Fig. 2. Enumeration of a generic structure.

The enumeration capability can be useful for storing experimental data associated with combinatorial libraries. If specific compounds or submixtures have been generated, stored, or tested, or if other properties have been recorded, then an enumerated structure can be generated so that it can be associated with these data. Data stored with combinatorial libraries include tags (an encoding physically carried with the molecules of a library) and codes (a symbolic label) that are associated with the R-group members. Each explicit structure then has a corresponding ordered list of these data, corresponding to the specific members of the structure. These lists can be generated automatically during the enumeration process; they can also be used to constrain the enumeration to a specific subset of structures with known tags or codes.

On the other hand, the enumerated structures themselves might be useful as input for other computer programs that generate computational data, such as QSAR or other chemical property predictions. Since these programs accept and work on individual specific structures, these structures must be generated both for input to these programs and for association with their output. This capability is important for the process of designing new combinatorial libraries.

There might often be cause to limit the enumeration to less than every possible specific structure or subgeneric structure represented by the combinatorial library. If the library represents a real experiment, then perhaps not every combination of R-group members represents a compound that was actually synthesized. If the library is to be enumerated for computational purposes, then there may be external restrictions that can be used to limit the number of structures and therefore the amount of computation required. Rules can be added to constrain the enumeration process, based on the tags or codes or other generated data such as molecular weight or formula. For example, the structure in Fig. 3 can be created from the combinatorial library shown by restricting the enumeration to just the structure with molecular weight range 161–162 or molecular formula C7H12NOCl. Alternatively, this structure can be created by restricting the enumeration to the named structure ethyl-chloro-lactam. It is important that the method used to restrict the enumeration does not require that all the possible structures be created and tested with the constraint, since combinatorial libraries may imply millions of structures.

If some of the enumerated structures have symmetry, or there is redundancy in the R-group members, then there is the possibility that some of the enumerated structures are identical. How to represent these depends on the type of experiment. If the structures represent tagged compounds on a bead, well, or chip, then each duplicate represents a specific location with its own data. In this case, it may be desirable to maintain separate records, each identified by its unique list of tags. If the structures represent an unseparated mixture of compounds, then duplicate structures do not have any experimental significance. In this case, the enumeration should avoid adding duplicate structures to its output. For specific structures, this can be done by searching for duplicate structures with existing methods. For subgeneric structures, this requires identifying identical generic structures, which is discussed below.
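One way to restrict an enumeration without first creating every structure is to exploit a property that is additive over the root and the chosen members, such as molecular weight, and to abandon a partial selection as soon as no completion can fall inside the requested range. The sketch below illustrates that idea only; it is not the method of any specific product. The function name, the `Member` record, the text fragments, and the root weight of 100.0 are hypothetical, and the substituent weights are rounded.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Member(NamedTuple):
    code: str      # symbolic label stored with the R-group member
    fragment: str  # text stand-in for the member's structure
    weight: float  # additive contribution to the molecular weight


def enumerate_in_weight_range(
    root: str,
    root_weight: float,
    rgroups: List[Tuple[str, List[Member]]],
    low: float,
    high: float,
) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (structure, ordered code list) for structures weighing low..high."""
    # Smallest and largest weight still obtainable from R-groups i..end.
    min_rest = [0.0] * (len(rgroups) + 1)
    max_rest = [0.0] * (len(rgroups) + 1)
    for i in range(len(rgroups) - 1, -1, -1):
        weights = [m.weight for m in rgroups[i][1]]
        min_rest[i] = min_rest[i + 1] + min(weights)
        max_rest[i] = max_rest[i + 1] + max(weights)

    def extend(i, text, total, codes):
        # Prune: no completion of this partial selection can land in the range.
        if total + min_rest[i] > high or total + max_rest[i] < low:
            return
        if i == len(rgroups):
            yield text, codes
            return
        label, members = rgroups[i]
        for m in members:
            yield from extend(i + 1, text.replace("{" + label + "}", m.fragment),
                              total + m.weight, codes + (m.code,))

    yield from extend(0, root, root_weight, ())


# Hypothetical two-position library; only the ethyl/chloro combination fits 160-166.
rgroups = [
    ("R1", [Member("A1", "Me", 15.0), Member("A2", "Et", 29.0)]),
    ("R2", [Member("B1", "F", 19.0), Member("B2", "Cl", 35.5), Member("B3", "Br", 79.9)]),
]
hits = list(enumerate_in_weight_range("ROOT({R1})({R2})", 100.0, rgroups, 160.0, 166.0))
```

Each result carries its ordered list of codes, mirroring the ordered tag or code lists described above. The same skeleton would accept a constraint on codes instead of weight by filtering the members offered at each position.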
Clipping

There is a connection between the R-groups of a combinatorial structure and the methods and reagents used in its synthesis: each R-group member corresponds to a specific reagent whose leaving group has been ‘clipped’ to attach it to the root. A chemical information management system can assist in this by transforming the reagent to an R-group member (Fig. 4). A set of rules identifies the leaving group. Not only does this simplify the process of generating a combinatorial structure, but it also maintains the link between the structure and the actual reagents used in its creation.
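As a rough illustration of clipping, the sketch below applies a small rule table to reagents written as linear text and keeps the originating reagent alongside the resulting member. It is a toy under stated assumptions: the rule table, the reagent classes, and the convention that the leaving group is the trailing text of the reagent are all hypothetical. A real system would identify the leaving group by matching a substructure in the connection table.

```python
from typing import Dict, NamedTuple


class RGroupMember(NamedTuple):
    fragment: str  # reagent with the leaving group removed; "*" marks the attachment point
    reagent: str   # the reagent the member was clipped from, kept for traceability


# Hypothetical rule table: reagent class -> leaving group to clip off.
LEAVING_GROUPS: Dict[str, str] = {
    "acid_chloride": "Cl",
    "alkyl_bromide": "Br",
}


def clip(reagent: str, reagent_class: str) -> RGroupMember:
    """Remove the trailing leaving group named by the rule for this reagent class."""
    leaving = LEAVING_GROUPS[reagent_class]
    if not reagent.endswith(leaving):
        raise ValueError(f"{reagent!r} does not end with leaving group {leaving!r}")
    return RGroupMember(fragment=reagent[: -len(leaving)] + "*", reagent=reagent)


acetyl = clip("CC(=O)Cl", "acid_chloride")  # acetyl chloride -> acetyl member
ethyl = clip("CCBr", "alkyl_bromide")       # ethyl bromide -> ethyl member
```

Keeping `reagent` on the member record is what preserves the link between each R-group member and the reagent actually used.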

Fig. 3. Restricted enumeration.

Fig. 4. Clipping a reagent to form an R-group member.

Fig. 5. Alternative representations for 4-pyridone.

Searching generic structures

Searching by chemical structure has become an essential component of chemical information management systems. Structures can be identified that match a specific set of structural features that correspond to certain biological, chemical, or physical properties. Identity searches are also useful as a means of identifying duplicate records. These searches are just as important for combinatorial databases. The search for specific properties transforms archived combinatorial libraries into a data mine of thousands or millions of structures, and their usefulness can transcend the initial experiment for which they were designed.

Searching a database of specific structures with structural queries is a standard capability of modern chemical structure database systems [1]. The challenge caused by combinatorial chemistry is to search databases of mixed generic and specific structures with both generic and specific queries efficiently and accurately.

The simplest type of structure search is the identity, or exact match, search. Testing a pair of explicit structures for identity is a problem that has been solved, and for which efficient algorithms now exist [4]. Even between explicit structures, identity comparison can become complicated when issues such as tautomerism, oligomerization, and idiomatic differences in representation (Fig. 5) are considered. It is important to keep in mind that the goal is to test for the identity of the molecules, and not their specific representations as drawn and entered in a database. This has implications for comparing generic structures as well. Two structures may have different roots or R-groups, as in the very simple case shown in Fig. 6, but still represent the same library.
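The point that identity concerns the molecules represented, not the drawing, can be illustrated for very small libraries by comparing the sets of structures two generic representations imply. The sketch below is a toy with hypothetical names and example libraries: structures are plain text templates, and it assumes the text for each specific structure is already in a single canonical form. Comparison by full enumeration is only workable for small libraries.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Library = Tuple[str, Dict[str, List[str]]]  # (root template, R-group members)


def implied_structures(root: str, rgroups: Dict[str, List[str]]) -> FrozenSet[str]:
    """The set of specific structures a generic structure stands for."""
    labels = list(rgroups)
    return frozenset(
        root.format(**dict(zip(labels, choice)))
        for choice in product(*(rgroups[label] for label in labels))
    )


def same_library(a: Library, b: Library) -> bool:
    # Identity is judged on what is represented, not on how it was drawn.
    return implied_structures(*a) == implied_structures(*b)


# Two hypothetical drawings of one library: a larger root with small members,
# and a smaller root with correspondingly larger members.
lib_a: Library = ("CC{R1}", {"R1": ["O", "N"]})
lib_b: Library = ("C{R1}", {"R1": ["CO", "CN"]})
lib_c: Library = ("C{R1}", {"R1": ["CO", "CS"]})
```

Here `lib_a` and `lib_b` have different roots and different members yet imply the same two structures, whereas `lib_c` does not.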

Fig. 6. Alternative representations of a simple combinatorial library.

Fig. 7. Nontrivial overlap between two simple combinatorial libraries.

Fig. 8. Substructure search of generic structures.

Related to the identity search but unique to combinatorial libraries is a search to find the specific structures that are shared between two libraries. The result can be expressed, for each library, as a percentage overlap with the other library. This type of search could be useful in evaluating the development or purchase of a library, and in analyzing deconvolution experiments. As shown in Fig. 7, even simple combinatorial libraries can be challenging to compare. In this case, each library contains six explicit structures, and there are two in common, for a 33% overlap. The overlap between small libraries such as these could be determined by enumerating each and comparing the specific structures using the existing methods for exact match. Enumeration is prohibitive for interactive comparisons of larger libraries, so the generic structures themselves must be compared with each other [5]. Another searching capability is required by the need to find a partial structure or substructure in a database that includes generic structures. The ability to do a substructure search is standard in modern chemical structure information systems. The problem is more complicated when generic structures must be searched, yet even more necessary since the generic might represent millions of specific structures. The query in Fig. 8 is compared with the generic structure to find the set of specific structures. This operation must be performed on the generic structure itself rather than on the set of specific structures implied by the generic to be practical.
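For the simplest case, in which two libraries share the same root and the same R-group positions, the overlap can be counted from the generic structures alone: the number of shared structures is the product, over positions, of the number of shared members. The sketch below shows only this special case, with hypothetical member names chosen to reproduce a two-in-six overlap. It assumes members are distinct within each R-group and that the positions are not related by symmetry; it does not address libraries with different roots.

```python
from math import prod
from typing import Dict, Set

RGroups = Dict[str, Set[str]]  # R-group label -> set of canonical member names


def library_size(rgroups: RGroups) -> int:
    return prod(len(members) for members in rgroups.values())


def shared_count(a: RGroups, b: RGroups) -> int:
    """Structures common to two libraries drawn on the same root and labels."""
    if a.keys() != b.keys():
        raise ValueError("this shortcut needs identical R-group labels")
    # A structure is shared only if every position uses a member present in both.
    return prod(len(a[label] & b[label]) for label in a)


def percent_overlap(a: RGroups, b: RGroups) -> float:
    """Percentage of library a that also occurs in library b."""
    return 100.0 * shared_count(a, b) / library_size(a)


lib_a: RGroups = {"R1": {"Me", "Et", "Pr"}, "R2": {"Cl", "Br"}}
lib_b: RGroups = {"R1": {"Et", "Pr", "Bu"}, "R2": {"Br", "I"}}
```

With these inputs each library implies six structures, two are shared, and the overlap is about 33% in both directions. No structure is ever enumerated, so the cost depends on the number of members rather than on the size of the library.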

References

1 a. Willett, P., J. Chemometrics, 1 (1987) 139.
  b. Ahrens, E.K.F., In Warr, W.A. (Ed.) Chemical Structures: The International Language of Chemistry, Springer, London, U.K., 1988, pp. 97–111.
2 Barnard, J.M. (Ed.) Computer Handling of Generic Chemical Structures, Gower, Aldershot, U.K., 1984.
3 Holliday, J.D. and Lynch, M.F., J. Chem. Inf. Comput. Sci., 35 (1995) 659 and references cited therein.
4 Wipke, W.T. and Dyott, T.M., J. Am. Chem. Soc., 96 (1974) 4834.
5 Leland, B.A., Christie, B.D., Nourse, J.G., Grier, D.L. and Smith, D.H., Managing the combinatorial explosion, presented at the 4th International Conference on Chemical Structures, Noordwijkerhout, The Netherlands, 2–6 June 1996.

Practical high throughput screening

Melvin Reichman (a) and Alex L. Harris (b)
(a) Ligand Pharmaceuticals, 9393 Towne Centre Drive, San Diego, CA 92121, U.S.A.
(b) Chiron Corporation, 4560 Horton Street, Emeryville, CA 94608, U.S.A.

Historical perspective

During the first half of the twentieth century, there was virtually an exclusive reliance on animal testing as the primary model for drug discovery and development. New chemical entities were administered to rodents in the primary screen assay, and the appropriate responses were monitored for indications of therapeutic potential. Compounds meeting the appropriate potency and efficacy criteria were ‘promoted’ to more diverse and sophisticated animal models to characterize their pharmacological profile. The responses that were monitored included blood pressure (hypotensives), latency to respond to painful stimuli (analgesics), attenuation of seizure propensity (antiepileptics) and other responses that were intuitively and pharmacologically valid indicators of medicinal potential or toxicity. Some of these methods were semiautomated and quite sophisticated for their time, particularly for cardiovascular indications [1].

As medicinal chemistry developed into a more rational discipline, it became more important to measure the intrinsic activities of compounds independent of their pharmacokinetic parameters. The measurement of intrinsic efficacies was essential for rationalizing structure-activity relationships (SARs), a foundation of modern drug discovery. These drivers led to the development of a variety of assay models that employed isolated organ tissues to study drug action. Such in vitro approaches dramatically advanced our knowledge of the pharmacological basis of drug action [2]. In many instances, the correlation between smooth muscle responses and efficacy in humans was remarkable. Examples include the guinea pig ileum as a model for narcotic analgesia [3,4] and vascular smooth muscle strips as a model for hypotensive agents [5,6]. While these approaches allowed measurement of the intrinsic efficacy of drugs apart from their pharmacokinetic attributes, they were quite laborious.
In most assays, a technician could only measure the activities of a handful of new chemical entities in a day.

Advances in the fields of biochemistry and metabolism between 1930 and 1950 were followed by the post-war introduction and utilization of radioisotopes in the biological sciences during the 1960s and 1970s. These advances led to revolutionary changes in our understanding of drug action. Pharmaceutical companies utilized the new knowledge to establish novel bioassays to find new medicines for old diseases. In addition, in response to the increasingly rigorous regulatory requirements of drug development, pharmaceutical companies had established batteries of pharmacological and biochemical assays to characterize their compounds. Quite frequently, new activities were observed in the course of such ‘selectivity’ testing (see Ref. 7). As more efficient, mechanistically based assays were developed in vitro, it made more economic sense to ‘randomly’ screen existing chemicals in a variety of therapeutically relevant assays for activities that could be commercialized. Moreover, since only a fraction of a percent of new chemical entities entered clinical trials, pharmaceutical companies had quite large standing inventories of chemically well-characterized compounds. It became evident that screening old compounds for new biological activity was a means of adding tremendous value to the compound inventories that were acquired over three quarters of a century or more.

Fig. 1. Typical assay plate setup of a 96-well plate, mapped with compounds, reference standards, and positive and negative controls.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 273–286 © 1997 ESCOM Science Publishers B.V.

During the 1980s, automation became more prevalent throughout society and manufacturing, as well as in industrial research laboratories. The nexus of biology, medicinal chemistry and laboratory automation represents a technological wave known today as high throughput drug screening (HTS). This new discipline facilitates drug discovery through testing huge chemical libraries in parallel bioassays that are highly automated. Automation allows mass screening to be accomplished with a constrained workforce, at a reasonable cost and in a reasonable time. Successful HTS endeavors to deliver multiple novel templates from which to initiate medicinal chemistry programs and to develop new drugs. Recently, HTS has assisted in the exploitation of other exciting, radically new approaches to drug discovery; the most important of these is combinatorial chemistry. The advances in HTS over the past decade may have even set the stage for this emerging technology. The two technologies are symbiotic. The momentum behind HTS has fostered the formation of a professional society founded in 1993, the Society of Biomolecular Screening (SBS), in order to provide an accessible and visible forum for this core technology [8]. The view of leading members of this society is that neither primary screening capacity nor compound supply is rate-limiting for drug discovery. This review will discuss some of the main elements of modern HTS.

Molecular ‘diversity’

The number and quality of chemical samples available to an HTS laboratory for unrestricted screening represents a fundamental sort of ‘investment capital’ for drug discovery today. There are two ways, both arising only in the last 10 years, to rapidly build a large chemical inventory to support a modern HTS laboratory. One approach is through combinatorial chemistry technologies. The other means to acquire diverse chemical libraries is to purchase compounds from the dozen-plus vendors who make available catalogs of organic, ‘drug-like’ compounds suitable for random HTS.

Compound acquisition and registration for HTS

As mentioned above, one can purchase compounds already plated in microtiter plates from any of several vendors for $10–20 per mg of sample. The compounds are distributed on 96-well plates (see Fig. 1), leaving blank columns for positive (reference compound) and/or negative (solvent) controls. The plates are now frequently shipped with electronic plate maps on computer disks. These plate maps contain relevant information on structure and vendor ID numbers. Many pharma sites are now spending the time to solvate and distribute their existing compound libraries on microtiter plates. The compound ID numbers and plate maps can easily be imported into virtually any compound registration software system. This information can be linked to data analysis packages (see below) to provide a means of integrating structure and activity data.

In addition to acquiring plated samples, purchasing chemical powders in 10–20 mg amounts should also be considered. There are several advantages to acquiring powders instead of solutions. The main advantages of powders are that the structural confirmation of HTS ‘hits’ is best accomplished with powders, and that they generally cost less than solubilized, plated samples on a unit mass basis. For example, the cost of purchasing 20 mg sample lots can be as low as $1–2 per mg, or one-tenth the cost of the same samples distributed in 96-well plates. A disadvantage of acquiring powders is that weighing, solubilizing and plating them is labor-intensive. One could, however, acquire a relatively inexpensive sample-weighing station capable of automatically taring empty, and reweighing filled, bar-coded vials, as well as solubilizing samples by delivering metered amounts of solvent. With automated methods, the only manual labor needed is for a person to transfer 1–3 mg of powder on the tip of a spatula into the vial. The automated sample preparation workstation could then reweigh the vial and calculate the sample weight by subtracting the tare weight. Using the molecular weight of each sample, the system could also solubilize each sample to a concentration of, say, 10 mM, vortex the samples and then distribute them to microtiter plates. With a system equipped with a bar-code reader, the entire process of deriving plated inventories from powders, except for transferring the powders to the vials, can be readily automated. Such systems range in price from around $35K to greater than $150K, depending upon their sophistication and flexibility (for examples, see Ref. 9 and Fig. 2).

Fig. 2. Sample weighing and distribution robot.
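The weighing and solubilization arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, its parameters and the example weights are hypothetical and do not describe any vendor's workstation software.

```python
def solvent_volume_ul(gross_mg, tare_mg, mol_weight, target_mM=10.0):
    """Return (net sample mass in mg, solvent volume in microliters).

    gross_mg   -- weight of the filled vial (mg)
    tare_mg    -- weight of the empty vial (mg)
    mol_weight -- molecular weight of the sample (g/mol)
    target_mM  -- desired stock concentration (mmol/L)
    """
    net_mg = gross_mg - tare_mg
    if net_mg <= 0:
        raise ValueError("filled vial must weigh more than the empty vial")
    if mol_weight <= 0 or target_mM <= 0:
        raise ValueError("molecular weight and target concentration must be positive")
    micromoles = net_mg / mol_weight * 1000.0      # mg / (g/mol) = mmol; x1000 = umol
    volume_ul = micromoles / target_mM * 1000.0    # umol / (umol/mL) = mL; x1000 = uL
    return net_mg, volume_ul


# Example: 2 mg of a compound with molecular weight 400 needs about 500 uL for 10 mM.
net, volume = solvent_volume_ul(gross_mg=1502.0, tare_mg=1500.0, mol_weight=400.0)
print(f"{net:.1f} mg -> {volume:.0f} uL of solvent")
```

Because the volume scales with the weighed mass, an operator who transfers anywhere from 1 to 3 mg still ends up with the same nominal stock concentration.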

Overview of new software for HTS Software – for compound registration and tracking, data analysis and database retrieval – is an essential component of any high throughput screening operation. Until recently, pharma sites developed their own software to support HTS. Very recently, a small number of companies have released HTS packages for sale. While commercially available software will not completely satisfy the needs of every organization, these applications provide an excellent starting point for managing HTS. Table 1 summarizes several of the packages recently released. BIOSTAR from Tripos Inc. (St. Louis, MO, U.S.A.) is a Macintosh-based program written in 4-DL, a prevalent, relational database program on the Macintosh platform. The program allows the registration of compounds directly from Excel spreadsheets, text files, word processing documents and other common files. Following the registration of samples in the database, submission can occur to the different assays also registered in the database as a ‘turnkey’ process requiring no manual data entry. BIOSTAR runs on the Macintosh platform and offers a ready solution for effectively engaging in HTS. Another plus is that support is from a well-established company specializing in various chemical informatics applications. Data exchange can occur between BIOSTAR and other databases, such as Oracle, although its two-way exchange capabilities are not yet fully implemented. Future releases will no doubt address such deficiencies. The application has excellent potential for interfacing with the entire suite of molecular diversity management software from Tripos. The source code for the application is proprietary; therefore, customization of the software would be a daunting task. Furthermore, it may be difficult finding expert 4-DL programmers to modify the software. CSAP is a product from MLR automation (San Francisco, CA, U.S.A.), a small informatics ‘startup’ company specializing in data management for HTS. 
The product is Windows-based only and written in Microsoft Access, an application fast becoming a standard in industry for rapid database applications development. A strong advantage of Access applications is their seamless integration with standard office automation products, such as MS-Word and MS-Excel. This application easily allows the registration of samples from standardized file formats and their submission to different assays, and it interacts well with popular mainframe and PC databases. Moreover, Access is very compatible with application development tools such as Visual Basic and other common programming tools. A major advantage of CSAP is that the vendor is rather flexible and supports, even encourages, customization by the user. A site with special needs should have little trouble customizing CSAP. The software is reasonably priced at around $20K. The main potential disadvantage of this package is that the entrepreneurial company supporting the product is less established than either MDL or Tripos.

MDL SCREEN is the newest entry into the ranks of commercial software to manage HTS. The application interfaces well with popular mainframe databases such as Oracle. SCREEN can handle larger amounts of data, limited mainly by the computational abilities of the processor and the storage capacity of the hard disk. SCREEN is a cross-platform application, a definite plus in a multi-platform environment. The quality control features in SCREEN are well developed. A significant advantage of this software is its compatibility with other MDL products, particularly ISIS. The latter is a client-server relational database application allowing for integration of structural and biological data. Together, SCREEN and ISIS should provide many features needed to manage an HTS laboratory even at the largest pharma sites. MDL is an established vendor of informatics products supporting drug discovery. The disadvantages of the product are its relative cost (> $100K) and inflexible database architecture; MDL usually will not support the product should the code be modified.

In summary, for a startup site that prefers to work on Macintosh platforms, BIOSTAR would be a good first choice. Conversely, for a Windows environment, CSAP would be an excellent starting package. For larger, cross-platform sites with heavier data management needs, and for those that already use other MDL products, SCREEN should be seriously considered.

TABLE 1 REVIEW OF SOFTWARE PACKAGES FOR HIGH THROUGHPUT SCREENING

Name: BIOSTAR
Vendor: Tripos Inc. (St. Louis, MO, U.S.A.)
Platform: Macintosh-based (4-DL)
Cost: Around $20K
Features: Compound registration, assay assignment, data analysis, data storage and retrieval, easy interface with chemical structure and diversity software offered by Tripos
Limitations: Only on Macintosh, difficult to customize, cannot execute cross-plate operations

Name: CSAP
Vendor: MLR automation (San Francisco, CA, U.S.A.)
Platform: Windows-based (Access)
Cost: Around $20K
Features: Data analysis, data storage and retrieval, seamless integration with MS-Excel and MS-Word and Visual Basic, customizable, ODBC compliant
Limitations: Only PC-based, vendor not well established, offers no other products

Name: MDL SCREEN
Vendor: MDL Information Systems (San Leandro, CA, U.S.A.)
Platform: Oracle-based, accessible multi-platform
Cost: > $100K
Features: Compound registration, assay assignment, data analysis, data storage and retrieval, easy interface with chemical structure and Oracle-based infrastructure, available on multiple desktop platforms
Limitations: More expensive and very hard to customize

Fundamental approaches to data analysis and interpretation There are three fundamental aspects in the analysis of HTS data: (i) determining which compounds in a run or group of runs are active; (ii) determining the potencies, efficacies, selectivities and mechanism of the actives; and (iii) quality control. There are two basic methods for determining which compounds are active. One involves calculation of the median or mean values of all the samples on a plate (excluding the plate controls), and setting an appropriate activity criterion relative to either of these two parameters. For example, many screeners are of the view that a ‘hit rate’ should not exceed 1% of the samples screened. One could, therefore, consider a compound to be active if its biological activity differs by more than three standard deviations from the mean or median value of the samples on the plate. By definition, these are the top 1% of the actives on the plate. This method has several important advantages, including that it is simple, it has intrinsic and statistical validity and it represents the best method of analyzing HTS data when there are no positive or solvent controls on the plate. With this method, the entire plate serves as the control. With the percentage of the plate control/ median method, there is less need for individual wells containing controls; therefore, edge or any other matrix effect phenomena will not bias the analysis of the data (see below). A dividend of this approach is that the screening efficiency is increased by 10–20%, since the control wells could be replaced with sample. Moreover, it is easier to run diverse assays in parallel from the same compound dilution boxes if the different controls for each assay can be eliminated. Those screeners uncomfortable selecting actives without reference to explicit plate controls would define activity relative to the activity of the solvent control and/or known reference compounds on each plate. 
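The two activity criteria described above can be illustrated with a short sketch. The function names and the synthetic plate are assumptions made for illustration; they are not taken from any of the packages reviewed. As a side note, if the signals on a plate were roughly normally distributed, only about 0.3% of inactive wells would be expected to fall outside a three-standard-deviation cutoff, which is in keeping with a hit rate below 1%.

```python
from statistics import mean, median, stdev

def call_hits(sample_signals, n_sd=3.0, use_median=False):
    """Return indices of sample wells lying more than n_sd standard
    deviations from the plate mean (or median). Control wells are
    assumed to have been excluded already."""
    center = median(sample_signals) if use_median else mean(sample_signals)
    spread = stdev(sample_signals)
    return [i for i, s in enumerate(sample_signals)
            if abs(s - center) > n_sd * spread]

def percent_of_control(signal, solvent_controls):
    """Express one well as a percentage of the mean solvent control
    measured on the same plate."""
    return 100.0 * signal / mean(solvent_controls)

# 79 unremarkable wells plus one strongly inhibited well at index 79
plate = [100 + (i % 5 - 2) for i in range(79)] + [10]
print(call_hits(plate))
print(percent_of_control(10, [98, 102]))
```

In this sketch the whole plate serves as its own control in `call_hits`, whereas `percent_of_control` depends on dedicated solvent wells and is therefore evaluated plate by plate.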
Most screeners will format their libraries such that one or two columns on the plate contain the positive and negative controls. The latter are the solvent controls, and the activity criterion could be determined as a fold induction or percentage inhibition relative to the solvent controls. By the inclusion of reference samples with known biological activity, one can also calculate the relative efficacy for each hit. Since the controls will vary from plate to plate, the percentage control and efficacy values should be calculated on a plate-by-plate basis by the software, rather than as a single large batch. Evaluation of the ‘precision’ and ‘accuracy’ of the bioassay can occur with this method by monitoring the intra- and interassay control responses on each plate. Each of the HTS software packages discussed above allows users the option of calculating actives based on either a percentage of the median or control values.

Regardless of the power and features of any data management system, one should employ ‘common’ sense before engaging in compound retesting and characterization campaigns. For example, there may be little value in characterizing those agonist compounds that, at 10 µM, yield a stimulatory response of only 20% above the null solvent control, if the known reference compound yields, at 10 nM, a 120-fold response. In other words, one should question the pharmacological value of following up on compounds with < 5% efficacy and pharmacological potency > 10 µM.

Once the primary actives have been determined, their potencies (concentrations yielding one-half the maximum response) and efficacies (activities relative to a reference compound) may be derived by characterizing the response magnitude as a function of increasing drug concentration in a graded manner. Full- or half-log steps are usually plotted on the x-axis versus the absolute or relative response. So plotted, the concentration-response curves should be sigmoidal in nature (Fig. 3). The potency is defined as the mid-inflection point of this curve, and the maximal efficacy as the upper plateau. The slope of the curve may provide some indication as to the pharmacological mechanism of drug action. There are two straightforward means of deriving these parameters. The four-parameter fitting method utilizes nonlinear least-squares regression to fit the data to a sigmoidal model, whereas the log-logit method transforms the data to a linear function known as the ‘Hill plot’ [10]. In the latter model, the potency relates to the x-intercept of the line. The disadvantage of this model is that direct responses cannot be plotted on the y-axis, insofar as responses are percentages of a fixed maximal value. A more detailed analytical discussion of the ‘best’ means of deriving these values is beyond the scope of this article (for general reviews of the merits of linear versus nonlinear comparisons, see Refs. 10 and 11).

Fig. 3. A typical sigmoidal concentration-response curve, generated using the four-parameter fit equation: $y = (A - D)/(1 + (x/C)^{B}) + D$, where A = max, B = slope, C = midpoint, D = min.
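As a worked illustration of the two approaches just described, the sketch below evaluates the four-parameter equation of Fig. 3 and then recovers the slope and midpoint through the log-logit (Hill) transformation. It assumes noise-free synthetic data and plateaus A and D that are already known; the function names are hypothetical. Rearranging the four-parameter equation gives $(y - D)/(A - y) = (x/C)^{-B}$, so a plot of $\log_{10}[(y - D)/(A - y)]$ against $\log_{10} x$ is a straight line of slope $-B$ whose x-intercept is $\log_{10} C$.

```python
import math

def four_param(x, A, B, C, D):
    """Four-parameter logistic as written in the Fig. 3 caption."""
    return (A - D) / (1.0 + (x / C) ** B) + D

def log_logit_fit(concs, responses, A, D):
    """Estimate slope B and midpoint C by a straight-line fit of
    log10((y - D) / (A - y)) against log10(x), with plateaus A and D fixed."""
    xs = [math.log10(x) for x in concs]
    ys = [math.log10((y - D) / (A - y)) for y in responses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    B = -slope                         # the transformed line has slope -B
    C = 10 ** (-intercept / slope)     # x-intercept of the line is log10(C)
    return B, C

# Noise-free synthetic data in half-log steps from 10 nM to 100 uM
concs = [10 ** (-8 + 0.5 * i) for i in range(9)]
responses = [four_param(x, A=100.0, B=-1.0, C=1e-6, D=0.0) for x in concs]
B_est, C_est = log_logit_fit(concs, responses, A=100.0, D=0.0)
print(B_est, C_est)
```

With the caption's convention (A = max, D = min), a response that rises with concentration corresponds to a negative B. The linear transform needs responses strictly between D and A, which reflects the limitation noted above that responses must be expressed relative to a fixed maximum.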

HTS assay validation guidelines Rationale All HTS assays should have a solid scientific rationale, e.g., that modulators (agonists, partial agonists, mixed agonists/antagonists or pure antagonists) that elicit a particular mechanistically based endpoint will be predictive of clinical utility. While this requirement may seem obvious, it is one of the more complex issues in managing HTS, particularly at larger research-oriented sites. For a company founded on a single technology, such as inhibition or activation of a newly discovered ‘proprietary’ gene, enzyme or receptor, it is straightforward to rationalize which screens to run. In a larger company, however, with unique functional departments and broad therapeutic areas, and a research management infrastructure that is ‘matrixed’ across ‘line’ departments, many scientific and commercial considerations should come into play before initiating any HTS assay. Possible issues include questions of met versus unmet medical need, risk analysis, whether the target is within the expertise of the organization, competitive stance, and what resources will be needed downstream in order to commercialize the project (see Ref. 12). Standard operating procedures (SOPs) A review of the scientific literature in virtually any area of pharmacology will reveal a large apparent variability in the reported values for drug potencies. No doubt, a large measure of this variability is due to variances in the experimental protocols. An advantage of having a centralized HTS group is that all protocols within a given assay type will be consistent across various assays. This allows a quantitative, bioanalytical comparison of the activities of ‘hits’ in different assays. Pharmacologically valid SOPs are essential for evaluating cross-reactivity data on HTS actives. The HTS laboratories should update their written SOPs whenever the assay protocols change in a significant way. 
Every detail of the assay, down to the microtiter plate vendor and lot number, should be recorded. One must be cognizant of the variables that affect the potency and efficacy values of the reference compounds. The latter should be monitored with reasonable and customary quality control and assurance procedures.

Example of the importance of detailed SOPs

Many might not give much thought to consistent drug dilution protocols for all projects; however, solubilization protocols can be quite important. Consider the likelihood that several different laboratories would dilute a 10 mM stock to a 1 µM solution in different ways. Some might choose to use four serial 1:10 dilutions, others might use two serial 1:100 dilutions, others might do a combination of 1:10 and 1:100 dilutions, and so forth. Some might perform these dilutions in serum-containing media, some in serum-free buffer, etc. Finally, some might serially dilute the compound in EtOH or DMSO and ‘spike’ a small aliquot of the solvated compound into the assay. For hydrophobic compounds, or those with a slow rate of solvation, each of the above solubilization protocols could (and in the authors’ opinion will, at times) lead to a different result. Similarly, the composition of the assay buffer, as well as the order and manner in which reagents are added, can also affect the results. In summary, the importance of consistent SOPs between different screening assays cannot be overemphasized. In HTS, the details matter and can make the difference between the success and failure of an assay to find leads of sufficient quality to justify the assignment of dedicated medicinal chemistry resources to a project.

Validity of protocols

In addition to the above general considerations, specific assay types also require additional validation data. In Table 2, some of the most fundamental parameters that comprise assay validity are summarized. Following validation of the assay, intrinsically by the use of reference compounds with known activities, it is useful to prepare a few ‘generic’ plates of the test compounds. These plates should be run in any HTS assay prior to running actual sample plates through the screen. Insofar as possible, the compounds on these plates should be ‘representative’ of those in the inventory. The representative compounds could be randomly selected from the inventory. Alternatively, computational methods used for similarity and dissimilarity analysis could be applied to derive a diverse, ‘core’ set of compounds ‘representative’ of the diversity in the library. These plates should also contain some known biological perturbants, i.e., toxic compounds, to determine whether they are detectable as hits in screens looking for antagonists or inhibitors. Similarly, inactive samples should be ‘spiked’ with known active compounds. If an assay is valid, it should be able to detect the known actives on virtually every plate, in every run.

Fig. 4. A Sagian HP Orca Microplate robotic system (reprinted with permission from Sagian Corp., Indianapolis, IN, U.S.A.).
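The dilution example given in the discussion of SOPs above can be made concrete with a small calculation. The sketch below is illustrative only, and the function name is hypothetical. It shows that four 1:10 steps and two 1:100 steps reach the same nominal 1 µM from a 10 mM stock while passing through different intermediate concentrations, which is one reason why poorly soluble compounds may behave differently under the two protocols.

```python
import math

def serial_dilution(stock_molar, fold_steps):
    """Return the concentration (mol/L) after each successive dilution step."""
    concentrations = []
    current = stock_molar
    for fold in fold_steps:
        if fold <= 1:
            raise ValueError("each dilution factor must be greater than 1")
        current = current / fold
        concentrations.append(current)
    return concentrations

stock = 10e-3  # 10 mM
four_tenfold = serial_dilution(stock, [10, 10, 10, 10])
two_hundredfold = serial_dilution(stock, [100, 100])
same_final = math.isclose(four_tenfold[-1], two_hundredfold[-1])
print(four_tenfold)
print(two_hundredfold)
print(same_final)
```

The calculation captures only the nominal concentrations; the choice of diluent and the number of transfers, which the text identifies as sources of variability, are not modeled.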


State-of-the-art in laboratory robotics today Once a valid data management infrastructure is established, valid data analysis tools are in place, a large chemical inventory is available, and robust assays are fully operational, only then should one seriously consider the automation of key HTS assays. Two approaches may be taken to automate bioassays. One involves constructing a robotic system that can execute the entire assay unattended. While this approach offers great capacity in the turnover of compounds, it usually takes significant human and monetary resources to design and implement. It also takes patience and perseverance. Moreover, dedicated experts (apart from the biologists involved) are usually needed to maintain such a laboratory. The other approach involves the purchase of readily available, robotic workstations with graphical, menu-driven software interfaces. The laboratory personnel, rather than a robot arm, provide for the interconnectivity and the flow of samples between workstations. While such modular or ‘semiautomated’ approaches do not at first yield a fully automated system, they can yield true ‘turnkey’ economies [13]. In many cases, the lower cost of implementation and maintenance of such a semiautomated, vendor-supported, facility will offset the losses in compound turnover. There are several vendors of fully automated systems. Examples of simple systems that are hybrids between the expert systems and the workstations include the Beckman Biomek 2000 equipped with a side-loader robotic arm supplied by Beckman or Sagian (Beckman Instruments, Fullerton, CA, U.S.A.; Sagian, Indianapolis, IN, U.S.A.). Another similar approach is offered in the TRAC systems available through TECAN (Tecan US Inc., Research Triangle Park, NC, U.S.A.). Both these systems combine excellent stand-alone workstations with a robotic arm and appropriate accessories. The advantage of these systems is that their programming environment is graphical and menu-driven. 
The lower-level application language is also quite straightforward. While systems such as these are excellent for running simple enzymatic or receptor binding assays, they may not be sophisticated enough to run more complex cellular assays. The simplicity of such systems is their greatest virtue, and derives from a high degree of standardization of the system platform and available options (see Fig. 4 for an example). However, such standardization can sometimes limit the ability to run more complex assays effectively.

TABLE 2 PROTOCOL VALIDITY

For cell-free enzyme assays one should demonstrate that:
- incubation times and protein concentrations are within the linear range for the reaction
- substrate concentration is neither excessive nor rate-limiting
- the Km and Vmax values are within an acceptable range of literature values

For radioligand binding studies, one should demonstrate that:
- specific binding is saturable and linear with time and protein concentration
- the Kd and Bmax values are within acceptable tolerances
- the Kd values measured either by equilibrium or kinetic methods are similar

For whole-cell functional assays, one should determine:
- that cells are viable under the conditions of the assay
- whether the growth phase of the cells is in the linear range for the assay
- whether the cell density is an important factor in affecting the results
- whether the original source of the cells used is known; whether different lots yield similar reference values
- that the cells can be subjected to the forces of gravity and surface shear when centrifuged or washed, respectively

More sophisticated, fully automated systems are available from several well-established robotics integrators. They can automate even the most complex assays, both for discovery research and development projects. Examples include complex cell-based transcriptional assays involving centrifugation, and automation of all steps in the measurement of plasma drug levels by high-pressure liquid chromatography (HPLC). Their power and flexibility come at a price; one should not expect out-of-the-box, smooth operation from a custom-integrated laboratory robotics system. As mentioned above, installing, validating and operating these systems usually requires a dedicated individual with a strong mechanical proclivity as well as assay development expertise.

Although relationships have been established recently between robotics and software vendors, none of these systems are integrated with HTS data management systems. For example, one application worth developing involves automatically forwarding worklists of active compounds for concentration response testing. The ‘hits’ would be identified by the data analysis programs described above. These programs would be integrated with the robotic assay system in order to ‘pick and place’ the active compounds into dilution plates. The robotic system would perform concentration response testing using the same reagents already on the table. Each of the leading software vendors is starting to work with robotics vendors to develop such integrated approaches. The results of similarity searching exercises could also be automatically processed and prepared for testing in this manner. The first vendor that provides such software tools with their hardware will begin to capture the laboratory robotics HTS market.
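The worklist application proposed above might look something like the following sketch. Everything in it is an assumption made for illustration: the record fields, the plate layout (one hit per column of a 96-well dilution plate, eight half-log concentrations down the column) and the identifiers do not correspond to any vendor's file format.

```python
ROWS = "ABCDEFGH"  # rows of a 96-well plate

def build_worklist(hits, n_points=8, top_conc_uM=10.0):
    """Build transfer records for concentration-response testing.

    hits -- list of (source_plate, source_well) pairs flagged as active.
    Each hit is assigned one column of a 96-well dilution plate, with
    concentrations falling in half-log steps down the column.
    """
    if not 1 <= n_points <= len(ROWS):
        raise ValueError("n_points must fit in one plate column (1-8)")
    worklist = []
    for i, (source_plate, source_well) in enumerate(hits):
        dest_plate = i // 12 + 1   # 12 columns, hence 12 compounds per plate
        column = i % 12 + 1
        for step in range(n_points):
            worklist.append({
                "source_plate": source_plate,
                "source_well": source_well,
                "dest_plate": dest_plate,
                "dest_well": f"{ROWS[step]}{column}",
                "conc_uM": top_conc_uM / 10 ** (0.5 * step),
            })
    return worklist

hits = [("P001", "B3"), ("P001", "F7"), ("P014", "C11")]
for record in build_worklist(hits)[:3]:
    print(record)
```

A list of this kind is the sort of hand-off that would let the data analysis program and the robot scheduler exchange information without manual re-entry.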

New leads discovery: Anticipating the future

Advances in miniaturization

The technologies supporting HTS are evolving rapidly. About 5 years ago, reports on combinatorial chemistry were just beginning to emerge. Today, this technology is becoming an integral part of drug discovery at both larger and smaller sites. Since some of the combinatorial technologies provide small sample yields, such as those based on solid-phase synthetic methods, new miniaturized automation devices can be expected. Miniaturization is important when reagents or compounds are very limited or very expensive. The 384-well microtiter plate is one example, as are precision pipettors that are capable of rapidly delivering submicroliter aliquots to an entire plate [14]. Similarly, we should see a broader use of high-density pin replicating tools (HDPRTs). These are essentially 96 or 384 ‘pins’ aligned in register with the wells of the microtiter plates. Each of these pins (roughly 1 mm in diameter) is dipped into the plated drug inventory and then dipped into the assay media. This passive transfer (the HDPRT itself contains no moving parts) allows rapid addition of drugs with reasonable precision (±20%) for HTS. In the future, we might see a more prevalent use of such devices in ‘plate-less’ assays, wherein drug samples are ‘blotted’ as high-density arrays on cells growing as a ‘lawn’. An appropriate colorimetric or functional readout, such as β-galactosidase activity or cell viability, respectively, would be used to detect actives. Such methods are in use today [15,16], although they are not yet widespread.

Advances in laboratory robotics

In the arena of laboratory robotics, we should soon start to witness a better exploitation of automation principles that are already used in product manufacturing. Some of these principles include a much greater speed of operation, utilization of assembly line manufacturing principles, true multi-tasking, unattended batch processing abilities and online data analysis and decision making. For example, several companies are now building ultrahigh throughput HTS systems, although none have been validated in the field yet. One of these concept systems is from SAIC (Science Applications International Corporation, San Diego, CA, U.S.A.) and employs three novel electromagnetic gantry-robot arms and a novel platform comprising plate stackers that rise from below the deck of the system. The system will have a 1400-plate capacity, several controlled temperature zones and a ‘use’ footprint less than an 8' × 10' room. Similarly, CRS Robotics (Burlington, ON, Canada) is displaying a concept system that employs workstations equipped with plate stacking devices. Instead of the robot arm moving single plates on and off the various workstations on the system, the arm would move magazines each containing 20 plates or more. Finally, another HTS system design by Tomtec (Hamden, CT, U.S.A.) allows running several different assay modalities simultaneously. These modalities include filtration-based radioligand binding assays, colorimetric enzyme assays, fluorometric assays and cell-based transcriptional assays with chemiluminescence as the endpoint.

Advances in information management

Soon, we can expect more systems that integrate HTS assay execution with data analysis and decision making. For example, as the precision of robotic systems increases, it will become more feasible to set the criteria for designating active compounds closer and closer to the background noise level of the assay. Molecular diversity and structural pattern recognition analysis will be closely tied to data analysis. Decisions on what compounds to retest will be made based not only on the magnitude of the response but also on novel structural characteristics of dozens of weak actives. As high throughput organic synthesis becomes more routine, it will be more feasible to perform exploratory chemistry even on the weakest of actives. Combinatorial chemistry will be used routinely to determine which of parallel sets of weak actives can be improved most rapidly through high throughput synthetic methods.

Multiplexing in HTS

An emerging approach that will become more routine involves performing HTS on mixtures of 10 or more compounds in order to expedite discovery of the best potential pharmacophores in large chemical libraries. A priori, this approach is one way to realize approximately 10-fold gains in screening throughput and capacity without resorting to robotics or other sophisticated methods. Companies are disclosing some clever deconvolution strategies to identify the active components in mixtures [17]. Another approach to multiplexing will involve monitoring two or more endpoints that report on the activity of separate biochemical processes in the cell [18]. Some companies already offer kits to accomplish this for molecular biology research (Tropix, Bedford, MA, U.S.A.). In the immediate future, more and more sites will begin to employ such abbreviated approaches more routinely.
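The throughput argument for screening mixtures can be checked with a simple simulation. The sketch below assumes the most basic deconvolution scheme, in which every member of an active pool is retested singly; it is not one of the strategies of Ref. 17, and it idealizes mixtures by ignoring masking, additivity and interference. The names and the example library are hypothetical.

```python
def pooled_screen(compound_ids, actives, pool_size=10):
    """Simulate a pooled primary screen followed by one-step deconvolution.

    A pool is scored active if it contains at least one active compound
    (an idealization: no masking, additivity or interference in mixtures).
    Returns (confirmed actives, total wells used).
    """
    pools = [compound_ids[i:i + pool_size]
             for i in range(0, len(compound_ids), pool_size)]
    active_pools = [pool for pool in pools if any(c in actives for c in pool)]
    retested = [c for pool in active_pools for c in pool]   # members tested singly
    confirmed = [c for c in retested if c in actives]
    return confirmed, len(pools) + len(retested)

library = list(range(1000))
confirmed, wells = pooled_screen(library, actives={3, 457, 458})
print(confirmed, wells)
```

When actives are rare, the total number of wells stays close to one-tenth of the library size, in line with the roughly 10-fold gain cited above; the advantage shrinks as the hit rate rises, because more pools must be deconvoluted.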

Summary and key issues

The discipline of HTS has had a profound effect on discovery research in the entire pharmaceutical industry. One could perhaps argue that the proliferation of certain biopharmaceutical startup companies, including those that are genomics-based, results directly from the emergence of modern HTS. It is clearer than ever that drugs are discovered today mainly through targeted HTS campaigns. However, HTS represents only the ‘admission ticket’ to travel the path to New Drug Application. Proficiency in many other arenas remains key. Regardless of the sophistication of the target or approaches to it, the pharmacological basis of HTS remains fundamental. How leads should be prioritized, what their underlying mechanism of action is and which assays are most predictive of the ultimate clinical endpoint(s) are examples of other issues that remain fundamental to success. Additional issues include: how stringent similarity searching should be and in how many dimensions; the viability of ‘opportunistic HTS’ (i.e., to first find a lead and then plan how to characterize and develop it) in drug discovery; how deep the secondary assay estate should be and how contract research organizations (CROs) should be deployed; the appropriate lifetime for a screen; the right balance of targets in the HTS assay portfolio; and the activity sufficient to initiate medicinal chemistry. Failure to consider these issues early, before they become rate-limiting, will result in wasted time and money, or outright failure, regardless of the prowess of the HTS laboratory. Effectively leveraging the economies of HTS, including associated technologies such as combinatorial and parallel high-throughput synthetic chemistry, throughout the R&D organization will likely be the next area of progress in the industry. As this occurs, we should see the present distinctions (apart from size) in preclinical drug development strength between ‘startup sites’ and ‘big pharma’ begin to blur.
As we enter the second century in pharmaceutical R&D, HTS continues to deliver more pharmacophores for more projects than an organization can effectively develop. For now, HTS technologies have fulfilled today’s credo for new drug discovery and development, i.e., a better, cheaper and faster discovery of new leads from which to develop new medicines for old diseases.

References

1 Freyburger, W.A., Weeks, J.R. and Ducharmem, D.W., Naunyn-Schmied. Arch. Pharmacol., 251 (1965) 39.
2 Goodman, F.R., In Williams, M. and Malick, J.B. (Eds.) Drug Discovery and Development, Humana Press, Clifton, NJ, U.S.A., 1987, pp. 97–109.
3 Ronai, A.Z., Berzetei, I., Szekely, J.I., Graf, L. and Bajusz, S., Pharmacology, 18 (1979) 18.
4 Ward, S.J., Pierson, A.K. and Michne, W.F., Life Sci., 33 (Suppl. 1) (1983) 303.
5 Harris, A.L., Swamy, V.C. and Triggle, D.J., Can. J. Physiol. Pharmacol., 62 (1984) 146.
6 Webb, R.C. and Bohr, D.F., Am. Heart J., 102 (1981) 251.
7 Maxwell, R.A. and Eckhardt, S.B., Drug Discovery: A Casebook and Analysis, Humana Press, Clifton, NJ, U.S.A., 1990.
8 Shaw, I., J. Biomol. Screening, 1 (1996) 1.
9 France, D.S., Murdoch, M.K., Russel, M., Surve, N., Ma, X., Drelich, M., Duston, C., Hanson, A., Paterniti, J.R. and Weinstein, D.B., Lab. Robotics Autom., 5 (1993) 201.
10 Leatherbarrow, R.J., Trends Biochem. Sci., 15 (1990) 455.
11 Fleming, W.W., Westfall, D.P., De La Lande, I.S. and Jellett, L.B., J. Pharmacol. Exp. Ther., 181 (1972) 339.
12 Smith, G.S., The Process of New Drug Discovery and Development, CRC Press, Boca Raton, FL, U.S.A., 1992.
13 Reichman, M., Schneider, P.H., Anderson, S.N., Williamson, L.N. and Savage, M.A., Proc. Int. Symp. Lab. Autom. Robotics, 4 (1992) 466.
14 Janzen, B. and Domanico, P., J. Biomol. Screening, 2 (1996) 63.
15 Brandt, D.W., In Proceedings of the IBC Conference on High Throughput Screening, 1996, in press.
16 Quillan, J.M., Jayawickreme, C.K. and Lerner, M.R., Proc. Natl. Acad. Sci. USA, 92 (1995) 2894.
17 Devlin, J.J., Liang, A., Trinh, L., Polokoff, M.A., Senator, D., Zheng, W., Kondracki, J., Kretchmer, P.J., Morser, J., Lipson, S.E., Spann, R., Loughlin, J.A., Dunn, K.V. and Morrissey, M.M., Drug Dev. Res., 37 (1996) 80.
18 Ciccarelli, R.B., Winter, L.A., Lorenz, R., Harris, A.L., Crawford, A.C., Bailey, T.R., Singh, B., Hammarskjold, M.-L., Rekosh, D. and Hughes, J.V., Antiviral Chem. Chemother., 5 (1994) 169.


Deconvolution methods in solid-phase synthesis

John J. Baldwin and Roland E. Dolle
Pharmacopeia Inc., 101 College Road East, Princeton, NJ 08540, U.S.A.

Introduction

The word library is used to define a collection of compounds, usually built around a common structural motif. There are three general approaches to library preparation: parallel synthesis, mixture synthesis and split synthesis. In parallel synthesis, the compounds are made individually by automated or semi-automated methods. The library members may be made in solution by classical methods, in solution attached to a polymeric carrier, or on solid support. Each compound must be linked to a spatially defined position: the structure of the product is inferred from the position of the reactor and from the order of addition of the synthons and reagents at that position in space. Not every possible member resulting from the combinatorial mix of the synthons need be included in the library.

Combinatorial libraries

In contrast, combinatorial libraries represent a subclassification: a library in which every possible member that can be generated from the sets of reactants is present. Such libraries can be prepared by parallel synthesis, but are much more commonly synthesized by either mixture synthesis or split synthesis. Because of a commonality in structure among the members in each set of reactants and the optimized reaction conditions used, combinatorial libraries are composed of variants around a scaffold where the substituents control much of the diversity. The preferred mode of preparation for both mixture and split synthesis libraries is the solid phase. Polymer matrices, pins and resin beads have all been used. The solid-phase approach is preferred because of its simplicity and the ease of purification and isolation of the reaction products. Unlike the spatially addressable library, where structure is defined by position in a set of reaction vessels, the structure of interesting library members prepared by mixture or split synthesis must be defined by highly sensitive analytical methods, indirectly by encoding, or by biological results combined with resynthesis, the so-called deconvolution method.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 287–297 © 1997 ESCOM Science Publishers B.V.

Mixture libraries and deconvolution

The structure determination of a library member by deconvolution can be used with either mixture libraries or those generated by split synthesis. Mixture libraries are formed by first allowing a defined group of synthons to couple to the support; the process is then repeated, as described below, until the library is complete. Because the chemistry and kinetics of coupling must be well developed, the method has been applied mainly to peptide and oligonucleotide libraries.

As pioneered by Geysen and Houghten, the structure of the most active library member is determined in an iterative process by bioanalysis followed by resynthesis [1–7]. For example, a library of hexapeptides is prepared as 400 sublibraries, each containing 160 000 discrete members. Each sublibrary is composed of one of the 400 defined dipeptides coupled with four sets of 20 amino acids (20^4 = 160 000 members). Each of the 400 sublibraries is evaluated as a mixture against a biotarget. The sublibrary with the highest bio read-out is resynthesized such that the two known amino acids are held constant and coupled with a mixture of three sets of 20 amino acids. The remaining amino acid is separately defined and forms the basis for 20 new sublibraries. The highest bio read-out across this set of sublibraries defines the third preferred amino acid. This iterative process is repeated until the structure of the most active hexapeptide is defined (Fig. 1).

Fig. 1. Mixture library and deconvolution. (AAA) = all 20 amino acids simultaneously incorporated; (AA) = one of 20 amino acids; underlined (AA) = preferred amino acid as determined by bioassay.

In a similar approach, called position scanning [8,9], sets of sublibraries are prepared in which the 20 amino acids at each position are separately defined (Fig. 2). For example, with a pentapeptide, the first set of 20 sublibraries has a defined amino acid at position 1, while positions 2, 3, 4 and 5 are each fully randomized with a set of 20 amino acids. The second set of 20 sublibraries is prepared with a defined amino acid at position 2; the amino acids in positions 1, 3, 4 and 5 are fully randomized. In a similar manner, amino acids at positions 3, 4 and 5 are each defined in sets of 20 sublibraries. Testing of the five sets of 20 sublibraries will define the preferred amino acids at each of the five positions in the pentapeptide chain. The resynthesis of individual peptides having the preferred amino acids at each position, followed by retesting, will define the structure of active members in the library.
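The sublibrary counts quoted above follow from simple powers of 20. The short Python sketch below reproduces that arithmetic for the hexapeptide (iterative) and pentapeptide (position scanning) examples. The function names are hypothetical, chosen for illustration only, and the assumption that the iterative scheme starts with two defined positions is taken from the hexapeptide example in the text.

```python
N_AMINO_ACIDS = 20  # amino acids available at each position


def iterative_rounds(length, defined_first=2, n=N_AMINO_ACIDS):
    """Return (sublibraries, members per sublibrary) for each round
    of iterative deconvolution of a peptide of the given length."""
    # Round 1: every combination of the first defined positions is one sublibrary.
    rounds = [(n ** defined_first, n ** (length - defined_first))]
    # Each later round defines one more position, giving n new sublibraries.
    for n_defined in range(defined_first + 1, length + 1):
        rounds.append((n, n ** (length - n_defined)))
    return rounds


def position_scan_sets(length, n=N_AMINO_ACIDS):
    """Return (sublibraries, members per sublibrary) for each scanned position."""
    return [(n, n ** (length - 1)) for _ in range(length)]


hexa = iterative_rounds(6)
penta = position_scan_sets(5)
print(hexa)                       # [(400, 160000), (20, 8000), (20, 400), (20, 20), (20, 1)]
print(sum(s for s, _ in hexa))    # 480 sublibraries over all rounds
print(sum(s for s, _ in penta))   # 100 sublibraries, all made in a single round
```

Under these assumptions the iterative route needs more sublibraries in total and several synthesis rounds, whereas position scanning prepares all 100 sublibraries at once, each containing 160 000 members.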

Split synthesis approach

The third, and perhaps the most powerful, approach to combinatorial libraries is split synthesis. This technique was first defined as an approach to peptide libraries by Furka et al. [10–12] and Lam et al. [13] (Fig. 3). In this strategy each of the first-step synthons is independently added to resin in an individual reaction vessel. After the reaction is complete, the solid supports are pooled and then redivided into a second set of reaction vessels. A single member of the second set of synthons is added to each vessel. The supports are again pooled and redivided for the third reaction sequence. This pooling, mixing and dividing is continued until the synthesis is complete and results in only a single compound being incorporated on any solid support bead. This is quite different from mixture synthesis, where many different compounds may be present on the solid support. Although only a single compound is present on any resin bead, that particle cannot be easily distinguished from any other particle holding a different member of the library. At this stage, all beads are anonymous [14,15].

The structure of the bioactive members of a combinatorial library prepared by split synthesis can be determined by several methods, including bioactivity/deconvolution, microanalytical methods and encoding. In a generalized example of the biorecognition/deconvolution approach, the final sublibraries are not pooled but tested as mixtures. The most active sublibrary then defines the synthon used in the last synthetic step. The synthesis is then repeated to the penultimate step; these sublibraries are not mixed, but each is separately reacted with the preferred last-step synthon. The resulting sublibraries are tested; the most active then defines the preferred synthon in the penultimate step, and hence the last two synthons present in the most active member of the family are defined.

Fig. 2. Mixture libraries – position scanning. (AA) = one of 20 amino acids; (AAA) = all 20 amino acids simultaneously incorporated.


Fig. 3. Combinatorial synthesis.
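The pool-and-divide cycle of split synthesis can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: beads are represented as tuples recording their reaction history, every coupling is taken to be complete, and the synthon labels (A1, B1, ...) and the function name `split_and_pool` are hypothetical.

```python
import random
from collections import Counter


def split_and_pool(synthon_sets, n_beads, seed=0):
    """Simulate split synthesis; return the reaction history of each bead."""
    rng = random.Random(seed)
    beads = [() for _ in range(n_beads)]  # each bead starts with no synthons
    for synthons in synthon_sets:
        rng.shuffle(beads)  # pool and mix all beads
        # Divide the pool into equal portions, one vessel per synthon,
        # and couple a single synthon to every bead in that vessel.
        beads = [bead + (synthons[i % len(synthons)],)
                 for i, bead in enumerate(beads)]
    return beads


steps = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
beads = split_and_pool(steps, n_beads=800)
counts = Counter(beads)
print(len(counts))                           # number of distinct compounds found
print(all(len(b) == len(steps) for b in beads))  # each bead carries one history
```

Each bead ends with exactly one synthon per step, which is the "one bead, one compound" property described above; the beads themselves carry no label, so in the real experiment they remain anonymous until decoded.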

This iterative process is repeated until the full structure of the most active member is determined. The process is illustrated in Fig. 4 with an eight-membered library prepared in three steps using two synthons in each step. In a related strategy termed recursive deconvolution [16], samples of all sublibraries at each stage are retained. With this approach it is not necessary to repeat the entire synthesis at each stage of the deconvolution. One simply adds the previously determined preferred synthons to the reserved sublibraries.

With peptide and oligonucleotide libraries prepared by split synthesis, the structure of a bound ligand on an individual bead can be determined by microsequencing [17,18]. This approach has been applied to testing libraries where the bead-bound compounds are evaluated for their ability to bind to monoclonal antibodies or other recognition macromolecules. In one such approach, fluorescence-based methods are used to select the most active beads; that is, the most fluorescent beads can be sorted by deflection and the structure of the library member can then be determined by sequencing.
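The iterative deconvolution of the eight-membered library (three steps, two synthons per step) can be sketched in a few lines of Python. This is an illustrative model only: the activity values are invented, the synthon labels are hypothetical, and the read-out of a sublibrary is assumed to be the mean activity of its members, which real assays need not obey.

```python
from itertools import product


def deconvolute(synthon_sets, activity):
    """Identify the preferred synthon at each step, last step first."""
    preferred_tail = ()  # preferred synthons already fixed for the later steps
    for step in reversed(range(len(synthon_sets))):
        best, best_score = None, float("-inf")
        for candidate in synthon_sets[step]:
            # Sublibrary: all combinations of the earlier steps, this
            # candidate at the current step, then the known preferred tail.
            members = [head + (candidate,) + preferred_tail
                       for head in product(*synthon_sets[:step])]
            score = sum(activity(m) for m in members) / len(members)
            if score > best_score:
                best, best_score = candidate, score
        preferred_tail = (best,) + preferred_tail
    return preferred_tail


steps = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]


def toy_activity(member):
    # Hypothetical assay: one strongly active compound, weak background.
    return 1.0 if member == ("A2", "B1", "C2") else 0.05


print(deconvolute(steps, toy_activity))  # ('A2', 'B1', 'C2')
```

In recursive deconvolution the `members` of each round would come from retained samples rather than from a repeated synthesis; the selection logic is the same. If many weakly active members shared a pool, the mean read-out could favour a pool that does not contain the single best compound.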

Fig. 4. Deconvolution of split synthesis sublibraries.

Mass spectrometry and structure determination

A second highly sensitive microanalytical method appropriate for structure determination in libraries prepared by split synthesis is mass spectrometry [19–25]. The amount of releasable compound from the typical 130 µm bead used in split synthesis is approximately 200 pmol. Since most of this material will be used in bioassays, only the most sensitive methods such as mass spectrometry offer significant potential. However, the method is not ideal for unambiguous identification from the molecular ion, if one is present at all, particularly because of molecular weight degeneracy. The general utility and reliability of the method has not been established for large libraries of small nonoligomeric compounds where essentially all the material on the bead has been utilized in bioassays. Attempts have been made to circumvent the limitation of molecular weight degeneracy by a judicious choice of the pool members [26,27] or by adding a small percentage of a capping reagent at each step. With this latter method, mass spectrometric analysis of the reaction products will exhibit molecular ions of the truncated capped sequences as well as the full oligomer [22].

NMR using the magic angle spinning (MAS) technique has also been used to determine the structure of a bound ligand [28–30]. In one case complete assignment was made to Wang resin-bound Fmoc-lysine Boc using MAS HMQC and TOCSY data [30].
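Molecular weight degeneracy is easy to demonstrate numerically. The sketch below enumerates all tripeptides built from a small, arbitrarily chosen set of eight amino acids and groups them by mass. The residue masses are approximate monoisotopic values stored as integers in units of 10^-5 Da so that sums are exact; the choice of residues and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Approximate monoisotopic residue masses, in units of 1e-5 Da.
RESIDUE_MASS = {
    "G": 5702146, "A": 7103711, "S": 8703203, "V": 9906841,
    "L": 11308406, "I": 11308406, "Q": 12805858, "K": 12809496,
}
WATER = 1801056  # added once per linear peptide


def group_by_mass(length, nominal=False):
    """Group all sequences of the given length by molecular mass."""
    groups = defaultdict(list)
    for seq in product(RESIDUE_MASS, repeat=length):
        mass = sum(RESIDUE_MASS[r] for r in seq) + WATER
        if nominal:
            mass = (mass + 50000) // 100000  # round to the nearest whole Da
        groups[mass].append("".join(seq))
    return groups


exact = group_by_mass(3)
nominal = group_by_mass(3, nominal=True)
n_total = sum(len(g) for g in exact.values())
n_unique = sum(1 for g in exact.values() if len(g) == 1)
print(n_total, len(exact), len(nominal))  # sequences, exact masses, nominal masses
print(n_unique)                           # sequences identifiable by mass alone
```

Sequence permutations and the isobaric pair Leu/Ile collapse onto the same exact mass, and at unit resolution Lys and Gln merge as well, so only a small fraction of the members can be identified from the molecular ion alone. This is the limitation that a judicious choice of pool members or a capping ladder is meant to address.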

Encoded combinatorial libraries

The simplest strategy for determining the structure of a compound prepared on a single bead is encoding. This indirect method of structure determination can be used both with testing methods that involve bead-bound ligands and with solution methods that use material removed from the solid support.

Four encoding technologies have been developed: oligonucleotides [31,32], peptides [33], molecular tags [34,35] and radio frequency [36,37] encoding.

Peptide encoding strands

With peptides as the encoding units, an alternating synthesis of the ligand and the coding strand is used. An orthogonal protecting group approach using base-labile N-Fmoc and acid-labile N-MOZ substituents has been developed. This allows independent elongation of the coding strand and the ligand, each built from one of the amino groups of a lysine spacer. The structure of the ligand can be inferred by microsequencing of the coding strand using Edman degradation. This method and the related approach described by Lebl and co-workers [38] have been applied mainly to the structure determination of nonsequenceable peptides where unnatural amino acids had been incorporated into the ligand strand.

Oligonucleotide encoding strands

The use of DNA as an encoding strand was first proposed by Brenner and Lerner [31] and pioneered by Needels et al. [32]. Because of the amplifying power of the polymerase chain reaction (PCR), only trace amounts of the encoding strand are incorporated onto the polymeric support. The method has been applied to peptide libraries and requires an orthogonal protection strategy for alternating the elongation between the coding strand and the ligand. Such differential protection allows the controlled growth of both chains. After biological evaluation, the DNA strand may be amplified by PCR and then microsequenced to identify the encoded ligand. The potential limitations of protection strategies for the DNA strand and its compatibility with the reagents and conditions used for nonoligomeric libraries may limit the generality of this encoding method.

Molecular tags as an encoding strategy

Molecular tags overcome the chemical sensitivity of, and the orthogonal protection required by, both peptide and oligonucleotide encoding. Two such methods have been developed and applied to the synthesis of both peptide and nonoligomeric libraries. One of these approaches [34] uses chemically robust haloaromatic tags in a binary encoding scheme to unambiguously define each synthon used in the library construction (Fig. 5). By defining each synthon used in each step on each bead, the reaction history of every solid support particle can be defined and the structure of the compound inferred. By using the encoding molecules in a binary manner, only the presence or absence of a given tag needs to be determined and only a few tags are needed to encode an enormous amount of information. With n tags, 2^n presence/absence patterns are possible, i.e., n bits of information can be encoded; the null set of tags is not used, since the absence of all tags in an encoding set may lead to some ambiguity in interpretation, which leaves 2^n − 1 usable codes per encoding set.

Fig. 5. Encoded combinatorial chemistry libraries.

The encoding tags are halophenoxy derivatives of aliphatic alcohols and are attached to either a photolabile linker as a carbonate or an oxidatively cleavable linker as an ether. The photolabile linker/tag complex is especially well suited to encode peptide libraries through attachment to approximately 1% of the amino groups of each amino acid synthon at each stage of the synthesis. For structure determination, the tags are released by photolysis at 360 nm. The oxidatively cleavable linker/tag complex has broader application and is introduced into the polymeric support by a rhodium-catalyzed carbene insertion. The tags are liberated from the solid support by treatment with ceric ammonium nitrate. For analysis, the released tags are first silylated and then separated by gas chromatography. Since the tags are electrophoric, they can be analyzed in subpicomole quantity by electron capture detection. The structure of the compound is deduced from the resulting chromatogram. It is not only the ease of introduction, facile detachment, rapid determination and straightforward interpretation that make these electrophoric encoding molecules so useful, but also the chemical stability of the tags. This stability toward reactants and reagents used for nonoligomeric libraries is a critical component in their broad applicability: the availability of this robust encoding system has made split synthesis adaptable to the preparation of large nonoligomeric libraries.

A second encoding method [35] using chemically robust tags has been developed. In this approach secondary amines are incorporated into a poly-N-(dialkylcarbamoylmethyl)glycine strand. A defined set of N-protected N-(dialkylcarbamoylmethyl)glycines is added to the amine group of an N-Boc glycine amide of functionalized TentaGel resin (Fig. 6). After deprotection, chain elongation is accomplished by adding further tagging units. For decoding, the secondary amine tags are released by acid hydrolysis using 6 N HCl at 130–140°C in a sealed tube for 15 h. The tags may be read directly using electrospray mass spectrometry or by capture with dansyl chloride and analysis by HPLC.

Fig. 6. Amine encoding strand.

Radio frequency encoding

Radio frequency encoding is a nonchemical approach used to define the reaction history experienced by a set of solid supports. The method, at this stage of its development, is really a parallel synthesis technique in which the synthons used are not defined by a position in space but are recorded using radio frequency signals on a semiconductor memory device. In practice an 8 × 1 × 1 mm semiconductor memory microchip capable of receiving, storing and emitting radio frequency signals is placed in a porous container along with solid support beads. A group of these 'tea bags' is used in a split synthesis paradigm and a specific radio frequency signal is recorded on the memory chip for each synthon at each step. A 96-membered peptide library was prepared using 96 such tea bags. Simply reading the radio frequency code from the chip in any particular tea bag defined the reaction path experienced by the beads in that packet [36].

In a similar approach, a pretuned glass-encased microchip set to emit a unique binary code is placed in a polypropylene tea bag loaded with polystyrene beads. Using a modified split synthesis approach, a 125-membered tripeptide library N-capped as the p-carboxycinnamic acid amide was prepared on Rink resin. Each porous reactor contained a radio frequency transponder, which successfully defined the structure of two inhibitors of protein tyrosine phosphatase [37].
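The binary tag scheme described above for molecular tags (Fig. 5) can be sketched as a small encode/decode routine. The sketch assumes that each synthetic step has its own set of tags and that each synthon is assigned a non-empty subset of that set. The tag names (T1 to T9), the synthon labels and the function names are hypothetical and are not taken from the cited work.

```python
from itertools import product


def make_codebook(synthons, tags):
    """Assign each synthon a non-empty subset of tags (binary code)."""
    if len(synthons) > 2 ** len(tags) - 1:
        raise ValueError("not enough tags for this many synthons")
    codebook = {}
    for code, synthon in enumerate(synthons, start=1):  # start=1 skips the null set
        codebook[synthon] = frozenset(
            tag for bit, tag in enumerate(tags) if (code >> bit) & 1
        )
    return codebook


def encode(history, codebooks):
    """Return the set of tags attached to a bead with this reaction history."""
    tags = set()
    for synthon, codebook in zip(history, codebooks):
        tags |= codebook[synthon]
    return tags


def decode(detected, codebooks):
    """Recover the reaction history from the tags detected on a bead."""
    history = []
    for codebook in codebooks:
        step_tags = frozenset().union(*codebook.values())
        present = frozenset(detected) & step_tags
        lookup = {tag_set: synthon for synthon, tag_set in codebook.items()}
        history.append(lookup[present])
    return tuple(history)


# Three steps, seven synthons per step, three tags per step (2**3 - 1 = 7).
step_synthons = [[f"{letter}{i}" for i in range(1, 8)] for letter in "ABC"]
step_tags = [["T1", "T2", "T3"], ["T4", "T5", "T6"], ["T7", "T8", "T9"]]
codebooks = [make_codebook(s, t) for s, t in zip(step_synthons, step_tags)]

bead_tags = encode(("A5", "B2", "C7"), codebooks)
print(sorted(bead_tags))
print(decode(bead_tags, codebooks))  # ('A5', 'B2', 'C7')
```

In this toy setting nine tags are enough to identify any of the 7 × 7 × 7 = 343 library members, and decoding needs only the presence or absence of each tag, which mirrors the reading of a chromatogram described in the text.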

Conclusions

When the structure of a bioactive library member cannot reliably or consistently be determined by classical methods, a deconvolution strategy may be employed. This situation usually arises when a library is generated by mixture synthesis or by split synthesis. With peptide libraries the bioanalysis/resynthesis method has been successfully used to define the most active sublibrary at each stage during an iterative resynthesis. This process provides only limited structure–activity relationships (SAR) and the results can be confounded by additivity effects produced by the presence of many weakly active members in any one pool. The related position scanning approach offers the possibility of greater SAR information, but involves a significant resynthesis of potentially active members and also, with large pool sizes, lends itself to additivity effects.

Encoding techniques offer a powerful approach to rapidly dissecting out significant SAR information from an entire library. Among the methods available, the chemically inert tags offer the greatest promise. These encoding molecules allow a significant breadth in the chemistry that can be used for the synthesis of the library members. One of the most recently defined techniques, radio frequency encoding, depends on the chemically inert nature of the microchip. This property lends itself to the needed wide-ranging chemistry, which is restricted only by the nature of the polymeric support and the linking element used for compound detachment. However, the method is currently not easily adaptable to large combinatorial libraries. For the approach to become a truly useful tool in drug discovery, it must be capable of encoding large nonoligomeric libraries. It is only with large, well-designed libraries that high-quality leads can be found and useful SAR produced.

References

1 Geysen, H.M., Rodda, S.J. and Mason, T.J., Mol. Immunol., 23 (1986) 709.
2 Geysen, H.M., Rodda, S.J., Mason, T.J., Tribbick, G. and Schoofs, P.G., J. Immunol. Meth., 102 (1987) 259.
3 Bray, A.M., Maeji, N.J. and Geysen, H.M., Tetrahedron Lett., 31 (1990) 5811.
4 Geysen, H.M. and Mason, T.J., Bioorg. Med. Chem. Lett., 3 (1993) 397.
5 Houghten, R.A., Pinilla, C., Blondelle, S.E., Appel, J.R., Dooley, C.T. and Cuervo, J.H., Nature, 354 (1991) 84.
6 Houghten, R.A., Appel, J.R., Blondelle, S.E., Cuervo, J.H., Dooley, C.T. and Pinilla, C., BioTechniques, 13 (1992) 412.
7 Houghten, R.A. and Dooley, C.T., Bioorg. Med. Chem. Lett., 3 (1993) 405.
8 Pinilla, C., Appel, J.R., Blanc, P. and Houghten, R.A., BioTechniques, 13 (1992) 901.
9 Dooley, C.T. and Houghten, R.A., Life Sci., 52 (1993) 1509.
10 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., 14th International Congress of Biochemistry, Vol. 5, Prague, Czechoslovakia, 10–15 July 1988, Walter de Gruyter, Berlin, Germany, 1989, p. 47 (abstract).
11 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., 10th International Symposium on Medicinal Chemistry, Budapest, Hungary, 15–19 August 1988, Elsevier, Amsterdam, The Netherlands, 1989, p. 288 (abstract).
12 Furka, A., Sebestyen, F., Asgedom, M. and Dibo, G., Int. J. Pept. Protein Res., 37 (1991) 487.
13 Lam, K.S., Salmon, S.E., Hersh, E.M., Hruby, V.J., Kazmierski, W.M. and Knapp, R.J., Nature, 354 (1991) 82.
14 Dooley, C.T., Chung, N.N., Wilkes, B.C., Schiller, P.W., Bidlack, J.M., Pasternak, G.W. and Houghten, R.A., Science, 266 (1994) 2019.
15 Blondelle, S.E., Takahashi, S.E., Weber, P.A. and Houghten, R.A., Antimicrob. Agents Chemother., 38 (1994) 2280.
16 Erb, E., Janda, K.D. and Brenner, S., Proc. Natl. Acad. Sci. USA, 91 (1994) 11422.
17 Meldal, M., Svendsen, I., Breddam, K. and Auzanneau, F.-I., Proc. Natl. Acad. Sci. USA, 91 (1994) 3314.
18 Needels, M.C., Jones, D.G., Tate, E.H., Heinkel, G.L., Kochersperger, L.M., Dower, W.J., Barrett, R.W. and Gallop, M.A., Proc. Natl. Acad. Sci. USA, 90 (1993) 10700.
19 Stankova, M., Issakova, O., Sepetov, N.F., Krchnák, V., Lam, K.S. and Lebl, M., Drug Dev. Res., 33 (1994) 146.
20 Chen, C., Randall, L.A.A., Miller, R.B., Jones, A.D. and Kurth, M.J., J. Am. Chem. Soc., 116 (1994) 2661.
21 Brown, B.B., Wagner, D.S. and Geysen, H.M., Mol. Div., 1 (1995) 4.
22 Egner, B.J., Langley, G.J. and Bradley, M., J. Org. Chem., 60 (1995) 2652.
23 Zambia, R., Boulton, A. and Griffin, P.R., Tetrahedron Lett., 35 (1994) 4283.
24 Brummel, C.L., Lee, I.N.W., Zhou, Y. and Benkovic, S.J., Science, 264 (1994) 399.
25 Brummel, C.L., Vickerman, J.C., Carr, S.A., Hemling, M.E., Roberts, ED., Johnson, W., Weinstock, J., Gaitanopoulos, D., Benkovic, S.J. and Winograd, N., Anal. Chem., 68 (1996) 237.
26 Dunayevskiv, Y.M., Vouros, P., Carell, T., Wintner, E.A. and Rebek, J., Anal. Chem., 67 (1995) 2906.
27 Carell, T., Wintner, E.A., Sutherland, A.J., Rebek, J., Dunayevskiv, Y.M. and Vouros, P., Chem. Biol., 2 (1995) 171.
28 Fitch, W.L., Detre, G., Holmes, C.P., Shoolery, J.N. and Keifer, P.A., J. Org. Chem., 59 (1994) 7955.
29 Anderson, R.C., Jarema, M.A., Shapiro, M.J., Stokes, J.P. and Zilioz, M., J. Org. Chem., 60 (1995) 2650.
30 Anderson, R.C., Stokes, J.P. and Shapiro, M.J., Tetrahedron Lett., 36 (1995) 5311.
31 Brenner, S. and Lerner, R.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 5381.
32 Needels, M.C., Jones, D.G., Tate, E.H., Heinkel, G.L., Kochersperger, L.M., Dower, W.J., Barrett, R.W. and Gallop, M.A., Proc. Natl. Acad. Sci. USA, 90 (1993) 10700.
33 Kerr, J.M., Banville, S.C. and Zuckermann, R.N., J. Am. Chem. Soc., 115 (1993) 2529.
34 Ohlmeyer, M.H.J., Swanson, R.N., Dillard, L.W., Reader, J.C., Asouline, G., Kobayashi, R., Wigler, M. and Still, W.C., Proc. Natl. Acad. Sci. USA, 90 (1993) 10922.
35 Ni, Z., Maclean, D., Holmes, C.P., Murphy, M.M., Ruhland, B., Jacobs, J.W., Gordon, E.M. and Gallop, M.A., J. Med. Chem., 39 (1996) 1601.
36 Nicolaou, K.C., Ziao, X.Y., Parandoosh, Z., Senyei, A. and Nova, M.P., Angew. Chem., Int. Ed. Engl., 34 (1995) 2289.
37 Moran, E.J., Sarshar, S., Cargil, J.F., Shahbaz, M.M., Lio, A., Mjalli, A.M.M. and Armstrong, R.W., J. Am. Chem. Soc., 117 (1995) 10787.
38 Nikolaiev, V., Stierandova, A., Krchnák, V., Seligmann, B., Lam, K.E., Salmon, S.E. and Lebl, M., Pept. Res., 6 (1993) 161.

Patent strategies in molecular diversity

Karl Bozicevic
Fish & Richardson P.C., 2200 Sand Hill Road, Menlo Park, CA 94025, U.S.A.

Introduction

Advocating the property rights of clients is the very essence of what lawyers do. The skill and strategy of that advocacy is an important factor in dictating the results obtained and, specifically, the economic return one can obtain on an intellectual property portfolio. Intellectual property (e.g., patent rights) is a specific type of property which shares legal characteristics with real property, i.e., land ownership. Understanding the similarities and differences between real and intellectual property provides a basis for understanding how it is possible to protect property in the form of inventive methods of chemical synthesis and libraries of compounds in the field of molecular diversity. Researchers may incorrectly assume that it is necessary to isolate a compound with a desired activity from a library to obtain patent protection. However, scores of patents already exist in the field of molecular diversity directed to methods of making libraries, devices used in these methodologies, assay methodologies and libraries themselves. The following provides a discussion of some of those patents along with an analysis of specific strategies applied to protect the inventions and thereby provide a means for obtaining a lucrative economic return on the research investment.

Real and intellectual property

In a very basic legal sense, real property (succinctly defined as land; however, it includes things that are permanently fixed to land and is technically a right, interest or ownership existing in the soil) is based on ownership of land, and intellectual property (broadly speaking, a property right resulting from the physical manifestation of original thought) is based on ownership of ideas (see Table 1). But what does it mean to ‘own’ land or an idea? Most non-lawyers have little more than a vague idea of what it legally means to ‘own’ anything and as such are often surprised to find their ownership interest is almost never complete, i.e., absolute (an absolute right in property providing the holder with a complete, unqualified, and unconditional possession, control, dominion, and right of disposition which descends to one’s heirs upon death). In a legal sense, ownership interests relate to the degree to which the owner can use and enjoy the property and exclude others from it.

If one owns a home on a piece of land, the right to use and enjoy the land is defined in a deed (a writing signed by the grantor whereby title to realty is transferred) describing rights which are generally sufficient but far from complete. The owner’s use of the land is generally limited to a single family home, i.e., the owner cannot use the land to construct a multiple dwelling structure in the form of a multistory apartment building and certainly cannot use the land as a dump or to build a factory or other business. The owner can generally exclude others from the land but this right to exclude is also limited. Utility companies will generally have ‘easements’ (a right of use over the property of another) over the land which provide a right to run electric, phone, water, gas, and sewage lines over or under the land. Such easements generally include a right to enter the land to repair and maintain the lines. The ownership interests are also limited in time. Although one can own the property for life and, in general, will the property to an heir, the ability to control ownership into future generations is quite limited. Many jurisdictions have a ‘rule against perpetuities’, meaning that one cannot convey property beyond the time of lives in being plus 21 years.

TABLE 1
CHARACTERISTICS OF REAL VERSUS INTELLECTUAL PROPERTY

Characteristic | Real property | Intellectual property
How acquired | Originally discovered and claimed – then conveyed by deed | Originally invented, authored or conceived and protected by obtaining patents, copyrights, trademarks, or maintaining as trade secret
Defined by | Boundary lines – the metes and bounds recited on a deed | Patents by what is claimed and at times what is equivalent thereto
Duration | Defined by the ownership interests of the deed – period of time can be very long or short | Patents valid after issuance and run for 20 years from the filing date with some possibilities to extend
Value relates to | Size and location where others want to be | Breadth of claims and degree to which others want to use what is protected
How protected | Action brought for trespass in state court | Action brought for infringement in federal court

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 298–313 © 1997 ESCOM Science Publishers B.V.

Types of intellectual property

Intellectual property (see Table 2) has some basic similarities to real property and noting this is useful in understanding what it means to legally own the products of intellectual creativity. The deed to land is held by the owner and describes the metes and bounds (the boundary lines of land, with their terminal points and angles) of the property. The deed to intellectual property is in the form of a patent (the term ‘patent’ has a variety of meanings, but is used here to refer to a right, in the U.S., to exclusive manufacture, use, and sale of an invention under Title 35 of the U.S. Code), copyright (a right to exclude others from copying a work of authorship as per Title 17 of the U.S. Code), or trademark (a mark of authenticity by which goods or services are distinguished from those of others), all of which are issued by the Federal Government.

A trademark is a name, symbol, and/or color (the pink color on insulation was entitled to protection and registration, see Ref. 1) placed on a product to indicate the source of origin of the product. Trademarks can be continually renewed by grants from the Federal Government under Title 15 of the U.S. Code. Although devices and libraries created in the field of molecular diversity could be sold under a particular trademark (e.g., DIVERSOlibros™), the trademark would not prevent others from making or selling a similar product provided it was sold under a different name; the difference must be sufficient such that the marks are not ‘confusingly similar’ to each other.

A copyright can be issued to protect works such as writings, music, movies, photos, etc. (any original work of authorship fixed in any tangible medium of expression, see 17 USC §102). The right granted allows the copyright owner to prevent others from copying the particular manner of expression (for example, others may copy the idea of the ‘Wizard of Oz’ where a girl is separated from her home, has adventures and finds her way home, but may not use the same characters and script to tell the story), but does not prevent others from using the same or equivalent ideas and concepts. For this reason, copyright protection is of little use in protecting the creative output of researchers dealing with the field of molecular diversity. Thus, patents are what is relied on to obtain protection.

In some instances, it is possible to obtain protection by maintaining the invention as a trade secret. A trade secret may consist of any formula, pattern, device, or compilation of information which is used in one's business, and which gives one an opportunity to obtain an advantage over competitors who do not know or use it. It may be a formula for a chemical compound, a process of manufacturing, treating or preserving materials, a pattern for a machine or other device, or a list of customers. Others may not illegally take trade secret information, but they may independently develop it and thereafter freely use such information. The characteristics of the main forms of protection are summarized in Table 2 in a generalized and simplified manner.

TABLE 2
TYPES OF INTELLECTUAL PROPERTY COMPARED

Characteristic | Patent | Trademark | Copyright
Time period | Twenty years from filing | Can be perpetual if renewed | Life of author plus 50 years
Legal requirements | Novel, unobvious and fully disclosed | Distinct, not confusingly similar to the mark of another | Original work of authorship
Costs | Generally US$ 5000–20 000 | Generally less than US$ 1000 | Generally less than US$ 1000
How to obtain | File application fully enabling with disclosure of best mode | File application with mark and use in interstate commerce | Reduce any original work of authorship to tangible medium and file to register
What is protectable | Any new and useful process, machine, manufacture or composition of matter – includes compounds, viruses, cells, or animals but not a method of doing business | Any word, name, phrase, color, symbol or combination thereof but not descriptive or confusingly similar marks | Any original work of authorship reduced to a tangible medium, e.g., books, movies, records, photos, music

A U.S.
patent is a form of intellectual property which is granted to a patentee inventor, for a limited time, by the Federal Government in exchange for the inventor having created

Patent strategies in molecular diversity

and disclosed to the world the invention covered by the claims of the patent (see 35 USC §271). Like a deed to real property, a patent to intellectual property allows the owner to exclude others. The land owner can sue for the tort of trespass (any unlawful interference with one’s property right) by (i) proving a valid deed is held; and (ii) showing that another has, without the right to do so, trespassed on the land covered by the deed. When (i) and (ii) are proven, a judgement is obtained and generally the trespasser will voluntarily leave the land. However, if the trespasser refuses to leave even after the judgement is obtained, the party obtaining the judgement executes the judgement via the local sheriff who is authorized to use force, if needed, to remove the trespasser. Thus, the owner’s right to exclude is fully exercised. A patentee can sue a trespasser or, in the case of patents, an infringer (in general, one who makes, uses, or sells a patented invention without the authority of a patentee, but see 35 USC §271(a)–(g), which are discussed further below) in a similar manner. The patentee must prove that (i) a valid patent is held; and (ii) the infringer is making, using, or selling that which is covered by a claim of the patent. When (i) and (ii) are proven, a judgement is obtained and the infringer will stop the infringing activity either voluntarily or by being forced to stop by federal marshals. A deed, as with a patent, can be held invalid for a variety of reasons. (Patents are most often invalidated by showing that the subject matter being patented was known by others before the patentee or was obvious in view of what was known before under two sections of the federal statutes dealing with patents, 35 USC §102 or §103.) However, when a deed to land is held invalid, the ownership rights will end up being held by another, i.e., there is no land without an owner. 
When a patent is held invalid, it is possible that the ownership rights will go to another such as a prior inventor (such disputes are often handled in a proceeding called an Interference as per 35 USC §146), but generally the rights evaporate and are held by no one. Thus, the public is free to make, use, and sell that which was covered by the patent. A patent does not confer a right to make, use, and sell, only the right to exclude others. If no one has the right to exclude, all are free to make, use, and sell that which was covered by the invalidated patent, provided there is (are) no other patent(s) in place preventing such.

TABLE 3
TYPES OF PATENT CLAIMS

Type | Covered | Not covered | Used when
Product claims | Compounds, libraries of compounds, phage, assay devices | Method of using compounds, more efficient way of making compounds, can still be patented by others | The product is new
Process claims | Methods of making and using compounds | Compounds made by different methods | The process is new and the compound is old
Product-by-process claims | Varies – can cover compounds made by any process, may be limited to process step | Varies depending on whether the compound is new | Best when compound or composition can only be described by the process by which it is made


K. Bozicevic

Deeds and patents are also alike in that the value of each is generally greater if they claim rights to a large area of property. A deed to one square inch of property by itself is of little value in that the owner may not have access to the property and others can avoid trespass easily, e.g., by walking around and not over that square inch. When the claims of a patent are narrowly focussed, others can easily ‘design around’ that which is claimed and the patent has little or no value. Some deeds cover vast areas but are still of little value. For example, a deed to hundreds of acres in Antarctica would have little value in that few people want to go on the land. The same is true with respect to patents. A patent which covers a large segment of technology which no one wants to use is of little or no value. Thus, the trick to increasing value (for deeds and patents) is to cover a large area of property located where others want to be. The following explains strategies used to obtain valuable property rights in technologies relating to molecular diversity.

Types of patent claims

The claims of a patent are what describe the metes and bounds of the property owned. If a claim ‘reads’ on a product, then that product infringes the claim and the patentee can prevent others from making, using, or selling the product (see 35 USC §271 for all rights to exclude afforded by patents). A claim ‘reads’ on a product if the product includes a component or step which corresponds to each component or step of the claim. The product may have additional components or steps and still infringe. By analogy, one may cross over several different pieces of land every day and be a trespasser on your land if one of the pieces of property traversed is covered by your deed. However, if the claim includes elements or steps which do not correspond to elements or steps of the product, there is no infringement as the claim does not ‘read’ on the product. Claims are, at times, read under the ‘doctrine of equivalents’ whereby, when there is no literal infringement or ‘reading on’, infringement is found if the element does substantially the same thing in substantially the same way to obtain substantially the same results. For example, if you invent a table and claim a horizontal support surface (the table top) and four vertical supports (the four legs), the making of a three-legged table does not infringe the claims – there is nothing for the fourth claimed leg to read on. Thus, broad claims which recite the minimum number of elements are more likely to be infringed. However, if a claim is too broad it will be held invalid because it ‘reads on’ prior art, i.e., it covers previously known and disclosed inventions (one can only get a patent if the invention was not previously known by others or obvious over what was known, see 35 USC §102 and §103). Thus, using the table analogy, a claim to a one-legged table would be best in terms of encompassing infringers.
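The idea of a claim ‘reading on’ a product can be illustrated with a small sketch. It is only a toy model of literal infringement, assuming that a claim and a product can each be written as a simple list of elements; the function name `reads_on` and the element labels are hypothetical, and neither the doctrine of equivalents nor the validity of the claim is modelled.

```python
from collections import Counter


def reads_on(claim_elements, product_elements):
    """Toy test of literal 'reading on'.

    The claim reads on the product when every claimed element is present
    in the product at least as many times as the claim recites it.
    Extra elements in the product do not avoid infringement.
    """
    claimed = Counter(claim_elements)
    present = Counter(product_elements)
    return all(present[element] >= count for element, count in claimed.items())


# Claims from the table analogy (labels are illustrative only).
four_leg_claim = ["top", "leg", "leg", "leg", "leg"]
one_leg_claim = ["top", "leg"]

# Products that might be accused of infringement.
three_leg_table = ["top", "leg", "leg", "leg"]
four_leg_table_with_drawer = ["top", "leg", "leg", "leg", "leg", "drawer"]

print(reads_on(four_leg_claim, three_leg_table))             # False
print(reads_on(four_leg_claim, four_leg_table_with_drawer))  # True
print(reads_on(one_leg_claim, three_leg_table))              # True
```

The last line shows why a claim reciting fewer elements encompasses more potential infringers, which is the point of the table analogy.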
However, if one- and two-legged tables were known by others this would invalidate a claim to a one-legged table. A three- or four-legged table claim might be patentable over previously known one- or two-legged tables due to the greater stability provided by the extra leg(s).

There are basically three types of patent claims: (i) product or compound claims; (ii) process or method claims; and (iii) product-by-process claims (see Table 3). The name of each type of claim is self-defining. Product claims cover devices such as robotics [2] which are used in combinatorial chemistry. In addition, these claims can cover a particular compound, group of compounds or even an entire library of compounds [3]. Note that


Fig. 1. Patent application and priority dates.

a compound or product claim covers the compound regardless of how it is made. A process claim can cover a series of process steps used in the chemical synthesis of a compound or an inventive method used to produce any product. Note that a process claim may not be limited to producing any particular compound – it covers any process which carries out the claimed steps even though the process might produce a wide range of different products by using different reactants. Product-by-process claims are particularly useful when the product itself can only be described by the process used to make it. A product-by-process claim is subject to different interpretations depending on the circumstances and these are described further below. A ‘true’ product-by-process claim will cover all products of the type claimed even if produced by process steps different from those recited in the claims [4]. The main types of claims are summarized below.

Process claims

Some of the earlier work in molecular diversity was carried out by Rutter et al. [5] producing large libraries of peptides, i.e., a mixture of peptides which included not just hundreds but tens of thousands and even millions of peptides. The first claims to issue were method claims directed to one specific embodiment whereby activated amino acid reactants were reacted with acceptor amino acid reactants in amounts based on the reaction rate constants. An exemplary method claim from U.S. Patent No. 5 010 175 is the following:

What is claimed is: ‘A method of preparing in the same reaction vessel a mixture of peptides of distinct, unique and different sequences which mixture contains each peptide in retrievable and analyzable amounts and in substantially equal molar amounts, comprising: combining and reacting activated amino acids with an acceptor amino acid or peptide wherein the said activated amino acids are provided in concentrations relative to each other based on the relative coupling constants so that the mixture of the peptides resulting from the reaction contains each of the peptides in predictable and defined amounts sufficient for each of the peptides to be retrieved and analyzed.’

The above method claim is broad in scope in that it covers all methods which use a basic concept of the invention. However, it is narrow in two respects. First, it does not cover peptide libraries made by other methods [6]. Second, it is limited to a specific version of the basic method wherein the reaction product mixture includes each product in ‘substantially equal molar amounts’. These limitations were dealt with in related applications as described below. Specifically, the application was filed as a continuation application. (Under 35 USC §120, once a U.S. application is filed additional ‘continuation’ applications can be filed off it claiming priority back to the filing date of the first filed case, see 37 CFR 1.60.) The filing of an application establishes a priority date, and others wanting to invalidate the patent must prove a date earlier than that priority date to establish a prior invention.
A continuation application has the same priority date as the originally filed parent application (see Fig. 1). The claims of the continuation application were prosecuted without the limitation regarding equal molar amounts. An issued claim from U.S. Patent No. 5 225 533 reads as follows:

What is claimed is: ‘A method of preparing a mixture of distinct, unique and different peptides in the same vessel, which mixture contains each peptide in retrievable and analyzable amounts, comprising: combining and reacting activated amino acids with an acceptor amino acid or peptide wherein the said activated amino acids are provided in concentrations relative to each other based on the relative coupling constants so that the mixture of the peptides resulting from the reaction contains each of the peptides in predictable and defined amounts sufficient for each of the peptides to be retrieved and analyzed.’

The claim from the ’533 patent is broader than the claim from the ’175 patent. Both claim an inventive method for making a mixture of peptides whereby amino acid reactants are combined with each other based on their relative reaction rate constants. By combining reactants with each other in proportional amounts based on reaction rate constants, no individual or groups of individual products will dominate the reaction product mixture, thereby drowning out products produced from less reactive amino acid reactants. However, the ’533 claim is not limited to producing mixtures wherein each of the components of the mixture is present in substantially equal molar amounts. Thus, the strategy here was to obtain initial protection on a specific embodiment of the invention via the ’175 patent which included the ‘substantially equal molar’ limitation and then to expand the scope via the ’533 patent which eliminated that limitation in favor of the phrase ‘retrievable and analyzable amounts’.

Both of the above discussed claims are ‘method claims’ and as such would not cover peptide libraries made by different methods. The value of the claims is determined by their breadth (and these claims are fairly broad) and the degree to which others want to use the method. Methods of synthesis can vary greatly in efficiency and there is often more than one way to make a compound. If there is more than one way to make a compound, everyone will want to use the most efficient method. Thus, if others develop methods which are equivalent or more efficient than the patented method the value of the patent is decreased. This is a difficult problem to deal with when trying to obtain meaningful protection on process inventions. The coverage can be improved at times by (i) contemplating alternative methods of synthesis when filing the application on a primary method; and (ii) defining reactants via general structural formulae rather than specific moieties.

Product claims

At times, an inventive process is applied to produce an inventive product. When this occurs, the product can be patented, thereby making it possible for the inventors to prevent the product from being made by any method.
If a patent is obtained on the product, the patent would not prevent others from developing and patenting a more efficient method of synthesis. However, the product patent would dominate the field. More specifically, the holder of the product patent could prevent the holder of the improved method patent from commercializing the new efficient process as it might be applied to making the product via his patented process. In such a situation, both sides recognize the benefit provided by the other and a cross-license agreement is often reached, i.e., each patentee grants the other a license to operate under the other’s patent. Each side may pay the other a royalty based on sales. In general, the product patentee holds a dominant position in the negotiation of the agreement because the product patentee can commercialize the product by using a less efficient method of synthesis but the process patentee cannot commercialize the process unless it can be used to make products not covered by the product patent.

After obtaining the second broader process patent, Rutter et al. were able to obtain claims to a product, i.e., to libraries per se. Prior to the methods of Rutter et al., others made small libraries [7], but not libraries containing thousands of peptides wherein each peptide of the library was present in a relative amount such that it was not drowned out by the presence of other components of the library. Examples of claims to the library per se obtained in U.S. Patent No. 5 266 684 are provided below.

We claim: ‘(1) A predetermined mixture of peptides containing 8000 or more different peptides of


distinct, unique and different amino acid sequences, wherein the presence of each peptide in the mixture is predetermined, each peptide is present in the mixture in retrievable and analyzable amounts and the mixture includes at least one biologically active peptide in a retrievable and analyzable amount.
(2) A mixture as claimed in claim 1, wherein the mixture contains 160 000 or more different peptides of distinct, unique and different amino acid sequences, each in retrievable and analyzable amounts.
(3) The mixture as claimed in claim 1, wherein the mixture contains 3 200 000 or more different peptides of distinct, unique and different amino acid sequences, each in retrievable and analyzable amounts.
(4) The mixture as claimed in claim 1, wherein the mixture contains 64 000 000 or more different peptides of distinct, unique and different amino acid sequences, each in retrievable and analyzable amounts.’

Claim 1 of the ’684 patent indicates that the library contains 8000 or more peptides. Thus, any library containing 8000 or more peptides is covered by the claim regardless of how the library is made provided all the other claim limitations ‘read’ on the library. The dependent claims 2–4 might, at first, appear broader than the independent claim 1 because they cover larger libraries containing 160 000, 3 200 000 and 64 000 000 peptides. However, dependent claims are always narrower in scope than the independent claims upon which they depend and such is the situation here. A library containing 8000 peptides and having the other limitations of claim 1 would infringe claim 1 but not claim 2, which requires a library of 160 000 or more peptides to be infringed. Referring to the ‘table’ analogy above, claim 1 is to a one-legged table and claim 2 is to a two-legged table. Thus, claim 2 is narrower than claim 1 because claim 1 covers subject matter not covered by claim 2, i.e., libraries containing 8000–159 999 peptides.
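The relationship between the independent claim and its dependent claims can be sketched as a simple threshold check. This is a minimal illustration only: it assumes that every other limitation of the claims (predetermined presence, retrievable and analyzable amounts, at least one biologically active peptide) is already met, so that library size is the sole deciding factor. The names `CLAIM_MINIMUMS` and `claims_read_on` are hypothetical.

```python
# Minimum number of different peptides recited in claims 1-4 of the '684 patent,
# as quoted above. All other claim limitations are assumed to be satisfied.
CLAIM_MINIMUMS = {
    1: 8_000,
    2: 160_000,
    3: 3_200_000,
    4: 64_000_000,
}


def claims_read_on(library_size):
    """Return the claim numbers whose size limitation is met by the library."""
    return [
        claim
        for claim, minimum in sorted(CLAIM_MINIMUMS.items())
        if library_size >= minimum
    ]


for size in (7_999, 8_000, 159_999, 160_000, 64_000_000):
    print(size, claims_read_on(size))
```

Every library caught by claim 2 is also caught by claim 1, but not the reverse, which is why the dependent claim is the narrower of the two.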
A claim to a one-legged table covers both a one-legged and a two-legged table. However, a claim to a two-legged table does not cover a one-legged table.

Product-by-process claims

In addition to claiming libraries via a standard product claim, the ’684 patent includes a product-by-process claim as follows.

‘(5) A mixture of 8000 or more peptides with distinct, unique and different amino acid sequences, which mixture contains each of the 8000 or more peptides in retrievable and analyzable amounts, the mixture being produced by a process, comprising: combining and reacting activated amino acids with an acceptor amino acid or peptide wherein the activated amino acids are provided in concentrations relative to each other based on their relative coupling constant so that the mixture of the peptides resulting from the reaction contains reaction product peptides in amounts sufficient for any of the 8000 or more peptides to be retrieved and analyzed and wherein the mixture includes at least one biologically active peptide in a retrievable and analyzable amount.’

A product-by-process claim can be used in three different situations [4] as follows: (i) when the product is new and unobvious, but is not capable of independent definition; (ii) when the product is old or obvious, but the process is new; and (iii) when the product is


new and unobvious, but has a process-based limitation (e.g., an extruded composition). It is necessary to study the prosecution history of the application resulting in the patent to determine which situations apply. This allows for interpretation of the claim, i.e., determining if the claim covers all such products produced by any process [8] or only products produced using the process steps recited in the claim (see Ref. 4). Because product-by-process claims can be used in a variety of different situations and because they can be interpreted in different ways, such claims can form an important part of a strategy for protecting inventions in the field of molecular diversity. A strategy of using different types of claims and claims of the same type but different in scope can be applied to robotics [2] used in making peptides and nucleotide sequences, libraries created on the surface of phage [9], diagnostics [10] which use combinatorial libraries or to virtually any invention in the field of molecular diversity – limitations on the type and scope of the claim being dictated by the prior art (under 35 USC §102 and §103 the claim cannot cover that which was previously known or claim subject matter deemed to be obvious in view of that which was previously known) existing at the time of the invention and the extent of disclosure included in the patent application. (A patent application must fully describe and disclose the invention which is claimed. This is referred to as an enablement requirement under 35 USC §112.)

Enforcing patent rights

The vast majority of patents are never litigated. Without litigation, a patent can produce income and value via licensing royalties, encouraging investments, and providing a degree of market exclusivity which makes it possible to obtain a higher profit margin than would be possible without the patent. However, when a patentee is unable to dissuade others believed to be making, using, or selling a product or process covered by the patent, an infringement action can be filed. Understanding the different sections of the statute under which an infringement action can be filed is useful in understanding how an invention in the field of molecular diversity might be protected. There are at least five basically different types of infringement (see Fig. 2). (Infringement is covered by 35 USC §271. Although there are more than five sections to the statute and variations and combinations of each type of infringement, an effort at simplification is made here.) (i) Direct infringement (35 USC §271(a); the most common type of infringement) involves making, using, or selling a patented invention. (ii) Active inducement (35 USC §271(b); there must be a direct 271(a) infringer also) to infringe involves taking action which induces another to directly infringe a patent such as selling nonpatented compounds with instructions on how to use them to make a combinatorial library which is patented. (iii) Contributory infringement (35 USC §271(c); the thing sold must have substantially no noninfringing use) involves selling something such as a chemical synthesizer which is not patented but which can only be used for one purpose, which is infringing, such as carrying out a patented process for making a peptide library. (iv) Export/component infringement (35 USC §271(f)) is where one gathers nonpatented components together and exports them out of the U.S. for later assembly into a product which would infringe if made in the U.S.
(v) Product of patented process infringement is where one sells a product made by a process patented in the U.S. (35 USC §271(g). This provides broader coverage


*Note that 271(b) and (c) infringement require that someone be a 271(a) direct infringer. Fig. 2. Types of infringement.

to process claims by allowing a holder of a process patent to sue one who sells a product made by the patented process when that process was carried out outside the U.S.). By covering inventions via the different types of claims described above and availing oneself of the different types of infringement actions, it is possible to develop a range of strategies for protecting inventions in the field of molecular diversity. It is well known, for example, to use combinatorial libraries to search for one or more active compounds which bind to a given receptor. Claiming the active compounds found would be important, but it might also be useful to claim specific types of libraries likely to contain such compounds and methods of isolating related compounds from the library. Such claims would aid in preventing others from locating and developing related compounds. Libraries or subgroups of libraries can be products themselves which could be covered by product and/or product-by-process claims. Claims to the library need not describe in detail each or any particular molecule in the library. The claim need only clearly and distinctly claim the boundaries of the library as a whole such that those reading the claim understand when they are infringing – one does not infringe such a claim by making any particular compound but by making the library. The value of library claims such as those described above in the ’684 patent is understood when one considers the ‘split resin’ methodology [11] which was developed after the ‘reaction rate constant’ method described above (see ‘Process claims’). Using the split resin


method (i) an amino acyl resin is divided into a number of pools; (ii) each pool is reacted with a different amino acid reactant and each reaction is driven to completion; (iii) the reaction products obtained in the pools are combined and thereafter steps (i)–(iii) may be repeated any number of times. With each cycle (i.e., repeating steps (i)–(iii)), the number of reaction products obtained increases by a multiple of the number of pools. Thus, using 20 pools for 20 different amino acid reactants the number of reaction products increases 20-fold in each cycle. Accordingly, by using six cycles one can create a library of 20⁶ or 64 000 000 different peptides. Because each reaction is driven to completion, there is no need to calculate reaction rate constants or to combine reactants with each other proportionally based on reaction rate constants. Thus, the ‘split resin’ method can obtain the same results as the ‘reaction rate constant’ method with much less difficulty. However, due to the product claims obtained in the ’684 patent, any use of the split resin method to make a library containing 8000 or more peptides would directly infringe the product claims as well as the product-by-process claims of the ’684 patent. This infringement would exist notwithstanding the patentability of the ‘split resin’ method. The split resin method was patented via method claims. Those claims would not directly cover a mechanical synthesizer which might be used to carry out the method. However, if the synthesizer did not have a substantial noninfringing use then the sale of the synthesizer would constitute contributory infringement under §271(c), provided it could be shown that at least one purchaser of the synthesizer carried out the claimed method, thereby committing direct infringement under §271(a).
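The growth in library size described for the split resin method can be checked with a few lines of arithmetic. The sketch below assumes, as in the text, that every pool receives a different amino acid reactant and that every coupling goes to completion; the function names `library_size` and `cycles_needed` are hypothetical.

```python
def library_size(pools, cycles):
    """Number of distinct peptides after the given number of split/react/combine cycles."""
    return pools ** cycles


def cycles_needed(pools, target_size):
    """Smallest number of cycles that yields at least target_size distinct peptides."""
    cycles = 0
    size = 1
    while size < target_size:
        size *= pools
        cycles += 1
    return cycles


for n in range(1, 7):
    print(n, library_size(20, n))
# 1 20
# 2 400
# 3 8000
# 4 160000
# 5 3200000
# 6 64000000

print(cycles_needed(20, 8_000))  # 3
```

With 20 pools, three cycles are already enough to reach the 8000-peptide threshold recited in claim 1 of the ’684 patent, and the figures for three to six cycles coincide with the library sizes recited in claims 1–4.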
Thus, by proceeding under a 271(c) contributory infringement action the patentee can stop the infringement at its source and not be burdened with suing a large number of individual purchasers, i.e., the manufacturer of the synthesizer is sued and not the hundreds of purchasers of the synthesizer who carry out the method and are direct infringers.

Recent developments

Until recently, patents had a term of 17 years from issuance (35 USC §154 prior to 8 June 1995). Allowing the patent term to run from the date of issuance and not from the date of filing of an application for a patent was particularly important for patent applications in technologies where (i) the prosecution period (i.e., the period from application filing to patent issuance) is long; and (ii) the technology continues to retain value many years after the patent issues. Patent applications filed after 7 June 1995 which issue as a patent will have a term of 20 years (§534 of Pub. L. 103-465 amended 35 USC §154 as of 8 June 1995) from the date of filing of the application, i.e., not a term which runs from the date the patent issues. This creates no problems in some technologies dealing with molecular diversity because (i) the technology is out of date after a short time, e.g., 10 years or less, so that the shorter patent term still provides protection when the technology has value; and/or (ii) the patent application is issued as a patent in 3 years or less from the date of filing, thereby providing a patent term of 17 years or more. The United States Patent Office publishes statistics claiming an application pendency period of about 18 months. However, the number is deceptive in that applications are pending for longer average times in some technologies. Further, the application may go abandoned and be refiled as a continuation

TABLE 4
RECENT PATENT LAW CHANGES

Provision | Old law | New law
Patent term | Seventeen years from issuance | Twenty years from application filing
Patentability under §103 | Method needed to be independently patentable over prior methods | Method patentable if product produced is patentable – biotech area only

application several times before it issues. Although the Patent Office would measure each application to get the 18 month number, the real pendency period is the total of all the applications, i.e., the time from the first filing to issuance of the last filed continuation application, which is generally several times the 18 month period. Applications claiming technologies involving the manipulation of DNA such as in the preparation and use of phage display libraries often take longer than 3 years to prosecute. Thus, the patent will generally have a term of less than 17 years, i.e., 20 years minus the period of prosecution.

Many technologies in the field of molecular diversity are used to find a new chemical entity which is useful as a pharmaceutical drug. Obtaining FDA approval on a new drug can take many years and additional years often go by before the drug is widely accepted and prescribed by doctors. In view of these difficulties, it can easily take 10–15 years before the patent owner receives a return on the research and development investment. Profits being obtained near the end of the patent term are often quite significant, e.g., on the order of one million dollars a day. Thus, any shortening of the patent term can have a significant financial impact on the ability of the patentee to obtain enough profit so that reinvesting in the research and development of another drug appears economically prudent (see Table 4).

Based on the above, it can be understood that some recently enacted legislation could have a negative impact on the biotech industry. However, there have also been positive changes. For example, a recent amendment to 35 USC §103 (35 USC §103(b) amended on 1 November 1995 per Pub. L. 104-41) makes it possible to obtain patent protection on a process for making certain biotech products when the product is patentable even if the process for making it was previously known.
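The effect of the change in patent term discussed above (17 years from issuance versus 20 years from filing) can be made concrete with a short calculation. The sketch is a simplification under stated assumptions: the pendency period is taken as the total time from the first filing to issuance, and any term extensions or adjustments are ignored. The function names are hypothetical.

```python
OLD_TERM_FROM_ISSUANCE = 17.0  # years, under the old law
NEW_TERM_FROM_FILING = 20.0    # years, under the new law


def enforceable_term_old_law(pendency_years):
    """Old law: the term runs from issuance, so pendency does not shorten it."""
    return OLD_TERM_FROM_ISSUANCE


def enforceable_term_new_law(pendency_years):
    """New law: the term runs from filing, so pendency is subtracted from it."""
    return max(0.0, NEW_TERM_FROM_FILING - pendency_years)


for pendency in (1.5, 3.0, 5.0, 8.0):
    old = enforceable_term_old_law(pendency)
    new = enforceable_term_new_law(pendency)
    print(f"pendency {pendency:>4} y: old law {old} y, new law {new} y")
```

Under these assumptions the two regimes break even at a pendency of 3 years; a longer total prosecution period leaves the patentee with less than 17 years of enforceable term.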
One can imagine that a DNA sequence encoding a protein might be patentable even if the protein encoded by the DNA sequence were obvious and therefore unpatentable [12]. As per the new legislation, if the inventor obtains claims to the DNA sequence the inventor is also entitled to claims to methods of making cells containing the DNA sequence and the methods of making the protein expressed by those cells. Claims directed to methods of making a protein can be quite valuable when combined with claims to the DNA sequence. The claims to the DNA sequence per se could be used under 271(a) against those in the U.S. wishing to make the protein because one would use the claimed sequence to make the protein and be a 271(a) infringer. The claims to the method of making the protein could be used against non-U.S. entities who carried out the process abroad and then imported the protein into the U.S. for sale, constituting 271(g) infringement. There is a recent case demonstrating how 271(g) can be used to prevent importation of a protein (specifically a hormone) when the patentee only has claims on a process for making a plasmid used to make the hormone [13]. Specifically, in that case Genentech


held a patent on a method of constructing a plasmid which could be placed within a microorganism to produce a human growth hormone. Bio-Technology General Corporation carried out the process covered by Genentech’s patent, but did so in Israel in 1983, prior to the enactment of 35 USC §271(g) but not prior to the issuance of Genentech’s patent. Years after making the plasmid via the method covered by Genentech’s patent, the human growth hormone made by a cell line containing the plasmid was imported into the United States. Upon importation, Genentech sued Bio-Technology General Corporation under 271(g), claiming that the human growth hormone was ‘a product which is made by a process patented in the United States’. The District Court held that the human growth hormone which was imported into the United States was a product which was made by the process patented by Genentech. It was clear that Bio-Technology General Corporation had used the Genentech process of making a plasmid. The plasmid was, of course, an essential part of the overall process of making the human growth hormone. The Appellate Court affirmed the decision of the District Court, pointing out that infringement under 271(g) does not consist of the making of a product by a process patented in the United States (which was done prior to the enactment of 271(g)) but rather the importation, offer to sell, sale, or use of the product made by such a process (which was done after the enactment of 271(g)).

The recent amendment to 35 USC §103, 35 USC §271(g), and the above-discussed Genentech case can be combined to develop some powerful strategies for protecting intellectual property in the field of molecular diversity. Specifically, amendments to 35 USC §103 make it easier to obtain method claims after one has demonstrated that a biotechnology product is patentable.
The precedent provided by other recent cases makes it easier to obtain patent protection on certain biotechnology products such as DNA sequences. Product claims to a DNA sequence should prevent others in the United States from using the sequence to make the desired protein product. Such product claims would not prevent others from using the sequence outside the United States to produce the protein product. However, by using the process claims and 271(g), it is possible to prevent the product produced by that patented process from being imported into the United States.

Economic facts

Much of the technology of molecular diversity is directed towards finding chemical compounds which can be developed into pharmaceutically effective drugs. Changes in the law which make it possible to more readily protect these technologies via patents could have the effect of increasing the price of drugs to consumers, in that patents are used to provide market exclusivity and generally increase price and profit. However, as profits increase others are motivated to develop new drugs which provide a benefit to the consumer. Changes in the laws which decrease the term of patents, such as by allowing the term to run from the date of the application as opposed to the date of the issuance of the patent, could decrease the price of drugs, thereby providing a benefit to the consumer. Specifically, when the patent expires competition enters the market, which generally causes the price to be driven downward. However, a decreased price means a decreased profit for the original developer of the drug, resulting in a decreased motivation to develop new drugs.

When the real economics of health care costs are examined closely, it can be seen that increased drug costs have little effect on increasing overall health costs. Further, providing improved drugs can have a dramatic effect on decreasing health care costs. This issue is often underappreciated by the government and the public at large because prescription drug revenues in the United States exceed 60 billion dollars, which is a large sum. Because the amount of money spent on drugs is so large, it appears that any reduction in drug costs would have a significant effect on reducing health care costs. However, total health care costs in the United States exceed 900 billion dollars. Further, the profits to U.S. drug companies on the sale of drugs are 25% of the revenues, or 15 billion dollars before taxes. The 15 billion dollars in profits represent about 1.7% of the 900 billion dollars in health care costs, and less than that if one considers that federal, state and local taxes return about half of the 15 billion dollars to the government (see Table 5).

TABLE 5 ECONOMIC OVERVIEW

Cost/profit                   Amount (billion dollars)
Total U.S. health cost        900
Pharmaceutical cost           60
Profit on pharmaceuticals     15
Profit after taxes (net)      7.5

The net profit is 0.83% of total costs. Conclusion: reducing profits on drugs has a negligible effect on reducing total health care costs and could increase long-term cost by reducing the number of new drugs developed.
Because the profit on drugs sold in the United States represents such a small portion of the overall health care costs (less than 1% after taxes), it can be seen that even a total elimination of the profits on drugs would not have a significant impact on overall health care costs. Further, the elimination or even a substantial reduction of profits on drugs would act as a tremendous disincentive to the development of new drugs. It is generally accepted that 90% or more of all newly developed drugs are developed in the United States. Further, a newly developed highly effective drug could substantially reduce health care costs. Imagine the reduction in health care costs which could be achieved by a new drug which eliminated the need for bypass surgery, cured a particular type of cancer or inactivated HIV replication, thereby curing patients with AIDS. These potential benefits are often overlooked by the government because the U.S. elderly population constitutes about 12% of the population, making up a strong voting bloc who spend disproportionately large amounts on drugs: just over 25% of total drug expenditures. However, if the true economic picture could be seen it would be understood that increasing patent protection in the field of molecular diversity, and thus pharmaceutical drugs, would be more likely to reduce overall health care costs even if it resulted in larger drug company profits and overall larger expenditures on pharmaceuticals.
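The percentages quoted in this section follow directly from the Table 5 figures. As a minimal sketch of that arithmetic (the variable and function names are illustrative, and the 50% tax rate is taken from the text's statement that taxes return "about half" of the profit):

```python
# Figures from Table 5, in billions of US dollars.
TOTAL_HEALTH_COST = 900.0
DRUG_REVENUE = 60.0
PROFIT_MARGIN = 0.25  # pre-tax profit as a fraction of drug revenue
TAX_RATE = 0.5        # assumption: "about half" of the profit is returned as taxes


def share_of_health_cost(amount, total=TOTAL_HEALTH_COST):
    """Return `amount` as a percentage of total health care cost."""
    return 100.0 * amount / total


pretax_profit = DRUG_REVENUE * PROFIT_MARGIN    # 15 billion dollars
net_profit = pretax_profit * (1.0 - TAX_RATE)   # 7.5 billion dollars

print(f"Drug revenue share:  {share_of_health_cost(DRUG_REVENUE):.2f}%")
print(f"Pre-tax profit share: {share_of_health_cost(pretax_profit):.2f}%")
print(f"Net profit share:     {share_of_health_cost(net_profit):.2f}%")
```

On these inputs the net profit share comes to roughly 0.83% of total health care cost, the figure given in the note to Table 5.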

Conclusions

Ownership in real and intellectual property can be secured by a number of different legal mechanisms. Those who invent new and useful products and methods can be rewarded with a governmentally granted exclusive right to make, use, and sell the invention. The purpose of the grant is to promote progress, i.e., encourage others to invent by showing how profitable invention can be. To actually realize a profit, the types of subject matter protectable, types of claims used and, ultimately, the type of infringement actions that might be brought need to be considered in developing an overall strategy.

References

1 Owens-Corning Fiberglass Corp., 227 USPQ 417 (Fed. Cir. 1985).
2 U.S. Patent No. 5 053 454, issued 1 October 1991 to A.K. Judd.
3 U.S. Patent No. 5 266 684, issued 30 November 1993 to W.J. Rutter and D.V. Santi.
4 Atlantic Thermoplastics Co. Inc. v. Faytex Corporation, 23 USPQ 2d 1801 (Fed. Cir. 1992).
5 U.S. Patent No. 5 010 175, issued 23 April 1991 to W.J. Rutter and D.V. Santi.
6 U.S. Patent No. 5 182 366, which teaches a different method of making a large peptide library.
7 U.S. Patent No. 4 631 211, issued 23 December 1986 to R.A. Houghten.
8 Scripps Clinic & Research Foundation v. Genentech Inc., 18 USPQ 2d 1001 (Fed. Cir. 1991).
9 U.S. Patent No. 5 223 400, issued 29 June 1993 to R.C. Ladner, S.K. Guterman, B.L. Roberts, W. Markland, A.C. Ley and R.B. Kent.
10 U.S. Patent No. 5 492 807, issued 20 February 1996 to D.V. Santi.
11 U.S. Patent No. 5 182 366, issued 26 January 1993 to V.D. Huebner and D.V. Santi.
12 Bell, 26 USPQ 2d 1529 (Fed. Cir. 1993); Deuel, 34 USPQ 2d 1210 (Fed. Cir. 1995).
13 Bio-Technology General Corporation v. Genentech Inc., 23 USPQ 2d 1801 (Fed. Cir. 1992).

Combinatorial chemistry alliances in the 1990s: Review of deal structures

Mark G. Edwards
Recombinant Capital Inc., 220 Montgomery Street, Suite 1800, San Francisco, CA 94104, U.S.A.

Introduction and Methods

Combinatorial chemistry (CC) is the simultaneous synthesis of large numbers of compounds. In combination with high volume screens, CC has been credited with reshaping the drug discovery process [1]. Less recognized, however, has been the substantial impact that CC has had on the structure and economics of early stage research alliances between biotechnology companies embracing this technology and their pharmaceutical partners. CC alliances have evolved significantly from the target screening deals done by Affymax in 1991 into novel alliance structures and relationships which are unique to the CC field.

This article will review the principal contractual terms of a series of 20 CC alliances commenced between 1991 and 1996. These 20 deals involve seven biotechnology companies active in drug discovery and development using CC: five are public companies (ArQule, Arris, Chiron, Houghten and Pharmacopeia), one has filed preparatory documents to become public (Alanex) and one was public prior to its acquisition in 1995 (Affymax). All seven companies are (or, in the case of Affymax, were) subject to the public filing requirements of the Securities and Exchange Commission (SEC) governing material contracts of public companies.

This review draws extensively on the public versions of the 20 CC alliances filed with the SEC by these seven companies. The public versions are ‘redacted’, which means that key business elements have been marked confidential and withheld from disclosure. In some instances, it was possible to recreate the missing business elements based on other public sources, including annual and quarterly reports as well as offering prospectuses. Where a specific business element has been redacted and not disclosed elsewhere, the entry is marked confidential (CON) and omitted from average or aggregate figures in the analysis.
The first section will describe the basis upon which the 20 CC alliances were chosen for analysis and the general characteristics of these alliances as a group. The second section reviews the pre-commercial payments to the biotech partner in each alliance (‘biotech’ is used hereafter to mean a biotechnology company), with emphasis on noncancelable payments (firm $ payments). The third section then subdivides these agreements on the basis of what CC services are made available to the biotech’s commercialization partner. The fourth section compares several CC biotechs’ methods of allocating compound exclusivity among their respective alliance partners.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 314–320 © 1997 ESCOM Science Publishers B. V.

Selected combinatorial chemistry alliances

Using Recombinant Capital’s Recapping Corporate Alliances Database [2], we found 87 publicly announced deals involving CC as of October 15, 1996. Of these, 46 were SEC filed as of the same date. From this number, we eliminated nine biotech/biotech deals, as well as an additional 12 alliances where the biotech contributed significant biological assets (e.g. Neurogen, NeXstar, Microcide, Ariad). Of the remaining 25 agreements, we attempted to select the most financially significant deals for each biotech, not counting acquisitions. Table 1 lists the 20 CC alliances selected for review.

From the standpoint of commercialization partners, Table 1 includes CC alliances undertaken by 16 major pharmaceutical companies (15 after taking into account the merger of Ciba-Geigy and Sandoz to form Novartis, announced in March 1996). In each case, the commercialization partner is responsible for preclinical and clinical testing, product registration, manufacturing and marketing of primary products which emanate from the 20 CC alliances. All 20 alliances selected are structured on a worldwide license basis.

The research focus of the alliance is one or more broad diseases or disorders in 17 (85%) of the deals, as compared to generalized delivery of CC libraries in the remaining three agreements. Specific molecular targets are chosen by the pharma partner within such diseases, but this choice is usually open to substitution without penalty. Interestingly, with a few narrow restrictions, the eventual product license is not limited by disease but extends to all useful medical indications. As will be discussed below, the principal purpose achieved by the identification of disease foci for specific pharma partners appears to be the establishment of exclusionary categories for future CC alliances. In other words, although any of a biotech’s partners may eventually commercialize an alliance product to treat, for example, diabetes, only one such partner will be the ‘diabetes partner’ in terms of target selection and/or CC library generation.

TABLE 1 CC ALLIANCES

Biotech licensor  Commercialization partner (pharma)  Focus of alliance
Affymax           Marion Merrell Dow (5/91)           3 CON targets
                  Ciba-Geigy (7/91)                   3 CON targets
                  American Home Products (2/94)       11 CON targets
Alanex            Astra AB (12/94)                    Opiate receptor target for pain
                  Novo Nordisk (10/95)                Diabetes target(s)
                  Roche Bioscience (6/96)             Pain target(s)
ArQule            Pharmacia (3/95)                    Target(s) for CON field
                  Abbott (6/95)                       Target(s) for CON field
                  Solvay (11/95)                      All drug uses
                  Roche Bioscience (9/96)             All drug uses
Arris             Pharmacia (8/95)                    Anticoagulation target(s)
Chiron            Ciba-Geigy (11/95)                  2 CON targets
Houghten          Procter & Gamble (1/95)             Target(s) for CON field
                  Novo Nordisk (2/96)                 Target(s) for CON field
Pharmacopeia      Schering-Plough (12/94)             Target(s) for cancer and asthma
                  Schering AG (2/95)                  Target(s) for multiple sclerosis
                  Sandoz (10/95)                      All drug uses
                  Bayer AG (12/95)                    Target(s) for CON field
                  Daiichi Pharmaceutical (3/96)       Target(s) for CON field
                  Organon (5/96)                      Target(s) for CON field
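The selection of the 20 alliances is a simple filtering sequence, and the counts quoted in this section can be checked against one another. A minimal sketch, using only the numbers reported in the text (the variable names are illustrative):

```python
# Deal counts reported in the text (as of October 15, 1996).
announced = 87          # publicly announced deals involving CC
sec_filed = 46          # of those, filed with the SEC
biotech_biotech = 9     # excluded: biotech/biotech deals
biological_assets = 12  # excluded: biotech contributed significant biological assets
selected = 20           # alliances chosen for review (Table 1)

remaining = sec_filed - biotech_biotech - biological_assets

disease_focused = 17    # deals focused on one or more broad diseases or disorders
library_delivery = selected - disease_focused

print(f"{remaining} candidate agreements, of which {selected} were selected")
print(f"Disease-focused: {disease_focused}/{selected} = {disease_focused / selected:.0%}")
print(f"Generalized library delivery: {library_delivery}")
```

The subtraction reproduces the 25 remaining agreements, and the three deals left after the 17 disease-focused ones correspond to the three 'All drug uses' entries in Table 1.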

Pre-commercial payments

Like other early stage R&D alliances, a large fraction of total pre-commercial payments from pharma licensee to biotech licensor in CC alliances comes in the form of R&D reimbursement. Such reimbursement is usually negotiated as a minimum annual payment (R&D/yr), either on a fixed dollar basis or as a function of the number of biotech full-time equivalent scientists (FTEs) expected to conduct the alliance program. In addition, R&D reimbursement is usually negotiated for an expected number of years (R&D term), although often it is possible for the licensee to cancel R&D reimbursement prior to the completion of the R&D term.

Table 2 ranks the 20 CC alliances by the aggregate amount of noncancelable pre-commercial payments (firm $ payments) to the biotech partner. (The table does not include milestone payments, although such payments pertain to all 20 deals, since in each instance the aggregate dollar value of such payments per product was redacted.) Firm $ payments include any upfront fee, equity and R&D reimbursement payments, with the exception of R&D reimbursements which are cancelable at the licensee's option. Contingent equity and/or milestone payments are also excluded. For the 20 CC alliances as a group, average firm $ payments are $12.1 million, which increase to $14.1 million if all (including cancelable) R&D reimbursements are counted. Of the seven CC biotech companies reviewed, the largest firm $ payments were obtained by Affymax, Chiron and Pharmacopeia.

TABLE 2 PRE-COMMERCIAL PAYMENTS

Biotech licensor  Pharma licensee Firm $ payments   R&D/yr    R&D term (years)
Affymax           AHPC            $57.0M            $9.4M     5
Chiron            Ciba-Geigy      26.0 (a)          CON       5
Pharmacopeia      Bayer           19.5              4.75      2
Pharmacopeia      Organon         19.0              3.5       3
Affymax           MMD             15.0              2.5       3
Affymax           Ciba-Geigy      15.0              2.5       3
Pharmacopeia      Schering        13.0              3.0       2
ArQule            Solvay          11.5              3.5       1–5
Pharmacopeia      Daiichi         11.0              4.0       1.5–3
Pharmacopeia      Schering AG     9.0               2.75      2
Arris             Pharmacia       8.5               2.8       3–5
ArQule            Roche           8.4               3.7       2–3
Alanex            Roche           6.7               1.8       1.5–3
Pharmacopeia      Sandoz          5.0               2.0       2–5
ArQule            Abbott          4.4               2.2       2
Houghten          P&G             4.0               2.0       Access (c)
Houghten          Novo Nordisk    4.0               2.0       2
ArQule            Pharmacia       2.0               1.0       Access
Alanex            Novo Nordisk    1.75              1.5       1–3
Alanex            Astra AB        1.0               0.75      1–3
Averages                          $12.1M            $2.9M     2.2

Upfront fees, Equity and Contingent equity columns (entries in printed order; row positions not preserved):
Upfront fees and equity entries: 1.0 0.25 0.25; $10.0M; $5.5M CON; CON; 10.0 4.2 7.5 7.5 7.0 7.0 5.0 3.5; CON 1.0 4.0 CON 2.0
Averages: upfront fees $2.0M; equity $6.4M
Contingent equity entries: $30M eq.; $4.2M eq.; $13M eq.; $1M ter. (b); $3M eq.; $2.5M eq.; $1M ter.; $1M eq.

(a) Payment is for technology access only and does not include sponsored R&D (amounts CON).
(b) Termination payment for early cancellation of sponsored R&D.
(c) Library access fee.

Table 2 also shows that one-half of the deals analyzed have an upfront fee, which averages $2.0 million in the six instances where the amount of the fee was disclosed. One-half of the group also has an initial equity investment, averaging $6.4 million per agreement. With respect to R&D/yr, the average funding rate is $2.9 million per year. Of the 20 CC alliances, 12 (60%) compensate R&D on an FTE basis, while the remainder use fixed periodic payments. The average noncancelable R&D term is 2.2 years, increasing to 3.2 years if cancelable portions of the R&D term are counted. Six deals include one or more additional equity investments on a contingent basis.

For purposes of comparison, Recap analyzed a sample of 29 early stage biotech alliances, where the common element among the deals analyzed was the inclusion of one or more biological target(s) as the focus of the research [3]. For this group, firm $ payments per early stage target alliance averaged $11.7 million. This average increased to $18.5 million, however, when cancelable R&D reimbursements were included. Upfront fees were paid in 16 deals (55%) and averaged $1.6 million per alliance. Equity was an element in 22 of the early stage target alliances (76%), averaging $6.3 million per deal.

As compared to early stage target alliances, the 20 CC alliances are quite similar in terms of pre-commercial payments to the biotech licensor. Upfront fees and equity investments are somewhat larger in the group of CC alliances, but more prevalent among the early stage target deals. Expected R&D payments are significantly larger for the target deals ($18.5M versus $14.1M); however, this difference evaporates when the two groups are compared on the basis of noncancelable R&D payments. Both group averages, however, mask considerable variation in pre-commercial payments and alliance structure of individual agreements. The next section will attempt to address the various structures of the group of 20 CC alliances.
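The group averages in Table 2 are computed over disclosed entries only, with confidential (CON) entries left out, as described in the Introduction. A minimal sketch of that calculation for the two columns whose 20 entries can be read in row order; the helper name `disclosed_mean` is illustrative, and representing CON as `None` is an assumption of this sketch:

```python
# Columns transcribed from Table 2, in millions of US dollars, in ranked row order.
# None marks an entry redacted as confidential (CON) in the SEC filing.
firm_payments = [57.0, 26.0, 19.5, 19.0, 15.0, 15.0, 13.0, 11.5, 11.0, 9.0,
                 8.5, 8.4, 6.7, 5.0, 4.4, 4.0, 4.0, 2.0, 1.75, 1.0]
rd_per_year = [9.4, None, 4.75, 3.5, 2.5, 2.5, 3.0, 3.5, 4.0, 2.75,
               2.8, 3.7, 1.8, 2.0, 2.2, 2.0, 2.0, 1.0, 1.5, 0.75]


def disclosed_mean(values):
    """Average the disclosed entries, skipping confidential (None) ones."""
    disclosed = [v for v in values if v is not None]
    return sum(disclosed) / len(disclosed)


print(f"Average firm $ payments: ${disclosed_mean(firm_payments):.1f}M")
print(f"Average R&D/yr:          ${disclosed_mean(rd_per_year):.1f}M")
```

Rounded to one decimal place, the two results agree with the $12.1 million and $2.9 million averages reported for the group.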

Evolving alliance structures

The earliest CC alliances largely inherited the structures of early stage target agreements. While such deals vary widely with respect to the nature of the biological target(s) and the type of compound screening to be performed, target screening structures are nonetheless quite similar to each other in terms of the type of research payments made to the biotech partner, as well as the stage of hand-off of successful research results to the commercialization partner. Typically, the biotech will perform screening services on a cost reimbursement basis; ‘hits’ are then forwarded to the pharma partner for assessment, chemical modification and clinical development; milestone payments are made to the biotech on the basis of clinical benchmarks achieved by the pharma partner; and the biotech receives royalties on commercial sales of resultant products.

In contrast to this kind of target screening structure, however, several recent CC alliances have been structured in ways that are apparently unique to the CC field and perhaps, in some cases, to even a single biotech participant within this field. Table 3 subdivides the 20 agreements on the basis of the specific CC services to be performed by the biotech for its commercialization partner.

The earliest CC alliances, undertaken by Affymax, are structurally similar to target screening deals: Affymax generates CC libraries, screens these libraries against specific drug targets in-house, turns over ‘hits’ for evaluation and development, and is paid first milestones and then royalties on the basis of its partner’s advancement of the project. No CC libraries are delivered to

the commercialization partner, nor is the underlying technology for CC library generation shared with or transferred to the pharma company. The three Alanex agreements also follow this screening structure, as does Pharmacopeia’s deal with Schering AG. Within this subgroup, the most significant structural differences between individual agreements involve whether, and to what extent, royalties are to be paid to the biotech on products resulting from such screening efforts.

TABLE 3 GROUPING OF THE 20 AGREEMENTS

Pharma/biotech              Date    Screen CC Lib.   Deliver CC Lib.   Sell CC Tech.   Royalty exclusions (a)
MMD/Affymax                 5/91    Yes              No                No              None
Ciba-Geigy/Affymax          7/91    Yes              No                No              None
AHPC/Affymax                2/94    Yes              No                No              None
Astra/Alanex                12/94   Yes              No                No              No royalties payable
Schering AG/Pharmacopeia    2/95    Yes              No                No              Nonpatented products
Novo Nordisk/Alanex         10/95   Yes              No                No              None
Roche/Alanex                6/96    Yes              No                No              None

Schering/Pharmacopeia       12/94   Yes              Yes               No              Excluded products (b)
Bayer/Pharmacopeia          12/95   Yes              Yes               No              Excluded products

P&G/Houghten                1/95    No               Yes               No              Excluded products (c)
Pharmacia/ArQule            3/95    No               Yes               No              Nonpatented products
Abbott/ArQule               6/95    No               Yes               No              Abbott derivative products (d)
Solvay/ArQule               11/95   No               Yes               No              Nonpatented products
Novo Nordisk/Houghten       2/96    No               Yes               No              None
Roche/ArQule                9/96    No               Yes               No              None

Pharmacia/Arris             8/95    Yes              No                Yes             No royalties on pharma CC
Ciba-Geigy/Chiron           11/95   Yes              No                Yes             No royalties on pharma CC

Sandoz/Pharmacopeia         10/95   No               Yes               Yes             None

Daiichi/Pharmacopeia        3/96    Yes              Yes               Yes             No royalties on pharma CC
Organon/Pharmacopeia        5/96    Yes              Yes               Yes             Excluded products

(a) Except as noted below, these agreements specify that royalties will be paid to the biotech partner on any products developed by the alliance or utilizing the CC technology (royalty rates are all CON).
(b) Excluded Product is defined in the Schering/Pharmacopeia deal as any product for the therapy or prophylactic treatment or prevention of diseases and conditions in human or animal which does not contain an Agreement Compound, but contains (i) a Library Compound which is screened by Schering and identified as having biological activity against a particular target, which compound Schering has previously identified as having activity against the same biological target; (ii) a Library Compound which Schering requests the decoding, and synthesizes and incorporates it into Schering’s in-house chemical sample collection, and following the Exclusivity Period of such a Random Library, Schering determines that such a compound also has different biological activity with respect to a second target, and Schering develops and commercializes such a compound solely for use with respect to such a second target; (iii) any Derivative Compound conceived more than –(CON) years following the end of the Exclusivity Period of the Random Library in which the Subject Compound was a member, except that a conceived compound under this subsection shall not be deemed an Excluded Product if it falls within the scope of the patent granted on a Schering invention, a Pharmacopeia invention or a joint invention, issued as of the termination date; (iv) a compound developed by Schering following the end of the last to expire Exclusivity Period of the Random Libraries; (v) a compound resulting from a logical series of medicinal chemistry modifications to a compound described in subsection (i) or (ii) above.
(c) Definition of Excluded Product in the P&G/Houghten agreement is confidential.
(d) Abbott Derivative Products are defined in the Abbott/ArQule deal as a chemical compound generated by ArQule for Abbott by use of chemical modification methods from an Abbott Core Compound. An Abbott Core Compound is any compound supplied to ArQule by Abbott.

The next subgroup in Table 3 involves two alliances undertaken by Pharmacopeia (with Schering and Bayer). In these deals, Pharmacopeia agrees both to conduct screening services related to specific drug targets and to deliver certain CC libraries (random and focused) to its pharma partner for pharma’s in-house drug screening. Because of the dual nature of these agreements, both deals have elaborate definitions distinguishing licensed products from CC generated compounds, whereby the latter are essentially excluded for royalty payment purposes.

A third structural subgroup consists of a series of alliances undertaken by both Houghten (with P&G and Novo Nordisk) and ArQule (with Pharmacia, Abbott, Solvay and Roche). In these relationships, the biotech provides no screening services but rather simply delivers CC libraries (again random and focused) to its pharma partner for pharma’s in-house drug screening. Interestingly, this subgroup varies by biotech and by partner with respect to the nature and extent of exclusions from royalty obligations. For example, two of the ArQule deals (with Pharmacia and Solvay) pay royalties only in the event that CC compounds are patent protected; a third ArQule alliance (with Abbott) includes royalties for nonpatented compounds, but excludes any CC compounds generated as a focused library from a core Abbott compound.

The last three subgroups in Table 3 each involve some combination of the services described above, along with an added element: the nonexclusive (or exclusive in the case of Pharmacia/Arris) license and transfer of technology associated with the generation of CC libraries.
For the most part, such agreements exclude from royalty payments to the biotech any CC compounds produced by the pharma partner using such technology. Nonetheless, CC technology transfer appears to be valued dearly by both parties: in the case of Chiron’s deal with Ciba-Geigy, the price tag for nonexclusive technology transfer alone was $26 million. Finally, although omitted from this analysis, one might characterize the acquisition of Affymax by Glaxo for $534 million, of Sphinx by Eli Lilly for $75 million, and of Selectide by Marion (now Hoechst Marion Roussel) for $59 million as additional examples of such CC technology transfer, albeit on an exclusive basis. We consider directly the issues concerning exclusivity in CC alliances in the next section.
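The subgroups of Table 3 are defined entirely by three yes/no attributes: whether the biotech screens CC libraries, delivers CC libraries, or transfers CC technology. A minimal sketch of that grouping, with the flags transcribed from Table 3 (the tuple layout and the function name `group_by_services` are illustrative choices, not part of the original analysis):

```python
from collections import defaultdict

# (pharma/biotech, screens CC libraries, delivers CC libraries, transfers CC technology)
DEALS = [
    ("MMD/Affymax", True, False, False),
    ("Ciba-Geigy/Affymax", True, False, False),
    ("AHPC/Affymax", True, False, False),
    ("Astra/Alanex", True, False, False),
    ("Schering AG/Pharmacopeia", True, False, False),
    ("Novo Nordisk/Alanex", True, False, False),
    ("Roche/Alanex", True, False, False),
    ("Schering/Pharmacopeia", True, True, False),
    ("Bayer/Pharmacopeia", True, True, False),
    ("P&G/Houghten", False, True, False),
    ("Pharmacia/ArQule", False, True, False),
    ("Abbott/ArQule", False, True, False),
    ("Solvay/ArQule", False, True, False),
    ("Novo Nordisk/Houghten", False, True, False),
    ("Roche/ArQule", False, True, False),
    ("Pharmacia/Arris", True, False, True),
    ("Ciba-Geigy/Chiron", True, False, True),
    ("Sandoz/Pharmacopeia", False, True, True),
    ("Daiichi/Pharmacopeia", True, True, True),
    ("Organon/Pharmacopeia", True, True, True),
]


def group_by_services(deals):
    """Group deal names by their (screen, deliver, transfer) service flags."""
    groups = defaultdict(list)
    for name, screen, deliver, transfer in deals:
        groups[(screen, deliver, transfer)].append(name)
    return dict(groups)


groups = group_by_services(DEALS)
for (screen, deliver, transfer), names in groups.items():
    print(f"screen={screen}, deliver={deliver}, transfer={transfer}: {len(names)} deals")
```

Grouping on the three flags yields six distinct combinations, matching the six blocks of Table 3, with the screening-only structure the largest at seven deals.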

Compound exclusivity

With respect to the 20 CC alliances, the approaches taken to compound exclusivity roughly mirror the grouping of Table 3. For those deals involving only target screening with CC compounds, exclusivity follows the path of ‘hits’, which is to say that a particular CC compound becomes exclusive to a pharma partner once it is recognized as a ‘hit’ in a particular screening assay, and then remains exclusive unless reverted to the biotech on the basis of project termination or nondiligence in development. For all CC compounds which are not ‘hits’, however, pharma’s license is effectively terminated upon the completion of screening.

With regard to the second subgroup, exclusivity is principally time-limited. Both of the Pharmacopeia alliances in this subgroup are target-related; nonetheless, as was mentioned earlier, pharma’s license to developed products is not limited to specific

diseases. Instead, these deals provide time-limited exclusive access to both random and focused libraries, during which a partner may identify a compound for development and so preserve its exclusivity (subject to diligence), and after which Pharmacopeia may utilize such libraries itself or with third parties. An interesting caveat in this regard is the stipulation in both agreements that all CC libraries be newly generated. On this basis, one can envision the emergence of a substantial residual library of CC compounds, now exclusive to Pharmacopeia, but not available for reuse in pharma alliances having equivalent stipulations. In the Houghten and ArQule deals, by contrast, a ‘winner takes all’ situation arises. Partners have essentially equivalent access to CC libraries on a nonexclusive basis, with the option to reserve designated compounds (and derivative libraries) as exclusive on the basis of observed biological activity and the payment of a corresponding milestone. All undesignated compounds revert to the biotech for continued licensure or internal use. Finally, the CC alliances involving technology transfer essentially follow the patterns described above, with two exceptions. First, with regard to CC libraries generated by the pharma partner, such libraries become the exclusive property of such a partner, as would be expected. Second, and surprisingly, the Ciba/Chiron alliance goes against the grain of the other CC agreements by granting exclusivity (or withholding license rights) on an indication-by-indication and class of product basis. For example, specific diseases are reserved to Chiron, as are vaccines, antisense and gene therapy products.

Conclusions Pharmaceutical compound libraries, ranging from 100 000 to 500 000 compounds per company, have long been regarded as the ‘crown jewels’ of their respective R&D organizations. On the basis of one newly synthesized compound per researcher per week and using $250 000 as the fully burdened cost per researcher per year, a 100 000 compound library approximates a $500 million investment in compound synthesis, patenting, and potential screening against biological targets to determine structure–activity relationships, etc. And yet Pharmacopeia’s 1995 annual report describes its generation of 100 000 compounds per calendar quarter and the use of robotic systems capable of analyzing 100 000 compounds per week. Clearly the advent of CC technology generally, and rapid proliferation of such technology through CC alliances specifically, has brought into focus many questions concerning the value, durability and exclusivity of pharma’s compound library assets. This review has taken a snapshot of a rapidly emerging and evolving field. It will be interesting to observe whether the valuations, structures and methods of proprietary protection described here are solidified over the next several years or continue to transform into as yet unprecedented contractual arrays.
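The $500 million figure above follows from simple arithmetic. A minimal sketch, assuming about 50 working weeks per researcher per year (an assumption not stated in the text, chosen here because it reproduces the quoted estimate); the function name and parameter names are illustrative only:

```python
def library_cost(compounds,
                 compounds_per_researcher_week=1.0,
                 working_weeks_per_year=50,       # assumption, not stated in the text
                 cost_per_researcher_year=250_000):
    """Estimate the cost in dollars of building a library one compound at a time."""
    researcher_years = compounds / (compounds_per_researcher_week * working_weeks_per_year)
    return researcher_years * cost_per_researcher_year


cost = library_cost(100_000)
print(f"Estimated cost of a 100 000 compound library: ${cost / 1e6:,.0f} million")

# For scale: researcher-weeks of traditional synthesis matched by one quarter
# of combinatorial output at the rate cited from Pharmacopeia's annual report.
researcher_weeks = 100_000 / 1.0
print(f"Traditional equivalent of one quarter's output: {researcher_weeks:,.0f} researcher-weeks")
```

With 52 working weeks instead of 50 the estimate falls to about $480 million, so the conclusion is not sensitive to that assumption.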

References

1 Baum, R., Chem. Eng. News, 74 (1996) 28.
2 A front-end search engine of Recap’s RCA Database is available on the Internet (http://www.recap.com).
3 The results of this analysis of early stage target alliances are available on the Internet (http://www.recap.com/report3.html).

Combinatorial chemistry: The promise fulfilled?

Jim Hauske
Sepracor Inc., 111 Locke Drive, Marlboro, MA 01752, U.S.A.

Webster’s New Collegiate Dictionary defines promise as ‘... a right to expect or to claim the performance or forbearance of a specified act; ground for hope, expectation or assurance of eventual success ...’. Combinatorial chemistry has certainly generated very high expectations in the drug discovery community, but has the technique of combinatorial chemistry, and the practitioners of the technique, really delivered on that promise? The real promise of combinatorial chemistry should be defined in terms of the discovery of a drug, and not of a preclinical or, for that matter, a clinical candidate. Quite clearly, combinatorial methodology has yet to deliver on that promise, since no drugs have yet emerged from clinical trials. However, for the purposes of this chapter, the editors of this series hope to document the continued advance of drug candidates generated by combinatorial methods through the clinic and, ultimately, to the market place. That is, the promise fulfilled.

At this stage in the development of combinatorial methods, this chapter represents more of a place holder than a chapter of substance. In a sense, this chapter is as much an unfulfilled promise as the technique it reports upon, since there is really very little accessible clinical data to report. Therefore, we will outline the seemingly ever-expanding list of practicing combinatorial chemistry companies, report anecdotal information concerning preclinical discoveries and, most significantly, summarize the progress of those candidates actually in clinical development. In some respects the preclinical information would be very useful, but, not too surprisingly, this is very difficult information to document and to verify. Hopefully, in the not too distant future, there will be a detailed drug discovery story, with combinatorial chemistry as the centerpiece, which will appear in this chapter.

Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 321-325 © 1997 ESCOM Science Publishers B. V.

The practitioners

No matter what the specific format one utilizes to practice the technique, combinatorial chemistry is often defined as a rapid, flexible and novel means of discovering new drug leads and optimized clinical candidates. The adjectives rapid, flexible and novel have been used to support the notion that this is certainly not the province of the established pharmaceutical industry. In large part, this is the result of the hype associated with the explosion of so many small companies attempting to sustain a business in combinatorial chemistry, for many large pharmaceutical companies have excellent combinatorial chemistry competency. Specifically, Pfizer and Warner-Lambert/Parke-Davis made seminal contributions, and continue to make significant contributions, to the techniques of combinatorial chemistry. Although not in the vanguard of the large pharmaceutical practitioners, Merck and Abbott now have significant and productive efforts in combinatorial chemistry. The commonality all these companies have relates to the use of combinatorial chemistry in a non-peptide sense. The established pharmaceutical companies have clearly focused

321

J. Hauske

everyone’s attention on the need to synthesize non-peptide libraries. Large pharmaceutical companies that were a little further behind in terms of implementing the technique purchased some of the smaller, focused players in the hope of immediately establishing the necessary combinatorial chemistry infrastructure within their organizations. For example, Glaxo, Lilly and Marion Merrell Dow purchased Affymax, Sphinx and Selectide, respectively. Of those three purchases, it appears that the drug discovery programs developed by the Lilly/Sphinx entity have proven most successful, since they are apparently the first to discover a non-peptide clinical candidate utilizing combinatorial chemistry techniques. Chiron and Houghten Pharmaceutical (HPI) led essentially everyone in combinatorial chemistry techniques utilizing peptoids and peptides, respectively. Both companies have progressed with great rapidity toward the construction of a variety of non-amide-containing libraries. Also, both companies remain highly visible, as well as highly successful, by virtue of their continuing combinatorial chemistry collaborations with a number of corporate partners. In this context, combinatorial chemistry provides a great deal of leverage in many corporate partnering deals. For example, companies with a true drug discovery focus, such as Neurogen and Arris, now have collaborative agreements, which include combinatorial chemistry as a significant component of the deal structure, with Pfizer and Upjohn-Pharmacia, respectively. Even the rational drug design companies, such as Vertex and Ariad, have expanding combinatorial chemistry approaches as part of their drug discovery programs. In contrast to either the large pharmaceutical companies or the smaller drug discovery companies we have discussed, Alanex, ArQule, CombiChem and Pharmacopeia were all founded as combinatorial chemistry companies per se. 
Although they all have collaborative agreements with significant domestic or international partners, Pharmacopeia and Alanex appear to be garnering the largest share of the available collaborative dollars. Although all these companies have incredibly bright staff and significant technology, they have not yet provided ‘the compound’, which will define the next clinical breakthrough. Again, the promise of combinatorial chemistry requires the rapid, and frequent, discovery of multiple clinical candidates, which in turn will provide the next generation of drugs. As the combinatorial approach to drug discovery matures, it is a widely held hope, perhaps expectation, that these companies will be materially responsible for that new generation of drugs. It remains the promise.

Combinatorial chemistry: The promise fulfilled?

The clinical results

As noted earlier, there are presently very few compounds in the clinic that were discovered utilizing combinatorial chemistry. It does appear, however, that there are a number of compounds in pre-clinical development. Although most of the information concerning these compounds is anecdotal, it is useful to consider the therapy areas for which these pre-clinical molecules may prove useful. For example, Pfizer has discovered a number of molecules that are useful against a variety of central nervous system (CNS) and metabolic disease indications. It appears that all these molecules resulted from lead optimization libraries, since the leads were discovered via high throughput screening and combinatorial chemistry programs were initiated only afterwards. Lilly workers have a number of interesting molecules useful for both chronic and metabolic diseases. These molecules were also apparently discovered utilizing combinatorial techniques to synthesize lead optimization libraries from an established lead molecule. A number of anti-infective agents have also been discovered by Chiron and HPI, although it is not clear that any of these specific targets are in development. Finally, a number of libraries that are useful against a broad range of protease targets have been discovered by essentially everyone having a combinatorial approach. Presumably, these libraries will provide a large number of exciting clinical candidates in the future.

Although few, there are certainly some molecules in clinical development that were discovered utilizing combinatorial chemistry. There are also some interesting questions one may ask to frame the context of these clinical development candidates. For example: (i) what therapy area is the candidate molecule targeted against; (ii) is the candidate molecule operative against a known molecular target or is it binding a new molecular target; (iii) is the candidate molecule a new structural class or a derivative of an existing structural class; (iv) what is the compound’s corporate identification number and the compound structure; (v) how long did it take to reach the point of phase I studies; and (vi) is the candidate molecule an analog generated from an existing lead structure, or was the initial lead structure generated via combinatorial methods and then analoged via combinatorial chemistry? Clearly, not all these questions are equally significant with regard to drug discovery, but they all provide the means to ‘texture’ the quality of the method of discovery.

HPI, Lilly, Neurogen and Pfizer have initiated clinical trials in a number of areas, including atherosclerosis, cancer, CNS, diabetes and obesity. At this time, the trials being conducted by HPI and Eli Lilly provide the most detailed data and, therefore, we will use them to answer the questions that we have posed.

HPI

What therapy area?
HPI has compounds in clinical trials in several therapy areas: phase II trials for cancer adjunct therapy, type II diabetes and obesity, and phase I trials for acute pain. These are all significant therapeutic areas, and clinical success in any of them would be a highly significant contribution to human health.

Is the molecular target new?
The molecular target is defined as a cytokine regulator; however, it may not be new.

What is the structural class and corporate identification number of the candidate?
As of this writing, we do not know the structure of HP228, but it is likely a new derivative of a known structural class.

How long from idea inception to initiation of the clinical trial?
Again, as of this writing we are not certain. However, HPI went public in April 1996, so we presume that the total time necessary for this discovery was considerably reduced from the average of 6.5 years that is typically quoted as the total time required from the original inception of the idea until entry into phase I trials.


Did the candidate molecule arise from an optimization library of an existing lead or is it an analog of a lead structure discovered utilizing combinatorial methods?
This is the most intriguing aspect of the clinical candidate generated by the Houghten workers, since the discovery of HP228 was apparently the result of a lead generation library. This library provided the new chemical entity, which ultimately was the platform for further library optimization chemistry, which, in turn, provided the clinical candidate. This appears to be the only case to date of a lead generated by combinatorial methods that, after optimization, has resulted in a clinical candidate. The other cases in the clinic that we have information about all arose from optimization libraries on existing leads.

Eli Lilly

What therapy area?
The Lilly compound is directed toward a CNS indication, and later in 1996 the Lilly workers may divulge the specific therapy area.

Is the molecular target new?
Although the Lilly workers are rightly reluctant to disclose the molecular target, it is apparently new. This is truly an extraordinary result. An untapped potential of combinatorial libraries relates to the possibility of not only finding significantly active molecules, but also of defining new molecular targets. This aspect of the Lilly discovery deserves special attention.

What is the structural class and corporate identification number of the candidate?
LY-334370 is the clinical candidate. It is a new member of an existing structural class, although the precise structure will not be revealed until later this year.

How long from idea inception to initiation of the clinical trial?
It required 23 months from the initiation of the program until the start of phase I trials. This is another extraordinary accomplishment, since the average time, industrywide, from idea inception until entry to the clinic is approximately 6.5 years. One early promise of combinatorial chemistry was the presumption that the technique would compress the time to the clinic. Certainly, the drug discovery industry’s attraction to the technique is related to the possibility of increased efficiency and decreased time to development. The Lilly workers may have the first demonstration of this increased efficiency.

Did the candidate molecule arise from an optimization library of an existing lead or is it an analog of a lead structure discovered utilizing combinatorial methods?
LY-334370 was synthesized in a lead optimization library. The original lead structure was accessed from a biased set of molecules related to a platform with an affinity for the receptor target. The clinical candidate was synthesized directly in the lead optimization library.
So, is combinatorial chemistry the indispensable technology of the drug discoverers of the next century, or simply an empty promise, which has received extraordinary hype? The answer to this question requires a good deal more data; however, some practitioners of the technique seem to be on a more efficient path to the discovery of interesting clinical candidates. If the goals of the editors of this volume are achieved, the pages in this chapter that are yet to be written will provide the necessary information to separate the success from the hype.

Acknowledgements

The author is very grateful to Drs. Pavia and Kaldor for generously lending their assistance in writing this chapter.


A compendium of solid-phase chemistry publications

Ian W. James

Chiron Technologies Pty. Ltd., 11 Duerdin Street, Clayton, VIC 3168, Australia

Introduction

Solid-phase organic chemistry (SPOC) has undergone a resurgence in recent years, largely brought on by its application to combinatorial chemistry. The solution-phase chemist has had for some time a number of excellent collections on available reactions. The solid-phase chemist has not had access to a similar comprehensive collection, although a few excellent reviews have been published (see the Reviews section). This review covers recent literature, and literature going back to the 1960s. It is intended as a resource for the chemist when planning a solid-phase synthesis. Through the references given, the chemist can gain an appreciation as to how a reaction may be performed on the solid phase. The conditions previously described may not be ideal for combinatorial chemistry. However, they may provide the chemist with a useful starting point.

An important difference between solution-phase and solid-phase chemistry is the new variable: the polymer support. The polymer has significant influence on the reaction. There have been many solid supports used over the years with variation in polymer type, degree of cross-linking, and even quality. These factors have significant influence on solvation properties and reaction kinetics, and hence on synthesis characteristics. Various grafted polymers may have characteristics that differ from other supports. Consequently, the optimal conditions for a reaction on one polymer may be different from those for the same reaction on a different polymer. This should not dishearten the chemist. A reported reaction may be used as the basis for optimization studies of the reaction on a different support. Variables such as solvent, temperature, and reaction time may all be investigated to optimize the reaction on the new surface.

This compilation, although extensive, is not exhaustive. Apologies to any who find their papers omitted! Comments to improve the next edition are welcome.

Solid-phase organic chemistry (SPOC)

Reviews

Ell96, Her96, Fru96, Pat96, Ter95b, Gor94, Des94a, Las87, Fre81, Hod80, Lez78b, Cro76, Pat75

The following issues are devoted to combinatorial chemistry and contain some interesting SPOC reviews: Acc. Chem. Res., 29 (3) (1996); Chemtracts-Org. Chem., 8 (1) (1995). For a historical perspective, see Merrifield’s original paper: Mer63.


Annual Reports in Combinatorial Chemistry and Molecular Diversity, Vol. 1, pp. 326–344 © 1997 ESCOM Science Publishers B.V.


Products

Acetal: Aur95, Dou95, Kic95, Liu95, Mel94, Tho94, Blo90, Vee87, Fre84, Hod83, Fre75c, Lez77d, Dum73, Lez73, Won73
Acyl phosphate: Reb75a
Alcohol: Aja95, Dou95, Mey95b, Kur94, Liu95, Sha94b, Ale92, Dor84, Gir84, Hod83, Xu83, Lez80, Far76, Fre75b, Won74, Lez73, Zeh73, Fre71
Aldehyde: Che94, Bee92, Fre84, Tay82, Far76, Fre75a, Dum73, Fre71, Aye65
Alkene: Bro96, Gor96b, Bee95a, Hir95a, Joh95, Wil95, Che94, Yu94, Cha85, Svi84, Fyl78, Fyl77, Lez77b, Lez77d, Nie76, Won74, Lez73, Fre71
Alkyl iodide: Wor79
Alkyne: Yu94, Svi79, Fyl77, Lez77b, Lez76
Tertiary amide: Koh96, Nor96, Boo95, Kic95, Ran95, Ric95, Ang94, Sim94, Zam94, Zuc94a, Zuc94b, Sim92, Zuc92, Gol78, Lez77c
Amine: Bar96, Kha96, Koh96, Kol96, Mor96, Pai96, Phi96, Tor96, Aja95, Boo95, Bra95, Cha95, Dan95, Gof95a, Gof95b, Gor95, Kic95, Ley95, Mey95a, Mey95b, Ric95, Rob95, Sim94, Vir94, Zam94, Zuc94a, Zuc94b, Ale92, Zuc92, Hoc90, Dup89, Coy88, Wei88, Sas87, Fre84, Gir84, Its84, Mit78, Pie75, Pie70
α-Amino phosphonate: Zha96b
β-Amino thioester: Kob96a
Ammonium salt: Ost94, Dup89, Fre84
Aryl alkyne: You94
Aryl arsonic acid: Jac79
Aryl bromide: Del95, Ber83, Tay81, Jac79, Far76, Bon75, Pit75b, Rel74, Hei72
Aryl chloride: Tay81
Aryl iodide: Tay82
Aryl selenic acid: Tay83
Aryl sulfonic acid: Kam95, Tay82
Azide: Tor96
Benzimidazole: Phi96
Biaryl: Gui96, Mar96, Che95, For95, Bac94, Des94b, Fre94, Aps82
Boronic acid: Fra95, Far76, Sey76
Carbodiimide: Wei88, Wei72
β-Carboline (tetrahydro): May96, Moh96, Yan96b, Kal95


Carbamate: Geo96, Kal96, Smi96, Pai96, Hau95, Hut95, Kal95, Kic95, Als94, Hut94, Cho93, Dix78
Carboxylic acid: Pat94, Mar93, Bee92, Tam85, Gav79, Fyl78, Gol78, Far76, Fre75a
Carboxylic acid chloride: Mey95b, Fyl78, Fre75a, Aye65
Carboxylic ester: Mac96, Anu95, Mey95b, Vet95, Fre94, Hal94, Fre84, Hor78, Lez77c, Aye65, Mer63
  alkyne ester: Gav79
  vinyl ester: Gav79
Carboxylic peracid: Fre75a, Har74, Tak67
Chloride: Fyl78, Far76, Fre75b, Won74, Cra68
Diazonium chloride: Sem95, Kat76
Diene: Nie76
Dihydropyridine: Gor96b
Dihydropyrimidine: Gor96a, Wip95
Enamine: Gor96a, Mac96
Epoxide: Rob95, Dan94, Dan93
Ester: Gui96, Jud96, Tei96, Yan95, Cro80
Ether: Bro96, Ham96, Tei96, Tor96, Val96a, Bee95a, Boo95, Che95, Krc95, Ran95, Che94, Moo94, Ric94a, Tho94, Yan94, Bee92, Its87, Sie87, Fre84, Gir84, Its84, Hal82, Mca82, Lez77a, Lez77b, Lez72
Hydrazine: Sem95
Hydrazone: Hut96b, Sha95, Blo90
Imidazole: Sar96, Zha96a
Imidazopyrimidine (tetrahydro): Hut96a
Imide: Reb75b
Imine: Loo96, Ni96, Odo96, Boo95, Boy96, Bra95, Cha95, Che95, Gor95, Gre95, Loo95, Mur95, Plu95a, Plu95b, Bun94a, Bun94b, Dew93, Bun92, Mca82, Wor79
Indole: Hut96b
Iodide: Wor79
Isocyanide: Zha96a
Isoquinoline
  (dihydro): Meu95
  (tetrahydro): Hut96a, Meu95
Isoquinolinone: Gof95b
Isoxazole: Pei94, Bee92, Yed80
β-Keto ester: Gor96b, Cha81
Ketone: Aja95, Bra95, Plu95b, Zik95, Blo90, Gir84, Deg80, Lez80, Rol75, Cro70
Lactam: Ni96, Boo95, Gof95a
Lactone: Moo94, Moo92


Mesylate: Fyl78, Fyl77, Lez77b, Lez76
4-Metathiazanone: Hol95
Nitrile: Dup89, Gav79, Aye65
Oligosaccharide: Adi96, Rad96, Sof96, Dou95, Rob95, Dan94, Hal94, Sch94, Yan94, Dan93, Ver93, Dou91, Vee87, Chi76, Zeh73, Fre71
Oligosaccharide mimetic: Mul95
Oxathioacetal: Rav94
Oxazaborolidine: Fra95
Oxazaborolidinone: Kam95
Oxime: Sha95, Fin89, Gir84, Deg80, Far76, Lez73
Phosphate ester: Wij96, Ber95a, Tan95, Rob94, Dor84, Yip71
Phosphine: Gil96, Ber83, Mas78, Far76, Pit74, Rel74, Str74, Hei72
Phosphine dichloride: Rel74
Phosphine oxide: Rel74
Phosphine sulfide: Gil96
Phosphinic acid: Boy96
Phosphite ester: Tan95, Dor84
Phosphonate: Fat96, Leb96, Zha96b, Cam95, Cam94
Phosphonium salt: Fre84, Ber83, Fyl78, Hei72, Fre71
Phosphoramidate: Fat96, Tan95, Reb75a
Phosphorane: Hei72, Mck72, Cam71b
Phosphorothioate: Iye96
Phthalocyanine: Hal82, Lez82, Lez76
Porphyrin: Lez78a, Rol75
Pyridine: Gor96b
Pyrrole: Mja96
Pyrrolidine: Ham96, Ni96, Mur95
Quinazoline-2,4-dione: Buc96
Quinolone: Mac96
Selenide: Kat76
Selenoxide: Kat76
Silane: Tay81
Siloxy ether: Rob95, Sah93, Cha85
Silyl chloride: Cha85, Far76
Silyl enol ether: Kob96a, Kob96b
Silyl ether: Bee95b, Dou95, Dan94, Bee92
Sulfonamide: Bea96, Boi96, Che95, Dan95, Gen95, Kam95, Mey95b, Bac94, Reb75b, Reb74
Sulfonium ylid: Far79
Sulfonyl: Nie76, Mar70
Sulfonyl azide: Rou74
Sulfonyl chloride: Kam95, Rou74
Sulfoxide: Pat95


Thiazolidine: Pat95, Sha95
4-Thiazolidinone: Loo96, Ni96, Hol95, Loo94
Thioester: Kob96a, Kob96b, Can95, Ric94b, Svi87
Thioether: Yan96a, Sha96b, For95, Gas95, May95, Che94, Suc94, Mae92, Hod85, Far79, Far76, Nie76, Mar70
Thiol: Kob96a, Suc94, Vir94, Fre79, Far76
Thiourea: Chu95
Triazine: Del95
Urea: Buc96, Kim96, Kol96, Bur95, Hut95, Kic95, Mey95b, Ter95a, Hut94, Bur93, Dew93, Wei88

General reactions

Acylation (see also under Friedel–Crafts)
  of enolate: Cha81
  of N-hydroxy amide: Aki85
  of sulfonamide nitrogen: Bac96, Bac94
Alkylation (see also Enolate chemistry)
  of alkyne: Svi84, Fyl77
  of amide nitrogen: Buc96, Boo95, Plu95a, Plu95b, Bac94 (with diazomethane), Bun94a, Bun94b, Bun92
  of amine: Gre95
  of aryl group: Hua82, Far76, Kal74
  of benzyl halide: Dyg96
  of α-imine: Odo96
  of thiol: Suc94, Vir94
Alkynation
  with RC≡C–: Lez77b, Lez76
  via cross-coupling: You94
Aromatic nitration: Rav94
Benzyne chemistry: Maz79, Jay76
Bromination
  of aryl group: Ber83, Jac79, Far76, Bon75, Pit75b, Rel74, Cam71a
Carbonylation: Far76
Carboxylation: Tam85, Far76, Fyl76
Chlorination
  from alcohol: Far76, Won74
Cyclization
  amide formation: Lee96, Boo95, Kan94, Mar93, Rov91, Isi82
  biaryl formation: Aps82
  dihydroisoquinoline formation: Meu95
  imidazole formation: Zha96a
  imine formation: Plu95a, Plu95b, Bun94a, Bun94b, Bun92
  intramolecular Heck reaction: Gof95b, Hir95b
  intramolecular Michael addition: Sha96b, Ley95
  iodolactone: Moo94, Moo92
  tetrahydrofuran formation: Bee92
  thioether formation: May95
Cycloadditions
  [2+2] with cyclobutadiene: Reb75b, Reb74
  [2+2+1] Pauson–Khand: Bol96, Sch90
  [3+2] via münchnone: Mja96
  [3+2] with azomethine ylid: Ham96, Mur95
  [3+2] with RCNO: Bee95a, Bee95b, Mur95, Pei94, Bee92, Yed80
  [4+2] Diels–Alder: Yed80
  [4+2] Diels–Alder with diimine: Gav79
Deallylation (see also Palladium chemistry): Val96b, Ber95b
Electrophilic addition: Bee92, Tay82, Tay81
Enolate chemistry
  Li enolate: Bac94, Moo94, Moo92, Mca82, Cha81, Wor79, Pat70
  Na enolate: Cam71a
  Zn enolate: Kur94
Enol ether chemistry
  silyl enol ether: Kob96a
Ester hydrolysis: Pat94
Free radical
  addition to olefin by thio radical: Gas95, Hod85
  reduction: Wor79
Hydroboration: Hod83
Lithiation
  of alkyne: Svi84, Fyl77
  of aryl group: Ber83, Far76, Nie76, Cam71a
  of methyl thioether: Cro77
  of trityl group: Coh77
Nucleophilic addition: Bee92, Fyl78, Far76
  to imine: Hir83
  of phosphorus: Boy96
Nucleophilic aromatic substitution: Mac96, Nor96, Phi96, Sha96a, Smi96, Yan96a, Dan95, Isi82
Nucleophilic cyclization: Mac96, Vir94
Nucleophilic substitution: Bar96, Tor96, Aja95, Kic95, Ran95, Vir94, Dan93, Gru73


Oxidation
  primary alcohol to aldehyde
    pyr. SO3: Che94
  secondary alcohol to ketone
    DMSO-oxalyl chloride: Blo90
    PDC: Bra95
  aldehyde to carboxylic acid
    mCPBA: Bee92
    sodium dichromate/acid: Fre75a, Aye65
  aldehyde to carboxylic peracid
    ozone: Fre75a
  alkene to epoxide
    oxirane: Rob95, Dan94, Dan93
  alkylborane to alcohol
    trimethylamine oxide: Hod83
  carboxylic acid to carboxylic peracid
    hydrogen peroxide: Fre75a, Har74, Tak67
  carboxylic acid chloride to carboxylic peracid
    hydrogen peroxide: Fre75a
  primary chloride to aldehyde
    DMSO: Bee95b, Bee92, Dum73, Fre71, Aye65
  dihydropyridine to pyridine
    ceric ammonium nitrate: Gor96b
  phosphine to phosphine oxide
    peracetic acid: Rel74
  phosphite to phosphate
    iodine: Wij96, Tan95, Dor84
  selenide to selenoxide
    hydrogen peroxide or bromine/sodium hydroxide: Kat76
  thioether to sulfonyl
    hydrogen peroxide: Mar70
    mCPBA: Nie76
  thioether to sulfoxide
    mCPBA: Pat95
Phosphorylation: Wij96, Sha94b
Photochemical cyclization: Van75
Reductions
  acid to alcohol
    DIBAL: Kur94
  aldehyde to alcohol
    Red-Al: Won74, Lez73
    sodium borohydride: Zeh73
  alkene to alkane
    rhodium catalyst/H2: Oji94
  alkyne to alkene
    borane: Svi84
    disiamylborane (to give cis only): Fyl77
  amide to amine
    borane: Pai96
    Red-Al: Liu95
  aromatic nitro to aromatic amine
    tin(II) chloride: Gof95a, Mey95b
  azide to amine
    dithiol: Rob95, Dan94
    tin(II) chloride: Kim96, Tor96, Kic95
  carbamate to methylamine
    Red-Al: Liu95
  carboxylic ester to alcohol
    LiAlH4: Gir84, Fre71
  diazo to hydrazine
    SnCl2: Sem95
  disulfide to thiol
    PBu3: Suc94, Vir94
  imine to amine
    borane-pyridine complex: Kha96
    NaBH4: Cha95
    NaBH3CN: Bra95, Gre95, Meu95, Kal93, Hoc90, Coy88
    NaBH(OAc)3: Koh96, Boo95, Che95, Gor95, Ley95
  iodide to alkyl
    Bu3SnH: Wor79
  ketone to alcohol
    NaBH4: Aja95, Ale92
    via hydrosilylation: Mas78
  nitrile to primary amine
    LiAlH4: Dup89, Pie75, Pie70
  nitro group to amine
    NaBH4/Cu(acac)2: Phi96
  phosphine sulfide to phosphine
    methylation then HMPT: Gil96
  pyridinium salt to dihydropyridine
    Na2S2O4: Dup89
Reductive alkylation: Kha96, Koh96, Cha95, Gor95, Gre95, Mey95a, Kal93, Hoc90, Coy88, Sas87


Reductive amination: Boo95, Bra95, Che95, Ley95
Sulfonation: Kam95

Named reactions

Aldol addition: Kob96b, Kur94
Aldol condensation: Lez73
Aza-Wittig (Staudinger) reaction: Gof95a
Biginelli dihydropyrimidine synthesis: Wip95
Bischler–Napieralski reaction: Meu95
Dieckmann cyclization: Cro80, Cro70
Ene
  intramolecular: Tei96
Finkelstein: Wor79
Fischer indole synthesis: Hut96b
Friedel–Crafts: Aja95, Zik95, Gir84, Deg80
Grignard reaction: Che95, Hau95, Liu95, Xu83, Nie76, Fre75b, Kaw74, Lez73, Kaw72, Cra68
Hantzsch heterocyclization: Gor96b
Henry reaction: Bee95a, Bee95b, Bee92
Knoevenagel condensation: Gor96b, Tei96
Michael addition
  enolate
    intermolecular: Ley95
    intramolecular: Ley95
  RO–: Bee95a
  RS–
    intermolecular: Che94, Mae92
    intramolecular: Sha96b
  secondary amines: Kol96, Mor96
Mitsunobu reaction: Bro96, Ham96, Nor96, Val96a, Cam95, Krc95, Ran95, Ric94a
Pauson–Khand cycloaddition: Bol96, Sch90
Pictet–Spengler reaction: May96, Moh96, Yan96b, Kal95
Ugi four-component condensation: Mja96, Tem96, Zha96a
Wittig reaction (and Horner–Emmons condensation): Joh95, Wil95, Che94, Cha85, Fyl78, Lez77b, Lez77d, Won74, Fre71

Asymmetric reactions

Alkene hydrogenation (catalytic): Oji94
Lithium enolate: Moo94, Moo92, Mca82, Wor79
Nucleophilic addition to ketones
  Grignard reaction: Kaw74, Kaw72
Reduction
  ketone to alcohol
    via hydrosilylation: Mas78

Enzyme-catalyzed reactions

Acetal formation (glycosidic bond): Hal94, Mel94, Sch94

Transition metal chemistry

Cadmium
  organocadmium alkylation: Lez80
Chromium
  carbenoid reaction: Pul93
Cobalt
  cycloaddition: Bol96, Sch90
Manganese
  organomanganese alkylation: Lez80
Mercury
  aryl mercuration: Tay83
Palladium
  arylhalide-arylzinc coupling: Mar96
  deallylation: Val96b, Ber95a, Vet95, Kat94, Sha94a
  Heck
    intermolecular: Hir95a, You94, Yu94
    intramolecular: Koh96, Gof95b, Hir95b
  Stille: For95, Plu95a, Plu95b, Des94b
  Suzuki: Bro96, Gui96, Koh96, Che95, Bac94, Fre94
Platinum
  Suzuki-type reaction: Bro96
Rhodium
  carbene insertion (reaction with solid support in tagging): Bal95, Nes94
Scandium
  aldol reaction: Kob96b
Vanadium
  oxidative coupling: Aps82
Zinc
  arylhalide-arylzinc coupling: Mar96
  ene reaction: Tei96
  Lewis acid: Hut96b, Dor84


Solid-phase inorganic chemistry (SPIC)

Aluminum
  1,1'-bi-2-naphthol complex: Hu96
  phthalocyanine complexes: Lez93
Arsenic
  aryl arsonic acid: Jac79
Cobalt
  phosphine complex: Reg77, Eva74
  porphyrin complex: Rol75, Col74
Copper
  amine complex: Oht95
Iridium
  carbonyl cluster: Raf78
  phosphine complex: Col72
Iron
  cyclobutadiene complex: Reb75b, Reb74
  phosphine complex: Eva74, Str74
  porphyrin complex: Col73
Magnesium
  organomagnesium halide: Bur78
Mercury
  organomercury compounds: Jan82, Tay82, Bur78, Reb75b
Nickel
  amine complex: Pet95
  phosphine complex: Pet95, Bem82, Pit75b, Pit75c, Eva74
Osmium
  Sn-Os compound: Bur75
Palladium
  phosphine complex: Wan94, Bem82
Platinum
  phosphine complex: Bem82
Rhodium
  amine complex: Pet95
  ammonium ion pair: Set94
  cyclopentadienyl complex: Dyg96
  1,3-diketone complex: Cap82
  phosphine complex: Gil96, Ach78, Mas78, Pit75a, Pit75c, Pit74, Str74, Dum73, Col72, Cap71
Ruthenium
  phosphine complex: Ngu95, Pit75a, Pit75c, Str74
Tin
  tin chloride: Bur78
  tin hydride: Bla91, Ger91, Ger90


Titanium
  titanocene: Gru77, Bon75, Gru73
Vanadium
  imine complex: Das95

References

Ach78 Achiwa, K., Chem. Lett., (1978) 905.
Adi96 Adinolfi, M., Barone, G., De Napoli, L., Iadonisi, A. and Piccialli, G., Tetrahedron Lett., 37 (1996) 5007.
Aja95 Ajayaghosh, A. and Rajasekharan Pillai, V.N., Tetrahedron Lett., 36 (1995) 777.
Aki85 Akiyama, M., Shimizu, K., Aiba, S. and Katoh, H., Bull. Chem. Soc. Jpn., 58 (1985) 1421.
Ale92 Alewood, P.F., Brinkworth, R.I., Dancer, R.J., Garnham, B., Jones, A. and Kent, S.B.H., Tetrahedron Lett., 33 (1992) 977.
Als94 Alsina, J., Rabanal, F., Giralt, E. and Albericio, F., Tetrahedron Lett., 35 (1994) 9633.
Ang94 Angell, Y.M., Garcia-Echeverria, C. and Rich, D.H., Tetrahedron Lett., 35 (1994) 5981.
Anu95 Anuradha, M.V. and Ravindranath, B., Tetrahedron, 51 (1995) 5671.
Aps82 ApSimon, J.W. and Dixit, D.M., Can. J. Chem., 60 (1982) 368.
Aur95 Aurell, M.J., Boix, C., Ceita, M.L., Llopis, C., Tortajada, A. and Mestres, R., J. Chem. Res. (M), (1995) 2569.
Aye65 Ayers, J.T. and Mann, C.K., Polymer Lett., 3 (1965) 505.
Bac94 Backes, B.J. and Ellman, J.A., J. Am. Chem. Soc., 116 (1994) 11171.
Bac96 Backes, B.J., Virgilio, A.A. and Ellman, J.A., J. Am. Chem. Soc., 118 (1996) 3055.
Bal95 Baldwin, J.J., Burbaum, J.J., Henderson, I. and Ohlmeyer, M.H.J., J. Am. Chem. Soc., 117 (1995) 5588 (reaction with solid support in tagging).
Bar96 Barn, D.R., Morphy, J.R. and Rees, D.C., Tetrahedron Lett., 37 (1996) 3213.
Bea96 Beaver, K.A., Siegmund, A.C. and Spear, K.L., Tetrahedron Lett., 37 (1996) 1145.
Bee92 Beebe, X., Schore, N.E. and Kurth, M.J., J. Am. Chem. Soc., 114 (1992) 10061.
Bee95a Beebe, X., Chiappari, C.L., Olmstead, M.M., Kurth, M.J. and Schore, N.E., J. Org. Chem., 60 (1995) 4204.
Bee95b Beebe, X., Schore, N.E. and Kurth, M.J., J. Org. Chem., 60 (1995) 4196.
Bem82 Bemi, L., Clark, H.C., Davies, J.A., Fyfe, C.A. and Wasylishen, R.E., J. Am. Chem. Soc., 104 (1982) 438.
Ber83 Bernard, M. and Ford, W.T., J. Org. Chem., 48 (1983) 326.
Ber95a Bergmann, F. and Bannwarth, W., Tetrahedron Lett., 36 (1995) 1839.
Ber95b Bergmann, F., Kueng, E., Iaiza, P. and Bannwarth, W., Tetrahedron, 51 (1995) 6971.
Bla91 Blanton, J.R. and Salley, J.M., J. Org. Chem., 56 (1991) 490.
Blo90 Blossey, E.C., Cannon, R.G., Ford, W.T., Periyasamy, M. and Mohanraj, S., J. Org. Chem., 55 (1990) 4664.
Bol96 Bolton, G.L., Tetrahedron Lett., 37 (1996) 3433.
Bon75 Bonds Jr., W.D., Brubaker, C.H., Chandrasekaran, E.S., Gibbons, C., Grubbs, R.H. and Kroll, L.C., J. Am. Chem. Soc., 97 (1975) 2128.
Boo95 Boojamra, C.G., Burow, K.M. and Ellman, J.A., J. Org. Chem., 60 (1995) 5742.
Boy96 Boyd, E.A., Chan, W.C. and Loh Jr., V.M., Tetrahedron Lett., 37 (1996) 1647.
Bra95 Bray, A.M., Chiefari, D.S., Valerio, R.M. and Maeji, N.J., Tetrahedron Lett., 36 (1995) 5081.
Bro96 Brown, S.D. and Armstrong, R.W., J. Am. Chem. Soc., 118 (1996) 6331.
Buc96 Buckman, B.O. and Mohan, R., Tetrahedron Lett., 37 (1996) 4439.
Bun92 Bunin, B.A. and Ellman, J.A., J. Am. Chem. Soc., 114 (1992) 10997.
Bun94a Bunin, B.A., Plunkett, M.J. and Ellman, J.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 4708.

Bun94b Bunin, B.A. and Ellman, J.A., Polymer Preprints, 35 (1994) 983.
Bur75 Burlitch, J.M. and Winterton, R.C., J. Am. Chem. Soc., 97 (1975) 5605.
Bur78 Burlitch, J.M. and Winterton, R.C., J. Organomet. Chem., 159 (1978) 299.
Bur93 Burdick, D.J., Struble, M.E. and Burnier, J.P., Tetrahedron Lett., 34 (1993) 2589.
Bur95 Burgess, K., Linthicum, D.S. and Shin, H., Angew. Chem., Int. Ed. Engl., 34 (1995) 907.
Cam71a Camps, F., Castells, J., Ferrando, M.J. and Font, J., Tetrahedron Lett., (1971) 1713.
Cam71b Camps, F., Castells, J., Font, J. and Vela, F., Tetrahedron Lett., (1971) 1715.
Cam94 Campbell, D.A. and Bermak, J.C., J. Am. Chem. Soc., 116 (1994) 6039.
Cam95 Campbell, D.A., Bermak, J.C., Burkoth, T.S. and Patel, D.V., J. Am. Chem. Soc., 117 (1995) 5381.
Can95 Canne, L.E., Walker, S.M. and Kent, S.B.H., Tetrahedron Lett., 36 (1995) 1217.
Cap71 Capka, M., Svoboda, P., Cerný, M. and Hetflejš, J., Tetrahedron Lett., (1971) 4787.
Cap82 Capka, M., Czakoová, M., Urbaniak, W. and Schubert, U., J. Mol. Cat., 74 (1982) 335.
Cha81 Chang, Y.H. and Ford, W.T., J. Org. Chem., 46 (1981) 5364.
Cha85 Chan, T.-H. and Huang, W.-Q., J. Chem. Soc. Chem. Commun., (1985) 909.
Cha95 Chan, W.C. and Mellor, S.L., J. Chem. Soc. Chem. Commun., (1995) 1475.
Che94 Chen, C., Ahlberg Randall, L.A., Miller, R.B., Jones, A.D. and Kurth, M.J., J. Am. Chem. Soc., 116 (1994) 2661.
Che95 Chenera, B., Finkelstein, J.A. and Veber, D.F., J. Am. Chem. Soc., 117 (1995) 11999.
Chi76 Chiu, S.-H.L. and Anderson, L., Carbohydr. Res., 50 (1976) 227.
Cho93 Cho, C.Y., Moran, E.J., Chery, S.R., Stephans, J.C., Fodor, S.P.A., Adams, C.L., Sundaram, A., Jacobs, J.W. and Schultz, P.G., Science, 261 (1993) 1303.
Chu95 Chu, S.S. and Reich, S.H., Bioorg. Med. Chem. Lett., 5 (1995) 1053.
Coh77 Cohen, B.J., Kraus, M.A. and Patchornik, A., J. Am. Chem. Soc., 99 (1977) 4165.
Col72 Collman, J.P., Hegedus, L.S., Cooke, M.P., Norton, J.R., Dolcetti, G. and Marquardt, D.N., J. Am. Chem. Soc., 94 (1972) 1789.
Col73 Collman, J.P. and Reed, C.A., J. Am. Chem. Soc., 95 (1973) 2048.
Col74 Collman, J.P., Gagne, R.R., Kouba, J. and Ljusberg-Wahren, H., J. Am. Chem. Soc., 96 (1974) 6800.
Coy88 Coy, D.H., Hocart, S.J. and Sasaki, Y., Tetrahedron, 44 (1988) 835.
Cra68 Cramer, F. and Koster, H., Angew. Chem., Int. Ed. Engl., 7 (1968) 473.
Cro70 Crowley, J.I. and Rapoport, H., J. Am. Chem. Soc., 92 (1970) 6363.
Cro76 Crowley, J.I. and Rapoport, H., Acc. Chem. Res., 9 (1976) 135.
Cro77 Crosby, G.A. and Kato, M., J. Am. Chem. Soc., 99 (1977) 278.
Cro80 Crowley, J.I. and Rapoport, H., J. Org. Chem., 45 (1980) 3215.
Dan93 Danishefsky, S.J., McClure, K.F., Randolph, J.T. and Ruggeri, R.B., Science, 260 (1993) 1307.
Dan94 Danishefsky, S.J., Randolph, J.T., Roberge, J.Y., McClure, K.F. and Ruggeri, R.B., Polymer Preprints, 35 (1994) 977.
Dan95 Dankwardt, S.M., Newman, S.R. and Krstenansky, J.L., Tetrahedron Lett., 36 (1995) 4923.
Das95 Das, S.K., Kumar Jr., A., Nandrajog, S. and Kumar, A., Tetrahedron Lett., 36 (1995) 7909.
Deg80 DeGrado, W.F. and Kaiser, E.T., J. Org. Chem., 45 (1980) 1295.
Del95 Deleuze, H. and Sherrington, D.C., J. Chem. Soc. Perkin Trans. II, (1995) 2217.
Des94a Desai, M.C., Zuckermann, R.N. and Moos, W.H., Drug Dev. Res., 33 (1994) 174.
Des94b Deshpande, M.S., Tetrahedron Lett., 35 (1994) 5613.
Dew93 DeWitt, S.H., Kiely, J.S., Stankovic, C.J., Schroeder, M.C., Cody, D.M.R. and Pavia, M.R., Proc. Natl. Acad. Sci. USA, 90 (1993) 6909.
Dix78 Dixit, D.M. and Leznoff, C.C., Isr. J. Chem., 17 (1978) 248.
Dor84 Dorman, M.A., Noble, S.A., McBride, L.J. and Caruthers, M.H., Tetrahedron, 40 (1984) 95.
Dou91 Douglas, S.P., Whitfield, D.M. and Krepinsky, J.J., J. Am. Chem. Soc., 113 (1991) 5095.
Dou95 Douglas, S.P., Whitfield, D.M. and Krepinsky, J.J., J. Am. Chem. Soc., 117 (1995) 2116.
Dum73 Dumont, W., Poulin, J.-C., Dang, T.-P. and Kagan, H.B., J. Am. Chem. Soc., 95 (1973) 8295.


Dup89 Dupas, G., Decormeille, A., Bourguignon, J. and Quéguiner, G., Tetrahedron, 45 (1989) 2579.
Dyg96 Dygutsch, D.P. and Eilbracht, P., Tetrahedron, 52 (1996) 5461.
Ell96 Ellman, J.A., Acc. Chem. Res., 29 (1996) 132.
Eva74 Evans, G.O., Pittman Jr., C.U., McMillan, R., Beach, R.T. and Jones, R., J. Organomet. Chem., 67 (1974) 295.
Far76 Farrall, M.J. and Fréchet, J.M.J., J. Org. Chem., 41 (1976) 3877.
Far79 Farrall, M.J., Durst, T. and Fréchet, J.M.J., Tetrahedron Lett., (1979) 203.
Fat96 Fathi, R., Rudolph, M.J., Gentles, R.G., Patel, R., MacMillan, E.W., Reitman, M.S., Pelham, D. and Cook, A.F., J. Org. Chem., 61 (1996) 5600.
Fin89 Findeis, M.A. and Kaiser, E.T., J. Org. Chem., 54 (1989) 3478.
For95 Forman, F.W. and Sucholeiki, I., J. Org. Chem., 60 (1995) 523.
Fra95 Franot, C., Stone, G.B., Engeli, P., Spöndlin, C. and Waldvogel, E., Tetrahedron: Asymmetry, 6 (1995) 2755.
Fre71 Fréchet, J.M. and Schuerch, C., J. Am. Chem. Soc., 93 (1971) 492.
Fre75a Fréchet, J.M.J. and Haque, K.E., Macromolecules, 8 (1975) 130.
Fre75b Fréchet, J.M.J. and Haque, K.E., Tetrahedron Lett., (1975) 3055.
Fre75c Fréchet, J.M.J. and Pellé, G., J. Chem. Soc. Chem. Commun., (1975) 225.
Fre79 Fréchet, J.M.J., De Smet, M.D. and Farrall, M.J., Polymer, 20 (1979) 675.
Fre81 Fréchet, J.M.J., Tetrahedron, 37 (1981) 663.
Fre84 Fréchet, J.M.J., Kelly, J. and Sherrington, D.C., Polymer, 25 (1984) 1491.
Fre94 Frenette, R. and Friesen, R.W., Tetrahedron Lett., 35 (1994) 9177.
Fru96 Früchtel, J.S. and Jung, G., Angew. Chem., Int. Ed. Engl., 35 (1996) 17.
Fyl76 Fyles, T.M. and Leznoff, C.C., Can. J. Chem., 54 (1976) 935.
Fyl77 Fyles, T.M., Leznoff, C.C. and Weatherston, J., Can. J. Chem., 55 (1977) 4135.
Fyl78 Fyles, T.M., Leznoff, C.C. and Weatherston, J., Can. J. Chem., 56 (1978) 1031.
Gas95 Gasparrini, F., Misiti, D., Villani, C., Borchardt, A., Burger, M.T. and Still, W.C., J. Org. Chem., 60 (1995) 4314.
Gav79 Gaviña, F., Gil, P. and Palazón, B., Tetrahedron Lett., (1979) 1333.
Gen95 Gennari, C., Nestler, H.P., Salom, B. and Still, W.C., Angew. Chem., Int. Ed. Engl., 34 (1995) 1763.
Geo96 George, M.H., Hailes, H.C. and Widdowson, D.A., J. Chem. Soc. Perkin Trans. I, (1996) 1395.
Ger90 Gerigk, U., Gerlach, M., Neumann, W.P., Vieler, R. and Weintritt, V., Synthesis, (1990) 448.
Ger91 Gerlach, M., Jördens, F., Kuhn, H., Neumann, W.P. and Peterseim, M., J. Org. Chem., 56 (1991) 5971.
Gil96 Gilbertson, S.R. and Wang, X., Tetrahedron Lett., 37 (1996) 6475.
Gir84 Giralt, E., Rizo, J. and Pedroso, E., Tetrahedron, 40 (1984) 4141.
Gof95a Goff, D.A. and Zuckermann, R.N., J. Org. Chem., 60 (1995) 5744.
Gof95b Goff, D.A. and Zuckermann, R.N., J. Org. Chem., 60 (1995) 5748.
Gol78 Goldwasser, J.M. and Leznoff, C.C., Can. J. Chem., 56 (1978) 1562.
Gor96a Gordeev, M.F., Patel, D.V. and Gordon, E.M., J. Org. Chem., 61 (1996) 924.
Gor96b Gordeev, M.F., Patel, D.V., Wu, J. and Gordon, E.M., Tetrahedron Lett., 37 (1996) 4643.
Gor95 Gordon, D.W. and Steele, J., Bioorg. Med. Chem. Lett., 5 (1995) 47.
Gor94 Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, S.P.A. and Gallop, M.A., J. Med. Chem., 37 (1994) 1385.
Gre95 Green, J., J. Org. Chem., 60 (1995) 4287.
Gru73 Grubbs, R.H., Gibbons, C., Kroll, L.C., Bonds Jr., W.D. and Brubaker Jr., C.H., J. Am. Chem. Soc., 95 (1973) 2373.
Gru77 Grubbs, R., Lau, C.P., Cukier, R. and Brubaker Jr., C., J. Am. Chem. Soc., 99 (1977) 4517.
Gui96 Guiles, J.W., Johnson, S.G. and Murray, W.V., J. Org. Chem., 61 (1996) 5169.
Hal82 Hall, T.W., Greenberg, S., McArthur, C.R., Khouw, B. and Leznoff, C.C., Nouv. J. Chem., 6 (1982) 653.


Hal94 Halcomb, R.L., Huang, H. and Wong, C.-H., J. Am. Chem. Soc., 116 (1994) 11315.
Ham96 Hamper, B.C., Dukesherer, D.R. and South, M.S., Tetrahedron Lett., 37 (1996) 3671.
Har74 Harrison, C.R. and Hodge, P., J. Chem. Soc. Chem. Commun., (1974) 1009.
Hau95 Hauske, J.R. and Dorff, P., Tetrahedron Lett., 36 (1995) 1589.
Hei72 Heitz, W. and Michels, R., Angew. Chem., Int. Ed. Engl., 11 (1972) 298.
Her96 Hermkens, P.H.H., Ottenheijm, H.C.J. and Rees, D., Tetrahedron, 52 (1996) 4527.
Hir83 Hirao, A., Itsuno, S., Hattori, I., Yamaguchi, K., Nakahama, S. and Yamazaki, N., J. Chem. Soc. Chem. Commun., (1983) 25.
Hir95a Hiroshige, M., Hauske, J.R. and Zhou, P., Tetrahedron Lett., 36 (1995) 4567.
Hir95b Hiroshige, M., Hauske, J.R. and Zhou, P., J. Am. Chem. Soc., 117 (1995) 11590.
Hoc90 Hocart, S.J., Murphy, W.A. and Coy, D.H., J. Med. Chem., 33 (1990) 1954.
Hod80 Hodge, P. and Sherrington, D.C. (Eds.) Polymer-Supported Reactions in Organic Synthesis, Wiley, Chichester, U.K., 1980.
Hod83 Hodge, P. and Waterhouse, J., J. Chem. Soc. Perkin Trans. I, (1983) 2319.
Hod85 Hodge, P., Khoshdel, E., Waterhouse, J. and Fréchet, J.M.J., J. Chem. Soc. Perkin Trans. I, (1985) 2327.
Hol95 Holmes, C.P., Chinn, J.P., Look, G.C., Gordon, E.M. and Gallop, M.A., J. Org. Chem., 60 (1995) 7328.
Hor78 Horiki, K., Igano, K. and Inouye, K., Chem. Lett., (1978) 165.
Hu96 Hu, Q.-S., Zheng, X.-F. and Pu, L., J. Org. Chem., 61 (1996) 5200.
Hua82 Huang, X., Chan, C.-C. and Zhou, Q.-S., Synth. Commun., 12 (1982) 709.
Hut94 Hutchins, S.M. and Chapman, K.T., Tetrahedron Lett., 35 (1994) 4055.
Hut95 Hutchins, S.M. and Chapman, K.T., Tetrahedron Lett., 36 (1995) 2583.
Hut96a Hutchins, S.M. and Chapman, K.T., Tetrahedron Lett., 37 (1996) 4865.
Hut96b Hutchins, S.M. and Chapman, K.T., Tetrahedron Lett., 37 (1996) 4869.
Isi82 Isied, S.S., Kuehn, C.G., Lyon, J.M. and Merrifield, R.B., J. Am. Chem. Soc., 104 (1982) 2632.
Its84 Itsuno, S., Ito, K., Hirao, A. and Nakahama, S., J. Chem. Soc. Perkin Trans. I, (1984) 2887.
Its87 Itsuno, S., Sakurai, Y., Ito, K., Hirao, A. and Nakahama, S., Polymer, 28 (1987) 1005.
Iye96 Iyer, R.P., Devlin, T., Habus, I., Ho, N.-H., Yu, D. and Agrawal, S., Tetrahedron Lett., 37 (1996) 1539.
Jac79 Jacobson, S.E., Mares, F. and Zambri, P.M., J. Am. Chem. Soc., 101 (1979) 6938.
Jan82 Janout, V. and Regen, S.L., J. Org. Chem., 47 (1982) 2212.
Jay76 Jayalekshmy, P. and Mazur, S., J. Am. Chem. Soc., 98 (1976) 6710.
Joh95 Johnson, C.R. and Zhang, B., Tetrahedron Lett., 36 (1995) 9253.
Jud96 Judice, J.K., Namenuk, A.K. and Burnier, J.P., Bioorg. Med. Chem. Lett., 6 (1996) 1961.
Kal74 Kalir, R., Fridkin, M. and Patchornik, A., Eur. J. Biochem., 42 (1974) 151.
Kal93 Kaljuste, K. and Unden, A., Int. J. Pept. Protein Res., 42 (1993) 118.
Kal95 Kaljuste, K. and Unden, A., Tetrahedron Lett., 36 (1995) 9211.
Kal96 Kaljuste, K. and Unden, A., Tetrahedron Lett., 37 (1996) 3031.
Kam95 Kamahori, K., Tada, S., Ito, K. and Itsuno, S., Tetrahedron: Asymmetry, 6 (1995) 2547.
Kan94 Kania, R.S., Zuckermann, R.N. and Marlowe, C.K., J. Am. Chem. Soc., 116 (1994) 8835.
Kat76 Kato, M., Michels, R. and Heitz, W., Polymer Lett., 14 (1976) 413.
Kat94 Kates, S.A., de la Torre, B.G., Eritja, R. and Albericio, F., Tetrahedron Lett., 35 (1994) 1033.
Kaw72 Kawana, M. and Emoto, S., Tetrahedron Lett., (1972) 4855.
Kaw74 Kawana, M. and Emoto, S., Bull. Chem. Soc. Jpn., 47 (1974) 160.
Kha96 Khan, N.M., Arumugam, V. and Balasubramanian, S., Tetrahedron Lett., 37 (1996) 4819.
Kic95 Kick, E.K. and Ellman, J.A., J. Med. Chem., 38 (1995) 1427.
Kim96 Kim, J.-M., Bi, Y., Paikoff, S.J. and Schultz, P.G., Tetrahedron Lett., 37 (1996) 5305.
Kob96a Kobayashi, S., Hachiya, I., Suzuki, S. and Moriwaki, M., Tetrahedron Lett., 37 (1996) 2809.
Kob96b Kobayashi, S., Hachiya, I. and Yasuda, M., Tetrahedron Lett., 37 (1996) 5569.


Koh96 Koh, J.S. and Ellman, J.A., J. Org. Chem., 61 (1996) 4494.
Kol96 Kolodziej, S.A. and Hamper, B.C., Tetrahedron Lett., 37 (1996) 5277.
Krc95 Krchnák, V., Flegelová, Z., Weichsel, A.S. and Lebl, M., Tetrahedron Lett., 36 (1995) 6193.
Kur94 Kurth, M.J., Ahlberg Randall, L.A., Chen, C., Melander, C., Miller, R.B., McAlister, K., Reitz, G., Kang, R., Nakatsu, T. and Green, C., J. Org. Chem., 59 (1994) 5862.
Las87 Laszlo, P. (Ed.) Preparative Chemistry Using Supported Reagents, Academic Press, San Diego, CA, U.S.A., 1987.
Leb96 Le Bec, C. and Wickstrom, E., J. Org. Chem., 61 (1996) 510.
Lee96 Lee, J., Griffin, J.H. and Nicas, T.I., J. Org. Chem., 61 (1996) 3983.
Ley95 Ley, S.V., Mynett, D.M. and Koot, W.J., Synlett., (1995) 1017.
Lez72 Leznoff, C.C. and Wong, J.Y., Can. J. Chem., 50 (1972) 2892.
Lez73 Leznoff, C.C. and Wong, J.Y., Can. J. Chem., 51 (1973) 3756.
Lez76 Leznoff, C.C. and Fyles, T.M., J. Chem. Soc. Chem. Commun., (1976) 251.
Lez77a Leznoff, C.C. and Dixit, D.M., Can. J. Chem., 55 (1977) 3351.
Lez77b Leznoff, C.C., Fyles, T.M. and Weatherston, J., Can. J. Chem., 55 (1977) 1143.
Lez77c Leznoff, C.C. and Goldwasser, J.M., Tetrahedron Lett., (1977) 1875.
Lez77d Leznoff, C.C. and Sywanyk, W., J. Org. Chem., 42 (1977) 3203.
Lez78a Leznoff, C.C. and Svirskaya, P.I., Angew. Chem., Int. Ed. Engl., 17 (1978) 947.
Lez78b Leznoff, C.C., Acc. Chem. Res., 11 (1978) 327.
Lez80 Leznoff, C.C. and Yedidia, V., Can. J. Chem., 58 (1980) 287.
Lez82 Leznoff, C.C. and Hall, T.W., Tetrahedron Lett., 23 (1982) 3023.
Lez93 Leznoff, C.C., McArthur, C.R. and Qin, Y., Can. J. Chem., 71 (1993) 1319.
Liu95 Liu, G. and Ellman, J.A., J. Org. Chem., 60 (1995) 7712.
Loo94 Look, G.C., Holmes, C.P., Chinn, J.P. and Gallop, M.A., J. Org. Chem., 59 (1994) 7588.
Loo95 Look, G.C., Murphy, M.M., Campbell, D.A. and Gallop, M.A., Tetrahedron Lett., 36 (1995) 2937.
Loo96 Look, G.C., Schullek, J.R., Holmes, C.P., Chinn, J.P., Gordon, E.M. and Gallop, M.A., Bioorg. Med. Chem. Lett., 6 (1996) 707.
Mac96 MacDonald, A.A., DeWitt, S.H., Hogan, E.M. and Ramage, R., Tetrahedron Lett., 37 (1996) 4815.
Mae92 Maeji, N.J., Tribbick, G., Bray, A.M. and Geysen, H.M., J. Immunol. Meth., 146 (1992) 83.
Mar70 Marshall, D.L. and Liener, I.E., J. Org. Chem., 35 (1970) 867.
Mar93 Marlowe, C.K., Bioorg. Med. Chem. Lett., 3 (1993) 437.
Mar96 Marquais, S. and Arlt, M., Tetrahedron Lett., 37 (1996) 5491.
Mas78 Masuda, T. and Stille, J.K., J. Am. Chem. Soc., 100 (1978) 268.
May96 Mayer, J.P., Bankaitis-Davis, D., Zhang, J., Beaton, G., Bjergarde, K., Anderson, C.M., Goodman, B.A. and Herrera, C.J., Tetrahedron Lett., 37 (1996) 5633.
May95 Mayer, J.P., Heil, J.R., Zhang, J. and Munson, M.C., Tetrahedron Lett., 36 (1995) 7387.
Maz79 Mazur, S. and Jayalekshmy, P., J. Am. Chem. Soc., 101 (1979) 677.
Mca82 McArthur, C.R., Worster, P.M., Jiang, J.-L. and Leznoff, C.C., Can. J. Chem., 60 (1982) 1836.
Mck72 McKinley, S.V. and Rakshys Jr., J.W., J. Chem. Soc. Chem. Commun., (1972) 134.
Mel94 Meldal, M., Auzanneau, F.-I., Hindsgaul, O. and Palcic, M.M., J. Chem. Soc. Chem. Commun., (1994) 1849.
Mer63 Merrifield, R.B., J. Am. Chem. Soc., 85 (1963) 2149.
Meu95 Meutermans, W.D.F. and Alewood, P.F., Tetrahedron Lett., 36 (1995) 7709.
Mey95a Meyer, J.-P., Davis, P., Lee, K.B., Porreca, F., Yamamura, H.I. and Hruby, V.J., J. Med. Chem., 38 (1995) 3462.
Mey95b Meyers, H.V., Dilley, G.J., Durgin, T.L., Powers, T.S., Winssinger, N.A., Zhu, H. and Pavia, M.R., Mol. Div., 1 (1995) 13 (WWW: http://vesta.pd.com/accepted/full_text/article_3/md_003.htm).
Mit78 Mitchell, A.R., Kent, S.B.H., Engelhard, M. and Merrifield, R.B., J. Org. Chem., 43 (1978) 2845.
Mja96 Mjalli, A.M.M., Sarshar, S. and Baiga, T.J., Tetrahedron Lett., 37 (1996) 2943.
Moh96 Mohan, R., Chou, Y. and Morrissey, M.M., Tetrahedron Lett., 37 (1996) 3963.


Moo92 Moon, H.-S., Schore, N.E. and Kurth, M.J., J. Org. Chem., 57 (1992) 6088.
Moo94 Moon, H.-S., Schore, N.E. and Kurth, M.J., Tetrahedron Lett., 35 (1994) 8915.
Mor96 Morphy, J.R., Rankovic, Z. and Rees, D.C., Tetrahedron Lett., 37 (1996) 3209.
Mul95 Müller, C., Kitas, E. and Wessel, H.P., J. Chem. Soc. Chem. Commun., (1995) 2425.
Mur95 Murphy, M.M., Schullek, J.R., Gordon, E.M. and Gallop, M.A., J. Am. Chem. Soc., 117 (1995) 7029.
Nes94 Nestler, H.P., Bartlett, P.A. and Still, W.C., J. Org. Chem., 59 (1994) 4723 (reaction with solid support in tagging).
Ngu95 Nguyen, S.T. and Grubbs, R.H., J. Organomet. Chem., 497 (1995) 195.
Ni96 Ni, Z.-J., Maclean, D., Holmes, C.P., Murphy, M.M., Ruhland, B., Jacobs, J.W., Gordon, E.M. and Gallop, M.A., J. Med. Chem., 39 (1996) 1601.
Nie76 Nieuwstad, Th.J., Kieboom, A.P.G., Breijer, A.J., Van der Linden, J. and Van Bekkum, H., Recl. Trav. Chim. Pays-Bas, 95 (1976) 225.
Nor96 Norman, T.C., Gray, N.S., Koh, J.T. and Schultz, P.G., J. Am. Chem. Soc., 118 (1996) 7430.
Odo96 O’Donnell, M.J., Zhou, C. and Scott, W.L., J. Am. Chem. Soc., 118 (1996) 6070.
Oht95 Ohtani, N., Inoue, Y., Inagaki, Y., Fukuda, K. and Nishiyama, T., Bull. Chem. Soc. Jpn., 68 (1995) 1669.
Oji94 Ojima, I., Tsai, C.-Y. and Zhang, Z., Tetrahedron Lett., 35 (1994) 5785.
Ost94 Ostresh, J.M., Husar, G.M., Blondelle, S.E., Dörner, B., Weber, P.A. and Houghten, R.A., Proc. Natl. Acad. Sci. USA, 91 (1994) 11138.
Pai96 Paikoff, S.J., Wilson, T.E., Cho, C.Y. and Schultz, P.G., Tetrahedron Lett., 37 (1996) 5653.
Pat70 Patchornik, A. and Kraus, M.A., J. Am. Chem. Soc., 92 (1970) 7587.
Pat75 Patchornik, A. and Kraus, M.A., Pure Appl. Chem., 43 (1975) 503.
Pat94 Pátek, M., Drake, B. and Lebl, M., Tetrahedron Lett., 35 (1994) 9169.
Pat95 Pátek, M., Drake, B. and Lebl, M., Tetrahedron Lett., 36 (1995) 2227.
Pat96 Patel, D.V. and Gordon, E.M., Drug Discov. Technol., 1 (1996) 134.
Pei94 Pei, Y. and Moos, W.H., Tetrahedron Lett., 35 (1994) 5825.
Pet95 Petrucci, M.G.L. and Kakkar, A.K., J. Chem. Soc. Chem. Commun., (1995) 1577.
Phi96 Phillips, G.B. and Wei, G.P., Tetrahedron Lett., 37 (1996) 4887.
Pie70 Pietta, P.G. and Marshall, G.M., J. Chem. Soc. Chem. Commun., (1970) 650.
Pie75 Pietta, P. and Brenna, O., J. Org. Chem., 40 (1975) 2995.
Pit74 Pittman Jr., C.U. and Hanes, R.M., Ann. New York Acad. Sci., 239 (1974) 76.
Pit75a Pittman Jr., C.U., Smith, L.R. and Hanes, R.M., J. Am. Chem. Soc., 97 (1975) 1742.
Pit75b Pittman Jr., C.U. and Smith, L.R., J. Am. Chem. Soc., 97 (1975) 341.
Pit75c Pittman Jr., C.U. and Smith, L.R., J. Am. Chem. Soc., 97 (1975) 1749.
Plu95a Plunkett, M.J. and Ellman, J.A., J. Org. Chem., 60 (1995) 6006.
Plu95b Plunkett, M.J. and Ellman, J.A., J. Am. Chem. Soc., 117 (1995) 3306.
Pul93 Pulley, S.R. and Hegedus, L.S., J. Am. Chem. Soc., 115 (1993) 9037.
Rad96 Rademann, J. and Schmidt, R.R., Tetrahedron Lett., 37 (1996) 3989.
Raf78 Rafalko, J.J., Lieto, J., Gates, B.C. and Schrader Jr., G.L., J. Chem. Soc. Chem. Commun., (1978) 540.
Ran95 Rano, T.A. and Chapman, K.T., Tetrahedron Lett., 36 (1995) 3789.
Rav94 Ravindranathan, T., Chavan, S.P. and Awachat, M.M., Tetrahedron Lett., 35 (1994) 8835.
Reb74 Rebek, J. and Gaviña, F., J. Am. Chem. Soc., 96 (1974) 7112.
Reb75a Rebek Jr., J. and Gaviña, F., J. Am. Chem. Soc., 97 (1975) 3221.
Reb75b Rebek Jr., J. and Gaviña, F., J. Am. Chem. Soc., 97 (1975) 3453.
Reg77 Regen, S.L. and Lee, D.P., Macromolecules, 10 (1977) 1418.
Rel74 Relles, H.M. and Schluenz, R.W., J. Am. Chem. Soc., 96 (1974) 6469.
Ric94a Richter, L.S. and Gadek, T.R., Tetrahedron Lett., 35 (1994) 4705.
Ric94b Richter, L.S., Tom, J.Y.K. and Burnier, J.P., Tetrahedron Lett., 35 (1994) 5547.


Ric95 Richter, L.S. and Zuckermann, R.N., Bioorg. Med. Chem. Lett., 5 (1995) 1159.
Rob94 Robles, J., Pedroso, E. and Grandas, A., J. Org. Chem., 59 (1994) 2482.
Rob95 Roberge, J.Y., Beebe, X. and Danishefsky, S.J., Science, 269 (1995) 202.
Rol75 Rollman, L.D., J. Am. Chem. Soc., 97 (1975) 2132.
Rou74 Roush, W.R., Feitler, D. and Rebek, J., Tetrahedron Lett., (1974) 1391.
Rov91 Rovero, P., Quartara, L. and Fabbri, G., Tetrahedron Lett., 32 (1991) 2639.
Sah93 Saha, A.K., Sardaro, M., Waychunas, C., Delecki, D., Kutny, R., Cavanaugh, P., Yawman, A., Upson, D.A. and Kruse, L.I., J. Org. Chem., 58 (1993) 7827.
Sar96 Sarshar, S., Siev, D. and Mjalli, A.M.M., Tetrahedron Lett., 37 (1996) 835.
Sas87 Sasaki, Y., Murphy, W.A., Heiman, M.L., Lance, V.A. and Coy, D.H., J. Med. Chem., 30 (1987) 1162.
Sch90 Schore, N.E. and Najdi, S.D., J. Am. Chem. Soc., 112 (1990) 441.
Sch94 Schuster, M., Wang, P., Paulson, J.C. and Wong, C.-H., J. Am. Chem. Soc., 116 (1994) 1135.
Sem95 Semenov, A.N. and Gordeev, K.Y., Int. J. Pept. Protein Res., 45 (1995) 303.
Set94 Setty-Fichman, M., Blum, J., Sasson, Y. and Eisen, M., Tetrahedron Lett., 35 (1994) 781.
Sey76 Seymour, E. and Fréchet, J.M.J., Tetrahedron Lett., (1976) 1149.
Sha94a Shapiro, G. and Buechler, D., Tetrahedron Lett., 35 (1994) 5421.
Sha94b Shapiro, G., Swoboda, R. and Stauss, U., Tetrahedron Lett., 35 (1994) 869.
Sha95 Shao, J. and Tam, J.P., J. Am. Chem. Soc., 117 (1995) 3893.
Sha96a Shapiro, M.J., Kumaravel, G., Petter, R.C. and Beveridge, R., Tetrahedron Lett., 37 (1996) 4671.
Sha96b Sharma, S.K., Wu, A.D. and Chandramouli, N., Tetrahedron Lett., 37 (1996) 5665.
Sie87 Sieber, P., Tetrahedron Lett., 28 (1987) 2107.
Sim92 Simon, R.J., Kania, R.S., Zuckermann, R.N., Huebner, V.D., Jewell, D.A., Banville, S., Ng, S., Wang, L., Rosenberg, S., Marlowe, C.K., Spellmeyer, D.C., Tan, R., Frankel, A.D., Santi, D.V., Cohen, F.E. and Bartlett, P.A., Proc. Natl. Acad. Sci. USA, 89 (1992) 9367.
Sim94 Simon, R.J., Martin, E.J., Miller, S.M., Zuckermann, R.N., Blaney, J.M. and Moos, W.H., Tech. Prot. Chem., 5 (1994) 533.
Smi96 Smith, A.L., Thomson, C.G. and Leeson, P.D., Bioorg. Med. Chem. Lett., 6 (1996) 1483.
Sof96 Sofia, M.J., Drug Discov. Technol., 1 (1996) 27.
Str74 Strathdee, G. and Given, R., Can. J. Chem., 52 (1974) 3000.
Suc94 Sucholeiki, I., Tetrahedron Lett., 35 (1994) 7307.
Svi79 Svirskaya, P.I., Leznoff, C.C., Weatherston, J. and Laing, J.E., J. Chem. Eng. Data, 24 (1979) 152.
Svi84 Svirskaya, P.I. and Leznoff, C.C., J. Chem. Ecol., 10 (1984) 321.
Svi87 Svirskaya, P.I. and Leznoff, C.C., J. Org. Chem., 52 (1987) 1362.
Tak67 Takagi, T., J. Polymer Sci. Polym. Lett., 5 (1967) 1031.
Tam85 Tam, J.P., J. Org. Chem., 50 (1985) 5291.
Tan95 Tan, W., Iyer, R.P., Jiang, Z., Yu, D. and Agrawal, S., Tetrahedron Lett., 36 (1995) 5323.
Tay81 Taylor, R.T., Crawshaw, D.B., Paperman, J.B., Flood, L.A. and Cassell, R.A., Macromolecules, 14 (1981) 1134.
Tay82 Taylor, R.T., Cassell, R.A. and Flood, L.A., Ind. Eng. Chem. Prod. Res. Dev., 21 (1982) 462.
Tay83 Taylor, R.T. and Flood, L.A., J. Org. Chem., 48 (1983) 5160.
Tei96 Teitze, L.F. and Steinmetz, A., Angew. Chem., Int. Ed. Engl., 35 (1996) 651.
Tem96 Tempest, P.A., Brown, S.D. and Armstrong, R.W., Angew. Chem., Int. Ed. Engl., 35 (1996) 640.
Ter95a Terrett, N.K., Bojanic, D., Brown, D., Bungay, P.J., Gardner, M., Gordon, D.W., Mayers, C.J. and Steele, J., Bioorg. Med. Chem. Lett., 5 (1995) 917.
Ter95b Terrett, N.K., Gardner, M., Gordon, D.W., Kobylecki, R.J. and Steele, J., Tetrahedron, 51 (1995) 8135.
Tho94 Thompson, L.A. and Ellman, J.A., Tetrahedron Lett., 35 (1994) 9333.
Tor96 Tortolani, D.R. and Biller, S.A., Tetrahedron Lett., 37 (1996) 5687.
Val96a Valerio, R.M., Bray, A.M. and Patsiouras, H., Tetrahedron Lett., 37 (1996) 3019.


Val96b Valerio, R.M., Bray, A.M. and Stewart, K.M., Int. J. Pept. Protein Res., 47 (1996) 414.
Van75 Vanest, J.-M., Gorsane, M., Libert, V., Pecher, J. and Martin, R.H., Chimia, 29 (1975) 343.
Vee87 Veeneman, G.H., Notermans, S., Liskamp, R.M.J., Van der Marel, G.A. and Van Boom, J.H., Tetrahedron Lett., 28 (1987) 6695.
Ver93 Verduyn, R., Van der Klein, P.A.M., Douwes, M., Van der Marel, G.A. and Van Boom, J.H., Recl. Trav. Chim. Pays-Bas, 112 (1993) 464.
Vet95 Vetter, D., Tumelty, D., Singh, S.K. and Gallop, M.A., Angew. Chem., Int. Ed. Engl., 34 (1995) 60.
Vir94 Virgilio, A.A. and Ellman, J.A., J. Am. Chem. Soc., 116 (1994) 11580.
Wan94 Wang, P.-W. and Fox, M.A., J. Org. Chem., 59 (1994) 5358.
Wei72 Weinshenker, N.M. and Shen, C.M., Tetrahedron Lett., (1972) 3281.
Wei88 Weinshenker, N.M., Shen, C.M. and Wong, J.Y., Org. Synthesis Coll., VI (1988) 951.
Wij96 Wijkmans, J.C.H.M., Meeuwenoord, N.J., Bloemhoff, W., Van der Marel, G.A. and Van Boom, J.H., Tetrahedron, 52 (1996) 2103.
Wil95 Williard, R., Jammalamadaka, V., Zava, D., Benz, C.C., Hunt, C.A., Kushner, P.J. and Scanlan, T.S., Chem. Biol., 2 (1995) 45.
Wip95 Wipf, P. and Cunningham, A., Tetrahedron Lett., 36 (1995) 7819.
Won73 Wong, J.Y. and Leznoff, C.C., Can. J. Chem., 51 (1973) 2452.
Won74 Wong, J.Y., Manning, C. and Leznoff, C.C., Angew. Chem., Int. Ed. Engl., 13 (1974) 666.
Wor79 Worster, P.M., McArthur, C.R. and Leznoff, C.C., Angew. Chem., Int. Ed. Engl., 18 (1979) 221.
Xu83 Xu, Z.-H., McArthur, C.R. and Leznoff, C.C., Can. J. Chem., 61 (1983) 1405.
Yan94 Yan, L., Taylor, C.M., Goodnow Jr., R. and Kahne, D., J. Am. Chem. Soc., 116 (1994) 6953.
Yan95 Yan, B., Kumaravel, G., Anjaria, H., Wu, A., Petter, R.C., Jewell Jr., C.F. and Wareing, J.R., J. Org. Chem., 60 (1995) 5736.
Yan96a Yan, B. and Kumaravel, G., Tetrahedron, 52 (1996) 843.
Yan96b Yang, L. and Guo, L., Tetrahedron Lett., 37 (1996) 5041.
Yed80 Yedidia, V. and Leznoff, C.C., Can. J. Chem., 58 (1980) 1144.
Yip71 Yip, K.F. and Tsou, K.C., J. Am. Chem. Soc., 93 (1971) 3272.
You94 Young, J.K., Nelson, J.C. and Moore, J.S., J. Am. Chem. Soc., 116 (1994) 10841.
Yu94 Yu, K.-L., Deshpande, M.S. and Vyas, D.M., Tetrahedron Lett., 35 (1994) 8919.
Zam94 Zambias, R.A., Boulton, D.A. and Griffin, P.R., Tetrahedron Lett., 35 (1994) 4283.
Zeh73 Zehavi, U. and Patchornik, A., J. Am. Chem. Soc., 95 (1973) 5673.
Zha96a Zhang, C., Moran, E.J., Woiwode, T.F., Short, K.M. and Mjalli, A.M.M., Tetrahedron Lett., 37 (1996) 751.
Zha96b Zhang, C. and Mjalli, A.M.M., Tetrahedron Lett., 37 (1996) 5457.
Zik95 Zikos, C.C. and Ferderigos, N.G., Tetrahedron Lett., 36 (1995) 3741.
Zuc92 Zuckermann, R.N., Kerr, J.M., Kent, S.B.H. and Moos, W.H., J. Am. Chem. Soc., 114 (1992) 10646.
Zuc94a Zuckermann, R.N. and Goff, D.A., Polymer Preprints, 35 (1994) 975.
Zuc94b Zuckermann, R.N., Martin, E.J., Spellmeyer, D.C., Stauber, G.B., Shoemaker, K.R., Kerr, J.M., Figliozzi, G.M., Goff, D.A., Siani, M.A., Simon, R.J., Banville, S.C., Brown, E.G., Wang, L., Richter, L.S. and Moos, W.H., J. Med. Chem., 37 (1994) 2678.


Indexes


Author Index

Baldwin, J.J. 287
Bell, S. 169
Bozicevic, K. 298
Christie, B.D. 267
Coe, D.M. 50
Collins, J. 210
DeWitt, S.H. 69
Dolle, R.E. 287
Edwards, M.G. 314
Ellington, A.D. 93, 169
Fitch, W.L. 59
Hall, S.E. 30
Harris, A.L. 273
Hauske, J. 321
Hirao, I. 169
James, I.W. 326
Jhaveri, S.D. 169
Kay, B.K. 93
Kiely, J.S. 6
Lam, K.S. 192
Levitan, B. 95
Mjalli, A.M.M. 19
Moos, W.H. 265
Nourse, J.G. 267
Patel, D.V. 78
Pavia, M.R. 3
Reichman, M. 273
Sarshar, S. 19
Schuster, P. 153
Storer, R. 50
Sucholeiki, I. 41
Uphoff, K.W. 169
Zhao, Z.-G. 192


Subject Index

3zip3 type 216, 247
  vectors 246
70:10:10:10 method 228ff
96-well microtiter plate 20
A-type-ANP receptor-specific agonist 250
Abimed 198
Activated cell sorter (FACS) 203
Activation of DNA binding of p53 212
Adaptive walk 127ff, 141, 162
ADP-activated platelets 248
Affinity 7, 9, 59, 93, 95ff, 169ff, 192, 197, 201, 210ff, 324
  adsorption 212
  selection 9, 99, 192, 201
Affymax 6, 33, 61, 88, 236, 314, 322
Agonist for the erythropoietin receptor (EPOR) 236
Alanex 314, 322
Alanine scan 229
Algorithm for ‘doping’ 227
Amberlite 56
Amplification 99ff, 154, 160, 164, 169ff, 210ff
β-Amyloid precursor protein inhibitor 251
Anchor residues 221
ANP-minidomain 254
Anti-TNFα autoantibodies 253
Antibodies 9, 129ff, 185, 200, 218ff, 290
Antibody maturation 229
  repertoire library 240ff
Antisense 16, 320
Applied molecular evolution 95ff, 153
Aptamers 170ff
ArgoGel 46, 194
Argonaut 49, 73, 194
Ariad 315, 322
Armstrong 52
ArQule 65, 314, 322

Array 3, 6ff, 19ff, 50ff, 69, 79, 139, 192, 212, 244, 283, 320
Arris 314, 322
Atrial natriuretic protein (ANP) 238
attB 240
attP 240
Autocorrelation 128ff, 174
Autoimmune 220, 247
  disease antigen epitopes 221
Bacteriophage 100, 210ff
  T4 100, 234
  P4 234
BBs 78ff
Berkeley 64
Binding to melanoma cells 249
Bioavailable 79
Biopanning 99ff, 233
Biopolymers 20, 142, 154, 169
BIOSTAR 219, 276
Biotin 219, 235
Biotinylated target 219
Blood group antigens 247
Boger 51
BPTI 224, 237, 254
Brain-specific clones 250
Bray 190, 194, 210ff, 231, 239, 250
Brenner 292
Bruker 62
Building blocks 5, 7ff, 27, 35, 59, 66, 78ff, 196, 226
Calmodulin binding peptides 212
Capillary electrophoresis 9, 65
CASTing 230
CDR walking 229
Cellulose 4, 44, 100, 194ff
Chain shuffling 226ff
Chelation 83
  effect 216
CHEMGLASS 23
Chiron 8, 22, 44, 314ff, 322
Chloroform-resistant clones 231
Chromogenic 86, 201


Chymotrypsin inhibitors 228
Ciba-Geigy 315, 319
Claim 301ff, 321
Clipping 268
Clones 212ff
Colorimetric 9, 201, 283
Compact matrices 254
Competition during panning 250
Connectivity 161, 268, 282
Conotoxins 222
Consensus motif 213ff
  sequences 95ff, 182, 220
Constrained disulphide peptide libraries 254
  epitopes 214ff
  libraries 235
  loop 235
  phage-displayed peptide libraries 221
  variable epitopes 212
Context phenomenon 239
Contexts 212
Controlled pore glass (CPG) 43
Copolymer 22, 43
Cosmid vector 233
Cotton 44, 69
CP-1 217ff
Cre/loxP recombination system 233
CSAP 276ff
Cyclic peptide 197, 235
Cys2His2 tetrad 238
Cytochrome b562 217ff
Cytokeratins 248
Cytoskeletal antigen 248
D-amino acid polypeptide ligands 252
D-enantiomer synthesis 252
Daylight database 28
Decoding 19, 50, 59, 81, 295
Deconvolution 3, 7ff, 59, 80, 139ff, 192ff, 269, 284, 287ff
Definition of SH3-epitopes 212
Degeneracy 291
Deletion of repeats and palindromes 221

Designed scaffold zinc-finger (CP-1) 237
DeWitt 5, 23, 69, 167
Diabody display 217
Dimerization of a eukaryotic receptor 237
Direct interaction rescue 217, 246
Disease-related antibodies 253
Disease-specific epitopes 254
Disks 42, 275
Dissociation rate constant 219ff
DIVERSOMER™ 21, 197
Divide, couple, and recombine 6ff
Divinylbenzene 42ff, 193
DNA-binding transcription regulators 212
DNA-shuffling 230
Dow 71, 322
Drug discovery 5, 19, 59, 78ff, 192, 265, 273ff, 314, 321ff
Dye 165, 174, 201
E. coli mutD5 strains 230
E-selectin binding clones 236
Early immune response antigens 252
Ecotin 254
Elastase inhibitors 228
Electron capture gas chromatography 13
Electroporation 232
Electrospray 62ff, 295
Ellman 23, 64
Encoded libraries 3, 10, 80, 125
Encryption tag 219
Enrichment 97ff, 201, 213ff
Enumeration 267ff
Enzyme-linked immunosorbent assay (ELISA) 198
Epitope 184, 198, 212ff
  scanning 221
Equimolar 3, 8ff, 51, 86, 230
Error threshold 140ff, 156
Error-prone PCR 229ff
ESR 44
Evaporative light scattering (ELS) 28
Evolution 5, 6, 73, 93, 95ff, 153ff, 169ff, 198, 210ff, 273


Evolutionary biology 126, 150, 153
  biotechnology 153ff
  selection 222
Exon shuffling 244
Expressed by phage display 216
Extension libraries 255
Fab display 217
Fibrinogen 171, 248
Fibronectin 235, 248
Filamentous phage 110, 121
Fingerprint of the immune status 253
Fitness 153ff, 172ff
  landscape 95ff, 155, 166, 172ff
Fluorescent 193, 203, 249, 291
Fmoc 60, 195, 292
Fodor 198
Frank 178, 194
FTIR 42
Furin 250
Furka split synthesis 27
Fuzzy 268
FVII 251
FXIa 251
Gallop 198
Gausepohl 69
Gel-phase 42ff, 61
λint Gene 240
Generic structure 267ff
Genetic algorithm 140ff
  engineering 166
Genetics 154, 169, 210
Genome 131, 169, 212ff
Genotypes 143, 153
Geysen 6, 22, 43, 192ff, 288
Giralt 44
Glaxo 50, 319, 322
Gold 174ff, 224
Hamming distance 125ff, 156, 171ff
HCV-infected patient sera 253
Head-to-tail 197
Hepatitis B virus 253

Heterocyclic 7ff, 25, 39, 52, 81ff, 268
Heterodimers 217
HF 70, 217
High throughput 5, 15, 19ff, 65, 78, 266, 273ff, 322
  screening (HTS) 15, 19, 274
High-density pin replicating tools (HDPRTs) 283
Hill plot 279
HIV 5, 85, 182, 253, 312
Hoechst Marion Roussel 319
Houghten 6ff, 69, 192ff, 288, 314
HP228 323
HPLC 28, 59ff, 70, 283, 295
HTS 86, 274ff
Human FAB fragments 254
  genome programme 255
  pancreatic secretory trypsin inhibitor (PSTI) 227
  respiratory syncytial virus 253
Immunization with autologous tumour cells 249
Immunogenetic markers 212
ImmunoZAP 232
λIn vitro packaging system 231
In vitro selection 169ff
In vivo hypermutation 229
Indexed library 11
Induced-fit 235
Information systems 267
Inhibitor variants 228
Integrin 197, 234ff
  αIIbβl 235
Intellectual property 5, 224, 265, 298ff
Inter- and intramolecular synergistic interactions 227
Interleukin-1 receptor antagonists 236
Intramolecular disulphide bridge 235
IRORI 27
Irrational drug design 163, 210
ISIS 278
Iterative deconvolution 7ff, 144
Janda 54
Joyce 171


Kaldor 56
Kallikrein 224, 251
Kauffman 104ff
Kel-F-g-styrene 44
Kidney-specific clones 250
Kieselguhr 42
Kinetics of binding and dissociation 219
Kurth 33, 47
LACI-D1 224, 237
Lack of secretion 221
Ladner 210
Lam 13, 94, 193ff, 289
Landscape 94, 95ff, 155ff, 169ff, 230
  and mosaic phage 231
Larger plaques 222
Lebl 292
Lerner 204, 292
Leucine zipper from Jun 247
Libraries from libraries 14
Ligands for streptavidin 235
Light- and heavy-chain shuffling 228
Lilly 3, 65, 319, 322ff
Liquid-phase combinatorial synthesis (LPCS) 13, 55
Loading 4, 22, 30, 41ff, 60ff, 69, 194
Low valency 217, 236
loxP/intron 246
loxP sites 221
LY-334370 324
Macroporous 44
Magic angle spinning (MAS) 28, 61, 292
MALDI 62
Markush structure 267
Mass spectrometry 8, 42, 59ff, 201, 291ff
Master plate 25
Matrix 6, 22, 51, 63, 81, 139, 200, 220, 278
  immobilized ‘Spot method’ 221
MDL SCREEN 277
Medicinal chemistry 4, 6, 19, 192, 273ff
Melanoma patients 249

Merck 30, 321
Merrifield 4, 19, 41ff, 60ff, 78, 193, 326
  resin 4, 193
Metal chelators 212
MHC specificity 212
Microchips 26
Microsequencing 203, 290
Microsoft Windows 25
Microtechnology 166
Mimotope 199, 212ff
Minibody 237
Minor coat proteins pVII and pIX 216
MLR 276
Modified ligand phage 219
  nucleotides 121ff
Molecular recognition 93, 172ff, 192, 200
Monovalent libraries 255
Monsanto 33
Mosaic clone libraries 231
Motif 14, 93, 117, 171ff, 200ff, 212ff, 287
Multipin 21ff, 197ff
Multivalent banks 255
  presentation 219, 235
Mutagenic libraries 228
Mutation 5, 11, 95ff, 153ff, 169, 184, 210ff
  rate 140ff, 156ff, 226ff
Mutational enrichment procedures 240, 255
  libraries 255
N-carboxyanhydrides (UNCAs) 196
Nanospray 66
Natural killer cell surface antigen LAG-3 253
Neutral evolution 153ff
  networks 134ff, 153ff
NeXstar 315
NMR 9, 28, 42ff, 59ff, 252, 292
NOESY 63


Nonnaive libraries 250
Novartis 315
Oligomeric 3, 23, 30, 72, 78ff, 192, 291ff
Oligonucleotides 11, 19, 45, 124, 166, 169, 193ff, 226, 292
Oligosaccharides 124, 193
  mimotopes 212
One-bead–one-compound 13ff, 192ff
OntoBLOCK 21ff, 73
Ontogen 25ff, 65, 73
Optimization 4, 15, 19ff, 69ff, 84, 97ff, 154ff, 192, 214, 265, 322ff, 326
Oracle database 28
Orca 281
Orthogonal library 12, 80
P1Cre bacteriophage 241
λPackaging vectors 240, 255
PAI 229, 254
Panning 99ff, 161, 210ff
  in vivo 249
  on cells 233ff
Parke-Davis 23, 73, 321
PCR 13, 79, 100, 148, 165, 169, 201, 220ff, 292
  amplified DNA/RNA-based combinatorial libraries 240
PEG-PS 43ff, 194
PEGA 43, 194
Peptide 3, 6ff, 19ff, 30, 41ff, 65, 69ff, 78ff, 93, 104ff, 166, 182, 185, 192ff, 212ff, 288ff, 303ff, 321
  binding to SH3-domains 220
Peptides-on-plasmids 234
Peptidomimetic 3, 9, 87, 192
Peptoid 8, 21, 73, 79, 322
Percolation 143, 160
Peripheral blood cells 249
Pfizer 9, 321ff
Phage 216
λ Phage 240
λPhage-display vectors 217
Phage display 5, 94, 95ff, 166, 210ff, 310
  scaffolds 212

Phagemid 121, 210ff
Phage Pf1 216
Phagotopes 253
Pharmaceutical ligand leads 237
Pharmacopeia 314ff, 322
Pharmacophore 14, 25, 79ff, 284
Phenotype 153ff, 169, 221
Photolabile linkers 62
Photolithographic 198
Phylogeny 127ff, 153, 169
Pin 21ff, 30ff, 42ff, 60, 69, 192ff, 283, 287
Pirrung 51, 85
pJuFo 246
Plasmids 217ff
Plasmin 224ff
Platelet aggregation 235, 248
  antagonists 212
Plurality of synergistic effects 229
Polyacrylamide 44
Polyethylene 43
  glycol (PEG) 13, 30, 43, 55, 194
Polyhype 43
Polymerase chain reaction (PCR) 13, 165, 169, 292
Polymethacrylate 43
Polypropylene 6, 22, 44, 69, 192, 295
Polystyrene resin 6, 26, 45, 62, 193
Pools 5, 7ff, 50ff, 65, 80ff, 139ff, 170ff, 229, 253, 309
Positional scan 10ff, 198ff
Promiscuous kinases 219
Protease inhibitors 85, 212ff
Protein A (mini Z-domain) 254
Proteins 93, 100, 133, 153ff, 170ff, 192ff, 210ff
Proteolytic degradation 222
PSTI 224ff
pV of λ 233
  and pD display systems 233
pVIII 214ff
QSAR 135, 267
Quasispecies 141, 156
Radio frequency 19, 292ff

Rapp Polymere 43 Rare codon effect 229 Rational design 5, 59, 79ff , 93, 164, 210, 251 Reactors 69 Rebek 51 Recombination 96ff , 153, 210ff Recursive deconvolution 10, 199ff , 290 Replication 153ff , 169, 210ff , 312 Resynthesis 7ff , 83, 200, 287ff RGD motif 236 Rheumatoid arthritis 252 Ribozyme 165, 171ff Rink resin 27, 85, 295 RNA 93, 96ff , 153ff , 171ff , 212ff structures 134, 157 Root 267ff Rugged landscapes 128ff Ruggedness 131ff , 173 SAAB 230 Sagian 281 SAIC (Science Applications International Corporation) 284 Sandoz 62, 315 SAR 9, 27, 80ff , 198, 273, 295 Scaffolds 79ff , 196, 212ff Scavengers 4, 36 scFv display 217 Schreiber 203 Scripps 62 Searching 160, 198, 267ff , 283 Second-generation libraries 219 SecY-secretion pathway prlA suppressor mutations 222 Selected anti-thyroid peroxidase (anti-TPO) scFv fragments 247 Selectide 13, 195, 319, 322 Selection 9, 45, 93, 153ff , 169ff , 192ff , 210ff , 315 and amplification of phage (SAP) 246 SELEX 95, 165, 230 SELEXION 99ff Self-splicing intron 244 Sequence space 96ff , 153ff , 172ff

Sexual PCR 148, 231 SH3 9, 203, 212ff Shape space 125ff , 153ff covering 158ff β-Sheet α-amylase inhibitor 239 Sheets 42, 237, 276 Shuffling 228ff Side-chain 193ff Site-specific recombination 240ff Small-molecule 3, 30, 171, 192 Smith 51, 81, 140, 210ff Somatic gene therapy cell targeting 255 maturation 219ff of antibodies 230 Spatially addressable 4, 139, 192, 287 Specificity 93, 126, 183ff , 200, 212ff of SH2-domains 220 Sphinx 319, 322 Spin glasses 129 Split and recombine 3, 19ff SPOC 20ff , 326 SPOT 194ff ssDNA/RNA binding clones 212 Standard operating procedures (SOPs) 280 Still 203 Stochastic 27, 99ff , 177 Stringencies 179 Strong synergism 228 Structure of the fd phage 216 Sublibraries 51, 81, 200, 288ff Submonomer 7 Substrate phage 250 Symmetric dimer 237 Synergistic interactions between neighbouring amino acids 220 Synthetic chemical libraries 219 peptide 5, 192ff , 219ff trinucleotides 226 Synthons 287ff Szostak 171ff Tag 13, 222 Tartar 12

TCR mimotopes 212 TDA 230 Tecan 73, 282 Tendamistat 239, 254 TentaGel 30, 43ff , 61, 193ff TFA 62, 70, 83, 195 Thermodynamics of binding 235 Tissue-specific ligands 212 TLC 59 Tomtec 284 Topography 173 Toxicity of the protein presented 221 Trade secret 300 Transponder 26 Tripos 276 Turnkey 70, 276 Type 3+3 214 Type DD phage display vector 234

Ulcerative colitis 249 Urokinase plasminogen activator receptor (UPAR) agonists 248 UV 42, 60 Vaccine development 212 Varian 28, 62, 221 Variegated sequences 228 Visual Basic 27, 277 Vitronectin 235 Von Willebrand factor 248 Wang 62ff , 176, 292 Wells 210 World wide web (WWW) 255 Zinc-finger protein 210ff Zymark robot 9