Virtual Screening: An Alternative or Complement to High Throughput Screening?

Author: Gerhard Klebe

Proceedings of the Workshop ‘New Approaches in Drug Design and Discovery’, special topic ‘Virtual Screening’, Schloß Rauischholzhausen, Germany, March 15–18, 1999

Edited by

Gerhard Klebe Institute of Pharmaceutical Chemistry, Philipps University of Marburg, Marbacher Weg 6, D-35032 Marburg, Germany

Reprinted from Perspectives in Drug Discovery and Design, Volume 20, 2000

Kluwer Academic Publishers New York / Boston / Dordrecht / London / Moscow

eBook ISBN: 0-306-46883-2
Print ISBN: 0-792-36633-6

©2002 Kluwer Academic Publishers New York, Boston, Dordrecht, London, Moscow

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://www.kluweronline.com
and Kluwer's eBookstore at: http://www.ebooks.kluweronline.com

Table of Contents

Preface

Combination of molecular similarity measures using data fusion
C.M.R. Ginn, P. Willett and J. Bradshaw

Optimization of the drug-likeness of chemical libraries
J. Sadowski

Generating consistent sets of thermodynamic and structural data for analysis of protein-ligand interactions
T.G. Davies, J.R.H. Tame and R.E. Hubbard

Multiple molecular superpositioning as an effective tool for virtual database screening
C. Lemmen, M. Zimmermann and T. Lengauer

A recursive algorithm for efficient combinatorial library docking
M. Rarey and T. Lengauer

Modifications of the scoring function in FlexX for virtual screening applications
M. Stahl

A knowledge-based scoring function for protein-ligand interactions: Probing the reference state
I. Muegge

Predicting binding modes, binding affinities and ‘hot spots’ for protein-ligand complexes using a knowledge-based scoring function
H. Gohlke, M. Hendlich and G. Klebe

Hydrophobicity maps and docking of molecular fragments with solvation
N. Majeux, M. Scarsi, C. Tenette-Souaille and A. Caflisch

Virtual screening with solvation and ligand-induced complementarity
V. Schnecke and L.A. Kuhn

Similarity versus docking in 3D virtual screening
J. Mestres and R.M.A. Knegtel

Discovering high-affinity ligands from the computationally predicted structures and affinities of small molecules bound to a target: A virtual screening approach
T.J. Marrone, B.A. Luty and P.W. Rose

In vitro and in silico affinity fingerprints: Finding similarities beyond structural classes
H. Briem and U.F. Lessel

Computer-assisted synthesis and reaction planning in combinatorial chemistry
J. Gasteiger, M. Pförtner, M. Sitzmann, R. Höllering, O. Sacher, T. Kostka and N. Karg

Evaluation of reactant-based and product-based approaches to the design of combinatorial libraries
V.J. Gillet and O. Nicolotti

Author Index

Subject Index

Preface

Virtual Screening: An Alternative or Complement to High Throughput Screening?

Gerhard Klebe

In the next couple of years the human genome will be fully sequenced [1]. This will provide us with the sequence and overall function of all human genes as well as the complete genome for many microorganisms. Subsequently, it is hoped that powerful bioinformatic tools will make it possible to determine the gene variants that contribute to various multifactorial diseases, and the genes that exist in certain infectious agents but not in humans. As a consequence, this will allow us to define the most appropriate levels for drug intervention. It can be expected that the number of potential drug targets will increase, possibly by a factor of 10 or more [2,3]. Nevertheless, sequencing the human genome or, for that matter, the genome of other species will be only the starting point for the understanding of their biological function. Structural genomics is a likely follow-up, combined with new techniques to validate the therapeutic relevance of such newly discovered targets [4]. Accordingly, it can be expected that in the near future we will witness a substantial increase in novel putative targets for drugs.

To address these new targets effectively, we require new approaches and innovative tools [3]. At present, two alternative yet complementary techniques are employed: experimental high-throughput screening (HTS) of large compound libraries, increasingly provided by combinatorial chemistry, and computational methods for virtual screening (VS) and de novo design [5].

Experimental HTS involves highly sophisticated robotics and advanced engineering know-how. Appropriate molecular test systems have to be automated and adapted to the conditions of HTS. Advanced computer and informatics technology has to handle the logistics and the immense data flow. HTS typically produces a tremendous amount of ligand binding data with typical hit rates of about 1%. Perhaps, at first glance, this figure appears quite low. However, considering that one to several million compounds are assayed per HTS run, this hit rate still provides a fair number of active compounds. Because HTS requires engagement in several cost- and labor-intensive techniques, many attempts have been made to increase its efficacy. As a consequence, in many companies, modelers have shifted their focus toward the design of libraries optimally suited for HTS. So-called ‘optimally diverse libraries’ showing a minimum of redundancy have been created and compiled on the basis of inventive, property-discriminating descriptors. However, the enrichment with respect to discovered hits did not significantly depart from that of a random selection taken from a large library holding various organic compounds in the correct molecular weight range [6]. Perhaps these studies have stimulated and improved our understanding of similarity and helped to design targeted libraries for one particular binding site or as isosteres for a given reference ligand. Similarity and likewise diversity are typical properties that can only be defined relative to a reference and not globally over an entire sample of compounds.

In the early days of HTS, quite optimistic and enthusiastic perspectives were predicted. Together with the emergence of combinatorial chemistry, which was expected to push the frontiers of compound synthesis ahead by some orders of magnitude, the end of any rational and knowledge-based approaches was forecast. Today, several years later, a more realistic view has been accepted. First of all, automating biological testing is not without problems. False positives or non-specific target binding of possible test candidates are only some of the problems that puzzle scientists. The reported success rates for translating apparent actives from HTS into leads that are suited for subsequent optimization into a drug candidate are quite depressing [7]. Moreover, although hits discovered by HTS provide medicinal chemists with real chemical compounds that bind to a target [8], these hits do not contribute to our understanding of why and how they act upon the target. Any increase in knowledge is produced only once experimental structural biology or molecular modeling comes into play to detect structural similarity or possible common binding modes among the obtained hits.
Often enough, hits are quite diverse in their chemical structure, thus preventing any reasonable intuitive comparison.

Virtual screening (VS) is an alternative in which the selection of compounds with predicted binding properties is attempted in the computer [9]. The approach appears quite tempting. Compounds to be studied do not necessarily exist and their testing does not consume valuable substance material. Experimental deficiencies, e.g. due to limited solubility or other effects that can interfere with the assay conditions, do not matter. In contrast to HTS, VS requires as a key prerequisite knowledge about the criteria responsible for binding to a particular target. Either the three-dimensional structure of the target is given by crystal structure determination, by NMR or by homology modeling, or at least a rigid reference ligand with known bioactive conformation is available that allows for sophisticated pharmacophore modelling. This provides information about the binding-site geometry and helps to define and predict possible ligand-binding modes. Once the receptor-bound conformation of a reference ligand is known or can be estimated, searches for molecules with similar recognition properties, possibly exhibited by quite different molecular skeletons, can be started. These comparative techniques either use fast flexible docking algorithms or focus on sophisticated molecular superposition techniques. If one sufficiently understands the features that make topologically diverse ligands similar or that are responsible for achieving a particular affinity toward a certain receptor, VS can be applied to screen either compound libraries of existing substances or computer-generated molecules. The latter could be detected as prospective leads and accordingly as potential candidates for subsequent synthesis.

Speculations have been made about the number of potential drug-like molecules (<500 Da) that could, in principle, be synthesized. Impressive numbers of $10^{60}$ up to $10^{200}$ compounds have been discussed [10]. Even the largest computer would not be sufficient to handle, screen or select such a vast universe of molecules. However, the computer provides the chance to generate molecules considering and reflecting properties that meet the criteria required for drug binding, simultaneously satisfying the conditions for drug-likeness, e.g. as defined by Lipinski’s rule of five [11]. Accordingly, virtual screening has to be seen in close connection to de novo design of individual compounds, but even more to that of medium-size targeted libraries based on principles from combinatorial chemistry.

VS requires some key technologies. Docking and molecular superpositioning are the most important techniques to select candidate molecules. Both methods require a relevant and reliable scoring scheme to either predict the expected binding affinity toward a particular protein reference or to estimate the degree of similarity with a given reference ligand.
Binding results when individually solvated molecules combine, usually shedding some of their association with solvent. Accordingly, solvation properties are of utmost importance for ligand binding and their understanding is a clear prerequisite for developing a successful scoring function.

As a kind of status report on the maturity of VS as a technique in drug design, the first workshop on new approaches in drug design and discovery was held in March 1999 at Schloß Rauischholzhausen close to Marburg in Germany [12]. More than 80 scientists gathered and discussed their experience with the different techniques. The speakers were invited to summarize their contributions together with their impression of the present applicability of their approach. Several of the speakers followed this request, and their contributions are collected in this publication.

The first two contributions, Combination of molecular similarity measures using data fusion by C.M.R. Ginn, P. Willett and J. Bradshaw, and Optimization of the drug-likeness of chemical libraries by J. Sadowski, focus on the definition of similarity, in particular with respect to the selection and optimization of compound libraries that also satisfy the general criteria for drug-likeness. Since the prediction of binding affinity is of utmost importance in VS, the analysis of protein-ligand interactions is discussed in a special contribution of the York group by R.E. Hubbard, Th.G. Davies and J.R.H. Tame, Generating consistent sets of thermodynamic and structural data for analysis of protein-ligand interactions. Two contributions, one by C. Lemmen, M. Zimmermann and Th. Lengauer, Multiple molecular superpositioning as an effective tool for virtual database screening, and one by M. Rarey and Th. Lengauer, A recursive algorithm for efficient combinatorial library docking, describe two of the key techniques used to screen molecules for whether they meet the similarity or binding-site criteria with respect to a reference.

Scoring of the results obtained by docking is essential in VS. Accordingly, several approaches have been summarized in this volume. M. Stahl, Modifications of the scoring function in FlexX for virtual screening applications, I. Muegge, A knowledge-based scoring function for protein-ligand interactions: Probing the reference state, and H. Gohlke, M. Hendlich and G. Klebe, Predicting binding modes, binding affinities and ‘hot spots’ for protein-ligand complexes using a knowledge-based scoring function, describe new and improved scoring functions. The consideration of solvation properties in particular is described in the articles of N. Majeux, M. Scarsi, C. Tenette-Souaille and A. Caflisch, Hydrophobicity maps and docking of molecular fragments with solvation, and of V. Schnecke and L.A. Kuhn, Virtual screening with solvation and ligand-induced complementarity. A comparative study contrasting similarity methods with docking techniques is described by J. Mestres and R.M.A. Knegtel, Similarity versus docking in 3D virtual screening. An application that composes new potential leads from small components by computer screening is presented by T.J. Marrone, B.A. Luty and P.W. Rose, Discovering high-affinity ligands from the computationally predicted structures and affinities of small molecules bound to a target: A virtual screening approach, whereas H. Briem and U. Lessel try to find similarity criteria by docking or superposition methods: In vitro and in silico affinity fingerprints: Finding similarities beyond structural classes.

A last section of the contributions focuses on the design of new compound libraries exploiting principles of combinatorial chemistry. In a contribution from J. Gasteiger, M. Pförtner, M. Sitzmann, R. Höllering, O. Sacher, Th. Kostka and N. Karg, Computer-assisted synthesis and reaction planning in combinatorial chemistry, concepts from synthesis planning are applied to the development of compound libraries. V.J. Gillet and O. Nicolotti, Evaluation of reactant-based and product-based approaches to the design of combinatorial libraries, describe criteria to design combinatorial libraries on the basis of properties of either the products or the reactants.


References

1. Lander, E.S., Science, 274 (1996) 536.
2. Drews, J., Nat. Biotechnol., 14 (1996) 1516.
3. Drews, J., Drug Discov. Today, 5 (2000) 2.
4. a. Skolnick, J. and Fetrow, J.S., Trends Biotechnol., 18 (2000) 34.
   b. Burley, S.K., Almo, S.C., Bonanno, J.B., Capel, M., Chance, M.R., Gaasterland, T., Lin, D., Sali, A., Studier, F.W. and Swaminathan, S., Nat. Genet., 23 (1999) 151.
   c. Evans, W.E. and Relling, M.V., Science, 286 (1999) 487.
   d. Jones, D.A. and Fitzpatrick, F.A., Curr. Opin. Chem. Biol., 3 (1999) 71.
5. Müller, K., Perspect. Drug Discov. Design, 3 (1995) v (Preface).
6. Martin, Y., Does Virtual Pre-Screening Selection Increase the Observed Quality and Quantity of Hits from Vendor and Combinatorial Libraries? Workshop Virtual Screening, Rauischholzhausen, 1999.
7. Lahana, R., Drug Discov. Today, 4 (1999) 447.
8. Ramesha, C.S., Drug Discov. Today, 5 (2000) 43.
9. Walters, W.P., Stahl, M.T. and Murcko, M.A., Drug Discov. Today, 3 (1998) 160.
10. Bohacek, R.S., Martin, C. and Guida, W.C., Med. Res. Rev., 16 (1996) 3.
11. Lipinski, C.A., Lombardo, F., Dominy, B.W. and Feeney, P.J., Adv. Drug Delivery Rev., 23 (1997) 3.
12. http://pc1664.pharmazie.uni-marburg.de/workshop99/

Perspectives in Drug Discovery and Design, 20: 1–16, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Combination of molecular similarity measures using data fusion

CLAIRE M.R. GINN¹, PETER WILLETT¹,* and JOHN BRADSHAW²

¹ University of Sheffield, Western Bank, Sheffield S10 2TN, U.K.
² GlaxoWellcome Research and Development Limited, Stevenage SG1 2NY, U.K.

Summary. Many different measures of structural similarity have been suggested for matching chemical structures, each such measure focusing upon some particular type of molecular characteristic. The multi-faceted nature of biological activity suggests that an appropriate similarity measure should encompass many different types of characteristic, and this article discusses the use of data fusion methods to combine the results of searches based on multiple similarity measures. Experiments with several different types of dataset and activity suggest that data fusion provides a simple, but effective, approach to the combination of individual similarity measures. The best results were generally obtained with a fusion rule that sums the rank positions achieved by each molecule in searches using individual measures.

Key words: data fusion, database searching, molecular similarity, similarity measure

Introduction

Measures of inter-molecular similarity play an important role in drug- and pesticide-discovery programmes, being used for both database searching [1] and structure-activity studies [2]. Many different types of similarity measure have been described in the literature (see, e.g., [3–5]) but the great majority of published studies have considered the use of only a single type of similarity measure: in many cases, indeed, a description of a new type of similarity measure forms the principal focus of the publication. Even where this is not the case, multiple measures have typically been employed only as the input to a comparative study that seeks to identify the ‘best’ measure, using some quantitative performance criterion. As an example, an early study in our laboratory [6] compared 36 different similarity measures by means of simulated leave-one-out property prediction, and concluded that the Tanimoto coefficient was the most appropriate similarity coefficient of those tested for measuring the resemblances between pairs of fragment bit-strings. Such comparisons, of which there are many in the literature, are limited in that they assume, normally implicitly, that there is some specific type of structural feature, weighting scheme or whatever that is uniquely well suited to describing the type(s) of biological activity that are being sought in a similarity search. The assumption cannot be expected to be generally valid, given the multi-faceted nature of biological activities, and this article investigates the use of data fusion [7] for combining multiple similarity measures.

* To whom correspondence should be addressed. E-mail: [email protected].

Data fusion

Background

Data fusion is ‘a process of combining inputs from sensors with information from other sensors, information processing blocks, databases, or knowledge bases, into one representational format’ [8]. Defence applications have provided much of the driving force for the development of data fusion techniques, with published examples including establishing the friend-or-foe nature of an incoming missile or aeroplane, predicting the range and direction of a battlefield target, and navigating an unmanned armoured vehicle. Other applications include surveillance operations by law enforcement agencies, real-time control of continuous manufacturing processes, the provision of all-weather visibility for aircraft pilots, and multi-imaging systems for the analysis of medical images (see, e.g. [9]). However, data fusion can be, and is, used in much more commonplace situations: for example, establishing that it is safe to cross a road involves taking input from one’s ocular sensors (eyes) and aural sensors (ears), and then combining this information with the knowledge that an empty road is a safe road to give an output denoting the safety of the proposed action. Again, a committee in which all members can contribute will often arrive at a superior decision to the one that would have been reached by just the committee chair, although there are, of course, many exceptions to such a rule!

The basic rationale for data fusion is that using the information presented by a number of sensors enables further information to be inferred that would be outside the capabilities of a single sensor. For example, if one sensor detects a tank, then all that can be deduced is the existence and the position of that tank. However, if two sensors detect the same tank then inferences can be made regarding the direction of its movement, while the addition of a temporal dimension permits the tank’s velocity to be calculated. Add to that the ability to compare the observed behaviour with records of past behaviour of tanks and the system becomes capable of threat analysis.

As well as being able to infer more information, the use of a fusion system also leads to both qualitative and quantitative improvements in several ways. Thus, operational performance is more robust if one of the sensors becomes damaged, as there would still be information coming in from the others (an obvious advantage in military applications where sensors will be exposed to combat conditions and are thus liable to become damaged). Data fusion leads to extended coverage since multiple sensors can cover disparate areas, times and qualities, and it leads to an increased level of confidence in the results since multiple sensors can act together to confirm an event and to reduce any ambiguity surrounding, e.g., the classification of an event.

Combination of rankings

Our interest in data fusion methods arose from recent work on their application to information retrieval (IR), specifically to the combination of the rankings produced by different retrieval mechanisms when applied to databases of textual documents. An early study is that by Belkin et al. [10], in which data fusion was used to combine the results of a series of searches of bibliographic databases, conducted in response to a single query, but employing different indexing and searching strategies. A query was processed using different strategies, each of which was used to produce a ranking of a set of documents in order of decreasing similarity with the query. The ranks for each of the documents were then combined using one of several different fusion rules (including the MIN, MAX and SUM rules discussed below); the output of the fusion rule was taken as the document’s new similarity score and the fused lists were then re-ranked in descending order of similarity. This work soon led to many other studies (see, e.g. [11–13]) and the combination of document rankings is now a well-established technique, as is exemplified by its use in a meta-search engine that provides access to the World Wide Web using a combination of different search engines [14].

The work on chemical data fusion reported here is based directly on these previous IR studies, and involves the simple procedure shown in Scheme 1, where a user-defined target structure is searched against a database using several different similarity measures. The fusion rules that we use here are based on those identified by Belkin et al. [10], and are summarised in Table 1. It will be seen that the MIN and MAX rules represent the assignment of extreme ranks to database structures and it is thus hardly surprising that both can be highly sensitive to the presence of a single ‘poor’ retrieval system amongst those that are being combined. The SUM rule is expected to be more stable against the presence of a single poor or noisy input ranking; here, each database structure is assigned the sum of all the rank positions at which it occurs in the input lists. This report considers just these three rules but there are clearly many others that could be considered, e.g., the median, the product, the harmonic mean, etc. of the individual rankings.

1. Execute a similarity search of a chemical database for some particular target structure using two, or more, different measures of inter-molecular structural similarity.
2. Note the rank position, $r_i$, of each database structure in the ranking resulting from use of the i-th similarity measure.
3. Combine the various rankings using one of the fusion rules (MIN, MAX or SUM), giving a new combined score for each database structure.
4. Rank the resulting combined scores, and then use this ranking to calculate a quantitative measure of the effectiveness of the search for the chosen target structure.

Scheme 1. Combination of similarity rankings using data fusion.

Table 1. Fusion rules for combining n ranked lists, where $r_i$ denotes the rank position of a specific database structure in the i-th (1 ≤ i ≤ n) ranked list

Name    Fusion rule
MIN     $\min(r_1, r_2, \ldots, r_i, \ldots, r_n)$
MAX     $\max(r_1, r_2, \ldots, r_i, \ldots, r_n)$
SUM     $\sum_{i=1}^{n} r_i$

The combined scores output by the fusion rule are then used to re-order the database structures to give the final ranked output. In many cases, especially with the SUM rule, the application of the fusion rule may result in the assignment of the same score to two or more items. When this happens, it is necessary to specify a further sort key to allow the resolution of the tied structures, e.g., alphabetical ordering of the canonicalised connection tables describing the tied database structures or the allocation of weights to individual rankings (perhaps based on past performance in similarity searches) so that a high position in one ranking would differ in importance from that same position in another ranking.
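The procedure of Scheme 1 and the rules of Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the structure identifiers, the similarity scores and the function names (`ranks_from_scores`, `fuse`) are hypothetical, ties are resolved by alphabetical order of the identifier (standing in for the canonicalised connection table), and a smaller fused value is taken to be better because rank 1 denotes the most similar structure.

```python
from typing import Callable, Dict, List, Sequence

# Fusion rules of Table 1: each maps the ranks r_1..r_n of one structure to one score.
FUSION_RULES: Dict[str, Callable[[Sequence[int]], int]] = {
    "MIN": min,
    "MAX": max,
    "SUM": sum,
}


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Step 2 of Scheme 1: rank positions (1 = most similar) from similarity scores."""
    ordered = sorted(scores, key=lambda s: (-scores[s], s))
    return {s: pos for pos, s in enumerate(ordered, start=1)}


def fuse(rankings: List[Dict[str, int]], rule: str = "SUM") -> List[str]:
    """Steps 3 and 4: combine the rank positions and re-rank the structures."""
    combine = FUSION_RULES[rule]
    fused = {s: combine([r[s] for r in rankings]) for s in rankings[0]}
    # Assumption: smaller fused value is better; identifier is the tie-break sort key.
    return sorted(fused, key=lambda s: (fused[s], s))


# Hypothetical similarity scores of four database structures to one target,
# under three different similarity measures (made-up numbers).
sim_2d = {"A": 0.90, "B": 0.80, "C": 0.40, "D": 0.20}
sim_3d = {"A": 0.50, "B": 0.70, "C": 0.60, "D": 0.10}
sim_pp = {"A": 0.95, "B": 0.30, "C": 0.90, "D": 0.50}

rankings = [ranks_from_scores(s) for s in (sim_2d, sim_3d, sim_pp)]

for rule in ("MIN", "MAX", "SUM"):
    print(rule, fuse(rankings, rule))
```

In this toy example B and C both receive a SUM score of 7, so the secondary sort key decides their order; a weighted variant of the kind mentioned above would replace the plain sum with a weighted one.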

Chemical applications

Chemical applications of data fusion are not completely novel. As long ago as 1973, Clerc and Erni noted that ‘when data from several different spectroscopic methods are used for comparison purposes, greatly enhanced performance may be expected because the methods complement each other’ [15] and went on to discuss the use of a scoring scheme based on weighted contributions from each of several molecular properties and spectra. More recently, Masui and Yoshida [16] have reported the use of the SPECTRA system for combining the similarity scores obtained in searches of a database containing mass, IR, and ¹H and ¹³C NMR spectral data when one or more of the spectra are missing for a particular sample molecule. In work more analogous to that reported here, Kearsley et al. have used both similarity-based and rank-based procedures to combine pairs of similarity searches of the Standard Drug File database, and found that significant improvements in performance could be achieved in simulated property prediction experiments [17, 18]. Finally, So and Karplus have recently advocated combining different QSAR methods to obtain models with heightened predictivity [19].

Our initial studies of data fusion were undertaken as part of a project to evaluate the EVA descriptor, which characterises a molecule by its fundamental vibrational fingerprint [20]. Although originally developed for QSAR applications, the EVA descriptor can also be used for similarity searching and a range of EVA-based similarity measures were hence evaluated using a dataset containing 8178 molecules from the Starlist file [21]. Comparable searches were also carried out using the 2D similarity searching routines in the UNITY chemical information management system [22], and using data fusion to combine the two individual types of ranking. Simulated leave-one-out property prediction experiments using the logP data in the Starlist file showed that, on average, the fused rankings appeared to be better than the original 2D and EVA rankings.
Although the differences were not always statistically significant, the study provided at least some evidence that data fusion could be used to improve the performance of similarity searching in chemical databases: the remainder of this article reports further experiments that have been undertaken to ascertain the accuracy of this conclusion. Full details of the work are provided by Ginn [23].

Cellular-uptake dataset

The dataset

These experiments involved a set of 136 biological dyes that are used to stain cells so as to visualise various organelles, specifically the lysosomes (L), the mitochondria (M), and the nucleus (N). These three broad activity classes were subdivided into eight mechanism-specific subclasses [24] and each molecule in the dataset was allocated an 8-bit activity bit-string in which the i-th bit was switched on if the molecule exhibited the i-th activity. Three different types of descriptor were used to characterise the molecules in this dataset: 2D fragments, 3D fragments and physical properties. The 2D fragment descriptors used here were the fingerprints produced by Barnard Chemical Information Limited (BCI) [25], while the 3D fingerprints were based on the NBN non-bonded torsion angle descriptor developed by Bath et al. [26]. The physical property descriptor comprised three standardised properties for each molecule: the logarithm of the octanol/water partition coefficient, the net electric charge, and the number of bonds included in the delocalised electron system of the molecule [24]. Each molecule in the dataset was considered as a target for similarity searching using each of the three similarity measures, with the similarity between a pair of molecules being calculated using the Tanimoto coefficient (the simple binary form of this coefficient for the 2D and 3D fingerprint measures and the generalised, non-binary form for the physical property measure) [5]. The three rankings for each target structure were fused using the SUM, MIN and MAX fusion algorithms defined previously.
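For readers unfamiliar with the two forms of the Tanimoto coefficient mentioned above, the following Python sketch shows the commonly used definitions: for bit-strings, the number of bits in common divided by the number of bits set in either string; for real-valued vectors, the dot product divided by the sum of the squared norms minus the dot product. The text does not spell out these formulas, so the sketch assumes the standard textbook forms; the function names, the value returned for two empty descriptors, and the example data are hypothetical.

```python
from typing import Sequence, Set


def tanimoto_binary(a: Set[int], b: Set[int]) -> float:
    """Binary Tanimoto: bits in common / bits set in either fingerprint.

    Fingerprints are represented as sets of the positions of the 'on' bits.
    """
    common = len(a & b)
    union = len(a) + len(b) - common
    # Assumption: two empty fingerprints are given a similarity of 0.0.
    return common / union if union else 0.0


def tanimoto_general(x: Sequence[float], y: Sequence[float]) -> float:
    """Generalised (non-binary) Tanimoto for real-valued property vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    denom = sum(xi * xi for xi in x) + sum(yi * yi for yi in y) - dot
    return dot / denom if denom else 0.0


# Made-up 2D fingerprints: positions of the bits that are switched on.
fp_target = {1, 4, 7, 9}
fp_candidate = {1, 4, 8}
print(tanimoto_binary(fp_target, fp_candidate))  # 2 common bits / 5 distinct bits

# Made-up property vectors (e.g. three standardised properties per molecule).
props_target = [1.0, 0.0, 2.0]
props_candidate = [1.0, 1.0, 1.0]
print(tanimoto_general(props_target, props_candidate))
```

On 0/1 vectors the generalised form reduces to the binary one. With standardised properties that can take negative values, the generalised coefficient need not lie between zero and unity, which is one more reason to be cautious about mixing raw scores from different measures.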

Comparison of ranks and similarities An inspection of Scheme 1 shows that Step 2 of the basic fusion procedure involves the rank positions for each database structure, rather than the similarity scores that are output by the similarity measure. On first sight, the former might seem to be the less intuitively reasonable approach as it involves a loss of information when compared with the use of scores. However, there are two factors associated with the use of similarity scores that lessen their attractiveness. Firstly, as researchers are more likely to be concerned with some number of nearest-neighbours to the target structure, rather than with those items that are above some threshold of similarity, it seems logical to consider the rank positions of the items irrespective of their similarity scores. Secondly, and more importantly, despite having the same range of scores (such as zero to unity for the binary version of the Tanimoto coefficient [5]), the distributions within these ranges given by different similarity measures may not be directly comparable, with the possibility of biasing the fusion rule in much the same way as unstandardised numeric data can affect the results of a multivariate analysis. We have compared the distributions of scores for each similarity method at each rank n, using the Kolmogorov-Smirnov test, which provides a simple and direct way of testing whether two distributions differ in any way, e.g., in location, dispersion or skewness [27]. If the distributions of the similarity scores for two original similarity measures are significantly different then it would be unwise to fuse them without applying some form of standardisation


Figure 1. Plots of mean score against rank for the three types of original (i.e., unfused) similarity measure for the cellular-uptake dataset.

procedure (i.e., the use of rank positions in the present context). Figure 1 shows plots of the mean similarity scores (averaged over all 136 target structures) at each rank position, n (1 ≤ n ≤ 100). The figure shows that while the 2D and 3D scores are distributed similarly, the physical property scores exhibit a markedly different distribution. Focusing upon the important top parts of the rankings, pairs of the distributions were compared for n = 1–10 using the Kolmogorov-Smirnov test: these tests showed that the distribution of scores for the physical properties measure was significantly different (p ≤ 0.01) to those from both 2D and 3D for n = 1–10 and that the distribution of scores for 2D was significantly different to those for 3D for n = 1–4. We hence conclude that the distributions of similarity scores can be very different, even if they have the same ranges, thus supporting our use of ranks as the input to the various fusion rules studied here. Similar results were obtained [23] in a comparable study of the EVA and 2D rankings of the Starlist dataset mentioned previously.
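The quantity underlying the two-sample Kolmogorov-Smirnov test is the largest vertical distance between the empirical cumulative distributions of the two samples. The following Python sketch computes only that statistic; the conversion to a p value (from tables or a statistics library) is left out, and the example scores are invented for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical distribution functions only change at observed values,
    # so it is enough to evaluate the difference at those points.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical similarity scores at one rank position for two measures,
# one value per target structure.
scores_measure_1 = [0.81, 0.78, 0.90, 0.85, 0.79]
scores_measure_2 = [0.95, 0.97, 0.99, 0.96, 0.98]
print(ks_statistic(scores_measure_1, scores_measure_2))
```

A value of D close to 1 indicates that the two sets of scores barely overlap, which is the situation in which fusing raw scores rather than ranks would let one measure dominate.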

Fusion results

Having established the appropriateness of rank-based fusion, the main experiments were evaluated in two ways. In the first, a count was made of the molecules ranked in the top ten positions that belonged to the same activity subclass as the target structure. These counts were then averaged over each of the eight subclasses, with the results shown in Table 2, where L1–4 (lysosomes), M1–2 (mitochondria) and N1–2 (nucleus) denote the eight activity subclasses identified in the dataset. The bold-font underlined elements denote fusion results that perform at least as well as the best individual similarity measure. It will be seen that the best similarity measure, in terms of actives being highly ranked, varies across activity subclasses; however, the results

Table 2. The mean number of actives in the top-10 rank positions for each activity class in the cellular-uptake dataset for the original similarity methods (columns 2D, Phys and 3D) and after data fusion (columns MAX, MIN and SUM). Asterisks (bold underlined numbers in the original) indicate a fused result at least as good as the best original similarity measure for that target structure

Activity       2D      Phys    3D      MAX     MIN     SUM
L1             1.40    3.05    1.12    2.96    2.02    3.25*
L2             2.14    3.50    5.93    4.36    3.57    5.00
L3             5.53    6.35    3.69    6.00    5.81    6.16
L4             5.33    4.44    4.06    5.17    4.67    5.50*
M1             2.29    6.17    2.50    4.96    4.08    5.04
M2             6.48    5.00    5.52    6.52*   5.86    6.17
N1             2.71    4.43    2.71    3.19    3.29    3.81
N2             3.67    4.19    3.67    4.00    4.24*   4.00
Mean actives   3.99    4.64    3.65    4.65    4.19    4.93
Mean rank      4.63    2.88    5.00    2.94    3.50    2.01

demonstrate that both SUM and MAX are, overall, to be preferred to the individual results. SUM also does well if one ranks the measures for each search, rather than using the actual numbers of actives retrieved (which vary considerably from one search to another). For example, in the first row of Table 2, SUM identifies most actives and is given the rank 1, Phys identifies the next highest number of actives and is given the rank 2 and so on down to 3D, which identifies the smallest number of actives and is thus given the rank 6. The mean ranks obtained in this way, when averaged across the eight activity sub-classes, are listed in the bottom row of the table and demonstrate clearly the effectiveness of the SUM fusion rule with this dataset. The second set of analyses employed the Hamming distance [5] between the activity bit-strings of the target structure and a database structure, i.e., the number of times that the two bit-strings differ. For example, if the target is active for subclasses L1, M1 and N1 then a Hamming distance of 0 between a database structure and the target indicates that the former is also active in subclasses L1, M1 and N1 and only in those classes, and would thus be a most appropriate hit for that target molecule. Figure 2 shows the mean Hamming distance for each similarity measure across all 136 target structures at rank n (1 ≤ n ≤ 10), and it can be seen that the SUM and MAX fusion algorithms give results that are consistently better (i.e., a smaller mean

<0.01 0.42 <0.01 0.20 0.43 0.60

<0.01 <0.01 <0.01

SUM

<0.01 <0.01 <0.01

SUM

0.24 <0.01 0.53 <0.01 0.58 <0.01

<0.01 0.32 <0.01 <0.05 0.07 0.53

n=7

<0.01 <0.01 <0.01

2D 3D Phys

<0.05 <0.01 <0.01

MAX MIN

n=6

<0.01 0.18 <0.01 0.84 <0.01 0.34

MAX MIN

n=2

Method MAX MIN SUM

2D 3D Phys

Method MAX MIN SUM

n = l

<0.01 <0.01 0.11

SUM

0.38 <0.05 0.43

<0.01 <0.01 <0.05

SUM

0.79 <0.01 0.43 <0.01 0.54 <0.01

MAX MIN

n=8

<0.01 <0.01 <0.01

MAX MIN

n=3 SUM

<0.01 <0.05 <0.01 <0.01 0.21 0.43

<0.01 <0.01 <0.05

SUM

0.43 <0.01 0.43 <0.01 0.33 <0.01

MAX MIN

n=9

<0.01 <0.01 <0.01

MAX MIN

n=4

SUM

<0.01 <0.01 <0.01 <0.01 0.36 0.28

<0.01 <0.01 <0.05

SUM

0.58 <0.01 0.42 <0.01 0.38 <0.01

MAX MIN

n = 10

<0.01 <0.01 <0.01

MAX MIN

n=5

Table 3. The p values from the Wilcoxon test for rank positions n = 1–10. Values < 0.05 denote a fusion rule that is significantly better than an original similarity measure for the cellular-uptake dataset


Figure 2. The mean Hamming distance at each rank n, 1 ≤ n ≤ 10.

Hamming distance) than those from any of the individual similarity methods. A pairwise comparison of similarity methods was carried out using the Wilcoxon Matched-Pairs Signed-Ranks test [27]. Specifically, the test was used to compare the Hamming distances for each fusion rule with each of the original similarity methods, target by target, and thus to indicate whether the two methods that are being compared are significantly different. Table 3 shows the p values for n = 1–10. It can be seen that SUM is significantly better than each of three original similarity methods for all values of n, with 28 out of the 30 sets of comparisons being highly significant ( p ≤ 0.01). MAX also performs well, but MIN is noticeably inferior to the other two fusion rules for this dataset. Taken together, these results show that the fused similarity measures can, in some cases at least, enable better predictions to be made of the cell-staining activities of the molecules than can the original measures, with SUM appearing to perform best of the three fusion rules tested here. When we take account of the rather variable performance of the individual similarity measures from one activity to another, it can be concluded that SUM-based fusion provides an effective way of generating a reliable single ranking with respect to both a single activity and the activity classes as a whole.
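The Hamming distance used in this second analysis is simply the number of positions at which two activity bit-strings differ. A minimal Python sketch follows; the ordering of the eight subclasses within the bit-string and the example molecules are assumptions made for illustration.

```python
# Assumed bit order for the eight activity subclasses.
SUBCLASSES = ["L1", "L2", "L3", "L4", "M1", "M2", "N1", "N2"]


def activity_bits(active_in):
    """Encode a set of subclass labels as a list of 0/1 bits."""
    return [1 if s in active_in else 0 for s in SUBCLASSES]


def hamming(bits_a, bits_b):
    """Number of positions at which two equal-length bit-strings differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit-strings must have the same length")
    return sum(x != y for x, y in zip(bits_a, bits_b))


target = activity_bits({"L1", "M1", "N1"})
ideal_hit = activity_bits({"L1", "M1", "N1"})   # same activities only
partial_hit = activity_bits({"L1", "M1"})       # misses N1
poor_hit = activity_bits({"L2", "N1"})          # shares only N1

print(hamming(target, ideal_hit), hamming(target, partial_hit),
      hamming(target, poor_hit))
```

Averaging this distance over all targets at a given rank position gives the kind of value plotted in Figure 2, with smaller means indicating more appropriate hits.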


World Drug Index dataset

Having demonstrated the potential of data fusion on a small dataset, the next set of experiments used a file of structures and associated broad-class bioactivity data from the World Drug Index (WDI) database [28]. Three different types of similarity measure were used here, these being based on 2D fragment occurrence data, 3D geometric information and on molecular fields. The 2D rankings were obtained using the UNITY fingerprints mentioned previously, while the 3D rankings were obtained using the atom-mapping measure described by Pepperrell et al. [29]. This measure uses inter-atomic distance information to identify pairs of atoms, one in the target structure and one in the current database structure, that are surrounded by similar patterns of atoms; these initial atomic equivalences are then used to construct an approximate mapping of the target structure onto the database structure. The field-based rankings were obtained using the FBSS (for field-based similarity searching) program described by Drayton et al. [30], in which a target structure is aligned with a database structure by means of their steric, hydrophobic and electrostatic fields. The particular version of the program used here considered all three types of field in the generation of an alignment, and hence in the resulting similarity score (this corresponding to the ‘All’ search of Drayton et al. [30]). Ten target structures were chosen that had been used previously by Kearsley et al. in their studies of WDI-based similarity searching [17]. The similarity searches were performed on datasets of approximately 3600 structures, each containing the activity class for the target structure with an additional 3500 randomly selected WDI molecules.
The data available for fusing comprised three sets of rankings (one for each of the original similarity measures) for each of the 10 targets, with the effectiveness of each search being measured by the number of molecules in the top-50 rank positions that had the same activity as the target; other performance measures for this dataset are discussed by Ginn [23]. Table 4 lists the numbers of actives identified in the original and fused searches for each of the 10 target structures. The results obtained are similar to those obtained with the cellular-uptake dataset: while the fused results are not always as good as the best individual result, they provide a generally high, and thus robust, level of effectiveness whereas the best original measure varies from target to target. This is particularly clear if one inspects the mean activities and ranks at the bottom of the table, where it will be seen that SUM would again seem to be the fusion rule of choice.
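The mean actives and mean ranks at the foot of Table 4 can be recomputed from the per-target counts. The Python sketch below does this with the counts from Table 4, ranking the six methods within each search (rank 1 for the most actives). It assumes that tied methods share the average of the positions they occupy; the source does not state how ties were handled, but this convention reproduces the mean ranks printed in the table.

```python
METHODS = ["3D", "2D", "FBSS", "MAX", "MIN", "SUM"]

# Actives in the top-50 positions, per target, in the column order above.
ACTIVES = {
    "Apomorphine":         [15, 23, 14, 24, 16, 26],
    "Captopril":           [23, 34, 12, 26, 27, 31],
    "Cycliramine":         [43, 31, 36, 43, 42, 45],
    "Diazepam":            [27, 27, 15, 23, 23, 22],
    "Diethylstilboestrol": [44, 33, 34, 42, 38, 42],
    "Fenoterol":           [19, 33, 17, 28, 29, 31],
    "Gabaxadol":           [6, 2, 6, 5, 6, 5],
    "Morphine":            [20, 28, 16, 19, 24, 16],
    "RS86":                [0, 8, 5, 10, 6, 14],
    "Serotonin":           [13, 19, 13, 13, 20, 15],
}


def average_ranks(counts):
    """Rank methods by count (1 = most actives); ties share the mean rank."""
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    ranks = [0.0] * len(counts)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and counts[order[end + 1]] == counts[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def column_means(rows):
    """Mean of each column over a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


mean_actives = column_means(list(ACTIVES.values()))
mean_ranks = column_means([average_ranks(c) for c in ACTIVES.values()])
for name, actives, rank in zip(METHODS, mean_actives, mean_ranks):
    print(f"{name:5s} mean actives {actives:5.1f}  mean rank {rank:4.2f}")
```

The mean rank is less sensitive than the mean number of actives to the large differences in class size between targets, which is why both summaries are reported.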

Table 4. The number of actives found in the top-50 rank positions for searches in the WDI database for the original similarity methods (columns 3D, 2D and FBSS) and after data fusion (columns MAX, MIN and SUM). Asterisks (bold underlined numbers in the original) indicate a fused result at least as good as the best original similarity measure for that target structure

Target                3D      2D      FBSS    MAX     MIN     SUM
Apomorphine           15      23      14      24*     16      26*
Captopril             23      34      12      26      27      31
Cycliramine           43      31      36      43*     42      45*
Diazepam              27      27      15      23      23      22
Diethylstilboestrol   44      33      34      42      38      42
Fenoterol             19      33      17      28      29      31
Gabaxadol             6       2       6       5       6*      5
Morphine              20      28      16      19      24      16
RS86                  0       8       5       10*     6       14*
Serotonin             13      19      13      13      20*     15
Mean actives          21.0    23.8    16.8    23.3    23.1    24.7
Mean rank             3.60    3.05    5.15    3.40    3.05    2.75

Kahn dataset

The dataset

The final section evaluates data fusion when a larger number of original similarity measures is available. The dataset used here is described by Kahn in a discussion of descriptors for the analysis of combinatorial libraries [31]: it contains 75 compounds each belonging to one of 14 well-defined activity classes (angiotensin-converting enzyme inhibitors, acetylcholine receptor inhibitors, antagonists of 2-aminoproprionic acid, aldose reductase inhibitors, angiotensin-II receptor antagonists, beta adrenergic blockers of the type-3 receptor, cyclo oxygenase 2 receptor antagonists, dopamine 3 receptor (ant)agonists, endothelin receptor (ant)agonists, histamine 2 antagonists, neurokinin-1 receptor antagonists, HIV-1 protease inhibitors, non-nucleoside HIV reverse transcriptase inhibitors, and steroid aromatase inhibitors). Six similarity measures were used to generate rankings: the Molecular Simulations Inc. (MSI) [32] Jurs descriptors; FBSS (as discussed in the previous section); two types of ChemX 3D flexible fingerprints [33]; and two types of Daylight 2D fingerprints [34]. The Jurs descriptors are part of the MSI Cerius2 package, and describe shape and electronic charge by mapping

the atomic partial charges onto the solvent accessible areas of the individual atoms within a molecule. All of the 30 possible Jurs descriptors [35] were calculated for each member of the dataset. The values were then normalised, and the similarity between pairs of sets of values calculated using the non-binary Tanimoto coefficient. In what follows, the inclusion of the Jurs rankings in a fusion combination is indicated by ‘J’. The FBSS similarity measure has been described previously: its inclusion in a fusion combination is denoted by ‘F’. The ChemX 3D flexible fingerprint keys record the presence or absence of potential pharmacophoric patterns (consisting of three pharmacophore centres and the associated inter-atomic distances) in any of the low-energy conformations identified by a rule-based conformational analysis of a molecule. Two sets of similarity scores were generated from these fingerprints: the Tanimoto coefficient scores and the Tversky similarity scores [5,36], the inclusion of these in a fusion combination being denoted by ‘3’ or by ‘T’, respectively. The Daylight fingerprints were based on unfolded fingerprints considering pathlengths of up to 7, the inclusion of these in a fusion combination being denoted by ‘2’ (for a standard fingerprint where a bit is either set or not set) or by ‘N’ (for a fingerprint where a count is kept of how many times each bit is set), respectively. Thus 23F, for example, represents the fusion of the standard Daylight, Tanimoto ChemX and FBSS rankings. The similarity scores for these experiments were calculated using either the binary or non-binary versions of the Tanimoto coefficient, as appropriate.
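For reference, the coefficients named above can be written down in a few lines of Python. The sketch uses the standard textbook forms of the binary Tanimoto, non-binary Tanimoto and Tversky coefficients; the representation of fingerprints as Python sets and lists, the value returned for empty inputs, and the Tversky weights in the example are assumptions, since the settings used in the experiments are not given here.

```python
def tanimoto_binary(bits_a, bits_b):
    """Binary Tanimoto: c / (a + b - c), for sets of switched-on bit positions."""
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    return common / union if union else 0.0  # 0.0 for two empty sets (a convention)


def tanimoto_continuous(x, y):
    """Non-binary Tanimoto: sum(xy) / (sum(x^2) + sum(y^2) - sum(xy))."""
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    return dot / denom if denom else 0.0


def tversky(bits_a, bits_b, alpha, beta):
    """Tversky index: c / (alpha*|A-B| + beta*|B-A| + c)."""
    common = len(bits_a & bits_b)
    denom = alpha * len(bits_a - bits_b) + beta * len(bits_b - bits_a) + common
    return common / denom if denom else 0.0


target = {1, 2, 3}
candidate = {2, 3, 4}
print(tanimoto_binary(target, candidate))           # standard fingerprint
print(tanimoto_continuous([2, 0, 1], [1, 0, 1]))    # count-based fingerprint
print(tversky({1, 2}, {1, 2, 3}, alpha=1.0, beta=0.0))
```

With alpha = beta = 1 the Tversky index reduces to the binary Tanimoto coefficient, while unequal weights make the measure asymmetric, so that a target wholly contained in a database structure can score 1.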

Fusion results

In view of its performance in the studies discussed above, we used just the SUM rule for the fusion experiments, with all possible combinations of rankings from the similarity methods being studied (in much the same way as So and Karplus have very recently evaluated the effectiveness of all possible combinations of seven different QSAR methods [19]). Table 5 details the mean numbers of actives (i.e., molecules with the same activity as the target structure) found in the top-10 nearest neighbours when averaged over all 75 target structures. The values of c at the top of the table denote the number of similarity measures that were fused (so that, e.g., c = 1 represents the original measures and c = 2 represents the fusion of a pair of the original measures) and a bold-font underlined element indicates a fused combination that is better than the best original individual measure (which was the ChemX keys with the Tanimoto coefficient). It will be seen that very many of the fused combinations in Table 5 are bold underlined, thus providing further support for the use of SUM to fuse similarity rankings, and Ginn reports similar results from other analyses of this dataset [23]. The table also shows that the fraction of the combinations

Table 5. Mean number of actives found in the 10 nearest neighbours when combining various numbers, c, of different similarity measures for searches of the Kahn dataset. Asterisks (bold underlined entries in the original) indicate a fused result at least as good as the best original similarity measure

c=1        c=2         c=3          c=4           c=5            c=6
2  0.80    23  1.10    23F  1.28*   23FJ  1.52*   23FJN  1.45*   23FJNT  1.43*
3  1.12    2F  1.04    23J  1.39*   23FN  1.23*   23FJT  1.69*
F  0.89    2J  1.01    23N  1.04    23FT  1.43*   23FNT  1.36*
J  1.08    2N  0.68    23T  1.24*   23JN  1.31*   23JNT  1.43*
N  0.63    2T  0.95    2FJ  1.35*   23JT  1.45*   2FJNT  1.43*
T  0.69    3F  1.09    2FN  1.08    23NT  1.25*   3FJNT  1.51*
           3J  1.25*   2FT  1.28*   2FJN  1.28*
           3N  1.00    2JN  1.03    2FJT  1.53*
           3T  1.32*   2JT  1.10    2FNT  1.28*
           FJ  1.20*   2NT  0.95    2JNT  1.17*
           FN  0.91    3FJ  1.40*   3FJN  1.35*
           FT  1.11    3FN  1.19*   3FJT  1.55*
           JN  0.89    3FT  1.33*   3FNT  1.41*
           JT  0.93    3JN  1.25*   3JNT  1.36*
           NT  0.85    3JT  1.45*   FJNT  1.32*
                       3NT  1.20*
                       FJN  1.11
                       FJT  1.21*
                       FNT  1.11
                       JNT  1.12*

that are bold underlined increases in line with c, so that all combinations with c ≥ 4 perform at least as well as the best of the individual similarity measures. However, it is not the case that, e.g., the c = 5 combinations are invariably superior to the c = 4 combinations, and the best result overall was obtained with 23FJT (rather than with 23FJNT, the combination involving all of the individual measures). Thus, while simply fusing as many individual measures as are available in a similarity investigation would appear to perform well, superior results may be obtained from fusing a subset of the individual measures; this has also been noted in searches of text databases [10] but there is no obvious predictive mechanism for identifying an optimal combination a priori [23,37].
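The combinations examined in Table 5 are simply all non-empty subsets of the six similarity measures, labelled by concatenating their one-character codes. A short Python sketch of that enumeration follows; the function name is an illustrative assumption.

```python
from itertools import combinations

# One-character codes for the six similarity measures, as used in Table 5.
MEASURES = "23FJNT"


def fusion_combinations(measures=MEASURES):
    """Return {c: [labels of all combinations of c measures]}."""
    return {
        c: ["".join(combo) for combo in combinations(measures, c)]
        for c in range(1, len(measures) + 1)
    }


combos = fusion_combinations()
for c, labels in combos.items():
    print(f"c={c}: {len(labels)} combinations, e.g. {labels[0]}")
print("total:", sum(len(labels) for labels in combos.values()))
```

Six measures give 63 searches per target in total; since the number of subsets doubles with every additional measure, an exhaustive comparison of this kind quickly becomes expensive, which is one reason an a priori way of choosing a subset would be valuable.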


Conclusions

In this article we have discussed the use of data fusion methods to combine the rankings resulting from similarity searches of chemical datasets. Our experiments, which have employed a range of types of molecule and performance criterion, demonstrate that use of a fusion rule such as SUM will generally result in a level of performance (however this is quantified) that is at least as good (when averaged over a number of searches) as the best individual measure: since the latter often varies from one target structure to another in an unpredictable manner, the use of a fusion rule will generally provide a more consistent level of searching performance than if just a single similarity measure is available.

Acknowledgements

We thank the following: the Engineering and Physical Sciences Research Council and GlaxoWellcome Research and Development Limited for funding; BioByte Corp., Derwent Information, Darren Green, Richard Horobin, Sonia Ranade, David Turner and David Wild for data; and Barnard Chemical Information Limited and Tripos Inc. for software support.

References

1. Downs, G.M. and Willett, P., Rev. Comput. Chem., 7 (1995) 1.
2. Dean, P.M. and Perkins, T.D.J., In Martin, Y.C. and Willett, P. (Eds.) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications, American Chemical Society, Washington DC, 1998, pp. 199–218.
3. Special issue devoted to molecular similarity, J. Chem. Inf. Comput. Sci., 32 (1992) 577–752.
4. Dean, P.M. (Ed.) Molecular Similarity in Drug Design, Chapman and Hall, Glasgow, 1975.
5. Willett, P., Barnard, J.M. and Downs, G.M., J. Chem. Inf. Comput. Sci., 38 (1998) 983.
6. Willett, P. and Winterman, V., Quant. Struct.-Act. Relat., 5 (1986) 18.
7. Hall, D.L., Mathematical Techniques in Multisensor Data Fusion, Artech House, Northwood, MA, 1992.
8. Kokar, M. and Kim, K., Control Eng. Pract., 2 (1994) 803.
9. Arabnia, H.R. and Zhu, D. (Eds.) Proceedings of the International Conference on Multisource-Multisensor Information Fusion, Fusion’98, CSREA Press, 1998.
10. Belkin, N.J., Kantor, P., Fox, E.A. and Shaw, J.B., Inf. Proc. Manag., 31 (1995) 431.
11. Savoy, J., Ndarugendamwo, M. and Vrajitoru, D., Proceedings of the Fourth Text Retrieval Conference, National Institute for Standards and Technology NIST Special Publication 500–236, Gaithersburg, MD, 1996, pp. 537–547.
12. Lee, J.H., Proceedings of the Twentieth Annual International Conference on Research and Development in Information Retrieval, Association for Computing Machinery, New York, NY, 1997, pp. 267–276.
13. Pfeifer, U., Poersch, T. and Fuhr, N., Inf. Proc. Manag., 32 (1996) 667.
14. Smeaton, A.F. and Crimmins, F., URL: http://www.inf.udec.cl/~campos/fusion/fusionpc/fusion-www6.html
15. Clerc, T. and Erni, F., Topics Curr. Chem., 39 (1973) 91.
16. Masui, H. and Yoshida, M., J. Chem. Inf. Comput. Sci., 36 (1996) 294.
17. Kearsley, S.K., Sallamack, S., Fluder, E.M., Andose, J.D., Mosely, R.T. and Sheridan, R.P., J. Chem. Inf. Comput. Sci., 36 (1996) 118.
18. Sheridan, R.P., Miller, M.D., Underwood, D.J. and Kearsley, S.K., J. Chem. Inf. Comput. Sci., 36 (1996) 128.
19. So, S.-S. and Karplus, M., J. Comput.-Aided Mol. Design, 13 (1999) 243.
20. Ginn, C.M.R., Turner, D.B., Willett, P., Ferguson, A.M. and Heritage, T.W., J. Chem. Inf. Comput. Sci., 37 (1997) 23.
21. The Starlist file is available from BioByte Corp. at http://clogp.pomona.edu/
22. UNITY is available from Tripos Inc. at http://www.tripos.com
23. Ginn, C.M.R., The Application of Data Fusion to Similarity Searching of Chemical Databases, Ph.D. thesis, University of Sheffield, 1998.
24. Ranade, S.S., Prediction of Cellular Uptake of Foreign Chemicals Using Cluster Analysis, Ph.D. thesis, University of Sheffield, 1998.
25. Barnard Chemical Information Limited is at URL http://www.bcil.demon.co.uk
26. Bath, P.A., Poirrette, A.R., Willett, P. and Allen, F.H., J. Chem. Inf. Comput. Sci., 34 (1994) 141.
27. Siegel, S. and Castellan, N.J., Nonparametric Statistics, McGraw-Hill, New York, NY, 1988.
28. The World Drug Index database is available from Derwent Information at URL http://www.derwent.co.uk
29. Pepperrell, C.A., Taylor, R. and Willett, P., Tetrahedron Comput. Methodol., 3 (1990) 575.
30. Drayton, S.K., Edwards, K., Jewell, N.E., Turner, D.B., Wild, D.J., Willett, P., Wright, P.M. and Simmons, K., Internet J. Chem., URL http://www.ijc.com/articles/1998v1/37/
31. Kahn, S.D., Schleyer, P.v.R., Allinger, N.L., Clark, T., Gasteiger, J., Kollman, P.A., Schaefer III, H.F. and Schreiner, P.R. (Eds.), Encyclopedia of Computational Chemistry, Vol. 1, John Wiley, Chichester, 1998, pp. 417–425.
32. Molecular Simulations Inc. is at URL http://www.msi.com
33. ChemX products are available from Oxford Molecular Limited at URL http://www.oxmol.co.uk
34. Daylight Chemical Information Systems Inc. is at URL http://www.daylight.com
35. Stanton, D.T. and Jurs, P.C., Anal. Chem., 62 (1990) 2323.
36. Bradshaw, J., URL: http://www.daylight.com/meetings/mug97/Bradshaw/MUG97/tv_tversky.html
37. Smeaton, A.F., Proceedings of the Twentieth BCS-IRSG Colloquium, Grenoble, France (in press).

Perspectives in Drug Discovery and Design, 20: 17–28, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Optimization of the drug-likeness of chemical libraries

JENS SADOWSKI
ZHF/G – A 30, BASF AG, D-67056 Ludwigshafen, Germany

Summary. A scoring scheme for the classification of molecules into drugs and non-drugs was established. It was set up by using atom type descriptors for encoding the molecular structures and by training a feed-forward neural network for classifying the molecules. The approach was parameterized by using large databases of drugs and non-drugs: the Available Chemicals Directory (ACD) with 169 331 molecules and the World Drug Index (WDI) with 38 416 molecules. It was able to reveal features in the molecular descriptors that either qualify or disqualify a molecule for being a drug. The method classified about 80% of the ACD and the WDI correctly. It was extended to crop protection compounds and can be used to prioritize compounds for synthesis, purchase, or biological testing. An enhancement allows the drug character of combinatorial libraries to be optimized.

Key words: drug-likeness, fingerprints, genetic algorithms, neural networks

Introduction

With the advent of experimental methods for handling large numbers of compounds in drug design (high-throughput screening and combinatorial chemistry), the focus of molecular modelling in this area was initially on diversity [1]. The questions put to computational chemistry were most often of the form: Which is the most diverse subset of a given set of compounds? More recently, however, a need has emerged for methods that can handle additional criteria such as drug-likeness [2–4] or bioavailability [5]. The questions are now manifold: Which compounds are drug-like, toxic, or bioavailable? How can these multiple criteria, including diversity, be optimized simultaneously in library design? In the following, some methods for answering these questions are demonstrated.

Drug-likeness score

Method

A method for classifying compounds into drugs and non-drugs was developed recently [4]. The intention of such an approach was to model the intuition


Figure 1. Neural network approach for classifying compounds.

and experience of medicinal chemists beyond certain advantageous or disadvantageous (e.g., reactive) substructures. This intuition is a complex, hard-to-formalize sense of suitability, synthesizability, stability, bioavailability, or toxicity. The drug-likeness approach is based on a very simple type of fingerprint and a neural network. Figure 1 illustrates this schematically. A given chemical compound is translated into a suitable descriptor that is passed to a trained neural network, which in turn returns an estimate of the compound class to which the molecule belongs (e.g., drug or non-drug). The main idea behind this approach is that the knowledge which medicinal chemists acquire during their careers is, in the form of many examples,


Figure 2. Distribution of predicted scores for the training sets (solid lines) and for the complete databases (dashed lines).

contained in public databases. The World Drug Index [13] and the Available Chemicals Directory [14] were used as collections of drugs and non-drugs, respectively. Both databases were preprocessed [4] in order to remove duplicates and compounds with certain reactive or otherwise unwanted substructures. In addition, exact matches of WDI compounds (i.e., drugs) were removed from the ACD. Clearly, many more drug-like compounds remain in the ACD, and the WDI contains certain compound classes that are not typical drugs, e.g., cytostatics. This is the remaining noise in such data collections. The WDI and ACD compounds were assigned drug-likeness scores of 1 (drug) and 0 (non-drug), respectively. The fingerprint descriptor consists simply of the counts of the Ghose/Crippen atom types [6–11] within a molecule. A neural network approach was chosen because of the advantages that such a nonlinear approach has over comparable linear methods such as linear regression or partial least squares (the main disadvantage of a neural network is its limited

use in the interpretation of the data). The neural network was trained with a randomly drawn training set of 5000 drugs from the WDI and 5000 non-drugs from the ACD by using the SNNS program [12]. The parameters for the network training and a suitable network architecture were determined by a number of empirical tests. It turned out that a feed-forward network with 92 input neurons (the Ghose/Crippen fingerprint), five hidden units, and one output neuron (the drug-likeness score) achieved sufficient predictivity. For details of this approach see the original publication [4]. The trained network is now available for predicting the drug-likeness of chemical compounds. Figure 2 shows the quality of this prediction by comparing the score distributions for the training sets (solid lines) with those for the complete databases (dashed lines). First, there is no significant difference between the predictions for the training data (5000 drugs and 5000 non-drugs) and the complete databases (38 416 drugs from the WDI and 169 331 non-drugs from the ACD). Thus, the network is highly predictive for data that were not shown to it during the training process. Secondly, about 80% of each dataset was predicted correctly. This approach can further be used to prioritize compounds for synthesis, purchase, or biological testing. At BASF we currently use a threshold value of 0.3 for the discrimination between drugs and non-drugs. This raises the proportion of correctly classified drugs to 90% while keeping the proportion of correctly predicted non-drugs at 70%.
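The shape of the scoring step can be sketched in Python as follows. The layer sizes (92 inputs, five hidden units, one output) and the 0.3 threshold are taken from the text; everything else is an assumption made for illustration. In particular, the sigmoid activation, the random weights and the example fingerprint are hypothetical, and the trained SNNS weights of the actual filter are not reproduced here.

```python
import math
import random

N_INPUT = 92      # counts of Ghose/Crippen atom types
N_HIDDEN = 5      # hidden units
THRESHOLD = 0.3   # score above which a compound is treated as drug-like


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_network(seed=0):
    """Hypothetical, untrained weights; the last entry of each row is a bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-1.0, 1.0) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def drug_likeness_score(fingerprint, network):
    """Forward pass of a 92-5-1 feed-forward network; returns a value in (0, 1)."""
    if len(fingerprint) != N_INPUT:
        raise ValueError(f"expected {N_INPUT} atom-type counts")
    hidden, output = network
    # zip() stops at the shorter argument, so the trailing bias weight of
    # each row is left over and added separately.
    activations = [
        sigmoid(sum(w * x for w, x in zip(row, fingerprint)) + row[-1])
        for row in hidden
    ]
    return sigmoid(sum(w * a for w, a in zip(output, activations)) + output[-1])


def is_drug_like(score, threshold=THRESHOLD):
    return score > threshold


network = random_network(seed=1)
fingerprint = [0] * N_INPUT
fingerprint[3], fingerprint[10], fingerprint[47] = 4, 2, 1   # made-up counts
score = drug_likeness_score(fingerprint, network)
print(round(score, 3), is_drug_like(score))
```

Lowering the decision threshold from 0.5 to 0.3, as described above, trades some specificity for sensitivity: fewer true drugs are rejected at the cost of accepting more non-drugs.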

Retrospective analysis of HTS data

The approach for predicting the drug-likeness of compounds was applied to the analysis of high-throughput screening data. For two receptor assays and three enzyme assays, the percentage of drug-like molecules with a score greater than 0.3 was estimated for four different data sets in the screening cascade:

1. The total set of compounds that went into screening: usually several hundred thousand.
2. The subset of compounds passing the primary screen at a certain level of percent inhibition: several thousand.
3. The subset passing the secondary screen at a certain IC50 level: several hundred compounds.
4. Manual selection by medicinal chemists for further development in chemistry projects: about ten compounds. The criteria for this selection are essentially the intuitions of experienced medicinal chemists about drug-likeness, e.g., bioavailability, toxicity, stability, or synthesizability. This is


Figure 3. Retrospective analysis of HTS data.

the step where a properly trained neural network should make similar decisions. Figure 3 illustrates the results. Whereas the percentage of drug-like compounds with a score >0.3 remains fairly constant at the 50–60% level for the first three HTS stages (screening, after primary screen (% inhibition), and after secondary screen (IC50)), it jumps to 70% for the enzyme assays and to nearly 100% for the receptor assays after manual inspection by medicinal chemists (whether the difference between receptors and enzymes is systematic should be discussed when more data are available). Two conclusions can be drawn from this result. First, there is no direct correlation between general drug-likeness and activity in certain biological assays: non-drug-like compounds can also cause signals. Secondly, the trained neural network for the drug-likeness prediction behaves similarly to trained medicinal chemists and tends to select the same sort of compounds.

Therefore, screening effort can be cut significantly by applying this filter before HTS.

Crop protection score

Data

In order to extend the drug-likeness approach described above to crop protection compounds, an equivalent of the World Drug Index for the crop protection field would be desirable. Interestingly, no such database with chemical structures in a computer-readable format is available. Therefore, two databases that were assembled in-house at BASF were used: 986 crop protection compounds that are on the market or under development (CP) and 1203 compounds from new crop protection patents (PAT). The structural overlap of these two databases is less than 1% (data not shown). The two databases were pre-processed with the same procedure as the drugs in Reference 4. The CP database (986 compounds) was used along with 1000 ‘non-crop protection’ compounds from the ACD [14] as the training set. The CP database can be considered as the currently available world of crop protection compounds. The PAT database, consisting of 1203 mostly newly patented compounds in this area, can be considered as the future of crop protection chemistry. This database was used as a test set.

Results

The neural network training was performed as for the drug filter described above [4]. The results are shown in Figure 4. The vast majority of the CP data (91%) is on the right-hand side of the diagram. These are the correctly classified crop protection compounds. The majority of the non-crop protection compounds (71%) are on the left-hand side. Thus, the new filter is able to distinguish between compounds that are suited for crop protection purposes and those that are not. In order to assess the predictivity of the approach, the second crop protection database, PAT, representing the future of crop protection chemistry, was passed through the trained neural network. Figure 5 shows the distribution of the score for this dataset. Clearly, most of these 1203 compounds (69%) were also classified correctly.

Cross-validation

In addition, the two filters for drugs and crop protection compounds were cross-validated by applying the crop protection network to the World Drug


Figure 4. Distribution of crop protection scores in the CP (solid line) and ACD (dashed line) datasets.

Figure 5. Crop protection score distribution in the PAT dataset (solid line). For comparison, the distributions for the training data (Figure 4) are shown with thin dashed lines.

Index and the drug-likeness network to the crop protection compounds. The crop protection filter classified 67% of the compounds in the WDI as not suited for crop protection. Conversely, the drug filter classified 77% of the compounds in the two crop protection databases (CP and PAT, 2189 compounds) as non-drugs.

Multidimensional combinatorial library optimization

Multiple criteria for the selection of an optimal sub-library

A real-world example of what is needed today in combinatorial library optimization is the following. Given 7262 reagents A and 1761 reagents B, these can be assembled into a virtual library of about 13 × 10⁶ products A-B (one-step reaction, exact data not shown; proprietary scaffold). These building blocks are selected exclusively on the basis of synthesis considerations, i.e., no other criteria such as diversity are involved in this step. The synthesis robot can handle 15 by 15 reagents in one run. The task is to find an optimal 15 × 15 sub-library (225 products) out of the large virtual library. The criteria should be the crop protection score (see above), diversity, and the price of the starting materials. There are about 10⁸² possible sub-libraries, i.e., far too many for a systematic exploration.
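The sizes quoted above follow directly from the reagent counts, as the short Python check below shows. It assumes that a sub-library is defined by choosing 15 of the reagents A and 15 of the reagents B independently, so that the number of sub-libraries is a product of two binomial coefficients; `math.comb` requires Python 3.8 or later.

```python
from math import comb

N_A, N_B = 7262, 1761   # available reagents A and B
K = 15                  # reagents of each type handled in one robot run

virtual_library = N_A * N_B                     # all products A-B
sub_libraries = comb(N_A, K) * comb(N_B, K)     # all 15 x 15 selections
order_of_magnitude = len(str(sub_libraries)) - 1

print(f"virtual library: {virtual_library:,} products")
print(f"products per sub-library: {K * K}")
print(f"possible sub-libraries: about 10^{order_of_magnitude}")
```

Even at a million fitness evaluations per second, enumerating a space of this size is out of the question, which motivates a stochastic search such as a genetic algorithm.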

Genetic algorithm
Gillet et al. [15] proposed solving such problems with a genetic algorithm (GA) [20]. GAs have also been applied in library design to lead optimization [16–17], to library mixture optimization [18], and to the selection of preferable compounds from a large virtual library (i.e., ‘cherry picking’) [19]. Genetic algorithms optimize a population of individuals (possible solutions) by improving their ‘fitness’, i.e., their adaptation to the problem, by applying principles of natural evolution such as ‘mutation’ and ‘crossover’. The implementation used here is based on the Genesis program [21]. The individuals in a population are different 15 × 15 sub-libraries drawn from the virtual library described above. Their fitness is the weighted sum of the percentage of compounds with a crop protection score greater than 0.3, of a diversity index, and of the reciprocal prices of the starting materials. The GA was run with a population size of 50, a maximum of 200 generations, a mutation rate of 0.1%, and a crossover rate of 60%. These are more or less the recommended default values [21]. Sufficient convergence was reached with these parameters (data not shown).

Figure 6. Distribution of the percentage of suitable compounds (crop protection score >0.3) in 10 000 randomly drawn 15 × 15 libraries.

The diversity of the sub-libraries was calculated from the products rather than from the individual building blocks. It was shown recently that such an approach is superior, particularly for diversity optimization [22]. In addition, the crop protection score cannot be calculated from the starting materials. The diversity index is the normalized sum of the absolute differences of the Ghose/Crippen fingerprints [4] of all pairs of compounds within a given 15 × 15 sub-library. The Ghose/Crippen fingerprints of the 225 products in a given sub-library are calculated from the fingerprints of the individual building blocks. Since these fingerprints are also the basis for the crop protection score, the computation of the fitness function is very efficient in terms of computer resources. The 10 000 fitness function calculations needed for one optimization run take less than 30 min on an SGI R10000 processor (data not shown).
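The fitness function described above can be sketched as follows. This is an illustration only: the text does not give the weights, the normalization of the diversity index, or the exact form of the price term, so the equal weights, the normalization by the number of compound pairs, and the function names used here are all assumptions.

```python
from itertools import combinations


def diversity_index(fingerprints):
    """Sum of absolute fingerprint differences over all compound pairs.

    Assumption: 'normalized' is taken to mean division by the number of pairs.
    """
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    total = sum(
        sum(abs(a - b) for a, b in zip(fp_1, fp_2)) for fp_1, fp_2 in pairs
    )
    return total / len(pairs)


def fitness(scores, fingerprints, total_price, weights=(1.0, 1.0, 1.0),
            threshold=0.3):
    """Weighted sum of suitability, diversity and reciprocal price.

    scores       -- crop protection score of each product in the sub-library
    fingerprints -- one numeric fingerprint (sequence) per product
    total_price  -- price of the starting materials (hypothetical units)
    weights      -- hypothetical weights; the original values are not stated
    """
    w_score, w_diversity, w_price = weights
    fraction_suitable = sum(s > threshold for s in scores) / len(scores)
    return (w_score * fraction_suitable
            + w_diversity * diversity_index(fingerprints)
            + w_price / total_price)


# Toy sub-library with three products (all values are made up).
toy_scores = [0.1, 0.5, 0.9]
toy_fingerprints = [[0, 0], [1, 1], [2, 0]]
toy_fitness = fitness(toy_scores, toy_fingerprints, total_price=2.0)
print(toy_fitness)
```

With a population of 50 and 200 generations, a function of this kind would be evaluated 10 000 times per run, which matches the number of fitness calculations quoted in the text.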

Results
Ten optimal 15 × 15 sub-libraries were generated by the genetic algorithm described above. In order to assess the quality of the results, they were compared to (1) 10 000 randomly generated libraries, (2) 10 libraries optimal with respect to maximal diversity regardless of the other criteria, and (3) 10 libraries with minimal diversity. The ten optimal libraries had about 78% compounds with a crop protection score greater than 0.3 and a diversity index of 5.2 at a cost of about 3000 USD (20 g per building block).

Figure 7. Distribution of the cost of the starting materials for 10 000 randomly drawn 15 × 15 libraries.

The 10 000 randomly drawn libraries performed much less favorably. Figures 6 and 7 show the distribution of the percentage of suitable compounds over these libraries and the distribution of the costs of the starting materials. As can be seen from Figure 6, about 80% of the 10 000 random libraries contain 0–10% suitable compounds; the rest contain 10–30% suitable compounds, a rather insufficient rate. The same holds for the building block costs (20 g per building block). No random library costs less than 30 000 USD for the starting materials. The mean value is 80 000 USD and the maximum about 130 000 USD. Thus, the GA optimization is by far more advantageous.

Figure 8 illustrates this superiority again. The diversity index of the sub-libraries is plotted against the percentage of suitable compounds (crop protection score >0.3) for the 10 000 random libraries, the 10 libraries optimal with respect to score, diversity and cost, the 10 maximally diverse libraries regardless of the other criteria, and the 10 minimally diverse libraries regardless of the other criteria. The last two groups of libraries were obtained by GA runs maximizing and minimizing diversity alone, without the other criteria (score and cost), in order to find the lower and upper ends of the diversity scale for this type of combinatorial library; there is no absolute diversity scale. The diagram shows the 10 best libraries on the right-hand side of the score axis and in the upper third of the diversity scale, which has the minimally and maximally diverse libraries as end points. This is much better with respect to both criteria than the 10 000 random libraries and much better with respect to the score than the maximally diverse libraries. Thus, the GA found a satisfactory compromise between several independent criteria for combinatorial library optimization.

Figure 8. Diversity index vs. percentage of suitable compounds (score >0.3): 10 000 randomly drawn 15 × 15 libraries, the best library after optimization, the maximal and the minimal diverse libraries.

Conclusions
The application of neural network approaches to the recognition of compounds that are drug-like or suited for crop protection, and to their discrimination from basic chemicals, was discussed. These methods succeed in both application areas with an average classification accuracy of 70–80%. They will be extended in the future to similar criteria such as toxicity or bioavailability. In a second approach, a genetic algorithm was used to optimize three criteria simultaneously (crop protection suitability, diversity, and cost) for a small sub-library drawn from a huge virtual combinatorial library. It could be shown that in cases where a systematic approach is impossible (10⁸² possibilities) the GA is by far superior to a stochastic solution.

Acknowledgements
I thank Peter Plath for providing the crop protection databases and Regina Hill for providing and discussing the combinatorial library example.

References
1. Warr, W.A., J. Chem. Inf. Comput. Sci., 37 (1997) 134.
2. Gillet, V.J. and Bradshaw, J., J. Chem. Inf. Comput. Sci., 38 (1998) 165.
3. Ajay, Walters, W.P. and Murcko, M.A., J. Med. Chem., 41 (1998) 3314.
4. Sadowski, J. and Kubinyi, H., J. Med. Chem., 41 (1998) 3325.
5. Lipinski, C.A., Lombardo, F., Dominy, B.W. and Feeney, P.J., Adv. Drug Delivery Rev., 23 (1997) 3.
6. Ghose, A.K. and Crippen, G.M., J. Med. Chem., 28 (1985) 333.
7. Ghose, A.K. and Crippen, G.M., J. Comput. Chem., 7 (1986) 565.
8. Ghose, A.K. and Crippen, G.M., J. Chem. Inf. Comput. Sci., 27 (1987) 21.
9. Ghose, A.K., Pritchett, A. and Crippen, G.M., J. Comput. Chem., 9 (1988) 80.
10. Viswanadhan, V.N., Ghose, A.K., Revankar, G.R. and Robins, R.K., J. Chem. Inf. Comput. Sci., 29 (1989) 163.
11. Ghose, A.K., Viswanadhan, V.N. and Wendoloski, J.J., J. Phys. Chem. A, 102 (1998) 3762.
12. SNNS: Stuttgart Neural Network Simulator; Version 4.0, University of Stuttgart, 1995.
13. WDI (World Drug Index), Version 2/96, Derwent Information, 1996.
14. ACD (Available Chemicals Directory), Version 2/96, MDL Information Systems, 1996.
15. Gillet, V.J., Willett, P., Bradshaw, J. and Green, D.V.S., J. Chem. Inf. Comput. Sci., 39 (1999) 169.
16. Weber, L., Wallbaum, S., Broger, C. and Gubernator, K., Angew. Chem., 107 (1995) 2452.
17. Singh, J., Ator, M.A., Jaeger, E.P., Allen, M.P., Whipple, D.A., Soloweij, J.E., Chowdhary, S. and Treasurywala, A.M., J. Am. Chem. Soc., 118 (1996) 1669.
18. Brown, R.D. and Martin, Y.C., J. Med. Chem., 40 (1997) 2304.
19. Sheridan, R.P. and Kearsley, S.K., J. Chem. Inf. Comput. Sci., 35 (1995) 310.
20. Goldberg, D.E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
21. Genesis program by J.J. Grefenstette, Naval Research Laboratory, Washington, DC, 1987.
22. Gillet, V.J., Willett, P. and Bradshaw, J., J. Chem. Inf. Comput. Sci., 37 (1997) 731.

Perspectives in Drug Discovery and Design, 20: 29-42, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Generating consistent sets of thermodynamic and structural data for analysis of protein-ligand interactions
THOMAS G. DAVIES, JEREMY R.H. TAME and RODERICK E. HUBBARD*
Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York YO10 5DD, U.K.

Summary. The development of reliable, transferable methods that can compute the energy of interaction between proteins and ligands is a major challenge for computational chemistry. Understanding the energetics of protein-ligand interactions would not only provide powerful tools for prediction in structure-assisted ligand and library design, but also enrich our appreciation of the subtleties of structure that underlie molecular recognition in biological systems. One of the central problems in developing effective models is the quality and quantity of experimental data on the structure and thermodynamics of protein-ligand complexes. In this article we discuss some of the issues and some of the experimental programmes of research we have initiated to provide such data. We summarise the characteristics necessary for a model system and the experimental techniques available. This includes a discussion of calorimetry, inhibition assays and crystallographic results on series of complexes in our laboratory, including penicillin acylase, thrombin, sialidase and in particular the oligopeptide binding protein, OppA. As well as discussing the lessons we have learnt about the characteristics of an ideal model system, we also present some preliminary analyses of what our combined structural and thermodynamic data have told us.
Key words: calorimetry, crystallography, drug design, ligand binding, molecular recognition

* To whom correspondence should be addressed. E-mail: [email protected]

Introduction
The specific recognition of target molecules by proteins and nucleic acids is fundamental to all life processes. However, our current knowledge of the relationship between the structure of these macromolecules and the affinity of ligand binding is very poor, which is a major limitation in structure-based ligand design. In recent years the number of known structures of protein-ligand complexes has grown significantly, but for many of these the interaction has not been well characterised thermodynamically. Individual protein-ligand systems are in any case of limited use for developing an empirical model of the thermodynamics of protein-ligand interactions. Series of related ligands binding to a protein may be more instructive, since the structural and thermodynamic differences can (in theory) be correlated. Many of the series of protein-ligand complexes studied to date comprise just a small number of compounds, with the ligand not being changed in a systematic fashion. Where structures of a large number of enzyme-inhibitor complexes have been solved, the ligands tend to be rather varied molecules developed as part of a medicinal chemistry programme. In such cases the mode of binding may vary considerably, making it difficult to determine the contributions of individual interactions to binding. The priority of a commercial drug discovery project is to determine whether and how tightly a given ligand has bound to the target. Finally, comparison of the thermodynamics of complexes from a variety of laboratories and systems is often impossible because the binding assays, and the conditions used, are so varied.

What is needed is a macromolecule that can form a very large variety of high-affinity ligand complexes in which both the macromolecule and ligand adopt essentially invariant conformations. This will allow the design of a large number of ligands in order to remove or introduce new interactions with the host molecule and to measure the effect of these changes on the affinity of binding.

Finding a suitable protein-ligand system
When we first began our work in this area, we identified four main criteria for a suitable system with which to probe protein-ligand interactions. These were:
1. Pure, native protein was available for assays and structure determination.
2. A reliable assay was available to measure binding affinity.
3. The structure of the protein-ligand complex was relatively straightforward to determine at high resolution.
4. A large number of ligands were available with subtly varying chemistry.
The enzyme penicillin acylase (penicillin amidohydrolase, EC 3.5.1.11) satisfied all of these criteria. This enzyme catalyses the hydrolysis of the amide bond in benzylpenicillin (Penicillin G, 1, Scheme 1) to give phenylacetic acid and 6-aminopenicillanic acid. The first structure of the enzyme [1] revealed a well-defined binding cavity for phenylacetic acid (Figures 1a, b). A survey of chemical suppliers' catalogues identified a number of phenylacetic acid derivatives, and so we determined the structure of the enzyme in complex with a set of these ligands [2] (see Table 1). A relatively straightforward enzyme assay was available and the inhibition constants measured could be used to reflect the binding affinities of the different ligands. The apparent binding affinity was consistent with simplistic modelling of the different ligands into the original structure. However, the structures revealed that the protein undergoes a conformational change on ligand binding (Figure 1c). Two distinct conformations are possible; therefore the affinity measured reflects the overall balance between the energetics of the ligand binding to the enzyme and the energy penalty for inducing the conformational change in the enzyme. These experiments added another requirement to our criteria for an experimental system for measuring protein-ligand interactions:
5. The structure of the protein and the orientation of the ligand remain constant across the series of complexes.

Figure 1. Ligand binding to penicillin acylase [2]. (a) Structure of the enzyme determined with phenylacetic acid (yellow) bound (pdb code: 1pnl). Red is the A chain, blue the B chain of the enzyme, with the active site serine and oxyanion hole coloured yellow. (b) Detail of the active site of the unliganded structure (pdb code: 1pnk), showing the active site serine. The pink grid surface shows the solvent accessible surface calculated with a probe of 1 Å. (c) Overlay of the structures of phenylacetic acid derivatives in complex with the enzyme. Blue = m-nitrophenylacetic acid (pdb code: 1ai5) and p-hydroxyphenylacetic acid (pdb code: 1ai6).

Table 1. Chemical structures of the phenylacetic acid derivatives whose structure has been determined complexed to penicillin acylase, together with measured inhibition constants [2]

Isothermal titration calorimetry
The binding affinity for the inhibitors of penicillin acylase was measured using kinetic data. Isothermal titration calorimetry (ITC) is a more general method for determining binding constants that does not depend on catalytic activity. It also has the considerable advantage of providing a measure of the affinity and enthalpy of binding in a single experiment. A schematic diagram of an ITC experiment is shown in Figure 2. A solution of one reactant (usually the protein) is maintained at a constant temperature in a sample cell. This is kept warmer than the surrounding isothermal jacket so that there is a constant thermal gradient, and thus a constant input of energy to the sample heating coil is required. A concentrated solution of the ligand is then injected. If the binding of the ligand to the protein is exothermic, then heat is liberated; if it is an endothermic process then heat is absorbed. The change in the amount of energy required to keep the sample cell at the constant temperature can then be measured. A series of injections is made and the sample is allowed to return to thermal equilibrium after each. The total heat released or absorbed is then determined. A typical trace is shown in Figure 3: as the ligand saturates the protein binding site, the calorimeter trace decreases, and the final spikes in the trace show the heat of dilution of the ligand in the sample buffer. From the shape of the curve a binding affinity can be estimated and from the amount of heat evolved the enthalpy of the interaction can be calculated. Knowing both ∆G and ∆H, the entropy of the interaction can also be calculated. The measurement is limited by the sensitivity of the thermocouples and the amount of heat generated or taken up during the reaction. This gives two other requirements:
6. The binding of ligand to protein must involve a measurable enthalpy change.
7. The dissociation constant for the interaction should be in the 10⁻⁸ to 10⁻³ M range.

Figure 3. Typical output from an ITC experiment (in this case the binding of the ligand KNapK to OppA, measured at pH 7 and 22 °C).

The next exploratory system was thrombin. Interestingly, Weber and co-workers [3] have studied the binding of a series of tripeptides with varying primary amine side chains in the P1 position. They determined structures and used kinetics to measure Ki, but were unable to rationalise the full range of observed affinities. Our collaborators, Glaxo-Wellcome, had available supplies of the enzyme and also a series of ligands associated with a drug discovery programme. For one of these ligands, argatroban, we were able to measure binding with calorimetry. However, this is a ligand with a Kd approaching nanomolar, beyond the sensitivity of calorimetry. Unfortunately, most of the other available ligands were not very soluble in aqueous solution. They could be taken up using DMSO, but the heat of dilution would then be very large and mask the enthalpy of binding of the ligand. This gives another requirement for this type of experiment:
8. Both protein and ligand need to be highly soluble in aqueous buffers.
We also explored measuring the binding of the neu-5-Ac-2-en series of inhibitors to the influenza virus enzyme, sialidase [4]. Although it was possible to obtain good quality calorimetry traces for this system, it proved too expensive to produce the large quantities of the enzyme that were required for a systematic study. In addition, it was not possible to develop a satisfactory protocol for recovery of the enzyme after calorimetry. This adds a more focussed requirement on protein availability:
9. Large quantities of protein must be available and recoverable after calorimetry.
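The step from a measured dissociation constant and enthalpy to the free energy and entropy of binding, mentioned in the calorimetry discussion above, uses the standard relations ∆G = RT ln Kd (for a 1 M standard state) and ∆G = ∆H − T∆S. The following Python sketch illustrates the arithmetic together with the affinity window of requirement 7. The function names and the example enthalpy are hypothetical and are not data from this work; only the temperature (22 °C) and the micromolar Kd are taken from the text.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def binding_thermodynamics(kd_molar, delta_h, temperature_k):
    """Return (delta_g, delta_s) of binding.

    kd_molar      -- dissociation constant in mol/L (1 M standard state)
    delta_h       -- measured binding enthalpy in J/mol
    temperature_k -- absolute temperature in K
    """
    delta_g = R * temperature_k * math.log(kd_molar)  # negative for Kd < 1 M
    delta_s = (delta_h - delta_g) / temperature_k     # from dG = dH - T dS
    return delta_g, delta_s


def within_itc_range(kd_molar, lower=1e-8, upper=1e-3):
    """Check requirement 7: Kd between 10^-8 and 10^-3 M."""
    return lower <= kd_molar <= upper


# Hypothetical example: Kd = 1 uM at 22 degrees C, enthalpy of -40 kJ/mol.
dg, ds = binding_thermodynamics(1e-6, -40_000.0, 295.15)
print(f"dG = {dg / 1000:.1f} kJ/mol, dS = {ds:.1f} J/(mol K)")
print("1 uM measurable:", within_itc_range(1e-6))
print("1 nM measurable:", within_itc_range(1e-9))
```

A ligand with nanomolar affinity falls outside the window, which is consistent with the difficulty described for argatroban.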


Figure 4. Secondary structure schematic diagram of the structure of the OppA protein complexed with KNapK [6], shown as pink van der Waals spheres (pdb code: 1b0h).

Our current model system - OppA
OppA is a periplasmic binding protein involved in the uptake of extracellular peptides by Gram-negative bacteria such as E. coli. The structure of the protein was determined at York some years ago [5], revealing that, in common with other periplasmic binding proteins, OppA engulfs its ligand between two domains joined by a flexible hinge (Figure 4). It binds peptides between two and five residues in length with little regard to sequence. Tight binding (Kd ~1 µM) is achieved by the protein utilising the hydrogen bonding potential of the peptide backbone and strategically placed charged groups to bind the N and C termini. The ligand side-chains are accommodated in large hydrated cavities in the protein. The initial studies on a series of peptides of sequence KXK showed that the structure of the protein remains essentially unchanged on binding different ligands [7].

This system satisfies all the criteria discussed. We can produce large quantities of pure protein that is highly soluble and can be recycled by dialysis after ligand-binding experiments have been performed. Peptide ligands can be ordered directly from suppliers and there is a large repertoire of natural and unnatural amino acids which can be incorporated into the ligand. The ligands are, in general, highly soluble in aqueous solution and although the enthalpy change for some ligands is rather low, it has always been possible to measure the thermodynamics of binding. Crystallisation and determination of the structure of the protein-ligand complex is also relatively routine, although because a conformational change is involved, co-crystallisation is necessary rather than soaking crystals with ligand.


Figure 5. Structure of the peptide ligands KXK when bound to OppA, together with selected side chains from the protein (E32, W397, R404, H405). The structures are deposited with pdb codes 1b05, 1b0h, 1b1h, 1b2h, 1b3h, 1b32, 1b3f, 1b3g, 1b3h, 1b31, 1b40, 1b46, 1b4h, 1b4z, 1b51, 1b52, 1b58, 1b5h, 1b5i, 1b5j, 1b6h, 1b7h, 1b9j, 1jet, 1jeu, 1jev, 1olb.

Two main series of ligands have been studied to date. The first is the natural KXK series, where X is one of the 20 naturally occurring amino acids. The crystal structures and calorimetry data have been obtained for the full series [8]. The second series is also of the type KXK, but with X chosen from commercially available abnormal amino acids [6]. This series allows minimal changes to be made to the ligand. For example, comparing the ligands where X = diaminopropionic acid, diaminobutyric acid, ornithine and lysine shows the effects of moving the side-chain amino group along the binding pocket in steps of one carbon atom. Similarly, the results with norleucine and norvaline can be compared with alanine. In this way a much more precise picture is obtained than by comparing rather different groups with each other. One interesting feature of OppA is that the pockets can enclose bulkier ligands than the protein would normally be expected to handle. These are not physiological ligands, as a key function of the protein is to select only natural peptides for transport into the cell.

Figure 5 shows the structure of the ligands and selected OppA side-chains for all the published (and deposited) structures determined to date in the series KXK. Although there is some variation in the detailed conformation of the flanking lysine side-chains across the series, the peptide backbone of the ligand remains essentially constant through all the structures. Importantly, there are only minor adjustments in the conformation of one side-chain (Glu 32) in the OppA protein structure. The figure emphasizes how the major change is in the chemical nature and bulk of the central amino acid side-chain. What is not shown (see References 6 and 8 for details) are the changes observed in the solvent structure in the ligand binding cavities.

The program LUDI [9] is the best-documented and widely used empirical scoring potential for estimating binding affinity for protein-ligand complexes. In common with other scoring functions, the affinity of binding is estimated from contact areas and hydrogen bonding, with no explicit account being taken of dynamics. We computed LUDI binding estimates for the interaction between different peptides and OppA. The program gave binding affinities that were far larger (by as much as a factor of 10⁶) than measured, and this error was not systematic: there was no correlation between the computed and experimental values. Through our links with the software company MSI, we were able to inspect the source code for this program, and established that the program was sensibly taking the solvent to be part of the protein.

Detailed analysis of these structural and thermodynamic data is currently underway, including thermodynamic perturbation to compute ∆∆G values, solvation free energy calculations, and attempts at deriving a new empirical scoring function. The results of these calculations will be presented elsewhere. There are some general features of the structures, however, which are worthy of note. Figure 6 shows the observed trend in the binding affinity for the KXK series with Dab, Dap, Orn and Lys as the central side-chain. This places an amino group at different positions in the binding pocket.
Although there are clearly some energetic differences associated with the changing number of solvent molecules in the binding pocket, an important drop in Kd is noticed for ornithine. Inspection of the structures shows that in this amino acid, the amino group is placed alongside a salt bridge between a glutamic acid and an arginine in the protein. This emphasises (as with the penicillin acylase example discussed above) that the observed binding affinity is not only a consequence of the interactions between the ligand and the protein, but also includes a term for any changes in the internal energy of the protein itself.

Figure 6. Moving an amino group through a ligand binding cavity. On the left hand side is a detail from the structure of KDapK (pdb code: 1b6h) with solvent molecules shown. On the right hand side is the measured Kd for the series KXK where X is Dab, Dap, Orn and Lys [6].

Figure 7. Enthalpy-entropy compensation in ligand binding. Comparison of the structures of KNvaK (red, pdb code: 1b6h) and KNleK (blue, pdb code: 1b7h) together with the measured thermodynamic values.

Table 2. Chemical structures of the central side chain in the peptides KXK. Kd is the dissociation constant for the tripeptide binding to OppA as determined calorimetrically [6]

Another interesting observation (discussed in more detail for the natural KXK series [8]) is how the protein structure has evolved to provide a balance between entropy and enthalpy such that the overall binding affinity of the ligand for the protein remains approximately constant. This is a recognised phenomenon [10,11], which occurs universally when weak non-covalent forces are involved. The often-quoted explanation is that a strong, enthalpically favourable interaction results in a general tightening up of the complex, thus reducing the vibrational freedom and the entropy of the system. A striking example of this is available from our results when comparing norvaline and norleucine (Figure 7). Our dataset provides an opportunity to explore this phenomenon in more detail.
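To put affinity differences of the kind discussed in this section on an energy scale, a ratio of two dissociation constants can be converted into a free energy difference with the standard relation ∆∆G = RT ln(Kd,a / Kd,b). The sketch below applies it to a factor of one million, the size of the largest discrepancy reported above between computed and measured affinities. The function name, the example Kd values and the choice of 298.15 K are assumptions made for illustration only.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def delta_delta_g(kd_a, kd_b, temperature_k=298.15):
    """Free energy difference dG_a - dG_b in J/mol for two Kd values."""
    return R * temperature_k * math.log(kd_a / kd_b)


# Hypothetical values: a measured Kd of 1 uM against a predicted Kd that
# is a million times tighter.
measured_kd = 1e-6
predicted_kd = 1e-12
error_kj = delta_delta_g(measured_kd, predicted_kd) / 1000.0
print(f"a factor of 10^6 in Kd corresponds to about {error_kj:.1f} kJ/mol")
```

At room temperature each factor of ten in Kd is worth a little under 6 kJ/mol, so six orders of magnitude correspond to roughly 34 kJ/mol.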

Ligand conformation and flexibility
The affinity of ligand binding reflects both the enthalpy and entropy of complex formation. Importantly, this includes not only the change in entropy and enthalpy of the protein on losing solvent but also differences in the energy required to take the ligand out of solution and into the binding pocket. This can include entropic differences, depending on ligand solvation, but also differences in enthalpy and entropy if the ligand has preferred conformations or internal interactions that are lost on binding. For the abnormal amino acid series, KXK, we have been exploring whether there are differences between the ligands in NMR measurements such as chemical shift, J-J coupling and NOESY spectra (Davies et al., in preparation). To a first approximation, it would appear that the preferred conformations and order parameters for the free peptides in solution are essentially the same. NOESY experiments so far also indicate that the tripeptides all have access to the same conformational space, except for those containing aromatic amino acids.

Final comments on the OppA system
No strong correlations have so far been observed between the experimentally and computationally determined energies of binding in any of the many computational procedures examined. There are a number of possible reasons for this. The first is that the OppA system has evolved as a relatively indiscriminate binding protein. The various amino acid side-chains that line the ligand binding pockets, and the solvent in them, avoid strong interactions with the ligand; only the main chain is bound tightly. Scoring functions such as LUDI, however, are parameterised on other systems where the geometry of interaction is optimal. It is also a challenge for the calculations to take into account the explicit disorder that is seen in solvent positions in some of the complexes. Existing scoring functions reward good interactions, but do not penalise bad ones, particularly perturbations that occur in the receptor protein. Current work is focussed on scoring functions that take poorer contacts into account; the ongoing experimental programme with abnormal amino acids is challenging the side-chain pockets and will provide additional data on the effect of these unfavourable interactions. In addition, the buried water molecules and the flexibility of the ligands are not typical of most protein-ligand complexes. We continue our search for other suitable experimental systems that will provide complementary data on a quite different system. Our experiences to date provide an extremely valuable set of criteria on which to choose such a system.


Acknowledgements
Aspects of this work were performed by Sarah Done, Sally Lewis, Sara Sleigh, Chandra Verma, Tony Wilkinson, Lisa Wright and Janet Woodford. We thank Simon Duckett of York for help with the NMR experiments and Rob Cooke of Glaxo-Wellcome for advice. All figures were produced with QUANTA 98 (Molecular Simulations Inc). The majority of the financial support is from the BBSRC, with important contributions from Glaxo-Wellcome and Pfizer. J.R.H.T. is a Royal Society University Research Fellow.

References
1. Duggleby, H.J., Tolley, S.P., Hill, C.P., Dodson, E.J., Dodson, G.G. and Moody, P.C.E., Nature, 373 (1995) 264.
2. Done, S.R.H., Brannigan, J.A., Moody, P.C.E. and Hubbard, R.E., J. Mol. Biol., 284 (1998) 463.
3. Weber, P.C., Lee, S.L., Lewandowski, F.A., Schadt, M.C., Chang, C.H. and Kettner, C.A., Biochemistry, 34 (1995) 3750.
4. Von Itzstein, M., Wu, W.-Y., Kok, G.B., Pegg, M.S., Dyason, J.C., Jin, B., Van Phan, T., Smythe, M.L., White, H.F., Oliver, S.W., Colman, P.M., Varghese, J.N., Ryan, D.M., Woods, J.M., Bethell, R.C., Hotham, R.C., Cameron, J.M. and Penn, C.R., Nature, 363 (1993) 418.
5. Tame, J., Murshudov, G.N., Dodson, E.J., Neil, T.K., Dodson, G.G., Higgins, C.F. and Wilkinson, A.J., Science, 264 (1994) 1578.
6. Davies, T.G., Hubbard, R.E. and Tame, J.R.H., Protein Sci., 8 (1999) 1432.
7. Tame, J.R.H., Dodson, E.J., Murshudov, G., Higgins, C.F. and Wilkinson, A.J., Structure, 3 (1995) 1395.
8. Sleigh, S.H., Seavers, P.R., Wilkinson, A.H., Ladbury, J.E. and Tame, J.R.H., J. Mol. Biol., 291 (1999) 393.
9. Böhm, H.J., J. Comput.-Aided Mol. Design, 8 (1994) 243.
10. Dunitz, J.D., Chem. Biol., 2 (1995) 709.
11. Gilli, P., Ferretti, V., Gilli, G. and Borea, P.A., J. Phys. Chem., 98 (1994) 1515.

Perspectives in Drug Discovery and Design, 20: 43-62, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Multiple molecular superpositioning as an effective tool for virtual database screening
CHRISTIAN LEMMEN a,*, MARC ZIMMERMANN b and THOMAS LENGAUER b,c
a CombiChem Inc., 1804 Embarcadero Road, Palo Alto, CA 94303, U.S.A.
b German National Research Center for Information Technology (GMD), Institute for Algorithms and Scientific Computing (SCAI), Schloß Birlinghoven, D-53754 Sankt Augustin, Germany
c University of Bonn, Research Group Professor Lengauer (FGL), Department of Computer Science, Römerstraße 164, D-53117 Bonn, Germany

Summary. Molecular superpositioning is an important task in rational drug design. Usually it is the key step in a comparative analysis of molecules by 3D QSAR methods. It is also helpful for the elucidation of a pharmacophore and crucial in the attempt to derive a receptor model. Generally speaking, molecular superpositioning can be seen as the analog of molecular docking if the receptor structure is not available and direct methods are not applicable. Virtual database screening is the computational counterpart to modern experimental techniques like high throughput screening and assaying of combinatorial libraries. Both screening techniques share the goal of detecting active molecules in a large selection of compounds. Usually hundreds of thousands of candidates are to be tested; hence, time is the limiting factor and rapid processing is of utmost importance. Descriptor-based methods, which usually provide a simple linear encoding of the molecules, meet the demands of computational speed and have long been used predominantly for the task of virtual screening. However, more powerful superposition methods have been developed during the past few years and are now also becoming applicable to screening large databases. Especially in combination with the faster methods, molecular superpositioning as the final step of a filtering protocol provides a powerful tool for virtual database screening. The present work reports on our latest developments of molecular superpositioning techniques and assesses their applicability to virtual database screening.
Key words: database filtering, molecular superpositioning, structural alignment, virtual screening

* To whom correspondence should be addressed. E-mail: [email protected]

Introduction

Superpositioning molecules is a specific problem addressed in the more general field of molecular similarity research. The problem of assessing the similarity of molecules, taking their 3D structure into account, can be stated as follows: Given a set of molecules binding to the same location on a molecular target, (a) determine the conformation and relative orientation as dictated by their common binding mode and (b) find a similarity metric that prioritizes this configuration among all others.

Clearly, this problem has several aspects that may be considered to a larger or lesser extent. First, in general, several molecules have to be considered simultaneously. However, methods frequently perform only pairwise comparisons. This is a problem, since it can happen that the multiple alignment provides a plausible match of the structures but none of the implied pairwise matches does [1,2]. Second, molecules must be considered as flexible objects that usually may adopt millions of different shapes, even if only a limited energy range is permitted. Frequently, conformational models are used that allow for enumerating conformers. Third, six degrees of freedom have to be considered with regard to the relative orientation of the molecules. Here, a multitude of restrictions have been proposed to cut down the size of the search space. One example is to discretize the problem to a matching problem with a limited number of atoms, groups or chemical features. Another alternative is to assume that the true superposition lies in some local minimum of a similarity function and to perform local searches starting from a set of initial orientations. In addition, the search space is sometimes restricted to a cubic grid. Finally, the similarity metric ideally provides a smooth function that has its global optimum in the ideal configuration described above. However, even in docking studies where the target structure is given, no single energy function exists that meets these standards.
Therefore, usually similarity measures are used that approximate the desired behavior, i.e., arrive at a local optimum in a configuration close to the desired geometry for at least a test set of examples. 2D methods neglect the whole geometric issue completely and aim simply at a measure that prefers molecules with similar chemical groups in similar locations of the 2D formula. In the following, we mention landmark contributions to molecular superpositioning and molecular similarity research in the past two decades. Around 1979 Crippen and co-workers laid the groundwork of distance geometry [3,4]. This technique utilizes the description of molecules on the basis of their inter-atomic distances or valid distance ranges. Triangle or higher order inequalities are used to narrow down ranges as much as possible. The so-called embedding problem has to be solved to find 3D coordinates for all atoms that obey the distance constraints. This is usually solved using randomized approaches. Distance geometry has proven to be a useful tool for docking, pharmacophore detection and receptor modeling.
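The narrowing of distance ranges by triangle inequalities can be illustrated with a small sketch. It is only a toy version of bound smoothing, not the procedure of References 3 and 4: the function name `smooth_bounds`, the tolerance `tol`, and the three-atom example are assumptions made for illustration, and the embedding step is not covered.

```python
from itertools import product


def smooth_bounds(lower, upper, tol=1e-9):
    """Tighten symmetric distance-bound matrices in place.

    lower[i][j] and upper[i][j] hold the lower and upper limit on the
    distance between atoms i and j. Sweeps are repeated until no bound
    changes any more.
    """
    n = len(upper)
    changed = True
    while changed:
        changed = False
        for k, i, j in product(range(n), repeat=3):
            if i == j:
                continue
            # Triangle inequality: d(i,j) <= d(i,k) + d(k,j)
            via_k = upper[i][k] + upper[k][j]
            if upper[i][j] > via_k + tol:
                upper[i][j] = via_k
                changed = True
            # Inverse triangle inequality: d(i,j) >= d(i,k) - d(k,j)
            gap = max(lower[i][k] - upper[k][j], lower[k][j] - upper[i][k])
            if lower[i][j] < gap - tol:
                lower[i][j] = gap
                changed = True
    return lower, upper


# Hypothetical example: distances 0-1 and 1-2 are known exactly,
# the distance 0-2 is initially almost unconstrained.
lower = [[0.0, 3.0, 0.0], [3.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
upper = [[0.0, 3.0, 100.0], [3.0, 0.0, 1.0], [100.0, 1.0, 0.0]]
smooth_bounds(lower, upper)
print(lower[0][2], upper[0][2])
```

In the example, the range for the unknown distance shrinks from [0, 100] to [2, 4], which is the kind of narrowing that makes the subsequent embedding step tractable.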

At about the same time the active analogue approach was laid out by Marshall and co-workers [5]. This method starts with a putative set of key pharmacophoric groups and their intermolecular correspondence in a set of active compounds. The key idea then is to enumerate all possible conformers and to store all resulting inter-feature distances in an array indexed by these distances. These so-called distance maps of several molecules can simply be intersected to obtain those conformations that result in a consistent pharmacophoric pattern across all molecules.

For the same objective clique detection methods have been employed as well [6,7]. In a so-called distance compatibility graph all distance- and type-compatible atom pairs in a specific reference structure and any conformer of all other considered structures are represented by nodes. Those nodes are connected that share an atom in each of the represented structures. A clique in this graph represents a matching of type-compatible atoms across the set of molecules. Thus, large cliques represent potentially interesting pharmacophores. The method algorithmically enumerates such cliques.

Geometric hashing is a powerful method that addresses the problem of finding three-dimensional objects in complex scenes. It originates from computer vision [8] and has been widely used to find matches between functional groups of molecules that are either to be docked or to be superposed. The key element in these techniques is a hash table that stores a rotationally invariant representation of one molecule and can be queried with structural features of the other molecule [9].

Given a matching of the pharmacophoric features on the molecules, an RMS-fit provides the superposition of the structures that gives the smallest sum of squared distances between the related features [10]. The directed tweak technique allows for an RMS-fit considering molecular flexibility.
Using analytical derivatives of the objective function the underlying optimization problem can be solved extremely fast [11]. Other optimization-based approaches aim at maximizing some kind of overlap volume. The classical similarity measures provided by Carbó [12] and Hodgkin [13] are standard objective functions in such methods. Conformational sampling and the generation of reasonable starting points for the optimization are the critical ingredients in an optimization-based approach. All kinds of optimization techniques (gradient-based methods, genetic algorithms, simulated annealing, and many others) have been tried in order to arrive at reasonable optima quickly.

Virtual database screening aims at detecting active molecules in a large collection of compounds. The compounds may either be given explicitly (e.g. a corporate database) or implicitly as a combinatorial library with core fragments and substituent lists. Traditionally, database screening has been the domain of topology- or descriptor-based methods that assess similarity rapidly at the expense of limited accuracy and difficult interpretability of the results. Superposition methods have been employed for the screening task as well [14,15]. Usually, stepwise filtering protocols were used to keep computational demands low [16,17].

Reviews on the early work on molecular similarity can be found in References 18–20. An outstanding collection of articles and reviews on pharmacophore elucidation can be found in Reference 21. Drug design applications of molecular similarity and molecular complementarity concepts are described in Reference 22. Molecular similarity measures are reviewed in Reference 23. Excellent reviews on molecular superpositioning can be found in References 24–28. A recent review on descriptor-based methods is given in Reference 29. A review on the structural alignment of molecules highlighting the most recent methodological developments can be found in Reference 30.

Our group contributed several methods for different variants of the problem of superimposing molecules. Initially, these were designed for the preparation of structures for 3D-QSAR analysis. However, they turned out to be powerful enough to screen large datasets. In particular, some recent enhancements to the software and the use of a two-step filtering protocol, with a rough and fast superpositioning technique first and an expensive but accurate alignment tool later, made our methods applicable to virtual database screening.

In the following sections we briefly review the approaches to molecular superpositioning that we contributed during the last years and detail the recent enhancements we implemented. Then we describe two ways of evaluating superpositioning methods and the performance that we achieved with regard to these criteria. First, we compare the geometries that our method produces to crystal data.
Second, we describe the filtering protocol we designed and show the effectiveness of our methods in a virtual database screening scenario. In order to keep the presentation concise, we restrict ourselves to presenting the results on one out of five example cases that we investigated. However, the results on the other example cases usually look quite similar and the entire set of charts can be inspected on our web-page (http://cartan.gmd.de/reliwe/pd3_paper99.html). Our presentation concludes with an outlook to future work.
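As a concrete illustration of the distance-map intersection used in the active analogue approach reviewed above, the following sketch bins the inter-feature distances of each conformer and intersects the resulting sets across molecules. The bin width, the function names, and the toy distance values are assumptions for illustration; the original method of Reference 5 is not reproduced here.

```python
def distance_map(conformer_distances, bin_width=0.5):
    """Return the set of binned inter-feature distance tuples.

    conformer_distances holds one tuple per conformer; each tuple lists
    the distances between the putative pharmacophoric groups in a fixed
    order.
    """
    return {
        tuple(int(d // bin_width) for d in distances)
        for distances in conformer_distances
    }


def consistent_patterns(molecules, bin_width=0.5):
    """Intersect the distance maps of all molecules."""
    maps = [distance_map(m, bin_width) for m in molecules]
    return set.intersection(*maps)


# Hypothetical data: two molecules, two conformers each, three
# inter-feature distances (in Angstrom) per conformer.
mol_a = [(4.1, 5.2, 6.8), (3.0, 5.1, 7.9)]
mol_b = [(4.3, 5.4, 6.6), (2.0, 2.0, 2.0)]
common = consistent_patterns([mol_a, mol_b])
print(common)
```

Only the binned pattern shared by a conformer of every molecule survives the intersection, which corresponds to a pharmacophoric arrangement that all actives can adopt.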

Methods

We developed two software tools for the purpose of molecular superpositioning: RIGFIT, a method for the rigid-body superposition of molecules, and FLEXS, for the task of flexibly superimposing a test molecule onto a rigid reference structure. Since the former is a special case of the latter problem, the whole system is called FLEXS [15]. The flexible superpositioning strategy comprises three steps: the selection of a number of base fragments, the placement of the base fragments, and an incremental construction, completing the partial superpositions generated by the base placement step. The procedure is illustrated by the left branch of the flowchart in Figure 1. The rigid-body superpositioning comprises a numerical optimization procedure (see the right branch of the flowchart). It turns out that this method is also well suited for the superpositioning of molecular fragments. Therefore, we employ RIGFIT in FLEXS in two places: for the rigid-body superpositioning of whole molecules and as one of three alternatives for the base placement step during flexible superpositioning.

FLEXS provides a scripting language that allows for comfortable control of the superpositioning task. It also permits batch processing, like performing loops over series of compounds or invoking RIGFIT with multiple conformers that have been generated separately. The runtime of the flexible superpositioning lies in the range of a few minutes, whereas rigid-body superpositioning is performed in a couple of seconds on a current workstation or PC (like a single processor SUN-Ultra-30 Workstation with 128 MB of main memory and 296 MHz clock speed).

The rigid-body superpositioning in RIGFIT implements an optimization procedure using the Hodgkin index [13] as the goal function and utilizing sets of Gaussian functions to model a variety of physico-chemical properties. The method employs basic techniques from crystallography that help to speed up the process significantly. The basic concepts have been described previously by Klebe et al. [31] for the modeling part and Nissink et al. [32] for the crystallographic techniques. The resulting RIGFIT approach has several interesting characteristics.
First, moving to Fourier space, which can be done analytically, provides easy access to a translation-invariant description of the molecules. Hence the first step in a RIGFIT optimization is to determine local optima for the rotation of the molecules starting from a number of orientations. The second step comprises the translational optimization which can be performed in Fourier space quite efficiently as well. Finally, the approximate solutions are post-optimized using the original Hodgkin index and considering all six degrees of freedom. Figure 2 illustrates how this method compares with a traditional overlap optimization procedure. Though the procedure looks more complex, it reduces the necessary number of local optimizations and allows for much faster three-dimensional optimizations for the intermediate steps. Since the translational optimization is especially fast, large numbers of start translations are permitted which are the prerequisite for successfully placing molecular fragments. For the algorithmic details see Reference 1.
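To make the goal function more tangible, the sketch below evaluates the Hodgkin index in direct space for two molecules that are each described by spherical Gaussians. It assumes one common width parameter `alpha` for all Gaussians and a single property channel; the data layout and function names are hypothetical, and the Fourier-space machinery that RIGFIT relies on is deliberately left out.

```python
import math


def overlap(mol_a, mol_b, alpha=1.0):
    """Overlap integral of two sums of spherical Gaussians.

    Each molecule is a list of (height, (x, y, z)) entries standing for
    h * exp(-alpha * |r - c|^2). For equal widths the pairwise integral
    is h_a * h_b * (pi / (2 alpha))^1.5 * exp(-alpha / 2 * d^2).
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for height_a, center_a in mol_a:
        for height_b, center_b in mol_b:
            d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
            total += height_a * height_b * prefactor * math.exp(-0.5 * alpha * d2)
    return total


def hodgkin_index(mol_a, mol_b, alpha=1.0):
    """Hodgkin index: 2 * O_AB / (O_AA + O_BB)."""
    self_terms = overlap(mol_a, mol_a, alpha) + overlap(mol_b, mol_b, alpha)
    return 2.0 * overlap(mol_a, mol_b, alpha) / self_terms


# Hypothetical two-atom "molecule" and a copy shifted far away.
mol = [(1.0, (0.0, 0.0, 0.0)), (1.0, (1.5, 0.0, 0.0))]
shifted = [(h, (x + 50.0, y, z)) for h, (x, y, z) in mol]
print(hodgkin_index(mol, mol), hodgkin_index(mol, shifted))
```

A perfect superposition of identical descriptions gives an index of 1, while non-overlapping molecules give a value close to 0. The double loop also shows why the cost grows with the product of the numbers of Gaussians.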


Figure 1. Flow chart of a single superposition of a test ligand onto a reference structure with FLEXS. The user decides between flexible fitting (left branch) and rigid-body superpositioning (right branch). The rigid-body superpositioning has been described elsewhere in detail as the RIGFIT procedure [1]. RIGFIT is used in FLEXS in two places (indicated by the dashed line). In any case FLEXS produces a set of reasonable placements of the test ligand.

Figure 2. The traditional approach to rigid-body superpositioning with local optimization techniques is shown on the left hand side. Here, n × m local optimizations with six degrees of freedom have to be performed. In contrast, the RIGFIT procedure is illustrated on the right hand side. It also starts with n start rotations; however, these are optimized independently, and result in k < n local optima. Then k × m translational optimizations are performed that result in r << k × m local optima. These again are post-optimized and result in a comparatively small set of s solutions, just as the traditional approach. Hence, fewer local optimizations are performed at three different steps of the protocol. In addition, as symbolized by the size of the shaded rectangles, these optimizations (either dealing with only three degrees of freedom or comprising only a few steps of a post-optimization) can be carried out quite fast.

The flexible superpositioning part of FLEXS relies heavily on the matching of H-bonding partners in both molecules. However, neither the interacting atoms nor specific site points are matched. Instead, those regions in space that describe the preferred position of protein counter atoms are required to intersect in order to form a valid matching. The H-bonding geometries as well as directional hydrophobic interactions are modeled by point sets in space. A combinatorial optimization procedure enumerates triangles of such interaction points on the base fragment and searches for compatible triangles in a hash table for the reference structure. Each such match of two triangles defines a placement of the base fragment on the reference structure. If the triangle matching procedure fails, the slower RIGFIT procedure is invoked to place the fragment. In this way a number of base fragments, each with a set of low-energy conformers, is placed in a number of plausible orientations on the reference structure.

Subsequently, the test ligand is incrementally built up by adding the remaining fragments in a stepwise fashion. Each fragment contains exactly one rotatable bond or a whole ring system. Upon adding a fragment, each of its preferred torsions or ring conformers is tried in turn. Thus, in each step, a number of partial placements is extended to a larger number of partial placements of molecular entities that are larger by one fragment. In order to avoid the inherent combinatorial explosion of alternative solutions, all partial placements are scored in each step and only the top-ranking solutions are carried to the next step. See Figure 3 for an illustration of the tree-like combinatorial optimization scheme. For the technical details of this algorithm we refer to Reference 33.
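The incremental construction with pruning described above has the shape of a beam search over a tree of partial placements. The following sketch shows only that control flow: placements are reduced to tuples of torsion angles, and the scoring function, the parameter `keep`, and all values are hypothetical stand-ins rather than the FLEXS implementation of Reference 33.

```python
def incremental_construction(base_placements, fragments, score, keep=3):
    """Extend partial placements fragment by fragment.

    base_placements: list of tuples describing placed base fragments.
    fragments: one list of alternative states (preferred torsions or
               ring conformers) per remaining fragment.
    score: function rating a (partial) placement; higher is better.
    keep: number of top-ranking partial placements carried forward.
    """
    partial = list(base_placements)
    for alternatives in fragments:
        # Every surviving partial placement is combined with every state.
        extended = [p + (state,) for p in partial for state in alternatives]
        # Score, sort, and keep only the best ones for the next level.
        extended.sort(key=score, reverse=True)
        partial = extended[:keep]
    return partial


# Toy example: the score rewards closeness to a hidden target geometry.
target = (180, 60, 300)


def toy_score(placement):
    return -sum(abs(a - b) for a, b in zip(placement, target))


solutions = incremental_construction(
    base_placements=[(0,), (180,)],
    fragments=[[60, 180, 300], [60, 180, 300]],
    score=toy_score,
    keep=2,
)
print(solutions)
```

Without the `keep` limit the number of partial placements would multiply by the number of alternatives at every level; with it, the work per level stays bounded at the price of possibly discarding a branch that would have scored well later.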


Figure 3. Flow chart of the combinatorial placement procedure in FLEXS, which explores the tree-structured configuration space permitted by the underlying discrete modeling of the superpositioning problem. Dots symbolize placements of the test ligand (partial placements for all but the leaves of the tree). White dots indicate high-scoring placements that are shuffled to the left by the intermediate sorting steps and comprise the roots of the next level of the evaluated portion of the configuration tree. The triangles, starting from each root, symbolize that multiple solutions are obtained upon adding a new fragment in multiple conformations to an existing partial placement.

An important recent enhancement to the flexible superpositioning has been the extension of the algorithm to handle multiple base fragments to start with. These can be selected manually, automatically by a heuristic procedure, or on the basis of a common substructure. The latter is of special interest if combinatorial libraries are to be superimposed that contain a limited number of substructures suitable as base fragments. In combination with the manual definition of placements for these fragments a significant speed-up is achieved, too, since the base placement step usually takes up to two thirds of the runtime of the entire procedure.

The automated selection of base fragments is critical for processing large numbers of molecules in a virtual database screening experiment. We employ a simple heuristic that loops over all fragments of a limited size and scores them by three important and, to some extent, opposing aspects: the number of potential interacting groups, the number of conformers, and the volume spanned by the fragment. Since these are at the same time the dominating factors for the superpositioning method, it is likely that there are reasonable starting fragments among the high-scoring ones.

Another enhancement of the software is to allow merging multiple Gaussian representations into a single one and to use such a composite model as the Gaussian representation of the reference molecule. The different sets of Gaussians may originate from different compounds, from different conformers of one compound, or may even be artificially constructed. In this way the reference compound may, in fact, represent a whole set of compounds, conformational uncertainty may be appropriately described, and, e.g., excluded volumes may be incorporated into the molecule to be fitted onto.

Since the number of terms to be evaluated during Gaussian overlap calculation grows approximately with the square of the number of atoms, we reduce the number of Gaussians as follows: We evaluate all pairwise distances between all the Gaussians. Each distance that lies below a certain threshold results in an edge in a corresponding distance graph that contains a node for each Gaussian. The distance graph is not necessarily connected. The aim is to merge terminal nodes and cliques in such a graph. Figure 4 illustrates an example situation and the nodes to be merged. We achieve this goal using a two-pass procedure, determining the Gaussians to be merged first and subsequently merging them.
The technical details and implementation can be found in Reference 34. In order to determine the merged Gaussian representation three parameters have to be optimized, height, width, and position. The width of the Gaussians is set to a constant in all our calculations. To optimize the position requires solving a separate expensive optimization problem. Instead we simply use the center of gravity of the Gaussians to be merged. Given the widths and positions, the heights are optimized in order to minimize the difference between the old and the reduced Gaussian description.
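The distance graph and the center-of-gravity placement of a merged Gaussian can be sketched as follows. Note the simplifications, which are assumptions of this sketch and not the procedure of Reference 34: whole connected components are merged instead of terminal nodes and cliques, the threshold value is arbitrary, and the height optimization is omitted.

```python
import math


def distance_graph(centers, threshold):
    """Adjacency sets: an edge joins Gaussians closer than threshold."""
    adjacency = {i: set() for i in range(len(centers))}
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def merge_groups(adjacency):
    """First pass: decide which Gaussians are merged (components here)."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


def center_of_gravity(points):
    """Second pass: position of the merged Gaussian."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


# Hypothetical Gaussian centers (in Angstrom).
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.8, 0.0), (10.0, 0.0, 0.0)]
groups = merge_groups(distance_graph(centers, threshold=1.5))
merged = [center_of_gravity([centers[i] for i in g]) for g in groups]
print(groups, merged)
```

Separating the decision of what to merge from the merging itself mirrors the two-pass idea in the text and keeps the graph unchanged while groups are being determined.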

Figure 4. Distance-based merging of Gaussian functions. Dots display positions of Gaussians and edges connect dots representing Gaussians which lie closer to each other than a certain threshold. The encircled Gaussians are to be merged.

Comparison with X-ray data

The effectiveness of an indirect approach to drug design like molecular superpositioning is difficult to evaluate. A quasi-standard metric is to compare the computed alignments with crystallographic data. This is possible if the 3D structures of protein-ligand complexes with different ligands binding to the same protein are available. To this end we extended a dataset initially compiled by Klebe et al. [31] to a total of 76 protein-ligand complexes of 14 proteins with the ligands varying in size from 18 to 158 atoms and up to 35 rotatable bonds. This way, 460 pairs of molecules are found that bind to the same protein. We mutually superimposed the complexes by minimizing the positional differences between the backbone Cα atoms. Then we extracted the ligands from the complexes, keeping their obtained relative orientation in space, to define our reference alignment.

The approach is now to take the reference compound in the given orientation and conformation, to flexibly superimpose the test compound, whose conformation is taken to be unknown, onto the reference, and to measure the distance between the observed and the predicted conformation and orientation of the test compound in terms of the traditionally used root-mean-square deviation (RMSD) of all non-hydrogen atoms. Of course there are many pros and cons to this kind of evaluation. However, it gives at least an idea as to what extent a method is able to reproduce configurations determined by experiment.

Since the superposition can only be meaningful in those regions where reference compound and test compound overlap to a reasonable extent, we propose a minimum overlap volume requirement of 60% of the volume of the smaller compound and measure the RMSD of only those atoms of the test compound that overlap with the reference in the experimentally observed configuration. Table 1 shows the respective adjusted values (differentiated by the rank at which the alignment appears on the solution list) for a reasonable subset of 284 pairs of molecules (defined by the 60% criterion, cf. above). With almost 50% of the alignments reproduced with a reasonable RMSD on one of the first ten ranks this is an encouraging result. Similar validation studies recently performed for docking methods achieved reproduction rates of about 60–70% [35,36].

Table 1. X-ray data comparison

Rank             Top      Top 10   All
RMSD < 1.0 Å     20.7%    28.7%    39.4%
RMSD < 1.5 Å     37.7%    45.7%    55.4%
RMSD < 2.0 Å     45.4%    53.7%    62.0%

The table displays the percentage of the 284 alignment pairs for which reasonable alignments are achieved. The results are categorized by different RMSD thresholds and different fractions of the rank lists considered.

Application

Since we do not have access to real activity data from an assay, we simulate a real-world application of our virtual screening methods by hiding a set of known active molecules in a larger database of compounds of unknown activity. The test is to pick as many known active molecules as possible among a certain small fraction of high-ranking candidates. We approach this with the help of a single active compound, or at most a small set of known actives, that serves as bait to fish for the other active compounds. Traditionally, the performance of this kind of approach is measured with enrichment values or enrichment plots that quantitatively show how many more active compounds are found in a fraction of the processed database than in a random selection. The enrichment is defined as

$$E_A(p) = \frac{N_A(p) / (p \cdot N)}{N_A / N}$$

where N_A(p) denotes the number of known active molecules among the top-ranking fraction p of compounds of the database (hit-rate), N_A is the number of active molecules in the entire database, and N is the size of the screened database.

Sometimes it is also useful to scale the enrichment by the maximum possible enrichment opt(E_A(p)) in order to allow for a fair comparison across different datasets or datasets of varying size:

$$\tilde{E}_A(p) = \frac{E_A(p)}{\mathrm{opt}(E_A(p))}$$

Note that this normalization simply reduces the enrichment to the fraction of the actives that are maximally to be found. In order to give some visual impression as to what extent the actives condense in the upper portion of the database, we provide a bar-code-like visualization of the distribution of the actives after screening in Figure 10. Finally, we compare the performance and the hit lists of our method to those of a standard 2D approach, namely the DAYLIGHT fingerprint software [37], in order to show that the actives picked by either method differ significantly. The experiments have been carried out on datasets of three different sizes (named small-size, medium-size, and large-size). Large numbers of training evaluations have been performed on 100 randomly selected compounds of a dataset collected by Briem [38]. Each of these training sets contains 10 active and 90 inactive compounds. Briem’s collection comprises the following activity classes: 136 PAF-antagonists (PAF), 114 HMG-CoA inhibitors (HMG), 40 ACE inhibitors (ACE), 49 thromboxan A2 receptor antagonists (TXA2), 52 5HT3 receptor ligands (5HT3), and 581 randomly selected compounds from the MDDR database [39]. All actives are known to bind in the nanomolar range to their target. We took each activity class into account separately. All medium-size test cases were composed of actives of a specific activity class and the remaining structures of Briem’s dataset as the inactives. I.e., in total, we roughly have 1000 molecules for each of the five different sets of actives. The large-size test databases consist of all actives of a specific class and the whole NCI database as a supplement of inactive compounds [40]. This size of datasets with random chance of a hit of approximately 1/1000 closely mimics real world applications. 
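The enrichment measures used in this section can be computed from a ranked list in a few lines. The sketch assumes the common definition of enrichment, namely the hit rate among the top-ranking fraction p divided by the hit rate of the whole database, and a normalization by the best value attainable at that fraction. The function names and the ten-compound example are hypothetical.

```python
def enrichment(ranked_is_active, p):
    """Hit rate in the top fraction p relative to the overall hit rate.

    ranked_is_active: booleans for the database sorted best-first.
    """
    n = len(ranked_is_active)
    n_top = max(1, round(p * n))
    n_active = sum(ranked_is_active)
    hits = sum(ranked_is_active[:n_top])
    return (hits / n_top) / (n_active / n)


def normalized_enrichment(ranked_is_active, p):
    """Enrichment divided by the maximum possible enrichment at p.

    This equals the number of hits divided by the largest number of
    actives that could appear among the selected compounds.
    """
    n = len(ranked_is_active)
    n_top = max(1, round(p * n))
    n_active = sum(ranked_is_active)
    hits = sum(ranked_is_active[:n_top])
    return hits / min(n_top, n_active)


# Hypothetical ranking of 10 compounds containing 3 actives.
ranking = [True, True, False, True] + [False] * 6
print(enrichment(ranking, 0.2), normalized_enrichment(ranking, 0.2))
```

In the example the top 20% contains two actives where a random pick would contain 0.6 on average, so the enrichment is about 3.3, and the normalized value is 1 because no ranking could place more than two actives in two slots.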
In the first set of experiments, carried out on the training dataset, we applied FLEXS with the following iterated protocol to evaluate how well the Gaussian merging procedure enriches the information content of the reference structure. (1) We start with each of the 10 active molecules in the training sets, in turn, and determine the superposition score of the remaining 99 molecules. The solution lists ranked by score are analyzed to obtain 10 enrichment curves, the mean of which is displayed as a single curve in Figure 6 for the HMG example case. (2) The active molecule on rank x (x to be chosen) of a solution list is taken in its generated position and merged (by means of its Gaussian representation) with the reference compound with which it was aligned. The composite model then takes the place of the original reference in another execution of step 1. The protocol is illustrated in Figure 5. The choice of the rank x of the active compound to be merged poses a tradeoff between close analogues (top ranking) with quite reliable placements and remote analogues (lower rank) with less reliable placements but a higher probability of belonging to a different chemical class. We tested taking placements on rank 3 and 10, respectively, with quite similar performances for both choices.

Figure 5. Flow chart of the automated procedure applied with six iterations to obtain enriched Gaussian representations for the reference compounds.

Figure 6 shows the average enrichment charts displaying the normalized enrichment \tilde{E}_A(p) for the first 10 percentiles on the HMG example cases. The chart on the left hand side displays the results on the training set. The right hand side shows the results on Briem's entire dataset. As a point of reference, the enrichments achieved with the DAYLIGHT fingerprint software have been included as the first of the series of curves displayed. Note that similar results are achieved on both dataset sizes. Also, the reference enrichment protocol performs similarly across the different activity classes. Most enrichment curves are found to increase significantly from step to step, indicating that the information content of the reference model is in fact increased by the merging of the Gaussian descriptors. Note that this also implies that, to some extent, meaningful alignments have been generated for the molecules, since otherwise a blurring of the reference model would have been expected instead of its reinforcement.

Figure 6. Normalized enrichments on the first 10 percentiles of the training set (left) and Briem's entire set (right), both for the HMG example, are displayed. As a reference the respective enrichments obtained using the DAYLIGHT software are added in front of the other curves showing steps 1 to 5, respectively.

Also, Figure 6 exemplifies our experience that the results achieved on a smaller dataset convincingly carry over to a larger one, i.e., an active compound that fishes a lot of other actives from the smaller set is also a good bait in the larger application. This is an important result since it allows for performing several runs and extensive testing on a subset at low computational costs, followed by the real and presumably expensive applications using a carefully selected set of reference compounds. Of course some bias may be present by chance on a small subset and may mislead the choice of a potent reference structure. However, in our experience, taking not only a single but the three best performing structures on the training set was sufficient to consistently improve results.

Also, we found that fusing the results of these individual runs consistently enhances the performance. The fusion operators we took into account were the minimum, the maximum, and the mean of the individual ranks. Technically, applying the fusion operator f to a compound that has been aligned to three different references on ranks r1, r2, and r3, respectively, means determining the rank r of the compound by r = f(r1, r2, r3). The newly determined rank is used to reorder the sequence of the database. Figure 7 shows the enrichment plots obtained by applying the different operators on the HMG example case. Usually, the min operator performs best. Note that the fused ranking may even be superior to the individual rankings.

We made the important observation that the hits obtained with our tools are usually quite dissimilar from those obtained by a 2D method. For a comparison we applied the DAYLIGHT software again. Figure 8 shows the average overlap we observed among the top 10 candidates on the training datasets across all activity classes. The important point about this result is that 2D similarities are usually much less surprising to a chemist than a 3D similarity of topologically only remotely similar molecules.

Our final tests were performed using both of our methods in a two-step filtering protocol. With the preceding experiments we determined the best performing fragments in Briem's dataset for the different activity classes.


Figure 7. Enrichment charts for FLEXS (bottom) and RIGFIT (top) on the training datasets of the HMG example are displayed. The curves (differentiated by symbol) show the behaviour of the three fusion operators we tried. Additionally, the individual enrichments are shown as error bars. The min curve in the bottom chart is clearly on top and therefore superior to the individual results.
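The rank fusion r = f(r1, r2, r3) described in the text amounts to a few lines of code. The sketch below is a minimal illustration with hypothetical compound names and ranks; breaking ties by compound name is an assumption made only to keep the output deterministic.

```python
from statistics import mean


def fuse_ranks(rank_lists, op=min):
    """Reorder compounds by a fused rank.

    rank_lists: one dict per reference compound, mapping each database
                compound to its rank (1 = best) in that run.
    op: fusion operator applied to the individual ranks (min, max, mean).
    """
    compounds = list(rank_lists[0])
    fused = {c: op([ranks[c] for ranks in rank_lists]) for c in compounds}
    # Sort by fused rank; ties are broken alphabetically (assumption).
    return sorted(compounds, key=lambda c: (fused[c], c))


# Hypothetical ranks of four compounds against three references.
run_1 = {"a": 1, "b": 2, "c": 3, "d": 4}
run_2 = {"a": 2, "b": 4, "c": 3, "d": 1}
run_3 = {"a": 3, "b": 2, "c": 4, "d": 1}
runs = [run_1, run_2, run_3]

print(fuse_ranks(runs, min))
print(fuse_ranks(runs, max))
print(fuse_ranks(runs, mean))
```

With the min operator a compound is promoted as soon as one reference ranks it highly, which is one plausible reason why this operator tends to retrieve actives that resemble only one of several dissimilar references.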


Figure 8. Across all training datasets the average number of top 10 hits found by FLEXS and DAYLIGHT, respectively, are shown. Additionally the average fraction of common hits is indicated by a different shading.

Figure 9. Enrichments on the first 10 percentiles of the NCI database (dashed) and on the top 5000 high-scoring molecules of the filtering step (solid), both for the HMG example, are displayed. In the filtering step RIGFIT was used to shrink the NCI database to 5000 molecules. The enrichment in the second step was achieved with FLEXS.

Figure 10. The bar charts qualitatively show the compression of the actives to the top-scoring fraction of the database. The left hand side shows the RIGFIT experiment on the NCI database with 120 000 structures and randomly scattered actives in the beginning (leftmost). The top 5000 ranking compounds contain 53 of the 114 HMG actives after screening (second from left) and are taken as input for the FLEXS screening, again randomly scattered (second from right). The resulting ranking of the actives is indicated in the rightmost column.

These have been used to screen the entire NCI database [40]. For a complete search with a single fragment this RIGFIT experiment took roughly 30 h of computing time. The dashed curve in Figure 9 shows the achieved enrichment in the HMG example case. Out of 114 active molecules, 53 have been pushed to the top 5000 ranking molecules, i.e., to the top 4% of the database. These top 5000 candidate structures have been screened using FLEXS, again with the reference compounds that performed best on the smaller datasets. For a complete search with a single reference compound this experiment took about 150 h of CPU time. It has been performed in parallel with our daily computing routine, distributing the computing task across the available unused hardware in our institute overnight. The solid curve in Figure 9 shows the achieved enrichment. The bar charts in Figure 10 give an impression of how the distribution of actives looks with these enrichments as compared to random scattering of the actives. Again, similar results have been obtained across all datasets.
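The two-step filtering protocol can be summarized by the following sketch, in which a cheap score shortlists candidates and an expensive score reorders only the shortlist. The two scoring functions are hypothetical stand-ins for the RIGFIT and FLEXS similarity scores, and the toy database consists of plain numbers.

```python
def two_step_screen(database, fast_score, accurate_score, keep):
    """Filter with a fast score, then rerank survivors accurately.

    database: iterable of compounds.
    fast_score, accurate_score: functions rating a compound against
        the reference; higher means more similar.
    keep: number of compounds passed on to the second step.
    """
    shortlist = sorted(database, key=fast_score, reverse=True)[:keep]
    return sorted(shortlist, key=accurate_score, reverse=True)


# Toy example: the fast score is a coarse approximation of the
# accurate one.
database = list(range(10))


def coarse(compound):
    return -abs(compound - 5)


def fine(compound):
    return -abs(compound - 6.2)


ranking = two_step_screen(database, coarse, fine, keep=3)
print(ranking)
```

The expensive function is evaluated `keep` times instead of once per database entry, which corresponds to reducing the NCI database to 5000 candidates before the flexible superposition. Actives that the first step ranks below the cutoff are lost, so `keep` trades recall against runtime.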

Conclusions and outlook

We presented an overview of structural superpositioning and molecular similarity in general. We contributed to this field the alignment tool FLEXS, which rapidly superimposes one flexible structure onto a rigid reference. Several recent enhancements of the software, such as full automation, fragment-based superpositioning at high speed, and an enriched representation of the reference structure, made our software applicable to screening virtual databases. We designed a two-step filtering protocol that enables us to go through large compound collections and mimic real-world applications in our evaluation scenario. We demonstrated that the critical parameters, such as the selection of an appropriate reference compound, can be trained on a small training set at low computational cost and that the results carry over reasonably well to the large-scale application.

A topic to be addressed in future work is multiple flexible superposition, considering several compounds simultaneously. Iterated pairwise alignments and the merging of Gaussian representations of several molecules can certainly produce reasonable starting positions for a multimolecule flexible post-optimization, on which we are currently working. Another important issue is that the currently used screening protocol reduces the wealth of information contained in the laboriously aligned ligands to a single number, namely the similarity score. There is certainly much more information present in an alignment that could be exploited to predict activity.

Acknowledgements

This work is part of the Relimo project, funded by the bmb+f (Bundesministerium für Bildung und Forschung) and the participating industrial partners Boehringer Ingelheim Pharma KG and Merck KGaA, Darmstadt, under Grant 0311 620.

References

1. Lemmen, C., Hiller, C. and Lengauer, T., J. Comput.-Aided Mol. Design, 12 (1998) 491.
2. Mestres, J., Rohrer, D.C. and Maggiora, G.M., J. Mol. Graph. Model., 15 (1997) 114.
3. Crippen, G.M., J. Med. Chem., 22 (1979) 988.
4. Crippen, G.M. and Havel, T.F., Distance Geometry and Molecular Conformation, Research Studies Press, Taunton, U.K., 1988.
5. Marshall, G.R., Barry, C.D., Bosshard, H.D., Dammkoehler, R.D. and Dunn, D.A., In Olson, E.C. and Christoffersen, R.E. (Eds.), Computer-Assisted Drug Design, Vol. 112, American Chemical Society, Washington, DC, U.S.A., 1979, pp. 205–222.
6. Takahashi, Y., Maeda, S. and Sasaki, S.-I., Anal. Chim. Acta, 200 (1987) 363.
7. Martin, Y.C., Bures, M.G., Danaher, E.A., DeLazzer, J., Lico, I. and Pavlik, P.A., J. Comput.-Aided Mol. Design, 7 (1992) 83.
8. Lamdan, Y. and Wolfson, H.J., In IEEE International Conference on Computer Vision, Tampa, FL, 1988, pp. 238–249.
9. Nussinov, R. and Wolfson, H.J., Proc. Natl. Acad. Sci. USA, 88 (1991) 10495.
10. Kabsch, W., Acta Crystallogr., A32 (1976) 922.
11. Hurst, T., J. Chem. Inf. Comput. Sci., 34 (1994) 190.
12. Carbó, R., Leyda, L. and Arnau, M., Int. J. Quant. Chem., 17 (1980) 1185.
13. Hodgkin, E.E. and Richards, G., Int. J. Quant. Chem., Quant. Biol. Symp., 14 (1987) 105.
14. Thorner, D.A., Wild, D.J., Willett, P. and Wright, P.M., J. Chem. Inf. Comput. Sci., 36 (1996) 900.
15. Lemmen, C., Lengauer, T. and Klebe, G., J. Med. Chem., 41 (1998) 4502.
16. Hahn, M., J. Chem. Inf. Comput. Sci., 37 (1997) 80.
17. Wang, T. and Zhou, J., J. Chem. Inf. Comput. Sci., 38 (1998) 71.
18. Brint, A.T. and Willett, P., J. Chem. Inf. Comput. Sci., 27 (1997) 152.
19. Martin, Y.C., Bures, M.G. and Willett, P., In Lipkowitz, B. and Boyd, D.B. (Eds.), Reviews in Computational Chemistry, VCH, Weinheim, Germany, 1990, pp. 265–294.
20. Johnson, M.A. and Maggiora, G.M. (Eds.), Concepts and Applications of Molecular Similarity, John Wiley & Sons, New York, NY, U.S.A., 1990.
21. Kubinyi, H. (Ed.), 3D QSAR in Drug Design. Theory, Methods and Applications, ESCOM, Leiden, The Netherlands, 1993.
22. Dean, P.M., In Dean, P.M. (Ed.), Molecular Similarity in Drug Design, Blackie Academic & Professional, London, U.K., 1995, pp. 1–23.
23. Good, A.C., In Dean, P.M. (Ed.), Molecular Similarity in Drug Design, Blackie Academic & Professional, London, U.K., 1995, pp. 24–56.
24. Humblet, C. and Dunbar Jr., J.B., In Venuti, M.C. (Ed.), Annual Reports in Medicinal Chemistry, Vol. 28, Chapter VI: Topics in Drug Design and Discovery, Academic Press, London, U.K., 1993, pp. 275–284.
25. Klebe, G., In Kubinyi, H. (Ed.), 3D QSAR in Drug Design. Theory, Methods and Applications, ESCOM, Leiden, The Netherlands, 1993, pp. 173–199.
26. Willett, P., J. Mol. Recogn., 8 (1995) 290.
27. Brown, R.D. and Martin, Y.C., J. Chem. Inf. Comput. Sci., 36 (1996) 572.
28. Bures, M.G., In Charifson, P.S. (Ed.), Practical Application of Computer-aided Drug Design, Marcel Dekker, New York, NY, U.S.A., 1997, pp. 39–72.
29. Matter, H. and Rarey, M., In Jung, G. (Ed.), Combinatorial Organic Chemistry, Wiley-VCH, Weinheim, 2000.
30. Lemmen, C. and Lengauer, T., J. Comput.-Aided Mol. Design, 14 (2000) 215.
31. Klebe, G., Mietzner, T. and Weber, F., J. Comput.-Aided Mol. Design, 8 (1994) 751.
32. Nissink, J.W.M., Verdonk, M.L., Kroon, J., Mietzner, T. and Klebe, G., J. Comput. Chem., 18 (1997) 638.
33. Lemmen, C. and Lengauer, T., J. Comput.-Aided Mol. Design, 11 (1997) 357.
34. Lemmen, C., Computational Methods for the Structural Alignment of Molecules. Number 1 in GMD Research Series. GMD – Forschungszentrum Informationstechnik, Sankt Augustin, Germany, 1999.
35. Jones, G., Willett, P., Glen, R.C. and Taylor, R., J. Mol. Biol., 267 (1997) 727.
36. Kramer, B., Rarey, M. and Lengauer, T., Proteins Struct. Funct. Genet., 37 (1999) 1.
37. DAYLIGHT Inc., Mission Viejo, CA, U.S.A. DAYLIGHT Software Manual, 1994.
38. Briem, H. and Kuntz, I.D., J. Med. Chem., 39 (1996) 3401.
39. MDL Information Systems Inc., San Leandro, CA, U.S.A. MACCS Drug Data Report (MDDR).
40. NCI DB release (http://dtp.nci.nih.gov/docs/3d_database/strucural_information/structural_data.html), containing 126,710 structures, 07/01/1998. After conversion to SYBYL's mol2 format and validation, 121,491 structures remained.

Perspectives in Drug Discovery and Design, 20: 63–81, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

A recursive algorithm for efficient combinatorial library docking

MATTHIAS RAREY* and THOMAS LENGAUER

GMD - German National Research Center for Information Technology, Institute for Algorithms and Scientific Computing (SCAI), Schloß Birlinghoven, D-53754 Sankt Augustin, Germany

Summary. Due to the rapid development of combinatorial chemistry and high throughput screening, a new virtual screening scenario has emerged. While the focus was previously on analyzing large collections of compounds available to the medicinal chemist, the search space is nowadays defined in the form of large, possibly virtual, combinatorial libraries. In this article we describe how the structure of combinatorial libraries can be exploited to speed up docking predictions. Based on our incremental construction method implemented in the docking software FLEXX, we developed a recursive scheme to traverse the combinatorial library space efficiently. We applied our docking algorithm to three libraries with sizes from a few hundred up to 20 000 molecules. In all cases, we are able to show that the results are similar to those of a sequential docking of the library molecules. The computing time, however, can be reduced by a factor of up to 30, resulting in an average time of about 5 s per library molecule.

Key words: combinatorial docking, combinatorial libraries, conformational analysis, drug design, molecular docking, receptor ligand interaction

* To whom correspondence should be addressed. E-mail: [email protected]

Introduction

The development of combinatorial chemistry and its application to drug design [1,2] has led to new search problems in the area of virtual screening [3]. First of all, the number of molecules which can be synthesized on the basis of combinatorial chemistry has increased dramatically compared to classical methods. This implies a much larger search space which has to be covered by virtual screening methods. Probably more important for the development of virtual screening methods is the introduction of structure into this increased search space. If an unstructured compound collection is given, each molecule has to be analyzed independently in a screening experiment. Combinatorial libraries, however, follow a systematic build-up law, synthesizing molecules from a highly limited set of building blocks. This structure can be exploited to reduce the runtime of virtual screening calculations substantially.

Here we focus on the structure-based design of targeted combinatorial libraries through fast molecular docking. In general, we can distinguish three kinds of problems. In all cases, we assume that the 3D structure of the target protein is known:

Combinatorial Docking Problem: Given a library, calculate the docking score (and the geometry of the complex) for each molecule of the library.

R-group Selection Problem: Given a library, select molecules for the individual R-groups in order to form a smaller sublibrary with an enriched number of hits for the target protein.

de Novo Library Design Problem: Given a catalog of molecules, design a library (including the rules of synthesis) optimizing the number of hits for the target protein.

Methods for these problems have emerged from the areas of molecular docking and de novo ligand design (see [4] for an overview of combinatorial docking methods, [5-12] for reviews on docking and de novo design). In the former case, the docking algorithms are applied to fragments of the whole molecule and the resulting information is then connected, yielding placements for individual library molecules. In the latter case, the de novo ligand design is constrained by predefined rules of synthesis.

Early algorithms for the combinatorial docking problem analyzed the similarity in given ligand datasets in order to speed up the search process. The focus in these papers is to relate ligands within the dataset structurally. One approach is to generate a minimal tree structure representing the whole ligand dataset [13]. Another approach is to speed up conformational searching by clustering similar molecules [14]. In both cases, the derived hierarchy of molecules can then be used in an incremental construction docking method.
The combinatorial docking tools PRO_SELECT [15] and CombiDOCK [16] are also based on the incremental construction method. In both approaches, a library is formed by a template (or core) molecule with a set of attachment points to which one out of a predefined set of substituents can be connected. The template is positioned in the active site without considering the substituents. Starting from a few orientations of the template, the substituents are placed into the active site of the protein independently. In the case of PRO_SELECT, substituents are then selected based on score and additional criteria like 2D similarity and feasibility of synthesis. CombiDOCK performs an additional step in order to calculate a score for whole library molecules by combining fragment scores.

Some approaches based on de novo ligand design software have been published for R-group selection problems. Kick et al. [17] applied a variant of the BUILDER program [18] to the preselection of substituents for a library targeted to Cathepsin D. The paper also contains an experimental validation of the method. Böhm [19] applied the LUDI program to the docking of two groups of fragments which can be connected pairwise in a single-step reaction to search for new thrombin inhibitors. In principle, all programs for fragment-based de novo ligand design can be applied in a similar way to the R-group selection problem.

Finally, we mention two methods for de novo library design. Caflisch [20] applied the MCSS technique, generating fragment placements which are subsequently connected. The DREAM++ software [21] combines tools for fragment placement and selection. The selection process is done such that only a small set of well-characterized organic reactions is needed to create the library.

Here we propose a new algorithm for the combinatorial docking problem based on the incremental construction algorithm implemented in FLEXX [22–24]. The algorithm is part of a new combinatorial docking extension of FLEXX called FLEXXc. The algorithm is more flexible than previous approaches with respect to the variety of libraries which can be processed. Also, R-groups are placed sequentially such that R-group dependencies are taken into account. We applied FLEXXc to three libraries with up to 20 000 molecules and compare the results to sequential docking calculations of the enumerated library.

Methods

FLEXXc is a toolbox for docking calculations on combinatorial libraries. It provides functionality for handling libraries, for docking single fragments, combinations of fragments, or whole molecules from the library, and for browsing through combinatorial docking results. The reason for this design is that, depending on the size of the library and its structure (the number of R-groups and how they are connected), different docking algorithms might be useful. In the following, we first present the model for combinatorial libraries underlying FLEXXc. Then, we focus on the data structures developed for fast online generation of library molecules. In the third part, we introduce one of our docking methods, called the recursive combinatorial docking algorithm.

Figure 1. Model of the structure of a combinatorial library: Tree representation of a library with an explicit core (a); instance of the core and R-group 1 with the X- and R-atoms explicitly marked (b); resulting partial molecule after linking a core and an R-group 1 instance (c).

A model for combinatorial libraries

In FLEXXc, a combinatorial library is represented by a rooted tree structure, the so-called library tree (see Figure 1). Each node of the tree stands for a building block, either the core or an R-group; the core is the root of the tree. For each building block, including the core, several alternative molecule fragments are allowed; we will call them instances of the R-group or core in the following. The edges of the tree encode the rules for building the molecules of the library. We assume that atoms involved in forming a bond between two building blocks are explicitly marked by the user. We call the unique atom with which an instance is linked to the core or the previous R-group the X-atom and the atoms to which further R-groups can be added the R-atoms (see Figure 1).

This model has some limitations. First of all, the structure of the library and the individual R-group instances are static. This means that the synthesis aspect of library design is separated from the docking process and no true de novo library design is possible. However, as soon as the chemical reactions underlying the library are given, this model is very flexible, since various core templates and a large number of alternatives for each R-group can be preselected. Due to the tree structure, the model does not handle ring closures between different R-groups. In those cases where the synthesized ring differs only in the ligands connected to the ring, the library tree can be defined by taking the ring as a single R-group. Note that the model is able to handle R-groups connected to R-groups.


Figure 2. The tree representing the structure of a combinatorial library. On the right, the tree is shown after switching R-group 3 to the core. Only the path between the previous core and the R-group has to be modified. Unchanged parts are shown in grey.

Finally, the model has an explicit core which may be handled differently in the docking algorithms. Because we are able to handle a set of core instances, there is no real difference between a core and an R-group within the model. Therefore, each R-group can also be defined as the core. In fact, we have implemented a simple operation, called core switching, which exchanges the roles of the core and an R-group (see Figure 2); it will be used in one of our test cases later on.
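The library tree and the core switching operation can be illustrated with a small sketch. The class and function names below are hypothetical and are not taken from FLEXXc; instances are represented by plain labels, and the relabelling of X- and R-atoms that a real implementation would need is deliberately left out. As in Figure 2, only the edges on the path between the old core and the new core are reversed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so list.remove() removes exactly this node
class BuildingBlock:
    """A node of the library tree: the core or an R-group with its alternative instances."""
    name: str
    instances: list
    children: list = field(default_factory=list)


def library_size(root):
    """Number of library molecules: product of the instance counts over all nodes."""
    size = len(root.instances)
    for child in root.children:
        size *= library_size(child)
    return size


def find_path(node, target_name):
    """Return the list of nodes from `node` down to the node named `target_name`, or None."""
    if node.name == target_name:
        return [node]
    for child in node.children:
        sub_path = find_path(child, target_name)
        if sub_path is not None:
            return [node] + sub_path
    return None


def switch_core(root, target_name):
    """Re-root the tree at `target_name` by reversing the edges on the root-to-target path."""
    path = find_path(root, target_name)
    if path is None:
        raise ValueError(f"no building block named {target_name!r}")
    for parent, child in zip(path, path[1:]):
        parent.children.remove(child)
        child.children.append(parent)
    return path[-1]


# Toy library: core with R1 and R2 attached, R3 attached to R1 (labels are invented).
r3 = BuildingBlock("R3", ["p", "q", "r", "s"])
r1 = BuildingBlock("R1", ["a", "b"], [r3])
r2 = BuildingBlock("R2", ["x", "y", "z"])
core = BuildingBlock("core", ["c"], [r1, r2])

size_before = library_size(core)
new_root = switch_core(core, "R3")
size_after = library_size(new_root)
print(new_root.name, size_before, size_after)
```

The number of library molecules is unchanged by the switch, which reflects the remark in the text that the model does not distinguish between a core and an R-group.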

Data structures for combinatorial libraries

For the development of data structures for combinatorial libraries, we have the following design objectives. First, the library is handled in its closed form, i.e. it is represented by its R-group instances rather than by an enumerated list of molecules. This allows us to handle large libraries in main memory, and computing time for generating library molecules is spent only if the molecules are needed for calculation. Second, the data structure must allow for an efficient construction of the library molecules, including the handling of all physico-chemical data which have to be assigned for docking calculations. Third, the data structure has to allow an efficient transfer of docking results from one (partial) molecule of the library to another.

In order to explain how we reach these objectives, we give a rough description of the basic data structures within FLEXX. A molecule is stored as a set of cross-linked lists. Each list contains information about specific objects like atoms, bonds, or ring systems. Each object is labeled with the physico-chemical data assigned during the data structure initialization, such as interaction types and geometries assigned to atoms, low-energy torsion angles assigned to bonds, and low-energy ring conformations assigned to ring systems.


Figure 3. FLEXX hierarchical data structures: molecules are represented as object lists. A docking solution (placement) contains two pointers (matches and conformation) into tree structures, storing paired interactions and the molecule conformation hierarchically. Both trees contain references back to the molecule.

In a docking calculation, two hierarchical data structures are built during the incremental construction of the complex. The first data structure contains information about paired interactions and refers to atoms; the second contains information about the constructed conformation and refers to bonds and ring systems. The data structures are hierarchical in the sense that placements with a common ancestor in the incremental construction also share common parts in these two data structures. Figure 3 gives a graphical representation of this design. The advantage of these data structures is that they share common information, which does not need to be copied during the incremental construction. The price is that a complex linked data structure has to be maintained during the calculation.

If we want to transfer placement information from one library molecule to another having a part in common, we have to update all links referring from the hierarchical placement information to the molecule. This time-consuming and complicated process can be avoided only if the common parts of the two molecules are identical, i.e. are located at the same place in physical memory. We therefore use the following scheme, which builds molecules from the core and R-group instances directly and avoids copying of molecule data. Each instance of a core or R-group is handled internally like a complete molecule with additional information on the connecting atoms. Two operations have been developed which allow the 'virtual synthesis' of the library molecules. The extend operation adds an R-group instance to either the core or a partially synthesized library molecule. First, all object lists are linked and the connecting X- and R-atoms are removed from the atom list. In the bond and ring system lists, the X-atom is replaced by the atom adjacent to the R-atom and the R-atom is replaced by the atom adjacent to the X-atom (see also Figure 1b). A new bond is formed between the replacing atoms and physico-chemical data (torsion angle information) are attached to it. The remove operation performs the inverse of the extend operation, i.e. the created bond is deleted, the X- and R-atoms are inserted and reintroduced into the bond and ring system lists, and the object lists are unlinked. So far, the remove operation is only able to remove the R-group instance added last.

Figure 4. A library molecule considered as a stack of core/R-group instances. The combination of remove and extend operations makes it possible to go from one molecule of the library to another. Placement information calculated for the common part (grey shaded boxes) can be shared.

Based on this data structure, a combinatorial library molecule can be considered as a stack of selected R-group instances (see Figure 4). With each extend operation, we select an R-group and an instance of this R-group and put it on top of the stack. With the remove operation, we are able to remove R-group instances from the top of the stack. We therefore go from one molecule of the library to another by removing some R-group instances and adding different instances. The R-group instances not changed during this process describe the part which the two molecules have in common. The placement information calculated for this part can be used for the new molecule without modification.
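The stack view of a library molecule can be sketched as follows. This is a minimal illustration with hypothetical names; instances are plain labels, and the list surgery on atoms, bonds, and ring systems performed by the real extend and remove operations is not modelled. The sketch only shows how the shared bottom part of the stack is left untouched when moving from one library molecule to another.

```python
class LibraryMoleculeStack:
    """Tracks the selected instances of a (partial) library molecule as a stack."""

    def __init__(self, core_instance):
        self.stack = [core_instance]
        self.operations = 0  # number of extend/remove calls performed so far

    def extend(self, instance):
        """Put an R-group instance on top of the stack."""
        self.stack.append(instance)
        self.operations += 1

    def remove(self):
        """Remove the R-group instance added last; the core itself cannot be removed."""
        if len(self.stack) <= 1:
            raise IndexError("only the core instance is left")
        self.operations += 1
        return self.stack.pop()

    def go_to(self, target):
        """Turn the current molecule into `target`; return the size of the shared part."""
        shared = 0
        for current, wanted in zip(self.stack, target):
            if current != wanted:
                break
            shared += 1
        if shared == 0:
            raise ValueError("target uses a different core instance")
        while len(self.stack) > shared:
            self.remove()
        for instance in target[shared:]:
            self.extend(instance)
        return shared


molecule = LibraryMoleculeStack("core")
shared_first = molecule.go_to(["core", "R1:a", "R2:x", "R3:p"])
shared_second = molecule.go_to(["core", "R1:a", "R2:y", "R3:p"])
print(shared_first, shared_second, molecule.operations, molecule.stack)
```

In the second transition, the core and the R1 instance stay in place, so any placement information attached to that partial molecule could be reused, which is the point made in the text.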

The recursive combinatorial docking algorithm

Based on the data structure described above, several docking algorithms can easily be implemented, such as docking of individual R-groups, of core–R-group combinations, or sequential docking of the library molecules enumerated during the docking calculation. Here, we present a different method based on recursively adding and removing R-group instances.

The input of the algorithm is a three-dimensional protein structure, a library tree, and an ordering of the R-groups, called the sequential build-up order S. S defines in which order the R-groups will be added to the molecules during the docking process. S always starts with the core; it then contains the R-groups in an order such that an R-group is always linked to one of the previous R-groups in S or to the core. Therefore, every prefix of S defines a connected partial molecule. Currently, defining the sequential build-up order is not automated and has to be done by the user. We will discuss its influence on the docking calculation in the Results section.

In the first phase, the core instances are docked to the protein. If the instances are small, only the base placement algorithm of FLEXX [25] is used; otherwise the whole docking procedure consisting of automatic base selection [23], base placement, and complex construction [22] is used. Afterwards, a set of about 200 to 600 energetically favorable orientations for each core instance is available.

In the second phase, we dock the library molecules one after another using the incremental construction algorithm. Given the sequential build-up order, all library molecules can be considered as the leaves of a large construction tree: On the first level of the tree, there is a node for each instance of the core. On the i-th level, there is a child node for each instance of the i-th R-group in S for each node of the (i – 1)-st level (see Figure 5). With the recursive combinatorial docking algorithm, we perform a preorder traversal of this construction tree: The first instance of each R-group is added until we reach the first molecule of the library (the leftmost leaf of the construction tree). Then all molecules are built by traversing the tree depth-first and from left to right.

Figure 5. The enumeration process of the library can be considered as a tree once a build-up order is defined for the R-groups (shown on the left). Each tree level corresponds to adding an R-group, the leaves represent the individual library molecules. The numbers within the tree nodes identify the instance of the R-group added.

The input of the recursive procedure is the partially constructed molecule m, the selection of placements P calculated for m, and the partial sequence S' of the build-up order S containing the R-groups which have not been added so far (a suffix of S). In order to describe the algorithm, we use the following terms: head(S') is the first element of S'; tail(S') is S' without its first element head(S'). If S' is empty, m is a complete library molecule and the placements in P are stored or evaluated further. Otherwise, the R-group head(S') is added in the current recursive call. For all instances r of head(S'), the following steps have to be performed: First, the current molecule m is extended by r using the extend operation of the combinatorial library data structure explained above. Then, the incremental construction algorithm is executed to extend the placements in set P by the added instance r, yielding the new placement set P'. Next, the recursive procedure is called again with the extended molecule, the newly calculated P', and the build-up sequence tail(S'), which skips the first R-group of S'. When the recursive call is finished, two clean-up operations are necessary. First, the placement set P' is deleted; then the R-group instance r is removed from the molecule again. An outline of the recursive algorithm is given in Figure 6.

In order to initiate the recursive calculation, the algorithm is called for each core instance with the placements from the first phase and the build-up sequence without the core, tail(S). For further evaluation, the highest scoring placements for each library molecule are stored to disk. For this task, we use a priority queue of files which always keeps exactly the docking results for the k highest scoring library molecules (where k is user defined).
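The recursive traversal and the bounded priority queue can be sketched in a few lines. Everything below is a toy under stated assumptions: `toy_dock_core` and `toy_incremental_construction` are hypothetical stand-ins that return made-up scores instead of placements, lower scores are treated as better, and the queue holds results in memory rather than in files. The sketch shows only the control flow, namely the preorder traversal with extend and remove steps and the retention of the k best library molecules.

```python
import heapq


def toy_dock_core(core):
    """Hypothetical stand-in for core docking: returns a list of placement scores."""
    return [-float(len(core))]


def toy_incremental_construction(placements, instance):
    """Hypothetical stand-in: extends every placement by one instance and rescores it."""
    return [score - float(len(instance)) for score in placements]


def recursive_rgroup_placement(molecule, placements, remaining, results, k, counter):
    """Add the instances of the next R-group one by one and recurse on the rest."""
    if not remaining:  # the molecule is a complete library molecule
        best = min(placements)
        # Store the negated score so that the heap root is the worst kept molecule.
        heapq.heappush(results, (-best, tuple(molecule)))
        if len(results) > k:
            heapq.heappop(results)  # discard the worst of the k + 1 entries
        return
    head, tail = remaining[0], remaining[1:]
    for instance in head:
        molecule.append(instance)  # extend operation
        new_placements = toy_incremental_construction(placements, instance)
        counter[0] += 1
        recursive_rgroup_placement(molecule, new_placements, tail, results, k, counter)
        molecule.pop()  # remove operation; new_placements is dropped here


def combinatorial_docking(cores, rgroups_in_buildup_order, k):
    """Dock every core instance, then traverse the construction tree depth-first."""
    results, counter = [], [0]
    for core in cores:
        placements = toy_dock_core(core)
        recursive_rgroup_placement(
            [core], placements, rgroups_in_buildup_order, results, k, counter
        )
    ranked = sorted((-negated, molecule) for negated, molecule in results)
    return ranked, counter[0]


ranked, construction_calls = combinatorial_docking(
    cores=["c"], rgroups_in_buildup_order=[["a", "bb"], ["x", "yyy"]], k=2
)
print(ranked, construction_calls)
```

With two R-groups of two instances each, the toy run performs six incremental construction calls (two on the first level, four on the second), whereas docking the four molecules independently from the core would repeat the first-level work for each of them.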

Results

The recursive combinatorial docking algorithm is best applied in cases where the library has one group exhibiting characteristic binding properties to the protein. Although this is not always the case, it occurs frequently in structure-based drug design, for example if a small molecule is already known to bind to a specific group of the protein, as in metallo-proteinases, or to specificity pockets, as in thrombin, biotin, or DHFR.

COMBINATORIAL_DOCKING(protein R, combilib L)
% initiates recursive combinatorial docking calculation
    S ← define R-group build-up order;
    foreach instance m ∈ core of L do
        P ← dock_molecule(R, m);
        recursive_rgroup_placement(R, m, P, tail(S));
    od;

RECURSIVE_RGROUP_PLACEMENT(protein R, molecule m, placements P, build-up order S)
% extends m sequentially by the instances of an R-group, calculates placements,
% and adds further R-groups recursively
    if S = ∅ then
        evaluate placements P;
        return;
    fi;
    foreach instance r ∈ head(S) do
        m ← m extended by R-group instance r;
        P' ← incremental_construction(R, m, P);
        recursive_rgroup_placement(R, m, P', tail(S));
        delete placements P';
        remove R-group instance r from m;
    od;

Figure 6. The recursive combinatorial docking algorithm: After placing the core molecules, the recursive traversal through the library is started. In each call, all instances of one R-group are added and the incremental construction algorithm is applied to the extended molecule.

Data sets

In order to test the method, we have created three different libraries for thrombin and DHFR. The first library, called the benzamidine library, is very small and contains a core and one R-group. The core is either para- or meta-substituted benzamidine; there are 46 instances for the R-group, making 92 molecules in total. The R-group instances differ in size from 1 to 39 atoms and contain up to two ring systems and up to eight rotatable bonds. The molecules, including binding data to thrombin, have been collected by Böhm and Klebe [26].

The second library is called the pyridine library and is taken from [27]. It contains a single core molecule and three R-groups as shown in Figure 7a. The R-groups contain 23, 24, and 7 instances, making 3864 molecules in total. The library contains molecules with up to 94 atoms, four ring systems, and 21 rotatable bonds.

The third library is called the UGI-160 library and is taken from [28]. The library is based on the UGI reaction and contains a core with two instances due to the stereo center and four R-groups (see Figure 7b). Unfortunately, not all R-group instances of the library are published, so we have used the (2 × 2 × 2 × 10 × 2) = 160 molecule sublibrary of the 320 000 molecule library listed in the publication. We artificially increased this library to a 2 × 10 × 10 × 10 × 10 library named UGI-20000 by creating additional R-group instances.
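The library sizes quoted above follow directly from the instance counts, and the same counts give the number of nodes on each level of the construction tree. The sketch below is a back-of-the-envelope illustration with hypothetical function names. It assumes that the counts are listed in build-up order starting with the core, and it counts construction steps only, which says nothing about actual computing times.

```python
from math import prod


def library_size(instance_counts):
    """Number of library molecules: product of the instance counts of all building blocks."""
    return prod(instance_counts)


def construction_tree_levels(instance_counts):
    """Number of nodes on each level of the construction tree, core level first."""
    levels, nodes = [], 1
    for count in instance_counts:
        nodes *= count
        levels.append(nodes)
    return levels


# Instance counts as quoted in the text: core first, then the R-groups.
LIBRARIES = {
    "benzamidine": [2, 46],
    "pyridine": [1, 23, 24, 7],
    "UGI-160": [2, 2, 2, 10, 2],
    "UGI-20000": [2, 10, 10, 10, 10],
}

for name, counts in LIBRARIES.items():
    levels = construction_tree_levels(counts)
    # R-group additions in the recursive scheme: one per tree node below the core level.
    recursive_steps = sum(levels[1:])
    # Assumed sequential baseline: every molecule is built up from its core separately.
    sequential_steps = library_size(counts) * (len(counts) - 1)
    print(f"{name:12s} size={library_size(counts):6d} "
          f"recursive={recursive_steps:6d} sequential={sequential_steps:6d}")
```

For the pyridine library, for example, the construction tree has 23 + 552 + 3864 nodes below the core, compared with three R-group additions for each of the 3864 molecules in the assumed sequential baseline.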


Figure 7. Cores of the pyridine (a) and the UGI (b) library. The * labels the stereo center of the UGI core.

All core and R-group instances have been preprocessed with Sybyl [29] as follows: correct Sybyl atom and bond types as well as formal charges have been assigned, hydrogens have been added, and 3D structures have been generated and energy minimized. Finally, we marked the X- and R-atoms.

Docking the benzamidine library into thrombin

For this first experiment, a 3D structure of thrombin was taken from PDB [30] entry 1dwd, containing the complex between human α-thrombin and NAPAP [31]. The active site was defined to contain all atoms located at a distance of at most 6.5 Å from atoms of NAPAP; all water molecules were removed. Because there is only one R-group, no build-up order has to be selected. Comparing a sequential docking run (docking each library molecule independently) with a combinatorial run shows only minor differences in the results (correlation coefficient: 0.9), see Figure 8a. This could be expected since the benzamidine unit plays a dominant role in the binding process and is docked first in the combinatorial run. Also, there is no choice concerning the R-group order. Differences in the calculations result from the fact that, in the sequential run, the algorithm selects a set of base fragments distributed over the whole ligand while, in the combinatorial run, the base fragments are limited to the core molecules, i.e. benzamidine in this case. Comparing the calculated docking scores with the experimental binding affinities, a correlation of 0.58 is achieved, see Figure 8b. It should be noted that, in this case, all molecules bind to thrombin and span a relatively small range of binding affinities.
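The distance-based definition of the active site is simple to express in code. The following sketch is an illustration only and not the procedure used by FLEXX: atoms are plain dictionaries, the coordinates, residue names, and residue numbers are invented, and the optional `keep_waters` parameter is a hypothetical convenience for cases in which selected water molecules are to be retained.

```python
from math import dist


def select_active_site(protein_atoms, ligand_atoms, cutoff=6.5, keep_waters=()):
    """Select protein atoms within `cutoff` angstroms of at least one ligand atom.

    Water atoms are dropped unless their residue number is listed in `keep_waters`.
    """
    selected = []
    for atom in protein_atoms:
        if atom["resname"] == "HOH" and atom["resnum"] not in keep_waters:
            continue
        if any(dist(atom["xyz"], lig["xyz"]) <= cutoff for lig in ligand_atoms):
            selected.append(atom)
    return selected


# Toy coordinates, invented for illustration only.
protein = [
    {"name": "CA", "resname": "ASP", "resnum": 189, "xyz": (0.0, 0.0, 0.0)},
    {"name": "CB", "resname": "GLY", "resnum": 216, "xyz": (6.0, 0.0, 0.0)},
    {"name": "O", "resname": "HOH", "resnum": 403, "xyz": (1.0, 1.0, 0.0)},
    {"name": "N", "resname": "LYS", "resnum": 60, "xyz": (20.0, 0.0, 0.0)},
]
ligand = [{"name": "C1", "xyz": (0.0, 0.0, 0.0)}]

site_without_water = select_active_site(protein, ligand)
site_with_water = select_active_site(protein, ligand, keep_waters={403})
print([atom["resnum"] for atom in site_without_water])
print([atom["resnum"] for atom in site_with_water])
```

The loop compares every protein atom with every ligand atom, which is adequate for a single complex; a spatial grid or tree would be the usual choice for larger systems.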


Figure 8. Correlation between sequential and combinatorial docking (a) of the benzamidine library and the correlation of docking scores from the combinatorial run against experimental values (b).


Docking the pyridine library into DHFR

The 3D structure of DHFR was taken from PDB entry 4dfr, containing the complex between E. coli DHFR and methotrexate [32]. The active site was defined to contain all atoms at a distance of at most 6.5 Å from methotrexate. All waters except HOH A 403 and 405 have been removed. These two waters have been kept because they are known to be highly conserved. Although they form hydrogen bonds to the pyridine unit, they are not necessary to determine the correct orientation of the pyridine core with FLEXX. In this library, the three R-groups are located very close to each other, which makes it a good test case for studying the influence of the build-up order. We performed the docking calculations with three different orders: R3 – R4 – R5, R5 – R4 – R3, and R4 – R3 – R5. While the results for the first two orders are again very similar to the sequential run (correlations of 0.81 and 0.87), the correlation drops to 0.74 in the third case. The plot (see Figure 9) reveals that some molecules cannot be docked correctly in the combinatorial run. The reason might be that, in these cases, the instances of R4 prefer orientations in which they later overlap with R3 or R5. Although this library might be an extreme case because of the relatively small distance between the R-atoms at the core, this example shows that dependencies between R-groups have to be taken into account in combinatorial docking calculations.
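The comparison between sequential and combinatorial runs rests on a correlation coefficient between the two sets of docking scores. The text does not state which coefficient was used; the sketch below assumes the ordinary Pearson coefficient, and the score lists are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(variance_x * variance_y)


# Invented scores in kJ/mol for four molecules; the last one is placed poorly
# in the hypothetical combinatorial run, which lowers the correlation.
sequential_scores = [-30.0, -25.0, -40.0, -35.0]
combinatorial_scores = [-28.0, -26.0, -39.0, -20.0]

r = pearson(sequential_scores, combinatorial_scores)
print(f"correlation between runs: {r:.2f}")
```

A single badly placed molecule already reduces the coefficient noticeably in such a small sample, so an inspection of the scatter plot remains useful alongside the summary number.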

Docking the UGI libraries into thrombin For docking the UGI libraries we used the same 3D structure of thrombin from 1dwd that we used for docking the benzamidine library. An inspection of the library shows that R-group 3 is designed for binding into the S1 pocket of thrombin (see Figure 10). We therefore used our core switching functionality in order to start the recursive combinatorial docking calculation with R-group 3 instead of the core. For this experiment, neither structures nor experimental binding affinities were available to us. Weber et al. [28] presented the best 4-component compound found during their optimization procedure (see Figure 11); the stereoisomer is not specified. Within the results of docking the sublibrary we found the highest ranking stereoisomer (R-configuration) of this molecule at rank 2 with a predicted score of –40 kJ mol⁻¹. This is in good agreement with the experimental value of –38 kJ mol⁻¹. In the sequential docking run, this molecule was found only at rank 38 with a suboptimal placement with score –35 kJ mol⁻¹. In this application, the combinatorial algorithm produced a better result than the sequential algorithm. The reason for this is that, in the combinatorial run, knowledge of the docking problem is entered by specifying the part of the molecule that is docked first. The search space is therefore limited such that the space of low-energy conformations can be searched more exhaustively. In the UGI-20000 library, the molecule was found at rank 1592, which is 15.9% of the database, taking into account that two stereoisomers are contained in the library.

Figure 9. Correlation between sequential and combinatorial docking of the pyridine library into DHFR shown for two different build-up orders.

Figure 10. Compounds used for R-group 3 from the UGI library, taken from [28].

Figure 11. Molecule from the UGI-library with highest binding affinity (–38 kJ mol⁻¹) found in the study of Weber [28].

Computing times The main advantage of using the recursive combinatorial docking algorithm versus the sequential algorithm is a reduction of computing time. Computing times for all sequential and combinatorial runs are given in Table 1; calculations are performed on a 300 MHz SUN UltraSPARC workstation. The UGI libraries have been docked with the combinatorial algorithm only.

Table 1. Computing times

Library      # of mol.  Sequential              Combinatorial
                        Total      Per mol.     Order      Total        Per mol.
Benzamidine  92         5 h 4 min  3:18 min     1          11 min       7.2 s
Pyridine     3864       3 d (a)    6:42 min     4-3-5      4 h 13 min   3.9 s
                                                3-4-5      4 h 18 min   4.0 s
                                                5-4-3      8 h 27 min   7.9 s
UGI-160      160        -          -            3-0-1-2-4  24 min       9.0 s
UGI-20000    20000      -          -            3-0-1-2-4  25 h 42 min  4.6 s

Calculations are performed on a SUN Ultra-30 workstation with a 300 MHz processor.
(a) Performed on a cluster of six SUN Ultra-5 workstations.

Docking the benzamidine library sequentially took about 3:18 min per molecule on average. The combinatorial run took 11 min in total, which is 7.2 s per molecule on average. Therefore, the combinatorial algorithm is faster than the sequential algorithm by a factor of 27.5. For the sequential run of the pyridine library, a parallel version of FLEXX was used running on a SUN workstation cluster with six CPUs. This run took about 3 days real time, which is about 6:42 min per molecule on average (this is a real time measurement and should be taken as a rough worst case estimate since other jobs may have been running on this cluster at the same time). For the combinatorial run, the average computing time per molecule is between 3.9 and 7.9 s depending on the build-up order of the library. Taking into account the different processor speeds and the fact that not all of the CPU time was available, the speed-up lies approximately between 50 and 25. For the UGI libraries, no sequential run was performed. The combinatorial run of the UGI-20000 library took 4.6 s per molecule. The two UGI libraries also demonstrate the fact that the calculation time per molecule drops with the size of the library. For the pyridine library, computing times for different build-up orders are given. In this case, taking the R-group with fewer instances (R5) first lowers the performance. Considering a small example with only two R-groups, say x and y, it can be shown that the average docking time per instance determines the most time-efficient build-up order: Let $n_x$, $n_y$ be the number of instances in R-groups x and y, and $t_x$, $t_y$ be the computing time for placing all instances

of the R-groups. Then the total time for build-up order x-y is $T_{xy} = t_x + n_x t_y$, implying that $T_{xy} < T_{yx}$ if $t_x/(n_x - 1) > t_y/(n_y - 1)$.
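The two-R-group timing argument can be checked numerically. The following is a minimal Python sketch, not part of FLEXX; the function names and the example numbers are hypothetical and serve only to illustrate the expression for the total time and the resulting criterion.

```python
def total_time(t_first, t_second, n_first):
    """Total time when one R-group is built up first.

    t_first:  time for placing all instances of the first R-group
    t_second: time for placing all instances of the second R-group
              on top of a single instance of the first one
    n_first:  number of instances in the first R-group
    """
    return t_first + n_first * t_second


def x_first_is_faster(n_x, t_x, n_y, t_y):
    """Criterion from the text: t_x/(n_x - 1) > t_y/(n_y - 1).

    Assumes both R-groups have more than one instance.
    """
    return t_x / (n_x - 1) > t_y / (n_y - 1)


def faster_order(n_x, t_x, n_y, t_y):
    """Return the cheaper build-up order and its total time."""
    t_xy = total_time(t_x, t_y, n_x)
    t_yx = total_time(t_y, t_x, n_y)
    return ("x-y", t_xy) if t_xy < t_yx else ("y-x", t_yx)


# Hypothetical example: 10 instances of x, 50 instances of y,
# 100 s to place all instances of either group.
order, seconds = faster_order(10, 100.0, 50, 100.0)
print(order, seconds)
```

In this toy case both ways of comparing the orders agree: the explicit totals and the closed-form criterion select the same build-up order.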

Conclusions In this article we presented a combinatorial docking algorithm based on the incremental construction method in FLEXX. The idea of the algorithm is to enumerate the library molecules during the incremental construction algorithm, based on a tree data structure that allows previously calculated docking results to be reused efficiently. Because we assume that the structure of the library is given, the main application of this algorithm is in the development of focussed libraries in cases where some information about the protein and putative ligands is already available. We applied the new algorithm to three different libraries. For two libraries, we compared the sequential and the combinatorial docking results, showing that they are in good agreement. Nevertheless, it is also shown that the results depend on the order in which the R-groups are added in the build-up procedure. For the third library, we demonstrated that the algorithm is able to retrieve a known inhibitor from a large virtual library. Because the docking algorithm enumerates the library on the fly, the algorithm is very time- and space-efficient. The calculations for a large library could be done basically in main memory. Compared to a sequential calculation, the combinatorial docking algorithm is 25 to 30 times faster, allowing the docking of a 20000-molecule library on a single CPU in a day. The recursive combinatorial docking algorithm can be applied in cases where one group plays a dominant role in the binding process. This group can be the core or one of the R-groups; it can have several instances and also several different binding modes. If such a group does not exist, the algorithm can still be applied with different build-up orders.

Acknowledgements The authors thank Bernd Kramer (4 Scientific Computing GmbH, Martinsried) for fruitful discussions on this topic and preparing most of the input data used in this article. We also thank our cooperation partners, especially Gerhard Klebe (University of Marburg), Hans Briem and Uta Lessel (Boehringer Ingelheim Pharma KG, Ingelheim) for various helpful comments during the method development. This work is part of the Relimo project, funded by the bmb+f (Bundesministerium für Bildung und Forschung) and the participating industrial partners Boehringer Ingelheim Pharma KG and Merck KGaA, Darmstadt under grant 03 11 620.

References
1. Gallop, M.A., Barrett, R.W., Dower, W.J., Fodor, P.A. and Gordon, E.M., J. Med. Chem., 37 (1994) 1233.
2. Gordon, E.M., Barrett, R.W., Dower, W.J., Fodor, P.A. and Gallop, M.A., J. Med. Chem., 37 (1994) 1386.
3. Walters, W.P., Stahl, M.T. and Murcko, M.A., Drug Disc. Today, 3 (1998) 160.
4. Kubinyi, H., Curr. Opin. Drug Discov. Development, 1 (1998) 16.
5. Kuntz, I.D., Science, 257 (1992) 1078.
6. Blaney, J.M. and Dixon, J.S., Perspect. Drug Discov. Design, 1 (1993) 301.
7. Lewis, R.A. and Meng, E.C., In Vinter, J.G. and Gardner, M. (Eds.), Molecular Modelling and Drug Design, CRC Press, Boca Raton, FL, 1994.
8. Guida, W.C., Curr. Opin. Struct. Biol., 4 (1994) 777.
9. Colman, P.M., Curr. Opin. Struct. Biol., 4 (1994) 868.
10. Rosenfeld, R., Vajda, S. and DeLisi, C., Annu. Rev. Biophys. and Biomol. Struct., 24 (1995) 677.
11. Böhm, H.-J., Curr. Opin. Biotechnol., 7 (1996) 433.
12. Lengauer, T. and Rarey, M., Curr. Opin. Struct. Biol., 6 (1996) 402.
13. Rarey, M., Kramer, B., Bernd, C. and Lengauer, T., In Hunter, L. and Klein, T. (Eds.), Biocomputing: Proceedings of the 1996 Pacific Symposium (electronic version at http://www.cgl.ucsf.edu/psb/psb96/proceedings/eproceedings.html). World Scientific Publishing Co, Singapore, 1996.
14. Makino, S. and Kuntz, I.D., J. Comput. Chem., 19 (1998) 1834.
15. Murray, C.W., Clark, D.E., Auton, T.R., Firth, M.A., Li, J., Sykes, R.A., Waszkowycz, B., Westhead, D.R. and Young, S.C., J. Comput.-Aided Mol. Design, 11 (1997) 193.
16. Sun, Y., Ewing, T.J.A., Skillman, A.G. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 12 (1999) 597.
17. Kick, E.K., Roe, D.C., Skillman, A.G., Guangcheng, L., Ewing, T.J.A., Sun, Y., Kuntz, I.D. and Ellman, J.A., Chem. Biol., 4 (1997) 297.
18. Roe, D.C. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 9 (1995) 269.
19. Böhm, H.-J., Banner, D.W. and Weber, L., J. Comput.-Aided Mol. Design, 13 (1999) 51.
20. Caflisch, A., J. Comput.-Aided Mol. Design, 10 (1996) 372.
21. Makino, S., Ewing, T.J.A. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 13 (1999) 513.
22. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
23. Rarey, M., Kramer, B. and Lengauer, T., J. Comput.-Aided Mol. Design, 11 (1997) 369.
24. Kramer, B., Rarey, M. and Lengauer, T., Proteins Struct. Funct. Genet., 37 (1999) 228.
25. Rarey, M., Wefing, S. and Lengauer, T., J. Comput.-Aided Mol. Design, 10 (1996) 41.
26. Böhm, H.-J., Thrombin-Inhibitors, collected experimental data, personal communication.
27. Selassie, C.D., Fang, Z., Li, R., Hansch, C., Debnath, G., Klein, T.E., Langridge, R. and Kaufman, B.T., J. Med. Chem., 32 (1989) 1895.
28. Weber, L., Wallbaum, S., Broger, C. and Gubernator, K., Angew. Chem. Int. Ed. Engl., 34 (1995) 2280.
29. Tripos Associates, Inc., St. Louis, MO, U.S.A., SYBYL Molecular Modeling Software Version 6.x, 1994.
30. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer Jr., E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M., J. Mol. Biol., 112 (1977) 535.
31. Banner, D.W. and Hadvary, P., J. Biol. Chem., 266 (1991) 20085.
32. Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin, R.C. and Kraut, J., J. Biol. Chem., 257 (1982) 13650.

Perspectives in Drug Discovery and Design, 20: 83–98, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Modifications of the scoring function in FlexX for virtual screening applications
MARTIN STAHL
F. Hoffmann-La Roche Ltd., Pharmaceutical Research, CH-4070 Basel, Switzerland (E-mail: [email protected])

Summary. A modification of the hydrogen bond score in the docking program FlexX is presented. Hydrogen bonds formed in inaccessible regions of protein cavities thereby gain larger weight than others formed at the protein surface. The modified scoring function is tested with thrombin as a target. Secondly, a recently published knowledge-based scoring function is compared to the FlexX scoring function in several database ranking experiments.

Key words: docking, hydrogen bonds, scoring, virtual screening

Introduction The goal of ‘virtual screening’ is to select subsets of chemical libraries in such a way that they are enriched with compounds showing a desired affinity towards a given macromolecular target [1,2]. Docking calculations are a means of database prioritization that makes use of the 3D structure of a receptor in a quantitative way [3–8]. The computational expenditure for docking calculations is higher than for 2D similarity and most 3D pharmacophore search methods. Reasonable database sizes are 100–10 000 compounds; depending on the problem specification, between 20 and 1000 compounds can nowadays be docked per CPU-hour. Docking has the advantages that – besides the target 3D structure – no other experimental information is needed and that there is no bias towards finding active compounds of specific structural classes. Since the seminal work by the Kuntz group [9], many docking algorithms have been proposed [10–24], some of which have resulted in commercially available packages such as GOLD [17,18], DOCK [9–11,13] and FlexX [21,22], which are suitable for virtual screening purposes. Flexible docking programs search the translational, rotational and conformational space of a putative ligand in the active site pocket of the receptor (see References 25–28 for recent analyses of search strategies). For each pose (denoting a conformation together with its orientation in space relative to the receptor coordinates), the interactions formed between the receptor and the ligand are evaluated. This leads to an energy value, a score, for each pose. The score is a measure of the free energy of binding. It serves various ranking purposes: (i) ranking of poses generated for one ligand molecule and one receptor structure (structure prediction), (ii) ranking of different ligands relative to the same receptor (database prioritization) and (iii) comparison of binding energies of a ligand in two different receptors (selectivity assessment). Up to now, there exists no scoring function with satisfactory performance in all these ranking problems. Indeed, the accurate and rapid prediction of binding free energies is the one challenging problem of structure-based drug design [29–33]. Many types of fast general scoring functions have been proposed, ranging from molecular mechanics force fields [9,15,25,33–36], to functions empirically fitted to experimental binding energies [37–43] and potentials of mean force [44–50]. While the reproduction of experimentally solved complex structures is an accepted way of testing and comparing docking programs [51,52], surprisingly few virtual screening applications have been published. Compound selection based on docking calculations, for example, has been done for thrombin [53,54], thymidylate synthase [55,56], DHFR enzymes [57] and HIV protease [58,59]. However, only in rare cases [13] are such studies conclusive as to the efficiency of the docking tools in enriching subsets of a database in active compounds. The reason may be that only few consistent sets of measured binding energies are available to the public. Here we describe applications of the docking program FlexX [21,22] to a number of Roche in-house target structures and compound libraries in order to assess the performance of this docking tool in virtual screening applications. A simple modification of the H-bond score in the FlexX empirical scoring function is presented as well as a comparison of the FlexX empirical scoring function [37] with a recently published knowledge-based scoring function [47]. The results presented here give insight into the usefulness of current docking tools and scoring functions in practical virtual screening applications.

Computational methods Preparation of ligand libraries Three sets of Roche compounds were prepared: (i) a set of 3700 Roche compounds with known thrombin Ki values, (ii) a set of 470 diaminopyrimidines with associated IC50 data for S. aureus DHFR, and (iii) a set of 650 COX2 inhibitors. The thrombin and the DHFR data sets uniformly cover 6 orders of magnitude in activity and contain around 15% inactive molecules, while the COX2 dataset covers 4 orders of magnitude. The program Corina [60] was used to generate 3D structures in Sybyl [61] mol2 format. The files generated by Corina were further processed by a C routine that generated a likely protonation state of acidic and basic functional groups. Aliphatic amines, amidines and guanidines were protonated, carboxylic acids were deprotonated, while the protonation state of aromatic nitrogen-containing heterocycles was left as generated by Corina. A set of 5000 randomly selected compounds from the WDI [62] was converted to mol2 format by means of the same procedure.

Docking protocol An X-ray structure of human α-thrombin complexed with NAPAP (PDB [63] code 1dwd) was chosen as thrombin target structure. All protein atoms within a distance of 8 Å of any NAPAP ligand atom were defined as active site atoms. The water molecule 47 in the P1 pocket was retained as an active site atom. In-house X-ray structures of Staphylococcus aureus DHFR [64] and COX2 were prepared in a similar way. All libraries were docked into the thrombin active site by means of the standard scoring scheme for hydrogen bonds. The WDI and thrombin libraries were also docked using a modified version employing accessibility scaling (vide infra). FlexX default settings were used except for ∆Grot, which was set to 0.7 kJ mol⁻¹. Automatic base placement was used for the WDI, thrombin and COX2 libraries (average execution time per molecule: 2 min on an SGI R10K processor). The DHFR library was docked using a fixed placement of the diaminopyrimidine moiety that was taken from the X-ray structure (average execution time 6 s). Only rank 1 solutions were considered for each compound. Since many compounds contained stereocenters with unknown configuration and because a complete enumeration of all stereoisomers could not be afforded, docking was performed with just two enantiomers of an arbitrary stereoisomer generated by Corina. Only the structure and energy with the better rank 1 score were used for database ranking.
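The distance-based definition of the active site used in the protocol can be sketched in a few lines of Python. This is only an illustration under stated assumptions: atoms are represented as plain (x, y, z) tuples in Å, the function name and the toy coordinates are hypothetical, and this is not the way FlexX itself reads or stores structures.

```python
import math


def active_site_atoms(protein_atoms, ligand_atoms, cutoff=8.0):
    """Return the indices of protein atoms lying within `cutoff` Å
    of at least one ligand atom.

    protein_atoms, ligand_atoms: sequences of (x, y, z) coordinates in Å.
    """
    selected = []
    for index, protein_xyz in enumerate(protein_atoms):
        # Keep the atom as soon as any ligand atom is close enough.
        if any(math.dist(protein_xyz, ligand_xyz) <= cutoff
               for ligand_xyz in ligand_atoms):
            selected.append(index)
    return selected


# Hypothetical toy coordinates: three protein atoms, one ligand atom.
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]
print(active_site_atoms(protein, ligand))  # atoms 0 and 1 are within 8 Å
```

The brute-force double loop is adequate for a single binding site; for repeated queries on large structures a spatial grid or k-d tree would be the usual choice.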

Hydrogen bond scoring schemes When solvation effects are not explicitly accounted for, empirical scoring functions treat hydrogen bond (or in the case of force fields electrostatic) contributions equally, whether they are formed on the surface of the protein or within a protein pocket. In our modified scheme, individual H-bond contributions are evaluated as a function of the solvent accessibility of the H-bond partner in the binding pocket, simulating the electrostatic shielding experienced by buried hydrogen bonds [65]. The standard hydrogen bond scoring


Figure 1. Schematic representation of the calculation of surface accessibility. Instead of the 18 probe directions on this 2D representation, 45 spherical directions are used in the 3D case. In a first step, the number of directions is counted along which access to a surface point P is unhindered from the outside of the protein. In a second step, the resulting integer values are scaled in an interval between 0 and 1. Points on planar or convex parts of a surface receive accessibility values of 1.

scheme in FlexX is based on the penetration of so-called interaction geometries [21] of the binding partners. If a hydrogen bond exists by this definition, its contribution to the score is calculated as a constant term multiplied by a penalty function describing the amount of deviation from preset ideal angle and distance values. The modified scoring scheme introduces a second scaling term. For each point on the Connolly surface [66] of the binding site, an accessibility value a is assigned, whose calculation is described in detail elsewhere [67] (cf. Figure 1). Average a values are calculated for each surface atom. A sigmoidal function was empirically chosen to scale the hydrogen bond scores according to the accessibility values. Several functions of the general form (1 + exp(p(a – q) – r)), where a is the atomic accessibility value, were tested with varying values of p, q and r, ranging from almost linear to very

steeply ascending scaling functions. Best performance on a series of in-house data sets (data not shown) was found with values of p = 10, q = 0.2 and r = 5. This scaling function virtually removes the contributions of all hydrogen bonds formed at values of a = 0.6 and above and removes about half of the contribution of hydrogen bonds formed at a = 0.3. As a consequence, there is a strong overall reduction of hydrogen bond energy in the total score, which reflects our observation that contributions of the lipophilic contact surface are often underestimated in the FlexX scoring function. The performance of FlexX for structure prediction is not altered when the modified score is used (results not shown). It is obvious that the modified scoring scheme for hydrogen bonds is still a crude approximation. All hydrogen bonds between various functional groups are still treated equally, and the chemical environment of each hydrogen bond is not considered [68]. Attempts by Böhm to improve his regression-based scoring scheme by terms accounting for the environment of hydrogen bonds have failed [41], but it is likely that terms for such secondary interactions cannot be derived by regression techniques.

Calculation of enrichment factors In this study we were not interested in absolute scoring energy values obtained from single docking runs, but focussed exclusively on the ranking of molecules with respect to each other. The quality of the rankings was assessed by enrichment factors. For this purpose, libraries were divided into ‘active’ and ‘inactive’ compounds at an arbitrary pKi threshold. In enrichment experiments employing the WDI library, sets of 100 randomly selected compounds from the thrombin library were added as the only ‘active’ compounds to the WDI library. In all cases, enrichment factors were calculated as a function of the size of library subsets by the formula

ef(subset size) = (fraction active in subset) / (fraction active in library)

On average, random screening should result in enrichment factors of ef = 1; values of ef < 1 are obtained when the subset contains fewer active compounds than would be expected from the library average. Enrichment factors were not scaled, i.e. absolute values obtained depend on the ratio of active and inactive compounds defined in each specific case and should be compared with the maximum achievable ef, which is obtained by dividing the total number of molecules by the number of active molecules. For very large subsets of a library the term ‘enrichment’ naturally loses its meaning.
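The enrichment factor defined above is straightforward to compute from a ranked list. The following minimal Python sketch assumes the library is given as a list of booleans ordered from best to worst score (True marking an ‘active’ compound); the function names and the ten-compound toy ranking are hypothetical.

```python
def enrichment_factor(ranked_is_active, subset_size):
    """ef = (fraction active in top subset) / (fraction active in library).

    ranked_is_active: booleans ordered from best-ranked to worst-ranked.
    subset_size:      number of top-ranked compounds forming the subset.
    """
    n_total = len(ranked_is_active)
    n_active = sum(ranked_is_active)
    if not 0 < subset_size <= n_total:
        raise ValueError("subset_size must be between 1 and the library size")
    if n_active == 0:
        raise ValueError("library contains no active compounds")
    fraction_in_subset = sum(ranked_is_active[:subset_size]) / subset_size
    fraction_in_library = n_active / n_total
    return fraction_in_subset / fraction_in_library


def max_enrichment_factor(n_total, n_active):
    """Upper bound on ef: total number of molecules / number of actives."""
    return n_total / n_active


# Hypothetical ranked library of 10 compounds containing 3 actives.
ranking = [True, True, False, True, False, False, False, False, False, False]
print(enrichment_factor(ranking, 2))    # top 2 are both active
print(enrichment_factor(ranking, 10))   # whole library: ef = 1
print(max_enrichment_factor(10, 3))
```

Evaluating `enrichment_factor` for every subset size from 1 to the library size gives the kind of enrichment curve that is plotted against the size of the subset.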


Figure 2. a) Plot of pKi vs. number of rotatable bonds for the thrombin library. Enrichment plots for two subsets of the thrombin library containing only compounds with 3–8 rotatable bonds (b) and only compounds with more than 8 rotatable bonds (c) are shown.

Standard vs. accessibility-scaled H-bond score Ranking a library of thrombin inhibitors The thrombin library used in this study has two properties that should be kept in mind: Firstly, 75% of the compounds contain an amidinium or guanidinium functional group and another 7% contain a different basic functional group designed to fit into the S1 pocket. Therefore, any successful ranking of binding affinity must rely on secondary hydrogen bonds and hydrophobic interactions rather than on the presence or absence of a basic group binding to Asp189 in the S1 pocket. Secondly, Figure 2a shows that the library contains a large percentage of compounds with many rotatable bonds and that there is a clear tendency for the large compounds to have higher activity. In order

to minimize the effect of molecule size, we will discuss results separately for two subsets, one containing all molecules with 3–8 rotatable bonds (1627 compounds, 141 <0.1 µM, set A) and a second one with molecules having more than 8 rotatable bonds (1513 compounds, 452 <0.1 µM, set B). The enrichment factors calculated with standard and accessibility-scaled (AS) H-bond scoring for both sets A and B are depicted in Figures 2b and c. An activity threshold of pKi = 7 was applied. Since the percentage of molecules defined to be active in the two sets is different, it should be kept in mind that the maximum achievable enrichment factors are also quite different (11.5 for set A vs. 3.3 for set B). Generally speaking, 30 to 50% of the maximum achievable enrichment rate is reached in top 10% subsets of the ranked databases: The differentiation between high-affinity and low-affinity thrombin inhibitors is moderate, which must be attributed to errors of the scoring function as well as in the ligand poses generated by the docking algorithm. For set B, slightly better performance is found with the standard scoring scheme. In order to distinguish between large active and inactive molecules, hydrogen bonds formed at the periphery of the cavity must obviously be accounted for. For set A, the AS scoring scheme performs significantly better than standard scoring. A more detailed look at the docking results reveals structural reasons for this fact. Many of the compounds defined as ‘inactive’ in set A are either docked outside the active site, where their interactions have less weight when accessibility scaling is used, or form few unfavorable hydrogen bonds within the cavity. This stresses the importance of hydrogen bonds formed in buried pockets of a protein. In this sense, accessibility scaling of hydrogen bonds emphasizes the pharmacophore-like character of specific atoms in protein cavities.
It is clear that set A corresponds more closely than set B to a typical distribution of molecules in a virtual screening experiment, because large molecular weight and high molecular flexibility is usually not a desirable property of compounds in a screening library. The AS H-bond potential therefore seems to be appropriate for virtual screening experiments. In order to test this hypothesis, further enrichment experiments were performed using a 5000 molecule random subset of the WDI database.

Retrieving thrombin inhibitors from a WDI subset library Two random sets of 100 thrombin inhibitors were generated from the thrombin library. The first random set was restricted to compounds with 0.1–0.001 µM binding constants (high-affinity set), the second set only contained 5–10 µM inhibitors (low-affinity set). In both cases, the additional restriction of 3–8 rotatable bonds per thrombin inhibitor was applied in order to avoid the scoring bias for large molecules mentioned above. Docking results for


Figure 3. Illustration of the enrichment achieved with two different sets of 100 thrombin inhibitors in a 5000 molecule random subset of the WDI. (a, b) high-affinity subset, (c, d) low-affinity subset. Dashed lines: standard scoring, solid lines: accessibility-scaled H-bond scoring. The right hand plots show the fraction of active compounds found at each percent level of the ranked library.

the thrombin inhibitor sets were combined with those for the WDI subset and the total library was re-ranked according to FlexX scores. The resulting enrichment curves are plotted in the left column of Figure 3; the percentage of thrombin inhibitors found at each percent level of the total library is plotted in the right hand column of Figure 3 for better comparison. FlexX can clearly distinguish between the compounds from the high-affinity set and the WDI subset (Figures 3a and b). Standard scoring achieves ef values higher than 20 and retrieves all thrombin inhibitors within the top 9% of the total library. AS H-bond scoring performs even better: The maximum enrichment factor of about 50 is almost reached and all thrombin inhibitors are among the top 4% of the ranked library. Finding high-affinity inhibitors of thrombin in a WDI subset, however, is certainly not a realistic virtual screening experiment. When a screening library is searched for inhibitors of a new target, it is more reasonable to expect inhibitors in the low micromolar range at best. This situation is mimicked by the low-affinity data set added to the WDI subset. Figures 3c and d illustrate that picking weakly binding thrombin inhibitors from a library is a far more difficult task for a docking program, even though thrombin has a well-defined set of H-bond donor and acceptor groups that play a role in inhibitor recognition and binding. Standard FlexX scoring retrieves only about 60% of the thrombin medium-affinity inhibitors and 50% of the low-affinity inhibitors within the top 5% of the total library. In the same fraction of the library, AS H-bond scoring finds 20% more active compounds. The WDI subset contained only 143 compounds with amidinium systems (less than 3%). In the combined low-affinity and WDI libraries, 34 of these amidinium compounds were contained in the top 5% when standard scoring was applied. With AS H-bond scoring, this number decreases to 21. This means that the modified scoring scheme does not simply enrich any amidinium systems but is able to distinguish between active and inactive amidinium compounds to some extent. It can be concluded that, at least for this application, AS H-bond scoring is a significant improvement over the standard scoring scheme. AS H-bond scoring can help to focus the inhibitor search for targets where active site cavities contain a number of buried donors and acceptors that must be satisfied. It has meanwhile been successfully applied in a number of Roche projects that fulfill these criteria, e.g. p38 MAP kinase and Staphylococcus aureus gyrase B. It might also be useful for the assembly of motif libraries (small, weakly but specifically binding molecules) for SAR by NMR experiments [69–71].
Modifications of scoring functions like the one presented here should be employed with caution, since there is an inherent danger of a loss of generality. For example, in the case of thrombin, AS H-bond scaling focusses on charged S1-binding groups and could easily overlook inhibitor classes with neutral S1 groups such as those from Merck or Eli Lilly (for reviews see References 72 and 73). As long as there is no truly generally applicable scoring function, however, it is and will be fair practice to tailor scoring functions to individual types of targets [58,74–77].
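The retrieval experiment described in this section (combine the docking scores of a set of known inhibitors with those of a background library, re-rank, and count the inhibitors found in the top part of the list) can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes that more negative scores rank first, and the function name and all score values are hypothetical.

```python
def fraction_retrieved(active_scores, background_scores, top_fraction):
    """Fraction of the active compounds found within the top
    `top_fraction` (0..1) of the combined, re-ranked library.

    Assumption: lower (more negative) scores are better.
    """
    combined = [(score, True) for score in active_scores]
    combined += [(score, False) for score in background_scores]
    combined.sort(key=lambda pair: pair[0])  # best score first
    n_top = max(1, round(top_fraction * len(combined)))
    found = sum(1 for _, is_active in combined[:n_top] if is_active)
    return found / len(active_scores)


# Hypothetical scores in kJ/mol: 3 actives mixed with 7 background compounds.
actives = [-40.0, -35.0, -20.0]
background = [-38.0, -30.0, -25.0, -22.0, -15.0, -10.0, -5.0]
print(fraction_retrieved(actives, background, 0.3))  # 2 of 3 actives in top 3
print(fraction_retrieved(actives, background, 0.7))  # all actives in top 7
```

Plotting this fraction for every percent level of the ranked library gives a cumulative retrieval curve of the kind shown in the right hand column of Figure 3.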

FlexX scoring function versus a potential of mean force As has been demonstrated above, the performance of FlexX in library prioritization is often still moderate, especially when activity differences between highly potent and less potent inhibitors have to be estimated. Alternative scoring schemes are therefore of great interest. Recently, a new knowledge based scoring function has been proposed by Muegge and Martin [47,78].

We were interested in comparing its performance to that of the FlexX scoring function, because it relies on a completely different approach and has a different functional form. We have re-implemented this scoring function as a stand-alone C program. FlexX rank 1 solutions for the thrombin, DHFR diaminopyrimidine and COX2 libraries were used as input structures for scoring in order to be able to compare scores of identical complex structures. Structures for which no docking solutions could be found are omitted in the enrichment calculations. We are aware of the fact that this procedure contains a hidden bias, because the generation of the ligand poses was driven by one of the two scoring functions that are compared. In our hands, however, the Muegge scoring function has proven to be less sensitive to small coordinate changes than the FlexX scoring function. We have repeated the re-scoring experiment by using a Powell optimizer [79] in conjunction with the Muegge scoring function and found very minor changes in enrichment factors. Enrichment plots for the complete thrombin database are shown in Figure 4a. Both scoring functions give very similar curves, reaching almost 50% of the maximum achievable enrichment factor of 6.1 at 15% of the database. Depending on the value of ∆Grot used, enrichment calculated with the FlexX scoring function can vary significantly, while there is no such flexibility penalty term in the Muegge function. Figure 4b shows enrichment curves for the COX2 database (activity threshold at pKi = 7, maximum ef = 6). In this case, FlexX fails completely for the following reasons: The COX2 binding site is a very confined and inaccessible cavity. Inhibitor activity is mainly due to lipophilic interactions within the tight cavity. Small changes in cavity size can significantly alter the size of the contact surface and make it difficult for FlexX to generate accurate ligand placements.
In addition, some inhibitors form hydrogen bonds with Arg and Glu residues at one end of the pocket. These residues can move substantially to accommodate different types of inhibitors. Since hydrogen bond scores depend strongly on angle and distance deviations from an ideal value, the total FlexX score is a meaningless number in this docking experiment. In spite of the large uncertainties in ligand placement and protein conformation, moderate enrichment is obtained with the Muegge score. This result supports our experience that the Muegge function is a rather robust scoring function. In this experiment, the neglect of directionality in potentials of mean force is certainly an advantage.

Figure 4. Enrichment plots calculated with the FlexX and Muegge scoring functions for the thrombin library (a) and the COX2 library (b).

In the third example, there is less doubt about the correct orientation of the ligand than in the two cases discussed above. All compounds of the DHFR library contain a diaminopyrimidine moiety, whose orientation could be expected to remain constant and was taken from an X-ray structure. The remainder of most ligands has little conformational freedom and extends along a deep active site cleft that stretches from the position of the NADPH cofactor to Arg57. All in-house X-ray structures of DHFR complexes could be well reproduced as rank 1 solutions of FlexX employing standard settings. As can be seen from Figure 5d, high enrichment is obtained. In this case, the activity threshold was chosen to be pKi = 8.5 (maximum ef = 5.8), since high affinity can be achieved quite easily with diaminopyrimidine ligands. There is a decent correlation (r² = 0.61) between calculated and experimental binding affinities for both the Muegge and the FlexX score (Figures 5a and b). There is also good agreement between the FlexX and Muegge scores, supporting the fundamental validity of both approaches (Figure 5c). The correlation between FlexX and Muegge score (r² = 0.67) is better than that between the experimental data and each score. Two factors may be responsible: firstly, systematic errors in the determination of IC50 values can affect the correlation; secondly, both scoring functions assess only static interactions between protein and ligand and omit many solvation and conformational aspects of ligand binding. It is certainly encouraging to see that protein-ligand interaction terms calculated by two very different methods correlate well. On the other hand, it becomes clear that accurate affinity predictions cannot be based on such interaction terms alone.

Figure 5. Plots of calculated FlexX score (a) and Muegge score (b) versus experimentally determined log(IC50) values for diaminopyrimidines binding to S. aureus DHFR. (c) Correlation between Muegge score and FlexX score. (d) Enrichment plots obtained with FlexX and Muegge scores.
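
The enrichment factors quoted in this section can be illustrated with a short calculation. The following is a minimal sketch, not the procedure actually used for Figures 4 and 5: it assumes the usual definition (hit rate in the top-ranked fraction divided by the hit rate in the whole database) and assumes that lower scores are better. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float) -> float:
    """Enrichment factor within the top `fraction` of a ranked database.

    Assumption: lower (more negative) scores rank first.
    """
    if len(scores) != len(is_active) or len(scores) == 0:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_actives == 0:
        raise ValueError("the database contains no actives")
    n_top = max(1, round(fraction * n_total))
    # Rank compounds by score, best (lowest) first.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(1 for _, flag in ranked[:n_top] if flag)
    return (hits_top / n_top) / (n_actives / n_total)


def max_enrichment_factor(n_actives: int, n_total: int, fraction: float) -> float:
    """Best achievable enrichment: every top-ranked slot is filled with an active."""
    n_top = max(1, round(fraction * n_total))
    return (min(n_actives, n_top) / n_top) / (n_actives / n_total)


# Toy example (invented numbers): 10 compounds, 2 actives ranked first.
toy_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
toy_actives = [True, True] + [False] * 8
ef_top20 = enrichment_factor(toy_scores, toy_actives, 0.2)
ef_top50 = enrichment_factor(toy_scores, toy_actives, 0.5)
```

Under this definition the maximum enrichment factor is the inverse of the fraction of actives as long as the selected subset is no larger than the number of actives, which is consistent with a stated maximum of 6 when roughly one compound in six exceeds the activity threshold.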


Conclusions

When docking calculations are used as a means of database prioritization, a substantial number of approximations have to be accepted: the conformational and orientational space of the ligand is not searched exhaustively, and, above all, the scoring functions employed can provide only crude approximations to binding free energies. Further restrictions are the neglect of structural water and protein flexibility. Within FlexX, water molecules can be placed together with the incremental construction of the ligand [80], but contributions to the score by each water molecule are difficult to estimate. Several methods have been proposed to account for movement of protein side chains [81–84], but these methods are either ineffective or too time consuming to be applied in virtual screening. A promising approach is docking into ensembles of protein structures, representing important conformational states of the binding site [85].

In spite of a long list of approximations, satisfactory enrichment rates can be achieved by means of database docking. This has been shown by retrospective analyses of several Roche in-house compound libraries. The excellent results obtained for the DHFR library point to the importance of accurate ligand poses. Enrichment rates achieved in this study are somewhat higher than those obtained in a retrospective analysis of cathepsin D inhibitors [13]. It has also been demonstrated that it is of great importance to test virtual screening tools under realistic conditions, i.e. not by trying to retrieve a set of selected high-affinity ligands within otherwise randomly assembled test libraries.

Since the ideal general-purpose scoring function has yet to be found, scoring functions must often be tailored to specific problems in order to yield acceptable results. A ‘tailored’ version of the FlexX scoring function has been presented and could be shown to give good enrichment factors in library screening with thrombin as a target. On the other hand, the Vertex group has reported that ‘consensus scoring’, i.e. the selection of those potential inhibitors that score well with several general-purpose scoring functions, has proven to be successful in many applications [86]. This strategy can help to reduce the number of false positives in virtual screening. As more alternative scoring schemes become available, this strategy might become even more interesting. Potentials of mean force like the Muegge scoring function and other recently published alternatives [48,50] should be given special attention.
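
The idea of consensus scoring can be sketched in a few lines. This is only an illustration under stated assumptions and not the protocol of Reference 86: a compound is kept if it falls within the top-ranked fraction of at least `min_votes` scoring functions, and lower scores are assumed to be better. The function name, the labels `flexx` and `pmf`, and all score values are hypothetical.

```python
from typing import Dict, List, Optional


def consensus_hits(score_tables: Dict[str, Dict[str, float]],
                   top_fraction: float,
                   min_votes: Optional[int] = None) -> List[str]:
    """Compounds ranked in the top fraction by at least `min_votes` functions.

    Assumption: lower scores are better for every scoring function.
    By default a compound must score well with all functions.
    """
    if min_votes is None:
        min_votes = len(score_tables)
    votes: Dict[str, int] = {}
    for scores in score_tables.values():
        n_top = max(1, round(top_fraction * len(scores)))
        # Compound names sorted by their score, best first.
        best = sorted(scores, key=lambda name: scores[name])[:n_top]
        for name in best:
            votes[name] = votes.get(name, 0) + 1
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Invented scores for four hypothetical compounds.
tables = {
    "flexx": {"a": -30.0, "b": -25.0, "c": -10.0, "d": -5.0},
    "pmf": {"a": -80.0, "c": -70.0, "b": -20.0, "d": -10.0},
}
strict = consensus_hits(tables, top_fraction=0.5)
relaxed = consensus_hits(tables, top_fraction=0.5, min_votes=1)
```

Requiring agreement between all functions shrinks the selection, which is how the strategy trades recall for a lower number of false positives.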


Acknowledgements

I would like to thank Matthias Rarey and Thomas Lengauer for a library version of FlexX and valuable help with the program, as well as Ingo Muegge and Yvonne Martin for fruitful discussions on their scoring function. Furthermore, I would like to thank my colleagues at Roche for a stimulating scientific environment, especially Hans-Joachim Böhm, Daniel Bur, Paul Gerber and Gisbert Schneider.

References

1. Van Drie, J.H. and Lajiness, M.S., Drug Discov. Today, 3 (1998) 274.
2. Walters, W.P., Stahl, M.T. and Murcko, M.A., Drug Discov. Today, 3 (1998) 160.
3. Colman, P.M., Curr. Opin. Struct. Biol., 4 (1994) 868.
4. Kuntz, I.D., Meng, E.C. and Shoichet, B.K., Acc. Chem. Res., 27 (1994) 117.
5. Jones, G. and Willett, P., Curr. Opin. Biotechnol., 6 (1995) 652.
6. Rosenfeld, R., Vajda, S. and DeLisi, C., Annu. Rev. Biophys. Biomol. Struct., 24 (1995) 677.
7. Gschwend, D.A., Good, A.C. and Kuntz, I.D., J. Mol. Recogn., 9 (1996) 175.
8. Lengauer, T. and Rarey, M., Curr. Opin. Struct. Biol., 6 (1996) 402.
9. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T., J. Mol. Biol., 161 (1982) 269.
10. Shoichet, B.K., Bodian, D.L. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 380.
11. Meng, E.C., Shoichet, B.K. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 505.
12. Makino, S. and Kuntz, I.D., J. Comput. Chem., 18 (1997) 1812.
13. Sun, Y., Ewing, T.J.A., Skillman, A.G. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 12 (1998) 579.
14. Welch, W., Ruppert, J. and Jain, A.N., Chem. Biol., 3 (1996) 449.
15. Goodsell, D.S. and Olson, A.J., Proteins, 8 (1990) 195.
16. Gehlhaar, D.K., Verkhivker, G.M., Rejto, P.A., Sherman, C.J., Fogel, D.B., Fogel, L.J. and Freer, S.T., Chem. Biol., 2 (1995) 317.
17. Jones, G., Willett, P. and Glen, R.C., J. Mol. Biol., 245 (1995) 43.
18. Jones, G., Willett, P., Glen, R.C., Leach, A.R. and Taylor, R., J. Mol. Biol., 267 (1997) 721.
19. Sandak, B., Nussinov, R. and Wolfson, H.J., CABIOS, 11 (1995) 87.
20. Fischer, D., Lin, S.L., Wolfson, H.J. and Nussinov, R., J. Mol. Biol., 248 (1995) 459.
21. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
22. Rarey, M., Kramer, B. and Lengauer, T., J. Comput.-Aided Mol. Design, 11 (1997) 369.
23. Miller, M.D., Kearsley, S.K., Underwood, D.J. and Sheridan, R.P., J. Comput.-Aided Mol. Design, 8 (1994) 153.
24. Trosset, J.-Y. and Scheraga, H.A., J. Comput. Chem., 20 (1999) 412.
25. Ewing, T.J.A. and Kuntz, I.D., J. Comput. Chem., 18 (1997) 1175.
26. Westhead, D.R., Clark, D.E. and Murray, C.W., J. Comput.-Aided Mol. Design, 11 (1997) 209.
27. Baxter, C.A., Murray, C.W., Clark, D.E., Westhead, D.R. and Eldridge, M.D., Proteins, 33 (1998) 367.
28. Vieth, M., Hirst, J.D., Dominy, B.N., Daigler, H. and Brooks III, C.L., J. Comput. Chem., 14 (1998) 1623.
29. Ajay and Murcko, M.A., J. Med. Chem., 38 (1995) 4953.
30. Knegtel, R.M.A. and Grootenhuis, P.D.J., In Kubinyi, H., Folkers, G. and Martin, Y.C. (Eds.), 3D QSAR in Drug Design: Ligand Protein Interactions and Molecular Similarity, Kluwer/ESCOM, Dordrecht, 1998, pp. 99–114.
31. Clark, D.E., Murray, C.W. and Li, J., In Lipkowitz, K.B. and Boyd, D.B. (Eds.), Reviews in Computational Chemistry, Wiley-VCH, New York, NY, 1997, pp. 67–126.
32. Oprea, T.I. and Marshall, G.R., In Kubinyi, H., Folkers, G. and Martin, Y.C. (Eds.), 3D QSAR in Drug Design: Ligand Protein Interactions and Molecular Similarity, Kluwer/ESCOM, Dordrecht, 1998, pp. 3–17.
33. Vieth, M., Hirst, J.D., Kolinski, A. and Brooks, C.L., J. Comput. Chem., 19 (1998) 1612.
34. Luty, B.A., Wasserman, Z.R., Stouten, P.F.W., Hodge, C.N., Zacharias, M. and McCammon, J.A., J. Comput. Chem., 16 (1995) 454.
35. Viswanadhan, V.N., Reddy, M.R., Wlodawer, A., Varney, M.D. and Weinstein, J.N., J. Med. Chem., 39 (1996) 705.
36. Shoichet, B.K., Leach, A.R. and Kuntz, I.D., Proteins, 34 (1999) 4.
37. Böhm, H.-J., J. Comput.-Aided Mol. Design, 8 (1994) 243.
38. Meng, E.C., Kuntz, I.D., Abraham, D.J. and Kellogg, G.E., J. Comput.-Aided Mol. Design, 8 (1994) 299.
39. Jain, A.N., J. Comput.-Aided Mol. Design, 10 (1996) 427.
40. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V. and Mee, R.P., J. Comput.-Aided Mol. Design, 11 (1997) 425.
41. Böhm, H.-J., J. Comput.-Aided Mol. Design, 12 (1998) 309.
42. Wang, R., Liu, L., Lai, L. and Tang, Y., J. Mol. Model., 4 (1998) 379.
43. Murray, C.W., Auton, T.R. and Eldridge, M.D., J. Comput.-Aided Mol. Design, 12 (1999) 503.
44. Wallquist, A., Jernigan, R.L. and Covell, D.G., Protein Sci., 4 (1995) 1181.
45. Wallquist, A. and Covell, D.G., Proteins, 25 (1996) 403.
46. DeWitte, R.S. and Shakhnovich, E.I., J. Am. Chem. Soc., 118 (1996) 11733.
47. Muegge, I. and Martin, Y.C., J. Med. Chem., 42 (1999) 791.
48. Mitchell, J.B.O., Laskowski, R.A., Alex, A. and Thornton, J.M., J. Comput. Chem., 20 (1999) 1165.
49. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M.J. and Thornton, J.M., J. Comput. Chem., 20 (1999) 1177.
50. Gohlke, H., Hendlich, M. and Klebe, G., J. Mol. Biol., 295 (2000) 337.
51. Dixon, J.S., Proteins, Suppl. 1 (1997) 198.
52. Knegtel, R.M.A., Bayada, D.M., Engh, R.A., von der Saal, W., van Geerestein, V.J. and Grootenhuis, P.D.J., J. Comput.-Aided Mol. Design, 13 (1999) 167.
53. Massova, I., Martin, P., Bulychev, A., Kocz, R., Doyle, M., Edwards, B.F.P. and Mobashery, S., Bioorg. Med. Chem. Lett., 8 (1998) 2463.
54. Burkhard, P., Taylor, P. and Walkinshaw, M.D., J. Mol. Biol., 277 (1998) 449.
55. Shoichet, B.K., Stroud, R.M., Santi, D.V., Kuntz, I.D. and Perry, K.M., Science, 259 (1993) 1445.
56. Toyoda, T., Brobey, R.K.B., Sano, G.-I., Horii, T., Tomioka, N. and Itai, A., Biochem. Biophys. Res. Commun., 235 (1997) 515.
57. Gschwend, D.A., Sirawaraporn, W., Santi, D.V. and Kuntz, I.D., Proteins, 29 (1997) 59.
58. DesJarlais, R.L. and Dixon, J.S., J. Comput.-Aided Mol. Design, 8 (1994) 231.
59. Friedman, S.H., Ganapathi, P.S., Rubin, Y. and Kenyon, G.L., J. Med. Chem., 41 (1998) 2424.
60. Sadowski, J., Schwab, C.H. and Gasteiger, J., Corina, v. 2.1; Molecular Networks GmbH Computerchemie, Erlangen, 1998.
61. Sybyl molecular modeling software, v. 6.2; Tripos Associates, St. Louis, MO, 1994.
62. WDI (World Drug Index), version 2/96; Derwent Information, 1996.
63. Bernstein, F.C., Koetzle, T.E., Williams, G.J.B., Meyer, J.E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M., J. Mol. Biol., 112 (1977) 535.
64. Dale, G., Broger, C., D’Arcy, A., Hartmann, P., DeHoogt, R., Jolidon, S., Kompis, I., Labhardt, A.M., Langen, H., Locher, H., Page, M.G.P., Stüber, D., Then, R.L., Wipf, B. and Oefner, C., J. Mol. Biol., 226 (1997) 23.
65. A related hydrogen bond scoring scheme has been implemented at Agouron as part of the ‘HTS’ scoring function, Rose, P., UCSF Computer-Assisted Molecular Design Course, San Francisco, CA, January 16–18, 1997.
66. Connolly, M.L., Science, 221 (1983) 709.
67. Stahl, M., Taroni-Osterroth, C. and Schneider, G., J. Comput. Chem., 20 (1998) 336.
68. Tame, J.R.H., J. Comput.-Aided Mol. Design, 13 (1999) 99.
69. Shuker, S.B., Hajduk, P.J., Meadows, R.P. and Fesik, S.W., Science, 274 (1996) 1531.
70. Olejniczak, E.T., Hajduk, P.J., Marcotte, P.A., Nettesheim, D.G., Meadows, R.P., Edalji, R., Holzman, T.F. and Fesik, S.W., J. Am. Chem. Soc., 119 (1997) 5828.
71. Hajduk, P.J., Sheppard, G., Nettesheim, D.G., Olejniczak, E.T., Shuker, S.B., Meadows, R.P., Steinman, D.H., Carrera Jr., G.M., Marcotte, P.A., Severin, J., Walter, K., Smith, H., Gubbins, E., Simmer, R., Holzman, T.F., Morgan, D.W., Davidsen, S.K., Summers, J.B. and Fesik, S.W., J. Am. Chem. Soc., 119 (1997) 5818.
72. Wiley, M.R. and Fisher, M.J., Exp. Opin. Ther. Patents, 7 (1997) 1265.
73. Sanderson, P.E.J. and Nayor-Olsen, A.M., Curr. Med. Chem., 5 (1998) 289.
74. Grootenhuis, P.D.J. and van Galen, P.J.M., Acta Crystallogr., D51 (1995) 560.
75. Holloway, M.K., Wai, J.M., Halgren, T.A., Fitzgerald, P.M.D., Vacca, J.P., Dorsey, B.D., Levin, R.B., Thompson, W.J., Chen, L.J., deSolms, S.J., Gaffin, N., Lyle, T.A., Sanders, W.A., Tucker, T.J., Wiggins, M., Wiscount, C.M., Woltersdorf, O.W., Young, S.D., Darke, P.L. and Zugay, J.A., J. Med. Chem., 38 (1995) 305.
76. Verkhivker, G., Appelt, K., Freer, S.T. and Villafranca, J.E., Protein Eng., 8 (1995) 677.
77. Horvath, D., J. Med. Chem., 40 (1997) 2412.
78. Muegge, I., Martin, Y.C., Hajduk, P.J. and Fesik, S.W., J. Med. Chem., 42 (1999) 2498.
79. Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., Numerical Recipes in C, Cambridge University Press, Cambridge, 1992, pp. 412–420.
80. Rarey, M., Kramer, B. and Lengauer, T., Proteins, 34 (1999) 17.
81. Leach, A.R., J. Mol. Biol., 235 (1994) 345.
82. Schnecke, V., Swanson, C.A., Getzoff, E.D., Tainer, J.A. and Kuhn, L.A., Proteins, 33 (1998) 74.
83. Apostolakis, J., Plückthun, A. and Caflisch, A., J. Comput. Chem., 19 (1998) 21.
84. Keserü, G.M. and Kolossváry, I., Molecular Mechanics and Conformational Analysis in Drug Design, Blackwell Science, Oxford, 1999.
85. Knegtel, R.M.A., Kuntz, I.D. and Oshiro, C.M., J. Mol. Biol., 266 (1997) 424.
86. Charifson, P.S., Corkery, J.J., Murcko, M.A. and Walters, W.P., J. Med. Chem., 42 (1999) 5100.

Perspectives in Drug Discovery and Design, 20: 99–114, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

A knowledge-based scoring function for protein-ligand interactions: Probing the reference state

INGO MUEGGE
Bayer Research Center, 400 Morgan Lane, West Haven, CT 06516, U.S.A. (E-mail: [email protected])

Summary. Knowledge-based scoring functions have recently emerged as an alternative and very promising way of ranking protein-ligand complexes with known 3D structure according to their binding affinities. These simplified potential-based approaches use the structural information stored in databases of protein-ligand complexes to derive atom pair interaction potentials, also known as potentials of mean force (PMF). The derived PMF depend on the definition of a suitable reference state. The reference states vary among suggested knowledge-based scoring functions. Therefore, we attempt here to shed some light on the influence of different reference state definitions on the predictive power of a knowledge-based scoring function that has been introduced by us very recently [J. Med. Chem., 42 (1999) 791]. It is shown that a reference state that implicitly and more comprehensively accounts for protein and ligand solvation gives the most consistent scoring results for four test sets of diverse protein-ligand complexes taken from the Brookhaven Protein Data Bank. It is also shown that a reference sphere radius of at least 7–8 Å is needed to effectively capture solvation effects that are treated implicitly in the scoring function.

Key words: Helmholtz free energy, PMF scoring, protein-ligand binding, reference state

Introduction

Docking/scoring approaches are now widely used in virtual screening experiments. Large databases can be searched for suitable lead compounds that are active against a macromolecular target if the 3D structure of the target (usually an enzyme or receptor) is known [1–4]. While the fast sampling of possible binding modes can be accomplished very efficiently by various docking programs [5–12], the identification of the correct binding mode and the ranking of different ligands according to their binding affinities to the protein target remain the Achilles’ heel of docking/scoring approaches [13]. The fast ranking of binding modes of putative protein-ligand complexes is accomplished by using so-called scoring functions [14,15].

There are three main types of scoring functions that are implemented in docking/scoring programs today. Force field scoring functions have been used for more than a decade. They rely on calculated non-bonded interaction terms between protein and ligand atoms based on standard force fields such as AMBER [16,17]. Additional solvation or entropy terms are sometimes considered [18], and chemical as well as contact scores are used [8]. Recent improvements include the introduction of a generalized Born treatment for long range electrostatic interactions [19]. Empirical scoring functions, introduced a few years ago [20–23], use multivariate regression methods to fit a set of physically motivated parameters like hydrogen bonding energy, lipophilicity, ion pair interactions, entropic contributions, and solvation contributions for a training set of protein-ligand complexes of known 3D structure to measured binding constants. Several scoring functions have been introduced very recently that are partially or completely knowledge-based [24–29]. They use the sum of potentials of mean force (PMF) between protein and ligand atoms derived from the Brookhaven Protein Data Bank [30] (PDB) as a measure for protein-ligand binding affinity.

Knowledge-based scoring functions have been successfully used in docking studies of different protein targets and have shown some improvement over empirical and force field-based scoring functions in predicting correct binding modes and in ranking putative protein-ligand complexes [4,24–26,31,32]. Although there is no direct theoretical rationale for linking knowledge-based potentials, as used in scoring functions, to binding free energies of protein-ligand complexes [33], the results of these docking studies are very encouraging [4,31,32]. One of the most important aspects of knowledge-based scoring functions is the introduction of an appropriate reference state to derive meaningful PMF. Since there is no unique concept of how this reference state should be designed, we attempt in this perspective to compare three possible reference states, as well as the effect of different radii of the reference sphere on the predictive power of a knowledge-based scoring function. Interpretations of these reference states, as well as their effect on ranking diverse protein-ligand complexes taken from the PDB according to their binding affinities, are discussed based on our recently introduced PMF scoring function [27].

Outline of the knowledge-based scoring function

The idea of deriving knowledge-based potentials for scoring protein-ligand complexes has been inspired by the success of similar potentials in folding and protein structure evaluation [34–36]. Protein-ligand atom pair potentials can be calculated from structural data alone under the assumption that an observed crystallographic complex represents the optimum placement of the ligand atoms relative to the protein atoms constrained by covalent bonds. Considering hundreds of complexes in the PDB, there are millions of observed distances between ligand and protein atoms. Statistical potentials can be derived between protein and ligand atom types that are similar but not strictly equal to potentials of mean force [33,37]. Nevertheless, because the term is familiar, we call them here potentials of mean force (PMF). The PMF score is calculated as the sum over all protein-ligand atom pair interaction free energies $A_{ij}(r)$ as a function of the atom pair distance $r$ by

$$\mathrm{PMF\_score} = \sum_{kl} A_{ij}(r) \qquad (1)$$

where kl is a ligand-protein atom pair of type ij. The potential of mean force for an atom pair of type ij can be written as

$$A_{ij}(r) = -k_B T \ln\left[ f^{j}_{\mathrm{vol\_corr}}(r)\, \frac{\rho^{ij}(r)}{\rho^{ij}_{\mathrm{bulk}}} \right] \qquad (2)$$

where $k_B$ is the Boltzmann constant, $T$ is the absolute temperature, and $f^{j}_{\mathrm{vol\_corr}}(r)$ is a ligand volume correction factor. $\rho^{ij}(r)$ designates the number density of atom pairs of type ij at a certain distance r. It is calculated by

(3)

where each distance is represented by the delta function δij(r – rij). The sum is taken over all proteins p in the database and over all ligand-protein atom pairs kl. $\rho^{ij}_{\mathrm{bulk}}$ is the number density of a ligand-protein atom pair of type ij in a reference sphere with radius R that is defined as

(4)

A detailed outline of the scoring function, including the derivation of the ligand volume correction factor, is provided elsewhere [27]. The PMF are derived from a set of 697 protein-ligand complexes taken from Reference 27. A bin size of 0.2 Å is applied. Sixteen protein atom types and 34 ligand atom types are defined and lead to 544 PMF, of which 282 are statistically significant [27]. Protein-ligand contacts that have fewer than 1000 occurrences in the database are considered statistically insignificant and are ignored. PMF derived for hydrogen atom types are also ignored throughout this work, because many ligands and proteins used to derive the potentials do not have all hydrogen atoms present in the database. Test runs with smoothed potentials, which are generated by using Savitzky-Golay filters, do not improve the scoring. Therefore, the derived potentials are taken at face value. We show below that the predictive power of the potentials is, indeed, very sensitive to small changes.
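
The way a tabulated PMF is turned into a score can be sketched as follows. This is a minimal illustration, not the implementation of Reference 27: it assumes that each potential is stored as a list of values on 0.2 Å distance bins, that pair types without a table (for example statistically insignificant ones) are simply skipped, and that energies are expressed in kcal/mol at 298.15 K. The pair-type labels, the cut-off dictionary and all numerical values are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

BIN_WIDTH = 0.2        # Å, bin size stated in the text
K_B = 0.0019872041     # kcal/(mol K); the unit choice is an assumption


def pmf_from_densities(rho_r: float, rho_bulk: float,
                       vol_corr: float = 1.0,
                       temperature: float = 298.15) -> float:
    """Free energy -k_B T ln(f * rho(r) / rho_bulk) for a single distance bin."""
    return -K_B * temperature * math.log(vol_corr * rho_r / rho_bulk)


def pmf_score(pairs: Iterable[Tuple[str, float]],
              potentials: Dict[str, List[float]],
              cutoffs: Dict[str, float]) -> float:
    """Sum tabulated potentials over all (pair_type, distance) contacts."""
    total = 0.0
    for pair_type, distance in pairs:
        table = potentials.get(pair_type)
        if table is None:
            continue  # no potential available for this pair type
        if distance >= cutoffs[pair_type]:
            continue  # beyond the cut-off radius for this pair type
        index = int(distance / BIN_WIDTH)
        if index < len(table):
            total += table[index]
    return total


# Invented example: one pair type, flat zero below 4 Å and -0.5 beyond.
toy_potentials = {"CFCF": [0.0] * 20 + [-0.5] * 30}
toy_cutoffs = {"CFCF": 6.0}
toy_pairs = [("CFCF", 4.1), ("CFCF", 7.0), ("XXYY", 3.0)]
toy_total = pmf_score(toy_pairs, toy_potentials, toy_cutoffs)
```

Because the score is a plain table lookup, any change in the tabulated values, such as a different reference density in the logarithm, propagates directly into the ranking.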

Definition of reference states

Perhaps the most crucial element of a knowledge-based scoring function is the definition of an appropriate reference state. This reference state must reflect the equilibrium distribution of protein atoms around an arbitrary point in the protein if no interactions with a ligand occur. Unfortunately, the definition of such a reference state is not unique. It is influenced by the presence and treatment of solvent molecules, by ligand atoms, by its size, and by its location. Moreover, the documentation of knowledge-based scoring functions often leaves the reader in the dark about the exact definition of the reference state used. Therefore, we aim here to discuss three viable reference states and compare their effects on the predictive power of the PMF scoring function.

Reference state 1

In the scoring function described above, the reference state (referred to below as Ref1) is calculated as

(5)

where $\langle\ \rangle_{i,\mathrm{complex}}$ designates an average over all ligand atoms of type i and over all protein-ligand complexes that are used to derive the PMF. The reference spheres are located around the ligand atoms of type i. Parts of the reference spheres are occupied by the solvent. Therefore, the reference state captures the average solvent exposure of ligand molecules in the protein-ligand complexes. More specifically, it captures the different solvation of types of ligand atoms that are on average more buried in the protein matrix (hydrophobic atoms) or more exposed to the solvent (hydrophilic atoms). Solvation effects are implicitly captured in Ref1, since the different ratios of solvent to protein volume in spherical shells at radius r and in the reference sphere with radius R influence $\rho^{ij}(r)$ and $\rho^{ij}_{\mathrm{bulk}}$ to a different extent (Figure 1). The PMF are calculated by using Equation 2.

Figure 1. Schematic illustration of a protein-ligand-solvent system. It is shown that the ratio between solvent volume and protein-ligand volume in a spherical shell of thickness ∆r1 close to the ligand is smaller than that in a spherical shell with thickness ∆r2. It is also smaller than that in the reference sphere with radius R. The differences in these volume ratios directly affect $\rho^{ij}(r)$ and $\rho^{ij}_{\mathrm{bulk}}$ in Equation 2, since these number densities are calculated depending on the available protein volumes.
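
The role of the reference sphere can be made concrete with a small sketch. It is an assumption-laden illustration rather than the published definition: the bulk number density is taken here as the count of protein atoms of one type inside a sphere of radius R around a ligand atom, divided by the full sphere volume, and then averaged over ligand atoms. The function name and the coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bulk_number_density(ligand_atoms: Sequence[Point],
                        protein_atoms: Sequence[Point],
                        radius: float = 12.0) -> float:
    """Average density of protein atoms in spheres centred at ligand atoms.

    Assumption: the full sphere volume is used as the denominator, so any
    solvent-filled part of the sphere lowers the resulting density.
    """
    if not ligand_atoms:
        raise ValueError("at least one ligand atom is required")
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    densities: List[float] = []
    for centre in ligand_atoms:
        inside = sum(1 for atom in protein_atoms
                     if math.dist(centre, atom) <= radius)
        densities.append(inside / volume)
    return sum(densities) / len(densities)


# Invented coordinates: two protein atoms fall inside the 12 Å sphere.
toy_ligand = [(0.0, 0.0, 0.0)]
toy_protein = [(1.0, 0.0, 0.0), (0.0, 5.0, 0.0), (20.0, 0.0, 0.0)]
toy_density = bulk_number_density(toy_ligand, toy_protein, radius=12.0)
```

In this picture a ligand atom near the protein surface sees fewer protein atoms in its sphere than a buried one, which is one way to read the statement that Ref1 captures solvent exposure implicitly.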

Reference state 2

Independent of the protein-ligand complexes used to derive the PMF, a more general reference state can be defined as

(6)

where $\rho^{j}(r)$ is the number density of protein atoms of type j at distance r from an arbitrary point in a protein. The reference sphere with radius R must be entirely filled with protein atoms, since solvent exposure of the reference sphere, which is no longer centered at a specific ligand atom, would introduce arbitrary solvent effects. $\langle\ \rangle_{\mathrm{prot}}$ designates an average over protein complexes of the PDB. In order to assure that $A_{ij}(r)$ approaches zero with decreasing r, an additional set of protein atom dependent correction factors $C_j$ must be added to the reference state, which is referred to below as Ref2. This is necessary since, in contrast to $\rho^{ij}(r)$, the spherical volumes in which the $\rho^{j}_{\mathrm{bulk}}(r)$ are calculated do not contain any solvent. Note that the reference number densities no longer depend on the ligand atoms.

In order to derive $\rho^{j}_{\mathrm{bulk}}$, we find the average number of protein heavy atoms needed to entirely fill a spherical reference volume of radius R = 12 Å to be about $\rho_{\mathrm{bulk}}$ = 400. The average number of protein atoms of any type at a distance r from an arbitrary reference point in the protein can then be calculated by using the expression

(7)

where ∆r → 0. A set of 697 protein-ligand complexes (taken from Reference 27) is chosen. The midpoint of each protein is found by using an iterative numerical procedure. One probe sphere is chosen per protein, and the average occurrence of protein atoms of type j is determined and averaged over all protein structures. For each protein, $\rho^{j}(r)$ is scaled by the ratio of protein heavy atoms in the sphere, $\rho_{\mathrm{real}}(r)$, and the optimum number of protein atoms $\rho(r)$ determined before,

(8)

where $\rho^{j}_{\mathrm{real}}(r)$ is the number of protein atoms of type j at distance r. This procedure assures that no solvent volume effects are introduced into the reference number densities. Table 1 lists the average occurrences of protein atom types found in a spherical reference volume with radius 12 Å. Interestingly, the relative average occurrence of protein atoms of type j is the same in a sphere of radius 6 Å, using the same midpoints. However, while the 12 Å sphere shows a Gaussian distribution of occurrences, the 6 Å sphere shows a highly fluctuating occurrence pattern, as illustrated in Figure 2. The finding that the average number densities in the 6 and 12 Å radius spheres are comparable suggests that the 12 Å sphere contains mostly protein atoms that are not part of the protein surface. Using Ref2, the PMF can be calculated as

(9)
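
The scaling step described for Ref2 can be illustrated numerically. This sketch reflects one reading of the text and of the footnotes of Table 1, namely that per-type counts in a probe sphere are rescaled so that the total equals the optimal 400 heavy atoms, and that counts from a 6 Å sphere are multiplied by the volume ratio to make them comparable with a 12 Å sphere. The function names and the counts are hypothetical.

```python
from typing import Dict

OPTIMAL_HEAVY_ATOMS = 400.0  # optimum for a 12 Å sphere, as stated in the text


def scale_occurrences(counts: Dict[str, float],
                      optimal_total: float = OPTIMAL_HEAVY_ATOMS) -> Dict[str, float]:
    """Rescale per-type atom counts so that their sum equals `optimal_total`."""
    actual_total = sum(counts.values())
    if actual_total <= 0:
        raise ValueError("the probe sphere contains no heavy atoms")
    factor = optimal_total / actual_total
    return {atom_type: n * factor for atom_type, n in counts.items()}


def volume_scale(small_radius: float, large_radius: float) -> float:
    """Volume ratio between two spheres, which grows with the cube of the radius."""
    return (large_radius / small_radius) ** 3


# Invented counts for a sparsely filled probe sphere (200 heavy atoms).
toy_counts = {"CF": 40.0, "CP": 60.0, "OA": 100.0}
toy_scaled = scale_occurrences(toy_counts)
six_to_twelve = volume_scale(6.0, 12.0)
```

The factor of 8 returned for radii of 6 and 12 Å matches the scaling applied to the 6 Å column of Table 1.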


Figure 2. Histogram of occurrences of the protein atom type CP in a database of 697 protein crystal structures (see Reference 27) in spheres with radius 12 Å (black) and 6 Å (white). The latter is scaled by a factor of 8 to account for the volume difference between the spheres.

Reference state 3

A more solvent-independent reference state (Ref3) is defined by applying a correction factor to $\rho^{ij}_{\mathrm{bulk}}$ (Equation 5):

(10)

where $\rho^{ij}_{\mathrm{bulk}}$ is defined in Equation 5 and $\rho_{\mathrm{bulk}}$ is defined as

(11)

$\rho(r)$ is the number density of protein atoms of any type in spherical shells that are entirely filled with protein atoms at the distance r (Equation 7). $\rho(r)$ represents an average over protein structures in the database. $\rho^{i}(r)$ designates the number density of protein atoms of any type at the distance r from a ligand atom of type i in a particular protein-ligand complex. $\rho^{i}_{\mathrm{bulk}}$ is defined as the number density of protein atoms in a reference sphere of radius R around a ligand atom of type i by

(12)

The PMF can then be calculated as

(13)

Ref3 represents a reference state that is independent of the solvent exposure of ligand atoms of type i. However, as for $\rho^{ij}_{\mathrm{bulk}}$ in Ref1, the $\rho^{ij}_{\mathrm{bulk2}}$ are still calculated from reference spheres that are centered at the ligand atoms for each protein-ligand complex. The PMF derived with Ref3 still implicitly consider solvent effects, since the density of certain protein atom types is higher at the surface of the protein than inside the protein. However, the degree to which solvation effects are captured is much smaller than in Ref1. Other possible reference states, including those that use explicit solvent molecules in the reference sphere, have been discussed for instance by Jernigan and Bahar [38]. Reference states such as random mixing [39] that have been developed for protein folding and threading are not applicable here, since they have been specifically designed to capture the protein folding energy, which is irrelevant for the calculation of protein-ligand binding energies.

Ref1 and Ref3 are derived ‘on the fly’ by evaluating the protein atoms occurring around ligand atoms in the protein-ligand complexes used to derive the PMF. In contrast, Ref2 is defined as a reference state independent of the protein-ligand interactions. Therefore, the comparison of scoring results obtained with Ref1 and Ref2 allows us to discuss the issue of a suitable reference state. Ref1 and Ref3 reflect different degrees of capturing solvation effects in the derived PMF. Therefore, the comparison of scoring results obtained with Ref1 and Ref3 allows us to assess the effect of the implicit treatment of solvation on the predictive power of the PMF scoring function.

Table 1. Average occurrence of protein atoms in a sphere of given radius (a)

Protein atom type    Average occurrence in a      Average occurrence in a
                     12 Å reference sphere        6 Å reference sphere (b)
CF                    84.8                         83.8
CP                   121.9                        119.4
cF                    33.7                         32.7
cP                    10.5                         11.9
CO                     4.1                          4.8
CN                     3.0                          2.9
NC                     4.6                          4.9
ND                    59.5                         57.3
NR                     2.9                          4.3
OC                     7.9                          9.9
OA                    55.3                         54.9
OD                     8.9                         10.6
SA                     1.3                          1.0

(a) The atom types correspond to the protein atom types defined by Muegge and Martin [27]. The average occurrences are calculated by using the same set of 697 protein complexes used by Muegge and Martin to derive the PMF [27]. Only one test sphere is used per protein structure. The midpoint of the sphere is set at the midpoint of the protein, located by an iterative numerical procedure. For a sphere of 12 Å radius, an optimal occurrence of 400 protein heavy atoms is found. If more or fewer than 400 protein heavy atoms are found in the sphere, a volume correction factor is used to scale the number of occurrences of particular protein atom types up or down (Equation 8). It accounts for the ratio of the actual and optimal numbers of heavy atoms in the sphere.
(b) The average occurrence is scaled up by a factor of 8 to reflect the fact that the volume of a sphere of radius 6 Å is eight times smaller than that of a sphere of 12 Å radius.

Test sets for scoring experiments

In the following sections we compare the performances of PMF scoring functions derived by using different reference states. To do so, we use four different test sets.

Set 1 resembles set 1 in Reference 27. It contains 16 serine protease complexes taken from the PDB. The experimental data are taken from Tables 2 and 6 of Eldridge et al. [23]. Starting with the highest affinity complex, the following complexes are used: 1ets, 1dwd, 1etr, 1ppc, 1tmt, 1pph, 1ett, 3ptb, 1tnh, 1dwb, 1tng, 1tnl, 1bra, 1tni, 1tnk.

Set 2 of 15 metalloprotease complexes taken from the PDB resembles set 2 in Reference 27. The experimental data are taken from Table 3 of Eldridge et al. [23]. Starting with the highest affinity complex, the following complexes are used: 7cpa, 6cpa, 4tmn, 8cpa, 1mnc, 5tmn, 1tlp, 1tmn, 5tln, 1cbx, 3tmn, 2tmn, 6tmn, 3cpa, 4tln.

Set 3 of 17 diverse protein-ligand complexes taken from the PDB resembles set 5 in Reference 27. The experimental data are taken from Tables 1 and 3 of Böhm [20]. Starting with the highest affinity complex, the following complexes are used: 4dfr, 1fkf, 4phv, 1rne, 1phg, 2tsc, 2gbp, 1rbp, 4hvp, 2cpp, 5cpp, 2xis, 2ifb, 2ypi, 1phf, 5can, 1mbi.

Set 4 resembles a set of 63 protein-ligand complexes from the PDB together with reported binding constants assembled by Böhm [40]. It consists of the following PDB complexes: 1acj, 1add, 1bzm, 1cbx, 1cil, 1cps, 1ctt, 1dwb, 1dwc, 1ela, 1elc, 1fkf, 1hpv, 1hvr, 1l83, 1ldm, 1mbi, 1phe, 1phf, 1phg, 1ppc, 1pph, 1pso, 1rbp, 1rne, 1sbp, 1sre, 1tlp, 1tmn, 1tnk, 1ulb, 2cpp, 2ctc, 2er6, 2gbp, 2gpb, 2phh, 2tsc, 2xis, 2ypi, 3cpa, 3dfr, 3ptb, 3tpi, 4dfr, 4er4, 4fab, 4gr1, 4hmg, 4hvp, 4tln, 4tmn, 5cpp, 5tim, 5tln, 5tmn, 6can, 6cpa, 6rsa, 7cpa, 7cpp, 9aat, 9hvp.

Figure 3. An arbitrary set of 12 PMF is shown. The four letter code refers to the atom pair interaction types, where the first two letters refer to the protein and the second two letters to the ligand type (see Reference 27). The two curves per graph belong to PMF derived with Ref1 and Ref3, respectively. PMF/Ref1 curves are slightly more negative than PMF/Ref3 curves.

Scoring results with different reference states

Figure 3 shows PMF derived using reference states Ref1 and Ref3. The curves appear very similar; Ref1 generates slightly more negative potentials than Ref3. The scoring conditions are chosen as in Muegge and Martin [27], that is, carbon-carbon interactions are scored with a 6 Å cut-off radius and all other interactions with a 9 Å cut-off radius. The PMF are derived using a reference sphere radius of 12 Å. Table 2 shows the correlation between PMF scores and binding affinities for the four test sets described above. Figures 4–7 show the correlation between the PMF scores and the binding affinities for the four test sets and the three different reference states.

Table 2. Correlation between PMF score and measured binding affinity for four test sets (a)

No.  Test set                       No. of complexes  Reference state  R2    SD
1    Serine protease                16                Ref1             0.91  0.79
                                                      Ref2             0.86  1.10
                                                      Ref3             0.91  0.77
2    Metalloprotease                15                Ref1             0.59  1.84
                                                      Ref2             0.55  1.88
                                                      Ref3             0.53  1.94
2a   Metalloprotease w/o 1 outlier  14                Ref1             0.77  1.39
                                                      Ref2             0.61  1.82
                                                      Ref3             0.76  1.42
3    Diverse set1                   17                Ref1             0.68  1.55
                                                      Ref2             0.42  2.00
                                                      Ref3             0.52  1.75
4    Diverse set2                   63                Ref1             0.54  1.80
                                                      Ref2             0.45  1.95
                                                      Ref3             0.39  2.07

(a) The standard deviations (SD) between the binding affinities and a linear regression line, calculated by using M-estimates [42], are given in log Ki values.

Figure 4. PMF score of 16 serine protease complexes (set 1) as function of observed binding affinities calculated by using Ref1 (♦), Ref2 and Ref3, respectively.

Figure 5. PMF score of 15 metalloprotease complexes (set 2) as function of observed binding affinities calculated by using Ref1, Ref2 and Ref3, respectively.

Figure 6. PMF score of 17 diverse protein-ligand complexes (set 3) as function of observed binding affinities calculated by using Ref1, Ref2 and Ref3, respectively.

Figure 7. PMF score of 63 diverse protein-ligand complexes (set 4) as function of observed binding affinities calculated by using Ref1, Ref2 and Ref3, respectively.

From Table 2 one can see that PMF/Ref1 generally performs best. For test sets of protein-ligand complexes from the same protein class (sets 1 and 2) the differences between the reference states are small. In set 1, Ref2 is unable to distinguish between weakly binding complexes. The most notable difference between Ref2 and Ref1/3 in set 2 is that the hydroxamate-binding complex 1mnc is no longer an outlier with Ref2. However, the overall correlation between PMF score and log Ki is significant in all cases, particularly after removing the 1mnc outlier in set 2. Larger differences are found in test sets 3 and 4. Set 3 shows significant correlation only for PMF/Ref1 and PMF/Ref3; PMF/Ref2 performs worst. However, for the larger set of diverse protein-ligand complexes (set 4), Ref2 performs better than Ref3. Here, only PMF/Ref1 leads to a significant correlation between experiment and calculated score. The most notable shortcoming of PMF/Ref2 is that the compounds with the highest binding affinities are no longer given the best scores (Figure 7). The results suggest that the PMF should be derived with a reference state that is specific for the set of protein-ligand complexes used to derive the PMF and that most comprehensively accounts for solvation effects. This becomes more important the more diverse the protein-ligand complexes scored by the PMF scoring function are.
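The R2 and SD columns of Table 2 summarize a straight-line fit of measured log Ki against PMF score. The regression line in the paper is computed with M-estimates [42]; as a simpler stand-in, the sketch below uses ordinary least squares, so its numbers would not match a robust fit exactly. The function name and the toy values are hypothetical and are not data from the four test sets.

```python
import math


def linear_fit_stats(scores, log_ki):
    """Least-squares line log_ki = intercept + slope * score.

    Returns (slope, intercept, r_squared, sd); sd is the residual
    standard deviation with n - 2 degrees of freedom, in log Ki units.
    Plain least squares is used here, not the robust M-estimate [42].
    """
    n = len(scores)
    if n != len(log_ki) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(scores) / n
    mean_y = sum(log_ki) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in log_ki)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("scores and affinities must both vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, log_ki))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(scores, log_ki))
    r_squared = 1.0 - residual_ss / syy
    sd = math.sqrt(residual_ss / (n - 2))
    return slope, intercept, r_squared, sd


# Hypothetical toy values, not taken from any of the four test sets.
toy_scores = [-120.0, -95.0, -80.0, -60.0, -40.0]
toy_log_ki = [-9.1, -7.8, -7.0, -5.2, -4.4]
slope, intercept, r2, sd = linear_fit_stats(toy_scores, toy_log_ki)
print(f"R2 = {r2:.2f}, SD = {sd:.2f} log Ki units")
```

Because least squares is sensitive to single points, an outlier such as 1mnc in set 2 would pull this fit more strongly than it would a robust estimate.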

Distance dependence of reference states

Equations 5–13 show that the radius R of a reference sphere has an effect on the derived potentials. Figure 1 illustrates that part of this effect comes from the implicit introduction of solvent effects into the model. We are particularly interested in finding the minimum radius of the reference sphere that is sufficient to implicitly account for solvent contributions. Below that radius a ligand atom would be buried in the protein matrix such that it no longer feels the effect of the solvent, including cooperative effects due to other ligand atoms that may be more exposed to the solvent. In addition, we would like to know the optimal reference sphere radius that produces the best correlation between score and experiment. These questions have been tackled here by studying the four sets of protein-ligand complexes and predicting their binding affinities using PMF derived with reference sphere radii between 6 and 12 Å. In order to compare the scoring results for different reference sphere radii on an equal footing, we use a scoring cutoff radius of 6 Å for all protein-ligand interactions. Note that this is different from the optimal cutoff scheme found by Muegge and Martin, which uses a 6 Å cutoff for carbon-carbon interactions and a 9 Å cutoff for all other interactions.

Using PMF/Ref1, Figure 8 shows the squared correlation coefficients R2 between calculated score and measured log Ki as function of the radius of the reference sphere. Test sets of diverse protein-ligand complexes (sets 3 and 4) show plateaus with statistically significant correlation between binding affinity and PMF score at reference sphere radii between 8 and 12 Å. Smaller radii lead to insignificant correlation between score and measured binding constants. For the reference state derived by using a 6 Å radius the R2 is only 0.36 for set 3 and 0.32 for set 4. Test sets of special protein classes (sets 1 and 2) also show a continuous decline in R2 with decreasing reference sphere radius below 9 Å. Set 1 shows a smaller but still significant decrease in correlation with decreasing radius. Set 2 seems to be optimal between 7 and 8 Å. Towards smaller radii it also leads to smaller R2. However, the significantly better scoring of set 2 between 6.5 and 9 Å compared to 10–12 Å is mostly due to the outlier 1mnc. Removing the outlier from set 2 results in an improved correlation at all distances, especially at distances above 9 Å. It is interesting to note here that the optimal reference sphere radius for all four test cases (using set 2a instead of 2) was found to be 9 Å. This finding is independent of the cutoff scheme used. It has been reproduced for the original cutoff scheme of 6/9 Å proposed earlier [27] for reference spheres with radii of >9 Å (data not shown).

Figure 8. R2 between calculated score and measured log Ki for four sets of protein-ligand complexes as function of the radius of the reference sphere used for deriving the PMF/Ref1 (set 1, set 2, set 2a, set 3, set 4).

The results suggest that a radius of 7–8 Å is sufficient to capture most of the solvation effects in a PMF scoring function. A reference sphere with a 6 Å radius leads to significantly worse correlation between binding affinities and calculated scores. This result is consistent with the finding of good correlation for a PMF score recently reported by Mitchell et al. for a test set of 90 protein-ligand complexes from the PDB [41]. This scoring function (BLEEP) uses a reference sphere of 8 Å to derive the PMF. As in Muegge and Martin [27], it has been found that additional solvation terms do not improve the scoring.
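The uniform 6 Å scoring cutoff used in this section and the 6/9 Å scheme of Reference 27 differ only in the distance beyond which an atom pair stops contributing to the score. A minimal sketch of that bookkeeping follows, assuming the PMF is available as one callable per atom-pair type. The function name `pmf_score`, the atom tuple layout and the flat toy potentials are hypothetical and are not part of the published implementation.

```python
import math


def pmf_score(protein_atoms, ligand_atoms, pmf,
              carbon_cutoff=6.0, other_cutoff=9.0):
    """Sum pair potentials over protein-ligand atom pairs within the cutoff.

    Each atom is (atom_type, element, (x, y, z)), coordinates in Å.
    pmf maps (protein_type, ligand_type) to a function of the distance.
    Carbon-carbon pairs use carbon_cutoff, all other pairs other_cutoff.
    Pair types without a tabulated potential are skipped (an assumption).
    """
    total = 0.0
    for p_type, p_elem, p_xyz in protein_atoms:
        for l_type, l_elem, l_xyz in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)
            both_carbon = p_elem == "C" and l_elem == "C"
            cutoff = carbon_cutoff if both_carbon else other_cutoff
            potential = pmf.get((p_type, l_type))
            if potential is not None and r <= cutoff:
                total += potential(r)
    return total


# Hypothetical toy system: flat potentials, one ligand carbon 7 Å away.
toy_pmf = {("CF", "CF"): lambda r: -1.0, ("OA", "CF"): lambda r: -2.0}
toy_protein = [("CF", "C", (0.0, 0.0, 0.0)), ("OA", "O", (0.0, 0.0, 0.0))]
toy_ligand = [("CF", "C", (7.0, 0.0, 0.0))]

score_6_9 = pmf_score(toy_protein, toy_ligand, toy_pmf)            # 6/9 Å scheme
score_6_6 = pmf_score(toy_protein, toy_ligand, toy_pmf, 6.0, 6.0)  # uniform 6 Å
```

In the toy system only the oxygen-carbon pair at 7 Å survives the 6/9 Å scheme, while the uniform 6 Å cutoff drops both pairs.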

The finding that Mitchell’s scoring function performs similarly but not quite as well as the PMF score may be attributed to the fact that it lacks the volume correction factor that has been introduced for PMF scoring [27]. In addition, it may be that Mitchell et al. use a reference state that is similar to Ref2. The effect of the ligand volume correction factor needs to be studied further but is beyond the scope of this work. PMF scoring functions that use a reference sphere radius of 6 Å, such as those reported by Verkhivker et al. [24] and Gohlke et al. [28], use additional solvation terms in their models in order to obtain good correlation with experimental binding affinities. For instance, the PMF term of Verkhivker’s scoring function alone does not show any correlation with the binding affinities of his test set. This is again consistent with our finding that a radius of 6 Å leads to a sub-optimal correlation between the PMF score and measured binding affinities.

Conclusions

Different reference states in knowledge-based scoring functions lead to different predictions of binding affinities. A reference state that most comprehensively captures the effects of protein-ligand solvation consistently yields the best correlation between calculated PMF score and measured binding affinities for four different sets of protein-ligand complexes taken from the Brookhaven Protein Data Bank. The minimum radius of a reference sphere that is capable of capturing solvation effects effectively must be greater than 6 Å. A reference sphere radius of at least 7–8 Å is found to be sufficient to get good correlation between measured binding affinities and calculated score. A radius of 9 Å is found to be optimal but not significantly better than larger radii.

References

1. Dixon, J.S. and Blaney, J.M., In Martin, Y.C. and Willett, P. (Eds.) Designing Bioactive Molecules: Three-dimensional Techniques and Applications, American Chemical Society, Washington, DC, 1998, pp. 175–197.
2. Makino, S. and Kuntz, I.D., J. Comput. Chem., 18 (1997) 1812.
3. Walters, W.P., Stahl, M.T. and Murcko, M.A., Drug Discovery Today, 3 (1998) 160.
4. Muegge, I., Martin, Y.C., Hajduk, P.J. and Fesik, S.W., J. Med. Chem., 42 (1999) 2498.
5. Kuntz, I.D., Blaney, J.M., Oatley, S.J. and Langridge, R.L., J. Mol. Biol., 161 (1982) 269.
6. Kuntz, I.D., Science, 257 (1992) 1078.
7. Jones, G., Willett, P., Glen, R.C. and Leach, A.R., J. Mol. Biol., 267 (1997) 727.
8. Ewing, T. and Kuntz, I.D., J. Comput. Chem., 18 (1997) 1175.
9. Goodsell, D.S. and Olson, A.J., Proteins, 8 (1990) 195.
10. Welch, W., Ruppert, J. and Jain, A.N., Chem. Biol., 3 (1996) 449.
11. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
12. Miller, M.D., Kearsley, S.K., Underwood, D.J. and Sheridan, R.P., J. Comput.-Aided Mol. Design, 8 (1994) 153.
13. Dixon, J.S., Proteins Struct. Funct. Genet., Suppl., 1 (1997) 198.
14. Oprea, T.I. and Marshall, G.R., Perspect. Drug Discov. Des., 9/10/11 (1998) 35.
15. Tame, J.R.H., J. Comput.-Aided Mol. Design, 13 (1999) 99.
16. Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta Jr., S. and Weiner, P., J. Am. Chem. Soc., 106 (1984) 765.
17. Weiner, S.J., Kollman, P.A., Nguyen, D.T. and Case, D.A., J. Comput. Chem., 7 (1986) 230.
18. Shoichet, B.K., Leach, A.R. and Kuntz, I.D., Proteins, 34 (1999) 4.
19. Zou, X., personal communication.
20. Böhm, H.-J., J. Comput.-Aided Mol. Design, 8 (1994) 243.
21. Jain, A.N., J. Comput.-Aided Mol. Design, 10 (1996) 427.
22. Head, R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green, S.M. and Marshall, G.R., J. Am. Chem. Soc., 118 (1996) 3959.
23. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V. and Mee, R.P., J. Comput.-Aided Mol. Design, 11 (1997) 425.
24. Verkhivker, G., Appelt, K., Freer, S.T. and Villafranca, J.E., Protein Eng., 8 (1995) 677.
25. Wallqvist, A., Jernigan, R.L. and Covell, D.G., Protein Sci., 4 (1995) 1881.
26. DeWitte, R.S. and Shakhnovich, E.I., J. Am. Chem. Soc., 118 (1996) 11733.
27. Muegge, I. and Martin, Y.C., J. Med. Chem., 42 (1999) 791.
28. Gohlke, H., Hendlich, M. and Klebe, G., J. Mol. Biol., 295 (2000) 337.
29. Mitchell, J.B.O., Laskowski, R.A., Alex, A. and Thornton, J.M., J. Comput. Chem., 20 (1999) 1165.
30. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer Jr., E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M., Eur. J. Biochem., 80 (1977) 319.
31. Muegge, I., Med. Chem. Res., 7–8 (1999) 490.
32. Ha, S., Andreani, R., Robbins, A. and Muegge, I., J. Comput.-Aided Mol. Design, 14 (2000) 435.
33. Ben-Naim, A., J. Chem. Phys., 107 (1997) 3698.
34. Sippl, M.J., J. Mol. Biol., 213 (1990) 859.
35. Sippl, M.J., J. Comput.-Aided Mol. Design, 7 (1993) 473.
36. Sippl, M.J., Ortner, M., Jaritz, M., Lackner, P. and Flöckner, H., Folding Des., 1 (1996) 289.
37. Thomas, P.D. and Dill, K., J. Mol. Biol., 257 (1996) 457.
38. Jernigan, R.L. and Bahar, I., Curr. Opin. Struct. Biol., 6 (1996) 195.
39. Miyazawa, S. and Jernigan, R.L., J. Mol. Biol., 256 (1996) 623.
40. Böhm, H.-J., J. Comput.-Aided Mol. Design, 12 (1998) 309.
41. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M.J. and Thornton, J.M., J. Comput. Chem., 20 (1999) 1177.
42. Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., Numerical Recipes, Cambridge University Press, Cambridge, 1986.

Perspectives in Drug Discovery and Design, 20: 115–144, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Predicting binding modes, binding affinities and ‘hot spots’ for protein-ligand complexes using a knowledge-based scoring function

HOLGER GOHLKE, MANFRED HENDLICH and GERHARD KLEBE*

Department of Pharmaceutical Chemistry, Philipps-University of Marburg, Marbacher Weg 6, D-35032 Marburg, Germany

Summary. The development of a new knowledge-based scoring function (DrugScore) is presented, together with its power to recognize binding modes close to experiment, to predict binding affinities, and to identify ‘hot spots’ in binding pockets. Structural information is extracted from crystallographically determined protein-ligand complexes using ReLiBase and converted into distance-dependent pair-preferences and solvent-accessible surface (SAS) dependent singlet preferences of protein and ligand atoms. The sum of the pair preferences and the singlet preferences is calculated using the 3D structure of protein-ligand complexes either taken directly from the X-ray structure or generated by the docking tool FlexX. DrugScore discriminates efficiently between well-docked ligand binding modes (root-mean-square deviation < 2.0 Å with respect to a crystallographically determined reference complex) and computer-generated ones that deviate largely from the native structure. For two test sets (91 and 68 protein-ligand complexes, taken from the PDB) the calculated score recognizes poses deviating < 2 Å from the crystal structure on rank 1 in three quarters of all possible cases. Compared to the scoring function in FlexX, this is a substantial improvement. For five test sets of crystallographically determined protein-ligand complexes as well as for two sets of ligand geometries generated by FlexX, the calculated score is correlated with experimentally determined binding affinities. For a set of 16 crystallographically determined serine protease inhibitor complexes, an R2 value of 0.86 and a standard deviation of 0.95 log units is achieved as best result; for a set of 64 thrombin and trypsin inhibitors docked into their target proteins, an R2 value of 0.48 and a standard deviation of 0.7 log units is calculated. DrugScore performs better than other state-of-the-art scoring functions. To assess DrugScore’s capability to reproduce the geometry of directional interactions correctly, ‘hot spots’ are identified and visualized in terms of isocontour surfaces inside the binding pocket. A data set of 159 X-ray protein-ligand complexes is used to reproduce and highlight the actually observed ligand atom positions. In 74% of all cases, the actually observed atom type corresponds to the atom type predicted by the most favorable score at the nearest grid point. The prediction rate increases to 85% if it suffices that an atom type of the same class of interaction is suggested. DrugScore is fast to compute and implicitly includes solvation and entropy contributions. Small deviations in the 3D structure are tolerated and, since only contacts to non-hydrogen atoms are regarded, it does not require any assumptions on protonation states.

Key words: binding affinity, docking, knowledge-based, protein-ligand interactions, scoring function, virtual screening

* To whom correspondence should be addressed.


Introduction

The process of finding novel leads for a new target is one of the most important steps in a drug development program. Today two complementary strategies are followed: experimental high-throughput screening and computational methods exploiting structural information of the protein binding site [1–4]. The latter approaches try to predict, e.g. via docking, the actual binding mode of a ligand at the binding site [5,6]. Several of the published docking procedures are fast enough to serve the purpose and suggest solutions approximating the native pose in up to 80% of the cases [7–9]. Nevertheless, the pose closest to the experimental situation is often not ranked as the energetically most favorable one within a set of decoy geometries, which indicates deficiencies in the applied ranking schemes [10]. Consequently, we embarked on the development of a new scoring function.

Ideally, binding affinity is determined by statistical thermodynamics, resulting in a master equation that considers all contributing effects [11,12]. Although theoretically most convincing, elaborate methods such as free energy perturbation (FEP) or thermodynamic integration (TI) are computationally too demanding for the application described above [13]. Instead, the partitioning of binding affinity into several additive terms or descriptors is a widely accepted assumption for the development of empirical regression-based scoring functions [14]. Usually a number of empirically derived contributions is fitted to a data set of experimental observations [15–19]. Approaches such as VALIDATE [20] are based on the ideas of QSAR. These approaches achieve a precision of about 1.5 orders of magnitude in predicting Ki [15,20]. However, any regression analysis suffers from the fact that the conclusions obtained can only be as precise and generally valid as the underlying data are in covering all contributing and discriminating effects in protein-ligand complexes. The same arguments hold for penalty filters developed to discard computer-generated artifacts from a list of favorable ligand poses [21].

We decided to follow an alternative way to develop a scoring function based on empirical knowledge. During its development we decided not to assign protonation states to the various atom types, assuming that the derived statistical preferences implicitly reflect these influences. Furthermore, any binding feature not in agreement with the most frequently observed contact preferences will likely be penalized due to its rare occurrence.

Knowledge-based potentials have been applied to rank different solutions of the protein-folding problem [22–24]. Up to now, this approach has been applied in only five case studies to the ranking of different protein-ligand complexes. Except for a single test case in the most recently published work [25], none of these, however, engaged in identifying near-native poses of one ligand with respect to one protein. Wallqvist et al. [26,27] classified the surfaces of buried ligand atoms found in 38 complexes and developed a model to predict the Gibbs free energy of binding based on these observed atom-atom preferences. From an analysis of 10 HIV protease inhibitor complexes, they approximated the free energy of binding to within ± 1.5 kcal/mol. Using a dataset of 30 HIV-1, HIV-2, and SIV proteases, Verkhivker et al. [28] compiled a distance-dependent knowledge-based pair-potential which was then combined with hydrophobicity [29] and conformational entropy scales [30]. DeWitte and Shakhnovich [31] used a sample of 126 structures from the PDB [32] to develop a set of ‘interatomic interaction free energies’ for a variety of atom types. Muegge and Martin [33] explored structural information of known protein-ligand complexes from the PDB and derived distance-dependent Helmholtz free interaction energies of protein-ligand atom pairs. Using 77 protein-ligand complexes for validation, the calculated score achieves a standard deviation from the observed binding affinities of 1.8 log Ki units. The scoring function was further evaluated by docking weak-binding ligands to the FK506-binding protein [34]. Mitchell et al. [25,35] developed a potential of mean force at atomic level using high-resolution X-ray structures from the PDB, considering 820 possible atom-atom pairs. The ability to identify low-energy binding modes among decoy conformations is tested only for heparin binding to bFGF (PDB code: 1bfc). While the crystal structure was ranked lowest, the best-scored of the generated structures deviates largely from the experimental situation. Evaluating a test set of 90 different PDB complexes with respect to binding energies, a squared correlation coefficient of 0.55 is achieved as optimum.

In the present article, we describe the development of a new scoring function (called DrugScore) to predict protein-ligand interactions (see also [36]). It is based on the vast structural knowledge stored in the entire PDB, retrievable using ReLiBase [37]. Knowledge-based probabilities, well adjusted to describe specific short-range distances between ligand and protein functional groups, are combined with terms considering solvent-accessible-surface portions of both partners that become buried upon binding. For the first time, knowledge-based probabilities are used to discriminate and predict ligand-binding modes for 159 protein-ligand complexes. Multiple solutions, generated for these examples by FlexX, have been re-ranked to obtain a significantly improved scoring with respect to their deviation from the native pose. In addition, the power of this approach to predict binding affinities is tested by analyzing sets of crystallographically determined protein-ligand complexes and protein-ligand geometries generated by FlexX. Compared to the results of commonly applied scoring functions, DrugScore yields convincingly better predictions. Finally, the implicit consideration of directionality effects in intermolecular interactions described by the distance-dependent part of the function is demonstrated by visual inspection and a statistical evaluation. The results are encouraging compared to the recently published SuperStar method [38].

Theory and methods

In the following, we briefly summarize the theoretical background of our approach (for a detailed description see [36]). The noncovalent complex formation between a ligand and a protein usually takes place in aqueous solution. An implicit description of the complex solute-solvent interactions and entropic solvent effects, together with the involved enthalpic contributions resulting from interatomic forces (e.g. electrostatic or van der Waals) [39], is reflected by the formalism used to derive potentials of mean force from database knowledge [23,40].

Derivation of statistical distance-dependent pair-preferences and solvent-accessible surface-dependent singlet-preferences

Following an approach at atomic level [41,42], distance-dependent pair-potentials between ligand and protein atoms of type i and j are compiled according to Equation 1, where gi,j(r) is the normalized radial pair-distribution function for atoms of types i and j, separated by a distance between r and r + dr, and g(r) is the normalized mean radial pair-distribution function for a distance between two atoms in the range r to r + dr. The latter incorporates all nonspecific information common to all atom pairs present in an environment typical for proteins. The definition of an upper radius limit rmax for interactions between atoms i and j [43] determines the overall shape of the resulting potentials. Sampling over short distances will emphasize the specific interactions formed by a ligand functional group with the neighboring binding-site residues. To guarantee that these interactions will dominate, we restrict our sampling to distances between 1 and 6 Å, with a bin size of 0.1 Å. The rationale for this upper limit arises from the fact that a 6 Å contact is short enough not to involve a water molecule as mutual mediator of a ligand-to-protein interaction.

To avoid sampling over large distances but still include solvent-mediated effects, an alternative approach is required [42,44]. A combination of the short-distance sampling together with the findings from protein-fold prediction motivated us to derive a knowledge-based one-body potential scaled to the size of the solvent-accessible surface (SAS) of the protein and the ligand that becomes buried upon complex formation (Equation 2). In this equation, gi is the normalized distribution function of the surface area of an atom i in the buried state (SAS), considering ligand and protein individually, in comparison to the solvated state (SAS0). It is calculated by an approximate cube algorithm similar to the one introduced by Böhm [15]. Under this assumption any polar portion of the SAS that becomes buried in the complex, but still faces a polar environment, is considered to remain in a condition equivalent to ‘solvent accessible’ [45]. As a first approximation, the ligand conformations found by X-ray crystallography or docking procedures are assumed to be identical to those adopted in the solvent. In this crude model, conformational changes experienced upon ligand binding [46] are not considered.

Both short-range pair- and SAS-potentials are derived using the ReLiBase system [37] for data extraction. For our purpose, we evaluated crystallographically determined complexes with resolutions better than 2.5 Å. Complexes with covalently bound ligands or ligands with fewer than 6 or more than 50 non-hydrogen atoms were discarded. Furthermore, we excluded all complexes that were subsequently used in the validation of the predictive power of the potentials to avoid any redundancy or training effects due to overfitting. Potentials were derived for 17 different atom types using the SYBYL atom type notation: C.3, C.2, C.ar, C.cat, N.3 (= N.4), N.ar (= N.2), N.am, N.pl3, O.3, O.2, O.co2, S.3 (= S.2), P.3, F, Cl, Br, and metal atoms Met (= Ca, Zn, Ni, Fe).
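For orientation, the definitions of gi,j(r), g(r) and gi given in this section are consistent with the usual inverse-Boltzmann relation between a distribution function and a potential of mean force. The following is a sketch under that assumption, with dimensionless preferences, rather than a transcription of Equations 1 and 2:

```latex
% Assumed form of the pair preference (cf. Equation 1)
\Delta W_{i,j}(r) = -\ln \frac{g_{i,j}(r)}{g(r)}

% Assumed form of the SAS-dependent singlet preference (cf. Equation 2)
\Delta W_{i}(\mathrm{SAS}, \mathrm{SAS}_0) = -\ln \frac{g_{i}(\mathrm{SAS})}{g_{i}(\mathrm{SAS}_0)}
```

With this sign convention, a negative value marks a contact distance, or a degree of burial, that is observed more often than in the reference distribution.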

Calculation of the total score for a given ligand pose

In our approach we assume that the total preference ∆W of a particular binding geometry can be approximated by summing over all individual contributions, i.e. over ki ligand atoms of type i and lj protein atoms of type j (Equation 3). In this equation, γ is an adjustable parameter, optimized empirically to be 0.5. Our approach does not explicitly incorporate additional contributions to the binding energy such as conformational, rotational, and translational entropy. Furthermore, energy contributions arising from intramolecular interactions (van der Waals and torsion potentials) are neglected. Since popular docking tools such as FlexX, DOCK, and GOLD generate only favorable ligand conformations, we believe that these terms can only be of minor importance in comparison to the solute state contributions. The scoring values obtained are used to rank different poses of one ligand in a single protein with respect to the rms deviation from the geometry found in the crystal structure.
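As a worked illustration of the summation described above, the sketch below combines precomputed pair and singlet contributions with the weighting factor γ and then ranks poses by score. The weighting γ·(pair sum) + (1 − γ)·(singlet sum) is an assumed form of Equation 3, and the function names and toy numbers are hypothetical.

```python
def total_score(pair_terms, ligand_sas_terms, protein_sas_terms, gamma=0.5):
    """Combine precomputed preference values into one score for a pose.

    pair_terms: pair preferences for all ligand-protein atom pairs in range.
    ligand_sas_terms, protein_sas_terms: singlet (SAS) preferences per atom.
    Assumed weighting: gamma * pairs + (1 - gamma) * singlets.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    pair_sum = sum(pair_terms)
    singlet_sum = sum(ligand_sas_terms) + sum(protein_sas_terms)
    return gamma * pair_sum + (1.0 - gamma) * singlet_sum


def rank_poses(pose_scores):
    """Pose indices ordered from most favorable (lowest score) to least."""
    return sorted(range(len(pose_scores)), key=lambda k: pose_scores[k])


# Hypothetical toy contributions for a single pose.
example = total_score([-1.0, -2.0], [-0.5], [-0.5])

# Hypothetical scores of three docking poses of one ligand.
order = rank_poses([-2.0, -5.0, -1.0])
```

Ranking by ascending score reflects the convention that more negative preferences are more favorable; the ranked list can then be compared with the rms deviation of each pose from the crystal structure.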

Calculation of binding affinities

As stated above, the statistical preferences obtained, and in consequence the calculated scores, are considered to implicitly contain not only enthalpic but also entropic contributions to binding. Although this is not a proof, a correlation of the derived scoring values with experimentally determined binding free energies would indicate that the important contributions are correctly and sufficiently covered. For the calculation of scoring values, cofactors and metal atoms are taken as part of the protein whereas water molecules are omitted. The values obtained are related to the experimentally determined binding affinities using an adjustable parameter cs. It is determined iteratively by scaling ∆W in such a way that the standard deviations of the calculated values become equal to the observed ones (pKi) according to a straight line with zero intercept (Equation 4). The general applicability of this relationship depends on whether this adjustable parameter can be transferred among different data sets.

Implicit consideration of directionality in radial-symmetrical pair-potentials

The strength of interactions, especially between polar functional groups but also e.g. between aromatic rings, depends on their mutual distance and relative orientation in space. Since a single pair-potential is purely spherically symmetric, the question is how well directionality between pairs of atoms is implicitly reflected if multiple pair-potentials are considered as a composite representation. To assess the spatial resolution of the entire ensemble of the pair potentials, a cubic grid with 0.5 Å spacing is constructed in the binding pocket with a margin of 8 Å around a ligand. At every grid point not occupied by the protein, a scoring value is calculated considering all possible ligand-atom types (Equation 1). The grid values obtained are contoured for the individual ligand-atom types. The isopleths shown comprise 10% of the potential values above the global minima for each type. For a statistical evaluation, the type of a solvent-inaccessible ligand atom actually found in the analyzed crystal structure is compared to the type predicted by our function in this local area. For the analysis, the scoring values for a C.3, N.3, O.3, O.2, or O.co2 probe at the neighboring grid point are compared and the probe with the best score is selected. We decided to use a 0.5 Å grid because the largest possible distance between a ligand atom and its nearest grid point amounts to half of the through-space diagonal (≈ 0.43 Å). We believe the grid is detailed enough, since this distance is close to the mean positional errors in experimental structure determinations. In principle a scoring value can be calculated by DrugScore at any position in space, including the exact position at which a ligand atom is found in a crystal structure. However, with the motivation to predict favorable ligand atom sites inside a binding pocket de novo, the grid-based approach appears most appropriate.
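The 0.43 Å figure follows from the geometry of a cubic grid: the point farthest from any grid node is the center of a cell, half a space diagonal away from its corners. The short check below reproduces that number and sketches the probe comparison at a grid point; the helper names and the toy scores are hypothetical.

```python
import math


def max_offset_to_grid(spacing):
    """Largest distance from any point to its nearest node of a cubic grid."""
    return 0.5 * math.sqrt(3.0) * spacing


def nearest_grid_point(xyz, origin, spacing):
    """Indices (i, j, k) of the grid node closest to the position xyz."""
    return tuple(int(math.floor((c - o) / spacing + 0.5))
                 for c, o in zip(xyz, origin))


def best_probe(scores_at_point):
    """Probe type with the lowest (most favorable) score at one grid point."""
    return min(scores_at_point, key=scores_at_point.get)


offset = max_offset_to_grid(0.5)  # half the space diagonal of a 0.5 Å cell
node = nearest_grid_point((1.2, 0.1, -0.3), (0.0, 0.0, 0.0), 0.5)

# Hypothetical scores of three probes at that node.
predicted = best_probe({"C.3": -0.2, "O.3": -1.1, "N.3": -0.7})
```

Comparing `predicted` with the atom type observed in the crystal structure, atom by atom, gives the kind of prediction rate reported for the hot-spot analysis.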

Results

Statistical preferences of ligand-protein atom pairs

Correlation functions for pairs of ligand-protein atoms were derived by counting their occurrence frequencies at discrete distances. Subsequently, statistical preferences were computed using Equation 1. In the following, we will focus on some illustrative examples, depicted in Figures 1 and 2, out of 289 possible atom-pair combinations. In all cases, the first atom-type index i is attributed to a ligand atom, the second j to a protein or cofactor atom. The derived preferences can be divided into two main classes: the first contains interactions between polar and charged atoms and exhibits pronounced minima at distances between 2.5 and 3.0 Å. They correspond to hydrogen bonds and salt bridges (Figure 1). The second class comprises non-polar interactions and displays broader minima which reveal favorable interactions at distances >3.5 Å (Figure 2).

Figure 1. Statistical preferences for polar/charged pair interactions as a function of the distance R, calculated according to Equation 1.

Going from O.3-O.3 to O.3-O.co2 and O.co2-N.pl3, the minima corresponding to the shell of next neighbors fall into a decreasing distance range, thus exhibiting favorable interactions at shorter distances. Contacts between the above-mentioned atom types can be assigned to a ‘normal’ hydrogen bond, a polar charge-assisted interaction and a salt bridge [48]. Expressed in terms of statistical preferences (Equation 1), an ideal O.3-O.3 interaction is 2.5 times less favorable than a similar O.3-O.co2 interaction. For the O.co2-N.pl3 atom pair one has to take into account that in a bidentate salt bridge between carboxylate and amidinium/guanidinium this interaction is counted twice.

For nonpolar contacts (Figure 2), the C.ar-C.ar interaction shows a slightly more structured preference compared to the C.3-C.3 interaction, and the minimum of the former resides at a shorter distance of 3.7 Å, in agreement with the well-known aromatic-aromatic interactions [49]. In contrast, C.3-C.3 interactions do not show any distinct preference of the atom pair distribution over the entire distance range of 4 to 6 Å of favorable interactions. This is clearly in agreement with the well-known fact that the latter type of interaction hardly exhibits any directional preferences.
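The counting procedure behind Figures 1 and 2 can be sketched as follows, assuming the inverse-Boltzmann form ∆W = −ln(gi,j/g) for Equation 1 and the 1–6 Å range with 0.1 Å bins stated in the methods. The shell-surface normalization, the handling of empty bins, the helper names and the toy distances are assumptions for illustration and not the published implementation.

```python
import math

R_MIN, R_MAX, BIN = 1.0, 6.0, 0.1
N_BINS = round((R_MAX - R_MIN) / BIN)  # 50 distance bins


def radial_distribution(distances):
    """Normalized radial distribution from observed pair distances (Å).

    Counts per bin are divided by the shell surface 4*pi*r^2 (assumption)
    and then normalized so that all bins sum to one.
    """
    counts = [0] * N_BINS
    for d in distances:
        if R_MIN <= d < R_MAX:
            counts[min(int((d - R_MIN) / BIN), N_BINS - 1)] += 1
    density = []
    for k, n in enumerate(counts):
        r = R_MIN + (k + 0.5) * BIN  # bin center
        density.append(n / (4.0 * math.pi * r * r))
    total = sum(density)
    if total == 0.0:
        raise ValueError("no distances inside the sampling range")
    return [value / total for value in density]


def mean_distribution(distributions):
    """Bin-wise average over the distributions of all pair types."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def pair_preference(g_pair, g_mean):
    """-ln(g_pair / g_mean) per bin; None where a bin is unpopulated."""
    return [-math.log(gp / gm) if gp > 0.0 and gm > 0.0 else None
            for gp, gm in zip(g_pair, g_mean)]


# Hypothetical toy data: type A prefers 2.75 Å contacts, type B 4.05 Å.
g_a = radial_distribution([2.75, 2.75, 2.75, 4.05])
g_b = radial_distribution([2.75, 4.05, 4.05, 4.05])
pref_a = pair_preference(g_a, mean_distribution([g_a, g_b]))
```

In the toy data, `pref_a` is negative in the bin around 2.75 Å and positive around 4.05 Å, which mirrors how a hydrogen-bonding pair type would show a pronounced minimum at short distance.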


Figure 2. Statistical preferences for nonpolar/aromatic pair interactions as a function of the distance R, calculated according to Equation 1.

Figure 3. Statistical preferences for ligand atoms of type C.3 and O.co2 as calculated from the distribution functions for solvent accessibility of both atom types for complexed and separated state from the protein according to Equation 2. The number of cubes (#cubes) is an approximate measure for the solvent accessibility; zero cubes refer to complete burial.

Table 1. Results for scoring multiple docking solutions of 91 protein-ligand complexes generated by FlexX and DrugScore

                     % of complexes with solutions exhibiting rmsd from the crystal structure
                     <1.0 Å    <1.5 Å    <2.0 Å    >2.0 Å
All ranks (a)          65        76        84        16
1st rank (b)
  FlexX                20        37        54        46
  DrugScore            39        66        73        27
Improvement (c)        95        78        35       -41

(a) All solutions of each docking experiment for the 91 complexes are considered. This number expresses the portion of all complexes for which at least one solution with the given rmsd was computed by FlexX.
(b) Only the ligand geometry scored to be on the 1st rank by either FlexX or DrugScore is considered. The numbers are related to the ones in the first line.
(c) Calculated by (%DrugScore – %FlexX)/%FlexX.
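The improvement row of Table 1 follows directly from the formula in footnote c. The short sketch below recomputes it from the first-rank percentages given in the table; the function and variable names are illustrative.

```python
def relative_improvement(drugscore_pct, flexx_pct):
    """Relative change in percent: (%DrugScore - %FlexX) / %FlexX."""
    return 100.0 * (drugscore_pct - flexx_pct) / flexx_pct

# First-rank percentages from Table 1, keyed by rmsd class (in Å).
flexx = {"<1.0": 20, "<1.5": 37, "<2.0": 54, ">2.0": 46}
drugscore = {"<1.0": 39, "<1.5": 66, "<2.0": 73, ">2.0": 27}

improvement = {
    key: round(relative_improvement(drugscore[key], flexx[key]))
    for key in flexx
}
```

A negative value in the >2.0 Å class is the desired outcome, since it means fewer poorly docked poses end up on rank 1.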

SAS-dependent preferences for solvent-exposure of ligand and protein atoms

To incorporate solvent effects we derived a SAS-dependent singlet preference for both ligand and protein atoms according to Equation 2. Figure 3 shows the statistical preferences for a C.3 or O.co2 ligand atom to be exposed in the protein-bound state compared to the unbound state. Except for very small surface portions that remain solvent-exposed at the binding site, complete burial of C.3 atoms in protein-ligand complexes is strongly favorable for complex formation. A completely different, although quite reasonable, behavior is observed for O.co2-type atoms. In the complexed state, the distribution results from a compromise between several effects. On the one hand, burial of O.co2 atoms diminishes the SAS. On the other hand, surface portions contacting polar protein atoms are not considered as being buried. Hence, ‘partial’ burial of O.co2 results as the best compromise for the complexed state. Expressed in terms of preferences, a ‘partial’ burial favors complex formation, in contrast to a complete burial.


‘Scoring scoring functions’

To assess the quality of a generated docking solution in terms of its deviation from the experimental structure, we define as ‘well docked’ those ligand poses that deviate by not more than 2.0 Å root-mean-square deviation (rmsd) from the crystal structure. Visual inspection of a number of test cases showed that within this limit the generated solutions resemble the native binding mode. (For detailed statistics on ligand poses with rmsd <1.0 Å and <1.5 Å from the crystal structure, see Table 1.) To test the power of a scoring function to discriminate well-docked solutions from obvious artifacts, we checked how often ‘well-docked’ solutions correspond to the best rank. This criterion meets the requirement for virtual screening of large databases: due to the immense data flow, a detailed investigation will be feasible only for the best hits and has to focus on the best-ranked solutions. Assuming that the total statistical preference renders the native protein-ligand pose prominent, the experimental solutions should be ranked best. While applying this most stringent criterion one has to keep in mind that crystal structures of proteins are determined to limited resolution. Accordingly, we relax our criterion and also allow solutions deviating by less than 2.0 Å from the experimentally given structure to be ranked better than the crystal structure. The precision of predicted binding affinities is limited by the accuracy of the experimentally determined ones. For data obtained from different laboratories, an error of one order of magnitude with respect to the binding constant is assumed [50,51], whereas a mean error of perhaps 20% is found for sets of related ligands binding to the same protein, measured in one laboratory by the same person in an assay with unchanged conditions.
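The ‘well-docked on rank 1’ criterion described above can be sketched as follows. The sketch assumes that poses and crystal structure share the same atom ordering and coordinate frame (no superposition, no symmetry correction) and that lower scores are more favorable; all names and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

def top_rank_is_well_docked(poses, crystal, cutoff=2.0):
    """poses: list of (score, coords); the lowest score is taken as rank 1."""
    _, best_coords = min(poses, key=lambda pose: pose[0])
    return rmsd(best_coords, crystal) <= cutoff

# Hypothetical two-atom ligand with one near-native and one displaced pose.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # shifted by 0.5 Å
far = [(4.0, 0.0, 0.0), (5.5, 0.0, 0.0)]   # shifted by 4.0 Å
poses = [(-80.0, near), (-60.0, far)]
result = top_rank_is_well_docked(poses, crystal)
```

Counting how often this check succeeds over a set of complexes gives first-rank success rates of the kind reported in Table 1.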

Correlation of the calculated scores versus rmsd from the crystal structure

Total preferences were calculated (Equation 3) for docking solutions obtained by FlexX. Figure 4 displays the correlations of calculated scores versus the rmsd from the crystal structure for four protein-ligand complexes; the individual scores were normalized to fall between 0 and 100. In all cases, the crystal structure (rmsd = 0) corresponds to the best score. For 1bbp, 1lst and 2ada, the native geometry is well separated from any computed solution.
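The text states only that the scores were normalized to the range 0 to 100. A minimal sketch of one way to do this, assuming simple linear min-max scaling in which the most favorable (lowest) raw score maps to 0, is given below; the raw values are made up.

```python
def normalize_scores(scores):
    """Linearly rescale scores so that the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All poses score identically; map them to the favorable end.
        return [0.0 for _ in scores]
    return [100.0 * (s - lo) / (hi - lo) for s in scores]

# Hypothetical raw scores for four poses of one complex.
raw = [-250.0, -200.0, -150.0, -50.0]
normalized = normalize_scores(raw)
```

Because the scaling is done per complex, normalized scores are comparable between poses of the same ligand but not between different complexes.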


Figure 4. Correlations of scores, normalized to values between 0 and 100, versus the rmsd from the crystal structure for the protein-ligand complexes 1bbp, 1lst, 1rbp, and 2ada are depicted. The percentage of nonpolar (i.e. carbon) ligand atoms varies from 52% for 2ada to 95% for 1rbp, the number of rotatable bonds from 1 (1rbp, 2ada) to 6 (1bbp) and the number of solutions generated by FlexX from 99 (1rbp) to 289 (1lst). Favorable ligand geometries correspond to low scores on the ordinate.

Validation of the approach using two test sets of 91 and 68 protein-ligand complexes

The successful reproduction of some test cases indicates the scope of the method; however, to validate our method rigorously we studied two sets of protein-ligand complexes. Both test sets are taken from the sample used to validate FlexX [52]. The considered ligands cover a broad range of chemical diversity. In the first case (91 complexes), FlexX has already recognized a generated solution with rmsd <2.0 Å on the first rank in 54% of the cases. For the remaining 46%, FlexX also generates a geometry with rmsd <2.0 Å, however, not ranked as best solution. The second test set contains 68 additional complexes. Out of these, for 28 cases a computed solution with an rmsd <2.0 Å is found by FlexX on rank 1.


Figure 5. Accumulated number of complexes as a function of the rmsd from the crystal structure found for ligand poses on rank 1 scored by FlexX (O) and DrugScore (∆), respectively. The number of complexes with the best geometry found on any rank by FlexX is depicted as solid squares.

For 38 remaining cases, FlexX did not generate a pose with rmsd <2.0 Å using default settings. This second set was not involved in the development and parameter adjustment and served as a validation set for the scoring function. All complexes used for validation have been excluded from the database used to derive the probability distributions and statistical preferences.

Recognition of ‘well-docked’ solutions

Figure 5 summarizes the accumulated number of complexes plotted versus the rmsd, with respect to the crystal structure, of the best-ranked ligand pose determined either by FlexX or by DrugScore. In addition, the FlexX solution with the smallest rmsd, disregarding its actual rank, is plotted. It gives an idea of how well an ideal scoring function could perform and shows how often FlexX generates well-docked solutions for the present data set. Apparently, the new scoring function performs significantly better than the one implemented in FlexX. Table 1 gives the percentages of cases found on rank 1 with an rmsd of <1.0, <1.5, <2.0 and >2.0 Å, compared to the best approximating geometry found on any rank. With respect to identifying well-docked solutions on rank 1, FlexX succeeds in 54% of the cases whereas DrugScore detects 73%. For the second test set (68 complexes), out of the 30 satisfactorily docked solutions, FlexX and DrugScore are equally successful (93 and 92%, respectively).

Figure 6. The rank of the crystal structure calculated by DrugScore among all decoy geometries generated by FlexX for each of the 91 protein-ligand complexes of the first test set is shown.

Recognition of the crystal structures

As mentioned above, assuming that the crystallographically determined structures are ‘optimal’ solutions, they should obtain the best score compared to any computer-generated ligand pose. As shown in Figure 6, this is actually the case for 54% of the examples of the first test set (91 complexes). If we relax the criterion to well-docked solutions, in 71% of the cases the crystal structure or a close-by solution is ranked best by DrugScore. Applying the scoring function to the second test set, 65% of all crystal structures are found on rank 1. Here the relaxed conditions are fulfilled in as many as 90% of all cases.


Prediction of binding affinities

To assess the power of DrugScore to estimate binding affinities, a scoring value is calculated (Equation 3) for different sets of protein-ligand complexes taken from the literature [15,16,50]. The obtained values are scaled versus the experimental results (Equation 4), and the squared regression coefficient, the standard deviation and the maximal deviation are determined. Two different tests are performed for comparison. DrugScore is applied to complexes of known X-ray structure and binding affinity. Additionally, to assess how well the near-native binding modes are recognized and their affinities are predicted, a series of ligands is docked with FlexX.
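The statistical descriptors named above can be illustrated with a small sketch. It assumes that the scaling of Equation 4 is a single factor cs (calculated pKi = cs × score) fitted by least squares, that R² is the squared Pearson correlation, and that the standard deviation uses n - 1 in the denominator; these choices, the function name, and the toy data are assumptions, not the published procedure.

```python
import math

def scale_and_evaluate(scores, pki_exp):
    """Fit pKi = cs * score and return (cs, r2, sd, md)."""
    n = len(scores)
    # Least-squares scaling factor for a line through the origin.
    cs = sum(s * p for s, p in zip(scores, pki_exp)) / sum(s * s for s in scores)
    residuals = [p - cs * s for s, p in zip(scores, pki_exp)]

    mean_s = sum(scores) / n
    mean_p = sum(pki_exp) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, pki_exp))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in pki_exp)
    r2 = cov * cov / (var_s * var_p)  # squared Pearson correlation

    sd = math.sqrt(sum(r * r for r in residuals) / (n - 1))  # in pKi units
    md = max(abs(r) for r in residuals)  # maximal deviation in pKi units
    return cs, r2, sd, md

# Hypothetical scores and experimental pKi values (perfectly correlated toy data).
scores = [-100.0, -200.0, -300.0]
pki_exp = [3.0, 6.0, 9.0]
cs, r2, sd, md = scale_and_evaluate(scores, pki_exp)
log_minus_cs = math.log10(-cs)
```

With favorable scores being negative, cs comes out negative, which is why a quantity such as log(-cs) is well defined.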

Predicting binding affinities for data sets of PDB crystal structures

Data sets of PDB protein-ligand complexes are taken from Eldridge et al. [50] and Böhm [15,16], comprising serine proteases, metalloproteases, L-arabinose binding proteins, endothiapepsins and mixed samples composed of different proteins. Similar data have recently been used by Muegge and Martin [33] to compare their scoring function to SCORE1 [15] and SMoG score [31]. Table 2 and Figure 7 summarize the statistics on the performance of DrugScore on six different data sets. The set of serine proteases comprises 16 trypsin and thrombin protein-ligand complexes (Figure 7a). The best correlation, with R² = 0.86, is achieved for this test set. Examples of low binding affinity (pKi values <2) (1bra, 1tni, 1tnj, 1tnk, 1tnl) are predicted to bind tighter by about 1.5 orders of magnitude. The second set, of metalloproteases, contains 15 complexes of carboxypeptidase A, thermolysin, and collagenase (Figure 7b) (R² = 0.70). The largest deviation is obtained for 6tmn, with 3.3 orders of magnitude (see also the section on ‘Visualization of ‘hot spots’ in protein binding pockets’). The third set comprises 11 endothiapepsin complexes. A weak correlation of R² = 0.30 is found. Nevertheless, the weakest binding ligand (PD125754 in 1eed) as well as the strongest one (H-261 in 2er7) are correctly recognized compared to examples with intermediate affinity (pKi values between 6 and 8). Set no. 4 consists of 9 arabinose-binding protein-ligand complexes. Since each of the nine crystal structures contains two epimers of the sugars, scoring values are calculated separately for both of them. However, the computed values for each pair of epimers are nearly identical. Thus, even though different amounts of the α- or β-form of each sugar have been present during experimental affinity determinations, due to the mutarotation equilibrium this fact has no consequences for our computer experiment.
The correlation obtained is the worst of all data sets, with R² = 0.22, although the standard deviation amounts to only 0.75 log units. Including the crystallographically determined water molecules 309, 310 and 311 does not yield any improvement. These

Table 2. Statistical parameters of the correlations between experimentally determined binding affinities and those calculated by DrugScore

Data set                            No. of complexes   pKi range   R²     SD (a)   MD (a)   log(-cs) (b)
Serine proteases                    16                 7           0.86   0.95     1.50     -1.55
Metalloproteases                    15                 10          0.70   1.53     3.32     -1.49
Endothiapepsins                     17                 4           0.30   0.94     1.67     -1.86
Arabinose-binding proteins          18 (9) (c)         3           0.22   0.75     1.30     -1.23
‘Others’ (d)                        17                 8           0.43   1.85     3.39     -1.53
Böhm1998 (e)                        71                 13          0.33   2.21     7.22     -1.48
Böhm1998 (I) (f)                    49                 10          0.44   1.79     4.52     -1.44
Böhm1998 (II) (g)                   46                 10          0.56   1.53     4.36     -1.44
‘Mixed set’ (h)                     55                 10          0.44   1.80     4.27     -1.43
Thrombin/trypsin inhibitors (i)     64                 5           0.48   0.71     1.25     -1.59
Thermolysin (j)                     61                 10          0.35   1.70     4.00     -1.59
Thermolysin (I) (k)                 43                 10          0.43   1.68     3.90     -1.58
Thermolysin (II) (l)                15                 5           0.36   1.53     3.00     -1.58
Thermolysin (III) (m)               14                 5           0.50   1.39     3.27     -1.60

(a) The standard deviation (SD) and the maximum deviation (MD) are given in units of pKi.
(b) cs is calculated as described in the text.
(c) Each epimer found in the crystal structure of 9 arabinose-binding proteins is treated separately.
(d) This set contains complexes of the combined training and test sets of Böhm’s work [15] that were not found in the four data sets above and show a resolution of less than or equal to 2.5 Å.
(e) Only protein-ligand complexes found in the PDB are taken from the training and test set of Böhm’s work [16].
(f) Subset of Böhm1998 that contains only complexes that show a resolution less than or equal to 2.5 Å and comprise ligands with less than or equal to 40 non-hydrogen atoms.
(g) Subset of Böhm1998 (I) excluding 3 outliers: 1cil, 1sbp, 1tnk. See text for explanation.
(h) Subset of both validation sets of DrugScore for recognition of near-native geometries for which inhibition constants were found in the literature. Binding affinities were calculated for the best-ranked ligand geometries generated by FlexX for each ligand.
(i) Set of related inhibitors of thrombin and trypsin, taken from [54], for which ligand geometries are generated by FlexX via docking the compounds into the protein structure taken from Obst et al. [55] or from 1pph.
(j) Sixty-one thermolysin inhibitors taken from the training set reported in [56] for which ligand geometries are generated by FlexX via docking the compounds into the protein structure 1tlp.
(k) Subset of ‘Thermolysin’ containing only those ligands for which a reasonable geometry was generated by FlexX.
(l) Fifteen thermolysin inhibitors taken from the test set in [56] for which ligand geometries were generated as described in (j).
(m) Subset of Thermolysin (II) excluding the outlier ZGPLNH2.


Figure 7. Correlation plots of experimentally determined pKi values versus calculated ones by DrugScore for four different sets of X-ray protein-ligand complexes as defined in the text: serine proteases (a), metalloproteases (b), ‘others’ (c), and Böhm1998 (I/II) (d). For the latter, the three outliers 1cil, 1sbp, 1tnk are highlighted (circles). Together with the ideal correlation line, deviations by ±1 log units are depicted.

waters are considered to contribute to the specificity of the protein towards the binding of L-arabinose, D-fucose and D-galactose [53]. Data set 5 (‘others’) comprises 17 protein-ligand complexes formed with different proteins. These data were also used by Muegge and Martin (Figure 7c). Including cofactors as parts of the protein, we obtain a correlation of R² = 0.43. Obviously, three examples are predicted too high: the renin complex 1rne as well as the HIV proteases 4hvp and 4phv, with 51, 54 and

46 non-H atoms in the ligands, respectively. Since DrugScore has been parameterized only for ligands composed of less than 50 non-hydrogen atoms, which is considered to be an upper limit for the size of drug-like molecules, information concerning ligands with a size close to or beyond this limit could be regarded as incompletely covered in our analysis. Finally, a composed data set has been selected considering the X-ray data used by Böhm in his SCORE2 study [16] (Figure 7d). Any modeled complexes used by Böhm have been discarded. This set (Böhm1998) of 71 complexes yields a correlation of R² = 0.33. Excluding all complexes with a resolution >2.5 Å or possessing ligands with more than 40 non-H atoms (vide supra) improved the correlation to R² = 0.44 (Böhm1998 (I)). Three remaining outliers can be explained: in 1sbp the ligand is a sulfate, thus falling below the lower limit of 6 non-hydrogen atoms used to parameterize DrugScore. In 1cil (inhibitor ETS bound to carbonic anhydrase II), the sulfonamide nitrogen bound to the zinc ion is deprotonated and yields a strong ionic interaction with the metal. DrugScore treats this interaction as a contact between a normal amide nitrogen and a zinc, thus underestimating its contribution to the binding. Finally, the ligand in 1tnk shows quite unusual intramolecular dimensions, supposedly resulting from some deficiencies in the final refinement process. Removing these three entries results in a significantly improved regression with R² = 0.56 and SD = 1.53 (Böhm1998 (II)). The performance of DrugScore is compared to other scoring functions in terms of some statistical descriptors determined for the first five test sets (data for PMFScore, SCORE1, SMoGScore taken from [33]; for ProteusScore taken from [50]). Figure 8 depicts the standard deviation for each of these test sets; for arabinose-binding proteins, Muegge and Martin report a value of 69.7 in the case of SCORE1 while there are no results given by Eldridge et al.
for the ‘others’ test set. Compared to both knowledge-based scoring functions (PMFScore and SMoGScore), DrugScore performs better, resulting in a lower standard deviation for all four test sets that consider one particular protein target. For the mixed data set (‘others’) PMFScore performs better. Comparing the regression-based scoring functions (SCORE1 and ProteusScore) with DrugScore, the latter is superior to SCORE1 in all reported cases while ProteusScore obviously performs better in the case of the arabinose-binding proteins.

Figure 8. Comparison of different currently developed scoring functions with DrugScore’s performance in predicting binding affinities in terms of their standard deviation. Five sets of X-ray protein-ligand complexes are used as defined in the text: serine proteases, metalloproteases, endothiapepsins, arabinose-binding proteins, ‘others’. Data for PMFScore, SCORE1, and SMoGScore are taken from Muegge and Martin [33], for ProteusScore from Eldridge et al. [50] (except for test set ‘others’). In the case of arabinose-binding proteins scored by SCORE1, a value of 69.7 is reported [33].

Predicting binding affinities for ligand geometries generated with FlexX

The examples used so far are all based on experimentally determined binding geometries. Obviously, the scoring function performs satisfactorily for experimentally given geometries. However, in virtual screening, affinity predictions are attempted based on modeled binding modes. This is in particular the case if a docking tool is applied. Here, the scoring function has to single out the most likely geometry from a large sample of generated geometries. To test the performance of DrugScore with respect to virtual screening, we calculated scoring values for ligand binding geometries generated by FlexX for (a) a sample of 55 complexes taken from the combined validation sets used to test DrugScore’s recognition of well-docked solutions, (b) a congeneric series of 32 inhibitors [54] docked into thrombin [55] or trypsin (1pph), as well as (c) a series of 61 and 15 thermolysin inhibitors (taken from the training and test sets of [56], respectively) docked into the protein structure 1tlp. For the sample (a), a standard deviation of 1.8 log units and a squared correlation coefficient of 0.44 (data not shown) is achieved. For comparison,


Figure 9. Correlation of experimentally determined pKi values versus calculated ones by DrugScore for (a) a set of thrombin and trypsin inhibitors docked into thrombin (taken from [55]) and trypsin (taken from 1pph) and (b) a set of thermolysin inhibitors [56] docked into the protein structure 1tlp, respectively, by FlexX. In the latter case, only those ligands are displayed for which FlexX finds a reasonable geometry (Thermolysin (I)). Together with the ideal correlation line, deviations by ±1 log units are depicted.


Figure 10. Intrinsic geometrical constraints reflected by the atom pair preferences of O.2-O.3 and C.2-O.3. Given the minima of the statistical pair preferences (O.2-O.3: 2.55 Å; C.2-O.3: 3.45 Å) and the bond length (C.2-O.2: 1.22 Å), the C.2-O.2-O.3 angle is calculated to be 128°.

the squared correlation coefficient for scoring values calculated for the crystal geometries of this set amounts to 0.34. Thus, in the present case the affinity prediction based on computer-generated geometries is more precise than that using the crystal coordinates. However, we anticipate that this is not usually the case. The affinity predictions for the thrombin and trypsin inhibitors (sample (b), Figure 9) deviate from the experimental pKi values by 0.7 log units and yield a squared correlation coefficient of 0.56. The predictions are of the same quality for thrombin and trypsin. In the case of the thermolysin inhibitors (c), a squared correlation coefficient of 0.35 is calculated for the total set of 61 ligands (‘Thermolysin’) (data not shown). If the set is restricted to only those ligands for which FlexX predicts at least one geometry with less than 3 Å rmsd from modeled reference structures (using crystal geometries as templates for the modeling), the R² value increases to 0.42 (standard deviation: 1.68 log units) (Thermolysin (I), 43 cases, Figure 9). Considering the set of 15 thermolysin inhibitors (Thermolysin (II)), an R² value of 0.36 is calculated. Excluding ZGPLNH2 as an outlier from the latter set (Thermolysin (III)), a squared correlation coefficient of 0.50 (standard deviation: 1.39) is found.

Implicit directionality of spherical-symmetric pair-potentials

Since the compiled preferences for a given atom pair are calculated solely as a function of the mutual distance, any directional features can only be implicitly contained in the derived pair-potentials. As an example, the hydrogen bond between a carbonyl group and an O.3-type oxygen should be considered

(Figure 10). The most favorable interaction for O.2-O.3 occurs at a mutual distance of 2.55 Å, for the C.2-O.3 interaction at 3.45 Å. Starting with these values and assuming a C.2-O.2 bond length of 1.22 Å, a C.2-O.2-O.3 angle of 128° is calculated, well in agreement with the expected bond angle. A similar orientational preference is observed if representative fragments stored in ISOSTAR are consulted [57]. Additional contact preferences formed by the neighboring atoms will further constrain the spatial arrangement of a specific directional interaction.
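The angle quoted above follows from the three distances by the law of cosines. The short sketch below reproduces the calculation from the values given in the text; the function name is illustrative.

```python
import math

def angle_at_middle_atom(d_ab, d_bc, d_ac):
    """Angle A-B-C in degrees from the three interatomic distances (law of cosines)."""
    cos_theta = (d_ab ** 2 + d_bc ** 2 - d_ac ** 2) / (2.0 * d_ab * d_bc)
    return math.degrees(math.acos(cos_theta))

# C.2-O.2 bond length, O.2-O.3 minimum, C.2-O.3 minimum (all in Å, from the text).
angle = angle_at_middle_atom(1.22, 2.55, 3.45)
```

With these inputs the angle comes out at roughly 129°, in line with the value of about 128° given in the text and in Figure 10.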

Visualization of ‘hot spots’ in protein binding pockets

A regularly spaced grid is generated inside the binding site and scoring values are calculated at every grid point using different ligand-atom types. The results are contoured individually for each atom type. Arabinose-binding protein (1abe) binds both epimers of arabinose in preference to other sugar ligands [53]. The isocontour surface for C.3 (comprising values 10% above the minimum) encompasses all ligand-atom positions of this type as found in the crystal structure (Figure 11). The O.3 contour (10% level) depicts three favorable regions in space; all are occupied by hydroxyl groups of the ligand in the crystal structure. Interestingly, the contour for oxygen O-1 extends over the range where the oxygens of the α- and β-epimer actually bind. O-2 and O-5 do not coincide with regions contoured as most favorable. This can be explained by the fact that O-2 is oriented towards the solvent, so that no favorable interactions with the protein can be determined. Contours encompassing scoring values calculated for C.3, O.3, and N.am (10% level) inside the binding pocket of thermolysin (5tmn) are displayed together with the phosphorus analogue (ZGPLL) of the peptide carbobenzoxy-Gly-Leu-Leu (Figure 11). For the phosphonamidate, Bartlett and Marlowe [58] reported a Ki value of 9.1 nM while the phosphonate isomer (substitution of P-NH by P-O: ZGP(O)LL) resulted in 990 times weaker binding. In contrast, for the phosphinate analogue (substitution of P-NH by P-CH2: ZGP(C)LL) an inhibition constant of 180 nM is reported [59], only 20 times weaker than the phosphonamidate. These findings were attributed mainly to solvation effects and the potential to form a hydrogen bond between the ligand atom adjacent to phosphorus and the carbonyl oxygen of Ala 113. Interestingly, DrugScore highlights favorable binding both for an N.am and a C.3 atom at the position of the ligand atom adjacent to P. O.3 is not favorable at this position.
However, for this atom type, promising regions coincide with the terminal oxygens of the –PO₂⁻– moiety binding to zinc. The contours described are displayed at a 10% level for each atom type, thus resulting in a relative description. However, a per-atom contribution to the total score using C.3, O.3, and N.am, respectively, yields an absolute affinity description. The values for C.3 (-11.11) and N.am (-9.33) at the position adjacent to P deviate by only 15%, whereas that of an O.3 (-4.34) at the same position contributes much less to binding affinity. As a consequence, the phosphonamidate ZGPLL (5tmn) and the phosphinate ZGP(C)LL (modeled from 5tmn by N.am/C.3 exchange) obtain comparable total scores while the phosphonate ZGP(O)LL (6tmn) is predicted to bind weakest. The rank ordering is qualitatively predicted correctly; however, the phosphonate is still predicted too high in affinity (vide supra). Supposedly, this can be attributed to the simultaneous consideration of hydroxyl and ether oxygens in the general O.3 atom type used in our compilation of the statistical potentials. Accordingly, in our potentials, the unfavorable ester-oxygen to carbonyl-oxygen interactions are outweighed by other, more favorable O.3/O.2 contacts.

Figure 11. Isocontour surfaces encompassing values 10% above the global minimum of all scoring values on a 0.5 Å spaced grid for several atom types are depicted. Grid values are calculated by DrugScore inside the binding pockets of 1abe (top) and 5tmn (bottom) for different ligand probe atoms. In the case of 1abe, the α- and β-epimers of arabinose are shown. The surfaces are color-coded as: dark blue (sp3-hybridized oxygen), yellow (aliphatic carbon) [1abe] and cyan (sp3-hybridized oxygen), yellow (amide nitrogen), magenta (aliphatic carbon) [5tmn]. For 5tmn, the arrow indicates the phosphonamidate nitrogen of ZGPLL, which is substituted by oxygen in the phosphonate ZGP(O)LL and by carbon in the phosphinate ZGP(C)LL.
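The grid-based selection of ‘hot spots’ can be sketched as follows. The sketch assumes that ‘10% above the minimum’ means all grid points whose score lies within 10% of the magnitude of the global minimum; the scoring function used here is a hypothetical stand-in (a single attractive well), not DrugScore itself, and all names are illustrative.

```python
import itertools
import math

def build_grid(origin, n_points, spacing=0.5):
    """Regular 3D grid of points (in Å) starting at origin."""
    ox, oy, oz = origin
    nx, ny, nz = n_points
    return [
        (ox + i * spacing, oy + j * spacing, oz + k * spacing)
        for i, j, k in itertools.product(range(nx), range(ny), range(nz))
    ]

def hot_spots(grid, score_at, fraction=0.10):
    """Grid points scoring within `fraction` of the global minimum (assumed reading)."""
    values = [score_at(point) for point in grid]
    v_min = min(values)
    threshold = v_min + fraction * abs(v_min)
    return [point for point, value in zip(grid, values) if value <= threshold]

def toy_score(point):
    """Hypothetical probe-atom score: one favorable well centred at (1, 1, 1)."""
    d2 = sum((c - 1.0) ** 2 for c in point)
    return -10.0 * math.exp(-d2)

grid = build_grid((0, 0, 0), (5, 5, 5))  # 5 x 5 x 5 points, 0.5 Å apart
spots = hot_spots(grid, toy_score)
```

In a real application, one such grid would be evaluated per ligand atom type and the selected points contoured separately, as in Figure 11.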

Correspondence of ‘hot spots’ and observed ligand atom types

In order to determine how often ‘hot spots’ detected by DrugScore actually match ligand atom types, 159 crystallographically determined protein-ligand complexes were analyzed. In principle, DrugScore can be computed analytically at every position of the binding site, and accordingly also at the crystallographically determined positions of the ligands under investigation. However, to use a more general approach, in particular with respect to the analysis of unoccupied binding sites, we estimate the scoring by extrapolating from precalculated values assigned to the intersections of a 0.5 Å grid. In a first step, we focus on fully buried ligand atoms only and test how frequently these atoms fall next to local minima in DrugScore for the five atom types C.3, O.3, O.2, O.co2, and N.3. By selecting these atom types, we intended to consider a small set of typical representatives for distinct types of interactions (hydrophobic, H-bond donors/acceptors, positively/negatively charged). In addition, this choice is close to the one used for the validation of SuperStar [38] and allows a direct comparison (vide infra). The results are summarized in Table 3. Assuming equal weights for all five atom types and full coincidence of ligand atomic positions with local minima of DrugScore, in the worst case a chance prediction of 20% would result. However, significantly higher prediction rates are observed. In particular, aliphatic carbon and amino nitrogen atoms (also implicitly including ammonium groups) are correctly predicted in 92 and 73% of all cases, respectively, while a carbonyl oxygen atom is recognized only in 27% of the cases. For O.3 and O.co2 types, the rates amount to 37 and 46%, respectively. In total, an overall prediction rate of 74% is achieved. Grouping similar atom types together (‘hydrophobic’: C.3; ‘hydrophilic’: O.3, O.2, O.co2, N.3), in 86% of all cases the correct or a similar atom type is recognized.
Assuming that most of the N.3-type atoms (amino

Table 3. Statistics on the prediction rates of buried ligand atom types suggested by DrugScore compared to atom types observed in crystallographically determined protein-ligand complexes at the same spatial positions

Actual        No.     Predicted                                         Correct   H.phob./       Sim.
                      C.3 (a)  O.3 (a)  O.2 (a)  O.co2 (a)  N.3 (a)               h.phil. (b)    interact. (c)
C.3 (a)       745     92%      3%       <1%      <1%        4%                    92%            92%
O.3 (a)       168     40%      37%      5%       7%         11%                   60%            60%
O.2 (a)       124     18%      26%      27%      23%        6%                    82%            76%
O.co2 (a)     67      6%       27%      17%      44%        6%                    94%            88%
N.3 (a)       15      20%      7%       0%       0%         73%                   80%            80%
Overall       1119                                                      74%       86%            85%

(a) Atom types are according to the SYBYL notation: aliphatic carbon (C.3), sp3-hybridized oxygen (O.3), carbonyl oxygen (O.2), carboxyl(ate) oxygen (O.co2), (protonated) amino nitrogen (N.3).
(b) Atoms are grouped separating hydrophobic and hydrophilic properties: C.3 versus O.3/O.2/O.co2/N.3.
(c) Atoms are grouped showing similar type of interaction (N.3 is considered to be protonated): C.3 (hydrophobic); O.3, N.3 (hydrogen-bond donors); O.3, O.2, O.co2 (hydrogen-bond acceptors).

nitrogen) are protonated in proteins, atom types can be grouped according to the possible interaction type of the atom under investigation (C.3 (hydrophobic); O.3 (donor + acceptor); O.2, O.co2 (only acceptor); N.3 (only donor)). With this classification, in 85% of all cases the correct interaction type is predicted. The recently described tool SuperStar [38] considers only four distinct probes (aliphatic carbon; hydroxyl and carbonyl oxygen; protonated amino nitrogen) for a similar validation study. Following the arguments described above, a chance prediction of 25% has to be assumed. For a test set of 122 protein-ligand complexes, SuperStar detects the correct atom type in 82% of the cases if only solvent-inaccessible ligand atoms are considered. This figure increases to 90% if ‘similar’ atom types are admitted. We included the carboxylate oxygen to consider an atom type most likely bearing a negative charge. It is the counterpart to an amino nitrogen, which is most likely positively charged. Indeed, if our approach suggests a carboxylate oxygen as the most favorable type, an amino nitrogen is the atom type found least frequently at this position by experiment. The same holds for the reverse.
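The overall prediction rate and the chance level can be recomputed from the atom counts and the diagonal of Table 3, as in the following sketch. It uses only the rounded percentages printed in the table; the variable names are illustrative.

```python
# Number of buried ligand atoms per observed atom type (Table 3, column "No.").
counts = {"C.3": 745, "O.3": 168, "O.2": 124, "O.co2": 67, "N.3": 15}

# Percentage of atoms of each type for which the same type was predicted
# (diagonal of the Actual/Predicted matrix in Table 3).
correct_pct = {"C.3": 92, "O.3": 37, "O.2": 27, "O.co2": 44, "N.3": 73}

total_atoms = sum(counts.values())

# Count-weighted average of the per-type rates.
overall_pct = sum(counts[t] * correct_pct[t] for t in counts) / total_atoms

# Chance level if each of the probe types were equally likely to be suggested.
chance_pct = 100.0 / len(counts)
```

From the rounded table entries the weighted average comes out slightly above 73%, close to the reported overall rate of 74%; the small difference presumably reflects rounding of the tabulated percentages. Because C.3 atoms make up about two thirds of the sample, their high recognition rate dominates the overall figure.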


Discussion and conclusions

In this study, distance-dependent pair preferences and SAS-dependent singlet preferences are derived from crystallographically determined protein-ligand complexes. The scoring function DrugScore incorporates both terms and shows very promising results. It discriminates satisfactorily between well-docked (rmsd <2.0 Å) ligand binding modes and largely deviating ones generated by the docking tool FlexX. This is demonstrated for two test sets comprising 91 and 68 complexes, respectively. A substantial improvement of 35% is achieved compared to the original FlexX scoring. DrugScore’s ability to predict binding affinities is assessed by correlating experimentally determined pKi values with the computed scores. Protein-ligand complexes taken from the PDB as well as sets of docked ligands were investigated. Compared to currently applied scoring functions, DrugScore reveals lower standard deviations. Most remarkably, the composite picture of spherical pair-potentials in DrugScore implicitly contains information about the directionality of interactions. It possesses predictive power to suggest positions in space that are most favorable for particular ligand-atom types. Knowledge-based approaches are assumed to be general since they implicitly incorporate even those effects that are not yet fully understood. Owing to its mean-field character, the conversion of structural database information into statistical preferences takes into account entropy effects arising from cooperativity and changes of solvation. Moreover, less frequently populated states are considered with lower statistical preferences, thus implicitly penalizing computer-generated artifacts. Additionally, since no explicit training set is used, in contrast to the derivation of e.g. regression-based scoring functions, our scoring function should be generally applicable. Hydrogen atoms are not explicitly considered in our scoring function.
Most complexes in the PDB either lack or contain only force-field assumed hydrogen atoms. However, in particular the positions of polar hydrogen atoms strongly depend on the influences of their molecular environment. Changes of the electrostatic field of a protein might result in substantial p Ka shifts of ionizable groups upon ligand binding. In consequence, defining protonation states a priori e.g. during a docking experiment is by no means straightforward. Although at first glance, the neglect of H-atom positions appears to imply the loss of information about the directionality of polar interactions, the composite consideration of many-fold pair-preferences in a compact molecular environment recovers these features (Figure 10). By visualization of calculated hot spots, we definitely demonstrated this directionality in protein-ligand interactions to be implicitly included in our

141 distance-dependent pair-potentials. Similar considerations about anisotropic interactions resulting from the summation of individually isotropic contributions led to the correct description of e.g. directional hydrogen bonds by taking into account only Lennard-Jones-type and electrostatic interactions in force fields [60]. We believe that this important property of our approach can be attributed to the comparatively short upper limit of 6 Å considered during the compilation ofour potentials. While binding affinity is largely determined by the amount of buried non-polar surface, the specificity of ligand binding is mainly attributed to directional interactions such as hydrogen bonds [48]. The graphical display of a knowledge-based scoring function in terms of ‘hot spots’ in a binding pocket suggests further applications. Highlighting the regions of a binding site where a particular type of ligand atom appears to be most favorable allows one to use them as an interactive design tool. Nota bene, these ‘hot spots’ do not simply represent regions offavorable energy but also include entropical contributions. Additionally, this information should be used in a docking tool to drive the initial ligand placement. For comparison, data in the Cambridge Structural Database (CSD) [61] can be used to derive statistical preferences of intermolecular interactions [57]. They allow to develop a scoring function to discriminate different computer-generated crystal packings [62]. Furthermore, additional data call upon a more sophisticated consideration of atom types. However, mixing data from the PDB with those from the CSD involves the following fundamental problem: protein-ligand complexes are usually crystallized from water whereas the overwhelming part of organic small molecules are crystallized from organic solvents. 
As a consequence, for a quantitative correlation as anticipated in our analysis, the influence of the hydrophobic effect is expected to be smaller in the data derived from the CSD than in those from the PDB. This influence was first recognized by Verdonk et al. during the development of Superstar [38].

The recent studies of Verkhivker et al. [28], Muegge et al. [33] and Mitchell et al. [35] use a similar formalism to derive potentials. These approaches are difficult to compare since hardly any of these studies examines the power of the scoring function to single out the native pose. In our opinion, this is the most crucial prerequisite prior to an estimation of binding affinities in virtual screening. A scoring function demonstrated to operate satisfactorily on crystal structures does not necessarily handle computer-generated, often artificial and incorrect binding modes equally well. To compare different scoring functions with respect to the prediction of absolute binding affinities, one has to remember that R² values (but not the standard deviations) depend heavily on the composition of the data sets considered. Accordingly, we compared DrugScore’s performance with that of other scoring functions in terms of the standard deviations achieved. Nevertheless, the trends observed in Figure 8 are similarly reflected once the R² values are computed, since identical data sets were used for comparison. Notably, except for the mixed set (‘others’), DrugScore performs better than the other available knowledge-based approaches. Presumably, for virtual screening applications, it is more important to predict correctly the binding affinities of different ligands with respect to one selected protein than to rank correctly mixed sets of various protein-ligand complexes.

In this study, the standard deviation for the docked thrombin and trypsin inhibitors falls below one log unit. This demonstrates DrugScore’s power to predict binding energies for computer-generated ligand geometries. Nevertheless, since the measured pKi values are assumed to be affected by an even smaller error, the experimental accuracy is not yet matched. The larger standard deviations of 1.5 to 1.7 log units in the case of the docked thermolysin inhibitors have to be discussed in view of the presumably much higher scatter of the experimental data, since these data were collected from several different sources with differing assay conditions. In addition, due to the increasing conformational complexity of the ligands, FlexX does not detect reasonable geometries in all cases. Scoring unlikely geometries, however, cannot be expected to predict affinity reliably.

Currently, DrugScore is being implemented in FlexX. We expect an improvement of the incremental ligand build-up and placement procedure, mainly by reducing the number of generated solutions. With respect to the affinity predictions, we expect further enhancement once water molecules are included in our considerations.
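The dependence of the correlation measure on data-set composition can be illustrated with a small sketch. It assumes the coefficient of determination defined as 1 - SS_res/SS_tot and a root-mean-square deviation as the "standard deviation"; the study may have used slightly different definitions, and the pKi values below are invented for illustration only.

```python
import math


def r_squared_and_sd(observed, predicted):
    """Return (R^2, rms deviation) for paired observed/predicted values.

    R^2 is taken as 1 - SS_res / SS_tot (an assumption of this sketch).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Two invented data sets with identical prediction errors (+/- 0.5 log units)
# but different spreads of the experimental values.
narrow_obs, narrow_pred = [5.0, 6.0, 7.0, 8.0], [5.5, 5.5, 7.5, 7.5]
wide_obs, wide_pred = [2.0, 5.0, 8.0, 11.0], [2.5, 4.5, 8.5, 10.5]

r2_narrow, sd_narrow = r_squared_and_sd(narrow_obs, narrow_pred)
r2_wide, sd_wide = r_squared_and_sd(wide_obs, wide_pred)
```

Both sets have the same rms deviation, yet the set spanning a wider affinity range obtains the higher R², which is why standard deviations are the safer basis for comparing scoring functions across different data sets.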

Acknowledgements

The present study was funded as part of the RELIMO-Project (grant no. 03 11619) by the German Federal Ministry for Education, Science, Research, and Technology (BMBF) and by a grant of the ESCOM science foundation to H.G. for a stay in the laboratory of Prof. F. Diederich. H.G. gratefully acknowledges the kind hospitality in Zürich.

References

1. Muller, K., Perspect. Drug Discov. Design, 3 (1995) v.
2. Walters, W.P., Stahl, M.T. and Murcko, M.A., Drug Discov. Today, 3 (1998) 160.
3. Van Drie, J.H. and Lajiness, M.S., Drug Discov. Today, 3 (1998) 274.
4. Kubinyi, H., Curr. Opin. Drug Discov. Develop., 1 (1998) 4.
5. Lengauer, T. and Rarey, M., Curr. Opin. Struct. Biol., 6 (1996) 402.
6. Kuntz, I.D., Meng, E.C. and Shoichet, B.K., Acc. Chem. Res., 27 (1994) 117.
7. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
8. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E., J. Mol. Biol., 161 (1982) 269.
9. Jones, G., Willett, P., Glen, R.C., Leach, A.R. and Taylor, R., J. Mol. Biol., 267 (1997) 727.
10. Dixon, J.S., Proteins, Suppl. 1 (1997) 198.
11. Beveridge, D.L. and DiCapua, F.M., Annu. Rev. Biophys. Biophys. Chem., 18 (1989) 431.
12. Kollman, P., Chem. Rev., 93 (1993) 2395.
13. Kollman, P.A., Acc. Chem. Res., 29 (1996) 461.
14. Dill, K.A., J. Biol. Chem., 272 (1997) 701.
15. Böhm, H.J., J. Comput.-Aided Mol. Design, 8 (1994) 243.
16. Böhm, H.J., J. Comput.-Aided Mol. Design, 12 (1998) 309.
17. Jain, A.N., J. Comput.-Aided Mol. Design, 10 (1996) 427.
18. Murray, C.W., Auton, T.R. and Eldridge, M.D., J. Comput.-Aided Mol. Design, 12 (1998) 503.
19. Rose, P.W., Scoring methods in ligand design, Proceedings of 2nd UCSF Course in Computer-Aided Molecular Design, San Francisco, CA, 1997.
20. Head, R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green, S.M. and Marshall, G.R., J. Am. Chem. Soc., 118 (1996) 3959.
21. Stahl, M. and Böhm, H.-J., J. Mol. Graph. Model., 16 (1998) 121.
22. Vajda, S., Sippl, M. and Novotny, J., Curr. Opin. Struct. Biol., 7 (1997) 222.
23. Jernigan, R.L. and Bahar, I., Curr. Opin. Struct. Biol., 6 (1996) 195.
24. Torda, A.E., Curr. Opin. Struct. Biol., 7 (1997) 200.
25. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M.J. and Thornton, J.M., J. Comput. Chem., 20 (1999) 1177.
26. Wallqvist, A. and Covell, D.G., Proteins, 25 (1996) 403.
27. Wallqvist, A., Jernigan, R.L. and Covell, D.G., Protein Sci., 4 (1995) 1881.
28. Verkhivker, G., Appelt, K., Freer, S.T. and Villafranca, J.E., Protein Eng., 8 (1995) 677.
29. Sharp, K.A., Nicholls, A., Friedman, R. and Honig, B., Biochemistry, 30 (1991) 9686.
30. Pickett, S.D. and Sternberg, M.J., J. Mol. Biol., 231 (1993) 825.
31. DeWitte, R.S. and Shakhnovich, E.I., J. Am. Chem. Soc., 118 (1996) 11733.
32. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer Jr., E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M., J. Mol. Biol., 112 (1977) 535.
33. Muegge, I. and Martin, Y.C., J. Med. Chem., 42 (1999) 791.
34. Muegge, I., Martin, Y.C., Hajduk, P.J. and Fesik, S.W., J. Med. Chem., 42 (1999) 2498.
35. Mitchell, J.B.O., Laskowski, R.A., Alex, A. and Thornton, J.M., J. Comput. Chem., 20 (1999) 1165.
36. Gohlke, H., Hendlich, M. and Klebe, G., J. Mol. Biol., 295 (2000) 337.
37. Hendlich, M., Acta Crystallogr., D 54 (1998) 1178.
38. Verdonk, M.L., Cole, J.C. and Taylor, R., J. Mol. Biol., 289 (1999) 1093.
39. Böhm, H.-J. and Klebe, G., Angew. Chem. Int. Ed. Engl., 35 (1996) 2566.
40. Sippl, M.J., Curr. Opin. Struct. Biol., 5 (1995) 229.
41. Sippl, M.J., J. Mol. Biol., 213 (1990) 859.
42. Sippl, M.J., J. Comput.-Aided Mol. Design, 7 (1993) 473.
43. Godzik, A., Kolinski, A. and Skolnick, J., Protein Sci., 4 (1995) 2107.
44. Miyazawa, S. and Jernigan, R.L., Proteins, 34 (1999) 49.
45. Koehl, P. and Delarue, M., Proteins, 20 (1994) 264.
46. Testa, B., Carrupt, P.A., Gaillard, P., Billois, F. and Weber, P., Pharm. Res., 13 (1996) 335.
47. SYBYL, Tripos Inc., St. Louis, MO.
48. Davis, A.M. and Teague, S.J., Angew. Chem. Int. Ed. Engl., 38 (1999) 736.
49. Burley, S.K. and Petsko, G.A., Science, 229 (1985) 23.
50. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V. and Mee, R.P., J. Comput.-Aided Mol. Design, 11 (1997) 425.
51. Hosur, M.V., Bhat, T.N., Kempf, D.J., Baldwin, E.T., Liu, B., Gulnik, S., Wideburg, N.E., Norbeck, D.W., Appelt, K. and Erickson, J.W., J. Am. Chem. Soc., 116 (1994) 847.
52. Kramer, B., Rarey, M. and Lengauer, T., Proteins, 37 (1999) 145.
53. Quiocho, F.A., Wilson, D.K. and Vyas, N.K., Nature, 340 (1989) 404.
54. Obst, U., De novo-Design und Synthese neuartiger, nichtpeptidischer Thrombin-Inhibitoren, Ph.D. Thesis, ETH Zurich, Zurich, 1997.
55. Obst, U., Banner, D.W., Weber, L. and Diederich, F., Chem. Biol., 4 (1997) 287.
56. De Priest, S.A., Mayer, D., Naylor, C.B. and Marshall, G.R., J. Am. Chem. Soc., 115 (1993) 5372.
57. Bruno, I.J., Cole, J.C., Lommerse, J.P., Rowland, R.S., Taylor, R. and Verdonk, M.L., J. Comput.-Aided Mol. Design, 11 (1997) 525.
58. Bartlett, P.A. and Marlowe, C.K., Science, 235 (1987) 569.
59. Grobelny, D., Goli, U.B. and Galardy, R.E., Biochemistry, 28 (1989) 4948.
60. Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S. and Weiner, P., J. Am. Chem. Soc., 106 (1984) 765.
61. Allen, F.H., Davies, J.E., Galloy, J.J., Johnson, O., Kennard, O., Macrae, C.F., Mitchell, E.M., Mitchell, G.F., Smith, J.M. and Watson, D.G., J. Chem. Inf. Comput. Sci., 31 (1991) 187.
62. Hofmann, D.W.M. and Lengauer, T., J. Mol. Model., 4 (1998) 132.

Perspectives in Drug Discovery and Design, 20: 145–169, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Hydrophobicity maps and docking of molecular fragments with solvation

NICOLAS MAJEUX, MARCO SCARSI, CATHERINE TENETTE-SOUAILLE and AMEDEO CAFLISCH*

Department of Biochemistry, University of Zürich, Winterthurerstraße 190, CH-8057 Zürich, Switzerland

Summary. Two methods for structure-based computational ligand design are reviewed. Hydrophobicity maps make it possible to estimate quantitatively and display graphically the propensity of nonpolar groups to bind at the surface of a protein target [Scarsi et al., Proteins Struct. Funct. Genet., 37 (1999) 565]. The program SEED (Solvation Energy for Exhaustive Docking) finds optimal positions and orientations of nonpolar fragments using the hydrophobicity maps, while polar fragments are docked so that they form at least one hydrogen bond with the protein [Majeux et al., Proteins Struct. Funct. Genet., 37 (1999) 88]. An efficient evaluation of the binding energy, including continuum electrostatic solvation, makes it possible to dock a library of 100 fragments into a 25-residue binding site in about five hours on a personal computer. Applications to thrombin, a key enzyme in the blood coagulation cascade, and the p38 mitogen-activated protein kinase, which is a target for the treatment of inflammatory and neurodegenerative diseases, are presented. The role of the hydrophobicity maps and structure-based docking of a fragment library in exploiting genomes to design drugs is addressed.

Key words: continuum electrostatics, docking of fragment library, generalized Born approximation, hydrophobicity map, molecular surface, p38 MAP kinase, SEED, solvation, thrombin

Introduction

Hydrophobicity is an important factor in molecular recognition [1–4] and the accurate prediction of the binding modes of nonpolar molecules to proteins in aqueous solvent is useful for ligand docking and drug design [5]. We have recently developed an approach to calculate and visualize the hydrophobicity at the surface of a protein (hydrophobicity maps) [6]. It is based on the evaluation of the nonpolar energy and electrostatic desolvation of the receptor with a continuum model. These energy contributions determine the binding modes of a nonpolar compound to the hydrophobic surface regions of a receptor. Electrostatic interactions between ligand and receptor do not play a significant role in the case of a nonpolar ligand.

Computer programs for structure-based ligand design are useful tools for de novo design and lead modification [7–10]. The combinatorial strategy chosen for structure-based ligand design consists of three parts: the docking of molecular fragments, the connection of the docked fragments by combinatorial principles to generate candidate ligands, and the estimation of (relative) binding affinities [11]. The docking approach implemented in the program SEED determines optimal positions and orientations of small to medium-size molecular fragments in the binding site of an enzyme or receptor [12]. SEED docks polar fragments so that at least one hydrogen bond with optimal distance to a protein polar group is made. The hydrophobicity maps are used for the docking of apolar fragments. Our numerical continuum electrostatic methodology [13,14] and ad hoc look-up tables are employed to evaluate efficiently the protein and fragment desolvation upon binding and the screened electrostatic interaction. The fragments are then connected in a combinatorial way by the program CCLD [15]. For the final evaluation of candidate ligands, which is not discussed in this article, one could use a multi-layer scoring system that utilizes more than one binding affinity estimation method [16]. A recent review on computational approaches for drug design contains a detailed discussion of QSAR methods, structure-based docking and design programs, as well as a complete recapitulation of the available techniques to estimate (relative) binding affinities, i.e., from knowledge-based scoring functions to molecular dynamics-based free energy techniques [17]. The reader interested in implicit solvation models is referred to two recent and comprehensive reviews [18,19].

* To whom correspondence should be addressed. E-mail: [email protected]

Hydrophobicity maps

Methods

Hydrophobicity maps are a graphical representation of the binding energy of a nonpolar probe sphere rolling over the surface of the receptor. The binding energy includes both the electrostatic and nonpolar contributions to the association of a hydrophobic compound at the surface of a receptor. A continuum approach is used for the electrostatic component, whereas the van der Waals interaction describes the nonpolar contribution. The binding energy is displayed by color-rendering on the surface of the receptor. This yields a precise visualization of the surface hydrophobicity as well as a clear distinction between hydrophobic and hydrophilic zones in close proximity.


Binding energy of a nonpolar probe sphere at the surface of a receptor

The solvent accessible surface (SAS) is spanned by the center of a probe sphere rolling over the van der Waals surface of a molecule [20]. A number of points are distributed uniformly on the SAS of the receptor to describe in a discrete manner the different positions of the center of the probe sphere. On each of these points the binding energy of the nonpolar probe sphere ($\Delta E$) is approximated, as explained in the next subsection, by the sum of the van der Waals interaction energy ($E_{\mathrm{vdW}}$) and the electrostatic desolvation of the receptor ($\Delta E_{\mathrm{desolv}}$):

$$\Delta E = E_{\mathrm{vdW}} + \Delta E_{\mathrm{desolv}} \tag{1}$$

Parameters for the van der Waals energy and partial charges from standard force fields can be used. In the applications presented in this work, the all-hydrogen MSI CHARMm22 parameter set [21,22] was used. The evaluation of $\Delta E$ for about 55 000 positions of the probe sphere on the thrombin surface requires about 35 s on a 195 MHz R10000 processor.

Van der Waals interaction energy

The nonelectrostatic contributions to binding consist of the solute–solute van der Waals energy (favorable to binding), the loss of solute–solvent van der Waals energy (unfavorable), and the disruption of water structure, which is a favorable entropic effect at room temperature [23]. A number of approaches have been proposed to evaluate these contributions [23–28]. Here it is assumed that solute–solvent van der Waals interactions and disruption of water structure compensate each other (see Reference 28 and Figure 6 of Reference 29), and that the solute–solute van der Waals energy can account for the nonelectrostatic component of the binding energy. Therefore, the van der Waals energy between the probe sphere and the receptor atoms ($E_{\mathrm{vdW}}$) is assumed to account for all the nonelectrostatic contributions to the association of the probe sphere to the receptor. It is calculated as:

$$E_{\mathrm{vdW}} = \sum_{i \in \mathrm{receptor}} \sqrt{\varepsilon_i\,\varepsilon_{\mathrm{probe}}} \left[ \left( \frac{R_i + R_{\mathrm{probe}}}{r_i} \right)^{12} - 2 \left( \frac{R_i + R_{\mathrm{probe}}}{r_i} \right)^{6} \right] \tag{2}$$

where $r_i$ is the distance between the receptor atom $i$ and the probe sphere, and $\varepsilon_i$ and $R_i$ are the van der Waals energy minimum and radius of atom $i$. The probe sphere van der Waals radius ($R_{\mathrm{probe}}$) and energy minimum ($\varepsilon_{\mathrm{probe}}$) are input values. Since the probe sphere is rolled over the receptor van der Waals surface, it does not clash with it and is always at optimal distance from at least one receptor atom.
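The probe-receptor van der Waals sum can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a 12-6 Lennard-Jones form with the radii added and the well depths combined by a geometric mean (the usual CHARMM-style combination rules), and the function name and tuple layout are hypothetical.

```python
import math


def probe_vdw_energy(probe_pos, atoms, r_probe, eps_probe):
    """12-6 van der Waals energy between a probe sphere and receptor atoms.

    atoms: iterable of (x, y, z, radius, eps), where eps is the positive depth
    of the van der Waals minimum. The combination rules (sum of radii,
    geometric mean of well depths) are assumptions of this sketch.
    """
    total = 0.0
    for x, y, z, radius, eps in atoms:
        r = math.dist(probe_pos, (x, y, z))
        ratio6 = ((radius + r_probe) / r) ** 6
        total += math.sqrt(eps * eps_probe) * (ratio6 * ratio6 - 2.0 * ratio6)
    return total


# A probe touching a single atom at the optimal distance R_i + R_probe
# sits exactly at the energy minimum, -sqrt(eps_i * eps_probe).
e_contact = probe_vdw_energy((3.4, 0.0, 0.0), [(0.0, 0.0, 0.0, 2.0, 0.1)], 1.4, 0.1)
```

Because the probe is rolled over the van der Waals surface, at least one term in the sum is always at its minimum, so the total is dominated by how many further atoms lie close to their optimal distance, which favors concave pockets.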


Electrostatic desolvation energy

The electrostatic desolvation of the receptor accounts for the loss of favorable receptor–solvent electrostatic interactions due to the removal of part of the highly polarizable solvent to accommodate a nonpolarizable probe sphere. This contribution always disfavors association and can be calculated within the assumption of continuum electrostatics [13,14,30–35]. The system is partitioned into solvent and solute regions, and a different dielectric constant is assigned to each region. The electrostatic energy $E$ of the receptor in solution can be expressed in terms of the electric displacement vector $\mathbf{D}(\mathbf{x})$ and of a location-dependent dielectric constant $\epsilon(\mathbf{x})$ as an integral over the three-dimensional (3D) space $R^3$ [36]:

$$E = \frac{1}{8\pi} \int_{R^3} \frac{\mathbf{D}^2(\mathbf{x})}{\epsilon(\mathbf{x})}\, d^3x \tag{3}$$

Since $\mathbf{D}(\mathbf{x})$ is additive, for point charges it can be rewritten as a sum over all charges $i$ of the receptor:

$$E = \frac{1}{8\pi} \int_{R^3} \frac{1}{\epsilon(\mathbf{x})} \left( \sum_i \mathbf{D}_i(\mathbf{x}) \right)^2 d^3x \tag{4}$$

As far as the electrostatics is concerned, the only effect of docking a nonpolar sphere at the surface of the receptor is to modify the dielectric properties in the space occupied by the sphere. Over this volume the dielectric constant changes from the solvent value ($\epsilon_w$) to the solute value ($\epsilon_p$). Usually, $\epsilon_w$ is set to 78.5, which is the value of water at room temperature, while the value of $\epsilon_p$ can range from 1 to 4. In the limit in which $\mathbf{D}(\mathbf{x})$ does not change significantly upon docking of the sphere, the variation of the electrostatic energy of the receptor (i.e., the desolvation) can be written according to Equation 3 as an integral over the volume occupied by the probe sphere ($V_{\mathrm{probe}}$):

$$\Delta E_{\mathrm{desolv}} = \frac{\tau}{8\pi} \int_{V_{\mathrm{probe}}} \mathbf{D}^2(\mathbf{x})\, d^3x \tag{5}$$

where $\tau = 1/\epsilon_p - 1/\epsilon_w$. The volume occupied by the probe sphere is assumed to be a sphere of 1.7 Å radius, i.e., the van der Waals radius of the probe sphere augmented by 0.3 Å to include small voids between the probe and receptor surfaces. A 3D grid is built around the receptor and Equation 5 becomes:

$$\Delta E_{\mathrm{desolv}} = \frac{\tau}{8\pi} \sum_{k \in V_{\mathrm{probe}}} \mathbf{D}^2(\mathbf{x}_k)\, \Delta V \tag{6}$$

where $\Delta V$ is the volume of a grid cube and the index $k$ runs over the grid points occupied by the probe sphere. The grid spacing is usually 0.5 Å. The electric displacement of every charge of the receptor can be approximated by the Coulomb field [13,34,37]:

$$\mathbf{D}_i(\mathbf{x}) = q_i\, \frac{\mathbf{x} - \mathbf{x}_i}{|\mathbf{x} - \mathbf{x}_i|^3} \tag{7}$$

where $\mathbf{x}_i$ is the position of the receptor atom $i$ and $q_i$ its partial charge. Equation 7 is an analytical approximation of the total electric displacement and fulfills the condition of validity of Equations 5 and 6, i.e., $\mathbf{D}(\mathbf{x})$ is independent of the dielectric environment. The receptor desolvation in the Coulomb field approximation results from Equation 6 together with Equation 7:

$$\Delta E_{\mathrm{desolv}} = \frac{\tau}{8\pi} \sum_{k \in V_{\mathrm{probe}}} \left( \sum_i q_i\, \frac{\mathbf{x}_k - \mathbf{x}_i}{|\mathbf{x}_k - \mathbf{x}_i|^3} \right)^2 \Delta V \tag{8}$$

The accuracy of this approximation is discussed in the Methods section of SEED (see below). It is important to note that the desolvation of a charged ion by a small nonpolar sphere at a distance $r$ from the ion varies approximately as $1/r^4$ (Equation 8). This is a very short-range effect compared with the electrostatic potential of the ion, which varies as $1/r$. Hence, the potential alone cannot properly describe electrostatic desolvation.
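The grid-based desolvation in the Coulomb field approximation (Equation 8) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian-type units with no conversion factor applied, takes the grid points displaced by the probe as given, and uses hypothetical names and default values for the dielectric constants.

```python
import math


def receptor_desolvation(charges, grid_points, spacing=0.5, eps_p=1.0, eps_w=78.5):
    """Coulomb-field estimate of the receptor desolvation energy.

    charges: list of (x, y, z, q) receptor partial charges.
    grid_points: centers of the grid cubes occupied by the probe or fragment.
    Returns tau / (8 pi) * sum_k |D(x_k)|^2 * dV; unit conversion is left
    to the caller (an assumption of this sketch).
    """
    tau = 1.0 / eps_p - 1.0 / eps_w
    d_volume = spacing ** 3
    total = 0.0
    for gx, gy, gz in grid_points:
        dx = dy = dz = 0.0
        for x, y, z, q in charges:
            rx, ry, rz = gx - x, gy - y, gz - z
            r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
            dx += q * rx / r3          # vector sum of Coulomb fields
            dy += q * ry / r3
            dz += q * rz / r3
        total += (dx * dx + dy * dy + dz * dz) * d_volume
    return tau / (8.0 * math.pi) * total


# One unit charge, one displaced grid cube at 2 A and at 4 A from the charge.
ion = [(0.0, 0.0, 0.0, 1.0)]
e_near = receptor_desolvation(ion, [(2.0, 0.0, 0.0)])
e_far = receptor_desolvation(ion, [(4.0, 0.0, 0.0)])
```

Doubling the distance lowers the penalty by a factor of 16, which is the $1/r^4$ behavior noted in the text; the fields of different charges are summed as vectors before squaring, so nearby charges of opposite sign partly cancel.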

Graphical rendering

The hydrophobicity is color-displayed over the molecular surface (MS) [38], which is traced by the surface of the probe sphere rolling over the van der Waals surface of the receptor. The MS consists of the convex receptor surface/probe contact areas and the concave (or reentrant) receptor surface/probe areas, and is preferred to the SAS because it gives a more precise description of the small details at the surface of the receptor. A smooth MS covering the receptor is generated via the molecular graphics package GRASP [23] as an ensemble of triangles. The hydrophobicity at each vertex is the value of the binding energy of the probe sphere (Equation 1) in the closest position to the vertex. This value is then visually displayed with the help of colors ranging from green (hydrophobic) through white (intermediate) to blue (hydrophilic).
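The green-white-blue color scale can be sketched with a simple linear blend. This is only an illustration: the text does not state how GRASP interpolates the colors, so the linear ramp, the clamping of out-of-range values, and the function name are assumptions.

```python
def hydrophobicity_color(energy, e_min, e_max):
    """Map a probe binding energy to an (r, g, b) tuple with values in [0, 1].

    e_min (most favorable, hydrophobic) maps to green, the midpoint to white,
    and e_max (least favorable, hydrophilic) to blue. Linear blending and
    clamping are assumptions of this sketch.
    """
    if e_max <= e_min:
        raise ValueError("e_max must be greater than e_min")
    t = (energy - e_min) / (e_max - e_min)
    t = min(1.0, max(0.0, t))
    if t < 0.5:
        w = t / 0.5                 # 0 at pure green, 1 at white
        return (w, 1.0, w)
    w = (1.0 - t) / 0.5             # 1 at white, 0 at pure blue
    return (w, w, 1.0)
```

Each surface vertex would receive the color of the probe energy at the nearest SAS point, so neighboring hydrophobic and hydrophilic patches remain visually distinct.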


Applications

The hydrophobicity maps of thrombin and p38 mitogen-activated protein (MAP) kinase are presented here. The approach has been previously validated on 10 protein-ligand complexes [6]. In five of these complexes the ligand was a natural peptide or protein, while in the other five it was an organic compound with hydrophobic moieties. In all complexes tested to date, the hydrophobicity maps correctly predict the regions of the receptor which are occupied by the nonpolar groups of the ligand.

Thrombin

Thrombin is a trypsin-like serine protease which fulfills an essential role in both haemostasis and thrombosis [39]. In the blood coagulation cascade, thrombin is the final enzyme that cleaves fibrinogen to release fibrinopeptides A and B and generate fibrin, which can then polymerize to form a haemostatic plug. The S3 and S2 precleavage subpockets of the active site have a hydrophobic character, whereas at the bottom of the S1 or recognition pocket the carboxyl group of Asp189 is a salt bridge partner for basic side chains. Nα-((2-naphthylsulfonyl)glycyl)-DL-p-amidinophenylalanylpiperidine (NAPAP) is an archetypal active site inhibitor of thrombin (Figure 1a). It fills the S3 and S2 pockets with its naphthalene and piperidine groups, respectively. Moreover, it is anchored by its basic group (benzamidine) into S1 to form a salt bridge with Asp189 [40].

The hydrophobicity map of the nonprime region of the thrombin active site is shown in Figure 1a. The side chains of Asp189 and NAPAP are also shown. S3 and part of S2 are identified as hydrophobic, while S1 shows a hydrophilic character. The naphthalene and piperidine groups of NAPAP are in contact with hydrophobic zones and bury 13 of the 100 most hydrophobic points (out of a total of about 55 000) on the SAS of thrombin. The polar groups of NAPAP bind to the hydrophilic zones of the thrombin active site: the Asp189 side chain and the Gly216 backbone polar groups. The energy loss of removing water from the hydrophilic zones is compensated upon binding by favorable electrostatic ligand-receptor interactions. The lower part of S1, despite being a narrow concave cavity, is identified as hydrophilic, since electrostatic desolvation of Asp189 dominates over the favorable van der Waals interactions between the probe sphere and the surrounding thrombin atoms. The curvature mapped on the MS (not shown) does not take into account thrombin electrostatic desolvation and suggests that the bottom of the S1 pocket is the most hydrophobic region, in contrast with the actual binding mode of NAPAP.


Figure 1. Hydrophobicity maps calculated with Equations 1, 2 and 8, and displayed with the program GRASP [23]. The molecular surface is displayed with colors ranging from green (hydrophobic) through white (intermediate) to blue (hydrophilic) according to the hydrophobicity map value. (a) Thrombin-NAPAP complex [40]. The transparent MS of the thrombin active site is displayed together with the side chain of Asp189 and NAPAP in a cylinder model (carbon atoms are white, nitrogen blue, oxygen red, and sulfur yellow). (b) Complex of the p38 MAP kinase and the triarylimidazole SB203580 (PDB code 1A9U) [48]. The 70 most hydrophobic points on the ATP binding site surface are displayed in magenta. Yellow crosses mark the five hydrophobic regions discussed in the text.


p38 MAP kinase

MAP kinases are essential enzymes for intracellular signalling cascades because they phosphorylate several regulatory proteins. They are responsive to hormones, cytokines, environmental stresses and other extracellular stimuli, and are activated by a dual phosphorylation of a threonine and a tyrosine in the TXY motif in the so-called phosphorylation lip. p38 MAP kinase (or CSBP2) plays a role in processes as diverse as transcriptional regulation, production of interleukins, and apoptosis of neuronal cells [41–44]. Inhibitors of p38 activity could therefore be useful as a treatment strategy for inflammatory and neurodegenerative diseases. The CSAID™ (cytokine suppressive anti-inflammatory drugs) class of anti-inflammatory compounds inhibits the synthesis of cytokines, such as interleukin-1 and tumor necrosis factor, by specific inhibition of the MAP kinase p38 [41,45,46]. They have a common chemical pattern: a central five-membered ring, either imidazole or pyrrole, substituted by a pyridine or a pyrimidine ring, a fluorinated or iodinated phenyl ring, and a third substituent at position 1 or 2 (Figures 1b and 6). These low-molecular weight inhibitors and their analogs bind to the ATP-binding cleft of the inactivated form of p38 and are competitive with respect to ATP. They are potent inhibitors, with IC50 in the nanomolar range [45,47], and highly selective for p38 compared to the other MAP kinases.

In Figure 1b the most hydrophobic points in the ATP binding site are displayed together with the triarylimidazole inhibitor SB203580 [48]. Five hydrophobic regions of concave shape are found by the computational approach described above. They are colored in green in Figure 1b and their approximate center is marked by a yellow cross. Three regions are consistent with the available structural data of p38 MAP kinase/inhibitor complexes [48,49], whereas two regions are novel. The most hydrophobic pocket is located between the Thr106 and Lys53 side chains, and is occupied by the phenyl group of the diaryl- and triarylimidazole inhibitors. The hydrophobic pocket lined by the Thr106 and Met109 side chains is occupied by the pyridine or pyrimidine ring. In the diarylimidazole inhibitors, the N-substituent of the central imidazole is in contact with the hydrophobic region close to the Val30 and Val38 side chains [48]. Surprisingly, the hydrophobic pockets below Glu71 and Met109 are empty in the available crystal structures of the MAP kinase p38/inhibitor complexes [48,49]. For the inhibitors known to bind at the ATP site, it is expected that additional nonpolar substituents directed towards the two unoccupied hydrophobic pockets will improve the binding affinity.


Docking a fragment library with SEED

Methods

The different types of fragments are docked by SEED in the order specified by the user. After each fragment placement the binding energy is estimated. The binding energy is the sum of the van der Waals interaction and the electrostatic energy with continuum solvation. The next fragment type is docked only after all placement-energy evaluations of the preceding fragment type have been made. The fragment docking procedure and energy evaluation are outlined in this section. Further details of the method, e.g., the clustering procedure, are given in the original paper [12]. For the docking of a library of 100 fragments into a binding site of about 25 residues, the latest version of SEED requires about 5 h of CPU time on a single processor (195 MHz R10000 or Pentium III 550 MHz). For more than one processor the speed-up is linear, so that the docking of a library of 1000 fragments would require about 6 h on an 8-processor server or cluster.
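The timing estimate follows from simple scaling. The worked arithmetic below assumes that the cost grows linearly with the number of fragments (an assumption of this note) and uses the linear parallel speed-up stated in the text:

```latex
t_{1000}^{(1)} \approx \frac{1000}{100} \times 5\ \mathrm{h} = 50\ \mathrm{h},
\qquad
t_{1000}^{(8)} \approx \frac{50\ \mathrm{h}}{8} \approx 6.3\ \mathrm{h}
```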

Fragment docking

The binding site of the receptor is defined by a list of residues, which are selected by the user. Fragments are considered polar if they have at least one H-bond donor or acceptor. Due to this definition some ‘polar’ fragments can have considerable hydrophobic character (e.g., diphenylether). Therefore they can also be docked by the procedure for nonpolar fragments if specified by the user.

Docking of polar fragments

These are docked so that one or more hydrogen bonds with the receptor are formed. The fragment is then rotated around the H-bond axis to increase sampling. Figure 2a shows the sampling of docked positions for pyrrole and acetone around a tyrosine side chain. Ideal and close-to-ideal hydrogen bond geometries are sampled in a discrete but exhaustive way.

Docking of nonpolar fragments

The hydrophobicity maps are used to dock nonpolar fragments. The points on the receptor SAS are ranked according to the sum of van der Waals interaction and receptor desolvation (Equation 1), and the best n points (where n is an input parameter) are selected for docking. As an illustrative example, Figure 1b shows the 70 most hydrophobic points on the ATP binding site of the p38 MAP kinase. For both the fragment and the receptor, vectors are defined by joining each point on the SAS with the corresponding atom. Finally, nonpolar fragments are docked by matching a vector of the fragment with a vector of the receptor at the optimal van der Waals distance. To improve sampling, additional rotations of the fragment are performed around the axis joining the receptor atom and fragment atom (Figure 2b). To increase efficiency, nonpolar fragments are discarded without calculation of the electrostatic energy if the van der Waals interaction is less favorable than a threshold value.

Figure 2. Relaxed-eyes stereoview of the fragments docked by SEED around a tyrosine side chain. (a) Acetone and pyrrole, (b) benzene. Carbon atoms are black, oxygen and nitrogen atoms dark gray, and hydrogen atoms light gray. Hydrogen bonds are drawn with dashed lines.

For both polar and nonpolar fragments, the docking is exhaustive on a discrete space. The discretization originates from the limited number of preferred directions and rotations around them. Fragment symmetries are checked only once for every fragment type and are exploited to increase the efficiency in docking.
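Two of the selection steps described for nonpolar fragments, ranking the SAS points and pre-filtering poses on van der Waals energy, can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that lower energies are more favorable and is not the SEED implementation.

```python
import heapq


def best_sas_points(energies, n):
    """Return the indices of the n SAS points with the lowest (most
    favorable) probe binding energy, best first."""
    return heapq.nsmallest(n, range(len(energies)), key=energies.__getitem__)


def passes_vdw_filter(vdw_energy, threshold):
    """Keep a pose only if its van der Waals energy is at least as favorable
    (as low) as the threshold; rejected poses skip the costlier
    electrostatic evaluation."""
    return vdw_energy <= threshold


# Hypothetical probe energies for four SAS points.
selected = best_sas_points([-1.0, -3.0, 0.5, -2.0], 2)
```

Filtering on the cheap van der Waals term first means that the continuum electrostatics is evaluated only for poses that already fit the pocket sterically.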

Electrostatic energy with continuum solvation

The main assumption underlying the evaluation of the electrostatic energy of a fragment-receptor complex is the description of the solvent effects by continuum electrostatics [13,14,30–35,50–52]. The system is partitioned into solvent and solute regions and different values of the dielectric constant are assigned to each region. In this approximation only the intra-solute electrostatic interactions need to be evaluated. This strongly reduces the number of interactions with respect to an explicit treatment of the solvent. Moreover, it makes feasible the inclusion of solvent effects in docking studies where the equilibration of explicit water molecules would be a major difficulty. The electrostatic effects of the solvent are relevant and it has been shown that the continuum dielectric model provides an accurate description of molecules in solution [14,53]. The difference in electrostatic energy in solution upon binding of a fragment to a receptor can be calculated as the sum of the following three terms [15,50]:

• Desolvation of the receptor: Electrostatic energy difference upon binding the uncharged (all partial charges switched off) fragment to the charged receptor in solution.
• Screened fragment-receptor interaction: Electrostatic interaction energy between the fragment and the receptor in solution.
• Desolvation of the fragment: Electrostatic energy difference upon binding the charged fragment to the uncharged (all partial charges switched off) receptor in solution.

The definition of the solute volume, i.e., the low dielectric volume, is central in the evaluation of these energy terms with a continuum model. The solute–solvent dielectric boundary is described by the molecular surface (MS) of the solute [38]. A grid covering the receptor is set up. In a first step the volume occupied by the isolated receptor is defined on the grid. Subsequently, for every position of a docked fragment, the volume enclosed by the MS of the fragment-receptor complex is identified. The screened fragment-receptor interaction and the fragment desolvation are evaluated with a grid-based implementation [13,14] of the generalized Born (GB) approximation [31–35]. The GB approach would be too time consuming for the evaluation of the desolvation of the receptor, which is calculated by the procedure described in the Methods section of the hydrophobicity maps.

Receptor desolvation

It is evaluated using Equation 8, where the index k runs over the grid points in the volume occupied by the fragment. The volume occupied by a docked fragment is the part of the volume enclosed by the MS of the complex that was not occupied by the isolated receptor. It consists of the actual volume of the fragment and the interstitial volume enclosed by the reentrant surface between fragment and receptor.

Screened fragment-receptor interaction

The fragment-receptor interaction in solution is calculated via the GB approximation [31]. In a solvent of dielectric constant $\varepsilon_w$, the interaction energy between two charges embedded in a solute of dielectric constant $\varepsilon_p$ is

$$E^{int}_{ij} = \frac{q_i q_j}{\varepsilon_p r_{ij}} - \frac{\tau\, q_i q_j}{R^{GB}_{ij}} \tag{9}$$

where $\tau = 1/\varepsilon_p - 1/\varepsilon_w$,

$$R^{GB}_{ij} = \sqrt{r_{ij}^2 + R^{eff}_i R^{eff}_j \exp\left(-\frac{r_{ij}^2}{4 R^{eff}_i R^{eff}_j}\right)} \tag{10}$$

and $q_i$ is the value of the partial charge i, while $r_{ij}$ is the distance between charges i and j. $R^{eff}_i$ is the effective radius of charge i and it is evaluated numerically on a 3D grid covering the solute as described in Reference 13. It is a quantity depending only on the solute geometry and represents an estimate of the average distance of a charge from the solvent. The intermolecular interaction energy is calculated as:

$$E^{int} = \sum_{i \in \mathrm{fragment}} \sum_{j \in \mathrm{list}_i} E^{int}_{ij} \tag{11}$$

where $\mathrm{list}_i$ contains the receptor atoms belonging to the neighbor list of atom i. The electrostatic neighbor list includes all the receptor atoms of the van der Waals neighbor list and one atom for every charged residue whose charge center is within a distance of 13 Å from the closest binding site residue. Supplementing the van der Waals neighbor list with a monopole approximation of distant charged residues dramatically reduces the error originating from the long range electrostatic interactions.
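The pairwise GB term and the neighbor-list sum of Equations 9 to 11 can be sketched in a few lines of Python. This is only an illustration, not the SEED implementation: it assumes the Still-type GB functional form, a Coulomb prefactor of 332.0716 kcal Å mol⁻¹ e⁻² (the text does not state units), and hypothetical data layouts for atoms and neighbor lists. Effective radii are taken as given inputs, whereas SEED evaluates them numerically on a grid.

```python
import math

# Assumed unit prefactor (kcal * Angstrom / (mol * e^2)); not stated in the text.
COULOMB = 332.0716


def gb_distance(r_ij, r_eff_i, r_eff_j):
    """GB interpolation distance between charges i and j (assumed Still-type form)."""
    prod = r_eff_i * r_eff_j
    return math.sqrt(r_ij ** 2 + prod * math.exp(-r_ij ** 2 / (4.0 * prod)))


def pair_energy(q_i, q_j, r_ij, r_eff_i, r_eff_j, eps_p=4.0, eps_w=78.5):
    """Coulomb term in the solute dielectric minus the solvent-screening term."""
    tau = 1.0 / eps_p - 1.0 / eps_w
    coulomb = q_i * q_j / (eps_p * r_ij)
    screening = tau * q_i * q_j / gb_distance(r_ij, r_eff_i, r_eff_j)
    return COULOMB * (coulomb - screening)


def intermolecular_energy(fragment, receptor, neighbor_lists,
                          eps_p=4.0, eps_w=78.5):
    """Sum pair energies over fragment charges i and receptor atoms in list_i.

    fragment, receptor: lists of (charge, (x, y, z), effective_radius) tuples
    (a hypothetical layout chosen for this sketch).
    neighbor_lists[i]: indices of the receptor atoms in the neighbor list of i.
    """
    total = 0.0
    for i, (q_i, pos_i, reff_i) in enumerate(fragment):
        for j in neighbor_lists[i]:
            q_j, pos_j, reff_j = receptor[j]
            r_ij = math.dist(pos_i, pos_j)
            total += pair_energy(q_i, q_j, r_ij, reff_i, reff_j, eps_p, eps_w)
    return total
```

Two limits make the behavior easy to check: for deeply buried charges (very large effective radii) the screening term vanishes and the energy approaches the Coulomb interaction in the solute dielectric, while for fully exposed, distant charges it approaches the Coulomb interaction in the solvent dielectric.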

Fragment desolvation

The fragment intramolecular energy in solution is calculated with the GB formula as described in Reference 13:

$$E = \sum_i E^{self}_i + \sum_i \sum_{j>i} E^{int}_{ij} \tag{12}$$

where the two sums run over the partial charges of the fragment and $E^{int}_{ij}$ is the pair interaction energy of Equation 9. Equation 12 differs from Equation 11 due to the presence of the self-energy term $E^{self}_i$. This term is nonzero only in the case of intramolecular energies. $E^{self}_i$ is the self-energy of charge i and represents the interaction between the charge itself and the solvent. It is calculated as [13,34]:

$$E^{self}_i = \frac{q_i^2}{2 \varepsilon_p R^{vdW}_i} - \frac{\tau\, q_i^2}{2 R^{eff}_i} \tag{13}$$

where $R^{vdW}_i$ is the van der Waals radius of charge i. The difference in the intramolecular fragment energy upon binding to an uncharged receptor in solution is:

$$\Delta E = E^{docked} - E^{free} \tag{14}$$

where $E^{docked}$ and $E^{free}$ are the energies in solution of the fragment bound and unbound to the receptor, respectively. They are evaluated according to Equation 12. For the unbound fragment ($E^{free}$) the effective radii are calculated considering as solute the volume enclosed by the molecular surface of the fragment. For the bound fragment ($E^{docked}$) the solute is the volume enclosed by the molecular surface of the receptor-fragment complex. $E^{free}$ is evaluated only once per fragment type, while $E^{docked}$ is recalculated for every fragment position in the binding site.
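The fragment desolvation of Equations 12 to 14 can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than the SEED code: the GB self-energy and pair terms are written in the standard Still-type form, the Coulomb prefactor and the function names are hypothetical, and the effective radii for the free and docked states are supplied by the caller instead of being computed on a grid.

```python
import math

# Assumed unit prefactor (kcal * Angstrom / (mol * e^2)); not stated in the text.
COULOMB = 332.0716


def self_energy(q, r_vdw, r_eff, eps_p=4.0, eps_w=78.5):
    """Self-energy of one charge: solute Born term minus solvent-polarization term."""
    tau = 1.0 / eps_p - 1.0 / eps_w
    return COULOMB * (q * q / (2.0 * eps_p * r_vdw) - tau * q * q / (2.0 * r_eff))


def pair_energy(q_i, q_j, r_ij, r_eff_i, r_eff_j, eps_p=4.0, eps_w=78.5):
    """Screened interaction between two charges (assumed Still-type GB form)."""
    tau = 1.0 / eps_p - 1.0 / eps_w
    prod = r_eff_i * r_eff_j
    r_gb = math.sqrt(r_ij ** 2 + prod * math.exp(-r_ij ** 2 / (4.0 * prod)))
    return COULOMB * (q_i * q_j / (eps_p * r_ij) - tau * q_i * q_j / r_gb)


def fragment_energy(charges, coords, r_vdw, r_eff, eps_p=4.0, eps_w=78.5):
    """Intramolecular energy in solution: self-energies plus all unique pairs."""
    n = len(charges)
    energy = sum(self_energy(charges[i], r_vdw[i], r_eff[i], eps_p, eps_w)
                 for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])
            energy += pair_energy(charges[i], charges[j], r_ij,
                                  r_eff[i], r_eff[j], eps_p, eps_w)
    return energy


def fragment_desolvation(charges, coords, r_vdw, r_eff_free, r_eff_docked,
                         eps_p=4.0, eps_w=78.5):
    """E_docked - E_free; only the effective radii differ between the two states."""
    e_free = fragment_energy(charges, coords, r_vdw, r_eff_free, eps_p, eps_w)
    e_docked = fragment_energy(charges, coords, r_vdw, r_eff_docked, eps_p, eps_w)
    return e_docked - e_free
```

Because the free-state energy depends only on the fragment itself, it can be computed once per fragment type and cached, which mirrors the statement above that only the docked-state energy is recalculated for every position.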

Validation

The approximations inherent to our continuum electrostatic approach were validated by comparison with finite difference solutions of the Poisson equation [12]. For this purpose, the three electrostatic energy terms were calculated with SEED and UHBD [52,54] for a set of small molecules and ions distributed over the binding site of thrombin and at the dimerization interface of the HIV-1 aspartic protease monomer. The molecule set included acetate ion, benzoate ion, methylsulfonate ion, methylammonium ion, methylguanidinium ion, 2,5-diketopiperazine, and benzene. The total number of fragment-receptor complexes analyzed was 1025 for thrombin (Figure 3) and 1490 for the HIV-1 protease monomer. The agreement between the two methods is very good, and better for a solute dielectric constant of 4.0 (Figure 3) than 1.0 (Figure 2 of Reference 12). In Table 2 of Reference 12 it was shown that systematic errors (slope ≠ 1) are independent of the receptor and the solute dielectric constant and consequently can be corrected by the use of appropriate scaling factors for the different energy terms.

Figure 3. Correlations in the electrostatic energies calculated by finite difference solution of the Poisson equation (x-axis) and SEED (y-axis). Values are plotted for 1025 complexes of thrombin with small molecules. The total electrostatic energy is the sum of the protein desolvation, screened interaction, and ligand desolvation scaled by 0.78, 1.28, and 1.14, respectively. The finite difference calculations were performed with the program UHBD [52,54]. An interior dielectric of 4, solvent dielectric of 78.5, and grid spacing of 0.5 Å were used for both SEED and UHBD.
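As a small worked example of the scaling correction, the total electrostatic energy can be assembled from the three terms with the factors quoted in the caption of Figure 3 (0.78, 1.28, and 1.14 for an interior dielectric of 4). The function and dictionary names below are hypothetical, and the sketch assumes the factors simply multiply the raw SEED terms before summation.

```python
# Scaling factors from the Figure 3 caption (interior dielectric of 4).
SCALING = {
    "receptor_desolvation": 0.78,
    "screened_interaction": 1.28,
    "fragment_desolvation": 1.14,
}


def total_electrostatic(receptor_desolvation, screened_interaction,
                        fragment_desolvation, scaling=SCALING):
    """Scaled sum of the three continuum electrostatic terms (kcal/mol)."""
    return (scaling["receptor_desolvation"] * receptor_desolvation
            + scaling["screened_interaction"] * screened_interaction
            + scaling["fragment_desolvation"] * fragment_desolvation)


# Example with made-up term values: a favorable screened interaction
# partly offset by the two (unfavorable) desolvation penalties.
example = total_electrostatic(3.0, -6.0, 1.0)
```

Keeping the factors in a single mapping makes it straightforward to swap in a different set, for instance if another interior dielectric were used.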

Application to thrombin

Apart from a relatively small rigid body motion of the Tyr60A-Trp60D loop, thrombin assumes the same conformation in complexes with different inhibitors [55,56]. Figure 4 shows the most relevant results of a SEED run (interior dielectric of 1.0) on the nonprime region of the thrombin active site, while a more detailed description of the SEED functionality maps of thrombin is given in Reference 12. The hydrophobic fragments bind preferentially to the S3 and S2 pockets (Figure 4a). The +1 charged groups, e.g., benzamidine (Figure 4b), 5-amidine indole (Figure 4c) and methylguanidinium, are involved in optimal hydrogen bonds with the Asp189 side chain in the S1 pocket. The SEED results are in agreement with the large amount of structural data on thrombin/inhibitor complexes [10,39,40,55–61].

The fragments docked by SEED were then connected by the program CCLD [15] as a further test of the usefulness of SEED for ligand design. CCLD generated 390 candidate ligands in 33 min on an R10000 processor. Four interesting hits, which ranked as 2nd, 37th, 42nd and 90th, are shown in Figures 5c–f. Hits 1 and 2 (Figures 5c and d) are similar to Argatroban (Figure 5b), which is a reversible inhibitor of thrombin with a Ki of 19 nM [40,62]. They have nonpolar groups in S3 and S2 and a guanidinium in S1. The sulfonamide NH of compounds 1, 2, and Argatroban donates a hydrogen bond to the backbone CO of Gly216. Moreover, the carbonyl group in 2 and Argatroban accepts from the NH group of Gly216. Additional hydrogen bonds, with respect to Argatroban, are present in 1 and 2, namely between an SO2 oxygen and the NH group of Gly219, and between the guanidinium and the main chain CO of Gly219. Furthermore, the amide group close to the guanidinium in compound 1 donates to the carbonyl of Ser214. Argatroban has fewer polar interactions with thrombin than hits 1 and 2 but its double ring moiety fills the S3 pocket better than the cyclohexyl ring of compounds 1 and 2.

Hits 3 and 4 (Figures 5e and f) have a benzamidine in the S1 pocket and a benzene in S2. The benzamidine moiety of compound 3 donates to the two oxygens of the Asp189 side chain and to the carbonyl group of Gly219. In the S3 pocket the hydroxyl substituent of cyclohexane donates to the main chain CO of Glu97A. Compound 4 is similar to 4-TAPAP (Figure 5a), a reversible inhibitor of thrombin [40,63] whose racemic mixture has a Ki of 640 nM. In 4-TAPAP and hit 4 the benzamidine is involved in a salt bridge with the Asp189 side chain, the sulfonyl part accepts from the NH group of Gly219 and one NH donates to the carbonyl group of Gly216. The interaction with NH of Gly216 is missing in 4 but there are two additional hydrogen bonds, with the CO groups of Ser214 and Gly219. This last hydrogen bond also occurs in the NAPAP thrombin complex. The naphthalene ring of 4 fills the S3 pocket as NAPAP [55]. These representative examples and visual analysis of other SEED-CCLD hits indicate that the present approach generates candidate ligands with interaction patterns similar to known thrombin inhibitors.

Figure 4. (a) Relaxed-eyes stereoview of the SEED cluster representatives of benzene (thick lines) in the thrombin active site (thin lines). The NAPAP inhibitor is also shown (medium lines), though it was removed during the SEED procedure. The SEED cluster representatives are labeled according to their binding energy rank within representatives of the same type. (b) Same as in (a) for benzamidine. Hydrogen bonds between protein and ligands are shown with dashed lines. (c) Same as in (b) for 5-amidine indole. Reprinted with permission from [12].

Figure 5. Schematic representations of the interactions between thrombin and (a) 4-TAPAP, (b) Argatroban, and (c–f) CCLD hits 1 to 4. Reprinted with permission from [12].

Table 1. SEED results for the p38 MAP kinase

                       Intermolecular      Electrostatic desolvation
Fragment   Rank of     vdWaals   Elect     Receptor   Fragment        ∆Gbindingᵇ   Siteᶜ
           clusterᵃ
Benzene    1           -14.7     -0.6      3.4        0.3             -11.6        Phenyl
           2           -10.1      0.0      2.0        0.3              -7.8        Pyridine
           3           -11.3      0.3      3.1        0.3              -7.7        Phe169
           4           -10.7      0.5      2.8        0.3              -7.2        Phe169
           5           -11.0      0.1      3.6        0.3              -7.0        Phenyl
Pyridine   1            -9.1     -0.6      1.4        0.8              -7.4        Pyridine
           2            -8.5     -2.0      4.1        0.8              -5.6        Lys53
           3            -7.6      0.1      1.3        0.8              -5.4        Pyridine
           4            -9.2     -1.6      4.9        0.8              -5.1        Lys53
           5            -9.1     -2.2      6.1        0.8              -4.4        Lys53

ᵃ Each cluster contains 10 fragment positions. Energy values (in kcal mol–1) are given for the cluster representative which has the most favorable binding energy among the 10 members of the cluster. Cluster 1 is shown in Figure 6.
ᵇ Sum of the values in the four preceding columns, i.e., intermolecular interactions and electrostatic desolvation energies.
ᶜ The site is defined by the substituent of the triarylimidazole inhibitor (in boldface) or the closest side chain of the p38 MAP kinase.

Application to the p38 MAP kinase

The SEED maps of benzene and pyridine were obtained with an interior dielectric of 4, solvent dielectric of 78.5, and grid spacing of 0.5 Å. A complete analysis of the results for a library of about 100 fragments and a comparison with the available structural data of p38 MAP kinase/inhibitor complexes will be given elsewhere (Tenette-Souaille et al., manuscript in preparation).

Figure 6. Relaxed-eyes stereoview of the 10 best benzenes and pyridines docked by SEED into the p38 MAP kinase. The bound conformation of the SB203580 inhibitor [48] is displayed to show that the 10 best benzenes and pyridines match the corresponding groups of SB203580.

Benzene

The first cluster of benzene occupies the hydrophobic pocket, which contains the phenyl group of the known triarylimidazole inhibitors. The orientation of the members of the first cluster is similar to that observed in the crystallographic structure (Figure 6). A remarkable gap of 3.8 kcal mol–1 in the binding energy is found between the representative of the first cluster and the other cluster representatives (Table 1). Furthermore, even the other nine members of the first cluster have a more favorable energy than the representative of the second cluster. This difference is mainly due to the very favorable van der Waals term in the first cluster. The representatives of clusters 2 to 5 display similar energy values. Most of the fragments containing a phenyl ring (e.g., naphthalene, tetraline, N-methyl indole, and dibenzocyclohexane) match the phenyl group of the known triarylimidazole inhibitors. Moreover, there is a large energy gap (from 2.5 to 4.0 kcal mol–1) between the first cluster and the other clusters.

Pyridine

Pyridine, as well as other fragments containing an aromatic ring with a hydrogen bond acceptor, overlaps the pyridine substituent of the triarylimidazole inhibitors (Figure 6). The orientation of the members of the first cluster is very close to that of the known inhibitors and they are involved in a hydrogen bond with the backbone NH of Met109 as in the crystal structure [49]. The main chain NH of Met109 is indeed the privileged partner of fragments with a hydrogen bond acceptor. As in the case of benzene, there is a significant gap (1.8 kcal mol–1) between the first cluster representative of pyridine and the second cluster (Table 1).
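The energy gaps quoted for benzene and pyridine can be cross-checked against Table 1 with a short Python sketch. The values are transcribed from the table; the data layout and function names are hypothetical. The check also confirms that ∆Gbinding equals the sum of the four preceding columns to within the 0.1 kcal mol–1 rounding of the printed values.

```python
# Each row: (vdWaals, Elect, receptor desolvation, fragment desolvation, dG_binding)
# for cluster ranks 1 to 5, transcribed from Table 1 (kcal/mol).
TABLE_1 = {
    "benzene": [
        (-14.7, -0.6, 3.4, 0.3, -11.6),
        (-10.1, 0.0, 2.0, 0.3, -7.8),
        (-11.3, 0.3, 3.1, 0.3, -7.7),
        (-10.7, 0.5, 2.8, 0.3, -7.2),
        (-11.0, 0.1, 3.6, 0.3, -7.0),
    ],
    "pyridine": [
        (-9.1, -0.6, 1.4, 0.8, -7.4),
        (-8.5, -2.0, 4.1, 0.8, -5.6),
        (-7.6, 0.1, 1.3, 0.8, -5.4),
        (-9.2, -1.6, 4.9, 0.8, -5.1),
        (-9.1, -2.2, 6.1, 0.8, -4.4),
    ],
}


def max_sum_deviation(rows):
    """Largest |sum of the four energy terms - tabulated dG_binding|."""
    return max(abs(sum(row[:4]) - row[4]) for row in rows)


def gap_first_to_second(rows):
    """Binding-energy gap between the first and second cluster representatives."""
    return round(rows[1][4] - rows[0][4], 1)


for fragment, rows in TABLE_1.items():
    print(fragment, gap_first_to_second(rows), round(max_sum_deviation(rows), 2))
```

The van der Waals column shows where the benzene gap comes from: the first cluster is more favorable than any other benzene representative by at least 3.4 kcal mol–1 in that term alone.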

Conclusions

A procedure to determine hydrophobic regions on the surface of a protein and a continuum electrostatic approach for the accurate and efficient docking of a fragment library have been presented. The hydrophobicity maps make it easy to discriminate between hydrophobic and hydrophilic surface regions that are close in space. This was illustrated for the thrombin–NAPAP and p38 MAP kinase–triarylimidazole inhibitor complexes, whose hydrophobicity maps can be used for de novo design and lead modification. Furthermore, for the thrombin–NAPAP and farnesyltransferase–farnesylpyrophosphate complexes it was shown previously that existing approaches based on the analysis of the surface curvature and/or the electrostatic potential are not as valuable in distinguishing regions where nonpolar and polar groups can bind [1]. The fraction of the most hydrophobic receptor regions that are buried at the binding interface is in general particularly high, suggesting that hydrophobic association is a determining factor in protein-ligand binding. This confirms previous findings [6,7].

A number of very efficient docking programs have been published recently. Most of them use either a scoring function with a crude approximation of solvation [64–66] or a vacuum energy derived from a molecular mechanics force field [67,68]. The program DOCK, which pioneered the use of geometric criteria to select ligands which best complement the shape of the receptor site [69,70], has been supplemented by the evaluation of ligand desolvation [71,72]. To efficiently screen large databases of compounds, DOCK assumes that every ligand desolvates the receptor equally and that the ligand is completely desolvated upon binding. The continuum electrostatic approach implemented in SEED does not make these assumptions.

There are two main advantages in SEED with respect to the multiple copy simultaneous search method (MCSS), which is a force field-based approach for determining optimal positions and orientations of functional groups in a protein binding site [73–78]. These are the inclusion of electrostatic solvation and the determination of all favorable binding modes. The effects of the solvent are neglected in MCSS, which calculates the protein-fragment interactions with a vacuum potential [21]. This choice in MCSS was based on the principle that fast methods are necessary to perform effective searches of the binding site and that good candidate ligands subsequently can be ranked in terms of their binding energy. A possible difficulty is that minimized positions may be missed or misplaced due to the lack of a solvation correction during the MCSS minimization. The best energy minima without solvation do not always turn out to be those of most interest [76]. Particularly problematic is the docking of apolar fragments which, without inclusion of solvation, are positioned in both hydrophilic and hydrophobic regions of the binding site. This problem is solved in SEED by the prioritization of apolar regions on the protein surface according to low electrostatic desolvation and favorable van der Waals interactions, as well as the efficient use during docking of a protein desolvation look-up table. SEED samples optimal binding modes and can also find positions which do not necessarily correspond to local minima of the energy function (e.g., a favorable region with relatively flat potential energy in between two well pronounced minima). This is an advantage with respect to MCSS because not all of the molecular fragments, even in potent inhibitors, have optimal interactions with the protein [76].

Fragments are docked as rigid bodies by SEED. For larger ligands with rotatable bonds, conformational flexibility can be taken into account by docking different conformations. Programs are available for the automatic generation of diverse low-energy conformations of small molecules [79,80]. In the case of large ligands with many rotatable bonds one could use SEED to find optimal positions for the rigid moieties and then use other techniques which allow for full ligand and possibly also protein flexibility [81].

Future perspectives

The two computational approaches presented in this article, hydrophobicity maps and SEED, were developed to precisely determine hydrophobic pockets and to dock fragment libraries. The hydrophobicity maps will be useful for the characterization of binding sites for the 3D structures [82] of the large number of sequences that are emerging from the many genome projects. The location of a binding or association site can be predicted from the clusters of most hydrophobic points on the surface, and the size and ligand type could be estimated from the area and/or the volume of the binding site cleft.

Drug design is a fully multidisciplinary research field. Methodologies and procedures from different scientific disciplines support and cross-fertilize each other. A typical example is the combinatorial strategy for fragment-based design which is common to SAR by NMR [83,84], the multiple solvent crystal structures method [85], and SEED-CCLD [2,15]. A better understanding of the physical principles of solvation is needed to help in designing drugs. Continuum models of solvation effects are particularly useful for docking; their use will grow significantly in the near future. The continuum electrostatic approach implemented in SEED makes it possible to efficiently dock a library of molecular fragments to a receptor of known structure. The SEED-CCLD strategy uses combinatorial principles to construct candidate ligands. Possible applications are de novo design, as documented here for thrombin, lead optimization, and the selection of monomers for parallel synthesis and combinatorial chemistry. Although there is not yet a computational approach to step directly from genomes to drugs [86], we think that the methods and procedures described in this issue of Perspectives in Drug Discovery and Design are useful new developments for drug discovery.

Acknowledgements

We thank S. Ahmed, J. Apostolakis, N. Budin and C. Ehrhardt for helpful discussions, and A. Widmer (Novartis Pharma Inc., Basel) for the molecular modelling program WITNOTP, which was used for visual analysis of the results and for preparing Figures 2, 4, and 6. This work was supported by the Swiss National Science Foundation (Nationalfonds, Grant No. 31-53604.98), the Helmut Horten Foundation, and the EMDO Foundation. The program SEED (for SGI or PC running the Linux operating system), as well as the library of fragments in mol2 format, is available for non-profit institutions from the last author (e-mail: [email protected]).

References

1. Janin, J. and Chothia, C., J. Biol. Chem., 265 (1990) 16027.
2. Janin, J. and Chothia, C., Biochemistry, 17 (1978) 2943.
3. Horton, N. and Lewis, M., Protein Sci., 1 (1992) 169.
4. Young, L., Jernigan, R.L. and Covell, D.G., Protein Sci., 3 (1994) 717.
5. Davis, A.M. and Teague, S.J., Angew. Chem. Int. Ed., 38 (1999) 736.
6. Scarsi, M., Majeux, N. and Caflisch, A., Proteins Struct. Funct. Genet., 37 (1999) 565.
7. Veerapandian, P., Structure-Based Drug Design, Marcel Dekker Inc., New York, NY, 1997.
8. Kubinyi, H., Curr. Opin. Drug Design Discov., 1 (1998) 4.
9. Caflisch, A., Walchli, R. and Ehrhardt, C., News Physiol. Sci., 13 (1998) 182.
10. Böhm, H.J., Banner, D.W. and Weber, L., J. Comput.-Aided Mol. Design, 13 (1999) 51.
11. Caflisch, A. and Karplus, M., Perspect. Drug Discov. Design, 3 (1995) 51.
12. Majeux, N., Scarsi, M., Apostolakis, J., Ehrhardt, C. and Caflisch, A., Proteins Struct. Funct. Genet., 37 (1999) 88.
13. Scarsi, M., Apostolakis, J. and Caflisch, A., J. Phys. Chem. A, 101 (1997) 8098.
14. Scarsi, M., Apostolakis, J. and Caflisch, A., J. Phys. Chem. B, 102 (1998) 3637.
15. Caflisch, A., J. Comput.-Aided Mol. Design, 10 (1996) 372.
16. So, S. and Karplus, M., J. Comput.-Aided Mol. Design, 13 (1999) 243.
17. Apostolakis, J. and Caflisch, A., Comb. Chem. High Throughput Screening, 2 (1999) 91.
18. Cramer, C.J. and Truhlar, D.G., Chem. Rev., 99 (1999) 2161.
19. Roux, B. and Simonson, T., Biophys. Chem., 78 (1999) 1.
20. Lee, B. and Richards, F.M., J. Mol. Biol., 55 (1971) 379.
21. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S. and Karplus, M., J. Comput. Chem., 4 (1983) 187.
22. Momany, F.A., Klimkowski, V.J. and Schäfer, L., J. Comput. Chem., 11 (1990) 654.
23. Nicholls, A., Sharp, K.A. and Honig, B., Proteins Struct. Funct. Genet., 11 (1991) 281.
24. Privalov, P.L. and Gill, S.G., Adv. Protein Chem., 39 (1988) 191.
25. Privalov, P.L. and Gill, S.G., Adv. Protein Chem., 247 (1990) 559.
26. Creighton, T.E., Curr. Opin. Struct. Biol., 1 (1991) 5.
27. Friedman, R.A. and Honig, B., Biophys. J., 69 (1995) 1528.
28. Caflisch, A., Fischer, S. and Karplus, M., J. Comput. Chem., 18 (1997) 723.
29. Vorobjev, Y.N., Almagro, J.C. and Hermans, J., Proteins Struct. Funct. Genet., 32 (1998) 399.
30. Warwicker, J. and Watson, H.C., J. Mol. Biol., 157 (1982) 671.
31. Still, W.C., Tempczyk, A., Hawley, R.C. and Hendrickson, T., J. Am. Chem. Soc., 112 (1990) 6127.
32. Hawkins, G.D., Cramer, C.J. and Truhlar, D.G., Chem. Phys. Lett., 246 (1995) 122.
33. Hawkins, G.D., Cramer, C.J. and Truhlar, D.G., J. Phys. Chem., 100 (1996) 19824.
34. Schaefer, M. and Karplus, M., J. Phys. Chem., 100 (1996) 1578.
35. Qiu, D., Shenkin, P.S., Hollinger, F.P. and Still, W.C., J. Phys. Chem. A, 101 (1997) 3005.
36. Jackson, J.D., Classical Electrodynamics, John Wiley & Sons, New York, NY, 1975.
37. Luo, R., Moult, J. and Gilson, M.K., J. Phys. Chem. B, 101 (1997) 11226.
38. Richards, F.M., Annu. Rev. Biophys. Bioeng., 6 (1977) 151.
39. Tapparelli, C., Metternich, R., Ehrhardt, C. and Cook, N.S., Trends Pharmacol. Sci., 14 (1993) 366.
40. Brandstetter, H., Turk, D., Hoeffken, H.W., Grosse, D., Stuerzebecher, J., Martin, D.P., Edwards, B.F.P. and Bode, W., J. Mol. Biol., 226 (1992) 1085.
41. Lee, J.C., Laydon, J.T., McDonnell, P.C., Gallagher, T.F., Kumar, S., Green, D., McNulty, D., Blumenthal, M.J., Heys, J.R., Landvatter, S.W. and Young, P.R., Nature, 372 (1994) 739.
42. Han, J., Lee, J.D., Bibbs, L. and Ulevitch, R.J., Science, 265 (1994) 808.
43. Rouse, J., Cohen, P., Trigon, S., Morange, M., Alonso-Llamazares, A., Zamanillo, D., Hunt, T. and Nebreda, A.R., Cell, 78 (1994) 1027.
44. Freshney, N.W., Rawlinson, L., Guesdon, F., Jones, E., Cowley, S., Hsuan, J. and Saklatvala, J., Cell, 78 (1994) 1039.
45. Cuenda, A., Rouse, J., Doza, Y.N., Meier, R., Cohen, P., Gallagher, T.F., Young, P.R. and Lee, L.C., FEBS Lett., 364 (1995) 229.
46. Lee, L.C. and Adams, J.L., Curr. Opin. Biotechnol., 6 (1995) 657.
47. Gallagher, T.F., Seibel, G.L., Kassis, S., Laydon, J.T., Blumenthal, M.J., Lee, J.C., Lee, D., Boehm, J.C., Fier-Thompson, S.M., Abt, J.W., Soreson, M.E., Smietana, J.M., Hall, R.F., Garigipati, R.S., Bender, P.E., Erhard, K.F., Krog, A.J., Hofmann, G.A., Sheldrake, P.L., McDonnell, P.C., Kumar, S., Young, P.R. and Adams, J.L., Bioorg. Med. Chem., 5 (1997) 49.
48. Wang, Z., Canagarajah, B.J., Boehm, J.C., Kassisa, S., Cobb, M.H., Young, P.R., Abdel-Meguid, S., Adams, J.L. and Goldsmith, E.J., Structure, 6 (1998) 1117.
49. Tong, L., Pav, S., White, D.M., Rogers, S., Crane, K.M., Cywin, C.L., Brown, M.L. and Pargellis, C.A., Nat. Struct. Biol., 4 (1997) 311.
50. Gilson, M.K. and Honig, B.H., Proteins Struct. Funct. Genet., 4 (1988) 7.
51. Bashford, D. and Karplus, M., Biochemistry, 29 (1990) 10219.
52. Davis, M.E., Madura, J.D., Luty, B.A. and McCammon, J.A., Comput. Phys. Commun., 62 (1991) 187.
53. Marrone, T.J., Gilson, M.K. and McCammon, J.A., J. Phys. Chem., 100 (1996) 1439.
54. Davis, M.E. and McCammon, J.A., J. Comput. Chem., 10 (1989) 386.
55. Banner, D.W. and Hadvary, P., J. Biol. Chem., 266 (1991) 20085.
56. Stubbs, M.T. and Bode, W., Perspect. Drug Discov. Design, 1 (1993) 431.
57. Hilpert, K., Ackermann, J., Banner, D.W., Gast, A., Gubernator, K., Hadvary, P., Labler, L., Müller, K., Schmid, G., Tschopp, T. and van de Waterbeemd, H., J. Med. Chem., 37 (1994) 3889.
58. Lyle, T.A., Perspect. Drug Discov. Design, 1 (1993) 453.
59. Tabernero, L., Chang, C.Y., Ohringer, S., Lau, W.F., Iwanowicz, E.J., Han, W.C., Wang, T.C., Seiler, S.M., Roberts, D.G.M. and Sack, J.S., J. Mol. Biol., 246 (1995) 14.
60. Obst, U., Gramlich, V., Diederich, F., Weber, L. and Banner, D.W., Angew. Chem., 107 (1995) 1874.
61. Wagner, J., Kallen, J., Ehrhardt, C., Evenou, J. and Wagner, D., J. Med. Chem., 41 (1998) 3664.
62. Kikumoto, R., Tamao, Y., Tezuka, T., Tonomura, S., Hara, H., Ninomiya, K., Hijikata, A. and Okamoto, S., Biochemistry, 23 (1984) 85.
63. Stürzebecher, J., Walsmann, P., Voigt, B. and Wagner, G., Thromb. Res., 36 (1984) 457.
64. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
65. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V. and Mee, R.P., J. Comput.-Aided Mol. Design, 11 (1997) 425.
66. Baxter, C.A., Murray, C.W., Clark, D.E., Westhead, D.R. and Eldridge, M.D., Proteins Struct. Funct. Genet., 33 (1998) 367.
67. Jones, G., Willett, P., Glen, R.C., Leach, A.R. and Taylor, R., J. Mol. Biol., 267 (1997) 727.
68. Lorber, D.M. and Shoichet, B.K., Protein Sci., 7 (1998) 938.
69. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E., J. Mol. Biol., 161 (1982) 269.
70. DesJarlais, R.L., Sheridan, R.P., Dixon, J.S., Kuntz, I.D. and Venkataraghavan, R., J. Med. Chem., 29 (1986) 2149.
71. Shoichet, B.K., Stroud, R.M., Santi, D.V., Kuntz, I.D. and Perry, K.M., Science, 259 (1993) 1445.
72. Shoichet, B.K., Leach, A.R. and Kuntz, I.D., Proteins Struct. Funct. Genet., 34 (1999) 4.
73. Miranker, A. and Karplus, M., Proteins Struct. Funct. Genet., 11 (1991) 29.
74. Caflisch, A., Miranker, A. and Karplus, M., J. Med. Chem., 36 (1993) 2142.
75. Miranker, A. and Karplus, M., Proteins Struct. Funct. Genet., 23 (1995) 472.
76. Grootenhuis, P.D.J. and Karplus, M., J. Comput.-Aided Mol. Design, 10 (1996) 1.
77. Joseph-McCarthy, D., Hogle, J.M. and Karplus, M., Proteins Struct. Funct. Genet., 29 (1997) 32.
78. Leclerc, F. and Karplus, M., Theor. Chem. Acc., 101 (1999) 131.
79. Kolossvary, I. and Guida, W.C., J. Am. Chem. Soc., 118 (1996) 5011.
80. Klebe, G. and Mietzner, T., J. Comput.-Aided Mol. Design, 8 (1994) 583.
81. Apostolakis, J., Plückthun, A. and Caflisch, A., J. Comput. Chem., 19 (1998) 21.
82. Sánchez, R. and Sali, A., Proc. Natl. Acad. Sci. USA, 95 (1998) 13597.
83. Shuker, S.B., Hajduk, P.J., Meadows, R.P. and Fesik, S.W., Science, 274 (1996) 1531.
84. Hajduk, P.J., Meadows, R.P. and Fesik, S.W., Science, 278 (1997) 497.
85. Mattos, C. and Ringe, D., Nat. Biotechnol., 14 (1996) 595.
86. Dill, K.A., Nature, 400 (1999) 309.

Perspectives in Drug Discovery and Design, 20: 171–190, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Virtual screening with solvation and ligand-induced complementarity

VOLKER SCHNECKE and LESLIE A. KUHN*
Protein Structural Analysis and Design Laboratory, Department of Biochemistry, Michigan State University, East Lansing, MI 48824-1319, U.S.A.

Summary. We present our database-screening tool SLIDE, which is capable of screening large data sets of organic compounds for potential ligands to a given binding site of a target protein. Its main feature is the modeling of induced complementarity by making adjustments in the protein side chains and ligand upon binding. Mean-field theory is used to balance the conformational changes in both molecules in order to generate a shape-complementary interface. Solvation is considered by prediction of water molecules likely to be conserved from the crystal structure of the ligand-free protein, and allowing them to mediate ligand interactions, if possible, or including a desolvation penalty when they are displaced by ligand atoms that do not replace the lost hydrogen bonds. A data set of over 175 000 organic molecules was screened for potential ligands to the progesterone receptor, dihydrofolate reductase, and a DNA-repair enzyme. In all cases the screening time was less than a day on a Pentium II processor, and known ligands as well as highly complementary new potential ligands were found.

Key words: bound water, dihydrofolate reductase, DNA repair enzymes, docking, drug design, flexibility, molecular recognition, progesterone receptor

Introduction

Screening a large database of organic compounds for potential ligands to a protein is often seen as a simple extension of the docking problem, which is the prediction of the favorable binding mode for a single ligand. When doing ligand screening by docking, as in our screening tool SLIDE, the docking problem must be solved for each ligand candidate in the database. But, because hundreds of thousands of ligands are screened, the time that a screening tool can spend for each compound must be far less than several minutes, which is the typical runtime for fast docking tools that model full ligand flexibility [1–3]. When spending only one minute per compound, the screening time for a database of 100 000 molecules is about 10 weeks. In order to reduce the runtime to a realistic time frame, say, a day, it is first important to efficiently rule out infeasible candidates, then to focus on the few promising molecules in the database.

* To whom correspondence should be addressed. E-mail: [email protected].

The limitation of the time for conformational search in a screening tool also affects its scoring function, which is used to rate the complementarity of protein and ligand in a given conformation. Rather than estimating the binding affinity, a computationally intensive and still imprecise science, the goal of the scoring function in a screening tool is to give an appropriate relative ranking for the potential ligands, with known or new ‘real’ ligands obtaining top ranks. Ideally, such a scoring function should be robust, with real ligands obtaining high scores irrespective of their exact binding mode.

Another important and often neglected aspect of docking and database screening is the induced shape complementarity of the protein upon ligand binding. Many cases are known in which the binding site undergoes significant conformational changes when binding different ligands [4–6]. When assessing the quality of docking tools, typically known ligands are redocked into fixed binding sites that are tailored to bind that very ligand, since they are taken from the corresponding crystallographic complex. Although this is likely to bias the selectivity for the known ligand and its score relative to other candidates, the effect might be minor for lead optimization, where similar ligands are docked to compare their binding modes and relative affinities. However, when screening compounds from a database for lead discovery, bias towards known ligands should be avoided in the search. Our approach is to screen using the ligand-free conformation of the target protein, when available, and to model induced complementarity for the protein side chains as well as ligand when screening and docking a large variety of potential ligands.

In this article, we describe applications of our screening tool SLIDE [7], which is able to reduce large compound databases of more than 175 000 organic molecules to a ranked list of approximately 100 docked potential ligands within an hour to two days, depending on the binding-site characteristics and degree of flexibility in the screened molecules. SLIDE models the flexibility of both ligands and protein side chains when docking potential ligands, and uses Consolv [8] to predict water-mediated interactions with the ligand.
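The runtime argument in the Introduction is simple arithmetic and can be reproduced with a short Python sketch (function names are hypothetical). It converts a per-compound docking time into a total screening time, and conversely derives the per-compound time budget implied by a one-day target.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def screening_time_weeks(n_compounds, minutes_per_compound):
    """Total serial screening time in weeks."""
    days = n_compounds * minutes_per_compound / MINUTES_PER_DAY
    return days / 7.0


def seconds_per_compound_budget(n_compounds, days=1.0):
    """Average time available per compound to finish within the given days."""
    return days * SECONDS_PER_DAY / n_compounds


# One minute per compound for 100 000 molecules: roughly 10 weeks.
weeks = screening_time_weeks(100_000, 1.0)

# Finishing the same database in a day leaves under a second per compound,
# and about half a second for a 175 000-molecule database.
budget_100k = seconds_per_compound_budget(100_000)
budget_175k = seconds_per_compound_budget(175_000)
```

Since an average budget below one second is far less than a full flexible docking run, most of the speed-up has to come from rejecting infeasible candidates early rather than from docking every compound faster.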

Background

The majority of results reported for database screening are based on the application of tools that were designed to predict the favorable binding mode and the binding affinity for a single ligand. Several docking tools are available, and they are widely used for structure-based ligand design [2,3,9–19].

Docking tools can be classified by the method they use to represent the binding site, by the technique for sampling ligand conformations, and by the way they construct the docked ligand. All docking tools fast enough to screen a large data set of molecules are based on so-called descriptor-matching approaches [20], which means that they represent the binding site by a template of points, onto which ligand atoms are mapped during the search. The template points can describe the shape of the binding site [9,21], or favorable chemical interaction centers above the protein surface, where hydrogen-bond donors or acceptors, metal ions, or hydrophobic groups of the ligands can be placed [1–3,11,22]. During the search for the optimal binding mode of a ligand, different conformers for the molecule are generated, which can be done randomly, e.g., by using a genetic algorithm [16,17,19,23], or by systematically sampling discrete torsional angles for the rotatable bonds of the ligand [1–3]. The faster docking tools construct the ligand incrementally in the binding site [2,3], or dock fragments of the ligand independently and chemically link them later, if the combination is feasible [1,12–14,24]. All docking tools that have been used for database screening employ a binding-site template for guiding the search and incremental construction of the ligand in the binding site [1,25–30]. While all recent docking tools consider full ligand flexibility, induced complementarity of the protein upon ligand binding is not modeled, at least not in the faster docking tools. In approaches that model protein flexibility, this is often limited, e.g., by only rotating terminal hydrogens to optimize intermolecular hydrogen bonding [16,17], or by using rotamer libraries for the side-chain conformations [31,32]. Other approaches model explicitly defined side-chain flexibility [33], hinge bending [34], or they dock ligands against an ensemble of protein structures [35].
Molecular dynamics simulations [36–38] may yield the most realistic models of protein and ligand flexibility, but the resulting runtime for docking a single ligand is likely to be in the range of hours.

A drawback of many docking tools is that they neglect the effect of binding-site solvation and the potential for water-mediated interactions between protein and ligand [8,39–42]. While it is certainly possible to consider bound water molecules as part of the rigid protein in most docking tools, recently three sophisticated approaches have been reported, which either predict conserved binding-site waters [8], compute potential water positions prior to docking [43], or solvate the ligand molecule [29].

Recent docking and screening tools can identify potential ligands from up to 150 000 compounds within a few days, when considering full ligand flexibility [1,25–28,44]. Even in the absence of modeling inducible complementarity and solvation, there have been successful project reports in structure-based lead discovery or design, including the identification of new inhibitors for thymidylate synthase [25], P. carinii dihydrofolate reductase (DHFR) [27], P. falciparum DHFR [45], trypanothione reductase [46], and human thrombin [47]. Our goal with the new screening tool SLIDE is to incorporate a balanced model of protein and ligand flexibility as well as a knowledge-based model of solvation that is fast enough to be used for screening and docking hundreds of thousands of compounds.

Methods: The screening tool SLIDE

SLIDE (for ‘Screening for Ligands by Induced-fit Docking’) can screen databases of 3D structures of over 100 000 small organic molecules, typically within hours to a day, on an ordinary desktop workstation. It has also been used for screening 185 000 peptides, which are more flexible, within a few days [7,48]. SLIDE uses multi-level hashing, mean-field theory, and an empirically tuned scoring function to efficiently recognize infeasible compounds, dock the most promising ligand candidates, and produce a ranked list of some 100 potential ligands for a given protein target.

Representing the binding site

The binding site of the protein is described by a template of favorable interaction points above its surface, onto which ligand atoms are mapped during the search. A template includes four different types of points:

• Hydrogen-bond donor point. During screening, SLIDE can place a hydrogen-bond donor of the ligand onto this point, which is determined to be within favorable hydrogen-bonding distance of a protein hydrogen-bond acceptor.
• Hydrogen-bond acceptor point. Each acceptor point is within favorable hydrogen-bonding distance of a protein hydrogen-bond donor.
• Hydrogen-bond donor/acceptor point. This is within hydrogen-bonding distance of both a hydrogen-bond acceptor and a donor of the protein, so either a ligand hydrogen-bond donor or acceptor can be placed here, or a group that can accept and donate at the same time (e.g., hydroxyl oxygen).
• Hydrophobic interaction center. These points are placed above a hydrophobic surface patch of the protein, and are matched by the centers of the most hydrophobic ligand groups, hydrocarbon rings.
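The four point types and their matching rules can be sketched as a small data structure. This is only an illustrative sketch, not SLIDE's implementation: the names `PointType` and `compatible` are hypothetical, and allowing a ligand group that can both donate and accept (e.g., a hydroxyl oxygen) to sit on a pure donor or acceptor point is an assumption made here for simplicity.

```python
from enum import Enum


class PointType(Enum):
    """Chemistry labels shared by template points and ligand interaction centers."""
    DONOR = "donor"                    # a ligand H-bond donor is placed here
    ACCEPTOR = "acceptor"              # a ligand H-bond acceptor is placed here
    DONOR_ACCEPTOR = "donor/acceptor"  # either role is possible
    HYDROPHOBIC = "hydrophobic"        # center of a hydrocarbon ring


def compatible(ligand_center: PointType, template_point: PointType) -> bool:
    """Return True if a ligand interaction center may be mapped onto a template point."""
    # Hydrophobic centers only ever match hydrophobic points.
    if PointType.HYDROPHOBIC in (ligand_center, template_point):
        return ligand_center is template_point
    # A donor/acceptor template point accepts any polar ligand center.
    if template_point is PointType.DONOR_ACCEPTOR:
        return True
    # ASSUMPTION: a dual-role ligand group may occupy a pure donor or acceptor point.
    if ligand_center is PointType.DONOR_ACCEPTOR:
        return True
    return ligand_center is template_point
```

With such a predicate, a ligand donor matches donor and donor/acceptor points but never an acceptor or hydrophobic point.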

The template can be automatically generated based on the ligand-free structure of the protein, which reduces bias towards known ligands. For automatic template generation, the binding site is filled with random points that are 2.5 to 5.0 Å from a protein atom. To determine favorable hydrogen-bonding positions, each of these points is checked for donors or acceptors in the protein within a distance of 2.5 to 3.5 Å; for protein hydrogen-bond donors, the angle between the donor, the donated hydrogen, and the probe point is also taken into account, and must be larger than 120°. Hydrophobic points are located between 3.5 and 5.0 Å from the nearest protein atom. For these points, the average hydrophilicity of all protein atoms within 5.0 Å is below 0.1, indicating a hydrophobic site (based on the values provided in Reference 49). All points of the same type are then clustered using complete-linkage clustering [50] to yield a computationally tractable number of template points (typically up to 200).

Similarly, interaction points in each potential ligand are defined as those that can act as hydrogen-bond acceptors, donors, acceptors and/or donors (e.g., hydroxyl oxygen atoms), or hydrophobic centers. The latter are defined as the centers of hydrocarbon rings with 6 or fewer carbon atoms (e.g., cyclohexane or benzene rings). Hydrogen-bond donors or acceptors in the ligand candidates are identified for oxygen, nitrogen, sulfur, and halogen atoms based on the molecular orbital type, valency, and presence of hydrogen atoms in SYBYL mol2 format files prepared for each molecule in the ligand database. The interaction points in ligand candidates are mapped onto points in the binding-site template having the same chemistry. Alternatively, the template can be defined based on interaction patterns observed in complexes with known ligands, biasing the search towards ligands with similar interaction patterns, similar to pharmacophore-based screens.
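The hydrogen-bond part of the template generation can be illustrated with a short sketch that labels a single probe point using the distance window (2.5 to 3.5 Å) and the donor-hydrogen-probe angle criterion (larger than 120°) stated above. The function names and the input layout (plain coordinate tuples, donors given together with their hydrogen position) are hypothetical; the hydrophobic test and the clustering step are omitted.

```python
import math


def angle_deg(a, b, c):
    """Angle in degrees at vertex b, formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def classify_probe(point, acceptors, donors):
    """Label a probe point by the ligand group that could be placed on it.

    acceptors: coordinates of protein H-bond acceptor atoms
    donors:    (donor_xyz, hydrogen_xyz) pairs for protein H-bond donors
    Returns "donor", "acceptor", "donor/acceptor", or None.
    """
    near_acceptor = any(2.5 <= math.dist(point, a) <= 3.5 for a in acceptors)
    near_donor = any(
        2.5 <= math.dist(point, d) <= 3.5 and angle_deg(d, h, point) > 120.0
        for d, h in donors
    )
    if near_acceptor and near_donor:
        return "donor/acceptor"
    if near_acceptor:
        return "donor"      # a ligand donor would face the protein acceptor
    if near_donor:
        return "acceptor"   # a ligand acceptor would face the protein donor
    return None
```

Probe points labeled this way would then be grouped by type and clustered to reduce their number.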
For either the 'unbiased', automatically generated templates, or templates designed based on known ligand binding, special key interaction points that must be matched by the ligand can also be included. This is useful to ensure that a certain part of the binding site is covered, or that a docked ligand makes particular interactions. Beyond the template, which governs the selection of complementary ligands, the binding site of the protein is represented by a shell of surface residues and water molecules likely to mediate protein-ligand interactions.

During the ligand search, all triangles of hydrogen-bond and hydrophobic interaction points in the screened molecules are mapped exhaustively onto triangles of template points with compatible geometry and chemistry, and such a mapping serves as a basis for docking a molecule into the binding site. A multi-level hashing approach is used to directly access all template triangles with feasible chemistry and geometry for a given set of three interaction centers in the ligand. Before the search, all possible template triangles are generated from the set of binding-site template points, and are indexed via four levels of hash (indexing) tables. The indices in these hash tables are based on the chemistry (H-bond donor/acceptor or hydrophobic) of the three triangle points, on the perimeter of the triangle, and then on the longest and the shortest side for each of the indexed template triangles. By using these four properties for a given triplet of interaction centers in a ligand candidate, all template triangles with compatible geometry and chemistry can be directly and very efficiently accessed. For feasible matches between each ligand triangle and template triangle, the geometrically best mapping is computed, which is then used to transform the ligand triangle onto the corresponding template points by applying a least-squares fit superposition. When including key points in the template, only those triangles that include at least one of these key interaction centers are indexed in the hash tables.
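The four-level indexing can be sketched with nested dictionaries keyed by chemistry, perimeter, longest side, and shortest side. This is a simplified illustration under several assumptions: the bin width `BIN`, the function names, and the exact-label chemistry match are all hypothetical, and a real implementation would also have to inspect neighboring bins so that triangles close to a bin boundary are not missed.

```python
import math
from itertools import combinations

BIN = 0.5  # hypothetical bin width in Å for discretizing lengths


def triangle_key(points):
    """Four-level key for a triangle given as three (chemistry, xyz) tuples."""
    chem = tuple(sorted(c for c, _ in points))
    sides = sorted(math.dist(p, q) for (_, p), (_, q) in combinations(points, 2))
    perimeter = sum(sides)
    return chem, int(perimeter / BIN), int(sides[2] / BIN), int(sides[0] / BIN)


def build_index(template):
    """Index every triangle of template points through four nested hash tables."""
    index = {}
    for tri in combinations(template, 3):
        chem, per, longest, shortest = triangle_key(tri)
        (index.setdefault(chem, {})
              .setdefault(per, {})
              .setdefault(longest, {})
              .setdefault(shortest, [])
              .append(tri))
    return index


def lookup(index, ligand_triangle):
    """Return the template triangles sharing all four key levels with the ligand triangle."""
    chem, per, longest, shortest = triangle_key(ligand_triangle)
    return index.get(chem, {}).get(per, {}).get(longest, {}).get(shortest, [])
```

Because the index is built once per binding site, each ligand triangle costs only four dictionary lookups, which is what allows infeasible triplets to be rejected before any docking is attempted.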

Docking the anchor fragment

The matched ligand interaction centers define the anchor fragment, which is the part of the molecule containing the three interaction centers. To maintain the distances between these matched points, all flexible bonds within this anchor fragment are rigidified. All chemically and geometrically feasible anchor fragments are then exhaustively tested in each ligand candidate for their ability to match triangles within the protein template. Collisions of the anchor fragment with protein main-chain atoms are resolved by iterative translations of the fragment as a rigid body. For this, a global translation vector is used to shift the anchor fragment the minimal amount necessary to resolve all collisions [5]. If all main-chain collisions can be resolved, the remaining atoms of the ligand are added to the anchor fragment in the conformation found for the molecule in the database. These atoms outside the anchor fragment are considered flexible, such that all single bonds in these parts can be rotated later, to resolve collisions with protein atoms. At this point we retain only those ligand dockings with at least 50% of their carbon atoms buried against the protein in order to keep only those dockings with good shape complementarity and minimal exposure of hydrophobic atoms to solvent; our analysis of 89 known protein-ligand complexes [51] showed they all met this criterion [7].
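The burial filter can be sketched as follows. The text does not state how a carbon atom is judged to be buried, so this sketch assumes a simple contact criterion: a carbon counts as buried if any protein atom lies within a cutoff distance. The 4.5 Å cutoff and the function names are hypothetical.

```python
import math


def buried_carbon_fraction(ligand_atoms, protein_coords, cutoff=4.5):
    """Fraction of ligand carbon atoms with a protein atom within `cutoff` Å.

    ligand_atoms:   (element, xyz) tuples
    protein_coords: xyz tuples of protein atoms
    ASSUMPTION: proximity to any protein atom stands in for burial.
    """
    carbons = [xyz for element, xyz in ligand_atoms if element == "C"]
    if not carbons:
        return 0.0
    buried = sum(
        1 for c in carbons
        if any(math.dist(c, p) <= cutoff for p in protein_coords)
    )
    return buried / len(carbons)


def passes_burial_filter(ligand_atoms, protein_coords, min_fraction=0.5):
    """Keep a docking only if at least half of its carbon atoms are buried."""
    return buried_carbon_fraction(ligand_atoms, protein_coords) >= min_fraction
```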

Modeling induced complementarity

Induced fit between the two molecules is modeled by resolving any collisions of their flexible parts by directed rotations of single bonds in either the ligand or side chains of the protein. This follows the paradigm that in most cases the two molecules will move as little as possible in order to be shape-complementary. There are typically several rotations that will resolve an intermolecular collision, and an approach based on mean-field theory [32,52,53] is used to decide which rotations to use to improve the shape complementarity. For each pairwise intermolecular collision, the bonds in each molecule that can resolve the collision are identified. They are stored in a system together with the corresponding minimum rotation angle and the number of non-hydrogen atoms that will be displaced by the rotation. These two values provide the basis for a force measuring the cost of the rotation. A probability is assigned to each rotation, and all rotations that can be used to resolve one particular collision are initialized with equal probabilities. During several cycles of the mean-field optimization, these probabilities are updated and converge to higher values for those rotations that represent a globally optimal choice. When applying these rotations, a maximal number of collisions is resolved with minimal conformational changes in both molecules, without bias to one or the other; details of the mathematics of this procedure are provided in Reference 7.

In each cycle of the mean-field optimization process, a mean force is computed for each rotation in the system, which is based on the force associated with this rotation and its correlations with other rotations in the system. The probabilities for all rotations in the system are updated at the end of each cycle, taking into account the mean forces of alternative rotations for the same collision. We run 10 cycles of the optimization, after which the probabilities have typically converged to define a near-optimal set of rotations. All feasible rotations are applied in the order provided by the computed probabilities. Since it is likely that not all collisions can be resolved and that new collisions might have emerged, the mean-field optimization process is iterated up to 10 times.
Intramolecular collisions are also tolerated, since it is assumed that they will be resolved in a future iteration. The result of the mean-field optimization process is either the exclusion of a molecule as infeasible, if collisions cannot be resolved, or a shape-complementary docking of the two molecules.
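The flavor of such a probability update can be conveyed with a generic mean-field-style sketch. It is not the update rule of Reference 7, whose mathematics is not given here. The assumptions are: the cost ("force") of a rotation is its angle times the number of displaced atoms, a rotation becomes cheaper when other collisions are also likely to use the same bond, and probabilities are renormalized per collision with a Boltzmann-like weight. The function name, the `temperature` parameter, and the input layout are hypothetical.

```python
import math


def mean_field(collisions, cycles=10, temperature=1.0):
    """Iteratively update rotation probabilities for each collision.

    collisions: dict mapping collision id -> list of
                (bond_id, rotation_angle_deg, n_displaced_atoms)
    Returns a dict mapping collision id -> {bond_id: probability}.
    """
    # ASSUMED cost model: angle times number of displaced non-hydrogen atoms.
    cost = {c: {b: ang * n for b, ang, n in rots} for c, rots in collisions.items()}
    # All alternatives for one collision start with equal probability.
    prob = {c: {b: 1.0 / len(r) for b in r} for c, r in cost.items()}

    for _ in range(cycles):
        updated = {}
        for c, rots in cost.items():
            mean_force = {}
            for b, f in rots.items():
                # Expected use of the same bond by the other collisions.
                shared = sum(prob[o].get(b, 0.0) for o in cost if o != c)
                mean_force[b] = f / (1.0 + shared)
            lowest = min(mean_force.values())  # shift for numerical stability
            weights = {
                b: math.exp(-(mf - lowest) / temperature)
                for b, mf in mean_force.items()
            }
            z = sum(weights.values())
            updated[c] = {b: w / z for b, w in weights.items()}
        prob = updated  # probabilities change only at the end of each cycle
    return prob
```

After the final cycle, rotations would be applied in order of decreasing probability, and the whole procedure repeated if collisions remain.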

Considering binding-site solvation

In order to not bias the search towards known ligands, we typically use the binding site from a ligand-free crystal structure of the target protein for screening. Water molecules are often observed in these crystal structures, and SLIDE can consider tightly bound waters when docking potential ligands. The current approach is to either translate a water molecule, if it overlaps with a ligand atom after docking the ligand into the binding site, or to displace it. A bound water molecule is only displaced if its collisions cannot be resolved by iterative translations, which are computed by summing the translation vectors that resolve each collision between the water molecule and a protein or ligand atom. SLIDE considers a penalty term for each displaced water when scoring a complex, and only displacements by non-polar ligand atoms are penalized.

To select which protein-bound water molecules to include in the screening and docking, we use a knowledge-based approach to determine those waters likely to be conserved upon ligand binding and to fix a penalty for their displacement. The tool Consolv [8], a k-nearest-neighbor classifier, is used to predict which binding-site waters will be conserved and which will be displaced upon ligand binding. Consolv's prediction is based on several features that describe the favorability of the local environment of a water molecule, and its knowledge base is a set of 5542 water molecules taken from 30 independently solved protein structures. Prior to screening, we remove all waters that are predicted to be displaced, and for the remaining waters we use Consolv's prediction confidence to scale the penalties for their displacement. To compute the penalty, we count the number of hydrogen bonds that are lost by displacing this water and scale this number by Consolv's prediction confidence (between 50 and 100%).
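The water handling can be sketched in two small functions: one that tries to resolve collisions by summing per-collision translation vectors, and one that computes the displacement penalty from the number of lost hydrogen bonds and the Consolv confidence. The minimum allowed distance of 2.5 Å, the iteration limit, and the function names are hypothetical choices made for this sketch.

```python
import math


def resolve_by_translation(water, neighbors, min_dist=2.5, max_iter=20):
    """Shift a water until no neighboring atom is closer than `min_dist` Å.

    Returns the new position, or None if the water has to be displaced.
    """
    pos = list(water)
    for _ in range(max_iter):
        shift = [0.0, 0.0, 0.0]
        collided = False
        for n in neighbors:
            d = math.dist(pos, n)
            if d < min_dist:
                if d == 0.0:
                    return None  # no direction to move in
                collided = True
                overlap = min_dist - d
                # Sum the minimal push away from each colliding atom.
                for i in range(3):
                    shift[i] += (pos[i] - n[i]) / d * overlap
        if not collided:
            return tuple(pos)
        pos = [p + s for p, s in zip(pos, shift)]
    return None  # collisions could not be resolved


def displacement_penalty(lost_hbonds, confidence, displaced_by_polar_atom):
    """Penalty for a displaced water: lost H-bonds scaled by Consolv confidence.

    confidence is a fraction between 0.5 and 1.0; displacement by a polar
    ligand atom is not penalized.
    """
    if displaced_by_polar_atom:
        return 0.0
    return lost_hbonds * confidence
```

A water squeezed symmetrically between two atoms receives cancelling pushes, cannot be moved out of the way, and is therefore reported as displaced.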

Scoring a potential ligand

Whenever a collision-free complex is generated, a score is assigned to the ligand based on the number of intermolecular hydrogen bonds and the hydrophobic complementarity of its interface with the protein. If not provided in the protein or ligand structure, the position of the shared hydrogen in each intermolecular hydrogen bond is computed. This position is well-defined for all but the terminal hydrogens in lysine and hydroxyl side chains; for these cases we choose the optimal hydrogen position subject to bonding constraints. All hydrogen bonds with a donor–acceptor distance up to 3.5 Å and a donor-hydrogen-acceptor angle larger than 120° contribute equally to the score. If water molecules are included in the interface, all water-mediated hydrogen bonds are also counted. Intra-protein hydrogen bonds that were broken due to the rotation of a protein side chain, or hydrogen bonds to waters that were displaced upon ligand docking, lower the overall hydrogen-bond count by the number of lost hydrogen bonds. Note that this does not penalize the displacement of a water molecule by a polar ligand atom that preserves the hydrogen bond to the protein. The number of hydrogen bonds lost by displacing water molecules is weighted by Consolv's prediction confidence of their displacement. The final intermolecular hydrogen-bond score between protein P and ligand L, reflecting loss in intra-protein and water-mediated hydrogen bonds, is HBONDS(P,L).

For computing the hydrophobic complementarity value, atomic hydrophilicity values were taken from a statistical survey of hydration of the different atom types in 56 protein structures [49] (hydrophobicity values for protein atoms were taken from Table II and values for ligand atoms from Table III in Reference 49). The contribution of a single ligand atom is based on the comparison of its hydrophobicity value with the average hydrophobicity of the surrounding protein surface atoms [7]. Given the hydrophobicity h(a) of an atom a, with h(a) ∈ [0, 635] calculated as the average number of hydrations per 1000 occurrences of that atom type (Table II in Reference 49), a value of 0 represents a maximally hydrophobic atom, 635 is maximally hydrophilic, and 317 is intermediate. The hydrophobic complementarity of the contact surface between protein P and ligand L is computed as:

where h'(l_i) = max{317 – h(l_i), 0} considers only the hydrophobic contribution of ligand atoms l_i, since values larger than 317 refer to hydrophilic atoms. The hydrophobicity h(P_i) of the protein neighborhood P_i for a single ligand atom l_i is defined as the average hydrophobic contribution of all protein atoms p_j within a distance of 4.0 Å of the ligand atom l_i:

The denominator in each term of the sum describing the hydrophobic score, HPHOB(P,L), is always greater than or equal to 32, which is 10% of the maximum score for a single ligand atom. This ensures that the overall HPHOB(P,L) score is not dominated by a few contacts with very small differences between protein and ligand hydrophobicity. The scoring function SCORE(P,L) for a collision-free complex is a linear combination of the hydrophobic and hydrogen-bond terms:

SCORE(P,L) = A · HPHOB(P,L) + B · HBONDS(P,L)

The relative contribution of these terms was tuned for best fit to the experimentally determined affinities of 89 protein-ligand complexes [51], giving a weight of 1.3:1.0 for the hydrogen-bond term relative to the hydrophobic term.
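The scoring scheme can be sketched in a few lines. Since the displayed formulas are not available in this text, the per-atom term used below is an assumption chosen to agree with the prose: the ligand atom's hydrophobic contribution divided by the difference between ligand and protein hydrophobicity, with the denominator bounded below by 32. The function names, the input layout, and the absolute scale A = 1.0 are hypothetical; only the 1.3:1.0 ratio of B to A is taken from the text.

```python
def hydrophobic_contribution(h):
    """h'(a) = max{317 - h(a), 0}: only atoms more hydrophobic than average count."""
    return max(317.0 - h, 0.0)


def neighborhood_hydrophobicity(protein_h_values):
    """Average hydrophobic contribution of the protein atoms near one ligand atom."""
    if not protein_h_values:
        return 0.0
    total = sum(hydrophobic_contribution(h) for h in protein_h_values)
    return total / len(protein_h_values)


def hphob(contacts):
    """Hydrophobic complementarity over all ligand atoms.

    contacts: list of (h_ligand_atom, [h of protein atoms within 4.0 Å])
    ASSUMED per-atom term: h'(l_i) / max{|h'(l_i) - h(P_i)|, 32}.
    """
    total = 0.0
    for h_ligand, neighbors in contacts:
        hl = hydrophobic_contribution(h_ligand)
        hp = neighborhood_hydrophobicity(neighbors)
        total += hl / max(abs(hl - hp), 32.0)
    return total


def score(hphob_value, hbonds, a=1.0, b=1.3):
    """SCORE = A * HPHOB + B * HBONDS, with B/A = 1.3 (A = 1.0 is an assumption)."""
    return a * hphob_value + b * hbonds
```

Under this assumed form, a hydrophilic ligand atom contributes nothing, while a hydrophobic atom in a well-matched hydrophobic pocket contributes the most, capped by the lower bound of 32 on the denominator.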


Results

SLIDE was previously used to screen for potential ligands to a bacterial aspartic protease, the human estrogen receptor, glutathione transferase, and HIV-1 protease [7,48]. Here, we screen for potential ligands to human uracil-DNA glycosylase (coordinates of a complex with 6-amino-uracil provided by C. Mol and J.A. Tainer, The Scripps Research Institute), to the ligand-binding domain of the human progesterone receptor (PDB entry 1a28), and to E. coli dihydrofolate reductase (PDB entry 1ra9). The modeling of inducible complementarity and the control of molecular diversity in the set of potential ligands found by SLIDE have been described elsewhere [7], and here we include knowledge-based solvation. We screened two different databases for the three target proteins:

• A subset of 70 113 compounds taken from the Cambridge Crystallographic Database System (CSD, http://www.ccdc.cam.ac.uk). These are all organic compounds with fewer than 100 atoms and at least three interaction centers that can be mapped onto template points.
• 105 517 compounds from the NCI database (http://dtp.nci.nih.gov), which were taken from the conformers for the open set of this database as they were prepared by the group of J. Gasteiger using CORINA [54].

We used different approaches for designing the binding-site templates. The smallest template, consisting only of six points, was generated for the progesterone receptor. The interaction points were generated based on the centers of four carbon rings and two ketone oxygen atoms in the progesterone bound to the receptor in PDB entry 1a28, resulting in a template consisting of four hydrophobic and two acceptor points. One water molecule, which interacts with the bound progesterone in this structure, was included in the binding site during screening.
A search with such a small template is like a pharmacophore-based search, which restricts the set of potential ligands that SLIDE finds to compounds more or less similar to the known ligand, since that ligand and each potential new ligand share at least three interaction centers due to the triangle matching during docking. With this small template for the progesterone receptor, the total screening time for the more than 175 000 compounds was about nine minutes on an Intel Pentium II/450 processor running Solaris 2.7. Figure 1 shows a typical example of the kind of ligand SLIDE found for this screen with the small template. The ligand is 16α,17α-cyclopentenoprogesterone (CSD entry BUBRUJ), which is the known ligand progesterone with a cyclopentene substituent added to its D ring. It obtained a score of 39.9, which ranks it 52nd out of the 175 630 screened compounds. The highest-ranked progesterone from the CSD received a score of 37.4. A total of 154 potential ligands were docked into the binding site and obtained a score higher than 35, which is a reasonable cutoff for ligands similar in size and chemistry to the known ligand. Like the progesterone in the crystal structure in PDB 1a28, this ligand makes one water-mediated hydrogen bond. To fit the highly rigid cyclopentene-progesterone into the binding site, adjustments in protein side chains were necessary. The figure shows four side chains in their native conformation together with the final, rotated conformation proposed by SLIDE. Note that only minor rotations were necessary to accommodate the cyclopentene, which demonstrates favorable hydrophobic complementarity with the neighboring side chains in the progesterone receptor.

Figure 1. A cyclopenteno-progesterone (grey) from the CSD (entry BUBPUJ) was identified as a potential ligand and docked by SLIDE into the ligand-binding site of the human progesterone receptor (PDB 1a28). The template was based on six interaction centers of the progesterone from the crystal structure, which is shown in yellow tubes and is overlaid virtually exactly by SLIDE's ligand. To fit the additional cyclopentene substituent, four side chains in the receptor underwent minor conformational changes; their native conformation is shown in yellow, and SLIDE's conformation for these side chains is colored by atom type (green: carbon; red: oxygen). Note that the hydrophobic cyclopentene is in contact with hydrophobic groups in the receptor.

In the screening for ligands for the human uracil-DNA glycosylase [55,56], the binding site was taken from a crystal structure of a complex with 6-amino-uracil bound deep in the active-site cleft. Twelve water molecules from this structure were predicted as being conserved by Consolv and included in the binding site during screening. A known inhibitor for this DNA-repair enzyme is an 84-residue protein that mimics DNA but binds irreversibly to the glycosylase [57]. We used the positions of five H-bond donors and acceptors in the bound 6-amino-uracil as key points (out of which at least one must be


Figure 2. The structure of 6-amino-uracil, the ligand present in the crystal structure of the human uracil-DNA glycosylase used for screening, is shown together with two highly-ranked molecules suggested by SLIDE as potential ligands for this enzyme.

matched by any potential ligand) in a template consisting otherwise of 150 automatically generated interaction points. The cumulative screening time for both databases was slightly over 17 h. Figure 2 shows the structure of 6-amino-uracil and two of SLIDE's ligands, one with obvious resemblance to the known ligand. Figures 3 and 4 show these ligands in SLIDE's binding modes together with key side chains and waters that interact with them. The ligand in Figure 3, CSD entry PICTIE, obtained a score of 32.8 and rank 55 with three water-mediated interactions to the protein, and the ligand in Figure 4, NCI entry 39807 (CAS 6313-89-9), obtained a score of 27.2, which ranked it 384th in the set of 683 potential ligands that were docked by SLIDE and scored higher than 25.0 out of the data set of over 175 000 compounds that were screened. The latter ligand binds similarly to 6-amino-uracil, but shows better complementarity due to additional water-mediated hydrogen bonds to the protein. In the screening runs against E. coli dihydrofolate reductase (PDB entry 1ra9), again a hybrid template design was used. To ensure that all ligands docked by SLIDE interact with the side chains binding pyrimidine in the


Figure 3. This figure shows 3’-deoxysangivamycin (CSD entry PICTIE), which was docked by SLIDE as a potential ligand into the active site of human uracil-DNA glycosylase, a DNA-repair enzyme. Key side chains and binding-site waters that interact with the ligand are shown, and feasible hydrogen bonds are indicated by dotted lines. Two side chains of the enzyme, a phenylalanine and a glutamine, were rotated by SLIDE to accommodate the ligand and are also shown in their original conformation (yellow).

Figure 4. A ligand from the NCI database (entry 39807), docked by SLIDE into the active site of human uracil-DNA glycosylase. It mimics the binding of 6-amino-uracil, the ligand bound in the structure that was used for screening. This ligand shows better complementarity than the original one, due to the additional carboxylate group that interacts with two conserved waters and a histidine side chain.


Figure 5. Two CSD compounds that were selected and docked into the active site of dihydrofolate reductase by SLIDE and obtained high scores. Both are known DHFR inhibitors.

known ligands methotrexate and dihydrofolate, two runs of the automatic template generator were done: one to specify 23 key points located in the pyrimidine region of the binding site, and another to fill the remaining part of the binding site with 64 additional template points. Four binding-site waters from the crystal structure of the ligand-free DHFR were predicted to mediate interactions and included during screening. The screening time for the 175 000 compounds was about 14 h. In the set of potential ligands identified by SLIDE were at least two known DHFR inhibitors (Figure 5), and their key interactions are shown in Figures 6 (CSD entry JOXTIZ) and 7 (CSD entry FIRNID). SLIDE's scores for these ligands were 51.1 and 51.9, which ranked them 205th and 141st, respectively. Both ligands place a pyrimidine group in the same site, and the adamantyl-pyrimidine (FIRNID, Figure 7) binds deeper in the corresponding cleft. In the docking of CSD ligand JOXTIZ (Figure 6) a second water molecule fills the space not occupied by the ligand. This water was displaced by an amino group in the docking of CSD ligand FIRNID (Figure 7), which replaces the hydrogen bonds to the other water and to the aspartic acid side chain of DHFR, so that this displacement was not penalized in SLIDE's score.


Figure 6. Methylbenzoprim (CSD entry JOXTIZ), a known potent DHFR inhibitor, docked by SLIDE into the active site of E. coli dihydrofolate reductase. The ligand was selected by SLIDE out of 175 000 compounds in the screening database. Its pyrimidine group binds in the same cavity as the pyrimidine of the natural ligand, dihydrofolate, which was aided by positioning key template points in that area. The deeper part of this cavity is occupied by two bound water molecules, which were observed in the ligand-free protein structure that was used for screening (PDB 1ra9) and predicted by Consolv as being conserved upon ligand binding. One side chain, a leucine, was rotated by SLIDE upon ligand docking, and its original conformation is shown in yellow.

Discussion

SLIDE is an efficient database screening tool, which searches data sets of structures of more than 175 000 organic compounds within minutes, when using a small template, as we did in the case of the progesterone receptor screen, or within several hours, as shown for uracil-DNA glycosylase and dihydrofolate reductase, where we used a more general binding-site template. It accomplishes this by using an efficient multi-level hashing scheme to directly access triplets of feasible interaction points in the binding-site template, onto which triplets of ligand interaction centers are mapped. On one hand this is a straightforward way to compute a transformation of the ligand into the binding site, so that the ligand already makes three favorable interactions, and on the other hand it is also an efficient way to rule out infeasible compounds: all compounds that lack a set of three favorable interactions are discarded before attempting docking into the binding site. For the progesterone receptor with the very specific 6-point template, more than 163 000 compounds, i.e., 93% of the screening databases, never needed to be docked into the binding site


Figure 7. Another known inhibitor for DHFR found by SLIDE in the CSD: 2,4-diamino-5-(1-adamantyl)-6-methylpyrimidine (CSD entry FIRNID), which binds with higher affinity to DHFR than methotrexate [65]. Again, the pyrimidine ring is located in the targeted area of the binding site and makes one water-mediated interaction. The other water molecule located in that area (shown in Figure 6) was displaced by a polar amino group, resulting in no desolvation penalty. Note the hydrophobic complementarity of the side chains in contact with the adamantane (yellow indicates their initial conformations).

for this reason. For more general templates, like the 155-point template for uracil-DNA glycosylase, docking and conformational search were performed for more than 70 000 compounds (40% of the database).

Early in the development of SLIDE, we tried to reduce the complexity of the conformational search for the protein by using a rotamer library for the side chains, which had been done in docking approaches [31,32]. However, in the majority of cases all rotamers cause new collisions, and in a recent study it was shown that side chains close to ligand-binding sites tend to adopt non-rotameric conformations [58]. In most cases, including the examples described above, only minor rotations in both ligand and protein are necessary to generate a shape-complementary interface. These rotations are computed exactly by SLIDE, avoiding costly sampling of rotational angles. The conformational search is the most computationally complex step of screening with SLIDE. Our model of flexibility is more realistic than that in docking or screening tools that only consider ligand flexibility, since ligand and protein flexibility are treated equally, and the mean-field optimization selects those rotations for resolving collisions that cause the minimal overall distortion for the complex. Full conformational search is not done for the ligand, but rather its database conformation is used as a starting conformation for docking. Since the structures for potential ligands are taken from crystal structures (CSD) or rule-derived models (NCI), they begin in a low-energy conformation. To deal with cases where the binding conformation of a ligand is very different from the database conformation, the database can be enriched by a series of low-energy conformers for screening [28,59].

Although our scoring function was empirically tuned based on published affinities for PDB complexes, we do not try to predict precise binding affinities in SLIDE. Several empirically derived scoring functions can be found in the literature [51,60–65]. Scoring functions sensitive to small conformation changes may not be appropriate for a screening tool like SLIDE, which cannot perform a conformational search for 100 000 or more ligand candidates. A sensitive scoring function is more appropriate in a fine-docking tool, which must predict differences in binding affinities for very similar conformations of a complex during the search. The scoring function in SLIDE is designed instead to rank the set of all potential ligands based on their complementarity. All examples described above were ranked within the top potential ligands for each target protein. SLIDE includes a web-based interface that enables the user to easily browse through the potential ligands and visualize SLIDE's docking.

The inclusion of binding-site solvation is in accordance with our models of induced fit and scoring. The positions of water molecules in the binding site from the crystal structure of the target protein are analyzed, and those predicted as conserved by Consolv are kept. In contrast to a method that precomputes several favorable water positions prior to docking, then picks the best positions to fill gaps between the molecules [43], SLIDE starts with ‘real’ water molecules and shifts them when they collide with ligand atoms.
As in the conformational search, the idea is to start with a reasonable configuration and make only minimal changes, as necessary. If the collision of a water molecule cannot be resolved, the water is displaced and a desolvation penalty term is only applied when a lost hydrogen bond is not replaced by a corresponding protein-ligand interaction. While SLIDE’S docking procedure must be very quick, rather than comprehensive, in order to screen a large number of molecules, its inclusion of protein flexibility and solvation gives SLIDE advantages over other docking procedures. Because of the fast screening time, SLIDE can be used to search very large compound databases for the discovery of novel lead structures, and due to distinguishing a rigid anchor fragment for each screened molecule attached to flexible side chains, it will be straightforward to extend SLIDE for combinatorial screening.
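The water-handling rule described above can be summarized in a few lines of Python. This is only a minimal sketch under stated assumptions, not SLIDE's actual code: the `Water` record, its field names, and the unit penalty per hydrogen bond are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Water:
    """Hypothetical record for one conserved binding-site water."""
    hbonds_to_protein: int    # hydrogen bonds the water makes to the protein
    collision_resolved: bool  # True if shifting the water removed the clash
    hbonds_replaced: int      # lost hydrogen bonds now made by the ligand


def desolvation_penalty(waters, penalty_per_hbond=1.0):
    """Sum a penalty over displaced waters whose hydrogen bonds are not replaced.

    penalty_per_hbond is an illustrative weight, not a published parameter.
    """
    total = 0.0
    for water in waters:
        if water.collision_resolved:
            continue  # the water stays in the site, so nothing is lost
        unreplaced = max(water.hbonds_to_protein - water.hbonds_replaced, 0)
        total += penalty_per_hbond * unreplaced
    return total


waters = [
    Water(hbonds_to_protein=2, collision_resolved=True, hbonds_replaced=0),
    Water(hbonds_to_protein=2, collision_resolved=False, hbonds_replaced=2),
    Water(hbonds_to_protein=3, collision_resolved=False, hbonds_replaced=1),
]
print(desolvation_penalty(waters))
```

In this toy case only the third water contributes, because it is displaced and two of its three hydrogen bonds are not taken over by the ligand.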


Acknowledgements

We thank Cliff Mol and John Tainer from The Scripps Research Institute for making the coordinates of the human uracil-DNA glycosylase complex available to us. The development of SLIDE was sponsored by the Deutsche Forschungsgemeinschaft (grant Schn 576/1-1 to V.S.), the National Science Foundation (grant DBI-9600831 to L.A.K.), and the American Heart Association (grant 994009IN to L.A.K.).

References

1. Welch, W., Ruppert, J. and Jain, A.N., Chem. Biol., 3 (1996) 449.
2. Rarey, M., Wefing, S. and Lengauer, T., J. Comput.-Aided Mol. Design, 10 (1996) 41.
3. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
4. Jorgensen, W.L., Science, 254 (1991) 954.
5. Schnecke, V., Swanson, C.A., Getzoff, E.D., Tainer, J.A. and Kuhn, L.A., Proteins Struct. Funct. Genet., 33 (1998) 74.
6. Betts, M.J. and Sternberg, M.J.E., Protein Eng., 12 (1999) 271.
7. Schnecke, V. and Kuhn, L., Proteins Struct. Funct. Genet., 2000 (manuscript submitted).
8. Raymer, M.L., Sanschagrin, P.C., Punch, W.F., Venkataraman, S., Goodman, E.D. and Kuhn, L.A., J. Mol. Biol., 265 (1997) 445.
9. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E., J. Mol. Biol., 161 (1982) 269.
10. Meng, E.C., Shoichet, B.K. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 505.
11. Shoichet, B.K. and Kuntz, I.D., Protein Eng., 6 (1993) 723.
12. Böhm, H.-J., J. Comput.-Aided Mol. Design, 6 (1992) 61.
13. Böhm, H.-J., J. Comput.-Aided Mol. Design, 6 (1992) 593.
14. Böhm, H.-J., J. Comput.-Aided Mol. Design, 10 (1996) 265.
15. Rarey, M., Kramer, B. and Lengauer, T., J. Comput.-Aided Mol. Design, 11 (1997) 369.
16. Jones, G., Willett, P. and Glen, R.C., J. Mol. Biol., 245 (1995) 43.
17. Jones, G., Willett, P., Glen, R.C., Leach, A.R. and Taylor, R., J. Mol. Biol., 267 (1997) 727.
18. Morris, G.M., Goodsell, D.S., Huey, R. and Olson, A.J., J. Comput.-Aided Mol. Design, 10 (1996) 293.
19. Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R.S., Hart, W.E., Belew, R.K. and Olson, A.J., J. Comput. Chem., 19(14) (1998) 1639.
20. Kuntz, I.D., Meng, E.C. and Shoichet, B.K., Acc. Chem. Res., 27 (1994) 117.
21. Fischer, D., Lin, S.L., Wolfson, H.J. and Nussinov, R., J. Mol. Biol., 248 (1995) 459.
22. Ruppert, J., Welch, W. and Jain, A.N., Protein Sci., 6 (1997) 524.
23. Oshiro, C.M., Kuntz, I.D. and Scott Dixon, J., J. Comput.-Aided Mol. Design, 9 (1995) 113.
24. Eisen, M.B., Wiley, D.C., Karplus, M. and Hubbard, R.E., Proteins Struct. Funct. Genet., 19 (1994) 199.
25. Shoichet, B.K., Stroud, R.M., Santi, D.V., Kuntz, I.D. and Perry, K.M., Science, 259 (1993) 1445.
26. Böhm, H.-J., J. Comput.-Aided Mol. Design, 8 (1994) 623.
27. Gschwend, D.A., Sirawaraporn, W., Santi, D.V. and Kuntz, I.D., Proteins Struct. Funct. Genet., 29 (1997) 59.
28. Lorber, D.M. and Shoichet, B.K., Protein Sci., 7 (1998) 938.
29. Shoichet, B.K., Leach, A.R. and Kuntz, I.D., Proteins Struct. Funct. Genet., 34 (1999) 4.
30. Makino, S., Ewing, T.J.A. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 13 (1999) 513.
31. Leach, A.R., J. Mol. Biol., 235 (1994) 345.
32. Jackson, R.M., Gabb, H.A. and Sternberg, M.J.E., J. Mol. Biol., 276 (1998) 265.
33. Totrov, M. and Abagyan, R., Proteins Struct. Funct. Genet., Supplement 1 (1997) 215.
34. Sandak, B., Wolfson, H.J. and Nussinov, R., Proteins Struct. Funct. Genet., 32 (1998) 159.
35. Knegtel, R.M.A., Kuntz, I.D. and Oshiro, C.M., J. Mol. Biol., 266 (1997) 424.
36. Wasserman, Z.R. and Hodge, C.N., Proteins Struct. Funct. Genet., 24 (1996) 227.
37. Apostolakis, J., Plückthun, A. and Caflisch, A., J. Comput. Chem., 19 (1998) 21.
38. Mangoni, M., Roccatano, D. and Di Nola, A., Proteins Struct. Funct. Genet., 35 (1999) 153.
39. Poornima, C.S. and Dean, P.M., J. Comput.-Aided Mol. Design, 9 (1995) 500.
40. Poornima, C.S. and Dean, P.M., J. Comput.-Aided Mol. Design, 9 (1995) 513.
41. Ladbury, J.E., Chem. Biol., 3 (1996) 973.
42. Sanschagrin, P.C. and Kuhn, L.A., Protein Sci., 7 (1998) 2054.
43. Rarey, M., Kramer, B. and Lengauer, T., Proteins Struct. Funct. Genet., 34 (1999) 17.
44. Lawrence, M.C. and Davis, P.C., Proteins Struct. Funct. Genet., 12 (1992) 31.
45. Toyoda, T., Brobey, R.K.B., Sano, G., Horii, T., Tomioka, N. and Itai, A., Biochem. Biophys. Res. Commun., 235 (1997) 515.
46. Horvath, D., J. Med. Chem., 40 (1997) 2412.
47. Burkhard, P., Taylor, P. and Walkinshaw, M.D., J. Mol. Biol., 277 (1998) 449.
48. Schnecke, V. and Kuhn, L.A., In Procs. ISMB 99, 7th Int. Conf. on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, CA, 1999, pp. 242–251.
49. Kuhn, L.A., Swanson, C.A., Pique, M.E., Tainer, J.A. and Getzoff, E.D., Proteins Struct. Funct. Genet., 23 (1995) 536.
50. Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, 1973.
51. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V. and Mee, R.P., J. Comput.-Aided Mol. Design, 11 (1997) 425.
52. Koehl, P. and Delarue, M., J. Mol. Biol., 239 (1994) 249.
53. Koehl, P. and Delarue, M., Curr. Opin. Struct. Biol., 6 (1996) 222.
54. Sadowski, J., Gasteiger, J. and Klebe, G., J. Chem. Inf. Comput. Sci., 34 (1994) 1000.
55. Mol, C.D., Arvai, A.S., Slupphaug, G., Kavli, B., Alseth, I., Krokan, H.E. and Tainer, J.A., Cell, 80 (1995) 869.
56. Parikh, S.S., Mol, C.D., Slupphaug, G., Bharati, S., Krokan, H.E. and Tainer, J.A., EMBO J., 17 (1998) 5214.
57. Mol, C.D., Arvai, A.S., Sanderson, R.J., Slupphaug, G., Kavli, B., Krokan, H.E., Mosbaugh, D.W. and Tainer, J.A., Cell, 82 (1995) 701.
58. Heringa, J. and Argos, P., Proteins Struct. Funct. Genet., 37 (1999) 44.
59. Knegtel, R.M.A., Bayada, D.M., Engh, R.A., von der Saal, W., van Geerestein, V.J. and Grootenhuis, P.D.J., J. Comput.-Aided Mol. Design, 13 (1999) 167.
60. Böhm, H.-J., J. Comput.-Aided Mol. Design, 8 (1994) 243.
61. Böhm, H.-J., J. Comput.-Aided Mol. Design, 12 (1998) 309.
62. Jain, A.N., J. Comput.-Aided Mol. Design, 10 (1996) 427.
63. Head, R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green, S.M. and Marshall, G.R., J. Am. Chem. Soc., 118 (1996) 3959.
64. Murray, C.W., Auton, T.R. and Eldridge, M.D., J. Comput.-Aided Mol. Design, 12 (1998) 503.
65. Mügge, I. and Martin, Y.C., J. Med. Chem., 42(5) (1999) 791.
66. Cody, V., Sutton, P.A. and Welsh, W.J., J. Am. Chem. Soc., 109 (1987) 4053.

Perspectives in Drug Discovery and Design, 20: 191–207, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Similarity versus docking in 3D virtual screening

JORDI MESTRES* and RONALD M.A. KNEGTEL**
Molecular Design and Informatics, N.V. Organon, 5340 BH Oss, The Netherlands

Summary. The development of similarity methods for fast flexible ligand superposition has recently received considerable attention. These efforts have brought similarity methods to a level of performance comparable to the well-established protein-ligand docking methods for binding mode assessment and molecular database screening. However, the strengths and intrinsic limitations of both methodologies have also been stressed extensively. As the number of resolved ligand-bound protein structures increases, combining ligand-based and receptor-based approaches emerges as a consensus strategy to maximally exploit the structural information available and improve the results obtained with either of the methods alone.

Key words: flexible ligand docking, flexible ligand superposition, molecular similarity, thrombin, virtual screening

Introduction

Virtual screening (VS, also referred to as molecular database screening or in silico screening) is the process of reducing a library containing an unmanageable number of compounds (available or virtual) to a limited number of potentially promising compounds for the target (or target family) of interest by means of computational techniques [1]. The rapid advance of new technologies such as combinatorial chemistry [2] and high-throughput screening [3] offers the possibility of synthesizing and testing hundreds of thousands of compounds. However, as the pressure on pharmaceutical companies to deliver more targets to the lead discovery pipeline increases, VS will increasingly become a valuable strategy for prioritizing compounds for screening. This should provide an optimum balance between retaining the possibility of screening every single target and keeping time, cost, and waste of compounds at a reasonable level.

* To whom correspondence should be addressed. E-mail: j.mestres@organon.oss.akzonobel.nl
** Present address: Vertex Pharmaceuticals, 88 Milton Park, Abingdon, Oxon OX14 4RY, U.K.

The recent advances and developments in library design and automated computational methods for VS [4–6] have broadened its applicability to the different stages of a pharmaceutical project, namely, hit discovery, hit optimization, and lead optimization. At the hit discovery level, VS can be applied to general libraries of diverse available compounds in the search for those compounds containing alternative scaffolds showing the shape and chemical characteristics appropriate for the target of interest. At this stage, the activity criterion is usually not very stringent and a compound in the micromolar range can usually be considered a ‘hit’. Once hits are identified and confirmed, a hit optimization program starts. At this level, VS can be applied to more focused libraries of available or virtual compounds built around each one of the selected hits. If the activity of an original hit is improved up to the sub-micromolar range, then that hit becomes a ‘lead’. Difficulties during the optimization of a hit may foreshadow further difficulties during lead optimization. Therefore, it is important to have different hit choices at this level, as some of the hits identified at the discovery level will never be promoted to leads for strategic reasons. Finally, in a lead optimization program, VS can be applied to more targeted libraries aiming at further improving not only the activity of the compound but also its selectivity, toxicity and ADME properties [7]. As the selection criteria for compounds are narrowed when going from hit discovery to lead optimization, the efficiency of VS will depend more on the amount and type of structural information available.

With the introduction of biophysical methods in drug discovery [8] during the 1980s it became possible to determine the binding modes of ligands in the context of a protein binding site at atomic resolution. This made it extremely attractive to use the shape and chemical composition of the binding site for searching databases for complementary molecules. On this basis, putative ligands can be potentially retrieved from databases of (virtual) molecules [9] or designed de novo to fit the protein binding site [10]. Since then, the relevance of docking methods for, on one hand, identifying molecules from a database that optimally fit into the protein active site and, on the other hand, proposing a binding mode for those molecules, has been widely established [11–20].

Although docking molecules to the 3D structure of their biological targets provides, in principle, the most accurate filter to identify and improve lead compounds, docking protocols suffer from a number of limitations. First of all, the conformational and orientational space to be searched for each potential ligand and its receptor is vast [21,22]. Any attempt to reduce it has consequences for the breadth and accuracy of the search. Secondly, the thermodynamic and quantum-mechanical descriptions of protein-ligand interactions are daunting and currently still beyond a definite and exact theoretical description [23–27]. This is, for instance, reflected in the painful absence of reliable and broadly applicable scoring functions for protein-ligand interactions [28–31]. Finally, the receptor structure is often perceived as a constant [32], while in fact protein structures are known to adapt themselves to varying extents to the ligand presented to them [33–36]. Due to the sharp increase in van der Waals repulsion at short distances, small changes in the receptor structure can drastically change the course of a docking simulation. Clearly, attempting to dock a ligand to a non-complementary protein conformation defies the lock-and-key paradigm that underlies the concept of docking. In conclusion, despite its proven usefulness, molecular docking has not yet achieved the degree of accuracy that might have been expected, when it was originally conceived, for situations in which the structure of the protein or of a protein-ligand complex is known in atomic detail [37–42].

Central to the application of computational techniques in medicinal chemistry is the similarity-property principle, which states that structurally related molecules should demonstrate similar biological activities [43,44]. The derivation of classical QSAR [45,46] and more recently 3D-QSAR methods [47–51] as well as the concept of molecular similarity and diversity for designing virtual libraries [52–57] rely on this assumption. Since in many cases only structural information is available on active ligands at the onset of a project, many computational approaches have focused on relating observed activities with the 2D or 3D structural and physico-chemical properties of the ligands. Some methods aim at evaluating molecular descriptors [58–60] or identifying the presence or absence of certain chemical groups (fingerprints) [61,62]. Other methods search for substructure sets [63,64], geometric patterns or pharmacophores [65–67] within sets of active and inactive molecules. Finally, several methods based on the steric, electrostatic, hydrophobic and/or hydrogen-bonding surface properties of active molecules have also been developed [68–78]. In the last two categories, the alignment of molecules, in a way that is of relevance to their mode of action, represents a fundamental and challenging issue for the entire similarity approach [79]. Even though protocols for 3D alignment of molecules may yield ‘perfect’ alignments, according to the definition of similarity being used, there is no guarantee that the alignment obtained has relevance to the true binding modes of these molecules when complexed with their biological receptor. This presents an intrinsic caveat to the approach and indeed, cases are known where similar ligands display radically different binding modes [79–81]. Even more subtle deviations from the true binding mode, suggested by rigorously superimposing similar molecules, may affect the predictive power of models based on such superpositions [82].

Despite their limitations, both approaches have developed into practical and useful tools for the computational chemist working in the pharmaceutical industry. The principles of molecular similarity and chemical complementarity to the receptor have recently been implemented in computational tools for virtual screening of compound libraries. In practice, however, both methods are often used separately. When the structure of the receptor is unknown, active ligands are used to understand and predict the activities of novel compounds. If crystal structures are available for the system at hand, structure-based design would be the method of choice [83]. Given the strengths and limitations of both methodologies, however, a combination of ligand-based and receptor-based approaches could be used to improve the results obtained with either of them. The alignment of ligands may be improved by including knowledge on the binding site (vide infra) and docking may be performed more effectively if similarity with known ligand binding modes is enforced [20]. Although molecular docking has always been aimed at searching for all possible binding modes of novel ligands, even if they are very different from previously observed binding modes, a complete search of the binding site may not be necessary in most practical applications. On the basis of currently available crystal structures one can conclude that diverse inhibitors in most instances occupy the same regions within the binding site, and in structure-based focusing of combinatorial libraries the use of a rigid and pre-placed scaffold greatly simplifies the docking of analogues [16]. Finally, large discrepancies between results obtained with both methods may identify compounds that deviate from the known SAR by manifesting new binding modes, or may question the applicability of either method to the system under study. In order to investigate some of these critical issues, the results obtained from two implementations of similarity and docking methods in a 3D virtual screening test case will be compared.

Methodological aspects

All docking calculations were done with the program DOCK 4.0 [21]. Sphere centers, to be used for matching with ligand atoms during docking, were generated in the ligand binding site with SPHGEN [11]. Initial sphere sets were manually reduced using Sybyl 6.5 [84] to yield sets of 20–30 well-dispersed spheres. DOCK 4.0 scoring grids were generated with CHEMGRID [85], using a 0.3 Å grid spacing and a 4r distance-dependent dielectric. The number of accepted bumps between receptor and ligand was set to 3. The minimum anchor size was set to 4 atoms. A maximum of 100 iterations and a convergence threshold of 0.1 were set for chemical score minimization. The scope of the sampling is related to the number of starting orientations that can be tried for each base fragment (anchor) and the number of fragment orientations (seeds) that are kept at each stage of the ligand incremental-construction process. During sampling, the number of starting orientations and the number of seeds were set to 100 and 5, respectively.

All similarity calculations were done with the program MIMIC [76]. Two types of Gaussian-based molecular fields were used to evaluate molecular similarities. An atom-centered steric-volume field (MSV) [86] and a united-atom point-charge electrostatic potential (MEP) [69] are used to represent the steric and electrostatic features of a molecule, respectively. Although these are rather crude approximations, they are generally adequate to reproduce qualitatively most of the important steric and electrostatic features of a molecule [87–89]. A field-based similarity measure ($Z_{AB}$) between two molecules is computed as

$$Z_{AB} = \int F_A(\mathbf{r}) \, F_B(\mathbf{r}) \, d\mathbf{r} \qquad (1)$$

where $F_A(\mathbf{r})$ and $F_B(\mathbf{r})$ are the molecular fields associated with the reference molecule, A, and the target molecule, B, respectively. Then, in order to compare similarity values from different pairs of molecules, similarity measures are normalized using a cosine-like similarity index, $S_{AB}$, defined as

$$S_{AB} = \frac{Z_{AB}}{\sqrt{Z_{AA} \, Z_{BB}}} \qquad (2)$$

$S_{AB}$ depends on the mutual superposition of molecules, as well as on their conformations. Therefore, assuming a rigid reference molecule, optimization of $S_{AB}$ depends on three translational (t) and three rotational (Θ) degrees of freedom [76,87,88], as well as on a number of conformational degrees of freedom [20,89] for the target molecule ($\tau_B$, set of rotatable bonds). Depending on the molecular field used to calculate the similarity, $S_{AB}$ can take values between 0 and 1, for positive definite molecular fields, such as MSV, and between -1 and +1, for non-positive definite fields, such as MEP. Following previous validation studies [87–89], throughout the work a similarity index defined by a weighted combination of MSV and MEP similarity indices in a 2:1 ratio ($S_{AB} = \tfrac{2}{3} S_{AB}^{MSV} + \tfrac{1}{3} S_{AB}^{MEP}$) was used.

In order to find the optimal superposition between the reference and the target molecules, the orientational space is explored using a spherical systematic search [76]. Basically, the reference molecule is kept fixed and the adapting molecule is systematically placed in a number of orientations around the reference molecule, from which optimization of the molecular similarity, as defined by Equation 2, proceeds following normal gradient-seeking techniques. During sampling, the number of starting orientations and the gradient convergence criterion were set to 12 (90-degree search) and 0.01, respectively.
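To make the field-based similarity concrete, the following is a minimal sketch of a cosine-like similarity index for fields built from spherical Gaussians. It is not MIMIC's code. It assumes that each field is a sum of weighted Gaussians of the form w·exp(-a|r - c|²), uses the standard closed-form overlap integral of two such Gaussians, and takes the 2:1 weighting of steric and electrostatic indices from the text. All function names, exponents and coordinates are hypothetical.

```python
import math


def gaussian_overlap(alpha, center_a, beta, center_b):
    """Integral over all space of exp(-alpha|r-A|^2) * exp(-beta|r-B|^2)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    total = alpha + beta
    return (math.pi / total) ** 1.5 * math.exp(-alpha * beta / total * dist_sq)


def field_overlap(mol_a, mol_b):
    """Overlap Z_AB of two fields, each a list of (weight, exponent, (x, y, z))."""
    return sum(
        wa * wb * gaussian_overlap(ea, ca, eb, cb)
        for wa, ea, ca in mol_a
        for wb, eb, cb in mol_b
    )


def similarity_index(mol_a, mol_b):
    """Cosine-like index S_AB = Z_AB / sqrt(Z_AA * Z_BB)."""
    z_ab = field_overlap(mol_a, mol_b)
    z_aa = field_overlap(mol_a, mol_a)
    z_bb = field_overlap(mol_b, mol_b)
    return z_ab / math.sqrt(z_aa * z_bb)


def combined_similarity(s_steric, s_electrostatic, w_steric=2.0, w_elec=1.0):
    """Weighted mean of the two indices (2:1 by default, as in the text)."""
    return (w_steric * s_steric + w_elec * s_electrostatic) / (w_steric + w_elec)


# Toy example: two hypothetical two-atom "molecules" with unit weights.
reference = [(1.0, 0.5, (0.0, 0.0, 0.0)), (1.0, 0.5, (1.5, 0.0, 0.0))]
target = [(1.0, 0.5, (0.2, 0.0, 0.0)), (1.0, 0.5, (1.7, 0.1, 0.0))]
print(round(similarity_index(reference, target), 3))
```

With all-positive weights the index stays between 0 and 1. With signed weights, as for point charges, it can reach -1, for instance when one field is the exact negative of the other.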

An automatic molecular-size check with respect to the reference molecule decides whether sampling needs to be extended further for smaller target molecules or limited for much larger target molecules. Finally, MIMIC has recently also been extended to allow the inclusion of an active-site bump-penalty term in the similarity scoring of the different alignments obtained when sampling. This is simply done by evaluating the asymmetric steric-field similarity between the receptor active site, R, and the target molecule, B, with respect to the target molecule as

$$S_{RB} = \frac{Z_{RB}^{MSV}}{Z_{BB}^{MSV}} \qquad (3)$$

The final similarity between the reference molecule, A, and a target molecule, B, with the inclusion of a bump-penalty term with the receptor, R, is evaluated as $S_{AB}^{DP} = S_{AB} - S_{RB}$. At present, the steric penalty is applied only after the MIMIC search has been completed. Therefore, this correction influences at most the relative ordering of the alignment solutions, and only if steric clashes are detected in the alignments. Throughout this work, a MIMIC calculation including the active site bump penalty will be referred to as DP-MIMIC (Docking Penalized MIMIC calculation). The DP-MIMIC approach can be considered the complementary implementation of the recently developed SP-DOCK approach (Similarity Penalized DOCK calculation), where a ligand-ligand similarity term is included in the docking energy score [20].
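The effect of applying the penalty only after the search can be illustrated with a short re-ranking sketch. It assumes that the penalized score is obtained by subtracting the receptor overlap term from the ligand-ligand similarity; the function name, the alignment labels and the numerical values are hypothetical.

```python
def rerank_with_bump_penalty(alignments):
    """Re-rank finished alignments by similarity minus receptor overlap.

    alignments: list of (label, s_ab, s_rb) tuples, where s_ab is the
    ligand-ligand similarity and s_rb the steric overlap with the active
    site. Returns (label, penalized_score) pairs, best first.
    """
    penalized = [(label, s_ab - s_rb) for label, s_ab, s_rb in alignments]
    return sorted(penalized, key=lambda item: item[1], reverse=True)


# Illustrative values only: the best unpenalized alignment clashes with
# the receptor, so the second-best alignment is promoted.
alignments = [
    ("alignment_1", 0.80, 0.15),
    ("alignment_2", 0.76, 0.02),
]
for label, score in rerank_with_bump_penalty(alignments):
    print(label, round(score, 2))
```

Because the penalty is applied to a fixed list of solutions, it can only reorder alignments that the similarity search already found; it cannot produce a new alignment that avoids the clash.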

3D virtual screening for thrombin ligands

Structure-based three-dimensional VS (3D-VS) is quite often identified solely with the application of docking methods once the structure of the target of interest has been resolved. In many cases, several years are required to obtain the target structure, which limits the applicability of VS from docking to quite late in a pharmaceutical project timeline. The recent development of similarity methods allowing for fast flexible ligand superposition [20,77,78] has moved the applicability of 3D-VS much earlier in the project stages. In most cases, 3D-VS from similarity could now be applied as soon as hits from high-throughput screening are confirmed or become publicly available from other sources. Therefore, taking docking as the method of reference, it is probably appropriate to compare the performance of similarity and docking methods in 3D-VS to extrapolate the validity of applying 3D similarity methods at those early stages. In this contribution, the performance of two particular implementations, MIMIC for similarity and DOCK for docking, will be compared in a virtual screening of 10 000 diverse compounds selected from the ACD database in search of thrombin inhibitor-like compounds. The thrombin case has been selected, on the one hand, because of its extensive use as a validation test by other implementations of similarity and docking methods, being thus relevant for comparative purposes. On the other hand, it has been shown recently that, despite having a well-defined active site, it can still be a challenge for docking methods aiming at obtaining small root-mean-square deviations in binding mode assessment applications [20,30] and good active-molecule enrichments in molecular database screening applications [20,42].

As structural templates for the 3D-VS, the NAPAP ligand in MIMIC calculations and its complementary thrombin structure in DOCK calculations were extracted from the Protein Data Bank (1DWD, 3.0 Å resolution). Hydrogen atoms were automatically added to the crystal structure of the NAPAP ligand and Gasteiger-Marsili [90] atomic charges generated using Sybyl 6.5. Along with the structure of the NAPAP ligand, the active site residues included in a 10 Å sphere around NAPAP were also considered in DP-MIMIC calculations. In order to reduce bias towards the observed water-assisted ligand-protein interactions in 1DWD, no crystallographic water molecules were considered in DOCK calculations.

A diverse selection of 10 000 molecules was obtained by clustering the BCI fingerprints [58] of the entire ACD database. For all molecules, three-dimensional structures were generated automatically by CORINA [91], and Gasteiger-Marsili [90] atomic charges calculated with Sybyl 6.5 [84]. This set of 10 000 molecules constitutes the virtual library that was screened with MIMIC and DP-MIMIC to identify molecules with similar steric and electrostatic characteristics to the NAPAP ligand and with DOCK to identify molecules with complementary shape and interaction sites to the thrombin active site.
The computational cost of screening those 10 000 molecules with MIMIC and DP-MIMIC was about 50 h and with DOCK about 255 h on a single CPU of an SGI/R10000 machine. However, it is important to note that 3D-VS calculations can be fully performed in parallel by distributing different database chunks to the number of CPUs available. An important issue in any analysis of virtual screening results is how to decide which fraction of the best-scoring molecules will be further considered for a more detailed inspection or directly submitted for a prioritized experimental testing. Commonly, the 5–10% highest-ranking molecules within the database are subjectively chosen. However, a more rational method of selecting the set of potentially most interesting molecules would be desirable. With this in mind, the use of the (normalized) similarity or docking scores is proposed as an objective metric for the (de)selection of molecules from 3D searches [20].
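One hypothetical way to turn normalized scores into an objective cutoff is to follow the ranked score curve until its decline flattens. The sketch below is an illustration only and does not reproduce the procedure of ref. [20]: it assumes scores are normalized by dividing by the best score (which must be non-zero, and which works for similarity indices and for negative docking energies alike), and the `min_drop` and `window` parameters are illustrative choices.

```python
def normalize(ranked_scores):
    """Divide every score by the best (first) score, so the top entry is 1."""
    best = ranked_scores[0]
    return [score / best for score in ranked_scores]


def count_before_plateau(ranked_scores, min_drop=0.001, window=5):
    """Number of top-ranked molecules before the score decline flattens.

    ranked_scores must be sorted best first. The decline is measured as the
    average drop in normalized score per rank over the next `window` ranks.
    """
    norm = normalize(ranked_scores)
    for i in range(len(norm) - window):
        drop_per_rank = (norm[i] - norm[i + window]) / window
        if drop_per_rank < min_drop:
            return i
    return len(norm)


# Toy ranked list: a steep initial decline followed by a nearly flat tail.
steep = [1.0 - 0.05 * i for i in range(10)]
tail = [0.5 - 0.0005 * j for j in range(40)]
print(count_before_plateau(steep + tail))
```

Unlike a fixed top 5–10% rule, the number of molecules kept here depends on the shape of the score curve, so a screen that discriminates poorly will not yield an arbitrarily long hit list by default.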


Figure 1. Normalized score obtained from MIMIC (in blue) and DOCK (in red) for the best-scoring 2000 molecules.

Figure 1 depicts the decline of the normalized MIMIC similarity and DOCK energy scores with increasing rank for the highest-ranking 2000 molecules obtained by the two 3D-VS runs. The distribution of scores obtained with DP-MIMIC is practically identical to that plotted in Figure 1 for the original MIMIC search. Thus, although the final rankings of individual molecules may change due to the active site steric-field correction, the overall discrimination between highly ranked molecules and the bulk of the database is maintained in DP-MIMIC with respect to MIMIC. Nevertheless, it is interesting to observe from Figure 1 that both MIMIC and DOCK methods experience a sharp decrease in normalized scores within the first 200 molecules. After that, the scores experience an almost linear decrease [20]. In particular, the DOCK score provides a faster initial decay than the MIMIC score. This is most likely due to two factors. The first is the substantial emphasis put on charged contacts by the DOCK energy score (i.e. in the P1 pocket of thrombin), which is less pronounced in the MIMIC similarity score (based on a ‘soft’ Gaussian representation of the electrostatic potential). The second is that, by definition, molecules will always overlap to some extent in similarity methods. In any case, the shape of the curves in Figure 1 indicates that the first 100–200 molecules in the ranked list are clearly favoured by MIMIC and DOCK over those molecules constituting the linear plateau that is eventually reached. Therefore, it is important always to check whether the 3D-VS method being used provides a clear discrimination between a reduced number of high-scoring molecules and the rest of the database. If this is indeed the case, by simply setting a minimum for the gradient of the normalized score, an independent and rational metric can be obtained to select molecules from different virtual screening approaches for further analysis or experimental testing [20].

Another point worth analyzing when comparing 3D-VS results from different methods or implementations of the same method is their degree of consensus or redundancy within the highest-scoring molecules. As recently pointed out [1], one way to get around the problem of discriminating a set of potentially interesting molecules is to perform consensus scoring by selecting only compounds that reach a certain threshold score in a number of different scoring functions. In the case presented here, however, consensus will be demanded between different 3D-VS methodologies rather than between protein-ligand scoring functions [20]. By following this strategy it is expected that the likelihood of selecting potentially interesting compounds will be higher.

Figure 2 provides a chemical diversity analysis of the molecules favoured by both MIMIC and DOCK. For each of the 10 000 molecules of the database, 45 descriptors were computed, which were previously selected by cluster significance analysis of a larger number of descriptors. Subsequently, a principal component analysis was done. The axes in Figure 2 represent the first two principal components (explaining 45.8% and 12.6% of the total variance), the values of which have been scaled in order to produce a uniform density of molecules along the two axes. A more detailed description of the diversity analysis performed here can be found elsewhere [55]. There are two aspects worth remarking from Figure 2. The first one is that the molecules that have been ranked by both MIMIC and DOCK within the top 1000 scores (in red) cover almost the same diversity space as that described by the entire set of 10 000 diverse compounds from the ACD (in gray).
Although one may have expected molecules selected on the basis of 3D similarity or docking to be limited to a confined region of chemical space, a diverse range of structures is retrieved by both methods. The second one is that some of the consensus molecules overlap well with the property-space regions populated by known thrombin inhibitors (in blue). In fact, the consensus compounds cover more diversity space than the selection of known thrombin inhibitors. However, it is not possible to state a priori whether this is due to a true diversification in active scaffolds for thrombin or to a lack of specificity of the scoring functions used by MIMIC and DOCK. Selection of these compounds for further assay testing should give the answer a posteriori.

Figure 2. First two principal components of the diversity analysis of the 470 consensus molecules between MIMIC and DOCK within the best-scoring 1000 molecules (in red) with respect to the entire set of 10 000 ACD compounds (in gray) and 32 known thrombin inhibitors (in blue). The descriptor space has been scaled to produce a uniform density of compounds along the two axes.

At this point, it is interesting to address the question of whether combining 3D similarity and protein structure-based approaches could improve the retrieval of potential hits from chemical databases with respect to the original similarity and docking implementations. Our recent experiences with including a ligand-ligand similarity term in the context of flexible protein-ligand docking have shown promising results [20]. Here, we report the novel use of information on the active site environment in 3D similarity searches, as implemented in the program MIMIC (vide supra). The percentages of consensus molecules between MIMIC/DP-MIMIC, MIMIC/DOCK, DP-MIMIC/DOCK and MIMIC/DP-MIMIC/DOCK within the 1000 best-scoring molecules (10% of the database) are 66.9, 47.0, 43.5 and 36.0%, respectively. Interestingly, MIMIC and DP-MIMIC show less than 67% of consensus molecules. As a result of the inclusion of an active site bump penalty in the similarity scoring, a different ranking of molecules is clearly obtained from MIMIC and DP-MIMIC. Unfortunately, this does not translate into a higher number of consensus molecules between DP-MIMIC and DOCK than between MIMIC and DOCK, probably due to the fact that the active site penalty is incorporated only after, and not during, the similarity optimization. It is, though, gratifying to find that two conceptually different methods, such as MIMIC and DOCK, still reach consensus on almost half (47%) of the 1000 top-scoring molecules and that more than one third (36%) of the 1000 top-scoring molecules are independently identified by MIMIC, DP-MIMIC, and DOCK.

A final aspect of the comparison between similarity and docking approaches to 3D-VS concerns the binding modes proposed by each method. Although they are computationally demanding, one of the attractive characteristics of structure-based methods is their ability to provide a 3D alignment (in similarity methods) or orientation into the active site (in docking methods), from which a ligand optimization program can be more intuitively guided. It would thus be interesting to analyze the degree of resemblance between the binding modes proposed from similarity and docking for some of the consensus molecules identified during the 3D-VS. A detailed comparison of the performance of MIMIC and DOCK for assessing the binding mode of a set of 32 known thrombin inhibitors was recently performed [20]. For the sake of completeness, the effect of introducing a ligand-ligand similarity penalty term into DOCK (SP-DOCK) was also investigated [20]. Here, emphasis will be put on analyzing the effect of introducing a protein-ligand docking penalty term into MIMIC (DP-MIMIC) on the final binding mode. For this purpose, the best binding mode solutions for two consensus compounds, MFCD00038940 and MFCD00063402, independently identified within the top-scoring 100 molecules (1% of the database) by MIMIC, DP-MIMIC, and DOCK, are depicted in Figure 3. In general, three situations have been encountered when introducing a docking penalty term into the similarity score in DP-MIMIC as compared to the original similarity score in MIMIC.
First, inclusion of a bump penalty retains the best alignment identified from MIMIC using similarity only. This is especially likely if the ligands are of similar or smaller size and shape than the reference inhibitor and additional clashes with the protein are not expected.

Second, inclusion of the bump penalty promotes a lower-scoring solution from MIMIC to the best alignment solution from DP-MIMIC, but the new best alignment solution is just a refinement of the previous one, thus resulting in a binding mode which is qualitatively equivalent to the one identified by MIMIC. This is the case for MFCD00063402. As can be observed in Figure 3, because some steric clash with the active site is identified in the best solution from MIMIC, the second-best solution from MIMIC becomes the best solution from DP-MIMIC, approaching in this case the best solution from DOCK.

Third, inclusion of the bump penalty promotes a lower-scoring solution from MIMIC to the best alignment solution from DP-MIMIC, resulting in an essentially different binding mode. This is the case for MFCD00038940. As can be observed in Figure 3, the naphthalene ring system in MFCD00038940 clashes with the protein active site in the best solution found by MIMIC. Due to this dramatic steric clash, the second-best solution from MIMIC becomes the best solution from DP-MIMIC, where the spatial positions of the naphthalene and benzene rings have been interchanged and a completely different binding mode is proposed. Analysis of the 10 best orientations identified by DOCK reveals that they define two major binding modes. Interestingly, in this case DOCK favors a binding mode closer to that identified by MIMIC, albeit with an orientation with much less 3D overlap with NAPAP. However, solutions resembling the binding mode proposed by DP-MIMIC are also found with lower chemical score.

Figure 3. Binding mode solutions for two consensus compounds, MFCD00038940 and MFCD00063402 (with carbon atoms in white), identified within the best-scoring 100 molecules from MIMIC, DP-MIMIC, and DOCK. The structure of NAPAP (with carbon atoms in green) is provided as a visual reference in all cases. The active site of thrombin is represented by a blue translucent surface.

Therefore, hybrid similarity-docking methods such as DP-MIMIC or SP-DOCK [20] do represent real alternatives to the pure MIMIC and DOCK methods. With the number of ligand-bound protein structures available in the Protein Data Bank constantly increasing, the possibility of using structure-based methods that maximally exploit all the information present in ligand-bound structures will become more attractive.
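The consensus percentages quoted in this section (for example, 470 shared molecules among the 1000 best-scoring ones, i.e. 47.0%, for MIMIC and DOCK) amount to intersecting top-N lists. The following is only a minimal sketch of that bookkeeping, not code from MIMIC or DOCK; the compound identifiers, the scores and the function names are hypothetical.

```python
def top_n(scores, n, higher_is_better=True):
    """Return the set of the n best-scoring compound identifiers."""
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return set(ranked[:n])


def consensus_percentage(score_tables, n):
    """Percentage of a top-n list that is shared by every method.

    score_tables is a list of (scores, higher_is_better) pairs, one per method.
    """
    selections = [top_n(scores, n, hib) for scores, hib in score_tables]
    shared = set.intersection(*selections)
    return 100.0 * len(shared) / n


# Toy data (made up): five compounds scored by two methods.
similarity = {"a": 0.91, "b": 0.85, "c": 0.40, "d": 0.77, "e": 0.10}   # higher is better
docking = {"a": -42.0, "b": -12.0, "c": -39.5, "d": -40.1, "e": -5.0}  # lower is better

pct = consensus_percentage([(similarity, True), (docking, False)], n=3)
print(f"{pct:.1f}% of the top 3 are shared by both methods")
```

Passing three score tables instead of two gives the three-way consensus in the same way.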

Conclusions

Virtual screening, using 3D structural information of a ligand, a protein or a ligand-bound protein structure, has rapidly developed into a useful addition to available 2D ligand-based technologies. Although computationally slower than selections performed on the basis of 2D fingerprints and molecular descriptors, 3D virtual screening utilizes more information regarding the molecular-field requirements for biological activity. In addition, it has the potential to identify entirely new scaffolds and to guide ligand optimization programs more effectively. With the pressure from the identification of new targets to screen and the need to reduce time, cost and waste of compounds, combinatorial chemistry and high-throughput screening technologies should effectively integrate 3D virtual screening tools to prioritize reagent or compound selection for synthesis and testing.

In this article, ligand- and receptor-based VS strategies, as implemented in the MIMIC and DOCK programs, respectively, reach nearly 50% consensus on high-ranking compounds from a diverse chemical database. MIMIC may prove more likely to identify compounds that are missed by docking approaches due to the use of a single receptor conformation or imperfect scoring functions. DOCK, on the other hand, is less dependent on the reference structure and enforces complementarity with the receptor in regions not explored by the reference molecule used in similarity approaches. Introducing an additional steric field derived from the protein active site, as in DP-MIMIC, can focus ligand-based 3D screening even further at less computational cost than the use of a full-blown protein-ligand scoring function.

In conclusion, flexible 3D similarity searches provide a generally applicable approach to computational lead discovery, independent of the availability of structural data on the biological target. Their intrinsic dependence on the conformation of the reference molecule suggests, however, that a rigid reference scaffold or its experimentally determined bound conformation is preferable. If structural information on the receptor is available, docking approaches can be applied in lead discovery and especially lead optimization. At the optimization stage, one would prefer to use the maximal amount of information available to focus combinatorial libraries or congeneric series towards optimal fit with the binding site. It is thus envisaged that a promising strategy would be to combine ligand- and receptor-based virtual screening techniques, allowing one method to overcome limitations of the other.

References

1. Walters, P.A., Stahl, M.T. and Murcko, M.A., Drug Discov. Today, 3 (1998) 160.
2. Armstrong, R.W., Combs, A.P., Tempest, P.A., Brown, S.D. and Keating, T.A., Acc. Chem. Res., 29 (1996) 123.
3. Koltermann, A., Kettling, U., Bieschke, J., Winkler, T. and Eigen, M., Proc. Natl. Acad. Sci. USA, 95 (1998) 1421.
4. Van Drie, J.H. and Lajiness, M.S., Drug Discov. Today, 3 (1998) 274.
5. Kubinyi, H., Curr. Opin. Drug Discov. Develop., 1 (1998) 16.
6. Ghose, A.K., Viswanadhan, V.N. and Wendoloski, J.J., In Parrill, A. and Reddy, M.R. (Eds.) Rational Drug Design, American Chemical Society, Washington, DC, 1998.
7. Clark, D.E., Drug Discov. Today, 5 (2000) 49.
8. Martin, J.L., Curr. Med. Chem., 3 (1996) 419.
9. Kuntz, I.D., Science, 257 (1992) 1078.
10. Böhm, H.-J., Curr. Opin. Biotechnol., 7 (1996) 433.
11. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R.L. and Ferrin, T.E., J. Mol. Biol., 161 (1982) 269.
12. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
13. Welch, W., Ruppert, J. and Jain, A.N., Chem. Biol., 3 (1996) 449.
14. Jones, G., Willett, P., Glen, R.C., Leach, A.R. and Taylor, R., J. Mol. Biol., 267 (1997) 727.
15. Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R., Hart, W.E., Belew, R.K. and Olson, A.J., J. Comput. Chem., 19 (1998) 1639.
16. Sun, Y., Ewing, T.J., Skillman, A.G. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 12 (1998) 597.
17. Baxter, C.A., Murray, C.W., Clark, D.E., Westhead, D.R. and Eldridge, M.D., Proteins, 33 (1998) 367.
18. Wang, J., Kollman, P.A. and Kuntz, I.D., Proteins, 36 (1999) 1.
19. Trosset, J.Y. and Scheraga, H.A., J. Comput. Chem., 20 (1999) 412.
20. Fradera, X., Knegtel, R.M.A. and Mestres, J., Proteins Struct. Funct. Genet., 40 (2000) 623.
21. Ewing, T. and Kuntz, I.D., J. Comput. Chem., 18 (1997) 1175.
22. Vieth, M., Hirst, J.D., Dominy, B.N., Daigler, H. and Brooks III, C.L., J. Comput. Chem., 19 (1998) 1623.
23. Bush, B.L. and McCammon, J.A., Biophys. J., 72 (1997) 1047.
24. Ajay and Murcko, M.A., J. Med. Chem., 38 (1996) 4953.
25. Raffa, R.B., Life Sci., 65 (1999) 967.
26. Friesner, R.A. and Beachy, M.D., Curr. Opin. Struct. Biol., 8 (1998) 257.
27. Monard, G. and Merz Jr., K.M., Acc. Chem. Res., 32 (1999) 904.
28. Vieth, M., Hirst, J.D., Kolinski, A. and Brooks III, C.L., J. Comput. Chem., 19 (1998) 1612.
29. Tame, J.R.H., J. Comput.-Aided Mol. Design, 13 (1999) 99.
30. Knegtel, R.M.A., Bayada, D.M., Engh, R.A., von der Saal, W., van Geerestein, V.J. and Grootenhuis, P.D.J., J. Comput.-Aided Mol. Design, 13 (1999) 167.
31. Muegge, I. and Martin, Y.C., J. Med. Chem., 42 (1999) 791.
32. Gschwend, D.A. and Kuntz, I.D., J. Comput.-Aided Mol. Design, 10 (1996) 123.
33. Leach, A.R., J. Mol. Biol., 235 (1994) 345.
34. Knegtel, R.M.A., Kuntz, I.D. and Oshiro, C.M., J. Mol. Biol., 266 (1997) 424.
35. Apostolakis, J., Plückthun, A. and Caflisch, A., J. Comput. Chem., 19 (1998) 21.
36. Sandak, B., Wolfson, H.J. and Nussinov, R., Proteins, 32 (1998) 159.
37. Blaney, J.M. and Dixon, J.S., Perspect. Drug Discov. Design, 1 (1993) 301.
38. Jones, G. and Willett, P., Curr. Opin. Biotechnol., 6 (1995) 652.
39. Lengauer, T. and Rarey, M., Curr. Opin. Struct. Biol., 6 (1996) 402.
40. Sansom, C., Nat. Biotechnol., 16 (1998) 917.
41. Kirkpatrick, D.L., Watson, S. and Ulhaq, S., Comb. Chem. High-Through. Screen., 2 (1999) 211.
42. Knegtel, R.M.A. and Wagener, M., Proteins Struct. Funct. Genet., 37 (1999) 334.
43. Johnson, M.A. and Maggiora, G.M. (Eds.), Concepts and Applications of Molecular Similarity, Wiley, New York, NY, 1990.
44. Dean, P.M. (Ed.), Molecular Similarity in Drug Design, Blackie Academic, London, 1995.
45. Karelson, M., Lobanov, V.S. and Katritzky, A.R., Chem. Rev., 96 (1996) 1027.
46. Hansch, C., Hoekman, D. and Gao, H., Chem. Rev., 96 (1996) 1045.
47. Cramer III, R.D., Patterson, D.E. and Bunce, J.D., J. Am. Chem. Soc., 110 (1988) 5959.
48. Klebe, G., Perspect. Drug Discov. Design, 12/13/14 (1998) 87.
49. Kubinyi, H. (Ed.), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, 1993.
50. Kubinyi, H., Folkers, G. and Martin, Y.C. (Eds.), 3D QSAR in Drug Design: Ligand-protein Interactions and Molecular Similarity, Perspect. Drug Discov. Design, 9/10/11 (1998).
51. Kubinyi, H., Folkers, G. and Martin, Y.C. (Eds.), 3D QSAR in Drug Design: Recent Advances, Perspect. Drug Discov. Design, 12/13/14 (1998).
52. Willett, P., Similarity and Clustering in Chemical Information Systems, Wiley, New York, NY, 1994.
53. Martin, E.J., Blaney, J.M., Siani, M.A., Spellmeyer, D.C., Wong, A.K. and Moos, W.K., J. Med. Chem., 38 (1995) 1431.
54. Hassan, M., Bielawaski, J.P., Hempel, J.C. and Waldman, M., Mol. Div., 2 (1996) 64.
55. Bayada, D.M., Hamersma, H. and van Geerestein, V.J., J. Chem. Inf. Comput. Sci., 39 (1999) 1.
56. Martin, E.J. and Critchlow, R.E., J. Comb. Chem., 1 (1999) 32.
57. Agrafiotis, D.K., Myslik, J.C. and Salemme, F.R., Mol. Div., 4 (1999) 1.
58. Barnard, J.M. and Downs, G.M., J. Chem. Inf. Comput. Sci., 32 (1992) 644.
59. Brown, R.D. and Martin, Y.C., J. Chem. Inf. Comput. Sci., 37 (1997) 1.
60. Matter, H., J. Med. Chem., 40 (1997) 1219.
61. Flower, D.R., J. Chem. Inf. Comput. Sci., 38 (1998) 379.
62. Xue, J., Godden, J.W. and Bajorath, J., J. Chem. Inf. Comput. Sci., 39 (1999) 881.
63. Cramer, R.D., Redl, G. and Berkhoff, C.E., J. Med. Chem., 17 (1973) 533.
64. Hagadone, T.R., J. Chem. Inf. Comput. Sci., 32 (1992) 515.
65. Van Drie, J.H., J. Comput.-Aided Mol. Design, 11 (1997) 39.
66. Pickett, S.D., Luttmann, C., Guerin, V., Laoui, A. and James, E., J. Chem. Inf. Comput. Sci., 38 (1998) 144.
67. Chen, X., Rusinko III, A., Tropsha, A. and Young, S.S., J. Chem. Inf. Comput. Sci., 39 (1999) 887.
68. Kearsley, S.K. and Smith, G.M., Tetrahedron Comput. Methods, 3 (1990) 615.
69. Good, A.C., Hodgkin, E.E. and Richards, G.W., J. Chem. Inf. Comput. Sci., 32 (1992) 188.
70. Jain, A.N., Dietterich, T.G., Lathrop, R.H., Chapman, D., Critchlow Jr., R.E., Bauer, B.E., Webster, T.S. and Lozano-Perez, T., J. Comput.-Aided Mol. Design, 8 (1994) 635.
71. Klebe, G., Mietzner, T. and Weber, F., J. Comput.-Aided Mol. Design, 8 (1994) 751.
72. McMartin, C. and Bohacek, R.S., J. Comput.-Aided Mol. Design, 9 (1995) 237.
73. Perkins, T.D.J., Mills, J.E.J. and Dean, P.M., J. Comput.-Aided Mol. Design, 9 (1995) 479.
74. Jones, G., Willett, P. and Glen, R.C., J. Comput.-Aided Mol. Design, 9 (1995) 532.
75. Lemmen, C., Hiller, C. and Lengauer, T., J. Comput.-Aided Mol. Design, 12 (1998) 491.
76. Mestres, J., Rohrer, D.C. and Maggiora, G.M., J. Comput. Chem., 18 (1997) 934.
77. Lemmen, C., Lengauer, T. and Klebe, G., J. Med. Chem., 41 (1998) 4502.
78. Miller, M.D., Sheridan, R.P. and Kearsley, S.K., J. Med. Chem., 42 (1999) 1505.
79. Klebe, G., In Kubinyi, H. (Ed.) 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, 1993, p. 173.
80. Mattos, C. and Ringe, D., In Kubinyi, H. (Ed.) 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, 1993, p. 226.
81. Böhm, H.-J. and Klebe, G., Angew. Chem. Int. Ed. Engl., 35 (1996) 2588.
82. Grootenhuis, P.D.J. and van Helden, S.P., In Wipff, G. (Ed.) Computational Approaches in Supramolecular Chemistry, Kluwer Academic, Dordrecht, 1994, p. 137.
83. Babine, R.E. and Bender, S.L., Chem. Rev., 97 (1997) 1359.
84. Tripos Inc., St. Louis, MO, U.S.A.
85. Meng, E.C., Shoichet, B.K. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 505.
86. Rohrer, D.C., In Carbó, R. (Ed.) Molecular Similarity and Reactivity: From Quantum Chemical to Phenomenological Approaches, Kluwer, Amsterdam, 1995, p. 141.
87. Mestres, J., Rohrer, D.C. and Maggiora, G.M., J. Mol. Graph. Model., 15 (1997) 114.
88. Mestres, J., Rohrer, D.C. and Maggiora, G.M., J. Comput.-Aided Mol. Design, 13 (1999) 79.
89. Mestres, J., Rohrer, D.C. and Maggiora, G.M., J. Comput.-Aided Mol. Design, 14 (2000) 39.
90. Gasteiger, J. and Marsili, M., Tetrahedron, 36 (1980) 3219.
91. Sadowski, J., Gasteiger, J. and Klebe, G., J. Chem. Inf. Comput. Sci., 34 (1994) 1000.

Perspectives in Drug Discovery and Design, 20: 209–230, 2000. KLUWER/ESCOM
© 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Discovering high-affinity ligands from the computationally predicted structures and affinities of small molecules bound to a target: A virtual screening approach

TAMI J. MARRONE*, BROCK A. LUTY and PETER W. ROSE*
Agouron Pharmaceuticals, Inc., A Warner Lambert Company, San Diego, CA 92121–1111, U.S.A.

Summary. We describe a ‘virtual NMR screening’ method to assist in the design of inhibitors that occupy different sites within a target. We dock small molecules into the active site of an enzyme and score them. Keeping the tightest-binding lead fixed in space, we dock and score other small molecules in its presence. Linker groups are then used to join the compounds into a high-affinity inhibitor. We validate our computational approach by reproducing experimental results for FKBP and stromelysin. Docking simulations are not subject to experimental problems such as proteolysis, protein or compound insolubility, or enzyme size. Because docking is fast and our scoring method can distinguish between high- and low-affinity inhibitors, this docking procedure shows promise as an integral part of a drug-design strategy.

Key words: docking, drug design, SAR by NMR, virtual screening

* To whom correspondence should be addressed.

Introduction

One approach to drug design is the screening of libraries of compounds for a desired effect on a target (e.g., inhibition of an enzyme) using high throughput screening assays. Large numbers of compounds are screened rapidly with the expectation that this process will identify a high-affinity ligand for the target. More often than not, these screenings produce leads of low to moderate binding affinity which require further refinement to obtain a high-affinity lead. Shuker et al. [1] have reported a nuclear magnetic resonance (NMR)-based method to guide the synthesis of high-affinity inhibitors by linking two or more low-affinity inhibitors which bind in proximal sites of an enzyme. This methodology assumes that the free energy of binding can be approximated as an additive function; that is, the free energy of binding of the linked inhibitor is approximately the sum of the free energies of binding of each initial molecule and the linker groups. Hence linking together a micromolar inhibitor with a millimolar inhibitor could possibly create a nanomolar inhibitor.

In this NMR-based methodology, the NMR experiment serves as the high throughput screening assay and determines both the orientation of the ligands in their binding sites and their affinities. Both pieces of information are crucial for designing a composite inhibitor from the ligands. The orientations of the ligands within the enzyme and the enzyme structure guide the identity of the linker groups, the appropriate length of the linkers used to join the lead molecules, and the points of attachment to the lead molecules. The affinities of the ligands determine which compound in a series of analogues should be chosen as the optimized lead prior to linking, so that the tightest-binding inhibitor can be achieved.

Figure 1 is a schematic diagram of this experimental approach. A set of small organic compounds is screened and the binding constants of these ligands and the location of their binding sites are determined. A lead molecule is identified within this set, and analogs of this lead are then screened to optimize the binding to this primary site. In the presence of saturating concentrations of the ligand optimized for the primary site, small organic ligands are screened to determine their affinity for a site proximal to the primary site. A lead for this secondary site is identified and analogs are tested to optimize binding to the second site. Once the binding sites and affinities are determined for each small molecule lead, they are linked together to form a composite ligand. This method was successful for designing high-affinity inhibitors for the FK506 binding protein [1], FKBP, and stromelysin [2].
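The additivity argument above can be checked with a few lines of arithmetic. The sketch below assumes the standard relation ΔG = RT ln Kd (Kd relative to a 1 M standard state), a temperature of 298.15 K, and a linker contribution of zero; these choices and the function names are illustrative assumptions, not values taken from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def dg_to_kd(dg_kcal, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


dg_micromolar = kd_to_dg(1e-6)   # a micromolar fragment
dg_millimolar = kd_to_dg(1e-3)   # a millimolar fragment

# Additivity assumption: sum the fragment free energies (linker term set to 0).
dg_linked = dg_micromolar + dg_millimolar
kd_linked = dg_to_kd(dg_linked)

print(f"fragment dG: {dg_micromolar:.2f} and {dg_millimolar:.2f} kcal/mol")
print(f"linked   dG: {dg_linked:.2f} kcal/mol, Kd = {kd_linked:.1e} M")
```

Because the logarithms add, the dissociation constants multiply, which is why a micromolar and a millimolar fragment could in the ideal case give a nanomolar composite; any unfavorable linker term would weaken that estimate.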
Computational docking methods [3] have been developed to rapidly screen databases and identify putative drug candidates. These screenings usually identify only low- to moderate-affinity ligands that require further optimization. Here, we describe a computational docking method, ‘virtual NMR screening’, that is similar in spirit to an NMR screening approach, but is performed using computer models of binding rather than NMR. As with the NMR technique, our approach relies on predicting both the structures of the leads and their affinities correctly. Figure 1 shows a schematic of the process, with the computational steps shown in boxes.

Figure 1. A comparison of the experimental NMR-based approach to ligand design and ‘virtual NMR screening’. The computational procedures that can be performed at each step of the process are outlined in boxes.

The first step is to screen for ligands that bind in a primary pocket. Molecules are docked from a database of compounds and scored using an empirical free energy function. The best scoring (lowest energy) small molecule is selected as the initial lead, and can be optimized using several methods. If the empirical free energy function is accurate enough to distinguish between very similar molecules, optimization can be accomplished by docking and scoring molecules that closely resemble the primary compound. At this stage, more costly free energy simulations, such as free energy perturbation (FEP) [4], could also be used to optimize the ligand, by ‘growing’ in or ‘removing’ functional groups computationally. Once a ligand has been optimized, its coordinates are frozen at its lowest energy docked structure. Next, molecules from a database are docked in the presence of the optimized primary lead. These docked molecules interact with the protein and the primary lead, which allows the molecule docked in the primary site to help guide the positioning of the molecules. The secondary lead is then optimized by docking and scoring or by using more elaborate free energy techniques. Once the positions and the affinities of the primary and secondary compounds have been determined, visual examination of the proximity of the compounds to one another can be used to estimate the length and type of linkers needed to connect them. Several linker compositions and lengths can be tested by docking the composite ligands that are generated from the two low-affinity leads. The composite ligands can then be ranked and their synthesis prioritized.

Many de novo computational drug-design methods rely on the docking or the joining of fragments or small molecules to create high-affinity ligands [5]. Our computational method differs from these in that secondary small molecules are docked in the presence of a bound ligand, which influences the docking of the secondary ligands. This approach can reduce intramolecular repulsions when the compounds are assembled into an inhibitor.

Both the experimental NMR-based approach and the computational approach have advantages and disadvantages when compared to one another. The biggest advantage of ‘virtual NMR screening’ is that we are not restricted by experimental difficulties, such as proteolysis of proteins, the size of the enzyme, or solubility difficulties that can occur in the actual experiments.
We can dock a small molecule with very low solubility, which could become an integral and essential part of a large soluble inhibitor. Such a compound may not easily be screened experimentally. We can screen compounds from a database much more quickly than an NMR experiment, and can perform multiple runs in parallel on several computer processors. However, ‘virtual NMR screening’ is prone to problems that are common to many other computer-modeling approaches: the quality of the energy function and adequate sampling of conformations. Our success here suggests that we have addressed these potential problems adequately by using high-quality potential energy functions and sampling many ligand conformations.

The major advantage of experimental NMR screening over the present version of its computational counterpart is the ability to measure the structure and affinity of the ligands in the presence of a mobile protein. The fixed protein structure used in computational docking imparts the high speed; only conformations of the ligand are sampled. If a ligand causes major structural changes within a protein upon binding that affect the position of the binding site or the affinity of the ligand, ‘virtual NMR screening’ will miss these results. Hence, the ‘virtual NMR screening’ approach is useful for systems where these large structural changes do not occur.

To validate our computational approach, we have tested this procedure on both the FKBP and stromelysin results described for the experimental NMR-based screening systems [2]. These studies provide crucial experimental data against which to validate our computational approach. Not only do they include data for single compounds bound to enzymes, but they also provide results for compounds bound in the presence of saturating concentrations of a primary ligand and for the full-size inhibitors.

We present our computational results for FKBP, which examine the binding of compounds F2–F14, shown in Figure 2. The numbering of the compounds is the same as in Shuker et al. [1], which reports the experimental method and results, but they are prefixed with ‘F’ to indicate that these compounds were used in the FKBP study. In Shuker’s study, compound F2 binds in the primary site with modest affinity (Kd = 2 µM). In the saturating presence of F2, a benzanilide derivative binds weakly to a proximal site. Several benzanilide derivatives (F3–F9) were examined, and compound F9 exhibited the highest affinity in the presence of F2. Compounds F10–F14, which link F2 and F9, were therefore synthesized.

Figure 2. FKBP inhibitors used in the docking study.

We also describe the results for stromelysin, which examine the binding of the compounds in Figure 3. The numbering of the compounds is the same as in the experimental paper by Hajduk et al. [2], but they are prefixed with ‘S’ to indicate that these compounds were involved in the stromelysin study. We present the results of these validation studies and show that the docking approach presented here shows promise as a computational drug discovery tool.
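The two-stage screening logic described in this introduction (pick the best primary ligand, freeze it, then score secondary ligands in its presence) can be outlined as follows. This is only an illustrative skeleton: the scoring callable, the ligand names and the numerical scores are hypothetical placeholders, and real docking and scoring would replace the lookup table.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A scoring callable takes a ligand and an optional fixed partner ligand and
# returns an estimated binding free energy (lower is better). Hypothetical API.
ScoreFn = Callable[[str, Optional[str]], float]


def best_ligand(library: Sequence[str], score: ScoreFn,
                fixed: Optional[str] = None) -> Tuple[str, float]:
    """Return the lowest-energy ligand of a library and its score."""
    scored = [(ligand, score(ligand, fixed)) for ligand in library]
    return min(scored, key=lambda pair: pair[1])


def two_stage_screen(primary_library: Sequence[str],
                     secondary_library: Sequence[str],
                     score: ScoreFn):
    """Screen the primary site first, then the secondary site with the
    best primary ligand held fixed."""
    primary = best_ligand(primary_library, score)
    secondary = best_ligand(secondary_library, score, fixed=primary[0])
    return primary, secondary


# Made-up scores in kcal/mol, standing in for docking plus rescoring.
TOY_SCORES: Dict[Tuple[str, Optional[str]], float] = {
    ("p1", None): -6.0, ("p2", None): -8.0,
    ("s1", "p2"): -3.5, ("s2", "p2"): -4.2,
}


def toy_score(ligand: str, fixed: Optional[str]) -> float:
    return TOY_SCORES[(ligand, fixed)]


result = two_stage_screen(["p1", "p2"], ["s1", "s2"], toy_score)
print(result)
```

Keeping the fixed partner as an explicit argument of the scoring callable mirrors the point made in the text: the secondary ligands are evaluated against the protein and the frozen primary lead together.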

Figure 3. Stromelysin inhibitors used in the docking study.

Computational methods

The crystal structures were taken from the Protein Data Bank for FKBP (pdb entry: 1fkf [6]) and stromelysin (pdb entry: 1sln [7]). The proteins were prepared for docking by adding hydrogen atoms and optimizing their positions in the presence of the co-crystal ligand. All titratable residues were assigned protonation states consistent with neutral pH. Water molecules in the co-crystal structures that exposed more than 1 Å² of surface area to other water molecules or to solvent accessible regions were removed. This left five and eight water molecules in the FKBP and stromelysin structures, respectively, which were retained in the docking simulations.

Using the co-crystal structure of the protein-ligand complex, a binding site was defined for each of the complexes by an orthorhombic cell. The dimensions of the cell were initially calculated to include the entire co-crystal ligand within the cell. A 5-Å buffer was also added to each dimension of the cell. These binding sites encompassed significant portions of the proteins given the large size of the co-crystal ligands. Docking was confined to these binding sites. Assuming that the protein remains fixed at its initial coordinates and that the force field is pairwise decomposable, the field of the protein can be pre-calculated on a collection of three-dimensional grids that completely cover the binding site. A full description of the grid calculations and the method for extracting the energy for an arbitrary molecule located anywhere in the binding site can be found elsewhere [8].

The intermolecular energy was calculated using the Amber 94 force field plus a desolvation term. The desolvation term is based upon the simple Gaussian functional form proposed by Stouten et al. [9]. Each atom is assigned a solvation parameter based on the affinity of the atom for solvent and a fragmental volume based on the amount of water the atom excludes from solvating the surrounding atoms. Here we assumed that the solvation parameter was proportional to the square of the charge of the atom plus a small negative constant that is related to the hydrophobicity of a neutral atom. The fragmental volume was estimated as the volume of a sphere with a radius equal to the minimum in the van der Waals interaction potential for the given atom interacting with an atom identical to itself. This function was optimized by comparing desolvation energies calculated with this method for a large number of protein-ligand complexes to desolvation energies calculated using a Finite-Difference Poisson-Boltzmann method [10].

The intramolecular energy of the ligand was estimated using the Dreiding generic force field [11]. This force field describes torsions using generic atom types, which is useful when screening a database that contains functional groups not found in most protein force fields. Other internal degrees of freedom were not sampled for the ligand. The total energy was the sum of the Amber intermolecular energy, the solvation energy, and the Dreiding intramolecular energy, without scaling of any individual term. The charges for the docked ligands were calculated using the MNDO [12] Hamiltonian within MOPAC 7.0 [13] and were scaled by the factor 1.18 if the ligand was neutral.
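Returning to the binding-site definition given earlier in this section, the orthorhombic cell can be sketched as an axis-aligned box around the co-crystal ligand. The sketch assumes that the 5-Å buffer is applied on both sides of every axis, which the text does not state explicitly; the coordinates and function names are made up for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (lower corner, upper corner), in angstroms


def binding_site_box(ligand_coords: Iterable[Point], buffer: float = 5.0) -> Box:
    """Axis-aligned box enclosing all ligand atoms, padded on every side.

    Assumption: the buffer is added at both ends of each axis.
    """
    xs, ys, zs = zip(*ligand_coords)
    lower = (min(xs) - buffer, min(ys) - buffer, min(zs) - buffer)
    upper = (max(xs) + buffer, max(ys) + buffer, max(zs) + buffer)
    return lower, upper


def contains(box: Box, point: Point) -> bool:
    """True if the point lies inside the box (boundaries included)."""
    lower, upper = box
    return all(lo <= p <= hi for lo, p, hi in zip(lower, point, upper))


# Three made-up atom positions standing in for a co-crystal ligand.
toy_ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, -0.8)]
box = binding_site_box(toy_ligand)
print(box)
print(contains(box, (6.0, 0.0, 0.0)), contains(box, (7.0, 0.0, 0.0)))
```

A test such as `contains` is the kind of check that confines candidate ligand positions to the binding site during docking.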
The atomic charges for the protein atoms were taken from the Amber 94 [14] force field, except for the charges in the stromelysin active site, which contains a zinc ion. MNDO charges were obtained for the stromelysin active site from a smaller zinc-centered model that consisted of the metal ion and three 5-methyl imidazole molecules. Lennard-Jones parameters for all atoms were taken from the Amber 94 [14] parameter set. The docking was performed on a potential surface of the protein which had been smoothed by using the soft core technique suggested by Zacharias et al. [15] and Beutler et al. [16] in a slightly different context. The smoothing of the potential surface removes high interconversion barriers and allows slight overlap of protein and ligand atoms, which improves search performance. The functional form used was similar to that suggested by Beutler et al. [16], except that a constant value of 2.75 Å was used for softening the Lennard-Jones potential and a constant value of 1.75 Å was used for softening the electrostatic potential. After docking, each structure was relaxed to a local minimum using the original (not softened) potential function.
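To illustrate what softening does, the sketch below compares a standard 12-6 Lennard-Jones term with one common soft-core variant in which r^6 is replaced by r^6 + δ^6. The exact functional form used in the study is not given in the text, so the way the 2.75 Å constant enters here is an assumption, and the well depth and minimum-energy distance are illustrative values rather than Amber 94 parameters.

```python
import math


def lennard_jones(r, epsilon, r_min):
    """Standard 12-6 term with well depth epsilon at separation r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def soft_core_lennard_jones(r, epsilon, r_min, delta=2.75):
    """Soft-core variant: r**6 is shifted by delta**6, so the energy stays
    finite as r approaches zero (assumed form, for illustration only)."""
    ratio6 = r_min ** 6 / (r ** 6 + delta ** 6)
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


EPSILON = 0.1  # kcal/mol, illustrative
R_MIN = 3.8    # angstroms, illustrative

for r in (0.0, 2.0, 3.8, 8.0):
    soft = soft_core_lennard_jones(r, EPSILON, R_MIN)
    hard = lennard_jones(r, EPSILON, R_MIN) if r > 0 else math.inf
    print(f"r = {r:4.1f} A   hard = {hard:12.3f}   soft = {soft:8.3f}")
```

At short separations the softened term stays bounded while the standard term diverges, which is the property that lets ligand and protein atoms overlap slightly during the search; at long range the two forms agree closely.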

All docking simulations were performed using the AGDOCK program [17]. The program uses an evolutionary programming algorithm to optimize the score of a population of proposed protein-ligand complexes over a number of generations. Position, orientation, and the internal torsion angles of the ligand are sampled with the algorithm. The resulting docked structures were scored using the High Throughput Screening (HTS) program developed at Agouron. This program uses an empirical energy function that has been parameterized using experimental data (see the Appendix). Although this function rapidly estimates the binding affinity of a protein-ligand complex, it is not used during the docking search itself, because it is not as computationally efficient as the soft-core potential. Throughout the remainder of this manuscript, the energetics and structures given for the docked compounds are those of the orientations that were scored to give the most favorable binding free energy using HTS.

Four sets of docking simulations were performed on each protein. For each simulation, the ligand was docked 200 times starting from a random conformation, orientation, and position in the binding site. First, the ligands shown to bind in the primary site of FKBP and stromelysin were docked to their respective proteins. The ligands known to bind to the secondary site were then docked both in the absence and in the presence of the lowest energy structure of the primary ligands of FKBP and stromelysin. Finally, the composite inhibitors created from the linked lead compounds were also docked to their respective proteins.
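The protocol above (repeated evolutionary searches from random starting points with one scoring function, followed by selection of the best result with a second function) can be sketched generically. This is not the AGDOCK algorithm: the pose representation (four numbers), the mutation scheme, the population settings and both toy scoring functions are assumptions made for illustration only.

```python
import random


def random_pose(rng, n_params=4, span=5.0):
    """Random starting point: here simply n_params numbers in [-span, span]."""
    return [rng.uniform(-span, span) for _ in range(n_params)]


def evolve(score, rng, population_size=20, generations=50, step=0.5):
    """One docking run: every parent yields one Gaussian-mutated offspring,
    and the best half of parents plus offspring survives each generation."""
    population = [random_pose(rng) for _ in range(population_size)]
    for _ in range(generations):
        offspring = [[gene + rng.gauss(0.0, step) for gene in parent]
                     for parent in population]
        pool = sorted(population + offspring, key=score)
        population = pool[:population_size]
    return population[0]


def run_dockings(score, runs=10, seed=0):
    """Independent runs from random starts (the study used 200 per ligand)."""
    rng = random.Random(seed)
    return [evolve(score, rng) for _ in range(runs)]


def search_score(pose):
    """Toy stand-in for the cheap, softened search potential."""
    target = (1.0, -2.0, 0.5, 3.0)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def final_score(pose):
    """Toy stand-in for a separate, more expensive rescoring function."""
    return search_score(pose) + 0.1 * abs(pose[3])


finals = run_dockings(search_score)
best = min(finals, key=final_score)
print(best, final_score(best))
```

Separating the function used for the search from the one used for the final ranking follows the design described in the text, where the cheaper softened potential drives the search and the empirical function is applied only to the resulting structures.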

FKBP results

Table 1 lists the free energies of binding of several compounds which were found to bind in the primary site during the first screening of the FKBP NMR experiment. These compounds were docked to FKBP in our computational approach and were found to prefer the primary binding site. Our predicted free energies of binding are within 1–1.7 kcal/mol of the reported experimental values. More importantly, our method for estimating affinities from the docked structures is able to distinguish between low- and high-affinity ligands, as shown by its ability to predict the pipecolinic acid derivative as the tightest binding ligand among the compounds in Table 1. Figure 4a shows the position of the lowest scoring docked structure of the pipecolinic acid derivative. It forms interactions with the protein very similar to those described in the experimental NMR paper [1] for this system. Residues that exhibited NOEs with the bound inhibitors are labeled [1]. When compounds F3–F9 were docked independently to FKBP, they were found to bind to the primary site. The benzanilide derivatives scored


Figure 4. Structure of the lowest energy configuration of (a) the pipecolinic acid derivative, compound F2, docked to FKBP; (b) compound F9 docked to FKBP in the presence of the lowest energy structure of the pipecolinic acid derivative; and (c) compound F14 docked to FKBP.

Table 1. Comparison of FKBP simulation results with experimental results: docking into primary site

* All experimental ∆G values were calculated from reported Kd or IC50 values for the complexes.

between –5 and –3 kcal/mol (100 µM–10 mM) with HTS when docked in this primary site. Docking compounds F3–F9 in the presence of the lowest scoring configuration of F2 identified a binding pocket proximal to the primary pocket, in agreement with the Shuker study [1]. Table 2 contains the predicted free energies of binding of the docked compounds along with the experimental values. The predicted free energies of binding are within 0.5–1.7 kcal/mol of the experimental values. Although our computational method cannot clearly distinguish the rank ordering of the benzanilide compounds,

Table 2. Comparison of FKBP simulation results with experimental results: docking in the presence of a pipecolinic acid derivative

Compound   R1   R2   R3   R4   ∆Gpredicted (kcal/mol)   ∆Gexp [1]* (kcal/mol)
F3         OH   OH   H    H    –3.1                     –4.2
F4         H    OH   H    H    –3.1                     –3.7
F5         OH   H    H    H    –3.3                     –4.4
F6         H    H    OH   H    –3.1                     –3.7
F7         H    H    H    OH   –3.7                     –4.5
F8         H    H    H    H    –3.3                     –2.8
F9         OH   H    H    OH   –3.7                     –5.4

* All experimental ∆G values were calculated from reported Kd or IC50 values for the complexes.

it does predict compounds F7 and F9 to have the highest binding affinities among the benzanilides that were docked; this is in agreement with experiment. Although not done in this study, once the secondary pocket is identified, the docking could be directed at this particular pocket to reduce both the search space volume and the computational cost. Figure 4b shows the lowest scoring docked position of compound F9 in the presence of the pipecolinic acid derivative, F2. This molecule adopts a slightly different conformation in the docked FKBP structure than in the NMR structure, due to the positioning of arginine 57. The arginine side chain partially obstructs the benzanilide binding pocket. Because this benzanilide is a low-affinity ligand, slight changes in the conformation of the protein near this secondary binding pocket appear to have little effect on the free energy of binding of these compounds.

Because the composite compounds F10–F14 have 16 to 20 rotatable bonds to sample during flexible docking of the ligand, the conformational space is too large to be searched directly, even with our efficient docking method. We therefore employed ‘partially fixed docking’ [18] studies of these larger compounds. Here a significant portion of the moiety that represents compound F2 within these ligands was fixed at the position optimized in the

Table 3. Comparison of FKBP simulation results with experimental results: docking of linked inhibitors

Compound   n   ∆Gpredicted (kcal/mol)   ∆Gexp [1]* (kcal/mol)
F10        3   –11.4                    –10.5
F11        4   –11.2                    –10.2
F12        5   –10.6                    –10.0
F13        6   –10.7                    –9.0
F14        –   –10.4                    –10.0

* All experimental ∆G values were calculated from reported Kd or IC50 values for the complexes.

initial docking simulation of F2, while the rest of the molecule was free to rotate. Table 3 contains the results for these inhibitors. Comparing the predicted free energies to the experimental free energies of binding, we find that the predicted values are approximately within 0.5–1.7 kcal/mol of the experimental values. Compounds F10 and F11 are predicted to be the tightest binding inhibitors. This prediction is in agreement with the experimental results presented in Table 3. Figure 4c shows the lowest energy configuration of compound F14 docked to FKBP. Comparing the position of the benzanilide group of this compound to the position in Figure 4b, we find that they occupy similar binding sites, although in slightly different orientations.

These FKBP results demonstrate an important caveat for both the experimental and the computational approaches. Linking small molecules together to form one inhibitor can create tighter binding inhibitors, but not necessarily selective inhibitors or drug-like molecules. FKBP has a well-defined site that binds the pipecolinic core. However, the benzanilide site is near the surface and not well defined. The increased affinity of the composite ligand could be the result of the increase in the size of the inhibitor and not of the selectivity or affinity of the benzanilide for its surface site. Increasing the size of the pipecolinic inhibitor would potentially increase the number of van der Waals interactions with the protein as well as the hydrophobicity of the inhibitor, which would make it more favorable for the inhibitor to be bound to the protein rather than solvated in water.


Figure 5. Structure of the lowest energy configuration of (a) the acetohydroxamic acid docked to stromelysin; (b) compound S8 docked to stromelysin in the presence of acetohydroxamic acid and (c) compound S50 docked to stromelysin.

Table 4. Predicted stromelysin binding affinities

Compound   ∆Gpredicted (kcal/mol)   ∆Gexp [2]* (kcal/mol)
S1         –1.2                     –2.4
S2**       –6.0                     –4.8 ± 0.3
S4**       –6.2                     > –2.7
S6**       –5.7                     –5.1 ± 0.1
S7**       –5.2                     –3.8 ± 0.1
S8**       –6.5                     –6.4 ± 0.3
S11**      –6.0                     –4.5 ± 0.4
S26**      –7.2                     –6.4 ± 0.3
S49        –8.7                     –9.0
S50        –9.1                     –10.4
S51        –9.5                     –7.5
S52        –10.0                    –7.4
S53        –9.2                     –10.7

* All experimental ∆G values were calculated from reported Kd or IC50 values for the complexes.
** Indicates docked in the presence of compound S1, acetohydroxamic acid.
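The table footnotes state that experimental ∆G values were calculated from reported Kd or IC50 values. A minimal sketch of that conversion is given here, assuming the standard relation ∆G = RT ln Kd with a 1 mol/l standard state and a temperature of 298.15 K. The temperature, the function names, and the treatment of an IC50 as if it were a Kd are assumptions made for illustration and are not stated in the text.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def delta_g_from_kd(kd_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in mol/l.

    Assumes a 1 mol/l standard state; the default temperature is an
    illustrative choice, not a value taken from the chapter.
    """
    if kd_molar <= 0.0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_delta_g(delta_g: float, temperature: float = 298.15) -> float:
    """Dissociation constant (mol/l) from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


if __name__ == "__main__":
    # Print the free energy for a few round-number dissociation constants.
    for kd in (1e-3, 1e-6, 1e-9):
        print(f"Kd = {kd:.0e} M  ->  dG = {delta_g_from_kd(kd):6.2f} kcal/mol")
```

Under these assumptions a tenfold change in Kd corresponds to roughly 1.4 kcal/mol, which gives a sense of scale for the prediction errors quoted in this chapter.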

Stromelysin results

In the experimental studies [2], acetohydroxamic acid (compound S1) was bound in the active site to prevent proteolysis as well as to provide the lead for the primary site. Computationally, acetohydroxamic acid was docked to stromelysin in the protonated form to simplify the protein-ligand model. It is conceivable that there could be protonation state changes upon binding of the ligand to the protein, as well as charge transfer between the acetohydroxamic acid and the metal center of the protein. Figure 5a shows the lowest energy configuration of the acetohydroxamic acid, compound S1. This position is very similar to the binding orientation shown in Hadjuk et al. [2]. Despite using such a simple model, we obtain reasonable results for predicting the structure and the binding affinity (see Table 4) of the acetohydroxamic acid.

The compounds S2–S26 shown in Figure 3 were docked in the absence and in the presence of the acetohydroxamic acid, compound S1. Interestingly, these compounds bound in the same hydrophobic pocket regardless of the presence or absence of a ligand in the primary binding site. This is in contrast to the FKBP docking studies, where all ligands preferred to bind in the primary site where the trimethoxyphenyl pipecolinic acid bound. Table 4 shows the predicted binding affinities for these compounds in the presence of the acetohydroxamic acid as well as the free energy of binding for acetohydroxamic acid. The predicted free energies reproduce the experimental values [2] well, except for compound S4, which is predicted to bind several kilocalories per mole more tightly than the free energy estimated from the NMR experiments. Examining the compounds in the SAR series described in the Hadjuk paper [2], it is not clear why some compounds in the series bound tightly in the NMR experiment and others did not. Given the hydrophobic nature of the molecules in general and the fact that some molecules contain a polar or ionizable group, it is possible that there could be some solubility difficulties due to solute aggregation. If there is solute aggregation, the estimated free ligand concentration will be lower than predicted and the dissociation constant will be an upper bound. It is also plausible that the scoring function incorrectly ranks this compound, which would indicate a limitation of the scoring function that needs to be addressed.

Figure 5b shows the lowest energy structure of compound S8 bound to stromelysin in the presence of acetohydroxamic acid. The distance from the methyl carbon of the acetohydroxamic acid to the hydroxyl oxygen of compound S8 was 3.51 Å. Compounds S49–S52 were created [2] from compound S1 and compound S8 with hydrocarbon linkers of various lengths. Compound S53 was created [2] from compound S1 and compound S26. The predicted free energies are found in Table 4 and agree well with experiment. Figure 5c shows the lowest energy configuration of compound S50. The docking of these composite compounds was done with a fully flexible ligand, unlike the larger composite compounds of the FKBP study. Although the acetohydroxamic acid moiety remains bound to the zinc, it moves slightly closer to the pocket containing the hydrophobic moiety, possibly in an attempt to bury the hydrophobic linking group.

Conclusions

We have demonstrated that ‘virtual NMR screening’ was able to reproduce the experimental results described for an NMR-based screening method. Both structures and binding free energies were in agreement with experiment. The root-mean-square errors derived from correlation plots of the experimental vs. predicted binding affinities for FKBP and stromelysin were 0.5 kcal/mol and 1.5 kcal/mol, respectively. This demonstrates the ability of the HTS program to rank ligands that differ in affinity by approximately 10-fold or more. As mentioned previously, protein flexibility could be a potential problem in any docking simulation that does not account for it. We have accounted for some of this flexibility by using a soft-core potential in both the docking and scoring energy functions to represent the local movements in the protein atoms, such as thermal fluctuations. Large scale motions, such as binding site formation upon complexation, are not included in the present study due to their computational expense. However, researchers in the docking community are actively working to develop computationally inexpensive methods to model large scale conformational motions, which could be applied in this method.

Using ‘virtual NMR screening’ as a computational step prior to experimental work could facilitate timely drug discovery. Our docking and scoring methods can dock and score a ligand with a few rotatable bonds within a few minutes or less. The HTS scoring method can differentiate between compounds with high and low affinities, yet due to its empirical nature, it cannot distinguish between compounds that differ by approximately 1–2 kcal/mol. Such distinctions are better suited to other computational methods such as free energy perturbation or experimental methods such as focussed combinatorial library screening. ‘Virtual NMR screening’ is most appropriately used as a pre-screening tool to suggest possible leads for experimental testing.

Acknowledgements

We wish to thank the National Institutes of Health, grant number GM39599, for funding of this project. We would also like to acknowledge Sandra Arthurs, Djamal Bouzida, Tony Colson, Stephen Freer, Daniel Gehlhaar, and Veda Larson for software assistance and Paul Rejto and Gennady Verkhivker for helpful discussions.

Appendix

Estimating binding free energy

The binding free energy of each protein-ligand complex was estimated using the HTS (computational High Throughput Screening) scoring function. This empirical scoring function, which can rapidly predict the affinity of a ligand for a protein binding site, assumes that the total free energy of binding can be decomposed into a linear combination of physically meaningful terms. Experimental data for protein-ligand complexes, combined with theoretical models of the underlying physics, were used to develop and optimize the functional form and parameters. The HTS function can be written as:

(1)

Individual terms are discussed in detail below. The terms in the HTS function were calibrated separately whenever possible. The solvation terms were fitted to experimental solvation free energies and the entropy terms were derived from sublimation thermodynamic data. Hydrogen bond and metal center bond terms were adopted from the parametrization of the Ludi scoring function [19]. The HTS function has been validated on a set of proprietary protein-ligand complexes without further calibration: HIV-1 Protease (34 structures), σ = 1.70 kcal/mol; Glycinamide Ribonucleotide Transformylase (35 structures), σ = 1.14 kcal/mol; Thymidylate Synthase (39 structures), σ = 1.51 kcal/mol; Stromelysin/Matrilysin/Collagenase (31 structures), σ = 1.47 kcal/mol; FKBP (54 structures), σ = 1.30 kcal/mol.

Interactions between the ligand and protein are weak at the surface of the protein due to solvent screening and the flexibility of surface side chains. This effect is taken into account by expressing the interaction terms as a function of the burial. The burial factor, b, depends on the distance, d, of a particular atom from the nearest point on the molecular surface of the complex. The burial factor is linearly scaled from 0 at the surface to a value of 1 at a distance of 4 Å below the molecular surface:

\[
b =
\begin{cases}
0 & \text{if } d = 0\ \text{Å} \\
d/4 & \text{if } 0 < d \le 4\ \text{Å} \\
1 & \text{if } d > 4\ \text{Å}
\end{cases}
\tag{2}
\]
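The burial factor of Equation 2 can be sketched in a few lines. The function name, the keyword argument, and the clamping of negative depths to zero are assumptions made for illustration; the text defines only the three cases for d ≥ 0.

```python
def burial_factor(depth: float, full_burial_depth: float = 4.0) -> float:
    """Burial factor b of Equation 2 for an atom at `depth` (in Å) below
    the molecular surface of the complex.

    b rises linearly from 0 at the surface to 1 at `full_burial_depth`.
    Depths below zero are clamped to 0 (an assumption; the text does not
    discuss atoms outside the surface).
    """
    if depth <= 0.0:
        return 0.0
    if depth <= full_burial_depth:
        return depth / full_burial_depth
    return 1.0


if __name__ == "__main__":
    # Sample the ramp at a few depths, including one beyond full burial.
    for d in (0.0, 1.0, 2.0, 4.0, 6.0):
        print(f"d = {d:.1f} Å  ->  b = {burial_factor(d):.2f}")
```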

The hydrogen bond energy, ∆Gh–b, is a function of the burial, the hydrogen bond geometry, fgeometry, and the type of donor and acceptor atom, fdonor/acceptor. The standard hydrogen bond parameters are –2.2 kcal/mol for chb and 0.3 for cb.

(3)

\[
f_b = c_b + \frac{b_{\mathrm{acceptor}} + b_{\mathrm{donor}}}{2}
\tag{4}
\]

\[
f_{\mathrm{geometry}} = \cos(\Theta)\, f_{\Delta R}\, f_{\Delta\alpha}
\tag{5}
\]

The terms f∆R and f∆α are taken from the Ludi scoring function [19]. ∆R is the deviation of the H...Acceptor distance from the ideal hydrogen bond distance (1.9 Å), and ∆α is the deviation of the donor-hydrogen-acceptor angle from linearity. Θ is the angle of the donor out of the acceptor plane. fdonor/acceptor is a factor describing the hydrogen bond strength of the donor and acceptor type, and ∆Gdes–hb accounts for desolvation of the donor and acceptor atoms involved in hydrogen bonds. Hydrogen bonds are defined by the following criteria: H...A distance ≤ 3 Å, donor-H-acceptor angle ≥ 90 degrees, donor-acceptor-acceptor-antecedent angle ≥ 85 degrees, and H-acceptor-acceptor-antecedent angle > 75 degrees. Interactions that are outside of these ranges are included in the electrostatic and desolvation terms of Equation 1. C-H bonds in aromatic systems are treated as weak hydrogen bond donors and aromatic carbons are treated as weak acceptor atoms.

∆Ges is an estimate of the free energy of electrostatic interaction between ligand and protein atoms not directly involved in hydrogen bonds. These interactions are described as dipole-dipole, dipole-charge or charge-charge interactions. To mimic solvent screening of the electrostatic interactions, the effective dielectric is set to 4 in the interior of the protein and is linearly scaled to a value of 10 at the molecular surface using the burial factor.

∆GM is an estimate of the electrostatic free energy of interaction between the ligand and any metal centers within the protein involved in ligand binding, such as the zinc that can interact with hydroxamic acids in stromelysin:

\[
\Delta G_M = f_b \cos^3(\Theta)\, f_{\Delta R}\, f_{\mathrm{acceptor}}
\tag{6}
\]

\[
f_{\Delta R} =
\begin{cases}
1.0 & \text{if } \Delta R \le 0.2\ \text{Å} \\
1.0 - (\Delta R - 0.2)/0.4 & \text{if } 0.2\ \text{Å} < \Delta R < 0.6\ \text{Å} \\
0.0 & \text{if } \Delta R \ge 0.6\ \text{Å}
\end{cases}
\tag{7}
\]
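Two of the piecewise-linear quantities in this passage, the burial-dependent dielectric and the distance ramp of Equation 7, can be sketched as follows. The function names are hypothetical. The dielectric helper assumes that a burial factor of 1 corresponds to the protein interior, and the value of facceptor is left as an input because the text does not list it.

```python
import math


def effective_dielectric(burial: float, interior: float = 4.0,
                         surface: float = 10.0) -> float:
    """Effective dielectric scaled linearly with the burial factor.

    Assumes burial = 0 at the molecular surface (dielectric 10) and
    burial = 1 in the interior (dielectric 4); values are clamped to [0, 1].
    """
    b = min(max(burial, 0.0), 1.0)
    return surface + (interior - surface) * b


def metal_distance_factor(delta_r: float) -> float:
    """Distance ramp f_dR of Equation 7; delta_r is the deviation in Å."""
    if delta_r <= 0.2:
        return 1.0
    if delta_r >= 0.6:
        return 0.0
    return 1.0 - (delta_r - 0.2) / 0.4


def metal_term(f_b: float, theta_rad: float, delta_r: float,
               f_acceptor: float) -> float:
    """Metal interaction term of Equation 6, with the angle given in radians.

    f_acceptor is a caller-supplied, acceptor-specific factor (its values
    are not given in the text).
    """
    return (f_b * math.cos(theta_rad) ** 3
            * metal_distance_factor(delta_r) * f_acceptor)
```

Keeping the ramp separate from the energy term mirrors the way the text defines f∆R independently of ∆GM.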

The metal interaction energy, ∆GM, is a function of the burial, fb, the metal-to-acceptor distance, f∆R, and the identity of the acceptor, facceptor. It is also a function of the angle, Θ, between the metal center and the plane of the acceptor. The term f∆R was taken from the Ludi scoring function [19]. ∆R is the deviation of the Metal...Acceptor distance from the ideal distance.

A desolvation penalty is applied to polar atoms of the ligand and the binding site that do not form hydrogen bonds. The total desolvation penalty, ∆Gdes–p, is a sum of atomic contributions. Every atom’s contribution depends on the change in its solvent-accessible surface area, ∆A, on a function of the burial factor, fb, and on a solvation parameter, σ, that is specific for the atom type. The solvation parameters for a number of atom types have been derived from experimental data [20]. Solvent-accessible surface areas are calculated with the Amber 94 radii [14] and a water probe radius of 1.4 Å:

\[
\Delta G_{\mathrm{des-p}} = \sum \left( \Delta A\, f_b\, \sigma_{\mathrm{atomtype}} \right)
\tag{8}
\]

Burial of nonpolar surface contributes favorably to the binding free energy. The total contribution, ∆Gdes–np, is written as a sum over the change in surface area of each non-polar atom, ∆A, modulated by a function of the burial factor, fb, which enhances the hydrophobic interactions in deep pockets and attenuates hydrophobic interactions at flat protein surfaces:

\[
\Delta G_{\mathrm{des-np}} = \sigma_{\mathrm{alkane}} \sum \left( \Delta A\, f_b \right)
\tag{9}
\]

The solvation parameter, σalkane = 0.007 kcal mol–1 Å–2, was obtained from hydration free energies of alkanes [19].

∆Gvdw is an estimate of the contribution of the van der Waals interaction to complex formation. The van der Waals interaction is estimated using a Lennard-Jones type potential with a soft core as described by Beutler et al. [16]. The soft-coring of the potential allows for small overlaps of the ligand and protein atoms, implicitly accounting for some mobility of the atoms while making the function more robust computationally.

The change in free energy due to inducing strain into a ligand upon complex formation, ∆Gstrain, is difficult to estimate, since the energy of only a single configuration is used in the calculation. Currently, the only strain recognized by the program is the syn-pentane interaction [21]. An example of a syn-pentane or g+g– interaction is the 1,3-diaxial interaction in cyclohexane derivatives, where two substituents in a 1,5 arrangement are in close proximity. An energy penalty, ∆Gsyn–pentane = 2 kcal/mol, is added to the strain energy term for each syn-pentane interaction.

∆Ginternal is an estimate of the loss in free energy due to immobilizing internal rotations in the protein and ligand when complexation occurs. In the current form of HTS, only the freezing of internal rotations in the ligand is taken into account. This loss in conformational entropy is calculated from the estimated change in the number of conformational states between the bound and unbound ligand. To estimate the number of states in the unbound state, each rotatable bond in the ligand is analyzed. Bonds to terminal groups, e.g. CH3, NH3, and bonds that rotate symmetric groups, e.g. carboxylate or phenyl, are not considered in the calculation. The rotation around an sp2-sp2 bond generates two states, m = 2, and the rotation around an sp2-sp3 or sp3-sp3 bond generates three states, m = 3, in the unbound ligand.
We calculate the number of states in the bound form by formally splitting the ligand into two fragments at the rotatable bond and evaluating the interaction energy of each fragment with the protein. Suppose both fragments interact strongly with the protein; in this case the ligand is frozen into one state for the rotatable bond under consideration. Now suppose a situation where one fragment interacts strongly with the protein and the other fragment does not interact with the protein, but is exposed to solvent. In this case, the rotation about the bond is not restricted. The loss in mobility is then related to the energy of the fragment that interacts least strongly with the protein, Elow = Maximum(Efragment1, Efragment2). To estimate the number of states available in the bound state we assume that we have one state with the energy E1 = Elow and that the other m–1 states are not occupied. The unoccupied states are assigned an energy, E2 ... Em = 0.0 kcal/mol, and we evaluate the number of states in the bound form, Z, by summing over all m states:

\[
Z = \sum_{i=1}^{m} \exp\!\left( \frac{-E_i + E_1}{RT} \right)
\tag{10}
\]

The free energy due to entropy loss, ∆Ginternal, is calculated as a sum over all rotatable bonds:

\[
\Delta G_{\mathrm{internal}} = RT \sum \ln\!\left( \frac{m}{Z} \right)
\tag{11}
\]
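Equations 10 and 11 can be illustrated with a short sketch. It assumes, as the text does, that one state has energy E1 = Elow and the remaining m–1 states have energy 0. The function names, the 300 K default, and the example energies are hypothetical, and Elow is assumed to be zero or negative.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def bound_state_count(m: int, e_low: float, temperature: float = 300.0) -> float:
    """Number of states Z of Equation 10 for one rotatable bond.

    State 1 has energy E1 = e_low and contributes exp(0) = 1; each of the
    other m - 1 states has energy 0 and contributes exp(e_low / RT).
    """
    rt = R_KCAL * temperature
    return 1.0 + (m - 1) * math.exp(e_low / rt)


def internal_entropy_penalty(bonds, temperature: float = 300.0) -> float:
    """Equation 11: sum of RT ln(m / Z) over (m, e_low) pairs in kcal/mol."""
    rt = R_KCAL * temperature
    return sum(rt * math.log(m / bound_state_count(m, e_low, temperature))
               for m, e_low in bonds)


if __name__ == "__main__":
    # Hypothetical ligand: one frozen sp3-sp3 bond, one free sp2-sp2 bond.
    bonds = [(3, -10.0), (2, 0.0)]
    print(f"dG_internal = {internal_entropy_penalty(bonds):.2f} kcal/mol")
```

The two limits match the prose: a bond whose weaker fragment does not interact (Elow = 0) gives Z = m and no penalty, while a strongly bound bond gives Z close to 1 and a penalty close to RT ln m.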

The binding of a ligand to a protein reduces the number of translational and rotational degrees of freedom in the system and therefore provides a contribution, ∆Gtrans/rot, to the binding free energy:

\[
\Delta G_{\mathrm{trans/rot}} = \Delta G_{\mathrm{cratic}} - 0.6\, \Delta H_{\mathrm{interaction}}
\tag{12}
\]

Here we model this factor as the sum of a term that represents the change in cratic entropy [22] upon complexation and a term that accounts for enthalpy-entropy compensation. At a concentration of 1.0 mol/l the cratic entropy contributes –8 cal/mol/K, corresponding to a free energy of 2.4 kcal/mol at 300 K. The second term in ∆Gtrans/rot is based on the phenomenon of enthalpy-entropy compensation. A strong interaction between the protein and the ligand results in a strong reduction in the mobility of the ligand and is associated with a large loss in entropy. We use the relationship between the enthalpy and entropy of sublimation to calibrate the entropy term. The analysis of the sublimation thermodynamics of rigid molecules by Searle and Williams [23] gave the following relationship: –T∆Ssublimation = –0.6 ∆Hsublimation. The same scaling factor is applied to the interaction energy to get an estimate of the entropy loss upon ligand binding.
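As a worked illustration of Equation 12, the sketch below combines the cratic term quoted in the text (–8 cal/mol/K, i.e. 2.4 kcal/mol at 300 K) with the 0.6 scaling of the interaction enthalpy. The function names and the example enthalpy of –10 kcal/mol are hypothetical.

```python
def cratic_free_energy(temperature: float = 300.0,
                       cratic_entropy: float = -8.0) -> float:
    """Cratic contribution -T*dS in kcal/mol.

    `cratic_entropy` is in cal/(mol*K); -8 is the value quoted in the text
    for a concentration of 1.0 mol/l.
    """
    return -temperature * cratic_entropy / 1000.0


def trans_rot_free_energy(dh_interaction: float,
                          temperature: float = 300.0,
                          compensation: float = 0.6) -> float:
    """Equation 12: dG_trans/rot = dG_cratic - 0.6 * dH_interaction."""
    return cratic_free_energy(temperature) - compensation * dh_interaction


if __name__ == "__main__":
    # Hypothetical interaction enthalpy of -10 kcal/mol at 300 K.
    print(f"{trans_rot_free_energy(-10.0):.1f} kcal/mol")
```

Because the interaction enthalpy is negative for a favorable complex, the second term adds a positive penalty that grows with the strength of the interaction.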

References

1. Shuker, S.B., Hadjuk, P.J., Meadows, R.P. and Fesik, S.W., Science, 274 (1996) 1531.
2. Hadjuk, P.J., Sheppard, G., Nettescheim, D.G., Olejniczak, E.T., Shuker, S.B., Meadows, R.P., Steinman, D.H., Carrerra, Jr., G.M., Marcotte, P.A., Severein, J., Walter, K., Smith, H., Gubbins, E., Simmer, R., Holzman, T.F., Morgan, D.W., Davidsen, S.K., Summers, J.B. and Fesik, S.W., J. Am. Chem. Soc., 119 (1997) 5818.
3. See some representative reviews and examples: Jones, G. and Willet, P., Curr. Opin. Biotechnol., 6 (1995) 652; Lybrand, T.P., Curr. Opin. Struct. Biol., 5 (1995) 224; Kuntz, I.D., Meng, E.C. and Schoichet, B.K., Acc. Chem. Res., 27 (1994) 117; Marrone, T.J., Briggs, J.M. and McCammon, J.A., Annu. Rev. Pharmacol. Toxicol., 37 (1997) 71; Makino, S. and Kuntz, I.D., J. Comput. Chem., 18 (1997) 1812.
4. See for example Straatsma, T.P., In Lipkowitz, K.B. and Boyd, D.B. (Eds.), Reviews of Computational Chemistry, VCH Publishers, Inc., New York, NY, 1996, Chapter 2.
5. Böhm, H.-J., J. Comput.-Aided Mol. Design, 5 (1992) 61; Verlinde, C.L.M.J., Rudenko, G. and Hol, W.G.J., J. Comput.-Aided Mol. Design, 6 (1992) 131; Ho, C.M.W. and Marshall, G.R., J. Comput.-Aided Mol. Design, 7 (1993) 623; Tschinke, V. and Cohen, N.C., J. Med. Chem., 36 (1993) 3863; Rotstein, S.H. and Murcko, M.A., J. Med. Chem., 36 (1993) 1700; Eisen, M.B., Wiley, D.C., Karplus, M. and Hubbard, R., Proteins, 19 (1994) 199; Miranker, A. and Karplus, M., Proteins, 11 (1991) 29; Böhm, H.-J., Perspect. Drug Discov. Design, 3 (1995) 21; DeWitte, R.S. and Shakhnovich, E.I., J. Am. Chem. Soc., 118 (1996) 11733; Pearlman, D.A. and Murcko, M.A., J. Med. Chem., 39 (1996) 1651; Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
6. Van Duyne, G.D., Standaert, R.F., Karplus, P.A., Schreiber, S.L. and Clardy, J., Science, 252 (199) 839.
7. Becker, J.W., Marcy, A.I., Rokosz, L.L., Axel, M.G., Burbaum, J.J., Fitzgerald, P.M.D., Cameron, P.M., Esser, C.K., Hagmann, W.K., Hermes, J.D. and Springer, J.P., Protein Sci., 4 (1995) 1966.
8. Meng, E.C., Shoichet, B.K. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 505; Luty, B.A., Wasserman, Z.R., Stouten, P.F.W., Hodge, C.N., Zacharias, M. and McCammon, J.A., J. Comput. Chem., 116 (1995) 454.
9. Stouten, P.F.W., Frommel, C., Nakamura, H. and Sander, C., Mol. Simul., 10 (1993) 97.
10. Manuscript in preparation.
11. Mayo, S.L., Olafson, B.D. and Goddard, W.A., J. Phys. Chem., 94 (1990) 8897.
12. Dewar, M.J.S. and Thiel, W., J. Am. Chem. Soc., 99 (1977) 4899; Dewar, M.J.S. and Thiel, W., J. Am. Chem. Soc., 99 (1977) 4907.
13. Stewart, J.J.P., Quantum Chemistry Program Exchange (QCPE), Program QCMP130, Department of Chemistry, Indiana University, Bloomington, IN, 1995.
14. Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz, K.M., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W. and Kollman, P.A., J. Am. Chem. Soc., 117 (1995) 5179.
15. Zacharias, M., Straatsma, T.P. and McCammon, J.A., J. Chem. Phys., 100 (1994) 9025.
16. Beutler, T.C., Mark, A.E., van Schaik, R.C., Gerber, P.R. and van Gunsteren, W.F., Chem. Phys. Lett., 222 (1994) 529.
17. Gehlhaar, D.K., Verkhivker, G.M., Rejto, P.A., Sherman, C.J., Fogel, D.B., Fogel, L.J. and Freer, S.T., Chem. Biol., 2 (1995) 317.
18. Gehlhaar, D.K., Bouzida, D. and Rejto, P.A., In Parrill, L.A. and Reddy, M.R. (Eds.), Rational Drug Design, ACS Symposium Series 719, American Chemical Society, Washington, DC, 1999, Chapter 19, p. 292.
19. Böhm, H.-J., J. Comput.-Aided Mol. Design, 8 (1994) 243.
20. Cabani, S., Gianni, P., Mollica, V. and Lepori, L., J. Solution Chem., 10 (1981) 563.
21. Hoffmann, R.W., Angew. Chem. Int. Ed. Engl., 31 (1992) 1124.
22. Kauzman, W., Adv. Protein Chem., 14 (1959) 1.
23. Searle, M.S. and Williams, D.H., J. Am. Chem. Soc., 114 (1992) 10690.

Perspectives in Drug Discovery and Design, 20: 231–244, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

In vitro and in silico affinity fingerprints: Finding similarities beyond structural classes

HANS BRIEM* and UTA F. LESSEL

Boehringer Ingelheim Pharma KG, Department of Lead Discovery, Binger Straße 173, D-55216 Ingelheim, Germany

Summary. In this article, we review the use of in vitro and in silico affinity fingerprints as novel descriptors for similarity searches in molecular databases and QSAR analyses. An affinity fingerprint for a particular molecule is constructed as a vector of either its binding affinities, docking scores or superpositioning pseudo energies against a reference panel of proteins or small molecules. In contrast to most other molecular descriptors, affinity fingerprints are not directly derived from molecular structures. As such, they offer the possibility to detect similarities amongst molecules independent of their structural scaffolds. In this report we introduce the Flexsim-S method, an extension of our previous work on virtual affinity fingerprints. Moreover, we demonstrate that virtual affinity fingerprint methods are comparable to some popular two-dimensional descriptors in terms of correctly classifying compounds, but complementary with respect to the particular search results (hit lists). Key words: affinity fingerprints, database searching, molecular descriptors, molecular similarity, QSAR, virtual screening

Introduction

Watching the seemingly ever-increasing throughput both in molecular synthesis (high-throughput organic synthesis, HTOS) and biological testing (high-throughput screening, HTS), which allows a rapid screening of even the large compound pools of major pharmaceutical organizations, a naive viewer might guess that the role of computational methods to assess molecular similarity/dissimilarity is of decreasing importance in lead finding and optimization. For various reasons the opposite development holds true:

– There are still many interesting biological targets which are not amenable to HTS. For medium and low throughput assays, however, compound selection by diversity considerations or by similarity to given lead compounds usually is the method of choice. For targets whose 3D structures are available either from X-ray crystallography, high-resolution nuclear magnetic resonance spectroscopy (NMR) or homology modeling, virtual screening techniques like computational docking or 3D database searches might be other options.
– Prior to synthesis, a virtual library containing even billions of compounds can be constructed in the computer. In order to design and synthesize ‘real’ libraries around a given lead structure (lead-optimization libraries) or for use in broad screening (screening libraries), one may select compounds from the virtual library applying either similarity or diversity criteria.
– Similarly, one might consider external compound sources (e.g. vendors’ catalogues) in order to fill ‘diversity voids’ in the corporate database or to purchase compounds similar to a lead structure.
– Since high-throughput compound screening is typically done in a highly automated fashion and at a single concentration, the biological data might be noisy, thus giving rise to a considerable number of both false positives and false negatives. Whereas false positives typically will be detected in secondary assays, false negatives might get lost in the further hit-to-lead process. One way to uncover at least some of those is to rescreen all compounds which have some similarity to a primary HTS hit, no matter whether they have passed the first HTS hit threshold or not.
– Considering the fact that many promising drug candidates fail due to their insufficient ADME (absorption, distribution, metabolism and excretion) properties, there is increasing interest in measuring these properties in an HTS mode or in predicting them based on similarities to well-characterized compounds.

* To whom correspondence should be addressed. E-mail: [email protected].
All these applications of molecular (dis-)similarity rely solely on the similar property principle [1], which states that there has to be a strong correlation between the molecular descriptors used to define similarity and the properties one would like to predict (e.g. binding affinity, physicochemical or pharmacokinetic properties). In recent years, many researchers have investigated the predictive power of various molecular similarity descriptors [2–5]. Most of these analyses came to the conclusion that topological 2D descriptors (e.g. path fingerprints from DAYLIGHT [6] or MDL’s MOLSKEYS [7]) are useful in this respect and that they are to some extent superior to 3D descriptors, which take into account the three-dimensional features of molecules.

One possible explanation for these surprising results can be found in the nature of the data sets used in the investigations mentioned above. Typically, in a medicinal chemistry program many compounds with the same scaffold are synthesized around a given lead structure, and towards the end of such a program only minor modifications to the most active compounds are added. This is an ideal situation for 2D similarity measures: many active compounds sharing most of their 2D features. The 3D methods, however, can sometimes be very sensitive to these small changes, which would explain why such compound sets are misclassified by 3D measures.

Although for many applications clustering of compounds of the same structural class is desired, often one would like to cross the boundaries of structural scaffolds in order to find ‘surprising’ similarities of compounds sharing the same biological or physicochemical properties. This is particularly interesting, for instance, when a patent-protected competitor’s compound is used as the search query or when unwanted side effects can be associated with a structural class. In order to achieve such an ‘island-hopping’ situation, the respective molecular descriptors have to be derived at a higher level of abstraction than the 2D structural keys. One of the most promising ideas in this respect, which was first described in the early 1990s, is the use of so-called ‘affinity fingerprints’.

In the following paragraphs, we will give a brief review of this field. In addition we will describe a recently developed extension, called Flexsim-S. Finally, we will compare results obtained by our virtual affinity fingerprint methods to those obtained by employing some popular 2D descriptors.

In vitro affinity fingerprints

This approach was first introduced by Weinstein et al. [8,9] at the National Cancer Institute (NCI) and Kauvar et al. [10] at Terrapin and pursued by Dixon and Villar [11] at Telik. Here, an affinity fingerprint for a compound is defined as a vector of its binding affinities against a panel of uncorrelated reference proteins. The underlying assumption behind the affinity fingerprint idea is that binding properties of the ligands in the reference panel can be extrapolated in order to predict affinities to a new target of interest.

The researchers at Terrapin have subsequently used the affinity fingerprints much like the components of a multivariate QSAR calculation [10]. For each new target, a QSAR model is calculated based on a small training set. This equation, which can be viewed as the computational surrogate of the target, is then taken to predict affinities for the remainder of the data set.

The researchers in Weinstein’s group at NCI have profiled more than 60 000 compounds by measuring their 50% growth-inhibitory concentration (GI50) against a panel of 60 human cancer cell lines. They have used the resulting fingerprints in a neural network approach to classify compounds retrospectively with respect to their mechanisms of action [8]. Moreover, the activity patterns were applied in conjunction with structural descriptors for virtual screening of the complete NCI database in order to find new anticancer drugs [9].

‘In silico’ (or virtual) affinity fingerprints

Encouraged by Terrapin’s and NCI’s success using in vitro affinity fingerprints, we were the first to develop and report virtual affinity fingerprints [5]. The in vitro assays were replaced by computational docking of the ligands into binding pockets of protein structures solved by X-ray crystallography. Consequently, a molecule’s virtual affinity fingerprint is defined as a vector of its calculated docking scores with respect to the reference panel. Our first approach, subsequently termed DOCKSIM, employed the docking program DOCK (Version 3.5) [12], which at that time only allowed rigid docking of the ligand test set. A reference panel of eight uncorrelated protein structures was selected from the Brookhaven Protein Data Bank [13]. In contrast to Terrapin’s work, we did not aim at a quantitative affinity prediction but rather used the pairwise Euclidean distances of the affinity fingerprints as a similarity measure in order to classify compounds correctly (see Reference 5 and Methods section).

Recently, we described a further development of the DOCKSIM approach, called Flexsim-X [14]. The main enhancements compared to DOCKSIM can be summarized as follows:
– Instead of rigid docking we were able to apply flexible docking using the program FlexX (Version 1.65) [15].
– The composition and size of the reference panel of protein binding pockets were optimized by systematic as well as genetic algorithm-based (GA) procedures. This resulted in a remarkable performance increase compared to arbitrary selection of the panel.

Finally, Ghuloum et al. [16] from MetaXen proposed an approach called molecular hashkeys, which uses surface-based comparisons of target molecules with a reference panel comprising small, drug-like molecules instead of proteins. Despite this difference compared to all the other methods described so far, the molecular hashkey approach clearly follows the spirit of the original affinity fingerprint idea. Ghuloum et al. used their program to predict two molecular properties:

Table 1. Characteristics of DOCKSIM, Flexsim-X and Flexsim-S

                             DOCKSIM                      Flexsim-X                                  Flexsim-S
Underlying program           DOCK (V. 3.5)                FlexX (V. 1.65)                            FlexS (V. 1.32)
Reference panel composition  8 binding pockets from PDB   41 binding pockets from PDB                44 small molecules from MDDR
Reference panel selection    Arbitrarily                  GA-based optimization out of 100 pockets   GA-based optimization out of 100 molecules
Ligand treatment             Rigid docking                Flexible docking                           Rigid fitting

By applying a weighted K nearest neighbor (KNN) classification, they were able to predict the octanol-water partition coefficient (logP) of a set of almost 1000 compounds with an accuracy similar to that of the ClogP program [17]. In addition, they successfully trained a neural network in order to get a predictive model for intestinal absorption. Interestingly, when simply adding more members to the reference panel they observed a plateau, i.e. no further increase in predictiveness, similar to our experience described in our Flexsim-X paper [14]. This supports our hypothesis that on the one hand a careful selection and optimization of the panel is crucial for the success of the affinity fingerprint methods, but on the other hand a kind of natural limit seems to exist for the panel size.

Independently of MetaXen's work, we have recently developed a similar approach which we termed Flexsim-S. It involves superimposing ligand molecules onto a set of small reference molecules employing the FlexS [18] program (Version 1.32). As in our docking approaches DOCKSIM and Flexsim-X, a virtual affinity fingerprint is constructed as a linear vector of scores. In Flexsim-S, however, the docking scores are replaced by a measure of the alignment quality of two small molecules. During development of Flexsim-X we learned that an optimization both in size and composition of the reference panel is crucial in order to improve the classification power of the virtual affinity fingerprints. Consequently, we used the same GA-based optimization procedure [19] as described for Flexsim-X. Table 1 summarizes some characteristic features of our three different virtual affinity fingerprint approaches DOCKSIM, Flexsim-X and Flexsim-S. A detailed description of Flexsim-S will be given in the Methods section below.
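The notion of a fingerprint as a linear vector of scores can be illustrated with a minimal Python sketch. It assumes a generic scoring callback; `toy_score`, `build_fingerprints` and `select_panel` are hypothetical names introduced only for illustration, and the toy score is a placeholder that has nothing to do with DOCK, FlexX or FlexS scores.

```python
from typing import Callable, Dict, List, Sequence


def build_fingerprints(
    ligands: Sequence[str],
    references: Sequence[str],
    score: Callable[[str, str], float],
) -> Dict[str, List[float]]:
    """Return one score vector per ligand, with one entry per reference."""
    return {lig: [score(lig, ref) for ref in references] for lig in ligands}


def select_panel(fingerprint: Sequence[float], panel: Sequence[int]) -> List[float]:
    """Keep only the entries that belong to the chosen reference indices."""
    return [fingerprint[i] for i in panel]


def toy_score(ligand: str, reference: str) -> float:
    """Hypothetical placeholder score (NOT a docking or alignment score)."""
    return float(len(set(ligand) & set(reference)))


# Illustrative identifiers only; any string labels would do here.
ligands = ["CCO", "c1ccccc1O"]
references = ["CCN", "c1ccccc1", "OC=O"]

fps = build_fingerprints(ligands, references, toy_score)
sub = select_panel(fps["CCO"], panel=[0, 2])  # fingerprint on a reduced panel
```

Restricting a fingerprint to a subset of reference indices, as `select_panel` does, is the operation that a panel optimization would repeat for every candidate panel.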


Methods

Flexsim-S

Selection of reference molecules
A set of 100 reference molecules was extracted randomly from the MACCS Drug Data Report (MDDR), provided by MDL Information Systems Inc. (San Leandro, CA). Special attention was paid to avoiding any overlap of the reference set with the ligand test set. 3D structures for the reference set were generated by the Corina program [20] (Version 2.12). Hydrogens and partial charges, according to the Gasteiger-Marsili method [21], were added within the SYBYL molecular modeling package (Tripos Inc., St. Louis, MO). A list of the respective MDDR ID-codes can be obtained from the authors upon request.

Selection of ligand test set
For all evaluations, we used the same test set of 957 ligands extracted from the MDDR. The set contains 134 PAF antagonists, 49 5-HT3 antagonists, 49 TXA2 antagonists, 40 ACE inhibitors and 111 HMG-CoA reductase inhibitors. Additionally 574 compounds from the MDDR database were selected randomly. None of these belong to any of the five activity classes. Since the test set was already used for the development of Flexsim-X [14] and DOCKSIM [5] (with some slight modifications), further details may be found in the respective papers and will not be repeated here.

Superimposing of ligand test set
Each ligand in the test set was superimposed rigidly onto each reference molecule, using the RigFit option of the program FlexS. Details of the FlexS and RigFit algorithms are described in References 18 and 22. A fingerprint vector for each test ligand was constructed from the pseudo fitting energies calculated by FlexS (‘FlexS scores’).

Genetic algorithm-based optimization of molecules in the reference panel
In order to select optimal combinations of reference molecules, the genetic algorithm (GA)-based approach was applied exactly in the same way as described in our Flexsim-X paper [14] and will not be repeated here. As fitness function we used the mean sample hit rate as described in the ‘success measure’ paragraph below. The optimum was achieved using the following GA parameters (see Reference 14 for a detailed discussion):

Mutation rate: 1.0
Crossover rate: 1.0
Replacement: uniform
Selection: tournament
Tournament size: 2
Population size: 200
Number of generations: 400
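To make the role of these parameters more concrete, the following Python sketch shows a small genetic algorithm that selects a subset of reference molecules. It is not the SUGAL implementation used in this work. The bit-string encoding, the reading of "mutation rate 1.0" as one bit flip per child, and the full generational replacement with best-so-far tracking are all assumptions made for illustration, and `toy_fitness` is a hypothetical stand-in for the mean sample hit rate.

```python
import random
from typing import Callable, List

Panel = List[bool]  # True means the reference molecule is part of the panel


def tournament(pop: List[Panel], fit: List[float], size: int, rng: random.Random) -> Panel:
    """Return the fittest of `size` randomly drawn individuals."""
    picks = [rng.randrange(len(pop)) for _ in range(size)]
    return pop[max(picks, key=lambda i: fit[i])]


def uniform_crossover(a: Panel, b: Panel, rng: random.Random) -> Panel:
    """Take each gene from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(panel: Panel, rng: random.Random) -> Panel:
    """Flip one randomly chosen gene (assumed reading of 'mutation rate 1.0')."""
    child = list(panel)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return child


def optimize_panel(
    n_refs: int,
    fitness: Callable[[Panel], float],
    pop_size: int = 200,
    generations: int = 400,
    tournament_size: int = 2,
    seed: int = 0,
) -> Panel:
    """Evolve a reference panel (bit string) that maximizes `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_refs)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        fit = [fitness(p) for p in pop]
        # Assumption: the whole population is replaced in every generation.
        pop = [
            mutate(
                uniform_crossover(
                    tournament(pop, fit, tournament_size, rng),
                    tournament(pop, fit, tournament_size, rng),
                    rng,
                ),
                rng,
            )
            for _ in range(pop_size)
        ]
        best = max(pop + [best], key=fitness)  # keep the best panel seen so far
    return best


# Hypothetical fitness: agreement with a hidden "ideal" panel of 12 references.
target = [i % 3 == 0 for i in range(12)]


def toy_fitness(panel: Panel) -> float:
    return sum(x == t for x, t in zip(panel, target)) / len(target)


best_panel = optimize_panel(12, toy_fitness, pop_size=30, generations=40)
```

In an actual run the fitness callback would recompute the nearest-neighbor hit rates on the fingerprints restricted to the candidate panel, which is by far the most expensive step.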

Descriptor comparison

Generation of 2D descriptors
(1) DAYLIGHT fingerprints [6] were generated at fixed length (1024 bits) without folding.
(2) Within the ISIS database system, 166 predefined fragment-based keys are available (ISIS MOLSKEYS) [7]. These keys were extracted from the MDDR database and converted into a binary vector for each test ligand.
(3) Feature Trees (FTREES) [23]. A feature tree is a more abstract 2D molecular descriptor than a fingerprint. Instead of counting atoms or fragments, a tree is constructed with each node representing a set of chemical features. Details of the methods are given in Reference 23.
(4) Tripos’ Molecular Holograms [24]. Molecular holograms can be regarded as an extended form of 2D fingerprints. Instead of just determining the absence or presence of particular fragments, holograms maintain a count of the number of times each fragment occurs. In addition, branched and cyclic fragments are considered separately. Molecular holograms have been introduced by Tripos Inc. as part of the Hologram QSAR (HQSAR) module within SYBYL.

Generation of 3D descriptors
The score matrices for DOCKSIM and Flexsim-X are constructed as described in References 5 and 14. For DOCKSIM we used DOCK Version 3.5 with standard force field scoring. The Flexsim-S method is described above.

Generation of Euclidean distance matrices
For all 2D and 3D descriptor matrices, Euclidean distances were calculated for each possible combination of test ligands.
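The distance calculation can be sketched in a few lines of plain Python. The function names are illustrative assumptions, and the same code applies whether a row holds docking or alignment scores or a binary key vector encoded as 0/1.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(descriptors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Euclidean distances."""
    n = len(descriptors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(descriptors[i], descriptors[j])
    return dist


# Three toy descriptor rows (made-up numbers).
rows = [[0.0, 0.0], [3.0, 4.0], [1, 0]]
dist = distance_matrix(rows)
```

For a test set of 957 ligands this yields a 957 x 957 matrix per descriptor, which is small enough to hold in memory.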


Success measure
For each of the 383 ligands from the five activity classes in the ligand test set, the 10 most similar compounds (i.e. the 10 nearest neighbors) from the remainder of the whole test set were determined. An individual ‘hit rate’ for each compound can be computed based on the fraction of nearest neighbors belonging to the same activity class as the query compound itself. Finally, for the whole set of 383 ligands from the five activity classes, the individual hit rates were summed up and a mean hit rate was calculated. Enrichment is achieved when the nearest-neighbor hit rate is higher than the fraction of the particular activity class in the whole data set.

Calculation of overlap
For each compound from the five activity classes and each descriptor method, 20 nearest neighbors were determined. Subsequently, for each possible pair of descriptors, individual percentages of overlap were computed according to Equation 1 (e.g. a compound with 15 out of 20 nearest neighbors in common for a particular descriptor combination would have an individual overlap value of 75%).

\[
\text{Individual \% overlap} = \frac{\text{Number of nearest neighbors in common}}{\text{Number of nearest neighbors considered}} \cdot 100 \tag{1}
\]

Finally, the mean percentage of overlap for each descriptor combination is calculated.
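Both measures can be written down compactly. The following Python sketch assumes a precomputed distance matrix and a class label per compound, with `None` marking the randomly selected compounds; the function names and the toy data are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def nearest_neighbors(dist: Sequence[Sequence[float]], query: int, k: int) -> List[int]:
    """Indices of the k compounds closest to `query`, excluding the query."""
    others = [j for j in range(len(dist)) if j != query]
    return sorted(others, key=lambda j: dist[query][j])[:k]


def mean_hit_rate(
    dist: Sequence[Sequence[float]],
    classes: Sequence[Optional[str]],
    k: int = 10,
) -> float:
    """Mean fraction of same-class neighbors over all classified queries."""
    rates = []
    for q, cls in enumerate(classes):
        if cls is None:  # random compounds are never used as queries
            continue
        nn = nearest_neighbors(dist, q, k)
        rates.append(sum(classes[j] == cls for j in nn) / len(nn))
    return sum(rates) / len(rates)


def percent_overlap(nn_a: Sequence[int], nn_b: Sequence[int]) -> float:
    """Equation 1: shared neighbors as a percentage of neighbors considered."""
    return 100.0 * len(set(nn_a) & set(nn_b)) / len(nn_a)


# Toy data: five compounds on a line, two classes plus one unclassified.
positions = [0, 1, 2, 10, 11]
classes = ["A", "A", None, "B", "B"]
dist = [[abs(a - b) for b in positions] for a in positions]

rate = mean_hit_rate(dist, classes, k=2)               # 0.5 for this toy data
overlap = percent_overlap(range(20), range(5, 25))     # 15 of 20 shared: 75.0
```

The second call reproduces the worked example from the text, where 15 out of 20 shared nearest neighbors give an individual overlap of 75%.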

Results and discussion

Flexsim-S
Three different approaches to virtual affinity fingerprints have been developed in our group: In our previous work, we have employed the docking programs DOCK and FlexX in order to dock test ligands into a reference panel of binding pockets taken from the PDB. The former approach, called DOCKSIM, used a rigid docking algorithm and a panel of eight binding pockets selected more or less arbitrarily. In an attempt to optimize the reference panel of binding pockets in terms of size and composition and to introduce flexible docking, we developed the Flexsim-X method. Finally, in this article we present an extension which uses the molecular superpositioning program FlexS to generate virtual affinity fingerprints. To optimize the reference panel for this approach, called Flexsim-S, we applied the same genetic algorithm-based optimization protocol as described in detail for Flexsim-X [14]. The optimum, expressed as mean sample hit rate, was obtained using 44 out of 100 reference molecules.

Descriptor validation and comparison
In order to validate our methods and to see them in context with some popular 2D descriptors (DAYLIGHT fingerprints, ISIS MOLSKEYS, FTREES and Tripos’ Molecular Holograms), we were particularly interested in answering two questions: first, how well do our affinity fingerprints perform in terms of correctly classifying compounds according to their activity classes, and second, do these methods yield similar or rather complementary hit lists with respect to the existing 2D approaches?

To answer the first question, we applied a standard ligand test set of approximately 1000 compounds. As described in the Methods section above, mean sample hit rates were calculated to gauge the predictive power of each descriptor set. It has to be kept in mind that a totally meaningless descriptor (e.g. a random assignment of activity classes) would result in a mean sample hit rate of about 9% for this data set. Thus, any figure higher than that can be regarded as enrichment, i.e. classification results better than random. Results are given in Figure 1 with the descriptors sorted in order of decreasing mean sample hit rates.

Figure 1. Mean sample hit rates for the 2D descriptors (dark grey) and the virtual affinity fingerprints (light grey).

Yielding values between 57 and 71%, all four 2D methods are superior to our affinity fingerprints. Nevertheless, the optimized Flexsim-X and Flexsim-S panels (50 and 48%, respectively) perform only slightly worse than e.g. DAYLIGHT fingerprints or Tripos’ Holograms. These findings are in agreement with results obtained by other researchers who observed that 2D descriptors outperform 3D approaches in many cases. As discussed in the Introduction, we believe that this is partly due to the strong bias towards 2D similarity in many validation sets (like in ours). Taking this into account, we are quite satisfied to approach the 2D results so closely.

Compared to our older DOCKSIM method (mean sample hit rate of 26%), we could clearly demonstrate a big improvement in overall performance. There are several reasons which might account for this enhancement:
– DOCKSIM employs the rigid docking algorithm of DOCK 3.5, whereas Flexsim-X allows a flexible adaptation of the ligands.
– The docking algorithms as well as the scoring functions of DOCK and FlexX are quite different.
– For Flexsim-X and Flexsim-S, we carefully optimized the reference panel compositions yielding 41 and 44 members, respectively. For DOCKSIM, only eight binding pockets were selected more or less arbitrarily.

We believe that the last issue has the largest effect on the improvement obtained. To verify or disprove this hypothesis, we plan to run the same reference panel optimization procedures using both DOCK 3.5 and the latest DOCK Version 4.0 [25], which allows flexible docking.

As we have already discussed in the Introduction and demonstrated in our previous work on virtual affinity fingerprints, these descriptors are able to find ‘surprising’ similarities amongst molecules from the same activity class, i.e. similarities which are obvious neither to a ‘chemist’s eye’ nor to the 2D fingerprint methods described above. In order to examine this in more detail, we systematically calculated the hit list overlaps (based on the 20 nearest neighbors for each compound) for every descriptor combination (see Methods section above). Results are given in Figure 2 in terms of mean percentage of overlap.

Figure 2. Result of the descriptor comparison (expressed as mean % overlap) for each descriptor combination (black: 2D vs 2D descriptors; grey: 2D vs 3D descriptors; white: 3D vs 3D descriptors).

2D vs 2D
The overlap percentage ranges from 47% (MOLSKEYS vs Holograms) up to 63% (MOLSKEYS vs DAYLIGHT fingerprints).

2D vs 3D
Even the highest overlap value of 40% (Flexsim-S vs FTREES) is smaller than the lowest overlap in the 2D vs 2D section. The other values range from 17 to 33%.

3D vs 3D
The overlap values for our three different virtual affinity fingerprint methods are all around 20%.

The following conclusions may be drawn:
– Despite the differences in how the 2D descriptors are calculated, they all yield similar hit lists, leading to high overlap values.
– The 2D vs 3D results show that hit lists obtained by virtual affinity fingerprints are truly complementary to those obtained by standard 2D methods. This complementarity is particularly remarkable considering the fact that both the 2D and the optimized 3D descriptors are able to achieve an enrichment of compounds from the same activity class to a similar extent.
– Most surprising to us was the finding that the overlap values amongst the three affinity fingerprint methods (3D vs 3D) are all very low. Each approach seems to have its special characteristics, which again raises hopes for complementarity in similarity search or activity prediction results.

Conclusion and outlook

We were able to demonstrate that virtual affinity fingerprints can be used for virtual screening of molecular databases in order to classify compounds by their activity classes. In this context, we believe that a careful selection and optimization of the reference panel of either protein binding pockets (DOCKSIM and Flexsim-X) or small molecules (Flexsim-S) is an important step to improve the predictive performance. In addition, scoring functions and treatment of the ligands (rigid vs flexible) might play an important role in this respect. As improved scoring functions and enhanced docking algorithms become available, we will check their influence on our results.

By comparing the hit lists obtained by either virtual affinity fingerprints or some popular 2D descriptors, we showed that there is high redundancy amongst the latter approaches, whereas the 3D fingerprints can be regarded as complementary both to the 2D methods and amongst themselves. Consequently, for a general database search for compounds similar to a given lead structure, a mix taking into account the results of all methods would be highly desirable. Therefore we are particularly interested in data fusion techniques as proposed by Willett’s group [26] to merge search results from different descriptor types.


Figure 3. Example of hits found with DOCKSIM, Flexsim-X and Flexsim-S. All the hits are members of the respective 10 nearest neighbor lists of the query structure. None of them, however, are found amongst the nearest neighbor hit lists of any of the 2D descriptor methods. All compounds shown are described as PAF antagonists in the MDDR database.

Most importantly, affinity fingerprints are capable of reflecting biological similarities of molecules beyond their structural classes. This can be highly desirable, e.g. to search a corporate database for compounds similar to an early screening hit or even to a competitor’s compound. In order to illustrate this ‘island-hopping’ situation, three examples are given in Figure 3 (one for each of our virtual affinity fingerprint methods). All of the ‘hits’ are found amongst the top 10 nearest neighbors of the query molecule and are correctly classified as PAF antagonists. On the other hand, none of these compounds is part of the top 10 hit lists determined by any of the 2D fingerprint methods.

Acknowledgements

This work was performed as part of the RELIMO-project, which is funded by the German Federal Ministry for Education, Science, Research, and Technology (BMBF) under grant No. 0311623. The authors are grateful to Dr Matthias Rarey from GMD-SCAI for kindly providing us with the Feature Tree distance matrix and for many valuable discussions. In addition, we would like to thank Dr Christian Lemmen from Combichem for making us familiar with the FlexS program.


References

1. Johnson, M.A. and Maggiora, G.M., Concepts and Applications of Molecular Similarity, Wiley, New York, NY, 1990.
2. Brown, R.D. and Martin, Y.C., J. Chem. Inf. Comput. Sci., 36 (1996) 572.
3. Patterson, D.E., Cramer, R.D., Ferguson, A.M., Clark, R.D. and Weinberger, L.E., J. Med. Chem., 39 (1996) 3049.
4. Matter, H., J. Med. Chem., 40 (1997) 1219.
5. Briem, H. and Kuntz, I.D., J. Med. Chem., 39 (1996) 3401.
6. DAYLIGHT, Version 4.62, DAYLIGHT Inc., Mission Viejo, CA.
7. ISIS, Version 2.1.4, Molecular Design Ltd., San Leandro, CA.
8. Weinstein, J.N., Kohn, K.W., Grever, M.R., Viswanadhan, V.N., Rubinstein, L.V., Monks, A.P., Scudiero, D.A., Welch, L., Koutsoukos, A.D., Chiausa, A.J. and Paull, K.D., Science, 258 (1992) 447.
9. Weinstein, J.N., Myers, T.G., O’Connor, P.M., Friend, S.H., Fornace, A.J., Kohn, K.W., Fojo, T., Bates, S.E., Rubinstein, L.V., Anderson, N.L., Buolamwini, J.K., van Osdol, W.W., Monks, A.P., Scudiero, D.A., Sausville, E.A., Zaharevitz, D.W., Bunow, B., Viswanadhan, V.N., Johnson, G.S., Wittes, R.E. and Paull, K.D., Science, 275 (1997) 343.
10. Kauvar, L.M., Higgins, D.L., Villar, H.O., Sportsman, J.R., Engqvist-Goldstein, A., Bukar, R., Bauer, K.E., Dilley, H. and Rocke, D.M., Chem. Biol., 2 (1995) 107.
11. Dixon, S.L. and Villar, H.O., J. Chem. Inf. Comput. Sci., 38 (1998) 1192.
12. a. DOCK, Version 3.5, University of California, San Francisco, CA.
    b. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E., J. Mol. Biol., 161 (1982) 269.
    c. Shoichet, B.K., Bodian, D.L. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 380.
    d. Meng, E.C., Shoichet, B.K. and Kuntz, I.D., J. Comput. Chem., 13 (1992) 505.
13. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer Jr., E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. and Tasumi, M., J. Mol. Biol., 112 (1977) 535.
14. Lessel, U.F. and Briem, H., J. Chem. Inf. Comput. Sci., 40 (2000) 246.
15. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G., J. Mol. Biol., 261 (1996) 470.
16. Ghuloum, A.M., Sage, C.R. and Jain, A.N., J. Med. Chem., 42 (1999) 1739.
17. Leo, A.J., Chem. Rev., 93 (1993) 1281.
18. Lemmen, C., Lengauer, T. and Klebe, G., J. Med. Chem., 41 (1998) 4502.
19. SUGAL Genetic Algorithm package, Version 2.1, written by Dr Andrew Hunter at the University of Sunderland, U.K.
20. Sadowski, J., Schwab, C.H. and Gasteiger, J., CORINA 3D-Structure Generator Program description, 1997.
21. Gasteiger, J. and Marsili, M., Tetrahedron, 36 (1980) 3219.
22. Lemmen, C., Hiller, C. and Lengauer, T., J. Comput.-Aided Mol. Design, 12 (1998) 491.
23. a. Rarey, M. and Dixon, J.S., J. Comput.-Aided Mol. Design, 12 (1998) 471.
    b. http://cartan.gmd.de/ftrees/ftrees_home.html.
24. SYBYL, Version 6.5.3, HQSAR Module, Tripos Inc., St. Louis, MO.
25. DOCK, Version 4.0, University of California, San Francisco, CA.
26. Ginn, C.M.R., Ranade, S.S., Willett, P. and Bradshaw, J., In: Arabnia, H.R. and Zhu, D. (Eds.) Proceedings of the International Conference on Multisource-Multisensor Information Fusion, Fusion’98, CSREA Press, 1998, pp. 307–313.

Perspectives in Drug Discovery and Design, 20: 245–264, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Computer-assisted synthesis and reaction planning in combinatorial chemistry

JOHANN GASTEIGER, MATTHIAS PFÖRTNER, MARKUS SITZMANN, ROBERT HÖLLERING, OLIVER SACHER, THOMAS KOSTKA and NORBERT KARG

Computer-Chemie-Centrum, Institut für Organische Chemie, Universität Erlangen-Nürnberg, Nagelsbachstraße 25, D-91052 Erlangen, Germany

Summary. In combinatorial chemistry, hundreds of thousands of reactions are run in parallel, on beads, or simultaneously in solution. A careful planning of these reactions is therefore of paramount importance in order to influence the products obtained in these experiments. We present here three software systems that should assist the chemist in solving problems met in combinatorial chemistry: WODCA can be used for the planning of the synthesis of combinatorial libraries. EROS is designed to model the course of chemical reactions to predict their products. CORA is a tool to analyze series of reactions such as those contained in reaction databases to derive knowledge that can be used in designing and simulating chemical reactions.

Key words: combinatorial chemistry, library design, synthesis design, strategic bonds, substructure searches, reaction prediction, reaction classification, Kohonen networks

Introduction

The advent of combinatorial chemistry has left a profound impact on chemistry in general, and on drug design in particular. The wealth of data obtained in combinatorial chemistry experiments has called for the development of computer methods to process this large amount of information. Until now, however, nearly all computer-based methods have only taken the structure of compounds into account to address such problems as diversity of chemical libraries, the design of focused or drug-like libraries, and the analysis of results from high-throughput screening. Much less attention has been devoted to the development of computer methods for planning and analyzing the chemical reactions employed in combinatorial chemistry experiments. However, in recent years it has become increasingly clear that the efficient practice of combinatorial chemistry requires close attention to an understanding of the chemical reactions employed in an experiment. Three major questions have to be answered that need information on chemical reactions:
• How can I plan the synthesis of my library?
• Which products will I get in a given reaction?
• What can I learn from a series of reactions that I have performed?

The large amount of information gained in combinatorial chemistry experiments quite clearly suggests the use of computerized methods for ordering and analyzing this information. In fact, we believe that combinatorial chemistry in general, and parallel synthesis in particular, provides information that can greatly foster our understanding of chemical reactions. A series of structural variations of a certain reaction type performed under identical conditions, or a given reaction performed under systematic variations of reaction conditions, provides information that can give us unique insights into the scope and limitations of a reaction or reaction type.

The variety of questions that have to be answered when performing combinatorial chemistry has led us to develop three different program systems, each one specifically focused on answering one of the questions posed above. The three systems are:
• WODCA for designing the synthesis of combinatorial libraries
• EROS for modeling chemical reactions to predict chemical reactivity and the outcome of a chemical reaction
• CORA for analyzing a series of reaction instances.

In the following, we will briefly outline these systems and some of the methods incorporated within them and then show some applications.

Synthesis design

The system WODCA (Workbench for the Organization of Data for Chemical Applications) has been developed over the last 10 years to assist in the planning of the synthesis of individual target compounds [1,2]. Work of recent years now also enables its use in designing entire libraries of compounds. As the acronym implies, WODCA provides a workbench and a series of tools for designing a synthesis. Two of the more important tools are search methods for defining strategic bonds and methods for searching for similar compounds in libraries of available starting materials [3].

Strategic bonds are bonds at which a target should be broken apart (disconnected) to obtain synthesis precursors which would react back to the target structure through efficient organic reactions. For each bond the two possible heterolytic bond breakings are explored and evaluated by physicochemical effects such as bond polarity [4], and the stabilization of charges by inductive [5], resonance [6], and polarizability effects [7]. These effects are calculated by rapid procedures collected in PETRA (Parameter Estimation for the Treatment of Reactivity Applications) [8,9]. Furthermore, structural effects such as branching of bonds, positions at ring systems, and stereochemical centers are taken into account. The rating of the strategic bonds of a molecule is then scaled to a value between 0 and 100. Figure 1 shows the results obtained for the compound TIBO [10].

Figure 1. Example of a target structure (TIBO). Strategic bonds (in this example only carbon-heteroatom bonds) are rated by WODCA (relative to the bond with the highest rating 100). The numbers in brackets are the ranking of the bonds.

The searches for available starting materials are performed on the basis of similarity criteria that are either based on structural similarity or on generalized reactions [3]. So that WODCA can be used for the design of the synthesis of libraries of compounds, substructure search methods have been included in the system. These, in conjunction with the search for strategic bonds, can provide a series of representatives of certain classes of compounds that can act as precursors for the synthesis of an entire library.

We will briefly explain how these methods in WODCA can be utilized in a sequence of steps to derive a series of starting materials for a combinatorial library.
1. A typical representative of a desired library is input.
2. A search for strategic bonds is performed. This points out the bonds that should be retrosynthetically broken.

248 3. One or more strategic bonds are broken. In the decision which bonds should be broken, the chemist/user can follow the rating suggested by the program, or she/he can override it and select her/his own strategic bonds. 4. The strategic bonds will be broken and the free valencies thus obtained will be appended with atoms or groups of atoms in a way that takes care of the charges generated after heterolysis. 5. These synthesis precursors are presented to the user in order that she/he removes atoms or groups at those positions where the precursors should be generalized, where variations in the groups should be made in the combinatorial synthesis. 6. With these fragments substructure searches are performed in a catalog of available starting materials. This provides all members of a class of compounds contained in a catalog of chemical suppliers. Various catalogs are incorporated in the WODCA system. Mechanisms are available to load other catalogs such as in-house or corporate databases. These steps are illustrated in Figure 2 with the example of developing a synthesis for substituted pyrazoles and are further commented: 1. 1-Pheny1-3,5-dimethylpyrazole is chosen and input as a typical representative of a desired library of compounds. 2. The rating of strategic bonds is shown in Figure 2. This rating defines the sequence in which the bonds would automatically be broken: First, the bond with the higher rating, then the bond with the lower rating. 3. The user can follow the sequence of strategic bonds suggested by the program or can change the order. Furthermore, it is possible to require several bonds to be broken automatically. In the given example, the two bonds suggested by the program were accepted as strategic to be broken. 4. 
In breaking these strategic bonds WODCA realizes, on the basis of calculations of physicochemical effects by methods contained in the PETRA package, that these bonds should be broken heterolytically and it determines where the charges should go. Based on these charges, decisions are made which atoms or groups, in our case hydrogen and oxygen atoms and a hydroxy group, should be added to these free sites. 5. The user has decided to remove the two methyl groups from the precursor 2-hydroxy-pent-2-ene-4-on. Various options are available to define which atoms are allowed at these open positions for the substructure search. In our case, no restrictions on the types of atoms allowed at these free sites were imposed. Furthermore, the option that allows tautomerization was invoked. For the other precursor, phenylhydrazine, all hydrogen atoms

249

Figure 2. Steps in the synthesis planning of a pyrazole library with WODCA (see also text): 1. given target structure (1); 2. strategic bonds rated by WODCA; 3. both bonds are broken by user; 4. precursors (2) and (3) generated by WODCA; 5. definition of open sites at these precursors at all positions where the precursors could be generalized; 6. substructure search in Fluka catalog.

on the phenyl ring were removed, in effect asking for a search for any substituted phenylhydrazine.
6. The two substructure searches in the Fluka catalog contained in WODCA provided 144 hits for the 1,3-dicarbonyl substructure (54 hits in the Acros catalog) and 23 substituted phenylhydrazines (38 phenylhydrazines in the Acros catalog). The hits obtained with the 1,3-dicarbonyl subunit contained quite a few β-ketoesters, β-ketoamides, and substituted malonic


Figure 3. More specific substructures defined for the precursors (2) and (3). The number of hits in the Fluka catalog (and, in parentheses, in the Acros catalog) for each substructure is given.

acids. To exclude these, more specific substructure searches have to be performed. Conversely, if aliphatic hydrazines are also to be included, a more general substructure has to be specified. Figure 3 shows the results obtained with these variations in the definition of the query substructures. This selection of starting materials can be entered into a combinatorial chemistry experiment or a parallel synthesis. For modeling and evaluating such a library synthesis, these precursors can be handed over to the EROS system (see next section).

Reaction prediction

It is becoming increasingly clear that combinatorial chemistry calls for a deeper understanding of the reactions to be performed in synthesizing a library. It is common knowledge that organic reactions do not always give the products one expects, that they may give by-products, that yields may be low, and that reactions proceed at different rates. In the initial phases of the development of combinatorial chemistry these points did not always receive enough attention; however, they are now gaining in importance. For many years we have been concerned with the modeling of chemical reactivity [11–15]. To handle these investigations we have developed the EROS system (Elaboration of Reactions for Organic Synthesis) [16–19]. Recently, we developed a new version of EROS that has a wide range of applications, from modeling laboratory reactions through studying the degradation of chemicals in the environment, all the way to simulating mass spectra [19]. A detailed description of this new version of EROS is given elsewhere [19,20]. Here, we only want to outline its application to combinatorial chemistry.

The wide range of applications of EROS has been realized by the introduction of novel, clearly defined concepts. Again, more details are given elsewhere; here only the essentials necessary for showing the use of EROS for combinatorial chemistry are given. In order to model a chemical reaction, details of the experimental set-up have to be given. To this end, information on the number of reactors and the number of phases, on the mode of a reaction, and on the kinetic set-up has to be specified. These notions are now explained further.

A reactor is a place where reactions are run at the same time. Such a reactor may consist of several places that are separated from each other but communicate with each other; these are the phases. Each phase may be characterized by different reaction conditions, e.g., by different solvents, with communication occurring through the interface. The mode of a reaction specifies how molecules will react with each other: a situation where each starting material might react with each other molecule (mode = mix), a situation where one has mono- or pseudomono-molecular kinetics (mode = monomolecular), or a set-up where no reaction occurs at all (mode = inert). The mode can be used in conjunction with information in the rule file on rate constants of the individual reaction steps (vide infra) to set up a system of differential equations for the kinetics of the system. This system of equations can be solved by the Gear algorithm [21], the Runge-Kutta method [22], or the Runge-Kutta-Merson method [23]. The use and benefits of the concepts introduced above are illustrated with a variety of examples elsewhere [19] and will be indicated here for the example of a combinatorial chemistry experiment.

EROS aspires to be applicable to the entire range of organic chemistry. That is to say, the design of EROS is such that any type of organic reaction and arbitrary conditions can, in principle, be modeled.
However, the knowledge presently available to EROS allows in-depth modeling of only a few reactions. It is presently up to a system engineer or computer chemist to develop models for a specific reaction type of interest in close collaboration with an organic chemist. Mechanisms for developing such a knowledge base for EROS are available. In the design of EROS a clear-cut separation of the system proper from the knowledge base has been made. The knowledge of which reactions EROS can be applied to is contained in a separate file of rules on reaction types (Figure 4). This modular set-up allows easy addition of rules for new reaction types, thereby extending the scope of EROS. EROS considers reactions as bond-breaking, bond-making, and electron-shifting procedures, in much the same way as an organic chemist might draw curved arrows to specify a reaction mechanism. A reaction rule specifies


Figure 4. Basic set-up of the EROS system, the system core interacts with the routines for calculating physicochemical effects and the external knowledge base.

Figure 5. General reaction scheme for the breaking of two bonds followed by the formation of two new bonds; constraints for A, B, C and D can be applied according to atom type, atom and bond properties.

which bond and electron shifting pattern should be used. It can further restrict the types of atoms (A, B, etc.) and bonds (single, double, etc.) to which this bond rearrangement is applied. Furthermore, a reaction rule can provide mechanisms for the evaluation of a reaction type. Such an evaluation might range from the calculation of heats of reaction [24,25], through explicit mathematical functions to model chemical reactivity [11], all the way to the calculation of absolute rate constants as exemplified for the hydrolysis of amides [26]. In this endeavor, the evaluation of a reaction can incorporate a variety of physicochemical effects that are automatically calculated by PETRA for any structure generated in an EROS run. These methods calculate partial atomic charges [4,6], qσ and qπ, quantitative measures of the inductive effect χσ [5], the resonance effect χπ [6], and the polarizability effect α [7], as well as heats of formation


Figure 6. Different phases are used to store starting materials and reaction products.

[25]. The more experimental evidence is available for a certain reaction type, the better the evaluation that can be derived and incorporated into a reaction rule. A variety of software tools have been developed for extracting knowledge on a reaction type from a series of reaction instances. These comprise statistical methods [12] as well as neural networks [15,27,28], with both unsupervised and supervised learning. One such approach will be detailed in the next section.

First, however, we will show how the concepts of reactors, phases, and modes allow the handling of a set of reactions from a parallel synthesis or a combinatorial chemistry experiment. For a parallel synthesis or combinatorial chemistry experiment, a single reactor has to be defined because all reactions are performed at the same time. The set of all starting materials is analyzed for predefined substructures (e.g., acid chlorides), and all starting materials having a given substructure are stored in one phase. Thus, one initially has to specify as many phases as there are different types of starting materials.

We will illustrate the application of EROS to combinatorial chemistry with the parallel synthesis of amides from acid chlorides and amines. In this case, one needs two phases for storing the two types of starting materials (Figure 6). The mode of the first phase, which stores all acid chlorides, is set to ‘inert’ because no reactions are to be performed between these starting materials. The mode of the second phase, storing the amines, is set to ‘interface’. This has the effect that one amine after the other is taken from this phase and each one is individually reacted with each separate acid chloride (at the ‘interface’). Thus, all combinations of amines with acid chlorides are produced.
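The two-phase set-up just described can be sketched in a few lines of Python. This is only an illustration of the pairing logic, not EROS code: the function name, the dictionary layout of a phase, and the placeholder labels for the starting materials are all assumptions made for the example.

```python
def run_parallel_synthesis(acid_chlorides, amines):
    """Pair every amine with every acid chloride, mimicking the two phases.

    The dictionaries below are a hypothetical stand-in for EROS phases.
    """
    inert_phase = {"mode": "inert", "molecules": list(acid_chlorides)}
    interface_phase = {"mode": "interface", "molecules": list(amines)}

    products = []
    # One amine after the other is taken from the 'interface' phase ...
    for amine in interface_phase["molecules"]:
        # ... and reacted individually with each separate acid chloride.
        for chloride in inert_phase["molecules"]:
            products.append((chloride, amine))
    # Molecules inside the 'inert' phase are never paired with each other.
    return products


# Placeholder labels; real runs would use structures, not strings.
acid_chlorides = [f"acid_chloride_{i:02d}" for i in range(1, 16)]
amines = [f"amine_{i:02d}" for i in range(1, 16)]
amide_pairs = run_parallel_synthesis(acid_chlorides, amines)
print(len(amide_pairs))
```

With 15 labels in each phase, the sketch produces one pair per combination, which matches the library size discussed in the text.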


Figure 7. 15 acid chlorides react with 15 amines. All possible 225 amides were generated.

In the given example, 15 acid chlorides were reacted with 15 amines, giving 225 amides (Figure 7). When several reaction steps are performed, more reactors are needed. In a previous publication [19] it has been shown how the Ellman synthesis [29] of 1,4-benzodiazepines from 2-aminobenzophenones, amino acids, and alkylating agents can be modeled by a sequence of five phases.

The reaction rules contain procedures for the evaluation of a reaction type. These models are derived by learning from a series of individual reactions, in much the same way as chemists have learned their rules on chemical reactivity from a set of individual reactions. The more observations on reactions are available and the better the experimental data, the more detailed and better the model that can be developed. Clearly, if data on the kinetics of a series of reactions are available, an in-depth evaluation of a reaction type can be derived. We will show this in the following for a quantitative evaluation of the hydrolysis of amides. When less detailed information is available, such as reaction yields, only a more cursory model can be obtained. At the lower end of the scale we only have information on reactions that have been observed, such as those stored


Figure 8. The numbering of atoms at the reaction center and all physicochemical descriptors as used in Equation 1.

in a reaction database. How knowledge on a reaction type can be obtained even in such a case will be shown in the next section on learning from reaction databases. But let us first address the case where kinetic data are available.

Some time ago, we performed an in-depth analysis of the hydrolysis of amides, both under basic conditions and under acid catalysis [26]. Here, we will concentrate only on amide hydrolysis under basic conditions. Pseudo-first-order rate constants for the hydrolysis of 13 benzamides, 18 anilides, and five phenylureas were gathered from the literature, standardized to unit concentration of OH⁻, and converted into free energies of activation, ΔG≠. Then, a set of descriptors of electronic effects at the reaction center was calculated by the PETRA package for each individual reaction. The dataset was split into a training set of 33 reactions and a test set of five reactions. Stepwise multi-linear regression analysis on the training set provided Equation 1 for the quantitative modeling of this reaction type (in kJ mol⁻¹):

$$\Delta G^{\neq} = 107.6 - 198.1\,\Delta q_{\pi}(2\text{-}3) + 1.39\,R^{+}(2\text{-}1) + 1.10\,R^{+}(3\text{-}2) \qquad (1)$$

Δqπ(i–j) gives the difference in the π charge of the bond between atoms i and j, whereas R+(i–j) is a measure of the stabilization of a positive charge through the resonance effect on atom i obtained on heterolysis of the bond i–j. Figure 8 shows the numbering of atoms at the reaction center and identifies the physicochemical descriptors as used in Equation 1. Equation 1 modeled the free energies of activation of the reactions in the training set with a correlation coefficient, r, of 0.958 and a standard error, s, of 3.3 kJ mol⁻¹ (0.58 log k units) in a range of 27 kJ mol⁻¹. Application of Equation 1 to the five reactions of the test set showed a standard error of 1.8 kJ mol⁻¹ (0.31 log k units).
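As a minimal sketch, Equation 1 can be written as a small Python function, together with the conversion between an error in ΔG≠ and an error in log k. The function and variable names are assumptions for illustration, the descriptor values in the usage line are made up and are not PETRA output, and the conversion assumes a temperature of 298.15 K, which the text does not state.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def delta_g_activation(dq_pi_23, r_plus_21, r_plus_32):
    """Equation 1: free energy of activation in kJ/mol.

    dq_pi_23  -- pi-charge difference across bond 2-3
    r_plus_21 -- resonance stabilization R+ for heterolysis of bond 2-1
    r_plus_32 -- resonance stabilization R+ for heterolysis of bond 3-2
    """
    return 107.6 - 198.1 * dq_pi_23 + 1.39 * r_plus_21 + 1.10 * r_plus_32


def kj_to_log_k_units(delta_g_kj, temperature=298.15):
    """Express an energy difference as a difference in log10(k).

    Uses the relation d(log k) = d(DeltaG) / (ln(10) * R * T);
    the default temperature is an assumption.
    """
    return delta_g_kj / (math.log(10) * R_KJ * temperature)


# Hypothetical descriptor values, chosen only to exercise the function.
example = delta_g_activation(dq_pi_23=0.10, r_plus_21=10.0, r_plus_32=10.0)
print(round(example, 2), round(kj_to_log_k_units(3.3), 2))
```

Under the assumed temperature, the conversion is consistent with the standard errors quoted in the text in both units.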

This equation was incorporated into the rule base of EROS and allowed the calculation of absolute rate constants for the hydrolysis of a wide range of benzamides, anilides, and phenylureas [26]. It was shown that it could be generalized to the hydrolysis of benzoylphenylureas such as those constituting important agrochemicals. Clearly, kinetic data sufficient for such an in-depth analysis are available for only a few reaction types. In other cases, different data, preferably quantitative, are necessary for deriving models for the evaluation of reaction types. We place much hope in data on the yields of a series of reactions run under identical or systematically varied reaction conditions. We challenge readers to provide us with such data so that we can derive a model for their reaction type. When not even yields for reactions run under comparable conditions are given, another approach for deriving knowledge on reactions has to be chosen. This is the theme of the following section.
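To make the kinetic set-up mentioned earlier more concrete, the following sketch integrates the rate equations of a single pseudo-first-order step (amide to hydrolysis product) with a classical fourth-order Runge-Kutta scheme. It is not EROS code; the function names and the rate constant used in the example are hypothetical, and a real rule file would supply rate constants for every reaction step.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the state vector y by one classical Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def simulate_first_order(k, amide0, t_end, steps):
    """Integrate d[amide]/dt = -k[amide], d[product]/dt = +k[amide]."""
    def rhs(t, y):
        amide, _product = y
        rate = k * amide
        return [-rate, rate]

    h = t_end / steps
    t, y = 0.0, [amide0, 0.0]
    for _ in range(steps):
        y = rk4_step(rhs, t, y, h)
        t += h
    return y  # [amide, product] concentrations at t_end


# Hypothetical rate constant (per time unit) and unit starting concentration.
amide, product = simulate_first_order(k=0.1, amide0=1.0, t_end=10.0, steps=100)
print(amide, product, math.exp(-1.0))
```

For this simple case the numerical result can be checked against the analytical solution exp(-kt); for coupled or stiff systems a method such as the Gear algorithm cited in the text would be the more usual choice.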

Knowledge from reaction databases

The advent of reaction databases has had a profound impact on the planning of reactions in the laboratory. Detailed information on a multitude of individual chemical reactions can be brought into the laboratory at the touch of a finger. With the growing size of these reaction databases (the largest among them contain several million reactions), it has become of increasing interest to extract knowledge on chemical reactions from these databases, i.e., to learn rules on reaction types from a series of individual reactions. Such an inductive learning process has been the cornerstone of accumulating knowledge on chemical reactions from the very beginning; chemists have ordered individual reactions according to their common features, have thus defined reaction types, and have made inferences on the outcome of a chemical reaction through reasoning by analogy. It is attractive to perform such inductive learning on an electronic basis now that reaction information has become available in electronic form. Neural networks are powerful inductive learning procedures with a broad range of applications in chemistry and drug design [30]. For several years we have been using machine learning and unsupervised neural network methods for the classification of reactions into reaction types [27,28,31]. Such a classification can answer important questions met in combinatorial chemistry:
• Will my selected reaction proceed in the desired direction?
• Do I have a reaction with a broad scope?
• Are my selected reactions diverse enough?


Figure 9. A reaction is a point in a multidimensional space spanned by various physicochemical effects (∆ Hf: heat of reaction; R: resonance effect; q: charge distribution; χ: inductive effect). Such a space is projected into two dimensions by a Kohonen network.

Figure 10. Similarity of chemical reactions: different directions for different types of similarities and different distances for different degrees of similarities.

The investigation of a set of reactions first requires an appropriate representation of chemical reactions. We have initially concentrated on studying the influence of the structure of the starting materials on the course of a chemical reaction, leaving aside, for the time being, the influence of reaction conditions. The course of a chemical reaction is largely governed by the physicochemical effects exerted on the reaction site, i.e., on the bonds being broken and made during a reaction. These different physicochemical effects can be considered as coordinates of a space; a chemical reaction is then a point in such a multi-dimensional space. We use a self-organizing neural network as introduced by Kohonen [32] to project such a multi-dimensional space into a discrete two-dimensional space (Figure 9) [27]. A two-dimensional space is particularly suited for the visualization of similarities between reactions (Figure 10). Different directions can represent different types of relationships: to the north lie different reaction types


Figure 11. Architecture of a Kohonen network. An input pattern (vector) X consists of m elements. Each neuron of the network is represented by a column with m weights, wjk.

than to the south east. And different distances represent different degrees of similarities – the closer two reactions are, the more similar they are. A brief introduction to the self-organizing neural network developed by Kohonen seems warranted for an understanding of the investigations reported here. Figure 11 shows the architecture of a Kohonen network in the form of a two-dimensional arrangement of neurons. Each neuron, a vertical column, contains as many weights, wji, as there are input data (descriptors), xi, for each object to be projected into this network. In our application, an object is a chemical reaction characterized by physicochemical descriptors at the reaction site (see the following example). A reaction will be put into that neuron that contains weights most similar to the descriptors of the input object. A competitive learning algorithm will adjust the weights of the neurons of the network to the input data. Reactions described by similar physicochemical effects will be put into the same or adjacent neurons. Thus, a Kohonen network can be used for similarity perception and for clustering a dataset of reactions into reaction types [27,28]. In the following study a dataset of reactions was investigated where all reactions involved the breaking of a C-O and of a C-H bond and the making of a C-C bond. These reactions were characterized by the physicochemical descriptors indicated in Table 1. The dataset consisted of 288 reactions that were projected into a Kohonen network of size 20 × 20. Figure 12 shows the view from above (cf. Figure

Table 1. Eight physicochemical parameters used to characterize each reaction center of 288 training reactions and 266 test reactions

Figure 12. Kohonen map obtained for the classification of 288 reactions, marked with the Bayes Theorem classification method. White boxes represent empty neurons; boxes with an × denote conflict neurons. The symbols characterize different classes found by Bayes classification.

11) onto the resulting network. Each neuron is marked to identify a specific reaction type that was assigned through a Bayes classification procedure [33]. The classification of reactions into reaction types by a combination of a Kohonen network and a Bayes classification compares favorably with the classification assigned intellectually by a chemist to the mapping by a Kohonen network (Figure 13). (Note that separation into different coherent areas


Figure 13. Comparison of classification methods: on the left-hand side the Kohonen map was marked with the Bayes Theorem classification method, on the right-hand side it was marked by chemists intellectually. The different reaction types are indicated by different symbols. Note that the symbols of the two networks cannot directly be compared. Only the assignment of similar coherent areas is important.

is the important result. The symbols of the individual reaction types cannot directly be compared.) Once a Kohonen network has been trained with a series of reaction instances it can be used to assign a membership to a reaction type for a set of test reactions. Figure 14 shows this for 266 reactions having the same reaction center as the reactions studied above. Kohonen networks trained in the fashion described above with a set of reactions can be used to define the diversity of reactions of a combinatorial chemistry experiment. Let us return for this endeavour to the experiment of a parallel synthesis of amides from acid chlorides and amines described in the previous section as an application of EROS to combinatorial chemistry. In this example, 15 acid chlorides were reacted with 15 amines to give 225 amides (see Figure 7). The question is now, do these reactions cover the entire space of amide formations from acid chlorides and amines? To answer this question a Kohonen network was trained with all 214 reactions found in the


Figure 14. Kohonen map obtained for the classification of 266 reactions as test dataset. These reactions were projected into the trained network obtained by the classification of 288 reactions. The neurons were marked by chemists intellectually.

Table 2. Eight physicochemical parameters used to characterize each reaction center of 225 amide building reactions

Electronic variable    H–N    C–Cl    →N–C
qtot                   x              x
χσ                     x              x
χπ                     x              x
αi                     x              x

qtot = total charge. χσ = σ electronegativity. χπ = π electronegativity. αi = effective atom polarizability.

Theilheimer reaction database for amide formation from acid chlorides and amines. These reactions were represented by eight physicochemical effects at the reaction site as shown in Table 2. In other words, the information in the Theilheimer database was taken as representing the universe of amide formations from acid chlorides and amines. If this is accepted – which is presently an acceptable definition given what information is available in electronic form – the weights of the Kohonen


Figure 15. Visualization of the reaction space: on the left-hand side the 214 reactions (obtained from the Theilheimer database) correspond to the complete reaction space, whereas on the right-hand side the 225 reactions (corresponding to the reaction subspace) cover less than 50% of the trained network.

network (of size 15 × 15) trained with these 214 reactions store the entire scope of amide formations conceivable from these starting materials. Now we send the dataset of 225 reactions obtained from the previously mentioned 15 acid chlorides and 15 amines through this network. Each of these test reactions will be mapped into a neuron that has weights most similar to the descriptors of the reaction considered. Figure 15 compares the Kohonen maps of the 214 reactions from the Theilheimer database and the maps obtained by sending the 225 test reactions through this network of the Theilheimer database, used as reference network. This analysis shows that the 225 test reactions cover less than 50% of the space defined by the Theilheimer database. This suggests the need for additional reactions to be explored to cover the entire range of amide formations exemplified by the Theilheimer database.
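The mapping and coverage analysis described above can be sketched with a small self-organizing map written in plain Python. This is a simplified illustration under stated assumptions, not the software used in the study: the grid is rectangular and non-toroidal, learning rate and neighbourhood radius decay linearly, and coverage is defined here as the fraction of neurons occupied by the reference set that are also hit by the test set. All function names are hypothetical, and any descriptor vectors fed to it in a trial run would be synthetic.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weights are closest to x."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=50, seed=0):
    """Competitive learning: pull the winner and its neighbours towards x."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs                    # linear decay factor
        rate = 0.5 * frac                              # learning rate
        radius = max(1.0, max(rows, cols) / 2 * frac)  # neighbourhood size
        for x in rng.sample(data, len(data)):          # shuffled order
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d = max(abs(r - br), abs(c - bc))  # grid distance
                    if d <= radius:
                        h = rate * (1.0 - d / (radius + 1.0))
                        w = weights[r][c]
                        for i in range(dim):
                            w[i] += h * (x[i] - w[i])
    return weights


def occupied_neurons(weights, reactions):
    """Set of neurons that receive at least one reaction."""
    return {best_matching_unit(weights, x) for x in reactions}


def coverage(weights, reference, test):
    """Fraction of reference-occupied neurons also hit by the test set."""
    ref = occupied_neurons(weights, reference)
    return len(ref & occupied_neurons(weights, test)) / len(ref)
```

In use, one would train the map on the reference reactions (here, the database set), then call `coverage` with the library reactions as the test set; a value well below 1 indicates that parts of the reference space are not sampled by the library.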

Conclusions

Chemists have largely learned from experiments and have drawn generalizations and conclusions from their observations in order to plan new experiments. The large number of reactions performed in combinatorial chemistry offers great potential for enhancing our knowledge of the scope and limitations of reaction types. A set of reactions performed under identical conditions provides information that has to be stored and systematically analyzed.

The unsupervised inductive learning techniques embodied in CORA can provide insights into chemical reactions, knowledge that can be used both for synthesis design and for reaction prediction. WODCA provides a set of tools for planning a combinatorial library. Foremost are methods for defining strategic bonds, a process that can benefit from the results of a CORA study. Substructure searches can then be used to extend the list of starting materials for the combinatorial chemistry experiment. WODCA has been designed to be highly interactive so that the chemist can bring in his or her special interests. EROS, on the other hand, provides a framework for the simulation of chemical reactions. It needs knowledge derived from a series of experiments to define the scope and limitations of a reaction type, knowledge that is again obtainable through the use of CORA. Thus, both WODCA and EROS derive their inherent knowledge from careful analysis of a series of chemical reactions. Reaction databases play an important role in this endeavor. However, it must be observed that reaction databases still have large deficiencies: quite often, not all information on the reaction conditions or on the outcome of a reaction (co-products, by-products, yields) is stored. Thus, we attach great expectations to combinatorial chemistry, hoping that the reactions performed are carefully analyzed and that this information is stored in electronic form. This would greatly enhance our potential to learn more about chemical reactions.

Acknowledgements

We gratefully acknowledge financial support of our work by the Deutsche Forschungsgemeinschaft and the Bundesministerium für Bildung und Forschung. We thank Dr. Wolf-Dietrich Ihlenfeldt, Dr. Klaus-Peter Schulz, Dr. Larissa Steinhauer, Dr. Lingran Chen and Achim Herwig for their contributions to the work reported here.

References

1. Ihlenfeldt, W.-D. and Gasteiger, J., Angew. Chem., 107 (1995) 2807; Angew. Chem. Int. Ed. Engl., 34 (1995) 2613.
2. http://www2.ccc.uni-erlangen.de/software/wodca.
3. Gasteiger, J., Ihlenfeldt, W.-D., Fick, R. and Rose, J.R., J. Chem. Inf. Comput. Sci., 32 (1992) 700.
4. Gasteiger, J. and Marsili, M., Tetrahedron, 36 (1980) 3219.
5. Hutchings, M.G. and Gasteiger, J., Tetrahedron Lett., 24 (1983) 2541.
6. Gasteiger, J. and Saller, H., Angew. Chem., 97 (1985) 699; Angew. Chem. Int. Ed. Engl., 24 (1985) 687.
7. Gasteiger, J. and Hutchings, M.G., J. Chem. Soc. Perkin Trans., 2 (1984) 559.
8. Gasteiger, J., In Jochum, C., Hicks, J. and Sunkel, J. (Eds.), Physical Property Prediction in Organic Chemistry, Springer Verlag, Heidelberg, 1988, pp. 119–138.
9. http://www2.ccc.uni-erlangen.de/software/petra.
10. Pauwels, R., Andries, K., Desmyter, J., Schols, D., Kukla, M.J., Breslin, H.J., Raeymaeckers, A., Gelder, J., Woestenborghs, R., Heykants, J., Schellekens, K., Janssen, M.A.C., Clerq, E. and Janssen, P.A.J., Nature, 343 (1990) 470.
11. Gasteiger, J., Rose, P. and Saller, H., J. Mol. Graphics, 6 (1988) 87.
12. Gasteiger, J., Saller, H. and Löw, P., Anal. Chim. Acta, 191 (1986) 111.
13. Gasteiger, J., Schulz, K.-P. and Kredler, C., J. Chem. Inf. Comput. Sci., 33 (1993) 385.
14. Schulz, K.-P. and Gasteiger, J., J. Chem. Inf. Comput. Sci., 33 (1993) 395.
15. Simon, V., Gasteiger, J. and Zupan, J., J. Am. Chem. Soc., 115 (1993) 9148.
16. Gasteiger, J. and Jochum, C., Top. Curr. Chem., 74 (1978) 93.
17. Gasteiger, J., Hutchings, M.G., Christoph, B., Gann, L., Hiller, C., Löw, P., Marsili, M., Saller, H. and Yuki, K., Top. Curr. Chem., 137 (1987) 19.
18. Röse, P. and Gasteiger, J., Anal. Chim. Acta, 235 (1990) 163.
19. Höllering, R., Gasteiger, J., Steinhauer, L., Schulz, K.-P. and Herwig, A., J. Chem. Inf. Comput. Sci., 2 (2000) 482.
20. http://www2.ccc.uni-erlangen.de/software/eros.
21. Gear, C.W., Numerical Initial Value Problems in Ordinary Differential Equations, Prentice Hall, Englewood Cliffs, NJ, 1971.
22. Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T., Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1989.
23. Köckler, N., Numerische Algorithmen in Softwaresystemen, Teubner, Stuttgart, 1990.
24. Gasteiger, J., Comput. Chem., 2 (1978) 85.
25. Gasteiger, J., Jacob, P. and Strauß, U., Tetrahedron, 35 (1979) 139.
26. Gasteiger, J., Hondelmann, U., Rose, P. and Witzenbichler, W., J. Chem. Soc. Perkin Trans. 2, (1995) 193.
27. Chen, L. and Gasteiger, J., J. Am. Chem. Soc., 119 (1997) 4033.
28. Satoh, H., Sacher, O., Nakata, T., Chen, L., Gasteiger, J. and Funatsu, K., J. Chem. Inf. Comput. Sci., 38 (1998) 210.
29. Bunin, B.A., Plunkett, M.J. and Ellman, J.A., In Jung, G. (Ed.), Combinatorial Peptide and Nonpeptide Libraries, A Handbook, VCH Verlagsgesellschaft, Weinheim, 1996.
30. Zupan, J. and Gasteiger, J., Neural Networks in Chemistry and Drug Design, VCH Verlag, Weinheim, 1999.
31. Rose, J.R. and Gasteiger, J., J. Chem. Inf. Comput. Sci., 34 (1994) 74.
32. Kohonen, T., Self-organizing Maps, Springer Verlag, Berlin, 1995.
33. http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/index.html.

Perspectives in Drug Discovery and Design, 20: 265–287, 2000. KLUWER/ESCOM © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

Evaluation of reactant-based and product-based approaches to the design of combinatorial libraries

VALERIE J. GILLET* and ORAZIO NICOLOTTI
Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, U.K.

Summary. The large numbers of compounds that are now available in drug discovery programmes have resulted in the need for methods to select compounds, both from external suppliers and in combinatorial library design procedures. In this article we describe a method that has been developed for scoring and ranking compounds according to their likelihood of exhibiting activity. The method can be used to determine the order in which compounds should be screened as well as to guide compound acquisition programmes. We then describe a series of experiments we have conducted that explore the benefits of designing combinatorial libraries through an analysis of product space rather than reactant space. The experiments are based on several different diversity metrics and molecular descriptors. We also show how product-based selection allows multiple objectives to be optimised simultaneously, for example, diversity and physicochemical properties, allowing the design of diverse and drug-like libraries.

Key words: bioactivity profiles, combinatorial library design, diversity, product-based selection, reactant-based selection

Introduction

Combinatorial chemistry and high-throughput screening are now important techniques in the discovery of novel bioactive compounds in the pharmaceutical and agrochemical industries. Evidence of this is seen in a recent review that lists as many as 321 combinatorial libraries that were reported in the literature in 1998 alone [1]. Combinatorial chemistry allows very large numbers of compounds to be synthesised simultaneously; however, in practice, the size of libraries is often constrained by many factors, including the capacity of screening programmes and the cost of reactants. Hence, there has been a great deal of interest in developing effective methods for selecting subsets of compounds, both in the design of libraries and in compound acquisition programmes [2,3]. The criteria used for selecting compounds depend on the application for which the compounds are being used. For example, when designing libraries for screening across a wide range of biological targets, the emphasis is usually on diversity, so that a wide variety of structural types is contained within the library. Assessing diversity requires firstly that the compounds are described using numerical descriptors and secondly that a metric is defined to quantify diversity with respect to the descriptors used [2,3]. Despite the recent flood of research into methods for assessing diversity, there is still much debate about which methods are the best [4–9]. Other criteria, such as similarity to known actives, assume importance in the design of lead optimisation libraries.

In this article we describe a method we have developed for scoring and ranking compounds according to their likelihood of exhibiting activity. The method can be used to determine the order in which compounds are screened as well as to guide compound acquisition programmes. We then describe a series of experiments we have conducted that explore the benefits of designing combinatorial libraries through an analysis of product space rather than reactant space. The experiments are based on several different diversity metrics and molecular descriptors. We also show how product-based selection allows multiple objectives to be optimised simultaneously, for example, diversity and physicochemical properties, allowing the design of diverse and drug-like libraries.

* To whom correspondence should be addressed.

Bioactivity profiling

It is evident that the effectiveness of screening libraries will be increased if the compounds contained within the library have ‘drug-like’ properties. However, it is difficult to define clearly the concept of ‘drug-likeness’ in terms of the exact characteristics that a molecule should have in order to be viable as a drug. Rather, biological activity is known to be the result of a complex range of different characteristics such as lipophilicity, flexibility, hydrogen bond donating ability, etc. Despite these difficulties in characterising ‘drug-like’ molecules, there are some general criteria that can be applied during combinatorial library design and compound acquisition programmes to filter out undesirable compounds. Examples include eliminating high molecular weight compounds, highly flexible molecules, and compounds that contain reactive groups which may interfere with the intended reaction or lead to toxic or unstable products. Another example of a filtering technique is the well known ‘Rule-of-five’ developed by Lipinski et al. [10]. The rule is based on easy-to-calculate properties that are designed to identify compounds that are likely to exhibit poor intestinal absorption.

We have developed a knowledge-based approach for estimating the likelihood of a molecule exhibiting bioactivity [11]. Weights are derived that can be used to score and rank compounds so that, if the compounds are screened in rank order, the active molecules should be found more rapidly than if they are screened at random. Thus, the method can be used to order compounds for screening and to guide compound acquisition programmes. We characterise activity by analysing what is currently known about bioactive molecules, and we define a molecule as likely to be bioactive if it has characteristics that are similar to those of known bioactive molecules. A limitation of the method is that it clearly biases the selection towards compounds resembling those that have been shown to exhibit activity in the past; however, given that we still have much to learn about the structure-activity relationships that exist in known areas of biological activity, we believe this to be a valuable ‘data mining’ tool.

A training set, consisting of molecules in two different classes, is used to derive weights that are then used to score and rank molecules. A molecule is scored according to the extent to which its properties are typical of the molecules in the class to which it belongs. For example, weights can be derived that discriminate between ‘drug-like’ and ‘non-drug-like’ compounds, and the molecules are then ranked according to their likelihood of exhibiting activity in any therapeutic area. When choosing compounds for screening against particular therapeutic targets, weights can be derived that are specific for a given therapeutic area, for example, CNS activity. The weights are based on easy-to-calculate physicochemical properties such as molecular weight (MW), number of rotatable bonds (RBs), number of hydrogen bond donors (HBDs), number of hydrogen bond acceptors (HBAs), number of aromatic rings (ARs), the ²κα shape descriptor [12], and ClogP [13]. In principle, any other easy-to-calculate properties could also be used.

Table 1. SMARTS definitions used to identify the presence of hydrogen bond donors (HBD), hydrogen bond acceptors (HBA) and rotatable bonds (RB) within a molecule

Feature   SMARTS
HBD       [!#6;!H0]
HBA       [$([!#6;+0]);!$([F,Cl,Br,I]);!$([o,s,nX3]);!$([Nv5,Pv5,Sv4,Sv6])]
RB        [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
The MW, AR and ²κα shape index features used in the experiments described below were calculated using the Daylight toolkit [14]. A hydrogen bond donor is defined as any heteroatom that carries at least one hydrogen, and a hydrogen bond acceptor is defined as a heteroatom with no positive charge, excluding the halogens, aromatic oxygen and sulphur, pyrrole nitrogen, and the higher oxidation levels of nitrogen, phosphorus and sulphur. Note that an atom can be considered as both a donor and an acceptor. The SMARTS definitions of these substructural features are given in Table 1.

The distribution of each feature in a set of compounds is represented by a set of bins, with a total of 20 bins per feature. The structural features HBD, HBA, RB and AR are represented by counts and the bin size is set to one. Thus, for HBDs, the first bin represents the number of molecules in the database that have no donors, the second bin represents the number of molecules with exactly one donor, and so on, with the final bin representing the number of molecules with 19 or more donors. The physicochemical properties are also represented by bins, but in these cases the bins represent ranges of values. For example, the first bin for ²κα represents the number of molecules with ²κα values in the range 0.00–1.99, the second bin represents the number of molecules with values in the range 2.00–3.99, and so on. The bins representing the distribution of MW have a width of 75, so that the bins represent the ranges 0.00–74.99, 75.00–149.99, ... and ≥ 1425.00.

Weights are then assigned to each of the bins at random and a genetic algorithm (GA) [15] is used to derive optimum weights that maximise the discrimination between two classes of compounds. The chromosomes of the GA are integer strings that map directly to the weights. The standard genetic operators of crossover and mutation are used to generate child chromosomes. The fitness function of the GA measures the extent to which the weights contained in a chromosome can be used to discriminate between the two classes of molecules, which are merged within the training set. Each molecule in the training set is scored by summing weights over all the features, where the weight for an individual feature is determined by the value of that feature within the molecule.
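The binning and scoring scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature keys (`"HBD"`, `"KAPPA2"` and so on), the function names and the example weights are hypothetical, ClogP is omitted because its bin width is not given in the text, and the weights would in practice come from the GA rather than being set by hand.

```python
from typing import Dict, List

N_BINS = 20  # the text uses 20 bins per feature

# Bin widths taken from the text: counts use width 1, the kappa shape
# index uses width 2.0 and molecular weight uses width 75.
BIN_WIDTH: Dict[str, float] = {
    "HBD": 1, "HBA": 1, "RB": 1, "AR": 1, "KAPPA2": 2.0, "MW": 75.0,
}


def bin_index(feature: str, value: float) -> int:
    """Map a feature value to one of 20 bins; the last bin is open-ended."""
    index = int(value // BIN_WIDTH[feature])
    return min(max(index, 0), N_BINS - 1)


def score(molecule: Dict[str, float], weights: Dict[str, List[int]]) -> int:
    """Sum, over all features, the weight of the bin the molecule falls in."""
    return sum(weights[f][bin_index(f, v)] for f, v in molecule.items())


# Hypothetical weights (weight == bin number) and a hypothetical molecule.
example_weights = {f: list(range(N_BINS)) for f in BIN_WIDTH}
example_molecule = {"HBD": 2, "HBA": 5, "RB": 25, "AR": 1,
                    "KAPPA2": 4.3, "MW": 349.9}
print(score(example_molecule, example_weights))  # prints 33
```

Because every feature contributes exactly one weight, the score of a molecule is a sum of as many terms as there are features, which is what keeps the trained weights open to inspection.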
The molecules are then ranked according to decreasing score and the fitness function is calculated as the average ranked position of molecules in the preferred set (for example, the set of active compounds). Thus the GA attempts to shift the distribution of scores in one class of molecules relative to the other class in order that maximum separation between the two distributions is achieved.

The method has been applied to the discrimination of drugs and non-drugs as represented by the World Drug Index (WDI) [16] and the SPRESI database [17], respectively. The databases were preprocessed as follows: the molecules were restricted to those that contain the element types C, N, O, F, P, S, Cl, Br and I and to those with molecular weight in the range 100 to 1000; only parent compounds were included in the case of salts; and, where possible, charges were neutralised by altering the number of hydrogens. Adjusting the charges ensures that the molecules are treated consistently with respect to pH and it also allows simpler definitions of hydrogen bond donors and acceptors to be used.

SPRESI was further processed by removing the compounds that occur in WDI, and then selecting a 16 661-member random sample. Previous experiments showed that the subset of SPRESI is representative of the whole database [11]. It is assumed that the remaining SPRESI compounds represent inactive molecules. In practice, of course, there may well be SPRESI molecules that have not yet been identified as potential active molecules, but the percentage of these is assumed to be negligible. (The fact that drug companies typically screen 10 000 molecules to find a novel lead compound implies that drug activity is a rare event and therefore the chance of finding active compounds in SPRESI is low.)

WDI was further processed by analysing the activity classes assigned by Derwent. Molecules were removed as follows: molecules with no activity class assigned, molecules that are labelled as ‘trial-prep’ and molecules that belong to the following activity classes: pesticides and plant hormones (except for fungicides), zootoxins, toxins, surfactants, diagnostics, chelators and adsorbents. It is assumed that the remainder of WDI represents a wide variety of active molecules and that it is not biased towards any particular class(es) of compound(s), although an inspection of its contents suggests that at least some classes, such as antimicrobials, are overly represented. We then selected a random sample of 1 000 compounds.

The features (number of HBDs, HBAs, RBs and ARs, MW, ²κα and ClogP) were calculated for each of the molecules in SPRESI and WDI. The GA was run to minimise the average position of WDI molecules once the molecules had been scored and ranked. The distributions in Figure 1 show a clear separation between the two classes of compounds. In terms of a screening experiment, these results indicate that screening 11% of the total set of compounds would result in the extraction of 50% of the WDI compounds, as shown in Figure 2.
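As an illustration of the two quantities used above, the following Python sketch computes the average ranked position of the preferred set (the fitness the GA minimises) and the fraction of the ranked list that must be screened to recover a given share of that set. The function names and the toy scores are hypothetical and are not taken from the original work.

```python
from typing import List, Sequence


def rank_order(scores: Sequence[float]) -> List[int]:
    """Indices of molecules sorted by decreasing score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def average_rank(scores: Sequence[float], preferred: Sequence[bool]) -> float:
    """Mean 1-based rank of the preferred molecules; lower is better."""
    order = rank_order(scores)
    ranks = [pos + 1 for pos, i in enumerate(order) if preferred[i]]
    return sum(ranks) / len(ranks)


def fraction_screened(scores: Sequence[float], preferred: Sequence[bool],
                      recall: float = 0.5) -> float:
    """Fraction of the ranked list needed to recover `recall` of the preferred set."""
    target = recall * sum(preferred)
    found = 0
    for pos, i in enumerate(rank_order(scores)):
        found += preferred[i]
        if found >= target:
            return (pos + 1) / len(scores)
    return 1.0


# Toy data: ten molecules, four of which belong to the preferred class.
toy_scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
toy_preferred = [True, False, True, False, False, False, False, True, False, True]
print(average_rank(toy_scores, toy_preferred))       # prints 5.5
print(fraction_screened(toy_scores, toy_preferred))  # prints 0.3
```

With this definition, the statement that half of the WDI compounds are found in the first 11% of the list corresponds to `fraction_screened(...)` returning about 0.11 for `recall=0.5`.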
Figure 3 shows the results of applying the weights in a predictive manner, that is, to 10 000 WDI and 166 610 SPRESI compounds. It can be seen that the weights are also effective when applied to previously unseen compounds. Extensive experiments have already been reported [11] that demonstrate that the method is even more effective when applied to discriminate specific therapeutic classes from inactive compounds: for example, the effectiveness of the method at discriminating compounds belonging to the class of antibiotics from SPRESI compounds is shown in Figure 4. The method can also be used to identify compounds in a given therapeutic class from within ‘drug-like’ collections. For example, Figure 5 illustrates the effect of training the GA to discriminate between compounds within the class of antibiotics and compounds in other therapeutic classes within WDI.

Figure 1. The distribution of scores for compounds in WDI, in black, is shown superimposed on the distribution of scores in a subset of the SPRESI database. The y-axis represents the percentage of compounds, the x-axis represents the score.

A similar approach to that described here has been adopted at GlaxoWellcome for the selection of a corporate screening set of compounds [18]. Similar methods have also been developed more recently by Ajay et al. [19], Sadowski [20] and Wagener and van Geerestein [21]. The methods differ in the algorithms used to discriminate between compounds (for example, neural networks and recursive partitioning are used in place of the GA), in the descriptors that are used to represent the compounds, and in the data to which the methods are applied. However, broadly similar results are found in all cases. A GA has the potential advantage over neural networks that the weights that are optimised are visible (in neural networks the weights are hidden). Thus, in theory, the weights produced by the GA are interpretable. However, further analysis is required in order to interpret the weights found here, since a single set of weights is generated that encompasses information from a whole range of structures and since no account is taken of the co-occurrence of features within a molecule.
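To make the idea of visible, directly optimised weights concrete, here is a minimal GA sketch in Python. It follows only what the text states (integer-string chromosomes, crossover, mutation, and a fitness equal to the average rank of the preferred class). Everything else is an assumption made for illustration: truncation selection, one-point crossover, one mutated position per child, the population size, and the toy data in which each molecule is reduced to the indices of the weight slots it occupies.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # one integer weight per (feature, bin) slot


def avg_rank(weights: Chromosome, mols: Sequence[Sequence[int]],
             active: Sequence[bool]) -> float:
    """Fitness: mean rank of actives after sorting by decreasing score."""
    scores = [sum(weights[slot] for slot in m) for m in mols]
    order = sorted(range(len(mols)), key=lambda i: -scores[i])
    ranks = [r + 1 for r, i in enumerate(order) if active[i]]
    return sum(ranks) / len(ranks)


def evolve(mols: Sequence[Sequence[int]], active: Sequence[bool], n_slots: int,
           pop_size: int = 30, generations: int = 40, max_weight: int = 10,
           seed: int = 0) -> Tuple[Chromosome, float]:
    """Return the best chromosome found and its fitness (lower is better)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, max_weight) for _ in range(n_slots)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: avg_rank(c, mols, active))
        parents = pop[: pop_size // 2]        # truncation selection (assumed)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_slots)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: overwrite one randomly chosen weight.
            child[rng.randrange(n_slots)] = rng.randint(0, max_weight)
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda c: avg_rank(c, mols, active))
    return best, avg_rank(best, mols, active)


# Toy set: two features with three bins each (slots 0-2 and 3-5).
# Inactives are listed first so that tied scores count against the actives.
toy_mols = [(1, 4), (2, 5), (2, 4), (1, 5), (2, 5), (1, 4),
            (0, 3), (0, 3), (0, 4), (1, 3)]
toy_active = [False] * 6 + [True] * 4
best_weights, best_fitness = evolve(toy_mols, toy_active, n_slots=6)
print(best_weights, best_fitness)
```

The returned chromosome is simply the list of bin weights, so it can be read directly; whether any individual weight is meaningful is a separate question, as noted above.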

Library design in product space

The design of a combinatorial library experiment involves identifying pools of available reactants, for example, by searching in-house databases and the Available Chemicals Directory [22]. Experience shows that potential reactants for most combinatorial syntheses are readily available in numbers that greatly exceed the capacity of current screening procedures. Some reactants can be eliminated from further consideration through the use of filtering techniques that remove those with undesirable characteristics, using approaches analogous to those discussed in the previous section. However, the number of available reactants that remain often still exceeds capacity, and hence it is necessary to apply selection techniques in order to reduce the reactant pools to manageable sizes. In most approaches to library design, subset selection is applied at the reactant level and a product library is then synthesised from the chosen reactants. However, although this approach is computationally appealing, a limitation is that optimising the reactants, for example according to structural diversity, does not imply an optimised set of products [23].

Figure 2. The number of WDI compounds found over intervals of the ranked list. In this simulated screening experiment 50% of the drug-like compounds are found by screening 11% of the compounds. The black horizontal line shows the rate at which the WDI compounds would be found if they were distributed at random throughout the list.

An alternative approach involves enumerating the full virtual combinatorial library and performing the selection at the product level. Any of the methods developed for reactant-based selection can be applied at the product level in a process known as ‘cherry-picking’. However, although this approach can lead to a subset of products that are optimally diverse, it is synthetically inefficient when mapped to a combinatorial library experiment since no account is taken of the combinatorial constraint. That is, there is no guarantee that each reactant from one pool occurs in a product with each reactant from a second pool. For example, we conducted an experiment to cherry-pick 1 600 diverse products from a 400 × 400 virtual amide library and found that maximising the structural diversity required no fewer than 137 different amines and 146 different carboxylic acids [23]. The systematic joining of all these amines to all of the carboxylic acids, as performed in practical combinatorial synthesis, would result in 19 992 molecules, of which the 1 600 most diverse molecules are a subset. The synthetic inefficiency of performing selection at the product level by cherry-picking has also been noted by Cribbs et al. [24]. In their work, nearly all of the reactants were required in order to build the selected molecules.

Figure 3. The results of applying the weights predictively are shown superimposed on the results found for the training compounds. The training set consists of 1 000 WDI compounds and 16 661 SPRESI compounds. The predicted sets consist of 10 000 WDI compounds and 166 610 SPRESI compounds. The dashed line represents the results that would be expected if the WDI compounds were distributed at random throughout the ranked list.

We have developed a program called SELECT [23,25] that performs product-based selection taking direct account of the combinatorial constraint.
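The combinatorial constraint can be illustrated with a short Python sketch. It represents each product of a two-component library as a pair of reactant identifiers, counts the distinct reactants that a cherry-picked set requires, and checks whether a set of products is itself a full array. The identifiers and function names are hypothetical and serve only as an example.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (amine id, carboxylic acid id)


def reactants_needed(picked: Iterable[Pair]) -> Tuple[Set[str], Set[str]]:
    """Distinct reactants from each pool that the picked products require."""
    picked = list(picked)
    return {a for a, _ in picked}, {c for _, c in picked}


def full_array(picked: Iterable[Pair]) -> Set[Pair]:
    """All products made when every required amine meets every required acid."""
    amines, acids = reactants_needed(picked)
    return set(product(amines, acids))


def is_combinatorial(picked: Iterable[Pair]) -> bool:
    """True when the picked set already equals its own full array."""
    picked = set(picked)
    return picked == full_array(picked)


# Four cherry-picked products that draw on three amines and three acids.
cherry = [("amine1", "acid1"), ("amine2", "acid2"),
          ("amine3", "acid3"), ("amine1", "acid3")]
print(len(full_array(cherry)))     # prints 9
print(is_combinatorial(cherry))    # prints False

# A 2 x 2 combinatorial subset needs no products beyond those selected.
block = list(product(["amine1", "amine2"], ["acid1", "acid2"]))
print(is_combinatorial(block))     # prints True
```

In the toy case, making the four wanted products by array synthesis means making nine; the amide experiment quoted above shows the same effect at a much larger scale.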

Figure 4. The distribution of scores for antibiotics, in black, is shown superimposed on the distribution of scores in a subset of the SPRESI database. The y-axis represents the percentage of compounds, the x-axis represents the score.

Figure 5. The distribution of scores for antibiotics, in black, is shown superimposed on the distribution of scores in WDI, with the antibiotic compounds removed. The y-axis represents the percentage of compounds, the x-axis represents the score.

That is, SELECT can be used to design combinatorial subsets that are by definition synthetically efficient and that are optimised with respect to diversity and other user-defined properties. The diversity of libraries designed using SELECT can be measured using different descriptors, for example, Daylight [14] and UNITY [26] fingerprints and Molconn-Z parameters [27], and different diversity metrics, for example, the sum-of-pairwise dissimilarities and the average nearest neighbour distance. SELECT [25] is based on a genetic algorithm and uses a multi-objective fitness function that allows many properties to be optimised simultaneously with diversity. Thus, the physicochemical property profiles of libraries can be optimised in the design of diverse and ‘drug-like’ libraries. SELECT can also be used to design libraries that complement existing libraries and to explore different library configurations.

In a previous study [23] we investigated the effectiveness of product-based selection of a combinatorial library relative to reactant-based selection. Our experiments considered selecting subsets of a range of sizes from three different combinatorial libraries using Daylight fingerprints as descriptors and a single diversity metric, the sum-of-pairwise dissimilarities using the cosine coefficient. We used dissimilarity-based compound selection (DBCS) [28] to select diverse reactants, which were then enumerated into a product library, and the diversity of the library was measured using the sum-of-pairwise dissimilarities. Diverse combinatorial subsets were then selected from the full virtual libraries using SELECT and their diversities were compared with those of the analogous libraries selected by analysing reactant space.
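For readers unfamiliar with dissimilarity-based compound selection, the following Python sketch shows one common greedy variant, usually called MaxMin. The text does not say which DBCS variant was used in [23], so this should be read as an assumption chosen for illustration; the fingerprints are hypothetical sets of bit positions and the starting compound is fixed arbitrarily.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def cosine(a: Fingerprint, b: Fingerprint) -> float:
    """Cosine coefficient for binary fingerprints."""
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5


def maxmin_select(pool: Sequence[Fingerprint], k: int, seed: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly add the item farthest from those already chosen."""
    chosen = [seed]
    # Distance from each pool member to its nearest chosen member so far.
    nearest = [1.0 - cosine(fp, pool[seed]) for fp in pool]
    while len(chosen) < min(k, len(pool)):
        nxt = max((i for i in range(len(pool)) if i not in chosen),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, fp in enumerate(pool):
            nearest[i] = min(nearest[i], 1.0 - cosine(fp, pool[nxt]))
    return chosen


toy_pool = [frozenset({1, 2, 3}), frozenset({1, 2, 4}),
            frozenset({7, 8, 9}), frozenset({1, 2, 3, 4})]
print(maxmin_select(toy_pool, 3))  # prints [0, 2, 1]
```

Applied independently to each reactant pool, a procedure of this kind yields the reactant-based libraries against which product-based selection is compared.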
Our experiments demonstrated that choosing reactants through an analysis of product space results in significantly more diverse libraries than if the selection is made at the reactant level, using the chosen combination of descriptor (Daylight fingerprints) and diversity metric (the sum-of-pairwise dissimilarities). In fact, we found that the product-based combinatorial libraries were intermediate in diversity between reactant-based selection and cherry-picking in product space; however, they have the considerable advantage over cherry-picking that the subset libraries are themselves combinatorial libraries and hence amenable to efficient synthesis.

We report here a more extensive series of experiments to determine the effectiveness of product-based selection relative to reactant-based selection for a number of different descriptors and diversity metrics. Specifically, we have investigated the effectiveness of product-based combinatorial library design considering three different descriptors, three different diversity metrics and two different libraries. All calculations were made using the SELECT program. The descriptors are 1024-bit Daylight fingerprints [14], 992-bit UNITY fingerprints [26] and 538 Molconn-Z parameters [27]. The Molconn-Z parameters are real numbers that have been standardised to fall in the range 0 to 1.

The diversity metrics are the sum-of-pairwise dissimilarities calculated using the cosine coefficient (and implemented using the O(N) centroid algorithm [29]), SUMCOS; the sum-of-pairwise dissimilarities using the Tanimoto coefficient, SUMTAN; and the average nearest neighbour distance using the Tanimoto coefficient, NN. SUMCOS for a library of N molecules is defined as:

$$\mathrm{SUM}_{\mathrm{COS}} = 1 - \frac{\sum_{J=1}^{N}\sum_{K=1}^{N}\mathrm{COS}(J,K)}{N^{2}}$$

where COS(J,K) is the similarity between molecules J and K defined using the cosine coefficient; SUMTAN for a library of N molecules is defined as:

$$\mathrm{SUM}_{\mathrm{TAN}} = 1 - \frac{\sum_{J=1}^{N-1}\sum_{K=J+1}^{N}\mathrm{TAN}(J,K)}{N(N-1)/2}$$

where TAN(J,K) is the similarity between molecules J and K defined using the Tanimoto coefficient; and NN for a library of N molecules is defined as:

$$\mathrm{NN} = \frac{1}{N}\sum_{J=1}^{N}\mathop{\mathrm{MIN}}_{K \neq J}\bigl(1 - \mathrm{TAN}(J,K)\bigr)$$

where 1 – TAN(J,K) is the distance between molecule J and molecule K and MIN(1 – TAN(J,K)) is the distance from molecule J to its closest neighbour.

The experiments were carried out as follows. In reactant-based selection, SELECT was used to choose diverse subsets of reactants of specified sizes from each of the reactant pools independently; the reactants were then enumerated to form a product library and its diversity was measured using the same metric as was used to select the reactant subsets. (SELECT can be used for reactant-based selection by setting the number of components in the library to one.) In product-based selection, SELECT was used to find an optimised combinatorial subset directly. In each experiment, the descriptors used and the diversity metrics to be optimised were the same for the reactant-based selection and the product-based selection, and the resulting libraries were of the same size and configuration.

The first library to be investigated was a two-component amide library in which amines in one pool are reacted with carboxylic acids from another to form amides. A virtual library was built using 100 amines and 100 carboxylic acids, the reactant pools each being formed by extracting structures at random from SPRESI [17]. Daylight fingerprints, UNITY fingerprints and Molconn-Z parameters were calculated for each of the reactants

in the reactant pools. Next, the full library of 10 000 amides was enumerated, and the descriptors were calculated for each product molecule.

The second library was a three-component library based on a thiazoline-2-imine template [30]; the reaction is shown in Figure 6. The R1 reactants are isothiocyanates, the R2 reactants are amines and the R3 reactants are haloketones. Reactants for each pool were extracted at random from SPRESI. The pools consisted of 10 isothiocyanates, 40 amines and 25 haloketones, representing a fully enumerated virtual library of 10 000 thiazoline-2-imines. Daylight fingerprints, UNITY fingerprints and Molconn-Z parameters were calculated for each product molecule in the enumerated virtual library and for each reactant in the reactant pools.

Figure 6. The thiazoline-2-imine library.

Table 2 shows the results for selecting amide libraries of various sizes using reactant-based selection and product-based selection for Daylight fingerprints and the three diversity metrics, SUMCOS, SUMTAN and NN. Tables 3 and 4 show similar results for UNITY fingerprints and Molconn-Z parameters, respectively. In general, the results are based on average diversities and standard deviations (given in brackets) over five runs, except for some of the runs using the SUMTAN and NN metrics. These metrics are O(N²) in complexity, unlike the SUMCOS metric which is O(N), and insufficient computing resources were available to allow repeated runs. In each case it can be seen that product-based selection is more effective in selecting diverse libraries than is reactant-based selection.

Table 2. Reactant-based versus product-based diversities for amide libraries selected using Daylight fingerprints as descriptors. The column headed ‘Min’ gives the diversity calculated when SELECT was run to find combinatorial subsets with minimum diversity. The final column gives the percentage difference in diversity between product-based and reactant-based selection relative to the range of values possible (given by subtracting the Min diversity from the Product diversity)

Metric   Size            Reactants       Products        Min     % Diff
SUMCOS   900 (30 × 30)   0.565 (0.002)   0.586 (0.002)   0.356     9.4
         400 (20 × 20)   0.567 (0.001)   0.595 (0.001)   0.305     9.6
         100 (10 × 10)   0.560 (0.001)   0.601 (0.001)   0.227    10.8
SUMTAN   900 (30 × 30)   0.715 (0.002)   0.744 (0.002)   0.522    12.5
         400 (20 × 20)   0.717 (0.004)   0.750 (0.001)   0.462    11.4
         100 (10 × 10)   0.706 (0.006)   0.747 (0.002)   0.362    10.6
NN       900 (30 × 30)   0.253 (0.003)   0.305 (0.001)   0.045    20.1
         400 (20 × 20)   0.270 (0.008)   0.347 (0.004)   0.034    24.4
         100 (10 × 10)   0.315 (0.003)   0.405 (0.005)   0.019    27.8

Table 3. Reactant-based versus product-based diversities for amide libraries selected using UNITY fingerprints as descriptors. The final two columns are as described for Table 2

Metric   Size            Reactants       Products        Min     % Diff
SUMCOS   900 (30 × 30)   0.552 (0.002)   0.566 (0.002)   0.339     5.9
         400 (20 × 20)   0.569 (0.001)   0.584 (0.003)   0.302     5.2
         100 (10 × 10)   0.576 (0.005)   0.601 (0.001)   0.226     6.7
SUMTAN   900 (30 × 30)   0.715           0.727           0.507     5.5
         400 (20 × 20)   0.717 (0.003)   0.737 (0.002)   0.470     7.4
         100 (10 × 10)   0.727 (0.004)   0.746 (0.001)   0.364     5.0
NN       900 (30 × 30)   0.243           0.294           0.045    20.5
         400 (20 × 20)   0.272 (0.009)   0.333 (0.006)   0.028    19.6
         100 (10 × 10)   0.297 (0.005)   0.399 (0.003)   0.014    26.3

Table 4. Reactant-based versus product-based diversities for amide libraries selected using normalised Molconn-Z parameters as descriptors. The final two columns are as described for Table 2

Metric   Size            Reactants       Products        Min     % Diff
SUMCOS   900 (30 × 30)   0.278 (0.001)   0.288 (0.000)   0.121     6.5
         400 (20 × 20)   0.294 (0.001)   0.308 (0.001)   0.104     7.8
         100 (10 × 10)   0.315 (0.002)   0.332 (0.001)   0.076     5.1
SUMTAN   900 (30 × 30)   0.451           0.470           0.217     7.5
         400 (20 × 20)   0.474 (0.002)   0.492 (0.001)   0.182     5.8
         100 (10 × 10)   0.488 (0.005)   0.513 (0.001)   0.136     6.9
NN       900 (30 × 30)   0.107           0.150           0.036    37.7
         400 (20 × 20)   0.128 (0.003)   0.179 (0.003)   0.031    33.1
         100 (10 × 10)   0.147 (0.007)   0.232 (0.002)   0.023    42.9

We have considered a number of different ways of quantifying the differences in diversity values, since the absolute values are related to the particular descriptors and diversity metrics used. One way might be to determine the degree of overlap between the subsets generated by reactant-based selection as compared to product-based selection, on the assumption that the greater the difference between the subsets, the greater is the difference in effectiveness between the two methods. However, since both reactant-based and product-based selection are non-deterministic (the subsets are selected using a GA), different runs of the algorithm can produce different results. It has been our experience that, whereas the final diversity measure does not vary greatly from one run to another, as evidenced by the low standard deviations in the tables, the exact composition of the subsets can vary. In other words, there are many different subsets that give the same near-maximal diversity, and subsets having the same diversity can have a relatively small degree of overlap. Thus the degree of overlap between sets cannot be used to quantify a difference in diversity.
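The difference in cost between the metrics can be seen in a small Python sketch. It assumes the normalised forms of the metrics (one minus the mean pairwise similarity for SUMCOS and SUMTAN, and the mean nearest-neighbour distance for NN) and binary fingerprints held as sets of bit positions; the function names and toy fingerprints are hypothetical. The centroid version relies on the fact that the sum of all N × N cosine similarities equals the squared length of the sum of the unit-length fingerprint vectors, so a single pass over the library suffices.

```python
from math import sqrt
from typing import Dict, FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # positions of the bits that are set (non-empty)


def cos_sim(a: Fingerprint, b: Fingerprint) -> float:
    return len(a & b) / sqrt(len(a) * len(b))


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    return len(a & b) / len(a | b)


def sum_cos_pairwise(fps: Sequence[Fingerprint]) -> float:
    """O(N^2): one minus the mean cosine similarity over all N*N ordered pairs."""
    n = len(fps)
    return 1.0 - sum(cos_sim(a, b) for a in fps for b in fps) / n ** 2


def sum_cos_centroid(fps: Sequence[Fingerprint]) -> float:
    """O(N): same value from the squared length of the summed unit vectors."""
    centroid: Dict[int, float] = {}
    for fp in fps:
        w = 1.0 / sqrt(len(fp))  # scale each fingerprint to unit length
        for bit in fp:
            centroid[bit] = centroid.get(bit, 0.0) + w
    return 1.0 - sum(v * v for v in centroid.values()) / len(fps) ** 2


def sum_tan(fps: Sequence[Fingerprint]) -> float:
    """O(N^2): one minus the mean Tanimoto similarity over distinct pairs."""
    n = len(fps)
    total = sum(tanimoto(fps[j], fps[k])
                for j in range(n) for k in range(j + 1, n))
    return 1.0 - total / (n * (n - 1) / 2)


def nn(fps: Sequence[Fingerprint]) -> float:
    """O(N^2): mean distance from each molecule to its nearest neighbour."""
    n = len(fps)
    return sum(min(1.0 - tanimoto(fps[j], fps[k]) for k in range(n) if k != j)
               for j in range(n)) / n


spread = [frozenset(s) for s in ({1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12})]
clustered = [frozenset(s) for s in ({1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10})]
for name, fps in (("spread", spread), ("clustered", clustered)):
    print(name, round(sum_cos_centroid(fps), 3), round(sum_tan(fps), 3),
          round(nn(fps), 3))
```

In the toy data, moving from four unrelated fingerprints to two tight pairs halves NN (1.0 to 0.5) but lowers SUMTAN only from 1.0 to about 0.83, which shows how differently the two kinds of metric respond to clustering.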
The way we have chosen to quantify the difference in diversities is to calculate the percentage change in diversity based on the range of diversity values that are possible for a given subset size. When measuring the similarity, or dissimilarity, between two compounds the possible values lie in the range 0 to 1; however, the values possible for a diversity metric such as the sum-of-pairwise dissimilarities fall in a much narrower range. In earlier work [23], we demonstrated that a GA is able to find near-optimally diverse subsets when operating in cherry-picking mode. If we assume here that SELECT is able to find the global maximum diversity for combinatorial subsets selected from a combinatorial library, then we can also use SELECT to find the global minimum by minimising the diversity of the subsets chosen. The columns headed Min in Tables 2 to 4 report the minimum diversities found over five runs (except for a few cases using the SUMTAN and NN diversity metrics when the calculation was performed once only, see above) when product-based selection is performed to select the subset with minimum diversity. The final column then gives the difference between product-based and reactant-based selection calculated as a percentage of the possible range of values (the minimum value subtracted from the maximum value found using product-based selection).

The results for the thiazoline-2-imine library using Daylight fingerprints and the three diversity metrics are shown in Table 5. The minimum diversity possible and the percentage differences in diversity for product-based selection versus reactant-based selection were calculated as for the amide libraries.

Table 5. Reactant-based versus product-based diversity for a thiazoline-2-imine library using Daylight fingerprints as descriptors. The final two columns are as described for Table 2

Metric   Size                Reactants   Products   Min     % Diff
SUMCOS   900 (6 × 10 × 15)   0.394       0.424      0.303    24.8
         400 (4 × 10 × 10)   0.389       0.420      0.272    21.0
         100 (2 × 10 × 5)    0.362       0.406      0.221    23.8
SUMTAN   900 (6 × 10 × 15)   0.563       0.594      0.455    22.3
         400 (4 × 10 × 10)   0.552       0.589      0.424    22.4
         100 (2 × 10 × 5)    0.514       0.574      0.345    26.2
NN       900 (6 × 10 × 15)   0.151       0.204      0.051    34.6
         400 (4 × 10 × 10)   0.167       0.232      0.042    34.2
         100 (2 × 10 × 5)    0.208       0.289      0.027    30.1

The results over all descriptors and all diversity metrics for both libraries are summarised in Table 6 for selecting 900-member subset libraries of configuration 30 × 30 for the amide libraries and 6 × 10 × 15 for the thiazoline-2-imine libraries (6 isothiocyanates, 10 amines and 15 haloketones). It can be seen that product-based selection is more effective in all cases. The effect is more pronounced over all the descriptors and metrics for the three-component thiazoline-2-imine library. It is an intuitive result that product-based selection should increase in effectiveness as the number of components in a library increases: reactant-based selection takes no account of the relationship between reactants selected from different reactant pools, and the greater the number of pools the more of a limitation this is likely to become.

The advantage of product-based selection over reactant-based selection using SUMCOS or SUMTAN as the diversity metric is more pronounced for Daylight fingerprints as descriptors than for UNITY fingerprints or Molconn-Z parameters. Daylight fingerprints include large structural fragments (containing up to 7 atoms) and hence the product molecules are likely

to be represented by fragments that span reactants that originate in different pools: thus there will be more structural information encoded in the product molecules for use in the diversity calculation. This is especially the case for the three-component library. Therefore, it is not surprising that better results can be achieved by performing the analysis in product space. UNITY fingerprints consist of bits that are derived from paths within a molecule in addition to some structural keys that record the presence or absence of particular fragments. The structural keys tend to be more localised than the path-based fragments and hence there will be fewer bits that arise in the product molecules only. The Molconn-Z parameters encompass a huge range of types of molecular descriptor, and it is thus more difficult to explain the precise magnitudes of the diversities resulting from their use.

Table 6. Percentage differences in reactant-based selection versus product-based selection over the three descriptor types and the three metrics for the amide and thiazoline-2-imine libraries. The libraries contain 900 products in configuration 30 × 30 for the amide library and configuration 6 × 10 × 15 for the thiazoline-2-imine library

Diversity metric   Descriptor   Amide % Diff   Thiazoline % Diff
SUMCOS             Daylight      9.4           24.8
                   UNITY         5.9           12.9
                   Molconn-Z     6.5           12.6
SUMTAN             Daylight     12.5           22.3
                   UNITY         5.5            8.0
                   Molconn-Z     7.5           11.4
NN                 Daylight     20.1           34.6
                   UNITY        20.5           35.6
                   Molconn-Z    37.7           49.2

The difference between product-based and reactant-based selection is most marked for the NN diversity metric. Choosing diverse reactants (using any metric) and then enumerating the products will result in clusters of closely related molecules in product space (since in a two-component library a given reactant from one pool will exist in product molecules with all the reactants from the other pool). The SUMCOS and SUMTAN metrics are based on calculating the sum-of-pairwise dissimilarities between compounds, and diverse sets of compounds found using these measures can contain compounds that are close in descriptor space [9]; thus the existence of clusters of compounds can still result in high diversities using these metrics. The NN metric, however, favours an even distribution of compounds in descriptor space, and the occurrence of clusters in product space is likely to result in a relatively poor NN diversity score. Thus, maximising the NN metric directly in product space is likely to produce a better spread of molecules throughout the space than can be achieved by considering the reactants alone.

Similar studies have been performed recently by Jamois [31] and Pearlman [32]. Jamois has reported the difference in product-based selection relative to reactant-based selection as a percentage of the representativity of the subset relative to the entire virtual library. Pearlman has developed a method for library design that is called reactant-biased, product-based selection and has compared the diversity of libraries designed by this method with cherry-picking in product space and with reactant-based selection. Again the differences in diversity are reported as percentages. It is not possible to make a quantitative comparison between the approaches of Jamois and Pearlman and that described here, since different diversity metrics and descriptors have been used. However, the results of both Pearlman and Jamois support our general findings in that they both conclude that product-based selection results in significantly more diverse libraries than does reactant-based selection.
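For reference, the percentage difference used in Tables 2 to 6 can be written as a one-line calculation. The Python sketch below uses a hypothetical function name and checks it against the 900-member SUMCOS entry of Table 5 (Reactants 0.394, Products 0.424, Min 0.303). Because the published diversities are rounded to three decimal places, recomputing other entries from the tabulated values may not reproduce the last digit exactly.

```python
def percent_diff(product: float, reactant: float, minimum: float) -> float:
    """Gain of product-based over reactant-based selection, expressed as a
    percentage of the attainable range (product-based maximum minus minimum)."""
    return 100.0 * (product - reactant) / (product - minimum)


# Thiazoline-2-imine library, Daylight fingerprints, SUMCOS, 900 products.
print(round(percent_diff(0.424, 0.394, 0.303), 1))  # prints 24.8
```

Normalising by the attainable range rather than by the absolute diversity is what allows entries obtained with different descriptors and metrics to be placed side by side in Table 6.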

Design of diverse and drug-like libraries An important advantage of performing library design in product space is the ability to optimise the properties of individual molecules within the library simultaneously with the diversity of the library. SELECT has been designed with a multi-component fitness function whereby the physicochemical property profiles of the libraries can be optimised with respect to the profile of the same property in some reference collection, for example a ‘drug-like’ profile as found in WDI. In the following experiments we have compared the physicochemical property profiles of diverse libraries selected by analysing reactant space with the profiles of the same physicochemical properties in libraries selected from product space that are optimised on property and diversity, simultaneously. The property profiles we have investigated are the distribution of rotatable bonds and molecular weight, although the methods are applicable to any rapidly computable molecular property. In each case, the profile is recorded in a series of 20 bins where each bin represents the percentage of compounds in the library having a given number of rotatable bonds or having molecule weight within a given range. In the case of rotatable bond profiles the bins represent the occurrence of 0, 1, 2, . . . >19 rotatable

Table 7. The RMSD between molecular weight profiles of WDI and reactant-based libraries is compared with the RMSD achieved by optimising the profile in the product libraries. Diversity is measured using Daylight fingerprints and SUMcos. The product-based libraries have much better profiles and are also more diverse than the reactant-based libraries.

Library              Size          Reactants            Products
                                   Diversity   ∆MW      Diversity   ∆MW
Amide                30 × 30       0.564       6.19     0.573       1.38
                     10 × 10       0.561       5.76     0.565       0.83
Thiazoline-2-imine   6 × 10 × 15   0.394       8.08     0.415       6.38
                     2 × 10 × 5    0.362       9.47     0.390       4.26

Table 8. The RMSD between rotatable bond profiles of WDI and reactant-based libraries is compared with the RMSD achieved by optimising the profile in the product libraries. Diversity is measured using Daylight fingerprints and SUMcos. The product-based libraries have better profiles and are also more diverse than the reactant-based libraries.

Library              Size          Reactants            Products
                                   Diversity   ∆RB      Diversity   ∆RB
Amide                30 × 30       0.564       4.62     0.574       3.30
                     10 × 10       0.561       5.24     0.594       2.85
Thiazoline-2-imine   6 × 10 × 15   0.394       7.31     0.410       5.09
                     2 × 10 × 5    0.362       6.96     0.376       4.05

bonds. In the case of molecular weight profiles the bins cover the following ranges: 0...49, 50...99, ..., ≥950. Distributions of the properties in WDI are used as the reference distributions, and SELECT attempts to minimise the root mean squared difference (RMSD) between the profile of the designed library and the reference profile. The profiles of rotatable bonds and molecular weights were calculated for the subsets of 900 compounds that were generated by selecting diverse reactants followed by enumeration, using Daylight fingerprints and SUMcos. SELECT was then run to choose combinatorial subsets in product space that are optimised firstly on diversity and rotatable bond profile and secondly on diversity and molecular weight profile. In each case, the fitness


Figure 7. The physical property profiles of amide libraries designed using reactant-based selection (in white) are compared with libraries that are optimised in product space (grey) and with the property profiles found in WDI (black). (a) Rotatable bond profiles; (b) molecular weight profiles.


Figure 8. The physical property profiles of thiazoline-2-imine libraries designed using reactant-based selection (in white) are compared with libraries that are optimised in product space (grey) and with the property profiles found in WDI (black). (a) Rotatable bond profiles; (b) molecular weight profiles.

function consisted of the sum of two weighted terms, the diversity term and the relevant property term. The property was included in the fitness function as the RMSD between the distribution of the property in the library represented in a chromosome and the distribution of the property in WDI, where the distributions are given as percentages. The weight assigned to diversity was 1.0 and the weight assigned to the RMSD of the property was 0.1, these weights being chosen so that the weighted RMSD values fell in approximately the same range as the diversity values.

The results for the amide and thiazoline-2-imine libraries, shown graphically in Figures 7 and 8, respectively, demonstrate that reactant-based selection often results in libraries with poor physicochemical property profiles. By performing the analysis in product space it is possible to design libraries with optimised physicochemical property profiles. The libraries designed in product space have much more ‘WDI-like’ profiles and are therefore likely to contain more compounds with the potential to be bioactive.
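The profile term and the two-term fitness described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SELECT code: the function names are hypothetical, the top molecular weight bin is assumed to be open-ended (950 and above), the flat reference profile is a placeholder rather than the WDI distribution, and the RMSD is assumed to enter with a negative sign so that a fitness to be maximised rewards diversity and penalises deviation from the reference.

```python
import bisect
import math

# Lower edges of the 20 molecular weight bins: 0-49, 50-99, ..., 900-949, >=950.
MW_EDGES = [50 * i for i in range(20)]


def percentage_profile(values, lower_edges):
    """Percentage of values falling in each bin; the last bin is open-ended."""
    counts = [0] * len(lower_edges)
    for v in values:
        # Index of the last lower edge that does not exceed v (clamped at 0).
        idx = max(0, bisect.bisect_right(lower_edges, v) - 1)
        counts[idx] += 1
    return [100.0 * c / len(values) for c in counts]


def rmsd(profile, reference):
    """Root mean squared difference between two percentage profiles."""
    squared = sum((p - r) ** 2 for p, r in zip(profile, reference))
    return math.sqrt(squared / len(profile))


def fitness(diversity, profile, reference, w_div=1.0, w_prop=0.1):
    """Weighted two-term fitness; the minus sign is an assumption (see lead-in)."""
    return w_div * diversity - w_prop * rmsd(profile, reference)


# Toy library of four molecular weights and a flat placeholder reference profile.
library_mw = [120.0, 349.9, 350.0, 1200.0]
library_profile = percentage_profile(library_mw, MW_EDGES)
flat_reference = [5.0] * 20  # placeholder only, NOT the WDI profile

print(rmsd(library_profile, flat_reference))
print(fitness(0.5, library_profile, flat_reference))
```

The weighting is consistent with the reported values: a molecular weight RMSD of 6.19 (Table 7, amide 30 × 30, reactant-based) scaled by 0.1 gives 0.619, which is of the same order as the corresponding diversity of 0.564.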

Conclusions

In this article, we have described novel methods for compound selection and library design. The compound selection method is a knowledge-based approach whereby weights are derived based on the discrimination between two classes of compounds used as training sets. The weights can then be used to score and rank compounds according to the extent to which they exhibit characteristics that are common to the compound classes used in training. For example, weights can be derived that discriminate between ‘drug-like’ and ‘non-drug-like’ compounds and can be used subsequently to order compounds for screening so that the compounds with ‘drug-like’ characteristics are screened first.

Using the SELECT program for designing combinatorial libraries in product space, we have provided evidence, based on a number of different molecular descriptors and different diversity metrics, that product-based library design can lead to better optimised libraries than when the analysis is performed in reactant space. We have also shown that product-based analysis allows the simultaneous optimisation of multiple objectives in the design of libraries that are not only diverse but also have ‘drug-like’ physicochemical properties.


Acknowledgements

We thank the following: Professor Peter Willett and Dr. David Wilton at Sheffield University and Dr. John Bradshaw and Dr. Darren Green at GlaxoWellcome Research and Development for helpful discussions throughout this work; GlaxoWellcome Research and Development for financial support for V.J.G.; The University of Bari for financial support for O.N.; and Daylight Chemical Information Systems Inc. and Tripos Inc. for software.

References

1. Dolle, R.E. and Nelson Jr., K.H., J. Comb. Chem., 1 (1999) 235.
2. Dean, P.M. and Lewis, R.A. (Eds.), Molecular Diversity in Drug Design, Kluwer Academic Publishers, Dordrecht, 1999.
3. Willett, P. (Ed.), Computational Methods for the Analysis of Molecular Diversity, Perspectives in Drug Discovery and Design, Vols. 7/8, Kluwer Academic Publishers, Dordrecht, 1997.
4. Brown, R.D. and Martin, Y.C., J. Chem. Inf. Comput. Sci., 36 (1996) 572.
5. Brown, R.D. and Martin, Y.C., J. Chem. Inf. Comput. Sci., 37 (1997) 1.
6. Patterson, D.E., Cramer, R.D., Ferguson, A.M., Clark, R.D. and Weinberger, L.E., J. Med. Chem., 39 (1996) 3049.
7. Matter, H., J. Med. Chem., 40 (1997) 1219.
8. Pötter, T. and Matter, H., J. Med. Chem., 41 (1998) 478.
9. Snarey, M., Terrett, N.K., Willett, P. and Wilton, D.J., J. Mol. Graph. Model., 15 (1997) 372.
10. Lipinski, C.A., Lombardo, F., Dominy, B.W. and Feeney, P.J., Adv. Drug Deliv. Rev., 23 (1997) 3.
11. Gillet, V.J., Willett, P. and Bradshaw, J., J. Chem. Inf. Comput. Sci., 38 (1998) 165.
12. Kier, L.B., Med. Res. Rev., 7 (1987) 417.
13. CLOGP3 Reference Manual, Daylight Chemical Information Systems, Inc., Mission Viejo, CA, U.S.A.
14. Daylight Chemical Information Systems, Inc., Mission Viejo, CA, U.S.A.
15. Goldberg, D.E., Genetic Algorithms in Search, Optimisation, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
16. The World Drug Index (WDI) is maintained by Derwent Publications Ltd., London.
17. The SPRESI database is produced by the All-Union Institute of Scientific and Technical Information of the Academy of Science of the USSR (VINITI) in Moscow, and the Central Information Processing for Chemistry (ZIC) in Berlin. It consists of data extracted from approximately 1000 journals, and also patents, books and other sources from 1975–1990. It is distributed by Daylight Chemical Information Systems, Inc., Mission Viejo, CA, U.S.A.
18. Hann, M., Hudson, B., Lewell, X., Lifely, R., Miller, L. and Ramsden, N., J. Chem. Inf. Comput. Sci., 39 (1999) 897.
19. Ajay, Walters, W.P. and Murcko, M., J. Med. Chem., 41 (1998) 3314.
20. Sadowski, J. and Kubinyi, H., J. Med. Chem., 41 (1998) 3325.
21. Wagener, M. and van Geerestein, V.J., Potential Drugs and Non-drugs: Prediction and Identification of Important Structural Features. Paper presented at the Fifth International Conference on Chemical Structures, The Netherlands, 1996.
22. The Available Chemicals Directory is available from MDL Information Systems Inc., San Leandro, CA, U.S.A.
23. Gillet, V.J., Willett, P. and Bradshaw, J., J. Chem. Inf. Comput. Sci., 37 (1997) 731.
24. Cribbs, C.M., Menius, J.A., Cummins, D. and Young, S.S., Statistical Methods for Monomer Selection in Chemical Library Design. Paper presented at the 211th Meeting of The American Chemical Society, 1996.
25. Gillet, V.J., Willett, P. and Bradshaw, J., J. Chem. Inf. Comput. Sci., 39 (1999) 169.
26. UNITY is available from Tripos Inc., St. Louis, MO, U.S.A.
27. Molconn-Z is available from eduSoft, Ashland, VA, U.S.A.
28. Lajiness, M.S., Perspect. Drug Discov. Design, 7/8 (1997) 65.
29. Holliday, J.D., Ranade, S.S. and Willett, P., Quant. Struct.-Act. Relat., 14 (1995) 501.
30. Watson, S., Solution Phase Synthesis of Libraries Based on Thiazole Templates. 3rd Annual Random and Rational Conference, Geneva, Strategic Research Institute, New York, NY, 1996.
31. Jamois, E.A., Evaluation of Reagent-based and Product-based Strategies in the Design of Combinatorial Subsets. Paper presented at the 217th Meeting of The American Chemical Society, 1999.
32. Pearlman, R.S. and Smith, K.M., Novel Algorithms for the Design of Diverse and Focused Combinatorial Libraries. Paper presented at the 217th Meeting of The American Chemical Society, 1999.

Perspectives in Drug Discovery and Design, 20: 289, 2000. KLUWER/ESCOM

Author Index Volume 20 2000

Bradshaw, J., 1
Briem, H., 231
Caflisch, A., 145
Davies, T.G., 29
Gasteiger, J., 245
Gillet, V.J., 265
Ginn, C.M.R., 1
Gohlke, H., 115
Hendlich, M., 115
Höllering, R., 245
Hubbard, R.E., 29
Karg, N., 245
Klebe, G., 115
Knegtel, R.M.A., 191
Kostka, T., 245
Kuhn, L.A., 171
Lemmen, C., 43
Lengauer, T., 43, 63
Lessel, U.F., 231
Luty, B.A., 209
Majeux, N., 145
Marrone, T.J., 209
Mestres, J., 191
Muegge, I., 99
Nicolotti, O., 265
Pförtner, M., 245
Rarey, M., 63
Rose, P.W., 209
Sacher, O., 245
Sadowski, J., 17
Scarsi, M., 145
Schnecke, V., 171
Sitzmann, M., 245
Stahl, M., 83
Tame, J.R.H., 29
Tenette-Souaille, C., 145
Willett, P., 1
Zimmermann, M., 43
