Annual Reports in Computational Chemistry, Volume 6


Author: Ralph A. Wheeler | David C. Spellmeyer


Annual Reports in COMPUTATIONAL CHEMISTRY
VOLUME 6

Edited by
Ralph A. Wheeler
Department of Chemistry and Biochemistry,
Duquesne University,
600 Forbes Avenue,
Pittsburgh, PA 15282-1530.

Sponsored by the Division of Computers in Chemistry of the American Chemical Society

Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
32 Jamestown Road, London NW1 7BY, UK
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

First edition 2010

Copyright © 2010 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://www.elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data
A catalogue record for this book is available from the Library of Congress

British Library Cataloging in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-444-53552-8
ISSN: 1574-1400

For information on all Elsevier publications visit our website at elsevierdirect.com

Printed and bound in USA 10 11 12 13

10 9 8 7 6 5 4 3 2 1

Working together to grow libraries in developing countries www.elsevier.com | www.bookaid.org | www.sabre.org

CONTRIBUTORS

Orlando Acevedo, Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA
Kristin S. Alongi, Dean's Office and Department of Chemistry & Physics, College of Science & Technology, Armstrong Atlantic State University, Savannah, GA, USA
Wei An, Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL, USA
Oshrit Arviv, Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
Mauricio Cafiero, Department of Chemistry, Rhodes College, Memphis, TN, USA
Qiang Cui, Department of Chemistry and Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, USA
Olga Dolgounitcheva, Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA
Brett I. Dunlap, Chemistry Division, Naval Research Laboratory, Washington DC, USA
George M. Giambaşu, Biomedical Informatics and Computational Biology; Department of Chemistry, University of Minnesota, Minneapolis, MN, USA
Andreas W. Götz, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
Tzachi Hagai, Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
Daniel Harries, Institute of Chemistry and the Fritz Haber Research Center, The Hebrew University of Jerusalem, Jerusalem, Israel
Sheng-You Huang, Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, and Informatics Institute, University of Missouri, Columbia, MO, USA
George Khelashvili, Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, NY, USA
Kah Chun Lau, Department of Chemistry, George Washington University, Washington DC, USA
Yaakov Levy, Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
Hongzhi Li, Institute of Molecular Biophysics, Florida State University, Tallahassee, FL, USA
Yan Ling, Department of Chemistry and Biochemistry, University of Southern Mississippi, Hattiesburg, MS, USA
Maura Livengood, Department of Chemistry, Rhodes College, Memphis, TN, USA
Donghong Min, Institute of Molecular Biophysics, Florida State University, Tallahassee, FL, USA
J. V. Ortiz, Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA
Dalit Shental-Bechor, Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
Edward C. Sherer, Merck and Co., Inc., Rahway, NJ, USA
George C. Shields, Dean's Office & Department of Chemistry, College of Arts & Sciences, Bucknell University, Lewisburg, PA, USA
Tai-Sung Lee, Biomedical Informatics and Computational Biology; Department of Chemistry, University of Minnesota, Minneapolis, MN, USA
C. Heath Turner, Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL, USA
Hunter Utkov, Department of Chemistry, Rhodes College, Memphis, TN, USA
Jonah Z. Vilseck, Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA
Ross C. Walker, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
Xian Wang, Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL, USA
Mark J. Williamson, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
Thorsten Wölfle, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA; Lehrstuhl für Theoretische Chemie, Universität Erlangen, Erlangen, Germany
Dong Xu, San Diego Supercomputer Center; National Biomedical Computation Resource, University of California San Diego, La Jolla, CA, USA
Wei Yang, Institute of Molecular Biophysics; Department of Chemistry and Biochemistry, Florida State University, Tallahassee, FL, USA
Darrin M. York, Department of Chemistry, University of Minnesota, Minneapolis, MN, USA
Alexander V. Zakjevskii, Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA
Viatcheslav G. Zakrzewski, Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA
Yong Zhang, Department of Chemistry, Chemical Biology, and Biomedical Engineering, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ, USA
Xiaoqin Zou, Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, and Informatics Institute, University of Missouri, Columbia, MO, USA

PREFACE

Annual Reports in Computational Chemistry (ARCC) was instituted to provide timely reviews of topics important to researchers in computational chemistry. ARCC is published and distributed by Elsevier and sponsored by the American Chemical Society's Division of Computers in Chemistry (COMP). Members in good standing of the COMP Division receive a copy of ARCC as part of their member benefits. Because previous volumes have received such an enthusiastic response from our readers, the COMP Executive Committee expects to deliver future volumes of ARCC that build on the solid contributions in our first five volumes. To ensure that you receive future installments of this series, please join the Division as described on the COMP website at http://www.acscomp.org.

Volume 6 features 14 outstanding contributions in six sections and includes a new section devoted to Nanotechnology and the reemergence of the Chemical Education section. Topics covered (and Section Editors) include Simulation Methodologies (Carlos Simmerling), Quantum Chemistry (Gregory S. Tschumper), Chemical Education (George C. Shields), Nanotechnology (Luke E.K. Achenie), Biological Modeling (Nathan Baker), and Bioinformatics (Wei Wang). Although individual chapters in ARCC are now indexed by the major abstracting services, we plan to continue the practice of cumulative indexing of both the current and past editions to make past reports easy to identify.

As was the case with our previous volumes, the current volume of Annual Reports in Computational Chemistry has been assembled entirely by volunteers to produce a high-quality scientific publication at the lowest possible cost. The Editor and the COMP Executive Committee extend our gratitude to the many people who have given their time to make this edition of Annual Reports in Computational Chemistry possible.
The authors of each of this year’s contributions and the Section Editors have graciously dedicated significant amounts of their time to make this volume successful. This year’s edition could not have been assembled without the help of Clare Caruana of Elsevier. Thank you one and all for your hard work, your time, and your contributions. We trust that you will find this edition to be interesting and valuable. We are actively planning the seventh volume and anticipate that it will restore one or more previously popular sections, including Materials and/or Emerging Technologies. In addition, we are actively soliciting input from our readers about future topics, so please contact the editor to make suggestions and/or to volunteer as a contributor. Sincerely, Ralph A. Wheeler, Editor


Section 1

Simulation Methodologies

Section Editor: Carlos Simmerling, Department of Chemistry, State University of New York, Stony Brook, NY 11794, USA

CHAPTER 1

Advancements in Molecular Dynamics Simulations of Biomolecules on Graphical Processing Units

Dong Xu(1,2), Mark J. Williamson(1), and Ross C. Walker(1)

Contents

1. Introduction
2. An Overview of GPU Programming
   2.1 GPU/CPU hardware differences
   2.2 The emergence of GPU programming languages
   2.3 GPU programming considerations
3. GPU-Based Implementations of Classical Molecular Dynamics
   3.1 Early GPU-based MD code development
   3.2 Production GPU-based MD codes
4. Performance and Accuracy
   4.1 Performance and scaling
   4.2 Validation
5. Applications
   5.1 Protein folding
6. Conclusions and Future Directions
Acknowledgments
References

Abstract

Over the past few years, competition within the computer game market, coupled with the emergence of application programming interfaces to support general purpose computation on graphics processing units (GPUs), has led to an explosion in the use of GPUs for the acceleration of scientific applications. Here we explore the use of GPUs within the context of condensed phase molecular dynamics (MD) simulations. We discuss the algorithmic differences that the GPU architecture imposes on MD codes and the challenges involved in using GPUs for MD, followed by a critical survey of contemporary MD simulation packages that are attempting to utilize GPUs. Finally, we discuss the possible outlook for this field.

Keywords: GPU; CUDA; stream; NVIDIA; ATI; molecular dynamics; accelerator; OpenMM; ACEMD; NAMD; AMBER

(1) San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
(2) National Biomedical Computation Resource, University of California San Diego, La Jolla, CA, USA

Annual Reports in Computational Chemistry, Volume 6
ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06001-9

1. INTRODUCTION

Since the first molecular dynamics (MD) simulation of an enzyme was described by McCammon et al. [1], MD simulations have evolved to become an important tool in understanding the behavior of biomolecules. Since that first 10 ps long simulation of merely 500 atoms in 1977, the field has grown to the point where small enzymes can be routinely simulated on the microsecond timescale [2-4]. Simulations containing millions of atoms are now also considered routine [5,6]. Such simulations are numerically intensive, requiring access to large-scale supercomputers or well-designed clusters with expensive interconnects that are beyond the reach of many research groups.

Many attempts have been made over the years to accelerate classical MD simulation of condensed-phase biological systems by exploiting alternative hardware technologies. Notable examples include ATOMS by AT&T Bell Laboratories [7]; FASTRUN, designed by Columbia University in 1984 and constructed by Brookhaven National Laboratory in 1989 [8]; the MDGRAPE system by RIKEN [9], which used custom hardware-accelerated lookup tables to accelerate the direct space nonbond calculations; ClearSpeed Inc., which developed an implicit solvent version of the AMBER PMEMD engine [10,11] that ran on their custom-designed Advance X620 and e620 acceleration cards [12]; and most recently DE Shaw Research LLC, which developed its own specialized architecture for classical MD simulations, code-named Anton [13].

All of these approaches have, however, failed to make an impact on mainstream research because of their excessive cost. Table 1 provides estimates of the original acquisition or development costs of several accelerator technologies. These costs have posed a significant barrier to widespread development within the academic research community. Additionally, these technologies do not form part of what would be considered a standard workstation specification. This makes it difficult to experiment with such technologies, leading to a lack of sustained development or innovation and ultimately to their failure to mature into ubiquitous community-maintained research tools.

Table 1 Example cost estimates for a range of hardware MD acceleration projects

Accelerator technology | Manufacturer | Estimated cost per node
CX600 | ClearSpeed | ~$10,000
MDGRAPE-3 | RIKEN | ~$9,000,000 (a)
ATOMS | AT&T Bell Laboratories | ~$186,000 (1990)
FASTRUN | Columbia University and Brookhaven National Laboratory | ~$17,000 (1989)
GPU | NVIDIA/ATI | $200-800

(a) Total development cost: $15 million [14].

Graphics processing units (GPUs), on the other hand, have been an integral part of personal computers for decades. Ever since 3DFX first introduced the Voodoo graphics chip in 1996, their development has been strongly driven by the entertainment industry's demand for ever-increasing realism in computer games. This has resulted in significant industrial investment in the stable, long-term development of GPU technology. Additionally, strong demand from the consumer electronics industry has made GPUs cheap and ubiquitous. This, combined with substantial year-over-year increases in the computing power of GPUs, means they have the potential, when utilized efficiently, to significantly outperform CPUs (Figure 1). This makes them attractive hardware targets for accelerating many scientific applications, including MD simulations. Because high-end GPUs can be considered standard equipment in scientific workstations, they already exist in many research labs and can be purchased easily with new equipment. This makes them readily available to researchers and thus tempting instruments for computational experimentation.

The nature of GPU hardware, however, long made its use for general purpose computing challenging for all but those with extensive three-dimensional (3D) graphics programming experience. As discussed in Section 2, the development of application programming interfaces (APIs) targeted at general purpose scientific computing has reduced this complexity to the point where GPUs are beginning to be accepted as serious tools for the economically efficient acceleration of an extensive range of scientific problems.
In this chapter, we provide a brief overview of GPU hardware and programming techniques and then review the progress that has been made in using GPU hardware to accelerate classical MD simulations of condensed-phase biological systems. We discuss some of the challenges and limitations that have faced those trying to implement MD algorithms on GPUs, consider performance numbers and validation techniques, and then highlight some recent applications of GPU-accelerated MD. Finally, we comment on the limitations of current GPU MD implementations and what the future may hold for acceleration of MD simulations on GPU hardware.

[Figure 1: (a) peak GFlop/s and (b) memory bandwidth (GB/s), January 2003 to June 2008. NVIDIA GPUs: NV30 = GeForce FX 5800, NV35 = GeForce FX 5950 Ultra, NV40 = GeForce 6800 Ultra, G70 = GeForce 7800 GTX, G71 = GeForce 7900 GTX, G80 = GeForce 8800 GTX, G80 Ultra, G92 = GeForce 9800 GTX, GT200 = GeForce GTX 280. Intel CPUs: Northwood, Prescott EE, Woodcrest, Core 2 Duo, Harpertown (3.0 and 3.2 GHz).]

Figure 1 Peak floating-point operations per second (a) and memory bandwidth (b) for Intel CPUs and NVIDIA GPUs. Reproduced from [15].

2. AN OVERVIEW OF GPU PROGRAMMING

2.1 GPU/CPU hardware differences

To understand where the performance benefits lie, and the complexity facing programmers wishing to utilize GPUs, it is necessary to compare the underlying nature and design philosophy of the GPU with those of the CPU. Conventional CPUs found in the majority of modern computers, such as those manufactured by Intel and Advanced Micro Devices (AMD), are designed for sequential code execution as per the von Neumann architecture [16]. While running a program, the CPU fetches an instruction and its associated data from the computer's random access memory (RAM), decodes it, executes it, and then writes the result back to RAM. Within Flynn's taxonomy [17], this would be classified as single instruction, single data (SISD).

Physically, a CPU generally comprises the following units (Figure 2). The control unit receives the instruction/data pair from RAM during the decoding phase and dispatches the instruction to the arithmetic logic unit (ALU), the circuitry that carries out the arithmetic and logical operations on the data. Finally, there are cache units, which provide fast, local, temporary data storage for the CPU. Historically, performance improvements in sequential execution have been obtained by increasing CPU clock speeds and by introducing more complex ALUs that perform increasingly composite operations in fewer clock cycles. Additionally, pipelining (overlapping the fetch, decode, and execute stages of successive instructions), together with executing instructions out of order or in parallel while maintaining the overall appearance of sequential execution, has increased the number of instructions a CPU can complete per unit time without shortening the time taken by any individual instruction; larger on-chip cache memories are also used to hide memory latency.

In contrast to the CPU's generality, GPUs (Figure 2) have been designed to facilitate the display of 3D graphics by performing large numbers of floating point operations per video frame: they are essentially specialized numeric computing engines. The dominant strategy adopted by the graphics industry to meet this requirement has been to maximize the throughput of a massive number of parallel threads, all of which can access the RAM on the GPU board. Herein lies the key difference from CPUs: the same operation can be carried out concurrently on different parts of the input data within the GPU's memory by an army of individual threads. Within Flynn's taxonomy, this falls into the single instruction, multiple data (SIMD) category.

A GPU has a hierarchical structure composed of multiple streaming multiprocessors (SMs), which in turn consist of subunits called streaming processors. Memory is also hierarchical, with capacity traded against speed: all SMs share the same device global memory, which is large but relatively slow; each SM provides a smaller, lower latency on-chip memory that is available to all streaming processors within that SM; and even faster registers are available to each streaming processor. A read-only cache of the device global memory is available to each SM in the form of a texture cache. Physically, GPUs have a much larger number of ALUs than a CPU, but these ALUs are not as complex as the ones found in a CPU. A GPU's clock speed is normally about half that of a contemporary CPU; however, GPUs typically have an order of magnitude more memory bandwidth to their onboard device global memory.

[Figure 2: (a) CPU and (b) GPU block diagrams built from control units, ALUs, caches, and DRAM.]

Figure 2 Abstraction contrasting CPU and GPU design. Adapted from [18].
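To make the memory hierarchy concrete, here is a minimal serial sketch in Python (not CUDA, and not code from any package discussed in this chapter) of the tiling pattern commonly used in GPU kernels for direct summation. It assumes one thread per grid point, a register-like accumulator per thread, and a small on-chip buffer, the hypothetical `tile` list, standing in for per-SM shared memory. Charges are in reduced units (Coulomb constant set to 1), grid points must not coincide with atoms, and all function names are illustrative.

```python
import math

def direct_coulomb_naive(positions, charges, grid_points):
    """Reference: potential at each grid point, summing over every atom directly."""
    return [sum(q / math.dist(p, atom) for atom, q in zip(positions, charges))
            for p in grid_points]

def direct_coulomb_tiled(positions, charges, grid_points, tile_size=32):
    """Same sum, organized the way a GPU kernel would stage its data.

    `positions`/`charges` play the role of large, slow device global memory.
    Each grid point is owned by one (emulated) thread whose running sum
    lives in `accumulators[g]`, the analogue of a per-thread register.
    Atoms are copied tile by tile into `tile`, the analogue of fast per-SM
    shared memory, so each atom fetched once is reused by every thread.
    """
    accumulators = [0.0] * len(grid_points)
    for start in range(0, len(positions), tile_size):
        # Cooperative load: one slow read per atom, shared by all threads.
        tile = list(zip(positions[start:start + tile_size],
                        charges[start:start + tile_size]))
        # On a GPU these iterations would run concurrently, one per thread.
        for g, point in enumerate(grid_points):
            for atom, q in tile:
                accumulators[g] += q / math.dist(point, atom)
    # Single write-back of each result to global memory.
    return accumulators
```

In a real CUDA kernel, a `__syncthreads()` barrier separates loading a tile into `__shared__` memory from using it; the serial loop order here makes that barrier implicit.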

2.2 The emergence of GPU programming languages

The spectrum of GPU accessibility for scientific use has two extremes. At one extreme, before the major GPU hardware manufacturers developed the current general purpose GPU programming models, pioneers in the field made heroic attempts [19] to hijack graphics-specific APIs, such as OpenGL, and use them as vehicles for general purpose calculations. However, development was time consuming and essentially hardware specific. At the other extreme would be a compiler that could compile existing scientific code for execution on GPUs without the scientist having to consider the underlying nature of the hardware. At present, we are somewhere in between: the barrier to utilizing GPU hardware for general purpose computation has been reduced by the introduction of general purpose GPU programming models such as NVIDIA's Compute Unified Device Architecture (CUDA) [15] and AMD's Stream [20]. However, algorithmic paradigm shifts are often required in existing codes to exploit the performance offered by massively parallel GPU hardware.

The CUDA programming model from NVIDIA appears to be the most mature and widespread in scientific applications at present; hence the discussion here will focus on specifics pertaining to it. CUDA, a C-like programming language, enables code to run concurrently on the CPU and GPU, on the assumption that the numerically intensive parts of a program will be executed on the GPU while the remaining sections, which may not be suited to the GPU, continue to execute on the CPU. A mechanism is provided for the two parts of the running code to communicate with each other.


CUDA abstracts the hierarchical GPU hardware structure outlined above into a programming framework that requires the programmer to write code in an intrinsically parallel fashion. The small, numerically intensive subroutines that run on the GPU are termed kernels. A kernel is launched as a set of blocks, where each block contains multiple instances of the kernel, termed threads. The CUDA runtime maps this partitioning onto the hardware: each block is assigned to a single SM, and its threads are executed by that SM's streaming processors. The number of threads per block is chosen by the programmer and may exceed the number of streaming processors, in which case the SM interleaves their execution. Because a block never spans more than one SM, only threads within the same block can synchronize with each other and share on-chip memory. This block-based parallelism, and the need to keep all SMs busy in order to achieve efficient performance, lead to a number of nontrivial programming considerations.
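The block/thread decomposition can be emulated serially to show how each thread finds the data element it owns. The Python sketch below is hypothetical (none of these function names belong to the CUDA API); it assumes a one-dimensional launch and mirrors CUDA's built-in `blockIdx.x`, `blockDim.x`, and `threadIdx.x` variables together with the usual ceiling division used to choose the number of blocks.

```python
def blocks_needed(n_items, threads_per_block):
    """Ceiling division: enough blocks that every item gets a thread."""
    return (n_items + threads_per_block - 1) // threads_per_block

def emulate_launch(kernel, n_items, threads_per_block, *args):
    """Run `kernel` once per (block, thread) pair, serially.

    On a GPU, blocks are scheduled independently on SMs in no guaranteed
    order, and the threads of one block execute together on a single SM.
    """
    for block_idx in range(blocks_needed(n_items, threads_per_block)):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, thread_idx, threads_per_block, n_items, *args)

def scale_kernel(block_idx, thread_idx, block_dim, n, data, factor):
    """Per-thread body: multiply the single element this thread owns."""
    i = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
    if i < n:                               # the last block may be only partly filled
        data[i] *= factor
```

With 1000 elements and 128 threads per block, 8 blocks are launched and the last 24 threads fail the `i < n` guard; that guard is why kernels rarely assume the data size is a multiple of the block size.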

2.3 GPU programming considerations

A key strategy for reducing the wall clock time to the solution of a scientific problem is to recast an algorithm so that it suits the hardware it is executed on; an algorithm that performs poorly on a CPU may perform far better on a GPU, and vice versa. However, when dealing with scientific problems, it is essential that alternative approaches to solving the underlying physics yield the same solution, albeit via different paths. Given the architectural differences of GPU hardware, it is very tempting to change the nature of the problem being solved without a thorough understanding of the implications this has for the scientific results. General strategies for developing efficient algorithms on GPUs include the following:

1. Keep host-to-device communication during a calculation to a minimum; that is, keep as much of the calculation on the GPU as possible. Ferrying data back and forth between the GPU and the host machine is costly because of the latency and limited bandwidth of the PCIe bus; if atomic coordinates are stored in the host's memory, for example, the GPU sits idle while it waits for an updated set to arrive. The same principle holds within the GPU: because access to slow, nonlocal memory is expensive, it is very often more efficient to recalculate an existing result on the GPU than to fetch it from a nonlocal location.
2. Accuracy issues that arise from the limitations of hardware single precision (SP) arithmetic need to be controlled in a way that is acceptable for the scientific algorithm being simulated. Approaches to this include sorting floats by size prior to addition and making careful use of double precision (DP) where needed [15].
3. Recasting the problem in a vectorized fashion, grouping data that will be operated on in the same way, maximizes the efficiency of the streaming processors.
It should be clear from the above discussion that while GPUs offer an attractive price/performance ratio, there are significant hurdles to utilizing them efficiently. Indeed, in some cases, the development costs of GPU-specific code may negate the cost/performance benefits.
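The precision concern in point 2 of the list above can be demonstrated without a GPU. The following Python sketch emulates IEEE single precision by rounding through the standard `struct` module after every addition (an assumption standing in for hardware SP arithmetic) and compares naive SP accumulation, SP accumulation after sorting by magnitude, and SP inputs with a DP accumulator. The data (one large term followed by many small ones) are illustrative only, and the function names are hypothetical.

```python
import math
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate entirely in single precision, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc

def sum_f32_sorted(values):
    """Single-precision accumulation after sorting by magnitude, smallest first."""
    return sum_f32(sorted(values, key=abs))

def sum_mixed(values):
    """Single-precision inputs, double-precision accumulator, SP result."""
    acc = 0.0
    for v in values:
        acc += f32(v)
    return f32(acc)

values = [1.0] + [1e-8] * 10000
exact = math.fsum(f32(v) for v in values)  # exact sum of the SP-rounded inputs
naive = sum_f32(values)
ordered = sum_f32_sorted(values)
mixed = sum_mixed(values)
```

Here the naive SP sum never moves away from 1.0, because each 1e-8 increment is smaller than half the SP spacing just above 1.0 (about 6e-8). Adding the small terms first lets them accumulate before they meet the large one, and a DP accumulator avoids the problem without sorting, at the cost of slower DP arithmetic on the GPU.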


3. GPU-BASED IMPLEMENTATIONS OF CLASSICAL MOLECULAR DYNAMICS

As illustrated in the previous section, GPUs have come a long way in terms of their ease of use for general purpose computing. In the last four years, beginning in 2006, NVIDIA's CUDA and ATI's Stream APIs have made programming GPUs significantly easier, and the addition of DP hardware in NVIDIA's GT200 line and ATI's FireStream series has facilitated effective implementation of MD algorithms. For the reasons discussed above, GPUs are still significantly more complex to program than traditional CPUs; however, the potential cost/performance benefit makes them enticing development platforms. It is only very recently that the use of GPUs for MD simulations has begun to mature to the point where fully featured production MD codes have appeared. The lure of very large performance improvements for minimal cost has shaped early attempts at accelerating MD on GPUs. As we discuss below, the race to develop MD codes on this "new" hardware has led many to adopt inappropriate or untested approximations rather than taking the time to address the shortcomings of GPUs. It is also very difficult to compare successes and performance between implementations, since a number of manuscripts report speedups for only small parts of the code or compare against very different types of simulations. A detailed look at what appears, at first sight, to be a very crowded and successful field uncovers only a few select codes that could be considered production ready. In this section, we provide an overview of the peer-reviewed literature on GPU-based MD along with a discussion of these production-ready codes.

3.1 Early GPU-based MD code development

In what was arguably the first published implementation of GPU-accelerated MD, Yang et al. [19] reported an algorithm designed for MD simulation of thermal conductivity. This work predated the release of the CUDA and Stream APIs, so the authors were forced to implement their algorithm directly in OpenGL [21]. Using an NVIDIA GeForce 7800 GTX, they observed performance improvements of 10 to 11 times over a single 3.0 GHz Intel Pentium processor. While an impressive proof of concept, the Yang et al. implementation was very simplistic, containing just Lennard-Jones interactions and a neighbor list that was constructed once and remained static over the course of the simulation. It thus lacked many of the important features needed for MD of biological systems, such as covalent terms, short- and long-range electrostatics, thermostats, barostats, neighbor list updates, and restraints. Nevertheless, this pioneering study demonstrated that implementing an MD code on GPUs was feasible.

The advent of the CUDA and Stream programming APIs made programming GPUs significantly easier and brought with it an explosion of GPU MD implementations. Most early implementations of MD on GPUs are characterized by an exploration of the field, with the development of codes and GPU-specific algorithms focused on simplistic, artificial, or very specific model problems rather than the application of GPUs to "real-world" production MD simulations.
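For reference, the kind of model described for Yang et al. fits in a few lines: Lennard-Jones forces evaluated over a pair list that is built once and never updated. The Python sketch below is illustrative and is not the authors' OpenGL code; it assumes reduced units (epsilon = sigma = 1 by default), a cubic periodic box with the minimum-image convention, and hypothetical function names. Because the list is static, any pair that moves inside the cutoff after the list is built is silently ignored, which is why production codes update their neighbor lists.

```python
import itertools

def min_image_delta(a, b, box):
    """Displacement a - b, wrapped to the nearest periodic image (cubic box)."""
    delta = []
    for k in range(3):
        d = a[k] - b[k]
        delta.append(d - box * round(d / box))
    return delta

def build_static_pair_list(positions, box, cutoff):
    """All pairs i < j within the cutoff at build time; O(N^2), done once."""
    pairs = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        d = min_image_delta(positions[i], positions[j], box)
        if d[0] ** 2 + d[1] ** 2 + d[2] ** 2 < cutoff ** 2:
            pairs.append((i, j))
    return pairs

def lj_energy_forces(positions, pairs, box, cutoff, epsilon=1.0, sigma=1.0):
    """Truncated Lennard-Jones energy and forces over a precomputed pair list."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    energy = 0.0
    for i, j in pairs:
        d = min_image_delta(positions[i], positions[j], box)
        r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
        if r2 >= cutoff ** 2:
            continue  # pair drifted outside the cutoff since the list was built
        sr6 = (sigma * sigma / r2) ** 3
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
        # F_i = -dU/dr_i = 24*eps*(2*(s/r)^12 - (s/r)^6) / r^2 * (r_i - r_j)
        fscale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
        for k in range(3):
            forces[i][k] += fscale * d[k]
            forces[j][k] -= fscale * d[k]  # Newton's third law
    return energy, forces
```

Two particles at r = 2^(1/6) sigma sit at the potential minimum (energy -epsilon, zero force), which makes a convenient sanity check.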


The first MD implementation to use CUDA appears to be that of Liu et al. [22]. Like Yang et al., they chose to implement just a simplistic van der Waals potential, allowing them to avoid all of the complexities inherent in production MD simulations of condensed-phase systems. Unlike Yang et al., however, Liu et al. recomputed their neighbor list periodically, providing the first example of a neighbor list update for MD on GPUs.

Stone et al. [23] published a lengthy discussion of the implementation of a series of target algorithms for molecular modeling computations, including techniques for direct Coulomb summation for calculating charge-charge interactions within a cutoff. They also discussed possible techniques for evaluating forces in MD, providing the first mention of a combined treatment of direct space van der Waals and electrostatics in a GPU implementation. Their implementation, however, did not include any actual MD but instead focused on the more simplistic applications of ion placement and the calculation of time-averaged Coulomb potentials in the vicinity of a simulated system. While it provided an example of how Coulomb interactions can be accelerated with GPUs and laid the groundwork for developing an experimental GPU-accelerated version of NAMD [24], the example applications are of limited interest for conducting production MD simulations.

Following on the heels of Yang et al., a number of groups began implementing their own MD codes on GPUs, although most were still simply proof-of-concept prototypes with limited applicability to production MD calculations. For example, van Meel et al. [25] implemented a cell-based list algorithm for neighbor list updates but still applied it only to simple van der Waals fluids, while Rapaport [26] provided a more detailed look at neighbor list approaches for simple van der Waals potentials. Anderson et al. [27] were the first to include the calculation of covalent terms, adding GPU computation of van der Waals and harmonic bond potentials to their HOOMD code in order to study nonionic liquids. They also included integrators and neighbor lists in their implementation; however, while the HOOMD GPU implementation went a step closer to a full MD implementation, it still neglected most of the complexities required for simulating condensed-phase systems, including short- and long-range electrostatics, angle terms, torsion terms, and constraints. Davis et al. [28] used a simple truncated electrostatic model to carry out simulations of liquid water. Their approach was similar to that of Anderson et al. but also included angle and short-range electrostatic terms. While a demonstration of a condensed-phase simulation, the approach was still extremely restrictive and of limited use in real-world applications.

These early GPU-based MD implementations are characterized by significantly oversimplifying the mathematics in order to make implementation on a GPU easier, neglecting, for example, electrostatics, covalent terms, and heterogeneous solutions. This has resulted in a large number of GPU implementations being published, but none with any applicability to "real-world" production MD simulations. It is only within the last year (2009/2010) that useful GPU implementations of MD have started to appear.
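The cell-based neighbor search mentioned above for van Meel et al. can be sketched as follows. This is a generic serial cell list in Python, not the GPU algorithm of any cited paper; it assumes a cubic periodic box, coordinates inside [0, box), and hypothetical function names. Space is divided into cells at least one cutoff wide, so an atom's partners can only lie in its own cell or the 26 surrounding cells, and the result is checked against an O(N^2) brute-force search.

```python
import itertools
import random
from collections import defaultdict

def min_image_dist2(a, b, box):
    """Squared minimum-image distance between points a and b in a cubic box."""
    total = 0.0
    for k in range(3):
        d = a[k] - b[k]
        d -= box * round(d / box)
        total += d * d
    return total

def brute_force_pairs(positions, box, cutoff):
    """O(N^2) reference: every pair i < j closer than the cutoff."""
    cut2 = cutoff * cutoff
    return {(i, j) for i, j in itertools.combinations(range(len(positions)), 2)
            if min_image_dist2(positions[i], positions[j], box) < cut2}

def cell_list_pairs(positions, box, cutoff):
    """Same pairs via a cell list: roughly O(N) at uniform density."""
    n_cells = int(box // cutoff)  # cells per edge, so each cell edge >= cutoff
    if n_cells < 3:
        raise ValueError("box too small for a 27-cell stencil with this cutoff")
    cell_size = box / n_cells
    cells = defaultdict(list)     # (cx, cy, cz) -> list of atom indices
    for idx, pos in enumerate(positions):
        key = tuple(int(c // cell_size) % n_cells for c in pos)
        cells[key].append(idx)
    cut2 = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only the home cell and its 26 periodic neighbors can hold partners.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            neighbor = ((cx + dx) % n_cells, (cy + dy) % n_cells, (cz + dz) % n_cells)
            for i in members:
                for j in cells.get(neighbor, ()):
                    if i < j and min_image_dist2(positions[i], positions[j], box) < cut2:
                        pairs.add((i, j))
    return pairs

def random_positions(n, box, seed):
    """Reproducible uniformly random coordinates in [0, box)."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n)]
```

Rebuilding such a list every few steps, rather than once, is what distinguishes a neighbor list update scheme from the static list of the earliest work.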

Advancements in MD Simulations of Biomolecules on GPUs


3.2 Production GPU-based MD codes

The features typically necessary for a condensed-phase production MD code for biological simulations are:

- explicit and implicit solvent implementations;
- correct treatment of long-range electrostatics;
- support for different statistical ensembles (NVT, NVE, and NPT);
- thermostats, restraints, and constraints;
- integration algorithms.

At the time of writing, there are only three published MD GPU implementations that could be considered production-quality codes: the ACEMD code of Harvey et al. [29], the OpenMM library of Friedrichs et al. [30], and NAMD of Phillips et al. [24]. Other independent implementations are available but have not yet been published. These include support for generalized Born implicit solvation in AMBER 10 [10] (http://ambermd.org/gpus) and support for explicit solvent PME calculations in AMBER 11 [31].

The ACEMD package by Harvey et al. could be considered the first GPU-accelerated, fully featured condensed-phase MD engine [29]. This program includes support for periodic boundaries and, more importantly, both short- and long-range electrostatics using a smooth particle mesh Ewald (PME) approach [32–34]. The OpenMM library initially implemented only the implicit solvent generalized Born model on small- and medium-sized systems, using direct summation of nonbonded terms [30]. Eastman and Pande further improved the OpenMM library and adapted it to explicit solvent simulation [35], although initially using reaction field methods instead of a full treatment of long-range electrostatics. Additionally, a GPU-accelerated version of GROMACS has been developed that works via links to the OpenMM library. GPU acceleration of explicit solvent calculations is also available in NAMD v2.7b2. The acceleration is limited, however, because only the direct-space nonbonded interactions are calculated on the GPU at present. This necessitates a synchronization between GPU and CPU memory on every time step [24].
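As noted above, Eastman and Pande initially used a reaction field rather than a full long-range treatment. The sketch below shows one commonly used reaction-field form in reduced units. It is not OpenMM's actual code. Inside the cutoff r_c, the bare Coulomb pair energy is replaced by q_i q_j (1/r + k_rf r^2 - c_rf), where k_rf = (eps_rf - 1)/((2 eps_rf + 1) r_c^3). The constant c_rf = 1/r_c + k_rf r_c^2 makes the energy vanish at the cutoff. The dielectric `eps_rf`, the unit Coulomb constant, and the function name are assumptions for illustration.

```python
def reaction_field_pair(qi, qj, r, r_cut, eps_rf, coulomb_const=1.0):
    """Energy and scalar force (-dV/dr) for one charge pair with a reaction field.

    Reduced units: coulomb_const = 1 unless a physical constant is supplied.
    """
    if r >= r_cut:
        return 0.0, 0.0  # pairs beyond the cutoff do not interact
    k_rf = (eps_rf - 1.0) / ((2.0 * eps_rf + 1.0) * r_cut ** 3)
    c_rf = 1.0 / r_cut + k_rf * r_cut ** 2  # shifts V(r_cut) to zero
    pref = coulomb_const * qi * qj
    energy = pref * (1.0 / r + k_rf * r * r - c_rf)
    force = pref * (1.0 / (r * r) - 2.0 * k_rf * r)
    return energy, force

# Example: opposite unit charges 3.0 apart, cutoff 9.0, eps_rf = 78 (hypothetical).
e, f = reaction_field_pair(1.0, -1.0, 3.0, 9.0, 78.0)
```

As eps_rf grows large, the force also tends to zero at the cutoff. This is one reason the method is cheap and GPU friendly: no reciprocal-space work is needed. The cost is that it only approximates the long-range electrostatics that PME treats in full.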
Table 2 compares the key features of the production MD codes at the time of writing. From a functionality perspective, AMBER 11 includes the broadest set of features. It can run implicit and explicit solvent simulations in all three ensembles, with flexible restraints on any atoms, and allows the use of multiple precision models, although it supports only a single GPU per MD simulation at present. Some of the other codes lack key MD features such as pressure coupling and implicit solvent models, although this will almost certainly change in the future. The NAMD implementation is CPU-centric and focuses on running MD in a multiple-node, multiple-GPU environment. The others implement all MD features on the GPU and strive to optimize MD performance on a single GPU or on multiple GPUs within a single node. Of all the production MD codes available, OpenMM is the only one to support both NVIDIA and ATI GPUs; the others are developed only for NVIDIA GPUs. ACEMD and AMBER are commercial products, whereas the others are available under various open-source licensing models.


Ross C. Walker et al.

Table 2 Key feature comparison between the GPU-accelerated MD codes

| Code | Simulation implementation | GPU acceleration | Multiple GPU support | GPU type | Licensing model |
|---|---|---|---|---|---|
| ACEMD | Explicit solvent, PME, NVE, NVT, SHAKE | All features | Three GPUs at present | NVIDIA | Commercial |
| OpenMM^a | Explicit solvent, implicit solvent (GB), PME, NVE, NVT, SHAKE | All features | Single GPU at present | ATI/NVIDIA | Free, open source |
| NAMD | Explicit solvent, PME, NVE, NVT, NPT, SHAKE, Restraint | Direct space nonbonded interactions only | Multiple GPUs on multiple nodes, but scalability bottlenecked by internode communication | NVIDIA | Free, open source |
| AMBER 11 (PMEMD) | Explicit solvent, implicit solvent (3 GB variants), PME, NVE, NVT, NPT, SHAKE, Restraint | All features | Single GPU at present | NVIDIA | Commercial (source available) |

^a GROMACS has been implemented with OpenMM.


4. PERFORMANCE AND ACCURACY

4.1 Performance and scaling

The performance of MD simulations on modern clusters and supercomputers is currently limited by communication bottlenecks. These arise from the significant imbalance between CPU speeds and hardware interconnects. GPUs do nothing to alleviate this bottleneck and actually exacerbate it. Making an individual node faster increases the amount of communication required between nodes per unit time.

For this reason, GPU-accelerated MD does not offer the ability to run substantially longer MD simulations than are currently feasible on the best supercomputer hardware. Nor does it provide a convincing case for building large clusters of GPUs. What it does offer is the ability to run substantially more sampling on a workstation or single node for minimal cost.

The huge performance gap between cluster interconnects and GPUs has meant that most implementations have focused on utilizing just a single GPU (OpenMM, AMBER) or multiple GPUs within a single node (ACEMD). Only NAMD has attempted to utilize multiple nodes. Its success there is largely due to simulating very large systems rather than optimizing single-node performance. It therefore requires large numbers of GPUs to achieve only modest speedups, which negates many of the cost/performance arguments. Thus, the benefit of GPUs to condensed-phase MD is best seen as condensing small (2–8 node) clusters into single workstations for a fraction of the cost. It is not a way to run hundreds of microseconds of MD per day on large clusters of GPUs.
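The communication argument above can be made concrete with a deliberately crude cost model. Each MD step costs `compute_s / n_nodes` seconds of perfectly parallel work, plus a fixed communication time whenever more than one node is used. All numbers below are hypothetical and chosen only to show the trend. Making each node ten times faster leaves the communication term unchanged, so that term dominates sooner and parallel efficiency drops.

```python
def time_per_step(n_nodes, compute_s, comm_s):
    """Hypothetical wall time per MD step: ideal compute split + fixed comm cost."""
    return compute_s / n_nodes + (comm_s if n_nodes > 1 else 0.0)

def parallel_efficiency(n_nodes, compute_s, comm_s):
    """Speedup over one node divided by the number of nodes."""
    t1 = time_per_step(1, compute_s, comm_s)
    return t1 / (n_nodes * time_per_step(n_nodes, compute_s, comm_s))

# Hypothetical inputs: a GPU node does the same step 10x faster than a CPU
# node, while the interconnect cost per step (2 ms) is identical.
CPU_STEP_S, GPU_STEP_S, COMM_S = 0.080, 0.008, 0.002
for nodes in (1, 2, 8, 32):
    cpu = parallel_efficiency(nodes, CPU_STEP_S, COMM_S)
    gpu = parallel_efficiency(nodes, GPU_STEP_S, COMM_S)
    print(f"{nodes:3d} nodes: CPU efficiency {cpu:.2f}, GPU efficiency {gpu:.2f}")
```

With these made-up inputs, eight nodes retain about 83% parallel efficiency in the CPU case but only about 33% in the GPU case. Real interconnect costs also grow with node count, which would make the contrast sharper.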
A fair comparison of performance across current implementations is very difficult, since it is almost impossible to run identical simulations in different programs. Even within the same program a fair comparison is not always possible. Additional approximations are often made in the GPU implementation to achieve larger speedups, without considering the same approximations on the CPU. In numerous cases, people also compare the performance of individual kernels, such as the Coulomb sum, rather than of the complete implementation. Indeed, a careful look at the current literature finds speedups ranging from 7× to more than 700×.

To see why such numbers might be misleading, consider the performance reported by Davis et al. [28]. They compare simulations of various boxes of water with their GPU implementation against the CHARMM [36] code. They claim on average to be 7× faster than CHARMM on a single CPU. At no point in their paper, however, do they mention the version of CHARMM used, the compilers used, or even the settings used in the CHARMM code. Largely for historical reasons, the use of default settings in CHARMM tends to give very poor performance.

There are also multiple optimizations that the simplicity of the water model allows on the GPU:

- Cubic boxes can benefit vectorization on the GPU, and for codes supporting PME they also give more favorable fast Fourier transform (FFT) performance.
- The SPC/Fw water model [37] is flexible, which avoids the complexities of doing SHAKE-based constraints on the GPU.
- In a pure water box all molecules are essentially identical: all bonds are identical, all oxygen charges are identical, and so on. All of the parameters can therefore be hard-coded, avoiding the cost of parameter lookups on the GPU.

For these reasons, the performance and speedups quoted for various GPU implementations should typically be considered an upper bound on the achievable performance.

Many other factors also determine the performance of GPU-accelerated MD codes. Implicit solvent simulations generally show much greater performance boosts than explicit solvent simulations because the underlying algorithm is less complex. They avoid the need for FFTs, and their use of infinite cutoffs removes the complexity of maintaining a neighbor list. For OpenMM implicit solvent simulations, Friedrichs et al. [30] reported more than a 60-fold speedup of their single-precision OpenMM code, apparently relative to AMBER 9's DP Sander implementation, for systems of 600 atoms. For systems of 1200 atoms they reported more than two orders of magnitude speedup. Similar speedups have been observed in direct comparisons between AMBER's PMEMD code running on 2.8 GHz Intel E5462 CPUs and on NVIDIA C1060 Tesla cards [38,39]. Phillips et al. reported up to a 7-fold speedup for explicit solvent simulations with GPU-accelerated NAMD relative to CPU-based NAMD [40]. OpenMM showed impressive linear performance scaling with system size in its non-PME explicit solvent simulations. It was also at least 19-fold faster than single-CPU MD on simulations of the lambda repressor [30]. However, the OpenMM manuscript does not make clear whether these comparisons are like for like, since the AMBER and NAMD numbers appear to be for full PME-based explicit solvent simulations.
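One reason cubic boxes are convenient, as noted above, is that a single box length applies to every Cartesian component. The minimum-image wrap of a displacement is therefore the same branch-free arithmetic for x, y, and z. A minimal sketch follows; the function names are hypothetical, and triclinic or non-cubic boxes would need more work.

```python
import math

def minimum_image(delta, box):
    """Wrap a 3-component displacement into the nearest periodic image of a
    cubic box with edge length `box` (same operation for every component)."""
    return tuple(d - box * round(d / box) for d in delta)

def pair_distance(ri, rj, box):
    """Minimum-image distance between two positions in a cubic box."""
    d = minimum_image(tuple(a - b for a, b in zip(ri, rj)), box)
    return math.sqrt(sum(c * c for c in d))
```

For example, `minimum_image((9.0, -9.0, 4.0), 10.0)` gives `(-1.0, 1.0, 4.0)`. Every component uses the same scalar `box`, so the operation vectorizes trivially, and an orthorhombic box would only need one length per axis.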
ACEMD showed that its 3-CPU/3-GPU performance was roughly equivalent to that of 256-CPU NAMD on the DHFR system, and to that of 16-CPU/16-GPU accelerated NAMD on the apoA1 system [29].

4.2 Validation

Most articles describing new GPU MD implementations have devoted considerable attention to performance comparisons with CPU simulations. There has been very little effort to comprehensively test and validate the implementations. This applies both to actual bugs and to the effects of approximations such as single precision or alternative electrostatic treatments. DP has only recently become available on GPUs, and SP still offers a more than 10-fold performance enhancement. As a result, all of the GPU-based MD implementations use either single precision or a hybrid of single and double precision. Several authors have attempted to validate this and other approximations, but often only in a limited fashion, preferring to focus on performance. For example, van Meel et al. [25] and Phillips et al. [24] made no mention of validation. Davis et al. [28] simply ran their water box simulations on the CPU and GPU and provided plots of the energy and temperature profiles for the two simulations, without any form of statistical analysis.
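The single-precision concern can be illustrated without a GPU by emulating IEEE binary32 rounding in Python with the standard `struct` module. The sketch below uses hypothetical helper names, and real hybrid-precision schemes differ in detail. It sums many small float32 contributions twice. The first pass accumulates in float32. The second accumulates the same float32 terms in float64, which is the basic idea behind hybrid single/double precision models.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate_single(terms):
    """Sum with every addition rounded to binary32 (pure SP). Adding two
    binary32 values in binary64 and rounding once gives the correctly
    rounded binary32 sum."""
    total = 0.0
    for t in terms:
        total = to_f32(total + to_f32(t))
    return total

def accumulate_mixed(terms):
    """Round each term to binary32 but keep the running sum in binary64."""
    total = 0.0
    for t in terms:
        total += to_f32(t)
    return total

# 100,000 contributions of 1e-4 should sum to 10.
terms = [1e-4] * 100_000
sp_error = abs(accumulate_single(terms) - 10.0)
mixed_error = abs(accumulate_mixed(terms) - 10.0)
```

Here the pure single-precision error is orders of magnitude larger than the mixed-precision error. Once the running total is large, each float32 addition rounds away part of the small increment. Keeping only the accumulator in double precision removes most of that error at little cost.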


Liu et al. [22] simply stated that their CUDA version of the code gives output values within 0.5% of their C++ version. Anderson et al. [27] only compared the deviation in atom positions between two runs on different CPU counts and on the GPU. Harvey et al. [29] attempted a more in-depth validation of their code, but it was still far from comprehensive. For example, they stated in their manuscript that “Potential energies were checked against NAMD values for the initial configuration of a set of systems, ..., in order to verify the correctness of the force calculations by assuring that energies were identical within 6 significant figures.” Scalar potential energies convey no information about the vector forces, so it is unclear how the authors considered this a validation of their force calculations. They provide a table of energy changes in the NVE ensemble per nanosecond per degree of freedom, but no independent simulations for comparison. The authors also state that “... we validate in this section the conservation properties of energy in a NVT simulation ...”. This is of little use for validation, since energy is not a conserved quantity in the NVT (canonical) ensemble. Additionally, they calculated Na–Na pair distribution functions with their ACEMD GPU code and with GROMACS on a CPU. However, the simulation parameters were inconsistent between GPU and CPU, and the results were clearly not converged, so the validation is qualitative at best. Friedrichs et al. [30] attempted to validate their OpenMM implementation simply by examining energy conservation for simulations of the lambda repressor. They state that this compares favorably with other DP CPU implementations but, as with Harvey et al., do not provide the comparison numbers in their table.
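The drift metric quoted by Harvey et al. is the energy change per nanosecond per degree of freedom in NVE. It can be computed from a total-energy time series using an ordinary least-squares slope, as in the sketch below. The function names and the choice of a linear fit are assumptions, not the authors' procedure. The number becomes a real comparison only when it is also computed from an independent CPU run with identical settings.

```python
def energy_drift_per_dof(times_ns, total_energies, n_dof):
    """Least-squares slope of total energy vs. time, divided by the number of
    degrees of freedom (energy units per ns per degree of freedom)."""
    n = len(times_ns)
    if n < 2 or n != len(total_energies):
        raise ValueError("need matching series with at least two points")
    mean_t = sum(times_ns) / n
    mean_e = sum(total_energies) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_ns, total_energies))
    var = sum((t - mean_t) ** 2 for t in times_ns)
    return cov / var / n_dof

def degrees_of_freedom(n_atoms, n_constraints, remove_com=True):
    """3N minus holonomic constraints, minus 3 if centre-of-mass motion is removed."""
    return 3 * n_atoms - n_constraints - (3 if remove_com else 0)
```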
The push to highlight performance on GPUs has meant that not one of the currently published papers on GPU implementations of MD validates the approximations made in terms of statistical mechanical properties. For example, one could show that converged simulations run on a GPU and on a CPU give statistically identical radial distribution functions, order parameters, and residual dipolar couplings, to name but a few possible tests.
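One of the statistical-mechanical tests suggested above is to histogram a radial distribution function from stored configurations and compare it between converged GPU and CPU runs. The O(N²) sketch below handles a single frame in a cubic periodic box. The function name is hypothetical. Production analysis tools average over many frames and use neighbor lists.

```python
import math

def radial_distribution(positions, box, r_max, n_bins):
    """g(r) for one configuration in a cubic periodic box; r_max <= box / 2."""
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a in range(3):
                d = positions[i][a] - positions[j][a]
                d -= box * round(d / box)  # minimum image
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 2  # pair seen from i and j
    density = n / box ** 3
    g = []
    for k, c in enumerate(counts):
        # Normalize by the ideal-gas count in the spherical shell [k*dr, (k+1)*dr).
        shell_volume = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        g.append(c / (n * density * shell_volume))
    centres = [(k + 0.5) * dr for k in range(n_bins)]
    return centres, g
```

For a GPU/CPU comparison, both g(r) curves would be averaged over many decorrelated frames. They would then be compared within their statistical uncertainties rather than by eye.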

5. APPLICATIONS

While a significant number of published papers describe GPU implementations of MD, a review of the literature reveals very few cited uses of these codes in “real-world” simulations. Indeed, at the time of writing only Pande et al. have published such papers. This underscores the nascent nature of the field.

5.1 Protein folding

In the only published examples of the use of GPU-accelerated bio-MD simulations, Pande et al. have used the OpenMM library to study protein folding in implicit solvent [41]. This work studied the folding pathways of a three-stranded beta-sheet fragment derived from the Hpin1 WW domain (Fip35) [41] and of the 39-residue protein NTL9 [42]. The experimentally estimated folding timescale of Fip35 is ~13 μs. The 544-atom Fip35 fragment ran at an average of 80–200 ns/day on a single GPU. Using the Folding@Home distributed computing network [43], they generated thousands of independent trajectories totaling over 2.73 ms of ensemble-averaged results. The average trajectory length was 207 ns, and some trajectories exceeded 3 μs, allowing a direct exploration of the folding landscape. Similar trajectory lengths were calculated for the NTL9 (922-atom) case.

Additionally, Harvey and De Fabritiis performed a 1 μs explicit solvent MD simulation of the villin headpiece to probe its folding kinetics. This was part of their ACEMD benchmark results, and they achieved 66 ns/day on a workstation equipped with three GPUs [29]. These studies demonstrate that GPU-accelerated MD lets researchers reach, on personal workstations, simulation timescales that would typically require large clusters. They also show that ensemble-averaged results can provide sampling timeframes comparable to experiment. This potentially opens the door to studying a whole range of relevant biological events without access to large-scale supercomputer facilities.
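The throughput figures quoted above translate into wall-clock time with simple arithmetic. This sketch is illustrative only. The 2 fs timestep and 1 ms of wall time per step in the first usage line are hypothetical, while the 80–200 ns/day, 207 ns, and 2.73 ms values come from the text.

```python
FS_PER_NS = 1.0e6
SECONDS_PER_DAY = 86_400.0

def ns_per_day(timestep_fs, seconds_per_step):
    """Simulated nanoseconds per wall-clock day."""
    return SECONDS_PER_DAY / seconds_per_step * timestep_fs / FS_PER_NS

def days_for(target_ns, rate_ns_per_day):
    """Wall-clock days needed to simulate `target_ns` at a given rate."""
    return target_ns / rate_ns_per_day

# Hypothetical: a 2 fs timestep at 1 ms of wall time per step gives about 172.8 ns/day.
rate = ns_per_day(2.0, 1.0e-3)

# From the text: one average 207 ns trajectory at 80-200 ns/day takes about 1-2.6 days,
# and 2.73 ms of aggregate sampling at 207 ns each implies roughly 13,000 trajectories.
slow, fast = days_for(207.0, 80.0), days_for(207.0, 200.0)
n_trajectories = round(2.73e6 / 207.0)  # 2.73 ms = 2.73e6 ns
```

This is consistent with the “thousands of independent trajectories” described above. Each trajectory needs only a day or two on one GPU, which is what makes a distributed network such as Folding@Home effective for this kind of sampling.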

6. CONCLUSIONS AND FUTURE DIRECTIONS

It should be clear from this chapter that GPU acceleration of condensed-phase biological MD simulations is still in its infancy. Initial work in the field concentrated on artificially simplistic models. Only recently have production-quality MD codes been developed that can make effective use of this technology. The pressure to achieve maximum performance has led to a number of shortcuts and approximations, many without any real validation or rigorous study. At first the field appears established and extremely active. Beneath the surface, however, it consists of only a few select codes that could be considered production ready, and even fewer examples of “real-world” use. Nevertheless, the current cost benefits of GPUs are enticing, and this is driving both code and hardware development.

In a few short years, GPU-based MD codes have evolved from proof-of-concept prototypes to production-level software packages. Despite the substantial progress in code development, programming GPU devices remains difficult. This forces approximations to circumvent some of the limitations of GPU hardware. However, NVIDIA's recently released Fermi [44] architecture and the accompanying CUDA 3.0 library [15] provide, for the first time, full support for DP and error-correcting memory, along with a more versatile FFT implementation. Many consider these features vital to the effective use of GPUs for MD simulations. Given this, a number of established groups in the biological MD field are developing GPU-accelerated versions of their software. This will bring more competition to the field and, hopefully, a better focus on extensive validation of the approximations made. With the release of GPU versions of widely used MD codes, the use of GPUs in MD research is expected to increase exponentially over the coming years. That growth assumes developers can demonstrate the credibility of these implementations with the same degree of scrutiny that CPU implementations have received over the years.

ACKNOWLEDGMENTS

This work was supported in part by grant 09-LR-06-117792-WALR from the University of California Lab Fees program and grant XFT-8-88509-01/DE-AC36-99GO10337 from the Department of Energy to RCW.

REFERENCES

1. McCammon, J.A., Gelin, B.R., Karplus, M. Dynamics of folded proteins. Nature 1977, 267, 585–90.
2. Duan, Y., Kollman, P.A. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282, 740–4.
3. Yeh, I., Hummer, G. Peptide loop-closure kinetics from microsecond molecular dynamics simulations in explicit solvent. J. Am. Chem. Soc. 2002, 124, 6563–8.
4. Klepeis, J.L., Lindorff-Larsen, K., Dror, R.O., Shaw, D.E. Long-timescale molecular dynamics simulations of protein structure and function. Curr. Opin. Struct. Biol. 2009, 19, 120–7.
5. Sanbonmatsu, K.Y., Joseph, S., Tung, C. Simulating movement of tRNA into the ribosome during decoding. Proc. Natl. Acad. Sci. USA 2005, 102, 15854–9.
6. Freddolino, P.L., Arkhipov, A.S., Larson, S.B., Mcpherson, A., Schulten, K. Molecular dynamics simulations of the complete satellite tobacco mosaic virus. Structure 2006, 14, 437–49.
7. Bakker, A.F., Gilmer, G.H., Grabow, M.H., Thompson, K. A special purpose computer for molecular dynamics calculations. J. Comput. Phys. 1990, 90, 313–35.
8. Fine, R., Dimmler, G., Levinthal, C. FASTRUN: A special purpose, hardwired computer for molecular simulation. Protein Struct. Funct. Genet. 1991, 11, 242–53.
9. Susukita, R., Ebisuzaki, T., Elmegreen, B.G., Furusawa, H., Kato, K., Kawai, A., Kobayashi, Y., Koishi, T., McNiven, G.D., Narumi, T., Yasuoka, K. Hardware accelerator for molecular dynamics: MDGRAPE-2. Comput. Phys. Commun. 2003, 155, 115–31.
10. Case, D.A., Darden, T.A., Cheatham, T.E., Simmerling, C.L., Wang, J., Duke, R.E., Luo, R., Crowley, M., Walker, R.C., Zhang, W., Merz, K.M., Wang, B., Hayik, S., Roitberg, A., Seabra, G., Kolossváry, I., Wong, K.F., Paesani, F., Vanicek, J., Wu, X., Brozell, S.R., Steinbrecher, T., Gohlke, H., Yang, L., Tan, C., Mongan, J., Hornak, V., Cui, G., Mathews, D.H., Seetin, M.G., Sagui, C., Babin, V., Kollman, P.A. AMBER 10, University of California, San Francisco, 2008.
11. Case, D.A., Cheatham, T.E., Darden, T., Gohlke, H., Luo, R., Merz, K.M., Onufriev, A., Simmerling, C., Wang, B., Woods, R.J. The amber biomolecular simulation programs. J. Comput. Chem. 2005, 26, 1668–88.
12. Yuri, N. Performance analysis of clearspeed's CSX600 interconnects, in Parallel and Distributed Processing with Applications, 2009 IEEE International Symposium, pp. 203–10.
13. Shaw, D.E., Deneroff, M.M., Dror, R.O., Kuskin, J.S., Larson, R.H., Salmon, J.K., Young, C., Batson, B., Bowers, K.J., Chao, J.C., Eastwood, M.P., Gagliardo, J., Grossman, J.P., Ho, R.C., Ierardi, D.J., Kolossváry, I., Klepeis, J.L., Layman, T., Mcleavey, C., Moraes, M.A., Mueller, R., Priest, E.C., Shan, Y., Spengler, J., Theobald, M., Towles, B., Wang, S.C. Anton, a special-purpose machine for molecular dynamics simulation. SIGARCH Comput. Archit. News 2007, 35, 1–12.
14. Narumi, T., Ohno, Y., Noriyuk, F., Okimoto, N., Suenaga, A., Yanai, R., Taiji, M. In From Computational Biophysics to Systems Biology: A High-Speed Special-Purpose Computer for Molecular Dynamics Simulations: MDGRAPE-3 (eds J. Meinke, O. Zimmermann, S. Mohanty and U.H.E. Hansmann), J. von Neumann Institute for Computing, Jülich, 2006, pp. 29–36.
15. NVIDIA: Santa Clara, CA, CUDA Programming Guide, http://developer.download.nvidia.com/compute/cuda/30/toolkit/docs/NVIDIACUDAProgrammingGuide3.0.pdf (Accessed March 6, 2010).
16. von Neumann, J. First draft of a report on the EDVAC. IEEE Ann. Hist. Comput. 1993, 15, 27–75.
17. Flynn, M.J. Some computer organizations and their effectiveness. IEEE Trans. Comput. 1972, C-21, 948–60.
18. Kirk, D.B., Hwu, W.W. Programming Massively Parallel Processors, Morgan Kaufmann Publishers, Burlington, 2010.
19. Yang, J., Wang, Y., Chen, Y. GPU accelerated molecular dynamics simulation of thermal conductivities. J. Comput. Phys. 2007, 221, 799–804.
20. AMD: Sunnyvale, CA, ATI, www.amd.com/stream (Accessed March 14, 2010).
21. Woo, M., Neider, J., Davis, T., Shreiner, D. OpenGL Programming Guide: The Official Guide to Learning OpenGL, version 1.2, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1999.
22. Liu, W., Schmidt, B., Voss, G., Müller-Wittig, W. In High Performance Computing – HiPC 2007: Lecture Notes in Computer Science (eds S. Aluru, M. Parashar, R. Badrinath and V.K. Prasanna), Vol. 4873, Springer, Berlin/Heidelberg, 2007, pp. 185–96.
23. Stone, J.E., Phillips, J.C., Freddolino, P.L., Hardy, D.J., Trabuco, L.G., Schulten, K. Accelerating molecular modeling applications with graphics processors. J. Comput. Chem. 2007, 28, 2618–40.
24. Phillips, J.C., Stone, J.E., Schulten, K. Adapting a message-driven parallel application to gpu accelerated clusters. In SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, 1–9, IEEE Press, Piscataway, NJ, USA, 2008.
25. van Meel, J.A., Arnold, A., Frenkel, D., Portegies Zwart, S.F., Belleman, R.G. Harvesting graphics power for MD simulations. Mol. Simulat. 2008, 34, 259–66.
26. Rapaport, D.C. Enhanced molecular dynamics performance with a programmable graphics processor, arXiv Physics, 2009, arXiv:0911.5631v1.
27. Anderson, J.A., Lorenz, C.D., Travesset, A. General purpose molecular dynamics simulations fully implemented on graphics processing units. J. Comput. Phys. 2008, 227, 5342–59.
28. Davis, J., Ozsoy, A., Patel, S., Taufer, M. Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors, Springer, Berlin/Heidelberg, 2009.
29. Harvey, M.J., Giupponi, G., De Fabritiis, G. ACEMD: Accelerating biomolecular dynamics in the microsecond time scale. J. Chem. Theory Comput. 2009, 5, 1632–9.
30. Friedrichs, M.S., Eastman, P., Vaidyanathan, V., Houston, M., Le Grand, S., Beberg, A.L., Ensign, D.L., Bruns, C.M., Pande, V.S. Accelerating molecular dynamic simulation on graphics processing units. J. Comput. Chem. 2009, 30, 864–72.
31. Case, D.A., Darden, T.A., Cheatham, T.E. III, Simmerling, C.L., Wang, J., Duke, R.E., Luo, R., Crowley, M., Walker, R.C., Williamson, M.J., Zhang, W., Merz, K.M., Wang, B., Hayik, S., Roitberg, A., Seabra, G., Kolossváry, I., Wong, K.F., Paesani, F., Vanicek, J., Wu, X., Brozell, S.R., Steinbrecher, T., Gohlke, H., Yang, L., Tan, C., Mongan, J., Hornak, V., Cui, G., Mathews, D.H., Seetin, M.G., Sagui, C., Babin, V., Kollman, P.A. Amber 11, Technical report, University of California, San Francisco, 2010.
32. Darden, T., York, D., Pedersen, L. Particle mesh ewald: An Nlog(N) method for ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089–92.
33. Essmann, U., Perera, L., Berkowitz, M.L., Darden, T., Lee, H., Pedersen, L.G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577–93.
34. Harvey, M.J., De Fabritiis, G. An implementation of the smooth particle mesh Ewald method on GPU hardware. J. Chem. Theory Comput. 2009, 5, 2371–7.
35. Eastman, P., Pande, V.S. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem. 2010, 31, 1268–72.
36. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983, 4, 187–217.
37. Wu, Y., Tepper, H.L., Voth, G.A. Flexible simple point-charge water model with improved liquid-state properties. J. Chem. Phys. 2006, 124, 24503.
38. Grand, S.L., Goetz, A.W., Xu, D., Poole, D., Walker, R.C. Accelerating of amber generalized born calculations using nvidia graphics processing units. 2010 (in preparation).
39. Grand, S.L., Goetz, A.W., Xu, D., Poole, D., Walker, R.C. Achieving high performance in amber PME simulations using graphics processing units without compromising accuracy. 2010 (in preparation).
40. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L., Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781–802.
41. Ensign, D.L., Pande, V.S. The Fip35 WW domain folds with structural and mechanistic heterogeneity in molecular dynamics simulations. Biophys. J. 2009, 96, L53–5.
42. Voelz, V.A., Bowman, G.R., Beauchamp, K., Pande, V.S. Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1-39). J. Am. Chem. Soc. 2010, 132, 1526–8.
43. Shirts, M., Pande, V.S. Computing: Screen savers of the world unite! Science 2000, 290, 1903–4.
44. NVIDIA Corporation. Next generation CUDA compute architecture: Fermi, 2009.

CHAPTER 2

Quantum Chemistry on Graphics Processing Units

Andreas W. Götz¹, Thorsten Wölfle¹,², and Ross C. Walker¹

Contents

1. Introduction
2. Software Development for Graphics Processing Units
3. Kohn–Sham Density Functional and Hartree–Fock Theory
   3.1 Electron repulsion integrals
   3.2 Numerical exchange-correlation quadrature
   3.3 Density-fitted Poisson method
   3.4 Density functional theory with Daubechies wavelets
4. Ab Initio Electron Correlation Methods
   4.1 Resolution-of-identity second-order Møller–Plesset perturbation theory
5. Quantum Monte Carlo
6. Concluding Remarks
Acknowledgments
References

Abstract

We report on the current status of algorithm development and software implementations for accelerating quantum chemistry and computational condensed matter physics simulations on graphics processing units (GPUs), as documented in the peer-reviewed literature. We give a general overview of programming techniques and concepts that should be considered when porting scientific software to GPUs. This is followed by a discussion of Hartree–Fock and density functional theory, wave function-based electron correlation methods, and quantum Monte Carlo. For each, we outline the underlying problems and present the approaches that aim to exploit the performance of massively parallel GPU hardware. We conclude with a critical assessment of the present state of the field and discuss future directions that are likely to be taken.

Keywords: quantum chemistry; density functional theory; Hartree–Fock theory; Møller–Plesset perturbation theory; quantum Monte Carlo; graphics processing units; CUDA; NVIDIA; ATI; accelerator

¹ San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
² Lehrstuhl für Theoretische Chemie, Universität Erlangen, Erlangen, Germany

Annual Reports in Computational Chemistry, Volume 6
ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06002-0

© 2010 Elsevier B.V. All rights reserved.

1. INTRODUCTION

Commodity graphics processing units (GPUs) are becoming increasingly popular for accelerating molecular and condensed matter simulations. Their appeal is low cost and the potential for high performance compared with central processing units (CPUs). In many instances, classical approximations are very successful for such simulations. However, a large number of problems in contemporary nano-, bio-, and materials science require a quantum mechanical description of the electronic structure [1–3]. This chapter provides an overview of recent developments within quantum chemistry and computational condensed matter physics that use accelerator hardware for this purpose.

Quantum chemistry and solid-state physics codes implement relatively complex algorithms [4]. The challenge in using GPUs lies in adapting these algorithms to take advantage of their specialized hardware. A successful GPU implementation requires, for example, careful consideration of the memory hierarchy in order to hide memory access latency [5]. On single-precision GPUs, numerical accuracy is a central issue. Six to seven significant figures are frequently insufficient to match the accuracy of the underlying theoretical model, that is, to achieve “chemical accuracy” of 1 kcal mol⁻¹. Finally, the code should be able to coevolve with the hardware.

There are two general strategies for an implementation. The first is a complete reimplementation of existing functionality in a new software package. The more common approach, however, is to incrementally add GPU kernels for the computationally intensive parts of existing software packages. This approach retains the full functionality of software packages that in many cases have evolved over several decades. This chapter begins with a brief introduction to the general concepts that must be considered in order to port scientific software to GPUs successfully.
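The “six to seven significant figures” of single precision can be made concrete by measuring the gap between adjacent binary32 numbers at the magnitude of a total electronic energy. The sketch below uses Python's standard `struct` module. The total energy of 5000 hartree is a hypothetical magnitude, and 627.5095 kcal mol⁻¹ per hartree is the usual conversion factor.

```python
import struct

HARTREE_TO_KCAL_MOL = 627.5095

def f32_spacing(x):
    """Distance from |x| (as binary32) to the next larger binary32 value."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    lower = struct.unpack("<f", struct.pack("<I", bits))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - lower

eps32 = f32_spacing(1.0)            # machine epsilon, 2**-23 (about 1.2e-7)
gap_hartree = f32_spacing(5000.0)   # 2**-11 hartree at |E| = 5000
gap_kcal = gap_hartree * HARTREE_TO_KCAL_MOL  # about 0.31 kcal/mol
```

At this magnitude, a single rounding can therefore cost up to about 0.15 kcal mol⁻¹. Errors accumulated over many operations can easily exceed the 1 kcal mol⁻¹ target.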
The rest of this chapter is structured according to the different theoretical models commonly used in quantum chemistry. Section 3 begins with density functional theory (DFT) and also covers Hartree–Fock (HF) theory. Section 4 deals with ab initio electron correlation methods, while Section 5 discusses quantum Monte Carlo (QMC). Each of these sections gives an overview of the critical parts of the underlying theory. It then presents and analyzes the approaches that have been taken to accelerate the computationally intensive parts on GPUs. Section 6 summarizes the present state of GPU implementations for quantum chemistry and closes with general conclusions on trends to be expected in the foreseeable future.


2. SOFTWARE DEVELOPMENT FOR GRAPHICS PROCESSING UNITS

An excellent introduction to software development for GPUs can be found in the book of Kirk and Hwu [5], which also discusses the hardware and its historical development. Writing software that runs efficiently on GPUs requires an understanding of the GPU hardware architecture. A GPU is an example of a massively parallel stream-processing architecture that uses the single-instruction multiple-data (SIMD) vector processing model. Typical GPUs contain many arithmetic units arranged in groups, each group sharing fast-access memory and an instruction unit. This high density of arithmetic units, however, comes at the expense of smaller caches and simpler control units.

The NVIDIA GeForce 8800 GTX GPU, released in late 2006, is one example. It consists of 16 streaming multiprocessors (SMs), each composed of eight scalar processors (ScaPs). Each SM operates independently of the other SMs. At any given clock cycle, every ScaP within an SM executes the same instruction, but on different data. Because of this intrinsic parallelism, a GPU can outperform a standard CPU on tasks with a high degree of data parallelism. Successful GPU programming therefore requires exposing the data parallelism in the underlying problem.

Each SM has access to four types of high-bandwidth on-chip memory. On the NVIDIA GeForce 8800 GTX, these are:

- 1024 local registers per ScaP;
- 16 kilobytes (KB) of shared memory (cache);
- an 8 KB read-only constant cache, which speeds up reads from the constant memory space;
- an 8 KB read-only texture cache, which speeds up reads from the texture memory space.

A large, noncached, off-chip graphics card memory is also available. This memory, however, has a high latency of approximately 500 GPU cycles. Applications on a GPU are organized into streams and kernels.
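The data-parallel execution model described above can be mimicked serially to show how work is usually decomposed. The sketch uses a one-dimensional launch of `grid_dim` blocks of `block_dim` threads. Each thread derives a global index from its block and thread indices, as in CUDA C (`blockIdx.x * blockDim.x + threadIdx.x`). The Python helpers below are hypothetical and run the threads one after another. On the hardware, the threads of an SM execute the same instruction on different data concurrently.

```python
def launch_1d(kernel, grid_dim, block_dim, *args):
    """Serially emulate a 1-D kernel launch: one call per (block, thread)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """y[i] = a * x[i] + y[i], one element per thread."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # the last block may extend past the end of the data
        y[i] = a * x[i] + y[i]

n, block_dim = 1000, 256
grid_dim = (n + block_dim - 1) // block_dim  # 4 blocks cover 1000 elements
x = [float(k) for k in range(n)]
y = [1.0] * n
launch_1d(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y)
```

The `i < n` guard shows a common pattern. The grid is rounded up to whole blocks, and surplus threads simply do nothing.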
The former represent blocks of data, while the latter execute operations on the data. Before a GPU kernel is executed, the CPU must copy the required data to the GPU memory. To maximize the speedup of the implemented kernels, the algorithm has to be adapted to architecture-dependent features of the hardware, such as the memory layout. Copy operations between main memory and graphics card memory, for example, should be avoided because access to the main memory has a high latency, on the order of hundreds of GPU cycles. One of the main problems when programming GPUs is the limited size of the working memory (registers, caches) that is available on chip. A large number of parallel threads should therefore be run concurrently to hide the latency of the registers and of the shared and global memory and to avoid pipeline stalls. It is important to realize that many of these considerations are not specific to GPU programming. The arrangement of data in a data-parallel fashion, for example, is also important for parallel programming of distributed-memory architectures, which are found in most of today's standard CPU clusters. Thus many of the techniques employed to improve the parallel efficiency of quantum chemistry codes are also applicable to GPUs. The same holds for the optimization of memory access patterns. A general example of a portable algorithm is the Fastest Fourier Transform in the West (FFTW) library, which reaches optimal performance on the target platform by using a divide-and-conquer strategy [6].

Andreas W. Götz et al.

Early use of GPUs required one to describe the problem to be solved in terms of a graphics pipeline, using either the OpenGL or the DirectX graphics programming interface. This complexity made general-purpose computation on GPUs a research topic in itself. However, with the release of NVIDIA's Compute Unified Device Architecture (CUDA) [7] and ATI's Stream [8] application programming interfaces (APIs), algorithms can be implemented for GPUs using a relatively simple extension of the standard C language. Detailed overviews of the hardware and of the CUDA and Stream APIs can be found on the NVIDIA [9] and ATI [8] homepages, respectively. In addition, high-level subroutine libraries are available that provide algorithms for problems commonly encountered in quantum chemistry and solid-state physics, such as Fourier transforms (CUFFT) [10] and linear algebra (CUBLAS, MAGMA) [11,12]. The first generation of GPUs to support CUDA, such as the NVIDIA GeForce 8800 GTX, supported only 32-bit single-precision (SP) arithmetic and was thus of limited use for quantum chemistry. Major efforts had to be made to deal with roundoff errors resulting from the lack of 64-bit double-precision (DP) data types. The second generation of GPUs introduced the missing 64-bit arithmetic, albeit at only an eighth of the SP performance. GPU cards dedicated to general-purpose computing, such as the NVIDIA Tesla C1060 with up to 4 gigabytes (GB) of onboard memory, were also introduced. The low speed of the DP arithmetic and missing features such as error-correcting code (ECC) memory, however, still hamper widespread acceptance of this generation of GPUs for scientific computing as compared to multi-socket CPU systems.
The third generation of GPUs (such as the NVIDIA Fermi) will solve some of the major problems of the earlier models. Most importantly, DP arithmetic will run at half the speed of SP arithmetic. The availability of a global address space and 64-bit support will make it possible to address the memory required to solve larger problems and to support multiple GPUs in an easier and more transparent fashion. Access to CPU main memory will remain slow, however, because the data transfer takes place over the peripheral component interconnect (PCI) bus.
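The advice earlier in this section to run many threads concurrently in order to hide the roughly 500-cycle latency of off-chip memory can be made semi-quantitative with a Little's-law style estimate. The Python sketch below is a deliberately simplified model and not a description of NVIDIA's warp scheduler: it assumes that each scalar processor issues one instruction per cycle, that every thread performs `work_per_load` independent arithmetic instructions between global-memory loads, and that each load stalls its thread for `latency` cycles. The function name and its parameters are hypothetical.

```python
import math


def threads_to_hide_latency(latency: int, work_per_load: int, lanes: int) -> int:
    """Back-of-the-envelope count of resident threads per SM needed to hide latency.

    Simplified model (an assumption, not NVIDIA's actual scheduler): each lane
    (scalar processor) issues one instruction per cycle, and every thread issues
    `work_per_load` independent arithmetic instructions between global-memory
    loads, each of which stalls that thread for `latency` cycles.
    """
    if latency < 0 or work_per_load <= 0 or lanes <= 0:
        raise ValueError("need latency >= 0, work_per_load > 0 and lanes > 0")
    # Per lane: the stalled thread plus enough ready threads to fill the stall.
    per_lane = 1 + math.ceil(latency / work_per_load)
    return lanes * per_lane


if __name__ == "__main__":
    # GeForce 8800 GTX-like figures from the text: 8 ScaPs per SM, ~500-cycle latency.
    for work in (5, 20, 100):
        print(work, threads_to_hide_latency(latency=500, work_per_load=work, lanes=8))
```

Within this model, 20 arithmetic instructions per load call for 208 resident threads per SM, whereas 5 instructions per load call for 808. The strong dependence on the ratio of computation to memory access is one way to see why exposing as much data parallelism as possible matters on GPUs.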

3. KOHN–SHAM DENSITY FUNCTIONAL AND HARTREE–FOCK THEORY

Due to its excellent balance between accuracy and computational cost, Kohn–Sham density functional theory (KS-DFT) [13,14] is usually the method of choice for investigating electronic ground states and their properties in chemistry and solid-state physics [15,16]. Hartree–Fock (HF) wavefunctions, on the other hand, are the starting point for ab initio electron correlation methods [4,15], which are discussed in Section 4. There are two major computational bottlenecks in KS-DFT and HF calculations [15]: evaluation of the KS (or Fock) matrix elements and solution of the self-consistent field (SCF) equations. The latter requires diagonalization of the Fock matrix, which eventually dominates the computational cost for very large calculations. This topic has not been extensively discussed in the GPU literature but could potentially be tackled with alternatives to diagonalization such as those employed in linear-scaling electronic structure methods [17]. The computational effort for the formation of the KS (or Fock) matrix is dominated by the evaluation of the two-electron repulsion integrals (ERIs), which are required for the Coulomb and exact-exchange contributions, and, in the case of DFT, also by the numerical quadrature of the exchange-correlation (XC) contribution. Efforts to accelerate these steps are summarized in Table 1 and reviewed in the remainder of this section.

Table 1 Summary of the capabilities and performance of GPU-based KS-DFT and HF implementations published to date. $J$ and $K$ denote the Coulomb and exact-exchange contributions, $V^{\mathrm{XC}}$ the XC contribution, and $\nabla E$ analytical energy gradients.

| Authors | $l_{\max}$ | ERIs | $J$ | $K$ | $V^{\mathrm{XC}}$ | $\nabla E$ | Parallel^a | Speedup^b |
|---|---|---|---|---|---|---|---|---|
| Yasuda [18,27] | p | Yes | Yes | No | Yes | No | No | 10 |
| Ufimtsev and Martinez [21,23,25] | p | Yes | Yes | Yes | No | Yes | Yes | 100 |
| Asadchev et al. [26] | g | Yes | No | No | No | No | No | 25 |
| Brown et al. [30,31] | f | No | Yes^c | No | Yes | Yes | Yes | 15^d |

^a Support for parallelization across multiple GPUs.
^b Estimates for one GPU as compared to one CPU.
^c Contribution due to Poisson density fitting via numerical quadrature.
^d Using 12 ClearSpeed xe620 accelerator cards.

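3.1 Electron repulsion integrals

The ERIs which are required in quantum chemistry are given as

$$(\mu\nu|\lambda\sigma) = \iint \frac{\phi_\mu(\mathbf{r})\,\phi_\nu(\mathbf{r})\,\phi_\lambda(\mathbf{r}')\,\phi_\sigma(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}', \tag{1}$$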

where $\phi_\mu$ are basis functions that are usually chosen to be atom-centered Gaussian functions. In general, these basis functions are contracted, that is, linear combinations $\phi_\mu = \sum_p d_{\mu p}\,\chi_p$ of primitive Gaussian functions $\chi_p$, and the ERIs become

$$(\mu\nu|\lambda\sigma) = \sum_{pqrs} d_{\mu p}\, d_{\nu q}\, d_{\lambda r}\, d_{\sigma s}\,(pq|rs). \tag{2}$$
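To make Eqs. (1) and (2) concrete, the following Python sketch evaluates the simplest case, integrals over s-type Gaussians, with the standard closed-form result $(ab|cd) = \frac{2\pi^{5/2}}{\zeta\eta\sqrt{\zeta+\eta}}\,K_{AB}K_{CD}\,F_0\!\left(\frac{\zeta\eta}{\zeta+\eta}|\mathbf{P}-\mathbf{Q}|^2\right)$ for unnormalized primitives, where $\zeta = a+b$, $\eta = c+d$, $K_{AB} = e^{-ab|\mathbf{A}-\mathbf{B}|^2/\zeta}$, and $F_0$ is the zeroth-order Boys function. It then contracts the primitive integrals as in Eq. (2). This is an illustration only, not one of the GPU algorithms reviewed below; the function names are hypothetical, and higher angular momenta require schemes such as Rys quadrature or McMurchie–Davidson recursion.

```python
import math


def boys_f0(t: float) -> float:
    """Zeroth-order Boys function F0(t) = int_0^1 exp(-t u^2) du."""
    if t < 1e-12:
        return 1.0 - t / 3.0  # leading terms of the Taylor series near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def dist2(A, B):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(A, B))


def primitive_ssss(a, A, b, B, c, C, d, D):
    """(ab|cd) over unnormalized s-type primitives exp(-a|r - A|^2), etc."""
    zeta, eta = a + b, c + d
    # Gaussian product theorem: centres and prefactors of the bra and ket pairs.
    P = [(a * x + b * y) / zeta for x, y in zip(A, B)]
    Q = [(c * x + d * y) / eta for x, y in zip(C, D)]
    k_ab = math.exp(-a * b / zeta * dist2(A, B))
    k_cd = math.exp(-c * d / eta * dist2(C, D))
    alpha = zeta * eta / (zeta + eta)
    prefactor = 2.0 * math.pi ** 2.5 / (zeta * eta * math.sqrt(zeta + eta))
    return prefactor * k_ab * k_cd * boys_f0(alpha * dist2(P, Q))


def s_norm(a: float) -> float:
    """Normalization constant of an s-type primitive with exponent a."""
    return (2.0 * a / math.pi) ** 0.75


def contracted_ssss(shells):
    """Eq. (2) for s functions: sum_pqrs d_p d_q d_r d_s (pq|rs).

    `shells` holds four (centre, [(exponent, coefficient), ...]) entries; the
    primitive normalization constants are folded into the coefficients.
    """
    (A, pa), (B, pb), (C, pc), (D, pd) = shells
    total = 0.0
    for a, da in pa:
        for b, db in pb:
            for c, dc in pc:
                for d, dd in pd:
                    coef = (da * s_norm(a)) * (db * s_norm(b)) * (dc * s_norm(c)) * (dd * s_norm(d))
                    total += coef * primitive_ssss(a, A, b, B, c, C, d, D)
    return total
```

For a single normalized primitive with exponent 1 on one centre, the sketch returns $2/\sqrt{\pi} \approx 1.128$ hartree, and for two such charge distributions 10 bohr apart it returns $\mathrm{erf}(10)/10 \approx 0.1$ hartree, the Coulomb interaction of two unit point charges at that distance.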

Formally, $O(N^4)$ of these ERIs need to be evaluated, where $N$ denotes the size of the molecule under consideration. Although for large systems most of the integrals are zero or negligible, the asymptotic scaling remains $O(N^2)$, and the sheer number of ERIs that need to be calculated represents a major computational bottleneck. Many different algorithms have been devised for the calculation of these ERIs, and their efficiency depends on the contraction length and angular momentum quantum number of the basis functions involved [4]. CPU-based quantum chemistry codes therefore implement several ERI algorithms and use the best method for a given type of ERI. From the ERIs, the Coulomb and exact-exchange contributions to the KS (or Fock) matrix are obtained as

$$J_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\,(\mu\nu|\lambda\sigma), \qquad K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\,(\mu\lambda|\nu\sigma), \tag{3}$$

where $P_{\lambda\sigma}$ are elements of the density matrix. As is common in direct SCF methods, by combining Eqs. (1) and (3), the contributions to the KS (or Fock) matrix can be evaluated directly, such that the contracted ERIs never need to be explicitly calculated and stored in memory. Yasuda was the first to realize the potential of GPUs for the acceleration of ERI calculations [18]. In his work, the major problems hindering algorithm development on GPUs are addressed, and results for the calculation of the Coulomb contribution to the KS matrix with s- and p-type basis functions are presented for a CUDA implementation. Although it is not the most efficient algorithm for ERIs over basis functions with low angular momentum quantum numbers, the Rys quadrature [19] scheme was chosen because its low memory requirements allow one to maximize the load balance across the GPU's SMs. A new interpolation formula for the roots and weights of the quadrature, which is particularly suitable for SIMD processing, was proposed, and an error analysis of the quadrature was given. A mixed-precision (MP) CPU/GPU scheme was introduced which calculates the largest ERIs in DP on the CPU and the remaining ERIs in SP on the GPU, such that the absolute error in the calculated ERIs can be controlled. The largest ERIs are identified by prescreening with the Schwarz integral bound, $|(\mu\nu|\lambda\sigma)| \le (\mu\nu|\mu\nu)^{1/2}(\lambda\sigma|\lambda\sigma)^{1/2}$, and an adjustable threshold. Together with data accumulation (Coulomb matrix formation) via 48-bit multiprecision addition (which can be implemented in software on GPUs without DP support), this MP scheme leads to accurate DFT SCF energies, whereas the errors are of the order of $10^{-3}$ au (around 1 kcal mol$^{-1}$) if all ERIs are calculated on the GPU. The contributions to the Coulomb matrix are computed directly from the uncontracted ERIs in a SIMD fashion on the GPU, which avoids having to transfer the large number of ERIs from GPU to CPU memory. Instead, only the density and Coulomb matrices have to be transferred.
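The direct evaluation of Eq. (3) and the mixed-precision split described above can be sketched in a few lines of Python. In this sketch (an illustration, not Yasuda's code), the ERIs come from a caller-supplied function, every quartet is Schwarz-screened, quartets whose bound reaches a threshold `tau_dp` keep their double-precision value (the "CPU" path), and all others are rounded to single precision to emulate evaluation on an SP-only GPU. Accumulation is done in ordinary double precision, standing in for the 48-bit multiprecision addition. The names `build_jk_mixed`, `to_float32`, and `tau_dp` are hypothetical.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def build_jk_mixed(P, eri, tau_dp):
    """Direct J/K build (Eq. 3) with a Schwarz-based DP/SP split (sketch).

    P      : n x n density matrix (list of lists)
    eri    : callable eri(m, k, l, s) returning (mk|ls) in double precision
    tau_dp : quartets whose Schwarz bound is >= tau_dp keep full DP values;
             all others are rounded to SP, emulating an SP-only device.
    No permutational symmetry is exploited, to keep the indexing obvious.
    """
    n = len(P)
    # Schwarz factors Q_mk = sqrt(|(mk|mk)|), so that |(mk|ls)| <= Q_mk * Q_ls.
    Q = [[math.sqrt(abs(eri(m, k, m, k))) for k in range(n)] for m in range(n)]
    J = [[0.0] * n for _ in range(n)]
    K = [[0.0] * n for _ in range(n)]
    n_dp = n_sp = 0
    for m in range(n):
        for k in range(n):
            for l in range(n):
                for s in range(n):
                    g = eri(m, k, l, s)
                    if Q[m][k] * Q[l][s] >= tau_dp:
                        n_dp += 1          # "CPU" path: keep the DP value
                    else:
                        g = to_float32(g)  # "GPU" path: SP value
                        n_sp += 1
                    # Accumulation in DP (stand-in for multiprecision addition).
                    J[m][k] += P[l][s] * g   # J_mk = sum_ls P_ls (mk|ls)
                    K[m][l] += P[k][s] * g   # K_ml = sum_ks P_ks (mk|ls)
    return J, K, n_dp, n_sp
```

Because the Schwarz bound is an upper bound on $|(\mu\nu|\lambda\sigma)|$, raising `tau_dp` moves more quartets to the SP path and trades accuracy for speed, which mirrors the role of the adjustable threshold in the MP scheme.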
If all ERIs are evaluated on the GPU (NVIDIA GeForce 8800 GTX), speedups of around one order of magnitude have been observed for the formation of the Coulomb matrix for molecules as large as valinomycin (168 atoms) with a 6-31G basis set, as compared to a conventional implementation running on a 2.8 GHz Intel Pentium 4 CPU [18]. If some of the ERIs are calculated on the CPU to reduce the error in the total energy to $10^{-6}$ au (less than $10^{-3}$ kcal mol$^{-1}$), the speedup drops to around three. However, as Yasuda states, there is room for improvement in performance, for example, through pipelining and potentially also by exploiting the DP functionality of current and future GPUs. Ufimtsev and Martinez (UM) have also developed CUDA kernels for the calculation of ERIs and Fock matrix formation involving s- and p-type basis functions on GPUs [20,21]. They opted for the McMurchie–Davidson [22] scheme because it requires relatively few intermediates per integral, resulting in a low memory requirement similar to that of the Rys quadrature. Three different mappings of the computational work to thread blocks have been tested, which result in different load balancing and data reduction overhead, and the ERI kernels have been carefully optimized accordingly [21]. If the Fock matrix contributions are evaluated directly from the primitive ERIs, it is most efficient to assign the calculation of each primitive ERI batch (i.e., all ERIs over the basis functions with all magnetic quantum numbers belonging to the given angular momentum quantum numbers) to one thread, independent of the contraction length of the basis functions. In order to maximize load balancing, the integral batches are presorted into blocks of angular momentum classes [21] and, within these blocks, according to their magnitude [23]. As in Yasuda's work [18], the Fock matrix elements are computed directly on the GPU, but pre- and postprocessing are done on the CPU. This approach has been parallelized over multiple GPUs [23]. HF SCF calculations with 3-21G and 6-31G basis sets using UM's implementation and an NVIDIA GTX280 card can be more than 100 times faster than the quantum chemistry program package GAMESS [24] running on a single 3.0 GHz Intel Pentium D CPU [25]. For small- and medium-sized molecules, most of the time is spent in the Fock matrix formation on the GPU. However, for large molecules such as olestra (453 atoms, 2131 basis functions), the linear algebra (LA) required for the solution of the SCF equations starts to become a bottleneck, requiring as much as 50% of the Fock matrix computation time (with the LA performed on the GPU using CUBLAS). A parallel efficiency of over 60% was achieved on three NVIDIA GeForce 8800 GTX cards relative to a single graphics accelerator. Two points should be mentioned here.
First, the limitation to s- and p-type functions results in small integral blocks that can be treated entirely in shared memory and registers, which means that the ratio of computation to memory access is high. This situation will change for basis functions with higher angular momentum quantum numbers. Second, the Rys quadrature [19] used by GAMESS in these comparisons is a legacy Fortran implementation that underperforms on modern CPUs [26]. ERI algorithms that are more efficient on CPUs do exist, and less favorable GPU speedups should be expected in comparisons against implementations of these algorithms that are optimized for performance on modern CPUs. The error in the SCF energies obtained with UM's code due to the use of SP arithmetic quickly exceeds $10^{-3}$ au (chemical accuracy, less than 1 kcal mol$^{-1}$) for larger molecules [23]. However, ERI evaluation in SP combined with data accumulation in DP, which can be performed on newer GPUs at negligible additional computational cost, improves the accuracy to this level in all investigated cases. In addition, error compensation in relative energies was observed, presumably due to cancellation of the errors in the contributions of large ERIs. For larger molecules, however, computation of the larger ERIs in DP will be required, as has been extensively discussed before by Yasuda [18]. UM have also implemented the calculation of the Coulomb and exact-exchange contributions to the analytical HF energy gradients with s- and p-type basis functions on GPUs [25]. Using the 3-21G basis set, speedups ranging from 6 for small molecules to over 100 for larger molecules (olestra) have been obtained running in parallel on a system equipped with two NVIDIA GTX295 cards (each of which has two GPUs) and a 2.66 GHz Intel Core2 quad-core CPU. Reference was again made to GAMESS, running in parallel on all four CPU cores. Using the mixed SP/DP approach discussed above, the root mean square error in the forces is of the order of $10^{-5}$ au, which is close to typical convergence thresholds for geometry optimizations. Geometry optimization of a helical hepta-alanine was shown to yield a structure in good agreement with GAMESS results, with an error in the final energy of only 0.5 kcal mol$^{-1}$. Good energy conservation was shown for an HF Born–Oppenheimer molecular dynamics simulation of an H₃O⁺(H₂O)₃₀ cluster with the 6-31G basis set in the microcanonical ensemble, using the velocity Verlet algorithm with a time step of 0.5 fs; an energy drift of 0.022 kcal mol$^{-1}$ ps$^{-1}$ was observed over a simulation time of 20 ps. Recently, Asadchev et al. presented algorithms and a CUDA implementation for the calculation of uncontracted ERIs over basis functions up to g type [26]. They chose the Rys quadrature [19], which, in addition to its low memory footprint, is efficient for integrals with higher angular momentum. The major problem is that, unlike numerical LA kernels, the quadrature has very complex memory access patterns, which span a large data set and depend on the particular ERI class being evaluated. As an example, an (ff|ff) ERI shell block requires 5376 floating-point numbers for intermediate quantities, which are reused multiple times, and $10^4$ floating-point numbers for the final ERIs [26]. In DP this corresponds to (5376 + 10,000) × 8 = 123,008 bytes, which is much larger than the cache sizes available on GPUs. Therefore, these intermediates must be stored in and loaded from the device memory as required, and it becomes mandatory to arrange the parallel calculation of the ERIs in such a way that these memory loads are minimized.
For this purpose, the integrals in a shell block are reordered such that intermediates can be reused as often as possible. Another problem is the large amount of code required to cover all possible cases of integral types efficiently. The authors therefore adopted a template-based approach in which all cases can be generated automatically from a single template. The performance of these GPU ERI kernels was tested on NVIDIA GeForce GTX 275 and NVIDIA Tesla T10 cards and compared to that of the ERI evaluation with the Rys quadrature as implemented in GAMESS (which, as noted above, underperforms on modern CPUs) [26]. While the CPU code achieves around 1 GFLOPS (giga floating-point operations per second), the GPUs achieve around 25 GFLOPS in DP, approximately 30% of the theoretical DP peak performance, and 50 GFLOPS in SP. The performance in SP and DP thus differs by only about a factor of 2, although DP arithmetic on these GPUs runs at only an eighth of the SP rate; since a DP number occupies twice the memory of an SP number, this shows that the computations are memory bound rather than compute bound. No timings are given for the data transfer between GPU memory and main memory, apart from the statement that it takes several times longer than the actual execution time of the ERI kernels. Clearly, in order to retain the speed advantage of the ERI evaluation on the GPU, the processing of the ERIs (e.g., Fock matrix formation) must be implemented on the GPU device as well.
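The (ff|ff) figures quoted above, and the combinatorics that make template-generated kernels attractive, can be checked with a short Python sketch. It counts Cartesian components, enumerates the unique ERI angular momentum classes up to g functions under the usual eightfold permutational symmetry, and reproduces the 123,008-byte DP footprint of an (ff|ff) block; the 5376 Rys intermediates are taken from the text and are not derived here. The kernel-naming function is a hypothetical stand-in for real template instantiation.

```python
from itertools import combinations_with_replacement

L_LABELS = "spdfg"


def n_cart(l: int) -> int:
    """Number of Cartesian Gaussian components for angular momentum l."""
    return (l + 1) * (l + 2) // 2


def unique_eri_classes(lmax: int):
    """Unique (ab|cd) angular momentum classes up to lmax.

    Uses the eightfold permutational symmetry of real ERIs: each shell pair is
    stored once as (la >= lb), and each unordered pair of shell pairs once.
    """
    pairs = [(a, b) for a in range(lmax + 1) for b in range(a + 1)]
    return list(combinations_with_replacement(pairs, 2))


def kernel_name(bra, ket) -> str:
    """Hypothetical name of the kernel instantiated from a template for one class."""
    (la, lb), (lc, ld) = bra, ket
    return "eri_" + "".join(L_LABELS[l] for l in (la, lb, lc, ld))


def block_footprint(la, lb, lc, ld, n_intermediates=0, word_bytes=8):
    """Number of final ERIs in a shell block and bytes including intermediates."""
    n_eri = n_cart(la) * n_cart(lb) * n_cart(lc) * n_cart(ld)
    return n_eri, (n_eri + n_intermediates) * word_bytes


if __name__ == "__main__":
    classes = unique_eri_classes(4)
    print(len(classes), "classes, e.g.", kernel_name(*classes[-1]))
    # (ff|ff) with the 5376 Rys intermediates quoted in the text, in DP.
    print(block_footprint(3, 3, 3, 3, n_intermediates=5376))
```

Even with permutational symmetry there are 120 distinct classes up to (gg|gg), each with its own memory footprint and access pattern, which is why generating the specialized kernels from a single template is preferable to writing them by hand.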


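3.2 Numerical exchange-correlation quadrature

In the generalized gradient approximation (GGA) to DFT, the XC potential $v_{\mathrm{XC}}^{\mathrm{GGA}}$ depends on the electron density $\rho$ and its gradient $\nabla\rho$ and is a complicated function in three-dimensional space. This makes an analytical evaluation of the XC integrals impossible, and numerical quadrature is used to compute the XC matrix elements,

$$V_{\mu\nu}^{\mathrm{XC(GGA)}} = \int \phi_\mu(\mathbf{r})\, v_{\mathrm{XC}}^{\mathrm{GGA}}(\mathbf{r})\, \phi_\nu(\mathbf{r})\, d\mathbf{r} \approx \sum_k w_k\, \phi_\mu(\mathbf{r}_k)\, v_{\mathrm{XC}}^{\mathrm{GGA}}(\mathbf{r}_k)\, \phi_\nu(\mathbf{r}_k), \tag{4}$$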

where $\mathbf{r}_k$ are the quadrature points and $w_k$ the corresponding weights. The numerical XC quadrature is perfectly suited for parallelization, and Yasuda was the first to exploit GPUs for this purpose [27]. He adopted a strategy in which the computationally less demanding steps of the quadrature (grid generation and evaluation of $v_{\mathrm{XC}}^{\mathrm{GGA}}$ on the grid points) are done in DP on the CPU, while the expensive steps are done on the GPU. These are the evaluation of $\rho$ and $\nabla\rho$ on the grid points and the summation of Eq. (4), which can be formulated as matrix-vector multiplications and dot products. Both steps are organized in batches of grid points and non-negligible basis functions that are small enough to be kept entirely in shared memory. Although in this way some of the basis function values on the grid points must be recalculated, this is more than compensated for by the low latency of the shared memory. In order to deal with roundoff errors due to the use of SP floating-point numbers on the GPU, Yasuda introduced a scheme in which the XC potential is approximated with a model potential $v_{\mathrm{XC}}^{\mathrm{model}}$, which is chosen such that its matrix elements can be calculated analytically. This is done in DP on the CPU, while the GPU is used for calculating the correction, that is, for the numerical quadrature of the matrix elements of $\Delta v_{\mathrm{XC}} = v_{\mathrm{XC}}^{\mathrm{GGA}} - v_{\mathrm{XC}}^{\mathrm{model}}$. Because this correction is much smaller than $v_{\mathrm{XC}}^{\mathrm{GGA}}$ itself, the absolute SP roundoff error in its quadrature is correspondingly smaller. Without the model potential, errors in the total energy of valinomycin with a 3-21G or 6-31G basis set and the PW91 [28] XC functional are close to $10^{-4}$ au. With the model potential approach, the error is reduced to $10^{-5}$ au, which is sufficient for most purposes. A speedup of approximately 40 is observed with an NVIDIA GeForce 8800 GTX graphics card as compared to a conventional implementation running on a 2.8 GHz Intel Pentium 4 CPU. This translates into a speedup of around five to ten as compared to more modern CPUs.
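The structure of Eq. (4), and the batching of grid points described above, can be sketched in Python. To stay self-contained, the sketch is one-dimensional and uses Gaussians centred at the origin, a trapezoidal grid, and a caller-supplied potential in place of a real GGA functional; all of these are simplifying assumptions, and the function names are hypothetical. Each batch evaluates the basis functions once and then forms the matrix elements as weighted dot products.

```python
import math


def trapezoid_grid(lo: float, hi: float, npts: int):
    """Uniform 1D grid with trapezoidal quadrature weights."""
    h = (hi - lo) / (npts - 1)
    xs = [lo + i * h for i in range(npts)]
    ws = [h] * npts
    ws[0] = ws[-1] = 0.5 * h
    return xs, ws


def xc_matrix_batched(exponents, grid, weights, v_xc, batch_size=64):
    """Quadrature of Eq. (4): V_mn ~ sum_k w_k phi_m(r_k) v(r_k) phi_n(r_k).

    Toy 1D model: phi_m(x) = exp(-a_m x^2). Grid points are processed in batches,
    as a GPU kernel would keep one batch of basis-function values on chip.
    """
    nb = len(exponents)
    V = [[0.0] * nb for _ in range(nb)]
    for start in range(0, len(grid), batch_size):
        xs = grid[start:start + batch_size]
        ws = weights[start:start + batch_size]
        # Basis-function values on this batch (nb x batch): computed once per batch.
        phi = [[math.exp(-a * x * x) for x in xs] for a in exponents]
        # Weighted potential w_k * v(r_k) on the batch.
        wv = [w * v_xc(x) for x, w in zip(xs, ws)]
        # V += Phi diag(wv) Phi^T, written as dot products over the batch.
        for m in range(nb):
            for n in range(m + 1):
                acc = sum(pm * c * pn for pm, c, pn in zip(phi[m], wv, phi[n]))
                V[m][n] += acc
                if n != m:
                    V[n][m] += acc
    return V
```

With $v_{\mathrm{XC}} \equiv 1$ the result must reduce to the overlap matrix, here $\sqrt{\pi/(a_m + a_n)}$ in one dimension, and it must not depend on the batch size; both are convenient sanity checks when such a kernel is ported to a GPU.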

3.3 Density-fitted Poisson method

Brown et al. have presented a different heterogeneous approach to accelerate DFT, combining ClearSpeed accelerator cards [29] in parallel with a host CPU [30,31]. The ClearSpeed accelerator hardware is a compute-oriented stream architecture with raw performance comparable to that of modern GPUs, while offering support for DP. Just as for GPUs, efficient use of this hardware requires fine-grained parallelization with a large number of lightweight threads, so algorithms developed for these accelerators should map well onto GPUs. By using the Poisson density fitting method, all bottlenecks of a DFT calculation could be shifted into finely parallelizable numerical quadrature. Density fitting [32,33], also called the resolution-of-identity (RI) [34] Coulomb method, is used to avoid the need to calculate the four-index ERIs of Eq. (1). Instead, the Coulomb contributions to the KS matrix are obtained from three-center ERIs $(\mu\nu|\alpha)$, where $\varphi_\alpha$ are auxiliary density fitting basis functions. As a result, the formal scaling of this step becomes $O(N^3)$ and the prefactor is reduced. The auxiliary basis set can be chosen to consist of a few atom-centered Gaussian functions augmented with Poisson functions, obtained by applying the Poisson operator $\hat{p} = -(4\pi)^{-1}\nabla^2$ to atom-centered Gaussian functions. Because $\hat{p}$ acting on the Coulomb kernel $1/|\mathbf{r}-\mathbf{r}'|$ yields a delta function, the majority of the three-index ERIs is thereby replaced with short-ranged three-index overlap integrals $(\mu\nu,\alpha) = \int \phi_\mu(\mathbf{r})\,\phi_\nu(\mathbf{r})\,\varphi_\alpha(\mathbf{r})\,d\mathbf{r}$. This leads to a further reduction of the prefactor. Furthermore, these overlap integrals can be calculated by numerical quadrature. However, to maintain numerical stability in the SCF procedure, a higher accuracy than that provided by default XC quadrature grids is required, which increases the number of grid points. The implementation, which is not restricted to basis functions with low angular momentum quantum numbers, passes only information about the numerical quadrature grid, the basis functions, the KS matrix, and the density matrix between the accelerator cards and the host system. The numerical quadrature of the XC contribution and of the Coulomb contribution due to the integrals $(\mu\nu,\alpha)$ is done on the accelerator cards in batches of grid points such that all computations can be done within the cache memory of the accelerator cards. All other parts of the DFT calculation are performed on the host CPU.
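The generic density-fitting step that replaces the four-index ERIs can be sketched as follows: the fit coefficients are obtained by solving a linear system with the two-index metric, and the Coulomb matrix is then assembled from the three-index integrals. This Python sketch takes precomputed three-index integrals $(\mu\nu|\alpha)$ and a metric $(\alpha|\beta)$ as input and uses the conventional Coulomb-metric formulation; it does not implement the Poisson-function variant or the numerical quadrature used by Brown et al., and the function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("singular fitting metric")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def coulomb_density_fitted(P, three_index, metric):
    """Density-fitted Coulomb matrix (sketch).

    gamma_a = sum_mk P_mk (mk|a);  metric @ c = gamma;  J_mk = sum_a (mk|a) c_a.
    three_index[m][k][a] holds (mk|a); metric[a][b] holds (a|b).
    """
    n, naux = len(P), len(metric)
    gamma = [sum(P[m][k] * three_index[m][k][a] for m in range(n) for k in range(n))
             for a in range(naux)]
    c = solve_linear(metric, gamma)
    return [[sum(three_index[m][k][a] * c[a] for a in range(naux)) for k in range(n)]
            for m in range(n)]
```

Only the three-index integrals and one small linear solve are needed, which is the source of the reduced scaling and prefactor mentioned above; in a production code the metric would be factorized once and reused across SCF iterations.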
Compared to an implementation with analytical evaluation of the integrals $(\mu\nu,\alpha)$ running on one core of a 2.6 GHz dual-core AMD Opteron 2218 CPU, a speedup between 7 and 15 was observed with 12 ClearSpeed xe620 cards for SCF single-point [30] and gradient [31] calculations. The calculations were run for molecules ranging in size from chorismate (24 atoms) to an alanine helix consisting of 12 monomers (123 atoms), with the 6-31G and cc-pVTZ basis sets and the corresponding density fitting basis sets. There is further room for improvement, for example, by implementing prescreening, which is missing so far. However, work done on the host is already becoming a bottleneck and needs to be addressed; the diagonalization, for example, takes approximately 30% of the total runtime.

3.4 Density functional theory with Daubechies wavelets

Another effort from the physics community should be mentioned here. The BigDFT software [35] is based on Daubechies wavelets instead of Gaussian basis functions and supports GPUs through the CUDA programming framework. It was shown to achieve a high parallel efficiency of 90% on parallel computers in which the cross-sectional bandwidth scales well with the number of processors. It uses a parallelized hybrid CPU/GPU programming model, and compared to the full CPU implementation, a constant speedup of up to six was achieved with the GPU-enabled version [35].


4. AB INITIO ELECTRON CORRELATION METHODS

The quantum chemist's traditional way to approximate solutions of the electronic Schrödinger equation is via so-called ab initio, wave function-based electron correlation methods. These methods improve upon the HF mean-field approximation by adding many-body corrections in a systematic way [15]. At the time of this writing, efforts to accelerate ab initio calculations with GPUs are scarce. However, this is expected to change in the near future because these methods are of critical importance whenever higher accuracy is required than can be achieved by DFT, or for types of interactions and properties for which DFT breaks down.

4.1 Resolution-of-identity second-order Møller–Plesset perturbation theory

Second-order Møller–Plesset perturbation theory (MP2) is the computationally least expensive and most popular ab initio electron correlation method [4,15]. Except for transition metal compounds, MP2 equilibrium geometries are of comparable accuracy to DFT. However, MP2 captures long-range correlation effects (like dispersion), which are lacking in present-day density functionals. The computational cost of MP2 calculations is dominated by the integral transformation from the atomic orbital (AO) to the molecular orbital (MO) basis, which scales as $O(N^5)$ with the system size. This four-index transformation can be avoided by introducing the RI integral approximation, which requires only the transformation of three-index quantities and reduces the prefactor without significant loss in accuracy [36,37]. This makes RI-MP2 the most efficient alternative for small- to medium-sized molecular systems for which DFT fails. Aspuru-Guzik and coworkers have worked on accelerating RI-MP2 calculations [38,39]. They exploited the fact that the step which dominates the computational cost of an RI-MP2 calculation essentially consists of matrix multiplications that generate the approximate MO integrals from the half-transformed three-index integrals $B_{ia,P}$,

$$(ia|jb) \approx \sum_P B_{ia,P}\, B_{jb,P}. \tag{5}$$

Here, $i, j$ ($a, b$) label occupied (virtual) MOs and $P$ labels auxiliary basis functions. CPU implementations proceed by multiplying, for each pair $ij$ of occupied orbitals, the $N_{\mathrm{virt}} \times N_{\mathrm{aux}}$ matrix $\mathbf{B}_i$ with elements $B_{ia,P}$ (where $N_{\mathrm{virt}}$ is the number of virtual orbitals and $N_{\mathrm{aux}}$ the number of auxiliary basis functions) by the transpose of $\mathbf{B}_j$. To take full advantage of GPUs for these matrix multiplications, the matrices have to be larger than a given threshold so that the impact of the bus latency when transferring the matrices from CPU to GPU memory is minimized. Depending on the system size (number of atoms, size of the basis sets employed), this is achieved by treating several pairs $ij$ of occupied orbitals together [38]. For the multiplication of general matrices that are too large to be held in the onboard memory of the GPU, a library has been developed [39,40]. As is established for standard parallel matrix multiplications, this library uses a two-dimensional block decomposition of the matrices. Partial matrix multiplications of these blocks are performed on the GPU with CUBLAS routines, and the results are accumulated on the CPU. To improve the numerical accuracy, a heterogeneous computing model is employed in which numerically large contributions to the final result are computed and accumulated on a DP device (in general the CPU) and the remaining small contributions are efficiently treated by the SP GPU device. It was shown that, with this MP approach, errors can be reduced by an order of magnitude at the cost of a moderate decrease in performance. Compared to the standard CPU implementation, speedups of 13.8, 10.1, and 7.8 were obtained on an NVIDIA Tesla C1060 GPU equipped with 4 GB of memory for the 168-atom molecule valinomycin in SP, MP, and DP, respectively. The corresponding errors in the correlation energy are −10.0 kcal mol$^{-1}$, −1.2 kcal mol$^{-1}$, and essentially zero, respectively [39]. While the largest speedup can be obtained by performing the matrix multiplications entirely in SP, the resulting error is larger than acceptable for chemical accuracy. Some performance penalty must therefore be accepted for the sake of accuracy. It was also shown that the ERI evaluation becomes computationally as expensive as the integral transformation [38]. We therefore anticipate that these approaches will be combined with those discussed in Section 3 for the ERI evaluation.
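The dominant RI-MP2 step, Eq. (5), and the two strategies mentioned above (grouping several occupied pairs into one larger multiplication, and a blocked matrix multiplication whose partial products are accumulated on the host) can be illustrated with a pure-Python sketch. It is an illustration only, not the library of refs. [39,40]: the tiles stand in for individual device GEMM calls, no precision splitting is performed, and the function names are hypothetical. The energy uses the standard closed-shell expression $E^{(2)} = \sum_{ijab} (ia|jb)\,[2(ia|jb) - (ib|ja)]\,/\,(\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b)$.

```python
def matmul_tiled(A, B, tile):
    """C = A B with a 2D block decomposition of C and blocked accumulation.

    Each (row-block, column-block, inner-block) product stands in for one
    GEMM call on the device; partial results are accumulated into C on the host.
    """
    n, kdim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, kdim, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        C[i][j] += sum(A[i][k] * B[k][j]
                                       for k in range(k0, min(k0 + tile, kdim)))
    return C


def transpose(M):
    return [list(col) for col in zip(*M)]


def rimp2_energy(B, e_occ, e_vir, j_batch=2, tile=2):
    """Closed-shell RI-MP2 correlation energy from B[i][a][P] via Eq. (5).

    For each occupied i, several B_j are stacked so that one larger product
    B_i [B_j1; B_j2; ...]^T replaces several small ones.
    """
    nocc, nvir = len(e_occ), len(e_vir)
    energy = 0.0
    for i in range(nocc):
        for j0 in range(0, nocc, j_batch):
            js = list(range(j0, min(j0 + j_batch, nocc)))
            stacked = [row for j in js for row in B[j]]          # (len(js)*nvir) x naux
            M = matmul_tiled(B[i], transpose(stacked), tile)     # nvir x (len(js)*nvir)
            for jj, j in enumerate(js):
                for a in range(nvir):
                    for b in range(nvir):
                        iajb = M[a][jj * nvir + b]               # (ia|jb)
                        ibja = M[b][jj * nvir + a]               # (ib|ja)
                        denom = e_occ[i] + e_occ[j] - e_vir[a] - e_vir[b]
                        energy += iajb * (2.0 * iajb - ibja) / denom
    return energy
```

Larger `j_batch` values give fewer but larger multiplications, which is what amortizes the bus latency on a GPU; the result is independent of both `j_batch` and `tile` up to roundoff.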

5. QUANTUM MONTE CARLO

Quantum Monte Carlo (QMC) [41] is one of the most accurate methods for solving the time-independent Schrödinger equation. As opposed to variational ab initio approaches, QMC is based on a stochastic evaluation of the underlying integrals. The method is easily parallelizable and scales as $O(N^3)$, albeit with a very large prefactor. Anderson et al. have shown [42] how to accelerate QMC calculations by executing the computationally intensive parts on a GPU, using GPU kernels that are explicitly optimized for cache usage and instruction-level parallelism. These parts are the evaluation of basis functions on grid points and, similar to the numerical XC quadrature and RI-MP2, matrix multiplications. The Kahan summation formula was explored to improve the accuracy of the GPU matrix multiplications; this was necessary because GPUs in 2007 lacked fully IEEE-compliant floating-point implementations. For small molecules with 8–28 atoms (32–152 electrons and 80–516 basis functions), a speedup of approximately five was obtained using an NVIDIA GeForce 7800 GTX graphics card as compared to an optimized implementation running on a 3 GHz Intel Pentium 4 CPU. Meredith et al. have used an implementation of the quantum cluster approximation on SP GPUs to study the effect of disorder on the critical temperature for superconductivity in cuprates with a two-dimensional Hubbard model on a regular lattice [43]. Only trivial modifications to the code base were made, namely performing the matrix multiplications on the GPU using the CUBLAS library. Attempts to increase the performance by circumventing the data transfer bottleneck and implementing the remaining data manipulations on the GPU instead of the CPU resulted in a performance loss for all but the largest problem size that was investigated. The simple reason is that smart algorithms that can be implemented efficiently on CPUs do not map well onto GPU architectures; in other words, the GPU has to do more work to achieve the same result. For the largest problem size studied, a fivefold speedup was observed on a cluster with 32 AMD Opteron 2.6 GHz CPUs and 32 NVIDIA 8800 GTX graphics cards as compared to using only the CPUs in parallel. Comparison with DP results obtained on a CPU demonstrated that the accuracy is sufficient for scientifically meaningful results within the employed model.
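The accuracy problem that motivated the use of Kahan summation, and the DP accumulation of SP quantities discussed in Section 3, can be reproduced on a CPU by rounding every operation to single precision. The Python sketch below emulates SP arithmetic with the standard `struct` module and compares naive SP summation, Kahan compensated summation in SP, and accumulation of the same SP inputs in DP. It is a generic illustration, not the implementation of Anderson et al.; the function names are hypothetical.

```python
import struct


def f32(x: float) -> float:
    """Round to IEEE single precision (emulates an SP-only device)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_sp_naive(values):
    """Plain summation with the running sum rounded to SP after every add."""
    s = 0.0
    for v in values:
        s = f32(s + v)
    return s


def sum_sp_kahan(values):
    """Kahan compensated summation with every operation rounded to SP."""
    s = 0.0
    c = 0.0  # running compensation for low-order bits lost in previous adds
    for v in values:
        y = f32(v - c)
        t = f32(s + y)
        c = f32(f32(t - s) - y)
        s = t
    return s


def sum_dp(values):
    """SP inputs accumulated in DP (the 'evaluate in SP, accumulate in DP' idea)."""
    s = 0.0
    for v in values:
        s += v
    return s


if __name__ == "__main__":
    data = [f32(1.0)] + [f32(1e-8)] * 10_000
    print(sum_sp_naive(data), sum_sp_kahan(data), sum_dp(data))
```

Adding 10,000 copies of $10^{-8}$ to 1.0 shows the effect: each increment is smaller than half an SP unit in the last place of 1.0, so naive SP summation returns exactly 1.0, whereas the compensated SP sum and the DP accumulator both recover 1.0001 to within single-precision resolution.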

6. CONCLUDING REMARKS

Quantum chemistry software that exploits the capabilities of modern GPUs has only recently started to emerge. Significant parts of these initial efforts have been devoted to minimizing errors caused by the lack of DP support on older GPUs. The advent of next-generation GPUs that support DP arithmetic at a peak performance only a factor of 2 below that of SP will make these special approaches obsolete. At the same time, future developments will be greatly facilitated. From the literature, one can observe that, in order to achieve good results when programming GPUs, it is often necessary to write GPU-only versions of the code. One typically has to abandon many of the smart optimizations that have been developed over the years for CPUs, and expensive copy operations from CPU to GPU memory have to be minimized. With careful work, it is possible to achieve speedups that should allow researchers to perform calculations that would otherwise require large and expensive CPU clusters. However, the nature of GPU programming is such that significant effort is still required to make effective use of GPUs. These complexities are the reason that the quantum chemistry software available for GPUs at the time of this writing is still in its infancy and not yet ready for general use. GPU implementations that are capable of full HF and DFT calculations, for example, are still restricted to s- and p-type basis functions. HF calculations are not of much practical use by themselves, but only as a starting point for correlated ab initio methods, which require basis functions with high angular momentum quantum numbers. Similarly, meaningful DFT calculations have to use polarization functions, which means that even for simple organic molecules or biomolecules without metal atoms, at least d-type functions are required.
While GPU-based ERI implementations for high angular momentum basis functions have been developed, these still have to be incorporated into software capable of performing ab initio or DFT calculations. Up to now, only energies and gradients have been considered, which allows for explorations of potential energy surfaces. However, a variety of other quantum chemistry applications would also benefit from the computational power that GPUs provide. Of high interest to researchers are static and dynamic molecular response properties. These frequently require a higher computational effort than energy and gradient evaluations. We therefore expect to see developments in this area soon.


Andreas W. Götz et al.

We are looking forward to exciting new developments of quantum chemistry software for GPUs accompanied by ground-breaking applications in the near future.

ACKNOWLEDGMENTS This work was supported in part by grant 09-LR-06-117792-WALR from the University of California Lab Fees program and grant XFT-8-88509-01/DE-AC36-99GO10337 from the Department of Energy to RCW.


CHAPTER 3

Computing Free-Energy Profiles Using Multidimensional Potentials of Mean Force and Polynomial Quadrature Methods

Jonah Z. Vilseck and Orlando Acevedo

Contents

1. Introduction
2. Methods
3. Polynomial Quadrature Method
4. Multidimensional Potentials of Mean Force
5. Conclusion
Acknowledgments
References

Abstract

The accurate calculation of free-energy profiles for condensed-phase and enzymatic reactions is often computationally demanding when traditional methods are employed, such as combined quantum and molecular mechanical (QM/MM) simulations featuring configurational sampling. A novel polynomial fitting and analytical integration method was recently developed for proton transfer reactions that provides a seven-fold enhancement in calculation speed compared with traditional potential of mean force (PMF) methods and yields close agreement with experimental free energies of activation. In addition, the expansion of PMF simulations to monitor three simultaneous reaction coordinates was reported to enhance phase-space sampling, which is useful for accurately elucidating complex reaction mechanisms. This review focuses on the development of these methods and illustrates their utility with recent examples, including hydrolysis reactions in fatty acid amide hydrolase, Kemp elimination reactions in antibody 4B2 and in ionic liquid environments, and condensed-phase singlet oxygen ene reactions.


Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA

Annual Reports in Computational Chemistry, Volume 6
ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06003-2

© 2010 Elsevier B.V. All rights reserved.


Keywords: free-energy perturbation; QM/MM calculations; potentials of mean force; polynomial; free-energy profiles

1. INTRODUCTION The precise calculation of free-energy changes is vital for the characterization of reaction pathways and chemical equilibria. The challenge, however, is to obtain a reliable estimate for complex molecular systems within a reasonable allocation of computer time and resources [1–7]. Adequate sampling of the regions that contribute substantially to the free energy of fluid systems and flexible macromolecules has proven especially difficult computationally [8–11]; yet such systems are of central interest in many organic and enzymatic studies [12–15]. Multiple successful approaches have been reported for computing free-energy surfaces [16–20], but of specific interest to this work is the free-energy perturbation (FEP) technique, which uses the Zwanzig expression (Eq. (1)) to relate the free-energy difference between an initial (0) and a final (1) state of a system, i.e., a mutation [21]. When Eq. (1) is applied to chemical equilibria, the reference and target states are normally different molecules, A and B, and a change in medium is investigated by comparing the free-energy change for the mutation of A into B in two environments. Simulations performed through a thermodynamic cycle, such as the one shown in Scheme 1, facilitate the calculation of these free energies; here ΔGA and ΔGB are the free energies of transfer of A and B from medium 1 to medium 2, respectively. ΔG1 and ΔG2 are computed, and the medium effect is given by ΔΔG = ΔG2 − ΔG1 = ΔGB − ΔGA.

$$\Delta G(0 \rightarrow 1) = -k_B T \ln \left\langle \exp\left[ -\frac{E_1 - E_0}{k_B T} \right] \right\rangle_0 \qquad (1)$$
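As a minimal illustration of Eq. (1), the Python sketch below evaluates the exponential average from energy differences E1 − E0 collected on configurations sampled in state 0. The function name and the harmonic toy system (force constants k0 and k1, for which the exact answer is ½kBT ln(k1/k0)) are assumptions chosen for illustration; they are not part of the authors' MCPRO/BOSS workflow.

```python
import math
import random


def zwanzig_delta_g(delta_e, kT):
    """Estimate G(1) - G(0) with the Zwanzig exponential average of Eq. (1).

    delta_e: E1 - E0 (kcal/mol) for configurations sampled from state 0
    kT:      Boltzmann constant times temperature (kcal/mol)
    """
    exponents = [-de / kT for de in delta_e]
    # log-sum-exp: shift by the largest exponent so exp() cannot overflow
    shift = max(exponents)
    mean_boltz = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -kT * (shift + math.log(mean_boltz))


# Toy check (hypothetical system): U_lambda(x) = 0.5 * k_lambda * x**2
kT = 0.0019872 * 298.15          # kcal/mol at 25 degrees C
k0, k1 = 1.0, 1.2                # toy force constants, kcal/(mol*A^2)
rng = random.Random(7)
samples = [rng.gauss(0.0, math.sqrt(kT / k0)) for _ in range(200_000)]
delta_e = [0.5 * (k1 - k0) * x * x for x in samples]

dG = zwanzig_delta_g(delta_e, kT)
exact = 0.5 * kT * math.log(k1 / k0)
print(f"FEP estimate {dG:.4f} kcal/mol, exact {exact:.4f} kcal/mol")
```

Applying the same estimator to the A → B mutation in each medium would give ΔG1 and ΔG2, whose difference is the medium effect ΔΔG of Scheme 1.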

[Scheme 1: thermodynamic cycle. In medium 1, A → B (ΔG1); in medium 2, A → B (ΔG2); the vertical legs ΔGA and ΔGB transfer A and B from medium 1 to medium 2.]
Scheme 1 Free-energy cycle diagram for the mutation of two different molecules (A and B) in two different solvent media.

Instead of chemical mutations, free-energy changes may be computed as a function of some inter- or intramolecular coordinate, e.g., a bond distance between two atoms or a dihedral angle. The free-energy surface along the chosen coordinate is known as a potential of mean force (PMF). Several methods exist for carrying out PMF calculations [22]. This review focuses on our recent technical advances to traditional PMF simulations. Two novel nth-order polynomial integration approaches have been developed that significantly enhance the speed of traditional PMF calculations, e.g., from approximately 6 months to 3 weeks for a typical enzymatic proton transfer reaction, without a significant loss in accuracy [14,23,24]. In addition, the expansion of PMF simulations to monitor three simultaneous reaction coordinates has improved the predictive capabilities of the FEP simulations by creating 3-D free-energy profiles, analogous to 3-D and 4-D NMR experiments in which the 2-D spectrum is spread out over additional dimensions [8]. The multidimensional technique has proven powerful in addressing a long-standing mechanistic controversy over whether the singlet oxygen (¹O₂)-ene reaction follows a concerted or a stepwise pathway, while highlighting the inherent dangers of defining valley–ridge inflection (VRI) points on potential energy surfaces (PESs) using a limited number of reaction coordinates [8]. Successful applications of the polynomial quadrature method to accurate calculations of Kemp eliminations in catalytic antibody 4B2 [14] and in a condensed-phase ionic liquid environment [23] are also highlighted.

2. METHODS In our implementation, an FEP method is used in which the calculation is broken into a series of intermediate steps, or “windows,” defined by a coupling parameter λi. In this way, the overall perturbation is split into multiple small steps. For each intermediate step (λi → λi+1), the system undergoes Metropolis Monte Carlo (MC) statistical mechanics equilibration and averaging, and the resulting free-energy difference is determined. The total free-energy difference for the system is the sum over all intermediate windows. Double-wide sampling doubles the efficiency of these calculations by computing the λi → λi+1 and λi → λi−1 perturbations simultaneously from the same simulation [25]. Each window consists of a small geometric perturbation (Δλ = 0.01–0.05 Å, depending on the reaction studied) to ensure adequate overlap between the configurations sampled at λ and λ + Δλ; the windows are then combined to produce a 1-D PMF. MC sampling methods allow the required geometric perturbation coordinates simply to be fixed at the desired value. 2-D free-energy profiles can also be built as required by coupling a PMF simulation along one reaction coordinate with a second coordinate (2-D PMF). All reactions highlighted in this review used mixed quantum and molecular mechanical (QM/MM) calculations, with the PDDG/PM3 semiempirical QM method [26–28] for the QM region and the optimized potentials for liquid simulations all-atom (OPLS-AA) force field [29,30] for the MM region. The QM system consisted of the reacting substrates; for the enzymatic systems, any amino acids that participated in the reaction were also included. As is typical for proteins with more than 300 amino acids, residues more than 15 Å from the binding site were removed, which leaves the 150–200 residues nearest to the ligand. All clipped residues were capped with an acetyl or N-methylamine group.
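The bookkeeping behind double-wide sampling can be sketched as follows, assuming window centers spaced 2Δλ apart and, from each window i, a forward estimate G(λi + Δλ) − G(λi) and a backward estimate G(λi − Δλ) − G(λi). The function name and the toy profile are hypothetical; the point is that one simulation per center yields PMF points every Δλ, i.e., at half the window spacing.

```python
def double_wide_pmf(centers, forward, backward, delta):
    """Assemble a 1-D PMF from double-wide FEP windows.

    centers:  window centers lambda_i in ascending order, spaced 2*delta apart
    forward:  forward[i]  = G(lambda_i + delta) - G(lambda_i), from window i
    backward: backward[i] = G(lambda_i - delta) - G(lambda_i), from window i
    Returns (coordinates, free energies) every delta, with G = 0 at centers[0].
    """
    coords = [centers[0] - delta, centers[0]]
    pmf = [backward[0], 0.0]
    g = 0.0  # free energy at the current window center
    for i in range(len(centers) - 1):
        # point halfway to the next center, reached by window i's forward step
        coords.append(centers[i] + delta)
        pmf.append(g + forward[i])
        # next center: window i's forward step minus window i+1's backward step
        g += forward[i] - backward[i + 1]
        coords.append(centers[i + 1])
        pmf.append(g)
    coords.append(centers[-1] + delta)
    pmf.append(g + forward[-1])
    return coords, pmf


def toy_profile(x):
    """Hypothetical free-energy profile (kcal/mol) used only to check the bookkeeping."""
    return 2.0 * x**2 - x**3


delta = 0.05
centers = [0.0, 0.1, 0.2, 0.3]
fwd = [toy_profile(c + delta) - toy_profile(c) for c in centers]
bwd = [toy_profile(c - delta) - toy_profile(c) for c in centers]
coords, pmf = double_wide_pmf(centers, fwd, bwd, delta)
print(list(zip(coords, pmf)))
```

Four simulations thus give nine PMF points; in practice, each window's forward and backward values would come from an estimator such as Eq. (1) rather than from a known profile.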
Atomic charges in the QM region were computed using the CM3 charge model [31] and scaled by a factor of 1.12 for the protein systems and 1.14 for the solution-phase reactions in order to accurately reproduce experimental free energies of hydration [32]. Protein complexes originated from their corresponding crystal structures [33,34], and any initial bad contacts resulting from insertion of the reacting substrates into the active site were relaxed through conjugate-gradient energy minimizations. The QM and MM regions were connected through a modified link-atom approach [35] for the enzymatic reactions and through intermolecular Lennard-Jones and Coulombic interactions for the condensed-phase reactions. The total charge of the protein systems is normally made neutral by adjusting charges on the residues furthest away from the active site. The entire system was solvated in a 22 Å radius cap of 1000 TIP4P water molecules [36] for the proteins and in periodic boxes of 400–750 OPLS-AA solvent molecules for the condensed-phase reactions. To prevent evaporation of water during the protein simulations, a half-harmonic potential with a force constant of 1.5 kcal/(mol·Å²) was applied to the water cap at 22 Å from the center of the enzymatic system. All simulations were carried out with MCPRO (Monte Carlo for proteins) for the proteins and with BOSS (biochemical and organic simulation system) for the solution-phase reactions [37]. Solute–solvent and solvent–solvent cutoff distances of 12 Å were employed. All simulations were run at 25 °C and 1 atm. For the protein systems, each FEP window required 5 million (M) configurations of solvent relaxation, 10 M configurations of full equilibration, and 25–50 M configurations of averaging. The solution-phase reactions required a minimum of 5 M configurations of equilibration followed by 10 M configurations of averaging.
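The water-cap restraint described above can be written down in a few lines. The sketch below assumes the form U(r) = k(r − R)² for r > R, with k = 1.5 kcal/(mol·Å²) and R = 22 Å; whether the programs use k or ½k as the prefactor is an assumption here, and the function names are hypothetical.

```python
def cap_restraint_energy(r, radius=22.0, k=1.5):
    """Half-harmonic cap energy (kcal/mol) for a water at distance r (Å) from the
    center of the system: zero inside the cap, k*(r - radius)**2 outside.
    The prefactor convention (k rather than k/2) is an assumption."""
    if r <= radius:
        return 0.0
    return k * (r - radius) ** 2


def cap_restraint_force(r, radius=22.0, k=1.5):
    """Radial force -dU/dr (kcal/(mol*Å)); negative values push the water inward."""
    if r <= radius:
        return 0.0
    return -2.0 * k * (r - radius)


for r in (15.0, 22.0, 23.0, 24.0):
    print(r, cap_restraint_energy(r), cap_restraint_force(r))
```

Because the potential is flat inside the cap, waters near the solute feel no restraint; only molecules drifting past 22 Å are pulled back.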

3. POLYNOMIAL QUADRATURE METHOD In our recent work elucidating the hydrolysis mechanism of fatty acid amide hydrolase (FAAH), a cubic polynomial method was reported as a technical advance over conventional PMF simulations for proton transfers [24]. FAAH is an integral membrane protein involved in endocannabinoid metabolism that, remarkably, hydrolyzes amides and esters at similar rates [34,38,39]. Our study clarified the mechanisms and unusual selectivity by using a PDDG/PM3-based QM/MM/FEP approach to obtain free-energy barriers for the reaction pathways. Determining the origin of the rate accelerations derived from the unique Ser-Ser-Lys catalytic triad [40] in FAAH required the calculation of a large number of proton transfer reactions. Given the number of such reactions in the investigation, the use of traditional PMF methods would have been prohibitively resource-consuming. For a typical proton transfer, O–H···O′ → O···H–O′, it was found that the O···O′ distance remains relatively constant and that r(O–H) − r(H–O′) can be used as the coordinate for a 1-D PMF. Normally, these calculations are split into a series of 30 windows with 0.04 Å increments covering the proton transfer across the O···O′ distance; the resolution is half of the window size owing to the use of double-wide sampling. In our study of the FAAH catalytic mechanism, it was shown that the change in ΔG (i.e., ΔΔG) for the individual FEP windows could be fit almost perfectly by a cubic polynomial, as shown in Figure 1a [24]. Typically, summing all ΔΔG values relative to a reference value of ΔG = 0 in the reaction would yield ΔG for the overall proton transfer PMF (dotted line in Figure 1b); however, analytical integration of a polynomial fitted to only seven FEP windows (instead of the usual 30) also yields an accurate full PMF (solid line in Figure 1b).

[Figure 1 (a) ΔΔG (kcal/mol) versus the Ser→Lys proton transfer coordinate (Å), with a cubic fit, R² = 0.9912; (b) ΔG (kcal/mol) versus the Ser→Lys proton transfer coordinate (Å).]
Figure 1 For proton transfers, the changes in ΔG for seven individual windows (a) are fitted by a cubic polynomial, which is integrated analytically to give the full PMF (b, solid line). The exact PMF using 33 windows is shown for comparison (b, dotted line).

An average deviation of no more than 0.5 kcal/mol relative to the traditional PMF method was found when employing this polynomial quadrature method. Additionally, the 2-D PMF simulations carried out in the FAAH study always involved a proton transfer as one of the reaction coordinates, which allowed the cubic polynomial quadrature method to be used in at least one direction. This significantly reduced the time needed to compute a complete free-energy surface, from 900 windows requiring 6 months of real time on 20 simultaneous processors to 65 windows requiring 3 weeks. The cubic polynomial methodology has also been used with relatively good success in simulations of Kemp elimination reactions in antibody 34B4 [41] and as an essential part of the de novo design process for Baker’s Kemp elimination enzymes [42,43]. The cubic polynomial quadrature method does have drawbacks, the most notable being a significant reduction in accuracy when computing elimination reactions. This was first observed in our studies of the catalytic mechanism of antibody 4B2 [44–46] for the Kemp elimination of 5-nitro-benzisoxazole (Figure 2) [14]. To reproduce the experimental activation barrier accurately, most of the proton transfer PMFs had to be refined using every window, which significantly reduced the efficiency of the calculations. To overcome this drawback, higher-order polynomial fits were tested. A fifth-order polynomial was shown to fit the seven windows of the PMF simulation closely, and its integration yielded a sextic polynomial that agrees considerably better with the full “exact” PMF. A comparison of the different polynomial orders is presented in Figure 3.
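The sketch below shows one way to implement polynomial quadrature with NumPy (`numpy.polyfit`, `numpy.polyint`, `numpy.polyval`). It assumes that each window's ΔΔG, divided by the coordinate step it corresponds to, approximates the slope dG/dξ at the window's position, so that integrating the fitted polynomial gives the PMF up to a constant; the function name, the seven-point toy data, and this interpretation are illustrative assumptions rather than the authors' code. A cubic fit integrates to a quartic PMF and a fifth-order fit to a sextic one, matching the orders discussed above.

```python
import numpy as np


def polynomial_quadrature_pmf(xi, ddg, step, order=5, n_grid=161):
    """Fit per-window free-energy changes with a polynomial and integrate it analytically.

    xi:    reaction-coordinate values (Å) of the FEP windows, e.g. seven of them
    ddg:   free-energy change (kcal/mol) measured in each window for a step of `step` Å
    order: 3 for the cubic variant, 5 for the fifth-order variant
    Returns (grid, PMF on the grid with G = 0 at xi[0], R^2 of the fit).
    """
    xi = np.asarray(xi, dtype=float)
    slope = np.asarray(ddg, dtype=float) / step   # assumed estimate of dG/dxi per window
    coeffs = np.polyfit(xi, slope, order)         # least-squares polynomial fit
    resid = slope - np.polyval(coeffs, xi)
    r2 = 1.0 - np.sum(resid**2) / np.sum((slope - slope.mean()) ** 2)
    antiderivative = np.polyint(coeffs)           # polynomial of degree order + 1
    grid = np.linspace(xi[0], xi[-1], n_grid)
    pmf = np.polyval(antiderivative, grid) - np.polyval(antiderivative, xi[0])
    return grid, pmf, r2


def toy_pmf(x):
    """Hypothetical barrier-shaped profile (kcal/mol) whose slope has a fifth-power term."""
    return 10.0 - 40.0 * x**2 + 100.0 * x**6 / 6.0


def toy_slope(x):
    return -80.0 * x + 100.0 * x**5


windows = np.linspace(-0.8, 0.8, 7)   # seven windows, Å
step = 0.04                           # Å per window step (illustrative)
ddg = step * toy_slope(windows)       # idealized, noise-free window data

for order in (3, 5):
    grid, pmf, r2 = polynomial_quadrature_pmf(windows, ddg, step, order)
    exact = toy_pmf(grid) - toy_pmf(windows[0])
    print(f"order {order}: R^2 = {r2:.4f}, max |error| = {np.max(np.abs(pmf - exact)):.3f} kcal/mol")
```

In this noise-free toy case, the fifth-order fit reproduces the profile essentially exactly, whereas the cubic fit reports an R² close to 1 yet leaves a visibly larger error in the integrated PMF. Real window data carry statistical noise, so the comparison is only qualitative.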
Two proton transfers for the Kemp elimination of 5-nitro-benzisoxazole were modeled: one in aqueous solution with acetate acting as the base in a periodic box of 740 TIP4P water molecules (Figure 3, left), and one within antibody 4B2 using the enzymatic methodology described above (Figure 3, right). In the case of acetate, the reaction coordinates mimic those of the 4B2 system, except that a constant r(O–H) + r(H–C) value of 3.2 Å, as opposed to 2.85 Å, provided improved results.

[Figure 2 Contour map of free energy (kcal/mol, 1.5 kcal/mol bands from 0 to 30) versus R(N–O) (Å) and the Kemp→Glu L34 proton transfer coordinate (Å); transition state at ΔG = 17.1 kcal/mol.]
Figure 2 Free-energy profile for the Kemp elimination of 5-nitro-benzisoxazole by antibody 4B2. The reaction coordinate for the proton transfer is r(OH) − r(HC), with r(OH) + r(HC) = 2.85 Å. Maximum free-energy values are truncated to 30 kcal/mol for clarity.

[Figure 3 Relative free energy (kcal/mol) versus proton transfer from 5-nitro-benzisoxazole (Å) for the full FEP, cubic, and fifth-order results; left, acetate at R(N–O) = 1.38 Å; right, antibody 4B2 at R(N–O) = 1.42 Å.]
Figure 3 Proton transfers from 5-nitro-benzisoxazole to acetate (left) and antibody 4B2 (right) in water. The changes in ΔG (kcal/mol) are computed using the cubic and fifth-order polynomial methods, and the exact PMF using 50 windows.

Figure 3, which uses different fixed r(N–O) values as examples, makes it immediately clear that the fifth-order polynomial method reproduces the full 50-window “exact” PMF more accurately than the cubic polynomial method. In addition, the computed ΔG‡ of 25.6 ± 1.0 kcal/mol for the Kemp elimination of 5-nitro-benzisoxazole by acetate in water is in good agreement with the experimentally measured value of 23.8 kcal/mol [47]. The new methodology provided a seven-fold improvement in speed over traditional PMF methods for the enzymatic calculations. Higher-order polynomials were also tested but gave nearly identical energies and polynomial fits (R² = 0.999 to the seven windows) compared with the fifth-order fit. Furthermore, for the 5-nitro-benzisoxazole–antibody 4B2 system, the computed ΔG‡ of 17.1 kcal/mol, with an uncertainty of less than 1.0 kcal/mol (Figure 2), agrees well with the ΔG‡ of 16.2 ± 1.0 kcal/mol predicted by the full “exact” PMF simulation and with the experimental estimate of ΔG‡ = 19.7 kcal/mol under the reported conditions [45]. These simulations have been shown to provide highly accurate results while maintaining the computational speed enhancements reported for the cubic polynomial quadrature method; in addition, close agreement with experimental rate accelerations was also reported for an unrelated allylic isomerization of a β,α-unsaturated ketone in antibody 4B2 [14]. The free-energy profile for the Kemp elimination of 5-nitro-benzisoxazole by antibody 4B2 is shown in Figure 2. It was determined that proper alignment of the substrate is achieved through tight hydrogen bonds between a glutamate residue (Glu L34) and two ring-bound hydrogens.
This helps position the substrate for proton abstraction and subsequent ring opening. Additionally, a favorable aqueous microenvironment within the active site, consisting of four water molecules, was found to stabilize developing charges in the substrate, thus accelerating enzymatic activity. Initially, two water molecules were found to interact with the N and O atoms of the isoxazole group (at distances between 2.0 and 2.5 Å), along with a third water molecule that stabilized the anionic glutamate residue (with an HO–H···O–CO distance of 1.65 Å). Water played a particularly significant role in the transition structure, stabilizing the developing negative charge across the isoxazole N and O and thereby accelerating the reaction [48]. This is most clearly seen by comparison with the effect of placing a single acetonitrile molecule in the 4B2 active site. When QM/MM/FEP calculations were performed for this altered system, ΔG‡ was almost 2 kcal/mol higher than before (ΔG‡ = 18.2 ± 1.0 kcal/mol), in good agreement with experimental observations. The acetonitrile resides within the pocket where the water molecules previously stabilized the isoxazole N and O atoms, thus disrupting favorable hydrogen bonding and raising the overall activation energy. Comparable speed gains were also obtained for another Kemp elimination, the piperidine-catalyzed ring opening of benzisoxazole in the condensed-phase environment provided by the ionic liquid 1-butyl-3-methylimidazolium hexafluorophosphate, [BMIM][PF6] (Scheme 2) [23].

[Scheme 2: benzisoxazole and piperidine react in [BMIM][PF6] to give the ring-opened 2-cyanophenolate.]
Scheme 2 Kemp elimination reaction of benzisoxazole with piperidine.

Although these simulations were carried out as an initial test of recently developed OPLS-AA parameters for ionic liquids, the results show the range of applications of the polynomial methods. A 2-D free-energy surface was constructed from two reaction coordinates: proton abstraction by piperidine, r(N–H) − r(H–C), and opening of the benzisoxazole ring, r(N–O). The overall ΔG‡ for the reaction was determined to be 25.2 kcal/mol [23] after a cratic entropy correction of 1.89 kcal/mol [49]. This result agrees well with the experimental value of 22.6 ± 0.5 kcal/mol [50] when the computed and experimental uncertainties and the 1.0 kcal/mol overestimation for elimination reactions by the fifth-order polynomial method [14] are taken into account. Again, accurate results were obtained with a seven-fold increase in computational speed, highlighting the utility of this method in an ionic liquid environment.

4. MULTIDIMENSIONAL POTENTIALS OF MEAN FORCE PESs are useful for understanding how the chemical reactivity and selectivity of a system vary with structural change. However, larger molecules can be problematic to model because their many degrees of freedom produce a high-dimensional surface that is difficult to plot and visualize. This problem can be resolved by constructing the PES from a few selected degrees of freedom associated with bond formation and bond cleavage, usually one or two reaction coordinates. While this is useful for most systems, it is inadequate for studies that require the simultaneous modeling of three or more reaction coordinates. For example, our recent study of the condensed-phase ene reaction between singlet oxygen (¹O₂) and simple alkenes emphasized the importance of properly modeling a multidimensional PES featuring a VRI point [8]. The ene reaction between ¹O₂ and tetramethylethylene has been reported as the first experimentally supported example of a PES featuring a VRI with significant chemical consequences for product selectivity [51–54]. In that work, the PES was computed on a CCSD(T)/6-31G(d) grid featuring two bond-making coordinates between the attacking oxygen of ¹O₂ and the olefinic carbons (R1 and R2 in Figure 4a). A unique “two-step no-intermediate” mechanism was reported, in which two transition states are connected as sequential saddle points on a 3-D PES (Scheme 3a), in contrast to the more traditional stepwise mechanism featuring the rate-limiting symmetric addition of ¹O₂ to the alkene followed by formation of a charge-separated or biradical intermediate (Scheme 3b). However, the inherent dangers of defining VRI points using a limited number of reaction coordinates have been reported in the literature by several groups [55–60].

[Figure 4 (a) Reaction coordinates R1 and R2 (O–C bond formation) and ROH (proton abstraction). (b) Relative free energy (kcal/mol) along the reaction coordinate in water, DMSO, and cyclohexane: reactants, TSadd, perepoxide, TSabs, and allylic hydroperoxide product.]
Figure 4 (a) Ene reaction coordinates between ¹O₂ and tetramethylethylene and (b) free-energy profiles (kcal/mol) for the ene reaction in three solvents from 3-D PMF calculations.

Scheme 3 (A) Two-step no-intermediate mechanism for the ene reaction between ¹O₂ and tetramethylethylene, where (1) is the rate-limiting transition state and (2) is a perepoxide transition state; the VRI is marked. (B) Traditional ene reaction mechanism featuring a (2) perepoxide and (3) diradical or zwitterionic intermediate.

While Singleton et al. [51] provided significant validation of their PES in the gas phase, a new study was undertaken in the condensed phase that followed three simultaneous reaction coordinates: the two bonds that form between the attacking oxygen and both olefinic carbons (R1 and R2), as used in the previous calculation, and a proton abstraction coordinate between the allylic hydrogen and the terminal oxygen (ROH) (see Figure 4a) [8]. A novel 3-D PMF of the ene reaction between ¹O₂ and tetramethylethylene was computed in water, dimethyl sulfoxide, and cyclohexane by first holding one reaction coordinate (R2) fixed at values between 1.45 and 2.25 Å at intervals of 0.1 Å, while perturbing the remaining two coordinates (R1 and ROH) in increments of 0.05 Å. The resulting nine free-energy maps were then combined into a single energy surface by normalizing and perturbing the energies relative to the first map (R2 = 1.45 Å). To our knowledge, this was the first 3-D PMF simulation performed for an organic reaction featuring the simultaneous perturbation of three reaction coordinates. In our analyses of the ¹O₂-ene reaction PES, the 3-D PMF calculations revealed that the condensed-phase system followed a traditional stepwise mechanism featuring a symmetric charge-separated perepoxide intermediate (2) that is stabilized by increasing solvent polarity. Two additional 2-D PMF QM/MM/FEP PESs and subsequent ab initio calculations at the CCSD(T)/6-31G(d) and MP4(SDQ)/6-31G(d) levels of theory helped reinforce this conclusion. As most readily seen in Figure 4b, the reaction proceeds first through the rate-limiting transition structure, TSadd (1), to a perepoxide intermediate (2). The reaction then quickly passes through a second transition structure, TSabs, in which the terminal oxygen abstracts a proton to form the ene products. The charge separation present in the perepoxide intermediate is expected to be extremely sensitive to solvent polarity and hydrogen bonding; accordingly, a direct correlation between increasing perepoxide stability and increasing solvent polarity was found. In addition to the neglect of solvent, temperature, and entropy in the original study, a truncated description of the free-energy surface may also have contributed to the reported differences between the traditional and the “two-step no-intermediate” mechanisms. For example, when only the energy minima of the 3-D PMF free-energy maps were retained, the surface could be reduced to a 2-D surface, and the resulting solution-phase PES for the ¹O₂-ene reaction closely resembled the previous gas-phase “two-step no-intermediate” mechanism. The 3-D PMF methodology may have a significant impact on the exploration of other computationally derived PESs featuring VRI points and can be applied to any simple organic reaction that requires three reaction coordinates.
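Combining the nine fixed-R2 maps into one surface, and then reducing that surface to two dimensions by keeping minima along one coordinate, can be sketched with NumPy as follows. The sketch assumes that each map has its own arbitrary zero and that the free-energy change between neighboring R2 slices is known at one common reference grid point (for example, from a perturbation of R2). The function names, the reference point, the R1/ROH ranges, and the toy surface are hypothetical stand-ins for the normalization procedure described above.

```python
import numpy as np


def stitch_slices(maps, slice_steps, ref=(0, 0)):
    """Place 2-D maps G_k(R1, ROH), each computed at a fixed R2 with its own
    arbitrary zero, on the zero of the first map.

    slice_steps[k]: G(R2_{k+1}) - G(R2_k) at the common grid point `ref`
                    (hypothetical input, e.g. from perturbing R2)
    Returns an array indexed [R2, R1, ROH].
    """
    maps = [np.asarray(m, dtype=float) for m in maps]
    stitched = [maps[0] - maps[0][ref]]
    for k in range(1, len(maps)):
        target = stitched[-1][ref] + slice_steps[k - 1]
        stitched.append(maps[k] - maps[k][ref] + target)
    return np.stack(stitched)


def project_to_2d(surface, drop_axis):
    """Reduce a 3-D surface to 2-D, keeping the minimum along the dropped axis."""
    return surface.min(axis=drop_axis)


# Nine R2 slices from 1.45 to 2.25 Å in 0.1 Å steps; R1/ROH grids in 0.05 Å steps.
r2 = np.round(np.arange(1.45, 2.26, 0.10), 2)
r1 = np.round(np.arange(1.40, 2.41, 0.05), 2)    # hypothetical range
roh = np.round(np.arange(0.95, 1.96, 0.05), 2)   # hypothetical range
R2, R1, ROH = np.meshgrid(r2, r1, roh, indexing="ij")
true = 20 * (R1 - 1.8) ** 2 + 30 * (ROH - 1.3) ** 2 + 5 * (R2 - 1.9) ** 2  # toy surface

rng = np.random.default_rng(0)
maps = [true[k] + rng.uniform(-10, 10) for k in range(len(r2))]  # arbitrary zeros
steps = [true[k + 1][0, 0] - true[k][0, 0] for k in range(len(r2) - 1)]

surface = stitch_slices(maps, steps)
reduced = project_to_2d(surface, drop_axis=2)  # G(R2, R1), minimized over ROH
print(surface.shape, reduced.shape)
```

In this toy case the stitched surface matches the true surface exactly; with simulation data, the quality of the combined surface depends on the precision of the slice-to-slice free-energy changes.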

5. CONCLUSION

Two recent advances to traditional PMF simulations have been developed and discussed. They open new ways to speed up the calculation of free-energy profiles and the construction of multidimensional PESs. A polynomial quadrature method has been developed for efficient modeling of proton transfers. It uses polynomial fitting and analytical integration to yield energies with accuracies comparable to “exact” PMF simulations. The method was used to compute activation barriers for hydrolysis reactions in FAAH and for Kemp elimination reactions in antibody 4B2 and in ionic liquids. These barriers agreed closely with experimental results, and the method was about seven times more computationally efficient than traditional methods. A novel 3-D PMF method was also discussed as a computational route to model three reaction coordinates simultaneously. It was demonstrated for the ene reaction between ¹O₂ and tetramethylethylene. The technique helped to elucidate the complex reaction mechanism and may have a significant impact on the exploration of other computationally derived PESs featuring VRI points.
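The “polynomial fitting and analytical integration” idea can be illustrated generically. This is not the authors' exact protocol. Assumptions: the fitted quantity is the free-energy derivative dG/dξ sampled at a few points along a coordinate ξ, and a cubic fit is used. All names and data are hypothetical. The sketch fits a low-order polynomial to those derivatives and integrates it in closed form.

```python
import numpy as np
from numpy.polynomial import Polynomial

def profile_from_derivatives(xi, dG_dxi, degree=3):
    """Fit dG/dxi with a polynomial (least squares) and integrate it exactly.

    Returns a callable G(x) normalized so that G(xi[0]) = 0."""
    fit = Polynomial.fit(xi, dG_dxi, deg=degree)
    antiderivative = fit.integ()          # closed-form antiderivative in x
    return lambda x: antiderivative(x) - antiderivative(xi[0])

# Hypothetical derivative "measurements" at five points along xi.
# They are the exact derivative of G(x) = 2x^3 - 3x^2 + 5x.
xi = np.linspace(0.0, 1.0, 5)
dG = 6.0 * xi**2 - 6.0 * xi + 5.0
G = profile_from_derivatives(xi, dG)
print(round(float(G(1.0)), 6))  # 4.0
```

The fitted polynomial is integrated exactly, so the fit is the only approximation. With real, noisy derivatives, the polynomial degree and the number of sampled points control that error.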

Computing Free-Energy Profiles Using Multidimensional PMF and Polynomial Quadrature Methods


ACKNOWLEDGMENTS

Gratitude is expressed to Auburn University and the Alabama Supercomputer Center for support of this research. We also thank Dr. Ivan Tubert-Brohman and Professor William L. Jorgensen for their efforts on the cubic polynomial method and for helpful discussions.

REFERENCES
1. Rodinger, T., Pomes, R. Enhancing the accuracy, the efficiency and the scope of free energy simulations. Curr. Opin. Struct. Biol. 2005, 15, 164–70.
2. Simonson, T., Archontis, G., Karplus, M. Free energy simulations come of age: Protein-ligand recognition. Acc. Chem. Res. 2002, 35, 430–7.
3. van Gunsteren, W.F., Daura, X., Mark, A.E. Computation of free energy. Helv. Chim. Acta 2002, 85, 3113–29.
4. Kollman, P.A. Free energy calculations: Applications to chemical and biochemical phenomena. Chem. Rev. 1993, 93, 2395–417.
5. Straatsma, T.P., McCammon, J.A. Computational alchemy. Annu. Rev. Phys. Chem. 1992, 43, 407–35.
6. Beveridge, D.L., Dicapua, F.M. Free-energy via molecular simulation–applications to chemical and biomolecular systems. Annu. Rev. Biophys. Biophys. Chem. 1989, 18, 431–92.
7. Jorgensen, W.L. Free energy calculations, a breakthrough for modeling organic chemistry in solution. Acc. Chem. Res. 1989, 22, 184–9.
8. Sheppard, A.N., Acevedo, O. Multidimensional exploration of valley-ridge inflection points on potential energy surfaces. J. Am. Chem. Soc. 2009, 131, 2530–40.
9. Krivov, S.V., Karplus, M. Hidden complexity of free energy surfaces for peptide (protein) folding. Proc. Natl. Acad. Sci. USA 2004, 101, 14766–70.
10. Klähn, M., Braun-Sand, S., Rosta, E., Warshel, A. On possible pitfalls in ab initio quantum mechanics/molecular mechanics minimization approaches for studies of enzymatic reactions. J. Phys. Chem. B 2005, 109, 15645–50.
11. Mitchell, M.J., McCammon, J.A. Free-energy difference calculations by thermodynamic integration–difficulties in obtaining a precise value. J. Comput. Chem. 1991, 12, 271–5.
12. Acevedo, O., Jorgensen, W.L. Advances in quantum and molecular mechanical (QM/MM) simulations for organic and enzymatic reactions. Acc. Chem. Res. 2010, 43, 142–51.
13. Acevedo, O., Armacost, K. Claisen rearrangements: Insight into solvent effects and “on water” reactivity from QM/MM simulations. J. Am. Chem. Soc. 2010, 132, 1966–75.
14. Acevedo, O. Role of water in the multifaceted catalytic antibody 4B2 for allylic isomerization and Kemp elimination reactions. J. Phys. Chem. B 2009, 113, 15372–81.
15. Acevedo, O., Jorgensen, W.L. Solvent effects on organic reactions from QM/MM simulations. Annu. Rep. Comput. Chem. 2006, 2, 263–78.
16. Knight, J.L., Brooks, C.L., III. λ-Dynamics free energy simulation methods. J. Comput. Chem. 2009, 30, 1692–700.
17. Jorgensen, W.L., Thomas, L.L. Perspective on free-energy perturbation calculations for chemical equilibria. J. Chem. Theory Comput. 2008, 4, 869–76.
18. Chipot, C., Pohorille, A. Free Energy Calculations: Theory and Applications in Chemistry and Biology, Vol. 86, Springer, Berlin, 2007.
19. Chipot, C., Pearlman, D.A. Free energy calculations. The long and winding gilded road. Mol. Simulat. 2002, 28, 1–12.
20. Reynolds, C.A., King, P.M., Richards, W.G. Free energy calculations in molecular biophysics. Mol. Phys. 1992, 76, 251–75.
21. Zwanzig, R.W. High-temperature equation of state by a perturbation method. I. Nonpolar gases. J. Chem. Phys. 1954, 22, 1420–6.
22. Chipot, C., Pohorille, A. Calculating Free Energy Differences Using Perturbation Theory, Springer Series in Chemical Physics (Free Energy Calculations), Vol. 86, Springer, Berlin, 2007, pp. 33–75.


23. Sambasivarao, S.V., Acevedo, O. Development of OPLS-AA force field parameters for 68 unique ionic liquids. J. Chem. Theory Comput. 2009, 5, 1038–50.
24. Tubert-Brohman, I., Acevedo, O., Jorgensen, W.L. Elucidation of hydrolysis mechanisms for fatty acid amide hydrolase and its Lys142Ala variant via QM/MM simulations. J. Am. Chem. Soc. 2006, 128, 16904–13.
25. Jorgensen, W.L., Ravimohan, C. Monte Carlo simulation of differences in free energies of hydration. J. Chem. Phys. 1985, 83, 3050–4.
26. Repasky, M.P., Chandrasekhar, J., Jorgensen, W.L. PDDG/PM3 and PDDG/MNDO: Improved semiempirical methods. J. Comput. Chem. 2002, 23, 1601–22.
27. Tubert-Brohman, I., Guimarães, C.R.W., Repasky, M.P., Jorgensen, W.L. Extension of the PDDG/PM3 and PDDG/MNDO semiempirical molecular orbital methods to the halogens. J. Comput. Chem. 2003, 25, 138–50.
28. Tubert-Brohman, I., Guimarães, C.R.W., Jorgensen, W.L. Extension of the PDDG/PM3 semiempirical molecular orbital method to sulfur, silicon, and phosphorus. J. Chem. Theory Comput. 2005, 1, 817–23.
29. Jorgensen, W.L., Tirado-Rives, J. Potential energy functions for atomic-level simulations of water and organic and biomolecular systems. Proc. Natl. Acad. Sci. USA 2005, 102, 6665–70.
30. Jorgensen, W.L., Maxwell, D.S., Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 1996, 118, 11225–36.
31. Thompson, J.D., Cramer, C.J., Truhlar, D.G. Parameterization of charge model 3 for AM1, PM3, BLYP, and B3LYP. J. Comput. Chem. 2003, 24, 1291–304.
32. Udier-Blagovic, M., Morales de Tirado, P., Pearlman, S.A., Jorgensen, W.L. Accuracy of free energies of hydration from CM1 and CM3 atomic charges. J. Comput. Chem. 2004, 25, 1322–32.
33. Golinelli-Pimpaneau, B., Gonçalves, O., Dintinger, T., Blanchard, D., Knossow, M., Tellier, C. Structural evidence for a programmed general base in the active site of a catalytic antibody. Proc. Natl. Acad. Sci. USA 2000, 97, 9892–5.
34. Bracey, M.H., Hanson, M.A., Masuda, K.R., Stevens, R.C., Cravatt, B.F. Structural adaptations in a membrane enzyme that terminates endocannabinoid signaling. Science 2002, 298, 1793–96.
35. Guimarães, C.R.W., Udier-Blagovic, M., Jorgensen, W.L. Macrophomate synthase: QM/MM simulations address the Diels-Alder versus Michael-aldol reaction mechanism. J. Am. Chem. Soc. 2005, 127, 3577–88.
36. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926–35.
37. Jorgensen, W.L., Tirado-Rives, J. Molecular modeling of organic and biomolecular systems using BOSS and MCPRO. J. Comput. Chem. 2005, 26, 1689–700.
38. Cravatt, B.F., Giang, D.K., Mayfield, S.P., Boger, D.L., Lerner, R.A., Gilula, N.B. Molecular characterization of an enzyme that degrades neuromodulatory fatty-acid amides. Nature 1996, 384, 83–7.
39. Cravatt, B.F., Lichtman, A.H. Fatty acid amide hydrolase: An emerging therapeutic target in the endocannabinoid system. Curr. Opin. Chem. Biol. 2003, 7, 469–75.
40. McKinney, M.K., Cravatt, B.F. Evidence for distinct roles in catalysis for residues of the serine-serine-lysine catalytic triad of fatty acid amide hydrolase. J. Biol. Chem. 2003, 278, 37393–9.
41. Alexandrova, A.N., Jorgensen, W.L. Origin of the activity drop with the E50D variant of catalytic antibody 34E4 for Kemp elimination. J. Phys. Chem. B 2009, 113, 497–504.
42. Röthlisberger, D., Khersonsky, O., Wollacott, A.M., Jiang, L., DeChancie, J., Betker, J., et al. Kemp elimination catalysts by computational enzyme design. Nature 2008, 453, 190–5.
43. Alexandrova, A.N., Röthlisberger, D., Baker, D., Jorgensen, W.L. Catalytic mechanism and performance of computationally designed enzymes for Kemp elimination. J. Am. Chem. Soc. 2008, 130, 15907–15.
44. Yu, J., Hsieh, L.C., Kochersperger, L., Yonkovich, S., Stephans, J.C., Gallop, M.A., et al. Progress toward an antibody glycosidase. Angew. Chem. Int. Ed. 1994, 33, 339–41.
45. Genre-Grandpierre, A., Tellier, C., Loirat, M., Blanchard, D., Hodgson, D.R.W., Hollfelder, H., et al. Catalysis of the Kemp elimination by antibodies elicited against a cationic hapten. Bioorg. Med. Chem. Lett. 1997, 7, 2497–502.


46. Gonçalves, O., Dintinger, T., Lebreton, J., Blanchard, D., Tellier, C. Mechanism of an antibody-catalysed allylic isomerization. Biochem. J. 2000, 346, 691–8.
47. Hu, Y., Houk, K.N., Kikuchi, K., Hotta, K., Hilvert, D. Nonspecific medium effects versus specific group positioning in the antibody and albumin catalysis of the base-promoted ring-opening reactions of benzisoxazoles. J. Am. Chem. Soc. 2004, 126, 8197–205.
48. Warshel, A., Sharma, P.K., Kato, M., Xiang, Y., Liu, H., Olsson, M.H.M. Electrostatic basis for enzyme catalysis. Chem. Rev. 2006, 106, 3210–35.
49. Hermans, J., Wang, L. Inclusion of loss of translational and rotational freedom in theoretical estimates of free energies of binding. Application to a complex of benzene and mutant T4 lysozyme. J. Am. Chem. Soc. 1997, 119, 2707–14.
50. D’Anna, F., La Marca, S., Noto, R. Kemp elimination: A probe reaction to study ionic liquids properties. J. Org. Chem. 2008, 73, 3397–403.
51. Singleton, D.A., Hang, C., Szymanski, M.J., Meyer, M.P., Leach, A.G., Kuwata, K.T., et al. Mechanism of ene reactions of singlet oxygen. A two-step no-intermediate mechanism. J. Am. Chem. Soc. 2003, 125, 1319–28.
52. Singleton, D.A., Hang, C., Szymanski, M.J., Greenwald, E.E. A new form of kinetic isotope effect. Dynamic effects on isotopic selectivity and regioselectivity. J. Am. Chem. Soc. 2003, 125, 1176–7.
53. Leach, A.G., Houk, K.N. Diels–Alder and ene reactions of singlet oxygen, nitroso compounds and triazolinediones: Transition states and mechanisms from contemporary theory. Chem. Commun. 2002, 1243–55.
54. Leach, A.G., Houk, K.N., Foote, C.S. Theoretical prediction of a perepoxide intermediate for the reaction of singlet oxygen with trans-cyclooctene contrasts with the two-step no-intermediate ene reaction for acyclic alkenes. J. Org. Chem. 2008, 73, 8511–9.
55. Baker, J., Gill, P.M.W. An algorithm for the location of branching points on reaction paths. J. Comput. Chem. 1988, 9, 465–75.
56. Bosch, E., Moreno, M., Lluch, J.M., Bertran, J. Intrinsic reaction coordinate calculations for reaction paths possessing branching points. Chem. Phys. Lett. 1989, 160, 543–8.
57. Ramquet, M.-N., Dive, G., Dehareng, D. Critical points and reaction paths characterization on a potential energy hypersurface. J. Chem. Phys. 2000, 112, 4923–34.
58. Schlegel, H.B. Some thoughts on reaction-path following. J. Chem. Soc., Faraday Trans. 1994, 90, 1569–74.
59. Valtazanos, P., Ruedenberg, K. Bifurcations and transition states. Theor. Chim. Acta 1986, 69, 281–307.
60. Wales, D.J. Potential energy surfaces and coordinate dependence. J. Chem. Phys. 2000, 113, 3926–7.

CHAPTER 4

QM/MM Alchemical Free Energy Simulations: Challenges and Recent Developments

Wei Yang1,2, Qiang Cui3, Donghong Min1, and Hongzhi Li1

Contents

1. Introduction
2. Direct and Indirect Schemes for QM/MM AFE Simulations
   2.1 The direct scheme AFE simulations
   2.2 The indirect scheme AFE simulations
3. The Long-range Electrostatic Treatment in QM/MM AFE Simulations
4. The Sampling Issue in QM/MM AFE Simulations
   4.1 The first-order generalized ensemble-based QM/MM AFE simulations
   4.2 The orthogonal space random walk simulation method as a future scheme
5. Concluding Remarks and Future Perspectives
Acknowledgments
References

Abstract

Alchemical free energy (AFE) simulations can rigorously estimate the free energy difference between two chemical states. Traditionally, most AFE simulations have been carried out with classical potential energy functions, which limits their accuracy and applicability. Employing quantum mechanical (QM)-based potentials in AFE simulations is therefore a natural next step, particularly combined QM and molecular mechanical (QM/MM) potentials. To make such QM/MM AFE simulations routinely applicable and reliable for complex systems, several major challenges have to be met:

(1) ensuring structural integrity for robust electronic structure calculations when unphysical states are simulated;
(2) accurately describing long-range electrostatic interactions; and
(3) efficiently sampling configuration space to guarantee free energy convergence when costly QM/MM potentials are applied.

This review summarizes recent developments related to these challenges.

Keywords: alchemical free energy simulation; combined quantum mechanical/molecular mechanical potential; generalized ensemble simulation; conformational sampling; long-range electrostatic interaction

1 Institute of Molecular Biophysics, Florida State University, Tallahassee, FL, USA
2 Department of Chemistry and Biochemistry, Florida State University, Tallahassee, FL, USA
3 Department of Chemistry and Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI, USA

Annual Reports in Computational Chemistry, Volume 6 ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06004-4

© 2010 Elsevier B.V. All rights reserved.

Wei Yang et al.

1. INTRODUCTION

An ultimate goal of computational chemistry and biophysics is to quantitatively reproduce and predict experimentally measured values. Among all possible observables, the free energy changes that govern the equilibria of molecular and biomolecular processes are arguably the most important. An accurate prediction of free energy differences relies on two interrelated technical treatments: the underlying energy function and the sampling strategy. Computing power is improving rapidly, yet both factors still need careful consideration to balance prediction accuracy against efficiency.

There are two general types of free energy simulations [1], which employ either molecular dynamics (MD) or Monte Carlo (MC) as the fundamental sampling tool. The first type follows the free energy change as a physical process occurs, such as a chemical reaction or a conformational transition. Correspondingly, the actual process has to be sampled explicitly. The second type, by contrast, is concerned only with the free energy difference between a pair of end states. These calculations therefore do not have to follow the authentic physical path. A classic example is the solvation free energy difference between two chemical species A and B, $\Delta\Delta G = \Delta G_{\mathrm{gas}\to\mathrm{solution}}(B) - \Delta G_{\mathrm{gas}\to\mathrm{solution}}(A)$. The widely employed thermodynamic cycle [1–5] lets ΔΔG be obtained as the difference between the alchemical (A → B) free energy changes in the two limiting environments, $\Delta\Delta G = \Delta G_{\mathrm{solution}}(A\to B) - \Delta G_{\mathrm{gas}}(A\to B)$. This route avoids phase-space regions that are hard to sample, such as those associated with the gas–solution interface. Depending on the application, the same approach extends to ligand modifications, amino acid mutations, protonations/deprotonations, and electron transfers in half reactions.
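The two expressions for ΔΔG are equivalent because free energy is a state function. Going from A in the gas phase to B in solution along either side of the thermodynamic cycle must give the same total change, so

$$
\begin{aligned}
\Delta G_{\mathrm{gas}}(A\to B) + \Delta G_{\mathrm{gas}\to\mathrm{solution}}(B) &= \Delta G_{\mathrm{gas}\to\mathrm{solution}}(A) + \Delta G_{\mathrm{solution}}(A\to B)\\
\Rightarrow\quad \Delta G_{\mathrm{gas}\to\mathrm{solution}}(B) - \Delta G_{\mathrm{gas}\to\mathrm{solution}}(A) &= \Delta G_{\mathrm{solution}}(A\to B) - \Delta G_{\mathrm{gas}}(A\to B) = \Delta\Delta G.
\end{aligned}
$$

Only the two alchemical legs, A → B in each environment, need to be simulated. The physical gas-to-solution transfers are never sampled.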
This category of free energy simulation is commonly called “alchemical free energy (AFE) simulation” [2–15]. AFE simulations have most often been performed with the molecular mechanical (MM) energy model. In that model, alchemical transitions are readily achieved by constructing the hybrid potential energy function

$$U_o = U_s(\lambda) + U_e, \tag{1}$$

where the constraints $U_s(0) = U_s^A$ and $U_s(1) = U_s^B$ recover the energy terms $U_s^A$ and $U_s^B$ unique to the two end chemical states, represented by λ = 0 and λ = 1, respectively. $U_e$ represents the energy terms associated with the environment. One of the simplest forms of Eq. (1) is linear:

$$U_o = (1-\lambda)\,U_s^A + \lambda\,U_s^B + U_e. \tag{2}$$
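For the linear form in Eq. (2), $\partial U_o/\partial\lambda = U_s^B - U_s^A$. The free energy difference then follows from thermodynamic integration, $\Delta G = \int_0^1 \langle \partial U_o/\partial\lambda \rangle_\lambda\, d\lambda$. The sketch applies this to a hypothetical toy system: a 1-D harmonic "solute" whose force constant changes from $k_A$ to $k_B$, with no environment term. Here $\langle x^2 \rangle_\lambda = kT/k(\lambda)$ is known analytically, so the quadrature can be checked against the exact $\Delta G = (kT/2)\ln(k_B/k_A)$. In a real application, each window average would come from MD or MC sampling.

```python
import numpy as np

def ti_linear_harmonic(k_a, k_b, kT=1.0, n_windows=101):
    """Thermodynamic integration for U_o = (1 - lam) U_A + lam U_B with
    U_A = k_a x^2 / 2 and U_B = k_b x^2 / 2 (toy model, analytic averages)."""
    lam = np.linspace(0.0, 1.0, n_windows)
    k_lam = (1.0 - lam) * k_a + lam * k_b      # force constant of U_o(lam)
    mean_x2 = kT / k_lam                        # <x^2> in the lam ensemble
    dudl = 0.5 * (k_b - k_a) * mean_x2          # <dU_o/dlam> = <U_B - U_A>
    h = lam[1] - lam[0]
    # Trapezoidal rule over equally spaced lambda windows
    return h * (dudl.sum() - 0.5 * (dudl[0] + dudl[-1]))

dg_ti = ti_linear_harmonic(1.0, 16.0)
dg_exact = 0.5 * np.log(16.0)
print(f"TI: {dg_ti:.4f}  exact: {dg_exact:.4f}")
```

The quadrature error shrinks as λ windows are added. It is largest where $\langle \partial U_o/\partial\lambda \rangle$ changes rapidly, here near λ = 0.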

Notably, MM energy functions give AFE simulations one advantage: force evaluations are insensitive to large structural distortions. Force calculations for structures in unphysical states (0 < λ < 1) therefore pose no numerical problems. Despite tremendous progress in classical force field development [16–38], many AFE applications require more sophisticated potential energy functions. Good examples include accurate prediction of redox potentials, tautomerization free energies, and metal-associated ligand-binding affinities. In these cases, employing QM-based potentials in AFE simulations is a natural choice [44–48], particularly combined QM and MM (QM/MM) potentials [39–43]. To make such QM/MM AFE simulations routinely applicable and reliable for complex systems, however, several major challenges have to be met:

(1) ensuring structural integrity for robust electronic structure calculations when unphysical states (0 < λ < 1) are simulated;
(2) accurately describing long-range electrostatic interactions, which are vital to alchemical transformations with net charge changes; and
(3) efficiently sampling configuration space to guarantee free energy convergence when costly QM/MM potentials are applied.

The present review summarizes recent developments related to these challenges.

2. DIRECT AND INDIRECT SCHEMES FOR QM/MM AFE SIMULATIONS

To cope with the structural integrity problem during QM/MM AFE simulations, two schemes have been developed.

2.1 The direct scheme AFE simulations

In the “direct” scheme, each QM/MM AFE calculation explicitly mixes the starting state (A, QM/MM) and the ending state (B, QM/MM). That is, only one hybrid potential energy function is employed in each free energy calculation [45,48–59]. Such an alchemical transformation can be realized simply through mechanical switching, which involves two independent QM/MM force calculations:

$$U_o = (1-\lambda)\,U_s^{A,\mathrm{QM/MM}} + \lambda\,U_s^{B,\mathrm{QM/MM}} + U_e, \tag{3}$$

where $U_s^A$ and $U_s^B$ in Eq. (2) are described by the corresponding QM/MM energy terms $U_s^{A,\mathrm{QM/MM}}$ and $U_s^{B,\mathrm{QM/MM}}$. Mechanical switching needs two (potentially costly) QM/MM force calculations in each simulation step. To avoid this, alchemical mixing can instead be formulated via electronic switching, which directly mixes the electronic Hamiltonians of the two end states [48,58].

Some end states share a similar molecular structure, as in the calculation of redox potentials or excitation free energies [49]. In that case, the QM/MM force calculations of intermediate states can robustly maintain chemically stable configurations. Other end states have different numbers of atoms, or the same number of atoms but distinct structures. Realizing the “direct” scheme then requires the chaperoned approach [50–55]. In this approach, each end state is represented by two sets of structures. One set physically represents the corresponding chemical species. The other is a “chaperone” that maintains the structural integrity of the atoms that exist only in the other end state. The chaperoned strategy uses the hybrid potential function

$$U_o = (1-\lambda)\left[U_s^{A,\mathrm{QM/MM}}(X_A) + U_s^{B,\mathrm{Chaperone}}(X_B)\right] + \lambda\left[U_s^{B,\mathrm{QM/MM}}(X_B) + U_s^{A,\mathrm{Chaperone}}(X_A)\right] + U_e, \tag{4}$$

where $X_A$ and $X_B$ represent the positions of the atoms in the potential functions $U_s^{A,\mathrm{QM/MM}}$ and $U_s^{B,\mathrm{QM/MM}}$. The chaperone energy terms $U_s^{A,\mathrm{Chaperone}}$ and $U_s^{B,\mathrm{Chaperone}}$ can be described classically or quantum mechanically, as long as they are decoupled from the rest of the system.

Applying the “direct” scheme rigorously is extremely challenging when the two end states are chemically distinct. The obstacle is the numerical singularity that arises when some atoms are “annihilated”. Combined with the complexity of the QM/MM treatment, this singularity means end-point contributions can then only be estimated empirically via an extrapolation strategy. The “direct” scheme is therefore best suited to redox-potential-type calculations. There, the two end states have very similar nuclear configurations but different electronic structures, and Eq. (3) can be employed directly.
Notably, pKa calculations allow an effective approximation when the deprotonated states do not interact strongly with the surrounding environment. The QM/MM free energy simulations keep the van der Waals interactions associated with the acidic proton, and a separate set of simulations then annihilates the acidic proton [55].
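A deliberately tiny model shows the difference between mechanical and electronic switching. In this sketch, each end state is represented by a hypothetical 2×2 "electronic Hamiltonian" with made-up numbers; it is not a QM/MM calculation. Mechanical switching mixes the two ground-state energies as in Eq. (3), which costs two diagonalizations per step. Electronic switching mixes the Hamiltonians and diagonalizes once. The two agree at λ = 0 and λ = 1 but generally differ in between. That is acceptable because only the end points carry physical meaning.

```python
import numpy as np

# Hypothetical 2x2 model Hamiltonians for the two end states (arbitrary units)
H_A = np.array([[-1.0, 0.3],
                [0.3, 0.5]])
H_B = np.array([[-0.2, 0.6],
                [0.6, 1.0]])

def ground_energy(H):
    """Lowest eigenvalue of a real symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(H)[0]

def mechanical_switching(lam):
    # Two independent "electronic structure" calculations, energies mixed linearly
    return (1.0 - lam) * ground_energy(H_A) + lam * ground_energy(H_B)

def electronic_switching(lam):
    # A single calculation on the linearly mixed Hamiltonian
    return ground_energy((1.0 - lam) * H_A + lam * H_B)

for lam in (0.0, 0.5, 1.0):
    print(lam, mechanical_switching(lam), electronic_switching(lam))
```

The lowest eigenvalue of a symmetric matrix is a concave function of the matrix. So in this toy model, the electronic-switching energy never falls below the mechanically mixed value at intermediate λ.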

2.2 The indirect scheme AFE simulations

The “indirect” scheme was pioneered by Gao et al. [44–48,60–63]. Unlike the “direct” scheme, it can robustly handle cases where the end states are drastically different. In the “indirect” scheme, the free energy difference of interest, $\Delta G_{A,\mathrm{QM/MM}\to B,\mathrm{QM/MM}}$, is typically calculated in three steps. These respectively estimate $\Delta G_{A,\mathrm{MM}\to A,\mathrm{QM/MM}}$, $\Delta G_{A,\mathrm{MM}\to B,\mathrm{MM}}$, and $\Delta G_{B,\mathrm{MM}\to B,\mathrm{QM/MM}}$. The target is then obtained as

$$\Delta G_{A,\mathrm{QM/MM}\to B,\mathrm{QM/MM}} = \Delta G_{B,\mathrm{MM}\to B,\mathrm{QM/MM}} - \Delta G_{A,\mathrm{MM}\to A,\mathrm{QM/MM}} + \Delta G_{A,\mathrm{MM}\to B,\mathrm{MM}}. \tag{5}$$
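Eq. (5) is bookkeeping over three independently computed legs. The sketch combines the legs and propagates their standard errors in quadrature. It assumes the three legs come from statistically independent simulations. The numbers are placeholders, not results from the cited studies, and the names `Leg` and `indirect_dg` are hypothetical.

```python
import math
from typing import NamedTuple

class Leg(NamedTuple):
    dg: float   # free energy change of one leg (e.g., kcal/mol)
    err: float  # one-standard-error statistical uncertainty

def indirect_dg(a_mm_to_qm: Leg, a_mm_to_b_mm: Leg, b_mm_to_qm: Leg) -> Leg:
    """Eq. (5): dG(A,QM/MM -> B,QM/MM)
    = dG(B,MM -> B,QM/MM) - dG(A,MM -> A,QM/MM) + dG(A,MM -> B,MM)."""
    dg = b_mm_to_qm.dg - a_mm_to_qm.dg + a_mm_to_b_mm.dg
    # Independent legs: variances add, whatever the sign of each term
    err = math.sqrt(a_mm_to_qm.err**2 + a_mm_to_b_mm.err**2 + b_mm_to_qm.err**2)
    return Leg(dg, err)

# Placeholder leg values
result = indirect_dg(Leg(-12.0, 0.3), Leg(4.0, 0.4), Leg(-9.5, 1.2))
print(result)
```

The two MM-to-QM/MM legs enter with opposite signs, but their statistical uncertainties still add. In this scheme, the precision of the final result is limited by the noisiest leg.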


The advantage of the “indirect” scheme is that QM/MM calculations are required only for the transformation between two energy functions (MM vs. QM/MM) of the same molecule, e.g.,

$$U_o = (1-\lambda)\,U_s^{A,\mathrm{MM}} + \lambda\,U_s^{A,\mathrm{QM/MM}} + U_e \tag{6}$$

for the calculation of $\Delta G_{A,\mathrm{MM}\to A,\mathrm{QM/MM}}$. Atom annihilations or large changes in chemical configuration therefore occur only in the $\Delta G_{A,\mathrm{MM}\to B,\mathrm{MM}}$ calculation. There, the soft-core potential approach [64–66] readily handles the end-point singularity problem. The “indirect” scheme is thus particularly appropriate for free energy calculations in which the two end states have different chemical configurations. So far, it has mostly been employed to understand the solvation of organic molecules or pKa shifts. It can be anticipated that in the near future this scheme will be employed in a broader range of applications, such as drug discovery.
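The end-point singularity mentioned above comes from the plain Lennard-Jones term, which diverges as r → 0 while its coupling is being switched off. Vanishing atoms can then overlap with their neighbors. One widely used soft-core form is the Beutler-type potential, shown here only as an example, since the cited works [64–66] may use different variants. It shifts the r⁶ term by a λ-dependent constant. In this sketch λ = 1 means fully coupled, and the exponents n = m = 1 and α = 0.5 are assumed values.

```python
def lj(r, eps=1.0, sigma=1.0):
    """Plain 12-6 Lennard-Jones potential (diverges as r -> 0)."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)

def soft_core_lj(r, lam, eps=1.0, sigma=1.0, alpha=0.5, n=1, m=1):
    """Beutler-type soft-core LJ: finite at r = 0 for lam < 1 and identical
    to the plain LJ potential at lam = 1 (fully coupled)."""
    d = alpha * (1.0 - lam) ** m + (r / sigma) ** 6
    return 4.0 * eps * lam ** n * (1.0 / d**2 - 1.0 / d)

print(soft_core_lj(0.0, 0.5))                          # 24.0, finite at overlap
print(abs(soft_core_lj(1.2, 1.0) - lj(1.2)) < 1e-12)   # True
```

The shift term $\alpha(1-\lambda)^m$ vanishes at λ = 1, so the physical end state is unchanged. At intermediate λ, it caps the repulsion that a decoupling atom can feel.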

3. THE LONG-RANGE ELECTROSTATIC TREATMENT IN QM/MM AFE SIMULATIONS

As extensively discussed [12,67], a proper treatment of long-range electrostatics is crucial to the prediction accuracy of AFE calculations. This is particularly true when the chemical change involves a net change in total charge. For classical AFE simulation, besides the pioneering work of the Warshel group [67], two methods have become the major choices for long-range electrostatic treatment. These are the generalized solvent boundary potential (GSBP) method [68] and the Ewald summation-based method [69].

The GSBP method was designed for a spherical boundary treatment. Atoms near the site of interest are simulated explicitly. The electrostatic effects of the rest of the system, including bulk water, are represented implicitly via continuum electrostatics (i.e., Poisson–Boltzmann). The accuracy of GSBP in AFE simulations has been demonstrated in challenging applications, in particular for the protein–ligand binding problem [70]. Recently, the GSBP method has been introduced into the QM/MM regime. It was first formulated with the self-consistent charge density functional tight binding (SCC-DFTB)-based QM/MM potential [71], and more recently with semiempirical and ab initio QM/MM potentials [72,73]. Fairly systematic tests of these QM/MM extensions found that AFE simulation can give quantitative results [55] when the site of interest is sufficiently far from the boundary [74,75]. The GSBP scheme is nevertheless limited by its restricted sampling of configuration space. This especially affects motions of inner-region atoms that are correlated with those in the outer region. In addition, future developments need to address the surface polarization problems at the interface between the inner and outer regions.
In comparison, the Ewald summation-based methods were developed for the periodic boundary treatment, in which the entire system plus its solvent environment is represented explicitly. Ewald summation-based simulations are therefore particularly suited to free energy changes that involve delocalized conformational changes. For QM/MM simulations, the Ewald summation has been implemented with methods based on semiempirical QM, such as AM1 and SCC-DFTB [76,77] [55,78]. These implementations employ the original Ewald summation formulation, and their computational costs for free energy simulations are very high. To the best of our knowledge, no Ewald summation-based QM/MM potential has yet been reported in AFE simulations, although the advantage is clear. One obvious way to reduce the cost is to reformulate QM/MM potentials in the particle mesh Ewald scheme [69]. Alternatively, other electrostatic models such as the Wolf summation approach [79] can be employed, although their accuracy for interfacial systems remains to be carefully tested. Practically efficient Ewald summation-based QM/MM AFE simulations will likely require either a dramatic increase in computing power or a leapfrog improvement in sampling techniques.
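To make the Ewald idea concrete, the sketch evaluates the classical Ewald sum for point charges in a cubic periodic box. It uses Gaussian units (Coulomb constant 1), conducting "tin-foil" boundary conditions, and no QM component. The Coulomb energy is split into a short-range real-space sum with erfc screening, a reciprocal-space sum, and a self-energy correction. The function name, the splitting parameter `alpha`, and the cutoffs are illustrative assumptions. The result is checked against the rock-salt Madelung constant (about 1.747565).

```python
import math
import numpy as np

def ewald_energy(pos, q, box, alpha, n_images=2, k_max=6):
    """Total Coulomb energy of a neutral set of point charges in a cubic box."""
    pos = np.asarray(pos, dtype=float)
    q = np.asarray(q, dtype=float)
    n = len(q)
    shifts = range(-n_images, n_images + 1)

    # Real-space term: erfc-screened pair interactions, including periodic images
    e_real = 0.0
    for nx in shifts:
        for ny in shifts:
            for nz in shifts:
                shift = box * np.array([nx, ny, nz], dtype=float)
                for i in range(n):
                    for j in range(n):
                        if i == j and nx == ny == nz == 0:
                            continue  # no self-interaction in the home cell
                        r = float(np.linalg.norm(pos[i] - pos[j] + shift))
                        e_real += 0.5 * q[i] * q[j] * math.erfc(alpha * r) / r

    # Reciprocal-space term: smooth part, summed over wave vectors k != 0
    e_recip = 0.0
    volume = box ** 3
    kr = range(-k_max, k_max + 1)
    for kx in kr:
        for ky in kr:
            for kz in kr:
                if kx == ky == kz == 0:
                    continue
                k = (2.0 * math.pi / box) * np.array([kx, ky, kz], dtype=float)
                k2 = float(k @ k)
                s_k = np.sum(q * np.exp(1j * (pos @ k)))  # structure factor
                e_recip += (2.0 * math.pi / volume) * math.exp(-k2 / (4.0 * alpha**2)) * abs(s_k) ** 2 / k2

    # Self-energy correction for the Gaussian screening charges
    e_self = -alpha / math.sqrt(math.pi) * float(np.sum(q ** 2))
    return e_real + e_recip + e_self

# Rock-salt conventional cell: 4 cations + 4 anions, nearest-neighbour distance 1
cations = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
anions = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
charges = [1.0] * 4 + [-1.0] * 4
energy = ewald_energy(cations + anions, charges, box=2.0, alpha=2.0)
print(energy / 4)  # energy per ion pair, close to -1.747565
```

Once both sums are converged, the total does not depend on `alpha`. The explicit k-vector loop is what particle mesh Ewald replaces with a grid-based FFT evaluation, which is why the PME reformulation mentioned above is attractive for reducing cost.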

4. THE SAMPLING ISSUE IN QM/MM AFE SIMULATIONS

QM/MM force calculations are time-consuming. Achieving rapid convergence in free energy simulations with enhanced sampling techniques is therefore a particularly important challenge. As in classical AFE simulations, there are two interrelated aspects:

(1) how to efficiently collect samples that fill the phase-space gap between the two end states, the “overlap sampling” issue; and
(2) how to accurately collect samples that achieve the ensemble averages required by the employed free energy theory, the “conformational sampling” issue.

The difficulty of an AFE simulation thus depends on how well the phase-space regions of the two target chemical states overlap. It also depends on how rough the energy landscape is in the region connecting the two end states. In the “direct” scheme of QM/MM AFE simulations, such as redox potential calculations, the sampling challenge comes from the environment (solvent and/or protein). Nontrivial structural reorganizations there are likely to occur during the chemical transformation. In the “indirect” scheme, any discrepancy between the employed MM and QM/MM potentials can also lead to substantial environmental reorganization. This is particularly true when the MM parameters have not been thoroughly tested or refined for the specific system at hand. In theory, though, the MM intermediate states in Eq. (6) should not influence the target free energy difference $\Delta G_{A,\mathrm{QM/MM}\to B,\mathrm{QM/MM}}$, provided that adequate sampling is carried out.
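The "overlap sampling" issue is easiest to see in the simplest estimator, free energy perturbation by Zwanzig's exponential averaging. It computes $\Delta G = -kT \ln \langle e^{-\Delta U/kT} \rangle_A$, with $\Delta U = U_B - U_A$ evaluated on configurations sampled from state A. The sketch uses a hypothetical Gaussian distribution of ΔU, for which the exact answer is $\langle \Delta U \rangle - \sigma^2/(2kT)$. The distribution parameters are illustrative only.

```python
import numpy as np

def fep_zwanzig(delta_u, kT=1.0):
    """Exponential-averaging estimate of dG(A -> B) from samples of U_B - U_A
    drawn in state A; a log-sum-exp shift keeps the exponentials finite."""
    x = -np.asarray(delta_u, dtype=float) / kT
    x_max = x.max()
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -kT * log_mean

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.0                          # hypothetical, in units of kT
samples = rng.normal(mu, sigma, size=200_000)
print(fep_zwanzig(samples), mu - sigma**2 / 2)   # estimate vs exact value
```

With a much wider hypothetical distribution (e.g., σ = 4 kT), the same sample size typically gives an unreliable estimate. The average becomes dominated by rare, poorly sampled low-ΔU configurations. This is the overlap problem in miniature, and it is why the gap between end states is normally bridged with intermediate λ states.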

4.1 The first-order generalized ensemble-based QM/MM AFE simulations

Recently, the sampling issue in QM/MM AFE simulations has attracted substantial research effort, mostly within the framework of generalized ensemble sampling [80–84]. In first-order generalized ensemble-based QM/MM AFE simulations, the various scaling-parameter states are visited randomly. So far, two specific strategies have been used to realize them: the replica exchange-based strategy and the simulated scaling-based strategy. The advantage of the first-order generalized ensemble technique is that random walks in order-parameter space let each intermediate state be sampled more efficiently.

In the replica exchange-based strategy, QM/MM AFE simulations have been developed in the indirect scheme [61,63]. Specifically, a replica array is arranged from the state (A, QM/MM) to (A, MM), then from (A, MM) to (B, MM), and finally from (B, MM) to (B, QM/MM). In the hybrid potential space formed by these states, the two end points are the target QM/MM states. The central replica array from (A, MM) to (B, MM) is the same as in classical replica exchange-based AFE simulations. With such a replica array, the replica exchange procedure can lead to a (uniform) random walk along the designed hybrid path. The first replica exchange-based QM/MM AFE simulation [61] calculated the free energy difference between two peptide residues with the SCC-DFTB QM treatment. More recently, replica exchange-based QM/MM AFE simulations have also been used to calculate ion solvation free energies [63].

In the simulated scaling strategy [26], a biasing potential (or biasing weight function) is employed:

$$U_m = U_o + f_m(\lambda) = U_s(\lambda) + U_e + f_m(\lambda), \tag{7}$$

where $f_m(\lambda)$ is targeted as $-G_o(\lambda)$. Here, $G_o(\lambda)$ is the λ-dependent free energy profile in the canonical ensemble with $U_o$ as the potential energy function. The simulated scaling strategy requires λ to be dynamically coupled with the system motions. For this purpose, hybrid MC-based methods [24,25] or the λ-dynamics approach [22,23] can be employed. In simulated scaling, a well-designed recursion method adaptively drives $f_m(\lambda)$ toward $-G_o(\lambda)$. Examples are the Wang–Landau recursion method [85] and the metadynamics recursion method [86,87]. Recent work shows that the simulated scaling strategy can accelerate the sampling of QM/MM AFE simulations in both schemes. In the “direct” scheme [26], the hybrid potential is constructed according to Eq. (3). In the “indirect” scheme [62], the same hybrid path as in the replica exchange-based strategy above is used.

4.2 The orthogonal space random walk simulation method as a future scheme

As discussed above, the sampling problem in QM/MM AFE simulations lies largely in the phase region orthogonal to the order parameter λ, namely, in the environment. To ensure efficient free energy convergence, (collective) environmental relaxation needs to be sampled synergistically with the λ moves. From this point of view, the first-order generalized ensemble treatment is limited, because it focuses solely on removing explicit free energy barriers along λ.


Recently, the concept of second-order generalized ensemble was introduced to additionally remove the hidden free energy barriers that are responsible for slow/collective environmental relaxation [38,84]. This novel scheme, referred to as the “orthogonal space random walk (OSRW)” technique, has shown intriguing capability in both free energy simulation [38] and general conformational sam pling [84]. It is anticipated that future improvements of the OSRW strategy can more effectively deal with the sampling issue in QM/MM AFE simulations. In OSRW, in addition to the first-order generalized ensemble treatment (namely, with a biasing energy term f m ðlÞ), one more biasing energy term Fm ½l; hðlÞ is introduced (thus second-order) to further flatten free energy surfaces along l and hðlÞ. Ideally, for every target l state, hðlÞ represents the corresponding order parameter that describes the necessary structural relaxation in the space ortho gonal to l; thereby, the required environmental sampling can be accelerated. Specifically, @Uo =@l has been identified as such a functional form; this order parameter function is particularly robust for systems in which orthogonal space structural transitions are strongly coupled to the change of l. Accordingly, in the OSRW-based AFE simulations, the target energy function is @Uo @Uo Um ¼ Uo þ f m ðlÞ þ Fm l; ¼ Us ðlÞ þ Ue þ f m ðlÞ þ Fm l; ; ð8Þ @l @l where Uo is the QM/MM alchemical hybrid potential based on either Eq. (3) (in the direct scheme) or Eq. (6) (in the indirect scheme); f m ðlÞ is targeted as Go ðlÞ (Go ðlÞ is the l-dependent free energy profile in the canonical ensemble with Uo as the potential energy function), and Fm ½l; @Uo =@l is targeted as G’o ðl; @Uo =@lÞ (G’o ðl; @Uo =@lÞ is the ðl; @Uo =@lÞ space free energy profile corresponding to the canonical ensemble with Uo Go(l) as the potential energy function). 
With these treatments, the OSRW sampling strategy is anticipated to yield significant efficiency improvements in QM/MM AFE simulations. Integrating this advanced free energy sampling method with QM/MM AFE approaches is therefore an important direction for future work.

5. CONCLUDING REMARKS AND FUTURE PERSPECTIVES

The free energy difference between two chemical states can be rigorously estimated via AFE simulations. Traditionally, most AFE simulations have been carried out using classical potential energy functions, which limits their accuracy and applicability. For instance, predictions of tautomerization free energies and of the binding affinities of metal ligands are outstanding examples for which the use of a QM/MM potential function can be very beneficial. Therefore, developing efficient and accurate QM/MM-based AFE simulation techniques is important. Considering the high computational cost and numerical complexities associated with QM/MM potential functions, there are three major challenges for

QM/MM Alchemical Free Energy Simulations: Challenges and Recent Developments


QM/MM AFE simulations, as reviewed above. Thanks to many pioneering efforts and more recent developments, the issues concerning the structural integrity of the QM region in unphysical ($\lambda \neq 0, 1$) states and the treatment of long-range electrostatics have been thoroughly addressed, although further work is still required to improve the efficiency of existing methods. The key remaining, and possibly long-standing, issue still lies in sampling. We anticipate that with further advancement of generalized ensemble methods, such as the OSRW method reviewed here, the goal of practically efficient QM/MM AFE simulations will soon be within reach.

ACKNOWLEDGMENTS

We acknowledge the National Science Foundation (MCB-0919983 to WY, CHE-0957285 to QC) for funding support. W.Y. thanks the Florida State University High Performance Computing Center and the Institute of Molecular Biophysics computing facility for computing support for the related research.

REFERENCES

1. Chipot, C., Shell, M.S., Pohorille, A. In Free Energy Calculations: Theory and Applications in Chemistry and Biology (eds C. Chipot, A. Pohorille). Springer, Heidelberg, 2007, pp. 1–32.
2. Warshel, A. Dynamics of reactions in polar solvents. Semi-classical trajectory studies of electron transfer and proton transfer reactions. J. Phys. Chem. 1982, 86, 2218–24.
3. Warshel, A. Simulating the energetics and dynamics of enzymatic reactions. Pont. Acad. Sci. Scr. Var. 1983, 55, 59–81.
4. Tembe, B.L., McCammon, J.A. Ligand receptor interactions. Comput. Chem. 1984, 8, 281–3.
5. Jorgensen, W.L., Ravimohan, C. Monte-Carlo simulation of differences in free-energies of hydration. J. Chem. Phys. 1985, 83, 3050–4.
6. Bash, P.A., Singh, U.C., Langridge, R., Kollman, P.A. Free-energy calculations by computer-simulation. Science 1987, 236, 564–8.
7. Gao, J., Kuczera, K., Tidor, B., Karplus, M. Hidden thermodynamics of mutant proteins–a molecular-dynamics analysis. Science 1989, 244, 1069–72.
8. Jorgensen, W.L. Free-energy calculations–a breakthrough for modeling organic-chemistry in solution. Acc. Chem. Res. 1989, 22, 184–9.
9. Beveridge, D.L., DiCapua, F.M. Free-energy via molecular simulation–applications to chemical and biomolecular systems. Annu. Rev. Biophys. Biophys. Chem. 1989, 18, 431–92.
10. Straatsma, T.P., McCammon, J.A. Computational alchemy. Annu. Rev. Phys. Chem. 1992, 43, 407–35.
11. Kollman, P. Free-energy calculations–applications to chemical and biochemical phenomena. Chem. Rev. 1993, 93, 2395–417.
12. Simonson, T., Archontis, G., Karplus, M. Free energy simulations come of age: Protein-ligand recognition. Acc. Chem. Res. 2002, 35, 430–7.
13. Rodinger, T., Pomes, R. Enhancing the accuracy, the efficiency and the scope of free energy simulations. Curr. Opin. Struc. Biol. 2005, 15, 164–70.
14. Gilson, M.K., Zhou, H.X. Calculation of protein-ligand binding affinities. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 21–42.
15. Jorgensen, W.L., Thomas, L.L. Perspective on free-energy perturbation calculations for chemical equilibria. J. Chem. Theor. Comp. 2008, 4, 869–76.
16. Kirkwood, J.G. Statistical mechanics of fluid mixtures. J. Chem. Phys. 1935, 3, 300–13.
17. Zwanzig, R.W. High-temperature equation of state by a perturbation method. 1. Nonpolar gases. J. Chem. Phys. 1954, 22, 1420–6.


18. Bennett, C.H. Efficient estimation of free-energy differences from Monte-Carlo data. J. Comput. Phys. 1976, 22, 245–68.
19. Souaille, M., Roux, B. Extension to the weighted histogram analysis method: Combining umbrella sampling with free energy calculations. Comput. Phys. Comm. 2001, 135, 40–57.
20. Shirts, M.R., Bair, E., Pande, V.S. Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Phys. Rev. Lett. 2003, 91, 140601.
21. Lu, N.D., Kofke, D.A., Woolf, T.B. Improving the efficiency and reliability of free energy calculations using overlap sampling methods. J. Comput. Chem. 2004, 25, 28–39.
22. Kong, X.J., Brooks, C.L. Lambda-dynamics: A new approach to free energy calculations. J. Chem. Phys. 1996, 105, 2414–23.
23. Knight, J.L., Brooks, C.L. Lambda-dynamics free energy simulation methods. J. Comput. Chem. 2009, 30, 1692–1700.
24. Tidor, B. Simulated annealing on free-energy surfaces by a combined molecular-dynamics and Monte-Carlo approach. J. Phys. Chem. 1993, 97, 1069–73.
25. Pitera, J., Kollman, P. Designing an optimum guest for a host using multimolecule free energy calculations: Predicting the best ligand for Rebek's "tennis ball". J. Am. Chem. Soc. 1998, 120, 7557–67.
26. Li, H.Z., Fajer, M., Yang, W. Simulated scaling method for localized enhanced sampling and simultaneous "alchemical" free energy simulations: A general method for molecular mechanical, quantum mechanical, and quantum mechanical/molecular mechanical simulations. J. Chem. Phys. 2007, 126, 024106.
27. Min, D., Yang, W. Energy difference space random walk to achieve fast free energy calculations. J. Chem. Phys. 2008, 128, 191102.
28. Darve, E., Pohorille, A. Calculating free energies using average force. J. Chem. Phys. 2001, 115, 9169–83.
29. Bitetti-Putzer, R., Yang, W., Karplus, M. Generalized ensembles serve to improve the convergence of free energy simulations. Chem. Phys. Lett. 2003, 377, 633–41.
30. Fasnacht, M., Swendsen, R.H., Rosenberg, M. Adaptive integration method for Monte Carlo simulations. Phys. Rev. E 2004, 69, 056704.
31. Ytreberg, F.M., Swendsen, R.H., Zuckerman, D.M. Comparison of free energy methods for molecular systems. J. Chem. Phys. 2006, 126, 184114.
32. Pomes, R., Eisenmesser, E., Post, C.B., Roux, B. Calculating excess chemical potentials using dynamic simulations in the fourth dimension. J. Chem. Phys. 1999, 111, 3387–95.
33. Sugita, Y., Kitao, A., Okamoto, Y. Multidimensional replica-exchange method for free-energy calculations. J. Chem. Phys. 2000, 113, 6042–51.
34. Woods, C.J., Essex, J.W., King, M.A. The development of replica-exchange-based free-energy methods. J. Phys. Chem. B 2003, 107, 13703–10.
35. Lu, N.D., Wu, D., Woolf, T.B., Kofke, D.A. Using overlap and funnel sampling to obtain accurate free energies from nonequilibrium work measurements. Phys. Rev. E 2004, 69, 057702.
36. Christ, C.D., van Gunsteren, W.F. Enveloping distribution sampling: A method to calculate free energy differences from a single simulation. J. Chem. Phys. 2007, 126, 184110.
37. Abrams, J.B., Rosso, L., Tuckerman, M.E. Efficient and precise solvation free energies via alchemical adiabatic molecular dynamics. J. Chem. Phys. 2006, 125, 074115.
38. Zheng, L.Q., Chen, M.G., Yang, W. Random walk in orthogonal space to achieve efficient free-energy simulation of complex systems. Proc. Natl. Acad. Sci. USA 2008, 105, 20227–32.
39. Warshel, A., Levitt, M. Theoretical studies of enzymic reactions–dielectric, electrostatic and steric stabilization of carbonium-ion in reaction of lysozyme. J. Mol. Biol. 1976, 103, 227–49.
40. Field, M.J., Bash, P.A., Karplus, M. A combined quantum-mechanical and molecular mechanical potential for molecular-dynamics simulations. J. Comput. Chem. 1990, 11, 700–33.
41. Gao, J.L. Hybrid quantum and molecular mechanical simulations: An alternative avenue to solvent effects in organic chemistry. Acc. Chem. Res. 1996, 29, 298–305.
42. Bakowies, D., Thiel, W. Hybrid models for combined quantum mechanical and molecular mechanical approaches. J. Phys. Chem. 1996, 100, 10580–94.
43. Monard, G., Merz, K.M. Combined quantum mechanical/molecular mechanical methodologies applied to biomolecular systems. Acc. Chem. Res. 1999, 32, 904–11.


44. Gao, J.L., Xia, X.F. A priori evaluation of aqueous polarization effects through Monte Carlo QM/MM simulations. Science 1992, 258, 631–5.
45. Luzhkov, V., Warshel, A. Microscopic models for quantum-mechanical calculations of chemical processes in solutions–LD/AMPAC and SCAAS/AMPAC calculations of solvation energies. J. Comput. Chem. 1992, 13, 199–213.
46. Gao, J.L., Luque, F.J., Orozco, M. Induced dipole-moment and atomic charges based on average electrostatic potentials in aqueous-solution. J. Chem. Phys. 1993, 98, 2975–82.
47. Wesolowski, T., Warshel, A. Ab-initio free-energy perturbation calculations of solvation free-energy using the frozen density-functional approach. J. Phys. Chem. 1994, 98, 5183–7.
48. Stanton, R.V., Little, L.R., Merz, K.M. Quantum free-energy perturbation study within a PM3-MM coupled potential. J. Phys. Chem. 1995, 99, 483–6.
49. Gao, J.L., Li, N.Q., Freindorf, M. Hybrid QM/MM simulations yield the ground and excited state pK(a) difference: Phenol in aqueous solution. J. Am. Chem. Soc. 1996, 118, 4912–3.
50. Li, G.H., Zhang, X.D., Cui, Q. Free energy perturbation calculations with combined QM/MM potentials: Complications, simplifications, and applications to redox potential calculations. J. Phys. Chem. B 2003, 107, 8643–53.
51. Li, G.H., Cui, Q. pKa calculations with QM/MM free energy perturbations. J. Phys. Chem. B 2003, 107, 14521–8.
52. Olsson, M.H.M., Hong, G.Y., Warshel, A. Frozen density functional free energy simulations of redox proteins: Computational studies of the reduction potential of plastocyanin and rusticyanin. J. Am. Chem. Soc. 2003, 125, 5025–39.
53. Yang, W., Bitetti-Putzer, R., Karplus, M. Chaperoned alchemical free energy simulations: A general method for QM, MM, and QM/MM potentials. J. Chem. Phys. 2004, 120, 9450–3.
54. Hu, H., Yang, W.T. Dual-topology/dual-coordinate free-energy simulation using QM/MM force field. J. Chem. Phys. 2005, 123, 041102.
55. Riccardi, D., Schaefer, P., Yang, Y., Yu, H.B., Ghosh, N., Prat-Resina, X., et al. Development of effective quantum mechanical/molecular mechanical (QM/MM) methods for complex biological processes. J. Phys. Chem. B 2006, 110, 6458–69.
56. Blumberger, J., Tavernelli, I., Klein, M.L., Sprik, M. Diabatic free energy curves and coordination fluctuations for the aqueous Ag+/Ag2+ redox couple: A biased Born-Oppenheimer molecular dynamics investigation. J. Chem. Phys. 2006, 124, 064507.
57. Blumberger, J., Sprik, M. Quantum versus classical electron transfer energy as reaction coordinate for the aqueous Ru2+/Ru3+ redox. Theor. Chem. Acc. 2006, 115, 113–26.
58. Zeng, X.C., Hu, H., Hu, X.Q., Cohen, A.J., Yang, W.T. Ab initio quantum mechanical/molecular mechanical simulation of electron transfer process: Fractional electron approach. J. Chem. Phys. 2008, 128, 124510.
59. Cheng, J., Sulpizi, M., Sprik, M. Redox potentials and pKa for benzoquinone from density functional theory based molecular dynamics. J. Chem. Phys. 2009, 131, 154504.
60. Gao, J.L., Freindorf, M. Hybrid ab initio QM/MM simulation of N-methylacetamide in aqueous solution. J. Phys. Chem. A 1997, 101, 3182–8.
61. Li, H.Z., Yang, W. Sampling enhancement for the quantum mechanical potential based molecular dynamics simulations: A general algorithm and its extension for free energy calculation on rugged energy surface. J. Chem. Phys. 2007, 126, 114104.
62. Zheng, L., Li, H., Yang, W. In From Computational Biophysics to Systems Biology (CBSB08), NIC Series, Vol. 36 (eds U.H.E. Hansmann, J. Meinke, S. Mohanty, W. Nadler, O. Zimmermann). Jülich, Germany, 2008, pp. 57–64.
63. Woods, C.J., Manby, F.R., Mulholland, A.J. An efficient method for the calculation of quantum mechanics/molecular mechanics free energies. J. Chem. Phys. 2008, 128, 014109.
64. Zacharias, M., Straatsma, T.P., McCammon, J.A. Separation-shifted scaling: A new scaling method for Lennard-Jones interactions in thermodynamic integration. J. Chem. Phys. 1994, 100, 9025–31.
65. Beutler, T.C., Mark, A.E., van Schaik, R.C., Gerber, P.R., van Gunsteren, W.F. Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chem. Phys. Lett. 1994, 222, 529–39.
66. Steinbrecher, T., Mobley, D.L., Case, D.A. Nonlinear scaling schemes for Lennard-Jones interactions in free energy calculations. J. Chem. Phys. 2007, 127, 214108.


67. Warshel, A., Russell, S.T. Calculations of electrostatic interactions in biological-systems and in solutions. Q. Rev. Biophys. 1984, 17, 283–422.
68. Im, W., Berneche, S., Roux, B. Generalized solvent boundary potential for computer simulations. J. Chem. Phys. 2001, 114, 2924–37.
69. Darden, T., York, D., Pedersen, L. Particle mesh Ewald–an N log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089–92.
70. Deng, Y.Q., Roux, B. Computations of standard binding free energies with molecular dynamics simulations. J. Phys. Chem. B 2009, 113, 2234–46.
71. Schaefer, P., Riccardi, D., Cui, Q. Reliable treatment of electrostatics in combined QM/MM simulation of macromolecules. J. Chem. Phys. 2005, 123, 014905.
72. Benighaus, T., Thiel, W. Efficiency and accuracy of the generalized solvent boundary potential for hybrid QM/MM simulations: Implementation for semiempirical Hamiltonians. J. Chem. Theor. Comp. 2008, 4, 1600–9.
73. Benighaus, T., Thiel, W. A general boundary potential for hybrid QM/MM simulations of solvated biomolecular systems. J. Chem. Theor. Comp. 2009, 5, 3114–28.
74. Riccardi, D., Schaefer, P., Cui, Q. pK(a) calculations in solution and proteins with QM/MM free energy perturbation simulations: A quantitative test of QM/MM protocols. J. Phys. Chem. A 2005, 109, 17715–33.
75. Riccardi, D., Cui, Q. pK(a) analysis for the zinc-bound water in human carbonic anhydrase II: Benchmark for "Multiscale" QM/MM simulations and mechanistic implications. J. Phys. Chem. A 2007, 111, 5703–11.
76. Cui, Q., Elstner, M., Kaxiras, E., Frauenheim, T., Karplus, M. A QM/MM implementation of the self-consistent charge density functional tight-binding (SCC-DFTB) method. J. Phys. Chem. B 2001, 105, 569–85.
77. Nam, K., Gao, J.L., York, D.M. An efficient linear-scaling Ewald method for long-range electrostatic interactions in combined QM/MM calculations. J. Chem. Theor. Comp. 2005, 1, 2–13.
78. Fennell, C.J., Gezelter, J.D. Is the Ewald summation still necessary? Pairwise alternatives to the accepted standard for long-range electrostatics. J. Chem. Phys. 2006, 125, 234104.
79. Denesyuk, N.A., Weeks, J.D. A new approach for efficient simulation of Coulomb interactions in ionic fluids. J. Chem. Phys. 2008, 128, 124109.
80. Hansmann, U.H.E., Okamoto, Y. Monte Carlo method for systems with rough energy landscape. Phys. Rev. E 1997, 56, 2228–33.
81. Berne, B.J., Straub, J.E. Novel methods of sampling phase space in the simulation of biological systems. Curr. Opin. Struc. Biol. 1997, 7, 181–9.
82. Mitsutake, A., Sugita, Y., Okamoto, Y. Generalized-ensemble algorithms for molecular simulations of biopolymers. Biopolymers 2001, 60, 96–123.
83. Okamoto, Y. Generalized-ensemble algorithms: Enhanced sampling techniques for Monte Carlo and molecular dynamics simulations. J. Mol. Graph. Model. 2004, 22, 425–39.
84. Zheng, L.Q., Chen, M.G., Yang, W. Simultaneous escaping of explicit and hidden free energy barriers: Application of the orthogonal space random walk strategy in generalized ensemble based conformational sampling. J. Chem. Phys. 2009, 130, 234105.
85. Wang, F.G., Landau, D.P. Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett. 2001, 86, 2050–3.
86. Laio, A., Parrinello, M. Escaping free-energy minima. Proc. Natl. Acad. Sci. USA 2002, 99, 12562–6.
87. Ensing, B., De Vivo, M., Liu, Z.W., Moore, P., Klein, M.L. Metadynamics as a tool for exploring free energy landscapes of chemical reactions. Acc. Chem. Res. 2006, 39, 73–81.

Section 2

Quantum Chemistry

Section Editor: Gregory S. Tschumper, Department of Chemistry and Biochemistry, University of Mississippi, University, MS 38677, USA

CHAPTER 5

Deciphering Structural Fingerprints for Metalloproteins with Quantum Chemical Calculations

Yan Ling¹ and Yong Zhang²

Contents

1. Introduction
2. Computational Details
3. Results and Discussion
4. Conclusions
Acknowledgments
References

Abstract

Computational investigations of spectroscopic observables can aid many experimental studies and provide an important avenue for the structural investigation of proteins. Here we report the first detailed quantum chemical investigation of the effect of hydrogen bonding on the Mössbauer spectroscopic properties of metalloproteins, using various active-site models of oxymyoglobin. The hydrogen bond between O2 and the distal His residue was found to strengthen oxygen binding, highlighting the role of the protein environment in its biological function. Hydrogen bonding also confers more $\mathrm{Fe^{III}{-}O_2^{-}}$ character. These structural effects result in clear differences in the predicted Mössbauer properties, with those of the lowest-energy, hydrogen-bonded, Weiss-type open-shell singlet state in best agreement with experiment. These results suggest that quantum chemical calculations of Mössbauer properties can help identify and assess the effect of hydrogen bonding in protein active sites.


Keywords: quantum chemical calculations; Mössbauer; hydrogen bonding; oxymyoglobin

¹ Department of Chemistry and Biochemistry, University of Southern Mississippi, Hattiesburg, MS, USA

² Department of Chemistry, Chemical Biology, and Biomedical Engineering, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ, USA

Annual Reports in Computational Chemistry, Volume 6 ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06005-6

© 2010 Elsevier B.V. All rights reserved.


1. INTRODUCTION

Metals, owing to their rich diversity of oxidation, spin, and coordination states, play significant roles in affecting or controlling biochemical activities. Indeed, metalloproteins are widely found in biological systems, where they frequently perform vital functions such as structural stabilization, metal ion storage, electron transfer, small-molecule (e.g., O2 and NO) binding, and catalysis [1,2]. Various spectroscopic techniques have been developed to investigate proteins, including nuclear magnetic resonance (NMR), electron spin resonance, Mössbauer, and vibrational spectroscopies. The observable properties obtained with these spectroscopic tools are "fingerprints" of protein systems. Nowadays, quantum chemical calculations can be used to predict many interesting spectroscopic properties of proteins [3,4]. Our group is interested in developing quantum chemical methods that can predict a wide range of experimental spectroscopic properties with theory-versus-experiment correlation coefficients R² ≥ 0.98. Previous investigations show that accurate predictions of some characteristic spectroscopic observables can help assign and correct experimental spectra [5–7], reveal the geometric and electronic origins of these observables [5–26], and, more importantly, provide a valuable avenue to refine or determine protein structures [10,13,14,18,22,27]. Currently, the most widely used technique to determine protein structures is X-ray crystallography, which accounts for about 85% of the structures deposited in the Protein Data Bank (PDB, www.rcsb.org), followed by NMR spectroscopy at 14%. Although protein X-ray structures are useful and generally more accurate than NMR structures, they still suffer from a number of accuracy problems.
According to the official PDB website [28], the positional uncertainty is generally about 1/5–1/10 of the resolution for good structures with a crystallographic R factor ≤ 0.2. Even at 1.33 Å resolution, for very small proteins (10 kDa), the average positional error of light atoms can be as high as 0.15 Å [29,30]. In addition, C, N, and O are hard to differentiate in the electron density map generated from X-ray diffraction data, so conformations may be misassigned in some cases. Moreover, protonation states are generally inaccessible to conventional X-ray crystallography, and recent investigations indicate that the assigned protonation states of certain amino acids in proteins are often questionable [31]. In particular, because the positions of light atoms in ligands can be severely blurred in electron density maps by the presence of a heavy central metal atom, metal sites in biomolecules have long been known to be prone to "substantial uncertainties in metric features, inexact stereochemistry, incomplete definition of the total ligand set (missing ligands), and the distinction between water and hydroxide" [1]. For instance, among relatively "good" X-ray structures (resolution < 2.0 Å) of cyanometmyoglobin, the reported Fe–CN bond lengths and Fe–C–N bond angles vary by as much as 0.70 Å and 96°, respectively [32]. It has recently been reported that quantum chemical calculations can be used to locally improve X-ray structures, particularly at the metal center [33]. Linear-scaling quantum chemical methods have also been developed to refine protein crystal structures [34]. These methods use quantum chemical energy restraints together with an X-ray target function.

Because the electron density map from conventional X-ray crystallography is not of sufficient accuracy, we have been developing integrated quantum mechanics and spectroscopy (QM/S) techniques to solve these problems. This approach is based on an intrinsic mathematical relationship between a spectroscopic observable of a given molecule and its molecular structure. According to a fundamental theorem of density functional theory (DFT), the Hohenberg–Kohn theorem [35], any ground-state property can be expressed as a functional of the system's electron density. Therefore, a spectroscopic observable (e.g., an NMR chemical shift) is a functional of the electron density $\rho$, which is in turn determined by the molecular structure, that is, the spatial arrangement of the n atoms in the molecule, $(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n)$. In many cases, this quantitative structure–observable relationship (QSOR) cannot be expressed explicitly, so quantum chemical geometry optimization is needed to find the optimal structure that minimizes the prediction errors for some key experimental spectroscopic observables [13,14,18]. In addition to this implicit QSOR approach, a numerical explicit QSOR can sometimes be constructed, and a probability surface can then be used to find the optimal geometric parameters directly [10,27]. The easiest way to determine a geometric parameter from experimental spectroscopic data may be an analytic QSOR, which is not uncommon; a well-known example is the Karplus relationship [36]. High-resolution spectroscopic measurements (e.g., NMR) can provide information about the atoms of interest (and sometimes all atoms) in a biomolecule, in solid and solution states, in static and dynamic ways, in vitro and in vivo, and they do not require the diffraction-quality crystals needed for X-ray crystallography.
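The Python sketch below makes the idea of an analytic QSOR concrete. It evaluates a Karplus-type relation, $^3J(\phi) = A\cos^2\phi + B\cos\phi + C$, and inverts it by a grid scan to list the dihedral angles consistent with a measured coupling. The coefficients A, B, and C and the tolerance are placeholder values chosen for illustration. They are not a parameterization taken from [36], and the function names are hypothetical.

```python
import math


def karplus_j(phi_deg, a=6.5, b=-1.8, c=1.6):
    """Karplus-type relation: 3J(phi) = A cos^2(phi) + B cos(phi) + C, in Hz.

    The default A, B, C are placeholder values; real applications use
    coefficients parameterized for the specific coupling pathway.
    """
    cos_phi = math.cos(math.radians(phi_deg))
    return a * cos_phi ** 2 + b * cos_phi + c


def dihedrals_matching(j_obs, tol=0.2, step=1.0, **coeffs):
    """Grid-scan phi over [-180, 180) and return angles whose 3J is within tol of j_obs.

    The relation is quadratic in cos(phi) and cos is even, so a single
    coupling can correspond to up to four dihedral angles.
    """
    matches = []
    n_points = int(round(360.0 / step))
    for i in range(n_points):
        phi = -180.0 + i * step
        if abs(karplus_j(phi, **coeffs) - j_obs) <= tol:
            matches.append(phi)
    return matches
```

The scan makes the non-uniqueness explicit: a single coupling constant generally maps to several dihedral angles. This is one reason for combining several observables when assigning a structure.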
In addition, we use high-quality quantum chemical geometry optimization to obtain an energy-minimized geometry for each possible candidate structure of the biomolecule of interest. The candidate that best predicts the experimental spectroscopic data is then chosen as the best structure. In this way, the final structure is quantitatively compatible with both the experimental data and theoretical energy requirements. In practice, we compare several different observable properties with experiment to improve the statistical significance of the final structure. These techniques are particularly helpful for determining the structures of metal sites. In fact, they have enabled successful X-ray structure refinements for a number of metalloproteins with different metal coordination environments, spin states, and reaction states [10,13,14,18,22,27]. They have also provided information not available from conventional X-ray structures, e.g., the hydrogen positions used to determine protonation states [18,22]. The integrated QM/S investigations produced the first report of the protonation state of a diphosphate group in a protein (monoprotonated) and identified the correct protonation states of drug candidates, thereby assisting in finding new drug leads [18]. In addition, this approach supports a new protonation state assignment for one of the active-site histidines in rusticyanin, an electron-transfer blue copper protein, that better explains its unique function and NMR experimental results [22].
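The candidate-selection step described above can be sketched as follows. Each energy-minimized candidate has a set of predicted observables, and the candidate with the smallest combined deviation from experiment is selected. Several choices here are assumptions of this sketch: the scaled root-mean-square deviation as the figure of merit, the per-observable scales, and the candidate labels. The published QM/S studies may weight observables differently.

```python
import math


def scaled_rms_error(predicted, experimental, scales):
    """Root-mean-square of deviations, each divided by a per-observable scale."""
    if not (len(predicted) == len(experimental) == len(scales)) or not predicted:
        raise ValueError("predicted, experimental, and scales must be non-empty and equal length")
    terms = [((p - e) / s) ** 2 for p, e, s in zip(predicted, experimental, scales)]
    return math.sqrt(sum(terms) / len(terms))


def best_candidate(candidates, experimental, scales):
    """Return (label, score) for the candidate whose predictions best match experiment.

    `candidates` maps a candidate-structure label to the observables predicted
    from its quantum-chemically optimized geometry.
    """
    scores = {label: scaled_rms_error(pred, experimental, scales)
              for label, pred in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Dividing each deviation by a typical error scale lets observables with different units or magnitudes (for example, Mössbauer parameters in mm/s alongside NMR shifts in ppm) contribute comparably to the score.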


Hydrogen atoms can play significant roles in protein function, and their positions are inaccessible to conventional X-ray crystallography. We would therefore like to go a step beyond determining protonation states in covalent molecules and examine whether the QM/S approach can identify or assess weak hydrogen-bonding interactions in a metalloprotein active site. Here, we report a quantum chemical investigation of the Mössbauer properties of oxymyoglobin, MbO2. The hydrogen bond from the distal His residue to the bound oxygen molecule is well known to provide an important basis for oxygen binding and for discrimination between CO and O2 [37–44]. Various electronic states of MbO2 have been proposed [37]. Mössbauer spectroscopy is an invaluable tool for investigating iron-containing proteins [45], but the effect of hydrogen bonding on the Mössbauer properties of oxymyoglobin has not previously been analyzed. We performed a series of quantum chemical calculations of the Mössbauer properties for the two most popular electronic states, the Pauling-type closed-shell singlet $^{1}\mathrm{Fe^{II}}{-}{}^{1}\mathrm{O_2}$ (1) and the Weiss-type open-shell singlet $^{2}\mathrm{Fe^{III}}{\uparrow\downarrow}{}^{2}\mathrm{O_2^{-}}$ (2). For comparison, we also treated a closely related triplet, $^{2}\mathrm{Fe^{III}}{\uparrow\uparrow}{}^{2}\mathrm{O_2^{-}}$ (3). Our results indicate a clear effect of hydrogen bonding on the predicted Mössbauer spectroscopic parameters. The values predicted for the most widely accepted Weiss-type open-shell singlet [37,39,40,42,44] with the hydrogen-bonded distal His residue are in the best agreement with experiment. These results suggest that the integrated QM/S approach can assist in identifying hydrogen bonds in the functional sites of proteins.

2. COMPUTATIONAL DETAILS

The basic molecular structure of the MbO2 active site used in our calculations is shown in Figure 1, with the bridging (Ob) and terminal (Ot) oxygen atoms labeled. The heme group is represented by a porphyrin, and both the proximal and the distal His residues are truncated to 5-methylimidazole. Six models were investigated: the non-HB models without the distal His (1A, 2A, 3A) and the HB models with the distal His (1B, 2B, 3B), each set covering the three electronic states mentioned above. The geometries of all these models were built from the oxymyoglobin X-ray crystal structure determined at 1.60 Å resolution [46] and subjected to full optimization. The optimizations used the DFT method BPW91 [47,48] in the Gaussian 98 program [51], with the Wachters' basis for Fe [49,50], 6-311G for the other heavy atoms, and 6-31G for hydrogens. This is the same approach used previously to investigate other heme systems [14]. The Mössbauer properties of the optimized structures were then calculated as follows.

Figure 1 Molecular structure of the basic MbO2 active-site model. The hydrogen bond between the terminal oxygen atom and the hydrogen atom of the distal His residue is highlighted by a dashed line.

The 57Fe Mössbauer quadrupole splitting ($\Delta E_Q$) arises from the nonspherical nuclear charge distribution of the I = 3/2 excited state in the presence of an electric field gradient (EFG) at the 57Fe nucleus. The isomer shift ($\delta_{\mathrm{Fe}}$), by contrast, arises from differences in the electron density at the nucleus between the absorber (the molecule or system of interest) and a reference compound (usually α-Fe at 300 K). The quadrupole splitting is related to the principal components of the EFG tensor at the nucleus as follows [45]:

$$\Delta E_Q = \frac{1}{2} e Q V_{zz}\left(1 + \frac{\eta^2}{3}\right)^{1/2} \tag{1}$$

where e is the electron charge, Q is the quadrupole moment of the E = 14.4 keV excited state, and the principal components of the EFG tensor are labeled according to the convention

$$|V_{zz}| > |V_{yy}| > |V_{xx}| \tag{2}$$

with the asymmetry parameter given by

$$\eta = \frac{V_{xx} - V_{yy}}{V_{zz}} \tag{3}$$

The isomer shift in 57Fe Mössbauer spectroscopy is given by [45]

$$\delta = E_A - E_{\mathrm{Fe}} = \frac{2\pi}{3} Z e^2 \left(\langle R^{*2}\rangle - \langle R^2\rangle\right)\left(|\psi(0)|^2_A - |\psi(0)|^2_{\mathrm{Fe}}\right) \tag{4}$$

where Z is the atomic number of the nucleus of interest (iron), and R and R* are the average nuclear radii of the ground and excited states of 57Fe, respectively. Since $|\psi(0)|^2_{\mathrm{Fe}}$ is a constant, the isomer shift can be written as

$$\delta_{\mathrm{Fe}} = \alpha\left[\rho(0) - c\right] \tag{5}$$

where $\alpha$ is the so-called calibration constant, c is a constant, and $\rho(0)$ is the computed charge density at the iron nucleus. Both $\alpha$ and c can be obtained from the correlation between experimental $\delta_{\mathrm{Fe}}$ values and the corresponding computed $\rho(0)$ values for a training set. Eq. (5) can then be used to predict $\delta_{\mathrm{Fe}}$ for a new molecule from its computed $\rho(0)$, as described in detail elsewhere for a wide variety of heme and other model systems [8].
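Obtaining $\alpha$ and c for Eq. (5) amounts to a straight-line fit. Writing $\delta_{\mathrm{Fe}} = \alpha\rho(0) + b$, the intercept gives $c = -b/\alpha$. The pure-Python sketch below performs an ordinary least-squares fit and also returns R², the theory-versus-experiment correlation measure used to judge such calibrations. The function names are hypothetical. The test values are synthetic numbers generated from assumed $\alpha$ and c, not the training set of [8].

```python
def fit_isomer_shift_calibration(rho0, delta_exp):
    """Fit delta = alpha * (rho0 - c) by ordinary least squares.

    Returns (alpha, c, r2). The model is linear in rho0, delta = alpha*rho0 + b,
    so the constant of Eq. (5) follows as c = -b / alpha. Centering on the
    means keeps the fit well conditioned even though rho(0) is large.
    """
    n = len(rho0)
    if n != len(delta_exp) or n < 2:
        raise ValueError("need at least two paired (rho0, delta) points")
    mean_x = sum(rho0) / n
    mean_y = sum(delta_exp) / n
    sxx = sum((x - mean_x) ** 2 for x in rho0)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rho0, delta_exp))
    alpha = sxy / sxx
    b = mean_y - alpha * mean_x
    c = -b / alpha
    ss_res = sum((y - (alpha * x + b)) ** 2 for x, y in zip(rho0, delta_exp))
    ss_tot = sum((y - mean_y) ** 2 for y in delta_exp)
    r2 = 1.0 - ss_res / ss_tot
    return alpha, c, r2


def predict_isomer_shift(rho0, alpha, c):
    """Eq. (5): delta_Fe = alpha * [rho(0) - c]."""
    return alpha * (rho0 - c)
```

Once $\alpha$ and c are fitted on the training set, `predict_isomer_shift` applies Eq. (5) to the $\rho(0)$ computed for a new molecule.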


Mössbauer quadrupole splittings and isomer shifts were predicted with the hybrid functional B3LYP [52] in the Gaussian 03 program [53], using the Wachters' basis for Fe [49,50], 6-311G for all other heavy atoms, and 6-31G for hydrogens. The same approach was used in previous work on various iron-containing proteins and models [8–14,23], where it generally performed better than the pure DFT method BPW91 [8–14] for Mössbauer property predictions. To calculate $\Delta E_Q$, we first evaluated the principal components of the EFG tensor at the 57Fe nucleus ($V_{ii}$). We then used Eq. (1) to deduce $\Delta E_Q$, taking $Q = 0.16\,(\pm 5\%) \times 10^{-28}\ \mathrm{m^2}$ from a precise recent determination [54]. This value was previously found to give excellent agreement between theory and experiment for a broad range of systems [8–14,23]. To calculate $\delta_{\mathrm{Fe}}$ values, we read the Kohn–Sham orbitals from the Gaussian 03 calculations into the AIM2000 program [55] to evaluate the charge density at the iron nucleus, $\rho(0)$. We then evaluated the isomer shifts using the equation derived previously [8]:

$$\delta_{\mathrm{Fe}} = -0.404\left[\rho(0) - 11614.16\right] \tag{6}$$
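Both Mössbauer quantities can be computed from the quantum chemical output in a few lines. The Python sketch below first applies Eqs. (1)-(3) to the three EFG principal components (in atomic units) with Q = 0.16 × 10⁻²⁸ m². It then converts the resulting energy to the mm/s velocity scale used experimentally. Finally, it applies the calibration of Eq. (6) to $\rho(0)$, taking $\alpha = -0.404$. A negative $\alpha$ is expected for 57Fe because the nucleus is smaller in the excited state than in the ground state, so more s-electron density at the nucleus lowers the shift. The conversion constants are standard values: the CODATA atomic unit of EFG, the speed of light, and a 57Fe γ-ray energy of about 14.413 keV. The function names are hypothetical.

```python
import math

EFG_AU_TO_V_PER_M2 = 9.7173624292e21  # 1 atomic unit of electric field gradient (CODATA)
C_MM_PER_S = 2.99792458e11            # speed of light in mm/s
E_GAMMA_EV = 14.4129e3                # 57Fe Moessbauer gamma-ray energy, ~14.413 keV
Q_57FE_M2 = 0.16e-28                  # 57Fe excited-state quadrupole moment used in the text


def quadrupole_splitting(efg_eigenvalues_au, q_m2=Q_57FE_M2):
    """Eqs. (1)-(3): return (Delta E_Q in mm/s, eta) from EFG principal components in au.

    The components are ordered so that |Vxx| <= |Vyy| <= |Vzz|, and
    eta = (Vxx - Vyy) / Vzz. Since e*Q*Vzz/2 in joules divided by e gives eV,
    the energy in eV is simply Q*Vzz/2 with Vzz in V/m^2; it is then converted
    to a Doppler velocity via v = (Delta E / E_gamma) * c.
    """
    if len(efg_eigenvalues_au) != 3:
        raise ValueError("expected three EFG principal components")
    vxx, vyy, vzz = sorted(efg_eigenvalues_au, key=abs)
    if vzz == 0.0:
        raise ValueError("Vzz is zero; eta is undefined")
    eta = (vxx - vyy) / vzz
    delta_e_ev = 0.5 * q_m2 * vzz * EFG_AU_TO_V_PER_M2 * math.sqrt(1.0 + eta ** 2 / 3.0)
    return delta_e_ev / E_GAMMA_EV * C_MM_PER_S, eta


def isomer_shift(rho0_au):
    """Eq. (6): delta_Fe (mm/s) = -0.404 * [rho(0) - 11614.16], rho(0) in au."""
    return -0.404 * (rho0_au - 11614.16)
```

With these constants, an axially symmetric EFG with $V_{zz}$ = -1 au gives $\Delta E_Q \approx$ -1.62 mm/s. For this value of Q, $|\Delta E_Q|$ in mm/s is therefore roughly 1.6 times $|V_{zz}|$ in atomic units. Experiments usually report only $|\Delta E_Q|$, whereas the sketch keeps the sign of $V_{zz}$.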

In addition, we used Bader's atoms-in-molecules (AIM) theory [56,57] to help analyze some of the results. For convenience, we give a very brief overview of this approach here. According to AIM theory, every chemical bond has a bond critical point at which the gradient of the charge density, ∇ρ(r), is zero. The topology of ρ(r) is described by the real, symmetric, second-rank Hessian tensor of ρ(r), whose trace is related to the bond interaction energy by a local expression of the virial theorem:

$$\mathrm{Tr}(\text{Hessian}) = \nabla^2\rho(\mathbf{r}) = \frac{4m}{\hbar^2}\,[2G(\mathbf{r}) + V(\mathbf{r})] \qquad (7)$$

where ∇²ρ(r) is the Laplacian of ρ(r), G(r) and V(r) are the electronic kinetic and potential energy densities, respectively, and m is the electron mass. Negative and positive ∇²ρ(r) values are associated with shared-electron (covalent) and closed-shell (electrostatic) interactions, respectively. In the latter case, one can further evaluate the total energy density, H(r), at the bond critical point:

$$H(\mathbf{r}) = G(\mathbf{r}) + V(\mathbf{r}) \qquad (8)$$

A negative H(r) indicates partial covalence, whereas a positive H(r) indicates a purely closed-shell, electrostatic interaction [56–58]. All critical point properties were calculated with the AIM2000 program [55].
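The classification logic of Eqs. (7) and (8) is easy to make concrete. The sketch below works in atomic units (ħ = m = 1), where Eq. (7) becomes ∇²ρ = 4(2G + V), and checks it against the bond critical point values reported in Table 4, with V(r) taken as negative, as a potential energy density must be. The threshold for calling H(r) "negligible" (10⁻³ au) and the function names are hypothetical choices made for this illustration; the authors give no numerical cutoff.

```python
# Threshold below which |H(r)| is treated as negligible: a hypothetical choice.
H_NEGLIGIBLE_AU = 1e-3


def laplacian_from_virial(g, v):
    """Eq. (7) in atomic units (hbar = m = 1): lap(rho) = 4 * (2G + V)."""
    return 4.0 * (2.0 * g + v)


def total_energy_density(g, v):
    """Eq. (8): H(r) = G(r) + V(r)."""
    return g + v


def classify_bcp(g, v):
    """Classify a bond critical point from its G(r) and V(r) values (au)."""
    if laplacian_from_virial(g, v) < 0.0:
        return "shared-electron (covalent)"
    if total_energy_density(g, v) < -H_NEGLIGIBLE_AU:
        return "closed-shell with partial covalence"
    return "closed-shell (electrostatic)"


# (G, V, reported Laplacian, reported H) from Table 4, in au
TABLE4 = {
    "1B H...Ot": (0.0201, -0.0203, 0.0798, -0.0002),
    "2B H...Ot": (0.0198, -0.0201, 0.0783, -0.0003),
    "3B H...Ot": (0.0192, -0.0192, 0.0769, 0.0000),
    "1B Fe-Ob": (0.2309, -0.2653, 0.7860, -0.0344),
    "2B Fe-Ob": (0.1737, -0.1981, 0.5974, -0.0244),
    "3B Fe-Ob": (0.1555, -0.1738, 0.5489, -0.0183),
}

for name, (g, v, lap, h) in TABLE4.items():
    print(f"{name}: lap = {laplacian_from_virial(g, v):.4f} (table {lap:.4f}), "
          f"H = {total_energy_density(g, v):+.4f} -> {classify_bcp(g, v)}")
```

The recomputed Laplacians and H(r) values agree with the tabulated ones to within rounding, which is a quick consistency check on the reported G(r) and V(r).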

3. RESULTS AND DISCUSSION

As shown in Table 1, the calculated Mulliken spin densities of the Fe and O2 moieties (ρFe and ρO2) confirm that the calculated systems are indeed in the closed-shell singlet, open-shell singlet, and triplet spin states. Based on the calculated total electronic energies, the Weiss-type open-shell singlet (2) is the most stable, followed by the Pauling-type closed-shell singlet (1) and then the triplet (3). This trend is independent of hydrogen bonding and is consistent with previous studies [37,39,40,42,44].

Deciphering Structural Fingerprints for Metalloproteins with Quantum Chemical Calculations

Table 1 Mulliken spin densities, charges, and energies of optimized MbO2 models

Model   ρFe (au)   ρO2 (au)   QFe (au)   QO2 (au)   E^a (kJ/mol)   EHB (kJ/mol)
1A       0.00       0.00       1.01      −0.31        0.00          n/a
1B       0.00       0.00       1.02      −0.37        0.00         −29.75
2A       1.15      −1.09       1.02      −0.32       −6.38          n/a
2B      −1.07       1.02       1.03      −0.37       −6.43         −29.80
3A       0.55       1.49       1.02      −0.33        1.16          n/a
3B       0.86       1.18       1.04      −0.40        4.99         −25.91

^a The total electronic energies are referenced to those of 1A and 1B for the non-HB (−2668.35151 au) and HB (−2933.92459 au) models, respectively.

Figure 2 Isosurface representations of (a) spin density in model 2B (face view); (b) spin density in model 2B (side view); and (c) αHOMO-3 in model 2B. Contour values are –0.02, 0.02, and 0.005 au, respectively. The arrow in (c) highlights the hydrogen atom involved in the hydrogen bonding of O2 with the distal His residue.

The spin density distribution in the most widely accepted model, 2B, is shown in two views in Figures 2a and 2b, which reveal clear antiferromagnetic coupling between iron and oxygen (dark and light gray, respectively). The hydrogen-bonding energies for the distal His residue were calculated from the optimized geometries as follows:

$$E_{\mathrm{HB}} = E[\mathrm{Fe(Por)(His)(O_2)\cdots His}] - E[\mathrm{Fe(Por)(His)(O_2)}] - E(\mathrm{His}) \qquad (9)$$

where Por stands for porphyrin. The predicted EHB of −29.80 kJ/mol for 2B is similar to that from a previous calculation using the hybrid quantum mechanics/molecular mechanics (QM/MM) method [42].

Yan Ling and Yong Zhang

Table 2 Geometric parameters of optimized MbO2 models

Model    Fe–Ob (Å)   Fe–Ob–Ot (deg)   Ob–Ot (Å)   H···Ot (Å)   H···Ob (Å)
Expt^a   1.83        116              1.22        n/a          n/a
1A       1.776       122.3            1.275       n/a          n/a
1B       1.770       121.8            1.289       1.964        2.636
2A       1.833       120.8            1.280       n/a          n/a
2B       1.829       120.5            1.292       1.971        2.625
3A       1.877       132.2            1.271       n/a          n/a
3B       1.861       129.9            1.287       1.978        2.707

^a The PDB file 1MBO (Reference [46]).

In addition, as shown in Table 2, the calculated Fe–Ob bond length in the lowest-energy state (2B) is close to the value seen in the oxymyoglobin X-ray structure [46] and to the experimental bond lengths of isoelectronic RNO iron porphyrin complexes [59]. Our computed Fe–Ob bond length, Ob–Ot bond length, and Fe–Ob–Ot angle are also similar to those from recent QM/MM calculations [37]. These results further support the use of the BPW91 method for geometry optimizations of the oxymyoglobin models. Table 2 also shows that the Fe–Ob bond length increases from the closed-shell singlet 1 to the open-shell singlet 2 and then to the triplet 3, a trend independent of hydrogen bonding. This suggests that the Fe–Ob bonding is largely determined by the electronic state of the complex. The hydrogen-bonding interactions are very similar in the closed-shell and open-shell singlet systems, as reflected by the almost identical hydrogen-bonding distances and energies in Tables 1 and 2; the hydrogen bond is slightly weaker in the triplet excited state. Interestingly, upon formation of the hydrogen bond, the Fe–Ob bond contracts in each of the three electronic states (see Table 2). Meanwhile, the charge analysis (see the QFe and QO2 data in Table 1) shows that hydrogen bonding makes iron slightly more positively charged and the O2 moiety much more negatively charged (a 20% increase). These results suggest that the FeIII–O2⁻ character is strengthened by the distal His hydrogen bond, consistent with the results of more sophisticated CASSCF/MM calculations [37]. The shorter Fe–O bond and the larger formal charges on the Fe and O2 moieties may enhance the interaction between them, providing another way, besides the stabilization from the hydrogen bond itself, of strengthening oxygen binding in the protein. This result highlights the role of the protein environment in Mb's biological function [37–44].
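Eq. (9) is a supermolecule difference of total electronic energies, so the bookkeeping behind the EHB values in Table 1 is short. A minimal sketch, assuming the three energies are in hartree (as in the footnote to Table 1) and using the standard conversion 1 hartree ≈ 2625.4996 kJ/mol; the function name is hypothetical and the energies in the usage line are placeholders, not values from this work.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # standard conversion factor (rounded)


def hydrogen_bond_energy(e_complex_his, e_complex, e_his):
    """Eq. (9): E_HB = E(Fe(Por)(His)(O2)...His) - E(Fe(Por)(His)(O2)) - E(His).

    Inputs are total electronic energies (hartree) at the optimized geometries;
    the result is in kJ/mol, and a negative value means the H-bond is stabilizing.
    """
    return (e_complex_his - e_complex - e_his) * HARTREE_TO_KJ_PER_MOL


# Placeholder energies (hartree), chosen only to show the arithmetic
print(f"E_HB = {hydrogen_bond_energy(-100.0113, -90.0, -10.0):.2f} kJ/mol")
```

On this scale, the −29.80 kJ/mol reported for 2B corresponds to only about 0.011 hartree, which is why the total energies must be converged tightly for such differences to be meaningful.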
This hydrogen bonding also affects the Mössbauer spectroscopic properties, as shown in Table 3. It should be noted that the B3LYP method has enabled accurate predictions of Mössbauer quadrupole splittings and isomer shifts in a wide variety of iron proteins and model systems covering all iron spin states and coordination states [8–14,23], with theory-versus-experiment correlation coefficients of R² = 0.98 over an experimental range of 8.80 mm/s in 47 systems for ΔEQ and R² = 0.97 over an experimental range of 2.34 mm/s in 48 systems for δFe. As seen from Table 3, for all three electronic states, hydrogen bonding makes the Mössbauer quadrupole splitting more negative and the isomer shift smaller. As discussed above, the FeIII–O2⁻ character is enhanced upon hydrogen bonding; the iron therefore experiences more negative charge flow from the oxygen moiety, which would make the quadrupole splitting, a measure of the EFG, more negative [9]. The Fe–O bond shortening that accompanies formation of the distal hydrogen bond brings the negatively charged O2⁻ moiety closer than in the non-HB models and thus increases the charge density at the iron nucleus, ρ(0). According to Eq. (6), this results in the smaller isomer shifts seen in Table 3. Comparing the predicted and experimental values of all three Mössbauer parameters (quadrupole splitting, asymmetry parameter, and isomer shift) together for the six MbO2 models shows that the computed values of −2.11 mm/s, 0.2, and 0.36 mm/s for the lowest-energy form, the Weiss-type open-shell singlet 2B, are in the best agreement with the experimental values of −2.31 mm/s, 0.2, and 0.27 mm/s, respectively. While the computed isomer shift is similar to those from a recent report (0.33 and 0.36 mm/s), the computed Mössbauer quadrupole splitting here represents an improvement over the most recent calculations (−2.67 and −2.76 mm/s) [37]. Moreover, this is the first detailed investigation of the hydrogen-bonding effect on Mössbauer spectroscopic properties in metalloproteins. These results suggest that quantum chemical calculations of Mössbauer properties, which are sensitive probes of iron-containing proteins, can help identify and assess the effect of hydrogen bonding in the protein active site.

Table 3 Mössbauer properties of optimized MbO2 models

Model    H-bond    S    ΔEQ (mm/s)   η     δFe (mm/s)
Expt^a   n/a       0    −2.31        0.2   0.27
1A       Non-HB    0    −1.78        0.4   0.35
1B       HB        0    −1.99        0.3   0.32
2A       Non-HB    0    −2.00        0.1   0.39
2B       HB        0    −2.11        0.2   0.36
3A       Non-HB    1     1.78        0.9   0.48
3B       HB        1    −2.19        0.7   0.40

^a Reference [45].
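The two qualitative conclusions drawn from Table 3 can be checked directly against the tabulated numbers: adding the distal hydrogen bond makes ΔEQ more negative and δFe smaller in every spin state, and 2B agrees best with experiment. The authors do not state a numerical figure of merit, so the unweighted sum of absolute deviations below is a hypothetical choice (it mixes mm/s with the dimensionless η) and should be read only as a consistency check; the 3A entry is used exactly as printed.

```python
# Table 3 values as (Delta E_Q [mm/s], eta, delta_Fe [mm/s])
EXPT = (-2.31, 0.2, 0.27)
MODELS = {
    "1A": (-1.78, 0.4, 0.35), "1B": (-1.99, 0.3, 0.32),
    "2A": (-2.00, 0.1, 0.39), "2B": (-2.11, 0.2, 0.36),
    "3A": (1.78, 0.9, 0.48), "3B": (-2.19, 0.7, 0.40),
}
HB_PAIRS = [("1A", "1B"), ("2A", "2B"), ("3A", "3B")]  # (non-HB, HB)


def hb_shifts(non_hb, hb):
    """Change in Delta E_Q and delta_Fe when the distal His hydrogen bond is added."""
    d_eq = MODELS[hb][0] - MODELS[non_hb][0]
    d_delta = MODELS[hb][2] - MODELS[non_hb][2]
    return d_eq, d_delta


def total_abs_deviation(name):
    """Hypothetical figure of merit: unweighted sum of |calc - expt| over three parameters."""
    return sum(abs(calc - expt) for calc, expt in zip(MODELS[name], EXPT))


for non_hb, hb in HB_PAIRS:
    d_eq, d_delta = hb_shifts(non_hb, hb)
    print(f"{non_hb} -> {hb}: dDeltaE_Q = {d_eq:+.2f} mm/s, ddelta = {d_delta:+.2f} mm/s")

best_model = min(MODELS, key=total_abs_deviation)
print("closest to experiment:", best_model)
```

Under this metric 2B comes out ahead even though 1B has the closer isomer shift and 3B the closer quadrupole splitting, which illustrates why the authors compare all three parameters together.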
In addition, these results suggest that although the Fe–Ob bond length in the oxymyoglobin X-ray structure [46] is essentially the same as computed here, the Ob–Ot bond length and the Fe–Ob–Ot angle need to be refined, consistent with previous investigations of protein X-ray structures [10,13,14,18,22,27]. To further characterize this important hydrogen bond, which strengthens O2 binding in myoglobin, we performed AIM calculations to analyze the bond critical point properties. As seen from Figure 1, the hydrogen atom involved in the hydrogen bond is located between two oxygen atoms.


Table 4 AIM results of optimized MbO2 models

Models                    ρ(r) (au)     G(r) (au)     V(r) (au)           ∇²ρ(r) (au)   H(r) (au)
1B H···Ot                 0.0245        0.0201        −0.0203             0.0798        −0.0002
2B H···Ot                 0.0245        0.0198        −0.0201             0.0783        −0.0003
3B H···Ot                 0.0232        0.0192        −0.0192             0.0769         0.0000
Ranges for other HBs^a    0.012–0.025   0.004–0.026   −0.024 to −0.003    0.020–0.109   −0.001 to 0.003
1B Fe–Ob                  0.1445        0.2309        −0.2653             0.7860        −0.0344
2B Fe–Ob                  0.1280        0.1737        −0.1981             0.5974        −0.0244
3B Fe–Ob                  0.1183        0.1555        −0.1738             0.5489        −0.0183

^a References [20, 58].

However, because the H···Ob distances are too large (see Table 2), AIM calculations identify only one hydrogen bond in each of the HB models (1B, 2B, 3B). As shown in Table 4, all the bond critical point properties are similar to those of other typical hydrogen bonds, such as those found in protein backbone structures [58] and some metal N–H/C–H interactions [20]. The positive Laplacian ∇²ρ(r) indicates that the H···Ot hydrogen bond is essentially electrostatic in nature, and the negligible total energy density H(r) further suggests that the bonding is almost purely electrostatic, with van der Waals penetration [20,58]. This kind of penetration is illustrated in Figure 2c. This hydrogen bonding is also similar to that in MbNO [10]. Although the Fe–Ob bonds are mostly electrostatic in nature, or dative as found recently by CASSCF/MM calculations [37], based on their positive Laplacian ∇²ρ(r) values (see Table 4), they have substantial covalent character, since their total energy densities H(r) are much more negative, according to AIM theory [56–58]. In fact, the recent CASSCF/MM calculations [37] also suggest an important role for the iron d orbital and the oxygen orbital in Fe–O2 bonding, as can be seen from Figures 2a and 2b.

4. CONCLUSIONS

The results described above are of interest for a number of reasons. First, the distal hydrogen bond was found to provide approximately 29.8 kJ/mol of stabilization for O2 binding in Mb in the ground state, which highlights the role of the protein environment in the biological function of myoglobin; the hydrogen bonding also increases the FeIII–O2⁻ character. Second, a systematic investigation of the Mössbauer properties of various non-HB and HB models of MbO2 reveals a clear hydrogen-bonding effect: the more negative Mössbauer quadrupole splitting results from the more pronounced FeIII–O2⁻ character, and the smaller Mössbauer isomer shift results from the shortened Fe–O bond. The predicted Mössbauer quadrupole splitting, asymmetry parameter, and isomer shift of the lowest-energy form, the most widely accepted Weiss-type open-shell singlet 2B, are in the best agreement with experiment. Third, AIM calculations suggest that the H···Ot bonding is almost purely electrostatic, with van der Waals penetration, similar to other ordinary hydrogen bonds in, e.g., protein backbone structures. Taken together, these results represent the first quantum chemical investigation of the hydrogen-bonding effect on the Mössbauer spectroscopic properties of metalloproteins; these properties are sensitive probes of iron sites in proteins. The results suggest that quantum chemical calculations of Mössbauer properties can help identify and assess the effect of hydrogen bonding in the protein active site, which should facilitate spectroscopic characterizations and structural investigations of iron-containing proteins.

ACKNOWLEDGMENTS

This work was supported by NIH grant GM-085774. We are also grateful to the Mississippi Center for Supercomputing Research and the USM Vislab for the generous use of their computing facilities.

REFERENCES

1. Holm, R.H., Kennepohl, P., Solomon, E.I. Structural and functional aspects of metal sites in biology. Chem. Rev. 1996, 96, 2239–314.
2. Cowan, J.A. Inorganic Biochemistry: An Introduction, Wiley-VCH, New York, 1997.
3. Neese, F. Quantum chemical calculations of spectroscopic properties of metalloproteins and model compounds: EPR and Mössbauer properties. Curr. Opin. Chem. Biol. 2003, 7, 125–35.
4. Wang, B., Merz, K.M. Validation of the binding site structure of the cellular retinol-binding protein (CRBP) by ligand NMR chemical shift perturbations. J. Am. Chem. Soc. 2005, 127, 5310–1.
5. Kervern, G., Pintacuda, G., Zhang, Y., Oldfield, E., Roukoss, C., Kuntz, E., et al. Solid-state NMR of a paramagnetic DIAD-FeII catalyst: Sensitivity, resolution enhancement, and structure-based assignments. J. Am. Chem. Soc. 2006, 128, 13545–52.
6. Mao, J.H., Zhang, Y., Oldfield, E. Nuclear magnetic resonance shifts in paramagnetic metalloporphyrins and metalloproteins. J. Am. Chem. Soc. 2002, 124, 13911–20.
7. Zhang, Y., Mukherjee, S., Oldfield, E. 67Zn NMR chemical shifts and electric field gradients in zinc complexes: A quantum chemical investigation. J. Am. Chem. Soc. 2005, 127, 2370–1.
8. Zhang, Y., Mao, J.H., Oldfield, E. 57Fe Mössbauer isomer shifts of heme protein model systems: Electronic structure calculations. J. Am. Chem. Soc. 2002, 124, 7829–39.
9. Zhang, Y., Mao, J.H., Godbout, N., Oldfield, E. Mössbauer quadrupole splittings and electronic structure in heme proteins and model systems: A density functional theory investigation. J. Am. Chem. Soc. 2002, 124, 13921–30.
10. Zhang, Y., Gossman, W., Oldfield, E. A density functional theory investigation of Fe-N-O bonding in heme proteins and model systems. J. Am. Chem. Soc. 2003, 125, 16387–96.
11. Zhang, Y., Oldfield, E. An investigation of the unusual 57Fe Mössbauer quadrupole splittings and isomer shifts in 2 and 3-coordinate Fe(II) complexes. J. Phys. Chem. B 2003, 107, 7180–8.
12. Zhang, Y., Oldfield, E. 57Fe Mössbauer quadrupole splittings and isomer shifts in spin-crossover complexes: A density functional theory investigation. J. Phys. Chem. A 2003, 107, 4147–50.
13. Zhang, Y., Oldfield, E. On the Mössbauer spectra of isopenicillin N synthase and a model {FeNO}7 (S=3/2) system. J. Am. Chem. Soc. 2004, 126, 9494–5.
14. Zhang, Y., Oldfield, E. Cytochrome P450: An investigation of the Mössbauer spectra of a reaction intermediate and an Fe(IV)=O model system. J. Am. Chem. Soc. 2004, 126, 4470–1.
15. Zhang, Y., Oldfield, E. Solid-state 31P NMR chemical shielding tensors in phosphonates and bisphosphonates: A quantum chemical investigation. J. Phys. Chem. B 2004, 108, 19533–40.
16. Zhang, Y., Sun, H.H., Oldfield, E. Solid-state NMR Fermi contact and dipolar shifts in organometallic complexes and metalloporphyrins. J. Am. Chem. Soc. 2005, 127, 3652–3.
17. Cheng, F., Sun, H.H., Zhang, Y., Mukkamala, D., Oldfield, E. Solid state 13C NMR, crystallographic, and quantum chemical investigation of chemical shifts and hydrogen bonding in histidine dipeptides. J. Am. Chem. Soc. 2005, 127, 12544–54.
18. Mao, J.H., Mukherjee, S., Zhang, Y., Cao, R., Sanders, J.M., Song, Y.C., et al. Solid-state NMR, crystallographic, and computational investigation of bisphosphonates and farnesyl diphosphate synthase-bisphosphonate complexes. J. Am. Chem. Soc. 2006, 128, 14485–97.
19. Zhang, Y., Oldfield, E. 31P NMR chemical shifts in hypervalent oxyphosphoranes and polymeric orthophosphates. J. Phys. Chem. B 2006, 110, 579–86.
20. Zhang, Y., Lewis, J.C., Bergman, R.G., Ellman, J.A., Oldfield, E. NMR shifts, orbitals, and M···H-X bonding in d8 square planar metal complexes. Organometallics 2006, 25, 3515–9.
21. Mukkamala, D., Zhang, Y., Oldfield, E. A solid state 13C NMR, crystallographic, and quantum chemical investigation of phenylalanine and tyrosine residues in dipeptides and proteins. J. Am. Chem. Soc. 2007, 129, 7385–92.
22. Zhang, Y., Oldfield, E. NMR hyperfine shifts in blue copper proteins: A quantum chemical investigation. J. Am. Chem. Soc. 2008, 130, 3814–23.
23. Ling, Y., Zhang, Y. Mössbauer, NMR, geometric, and electronic properties in S=3/2 iron porphyrins. J. Am. Chem. Soc. 2009, 131, 6386–8.
24. Ling, Y., Zhang, Y. Deciphering the NMR fingerprints of the disordered system with quantum chemical studies. J. Phys. Chem. A 2009, 113, 5993–7.
25. Sharma, A.K., Ling, Y., Greer, A.B., Hafler, D.A., Kent, S.C., Zhang, Y., et al. Evaluating the intrinsic cysteine redox-dependent states of the A-chain of human insulin using NMR spectroscopy, quantum chemical calculations, and mass spectrometry. J. Phys. Chem. B 2010, 114, 585–91.
26. Ling, Y., Mills, C., Weber, R., Yang, L., Zhang, Y. NMR, IR/Raman, and structural properties in HNO and RNO (R=alkyl and aryl) metalloporphyrins with implication for the HNO-myoglobin complex. J. Am. Chem. Soc. 2010, 132, 1583–91.
27. McMahon, M.T., deDios, A.C., Godbout, N., Salzmann, R., Laws, D.D., Le, H., Havlin, R.H., Oldfield, E. An experimental and quantum chemical investigation of CO binding to heme proteins and model systems: A unified model based on 13C, 17O, and 57Fe nuclear magnetic resonance and 57Fe Mössbauer and infrared spectroscopies. J. Am. Chem. Soc. 1998, 120, 4784–97.
28. RCSB Protein Data Bank. Nature of 3D Structural Data, http://www.rcsb.org/pdb/static.do?p=general_information/about_pdb/nature_of_3d_structural_data.html.
29. Guss, J.M., Bartunik, H.D., Freeman, H.C. Accuracy and precision in protein structure analysis: Restrained least-squares refinement of the structure of poplar plastocyanin at 1.33 Å resolution. Acta Crystallogr. B 1992, 48, 790–811.
30. Ray, G.B., Li, X.Y., Ibers, J.A., Sessler, J.L., Spiro, T.G. How far can proteins bend the FeCO unit? Distal polar and steric effects in heme proteins and models. J. Am. Chem. Soc. 1994, 116, 162–76.
31. Signorini, G.F., Chelli, R., Procacci, P., Schettino, V. Energetic fitness of histidine protonation states in PDB structures. J. Phys. Chem. B 2004, 108, 12252–7.
32. Protein Data Bank structures with file numbers 2CMM, 1EBT, 1EBC, 1B0B, 1EMY, 2FAL.
33. Ryde, U., Nilsson, K. Quantum chemistry can locally improve protein crystal structures. J. Am. Chem. Soc. 2003, 125, 14232–3.
34. Yu, N., Yennawar, H.P., Merz, K.M., Jr. Refinement of protein crystal structures using energy restraints derived from linear-scaling quantum mechanics. Acta Crystallogr. D 2005, 61, 322–32.
35. Hohenberg, P., Kohn, W. Inhomogeneous electron gas. Phys. Rev. 1964, 136, B864–71.
36. Karplus, M. Vicinal proton coupling in nuclear magnetic resonance. J. Am. Chem. Soc. 1963, 85, 2870–1.
37. Chen, H., Ikeda-Saito, M., Shaik, S. Nature of the Fe-O2 bonding in oxy-myoglobin: Effect of the protein. J. Am. Chem. Soc. 2008, 130, 14778–90.
38. Jensen, K.P., Roos, B.O., Ryde, U. O2-binding to heme: Electronic structure and spectrum of oxyheme, studied by multiconfigurational methods. J. Inorg. Biochem. 2005, 99, 45–54.
39. Angelis, F.D., Jarzecki, A.A., Car, R., Spiro, T.G. Quantum chemical evaluation of protein control over heme ligation: CO/O2 discrimination in myoglobin. J. Phys. Chem. B 2005, 109, 3065–70.
40. Blomberg, L.M., Blomberg, M.R.A., Siegbahn, P.E.M. A theoretical study on the binding of O2, NO and CO to heme proteins. J. Inorg. Biochem. 2005, 99, 949–58.
41. Rovira, C. Role of the His64 residue on the properties of the Fe–CO and Fe–O2 bonds in myoglobin. A CHARMM/DFT study. J. Mol. Struct. (Theochem) 2003, 632, 309–21.
42. Sigfridsson, E., Ryde, U. Theoretical study of the discrimination between O2 and CO by myoglobin. J. Inorg. Biochem. 2002, 91, 101–15.
43. Scherlis, D.A., Estrin, D.A. Hydrogen bonding and O2 affinity of hemoglobins. J. Am. Chem. Soc. 2001, 123, 8436–7.
44. Rovira, C., Kunc, K., Hutter, J., Ballone, P., Parrinello, M. Equilibrium geometries and electronic structure of iron-porphyrin complexes: A density functional study. J. Phys. Chem. A 1997, 101, 8914–25.
45. Debrunner, P.G. In Iron Porphyrins (eds A.B.P. Lever and H.B. Gray), Vol. 3. VCH Publishers, New York, 1989, pp. 139–234.
46. Phillips, S.E. Structure and refinement of oxymyoglobin at 1.6 Å resolution. J. Mol. Biol. 1980, 142, 531–54.
47. Becke, A.D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 1988, 38, 3098–100.
48. Perdew, J.P., Burke, K., Wang, Y. Generalized gradient approximation for the exchange-correlation hole of a many-electron system. Phys. Rev. B 1996, 54, 16533–9.
49. Wachters, A.J.H. Gaussian basis set for molecular wavefunctions containing third-row atoms. J. Chem. Phys. 1970, 52, 1033–6.
50. http://www.emsl.pnl.gov/forms/basisform.html.
51. Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., et al. Gaussian 98, Revision A.9, Gaussian, Inc., Pittsburgh, PA, 1998.
52. Becke, A.D. Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 1993, 98, 5648–52.
53. Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., et al. Gaussian 03, Revision D.01, Gaussian, Inc., Wallingford, CT, 2004.
54. Dufek, P., Blaha, P., Schwarz, K. Determination of the nuclear-quadrupole moment of 57Fe. Phys. Rev. Lett. 1995, 75, 3545–8.
55. Biegler-König, F. AIM2000, Version 2.0, University of Applied Science, Bielefeld, Germany, 2002.
56. Bader, R.F.W. Atoms in Molecules: A Quantum Theory, Clarendon Press, Oxford, 1990.
57. Bader, R.F.W. A bond path: A universal indicator of bonded interactions. J. Phys. Chem. A 1998, 102, 7314–23.
58. Arnold, W.D., Oldfield, E. The chemical nature of hydrogen bonding in proteins via NMR: J-couplings, chemical shifts, and AIM theory. J. Am. Chem. Soc. 2000, 122, 12835–41.
59. Godbout, N., Sanders, L.K., Salzmann, R., Havlin, R.H., Wojdelski, M., Oldfield, E. Solid-state NMR, Mössbauer, crystallographic, and density functional theory investigation of Fe-O2 and Fe-O2 analogue metalloporphyrins and metalloproteins. J. Am. Chem. Soc. 1999, 121, 3829–44.

CHAPTER 6

Ab Initio Electron Propagator Methods: Applications to Fullerenes and Nucleic Acid Fragments

Viatcheslav G. Zakrzewski, Olga Dolgounitcheva, Alexander V. Zakjevskii, and J.V. Ortiz

Contents

1. Introduction 80
2. Electron Propagator Theory 80
   2.1 Self-energy approximations 81
   2.2 Quasiparticle virtual orbital spaces 84
3. Applications 86
   3.1 Buckminsterfullerene, C60 86
   3.2 Oligonucleotides 87
4. Conclusions 91
Acknowledgments 92
References 92

Abstract

Energies of electron attachment or detachment for closed-shell molecules and ions that are large by the standards of ab initio quantum chemistry may be calculated accurately and efficiently with electron propagator methods. Low-order quasiparticle approximations and their renormalized extensions are compared. A procedure for reducing the dimension of the virtual orbital space introduces only small errors relative to ordinary calculations. A study of the vertical ionization energies of the C60 fullerene reveals many nearly coincident cationic states, some of which exhibit strong correlation effects. Calculations of the electron detachment energies of anionic fragments of nucleic acids produce many final states and indicate that corrections to Hartree–Fock orbital energies are necessary to obtain the correct ordering of these states.

Department of Chemistry and Biochemistry, Auburn University, Auburn, AL, USA

Annual Reports in Computational Chemistry, Volume 6
ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06006-8

© 2010 Elsevier B.V. All rights reserved.


Viatcheslav G. Zakrzewski et al.

Keywords: electron propagator; quasiparticle approximations; renormalized approximations; quasiparticle virtual orbitals; C60 fullerene ionization energies; correlation states; nucleotide electron detachment energies

1. INTRODUCTION

Propagator approaches to quantum mechanical problems have a long history in theoretical physics. Many-body problems may be formulated in terms of a sequence of propagators, also known as Green's functions [1]. Introducing a sequence of propagators for increasing numbers of particles, from one up to the full N-particle limit, has been shown to be equivalent to solving the many-body Schrödinger equation. The one-electron Green's function, or electron propagator, of an N-electron system is a function whose poles (energies at which denominators vanish and singularities therefore arise) represent all possible electron binding energies. Within the Born–Oppenheimer approximation, the poles of the electron propagator equal negative vertical detachment energies (VDEs) or negative vertical attachment energies (VAEs). A recent monograph discusses the introduction of propagator concepts to molecular electronic structure theory and the evolution of their applications [2].

2. ELECTRON PROPAGATOR THEORY

The physical content of the electron propagator resides chiefly in its poles, the energies at which singularities lie, and its residues, the coefficients of the terms responsible for the singularities. The residue corresponding to an electron propagator pole, Epole, is defined by Res(Epole) = lim_{E→Epole} Gpq(E)(E − Epole). In its spectral form, the r, s element of the electron propagator (or one-electron Green's function) matrix is

$$G_{rs}(E) = \langle\langle a_r^\dagger; a_s \rangle\rangle = \lim_{\eta \to 0} \left\{ \sum_n \frac{\langle N | a_r^\dagger | N-1, n\rangle \langle N-1, n | a_s | N\rangle}{E + E_n(N-1) - E_0(N) - i\eta} + \sum_m \frac{\langle N | a_s | N+1, m\rangle \langle N+1, m | a_r^\dagger | N\rangle}{E - E_m(N+1) + E_0(N) + i\eta} \right\} \qquad (1)$$

The limit with respect to η is taken because of the integration techniques required in a Fourier transform from the time-dependent representation. Indices r and s refer to general, orthonormal spin-orbitals, φr(x) and φs(x), respectively, where x is a space-spin coordinate. Matrix elements of the corresponding field operators, a†r and as, depend on the N-electron reference state, |N⟩, and on final states with N ± 1 electrons, labeled by the indices m and n. The propagator matrix is energy dependent; poles occur when E equals a negative VDE, E0(N) − En(N − 1), or a negative VAE, Em(N + 1) − E0(N).
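To make Eq. (1) and the residue definition concrete, the sketch below evaluates the spectral form for a hypothetical two-orbital model with three poles (two detachment, one attachment) and real Feynman–Dyson amplitudes; all numbers are invented for illustration. It estimates each residue numerically as (E − Epole)G(E) just off the pole and checks two properties: each residue is the outer product of that pole's amplitudes, and the residues summed over all poles give the identity matrix, because ⟨N|{a†r, as}|N⟩ = δrs for orthonormal spin-orbitals. The model amplitudes were chosen so that this sum rule holds.

```python
import numpy as np

# Hypothetical model: two spin-orbitals (r, s = 0, 1) and three poles, in hartree.
POLES = np.array([-0.45, -0.32, 0.12])      # E0(N) - En(N-1) or Em(N+1) - E0(N)
DETACHMENT = np.array([True, True, False])  # True for N-1 final states, False for N+1
# Real Feynman-Dyson amplitudes U[r, k]; rows are orthonormal so the sum rule holds.
U = np.array([[0.80, 0.60, 0.00],
              [-0.36, 0.48, 0.80]])


def green(E, eta=1e-6):
    """Spectral form of G_rs(E), Eq. (1), for real amplitudes and a finite eta."""
    sign = np.where(DETACHMENT, -1.0, 1.0)  # -i*eta for N-1 terms, +i*eta for N+1 terms
    denom = E - POLES + 1j * sign * eta
    return (U / denom) @ U.T


def residue(k, delta=1e-8):
    """Numerical estimate of lim_{E -> E_pole} (E - E_pole) G(E) at pole k (eta = 0)."""
    E = POLES[k] + delta
    return (E - POLES[k]) * green(E, eta=0.0)


for k, pole in enumerate(POLES):
    print(f"pole {pole:+.2f} Eh, residue =\n{residue(k).real.round(4)}")
```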

Ab Initio Electron Propagator Methods


Corresponding residues are related to the Feynman–Dyson amplitudes (FDAs), where

$$U_{r,n} = \langle N-1, n | a_r | N \rangle \qquad (2)$$

or

$$U_{r,n} = \langle N+1, n | a_r^\dagger | N \rangle. \qquad (3)$$

FDAs suffice for constructing Dyson orbitals (DOs) for VDEs, where

$$\phi_n^{\mathrm{Dyson,VDE}}(x) = \sum_r \phi_r(x)\, U_{r,n}, \qquad (4)$$

and for VAEs, where

$$\phi_n^{\mathrm{Dyson,VAE}}(x) = \sum_r \phi_r(x)\, U_{r,n}. \qquad (5)$$

In the former case, the DO is related to the initial- and final-state wavefunctions via

$$\phi_n^{\mathrm{Dyson,VDE}}(x_1) = \sqrt{N} \int \Psi_N(x_1, x_2, x_3, \ldots, x_N)\, \Psi_{N-1,n}^{*}(x_2, x_3, x_4, \ldots, x_N)\, dx_2\, dx_3\, dx_4 \cdots dx_N, \qquad (6)$$

and for VAEs via

$$\phi_n^{\mathrm{Dyson,VAE}}(x_1) = \sqrt{N+1} \int \Psi_{N+1,n}(x_1, x_2, x_3, \ldots, x_{N+1})\, \Psi_N^{*}(x_2, x_3, x_4, \ldots, x_{N+1})\, dx_2\, dx_3\, dx_4 \cdots dx_{N+1}. \qquad (7)$$
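Because the spin-orbitals φr are orthonormal, Eq. (4) implies that the squared norm of a Dyson orbital equals Σr|Ur,n|², which is the pole strength of that transition. The sketch below checks this numerically on a grid. It uses particle-in-a-box functions as a stand-in orthonormal basis and a hypothetical set of real amplitudes; neither comes from the chapter, and the helper names are illustrative.

```python
import numpy as np


def box_orbital(k, x):
    """Orthonormal particle-in-a-box function on [0, 1]; a stand-in spin-orbital basis."""
    return np.sqrt(2.0) * np.sin(k * np.pi * x)


def dyson_orbital(amplitudes, x):
    """Eq. (4): phi_n^Dyson(x) = sum_r phi_r(x) U_{r,n} (real amplitudes assumed)."""
    return sum(u * box_orbital(r + 1, x) for r, u in enumerate(amplitudes))


def trapezoid(y, x):
    """Composite trapezoid rule on a uniform grid."""
    h = x[1] - x[0]
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))


x = np.linspace(0.0, 1.0, 4001)
amplitudes = np.array([0.93, 0.20, -0.10])  # hypothetical U_{r,n} for one final state
phi = dyson_orbital(amplitudes, x)
norm = trapezoid(phi**2, x)                 # <phi^Dyson | phi^Dyson> on the grid
pole_strength = np.sum(amplitudes**2)       # the same quantity from the amplitudes
print(f"Dyson-orbital norm = {norm:.6f}, sum of |U|^2 = {pole_strength:.6f}")
```

A norm noticeably below 1 signals that the final state is not a simple one-hole (or one-particle) state, which is the same diagnostic expressed by the pole strength in the quasiparticle methods discussed next.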

2.1 Self-energy approximations

FDAs and electron binding energies (i.e., negative VDEs and VAEs) can be found by solving the Dyson equation, which, in its inverse form, can be written as

$$G^{-1}(E) = G_0^{-1}(E) - \Sigma(E). \qquad (8)$$

A one-electron, zeroth-order Hamiltonian defines a set of reference eigenfunctions (spin-orbitals) and eigenvalues (ε) such that the matrix elements of the corresponding inverse propagator matrix read

$$[G_0^{-1}(E)]_{rs} = (E - \varepsilon_r)\,\delta_{rs}. \qquad (9)$$


Canonical Hartree–Fock orbitals have been the usual choice among quantum chemists. Effects of electron correlation and orbital relaxation in the final states are described by the self-energy operator, Σ(E). This operator can be written as a sum of energy-independent and energy-dependent parts according to

$$\Sigma(E) = \Sigma(\infty) + \Sigma'(E), \qquad (10)$$

where the first and second terms are also known as the constant and dynamic self-energy operators, respectively. The Dyson equation can be recast as follows [3–6]:

$$[F + \Sigma'(E_p)]\,\phi_p^{\mathrm{Dyson}} = E_p\,\phi_p^{\mathrm{Dyson}}, \qquad (11)$$

where F is the usual Fock operator with one-electron, Coulomb, and exchange components. Because of the energy dependence of Σ'(E), iterations with respect to E may be performed until the eigenvalue agrees with the argument of Σ'(E). Practical calculations require approximations to the self-energy operator. Perturbative improvements to Hartree–Fock canonical orbital energies can be generated efficiently by neglecting the off-diagonal matrix elements of the self-energy operator in this basis. Such diagonal, or quasiparticle, approximations simplify the Dyson equation to the form

$$E_p = \varepsilon_p + \Sigma_{pp}(E_p), \qquad (12)$$

where Ep is the electron binding energy (negative VDE or VAE). Two kinds of quasiparticle techniques have been widely applied: the outer valence Green function (OVGF) methods [7] and the partial third-order (P3) approximation [8–10]. In both cases, low-order terms in the diagonal self-energy matrix elements are evaluated. These methods work best for correcting Koopmans results for outer valence molecular orbitals (MOs). A criterion of validity for the OVGF and P3 methods is provided by the pole strength, P, which can be calculated as

$$P_q = \left[1 - \frac{\partial \Sigma_{qq}(\omega)}{\partial \omega}\right]^{-1}, \qquad (13)$$

where Σqq(ω) is the qth diagonal element of the self-energy evaluated at E = ω. For quasiparticle methods, and the one-electron description of a transition that accompanies them, to be valid, the values of Pq must be close to unity [7]. Formulas for quasiparticle approximations can be derived in two ways. The first approach arises [7,11] from older many-body concepts related to quantum field theory [12,13]. Pictorial representations of self-energy expressions take the form of Feynman, Goldstone [14], or Hugenholtz [15] diagrams. After a first presentation of the OVGF approximation [16], a more flexible formulation followed [7]. Detailed numerical procedures have also been described [17]. An alternative way to derive the perturbational expansion for the electron propagator is to use an algebraic approach based on superoperators [3–6,18].
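The difference between solving Eq. (11) with the full dynamic self-energy matrix and solving the diagonal (quasiparticle) Eq. (12), together with the pole strength of Eq. (13), can be illustrated with a toy model. The model self-energy below, Σ'pq(E) = Σk VpkVqk/(E − ωk), has the same pole structure as low-order self-energies, but its couplings and pole positions are invented; the constant part Σ(∞) is set to zero and F is taken as the diagonal matrix of Hartree–Fock orbital energies. This is a sketch under those assumptions, not the OVGF or P3 algorithm.

```python
import numpy as np

# Hypothetical two-orbital model; all numbers are illustrative (hartree).
EPS = np.array([-0.50, -0.30])              # canonical HF orbital energies (Koopmans values)
OMEGA = np.array([-1.40, 0.90, 1.20])       # pole positions of the model self-energy
V = np.array([[0.06, 0.04, 0.03],           # V[p, k]: coupling of orbital p to pole k
              [0.02, 0.05, 0.07]])


def sigma(E):
    """Model dynamic self-energy matrix: Sigma'_pq(E) = sum_k V_pk V_qk / (E - omega_k)."""
    return (V / (E - OMEGA)) @ V.T


def dsigma_pp(E, p):
    """Energy derivative of the diagonal element Sigma'_pp(E)."""
    return -np.sum(V[p] ** 2 / (E - OMEGA) ** 2)


def quasiparticle(p, tol=1e-12, max_iter=50):
    """Eq. (12) solved by Newton's method; returns (E_p, pole strength of Eq. 13)."""
    E = EPS[p]                               # start from the Koopmans value
    for _ in range(max_iter):
        step = (E - EPS[p] - sigma(E)[p, p]) / (1.0 - dsigma_pp(E, p))
        E -= step
        if abs(step) < tol:
            break
    return E, 1.0 / (1.0 - dsigma_pp(E, p))


def nondiagonal(p, tol=1e-12, max_iter=100):
    """Eq. (11): iterate E -> eigenvalue of F + Sigma'(E) closest to E until self-consistent."""
    E = EPS[p]
    for _ in range(max_iter):
        evals = np.linalg.eigvalsh(np.diag(EPS) + sigma(E))
        E_new = evals[np.argmin(np.abs(evals - E))]
        if abs(E_new - E) < tol:
            return E_new
        E = E_new
    return E


for p in range(len(EPS)):
    E_qp, P = quasiparticle(p)
    E_full = nondiagonal(p)
    print(f"orbital {p}: Koopmans {EPS[p]:+.5f}, diagonal {E_qp:+.5f} (P = {P:.4f}), "
          f"nondiagonal {E_full:+.5f}")
```

Because the model's off-diagonal elements are small and its poles lie far from the roots, the diagonal and nondiagonal solutions differ only slightly and the pole strengths stay close to unity, the regime in which quasiparticle methods are expected to work.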


Further development leads to the P3 approximation [8–10], which is computationally more economical than OVGF. Although OVGF and P3 are similar methods, computationally significant differences exist. OVGF requires ⟨ab||cd⟩ transformed electron repulsion integrals in the canonical Hartree–Fock basis, where the usual notation for occupied (i, j, k, ...) and unoccupied (a, b, c, ...) MOs is employed. OVGF calculations can still be done without explicitly computing and storing these integrals, with algorithms that recalculate them in the atomic orbital basis as they are needed. This strategy [19,20] has been implemented in the Gaussian suite of programs [21]. Calculations based on limited integral transformations can be done at the cost of extra CPU time. The P3 method does not require ⟨ab||cd⟩ transformed integrals for the calculation of VDEs. (Terms with these integrals do occur in P3 calculations of VAEs.) OVGF and P3 have ov⁴ and o²v³ arithmetic scaling factors for VDEs, respectively, where o and v are the numbers of occupied and virtual MOs, respectively.

More precise and flexible methods are needed when the quasiparticle approximation fails or when low-order perturbative corrections to Koopmans results are likely to be unreliable. There are many circumstances where infinite-order, or renormalized, approximations are needed. Strong relaxation effects, such as those that accompany core ionizations, require an alternative to the quasiparticle methods. Strong differential correlation effects are common in the study of the VDEs of anions. Many VDEs of molecules do not correspond, even qualitatively, to final states that may be described by a single-determinant wavefunction. Photoelectron spectra, especially in their inner-valence regions, generally do not provide a simple mapping of final states to occupied MOs.
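For a rough sense of what the ov⁴ and o²v³ arithmetic scaling factors quoted for OVGF and P3 imply, the leading terms can be compared directly. This worked ratio ignores prefactors, integral handling, and memory, and the orbital counts are hypothetical:

$$\frac{\mathrm{cost}_{\mathrm{OVGF}}}{\mathrm{cost}_{\mathrm{P3}}} \approx \frac{o\,v^{4}}{o^{2}\,v^{3}} = \frac{v}{o}, \qquad o = 100,\; v = 500 \;\Rightarrow\; \frac{v}{o} = 5$$

At fixed o, doubling v multiplies the P3 term by 2³ = 8 and the OVGF term by 2⁴ = 16, so the advantage of P3 for VDEs grows with the size of the virtual orbital space.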
In the terminology of configuration interaction, one may describe final states in terms of the number of holes (h) and particles (p) that appear with respect to a Slater determinant for the initial state. For example, one may refer to a 2hp (correlation or shake-up) final state, as opposed to an h (principal or Koopmans) final state. The ability to describe all of these situations is recovered by employing nondiagonal, renormalized approximations to the self-energy. The superoperator formalism provides a systematic route to this class of approximations. In this approach, determining matrix elements of the self-energy is avoided in favor of solving an eigenvalue problem. The eigenvalues of the superoperator Hamiltonian matrix equal electron binding energies, whereas Dyson orbitals may be obtained from the corresponding eigenvectors. The most widely applied nondiagonal, renormalized approximations lead to Hermitian superoperator Hamiltonian matrices. The matrix elements that occur in

$$\hat{\mathbf{H}}\mathbf{C} = \mathbf{C}\mathbf{E} \qquad (14)$$

are expressed in terms of field operator products and a reference state, |ref⟩, according to

$$(X|\hat{H}Y) = \langle \mathrm{ref} | [X^\dagger, [H, Y]]_+ | \mathrm{ref} \rangle, \qquad (15)$$


Viatcheslav G. Zakrzewski et al.

^ is the Hamiltonian superwhere H is the second-quantized Hamiltonian and H operator. X and Y are field operator products such that the number of annihilators exceeds the number of creators by one. For example, a simple annihilation operator of the h type, ai, may interact with a 2hp field operator product such as aj ak a=a . In propagator methods, the h, p, 2hp, 2ph, and higher np(n — 1)h and nh(n — 1)p opera tors interact, whereas in configuration interaction approaches, VDEs and VAEs are described by noninteracting, nh(n — 1)p and np(n — 1)h spaces, respectively. For many of the commonly used renormalized methods, such as 2ph-TDA, NR2, and ADC(3), the operator space spans the h, p, 2hp, and 2ph subspaces [7,22]. Reference states are built from Hartree—Fock determinantal wavefunctions plus perturbative corrections. The resulting expressions for various blocks of the superoperator Hamiltonian matrix may be evaluated through a given order in the fluctuation potential. For small blocks, such as those that involve the h and p operator subspaces, calculation of matrix elements is followed by storage. However, for larger blocks (e.g., 2ph—2ph), storage may be infeasible. Matrix elements of this type may be generated as needed in the midst of matrix—vector multiplications that occur in the Davidson diagonalization procedure [23]. Application of this algorithm has certain peculiarities that do not occur in variational calculations, for the eigenva lues of interest are somewhere in the middle of the spectrum. The lowest or highest eigenvalues are seldom of interest. A satisfactory approach to this pro blem has been described [24] and has been used to calculate the photoelectron spectrum of the C60 [25] and that of a free-base phthalocyanine [26]. Pole strengths (PSs) may be obtained from the eigenvectors, C, according to Pw ¼

X

jCiw j2 þ

i

X a

jCaw j2 ;

ð16Þ

where $w$ is a final-state index. When PSs are close to unity, the diagonalization algorithm usually converges rapidly, and one may focus on a single final state at a time. For densely spaced correlation final states with low PSs, however, it may be necessary to seek many eigenvectors simultaneously. Photoelectron spectra often contain closely spaced correlation (satellite or shakeup) states that are difficult to resolve experimentally. Where vibronic couplings between electronic states are strong, often only broad envelopes can be seen. Calculations of VDEs frequently exhibit strong interaction between h and 2hp operators in final states that are confined to a narrow range of energies. Such cases indicate a qualitative failure of the Koopmans description. For example, only the lowest VDE of phthalocyanine has a large PS [26]. The other VDEs correspond to $\mathbf{C}$ vectors whose largest elements occur in the 2hp sector.
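The diagonalization strategy described above can be illustrated with a generic Davidson iteration. The iteration touches the matrix only through matrix-vector products. Because the roots of interest lie inside the spectrum, each step selects the Ritz value nearest a target energy `sigma`. This is a textbook variant chosen for illustration, not necessarily the approach of [24]. The toy h/2hp model, its energies and couplings, and the helper names are all hypothetical. A dense matrix stands in for matrix elements that would be generated on the fly. The pole strength of the converged root follows Eq. (16), assuming the h and p components are stored first in each vector.

```python
import numpy as np


def davidson_nearest(matvec, diag, sigma, tol=1e-8, max_iter=200, seed=0):
    """Davidson iteration for the eigenpair of a real symmetric operator whose
    eigenvalue lies closest to `sigma`. The matrix is reached only through
    `matvec`; its diagonal `diag` is used as the preconditioner."""
    n = diag.size
    rng = np.random.default_rng(seed)
    V, AV = np.zeros((n, 0)), np.zeros((n, 0))
    t = np.zeros(n)
    t[np.argmin(np.abs(diag - sigma))] = 1.0  # start at the nearest diagonal element
    for _ in range(max_iter):
        for _pass in range(2):  # Gram-Schmidt, repeated for numerical stability
            t = t - V @ (V.T @ t)
        if np.linalg.norm(t) < 1e-10:  # stagnation: add a random direction instead
            t = rng.standard_normal(n)
            for _pass in range(2):
                t = t - V @ (V.T @ t)
        t = t / np.linalg.norm(t)
        V = np.column_stack([V, t])
        AV = np.column_stack([AV, matvec(t)])
        S = V.T @ AV
        theta, s = np.linalg.eigh(0.5 * (S + S.T))  # Rayleigh-Ritz in the subspace
        j = np.argmin(np.abs(theta - sigma))  # interior root: Ritz value nearest sigma
        x = V @ s[:, j]
        r = AV @ s[:, j] - theta[j] * x
        if np.linalg.norm(r) < tol:
            return float(theta[j]), x
        denom = diag - theta[j]
        denom[np.abs(denom) < 1e-8] = 1e-8  # guard against division by ~0
        t = r / denom  # diagonal (Davidson) correction vector
    raise RuntimeError("Davidson iteration did not converge")


def pole_strength(x, n_primary):
    """Eq. (16): squared weight of the h and p components of a normalized
    eigenvector, assumed (for this sketch) to occupy the first n_primary entries."""
    return float(np.sum(np.abs(x[:n_primary]) ** 2))


# Toy model: 3 h operators coupled to 40 2hp operators (arbitrary energy units).
rng = np.random.default_rng(1)
n_h, n_2hp = 3, 40
H = np.diag(np.concatenate([[10.0, 12.0, 14.0], np.linspace(15.0, 30.0, n_2hp)]))
coupling = 0.3 * rng.standard_normal((n_h, n_2hp))
H[:n_h, n_h:] = coupling
H[n_h:, :n_h] = coupling.T

energy, vec = davidson_nearest(lambda v: H @ v, np.diag(H).copy(), sigma=12.0)
print(round(energy, 4), round(pole_strength(vec, n_h), 3))
```

Coupling to the 2hp block both shifts the root and reduces its pole strength below 1. Making the couplings stronger spreads the one-electron weight over several roots, which is the satellite-rich situation described in the text.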

2.2 Quasiparticle virtual orbital spaces

In the diagonal, second-order approximation to the self-energy of the electron propagator, solutions of the Dyson equation (with self-consistent pole energies, $\omega_p$) satisfy

Ab Initio Electron Propagator Methods

$$\omega_p = \epsilon_p + \Sigma_{pp}(\omega_p), \tag{17}$$

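where $\epsilon_p$ is the p-th canonical Hartree-Fock orbital energy. Relaxation and correlation corrections to the Koopmans result ($\epsilon_p$) reside in the energy-dependent self-energy, where

$$\Sigma_{pp}(\omega_p) = \frac{1}{2}\sum_{aij} \frac{|\langle pa \| ij \rangle|^2}{\omega_p + \epsilon_a - \epsilon_i - \epsilon_j} + \frac{1}{2}\sum_{iab} \frac{|\langle pi \| ab \rangle|^2}{\omega_p + \epsilon_i - \epsilon_a - \epsilon_b}. \tag{18}$$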
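To make Eqs. (17) and (18) concrete, the sketch below solves the diagonal Dyson equation by Newton iteration, starting from the Koopmans value $\epsilon_p$. The integral arrays, their layout, the toy orbital energies, and the helper names are assumptions for illustration. The integrals are taken to be real, antisymmetrized spin-orbital integrals for one fixed orbital p. The sketch also returns the quasiparticle pole strength $(1 - \partial\Sigma_{pp}/\partial\omega)^{-1}$ at the root. This is the standard diagonal-approximation expression, but it is not given in the text above.

```python
import numpy as np


def sigma2_diag(omega, eps_occ, eps_vir, pa_ij, pi_ab):
    """Eq. (18) and its derivative with respect to omega.

    Assumed layout (hypothetical): pa_ij[a, i, j] = <pa||ij> and
    pi_ab[i, a, b] = <pi||ab>, real antisymmetrized integrals for fixed p."""
    d_aij = omega + eps_vir[:, None, None] - eps_occ[None, :, None] - eps_occ[None, None, :]
    d_iab = omega + eps_occ[:, None, None] - eps_vir[None, :, None] - eps_vir[None, None, :]
    n_aij, n_iab = np.abs(pa_ij) ** 2, np.abs(pi_ab) ** 2
    sigma = 0.5 * np.sum(n_aij / d_aij) + 0.5 * np.sum(n_iab / d_iab)
    dsigma = -0.5 * np.sum(n_aij / d_aij**2) - 0.5 * np.sum(n_iab / d_iab**2)
    return sigma, dsigma


def solve_dyson_diagonal(eps_p, eps_occ, eps_vir, pa_ij, pi_ab, tol=1e-10, max_iter=50):
    """Newton iteration for omega = eps_p + Sigma_pp(omega), Eq. (17), started
    from the Koopmans value; returns the pole and its pole strength."""
    omega = eps_p
    for _ in range(max_iter):
        sigma, dsigma = sigma2_diag(omega, eps_occ, eps_vir, pa_ij, pi_ab)
        step = (omega - eps_p - sigma) / (1.0 - dsigma)  # f(omega) / f'(omega)
        omega -= step
        if abs(step) < tol:
            _, dsigma = sigma2_diag(omega, eps_occ, eps_vir, pa_ij, pi_ab)
            return omega, 1.0 / (1.0 - dsigma)  # dsigma <= 0, so 0 < PS <= 1
    raise RuntimeError("Dyson equation did not converge")


# Toy data (hartree): random integrals, antisymmetrized over the last two indices.
rng = np.random.default_rng(0)
eps_occ = np.array([-1.2, -0.8, -0.5])
eps_vir = np.array([0.1, 0.3, 0.6, 0.9])
x = 0.05 * rng.standard_normal((4, 3, 3))
pa_ij = x - x.transpose(0, 2, 1)
y = 0.05 * rng.standard_normal((3, 4, 4))
pi_ab = y - y.transpose(0, 2, 1)
omega, ps = solve_dyson_diagonal(-0.5, eps_occ, eps_vir, pa_ij, pi_ab)
```

Because $\partial\Sigma_{pp}/\partial\omega$ is never positive in this approximation, the Newton denominator is at least 1. The iteration is therefore well behaved away from the poles of the self-energy.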

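Here $p, q, r, \ldots$ label general spin-orbitals; $i, j, k, \ldots$ label occupied spin-orbitals; and $a, b, c, \ldots$ label virtual spin-orbitals. Elements of the first-order density-difference matrix in the virtual-virtual block are given by [27-29]

$$D_{ab} = \delta_{ap}\delta_{bp} - \sum_{i<j} \Gamma^{p}_{aij}\Gamma^{p}_{bij} + \sum_{i,c} \Gamma^{p}_{iac}\Gamma^{p}_{ibc}, \tag{19}$$

such that

$$\Gamma^{p}_{qst} = \frac{\langle pq \| st \rangle}{\omega_p + \epsilon_q - \epsilon_s - \epsilon_t}, \tag{20}$$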

where $\omega_p$ is the second-order pole energy associated with the p-th spin-orbital. For electron detachment energies, the first term on the right side of Eq. (19) vanishes, because p is an occupied spin-orbital. In actual computer implementations, MOs are expressed as linear combinations of atomic orbitals,

$$\phi_a(\mathbf{r}) = \sum_{\mu} c_{\mu a}\, \chi_{\mu}(\mathbf{r}). \tag{21}$$

Here, $\phi_a(\mathbf{r})$ is a virtual MO and $\chi_{\mu}(\mathbf{r})$ represents an atomic basis function. Diagonalization of $\mathbf{D}$ yields a new set of noncanonical virtual MOs that is related to the original set by a unitary transformation,

$$\mathbf{c}_d = \mathbf{c}\mathbf{U}, \tag{22}$$

where $\mathbf{c}_d$ describes the new set of virtual MOs and the transformation matrix $\mathbf{U}$ satisfies

$$\mathbf{D}\mathbf{U} = \mathbf{U}\mathbf{d}, \tag{23}$$

where $\mathbf{d}$ is a diagonal matrix containing the eigenvalues of $\mathbf{D}$. Both positive and negative eigenvalues are found, because $\mathbf{D}$ describes an electronic density difference. In the method discussed here, the eigenvectors (columns of $\mathbf{U}$) corresponding to the eigenvalues with the smallest absolute values are discarded, which reduces the virtual space. The Fock matrix is then diagonalized in the new virtual orbital space, and calculations with self-energies that contain third- and higher-order terms are performed in that space. The new set is designated by the acronym QVOS (quasiparticle virtual orbital space) [29-31].
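The QVOS steps of Eqs. (19)-(23) can be sketched as follows. This is a minimal illustration, not the Gaussian implementation. It assumes real integrals and the array layouts given in the docstring, and `qvos_virtuals` is a hypothetical name. The function first builds $\Gamma^p$ from Eq. (20) and forms $\mathbf{D}$ from Eq. (19). It then diagonalizes $\mathbf{D}$, keeps the `n_keep` eigenvectors with the largest $|d|$, and applies Eq. (22). Finally, it rediagonalizes the virtual Fock block, which is diagonal in the canonical basis, within the reduced space.

```python
import numpy as np


def qvos_virtuals(omega_p, eps_occ, eps_vir, pa_ij, pi_ab, c_vir, n_keep, p_vir=None):
    """Sketch of the QVOS reduction, Eqs. (19)-(23).

    pa_ij[a, i, j] = <pa||ij>, pi_ab[i, a, c] = <pi||ac> (real, antisymmetrized);
    c_vir: AO coefficients of the canonical virtual MOs (columns are MOs);
    p_vir: virtual index of p for attachment, or None for detachment."""
    # Eq. (20): Gamma^p_{aij} and Gamma^p_{iac}
    g_aij = pa_ij / (omega_p + eps_vir[:, None, None]
                     - eps_occ[None, :, None] - eps_occ[None, None, :])
    g_iac = pi_ab / (omega_p + eps_occ[:, None, None]
                     - eps_vir[None, :, None] - eps_vir[None, None, :])
    # Eq. (19); the i<j sum equals half the unrestricted sum because
    # Gamma^p_{aij} is antisymmetric in i and j.
    D = -0.5 * np.einsum("aij,bij->ab", g_aij, g_aij)
    D += np.einsum("iac,ibc->ab", g_iac, g_iac)
    if p_vir is not None:  # delta_ap delta_bp term, nonzero only if p is virtual
        D[p_vir, p_vir] += 1.0
    d, U = np.linalg.eigh(D)  # Eq. (23): D U = U d
    keep = np.argsort(-np.abs(d))[:n_keep]  # discard the smallest |eigenvalues|
    U_k = U[:, keep]
    c_d = c_vir @ U_k  # Eq. (22), truncated to the kept columns
    f_red = U_k.T @ np.diag(eps_vir) @ U_k  # virtual Fock block in the reduced space
    e_new, R = np.linalg.eigh(f_red)  # rediagonalize: semicanonical QVOS orbitals
    return c_d @ R, e_new, d


# Toy example: 3 occupied and 6 virtual orbitals in 9 basis functions.
rng = np.random.default_rng(0)
n_occ, n_vir, n_bf = 3, 6, 9
eps_occ = np.array([-1.2, -0.8, -0.5])
eps_vir = np.linspace(0.1, 1.1, n_vir)
x = 0.1 * rng.standard_normal((n_vir, n_occ, n_occ))
pa_ij = x - x.transpose(0, 2, 1)
y = 0.1 * rng.standard_normal((n_occ, n_vir, n_vir))
pi_ab = y - y.transpose(0, 2, 1)
c_vir = np.linalg.qr(rng.standard_normal((n_bf, n_vir)))[0]  # orthonormal toy MOs
orbs, e_qvos, d = qvos_virtuals(-0.45, eps_occ, eps_vir, pa_ij, pi_ab, c_vir, n_keep=3)
```

When `n_keep` equals the full virtual dimension, the rediagonalized energies reproduce the canonical ones exactly. Truncation therefore affects the results only through the discarded directions.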

3. APPLICATIONS

Examples of electron propagator calculations on relatively large systems are given here. All calculations were performed with Gaussian03 [32] and the developer's version of the Gaussian suite of programs [33]. The OVGF and P3 codes are available in Gaussian03.

3.1 Buckminsterfullerene, C60

The equilibrium structure of C60 was obtained with the B3LYP density functional [34] and the 6-31G(d) basis, with Ih symmetry imposed during the optimization. Semidirect, symmetry-adapted algorithms for electron propagator calculations [19,20,24,35,36] were employed with the OVGF, ADC(3), and NR2 approximations. The 6-311G(d) basis, with 1080 contracted Gaussian functions in total, was used. OVGF calculations were performed first with a full active virtual space and then with a truncated active space of 645 virtual MOs. (Reducing the virtual space by omitting ordinary canonical Hartree-Fock orbitals had no significant effect on the OVGF results; the largest deviation was 0.06 eV.) In the ADC(3) calculations, 84 occupied and 465 virtual MOs were used. Table 1 presents ionization energies (IEs) of C60 obtained with the OVGF, ADC(3), and NR2 methods [25], as well as experimental values [37]. PSs are given in parentheses to the right of each energy value. ²Hu is the first ionized state of C60. The OVGF method gives an IE of 7.67 eV, and the PS of 0.89 indicates a one-electron process. The ADC(3) method gives 7.68 eV with a PS of 0.87, confirming the one-electron nature of this ionization. Both methods are in excellent agreement with the experimental value [37]. For the second band, two almost degenerate IEs are obtained, and once again the OVGF and ADC(3) values are very close. The corresponding MOs have hg and gg symmetry. These IEs correspond qualitatively to experimental estimates [37,38]. There are no significant satellite lines, although many satellites with very small PS values do appear. Two such examples are given in Table 1 for ionization from the 10hg MO. The third band envelops four ionized states, corresponding to the 6gu, 6t2u, 5hu, and 9hg MOs. Of these, electron detachments from the first two MOs have relatively low PSs (0.78 and 0.79) in the OVGF calculations.
Consequently, the exact positions of the main lines may differ considerably from the energy values given in Table 1. OVGF predicts the ²Gu ionized state at 11.33 eV. The ADC(3) value is essentially the same, and the ADC(3) PS for this energy is 0.70, showing this spectral line to be the most intense.


Table 1 Vertical ionization energies of C60 (eV)

MO

KT

OVGFa

ADC(3)

NR2

Exp [37,38].

6hu 10hg

7.87 9.65

7.67(0.89) 9.18(0.88)

7.47(0.83) 8.85(0.80)

7.64 ± 0.02 8.95b

6gg

9.91

9.23(0.87)

6gu

12.59

11.33(0.78)

7.68(0.87) 9.07(0.83) 10.49(0.002) 10.54(0.02) 9.16(0.84) 10.46(0.003) 11.32(0.70) 9.84(0.02) 10.45(0.02) 10.40(0.006) 10.66(0.002)

6t2u

13.06

11.88(0.79)

5hu 9hg

13.68 14.03

11.79(0.89) 12.13(0.89)

10.48(0.01) 8.82(0.80) 10.47(0.003)

10.82-11.59c

9.79(0.07) 10.34(0.06) 10.38(0.02) 11.29(0.85) 11.65(0.80)

a 645 virtual orbitals retained.
b Band centroid.
c Unresolved band.

Numerous shake-ups with very small PS values appear for this ionization. ADC(3) calculations on a system of this size are not feasible for energies outside the -15 eV to 2 eV range. A collapse of the one-electron picture of ionization was anticipated for the two levels above. Nevertheless, the present ADC(3) calculations revealed only two satellite lines for the ²T2u state. A state with a large PS may exist over a wider energy range, or the one-electron picture of ionization from this MO may break down completely. The one-electron picture of ionization is confirmed for detachments from the 5hu and 9hg levels, as the corresponding OVGF PS values are high. The ADC(3) calculations found no IEs for these levels. NR2 calculations were therefore performed in an attempt to find the energies missed by the ADC(3) procedure. The NR2 procedure usually converges faster than ADC(3) and gives somewhat smaller IEs. Both missing IEs were found: ionization from the 5hu level at 11.29 eV and ionization from the 9hg level at 11.65 eV. Their PS values (0.85 and 0.80) imply relatively large one-electron character for both processes.

3.2 Oligonucleotides

Electron transfer processes in DNA and RNA are of fundamental importance in biology. These processes can be directly or indirectly responsible for changes in genetic material that lead to mutations [39,40]. They may start with electron detachment from one site in the nucleic acid strand and end with electron attachment to another site of the same or a different strand. Ionizing radiation may create a hole in a DNA fragment that is initially localized on a guanine site. Alternatively, the hole may be transported through stacked bases until it localizes on a guanine [41,42]. Nucleic acids per se are too large for computational studies with ab initio methods. However, electron transfer phenomena in the building blocks of nucleic acids (oligonucleotides and their fragments) can be studied. An oligonucleotide may contain n sugar-base fragments linked by (n-1) phosphate groups. Neutral, monoanionic, and polyanionic oligonucleotides of this kind may be studied computationally. Neutral species contain an extra proton bonded to one of the phosphate oxygen atoms. In monoanionic oligonucleotides, only one of the phosphate groups is deprotonated. Nucleotide units therefore contain three types of sites where an extra electron can reside: a base, a phosphate group, or a sugar bridge. Ionizations from bases are very well documented [43-49]. P3 IEs obtained with the 6-311G basis were in excellent agreement with the results of ultraviolet photoelectron spectroscopy experiments.

Electron detachment from anionic mononucleotides was studied experimentally by electrospray photodetachment photoelectron spectroscopy [50]. These experiments are carried out at relatively low temperatures, 50-100 °C, which keeps the substrates intact. The spectra of the anionic mononucleotides (dXMP, where X = A, C, G, T) [50] were resolved to the following extent. Uncertainties of ±0.50 eV were given for the first VDEs of dAMP, dCMP, and dTMP, whereas an uncertainty of ±0.10 eV was determined for the VDE of dGMP. The same paper also computed VDEs as B3LYP energy differences between anionic and neutral states. For dTMP, dCMP, and dAMP, these deviated from the experimental peak positions by about 0.5-0.8 eV.
Attempts to assign the spectra on the basis of Kohn-Sham orbital energies (as if Koopmans' theorem were applicable) were unsuccessful. VDEs of mononucleotide anions have also been calculated with ab initio electron propagator theory [48,51]. Excellent agreement with experimental energy values was obtained, and the results facilitated assignment of the experimental spectra. The adiabatic electron detachment energies of all 16 possible monoanionic dinucleotides have been tabulated in [50], but no VDE values were reported. The same publication displayed the spectra of a few dinucleotides: dAA, dCC, dGG, dTT, dAG, dCG, dGA, and dTG. With careful inspection of the onsets, it is possible to estimate VDEs from these spectra. Here we present P3 results for the VDEs of the most stable conformation of a dinucleotide, the 2′,2′-deoxyribodithymidine-3′,5′-monophosphate anion (dDTMP or dTT). This dinucleotide was chosen for three reasons: (1) electrospray photodetachment photoelectron spectroscopy data are available, (2) dTT is relatively small, and (3) its parent nucleobase, thymine, undergoes no tautomeric transitions. The structure depicted in Figure 1 appears to be the global minimum. It was found among numerous conformations of dTT by a molecular mechanics analysis followed by reoptimization of the structures with the B3LYP/6-311++G model. The structure shown in Figure 1 is characterized by two H bonds between one of the phosphate oxygens and the OH protons of the two sugars. One H bond has a length of 1.72 Å, whereas the other is 1.86 Å long. In addition, both thymine base rings are located on the same side (relative to the approximate plane formed by the phosphate group and sugars) and are perpendicular to each other.

Figure 1 The most stable structure of the 2′,2′-deoxyribodithymidine-3′,5′-monophosphate anion (dTT).

The system under consideration is very large for electron propagator calculations with a full active space of orbitals. With the 6-311++G basis, dTT has 1004 basis functions. There are 102 valence occupied MOs and 861 virtual MOs. P3 calculations with an active space of this size failed even on a computer with 7 TB of disk space available for storage of intermediate data. A significant reduction of the active space was therefore needed. With current computer resources, computation time is no longer the limiting factor. The limit is instead the amount of disk space available to store the integrals and intermediates that arise during electron propagator calculations. The disk space is reduced through the following procedures. The first step of the QVOS approach requires second-order propagator calculations for occupied MOs. Transformed integrals of the $\langle ij \| kl \rangle$, $\langle ij \| ka \rangle$, $\langle ij \| ab \rangle$, and $\langle ia \| jb \rangle$ types must be stored, so the largest integral files are of size $o^2v^2$. This amount of disk space remains the same for every calculation. The P3 calculations need $\langle ia \| bc \rangle$ integrals, whose storage requirements scale as $ov^3$. Reduction of the virtual space therefore leads to significant savings in disk space. The QVOS method was only recently implemented in the Gaussian suite of programs [30]. The largest systems to which it has been applied so far are polycyclic aromatic hydrocarbons [31].
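A crude count of integral index tuples makes the storage argument concrete. The sketch below is a rough model, not the Gaussian implementation. It ignores spin, permutational and point-group symmetry, and all additional intermediates. The $o^2v^2$-sized files of the second-order step are counted in the full virtual space, and the $ov^3$-sized P3 files in the reduced space. `peak_storage_model` is a hypothetical helper, and the dTT counts (102 valence occupied, 861 virtual) are used only as example inputs.

```python
def peak_storage_model(n_occ: int, n_vir: int, fraction_kept: float) -> dict:
    """Crude index-tuple counts (no spin, permutational, or point-group symmetry).

    The second-order QVOS step stores o^2 v^2-sized files in the full virtual
    space; the P3 step stores o v^3-sized files in the reduced virtual space."""
    v_kept = round(fraction_kept * n_vir)
    first_step = n_occ**2 * n_vir**2
    p3_step = n_occ * v_kept**3
    return {"o2v2_full": first_step, "ov3_reduced": p3_step,
            "peak": max(first_step, p3_step)}


for f in (1.0, 0.8, 0.6, 0.5, 0.4):
    m = peak_storage_model(102, 861, f)
    print(f"{f:.0%} kept: ov3/o2v2 = {m['ov3_reduced'] / m['o2v2_full']:.2f}")
```

In this model the $ov^3$ term shrinks as the cube of the retained fraction. Once it falls below the fixed $o^2v^2$ term, further truncation no longer lowers the peak, which is qualitatively the plateau seen in the disk-space row of Table 2.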

3.2.1 Deoxyribothymidine monophosphate anion

The most stable conformation of the deoxyribothymidine monophosphate anion (dTMP) (Figure 2) was used to test the performance of the QVOS method.


Figure 2 The most stable structure of the deoxyribothymidine monophosphate anion (dTMP).

With the 6-311++G basis, the total number of MOs is 568, of which 84 are occupied. Table 2 presents the VDEs of this anion calculated with a variable number of virtual orbitals retained according to the QVOS scheme [30]. The current QVOS code differs from the one described previously [30] in one respect. The earlier version required all MOs to be included, whereas the current variant allows any orbital window to be chosen at the second-order step. Columns 1 and 2 of Table 2 list VDEs calculated either (1) with no reduction of the virtual space or (2) by simply omitting core and high-energy virtual MOs. In the latter case, the original set of canonical Hartree-Fock MOs is employed. The next five columns give the results of gradually reducing the virtual orbital space, labeled by the percentage of virtual MOs retained. The last column contains the experimental VDEs [50]. The first experimental VDE was explicitly tabulated [50], whereas the other values were obtained by inspecting the spectral curves.

Table 2 P3 QVOS VDEs of dTMP (eV). Columns headed 80% to 40% give QVOS results with that percentage of virtual MOs kept.

MO              (1)a   (2)b   80%    70%    60%    50%    40%    Exp [50]
π1              6.16   6.15   6.15   6.15   6.15   6.13   6.12   5.85 ± 0.50
PO4             6.24   6.23   6.23   6.23   6.23   6.23   6.22   6.2
PO4 + S         6.60   6.59   6.59   6.59   6.59   6.58   6.56   6.6
S + PO4         6.84   6.84   6.83   6.82   6.82   6.80   6.76   6.8
PO4 + S         6.83   6.83   6.82   6.82   6.82   6.80   6.78   6.8
PO4 + S         7.37   7.37   7.36   7.36   7.34   7.31   7.26   7.0
π2              7.87   7.87   7.87   7.86   7.86   7.85   7.83   -
PO4 + nT        7.66   7.66   7.65   7.65   7.64   7.62   7.58   7.5
Disk space, GB  372    333    209    162    148    148    148

a All virtual MOs included.
b 25 core occupied MOs and 25 highest virtual MOs excluded from the orbital space.

Regular P3 provided VDE values in excellent agreement with experiment. Removing the 25 highest-energy virtual MOs does not noticeably change the VDE values. It does, however, reduce the disk space required by 42 GB, which is essential for a system of this size. Using the QVOS procedure with a virtual MO space 80% as large as the original changes no VDE by more than 0.01 eV, yet the disk space falls dramatically, from 372 GB to 209 GB. Retaining fewer MOs reduces the disk space further. For virtual orbital spaces smaller than 60% of their original dimension, however, no additional storage reductions are obtained. Beyond this point, disk requirements are determined by the intermediates produced in the first step of the integral transformation.
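The claim that QVOS introduces only small errors can be checked directly against Table 2. The short script below uses data transcribed from the table; the helper name is ours. For each retained fraction, it reports the largest VDE deviation from the all-virtual P3 result (column 1) together with the disk requirement.

```python
# VDEs (eV) from Table 2: all virtual MOs (column 1) and the QVOS runs.
full = [6.16, 6.24, 6.60, 6.84, 6.83, 7.37, 7.87, 7.66]
qvos = {
    0.8: [6.15, 6.23, 6.59, 6.83, 6.82, 7.36, 7.87, 7.65],
    0.7: [6.15, 6.23, 6.59, 6.82, 6.82, 7.36, 7.86, 7.65],
    0.6: [6.15, 6.23, 6.59, 6.82, 6.82, 7.34, 7.86, 7.64],
    0.5: [6.13, 6.23, 6.58, 6.80, 6.80, 7.31, 7.85, 7.62],
    0.4: [6.12, 6.22, 6.56, 6.76, 6.78, 7.26, 7.83, 7.58],
}
disk_gb = {0.8: 209, 0.7: 162, 0.6: 148, 0.5: 148, 0.4: 148}


def max_abs_deviation(ref, approx):
    """Largest absolute difference between two equally long lists of VDEs."""
    return max(abs(a - b) for a, b in zip(ref, approx))


for frac, vdes in qvos.items():
    dev = max_abs_deviation(full, vdes)
    print(f"{frac:.0%} kept: max |dVDE| = {dev:.2f} eV, disk = {disk_gb[frac]} GB")
```

Even at 40% of the virtual space, the largest deviation stays near 0.1 eV. Below 60%, however, the disk requirement no longer decreases.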

3.2.2 2′,2′-Deoxyribodithymidine-3′,5′-monophosphate anion

Table 3 contains the VDE values obtained for dTT with the 6-311++G basis and a virtual MO space reduced by 50% with the QVOS procedure. (Koopmans' theorem, or KT, results are listed next to the P3 values.) The first VDE of the most stable conformation of dTT is placed at 6.58 eV. The corresponding MO is delocalized over the phosphate group (the main contribution) and the 3′ thymine. The second VDE is only 0.2 eV higher. Its MO is delocalized over the phosphate group, the 5′ sugar, and the 5′ thymine, with the largest contribution from the 5′ thymine. The third VDE is 6.92 eV. The corresponding MO is almost completely localized on the 3′ thymine, with small contributions from the phosphate and the corresponding sugar. The P3 order of these energies differs from the order predicted by Koopmans' theorem. The last two energies lie within 0.02 eV of each other. They pertain to electron detachment from a 5′ sugar-base fragment and from an MO delocalized over oxygen atoms of the system, respectively. All the energies fit well under the experimental envelopes [50], and all PSs are larger than 0.88.
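Sorting the Table 3 entries by KT energy and by P3 energy shows how the final states are reordered relative to Koopmans' theorem. The snippet below uses only the tabulated values. The short labels are shorthand for the fragment descriptions in the table.

```python
# (MO label, KT, P3) in eV, from Table 3 (lowest-energy conformation of dTT).
states = [
    ("PO4 + T(3')", 7.16, 6.58),
    ("PO4 + S(5') + T(5')", 8.81, 6.78),
    ("T(3') + S(3') + PO4", 7.52, 6.92),
    ("T(5') + S(5')", 9.07, 7.27),
    ("O atoms, no base", 9.19, 7.29),
]

kt_order = [label for label, kt, p3 in sorted(states, key=lambda s: s[1])]
p3_order = [label for label, kt, p3 in sorted(states, key=lambda s: s[2])]
# Size of the relaxation/correlation correction (KT minus P3) for each state.
shifts = {label: round(kt - p3, 2) for label, kt, p3 in states}

print("KT order:", kt_order)
print("P3 order:", p3_order)
print("KT - P3 (eV):", shifts)
```

The KT - P3 corrections are about 0.6 eV for the first and third states but about 2 eV for the second, which is why the second and third states swap places.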

Table 3 QVOS VDEs of the lowest energy conformation of dTT (eV)

MO                       KT     P3     Exp [50]
PO4 + T(3′)              7.16   6.58   6.5
PO4 + S(5′) + T(5′)      8.81   6.78   6.9
T(3′) + S(3′) + PO4      7.52   6.92   7.1
T(5′) + S(5′)            9.07   7.27   7.3
O atoms, no base         9.19   7.29   7.3

4. CONCLUSIONS

Electron propagator methods can evaluate accurately and efficiently the electron binding energies of closed-shell molecules and ions that are large by the standards of ab initio quantum chemistry. Quasiparticle approximations generally suffice for predicting the lowest VDEs or VAEs of a molecule or ion. Nondiagonal, renormalized approximations are preferable for higher-accuracy calculations and for transitions with strong relaxation or correlation effects. The arithmetic and storage requirements of these methods depend strongly on the number of virtual orbitals. The QVOS procedure was therefore introduced to reduce the dimension of the virtual orbital space with minimal loss of accuracy. Both classes of self-energy approximations yield useful data for C60. Nondiagonal, renormalized methods reveal the presence of correlation states in photoelectron spectra. Quasiparticle calculations on nucleic acid fragments with one or two bases yield many final states that may be reached from anionic states by electron detachment. The QVOS procedure introduces only minor errors while greatly improving computational efficiency. Propagator calculations on an anion with two thymine bases correct the order of final states predicted by Hartree-Fock orbital energies. They also demonstrate the need for correlated methods in interpreting anion photoelectron spectra of nucleic acid fragments.

ACKNOWLEDGMENTS

This work was supported by the National Science Foundation through grant CHE-0809199 to Auburn University.

REFERENCES

1. Migdal, A.B. Theory of Finite Fermi Systems, Wiley-Interscience, New York, 1967.
2. Linderberg, J., Öhrn, Y. Propagators in Quantum Chemistry, 2nd edn., Wiley-Interscience, New Jersey, 2004.
3. Ortiz, J.V., Zakrzewski, V.G., Dolgounitcheva, O. In Conceptual Trends in Quantum Chemistry (ed E.S. Kryachko), Vol. 3, Kluwer, Dordrecht, 1997, pp. 465-517.
4. Ortiz, J.V. In Computational Chemistry: Reviews of Current Trends (ed J. Leszczynski), Vol. 2, World Scientific, Singapore, 1997, pp. 1-61.
5. Ortiz, J.V. Toward an exact one-electron picture of chemical bonding. Adv. Quantum Chem. 1999, 35, 33-52.
6. Flores-Moreno, R., Melin, J., Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Three approximations to the nonlocal and energy-dependent correlation potential in electron propagator theory. Int. J. Quantum Chem. 2010, 110, 706-15.
7. von Niessen, W., Schirmer, J., Cederbaum, L.S. Computational methods for the one-particle Green's function. Comput. Phys. Rep. 1984, 1, 57-125.
8. Ortiz, J.V. Partial third order quasiparticle theory: Comparisons for closed-shell ionization energies and an application to the borazine photoelectron spectrum. J. Chem. Phys. 1996, 104, 7599-605.
9. Ortiz, J.V., Zakrzewski, V.G. A test of partial third order electron propagator theory: Vertical ionization energies of azabenzenes. J. Chem. Phys. 1996, 105, 2762-69.
10. Ferreira, A.M., Seabra, G., Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. In Quantum-Mechanical Prediction of Thermochemical Data (ed J. Cioslowski), Kluwer, Dordrecht, 2001, pp. 131-60.


11. Cederbaum, L.S., Domcke, W. Theoretical aspects of ionization potentials and photoelectron spectroscopy: A Green's function approach. Adv. Chem. Phys. 1977, 36, 205-344.
12. March, N.H., Young, W.H., Sampanthar, S. The Many-Body Problem in Quantum Mechanics, Cambridge University Press, London, 1967.
13. Abrikosov, A.A., Gorkov, L.P., Dzyaloshinski, I.E. Methods of Quantum Field Theory in Statistical Physics, Prentice-Hall, Englewood Cliffs, NJ, 1963.
14. Goldstone, J. Derivation of the Brueckner many-body theory. Proc. Roy. Soc. A 1957, 239, 267-79.
15. van Hove, L., Hugenholtz, L., Howland, L. Quantum Theory of Many-Particle Systems, Benjamin, New York, 1961.
16. Cederbaum, L.S. One-body Green's function for atoms and molecules: Theory and applications. J. Phys. B 1975, 8, 290-303.
17. Zakrzewski, V.G., Ortiz, J.V., Nichols, J.A., Heryadi, D., Yeager, D.L., Golab, J.T. Comparison of perturbative and multiconfigurational electron propagator methods. Int. J. Quantum Chem. 1996, 60, 29-36.
18. Öhrn, Y., Born, G. Molecular electron propagator theory and calculations. Adv. Quantum Chem. 1981, 13, 1-88.
19. Zakrzewski, V.G., Ortiz, J.V. Semidirect algorithms in electron propagator calculations. Int. J. Quantum Chem. 1994, S28, 23-7.
20. Zakrzewski, V.G., Ortiz, J.V. Semidirect algorithms for third order electron propagator calculations. Int. J. Quantum Chem. 1995, 53, 583-90.
21. Frisch, M.J., Trucks, G.W., Schlegel, H.B., et al. GAUSSIAN 09, Revision A.1, Gaussian, Inc., Wallingford, CT, 2009.
22. Ortiz, J.V. A nondiagonal, renormalized extension of partial third-order quasiparticle theory: Comparisons for closed-shell ionization energies. J. Chem. Phys. 1998, 108, 1008-18.
23. Davidson, E.R. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. J. Comput. Phys. 1975, 17, 87-94.
24. Zakrzewski, V.G., Dolgounitcheva, O., Ortiz, J.V. Improved algorithms for renormalized electron propagator calculations. Int. J. Quantum Chem. 1999, 75, 607-14.
25. Zakrzewski, V.G., Dolgounitcheva, O., Ortiz, J.V. Electron propagator calculations on C60 and C70 photoelectron spectra. J. Chem. Phys. 2008, 129, 104306.
26. Zakrzewski, V.G., Dolgounitcheva, O., Ortiz, J.V. Strong correlation effects in the electron binding energies of phthalocyanine. Int. J. Quantum Chem. 2009, 109, 3619-25.
27. Cioslowski, J., Ortiz, J.V. One-electron density matrices and energy gradients in second-order electron propagator theory. J. Chem. Phys. 1992, 96, 8379-89.
28. Ortiz, J.V. Energy gradients and effective density differences in electron propagator theory. J. Chem. Phys. 2000, 112, 56-68.
29. Flores-Moreno, R., Ortiz, J.V. In Practical Aspects of Computational Chemistry: Methods, Concepts and Applications (eds J. Leszczynski and M.K. Shukla), Springer, Heidelberg, 2009, pp. 1-17.
30. Flores-Moreno, R., Ortiz, J.V. Quasiparticle virtual orbitals in electron propagator calculations. J. Chem. Phys. 2008, 128, 164105.
31. Dolgounitcheva, O., Flores-Moreno, R., Zakrzewski, V.G., Ortiz, J.V. Virtual space reduction in quasiparticle electron propagator calculations: Applications to polycyclic aromatic hydrocarbons. Int. J. Quantum Chem. 2008, 108, 2862-9.
32. Frisch, M.J., Trucks, G.W., Schlegel, H.B., et al. GAUSSIAN 03, Revision C.03, Gaussian, Inc., Wallingford, CT, 2004.
33. Frisch, M.J., Trucks, G.W., Schlegel, H.B., et al. GAUSSIAN, Revision H.01, Gaussian, Inc., Wallingford, CT, 2008.
34. Becke, A.D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 1988, 38, 3098-100; Lee, C., Yang, W., Parr, R.G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 1988, 37, 785-9.
35. Zakrzewski, V.G., von Niessen, W. Vectorizable algorithm for Green-function and many-body perturbation methods. J. Comput. Chem. 1993, 14, 13-8.
36. Zakrzewski, V.G., Dolgounitcheva, O., Ortiz, J.V. Efficient electron propagator algorithms for shakeup final states: Anthracene and acridine. Int. J. Quantum Chem. 2000, 80, 836-41.


37. Lichtenberger, D.L., Nebesny, K.W., Ray, C.D., Huffman, D.R., Lamb, L.D. Valence and core photoelectron spectroscopy of C60, buckminsterfullerene. Chem. Phys. Lett. 1991, 176, 203-8.
38. Lichtenberger, D.L., Jatcko, M., Nebesny, K.W., Ray, C.D., Huffman, D.R., Lamb, L.D. The ionizations of C60 in the gas phase and in thin solid films. Mat. Res. Soc. Symp. Proc. 1991, 206, 673-8.
39. Becker, D., Sevilla, M.D. In Advances in Radiation Biology (eds J.T. Lett and H. Adler), Vol. 17, Academic Press, New York, 1993, pp. 121-80.
40. Giese, B. Electron transfer in DNA. Curr. Opin. Chem. Biol. 2002, 6, 612-8.
41. Steenken, S. Purine bases, nucleosides, and nucleotides: Aqueous solution redox chemistry and transformation reactions of their radical cations and e- and OH adducts. Chem. Rev. 1989, 89, 503-20.
42. Steenken, S., Jovanovic, V. How easily oxidizable is DNA? One-electron reduction potentials of adenosine and guanosine radicals in aqueous solution. J. Am. Chem. Soc. 1997, 119, 617-18.
43. Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Electron propagator calculations on uracil and adenine ionization energies. Int. J. Quantum Chem. 2000, 80, 831-5.
44. Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Electron propagator theory of guanine and its cations: Tautomerism and photoelectron spectra. J. Am. Chem. Soc. 2000, 122, 12304-9.
45. Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Ionization energies and Dyson orbitals of thymine and other methylated uracils. J. Phys. Chem. A 2002, 106, 8411-6.
46. Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Electron binding energies of nucleobases and nucleotides. Int. J. Quantum Chem. 2002, 90, 1547-54.
47. Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. In Fundamental World of Quantum Chemistry: A Tribute to the Memory of Per-Olov Löwdin (eds E.J. Brändas and E.S. Kryachko), Vol. 2, Kluwer, Dordrecht, 2003, pp. 525-55.
48. Zakjevskii, V.V., Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Electron propagator studies of vertical electron detachment energies and isomerism in purinic deoxyribonucleotides. Int. J. Quantum Chem. 2007, 107, 2266-73.
49. Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Vertical ionization energies of adenine and 9-methyladenine. J. Phys. Chem. A 2009, 113, 14630-5.
50. Yang, X., Wang, X.B., Vorpagel, E.R., Wang, L.S. Direct experimental observation of the low ionization potentials of guanine in free oligonucleotides by using photoelectron spectroscopy. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17588-92.
51. Zakjevskii, V.V., King, S.J., Dolgounitcheva, O., Zakrzewski, V.G., Ortiz, J.V. Base and phosphate electron detachment energies of deoxyribonucleotide anions. J. Am. Chem. Soc. 2006, 128, 13350-1.

Section 3

Chemical Education

Section Editor: George C. Shields, Dean’s Office and Department of Chemistry,

College of Arts and Sciences, Bucknell University,

Lewisburg, PA 17837, USA

CHAPTER 7

Using Density Functional Theory Methods for Modeling Induction and Dispersion Interactions in Ligand-Protein Complexes

Hunter Utkov, Maura Livengood, and Mauricio Cafiero

Contents

1. Introduction
2. Ligand-Protein Complexes
3. Density Functional Theory
4. The Correct Path to Dispersion
5. Applications
6. Conclusions
References

Abstract

Density functional theory (DFT) is a relatively fast and inexpensive ab initio computational method that can be used to compute high-accuracy electronic and structural properties of molecular systems. Owing to its formal and algorithmic flexibility, DFT has the potential to model much larger systems than any other ab initio method. One of the modern challenges in science, particularly computational science, is the accurate modeling of proteins and nucleic acids and of their interactions with small-molecule substrates, including drugs. These intermolecular interactions include hydrogen bonding and other purely electrostatic interactions, which a wide variety of computational methods can model accurately. They also include induction/dispersion interactions, which are more difficult to model with current methods. DFT is the most promising method available today for modeling full-scale protein-ligand interactions, including induction/dispersion. So far, DFT has been applied to small models of these biological systems, and it is increasingly being applied to small subsets of the systems themselves. These applications provide proof of concept for the future application of these methods to full-scale protein-ligand interactions.

Keywords: DFT; dispersion; protein; ligand; van der Waals

Department of Chemistry, Rhodes College, Memphis, TN, USA

Annual Reports in Computational Chemistry, Volume 6, ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06007-X

© 2010 Elsevier B.V. All rights reserved.

1. INTRODUCTION

Chemical accuracy in computational chemistry is 1-2 kcal/mol; that is, calculated energy-related molecular properties should lie within 1-2 kcal/mol of the true values to be useful at the experimental level. Covalent bonds have bond energies of approximately 100 kcal/mol. Ion interactions can have binding energies on the order of 50 kcal/mol. Hydrogen bonds can have binding energies on the order of 15 kcal/mol, and other dipole-dipole interactions have binding energies between 5 and 20 kcal/mol. Induction and dispersion interactions, the weakest intermolecular interactions, can have binding energies of less than 5 kcal/mol. Chemical accuracy is therefore especially important for induction/dispersion interactions, since an error of only 1-2 kcal/mol can amount to a 20-40% error in these interactions.

Covalent bonds, and the molecular structures and energetics that depend on them, can be modeled with a wide variety of methods. Molecular mechanics, semiempirical, and Hartree-Fock (HF) ab initio methods can model covalent interactions with a good degree of success. The same methods can also model subtle covalent structures and dipole-bound structures and energetics, including hydrogen-bonded ones. Greater success, however, is achieved with post-HF methods such as Møller-Plesset perturbation theory (MP2, MP4, etc.), configuration interaction (CI), and coupled cluster (CC). These post-HF methods, often called correlated methods, account for electron correlation, which is crucial for high-accuracy modeling of many-electron systems. Chemically accurate calculation of induction- and dispersion-bound structures and energetics usually requires a high-level correlated method. MP2 is the minimal method for these types of studies, and high-quality coupled cluster calculations are preferred.
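The 20-40% figure comes from dividing the 1-2 kcal/mol chemical-accuracy window by the roughly 5 kcal/mol upper bound for induction/dispersion binding. The short sketch below repeats that arithmetic for the other interaction types, using the representative energies quoted above. Because 5 kcal/mol is an upper bound, the dispersion percentages are lower bounds. The dictionary and helper names are ours.

```python
# Representative binding energies (kcal/mol) quoted in the text above.
binding_energies = {
    "covalent bond": 100.0,
    "ion interaction": 50.0,
    "hydrogen bond": 15.0,
    "induction/dispersion (upper bound)": 5.0,
}


def relative_error(error_kcal: float, binding_kcal: float) -> float:
    """Percentage error implied by an absolute error in a binding energy."""
    return 100.0 * error_kcal / binding_kcal


for kind, energy in binding_energies.items():
    low, high = relative_error(1.0, energy), relative_error(2.0, energy)
    print(f"{kind:36s} {low:5.1f}-{high:5.1f} %")
```

The same 1-2 kcal/mol error that is only 1-2% of a covalent bond energy becomes a substantial fraction of a dispersion interaction.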
Density functional theory (DFT), as described in more detail below, is an ab initio method that can, in principle, model structure and energetics exactly, including all electron correlation effects such as induction and dispersion. In principle, DFT can achieve the accuracy usually associated with coupled cluster at a computational cost comparable to that of the much less expensive HF method. This combination of potential accuracy and computational efficiency makes DFT the ab initio method of choice for analyzing large molecular systems. Biological systems are one of the most challenging areas of research today. Modeling biological processes usually requires treating protein or nucleic acid macromolecules, as well as the smaller molecules that interact with them. In general, a smaller molecule that interacts in some way with a protein or nucleic acid is a ligand; the specific ligand that a protein or nucleic acid is meant to act upon is that macromolecule's substrate. Other small molecules, including drug molecules, can be used to alter the action of the protein or nucleic acid.

Using Density Functional Theory Methods in LigandProtein Complexes


2. LIGAND–PROTEIN COMPLEXES

Perhaps the most visible class of ligand–protein complexes currently under investigation is drug–protein systems. Drugs often act as competitive inhibitors, binding to an enzyme active site to prevent the natural ligand (substrate) from binding and thus preventing catalysis of the intended reaction. They can also act as noncompetitive inhibitors, binding to a different part of the enzyme (an allosteric site) and altering the protein's structure or function. Drug–protein binding occurs through various nonbonded intermolecular interactions, and the specificity of a particular drug for its target enzyme depends largely on the strength and specificity of these interactions. While hydrogen bonds, charge–charge interactions, and other strong electrostatic forces are usually the primary forces involved, induction and dispersion (or van der Waals, vdW) forces can also play an important role. Of the vdW forces involved in drug–protein binding, one of the most interesting, and notoriously difficult to model computationally, is the π-stacking aromatic interaction. For the purposes of this review, we refer to all induction and dispersion interactions involving uncharged aromatic rings (single and polycyclic) simply as aromatic interactions. A recent paper from GlaxoSmithKline reports that the average number of aromatic rings is 3.3 in preclinical candidate drug molecules and 2.3 in proof-of-concept candidate drug molecules [1]; the same study reports an average of 1.6 aromatic rings in oral drugs. Although these figures cannot be assumed to hold for other pharmaceutical companies, it is clear that aromatic rings are abundant in drug molecules, and thus interactions between these rings and their protein targets are also very important.
These interactions range from pure dispersion between phenyl groups to the stronger induction and electrostatic forces between aromatic rings containing heteroatoms and polar groups. On the protein side, the active sites to which drugs bind can be classified as hydrophilic, lipophilic, or some combination of the two. Given the ubiquity of aromatic rings in drug molecules, it follows that many active sites are fully or partially lipophilic. Nonpolar amino acids such as leucine (LEU), isoleucine (ILE), and phenylalanine (PHE) are found in lipophilic regions of active sites; tyrosine (TYR) and tryptophan (TRP), while polar, are also found in lipophilic active sites. All of these amino acids interact with drug molecules via induction and dispersion forces. A common weak interaction between drugs and protein residues is π-stacking between aromatic rings in drug molecules and the aromatic rings of PHE, TYR, and TRP. Modern drug design increasingly uses modeling as a cost-saving measure. Computer-aided drug design (CADD) simulates experiments by examining the interactions between potential drug candidates and their target proteins. The two main tools needed for CADD are ligand docking and pose scoring [2]. In ligand docking, a drug candidate molecule is placed into the active site of a protein structure (usually determined experimentally by X-ray crystallography or NMR spectroscopy). This particular placement, or pose, is


Hunter Utkov et al.

then “scored”: the pose is assigned an energy score based on a rough estimate of the interactions between the drug candidate and the various active-site residues. Many poses are evaluated, and the top-scoring poses are used to assess the potential of a candidate molecule for further development. The poses are also used to elucidate the modes of interaction between the candidate and the active site; this knowledge guides the future development of novel drug candidates. Leach et al. [2] discuss the state of the art in docking and scoring routines and find both to be currently lacking. Placement of a drug with aromatic rings in an active site with aromatic residues will necessarily depend heavily on aromatic interactions, which are very poorly described by current docking engines and scoring functions. While many approaches that use empirical data to better model aromatic interactions are being explored, the future of docking and scoring likely lies in the ability to calculate intermolecular interactions on the fly during docking simulations. Advances in computing technology, particularly GPU-based computing, which is very well suited to the parallel calculations required in docking, suggest that all-electron, ab initio docking and scoring may not be far off. It is therefore important to examine the ab initio method with the most potential for use in docking and scoring: DFT. Until all-electron, ab initio CADD arrives, we can use DFT to examine smaller drug–protein complexes and to validate current scoring functions in order to determine the best way forward. This review discusses some basic DFT formalism, explores current approaches in DFT that are likely to lead to methods appropriate for drug–protein systems, and discusses some recent applications of DFT to ligand–protein interactions.
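The pose-generation and scoring loop described above can be illustrated with a toy Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the "receptor" and "ligand" are a handful of points, poses are random rigid-body rotations and translations around an assumed site center, and the score is a crude 12-6 Lennard-Jones-style pair sum with arbitrary parameters, not the scoring function of any real docking program. Only the structure matters here: generate many poses, score each one, and keep the best few.

```python
import numpy as np


def toy_pair_score(lig: np.ndarray, rec: np.ndarray, sigma: float = 3.5, eps: float = 0.2) -> float:
    """Crude 12-6 Lennard-Jones-style score summed over ligand-receptor atom pairs.

    Arbitrary units; lower is better. A hypothetical stand-in for a real scoring function.
    """
    d = np.linalg.norm(lig[:, None, :] - rec[None, :, :], axis=-1)
    d = np.clip(d, 0.5, None)  # avoid blow-ups for badly clashing atoms
    sr6 = (sigma / d) ** 6
    return float(np.sum(4.0 * eps * (sr6**2 - sr6)))


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the decomposition is unique
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def dock(lig, rec, site_center, n_poses=500, top_k=5, box=2.0, seed=0):
    """Generate random rigid-body poses near site_center, score them, return the top_k (score, pose) pairs."""
    rng = np.random.default_rng(seed)
    centered = lig - lig.mean(axis=0)
    scored = []
    for _ in range(n_poses):
        rot = random_rotation(rng)
        shift = site_center + rng.uniform(-box, box, size=3)
        pose = centered @ rot.T + shift
        scored.append((toy_pair_score(pose, rec), pose))
    scored.sort(key=lambda item: item[0])  # lowest (best) score first
    return scored[:top_k]


# Toy "active site": 12 receptor atoms on a ring of radius 5 (angstrom) in the xy plane.
theta = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
receptor = np.stack([5.0 * np.cos(theta), 5.0 * np.sin(theta), np.zeros_like(theta)], axis=1)
# Toy ligand: three collinear atoms 1.5 apart.
ligand = np.array([[-1.5, 0.0, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])

top = dock(ligand, receptor, site_center=np.zeros(3))
for score, _ in top:
    print(f"{score:8.3f}")
```

The crude pair score is exactly the component that, on the argument above, would eventually be replaced by on-the-fly quantum mechanical interaction energies; the surrounding search-and-rank loop would stay the same.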

3. DENSITY FUNCTIONAL THEORY

In this section we provide a short introduction to the computational methods discussed in this review; for more information, please see the references below. DFT is based on the Hohenberg–Kohn theorem, which states that the exact ground-state energy of a molecular system is a functional of the electron density (ρ) of that system; it can be summarized simply as [3]

$$E_{\mathrm{EXACT}} = E[\rho].$$

Since the electron density can be written as a function of a single determinant (although not uniquely, as in the Kohn–Sham (KS) implementation of DFT [4]), the exact energy can likewise be written as a function of a single determinant. In DFT, the total molecular energy is decomposed into several components:

$$E_{\mathrm{EXACT}} = E_{\mathrm{T}}[\rho] + E_{\mathrm{ext}}[\rho] + E_{\mathrm{coul}}[\rho] + E_{\mathrm{X}}[\rho] + E_{\mathrm{C}}[\rho].$$


From left to right, these are the kinetic energy, the external potential energy (due to nuclei, electric and magnetic fields, etc.), the Coulomb energy between electrons, the exchange energy between electrons, and the correlation energy between electrons. In the KS implementation of DFT, which is the most widely used form of DFT today, we assume that the density of an n-electron system can be obtained from a single Slater determinant:

$$\rho(\mathbf{r}_1) = n \int \lvert \Psi_{\mathrm{SD}}(\mathbf{r}_1, \ldots, \mathbf{r}_n) \rvert^2 \, d\mathbf{r}_2 \cdots d\mathbf{r}_n,$$

or, more directly, from the occupied orbitals:

$$\rho(\mathbf{r}) = \sum_{i=1}^{n} \lvert \psi_i(\mathbf{r}) \rvert^2 .$$
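The orbital form of the density can be checked numerically. The Python sketch below builds ρ(x) = Σ|ψ_i(x)|² from three normalized particle-in-a-box orbitals on a one-dimensional grid and confirms that it integrates to the number of electrons. The one-dimensional box, its length, and the choice of three singly occupied (spin) orbitals are illustrative assumptions, not a molecular calculation.

```python
import numpy as np

L = 10.0                       # box length (bohr); arbitrary model choice
x = np.linspace(0.0, L, 2001)  # uniform integration grid
dx = x[1] - x[0]


def integrate(y: np.ndarray) -> float:
    """Trapezoidal rule on the uniform grid x."""
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))


def box_orbital(k: int) -> np.ndarray:
    """Normalized particle-in-a-box orbital psi_k(x) = sqrt(2/L) sin(k pi x / L)."""
    return np.sqrt(2.0 / L) * np.sin(k * np.pi * x / L)


n = 3  # occupied spin orbitals, one electron each
orbitals = [box_orbital(k) for k in range(1, n + 1)]
rho = sum(np.abs(psi) ** 2 for psi in orbitals)  # rho(x) = sum_i |psi_i(x)|^2

print(integrate(rho))  # number of electrons, n
```

Because the orbitals are orthonormal, each contributes exactly one electron to the integrated density, which is why the KS density can be assembled orbital by orbital.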

The density is determined iteratively from the orbitals through a self-consistent field (SCF) calculation, entirely analogous to an HF calculation. HF determines the orbitals that minimize the energy as eigenfunctions of the Fock operator,

$$\hat{f}^{\mathrm{HF}}(1) = \left(-\frac{1}{2}\nabla_1^2 - \sum_A \frac{Z_A}{r_{1A}}\right) + \int \frac{\rho(\mathbf{r}_2)}{r_{12}}\, d\mathbf{r}_2 + \hat{K},$$

where the sum runs over all nuclei A, Z_A is the charge of the A-th nucleus, and K̂ is the exchange potential. KS DFT instead finds the orbitals as eigenfunctions of the KS operator,

$$\hat{f}^{\mathrm{KS}}(1) = \left(-\frac{1}{2}\nabla_1^2 - \sum_A \frac{Z_A}{r_{1A}}\right) + \int \frac{\rho(\mathbf{r}_2)}{r_{12}}\, d\mathbf{r}_2 + \hat{V}_{\mathrm{XC}},$$

where V̂_XC is the combined exchange and correlation potential. HF can thus be viewed as a special case of KS DFT in which correlation is neglected and the HF exchange potential is used. Modern KS DFT methods are distinguished by the functional form of the exchange/correlation potential and are thus often named after the authors of these potentials. An excellent overview of modern DFT can be found in the work of Koch and Holthausen [5]. Specific DFT methods mentioned below are identified by a reference to the original papers; these methods may also be found in the book by Koch and Holthausen, along with appropriate references. The earliest class of DFT methods is known as the local (electron) density approximation (LDA); when the total electron density is decomposed into individual spin densities for +1/2 and −1/2 spin, we refer to these methods as local spin density approximation (LSDA) methods. In these methods the total molecular XC energy is evaluated by integrating over a numerical grid, and the energy density at each point is a function only of the value of the density at that point, hence the name local density:


$$E_{\mathrm{XC}}^{\mathrm{LSDA}} = \sum_{\sigma} \int d\mathbf{r}\, F_{\mathrm{XC}}^{\mathrm{LSDA}}(\rho_\sigma).$$
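To make "integration on a numerical grid" concrete, the Python sketch below evaluates the local (Slater/Dirac) exchange energy E_X = −C_x ∫ ρ^(4/3) dr, with C_x = (3/4)(3/π)^(1/3), for the hydrogen 1s density ρ(r) = e^(−2r)/π on a radial grid, and compares it with the closed-form value for this density. The density, the grid, and the use of the spin-unpolarized exchange formula are illustrative assumptions; an LSDA code would use the spin-resolved form, three-dimensional molecular grids, and a correlation term such as VWN.

```python
import numpy as np

C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)  # Dirac/Slater exchange constant (atomic units)


def radial_integral(f: np.ndarray, r: np.ndarray) -> float:
    """Integrate a spherically symmetric f(r) over all space: 4 pi int f r^2 dr (trapezoid rule)."""
    y = 4.0 * np.pi * r**2 * f
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r)))


r = np.linspace(0.0, 30.0, 30001)  # radial grid (bohr)
rho = np.exp(-2.0 * r) / np.pi     # hydrogen 1s density: one electron

n_electrons = radial_integral(rho, r)
# Local: the integrand at each grid point depends only on rho at that point.
e_x_lda = -C_X * radial_integral(rho ** (4.0 / 3.0), r)

# Closed form for this particular density: -C_X * 27 / (64 pi^(1/3)), about -0.213 hartree.
e_x_exact = -C_X * 27.0 / (64.0 * np.pi ** (1.0 / 3.0))
print(n_electrons, e_x_lda, e_x_exact)
```

The grid result agrees with the analytic value to well within chemical accuracy, which is the kind of agreement a production DFT grid is designed to deliver.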

One of the most commonly used LSDA methods combines the exchange potential of Slater [6] and the correlation potential of Vosko et al. [7] and is referred to as the SVWN DFT method. In the next level of approximation, the generalized gradient approximation (GGA), the total molecular XC energy, again evaluated on a grid, is written as a function of both the value of the density and the gradient of the density:

$$E_{\mathrm{XC}}^{\mathrm{GGA}} = \sum_{\sigma} \int d\mathbf{r}\, F_{\mathrm{XC}}^{\mathrm{GGA}}(\rho_\sigma, \nabla\rho_\sigma);$$

the gradient provides information about how the density changes as we move away from that particular integration point. These methods are sometimes incorrectly called nonlocal methods but are more properly regarded as semilocal methods. There are many commonly used GGA DFT methods, several of which are mentioned below. GGA DFT can be improved in several ways; here, we briefly discuss the hybrid methods and the meta-GGA methods. In the meta-GGA approach, the molecular XC energy at each integration point is described by the density, the gradient of the density, and the orbital-dependent kinetic energy density τ:

$$E_{\mathrm{XC}}^{\mathrm{meta\text{-}GGA}} = \sum_{\sigma} \int d\mathbf{r}\, F_{\mathrm{XC}}^{\mathrm{meta\text{-}GGA}}(\rho_\sigma, \nabla\rho_\sigma, \tau_\sigma).$$
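The gradient dependence of a GGA can be illustrated with PBE exchange, the form on which the M05-2X exchange discussed later is built. In the spin-unpolarized form, the PBE exchange energy density is the LDA value multiplied by an enhancement factor F_x(s) = 1 + κ − κ/(1 + μs²/κ) of the dimensionless reduced gradient s = |∇ρ|/(2(3π²)^(1/3) ρ^(4/3)), with the published constants κ = 0.804 and μ ≈ 0.21951. The Python sketch below is a per-point evaluation only (no spin scaling, no grid), offered as an illustration rather than a complete functional implementation.

```python
import numpy as np

KAPPA = 0.804            # PBE exchange parameter kappa
MU = 0.2195149727645171  # PBE exchange parameter mu


def reduced_gradient(rho: float, grad_rho_norm: float) -> float:
    """Dimensionless reduced gradient s = |grad rho| / (2 (3 pi^2)^(1/3) rho^(4/3))."""
    k_f = (3.0 * np.pi**2 * rho) ** (1.0 / 3.0)  # local Fermi wave vector
    return grad_rho_norm / (2.0 * k_f * rho)


def pbe_enhancement(s: float) -> float:
    """PBE exchange enhancement factor F_x(s); F_x(0) = 1 recovers the LDA."""
    return 1.0 + KAPPA - KAPPA / (1.0 + MU * s**2 / KAPPA)


def lda_exchange_density(rho: float) -> float:
    """Spin-unpolarized LDA exchange energy per volume, -(3/4)(3/pi)^(1/3) rho^(4/3)."""
    return -0.75 * (3.0 / np.pi) ** (1.0 / 3.0) * rho ** (4.0 / 3.0)


def pbe_exchange_density(rho: float, grad_rho_norm: float) -> float:
    """Semilocal exchange energy per volume: the LDA value scaled by F_x(s)."""
    return lda_exchange_density(rho) * pbe_enhancement(reduced_gradient(rho, grad_rho_norm))


for s in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"s = {s:3.1f}  F_x = {pbe_enhancement(s):.4f}")
```

F_x rises from 1 in the uniform-gas limit (s = 0) toward 1 + κ = 1.804 at large s, so the low-density, large-gradient regions between molecules are exactly where the choice of enhancement factor, and any parameters in it, matters most.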

In the hybrid approach, the exchange and correlation energies are calculated by the LSDA, GGA, or meta-GGA methods and are then combined with the HF exchange energy:

$$E_{\mathrm{XC}} = a\,E_{\mathrm{X,HF}}[\rho] + b\,E_{\mathrm{X,DFT}}[\rho] + c\,E_{\mathrm{C,DFT}}[\rho],$$

where a, b, and c are weighting coefficients chosen by the authors of the particular method. It is also common to further decompose the exchange and correlation energies into their local (density-dependent) and semilocal (density- and density-gradient-dependent) contributions, each with its own weighting coefficient. The ubiquitous B3LYP method is the most widely used hybrid method. DFT methods are semiempirical in that most of them are parameterized at some point against experimental or high-level ab initio data. Most methods developed before the early 2000s, including LSDA, GGA, hybrid, and meta-GGA functionals, were not parameterized specifically for nonbonded interactions; rather, they were tested or trained against atomization energies, ionization potentials, electron affinities, and standard covalent bond lengths. Despite this lack of calibration against intermolecular forces, many DFT


methods have been successful in treating strong interactions such as hydrogen bonds. Recently, more DFT methods have been developed specifically for treating weak interactions, including dispersion. We discuss some of these efforts in the next section.
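The hybrid mixing formula E_XC = a E_X,HF + b E_X,DFT + c E_C,DFT is a weighted sum of component energies. The Python sketch below evaluates that generic three-weight form and, as a concrete instance of the finer decomposition into local and gradient-corrected pieces, the standard B3LYP mixing with a0 = 0.20, ax = 0.72, and ac = 0.81. The component energies in the example are made-up placeholders; in a real calculation each would be computed from the converged density and orbitals, and programs differ in which VWN parameterization they use inside B3LYP.

```python
from dataclasses import dataclass


@dataclass
class XCComponents:
    """Component energies (hartree) from one converged density; example values are made up."""
    ex_hf: float    # exact (HF-type) exchange evaluated with the KS orbitals
    ex_lsda: float  # local (Slater) exchange
    dex_b88: float  # Becke 88 gradient correction to exchange
    ec_vwn: float   # local (VWN) correlation
    ec_lyp: float   # LYP correlation


def generic_hybrid(ex_hf: float, ex_dft: float, ec_dft: float, a: float, b: float, c: float) -> float:
    """Three-weight hybrid: E_XC = a*E_X,HF + b*E_X,DFT + c*E_C,DFT."""
    return a * ex_hf + b * ex_dft + c * ec_dft


def b3lyp_xc(e: XCComponents, a0: float = 0.20, ax: float = 0.72, ac: float = 0.81) -> float:
    """B3LYP: (1-a0)E_X^LSDA + a0 E_X^HF + ax dE_X^B88 + (1-ac)E_C^VWN + ac E_C^LYP."""
    exchange = (1.0 - a0) * e.ex_lsda + a0 * e.ex_hf + ax * e.dex_b88
    correlation = (1.0 - ac) * e.ec_vwn + ac * e.ec_lyp
    return exchange + correlation


# Hypothetical component energies, for illustration only.
comp = XCComponents(ex_hf=-10.0, ex_lsda=-9.0, dex_b88=-1.0, ec_vwn=-0.8, ec_lyp=-0.6)
print(b3lyp_xc(comp))
```

Because the mixing is linear, the weights can be refit against reference data without recomputing the components, which is one reason such coefficients are so readily parameterized.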

4. THE CORRECT PATH TO DISPERSION

An excellent place to begin reviewing recent work on the application of DFT to induction/dispersion-bound ligand–protein systems is the work of Hobza et al. In 2006, Jurečka et al. [8] proposed a database of interactions between molecular fragments important in biochemical systems. This database included hydrogen-bonded complexes of DNA bases and other relevant molecules, but also a substantial number of complexes bound by vdW forces. Some of these interactions, traditionally viewed as weaker and consequently less important, have recently been shown to be physiologically very important. The 100+ interaction energies of the complexes in this JSCH-2005 database were evaluated using MP2 and CCSD(T) calculations extrapolated to the complete basis set limit. Subsets of this valuable reference were used by Riley et al. [9] in 2007 to develop and validate the DFT-D (DFT plus dispersion) method, which explicitly adds dispersion to conventional DFT functionals. Hobza et al. showed that DFT-D produces excellent interaction energies for biologically relevant molecular complexes. In their DFT-D method, the dispersion term was used to augment the TPSS functional, and it produced qualitatively correct (attractive) interaction energies for several dispersion-bound complexes, whereas the original TPSS functional predicted only repulsive interactions. Continuing this work in 2008, Valdes et al. [10] used DFT-D to augment TPSS as well as PBE in a study of the conformations of several short peptides whose structures rely on dispersion interactions. They again showed that TPSS-D outperformed unaugmented TPSS. Two points should be made here about the DFT-D work of Hobza et al.
First, in the 2008 study above, the authors also showed that the novel functionals of Zhao and Truhlar [11–14] performed as well as or better than the explicitly dispersion-augmented functionals. In a series of developments, Zhao and Truhlar arrived at two functionals, PWB6K and M05-2X (and related functionals), that model dispersion forces accurately and dependably. Second, the DFT methods used by Hobza et al. in the explicit-dispersion approach (TPSS and PBE) had previously been shown to fail to predict attractive interactions for dispersion-bound systems (for ring–ring interactions, this was shown by Cafiero et al. [15]). Other DFT methods, such as the HCTH family of functionals of Handy et al. [16,17], do predict attractive interactions for dispersion-bound ring–ring complexes; augmenting these functionals with explicit dispersion would in effect double-count the dispersion forces and overestimate interaction energies.


Ultimately, a DFT functional that accounts for dispersion accurately and generally is needed, but the above raises the question of how to approach this. The work of Truhlar suggests that careful parameterization of existing functionals is the better path. This approach supposes that all of the information needed to account for dispersion is already contained implicitly in the single-determinant formalism of KS DFT, or can otherwise be accounted for within it. The work of Hobza suggests that dispersion is not intrinsic to KS DFT and must be added explicitly. There is theoretical support for both the implicit and the explicit approaches. The Hohenberg–Kohn theorem suggests that the implicit approach should work: the exact energy is a functional of the density, which is determined (although not uniquely) from a single determinant in the KS approach. Examination of current DFT functionals has shown, however, that they describe the low-density regions of the KS density very poorly, which suggests that an explicit treatment of dispersion is necessary.

The work of Zhao and Truhlar is based on existing functionals. The M05-2X functional has been shown to perform very well for nonbonded biological interactions, including those in the S22 database of Hobza et al. M05-2X is a hybrid meta-GGA functional: it combines nonlocal HF exchange with DFT exchange and correlation that depend on the electron density (ρ), the gradient of this density, and the kinetic energy density (τ). The DFT exchange in M05-2X is based on the PBE exchange energy density (F_X^PBE):

$$E_{\mathrm{X}}^{\mathrm{DFT}} = \sum_{\sigma} \int d\mathbf{r}\, F_{\mathrm{X}}^{\mathrm{PBE}}(\rho_\sigma, \nabla\rho_\sigma)\, f(w_\sigma),$$

where the kinetic energy enhancement factor f contains adjustable parameters. The M05-2X correlation energy (based on the PW correlation) is split into components for opposite spins and for parallel spins, each built on the correlation energy density of the uniform electron gas (e^UEG):

$$E_{\mathrm{C}}^{\alpha\beta} = \int d\mathbf{r}\, e_{\alpha\beta}^{\mathrm{UEG}}\, g_{\alpha\beta}(x_\alpha, x_\beta), \qquad E_{\mathrm{C}}^{\sigma\sigma} = \int d\mathbf{r}\, e_{\sigma\sigma}^{\mathrm{UEG}}\, g_{\sigma\sigma}(x_\sigma)\, \frac{D_\sigma}{2\tau_\sigma},$$

where the g factors contain adjustable parameters and depend on the reduced gradient

$$x_\sigma = \frac{\lvert \nabla\rho_\sigma \rvert}{\rho_\sigma^{4/3}}.$$

The last factor in the parallel-spin integrand contains D_σ:

$$D_\sigma = 2\left(\tau_\sigma - \tau_\sigma^{\mathrm{W}}\right),$$


where τ_σ^W is the von Weizsäcker kinetic energy density. Because τ_σ equals τ_σ^W for a one-electron density, D_σ vanishes in that case, removing spurious self-correlation from the parallel-spin term. The DFT exchange and correlation are combined with HF exchange in the usual manner, adding another adjustable parameter. These three sets of parameters (in the exchange, correlation, and hybrid formulae) were optimized against several databases of molecular properties. Both the exchange and correlation energy formulae recover the uniform electron gas limit. Notably, M05-2X performs well not only for nonbonded electrostatic and dispersion interactions, but also for thermochemistry and kinetics.

In a recent paper, Hohenstein et al. compared the performance of M05-2X and another functional of Truhlar et al., M06-2X [18], against the PBE functional augmented with Hobza's dispersion term (PBE-D) [19]. The functionals were used to calculate the JSCH-2005 interaction energies for hydrogen-bonded and dispersion-bound amino acids and nucleobases. While PBE-D was more accurate relative to CCSD(T) for hydrogen-bonded systems, M06-2X was more accurate for dispersion-bound stacked aromatic systems. This result is interesting, as PBE-D, which treats dispersion explicitly, might be expected to outperform M06-2X for dispersion-dominated systems. Both M06-2X and PBE-D were accurate, and considerably better than commonly used functionals, and the authors suggest PBE-D as the method of choice for dispersion-dominated systems.

Zhao and Truhlar's work covers many important model systems, but interesting work on parameterizing existing DFT methods for actual biological systems has also been done by other authors. Kurita et al. [20] parameterized the Perdew–Wang (PW) functional to reproduce rare-gas interaction energies in close agreement with experimental and CCSD(T) values, with the hope of extending this approach to DNA base-stacking interaction energies.
This parameterization was performed by varying the d parameter in the PW exchange functional; this parameter controls the low-density, large-density-gradient limit of the exchange energy. This is precisely the regime in which one would look for improvement in the description of dispersion/induction forces (the same approach was taken by Barone et al. in creating the modified PW functionals [21]). The authors obtained improved rare-gas interaction energies but were less successful in computing DNA base-stacking structures. One of the current authors, recognizing the ability of local functionals (such as SVWN and SPL) to model π–π interactions in aromatic ring stacking, explored modifying the percentage of semilocal character in a conventional GGA functional [22], with the goal of modeling the interactions between amino acid residues and a drug (ligand). The ability of these local functionals to model this type of interaction is largely due to a cancellation of errors: the functional itself overestimates energies, but the density in the intermolecular region is underestimated. Based on the PBE functional (though using S and VWN as the local contributions to exchange and correlation, respectively), this work used a functional, called LRH0n, of the form

$$E_{\mathrm{XC}}^{\mathrm{LRH0}n}(\rho, \nabla\rho) = E_{\mathrm{X}}^{\mathrm{S}}(\rho) + 0.n\,E_{\mathrm{X}}^{\mathrm{PBE}}(\rho, \nabla\rho) + E_{\mathrm{C}}^{\mathrm{VWN}}(\rho) + 0.n\,E_{\mathrm{C}}^{\mathrm{PBE}}(\rho, \nabla\rho),$$


where n could vary between 0 and 9, and examined the interaction energies between amino acid residues and the Parkinson's disease drug carbidopa. This work demonstrated that the amount of semilocal correction needed decreases as the size of the amino acid residue increases; the large aromatic residues phenylalanine, tyrosine, and tryptophan require little to no gradient correction. This work also showed, however, that no single modification of the base DFT method worked well for all amino acid residues.

Taking a different approach to describing these weak interactions, Hobza et al. created dispersion-augmented DFT methods by adding a dispersion energy to the DFT energy,

$$E_{\mathrm{DFT\text{-}D}} = E_{\mathrm{DFT}} + E_{\mathrm{D}},$$

where the dispersion energy takes the familiar damped pairwise form

$$E_{\mathrm{D}} = -\sum_{ij} f_{\mathrm{damp}}(r_{ij})\, C_6^{ij}\, r_{ij}^{-6}.$$

The damping function f_damp is tuned to obtain the optimal functional. In that work it is given by

$$f_{\mathrm{damp}}(r_{ij}) = \left[1 + \exp\left(-d\left(\frac{r_{ij}}{s_R R_0^{ij}} - 1\right)\right)\right]^{-1},$$

and the damping-function parameter s_R was found to have an optimal value of 0.98. Hobza et al. found that the unaugmented DFT functional more closely reproduced the extrapolated CCSD(T) reference values for hydrogen-bonded systems; DFT has traditionally been able to model hydrogen bonds and other predominantly electrostatic interactions. The DFT-D functional produced qualitatively more correct interaction energies than the DFT functional for dispersion-dominated systems, both in the gas phase and in solvent. Grimme et al. have also developed DFT-D-type functionals for calculating dispersion-dominated interactions [23]. This study of PBE-D (using a slightly different parameterization from that of Hobza et al.), BLYP-D, and B97-D (B97 is a reparameterization of Becke's functional [24]) again benchmarks performance against the JSCH-2005 database of noncovalently bound systems. In this work the DFT-D functionals were found to be more accurate for dispersion-dominated systems than for hydrogen-bonded systems. This behavior is curious, as conventional DFT is typically accurate for hydrogen bonding; the addition of the explicit dispersion term seems to improve dispersion while not improving, or even degrading, the accuracy of hydrogen-bonding calculations. Perhaps the added dispersion overcompensates for dispersion already present in the original DFT functionals. Schwabe and Grimme [25] also use the double-hybrid method to include longer-range


correlation in DFT by adding second-order perturbation theory correlation energy to conventional DFT. This approach, which neglects the first-order corrections to correlation that arise from the use of noncanonical HF orbitals, does not increase the accuracy of DFT for dispersion significantly relative to the increase in cost incurred by the perturbative corrections. Further, in applications of DFT to large, biological-scale systems, perturbative corrections would be prohibitively expensive. Lin et al. [26] used a dispersion-augmented DFT method to study the interactions between DNA bases and ellipticine (an intercalating anticancer drug). In this case the authors used dispersion-corrected atom-centered potentials to augment the BLYP functional and found that this method could accurately model the dispersion-dominated interactions with a minimal increase in computational overhead. McNamara and Hillier [27] used dispersion-augmented semiempirical methods (e.g., AM1-D and PM3-D), with a dispersion potential similar to that used by Hobza, to study the databases of nonbonded interactions proposed by Hobza et al. (JSCH-2005 and S22). They found that the augmented semiempirical methods came close to chemical accuracy (mean unsigned errors of 1.1–1.2 kcal/mol relative to high-level calculations) for the interactions in these databases, compared with mean unsigned errors of 8.2–8.6 kcal/mol for the unaugmented methods.
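The damped pairwise dispersion correction of the DFT-D form, E_D = −Σ f_damp C6^ij r_ij^(−6) with a Fermi-type damping function, can be sketched in a few lines of Python. The sketch uses s_R = 0.98 as quoted above; the steepness d = 20, the C6 and R0 values, and the combining rules (geometric mean for C6^ij, sum of atomic radii for R0^ij) are illustrative assumptions, not the published parameterization of Hobza et al. or Grimme et al. Each atom pair is counted once, and units are whatever consistent units the C6 and R0 inputs use.

```python
import numpy as np


def damping(r: float, r0: float, d: float = 20.0, s_r: float = 0.98) -> float:
    """Fermi-type damping f = 1 / (1 + exp(-d (r / (s_R R0) - 1))); 0.5 at r = s_R R0."""
    return 1.0 / (1.0 + np.exp(-d * (r / (s_r * r0) - 1.0)))


def dispersion_energy(coords, c6, r0, d=20.0, s_r=0.98) -> float:
    """E_D = -sum_{i<j} f_damp(r_ij) C6_ij / r_ij^6.

    coords: (N, 3) positions; c6: (N,) atomic C6 coefficients; r0: (N,) atomic radii.
    Combining rules here (geometric-mean C6, summed radii) are assumptions of this sketch.
    """
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # each pair counted once
            r = float(np.linalg.norm(coords[i] - coords[j]))
            c6_ij = np.sqrt(c6[i] * c6[j])
            r0_ij = r0[i] + r0[j]
            energy -= damping(r, r0_ij, d, s_r) * c6_ij / r**6
    return float(energy)


def dft_d_energy(e_dft: float, coords, c6, r0, **kwargs) -> float:
    """E_DFT-D = E_DFT + E_D."""
    return e_dft + dispersion_energy(coords, c6, r0, **kwargs)


# Hypothetical three-atom example (all parameters made up for illustration).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [0.0, 3.8, 0.0]])
c6 = np.array([1.0, 1.0, 1.0])
r0 = np.array([1.5, 1.5, 1.5])
print(dispersion_energy(coords, c6, r0))
```

The damping function switches the r^(−6) attraction off at short range, where the underlying functional already describes the interaction; the choice of s_R and d therefore decides how much dispersion is added on top of the functional, which is where the double-counting concern raised earlier enters.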

5. APPLICATIONS

While method development has focused largely on model systems involving interactions similar to those found in larger biological systems of interest, there has also been a large body of work applying DFT methods directly to protein–ligand systems. These studies encompass intermolecular interactions ranging from hydrogen bonds to metal–ligand coordination and aromatic π-stacking interactions. Different authors have had different degrees of success using DFT on biological protein–ligand systems, but the bulk of the results are positive. Muzet et al. [28] studied the reduction of glucose by aldose reductase; this reaction is part of the polyol pathway and uses NADPH as a cofactor. All calculations were performed at the experimental X-ray crystal structure geometry with the DFT program SIESTA. Valence electrons were described using a basis set of finite-range numerical atomic orbitals. The DFT quantum chemical model included a substructure of 64 amino acid residues surrounding the active site. The authors analyzed the electrostatic forces involved in cofactor and inhibitor binding using calculated electrostatic potential maps. To analyze the binding interactions, the electrostatic potentials of the cofactor NADP, the inhibitor, and the enzyme active site were computed separately and compared with the electrostatic potential generated by the complex. The authors show that obtaining accurate electrostatic properties to understand interactions among proteins, ligands, and cofactors is challenging. However, given a well-refined, high-resolution structure, electrostatic properties can be calculated directly by multipolar analysis or by DFT. This


work demonstrated that DFT-quality potentials can be obtained quickly, almost routinely, and at very low cost from high-resolution diffraction data for any protein structure at atomic resolution. Riley and Hobza [29] examined the binding of the steroid hormones aldosterone, deoxycorticosterone, and progesterone to the wild-type and mutated mineralocorticoid steroid hormone receptor (MR). Interaction energies were determined at the DFT/TPSS/TZVP and DFT-D/TPSS/TZVP levels of theory in the gas phase and in solution, using the polarizable continuum implicit solvation model. The DFT-D method, discussed above, consists of a DFT method augmented by an empirical term that accounts for dispersion effects, which traditional DFT methods describe poorly. In many cases DFT-D yields interaction energies comparable to those of highly correlated ab initio methods. The authors specifically examined interactions between the different steroid hormones and residues within the MR binding pocket to elucidate the roles these interactions play in activating this steroid hormone receptor. They also computed inter-residue interaction energies for the hydrogen-bonding networks near the ligand A-rings and D-rings to show their stabilizing effect. Strong hydrogen bonding between the D-ring side chain and the surrounding residues is one of the characteristics of MR agonists. These hydrogen bonds serve to draw helices together, which helps explain why the antagonist progesterone fails to activate MR. The authors found that while DFT-D produced more accurate results than traditional DFT, traditional DFT did provide acceptable results.

The use of DFT for modeling dispersion interactions in protein complexes shows promise in the development of anticancer drugs. Yun et al. [30] examined the interaction energy between p53 and MDM2, a potential drug target, because MDM2 binding inactivates the tumor-suppressor activity of p53.
This study examined p53–MDM2 binding using quantum mechanical methods, including MP2 and DFT, to determine the interaction energies of five p53 residues at the binding interface identified by molecular mechanics. The MP2 calculations with the cc-pVDZ and cc-pVTZ basis sets demonstrated the importance of vdW forces, confirming the molecular mechanics results. The B3LYP calculations with the same basis sets, however, produced repulsive energies, and DFT was concluded to be ineffective in accounting for dispersion. This study, however, employed only a single DFT method, B3LYP, which is known to describe dispersion poorly; testing a wider array of DFT methods could demonstrate the potential cost-effectiveness of DFT for describing dispersion. Because disruption of cell-cycle control allows the uncontrolled growth of cancer cells, drugs that prevent cells from dividing are potential anticancer agents. Dobeš et al. [31] examined the interaction energy between cyclin-dependent kinase 2 (cdk2) and the drug roscovitine. Because this binding is governed by noncovalent interactions, the study focused on modeling dispersion with an array of computational methods, including MP2, CCSD(T), DFT, and DFTB-D. The MP2 calculations, performed with the aug-cc-pVDZ basis set, demonstrated the importance of dispersion to the protein–drug interaction. The


results from the CCSD(T) and DFTB-D calculations confirmed the MP2 results. The DFT calculations did not successfully model dispersion; however, as in the previous study, B3LYP was the only DFT method used, and the calculations were conducted with a small basis set (6-31G). DFT modeling has also been applied to calculating interaction energies between nucleobase pairs in DNA. Hesselmann et al. [32] examined the interaction energies between purines and pyrimidines using MP2, CCSD(T), and DFT combined with symmetry-adapted perturbation theory (DFT-SAPT). The three methods gave results in agreement with one another, demonstrating the potential of SAPT based on DFT. Computational methods, including DFT, have also been used to examine the interactions that determine polypeptide structure. Improta et al. [33] examined polypeptide secondary structure, focusing on modeling the dispersion interactions present in the α-helix, using an alanine dipeptide analog as a model. A wide range of DFT methods were employed: PBEPBE, PBELYP, BLYP, B3LYP, HCTH407, MPWPW91, MPW1PW91, and PBE0. The DFT results showed a “local” overestimation of the energy at each residue. PBE0 was used as the reference functional because it provides results comparable to MP2 and CCSD(T) for this system. DFT has been a crucial tool for modeling the coordination chemistry of ligands bound to metal ions. For neutral ligands such as carbon monoxide (CO) bound to iron, induction and dispersion forces dominate the interaction, so DFT analysis of weak forces is relevant. Xu et al. [34] studied the vibrational frequencies of CO bound to iron in several proteins, including the heme sensor protein CooA and the H-NOX family of proteins. Computations were performed using the B3LYP functional and the standard 6-31G basis set for all atoms except iron, for which the Ahlrichs valence triple-zeta (VTZ) basis set was employed.
The authors looked specifically at the weakening effects on the Fe–CO bond, proximal histidine hydrogen bonding and tension, heme distortion and hydrogen bonding to the propionate substituents, and steric compression. The ν(Fe–C) and ν(C–O) frequencies of heme–CO adducts are sensitive to a variety of interactions with the surrounding protein environment, including distal interactions with polar residues, which directly modulate backbonding. DFT modeling also identified an influence on backbonding other than distal polarity, namely, neutralization of the equatorial negative charge by hydrogen-bond donation to the heme propionate substituents. Frequency displacements also arise from steric crowding in the distal pocket, which compresses the FeCO unit. Sinnecker et al. [35] examined the magnetic and optical spectra of metalloproteins using DFT, although the authors acknowledge that DFT has a known tendency to overestimate metal–ligand bonding. They present an assessment of the inherent quality of DFT-based QM/MM procedures for calculating spectroscopic properties of metalloproteins. Their results show both the potential and the limitations of the DFT approach for these types of systems. Zhang et al. [36] combined DFT-based conformational analysis with quantitative structure–activity relationship (QSAR) analysis. They examined bioactive conformations of 25 cyclic imide derivatives that act as protoporphyrinogen oxidase (PPO) inhibitors. PPO is the last common enzyme in the biosynthetic

110

Hunter Utkov et al.

pathway leading to heme and chlorophyll synthesis. The geometries of all cyclic imide molecules involved in this study were fully optimized by using the B3LYP functional and the 6—31G(d,p) basis set. The authors determined the bioactive conformations of the side chains of a series of cyclic amide derivatives, looking at 1—3 conformations depending on specific side chain’s ability to rotate. Potential energy surface scanning, molecular docking, and molecular dynamic simulations validated the DFT-QSAR approach. This study provides an effort to extend the application of traditional QSAR meth ods by combining them with electronic structure theory calculations. The current authors have studied the binding of ligands (the substrate and novel drug candidates) to the active site of HMG-coenzyme-A reductase [37]. The popular statin drugs for moderating blood cholesterol levels function as compe titive inhibitors of HMG-Co-A reductase, so knowledge of how ligands bind to this active site aids in the development of novel statin drugs. The authors determined that aromatic interactions in a section of the active site unused in statin drug binding can be important in ligand binding using MP2 and the SVWN DFT methods with the 6—311þG basis set. They applied that information in designing novel drug fragments to attach to existing statin drugs in order to increase the drug binding, thereby increasing the drug’s ability as an inhibitor [38]. The authors docked the novel drugs in the enzyme active site and employed ONIOM calculations (AM1 on the low level and SVWN DFT for the aromatic interactions) to verify that the new drugs bind strongly to the active site. The ONIOM calculations, which are the end result of this study, use SVWN DFT in a system including 11 amino acid residues of the active site and a large (100þ atom) molecule. This type of analysis using other more expensive ab initio methods would be unfeasible. Hong et al. 
[39] tested the effectiveness of DFT to examine the effects of hydrogen bonding on peptide structure using a model system consisting of 2—5 N-methylfor mamide molecules hydrogen bonded together. This study employed the Becke and Perdew—Wang DFTs at the DZVP basis set. This revealed significant cooperativity between sequential hydrogen-bonded moieties; this type of nonlocal behavior depends implicitly on the ability to model the same type of densities and interactions that are present in induction and dispersion. DFT was used as a standard to compare against molecular mechanics methods in this work, and DFT was found to give expected values. Some additional MP2 single-point calculations were also per formed to verify the other results. Further applications of DFT have been explored through the development of self-consistent charger density functional tight-binding scheme (SCC-DFTB). Elstner et al. [40] applied SCC-DFTB to study the effects of dispersion on protein secondary structure using polyalanine alpha helices.
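The ONIOM calculations described above for the HMG-CoA reductase complexes combine two levels of theory, but the text does not spell out how the layers are combined. As an illustration only, the sketch below encodes the standard two-layer subtractive ONIOM energy expression; the function name and the example energies are hypothetical and are not values from [38].

```python
def oniom2_energy(e_high_model: float, e_low_real: float, e_low_model: float) -> float:
    """Two-layer subtractive ONIOM energy.

    E(ONIOM2) = E_high(model) + E_low(real) - E_low(model)
    'real' is the full system treated at the low level (e.g. AM1); 'model' is
    the inner region treated at both levels (e.g. SVWN for the aromatic contacts).
    """
    return e_high_model + e_low_real - e_low_model


# Hypothetical energies (hartree) from three separate single-point runs
print(oniom2_energy(e_high_model=-10.0, e_low_real=-5.0, e_low_model=-2.0))
```

Each of the three energies would come from a separate quantum chemistry run; the subtraction removes the low-level description of the inner region so that it is counted only once, at the high level.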

6. CONCLUSIONS

We see that DFT is increasingly used to model a wide range of weak interactions in protein–ligand and nucleic acid–ligand systems. Often, DFT is the only method that can provide all-electron correlated calculations and still be applied in a reasonable amount of time to the systems of interest. Standard DFT methods incorporate electron correlation (and induction/dispersion) with differing degrees of accuracy, but important work is being done to improve the generality and transferability of DFT methods to all induction/dispersion interactions. Truhlar and coworkers lead in the development of DFT methods with implicit incorporation of weak interactions, and Hobza and coworkers have had great success using their dispersion correction for DFT. Further improvements in DFT methods and algorithms will allow the modeling of larger biological systems in the near future.

Using Density Functional Theory Methods in Ligand–Protein Complexes

REFERENCES

1. Ritchie, T., MacDonald, S. Drug Disc. Today 2009, 14, 1011.
2. Leach, A., Shoichet, B., Peishoff, C. J. Med. Chem. 2006, 49, 5851.
3. Hohenberg, P., Kohn, W. Phys. Rev. 1964, 136, B864.
4. Kohn, W., Sham, L.J. Phys. Rev. 1965, 140, A1133.
5. Koch, W., Holthausen, M.C. A Chemist's Guide to Density Functional Theory, Wiley-VCH, Weinheim, Germany, 2001.
6. Slater, J.C. Quantum Theory of Molecules and Solids. Vol. 4: The Self-Consistent Field for Molecules and Solids, McGraw-Hill, New York, 1974.
7. Vosko, S.H., Wilk, L., Nusair, M. Can. J. Phys. 1980, 58, 1200.
8. Jurecka, P., Sponer, J., Cerny, J., Hobza, P. Phys. Chem. Chem. Phys. 2006, 8, 1985.
9. Riley, K., Vondrasek, J., Hobza, P. Phys. Chem. Chem. Phys. 2007, 9, 5555.
10. Valdes, H., Pluhackova, K., Pitonak, M., Rezac, J., Hobza, P. Phys. Chem. Chem. Phys. 2008, 10, 2747.
11. Zhao, Y., Truhlar, D.G. J. Chem. Theory Comput. 2005, 1, 415–32.
12. Zhao, Y., Truhlar, D.G. J. Chem. Theory Comput. 2006, 2, 1009–18.
13. Zhao, Y., Truhlar, D.G. J. Phys. Chem. A 2005, 109, 5656.
14. Zhao, Y., Truhlar, D.G. J. Chem. Theory Comput. 2007, 3, 289.
15. Godfrey-Kittle, A., Cafiero, M. Int. J. Quant. Chem. 2006, 106, 2035.
16. Boese, A.D., Handy, N.C. J. Chem. Phys. 2000, 114, 5497–503.
17. Boese, A.D., Doltsinis, N.L., Handy, N.C., Sprik, M. J. Chem. Phys. 1999, 112, 1670–8.
18. Zhao, Y., Truhlar, D.G. Theor. Chem. Acc. 2008, 120, 215.
19. Hohenstein, E.G., Chill, S.T., Sherrill, C.D. J. Chem. Theory Comput. 2008, 4, 1996.
20. Kurita, N., Inoue, H., Sekino, H. Chem. Phys. Lett. 2003, 370, 161.
21. Adamo, C., Barone, V. J. Chem. Phys. 1998, 108, 664.
22. Hofto, L.R., Lee, C., Cafiero, M. J. Comput. Chem. 2007, 30, 1111.
23. Antony, J., Grimme, S. Phys. Chem. Chem. Phys. 2006, 8, 5287.
24. Grimme, S. J. Comput. Chem. 2006, 27, 1787.
25. Schwabe, T., Grimme, S. Phys. Chem. Chem. Phys. 2007, 9, 3397.
26. Lin, I., von Lilienfeld, A., Coutinho-Neto, M., Tavernelli, I., Rothlisberger, U. J. Phys. Chem. B 2007, 111, 14346.
27. McNamara, J., Hillier, I. Phys. Chem. Chem. Phys. 2007, 9, 2362.
28. Muzet, N., Guillot, B., Jelsch, C., Howard, E., Lecomte, C. PNAS 2003, 100, 8742.
29. Riley, K., Hobza, P. J. Phys. Chem. B 2008, 112, 3157.
30. Yun, D., Ye, M., Zhang, J. J. Phys. Chem. B 2008, 112, 11396–401.
31. Dobes, P., Otyepka, M., Strnad, M., Hobza, P. Chem. Eur. J. 2006, 12, 4297–304.
32. Hesselmann, A., Jansen, G., Schutz, M. J. Am. Chem. Soc. 2006, 128, 11730–1.
33. Improta, R., Barone, V. J. Comput. Chem. 2004, 25, 1333–41.
34. Xu, C., Ibrahim, M., Spiro, T. Biochemistry 2008, 47, 2379.
35. Sinnecker, S., Neese, F. J. Comput. Chem. 2006, 27, 1463.
36. Zhang, L., Hao, G., Tan, Y., Xi, Z., Huang, M., Yang, G. Bioorg. Med. Chem. 2009, 17, 4935.

37. Kee, E.A., Livengood, M.C., Carter, E.E., McKenna, M.L., Cafiero, M. J. Phys. Chem. B 2009, 113, 14810.
38. Utkov, H.E., Cafiero, M. In preparation.
39. Hong, G., Gresh, N., Roques, B., Salahub, D. J. Phys. Chem. B 2000, 104, 9746–54.
40. Elstner, M., Frauenheim, T., Suhai, S. J. Mol. Struct. 2003, 632, 29–41.

CHAPTER 8

Theoretical Calculations of Acid Dissociation Constants: A Review Article

Kristin S. Alongi¹ and George C. Shields²

Contents

1. Introduction
2. Background
   2.1 Thermodynamic cycles
   2.2 Gas-phase free energy calculations
   2.3 Solvation free energy calculations
3. Calculating Changes in Free Energy in the Gas Phase
4. Calculating Changes in Free Energy in Solution
5. Thermodynamic Cycles
6. Concluding Remarks
References

Abstract

Acid dissociation constants, or pKa values, are essential for understanding many fundamental reactions in chemistry. These values reveal the deprotonation state of a molecule in a particular solvent. There is great interest in using theoretical methods to calculate pKa values for many different types of molecules, including molecules that have not been synthesized, molecules for which experimental pKa determinations are difficult, and larger molecules in which the local environment changes the usual pKa values, such as certain amino acids that are part of a larger polypeptide chain. Chemical accuracy in pKa calculations is difficult to achieve, because an error of 1.36 kcal/mol in the free energy change for deprotonation in solvent results in an error of 1 pKa unit. In this review, the most valuable methods for determining accurate pKa values in aqueous solution are presented for educators interested in explaining or using these methods with their students.

¹ Dean's Office and Department of Chemistry & Physics, College of Science & Technology, Armstrong Atlantic State University, Savannah, GA, USA

² Dean's Office & Department of Chemistry, College of Arts & Sciences, Bucknell University, Lewisburg, PA, USA

Annual Reports in Computational Chemistry, Volume 6 ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06008-1

© 2010 Elsevier B.V. All rights reserved.

Keywords: pKa; acid dissociation constants; theory; free energy; gas phase; solution; thermodynamic cycles; deprotonation

1. INTRODUCTION

Acid dissociation constants, also known as pKa values, are essential for understanding many fundamental reactions in chemistry and biochemistry. For the acid dissociation reaction

$$\mathrm{HA}_{(aq)} \rightarrow \mathrm{A}^-_{(aq)} + \mathrm{H}^+_{(aq)} \tag{1}$$

pKa is defined as

$$\mathrm{p}K_a = -\log K_a \tag{2}$$

$$K_a = \frac{[\mathrm{H}^+_{(aq)}]\,[\mathrm{A}^-_{(aq)}]}{[\mathrm{HA}_{(aq)}]} \tag{3}$$

Often pKa values can be measured quite easily experimentally; however, chemists are frequently interested in the pKa values of molecules that have not been synthesized or for which experiments are not straightforward. For instance, amino acids that are part of a polypeptide chain have pKa values that vary with their local environment and are difficult to determine. Therefore, the ability to calculate these pKa values accurately is important for scientific advances in biochemistry and other fields. Chemical accuracy, though, is hard to achieve. Computing acid dissociation constants is a demanding process because an error of 1.36 kcal/mol in the free energy change of reaction 1 results in an error of 1 pKa unit at 298 K [1,2]. Numerous studies use a variety of methods in an attempt to obtain this chemical accuracy. In recent years there have been new developments, but many discrepancies still exist. The aim of this work is to compare the significant methods for accurate pKa calculations for educators interested in explaining or using these methods with their students. We focus on three areas of development: thermodynamic cycles, gas-phase free energy calculations, and calculations of the change in solvation free energy. An excellent review of thermodynamic cycles and the most common solvation models used for pKa calculations has recently been published by Ho and Coote [3].

Figure 1 Proton-based thermodynamic cycle. [Diagram: HA(g) → A−(g) + H+(g) with ΔGgas (gas phase, top); HA(aq) → A−(aq) + H+(aq) with ΔGaq (solution, bottom); vertical legs labeled −ΔGsol(HA), ΔGsol(A−), and ΔGsol(H+).]

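2. BACKGROUND

2.1 Thermodynamic cycles

Numerous thermodynamic cycles have been used to calculate pKa values. One of the most common is depicted in Figure 1, based on reaction 1 [1,2,4]. In Figure 1, ΔGaq represents the overall change in free energy of this reaction in solution, ΔGgas is the change in gas-phase free energy, and ΔGsol is the change in free energy of solvation. Based on the diagram, pKa is calculated using the following equations:

$$\mathrm{p}K_a = \frac{\Delta G_{\mathrm{aq}}}{RT\ln(10)} \tag{4}$$

$$\Delta G_{\mathrm{aq}} = \Delta G_{\mathrm{gas}} + \Delta\Delta G_{\mathrm{sol}} \tag{5}$$

where

$$\Delta G_{\mathrm{gas}} = G_{\mathrm{gas}}(\mathrm{H}^+) + G_{\mathrm{gas}}(\mathrm{A}^-) - G_{\mathrm{gas}}(\mathrm{HA}) \tag{6}$$

and

$$\Delta\Delta G_{\mathrm{sol}} = \Delta G_{\mathrm{sol}}(\mathrm{H}^+) + \Delta G_{\mathrm{sol}}(\mathrm{A}^-) - \Delta G_{\mathrm{sol}}(\mathrm{HA}) \tag{7}$$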
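To make Eqs. (4)–(7) concrete, here is a minimal Python sketch of the proton-based cycle. The function names and the dictionary layout are illustrative choices, not part of the original work, and the sketch assumes every input already refers to one consistent standard state (the 1 atm versus 1 M issue is discussed in Section 2.1.1). It also evaluates RT ln(10), the free energy that corresponds to one pKa unit.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_from_free_energy(dg_aq: float, temperature: float = 298.15) -> float:
    """Eq. (4): pKa = dG_aq / (RT ln 10), with dG_aq in kcal/mol."""
    return dg_aq / (R_KCAL * temperature * math.log(10.0))


def pka_proton_cycle(g_gas: dict, dg_sol: dict, temperature: float = 298.15) -> float:
    """pKa from the proton-based cycle of Figure 1 (reaction 1).

    g_gas:  gas-phase free energies G_gas for "HA", "A-", and "H+" (kcal/mol)
    dg_sol: solvation free energies dG_sol for the same species (kcal/mol)
    Assumes all inputs already refer to one consistent standard state.
    """
    dg_gas = g_gas["H+"] + g_gas["A-"] - g_gas["HA"]        # Eq. (6)
    ddg_sol = dg_sol["H+"] + dg_sol["A-"] - dg_sol["HA"]    # Eq. (7)
    return pka_from_free_energy(dg_gas + ddg_sol, temperature)  # Eqs. (5), (4)


# Free energy corresponding to one pKa unit at 298.15 K
one_pka_unit = R_KCAL * 298.15 * math.log(10.0)
print(f"RT ln 10 = {one_pka_unit:.2f} kcal/mol")
```

The final line prints 1.36 kcal/mol, which is why an error of that size in ΔGaq shifts the computed pKa by a full unit.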

All of these free energy values can be calculated using quantum chemistry except ΔGsol(H+) and Ggas(H+), which must be determined experimentally or from thermodynamic theory. These values are discussed in the corresponding sections. A similar, frequently used thermodynamic cycle is based on the dissociation of a protonated acid [5]:

$$\mathrm{HA}^+_{(aq)} \rightarrow \mathrm{A}_{(aq)} + \mathrm{H}^+_{(aq)} \tag{8}$$

In this case, reaction 8 leads to a thermodynamic cycle similar to that in Figure 1, which was based on reaction 1. Eq. (6) now becomes ΔGgas = Ggas(H+) + Ggas(A) − Ggas(HA+), and Eq. (7) becomes ΔΔGsol = ΔGsol(H+) + ΔGsol(A) − ΔGsol(HA+). Other thermodynamic cycles are based on the acid's reaction with a water molecule, as depicted in the following reactions [3,5–9]:

$$\mathrm{HA}_{(aq)} + \mathrm{H_2O}_{(aq)} \rightarrow \mathrm{A}^-_{(aq)} + \mathrm{H_3O}^+_{(aq)} \tag{9}$$

$$\mathrm{HA}^+_{(aq)} + \mathrm{H_2O}_{(aq)} \rightarrow \mathrm{A}_{(aq)} + \mathrm{H_3O}^+_{(aq)} \tag{10}$$

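$$\mathrm{HA}^+_{(aq)} + \mathrm{H_2O}_{(aq)} \rightarrow \mathrm{H_2O{\cdot}A}_{(aq)} + \mathrm{H}^+_{(aq)} \tag{11}$$

$$\mathrm{HA}_{(aq)} + \mathrm{H_2O}_{(aq)} \rightarrow \mathrm{H_2O{\cdot}A}^-_{(aq)} + \mathrm{H}^+_{(aq)} \tag{12}$$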

Reactions 9–12 lead to equivalent thermodynamic cycles similar to Figure 1. In these cycles, however, the number of waters included may vary [8]. A key point is that all species in a given thermodynamic cycle, including the solute and the solvent, must have the same standard state, and when water is introduced into the cycle it should have a standard state of 1 M [6,9]. One limitation of the cycles derived from reactions 9–12 is that the solvation free energy of the hydronium ion is difficult to calculate because of its charge. However, one can use the accepted experimental ΔGsol(H3O+) value of −110.3 kcal/mol in the 1 M standard state and the accepted experimental value of ΔGsol(H2O), −6.32 kcal/mol [10–12]. This reduces the number of computations and gives accuracy similar to that of the proton-based thermodynamic cycle combined with the most recently accepted experimental value for ΔGsol(H+). Adding another water to the aqueous reaction, as in reaction 11 or 12, also results in more calculations. This increases the computational error in determining pKa, as the total number of manipulated quantities increases from reaction 1 to reaction 12. For a thorough discussion of the standard-state issues that arise when adding water to thermodynamic cycles, see the recent Goddard group paper [9]. Another frequently used thermodynamic cycle is derived from the reaction of the acid with a hydroxide ion to produce water [5,7]:

$$\mathrm{HA}_{(aq)} + \mathrm{OH}^-_{(aq)} \rightarrow \mathrm{A}^-_{(aq)} + \mathrm{H_2O}_{(aq)} \tag{13}$$

$$\mathrm{HA}^+_{(aq)} + \mathrm{OH}^-_{(aq)} \rightarrow \mathrm{A}_{(aq)} + \mathrm{H_2O}_{(aq)} \tag{14}$$

The limitations of these cycles are similar to those of the cycles for reactions 9–12: the additional reactant increases the number of calculations and, consequently, the computational error. The solvation free energy of OH− is also difficult to compute because it is a diffuse anion, but the experimental value may be used instead. The accepted experimental ΔGsol(OH−) value is −104.7 kcal/mol in the 1 M standard state [12,13]. Again, as long as the correct experimental values are used for the solvation free energies of H+, H3O+, H2O, and OH−, these different thermodynamic cycles should yield similar results. Increasing the number of species primarily affects the calculation of ΔGgas, as Eq. (6) now contains another reactant and product. Reactions 13 and 14 can be modified by hydrating the hydroxide ion with n + m waters, resulting in n waters attached to A(aq) or A−(aq) and m waters returned to the solvent as H2O(aq) after reaction; this is the cluster continuum approach first outlined by Pliego and Riveros [3,9,14]. The difficulty in computing accurate solvation free energies for anions is discussed in Section 2.3.

The thermodynamic cycles derived from reactions 9–14 are depicted in a manner similar to Figure 1. The pKa values are then calculated using Eqs. (4) and (5), with Eqs. (6) and (7) modified to match the specific thermodynamic cycle.
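Because each cycle only changes which species appear in Eqs. (6) and (7), the bookkeeping can be written once as a stoichiometry-weighted sum. The Python sketch below is illustrative (the names are our own): it applies that sum to the solvation leg of reaction 9, using the experimental ΔGsol(H3O+) and ΔGsol(H2O) values quoted above and hypothetical placeholder values for HA and A−. It deliberately omits the water concentration and standard-state terms discussed in [9], which must be added before converting such a cycle to a pKa.

```python
def reaction_free_energy(values: dict, stoichiometry: dict) -> float:
    """Generalized Eqs. (6)/(7): sum over species of nu_i * value_i.

    stoichiometry maps species -> coefficient (products > 0, reactants < 0);
    values maps species -> G_gas or dG_sol in kcal/mol.
    """
    return sum(nu * values[species] for species, nu in stoichiometry.items())


# Reaction 9: HA(aq) + H2O(aq) -> A-(aq) + H3O+(aq)
REACTION_9 = {"HA": -1, "H2O": -1, "A-": 1, "H3O+": 1}

dg_sol = {
    "H3O+": -110.3,  # experimental, 1 M standard state (quoted in the text)
    "H2O": -6.32,    # experimental (quoted in the text)
    "HA": -5.0,      # hypothetical placeholder for a computed value
    "A-": -75.0,     # hypothetical placeholder for a computed value
}

# Solvation leg only; water concentration/standard-state terms are NOT included.
ddg_sol_9 = reaction_free_energy(dg_sol, REACTION_9)
print(f"ddG_sol(reaction 9) = {ddg_sol_9:.2f} kcal/mol")
```

The same function reproduces Eq. (7) for reaction 1 when given the stoichiometry {"HA": -1, "A-": 1, "H+": 1}, and Eq. (6) when given gas-phase free energies instead of solvation free energies.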

2.1.1 Limiting experimental values

One of the main sources of error in pKa calculations is the value used for the solvation free energy of H+, which is explicitly needed in certain thermodynamic cycles. Because a proton contains no electrons, its free energy cannot be calculated quantum mechanically. Calculating this energy with the standard equations of thermodynamics and the Sackur–Tetrode equation [15] yields the same value as can be deduced experimentally from the NIST database. The translational energy of 1.5 RT, combined with PV = RT and H = E + PV, gives H(H+) = 5/2 RT, or 1.48 kcal/mol. The Sackur–Tetrode equation yields the entropy term, TS(H+) = 7.76 kcal/mol at 298 K and 1 atm. Finally, since G = H − TS, G(H+) = −6.28 kcal/mol.

In contrast, over the past 15 years the ΔGsol(H+) value has changed considerably, from an accepted range of −254 to −261 kcal/mol in 1991 [7] to the now accepted −265.9 kcal/mol [12]. The basic uncertainty in any determination of the solvation free energy of an ion is that ions are never isolated. ΔGsol for an anion, A−, can be determined if the corresponding value for a cation, C+, is already known: when ΔGsol is measured for the salt CA, ΔGsol for A− can be obtained by difference. In practice, all ionic solvation values are referenced against the value for H+; every time the value of ΔGsol(H+) changes, the values for all other ions in the various databases change as well. The older value of ΔGsol(H+), −261.4 kcal/mol, was estimated from an average of five independent measurements of the hydrogen ion electrode [12]. When we began our work on pKa calculations, using carboxylic acids and phenols as test cases, we tried to use the old value of −261.4 kcal/mol. Like previous researchers, we found that it was not possible to calculate accurate absolute pKa values [16–19].
Karplus and coworkers reported that continuum dielectric methods were not accurate enough to yield accurate absolute solvation free energies. By 2001, new solvation methods had been developed and relative pKa values were easily determined, leading to a suspicion that the value of ΔGsol(H+) was erroneous [20]. We used an experimental thermodynamic cycle for acetic acid dissociating to H+ and the acetate anion to derive an experimental value for ΔGsol(H+) of −264.61 kcal/mol [1,2]. At that time, this was by far the most negative value used for these types of calculations, and it was a bit shocking to be using a value more than 3 kcal/mol lower than that obtained from the hydrogen ion electrode. However, two groups had used a combined explicit–implicit theoretical approach to obtain ΔGsol(H+) values of −264.4 and −264.3 kcal/mol for a standard state of 1 M [21,22]. In addition, Coe and coworkers had used experimental ion–water clustering data to derive a value for ΔGsol(H+) of −264.0 kcal/mol [10]. At the time, much of the computational chemistry community thought that this value was for a standard state of 1 M [23–25], yet after a few years of confusion it was determined that the correct standard state was 1 atm, so that changing to a standard state of 1 M changed the value of ΔGsol(H+) to −265.9 kcal/mol [11,12]. In addition, Goddard and coworkers have recently shown that, by including concentration corrections in Zhan and Dixon's high-level ab initio calculations of the hydration free energy of the proton [22], their value is corrected to −265.63 ± 0.22 kcal/mol [9]. We owe the Camaioni, Goddard, and Cramer/Truhlar research groups a debt of gratitude for bringing clarity to this issue. Discussion of the standard-state conversion can be found below.

The newest value was originally determined by Tissandier et al. in 1998 using correlations between ΔGsol of neutral ion pairs and experimental ion–water clustering data, an approach known as the cluster pair approximation method. Kelly et al. [12] confirmed this value in 2006 using a similar method but a larger data set. Experimental uncertainty still exists, however, which introduces uncertainty into all of these procedures. The estimated error in ΔGsol(H+) is thought to be 2 kcal/mol [12]. We note that the most accurate values for a wide range of unclustered cations and anions, based on the accepted value for the solvation free energy of H+, are given in Reference [12]. For clustered ions, standard-state corrections for the concentration of water clustered to the ions must be included [9], and the corrected values for clustered ions have been published in the supplemental information of a recent paper on Minnesota solvent models [26].

The absolute solvation free energy of a proton can also be calculated using high-level gas-phase calculations with a supermolecule-continuum approach involving a self-consistent reaction field model.
The change in solvation free energy is calculated by adding waters to H+ until a converged value is reached. The solvent is approximated by a dielectric continuum surrounding the solute, and the number of quantum mechanically treated solvent molecules is increased to improve the calculations. Using this approach, Zhan and Dixon calculated ΔGsol(H+) = −264.3 kcal/mol [22]. Correction for a missing standard-state term for water concentration changes this value to −265.63 ± 0.22 kcal/mol [9]. Thus the most recent experimental and theoretical determinations of ΔGsol(H+) are now less than 0.3 kcal/mol from each other, lending confidence to using either the −265.63 or the −265.9 kcal/mol value and lowering the uncertainty in this value to closer to 1 kcal/mol. Zhan and Dixon's corrected value is the standard against which all future explicit quantum mechanical calculations of ΔGsol(H+) will be evaluated.

The standard state of ΔGsol(H+) must also be taken into account to produce reliable results. Free energies may be calculated using an ideal gas at 1 atm as the reference, as in gas-phase calculations, or an ideal gas at 1 mol/L, as in solvation free energy calculations. The values of ΔGsol and ΔGgas depend on which standard state is used in their determination. Furthermore, a homogeneous equilibrium, in which all species are in the same standard state, is necessary to obtain reliable pKa results. The conversion from the 1 atm standard state to the 1 mol/L standard state can be derived from the relationship between the equilibrium constant expressed in terms of concentration, Kc (1 M standard state), and the equilibrium constant expressed in terms of pressure, Kp (1 atm standard state). The relationship between the two constants is derived for the following general reaction:

$$a\,\mathrm{A}_{(g)} \rightleftharpoons b\,\mathrm{B}_{(g)} \tag{15}$$

The corresponding equilibrium constants are

$$K_c = \frac{[\mathrm{B}]^b}{[\mathrm{A}]^a} \tag{16}$$

$$K_p = \frac{P_{\mathrm{B}}^{\,b}}{P_{\mathrm{A}}^{\,a}} \tag{17}$$

Using the ideal gas law, PV = nRT, where R = 0.08206 L atm/(K mol), we rewrite Eq. (17):

$$K_p = \frac{(n_{\mathrm{B}}RT/V)^b}{(n_{\mathrm{A}}RT/V)^a} = \frac{(n_{\mathrm{B}}/V)^b}{(n_{\mathrm{A}}/V)^a}\,(RT)^{b-a} \tag{18}$$

Since n_B/V and n_A/V now have units of mol/L, they can be replaced by the concentrations of B and A, and Kp can be written in terms of Kc:

$$K_p = \left(\frac{[\mathrm{B}]^b}{[\mathrm{A}]^a}\right)(RT)^{b-a} = K_c\,(RT)^{\Delta n} \tag{19}$$

where Δn = b − a is the change in the number of moles. Equation (19) can be used to relate the Gibbs free energies in the two standard states. The relationship between the 1 M standard state (denoted *) and the 1 atm standard state (denoted °) is [12]:

$$G^{*} = G^{\circ} + \Delta G^{\circ\rightarrow *} \tag{20}$$

$$\Delta G^{\circ} = \Delta G^{*} - \Delta G^{\circ\rightarrow *} \tag{21}$$

We can determine ΔG°→* using Eq. (19) and the relations between the equilibrium constants and the free energies in the two standard states:

$$\Delta G^{*} = -RT\ln K_c \tag{22}$$

$$\Delta G^{\circ} = -RT\ln K_p \tag{23}$$

Using Eqs. (19), (22), and (23), we obtain the relationship between the two free energies at 298.15 K:

$$\Delta G^{\circ} = \Delta G^{*} - RT\ln(RT)^{\Delta n} = \Delta G^{*} - RT\ln(24.4654)^{\Delta n} \tag{24}$$

For the dissociation reaction 1, AH(aq) → A−(aq) + H+(aq), Δn = 1, so

$$\Delta G^{\circ\rightarrow *} = RT\ln(24.4654) \tag{25}$$
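Eqs. (19), (24), and (25) can be checked numerically. The short Python sketch below is illustrative only (function and constant names are our own). It computes RT in L atm/mol, which at 298.15 K is the 24.4654 appearing in Eq. (24), and the shift RT ln(RT)^Δn between the 1 atm and 1 mol/L standard states, which for Δn = 1 gives the 1.89 kcal/mol quoted for this conversion.

```python
import math

R_KCAL = 1.987204e-3    # kcal/(mol K)
R_LATM = 0.082057366    # L atm/(mol K)


def standard_state_shift(delta_n: int, temperature: float = 298.15) -> float:
    """RT ln[(RT)^dn] in kcal/mol, i.e. dG*(1 mol/L) - dG°(1 atm), from Eq. (24).

    RT inside the logarithm is in L atm/mol, so (RT)^dn is the dimensionless
    ratio Kp/Kc of Eq. (19) for reference states of 1 atm and 1 mol/L.
    """
    return R_KCAL * temperature * delta_n * math.log(R_LATM * temperature)


def to_one_molar(dg_one_atm: float, delta_n: int, temperature: float = 298.15) -> float:
    """Convert a free energy change from the 1 atm to the 1 mol/L standard state."""
    return dg_one_atm + standard_state_shift(delta_n, temperature)


print(f"RT = {R_LATM * 298.15:.4f} L atm/mol")
print(f"dG(o->*) for dn = 1: {standard_state_shift(1):.2f} kcal/mol")
```

For a solvation free energy such as ΔGsol(H+), only the gas-phase species changes standard state, so it behaves like a process with Δn = −1: `to_one_molar(-264.0, -1)` returns approximately −265.9 kcal/mol, consistent with the 1 atm and 1 M values for the proton discussed in this subsection.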

At 298.15 K this conversion between standard states, ΔG°→*, equals 1.89 kcal/mol. The value ΔGsol(H+) = −265.9 kcal/mol refers to a 1 M standard state for the gas-phase proton; reported with the 1 atm gas-phase standard state, it is −264.0 kcal/mol. In calculating accurate pKa values, one must be aware of the standard state, because a difference of 1.89 kcal/mol (about 1.4 pKa units) can cause significant error and unreliable values [1,2,12].

The Ggas(H+) value also cannot be determined quantum mechanically. Its value, however, has less uncertainty: it is the same whether determined from experimental values available on the NIST website or from the Sackur–Tetrode equation [15], and it is consistently accepted as −6.28 kcal/mol for a standard state of 1 atm [1,2]. The reliance on experimental H+ values is a significant challenge in producing accurate pKa calculations with some thermodynamic cycles. Relative pKa calculations, which solve for the pKa of one acid in terms of another, are often used [5,20,27–32] to cancel out this error. The calculation is based on the equilibrium between two species, where AH+ and BH+ are two acids:

$$\mathrm{AH}^+_{(aq)} + \mathrm{B}_{(aq)} \rightleftharpoons \mathrm{A}_{(aq)} + \mathrm{BH}^+_{(aq)} \tag{26}$$
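Reaction (26) works because its free energy equals RT ln(10) times the difference between the two pKa values, so the proton's gas-phase and solvation free energies cancel. A minimal Python sketch of that bookkeeping follows; the function name is hypothetical, and the reference pKa would be an experimental value chosen by the user.

```python
import math

R_KCAL = 1.987204e-3  # kcal/(mol K)


def relative_pka(dg_exchange: float, pka_reference: float,
                 temperature: float = 298.15) -> float:
    """pKa of AH+ from the proton-exchange reaction (26): AH+ + B -> A + BH+.

    dg_exchange:   computed aqueous free energy of reaction (26), kcal/mol
    pka_reference: experimental pKa of the reference acid BH+
    dG(26) = RT ln 10 * [pKa(AH+) - pKa(BH+)], so no proton free energies appear.
    """
    return pka_reference + dg_exchange / (R_KCAL * temperature * math.log(10.0))
```

For example, with hypothetical inputs, an exchange free energy of +1.36 kcal/mol relative to a reference acid with pKa 5.0 gives a pKa close to 6.0 for AH+.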

This method is limited to reactions for which a suitable reference acid is available [20]. In this review, we focus on absolute pKa calculations and on the most accurate methods applicable to any molecule of interest.
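The gas-phase proton values quoted earlier in this subsection (H = 5/2 RT = 1.48 kcal/mol, TS = 7.76 kcal/mol, and G = −6.28 kcal/mol at 298 K and 1 atm) can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it treats the proton as a structureless ideal-gas particle with Boltzmann statistics (no electronic or spin degeneracy), uses CODATA constants, and the function name is our own. It evaluates to within about 0.01 kcal/mol of the quoted figures; small differences reflect rounding and the choice of constants.

```python
import math

# CODATA 2018 SI constants
H_PLANCK = 6.62607015e-34    # J s
K_B = 1.380649e-23           # J/K
N_A = 6.02214076e23          # 1/mol
AMU = 1.66053906660e-27      # kg
J_PER_KCAL = 4184.0


def proton_gas_thermo(temperature: float = 298.15, pressure: float = 101325.0,
                      mass_amu: float = 1.007276) -> tuple:
    """Return (H, TS, G) in kcal/mol for an ideal-gas proton.

    Only translation contributes: H = 3/2 RT + PV = 5/2 RT, and S comes from
    the Sackur-Tetrode equation. pressure is in Pa (101325 Pa = 1 atm).
    """
    gas_const = K_B * N_A                                   # J/(mol K)
    mass = mass_amu * AMU                                   # kg
    # thermal de Broglie wavelength
    wavelength = H_PLANCK / math.sqrt(2.0 * math.pi * mass * K_B * temperature)
    volume_per_particle = K_B * temperature / pressure      # m^3
    entropy = gas_const * (math.log(volume_per_particle / wavelength**3) + 2.5)
    enthalpy = 2.5 * gas_const * temperature                # J/mol
    ts = temperature * entropy                              # J/mol
    return enthalpy / J_PER_KCAL, ts / J_PER_KCAL, (enthalpy - ts) / J_PER_KCAL


h, ts, g = proton_gas_thermo()
print(f"H = {h:.3f}  TS = {ts:.3f}  G = {g:.3f} kcal/mol")
```

Passing a different `pressure` shows how the entropy term, and therefore Ggas(H+), depends on the chosen standard state.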

2.1.2 Alternative methods for calculating pKa values

Methods other than thermodynamic cycles are often used to calculate acid dissociation constants. Previous publications have used the theoretical relationship between pKa and structural properties [6], bond valence methods and bond lengths [33], correlations of pKa with highest occupied molecular orbital (HOMO) energies and frontier molecular orbitals [34], and artificial neural networks [35] to predict pKa values. In addition, much work has been done using physical properties as quantitative structure–activity relationship (QSAR) descriptors, with regression equations based on such descriptors yielding accurate pKa values for specific classes of molecules [36–47]. The correlation of pKa values with various molecular properties, however, is often restricted to specific classes of compounds, and it is usually unwise to apply these relations to molecules outside the data set [8,23]. Therefore, in this review we focus on the use of various thermodynamic cycles in the calculation of acid dissociation constants. Many other variables, such as electronic, thermal, and solvation energies and polarizable force fields, may also affect the accuracy of pKa calculations but are not discussed in detail in this review [28,48].

2.2 Gas-phase free energy calculations

The gas-phase free energy calculation is the smallest source of error in pKa calculations. High levels of theory, such as CBS-QB3 [49] and CBS-APNO [50], produce reliable ΔGgas values, with root-mean-square deviations of 1.1–1.6 kcal/mol from the gas-phase deprotonation free energies compiled in the NIST online database [51–53]. For small molecules, with today's computers, CCSD(T) calculations extrapolated to the complete basis set limit can give gas-phase free energies as accurate as experiment. Details are discussed in Section 3. The problem, however, is producing accurate results without such computationally expensive levels of theory [54]. Combinations of different methods, such as model chemistries, density functional theory (DFT), and ab initio theories, together with different basis sets, have been used in an attempt to find an accurate but less computationally demanding approach.

2.3 Solvation free energy calculations

The largest source of error in pKa calculations is the calculation of the change in solvation free energy for the reaction, which depends on the type of solvation model and the specific level of theory used [1,2,8,12]. The basic problem is that experimental solvation free energies for ions have error bars of roughly 2–5 kcal/mol, so models developed to reproduce experimental values have the same inherent uncertainty. It is not possible to improve a particular solvation model simply by increasing the basis set, as one can when calculating ab initio quantum mechanical gas-phase values.

Explicit solvation methods include solvent molecules directly in the calculation. This approach is advantageous because specific solute–solvent interactions are taken into account. These multiple interactions, however, make it more difficult to find a global minimum for the complex [23,25]. The number of solvent molecules that must be included in the reaction is also in question, leading to the problem of balancing accuracy against computational expense. In addition, conformational effects can be daunting; it is difficult to know how many different ion–water configurations are necessary to obtain a conformationally averaged result. Reactions 9–12 use only one water molecule, but explicit solvation methods can be used to examine the effects of adding more waters to the reaction:

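$$\mathrm{HA}_{(aq)} + n\,\mathrm{H_2O}_{(aq)} \rightarrow (\mathrm{H_2O})_n\mathrm{A}^-_{(aq)} + \mathrm{H}^+_{(aq)} \tag{27}$$

$$\mathrm{HA}_{(aq)} + (n+m)\,\mathrm{H_2O}_{(aq)} \rightarrow (\mathrm{H_2O})_m\mathrm{A}^-_{(aq)} + (\mathrm{H_2O})_n\mathrm{H}^+_{(aq)} \tag{28}$$

$$\mathrm{HA}(\mathrm{H_2O})_{n\,(aq)} + \mathrm{OH}^-(\mathrm{H_2O})_{m\,(aq)} \rightarrow \mathrm{A}^-(\mathrm{H_2O})_{n+m\,(aq)} + \mathrm{H_2O}_{(aq)} \tag{29}$$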

Reactions 27–29 depict some examples of explicit solvation, where n, or n + m, is the number of water molecules used in the reaction. Because of the daunting task of computing enough different configurations with a large number of water molecules, complete with frequency calculations to determine the entropic contributions needed to obtain free energies for each configuration, quantum chemistry is not yet used routinely for completely explicit solvent models. Recent evidence suggests that if the standard states for water are included correctly, a thermodynamic cycle based on reaction 28 will yield good pKa values if a cluster cycle (and not a monomer cycle) is used for the waters [9].

In contrast to explicit solvation, implicit solvation, in which actual solvent molecules are not included in the thermodynamic cycle, is easily implemented for pKa calculations. Methods used to calculate the change in solvation free energy, such as the Dielectric Polarizable Continuum Model (DPCM) and the Conductor-like Polarizable Continuum Model (CPCM), treat solvation implicitly by constructing a solvation cavity around the molecule of interest. These methods have been shown to compute solvation free energies for neutral molecules to within 1 kcal/mol [23]. For neutral compounds, the implicit models approximate the solvent as a homogeneous dielectric continuum that represents the response of the bulk solvent. This is computationally less demanding than explicit solvation, but it is not particularly accurate for ionic species. Strong electrostatic effects make modeling the solvent with implicit solvation more challenging [8]. The method yields less accurate values for these species [25] and may also impart a false partial positive charge on the system if the wave function penetrates beyond the cavity walls [14]. Furthermore, ionic species also have much larger solvation free energies, due to strong solute–solvent interactions.
Consequently, a smaller percentage error is required for a charged species to reach the same level of chemical accuracy as a neutral molecule [23]. Aside from the problems with ionic species, an additional limitation of implicit solvation is that its accuracy depends on the choice of proper boundary techniques, such as the type of solvation cavity [25]. While developing the implicit solvation model Solvation Model 6 (SM6), Kelly, Cramer, and Truhlar found that, when calculating solvation free energies for molecules with concentrated regions of charge density, more accurate values were obtained by adding explicit waters in addition to the implicit effects of the model. They concluded that this occurred because of significant local solute–solvent interactions that their implicit model did not take into account [25]. This method of including explicit solvent molecules within an implicit model is referred to as a cluster continuum model [55] or implicit–explicit model [25]. One limitation of this method is that one must determine the number of explicit molecules that yields the most accurate results, which varies with the type of molecules in the data set [55,56].

Along with deciding whether to use implicit or implicit–explicit solvent models, one must choose a specific level of theory and basis set for calculating the change in solvation free energy. As with the gas-phase free energy, there are a variety of methods, and it can be difficult to determine which combination is the most accurate. Further discussion can be found in Section 4.

3. CALCULATING CHANGES IN FREE ENERGY IN THE GAS PHASE

The calculation of ΔGgas should be the smallest source of error in pKa calculations because many high levels of theory can predict these values accurately. In 2006, we showed that CCSD(T) [57–60] is a highly effective method for calculating gas-phase free energies of deprotonation [53]. CCSD(T) stands for coupled cluster with all single and double substitutions plus a quasi-perturbative treatment of connected triple excitations; as of this writing, it is considered the gold standard of ab initio quantum chemistry. CCSD(T) is one of the most effective ways to include electron correlation, which arises because when an electron moves, all other electrons tend to move to avoid it. Because Hartree–Fock theory solves the Schrödinger equation for an average electronic potential, including electron correlation is essential for obtaining meaningful energies, and developing ways of doing so occupies much of the field of computational chemistry. In this case, the coupled cluster calculations included triple excitations for both the complete fourth-order Møller–Plesset (MP4) and CCSD(T) energies (for instance, by using the E4T keyword in Gaussian). The single-point CCSD(T) energy calculations used the augmented correlation-consistent polarized n-tuple zeta basis sets (aug-cc-pVnZ, n = D, T, Q, 5) of Dunning and coworkers [61]. These calculations were performed on geometries obtained with fourth-order Møller–Plesset perturbation theory [62] including single, double, and quadruple substitutions (MP4(SDQ)). These optimizations, and the corresponding frequency calculations, employed the aug-cc-pVTZ basis set. The frequency calculations ensured that all structures were true minima on the potential energy surface, and the unscaled thermochemical corrections were used to obtain the zero-point energies, enthalpies, and Gibbs free energies.
Furthermore, to estimate the energy at the complete basis set limit, a series of two-point extrapolations of the correlation energy were undertaken [53]. In this scheme (Eqs. 30–32), an extrapolated correlation contribution to the total energy is obtained from the correlation energies computed with two consecutive basis sets of cardinal numbers x − 1 and x, and is then added to a nonextrapolated Hartree–Fock energy [63–65]:

$$E^{\mathrm{corr}}_{x} = E^{\mathrm{CCSD(T)}}_{x} - E^{\mathrm{HF}}_{x} \qquad (30)$$

$$E^{\mathrm{corr}}_{x-1,x} = \frac{x^{3}\,E^{\mathrm{corr}}_{x} - (x-1)^{3}\,E^{\mathrm{corr}}_{x-1}}{x^{3} - (x-1)^{3}} \qquad (31)$$

$$E_{x-1,x} = E^{\mathrm{HF}}_{x} + E^{\mathrm{corr}}_{x-1,x} \qquad (32)$$

Kristin S. Alongi and George C. Shields
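The two-point scheme of Eqs. (30)–(32) can be sketched in a few lines of Python. This is a minimal illustration, assuming the CCSD(T) and Hartree–Fock total energies (in hartree) are already available from separate runs with consecutive basis sets such as aug-cc-pVTZ (x = 3) and aug-cc-pVQZ (x = 4); the function names are hypothetical and do not belong to any program package.

```python
def correlation_energy(e_ccsd_t: float, e_hf: float) -> float:
    """Eq. (30): correlation energy for the basis set with cardinal number x."""
    return e_ccsd_t - e_hf


def extrapolate_correlation(e_corr_prev: float, e_corr_x: float, x: int) -> float:
    """Eq. (31): two-point extrapolation from cardinal numbers x - 1 and x.

    Assumes E_corr(x) = E_corr(CBS) + A / x**3.
    """
    if x < 2:
        raise ValueError("x must be >= 2 so that x - 1 is a valid cardinal number")
    x3 = x ** 3
    xm3 = (x - 1) ** 3
    return (x3 * e_corr_x - xm3 * e_corr_prev) / (x3 - xm3)


def cbs_total_energy(e_hf_x: float, e_corr_prev: float, e_corr_x: float, x: int) -> float:
    """Eq. (32): nonextrapolated HF energy at x plus the extrapolated correlation energy."""
    return e_hf_x + extrapolate_correlation(e_corr_prev, e_corr_x, x)
```

Because Eq. (31) assumes an x⁻³ convergence of the correlation energy, correlation energies that follow that form exactly are mapped back to their limiting value; with real data the result is only an estimate of the complete basis set limit.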

The CCSD(T)//MP4(SDQ)/aug-cc-pVTZ method, extrapolated to the complete basis set limit with the aug-cc-pVTZ and aug-cc-pVQZ basis sets, yielded a standard deviation of 0.58 kcal/mol when compared to a selected set of experimental gas-phase deprotonation free energies compiled in the NIST online database, a data set with an uncertainty of <1 kcal/mol [51,53]. The low uncertainty of this selected NIST data set makes these values extremely useful for assessing pKa calculations, and they will be referenced throughout the review. Using model chemistries, we also reported ΔGgas calculations of slightly lower accuracy, within 1.1–1.6 kcal/mol of experimental values. The model chemistries G3 [66], CBS-QB3 [49], CBS-APNO [50], and W1 [67] produced mean absolute deviations of 1.16, 1.43, 1.06, and 0.95 kcal/mol, respectively [52,53]. These results were corroborated in 2005, when G2, G3, and CBS-APNO were found to predict accurate values of ΔGgas for the formation of ion–water clusters when compared to experimental results [68–72]. Contrary to the previous publications, however, CBS-QB3 was less accurate for these clustered ions than the other model chemistries. Although effective methods for gas-phase free energy calculations exist, it would be useful to find computationally less demanding methods of similar accuracy [54]. DFT may offer a more cost-efficient approach, although it must be remembered that each particular DFT functional, combined with a given basis set, represents its own parameterized method. DFT includes some of the correlation energy, although the exact functional that would recover all of it is still unknown and is the subject of much theoretical research. For example, fairly accurate results were obtained with PBE1PBE/aug-cc-pVTZ and B3P86/aug-cc-pVTZ [54].
The selected NIST test set included the deprotonation reactions of the following compounds: ammonia, methylamine, dimethylamine, ethylamine, methane, methanol, water, acetylene, ethylene, formaldehyde, hydrogen chloride, propene, nitrous acid, nitric acid, isocyanic acid, furan, and benzene. When compared to experimental results in the NIST database [51], the mean square ΔGgas deviations for both PBE1PBE/aug-cc-pVTZ and B3P86/aug-cc-pVTZ were 1.6 kcal/mol, somewhat less accurate than the more computationally expensive methods. In another study, G3MP2, G2, G3, G2MP2, G3B3, G3MP2B3, QCISD(T), CBS-4, CBS-Q, CBS-QB3, and CBS-APNO produced ΔGgas values for nitrous acid within 0–1.6 kcal/mol [73] of the experimental value of 333.7 kcal/mol [74]. The same study found that the less expensive density functional B3LYP produced values within 2.72 kcal/mol of experiment, whereas the commonly used Hartree–Fock level of theory, which does not include correlation energy, gave a large 4.66 kcal/mol discrepancy [73].

The accuracy of B3LYP has been examined by numerous researchers. In 2003, Fu and colleagues reported that the MP2/6-311++G(d,p) and B3LYP/6-311++G(2df,p) methods yielded gas-phase acidities, that is, the change in free energy of the reaction

$$\mathrm{AH_{gas} \rightarrow A^{-}_{gas} + H^{+}_{gas}} \qquad (33)$$

within 2.2 and 2.3 kcal/mol of experimental values for various organic acids reported in the NIST online database [27,51]. Two years later, Range also reported that B3LYP with the 6-311++G(3df,2p) basis set produced a root-mean-square error of 2.5 kcal/mol for reaction 33 when compared to experimental values from the NIST online database [51]. The article also reported that the CBS-QB3, G3B3, G3MP2B3, PBE0, and B1B98 methods all have RMSEs within 1.3 kcal/mol of experimental values [75]. Reaction 33 represents the gas-phase dissociation of an acid, which is the top line of Figure 1. Other publications, however, report more accurate B3LYP gas-phase free energies for aliphatic amines, diamines, and aminoamines. In 2007, Bryantsev et al. reported that B3LYP calculations with the 6-31++G basis set had a mean absolute error of 0.78 kcal/mol relative to experimental gas-phase basicities (ΔGgas for the reverse of reaction (1)) reported in the NIST database [51]. This accuracy is comparable to that of expensive, high-level model chemistries, but because the experimental values have uncertainties of ±2 kcal/mol, it is difficult to discern exactly how accurate the calculations are relative to those in the other publications [76]. The take-home message remains the same: always benchmark DFT calculations for the systems you are interested in computing.
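Benchmarking of this kind comes down to a few summary statistics of the deviations between calculated and experimental ΔGgas values. The sketch below assumes deviations are defined as calculated minus experimental and uses the common textbook definitions of the mean signed deviation, mean absolute deviation, root-mean-square error, and sample standard deviation; individual papers may define these quantities slightly differently, so check each source before comparing numbers across studies. The function name `benchmark` is hypothetical.

```python
import math
from statistics import mean, stdev


def benchmark(calculated, experimental):
    """Summary statistics (kcal/mol) of deviations defined as calculated - experimental."""
    if len(calculated) != len(experimental) or len(calculated) < 2:
        raise ValueError("need at least two paired values")
    dev = [c - e for c, e in zip(calculated, experimental)]
    return {
        "MSD": mean(dev),                             # mean signed deviation
        "MAD": mean(abs(d) for d in dev),             # mean absolute deviation
        "RMSE": math.sqrt(mean(d * d for d in dev)),  # root-mean-square error
        "SD": stdev(dev),                             # sample standard deviation of the deviations
        "MaxAD": max(abs(d) for d in dev),            # largest absolute deviation
    }
```

Reporting the signed deviation alongside the absolute measures is useful because a method with a systematic offset can have a small standard deviation but a large mean signed error.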

4. CALCULATING CHANGES IN FREE ENERGY IN SOLUTION

The change in free energy of solvation for the reaction is the largest source of error in pKa calculations. To determine the most accurate method, we must consider both the type of solvation model used (implicit, explicit, or cluster continuum, also described as implicit–explicit) and the specific level of theory. As previously mentioned, ionic species in particular are extremely difficult to calculate because of their strong electrostatic effects and large free energies of solvation [8,14,23,25]. Implicit solvation models developed for condensed phases represent the solvent as a dielectric continuum and are based on the Poisson equation, which is valid when the surrounding dielectric medium responds linearly to the charge distribution of the solute. The Poisson equation is a special case of the Poisson–Boltzmann (PB) equation: PB electrostatics applies when electrolytes are present in solution, while the Poisson equation applies when no ions are present. Solving the Poisson equation for a solute cavity of arbitrary shape requires numerical methods, and many researchers have developed an alternative approximation to the Poisson equation that can be solved analytically, known as the Generalized Born (GB) approach. The most common implicit models used for small molecules are the PB models, namely the Conductor-Like Screening Model (COSMO) [77,78], DPCM [79], the Conductor-Like Modification to the Polarized Continuum Model (CPCM) [80,81], and the Integral Equation Formalism Implementation of PCM (IEF-PCM) [82], together with the GB SMx models of Cramer and Truhlar [23,83–86]. The newest Minnesota solvation models are SMD, the universal Solvation Model based on solute electron Density [26], and the SMVLE method, which combines the surface and volume polarization for electrostatic interactions model (SVPE) [87–89] with semiempirical terms that account for local electrostatics [90]. Further details on these methods can be found in Chapter 11 of Reference [23].

Kelly et al. [8] used the cluster continuum model in their study of aqueous acid dissociation constants. They compared the correlation between experimental pKa values and the calculated acid dissociation free energies of anions with and without an additional explicit water molecule using SM6. Note that because of the relation between pKa and ΔGaq shown in Eq. (4), a plot of pKa versus ΔGaq should yield a slope of 1/(2.303RT), that is, 1/(RT ln 10). The single explicit water molecule was added only to ions containing three or fewer atoms or to ions with an oxygen atom bearing a more negative partial atomic charge than the oxygen atom of water. They reported that when only implicit effects were included, the regression had a slope of 0.71/(RT ln 10) and a correlation of r² = 0.76. When an explicit water was added to these ions, however, the new regression yielded a slope of 0.87/(RT ln 10) with a correlation of r² = 0.86. From this observation, Kelly, Cramer, and Truhlar concluded that, for some anions, the accuracy of acid dissociation free energies increased greatly with the addition of one explicit water molecule.
Furthermore, they argued that an implicit model alone cannot produce such accurate results because it does not take strong solute–solvent interactions into account. They contend that previous publications reporting strong correlations between pKa values and free energies computed with implicit methods actually contain underlying systematic errors, as indicated by lower slopes [8]. Related to Kelly, Cramer, and Truhlar’s conclusion, Klamt, Eckert, and Diedenhofen’s 2003 publication studied the correlation between experimental pKa values and the free energies of dissociation of 64 organic and inorganic acids for reaction 9. Like Kelly et al., Klamt, Eckert, and Diedenhofen used an explicit water molecule; their solvent calculations, however, used Klamt’s COSMO-RS method [78]. They reported a correlation of r² = 0.984 with a standard deviation of only 0.49. The slope of the regression line, however, was fairly low at 0.58/(RT ln 10). Klamt, Eckert, and Diedenhofen claim that this discrepancy is not due to a weakness of the calculation method [91]. A 2006 study by Eckert and Klamt confirmed these results, reporting a correlation between experimental pKa values and free energies of dissociation with r² = 0.98, a deviation of 0.56 pKa units, and again a significantly smaller slope than the expected 1/(RT ln 10) [92]. These values seem to indicate that COSMO-RS, contrary to Kelly, Cramer, and Truhlar’s findings, is actually more accurate than SM6. However, Kelly and colleagues might claim that this lower slope indicates underlying systematic error, and we would agree.
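The slopes quoted above are all expressed relative to the ideal value of 1/(RT ln 10). A minimal sketch of that comparison follows, assuming R = 1.987204 × 10⁻³ kcal mol⁻¹ K⁻¹ and an ordinary least-squares fit of pKa against ΔGaq; the helper names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def rt_ln10(temperature: float = 298.15) -> float:
    """RT ln 10 in kcal/mol; the ideal pKa-versus-dG_aq slope is its reciprocal."""
    return R_KCAL * temperature * math.log(10)


def slope_ratio(dg_aq, pka, temperature: float = 298.15) -> float:
    """Least-squares slope of pKa against dG_aq (kcal/mol), in units of 1/(RT ln 10)."""
    n = len(dg_aq)
    if n != len(pka) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(dg_aq) / n
    mean_y = sum(pka) / n
    sxx = sum((x - mean_x) ** 2 for x in dg_aq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dg_aq, pka))
    return (sxy / sxx) * rt_ln10(temperature)


print(round(rt_ln10(), 3))  # 1.364
```

A returned ratio of 1.0 corresponds to the ideal slope; ratios such as 0.71, 0.87, or 0.58 correspond to the regressions discussed above, and a high r² does not by itself make such a ratio closer to 1.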

Although Kelly, Cramer, and Truhlar found that a single explicit water molecule increased the accuracy of acid dissociation free energies for SM6, this trend was not common to all solvation methods or to the addition of multiple water molecules. They reported the gas-phase binding free energies of CO3²⁻(H2O)n, with n = 0 to 3, for SM6, SM5.43R, and DPCM/98 with UAHF atomic radii. As the number of water molecules increased from zero to three, the accuracy increased for SM6. For the two other continuum solvation models, SM5.43R and DPCM, however, the accuracy of the gas-phase binding free energies decreased as the number of explicit water molecules increased. With the addition of one water molecule, SM5.43R did become more accurate, but it got significantly worse as more water molecules were added: the absolute deviation went from 1 kcal/mol with one water molecule to 10 kcal/mol with three. The most accurate DPCM calculation was the one with no explicit water molecules, and the calculations became progressively less accurate as more waters surrounded CO3²⁻. Overall, the most accurate method in the study was SM5.43R with one explicit water molecule, which outperformed SM6 with three water molecules (absolute deviation of 3 kcal/mol) [8]. To supplement their previous publication, Kelly, Cramer, and Truhlar also conducted an extensive study of the absolute aqueous solvation free energies of ions and of ion–water clusters containing a single water molecule. They reported the following mean unsigned errors using values from their recent [12] and previous [25] publications. Table 1 shows that SM6 outperformed all the other continuum models, with SM6/MPW25/6-31G(d) producing the lowest mean unsigned error (MUE), 3.3 kcal/mol, when used with clustered ions.

Table 1 Mean unsigned errors (kcal/mol) in absolute aqueous solvation free energies of ions and ion–water clusters with a single water molecule, for various continuum solvent models [12,25]

| Solvent model | Clustered data set (a) | All ions (b) |
|---|---|---|
| SM6/MPW25/MIDI! | 3.7 | 4.8 |
| SM6/MPW25/6-31G(d) | 3.3 | 4.5 |
| SM6/MPW25/6-31+G(d) | 3.5 | 4.6 |
| SM6/MPW25/6-31+G(d,p) | 3.5 | 4.5 |
| SM6/B3LYP/6-31+G(d,p) | 3.6 | 4.7 |
| SM6/B3PW91/6-31+G(d,p) | 3.5 | 4.6 |
| SM5.43R/MPW25/6-31+G(d,p) | 5.3 | 6.1 |
| DPCM/98/HF/6-31G(d) | 5.8 | 5.7 |
| DPCM/03/HF/6-31G(d) | 13 | 14.3 |
| CPCM/98/HF/6-31G(d) | 6 | 6 |
| CPCM/03/HF/6-31G(d) | 7.3 | 7.1 |
| IEF-PCM/03/HF/6-31G(d) | 7.4 | 7.2 |
| IEF-PCM/03/MPW25/6-31+G(d,p) | 8.6 | 8.4 |

(a) Gas-phase optimized geometries at the B97-1/MG3S level of theory.
(b) Gas-phase optimized geometries at the MPW25/MIDI! level of theory.

The data also show that the clustered ions gave lower MUEs than the unclustered ions for all SM6 calculations, by about 1 kcal/mol, reaffirming the conclusion of their prior publication [8]. Other levels of theory, however, do not produce such conclusive results and do not always give lower MUEs when the cluster pair approximation is used. Overall, Kelly et al. [12] concluded that SM6 with diffuse basis functions and clustered ions produces the most reliable values of the absolute aqueous solvation free energies (A(gas) → A(aq)), which implies that this method should also lead to the most accurate ΔGsol values. Furthermore, because the parameters in SM6 were originally developed using an older accepted ΔGsol(H+) value of −264.3 kcal/mol reported by Zhan and Dixon [22], this was a significant finding: it showed that SM6 calculations are also accurate when combined with the currently accepted ΔGsol(H+) value of −265.9 kcal/mol [10,12]. Jia et al. also studied the cluster continuum method using PCM with the HF/6-31+G(d), HF/6-311++G(d,p), and B3LYP/6-311++G(d,p) levels of theory for Eq. (13). For a data set of five organic acids, they found that the accuracy of the pKa calculations increases as the number of explicit water molecules increases from 0 to 3 [93]. In this study, relative pKa values were computed, so the errors from neglecting electron correlation in the gas-phase calculations apparently cancelled. Focusing on implicit calculations, da Silva et al. compared the success of DPCM and IEF-PCM in pKa calculations at the HF/6-31G(d) level, using these polarizable continuum models with UAHF radii together with 15 different levels of theory for the gas-phase free energy ΔGgas: HF, MP2, QCISD(T), B3LYP, G1, G2, G2MP2, G3, G3B3, G3MP2, CBS-4, CBS-4M, CBS-Q, CBS-QB3, and CBS-APNO. The HF, MP2, and QCISD(T) calculations used the 6-311++G(3df,3pd) basis set.
Overall, they found that DPCM was more successful than IEF-PCM at calculating the pKa value of nitrous acid. The most successful method was DPCM with the gas-phase calculation at B3LYP/6-311++G(3df,3pd), which produced an error of only 0.3 pKa units; da Silva et al., however, concluded that this was probably due to a cancellation of errors. The other accurate values were calculated using the high-level CBS-APNO, CBS-QB3, and G2 methods [73]. Using the DPCM method, da Silva et al. also examined the effect of different basis sets at the HF level of theory, using gas-phase geometries from G2, CBS-Q, CBS-QB3, and CBS-APNO calculations. They reported that as the basis set size increased, excluding aug-cc-pVTZ, the accuracy of the pKa calculation also increased. The most accurate basis set paired with HF was aug-cc-pVQZ, which produced an absolute average error of 0.39 pKa units. da Silva also studied DPCM with DFT methods instead of HF. The average pKa values were calculated using G2, CBS-Q, CBS-QB3, and CBS-APNO for the ΔGgas values. The free energy of solvation was calculated using B3LYP, TPSS, PBE0, B1B95, VSXC, B98, and O3LYP with the 6-31G(d), 6-311++G(3df,3pd), aug-cc-pVDZ, aug-cc-pVTZ, and aug-cc-pVQZ basis sets. The results indicated that DFT methods produce much more accurate results than HF, with all of them within 0.3 pKa units of experimental values. The most accurate methods were VSXC, TPSS, B98, and B1B95, all with absolute average errors of less than 0.15 pKa units. Unlike the HF results, da Silva found no benefit in using larger basis sets with DFT [73]. This observation rings true, as DFT methods are semiempirical and each method with a given basis set is its own distinct model chemistry suited to specific systems [94]. Increasing the basis set size does not systematically improve the result as it does in ab initio quantum chemistry [23,54]. In 2007, Sadlej-Sosnowska compared DPCM, CPCM, and IEF-PCM for free energy of solvation calculations. The three methods were tested at the HF and B3LYP levels of theory with the 6-31+G, 6-311++G, pVDZ, and pVTZ basis sets. DPCM was used with UAHF radii, and IEF-PCM was paired with both UAHF and UA0 radii. Sadlej-Sosnowska found that IEF-PCM with UAHF was more accurate than DPCM with UAHF when applied to neutral molecules, in contrast to the da Silva et al. results [73]. The most accurate level of theory was IEF-PCM with UAHF at the HF/cc-pVTZ level, and for IEF-PCM, UAHF radii were more accurate than UA0 radii [95]. Takano and Houk studied several computational methods for solvation calculations and various cavity models in their 2005 publication [7]. They found that for the calculation of aqueous solvation free energies, CPCM at the HF/6-31G(d)//HF/6-31+G(d) and HF/6-31+G(d)//B3LYP/6-31+G(d) levels, with UAKS cavities whose radii were optimized with PBE0/6-31G(d), is the most accurate, producing mean absolute deviations of 2.6 kcal/mol for a set of neutral and charged species. The accuracy of each cavity model (UAKS, UAHF(G03), UAHF(G98), Bondi, Pauling, UA0, and UFF) varied with the type of molecule. CPCM with a UAKS cavity model was also compared to COSMO, SM5.42R, PCM, IPCM [96], and the cluster continuum method at the MP2/6-31+G(2df,2p)//HF/6-31+G(d,p) level combined with IPCM.
The CPCM data in Takano’s publication were compared with SM5.42R, PCM, IPCM, and cluster continuum data from Pliego and Riveros [97] and COSMO data from various publications [77,98,99]. CPCM was found to have the highest accuracy, with a mean absolute deviation from experimental aqueous solvation free energies of 3.04 kcal/mol [7]. The other methods had deviations of about 10 kcal/mol, except IPCM, with a MAD of 20 kcal/mol [7]. Yu, Liu, and Wang also studied the effect of cavity models, basing their pKa calculations on a cycle containing an explicit water molecule. They examined the effect of the UAHF, UAKS, Pauling, and Bondi cavity models on the accuracy of CPCM pKa calculations at the B3LYP/6-311++G(2df,2p) level of theory on B3LYP/6-31+G(d) optimized geometries. They reported that the computed pKa depends greatly on the choice of solvation cavity. The most accurate methods were CPCM with a UAKS or UAHF cavity, which produced mean absolute deviations of 0.38 and 0.40 pKa units, respectively, from experimental pKa values [100]. This agrees well with the 2005 data reported by Takano and Houk [7]. Namazian and Halvani studied pKa calculations with an explicit water, using the B3LYP/6-31+G(d,p) level of theory for gas-phase free energies and PCM/B3LYP/6-31+G(d,p) with UA0 radii for the solvation calculations. Using a data set of 66 acids, they found the method accurate to within an average of 0.58 pKa units. The thermodynamic cycles used an explicit water, as in Eq. (9). Although the method produced pKa values within 0.6 pKa units, there is some uncertainty in the free energy of solvation of H3O+, and the B3LYP level of theory is not the most accurate for free energy calculations in the gas phase [31].

Gao and colleagues studied several methods of solvation calculations [29]:

S1: CPCM/HF/6-311+G(d,p)//HF/6-311+G(d,p)

S2: CPCM/B3LYP/6-311+G(d,p)//HF/6-311+G(d,p)

S3: CPCM/HF/6-311+G(d,p)//CPCM/HF/6-311+G(d,p)

S4: CPCM/B3LYP/6-311+G(d,p)//CPCM/HF/6-311+G(d,p)

S5: CPCM/HF/6-31G(d) (Radii=UAHF)//HF/6-311+G(d,p)

S6: CPCM/HF/6-31G(d) (Radii=UAHF)//CPCM/HF/6-311+G(d,p)

S7: SM5.4/AM1 calculated from Spartan 04

S8: SM5.4/PM3 calculated from Spartan 04

S9: SM5.4/AM1 taken from Reference [101] (AMSOL)

S10: SM5.4/PM3 taken from Reference [101] (AMSOL)

S11: SM5.43R/mPW1PW91/6-31+G(d)//mPW1PW91/MIDI!

S12: CPCM/HF/6-31+G(d) (Radii=UAKS)//HF/6-311+G(d,p)

S13: CPCM/HF/6-31+G(d) (Radii=UAKS)//CPCM/HF/6-311+G(d,p)

S14: Monte Carlo QM/MM

They found that methods S6 (CPCM/HF/6-31G(d) (Radii=UAHF)//CPCM/HF/6-311+G(d,p)), S8 (SM5.4/PM3 calculated from Spartan 04), S9 (SM5.4/AM1 taken from Reference [101] (AMSOL)), and S14 (Monte Carlo QM/MM) produced free energies of hydration within 1 kcal/mol of experimental values. The free energy of hydration of the anion, A−(gas) → A−(aq), would be a vertical line on the right side of the thermodynamic cycle shown in Figure 1.

Of these, CPCM/HF/6-31G(d) (Radii=UAHF)//CPCM/HF/6-311+G(d,p) and Monte Carlo QM/MM were the most accurate, producing free energies of hydration within 0.4 kcal/mol of experimental values [29]. Recognizing that the continuum solvent calculations are the weakest link in pKa calculations, Ho and Coote [3] used the CPCM (with UAKS and UAHF radii), SM6, IPCM, and COSMO-RS models to predict pKa values for a common data set of neutral organic and inorganic acids. They used four different thermodynamic cycles and in general found that the COSMO-RS, CPCM, and SM6 models worked best, depending on the thermodynamic cycle used. We turn to a discussion of thermodynamic cycles in the next section.

5. THERMODYNAMIC CYCLES

Various thermodynamic cycles are used in pKa calculations. Although this was previously a source of confusion in the field, it is now clear that as long as the most accurate experimental values are used, and no explicit water molecules are added, the choice of cycle should simply be a matter of convenience. The most common cycle is based on Eq. (1) and is diagrammed in Figure 1: a molecule is simply deprotonated, yielding the corresponding base and the proton in solution [1,2,4]. This cycle depends on the accuracy of the continuum model used to determine the anion (reaction 1) or cation (reaction 8) solvation energies, calculations whose accuracy varies with the system in question and the parameterization of the solvation model. The most recent Minnesota solvation model, SMVLE [90], appears quite promising in this regard. In this method, the SVPE term accounts for bulk electrostatics; the atomic surface tensions account for solvent cavitation, changes in dispersion energy, and any changes in local solvent structure; and a new functional term accounts explicitly for local electrostatics at the solute–solvent boundary [90]. Because the SMVLE model specifically corrects for surface regions with high charge concentrations, it gives the most accurate aqueous solvation free energies for ions of any continuum method as of this writing. Other thermodynamic cycles contain explicit water molecules; the effectiveness of this implicit–explicit method for calculating the free energy of solvation is discussed in the preceding section. Since the free energy of solvation is the largest source of error in determining pKa values, the accuracy of that calculation often determines the validity of a given thermodynamic cycle. In 2007, Sadlej-Sosnowska studied three different thermodynamic cycles based on the following reactions [95]:

$$\text{Cycle 1:}\quad \mathrm{AH(aq) \rightarrow A^{-}(aq) + H^{+}(aq)} \qquad (34)$$

$$\text{Cycle 2:}\quad \mathrm{AH(aq) + H_{2}O(aq) \rightarrow A^{-}(aq) + H_{3}O^{+}(aq)} \qquad (35)$$

$$\text{Cycle 3:}\quad \mathrm{AH(aq) + OH^{-}(aq) \rightarrow A^{-}(aq) + H_{2}O(aq)} \qquad (36)$$

For Cycle 1, the free energies of solvation of AH, A−, and H+ are required, along with the gas-phase free energy difference for AH(g) → A−(g) + H+(g). For Cycle 2, the free energies of solvation of AH, H2O, A−, and H3O+ are required, along with the gas-phase free energy difference for AH(g) + H2O(g) → A−(g) + H3O+(g). Finally, for Cycle 3, the free energy of solvation of OH− is needed instead of that of H3O+, along with the gas-phase free energy difference for AH(g) + OH−(g) → A−(g) + H2O(g). Sadlej-Sosnowska found that Cycle 1 was the most accurate because there was a large uncertainty in the calculated free energies of solvation of H3O+ and OH− in Cycles 2 and 3, respectively. However, when experimental values were used in place of calculated values for ΔGsol and ΔGgas of OH− and H3O+, all three cycles produced the same pKa values. Sadlej-Sosnowska stated that the accuracy of Cycle 1 resulted from the use of the most accurate experimental ΔGsol(H+) value, and that if this value is changed, the results will shift accordingly [95]. This illustrates a good point: since water dissociates to H+ and OH−, or two waters dissociate to H3O+ and OH−, using the correct value of ΔGsol(H+) in Cycle 1 (reaction 34) is equivalent to using the correct ΔGsol values for H2O and H3O+ in reaction 35, and for OH− and H2O in reaction 36. Once the value of ΔGsol(H+) is set, the rest of the values are easily found from this reference, and the ΔΔGsol values should be consistent [12].
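The bookkeeping behind Cycles 1–3 is a Hess's-law sum of products minus reactants. The sketch below applies it to the solvation terms only, using the experimental reference values for H+, OH−, H3O+, and H2O quoted in Section 6 (1 M standard state, 298 K) and user-supplied ΔGsol values for AH and A−. It deliberately omits the gas-phase terms and any standard-state corrections, which must be added separately; all names are hypothetical.

```python
# Experimental solvation free energies (kcal/mol, 1 M standard state, 298 K) quoted in Section 6.
REF_DGSOL = {"H+": -265.9, "OH-": -104.7, "H3O+": -110.3, "H2O": -6.32}

# Stoichiometry of Cycles 1-3 (reactions 34-36) as (reactants, products).
CYCLES = {
    1: ({"AH": 1}, {"A-": 1, "H+": 1}),
    2: ({"AH": 1, "H2O": 1}, {"A-": 1, "H3O+": 1}),
    3: ({"AH": 1, "OH-": 1}, {"A-": 1, "H2O": 1}),
}


def hess_sum(values, reactants, products):
    """Sum over products minus sum over reactants, weighted by stoichiometric coefficients."""
    return (sum(n * values[s] for s, n in products.items())
            - sum(n * values[s] for s, n in reactants.items()))


def ddg_sol(cycle: int, dgsol_ah: float, dgsol_a: float) -> float:
    """Solvation contribution (kcal/mol) to the aqueous reaction free energy of one cycle.

    No gas-phase terms and no standard-state corrections are included.
    """
    values = {**REF_DGSOL, "AH": dgsol_ah, "A-": dgsol_a}
    reactants, products = CYCLES[cycle]
    return hess_sum(values, reactants, products)
```

The three ΔΔGsol values differ from one another because each cycle also has a different gas-phase reaction; only the complete sum of gas-phase and solvation terms is expected to agree across cycles when consistent reference values are used.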

Many thermodynamic cycles contain the proton and anions, which often leads to large errors in the computed free energy of solvation of the anion. As a result, cycles with water molecules or additional acids [5,27–31] are often used to try to remove these sources of error. If accurate free energy values are used, pKa calculations can be fairly accurate, but many papers report pKa calculations that use less accurate free energy values for H+. These publications would need to be recalculated with the currently accepted values to produce reliable and accurate data. For example, in 2006, Nino and coworkers studied the performance of the model chemistries G1 [102], G2 [103], and G3 [66] for pKa calculations [30]. They reported that G1 is the most accurate of the three models, producing an average difference of 0.51 pKa units from experimental values when used with the CPCM solvation model in the study of aminopyridines. Their results showed accuracy decreasing in the order G1 > G2 > G3. The publication, however, uses the older accepted ΔGsol(H+) value of −264.61 kcal/mol in the 1 M standard state, obtained from the acetic acid system [1,2], which, as discussed in Section 2.1.1, is 1.29 kcal/mol more positive than the currently accepted −265.9 kcal/mol reported by Tissandier et al. in 1998 [10] and confirmed by Kelly et al. in 2006 [12]. This difference creates a discrepancy between the pKa values reported in the paper and the values actually produced by the model chemistries. Changing ΔGsol(H+) from −264.61 to −265.9 kcal/mol produces new pKa values that are approximately 0.95 pKa units lower than those reported by Caballero. This changes the order of accuracy of the methods: after calculating new pKa values and comparing them to the experimental values reported in the Caballero publication, we find that the method previously reported as least accurate, G3, is now the most effective in producing reliable pKa values.
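Because ΔGsol(H+) enters Cycle 1 additively, changing it shifts every computed pKa by the same amount, Δ[ΔGsol(H+)]/(RT ln 10). A minimal sketch, assuming R = 1.987204 × 10⁻³ kcal mol⁻¹ K⁻¹ and T = 298.15 K (the function name is hypothetical), reproduces the correction of about 0.95 pKa units discussed above:

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def pka_shift(old_dgsol_h: float, new_dgsol_h: float, temperature: float = 298.15) -> float:
    """Uniform change (pKa units) in every Cycle 1 pKa when dGsol(H+) (kcal/mol) is replaced."""
    return (new_dgsol_h - old_dgsol_h) / (R_KCAL * temperature * math.log(10))


# Older acetic-acid-based value versus the currently accepted value (both 1 M standard state).
print(round(pka_shift(-264.61, -265.9), 2))  # -0.95
```

Since the shift is the same for every molecule, it leaves relative pKa values unchanged but moves all absolute values against experiment, which is why it can reorder methods ranked by absolute error.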
This correction underscores the importance of using correct experimental values and shows how a difference of merely 1.29 kcal/mol can change the conclusions about the most efficient method of pKa calculation. G3 has been shown to be superior to G2 (and G2 to G1) for many different gas-phase processes, including deprotonation, so the new ordering makes more sense. Quite simply, Nino and colleagues were led astray by the value of ΔGsol(H+) that was accepted when they started their work. The significance of the free energy of solvation of the proton is also apparent in the publication by Bryantsev et al. [76]. In this work, the ΔGsol(H+) value was treated as a parameter and fitted to obtain the most accurate pKa values. The Goddard group used ΔGsol(H+) = −267.9 and −267.6 kcal/mol for solution-phase and gas-phase optimized calculations, respectively, in the 1 M standard state. These values differ from the accepted value of −265.9 kcal/mol but still lie within the 2 kcal/mol error bars assigned by Kelly et al. [12]. Nevertheless, because of this discrepancy, the reported errors (less than 0.5 pKa units for solution-phase optimized geometries and greater than 0.5 pKa units for calculations on gas-phase optimized geometries) might be overstated. These results will change when the accepted ΔGsol(H+) value is used. Recent work using explicit waters in cluster continuum or implicit–explicit thermodynamic cycles shows much promise, as long as the standard states for water are treated consistently [3,9]. The key point is that water as a solvent, water as a solute, and all species involved in the thermodynamic cycle must be in a 1 M standard state. At this point it is not clear how many explicit waters should be used in a cycle [3], although using the variational method to determine the number of waters, and grouping the waters together as clusters instead of monomers, appears very promising [9].

6. CONCLUDING REMARKS

Many variables affect the accuracy of pKa calculations. For the gas-phase free energy, extra computational expense might be necessary to achieve values within 1.0 kcal/mol for reaction 33. However, getting this term right is straightforward: CCSD(T) single-point energy calculations on MP2 or MP4 geometries are accurate to within 0.5 kcal/mol or better. DFT methods should be benchmarked against appropriate experimental or ab initio results to ensure that the chosen functional is suitable for the systems of interest. Compound model chemistry methods such as G3, CBS-APNO, and W1 are also highly accurate. For the free energy of solvation, however, it is difficult to discern the most accurate method. Recently, there have been numerous publications exploring the use of the cluster-continuum method with anions. With regard to implicit solvation, there is no definite conclusion about the most accurate method, although among the PB models the conductor-like models (COSMO, CPCM) appear to be the most robust over the widest range of circumstances [23]. At this writing, the SMVLE method seems to be the most versatile, as it can be used by itself or within the implicit-explicit model, and its error bars for bare and clustered ions are the smallest of any continuum solvation method. With SMVLE, adding explicit water molecules to anions and then applying the implicit method (making it an implicit-explicit model) improves the results more often than it does for the other implicit methods used in the literature to date.

Concerning thermodynamic cycles, the most important component is the treatment of the free energy of the hydrogen ion. Even a slight difference in this value can produce drastically different trends in pKa, so the most accurate experimental value should be used in the equation.
As of this writing, the best values for the experimental free energies of solvation, for a standard state of 1 M and 298 K, are −265.9 kcal/mol for H+, −104.7 kcal/mol for OH−, −110.3 kcal/mol for H3O+, and −6.32 kcal/mol for H2O [9–12]. These values are all consistent with each other, as can be seen by using them in thermodynamic cycles to calculate the dissociation of water into its component ions, with ΔGgas taken from the NIST website. For the classic thermodynamic cycle displayed in Figure 1, using the accepted value for ΔGsol(H+) and accounting for the conversion of gas-phase calculations to the 1 M standard state (Eq. (25)), pKa values for the reaction in Eq. (1) at 298.15 K can be determined from Eq. (37), with the four calculated energies in kcal/mol:

$$\mathrm{p}K_\mathrm{a} = \frac{G_\mathrm{gas}(\mathrm{A}^-) - G_\mathrm{gas}(\mathrm{HA}) + \Delta G_\mathrm{sol}(\mathrm{A}^-) - \Delta G_\mathrm{sol}(\mathrm{HA}) - 270.28567}{1.36449} \qquad (37)$$
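A minimal Python sketch of Eq. (37) follows. It also rebuilds the constant 270.28567 kcal/mol from three pieces; this decomposition is our assumption and is not spelled out in this section: the gas-phase free energy of the proton, Ggas(H+) = −6.28 kcal/mol at 1 atm and 298.15 K (the Sackur-Tetrode value), the accepted ΔGsol(H+) = −265.9 kcal/mol, and the 1 atm → 1 M correction RT ln(24.46) ≈ 1.89 kcal/mol for the one extra mole of product. The function and argument names are hypothetical.

```python
import math

# Sketch of Eq. (37) for HA(aq) -> A-(aq) + H+(aq) at 298.15 K; all energies in kcal/mol.
# ASSUMPTION: the constant 270.28567 is rebuilt here from standard literature values
# (G_gas(H+) = -6.28, Delta G_sol(H+) = -265.9, and RT ln 24.46); this decomposition
# is not spelled out in this section.
R = 1.987204e-3   # gas constant, kcal/(mol K)
T = 298.15        # temperature, K
RT = R * T

G_GAS_H = -6.28                        # gas-phase free energy of H+ at 1 atm (assumed)
DG_SOL_H = -265.9                      # accepted solvation free energy of H+, 1 M
DG_1ATM_TO_1M = RT * math.log(24.46)   # ~1.89 kcal/mol for one extra mole of gas

PROTON_TERM = G_GAS_H + DG_SOL_H + DG_1ATM_TO_1M  # ~ -270.29 kcal/mol
RT_LN10_EQ37 = 1.36449                            # divisor in Eq. (37)


def pka_eq37(g_gas_a, g_gas_ha, dg_sol_a, dg_sol_ha, proton_term=-270.28567):
    """Return pKa from the four calculated energies (kcal/mol), following Eq. (37)."""
    dg_aq = (g_gas_a - g_gas_ha) + (dg_sol_a - dg_sol_ha) + proton_term
    return dg_aq / RT_LN10_EQ37


print(f"rebuilt proton term: {PROTON_TERM:.3f} kcal/mol")
```

For example, with hypothetical inputs in which the gas-phase deprotonation free energy is 340.0 kcal/mol and ΔGsol(A−) − ΔGsol(HA) = −63.0 kcal/mol, `pka_eq37(340.0, 0.0, -70.0, -7.0)` gives a pKa of about 4.9; only differences between the energies matter. Electronic-structure programs usually report free energies in hartrees, so multiply by 627.5095 kcal/mol per hartree before calling the function.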


Kristin S. Alongi and George C. Shields

Extra caution should be taken when performing pKa calculations on ionic species, as their strong electrostatic effects and large free energies of solvation make accurate calculations difficult. Cycles involving explicit water molecules have merit when dealing with these compounds. Interested readers should consult the recent literature to ensure that they correct for the standard state of water, which should be 1 M and not 55.34 M in all cycles [3,9]. To further complicate matters, the functional groups present and the acid or base strength of the molecules may also affect the accuracy of a given method. If the implicit solvent method used to calculate ΔGsol(A−) and ΔGsol(HA) is believed to yield good results for the species in question, then thermodynamic cycle 1 of Figure 1 together with Eq. (37) is the most straightforward way to calculate pKa values. Investigators are encouraged to use the highest level of theory they can afford to calculate Ggas(A−) and Ggas(HA). Given the many possible cycles involving explicit solvent molecules, levels of theory, basis sets, and types of molecules, it is impossible to identify one specific method that produces the most accurate pKa values. Rather, this review serves to summarize the current literature and illustrate various schemes that have been successful. Careful attention to detail, and the use of benchmark calculations or experimental values to help determine the correct method for a particular system, is highly recommended. Further research on thermodynamic cycles with explicit water molecules, clustered water structures, and conformational effects, together with improvements in continuum solvation calculations, will continue to move this field forward.
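The standard-state bookkeeping stressed above reduces to simple RT ln terms. The Python sketch below computes the textbook 1 atm → 1 M correction for an ideal gas (molar volume RT/p ≈ 24.46 L/mol at 298.15 K) and the RT ln[H2O] term that appears when liquid water is assigned a 55.34 M rather than a 1 M standard state. These are generic relations, not the specific cycles of refs [3,9], and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)
R_L_ATM = 0.082057     # gas constant, L atm/(mol K)
T = 298.15             # temperature, K


def gas_to_1m(n_moles: int = 1, temp: float = T) -> float:
    """Free-energy correction (kcal/mol) for taking n moles of ideal gas from 1 atm to 1 M."""
    molar_volume = R_L_ATM * temp  # L/mol at 1 atm; about 24.46 at 298.15 K
    return n_moles * R_KCAL * temp * math.log(molar_volume)


def water_55m_term(n_waters: int = 1, conc: float = 55.34, temp: float = T) -> float:
    """RT ln([H2O]) per water: the gap between 55.34 M and 1 M standard states (kcal/mol)."""
    return n_waters * R_KCAL * temp * math.log(conc)


print(f"1 atm -> 1 M: {gas_to_1m():.2f} kcal/mol")
print(f"RT ln(55.34): {water_55m_term():.2f} kcal/mol per water")
```

For each free water molecule that appears in the balanced reaction, mixing the two conventions shifts ΔG by about 2.4 kcal/mol, or roughly 1.7 pKa units, which is large compared with the accuracy sought in these calculations.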

REFERENCES

1. Liptak, M.D., Shields, G.C. Accurate pKa calculations for carboxylic acids using complete basis set and Gaussian-n models combined with CPCM continuum solvation methods. J. Am. Chem. Soc. 2001, 123(30), 7314–9.
2. Liptak, M.D., Gross, K.C., Seybold, P.G., Feldgus, S., Shields, G.C. Absolute pKa determinations for substituted phenols. J. Am. Chem. Soc. 2002, 124(22), 6421–7.
3. Ho, J.M., Coote, M.L. A universal approach for continuum solvent pK(a) calculations: Are we there yet? Theor. Chem. Acc. 2010, 125(1–2), 3–21.
4. Liptak, M.D., Shields, G.C. Experimentation with different thermodynamic cycles used for pKa calculations on carboxylic acids using complete basis set and Gaussian-n models combined with CPCM continuum solvation methods. Int. J. Quantum Chem. 2001, 85, 727–41.
5. Brown, T.N., Mora-Diez, N. Computational determination of aqueous pK(a) values of protonated benzimidazoles (part 1). J. Phys. Chem. B 2006, 110(18), 9270–79.
6. Pliego, J.R. Thermodynamic cycles and the calculation of pK(a). Chem. Phys. Lett. 2003, 367(1–2), 145–9.
7. Takano, Y., Houk, K.N. Benchmarking the conductor-like polarizable continuum model (CPCM) for aqueous solvation free energies of neutral and ionic organic molecules. J. Chem. Theory Comput. 2005, 1, 70–7.
8. Kelly, C.P., Cramer, C.J., Truhlar, D.G. Adding explicit solvent molecules to continuum solvent calculations for the calculation of aqueous acid dissociation constants. J. Phys. Chem. A 2006, 110(7), 2493–9.
9. Bryantsev, V.S., Diallo, M.S., Goddard, W.A. Calculation of solvation free energies of charged solutes using mixed cluster/continuum models. J. Phys. Chem. B 2008, 112(32), 9709–19.


10. Tissandier, M.D., Cowen, K.A., Feng, W.Y., Gundlach, E., Cohen, M.H., Earhart, A.D., Coe, J.V., Tuttle, T.R. The proton's absolute aqueous enthalpy and Gibbs free energy of solvation from cluster-ion solvation data. J. Phys. Chem. A 1998, 102(40), 7787–94.
11. Camaioni, D.M., Schwerdtfeger, C.A. Comment on "Accurate experimental values for the free energies of hydration of H+, OH−, and H3O+". J. Phys. Chem. A 2005, 109(47), 10795–7.
12. Kelly, C.P., Cramer, C.J., Truhlar, D.G. Aqueous solvation free energies of ions and ion-water clusters based on an accurate value for the absolute aqueous solvation free energy of the proton. J. Phys. Chem. B 2006, 110(32), 16066–81.
13. Pliego, J.R., Riveros, J.M. Gibbs energy of solvation of organic ions in aqueous and dimethyl sulfoxide solutions. Phys. Chem. Chem. Phys. 2002, 4(9), 1622–7.
14. Pliego, J.R., Riveros, J.M. Theoretical calculation of pK(a) using the cluster-continuum model. J. Phys. Chem. A 2002, 106(32), 7434–9.
15. McQuarrie, D.M. Statistical Mechanics, Harper and Row, New York, 1970, p. 86.
16. Jorgensen, W.L., Briggs, J.M., Gao, J. A priori calculations of pKas for organic compounds in water–the pKa of ethane. J. Am. Chem. Soc. 1987, 109(22), 6857–8.
17. Jorgensen, W.L., Briggs, J.M. A priori pKa calculations and the hydrations of organic-anions. J. Am. Chem. Soc. 1989, 111(12), 4190–7.
18. Lim, C., Bashford, D., Karplus, M. Absolute pKa calculations with continuum dielectric methods. J. Phys. Chem. 1991, 95(14), 5610–20.
19. Schuurmann, G., Cossi, M., Barone, V., Tomasi, J. Prediction of the pK(a) of carboxylic acids using the ab initio continuum-solvation model PCM-UAHF. J. Phys. Chem. A 1998, 102(33), 6706–12.
20. Toth, A.M., Liptak, M.D., Phillips, D.L., Shields, G.C. Accurate relative pK(a) calculations for carboxylic acids using complete basis set and Gaussian-n models combined with continuum solvation methods. J. Chem. Phys. 2001, 114(10), 4595–606.
21. Tawa, G.J., Topol, I.A., Burt, S.K., Caldwell, R.A., Rashin, A.A. Calculation of the aqueous free energy of the proton. J. Chem. Phys. 1998, 109(12), 4852–63.
22. Zhan, C.G., Dixon, D.A. Absolute hydration free energy of the proton from first-principles electronic structure calculations. J. Phys. Chem. A 2001, 105(51), 11534–40.
23. Cramer, C.J. Essentials of Computational Chemistry: Theories and Models, 2nd edn., John Wiley & Sons Ltd, Chichester, England, 2004, p. 579.
24. Palascak, M.W., Shields, G.C. Accurate experimental values for the free energies of hydration of H+, OH−, and H3O+. J. Phys. Chem. A 2004, 108(16), 3692–4.
25. Kelly, C.P., Cramer, C.J., Truhlar, D.G. SM6: A density functional theory continuum solvation model for calculating aqueous solvation free energies of neutrals, ions, and solute-water clusters. J. Chem. Theory Comput. 2005, 1(6), 1133–52.
26. Marenich, A.V., Cramer, C.J., Truhlar, D.G. Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions. J. Phys. Chem. B 2009, 113(18), 6378–96.
27. Fu, Y., Liu, L., Li, R.C., Liu, R., Guo, Q.X. First-principle predictions of absolute pK(a)'s of organic acids in dimethyl sulfoxide solution. J. Am. Chem. Soc. 2004, 126(3), 814–22.
28. De Abreu, H.A., De Almeida, W.B., Duarte, H.A. pK(a) calculation of poliprotic acid: Histamine. Chem. Phys. Lett. 2004, 383(1–2), 47–52.
29. Gao, D.Q., Svoronos, P., Wong, P.K., Maddalena, D., Hwang, J., Walker, H. pK(a) of acetate in water: A computational study. J. Phys. Chem. A 2005, 109(47), 10776–85.
30. Caballero, N.A., Melendez, F.J., Munoz-Cara, C., Nino, A. Theoretical prediction of relative and absolute pK(a) values of aminopyridines. Biophys. Chem. 2006, 124(2), 155–60.
31. Namazian, M., Halvani, S. Calculations of pK(a) values of carboxylic acids in aqueous solution using density functional theory. J. Chem. Thermodyn. 2006, 38(12), 1495–502.
32. Ho, J.M., Coote, M.L. pK(a) calculation of some biologically important carbon acids–an assessment of contemporary theoretical procedures. J. Chem. Theory Comput. 2009, 5(2), 295–306.
33. Bickmore, B.R., Tadanier, C.J., Rosso, K.M., Monn, W.D., Eggett, D.L. Bond-valence methods for pK(a) prediction: Critical reanalysis and a new approach. Geochim. Cosmochim. Acta 2004, 68(9), 2025–42.


34. da Silva, R.R., Ramalho, T.C., Santos, J.M., Figueroa-Villar, J.D. On the limits of highest-occupied molecular orbital driven reactions: The frontier effective-for-reaction molecular orbital concept. J. Phys. Chem. A 2006, 110(3), 1031–40.
35. Habibi-Yangjeh, A., Danandeh-Jenagharad, M., Nooshyar, M. Application of artificial neural networks for predicting the aqueous acidity of various phenols using QSAR. J. Mol. Model. 2006, 12(3), 338–47.
36. Seybold, P.G., May, M., Bagal, U.A. Molecular-structure property relationships. J. Chem. Educ. 1987, 64(7), 575–81.
37. Needham, D.E., Wei, I.C., Seybold, P.G. Molecular modeling of the physical properties of the alkanes. J. Am. Chem. Soc. 1988, 110(13), 4186–94.
38. Seybold, P.G. Explorations of molecular structure-property relationships. SAR QSAR Environ. Res. 1999, 10(2–3), 101–15.
39. Gross, K.C., Seybold, P.G. Substituent effects on the physical properties and pK(a) of aniline. Int. J. Quantum Chem. 2000, 80(4–5), 1107–15.
40. Gross, K.C., Seybold, P.G. Substituent effects on the physical properties and pK(a) of phenol. Int. J. Quantum Chem. 2001, 85(4–5), 569–79.
41. Gross, K.C., Seybold, P.G., Peralta-Inga, Z., Murray, J.S., Politzer, P. Comparison of quantum chemical parameters and Hammett constants in correlating pK(a) values of substituted anilines. J. Org. Chem. 2001, 66(21), 6919–25.
42. Gross, K.C., Seybold, P.G., Hadad, C.M. Comparison of different atomic charge schemes for predicting pK(a) variations in substituted anilines and phenols. Int. J. Quantum Chem. 2002, 90(1), 445–58.
43. Hollingsworth, C.A., Seybold, P.G., Hadad, C.M. Substituent effects on the electronic structure and pK(a) of benzoic acid. Int. J. Quantum Chem. 2002, 90(4–5), 1396–403.
44. Ma, Y.G., Gross, K.C., Hollingsworth, C.A., Seybold, P.G., Murray, J.S. Relationships between aqueous acidities and computed surface-electrostatic potentials and local ionization energies of substituted phenols and benzoic acids. J. Mol. Model. 2004, 10(4), 235–9.
45. Peterangelo, S.C., Seybold, P.G. Synergistic interactions among QSAR descriptors. Int. J. Quantum Chem. 2004, 96(1), 1–9.
46. Seybold, P.G. Analysis of the pK(a)s of aliphatic amines using quantum chemical descriptors. Int. J. Quantum Chem. 2008, 108(15), 2849–55.
47. Kreye, W.C., Seybold, P.G. Correlations between quantum chemical indices and the pK(a)s of a diverse set of organic phenols. Int. J. Quantum Chem. 2009, 109(15), 3679–84.
48. Montgomery, J.A., Frisch, M.J., Ochterski, J.W., Petersson, G.A. A complete basis set model chemistry. VI. Use of density functional geometries and frequencies. J. Chem. Phys. 1999, 110(6), 2822–7.
49. Ochterski, J.W., Petersson, G.A., Montgomery, J.A. A complete basis set model chemistry. 5. Extensions to six or more heavy atoms. J. Chem. Phys. 1996, 104(7), 2598–619.
50. Montgomery, J.A., Ochterski, J.W., Petersson, G.A. A complete basis-set model chemistry. 4. An improved atomic pair natural orbital method. J. Chem. Phys. 1994, 101(7), 5900–09.
51. Bartmess, J.E. Negative Ion Energetics Data. http://webbook.nist.gov (accessed January 20, 2006).
52. Pokon, E.K., Liptak, M.D., Feldgus, S., Shields, G.C. Comparison of CBS-QB3, CBS-APNO, and G3 predictions of gas phase deprotonation data. J. Phys. Chem. A 2001, 105, 10483–7.
53. Pickard, F.C., Griffith, D.R., Ferrara, S.J., Liptak, M.D., Kirschner, K.N., Shields, G.C. CCSD(T), W1, and other model chemistry predictions for gas-phase deprotonation reactions. Int. J. Quantum Chem. 2006, 106(15), 3122–8.
54. Liptak, M.D., Shields, G.C. Comparison of density functional theory predictions of gas-phase deprotonation data. Int. J. Quantum Chem. 2005, 105(6), 580–7.
55. Adam, K.R. New density functional and atoms in molecules method of computing relative pK(a) values in solution. J. Phys. Chem. A 2002, 106(49), 11963–72.
56. Kaminski, G.A. Accurate prediction of absolute acidity constants in water with a polarizable force field: Substituted phenols, methanol, and imidazole. J. Phys. Chem. B 2005, 109(12), 5884–90.
57. Purvis, G.D., Bartlett, R.J. A full coupled-cluster singles and doubles model–the inclusion of disconnected triples. J. Chem. Phys. 1982, 76(4), 1910–8.


58. Watts, J.D., Gauss, J., Bartlett, R.J. Coupled-cluster methods with noniterative triple excitations for restricted open-shell Hartree-Fock and other general single determinant reference functions–energies and analytical gradients. J. Chem. Phys. 1993, 98(11), 8718–33.
59. Lee, Y.S., Kucharski, S.A., Bartlett, R.J. A coupled cluster approach with triple excitations. J. Chem. Phys. 1984, 81(12), 5906–12.
60. Watts, J.D., Bartlett, R.J. The inclusion of connected triple excitations in the equation-of-motion coupled-cluster method. J. Chem. Phys. 1994, 101(4), 3073–78.
61. Dunning, T.H. Gaussian-basis sets for use in correlated molecular calculations. 1. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989, 90(2), 1007–23.
62. Krishnan, R., Pople, J.A. Approximate 4th-order perturbation theory of electron correlation energy. Int. J. Quantum Chem. 1978, 14(1), 91–100.
63. Helgaker, T., Klopper, W., Koch, H., Noga, J. Basis-set convergence of correlated calculations on water. J. Chem. Phys. 1997, 106(23), 9639–46.
64. Halkier, A., Helgaker, T., Jorgensen, P., Klopper, W., Koch, H., Olsen, J., Wilson, A.K. Basis-set convergence in correlated calculations on Ne, N2, and H2O. Chem. Phys. Lett. 1998, 286(3–4), 243–52.
65. Bak, K.L., Jorgensen, P., Olsen, J., Helgaker, T., Klopper, W. Accuracy of atomization energies and reaction enthalpies in standard and extrapolated electronic wave function/basis set calculations. J. Chem. Phys. 2000, 112(21), 9229–42.
66. Curtiss, L.A., Raghavachari, K., Redfern, P.C., Rassolov, V., Pople, J.A. Gaussian-3 (G3) theory for molecules containing first and second-row atoms. J. Chem. Phys. 1998, 109(18), 7764–76.
67. Martin, J.M.L., de Oliveira, G. Towards standard methods for benchmark quality ab initio thermochemistry–W1 and W2 theory. J. Chem. Phys. 1999, 111(5), 1843–56.
68. Pickard, F.C., Pokon, E.K., Liptak, M.D., Shields, G.C. Comparison of CBS-QB3, CBS-APNO, G2, and G3 thermochemical predictions with experiment for formation of ionic clusters of hydronium and hydroxide ions complexed with water. J. Chem. Phys. 2005, 122(2), 7.
69. Pickard, F.C., Dunn, M.E., Shields, G.C. Comparison of model chemistry and density functional theory thermochemical predictions with experiment for formation of ionic clusters of the ammonium cation complexed with water and ammonia; atmospheric implications. J. Phys. Chem. A 2005, 109(22), 4905–10.
70. Cunningham, A.J., Payzant, J.D., Kebarle, P. A kinetic study of the proton hydrate H+(H2O)n equilibria in the gas phase. J. Am. Chem. Soc. 1972, 94(22), 7627–32.
71. Meot-Ner (Mautner), M., Sieck, L.W. Relative acidities of water and methanol and the stabilities of the dimer anions. J. Phys. Chem. 1986, 90, 6687–90.
72. Kebarle, P. Gas phase ion thermochemistry based on ion-equilibria. From the ionosphere to the reactive centers of enzymes. Int. J. Mass Spectrom. 2000, 200, 313–30.
73. da Silva, G., Kennedy, E.M., Dlugogorski, B.Z. Ab initio procedure for aqueous-phase pKa calculation: The acidity of nitrous acid. J. Phys. Chem. A 2006, 110(39), 11371–6.
74. Ervin, K.M., Ho, J., Lineberger, W.C. Ultraviolet photoelectron spectrum of NO2−. J. Phys. Chem. 1988, 92(19), 5405–12.
75. Range, K., Riccardi, D., Cui, Q., Elstner, M., York, D.M. Benchmark calculations of proton affinities and gas-phase basicities of molecules important in the study of biological phosphoryl transfer. Phys. Chem. Chem. Phys. 2005, 7(16), 3070–9.
76. Bryantsev, V.S., Diallo, M.S., Goddard, W.A. pK(a) calculations of aliphatic amines, diamines, and aminoamides via density functional theory with a Poisson-Boltzmann continuum solvent model. J. Phys. Chem. A 2007, 111(20), 4422–30.
77. Klamt, A., Schuurmann, G. COSMO–A new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc. Perkin Trans. 1993, 2(5), 799–805.
78. Klamt, A., Jonas, V., Burger, T., Lohrenz, J.C.W. Refinement and parameterization of COSMO-RS. J. Phys. Chem. A 1998, 102(26), 5074–85.
79. Miertus, S., Scrocco, E., Tomasi, J. Electrostatic interaction of a solute with a continuum–A direct utilization of ab initio molecular potentials for the prevision of solvent effects. Chem. Phys. 1981, 55(1), 117–29.


80. Barone, V., Cossi, M. Quantum calculation of molecular energies and energy gradients in solution by a conductor solvent model. J. Phys. Chem. A 1998, 102(11), 1995–2001.
81. Barone, V., Cossi, M., Tomasi, J. Geometry optimization of molecular structures in solution by the polarizable continuum model. J. Comput. Chem. 1998, 19(4), 404–17.
82. Cossi, M., Scalmani, G., Rega, N., Barone, V. New developments in the polarizable continuum model for quantum mechanical and classical calculations on molecules in solution. J. Chem. Phys. 2002, 117(1), 43–54.
83. Cramer, C.J., Truhlar, D.G. Implicit solvation models: Equilibria, structure, spectra, and dynamics. Chem. Rev. 1999, 99, 2161–200.
84. Marenich, A.V., Olson, R.M., Kelly, C.P., Cramer, C.J., Truhlar, D.G. Self-consistent reaction field model for aqueous and nonaqueous solutions based on accurate polarized partial charges. J. Chem. Theory Comput. 2007, 3(6), 2011–33.
85. Marenich, A.V., Cramer, C.J., Truhlar, D.G. Universal solvation model based on the generalized born approximation with asymmetric descreening. J. Chem. Theory Comput. 2009, 5(9), 2447–64.
86. Marenich, A.V., Cramer, C.J., Truhlar, D.G. Performance of SM6, SM8, and SMD on the SAMPL1 test set for the prediction of small-molecule solvation free energies. J. Phys. Chem. B 2009, 113(14), 4538–43.
87. Zhan, C.G., Bentley, J., Chipman, D.M. Volume polarization in reaction field theory. J. Chem. Phys. 1998, 108(1), 177–92.
88. Zhan, C.G., Chipman, D.M. Cavity size in reaction field theory. J. Chem. Phys. 1998, 109(24), 10543–58.
89. Zhan, C.G., Chipman, D.M. Reaction field effects on nitrogen shielding. J. Chem. Phys. 1999, 110(3), 1611–22.
90. Liu, J., Kelley, C.P., Goren, A.C., Marenich, A.V., Cramer, C.J., Truhlar, D.G., Zhan, C.G. Free energies of solvation with surface, volume, and local electrostatic effects and atomic surface tensions to represent the first solvation shell. J. Chem. Theory Comput. 2010, 6(4), 1109–1117.
91. Klamt, A., Eckert, F., Diedenhofen, M., Beck, M.E. First principles calculations of aqueous pK(a) values for organic and inorganic acids using COSMO-RS reveal an inconsistency in the slope of the pK(a) scale. J. Phys. Chem. A 2003, 107(44), 9380–6.
92. Eckert, F., Klamt, A. Accurate prediction of basicity in aqueous solution with COSMO-RS. J. Comput. Chem. 2006, 27(1), 11–9.
93. Jia, Z.K., Du, D.M., Zhou, Z.Y., Zhang, A.G., Hou, R.Y. Accurate pK(a) determinations for some organic acids using an extended cluster method. Chem. Phys. Lett. 2007, 439(4–6), 374–80.
94. Shields, G.C., Kirschner, K.N. The limitations of certain density functionals in modeling neutral water clusters. Synthesis Reactivity Inorg. Metal-Organic Nano-Metal Chem. 2008, 38(1), 32–6.
95. Sadlej-Sosnowska, N. Calculation of acidic dissociation constants in water: Solvation free energy terms. Their accuracy and impact. Theor. Chem. Acc. 2007, 118(2), 281–93.
96. Foresman, J.B., Keith, T.A., Wiberg, K.B., Snoonian, J., Frisch, M.J. Solvent effects. 5. Influence of cavity shape, truncation of electrostatics, and electron correlation on ab initio reaction field calculations. J. Phys. Chem. 1996, 100(40), 16098–104.
97. Pliego, J.R., Riveros, J.M. The cluster-continuum model for the calculation of the solvation free energy of ionic species. J. Phys. Chem. A 2001, 105(30), 7241–7.
98. Andzelm, J., Kolmel, C., Klamt, A. Incorporation of solvent effects into density-functional calculations of molecular energies and geometries. J. Chem. Phys. 1995, 103(21), 9312–20.
99. Cossi, M., Rega, N., Scalmani, G., Barone, V. Energies, structures, and electronic properties of molecules in solution with the C-PCM solvation model. J. Comput. Chem. 2003, 24(6), 669–81.
100. Yu, A., Liu, Y.H., Wang, Y.J. Ab initio calculations on pK(a) values of benzo-quinuclidine series in aqueous solvent. Chem. Phys. Lett. 2007, 436(1–3), 276–9.
101. Chambers, C.C., Hawkins, G.D., Cramer, C.J., Truhlar, D.G. Model for aqueous solvation based on class IV atomic charges and first solvation shell effects. J. Phys. Chem. 1996, 100(40), 16385–98.
102. Pople, J.A., Head-Gordon, M., Fox, D.J., Raghavachari, K., Curtiss, L.A. Gaussian-1 theory: A general procedure for prediction of molecular energies. J. Chem. Phys. 1989, 90(10), 5622–29.
103. Curtiss, L.A., Raghavachari, K., Trucks, G.W., Pople, J.A. Gaussian-2 theory for molecular energies of 1st-row and 2nd-row compounds. J. Chem. Phys. 1991, 94(11), 7221–30.

CHAPTER 9

Antibiotics Targeting the Ribosome: Structure-Based Design and the Nobel Prize

Edward C. Sherer

Contents

1. Introduction
2. Ribosome Antibiotic Complexes
3. RNA as a Drug Target
4. Structure-Based Antibiotic Design: Case Studies
   4.1 Designer oxazolidinones
   4.2 Designer macrolides
   4.3 Aminoglycoside derivatives I
   4.4 Pleuromutilin derivatives
   4.5 Chloramphenicol derivatives
   4.6 Thiostrepton derivatives
   4.7 RNA-directed fragment libraries
   4.8 A-site scaffolds
   4.9 Aminoglycoside derivatives II
5. Concluding Remarks
Acknowledgments
References

Abstract

Ribosome crystallography was recently recognized with the Nobel Prize in Chemistry, and elucidation of ribosome structure has had a direct impact on drug design. A general overview of RNA as a drug target is presented, followed by several case studies covering the impact of molecular modeling and crystallography on the discovery of antibiotics that target the ribosome.

Keywords: ribosome; antibiotics; RNA; drug design; molecular modeling; crystallography; docking; QSAR; macrolides; oxazolidinones; aminoglycosides; molecular properties

Merck and Co., Inc., Rahway, NJ, USA

Annual Reports in Computational Chemistry, Volume 6
ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06009-3

© 2010 Elsevier B.V. All rights reserved.


1. INTRODUCTION

In a previous volume of ARCC, Fanwick provided insight into the successful incorporation of X-ray crystallography into the classroom [1]. His report laid out the background and basic equations of this powerful tool in modern chemistry, biology, and drug design. The present review extends that introduction to crystallography. Its goal is to capture the imagination of students by illustrating the power of crystallography and the science that it drives. To this end, several case studies showing how computational chemistry has impacted structure-based drug design are described. Examples are limited to antibiotic design targeting the ribosome. Not only has ribosome crystallography helped to advance the fight against increasingly resistant bacterial infections, but the work performed in this field by Ramakrishnan, Steitz, and Yonath has recently been awarded the Nobel Prize in Chemistry.

2. RIBOSOME ANTIBIOTIC COMPLEXES

Atomistic representations of macromolecules, especially anything approaching the size of the ribosome, are still relatively recent luxuries. Researchers entering the ribosome field were introduced to it through books such as The Ribosome: Structure, Function, & Evolution, whose cover art depicts rough 3-D sculptures from an art museum, not far removed from the rough but informative 3-D representations of the ribosome described in the book [2]. This was the state of affairs as of 1990, but a decade later the community saw a rapid increase in the resolution of ribosome structures. For comparison, crystal structures of much smaller systems, such as the CAP-DNA protein-nucleic acid complex, were only being published in the early 1990s, after problems with DNA crystallization had been worked out [3,4]. Although a single structure is cited here, the cocrystallization of proteins and nucleic acids is an important component of solving ribosome structures.

In what would be the first of a flood of high-resolution ribosome crystal structures, Steitz and coworkers published the atomic structure of the 50S ribosomal subunit at 2.4 Å [5]. While refinement of the 50S subunit was underway, Ramakrishnan and coworkers were successfully refining the structure of the 30S [6,7]. In related work on the structures of the 30S and 50S ribosomal subunits, the Yonath laboratory was helping to refine the atomistic view of protein synthesis and antibiotic binding [8–10]. Together, the 50S and 30S make up the fully functioning ribosome, the 70S. The structure of the entire complex has been solved in the laboratories of Noller, Cate, and Ramakrishnan, and although resolution has improved considerably over the last several years, it remains low compared with that of the individual subunits [11–15].
In 2009, the Nobel Prize in Chemistry was awarded to Ramakrishnan, Steitz, and Yonath for their combined work in elucidating the structure and function of the ribosome. This work has led

Antibiotics Targeting the Ribosome: Structure-Based Design and the Nobel Prize

141

to profound insight into the biology, chemistry, and structure of the ribosome [16—18]. The ribosome has additionally provided a rich environment for struc ture-based drug design. Example crystal structures of the 50S, 30S, and 70S are shown in Figure 1. Translation of the genetic code from messenger RNA (mRNA) to proteins via appropriate selection of transfer RNAs (tRNA) is accomplished by the macro molecular ribosome machine. The 70S ribosome is composed of two main sub units, the 50S and 30S, which are composed of ribosomal RNA (rRNA) and many individual ribosomal proteins (numerical values of the subunits relate to sedi mentation/centrifugation rates). The main subunits of the 50S and 30S are the 23S and 16S, respectively. Spanning the contact surfaces of the 50S and 30S are the aminoacyl-, peptidyl-, and “exit”-sites, labeled the A-, P-, and E-sites (a)

(b)

(c)

(d)

Figure 1 (a) Two views of the 50S from Haloarcula marismortui (IMIK). (b) Two views of the 50S from Deinococcus radiodurans (1NWY). (c) Two views of the 30S from Thermus thermophilus (1FJG). (d) Two views of the 70S from T. thermophilus (1VSA). Proteins have been left out for simplicity.

142

Edward C. Sherer

(a)

(b)

(c)

(d)

Figure 2 Various antibiotics bound to (A) 50S or (B) 30S. A summary of crystal structures and antibiotics displayed is provided in Table 1. An enlarged view of the overlay of structures is found in (C) and (D) for the 50S and 30S, respectively.

P

E

A

P

A

E

PTC

50S

Figure 3

30S

Location of the A-, P-, and E-sites as well as the PTC on the 50S and 30S.

(Figures 2 and 3). Peptide bond formation takes place at the peptidyl transfer ase center (PTC) and as the nascent peptide is extended, the single-stranded protein is extruded out the exit tunnel to further fold. One of the first

Antibiotics Targeting the Ribosome: Structure-Based Design and the Nobel Prize

Edward C. Sherer

Table 1 Crystal structures of ribosome–antibiotic complexes in Figure 2

Haloarcula marismortui 50S    PDB code
Anisomycin (8)                3CC4
Azithromycin (1)              1YHQ
Blasticidin                   1KC8
Carbomycin                    1K8A
Chloramphenicol (5)           1NJI
Clindamycin (6)               1YJN
Erythromycin (11)             1Y1Z
Linezolid (2)                 3CPW
Sparsomycin (7)               1VQ8
Spiramycin                    1KD1
Telithromycin (12)            1YIJ
Tiamulin (30)                 3G4S
Tylosin                       1K9M
Virginiamycin                 1YIT

Thermus thermophilus 30S      PDB code
Hygromycin                    1HNZ
Kirromycin                    2WRN
Pactamycin                    1HNX
Paromomycin (27)              2UUA
Spectinomycin                 1FJG
Streptomycin                  1FJG
Tetracycline (9)              1HNW

observations from the high-resolution structure of the ribosome was that the ribosome was in fact a ribozyme, since the catalytic PTC pocket was composed entirely of RNA and not of protein [5,19]. With the availability of high-resolution structures of the ribosomal subunits, researchers quickly began solving antibiotic complexes [6,20–26]. Figure 2 and Table 1 depict several examples of different antibiotic classes binding the two subunits of the ribosome. The long-term implications of this research for society are significant, given that multidrug-resistant organisms are on the rise [27–30]. With frontline antibiotics losing effectiveness for the treatment of bacterial infections, the wealth of structural information made available by solution of ribosome–antibiotic complexes has provided novel insight into many antibiotic classes. The PTC region of the 50S is the binding location of many antibiotics (Figure 4), whose mechanism of action is interruption of peptide bond formation or of protein elongation and extension. Other antibiotics that bind the 30S (Figure 4) serve to block incoming tRNAs. Figure 2 provides a simplified overview of the binding of several antibiotics to the 50S and 30S. The structures depicted are summarized in Table 1. A comprehensive review of ribosome–antibiotic cocrystal structures has recently been published by Wimberly, and the interested reader is referred to that paper for a table of nearly 100 complexes, along with insightful discussion [31]. Wimberly's review breaks down the complexes by binding to the 50S or 30S and also separates the antibiotic complexes by species. The dominant organisms that have provided high-resolution ribosome structures are Haloarcula marismortui, Deinococcus radiodurans, Thermus thermophilus, and Escherichia coli. The structures have given the research community a detailed picture of the prokaryotic, bacterial ribosome. Recently, Rib-X Pharmaceuticals disclosed its solution of the crystal structure of the 50S subunit from a Gram-positive organism [31]. Differences between eukaryotic (human) and prokaryotic ribosome structures allow antibiotics to selectively target bacterial translation, and high-resolution crystal structures have provided insight into this species selectivity. With the antibacterial resistance problem at hand, and with Nobel-caliber chemistry being brought to bear on it, the scientific community is poised to deliver new antibiotics when they are needed most. As many antibiotics bind the ribosome, and the ribosome is mainly an RNA-based drug target, one must consider whether conventional drug design, in a historically protein-centric field, will be effective in an RNA environment.

Figure 4 Exemplar structures of various antibiotic classes that bind to either the 50S or the 30S subunit. Macrolides: azithromycin (1); oxazolidinones: linezolid (2); aminoglycosides: kanamycin A (3); pleuromutilin (4); phenylpropanoids: chloramphenicol (5); lincosamides: clindamycin (6); sparsomycin (7); anisomycin (8); and tetracycline (9). See Scheme 9 for thiostrepton (38). Not pictured: streptogramins such as quinupristin/dalfopristin.

3. RNA AS A DRUG TARGET

Selecting the ribosome as a design target for new antibiotics is the focus of the current review, and as such, the general field of drug design on protein targets will not be covered. The problem of selecting an RNA target is not limited to ribosomal antibiotics but also includes ribozymes, HIV-1, tRNA, thymidylate synthase mRNA, antisense RNA, small interfering RNA (siRNA), microRNA, and others [32–43]. Of particular interest among the cited RNA drug target reviews are those of Wilson and Li, Thomas and Hergenrother, Drysdale et al., and Foloppe, Matassova, and Aboul-ela [33,41–43]. While some would argue that targeting RNA or DNA is an entirely different challenge from optimizing ligand interactions with proteins, common themes, approaches, and modeling tools can all be brought to bear on the problem. Investigation of ligand complexes with nucleic acid targets, or of nucleic acids bound to proteins, indicates that hydrogen bonding, base stacking, ion pairing, and hydrophobic interactions are all still present and can be considered the driving forces for binding. It is commonplace to see stacking interactions between RNA bases and the aromatic rings of ligand molecules. Directed hydrogen bonding between ligands and RNA bases, sugars, and backbone atoms is possible. Interestingly, aromatic wedges (like the PTC A-site crevice) can accommodate aromatic rings, but these RNA wedges can also accommodate more hydrophobic aliphatic groups. It is also common to see basic amines in ligands that form ion pairs with the polyanionic environment created by the RNA phosphodiester backbone. Many structures exist of drugs intercalated into DNA or RNA helices, or of drugs bound to either shallow or deep grooves formed by an array of RNA secondary structures. RNA secondary structure can take many forms, and binding to features such as duplex RNA, internal loops, bulges, stems, or hairpin loops is possible [41].
Interaction of ligand functionality with RNA bases via hydrogen bonding can be considered pseudo-base pairing. Whether one is optimizing nucleic acids to bind proteins or non-nucleic acid ligands to bind RNA or DNA, the interactions driving binding share common themes such as those listed above. For this reason, antibiotic ligand design can draw on decades of work in antiviral and antibacterial nucleoside medicinal chemistry, as well as in nucleic acid base analog design [44–50]. The cited reviews catalog modifications to nucleic acid monomers that could readily be incorporated into the design of novel antibiotics or other nucleic acid-binding ligands. Structure-based drug design of protein-binding ligands can be accomplished through many different methods, including similarity searching, virtual screening, quantitative structure–activity relationship (QSAR) modeling, pharmacophore searching, and docking, to name a few. Each of these methods shares commonalities whether one is designing protein ligands or nucleic acid ligands. The method considered to vary most between protein and RNA ligand design is the application of scoring functions. Scoring functions, which take many forms, are surrogates for binding affinity that serve to rank compounds by how well they should bind to a target of interest [51–59]. As pointed out in the cited work, the design and application of scoring functions is of considerable interest and an active area of research. A globally predictive scoring function that would accurately account even for all protein–ligand interactions is a daunting goal, and greater success can at times be found by deriving more local scoring functions: protein target-specific, class-specific, or even ligand chemical-class-specific. For docking to nucleic acids, specifically RNA, many researchers have taken this approach of pursuing nucleic acid-specific scoring functions. Scoring functions for designing RNA-binding ligands have received less attention than those for proteins, but important contributions have been made [60–65]. Early work on calculating the affinity of ligands for RNA targets was done by Leclerc and Karplus. The multiple copy simultaneous search (MCSS) method, based on energy calculations with the CHARMM force field, was used to examine binding to two RNA targets, the TAR RNA of HIV and the 16S aminoglycoside binding site of the 30S ribosome [60]. Binding predictions improved when the phosphate charges were scaled down, and the authors concluded that better system modeling would need to take into account solvent and the polyelectrolyte effects of the nucleic acid.
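To make the effect of scaling phosphate charges concrete, the sketch below computes a pairwise Coulomb energy between ligand and RNA point charges, multiplying the charges on atoms flagged as phosphate by a scale factor. The distance-dependent dielectric eps(r) = 4r, the 0.5 scale factor, and all coordinates and charges are illustrative assumptions; this is neither the CHARMM energy function nor the MCSS code, and the values are not those of Leclerc and Karplus.

```python
import math

# Coulomb conversion constant in kcal*Angstrom/(mol*e^2)
COULOMB_K = 332.0636


def coulomb_energy(ligand, rna, phosphate_scale=1.0, eps_slope=4.0):
    """Sum pairwise Coulomb energies (kcal/mol) between ligand and RNA atoms.

    ligand: list of (x, y, z, charge)
    rna: list of (x, y, z, charge, is_phosphate)
    phosphate_scale: factor applied to charges on phosphate-group atoms
    eps_slope: distance-dependent dielectric eps(r) = eps_slope * r (an assumption)
    """
    energy = 0.0
    for lx, ly, lz, lq in ligand:
        for rx, ry, rz, rq, is_phosphate in rna:
            # Scale only the phosphate charges, leaving other RNA atoms untouched.
            q = rq * phosphate_scale if is_phosphate else rq
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            # With eps(r) = eps_slope * r, E = k*q1*q2 / (eps_slope * r^2).
            energy += COULOMB_K * lq * q / (eps_slope * r * r)
    return energy


# Usage: a protonated amine 4 A from a phosphate oxygen and 6 A from a polar atom.
ligand = [(0.0, 0.0, 0.0, 1.0)]
rna = [(4.0, 0.0, 0.0, -1.0, True), (0.0, 6.0, 0.0, 0.5, False)]
full = coulomb_energy(ligand, rna)
scaled = coulomb_energy(ligand, rna, phosphate_scale=0.5)
```

Halving the phosphate charge makes the amine–phosphate attraction less dominant, which is the qualitative effect the scaling is meant to achieve; choosing a sensible factor is an empirical question.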
Docking to TAR was also investigated at RiboTargets, work that helped refine the scoring functions internal to the virtual screening system RiboDock [61]. Initial validation of the algorithm was performed through a series of cross-docking experiments over 10 RNA–ligand complexes, and use of the scoring function to select specific RNA ligands showed good enrichment. RiboDock's scoring function includes terms for hydrogen bonds, lipophilic interactions, steric repulsion, positively charged carbon–acceptor interactions, aromatic stacking, donor–donor and acceptor–acceptor repulsion, and an estimate of the ligand entropy penalty upon binding. An RNA-centric scoring function named DrugScoreRNA, based on distance-dependent atom pair potentials, was described by Pfeffer and Gohlke [63]. The RNA version of DrugScore, derived from 670 nucleic acid crystal complexes, contained terms that differed from those of the canonical protein DrugScore. Using DrugScoreRNA, the authors were able to distinguish tight binders from weak binders and inactive compounds. In a similar approach, Robertson and Varani used 45 crystal structures of nucleic acid–ligand complexes to build a distance-dependent all-atom scoring function [64]. As in the Leclerc and Karplus study, Robertson and Varani saw a need for better treatment of solvent and electrostatics. Moitessier et al. described improvements to docking with AutoDock, specifically incorporating flexibility in the macromolecular host and the ligands, as tested by docking aminoglycosides to the A-site [54,59].
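The "good enrichment" reported for RiboDock refers to how strongly known binders are concentrated at the top of a score-ranked list. A common metric is the enrichment factor at a fraction x of the ranked database, EF = (actives in top x / compounds in top x) / (total actives / total compounds), where EF = 1 corresponds to random selection. The sketch below is a generic implementation of that definition, not RiboDock code; the scores and activity labels in the usage line are hypothetical.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01, lower_is_better=True):
    """Enrichment factor of actives in the top `fraction` of a score-ranked list.

    scores: docking scores, one per compound
    is_active: booleans marking known binders (same order as scores)
    lower_is_better: True for energy-like scores where more negative is better
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("need at least one active to compute enrichment")
    # Rank compound indices from best to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=not lower_is_better)
    n_top = max(1, math.ceil(fraction * n_total))
    hits_top = sum(is_active[i] for i in order[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Usage: 10 hypothetical compounds, the two actives ranked best -> EF(20%) = 5.0
scores = [-12.0, -11.0, -3.0, -4.0, -5.0, -2.0, -1.0, -6.0, -7.0, -8.0]
actives = [True, True] + [False] * 8
ef = enrichment_factor(scores, actives, fraction=0.2)
```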


While the above scoring functions seek to separate out nucleic acid–ligand interactions, Zhao et al. have attempted to build a single scoring function, named KScore, that ranks ligands binding to both proteins and nucleic acids [65]. KScore builds on the potential of mean force (PMF) scoring function PMF99/PMF04 and adds 17 protein and 28 nucleic acid atom types. The method allows for the incorporation of explicit waters and counterions. Using a set of 2422 protein–ligand, 97 RNA–ligand, and 300 DNA–ligand crystal structures, the authors derived a scoring function that showed good agreement between docking scores and experimental binding affinities. RNA is thus a target of active interest. The targeting of RNA relies on similar modeling tools and similar underlying principles: hydrogen bonding, aromatic stacking, hydrophobic effects, and ligand strain energy are but a few of the similarities. Significant differences in the design of nucleic acid ligands revolve around the treatment of electrostatics arising from the phosphate backbone. This endeavor has seen success both in general [33,41,43] and specifically for antibacterials [31,66–72]. For the remainder of this review, the focus will remain on case studies that apply molecular modeling techniques in combination with crystallography to the design of novel antibiotics. The reader is referred to the above-cited reviews for case studies involving the successful use of molecular modeling for other RNA-based targets.
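Knowledge-based functions such as PMF (on which KScore builds) and DrugScoreRNA share one idea: how often an atom-type pair is observed at distance r in crystal structures, relative to a reference state, is converted into an effective energy by inverse Boltzmann, W(r) = -kT ln[p_obs(r) / p_ref(r)]. The sketch below tabulates such a potential for a single atom-type pair from pooled contact distances. The uniform-density reference state (expected counts proportional to shell volume), bin width, cutoff, pseudocount scheme, and kT value are simplifying assumptions; the published functions use their own reference states, volume corrections, and smoothing.

```python
import math

KT = 0.593  # k_B*T in kcal/mol at about 298 K


def pair_potential(distances, r_max=8.0, bin_width=0.5, pseudocount=1.0):
    """Tabulate W(r) = -kT ln(p_obs / p_ref) for one ligand/RNA atom-type pair.

    distances: observed contact distances (Angstrom) for this pair, pooled over
               a set of crystal structures
    Returns a list of (bin_center, energy_kcal_per_mol) tuples.
    """
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        idx = int(d / bin_width) if d >= 0.0 else -1
        if 0 <= idx < n_bins:
            counts[idx] += 1
    # Reference state (assumption): uniform density, so expected counts scale
    # with the volume of each spherical shell.
    shell_vol = [
        (4.0 / 3.0) * math.pi * (((k + 1) * bin_width) ** 3 - (k * bin_width) ** 3)
        for k in range(n_bins)
    ]
    total_vol = sum(shell_vol)
    p_ref = [v / total_vol for v in shell_vol]
    # Pseudocounts are spread in proportion to the reference distribution, so
    # empty bins get a finite energy instead of log(0).
    prior = pseudocount * n_bins
    n_obs = sum(counts)
    table = []
    for k in range(n_bins):
        p_obs = (counts[k] + prior * p_ref[k]) / (n_obs + prior)
        table.append(((k + 0.5) * bin_width, -KT * math.log(p_obs / p_ref[k])))
    return table


# Usage: hypothetical contacts clustered near 3 A (hydrogen-bond range).
table = pair_potential([2.9] * 50 + [3.1] * 50 + [6.0] * 5)
best_r, best_w = min(table, key=lambda t: t[1])
```

The well of the resulting potential sits at the most over-represented distance, which is how such functions reward geometries seen frequently in crystal structures.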

4. STRUCTURE-BASED ANTIBIOTIC DESIGN: CASE STUDIES

4.1 Designer oxazolidinones

Rib-X Pharmaceuticals was founded to capitalize on the antibacterial cocrystal structures determined by Steitz and coworkers cited above. Use of the ribosome crystal structures has proven fruitful for the company, as the two published case studies detailed herein show. The first case study was largely influenced by the cocrystal structure of linezolid, recently deposited as PDB entry 3CPW [73]. Linezolid (2) was observed to bind near the PTC, contacting only ribosomal RNA. Directed interactions included a hydrogen bond between the NH group of the acetamide tail and the phosphate group of G2540, a π-stacking interaction between the fluorophenyl ring and a hydrophobic crevice formed by residues A2486 and C2487, and a recruited interaction between the oxazolidinone ring and U2539 (Haloarcula numbering). The proximity of linezolid to other antibiotics bound to the ribosome influenced several separate chemical series at Rib-X, some of which were inspired by linking portions of a binder in one "ribofunctional" locus to another [67,68,70,74–76]. With crystal structures in hand, Duffy et al. pursued structure-based drug design using several modeling tools derived from Jorgensen's laboratory [77–79]. Molecules were designed using a grow-search-score algorithm (Analog™ or BOMB). Design was driven by QSAR models predictive of various in vitro antibacterial activities, by molecular property predictions (QikProp), and by QSAR models predictive of in vivo endpoints such as oral bioavailability [74,79]. Using BOMB, the interaction energy of each new chemical alteration could be broken down residue by residue to help design maximal interactions with the ribosome. A linking algorithm was used that searched for known chemical solutions able to bridge two antibiotic fragments observed in different crystal structures. In the initial analysis, the proximity of linezolid (PDB 3CPW) and sparsomycin (7) (PDB 1M90) was evident [21,75,76]. Based on this observation, the linking algorithm was used for iterative design of bridged molecules (13) [31,68,80]. The general strategy is shown in Scheme 1. Two example scaffolds built into the linezolid variants are depicted in Scheme 2. These two structures, among others, demonstrated broad antibacterial activity, with compound 15 showing significant affinity for the prokaryotic ribosome (<20 nM) and high selectivity over inhibition of eukaryotic translation. The crystal structure of 14 bound to the ribosome has been published (PDB 3CXC) [68,75].

Scheme 1 [Structures of sparsomycin (7), linezolid (2), and linked design 13, with the linking element indicated]

Scheme 2 [Structures 14 and 15]

The sparsomycin/linezolid series provided good antibacterial activity, but an even broader antibacterial spectrum was reported with the publication of the Rχ-01 family of compounds, which have been deemed designer oxazolidinones [31,80]. This new family of oxazolidinones was shown to effectively displace compounds that bind the ribosomal 50S A-site (linezolid site), including chloramphenicol and puromycin [80]. The structures of several family members (16–20) are depicted in Scheme 3. The reader is referred to the primary citations for tables of antibacterial activity data (as is the case for all case studies). Compounds such as 17 and 19 were predicted by the QSAR model to have good oral bioavailability [79], and compounds such as 16 were predicted by the corresponding QSAR activity model to have good Haemophilus influenzae activity [79]. The computational and crystallographically inspired design of novel oxazolidinones eventually led to Rib-X Pharmaceuticals' clinical candidate, radezolid (20), currently in Phase II clinical trials [31].

Scheme 3 [Structures 16–20]
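BOMB's residue-by-residue breakdown assigns the ligand–ribosome interaction energy to individual receptor residues, so a designer can see which contacts a new substituent strengthens or weakens. The sketch below illustrates only that bookkeeping, using a generic Coulomb plus 12-6 Lennard-Jones pair energy summed per residue. The functional form, geometric-mean combining rules, parameters, and atom records are illustrative assumptions and do not reproduce BOMB or the OPLS force field.

```python
import math
from collections import defaultdict

COULOMB_K = 332.0636  # kcal*Angstrom/(mol*e^2)


def residue_decomposition(ligand_atoms, receptor_atoms, dielectric=1.0):
    """Return {residue_id: interaction energy in kcal/mol} for one ligand pose.

    Each atom is a dict with keys: xyz (tuple), q (charge), sigma (Angstrom),
    eps (kcal/mol); receptor atoms also carry 'residue' (e.g. 'U2539').
    Geometric-mean combining rules are an assumption of this sketch.
    """
    per_residue = defaultdict(float)
    for la in ligand_atoms:
        for ra in receptor_atoms:
            r = math.dist(la["xyz"], ra["xyz"])
            sigma = math.sqrt(la["sigma"] * ra["sigma"])
            eps = math.sqrt(la["eps"] * ra["eps"])
            sr6 = (sigma / r) ** 6
            lj = 4.0 * eps * (sr6 * sr6 - sr6)  # 12-6 Lennard-Jones term
            elec = COULOMB_K * la["q"] * ra["q"] / (dielectric * r)
            # Accumulate both terms on the receptor residue that owns this atom.
            per_residue[ra["residue"]] += lj + elec
    return dict(per_residue)


# Usage with hypothetical atoms: one charged contact, one van der Waals contact.
lig = [{"xyz": (0.0, 0.0, 0.0), "q": 0.5, "sigma": 3.0, "eps": 0.1}]
rec = [
    {"xyz": (3.0, 0.0, 0.0), "q": -0.5, "sigma": 3.0, "eps": 0.1, "residue": "G2540"},
    {"xyz": (0.0, 3.0 * 2 ** (1 / 6), 0.0), "q": 0.0, "sigma": 3.0, "eps": 0.1, "residue": "A2486"},
]
for residue, energy in sorted(residue_decomposition(lig, rec).items(), key=lambda kv: kv[1]):
    print(residue, round(energy, 2))
```

Comparing such per-residue tables before and after a chemical change is the kind of analysis the text describes; the absolute numbers from this toy setup carry no meaning.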

4.2 Designer macrolides

Researchers at Rib-X Pharmaceuticals, in addition to reaching Phase II with an oxazolidinone, have also capitalized on the structures of available ribosome–macrolide complexes [81–84]. Macrolides (Scheme 4) such as azithromycin (1), clarithromycin (10), and telithromycin (Ketek, 12) bind near linezolid in the PTC region, again contacting mostly RNA. The structures of azithromycin (PDB 1M1K, 1YHQ) and linezolid led to a series of compounds that extended the macrolide structure in ways medicinal chemists had previously considered undesirable. Compounds shown in Scheme 5 demonstrate antibacterial activity against many resistant organisms. Much of the benefit of this new class of macrolides derives from tighter binding conferred by the desosamine modifications, which helps to overcome dimethylation of a key adenine residue (A2058, E. coli numbering) that sits under the desosamine sugar. Methylases that either monomethylate or dimethylate this residue give rise to strains that are highly resistant to macrolides. Crystallography and molecular modeling at Rib-X were used to map common resistance mutations affecting the antibacterial effectiveness of macrolides [85]. The ribosome-bound crystal structure of compound 22 was solved and showed unfilled space near the triazole linker that could be occupied to gain additional binding to the ribosome. Computational scoring of modifications led to compounds such as 23 and 21, which brought the MICs of the series against several resistant methylase strains from inactive to very potent [84]. Structures 24 and 25 are examples of macrolides designed to modulate molecular properties, guided by property predictions from QikProp. While all examples in Scheme 5 build off the common azithromycin and clarithromycin cores, researchers at Rib-X have also been able to incorporate other macrolide cores into the more general family of substituted desosamine macrolides [82]. The ability to swap out cores, together with predicting the physical properties (such as solubility and permeability) of virtual combinations of core and side chain, has provided compounds with improved exposure (compound 26) [82]. Burak et al. had to adjust the molecular property windows acceptable for favorable macrolide pharmacokinetics, since the common Lipinski Rule of 5 for conventional drug molecules was not relevant [86]. Bodoor et al. point out that for aminoglycosides and macrolides, the values of polar surface area, hydrogen bond donors, and hydrogen bond acceptors are at times exceptionally large [87]. They highlight that this is, in part, due to the bulky, chemically complex character of antibiotics derived from natural products, which do not conform to Rule of 5 guidelines. Compound 26, containing an oxime moiety, showed increases in rat bioavailability and AUC of two-fold and approximately four-fold, respectively, compared with azithromycin [82]. Other cores not depicted here that can be incorporated into the series include telithromycin-type carbamates, additional oxime variants, and any of the common 14- and 15-membered macrolides.

Scheme 4 [Structures of macrolides 1, 10, 11, and 12]

Scheme 5 [Structures 21–26]
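The Rule of 5 mentioned above flags a compound when more than one of these limits is exceeded: molecular weight over 500, calculated logP over 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors. The sketch below counts violations from precomputed descriptors and accepts custom limits, which is one way to encode an adjusted, class-specific property window. The descriptor values and the relaxed limits in the usage example are hypothetical placeholders, not the windows derived by Burak et al.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float  # Da
    clogp: float
    h_donors: int
    h_acceptors: int


# Lipinski upper limits; a value strictly above a limit counts as a violation.
LIPINSKI_LIMITS = {"mol_weight": 500.0, "clogp": 5.0, "h_donors": 5, "h_acceptors": 10}


def rule_violations(d, limits=LIPINSKI_LIMITS):
    """Return the names of descriptors that exceed their upper limits."""
    return [name for name, limit in limits.items() if getattr(d, name) > limit]


def passes(d, limits=LIPINSKI_LIMITS, max_violations=1):
    """Lipinski-style pass/fail: at most `max_violations` limits exceeded."""
    return len(rule_violations(d, limits)) <= max_violations


# Usage: hypothetical macrolide-like descriptors fail the standard rule...
mac = Descriptors(mol_weight=750.0, clogp=3.0, h_donors=5, h_acceptors=14)
print(rule_violations(mac), passes(mac))
# ...but pass a hypothetical window relaxed for this compound class.
relaxed = {"mol_weight": 1000.0, "clogp": 5.0, "h_donors": 7, "h_acceptors": 16}
print(passes(mac, relaxed))
```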

4.3 Aminoglycoside derivatives I

Case studies 4.1 and 4.2 focused on drugs that bind the 50S, or large ribosomal subunit. The other subunit, which together with the 50S forms the 70S ribosome, is the 30S. Solving the high-resolution crystal structure of the 30S led to insight into the aminoglycosides [7]. The antibiotics pactamycin (PDB 1HNX), hygromycin (PDB 1HNZ), and tetracycline (9, PDB 1HNW) were all shown to bind the 30S [6]. Hygromycin is a member of the aminoglycoside family, as is kanamycin A (3). Vicens and Westhof, as well as Tor, have written reviews on the targeting of RNA by aminoglycosides [88,89]. Vicens and Westhof partially derive their review from work determining the structures of aminoglycosides such as paromomycin (27) bound to a decoding-site model of the 30S A-site (PDB 1J7T) [89,90]. Aminoglycoside antibacterial activity stems from interrupting the binding of tRNA molecules to the decoding site. As summarized by Vicens and Westhof, aminoglycosides are observed to bind in deep RNA grooves, in shallow RNA grooves, or at a three-adenine bulge within the 16S. Directed interactions with the ribosome are observed via phosphate groups, with RNA bases, through counterions, with protein side chains, through base stacking, or through pseudo-base pairing. In one major mechanism of antibiotic resistance, a single nucleotide variation, A1408G, is enough to disrupt directed hydrogen bonding to the aminoglycoside. Molecular modeling was used to help explain why this mutation altered antibiotic binding [91]. RiboTargets, a biotechnology company that focused on antibiotics bound to the ribosome, is now known as Vernalis. Instead of focusing on the PTC of the 50S, RiboTargets focused on the 30S. Ribosome crystallography together with molecular modeling was used to propose novel aminoglycosides with improved activity against resistant organisms [92,93]. Starting from the NMR structure of paromomycin bound to the 16S A-site (PDB 1PBR), a docking study was performed using DOCK version 4.0. Two databases, the Cambridge Structural Database and the National Cancer Institute database, supplied the molecules for a virtual screen. A total of 273,000 compounds were docked into the binding site. An initial purge eliminated compounds with steric problems and/or conformational strain. The remaining compounds were further refined by looking for directed interactions with the ribosomal RNA phosphate backbone and good complementarity to the depressions and niches of the RNA surface. A subset of 40 compounds was chosen to work up synthetic ideas based on linking the identified groups to a truncated aminoglycoside core (neamine, compound 28, Scheme 6). In the end, the authors focused on pendant amines suitable for forming ion-pair interactions with the negatively charged phosphate backbone. The proposed aminoglycoside antibiotics were refined with solvated energy minimizations in Amber 5.0. A total of nine compounds were synthesized (all containing substitution similar to 29), of which 29 had the best antibacterial profile [93]. Neamine (the initial core) does not itself exhibit antibacterial activity, but 29 showed broad-spectrum antibacterial activity. The crystal structure of paromomycin (PDB 1J7T) was used as the starting structure for molecular replacement to determine the cocrystal structure of 29 in the Vicens and Westhof model system, and the coordinates have been deposited (PDB 1O9M) [92]. Arm 1 of 29, the arm containing the amide linker, makes four directed contacts with ribosomal RNA. These include contacts between the terminal amino group and two guanines and one cytosine, as well as a contact between the pendant hydroxyl and the phosphate backbone of a nearby uracil. The authors point out that the binding surface seen in the model A-site is identical to that seen in the full 16S. Contacts to arm 2 include a hydrogen bond from a guanine to the secondary amine, aliphatic interactions with the phosphate backbone (shape complementarity described as methylene hydrogens interlocking with the phosphate groups), and electrostatic interactions between the terminal amine and the phosphate groups of both a guanine and a uracil.

Scheme 6 [Structures of paromomycin (27), neamine (28), kanamycin A (3), and 29]
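The RiboTargets workflow is a multistage funnel: dock, discard poses with steric or strain problems, keep poses that make directed contacts with the phosphate backbone, then rank and keep a short list for synthesis planning. The sketch below expresses that funnel over precomputed pose records. The record fields, the 10 kcal/mol strain cutoff, the zero-clash rule, the minimum of one phosphate contact, and ranking by docking score are hypothetical choices for illustration; they do not reproduce the published study's criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    compound_id: str
    dock_score: float        # lower is better (energy-like)
    strain_energy: float     # kcal/mol relative to the ligand's relaxed conformer
    steric_clashes: int      # heavy-atom contacts closer than an overlap cutoff
    phosphate_contacts: int  # H-bond/ion-pair contacts to backbone phosphates


def screening_funnel(poses: List[Pose], max_strain=10.0, min_phosphate_contacts=1, keep=40):
    """Filter docked poses in stages and return the top `keep` compound IDs."""
    # Stage 1: remove sterically clashing or conformationally strained poses.
    stage1 = [p for p in poses if p.steric_clashes == 0 and p.strain_energy <= max_strain]
    # Stage 2: require directed interactions with the phosphate backbone.
    stage2 = [p for p in stage1 if p.phosphate_contacts >= min_phosphate_contacts]
    # Stage 3: rank survivors by docking score and keep the best few.
    stage2.sort(key=lambda p: p.dock_score)
    return [p.compound_id for p in stage2[:keep]]


# Usage with hypothetical pose records.
poses = [
    Pose("a", -30.0, 2.0, 0, 2),
    Pose("b", -40.0, 15.0, 0, 3),  # removed: too strained
    Pose("c", -35.0, 1.0, 1, 2),   # removed: steric clash
    Pose("d", -25.0, 3.0, 0, 0),   # removed: no phosphate contact
    Pose("e", -32.0, 4.0, 0, 1),
]
shortlist = screening_funnel(poses)  # ['e', 'a']
```

Applying cheap geometric filters before interaction-based ones keeps the expensive inspection steps confined to a small fraction of a 273,000-compound library.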

4.4 Pleuromutilin derivatives

Another class of antibiotics binding to the PTC region of the 50S ribosomal subunit is the pleuromutilins, represented by structures such as pleuromutilin (4) and tiamulin (30) (Scheme 7). Other pleuromutilins include retapamulin (31) and valnemulin (32), both of which, like tiamulin, are decorated with basic amines. Chemical footprinting (in which ligand binding blocks chemical modification of target residues) indicates interaction with U2506 and U2585 (E. coli numbering), which is consistent with the location of pleuromutilins cocrystallized with the D. radiodurans 50S (PDB 1XPB, 2OGO, 2OGN, 2OGM) [94–97]. Crystal structures have indicated induced fit of the various amine side chains, opening the possibility of designing in new functionality. As with the linezolid and macrolide modifications, elucidation of the specific ribosomal interactions and of the available space around the binding site allowed for new design ideas, whereas historical modifications of the pleuromutilins had been made without knowledge of directed interactions. Lolk et al. set out to modify the existing mutilin core by trimming back the extended amine arm and building in a linker that would allow facile synthesis of many new analogs [95]. As with the macrolide core extensions described above, the triazole was the linker of choice. The authors clearly had binding to RNA nucleotides in mind, as the set of modifications was composed largely of nucleic acid bases and nucleosides: thymine, cytosine, and adenine were incorporated into the extended cores, as well as an unsubstituted phenyl. While these modifications brought additional base stacking and directed hydrogen bonding interactions into play, the newly designed species notably lacked a basic amine. Pleuromutilin itself does not contain the basic amine, instead terminating in a hydroxyl, whereas the other three mutilins do have such an amine. In the study, improvements in molecular properties, specifically water solubility, were attributed to the amine modifications, and these modifications were not necessarily important for binding. Initial selection of substitutions was not influenced by molecular modeling but was instead driven by structural biology and analysis of available space in the ligand cocrystals. Synthesis of 38 mutilin derivatives followed by chemical footprinting identified only two molecules of significant interest. Compounds 33 and 34 demonstrate that binding was possible only for relatively small fragments incorporating a three-carbon linker off the triazole. Linkers of one or two carbons showed no evidence of binding, nor did any of the analogs incorporating furanose sugar rings. Identification of 33 and 34 led the authors to pursue docking calculations in an attempt to define specific interactions of the new molecules. Docking studies performed with Glide using a rigid ribosomal binding site did not dock the ligands successfully. Instead, the authors allowed for flexibility in the binding site by using soft restraints for the residues directly contacting the ligands. A second docking experiment was run on the relaxed system, leading to acceptable ligand poses in the PTC. All work was performed within the Schrödinger 2007 suite, with minimizations run under OPLS2005 in combination with an implicit water model. The procedure was validated by docking tiamulin to reproduce the Deinococcus cocrystal pose. Both ligands docked into the PTC such that U2506 would be protected, as observed with chemical footprinting. The extended aromatic arms point down the PTC toward the exit tunnel, but no directed base stacking interactions between the phenyl/adenine rings and any ribosomal RNA bases were evident.

Scheme 7 [Structures of pleuromutilins 4 and 30–34]

The authors do not rule out the possibility of this type of interaction, stating that limitations in capturing RNA flexibility may prevent the docking routine from locating such an interaction. This hypothesis is consistent with the high degree of induced fit observed in ribosome–mutilin complexes solved by Yonath et al. [94]. No data on the antibacterial properties of the newly designed ligands were included in the report.
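Allowing binding-site flexibility through soft restraints typically means adding a penalty that lets restrained atoms move a little at no cost but grows harmonically beyond a tolerance. The sketch below implements a flat-bottom harmonic positional restraint energy of that kind. The 0.5 Å tolerance, the 10 kcal/(mol·Å²) force constant, and the k·excess² convention (some programs use ½k·excess²) are assumptions; this is not the Schrödinger implementation the authors used.

```python
import math


def flat_bottom_restraint(current, reference, tolerance=0.5, k=10.0):
    """Flat-bottom harmonic positional restraint energy (kcal/mol).

    current, reference: lists of (x, y, z) for the restrained atoms, in Angstrom
    tolerance: displacement allowed at zero cost
    k: force constant in kcal/(mol*Angstrom^2) applied beyond the tolerance
    """
    if len(current) != len(reference):
        raise ValueError("coordinate lists must be the same length")
    energy = 0.0
    for cur, ref in zip(current, reference):
        # Only the displacement beyond the tolerance is penalized.
        excess = math.dist(cur, ref) - tolerance
        if excess > 0.0:
            energy += k * excess * excess
    return energy


# Usage: one atom moves 0.3 A (free), another moves 1.5 A (penalized by 1.0 A).
penalty = flat_bottom_restraint([(0.3, 0.0, 0.0), (1.5, 0.0, 0.0)],
                                [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

Adding such a term to the minimization energy lets contacting nucleotides adjust to a ligand while keeping the site close to its crystallographic geometry.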

4.5 Chloramphenicol derivatives

A common motif in these case studies is binding of antibiotics to the PTC region of the large ribosomal subunit. Chloramphenicol (5) falls into this category, with crystal structures indicating two possible binding modes encompassing two distinct binding loci (PDB 1NJI, 1KO1) [9,21]. The variation in binding may be influenced by cocrystallization in Deinococcus versus Haloarcula: chloramphenicol is observed to bind either to the A-site crevice, as with linezolid, or near the entrance to the peptide exit tunnel close to the macrolide-binding site. In Haloarcula, chloramphenicol sits near the exit tunnel, with the nitrophenyl group stacking between residues G2099 and A2100 (Haloarcula numbering). Residue G2099 corresponds to E. coli residue A2058, the adenine that is methylated by erm methylase, leading to bacterial resistance. In Deinococcus, the ligand stacks between residues A2486 and C2487 (Haloarcula numbering). Chemical modification and mutation experiments support binding of chloramphenicol to both locations [21]. Johansson et al. set out to modify the antibacterial properties of chloramphenicol by truncating the dichloro tail and installing new functionality (Scheme 8). The authors used chloramphenicol bound to the A-site crevice as their starting structure and designed in linkers and head groups with the goal of improving binding, hoping to pick up additional binding to the P-loop region. They brought components of the Haloarcula structure, namely the CCdA-puromycin cofactor, into their Deinococcus model construct. Molecular modeling (dynamics simulations) with MacroModel was performed to determine whether linkers could connect the truncated chloramphenicol to the dinucleotide sugars of the puromycin derivative. From the modeling exercise, the authors chose to synthesize four molecules. Additionally, a pyrene derivative was attached to the linker in an attempt to stack with some portion of the ribosome. Chemical footprinting of compound 35 showed protection of residue U2506 (E. coli). Compound 35 was originally designed so that the pyrene could stack with residues U2506, G2583, and U2584, but no footprinting was observed for residues 2583 and 2584. It is therefore likely that the compound binds in a different orientation, possibly even shifting to the other chloramphenicol-binding site lower in the PTC. While the pyrene conjugate bound and protected U2506, no footprinting was observed for compounds 36 and 37 (in either the 2- or 3-carbon linker varieties). No antibacterial activity was reported for any of the designed derivatives.

Scheme 8 [Structures of chloramphenicol (5) and derivatives 35–37]
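The MacroModel simulations asked whether a linker of a given length can bridge two anchor points in the site. A much cruder first check, shown here as an assumption-laden sketch rather than the authors' method, compares the required distance with the maximum end-to-end span of a fully extended (all-anti, planar zig-zag) alkyl chain. It assumes a C–C bond length of 1.53 Å and a C–C–C angle of about 111.5°, so each bond advances about 1.26 Å along the chain axis; real linkers with heteroatoms, rings, or amides would need their own geometry.

```python
import math


def max_extended_span(n_bonds, bond_length=1.53, angle_deg=111.5):
    """End-to-end distance (Angstrom) of an all-anti zig-zag chain of n_bonds C-C bonds."""
    if n_bonds < 1:
        raise ValueError("need at least one bond")
    half = math.radians(angle_deg) / 2.0
    axial = n_bonds * bond_length * math.sin(half)
    # In a planar zig-zag, an odd number of bonds ends on the opposite side of
    # the chain axis, offset by bond_length * cos(angle / 2).
    offset = bond_length * math.cos(half) if n_bonds % 2 else 0.0
    return math.hypot(axial, offset)


def can_bridge(required_distance, n_bonds, **geometry):
    """True if a fully extended chain could at least span the required distance."""
    return max_extended_span(n_bonds, **geometry) >= required_distance


# Usage: compare hypothetical linkers against a hypothetical 4.8 A gap.
for n in (2, 3, 4):
    print(n, round(max_extended_span(n), 2), can_bridge(4.8, n))
```

Passing this test is necessary but not sufficient: it ignores strain, clashes, and the conformational preferences that a dynamics simulation would capture.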

Antibiotics Targeting the Ribosome: Structure-Based Design and the Nobel Prize

Edward C. Sherer

4.6 Thiostrepton derivatives

In addition to their work on the 30S subunit, Lentzen and coworkers at the former RiboTargets pursued derivatives of thiostrepton (38) that bind the L11 ribosomal protein-binding domain (L11BD) of the 23S rRNA in the 50S subunit [98]. Stabilization of L11 binding to the ribosome influences the rate of protein translation, and the resulting interruption of translation is the mechanism by which thiostreptons exert their antibiotic activity. This rationale is partially driven by thiostrepton- and micrococcin-binding models developed from the separate crystal structures of the L11BD (PDB 1MMS) and thiostrepton (PDB 1E9W) [98]. Lentzen et al. assumed that the independently determined structures of the L11BD and thiostrepton would not distort significantly upon association. They docked the two structures using the empirical scoring function of rDock, checking that the resulting poses did not violate the NMR NOE assignments they had collected. Modeling allowed the authors to identify the substructure of thiostrepton that interacts with the L11BD, namely the dehydrobutyrine (DHB) moiety. Building on this structural model, the same group sought to truncate thiostrepton to its relevant binding portion so that medicinal chemistry could improve its antibacterial characteristics and lead into one or more new chemical series [99]. The substructure of thiostrepton (38, Scheme 9) required for binding, which served as the core scaffold for modification, lies on the left-hand side of the molecule as drawn and is conserved in compounds 39–41. The authors chose to replace the ethanediol portion of thiostrepton by incorporating an amine library: one of five central cores based on the DHB moiety was coupled to a set of 16 amines by amide formation. This provided a library of 93 compounds, of which only 5 were observed, upon binding, to incorporate a radiolabel into the L11BD. Unfortunately, radiolabeling had been monitored with a model RNA system, and full ribosomal RNA did not show the same results. The most promising molecules from the library are displayed in Scheme 9. None of the derivatives in the library inhibited protein translation or showed antibacterial activity. The authors attribute the lack of specific activity to nonspecific binding to RNA, and it is unclear whether, beyond the initial docking work, molecular modeling influenced the selection of reagents for the amine library.

[Scheme 9: thiostrepton (38) and library derivatives 39–41]
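The NOE consistency check described for the L11BD–thiostrepton model can be illustrated with a short sketch. It assumes each NOE-derived restraint is treated as an upper distance bound between two atoms in a docked pose and flags any pair that ends up farther apart than its bound plus a tolerance. The atom labels, coordinates, bounds, and the 0.5 Å tolerance are all hypothetical; this is neither the rDock interface nor the exact procedure of Lentzen et al.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

def noe_violations(pose: Dict[str, Coord],
                   restraints: List[Tuple[str, str, float]],
                   tolerance: float = 0.5) -> List[Tuple[str, str, float, float]]:
    """Return (atom_a, atom_b, distance, upper_bound) for every NOE restraint
    whose interatomic distance in the pose exceeds upper_bound + tolerance (Å)."""
    violations = []
    for atom_a, atom_b, upper in restraints:
        d = math.dist(pose[atom_a], pose[atom_b])  # Euclidean distance, Python 3.8+
        if d > upper + tolerance:
            violations.append((atom_a, atom_b, round(d, 2), upper))
    return violations

# Hypothetical docked pose: ligand (LIG) and receptor (REC) atom coordinates in Å.
pose = {
    "LIG_H1": (0.0, 0.0, 0.0),
    "LIG_H2": (1.5, 0.0, 0.0),
    "REC_H5": (0.0, 3.0, 0.0),
    "REC_HA": (8.0, 0.0, 0.0),
}
# Hypothetical intermolecular NOE restraints: (atom, atom, upper bound in Å).
restraints = [("LIG_H1", "REC_H5", 5.0), ("LIG_H2", "REC_HA", 5.0)]

print(noe_violations(pose, restraints))  # [('LIG_H2', 'REC_HA', 6.5, 5.0)]
```

A pose with an empty violation list is consistent with the NOE data; in a docking workflow such a check would typically be used to discard or penalize poses rather than to rank them.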

4.7 RNA-directed fragment libraries

Fragment screening approaches to drug discovery are now commonplace but have historically been applied to protein targets [100,101]. Bodoor et al. have reported results from a screen aimed at identifying fragments that bind RNA specifically [87]. The work began by identifying 120 small molecules known to interact with RNA. Lessons and techniques learned from protein-centric structure-based drug design can be adopted when designing RNA-specific ligands; notably, this study found that the physical properties of the 120 RNA-binding ligands, calculated with the Molecular Operating Environment (MOE), overlapped with those of protease- and kinase-binding ligands to a comparable degree. Each RNA-binding ligand from the literature was fragmented, roughly across rotatable bonds, and a clustering analysis then identified 53 novel scaffold fragments. The authors added 13 further fragments to the set because they felt some clusters were underrepresented. To identify novel RNA binders by similarity, the team used the fragments derived from the 120-member set as templates to select 102 new fragments for screening. As an example, compounds 42–46 (Scheme 10) represent cluster 40 of the known RNA binders, and compounds 47–49 are the available fragments most similar to these known species. A model system representing the aminoglycoside-binding portion of the 16S rRNA was used to screen the fragments by NMR [87,102]. After screening, the authors identified five fragments, compounds 50–54, which are thought to bind the A-site, the RNA bulge site, or riboswitches [87]. As the authors point out, a drawback of the study is that it was limited to identifying binders of the 16S A-site model. In fact, many of the fragments that defined the search library, as well as fragments found in the virtual screen, may bind other regions of either the 30S or the 50S subunit. For the complete list of identified RNA-binding fragments, the reader is referred to the supporting information of Bodoor et al.

[Scheme 10: known RNA binders 42–46 (cluster 40), most similar available fragments 47–49, and NMR hits 50–54]

Additional RNA-binding fragments were found by NMR screening at Abbott, which identified novel fragments that bind the aminoglycoside A-site [103]. The NMR screen identified 2-aminobenzimidazoles and 2-aminoquinolines, depicted in Scheme 11 (55–65). A model of the 2-aminoquinoline lead was docked into the A-site RNA model of Puglisi using InsightII [104]. The docked model showed specific interactions: stacking between the RNA and the quinoline rings, and an electrostatic interaction between the positively charged primary amine and the phosphate backbone. These docking observations led the authors to propose extending the 2-aminoquinoline series to compounds such as 66–69, which showed μM binding affinity for the A-site.

[Scheme 11: 2-aminobenzimidazole and 2-aminoquinoline hits 55–65 and extended 2-aminoquinolines 66–69]

Pinto et al. have used a combination of flexible docking and NMR screening to identify a set of RNA ligands that bind either human telomerase RNA (hTR-P2b; PDB 1NA2) or the A-site [105]. The authors used MORDOR, a flexible docking algorithm benchmarked against ligand–RNA complexes, to screen for novel RNA binders. A library of 3000 compounds from the KEGG database was screened with MORDOR, and NMR subsequently confirmed that 10 fragments bound telomerase RNA. These 10 hits were used as probes in a similarity search of approximately 2 million compounds in the ZINC database. Iterative docking, visual inspection, and NMR screening narrowed the library down to selective binders. Compounds were initially ranked solely on the MORDOR interaction energy, but experience led the authors to conclude that a ranking based on the total energy of the system (ligand + receptor) gave better enrichment than the interaction energy. Lead classes identified in the modeling exercise include monoaromatics, chained polyaromatics, tetrahydroisoquinolines, 4-aminoquinolines, 9-aminoacridines, phenothiazines, thioxanthenones, anthracyclines, bis-tetrahydroisoquinoline macrocycles, 6-7-6 ring systems, and miscellaneous fused ring systems. The reader is referred to the primary citation for a full structure summary; a subset of the fragments identified as selective hTR-P2b binders (compounds 70–73) is depicted in Scheme 12.

[Scheme 12: selective hTR-P2b binders 70–73]
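Both the template search of Bodoor et al. and the ZINC search of Pinto et al. rank candidates by structural similarity to known binders. A minimal sketch of that idea, assuming each molecule is already encoded as a set of "on" fingerprint bits, uses the Tanimoto coefficient |A∩B|/|A∪B| and scores each candidate by its best match against any template. The bit sets, compound names, and the 0.5 threshold are hypothetical, and no particular fingerprint type or toolkit is implied.

```python
from typing import Dict, FrozenSet, List, Tuple

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity_search(templates: Dict[str, FrozenSet[int]],
                      library: Dict[str, FrozenSet[int]],
                      threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """For each library compound, find its most similar template and keep the
    pair if the score reaches the threshold; hits are sorted by score."""
    hits = []
    for name, fp in library.items():
        best_template, best_score = max(
            ((t, tanimoto(fp, tfp)) for t, tfp in templates.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            hits.append((name, best_template, round(best_score, 3)))
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Hypothetical fingerprints (sets of bit indices) for templates and candidates.
templates = {"known_binder_A": frozenset({1, 2, 3, 4}),
             "known_binder_B": frozenset({10, 11, 12})}
library = {"cand_1": frozenset({1, 2, 3, 5}),      # 3/5 = 0.6 vs A
           "cand_2": frozenset({10, 11, 12, 13}),  # 3/4 = 0.75 vs B
           "cand_3": frozenset({20, 21})}          # 0.0 vs both

print(similarity_search(templates, library))
# [('cand_2', 'known_binder_B', 0.75), ('cand_1', 'known_binder_A', 0.6)]
```

Taking the maximum over templates is one common way to fuse similarity to several probes; averaging or counting near neighbors are alternatives with different trade-offs.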

4.8 A-site scaffolds

Researchers at Vernalis (the former RiboTargets) combined a working knowledge of RNA binders with molecular modeling tools and crystallography to identify new scaffolds that bind the tRNA decoding site (A-site) [106]. The basic screening strategy was to use computational filters, such as molecular weight and the number of rotatable bonds, to reduce the size of their in-house library of nearly 1 million compounds. The remaining compounds (~890,000) were docked with RiboDock, and the 2000 top-scoring compounds were chosen for further analysis. The RNA model system was again that of Vicens and Westhof [90]. Initial selection of ligands from the 890,000 was based on the RiboDock scoring function, minimal ligand internal strain (all ligand conformations within 7 kJ/mol of the minimum were included in the docking calculations), and a balance of polar and nonpolar contributions. The authors found that the scoring function alone favored highly polar compounds, so a secondary limit on polar surface area had to be enforced. Although such polar compounds were argued to be favorable because of their interactions with the phosphates and hydrogen-bonding groups of RNA, this came at the cost of increased liabilities in absorption and distribution. To speed up the docking calculations, long-range electrostatics were ignored; the authors were nonetheless confident that the scoring function compensated for this simplification, since a large portion of the best 2000 compounds carried charges of +1 or +2 (thought to interact better with the anionic RNA environment). A methodological note: the standard Monte Carlo/simulated annealing protocol of RiboDock was replaced with a genetic algorithm to improve the docking search. Each of the 2000 docked ligands was inspected visually. The key interactions used to prioritize the ligands were stacking on a guanine base and a hydrogen bond to an adenine. From these 2000, a prioritized set of 129 compounds was sent for FRET and NMR analysis. The screening campaign yielded 34 compounds with NMR NOEs, apparent A-site affinities (Ki,app), or both. All 34 compounds are reported in the primary citation; Scheme 13 depicts the structures (74–80) that showed good affinity in combination with NOEs, of which compound 74 was of greatest interest to the team.

[Scheme 13: A-site binders 74–80]
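The multistage selection used by the Vernalis team can be pictured as a funnel of progressively more expensive filters. The sketch below is a hypothetical illustration, not RiboDock: the cutoffs for molecular weight, rotatable bonds, and polar surface area are placeholders (the text does not give the values used), docking scores are supplied as precomputed numbers with lower (more negative) meaning better, and only the 7 kJ/mol conformer window is taken from the study.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Compound:
    name: str
    mol_weight: float          # Da
    rotatable_bonds: int
    polar_surface_area: float  # Å^2
    dock_score: float          # precomputed; lower = better (assumed convention)

def conformers_in_window(energies_kj: List[float], window: float = 7.0) -> List[int]:
    """Indices of conformers within `window` kJ/mol of the lowest-energy one."""
    e_min = min(energies_kj)
    return [i for i, e in enumerate(energies_kj) if e - e_min <= window]

def screening_funnel(compounds: List[Compound], max_mw: float = 500.0,
                     max_rotb: int = 8, max_psa: float = 140.0,
                     top_n: int = 2) -> List[str]:
    """Hypothetical cutoffs; returns names of compounds kept for inspection."""
    # Stage 1: cheap property filters shrink the library before docking.
    stage1 = [c for c in compounds
              if c.mol_weight <= max_mw and c.rotatable_bonds <= max_rotb]
    # Stage 2: cap polar surface area so a score biased toward very polar
    # compounds does not dominate the selection.
    stage2 = [c for c in stage1 if c.polar_surface_area <= max_psa]
    # Stage 3: rank survivors by docking score and keep the top N.
    ranked = sorted(stage2, key=lambda c: c.dock_score)
    return [c.name for c in ranked[:top_n]]

# Hypothetical library entries.
library = [
    Compound("cpd_a", 320.0, 4, 90.0, -28.0),
    Compound("cpd_b", 610.0, 5, 80.0, -35.0),   # fails the MW filter
    Compound("cpd_c", 410.0, 6, 190.0, -40.0),  # best score, but too polar
    Compound("cpd_d", 380.0, 3, 110.0, -31.0),
    Compound("cpd_e", 290.0, 2, 60.0, -22.0),
]
print(screening_funnel(library))                    # ['cpd_d', 'cpd_a']
print(conformers_in_window([0.0, 3.5, 6.9, 12.0]))  # [0, 1, 2]
```

Applying the polar-surface-area cap before taking the top N (rather than after) keeps the shortlist size fixed; the original work does not state the order, so this is a design choice of the sketch.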

4.9 Aminoglycoside derivatives II

Researchers at Anadys Pharmaceuticals have used structure-based molecular modeling to develop several novel classes of antibiotics that bind the 30S subunit. Structures of aminoglycosides bound to the A-site served as seed scaffolds [107,108]. Initial synthetic exploration began from 2-deoxystreptamine (2-DOS; Scheme 14, compound 81), which had been observed to make key hydrogen bonds to the ribosomal decoding site [109]. The azepane-based series (compound 82) was designed to form directed hydrogen bonds between the ribosomal RNA and the azepane ring nitrogen and hydroxyl group. Docking studies and molecular dynamics simulations exploring the azepane conformational space influenced the design, but methodological details were not reported. Several compounds in the series inhibited translation but showed only weak antibacterial activity.
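"Directed" hydrogen bonds of the kind sought in the azepane design are usually judged geometrically in modeling work. A minimal sketch follows, assuming one common convention (a donor–acceptor distance of at most 3.5 Å and a donor–H···acceptor angle of at least 120°); these cutoffs and the coordinates are illustrative assumptions, not values reported for the azepane series.

```python
import math
from typing import Sequence

def angle_deg(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-b-c in degrees, with the vertex at b."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_t = dot / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def is_hbond(donor, hydrogen, acceptor,
             max_da: float = 3.5, min_angle: float = 120.0) -> bool:
    """Geometric H-bond test (assumed cutoffs): donor-acceptor distance
    <= max_da (Å) and donor-H...acceptor angle >= min_angle (degrees)."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)

# Hypothetical coordinates (Å): ring N-H donor and an RNA acceptor atom.
n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hbond(n, h, (2.9, 0.0, 0.0)))  # True: linear and 2.9 Å apart
print(is_hbond(n, h, (1.0, 2.0, 0.0)))  # False: N-H...A angle is 90°
```

A pose scored only on distance would count the second geometry as a contact; adding the angle term is what makes the interaction "directed".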

[Scheme 14: 2-deoxystreptamine (81), azepane-based compound 82, and DAPT-based compound 83]

Work at Anadys continued on an additional scaffold based on a 3,5-diaminopiperidinyl triazine (DAPT) [110,111]. Starting from the 2-DOS scaffold, structure-based efforts showed that the DAPT core effectively placed functionality in the desired locations of the A-site. Compound 83 displayed in vitro antibacterial activity against both Gram-positive and Gram-negative bacteria but was also toxic to human cell lines. The compound was active in an in vivo mouse protection model.

5. CONCLUDING REMARKS

Solving high-resolution crystal structures of various ribosome–antibiotic complexes in the laboratories of Ramakrishnan, Steitz, Yonath, Moore, Noller, and Cate, among others, has led to a revolution in antibiotic design. This review captures ways in which molecular modeling tools have been combined with crystallography to influence antibiotic drug design. These examples can serve as teaching tools for students interested in how these fields are applied in practice to real-life problems such as the rise of increasingly resistant bacterial infections. Designing ligands for proteins and for nucleic acids involves similar themes, including fundamental physical interactions such as hydrogen bonding and base stacking. Antibiotics, often derived from natural products, may have shifted values of molecular properties, but the examples above indicate that oral absorption is still achievable when design is guided by chemical property predictors. The Nobel Prize was awarded in part to recognize the significant contribution of a detailed atomistic picture of the ribosome, and these structures have allowed drug design to capitalize on previously unknown interactions. The field is poised to use these structures for more de novo design targeting new binding pockets on the ribosome, which would hopefully lead to novel antibiotics effective against resistant bacteria. A note about references: several citations are to posters presented at recent meetings, which are available on the web page of Rib-X Pharmaceuticals.
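The best-known chemical property heuristic of this kind is the rule of five of Lipinski et al. [86]. The sketch below counts how many of its four limits a compound exceeds and raises an alert when more than one is exceeded. The property values are hypothetical inputs (in practice they would come from a property calculator), and this coarse filter is only an illustration, not one of the property models used in the work reviewed here.

```python
from typing import Dict, List

# Rule-of-five limits (Lipinski et al. [86]); a property counts as a
# violation when it exceeds its limit.
RULE_OF_FIVE = {
    "mol_weight": 500.0,     # molecular weight, Da
    "clogp": 5.0,            # calculated log P
    "h_bond_donors": 5,      # count of OH and NH groups
    "h_bond_acceptors": 10,  # count of N and O atoms
}

def ro5_violations(props: Dict[str, float]) -> List[str]:
    """Names of the rule-of-five properties that exceed their limits."""
    return [name for name, limit in RULE_OF_FIVE.items() if props[name] > limit]

def absorption_alert(props: Dict[str, float]) -> bool:
    """True when more than one limit is exceeded, the point at which the
    rule flags poor absorption or permeation as more likely."""
    return len(ro5_violations(props)) > 1

# Hypothetical property values, not measured data for any real compound.
macrolide_like = {"mol_weight": 748.0, "clogp": 2.8,
                  "h_bond_donors": 5, "h_bond_acceptors": 14}
fragment_like = {"mol_weight": 250.0, "clogp": 1.2,
                 "h_bond_donors": 2, "h_bond_acceptors": 4}

print(ro5_violations(macrolide_like), absorption_alert(macrolide_like))
# ['mol_weight', 'h_bond_acceptors'] True
print(ro5_violations(fragment_like), absorption_alert(fragment_like))
# [] False
```

A natural-product-like antibiotic can trip this alert and still reach adequate oral exposure, which is why such rules serve as guidance during design rather than as hard cutoffs.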


ACKNOWLEDGMENTS

The author would like to thank Tim Blizzard, Viktor Hornak, Daniel McMasters, and George Shields for critical readings of the manuscript, and colleagues from Rib-X for helpful discussions.

REFERENCES

1. Fanwick, P.E. In Annual Reports in Computational Chemistry (eds D.C. Spellmeyer and R.A. Wheeler), Vol. 3, Elsevier, Amsterdam, 2007, p. 85.
2. Hill, W.E., Dahlberg, A., Garrett, R.A., Moore, P.B., Schlessinger, D., and Warner, J.R. (eds). The Ribosome: Structure, Function, and Evolution, American Society for Microbiology, Washington, DC, 1990.
3. Schultz, S.C., Shields, G.C., Steitz, T.A. Crystallization of Escherichia coli catabolite gene activator protein with its DNA binding site. The use of modular DNA. J. Mol. Biol. 1990, 213(1), 159.
4. Schultz, S.C., Shields, G.C., Steitz, T.A. Crystal structure of a CAP-DNA complex: The DNA is bent by 90 degrees. Science 1991, 253(5023), 1001.
5. Ban, N., Nissen, P., Hansen, J., Moore, P.B., Steitz, T.A. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289(5481), 905.
6. Brodersen, D.E., Clemons, W.M. Jr., Carter, A.P., Morgan-Warren, R.J., Wimberly, B.T., Ramakrishnan, V. The structural basis for the action of the antibiotics tetracycline, pactamycin, and hygromycin B on the 30S ribosomal subunit. Cell 2000, 103(7), 1143.
7. Wimberly, B.T., Brodersen, D.E., Clemons, W.M. Jr., Morgan-Warren, R.J., Carter, A.P., Vonrhein, C., Hartsch, T., Ramakrishnan, V. Structure of the 30S ribosomal subunit. Nature 2000, 407(6802), 327.
8. Bashan, A., Agmon, I., Zarivach, R., Schluenzen, F., Harms, J., Berisio, R., Bartels, H., Franceschi, F., Auerbach, T., Hansen, H.A., Kossoy, E., Kessler, M., Yonath, A. Structural basis of the ribosomal machinery for peptide bond formation, translocation, and nascent chain progression. Mol. Cell 2003, 11(1), 91.
9. Schlunzen, F., Zarivach, R., Harms, J., Bashan, A., Tocilj, A., Albrecht, R., Yonath, A., Franceschi, F. Structural basis for the interaction of antibiotics with the peptidyl transferase centre in eubacteria. Nature 2001, 413(6858), 814.
10. Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F., Yonath, A. Structure of functionally activated small ribosomal subunit at 3.3 Angstroms resolution. Cell 2000, 102(5), 615.
11. Korostelev, A., Laurberg, M., Noller, H.F. Multistart simulated annealing refinement of the crystal structure of the 70S ribosome. Proc. Natl. Acad. Sci. USA 2009, 106(43), 18195.
12. Yusupov, M.M., Yusupova, G.Z., Baucom, A., Lieberman, K., Earnest, T.N., Cate, J.H., Noller, H.F. Crystal structure of the ribosome at 5.5 Å resolution. Science 2001, 292(5518), 883.
13. Korostelev, A., Trakhanov, S., Laurberg, M., Noller, H.F. Crystal structure of a 70S ribosome-tRNA complex reveals functional interactions and rearrangements. Cell 2006, 126(6), 1065.
14. Schuwirth, B.S., Borovinskaya, M.A., Hau, C.W., Zhang, W., Vila-Sanjurjo, A., Holton, J.M., Cate, J.H. Structures of the bacterial ribosome at 3.5 Å resolution. Science 2005, 310(5749), 827.
15. Selmer, M., Dunham, C.M., Murphy, F.V.T., Weixlbaumer, A., Petry, S., Kelley, A.C., Weir, J.R., Ramakrishnan, V. Structure of the 70S ribosome complexed with mRNA and tRNA. Science 2006, 313(5795), 1935.
16. Moore, P.B., Steitz, T.A. The structural basis of large ribosomal subunit function. Annu. Rev. Biochem. 2003, 72, 813.
17. Yonath, A. High-resolution structures of large ribosomal subunits from mesophilic eubacteria and halophilic archaea at various functional states. Curr. Protein Pept. Sci. 2002, 3(1), 67.
18. Yonath, A. The search and its outcome: High-resolution structures of ribosomal particles from mesophilic, thermophilic, and halophilic bacteria at various functional states. Annu. Rev. Biophys. Biomol. Struct. 2002, 31, 257.
19. Nissen, P., Hansen, J., Ban, N., Moore, P.B., Steitz, T.A. The structural basis of ribosome activity in peptide bond synthesis. Science 2000, 289(5481), 920.
20. Hansen, J.L., Ippolito, J.A., Ban, N., Nissen, P., Moore, P.B., Steitz, T.A. The structures of four macrolide antibiotics bound to the large ribosomal subunit. Mol. Cell 2002, 10(1), 117.
21. Hansen, J.L., Moore, P.B., Steitz, T.A. Structures of five antibiotics bound at the peptidyl transferase center of the large ribosomal subunit. J. Mol. Biol. 2003, 330(5), 1061.
22. Hermann, T. Drugs targeting the ribosome. Curr. Opin. Struct. Biol. 2005, 15(3), 355.
23. Schlunzen, F., Harms, J.M., Franceschi, F., Hansen, H.A., Bartels, H., Zarivach, R., Yonath, A. Structural basis for the antibiotic activity of ketolides and azalides. Structure 2003, 11(3), 329.
24. Takashima, H. Structural consideration of macrolide antibiotics in relation to the ribosomal interaction and drug design. Curr. Top. Med. Chem. 2003, 3(9), 991.
25. Tenson, T., Mankin, A. Antibiotics and the ribosome. Mol. Microbiol. 2006, 59(6), 1664.
26. Wright, G.D. Resisting resistance: New chemical strategies for battling superbugs. Chem. Biol. 2000, 7(6), R127.
27. Bush, K., Macielag, M., Clancy, J. “Superbugs”: New antibacterials in the pipeline. Emerg. Drugs 2000, 5(4), 347.
28. Croft, A.C., D’Antoni, A.V., Terzulli, S.L. Update on the antibacterial resistance crisis. Med. Sci. Monit. 2007, 13(6), RA103.
29. Fukuda, Y. New approaches to overcoming bacterial resistance. Drugs Future 2009, 34(2), 127.
30. Projan, S.J., Bradford, P.A. Late stage antibacterial drugs in the clinical pipeline. Curr. Opin. Microbiol. 2007, 10(5), 441.
31. Wimberly, B.T. The use of ribosomal crystal structures in antibiotic drug design. Curr. Opin. Investig. Drugs 2009, 10(8), 750.
32. Afshar, M., Prescott, C.D., Varani, G. Structure-based and combinatorial search for new RNA-binding drugs. Curr. Opin. Biotechnol. 1999, 10(1), 59.
33. Foloppe, N., Matassova, N., Aboul-Ela, F. Towards the discovery of drug-like RNA ligands? Drug Discov. Today 2006, 11(21–22), 1019.
34. Gallego, J., Varani, G. Targeting RNA with small-molecule drugs: Therapeutic promise and chemical challenges. Acc. Chem. Res. 2001, 34(10), 836.
35. Hermann, T. Strategies for the design of drugs targeting RNA and RNA-protein complexes. Angew. Chem. Int. Ed. Engl. 2000, 39(11), 1890.
36. Hermann, T., Westhof, E. RNA as a drug target: Chemical, modelling, and evolutionary tools. Curr. Opin. Biotechnol. 1998, 9(1), 66.
37. Lagoja, I.M., Herdewijn, P. Use of RNA in drug design. Expert Opin. Drug Discov. 2007, 2(6), 889.
38. Pearson, N.D., Prescott, C.D. RNA as a drug target. Chem. Biol. 1997, 4(6), 409.
39. Sucheck, S.J., Greenberg, W.A., Tolbert, T.J., Wong, C.H. Design of small molecules that recognize RNA: Development of aminoglycosides as potential antitumor agents that target oncogenic RNA sequences. Angew. Chem. Int. Ed. Engl. 2000, 39(6), 1080.
40. Sucheck, S.J., Wong, C.H. RNA as a target for small molecules. Curr. Opin. Chem. Biol. 2000, 4(6), 678.
41. Thomas, J.R., Hergenrother, P.J. Targeting RNA with small molecules. Chem. Rev. 2008, 108(4), 1171.
42. Wilson, W.D., Li, K. Targeting RNA with small molecules. Curr. Med. Chem. 2000, 7(1), 73.
43. Drysdale, M.J., Lentzen, G., Matassova, N., Murchie, A.I.H., Aboul-Ela, F., Afshar, M. In Progress in Medicinal Chemistry (eds F.D. King and A.W. Oxford), Vol. 39, Elsevier, Amsterdam, 2002, p. 73.
44. Berger, M., Wu, Y., Ogawa, A.K., McMinn, D.L., Schultz, P.G., Romesberg, F.E. Universal bases for hybridization, replication and chain termination. Nucleic Acids Res. 2000, 28(15), 2911.
45. Herdewijn, P. (ed). Modified Nucleosides, Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2008.
46. Ichikawa, S., Matsuda, A. Nucleoside natural products and related analogs with potential therapeutic properties as antibacterial and antiviral agents. Expert Opin. Ther. Pat. 2007, 17(5), 487.
47. Isono, K. Nucleoside antibiotics: Structure, biological activity, and biosynthesis. J. Antibiot. (Tokyo) 1988, 41(12), 1711.
48. Isono, K. Current progress on nucleoside antibiotics. Pharmacol. Ther. 1991, 52(3), 269.
49. Loakes, D. Survey and summary: The applications of universal DNA base analogues. Nucleic Acids Res. 2001, 29(12), 2437.
50. Chu, C.K. (ed). Antiviral Nucleosides: Chiral Synthesis and Chemotherapy, Elsevier, Amsterdam, 2003.
51. Friesner, R.A., Murphy, R.B., Repasky, M.P., Frye, L.L., Greenwood, J.R., Halgren, T.A., Sanschagrin, P.C., Mainz, D.T. Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 2006, 49(21), 6177.
52. Leach, A.R., Shoichet, B.K., Peishoff, C.E. Prediction of protein-ligand interactions. Docking and scoring: Successes and gaps. J. Med. Chem. 2006, 49(20), 5851.
53. Lyne, P.D., Lamb, M.L., Saeh, J.C. Accurate prediction of the relative potencies of members of a series of kinase inhibitors using molecular docking and MM-GBSA scoring. J. Med. Chem. 2006, 49(16), 4805.
54. Moitessier, N., Therrien, E., Hanessian, S. A method for induced-fit docking, scoring, and ranking of flexible ligands. Application to peptidic and pseudopeptidic beta-secretase (BACE 1) inhibitors. J. Med. Chem. 2006, 49(20), 5885.
55. Muegge, I. PMF scoring revisited. J. Med. Chem. 2006, 49(20), 5895.
56. Pham, T.A., Jain, A.N. Parameter estimation for scoring protein-ligand interactions using negative training data. J. Med. Chem. 2006, 49(20), 5856.
57. Warren, G.L., Andrews, C.W., Capelli, A.M., Clarke, B., LaLonde, J., Lambert, M.H., Lindvall, M., Nevins, N., Semus, S.F., Senger, S., Tedesco, G., Wall, I.D., Woolven, J.M., Peishoff, C.E., Head, M.S. A critical assessment of docking programs and scoring functions. J. Med. Chem. 2006, 49(20), 5912.
58. Yang, C.Y., Wang, R., Wang, S. M-score: A knowledge-based potential scoring function accounting for protein atom mobility. J. Med. Chem. 2006, 49(20), 5903.
59. Moitessier, N., Westhof, E., Hanessian, S. Docking of aminoglycosides to hydrated and flexible RNA. J. Med. Chem. 2006, 49(3), 1023.
60. Leclerc, F., Karplus, M. MCSS-based predictions of RNA binding sites. Theor. Chem. Acc. 1999, 101, 131.
61. Morley, S.D., Afshar, M. Validation of an empirical RNA-ligand scoring function for fast flexible docking using RiboDock. J. Comput. Aided Mol. Des. 2004, 18(3), 189.
62. Perez-Montoto, L.G., Santana, L., Gonzalez-Diaz, H. Scoring function for DNA-drug docking of anticancer and antiparasitic compounds based on spectral moments of 2D lattice graphs for molecular dynamics trajectories. Eur. J. Med. Chem. 2009, 44(11), 4461.
63. Pfeffer, P., Gohlke, H. DrugScoreRNA: Knowledge-based scoring function to predict RNA-ligand interactions. J. Chem. Inf. Model. 2007, 47(5), 1868.
64. Robertson, T.A., Varani, G. An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins 2007, 66(2), 359.
65. Zhao, X., Liu, X., Wang, Y., Chen, Z., Kang, L., Zhang, H., Luo, X., Zhu, W., Chen, K., Li, H., Wang, X., Jiang, H. An improved PMF scoring function for universally predicting the interactions of a ligand with protein, DNA, and RNA. J. Chem. Inf. Model. 2008, 48(7), 1438.
66. Charifson, P.S., Grossman, T.H., Mueller, P. The use of structure-guided design to discover new anti-microbial agents: Focus on antibacterial resistance. Anti-Infective Agents Med. Chem. 2009, 8, 73.
67. Franceschi, F. Back to the future: The ribosome as an antibiotic target. Future Microbiol. 2007, 2, 571.
68. Franceschi, F., Duffy, E.M. Structure-based drug design meets the ribosome. Biochem. Pharmacol. 2006, 71(7), 1016.
69. Knowles, D.J., Foloppe, N., Matassova, N.B., Murchie, A.I. The bacterial ribosome, a promising focus for structure-based drug design. Curr. Opin. Pharmacol. 2002, 2(5), 501.
70. Sutcliffe, J.A. Improving on nature: Antibiotics that target the ribosome. Curr. Opin. Microbiol. 2005, 8(5), 534.
71. Zheng, W., Borovinskaya, M.A., Pai, R.D., Cate, J.H., Doudna, H., Holton, J.M., In Structures of Aminoglycoside Antibiotics Bound to Both Subunits of the Bacterial Ribosome, WO, 2008, 148054.
72. Steitz, T.A., Moore, P.B., Ban, N., Nissen, P., Hansen, J., Sutcliffe, J., Oyelere, A., Ippolito, J.A., In Ribosome Structure and Protein Synthesis Inhibitors, EP, 2003, 1308457 A1.
73. Ippolito, J.A., Kanyo, Z.F., Wang, D., Franceschi, F.J., Moore, P.B., Steitz, T.A., Duffy, E.M. Crystal structure of the oxazolidinone antibiotic linezolid bound to the 50S ribosomal subunit. J. Med. Chem. 2008, 51(12), 3353.
74. Bhattacharjee, A., Chen, S., Farmer, J., Franceschi, F., Hammer, J., Hanselmann, R., Ippolito, J., Johnson, G., Kanyo, Z., Lawrence, L., McConnell, T., Sherer, E., Skripkin, E., Sutcliffe, J., Wang, D., Wimberly, B., Zhou, J., Duffy, E., De Vivo, M. Structure-based drug design targeting infectious diseases, in Gordon Research Conference: New Antibacterial Discovery and Development, Lucca, Italy, 2008.
75. Zhou, J., Bhattacharjee, A., Chen, S., Chen, Y., Duffy, E., Farmer, J., Goldberg, J., Hanselmann, R., Ippolito, J.A., Lou, R., Orbin, A., Oyelere, A., Salvino, J., Springer, D., Tran, J., Wang, D., Wu, Y., Johnson, G. Design at the atomic level: Generation of novel hybrid biaryloxazolidinones as promising new antibiotics. Bioorg. Med. Chem. Lett. 2008, 18(23), 6179.
76. Zhou, J., Bhattacharjee, A., Chen, S., Chen, Y., Duffy, E., Farmer, J., Goldberg, J., Hanselmann, R., Ippolito, J.A., Lou, R., Orbin, A., Oyelere, A., Salvino, J., Springer, D., Tran, J., Wang, D., Wu, Y., Johnson, G. Design at the atomic level: Design of biaryloxazolidinones as potent orally active antibiotics. Bioorg. Med. Chem. Lett. 2008, 18(23), 6175.
77. Jorgensen, W.L. The many roles of computation in drug discovery. Science 2004, 303(5665), 1813.
78. Jorgensen, W.L., Duffy, E.M. Prediction of drug solubility from Monte Carlo simulations. Bioorg. Med. Chem. Lett. 2000, 10(11), 1155.
79. Wang, D., Sherer, E., Duffy, E. A computational suite for the discovery of designer oxazolidinones suitable for IV and oral usage, in 45th Interscience Conference on Antimicrobial Agents and Chemotherapy, 2005.
80. Skripkin, E., McConnell, T.S., DeVito, J., Lawrence, L., Ippolito, J.A., Duffy, E.M., Sutcliffe, J., Franceschi, F. Rx-01, a new family of oxazolidinones that overcome ribosome-based linezolid resistance. Antimicrob. Agents Chemother. 2008, 52(10), 3550.
81. Bhattacharjee, A., Beard, H., DeVito, J., Farmer, J., Franceschi, F., Gould, T., Ippolito, J., Johnson, G., Kanyo, Z., Lawrence, L., Martynow, J., McConnell, T., Sherer, E., Sutcliffe, J., Tang, Y., Tow-Koegh, C., Wang, D., Wimberly, B., Wu, Y., Duffy, E. Structure-based drug design targeting infectious disease: Overcoming resistance and extending the antimicrobial spectrum of macrolide antibiotics, in Gordon Research Conference: Medicinal Chemistry, New London, NH, 2008.
82. Burak, E., Bortolon, E., Franceschi, F., Gould, T., Jing, H., Johnson, G., Kanyo, Z., Molstad, D., Wu, Y., Duffy, E. Enhanced macrolides: Optimized molecular properties for oral exposure, in 49th Interscience Conference on Antimicrobial Agents and Chemotherapy, San Francisco, CA, 2009.
83. Hanselmann, R., Job, G.E., Johnson, G., Lou, R., Martynow, J.G., Reeve, M.M. Synthesis of an antibacterial compound containing a 1,4-substituted 1H-1,2,3-triazole: A scaleable alternative to the “click” reaction. Org. Process Res. Dev. 2010, 14(1), 152.
84. Kanyo, Z., Bhattacharjee, A., Chen, S., Chen, Y., Dalton, J., DeVito, J., Farmer, J., Franceschi, F., Goldberg, J., Hanselmann, R., Ippolito, J.A., Johnson, G., Lawrence, L., Lou, R., McConnell, T.S., Orbin, A., Oyelere, A., Park, M., Salvino, J., Sherer, E.C., Sutcliffe, J., Tang, Y., Wang, D., Wu, Y., Duffy, E. Enhanced macrolides: Overcoming resistance by improving target affinity, in 49th Interscience Conference on Antimicrobial Agents and Chemotherapy, San Francisco, CA, 2009.
85. Franceschi, F., Kanyo, Z., Sherer, E.C., Sutcliffe, J. Macrolide resistance from the ribosome perspective. Curr. Drug Targets Infect. Disord. 2004, 4(3), 177.
86. Lipinski, C.A., Lombardo, F., Dominy, B.W., Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3.
87. Bodoor, K., Boyapati, V., Gopu, V., Boisdore, M., Allam, K., Miller, J., Treleaven, W.D., Weldeghiorghis, T., Aboul-ela, F. Design and implementation of an ribonucleic acid (RNA) directed fragment library. J. Med. Chem. 2009, 52(12), 3753.
88. Tor, Y. Targeting RNA with small molecules. ChemBioChem 2003, 4(10), 998.
89. Vicens, Q., Westhof, E. RNA as a drug target: The case of aminoglycosides. ChemBioChem 2003, 4(10), 1018.
90. Vicens, Q., Westhof, E. Crystal structure of paromomycin docked into the eubacterial ribosomal decoding A site. Structure 2001, 9(8), 647.
91. Pfister, P., Hobbie, S., Vicens, Q., Bottger, E.C., Westhof, E. The molecular basis for A-site mutations conferring aminoglycoside resistance: Relationship between ribosomal susceptibility and X-ray crystal structures. ChemBioChem 2003, 4(10), 1078.
92. Russell, R.J., Murray, J.B., Lentzen, G., Haddad, J., Mobashery, S. The complex of a designer antibiotic with a model aminoacyl site of the 30S ribosomal subunit revealed by X-ray crystallography. J. Am. Chem. Soc. 2003, 125(12), 3410.
93. Haddad, J., Kotra, L.P., Llano-Sotelo, B., Kim, C., Azucena, E.F. Jr., Liu, M., Vakulenko, S.B., Chow, C.S., Mobashery, S. Design of novel antibiotics that bind to the ribosomal acyltransfer site. J. Am. Chem. Soc. 2002, 124(13), 3229.
94. Davidovich, C., Bashan, A., Auerbach-Nevo, T., Yaggie, R.D., Gontarek, R.R., Yonath, A. Induced-fit tightens pleuromutilins binding to ribosomes and remote interactions enable their selectivity. Proc. Natl. Acad. Sci. USA 2007, 104(11), 4291.
95. Lolk, L., Pohlsgaard, J., Jepsen, A.S., Hansen, L.H., Nielsen, H., Steffansen, S.I., Sparving, L., Nielsen, A.B., Vester, B., Nielsen, P. A click chemistry approach to pleuromutilin conjugates with nucleosides or acyclic nucleoside derivatives and their binding to the bacterial ribosome. J. Med. Chem. 2008, 51(16), 4957.
96. Poulsen, S.M., Karlsson, M., Johansson, L.B., Vester, B. The pleuromutilin drugs tiamulin and valnemulin bind to the RNA at the peptidyl transferase centre on the ribosome. Mol. Microbiol. 2001, 41(5), 1091.
97. Schlunzen, F., Pyetan, E., Fucini, P., Yonath, A., Harms, J.M. Inhibition of peptide bond formation by pleuromutilins: The structure of the 50S ribosomal subunit from Deinococcus radiodurans in complex with tiamulin. Mol. Microbiol. 2004, 54(5), 1287.
98. Lentzen, G., Klinck, R., Matassova, N., Aboul-ela, F., Murchie, A.I. Structural basis for contrasting activities of ribosome binding thiazole antibiotics. Chem. Biol. 2003, 10(8), 769.
99. Bower, J., Drysdale, M., Hebdon, R., Jordan, A., Lentzen, G., Matassova, N., Murchie, A., Powles, J., Roughley, S. Structure-based design of agents targeting the bacterial ribosome. Bioorg. Med. Chem. Lett. 2003, 13(15), 2455.
100. Hajduk, P.J., Bures, M., Praestgaard, J., Fesik, S.W. Privileged molecules for protein binding identified from NMR-based screening. J. Med. Chem. 2000, 43(18), 3443.
101. Jhoti, H., Cleasby, A., Verdonk, M., Williams, G. Fragment-based screening using X-ray crystallography and NMR spectroscopy. Curr. Opin. Chem. Biol. 2007, 11(5), 485.
102. Fourmy, D., Recht, M.I., Blanchard, S.C., Puglisi, J.D. Structure of the A site of Escherichia coli 16S ribosomal RNA complexed with an aminoglycoside antibiotic. Science 1996, 274(5291), 1367.
103. Yu, L., Oost, T.K., Schkeryantz, J.M., Yang, J., Janowick, D., Fesik, S.W. Discovery of aminoglycoside mimetics by NMR-based screening of Escherichia coli A-site RNA. J. Am. Chem. Soc. 2003, 125(15), 4444.
104. Fourmy, D., Recht, M.I., Puglisi, J.D. Binding of neomycin-class aminoglycoside antibiotics to the A-site of 16S rRNA. J. Mol. Biol. 1998, 277(2), 347.
105. Pinto, I.G., Guilbert, C., Ulyanov, N.B., Stearns, J., James, T.L. Discovery of ligands for a novel target, the human telomerase RNA, based on flexible-target virtual screening and NMR. J. Med. Chem. 2008, 51(22), 7205.
106. Foloppe, N., Chen, I.J., Davis, B., Hold, A., Morley, D., Howes, R. A structure-based strategy to identify new molecular scaffolds targeting the bacterial ribosomal A-site. Bioorg. Med. Chem. 2004, 12(5), 935.
107. Carter, A.P., Clemons, W.M., Brodersen, D.E., Morgan-Warren, R.J., Wimberly, B.T., Ramakrishnan, V. Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature 2000, 407(6802), 340.
108. Ogle, J.M., Brodersen, D.E., Clemons, W.M. Jr., Tarry, M.J., Carter, A.P., Ramakrishnan, V. Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science 2001, 292(5518), 897.
109. Barluenga, S., Simonsen, K.B., Littlefield, E.S., Ayida, B.K., Vourloumis, D., Winters, G.C., Takahashi, M., Shandrick, S., Zhao, Q., Han, Q., Hermann, T. Rational design of azepane glycoside antibiotics targeting the bacterial ribosome. Bioorg. Med. Chem. Lett. 2004, 14(3), 713.
110. Zhou, Y., Gregor, V.E., Ayida, B.K., Winters, G.C., Sun, Z., Murphy, D., Haley, G., Bailey, D., Froelich, J.M., Fish, S., Webber, S.E., Hermann, T., Wall, D. Synthesis and SAR of 3,5-diaminopiperidine derivatives: Novel antibacterial translation inhibitors as aminoglycoside mimetics. Bioorg. Med. Chem. Lett. 2007, 17(5), 1206.
111. Zhou, Y., Gregor, V.E., Sun, Z., Ayida, B.K., Winters, G.C., Murphy, D., Simonsen, K.B., Vourloumis, D., Fish, S., Froelich, J.M., Wall, D., Hermann, T. Structure-guided discovery of novel aminoglycoside mimetics as antibacterial translation inhibitors. Antimicrob. Agents Chemother. 2005, 49(12), 4942.

Section 4

Nanotechnology

Section Editor: Luke E.K. Achenie, Department of Chemical Engineering, Virginia Polytechnic Institute, Blacksburg, Virginia 24061, USA

CHAPTER 10

Insights into the Role of Conformational Transitions and Metal Ion Binding in RNA Catalysis from Molecular Simulations

Tai-Sung Lee (1,2), George M. Giambaşu (1,2), and Darrin M. York (2)

Contents

1. Introduction
   1.1 Hammerhead ribozyme
   1.2 L1 Ligase ribozyme
2. Molecular Simulations of the Hammerhead Ribozyme
   2.1 Metal binding modes
   2.2 Simulations along the reaction coordinate
   2.3 Simulations of mutations of key residues
3. Molecular Simulations of the L1 Ligase
   3.1 Conformational variation of L1L occurs at dynamical hinge points
   3.2 The U38 loop responsible for allosteric control is intrinsically flexible
   3.3 Anatomy of the ligation site and implications for catalysis
4. Methods
5. Conclusion
Acknowledgments
References

Abstract

We present a summary of recent advances in the application of molecular simulation methods to study the mechanisms of RNA catalysis. The focus of this chapter is on the role of conformational transitions and metal ion binding in structure and activity. Two RNA enzyme systems are considered: the hammerhead ribozyme and the L1 ligase. The hammerhead ribozyme is a small archetype ribozyme that undergoes a conformational transition into a catalytically active conformation in a step that is concerted with changes in metal ion binding in the active site. The L1 ligase ribozyme is an in vitro selected ribozyme that uses a noncanonically base-paired ligation site to catalyze, regioselectively and regiospecifically, 5′-to-3′ phosphodiester bond ligation. The L1 ligase presumably undergoes a large-scale conformational change from an inactive to an active form that involves reorientation of one of its stems (a movement of the stem tip by around 80 Å), making it a novel catalytic riboswitch. Analysis of the simulation results and comparison with experimental measurements provide important new insight into the conformational and chemical steps of catalysis of the hammerhead and L1 ligase ribozymes.

Keywords: RNA; catalysis; molecular simulation; hammerhead ribozyme; L1 ligase ribozyme; Mg2+ ions

(1) Biomedical Informatics and Computational Biology, University of Minnesota, Minneapolis, MN, USA
(2) Department of Chemistry, University of Minnesota, Minneapolis, MN, USA

Annual Reports in Computational Chemistry, Volume 6, ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06010-X
© 2010 Elsevier B.V. All rights reserved.

1. INTRODUCTION

The original notion that the only function of RNA molecules was to serve as messenger intermediates in the pathway from the genetic code to protein synthesis has undergone a revolution in the past two and a half decades. The role of RNA in cellular function is now known to be considerably more diverse, ranging from regulation of gene expression and signaling pathways to catalysis of important biochemical reactions, including protein synthesis itself [1–7]. These discoveries have transformed our view of RNA from a simple messenger to a molecule far more central to the evolution of life, and our understanding and appreciation of this role are still in their infancy. Ultimately, the elucidation of the mechanisms of RNA catalysis will yield a wealth of new insights that will extend our understanding of biological processes and facilitate the design of new RNA-based technologies [8–10]. Atomic-level simulations of biological systems can potentially provide access to detailed mechanistic information that aids the interpretation of experiments and provides predictive insight for drug design and therapeutic efforts [11].

A quantum mechanical (QM) description is ultimately required for the reliable study of chemical reactions, including reactions catalyzed by biological macromolecules such as RNA, but a high-level, fully QM treatment of these systems in molecular simulations is not yet feasible. A practical approach involves so-called "multiscale models," in which only a small, typically localized region of the system is treated with the most computationally costly QM methods. The term multiscale model here implies the integration of a hierarchy of models that work together to provide a computationally tractable representation of a complex biochemical reaction in a realistic environment. As a specific example, for enzyme systems one typically treats the reactive chemical events with a sufficiently accurate high-level QM model, the microscopic solvent fluctuations and changes in molecular conformation with a molecular mechanical (MM) force field, and the macroscopic dielectric relaxation with a continuum solvation model. The simplest and most widely applied multiscale model for studying enzyme reactions is the combined QM/MM potential [12–16].

Simulations of RNA catalysis pose particular challenges not encountered for most other biological systems, such as protein enzymes. RNA molecules are highly negatively charged and exhibit strong and often specific interactions with solvent [11,17–19]. This requires special attention to the microscopic in silico model, which must include a very large number of solvent molecules, counterions, and co-ions. Electrostatic interactions need to be treated rigorously without cutoffs, and long simulation times are typically needed to ensure that the ion environment is properly equilibrated [20–22]. These issues are further complicated by the fact that RNA molecules bind divalent metal ions that play an important role in folding and, in many instances, also contribute actively to the catalytic chemical steps. The highly charged nature of RNA and its interaction with divalent metal ions and other solvent components make the inclusion of explicit electronic polarization in the molecular models much more important than in typical protein enzyme systems. The chemistry involved in reactions of prototype ribozymes, such as cleavage transesterification, involves large changes in local charge state and hybridization around phosphorus, which heightens the need for QM/MM methods that can reliably model hypervalent states of phosphorus. There is also a need for new models that avoid assigning "atom-type" parameters to the QM system in order to compute QM/MM interactions, because the "atom type" can change as a reaction proceeds. Finally, there is a growing precedent that many ribozyme reactions may involve large changes in conformation and metal ion binding along the reaction coordinate. This creates the need for extremely fast semiempirical quantum models that can be applied in conjunction with long-time simulations to adequately sample relevant configurations and to construct multidimensional free-energy surfaces along multiple reaction coordinates.
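For orientation, the combined QM/MM potential mentioned above is often written in the additive, electrostatic-embedding form below. This is a generic sketch in atomic units, not necessarily the exact Hamiltonian used in the work summarized in this chapter: $i$ runs over QM electrons, $A$ over QM nuclei with charges $Z_A$, and $J$ over MM atoms with partial charges $q_J$. The van der Waals coupling term is where atom-type parameters for the QM atoms typically enter, which is the dependence the text identifies as problematic when atom types change during a reaction.

```latex
E = \langle \Psi \,|\, \hat{H}_{\mathrm{QM}} + \hat{H}_{\mathrm{QM/MM}} \,|\, \Psi \rangle + E_{\mathrm{MM}},
\qquad
\hat{H}_{\mathrm{QM/MM}} = -\sum_{i}\sum_{J} \frac{q_J}{|\mathbf{r}_i - \mathbf{R}_J|}
  + \sum_{A}\sum_{J} \frac{Z_A\, q_J}{|\mathbf{R}_A - \mathbf{R}_J|}
  + E^{\mathrm{QM/MM}}_{\mathrm{vdW}}
```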

1.1 Hammerhead ribozyme

The hammerhead ribozyme (HHR) [23,24] is an archetype system for studying RNA catalysis [25,26]. The HHR catalyzes the site-specific attack of an activated 2′-OH nucleophile on the adjacent 3′-phosphate, resulting in cleavage of the P–O5′ phosphodiester linkage to form a 2′,3′-cyclic phosphate and a 5′-alcohol. A detailed understanding of the structure–function relationships in the HHR [6,24] will ultimately aid in the understanding of other cellular RNA catalysts such as the ribosome. The HHR has gained attention as a potential anti-HIV-1 therapeutic agent [27,28], an inhibitor of BCR-ABLi1 gene expression [29], an inhibitor of hepatitis B virus gene expression [30,31], and a tool in drug design and target discovery for other diseases [9,32]. Very recently, a discontinuous HHR motif has been found embedded in the 3′-untranslated regions of a mammalian messenger RNA, suggesting a possible role in posttranscriptional gene regulation [33]. However, the detailed reaction mechanism of the HHR is still elusive despite significant experimental and theoretical work [24,26,34,35].


One aspect of the catalytic mechanism that has perplexed the community involves the specific role of divalent metal ions in catalysis. Specifically, one of the main puzzles is the apparent inconsistency between the interpretation of thio substitution [36,37] and mutational [34] experiments on the one hand and the available crystallographic structural information for the minimal hammerhead sequence [38–40] on the other. Biochemical experiments have been interpreted to suggest that a pH-dependent conformational change must precede or be concomitant with the catalytic chemical step, including a possible metal ion bridge between the A9 and scissile phosphates. This is inconsistent with crystallographic data for the minimal hammerhead motif [38–40], in which the A9 and scissile phosphates are found to be ~20 Å apart. Moreover, the function of the 2′-OH group of G8 remains unclear from these data [6,24]. Recent crystallographic studies of a full-length HHR have characterized the ground-state active site architecture [41] and its solvent structure [42], including the binding mode of a presumed catalytically active divalent metal ion in the active site. These findings, together with molecular simulation studies [43–46], have reconciled a long-standing controversy between structural and biochemical studies of this system [47].

1.2 L1 Ligase ribozyme

The L1 ligase (L1L) ribozyme, “biology’s first enzyme” [48], is an in vitro selected ribozyme that uses a noncanonically base-paired ligation site to catalyze, regioselectively and regiospecifically, 5′-to-3′ phosphodiester bond ligation, a reaction relevant to origin-of-life hypotheses that invoke an RNA World scenario. No known naturally occurring ribozyme catalyzes this phosphodiester assembly reaction. The concern that RNA might be inherently incapable of catalyzing this reaction was put to rest in 1993 with the first in vitro evolution of a ribozyme ligase [49]. Subsequently, several other ribozyme ligases have been produced using in vitro selection techniques [50–57], including a small subset that specifically catalyze formation of the regiospecific 3′-to-5′ phosphodiester linkages characteristic of all extant RNA and DNA polymerases. L1L [55] is one such example and is unusual in that it uses an intrinsically flexible, noncanonically base-paired ligation site [51,58]. In addition to its relevance to the origin of life, the L1L ribozyme, presumably as a fortuitous consequence of in vitro selection, functions as an allosteric ribozyme molecular switch [1,59,60]. The L1L can be further engineered so that the function of its derivatives is controlled by small molecules, peptides, or even proteins, creating new artificial allosteric ribozymes [55,61–63].

The crystal structure of the L1L ligation product [64] reveals two crystallographically independent conformations in the same asymmetric unit, verifying its postulated intrinsic flexibility. The two conformers differ in the orientation of one of the stems, whose tip moves by an arc length of around 80 Å [64]. Based on the presence or absence of specific contacts between the ligation site and some distant, evolutionarily conserved regions, it was proposed that the conformers represent catalytically active “on” and inactive “off” states [64].


Recently, we have identified the dynamical hinge points of the L1L ribozyme using large-scale molecular dynamics (MD) simulations [65]. Starting from an analysis of the two crystallized conformers, we have shown, using over 600 ns of MD simulations, that the transition between the on and off conformational states can be almost entirely described by changes in only four virtual torsion angles. Virtual torsions are defined along the virtual bonds between C4′ and P atoms and have been shown to discriminate between major RNA folds [66,67].

In this chapter, we summarize our recent efforts to unveil the detailed mechanisms of HHR and L1L catalysis using MD simulations and hybrid QM/MM calculations. For the HHR, emphasis is placed on the characterization of metal binding modes at different stages along the reaction coordinate, the role of metal ion interactions in structure and activity, and the origin of mutational effects on catalysis. For the L1L ribozyme, the focus is on the identification of dynamical hinge points and the characterization of the conformational transition between catalytically active “on” and inactive “off” states, including induced unfolding resulting from removal of structural divalent metal ions. Together, these studies provide deeper insight into the role of conformational transitions and metal ion binding in RNA catalysis.
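The chapter does not specify which four virtual torsions were used. As a hedged illustration, pseudo-torsions over the C4′–P virtual bonds are commonly defined in the η/θ convention as η(i) = C4′(i−1)–P(i)–C4′(i)–P(i+1) and θ(i) = P(i)–C4′(i)–P(i+1)–C4′(i+1). The sketch below assumes that convention and computes signed dihedral angles from Cartesian coordinates with NumPy; the function names are hypothetical and the coordinates are toy values.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, range [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1          # first bond, pointing back toward p0
    b1 = p2 - p1          # central bond (the rotation axis)
    b2 = p3 - p2          # last bond
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def eta_theta(c4_prev, p_i, c4_i, p_next, c4_next):
    """Pseudo-torsions of nucleotide i in the (assumed) eta/theta convention.

    eta_i   = C4'(i-1) - P(i)   - C4'(i) - P(i+1)
    theta_i = P(i)     - C4'(i) - P(i+1) - C4'(i+1)
    """
    eta = dihedral(c4_prev, p_i, c4_i, p_next)
    theta = dihedral(p_i, c4_i, p_next, c4_next)
    return eta, theta


# Toy coordinates (Angstrom), not taken from any structure
eta, theta = eta_theta([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1], [0, 1, 2])
print(round(eta, 1), round(abs(theta), 1))
```

Tracking such angles frame by frame along a trajectory gives the kind of low-dimensional description of the on/off transition discussed above.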

2. MOLECULAR SIMULATIONS OF THE HAMMERHEAD RIBOZYME

In this section, we examine metal ion binding modes in the HHR at various stages of progression along the reaction coordinate. Detailed analysis of these binding modes affords insight into the role of divalent metal ions in catalysis. Moreover, long-time MD simulations allow characterization of the electrostatic environment of the HHR and of its ability to recruit cationic charge that, once a threshold occupancy is achieved, leads to formation of catalytically active conformations. Further study of HHR mutations provides an atomic-level interpretation of the origin of mutational effects.

2.1 Metal binding modes

MD simulations were set up to explore the metal binding modes in the HHR and their relation to structure and catalysis. A recent joint crystallographic/molecular simulation study [42] of the solvent structure of the full-length HHR indicates that, in the reactant state prior to activation of the nucleophile, a Mn2+ ion is coordinated to the O2P atom of the A9 phosphate and the N7 atom of G10.1. This binding site is designated the “C-site.” An alternate binding site, designated the “B-site,” in which a divalent metal ion bridges A9:O2P and the scissile phosphate (C1.1:O2P), has been inferred from thio/rescue effects [36,37] and also predicted from molecular simulations [44,45]. In the absence of divalent metal ions, HHR activity can be recovered by high concentrations of monovalent ions [68–70]. The specific metal ion binding modes at different stages along the HHR reaction coordinate, and their relation to formation of catalytically active structures, have not yet been determined. Here we report results from a series of MD simulations that aim to provide atomic-level insight into these questions. To explore the divalent and monovalent metal ion binding modes (Figure 1) and their relation to formation of catalytically active, in-line attack conformations in both the neutral reactant and the activated precursor (deprotonated 2′-OH nucleophile) states, we set up the following series of simulations:

• RT-C-Mg, the reactant state with Mg2+ at the C-site.
• RT-B-Mg, the reactant state with Mg2+ at the bridging position.
• dRT-C-Mg, the activated precursor with Mg2+ at the C-site.
• dRT-B-Mg, the activated precursor with Mg2+ at the bridging position.
• RT-Na, the reactant state in the absence of Mg2+ (in the presence of NaCl only).
• dRT-Na, the activated precursor in the absence of Mg2+ (in the presence of NaCl only).

In the dRT-C-Mg simulation, which starts from the activated precursor with Mg2+ placed at the C-site, the Mg2+ ion quickly (<200 ps) migrates into the B-site, as observed in a previous simulation study [44], and afterward behaves nearly identically to the case in which the Mg2+ ion was initially placed at the B-site (dRT-B-Mg). Hence, we only extended the dRT-B-Mg simulation to 300 ns and designate it simply as dRT-Mg. All five simulations were carried out to 300 ns in a background of 0.14 M NaCl. Equilibration of each simulation was monitored by the root mean square deviation (RMSD), which reached a steady state after 30–50 ns. Hence, all analyses were performed over the last 250 ns of each trajectory.

Figure 1 Schematic view of the coordination sites in the hammerhead ribozyme active site. Upper left: the coordination pattern of Mg2+ in the C-site, coordinated to G10.1:N7 and A9:O2P. Upper right: the coordination pattern of Mg2+ in the B-site, bridging A9:O2P and C1.1:O2P of the scissile phosphate. Lower: coordination sites for Na+ in the hammerhead ribozyme active site found in the RT-Na and dRT-Na simulations. Red numbers next to the coordination sites are the scores used to calculate the coordination index (see text). M1 involves direct binding to A9:O2P and C1.1:O2P and indirect binding to G10.1:N7 through a water molecule. M2 involves direct binding to C17:O2′ and C1.1:O2P. M3 involves direct binding to C17:O2′ and is positioned toward the outside of the active site. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)
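Equilibration was judged from the RMSD reaching a steady state, and analysis was restricted to the last 250 ns of each 300 ns run. A minimal sketch of both steps follows, assuming that RMSD is computed after optimal rigid-body superposition with the Kabsch algorithm (a common choice the chapter does not specify) and that frames are saved every 10 ps (the sampling interval quoted for Table 1). The function names `kabsch_rmsd` and `production_frames` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(coords, ref):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(coords, dtype=float)
    Q = np.asarray(ref, dtype=float)
    P = P - P.mean(axis=0)                  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                             # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would indicate a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                      # optimal proper rotation of P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def production_frames(n_frames, dt_ps=10.0, keep_ns=250.0):
    """Frame indices for the trailing keep_ns of a trajectory saved every dt_ps."""
    n_keep = int(round(keep_ns * 1000.0 / dt_ps))
    return range(max(0, n_frames - n_keep), n_frames)


# Usage with synthetic data: a rotated and translated copy has RMSD ~ 0
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
c, s = np.cos(0.3), np.sin(0.3)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(moved, ref))
frames = production_frames(30_000)  # e.g. 300 ns saved every 10 ps
print(frames[0], len(frames))
```

The sign correction `d` keeps the fitted transformation a proper rotation, so a mirror-image structure is not spuriously superimposed.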

2.1.1 A bridging Mg2+ ion maintains rigid coordination patterns that stabilize in-line attack conformations

In this section, we compare the effect of different Mg2+ binding modes on the active site structure and fluctuations in both the neutral reactant and the activated (deprotonated 2′-OH) precursor states. Table 1 lists the averages of key in-line indexes, the A9/scissile phosphate–phosphate distances, and the Mg2+ coordination distances for the RT-C-Mg, RT-B-Mg, and dRT-Mg simulations. Figure 1 shows a general schematic view of the active site metal ion coordination from the simulations. The distances and standard deviations indicate that the Mg2+ ion retains rigid coordination with the phosphate oxygens over the course of the simulation, being directly coordinated to A9:O2P in all simulations. In the RT-C-Mg simulation, the Mg2+ ion coordinates G10.1:N7 indirectly through one of four inner-sphere water molecules. However, this coordination pattern is not highly conducive to formation of an in-line attack conformation. The RT-B-Mg simulation, on the other hand, shows a more rigid Mg2+ coordination with both the A9 and scissile phosphate oxygens and sustains a considerable population of in-line attack conformations. These results suggest that the coordination pattern found in the RT-B-Mg simulation stabilizes in-line attack conformations more readily than Mg2+ binding at the C-site, as in the RT-C-Mg simulation. The dRT-Mg simulation resembles the RT-B-Mg simulation in exhibiting rigid coordination with the A9 and scissile phosphate oxygens and in stabilizing in-line attack conformations.

With the Mg2+ ion at the bridging position (RT-B-Mg and dRT-Mg simulations), interaction with G10.1:N7 is considerably reduced, which is compensated by interactions with C17:O2′ that occur through two water molecules in the inner sphere of the Mg2+ ion. This interaction is most pronounced in the dRT-Mg simulation, where C17:O2′ is deprotonated. In the ground-state reactant simulations with Mg2+ (RT-C-Mg and RT-B-Mg), no Na+ ions were observed to infiltrate the active site. In the activated precursor simulation, dRT-Mg, a single Na+ ion was observed to be bound at high occupancy to the deprotonated C17:O2′, in a manner similar to the M3 position in Figure 1.

Table 1 Characterization of the Mg2+ coordination in the active site

Simulation   R          θ            O–O        A9:O2P    C1.1:O2P   C17:O2′    G8:O2′     G10.1:N7
RT-C-Mg      4.01(34)   126.5(119)   4.14(49)   2.01(4)   4.40(30)   6.04(90)   5.76(46)   4.19(31)
RT-B-Mg      3.28(12)   151.2(79)    2.95(13)   2.02(5)   2.04(5)    4.25(24)   4.57(30)   4.38(25)
dRT-Mg       3.64(17)   155.0(80)    2.94(13)   2.01(4)   2.03(5)    3.76(17)   4.62(62)   5.05(26)

Analysis was performed over the last 250 ns (10 ps sampling interval). Distances and angles (Figure 1) are in Å and degrees, respectively. Standard deviations (SDs) are listed in parentheses in units of the last reported digit of the average (e.g., 4.01(34) denotes 4.01 ± 0.34 Å, and 126.5(119) denotes 126.5 ± 11.9°). R is the in-line attack distance (C17:O2′ to C1.1:P), and θ is the in-line attack angle (between C17:O2′, C1.1:P, and C1.1:O5′). O–O is the distance between A9:O2P and C1.1:O2P; the other distances are between the Mg2+ ion and the indicated ligand site.
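Table 1 defines R as the C17:O2′···C1.1:P distance and θ as the C17:O2′–C1.1:P–C1.1:O5′ angle, and reports each value as mean(SD) with the SD in units of the last digit. A minimal sketch of both computations and of reading the table notation follows. It assumes Cartesian coordinates in Å; `is_inline` uses only the >150° angle criterion that the text quotes as a ready-to-react value, so it is an illustrative simplification rather than the clustering criterion behind Table 2. All function names are hypothetical.

```python
import re

import numpy as np


def inline_geometry(o2_nuc, p_scissile, o5_leaving):
    """Return (R, theta): O2'...P distance (Angstrom) and O2'-P-O5' angle (degrees)."""
    o2, p, o5 = (np.asarray(x, dtype=float) for x in (o2_nuc, p_scissile, o5_leaving))
    a = o2 - p        # P -> attacking O2'
    b = o5 - p        # P -> leaving O5'
    r = float(np.linalg.norm(a))
    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))
    return r, theta


def is_inline(theta, theta_min=150.0):
    """Angle-only criterion based on the '>150 degrees' value quoted in the text."""
    return theta > theta_min


def parse_mean_sd(entry):
    """Parse Table 1 notation, e.g. '126.5(119)' -> (126.5, 11.9)."""
    m = re.fullmatch(r"\s*(-?\d+)(?:\.(\d+))?\((\d+)\)\s*", entry)
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    whole, frac, sd_digits = m.groups()
    decimals = len(frac) if frac else 0
    mean = float(f"{whole}.{frac}") if frac else float(whole)
    return mean, int(sd_digits) / 10 ** decimals  # SD is in units of the last digit


# Toy geometry: perfectly in-line attack from 3.2 Angstrom
r, theta = inline_geometry([0.0, 0.0, 3.2], [0.0, 0.0, 0.0], [0.0, 0.0, -1.6])
print(r, theta, is_inline(theta))
print(parse_mean_sd("4.01(34)"), parse_mean_sd("126.5(119)"))
```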

2.1.2 Specific Na+ binding patterns are correlated with formation of in-line attack conformations

In this section, we explore the monovalent metal ion binding modes that are correlated with formation of catalytically active in-line attack conformations. For the simulations with no Mg2+ ions in the active site (RT-Na and dRT-Na), binding of Na+ ions to the coordination sites varies more, and exchange events occur, giving rise to a fairly broad array of coordination patterns. To characterize the distribution and frequency of these coordination patterns, a binary-coded coordination index is used (Figure 1). This index is defined as follows: when a Na+ ion is within a cutoff distance (3.0 Å) of a ligand, it is classified as bound to that ligand and assigned a unique coordination score for binding to that particular ligand. The coordination scores for the four possible coordination sites (ligands) are 1 for G8:O2′, 2 for A9:O2P, 4 for C1.1:O2P, and 8 for C17:O2′. The coordination index of an ion is the sum of the coordination scores of all its bound sites. In this way, the coordination pattern of a Na+ ion is uniquely represented by a single number. For example, an index of 12 means that a Na+ ion coordinates directly to both C1.1:O2P and C17:O2′ simultaneously (4 + 8 = 12). Using this coordination index, the coordination patterns of Na+ ions in the active site can be traced as a time series over the course of the simulation, as shown in Figure 2. Distinct colors are also used to distinguish the individual Na+ ions present in the active site so that transitions between coordination patterns (indexes) can be monitored.

Figure 2 shows that in the RT-Na simulation, two Na+ ions (colored red and green) are present in the active site and, most of the time, both have a coordination index of 6, indicating simultaneous binding to A9:O2P and C1.1:O2P (2 + 4 = 6; see Figure 1). Hence, the two Na+ ions collectively act like a single bridging Mg2+ ion, holding the negatively charged A9 and scissile phosphates together and maintaining an in-line conformation. During the period from ~210 to 240 ns, only one Na+ ion (red) with a coordination index of 6 is present in the active site. During this period, the in-line angle drops suddenly from around 155° to 120°, and the in-line conformation is no longer held.

Figure 2 Plot of the in-line attack angle (O2′–P–O5′, in degrees) and the coordination index of Na+ ions for the dRT-Na (left) and RT-Na (right) simulations as a function of simulation time (ns). The coordination index is defined as follows: when a Na+ ion is within a 3.0 Å cutoff of a ligand, it is defined as bound to that ligand for indexing purposes. When an ion is bound, the scores for the four possible coordination sites are 1 for G8:O2′, 2 for A9:O2P, 4 for C1.1:O2P, and 8 for C17:O2′. The coordination index of a single Na+ ion is the sum of the scores of all its bound sites. Individual Na+ ions are tracked using different colors (red, green, blue, and yellow). Data from the last 250 ns are shown in steps of 500 ps. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)

In the dRT-Na simulation, the in-line angle is less well preserved than in the dRT-Mg simulation. This is consistent with the lower activity of the ribozyme in the absence of Mg2+. Nonetheless, there are several periods (e.g., 25–50 ns and 210–270 ns) in which an in-line conformation is visited, and again we observe a high correlation between the Na+ ion coordination index and the in-line conformation. When fewer than three Na+ ions bind to the active site ligands, which is the case for most of the simulation, the in-line conformation is not maintained. The in-line attack angle reaches a ready-to-react value (>150°) when three Na+ ions bind simultaneously to different ligand sites.

Figure 3 illustrates the different Na+ binding patterns in the dRT-Na simulation for cluster A (in-line conformations; lower panels) and cluster B (conformations that are not in-line; upper panels), as defined in Table 2. The in-line cluster A clearly exhibits three Na+ bridges, involving C17:O2′/C1.1:O2P, C1.1:O2P/G8:O2′, and C1.1:O2P/A9:O2P. In cluster B, on the other hand, the first two of these bridges are absent and the third is significantly less pronounced. This analysis suggests that neutralizing the negative charge of these three coordination sites, together with bridging Na+ binding patterns that bring them close to each other, is necessary to maintain the in-line conformation in the deprotonated activated precursor state, although these binding patterns are not as rigid as those of Mg2+.

Figure 3 Two-dimensional radial distribution functions of Na+ ions in the active site for the activated precursor simulation without Mg2+ in the active site (dRT-Na). The lower panels show results for cluster A, whose members are in active in-line conformations, and the upper panels show results for cluster B, whose members are not in-line (see Table 2). The axes are the distances (in Å) from Na+ to different metal coordination sites; each panel plots r(C1.1:O2P, Na) against r(G8:O2′, Na), r(A9:O2P, Na), or r(C17:O2′, Na). The green lines indicate the regions where Na+ ions are within 3.0 Å of both sites indicated by the axes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)
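The binary-coded coordination index described above can be sketched as follows. The scores and the 3.0 Å cutoff are taken from the text; the dictionary keys, the function names, and the (frames, ions, sites) array layout are assumptions made for illustration. Because each site owns a distinct power of two, the index is a bitmask and can be decoded unambiguously.

```python
import numpy as np

# Scores and cutoff from the text; each site owns one bit of the index.
SITE_SCORES = {"G8:O2'": 1, "A9:O2P": 2, "C1.1:O2P": 4, "C17:O2'": 8}
CUTOFF = 3.0  # Angstrom; an ion closer than this to a site counts as bound


def coordination_index(distances, cutoff=CUTOFF):
    """Index for one Na+ ion; distances maps site name -> distance in Angstrom."""
    return sum(score for site, score in SITE_SCORES.items()
               if distances.get(site, np.inf) < cutoff)


def decode_index(index):
    """Sites encoded by a coordination index (0-15)."""
    return [site for site, score in SITE_SCORES.items() if index & score]


def index_time_series(dist, cutoff=CUTOFF):
    """Vectorized version: dist has shape (n_frames, n_ions, 4), with the last
    axis ordered as in SITE_SCORES; returns integer indices (n_frames, n_ions)."""
    weights = np.array(list(SITE_SCORES.values()))
    return (np.asarray(dist) < cutoff).astype(int) @ weights


print(coordination_index({"C1.1:O2P": 2.4, "C17:O2'": 2.6, "A9:O2P": 4.1}))
print(decode_index(6))
```

For example, the index 12 from the text decodes to C1.1:O2P and C17:O2′, and the bridging pattern 6 decodes to A9:O2P and C1.1:O2P.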

2.1.3 HHR folds to form a cation recruiting pocket in the active site

In this section, we examine the preferential occupation of the HHR active site by cations. The 3D density contour maps of the Na+ ion distribution determined over the last 250 ns of simulation (Figure 4) show that, at a medium contour level (left panels, Figure 4), the average Na+ ion density is located near the RNA phosphate backbone, whereas at a high contour level (right panels, Figure 4) the highest-probability Na+ occupation sites are all concentrated in the active site, for both the reactant and the activated precursor. No explicit Na+ ions were initially placed in the active site, and Na+ ion exchange events were observed to occur. These results suggest that the HHR folds to form a strongly electronegative local pocket that attracts cations from solution (either Mg2+, if present, or Na+). As discussed previously, a threshold occupancy of cationic charge and specific metal ion binding patterns (in particular, bridging coordination of the A9 and scissile phosphates) stabilize the active site and facilitate formation of in-line attack conformations. A similar case has been observed by NMR in the tetraloop–receptor complex, where divalent ions were experimentally found at strongly electronegative positions formed by the RNA fold [71]. Together with the known divalent metal ion binding at the C-site, these results lead us to speculate that the active sites of some ribozymes, such as the HHR, may have evolved to form electrostatic cation-binding pockets that facilitate catalysis. For the HHR, this speculation is further supported by the correlation, observed in the simulations and discussed in detail in the previous sections, between cation binding mode and formation of active conformations.

Table 2 Coordination patterns of Mg2+ and Na+ ions in the active site

Simulation  Cluster  Percentage  R     θ       ⟨N(Mg2+)⟩  ⟨CN(Mg2+)⟩  ⟨NB(Mg2+)⟩  ⟨N(Na+)⟩  ⟨CN(Na+)⟩  ⟨NB(Na+)⟩
RT-C-Mg     A         20.78%     3.30  144.05  1.00       1.00        0.00        0.05      1.00       0.00
RT-C-Mg     B         79.22%     4.13  122.66  1.00       1.00        0.00        0.03      1.00       0.00
RT-B-Mg     A         99.54%     3.27  151.10  1.00       2.00        1.00        0.00      1.00       0.00
RT-B-Mg     B          0.46%     4.00  129.76  1.00       2.00        1.00        0.09      1.00       0.00
RT-Na       A         86.72%     3.23  152.91  –          –           –           1.15      1.99       0.88
RT-Na       B         13.28%     4.12  122.82  –          –           –           1.38      1.54       0.66
dRT-Mg      A        100.00%     3.64  154.89  1.00       2.00        1.00        0.97      1.01       0.01
dRT-Na      A         23.99%     3.50  144.72  –          –           –           2.96      2.29       2.68
dRT-Na      B         76.01%     4.30  115.16  –          –           –           2.46      1.69       1.36

Distances and angles (Figure 1) are in Å and degrees, respectively. Average values, denoted ⟨…⟩, are obtained by averaging over all snapshots in the cluster. R is the in-line attack distance (C17:O2′ to C1.1:P), and θ is the in-line attack angle (between C17:O2′, C1.1:P, and C1.1:O5′). N is the number of ions with at least one coordination to any of the four coordination sites; CN is the total coordination number of all such ions; NB is the number of ions that coordinate to at least two of the four coordination sites.

Figure 4 The 3D density contour maps (yellow) of Na+ ion distributions derived from the RT-Na (upper panels) and dRT-Na (lower panels) simulations at different isodensity contour levels (left panels: 0.1; right panels: 1.0). The hammerhead ribozyme is shown in blue with the active site highlighted in red. The figure shows that, although the Na+ ions distribute around the RNA phosphate backbone (left panels), the hammerhead ribozyme folds to form a local electronegative recruiting pocket that attracts a highly condensed distribution of Na+ ions (right panels), both in the reactant-state and in the deprotonated activated precursor-state (deprotonated C17:O2′) simulations. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)
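The N, CN, and NB columns of Table 2 can, in principle, be derived from per-ion coordination indices: the number of set bits in an index is the number of sites that ion binds. The sketch below assumes that representation and uses a plain per-snapshot mean over each cluster; that averaging choice is an assumption, since the normalization behind the published averages is not stated. The function names are hypothetical.

```python
import numpy as np


def counts_from_indices(indices):
    """(N, CN, NB) for one snapshot, given the coordination index of every ion.

    The number of set bits in an index is the number of sites that ion binds.
    """
    n_sites = [bin(int(i)).count("1") for i in indices]
    n = sum(1 for k in n_sites if k >= 1)    # ions bound to >= 1 site
    cn = sum(n_sites)                        # total coordination number of those ions
    nb = sum(1 for k in n_sites if k >= 2)   # bridging ions (>= 2 sites)
    return n, cn, nb


def cluster_average(snapshots):
    """Plain mean of (N, CN, NB) over all snapshots assigned to one cluster."""
    arr = np.array([counts_from_indices(s) for s in snapshots], dtype=float)
    return tuple(float(x) for x in arr.mean(axis=0))


# Two toy snapshots: two ions bridging A9/C1.1 (index 6), then only one
print(cluster_average([[6, 6], [6, 0]]))
```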

2.2 Simulations along the reaction coordinate

In the previous section, we showed that the active site Mg2+ ion prefers to occupy the C-site in the reactant state and the B-site in the deprotonated activated precursor. Starting from these two possible sites, simulations of transition state (TS) mimics were performed to explore the possible roles of the Mg2+ ion in the chemical reaction step. These include four MD simulations of the reactant state in protonated (RT) or activated/deprotonated (dRT) form; two simulations of the early TS (ETS) and two simulations of the late TS (LTS), with the Mg2+ ion initially placed at the C-site (c-) or the B-site (b-); and two additional QM/MM simulations of the ETS and LTS.

2.2.1 Molecular dynamics studies of transition state mimics

In the TS mimic simulations with Mg2+ initially placed at the C-site, in both the ETS and LTS (c-ETS and c-LTS), the Mg2+ ion migrates from the C-site to the bridging position in less than 0.5 ns and remains at the B-site for the remainder of the simulation, as postulated previously [45]. This migration is likely facilitated by deprotonation of the 2′-OH of C17 (the nucleophile) and the accumulation of negative charge as the system moves toward the TS. These results are consistent with thio/rescue effect experiments indicating that both the A9 phosphate oxygens and the pro-RP scissile phosphate oxygen exhibit a stereospecific kinetic thio effect in the presence of Mg2+ that can be rescued by Cd2+ ions [37]. Since both c-ETS and c-LTS resulted in migration of Mg2+ to the B-site, we focus only on the ETS and LTS mimic simulations with Mg2+ initially placed at the B-site, labeled here b-ETS and b-LTS, respectively. In the b-ETS and b-LTS simulations, the distance between the A9 and scissile phosphates stays around 4 Å, and the Mg2+ coordination between the C1.1 and A9 phosphate oxygens keeps an axial–axial arrangement throughout the simulations (Table 3). The distance between the A9 and scissile phosphates in the crystallographic structure is around 4.3 Å, which is well suited for Mg2+-bridging coordination [72]. A similar situation is found in the


Tai-Sung Lee et al.

Table 3 Comparison of crystallographic and simulation data for selected heavy-atom distances (Å) in the hammerhead active site

                     X-ray           Simulation
Distance             2GOZ    2OEU    b-RT        b-ETS       b-LTS
C1.1:O2P…A9:O2P      4.33    4.28    3.36(49)    4.00(60)    4.01(70)
Mg…G8:O2′            3.04    3.14    3.97(102)   2.24(13)    3.21(23)
Mg…C1.1:O5′          3.84    4.01    4.22(21)    3.68(35)    2.09(50)
G8:O2′…C1.1:O5′      3.19    3.51    4.29(77)    4.41(65)    2.91(17)
C17:O2′…C1.1:P       3.18    3.30    3.61(23)    1.89(12)    1.76(40)
G12:N1…C17:O2′       3.54    3.26    3.02(27)    3.14(28)    2.97(13)
A9:N6…G12:N3         2.63    3.22    3.27(58)    3.15(21)    3.17(21)
A9:N6…G12:O2′        3.21    2.98    3.36(86)    3.01(18)    2.99(16)
A9:N7…G12:N2         2.90    2.90    3.42(93)    3.85(44)    3.66(33)

Results are from simulations with Mg2+ initially placed at the bridging position for the reactant state (b-RT), the early transition state (b-ETS) mimic, and the late transition state (b-LTS) mimic. Average values are shown with standard deviations in parentheses (divided by the decimal precision). X-ray structures used for comparison are the full-length hammerhead RNA crystallographic structure at 2.2 Å resolution (2GOZ) [41], which was also used in this chapter as the starting structure, and the 2.0 Å resolution structure with resolved Mn2+ sites and solvent (2OEU) [42].

deprotonated reactant-state simulation presented in the previous section on Mg2+ binding modes. Specifically, in the ETS mimic simulations, where the nucleophilic O2′ and the leaving group O5′ are equidistant from the phosphorus, the Mg2+ ion becomes directly coordinated to the 2′-OH of G8 and is positioned closer to the leaving group O5′. The ETS mimic simulations with Mg2+ initially placed at the bridging position and at the C-site showed very similar results. The coordination of the Mg2+ ion in the ETS mimic simulations is consistent with a role in shifting the pKa of the 2′-OH of G8 so that it can act as a general acid. In the LTS mimic simulations, with Mg2+ initially placed at either the bridging or the C-site position, a transition occurs in which the Mg2+ coordination with the 2′-OH of G8 is replaced by direct coordination with the leaving group O5′. In this way, the Mg2+ may provide electrostatic stabilization of the accumulating charge of the leaving group (i.e., act as a Lewis acid catalyst) [35]. At the same time, the 2′-OH of G8 forms a hydrogen bond with the leaving group O5′ and is positioned to act as a general acid catalyst.
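The coordination switch described above (Mg2+ bound to G8:O2′ in the ETS mimic, to C1.1:O5′ in the LTS mimic) can be detected frame by frame from Mg2+–oxygen distances. This is a minimal sketch that assumes a 2.5 Å first-shell cutoff (a hypothetical but typical Mg–O criterion) and per-frame distances already extracted from the trajectory; the ligand labels are plain strings chosen for illustration.

```python
def first_shell(distances, cutoff=2.5):
    """Ligands coordinated to Mg2+ in one frame.

    distances: dict mapping ligand label -> Mg...O distance in Å.
    cutoff: hypothetical first-shell criterion (Å).
    """
    return sorted(name for name, d in distances.items() if d <= cutoff)


def first_switch(frames, old, new, cutoff=2.5):
    """Index of the first frame in which `new` is in the first shell and `old`
    is not, or None if the switch never happens."""
    for i, distances in enumerate(frames):
        shell = first_shell(distances, cutoff)
        if new in shell and old not in shell:
            return i
    return None
```

Applied to the b-ETS and b-LTS averages in Table 3 (2.24 Å and 2.09 Å), this criterion would place G8:O2′ and C1.1:O5′, respectively, in the first shell.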

2.2.2 Metal-assisted proton transfer in the general acid step

The classical MD simulations in the previous section suggest that in the ETS, the Mg2+ ion is positioned to shift the pKa of the 2′-OH of G8 so that it can act as a general acid, and that in the LTS, the Mg2+ ion can act as a Lewis acid catalyst to stabilize the leaving group and is poised to assist proton transfer from the 2′-OH of G8. These roles, inferred from purely classical MD simulations, are supported by the QM/MM results, in which a similar binding pattern is observed: the Mg2+ ion is bonded to G8:O2′ in the ETS mimic and switches to C1.1:O5′ of the leaving group in the LTS mimic. In the LTS mimic, G8:HO2′ is strongly hydrogen bonded to C1.1:O5′. In fact, proton transfer from G8:O2′ to C1.1:O5′ occurs spontaneously within 1 ns in the QM/MM simulation of the LTS mimic. At the start of the simulation, C1.1:O5′ is tightly bound to Mg2+, while G8:O2′ is around 3 Å away from Mg2+. As the simulation proceeds, G8:O2′ moves closer to Mg2+ and eventually binds to it as it gives up its proton to C1.1:O5′. After C1.1:O5′ is protonated, its binding to Mg2+ weakens and the direct coordination is broken. This QM/MM simulation confirms that Mg2+ enhances the acidity of G8:O2′ and facilitates proton transfer to the leaving group.
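A common way to locate a proton-transfer event like the one above is to follow the antisymmetric coordinate ξ = d(H–O_donor) − d(H–O_acceptor), which becomes positive once the proton is closer to the acceptor. The sketch below is an illustration under stated assumptions, not the analysis used in the chapter: donor and acceptor would be G8:O2′ and C1.1:O5′, the per-frame distances are assumed to be precomputed, and the `min_frames` persistence requirement (to ignore transient hops) is a hypothetical choice.

```python
def transfer_coordinate(d_donor, d_acceptor):
    """xi = d(H-O_donor) - d(H-O_acceptor) in Å; positive once H is nearer the acceptor."""
    return d_donor - d_acceptor


def first_stable_transfer(d_donor, d_acceptor, min_frames=3):
    """Index of the first frame of the earliest run of at least `min_frames`
    consecutive frames with xi > 0, or None if no such run exists.

    d_donor, d_acceptor: per-frame H...O distances (Å) for the donor
    (e.g. G8:O2') and acceptor (e.g. C1.1:O5') oxygens.
    """
    run = 0
    for i, (dd, da) in enumerate(zip(d_donor, d_acceptor)):
        run = run + 1 if transfer_coordinate(dd, da) > 0 else 0
        if run >= min_frames:
            return i - min_frames + 1
    return None
```

Requiring several consecutive frames on the acceptor side distinguishes a committed transfer from a brief excursion of the proton across the midpoint.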

2.3 Simulations of mutations of key residues

In this section, we report MD simulations of the native and mutated full-length HHRs in the reactant state and in an activated precursor state (C17:O2′ deprotonated). Each simulation has a production trajectory of 60 ns. There exists a wealth of experimental mutational data, and simulations of key mutations can offer a detailed rationalization of these effects in terms of structure and dynamics, forge a closer connection between theory and experiment, and provide deeper insight into mechanism. Simulations were performed with the C3U, G8A, and G8I single mutants and a C3U/G8A double mutant that exhibits an experimental rescue effect [73]. Simulation-derived key active site structural parameters are provided in Table 4, and representative hydrogen-bonded base pairing at the C3–G8 positions is shown in Figure 5. In addition, control simulations were performed on the benign U7C mutant and on the wild type with the active site Mg2+ ion removed (NoMg). An implicit assumption herein is that the mutated sequences fold to a native-like structure.

Mg2+ can migrate to a bridging position in the activated precursor. In all reactant-state simulations, the Mg2+ stays near A9:O2P and G10.1:N7. In all activated precursor-state (deprotonated C17:O2′) simulations, after a few hundred picoseconds, the Mg2+ migrates into a bridging position between A9:O2P and C1.1:O2P and reduces the distance between them (d0 in Table 4) by about 1 Å relative to the reactant-state simulations. C17:O2′ shows significant in-line fitness for nucleophilic attack on C1.1:P. In all simulations except G8A, G8:O2′ is hydrogen bonded to the leaving group C1.1:O5′ and positioned to act as the general acid.

C3U mutation disrupts the active site in the reactant. The C3U mutation reduces the catalytic rate to ~3 × 10⁻⁴ of the wild-type value [76]. The C3U mutation disrupts the normal Watson–Crick hydrogen bonding with G8 (Figure 5), causing a base shift that disrupts the active site structure in the reactant state. The distance between the A9 and scissile phosphates increases by more than 3.5 Å, and key hydrogen bonds are broken between the O2′ nucleophile of C17 and N1 of G12 (the implicated general base), and between the O5′ leaving group of C1.1 and H2′ of G8 (the implicated general

Table 4 Characterization of the active site structure and fluctuations. Analysis was performed over the last 25 ns (10 ps sampling).

Columns d0 through θ2:

Simulation   d0         rNu        θinl        Fᵃ         r1         θ1          r2         θ2
WT           3.98(40)   3.98(34)   128.2(116)  0.29(15)   2.00(17)   162.9(89)   2.01(10)   160.9(90)
d-WTᵈ        2.96(12)   3.54(17)   158.6(78)   0.52(10)   1.90(14)   160.1(103)  1.98(11)   163.1(84)
NoMg         3.40(30)   3.18(12)   159.6(82)   0.72(10)   2.12(23)   163.4(89)   2.04(13)   162.7(83)
d-NoMg       3.51(76)   3.85(44)   139.2(185)  0.34(18)   1.94(15)   162.6(91)   1.99(10)   161.6(85)
U7C          4.06(40)   4.08(17)   121.7(65)   0.24(6)    1.98(17)   163.7(85)   2.00(10)   161.0(88)
d-U7C        2.94(12)   3.59(15)   156.2(69)   0.49(8)    2.02(17)   163.7(85)   2.00(10)   162.6(86)
C3U          7.80(66)   4.16(14)   127.1(67)   0.23(4)    –          –           2.25(26)   157.0(127)
d-C3U        2.94(13)   3.67(16)   154.2(74)   0.45(8)    –          –           1.92(15)   160.5(109)
G8A          4.22(34)   3.2(10)    156.0(70)   0.68(9)    –          –           –          –
d-G8A        2.95(12)   3.80(18)   146.6(88)   0.38(8)    –          –           –          –
C3U/G8A      3.94(39)   3.59(48)   141.3(169)  0.49(23)   –          –           2.04(21)   160.7(104)
d-C3U/G8A    2.93(12)   3.62(16)   155.5(73)   0.48(8)    –          –           2.08(17)   160.8(101)
G8I          4.27(59)   3.77(44)   131.6(152)  0.37(20)   –          –           2.01(14)   162.0(88)
d-G8I        2.93(13)   3.58(18)   156.6(73)   0.50(9)    –          –           1.95(11)   164.3(82)

Columns r3 through %:

Simulation   r3         θ3          rNNᵇ       rHB        θHB         rHA        θHA         %ᶜ
WT           1.88(12)   164.2(85)   2.97(9)    2.10(26)   152.1(138)  2.83(44)   118.2(168)  24.4
d-WT         2.02(20)   163.3(93)   2.96(10)   2.16(55)   152.0(151)  2.82(95)   126(398)    62.8
NoMg         1.86(11)   161.8(98)   3.01(12)   1.96(14)   154.9(90)   4.61(94)   90.7(506)   0.0
d-NoMg       1.91(13)   164.7(83)   2.96(9)    1.97(18)   154.9(90)   3.28(130)  92.4(559)   45.6
U7C          1.89(12)   164.8(81)   2.96(9)    2.11(27)   150.7(136)  2.92(55)   112.6(205)  22.4
d-U7C        1.88(11)   163.8(85)   2.97(9)    1.95(15)   155.7(96)   2.31(51)   143.4(277)  83.6
C3U          1.90(13)   163.1(88)   3.61(21)   3.51(66)   121.2(155)  7.82(58)   45.2(164)   0.0
d-C3U        1.93(15)   161.9(94)   3.75(20)   1.94(14)   158.5(90)   2.99(99)   108.7(397)  43.6
G8A          2.13(39)   149.4(151)  5.33(66)   2.01(16)   164.2(85)   2.97(33)   119.3(150)  31.2
d-G8A        2.82(44)   106.9(116)  6.58(46)   1.89(12)   155.5(93)   4.10(63)   82.8(215)   0.8
C3U/G8A      2.06(22)   161.2(101)  3.00(19)   2.38(46)   147.1(181)  3.27(64)   130.6(198)  13.2
d-C3U/G8A    1.96(17)   160.3(105)  3.04(15)   1.95(14)   156.0(99)   2.61(91)   130.6(385)  68.8
G8I          1.98(19)   163.6(90)   2.98(13)   2.05(26)   155.6(124)  2.83(86)   118.5(344)  38.4
d-G8I        2.11(22)   163.2(95)   2.93(11)   1.99(16)   ?           2.47(52)   138.6(286)  77.2

U7C and d-U7C are control simulations, since the U7C mutation has been shown to have almost no effect on catalysis [74]. Distances and angles (Figure 5) are in ångströms and degrees, respectively. Standard deviations (SDs) are listed in parentheses divided by the decimal precision of the average (e.g., if the average is reported to two decimal places, the SD is divided by 0.01). Key quantities that are significantly altered with respect to the wild-type (WT) simulation upon mutation are discussed in the text.
ᵃ In-line fitness index [75].
ᵇ The N3…N1 distance between the nucleobases in positions 3 and 8.
ᶜ The hydrogen-bond contact percentage of the general acid with the leaving group, defined as the percentage of snapshots in which rHA ≤ 3.0 Å and θHA ≥ 120°.
ᵈ The prefix "d-" denotes activated precursor-state simulations with C17:O2′ deprotonated.

[Figure 5 image. Upper: active site (A-9, G-10.1, Mg2+, G-12, G-8, C-1.1, C-3, C-17) annotated with d0, θinl, r1–r3, θ1–θ3, rHB, θHB, rHA, and θHA. Lower: C3:G8 base pairing for C3U (~3 × 10⁻⁴), G8A (<4 × 10⁻³), G8I (~0.5), and C3U/G8A (~0.5).]

Figure 5 Upper: Active site of the full-length hammerhead RNA using the canonical minimal sequence numbering scheme described in [40] and [42]. Lower: Representative hydrogen bonding of the C3:G8 base pair observed in the mutant simulations. Experimental relative catalytic rates of mutant versus wild-type minimal sequence ribozymes (kmut/kwt) are shown in parentheses (C3U from [76], G8A from [78], C3U/G8A from [73], and G8I from [34]) and may differ for the full-length sequence.

acid). These perturbations in the reactant state would prevent activation of the nucleophile and progress toward the TS. Experimental evidence shows that C3U indeed reduces the rate constant by more than three orders of magnitude [76].

G8A mutation disrupts the positioning of G8:O2′ as general acid in the activated precursor. The G8A mutation reduces the catalytic rate to 0.004 of the wild-type value [78]. Simulation results indicate that the G8A mutation considerably weakens the base pair with C3, leaving only one weak hydrogen bond intact (Figure 5). In the reactant-state simulation, G8A does not appear to dramatically alter the active site contacts relative to the wild-type simulation, with the exception of the A8:N1…C3:N3 distance, which increases due to a shift in the hydrogen-bond pattern (Figure 5). In the activated precursor state, however, the hydrogen-bond positioning between G8:H2′ and C1.1:O5′ is significantly disrupted relative to the wild-type simulation. The G8A mutation shifts the conserved 2′-OH of G8 away from the ideal general acid position and may block the general acid step of the reaction. Mutation of G8 to 2-aminopurine (AP) [34,78] or to 2,6-diaminopurine (diAP) [78], which are expected to weaken hydrogen bonding in a similar way to the G8A mutation, reduces the reaction rate by over three orders of magnitude.
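Two of the Table 4 indexes can be recomputed from per-snapshot geometry. The sketch below makes assumptions the text does not state: it takes the in-line fitness index F in the widely used Soukup–Breaker form, F = [(θinl − 45°)/(180° − 45°)]·(3 Å/rNu)³ (we have not confirmed that this is the exact form of ref. [75]), and it applies footnote c's criterion (rHA ≤ 3.0 Å and θHA ≥ 120°) to the general acid/leaving group contact. It also parses the tables' mean(SD) notation. Evaluating F at the d-WT means (θinl = 158.6°, rNu = 3.54 Å) gives about 0.51, close to the tabulated 0.52; exact agreement is not expected, because the table averages F over snapshots rather than evaluating it at averaged geometry.

```python
import re


def parse_mean_sd(text):
    """Parse the tables' 'mean(SD)' notation, where the SD is given in units of
    the last reported digit: '3.54(17)' -> (3.54, 0.17), '126(398)' -> (126.0, 398.0)."""
    m = re.fullmatch(r"\s*(-?\d+)(?:\.(\d+))?\((\d+)\)\s*", text)
    if m is None:
        raise ValueError(f"not in mean(SD) form: {text!r}")
    whole, frac, sd_digits = m.groups()
    decimals = len(frac) if frac else 0
    mean = float(f"{whole}.{frac}") if frac else float(whole)
    return mean, int(sd_digits) / 10 ** decimals


def inline_fitness(theta_inl_deg, r_nu):
    """In-line fitness in the assumed Soukup-Breaker form:
    ((theta - 45) / (180 - 45)) * (3 / r_nu)**3, theta in degrees, r_nu in Å."""
    return (theta_inl_deg - 45.0) / 135.0 * (3.0 / r_nu) ** 3


def contact_percentage(r_ha, theta_ha, r_max=3.0, theta_min=120.0):
    """Percentage of snapshots with rHA <= 3.0 Å and thetaHA >= 120 degrees (footnote c)."""
    hits = sum(r <= r_max and t >= theta_min for r, t in zip(r_ha, theta_ha))
    return 100.0 * hits / len(r_ha)
```

Both indexes are per-snapshot quantities; averaging them over the 2500 snapshots of the last 25 ns (10 ps sampling) is what the table reports.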


G8I and C3U/G8A mutations are relatively benign. Whereas the relatively isosteric C3U and G8A mutations lead to considerably reduced catalytic rates, the G8I [34,78] and C3U/G8A [73] mutations affect the rate by less than an order of magnitude. The C3U/G8A double mutation and G8I single mutation simulations indicate that the hydrogen-bond network retains the overall base positions relative to the wild-type simulation, and suggest that these two mutations do not significantly alter any of the active site indexes that would affect activity relative to the wild type (Table 4).

Structural deviations that give rise to mutation effects can occur at different stages along the reaction path. Although the canonical Watson–Crick hydrogen-bond network is altered significantly in both the C3U and G8A mutations, the simulations suggest that the mutational effects on ribozyme kinetics originate at different stages along the reaction path. In the reactant state, the Mg2+ ion is bound between G10.1:N7 and A9:O2P. The large base-pair shift that occurs in the C3U mutation simulation compromises the active site structure, including the loss of interactions between the proposed general base and the nucleophile. In the activated precursor state, the Mg2+ ion occupies a bridging position between A9:O2P and the scissile phosphate. The very weakly hydrogen-bonded G8A mutant does not sustain a catalytically viable position of the general acid.

Hydrogen bonding between nucleobases in the 3 and 8 positions is necessary but not sufficient to preserve active site structural integrity. The G8I and C3U/G8A mutations, which largely preserve a stable hydrogen-bonded base-pair scaffold, are relatively benign. A C3G/G8C base-pair switch mutation that preserves hydrogen-bonded base pairing partially rescues activity relative to the single mutations, although it still reduces activity 150–200-fold [41,79]. Recent analysis of all base-pair mutations at this position indicates considerable variation in activity, but all of the nonnative mutations are considerably less active [79]. The present simulation results offer the prediction that, whereas both the C3U [76] and G8diAP [78] single mutations are observed experimentally to reduce catalytic activity by several orders of magnitude, a correlated C3U/G8diAP double mutation, which retains base-pair hydrogen bonding, should exhibit a partial rescue effect in the HHR.
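The relative rates quoted in this section can be put on an energy scale with the standard transition-state-theory relation ΔΔG‡ = −RT ln(kmut/kwt). This is our illustration, not an analysis from the chapter: it assumes an unchanged prefactor and uses T = 300 K (the simulation temperature; the experimental temperatures may differ). For the rates listed in Figure 5 it gives roughly 4.8 kcal/mol for C3U, at least 3.3 kcal/mol for G8A (whose rate is an upper bound), and about 0.4 kcal/mol for G8I and C3U/G8A; the 150–200-fold C3G/G8C reduction corresponds to roughly 3.0–3.2 kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_activation(relative_rate, temperature=300.0):
    """Apparent change in activation free energy (kcal/mol) for a rate ratio
    k_mut/k_wt, assuming transition-state theory with an unchanged prefactor."""
    if relative_rate <= 0:
        raise ValueError("relative rate must be positive")
    return -R_KCAL * temperature * math.log(relative_rate)


# Relative rates k_mut/k_wt as listed in Figure 5 (G8A is an upper bound, <4e-3).
RELATIVE_RATES = {"C3U": 3e-4, "G8A": 4e-3, "G8I": 0.5, "C3U/G8A": 0.5}

if __name__ == "__main__":
    for mutant, ratio in RELATIVE_RATES.items():
        print(f"{mutant:8s} k_mut/k_wt = {ratio:g}  ddG = {ddg_activation(ratio):.2f} kcal/mol")
```

On this scale, the "benign" mutations cost less than 0.5 kcal/mol, whereas the disruptive ones cost several kcal/mol, roughly the strength of one or two hydrogen bonds.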

3. MOLECULAR SIMULATIONS OF THE L1 LIGASE

In this section, we show how large-scale MD simulations, together with crystallographic data, were able to reveal the dynamical hinge points of L1L. We have performed a series of MD simulations of L1L in explicit solvent to examine the degree to which the two crystallized conformers persist as stable conformational intermediates in solution, and to examine the nature of the inactive–active conformational switch. Four simulations were performed departing from each conformer in both the reactant and product states, in addition to a simulation in which local unfolding in the active state was induced. From these simulations, a set of four virtual torsion angles that span two evolutionarily conserved and restricted regions was identified as dynamical hinge points in the conformational switch transition. The ligation site visits three distinct states characterized by hydrogen-bond patterns that are correlated with the formation of specific contacts that may promote catalysis. The insights gained from these simulations contribute to a more detailed understanding of the coupled catalytic/conformational switch mechanism of L1L that may facilitate the design and engineering of new catalytic riboswitches. The following simulations have been run:

• Prod-D: product in the docked conformation
• Prod-D-UF: product in the docked conformation, with forced unfolding
• Prod-U: product in the undocked conformation
• Prec-D-XTP: reactant precursor in the docked conformation with the triphosphate in an extended conformation
• Prec-D-MgTP: reactant precursor in the docked conformation with the triphosphate in an extended conformation coordinated with a Mg2+ ion

3.1 Conformational variation of L1L occurs at dynamical hinge points

Figure 6 illustrates the crystallographic, presumably active, "docked" conformer and the inactive "undocked" conformer. Most of the structural variation is observed in the junction region connecting the stems, where the RMSD between the crystal conformers is 8.30 Å. This variation in the junction propagates to a large-scale swing of stem C by around 80 Å in arc length. Individually, the L1L stems (A, B, and C), as found in the crystal structure, show small deviations (RMSD 1.71, 0.71, and 2.06 Å, respectively) between the docked and undocked conformational states, and the internal base-pair hydrogen-bonding patterns are completely conserved between conformers. The two crystallographic conformers can be distinguished by changes in their virtual torsion angles (Figure 7) [66,67]. A survey of the 142 virtual torsion angles in L1L reveals that only four show a significant change (>45°): those at positions 18, 37, 44, and 38 (Figure 7). These torsions span regions that contain the evolutionarily conserved residues U37, U38, and A39 (for θ37 and η38) and the 5-base motif (C39=G18 G37 A38 C17/U39=A38 U37 G38 U17) together with the neighboring U19 (for the torsions at positions 18 and 44) [62]. Both U38 and U19 are of particular interest in the current work: U38 contributes to the docking of stem C into stem A in the active conformation, and mutation data demonstrate that it is critical for catalysis, whereas U19 is evolutionarily conserved (97%), although its role in catalysis remains unclear [55,61–63]. In both crystallographic conformers, U19 is oriented toward the exterior of the L1L body without making any contacts. The lack of a clear structural basis for the conserved character of U19 in either the docked or the undocked crystal structure raises the question of whether this residue might be important in facilitating the undocked-to-docked conformational switch.

Large fluctuations in the three-way junction suggest the presence of dynamical hinge points. In the Prod-D and Prod-U simulations, the virtual torsions associated with the residues that form the junction region (J) are characterized by a residue-average standard deviation (RASD) of 16.1° and 18.5°, respectively,

[Figure 6 image: three-way junction (U19, G44), noncanonically base-paired ligation site, and stems A, B, and C; active conformation (crystal, MD) and inactive conformations (crystal, MD); scale labels ~20 Å and ~40 Å]

Figure 6 Representative snapshots from the simulations illustrating the coupled on/off conformational switch–catalytic pathway starting from the active and inactive conformations, in both precursor and product states. Middle: Representative snapshots from the simulations are shown with stems A and B (yellow wireframe surface) aligned and stem C in ribbons with transparent surfaces. Left panel: Contacts important for stabilization of intermediate states that involve the conserved U19 and stem B. Right panel: Interaction patterns for three states observed in the simulations between G1/GTP1 and the noncanonically base-paired ligation site. States that do not appear in the crystal structure are indicated with curly arrows. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)

[Figure 7 plot: ΔTorsion (°) versus residue number for Δη and Δθ, with labeled peaks at 18, 37, 38, and 44]

Figure 7 Variations of the virtual torsions (ΔTorsion, where Torsion indicates either η or θ) between the docked (active) and undocked (inactive) conformers as found in the crystal structure. Δη (black) and Δθ (red) are shown. Only four virtual torsions, θ19, θ37, η38, and η44, show significant deviations (80.0° or more), whereas all other virtual torsions show relatively minor changes (typically less than 25.0°). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)


whereas the RASD for the canonically base-paired stem A, for example, is ~12.5° in the Prod-U simulation. Similar values are found for the other stems. This suggests that the junction possesses a higher degree of intrinsic flexibility than other regions of L1L. The major contributors (SD > 30.0°) to these large RASD values are the torsions at positions 44 (SD 48.7°) and 45 (SD 71.0°) in the Prod-D simulation, and 18 (SD 56.8°), 19 (SD 30.7°), 19 (SD 37.8°), 20 (SD 53.6°), and 20 (SD 30.8°) in the Prod-U simulation. In the Prod-D-UF simulation, the virtual torsions at positions 44 and 45 have the largest standard deviations in the junction region (~30.0°). A possible connection can be made between these relatively large fluctuations of some of the virtual torsions that span the junction region and the transition from the inactive to the active conformation. For example, during the Prod-U simulation (370 ns), the torsion at position 18 is observed to pass through a series of two relatively distant clusters that lie on exactly the path that would allow the undocked→docked conformational transformation suggested by the crystal structures. During the same simulation, the conserved U19 can be observed interacting specifically with stem B, suggesting a possible role in stabilizing intermediate states on the conformational transition pathway. The observed interaction patterns that correlate with the transition along virtual torsion 18 involve a variety of base–base, base–backbone, and backbone–backbone hydrogen bonds. Given the conserved nature of the interacting partners, it is possible that these interactions play an important role in stabilizing intermediate states during transitions between the two conformations, and contribute in part to the discrimination between active and inactive constructs in the evolutionary optimization process.
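The virtual torsions used in this section can be computed from P and C4′ coordinates alone. The sketch below assumes the commonly used η/θ pseudotorsion definitions, η(i) = C4′(i−1)–P(i)–C4′(i)–P(i+1) and θ(i) = P(i)–C4′(i)–P(i+1)–C4′(i+1), which are consistent with the atoms labeled in Figure 8 but are not spelled out in the text. `hinge_candidates` mimics the survey for torsions that change by more than 45° between two conformers, with each change wrapped into [−180°, 180°) so that, for example, 170° versus −170° counts as 20°, not 340°. The residue dictionaries and torsion labels are hypothetical inputs.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180] (atan2 form)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def eta(res, i):
    """eta(i) = C4'(i-1)-P(i)-C4'(i)-P(i+1); res maps residue number -> {"P": xyz, "C4'": xyz}."""
    return dihedral(res[i - 1]["C4'"], res[i]["P"], res[i]["C4'"], res[i + 1]["P"])


def theta(res, i):
    """theta(i) = P(i)-C4'(i)-P(i+1)-C4'(i+1)."""
    return dihedral(res[i]["P"], res[i]["C4'"], res[i + 1]["P"], res[i + 1]["C4'"])


def wrapped_difference(a, b):
    """Signed angular difference a - b in degrees, mapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def hinge_candidates(torsions_a, torsions_b, threshold=45.0):
    """Labels of torsions whose wrapped change between two conformers exceeds threshold."""
    return sorted(k for k in torsions_a
                  if abs(wrapped_difference(torsions_a[k], torsions_b[k])) > threshold)
```

Fed the docked and undocked crystal values of θ37 (243.8° and 135.9°) and η38 (−15.5° and 105.9°) quoted in Section 3.2, `hinge_candidates` flags both torsions.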

3.2 The U38 loop responsible for allosteric control is intrinsically flexible

U38, a conserved residue critical for catalytic activity in the L1L family, is docked into the ligation site and forms a canonical base pair with the ligation-site residue A51 in the docked conformation, whereas in the undocked conformation it is positioned ~40 Å away from the site. This has led to the postulate that the docked conformer is more likely representative of a catalytically active state [64]. In the crystal structures, the conformation of the U38 loop differs considerably between the docked and undocked conformers. The two virtual torsions that span the loop, θ37 and η38, show a large variation (~120.0°) between the two crystal conformers, whereas all the other virtual torsions in the immediately base-paired vicinity show much smaller variation (<20.0°) (Figure 7). Mechanistically, during the undocked→docked conformational transition, the U38 loop has to move from a state in which U38 is buried in stem C and makes a sugar–base hydrogen bond with A23 into an open conformation that exposes U38 to solvent, allowing it to dock into stem A. The docking consists of the formation of a canonical U38–A51 base pair and several phosphate–Mg2+ contacts, supported by A39 and G40 of stem C on one side and G1 of stem A on the other. In the Prod-D, Prec-D-XTP, and Prec-D-MgTP simulations, U38 remains base paired with A51, and the overall behavior of the θ37 and η38 virtual torsions is similar. For the Prod-D simulation, the average values of θ37 and η38 (220.3° and 27.0°, respectively) stay within 43° of the values found in the docked crystal conformer (243.8° and −15.5°, respectively). In the Prod-U simulation, the θ37 and η38 virtual torsions follow a similar trend with respect to their undocked-conformer values: the averages were 192.1° and 128.7°, respectively, compared with 135.9° and 105.9° in the undocked crystal structure. For this set of simulations (Prod-D, Prec-D-XTP, Prec-D-MgTP, and Prod-U), U38 fluctuates relatively close to its starting geometry in two approximately separate states, indicated with different colors in Figure 8b (lower panel). These two regions are labeled D (solvent exposed/docked) and U (buried), respectively, and two representative structures are depicted in Figure 8a. As shown by the correlation plot of the θ37 and η38 virtual torsions in the Prod-D-UF simulation (Figure 8b, upper panel), the U38 loop is able to span a conformational space that covers almost exactly the union of the regions sampled during the Prod-D and Prod-U simulations, encompassing the conformations found in the crystal structures.
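Because θ37 and η38 are periodic, averages and deviations such as those quoted above (e.g., an η38 average of 27.0° versus −15.5° in the docked crystal conformer, a 42.5° difference) are safest computed with circular statistics. The sketch below is one reasonable choice rather than the chapter's documented procedure: it takes the circular mean from atan2 of the summed sines and cosines, and a population standard deviation of the deviations wrapped about that mean.

```python
import math


def wrapped_difference(a, b):
    """Signed angular difference a - b in degrees, mapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def circular_mean(angles_deg):
    """Circular mean in degrees, in [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def circular_sd(angles_deg):
    """Population SD (degrees) of deviations wrapped about the circular mean."""
    mean = circular_mean(angles_deg)
    devs = [wrapped_difference(a, mean) for a in angles_deg]
    return math.sqrt(sum(d * d for d in devs) / len(devs))
```

For a torsion that hops between about +170° and −170°, the naive arithmetic mean (0°) is meaningless, whereas the circular mean lies near ±180° with a 10° spread.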

3.3 Anatomy of the ligation site and implications for catalysis

L1L catalyzes phosphodiester-bond formation between GTP1:P and U71:O3′. Generally, this type of nucleotidyl transfer reaction proceeds through a pentacovalent TS or metastable intermediate, inverts the stereochemistry at P, and releases pyrophosphate [80,81]. The noncanonically base-paired ligation site on stem A is built from three noncanonical base pairs: U50•G2 (cis-WC/WC), A51·G1 (trans-Hoogsteen/sugar edge), and G52•U71 (cis-WC/WC) (Figure 6, right inset). In the crystallographic docked product conformation, A51 is also involved in a canonical base pair with U38, giving rise to the base triple U38–A51·G1. This interaction, along with the interactions of the G1, A39, and G40 phosphates with a Mg2+ ion, mediates the docking of stem C into stem A. Formation of the U38–A51·G1 base triple disrupts the hydrogen bond between G1:O2′ and A51:N6. This hydrogen bond is present in the undocked crystal structure and is typical of a trans-Hoogsteen/sugar edge base pair [82]. During the Prod-D, Prec-D-XTP, and Prec-D-MgTP simulations, the ligation site shows strong variability, spanning a series of three conformational clusters characterized by specific hydrogen-bond interactions between G1/GTP1 and the rest of the ligation site. G1/GTP1 oscillates (Figure 9) between forming a hydrogen bond with G52, GTP1/G1:N2–G52:O6 (cluster 1), a base triple with A51 and U38 identical to that found in the crystallized docked conformer (cluster 2), and a base triple with U50 and G2 (cluster 3). One of the structural features that confers increased flexibility on G1/GTP1 and its ability to interact with different parts of the ligation site for

[Figure 8 image. Left: representative U and D conformations of the U38 loop, with atoms U37:P, U37:C4′, U38:P, U38:C4′, and A39:P labeled. Right: η38 (°) versus θ37 (°); the upper panel marks the active and inactive crystal values, and the lower panel the U and D regions.]

Figure 8 The U38 loop is responsible for allosteric control of the catalytic step by transitioning from a closed conformation, labeled U conformation (specific, as shown by our simulations, to the inactive/undocked conformers), to an open conformation, labeled D conformation (specific to the active/docked conformations). (Left) Representative conformations for the U and D conformations. (Right) Upper panel: During the Prod-D-UF simulation, where unfolding of the docked state was induced by removal of key interstem interactions, the complete transition between the two states was observed by monitoring the θ37 and η38 virtual torsions. The values of θ37 and η38 found in the two crystallized conformers are shown with blue and orange crosses. Lower panel: The θ37/η38 space sampled in Prod-D-UF overlaps with the regions sampled during the Prod-D (red) and Prod-U (blue) simulations. The center of each distribution is marked with black crosses. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)

the Prod-D, Prec-D-XTP, and Prec-D-MgTP simulations resides in the way GTP1/G1:O2′ interacts with the ligation site and the solvent. During the Prod-D, Prec-D-XTP, and Prec-D-MgTP simulations, GTP1/G1:O2′ shows no tendency to recover the G1:O2′–A51:N6 hydrogen bond present in the undocked crystal structure and in the Prod-U simulation, but instead interacts alternately with bulk water molecules or G2:O2P. Docking of stem C into stem A breaks the G1:O2′–A51:N6 hydrogen bond in the active state and thereby increases the conformational variability of the ligation site in both the product and the precursor/reactant state. During the simulations, the nucleophile U71:HO3′ makes contacts with U71:O2′ and GTP1:O2Pα, the latter being positioned to possibly act as a general base. These hydrogen bonds are formed mainly in structures from clusters 1 and 3, with negligible probability in cluster 2, as shown in Figure 9. The variability of the ligation site (i.e., its capacity to visit several conformational clusters), which stems from its noncanonical base pairing, makes it possible to form specific contacts that might promote the chemical step of catalysis (i.e., the contact between U71:HO3′ and GTP1:O2Pα).
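The cluster-resolved statement above (nucleophile contacts formed mainly in clusters 1 and 3) amounts to a conditional frequency over snapshots. The sketch below assumes each snapshot has already been reduced to a set of hydrogen-bond labels. Only the cluster 1 contact (GTP1/G1:N2–G52:O6) is named in the text, so the cluster 2 and cluster 3 labels, and the rule that lower-numbered clusters take precedence, are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical H-bond labels. Only the cluster 1 contact is named in the text;
# the cluster 2 and 3 entries are placeholders for atom-level criteria.
CLUSTER_RULES = {
    1: frozenset({"GTP1:N2-G52:O6"}),
    2: frozenset({"GTP1-A51-U38 triple"}),
    3: frozenset({"GTP1-U50-G2 triple"}),
}
NUCLEOPHILE_CONTACT = "U71:HO3'-GTP1:O2Palpha"


def assign_cluster(hbonds, rules=CLUSTER_RULES):
    """Lowest-numbered cluster whose required H-bonds are all present, else None."""
    for cluster, required in sorted(rules.items()):
        if required <= hbonds:
            return cluster
    return None


def contact_fraction_by_cluster(frames, contact=NUCLEOPHILE_CONTACT):
    """For each cluster (None = unassigned), the fraction of its frames showing `contact`.

    frames: iterable of sets of H-bond labels, one set per snapshot.
    """
    totals, hits = Counter(), Counter()
    for hbonds in frames:
        cluster = assign_cluster(hbonds)
        totals[cluster] += 1
        hits[cluster] += contact in hbonds
    return {c: hits[c] / totals[c] for c in totals}
```

Reporting the contact frequency per cluster, rather than over the whole trajectory, is what allows a statement such as "negligible probability in cluster 2" to be made.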

[Figure 9 image: three panels (clusters 1, 2, and 3), each showing GTP1, G2, U50, A51, G52, U71, and U38 of the ligation site]

Figure 9 The noncanonically base-paired ligation site exhibits a high degree of conformational variability, passing through a series of three states (clusters 1, 2, and 3) characterized by specific hydrogen-bond patterns between GTP1/G1 and the ligation site. The arrangement of the ligation site in the Prec-D-MgTP simulation is shown.

4. METHODS

Simulations were performed with the NAMD simulation package (version 2.6) [83] using the all-atom Cornell et al. force field (parm99) [84], generated with the AMBER 10 package [85–87], and the TIP3P water model [88]. Periodic boundary conditions were used along with the isothermal–isobaric ensemble (NPT) at 1 atm and 300 K, using Nosé–Hoover–Langevin pressure piston control [89,90] with a decay period of 100.0 fs and a damping time scale of 50 fs, and the Langevin thermostat with a damping coefficient of 0.1 ps⁻¹. The smooth particle mesh Ewald (PME) method [91,92] was employed with a B-spline interpolation order of 6 and the default value used in NAMD. The fast Fourier transform (FFT) grid points used for the lattice directions were chosen using ~1.0 Å spacing. Nonbonded interactions were treated using an atom-based cutoff of 12.0 Å, with switching of the nonbonded potential beginning at 10.0 Å. Numerical integration was performed using the leap-frog Verlet algorithm with a 1 fs time step [93]. Covalent bond lengths involving water hydrogens were constrained using the SHAKE algorithm [94]. For the QM/MM simulations, the system is partitioned into a QM region, constituting the active site and represented by the AM1/d-PhoT Hamiltonian [95] with the modified AM1 magnesium parameters of Hutter and coworkers [96], and an MM region. The simulations were performed with CHARMM [97] (version c32a2) using the all-atom CHARMM27 nucleic acid force field [98,99] with extension to reactive intermediate models (e.g., TS mimics) [72]. The generalized hybrid orbital (GHO) method [100] is used to treat the covalent bond cut at the boundary between the QM and MM regions. Full electrostatic interactions were calculated using a recently introduced linear-scaling QM/MM-Ewald method [101].
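The 10.0 to 12.0 Å switching region can be illustrated with the CHARMM-style switching polynomial S(r) = (r_c² − r²)²(r_c² + 2r² − 3r_on²)/(r_c² − r_on²)³, which goes smoothly from 1 at r_on to 0 at r_c. We assume this is the form NAMD applies to the van der Waals term when switching is enabled. Because the electrostatics in these simulations are handled by PME, the sketch applies the factor only to a Lennard-Jones term; the ε and R_min values used in any call are illustrative, not force-field parameters.

```python
def switch(r, r_on=10.0, r_off=12.0):
    """CHARMM-style switching factor: 1 for r <= r_on, 0 for r >= r_off,
    and a smooth polynomial in between (value and first derivative continuous)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    on2, off2, r2 = r_on ** 2, r_off ** 2, r ** 2
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3


def switched_lj(r, epsilon, r_min, r_on=10.0, r_off=12.0):
    """Lennard-Jones energy epsilon*[(r_min/r)^12 - 2(r_min/r)^6], times the switching factor."""
    x6 = (r_min / r) ** 6
    return epsilon * (x6 * x6 - 2.0 * x6) * switch(r, r_on, r_off)
```

At the midpoint of the switching region, S(11 Å) ≈ 0.53, so a pair interaction there is scaled to roughly half its unswitched value.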

Insights into RNA Catalysis from Simulations

5. CONCLUSION

In this chapter, we summarized our progress toward understanding HHR and L1L ribozyme catalysis through a multiscale simulation strategy. This strategy employs long-time MD simulations using a classical MM force field in explicit solvent, with specialized MM residues for metal ion interactions and reactive intermediates. Additionally, we employed a set of shorter simulations using a combined QM/MM potential. That potential is based on a recently developed semiempirical QM model for phosphoryl transfer reactions, derived from high-level density-functional calculations of reactions important in RNA catalysis. Long-range ionic interactions were treated rigorously with linear-scaling electrostatic methods for periodic systems.

Simulation results for the HHR paint a picture of its catalysis that includes a novel role for a catalytic metal ion. The HHR folds to form a negatively charged electrostatic pocket that recruits a threshold occupancy of cationic charge: either a Mg²⁺ ion or, when Mg²⁺ ions are not present, multiple monovalent ions. The position and coordination pattern of these ions are important for the formation of active in-line attack conformations. When a single Mg²⁺ ion is bound in the active site, it initially occupies the C-site in the reactant state. It then migrates to a bridging position (the B-site) after the nucleophile (C17:O2′) is deprotonated. As the reaction proceeds, the Mg²⁺ ion can stabilize the accumulating charge of the leaving group and bind to the general acid (G8:O2′). This significantly increases the ability of G8:O2′ to act as a general acid catalyst and transfer a proton to the leaving group (C1.1:O5′). Our QM/MM studies demonstrate that the Mg²⁺ ion not only facilitates the protonation of the leaving O5′ but also plays an important role in the final dissociation step of the catalysis. The mutational simulation results are consistent with observed mutational data. They suggest that the active site fold is finely tuned for the reaction, so that most mutation-induced disruptions severely impair HHR catalysis, at different stages of the reaction.

Simulation results for the L1L ribozyme have identified important dynamical hinge points in the conformational transition from the inactive to the active form of the L1L. They have also characterized interactions that stabilize intermediates along the transition pathway. From the simulations, we identified a reduced set of four virtual torsions that span two evolutionarily conserved and restricted regions located in the three-way junction. These torsions can be used to distinguish between the active and inactive conformations found in crystal structures. Analysis of the structure and dynamics of the noncanonically base-paired active site has implications for catalysis. Simulations were performed from two different initial arrangements of the reactant state, differing in the conformation of the GTP1 triphosphate and its Mg²⁺ coordination. The U71:O3′ nucleophile was observed to make direct hydrogen-bond interactions with the O2 of GTP1, in support of the role of this residue as a potential general base. These studies advance our knowledge of the coupled catalytic/conformational riboswitch mechanism of L1L and may have broader implications for understanding the function of prebiotic RNA enzymes.

The present work uses molecular simulations to provide a deeper understanding of how structure and dynamics affect catalysis in the HHR and the L1L ribozyme. In particular, the roles of conformational transitions and metal ion binding are explored.
The insights gained from these studies provide guiding principles for catalysis by an archetypal ribozyme and by a novel catalytic riboswitch. They may have broader implications for prebiotic RNA enzymes and may ultimately facilitate the design of new RNA-based biomedical technology. Much work, both experimental and theoretical, is still needed to reach a consensus view of the detailed mechanism; the present study provides important groundwork toward that goal.

ACKNOWLEDGMENTS

We are grateful for financial support provided by the National Institutes of Health (GM62248 for the HHR studies and GM084149 for the L1L studies). This work was supported in part by the University of Minnesota Biomedical Informatics and Computational Biology program (TL and DY). It was also supported by a generous allocation on an IBM Blue Gene/P with 4,096 850-MHz CPUs at the IBM Advanced Client Technology Center in Rochester, Minnesota; we further thank Cindy Mestad, Steven Westerbeck, and Geoffrey Costigan for technical assistance. This research was also performed in part using the Molecular Science Computing Facility (MSCF) in the William R. Wiley Environmental Molecular Sciences Laboratory. This is a national scientific user facility sponsored by the U.S. Department of Energy’s Office of Biological and Environmental Research and located at the Pacific Northwest National Laboratory, operated for the Department of Energy by Battelle. Computational resources from the Minnesota Supercomputing Institute for Advanced Computational Research (MSI) were utilized in this work.


REFERENCES

1. Chen, X., Li, N., Ellington, A.D. Ribozyme catalysis of metabolism in the RNA world. Chem. Biodivers. 2007, 4, 633–55.
2. Gesteland, R.F., Cech, T.R., Atkins, J.F. The RNA World: The Nature of Modern RNA Suggests a Prebiotic RNA, 2nd edn., Cold Spring Harbor Laboratory Press, New York, 1999.
3. Gilbert, W. The RNA world. Nature 1986, 319, 618.
4. Lilley, D.M. In Ribozymes and RNA Catalysis. RSC Biomolecular Series (eds D.M. Lilley and F. Eckstein), RSC Publishing, Cambridge, 2008, pp. 66–91.
5. Scott, W.G. Molecular palaeontology: Understanding catalytic mechanisms in the RNA world by excavating clues from a ribozyme three-dimensional structure. Biochem. Soc. Trans. 1996, 24(3), 604–8.
6. Scott, W.G. Ribozymes. Curr. Opin. Struct. Biol. 2007, 17, 280–6.
7. Yarus, M. Boundaries for an RNA world. Curr. Opin. Chem. Biol. 1999, 3, 260–7.
8. Breaker, R.R. Engineered allosteric ribozymes as biosensor components. Curr. Opin. Biotechnol. 2002, 13(1), 31–9.
9. Rubenstein, M., Tsui, R., Guinan, P. A review of antisense oligonucleotides in the treatment of human disease. Drugs Future 2004, 29, 893–909.
10. Vaish, N.K., Dong, F., Andrews, L., Schweppe, R.E., Ahn, N.G., Blatt, L., Seiwert, S.D. Monitoring post-translational modification of proteins with allosteric ribozymes. Nat. Biotech. 2002, 20, 810–5.
11. McDowell, S.E., Špačková, N., Šponer, J., Walter, N.G. Molecular dynamics simulations of RNA: An in silico single molecule approach. Biopolymers 2007, 85, 169–84.
12. Gao, J. Methods and applications of combined quantum mechanical and molecular mechanical potentials. Rev. Comput. Chem. 1995, 7, 119–85.
13. Monard, G., Merz, K.M., Jr. Combined quantum mechanical/molecular mechanical methodologies applied to biomolecular systems. Acc. Chem. Res. 1999, 32, 904–11.
14. Senn, H.M., Thiel, W. QM/MM studies of enzymes. Curr. Opin. Chem. Biol. 2007, 11, 182–7.
15. Warshel, A. Molecular dynamics simulations of biological reactions. Acc. Chem. Res. 2002, 35, 385–95.
16. Warshel, A. Computer simulations of enzyme catalysis: Methods, progress, and insights. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 425–43.
17. Norberg, J., Nilsson, L. Molecular dynamics applied to nucleic acids. Acc. Chem. Res. 2002, 35, 465–72.
18. Orozco, M., Noy, A., Perez, A. Recent advances in the study of nucleic acid flexibility by molecular dynamics. Curr. Opin. Struct. Biol. 2008, 18(2), 185–93, http://dx.doi.org/10.1016/j.sbi.2008.01.005.
19. Orozco, M., Perez, A., Noy, A., Luque, F.J. Theoretical methods for the simulation of nucleic acids. Chem. Soc. Rev. 2003, 32, 350–64.
20. Auffinger, P., Hashem, Y. Nucleic acid solvation: From outside to insight. Curr. Opin. Struct. Biol. 2007, 17(3), 325–33, http://dx.doi.org/10.1016/j.sbi.2007.05.008.
21. Cheatham, T.E., III. Simulation and modeling of nucleic acid structure, dynamics and interactions. Curr. Opin. Struct. Biol. 2004, 14, 360–7.
22. Chen, S.-J. RNA folding: Conformational statistics, folding kinetics, and ion electrostatics. Annu. Rev. Biophys. 2008, 37, 197–214, http://dx.doi.org/10.1146/annurev.biophys.37.032807.125957.
23. Blount, K.F., Uhlenbeck, O.C. The hammerhead ribozyme. Biochem. Soc. Trans. 2002, 30(Pt 6), 1119–22, http://dx.doi.org/10.1042/.
24. Scott, W.G. Biophysical and biochemical investigations of RNA catalysis in the hammerhead ribozyme. Q. Rev. Biophys. 1999, 32, 241–94.
25. Doherty, E.A., Doudna, J.A. Ribozyme structures and mechanisms. Annu. Rev. Biophys. Biomol. Struct. 2001, 30, 457–75.
26. Scott, W.G. RNA catalysis. Curr. Opin. Struct. Biol. 1998, 8(6), 720–6.
27. Michienzi, A., Cagnon, L., Bahner, I., Rossi, J.J. Ribozyme-mediated inhibition of HIV-1 suggests nucleolar trafficking of HIV-1 RNA. Proc. Natl. Acad. Sci. U.S.A. 2000, 97(16), 8955–60.


28. Sarver, N., Cantin, E.M., Chang, P.S., Zaia, J.A., Ladne, P.A., Stephens, D.A., Rossi, J.J. Ribozymes as potential anti-HIV-1 therapeutic agents. Science 1990, 247, 1222–5.
29. Snyder, D.S., Wu, Y., Wang, J.L., Rossi, J.J., Swiderski, P., Kaplan, B.E., Forman, S.J. Ribozyme-mediated inhibition of bcr-abl gene expression in a Philadelphia chromosome-positive cell line. Blood 1993, 82(2), 600–5.
30. Feng, Y., Kong, Y.Y., Wang, Y., Qi, G.R. Intracellular inhibition of the replication of hepatitis B virus by hammerhead ribozymes. J. Gastroenterol. Hepatol. 2001, 16(10), 1125–30.
31. Weinberg, M., Passman, M., Kew, M., Arbuthnot, P. Hammerhead ribozyme-mediated inhibition of hepatitis B virus X gene expression in cultured cells. J. Hepatol. 2000, 33(1), 142–51.
32. Sano, M., Taira, K. Hammerhead ribozyme-based target discovery. Methods Mol. Biol. 2007, 360, 143–53, http://dx.doi.org/10.1385/1-59745-165-7:143.
33. Martick, M., Horan, L.H., Noller, H.F., Scott, W.G. A discontinuous hammerhead ribozyme embedded in a mammalian messenger RNA. Nature 2008a, 454(7206), 899–902, http://dx.doi.org/10.1038/nature07117.
34. Blount, K.F., Uhlenbeck, O.C. The structure-function dilemma of the hammerhead ribozyme. Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 415–40.
35. Takagi, Y., Ikeda, Y., Taira, K. Ribozyme mechanisms. Top. Curr. Chem. 2004, 232, 213–51.
36. Suzumura, K., Takagi, Y., Orita, M., Taira, K. NMR-based reappraisal of the coordination of a metal ion at the Pro-Rp oxygen of the A9/G10.1 site in a hammerhead ribozyme. J. Am. Chem. Soc. 2004, 126(47), 15504–11.
37. Wang, S., Karbstein, K., Peracchi, A., Beigelman, L., Herschlag, D. Identification of the hammerhead ribozyme metal ion binding site responsible for rescue of the deleterious effect of a cleavage site phosphorothioate. Biochemistry 1999, 38(43), 14363–78.
38. Murray, J.B., Szöke, H., Szöke, A., Scott, W.G. Capture and visualization of a catalytic RNA enzyme-product complex using crystal lattice trapping and X-ray holographic reconstruction. Mol. Cell 2000, 5, 279–87.
39. Murray, J.B., Terwey, D.P., Maloney, L., Karpeisky, A., Usman, N., Beigelman, L., Scott, W.G. The structural basis of hammerhead ribozyme self-cleavage. Cell 1998, 92, 665–73.
40. Scott, W.G., Murray, J.B., Arnold, J.R.P., Stoddard, B.L., Klug, A. Capturing the structure of a catalytic RNA intermediate: The hammerhead ribozyme. Science 1996, 274, 2065–9.
41. Martick, M., Scott, W.G. Tertiary contacts distant from the active site prime a ribozyme for catalysis. Cell 2006, 126(2), 309–20.
42. Martick, M., Lee, T.-S., York, D.M., Scott, W.G. Solvent structure and hammerhead ribozyme catalysis. Chem. Biol. 2008, 15, 332–42.
43. Lee, T.-S., Giambaşu, G.M., Sosa, C.P., Martick, M., Scott, W.G., York, D.M. Threshold occupancy and specific cation binding modes in the hammerhead ribozyme active site are required for active conformation. J. Mol. Biol. 2009, 388, 195–206, http://dx.doi.org/10.1016/j.jmb.2009.02.054.
44. Lee, T.-S., Silva López, C., Giambaşu, G.M., Martick, M., Scott, W.G., York, D.M. Role of Mg²⁺ in hammerhead ribozyme catalysis from molecular simulation. J. Am. Chem. Soc. 2008, 130(10), 3053–64.
45. Lee, T.-S., Silva López, C., Martick, M., Scott, W.G., York, D.M. Insight into the role of Mg²⁺ in hammerhead ribozyme catalysis from X-ray crystallography and molecular dynamics simulation. J. Chem. Theory Comput. 2007, 3, 325–7.
46. Lee, T.-S., York, D.M. Origin of mutational effects at the C3 and G8 positions on hammerhead ribozyme catalysis from molecular dynamics simulations. J. Am. Chem. Soc. 2008, 130(23), 7168–9.
47. Scott, W.G. Morphing the minimal and full-length hammerhead ribozymes: Implications for the cleavage mechanism. Biol. Chem. 2007, 388, 727–35.
48. Joyce, G.F. A glimpse of biology’s first enzyme. Science 2007, 315, 1507–8.
49. Bartel, D.P., Szostak, J.W. Isolation of new ribozymes from a large pool of random sequences. Science 1993, 261(5127), 1411–8.
50. Bagby, S.C., Bergman, N.H., Shechner, D.M., Yen, C., Bartel, D.P. A class I ligase ribozyme with reduced Mg²⁺ dependence: Selection, sequence analysis, and identification of functional tertiary interactions. RNA 2009, 15, 2129–46.


51. Ekland, E.H., Szostak, J.W., Bartel, D.P. Structurally complex and highly active RNA ligases derived from random RNA sequences. Science 1995, 269, 364–9.
52. Ikawa, Y., Tsuda, K., Matsumura, S., Inoue, T. De novo synthesis and development of an RNA enzyme. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 13750–5.
53. Jaeger, L., Wright, M.C., Joyce, G.F. A complex ligase ribozyme evolved in vitro from a group I ribozyme domain. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 14712–7.
54. McGinness, K.E., Joyce, G.F. In search of an RNA replicase ribozyme. Chem. Biol. 2003, 10(1), 5–14.
55. Robertson, M.P., Ellington, A.D. In vitro selection of an allosteric ribozyme that transduces analytes to amplicons. Nat. Biotech. 1999, 17, 62–6.
56. Rogers, J., Joyce, G.F. A ribozyme that lacks cytidine. Nature 1999, 402, 323–5.
57. Shechner, D.M., Grant, R.A., Bagby, S.C., Koldobskaya, Y., Piccirilli, J.A., Bartel, D.P. Crystal structure of the catalytic core of an RNA-polymerase ribozyme. Science 2009, 326, 1271–5.
58. Landweber, L.F., Pokrovskaya, I.D. Emergence of a dual-catalytic RNA with metal-specific cleavage and ligase activities: The spandrels of RNA evolution. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 173–8.
59. Ellington, A.D., Szostak, J.W. In vitro selection of RNA molecules that bind specific ligands. Nature 1990, 346(6287), 818–22, http://dx.doi.org/10.1038/346818a0.
60. Marshall, K.A., Ellington, A.D. Training ribozymes to switch. Nat. Struct. Biol. 1999, 6(11), 992–4.
61. Robertson, M.P., Ellington, A.D. Design and optimization of effector-activated ribozyme ligases. Nucleic Acids Res. 2000, 28, 1751–9.
62. Robertson, M.P., Ellington, A.D. In vitro selection of nucleoprotein enzymes. Nat. Biotech. 2001, 19, 650–5.
63. Robertson, M.P., Knudsen, S.M., Ellington, A.D. In vitro selection of ribozymes dependent on peptides for activity. RNA 2004, 10, 114–27.
64. Robertson, M.P., Scott, W.G. The structural basis of ribozyme-catalyzed RNA assembly. Science 2007, 315, 1549–50.
65. Giambaşu, G.M., Lee, T.-S., Sosa, C.P., Robertson, M.P., Scott, W.G., York, D.M. Identification of dynamical hinge points of the L1 ligase molecular switch. RNA 2010, 16(4), 769–80.
66. Duarte, C.M., Pyle, A.M. Stepping through an RNA structure: A novel approach to conformational analysis. J. Mol. Biol. 1998, 284(5), 1465–78, http://dx.doi.org/10.1006/jmbi.1998.2233.
67. Wadley, L.M., Keating, K.S., Duarte, C.M., Pyle, A.M. Evaluating and learning from RNA pseudotorsional space: Quantitative validation of a reduced representation for RNA structure. J. Mol. Biol. 2007, 372(4), 942–57, http://dx.doi.org/10.1016/j.jmb.2007.06.058.
68. Curtis, E.A., Bartel, D.P. The hammerhead cleavage reaction in monovalent cations. RNA 2001, 7, 546–52.
69. Murray, J.B., Seyhan, A.A., Walter, N.G., Burke, J.M., Scott, W.G. The hammerhead, hairpin and VS ribozymes are catalytically proficient in monovalent cations alone. Chem. Biol. 1998, 5, 587–95.
70. O’Rear, J.L., Wang, S., Feig, A.L., Beigelman, L., Uhlenbeck, O.C., Herschlag, D. Comparison of the hammerhead cleavage reactions stimulated by monovalent and divalent cations. RNA 2001, 7, 537–45.
71. Davis, J.H., Foster, T.R., Tonelli, M., Butcher, S.E. Role of metal ions in the tetraloop-receptor complex as analyzed by NMR. RNA 2007, 13(1), 76–86, http://dx.doi.org/10.1261/rna.268307.
72. Mayaan, E., Moser, A., MacKerell, A.D., Jr., York, D.M. CHARMM force field parameters for simulation of reactive intermediates in native and thio-substituted ribozymes. J. Comput. Chem. 2007, 28, 495–507.
73. Przybilski, R., Hammann, C. The tolerance to exchanges of the Watson–Crick base pair in the hammerhead ribozyme core is determined by surrounding elements. RNA 2007, 13, 1625–30.
74. Burgin, A.B., Jr., Gonzalez, C., Matulic-Adamic, J., Karpeisky, A.M., Usman, N., McSwiggen, J.A., Beigelman, L. Chemically modified hammerhead ribozymes with improved catalytic rates. Biochemistry 1996, 35, 14090–7.
75. Soukup, G.A., Breaker, R.R. Relationship between internucleotide linkage geometry and the stability of RNA. RNA 1999, 5, 1308–25.


76. Baidya, N., Uhlenbeck, O.C. A kinetic and thermodynamic analysis of cleavage site mutations in the hammerhead ribozyme. Biochemistry 1997, 36, 1108–14.
77. Ruffner, D.E., Stormo, G.D., Uhlenbeck, O.C. Sequence requirements of the hammerhead RNA self-cleavage reaction. Biochemistry 1990, 29, 10695–712.
78. Han, J., Burke, J.M. Model for general acid-base catalysis by the hammerhead ribozyme: pH-activity relationships of G8 and G12 variants at the putative active site. Biochemistry 2005, 44, 7864–70.
79. Nelson, J.A., Uhlenbeck, O.C. Minimal and extended hammerheads utilize a similar dynamic reaction mechanism for catalysis. RNA 2008, 14(1), 43–54, http://dx.doi.org/10.1261/rna.717908.
80. Steitz, T.A. DNA- and RNA-dependent DNA polymerases. Curr. Opin. Struct. Biol. 1993, 3, 31–8.
81. Steitz, T.A., Steitz, J.A. A general two-metal-ion mechanism for catalytic RNA. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 6498–502.
82. Leontis, N.B., Stombaugh, J., Westhof, E. The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Res. 2002, 30, 3497–531.
83. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kalé, L., Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781–802.
84. Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W., Kollman, P.A. A second generation force field for the simulation of proteins, nucleic acids and organic molecules. J. Am. Chem. Soc. 1995, 117, 5179–97.
85. Case, D., Darden, T., Cheatham, T., III, Simmerling, C., Wang, J., Duke, R., Luo, R., Crowley, M., Walker, R., Zhang, W., Merz, K., Wang, B., Hayik, S., Roitberg, A., Seabra, G., Kolossvary, I., Wong, K., Paesani, F., Vanicek, J., Wu, X., Brozell, S., Steinbrecher, T., Gohlke, H., Yang, L., Tan, C., Mongan, J., Hornak, V., Cui, G., Mathews, D., Seetin, M., Sagui, C., Babin, V., Kollman, P. AMBER 10, University of California, San Francisco, 2008.
86. Case, D.A., Cheatham, T.E., III, Darden, T., Gohlke, H., Luo, R., Merz, K.M., Onufriev, A., Simmerling, C., Wang, B., Woods, R.J. The AMBER biomolecular simulation programs. J. Comput. Chem. 2005, 26, 1668–88.
87. Pearlman, D.A., Case, D.A., Caldwell, J.W., Ross, W.R., Cheatham, T., III, DeBolt, S., Ferguson, D., Seibel, G., Kollman, P. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structure and energetic properties of molecules. Comput. Phys. Commun. 1995, 91, 1–41.
88. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926–35.
89. Feller, S., Zhang, Y., Pastor, R., Brooks, B. Constant pressure molecular dynamics simulation: The Langevin piston method. J. Chem. Phys. 1995, 103, 4613–21.
90. Martyna, G.J., Tobias, D.J., Klein, M.L. Constant pressure molecular dynamics algorithms. J. Chem. Phys. 1994, 101, 4177–89.
91. Essmann, U., Perera, L., Berkowitz, M.L., Darden, T., Lee, H., Pedersen, L.G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103(19), 8577–93.
92. Sagui, C., Darden, T.A. Molecular dynamics simulations of biomolecules: Long-range electrostatic effects. Annu. Rev. Biophys. Biomol. Struct. 1999, 28, 155–79.
93. Allen, M., Tildesley, D. Computer Simulation of Liquids, Oxford University Press, Oxford, 1987.
94. Ryckaert, J.P., Ciccotti, G., Berendsen, H.J.C. Numerical integration of the Cartesian equations of motion of a system with constraints: Molecular dynamics of n-Alkanes. J. Comput. Phys. 1977, 23, 327–41.
95. Nam, K., Cui, Q., Gao, J., York, D.M. Specific reaction parametrization of the AM1/d Hamiltonian for phosphoryl transfer reactions: H, O, and P atoms. J. Chem. Theory Comput. 2007, 3, 486–504.
96. Hutter, M.C., Reimers, J.R., Hush, N.S. Modeling the bacterial photosynthetic reaction center. 1. Magnesium parameters for the semiempirical AM1 method developed using a genetic algorithm. J. Phys. Chem. B 1998, 102, 8080–90.
97. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M. CHARMM: A program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem. 1983, 4, 187–217.


98. Foloppe, N., MacKerell, A.D., Jr. All-atom empirical force field for nucleic acids: I. Parameter optimization based on small molecule and condensed phase macromolecular target data. J. Comput. Chem. 2000, 21, 86–104.
99. MacKerell, A.D., Jr., Banavali, N.K. All-atom empirical force field for nucleic acids: II. Application to molecular dynamics simulations of DNA and RNA in solution. J. Comput. Chem. 2000, 21, 105–20.
100. Gao, J., Amara, P., Alhambra, C., Field, M.J. A generalized hybrid orbital (GHO) method for the treatment of boundary atoms in combined QM/MM calculations. J. Phys. Chem. A 1998, 102, 4714–21.
101. Nam, K., Gao, J., York, D.M. An efficient linear-scaling Ewald method for long-range electrostatic interactions in combined QM/MM calculations. J. Chem. Theory Comput. 2005, 1(1), 2–13.

CHAPTER 11

Atomistic Modeling of Solid Oxide Fuel Cells

C. Heath Turner¹, Xian Wang¹, Kah Chun Lau², Wei An¹ and Brett I. Dunlap³

Contents

1. Introduction
2. Kinetic Monte Carlo Simulations
3. Kinetic Parameters
3.1 Experimental estimates of kinetic parameters
3.2 Computational estimates of kinetic parameters
4. Atomistic Simulations of Solid Oxide Fuel Cells
5. Conclusions and Future Perspective
Acknowledgments
References

Abstract

To improve the performance, reliability, and efficiency of solid oxide fuel cells, it is important to understand the underlying atomic-level interactions and mechanisms that dictate the global operation dynamics. As a complement to the ongoing experimental progress in this area, a large number of atomistic-level modeling studies have recently appeared in the literature. This chapter reviews the development of kinetic Monte Carlo simulation approaches for translating atomic-level information into experimentally observable properties. The combination of advanced experimental techniques with atomic-level simulations should provide the insight necessary for the optimal design of next-generation fuel cells.


Keywords: solid oxide fuel cell; modeling; kinetic Monte Carlo; frequency response; atomistic; density functional theory; electrochemistry

¹ Department of Chemical and Biological Engineering, The University of Alabama, Tuscaloosa, AL, USA
² Department of Chemistry, George Washington University, Washington DC, USA
³ Chemistry Division, Naval Research Laboratory, Washington DC, USA

Annual Reports in Computational Chemistry, Volume 6 ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06011-1

© 2010 Elsevier B.V. All rights reserved.


C. Heath Turner et al.

1. INTRODUCTION

Solid oxide fuel cells (SOFCs) hold a great deal of promise for converting hydrocarbon fuels into electricity. Their conversion efficiency (nearly 70%, which can be increased further with cogeneration) is significantly higher than that of conventional heat engines [1]. If H2 is used as the fuel, electrical power can be generated without CO2 emission. A defining operating characteristic of SOFCs is that high operating temperatures (>800 °C) are typically required for efficient energy conversion. High-temperature operation limits the types of applications that can benefit from this technology, and at present most applications have been restricted to stationary power generation [2–4]. In addition, the high operating temperatures have detrimental effects on the long-term stability of these systems. As a consequence, one of the primary goals of next-generation SOFC development is lowering the operating temperature. Doing so would increase material stability and reduce the need for expensive material components (such as the interconnects). As a simple example, long-term material degradation can still be a major concern [5–9], even though the cation sublattice is relatively stable compared with the mobile oxygen species. At temperatures above 800 °C and with expected operating times of more than 30,000 h, cation diffusion can be a major contributor to performance loss. At the same time, the power output must be maintained at lower temperatures, and this is a major challenge. It should also be recognized that relatively high temperatures (650–800 °C) remain attractive, because this is an optimal window for internal reforming of fuels such as natural gas [1]. To improve the performance, reliability, and efficiency of SOFCs, it is important to develop a clear understanding of the underlying atomic-level interactions and mechanisms that dictate the global operation dynamics.
Advanced experimental analytical tools can provide some of this insight. For instance, De Souza and Martin [10] have recently highlighted the utility of secondary ion mass spectrometry for determining elemental and isotopic distributions in solids. This is a powerful approach for understanding the diffusion, mobility, and distribution of elements within SOFC materials with nanometer resolution. These ex situ measurements provide information on a length scale commensurate with many atomistic-level simulation studies, and thus can provide precise constraints on modeling predictions. Considerable experimental progress has been made in improving SOFC performance. However, some fundamental issues remain difficult to access experimentally, and this limits further progress. As such, a number of recent computational investigations have been designed to extract some of this missing information, as a complement to the experimental studies. These modeling studies can be classified according to the resolution of the model. They range from large-scale microkinetic models [11–13] all the way down to high-accuracy electronic structure calculations [14–24]. Each modeling approach offers a certain level of detail and understanding. Here we focus on the atomistic-scale modeling techniques that have been applied to these systems, several of which have emerged very recently.


In general, modeling approaches are useful because they improve our understanding of the underlying behavior and mechanisms that are ultimately responsible for the experimental observations. In particular, SOFC performance is dictated by many factors. Some are easy to control and study (e.g., temperature, current, and fuel composition), while others are much more difficult to isolate and understand (charge-transfer (CT) limitations, double-layer structure, ion migration mechanisms, etc.). This creates an opportunity for computational modeling investigations. Models are necessarily limited to reduced (often significantly reduced) representations of the actual systems, but the clear information they yield can still offer valuable insight. For instance, strong cause-and-effect relationships involving the system operating parameters can be extracted, detailed structural information can be obtained, and the energetic role of individual system events can be quantified. If a model can demonstrate predictive ability, it becomes a very valuable tool for system design and operation. To achieve this type of functionality, a model must capture the fundamental chemical, physical, and structural characteristics of a system. This can be achieved most rigorously (i.e., independent of empirical data fitting) with state-of-the-art ab initio and first-principles electronic structure calculations [25]. In general, however, accurate electronic structure calculations are better suited to individual reaction events with small, well-defined geometries and very short timescale dynamics. Therefore, these calculations are not suitable for describing the overall operation dynamics of relatively large, complex systems such as SOFCs. Ab initio and classical molecular dynamics (MD) simulations encounter similar difficulties when long-time information is desired.
Accurate numerical integration during an MD simulation requires time steps short enough to capture atomic vibrations (~10⁻¹⁵ s). This typically limits the total simulation time to less than a microsecond in a classical MD simulation. In spite of the computational challenges, there have been some attempts at MD simulations of SOFC behavior [26–31], such as modeling oxygen diffusion and ion transport within a yttria-stabilized zirconia (YSZ) electrolyte. The kinetic Monte Carlo (KMC) simulation method focuses on the state-to-state dynamic transitions and neglects the short-time system fluctuations. This approximation allows much longer timescales to be reached without a chemically relevant loss of resolution, especially for solid-state systems. This is particularly important because the diffusion of an oxygen ion on the surface of a YSZ electrolyte (among defect sites) requires approximately 1 ms. The adsorption of one oxygen molecule onto YSZ at 0.01 atm pressure requires approximately 0.5 ms [32]. Deterministic simulation methods such as MD therefore cannot easily capture this behavior, and other methods must be employed. Here, we focus on recent developments in modeling SOFCs that have been attained with KMC simulations. This general modeling approach has been very successful in predicting surface deposition processes [33–38] and heterogeneous catalysis [39–48]. It has now become a valuable tool for modeling other systems, such as fuel cells, and for extracting atomic-level information. We do not present a comprehensive discussion of KMC or of the valuable developments of this method that have recently emerged. Rather, we discuss the implementation and characteristics of this method for describing SOFC performance. We present some of the details of implementing this approach for modeling SOFC behavior, illustrate some of its initial predictions and capabilities, and conclude with some remarks about expectations for the future.
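Simple arithmetic makes the gap between MD and the process times quoted above concrete. The sketch counts the integration steps an MD run would need to span each process, assuming a 1 fs time step. The text itself gives only the ~10⁻¹⁵ s vibrational scale, so the step size is our assumption. The process labels and the helper name md_steps_needed are illustrative.

```python
FS = 1e-15  # one femtosecond, in seconds (assumed MD time step)


def md_steps_needed(duration_s, timestep_s=1.0 * FS):
    """Number of MD integration steps required to cover duration_s."""
    return duration_s / timestep_s


# Process times quoted in the text (seconds).
processes = {
    "typical classical MD ceiling (~1 microsecond)": 1e-6,
    "oxygen-ion diffusion among YSZ surface defect sites (~1 ms)": 1e-3,
    "adsorption of one O2 molecule at 0.01 atm (~0.5 ms)": 0.5e-3,
}

for label, duration in processes.items():
    print(f"{label}: {md_steps_needed(duration):.1e} steps")
```

Covering a single ~1 ms surface-diffusion event takes about 10¹² steps, roughly a thousand times beyond the ~10⁹ steps of a microsecond-scale MD run. KMC avoids this cost by skipping the vibrational dynamics between events.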

2. KINETIC MONTE CARLO SIMULATIONS

The primary simulation method that we focus on here is KMC [49–53]. It is a very general approach that can be easily applied to a wide range of materials and dynamical processes. Ultimately, it can provide a good estimate of the time evolution of Markovian processes for a given system [50]. It relies on a priori knowledge of a given set of transition rates characterizing the simulated processes, which are assumed to obey Poisson statistics [54]. Thus, a simulation is usually initiated by defining an initial structural configuration, along with a list of the different kinetically driven events or transitions that may occur in the system. In a typical KMC simulation, the system is propagated through time by stochastically selecting the next event (n) to occur, based on the relative probability (Pn) of each possible event. The relative probability of each event is generally dictated by its intrinsic rate constant (kn), which must be specified as an input to the simulation. These rate constants are typically supplied in the Arrhenius form, kn = An exp(−En/kBT), with a preexponential factor (An) and an activation barrier (En), where kB is the Boltzmann constant and T is the temperature. The probability of each event may be affected by the local environment, and therefore the event probabilities must be updated at each step. For instance, the instantaneous adsorbate coverage on a surface will often affect the adsorption and desorption rates of additional molecules, and KMC can capture this type of behavior. Unlike in traditional Monte Carlo simulations, a simulation clock is incremented at each step as the system is propagated, so that the time evolution of the system can be monitored. The timescale accessible in a KMC simulation tends to be inversely proportional to the rate of the fastest individual events included in the simulation.
This allows systems with rare-event processes, such as catalytic systems with high activation barriers, to be modeled efficiently, because the dynamics between events (e.g., atomic vibrations) are not simulated explicitly. KMC simulations can be accelerated further by mapping the system coordinates onto a lattice, which is often a reasonable approximation for crystalline solids or other systems with well-defined configurations. Although this approximation reduces the flexibility of the model, it makes evaluating neighbor–neighbor interactions and propagating the system configuration through time much more computationally efficient. Once a system is placed on a discrete lattice, a basic recipe for propagating it can be constructed. For instance, if the system is assigned to a cubic lattice, each lattice site will correspond to a unique

Atomistic Modeling of Solid Oxide Fuel Cells


set of integer coordinates (x, y, z), and the possible events that may occur at each site can be enumerated, together with their associated rate constants. Because the event probabilities at each (x, y, z) location depend on the current configuration, they evolve constantly as the simulation proceeds. Thus, the event rates must be computed at the beginning of the simulation and updated after each simulation step. Mathematically, this is expressed by the following equations, where the rate Gn,site(x, y, z) of each event n is calculated at each site (x, y, z) and summed to give the net event rate Gn, and Gtotal represents the total rate of all events in the system:

$$G_n = \sum_{x}\sum_{y}\sum_{z} G_{n,\mathrm{site}}(x,y,z) \qquad (1)$$

$$G_{\mathrm{total}} = \sum_{n} G_n \qquad (2)$$
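As an illustration of Eqs. (1) and (2), the minimal Python sketch below stores the per-site rates Gn,site(x, y, z) in a NumPy array indexed as [n, x, y, z] and reduces it to the per-event rates Gn and the total rate Gtotal. The array layout, the function name `total_rates`, and the toy rate values are assumptions made for illustration, not part of any published SOFC code.

```python
import numpy as np

def total_rates(site_rates):
    """Reduce per-site rates to per-event and total rates.

    site_rates[n, x, y, z] holds G_{n,site}(x, y, z) for event type n
    (hypothetical layout chosen for this sketch).
    """
    G_n = site_rates.sum(axis=(1, 2, 3))  # Eq. (1): sum over x, y, z for each n
    G_total = G_n.sum()                   # Eq. (2): sum over event types n
    return G_n, G_total

# Toy example: two event types on a 4 x 4 x 4 cubic lattice (illustrative rates, s^-1)
site_rates = np.zeros((2, 4, 4, 4))
site_rates[0] = 1.0e3          # event 0 possible at every site
site_rates[1, 0, :, :] = 5.0   # event 1 possible only on the x = 0 face
G_n, G_total = total_rates(site_rates)
```

Here event 0 contributes 64 × 1000 s⁻¹ and event 1 contributes 16 × 5 s⁻¹. In practice, only the rates near the site that just changed need to be recomputed after each event; the full reduction shown here is the simplest option rather than the fastest.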

After the system configuration has been defined and the initial rates have been calculated, the system clock is advanced according to the following equation, where Δt is the time step and RN is a random number uniformly distributed between 0 and 1 (because ln(RN) < 0, the minus sign keeps Δt positive):

$$\Delta t = -\frac{\ln(RN)}{G_{\mathrm{total}}} \qquad (3)$$

After the clock has been incremented, the system configuration is updated by stochastically choosing an event to occur, according to the probability

$$P_{n,\mathrm{site}} = \frac{G_{n,\mathrm{site}}}{G_{\mathrm{total}}} \qquad (4)$$

One way to find the event to carry out is to arrange the individual event rates in an ordered list, form the running (cumulative) sums Rn of the first n rates, draw a second uniform random number RN2, and select the event n for which Rn−1 < RN2 · Gtotal ≤ Rn. Once an event has been selected, the system configuration is updated, and the list of event rates is recomputed for the new configuration. An event is always performed at each time step. This differs from traditional Monte Carlo calculations, which typically propagate the system by performing trial moves and then accepting or rejecting them. In KMC, system properties and structural details can be collected as the system propagates through time, and statistically reliable values typically require 10⁶–10⁹ KMC steps. The KMC technique can also be implemented on parallel computing architectures [54–56] to accelerate simulations of larger systems, on the basis of perfect time synchronicity [54]. A major practical challenge of KMC simulations is to create a complete catalog of all of the possible processes (or at least the dominant ones), along with accurate transition probabilities. The importance of this issue cannot be overstated. For instance, it must be understood that the KMC method is not


C. Heath Turner et al.

predictive in the sense that if a reaction or an event is not specified, it will never occur during the simulation. Therefore, if an important or dominant event is not accounted for, the results will likely be unreliable and any predictions questionable. In the next section, we discuss the origin of the kinetic parameters that have been implemented in several KMC studies of SOFCs. These parameters are extracted from experiments, calculated with electronic structure methods, or approximated from the results for related systems.
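The time-step rule of Eq. (3) and the event-selection rule of Eq. (4), using the cumulative-sum search described in this section, can be combined into a single rejection-free KMC step. The Python sketch below stores the rates Gn,site(x, y, z) in a NumPy array indexed as [n, x, y, z] (a hypothetical layout); it uses 1 − U in place of RN so that the logarithm is never taken of zero, and a binary search (`np.searchsorted`) over the cumulative rates to pick the event. It is an illustrative sketch under these assumptions, not the implementation used in the studies reviewed here.

```python
import math
import numpy as np

def kmc_step(site_rates, rng):
    """Perform one rejection-free KMC step (selection only; no state update).

    site_rates[n, x, y, z] holds G_{n,site}(x, y, z) (hypothetical layout).
    Returns (dt, n, (x, y, z)) for the selected event.
    """
    flat = site_rates.ravel()
    cumulative = np.cumsum(flat)      # running sums R_1, R_2, ...
    G_total = cumulative[-1]          # Eq. (2)
    # Eq. (3): 1 - U lies in (0, 1], so the logarithm is always finite
    dt = -math.log(1.0 - rng.random()) / G_total
    # Eq. (4): pick entry j with probability flat[j] / G_total, i.e. the first j
    # with R_{j-1} <= u * G_total < R_j (zero-rate entries can never be chosen)
    target = rng.random() * G_total
    j = int(np.searchsorted(cumulative, target, side="right"))
    if j == flat.size:                # guard against round-off at the upper end
        j = int(np.flatnonzero(flat)[-1])
    n, x, y, z = np.unravel_index(j, site_rates.shape)
    return dt, int(n), (int(x), int(y), int(z))

# Usage: only two entries have nonzero rates (illustrative values, s^-1)
rates = np.zeros((2, 3, 3, 3))
rates[0, 1, 1, 1] = 3.0
rates[1, 0, 2, 0] = 1.0
rng = np.random.default_rng(42)
results = [kmc_step(rates, rng) for _ in range(10000)]
frac_first = sum(1 for _, n, _ in results if n == 0) / len(results)  # near 3/4
mean_dt = sum(dt for dt, _, _ in results) / len(results)            # near 1/4 s
```

After each selected event, a full simulation would apply the event to the lattice and recompute the affected rates before calling the step again, as described in the text.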

3. KINETIC PARAMETERS

To design a reasonable SOFC model for a KMC simulation, the first step is to choose appropriate electrolyte, anode, and cathode materials. At the same time, all of the state-to-state dynamics and reaction paths must be represented by a series of well-defined rate constants, which together form a set of kinetic parameters. These electrochemical reaction events are designed to satisfy microscopic reversibility, so that each forward event has a corresponding reverse event (e.g., adsorption/desorption, association/dissociation, and incorporation/excorporation). For every event, establishing accurate kinetic parameters (including activation barriers and preexponential factors) is critical for modeling the system correctly. This is especially true for the dominant (i.e., rate-limiting) events, whose kinetic parameters are pivotal for predicting the correct system behavior. Fortunately, experimental efforts and high-level theoretical calculations (i.e., electronic structure approaches) have provided many of the structural and kinetic details relevant to SOFC operation, and these can be used directly within the KMC modeling framework [57–61].

3.1 Experimental estimates of kinetic parameters

Although the KMC modeling approach is intended to mimic a real experimental SOFC device, a minimally detailed yet realistic model is often desired in order to reduce the computational cost. To begin with, YSZ is commonly used as an SOFC electrolyte and is an ideal candidate material [62]. Its structure is easily modeled with a well-defined lattice, with yttrium and zirconium distributed on a face-centered-cubic cation lattice and oxygen ions and vacancies placed on a simple-cubic anion lattice. Doping with yttria (Y2O3) stabilizes the cubic fluorite structure of zirconia (ZrO2) and supplies the oxygen vacancies responsible for ionic conduction at high temperature [61,63,64], yielding a superionic conducting material [65] with a stoichiometry of (Y2O3)x(ZrO2)1−2x. Meanwhile, extensive searches continue for novel electrolytes with higher ionic conductivity near room temperature. Substantial experimental effort has focused on alternative doping strategies for the synthesis of new bulk materials, or on nanotechnology routes for fabricating artificial nanostructures with


improved ionic transport properties and conductivity [61,66,67]. Consistent with experiments, KMC simulations have most often adopted a YSZ electrolyte with a dopant level of 8–9 mol%, and the YSZ is assumed to behave as an ideal electrolyte (no electronic conduction). In these SOFC models, the electrolyte is often treated as a capacitive medium characterized by its dielectric properties. The relative permittivity (εr) of YSZ, which affects the buildup of electrochemical double layers at the electrode/electrolyte interfaces, can easily be adjusted in KMC simulations, which is useful for predicting property–performance relationships in overall SOFC operation. Acting together with the electrolyte, the cathode and anode materials have a profound influence on the electrochemical reactions taking place within the three-phase boundary (TPB) region, and there is strong experimental support for this [68,69]. Depending upon the operating conditions, these reactions are often rate limiting, and they are particularly sensitive to any attempt to reduce the SOFC operating temperature [70,71]. In this redox process, the oxygen reduction reaction (ORR) occurs at the cathode, and the fuel is oxidized at the anode by the O²⁻ ions received from the electrolyte. The electrons released by fuel oxidation are delivered to a current collector, and the output voltage depends upon the magnitude of the overpotentials at the electrodes. Ideal electrode materials must be compatible with the electrolyte, show high electrochemical activity and stability, and have favorable thermal expansion properties. A wide variety of electrode materials has been reported, including monometallic materials (e.g., Ag, Pt, Cu, and Ni) and various alloy materials (composed of combinations of La, Mn, Fe, Cu, etc.) [71–75].
Although the alloy materials have demonstrated a great deal of potential in experiments, monometallic electrodes (such as Pt and Ni) are simpler to incorporate into KMC modeling studies. Thus, most modeling efforts have focused on the ORR at the TPB of a Pt/YSZ cathode and on fuel oxidation at the TPB of a Ni/YSZ anode. As mentioned in the previous section, the basic electrochemistry within the cathode, anode, and electrolyte of our SOFC model can be represented by a series of distinct electrochemical reactions, which together mimic the operation of an ideal SOFC. As an example, Table 1 lists a representative set of elementary reactions and their corresponding kinetic parameters; these steps (or slight variations of them) have been implemented in several KMC studies. As suggested by Gauckler's group [76–78], the kinetic parameters of several elementary steps on the cathode side can be assembled from previous experimental results. For example, oxygen adsorption on Pt surfaces has been investigated experimentally [79–81], revealing that the saturation coverage of atomic oxygen depends mainly on the Pt surface orientation and roughness. The value on polycrystalline Pt (10¹⁵ atoms/cm²) is slightly higher than the typical values of 2 × 10¹⁴ to 8 × 10¹⁴ atoms/cm² for low-index single-crystal Pt surfaces. Similarly, oxygen desorption, diffusion, dissociation, and dimerization at the Pt/YSZ interface have also been studied experimentally [82–85]. The corresponding reaction rates may be expressed in Arrhenius form as follows:


Table 1 Kinetic parameters used in the KMC simulations, where θ represents the local surface coverage and ( ) represents a surface adsorption site on the cathode or anode

Event (n) | Location | Elementary step | Prefactor (k0n) | Activation barrier (En) | References
1 | Cathode | Adsorption k1: O2(g) + ( ) → O2 | s0 = 0.18 (trapping probability) | m = 14.1 (sticking exponent) | [78,80]
2 | Cathode | Desorption k2: O2 → O2(g) + ( ) | 1.0 × 10¹³ s⁻¹ | 37.0 − 21.0θ kJ mol⁻¹ | [78,82]
3 | Cathode | Diffusion k3: O2 → O2 | 4.65 × 10⁻⁵ m² s⁻¹ | 140.5 kJ mol⁻¹ | [78,83,84]
4 | Cathode | Dissociation k4: O2 + ( ) → O⁻ + O⁻ | 5.0 × 10¹¹ s⁻¹ | 33.0 + 16θ kJ mol⁻¹ | [78,80]
5 | Cathode | Dimerization k5: O⁻ + O⁻ → O2 + ( ) | 2.4078 × 10¹⁴ m² s mol⁻¹ | 250.0 − 50θ kJ mol⁻¹ | [82,85]
6 | Cathode | Diffusion k6: O⁻ → O⁻ | 4.65 × 10⁻⁵ m² s⁻¹ | 140.5 kJ mol⁻¹ | [78,83,84]
7 | Cathode | Incorporation k7: O⁻ + VO•• + e⁻ → O²⁻(YSZ) + ( ) | 2.7899 × 10¹⁰ m³ s⁻¹ mol⁻¹ | 130.0 + 48.2425 kJ mol⁻¹ | [78]
8 | Cathode | Excorporation k8: O²⁻(YSZ) + ( ) → O⁻ + VO•• + e⁻ | 2.7899 × 10¹⁰ m³ s⁻¹ mol⁻¹ | 130.0 + 48.2425 kJ mol⁻¹ | [78]
9 | YSZ | Diffusion k9: VO•• → VO•• | 1.9 × 10¹³ s⁻¹ | 101.3 kJ mol⁻¹ | [21,78]
10 | Anode | Adsorption k10: H2(g) + 2( ) → H + H | 0.01 (trapping coefficient) | n/a | [86,88]
11 | Anode | Formation k11: O⁻ + H → OH⁻ | 1.0 × 10¹³ s⁻¹ | 97.9 kJ mol⁻¹ | [89,106]
12 | Anode | Dissociation k12: OH⁻ → O⁻ + H | 5.213 × 10¹² s⁻¹ | 37.19 kJ mol⁻¹ | [89,106]
13 | Anode | Association k13: H + H → H2(g) + 2( ) | 1.45418 × 10¹¹ s⁻¹ | 88.12 kJ mol⁻¹ | [89]
14 | Anode | Formation k14: OH⁻ + H → H2O | 7.8 × 10¹¹ s⁻¹ | 42.7 kJ mol⁻¹ | [89,106]
15 | Anode | Desorption k15: H2O → H2O(g) + ( ) | 4.579 × 10¹² s⁻¹ | 62.68 kJ mol⁻¹ | [89]
16 | Anode | Adsorption k16: H2O(g) + ( ) → H2O | 0.1 (trapping coefficient) | n/a | [87,88]
17 | Anode | Excorporation k17: O²⁻(YSZ) + ( ) → O⁻ + VO•• + e⁻ | 2.7899 × 10¹⁰ m³ s⁻¹ mol⁻¹ | 130.0 + 48.2425 kJ mol⁻¹ | [78]
18 | Anode | Incorporation k18: O⁻ + VO•• + e⁻ → O²⁻(YSZ) + ( ) | 2.7899 × 10¹⁰ m³ s⁻¹ mol⁻¹ | 130.0 + 48.2425 kJ mol⁻¹ | [78]
19 | Anode | Diffusion k19: H → H | 4.65 × 10⁻⁵ m² s⁻¹ | 140.5 kJ mol⁻¹ | [78,83,84]
20 | Anode | Diffusion k20: OH⁻ → OH⁻ | 4.65 × 10⁻⁵ m² s⁻¹ | 140.5 kJ mol⁻¹ | [78,83,84]
21 | Anode | Diffusion k21: H2O → H2O | 6.6 × 10⁻⁹ m² s⁻¹ | 140.5 kJ mol⁻¹ | [78,83,84,88]
22 | Anode | Dissociation k22: H2O + ( ) → OH⁻ + H | 5.655 × 10¹² s⁻¹ | 91.36 kJ mol⁻¹ | [89,106]
23 | Anode | Diffusion k23: O⁻ → O⁻ | 4.65 × 10⁻⁵ m² s⁻¹ | 140.5 kJ mol⁻¹ | [78,83,84]


$$k_n = k_n^{0}\exp\left(-\frac{E_n}{k_B T}\right) \qquad (5)$$

where kB is Boltzmann's constant and the subscript n denotes the event number. The rate expressions include an activation barrier (En), a preexponential factor (k0n), and sometimes a coverage-dependent correction to the reaction rate. On the anode side, the reaction mechanisms are more complicated because of the additional species and elementary reactions involved. At the anode, reactions including hydrogen adsorption on the Ni surface and water desorption from the YSZ surface may occur in parallel or consecutively. The sticking coefficient for hydrogen can be taken from chemisorption studies on Ni(111) [86], while, owing to a lack of data on Ni and YSZ, the sticking coefficient for water can be extracted from studies of the partial oxidation of methane over rhodium [87]. For similar reasons, the diffusion of surface-bound molecules and ions (e.g., water, hydroxyl, and oxygen ions) is assumed to be described by equivalent kinetic parameters, approximated by the oxygen diffusion rate on the cathode side. In some cases, the rates of individual events are difficult to extract or estimate from experiment. However, a great deal of high-level computational work is now being performed to obtain this information, as described in the following section.
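To make Eq. (5) concrete, the short Python sketch below evaluates an Arrhenius rate from the prefactor and barrier listed in Table 1 for event 9 (vacancy diffusion in YSZ: 1.9 × 10¹³ s⁻¹, 101.3 kJ mol⁻¹). Because Table 1 gives barriers per mole, the sketch uses RT in place of kBT (the same quantity on a per-mole rather than per-particle basis). The 1073 K operating temperature and the linear coverage correction in `linear_coverage_barrier` are assumptions for illustration only; the text states only that a coverage-dependent correction is sometimes included.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1 (R*T per mole equals k_B*T per particle)

def arrhenius_rate(k0, Ea_kJ_per_mol, T):
    """Eq. (5): k_n = k0_n * exp(-E_n / (R T)), with E_n in kJ/mol as in Table 1."""
    return k0 * math.exp(-Ea_kJ_per_mol * 1.0e3 / (R * T))

def linear_coverage_barrier(E0, slope, theta):
    """Hypothetical linear coverage correction E_n(theta) = E0 + slope * theta, in kJ/mol."""
    return E0 + slope * theta

# Event 9 of Table 1 (vacancy diffusion in YSZ) at an assumed 1073 K
k9 = arrhenius_rate(1.9e13, 101.3, 1073.0)  # roughly 2.2e8 s^-1
```

A coverage-dependent barrier would simply be passed in as `arrhenius_rate(k0, linear_coverage_barrier(E0, slope, theta), T)`, with θ re-evaluated from the local environment at each KMC step.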

3.2 Computational estimates of kinetic parameters

Although it is difficult to monitor the heterogeneous chemistry in an operating fuel cell directly, computational methods can be used to disentangle the individual events and predict their kinetics on well-defined surfaces [88,89]. Among the available computational methods, density functional theory (DFT) has played a dominant role. For example, density functional perturbation theory has been used to study the lattice dielectric and thermodynamic properties of YSZ crystals as a function of yttria concentration, and the calculated specific heat and dielectric constants agree well with low-temperature experimental values [90]. To elucidate the mechanisms of electronic charge transfer involved in the electrochemical oxidation of fuel at the anode TPB, Ziegler's group employed ab initio calculations, yielding detailed mechanisms of H2 and CH4 adsorption and oxidation on the YSZ surface [18,19,91]. The results supported the possibility of direct oxidation of the fuel on the oxygen-enriched YSZ surface without a metallic catalyst, in good agreement with experimental observations [19]. Further DFT calculations have been used to investigate reactions at the Ni/YSZ/fuel TPB, with a focus on the fundamental interfacial electrochemistry. The results showed that, because the valence of the extra oxygen atom is partially saturated in the Ni/YSZ cermet, an oxygen-enriched YSZ surface (YSZ+O) in contact with Ni is significantly less active toward oxidation of fuel molecules (H2, CH4, and CO) than an oxygen-enriched YSZ surface without Ni [18]. The stability of surface-adsorbed H2, O2, and OH⁻ on various electrode materials (e.g., Mn, Fe, Co, Ni, Cu, Ru, Rh, Pd, Ag, Pt,


and Au) has been investigated by Rossmeisl et al. using DFT calculations [92]. Their results suggested that surface-adsorbed oxygen is a key intermediate in the hydrogen oxidation reaction, which correlates well with experimental observations [93,94]. Beyond hydrogen and oxygen adsorption, the adsorption and oxidation of methane on Ni surfaces have been studied by several groups using DFT [95–99]. DFT studies of the ORR on metallic surfaces (including Pt) have also been reported extensively, and these are helpful for explaining the reaction pathways at SOFC cathodes [100–103]. In other work, Pornprasertsuk et al. [21] applied DFT with a semilocal exchange-correlation functional to calculate a set of energy barriers that oxygen ions encounter during migration within YSZ. Based on those DFT results, they then performed KMC simulations showing that the maximum conductivity of (Y2O3)x(ZrO2)1−2x occurs at around 7–9 mol% Y2O3 at 600–1500 K, and that the effective activation energy increases at higher Y doping concentrations. Besides DFT, other high-level theoretical models have also been used to probe kinetic parameters relevant to SOFC operation. For example, the unity bond index–quadratic exponential potential (UBI-QEP) method [104,105] was successfully applied to study several elementary reactions on various transition metal surfaces [106]. This approach generated activation barriers and enthalpy changes for the forward and reverse reactions involved in the formation and dissociation of H2O and OH⁻ species. In general, the dominant events in SOFC operation are the charge transfer (CT) events into and out of the YSZ, as oxygen ions are incorporated into the electrolyte (at the cathode) or expelled from it (at the anode). The general rate expression [107] for these events is implemented by Mitterdorfer and Gauckler [78] as

$$k_n' = k_n \exp(\beta' E) \qquad (6)$$

Depending upon the direction of the charge transfer (with respect to the direction of the applied voltage), the β′ term is expressed as either β′ = αF/RT or β′ = (1 − α)F/RT, where F is Faraday's constant, E is the applied potential, and α is the CT coefficient. The kn term takes the Arrhenius form of Eq. (5). Equation (6) shows that the CT events are strongly affected by the voltage drop across the electrode/YSZ interface; hence, as the applied voltage is varied, the migration of ions (i.e., the current) through the YSZ changes. In general, experimental values and computational data can be used together to estimate the kinetic parameters needed for a KMC simulation. These parameters may be refined after a KMC simulation if an initially identified reaction mechanism proves insufficient to capture the experimental behavior. Most importantly, the DFT+KMC multiscale simulation approach establishes a well-defined pathway from atomistic-level details to laboratory-scale experimental results, which can be used to accelerate the discovery process and enhance engineering design.
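A minimal Python sketch of Eq. (6) follows. It multiplies a potential-free rate kn (for example, from Eq. (5)) by exp(β′E), choosing β′ = αF/RT or (1 − α)F/RT according to the transfer direction. The value α = 0.5, the 0.1 V potential, and the 1073 K temperature in the usage line are illustrative assumptions, as is the convention that the caller supplies E with the sign appropriate to the transfer direction.

```python
import math

F = 96485.33212   # Faraday constant, C mol^-1
R = 8.314462618   # gas constant, J mol^-1 K^-1

def charge_transfer_rate(k_n, E, T, alpha, forward=True):
    """Eq. (6): k'_n = k_n * exp(beta' * E).

    k_n     : potential-free rate constant, e.g. from Eq. (5)
    E       : applied potential in volts (sign convention left to the caller)
    alpha   : charge-transfer coefficient, 0 < alpha < 1
    forward : True selects beta' = alpha F/RT, False selects (1 - alpha) F/RT
    """
    beta = (alpha if forward else 1.0 - alpha) * F / (R * T)
    return k_n * math.exp(beta * E)

# Enhancement factor for an assumed 0.1 V potential at 1073 K with alpha = 0.5
factor = charge_transfer_rate(1.0, 0.1, 1073.0, 0.5)  # about 1.72
```

Because the factor is exponential in E, even a tenth of a volt changes the CT rates appreciably, which is why the interfacial voltage drop controls the simulated current.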


4. ATOMISTIC SIMULATIONS OF SOLID OXIDE FUEL CELLS

For a number of reasons, as mentioned in Section 1, fuel cell technology has recently attracted a great deal of interest from the government, military, and commercial sectors. SOFCs, however, are still at an early stage of development: the technology has not yet been broadly adopted, but rather has been implemented in specialized applications and demonstration projects [3]. SOFCs have many potential applications, but a number of issues remain unresolved. While these fundamental challenges have been fairly well identified, well-defined routes for resolving them are much more difficult to ascertain. Part of the difficulty is the large parameter space and the fact that precise information about the fundamental behavior of experimental systems is often hard to obtain. The fundamental design questions range from the grain structure of the electrolyte [108,109] all the way down to features at the nanometer length scale. The overall performance, durability, and efficiency of an SOFC are affected by several different factors, which, for simplicity, can be grouped into three main categories: material, structural, and operational dependencies. In principle, each of these dependencies can be investigated individually, and SOFC operation can be methodically optimized. In practice, however, many of the factors contributing to SOFC performance are strongly interdependent, and these correlations make it difficult to obtain fundamental information about the underlying mechanisms or to establish clear cause–effect relationships.
As a result, atomistic-level studies have emerged to understand the individual events involved in SOFC operation and to combine these events to simulate the overall behavior of a fuel cell. One of the first attempts at modeling SOFCs with KMC simulations was reported by Modak and Lusk [32]. Their model was restricted to the behavior of the YSZ electrolyte as a function of the open-circuit voltage, and comparisons were made with analytical predictions (the Gouy–Chapman model). The paper focused on the oxygen concentration distribution within the electrolyte at the TPB, the voltage profile across the electrolyte, and the electric field within the electrolyte, and it also captured the influence of the temperature and relative permittivity of the electrolyte on these features. To accelerate the convergence of the simulations and to facilitate comparison with analytical models, a one-dimensional (1-D) model was implemented, and the cathode and anode structures and reactions were completely neglected. The model was constructed by decomposing the electrolyte into layers, each containing a certain number of cations, anions, and vacancies, as shown in Figure 1. During the simulations, the cations remain fixed, while the anions and vacancies can hop from layer to layer (preserving the overall charge neutrality of the system). An important consideration in this work is the treatment of the electric field and its influence on the diffusive ionic motion within the electrolyte. For instance, a hopping event (i.e., diffusion of an ion from one layer to a neighboring


[Figure 1 schematic: one-dimensional lattice between the anode and the cathode, with labels a0, LKMC, LAnat, X, Y++, and Vac]

Figure 1 A one-dimensional lattice for the KMC model. Each horizontal row represents one unit cell within this system. Reprinted from Reference [32], copyright 2005, with permission from Elsevier.

layer) is viewed as an ion moving within a static electric field, E. This field can be decomposed into a local contribution, which depends upon the distribution of ionic species within the electrolyte, plus the field arising from the net charges on the cathode and anode. The charge within each individual sheet is assumed to be distributed uniformly, so that the electric field of each sheet (with charge density σi) can be calculated from Gauss's law:

$$\Phi = \oint \vec{E}\cdot d\vec{A} = \frac{Q}{\varepsilon_0 \varepsilon_r} \qquad (7)$$

$$\vec{E}_i = \pm\frac{\sigma_i}{2\varepsilon_0\varepsilon_r} \qquad (8)$$

Here, Φ is the electric flux through each layer, A is the cross-sectional area of each plane in the simulation, Q is the total space charge inside the enclosing surface, E_i is the uniform electric field produced by sheet i, and ε0 and εr are the vacuum permittivity and the relative permittivity of the YSZ, respectively. This approximation (a 1-D charge gradient) greatly accelerates the calculations. The electrical work W_F required to transport a charge q over the lattice spacing a is calculated as

$$W_F = a\,q\,E_i \qquad (9)$$

Thus, the potential energy for moving charge q from plane Z0 to plane Zi is $qV_{sc}^{i}$, where

$$V_{sc}^{i} = V_{sc}^{0} + \frac{a}{2}\sum_{k=0}^{i-1}\left(\sum_{h=k+1}^{N} E_h - \sum_{g=1}^{k} E_g\right), \qquad 1 \le i \le N \qquad (10)$$
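Equations (8)–(10) can be evaluated with a few lines of Python. The sketch below converts sheet charge densities σi into sheet fields with Eq. (8), treating σi as signed so that it carries the direction of the field, evaluates the work of Eq. (9), and accumulates the plane potentials of Eq. (10) one interplane step at a time. The function names, the storage of E_h for sheets 1..N at array indices 0..N−1, and the default V_sc^0 = 0 are assumptions of this sketch, not details taken from the original study.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F m^-1

def sheet_fields(sigma, eps_r):
    """Eq. (8): field sigma_i / (2 eps0 eps_r) of each uniformly charged sheet (signed sigma)."""
    return np.asarray(sigma, dtype=float) / (2.0 * EPS0 * eps_r)

def electrical_work(a, q, E_i):
    """Eq. (9): work a * q * E_i to move charge q across spacing a."""
    return a * q * E_i

def plane_potentials(E, a, V0=0.0):
    """Eq. (10): potentials V_sc^0 .. V_sc^N.

    E[j] stores E_{j+1}, the signed field of sheet j+1 (sheets numbered 1..N).
    Step k adds (a/2) * (sum of E_h for h > k  minus  sum of E_g for g <= k).
    """
    E = np.asarray(E, dtype=float)
    N = E.size
    V = np.empty(N + 1)
    V[0] = V0
    for i in range(1, N + 1):
        k = i - 1
        V[i] = V[i - 1] + 0.5 * a * (E[k:].sum() - E[:k].sum())
    return V
```

For two equal sheet fields of 1 (arbitrary units) and a = 2, `plane_potentials([1.0, 1.0], 2.0)` gives potentials 0, 2, and 2: the potential rises across the first gap, where both sheets lie on the same side, and stays flat across the second, where their contributions cancel.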


In the above equations, q represents the charge of the ion being transported and a represents the lattice spacing. The electric field arising from the arrangement of ionic species within the electrolyte must be updated as the ionic species move; it is evaluated by calculating the instantaneous charge density within each sheet and then summing these individual contributions along the length of the YSZ. Once the electric field is known, the electrical work is calculated and used to bias the ionic diffusion through an Arrhenius expression (Eq. 6), which symmetrically raises or lowers the activation energy barriers for ionic hops. This 1-D model predicted a double-layer structure consistent with the Gouy–Chapman model, with the relative permittivity (εr) controlling the effective width of the double layer. For instance, the evolution of the electric field and electrode potential was monitored as a function of time, and the KMC simulations quickly converged to the analytical solution. As the relative permittivity increased, the width of the double layer grew, and the double layer widened similarly as the temperature increased (as evidenced by the charge gradients at the anode and cathode). The relative permittivity is a critical aspect of SOFC performance, since (owing to strong ionic polarizability) the relative permittivity near the electrodes can vary by more than an order of magnitude [110]. Overall, the results of this study are reasonable within the assumptions employed, and the general approach provided a solid foundation for future work. However, extending the 1-D system to a 3-D model is important for making comparisons with real systems.
Shortly after the 1-D study of Modak and Lusk, a 3-D KMC model of a YSZ electrolyte appeared in the literature. Pornprasertsuk et al. [21] first used first-principles DFT calculations to predict the energetics of oxygen ion diffusion within YSZ via a vacancy mechanism. The energetics were compared with those of the same process (vacancy-assisted oxygen ion diffusion) in a similar electrolyte, scandia-stabilized zirconia (SSZ). This information was then used to populate an energy database for KMC simulations of oxygen ion diffusion within YSZ. In this study, only bulk diffusion was modeled (using a periodic supercell), and charge gradients and external electric fields were neglected. The DFT calculations, based on the PW91 semilocal functional of Perdew and Wang [111,112], were used to generate a database of 42 migration energy barriers that oxygen ions encounter during diffusion, as a function of the positions of the surrounding cations. In the KMC simulations, these energy barriers were used to extract a net oxygen ion diffusion rate as a function of temperature and dopant concentration. Consistent with experimental findings, the simulations predicted that the maximum ionic conductivity is obtained with 7–9 mol% YSZ at temperatures in the range of 600–1500 K. The effective activation energy for ionic diffusion within 8 mol% YSZ was predicted to be 0.7 eV, which is lower than the experimental value of 0.83–1.05 eV [113]. This discrepancy was attributed to the neglect of vacancy–vacancy interactions, the neglect of interactions with more distant neighbors (only the first nearest-neighbor cations surrounding the vacancy site and the diffusing oxygen ion were taken into


account), and inaccuracies in the DFT calculations. Interestingly, the KMC calculations also indicated that the oxygen ion diffusion barriers in SSZ are lower than those in YSZ, consistent with experimental observations [114]. As an extension of their prior work, Pornprasertsuk et al. [115] used the same basic modeling framework (DFT+KMC) to conduct an electrochemical impedance analysis of an ideal YSZ electrolyte. This study provided a useful bridge to experimental electrochemistry, since a great deal of information from electrochemical impedance spectroscopy (EIS) studies of these same materials is available in the literature. EIS is a powerful tool for extracting intrinsic properties of an operating fuel cell, especially with respect to the electrode/electrolyte interfaces [116,117]. In both the KMC simulations and the experimental EIS studies, Nyquist and Bode plots can be generated and used to extract the geometric capacitance, the electric double-layer capacitance, and the resistance of the YSZ (based on an equivalent circuit model). These are critical parameters for understanding material performance and for the later engineering design of larger SOFC structures. While their previous KMC study [21] was restricted to bulk oxygen ion diffusion within YSZ, their impedance simulations included “blocking” electrodes (at which neither electrochemical reactions nor diffusion are allowed) at each end of the YSZ, and alternating electrode potentials were applied during the simulation. Using their DFT-generated energy barriers [21], KMC simulations were performed on two system sizes, 15 × 8 × 8 and 45 × 8 × 8 unit cells, which were periodic in the two shorter dimensions and bounded by blocking electrodes in the longer dimension.
With the introduction of an applied potential and of charge gradients across the YSZ layers (perpendicular to the electrodes), the diffusion of the mobile charged species (i.e., oxygen ions and vacancies) is affected, analogous to the method proposed by Modak and Lusk [32]. Thus, the migration energy barriers were modified by the electrode potentials and by the instantaneous charge distribution within the YSZ lattice, following Eqs. (7)–(10). Over a range of applied potential frequencies ω (10³–10¹⁰ Hz) and temperatures (400–700 K), the amplitude (I0) and phase shift (θ) of the current were quantified by a numerical fit of the results [115,118]. From these data, the real (Zreal) and imaginary (Zimg) parts of the impedance (Z) were calculated using the equation [116]:

$$Z(\omega) = \frac{V_0(\omega)}{I_0(\omega)}\left(\cos\theta - i\sin\theta\right) = Z_{\mathrm{real}}(\omega) - i\,Z_{\mathrm{img}}(\omega) \qquad (11)$$
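The post-processing behind Eq. (11) can be sketched in Python as follows: the simulated current at a single frequency is fitted to I(t) = I0 sin(ωt + θ) by linear least squares (writing it as A sin ωt + B cos ωt), and the fitted I0 and θ are converted into Zreal and Zimg. The sinusoidal model, the function names, and the synthetic signal in the usage lines are assumptions for illustration; the original studies [115,118] may have used a different fitting procedure.

```python
import numpy as np

def fit_amplitude_phase(t, current, omega):
    """Fit I(t) = I0 sin(omega t + theta) as A sin(omega t) + B cos(omega t)."""
    X = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
    (A, B), *_ = np.linalg.lstsq(X, current, rcond=None)
    return np.hypot(A, B), np.arctan2(B, A)   # I0 = sqrt(A^2 + B^2), theta

def impedance(V0, I0, theta):
    """Eq. (11): Z = (V0/I0)(cos theta - i sin theta) = Z_real - i Z_img."""
    Z = (V0 / I0) * (np.cos(theta) - 1j * np.sin(theta))
    return Z.real, -Z.imag

# Synthetic check: a noise-free current leading the voltage by 0.3 rad at 1 MHz
omega = 2.0 * np.pi * 1.0e6
t = np.linspace(0.0, 5.0e-6, 2001)
current = 2.0e-3 * np.sin(omega * t + 0.3)
I0, theta = fit_amplitude_phase(t, current, omega)
Z_real, Z_img = impedance(0.05, I0, theta)   # V0 = 0.05 V (illustrative)
```

Writing the sinusoid as a linear combination of sin ωt and cos ωt turns the fit into an ordinary linear least-squares problem, so noisy KMC current traces can be handled without an iterative nonlinear optimizer. Repeating the fit over many frequencies yields the points of a Nyquist plot.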

After this fitting procedure, Bode and Nyquist plots were generated. Comparisons were made with experimental samples of 100 nm thin-film polycrystalline YSZ, and a close match was observed between the experimental Nyquist plot (measured at 336 °C) and the simulated Nyquist plot (at 400 K), with moderate deviations at low frequencies (see Figure 2). In addition, details about the double-layer structure at the anode and cathode (as a


[Figure 2 plot: normalized |Im Z| (Ω·cm) versus normalized Re Z (Ω·cm); series: EIS 336 °C and KMC 400 K]

Figure 2 Nyquist plots of the normalized impedance results of 15 × 8 × 8 supercells obtained from kinetic Monte Carlo (KMC) simulations at 400 K, compared with the results of electrochemical impedance spectroscopy (EIS) measurements on a pulsed-laser-deposited polycrystalline thin-film YSZ (100 nm in thickness) at 336 °C. Reprinted from Reference [115], copyright 2007, with permission from Elsevier.

function of temperature and frequency) were extracted. At low frequencies (below ~10⁵ Hz), the applied voltage and the current are in phase, and the double layer can fully develop near each electrode. At low temperatures and higher frequencies, the migration of the oxide ions is not rapid enough to keep up with the fluctuating voltage. As a result, the phase shift between the applied voltage and the current grows, the double layer does not have sufficient time to develop, and the current tends to increase. A component missing from these first KMC studies [21,32,115] is the treatment of the electrochemical reactions on the anode or cathode side of the YSZ. To bring these simulations closer to the operation of a real fuel cell, Lau et al. performed KMC simulations of a half-cell SOFC model (see Figure 3) [119,120]. Several characteristics of the previous modeling studies [21,32,115] were retained, such as the accounting of the electric field at different points along the YSZ and the calculation of the migration energy barriers within the YSZ. In this work, however, only the cathode side of the YSZ was included, along with the cathodic reactions, which were developed from a simplified reaction mechanism. The individual event rates were extracted from a variety of experiments and electronic structure calculations (see events 1–9 in Table 1). In their approach, the cathode was assumed to have negligible thickness, so that gas adsorption and reaction were not affected by the catalyst porosity or surface area. The cathode/YSZ/gas TPB, where the oxygen incorporation reaction occurs, was accounted for by scaling the total current generated in the model by the fractional

Atomistic Modeling of Solid Oxide Fuel Cells

[Figure 3 schematic: YSZ electrolyte unit cell (Zr4+, Y3+, oxide ion vacancies) divided into layers Z0, Z1, ..., Zi, ..., ZN; O2(g) at the cathode, O2− flux through the YSZ toward the anode, charged double layers at both electrodes, and an external load with applied voltage Vappl.]

Figure 3 Illustration of the cathode-only YSZ fuel cell model. Reproduced with permission from Reference [118], copyright 2010, The Electrochemical Society.

surface area of YSZ that coincides with the experimental TPB (i.e., ~0.01) of a real SOFC, an approximation adapted from previous microkinetic models [76]. In this cathode-only ORR KMC simulation, the overall reaction mechanism included several individual events, each with an associated rate constant. These elementary steps included adsorption/desorption of oxygen molecules from the gas phase onto the surface, diffusion of the adsorbed oxygen species on the cathode/YSZ surface, oxygen incorporation from the surface into the YSZ, and diffusion of the oxygen ions through the YSZ toward the anode side. The oxygen ions pass through the end of the half-cell model (i.e., the end opposite the cathode side) as the simulation is propagated through time. Consequently, the simulated current (J) is quantified as the net flux of oxygen ions through this boundary once steady state is reached. To assess the consistency of this KMC model, a variety of materials-independent, materials-dependent, and geometrical parameters were investigated, with the ionic current calculated from the model used as the primary metric. The materials-independent parameters included the oxygen partial pressure, the system temperature, and the external applied potential. Of these parameters, the oxygen pressure had a weak influence on the current (Figure 4), unless its value fell below a threshold of approximately 0.05 atm. As the temperature increased (from 200 to 800 °C), the current increased exponentially, owing to the thermally activated ion transport in YSZ. As the applied electric potential of the cell increased, a similar increase was found in the calculated ionic current. The materials-dependent parameters included the dopant level (i.e., Y2O3


C. Heath Turner et al.

[Figure 4 plot: ionic current density j (mA/cm²) versus temperature T (400–1100 K) at several oxygen partial pressures PO2; inset: loge(σ) (S m⁻¹) versus 1000/T (0.8–2.2), simulation compared with experiment (Exp).]

Figure 4 The ionic current as a function of temperature at different oxygen partial pressures (0.05, 0.30, and 1.00 atm). The inset plots the simulated conductivity, compared with the EIS experiment [115], as log(σ) (S m⁻¹) versus inverse temperature (1000/T). Reprinted from Reference [119], copyright 2008, with permission from Elsevier.

concentration) of the YSZ and the relative permittivity of the YSZ. An increase in the dopant level (from 6 to 12 mol%) produces marginal increases in the ionic current. Experimentally, the same increase in the electric current is found, but the current tends to decrease beyond a dopant level of approximately 12 mol% Y2O3. This decrease is associated with the reduced ionic conductivity of the YSZ, due to clustering of Y2O3 crystallites and enhanced ion–vacancy interactions (which might block the diffusion channels in the electrolyte) [65], and this complex behavior was not captured in the KMC modeling. The KMC simulations also predict moderate increases in the current as the relative permittivity increases. Finally, the effects of the geometrical parameters of the model were found to be qualitatively consistent with experiments. In particular, the current predicted as a function of electrolyte thickness is close to the Ohm's law behavior of a classical conductor, as shown in Figure 5. Thus, it is suggested that ionic transport within the electrolyte is mainly diffusive in nature, and that the resistance is determined mainly by the scattering length of the medium. A follow-on KMC study of the same cathode-only SOFC model was also reported by Lau et al. [120], in order to determine the sensitivity of the ORR simulation results to the underlying kinetic parameters. Using the same basic mechanism as defined previously [119], a baseline set of simulation parameters was first established: dimensions of the YSZ supercell = 32 nm² (area) × 10.7 nm (length), relative permittivity = 40, oxygen partial pressure = 0.30 atm, temperature = 800 °C, and applied bias voltage of the fuel cell = −0.50 V. Using these conditions, a parametric sensitivity analysis was performed by individually varying the activation energy barrier and the preexponential factor of each elementary reaction by ±25%, and the corresponding ionic current was calculated.
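To see why barrier perturbations can dominate such a sensitivity test, consider how a ±25% change propagates into a single elementary rate. The minimal Python sketch below assumes an Arrhenius form k = F0 exp(−Ea/kBT); the function names and the 1.0 eV barrier and 10¹³ s⁻¹ prefactor in the usage lines are hypothetical illustration values, not parameters from Refs. [119,120]. The ionic current in the full KMC model is a network response, so these single-step ratios only show that barrier changes act exponentially while prefactor changes act linearly.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def rate_constant(prefactor, barrier_ev, temperature_k):
    """Arrhenius-type rate constant k = F0 * exp(-Ea / (kB * T))."""
    return prefactor * math.exp(-barrier_ev / (K_B_EV * temperature_k))


def percent_change(prefactor, barrier_ev, temperature_k,
                   prefactor_scale=0.0, barrier_scale=0.0):
    """Percent change in k when F0 -> F0*(1+prefactor_scale) and Ea -> Ea*(1+barrier_scale)."""
    k_ref = rate_constant(prefactor, barrier_ev, temperature_k)
    k_new = rate_constant(prefactor * (1.0 + prefactor_scale),
                          barrier_ev * (1.0 + barrier_scale),
                          temperature_k)
    return 100.0 * (k_new / k_ref - 1.0)


if __name__ == "__main__":
    T = 800.0 + 273.15      # baseline temperature of 800 degC, in K
    F0, EA = 1.0e13, 1.0    # hypothetical prefactor (1/s) and barrier (eV)
    for label, kwargs in [("F0 +25%", {"prefactor_scale": 0.25}),
                          ("F0 -25%", {"prefactor_scale": -0.25}),
                          ("Ea +25%", {"barrier_scale": 0.25}),
                          ("Ea -25%", {"barrier_scale": -0.25})]:
        print(f"{label}: {percent_change(F0, EA, T, **kwargs):+.1f}%")
```

Because the barrier enters the exponent, a 25% reduction multiplies the rate by exp(0.25 Ea/kBT), whereas the same relative change in F0 changes the rate by exactly 25%.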
This analysis clearly identified the primary rate-determining steps in the oxygen reduction process of the SOFC cathode, as illustrated in Figure 6. The main


[Figure 5 plot: log(RA) (Ω cm², 1.30–1.50) versus YSZ thickness log(D) (nm, 0.7–1.4) for cross-sections A1 and A2, annotated Δ = 0.30.]

Figure 5 Log–log plot of the resistance RA (Ω cm²) versus YSZ thickness (nm) for two different cross-sectional areas: ~32 nm² (red circles) and ~117 nm² (green squares). The dotted blue line is a linear regression fit to these data. Reprinted from Reference [119], copyright 2008, with permission from Elsevier. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this book.)

[Figure 6 bar charts: change of current density J (%) when each activation barrier Ea1–Ea9 (left, axis −200% to 5000%) or prefactor F01–F09 (right, axis −30% to +30%) is varied by (0, +25%) or (−25%, 0); two bars in the left panel are labeled ~1383% and ~4156%.]

Figure 6 Sensitivity analysis of the ionic current density (J) on various kinetic parameters. Left: activation energy barrier (Ea); right: preexponential factor (F0). Reprinted from Reference [120], copyright 2009, with permission from Elsevier.

determinants were found to be the oxygen incorporation reaction from the cathode/YSZ interface into the YSZ, followed by the oxygen ion diffusion within the YSZ. With only a moderate reduction (−25%) in the activation energy barrier for the oxygen incorporation step, the ionic current was predicted to increase by more than an order of magnitude. A comparable improvement was also found when the activation energy barrier for the oxygen ion diffusion was reduced. The analysis showed that both factors can affect the distribution and concentration of the electric double layer present at the YSZ/cathode interface, and experiments have confirmed that the presence of this double layer has a strong


impact on the performance of an SOFC [121–123]. Ultimately, this KMC study motivates continued efforts to identify novel electrolyte formulations with higher ionic conductivity, and continued improvements in the catalyst/electrolyte interface that would facilitate the oxygen incorporation step. If these two mechanistic bottlenecks can be accelerated (particularly at lower temperatures), the power density generated by an SOFC should be significantly improved. Based on the same cathode-only SOFC model, a third KMC simulation study has recently been published [118], focusing on the frequency response characteristics of the fuel cell. In the KMC simulations, frequencies ranging from 10⁴ to 10⁹ Hz and temperatures ranging from 600 to 1000 °C were explored. In addition, the influence of the electrolyte thickness, the oxygen partial pressure, and the relative permittivity of the electrolyte on the frequency response was probed. The frequency range was somewhat offset from that of typical experimental investigations (10⁻²–10⁶ Hz) [124], due to practical limitations in the underlying computational details. In the simulations, the SOFC applied voltage was varied by ±0.5 V at a given frequency, and the fluctuating current was calculated as a function of time. From these data, the real and imaginary components of the impedance, Z, were extracted (i.e., fit to Eq. 11, above), and the Nyquist and Bode plots were constructed. As before, a baseline set of conditions was identified: dimensions of the YSZ supercell = 26 nm² (area) × 21 nm (length), relative permittivity = 40, oxygen partial pressure = 0.30 atm, temperature = 800 °C, and |applied voltage| = 0.50 V (fluctuating). Since the detailed electrode morphology was ignored in this model, only a single semicircle was found in the Nyquist plots, and the response was closely fitted (using a nonlinear least-squares approach) to an equivalent circuit model with two resistors and one capacitor.
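The impedance extraction described above (a sinusoidal voltage perturbation, a time-resolved current, and a fit to an equivalent circuit) can be illustrated with a short Python sketch. It assumes the two-resistor, one-capacitor circuit is R_e in series with R_r in parallel with C, uses angular frequency ω in rad/s, and replaces the KMC current with the circuit's analytic steady-state response; the function names and component values are hypothetical. The single-frequency projection is a lock-in-style estimate, not necessarily the fitting procedure (Eq. 11) used in Ref. [118].

```python
import numpy as np


def circuit_impedance(omega, r_e, r_r, c):
    """Impedance of R_e in series with (R_r parallel to C); omega in rad/s."""
    return r_e + r_r / (1.0 + 1j * omega * r_r * c)


def impedance_from_signals(t, voltage, current, omega):
    """Estimate Z(omega) by projecting V(t) and J(t) onto exp(-i*omega*t).

    Assumes t spans an integer number of periods with uniform spacing,
    so the projection isolates the component at omega.
    """
    basis = np.exp(-1j * omega * t)
    v_hat = np.mean(voltage * basis)
    j_hat = np.mean(current * basis)
    return v_hat / j_hat


# Hypothetical element values (ohm*cm^2 and F/cm^2), for illustration only.
R_E, R_R, C = 2.0, 20.0, 1.0e-6
V_AMP = 0.5                      # amplitude of the voltage perturbation (V)
omega = 1.0 / (R_R * C)          # apex of the Nyquist semicircle

n_periods, n_samples = 10, 2000
t = np.linspace(0.0, n_periods * 2 * np.pi / omega, n_samples, endpoint=False)
v = V_AMP * np.cos(omega * t)

# Steady-state current of a linear circuit: J(t) = Re[(V_AMP / Z) * exp(i*omega*t)].
z_true = circuit_impedance(omega, R_E, R_R, C)
j = np.real((V_AMP / z_true) * np.exp(1j * omega * t))

z_est = impedance_from_signals(t, v, j, omega)
print(f"Z_true = {z_true:.4f}, Z_est = {z_est:.4f}")
```

Sweeping ω over a logarithmic grid and plotting −Im Z against Re Z traces a single semicircle: it meets the real axis at R_e (high frequency) and R_e + R_r (low frequency), with its apex at ω = 1/(R_r C).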
In the Bode plots (Figure 7), the variation of the current amplitude and the phase angle shift is shown as a function of frequency for a range of temperatures (T) and electrolyte thicknesses (d). Several key characteristics can be identified from this work that are similar to those of the previous electrolyte-only model of Pornprasertsuk et al. [115]. For instance, at low frequencies (<10⁵ Hz) and high temperatures (>800 °C), the SOFC behavior mimics DC fuel cell operation, since all of the elementary steps in the model occur on much shorter timescales than the variation in the applied voltage. At lower temperatures (<800 °C) and moderate frequencies (10⁵–10⁷ Hz), significant features begin to emerge, since the timescales of the events begin to overlap with the timescale of the voltage variation. Hence, the phase angle shift grows significantly within this region, and the current suddenly begins to grow (as the double-layer resistance diminishes; see Figure 7). At the highest frequencies, the double-layer resistance becomes negligible, and the current plateaus at a maximum value, while the phase angle shift falls to zero. In general, the relative permittivity and the oxygen partial pressure had a less noticeable influence on the frequency response characteristics. Based on the Nyquist and Bode plots (Figure 8a and b), the impedance spectra were fitted with the equivalent circuit model shown in Figure 8c, where the elements Re, Rr, and C are assumed to correspond to the electrolyte resistance, the electrochemical reaction resistance, and the total capacitance, respectively. While


[Figure 7 plots: Vcal (V), J (A/cm²), and |φ| (degree) versus log ω (Hz, 2–10); (a) electrolyte thickness d = 20×a, 30×a, 40×a, 50×a, and 60×a; (b) T = 873, 973, 1073, 1173, and 1273 K.]

Figure 7 Fitted voltage Vcal, current density J, and phase angle shift φ at (a) different electrolyte thicknesses (T = 1073 K and PO2 = 0.30 atm) and (b) different temperatures (on 10 × 10 × 40 supercells and PO2 = 0.30 atm). Reproduced with permission from Reference [118], copyright 2010, The Electrochemical Society.

the fitted value of Rr is by far the larger resistance (due to the large potential drop at the cathode/electrolyte interface) and stays relatively constant, the series resistance (Re) increases in proportion to the electrolyte thickness. These features have also been identified experimentally with other electrode/electrolyte materials (e.g., lanthanum strontium cobalt ferrite/lanthanum dysprosium tungsten molybdate) [125]. In terms of the total capacitance (C), it is postulated that contributions arise from both the geometric capacitance (Cg) and the double-layer capacitance (Cdl). The fitted value of the overall capacitance C satisfies the


[Figure 8 plots: (a) Nyquist plot of |ZIm| versus |ZRe| (Ω cm²); (b) Bode plot of |Z| (Ω cm²) and |φ| (degree) versus log ω (Hz, 3–9), simulated and fitted results; (c) equivalent circuit with elements Re, Rr, and C.]

Figure 8 (a) Nyquist and (b) Bode plots of the impedance results obtained from the KMC simulations at T = 1073 K on 10 × 10 × 40 supercells, where the dashed and solid curves correspond to the simulated data and their fitted results, respectively. (c) The equivalent circuit model with two resistances and one capacitance used to fit the frequency response data obtained from the KMC simulations. Reproduced with permission from Reference [118], copyright 2010, The Electrochemical Society.

fundamental description of capacitors in series, $1/C = 1/C_{dl} + 1/C_g$, with $C_{dl} \gg C_g$. The total capacitance is therefore dominated by $C_g$, which depends on the electrolyte thickness according to $C_g = \varepsilon_0 \varepsilon_r A/d$. However, owing to the contributions from $C_{dl}$, small deviations were found in the total capacitance C. In


addition, C was found to be slightly influenced by the temperature (from 600 to 1000 °C, the total capacitance increased from 3.319 × 10⁻⁸ to 3.711 × 10⁻⁸ F/cm²), a phenomenon also observed in previous theoretical studies [115]. Overall, the results from the cathode-only KMC simulations [118–120] were found to be qualitatively consistent with experimental trends, while preserving a great deal of atomistic-level detail. However, to improve the results and approach quantitative agreement with experiments, additional features must be incorporated, such as the anode-side reactions, correlation of the ion–vacancy and vacancy–vacancy interactions, grain boundaries, and explicit structural treatment of the anode and cathode. To incorporate some of these features, two KMC-based SOFC simulation studies have recently emerged [126,127], along with some close experimental collaboration [128]. In all of these more recent studies, a complete SOFC model (anode + cathode) was assembled. In the first of these three studies, Pornprasertsuk et al. [126] used a powerful two-tiered approach to develop their model, combining DFT-based electronic structure calculations (to predict the energetics of key reactions) with KMC simulations of the overall SOFC operation. In addition, an explicit TPB was included in the model (using a Pt anode, a Pt cathode, and an 8 mol% YSZ electrolyte) so that the details of the electrode/electrolyte interface could be rigorously explored. The energetics of the oxide ion migration within the YSZ and of the electrode reactions on the Pt surface were taken from previous literature studies [21,115,129–136]. However, the CT energetics between the YSZ and the Pt had not been well established in the literature, so the authors used DFT to calculate the activation energies involved in this step (treated as a two-step electron transfer process).
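In a two-tiered scheme of this kind, the computed barriers enter the KMC model as rate constants, and those rates decide both which event fires next and how far the simulation clock advances. The Python sketch below shows a generic rejection-free KMC step in the spirit of the Bortz–Kalos–Lebowitz algorithm [49], under stated assumptions: the event names, prefactors, and barriers are hypothetical placeholders, the rate catalog is static (in a real SOFC model the rates change with the lattice configuration and the local electric field), and it is not the implementation used in Refs. [118–120,126,127].

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free KMC step.

    rates: dict mapping event name -> rate (1/s).
    Returns (chosen_event, time_increment).
    """
    total = sum(rates.values())
    if total <= 0.0:
        raise ValueError("no event has a positive rate")
    # Pick an event with probability proportional to its rate.
    target = rng.random() * total
    cumulative = 0.0
    chosen = None
    for event, rate in rates.items():
        cumulative += rate
        chosen = event
        if target < cumulative:
            break
    # Advance the clock by an exponentially distributed waiting time.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


if __name__ == "__main__":
    K_B_EV = 8.617333262e-5      # Boltzmann constant, eV/K
    T = 627.0 + 273.15           # K
    # Hypothetical (prefactor in 1/s, barrier in eV) pairs, for illustration only.
    barriers = {"O2 adsorption": (1.0e13, 1.2), "surface diffusion": (1.0e13, 0.8),
                "O incorporation": (1.0e13, 1.4), "vacancy hop": (1.0e13, 1.0)}
    rates = {name: f0 * math.exp(-ea / (K_B_EV * T))
             for name, (f0, ea) in barriers.items()}

    rng = random.Random(0)
    counts = {name: 0 for name in rates}
    clock = 0.0
    n_steps = 100000
    for _ in range(n_steps):
        event, dt = kmc_step(rates, rng)
        counts[event] += 1
        clock += dt
    print(f"simulated time: {clock:.3e} s")
    for name, n in counts.items():
        print(f"{name:18s} {n / n_steps:.5f}")
```

Because each event is selected in proportion to its rate, an event with a high barrier (here the hypothetical incorporation step) fires rarely, which is why errors in such barriers propagate so strongly into the simulated current.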
In addition to the oxygen ion migration within the YSZ, a set of elementary steps was specified at the anode and cathode sides of the model. In their work, it was assumed that the CT reactions (i.e., oxygen incorporation and vacancy formation) could occur only at the TPB of the SOFC. Assuming that a finite-size cluster model was sufficient to capture the basic electrochemistry of the electrode/electrolyte interface, the CT reaction energies were calculated with a plane-wave DFT approach by placing a 38-atom Pt cluster on a (111) 7.4 mol% YSZ surface. In the calculations, the system Hamiltonian was represented with the projector-augmented wave method [137,138] along with the PW91 semilocal exchange-correlation functional [112]. In the DFT calculations, an oxygen atom on the Pt cluster was traced as it moved toward an oxide ion vacancy on the YSZ surface, and the oxygen incorporation barrier was found to be smaller than that of vacancy formation (suggesting that vacancy formation is the slower step). The KMC simulations were propagated similarly to the previous studies, and the potential gradients in the model were calculated using the same basic framework developed previously. As mentioned earlier, a significant improvement is the explicit treatment of the electrodes, which were incorporated into the KMC simulations with two basic geometries, Pt islands and Pt straps, as illustrated in Figure 9. The KMC simulations were then performed at a temperature of 627 °C, with a pressure of 1 atm of H2 on the


[Figure 9 schematic: (a) Pt islands and (b) Pt straps on a YSZ supercell.]

Figure 9 Illustration of (a) Pt-island and (b) Pt-strap catalysts on top of the (111) YSZ supercell. Reproduced with permission from Reference [126], copyright 2009, The Electrochemical Society.

anode side and a pressure of 1 atm of either O2 or air on the cathode side, while the YSZ electrolyte had a cross-sectional area of 6 × 4 nm² and a thickness of 7 nm. The KMC simulations yielded a number of important insights, several of which are corroborated by experimental reports in the literature. For instance, increasing the catalyst size on the cathode side (from islands to straps) decreased the number of Pt atoms at the TPB (while the total catalyst loading increased), leading to a lower overall rate of CT reactions. However, there is a delicate balance among the elementary steps involved, since the diffusion of chemisorbed oxygen on the cathode surface can become rate-limiting under certain operating conditions. For instance, experimental investigations [139,140] have shown that dissociative adsorption of oxygen can be rate-limiting on the cathode side at lower temperatures (<500 °C), while oxygen adatom diffusion may become the limiting step at higher temperatures (>600 °C). When the gas pressure dependence of the SOFC was probed, only small effects were observed, and the anode-side reaction rates were found to be independent of the O2 partial pressure at the cathode. This behavior confirms some of the earlier results from the cathode-only simulations [118–120]. Despite the distinct electrode structures, only moderate differences were observed between the I–V performance of the Pt-island and Pt-strap configurations. For instance, Figure 10 shows the I–V curves generated from the KMC simulations, with a slightly higher power density observed for the Pt-island configuration. From these data, the authors used the Tafel approximation to estimate the activation loss, the exchange current density, the area-specific resistance, and the symmetry factors for the migration barriers of the CT reactions.
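As a rough illustration of how Tafel parameters are obtained from polarization data, the Python sketch below fits the high-overpotential Tafel form η = b ln(j/j0), with b = RT/(αnF), by linear least squares in ln j. The data are synthetic, generated from hypothetical values of α, j0, and n at 627 °C, so the fit simply recovers them; this is a generic Tafel analysis, not the procedure of Ref. [126]. Such a fit is only meaningful when the charge-transfer step is rate-limiting and the overpotential is large enough for the back reaction to be negligible.

```python
import numpy as np

R_GAS = 8.314462618     # gas constant, J/(mol K)
FARADAY = 96485.33212   # Faraday constant, C/mol


def tafel_overpotential(j, j0, alpha, n, temperature_k):
    """High-field Tafel relation: eta = (R T / (alpha n F)) * ln(j / j0)."""
    return R_GAS * temperature_k / (alpha * n * FARADAY) * np.log(j / j0)


def fit_tafel(j, eta, n, temperature_k):
    """Fit eta = b*ln(j) + a; return (j0, alpha) from slope b and intercept a."""
    b, a = np.polyfit(np.log(j), eta, 1)
    j0 = np.exp(-a / b)
    alpha = R_GAS * temperature_k / (n * FARADAY * b)
    return j0, alpha


if __name__ == "__main__":
    T = 627.0 + 273.15                     # K
    J0_TRUE, ALPHA_TRUE, N = 0.5, 0.5, 2   # hypothetical A/cm^2, symmetry factor, electrons
    j = np.linspace(5.0, 60.0, 12)         # current densities well above j0 (A/cm^2)
    eta = tafel_overpotential(j, J0_TRUE, ALPHA_TRUE, N, T)
    j0_fit, alpha_fit = fit_tafel(j, eta, N, T)
    print(f"j0 = {j0_fit:.4f} A/cm^2, alpha = {alpha_fit:.4f}")
```

With real KMC or experimental I–V data, deviations from a straight line in η versus ln j are a practical warning that the Tafel regime has not been reached or that another step controls the rate.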
The authors emphasized that if a CT process is not rate-limiting, the Tafel approximation is not sufficient to explain the electrode losses, and misapplying it could lead to misinterpretation of the results. During the KMC simulations, detailed atomic-level information can also be extracted to understand the potential losses within the SOFC model. The accumulation of charged species at the


[Figure 10 plot, 627 °C (1 atm O2, 1 atm H2): voltage (V) and power density (W/cm²) versus current density (A/cm², 0–80) for Pt straps and Pt islands.]

Figure 10 SOFC voltage and power density plot of a simulated SOFC (Pt/7 nm YSZ/Pt) at 627 °C with 1 atm O2 and H2 with respect to the extracted current density. (A filled symbol indicates Pt-island results and an open symbol indicates Pt-strap results.) Reproduced with permission from Reference [126], copyright 2009, The Electrochemical Society.

cathode/YSZ and anode/YSZ interfaces was found to extend two or three planes deep into the YSZ. This charge accumulation leads to double-layer overpotential losses, an observation consistent with similar KMC-based studies of the cathode-only model [118–120]. The double-layer overpotential is quantified as the loss from supplying the reactants of the forward CT reactions across the double-layer regions near each electrode. In Figure 11, the authors give a clear

[Figure 11 plot, 627 °C (1 atm O2, 1 atm H2), Pt islands: total potential (V, 0 to −1) versus distance (nm, 0–7) at OCV and at 10, 30, 50, and 60 A/cm²; double-layer (DL) and CT losses are marked near each electrode.]

Figure 11 Total potential plot at different extracted current densities with respect to the distance across the SOFC supercell with Pt-island catalysts operating at 627 °C in 1 atm O2 and H2. Reproduced with permission from Reference [126], copyright 2009, The Electrochemical Society.


depiction of these losses by plotting the potential versus distance along the SOFC model at different current densities. Ultimately, the authors conclude that oxide materials with high dielectric constants and high ionic conductivity (such as gadolinium-doped ceria) can help spread out the double layer (i.e., the charge accumulation) and lower the migration barrier for ionic transport toward the electrode/YSZ interfaces. In fact, there is already experimental evidence supporting this conclusion [141]. Soon after developing the full SOFC model, the same authors used their KMC-based approach to analyze their experimental low-temperature impedance spectra from an 8 mol% YSZ fuel cell with different combinations of Pt and Au electrodes [128]. The authors fit their experimental EIS results to an equivalent circuit model (three series components, each consisting of a resistor and a constant-phase element), which allowed resistance and capacitance data to be extracted for different sections of their Nyquist plots. The KMC simulations complemented this work by analyzing the relative frequency of the different reactions involved in SOFC operation. Experimentally, the low-frequency loop showed a strong dependence on the bias voltage, and the cathode reactions tended to dominate at low frequencies. This behavior was corroborated by the KMC simulations, which enabled the elementary reactions to be classified according to their relative frequencies. The experimental temperature was 400 °C and the simulations were performed at 527–727 °C, but the relative reaction frequencies are expected to remain similar. Soon after the report of Pornprasertsuk et al. [126] appeared, a similar study was reported by Wang et al. [127]. There were some variations in the model details: the dopant concentration differed (9 mol% vs. 8 mol% YSZ), some of the steps in the reaction mechanism were different, the electrolyte dimensions in Wang's study were larger, a broader range of operating conditions was explored by Wang, and the electrode details were neglected by Wang. With the KMC approach, Wang investigated a range of materials-dependent parameters (electrolyte thickness and relative permittivity of the electrolyte), materials-independent parameters (temperature and applied bias voltage), and the sensitivity of the simulation results to the underlying kinetic inputs. Overall, similar trends were found when compared to the earlier report. For instance, similar I–V curves were found in both models, and the CT steps were identified as rate-limiting with respect to the given mechanisms. Several comparisons were made with experimental results, and good qualitative agreement was found. For instance, Figure 12 compares the simulated conductivity with experimental data for bulk YSZ and for a free-standing YSZ film (30 mm thickness) with Pt electrodes [70]. The experimental results showed a slightly higher conductivity, possibly indicating a need to lower the KMC barrier to diffusion in the electrolyte. Detailed information was also gathered about the double-layer structure that develops at the electrodes, as a function of the electrolyte thickness and the relative permittivity of the electrolyte. For instance, Figure 13 demonstrates the strong double layer that grows as the relative permittivity is artificially increased in the KMC simulation. This behavior is consistent with the 1-D electrolyte model originally reported by Modak and


[Figure 12 plot: log(σ) (S m⁻¹) versus 1000/T (K⁻¹, 0.8–1.8) for KMC simulations at Vext = −1.0 and −0.5, a free-standing YSZ film with Pt electrodes, and bulk YSZ; the data sets are annotated Δ = −4.51, −4.76, −5.11, and −6.59.]

Figure 12 The simulated conductivity compared to experiment [70] (as log(σ)) as a function of inverse temperature (1000/T). Reprinted from Reference [127], copyright 2010, with permission from Elsevier.

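[Figure 13 plots: (a) scaled ionic current density J and log(RA) versus relative permittivity εr (0–180); (b) relative VO•• concentration versus layer number along the z-axis (layers 0–12 and 148–160) for εr = 20, 40, 60, 80, 100, 120, 140, and 160.]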

Figure 13 (a) The scaled ionic current density J (mA cm⁻²) and the logarithmic resistance log(RA) (Ω cm²), and (b) the relative VO•• concentrations of the YSZ layers, as functions of the effective relative permittivity εr. Reprinted from Reference [127], copyright 2010, with permission from Elsevier.

Lusk [32]. Overall, the most influential factors affecting the performance of the SOFC model (as measured by the computed ionic current, J) were the electrolyte thickness, the operating temperature, the applied bias voltage, and the underlying activation barriers of the CT reactions encountered at each electrode.
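Conductivity comparisons such as Figure 12 are usually read through the slope of an Arrhenius plot. The Python sketch below assumes the simple form σ = σ0 exp(−Ea/kBT), a base-10 logarithm, and an abscissa of 1000/T, so that the slope s of log10 σ versus 1000/T gives Ea = −s · 1000 · kB · ln 10. The 1.0 eV barrier and the σ0 used to generate the synthetic data are hypothetical; many conductivity analyses instead fit σT, which shifts the extracted Ea slightly.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def activation_energy_from_arrhenius(temperature_k, sigma):
    """Return Ea (eV) from a linear fit of log10(sigma) versus 1000/T."""
    x = 1000.0 / np.asarray(temperature_k, dtype=float)
    y = np.log10(np.asarray(sigma, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return -slope * 1000.0 * K_B_EV * np.log(10.0)


if __name__ == "__main__":
    T = np.linspace(600.0, 1200.0, 7)                  # K
    EA_TRUE, SIGMA0 = 1.0, 1.0e5                        # hypothetical eV and S/m
    sigma = SIGMA0 * np.exp(-EA_TRUE / (K_B_EV * T))    # synthetic conductivity, S/m
    print(f"Ea = {activation_energy_from_arrhenius(T, sigma):.3f} eV")
```

Comparing slopes rather than absolute conductivities separates a mismatch in the migration barrier from a mismatch in the prefactor, which is the distinction behind the suggestion that the KMC diffusion barrier may need lowering.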

5. CONCLUSIONS AND FUTURE PERSPECTIVE

The material and structural design of the electrolyte, anode, and cathode is still the primary challenge for improving SOFC performance. The ultimate goal is to achieve high efficiency and a long performance lifetime while operating at lower


temperatures. If these goals can be achieved, SOFCs could be commercialized for a broader range of applications in the foreseeable future. Experimental efforts are now strengthened by atomistic-level simulations, which can provide reliable information for experimental-scale performance optimization and design guidance. For instance, many candidate structures and materials can first be prescreened for various performance characteristics by computer simulations, prior to any experimental fabrication attempts. As mentioned previously, the most critical aspects of KMC-based simulation predictions are the underlying kinetic parameters, which strongly dictate the quantitative accuracy of the model. To aid this discovery process, a great deal of electronic structure information is now available in the literature. These first-principles studies are expected to continue to grow, especially with the availability of large-scale parallel computing architectures. By combining the mechanistic information provided by electronic structure calculations with larger-scale models (such as KMC), a powerful and predictive simulation hierarchy can be developed for fundamentally advancing SOFC technology.

ACKNOWLEDGMENTS

The Office of Naval Research, directly and through the Naval Research Laboratory, supported this research.

REFERENCES

1. Srinivasan, S., Mosdale, R., Stevens, P., Yang, C. Fuel cells: Reaching the era of clean and efficient power generation in the twenty-first century. Annu. Rev. Energ. Environ. 1999, 24, 281–328.
2. Brandon, N.P., Skinner, S., Steele, B.C.H. Recent advances in materials for fuel cells. Annu. Rev. Mater. Res. 2003, 33, 183–213.
3. Singhal, S.C. Solid oxide fuel cells for stationary, mobile, and military applications. Solid State Ionics 2002, 152, 405–10.
4. Stambouli, A.B., Traversa, E. Solid oxide fuel cells (SOFCs): A review of an environmentally clean and efficient source of energy. Renew. Sust. Energy Rev. 2002, 6, 433–55.
5. Kilo, M., Taylor, M.A., Argirusis, C., Borchardt, G., Lesage, B., Weber, S., Scherrer, S., Scherrer, H., Schroeder, M., Martin, M. Cation self-diffusion of Ca-44, Y-88, and Zr-96 in single-crystalline calcia- and yttria-doped zirconia. J. Appl. Phys. 2003, 94, 7547–52.
6. Kilo, M. Cation Transport in Stabilised Zirconias, Trans Tech Publications LTD, Zurich, 2005, pp. 185–253.
7. Schulz, O., Martin, M., Argirusis, C., Borchardt, G. Cation tracer diffusion of La-138, Sr-84 and Mg-25 in polycrystalline La0.9Sr0.1Ga0.9Mg0.1O2.9. Phys. Chem. Chem. Phys. 2003, 5, 2308–13.
8. Waernhus, I., Sakai, N., Yokokawa, H., Grande, T., Einarsrud, M.A., Wiik, K. Mass transport in La1-xSrxFeO3 (x = 0 and 0.1) measured by SIMS. Solid State Ionics 2004, 175, 69–71.
9. Koerfer, S., De Souza, R.A., Yoo, H.I., Martin, M. Diffusion of Sr and Zr in BaTiO3 single crystals. Solid State Sci. 2008, 10, 725–34.
10. De Souza, R.A., Martin, M. Probing diffusion kinetics with secondary ion mass spectrometry. MRS Bull. 2009, 34, 907–14.
11. Kakac, S., Pramuanjaroenkij, A., Zhou, X.Y. A review of numerical modeling of solid oxide fuel cells. Int. J. Hydrogen Energy 2007, 32, 761–86.
12. Bhattacharyya, D., Rengaswamy, R. A review of solid oxide fuel cell (SOFC) dynamic models. Ind. Eng. Chem. Res. 2009, 48, 6068–86.


13. Lee, S.F., Hong, C.W. Multi-scale design simulation of a novel intermediate-temperature micro solid oxide fuel cell stack system. Int. J. Hydrogen Energy 2010, 35, 1330–8.
14. Choi, Y., Mebane, D.S., Wang, J.H., Liu, M. Continuum and quantum-chemical modeling of oxygen reduction on the cathode in a solid oxide fuel cell. Top. Catal. 2007, 46, 386–401.
15. Marquez, A.I., De Abreu, Y., Botte, G.G. Theoretical investigations of NiYSZ in the presence of H2S. Electrochem. Solid State Lett. 2006, 9, A163–6.
16. Lee, Y.L., Kleis, J., Rossmeisl, J., Morgan, D. Ab initio energetics of LaBO3(001) (B=Mn, Fe, Co, and Ni) for solid oxide fuel cell cathodes. Phys. Rev. B 2009, 80, 224101.
17. Andersson, D.A., Simak, S.I., Skorodumova, N.V., Abrikosov, I.A., Johansson, B. Optimization of ionic conductivity in doped ceria. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 3518–21.
18. Shishkin, M., Ziegler, T. Oxidation of H2, CH4, and CO molecules at the interface between nickel and yttria-stabilized zirconia: A theoretical study based on DFT. J. Phys. Chem. C 2009, 113, 21667–78.
19. Shishkin, M., Ziegler, T. The oxidation of H2 and CH4 on an oxygen-enriched yttria-stabilized zirconia surface: A theoretical study based on density functional theory. J. Phys. Chem. C 2008, 112, 19662–9.
20. Choi, Y., Lin, M.C., Liu, M.L. Computational study on the catalytic mechanism of oxygen reduction on La0.5Sr0.5MnO3 in solid oxide fuel cells. Angew. Chem. Int. Ed. 2007, 46, 7214–9.
21. Pornprasertsuk, R., Ramanarayanan, P., Musgrave, C.B., Prinz, F.B. Predicting ionic conductivity of solid oxide fuel cell electrolyte from first principles. J. Appl. Phys. 2005, 98, 103513.
22. Stapper, G., Bernasconi, M., Nicoloso, N., Parrinello, M. Ab initio study of structural and electronic properties of yttria-stabilized cubic zirconia. Phys. Rev. B 1999, 59, 797–810.
23. Kleis, J., Jones, G., Abild-Pedersen, F., Tripkovic, V., Bligaard, T., Rossmeisl, J. Trends for methane oxidation at solid oxide fuel cell conditions. J. Electrochem. Soc. 2009, 156, B1447–56.
24. Mukherjee, J., Linic, S. First-principles investigations of electrochemical oxidation of hydrogen at solid oxide fuel cell operating conditions. J. Electrochem. Soc. 2007, 154, B919–24.
25. Carter, E.A. Challenges in modeling materials properties without experimental input. Science 2008, 321, 800–3.
26. Shimojo, F., Okabe, T., Tachibana, F., Kobayashi, M., Okazaki, H. Molecular-dynamics studies of yttria stabilized zirconia. 1. Structure and oxygen diffusion. J. Phys. Soc. Jpn. 1992, 61, 2848–57.
27. Shimojo, F., Okazaki, H. Molecular-dynamics studies of yttria stabilized zirconia. 2. Microscopic mechanism of oxygen diffusion. J. Phys. Soc. Jpn. 1992, 61, 4106–18.
28. Gotte, A., Spangberg, D., Hermansson, K., Baudin, M. Molecular dynamics study of oxygen self-diffusion in reduced CeO2. Solid State Ionics 2007, 178, 1421–7.
29. van Duin, A.C.T., Merinov, B.V., Jang, S.S., Goddard, W.A. ReaxFF reactive force field for solid oxide fuel cell systems with application to oxygen ion transport in yttria-stabilized zirconia. J. Phys. Chem. A 2008, 112, 3133–40.
30. Schelling, P.K., Phillpot, S.R., Wolf, D. Mechanism of the cubic-to-tetragonal phase transition in zirconia and yttria-stabilized zirconia by molecular-dynamics simulation. J. Am. Ceram. Soc. 2001, 84, 1609–19.
31. Devanathan, R., Weber, W.J., Singhal, S.C., Gale, J.D. Computer simulation of defects and oxygen transport in yttria-stabilized zirconia. Solid State Ionics 2006, 177, 1251–8.
32. Modak, A.U., Lusk, M.T. Kinetic Monte Carlo simulation of a solid-oxide fuel cell: I. Open-circuit voltage and double layer structure. Solid State Ionics 2005, 176, 2181–91.
33. Hu, G.S., Orkoulas, G., Christofides, P.D. Stochastic modeling and simultaneous regulation of surface roughness and porosity in thin film deposition. Ind. Eng. Chem. Res. 2009, 48, 6690–700.
34. Hu, G.S., Orkoulas, G., Christofides, P.D. Regulation of film thickness, surface roughness and porosity in thin film growth using deposition rate. Chem. Eng. Sci. 2009, 64, 3903–13.
35. Liu, J., Liu, C.Q., Conway, P.P. Kinetic Monte Carlo simulation of electrodeposition of polycrystalline Cu. Electrochem. Commun. 2009, 11, 2207–11.
36. Lou, Y.M., Christofides, P.D. Estimation and control of surface roughness in thin film growth using kinetic Monte-Carlo models. Chem. Eng. Sci. 2003, 58, 3115–29.
37. Wadley, H.N.G., Zhou, A.X., Johnson, R.A., Neurock, M. Mechanisms, models and methods of vapor deposition. Prog. Mater. Sci. 2001, 46, 329–77.

230

C. Heath Turner et al.

38. Wang, L.G., Clancy, P. Kinetic Monte Carlo simulation of the growth of polycrystalline Cu films. Surf. Sci. 2001, 473, 25—38. 39. Hansen, E., Neurock, M. First-principles-based Monte Carlo methodology applied to O/Rh(100). Surf. Sci. 2000, 464, 91—107. 40. Hansen, E., Neurock, M. First-principles-based Monte Carlo simulation of ethylene hydrogena tion kinetics on Pd. J. Catal. 2000, 196, 241—52. 41. Hansen, E.W., Neurock, M. Modeling surface kinetics with first-principles-based molecular simulation. Chem. Eng. Sci. 1999, 54, 3411—21. 42. Hansen, E.W., Neurock, M. First-principles-based Monte Carlo methodology applied to O/Rh (100). Surf. Sci. 2000, 464, 91—107. 43. Hansen, E.W., Neurock, M. First-principles-based Monte Carlo simulation of ethylene hydro genation kinetics on Pd. J. Catal. 2000, 196, 241—52. 44. Kieken, L.D., Neurock, M., Mei, D.H. Screening by kinetic Monte Carlo simulation of Pt-Au(100) surfaces for the steady-state decomposition of nitric oxide in excess dioxygen. J. Phys. Chem. B 2005, 109, 2234—44. 45. Mei, D., Sheth, P.A., Neurock, M., Smith, C.M. First-principles-based kinetic Monte Carlo simula tion of the selective hydrogenation of acetylene over Pd(111). J. Catal. 2006, 242, 1—15. 46. Mei, D.H., Neurock, M., Smith, C.M. Hydrogenation of acetylene-ethylene mixtures over Pd and Pd-Ag alloys: First-principles-based kinetic Monte Carlo simulations. J. Catal. 2009, 268, 181—95. 47. Mei, D.H., Ge, Q.F., Neurock, M., Kieken, L., Lerou, J. First-principles-based kinetic Monte Carlo simulation of nitric oxide decomposition over Pt and Rh surfaces under lean-burn conditions. Mol. Phys. 2004, 102, 361—9. 48. Neurock, M., Hansen, E.W. First-principles-based molecular simulation of heterogeneous cata lytic surface chemistry. Comput. Chem. Eng. 1998, 22, S1045—60. 49. Bortz, A.B., Kalos, M.H., Lebowitz, J.L. A new algorithm for Monte Carlo simulation of ising spin systems. J. Comput. Phys. 1975, 17, 10—18. 50. Landau, D.P., Binder, K. 
A Guide to Monte Carlo Simulations in Statistical Physics, Cambridge, United Kingdom, 2005. 51. Gillespie, D.T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 1976, 22, 403—34. 52. Voter, A.F. Classically exact overlayer dynamics: Diffusion of rhodium clusters on Rh(100). Phys. Rev. B 1986, 34, 6819—29. 53. Fichthorn, K.A., Weinberg, W.H. Theoretical foundations of dynamical Monte Carlo simulations. J. Chem. Phys. 1991, 95, 1090—6. 54. Martinez, E., Marian, J., Kalos, M.H., Perlado, J.M. Synchronous parallel kinetic Monte Carlo for continuum diffusion-reaction systems. J. Comput. Phys. 2008, 227, 3804—23. 55. Korniss, G., Novotny, M.A., Rikvold, P.A. Parallelization of a dynamic Monte Carlo algorithm: A partially rejection-free conservative approach. J. Comput. Phys. 1999, 153, 488—508. 56. Nandipati, G., Shim, Y., Amar, J.G., Karim, A., Kara, A., Rahman, T.S., Trushin, O. Parallel kinetic Monte Carlo simulations of Ag(111) island coarsening using a large database. J. Phys.-Condens. Matter 2009, 21, 84214. 57. Fleig, J. Solid oxide fuel cell cathodes: Polarization mechanisms and modeling of the electro chemical performance. Annu. Rev. Mater. Res. 2003, 33, 361—82. 58. Bessler, W.G., Gewies, S., Vogler, M. A new framework for physically based modeling of solid oxide fuel cells. Electrochim. Acta 2007, 53, 1782—800. 59. Ihara, M., Kusano, T., Yokoyama, C. Competitive adsorption reaction mechanism of Ni/yttria stabilized zirconia cermet anodes in H-2-H2O solid oxide fuel cells. J. Electrochem. Soc. 2001, 148, A209—19. 60. Adler, S.B., Lane, J.A., Steele, B.C.H. Electrode kinetics of porous mixed-conducting oxygen electrodes. J. Electrochem. Soc. 1996, 143, 3554—64. 61. Garcia-Barriocanal, J., Rivera-Calzada, A., Varela, M., Sefrioui, Z., Diaz-Guillen, M.R., Moreno, K.J., Diaz-Guillen, J.A., Lborra, E., Fuentes, A.F., Pennycook, S.J., Leon, C., Santarnaria, J. 
Tailor ing disorder and dimensionality: Strategies for improved solid oxide fuel cell electrolytes. ChemPhysChem 2009, 10, 1003—11. 62. Kilner, J.A. Ionic conductors feel the strain. Nat. Mater. 2008, 7, 838—9.

Atomistic Modeling of Solid Oxide Fuel Cells

231

63. Steele, B.C.H., Heinzel, A. Materials for fuel-cell technologies. Nature 2001, 414, 345—52. 64. Chadwick, A.V. Nanotechnology–solid progress in ion conduction. Nature 2000, 408, 925—6. 65. Hull, S. Superionics: Crystal structures and conduction processes. Rep. Prog. Phys. 2004, 67, 1233—314. 66. Evans, A., Bieberie-Hutter, A., Galinski, H., Rupp, J.L.M., Ryll, T., Scherrer, B., Tolke, R., Gauckler, L.J. Micro-solid oxide fuel cells: Status, challenges, and chances. Monatsh. Chem. 2009, 140, 975—83. 67. Will, J., Mitterdorfer, A., Kleinlogel, C., Perednis, D., Gauckler, L.J. Fabrication of thin electrolytes for second-generation solid oxide fuel cells. Solid State Ionics 2000, 131, 79—96. 68. Radhakrishnan, R., Virkar, A.V., Singhal, S.C. Estimation of charge-transfer resistivity of Pt cathode on YSZ electrolyte using patterned electrodes. J. Electrochem. Soc. 2005, 152, A927—36. 69. Horita, T., Yamaji, K., Sakai, N., Xiong, X.P., Kato, T., Yokokawa, H., Kawada, T. Imaging of oxygen transport at SOFC cathode/electrolyte interfaces by a novel technique. J. Power Sources 2002, 106, 224—30. 70. Kwon, O.H., Choi, G.M. Electrical conductivity of thick film YSZ. Solid State Ionics 2006, 177, 3057—62. 71. Goodenough, J.B., Huang, Y.H. Alternative anode materials for solid oxide fuel cells. J. Power Sources 2007, 173, 1—10. 72. Tsipis, E.V., Kharton, V.V. Electrode materials and reaction mechanisms in solid oxide fuel cells: A brief review. J. Solid State Electron 2008, 12, 1039—60. 73. Jimenez, R., Kloidt, T., Kleitz, M. Reaction-zone expansions and mechanism of the O-2, Ag/ yttria-stabilized zirconia electrode reaction. J. Electrochem. Soc. 1997, 144, 582—5. 74. Basu, R.N., Tietz, F., Wessel, E., Stover, D. Interface reactions during co-firing of solid oxide fuel cell components. J. Mater. Process. Technol. 2004, 147, 85—9. 75. Van Herle, J., Vasquez, R. Conductivity of Mn and Ni-doped stabilized zirconia electrolyte. J. Eur. Ceram. Soc. 2004, 24, 1177—80. 76. 
Mitterdorfer, A., Gauckler, L.J. Identification of the reaction mechanism of the Pt, O-2(g) | yttria stabilized zirconia system–part I: General framework, modelling, and structural investigation. Solid State Ionics 1999, 117, 187—202. 77. Mitterdorfer, A., Gauckler, L.J. Identification of the reaction mechanism of the Pt, O-2(g) | yttria stabilized zirconia system–part II: Model implementation, parameter estimation, and valida tion. Solid State Ionics 1999, 117, 203—17. 78. Mitterdorfer, A., Gauckler, L.J. Reaction kinetics of the Pt, O-2(g) | c-ZrO2 system: Precursormediated adsorption. Solid State Ionics 1999, 120, 211—25. 79. Griffiths, K., Jackman, T.E., Davies, J.A., Norton, P.R. Interaction of O2 with Pt(100): I. Equili brium measurements. Surf. Sci. 1984, 138, 113—24. 80. Bonzel, H.P., Ku, R. On the kinetics of oxygen adsorption on a Pt(111) surface. Surf. Sci. 1973, 40, 85—101. 81. Gland, J.L. Molecular and atomic adsorption of oxygen on the Pt(111) and Pt(S)-12(111) (111) surfaces. Surf. Sci. 1980, 93, 487—514. 82. Gland, J.L., Sexton, B.A., Fisher, G.B. Oxygen interactions with the Pt(111) surface. Surf. Sci. 1980, 95, 587—602. 83. Verkerk, M.J., Burggraaf, A.J. Oxygen transfer on substituted ZrO2, Bi2O3, and CeO2 electrolytes with platinum electrodes. J. Electrochem. Soc. 1983, 130, 78—84. 84. Lewis, R., Gomer, R. Adsorption of oxygen on platinum. Surf. Sci. 1968, 12, 157—76. 85. Campbell, C.T., Ertl, G., Kuipers, H., Segner, J. A molecular beam study of the adsorption and desorption of oxygen from a Pt(111) surface. Surf. Sci. 1981, 107, 220—36. 86. Lapujoulade, J., Neil, K.S. Chemisorption of hydrogen on the (111) plane of nickel. J. Chem. Phys. 1972, 57, 3535—45. 87. Schwiedernoch, R., Tischer, S., Correa, C., Deutschmann, O. Experimental and numerical study on the transient behavior of partial oxidation of methane in a catalytic, monolith. Chem. Eng. Sci. 2003, 58, 633—42. 88. Vogler, M., Bieberle-Hutter, A., Gauckler, L., Warnatz, J., Bessler, W.G. 
Modelling study of surface reactions, diffusion, and spillover at a Ni/YSZ patterned anode. J. Electrochem. Soc. 2009, 156, B663—72.

232

C. Heath Turner et al.

89. Hecht, E.S., Gupta, G.K., Zhu, H.Y., Dean, A.M., Kee, R.J., Maier, L., Deutschmann, O. Methane reforming kinetics within a Ni-YSZ SOFC anode support. Appl. Catal. A Gen. 2005, 295, 40—51. 90. Lau, K.C., Dunlap, B.I. Lattice dielectric and thermodynamic properties of yttria stabilized zirconia solids. J. Phys.-Condens. Matter 2009, 21, 145402. 91. Galea, N.M., Kadantsev, E.S., Ziegler, T. Studying reduction in solid oxide fuel cell activity with density functional theory–effects of hydrogen sulfide adsorption on nickel anode surface. J. Phys. Chem. C 2007, 111, 14457—68. 92. Rossmeisl, J., Bessler, W.G. Trends in catalytic activity for SOFC anode materials. Solid State Ionics 2008, 178, 1694—700. 93. Setoguchi, T., Okamoto, K., Eguchi, K., Arai, H. Effects of anode material and fuel on anodic reaction of solid oxide fuel-cells. J. Electrochem. Soc. 1992, 139, 2875—80. 94. Diskin, A.M., Cunningham, R.H., Ormerod, R.M.Z. The oxidative chemistry of methane over supported nickel catalysts. Catal. Today 1998, 46, 147—54. 95. Vang, R.T., Honkala, K., Dahl, S., Vestergaard, E.K., Schnadt, J., Laegsgaard, E., Clausen, B.S., Norskov, J.K., Besenbacher, F. Controlling the catalytic bond-breaking selectivity of Ni surfaces by step blocking. Nat. Mater. 2005, 4, 160—2. 96. Wang, S.G., Liao, X.Y., Hu, J., Cao, D.B., Li, Y.W., Wang, J.G., Jiao, H.J. Kinetic aspect of CO2 reforming of CH4 on Ni(111): A density functional theory calculation. Surf. Sci. 2007, 601, 1271—84. 97. Wang, S.G., Cao, D.B., Li, Y.W., Wang, J.G., Jiao, H.J. CO2 reforming of CH4 on Ni(111): A density functional theory calculation. J. Phys. Chem. B 2006, 110, 9976—83. 98. Wang, S.G., Cao, D.B., Li, Y.W., Wang, J.G., Jiao, H.J. Reactivity of surface OH in CH4 reforming reactions on Ni(111): A density functional theory calculation. Surf. Sci. 2009, 603, 2600—6. 99. Blaylock, D.W., Ogura, T., Green, W.H., Beran, G.J.O. 
Computational investigation of thermo chemistry and kinetics of steam methane reforming on Ni(111) under realistic conditions. J. Phys. Chem. C 2009, 113, 4898—908. 100. Shi, Z., Zhang, J.J., Liu, Z.S., Wang, H.J., Wilkinson, D.P. Current status of ab initio quantum chemistry study for oxygen electroreduction on fuel cell catalysts. Electrochim. Acta 2006, 51, 1905—16. 101. Hyman, M.P., Medlin, J.W. Mechanistic study of the electrochemical oxygen reduction reaction on Pt(111) using density functional theory. J. Phys. Chem. B 2006, 110, 15338—44. 102. Li, T., Balbuena, P.B. Oxygen reduction on a platinum cluster. Chem. Phys. Lett. 2003, 367, 439—47. 103. Sljivancanin, Z., Hammer, B. Oxygen dissociation at close-packed Pt terraces, Pt steps, and Ag covered Pt steps studied with density functional theory. Surf. Sci. 2002, 515, 235—44. 104. Shustorovich, E., Sellers, H. The UBI-QEP method: A practical theoretical approach to under standing chemistry on transition metal surfaces. Surf. Sci. Rep. 1998, 31, 5—119. 105. Sellers, H., Shustorovich, E. Intrinsic activation barriers and coadsorption effects for reactions on metal surfaces: Unified formalism within the UBI-QEP approach. Surf. Sci. 2002, 504, 167—82. 106. Hei, M.J., Chen, H.B., Yi, J., Lin, Y.J., Lin, Y.Z., Wei, G., Liao, D.W. CO2-reforming of methane on transition metal surfaces. Surf. Sci. 1998, 417, 82—96. 107. Wang, D.Y., Nowick, A.S. Cathodic and anodic polarization phenomena at platinum electrodes with doped CeO2 as electrolyte. J. Electrochem. Soc. 1979, 126, 1155—65. 108. Kim, S., Yamaguchi, S., Elliott, J.A. Solid-state ionics in the 21st century: Current status and future prospects. MRS Bull. 2009, 34, 900—6. 109. Guo, X., Waser, R. Electrical properties of the grain boundaries of oxygen ion conductors: Acceptor-doped zirconia and ceria. Prog. Mater. Sci. 2006, 51, 151—210. 110. Hendriks, M.G.H.M., ten Elshof, J.E., Bouwmeester, H.J.M., Verweij, H. 
The defect structure of the double layer in yttria-stabilised zirconia. Solid State Ionics 2002, 154, 467—72. 111. Perdew, J.P., Wang, Y. Accurate and simple analytic representation of the electron-gas correlationenergy. Phys. Rev. B 1992, 45, 13244—9. 112. Perdew, J.P., Chevary, J.A., Vosko, S.H., Jackson, K.A., Pederson, M.R., Singh, D.J., Fiolhais, C. Atoms, molecules, solids, and surfaces–applications of the generalized gradient approximation for exchange and correlation. Phys. Rev. B 1992, 46, 6671—87. 113. Ioffe, A.I., Rutman, D.S., Karpachov, S.V. On the nature of the conductivity maximum in zirconiabased solid electrolytes. Electrochim. Acta 1978, 23, 141—2.

Atomistic Modeling of Solid Oxide Fuel Cells

233

114. Politova, T.I., Irvine, J.T.S. Investigation of scandia-yttria-zirconia system as an electrolyte material for intermediate temperature fuel cells–influence of yttria content in system (Y2O3)x(Sc2O3)(11-x)(ZrO2)(89). Solid State Ionics 2004, 168, 153—65. 115. Pornprasertsuk, R., Cheng, J., Huang, H., Prinz, F.B. Electrochemical impedance analysis of solid oxide fuel cell electrolyte using kinetic Monte Carlo technique. Solid State Ionics 2007, 178, 195—205. 116. Orazemand, M.E., Tribollet, B. Electrochemical Impedance Spectroscopy, Wiley, Hoboken, 2008. 117. MacDonald, J.R. Impedance Spectroscopy: Theory, Experiments and Applications, Wiley, New York, 2005. 118. Wang, X., Lau, K.C., Turner, C.H., Dunlap, B.I. Kinetic Monte Carlo simulation of AC impedance on the cathode side of a solid oxide fuel cell. J. Electrochem. Soc. 2010, 157, B90—8. 119. Lau, K.C., Turner, C.H., Dunlap, B.I. Kinetic Monte Carlo simulation of the yttria stabilized zirconia (YSZ) fuel cell cathode. Solid State Ionics 2008, 179, 1912—20. 120. Lau, K.C., Turner, C.H., Dunlap, B.I. Kinetic Monte Carlo simulation of O2- incorporation in the yttria stabilized zirconia (YSZ) fuel cell. Chem. Phys. Lett. 2009, 471, 326—30. 121. Horita, T., Yamaji, K., Sakai, N., Xiong, Y.P., Kato, T., Yokokawa, H., Kawada, T. Determination of proton and oxygen movements in solid oxides by the tracer gases exchange technique and secondary ion mass spectrometry. Appl. Surf. Sci. 2003, 203, 634—8. 122. Horita, T., Yamaji, K., Sakai, N., Yokokawa, H., Kawada, T., Kato, T. Oxygen reduction sites and diffusion paths at La0.9Sr0.1MnO3-x/yttria-stabilized zirconia interface for different cathodic over voltages by secondary-ion mass spectrometry. Solid State Ionics 2000, 127, 55—65. 123. Horita, T., Yamaji, K., Ishikawa, M., Sakai, N., Yokokawa, H., Kawada, T., Kato, T. Active sites imaging for oxygen reduction at the La0.9Sr0.1MnO3-x/yttria-stabilized zirconia interface by secondary-ion mass spectrometry. J. 
Electrochem. Soc. 1998, 145, 3196—202. 124. Huang, Q.A., Hui, R., Wang, B.W., Zhang, H.J. A review of AC impedance modeling and validation in SOFC diagnosis. Electrochim. Acta. 2007, 52, 8144—64. 125. Chang, H.C., Tsai, D.S., Chung, W.H., Huang, Y.S., Le, M.V. A ceria layer as diffusion barrier between LAMOX and lanthanum strontium cobalt ferrite along with the impedance analysis. Solid State Ionics 2009, 180, 412—7. 126. Pornprasertsuk, R., Holme, T., Prinz, F.B. Kinetic Monte Carlo simulations of solid oxide fuel cell. J. Electrochem. Soc. 2009, 156, B1406—16. 127. Wang, X., Lau, K.C., Turner, C.H., Dunlap, B.I. Kinetic Monte Carlo simulation of the elementary electrochemistry in a hydrogen-powered solid oxide fuel cell. J. Power Sources 2010, 195, 4177—84. 128. Holme, T.P., Pornprasertsuk, R., Prinz, F.B. Interpretation of low temperature solid oxide fuel cell electrochemical impedance spectra. J. Electrochem. Soc. 2010, 157, B64—70. 129. Hellsing, B., Kasemo, B., Zhdanov, V.P. Kinetics of the hydrogen oxygen reaction on platinum. J. Catal. 1991, 132, 210—28. 130. Yoon, S.P., Nam, S.W., Kim, S.G., Hong, S.A., Hyun, S.H. Characteristics of cathodic polarization at Pt/YSZ interface without the effect of electrode microstructure. J. Power Sources 2003, 115, 27—34. 131. Nieuwenhuys, B.E. Adsorption and reactions of CO, NO, H2 and O2 on group VIII metal surfaces. Surf. Sci. 1983, 126, 307—36. 132. Germer, T.A., Ho, W. Direct characterization of the hydroxyl intermediate during reduction of oxygen on Pt(111) by time-resolved electron-energy loss spectroscopy. Chem. Phys. Lett. 1989, 163, 449—54. 133. Fridell, E., Elg, A.P., Rosen, A., Kasemo, B. A laser-induced fluorescence study of OH desorption from Pt(111) during oxidation of hydrogen in O-2 and decomposition of water. J. Chem. Phys. 1995, 102, 5827—35. 134. Thiel, P.A., Madey, T.E. The interaction of water with solid-surfaces–fundamental-aspects. Surf. Sci. Rep. 1987, 7, 211—385. 135. 
Norton, P.R., Davies, J.A., Jackman, T.E. Absolute coverage and isostemc heat of adsorption of deuterium on Pt(111) studied by nuclear microanalysis. Surf. Sci. 1982, 121, 103—10. 136. Ljungstrom, S., Kasemo, B., Rosen, A., Wahnstrom, T., Fridell, E. An experimental-study of the kinetics of OH and H2O formation on Pt in the H2þO2 reaction. Surf. Sci. 1989, 216, 63—92.

234

C. Heath Turner et al.

137. Blo¨ chl, P.E. Projector augmented-wave method. Phys. Rev. B 1994, 50, 17953—79. 138. Kresse, G., Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 1999, 59, 1758—75. 139. Mizusaki, J., Amano, K., Yamauchi, S., Fueki, K. Electrode-reaction at Pt,O-2(g)/stabilized zirconia interfaces. 2. Electrochemical measurements and analysis. Solid State Ionics 1987, 22, 323—30. 140. Uchida, H., Yoshida, M., Watanabe, M. Effect of ionic conductivity of zirconia electrolytes on the polarization behavior of various cathodes in solid oxide fuel cells. J. Electrochem. Soc. 1999, 146, 1—7. 141. Huang, H., Nakamura, M., Su, P.C., Fasching, R., Saito, Y., Prinz, F.B. High-performance ultrathin solid oxide fuel cells for low-temperature operation. J. Electrochem. Soc. 2007, 154, B20—4.

Section 5

Biological Modeling

Section Editor: Nathan Baker, Pacific Northwest National Laboratory, Richland, WA 99352, USA

CHAPTER 12

Modeling Signaling Processes across Cellular Membranes Using a Mesoscopic Approach

George Khelashvili¹ and Daniel Harries²

Contents

1. Introduction
1.1 Lipid rafts as platforms for cellular signaling
1.2 The need for large-scale quantitative models to describe complex signaling machinery
2. Mesoscopic Model of Membrane-Associated Signaling Complexes
2.1 Overall strategy
2.2 System representation and governing free energy
2.3 Free energy minimization
2.4 Quantitative description of peripheral protein diffusion
2.5 Accounting for amphipathic helix insertions
3. Model Applications
3.1 PIP2 and cellular signaling: mechanisms of membrane targeting
3.2 BAR–membrane interactions
3.3 Adsorption of natively unstructured protein domains onto lipid membranes
4. Future Prospects
Acknowledgments
References

Abstract

Computational models are effective in providing quantitative predictions about processes across cellular membranes, thereby aiding experimental observations. Conventional computational tools, such as molecular dynamics or Monte Carlo simulation, offer significant insights when applicable. However, it remains extremely difficult to use these simulation methods to describe large macromolecular assemblies on timescales relevant to the vast majority of critical physiological processes. Alternative methods based on coarse-grained representations have recently emerged to overcome this outstanding challenge. In this chapter, we review one such methodology, which is based on mean-field-type representations typically used for equilibrium thermodynamic calculations of lipids and proteins. The main advantages of this self-consistent scheme are that it adds information on longer timescales and gives access to the steady state of the system without a priori assumptions about protein–membrane interactions. We illustrate this methodology using several examples pertaining to interactions of peripheral signaling proteins with lipid membranes. These examples outline the current state of the computational strategy. They also allow us to discuss several future enhancements that should help the scheme become a powerful methodology complementary to other simulation techniques. With these extensions, the proposed methodology could enable a quantitative description of large-scale membrane-associated interactions that are of major importance in physiological processes of the healthy and diseased cell.

Keywords: cell signaling; lipid rafts; BAR domains; membrane curvature; membrane elasticity; PIP2 diffusion; mean-field model; coarse-grained theory; Poisson–Boltzmann theory; Cahn–Hilliard equations

¹ Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, NY, USA
² Institute of Chemistry and the Fritz Haber Research Center, The Hebrew University of Jerusalem, Jerusalem, Israel

Annual Reports in Computational Chemistry, Volume 6, ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06012-3
© 2010 Elsevier B.V. All rights reserved.

1. INTRODUCTION

1.1 Lipid rafts as platforms for cellular signaling

Overwhelming evidence indicates that the function and organization of protein components of living cell membranes are orchestrated at specific spatial and temporal scales. In particular, the structural, compositional, and mechanical properties of lipid bilayers play a significant role in regulating the physiological function of membrane-associated proteins [1]. One of the best-known examples is the existence of specialized plasma membrane domains, typically enriched in cholesterol and sphingolipids. These patches, termed "rafts", have been implicated as platforms for various physiological processes, and specifically for cellular signal transduction [2]. As such, rafts have been shown to be important in regulating the function of both transmembrane (TM) and peripheral signaling proteins. For instance, evidence suggests that cholesterol-dependent separation of the TM signaling G-protein-coupled receptors (GPCRs) from their partners can be a determining factor for signaling efficacy [3]. Another example is the use of polyvalent phosphatidylinositol 4,5-bisphosphate (PIP2) lipids, which are also enriched in rafts, for membrane targeting by various peripheral signaling motifs, such as C2 [4], PH [5], FERM [6], and BAR domains [7]. BAR domains present a particularly fascinating case because they act as mechanical modules capable of locally reshaping plasma membranes. They do so as part of cell signaling and of other physiological functions, such as endocytosis [7]. Importantly, BAR modules act synergistically with other protein domains, such as PDZ domains, in interactions with GPCRs. Through their effects on membrane remodeling, these interactions direct subsequent steps in signaling. This synergism has so far been proposed specifically for the protein interacting with C-kinase 1 (PICK1) [8]. Even so, the abundance of BAR-domain-containing proteins highlights the importance of this class of mechanisms and their putative physiological roles.

A fundamentally important question for the role of rafts in cellular signaling is whether such domains exist preformed in living cells, so that the cellular protein machinery can recognize them. The alternative is that rafts are structures that assemble dynamically, adopting specific lipid compositions or membrane deformations around specific protein components in response to physiological function [2]. Biochemical and biophysical studies conducted in vitro on cell membranes, as well as on model lipid assemblies, established that rafts can exist as stable membrane domains in the liquid-ordered (Lo) phase surrounded by a relatively fluid (Lα) lipid environment (see, for example, References [9–14] and references therein). These domains differ from the surrounding lipid matrix both structurally and mechanically. In particular, rafts are generally thicker and more rigid than other membrane compartments [15–19]. These studies also identified additional putative raft components, such as glycosylphosphatidylinositol (GPI)-anchored proteins or TM domains [20,21]. However, despite the wealth of data collected in vitro, a challenge remains. The structural and mechanical raft characteristics observed in artificial systems must still be linked to those of living cell membranes under native conditions, where language borrowed from macroscopic phase transitions may become inadequate [2,16].

1.2 The need for large-scale quantitative models to describe complex signaling machinery

The difficulty in realistically describing rafts and their associated interactions during signal transduction originates from the large number of concerted actions involved. During signaling, proteins and other macromolecules adsorb onto cell membranes, diffuse on them, and penetrate into them, and they associate into and dissociate from complexes within the membrane. Throughout, they interact through intricate forces that ultimately determine biological function. This complexity makes it conceptually challenging and computationally very costly to quantify such encounters at the macromolecular level.

Computational models are powerful in aiding experimental observations by providing quantitative, testable predictions. Conventional computational tools, such as molecular dynamics (MD) or Monte Carlo (MC) simulations, offer significant insight when applicable. However, it remains very difficult to use them to describe large macromolecular assemblies on timescales relevant to the vast majority of critical biological processes. Even when the required force fields are available, current supercomputing resources allow MD simulations of ca. 1 μs for systems as large as 250,000 atoms only in exceptional cases. Yet even these relatively extended size and timescales do not permit the consideration of processes that include membrane reshaping, lipid reorganization, and protein–protein interactions, which evolve concertedly at the lipid membrane interface. Not surprisingly, sustained efforts have produced methods to address this perennial challenge. Recent computational strategies have attempted to coarse-grain the system, lowering the number of degrees of freedom addressed by the model and thereby reducing the required computational effort (see, e.g., References [22–25]). However, most of these strategies rely on designing force fields for specific mesoscopic models, a formidable task in itself.

We have been pursuing a somewhat different approach, which takes advantage of the extensive knowledge and quantitative information accumulated on lipids, proteins, and their interactions. In particular, to model membrane-associated interactions during cellular signaling, we use available information on the elastic, entropic, and electrostatic properties of lipids and proteins [26,27]. Our starting point is mean-field-type theories that are typically used for equilibrium thermodynamic calculations of lipids and macromolecules such as proteins. The information resulting from these models is then fed as input to dynamic Cahn–Hilliard (CH) and stochastic Langevin formulations [28]. These formulations allow the molecular interactions of membrane-associated proteins to be probed in time and space. With this algorithmic formulation, we explicitly describe only a small number of important degrees of freedom, which removes the need to model individual lipid components.

In this review, we describe our modeling strategy and present several applications of the method. All the examples considered relate to interactions of peripheral signaling motifs, such as BAR domains or basic (hence positively charged) polypeptides, with membranes of raft-like lipid compositions.
We aim to illustrate the effectiveness of our approach in describing dynamic membrane processes that involve membrane remodeling upon protein adsorption. These processes also include lipid rearrangement and segregation following the lipids' interaction with adsorbing proteins. The main advantages of this self-consistent scheme are that it adds information on longer timescales and gives access to the steady state of the system without a priori assumptions about protein–membrane interactions. We end by discussing future perspectives and possible extensions of the model that will, we hope, allow it to become a powerful methodology complementary to other simulation techniques, such as MD. Together, these strategies should enable the study of large-scale membrane-associated interactions that are of major importance in physiological processes of the cell in both healthy and diseased states.
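The dynamic Cahn–Hilliard (CH) step mentioned above can be illustrated with a deliberately minimal sketch. On a flat membrane, a CH-type update of the charged-lipid mole fraction φ takes the form ∂φ/∂t = D_lip ∇²μ, where μ is the local electrochemical potential. The code below is not the authors' implementation. It assumes a flat, periodic, one-dimensional leaflet, so that the Laplace–Beltrami operator reduces to an ordinary second derivative. It also assumes an ideal-mixing chemical potential in units of kT, and a hypothetical Gaussian potential well (`well_depth`, `width`) standing in for the electrostatic attraction of an adsorbed basic protein. Time stepping is explicit Euler in arbitrary reduced units. Starting from a uniform composition, charged lipids accumulate under the well until μ becomes uniform.

```python
import numpy as np


def evolve_leaflet(phi_mean=0.3, n=64, dx=1.0, d_lip=1.0, dt=0.02,
                   steps=20000, well_depth=1.0, width=4.0):
    """Explicit-Euler sketch of d(phi)/dt = D_lip * lap(mu) on a flat, periodic 1D leaflet.

    Assumptions (illustrative only): flat leaflet, so the Laplace-Beltrami
    operator is the ordinary Laplacian; ideal lipid mixing; reduced units.
    """
    x = np.arange(n) * dx
    # Hypothetical attractive potential (in kT) felt by the charged lipid,
    # standing in for the electrostatic field of an adsorbed protein.
    psi = -well_depth * np.exp(-((x - x[n // 2]) ** 2) / (2.0 * width ** 2))
    phi = np.full(n, phi_mean)  # uniform initial composition

    def chem_pot(p):
        # Ideal mixing term ln(phi / (1 - phi)) plus the external term, in kT.
        return np.log(p / (1.0 - p)) + psi

    for _ in range(steps):
        mu = chem_pot(phi)
        # Periodic second-difference Laplacian of mu. Its grid sum is zero,
        # so the total amount of charged lipid is conserved.
        lap_mu = (np.roll(mu, 1) - 2.0 * mu + np.roll(mu, -1)) / dx ** 2
        phi = phi + dt * d_lip * lap_mu
    return x, phi, chem_pot(phi)


if __name__ == "__main__":
    x, phi, mu = evolve_leaflet()
    print(f"phi under the well: {phi[len(phi) // 2]:.3f}, far away: {phi[0]:.3f}")
    print(f"spread of mu at the end: {np.ptp(mu):.2e}")
```

The steady state only requires μ to be uniform. Changing `d_lip` therefore rescales the time axis but leaves the final profile unchanged. At steady state, the composition obeys a Boltzmann-like law, φ/(1 − φ) ∝ exp(−ψ).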

2. MESOSCOPIC MODEL OF MEMBRANE-ASSOCIATED SIGNALING COMPLEXES

2.1 Overall strategy

We start by discussing our overall strategy for coarse-graining macromolecular representations. The strategy draws on available information, from experiments and from atomistic simulations, about the material properties of proteins, membranes, and lipid components, as well as their interactions. This information can be used to devise models that explicitly treat only a small number of important degrees of freedom. To quantify the combined kinetic effect of many lipid species interacting with peripheral proteins, and to describe the concomitant membrane shape perturbations, it is essential to calculate the steady state of adsorbing macromolecules in a way that includes all important degrees of freedom self-consistently. The relevant interactions include electrostatic (Coulomb) forces, lipid mixing, and membrane elastic deformations. In our formulation, self-consistency is achieved by minimizing the governing model free energy density functional. This functional is based on the continuum Helfrich free energy for membrane elasticity [29] and on the nonlinear Poisson–Boltzmann (PB) theory of electrostatics [30–34]. The formalism is simple, yet it accounts for a number of important membrane properties: it treats the electrostatic problem realistically in three dimensions and requires only a few phenomenological material constants to describe the lipid bilayer. This mesoscopic theory neglects most atomic structural features of a lipid bilayer [34,35]. Nevertheless, similar membrane and membrane–macromolecule models have been shown to yield reliable qualitative and quantitative predictions [36–47].
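The chapter's model solves the nonlinear PB equation in three dimensions around an all-atom protein, which is beyond a short example. As a hedged illustration of the same physics in the simplest geometry, the sketch below evaluates the textbook Gouy–Chapman solution of the nonlinear PB equation for a uniformly charged planar membrane in a symmetric 1:1 salt. It computes the Debye length λD, the surface potential ψ0 from the Grahame equation, and the profile ψ(z) = (4kT/e) artanh[tanh(eψ0/4kT) exp(−z/λD)]. The relative permittivity (78.5), the temperature (298.15 K), and the surface charge density in the usage note are assumptions chosen for illustration, not parameters taken from the chapter.

```python
import numpy as np

# Physical constants in SI units (CODATA 2018 values)
E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def ion_density(conc_molar):
    """Number density (1/m^3) of each ion species in a 1:1 salt of given molarity."""
    return conc_molar * 1000.0 * N_A


def debye_length(conc_molar, temp=298.15, eps_r=78.5):
    """Debye screening length (m) of a symmetric 1:1 electrolyte."""
    n = ion_density(conc_molar)
    return np.sqrt(EPS0 * eps_r * K_B * temp / (2.0 * n * E_CHARGE ** 2))


def surface_potential(sigma, conc_molar, temp=298.15, eps_r=78.5):
    """Grahame equation: potential (V) at a plane with charge density sigma (C/m^2)."""
    n = ion_density(conc_molar)
    kt = K_B * temp
    return (2.0 * kt / E_CHARGE) * np.arcsinh(
        sigma / np.sqrt(8.0 * EPS0 * eps_r * kt * n))


def gouy_chapman_profile(z, sigma, conc_molar, temp=298.15, eps_r=78.5):
    """Exact planar nonlinear-PB potential (V) at distance z (m) from the charged plane."""
    kt_e = K_B * temp / E_CHARGE          # thermal voltage kT/e, V
    lam = debye_length(conc_molar, temp, eps_r)
    psi0 = surface_potential(sigma, conc_molar, temp, eps_r)
    gamma = np.tanh(psi0 / (4.0 * kt_e))
    return 4.0 * kt_e * np.arctanh(gamma * np.exp(-np.asarray(z) / lam))
```

For instance, `gouy_chapman_profile(np.linspace(0, 3e-9, 7), -0.05, 0.1)` returns the potential (in volts) above a membrane carrying a hypothetical −0.05 C/m² of acidic-lipid charge in 0.1 M salt. At this concentration, the screening length comes out close to 1 nm. For small σ, the Grahame result reduces to the linearized (Debye–Hückel) estimate ψ0 ≈ σλD/(ε0εr).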

2.2 System representation and governing free energy

Our method uses an atomic-level representation of the adsorbing protein in three dimensions. It accounts for lateral reorganization and demixing of lipids, as well as membrane deformations upon adsorption (see Figure 1). We consider the limit of low surface density of adsorbing proteins, so that interactions between proteins are negligible.

[Figure 1 schematic. Labels: upper leaflet, lower leaflet, mid-plane, d/2; solvent regions marked εW, λD; axes x, y, z]

Figure 1 View of the BAR–membrane complex at steady state, as predicted from coarse-grained mean-field-level calculations. BAR is shown in spacefill; the membrane interior is shaded gray. In this hybrid model, the BAR domain is represented in all-atom detail, through the partial charges and radii of its constituent atoms. The membrane is represented as a two-dimensional, incompressible, tensionless, elastic medium. It comprises smooth 2D charged surfaces, where the lipid polar head-groups reside, and a low-dielectric hydrophobic core volume. Elastic properties of the membrane are characterized by its bending modulus and locally defined spontaneous curvatures. The system is driven toward equilibrium through the self-consistent minimization of the free energy functional. This functional contains contributions from the system's electrostatic energy, the mixing entropy of the lipids and of the mobile ions in solution, and the membrane deformation energy.

The adsorbing protein is represented in full-atomistic 3D detail, whereas the membrane is treated as a two-dimensional fluid. This allows us to treat lipid head-group charges in a continuum representation, as is usual in regular solution theory. For simplicity of presentation, we assume here membranes containing binary mixtures of acidic and neutral (zwitterionic) lipids. The charged-lipid compositions vary in space and time on the upper (u) and lower (l) leaflets of the membrane. Their temporal evolution is linked to the Laplace–Beltrami (LB) operator acting on the corresponding electrochemical potentials through two CH equations (one for each leaflet), each of the form [28]:

$$\frac{\partial \phi(\vec{r},t)}{\partial t} = D_{\mathrm{lip}}\,\nabla_{\mathrm{LB}}^{2}\,\mu(\vec{r},t) = \frac{D_{\mathrm{lip}}}{\sqrt{g}}\,\partial_i\!\left(\sqrt{g}\,g^{ij}\,\partial_j\,\mu(\vec{r},t)\right) \tag{1}$$

Here, $\phi$ and $\mu$ denote, respectively, the local mole fraction and the local electrochemical potential of the charged lipid species in that leaflet. $g_{ij}$ is the metric tensor defined on the leaflet surface, $g^{ij}$ its inverse, and $g$ its determinant; $D_{lip}$ is the diffusion coefficient of the charged lipids. Note that $D_{lip}$ sets the rate of relaxation but should not affect the equilibrium state.

The local electrochemical potentials, in turn, are derived from the free energy functional, which itself depends on the local lipid component densities and on membrane curvature. The result is a self-consistent formulation, and solving it is the main computational task. More specifically, we assume that the system's free energy $F$ consists of five contributions [26,27,36,43,44]:

- the electrostatic energy;
- the translational entropy of mobile salt ions;
- the lipid mixing entropy;
- the membrane bending energy;
- a short-range repulsive interaction energy acting between the protein and membrane interfaces.

$$F = F_{el} + F_{IM} + F_{lip} + F_b + F_{rep} \tag{2}$$
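The lipid-relaxation step of Eq. (1) can be made concrete with the following Python sketch. It integrates the equation on a flat, periodic grid, where the metric is Euclidean ($g_{ij} = \delta_{ij}$) and the Laplace–Beltrami operator reduces to the ordinary Laplacian. The sketch makes three simplifying assumptions:

- The electrochemical potential, in units of $k_BT$, is reduced to $\mu = \ln[\phi/(1-\phi)] + z\psi$: ideal mixing plus electrostatics only. The full expression, Eq. (7), also contains a curvature term, and additive constants are dropped because only gradients of $\mu$ drive lipid flux.
- The potential $\psi$ is fixed and hypothetical.
- Time stepping uses explicit Euler.

The function names and parameter values are illustrative, not the authors' implementation.

```python
import numpy as np


def laplacian(f, dx):
    """Five-point Laplacian with periodic boundaries.

    On a flat patch (g_ij = delta_ij) the Laplace-Beltrami operator of
    Eq. (1) reduces to this ordinary Laplacian.
    """
    return (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
            + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1) - 4.0 * f) / dx**2


def chemical_potential(phi, psi, z):
    """Simplified mu in units of k_B T (assumption): ideal mixing + z * psi.

    Additive constants are dropped because only gradients of mu drive flux.
    """
    return np.log(phi / (1.0 - phi)) + z * psi


def ch_step(phi, psi, z, d_lip, dt, dx):
    """One explicit Euler step of d(phi)/dt = D_lip * lap(mu)."""
    return phi + dt * d_lip * laplacian(chemical_potential(phi, psi, z), dx)


# Usage: a monovalent acidic lipid (z = -1) at 30% relaxing around a
# hypothetical positive potential bump, such as one under a basic protein.
n = 32
coords = np.arange(n) - n / 2
X, Y = np.meshgrid(coords, coords, indexing="ij")
psi = 2.0 * np.exp(-(X**2 + Y**2) / 20.0)
phi = np.full((n, n), 0.3)
for _ in range(2000):
    phi = ch_step(phi, psi, z=-1, d_lip=1.0, dt=0.01, dx=1.0)

print(phi[n // 2, n // 2], phi.mean())  # center enriched; mean conserved
```

The periodic Laplacian sums to zero over the grid, so the total amount of charged lipid is conserved, as required for a leaflet of fixed overall composition. The time step is chosen small enough for the explicit scheme to remain stable.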

The system's electrostatic (Coulomb) energy is given by

$$F_{el} = \frac{\varepsilon_0}{2}\left(\frac{k_BT}{e}\right)^2 \int_V \varepsilon_d\,\left(\nabla\psi(\mathbf{r})\right)^2\,dv \tag{3}$$

The symbols in Eq. (3) are as follows:

- $\psi = e\Phi/k_BT$ is the dimensionless (reduced) electrostatic potential, where $\Phi$ is the electrostatic potential, $k_B$ Boltzmann's constant, $T$ the temperature, and $e$ the elementary charge.
- $\varepsilon_0$ is the permittivity of free space.
- $\varepsilon_d$ is the dielectric constant within the volume element $dv$. We take $\varepsilon_d = 2.0$ inside the membrane and the protein and $\varepsilon_d = 80.0$ in the aqueous solution.

The integration in Eq. (3) is performed over the volume $V$ of the entire space. The contribution from the translational entropy of mobile (salt) ions in solution is

$$F_{IM} = k_BT \int_V \left[ n_+ \ln\frac{n_+}{n_0} + n_- \ln\frac{n_-}{n_0} - \left(n_+ + n_- - 2n_0\right) \right] dv \tag{4}$$


where $n_+$ and $n_-$ are the local concentrations of mobile cations (+) and anions (−), respectively, and $n_0$ is the bulk electrolyte concentration. The contribution from the 2D mixing entropy of the mobile lipid molecules within each leaflet is

$$F_{lip} = \frac{k_BT}{a}\int_{A_u}\left[\phi_u \ln\frac{\phi_u}{\phi_u^0} + (1-\phi_u)\ln\frac{1-\phi_u}{1-\phi_u^0}\right] dA_u + \frac{k_BT}{a}\int_{A_l}\left[\phi_l \ln\frac{\phi_l}{\phi_l^0} + (1-\phi_l)\ln\frac{1-\phi_l}{1-\phi_l^0}\right] dA_l \tag{5}$$

These integrals represent the entropic penalties associated with lipid demixing (segregation) on the upper and lower surfaces of the membrane, respectively. In Eq. (5), $\phi_u^0$ and $\phi_l^0$ denote the average compositions of charged lipids on the respective leaflets, and $a$ is the area per lipid head-group.

The membrane bending energy $F_b$ in Eq. (2) is the sum of the local elastic energies of the individual leaflets. These energies arise when each leaflet deforms away from its spontaneous curvature, as described by the Helfrich free energy:

$$F_b = \frac{\kappa_m}{2}\int_{A_u}\left(c_u - c_0^u(\phi_u)\right)^2 dA_u + \frac{\kappa_m}{2}\int_{A_l}\left(c_l - c_0^l(\phi_l)\right)^2 dA_l \tag{6}$$

Here, $c_u$ and $c_l$ are the local mean curvatures of the two membrane monolayers. $\kappa_m$ denotes the bending rigidity of a single monolayer, assumed here to be the same for each leaflet and for both lipid species. The spontaneous curvatures of the two leaflets, $c_0^u$ and $c_0^l$, are taken as the sums of the spontaneous curvatures of the pure lipid constituents, weighted by their local compositions. This approximation has been validated previously [36,48].

Functionally minimizing $F$ with respect to the compositional degree of freedom ($\delta F/\delta\phi = 0$) yields an expression for the local electrochemical potential of the charged lipid species on each leaflet [27,36]:

$$\mu = \mu_0 + k_BT \ln\frac{\phi\,(1-\phi_0)}{\phi_0\,(1-\phi)} + z\,k_BT\,\psi + a\,\kappa_m\left(c_0^n - c_0^c\right)\left(c - c_0(\phi)\right) \tag{7}$$

The symbols in Eq. (7) not defined above are:

- $\mu_0$, a constant reference potential;
- $\phi_0$, the average mole fraction of charged lipids;
- $z$, the valency of the charged lipid species;
- $c_0(\phi) = \phi\,c_0^c + (1-\phi)\,c_0^n$, the composition-weighted spontaneous curvature;
- $c_0^n$ and $c_0^c$, the spontaneous curvatures of the pure neutral and pure charged lipids, respectively.

Adding lipid species to the formulation is straightforward: the free energies are simply extended to include an additional compositional variable.

Finally, the short-range repulsive term $F_{rep}$ accounts for excluded volume and hydration forces, which appear when two surfaces (protein and membrane) come into close proximity [49–52]. This term is taken as a hard-wall potential. It restricts the minimal membrane–protein approach distance to 2 Å and excludes any configuration that violates this limit.
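The lipid-dependent terms of Eqs. (5)–(7) are straightforward to evaluate on a discretized leaflet. The Python sketch below assumes:

- a single leaflet on a uniform grid with area element `dA`;
- energies in units of $k_BT$, with $\mu_0 = 0$;
- the linear mixing rule $c_0(\phi) = \phi c_0^c + (1-\phi)c_0^n$.

The function names and parameter values are hypothetical. The usage lines check numerically that Eq. (7) is the derivative of the per-lipid mixing, electrostatic, and bending contributions.

```python
import numpy as np


def lipid_mixing_free_energy(phi, phi0, a, dA):
    """Eq. (5) for one leaflet, in units of k_B T."""
    integrand = phi * np.log(phi / phi0) + (1.0 - phi) * np.log((1.0 - phi) / (1.0 - phi0))
    return np.sum(integrand) * dA / a


def spontaneous_curvature(phi, c0_c, c0_n):
    """Composition-weighted spontaneous curvature c_0(phi)."""
    return phi * c0_c + (1.0 - phi) * c0_n


def bending_energy(c, phi, kappa_m, c0_c, c0_n, dA):
    """Eq. (6) for one leaflet; kappa_m in units of k_B T."""
    return 0.5 * kappa_m * np.sum((c - spontaneous_curvature(phi, c0_c, c0_n)) ** 2) * dA


def electrochemical_potential(phi, phi0, psi, z, c, a, kappa_m, c0_c, c0_n):
    """Eq. (7) in units of k_B T, with the reference constant mu_0 set to zero."""
    return (np.log(phi * (1.0 - phi0) / (phi0 * (1.0 - phi)))
            + z * psi
            + a * kappa_m * (c0_n - c0_c) * (c - spontaneous_curvature(phi, c0_c, c0_n)))


# Usage: compare Eq. (7) with a finite-difference derivative of the free
# energy per lipid (mixing + z*psi*phi + bending over one head-group area a).
# Illustrative values: lengths in Angstrom, a = 70 A^2, kappa_m = 20 k_B T.
phi0, psi, z, c, a, kappa_m = 0.04, 0.5, -4, 0.01, 70.0, 20.0
c0_c, c0_n = 1 / 70, 1 / 100


def free_energy_per_lipid(p):
    mix = p * np.log(p / phi0) + (1 - p) * np.log((1 - p) / (1 - phi0))
    bend = 0.5 * a * kappa_m * (c - spontaneous_curvature(p, c0_c, c0_n)) ** 2
    return mix + z * psi * p + bend


p, h = 0.1, 1e-6
numeric = (free_energy_per_lipid(p + h) - free_energy_per_lipid(p - h)) / (2 * h)
analytic = electrochemical_potential(p, phi0, psi, z, c, a, kappa_m, c0_c, c0_n)
print(numeric, analytic)  # the two values should agree closely
```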

2.3 Free energy minimization

The free energy functional in Eq. (2) must be minimized with respect to all relevant degrees of freedom in a self-consistent manner. In particular, functionally minimizing $F$ with respect to the mobile ion concentrations leads to the familiar nonlinear PB equation [26–36]:

$$\nabla^2\psi = \lambda_D^{-2}\,\sinh\psi \tag{8}$$

where $\lambda_D = \left(\varepsilon_0\varepsilon_w k_BT/2e^2 n_0\right)^{1/2}$ is the Debye length and $\varepsilon_w$ is the dielectric constant of water. This equation is commonly used to describe electrolyte solutions at the mean-field level. Solving Eq. (8) in the volume occupied by the aqueous solution yields the reduced electrostatic potential in space.

Note that $\psi$, in turn, is linked to the local lipid composition in each leaflet through the boundary condition on the leaflet surface, $\partial\Phi/\partial r = e z\phi/(a\varepsilon_0\varepsilon_d)$ with $\Phi = k_BT\psi/e$. Here $z$ and $a$ are the valency and the lateral area per head-group of the charged lipids, respectively.

The total free energy can then be minimized with respect to the local lipid compositions [53–55] by combining three ingredients:

- the chemical potential, Eq. (7);
- the nonlinear PB equation for the electrostatic potential, Eq. (8);
- the CH equation, Eq. (1), which describes the temporal evolution of the system from any arbitrary (nonequilibrium) state.

In practice, this is done by following the system to long times, until a steady state is reached. One tacit assumption behind this minimization strategy is that lipid diffusion is fast. Lipid compositions are assumed to adapt locally and continuously to the electrostatic potential emanating from the macromolecular adsorbate.

$F$ must also be at a minimum with respect to all possible membrane deformations. This minimization with respect to membrane shape must be carried out self-consistently together with the electrostatic and lipid mixing contributions [27,36]. This presents a challenge: in principle, one has to consider all possible variations in membrane geometry, and these shape deformations generally couple to other degrees of freedom. To overcome this problem, we have designed a novel combined scheme that efficiently accounts for bilayer deformations together with the electrostatic PB solution, self-consistently [27].
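Eq. (8) can be checked in its simplest geometry: a flat wall held at a fixed reduced potential $\psi_0$. For this case the Gouy–Chapman solution $\tanh(\psi/4) = \tanh(\psi_0/4)\,e^{-x/\lambda_D}$ is known.

The Python sketch below is a toy 1D finite-difference Newton solver. It is not the three-dimensional APBS-based solver used for the calculations reported here. It also evaluates $\lambda_D$ from its definition for a 1:1 salt. The function names, the fixed-potential (rather than surface-charge) boundary condition, and the grid parameters are assumptions made for illustration.

```python
import numpy as np

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
KB = 1.380649e-23         # Boltzmann constant, J/K
E = 1.602176634e-19       # elementary charge, C
NA = 6.02214076e23        # Avogadro constant, 1/mol


def debye_length(c_molar, eps_w=80.0, temperature=300.0):
    """lambda_D = sqrt(eps0 * eps_w * kB * T / (2 e^2 n0)) for a 1:1 salt, in meters."""
    n0 = c_molar * 1000.0 * NA  # number density of each ion species, 1/m^3
    return np.sqrt(EPS0 * eps_w * KB * temperature / (2.0 * E**2 * n0))


def solve_pb_1d(psi0, lam, length, n=401, tol=1e-10, max_iter=50):
    """Newton solve of psi'' = sinh(psi) / lam^2 with psi(0) = psi0, psi(length) = 0."""
    x = np.linspace(0.0, length, n)
    h = x[1] - x[0]
    psi = psi0 * np.exp(-x / lam)   # linearized (Debye-Hueckel) initial guess
    psi[-1] = 0.0                   # far-field boundary condition
    m = n - 2
    idx = np.arange(m)
    for _ in range(max_iter):
        inner = psi[1:-1]
        residual = (psi[2:] - 2.0 * inner + psi[:-2]) / h**2 - np.sinh(inner) / lam**2
        jac = np.zeros((m, m))      # tridiagonal Jacobian of the residual
        jac[idx, idx] = -2.0 / h**2 - np.cosh(inner) / lam**2
        jac[idx[:-1], idx[:-1] + 1] = 1.0 / h**2
        jac[idx[1:], idx[1:] - 1] = 1.0 / h**2
        delta = np.linalg.solve(jac, -residual)
        psi[1:-1] += delta
        if np.max(np.abs(delta)) < tol:
            break
    return x, psi


# Usage: lengths in units of lambda_D; compare with the Gouy-Chapman profile.
lam_nm = debye_length(0.1) * 1e9
x, psi = solve_pb_1d(psi0=2.0, lam=1.0, length=10.0)
gouy_chapman = 4.0 * np.arctanh(np.tanh(2.0 / 4.0) * np.exp(-x))
print(round(lam_nm, 2), np.max(np.abs(psi - gouy_chapman)))
```

Newton iteration is used because the $\sinh\psi$ term makes Eq. (8) nonlinear. Starting from the linearized (Debye–Hückel) profile, it converges in a few iterations for moderate surface potentials.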
Our combined shape–electrostatics scheme represents the membrane interface shape (contour) as a linear superposition of $N$ Gaussian functions. These functions serve as a basis set and are centered at different locations on the membrane surface. In this manner, the local membrane height $h(x,y)$ at any point $(x,y)$ is approximated by the sum [27]:

$$h(x,y) \approx \sum_{i=1}^{N} A_i \exp\left[-\left(\frac{(x-x_i^0)^2}{2\sigma_{xi}^2} + \frac{(y-y_i^0)^2}{2\sigma_{yi}^2}\right)\right] \tag{9}$$


where $A_i$ and $\sigma_i$ ($\sigma_{xi}$, $\sigma_{yi}$) denote the amplitude and widths of the $i$th Gaussian centered at $(x_i^0, y_i^0)$. Membrane deformations are then sampled by varying only the Gaussian amplitudes. The $A_i$ and $\sigma_i$ are coupled to the local membrane curvature through the relation $c = \nabla^2_{LB}h(x,y)$. This local curvature is itself linked to the local lipid composition, as described in Eq. (6). Representing the shape this way significantly reduces the dimensionality of the phase space that needs to be explored.

The minimization proceeds as follows:

- The amplitudes of different Gaussians are varied at random, and a trial move is accepted only if it lowers the free energy.
- To ensure self-consistency, at each trial move we also solve the appropriate PB equation to obtain the electrostatic potential for that particular membrane shape.
- To couple shape changes to lipid mixing, we alternate between steps that vary the membrane deformation and CH moves that spatially propagate the local lipid compositions.

This procedure allows the solution to converge to a (local) minimum of the total free energy.
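The shape-sampling step can be sketched in a few lines of Python. The example does three things:

- builds $h(x,y)$ from Eq. (9);
- estimates the curvature with the flat (small-slope) Laplacian, as a stand-in for $\nabla^2_{LB}$;
- performs random single-amplitude moves that are accepted only if they lower the energy, as described above.

The energy is a hypothetical quadratic stand-in (the squared distance to a known target shape). In the actual method, each trial would also require a PB solve and an evaluation of the full functional of Eq. (2). All names are illustrative.

```python
import numpy as np


def height(X, Y, amps, centers, sx, sy):
    """Eq. (9): superposition of anisotropic Gaussians; only amps are varied."""
    h = np.zeros_like(X)
    for A, (x0, y0), sxi, syi in zip(amps, centers, sx, sy):
        h += A * np.exp(-((X - x0) ** 2 / (2 * sxi**2) + (Y - y0) ** 2 / (2 * syi**2)))
    return h


def curvature(h, dx):
    """Small-slope estimate c ~ lap(h), used here in place of the LB operator."""
    d2x = np.gradient(np.gradient(h, dx, axis=0), dx, axis=0)
    d2y = np.gradient(np.gradient(h, dx, axis=1), dx, axis=1)
    return d2x + d2y


def minimize_amplitudes(energy, amps, step, n_trials, rng):
    """Random single-amplitude trial moves, accepted only if the energy decreases."""
    amps = np.array(amps, dtype=float)
    e = energy(amps)
    for _ in range(n_trials):
        trial = amps.copy()
        trial[rng.integers(len(amps))] += step * rng.normal()
        e_trial = energy(trial)
        if e_trial < e:
            amps, e = trial, e_trial
    return amps, e


# Usage: recover two known amplitudes with a hypothetical quadratic energy.
x = np.linspace(-10.0, 10.0, 201)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
centers, sx, sy = [(-5.0, 0.0), (5.0, 0.0)], [2.0, 2.0], [2.0, 2.0]
target = height(X, Y, [1.0, -0.5], centers, sx, sy)


def toy_energy(a):
    return np.sum((height(X, Y, a, centers, sx, sy) - target) ** 2) * dx**2


amps, e_min = minimize_amplitudes(toy_energy, [0.0, 0.0], 0.05, 1500,
                                  np.random.default_rng(0))
c_center = curvature(height(X, Y, [1.0], [(0.0, 0.0)], [2.0], [2.0]), dx)[100, 100]
print(amps, c_center)  # c_center is close to -(1/2**2 + 1/2**2) = -0.5
```

Because only the $N$ amplitudes move, each trial explores a space of dimension $N$ rather than one dimension per grid point. This is the reduction in dimensionality referred to above.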

2.4 Quantitative description of peripheral protein diffusion

To follow the diffusion of an adsorbed protein on a membrane interface as well, our method implements a dynamic Monte Carlo (DMC) scheme [56–62]. This procedure relies directly on the available free energies and does not require additional force calculations, which can be time-consuming.

In this scheme, the adsorbed protein diffuses on the membrane surface, tracing a stochastic dynamic trajectory. The path is generated in accordance with the fluctuation–dissipation theorem: the adsorbate's center of mass makes random displacements along each of the two directions of the membrane plane. Each displacement has size $\xi\sqrt{2\Delta t'D'}$, where:

- $\xi$ is a Gaussian random number with zero mean and unit variance;
- $\Delta t'$ is the dimensionless time step;
- $D'$ is the ratio of the protein to lipid diffusion constants [26].

The random trial move is then accepted or rejected according to a Metropolis-like criterion with the usual transition probability $W$:

$$W = \begin{cases} 1, & F_{new} \le F_{old} \\ e^{-(F_{new}-F_{old})/k_BT}, & F_{new} > F_{old} \end{cases} \tag{10}$$

Here, $F_{old}$ and $F_{new}$ are the adsorption free energies of the protein–membrane system in the "old" state (before the trial move) and the "new" state, respectively. If a trial move is accepted, three things happen: the macromolecule is advanced to the new position, the CH equations for the lipids are solved, and membrane deformations are sampled for the newly accepted position of the adsorbate. If the trial move is rejected, the protein remains at its previous position, and the same minimization step is carried out with respect to its previous (old) location.
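One DMC trial move, Eq. (10), can be sketched as follows in Python. The free-energy function is a hypothetical stand-in supplied by the caller. In the method described here, evaluating it at each position means re-solving the CH equations and re-sampling the membrane shape, which this sketch does not attempt. Lengths and times are in the same reduced units as $\Delta t'$ and $D'$, and free energies are in units of $k_BT$.

```python
import numpy as np


def dmc_step(pos, free_energy, dt_red, d_ratio, rng):
    """One dynamic Monte Carlo move of the adsorbate's center of mass.

    Trial displacement: xi * sqrt(2 * dt' * D') along each in-plane axis,
    with xi ~ N(0, 1). Acceptance follows Eq. (10), with F in units of k_B T.
    """
    trial = pos + rng.normal(size=2) * np.sqrt(2.0 * dt_red * d_ratio)
    f_old, f_new = free_energy(pos), free_energy(trial)
    if f_new <= f_old or rng.random() < np.exp(-(f_new - f_old)):
        return trial, True
    return pos, False


def flat(r):
    """Hypothetical flat free-energy landscape (no protein-membrane coupling)."""
    return 0.0


# Usage: on a flat landscape every move is accepted, so the mean squared
# displacement should approach 4 * D' * t' for diffusion in two dimensions.
rng = np.random.default_rng(1)
dt_red, d_ratio, n_steps, n_walkers = 0.01, 0.5, 20, 1000
sq_disp = []
for _ in range(n_walkers):
    pos = np.zeros(2)
    for _ in range(n_steps):
        pos, accepted = dmc_step(pos, flat, dt_red, d_ratio, rng)
    sq_disp.append(pos @ pos)
print(np.mean(sq_disp), 4 * d_ratio * dt_red * n_steps)  # both near 0.4
```

Recovering free diffusion on a flat landscape is a useful sanity check. Any slowing relative to $4D't'$ in the full model then reflects the free-energy coupling to lipids and membrane shape.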


2.5 Accounting for amphipathic helix insertions

An additional important force to consider is the insertion of protein amphipathic helices into the membrane. Such insertions often play a critical role in attracting proteins to lipid membranes and in generating membrane curvature (see below). We represent this effect implicitly by defining a membrane area (patch) of positive spontaneous curvature, that is, curving "toward" the adsorbing protein. The patch forms directly "under" the adsorbing protein in the interaction zone.

This phenomenological approach assumes that the inclusions perturb the bilayer symmetry and its elastic properties primarily around the area of helical insertion [63,64]. We account for insertion to different membrane depths by varying the spontaneous curvature assigned to this locally perturbed region. For each insertion depth, the bilayer is allowed to adjust its geometry locally. The corresponding steady-state deformations are found by minimizing a modified free energy functional. It contains an additional elastic term that accounts for the region of nonzero spontaneous curvature near the adsorbed protein.
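A minimal Python sketch of this implicit insertion model follows. It builds the spontaneous-curvature field for the inserted leaflet and evaluates the extra elastic term of the modified functional. It makes these illustrative assumptions:

- The protein footprint is rectangular; the real footprint would be the projection of the protein onto the membrane.
- The footprint is enlarged by a 20 Å margin, borrowed from the Figure 3 calculations.
- The footprint dimensions and the function names are hypothetical.

```python
import numpy as np


def insertion_c0_field(X, Y, half_length, half_width, margin, c0_insert, c0_background=0.0):
    """Spontaneous curvature field: c0_insert inside the enlarged footprint, background elsewhere."""
    inside = (np.abs(X) <= half_length + margin) & (np.abs(Y) <= half_width + margin)
    return np.where(inside, c0_insert, c0_background)


def leaflet_bending_energy(c, c0_field, kappa_m, dA):
    """(kappa_m / 2) * integral of (c - c0)^2 over the leaflet, kappa_m in k_B T."""
    return 0.5 * kappa_m * np.sum((c - c0_field) ** 2) * dA


# Usage (lengths in Angstrom, 1 A grid): the elastic cost of keeping the
# membrane flat (c = 0) grows as c0_insert**2, which in this model is what
# pushes the membrane to bend locally as the modeled insertion deepens.
x = np.arange(-100.0, 101.0, 1.0)
X, Y = np.meshgrid(x, x, indexing="ij")
flat = np.zeros_like(X)
for c0 in (0.0, 1 / 200, 1 / 100, 1 / 70):
    field = insertion_c0_field(X, Y, half_length=80.0, half_width=15.0,
                               margin=20.0, c0_insert=c0)
    print(c0, leaflet_bending_energy(flat, field, kappa_m=20.0, dA=1.0))
```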

3. MODEL APPLICATIONS

3.1 PIP2 and cellular signaling: mechanisms of membrane targeting

In this section we present several model applications concerning the role of polyvalent PIP2 lipids in the membrane targeting of peripheral proteins. This targeting mechanism is of special interest for the link between cellular signaling and lipid rafts. Among their multiple functions, PIP2 lipids are known to act as scaffolds that recruit proteins with specific binding domains to special cell membrane regions, namely rafts, during signal transduction [65]. Through this mechanism, PIP2 lipids are thought to regulate cell signaling precisely, both temporally and spatially.

Many of the architectural signaling proteins that use PIP2 lipids for membrane targeting contain structured domains, through which they bind specifically to polyvalent lipids. Examples include the C2 [4], PH [5], FERM [6], and BAR domains [7]. However, an apparently different type of targeting is realized by numerous other proteins that contain natively unstructured clusters of basic residues. Well-studied examples include the GAP43, GTPase K-Ras, and MARCKS proteins or peptides [66–70]. Below, we describe applications of our model to both structured and unstructured protein domains interacting with PIP2-containing membranes.

The use of positively charged residues for targeting may come as no surprise, as cellular plasma membranes typically contain ~20% anionic lipids. This affords a simple mechanism for protein–lipid binding that is essentially nonspecific, yet able to confine proteins to membrane interfaces. This simple molecular picture has been challenged by recent theoretical and experimental evidence. The major anionic lipid component in many cells, phosphatidylserine (PS) (or phosphatidylglycerol), might not be the major participant in peripheral protein binding. Instead, polyvalent lipids such as PIP2 are more likely to segregate close to peripherally adsorbed proteins [45,71–73].

Phosphoinositides typically constitute only around 1% of membrane composition [65]. Even so, these minority lipids can act at sites of regulation, at least partly through electrostatic association with peripheral and embedded proteins. Concentrating PIP2 at the site of protein adsorption is therefore a likely mechanism for local and specific recruitment. It has been suggested that segregated lipids can subsequently be released upon cellular changes, e.g., in Ca2+ concentration. This provides a way to control the amount of free PIP2 in the membrane. It is hence a mechanism for regulating PIP2, which participates in cellular signaling processes such as enzyme activation, endocytosis, and ion-channel activation [74].

To understand why electrostatic targeting could be achieved primarily by polyvalent lipids rather than by the more abundant monovalent ones, one must examine the forces underlying this protein–lipid interaction. Experiments have suggested that PIP2 preferentially segregates at sites of charged protein adsorption. This is reasonable because each multivalent lipid carries a larger charge. Multivalent lipids should therefore incur a smaller lipid demixing penalty per neutralized charge and yield a larger counterion release entropy [75–78] per segregated lipid. Recent theoretical studies predict that multivalent lipids should indeed segregate more than monovalent ones. They also predict significantly stronger binding free energies for such lipids, both to rigid macromolecules and to polyelectrolytes [45,72,73].
However, the adsorption problem is dynamic, and this raises the possibility that each adsorbing protein moves so quickly on the membrane that some lipids rarely manage to segregate at all. Conversely, lipids may rearrange so quickly around an adsorbing protein that the protein appears stationary to them, creating a transient "binding site". The result can be the dynamic assembly of a domain or "lipid raft" around a peripheral, adsorbed protein. Through its association with the protein, this raft could in turn impede the protein's motion in the membrane plane.

Our mean-field theory provides an opportunity to approach this problem quantitatively and to describe the combined kinetic effect of many lipid species interacting with peripheral proteins. As described in the next sections, the model shows that whether lipids are effectively sequestered depends sensitively on the composition of the membrane on which the adsorbed proteins diffuse. The model predictions also suggest that protein domains can selectively target PIP2-containing membrane regions through electrostatic interactions alone, without any additional energy source. However, we also predict that these proteins cannot deform spontaneously flat membrane patches, as required for physiological function, by electrostatics alone. For that they will use alternative mechanisms, such as amphipathic helix insertions. The role of electrostatics in this case appears to be stabilizing locally deformed membrane structures induced by amphipathic inclusions.


3.2 BAR–membrane interactions

As a first application example, we discuss the interactions of BAR domains with membranes. BAR domains have attracted great interest in the study of cell physiological processes [79–81]. They are known to dimerize into a banana-shaped molecular structure [82] that adsorbs to, and faces, lipid membranes with its concave surface (see Figure 1). The interactions of BAR domain dimers with the cell membrane are associated with curving of interface regions. These regions often contain a relatively high concentration of negatively charged lipids [7,83–85]. The functional role of such membrane remodeling by BARs is apparently to cluster and localize proteins in specialized membrane regions, which is likely to be important for signaling [8]. When present at high concentrations, BAR domains can tubulate and vesiculate lipid membranes both in vivo and in vitro [8,86]. Some BAR domains, termed N-BARs, have N-terminal regions that appear to fold into amphipathic helices upon BAR–membrane binding. These helices insert into the polar head-group region of lipid membranes [86–95].

3.2.1 Elements of membrane remodeling by BAR domains

In transforming a membrane that is spontaneously flat at equilibrium into a highly curved structure, BAR appears to take advantage of a special set of structural features (Figure 1).

First, electrostatic interactions can deform the membrane away from the flat bilayer plane, by pulling it toward, or away from, the protein. These interactions act between positively charged residues on BAR's concave surface and negative phospholipid head-groups. The same electrostatic interactions may also cause lateral sequestration of charged phospholipids near the protein [26,27,45–47,71–73]. This in-plane lipid demixing has been predicted to be particularly significant in membranes containing multivalent lipids, such as PIP2 [26,71,72]. Segregating such highly charged lipids (net head-group charge of about −4 at neutral pH [65]) would enhance the overall electrostatic interaction between BAR and the membrane. It would also lead to significant entropic gains, due to the release into the bulk solution of mobile counterions previously bound to each of the macromolecules (protein and membrane) [75,77].

Furthermore, these membrane deformations could lead to local asymmetry between the spontaneous curvatures of the two monolayers. The reason is that the head-group of PIP2 is larger than those of most monovalent lipids, such as PS, and of zwitterionic lipids such as phosphatidylcholine (PC). Such an asymmetry would be sufficient to produce a local positive curvature of the two bilayer leaflets toward the BAR [63,64,88]. Ultimately, sequestering charged lipids could lead to a new stable state in which bilayer bending forces favor membranes with local nonzero curvature.
Moreover, the mechanism coupling local lipid composition with membrane curvature may be complemented by a "local spontaneous curvature" mechanism [88]. In this mechanism, asymmetry between the spontaneous shapes of the two monolayers is achieved by inserting the amphipathic N-terminal helices of certain BAR domains into the lipid polar head-group region on one side of the membrane [7,88–95]. Inserting an amphipathic peptide into one leaflet of a flat membrane increases the local spontaneous curvature of that leaflet, because the monolayer bends locally where the helix is embedded [63,64,88]. The two monolayers then differ in spontaneous curvature, one with and the other without helical insertions. This difference establishes a new equilibrium state in which bilayer elastic forces support a locally curved membrane shape.

We apply the mean-field theory outlined in the previous section to BAR–membrane systems specifically to discern the roles of electrostatic interactions and amphipathic helix insertions in membrane remodeling by BAR domains. It does so by accounting for the coupling between electrostatically driven lipid sequestration and local membrane curvature. By bringing an energetic perspective to the problem, the model quantitatively answers the following critical questions: Can BAR-induced segregation of polyvalent PIP2 lipids cause substantial membrane deformation? And how might N-helix insertions complement this coupling?

3.2.2 Lipid demixing upon Amphiphysin BAR dimer adsorption is insufficient on its own to induce significant membrane curvature

Figure 2 shows a top view of the calculated lipid segregation and bilayer deformations in the equilibrium state of Amphiphysin BAR adsorbed on two binary mixtures, 30:70 PS/PC and 4:96 PIP2/PC. Both mixtures are compositionally symmetric across the two leaflets. In panel 2C, the BAR domain is outlined for clarity.

The free energy minimization reveals only weak membrane deformations at equilibrium under the adsorbing BAR, for both PS- and PIP2-containing membranes (Figure 2A–D). The largest deformation is found in the center of the patches, immediately under BAR's "arch". It reaches only ~3–4 Å above the height of the planar membrane, a value comparable to the expected thermal undulations of a membrane at T = 300 K with bending rigidity $\kappa = 20\,k_BT$ [97].

These insignificant membrane curvatures are accompanied by only minor segregation of charged lipids around the adsorbing protein in both bilayer mixtures (Figure 2E–H). Even in the regions of strongest aggregation (dark shades) on the BAR-facing leaflet of the PIP2-containing membrane, PIP2 levels rise to only ~1.3 times their 4% bulk value. These PIP2-enriched patches appear near the positively charged tips of the BAR domain. They form because of strong electrostatic interactions with the negatively charged PIP2 head-groups. At the same time, the PS concentration on the BAR-facing leaflet is minimally affected by the BAR domain.

Interestingly, lipid demixing on the lower monolayer of both membranes can be explained entirely by bending forces acting where the membrane is negatively curved. This curvature favors regions depleted of PS and PIP2 (lighter shades) because these lipids have zero or even positive spontaneous curvature.
The model binding free energies ($\Delta F$) allow a direct comparison with BAR binding onto flat PS/PC and PIP2/PC membranes of the same homogeneous compositions. Relative to those flat membranes, lipid demixing and membrane deformations lower $\Delta F$ by 1.9 $k_BT$ for the BAR/PS/PC complex and by 1.7 $k_BT$ for the BAR/PIP2/PC complex. Nevertheless, lipid segregation combined with the elastic forces within the membrane appears insufficient to produce the compositional asymmetry between bilayer leaflets needed to drive significant bending. Consequently, at steady state, the membrane remains nearly flat, within fluctuations, upon BAR adsorption.

[Figure 2 panels A–H: columns show upper-leaflet contours, lower-leaflet contours, upper-leaflet PIP2 or PS fractions, and lower-leaflet PIP2 or PS fractions; rows correspond to the 4:96 PIP2/PC (upper) and 30:70 PS/PC (lower) membranes. Color scales: height above planar membrane, 0–15 Å; normalized lipid fraction $\phi^*_\alpha$, 0.5–1.5.]

Figure 2 Adsorption of the Amphiphysin BAR domain on compositionally symmetric binary PS/PC (lower panels) and PIP2/PC (upper panels) lipid membranes. The membrane patches have a bending modulus of $\kappa = 20\,k_BT$ and contain PS and PIP2 lipid fractions of 0.3 and 0.04, respectively. PS and PC lipids are described by spontaneous curvatures $c_0^{PS}$ = 1/144 Å⁻¹ and $c_0^{PC}$ = 1/100 Å⁻¹ [88]. The spontaneous curvature of PIP2 has not been measured experimentally. We assume here $c_0^{PIP2}$ = 1/70 Å⁻¹, in light of the substantial difference in head-group size between PIP2 and PS lipids. The BAR dimer orientation for both calculations is depicted in panel C as the projection of the BAR onto the membrane plane. Panels A–D show the equilibrium shapes of the PIP2/PC and PS/PC membranes, with contours for the local heights of the upper and lower leaflets. Panels E–H depict the steady-state lipid distributions on the upper and lower leaflets of both membranes ($\phi^*_\alpha$, $\alpha$ = PS, PIP2). Shades in panels E–H represent ratios of local to average lipid fraction. For all electrostatic calculations, we used a modified version of the APBS 0.4.0 software [96].

3.2.3 N-helix insertions can potentially enhance membrane deformations

To explore whether inserting the BAR dimer's N-helices can enhance membrane curvature, we examined various N-helix penetration depths; the results are illustrated in Figure 3. Deeper insertion of the N-helices, represented in the model by a larger local spontaneous curvature, produces larger membrane deformations.

[Figure 3 panels A–L: columns correspond to different local spontaneous curvatures ($c_0$ = 0, 1/200, 1/100, and 1/70 Å⁻¹); rows correspond to membrane bending rigidities κ = 5, 10, and 20 $k_BT$. Color scale: height above planar membrane, 0–15 Å.]

Figure 3 Steady-state shapes upon binding of the Amphiphysin N-BAR domain dimer. The plots show upper-leaflet contours of membranes with different bending rigidities and with N-helix insertions of various depths. The membrane patches have an average surface charge density of 0.4 e/nm² on both leaflets, corresponding to a PS lipid fraction of 0.3. The orientation of the BAR domain is the same as in Figure 2. For all systems, a domain of nonzero spontaneous curvature $c_0$ was defined inside the BAR projection area shown in panel L, extending 20 Å beyond the projected zone. Values of $c_0$ in the range 0–1/70 Å⁻¹ were used.

Quantitative analysis of the binding energies for the membrane patches in Figure 3 leads the model to predict that a single adsorbing Amphiphysin N-BAR dimer will stabilize membrane patches with an inherent propensity for high curvature. This propensity is reflected in the tendency of the lipids to create local distortions that closely match the curvature of the BAR dimer itself.

Additional calculations varied the concentration of charged lipids. Increasing the PS lipid fraction from 0.3 to 0.5 strengthened BAR binding substantially, by ca. 6 $k_BT$ in adsorption free energy. It produced no noticeable change in the equilibrium membrane deformations shown in Figure 3. Taken together, the model results indicate that N-helix insertions play a critical mechanistic role in locally perturbing and curving the membrane. The electrostatic interaction with the BAR dimer then further stabilizes this curvature.

Figure 3 also illustrates that our method captures the symmetry breaking upon N-BAR dimer adsorption onto a membrane, which has been observed experimentally and reproduced theoretically. Notably, our approach does so through the self-consistent free energy minimization procedure, without a priori knowledge of any BAR-induced spontaneous curvature fields. This distinguishes our model from alternative mesoscopic approaches that assume BAR-generated nonisotropic curvature fields [25,98].

3.2.4 Membrane tubulation and vesiculation by arrays of BAR domains

An open question is how the local deformations introduced by a single BAR translate into the global changes in membrane shape observed upon binding of high concentrations of BARs [22,99–101]. Our calculations predict that, because of the interplay between electrostatic and elastic forces, a single BAR dimer can substantially curve the bilayer region under it. At the same time, the membrane remains flat, within fluctuations, beyond this interaction zone.

It follows that the high-curvature area must be surrounded by a narrow region, or "rim". In the rim, the sign of the local membrane curvature changes from positive (under the BAR) to negative (outside the interaction zone), eventually decaying to zero. Although electrostatically advantageous, such a rim is opposed by bending forces within the membrane, because lipids in the rim zone pay an elastic penalty for bending away from the spontaneous curvature $c_0$. The larger the membrane deformation, the larger the expected free energy penalty incurred in the rim.

Consequently, binding of an additional BAR will be most favorable energetically if it both lowers the electrostatic free energy and alleviates the membrane stress introduced by the BAR already adsorbed. The optimal way to achieve this with multiple BARs is clearly not a simple additive superposition of single-BAR effects; it must involve collective properties [22,99–101]. Bridging the gap between our calculations and experiments showing membrane tubulation and vesiculation by arrays of BAR domains is one of the future challenges in BAR/membrane modeling.
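The need for a rim can be made explicit with a single Gaussian basis function of Eq. (9), in the small-slope limit where $\nabla^2_{LB} \approx \nabla^2$. This illustrative calculation is not part of the authors' analysis. For an isotropic bump of amplitude $A$ and width $\sigma$,

$$h(r) = A\,e^{-r^2/2\sigma^2} \quad\Longrightarrow\quad \nabla^2 h = \frac{1}{r}\frac{d}{dr}\left(r\,\frac{dh}{dr}\right) = \frac{A}{\sigma^2}\left(\frac{r^2}{\sigma^2} - 2\right)e^{-r^2/2\sigma^2},$$

which changes sign at $r = \sqrt{2}\,\sigma$. More generally, consider any deformation that decays to a flat membrane far from the protein. For such a deformation the divergence theorem gives $\int \nabla^2 h\, dA = \oint \nabla h \cdot \hat{\mathbf{n}}\, dl \to 0$. Curvature of one sign under the protein must therefore be balanced by curvature of the opposite sign nearby.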

3.3 Adsorption of natively unstructured protein domains onto lipid membranes

As an illustration of membrane binding of natively unstructured protein domains, we describe the predictions from our calculations pertaining to the adsorption of basic lysine-13 (Lys13) peptides onto mixed lipid membranes. Basic polypeptides, such as Lys13, are well-studied simple yet realistic models to describe membrane anchoring of unstructured domains such as MARCKS [46,71]. We first present results for stationary adsorbed Lys13 peptides in the presence of diffusing lipids. Then, to consider the effect of protein mobility, we discuss how our predictions change if the adsorbate is also allowed to diffuse.

3.3.1 Sequestration of PIP2 lipids by adsorbing basic polypeptides

Figure 4 shows the charged lipid organization for a ternary 74:25:1 PC/PS/PIP2 mixture (Figure 4a and b) and a binary 71:29 PC/PS mixture (Figure 4c) upon

Modeling Signaling Processes across Cellular Membranes Using a Mesoscopic Approach

[Figure 4 panels: (a) PIP2 in PC/PS/PIP2; (b) PS in PC/PS/PIP2; (c) PS in PC/PS. Color scale: normalized local lipid fraction φ*PIP2 or φ*PS, from 0 to 4.5.]

Figure 4 Adsorption of the lysine-13 polypeptide onto a ternary phosphatidylcholine (PC)/phosphatidylserine (PS)/PIP2 lipid membrane with 74:25:1 composition (panels a and b), and onto a binary PC/PS lipid membrane with 71:29 composition (panel c). (a) Normalized local fraction of PIP2 lipids in the ternary system. (b) Local PS lipid fraction in the ternary system. (c) Local PS lipid fraction in the binary mixture. All plots are shown for t = 0.5 ms after the beginning of propagation. For these calculations, lysine-13 was placed near the membrane such that the minimum distance between the van der Waals radii of lysine-13 and membrane atoms was 3 Å, and the peptide was oriented with its major (long) axis parallel to the bilayer plane.

Lys13 binding. Both lipid compositions have the same surface charge density. The snapshots are taken after 500 ns, starting from a completely homogeneous lipid distribution; by this time the lipid compositions have reached steady state. From Figure 4a we learn that the fraction of PIP2 lipids increases up to 4.5-fold near the adsorbed Lys13 side chains, where the positive charge is greatest. This area is surrounded by a region with lower PIP2 content, showing only a 2.5–3-fold increase in the multivalent lipid fraction. Because the peptide backbone carries both positive and negative charges, PIP2 content along the Lys13 backbone changes only slightly relative to the bulk concentration. In contrast, Figure 4b reveals almost no sequestration of PS by the peptide. The largest increase in PS lipid fraction is only 1.5-fold and occurs, as expected, along the Lys13 side chains. For comparison, Figure 4c shows that even in PIP2-free membranes, PS segregation around Lys13 is marginal. Thus, in agreement with other theoretical predictions [45,46], our model indicates that an adsorbed stationary basic peptide will sequester primarily PIP2 lipids and will only very weakly sequester PS lipids.
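The statement that both mixtures have the same surface charge density can be checked with a short calculation. This calculation assumes that each PIP2 carries a charge of −4e, each PS carries −1e, and PC is net neutral. It further assumes that all lipids occupy the same area a. None of these values is stated in this section. Under these assumptions the mean surface charge densities are:

```latex
\sigma_{\mathrm{ternary}} = \frac{e}{a}\left[\,0.25\,(-1) + 0.01\,(-4)\,\right] = -0.29\,\frac{e}{a},
\qquad
\sigma_{\mathrm{binary}} = \frac{e}{a}\left[\,0.29\,(-1)\,\right] = -0.29\,\frac{e}{a}
```

In charge terms, the 4 percentage points of PS present in the binary mixture but absent from the ternary one are exactly replaced by 1% of tetravalent PIP2. The two membranes therefore differ in lipid valence, not in net charge. This is why the comparison in Figure 4 isolates the effect of valence.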

3.3.2 Diffusion of peripheral proteins on lipid membranes

An intriguing question that now arises is how the extent of PIP2 sequestration by the Lys13 peptide described above changes in a more realistic


George Khelashvili and Daniel Harries

scenario, in which the adsorbate is also allowed to diffuse on the membrane surface. In particular, we focus on how acidic lipids in the membrane affect macromolecule diffusion rates, and how different lipid species influence the apparent protein diffusion rate. To address these questions, we follow a simplified spherical macroion that is allowed to move concomitantly with lipid diffusion. To this end, we extend our model to include protein diffusion and perform CH-DMC calculations (see Section 2.4). We studied the same mixed membranes considered in Figure 4, focusing on two typical cases. In the first case, the diffusion constant of the model protein is much larger than that of lipids in the unperturbed (bare) membrane, with a ratio D0 = 10 between the two. In the second case, the two diffusion constants are comparable, with D0 = 2 (see Eq. 10). As we show, these two scenarios lead to different lipid and protein diffusion characteristics.

3.3.3 Modeling a fast protein diffusing over PIP2-containing versus PIP2-depleted membranes

Following the time evolution of the system, depicted in Figure 5c, reveals significant local PIP2 segregation around the fast-diffusing protein as it moves over a ternary PC/PS/PIP2 membrane. Quantitative analysis of protein diffusion rates predicts a prominent concomitant retardation of the macroion's movement. Because of lipid rearrangement, the adsorbate's diffusion becomes confined, for a limited time, to a PIP2-rich area. However, because the model protein is highly mobile compared with the lipids, the adsorbate occasionally and temporarily escapes, leaving behind the multivalent lipid cloud that had segregated around it. This free diffusion of the macroion does not last long, because PIP2 lipids quickly segregate again around the new protein position. This renewed segregation is driven by the large forces that the electrostatic field emanating from the adsorbate exerts on the PIP2 lipids. Essentially, the macroion diffuses and drags PIP2 lipids along with it, while the segregated PIP2 lipids retard the free diffusion of the protein. In contrast to the strong PIP2 segregation, we find that PS segregation in the ternary mixtures is very weak, in accordance with our findings for the stationary peptide (Figure 4b). We compare this diffusion process with that of the same rapid model protein diffusing on a binary, PIP2-depleted membrane that contains only monovalent acidic (PS) lipids (Figure 5a). Clearly, acidic (PS) lipids segregate around the macroion much less than in the ternary mixture, resulting in low energetic barriers to adsorbate motion. Hence, the diffusion of the macroion is less restricted here than in the ternary mixture.
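The trapping described above can be caricatured with a Metropolis dynamic Monte Carlo random walk. The sketch below is not the CH-DMC scheme of Section 2.4. It is a deliberately simplified, hypothetical stand-in with two substitutions. First, the segregated PIP2 cloud is replaced by a static circular energy well of depth `depth` (in kT) centered on the starting site. Second, the diffusion-constant ratio D0 is represented by `d_ratio`, the number of hop attempts the macroion makes per lipid time step. All function names and parameter values are illustrative assumptions.

```python
import math
import random

# Unit hop vectors on a square lattice (lattice spacing = 1).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def well_energy(x: int, y: int, depth: float, radius: int) -> float:
    """Energy (kT) at site (x, y): -depth inside a fixed disk around the origin, 0 outside.

    The static disk is a crude stand-in for a segregated PIP2 patch.
    """
    return -depth if x * x + y * y <= radius * radius else 0.0


def walk(rng: random.Random, n_hops: int, depth: float, radius: int) -> int:
    """One Metropolis walk starting at the origin; returns the final squared displacement."""
    x = y = 0
    for _ in range(n_hops):
        dx, dy = rng.choice(MOVES)
        nx, ny = x + dx, y + dy
        d_e = well_energy(nx, ny, depth, radius) - well_energy(x, y, depth, radius)
        # Metropolis rule: downhill or level hops are always accepted,
        # uphill hops with probability exp(-dE / kT).
        if d_e <= 0.0 or rng.random() < math.exp(-d_e):
            x, y = nx, ny
    return x * x + y * y


def mean_squared_displacement(n_lipid_steps: int, d_ratio: int, depth: float = 0.0,
                              radius: int = 3, n_walkers: int = 400, seed: int = 1) -> float:
    """Average squared displacement after n_lipid_steps lipid time steps.

    d_ratio (an integer) plays the role of D0: the macroion attempts d_ratio hops
    per lipid time step, so on a flat landscape the MSD is ~ d_ratio * n_lipid_steps.
    """
    rng = random.Random(seed)
    n_hops = n_lipid_steps * d_ratio
    total = sum(walk(rng, n_hops, depth, radius) for _ in range(n_walkers))
    return total / n_walkers


if __name__ == "__main__":
    for d_ratio in (10, 2):
        free = mean_squared_displacement(50, d_ratio)
        trapped = mean_squared_displacement(50, d_ratio, depth=8.0)
        print(f"D0 = {d_ratio}: flat MSD = {free:.1f}, MSD with 8 kT well = {trapped:.1f}")
```

On a flat landscape the mean squared displacement grows in proportion to `d_ratio`. A well a few kT deep instead keeps the walker near its starting patch, and shallower wells allow occasional escapes like those described for the fast protein. Unlike the model in the text, the well in this sketch does not follow the macroion. The sketch therefore captures only the trapping, not the dragging of lipids along with the protein.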

3.3.4 Slow protein diffusing over PIP2-containing versus PIP2-depleted membranes

Diffusion of a slower model protein, D0 = 2, on the same binary and ternary membranes (Figure 5b and d, respectively) shows qualitatively similar behavior to that observed for D0 = 10. However, due to the lower mobility of the macroion, the acidic lipids have more time to effectively segregate near the adsorbate, and


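[Figure 5 panels: (a) binary, D0 = 10; (b) binary, D0 = 2; (c) ternary, D0 = 10; (d) ternary, D0 = 2. Scale bar: 20 Å. Color scale: normalized local lipid fraction φ*PIP2 (ternary) or φ*PS (binary), from 0 to 4.5.]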

Figure 5 Diffusion of a charged spherical macroion of radius 10 Å and uniform surface charge density of 1e per 93 Å² on mixed membranes. The panels show the local surface charge densities after 0.6 ms of simulation (shades) and the entire macroion trajectory over that time (connected lines). The panels correspond to the binary (71:29 PC/PS) mixture with D0 = 10 (a), the ternary (74:25:1 PC/PS/PIP2) mixture with D0 = 10 (c), the binary (PC/PS) mixture with D0 = 2 (b), and the ternary (PC/PS/PIP2) mixture with D0 = 2 (d). The dashed circle in each panel represents the projected size of the macroion, and the arrow indicates the starting position of the macroion center of mass. For clarity, the figures zoom in on the membrane surface region explored by the macroion, and a 20 Å scale bar is shown for reference.

therefore segregate more strongly. As a result, most of the macroion moves are restricted to the acidic lipid-rich patch that forms close to the protein. This is particularly noticeable for the ternary system, where the macroion practically never escapes beyond the circular patch formed by PIP2 lipids, but rather diffuses together with it and within it. For the fast protein on ternary mixtures we observed the repeated creation and destruction of macroion/PIP2 "binding sites". For the slower protein, by contrast, this lipid–protein "complex" stays intact for the entire trajectory. In this sense, PIP2 lipids are always associated with the macroion as it diffuses on the membrane.
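For scale, the total charge of the model macroion in Figure 5 follows from its radius and surface charge density. The calculation below assumes that the stated density of 1e per 93 Å² applies over the whole sphere:

```latex
\frac{Q}{e} = \frac{4\pi R^{2}}{93\ \text{\AA}^{2}}
            = \frac{4\pi\,(10\ \text{\AA})^{2}}{93\ \text{\AA}^{2}}
            \approx \frac{1256.6}{93} \approx 13.5
```

This magnitude is comparable to the 13 lysine charges of the Lys13 peptide considered in Figure 4.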



3.3.5 Implications for the role of PIP2 in anchoring proteins to specialized membrane domains

Our results suggest that PIP2 lipids can diffuse in concert with adsorbed molecules even when the adsorbate diffuses much faster than the lipids. In contrast, monovalent PS lipids segregate only weakly, so macromolecule and lipid diffusion remain largely uncorrelated. This difference between lipid species arises because, in the presence of the protein's electric field, PIP2 lipids are much more mobile than PS lipids. Their higher charge gives them a larger electrostatic contribution to the chemical potential. Predictions from our model have interesting implications for the role of PIP2 lipids in anchoring natively unstructured domains (and other peripherally bound proteins) to lipid membranes. To carry out their function, peripheral proteins must often remain localized in certain membrane regions (say, in rafts) for some time. This requires a mechanism that slows their diffusion across the membrane in the region where they must act. In agreement with recent experimental observations [70,71], our model predicts that PIP2 lipids segregate around the diffusing charged protein. This segregation keeps these lipids effectively "bound" to the protein vicinity and retards the protein's diffusive motion.
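A Boltzmann estimate of the local lipid composition near a positively charged adsorbate shows why valence matters so much. The sketch below is not the self-consistent model of this chapter. It ignores membrane elasticity, mixing nonidealities, and the feedback of lipid redistribution on the potential. It assumes a PIP2 valence of −4 and treats the reduced potential ψ = eφ/kT in the adsorption zone as a fixed, illustrative input. Each species is weighted by exp(−zψ), and the weights are renormalized so that the local fractions still sum to one. Under these assumptions the PIP2-to-PS enrichment ratio equals exp(3ψ) regardless of composition, so even a modest potential strongly favors PIP2.

```python
import math

# Assumed lipid valences in units of e (PIP2 = -4 is an assumption; PC is zwitterionic).
VALENCE = {"PC": 0, "PS": -1, "PIP2": -4}


def local_fractions(bulk: dict, psi: float, valence: dict = VALENCE) -> dict:
    """Local lipid mole fractions at reduced potential psi = e*phi/kT.

    Each species is weighted by its bulk fraction times exp(-z_i * psi), and the
    weights are renormalized so that the local fractions sum to 1 (fixed lipid sites).
    """
    weights = {k: bulk[k] * math.exp(-valence[k] * psi) for k in bulk}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def enrichment(bulk: dict, psi: float, valence: dict = VALENCE) -> dict:
    """Ratio of local to bulk fraction for each lipid species."""
    local = local_fractions(bulk, psi, valence)
    return {k: local[k] / bulk[k] for k in bulk}


if __name__ == "__main__":
    ternary = {"PC": 0.74, "PS": 0.25, "PIP2": 0.01}
    for psi in (0.0, 0.25, 0.5):
        ratios = enrichment(ternary, psi)
        print(psi, {k: round(v, 2) for k, v in ratios.items()})
```

Because the normalization cancels in the ratio, PIP2 outcompetes PS by a factor of exp((4 − 1)ψ) at any composition. This factor is a simple way to see why a single multivalent lipid species can dominate the response to the protein's field.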

4. FUTURE PROSPECTS

We have presented a new computational modeling approach to describe membrane-associated interactions that evolve at mesoscales and over long times. Through illustrative examples, we showed how this self-consistent strategy can elucidate some of the fundamental mechanistic aspects of cellular signaling. Specifically, we have shown that the method can help discern the role of polyvalent lipids in recruiting and confining signaling proteins to the specialized membrane regions where they carry out their physiological function. It can also illuminate the mechanisms by which signaling motifs such as BAR domains remodel membranes. To develop a powerful and complete methodology, we are now pursuing several key enhancements and extensions of the model. These are intended to enable a quantitative description of large-scale membrane-associated processes set in motion by complex signaling machinery. Two major improvements are especially noteworthy in the context of lipid rafts and cell signaling. The first involves adding quantitative details of TM protein–membrane interactions to the model and connecting these degrees of freedom with current knowledge of the interactions between peripheral proteins and membranes. The second is to enable a description of phase-separating elastic lipid membranes, in order to capture the formation and dynamics of membrane rafts. These two enhancements are closely related through the well-known impact of cholesterol both on raft formation and on the function of raft-associated TM proteins, such as GPCRs. Thus, with the intended modifications, the model should be able to elucidate the role of cholesterol and



other synergistic raft modulators in the function and organization of signaling proteins. The extended methodology should then complement other simulation techniques, covering temporal and spatial regimes that are not readily accessible to existing techniques. The main advantage of this type of approach (and a potential reason for its wider applicability) is that it adds information on longer timescales and can reach the steady state of the system. The approach also allows protein–membrane interactions to be discussed in terms of model free energies. Together, these features mean that the new methodology can, for the first time, begin to provide a quantitative view of the large-scale interactions that act across the plasma membrane interface during cellular signaling.

ACKNOWLEDGMENTS

We thank Nathan Baker, Michael Holst, and Todd Dolinsky for their advice on modifying the APBS code, as well as Harel Weinstein, Jim Sethna, Adrian Parsegian, David Andelman, and Brian Todd for valuable comments on the original manuscripts describing our method. GK is supported by grants from the National Institutes of Health (P01 DA012408 and P01 DA012923). DH acknowledges support from the Israel Science Foundation (ISF Grant No. 1011/07) as well as an allocation for a high-performance computer cluster facility (ISF Grant No. 1012/07). The Fritz Haber research center is supported by the Minerva Foundation, Munich, Germany. Computational resources of the David A. Cofrin Center for Biomedical Information in the HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine are gratefully acknowledged.

REFERENCES

1. McIntosh, T.J., Simon, S.A. Roles of bilayer material properties in function and distribution of membrane proteins. Annu. Rev. Biophys. Biomol. Struct. 2006, 35, 177–98.
2. Lingwood, D., Simons, K. Lipid rafts as a membrane-organizing principle. Science 2010, 327, 46–50.
3. Pontier, S.M., Percherancier, Y., Galandrin, S., Breit, A., Gales, C., Bouvier, M. Cholesterol-dependent separation of the beta2-adrenergic receptor from its partners determines signaling efficacy: Insight into nanoscale organization of signal transduction. J. Biol. Chem. 2008, 283, 24659–72.
4. Hurley, J.H., Misra, S. Signaling and subcellular targeting by membrane binding domains. Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 49–79.
5. Lemmon, M.A., Ferguson, K.M. Signal-dependent membrane targeting by pleckstrin homology (PH) domains. Biochem. J. 2000, 350, 1–18.
6. Hamada, K., Shimizu, T., Matsui, T., Tsukita, S., Hakoshima, T. Structural basis of the membrane-targeting and unmasking mechanisms of the radixin FERM domain. EMBO J. 2000, 19, 4449–62.
7. Peter, B.J., Kent, H.M., Mills, I.G., Vallis, Y., Butler, P.J.G., Evans, P.R., McMahon, H.T. BAR domains as sensors of membrane curvature: The amphiphysin BAR structure. Science 2004, 303, 495–9.
8. Madsen, K.L., Eriksen, J., Milan-Lobo, L., Han, D.S., Niv, M.Y., Ammendrup-Johnsen, I., Henriksen, U., Bhatia, V.K., Stamou, D., Sitte, H.H., McMahon, H.T., Weinstein, H., Gether, U. Membrane localization is critical for activation of the PICK1 BAR (bin/amphiphysin/rvs) domain. Traffic 2008, 9, 1327–43.
9. Simons, K., Ikonen, E. Functional rafts in cell membranes. Nature 1997, 387, 569–72.
10. Brown, D.A. Lipid rafts, detergent-resistant membranes, and raft targeting signals. Physiology 2006, 21, 430–9.
11. Veatch, S.L., Keller, S.L. Seeing spots: Complex phase behavior in simple membranes. Biochim. Biophys. Acta 2005, 1746, 172–85.
12. Edidin, M. The state of lipid rafts: From model membranes to cells. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 257–83.



13. Munro, S. Lipid rafts: Elusive or illusive? Cell 2003, 115, 377–88.
14. Simons, K., Vaz, W.L. Model systems, lipid rafts, and cell membranes. Annu. Rev. Biophys. Biomol. Struct. 2004, 33, 269–95.
15. McMullen, T.P.W., Lewis, R.N.A.H., McElhaney, R.N. Cholesterol–phospholipid interactions, the liquid-ordered phase and lipid rafts in model and biological membranes. Curr. Opin. Colloid Interface Sci. 2004, 8, 459–68.
16. Lichtenberg, D., Goñi, F.M., Heerklotz, H. Detergent-resistant membranes should not be identified with membrane rafts. Trends Biochem. Sci. 2005, 30, 430–6.
17. Pandit, S.A., Khelashvili, G., Jakobsson, E., Grama, A., Scott, H.L. Lateral organization in lipid-cholesterol mixed bilayers. Biophys. J. 2007, 92, 440–7.
18. Khelashvili, G., Pandit, S.A., Scott, H.L. Self-consistent mean-field model based on molecular dynamics: Application to lipid-cholesterol bilayers. J. Chem. Phys. 2005, 123, 034910.
19. Khelashvili, G., Scott, H.L. Combined Monte Carlo and molecular dynamics simulation of hydrated 18:0 sphingomyelin-cholesterol lipid bilayers. J. Chem. Phys. 2004, 120, 9841–7.
20. Varma, R., Mayor, S. GPI-anchored proteins are organized in submicron domains at the cell surface. Nature 1998, 394, 798–801.
21. Friedrichson, T., Kurzchalia, T.V. Microdomains of GPI-anchored proteins in living cells revealed by crosslinking. Nature 1998, 394, 801–5.
22. Arkhipov, A., Yin, Y., Schulten, K. Four-scale description of membrane sculpting by BAR domains. Biophys. J. 2008, 95, 2806–21.
23. Marrink, S.J., Risselada, H.J., Yefimov, S., Tieleman, D.P., de Vries, A.H. The MARTINI force field: Coarse grained model for biomolecular simulations. J. Phys. Chem. B 2007, 111, 7812–24.
24. Lu, L., Voth, G.A. Systematic coarse-graining of a multicomponent lipid bilayer. J. Phys. Chem. B 2009, 113, 1501–10.
25. Ayton, G.S., Blood, P.D., Voth, G.A. Membrane remodeling from N-BAR domain interactions: Insights from multi-scale simulation. Biophys. J. 2007, 92, 3595–602.
26. Khelashvili, G., Weinstein, H., Harries, D. Protein diffusion on charged membranes: A dynamic mean-field model describes time evolution and lipid reorganization. Biophys. J. 2008, 94, 2580–97.
27. Khelashvili, G., Harries, D., Weinstein, H. Modeling membrane deformations and lipid demixing upon protein-membrane interaction: The BAR dimer adsorption. Biophys. J. 2009, 97, 1626–35.
28. Chaikin, P.M., Lubensky, T.C. Principles of Condensed Matter Physics, Cambridge University Press, Cambridge, 2000.
29. Helfrich, W. Elastic properties of lipid bilayers: Theory and possible experiments. Z. Naturforsch. 1973, 28c, 693–703.
30. Sharp, K.A., Honig, B. Electrostatic interactions in macromolecules: Theory and applications. Annu. Rev. Biophys. Biophys. Chem. 1990, 19, 301–32.
31. Andelman, D. In Handbook of Biological Physics (eds A.J. Hoff), Vol. 1B, Elsevier Science B.V., Amsterdam, 1995, pp. 603–42.
32. Reiner, E.S., Radke, C.J. Variational approach to the electrostatic free energy in charged colloidal suspensions: General theory for open systems. J. Chem. Soc. Faraday Trans. 1990, 86, 3901–12.
33. Honig, B., Nicholls, A. Classical electrostatics in biology and chemistry. Science 1995, 268, 1144–9.
34. Borukhov, I., Andelman, D., Orland, H. Steric effects in electrolytes: A modified Poisson-Boltzmann equation. Phys. Rev. Lett. 1997, 79, 435–8.
35. Fogolari, F., Briggs, J.M. On the variational approach to the Poisson-Boltzmann free energies. Chem. Phys. Lett. 1997, 281, 135–9.
36. Harries, D., May, S., Ben-Shaul, A. Curvature and charge modulations in lamellar DNA-lipid complexes. J. Phys. Chem. B 2003, 107, 3624–30.
37. Chernomordik, L.V., Kozlov, M.M. Protein-lipid interplay in fusion and fission of biological membranes. Annu. Rev. Biochem. 2003, 72, 175–207.
38. Jahn, R., Grubmüller, H. Membrane fusion. Curr. Opin. Cell Biol. 2002, 14, 488–95.
39. Kozlovsky, Y., Efrat, A., Siegel, D.A., Kozlov, M.M. Stalk phase formation: Effects on dehydration and saddle splay modulus. Biophys. J. 2004, 87, 2508–21.
40. Kozlovsky, Y., Kozlov, M.M. Membrane fission: Model for intermediate structures. Biophys. J. 2003, 85, 85–96.
41. Wiese, W., Helfrich, W. Theory of vesicle budding. J. Phys. Condens. Matter 1990, 2, SA329–32.



42. Mashl, R.J., Bruinsma, R.F. Spontaneous-curvature theory of clathrin-coated membranes. Biophys. J. 1998, 74, 2862–75.
43. May, S., Harries, D., Ben-Shaul, A. Lipid demixing and protein-protein interactions in the adsorption of charged proteins on mixed membranes. Biophys. J. 2000, 79, 1747–60.
44. Harries, D., May, S., Gelbart, W.M., Ben-Shaul, A. Structure, stability, and thermodynamics of lamellar DNA-lipid complexes. Biophys. J. 1998, 75, 159–73.
45. Haleva, E., Ben-Tal, N., Diamant, H. Increased concentration of polyvalent phospholipids in the adsorption domain of a charged protein. Biophys. J. 2004, 86, 2165–78.
46. Wang, J., Gambhir, A., McLaughlin, S., Murray, D. A computational model for the electrostatic sequestration of PI(4,5)P2 by membrane-adsorbed basic peptides. Biophys. J. 2004, 86, 1969–86.
47. McLaughlin, S., Murray, D. Plasma membrane phosphoinositide organization by protein electrostatics. Nature 2005, 438, 605–11.
48. Andelman, D., Kozlov, M.M., Helfrich, W. Phase transitions between vesicles and micelles driven by competing curvature. Europhys. Lett. 1994, 25, 231–6.
49. Petrache, H.I., Harries, D., Parsegian, V.A. Alteration of lipid membrane rigidity by cholesterol and its metabolic precursors. Macromol. Symp. 2005, 219, 39–50.
50. Petrache, H.I., Gouliaev, N., Tristram-Nagle, S., Zhang, R.T., Suter, R.M., Nagle, J.F. Interbilayer interactions from high-resolution x-ray scattering. Phys. Rev. E 1998, 57, 7014–24.
51. Evans, E.A., Parsegian, V.A. Thermal-mechanical fluctuations enhance repulsion between biomolecular layers. Proc. Natl. Acad. Sci. 1986, 83, 7132–6.
52. Podgornik, R., Parsegian, V.A. Thermal-mechanical fluctuations of fluid membranes in confined geometries: The case of soft confinement. Langmuir 1992, 8, 557–62.
53. Fang, F., Szleifer, I. Competitive adsorption in model charged protein mixtures: Equilibrium isotherms and kinetic behavior. J. Chem. Phys. 2003, 119, 1053–65.
54. Fang, F., Szleifer, I. Controlled release of proteins from polymer-modified surfaces. Proc. Natl. Acad. Sci. 2006, 103, 5769–74.
55. Faure, M.C., Bassereau, P., Carignano, M.A., Szleifer, I., Gallot, Y., Andelman, D. Monolayers of diblock copolymer at the air-water interface: The attractive monomer-surface case. Eur. Phys. J. B 1998, 3, 365–75.
56. Saxton, M. Anomalous diffusion due to binding: A Monte Carlo study. Biophys. J. 1996, 70, 1250–62.
57. Binder, K., Heermann, D.W. Monte Carlo Simulation in Statistical Physics, Springer Verlag, Berlin, 2002.
58. Fichthorn, K.A., Weinberg, W.H. Theoretical foundations of dynamical Monte Carlo simulations. J. Chem. Phys. 1991, 95, 1090–6.
59. Kang, H.C., Weinberg, W.H. Monte Carlo simulations of surface-rate processes. Acc. Chem. Res. 1992, 25, 253–9.
60. Baumgärtner, A. Statics and dynamics of the freely jointed polymer chain with Lennard-Jones interaction. J. Chem. Phys. 1980, 72, 871–9.
61. Graf, P., Nitzan, A., Kurnikova, M.G., Coalson, R.D. A dynamic lattice Monte Carlo model of ion transport in inhomogeneous dielectric environments: Method and implementation. J. Phys. Chem. B 2000, 104, 12324–38.
62. Chern, S.-S., Cardenas, A.E., Coalson, R.D. Three-dimensional dynamic Monte Carlo simulations of driven polymer transport through a hole in a wall. J. Chem. Phys. 2001, 115, 7772–82.
63. Campelo, F., McMahon, H.T., Kozlov, M.M. The hydrophobic insertion mechanism of membrane curvature generation by proteins. Biophys. J. 2008, 95, 2325–39.
64. Zemel, A., Ben-Shaul, A., May, S. Perturbation of a lipid membrane by amphipathic peptides and its role in pore formation. Eur. Biophys. J. 2005, 34, 230–42.
65. McLaughlin, S., Wang, J., Gambhir, A., Murray, D. PIP2 and proteins: Interactions, organization, and information flow. Annu. Rev. Biophys. Biomol. Struct. 2002, 31, 151–75.
66. McLaughlin, S., Murray, D. Plasma membrane phosphoinositide organization by protein electrostatics. Nature 2005, 438, 605–11.
67. Heo, W.D., Inoue, T., Park, W.S., Kim, M.L., Park, B.O., Wandless, T.J., Meyer, T. PI(3,4,5)P3 and PI(4,5)P2 lipids target proteins with polybasic clusters to the plasma membrane. Science 2006, 314, 1458–61.



68. Yeung, T., Terebiznik, M., Yu, L., Silvius, J., Abidi, W.M., Phillips, M., Levine, T., Kapus, A., Grinstein, S. Receptor activation alters inner surface potential during phagocytosis. Science 2006, 313, 347–51.
69. Wang, J., Gambhir, A., Hangyas-Mihalyne, G., Murray, D., Golebiewska, U., McLaughlin, S. Lateral sequestration of phosphatidylinositol 4,5-bisphosphate by the basic effector domain of myristoylated alanine-rich C kinase substrate is due to nonspecific electrostatic interactions. J. Biol. Chem. 2002, 277, 34401–12.
70. Nomikos, M., Mulgrew-Nesbitt, A., Pallavi, P., Mihalyne, G., Zaitseva, I., Swann, K., Lai, F.A., Murray, D., McLaughlin, S. Binding of phosphoinositide-specific phospholipase C-ζ (PLC-ζ) to phospholipid membranes: Potential role of an unstructured cluster of basic residues. J. Biol. Chem. 2007, 282, 16644–53.
71. Golebiewska, U., Gambhir, A., Hangyas-Mihalyne, G., Zaitseva, I., Rädler, J., McLaughlin, S. Membrane-bound basic peptides sequester multivalent (PIP2), but not monovalent (PS), acidic lipids. Biophys. J. 2006, 91, 588–99.
72. Tzlil, S., Ben-Shaul, A. Flexible charged macromolecules on mixed fluid lipid membranes: Theory and Monte Carlo simulations. Biophys. J. 2005, 89, 2972–87.
73. Wang, J., Gambhir, A., McLaughlin, S., Murray, D. A computational model for the electrostatic sequestration of PI(4,5)P2 by membrane-adsorbed basic peptides. Biophys. J. 2004, 86, 1969–86.
74. Gomperts, B., Tatham, P., Kramer, I. Signal Transduction, Academic Press, San Diego, 2003.
75. Record, M.T. Jr., Anderson, C.F., Lohman, T.M. Thermodynamic analysis of ion effects on the binding and conformational equilibria of proteins and nucleic acids: The roles of ion association or release, screening, and ion effects on water activity. Q. Rev. Biophys. 1978, 11, 103–78.
76. Parsegian, V.A., Gingell, D. On the electrostatic interaction across a salt solution between two bodies bearing unequal charges. Biophys. J. 1972, 12, 1192–204.
77. Wagner, K., Harries, D., May, S., Kahl, V., Rädler, J.O., Ben-Shaul, A. Direct evidence for counterion release upon cationic lipid-DNA condensation. Langmuir 2000, 16, 303–6.
78. Sharp, K.A., Friedman, R.A., Misra, V., Hecht, J., Honig, B. Salt effects on polyelectrolyte-ligand binding: Comparison of Poisson-Boltzmann, and limiting law/counterion binding models. Biopolymers 1995, 36, 245–62.
79. Ren, G., Vajjhala, P., Lee, J.S., Winsor, B., Munn, A.L. The BAR domain proteins: Molding membranes in fission, fusion, and phagy. Microbiol. Mol. Biol. Rev. 2006, 70, 37–120.
80. Habermann, B. The BAR-domain family of proteins: A case of bending and binding? EMBO Rep. 2004, 5, 250–5.
81. Dawson, J.C., Legg, J.A., Machesky, L.M. BAR domain proteins: A role in tubulation, scission and actin assembly in clathrin-mediated endocytosis. Trends Cell Biol. 2006, 16, 493–8.
82. Perez, J.L., Khatri, L., Chang, C., Srivastava, S., Osten, P., Ziff, E.B. PICK1 targets activated protein kinase Calpha to AMPA receptor clusters in spines of hippocampal neurons and reduces surface levels of the AMPA-type glutamate receptor subunit 2. J. Neurosci. 2001, 21, 5417–28.
83. Lu, W., Ziff, E.B. PICK1 interacts with ABP/GRIP to regulate AMPA receptor trafficking. Neuron 2005, 47, 407–21.
84. Jin, W., Ge, W.-P., Xu, J., Cao, M., Peng, L., Yung, W., Liao, D., Duan, S., Zhang, M., Xia, J. Lipid binding regulates synaptic targeting of PICK1, AMPA receptor trafficking, and synaptic plasticity. J. Neurosci. 2006, 26, 2380–90.
85. Saarikangas, J., Zhao, H., Pykalainen, A., Laurinmaki, P., Mattila, P.K., Kinnunen, P.K., Butcher, S.J., Lappalainen, P. Molecular mechanisms of membrane deformation by I-BAR domain proteins. Curr. Biol. 2009, 19, 95–107.
86. Gallop, J.L., McMahon, H.T. BAR domains and membrane curvature: Bringing your curves to the BAR. Biochem. Soc. Symp. 2005, 72, 223–31.
87. Itoh, T., De Camilli, P. BAR, F-BAR (EFC) and ENTH/ANTH domains in the regulation of membrane-cytosol interfaces and membrane curvature. Biochim. Biophys. Acta 2006, 1761, 897–912.
88. Zimmerberg, J., Kozlov, M.M. How proteins produce cellular membrane curvature. Nat. Rev. Mol. Cell Biol. 2006, 7, 9–19.
89. Gallop, J.L., Jao, C.C., Kent, H.M., Butler, P.J.G., Evans, P.R., Langen, R., McMahon, H.T. Mechanism of endophilin N-BAR domain-mediated membrane curvature. EMBO J. 2006, 25, 2898–910.



90. Farsad, K., Ringstad, N., Takei, K., Floyd, S.R., Rose, K., De Camilli, P. Generation of high curvature membranes mediated by direct endophilin bilayer interactions. J. Cell Biol. 2001, 155, 193–200.
91. Masuda, M., Takeda, S., Sone, M., Ohki, T., Mori, H., Kamioka, Y., Mochizuki, N. Endophilin BAR domain drives membrane curvature by two newly identified structure-based mechanisms. EMBO J. 2006, 25, 2889–97.
92. Ford, M.G.J., Mills, I.G., Peter, B.J., Vallis, Y., Praefcke, G.J.K., Evans, P.R., McMahon, H.T. Curvature of clathrin-coated pits driven by epsin. Nature 2002, 419, 361–6.
93. Lee, M.C., Orci, L., Hamamoto, S., Futai, E., Ravazzola, M., Schekman, R. Sar1p N-terminal helix initiates membrane curvature and completes the fission of a COPII vesicle. Cell 2005, 122, 605–17.
94. Nie, Z., Hirsch, D.S., Luo, R., Jian, X., Stauffer, S., Cremesti, A., Andrade, J., Lebowitz, J., Marino, M., Ahvazi, B., Hinshaw, J.E., Randazzo, P.A. A BAR domain in the N terminus of the Arf GAP ASAP1 affects membrane structure and trafficking of epidermal growth factor receptor. Curr. Biol. 2006, 16, 130–9.
95. Fernandes, F., Loura, L.M.S., Chichon, F.J., Carrascosa, J.L., Fedorov, A., Prieto, M. Role of helix-0 of the N-BAR domain in membrane curvature generation. Biophys. J. 2008, 94, 3065–73.
96. Baker, N.A., Sept, D., Joseph, S., Holst, M.J., McCammon, J.A. Electrostatics of nanosystems: Application to microtubules and the ribosome. Proc. Natl. Acad. Sci. 2001, 98, 10037–41.
97. Lindahl, E., Edholm, O. Mesoscopic undulations and thickness fluctuations in lipid bilayers from molecular dynamics simulations. Biophys. J. 2000, 79, 426–33.
98. Ayton, G.S., Lyman, E., Krishna, V., Swenson, R.D., Mim, C., Unger, V.M., Voth, G.A. New insights into BAR domain-induced membrane remodeling. Biophys. J. 2009, 97, 1616–25.
99. Yin, Y., Arkhipov, A., Schulten, K. Simulations of membrane tubulation by lattices of amphiphysin N-BAR domains. Structure 2009, 17, 882–92.
100. Frost, A., Perera, R., Roux, A., Spasov, K., Destaing, O., Egelman, E.H., De Camilli, P., Unger, V.M. Structural basis of membrane invagination by F-BAR domains. Cell 2008, 132, 807–17.
101. Shimada, A., Niwa, H., Tsujita, K., Suetsugu, S., Nitta, K., Hanawa-Suetsugu, K., Akasaka, R., Nishino, Y., Toyama, M., Chen, L., Liu, Z.-J., Wang, B.-C., Yamamoto, M., Terada, T., Miyazawa, A., Tanaka, A., Sugano, S., Shirouzu, M., Nagayama, J., Takenawa, T., Yokoyama, S. Curved EFC/F-BAR-domain dimers are joined end to end into a filament for membrane invagination in endocytosis. Cell 2007, 129, 761–72.

CHAPTER 13

Folding of Conjugated Proteins

Dalit Shental-Bechor, Oshrit Arviv, Tzachi Hagai, and Yaakov Levy

Contents

1. Introduction
2. Methods
3. Results and Discussion
3.1 Folding of glycoproteins
3.2 Folding of proteins with flexible tails
3.3 Folding of ubiquitinated proteins
3.4 Folding of multidomain proteins
4. Conclusions
References

Abstract

This review discusses the molecular details of the folding mechanisms of conjugated proteins, as explored with computational tools. Almost all studies of protein folding focus on individual proteins and do not consider how interactions with posttranslational modifications, or between domains, might affect folding. However, different chemical conjugations may have a variety of effects on protein biophysics. These effects depend both on the chemical characteristics of the protein substrate and on the chemical and physical properties of the attachment. We review the folding of various types of conjugated proteins (glycoproteins, proteins with tails, ubiquitinated proteins, and multidomain proteins). Our aim is to explore the underlying biophysical principles of these complex folding processes and, in particular, to quantify the cross-talk between the protein and its conjugated polymer.


Keywords: protein folding; multidomain proteins; glycosylation; ubiquitination; coarse-grained models

Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel Annual Reports in Computational Chemistry, Volume 6 ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06013-5

© 2010 Elsevier B.V. All rights reserved.



Dalit Shental-Bechor et al.

1. INTRODUCTION

The field of protein folding has traditionally focused on the folding of individual proteins in isolation, following the paradigm that sequence determines structure and structure determines function. The funnel theory of protein folding hypothesizes that the folding process of a protein is governed by its native structure, as determined by its sequence [1,2]. According to this theory, nonnative interactions are minimized; such interactions could otherwise compete with the native ones, introduce frustration, and lead to the accumulation of traps. However, protein folding in vivo is much more complicated, because several factors in the cell may affect folding. For example, chaperones participate in folding and may change the folding pathway and thermodynamics, as may the inherently crowded environment of the cell. The folding of a protein can also be affected by conjugated moieties such as posttranslational modifications (PTMs) (see Figure 1). The cell contains a large variety of PTMs that serve diverse biological functions. Phosphorylation, for example, is widely used in signal transduction [3]. Ubiquitination, the covalent attachment of the protein ubiquitin, controls the cellular fate of many eukaryotic proteins [4,5]. Sugar trees attached during glycosylation serve as recognition factors for receptors and in protein–protein interactions [6]. The ability of sugar-binding protein receptors (primarily lectins) to recognize carbohydrate conjugates lies at the heart of many central biological processes [7]. Viruses recognize sugars and use them as targets for cell penetration and infection [8]. Myristoylation and palmitoylation, the covalent attachment of a fatty acid, aid protein trafficking and membrane association [9].

Figure 1 A gallery of conjugated proteins. (a) A tailed SH3 protein. (b) A multidomain protein (FNfn9-FNfn10). (c) A glycosylated SH3 protein. (d) A ubiquitinated Ubc7 protein (monomeric ubiquitin). (e) A ubiquitinated Ubc7 protein (Lys48-linked tetrameric ubiquitin).

Folding of Conjugated Proteins


But do PTMs have a biophysical effect on the protein that may be related to the biological function of the modified protein? In glycoproteins, the glycan is added to the unfolded protein while it is in the translocon complex [10], indicating that it may assist in obtaining the correct fold through the recruitment of the lectins calnexin and calreticulin [11–13]. There is evidence for enhanced thermostability imposed by glycosylation [14]; for example, the human immune cell receptor CD2 (cluster of differentiation 2) can fold correctly only after glycosylation [15,16]. In other cases, however, elimination of all or some glycans has no effect on folding or protein function, implying that some glycosylation sites are more crucial to folding or function than others and that the effect of glycans on folding is likely to be local. The chemical properties of both the oligosaccharide and the protein may govern the effect of glycosylation on the protein energy landscape and hence the biophysical properties of the conjugated protein. On the one hand, the size of the sugar tree, its chemical composition, and its structure (e.g., the number of branches) affect the overall properties of the conjugated protein. On the other hand, the specific glycosylation site and its chemical environment, as well as the number of conjugated glycans, may modulate the effect of glycosylation on the protein. Quantifying this relationship between the properties of the sugar and of the protein, and their overall influence on the glycoprotein's energy landscape, could yield a "glycosylation code". Deciphering such a molecular code, however, is difficult for two main reasons. First, there is a large variety in the composition and structure of oligosaccharides.
Second, structural information about glycans in the context of the folded protein is very limited: while about 50% of all proteins are glycosylated, only 3.5% of the proteins in the Protein Data Bank (PDB) contain the glycan chains, and even fewer entries include the full structure of the glycans [17,18]. By studying the folding of glycosylated proteins in silico, we can try to formulate the main characteristics of the interplay between a protein and its conjugated sugars.

A similar kind of conjugation is an unstructured tail at the termini of proteins (see Figure 1). In nature, there are intrinsically disordered proteins that remain highly flexible until they interact with a partner protein that induces structure [19,20]. Disordered tails, in particular, play a role in interacting with other biomolecules such as DNA [21,22] and can accelerate binding kinetics via the fly-casting mechanism [23–25]. Using computational methods, we can ask whether the attachment of a flexible polymer modulates the biophysical properties of the protein. As with glycoproteins, we can also ask how the characteristics of the tail (length and flexibility) affect protein characteristics. Furthermore, tails can be conjugated to proteins not only at the termini but also as branches via side chains. In this case, one can ask how the number of tails and their conjugation sites modulate the protein's biophysical properties.

The modification of proteins by attaching chains of ubiquitin (known as ubiquitination) serves as another example for studying interface effects between domains. Ubiquitination is a unique PTM in that the conjugated modification is a protein or a polymer of proteins, so the conjugate can be viewed as a special case of a multidomain protein. Ubiquitin molecules are covalently attached to the substrate protein through an isopeptide bond between the C-terminus of ubiquitin and a lysine of the substrate. The conjugated ubiquitin itself can be further connected, through one of its seven lysine residues, to other ubiquitin molecules. The points of attachment in the polymer determine the shape and topology of the modification, as well as the fate of the modified protein [26,27]. The best-characterized chain topology is the one in which the isopeptide link is formed using the lysine at position 48 of each attached ubiquitin molecule. This type of attachment creates a densely packed tetramer with a large interface with the ubiquitinated substrate (see Figure 1). These Lys48-linked chains are commonly associated with protein degradation. Another, very different, elongated topology is obtained by attaching subsequent ubiquitin units through the lysine at position 63. The biological function of this ubiquitin chain is nondegradative; it is related to other processes, such as DNA repair. A fundamental question is the effect of ubiquitination on the protein's thermodynamics and kinetics. The ubiquitin attachment may significantly affect the substrate in various ways that may support the function introduced by the conjugation [28].

Another kind of conjugated entity is a protein domain in the context of multidomain proteins (see Figure 1) [29,30]. Multidomain proteins are very common in genomes, and the folding of a tethered domain might differ from that of the isolated domain. A domain is defined as a structural, functional, and evolutionary component of proteins that can often be expressed as a single unit [31]. In fact, sequence analyses have shown that most eukaryotic proteins and a substantial fraction of prokaryotic proteins are composed of more than one domain [32], and that proteins have evolved through extensive duplication and shuffling of domains.
However, only a small fraction of the possible domain combinations is found in wild-type multidomain proteins. This modular character of a limited set of domain families supported the emergence of complex protein functions. Yet the existing domain combinations must also have met the constraints of folding in their native operative context, in which the domains fold in the presence of their tethered neighboring domains. The need to fold within a multidomain architecture suggests that energetically favorable folding pathways are conserved in these conjugated constructs as well. Multidomain proteins may be viewed as conjugated proteins in which each domain may affect the folding dynamics and thermodynamic properties of its counterpart domain. Experimentally, the thermodynamics and kinetics of both isolated domains and conjugated constructs from several multidomain proteins have been studied (a very detailed and fairly current report can be found in Reference [29]). A computational characterization of the mechanistic principles of the folding of multidomain proteins [33], using native structure-based models, provides a reduced microscopic description of their folding, which in turn may enable the formulation of the forces involved in the interplay between neighboring domains. In this paper, we discuss the biophysical effects imposed on a protein by conjugation. Specifically, we ask how the nature of the conjugate (its size, shape, flexibility, and conjugation site) affects protein folding.


2. METHODS

The various conjugated protein systems were studied using coarse-grained models. The protein moiety was studied using a native topology-based model, and the conjugate moieties were modeled in various ways to capture their polymeric nature. Both the tail and the glycan [34,35] were modeled as flexible polymers. The glycan was represented as a tree of beads, each bead representing a single sugar ring. The rigidity of the glycan was introduced through an angle potential term between the sugar beads and through the excluded volume effect. The tail was modeled as an entropic chain of beads connected by bonds; the flexibility of the chain was represented solely by excluded volume. The conjugation of a protein, as in multidomain proteins or ubiquitinated proteins [28], was also modeled with the native topology-based model, and the relative stability of the conjugates can be controlled by constraining the protein dynamics. The details of the models can be found in previous publications [36,37].

We would like to point out that, because conjugated proteins may have inhomogeneous degrees of freedom (i.e., part of the system is significantly more flexible than the rest), special care is required in choosing the thermostat for the molecular dynamics simulations. We have recently reported that the Berendsen and Langevin thermostats differ in their ability to regulate the temperature of systems that include flexible and more rigid regions [38]. In simulations performed with the Berendsen thermostat, the flexible tail is significantly hotter than the protein, in both its folded and unfolded states. Upon weakening the coupling strength of the Berendsen thermostat, the temperature gradient between the fast and slow degrees of freedom decreases significantly, yet a difference between the temperatures of the flexible tail and the protein remains. The Langevin thermostat was found to regulate the temperature of these inhomogeneous systems reliably, without discriminating between the slow and fast degrees of freedom (Figure 2).
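The global-versus-local distinction between the two thermostats can be illustrated with a toy model. The Python sketch below is not the simulation code used in this work (see [38]); it assumes non-interacting, unit-mass particles in one dimension (reduced units, k_B = 1), split into a hypothetical "tail" group that starts hot and a "protein" group that starts cold. The Berendsen step multiplies every velocity by a single factor computed from the total kinetic energy, so it steers only the average temperature; the Langevin step (written as the exact Ornstein-Uhlenbeck velocity update) couples each particle to the bath individually. Because the particles do not interact, positions are not integrated, and all parameter values are hypothetical.

```python
import numpy as np


def group_temperatures(v, tail_mask):
    """Kinetic temperature of each group (1D, unit mass, k_B = 1): T = <v^2>."""
    return float(np.mean(v[tail_mask] ** 2)), float(np.mean(v[~tail_mask] ** 2))


def berendsen_step(v, T0, dt, tau):
    """Weak-coupling rescaling: one global factor is applied to all velocities."""
    T = np.mean(v ** 2)
    lam = np.sqrt(1.0 + (dt / tau) * (T0 / T - 1.0))
    return lam * v


def langevin_step(v, T0, dt, gamma, rng):
    """Exact Ornstein-Uhlenbeck update: friction and noise act on each particle."""
    c1 = np.exp(-gamma * dt)
    c2 = np.sqrt((1.0 - c1 ** 2) * T0)
    return c1 * v + c2 * rng.standard_normal(v.shape)


def run(thermostat, n_steps=3000, n_per_group=2000, T0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    tail_mask = np.zeros(2 * n_per_group, dtype=bool)
    tail_mask[:n_per_group] = True  # hypothetical "tail" particles
    # Start with a hot tail (T = 4) and a cold protein (T = 1).
    v = np.where(tail_mask, 2.0, 1.0) * rng.standard_normal(2 * n_per_group)
    for _ in range(n_steps):
        if thermostat == "berendsen":
            v = berendsen_step(v, T0, dt=0.01, tau=0.1)
        elif thermostat == "langevin":
            v = langevin_step(v, T0, dt=0.01, gamma=1.0, rng=rng)
        else:
            raise ValueError(f"unknown thermostat: {thermostat}")
    return group_temperatures(v, tail_mask)


print(run("berendsen"))  # global T -> T0, but the tail stays roughly 4x hotter
print(run("langevin"))   # both groups relax to roughly T0
```

The toy exaggerates the effect, because the two groups cannot exchange energy at all; it only shows why a single global scaling factor preserves the ratio of group temperatures and therefore cannot act on a gradient between fast and slow parts of the system, whereas the per-particle Langevin coupling removes it.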

3. RESULTS AND DISCUSSION

3.1 Folding of glycoproteins

A thorough investigation of the effect of glycosylation on the stability of the conjugated protein demonstrated that, while in some cases the oligosaccharide increased the protein's thermodynamic stability, in other cases protein stability was unaffected or even reduced. This was observed in folding/unfolding simulations of 35 glycoconjugated variants of the Src homology 3 (SH3) domain. In these simulations, the glycan was attached to 35 different solvent-exposed positions on the protein's surface to obtain 35 glycoconjugated variants, each with a single attached oligosaccharide. A detailed description of the simulations can be found in References [34,35]. In general, the change in protein stability was tightly related to the location of the attached glycan. The influence of the glycan ranged from stabilization to significant destabilization, as reflected by the relative populations of the folded and unfolded states.
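Stability in these simulations is read off the relative populations of the folded and unfolded states. A minimal Python sketch of that bookkeeping follows, assuming each frame is classified by its fraction of native contacts Q with a hypothetical cutoff q_cut = 0.5 (the criterion used in this work is not stated here) and using reduced units with k_B = 1. It converts the populations into a degree of folding and a free-energy difference ΔG = G_U − G_F = kT ln(P_F/P_U); the trajectory values are hypothetical.

```python
import numpy as np


def degree_of_folding(q, q_cut=0.5):
    """Percentage of frames whose fraction of native contacts Q is at least q_cut."""
    q = np.asarray(q, dtype=float)
    return 100.0 * np.mean(q >= q_cut)


def stability(q, q_cut=0.5, kT=1.0):
    """Return dG = G_unfolded - G_folded = kT * ln(P_F / P_U).

    Positive values mean the folded state is more populated.
    """
    p_folded = degree_of_folding(q, q_cut) / 100.0
    p_unfolded = 1.0 - p_folded
    if p_folded == 0.0 or p_unfolded == 0.0:
        raise ValueError("both states must be sampled to estimate dG")
    return kT * np.log(p_folded / p_unfolded)


# Hypothetical Q values from a short trajectory at a single temperature.
q_trajectory = [0.92, 0.85, 0.60, 0.15]
print(degree_of_folding(q_trajectory))  # 75.0
print(stability(q_trajectory))          # ln(3), about 1.099
```

Comparing the degree of folding (or ΔG) of a conjugated variant with that of the unmodified protein at the same temperature gives the stabilization or destabilization discussed in this section.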


[Figure 2 image: two panels, "Berendsen thermostat" and "Langevin thermostat"; x axis: time steps; y axes: temperature and number of native contacts.]

Figure 2 The effect of the thermostat on the temperature of the conjugate. Time evolution of the temperatures of the flexible polymeric tail (composed of 80 residues) and the SH3 domain simulated with the Berendsen thermostat (right panel) and the Langevin thermostat (left panel). The gray lines correspond to the time evolution of the number of native contacts and show several folding/unfolding events. The temperatures of the tail (thin black line) and the SH3 domain (thick black line) illustrate that the Langevin thermostat reliably regulates the temperature of the inhomogeneous system without discriminating between the slow and fast degrees of freedom, while the Berendsen thermostat yields temperature gradients between the fast and slow degrees of freedom.

[Figure 3 image: (a) degree of folding (%) of monoglycosylated SH3 vs. number of native contacts at the glycosylation site; (b) ΔTF (%) of ubiquitinated Ubc7 and ΔCm (%) of cross-linked nuclease vs. number of native contacts at the conjugation site.]

Figure 3 The linkage between the position of the conjugation site and the effect introduced by the conjugation. The glycosylation (a) and ubiquitination (b) sites are characterized by the number of native contacts in which the modification site is involved. Both glycosylation and ubiquitination are more destabilizing when the modification is made at a more structured position. Experimentally, cross-linked dimers were shown to be destabilized relative to the isolated monomers when the cross-linking is made through a structured residue (b, triangles).

The thermostability effect at each of the 35 selected glycosylation sites of SH3 is depicted in Figure 3a and illustrates that glycosylation sites located on loops (less structured positions) are more effective in enhancing protein stability than more structured sites. Because the structures of only a small fraction of natural glycoproteins have been fully resolved by either X-ray crystallography or NMR, statistical analysis of the structural features of favored glycosylation sites is limited. Yet several structural analyses of glycoproteins that provide some insight into the tendency of potential sites containing the N-glycosylation consensus sequon to accept glycans [18] found that, while occupied N-glycosylation sites can occur in all forms of secondary structure, turns and bends are favored. Combining this observation with the finding of greater stabilization by glycans attached at less structured regions (i.e., residues involved in fewer native contacts) suggests that natural glycosylation may be involved in protein stabilization.

[Figure 4 image: (a) change in TF (%) (R = 0.79) and degree of folding (%) (R = 0.75) vs. number of glycans; (b) radius of gyration (Å) of the folded state (R = −0.46) and the unfolded state (R = 0.85) vs. number of glycans.]

Figure 4 Effect of the degree of glycosylation on protein biophysics. (a) The effect of the degree of glycosylation on thermal stability, as measured either by the change in TF relative to the unmodified protein, ΔTF = (TF,glycosylated − TF,unglycosylated)/TF,unglycosylated, or by the degree of folding (calculated at the temperature at which the folded state of the unmodified protein is 50% populated). (b) The effect of the degree of glycosylation on the size of folded and unfolded conformations.

Since many glycoproteins contain several glycans, it is of great interest to understand how the biophysical characteristics of glycoproteins are affected by the number of covalently attached oligosaccharides. Specifically, we asked whether there is a cooperative effect between the various attached glycans. To address this question, six positions on SH3 that stabilized the protein were selected from the 35 sites studied. For these six glycosylation sites, we designed all possible glycosylated variants using the oligosaccharide Man9GlcNAc2. This design resulted in 63 variants: 6 with a single glycan (one at each glycosylation site), 15 with two glycans, 20 with three, 15 with four, 6 with five, and a single fully glycosylated variant in which all six positions were glycosylated. An increase in the transition temperature (defined in the simulations as the folding temperature, TF, at which the protein has zero stability, i.e., ΔG = 0) is observed as the degree of glycosylation increases (Figure 4). On average, each glycan increases the transition temperature by about 0.6–0.9 °C. The transition temperature of the SH3 domain with six glycans is, accordingly, higher than that of wild-type SH3 by about 3–4 °C.
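The 63 variants are all nonempty subsets of the six sites (2^6 − 1 = 63), and the number of variants with k glycans is the binomial coefficient C(6, k). A short Python enumeration reproduces the counts quoted above; the site labels are hypothetical placeholders, because the six positions are not listed here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels for the six stabilizing glycosylation sites on SH3.
SITES = ("site1", "site2", "site3", "site4", "site5", "site6")


def glycosylation_variants(sites):
    """All variants carrying at least one glycan, as tuples of occupied sites."""
    variants = []
    for k in range(1, len(sites) + 1):
        variants.extend(combinations(sites, k))
    return variants


variants = glycosylation_variants(SITES)
per_degree = Counter(len(v) for v in variants)

print(len(variants))                         # 63
print([per_degree[k] for k in range(1, 7)])  # [6, 15, 20, 15, 6, 1]
```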
A similar increase in thermal stabilization per additional glycan was demonstrated experimentally by the chemical glycosylation of α-chymotrypsin [39,40] and subtilisin Carlsberg [34,35] using either the disaccharide lactose or dextran (Glc(α1–6)n, a 10 kDa oligosaccharide), with the increase in melting temperature depending on the number of glycans attached. The melting temperatures of α-chymotrypsin and subtilisin Carlsberg increased by 1 and 2 °C per added glycan, respectively. Thus, not only do the experiments and simulations share a common stabilization trend as a function of the degree of glycosylation, but they also show similar magnitudes of stabilization. Comprehensive thermodynamic analyses of the simulations demonstrate that changes in the unfolded state cause the thermal stabilization [34]. This observation is in accord with the idea that the unfolded state is not just a random coil but retains some residual structure; the conjugated glycans were observed to interfere with the formation of this residual structure in the unfolded state. This interference destabilized the unfolded state, shifted the thermodynamic equilibrium toward the folded state, and resulted in an overall thermodynamic stabilization.
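The size of the unfolded ensemble, used as a readout of residual structure in Figures 4b and 5b, is quantified by the radius of gyration (Rg). A minimal Python sketch of Rg and of the relative change used in the ΔTF and ΔRg definitions follows. It assumes equal bead masses unless masses are given and assumes that only the protein beads (not glycan or tail beads) are passed in; the coordinates are hypothetical, and in practice Rg would be averaged over the frames assigned to a given state.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Mass-weighted Rg = sqrt(sum m_i |r_i - r_cm|^2 / sum m_i) of one conformation."""
    coords = np.asarray(coords, dtype=float)
    masses = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


def relative_change(modified, unmodified):
    """(X_modified - X_unmodified) / X_unmodified, the form used for dTF and dRg."""
    return (modified - unmodified) / unmodified


# Hypothetical protein-bead coordinates (in Angstrom) of two conformations.
compact = [[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0], [1.0, -1.0, 0.0]]
expanded = [[2.0, 2.0, 0.0], [-2.0, 2.0, 0.0], [-2.0, -2.0, 0.0], [2.0, -2.0, 0.0]]

rg_c, rg_e = radius_of_gyration(compact), radius_of_gyration(expanded)
print(rg_c, rg_e, relative_change(rg_e, rg_c))  # sqrt(2), 2*sqrt(2), and 1.0
```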

3.2 Folding of proteins with flexible tails

To examine the effect of flexible tails on protein stability, we attached tails of various lengths to various positions of the SH3 domain and simulated the folding/unfolding of each variant to decipher the effect of tail length on the stability and kinetics of the protein. We found that a short tail of a few beads stabilized the protein and that the stabilization increased with tail length (Figure 5a). However, longer tails destabilize the protein and reduce the TF even below that of the unmodified protein. It seems that the first few beads of the tail are responsible for the stabilization, which raises the question of why longer tails destabilize the protein.

[Figure 5 image: (a) ΔTF (%) vs. tail length; (b) ΔRg vs. tail length; legend in both panels: 4 Å, 6 Å, without repulsions.]

Figure 5 Folding characteristics of tailed proteins. The effects of the length of the attached flexible tail on the protein's thermostability (a) and on the protein's radius of gyration in the unfolded state (b) (the tails were attached to an SH3 domain at residue 36). The stability and radius of gyration changes are indicated by ΔTF = (TF,with tail − TF,unmodified)/TF,unmodified and ΔRg = (Rg,with tail − Rg,unmodified)/Rg,unmodified, respectively. Tails with three different kinds of repulsive interactions with the protein were studied: a repulsion distance of 4 or 6 Å between the tail and the protein, and tails with no repulsive interactions with the SH3 protein.

Figure 5b presents the change in the Rg of the protein with respect to that of the wild-type SH3 domain. The Rg of the protein in the folded state was not altered by the tail (not shown); it remained constant and similar to that of the wild-type SH3 domain. In the unfolded state, however, the Rg of the protein increased with tail length. The inert tail, which had no specific interaction with the protein, interfered with the structure of the unfolded state. This observation is in accord with the increased enthalpy and entropy of the conjugated variants in the unfolded state. To understand the effect of the repulsive interaction between the tail and the protein on thermodynamic stability, we repeated the simulations with these repulsive interactions turned off. As a result, we obtained two chains, a protein and an entropic chain, that could interpenetrate. Turning off the repulsion between the tail and the protein reduced the stability of the protein (Figure 5a), whereas increasing the repulsion distance to 6 Å enhanced thermal stability. These results imply that the protein-tail repulsive interactions are responsible for the alteration in thermodynamic stability. Interestingly, the Rg of the protein in the unfolded state increased with tail length even when the repulsions were turned off (Figure 5b). The entropy of the protein is affected by two opposing factors. First, the tail is very flexible and can disrupt the residual structure of the unfolded state. By doing so, the tail increases the Rg of the protein and, because residual contacts are lost, increases its enthalpy. This is necessarily accompanied by an increase in the entropy of the protein, because when the residual structure is reduced, more conformations become available. Second, one may assume that the tail confines the space available to the protein, because it reduces the dynamics of the unfolded chain: the repulsive interactions between the protein and the tail restrict the expansion of the protein and hence reduce its entropy.
It is evident that the number of repulsive interactions between the tail and the protein levels off when the tail contains 25 beads; longer tails do not contribute additional repulsive interactions. As a result of these two opposing factors, the entropy increases as the tail gets longer, but once the confining effect reaches saturation, the disruptive effect becomes dominant and the entropy increases more rapidly than the enthalpy. The free energy of the unfolded state then becomes lower and, as a result, the protein is destabilized.
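The enthalpy-entropy argument above can be made concrete with a generic two-state model. This Python sketch rests on simplifying assumptions (temperature-independent ΔH = H_U − H_F and ΔS = S_U − S_F, reduced units with k_B = 1) and is not the analysis performed on the simulations. With T_F = ΔH/ΔS, raising the unfolded state's enthalpy by h and its entropy by s gives T_F' − T_F = (h − s·T_F)/(ΔS + s), so the conjugate stabilizes the protein only if h/s > T_F and destabilizes it when the entropy gain outweighs the enthalpy gain. All numbers are hypothetical.

```python
import math


def folding_temperature(dH, dS):
    """T_F = dH/dS, where dG_unfold(T) = dH - T*dS crosses zero (k_B = 1)."""
    return dH / dS


def fraction_folded(dH, dS, T):
    """Folded population of the two-state model: P_F / P_U = exp(dG_unfold / T)."""
    dG = dH - T * dS
    return 1.0 / (1.0 + math.exp(-dG / T))


def tf_with_conjugate(dH, dS, h, s):
    """T_F after the conjugate raises H_U by h and S_U by s (hypothetical inputs)."""
    return folding_temperature(dH + h, dS + s)


dH, dS = 60.0, 50.0  # hypothetical values giving T_F = 1.2
print(folding_temperature(dH, dS))
print(tf_with_conjugate(dH, dS, 6.0, 4.0))  # h/s = 1.5 > T_F: stabilized
print(tf_with_conjugate(dH, dS, 6.0, 6.0))  # h/s = 1.0 < T_F: destabilized
```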

3.3 Folding of ubiquitinated proteins

Recently, we studied the thermodynamic effects of attaching a ubiquitin moiety to a protein and suggested that these effects may facilitate the cellular process that this specific signal controls [28]. One of the processes that ubiquitination mediates is protein degradation, a highly regulated process in which proteins are first recognized by specific cellular machinery, ubiquitinated by a specific ubiquitin polymer (Figure 1e), and then delivered to the proteasome, where they are first unfolded and then degraded into small fragments. We speculated that, in addition to its recognition role, the ubiquitin attachment may enhance the degradation process by thermally destabilizing the protein. To address this question, we selected the enzyme Ubc7 (Figure 6a), which is ubiquitinated for degradation by a specific ubiquitin polymer (Lys48-linked polyubiquitin) on only two residues, although many other residues are theoretically available. Using native-state simulation models, we studied the thermodynamics of this protein with and without ubiquitin attached at these two residues, as well as at other residues that are not used by the cellular machinery. We used a variety of ubiquitin polymers in our study (for example, the tetrameric ubiquitin polymer that is used to tag proteins for degradation and the monomeric ubiquitin that the cell uses to mediate other, nondegradative processes). We observed a range of ubiquitination effects that varied according to the location of the ubiquitin attachment and the type of ubiquitin polymer used [28]. These effects ranged from overstabilization of the protein to different degrees of destabilization (Figure 6b).

[Figure 6 image: (a) structure of Ubc7 with residues 3, 11, 18, 29, 62, 70, 89, 94, and 161 labeled; (b) degree of folding (%) vs. temperature (a.u.) for unmodified Ubc7 and Ubc7 with Ub @ K18, Ub @ C89, and Ub @ K94; (c) frequency vs. Rg (Å) for unmodified Ubc7 and Ubc7 with K48-linked polyUb conjugated @ K18, @ K89, and @ K94.]

Figure 6 Thermostability of ubiquitinated proteins. The effect of conjugating various ubiquitin polymers (e.g., a monomeric ubiquitin or Lys48-linked polyubiquitin) at the conserved in vivo degradation sites of Ubc7 (residues 89 and 94), as well as at other lysine residues, was investigated (a). The computational melting curve of Ubc7 ubiquitinated at positions 89 and 94 by Lys48-linked polyubiquitin indicates strong destabilization compared with unmodified Ubc7, whereas its ubiquitination at Lys18 results in stabilization. For comparison, the melting curves of Ubc7 monoubiquitinated at these positions are shown as well (ubiquitination by monomeric ubiquitin and by Lys48-linked tetraubiquitin are shown by empty and filled symbols, respectively) (b). Distributions of the radii of gyration of the ubiquitinated src-SH3 domain systems in the folded and unfolded states illustrate that the change in thermal stability is correlated with changes in the structure of the unfolded state (c).

Interestingly, we observed significant destabilization when the ubiquitin polymer responsible for signaling degradation was attached to the two specific residues that the cell uses for Ubc7 degradation. This suggests that ubiquitin may directly modulate the attached protein's properties in a manner that aids the regulated cellular process. Why do we observe such diverse thermodynamics when attaching a protein to another protein moiety, as in the ubiquitin case? Clearly, various factors affect the thermal stability of the attached protein. One such factor is the well-studied confinement effect, which reduces the entropy of the unfolded state and thereby stabilizes the overall folding reaction (Figure 6c) [41,42]. In the case of ubiquitination, we observed overstabilization of the protein due to a confinement effect; however, this effect is relatively minor and rare. Another effect, which is largely responsible for the varying degrees of destabilization observed in our study, arises when the two protein moieties move in different directions in the solvent, thereby pulling on each other. This pulling distorts the folded state and destroys residual structure in the unfolded state. The pulling effect leads to overall destabilization, mostly because of the increase in the entropy of the unfolded state as the residual structures near the ubiquitination site are unwound. We have demonstrated that the degree of destabilization becomes greater when the ubiquitin moiety is attached to a more structured region (Figure 3b). Structured regions cannot easily accommodate the attachment of the ubiquitin moiety and its independent movements, and are therefore prone to disruption of the folded state and to a decrease in the residual structure of the unfolded state near the ubiquitination site.
This is evident from the strong correlation observed between the degree of structure in a region and the thermodynamic outcome of attaching a ubiquitin to that region. Our observations of these correlations are supported by experimental studies in which a nuclease protein was cross-linked in vitro, forming a dimer from two monomers through introduced cysteine residues [43,44]. In these studies, different dimers were formed using different linkage locations and, similarly to our observations, various degrees of destabilization were observed. The degree of destabilization correlates well with the density of the structure near the modification site (Figure 3b). Therefore, from these two different systems, ubiquitination of Ubc7 and nuclease cross-linking, we can conclude that the covalent attachment of a protein to another protein may lead to a significant change in the thermal stability of the conjugated protein, and that the thermodynamic outcome depends largely on the properties of the modification site. These effects can be used by the cell to facilitate important processes, such as mediating degradation in the case of ubiquitination, and can be exploited by experimentalists to alter the properties of the system under study.
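Figure 3 characterizes each modification site by the number of native contacts it participates in. A minimal Python sketch of that count from a native structure follows. The contact definition (Cα-Cα distance below 8 Å and sequence separation of at least 4) is a common choice used here only as an assumption, since the definition used in these models is given in [36,37] rather than in this chapter, and the coordinates are hypothetical.

```python
import numpy as np


def native_contact_map(ca_coords, cutoff=8.0, min_separation=4):
    """Boolean matrix of native contacts under an assumed definition:
    C-alpha distance < cutoff and sequence separation |i - j| >= min_separation."""
    x = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    i, j = np.indices(dist.shape)
    return (dist < cutoff) & (np.abs(i - j) >= min_separation)


def contacts_per_residue(contact_map):
    """Number of native contacts each residue is involved in (row sums of a symmetric map)."""
    return contact_map.sum(axis=1)


# Hypothetical 6-residue native structure in which residues 0 and 5 touch.
coords = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [20.0, 0.0, 0.0],
          [30.0, 0.0, 0.0], [40.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
counts = contacts_per_residue(native_contact_map(coords))
print(counts.tolist())  # [1, 0, 0, 0, 0, 1]
```

Residues with high counts correspond to the "more structured" positions at which both glycosylation and ubiquitination were found to be more destabilizing.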

3.4 Folding of multidomain proteins

Multidomain proteins are widespread in genomes. The tethering of domains may play a biophysical role in addition to enriching functional diversity. To explore the underlying biophysical principles of the complex folding processes of multidomain proteins and, in particular, to quantify the cross-talk between the domains, a reduced coarse-grained model based on the native topology was used. The method compares a two-domain conjugated construct with its isolated domain components. We concentrate here on the FNfn9 domain and its natural conjugated neighbor FNfn10 (the ninth and tenth fnIII domains of fibronectin, PDB code 1fnf). Experiments have shown that FNfn9, which appeared to be unstable on its own, was significantly stabilized by its conjugated neighbor FNfn10 [45]. However, when FNfn9 was lengthened by two residues, its stability was found to be independent of the presence of FNfn10 [46]. Therefore, it was concluded that the two residues at the C-terminus of FNfn9 and the N-terminus of FNfn10 belong to both domains. Following this definition of domain boundaries, the isolated domains and the two-domain conjugated construct were studied, and their thermodynamic properties were calculated using the weighted histogram analysis method (WHAM). Figure 7 shows plots of specific heat capacity (Cv) vs. temperature. The peak of these curves corresponds to the folding transition temperature (TF), at which the protein has zero stability (i.e., ΔG = 0). Tethering FNfn9 to FNfn10 causes significant destabilization; that is, the TF of the tethered FNfn9 variant is lower than that of isolated FNfn9. Moreover, if the interfacial contacts between the two adjacent domains are not included, the decrease in stability is significantly larger. It seems that, within the framework of our model, tethering by itself causes considerable thermal destabilization. Additional simulations point to the involvement of the structure and flexibility of the linker region (shown as balls and sticks in Figure 1b). The contacts at the interface between the domains may compensate for this destabilization; however, in the case of the FNfn9-FNfn10 construct, this compensation was not sufficient.
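In this work the Cv curves were obtained with WHAM from the simulation data. A much simpler Python illustration of why the Cv peak marks the folding transition applies the fluctuation formula Cv = (⟨E²⟩ − ⟨E⟩²)/(k_B T²) to a toy two-level system (folded level at E = 0; unfolded level at E = ΔH with degeneracy exp(ΔS); k_B = 1). This is not WHAM, and the parameters are hypothetical.

```python
import numpy as np


def heat_capacity_two_level(T, dH, dS):
    """Cv = (<E^2> - <E>^2) / T^2 for a toy two-level system (k_B = 1).

    Folded level: E = 0, degeneracy 1. Unfolded level: E = dH, degeneracy exp(dS).
    """
    T = np.asarray(T, dtype=float)
    K = np.exp(dS - dH / T)      # Boltzmann weight of unfolded relative to folded
    p_u = K / (1.0 + K)          # unfolded population
    mean_E = p_u * dH
    mean_E2 = p_u * dH ** 2
    return (mean_E2 - mean_E ** 2) / T ** 2


temps = np.linspace(1.10, 1.30, 401)
cv = heat_capacity_two_level(temps, dH=60.0, dS=50.0)
t_peak = temps[np.argmax(cv)]
print(t_peak)  # close to dH/dS = 1.2
```

For these numbers the peak lies very slightly below ΔH/ΔS = 1.2 because of the 1/T² factor; the shift shrinks as ΔH grows, which is why the Cv maximum serves as a practical estimate of TF.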
Next, we consider the effect of the relative stabilities of the domains. To account for the immense difference in thermal stability between FNfn9 and FNfn10 [46,47], we designed FNfn10 to be "petrified" in the folded state. This means that, during folding, FNfn9 "meets" its tethered neighbor when the latter is always folded. This situation, which better reflects the difference between the thermal stabilities of the two components of this construct, seems to compensate for the original decrease in stability: the construct in which FNfn10 is permanently folded (Figure 7b) shows stability very similar to that of isolated FNfn9, as was also found experimentally.

[Figure 7 image: specific heat vs. temperature (a.u.). (a) FNfn9 isolated; FNfn9 tethered to FNfn10; FNfn9 tethered to FNfn10 (no interface). (b) FNfn9 isolated; FNfn9 tethered to FNfn10; FNfn9 tethered to folded FNfn10.]

Figure 7 The thermal stability of the multidomain FNfn9-FNfn10 protein. (a) The specific heat curve of the FNfn9 domain in isolation, when it is tethered to FNfn10, and when it is tethered to FNfn10 but no interfacial interactions between the two domains are allowed. (b) The specific heat of the isolated FNfn9 domain compared with that of FNfn9 tethered to an infinitely (i.e., permanently) stable FNfn10.

4. CONCLUSIONS

Many proteins are composed of several domains. These domains may be in direct contact with each other or linked via a flexible linker. One may ask whether the biophysical characteristics of the domains are modified by the tethering. We show that the properties of tethered domains can be significantly affected by conjugation to another domain and that these effects depend on the properties of the two domains: their flexibility, relative stability, size, and shape. Accordingly, a multidomain protein should not be viewed simply as the sum of its parts. While the tethering in natural multidomain proteins always takes place via the termini and the protein remains a linear polymer, PTMs often result in branched proteins in which a conjugate is attached to the protein through the side chains of various amino acids. The conjugate can be polymeric. For example, in glycosylation and ubiquitination, polysaccharides or ubiquitin proteins, respectively, are attached to the protein substrate. In this article, we showed that glycosylated and ubiquitinated proteins can be either stabilized or destabilized by the conjugation, depending on the degree of conjugation, its position on the protein, and the molecular details of the conjugate. We conclude that conjugation can enrich the properties of proteins encoded in the genome and that nature may take advantage of this avenue to modulate protein biophysics. We show that ubiquitination can induce destabilization and unfolding and may thus assist degradation by the proteasome.

REFERENCES

1. Oliveberg, M., Wolynes, P.G. The experimental survey of protein-folding energy landscapes. Q. Rev. Biophys. 2005, 38, 245–88.
2. Onuchic, J.N., Wolynes, P.G. Theory of protein folding. Curr. Opin. Struct. Biol. 2004, 14, 70–5.
3. Narayanan, A., Jacobson, M.P. Computational studies of protein regulation by post-translational phosphorylation. Curr. Opin. Struct. Biol. 2009, 19, 156–63.
4. Hershko, A., Ciechanover, A. The ubiquitin system. Annu. Rev. Biochem. 1998, 67, 425–79.
5. Varshavsky, A. Discovery of cellular regulation by protein degradation. J. Biol. Chem. 2008, 283, 34469–89.
6. Sharon, N., Lis, H. Carbohydrates in cell recognition. Sci. Am. 1993, 268, 82–9.
7. Lis, H., Sharon, N. Lectins: Carbohydrate-specific proteins that mediate cellular recognition. Chem. Rev. 1998, 98, 637–74.
8. Harrison, S. Viral membrane fusion. Nat. Struct. Mol. Biol. 2008, 15, 690–8.
9. Resh, M. Trafficking and signaling by fatty-acylated and prenylated proteins. Nat. Chem. Biol. 2006, 2, 584–90.


Dalit Shental-Bechor et al.

10. Helenius, A., Aebi, M. Roles of N-linked glycans in the endoplasmic reticulum. Annu. Rev. Biochem. 2004, 73, 1019–49.
11. Lederkremer, G.Z. Glycoprotein folding, quality control and ER-associated degradation. Curr. Opin. Struct. Biol. 2009, 19, 515–23.
12. Molinari, M. N-glycan structure dictates extension of protein folding or onset of disposal. Nat. Chem. Biol. 2007, 3, 313–20.
13. Trombetta, E., Parodi, A. Quality control and protein folding in the secretory pathway. Annu. Rev. Cell Dev. Biol. 2003, 19, 649–76.
14. Wang, C., Eufemi, M., Turano, C., Giartosio, A. Influence of the carbohydrate moiety on the stability of glycoproteins. Biochemistry 1996, 35, 7299–307.
15. Hanson, S.R., Culyba, E.K., Hsu, T.L., Wong, C.H., Kelly, J.W., Powers, E.T. The core trisaccharide of an N-linked glycoprotein intrinsically accelerates folding and enhances stability. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 3131–6.
16. Wyss, D., Choi, J., Li, J., Knoppers, M., Willis, K., Arulanandam, A., Smolyar, A., Reinherz, E., Wagner, G. Conformation and function of the N-linked glycan in the adhesion domain of human CD2. Science 1995, 269, 1273–8.
17. Lütteke, T. Analysis and validation of carbohydrate three-dimensional structures. Acta Crystallogr. D Biol. Crystallogr. 2009, 65, 156–68.
18. Petrescu, A.J., Milac, A.L., Petrescu, S.M., Dwek, R.A., Wormald, M.R. Statistical analysis of the protein environment of N-glycosylation sites: Implications for occupancy, structure, and folding. Glycobiology 2004, 14, 103–14.
19. Wright, P.E., Dyson, H.J. Intrinsically unstructured proteins: Re-assessing the protein structure-function paradigm. J. Mol. Biol. 1999, 293, 321–31.
20. Dunker, A.K., Lawson, J.D., Brown, C.J., Williams, R.M., Romero, P., Oh, J.S., Oldfield, C.J., Campen, A.M., Ratliff, C.M., Hipps, K.W., et al. Intrinsically disordered protein. J. Mol. Graph. Model. 2001, 19, 26–59.
21. Crane-Robinson, C., Dragan, A.I., Privalov, P.L. The extended arms of DNA-binding domains: A tale of tails. Trends Biochem. Sci. 2006, 31, 547–52.
22. Vuzman, D., Azia, A., Levy, Y. Searching DNA via a “monkey bar” mechanism: The significance of disordered tails. J. Mol. Biol. 2010, 396, 674–84.
23. Shoemaker, B.A., Portman, J.J., Wolynes, P.G. Speeding molecular recognition by using the folding funnel: The fly-casting mechanism. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 8868–73.
24. Levy, Y., Onuchic, J.N., Wolynes, P.G. Fly-casting in protein-DNA binding: Frustration between protein folding and electrostatics facilitates target recognition. J. Am. Chem. Soc. 2007, 129, 738–9.
25. Tóth-Petróczy, A., Simon, I., Fuxreiter, M., Levy, Y. The role of disordered tails in specific DNA binding of homeodomains. J. Am. Chem. Soc. 2009, 131, 15084–5.

26. Finley, D. Recognition and processing of ubiquitin-protein conjugates by the proteasome. Annu. Rev. Biochem. 2009, 78, 477–513.
27. Hochstrasser, M., Deng, M., Kusmierczyk, A.R., Li, X., Kreft, S.G., Ravid, T., Funakoshi, M., Kunjappu, M., Xie, Y. Molecular genetics of the ubiquitin-proteasome system: Lessons from yeast. Ernst Schering Found Symp. Proc. 2008, 1, 41–66.
28. Hagai, T., Levy, Y. Ubiquitin not only serves as a tag but also assists degradation by inducing protein unfolding. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 2001–6.
29. Han, J.H., Batey, S., Nickson, A.A., Teichmann, S.A., Clarke, J. The folding and evolution of multidomain proteins. Nat. Rev. Mol. Cell Biol. 2007, 8, 319–30.
30. Batey, S., Nickson, A.A., Clarke, J. Studying the folding of multidomain proteins. HFSP J. 2008, 2, 365–77.
31. Murzin, A., Brenner, S., Hubbard, T., Chothia, C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247, 536–40.
32. Apic, G., Gough, J., Teichmann, S. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J. Mol. Biol. 2001, 310, 311–25.
33. Itoh, K., Sasai, M. Cooperativity, connectivity, and folding pathways of multidomain proteins. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 13865–70.
34. Shental-Bechor, D., Levy, Y. Effect of glycosylation on protein folding: A close look at thermodynamic stabilization. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 8256–61.

Folding of Conjugated Proteins


35. Shental-Bechor, D., Levy, Y. Folding of glycoproteins: Toward understanding the biophysics of the glycosylation code. Curr. Opin. Struct. Biol. 2009, 19, 524–33.
36. Clementi, C., Nymeyer, H., Onuchic, J.N. Topological and energetic factors: What determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 2000, 298, 937–53.
37. Hills, R.D., Brooks, C.L. Insights from coarse-grained Go models for protein folding and dynamics. Int. J. Mol. Sci. 2009, 10, 889–905.
38. Mor, A., Ziv, G., Levy, Y. Simulations of proteins with inhomogeneous degrees of freedom: The effect of thermostats. J. Comput. Chem. 2008, 29, 1992–8.
39. Sola, R.J., Al-Azzam, W., Griebenow, K. Engineering of protein thermodynamic, kinetic, and colloidal stability: Chemical glycosylation with monofunctionally activated glycans. Biotechnol. Bioeng. 2006, 94, 1072–9.
40. Sola, R.J., Rodriguez-Martinez, J.A., Griebenow, K. Modulation of protein biophysical properties by chemical glycosylation: Biochemical insights and biomedical implications. Cell Mol. Life Sci. 2007, 64, 2133–52.
41. Mittal, J., Best, R.B. Thermodynamics and kinetics of protein folding under confinement. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 20233–8.
42. Takagi, F., Koga, N., Takada, S. How protein thermodynamics and folding mechanisms are altered by the chaperonin cage: Molecular simulations. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 11367–72.
43. Kim, Y.H., Stites, W.E. Effects of excluded volume upon protein stability in covalently cross-linked proteins with variable linker lengths. Biochemistry 2008, 47, 8804–14.
44. Byrne, M.P., Stites, W.E. Chemically crosslinked protein dimers: Stability and denaturation effects. Protein Sci. 1995, 4, 2545–58.
45. Spitzfaden, C., Grant, R., Mardon, H., Campbell, I. Module-module interactions in the cell binding region of fibronectin: Stability, flexibility and specificity. J. Mol. Biol. 1997, 265, 565–79.
46. Steward, A., Adhya, S., Clarke, J. Sequence conservation in Ig-like domains: The role of highly conserved proline residues in the fibronectin type III superfamily. J. Mol. Biol. 2002, 318, 935–40.
47. Clarke, J., Cota, E., Fowler, S.B., Hamill, S.J. Folding studies of immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway. Structure 1999, 7, 1145–53.

Section 6

Bioinformatics

Section Editor: Wei Wang Department of Chemistry and Biochemistry,

University of California-San Diego, La Jolla, CA 92093, USA

CHAPTER 14

Mean-Force Scoring Functions for Protein–Ligand Binding

Sheng-You Huang and Xiaoqin Zou

Contents

1. Introduction
2. Theoretical Background
3. Mean-Force Scoring Functions for Protein–Ligand Docking
   3.1 Atom-randomization reference state
   3.2 Corrected reference state
4. Without the Use of the Reference State
5. Conclusion
Acknowledgments
References

Abstract

The scoring function is one of the key issues in protein–ligand docking for structure-based drug design. Despite considerable success over the past decades, the scoring problem remains unsolved. Among the various types of scoring functions that have been developed, mean-force scoring functions have received considerable attention and development because of their good balance among accuracy, universality, and computational speed. In this chapter, we review recent advances in mean-force scoring functions for protein–ligand docking and discuss challenges and possible future directions for improving them.


Keywords: scoring functions; molecular docking; protein–ligand interactions; mean force; reference state

Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, and Informatics Institute, University of Missouri, Columbia, MO, USA

Annual Reports in Computational Chemistry, Volume 6
ISSN: 1574-1400, DOI 10.1016/S1574-1400(10)06014-7

© 2010 Elsevier B.V. All rights reserved.


1. INTRODUCTION

One of the most important elements of molecular docking is how to evaluate the binding energy score of a given complex, referred to as the scoring function problem. The goal of a scoring function is to correctly predict both the binding mode and the binding affinity between the protein and the ligand [1–6]. Although a large and ever-increasing number of scoring functions have been developed for protein–ligand binding over the past two decades, the scoring problem remains challenging. According to how they are derived, current scoring functions can be grouped into three categories: force field scoring functions, empirical scoring functions, and mean-force scoring functions [7–11]. Force field scoring functions [12–22] are based on physical principles and include individual energy components such as van der Waals and electrostatic terms, using atomic parameters from molecular mechanics force fields such as AMBER [23–25] and CHARMM [26,27]. Semiempirical weighting or scaling parameters are usually necessary for force field scoring functions when combining energy terms obtained from unrelated approaches. Empirical scoring functions consist of a set of weighted empirical energy terms whose coefficients are obtained by fitting to the binding affinity data of a training set of complexes with known structures [28–36]. Mean-force scoring functions (i.e., potential of mean force scoring functions), also referred to as knowledge-based scoring functions or statistical potentials, are derived from the structural information of protein–ligand complexes [37–39]. Their pairwise interaction parameters are converted directly from the occurrence frequencies of atom pairs in a large database of complexes.
Unlike empirical scoring functions, whose general applicability can be limited by the size of the training set (which is in turn limited by the number of complexes with both known structures and measured binding data), mean-force scoring functions are relatively general and robust, both because of the large number of diverse, experimentally determined protein–ligand complexes available in the Protein Data Bank (PDB) [40] and because the potentials are extracted from structures rather than fitted to known affinity data. Compared with force field scoring functions, which normally involve computationally expensive solvent treatments and require semiempirical weighting coefficients to combine different energy terms, mean-force scoring functions are computationally much more efficient because of their simple pairwise form, and they are more general because they do not depend on force-field weighting coefficients, which are not universal. Therefore, compared with force field and empirical scoring functions, mean-force scoring functions offer a good balance between accuracy and speed, which has spurred the development of a large number of mean-force scoring functions for protein–ligand interactions in the past decade. In this chapter, we give a brief review of the background and recent advances of mean-force scoring functions for protein–ligand docking. Challenges and possible future directions for the development and improvement of mean-force scoring functions are also discussed.

Mean-Force Scoring Functions for Ligand Binding


2. THEORETICAL BACKGROUND

The concept of mean force comes from physics and can be illustrated by a simple fluid system of $N$ particles with positions $\mathbf{r}_1,\ldots,\mathbf{r}_N$ [41]. Define a quantity $w^{(n)}(\mathbf{r}_1,\ldots,\mathbf{r}_n)$ by

$$g^{(n)}(\mathbf{r}_1,\ldots,\mathbf{r}_n) \equiv e^{-\beta w^{(n)}(\mathbf{r}_1,\ldots,\mathbf{r}_n)} \qquad (1)$$

where $g^{(n)}$ is called a correlation function and $\beta = 1/k_BT$, in which $k_B$ is the Boltzmann constant and $T$ is the absolute temperature of the system. The system can then be described by the following formula [41]:

$$-\nabla_j w^{(n)} = \frac{\int\cdots\int e^{-\beta U}\left(-\nabla_j U\right)d\mathbf{r}_{n+1}\cdots d\mathbf{r}_N}{\int\cdots\int e^{-\beta U}\,d\mathbf{r}_{n+1}\cdots d\mathbf{r}_N}, \qquad j = 1,2,\ldots,n \qquad (2)$$

Here, $U$ is the total potential energy of the system, which can be calculated as

$$U = \sum_{i<j}^{N} u(r_{ij}) \qquad (3)$$

where $u(r_{ij})$ is the interatomic interaction potential, $r_{ij}$ is the distance between particles $i$ and $j$, and the summation runs over all possible atom pairs in the system. Since $-\nabla_j U$ is the force acting on particle $j$ for a fixed configuration $\{\mathbf{r}_1,\ldots,\mathbf{r}_N\}$, $-\nabla_j w^{(n)}$ is the mean force acting on particle $j$, averaged over all configurations of particles $n+1,\ldots,N$ while particles $1,\ldots,n$ are held fixed. In other words, $w^{(n)}$ is the potential that gives the mean force acting on particle $j$, that is, the potential of mean force. Take the special case $w^{(2)}(r_{12}) \equiv w(r)$, which reflects the interaction between two particles held at a fixed distance $r$, averaged over the configurations of the remaining $N-2$ particles of the system. Eq. (1) can then be rewritten as

$$w(r) \equiv -k_BT \ln\left[g(r)\right] \qquad (4)$$

where $g(r)$ is the pair distribution function (also referred to as the radial distribution function) and $w(r)$ is the corresponding pair potential of mean force, or mean-force pair potential. It can be seen from Eqs. (2) and (3) that the potential of mean force $w(r)$ is not the true potential $u(r)$. Figure 1 compares $u(r)$ and $w(r)$ based on our own molecular dynamics simulations of a simple monatomic fluid. Only when the particle density of the system approaches 0, that is, when two particles held at a fixed distance $r$ are not affected by the remaining $N-2$ particles, does $w(r) \to u(r)$. Although it originated in physics, the potential of mean force was first applied to the evaluation of protein structure models by Tanaka and Scheraga [37], and the approach was substantially extended by Miyazawa and Jernigan, Sippl, and others [38,39,42–44]; in computational biology, such potentials are known as statistical or knowledge-based potentials.

[Figure 1: plot of g(r), w(r), and u(r) versus r]

Figure 1 An example of the pair distribution function $g(r)$, the potential of mean force $w(r)$, and the true interatomic potential $u(r)$, based on our molecular dynamics simulations of a simple monatomic system. The only interatomic interaction in the system is a Lennard-Jones potential.

In this approach, the interaction potentials $w_{ij}(r)$ are expressed according to Eq. (4) as

$$w_{ij}(r) \equiv -k_BT \ln g_{ij}(r) = -k_BT \ln\left[\frac{\rho_{ij}(r)}{\rho^{*}_{ij}}\right] \qquad (5)$$

where $\rho_{ij}(r)$ is the number density of atom pairs of types $i$ and $j$ observed in known protein structures and $\rho^{*}_{ij}$ is the number density of the corresponding pair in a so-called reference state, in which the atomic interactions are zero (as at infinite separation). Unlike the simple monatomic system, in which $\rho^{*}$ can be obtained exactly by randomizing all the atoms in the system, the reference state $\rho^{*}_{ij}(r)$ for a complicated protein system is inaccessible because of the effects of connectivity, excluded volume, composition, etc. [45]. Therefore, the pair potentials defined in Eq. (5) are not exact potentials of mean force in the physical sense. Moreover, like the potential of mean force, the knowledge-based potentials of Eq. (5) are not the true interaction potentials either. Despite these limitations, statistical potentials provide a simple and effective way to derive interaction free energies from the structural information of complicated protein systems. Over the past decades, the method has been significantly developed and used with considerable success in protein structure prediction, protein–protein interactions, and protein–ligand interactions [46]. Because the reference state is inaccessible for protein systems, the calculation of $\rho^{*}_{ij}$ in Eq. (5) can be somewhat arbitrary in real applications. Therefore, a challenging task in deriving mean-force scoring functions is how to calculate $\rho^{*}_{ij}(r)$ of the reference state so that the mean-force potentials $w_{ij}(r)$ are closer to the true potentials $u_{ij}(r)$.
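To make the recipe of Eq. (5) concrete, here is a minimal Python sketch under the simplest (atom-randomized, ideal-gas) reference state. It assumes that the observed $i$–$j$ pair distances have already been binned into spherical shells and that $\rho^{*}_{ij}$ is the average pair density inside a reference sphere. The function name, the argument layout, and the choice to return $+\infty$ for empty bins are illustrative assumptions, not part of any published scoring function.

```python
import math

def knowledge_based_potential(counts, r_edges, n_ref, ref_volume, kT=1.0):
    """Eq. (5): w_ij(r) = -kT ln[rho_ij(r) / rho*_ij] for one atom-pair type.

    counts[k]  : observed i-j pairs with r_edges[k] <= r < r_edges[k + 1]
    n_ref      : i-j pairs counted inside the reference sphere
    ref_volume : volume of that sphere, so rho*_ij = n_ref / ref_volume
    kT         : energies are returned in units of kT (default 1.0)
    """
    rho_star = n_ref / ref_volume  # ideal-gas (atom-randomized) reference density
    w = []
    for k, n in enumerate(counts):
        r_lo, r_hi = r_edges[k], r_edges[k + 1]
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        if n == 0:
            w.append(math.inf)  # never observed: treated as forbidden in this sketch
            continue
        w.append(-kT * math.log((n / shell_volume) / rho_star))
    return w
```

If the observed pair density equals $\rho^{*}_{ij}$ in every shell, all potentials are zero; doubling the count in one shell lowers that bin by $k_BT\ln 2$. This is exactly the behavior that makes the choice of $\rho^{*}_{ij}$ so consequential.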

3. MEAN-FORCE SCORING FUNCTIONS FOR PROTEIN–LIGAND DOCKING

Mean-force/knowledge-based scoring functions for structure prediction have been studied for more than three decades, whereas those for protein–ligand binding have only been addressed in the last 10 years. As the number of available complex structures has grown, a number of mean-force/knowledge-based scoring functions for protein–ligand interactions have been developed using different methods and have been successfully applied to protein–ligand docking. According to how the reference state is calculated, current mean-force/knowledge-based scoring functions for protein–ligand interactions may be grouped into three categories: those using an atom-randomized reference state, those using a corrected reference state, and those that avoid the reference state altogether. The performance of several published mean-force scoring functions on binding mode and binding affinity prediction is summarized in Tables 1 and 2.

3.1 Atom-randomization reference state

Mean-force scoring functions based on an atom-randomized reference state calculate $\rho^{*}_{ij}(r)$ with the method used for the simple monatomic system. This method assumes that all atoms are placed randomly in the system without interatomic interactions, ignoring interatomic connectivity, excluded volume, and other structural features of protein–ligand complexes. The resulting pair potentials $\{w_{ij}\}$ are then taken as the true potentials $\{u_{ij}\}$ despite their differences, and the binding energy score is calculated as

$$\text{Energy score} = \sum_{ij} u_{ij} \qquad (6)$$
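The following sketch shows how Eq. (6) is typically applied: a ligand pose is scored by summing tabulated pair potentials over all protein–ligand atom pairs. The data layout (atom lists of `(type, (x, y, z))` tuples and a dictionary of per-bin values keyed by `(protein_type, ligand_type)`), the defaults of a 0.1 Å bin width and a 6.0 Å cutoff, and the decision to skip unparameterized pair types are all assumptions made for this example.

```python
import math

def energy_score(protein_atoms, ligand_atoms, potentials, bin_width=0.1, cutoff=6.0):
    """Eq. (6): sum tabulated pair potentials u_ij over protein-ligand atom pairs.

    protein_atoms, ligand_atoms : lists of (atom_type, (x, y, z)) in angstroms
    potentials : dict (protein_type, ligand_type) -> list of u values, one per bin
    """
    score = 0.0
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)
            table = potentials.get((p_type, l_type))
            if r >= cutoff or table is None:
                continue  # beyond the cutoff, or pair type not parameterized
            k = min(int(r / bin_width), len(table) - 1)  # histogram bin holding r
            score += table[k]
    return score
```

Because the score is a plain double loop over atom pairs with a table lookup, it is cheap to evaluate. This cheapness is the computational advantage of pairwise mean-force functions noted in the Introduction.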

One of the earliest examples is the set of mean-force potentials developed by Verkhivker et al. for predicting HIV-1 protease binding affinities [58]. Following Sippl's original approach [39], the pair potential for ligand atom type $i$ and protein atom type $j$ separated by a distance $r$ was defined as

$$w_{ij}(r) = k_BT \ln\left(1 + m_{ij}\sigma\right) - k_BT \ln\left[1 + m_{ij}\sigma\,\frac{f_{ij}(r)}{f(r)}\right] \qquad (7)$$

where $m_{ij}$ is the number of pairs with ligand atom type $i$ and protein atom type $j$, $\sigma$ is a weighting factor that corrects for sparse atom pairs with low statistics, $f_{ij}(r)$ is the frequency of atom pair $ij$ observed at interatomic distance $r$, and $f(r)$ is the total number of all atom pairs, regardless of atom type, separated by a distance $r$.

Table 1 Correlations of binding affinity prediction for seven published knowledge-based scoring functions on the PMF validation set of 77 protein–ligand complexes (All), which consists of five classes [47]: 16 serine protease (Ser), 15 metalloprotease (Met), 18 L-arabinose binding protein (L-ara), 11 endothiapepsin (End), and 17 diverse protein–ligand complexes (Oth)

R² by test set (no. of complexes):

Function | References | Ser (16) | Met (15) | L-ara (18) | End (11) | Oth (17) | All (77)
ITScore/SE | [48] | 0.89 | 0.71 | 0.48 | 0.36 | 0.80 | 0.76
ITScore | [49,50] | 0.87 | 0.70 | 0.49 | 0.35 | 0.70 | 0.65
PMF | [47] | 0.87 | 0.58 | 0.48 | 0.22 | 0.69 | 0.61
DrugScoreCSD | [51] | 0.79 | 0.44 | 0.44 | 0.53 | 0.33 | 0.20
DrugScorePDB | [52] | 0.77 | 0.65 | 0.20 | 0.55 | 0.36 | 0.21
SMoG2001 | [53] | 0.81 | 0.64 | 0.06 | 0.03 | 0.50 | 0.46
BLEEP | [54,55] | 0.79 | 0.59 | 0.14 | 0.04 | 0.49 | 0.28

Here we use the square of the correlation coefficient (R²) rather than the correlation coefficient itself (R) to maintain consistency with the original data. The correlation data for ITScore/SE, ITScore, BLEEP, and SMoG2001 were taken from our previous study [48], and those for DrugScorePDB and DrugScoreCSD were calculated with the DrugScoreONLINE server (http://pc1664.pharmazie.uni-marburg.de/drugscore).

Table 2 Success rates for binding mode prediction and correlation coefficients (R) for binding affinity prediction with seven knowledge-based scoring functions on Wang et al.'s test set of 100 diverse protein–ligand complexes

Scoring function | References | Success rate (%) | Correlation (R)
ITScore/SE | [48] | 91 | 0.65
DrugScoreCSD | [51] | 87 | 0.62
ITScore | [49,50] | 82 | 0.65
DrugScorePDB | [52] | 72 | 0.60
DFIRE | [56] | 58 | 0.63
Cerius2/PMF | [47] | 52 | 0.40
KScore | [57] | – | 0.49

From [48]. Here, the success criterion for identifying the native binding modes is rmsd ≤ 2.0 Å.

Similarly, Verkhivker et al. also introduced a set of desolvation mean-force potentials for ligand atoms:

$$w^{\mathrm{solv}}_{i}(s) = k_BT \ln\left(1 + m_{i}\sigma\right) - k_BT \ln\left[1 + m_{i}\sigma\,\frac{f_{i}(s)}{f(s)}\right] \qquad (8)$$

where $s$ is the fraction of the solvent-accessible surface area that is buried upon binding, relative to the accessibility in the free ligand, $f_i(s)$ is the frequency of occurrence of atom type $i$ with buried fraction $s$, and $f(s)$ is the frequency of occurrence of all ligand atom types with buried fraction $s$. The mean-force potentials were derived from 30 HIV-1, HIV-2, and SIV protein–ligand complexes and 12 atom pairs. By combining these mean-force pair interaction and desolvation potentials with other energy terms, Verkhivker et al. obtained a new scoring function that achieved good correlation between calculated and experimental binding free energies for seven HIV-1 inhibitors.

DeWitte and Shakhnovich developed a mean-force scoring function (SMoG) for de novo lead design using 13 protein and 13 ligand atom types [59]. It is based on a set of contact potentials in which a ligand atom is defined to be in contact with a protein atom if the two are within 5 Å of each other. The potential parameter $w_{ij}$ was derived from statistics of interatomic interactions in crystal structures as

$$w_{ij} = -k_BT \ln\left(\frac{p_{ij}}{\bar{p}}\right) \qquad (9)$$

where $p_{ij}$ is the probability of a contact between protein atom $i$ and ligand atom $j$ in the system. The reference state $\bar{p}$ was defined as

$$\bar{p} = \frac{1}{N}\sum_{ij} p_{ij}, \qquad N = \sum_{ij} 1 \qquad (10)$$
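Eqs. (9) and (10) translate almost directly into code. The sketch below first estimates contact probabilities $p_{ij}$ from a set of complexes using the 5 Å contact definition and then applies Eq. (9), with $\bar{p}$ taken as the mean over pair types. Normalizing $p_{ij}$ by the total number of contacts, and letting $N$ count only pair types that were actually observed, are our simplifying assumptions; the published SMoG statistics may be normalized differently.

```python
import math

def contact_probabilities(complexes, cutoff=5.0):
    """Estimate p_ij from a list of (protein_atoms, ligand_atoms) tuples.

    Each atom list holds (atom_type, (x, y, z)) entries; a protein-ligand pair
    counts as a contact when the atoms are within `cutoff` angstroms.
    """
    counts = {}
    for protein_atoms, ligand_atoms in complexes:
        for p_type, p_xyz in protein_atoms:
            for l_type, l_xyz in ligand_atoms:
                if math.dist(p_xyz, l_xyz) <= cutoff:
                    key = (p_type, l_type)
                    counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def smog_potentials(p, kT=1.0):
    """Eq. (9) with the reference state p_bar of Eq. (10)."""
    p_bar = sum(p.values()) / len(p)  # N = number of (observed) pair types
    return {key: -kT * math.log(p_ij / p_bar) for key, p_ij in p.items()}
```

Pair types that are in contact more often than the average pair type receive negative (favorable) potentials, and rarer ones receive positive potentials.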


The mean-force potentials were derived from two high-resolution (≤2.0 Å) training sets. One consists of 17 protein–ligand complexes in which the ligands bind to sites on the protein surface, and the other is composed of 108 complexes in which the ligands bind to sites in protein pockets. Later, an improved version of the SMoG scoring function, referred to as SMoG2001 [53], was developed using a larger training set of 725 protein–ligand complexes. In this newer version, the contact probabilities were normalized to better characterize the reference state, and two distance intervals were introduced for computing the contact statistics. Applying SMoG2001 to Muegge and Martin's test set of 77 complexes [47] yielded a correlation coefficient of 0.68 ($R^2 = 0.46$).

Mitchell et al. developed an atomic-level potential of mean force for protein–ligand interactions (BLEEP) based on two hydrogen types, 38 nonmetal heavy atom types, and 18 metal types [54]. The mean-force potential between atom types $i$ and $j$ at interatomic distance $r$ was also derived with Sippl's sparse-data correction method [39]:

$$w_{ij}(r) = k_BT \ln\left(1 + m_{ij}\sigma\right) - k_BT \ln\left[1 + m_{ij}\sigma\,\frac{f_{ij}(r)}{f(r)}\right] \qquad (11)$$

where $m_{ij}$ is the total number of contacts (i.e., $r_{ij} < 8.0$ Å) between atom types $i$ and $j$, $f_{ij}(r)$ is the proportion of these contacts occurring at distance $r$, and $f(r)$ is the proportion of all contacts, for all atom types, occurring at distance $r$. $\sigma$ is a correction parameter for sparse data of certain atom pairs and was set to 0.02 in their calculations. The potential was represented in histogram form with a bin width of 0.1 Å. Depending on how water was treated, two types of mean-force potentials were derived: BLEEP-1, based on a training set of 351 protein–ligand complexes, and the water-inclusive BLEEP-2, based on 188 complexes. The BLEEP-1 model was tested on nine serine protease–inhibitor complexes and obtained a correlation coefficient of 0.71 ($R^2 = 0.50$) between the calculated energy scores and the experimentally determined binding data. A further test on a set of 90 protein–ligand complexes showed a good correlation ($R = 0.74$, $R^2 = 0.55$) in affinity prediction [55]. Application of BLEEP to the 77 complexes used by Muegge and Martin [47] yields a correlation of $R^2 = 0.28$ [60].

Gohlke et al. developed a mean-force scoring function (DrugScorePDB) based on 17 atom types and 1376 protein–ligand complex structures from the PDB [52]. They also followed Sippl's formalism [39] to calculate the pair potentials:

$$w_{ij}(r) = -k_BT \ln\left[\frac{g_{ij}(r)}{g(r)}\right] \qquad (12)$$

where $g_{ij}(r)$ is the normalized pair distribution function for atoms of types $i$ and $j$ at distance $r$ and is calculated by

$$g_{ij}(r) = \frac{n_{ij}(r)/4\pi r^2}{\sum_{r}\left[n_{ij}(r)/4\pi r^2\right]} \qquad (13)$$

where $n_{ij}(r)$ is the number of atom pairs of types $i$ and $j$ at distance $r$. $g(r)$ is the normalized average pair distribution function for any two atoms at distance $r$, which can be calculated by

$$g(r) = \frac{\sum_{ij} g_{ij}(r)}{\sum_{ij} 1} \qquad (14)$$

In their calculations, the cutoff for the mean-force potentials was set to 6.0 Å, and the bin size $\Delta r$ was 0.1 Å. In addition to the pair potentials $w_{ij}(r)$, DrugScorePDB also included a set of solvent-accessible-surface-area-dependent singlet potentials, derived with a similar formalism, to account for atomic desolvation effects. DrugScorePDB was validated on two sets of 91 and 68 protein–ligand complexes. A further comparative evaluation of DrugScore and AutoDock showed that DrugScore yielded slightly superior results in flexible docking [61]. More recently, a second version of DrugScore (DrugScoreCSD) [51] was developed from the Cambridge Structural Database (CSD), which contains many high-resolution crystal structures of small molecules [62]. Other examples of mean-force scoring functions based on the atom-randomized reference state include MScore [63], ASP [64], and KScore [57].
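The BLEEP and DrugScore formalisms differ mainly in how the raw distance histograms are normalized. The sketch below implements Sippl's sparse-data correction as used in Eq. (11), defaulting to BLEEP's $\sigma = 0.02$, and the DrugScore-style normalization of Eqs. (12)–(14). The inputs are assumed to be precomputed histograms. The function names and the convention of returning $+\infty$ for empty bins are hypothetical choices made for illustration.

```python
import math

def sippl_potential(f_ij, f_all, m_ij, sigma=0.02, kT=1.0):
    """Eq. (11) for one distance bin.

    f_ij  : fraction of the m_ij i-j contacts that fall in this bin
    f_all : fraction of all contacts (any atom types) that fall in this bin
    sigma : weight given to each observation; 0.02 is the value used for BLEEP
    """
    return kT * math.log(1 + m_ij * sigma) - kT * math.log(1 + m_ij * sigma * f_ij / f_all)

def drugscore_potentials(n_counts, r_centers, kT=1.0):
    """Eqs. (12)-(14); n_counts maps pair type -> counts n_ij(r) per distance bin."""
    # Eq. (13): divide by 4*pi*r^2, then normalize each pair type over all r
    g = {}
    for pair, counts in n_counts.items():
        dens = [n / (4 * math.pi * r * r) for n, r in zip(counts, r_centers)]
        total = sum(dens)
        g[pair] = [d / total for d in dens]
    # Eq. (14): average over all pair types
    g_avg = [sum(gv[k] for gv in g.values()) / len(g) for k in range(len(r_centers))]
    # Eq. (12): w_ij(r) = -kT ln[g_ij(r) / g(r)]; empty bins get +inf in this sketch
    return {
        pair: [-kT * math.log(gk / ga) if gk > 0 else math.inf for gk, ga in zip(gv, g_avg)]
        for pair, gv in g.items()
    }
```

The sparse-data correction behaves as intended at its two limits. As $m_{ij} \to 0$ the potential goes to zero, and for large $m_{ij}$ it approaches $-k_BT\ln[f_{ij}(r)/f(r)]$. Poorly sampled pair types are therefore damped toward zero rather than trusted.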

3.2 Corrected reference state

As discussed in Section 2, the atom-randomized reference state used for simple fluid systems cannot be generalized to protein–ligand complexes, which are connected systems of atoms of finite size. Researchers have therefore attempted to correct the simple random reference state in order to improve the derived mean-force potentials. Muegge and Martin introduced a ligand volume correction to the reference state when deriving their PMF scoring function [47]. The mean-force potential between a protein atom of type $i$ and a ligand atom of type $j$ was expressed as

$$w_{ij}(r) = -k_BT \ln\left[f^{j}_{\mathrm{Vol\,corr}}(r)\,\frac{\rho_{ij}(r)}{\rho^{*}_{ij}}\right] \qquad (15)$$

where $f^{j}_{\mathrm{Vol\,corr}}(r)$ is the ligand volume correction factor, $\rho_{ij}(r)$ is the number density of atom pairs of protein type $i$ and ligand type $j$ at distance $r$ in a structural database, and $\rho^{*}_{ij}$ is the average number density of atom pair $ij$, which can be calculated as

$$\rho^{*}_{ij} = \frac{N_{ij}}{4\pi R^3/3}, \qquad \rho_{ij}(r) = \frac{n_{ij}(r)}{4\pi r^2\,\Delta r} \qquad (16)$$

where $N_{ij}$ and $n_{ij}(r)$ are the numbers of protein–ligand atom pair occurrences of type $ij$ in the reference sphere of radius $R$ and in the spherical shell between radii $r$ and $r + \Delta r$, respectively. The calculation of the ligand volume correction factor $f^{j}_{\mathrm{Vol\,corr}}(r)$ is more complicated [47]. Its basic principle is that, because of the excluded volume of the ligand, a spherical shell centered at ligand atom $j$ is not completely occupied by the protein; part of the shell volume may be occupied by the ligand or by solvent. Therefore, the volume of each shell centered at a ligand atom should be properly partitioned among the protein, ligand, and solvent when deriving the PMF potentials, which is done by introducing the ligand volume correction factor $f^{j}_{\mathrm{Vol\,corr}}(r)$. By adjusting the size $R$ of the reference sphere to include different volume proportions of protein, ligand, and solvent, the PMF potentials attempt to account for desolvation and entropy effects. PMF has 34 ligand atom types and 16 protein atom types, extracted from a training set of 697 protein–ligand complex structures. On a test set of 77 protein–ligand complexes, the PMF scoring function outperformed Böhm's score (LUDI) [31] and SMoG (small molecule growth) [59], yielding a high correlation ($R^2 = 0.61$) between the calculated scores and the experimental binding constants. The PMF scoring function was also successfully applied in docking/scoring studies of weak ligands for the FK506 binding protein [65] and of inhibitors of matrix metalloprotease MMP3 [66]. Recently, a newer version of PMF (PMF04) was developed from a much larger database of 7152 protein–ligand complexes from the PDB [67]. In this work, Muegge showed that a larger training database does not improve the scoring results much, and neither does the inclusion of metal ions.
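Eqs. (15) and (16) combine for one distance bin as sketched below. Computing the ligand volume correction factor requires the ligand geometry and is not reproduced here [47]. The function simply accepts the factor as an argument, with a default of 1.0 (i.e., no correction); this default is a placeholder assumption rather than the published PMF procedure.

```python
import math

def pmf_potential(n_shell, r, dr, n_sphere, R, f_vol_corr=1.0, kT=1.0):
    """Eqs. (15)-(16) for one protein type i, ligand type j, and distance bin.

    n_shell    : n_ij(r), pair occurrences in the shell between r and r + dr
    n_sphere   : N_ij, pair occurrences in the reference sphere of radius R
    f_vol_corr : ligand volume correction factor f^j_Vol corr(r); PMF derives it
                 from ligand geometry [47], here it is simply passed in
    """
    rho_r = n_shell / (4 * math.pi * r ** 2 * dr)         # Eq. (16), rho_ij(r)
    rho_star = n_sphere / (4 * math.pi * R ** 3 / 3)      # Eq. (16), rho*_ij
    return -kT * math.log(f_vol_corr * rho_r / rho_star)  # Eq. (15)
```

With $f^{j}_{\mathrm{Vol\,corr}} = 1$, this reduces to the atom-randomized form of Eq. (5). The correction factor therefore acts as a multiplicative adjustment of the observed-to-reference density ratio.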
Based on 19 atom types and 200 protein–ligand complexes, Zhou and coworkers used a distance-scaled, finite ideal-gas reference (DFIRE) state to develop a mean-force energy function for protein–ligand, protein–protein, and protein–DNA complexes [56]. The idea behind DFIRE is as follows. In an ideal-gas system, the available volume of a shell between radii $r$ and $r + \Delta r$ is $4\pi r^2\Delta r$. In a protein–ligand complex, by contrast, the available volume of such a shell centered at a protein or ligand atom is not proportional to $4\pi r^2$, because of the excluded volume of the protein and ligand. An appropriate correction factor should therefore be applied to include the effect of excluded volume when computing the mean-force potentials. Specifically, the DFIRE potential between atom types $i$ and $j$ at distance $r$ is given by

$$w_{ij}(r) = -k_BT \ln\left[\frac{n_{ij}(r)}{\left(\dfrac{r}{r_{\mathrm{cut}}}\right)^{\alpha}\dfrac{\Delta r}{\Delta r_{\mathrm{cut}}}\,n_{ij}(r_{\mathrm{cut}})}\right] \qquad (17)$$


where $n_{ij}(r)$ is the number of atom pairs $ij$ in the shell from $r - \Delta r/2$ to $r + \Delta r/2$ observed in a given structure database, and $r_{\mathrm{cut}}$ is the cutoff for the potentials, set to 14.5 Å. The exponent $\alpha$ is a scale factor that accounts for the effect of excluded volume; it was found to be 1.61 on the basis of uniformly distributed points in finite spheres [68]. The factor $\Delta r/\Delta r_{\mathrm{cut}}$ was introduced so that the interaction potentials satisfy $w_{ij}(r) \to 0$ as $r \to r_{\mathrm{cut}}$. On the test set of 100 protein–ligand complexes of Wang et al. [69], the DFIRE scoring function obtained a correlation coefficient of 0.63 between the calculated scores and the experimental binding affinities, and a success rate of 58% in identifying native or near-native binding modes.
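Eq. (17) can be sketched as follows for a single atom-pair type and distance bin. The defaults $r_{\mathrm{cut}} = 14.5$ Å and $\alpha = 1.61$ are taken from the text. Setting $\Delta r_{\mathrm{cut}}$ equal to $\Delta r$ when it is not supplied is our assumption.

```python
import math

def dfire_potential(n_r, r, dr, n_rcut, r_cut=14.5, dr_cut=None, alpha=1.61, kT=1.0):
    """Eq. (17): DFIRE potential for one atom-pair type at distance r.

    n_r    : n_ij(r), pairs in the shell from r - dr/2 to r + dr/2
    n_rcut : n_ij(r_cut), pairs in the shell at the cutoff distance
    """
    if dr_cut is None:
        dr_cut = dr  # assumption: same bin width at r and at r_cut
    # Expected count under the distance-scaled finite ideal-gas reference:
    # it grows as r**alpha (alpha = 1.61) instead of r**2 as for an ideal gas.
    expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_rcut
    return -kT * math.log(n_r / expected)
```

Counts that follow the $r^{\alpha}$ scaling exactly give a zero potential at every distance, including $r = r_{\mathrm{cut}}$. An excess of contacts over this expectation gives a negative (attractive) potential.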

4. WITHOUT THE USE OF THE REFERENCE STATE

Whether the reference state is approximated by an atom-randomized state or corrected with a volume factor, the derived mean-force/knowledge-based potentials are not the true interaction potentials, according to Eqs. (2)–(5). Because of the complexity of proteins, the reference state is inaccessible [45]. To address this problem, Huang and Zou recently developed a mean-force scoring function (ITScore) that uses an iterative method to circumvent the need for an accurate calculation of the reference state [49,50,70–72]. The basic idea of the iterative method is to improve the effective pair potentials $u_{ij}(r)$ by iteration until they converge to a stable set of effective potentials that reproduce the pair distribution functions observed in experimentally determined protein–ligand complex structures. The method uses the following iterative function:

$$u^{(k+1)}_{ij}(r) = u^{(k)}_{ij}(r) + \lambda k_BT\left[g^{(k)}_{ij}(r) - g^{\mathrm{obs}}_{ij}(r)\right] \qquad (18)$$

where $i$ and $j$ denote the atom types of a pair of atoms in the protein and the ligand, respectively; $k$ is the iteration step; and $\lambda$ is a parameter that controls the convergence rate. Without loss of generality, $k_B T$ was set to unity in the calculation. $g_{ij}^{\mathrm{obs}}(r)$ is the experimentally observed pair distribution function, reflecting the structural characteristics of experimentally determined protein-ligand structures, and $g_{ij}^{(k)}(r)$ is the predicted pair distribution function for all the sampled ligand poses using the trial potentials $u_{ij}^{(k)}(r)$ at the $k$-th step. The predicted pair distribution function $g_{ij}^{(k)}(r)$ can be calculated with the following formula:

$$ g_{ij}^{(k)}(r) = \frac{\rho_{ij}^{(k)}(r)}{\rho_{ij,\mathrm{bulk}}^{(k)}} \qquad (19) $$

where $\rho_{ij}(r)$ and $\rho_{ij,\mathrm{bulk}}$ are the number densities of atom pair type $ij$ in a spherical shell of thickness $\Delta r$ and in a reference sphere of radius $R_{\max}$, respectively. Given a dataset of $M$ protein-ligand complexes and $L$ decoys for each complex, $\rho_{ij}^{(k)}(r)$ and $\rho_{ij,\mathrm{bulk}}^{(k)}$ were calculated as


Sheng-You Huang and Xiaoqin Zou

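$$ \rho_{ij}^{(k)}(r) = \frac{1}{ML} \sum_{m}^{M} \sum_{l}^{L} \frac{n_{ij}^{ml}(r)\, e^{-U_{ml}}}{4\pi r^{2} \Delta r} \quad \text{and} \quad \rho_{ij,\mathrm{bulk}}^{(k)} = \frac{1}{ML} \sum_{m}^{M} \sum_{l}^{L} \frac{N_{ij}^{ml}\, e^{-U_{ml}}}{V(R_{\max})} \qquad (20) $$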

where $n_{ij}^{ml}(r)$ and $N_{ij}^{ml}$ are the numbers of atom pairs of type $ij$ in the spherical shell from $r - \Delta r/2$ to $r + \Delta r/2$ and in the reference sphere, respectively, for the $l$-th decoy ligand pose of the $m$-th complex. $U_{ml}$ is the energy score of this ligand pose calculated with the potentials $u_{ij}^{(k)}(r)$. $L$ is the total number of putative ligand poses for each complex (including the native binding mode, i.e., $l = 0$). $V(R_{\max}) = 4\pi R_{\max}^{3}/3$ is the volume of the reference sphere. By construction, $N_{ij}^{ml} = \sum_{r=0}^{R_{\max}} n_{ij}^{ml}(r)$. Similarly, the experimentally observed pair distribution function $g_{ij}^{\mathrm{obs}}(r)$ can be obtained from the native structures of the complexes by using Eq. (19) and setting $L = 0$ in Eq. (20). Thus, given an initial guess of the potentials $u_{ij}^{(0)}(r)$, an improved potential $u_{ij}^{(1)}(r)$ can be obtained using Eq. (18). As this cycle is repeated, the potentials converge to relatively stable values [i.e., $u_{ij}^{(k+1)}(r) \approx u_{ij}^{(k)}(r)$]. Eq. (18) shows that the update of the potentials $u_{ij}(r)$ depends only on the difference between the predicted and the experimentally observed pair distribution functions, and not on any property of the reference state. Therefore, the iterative method avoids the reference state problem encountered by traditional mean-force/knowledge-based scoring functions. ITScore has been extensively assessed on diverse test sets [50]. It achieved a success rate of 82% in binding mode identification on Wang et al.’s test set [69] of 100 protein-ligand complexes, and correlation coefficients (R) of 0.64 and 0.81 in binding affinity prediction on Wang et al.’s test set and on the PMF validation set [47] of 77 complexes, respectively. ITScore was also validated in virtual screening with compound databases for four protein targets [73]: ER, MMP3, fXa, and AChE. Very recently, Huang and Zou have also included the solvation effect and configurational entropy in ITScore by using a similar iterative method. The resulting scoring function, referred to as ITScore/SE, significantly improves performance compared with ITScore [48].

5. CONCLUSION

In this chapter, we have reviewed the current types of mean-force scoring functions according to how they treat the reference state. Despite considerable progress, several directions remain open for future development. First, as demonstrated and discussed in our previous study [48], explicit inclusion of entropic effects may significantly improve the accuracy of a mean-force scoring function. Second, choosing an appropriate categorization of atom types, one that balances the statistics of atom pair occurrences against the number of atom types, is a common issue for mean-force scoring functions. Third, the current pairwise potentials are obviously a simplification; how to incorporate many-body interactions, and whether doing so would significantly improve scoring performance, remain unknown, because many more parameters would have to be determined. Finally, binding mode prediction, and thus virtual screening, remains a challenge for many mean-force scoring functions, particularly those that rely on inaccurate reference state calculations. The underlying reason is that conventional mean-force scoring functions use only the information embedded in native structures; the derived potentials are therefore of limited use for the noncanonical interactions present in decoys, and hence for binding mode prediction. One way to solve this problem is the iterative approach used for ITScore [49], which considers the information embedded in the whole energy landscape (i.e., the energies of both native structures and decoys). The derived potentials then contain the gradient force needed to separate native structures from decoy structures. As mean-force potentials become increasingly accurate, they will serve as a valuable tool for structure-based drug design. Detailed analysis of mean-force potentials will also provide useful insight into how to improve other physics-based scoring functions. Mean-force potentials that perform well in binding mode prediction may also be used in molecular dynamics simulations of protein-ligand interactions.

ACKNOWLEDGMENTS

We thank Sam G. Grinter for critical reading of the manuscript. Support to XZ from OpenEye Scientific Software, Inc. (Santa Fe, NM, USA) and Tripos, Inc. (St. Louis, MO, USA) is gratefully acknowledged. XZ is supported by NIH grant GM088517, Cystic Fibrosis Foundation grant ZOU07I0, the Research Board Award RB-07-32, and the Research Council Grant URC 09-004 of the University of Missouri. This work is also supported by Federal Earmark NASA Funds for Bioinformatics Consortium Equipment and additional financial support from Dell, SGI, Sun Microsystems, TimeLogic, and Intel.

REFERENCES

1. Brooijmans, N., Kuntz, I.D. Molecular recognition and docking algorithms. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 335-73.
2. Böhm, H.J., Stahl, M. The use of scoring functions in drug discovery applications. Rev. Comput. Chem. 2002, 18, 41-87.
3. Wang, W., Donini, O., Reyes, C.M., Kollman, P.A. Biomolecular simulations: Recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annu. Rev. Biophys. Biomol. Struct. 2001, 30, 211-43.
4. Shoichet, B.K., McGovern, S.L., Wei, B., Irwin, J.J. Lead discovery using molecular docking. Curr. Opin. Chem. Biol. 2002, 6, 439-46.
5. Reddy, M.R., Erion, M.D. Free Energy Calculations in Rational Drug Design, Kluwer Academic, New York, 2001.
6. Seifert, M.H.J., Kraus, J., Kramer, B. Virtual high-throughput screening of molecular databases. Curr. Opin. Drug Discov. Devel. 2007, 10, 298-307.
7. Jain, A.N. Scoring functions for protein-ligand docking. Curr. Protein Pept. Sci. 2006, 7, 407-20.
8. Schulz-Gasch, T., Stahl, M. Scoring functions for protein-ligand interactions: A critical perspective. Drug Discov. Today Technol. 2004, 1, 231-9.
9. Rajamani, R., Good, A.C. Ranking poses in structure-based lead discovery and optimization: Current trends in scoring function development. Curr. Opin. Drug Discov. Devel. 2007, 10, 308-15.
10. Gohlke, H., Klebe, G. Statistical potentials and scoring functions applied to protein-ligand binding. Curr. Opin. Struct. Biol. 2001, 11, 231-5.
11. Gilson, M.K., Zhou, H.X. Calculation of protein-ligand binding affinities. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 21-42.
12. Meng, E.C., Shoichet, B.K., Kuntz, I.D. Automated docking with grid-based energy approach to macromolecule-ligand interactions. J. Comput. Chem. 1992, 13, 505-24.
13. Goodsell, D.S., Olson, A.J. Automated docking of substrates to proteins by simulated annealing. Proteins 1990, 8, 195-202.
14. Zou, X., Sun, Y., Kuntz, I.D. Inclusion of solvation in ligand binding free energy calculations using the generalized-Born model. J. Am. Chem. Soc. 1999, 121, 8033-43.
15. Wei, B.Q., Baase, W.A., Weaver, L.H., Matthews, B.W., Shoichet, B.K. A model binding site for testing scoring functions in molecular docking. J. Mol. Biol. 2002, 322, 339-55.
16. Liu, H.-Y., Kuntz, I.D., Zou, X. Pairwise GB/SA scoring function for structure-based drug design. J. Phys. Chem. B 2004, 108, 5453-62.
17. Kuhn, B., Gerber, P., Schulz-Gasch, T., Stahl, M. Validation and use of the MM-PBSA approach for drug discovery. J. Med. Chem. 2005, 48, 4040-8.
18. Liu, H.-Y., Zou, X. Electrostatics of ligand binding: Parametrization of the generalized Born model and comparison with the Poisson-Boltzmann approach. J. Phys. Chem. B 2006, 110, 9304-13.
19. Liu, H.-Y., Grinter, S.Z., Zou, X. Multiscale generalized Born modeling of ligand binding energies for virtual database screening. J. Phys. Chem. B 2009, 113, 11793-9.
20. Lyne, P.D., Lamb, M.L., Saeh, J.C. Accurate prediction of the relative potencies of members of a series of kinase inhibitors using molecular docking and MM-GBSA scoring. J. Med. Chem. 2006, 49, 4805-8.
21. Guimaraes, C.R.W., Cardozo, M. MM-GB/SA rescoring of docking poses in structure-based lead optimization. J. Chem. Inf. Model. 2008, 48, 958-70.
22. Thompson, D.C., Humblet, C., Joseph-McCarthy, D. Investigation of MM-PBSA rescoring of docking poses. J. Chem. Inf. Model. 2008, 48, 1081-91.
23. Weiner, P.K., Kollman, P.A. AMBER: Assisted model building with energy refinement. A general program for modeling molecules and their interactions. J. Comput. Chem. 1981, 2, 287-303.
24. Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S., Weiner, P. A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 1984, 106, 765-84.
25. Weiner, S.J., Kollman, P.A., Nguyen, D.T., Case, D.A. An all-atom force field for simulations of proteins and nucleic acids. J. Comput. Chem. 1986, 7, 230-52.
26. Nilsson, L., Karplus, M. Empirical energy functions for energy minimization and dynamics of nucleic acids. J. Comput. Chem. 1986, 7, 591-616.
27. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., Karplus, M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983, 4, 187-217.
28. Jain, A.N. Scoring noncovalent protein-ligand interactions: A continuous differentiable function tuned to compute binding affinities. J. Comput. Aided Mol. Des. 1996, 10, 427-40.
29. Head, R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green, S.M., Marshall, G.R. VALIDATE: A new method for the receptor-based prediction of binding affinities of novel ligands. J. Am. Chem. Soc. 1996, 118, 3959-69.
30. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V., Mee, R.P. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided Mol. Des. 1997, 11, 425-45.
31. Böhm, H.J. The development of a simple empirical scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure. J. Comput. Aided Mol. Des. 1994, 8, 243-56.
32. Wang, R., Liu, L., Lai, L., Tang, Y. SCORE: A new empirical method for estimating the binding affinity of a protein-ligand complex. J. Mol. Model. 1998, 4, 379-94.
33. Böhm, H.J. Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs. J. Comput. Aided Mol. Des. 1998, 12, 309-23.
34. Gehlhaar, D.K., Verkhivker, G.M., Rejto, P.A., Sherman, C.J., Fogel, D.B., Freer, S.T. Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: Conformationally flexible docking by evolutionary programming. Chem. Biol. 1995, 2, 317-24.
35. Gehlhaar, D.K., Bouzida, D., Rejto, P.A. In Rational Drug Design: Novel Methodology and Practical Applications (eds L. Parrill, M.R. Reddy), American Chemical Society, Washington, DC, 1999, pp. 292-311.
36. Wang, R., Lai, L., Wang, S. Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J. Comput. Aided Mol. Des. 2002, 16, 11-26.
37. Tanaka, S., Scheraga, H.A. Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins. Macromolecules 1976, 9, 945-50.
38. Miyazawa, S., Jernigan, R.L. Estimation of effective interresidue contact energies from protein crystal structures: Quasi-chemical approximation. Macromolecules 1985, 18, 534-52.
39. Sippl, M.J. Calculation of conformational ensembles from potentials of mean force. J. Mol. Biol. 1990, 213, 859-83.
40. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235-42.
41. McQuarrie, D.A. Statistical Mechanics, Harper Collins Publishers, New York, 1976.
42. Hendlich, M., Lackner, P., Weitckus, S., Floeckner, H., Froschauer, R., Gottsbacher, K., Casari, G., Sippl, M.J. Identification of native protein folds amongst a large number of incorrect models. The calculation of low energy conformations from potentials of mean force. J. Mol. Biol. 1990, 216, 167-80.
43. Jones, D.T., Taylor, W.R., Thornton, J.M. A new approach to protein fold recognition. Nature 1992, 358, 86-9.
44. Thomas, P.D., Dill, K.A. An iterative method for extracting energy-like quantities from protein structures. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 11628-33.
45. Thomas, P.D., Dill, K.A. Statistical potentials extracted from protein structures: How accurate are they? J. Mol. Biol. 1996, 257, 457-69.
46. Li, X., Liang, J. In Computational Methods for Protein Structure Prediction and Modeling (eds Y. Xu, D. Xu and J. Liang), Springer, New York, Vol. 1, 2006, pp. 71-124.
47. Muegge, I., Martin, Y.C. A general and fast scoring function for protein-ligand interactions: A simplified potential approach. J. Med. Chem. 1999, 42, 791-804.
48. Huang, S.-Y., Zou, X. Inclusion of solvation and entropy in the knowledge-based scoring function for protein-ligand interactions. J. Chem. Inf. Model. 2010, 50, 262-73.
49. Huang, S.-Y., Zou, X. An iterative knowledge-based scoring function to predict protein-ligand interactions: I. Derivation of interaction potentials. J. Comput. Chem. 2006, 27, 1865-75.
50. Huang, S.-Y., Zou, X. An iterative knowledge-based scoring function to predict protein-ligand interactions: II. Validation of the scoring function. J. Comput. Chem. 2006, 27, 1876-82.
51. Velec, H.F.G., Gohlke, H., Klebe, G. DrugScoreCSD-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J. Med. Chem. 2005, 48, 6296-303.
52. Gohlke, H., Hendlich, M., Klebe, G. Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol. 2000, 295, 337-56.
53. Ishchenko, A.V., Shakhnovich, E.I. Small molecule growth 2001 (SMoG2001): An improved knowledge-based scoring function for protein-ligand interactions. J. Med. Chem. 2002, 45, 2770-80.
54. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Thornton, J.M. BLEEP-Potential of mean force describing protein-ligand interactions: I. Generating potential. J. Comput. Chem. 1999, 20, 1165-76.
55. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M.J., Thornton, J.M. BLEEP-Potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data. J. Comput. Chem. 1999, 20, 1177-85.
56. Zhang, C., Liu, S., Zhu, Q., Zhou, Y. A knowledge-based energy function for protein-ligand, protein-protein, and protein-DNA complexes. J. Med. Chem. 2005, 48, 2325-35.
57. Zhao, X., Liu, X., Wang, Y., Chen, Z., Kang, L., Zhang, H., Luo, X., Zhu, W., Chen, K., Li, H., Wang, X., Jiang, H. An improved PMF scoring function for universally predicting the interactions of a ligand with protein, DNA, and RNA. J. Chem. Inf. Model. 2008, 48, 1438-47.
58. Verkhivker, G., Appelt, K., Freer, S.T., Villafranca, J.E. Empirical free energy calculations of ligand-protein crystallographic complexes. I. Knowledge-based ligand-protein interaction potentials applied to the prediction of human immunodeficiency virus 1 protease binding affinity. Protein Eng. 1995, 8, 677-91.
59. DeWitte, R.S., Shakhnovich, E.I. SMoG: De novo design method based on simple, fast, and accurate free energy estimate. 1. Methodology and supporting evidence. J. Am. Chem. Soc. 1996, 118, 11733-44.
60. Nobeli, I., Mitchell, J.B.O., Alex, A., Thornton, J.M. Evaluation of a knowledge-based potential of mean force for scoring docked protein-ligand complexes. J. Comput. Chem. 2001, 22, 673-88.
61. Sotriffer, C.A., Gohlke, H., Klebe, G. Docking into knowledge-based potential fields: A comparative evaluation of DrugScore. J. Med. Chem. 2002, 45, 1967-70.
62. Allen, F.H. The Cambridge Structural Database: A quarter of a million crystal structures and rising. Acta Crystallogr. 2002, B58, 380-8.
63. Yang, C.-Y., Wang, R., Wang, S. M-Score: A new knowledge-based potential scoring function accounting for protein atom mobility. J. Med. Chem. 2006, 49, 5903-11.
64. Mooij, W.T., Verdonk, M.L. General and targeted statistical potentials for protein-ligand interactions. Proteins 2005, 61, 272-87.
65. Muegge, I., Martin, Y.C., Hajduk, P.J., Fesik, S.W. Evaluation of PMF scoring in docking weak ligands to the FK506 binding protein. J. Med. Chem. 1999, 42, 2498-503.
66. Ha, S., Andreani, R., Robbins, A., Muegge, I. Evaluation of docking/scoring approaches: A comparative study based on MMP3 inhibitors. J. Comput. Aided Mol. Des. 2000, 14, 435-48.
67. Muegge, I. PMF scoring revisited. J. Med. Chem. 2006, 49, 5895-902.
68. Zhou, H., Zhou, Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 2002, 11, 2714-26.
69. Wang, R., Lu, Y., Wang, S. Comparative evaluation of 11 scoring functions for molecular docking. J. Med. Chem. 2003, 46, 2287-303.
70. Huang, S.-Y., Zou, X. Ensemble docking of multiple protein structures: Considering protein structural variations in molecular docking. Proteins 2007, 66, 399-421.
71. Huang, S.-Y., Zou, X. Efficient molecular docking of NMR structures: Application to HIV-1 protease. Protein Sci. 2007, 16, 43-51.
72. Huang, S.-Y., Zou, X. An iterative knowledge-based scoring function for protein-protein recognition. Proteins 2008, 72, 557-79.
73. Jacobsson, M., Liden, P., Stjernschantz, E., Boström, H., Norinder, U. Improving structure-based virtual screening by multivariate analysis of scoring data. J. Med. Chem. 2003, 46, 5781-9.

SUBJECT INDEX

Note: The letters ‘f’ and ‘t’ following locators refer to figures and tables, respectively.

ab initio electron correlation methods, 30-32

resolution-of-identity MP2, 31-32

ab initio electron propagator methods

applications

buckminsterfullerene, C60, 86-87

oligonucleotides, 87-91

See also Applications of electron propagator methods

electron propagator theory

quasiparticle virtual orbital spaces, 84-86

self-energy approximations, 81-84

Green’s functions, 80

Accelerator, 4, 4t, 22, 27, 29, 30

ACEMD code of Harvey, 11, 12t, 13-16

PME approach, 11

Acid dissociation constants (pKa), theoretical calculations of

background

gas-phase free energy calculations, 120

solvation free energy calculations, 121-122

thermodynamic cycles, 114-120

calculating changes in free energy in solution, 125-130

calculating changes in free energy in the gas phase, 122-124

chemical accuracy, 114

thermodynamic cycles, 130-132

AFE simulations, See Alchemical free energy (AFE) simulations

AIM2000 program, 70

AIM theory, See Atoms-in-molecules (AIM) theory

Alchemical free energy (AFE) simulations, 51-59

See also Direct scheme AFE simulations; Indirect scheme AFE simulations; QM/MM AFE simulations

AMBER, 4, 11, 12t, 13, 14, 152, 194, 282

AMBER 11

AMBER PMEMD engine, 4

Aminoglycoside derivatives I

structure-based antibiotic design

antibacterial activity of, 151

antibiotics binding to 30S, 151

refinement of aminoglycoside antibiotics, 152

RiboTargets, focus on antibiotics binding to 30S, 151

Aminoglycoside derivatives II, 160-161, 160f

Amphipathic helix insertions, 246, 247, 249

Antibiotics targeting the ribosome

ribosome antibiotic complexes, 140-144

RNA as drug target, 144-147

structure-based antibiotic design, case study

aminoglycoside derivatives I, 151-152

aminoglycoside derivatives II, 160-161

A-site scaffolds, 159

chloramphenicol derivatives, 154-155

designer macrolides, 149-151

designer oxazolidinones, 147-148

pleuromutilin derivatives, 152-154

RNA-directed fragment libraries, 156-158

thiostrepton derivatives, 155-156

Anton, 4

APIs, See Application programming interfaces (APIs)

Application programming interfaces (APIs), 5, 7, 9, 23

Applications of electron propagator methods

buckminsterfullerene, C60

ADC calculations, 86-87

equilibrium structure of, 86

NR2 calculations, 87

OVGF calculations, 86-87

vertical ionization energies of C60 (eV), 87t

calculations with Gaussian03, 86

oligonucleotides

2’,2’-deoxyribodithymidine-3’,5’-monophosphate anion, 89f, 91

deoxyribothymidine monophosphate anion, 89-91

disk space requirements in computer resource calculations, 89

electron transfer phenomena in DNA and RNA, 87-88

electrospray photodetachment photoelectron spectroscopy, 88


P3 studies of the VDEs, 88-89

QVOS method, 89

A-site scaffolds, 160f

structure-based antibiotic design

docking by RiboDock, 159

screening of docked ligands, FRET and NMR analysis, 159

screening strategy, 159

ATI, 4t, 9, 11, 12t, 23, 24

Atomistic modeling of SOFCs

kinetic parameters

computational estimates of, 210-211

experimental estimates of, 206-210

KMC simulations

challenges/issues, 205-206

kinetic parameters, importance, 206, 208t-209t, See also Kinetic parameters in atomistic modeling of SOFCs

rare-event processes, 204

system placed on cubic lattice, events/event rates, 204-205

system propagation through time, 204

timescale inversely proportional to fastest events, 204

simulations, See Atomistic simulations of SOFCs

See also Solid oxide fuel cells (SOFCs)

Atomistic simulations of SOFCs, 212-227

modeling SOFCs with KMC simulations

anode/cathode, influence on electrochemical reactions, 216-217

Bode plots and Nyquist plots, 215-216, 216f

comparison with analytic models, 212

consistency of KMC model, materials-independent parameters, 217-218, 218f

3-D KMC model of a YSZ, 214

one-dimensional (1-D) lattice model, 212-214, 213f

Pt-island, Pt-strap catalysts on top of the YSZ supercell, 224f

SOFC voltage and power density plot of a simulated SOFC, 225f

two-step electron transfer process, DFT calculation, 223

SOFC performance, factors (categories), 212

Atom-randomization reference state

ASP, 289

BLEEP (Mitchell), application to 77 complexes, 288

DrugScorePDB (Gohlke), 288-289

KScore, 289

MScore, 289

prediction of HIV-1 protease binding affinity, 285-287

SMoG for de novo lead design, 287-288

Atoms-in-molecules (AIM) theory, 70, 74

Azithromycin, 143t, 144f, 149-151

BAR domains, 238-240, 241f, 246, 248-250, 251f, 252, 256

Berendsen thermostat, 267, 268f

BigDFT software, 30

See also Daubechies wavelets

“Blocking” electrodes, 215

B3LYP/6-311++G** model, 88, 124, 127-129

Cache units in CPU, 6

CADD, See Computer-aided drug design (CADD)

Cahn-Hilliard (CH) equations, 240, 242, 244, 245

Cambridge Structural Database (CSD), 151, 289

Carbidopa, 106

Catalysis, 66, 99, 169-195

See also RNA catalysis, conformational transitions/metal ion binding in

CCdA-Puromycin cofactor, 154

Cellular signaling

processes

endocytosis, 247

enzyme activation, 247

ion-channel activation, 247

See also Modeling signaling processes across cellular membranes

“Chaperone,” 54

Chaperoned strategy, 54

CHARMM code, 13, 146, 194, 282

Chemical footprinting, 152-155

Chloramphenicol derivatives

structure-based antibiotic design

antibacterial properties, modification of, 154

binding, cocrystallization in Deinococcus vs. Haloarcula, 154

molecular modeling by MacroModel, 154-155

Clarithromycin, 149-150

CM3 charge model, 39

Coarse-grained models, 267, 274

Combined QM and MM (QM/MM) potentials, 53

Computer-aided drug design (CADD), 99-100

Compute Unified Device Architecture (CUDA), 7-10, 15, 16, 23-24, 26, 28, 30, 32

Conductor-Like Polarizable Continuum Model (CPCM), 122, 125, 127t, 128-130, 132


Conformational transitions/metal ion binding in RNA catalysis

HHR

crystallographic studies, 172

drug design/discovery, 171

inhibitor of BCR-ABLi1 gene expression, 171

inhibitor of hepatitis-B virus gene expressions, 171

potential anti-HIV-1 therapeutic agent, 171

role in posttranscriptional gene regulation, 171

L1 ligase ribozyme

crystal structure of, 172

dynamical hinge points, identification, 173

in vitro selection techniques, 172

methods, 194

molecular simulations of HHR

metal binding modes, 173-181

simulations along the reaction coordinate, 181-183

simulations of mutations of key residues, 183-187

molecular simulations of L1 ligase

anatomy of the ligation site and implications for catalysis, 191-193

conformational variation of L1L at dynamical hinge points, 188-190

U38 loop for allosteric control, 190-191

“multiscale models”

combined QM/MM potential, 170-171

protein enzyme systems

electrostatic interactions in, 171

microscopic in silico model, 171

semiempirical quantum models, need for, 171

RNA as a messenger, 170

Conformational sampling, 56, 58

Construction of multidimensional PES, factors, 44

ene reaction between singlet O2 and tetramethylethylene, 45f

3-D PMF calculations, 45-46

rate-limiting transition structure, 45f, 46

three reaction coordinates, 45

“two-step no-intermediate” mechanism, 44

modeling of larger molecules, problems, 44

Corrected reference state

DFIRE, 290-291

ligand volume correction factor, 289-290

test set of 77 protein-ligand complexes for PMF scoring function, 290

Correlation states, 92

CPCM, See Conductor-Like Polarizable Continuum Model (CPCM)

CPU units, 6, 6f


Crystallography, 66, 67, 68, 99, 139-140, 147, 150, 151, 159, 269

See also X-ray crystallography for detection of metalloproteins

CSD, See Cambridge Structural Database (CSD)

CUDA, See Compute Unified Device Architecture (CUDA)

Daubechies wavelets, 30

Deinococcus, 141f, 143, 154

Density-fitted Poisson method, 29-30

Density functional theory (DFT), 22, 24, 26, 28-31, 33, 67, 69, 70, 97-111, 123-124, 128, 210-211, 214-215, 223

Deoxyribothymidine monophosphate anion, 89-91, 89f

Designer macrolides

structure-based antibiotic design

antibacterial activity against resistant organisms, 150, 150f

crystallography and molecular modeling, 150

positioned near linezolid binding to RNA, 149f

Designer oxazolidinones

structure-based antibiotic design

linezolid binding to ribosomal RNA, 147

oxazolidinone ring and U2539, interaction, 147

proximity of linezolid, linking algorithm, 147

QSAR modeling, 147

R�-01 family of compounds, 148

sparsomycin/linezolid proximity, schemes, 147-148, 148f, 149f

Development of algorithms on GPUs, strategies, 8

DFIRE, See Distance-scaled finite ideal-gas reference (DFIRE)

DFT-D (DFT plus dispersion) method, 103, 106, 108

Dielectric Polarizable Continuum Model (DPCM), 121, 125, 126, 127, 128

Dihydrobutyrine, 155

Direct scheme AFE simulations, 53-54

“annihilated” atoms, extrapolation strategy for, 54

calculations, electronic switching, 53-54

chaperoned approach, 54

Distance-scaled finite ideal-gas reference (DFIRE), 290-291

Divalent metal ions, 171, 172, 173

3-D KMC model of a YSZ

comparison with SSZ, 214


DFT calculations

generation of database of 42 migration energy barriers, 214-215

DFT+KMC modeling, study, 215

EIS studies, 215

DOCK version 4.0, 151

Domain, definition, 266

“Dominant” (rate-limiting) events, 206

DOs, See Dyson orbitals (DOs)

DPCM, See Dielectric Polarizable Continuum Model (DPCM)

D. radiodurans 50S, 153

DrugScoreCSD, 287t, 289

DrugScoreRNA, 146

Dynamic Monte Carlo (DMC) scheme, 245

Dyson orbitals (DOs), 81, 83

EIS, See Electrochemical impedance spectroscopy (EIS)

Electrochemical impedance spectroscopy (EIS), 215, 216f

Electrochemistry, 207, 210, 223

Electron detachment energies, 85, 88

Electron propagator theory

quasiparticle virtual orbital spaces, 84-86

self-energy approximations

canonical Hartree-Fock orbitals, choice of, 82

Davidson diagonalization procedure, 84

Dyson equation, FDAs and electron binding energies computation, 81-82

Hermitian superoperator Hamiltonian matrices, 83

pole strengths (PSs), evaluation, 84

quasiparticle approximations, approaches, 82-83

quasiparticle techniques, 82

renormalized methods, 84

Electron repulsion integrals (ERIs), 24-28

calculation of, CUDA implementation for (Asadchev), 28

McMurchie-Davidson scheme, 26-27

mixed-precision (MP) CPU/GPU scheme, 26

Rys quadrature scheme, 26

Electron spin resonance, 66

Electron transfer phenomena in DNA and RNA, 87-88

Electrospray photodetachment photoelectron spectroscopy, 88

Ellipticine, 107

Empirical scoring function, 155, 282

ERI calculation, CUDA implementation for (Asadchev), 28

ERIs, See Electron repulsion integrals (ERIs)

Ewald summation-based methods, 55-56

FAAH, See Fatty acid amide hydrolase (FAAH)

FASTRUN, 4, 4t

Fatty acid amide hydrolase (FAAH), 40

FDAs, See Feynman-Dyson amplitudes (FDAs)

FEP technique, See Free-energy perturbation (FEP) technique

Feynman-Dyson amplitudes (FDAs), 81

First generation GPUs, 24

First-order generalized ensemble-based QM/MM AFE simulations, 56-57

advantage, 57

replica exchange-based strategy, 57

simulated scaling-based strategy

in direct/indirect scheme, 57

Wang-Landau recursion method/metadynamics recursion method, 57

Folding of conjugated proteins

chaperones, role, 264

disordered tails, effects on protein characteristics, 265

funnel theory of protein folding, 264

glycosylated SH3 protein, 264f

glycosylation

deciphering a glycosylation code, difficulties, 265

methods

coarse-grained models, 267

inhomogeneous degrees of freedom, 267

native topology-based model, 267

thermostat effects on temperature of the conjugate, 267, 268f

multidomain proteins, 264f, 266

myristoylation and palmitoylation, 264

phosphorylation, 264

PTMs, role

glycosylation and effect of glycans on folding, 264-265

results and discussion

folding of glycoproteins, 267-270

folding of multidomain proteins, 273-275

folding of proteins with flexible tails, 270-271

folding of ubiquitinated proteins, 271-273

tailed SH3 protein, 264f

ubiquitinated Ubc7 protein (monomeric/tetrameric ubiquitin), 264f

ubiquitination, 264, 265-266

Folding of glycoproteins, 267-270

biophysical characteristics of glycoproteins

effect of degree of glycosylation on protein biophysics, 269-270, 269f


number of native contacts at conjugation/glycosylation site, 268f

protein stability, influence of glycan, 267

thermodynamic analyses of the simulations, results, 270

Folding of multidomain proteins, 273-275

FNfn9 domain, 274

thermal stability of multidomain FNfn9-FNfn10 protein, 274-275, 274f

WHAM, study of thermodynamic properties, 274

Folding of proteins with flexible tails, 270-271

characteristics, 270f

entropy of the protein, opposing factors, 271

longer tails, destabilization of protein, 270-271

repulsive interaction effects between the tail and protein, 270f, 271

Folding of ubiquitinated proteins, 271-273

protein degradation, 271-272

native-state simulation models, 272

thermostability of ubiquitinated proteins, 272f

affecting factors, 273

Force field scoring functions

AMBER/CHARMM force fields, 282

semiempirical weighting or scaling parameters, 282

van der Waals/electrostatic terms, energy components, 282

Free-energy perturbation (FEP) technique, 38, 39, 40, 41, 43, 46

Free-energy profiles, computation of

methods

FEP method, 39

MC sampling methods, 39

QM/MM calculations, methods, 39, See also Quantum and molecular mechanical (QM/MM)

multidimensional potentials of mean force, 44-46

construction of PES, factors, 44

PMF calculation, approaches, 38-39

multidimensional computational technique, 39

polynomial quadrature method, applications, 39

Polynomial Quadrature Method, 40-44

Zwanzig expression for FEP technique, 38

Free energy simulations, types

molecular dynamics (MD), 52

Monte Carlo (MC), 52

Frequency response characteristics, fuel cell, 220

Fullerene ionization energies, 86, 87t

Fullerenes, 79-92

Gaussian 98 program, 69

Generalized ensemble simulation, 56-59

Generalized gradient approximation (GGA), 102

Generalized solvent boundary potential (GSBP) method, 55

GGA, See Generalized gradient approximation (GGA)

Glycoproteins, folding of, 267-270

biophysical characteristics of glycoproteins

effect of degree of glycosylation on protein biophysics, 269-270, 269f

number of native contacts at conjugation/glycosylation site, 268f

protein stability, influence of glycan, 267

thermodynamic analyses of the simulations, results, 270

Glycosylation, 264-265, 267-268, 269f, 270

“Glycosylation code,” 265

Glycosylphosphatidylinositol (GPI)-anchored proteins, 239

GPCRs, See G-protein-coupled receptors (GPCRs)

G-protein-coupled receptors (GPCRs), 238

GPU-based (early) MD code development, 9-10

cell-based list algorithm for neighbor list, 10

electrostatic model, simulations of liquid water (Davis), 10

HOOMD GPU implementation, 10

MD implementation for molecular modeling computations

techniques for direct Coulomb summation, 10

MD implementation to use CUDA

van der Waals potential, use of (Liu), 10

GPU programming languages

AMD’s Stream, 7

CUDA programming model from NVIDIA, 7-8

scientific use, extremes, 7

GPU, See Graphics processing units (GPUs)

GPUs, software development for

first generation of GPUs, features, 24

high density of arithmetic units, example

FFTW Fourier transform library, portable algorithm, 23

GPU kernels/streams, 23

GPU programming, problems, 23

NVIDIA GeForce 8800 GTX GPU, 22-23

parallel programming of memory, 23

second generation of GPUs, features, 24

third generation of GPUs, features, 24

GPU vs. CPU, 6-7

CPU

RAM, function of, 6

sequential code execution, von Neumann architecture, 6

SISD category, 6

units of, 6, 6f

GPU

display of 3D graphics, strategy, 6

larger memory bandwidth, 7

larger number of ALUs, 7

RAM, function of, 6

SIMD category, 7

units of, 7

Graphics processing units (GPUs), 3-17

APIs, 5

GPU hardware, applications, 5

peak floating-point operations per second, 5f

Voodoo graphics chip (1996), 5

Green’s functions, 80

Grow-search-score algorithm (Analog™ or BOMB), 147

GSBP method, See Generalized solvent boundary potential (GSBP) method

Gouy-Chapman model, 212, 214

Half-cell SOFC model, 216-217

Haloarcula, 154

Hammerhead ribozyme (HHR), 171-172

crystallographic studies, 172

drug design/discovery, 171

inhibitor of BCR-ABLi1 gene expression, 171

inhibitor of hepatitis-B virus gene expressions, 171

potential anti-HIV-1 therapeutic agent, 171

role in posttranscriptional gene regulation, 171

See also Molecular simulations of HHR

Hartree-Fock (HF) theory, 22, 24, 82, 83-86, 90, 98, 122, 123, 124

HB (with distal His) models, 69

Heme systems, investigation approaches, 69-70

Bader’s AIM theory, 70

DFT method BPW91 with Wachters’ basis for Fe, 69

6-311G* for heavy atoms, 69

6-31G* for hydrogens in Gaussian 98 program, 69

HHR, See Hammerhead ribozyme (HHR)

HMG-CoA reductase, 110

HOOMD code, 10

Hybrid GGA methods, 102

Hydrogen bonding, 43, 46, 68, 71, 71f, 72-74, 106, 108, 109, 110, 145, 146, 151, 153, 159, 183, 186f, 187, 188

Hydrogen bonding in metalloprotein quantum chemical investigation of Mössbauer properties, examples

oxymyoglobin, MbO2, 68

Pauling-type closed-shell singlet ¹FeII-¹O2, 68

QM/S approach, results, 68

Weiss-type open-shell singlet ²FeIII↑↓²O2, 68

Hygromycin (PDB 1HNZ), 143t, 151

Indirect scheme AFE simulations, 54-55

Induction/dispersion interactions in ligand-protein complexes

applications

binding of steroid hormones, examination, 108

development of anticancer drugs (DFT), 108

development of novel statin drugs, 110

DFT-QSAR approach, study, 109-110

hydrogen bonding on peptide structure, DFT study, 110

interactions in polypeptide structure, study, 109

modeling of coordination chemistry of ligands bound to metal ions, 109

chemical accuracy, importance, 98

correct path to dispersion

DFT-D method, augmentation of TPSS functional, 103

explicit treatment of dispersion, need for (Hobza), 104

interaction energies in JSCH-2005 database, evaluation, 103

interactions of DNA bases and ellipticine, study, 107

modeling of aromatic interactions, local functionals in, 105

M05-2X, DFT exchange and correlation energy, 104-105

parameterizing DFT methods, 105

PWB6K/M05-2X, dispersion-augmented functionals, 103

DFT

GGA DFT methods, 102

hybrid/meta-GGA approach, improved GGA methods, 102

KS implementation of DFT, 100-101

LDA methods, 101

LSDA methods, 101-102

SVWN DFT method, 102

ligand-protein complexes

CADD, ligand docking and pose scoring, 99-100

DFT, 100

drug-protein binding, 99

hydrophilic or lipophilic active sites, 99

π-stacking aromatic interactions (GlaxoSmithKline), 99

modeling of biological processes, 98

modeling of covalent interactions, methods

DFT, 98

HF ab initio methods, 98

molecular mechanics, 98

post-HF methods, 98

semiempirical methods, 98

ITScore (Huang and Zou), 286t, 287t, 291-292

JSCH-2005 database, 103, 105, 106, 107

Kahan Summation Formula, 32

Kemp elimination reactions, 41

of benzisoxazole with piperidine, 43-44, 44f

cubic polynomial quadrature method, drawback, 41

free energy profile of 5-nitro-benzisoxazole by antibody 4B2, 42f

5-nitro-benzisoxazole-antibody 4B2 system

free energy profile of, 42f, 43

proton transfers, 42f

testing with higher-order polynomial, 41-43

Kernels, 8

Kinetic Monte Carlo (KMC) simulation, 203, 204-206

Kinetic parameters in atomistic modeling of SOFCs

computational estimates of, 210-211

charge transfer into and out of the YSZ, rate expression for, 211

DFT+KMC multiscale simulation approach, 211

DFT method, example, 210-211

UBI-QEP, 211

experimental estimates of, 206-210

cathode/anode, influence on electrochemical reactions, 207-210

doping strategies, 206-207

ideal electrode materials, 207

kinetic parameters used in KMC simulations for different reactions, 208t-209t

KMC simulation, See Kinetic Monte Carlo (KMC) simulation

Knowledge-based scoring functions, See Mean-force scoring function (SMoG)

Kohn-Sham density functional theory (KS-DFT), 24, 25t

KScore, 146

KS-DFT, See Kohn-Sham density functional theory (KS-DFT)

KS-DFT and HF theory

density-fitted Poisson method, 29-30

DFT with Daubechies wavelets, 30

electron repulsion integrals, 25-28

calculation of, CUDA implementation for (Asadchev), 28

McMurchie-Davidson scheme, 26-27

mixed-precision (MP) CPU/GPU scheme, 26

Rys quadrature scheme, 26

evaluation of SCF equations/ERIs, 24, 25t

numerical exchange-correlation quadrature, 28-29

Langevin thermostat, 267, 268f

L11BD, See L11 ribosomal protein-binding domain (L11BD)

LDA method, See Local density approximation (LDA) method

Ligand-protein complexes, induction/dispersion interactions in

applications

binding of steroid hormones, examination, 108

development of anticancer drugs (DFT), 108

development of novel statin drugs, 110

DFT-QSAR approach, study, 109-110

hydrogen bonding on peptide structure, DFT study, 110

interactions in polypeptide structure, study, 109

modeling of coordination chemistry of ligands bound to metal ions, 109

chemical accuracy, importance, 98

correct path to dispersion

DFT-D method, augmentation of TPSS functional, 103

explicit treatment of dispersion, need for (Hobza), 104

interaction energies in JSCH-2005 database, evaluation, 103

interactions of DNA bases and ellipticine, study, 107

modeling of aromatic interactions, local functionals in, 105

M05-2X, DFT exchange and correlation energy, 104-105

parameterizing DFT methods, 105

PWB6K/M05-2X, dispersion-augmented functionals, 103

DFT

GGA DFT methods, 102

hybrid/meta-GGA approach, improved GGA methods, 102

KS implementation of DFT, 100-101

LDA methods, 101

LSDA methods, 101-102

SVWN DFT method, 102

ligand-protein complexes

CADD, ligand docking and pose scoring, 99-100

DFT, 100

drug-protein binding, 99

hydrophilic or lipophilic active sites, 99

π-stacking aromatic interactions (GlaxoSmithKline), 99

modeling of biological processes, 98

modeling of covalent interactions, methods

DFT, 98

HF ab initio methods, 98

molecular mechanics, 98

post-HF methods, 98

semiempirical methods, 98

Lipid rafts, 238-239, 246, 256

Lipinski Rule of 5, 150

L1 ligase ribozyme

crystal structure of, 172

dynamical hinge points, identification, 173

in vitro selection techniques, 172

See also Molecular simulations of L1 ligase

Local density approximation (LDA) method, 101

Local spin density approximation (LSDA) method, 101-102

Long-range electrostatic interaction, 53

LRH0n, 105

L11 ribosomal protein-binding domain (L11BD), 155

LSDA method, See Local spin density approximation (LSDA) method

Macrolides (designer)

structure-based antibiotic design

antibacterial activity against resistant organisms, 150, 150f

crystallography and molecular modeling, 150

positioned near linezolid binding to RNA, 149f

MacroModel, 154

Macromolecular ribosome machine, function, 140

MbO2 active site model

AIM results of, 73-74, 74t

geometric parameters of, 72t

molecular structure of, 68f

Mössbauer parameters of, 73t

Mulliken spin densities, charges, and energies of, 71t

McMurchie-Davidson scheme, 26-27

MC sampling methods, 39

MD, See Molecular dynamics (MD)

MDGRAPE system, 4

MD simulation

of an enzyme (McCammon), 4

of condensed-phase biological systems, examples

ATOMS, 4

FASTRUN, 4

MDGRAPE system, 4

MD hardware acceleration projects, cost estimates, 4t

MD simulation of biomolecules on GPU

applications

protein folding, 15-16

GPU-based implementations of classical molecular dynamics

early GPU-based MD code development, 9-10

production GPU-based MD codes, 11-12

GPU programming

considerations, 8

GPU/CPU hardware differences, See GPU vs. CPU

GPU programming languages, emergence of, 7-8

performance and accuracy

performance and scaling, 13-14

validation, single precision treatments, 14-15

See also Graphics processing units (GPUs)

Mean-field theory, 247, 249

Mean-force scoring function (SMoG), 287

pairwise characteristics, 282

Membrane curvature, 242, 245, 246, 248, 249, 250, 252

Membrane elasticity, 241

Mesoscopic model of membrane-associated signaling complexes

accounting for amphipathic helix insertions, 246

free energy minimization, 244-245

quantitative description of peripheral protein diffusion

DMC scheme, 245

strategy, 240-241

system representation and governing free energy

BAR-membrane complex at steady state, 241f

2D mixing entropy, 243

electrostatic (Coulomb) energy of system, 242

translational entropy of mobile (salt) ions in solution, 242

zwitterionic lipids, CH equations, 242

Messenger RNA (mRNA), 140, 171

Meta-GGA methods, 102

Metalloproteins, quantum chemical calculations

computational details, 68-70

heme systems, See Heme systems, investigation approaches

molecular structure of the basic MbO2 active site model, 68f

non-HB/HB models, 69

hydrogen bonding in metalloprotein quantum chemical investigation of Mössbauer properties, examples, 68

results and discussion

AIM results of optimized MbO2 models, 73-74, 74t

geometric parameters of optimized MbO2 models, 72t

hydrogen-bonding energies, calculations, 71-72

Mössbauer parameters of optimized MbO2 models, 73t

Mulliken spin densities/charges/energies of optimized MbO2 models, 71t

spin density distribution in 2B model, 71, 71f

structure determination of metal sites, spectroscopic techniques, 66

“fingerprints” of protein systems, 66

NMR spectroscopy, 67

X-ray crystallography, 66-67

X-ray structures vs. NMR structures, 66

X-ray structures

accuracy problems, 66

QM/S techniques, determination of protonation states, 67

QSOR approach, 67

refinement of, linear-scaling quantum chemical methods, 66-67

Mg2+ ions, 176, 194

See also Molecular simulations of HHR

MicroRNA, 145

Mixed-precision (MP) CPU/GPU scheme, 26

MM energy model, See Molecular mechanical (MM) energy model

Model applications in cellular signal transduction

adsorption of natively unstructured protein domains onto lipid membranes, 252-256

diffusion of peripheral proteins on lipid membranes, 253-254

PIP2, role in anchoring proteins to specialized membrane domains, 256

sequestration of PIP2 lipids by adsorbing basic polypeptides, 252-253, 253f

slow/fast protein diffusing over PIP2-containing vs. PIP2-depleted membranes, 254-255

BAR-membrane interactions, 248-252

adsorption of amphiphysin BAR domain on lipid membranes, 249-250, 250f

elements of membrane remodeling by BAR domains, 248-249

membrane tubulation/vesiculation by arrays of BAR domains, 252

N-helix insertions, role in membrane deformations, 250-252, 251f

future prospects, 256-257

PIP2 and cellular signaling-mechanisms of membrane targeting

amphipathic helix insertions, 247

model predictions, 247

polyvalent over monovalent lipids, electrostatic targeting, 247

protein binding/adsorption to lipid membranes, 247

Modeling signaling processes across cellular membranes

large-scale quantitative models, need for

MD/MC simulations, perennial challenges/complexity, 239-240

mean-field-type theories, 240

self-consistent scheme, advantages, 240

lipid rafts, platforms for cellular signaling, 238-239

BAR domains, function of, 238-239

biochemical and biophysical studies, 239

GPI-anchored proteins or TM domains, 239

PIP2 lipids, 238

plasma membrane enriched in cholesterol and sphingolipids, 238

mesoscopic model of membrane-associated signaling complexes

accounting for amphipathic helix insertions, 246

free energy minimization, 244-245

quantitative description of peripheral protein diffusion, 245

strategy, 240-241

system representation and governing free energy, 241-244

model applications

adsorption of natively unstructured protein domains onto lipid membranes, 252-256

BAR-membrane interactions, 248-252

PIP2 and cellular signaling-mechanisms of membrane targeting, 246-247

Modeling SOFCs with KMC simulations

anode/cathode, influence on electrochemical reactions, 216-217

cathode-only ORR KMC simulation, 217

cathode-only YSZ fuel cell model, 217f

cathode-only SOFC model, 218-220

fitted voltage, current density and phase angle shift, 220-221, 221f

frequency response characteristics of the fuel cell, 220

logarithmic plot of resistance vs. YSZ thickness, 219f

Nyquist plots/Bode plots, 220-223, 222f

sensitivity analysis of ionic current density on various kinetic parameters, 219f

Molecular docking, 110

See also Protein-ligand docking

Molecular dynamics (MD), 4

See also MD simulation; MD simulation of biomolecules on GPU

Molecular mechanical (MM) energy model, 52

Molecular modeling, 150

Molecular simulations of HHR

metal binding modes

bridging of Mg2+ ion, effects, 175-176, 175t

“B-site,” 173

coordination patterns of Mg2+ and Na+ ions in active site, 179t

“C-site,” 173

3D density contour maps of Na+ ion distributions, 180f

divalent and monovalent metal ion binding modes, 174f

dRT-C-Mg simulation, 175

Na+ binding patterns, in-line attack conformations, 176-178, 177f, 178f

occupation of cations in HHR active site, 179-181

simulations along the reaction coordinate

comparison of crystallographic and simulation data, 182t

metal-assisted proton transfer in the general acid step, 182-183

molecular dynamics studies of transition state mimics, 181-182

simulations of mutations of key residues, 183-187

simulation-derived key active site structural parameters, 184t-185t

Molecular simulations of L1 ligase

conformational variation of L1L at dynamical hinge points, 188-190

ligation site anatomy and implications for catalysis, 191-193

U38 loop for allosteric control, 190-191

Møller-Plesset perturbation theory, 31, 123

Mono-protonation, 67

MORDOR, 158

Mössbauer, 66, 68, 69, 70, 72-73, 73t

MP2, See Second-order Møller-Plesset perturbation theory (MP2)

mRNA, See Messenger RNA (mRNA)

Multidomain proteins, 264f, 266

folding of, 273-275

FNfn9 domain, 274

thermal stability of multidomain FNfn9-FNfn10 protein, 274-275, 274f

WHAM, study of thermodynamic properties, 274

M05-2X functional, 103-105

Myristoylation and palmitoylation, 264

NAMD, 10, 11, 12t, 13, 14, 15, 194

N-helix insertions, role in membrane deformations, 250-252, 251f

NMR, See Nuclear magnetic resonance (NMR)

Non-HB (without distal His) models, 69

Nonpolar amino acids, 99

Nuclear magnetic resonance (NMR), 39, 66, 67, 99, 151, 155, 157, 158, 159, 181, 268

NVIDIA, 4t, 5f, 7, 9, 11, 12t, 14, 16, 23, 24, 26, 27, 28, 29, 32, 33

Oligonucleotides

2’,2’-deoxyribodithymidine-3’,5’-monophosphate anion, 89f, 91

deoxyribothymidine monophosphate anion, 89-91

disk space requirements in computer resource calculations, 89

electron transfer phenomena in DNA and RNA, 87-88

electrospray photodetachment photoelectron spectroscopy, 88

P3 studies of the VDEs, 88-89

QVOS method, 89

One-dimensional (1-D) lattice, KMC model, 212-214, 213f

OpenMM library of Friedrichs, 11

GROMACS, 11

ORR, See Oxygen reduction reaction (ORR)

Orthogonal space random walk (OSRW) technique, 58

OSRW technique, See Orthogonal space random walk (OSRW) technique

Oxazolidinones (designer)

structure-based antibiotic design

linezolid binding to ribosomal RNA, 147

oxazolidinone ring and U2539, interaction, 147

proximity of linezolid, linking algorithm, 147

QSAR modeling, 147

Rx-01 family of compounds, 148

sparsomycin/linezolid proximity, schemes, 147-148, 148f, 149f

Oxygen reduction reaction (ORR), 207

Oxymyoglobin, 68, 69, 72, 73

Pactamycin (PDB 1HNX), 143t, 151

Paromomycin, 143t, 151-152, 152f

Particle mesh Ewald (PME) method, 11, 56, 194

Pauling-type closed-shell singlet, 68, 71

PDB, See Protein Data Bank (PDB)

Peptidyl transferase center (PTC), 141

Peripheral protein diffusion, 245, 253-254

PESs, See Potential energy surfaces (PESs)

Phosphorylation, 264

PIP2, See Polyvalent phosphatidylinositol 4,5-bisphosphate (PIP2)

PIP2 and cellular signaling-mechanisms of membrane targeting

amphipathic helix insertions, 247

model predictions, 247

polyvalent over monovalent lipids, electrostatic targeting, 247

protein binding/adsorption to lipid membranes, 247

Pleuromutilin derivatives, 152, 153f

structure-based antibiotic design

chemical footprinting of ribosomal residues, 152-153

docking experiments, 153-154

macrolide core extensions, effects, 153

PME method, See Particle mesh Ewald (PME) method

PMF, See Potential of mean force (PMF)

Poisson-Boltzmann (PB) theory of electrostatics, 55, 125, 241

Polar amino acids, 99

Polynomial Quadrature Method, 40-44

cubic polynomial methodology

application in Kemp elimination reactions, 41, See also Kemp elimination reactions

drawbacks, 41

FAAH, proton transfer reactions, 40-41

Polyvalent phosphatidylinositol 4,5-bisphosphate (PIP2), 238

See also PIP2 and cellular signaling-mechanisms of membrane targeting

Potential energy surfaces (PESs), 39

Potential of mean force (PMF), 38, 146, 282, 283-284, 288

Potential of mean force scoring functions, See Mean-force scoring function (SMoG)

Production GPU-based MD codes, 11-12

GPU-accelerated MD codes, key feature comparison, 12t

production quality codes

ACEMD code of Harvey, 11

AMBER 11, 11

NAMD of Phillips, 11

OpenMM library of Friedrichs, 11

Protein Data Bank (PDB), 282

Protein folding

Hpin1 WW domain (Fip35)/NTL9, study

trajectory length calculations, 16

MD simulation of villin headpiece, 16

OpenMM library, use of, 15

See also Folding of conjugated proteins

Protein-ligand binding, SMoG for

protein-ligand docking

atom-randomization reference state,

285-289

corrected reference state, 289-291

scoring function problem, 282

scoring functions, categories, 282

theoretical background

evaluation of protein structure models (Tanaka and Scheraga), 283

mean-force pair potential, 283

pair distribution function g(r), potential of mean force w(r), and true interatomic potential u(r), 284f

potential energy of the system, computation, 283

reference state, 284

reference state calculation, criteria/challenges, 284-285

statistical or knowledge-based potentials, 284

without the use of reference state, 291-292

iterative method, 291

ITScore (Huang and Zou), 291-292

ITScore/SE, 292

Protein-ligand docking

atom-randomization reference state

ASP, 289

BLEEP (Mitchell), application to 77 complexes, 288

DrugScorePDB (Gohlke), 288-289

KScore, 289

MScore, 289

prediction of HIV-1 protease binding affinity, 285-287

SMoG for de novo lead design, 287-288

binding mode and affinity predictions, 286t, 287t

corrected reference state

DFIRE, 290-291

ligand volume correction factor, 289-290

test set of 77 protein-ligand complexes for PMF scoring function, 290

Pseudo-base pairing, 145

PTC, See Peptidyl transferase center (PTC)

Puromycin, 148

PWB6K functional, 103

QM/MM, See Quantum and molecular mechanical (QM/MM)

QM/MM AFE simulations

AFE simulations by MM energy model, advantages, 52-53

direct scheme AFE simulations, 53-54

free energy simulations, types

molecular dynamics (MD), 52

Monte Carlo (MC), 52

indirect scheme AFE simulations, 54-55

long-range electrostatic treatments

Ewald summation-based methods, 55-56

GSBP method, 55

major challenges, 53

sampling issue

“conformational sampling” issue, 56

in direct/indirect scheme AFE simulations, 56

first-order generalized ensemble sampling, 56-57

OSRW method, future improvements, 57-58

“overlap sampling” issue, 56

QM/MM calculations, 39, 55, 72, 173

QM/S techniques, See Quantum mechanics and spectroscopy (QM/S) techniques

QSOR, See Quantitative structure observable relationship (QSOR)

Quantitative structure observable relationship (QSOR), 67

Quantum and molecular mechanical (QM/MM), 39

calculations, methods

CM3 charge model, computation of atomic charges, 39

intermolecular interactions, condensed-phase reactions, 40

modified link-atom approach, enzymatic reactions, 40

OPLS-AA force field, 39

PDDG/PM3 semiempirical QM method, 39

for protein systems, 39-40

Quantum chemical calculations, 65-75

Quantum chemistry on GPUs

ab initio electron correlation methods

resolution-of-identity second-order Møller-Plesset perturbation theory, 31-32

KS-DFT and HF theory

density-fitted Poisson method, 29-30

DFT with Daubechies wavelets, 30

electron repulsion integrals, 25-28

numerical exchange-correlation quadrature, 28-29

Quantum Monte Carlo (QMC) calculations, 32

implementation of quantum cluster approximation on SP GPUs, 32-33

Kahan Summation Formula, 32

software development for GPUs, 22-24

strategies for implementation, 22

Quantum mechanics and spectroscopy (QM/S) techniques, 67

Quantum Monte Carlo (QMC) calculations, 32

Kahan Summation Formula, 32

quantum cluster approximation on SP GPUs, 32-33

CUBLAS library, 32

Quasiparticle approximations, 82-83, 92

Quasiparticle virtual orbitals, 84-86

Quasiparticle, virtual orbital space (QVOS), 86

QVOS, See Quasiparticle, virtual orbital space (QVOS)

Radezolid, 148

“Rafts,” 238

RAM, See Random access memory (RAM)

Random access memory (RAM), 6

“Real-world” simulations, 15

Reference state, 80, 83, 84, 284, 285, 287, 288, 289, 291-292

See also Atom-randomization reference state; Corrected reference state

Renormalized approximations, 83, 92

Retapamulin, 152, 153f

RiboDock’s scoring function, 146

Ribosomal-mutilin complexes, 154

Ribosomal RNA (rRNA), 140, 147, 151, 152, 154, 156, 160

Ribosome antibiotic complexes, 140-144

antibiotics bound to 50S or 30S, 142f

conventional drug design, effectiveness, 144

crystal structures of, 143t

crystal structures of the 50S, 30S, and 70S, 141f

high-resolution crystal structures, 143-144

location of A-, P-, E-sites and PTC on 50S and 30S, 142f

macromolecular ribosome machine, function, 140

peptide bond formation, 141

structure/function elucidation of ribosome (Ramakrishnan, Steitz, and Yonath), 140

structures of antibiotic classes that bind to 50S/30S subunit, 144f

RiboTargets, 151

focus on antibiotics binding to 30S

DOCK version 4.0, use of, 151

Ribozymes, 145, 171-172

See also L1 ligase ribozyme

RNA as drug target

antibiotic ligand design, factors, 145

scoring functions, 145-146, See also Scoring functions in designing ligands to bind RNA

structure-based drug design of protein-binding ligands, 145

interaction of ligand complexes

driving forces for binding, 145

pseudo-base pairing, 145

RNA catalysis, conformational transitions/metal ion binding in HHR

crystallographic studies, 172

drug design/discovery, 171

inhibitor of BCR-ABLi1 gene expression, 171

inhibitor of hepatitis-B virus gene expressions, 171

potential anti-HIV-1 therapeutic agent, 171

role in posttranscriptional gene regulation, 171

L1 ligase ribozyme

crystal structure of, 172

dynamical hinge points, identification, 173

in vitro selection techniques, 172

methods, 194

molecular simulations of HHR

metal binding modes, 173-181

simulations along the reaction coordinate, 181-183

simulations of mutations of key residues, 183-187

molecular simulations of L1 ligase

anatomy of the ligation site and implications for catalysis, 191-193

conformational variation of L1L at dynamical hinge points, 188-190

U38 loop for allosteric control, 190-191

“multiscale models”

combined QM/MM potential, 170-171

protein enzyme systems

electrostatic interactions in, 171

microscopic in silico model, 171

semiempirical quantum models, need for, 171

RNA as a messenger, 170

RNA-directed fragment libraries, 156-158

structure-based antibiotic design, 157f, 158f, 159f

2-aminoquinoline lead, docking of, 157-158

fragment screening approaches, 156-157

identification of RNA ligands, 158

MORDOR, 158

Roscovitine, 108

rRNA, See Ribosomal RNA (rRNA)

Rusticyanine, 67

Rys quadrature scheme, 26

Scalar processors (ScaPs), 23

Scandia-stabilized zirconia (SSZ), 214

ScaPs, See Scalar processors (ScaPs)

SCF, See Self consistent field (SCF)

Scoring function problem, 282

Scoring functions, categories

empirical scoring function, 282

force field scoring functions, 282

mean-force scoring functions, 282

Scoring functions in designing ligands to bind RNA, 145-146

AutoDock (Moitessier), 146

docking to TAR, 146

DrugScoreRNA, RNA-centric scoring function, 146

KScore, 146

MCCS method, 146

RiboDock’s scoring function, 146

treatment of electrostatics, 146-147

See also Protein-ligand binding, SMoG for

S22 database, 104, 107

Secondary ion mass spectrometry, 202

Second generation GPUs, 24

Second-order Møller-Plesset perturbation theory (MP2), 31-32

Self consistent field (SCF), 24

Single instruction, single data (SISD), 6

siRNA, See Small interfering RNA (siRNA)

SISD, See Single instruction, single data (SISD)

Small interfering RNA (siRNA), 145

SMoG, See Mean-force scoring function (SMoG)

SMoG for protein-ligand docking

atom-randomization reference state

ASP, 289

BLEEP (Mitchell), application to 77 complexes, 288

DrugScorePDB (Gohlke), 288-289

KScore, 289

MScore, 289

prediction of HIV-1 protease binding affinity, 285-287

SMoG for de novo lead design, 287-288

binding mode and affinity predictions, 286t, 287t

corrected reference state

DFIRE, 290-291

ligand volume correction factor, 289-290

test set of 77 protein-ligand complexes for PMF scoring function, 290

SMs, See Streaming multiprocessors (SMs)

SOFCs, See Solid oxide fuel cells (SOFCs)

Software development for GPUs

first generation of GPUs, features, 24

high density of arithmetic units, example

FFTW Fourier transform library, portable algorithm, 23

GPU kernels/streams, 23

GPU programming, problems, 23

NVIDIA GeForce 8800 GTX GPU, 22-23

parallel programming of memory, 23

second generation of GPUs, features, 24

third generation of GPUs, features, 24

Solid oxide fuel cells (SOFCs), 201-228

high operating temperatures, characteristic, 202

KMC simulations, 203-204

MD simulations, 203

model ability, impact on system design and operation, 203

next-generation SOFC development, goal, 202

secondary ion mass spectrometry, 202

SOFC performance, factors, 203

stationary power generation, application, 202

utility of modeling approaches, 203

atomistic-scale modeling techniques, 202

Sparsomycin, 147-148

SPs, See Streaming processors (SPs)

SSZ, See Scandia-stabilized zirconia (SSZ)

π-stacking aromatic interactions (GlaxoSmithKline), 99

Steroid hormones, binding of, 108

Stochastic Langevin formulations, 240

Streaming multiprocessors (SMs), 7

Streaming processors (SPs), 8

Streams, 23

Structure-based antibiotic design, case study

aminoglycoside derivatives I

antibacterial activity of, 151

antibiotics binding to 30S, 151

refinement of aminoglycoside antibiotics, 152

RiboTargets, focus on antibiotics binding to 30S, 151

aminoglycoside derivatives II, 160-161, 160f

A-site scaffolds, 160f

docking by RiboDock, 159

screening of docked ligands, FRET and NMR analysis, 159

screening strategy, 159

chloramphenicol derivatives

antibacterial properties, modification of, 154

binding, cocrystallization in Deinococcus vs. Haloarcula, 154

molecular modeling by MacroModel, 154-155

designer macrolides

antibacterial activity against resistant organisms, 150, 150f

crystallography and molecular modeling, 150

positioned near linezolid binding to RNA, 149f

designer oxazolidinones

linezolid binding to ribosomal RNA, 147

oxazolidinone ring and U2539, interaction, 147

proximity of linezolid, linking algorithm, 147

QSAR modeling, 147

Rx-01 family of compounds, 148

sparsomycin/linezolid proximity, schemes, 147-148, 148f, 149f

pleuromutilin derivatives

chemical footprinting of ribosomal residues, 152-153

docking experiments, 153-154

macrolide core extensions, effects, 153

RNA-directed fragment libraries, 156-158, 157f, 158f, 159f

2-aminoquinoline lead, docking of, 157-158

fragment screening approaches, 156-157

identification of RNA ligands, 158

MORDOR, 158

thiostrepton derivatives, 156f

amide coupling, 156

binding to L11BD, 155

docking of structures, 155

substructure identification, 155-156

Thiostrepton- and Micrococin-binding models, 155

Telithromycin, 143t, 149, 151

Tetracycline, 143t, 144f, 151

Thiostrepton- and Micrococin-binding models, 155


Thiostrepton derivatives, 156f

structure-based antibiotic design

Amide coupling, 156

binding to L11BD, 155

docking of structures, 155

substructure identification, 155-156

Thiostrepton- and Micrococin-binding models, 155

Third generation GPUs (NVIDIA Fermi), 24

Threads, 8

Three-phase boundary (TPB), 207

Tiamulin, 143t, 152, 153f, 154

TM domains, 239

TPB, See Three-phase boundary (TPB)

Transfer RNAs (tRNA), 140

tRNA, See Transfer RNAs (tRNA)

Two-dimensional Hubbard model, 32

UBI-QEP, See Unity bond index-quadratic exponential potential (UBI-QEP)

Ubiquitinated proteins, folding of, 271-273

protein degradation, 271-272

native-state simulation models, 272

thermostability of ubiquitinated proteins, 272f

affecting factors, 273

Ubiquitination, 264-266, 268f, 271, 272, 272f, 273

Ubiquitin polymers, 272

U2539 (Haloarcula #), 147

Unity bond index-quadratic exponential potential (UBI-QEP), 211

VAEs, See Vertical attachment energies (VAEs)

Valley-ridge inflection (VRI) points, 39


Valnemulin, 152, 153f

VDEs, See Vertical detachment energies (VDEs)

Vertical attachment energies (VAEs), 80

Vertical detachment energies (VDEs), 80

Vibrational spectroscopy, 66

Vicens and Westhof model system, 152

Voodoo graphics chip (1996), 5

Wang-Landau recursion method, 57

Weighted histogram analysis method (WHAM), 274

Weiss-type open-shell singlet, 68, 70, 73

WHAM, See Weighted histogram analysis method (WHAM)

Wolf summation approach, 56

See also Ewald summation-based methods

X-ray crystallography for detection of metalloproteins

accuracy problems, 66

determination of geometric parameters

DFT-the Hohenberg-Kohn theorem approach, 67

QM/S techniques, 67

QSOR approach, Karplus relationship, 67

refinement of, linear-scaling quantum chemical methods, 66-67

YSZ, See Yttria stabilized zirconia (YSZ)

Yttria stabilized zirconia (YSZ), 203, 206-207, 208t-209t, 210-220, 223-227

Zwitterionic lipids, 248

CUMULATIVE INDEX VOLS 1-6

C¹⁶O₂, 3, 168

3D QSAR, 2, 182; 3, 67, 71

π-π interactions, 3, 183

ab initio, 3, 215, 219, 220

ab initio modelling, 1, 187, 188

ab initio thermochemical methods, 1, 33, 37, 45

absorption, 5, 103, 108—113, 121—123

intestinal, 1, 137—138 see also ADMET properties

accelerated molecular dynamics, 2, 230

accelerator, 6, 4, 22, 25, 27, 29—30

ACEMD, 6, 11—16

ACPF, 3, 163

action optimization, 3, 17, 19

activated state, 3, 220—222

active database, 3, 157

Active Thermochemical Tables, 3, 159

active transport, 1, 139, 140

acyl carrier protein synthase (AcpS), 1, 179

adenosine triphosphate (ATP) site recognition, 1, 187, 188

adiabatic approximations, 1, 20, 25, 27

adiabatic Jacobi correction (AJC), 3, 158

ADME-Tox, 5, 101—104, 108—109, 111, 113, 114, 116—119, 121, 122

ADMET properties

active transport, 1, 139, 140

aqueous solubility, 1, 135—137, 162

blood—brain barrier permeation, 1, 140—142

computational prediction, 1, 133—151

cytochrome P450 interactions, 1, 143, 144

drug discovery, 1, 159—162

efflux by P-glycoprotein, 1, 140, 160, 161

intestinal absorption, 1, 137, 138

intestinal permeability, 1, 134, 135, 161

metabolic stability, 1, 142, 143, 162

oral bioavailability, 1, 134, 138, 139, 159, 160

plasma protein binding, 1, 142

toxicity, 1, 144

AGC group of kinases, 1, 196

agrochemicals, 1, 163

AK peptide, 2, 91

alchemical free energy simulation, 6, 51—59

“alchemical” free energy transformations, 3, 41—53

alignment-independent molecular descriptors, 3, 69

AMBER, 2, 91; 6, 4, 11, 13—14, 152, 194, 282

AMBER force fields, 1, 92, 94—97, 99, 119—121

angular wavefunctions, 1, 225—228

anisotropic polarizability tensors, 3, 180

ANO basis, 3, 201

apparent errors, 3, 196

applicability domain, 2, 113, 118, 120, 123, 125

aqueous solubility, 1, 135—137, 162

aromatic cluster, 3, 212, 221

assay, 4, 23, 24, 204, 205, 208, 210, 212, 213, 221, 223, 225, 226, 229, 230, 232—235, 238, 239

asymmetric top notation, 3, 159

ATI, 6, 4, 11—12, 24

atomic orbital representations, 1, 225—228

atomistic, 6, 140, 161, 201—228, 240, 242

atomistic simulation

boundary conditions, 1, 80

experimental agreement, 1, 77, 78

force fields, 1, 77, 79—82

methodological advances, 1, 79

nucleic acids, 1, 75—89

predictive insights, 1, 78, 79

sampling limitations, 1, 80—82

atomistic simulations

time scale, 3, 15

transition path methods, 3, 16

ATP see adenosine triphosphate

aug-cc-pVnZ, 3, 198

AUTODOCK, 1, 122, 123; 2, 184

B-factors, 3, 32, 34, 35

B3LYP functional, 1, 32, 48—50

back-propagation neural networks (BPNN), 1, 136, 137

Bad, 2, 197, 203

bagging, 2, 136

Bak, 2, 197, 198, 203—205

BAR domains, 6, 238—241, 246, 248—252, 256

barrier heights, 2, 64, 73

base pair opening, 1, 77

basis set superposition errors (BSSE), 2, 68, 74, 76, 78

basis sets, 1, 13—15, 32, 33; 3, 195

Bax, 2, 197, 198, 203, 204


Bayes model, 2, 157

Bayesian methods, 2, 132

Bcl-2, 2, 197, 198, 201, 203—206

Bcl-xL, 2, 197, 203—206

Bennett acceptance ratio, 3, 44, 45

benzene dimers, 3, 188

benzene—water, 3, 186

Bessel-DVR, 3, 167

Betanova, 1, 248—9

Bethe—Salpeter equation, 1, 27

bias potential, 2, 224—226, 229, 230

Bid, 2, 197, 203, 205

Bim, 2, 197, 203

binding affinities, 1, 78

binding free energy, 4, 69, 73, 81, 82, 164

calculating, 1, 114—119

protein—ligand interactions, 1, 113—130

scoring functions, 1, 119—126

binding rate, 4, 74—82

bioavailability, 1, 134, 138, 139, 159, 160; 5, 103, 104, 113—119, 121, 122

bioinformatics, 4, 4, 12, 18, 30, 33, 68, 206

biological activity, 4, 24, 204—206, 209, 210, 212, 213, 218, 219, 227, 232

bio-molecular simulation

atomistic simulation, 1, 75—82

nonequilibrium approaches, 1, 108

protein force fields, 1, 91—102

protein—ligand interactions, 1, 113—130

water models, 1, 59—74

biospectrum similarity, 2, 150

Bleep, 2, 162

block averaging, 5, 31, 33—37, 44, 47, 61

blood-brain-barrier, 5, 109, 110, 122

blood—brain barrier permeation, 1, 140—142, 160, 161

BO approximation, 3, 158

body-fixed frame, 3, 166

bond breaking

configuration interaction, 1, 51

coupled cluster methods, 1, 52, 53

generalized valence bond method, 1, 47, 48

Hartree—Fock theory, 1, 46, 48—51

multireference methods, 1, 51—53

perturbation theory, 1, 51, 52

potential energy surface, 1, 54

quantum mechanics, 1, 45—56

self-consistent field methods, 1, 46, 47, 53

spin-flip methods, 1, 53

bond vector(s), 3, 167, 168

boost energy, 2, 225—227

boosting, 2, 136, 151

Born—Oppenheimer approximation, 1, 3, 54

Born—Oppenheimer (BO), 3, 156

BOSS program, 2, 264

boundary conditions, 1, 80

Boyer Commission, 1, 206—207

BPNN see back-propagation neural networks

Bragg’s Law, 3, 89, 90, 97

Breit, 3, 164

Breit term, 3, 163

Bridgman tables, 1, 224

BSSE see basis set superposition errors

Brownian dynamics, 4, 77

Caco-2 absorption, 5, 102

Cahn-Hilliard equations, 6, 240

CAMK group of kinases, 1, 186, 196

Carnegie Foundation, 1, 206—207

casein kinase 2 (CK2), 1, 197

Casida’s equations, 1, 21, 22, 25

caspase-3, 2, 206

caspase-9, 2, 206, 208

CASSCF see complete-active-space self-consistent field

CATS3D, 2, 149

catalysis, 4, 97, 155—157, 161; 6, 66, 99, 169—195, 203

CBS-n methods, 1, 36, 37

CC see coupled cluster

cc-pCVnZ, 3, 198, 199

cc-pV(n+d)Z, 3, 197

cc-pVnZ, 3, 196, 199, 202

cc-pVnZ-DK, 3, 200, 202

cc-pVnZ-PP, 3, 201, 202

cc-pwCVnZ, 3, 198, 199

CCSD(T), 3, 160

CD see circular dichroism

CDKs see cyclin-dependent kinases

cell signaling, 6, 238, 246, 256

central nervous system (CNS) drugs, 1, 160, 161

CH2 radical, 3, 156

chance correlations, 2, 153

charge transfer (CT), 1, 26

charge transfer interactions, 3, 180

CHARMM force fields, 1, 77, 79, 92—95, 97—99, 119, 120

chemical amplification, 2, 11

Chemical Kinetics Simulator, 2, 4

Chemical Markup Language (CML), 3, 116, 126

chemical space (size of), 2, 143

chemical structures, 4, 128, 204, 205, 208, 211, 218—220, 224, 230, 234

chemical vapor deposition (CVD), 1, 232, 233

chemScore, 2, 162

cholesterol, 5, 5, 6, 8—12, 15, 16

circular dichroism (CD) spectra, 1, 22—24

circular fingerprints, 2, 144

cis-trans isomerization, 2, 228, 229

CI see configurational interaction

classification, 4, 14, 15, 17, 27, 44—57, 212, 239

cluster-based computing, 1, 113


CMAP see correction maps

CMGC group of kinases, 1, 186, 192—194

CNS see central nervous system

CO2, 3, 162, 168

coarse-grained models, 6, 267, 274

coarse-grained theory, 6, 241, 267, 274

coarse-graining, 4, 111

cold shock proteins (CSP), 3, 24

combinatorial QSAR, 2, 113, 120

combined quantum mechanical/molecular mechanical potential, 6, 53

CoMFA, 2, 152

compartmentalization, 2, 11

complete basis set, 3, 196

complete basis set (CBS) full configuration interaction (FCI), 3, 156

complete-active-space self-consistent field (CASSCF) method, 1, 47, 53

compound equity, 1, 171

computational protein design (CPD), 1, 245—253

degrees of freedom, 1, 246

energy function, 1, 246, 247

examples, 1, 248—250

search methods, 1, 247, 248

solvation and patterning, 1, 247

target structures, 1, 246

computational thermochemistry

ab initio methods, 1, 33, 37, 45

CBS-n methods, 1, 36, 37

density functional theory, 1, 32, 33

empirical corrections, 1, 34—36

explicitly correlated methods, 1, 39

G1, G2, G3 theory, 1, 34—36

hybrid extrapolation/correction, 1, 36—37

isodesmic/isogyric reactions, 1, 34

nonempirical extrapolation, 1, 37—39

quantum mechanics, 1, 31—43

semi-empirical methods, 1, 31, 32

Weizmann-n theory, 1, 37—39

concerted rotations, 5, 63, 65

configurational interaction (CI), 1, 9, 10, 48, 51

configurational space, 2, 84

conformation change(s), 3, 32—36

conformational changes, substrate induced P450, 2, 173

conformational flexibility, 1, 173

conformational flooding, 2, 221, 223, 224

conformational fluctuations, 4, 74, 81, 109, 161

conformation restraints, 3, 49, 50

conformational sampling, 3, 48, 49; 6, 56, 58

conformational transitions, 2, 221, 222, 227

consensus approaches, 1, 145

consensus scoring, 2, 158

continuum solvation models, 3, 198, 203


convergence, 5, 26, 27, 37—41, 68, 92, 132, 143, 144, 156

core correlation, 3, 198, 203

core-valence, 3, 199, 202

correction maps (CMAP), 1, 95, 96, 98

correlating functions, 3, 197

correlation energy, 2, 53, 54, 59—62, 64—71, 73, 74, 76

correlation methods, 1, 8—11

correlation states, 6, 92

correlation-consistent, 3, 160, 196

Council for Chemical Research, 1, 240

Council on Undergraduate Research (CUR), 1, 206—208

coupled cluster (CC) methods, 1, 10—11, 37—40, 48—50, 52, 53; 5, 131, 132

CPD see computational protein design

CPHMD, 3, 6

Crooks relationship, 3, 45

cross-validation

leave-group-out, 3, 67

leave-one-out, 3, 67

Crystallographic Courseware, 3, 96

CT see charge transfer

Cu, Zn superoxide dismutase (SOD), 3, 24, 25

CUDA, 6, 7—10, 15, 23—24, 26, 28, 30, 32

CUR see Council on Undergraduate Research

current density, 1, 27

curvilinear, 3, 27

CVD see chemical vapor deposition

cyclin-dependent kinases (CDKs), 1, 186, 192—194

CVRQD, 3, 161—164

CYP inhibitor, 3, 65, 71

CYP substrate, 3, 65, 71

cytochrome c, 3, 22

cytochrome P450, 2, 171; 3, 63, 64

2C5, 2, 172

2C9, 2, 172

3A4, 2, 172

BM-3, 2, 174

eryF, 2, 174

terp, 2, 174

cytochrome P450 interactions, 1, 143, 144

D-Score, 2, 161

D/ERY motif, 3, 211

D2.50, 3, 211

D&C see divide and conquer

DA see discriminant analysis

data analysis, 4, 42, 218, 223, 226, 227, 232, 239

database, 3, 169; 4, 10, 13, 17, 24—26, 49—52, 68, 92, 204—213, 218, 220—226, 228, 236, 238, 239

database mining, 2, 114, 121—125


databases

drug-likeness, 1, 155, 156

ligand-based screening, 1, 172—175

self-extracting, 1, 223, 225

symbolic computation engines, 1, 223—225

data-mining, 4, 205, 206

Davidson correction, 3, 163

DBOC, 3, 160, 163

de novo protein design, 1, 245

dead-end elimination (DEE), 1, 247—249

degrees of freedom, 1, 246

density fitting, 2, 55, 74, 77

density functional theory (DFT), 6, 22, 24, 30, 67, 97—111, 210

bond breaking, 1, 48, 49

computational thermochemistry, 1, 32, 33

protein—ligand interactions, 1, 116

state of the art, 1, 4, 11—15

time-dependent, 1, 20—30

descriptor binarization effect, 2, 152

designability, 4, 7, 9, 11, 13, 17

DEWE, 3, 168

DEZYMER algorithm, 1, 249

DF-LCCSD(T), 2, 55

DF-LMP2, 2, 55, 73, 75

DFT see density functional theory

discriminant analysis (DA), 1, 138

diagonal Born-Oppenheimer corrections (DBOC), 3, 158

dielectric constant, 4, 73, 74, 97, 98, 100, 109—111, 113—115, 117, 128, 129, 133

diffusion, 4, 75, 77, 79, 82, 140, 141, 147—152, 174, 176—180, 183, 184, 196

digital repository, 3, 103, 107, 108, 125, 129

dipole polarizability, 3, 179

discrete path sampling (DPS), 3, 16

discrete variable representation (DVR), 3, 166

displacement coordinates, 3, 168

dissipative MD, 3, 139

distant pairs, 2, 54, 62, 63

distributed computing, 1, 113

distributed multipolar expansion, 3, 179

distribution see ADMET properties

divide and conquer (D&C) algorithm, 1, 116—117

DKH, 3, 200

DMS, 3, 156

DMSs, 3, 163, 165

DNA gyrase, 2, 280

DOCK, 2, 157, 159, 161, 179, 184—186, 299—303, 308, 314—317, 319—320

DOCK program, 1, 173, 174, 177, 178, 189

docking, 1, 79, 114, 119, 121, 155, 169, 172—174, 178, 189—196; 2, 141, 145, 157, 159, 161, 162, 284, 297—303, 305—307, 309, 311, 313—321, 323; 4, 27, 68, 82, 160, 161, 207, 212

DockIt, 2, 299, 300, 317

DockScore, 2, 161

DockVision, 2, 299, 300, 315—317

domain approximation, 2, 53, 64, 73—76, 78

domain extensions, 2, 54, 59, 62, 63, 77

DOPI, 3, 166, 168

drug discovery, 1, 155—168; 3, 64

agrochemicals, 1, 163

aqueous solubility, 1, 162

chemistry quality, 1, 157

CNS drugs, 1, 160, 161

databases, 1, 155, 156

drug-likeness, 1, 155—157

intestinal permeability, 1, 161

lead-likeness, 1, 159

metabolic stability, 1, 162

oral drug activity, 1, 159—160

positive desirable chemistry filters, 1, 158, 159

promiscuous compounds, 1, 162, 163

druggability, 4, 23, 29—33, 213

drug-drug interactions, 3, 63

drug-likeness, 1, 155—157; 2, 160

DrugScore, 2, 161, 162

Dublin-core metadata (DC), 3, 104, 107, 108, 125

DVR, 3, 167

E6.30, 3, 211

Eckart—Watson Hamiltonians, 3, 167

education

research-based experiences, 1, 205—214

stochastic models, 1, 215—220

symbolic computation engines, 1, 221—235

effective core potentials, 3, 200

effective fragment potential (EFP), 3, 178

efflux by P-glycoprotein, 1, 140, 160, 161

EFP, 2, 267; 3, 178, 190

EFP-QM, 3, 182

EFP/PCM, 3, 181

induced dipoles, 3, 181

elastic network model(s), 3, 31—37

electrochemistry, 6, 207, 210, 223

electron capture dissociation, 5, 164

electron correlation methods, 1, 8—11

electron propagator, 6, 79—92

electron transfer, 5, 164, 165—170, 172—176, 178—181

electron transfer dissociation, 5, 164

electronic embedding, 2, 37

electronic Schrödinger equation, 1, 3—15

electrostatic interaction, 3, 179

empirical force fields, 1, 91—102

empirical PESs, 3, 164

empirical scoring functions, 1, 122, 123

energy function, 1, 246—247


enrichment, 2, 297, 302, 303, 305—309, 313—319

enzyme, 4, 6, 25, 27, 32, 96, 97, 155—165, 208

error analysis, 5, 24

Essential dynamics, 2, 233, 236, 242—244, 247

Euler angles, 3, 168

evolutionary determinants, 4, 4, 5

evolvability, 4, 7—9, 17

Ewald summation, 2, 265

Ewald summation techniques, 1, 59, 62, 75

exact exchange, 1, 26, 27

exchange repulsion, 3, 179, 180

excited state structure/dynamics, 1, 24

excretion see ADMET properties

explicit-r12 correlation, 5, 132, 140

explicit solvent, 2, 98, 99, 101, 102, 104—106

exponential damping functions, 3, 180

extended systems, 1, 26

extensible metadata platform (XMP), 3, 104, 107, 109—111

F-Score, 2, 161

FCI, 3, 160

feature selection, 2, 151, 153

FEP see free energy perturbation

FEPOPS, 2, 146

few-body systems, 3, 158

few-electron systems, 3, 156

Fingal, 2, 148

fitness density, 4, 11, 14, 17

first-principles thermochemistry, 3, 160

FIS3, 3, 161, 162, 164

FKBP, 3, 52

FlexX, 1, 173, 178, 189; 2, 157, 159, 184, 186, 299, 300, 308, 313—319

Flo+, 2, 299, 300, 317

FLO99, 1, 178

Florida Memorial College, 1, 212

fluctuation theorem, 1, 109

fluid properties, 1, 239—244

focal-point approach (FPA), 1, 39; 3, 160

folding intermediate states, 3, 9

force fields, 3, 162

molecular simulations, 1, 239, 240

nucleic acids, 1, 77, 79—82

protein—ligand interactions, 1, 116, 119—121

proteins, 1, 91—102

structure-based lead optimization, 1, 177

FPA, 3, 160

fragment positioning, 1, 175—177

FRED, 2, 148, 161, 299, 300, 313, 314, 317, 319

free energy, 1, 96, 103—111, 113—130; 4, 6, 69, 73, 92, 108—111, 115, 117, 127—129, 132, 133, 157, 163, 164, 181, 182, 187; 5, 6—16, 55, 109

free energy calculations, 3, 41—53


free energy perturbation (FEP), 1, 104, 106; 2, 265; 6, 38—43, 46

free-energy profiles, 2, 265; 6, 37—46

frequency response, 6, 220, 222

fullerene ionization energies, 6, 80

functional microdomains, 3, 211

Fuzzy clustering, 2, 160

fuzzy logic, 1, 218

G-protein coupled receptors (GPCRs), 3, 209

G-Score, 1, 123; 2, 161

G1, G2, G3 theory, 1, 34—36

GAMESS, 3, 190

Gaussian Geminal Methods, 2, 25

Gaussian quadratures, 3, 166

GB-1 beta hairpin, 2, 91, 92

generalized Born, 2, 222; 4, 73, 109, 110, 115, 117, 126, 129, 131, 134

generalized conductor-like screening model (GCOSMO), 2, 266

generalized ensemble simulation, 6, 56—57

generalized finite basis representation (GFBR), 3, 167

generalized gradient approximation (GGA), 1, 12

generalized valence bond (GVB) method, 1, 47—48

Ghose/Crippen descriptors, 2, 160

Glide, 2, 161, 299, 300, 302, 303, 313—319

global matrices, 1, 116—117

glutathione peroxidase, 2, 47

glycosylation, 6, 264—265, 267—270, 275

GOLD, 2, 161, 162, 184—186, 299, 300, 313—319

GPU, 6, 4—16, 21—33, 100

GRAFS, 3, 210

graphical representations, 1, 225—228, 232, 233

graphics processing units, 6, 5, 21—34

GRID, 2, 148—149

GRIND, 2, 148

GROMACS, 2, 89, 91

GROMOS, 2, 91

GROMOS force fields, 1, 97

GVB see generalized valence bond

[H,C,N], 3, 163

H2, 3, 158

H₂⁺-like systems, 3, 158

H₂¹⁶O, 3, 160, 164

H₂¹⁷O, 3, 159, 160, 164

H₂¹⁸O, 3, 164

H2O, 3, 162, 163, 168

H2S, 3, 163

H₂⁺, 3, 158

hammerhead ribozyme, 6, 171—181

Hartree—Fock (HF), 3, 160


Hartree—Fock (HF) method, 1, 4—11, 13—15, 20, 21, 46, 48—51

Hartree-Fock theory, 6, 122

HDM2, 2, 209

HEAT (High-accuracy Extrapolated Ab initio Thermochemistry), 3, 160

Hellmann—Feynman theorem, 1, 21

HF limit, 3, 197

hierarchical protein design, 1, 245

high throughput docking (HTD), 2, 298—302, 304—306, 308, 309, 317—320

high-resolution spectra, 3, 157

high-throughput screening (HTS), 1, 171, 172

HINT, 2, 162

Hohenberg—Kohn (HK) theorem, 1, 11, 20

homodesmotic reactions, 1, 34

homology models, 1, 170, 188, 189; 3, 211

HTD see high throughput docking

HTS data analysis, 2, 156

HTS Data Mining and Docking Competition, 2, 159

HTS see high-throughput screening

human intestinal oral plasma protein binding, 5, 103, 116

hybrid quantum and molecular mechanical simulation (QM/MM), 2, 263—268

hybrid solvent, 2, 106

hybridization, structure-based, 1, 191, 192

hydration free energies, 1, 103

hydrogen bonding, 6, 43, 46, 68, 71—75, 106, 108—110, 145—146, 151, 153, 159, 183, 186—188

Hylleraas Method, 2, 21

Hylleraas-CI method, 2, 24

hyperdynamics, 2, 221, 224, 225; 5, 80, 83—85, 89, 91—93

IAPs, 2, 206

ICM, 2, 299, 300, 308, 313—314, 318—319

ICMRCI, 3, 163

IL-2, 2, 214

implicit solvent, 2, 99—100; 3, 5; 4, 107—109, 111—113, 117, 125—134

Induced Fit, 3, 218

information triple, 3, 109, 110, 128, 131

intermolecular potential functions, 1, 241, 242

internal coordinates, 3, 166

intestinal absorption, 1, 137—138

intestinal permeability, 1, 134, 135, 161

intrinsic errors, 3, 196

iron chelation, modeling of, 2, 185

isodesmic/isogyric reactions, 1, 34

Jacobi coordinates, 3, 158

Jarzynski relationship, 1, 103—110; 3, 45, 46

Jmol, 3, 99, 113—117, 119—121, 125, 126

Kemp decarboxylation, 2, 263, 264, 271—273, 275

kinetics, 4, 16, 68, 113, 156, 175, 186, 190—192, 196

kinetic Monte Carlo, 6, 203—206, 216

kinome targeting, 1, 185—202

applications, 1, 192—197

ATP site recognition, 1, 187, 188

homology models, 1, 188, 189

kinase family, 1, 186, 187

methodology, 1, 188—192

selectivity, 1, 190, 191

structure-based hybridization, 1, 191, 192

virtual screening, 1, 189, 190

knowledge-based scoring functions, 1, 123—125

knowledge bases, 4, 204, 208—214

Kohn—Sham (KS) equations, 1, 11, 20—22, 25

Kohonen maps, 2, 181

Kriging, 2, 151

L1 ligase ribozyme, 6, 172—173

laboratory course modules, 1, 7

Lamb-shift, 3, 163, 164

Lambda dynamics, 3, 6

Lanczos technique, 3, 166

Langevin, 3, 140, 144, 145; 4, 108, 113, 174, 180, 184

Landau-Zener theory, 5, 166

LCCSD(T), 1, 54, 62, 71, 78

LCCSD(T0), 1, 64

lead optimization see structure-based lead optimization

lead-likeness, 1, 159

Lennard—Jones (LJ) potential, 1, 93, 94, 116, 121

LES see locally enhanced sampling

level density, 3, 156

library enumeration, 1, 178

ligand binding, 1, 103; 3, 42, 43, 51

ligand-based screening, 1, 172—175, 178—9

LigandFit, 2, 299, 300, 302, 303, 315—17, 319

LigScore2, 2, 161

linear interaction energy, 1, 117

Linear R12 methods, 2, 28

linear scaling, 2, 54, 55, 62, 64, 77

LINGO, 2, 146

link atoms, 2, 37

lipid rafts, 6, 238, 246—247, 256

LJ see Lennard—Jones

LMP2, 2, 55, 60—78

Local Correlation, 2, 53, 77

local coupled cluster, 2, 54

local spin density approximation, 1, 11—12

localized orbitals, 2, 53, 54, 57

locally enhanced sampling (LES), 1, 79


long-range electrostatic interaction, 6, 53

LOOPSEARCH, 3, 216

LUDI scoring function, 1, 123, 173

lysozyme, 2, 199

machine learning, 4, 4, 25, 41—46, 49, 53—58

many-body perturbation theory, 1, 10

Maple, 1, 228, 230—232

MARVEL, 3, 157—162, 165

master equations, 1, 115, 116, 119, 120

Mathematical Association of America, 1, 215, 216

MaxFlux, 3, 16

maximum common substructure, 2, 160

maximum likelihood methods, 3, 44

MC see Monte Carlo

MCSCF see multi-configurational self-consistent field

MCSS program, 1, 173, 174, 177

MD see molecular dynamics

MDM2, 2, 197, 200, 209—211

mean-field model, 6, 240, 247, 249

mean force, 6, 37—46, 146, 281—293

mechanical embedding, 2, 37

MEMBSTRUCK, 3, 220

membrane, 4, 49, 50, 108, 110, 111, 115—117, 131; 5, 4—8, 12, 13, 38, 69, 104, 108, 111, 113, 115, 116, 119

membrane curvature, 6, 242, 245—246, 248—250, 252

membrane elasticity, 6, 241

Menshutkin reaction, 2, 263, 265—268, 275

metabolic stability, 1, 142, 143, 162

see also ADMET properties

metal surface, 3, 137

Mg2+ ions, 6, 175—177, 181—183, 187—188, 191

Miller indices h, k, l, 3, 91

MLR, 3, 67

MLR see multiple linear regression

MM see molecular mechanics

model applicability domain, 3, 68, 74

Model scope, 2, 155

Modeling, 6, 10, 44, 97—111, 122, 145—147, 150—151, 153—156, 158—160, 201—228, 237—257

MODELLER, 3, 213

MODLOOP, 3, 216

MOE, 3, 214

MOEDock, 2, 299, 300, 317

MOIL, 3, 19

molecular crowding, 4, 110

molecular descriptors, 2, 141, 144—146, 151; 3, 66

molecular docking, 6, 110

molecular dynamics, 2, 98, 99, 221—224, 227—230, 233—238, 243, 244, 246, 247; 3, 140; 4, 33, 72, 109, 111, 112, 117, 126, 133, 134, 139, 146, 147, 161—163; 6, 3—17, 28, 52, 160, 173, 181, 203, 239, 267, 293

atomistic models, 3, 143

coarse-grained, 3, 138, 144

with electronic friction, 3, 143

molecular dynamics (MD) simulation, 1, 75—78, 217, 239, 242

molecular interaction field, 3, 66

molecular mechanics (MM), 1, 119—122

molecular modeling, 1, 59—130

atomistic simulation of nucleic acids, 1, 75—89

free energy, 1, 103—111, 113—130

nonequilibrium approaches, 1, 103—111

protein force fields, 1, 91—102

protein—ligand interactions, 1, 113—130

water models, 1, 59—74

TIP4P, 1, 62—64, 69—72

TIP4P-EW, 1, 64, 65, 69—72

TIP5P, 1, 65—67, 69—72

TIP5P-E, 1, 67—72

molecular orbital representation, 1, 229—231

Molecular Similarity, 2, 141

molecular simulations, 1, 177, 178, 239—244; 4, 134; 6, 169—195

Møller—Plesset form, 1, 10, 48—50

Møller-Plesset perturbation theory, 6, 31, 123

MOLPRINT 2D, 2, 145

Monte Carlo methods, 1, 216—218, 239, 242, 247, 248

Monte Carlo simulation (MC), 2, 263—268, 270, 271, 273, 275; 5, 49, 70

Mössbauer, 6, 66, 68—70, 72—73

multi-configurational self-consistent field (MCSCF) method, 1, 9, 10, 46, 47

multicanonical ensemble, 5, 69

multicanonical methods, 3, 48

multi-domain proteins, 6, 264, 266—267, 273—275

MULTIMODE, 3, 166

multiple excitations, 1, 25

multiple linear regression (MLR), 1, 136

multiple sequence alignment, 3, 211—213

multipole approximations, 2, 62

multireference methods, 1, 51—53

MV, 3, 163

MVD1, 3, 164

MVD2, 3, 163

n-mode representation, 3, 167

N2O, 3, 162

N1.50, 3, 211

N7.49, 3, 211, 212

NAMD, 6, 10—15, 194

National Science Foundation (NSF), 1, 206, 207, 209


neural networks, 2, 181

nonadiabatic, 3, 158

nonequilibrium approaches

computational uses, 1, 109

experimental applications, 1, 108

free energy calculations, 1, 103—111

Jarzynski relationship, 1, 103—110

theoretical developments, 1, 108, 109

NMR, 4, 10, 29, 31, 53, 68, 75, 82, 90—92, 96—102, 139—141, 143—147, 149, 151, 152, 162, 206

nonequilibrium work, 3, 45, 46

nonlinear models, 2, 152

normal coordinates, 3, 163, 167, 168

normal mode, 3, 159

NPXXY motif, 3, 212

NR, 2, 211

NSF see National Science Foundation

nuclear hormone receptor, 2, 211

nuclear motion computations, 3, 166

nuclear-motion, 3, 169

nucleic acids, 1, 75—89

nucleophilic aromatic substitution (SNAr), 2, 263, 264

nucleotide electron detachment energies, 6, 80

nudged-elastic-band (NEB) method, 3, 16

nuisance compounds, 1, 162, 163, 190

NVIDIA, 6, 4—5, 7, 9, 11—12, 14, 16, 23—24, 26—29, 32—33

objectives for teaching crystallography, 3, 86—89

OMTKY3, 3, 189

ONIOM, 2, 35

Onsager-Machlup action, 3, 17, 18

OpenMM, 6, 11—15

OPLS-VA/VA force fields, 2, 265, 273

OPLS/AA force fields, 1, 92—94, 97

optical interference, 3, 96

oral bioavailability, 1, 134, 138, 139, 159, 160

oral drug activity, 1, 159, 160

orbital domains, 2, 58, 59, 61—63

orbital representations, 1, 225—231

orthogonal coordinates, 3, 166

oscillating systems, 1, 232, 233

overfitting, 2, 154

oxymyoglobin, 6, 68—69, 72—73

p-glycoprotein, 1, 140, 160—161

p53, 2, 197, 200, 209—211

PAO, 2, 53—62, 68

parallel computing, 1, 242

parallel-replica dynamics, 5, 81, 83, 88, 90, 96

PARAM force fields, 1, 97

partial least squares (PLS), 3, 67

partial least squares (PLS) analysis, 1, 134, 135, 138

patterning, 1, 247

PB see Poisson—Boltzmann

PCM, 2, 266, 271, 275

PCM induced charges, 3, 181

PDB see Protein Data Bank

PDBbind, 2, 161

PDDG/PM3, 2, 263—265, 267, 268, 273—275

PDF inhibitor, 2, 288

periodic boundary conditions, 3, 181

permeability, intestinal, 1, 134, 135, 161

perturbation theory (PT), 1, 10, 51, 52; 3, 156

PES see potential energy surface

pH-coupled molecular dynamics, 3, 4

pH-modulated helix-coil transitions, 3, 9

pharmaceutical chemicals

ADMET properties, 1, 133—151

drug discovery, 1, 155—168

structure-based lead optimization, 1, 169—183

virtual screening protocols, 1, 114, 120, 125

pharmacophore models, 1, 172—174

pharmacophores, 2, 182, 183

PhDOCK, 1, 173, 174, 177

phospholipid, 5, 6, 11, 16

physical chemistry, 1, 215—217

PIP2 diffusion, 6, 254—256

Pipek—Mezey localization, 2, 56, 68

pKa, 3, 4, 188

pKa prediction, 3, 4

pKa values, 4, 73, 90—94, 96—100, 102

plasma protein binding (PPB), 1, 142

PLOP, 3, 216

PLP2, 2, 161

PLS see partial least squares

PMF, 2, 161, 162, 263, 266

PMFScore, 1, 124, 125

Podcast, 3, 99, 118—121, 131

point group symmetry, 3, 94

Poisson—Boltzmann (PB) equation, 1, 117—122; 4, 97, 109, 129

Poisson-Boltzmann theory, 6, 241

polarizable continuum model (PCM), 2, 264, 266, 271

polarization consistent, 3, 196

polymerization, 4, 174, 175, 177, 179—192, 194—196

polymer-source chemical vapor deposition (PS-CVD), 1, 232, 233

Polynomial, 6, 37—46

polynucleotides, 5, 59, 65

poly(organo)silanes, 1, 232, 233

polypeptides, 5, 59, 61, 65, 69, 164—166, 168—170, 172, 173, 175, 176, 180, 181

pores, 5, 6, 12, 14—16

positive desirable chemistry filters, 1, 158, 159

PostDOCK, 2, 157

potential energy landscape, 2, 221—224, 227, 229, 230


potential energy surface (PES), 1, 3, 4, 54

potential functions, 1, 241, 242

potential of mean force (PMF), 2, 263—268; 6, 37—46, 284

PPB see plasma protein binding

PREDICT, 3, 219

predictive modeling, 1, 133—151, 240

PRIME, 3, 214

principal component, 5, 39—41, 61, 120

principal component analysis, 2, 233, 235, 236

privileged structures, 1, 158

probabilistic protein design, 1, 249, 250

problem-solving templates, 1, 228

process design, 1, 231, 232

projected atomic orbitals, 2, 53

projective models, 3, 144

proline, 3, 213, 216, 221

promiscuous compounds, 1, 162, 163, 190

protein A, 3, 22

protein conformational change, 4, 101, 161, 162

Protein Data Bank (PDB), 1, 113, 117, 123, 124

protein design, 1, 245—253

degrees of freedom, 1, 246

energy function, 1, 246, 247

examples, 1, 248—250

search methods, 1, 247, 248

solvation and patterning, 1, 247

target structures, 1, 246

protein electrostatics, 4, 90, 102

protein folding, 3, 22; 6, 15—16, 264, 266

protein force fields, 1, 91—102

condensed-phase, 1, 94—96

free energies of aqueous solvation, 1, 96

gas-phase, 1, 94—96

optimization, 1, 96—99

united-atom, 1, 97

protein function, 4, 5—7, 49, 67

protein kinases see kinome targeting

protein misfolding and aggregation, 3, 9

protein—ligand interactions, 1, 113—130; 6, 145, 282, 284—285, 288, 293

protein—protein interaction, 2, 197—198, 200, 202, 203, 205, 211, 214, 215

protein structure, 4, 4—6, 9, 10, 13—15, 17, 24, 30, 42, 49, 50, 53, 54, 56, 58, 90, 91, 93, 96—102, 112, 208

protein-RNA, 4, 49

PS-CVD see polymer-source chemical vapor deposition

pseudopotentials, 3, 200

PubChem, 4, 204, 205, 211—213, 218—227, 229—240

QED, 3, 158, 163

QM/EFP/PCM, 3, 181


QM/MM, 2, 35, 263—268, 270, 271, 273—275; 3, 182, 188, 190; 4, 156—164

QM/MM calculations, 6, 39, 55, 72, 173

QSAR, 3, 66; 5, 104, 105, 107, 109, 110, 115—118, 120—122

QSAR/QSPR models, 1, 133—151

quantum chemical calculations, 6, 65—75

quantum chemistry, 6, 21—34, 115, 121—122, 128

quantum electrodynamics (QED), 3, 155

quantum mechanics, 1, 3—56

basis sets, 1, 13—15, 32, 33

bond breaking, 1, 45—56

computational thermochemistry, 1, 31—43

configurational interaction, 1, 9, 10, 48, 51

coupled cluster methods, 1, 10, 11, 37—40, 48—50, 52, 53

density functional theory, 1, 4, 11, 12, 13—15, 32, 33, 48, 49

electron correlation methods, 1, 8—11

generalized valence bond method, 1, 47, 48

Hartree—Fock method, 1, 4, 5—11, 13—15, 20, 21, 46, 48—51

perturbation theory, 1, 10, 51, 52

potential energy surface, 1, 3, 4, 54

self-consistent field methods, 1, 6—10, 37, 46, 47, 53

semi-empirical methods, 1, 12—13, 15

symbolic computation engines, 1, 225—228

time-dependent density functional theory, 1, 20—30

quantum Monte Carlo, 6, 22, 32—33

quantum number, 3, 164

quantum—classical enzymatic calculations, 1, 103

quasi-static (QS) transformations, 1, 105, 133—151

quasiparticle approximations, 6, 82—83

quasiparticle virtual orbitals, 6, 84—86

QZVPP, 3, 197

R-group descriptor, 2, 147

random forest, 2, 136, 151

rare event, 3, 140

RASSCF see restricted-active-space self-consistent field

re-parameterizations, 1, 59—61, 67, 72

reaction energies, 2, 53, 54, 64, 71, 74, 75, 77

reaction kinetics, 3, 158

receptor activation, 3, 221

reference state, 6, 80, 83—84, 284—285, 287—292

refinement, 3, 216, 218, 219

relativity, 3, 200

REMD see Replica Exchange Molecular Dynamics

renormalized approximations, 6, 83, 92

Replica Exchange Molecular Dynamics, 2, 83, 85,

87, 89—91, 93, 95, 222

Cumulative Index Vols 1–6

Replica exchange with solute tempering (REST), 2, 86

replica-exchange, 3, 7

repository, 4, 10, 56, 205, 218, 238

Research Experiences for Undergraduates (REU), 1, 209

research institutions, 1, 205–214

restrained electrostatic potential, 1, 92, 93

restricted Hartree–Fock (RHF), 1, 46, 48–50

restricted-active-space self-consistent field (RASSCF) method, 1, 47

REU see Research Experiences for Undergraduates

RHF see restricted Hartree–Fock

RISM, 2, 266, 267

RNA, 6, 87, 140–141, 144–147, 151–154, 156–160, 170–173, 180–182, 186, 194

ROC curve, 2, 297, 306, 307, 315

ROCS, 2, 318

Roothaan–Hall equations, 1, 6–8

rotational-vibrational

energy levels, 3, 159

spectra, 3, 169

transitions, 3, 159

rovibrational eigenvalues, 3, 157

Ru(bpy)₃²⁺, 3, 7

Runge–Gross theorem, 1, 27

Rydberg orbital, 5, 165–168, 170–178

SNA, 2, 270, 271

SNAr, 2, 268–270, 275

sampling barriers, 1, 242, 243

SAR see structure–activity relationships

scads, 1, 250

scaling methods, 1, 6–8

Schrödinger equation, 1, 3–15; 2, 297–299, 313, 314, 316, 318–320

scoring functions, 1, 119–126; 6, 100, 145–146, 155, 159, 281–293

scoring functions, quality, 2, 161, 162

self-consistent field (SCF) methods, 1, 6–10, 37, 46, 47, 53

self-consistent reaction field (SCRF), 1, 118, 121

self-extracting databases, 1, 223, 225

self-learning hyperdynamics, 5, 89, 92, 93

selectivity, 4, 23–27, 29, 33, 74

semantic Wiki, 3, 110, 123, 126–128, 131

semi-empirical methods, 1, 12–13, 15, 31, 32

PDDG/PM3, 2, 264, 265, 267, 268, 272, 274, 276

sextic force fields, 3, 162

SHAKE algorithm, 2, 222

signal trafficking see kinome targeting

similar property principle, 2, 141

simulation, 4, 9, 33, 72, 74, 77, 78, 81, 82, 107–109, 111–115, 117, 126, 128–134, 139–144, 146–152, 156, 159–164, 184, 187–192, 194, 195

Slater geminal methods, 2, 28, 30

Smac, 2, 206, 208, 209

small molecule solvation, 3, 50

“soft core” Lennard-Jones interactions, 3, 47

solid oxide fuel cell, 6, 201–228

solubility, 1, 135–137; 5, 104–107, 111, 113, 114, 119, 122, 123

solvation, 1, 117–119, 247

space group symmetry, 3, 94

spectroscopic accuracy, 3, 157

spectroscopic network (SN), 3, 159

spherical harmonics, 3, 167

spin-flip methods, 1, 53

spin relaxation, 4, 139, 140

standard domains, 2, 53, 57, 59, 64, 68, 69, 71, 73–76

standard pKa, 3, 4

standard uncertainty (su), 3, 87

statistical computational assisted design strategy (scads), 1, 250

Steepest Descent Path (SDP), 3, 19

stochastic difference equation in length (SDEL), 3, 17–19

advantages, 3, 20

disadvantages, 3, 20

stochastic difference equation in time (SDET), 3, 17

Stochastic Gradient Boosting, 2, 137

stochastic models, 1, 215–220

storage capacity, 1, 224, 225

stream, 6, 7, 9, 22–23, 29

string method, 3, 16

strong pairs, 2, 59, 62, 63, 68–69, 71, 73, 75, 77

structural mimicry, 3, 217

structural motifs, 3, 211

structure-activity, 4, 24, 27, 47, 159, 208, 227, 232–235

structure–activity relationships (SAR), 1, 91, 133–151; 4, 24, 159, 161, 204, 208, 210–212, 232

Structure-based design, 2, 197, 202, 205, 209

structure-based drug design, 1, 114, 120, 125; 4, 33, 160

structure-based hybridization, 1, 191, 192

structure-based lead optimization, 1, 169–183

application to specific targets, 1, 179

compound equity, 1, 171

discovery, 1, 171–175

fragment positioning, 1, 175–177

high-throughput screening, 1, 171, 172

library enumeration, 1, 178

ligand–target complex evaluation, 1, 178, 179

modification, 1, 175–179

molecular simulation, 1, 177, 178

structure visualization, 1, 175

virtual screening, 1, 169, 172–175

structure-based ligand design, 2, 184

structure-based virtual screening, 2, 284

structure-property relationships, 2, 142

structured-prediction, 4, 44, 48–50, 53–55, 57

substrate access, P450, 2, 178

substrate prediction, P450, 2, 172

support vector machines, 1, 137, 145; 2, 128, 149

surface diffusion, 3, 138, 140

Surflex, 2, 161

Sutcliffe–Tennyson triatomic rovibrational Hamiltonian, 3, 167

symbolic computation engines (SCE), 1, 221–235

advanced application-specific procedures, 1, 229–231

computation power, 1, 228, 229

emulation of professional software, 1, 229–231

graphical representations, 1, 225–228, 232, 233

process design, 1, 231, 232

quantification, 1, 225, 231–233

self-extracting databases, 1, 223

specialized procedures, 1, 228, 229

storage capacity, 1, 224, 225

T4 lysozyme, 3, 52

target structures, 1, 246

TASSER, 3, 220

tautomeric interconversion, 3, 7

TC5b, 2, 89

TDDFT see time-dependent density functional theory

temperature accelerated dynamics, 5, 81, 85, 86

temperature-programmed desorption, 2, 6

template approach, 1, 228, 229

thermal conductivity, 1, 242, 243

thermochemistry, 3, 158

thermochemistry, computational, 1, 31–43

thermodynamic integration (TI), 3, 44, 45

thermodynamics

integration method, 1, 104

nonequilibrium approaches, 1, 103–111

protein–ligand interactions, 1, 113–130

symbolic computation engines, 1, 224, 225

water models, 1, 59–72

thermogravimetric analysis, 2, 6

thermostat, 4, 113, 148

thyroid hormone, 2, 197, 201, 211

time-dependent density functional theory (TDDFT), 1, 20–30

computational aspects, 1, 21, 22

developments, 1, 26–28

electronic excitations, 1, 20, 21

exact exchange, 1, 26, 27

performance, 1, 22–24

qualitative limitations, 1, 25, 26

time-dependent Hamiltonian operators, 1, 104

time-independent Schrödinger equation, 3, 167

TIP3P, 2, 86, 89, 266

TIP4P, 1, 62–64, 69–72; 2, 265–267

TIP4P-Ew, 1, 64–65, 69–72

TIP5P, 1, 65–67, 69–72

TIP5P-E, 1, 67–72

titration curves, 4, 90–94, 96–99, 101, 102

TKL see tyrosine kinase-like

TKs see tyrosine kinases

toggle switch, 3, 212

Top7, 1, 249

torsional space, 5, 27, 52, 53

toxicity, 1, 144, 190

see also ADMET properties

TR, 2, 212

transamination, 1, 232, 233

transferable intermolecular potential (TIP) water molecules, 1, 59–74

transient complex, 4, 75, 77–81

transition path sampling (TPS), 3, 16

transition path theory, 3, 16

transition state theory, 2, 224, 229; 3, 141

Trp-cage, 2, 89, 90, 93

Turbo Similarity Searching, 2, 153

two-electron integrals, 1, 6–7, 12, 13; 3, 182

tyrosine kinase-like (TKL) group of kinases, 1, 186, 196–197

tyrosine kinases (TKs), 1, 186, 194, 195

ubiquitination, 6, 264–266, 268, 271–273, 275

UHF see unrestricted Hartree–Fock

umbrella potential, 2, 223

umbrella sampling, 2, 221, 223, 224, 228, 230

undergraduate research, 1, 205–214

Undergraduate Research Programs (URPs), 1, 208–212

united-atom protein force fields, 1, 97

university research, 1, 205–214

unrestricted Hartree–Fock (UHF), 1, 46, 50, 51

URPs see Undergraduate Research Programs

van’t Hoff reactions, 1, 228, 229

vertical excitation, 1, 22–24

vibrational

band origins (VBOs), 3, 164, 168

energy levels, 3, 161

states, 3, 160

virtual database screening, 2, 201

virtual screening, 1, 169, 172–175, 189, 190; 2, 158

high throughput, 1, 120

protocols, 1, 114, 120, 125

Virtual Screening, performance assessment of algorithms, 2, 144

viscosity, 1, 242, 243

visualization, 1, 175, 225–228, 232, 233

VPT2, 3, 163

water dimer, 3, 188

water models, 1, 59–74; 2, 98, 102

bio-molecular simulation, 1, 59–61

effective fragment potential (EFP), 2, 267

five-site, 1, 65–72

four-site, 1, 62–65, 69–72

generalized conductor-like screening model (GCOSMO), 2, 266

methods, 1, 61, 62

reference interaction site model (RISM), 2, 267, 268

TIP3P, 2, 266, 267

TIP4P, 1, 62–64, 69–72; 2, 265–267

TIP4P-Ew, 1, 64, 65, 69–72

TIP5P, 1, 65–67, 69–72

TIP5P-E, 1, 67–72

water–benzene dimer, 3, 186, 188

wavefunctions, 1, 225–228

weak pairs, 2, 62–63, 68

Web 2.0, 3, 100, 111, 122, 124, 131

web-based tools, 4, 237

Weighted Probe Interaction Energy Method, 2, 147

Weizmann-n theory, 1, 37–39

Wigner rotation functions, 3, 166

Wiki, 3, 99, 103, 108, 117, 121–131

Wikipedia, 3, 99, 112, 122, 124, 129, 131

Wn (Weizmann-n), 3, 160

XED, 2, 159

XIAP, 2, 206, 208, 209

XScore, 1, 123; 2, 161, 162

Z-factor equation, 1, 22

zeolites, 2, 45

Zwanzig relationship, 3, 43, 44